
Conversation


@Kamichanw commented Jul 5, 2025

What does this PR do?

Fix the bugs listed below:

  1. Remove the unnecessary accelerator.prepare call:
    self.model = self.accelerator.prepare(self.model)
  2. Add docstrings to improve readability.
  3. Change the dataset generation in generate_until and loglikelihood to direct processing.

Detailed explanation

1. Remove accelerator.prepare

According to a note in the official accelerate v0.34.2 documentation, we "don't need to prepare a model if you only use it for inference without any kind of mixed precision."

Moreover, calling prepare on a model leads to higher GPU memory consumption. For instance, LLaDA-8B-Base takes about 15 GB of VRAM when loaded in bf16, but after calling prepare, more than 40 GB is allocated, which causes an OOM error on an RTX 3090.
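As a rough illustration (not the exact PR diff; the checkpoint name and dtype here are just examples), an inference-only setup can simply move the model to the accelerator's device instead of wrapping it with prepare:

```python
import torch
from accelerate import Accelerator
from transformers import AutoModel, AutoTokenizer

accelerator = Accelerator()

# Load once in bf16 (~15 GB of VRAM for LLaDA-8B-Base) ...
model = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-8B-Base",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Base", trust_remote_code=True)

# ... and just place it on the right device for inference,
# instead of: model = accelerator.prepare(model)
model = model.to(accelerator.device)
model.eval()
```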

2. Improve readability

I copied the official docstrings from lm-eval (generate_until, loglikelihood, and _encode_pair) to help readers who are not familiar with the lm-eval interfaces.

I also renamed several variables to match the docstrings.
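For reference, this is roughly the shape of the interfaces those docstrings describe (paraphrased rather than copied verbatim from lm-eval, and the class name here is just a placeholder; see the upstream docstrings for the exact wording):

```python
from typing import List, Tuple

class EvalLM:
    def loglikelihood(self, requests) -> List[Tuple[float, bool]]:
        """Each request carries (context, continuation); return, for each pair,
        the log-probability of the continuation given the context and whether
        the continuation would also be the greedy completion."""
        ...

    def generate_until(self, requests) -> List[str]:
        """Each request carries (context, gen_kwargs); generate text from the
        context until one of the stop conditions in gen_kwargs is reached."""
        ...

    def _encode_pair(self, context: str, continuation: str):
        """Tokenize context and continuation so that the continuation tokens
        can be scored separately from the context tokens."""
        ...
```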

3. Remove datasets mapping

When I first ran the eval code, I got a warning:

[screenshot: datasets warning that the mapping function could not be hashed]

When .map is called on a dataset, the datasets library caches the result under a fingerprint, which is computed by hashing, among other things, the mapping function (here, _tokenize). In this case the function cannot be hashed, so the cache is never reused and a new processed dataset is generated on every call. This eventually leads to excessive memory usage, causing the system to kill the process.

I changed the dataset generation to process the requests directly instead of going through .map.
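A minimal sketch of the idea (the helper names and request layout here are illustrative, not the exact code in this PR):

```python
# Before: build a Dataset and .map it with a function that datasets cannot
# hash, so no cache fingerprint can be reused and the processed data is
# regenerated (and accumulated) on every call:
#
#   ds = Dataset.from_list(records)
#   ds = ds.map(self._tokenize)
#
# After: process the requests directly with a plain Python loop.
def encode_requests(requests, encode_pair):
    encoded = []
    for request in requests:
        context, continuation = request.args          # lm-eval Instance args
        context_enc, continuation_enc = encode_pair(context, continuation)
        encoded.append((context, continuation, context_enc, continuation_enc))
    return encoded
```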
