Fix bugs for eval_llada.py
#91
Open
What does this PR do?
Fix bugs listed below:
- Remove `accelerator.prepare` operations from `LLaDA/eval_llada.py` (line 83 at commit 3f5e0d0)
- Improve readability of `generate_until` and `loglikelihood` by adding the official lm-eval docstrings
- Change dataset `.map` operations to direct processing

Detailed explanation
1. Remove `accelerator.prepare`

Based on a note in the official documentation of `accelerate` v0.34.2, a model that is only used for inference does not need to be passed through `prepare`. Besides, calling `prepare` on a model leads to higher GPU memory consumption. For instance, LLaDA-8B-Base takes about 15 GB of VRAM when loaded in bf16, but after calling `prepare`, more than 40 GB is allocated, which causes an OOM error on an RTX 3090.
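For reference, a minimal sketch of inference-only loading without `prepare`, using the accelerator just for device placement. The checkpoint name, `AutoModel` usage, and dtype below are illustrative assumptions, not necessarily the exact code in `eval_llada.py`:

```python
import torch
from accelerate import Accelerator
from transformers import AutoModel, AutoTokenizer

accelerator = Accelerator()  # still handy for rank/device bookkeeping in multi-GPU runs

# Load once in bf16 (~15 GB for an 8B model) and move it to the local device directly,
# instead of wrapping the model with accelerator.prepare().
model = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-8B-Base",      # assumed checkpoint name
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(accelerator.device).eval()

tokenizer = AutoTokenizer.from_pretrained(
    "GSAI-ML/LLaDA-8B-Base", trust_remote_code=True
)
```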
2. Improve readability

I copied the official docstrings from lm-eval (`generate_until`, `loglikelihood`, and `_encode_pair`) to help those who are not familiar with the lm-eval interfaces. I also renamed several variables to match the docstrings.
3. Remove datasets mapping

When I first ran the eval code, I got a warning:

When using `.map` on a dataset, the `datasets` library generates a cache based on a fingerprint, which is typically created by hashing certain elements such as the mapping function (e.g., `_tokenize`). However, in this case the function cannot be hashed, so a new dataset is generated on every call. This eventually leads to excessive memory usage, and the system kills the process. I changed the code to process the requests directly instead of building intermediate datasets.
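As a rough sketch of the change (the names `_tokenize`, `tokenize_requests`, and the request layout are illustrative, not the exact code in `eval_llada.py`): instead of wrapping requests in a `datasets.Dataset` and mapping an unhashable closure over it, the requests are encoded in a plain loop, so nothing is fingerprinted or cached:

```python
# Before (simplified): an unhashable _tokenize closure defeats the fingerprint cache,
# so datasets rebuilds and re-stores the mapped dataset on every call.
#     ds = Dataset.from_list(records).map(_tokenize)

# After (simplified): process the requests directly, no intermediate Dataset.
def tokenize_requests(tokenizer, requests):
    """Encode (context, continuation) pairs in a plain Python loop."""
    encoded = []
    for context, continuation in requests:
        context_enc = tokenizer(context, add_special_tokens=False)["input_ids"]
        continuation_enc = tokenizer(continuation, add_special_tokens=False)["input_ids"]
        encoded.append({"context_enc": context_enc, "continuation_enc": continuation_enc})
    return encoded
```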