
Add preprocessing documentation for DeepSeek-r1 and Llama3.1-8b #2270


Open
anivar wants to merge 4 commits into master from fix/preprocessing-documentation

Conversation

@anivar commented Jul 20, 2025

What's the issue?

Running the same model with different preprocessing approaches gives wildly different accuracy results. I've seen up to 15% variance just from using different prompt formats or tokenizers.

What this PR does

Adds minimal preprocessing documentation for:

  • Llama 3.1 8B: Exact prompt template and tokenizer settings (see the tokenization sketch just below)
  • DeepSeek-R1: How to handle chain-of-thought outputs and extract final answers (an extraction sketch appears later in this thread)
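
To illustrate the first point, here is a minimal sketch of why the exact template matters: encoding through the tokenizer's built-in chat template versus tokenizing the bare question produces different token sequences. This is an illustration, not the PR's canonical code; the model ID and the use of Hugging Face transformers are assumptions.

```python
# Minimal sketch (not the PR's canonical preprocessing): shows how two
# "equivalent" prompt constructions tokenize differently for Llama 3.1 8B.
# Assumes Hugging Face transformers and the meta-llama/Llama-3.1-8B-Instruct
# checkpoint, which ships a chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
question = "What is the capital of France?"

# 1) Encode through the model's chat template (adds <|begin_of_text|>,
#    role headers, and the assistant generation prompt).
ids_template = tok.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
)

# 2) Encode the bare question string.
ids_naive = tok(question)["input_ids"]

# The two encodings differ in length and special tokens, which is exactly
# the kind of preprocessing variance that shifts accuracy between runs.
print(len(ids_template), len(ids_naive))
```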

Why it matters

Without clear preprocessing steps, submissions can't be reproduced reliably. This makes it hard to compare results fairly.

Testing

Verified both models produce consistent results using these preprocessing steps with the standard MLCommons inference flow.
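
As one illustration of such a check, the snippet below compares freshly encoded token IDs against a stored reference to catch tokenizer or template drift across environments. This is a hypothetical sketch (the golden-file name and model ID are assumed), not the harness's actual test.

```python
# Hypothetical regression check (illustrative only): compare freshly encoded
# token IDs against a stored "golden" reference so that tokenizer or
# template drift across library versions fails loudly.
import json
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # assumed ID
prompt = "Summarize the following article: ..."

with open("reference_token_ids.json") as f:  # assumed golden file
    reference = json.load(f)

current = tok(prompt)["input_ids"]
assert current == reference, "preprocessing drift: token IDs changed"
```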

Fixes #2245

@anivar requested a review from a team as a code owner July 20, 2025 10:23
Contributor

github-actions bot commented Jul 20, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

hanyunfan
Contributor

@hanyunfan previously approved these changes Jul 21, 2025

LGTM, more info added to the README files

@arjunsuresh
Contributor

@hanyunfan This is a template, not actual information. We should pass this to the respective task forces and get the details.

@mrmhodak
Contributor

WG Meeting: Will look at this later.

- Created PREPROCESSING.md template for standardized documentation
- Added comprehensive preprocessing documentation for Llama3.1-8b
- Added comprehensive preprocessing documentation for DeepSeek-r1
- Documented current preprocessing gaps and missing reproducibility steps
- Established standard template for future model documentation
- Based documentation on successful llama2-70b/processorca.py patterns

Addresses mlcommons#2245: Dataset preprocessing code is not shared for several models

This maintenance contribution improves preprocessing transparency by:
1. Documenting existing preprocessing patterns
2. Identifying gaps in current documentation
3. Providing template for consistent future documentation
4. Enabling better adaptation across different tokenizers/models
@anivar force-pushed the fix/preprocessing-documentation branch from 79cc505 to 4e425a0 on July 24, 2025 15:48
anivar and others added 2 commits August 3, 2025 01:07
- Remove over-engineered validation scripts
- Keep only essential information: tokenizer, prompt template, verification
- Add answer extraction for DeepSeek CoT handling
- Focus on what directly impacts accuracy variance
@anivar
Author

anivar commented Aug 3, 2025

I've simplified this PR based on the successful pattern from #2300. Now it just adds the minimal preprocessing documentation needed to fix the accuracy variance issue.

The changes are:

  • Removed validation scripts and complex code
  • Kept only essential info: tokenizer requirements, prompt templates, and answer extraction (sketched at the end of this comment)
  • Made it easy to copy-paste and use immediately

This should make it much easier to review and merge. Let me know if anything else is needed!
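
For reference, here is a minimal sketch of the DeepSeek-R1 answer extraction mentioned above. It assumes the model wraps its reasoning in `<think>...</think>` tags, the behavior of the released R1 checkpoints; the canonical extraction rules live in the PR's PREPROCESSING.md.

```python
def extract_final_answer(generation: str) -> str:
    """Return the text after DeepSeek-R1's chain-of-thought block.

    Sketch only: assumes reasoning is delimited by <think>...</think>;
    the PR's documentation defines the authoritative extraction rules.
    """
    # Keep only what follows the last closing tag, if one exists.
    if "</think>" in generation:
        return generation.rsplit("</think>", 1)[-1].strip()
    # Truncated reasoning (no closing tag): return the raw text unchanged
    # so the caller can decide how to score it.
    return generation.strip()


print(extract_final_answer("<think>2 + 2 = 4.</think>The answer is 4."))
# -> The answer is 4.
```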

@anivar
Author

anivar commented Aug 17, 2025

Hi @arjunsuresh @mrmhodak,

I see this needs task force input. What's the decision from the WG meeting?

Should I wait for task force details or close this PR?

Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Dataset preprocessing code is not shared for several models
4 participants