supported think mode #182
Conversation
Summary of Changes
Hello @jiapingW, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses an inconsistency in input ID construction for certain inference models, such as the Qwen3 series, where unwanted <think> tokens were being included, leading to misalignment between training and inference data. To resolve this, a new is-think-mode parameter has been introduced, allowing the system to correctly process chat templates and apply loss masks by adapting to the specific 'think mode' behavior of different models. This ensures that training data accurately reflects the expected input format for inference, preventing models from training on tokens that should not contribute to loss calculation.
Highlights
- New Command-Line Argument: A new command-line argument, --is-think-mode, has been introduced to the train_eagle3_online.py script. This argument allows users to specify whether the input data requires special handling for 'think' tokens, which is crucial for models like Qwen3.
- Enhanced Data Preprocessing: The data preprocessing logic in preprocessing.py has been updated to incorporate the is_think_mode parameter. This enables the system to correctly apply loss masks and process conversations by conditionally adjusting assistant message separators and chat template application based on whether 'think mode' is active.
- Updated Chat Templates: The ChatTemplate class in template.py now includes an assistant_think_header field. This addition, along with new and updated model templates (e.g., for Qwen3, QwQ, Llama3, Llama4, Qwen, Deepseek), ensures that the system can accurately handle the specific 'think' token formats used by various inference models, aligning training and inference data.
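For illustration, a think-aware template entry might look like the sketch below; the field set and the Qwen3 token values here are assumptions based on this summary, not the actual definitions in template.py.

```python
from dataclasses import dataclass


# Hypothetical sketch of a think-aware chat template entry; the real class in
# specforge/data/template.py may define additional fields.
@dataclass
class ChatTemplate:
    user_header: str
    assistant_header: str
    end_of_turn_token: str
    # Field described in this PR: the assistant header used when the target
    # model runs in think mode (e.g. Qwen3, QwQ).
    assistant_think_header: str = ""


# Assumed Qwen3-style values, shown only for illustration.
qwen3_template = ChatTemplate(
    user_header="<|im_start|>user\n",
    assistant_header="<|im_start|>assistant\n",
    end_of_turn_token="<|im_end|>\n",
    assistant_think_header="<|im_start|>assistant\n<think>\n",
)
```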
Code Review
This pull request introduces an is-think-mode flag to handle variations in chat template processing, particularly for models like qwen3 that have different modes for generating responses. The changes correctly plumb this new flag through the data processing pipeline and update the chat templates accordingly.
My review has identified a critical issue in the conversation preprocessing logic that could break parsing for some existing chat templates. I've also found a minor error in an example code snippet. Please address these points.
Also, there's a small typo in the pull request title ('suppoerted' -> 'supported').
user_message_separator = (
    f"{chat_template.end_of_turn_token}{chat_template.user_header}"
)
assistant_message_separator = (
    f"{chat_template.end_of_turn_token}{chat_template.assistant_header}"
    f"{chat_template.end_of_turn_token}"
)
This change to user_message_separator introduces a regression. By setting it to just the end_of_turn_token, it breaks parsing for two cases:
- Templates with an empty end_of_turn_token (e.g., deepseek). The regex delimiter becomes empty, causing it to match incorrectly.
- Templates where end_of_turn_token is a newline (\n) and assistant responses can contain newlines. The regex will prematurely stop at the first newline in the assistant's response.
The previous implementation was more robust. Please revert this part of the change to restore the correct behavior.
-    user_message_separator = (
-        f"{chat_template.end_of_turn_token}{chat_template.user_header}"
-    )
-    assistant_message_separator = (
-        f"{chat_template.end_of_turn_token}{chat_template.assistant_header}"
-        f"{chat_template.end_of_turn_token}"
-    )
+    user_message_separator = (
+        f"{chat_template.end_of_turn_token}{chat_template.user_header}"
+    )
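For intuition, the two failure cases can be reproduced in isolation; the snippet below is a standalone illustration with made-up strings, not the actual preprocessing code.

```python
import re

# Case 1: a template whose end_of_turn_token is empty (deepseek-style).
# An empty delimiter matches at every position, so the split shreds the text
# instead of separating turns.
print(re.split("", "Hello<assistant>Hi"))
# ['', 'H', 'e', 'l', 'l', 'o', '<', 'a', ...]

# Case 2: end_of_turn_token is "\n" and the assistant reply contains newlines.
# Splitting on the bare newline cuts the reply at its first internal newline,
# so part of the assistant span would be dropped from the loss mask.
reply = "Step 1: reason it out\nStep 2: give the answer"
print(reply.split("\n"))
# ['Step 1: reason it out', 'Step 2: give the answer']
```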
Not making these changes will not break model training, but it will cause the training results to deviate from expectations. For example, the <think> token has never been seen in the QwQ-32B training data.
assistant_message_separator = (
    f"{chat_template.end_of_turn_token}{chat_template.assistant_header}"
)
assistant_message_separator = (
    f"{chat_template.end_of_turn_token}{chat_template.assistant_header}"
    f"{chat_template.end_of_turn_token}"
)
Any reason for removing f"{chat_template.end_of_turn_token}"?
I adopted the suggested code from Gemini Code Assist; I added back the chat_template.end_of_turn_token, restoring the original assistant_message_separator.
specforge/data/preprocessing.py (outdated)
# if the first message is not from user, skip it
source = source[1:]

assert len(source) % 2 == 0, "conversation have question without answer"
if len(source) % 2 != 0:
    continue  # raise ValueError("odd number of turns: question without answer")
Thanks, I think your modifications are more robust.
specforge/data/preprocessing.py (outdated)
  assert len(source) % 2 == 0, "conversation have question without answer"
  convroles = ["user", "assistant"]
- for j, sentence in enumerate(source):
+ for j, sentence in enumerate(source[:-1]):
Why is the last element skipped?
I want the last round of dialogue to be formatted with the correct conversation template, rather than directly appending the entire conversation during construction. Otherwise, the same issue described in the motivation will occur, leading to constructed input IDs that are inconsistent with expectations. Therefore, I adopted the approach of concatenating only the last round of dialogue to ensure that the "think" format of the final round is correct.
specforge/data/preprocessing.py (outdated)
    add_generation_prompt=True,
    enable_thinking=is_think_mode,
)
conversation += source[-1]["content"] + chat_template.end_of_turn_token
The last role should be assistant.
We hope that the final round of dialogue is not constructed directly using apply_chat_template, but rather by concatenating the user and assistant messages. This is because the logic apply_chat_template uses to process a complete conversation is inconsistent with the output produced when apply_chat_template is applied directly to only the user message for a think-mode model.
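A runnable sketch of the splicing approach described here, based on the diff above; the end-of-turn token and the example conversation are assumptions used only for demonstration.

```python
from transformers import AutoTokenizer

# Tokenizer path reused from the test snippet later in this thread; the
# end-of-turn token is assumed for Qwen3-style templates.
tokenizer = AutoTokenizer.from_pretrained("Qwen3-30B-A3B", trust_remote_code=True)
end_of_turn_token = "<|im_end|>\n"
is_think_mode = True

source = [
    {"role": "user", "content": "What is 1+1"},
    {"role": "assistant", "content": "=2."},
]

# Render every turn except the final assistant reply, asking for the
# generation prompt so the assistant header (including think-mode handling)
# matches what inference would actually produce.
conversation = tokenizer.apply_chat_template(
    source[:-1],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=is_think_mode,
)
# Splice the final assistant reply on manually and close the turn, instead of
# letting apply_chat_template rewrite the think header of the last response.
conversation += source[-1]["content"] + end_of_turn_token
print(conversation)
```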
Can anyone help review it? I think it's important for training think-mode models such as Qwen3-8B.
Can you resolve the conflicts and fix the lints? Thanks
Wait a few minutes, I will fix these.
I have resolved the conflicts and fixed the lints.
Can you rebase it?
ok.
Hi, I think this is ready to merge. Can you check the lint and CI please?
Thanks, I have handled the lint and the tests are queued.
@jiapingW Hi, I have similar changes in #239
I understand that your design supports more models in think mode. I used https://github.dev/sgl-project/SpecForge/tree/feature/add_sgl_online to train qwen2.5-7b. I've generated data for the qwen3 series, but haven't used your version for training yet. However, I believe your implementation may have the following issues:
Simply adding parameters would likely cause issues, so my changes above separate the processing of the final round of conversation from the previous ones.
You can use the code below to test:

if __name__ == "__main__":
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("Qwen3-30B-A3B", trust_remote_code=True)
    messages = [[
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I am a model."},
        {"role": "user", "content": "What is your name?"},
        {"role": "assistant", "content": "My name is QwQ."},
        {"role": "user", "content": "What is 1+1"},
        {"role": "assistant", "content": "=2."}
    ]]
    # chat_template = TEMPLATE_REGISTRY.get('qwen')
    max_length = 2048
    conversation = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False,
        enable_thinking=True,
    )
    print(conversation)

You will get the response:

<|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\nI am a model.<|im_end|>\n<|im_start|>user\nWhat is your name?<|im_end|>\n<|im_start|>assistant\nMy name is QwQ.<|im_end|>\n<|im_start|>user\nWhat is 1+1<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n=2.<|im_end|>\n

I think the last turn is not in thinking format, which violates the expected answer.
Motivation
We found that for inference models such as the QwQ-32B and Qwen3 series, the input IDs they construct are inconsistent with the expected ones.
Using qwen3 as an example:
We can see it adds a <think>\n\n</think>\n\n block in the last assistant response. This corresponds to the case where Qwen3 adopts non-think mode. This is also the current implementation, which causes the Eagle model to train on tokens that should not be involved in calculating the loss. As for the currently open-source Qwen3 Eagle data such as https://huggingface.co/datasets/Tengyunw/qwen3_8b_eagle3, the data was obtained using thinking mode. This will cause the training data and the inference data to be misaligned. Therefore, we introduce an additional parameter for inference models, is-think-mode, to determine whether to use the prompt processing related to think mode. Because different models' apply_chat_template handles think mode differently, we choose to splice the prompt ourselves to produce the expected format. Besides, we modified the template in template.py to adapt the corresponding think input for different models.
You can use the following code under preprocessing.py to debug and verify the correctness of the processed assistant messages and the corresponding entire conversation.
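As a minimal sketch of such a check (the regex and the Qwen3-style headers below are assumptions based on the output shown earlier, not the exact code in preprocessing.py):

```python
import re

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen3-30B-A3B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "What is your name?"},
    {"role": "assistant", "content": "My name is QwQ."},
]

# Render the whole conversation with think mode enabled.
conversation = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=False,
    enable_thinking=True,
)
print(conversation)

# Extract the assistant spans that the loss mask should cover and check that
# no unexpected empty <think></think> block was inserted into them.
assistant_spans = re.findall(
    r"<\|im_start\|>assistant\n(.*?)<\|im_end\|>", conversation, flags=re.S
)
print(assistant_spans)
```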
Modifications
Related Issues
Accuracy Test
Benchmark & Profiling
Checklist