[grpo] Optimize vLLM weight synchronization #5773
Conversation
/gemini review
Code Review
This pull request introduces an optimization for LoRA training with vLLM by enabling in-memory weight synchronization using flattened tensors. This avoids disk I/O and should improve training speed. The changes involve adding new arguments, new protocol definitions, and new methods in the rollout engine and GRPO trainer. A key part of the implementation is monkey-patching vLLM to support loading LoRA adapters from tensors. The overall approach is sound, but there are a few areas that need attention, such as ensuring deterministic adapter selection, cleaning up commented-out code, and addressing TODO comments.
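To make the flattened-tensor transfer concrete, here is a minimal sketch (not this PR's actual code) of packing a LoRA adapter's weights into a single contiguous buffer plus reconstruction metadata, so the adapter can be sent to the inference engine in one in-memory transfer instead of being written to disk. The function name `flatten_lora_state_dict` and the key filter are hypothetical.

```python
import torch

def flatten_lora_state_dict(state_dict: dict):
    """Pack LoRA adapter tensors into one contiguous buffer plus metadata.

    Only the adapter weights (lora_A / lora_B) need to be synchronized,
    which is what makes this much cheaper than syncing the full model.
    """
    lora_items = {k: v for k, v in state_dict.items() if "lora_" in k}
    # Metadata needed to rebuild each tensor on the receiving side.
    meta = [(name, t.shape, t.numel()) for name, t in lora_items.items()]
    flat = torch.cat([t.detach().reshape(-1) for t in lora_items.values()])
    return flat, meta
```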
/gemini review
Code Review
This pull request introduces a significant optimization for LoRA training with vLLM by synchronizing only the adapter weights instead of the full model. This is achieved by patching vLLM to load adapters from in-memory tensors and flattening the weights for more efficient communication. The changes are well-implemented across the trainer, rollout logic, and communication protocols. The documentation has also been updated to reflect these new features. My review includes a few suggestions to improve code quality and fix a minor typo in the documentation.
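On the rollout side, the flattened buffer has to be unpacked back into per-tensor adapter weights before it can be handed to vLLM through the patched in-memory loading path. The sketch below shows only the unpacking step; the loader call at the end is a hypothetical placeholder, since the actual patch targets vLLM internals that vary by version.

```python
import torch

def unflatten_lora_state_dict(flat: torch.Tensor, meta):
    """Rebuild the per-tensor LoRA state dict from the flat buffer and metadata."""
    state_dict, offset = {}, 0
    for name, shape, numel in meta:
        state_dict[name] = flat[offset:offset + numel].view(shape)
        offset += numel
    return state_dict

# Hypothetical usage: pass the reconstructed tensors to the patched vLLM
# loader instead of materializing the adapter on disk first.
# engine.load_lora_adapter_from_tensors(unflatten_lora_state_dict(flat, meta))
```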
Optimize weight synchronization between the training model and the inference engine (vLLM), in two modes (sketched below):
- LoRA
- FULL
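For the FULL mode, a common pattern is to broadcast every parameter of the training model to the rollout workers and feed the received tensors to the inference engine's weight loader. The sketch below assumes a shared process group between trainer and rollout workers and is not necessarily what this PR implements; the final loading hook is engine-specific.

```python
import torch
import torch.distributed as dist

def sync_full_weights(model: torch.nn.Module, src_rank: int = 0):
    """Broadcast each parameter from the source rank to all ranks in the group.

    Assumes every rank holds an identically shaped copy of the model so the
    broadcast buffers already exist on the receiving side.
    """
    for name, param in model.named_parameters():
        tensor = param.data
        dist.broadcast(tensor, src=src_rank)
        # On rollout ranks, the (name, tensor) pairs would then be passed to
        # the inference engine's weight-loading hook (engine-specific).
        yield name, tensor
```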