[WIP] Megatron GRPO #6025

hjh0119 · 2025-09-30T09:09:56Z

No description provided.

hjh0119 · 2025-10-09T07:32:50Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for Group Relative Policy Optimization (GRPO) within the Megatron framework. It adds a new MegatronGRPOTrainer, corresponding arguments, and integrates with vLLM for efficient generation. The changes are extensive and well-structured. My review focuses on code quality and correctness, and I've identified two areas with redundant code, one in each of the files listed below, that could be cleaned up to improve maintainability.
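For readers unfamiliar with GRPO, the sketch below illustrates the group-relative advantage that gives the method its name: each prompt is sampled several times, and every completion's reward is normalized against the other completions of the same prompt. This is a minimal illustration, not the code in this PR. The function name `group_relative_advantages`, the `eps` constant, and the choice of population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, group_size, eps=1e-6):
    """Normalize rewards within each group of completions for one prompt.

    Hypothetical helper for illustration only. `rewards` is a flat list
    ordered so that each consecutive run of `group_size` entries belongs
    to the same prompt.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a positive multiple of group_size")

    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = mean(group)
        # Population std is an assumption; implementations may use sample std.
        sigma = pstdev(group)
        # eps avoids division by zero when all rewards in a group are equal.
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages
```

With two prompts and two completions each, `group_relative_advantages([1.0, 0.0, 2.0, 2.0], group_size=2)` gives a positive and a negative advantage for the first prompt, and zero for both completions of the second, because identical rewards within a group carry no learning signal.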

swift/megatron/train/rlhf.py

swift/megatron/trainers/grpo_trainer.py

hjh0119 added 30 commits August 29, 2025 18:07

wip

4316425

init wip

5d46eae

args wip

5828229

Merge remote-tracking branch 'origin/main' into mega-grpo

a82cec4

reuse _prepare_rollout_engine

0689b76

merge main

46593cf

mega wip

3da8756

Merge remote-tracking branch 'origin' into mega-grpo

2ca7ac1

wip

d9ec029

override train_step wip

7c56f9f

remove override train_step to grpo

686fc74

Merge remote-tracking branch 'origin' into mega-grpo

095bcbd

sync weight wip

4d9457b

rollout wip

f52d5e1

Merge remote-tracking branch 'origin' into mega-grpo

155d4fb

modify mini_batch_size to generation batch size

3c69c39

wip

eebdd47

loss wip

de6ecfe

fix repeat n

4569e54

Merge remote-tracking branch 'origin' into mega-grpo

f118935

fix padding to multiple of tp_size

9cb84e3

compute loss

8627aa3

fix logps

2292cf8

logging & patch VL

bbe5f39

fix rollout_group & rollout judgement

6a2940c

fix step

486c3d4

merge main

7e8e6b0

move old base trainer to newer

c68d976

fix

6b1653c

offload utils

d4a9dcc

hjh0119 added 6 commits October 9, 2025 09:57

offload context

9dc92a0

Resolve merge conflict in megatron_args.py by removing duplicate field definitions

7bc3d61

fix resolve

91f97ca

fix logps

59f436c

fix old logps

8dea6d7

reduce redundancy

abac696

gemini-code-assist bot reviewed Oct 9, 2025

swift/megatron/train/rlhf.py (resolved)

swift/megatron/trainers/grpo_trainer.py (outdated, resolved)

hjh0119 added 5 commits October 10, 2025 10:34

replace token

3a3ff37

fix offload model

2cd89dc

offload optimizer & ref

50d5e6f

support cp

e1a06c6

fix pp+cp

ff9b667

hjh0119 mentioned this pull request Oct 11, 2025

rlhf，grpo，qwen30B， Stuck in the initial stage of training #5949

Open

hjh0119 added 6 commits October 11, 2025 23:30

lora wip

ba4bfbf

Merge remote-tracking branch 'origin' into mega-grpo

e5a6252

arguments document

e22c790

wip lora&cp

b3de262

merge origin

d5bd92c

remove unused patch

fe3270f
