rlhf, grpo, qwen30B: Stuck in the initial stage of training #5949

@helin-wang-zte

Description

When training Qwen3-14B and Qwen3-32B with the same script, training completes smoothly. But after switching to the 30B model, training gets stuck at the initial stage and produces no errors. Is this a problem with the model type 'qwen3_moe'?
The scripts are as follows:

#rollout:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift rollout \
    --model "$MODEL_PATH" \
    --model_type qwen3_moe \
    --vllm_tensor_parallel_size 2 \
    --vllm_data_parallel_size 2 \
    --port 8099
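
Before launching the training job, it may help to confirm that the rollout server above is actually reachable from the training node, since a silent hang at step 0 can simply mean the trainer is waiting on the vLLM server connection. A minimal sketch, assuming the host/port placeholders match the script above; the /health probe assumes a vLLM-style OpenAI-compatible route and may not apply to the swift rollout server, in which case the plain TCP check is enough:

# Hypothetical connectivity check for the rollout server started above.
# ROLLOUT_HOST / ROLLOUT_PORT are placeholders; substitute the real values.
ROLLOUT_HOST=xxxx
ROLLOUT_PORT=8099

# Plain TCP check: is anything listening on the port?
nc -z -w 5 "$ROLLOUT_HOST" "$ROLLOUT_PORT" && echo "port open" || echo "port closed"

# Optional HTTP probe (assumes a vLLM-style /health route; adjust the path
# if the rollout server exposes a different one).
curl -s -o /dev/null -w "%{http_code}\n" "http://${ROLLOUT_HOST}:${ROLLOUT_PORT}/health"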
#training:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --model "$MODEL_PATH" \
    --model_type qwen3_moe \
    --dataset /mnt/tenant-home_speed/whl/swift_learn/dataset/numina_math_64#32 \
    --output_dir /mnt/tenant-home_speed/whl/swift_learn/output/grpo-training/qwen30B \
    --importance_sampling_level sequence \
    --reward_funcs accuracy \
    --torch_dtype bfloat16 \
    --beta 0.0 \
    --epsilon 3e-4 \
    --epsilon_high 4e-4 \
    --steps_per_generation 4 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --num_generations 16 \
    --train_type full \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host xxxx \
    --vllm_server_port 8099 \
    --vllm_gpu_memory_utilization 0.6 \
    --vllm_max_model_len 4096 \
    --max_completion_length 2048 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --log_completions true \
    --deepspeed zero3 \
    --offload_optimizer true \
    --offload_model true \
    --sleep_level 1
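
Since the run hangs without any error output, dumping the Python stacks of the stuck trainer ranks can show whether they are blocked in a distributed collective or waiting on the rollout server. A minimal sketch using py-spy; this is an assumption about how one might debug it, not part of the original report, and py-spy must be installed separately (the pgrep pattern is just one way to find the training processes):

# Hypothetical hang diagnosis: dump stacks of all swift rlhf worker processes.
# Requires `pip install py-spy`; run as the same user (or with sudo).
for pid in $(pgrep -f "swift rlhf"); do
    echo "=== PID $pid ==="
    py-spy dump --pid "$pid"
done

# Also worth checking GPU utilization; 0% on all ranks usually points to a
# blocked collective or a wait on the rollout server rather than slow compute.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv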
[image attachment]
