rlhf, grpo, qwen30B: Stuck in the initial stage of training #5949

@helin-wang-zte

Description

When training Qwen3-14B and Qwen3-32B with the same script, training completes smoothly. But after switching to the 30B model, training gets stuck at the initial stage and produces no errors. Is this a problem with the model type 'qwen3_moe'?
The scripts are as follows:

#rollout:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift rollout \
    --model "$MODEL_PATH" \
    --model_type qwen3_moe \
    --vllm_tensor_parallel_size 2 \
    --vllm_data_parallel_size 2 \
    --port 8099
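
Before launching the training job, it may help to confirm that the rollout server above is actually reachable from the training node, since a silent hang at step 0 can simply mean the trainer is waiting on the vLLM server connection. A minimal sketch, assuming the host/port placeholders match the script above; the /health probe assumes a vLLM-style OpenAI-compatible route and may not apply to the swift rollout server, in which case the plain TCP check is enough:

# Hypothetical connectivity check for the rollout server started above.
# ROLLOUT_HOST / ROLLOUT_PORT are placeholders; substitute the real values.
ROLLOUT_HOST=xxxx
ROLLOUT_PORT=8099

# Plain TCP check: is anything listening on the port?
nc -z -w 5 "$ROLLOUT_HOST" "$ROLLOUT_PORT" && echo "port open" || echo "port closed"

# Optional HTTP probe (assumes a vLLM-style /health route; adjust the path
# if the rollout server exposes a different one).
curl -s -o /dev/null -w "%{http_code}\n" "http://${ROLLOUT_HOST}:${ROLLOUT_PORT}/health"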
#training:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --model "$MODEL_PATH" \
    --model_type qwen3_moe \
    --dataset /mnt/tenant-home_speed/whl/swift_learn/dataset/numina_math_64#32 \
    --output_dir /mnt/tenant-home_speed/whl/swift_learn/output/grpo-training/qwen30B \
    --importance_sampling_level sequence \
    --reward_funcs accuracy \
    --torch_dtype bfloat16 \
    --beta 0.0 \
    --epsilon 3e-4 \
    --epsilon_high 4e-4 \
    --steps_per_generation 4 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --num_generations 16 \
    --train_type full \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host xxxx \
    --vllm_server_port 8099 \
    --vllm_gpu_memory_utilization 0.6 \
    --vllm_max_model_len 4096 \
    --max_completion_length 2048 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --log_completions true \
    --deepspeed zero3 \
    --offload_optimizer true \
    --offload_model true \
    --sleep_level 1
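
Since the run hangs without any error output, dumping the Python stacks of the stuck trainer ranks can show whether they are blocked in a distributed collective or waiting on the rollout server. A minimal sketch using py-spy; this is an assumption about how one might debug it, not part of the original report, and py-spy must be installed separately (the pgrep pattern is just one way to find the training processes):

# Hypothetical hang diagnosis: dump stacks of all swift rlhf worker processes.
# Requires `pip install py-spy`; run as the same user (or with sudo).
for pid in $(pgrep -f "swift rlhf"); do
    echo "=== PID $pid ==="
    py-spy dump --pid "$pid"
done

# Also worth checking GPU utilization; 0% on all ranks usually points to a
# blocked collective or a wait on the rollout server rather than slow compute.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv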
[image attachment]
