Training Qwen3-14B and Qwen3-32B with the same script completes without issues, but switching to the 30B model does not work and produces no error output at all. Could this be a problem with the model type 'qwen3_moe'?
The script is as follows:
# rollout:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift rollout \
--model "$MODEL_PATH" \
--model_type qwen3_moe \
--vllm_tensor_parallel_size 2 \
--vllm_data_parallel_size 2 \
--port 8099
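
Before launching training, the rollout server can be confirmed reachable from the training node with a plain TCP check (a minimal sketch, not part of the training script; xxxx is the placeholder for the actual --vllm_server_host value, and the port matches --port above):

# sketch: basic reachability check for the rollout server
ROLLOUT_HOST=xxxx        # same address later passed to --vllm_server_host
ROLLOUT_PORT=8099        # same as --port above
if timeout 5 bash -c "echo > /dev/tcp/${ROLLOUT_HOST}/${ROLLOUT_PORT}"; then
    echo "rollout server is reachable"
else
    echo "rollout server is NOT reachable"
fi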
# training:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
--rlhf_type grpo \
--model "$MODEL_PATH" \
--model_type qwen3_moe \
--dataset /mnt/tenant-home_speed/whl/swift_learn/dataset/numina_math_64#32 \
--output_dir /mnt/tenant-home_speed/whl/swift_learn/output/grpo-training/qwen30B \
--importance_sampling_level sequence \
--reward_funcs accuracy \
--torch_dtype bfloat16 \
--beta 0.0 \
--epsilon 3e-4 \
--epsilon_high 4e-4 \
--steps_per_generation 4 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--num_generations 16 \
--train_type full \
--use_vllm true \
--vllm_mode server \
--vllm_server_host xxxx \
--vllm_server_port 8099 \
--vllm_gpu_memory_utilization 0.6 \
--vllm_max_model_len 4096 \
--max_completion_length 2048 \
--learning_rate 1e-6 \
--save_total_limit 2 \
--logging_steps 1 \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--log_completions true \
--deepspeed zero3 \
--offload_optimizer true \
--offload_model true \
--sleep_level 1
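
Since the 30B run stops without printing any error, one way to see where the training ranks are stuck is to dump their Python stacks and check GPU activity (a minimal sketch; it assumes py-spy is installed, and the pgrep pattern may need adjusting to match the actual process names):

# dump the Python stack of each training rank (may require ptrace permissions / sudo)
for pid in $(pgrep -f "swift rlhf"); do
    py-spy dump --pid "$pid"
done
# check whether the GPUs are still busy or have gone idle
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv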
