2 changes: 1 addition & 1 deletion docs/source/Instruction/命令行参数.md
@@ -213,7 +213,7 @@
- train_dataloader_shuffle: Whether to shuffle the dataloader in CPT/SFT training. Default is True. This parameter has no effect on IterableDataset, which is read sequentially.
- 🔥neftune_noise_alpha: Noise coefficient added by NEFTune. Default is 0; common values are 5, 10, or 15.
- 🔥use_liger_kernel: Whether to enable the [Liger](https://github.com/linkedin/Liger-Kernel) kernel to accelerate training and reduce GPU memory usage. Default is False. Example shell scripts can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/liger).
- Note: liger_kernel does not support device_map; please use DDP/DeepSpeed for multi-GPU training.
- Note: liger_kernel does not support device_map; please use DDP/DeepSpeed for multi-GPU training. liger_kernel currently only supports `task_type='causal_lm'`.
- average_tokens_across_devices: Whether to average the token count across devices. If set to True, `num_tokens_in_batch` is synchronized via all_reduce for accurate loss computation. Default is False.
- max_grad_norm: Gradient clipping. Default is 1.
- Note: The grad_norm recorded in the logs is the value before clipping.
2 changes: 1 addition & 1 deletion docs/source_en/Instruction/Command-line-parameters.md
@@ -214,7 +214,7 @@ Other important parameters:
- train_dataloader_shuffle: Whether to shuffle the dataloader in CPT/SFT training. Default is `True`. Not effective for `IterableDataset`, which uses sequential loading.
- 🔥neftune_noise_alpha: Noise magnitude for NEFTune. Default is 0. Common values: 5, 10, 15.
- 🔥use_liger_kernel: Whether to enable the [Liger](https://github.com/linkedin/Liger-Kernel) kernel to accelerate training and reduce GPU memory consumption. Defaults to False. Example shell script can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/liger).
- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training.
- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training. Currently, liger_kernel only supports `task_type='causal_lm'`.
Contributor (review comment, medium):
This documentation update is helpful. To make the implementation more robust and prevent misuse, consider adding a check in the argument parsing logic to enforce this constraint. For example, in `swift/llm/argument/train_args.py`, within the `TrainArguments.__post_init__` method, you could add:

if getattr(self, 'use_liger_kernel', False) and self.task_type != 'causal_lm':
    raise ValueError("`use_liger_kernel` only supports `task_type='causal_lm'`.")

This would provide immediate feedback to users who try to use liger_kernel with an unsupported task type.
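For illustration, a minimal self-contained sketch of such a guard follows. The class here is a hypothetical stand-in for `TrainArguments` (only the two relevant fields are mirrored; this is not the actual ms-swift implementation):

```python
from dataclasses import dataclass


@dataclass
class TrainArgumentsSketch:
    """Hypothetical stand-in for TrainArguments with only the two relevant fields."""
    task_type: str = 'causal_lm'
    use_liger_kernel: bool = False

    def __post_init__(self):
        # Fail fast when the arguments are constructed, rather than deep inside the training loop.
        if self.use_liger_kernel and self.task_type != 'causal_lm':
            raise ValueError(
                "`use_liger_kernel` only supports `task_type='causal_lm'`, "
                f"but got task_type={self.task_type!r}.")


if __name__ == '__main__':
    TrainArgumentsSketch(task_type='causal_lm', use_liger_kernel=True)  # passes
    TrainArgumentsSketch(task_type='seq_cls', use_liger_kernel=True)    # raises ValueError
```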

- average_tokens_across_devices: Whether to average token counts across devices. If `True`, `num_tokens_in_batch` is synchronized via `all_reduce` for accurate loss computation. Default is `False`.
- max_grad_norm: Gradient clipping. Default is 1.
- Note: The logged `grad_norm` reflects the value **before** clipping.
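Relatedly, for the `average_tokens_across_devices` entry above, the following is a rough, hypothetical sketch of the synchronization it describes (not the trainer's actual code): each rank contributes its local token count via `all_reduce`, and the loss is normalized by the summed global count.

```python
import torch
import torch.distributed as dist


def global_num_tokens(local_num_tokens: int, device: torch.device) -> int:
    """Sum per-rank token counts so the loss can be divided by the true global count."""
    count = torch.tensor([local_num_tokens], dtype=torch.long, device=device)
    if dist.is_available() and dist.is_initialized():
        # After all_reduce, every rank holds the same global total.
        dist.all_reduce(count, op=dist.ReduceOp.SUM)
    return int(count.item())


# Usage inside a training step (names are illustrative):
# loss = summed_token_losses / global_num_tokens(local_token_count, loss_device)
```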
12 changes: 0 additions & 12 deletions swift/llm/model/constant.py
@@ -7,29 +7,18 @@
class LLMModelType:
qwen = 'qwen'
qwen2 = 'qwen2'
qwen2_5 = 'qwen2_5'
qwen2_5_math = 'qwen2_5_math'
qwen2_moe = 'qwen2_moe'
qwq_preview = 'qwq_preview'
qwq = 'qwq'
qwen3 = 'qwen3'
qwen3_thinking = 'qwen3_thinking'
qwen3_nothinking = 'qwen3_nothinking'
qwen3_coder = 'qwen3_coder'
qwen3_moe = 'qwen3_moe'
qwen3_moe_thinking = 'qwen3_moe_thinking'
qwen3_next = 'qwen3_next'
qwen3_next_thinking = 'qwen3_next_thinking'
qwen3_emb = 'qwen3_emb'
qwen3_reranker = 'qwen3_reranker'

qwen2_gte = 'qwen2_gte'

bge_reranker = 'bge_reranker'

codefuse_qwen = 'codefuse_qwen'
modelscope_agent = 'modelscope_agent'
marco_o1 = 'marco_o1'

llama = 'llama'
llama3 = 'llama3'
@@ -168,7 +157,6 @@ class MLLMModelType:
qwen2_audio = 'qwen2_audio'
qwen3_vl = 'qwen3_vl'
qwen3_moe_vl = 'qwen3_moe_vl'
qvq = 'qvq'
qwen2_gme = 'qwen2_gme'
ovis1_6 = 'ovis1_6'
ovis1_6_llama3 = 'ovis1_6_llama3'