VeOmni is a versatile framework for both single- and multi-modal pre-training and post-training. It empowers users to seamlessly scale models of any modality across various accelerators, offering both flexibility and user-friendliness.
Our guiding principles when building VeOmni are:
- Flexibility and Modularity: VeOmni is built with a modular design, allowing users to decouple most components and replace them with their own implementations as needed.
- Trainer-free: VeOmni avoids rigid, structured trainer classes (e.g., PyTorch Lightning or Hugging Face Trainer). Instead, VeOmni keeps training scripts linear, exposing the entire training logic to users for maximum transparency and control (see the sketch after this list).
- Omni model native: VeOmni enables users to effortlessly scale any omni-model across devices and accelerators.
- Torch native: VeOmni is designed to leverage PyTorch's native functions to the fullest extent, ensuring maximum compatibility and performance.
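To make the "Trainer-free" principle concrete, the sketch below shows the style it describes: a plain, linear PyTorch training loop with torch-native FSDP written directly in the script. This is illustrative only and does not use VeOmni's actual APIs; the tiny model and random data are placeholders.

```python
# Minimal sketch of a linear, trainer-free training script (illustrative only,
# not VeOmni's API). Launch with: torchrun --nproc_per_node=<N> this_script.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Placeholder model and data; a real script would build the actual model here.
    model = FSDP(torch.nn.Linear(1024, 1024).cuda())
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    for step in range(10):  # the entire training logic stays visible in the script
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).square().mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if dist.get_rank() == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```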
- [2025/09/19] We release the first official version of VeOmni, v0.1.0.
- [2025/08/01] We release the VeOmni tech report and open the WeChat group. Feel free to join us!
- [2025/04/03] We release VeOmni!
- VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
- Overview
- Table of Contents
- Key Features
- Getting Started
- Training Examples
- Supported Models
- Performance
- Acknowledgement
- Awesome work using VeOmni
- Contributing
- License
- Citation
- About ByteDance Seed Team
- Parallelism
- Parallel state by DeviceMesh (see the sketch after this list)
- Torch FSDP1/2
- Expert parallelism (experimental)
- Easy to add new parallelism plan
- Sequence parallelism
- Ulysses
- Async-Ulysses
- Activation offloading
- Activation checkpointing
- Kernels
- GroupGEMM ops for MoE
- Liger-Kernel integrations
- Model
- Any transformers model
- Multi-modal
- Qwen2.5-VL
- Qwen2-VL
- Seed-Omni
- Data IO
- Dynamic batching strategy
- Omnidata processing
- Distributed Checkpointing
- ByteCheckpoint
- Torch Distributed checkpointing
- DCP merge tools
- Other tools
- Profiling tools
- Easy yaml configuration and argument parsing
- Torch native Tensor parallelism
- torch.compile
- Flux: Fine-grained Computation-communication Overlapping GPU Kernel integrations
- Better offloading strategy
- More models support
- Torch native pipeline parallelism
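The parallel state above is organized with torch-native DeviceMesh. As a rough illustration (the mesh shape and dimension names here are examples, not VeOmni's exact layout), a 2-D mesh combining data parallelism with Ulysses sequence parallelism can be initialized like this:

```python
# Illustrative torch-native DeviceMesh setup (example layout, not VeOmni's exact one).
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

sp_size = 2                                    # example Ulysses sequence-parallel size
dp_size = dist.get_world_size() // sp_size     # assumes world_size is divisible by sp_size

# 2-D mesh: outer dim for data parallelism, inner dim for sequence parallelism.
mesh = init_device_mesh("cuda", (dp_size, sp_size), mesh_dim_names=("dp", "sp"))
dp_group = mesh["dp"].get_group()              # ranks that shard the data
sp_group = mesh["sp"].get_group()              # ranks that split the sequence dimension
```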
Read the VeOmni Best Practice for more details.
We recommend using a uv-managed virtual environment to run VeOmni.
# For GPU
uv sync --extra gpu
# For Ascend NPU
uv sync --extra npu
# You can install other optional deps by adding --extra like --extra dit
# Activate the uv managed virtual environment
source .venv/bin/activate
Install using PyPI:
pip3 install veomni
Install from source code:
pip3 install -e .
Users can quickly start training like this:
bash train.sh $TRAIN_SCRIPT $CONFIG.yaml
You can also override arguments in the YAML file by passing them on the command line:
bash train.sh $TRAIN_SCRIPT $CONFIG.yaml \
--model.model_path PATH/TO/MODEL \
--data.train_path PATH/TO/DATA \
--train.global_batch_size GLOBAL_BATCH_SIZE
Here is an end-to-end workflow for preparing a subset of the FineWeb dataset, continually training a Qwen2.5 model with sequence parallel size 2 for 20 steps, and then merging the global_step_10 distributed checkpoint into Hugging Face weights with ByteCheckpoint.
- Download fineweb dataset
python3 scripts/download_hf_data.py \
--repo_id HuggingFaceFW/fineweb \
--local_dir ./fineweb/ \
--allow_patterns sample/10BT/*
- Download qwen2_5 model
python3 scripts/download_hf_model.py \
--repo_id Qwen/Qwen2.5-7B \
--local_dir .
- Training
bash train.sh tasks/train_torch.py configs/pretrain/qwen2_5.yaml \
--model.model_path ./Qwen2.5-7B \
--data.train_path ./fineweb/sample/10BT/ \
--train.global_batch_size 512 \
--train.lr 5e-7 \
--train.ulysses_parallel_size 2 \
--train.save_steps 10 \
--train.max_steps 20 \
--train.output_dir Qwen2.5-7B_CT
- Merge checkpoints
python3 scripts/mereg_dcp_to_hf.py \
--load-dir Qwen2.5-7B_CT/checkpoints/global_step_10 \
--model_assets_dir Qwen2.5-7B_CT/model_assets \
--save-dir Qwen2.5-7B_CT/checkpoints/global_step_10/hf_ckpt
- Inference
python3 tasks/infer.py \
--infer.model_path Qwen2.5-7B_CT/checkpoints/global_step_10/hf_ckpt
We use ByteCheckpoint to save checkpoints in torch.distributed.checkpoint (DCP) format. You can merge the DCP files using this command:
python3 scripts/mereg_dcp_to_hf.py \
--load-dir PATH/TO/CHECKPOINTS \
--model_assets_dir PATH/TO/MODEL_ASSETS \
--save-dir PATH/TO/SAVE_HF_WEIGHT
For example, if your output_dir is seed_omni and you want to merge the global_step_100 checkpoint into Hugging Face-format weights:
python3 scripts/mereg_dcp_to_hf.py \
--load-dir seed_omni/checkpoints/global_step_100 \
--model_assets_dir seed_omni/model_assets \
--save-dir seed_omni/hf_ckpt
You can also run VeOmni in Docker:
cd docker/
docker compose up -d
docker compose exec VeOmni bash
PyTorch FSDP2 Qwen2VL
bash train.sh tasks/multimodal/omni/train_qwen2_vl.py configs/multimodal/qwen2_vl/qwen2_vl.yaml
PyTorch FSDP2 Qwen2
bash train.sh tasks/train_torch.py configs/pretrain/qwen2_5.yaml
PyTorch FSDP2 llama3-8b-instruct
bash train.sh tasks/train_torch.py configs/pretrain/llama3.yaml
| Model | Model size | Example config file |
|---|---|---|
| DeepSeek 2.5/3/R1 | 236B/671B | deepseek.yaml |
| Llama 3-3.3 | 1B/3B/8B/70B | llama3.yaml |
| Qwen 2-3 | 0.5B/1.5B/3B/7B/14B/32B/72B | qwen2_5.yaml |
| Qwen2-VL/Qwen2.5-VL/QVQ | 2B/3B/7B/32B/72B | qwen2_vl.yaml |
| Qwen3-MoE | 30B-A3B/235B-A22B | qwen3-moe.yaml |
| Wan | Wan2.1-I2V-14B-480P | wan_sft.yaml |
| Omni Model | Any Modality Training | seed_omni.yaml |
VeOmni supports all transformers models if you do not need sequence parallelism, expert parallelism, other parallelism strategies, or the CUDA kernel optimizations in VeOmni. We designed a model registry mechanism: when a model is registered in VeOmni, we automatically load VeOmni's model and optimizer implementations; otherwise, it falls back to the modeling file in transformers.
If you want to add a new model, register it in the model registry. See the Support custom model docs for details.
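For intuition, here is a rough sketch of the registry-with-fallback behavior described above. The names (`MODEL_REGISTRY`, `register_model`, `load_model`) are hypothetical and not VeOmni's actual API; the fallback path simply uses the standard transformers loader.

```python
# Hypothetical sketch of a model registry with a transformers fallback
# (names are illustrative, not VeOmni's actual API).
from transformers import AutoModelForCausalLM

MODEL_REGISTRY = {}


def register_model(model_type):
    """Register an optimized modeling class (e.g. with parallel plans/kernels) under a key."""
    def wrapper(cls):
        MODEL_REGISTRY[model_type] = cls
        return cls
    return wrapper


def load_model(model_type, model_path):
    if model_type in MODEL_REGISTRY:
        # Registered: load the custom implementation (assumed to be a PreTrainedModel subclass).
        return MODEL_REGISTRY[model_type].from_pretrained(model_path)
    # Not registered: fall back to the stock transformers modeling file.
    return AutoModelForCausalLM.from_pretrained(model_path)
```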
See the VeOmni tech report (https://arxiv.org/abs/2508.02317) for details.
Thanks to the following projects for their excellent work:
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents
- OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft
- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
- Open-dLLM: Open Diffusion Large Language Models
Contributions from the community are welcome! Please check out CONTRIBUTING.md and our project roadmap (to be updated).
This project is licensed under Apache License 2.0. See the LICENSE file for details.
If you find VeOmni useful for your research and applications, feel free to give us a star ⭐ or cite us using:
@article{ma2025veomni,
title={VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo},
author={Ma, Qianli and Zheng, Yaowei and Shi, Zhelun and Zhao, Zhongkai and Jia, Bin and Huang, Ziyue and Lin, Zhiqi and Li, Youjie and Yang, Jiacheng and Peng, Yanghua and others},
journal={arXiv preprint arXiv:2508.02317},
year={2025}
}
About ByteDance Seed Team
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
You can get to know us better through the following channels: