Kaiwen Zhang*,
Zhenyu Tang*,
Xiaotao Hu,
Xingang Pan,
Xiaoyang Guo,
Yuan Liu,
Jingwei Huang,
Li Yuan,
Qian Zhang,
Xiaoxiao Long✝,
Xun Cao,
Wei Yin§
*Equal Contribution ✝Project Adviser §Project Lead, Corresponding Author
Versatile capabilities of Epona: given historical driving context, Epona can generate consistent, minutes-long driving videos at high resolution (A). It can be controlled by diverse trajectories (B) and understands real-world traffic knowledge (C). In addition, our world model can predict future trajectories and serve as an end-to-end real-time motion planner (D).
```bash
conda create -n epona python=3.10
conda activate epona
pip install -r requirements.txt
```
To run the code with CUDA properly, you can comment out `torch` and `torchvision` in `requirements.txt` and instead install versions matching your CUDA toolkit (`torch>=2.1.0+cu121` and `torchvision>=0.16.0+cu121`) following the instructions on the PyTorch website.
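For example, assuming a CUDA 12.1 toolchain, the matching wheels can be installed directly from the PyTorch package index:
```bash
# Install CUDA 12.1 builds of torch/torchvision; adjust versions and the
# index URL to your CUDA toolkit per the official PyTorch instructions.
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
```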
Please refer to data preparation for details on preparing and preprocessing the data.
After preprocessing, change the `datasets_paths` field in the config files (under the configs folder) to your own data path.
You can first download our pre-trained models (including the world models and the finetuned temporal-aware DCAE) from Huggingface.
In addition to our finetuned temporal-aware DCAE, you may also experiment with the original DCAEs provided by MIT Han Lab as the autoencoder: dc-ae-f32c32-mix-1.0 and dc-ae-f32c32-sana-1.1. After downloading, please change the vae_ckpt in the config files to your own autoencoder checkpoint path.
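For reference, the relevant entries in a config file might look like the following. This is a hedged sketch: only the `datasets_paths` and `vae_ckpt` field names come from this README, and the values are placeholders for your own paths.
```python
# Illustrative excerpt of a config such as configs/dit_config_dcae_nuplan.py
datasets_paths = ["/path/to/your/preprocessed/nuplan"]  # preprocessed data root(s)
vae_ckpt = "/path/to/your/dcae_checkpoint"              # temporal-aware DCAE (or original DCAE) weights
```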
Then, you can run the scripts in the scripts/test folder to test Epona for different use cases:
| Script Name | Dataset | Trajectory Type | Video Length | Use Case Description |
|---|---|---|---|---|
| `test_nuplan.py` | NuPlan | Fixed (from dataset) | Fixed | Evaluation on the NuPlan test set with a fixed setup. |
| `test_free.py` | NuPlan | Self-predicted | Variable (free) | Long-term video generation with autonomous predictions. |
| `test_ctrl.py` | NuPlan | User-provided (poses, yaws) | Variable (free) | Trajectory-controlled video generation; requires manual inputs in the script. |
| `test_traj.py` | NuPlan | Prediction only | N/A | Evaluates the model's trajectory prediction accuracy. |
| `test_nuscenes.py` | NuScenes | Fixed (from dataset) | Fixed | Evaluation on the nuScenes validation set with a fixed setup. |
| `test_demo.py` | Custom input | Self-predicted | Variable (free) | Run Epona on your own input data. |
For example, to test the model on NuPlan test set, you can run:
```bash
python3 scripts/test/test_nuplan.py \
    --exp_name "test-nuplan" \
    --start_id 0 --end_id 100 \
    --resume_path "pretrained/epona_nuplan.pkl" \
    --config configs/dit_config_dcae_nuplan.py
```
where:
- `exp_name` is the name of the experiment;
- `start_id` and `end_id` define the range of test samples;
- `resume_path` is the path to the pre-trained world model;
- `config` is the path to the config file.
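The other test scripts follow the same pattern. For instance, here is a sketch of running on your own data, assuming `test_demo.py` accepts the same common flags as above (check the script for its exact arguments):
```bash
python3 scripts/test/test_demo.py \
    --exp_name "demo-custom" \
    --resume_path "pretrained/epona_nuplan.pkl" \
    --config configs/dit_config_dcae_nuplan.py
```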
All the inference scripts can be run on a single NVIDIA RTX 4090 GPU.
We also provide a simple script, scripts/train_deepspeed.py, for training or finetuning the world model with DeepSpeed.
For example, to train the world model on the NuPlan dataset, you can run:
```bash
export NODES_NUM=4
export GPUS_NUM=8
torchrun --nnodes=$NODES_NUM --nproc_per_node=$GPUS_NUM \
    scripts/train_deepspeed.py \
    --batch_size 2 \
    --lr 2e-5 \
    --exp_name "train-nuplan" \
    --config configs/dit_config_dcae_nuplan.py \
    --resume_path "pretrained/epona_nuplan.pkl" \
    --eval_steps 2000
```
Set `resume_path` to resume training from a previous checkpoint. You can customize the configuration file in the configs folder (e.g., adjust the image resolution, the number of condition frames, the model size, etc.).
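Before launching a multi-node job, a single-GPU sanity run along these lines can be useful (same entry point and flags as above, just one process):
```bash
torchrun --nnodes=1 --nproc_per_node=1 \
    scripts/train_deepspeed.py \
    --batch_size 1 \
    --lr 2e-5 \
    --exp_name "debug-nuplan" \
    --config configs/dit_config_dcae_nuplan.py \
    --eval_steps 2000
```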
Additionally, you can finetune our base world model on your own dataset by implementing a custom dataset class under the dataset folder; a sketch follows below.
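Here is a minimal sketch of such a class, assuming the model consumes clips of frames paired with ego poses. The file layout, dictionary keys, and tensor shapes below are assumptions for illustration, so match them to the existing classes in the dataset folder:
```python
import os
from typing import Dict

import torch
from torch.utils.data import Dataset


class MyDrivingDataset(Dataset):
    """Hypothetical custom dataset: one preprocessed .pt file per clip."""

    def __init__(self, root: str, num_frames: int = 10):
        self.root = root
        self.num_frames = num_frames
        # One preprocessed clip per file, sorted for deterministic indexing.
        self.clips = sorted(f for f in os.listdir(root) if f.endswith(".pt"))

    def __len__(self) -> int:
        return len(self.clips)

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        clip = torch.load(os.path.join(self.root, self.clips[idx]))
        # Assumed keys: "frames" (T, C, H, W) and "poses" (T, pose_dim);
        # truncate each clip to the number of frames the model expects.
        return {
            "frames": clip["frames"][: self.num_frames],
            "poses": clip["poses"][: self.num_frames],
        }
```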
Our implementation is based on DrivingWorld, Flux and DCAE. Thanks for these great open-source works!
If any part of our paper or code is helpful to your research, please consider citing our work 📝 and give us a star ⭐. Thanks for your support!
```bibtex
@inproceedings{zhang2025epona,
  author    = {Zhang, Kaiwen and Tang, Zhenyu and Hu, Xiaotao and Pan, Xingang and Guo, Xiaoyang and Liu, Yuan and Huang, Jingwei and Yuan, Li and Zhang, Qian and Long, Xiaoxiao and Cao, Xun and Yin, Wei},
  title     = {Epona: Autoregressive Diffusion World Model for Autonomous Driving},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```