Epona: Autoregressive Diffusion World Model for Autonomous Driving

ICCV 2025

Kaiwen Zhang*, Zhenyu Tang*, Xiaotao Hu, Xingang Pan,
Xiaoyang Guo, Yuan Liu, Jingwei Huang, Yuan Li, Qian Zhang,
Xiaoxiao Long, Xun Cao, Wei Yin§

*Equal Contribution ✝Project Adviser §Project Lead, Corresponding Author

arXiv · Huggingface

Versatile capabilities of Epona: given historical driving context, Epona generates consistent, minutes-long driving videos at high resolution (A), can be controlled by diverse trajectories (B), and understands real-world traffic knowledge (C). In addition, the world model predicts future trajectories and can serve as an end-to-end real-time motion planner (D).

🚀 Getting Started

Installation

conda create -n epona python=3.10
conda activate epona
pip install -r requirements.txt

To run the code with CUDA properly, you can comment out torch and torchvision in requirements.txt and instead install builds that match your CUDA toolkit (torch>=2.1.0+cu121 and torchvision>=0.16.0+cu121), following the instructions on the PyTorch website.
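For example, assuming a CUDA 12.1 toolkit, a typical command to pull matching wheels from the official PyTorch index is:

pip install "torch>=2.1.0" "torchvision>=0.16.0" --index-url https://download.pytorch.org/whl/cu121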

Data Preparation

Please refer to data preparation for more details to prepare and preprocess data.

After preprocessing, set datasets_paths in the config files (under the configs folder) to your own data path.
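For instance, in configs/dit_config_dcae_nuplan.py the entry might end up looking like the line below (the surrounding config structure is an assumption; keep whatever shape the file already uses, and check whether it expects a single path or a list):

datasets_paths = ["/data/nuplan_preprocessed"]  # placeholder path; replace with your own preprocessed data root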

Inference

First, download our pre-trained models (including the world models and the finetuned temporal-aware DCAE) from Huggingface.

In addition to our finetuned temporal-aware DCAE, you may also experiment with the original DCAEs provided by MIT Han Lab as the autoencoder: dc-ae-f32c32-mix-1.0 and dc-ae-f32c32-sana-1.1. After downloading, please change the vae_ckpt in the config files to your own autoencoder checkpoint path.
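If you prefer to fetch the checkpoints programmatically, huggingface_hub's snapshot_download works; the repo_id below is a guess based on this repository's name, so verify it against the Huggingface link above:

from huggingface_hub import snapshot_download

# Download all checkpoint files into ./pretrained.
# NOTE: the repo_id is an assumption -- check the actual model page before running.
snapshot_download(repo_id="Kevin-thu/Epona", local_dir="pretrained")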

Then, you can run the scripts in the scripts/test folder to test Epona in different settings:

| Script Name | Dataset | Trajectory Type | Video Length | Use Case Description |
| --- | --- | --- | --- | --- |
| test_nuplan.py | NuPlan | Fixed (from dataset) | Fixed | Evaluation on the NuPlan test set with a fixed setup. |
| test_free.py | NuPlan | Self-predicted | Variable (free) | Long-term video generation with autonomous predictions. |
| test_ctrl.py | NuPlan | User-provided (poses, yaws) | Variable (free) | Trajectory-controlled video generation; requires manual inputs in the script (see the sketch below the table). |
| test_traj.py | NuPlan | Prediction only | N/A | Evaluates the model's trajectory prediction accuracy. |
| test_nuscenes.py | NuScenes | Fixed (from dataset) | Fixed | Evaluation on the nuScenes validation set with a fixed setup. |
| test_demo.py | Custom input | Self-predicted | Variable (free) | Run Epona on your own input data. |
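For test_ctrl.py in particular, the trajectory has to be written into the script by hand. As a rough sketch of the kind of input involved (variable names and shapes here are illustrative, not the script's actual interface):

import numpy as np

# Hypothetical control trajectory: N future waypoints in the ego frame.
# poses: (N, 2) (x, y) offsets in meters; yaws: (N,) headings in radians.
poses = np.array([[0.0, 2.0], [0.0, 4.0], [0.5, 6.0], [1.5, 8.0]])
yaws = np.array([0.0, 0.0, 0.1, 0.2])  # a gentle right turn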

For example, to test the model on the NuPlan test set, you can run:

python3 scripts/test/test_nuplan.py \
  --exp_name "test-nuplan" \
  --start_id 0 --end_id 100 \
  --resume_path "pretrained/epona_nuplan.pkl" \
  --config configs/dit_config_dcae_nuplan.py

where:

  • exp_name is the name of the experiment;
  • start_id and end_id are the range of the test samples;
  • resume_path is the path to the pre-trained world model;
  • config is the path to the config file.

All of the inference scripts can be run on a single NVIDIA RTX 4090 GPU.

Training / Finetuning

We also provide a simple script, scripts/train_deepspeed.py, for training or finetuning the world model with DeepSpeed. For example, to train the world model on the NuPlan dataset, you can run:

export NODES_NUM=4
export GPUS_NUM=8
# Set --resume_path to resume training from a previous checkpoint.
torchrun --nnodes=$NODES_NUM --nproc_per_node=$GPUS_NUM \
  scripts/train_deepspeed.py \
  --batch_size 2 \
  --lr 2e-5 \
  --exp_name "train-nuplan" \
  --config configs/dit_config_dcae_nuplan.py \
  --resume_path "pretrained/epona_nuplan.pkl" \
  --eval_steps 2000

You can customize the configuration file in the configs folder (e.g., adjust the image resolution, number of condition frames, model size, etc.). Additionally, you can finetune our base world model on your own dataset by implementing a custom dataset class under the dataset folder; a minimal sketch follows.
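As a starting point, a custom dataset class could look roughly like this (the on-disk layout, dict keys, and class interface are assumptions; mirror the existing classes in the dataset folder):

import os
import numpy as np
import torch
from torch.utils.data import Dataset

class MyDrivingDataset(Dataset):
    """Illustrative sketch of a custom driving-clip dataset."""

    def __init__(self, data_root):
        self.data_root = data_root
        # Assumed layout (hypothetical): one subdirectory per clip,
        # each holding frames.npy (T, C, H, W) and poses.npy (T, 3).
        self.clips = sorted(os.listdir(data_root))

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        clip_dir = os.path.join(self.data_root, self.clips[idx])
        frames = torch.from_numpy(np.load(os.path.join(clip_dir, "frames.npy"))).float()
        poses = torch.from_numpy(np.load(os.path.join(clip_dir, "poses.npy"))).float()
        # Return whatever dict the training loop expects; these keys are assumptions.
        return {"frames": frames, "poses": poses}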

❤️ Acknowledgement

Our implementation is based on DrivingWorld, Flux, and DCAE. Thanks to these great open-source works!

📌 Citation

If any part of our paper or code is helpful to your research, please consider citing our work 📝 and giving us a star ⭐. Thanks for your support!

@inproceedings{zhang2025epona,
  author = {Zhang, Kaiwen and Tang, Zhenyu and Hu, Xiaotao and Pan, Xingang and Guo, Xiaoyang and Liu, Yuan and Huang, Jingwei and Yuan, Li and Zhang, Qian and Long, Xiao-Xiao and Cao, Xun and Yin, Wei},
  title = {Epona: Autoregressive Diffusion World Model for Autonomous Driving},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year = {2025}
}
