- 2025/09/26: 🔥🔥🔥 We release our VideoChat-R1.5 model on Hugging Face, along with the paper and evaluation code.
- 2025/09/22: 🎉🎉🎉 Our VideoChat-R1.5 is accepted by NeurIPS 2025.
- 2025/04/22: 🔥🔥🔥 We release our VideoChat-R1-caption on Hugging Face.
- 2025/04/14: 🔥🔥🔥 We release our VideoChat-R1 and VideoChat-R1-thinking on Hugging Face.
- 2025/04/10: 🔥🔥🔥 We release our VideoChat-R1 paper and code.
Across short- and long-form video understanding, temporal grounding, video reasoning, and spatio-temporal perception, the model delivers consistently stronger results.
We adopt multi-task joint RL to strengthen the model’s spatio-temporal perception and video reasoning capabilities.
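As a rough illustration of what a verifiable reward for the temporal grounding task could look like in such an RL setup, here is a minimal sketch combining a format bonus with a temporal IoU term. The `<answer>start-end</answer>` output format and the reward weighting are assumptions for illustration, not the exact recipe used in our training code (see `training_scripts` for that):

```python
import re

def temporal_iou(pred, gt):
    """IoU between two time intervals (start, end), in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def grounding_reward(completion, gt_span):
    """Hypothetical per-sample reward: format bonus + temporal IoU.
    Assumes the model is prompted to answer as '<answer>start-end</answer>'."""
    m = re.search(r"<answer>\s*([\d.]+)\s*-\s*([\d.]+)\s*</answer>", completion)
    if m is None:
        return 0.0  # malformed output: no format bonus, no IoU term
    pred = (float(m.group(1)), float(m.group(2)))
    return 1.0 + temporal_iou(pred, gt_span)
```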
During inference, we use a Region-of-Interest strategy that lets the model progressively narrow down the video interval it should attend to. With this multi-step perception, performance improves as the number of perception steps increases.
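The sketch below shows the general shape of this iterative perception loop, assuming the model is asked for a `start-end` time span at each step; it is a simplified illustration rather than the exact VideoChat-R1.5 inference code, and `ask_model` is a user-supplied wrapper around the actual model call:

```python
import re
from typing import Callable, List, Tuple

def parse_interval(reply: str, default: Tuple[float, float]) -> Tuple[float, float]:
    """Extract a 'start-end' interval (seconds) from the model's reply;
    keep the previous interval if parsing fails. The textual format is an assumption."""
    m = re.search(r"([\d.]+)\s*-\s*([\d.]+)", reply)
    if not m:
        return default
    s, e = float(m.group(1)), float(m.group(2))
    return (s, e) if e > s else default

def multi_step_roi(frames: List, fps: float, question: str,
                   ask_model: Callable[[List, str], str], num_steps: int = 3) -> str:
    """Iteratively narrow the frame window to the model's region of interest,
    then answer the question from the refined window."""
    start, end = 0.0, len(frames) / fps  # current interval in seconds
    for _ in range(num_steps):
        window = frames[int(start * fps): int(end * fps)]
        reply = ask_model(window, f"Which time span (start-end, in seconds) of this clip "
                                  f"is most relevant to: {question}")
        start, end = parse_interval(reply, default=(start, end))
    return ask_model(frames[int(start * fps): int(end * fps)], question)
```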
Refer to the hf README to run inference with our model.
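Since VideoChat-R1 builds on Qwen2.5-VL, inference can follow the standard Qwen2.5-VL recipe in `transformers`. The sketch below is a minimal example; the checkpoint name, video path, and prompt are placeholders, and the hf README is authoritative:

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "OpenGVLab/VideoChat-R1_7B"  # assumed checkpoint name; check the hf README
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "file:///path/to/video.mp4", "max_pixels": 360 * 420},
        {"type": "text", "text": "When does the person open the door? Answer with a time span."},
    ],
}]

# Build the chat prompt and pack the video frames into model inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```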
See eval_scripts and lmms-eval_videochat.
See training_scripts.
If you find this project useful in your research, please consider citing:
@article{li2025videochatr1,
  title={VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning},
  author={Li, Xinhao and Yan, Ziang and Meng, Desen and Dong, Lu and Zeng, Xiangyu and He, Yinan and Wang, Yali and Qiao, Yu and Wang, Yi and Wang, Limin},
  journal={arXiv preprint arXiv:2504.06958},
  year={2025}
}

@article{yan2025videochatr15,
  title={VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception},
  author={Yan, Ziang and Li, Xinhao and He, Yinan and Yue, Zhengrong and Zeng, Xiangyu and Wang, Yali and Qiao, Yu and Wang, Limin and Wang, Yi},
  journal={arXiv preprint arXiv:2509.21100},
  year={2025}
}