Pengfei Wei¹, Lingdong Kong¹˒², Xinghua Qu¹, Yi Ren¹, Zhiqiang Xu³, Jing Jiang⁴, Xiang Yin¹

¹ByteDance AI Lab, ²National University of Singapore, ³MBZUAI, ⁴University of Technology Sydney
  
TranSVAE is a disentanglement framework for unsupervised video domain adaptation (UDA). It aims to disentangle domain information from the data during adaptation. We model cross-domain videos as generated from two sets of latent factors: one encoding static, domain-related information and another encoding temporal, semantics-related information. Objectives are then imposed on these latent factors to achieve domain disentanglement and transfer.
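One way to write a generative model with these two factor sets uses the standard sequential-VAE factorization. This form is an assumption for illustration; the paper gives TranSVAE's exact priors and objectives. Here a static latent $\mathbf{z}_d$ is shared by all $T$ frames, and each frame $t$ has its own dynamic latent $\mathbf{z}_t$:

```math
p(\mathbf{x}_{1:T}, \mathbf{z}_d, \mathbf{z}_{1:T})
  = p(\mathbf{z}_d) \prod_{t=1}^{T} p(\mathbf{z}_t \mid \mathbf{z}_{<t}) \, p(\mathbf{x}_t \mid \mathbf{z}_d, \mathbf{z}_t)
```

Because $\mathbf{z}_d$ is constant across frames while $\mathbf{z}_{1:T}$ varies, the model is encouraged to place time-invariant, domain-specific appearance in $\mathbf{z}_d$ and the action dynamics in $\mathbf{z}_{1:T}$.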
   
  
Demo figure (not included). Col1: original sequences ("Human" …).
Visit our project page to explore more details.
- [2023.10] - We provide our extracted I3D features; kindly refer to this page for more details.
- [2023.09] - TranSVAE was accepted to NeurIPS 2023!
- [2022.08] - TranSVAE achieves 1st place on the UDA leaderboards of UCF-HMDB, Jester, and Epic-Kitchens on Papers with Code.
- [2022.08] - Try the Gradio demo for domain disentanglement in TranSVAE at Hugging Face Spaces!
- [2022.08] - Our paper is available on arXiv; click here to check it out!
- Highlights
- Installation
- Data Preparation
- Getting Started
- Main Results
- TODO List
- License
- Acknowledgement
- Citation
Highlight figures (images not included):
- Conceptual Comparison
- Graphical Model
- Framework Overview
Please refer to INSTALL.md for the installation details.
Please refer to DATA_PREPARE.md for details on preparing the UCF101, HMDB51, Jester, Epic-Kitchens, and Sprites datasets.
Please refer to GET_STARTED.md to learn more about using this codebase.
UCF-HMDB (U101: UCF101, H51: HMDB51):

| Method | Backbone | U101 → H51 | H51 → U101 | Average |
|---|---|---|---|---|
| DANN (JMLR'16) | ResNet-101 | 75.28 | 76.36 | 75.82 | 
| JAN (ICML'17) | ResNet-101 | 74.72 | 76.69 | 75.71 | 
| AdaBN (PR'18) | ResNet-101 | 72.22 | 77.41 | 74.82 | 
| MCD (CVPR'18) | ResNet-101 | 73.89 | 79.34 | 76.62 | 
| TA3N (ICCV'19) | ResNet-101 | 78.33 | 81.79 | 80.06 | 
| ABG (MM'20) | ResNet-101 | 79.17 | 85.11 | 82.14 | 
| TCoN (AAAI'20) | ResNet-101 | 87.22 | 89.14 | 88.18 | 
| MA2L-TD (WACV'22) | ResNet-101 | 85.00 | 86.59 | 85.80 | 
| Source-only | I3D | 80.27 | 88.79 | 84.53 | 
| DANN (JMLR'16) | I3D | 80.83 | 88.09 | 84.46 | 
| ADDA (CVPR'17) | I3D | 79.17 | 88.44 | 83.81 | 
| TA3N (ICCV'19) | I3D | 81.38 | 90.54 | 85.96 | 
| SAVA (ECCV'20) | I3D | 82.22 | 91.24 | 86.73 | 
| CoMix (NeurIPS'21) | I3D | 86.66 | 93.87 | 90.22 | 
| CO2A (WACV'22) | I3D | 87.78 | 95.79 | 91.79 | 
| TranSVAE (Ours) | I3D | 87.78 | 98.95 | 93.37 | 
| Oracle | I3D | 95.00 | 96.85 | 95.93 | 
Jester:

| Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
|---|---|---|---|---|---|---|---|
| JS → JT | 51.5 | 55.4 | 52.3 | 55.5 | 64.7 | 66.1 | 95.6 |
Epic-Kitchens:

| Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
|---|---|---|---|---|---|---|---|
| D1 → D2 | 32.8 | 37.7 | 35.4 | 34.2 | 42.9 | 50.5 | 64.0 |
| D1 → D3 | 34.1 | 36.6 | 34.9 | 37.4 | 40.9 | 50.3 | 63.7 |
| D2 → D1 | 35.4 | 38.3 | 36.3 | 40.9 | 38.6 | 50.3 | 57.0 |
| D2 → D3 | 39.1 | 41.9 | 40.8 | 42.8 | 45.2 | 58.6 | 63.7 |
| D3 → D1 | 34.6 | 38.8 | 36.1 | 39.9 | 42.3 | 48.0 | 57.0 |
| D3 → D2 | 35.8 | 42.1 | 41.4 | 44.2 | 49.2 | 58.0 | 64.0 |
| Average | 35.3 | 39.2 | 37.4 | 39.9 | 43.2 | 52.6 | 61.5 |
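The Average row of the six-task (D1 to D3) table above appears to be the unweighted mean of the per-task accuracies. The snippet below checks this for three of its columns, assuming that definition. The dictionary name `EPIC_KITCHENS` and the helper `average_row` are illustrative only and are not part of this codebase.

```python
from statistics import mean

# Per-task accuracies copied from the D1/D2/D3 table above, in row order:
# D1→D2, D1→D3, D2→D1, D2→D3, D3→D1, D3→D2.
EPIC_KITCHENS = {
    "Source-only": [32.8, 34.1, 35.4, 39.1, 34.6, 35.8],
    "CoMix": [42.9, 40.9, 38.6, 45.2, 42.3, 49.2],
    "TranSVAE (Ours)": [50.5, 50.3, 50.3, 58.6, 48.0, 58.0],
}


def average_row(scores):
    """Unweighted mean over the transfer tasks, rounded to one decimal (assumed definition)."""
    return round(mean(scores), 1)


averages = {name: average_row(scores) for name, scores in EPIC_KITCHENS.items()}
print(averages)  # {'Source-only': 35.3, 'CoMix': 43.2, 'TranSVAE (Ours)': 52.6}
```

An unweighted mean gives every transfer direction equal importance, regardless of how many test clips each target domain contains.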
Domain Transfer Example (figure not included)
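Under the two-factor model described at the top of this README, a domain-transfer example can be understood as follows. Keep one video's temporal factors, swap in the static factor inferred from a video of the other domain, and then decode. The sketch below illustrates only that swap. `VideoLatents`, `toy_decode`, `decode_video`, and `transfer` are hypothetical names, not this codebase's API, and the elementwise-sum decoder stands in for a learned network.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class VideoLatents:
    """Hypothetical container for one video's two latent factor sets."""
    z_d: Vector        # static, domain-related factor (shared by all frames)
    z_t: List[Vector]  # temporal, semantics-related factors (one per frame)


def toy_decode(z_d: Vector, z_frame: Vector) -> Vector:
    """Placeholder for a learned decoder p(x_t | z_d, z_t): elementwise sum."""
    return [a + b for a, b in zip(z_d, z_frame)]


def decode_video(latents: VideoLatents) -> List[Vector]:
    # Every frame reuses the same static factor together with its own dynamic factor.
    return [toy_decode(latents.z_d, z) for z in latents.z_t]


def transfer(content: VideoLatents, domain: VideoLatents) -> List[Vector]:
    """Keep the frame-wise dynamics of `content`; borrow the static factor of `domain`."""
    return decode_video(VideoLatents(z_d=domain.z_d, z_t=content.z_t))


source = VideoLatents(z_d=[1.0, 0.0], z_t=[[1.0, 2.0], [3.0, 4.0]])
target = VideoLatents(z_d=[0.0, 10.0], z_t=[[9.0, 9.0]])
print(transfer(source, target))  # [[1.0, 12.0], [3.0, 14.0]]
```

The transferred clip has as many frames as the content video, because only `z_d` is exchanged. The target's own dynamics are ignored.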
- Initial release.
- Add license. See here for more details.
- Add demo at Hugging Face Spaces.
- Add installation details.
- Add data preparation details.
- Add evaluation details.
- Add training details.

This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
We acknowledge the use of the following public resources during the course of this work: UCF101, HMDB51, Jester, Epic-Kitchens, Sprites, I3D, and TRN.
If you find this work helpful, please kindly consider citing our paper:
@inproceedings{wei2023transvae,
  title = {Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective},
  author = {Wei, Pengfei and Kong, Lingdong and Qu, Xinghua and Ren, Yi and Xu, Zhiqiang and Jiang, Jing and Yin, Xiang},
  booktitle = {Advances in Neural Information Processing Systems}, 
  year = {2023},
}