csiro-funml/PPAL
Run on HPC

Background

  • PPAL only supports distributed GPU training and does not run on CPU.
  • However, PPAL is based on an old version of MMDetection, which in turn depends on old versions of OpenMMLab libraries and PyTorch (1.12).
  • These dependencies are not compatible with the Nvidia H100 GPUs and the CUDA version available on HPC.
  • Luckily, we can build a compatible PyTorch 1.12 from source.
  • This Confluence page gives a step-by-step guide to building PyTorch 1.12 from source on HPC (a rough outline of the build is sketched after this list).
  • Furthermore, to run PPAL on HPC, we need to modify the local PyTorch and MMDetection code.
  • To avoid going through this lengthy and painful process, we provide a self-contained Conda environment on HPC, which is ready to run PPAL.
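  • For reference, here is a minimal sketch of what the from-source build might look like, assuming the CUDA/cuDNN modules used below and the upstream PyTorch repository; the authoritative steps are on the Confluence page.
        # Hypothetical outline of building PyTorch v1.12 from source for H100 GPUs
        module load cuda/11.8.0
        module load cudnn/8.7.0.84-cu11
        git clone --recursive --branch v1.12.1 https://github.com/pytorch/pytorch.git
        cd pytorch
        pip install -r requirements.txt
        # H100 GPUs need the sm_90 compute capability compiled in
        export TORCH_CUDA_ARCH_LIST="9.0"
        python setup.py install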

Setup HPC environment

  • We have a shared project folder /scratch3/projects/runway_safety where the Conda environment, PPAL code and datasets are stored.
  • You can set up the environment by
        # Load Miniconda
        module load miniconda3/
        # Load Cuda
        module load cuda/11.8.0
        module load cudnn/8.7.0.84-cu11
        # Activate Conda environment
        source /scratch3/projects/runway_safety/ppal_env/bin/activate
  • If you want to deactivate the environment, you can run
        source /scratch3/projects/runway_safety/ppal_env/bin/deactivate
  • If you want to make changes to the environment, please create a local clone of the environment and make changes to the local clone.
        cp -r /scratch3/projects/runway_safety/ppal_env /path/to/local/ppal_env
        source /path/to/local/ppal_env/bin/activate
  • If you want to make changes to the PPAL code, please commit your changes under /scratch3/projects/runway_safety/git/PPAL and create a pull request to the main branch (an example workflow is sketched below).
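  • For example, a typical change workflow might look like the following (the branch name is just a placeholder):
        cd /scratch3/projects/runway_safety/git/PPAL
        git checkout -b my-feature           # hypothetical branch name
        git add <changed-files>
        git commit -m "Describe your change"
        git push origin my-feature           # then open a pull request against main on GitHub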

Run PPAL on Slurm

  • PPAL is designed for distributed training on multiple GPUs, so we use Slurm to manage the training jobs.
  • We provide a Slurm script /scratch3/projects/runway_safety/slurm_train.sh to run PPAL on Slurm. Please create a local copy of the script and modify it to run your own experiments (a sketch of a typical script is shown after this list).
  • The script can be run as follows
    sbatch /scratch3/projects/runway_safety/slurm_train.sh
  • It takes ~6 hours to finish, and the results will be saved in /scratch3/projects/runway_safety/work_dirs.
  • For reasons we have not yet identified, PPAL only runs on 2 GPUs; using more GPUs causes a runtime error.
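  • For reference, here is a minimal sketch of what slurm_train.sh might contain, assuming the shared environment described above; the resource limits and the training command are placeholders to adapt to your experiment.
        #!/bin/bash
        #SBATCH --job-name=ppal_train
        #SBATCH --nodes=1
        #SBATCH --gres=gpu:2                 # PPAL currently only runs on 2 GPUs
        #SBATCH --cpus-per-task=8
        #SBATCH --time=08:00:00              # a full run takes roughly 6 hours
        #SBATCH --output=%x_%j.out

        module load miniconda3/
        module load cuda/11.8.0
        module load cudnn/8.7.0.84-cu11
        source /scratch3/projects/runway_safety/ppal_env/bin/activate

        cd /scratch3/projects/runway_safety/git/PPAL
        # Hypothetical training command; point --config at your own experiment
        python tools/run_al_coco.py --config al_configs/coco/ppal_retinanet_coco.py --model retinanet
  • You can monitor the job with squeue -u $USER; results appear under /scratch3/projects/runway_safety/work_dirs when it finishes.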

Below is the original README from PPAL


Plug and Play Active Learning for Object Detection

PyTorch implementation of our paper: Plug and Play Active Learning for Object Detection

Requirements

  • Our codebase is built on top of MMDetection, which can be installed following the official instructions.

Usage

Installation

python setup.py install
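
The command above assumes the MMDetection prerequisites are already in place. If you are not using a pre-built environment, a rough, unpinned sketch of installing them first might look like the following (exact package versions depend on the MMDetection release PPAL targets and are not pinned here):

# Hypothetical prerequisite install before building PPAL itself
pip install -U openmim
mim install mmcv-full
python setup.py install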

Setup dataset

  • Place your dataset in the following structure (only vital files are shown). It should be easy because it is the default MMDetection data placement.
PPAL
|
`-- data
    |
    |--coco
    |   |
    |   |--train2017
    |   |--val2017
    |   `--annotations
    |      |
    |      |--instances_train2017.json
    |      `--instances_val2017.json
    `-- VOCdevkit
        |
        |--VOC2007
        |  |
        |  |--ImageSets
        |  |--JPEGImages
        |  `--Annotations
        `--VOC2012
           |--ImageSets
           |--JPEGImages
           `--Annotations
  • For convenience, we use COCO-style annotations for Pascal VOC active learning. Please download trainval_0712.json.
  • Set up active learning datasets
zsh tools/al_data/data_setup.sh /path/to/trainval_0712.json
  • The above command will set up a new Pascal VOC data folder. It will also generate three different active learning initial annotations for both datasets, where the COCO initial sets contain 2% of the original annotated images and the Pascal VOC initial sets contain 5% of the original annotated images.
  • The resulting file structure is as follows
PPAL
|
`-- data
    |
    |--coco
    |   |
    |   |--train2017
    |   |--val2017
    |   `--annotations
    |      |
    |      |--instances_train2017.json
    |      `--instances_val2017.json
    |--VOCdevkit
    |   |
    |   |--VOC2007
    |   |  |
    |   |  |--ImageSets
    |   |  |--JPEGImages
    |   |  `--Annotations
    |   `--VOC2012
    |      |--ImageSets
    |      |--JPEGImages
    |      `--Annotations
    |--VOC0712
    |  |
    |  |--images
    |  |--annotations
    |     |
    |     `--trainval_0712.json
    `--active_learning
       |
       |--coco
       |  |
       |  |--coco_2365_labeled_1.json
       |  |--coco_2365_unlabeled_1.json
       |  |--coco_2365_labeled_2.json
       |  |--coco_2365_unlabeled_2.json
       |  |--coco_2365_labeled_3.json
       |  `--coco_2365_unlabeled_3.json
       `--voc
          |
          |--voc_827_labeled_1.json
          |--voc_827_unlabeled_1.json
          |--voc_827_labeled_2.json
          |--voc_827_unlabeled_2.json
          |--voc_827_labeled_3.json
          `--voc_827_unlabeled_3.json

Run active learning

  • You can run active learning using a single command with a config file. For example, you can run COCO and Pascal VOC RetinaNet experiments by
python tools/run_al_coco.py --config al_configs/coco/ppal_retinanet_coco.py --model retinanet
python tools/run_al_voc.py --config al_configs/voc/ppal_retinanet_voc.py --model retinanet
  • Please check the config file to set up the data paths and environment settings before running the experiments.
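  • As a quick sanity check before launching, you can confirm that the paths referenced by the config exist; the keys data_root and ann_file below are standard MMDetection config fields and may differ in your config.
grep -n "data_root\|ann_file" al_configs/coco/ppal_retinanet_coco.py
ls data/coco/annotations data/active_learning/coco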

Citation

@InProceedings{yang2024ppal,
    author    = {Yang, Chenhongyi and Huang, Lichao and Crowley, Elliot J.},
    title     = {{Plug and Play Active Learning for Object Detection}},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2024}
}
