This repository contains the code for ProgLoRA. We sincerely thank Chen et al. for their repository, on which our code builds.
- Install Package

```bash
conda create -n prog python=3.10 -y
conda activate prog
pip install --upgrade pip
pip install -e .
```

- Install additional packages for training cases

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
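As a quick sanity check (assuming the editable install exposes a `llava` package, as in the upstream LLaVA/CoIN codebases), you can confirm the environment is usable:

```bash
# Sanity check; `llava` as the package name is an assumption carried over from LLaVA/CoIN.
python -c "import llava, flash_attn; print('environment ok')"
```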
This repo is based on CoIN. If you run into a problem, you may find a solution in its issues.

Please download the images from the constituting datasets: ScienceQA, VQAv2, VizWiz, TextVQA, GQA, OCR-VQA, ImageNet, RefCOCO, RefCOCO+, and RefCOCOg.
| Image Source | Download Path |
|---|---|
| COCO | train2014, test2015, val2014 |
| RefCOCO | annotation |
| RefCOCO+ | annotation |
| RefCOCOg | annotation |
| ImageNet | images |
| OCR-VQA | images |
| GQA | images |
| TextVQA | train, test |
| ScienceQA | images |
| VizWiz | train, val, test |
After downloading all of them, organize the data as follows:
```
├── COCO2014
│   └── train2014
├── GQA
│   └── images
├── OCR-VQA
│   └── images
├── TextVQA
│   ├── train_images
│   └── test_images
```
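A minimal sketch for sanity-checking that the image folders above are in place (the paths are taken from the tree; extend the list for the remaining datasets):

```bash
# Run from the dataset root shown above and report any missing image directories.
for d in COCO2014/train2014 GQA/images OCR-VQA/images TextVQA/train_images TextVQA/test_images; do
  [ -d "$d" ] && echo "found:   $d" || echo "missing: $d"
done
```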
Then, please download the instructions from CoIN_Dataset and organize them as follows:
```
├── Instruction_Original
│   ├── GQA
│   │   ├── train.json
│   │   └── test.json
│   └── ScienceQA
│       ├── train.json
│       └── test.json
├── Instruction_Type2
│   └── GQA
│       ├── train.json
│       └── test.json
```
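To confirm an instruction file is in place and readable, a quick check (the path is taken from the tree above; no assumption is made about the JSON schema):

```bash
# Load one instruction file and report its top-level type and size.
python -c "import json; d = json.load(open('Instruction_Original/GQA/train.json')); print(type(d).__name__, len(d))"
```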
First, download the pretrained projectors from the LLaVA Model Zoo and set pretrain_mm_mlp_adapter to the downloaded projector path.
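As a sketch of what this looks like in practice (the local path and filename below are assumptions; adjust them to wherever you saved the projector):

```bash
# Assumed location of the projector weights downloaded from the LLaVA Model Zoo.
PROJECTOR=./checkpoints/llava-pretrain-projector/mm_projector.bin
# The training scripts pass this path via the standard LLaVA argument, e.g.:
#   --pretrain_mm_mlp_adapter $PROJECTOR
echo "pretrain_mm_mlp_adapter will be set to: $PROJECTOR"
```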
We provide the training scripts in scripts/LLaVA/Train_MOE_dynamic_share.
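For example (the script name below is hypothetical; list the directory to see the actual files):

```bash
# Hypothetical script name: check scripts/LLaVA/Train_MOE_dynamic_share for the real ones.
bash scripts/LLaVA/Train_MOE_dynamic_share/train_scienceqa.sh
```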
We have prepared the scripts for evaluating the trained model in scripts/LLaVA/Eval_dynamic_share.
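For example (again, the script name is hypothetical; list the directory to see the actual files):

```bash
# Hypothetical script name: check scripts/LLaVA/Eval_dynamic_share for the real ones.
bash scripts/LLaVA/Eval_dynamic_share/eval_scienceqa.sh
```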