VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
Adnen Abdessaied, Lei Shi, Andreas Bulling
WACV'24, Hawaii, USA
[Paper]
Citation
If you find our code useful or use it in your own projects, please cite our paper:
@inproceedings{abdessaied_vdgr,
  author = {Abdessaied, Adnen and Shi, Lei and Bulling, Andreas},
  title = {{VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs}},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year = {2024},
}
Setup and Dependencies
We implemented our model using Python 3.7 and PyTorch 1.11.0 (CUDA 11.3, CuDNN 8.2.0). We recommend setting up a virtual environment using Anaconda.
- Install git lfs on your system
- Clone our repository to download the data, checkpoints, and code
git lfs install
git clone https://git.hcics.simtech.uni-stuttgart.de/public-projects/VDGR.git
- Create a conda environment and install dependencies
conda create -n vdgr python=3.7
conda activate vdgr
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
conda install pyg -c pyg  # 2.1.0
pip install pytorch-transformers
pip install pytorch_pretrained_bert
pip install pyhocon glog wandb lmdb
- If you wish to speed up training, we recommend installing apex
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1), which supports multiple `--config-settings` with the same key:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
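After installing the dependencies, a quick sanity check can save debugging time later. The snippet below is our own illustrative check (not part of the repository) and only touches the packages listed above:

```python
# sanity_check_env.py -- illustrative check of the environment (not part of the repository)
import torch
import torch_geometric

print("PyTorch:", torch.__version__)            # expected 1.11.0
print("CUDA available:", torch.cuda.is_available())
print("PyG:", torch_geometric.__version__)      # expected 2.1.0

try:
    import apex  # optional, recommended above for faster training
    print("apex found")
except ImportError:
    print("apex not installed (optional; recommended for faster training)")
```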
Download Data
- Download the extracted visual features of VisDial and set up all files we used in our work. We provide a shell script for convenience:
./setup_data.sh # Please make sure you have enough disk space
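The script downloads a large amount of data; if you want to check available disk space first, here is a minimal sketch of our own (not part of the repository):

```python
# check_disk.py -- illustrative free-space check before running setup_data.sh
import shutil

total, used, free = shutil.disk_usage(".")
print(f"Free space: {free / 2**30:.1f} GiB")  # make sure this comfortably exceeds the dataset size
```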
If everything was set up correctly, the data/ folder should look like this:
├── history_adj_matrices
│   ├── test
│   │   └── *.pkl
│   ├── train
│   │   └── *.pkl
│   └── val
│       └── *.pkl
├── question_adj_matrices
│   ├── test
│   │   └── *.pkl
│   ├── train
│   │   └── *.pkl
│   └── val
│       └── *.pkl
├── img_adj_matrices
│   └── *.pkl
├── parse_vocab.pkl
├── test_dense_mapping.json
├── tr_dense_mapping.json
├── val_dense_mapping.json
├── visdial_0.9_test.json
├── visdial_0.9_train.json
├── visdial_0.9_val.json
├── visdial_1.0_test.json
├── visdial_1.0_train_dense_annotations.json
├── visdial_1.0_train_dense.json
├── visdial_1.0_train.json
├── visdial_1.0_val_dense_annotations.json
├── visdial_1.0_val.json
├── visdialconv_dense_annotations.json
├── visdialconv.json
├── vispro_dense_annotations.json
└── vispro.json
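To verify that setup_data.sh produced this layout, a small check like the following can help. It is our own sketch, not shipped with the repository; the paths simply mirror the tree above:

```python
# verify_data.py -- illustrative check that data/ matches the expected layout
import os
import pickle

EXPECTED_DIRS = [
    "data/history_adj_matrices/train",
    "data/history_adj_matrices/val",
    "data/history_adj_matrices/test",
    "data/question_adj_matrices/train",
    "data/question_adj_matrices/val",
    "data/question_adj_matrices/test",
    "data/img_adj_matrices",
]
EXPECTED_FILES = [
    "data/parse_vocab.pkl",
    "data/visdial_1.0_train.json",
    "data/visdial_1.0_val.json",
    "data/visdial_1.0_test.json",
]

missing = [p for p in EXPECTED_DIRS + EXPECTED_FILES if not os.path.exists(p)]
print("Missing:", missing or "nothing, layout looks complete")

# Peek at one adjacency matrix (the pickle format is not documented here, so we only report its type)
sample_dir = "data/img_adj_matrices"
if os.path.isdir(sample_dir):
    for name in os.listdir(sample_dir)[:1]:
        with open(os.path.join(sample_dir, name), "rb") as f:
            obj = pickle.load(f)
        print(name, "->", type(obj))
```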
Pre-trained Checkpoints
For convenience, we provide checkpoints of our model after the warm-up training stage in ckpt/ for both VisDial v1.0 and VisDial v0.9. These checkpoints will be downloaded with the code if you use git lfs.
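If you want to confirm that git lfs fetched the actual checkpoint files rather than LFS pointer stubs, something like the snippet below works. The filename follows the naming used later in this README (with k = 2), and the contents of the checkpoint are whatever the training code saved, so treat this purely as an illustration:

```python
# inspect_ckpt.py -- illustrative look inside a downloaded warm-up checkpoint
import os
import torch

path = "ckpt/vdgr_visdial_v1.0_after_warmup_K2.ckpt"  # name inferred from the Phase 3 instructions with k=2
size_mib = os.path.getsize(path) / 2**20
print(f"Size on disk: {size_mib:.1f} MiB")  # a file of only a few hundred bytes is an un-fetched LFS pointer

ckpt = torch.load(path, map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # top-level keys depend on how the checkpoint was saved
```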
Training
We trained our model on 8 Nvidia Tesla V100-32GB GPUs. The default hyperparameters in config/vdgr.conf and config/bert_base_6layer_6conect.json need to be adjusted if your setup differs from ours.
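Since pyhocon is among the dependencies and the .conf files use HOCON syntax, you can also inspect the configuration programmatically before a run. The sketch below is ours; the keys it looks up (visdial_version, start_path, eval_dataset) are the ones referenced later in this README, and their exact location inside the file may differ:

```python
# peek_config.py -- illustrative read of config/vdgr.conf with pyhocon
from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file("config/vdgr.conf")

# Keys mentioned in this README; adjust the lookup path if they live inside a named section.
for key in ("visdial_version", "start_path", "eval_dataset"):
    print(key, "=", conf.get(key, "<not set at top level>"))
```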
Phase 1
Training
- In this phase, we train our model on VisDial v1.0 via
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P1 \
--mode train \
--tag K2_v1.0 \
--wandb_mode online \
--wandb_project your_wandb_project_name
⚠️ On a setup similar to ours, this takes roughly 20 hours to complete when using apex for training.
- To train on VisDial v0.9:
- Set visdial_version = 0.9 in config/vdgr.conf
- Set start_path = ckpt/vdgr_visdial_v0.9_after_warmup_K2.ckpt in config/vdgr.conf
- Run
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P1 \
--mode train \
--tag K2_v0.9 \
--wandb_mode online \
--wandb_project your_wandb_project_name
Inference
- For inference on VisDial v1.0 val, VisDialConv, or VisPro:
- Set eval_dataset = {visdial, visdial_conv, visdial_vispro} in logs/vdgr/P1_K2_v1.0/code/config/vdgr.conf
- Run
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P1 \
--mode eval \
--eval_dir logs/vdgr/P1_K2_v1.0 \
--wandb_mode offline
- For inference on VisDial v0.9:
- Set eval_dataset = visdial in logs/vdgr/P1_K2_v0.9/code/config/vdgr.conf
- Run
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P1 \
--mode eval \
--eval_dir logs/vdgr/P1_K2_v0.9 \
--wandb_mode offline
⚠️ This might take some time to finish as the testing data of VisDial v0.9 is large.
- For inference on the visdial_v1.0 test split:
- Run
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P1 \
--mode predict \
--eval_dir logs/vdgr/P1_K2_v1.0 \
--wandb_mode offline
- The output file will be saved in output/
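The predict mode writes a submission file for the VisDial test server into output/. We do not pin down its exact name here, so the quick check below (our own snippet, not part of the repository) simply inspects whatever JSON files appear in that folder:

```python
# check_predictions.py -- illustrative peek at the files written to output/
import glob
import json

for path in glob.glob("output/*.json"):
    with open(path) as f:
        preds = json.load(f)
    # The internal structure depends on main.py; we only report the size and top-level type here.
    print(path, type(preds), len(preds) if hasattr(preds, "__len__") else "")
```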
Phase 2
In this phase, we fine-tune on dense annotations to improve the NDCG score (only supported for VisDial v1.0).
- Run
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P2_CE \
--mode train \
--tag K2_v1.0_CE \
--wandb_mode online \
--wandb_project your_wandb_project_name
⚠️ This will take roughly 3-4 hours to complete using the same setup as before and DP for training.
- For inference on VisDial v1.0:
- Run:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P2_CE \
--mode predict \
--eval_dir logs/vdgr/P1_K2_v1.0_CE \
--wandb_mode offline
- The output file will be saved in output/
Phase 3
Training
In the final phase, we train an ensemble method comprising 8 models, using K = {1, 2, 3, 4} and dense_loss = {ce, listnet}.
For K = k:
- Set the value of num_v_gnn_layers, num_q_gnn_layers, num_h_gnn_layers to k
- Set start_path = ckpt/vdgr_visdial_v1.0_after_warmup_K[k].ckpt in config/vdgr.conf
(P1) - Phase 1 training:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P1 \
--mode train \
--tag K[k]_v1.0 \
--wandb_mode online \
--wandb_project your_wandb_project_name
- Set start_path = logs/vdgr/P1_K[k]_v1.0/epoch_best.ckpt in config/vdgr.conf
(P2) - Phase 2 training:
- Fine-tune with CE:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P2_CE \
--mode train \
--tag K[k]_v1.0_CE \
--wandb_mode online \
--wandb_project your_wandb_project_name
- Fine-tune with LISTNET:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P2_LISTNET \
--mode train \
--tag K[k]_v1.0_LISTNET \
--wandb_mode online \
--wandb_project your_wandb_project_name
Inference
- For inference on VisDial v1.0 test:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py \
--model vdgr/P2_[CE,LISTNET] \
--mode predict \
--eval_dir logs/vdgr/P2_K[1,2,3,4]_v1.0_[CE,LISTNET] \
--wandb_mode offline
- Finally, merge the outputs of all models:
python ensemble.py \
--exp test \
--mode predict
The output file will be saved in output/
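ensemble.py performs the merging for you. Purely as an illustration of the general idea, and not necessarily what ensemble.py implements, merging ranking predictions from several models can be done by averaging each candidate answer's score across models and re-ranking:

```python
# merge_sketch.py -- illustrative score-averaging ensemble (not the actual ensemble.py logic)
import numpy as np

# Hypothetical shapes: one score vector per model for the 100 candidate answers of a single round.
model_scores = [
    np.random.rand(100),  # e.g. model trained with K=1, CE
    np.random.rand(100),  # e.g. model trained with K=1, ListNet
    # ... one entry per ensemble member (8 in total in our setup)
]

mean_scores = np.mean(model_scores, axis=0)          # average candidate scores across models
ranks = np.argsort(np.argsort(-mean_scores)) + 1     # 1-based rank of every candidate (rank 1 = best)
print(ranks[:10])
```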
Results
VisDial v0.9
Model | MRR | R@1 | R@5 | R@10 | Mean |
---|---|---|---|---|---|
Prev. SOTA | 71.99 | 59.41 | 87.92 | 94.59 | 2.87 |
VD-GR | 74.50 | 62.10 | 90.49 | 96.37 | 2.45 |
VisDialConv
Model | NDCG | MRR | R@1 | R@5 | R@10 | Mean |
---|---|---|---|---|---|---|
Prev. SOTA | 61.72 | 61.79 | 48.95 | 77.50 | 86.71 | 4.72 |
VD-GR | 67.09 | 66.82 | 54.47 | 81.71 | 91.44 | 3.54 |
VisPro
Model | NDCG | MRR | R@1 | R@5 | R@10 | Mean |
---|---|---|---|---|---|---|
Prev. SOTA | 59.30 | 62.29 | 48.35 | 80.10 | 88.87 | 4.37 |
VD-GR | 60.35 | 69.89 | 57.21 | 85.97 | 92.68 | 3.15 |
VisDial v1.0 Val
Model | NDCG | MRR | R@1 | R@5 | R@10 | Mean |
---|---|---|---|---|---|---|
Prev. SOTA | 65.47 | 69.71 | 56.79 | 85.82 | 93.64 | 3.15 |
VD-GR | 64.32 | 69.91 | 57.01 | 86.14 | 93.74 | 3.13 |
VisDial v1.0 Test
Model | NDCG | MRR | R@1 | R@5 | R@10 | Mean |
---|---|---|---|---|---|---|
Prev. SOTA | 64.91 | 68.73 | 55.73 | 85.38 | 93.53 | 3.21 |
VD-GR | 63.49 | 68.65 | 55.33 | 85.58 | 93.85 | 3.20 |
♣️ Prev. SOTA | 75.92 | 56.18 | 45.32 | 68.05 | 80.98 | 5.42 |
♣️ VD-GR | 75.95 | 58.30 | 46.55 | 71.45 | 84.52 | 5.32 |
♣️♦️ Prev. SOTA | 76.17 | 56.42 | 44.75 | 70.23 | 84.52 | 5.47 |
♣️♦️ VD-GR | 76.43 | 56.35 | 45.18 | 68.13 | 82.18 | 5.79 |
♣️ = Fine-tuning on dense annotations, ♦️ = Ensemble model
Contributors
For any questions or enquiries, don't hesitate to contact the above contributor(s).