
# OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog

Adnen Abdessaied, Manuel von Hochmeister, Andreas Bulling

COLING 2024, Turin, Italy
[Paper]



## Citation

If you find our code useful or use it in your own projects, please cite our paper:

```bibtex
@InProceedings{abdessaied24_coling,
    author    = {Abdessaied, Adnen and Hochmeister, Manuel and Bulling, Andreas},
    title     = {{OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog}},
    booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
    month     = {May},
    year      = {2024},
}
```

## Table of Contents

- [Setup and Dependencies](#setup-and-dependencies)
- [Download Data](#download-data)
- [Training](#training)
- [Testing](#testing)
- [Results](#results)
- [Acknowledgements](#acknowledgements)

## Setup and Dependencies

We implemented our model using Python 3.7, PyTorch 1.11.0 (CUDA 11.3, CuDNN 8.3.2) and PyTorch Lightning. We recommend setting up a virtual environment using Anaconda.

1. Install git lfs on your system
2. Clone our repository to download a checkpoint of our best model and our code
   ```shell
   git lfs install
   git clone this_repo.git
   ```
3. Create a conda environment and install the dependencies (a quick sanity check follows this list)
   ```shell
   conda create -n olvit python=3.7
   conda activate olvit
   conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
   pip install pytorch-lightning==1.6.3
   pip install transformers==4.19.2
   pip install torchtext==0.12.0
   pip install wandb nltk pandas
   ```
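To verify the environment, an optional sanity check: the printed version strings should match the pins above, and CUDA should be visible to PyTorch on a GPU machine.

```shell
# Optional sanity check: print the pinned versions and confirm CUDA is visible to PyTorch.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # expect 1.11.0, True
python -c "import pytorch_lightning; print(pytorch_lightning.__version__)"      # expect 1.6.3
python -c "import transformers; print(transformers.__version__)"               # expect 4.19.2
```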
    

## Download Data

1. The DVD and SIMMC 2.1 data are included in this repository and will be downloaded via git lfs
2. Set up the data by executing
   ```shell
   chmod u+x setup_data.sh
   ./setup_data.sh
   ```
3. This will unpack all the necessary data into data/dvd/ and data/simmc/ (a quick check follows this list)
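To confirm the unpacking worked, listing both directories should show the extracted files (the exact file names depend on the shipped archives):

```shell
# Both dataset directories should be populated after running setup_data.sh.
ls data/dvd/ data/simmc/
```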

## Training

We trained our model on 3 Nvidia Tesla V100-32GB GPUs. The default hyperparameters need to be adjusted if your setup differs from ours.

### DVD

1. Adjust the config file for DVD in config/dvd.json according to your hardware specifications
2. Execute
   ```shell
   CUDA_VISIBLE_DEVICES=0,1,2 python train.py --cfg_path config/dvd.json
   ```
3. Checkpoints will be saved in checkpoints/dvd/ (a single-GPU example follows this list)
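If your setup differs from ours, the same entry point works with fewer visible devices; a single-GPU sketch follows, though you will likely have to lower the batch size and related hyperparameters in config/dvd.json:

```shell
# Example: train on a single GPU instead of three.
CUDA_VISIBLE_DEVICES=0 python train.py --cfg_path config/dvd.json
```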

### SIMMC 2.1

1. Adjust the config file for SIMMC 2.1 in config/simmc.json according to your hardware specifications
2. Execute
   ```shell
   CUDA_VISIBLE_DEVICES=0,1,2 python train.py --cfg_path config/simmc.json
   ```
3. Checkpoints will be saved in checkpoints/simmc/

## Testing

1. Execute
   ```shell
   CUDA_VISIBLE_DEVICES=0 python test.py --ckpt_path <PATH_TO_TRAINED_MODEL> --cfg_path <PATH_TO_CONFIG_OF_TRAINED_MODEL>
   ```
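As a concrete example, evaluating a DVD model trained with the default config might look as follows; the checkpoint file name is hypothetical, so substitute the actual file written to checkpoints/dvd/:

```shell
# Hypothetical checkpoint name; use the actual .ckpt file saved during training.
CUDA_VISIBLE_DEVICES=0 python test.py --ckpt_path checkpoints/dvd/best.ckpt --cfg_path config/dvd.json
```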

## Results

Training with the default config on a hardware setup similar to ours results in the following performance:

### DVD

*(results table)*

### SIMMC 2.1

*(results table)*

## Acknowledgements

Our work relied on the codebases of DVD and SIMMC. We thank the authors for sharing their code.