release code base

2024-02-20 16:31:21 +01:00 · commit efbd43fed1
70 changed files with 4923 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,90 @@
+<div align="center">
+<h1> OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog  </h1>
+    
+**[Adnen Abdessaied][4], &nbsp; [Manuel von Hochmeister][5], &nbsp; [Andreas Bulling][6]** <br>  <br>
+**COLING 2024**, Turin, Italy <img src="misc/italy.png" width="3%" align="center"> <br>
+**[[Paper][7]]**
+----------------
+<img src="misc/teaser.png" width="40%" align="middle"><br><br>
+
+</div>
+
+# Table of Contents
+* [Setup and Dependencies](#Setup-and-Dependencies)
+* [Download Data](#Download-Data)
+* [Training](#Training)
+* [Testing](#Testing)
+* [Results](#Results)
+* [Acknowledgements](#Acknowledgements)
+
+# Setup and Dependencies
+We implemented our model using Python 3.7, PyTorch 1.11.0 (CUDA 11.3, cuDNN 8.3.2) and PyTorch Lightning. We recommend setting up a virtual environment using Anaconda. <br>
+1. Install [git lfs][1] on your system
+2. Clone our repository to download our code and a checkpoint of our best model
+   ```shell
+   git lfs install
+   git clone this_repo.git
+   ```
+3. Create a conda environment and install dependencies
+   ```shell
+   conda create -n olvit python=3.7
+   conda activate olvit
+   conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
+   pip install pytorch-lightning==1.6.3
+   pip install transformers==4.19.2
+   pip install torchtext==0.12.0
+   pip install wandb nltk pandas
+   ```
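Reviewer note: the setup steps above assume that `git`, `git-lfs` and `conda` are already on your `PATH`. The following is a minimal sketch for checking this before cloning. The helper name `check_tool` is hypothetical and not part of the repository, and the sketch assumes the Git LFS binary is installed under its usual name `git-lfs`.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of the OLViT repository.
# It reports whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Tools the setup steps rely on (assumption: Git LFS is installed as "git-lfs").
for tool in git git-lfs conda; do
    check_tool "$tool"
done
```

Any tool reported as missing should be installed before running the clone and `conda create` steps.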
+# Download Data
+1. [DVD][2] and [SIMMC 2.1][3] data are included in this repository and will be downloaded using git lfs  
+2. Set up the data by executing
+   ```shell
+   chmod u+x setup_data.sh
+   ./setup_data.sh
+   ```
+3. This unpacks all the necessary data into ```data/dvd/``` and ```data/simmc/```
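Reviewer note: if the repository was cloned before `git lfs install` was run, large files can be left as small text pointer files, and unpacking them is then likely to fail. The sketch below lists such pointer files by looking for the Git LFS pointer header. The function names `is_lfs_pointer` and `list_lfs_pointers` are hypothetical, and which files in this repository are tracked by LFS is not stated in the README, so the scan simply covers the whole working tree.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the OLViT repository.

# Succeeds if the file starts with the Git LFS pointer header.
is_lfs_pointer() {
    head -c 64 "$1" 2>/dev/null |
        LC_ALL=C grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Prints every file under the given directory that is still an LFS pointer.
list_lfs_pointers() {
    find "$1" -type f ! -path '*/.git/*' 2>/dev/null |
        while IFS= read -r f; do
            if is_lfs_pointer "$f"; then
                printf '%s\n' "$f"
            fi
        done
}

# Only scan when run from the root of a git checkout.
if [ -d .git ]; then
    list_lfs_pointers .
fi
```

If any files are listed, running `git lfs pull` inside the repository should replace the pointers with the actual content before `./setup_data.sh` is executed.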
+
+# Training
+We trained our model on 3 Nvidia Tesla V100-32GB GPUs. The default hyperparameters need to be adjusted if your setup differs from ours.
+## DVD
+1. Adjust the DVD config file ```config/dvd.json``` to match your hardware
+2. Execute
+   ```shell
+   CUDA_VISIBLE_DEVICES=0,1,2 python train.py --cfg_path config/dvd.json
+   ```
+3. Checkpoints will be saved in ```checkpoints/dvd/```
+
+## SIMMC 2.1
+1. Adjust the SIMMC 2.1 config file ```config/simmc.json``` to match your hardware
+2. Execute
+   ```shell
+   CUDA_VISIBLE_DEVICES=0,1,2 python train.py --cfg_path config/simmc.json
+   ```
+3. Checkpoints will be saved in ```checkpoints/simmc/```
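Reviewer note: the training commands above hard-code three GPUs. The sketch below builds the same command line for a different GPU count. The helpers `gpu_list` and `train_cmd` are hypothetical and not part of the repository. The command is only printed, not executed, because the config file still has to be adjusted by hand as described in step 1, and the README does not say which config fields need to change.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the OLViT repository.

# Build a comma-separated device list "0,1,...,N-1" for N GPUs.
gpu_list() {
    n="$1"
    list=""
    i=0
    while [ "$i" -lt "$n" ]; do
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Print (do not run) the README training command for a GPU count and a config.
train_cmd() {
    printf 'CUDA_VISIBLE_DEVICES=%s python train.py --cfg_path %s\n' \
        "$(gpu_list "$1")" "$2"
}

train_cmd 2 config/dvd.json
# prints: CUDA_VISIBLE_DEVICES=0,1 python train.py --cfg_path config/dvd.json
```

Printing the command first makes it easy to review the device list before starting a long training run.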
+
+# Testing
+1. Execute
+   ```shell
+   CUDA_VISIBLE_DEVICES=0 python test.py --ckpt_path <PATH_TO_TRAINED_MODEL> --cfg_path <PATH_TO_CONFIG_OF_TRAINED_MODEL>
+   ```
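Reviewer note: the test command needs a checkpoint path. As a minimal sketch, the most recently modified file in a checkpoint directory can be picked up automatically. This rests on two assumptions that the README does not confirm: that checkpoint files sit directly inside `checkpoints/dvd/`, and that the newest file is the one you want to evaluate. The helper `latest_file` is hypothetical.

```shell
#!/bin/sh
# latest_file is a hypothetical helper, not part of the OLViT repository.
# It prints the most recently modified entry of a directory, or fails if
# the directory is missing or empty.
latest_file() {
    dir="${1%/}"
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    if [ -n "$newest" ]; then
        printf '%s/%s\n' "$dir" "$newest"
    else
        return 1
    fi
}

# Assumption: checkpoints are stored directly in checkpoints/dvd/.
if ckpt=$(latest_file checkpoints/dvd); then
    echo "CUDA_VISIBLE_DEVICES=0 python test.py --ckpt_path $ckpt --cfg_path config/dvd.json"
else
    echo "no checkpoint found in checkpoints/dvd"
fi
```

The sketch only prints the resulting command, so the chosen checkpoint can be verified before the evaluation is started.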
+
+# Results
+Training with the default config on a hardware setup similar to ours yields the following performance:
+
+## DVD
+<img src="misc/results_dvd.png" width="100%" align="middle"><br><br>
+
+## SIMMC 2.1
+<img src="misc/results_simmc.png" width="50%" align="middle"><br><br>
+
+# Acknowledgements
+Our work relied on the codebases of [DVD][2] and [SIMMC][3]. Thanks to the authors for sharing their code.
+
+
+[1]: https://git-lfs.com/
+[2]: https://github.com/facebookresearch/DVDialogues/
+[3]: https://github.com/facebookresearch/simmc2/
+[4]: https://perceptualui.org/people/abdessaied/
+[5]: https://www.linkedin.com/in/manuel-von-hochmeister-285416202/
+[6]: https://www.perceptualui.org/people/bulling/
+[7]: https://drive.google.com/file/d/1sDFfGpQ9E9NahT5gw8UjknWt3sNdxM7p/view?usp=sharing