VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
**[Adnen Abdessaied][5], [Lei Shi][6], [Andreas Bulling][7]**
**WACV'24, Hawaii, USA**
**[[Paper][8]]**
-------------------
# Citation
If you find our code useful or use it in your own projects, please cite our paper:
```bibtex
@InProceedings{Abdessaied_2024_WACV,
author = {Abdessaied, Adnen and Shi, Lei and Bulling, Andreas},
title = {VD-GR: Boosting Visual Dialog With Cascaded Spatial-Temporal Multi-Modal Graphs},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2024},
pages = {5805-5814}
}
```
# Table of Contents
* [Setup and Dependencies](#Setup-and-Dependencies)
* [Download Data](#Download-Data)
* [Pre-trained Checkpoints](#Pre-trained-Checkpoints)
* [Training](#Training)
* [Results](#Results)
# Setup and Dependencies
We implemented our model using Python 3.7 and PyTorch 1.11.0 (CUDA 11.3, cuDNN 8.2.0). We recommend setting up a virtual environment using Anaconda.
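
Once the environment is active, a quick sanity check can confirm that the interpreter and PyTorch build match the versions listed above. The following is a minimal sketch and not part of the repository: the helper names `find_mismatches` and `collect_versions` are hypothetical, and the prefix comparison assumes that local build suffixes such as `+cu113` should be tolerated.

```python
import sys

# Versions stated in this README; adjust if your setup differs.
EXPECTED = {"python": "3.7", "torch": "1.11.0", "cuda": "11.3"}


def find_mismatches(found, expected=EXPECTED):
    """Return {name: (expected, found)} for each version that is missing
    or does not start with the expected prefix (hypothetical helper)."""
    mismatches = {}
    for name, want in expected.items():
        got = found.get(name)
        if got is None or not str(got).startswith(want):
            mismatches[name] = (want, got)
    return mismatches


def collect_versions():
    """Gather the running Python version and, if installed, the PyTorch
    version and the CUDA version it was built against."""
    found = {"python": "{}.{}".format(*sys.version_info[:2])}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; it will be reported as missing.
        return found
    found["torch"] = torch.__version__
    found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found


if __name__ == "__main__":
    for name, (want, got) in find_mismatches(collect_versions()).items():
        print(f"{name}: expected {want}, found {got}")
```

Running the script prints one line per mismatch and nothing if the environment matches. A different version does not necessarily break the code; it only means that the setup differs from the one we used.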