The raw data of MSVD-QA and MSRVTT-QA are located in ``data/MSVD-QA`` and ``data/MSRVTT-QA``, respectively.<br/>
**Videos:** The raw videos of MSVD-QA and MSRVTT-QA can be downloaded from [⬇](https://www.cs.utexas.edu/users/ml/clamp/videoDescription/) and [⬇](https://www.mediafire.com/folder/h14iarbs62e7p/shared), respectively.<br/>
**Text:** The text data can be downloaded from [⬇](https://github.com/xudejing/video-question-answering).<br/>
After downloading all the raw data, ``data/MSVD-QA`` and ``data/MSRVTT-QA`` should have the following structure:
<p align="center"><img src="assets/structure.png" alt="PHP Terminal style set text color"/></p>
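
To compare a local download against the expected layout, it can help to print the directory tree of each dataset folder. The following is a minimal sketch and not part of the repository: the helper name `list_tree` and the `max_depth` parameter are hypothetical, and only the two folder names come from the text above.

```python
from pathlib import Path


def list_tree(root, max_depth=2):
    """Return the entries under `root` as indented lines, down to `max_depth` levels."""
    root = Path(root)
    lines = [root.name + "/"]

    def walk(directory, depth):
        # Stop descending once the requested depth is exceeded.
        if depth > max_depth:
            return
        for entry in sorted(directory.iterdir(), key=lambda p: p.name):
            suffix = "/" if entry.is_dir() else ""
            lines.append("    " * depth + entry.name + suffix)
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root, 1)
    return lines


if __name__ == "__main__":
    # The two dataset folders named in this README.
    for dataset_dir in ("data/MSVD-QA", "data/MSRVTT-QA"):
        if Path(dataset_dir).is_dir():
            print("\n".join(list_tree(dataset_dir)))
        else:
            print(f"{dataset_dir} is missing")
```

Run from the repository root, this prints one indented listing per dataset folder, or a note if a folder has not been created yet.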
# Preprocessing
To sample the individual frames and clips and generate the corresponding visual features, we run the script ``preporocess.py`` on the raw videos with the appropriate flags. E.g., for MSVD-QA we have to execute
Our pre-trained models are available here [⬇](https://drive.google.com/drive/folders/172yj4iUkF1U1WOPdA5KuKOTQXkgzFEzS).
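
To illustrate what sampling individual frames and clips in the preprocessing step can look like, here is a minimal sketch of one common scheme: frames spread evenly over the video, and clips of consecutive frames centered on evenly spaced positions. This is an assumption for illustration only, not necessarily what the preprocessing script does; the function names and parameters are hypothetical.

```python
def sample_frame_indices(num_video_frames, num_samples):
    """Pick `num_samples` frame indices spread evenly over the video."""
    if num_video_frames <= 0 or num_samples <= 0:
        raise ValueError("num_video_frames and num_samples must be positive")
    if num_samples == 1:
        # A single sample is taken from the middle of the video.
        return [(num_video_frames - 1) // 2]
    step = (num_video_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def sample_clip_indices(num_video_frames, num_clips, clip_length):
    """Return `num_clips` lists of `clip_length` consecutive frame indices.

    Each clip is centered on an evenly spaced frame. Indices that would fall
    outside the video are clamped, so border clips repeat the first or last frame.
    """
    centers = sample_frame_indices(num_video_frames, num_clips)
    offsets = [i - clip_length // 2 for i in range(clip_length)]
    last = num_video_frames - 1
    return [[min(max(c + o, 0), last) for o in offsets] for c in centers]


if __name__ == "__main__":
    print(sample_frame_indices(100, 4))
    print(sample_clip_indices(100, 2, 4))
```

In a setup like this, the frame indices would feed an image feature extractor and the clip indices a video model such as C3D; clamping at the borders keeps every clip the same length.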
# Acknowledgements
We thank the Vision and Language Group@MIL for their open-source [MCAN](https://github.com/MILVLG/mcan-vqa) implementation, [DavideA](https://github.com/DavideA/c3d-pytorch/blob/master/C3D_model.py) for his pretrained C3D model, and [ixaxaar](https://github.com/ixaxaar/pytorch-dnc) for his DNC implementation.