# VSA4VQA

![pipeline.png](pipeline.png)

Official code for [VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images](https://perceptualui.org/publications/penzkofer24_cogsci/), published at CogSci'24.

## Installation
```shell
# create environment
conda create -n ssp_env python=3.9 pip
conda activate ssp_env
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia -y

# system dependency: MySQL client development files (Debian/Ubuntu)
sudo apt install libmysqlclient-dev

# install requirements
pip install -r requirements.txt

# install CLIP
pip install git+https://github.com/openai/CLIP.git

# setup jupyter notebook kernel
python -m ipykernel install --user --name=ssp_env
```

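After the installation steps, it can help to confirm that the main packages resolve inside `ssp_env`. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the import names `torch`, `torchvision`, `clip`, and `numpy` are assumed to be what the installs above provide.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    required = ["torch", "torchvision", "clip", "numpy"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only looks the modules up and does not import them, so it does not confirm that CUDA is usable.
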
## Get GQA Programs
GQA programs are generated with code from [https://github.com/wenhuchen/Meta-Module-Network](https://github.com/wenhuchen/Meta-Module-Network) (MMN):

- Download the MMN GitHub repository
- Add a `gqa-questions` folder containing the GQA JSON files
- Run preprocessing: `python preprocess.py create_balanced_programs`
- Save the generated programs to the data folder:
```
testdev_balanced_inputs.json
trainval_balanced_inputs.json
testdev_balanced_programs.json
trainval_balanced_programs.json
```

> GQA dictionaries: `gqa_all_attributes.json` and `gqa_all_vocab_classes` are also adapted from [https://github.com/wenhuchen/Meta-Module-Network](https://github.com/wenhuchen/Meta-Module-Network)

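Before running later steps, it may be worth checking that all four generated files listed above are in place. The sketch below is illustrative only: `load_program_files` is a hypothetical helper, and the directory passed to it is whatever you use as the data folder.

```python
import json
from pathlib import Path

# File names taken from the list above.
PROGRAM_FILES = [
    "testdev_balanced_inputs.json",
    "trainval_balanced_inputs.json",
    "testdev_balanced_programs.json",
    "trainval_balanced_programs.json",
]


def load_program_files(data_dir):
    """Load each expected JSON file from `data_dir`, keyed by file name.

    Raises FileNotFoundError naming every file that is absent.
    """
    data_dir = Path(data_dir)
    missing = [name for name in PROGRAM_FILES if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {data_dir}: {', '.join(missing)}")
    loaded = {}
    for name in PROGRAM_FILES:
        with open(data_dir / name, encoding="utf-8") as handle:
            loaded[name] = json.load(handle)
    return loaded
```

For example, `load_program_files("data")` would apply if the data folder is named `data`, which is an assumption here.
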
## Generate Query Masks
> All 37 generated query masks are available in [relations.zip](relations.zip) as NumPy `.npy` files

```python
# load a query mask with NumPy
import numpy as np

rel = 'to_the_right_of'
path = f'relations/{rel}.npy'
mask = np.load(path)
mask = mask > 0.05  # threshold to a binary (boolean) mask
```

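The loading snippet above expects the masks under `relations/`, so the archive has to be extracted first. A minimal sketch for extracting it and listing the available relation names follows. `extract_relation_masks` is a hypothetical helper, and whether the archive already contains a top-level `relations/` folder is not stated here, so `target_dir` may need adjusting.

```python
import zipfile
from pathlib import Path


def extract_relation_masks(archive="relations.zip", target_dir="."):
    """Extract `archive` into `target_dir` and return the sorted relation names.

    A relation name is the file stem of each .npy entry in the archive.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        entries = zf.namelist()
    return sorted(Path(entry).stem for entry in entries if entry.endswith(".npy"))
```

Each returned name can be used as `rel` in the loading snippet.
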
To run the generation process yourself:

```shell
python generate_query_masks.py
```

This script:

- generates `full_relations_df.pkl` if not already present
- generates query masks for all relations with more than 1000 samples

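The sample threshold in the second point can be illustrated with a small sketch. It is not the code in `generate_query_masks.py`: the columns of `full_relations_df.pkl` are not documented here, so the example works on a plain list of relation names, and `frequent_relations` is a hypothetical helper.

```python
from collections import Counter


def frequent_relations(relation_names, min_samples=1000):
    """Return relations occurring more than `min_samples` times, sorted by name."""
    counts = Counter(relation_names)
    return sorted(name for name, count in counts.items() if count > min_samples)
```

With toy data, `frequent_relations(["to_the_right_of"] * 3 + ["behind"], min_samples=2)` returns `['to_the_right_of']`.
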
## Pipeline
Run the pipeline on all samples of a GQA split, either train_balanced (with `TEST=False`) or validation_balanced (with `TEST=True`):

```shell
python run_programs.py
```

To visualize samples with all pipeline steps, see [VSA4VQA_examples.ipynb](VSA4VQA_examples.ipynb).

## Citation
Please consider citing this paper if you use VSA4VQA or parts of this publication in your research:

```
@inproceedings{penzkofer24_cogsci,
  author = {Penzkofer, Anna and Shi, Lei and Bulling, Andreas},
  title = {VSA4VQA: Scaling A Vector Symbolic Architecture To Visual Question Answering on Natural Images},
  booktitle = {Proc. 46th Annual Meeting of the Cognitive Science Society (CogSci)},
  year = {2024},
  pages = {}
}
```