
# VSA4VQA

![VSA4VQA pipeline overview](pipeline.png)

Official code for *VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images*, published at CogSci'24.

## Installation

```shell
# create environment
conda create -n ssp_env python=3.9 pip
conda activate ssp_env
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia -y

# system dependency required by the mysqlclient package
sudo apt install libmysqlclient-dev

# install requirements
pip install -r requirements.txt

# install CLIP
pip install git+https://github.com/openai/CLIP.git

# set up the Jupyter notebook kernel
python -m ipykernel install --user --name=ssp_env
```

## Get GQA Programs

Uses code from the Meta-Module-Network (MMN) repository: https://github.com/wenhuchen/Meta-Module-Network

- Download the MMN GitHub repository
- Add a `gqa-questions` folder containing the GQA JSON files
- Run the preprocessing step:
  `python preprocess.py create_balanced_programs`
- Save the generated programs to the `data` folder:
  - `testdev_balanced_inputs.json`
  - `trainval_balanced_inputs.json`
  - `testdev_balanced_programs.json`
  - `trainval_balanced_programs.json`

The GQA dictionaries `gqa_all_attributes.json` and `gqa_all_vocab_classes.json` are also adapted from https://github.com/wenhuchen/Meta-Module-Network.
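Once the four program files are in the `data` folder, a split can be loaded with standard `json` calls. A minimal sketch — the `load_split` helper and the per-entry structures used in the demo are assumptions for illustration, not the files' actual schema:

```python
import json
import os
import tempfile

# Hypothetical helper for loading one split of the MMN-generated files.
# File names follow the list above; the entry structure is an assumption.
def load_split(data_dir, split='testdev'):
    with open(os.path.join(data_dir, f'{split}_balanced_inputs.json')) as f:
        inputs = json.load(f)
    with open(os.path.join(data_dir, f'{split}_balanced_programs.json')) as f:
        programs = json.load(f)
    return inputs, programs

# Demo with stand-in files so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, 'testdev_balanced_inputs.json'), 'w') as f:
        json.dump([{'questionId': '0'}], f)  # placeholder entry
    with open(os.path.join(data_dir, 'testdev_balanced_programs.json'), 'w') as f:
        json.dump([['select', 'dog']], f)  # placeholder program
    inputs, programs = load_split(data_dir)
```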

## Generate Query Masks

![Query mask examples](query_masks.png)

All 37 generated query masks are available in `relations.zip` as NumPy files.

```python
# loading a query mask with numpy
import numpy as np

rel = 'to_the_right_of'
path = f'relations/{rel}.npy'
mask = np.load(path)
mask = mask > 0.05  # threshold to obtain a binary mask
```
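To work with several relations at once, the unzipped masks can be collected into a dictionary keyed by relation name. A minimal sketch, assuming `relations.zip` has been extracted to a `relations/` folder (`load_all_masks` is a hypothetical helper, not part of this repository):

```python
import glob
import os
import tempfile

import numpy as np

# Hypothetical helper: load every query mask from an unzipped relations/
# folder, applying the same 0.05 cut-off as above.
def load_all_masks(folder='relations', threshold=0.05):
    masks = {}
    for path in glob.glob(os.path.join(folder, '*.npy')):
        rel = os.path.splitext(os.path.basename(path))[0]
        masks[rel] = np.load(path) > threshold  # binary mask
    return masks

# Demo with one synthetic mask so the sketch runs without relations.zip.
with tempfile.TemporaryDirectory() as folder:
    np.save(os.path.join(folder, 'to_the_right_of.npy'), np.random.rand(100, 100))
    masks = load_all_masks(folder)
```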

To run the generation process yourself:

```shell
python generate_query_masks.py
```

This script:
- generates `full_relations_df.pkl` if it is not already present
- generates query masks for all relations with more than 1,000 samples

## Pipeline

Execute the pipeline for all samples in GQA: `train_balanced` (with `TEST=False`) or `validation_balanced` (with `TEST=True`):

```shell
python run_programs.py
```

For a visualization of individual samples with all pipeline steps, see `VSA4VQA_examples.ipynb`.

## Citation

Please consider citing the paper if you use VSA4VQA or parts of this work in your research:

```bibtex
@inproceedings{penzkofer24_cogsci,
  author = {Penzkofer, Anna and Shi, Lei and Bulling, Andreas},
  title = {VSA4VQA: Scaling A Vector Symbolic Architecture To Visual Question Answering on Natural Images},
  booktitle = {Proc. 46th Annual Meeting of the Cognitive Science Society (CogSci)},
  year = {2024},
  pages = {}
}
```