


ActionDiffusion: An Action-aware Diffusion Model for Procedure Planning in Instructional Videos

Lei Shi¹, Paul Bürkner², Andreas Bulling¹

¹University of Stuttgart
²TU Dortmund University

IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Paper link: https://arxiv.org/abs/2403.08591

Dataset

Download the pre-extracted features for each dataset.

CrossTask

cd dataset/crosstask
wget https://www.di.ens.fr/~dzhukov/crosstask/crosstask_release.zip
wget https://vision.eecs.yorku.ca/WebShare/CrossTask_s3d.zip
unzip '*.zip'

COIN

cd dataset/coin
wget https://vision.eecs.yorku.ca/WebShare/COIN_s3d.zip
unzip COIN_s3d.zip

NIV

cd dataset/NIV
wget https://vision.eecs.yorku.ca/WebShare/NIV_s3d.zip
unzip NIV_s3d.zip
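The download steps above can also be scripted, for example on machines without wget. This is a minimal stdlib sketch, not part of the repository: it reuses the exact URLs and target folders from the commands above and assumes it is run from the repository root.

```python
"""Fetch and unpack the pre-extracted feature archives (a stdlib stand-in for wget + unzip)."""
import urllib.request
import zipfile
from pathlib import Path

# URLs copied from the commands above; folders mirror the `cd dataset/...` steps.
ARCHIVES = {
    "dataset/crosstask": [
        "https://www.di.ens.fr/~dzhukov/crosstask/crosstask_release.zip",
        "https://vision.eecs.yorku.ca/WebShare/CrossTask_s3d.zip",
    ],
    "dataset/coin": ["https://vision.eecs.yorku.ca/WebShare/COIN_s3d.zip"],
    "dataset/NIV": ["https://vision.eecs.yorku.ca/WebShare/NIV_s3d.zip"],
}


def extract(archive: Path, target: Path) -> list:
    """Unzip `archive` into `target` and return the archive's member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()


def fetch_all(root: Path = Path(".")) -> None:
    """Download every archive into its dataset folder, then extract it there."""
    for folder, urls in ARCHIVES.items():
        target = root / folder
        target.mkdir(parents=True, exist_ok=True)
        for url in urls:
            archive = target / url.rsplit("/", 1)[-1]
            if not archive.exists():  # skip archives that were already downloaded
                urllib.request.urlretrieve(url, archive)
            extract(archive, target)
```

Calling `fetch_all()` once reproduces the three shell snippets; archives that are already present are not downloaded again but are re-extracted.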

Train

Task Prediction

Set the arguments in train_mlp.sh and train a task prediction model for each dataset. Set --class_dim, --action_dim, and --observation_dim according to the dataset. For each horizon T ∈ {3, 4, 5, 6}, set --horizon, --json_path_val, and --json_path_train accordingly.

sh train_mlp.sh

Then set the path to the trained task prediction checkpoint in temp.py via --checkpoint_mlp.
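The task prediction arguments split into dataset-specific ones (--class_dim, --action_dim, --observation_dim) and horizon-specific ones (--horizon, --json_path_train, --json_path_val). The sketch below, which is not part of the repository, builds one argument list per horizon to make that split explicit. The JSON file naming scheme is hypothetical and the dimension values must be taken from the dataset; check train_mlp.sh for the real paths.

```python
"""Enumerate task-prediction runs: one per horizon T in {3, 4, 5, 6} for a given dataset."""
from dataclasses import dataclass

HORIZONS = (3, 4, 5, 6)


@dataclass(frozen=True)
class DatasetDims:
    class_dim: int        # number of task classes in the dataset
    action_dim: int       # number of action (step) classes
    observation_dim: int  # dimensionality of the pre-extracted visual features


def mlp_args(name: str, dims: DatasetDims, horizon: int) -> list:
    """Return the train_mlp arguments for one dataset and horizon."""
    if horizon not in HORIZONS:
        raise ValueError(f"horizon must be one of {HORIZONS}, got {horizon}")
    # Hypothetical naming scheme for the per-horizon split files.
    train_json = f"dataset/{name}/train_T{horizon}.json"
    val_json = f"dataset/{name}/val_T{horizon}.json"
    return [
        "--class_dim", str(dims.class_dim),
        "--action_dim", str(dims.action_dim),
        "--observation_dim", str(dims.observation_dim),
        "--horizon", str(horizon),
        "--json_path_train", train_json,
        "--json_path_val", val_json,
    ]


def all_runs(name: str, dims: DatasetDims) -> list:
    """One argument list per horizon; only the last three arguments change."""
    return [mlp_args(name, dims, t) for t in HORIZONS]
```

Only the dimension triple changes between datasets, so each dataset needs one `DatasetDims` filled from its task, action, and feature sizes; the resulting `--checkpoint_mlp` path then differs per dataset and horizon.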

Diffusion Model

In train.sh, set dataset and horizon to the dataset and time horizon to train on. Set mask_type to multi_add to use the multiple-add noise mask, or to single_add to use the single-add noise mask. Set attn to WithAttention to use the UNet with attention, or to NoAttention to use the UNet without attention.

To train the model, run

sh train.sh

To train the model without the mask, run

sh train_no_mask.sh
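The mask and attention options in train.sh give four model variants per dataset and horizon, which is convenient to enumerate when planning an ablation. This hypothetical helper only validates and lists settings: the allowed mask_type and attn values come from this README, while the dataset identifiers and the assumption that the diffusion model uses the same horizons {3, 4, 5, 6} as task prediction are not confirmed by it. The chosen values still have to be copied into train.sh by hand.

```python
"""Validate and enumerate the train.sh settings (dataset, horizon, mask_type, attn)."""
from dataclasses import dataclass
from itertools import product

DATASETS = ("crosstask", "coin", "NIV")        # hypothetical identifiers; use what train.sh expects
HORIZONS = (3, 4, 5, 6)                        # assumed to match the task prediction horizons
MASK_TYPES = ("multi_add", "single_add")       # multiple-add vs single-add noise mask
ATTN_TYPES = ("WithAttention", "NoAttention")  # UNet with vs without attention


@dataclass(frozen=True)
class DiffusionConfig:
    dataset: str
    horizon: int
    mask_type: str
    attn: str

    def __post_init__(self) -> None:
        # Reject any value outside the documented options.
        for value, allowed, name in (
            (self.dataset, DATASETS, "dataset"),
            (self.horizon, HORIZONS, "horizon"),
            (self.mask_type, MASK_TYPES, "mask_type"),
            (self.attn, ATTN_TYPES, "attn"),
        ):
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {allowed}")


def ablation_grid(dataset: str, horizon: int) -> list:
    """All four mask/attention combinations for one dataset and horizon."""
    return [DiffusionConfig(dataset, horizon, m, a) for m, a in product(MASK_TYPES, ATTN_TYPES)]
```

For example, `ablation_grid("coin", 4)` lists the four train.sh configurations for COIN at T = 4; the no-mask variant is trained separately with train_no_mask.sh and is therefore not part of this grid.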

Inference

In inference.sh, set dataset and horizon to the dataset and time horizon for inference, and set checkpoint_diff to the path of the pre-trained model. Set mask_type to multi_add to use the multiple-add noise mask, or to single_add to use the single-add noise mask. Set attn to WithAttention to use the UNet with attention, or to NoAttention to use the UNet without attention.

To perform inference, run

sh inference.sh

To perform inference without the action mask, run

sh inference_no_mask.sh

To perform inference using the noise distribution with action embeddings, run

sh inference_dist.sh
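Before launching one of the three inference scripts, it can help to check that checkpoint_diff exists and that the inference settings match those the checkpoint was trained with. The sketch below is a hypothetical wrapper, not part of the repository: the mode names and the `preflight` function are illustrative, and the matching requirement is an assumption (a checkpoint trained with WithAttention, for instance, presumably cannot be loaded into the NoAttention UNet). The scripts themselves are still configured by editing their variables.

```python
"""Hypothetical pre-flight check and launcher for the three inference scripts."""
import subprocess
from pathlib import Path

# Script for each inference mode listed in this README; mode names are illustrative.
INFERENCE_SCRIPTS = {
    "mask": "inference.sh",             # inference with the action mask
    "no_mask": "inference_no_mask.sh",  # inference without the action mask
    "dist": "inference_dist.sh",        # noise distribution with action embeddings
}


def inference_command(mode: str) -> list:
    """Map a mode name to the `sh <script>` command line."""
    try:
        script = INFERENCE_SCRIPTS[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(INFERENCE_SCRIPTS)}") from None
    return ["sh", script]


def preflight(checkpoint_diff: str, trained: dict, inference: dict) -> None:
    """Fail early if the checkpoint is missing or the settings differ from training."""
    if not Path(checkpoint_diff).is_file():
        raise FileNotFoundError(f"checkpoint_diff not found: {checkpoint_diff}")
    # Assumption: inference should reuse the settings the checkpoint was trained with.
    for key in ("dataset", "horizon", "mask_type", "attn"):
        if trained.get(key) != inference.get(key):
            raise ValueError(
                f"{key} differs: trained={trained.get(key)!r}, inference={inference.get(key)!r}"
            )


def run_inference(mode: str) -> int:
    """Run the chosen script from the current directory and return its exit code."""
    return subprocess.run(inference_command(mode), check=False).returncode
```

For example, calling `preflight(...)` with the training and inference settings and then `run_inference("dist")` runs `sh inference_dist.sh` from the current working directory, which is assumed to be the repository root.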

Acknowledgement

This repository builds on PDPP: https://github.com/MCG-NJU/PDPP/tree/main/