ActionDiffusion: An Action-aware Diffusion Model for Procedure Planning in Instructional Videos

Lei Shi¹, Paul Bürkner², Andreas Bulling¹

  1. University of Stuttgart
  2. TU Dortmund University

IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Paper link: https://arxiv.org/abs/2403.08591

Dataset

Download pre-extracted features.

CrossTask

cd dataset/crosstask
wget https://www.di.ens.fr/~dzhukov/crosstask/crosstask_release.zip
wget https://vision.eecs.yorku.ca/WebShare/CrossTask_s3d.zip
unzip '*.zip'

COIN

cd dataset/coin
wget https://vision.eecs.yorku.ca/WebShare/COIN_s3d.zip
unzip COIN_s3d.zip

NIV

cd dataset/NIV
wget https://vision.eecs.yorku.ca/WebShare/NIV_s3d.zip
unzip NIV_s3d.zip
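
After downloading, it can help to confirm that each archive was actually extracted before starting a training run. The sketch below is only a coarse check: the helper name dataset_status is hypothetical, and it assumes that anything in a dataset directory other than a .zip file counts as extracted content, so it cannot tell repository files apart from feature files.

```python
from pathlib import Path
from typing import Dict

# Directory names taken from the download commands above.
DATASET_DIRS = ("crosstask", "coin", "NIV")


def dataset_status(root: str = "dataset") -> Dict[str, bool]:
    """Map each dataset directory to True if it holds anything besides .zip archives."""
    status = {}
    for name in DATASET_DIRS:
        folder = Path(root) / name
        # Assumption: any non-.zip entry means the archive was unpacked.
        status[name] = folder.is_dir() and any(
            entry.suffix != ".zip" for entry in folder.iterdir()
        )
    return status
```

Calling dataset_status() from the repository root returns one boolean per dataset; a False entry means the directory is missing or contains only the downloaded archives.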

Train

Task Prediction

Train a separate task prediction model for each dataset. Set the arguments in train_mlp.sh: choose --class_dim, --action_dim, and --observation_dim to match the dataset, and for each horizon T in {3, 4, 5, 6}, set --horizon, --json_path_val, and --json_path_train accordingly.

sh train_mlp.sh
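
Because the task prediction model is trained once per dataset and horizon, the flag combinations can be generated rather than edited by hand. The following is a minimal sketch, not the repository's own tooling: the flag names come from this README, but the helper build_mlp_args, the JSON file naming, the dimension values, and the assumption that train_mlp.sh wraps a script called train_mlp.py are all illustrative placeholders.

```python
from typing import Dict, List

HORIZONS = (3, 4, 5, 6)  # horizon values T named in this README


def build_mlp_args(dims: Dict[str, int], horizon: int, json_dir: str) -> List[str]:
    """Return the flag list for one task-prediction training run."""
    if horizon not in HORIZONS:
        raise ValueError(f"horizon must be one of {HORIZONS}, got {horizon}")
    return [
        "--class_dim", str(dims["class_dim"]),
        "--action_dim", str(dims["action_dim"]),
        "--observation_dim", str(dims["observation_dim"]),
        "--horizon", str(horizon),
        # Hypothetical file naming; point these at your actual split files.
        "--json_path_train", f"{json_dir}/train_T{horizon}.json",
        "--json_path_val", f"{json_dir}/val_T{horizon}.json",
    ]


# Placeholder dimensions, not the real values for any dataset.
example_dims = {"class_dim": 10, "action_dim": 100, "observation_dim": 512}
for t in HORIZONS:
    # Print one command line per horizon; copy the flags into train_mlp.sh.
    print("python train_mlp.py " + " ".join(build_mlp_args(example_dims, t, "dataset/crosstask")))
```

The printed lines are meant to be copied into train_mlp.sh one at a time, after replacing the placeholder dimensions and paths with the values for the dataset being trained.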

After training, set the path to the task prediction checkpoint in temp.py via --checkpoint_mlp.

Diffusion Model

In train.sh, set dataset and horizon to the dataset and time horizon to train on. Set mask_type to multi_add to use the multiple-add noise mask or to single_add to use the single-add noise mask. Set attn to WithAttention to use the UNet with attention or to NoAttention to use the UNet without attention.

To train the model, run

sh train.sh

To train the model without the mask, run

sh train_no_mask.sh
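
Since mask_type and attn each accept only two values, a typo in either setting is easy to catch before a long run starts. The sketch below illustrates one way to validate the settings; the DiffusionConfig class is hypothetical and not part of the repository, the dataset value "crosstask" in the usage note is an assumed example, and only the option names and their allowed values come from this README.

```python
from dataclasses import dataclass

MASK_TYPES = ("multi_add", "single_add")    # noise mask options from this README
ATTN_MODES = ("WithAttention", "NoAttention")  # UNet options from this README


@dataclass(frozen=True)
class DiffusionConfig:
    """Hypothetical container for the settings edited in train.sh."""

    dataset: str
    horizon: int
    mask_type: str
    attn: str

    def __post_init__(self) -> None:
        # Reject values the README does not list, before any training starts.
        if self.mask_type not in MASK_TYPES:
            raise ValueError(f"mask_type must be one of {MASK_TYPES}, got {self.mask_type!r}")
        if self.attn not in ATTN_MODES:
            raise ValueError(f"attn must be one of {ATTN_MODES}, got {self.attn!r}")
        if self.horizon < 1:
            raise ValueError(f"horizon must be a positive integer, got {self.horizon}")
```

For example, DiffusionConfig("crosstask", 3, "multi_add", "WithAttention") constructs without error, while a misspelled mask_type raises a ValueError that names the allowed values.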

Inference

In inference.sh, set dataset and horizon to the dataset and time horizon to run inference on. Set checkpoint_diff to the path of the pre-trained diffusion model. Set mask_type to multi_add to use the multiple-add noise mask or to single_add to use the single-add noise mask. Set attn to WithAttention to use the UNet with attention or to NoAttention to use the UNet without attention.

To perform inference, run

sh inference.sh

To perform inference without the action mask, run

sh inference_no_mask.sh

To run inference using the distribution of the noise with action embedding, run

sh inference_dist.sh
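
The three inference variants above differ only in which script is launched, so a small wrapper can select the script and fail early when the checkpoint file is missing. This is a sketch under stated assumptions: the variant labels ("mask", "no_mask", "dist") and both helper functions are hypothetical, and the checkpoint path is only checked for existence here, since each script reads checkpoint_diff from its own settings.

```python
import subprocess
from pathlib import Path
from typing import List

# Script names from this README; the dictionary keys are hypothetical labels.
INFERENCE_SCRIPTS = {
    "mask": "inference.sh",
    "no_mask": "inference_no_mask.sh",
    "dist": "inference_dist.sh",
}


def inference_command(variant: str) -> List[str]:
    """Return the shell command for the requested inference variant."""
    if variant not in INFERENCE_SCRIPTS:
        raise ValueError(f"variant must be one of {sorted(INFERENCE_SCRIPTS)}, got {variant!r}")
    return ["sh", INFERENCE_SCRIPTS[variant]]


def run_inference(variant: str, checkpoint_diff: str) -> None:
    """Check that the checkpoint exists, then launch the matching script."""
    if not Path(checkpoint_diff).is_file():
        raise FileNotFoundError(f"checkpoint not found: {checkpoint_diff}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(inference_command(variant), check=True)
```

Run from the repository root, run_inference("dist", path) would launch inference_dist.sh once the file at path exists, where path is the same checkpoint location set as checkpoint_diff in the script.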

Acknowledgement

This repository builds on the PDPP codebase: https://github.com/MCG-NJU/PDPP/tree/main/