Guanhua Zhang 2025-04-10 20:14:17 +02:00
commit 04c4625cfe
11 changed files with 1330 additions and 0 deletions

95
README.md Normal file

@@ -0,0 +1,95 @@
<div align="center">
<h1> SummAct: Uncovering User Intentions Through Interactive Behaviour Summarisation </h1>
**[Guanhua Zhang][4], &nbsp; [Mohamed Ahmed][3], &nbsp; [Zhiming Hu][5], &nbsp; [Andreas Bulling][6]** <br>
**ACM CHI 2025**, Yokohama, Japan <br>
**[[Project][2]]** **[[Paper][7]]** </div>
----------------
# Directory Structure
```
SummAct
│ README.md
│ environment.yml
└───preprocess
│ convert_dataset.py
│ create_steps.py
└───hf_bmt
│ hf_2_bmtrain.py
│ hf_2_bmtrain.sh
│ bmt_hf.py
└───train
│ train.py
│ train.sh
└───inference
│ inference.py
│ inference.sh
```
# Setup
We recommend setting up a virtual environment using Anaconda. <br>
1. Create a conda environment and install dependencies
```shell
conda env create --name summact --file=environment.yml
conda activate summact
```
2. Since `model_center==1.0.3` is required but not yet available on PyPI, build it from [source](https://github.com/OpenBMB/ModelCenter)
```
$ git clone https://github.com/OpenBMB/ModelCenter.git
$ cd ModelCenter
$ pip install -r requirements.txt
$ python3 setup.py install
```
3. Clone our repository to download our code and a pretrained model
```shell
git clone this_repo.git
```
# Preprocessing
1. Convert actions from symbolic formats to natural language by running `preprocess/convert_dataset.py`. Adapt it to your local dataset paths.
2. Prompt the pretrained LLM with examples to generate sub-intentions using `preprocess/create_steps.py`. Adapt it to your local prompt txt path. Example commands are sketched below.
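For example (a minimal sketch; both scripts run standalone and read the paths hard-coded inside them, so edit those placeholders first):
```shell
# 1. Convert symbolic Mind2Web actions into natural-language descriptions
python3 preprocess/convert_dataset.py
# 2. Prompt the pretrained LLM to generate sub-intentions for each action sequence
python3 preprocess/create_steps.py
```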
# Fine-tuning
1. After downloading the model from Hugging Face, convert it into `model_center` weights using the script in `hf_bmt/hf_2_bmtrain.sh`. Adapt it to the local paths of the downloaded model and the desired output directory.
2. Run `train/train.sh`, which calls `train/train.py` to fine-tune the model for interactive behaviour summarisation. Make sure your machine has GPUs available. Example commands are sketched below.
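For example (a minimal sketch; the scripts contain placeholder paths such as `your-project-path` that you need to fill in, and both call their Python entry points by relative path, so run them from their own directories):
```shell
# Convert the downloaded Hugging Face weights into model_center/BMTrain format
cd hf_bmt && bash hf_2_bmtrain.sh && cd ..
# Fine-tune the converted model; train.sh launches train.py via torchrun
cd train && bash train.sh && cd ..
```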
# Inference
Run `inference/inference.sh`, which calls `inference/inference.py` to convert the fine-tuned model back to the Hugging Face format and then computes metrics to evaluate the summarisation quality.
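For example (a minimal sketch; set `PROJECT_PATH` and the checkpoint paths inside the script first, and run it from its own directory since it calls `inference.py` by relative path):
```shell
cd inference && bash inference.sh
```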
# Citation
If you find our code useful or use it in your own projects, please cite our paper:
```
@inproceedings{zhang25_chi,
title = {SummAct: Uncovering User Intentions Through Interactive Behaviour Summarisation},
author = {Zhang, Guanhua and Ahmed, Mohamed and Hu, Zhiming and Bulling, Andreas},
year = {2025},
pages = {1--17},
booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)},
doi = {10.1145/3706598.3713190}
}
```
# Acknowledgements
Our work relied on the codebases of [Mind2Web][1], [ScreenAgent][8] and [Tell Me More!][9]. Thanks to the authors for sharing their code.
[1]: https://osu-nlp-group.github.io/Mind2Web/
[2]: https://collaborative-ai.org/publications/zhang25_chi/
[3]: https://www.linkedin.com/in/mohamed-adel-naguib/
[4]: https://scholar.google.com/citations?user=NqkK0GwAAAAJ&hl=en
[5]: https://scholar.google.com/citations?hl=en&user=OLB_xBEAAAAJ
[6]: https://www.collaborative-ai.org/people/bulling/
[7]: https://collaborative-ai.org/publications/zhang25_chi.pdf
[8]: https://github.com/niuzaisheng/ScreenAgent
[9]: https://github.com/OpenBMB/Tell_Me_More

184
environment.yml Normal file

@@ -0,0 +1,184 @@
name: Mistral
channels:
- defaults
- conda-forge
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- asttokens=2.4.1=pyhd8ed1ab_0
- bzip2=1.0.8=hd590300_5
- ca-certificates=2024.2.2=hbcca054_0
- comm=0.2.2=pyhd8ed1ab_0
- debugpy=1.8.1=py311hb755f60_0
- decorator=5.1.1=pyhd8ed1ab_0
- exceptiongroup=1.2.0=pyhd8ed1ab_2
- executing=2.0.1=pyhd8ed1ab_0
- importlib-metadata=7.1.0=pyha770c72_0
- importlib_metadata=7.1.0=hd8ed1ab_0
- ipykernel=6.29.3=pyhd33586a_0
- ipython=8.24.0=pyh707e725_0
- jedi=0.19.1=pyhd8ed1ab_0
- jupyter_client=8.6.1=pyhd8ed1ab_0
- jupyter_core=5.7.2=py311h38be061_0
- keyutils=1.6.1=h166bdaf_0
- krb5=1.21.2=h659d440_0
- ld_impl_linux-64=2.40=h41732ed_0
- libedit=3.1.20191231=he28a2e2_2
- libexpat=2.6.2=h59595ed_0
- libffi=3.4.2=h7f98852_5
- libgcc-ng=13.2.0=h807b86a_5
- libgomp=13.2.0=h807b86a_5
- libnsl=2.0.1=hd590300_0
- libsodium=1.0.18=h36c2ea0_1
- libsqlite=3.45.2=h2797004_0
- libstdcxx-ng=13.2.0=hc0a3c3a_7
- libuuid=2.38.1=h0b41bf4_0
- libxcrypt=4.4.36=hd590300_1
- libzlib=1.2.13=hd590300_5
- matplotlib-inline=0.1.7=pyhd8ed1ab_0
- ncurses=6.4.20240210=h59595ed_0
- nest-asyncio=1.6.0=pyhd8ed1ab_0
- openssl=3.3.0=hd590300_0
- packaging=24.0=pyhd8ed1ab_0
- parso=0.8.4=pyhd8ed1ab_0
- pexpect=4.9.0=pyhd8ed1ab_0
- pickleshare=0.7.5=py_1003
- pip=24.0=pyhd8ed1ab_0
- platformdirs=4.2.1=pyhd8ed1ab_0
- prompt-toolkit=3.0.42=pyha770c72_0
- psutil=5.9.8=py311h459d7ec_0
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- pygments=2.18.0=pyhd8ed1ab_0
- python=3.11.8=hab00c5b_0_cpython
- python_abi=3.11=4_cp311
- pyzmq=26.0.3=py311h08a0b41_0
- readline=8.2=h8228510_1
- setuptools=69.5.1=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- stack_data=0.6.2=pyhd8ed1ab_0
- tk=8.6.13=noxft_h4845f30_101
- tornado=6.4=py311h459d7ec_0
- traitlets=5.14.3=pyhd8ed1ab_0
- typing_extensions=4.11.0=pyha770c72_0
- wcwidth=0.2.13=pyhd8ed1ab_0
- wheel=0.43.0=pyhd8ed1ab_1
- xz=5.2.6=h166bdaf_0
- zeromq=4.3.5=h75354e8_4
- zipp=3.17.0=pyhd8ed1ab_0
- pip:
- absl-py==2.1.0
- accelerate==0.29.2
- aiohttp==3.9.4
- aiosignal==1.3.1
- annotated-types==0.7.0
- antlr4-python3-runtime==4.9.3
- anyio==4.4.0
- appdirs==1.4.4
- attrs==23.2.0
- beautifulsoup4==4.12.3
- bmtrain==1.0.0
- bs4==0.0.2
- certifi==2024.2.2
- charset-normalizer==3.3.2
- click==8.1.7
- colorama==0.4.6
- cprint==1.2.2
- cython==0.29.37
- datasets==2.18.0
- dill==0.3.8
- distro==1.9.0
- docker-pycreds==0.4.0
- evaluate==0.4.1
- filelock==3.13.4
- frozenlist==1.4.1
- fsspec==2024.2.0
- gitdb==4.0.11
- gitpython==3.1.43
- grpcio==1.62.1
- h11==0.14.0
- hdbscan==0.8.37
- httpcore==1.0.5
- httpx==0.27.0
- huggingface-hub==0.22.2
- hydra-core==1.3.2
- idna==3.7
- jieba==0.42.1
- jinja2==3.1.3
- joblib==1.4.0
- keybert==0.8.5
- levenshtein==0.25.1
- lxml==5.2.1
- markdown==3.6
- markdown-it-py==3.0.0
- markupsafe==2.1.5
- mdurl==0.1.2
- mpmath==1.3.0
- multidict==6.0.5
- multiprocess==0.70.16
- networkx==3.3
- nltk==3.8.1
- numpy==1.26.4
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu11==2.21.5
- nvidia-nccl-cu12==2.19.3
- nvidia-nvjitlink-cu12==12.4.127
- nvidia-nvtx-cu12==12.1.105
- omegaconf==2.3.0
- openai==1.36.0
- pandas==2.2.2
- pdb-tools==2.5.0
- pillow==10.3.0
- portalocker==2.8.2
- protobuf==4.25.3
- pyarrow==15.0.2
- pyarrow-hotfix==0.6
- pydantic==2.8.2
- pydantic-core==2.20.1
- python-dateutil==2.9.0.post0
- pytz==2024.1
- pyyaml==6.0.1
- rapidfuzz==3.9.6
- regex==2023.12.25
- requests==2.31.0
- responses==0.18.0
- rich==13.7.1
- rouge-score==0.1.2
- sacrebleu==2.4.2
- safetensors==0.4.3
- scikit-learn==1.4.2
- scipy==1.13.0
- sentence-transformers==2.7.0
- sentencepiece==0.2.0
- sentry-sdk==1.45.0
- setproctitle==1.3.3
- smmap==5.0.1
- sniffio==1.3.1
- soupsieve==2.5
- sympy==1.12
- tabulate==0.9.0
- tensorboard==2.16.2
- tensorboard-data-server==0.7.2
- textblob==0.18.0.post0
- threadpoolctl==3.4.0
- tokenizers==0.15.2
- torch==2.2.2
- torchvision==0.17.2
- tqdm==4.66.2
- transformers==4.39.3
- triton==2.2.0
- tzdata==2024.1
- urllib3==2.2.1
- wandb==0.16.6
- werkzeug==3.0.2
- xxhash==3.4.1
- yarl==1.9.4
prefix: /opt/anaconda3/envs/Mistral

92
hf_bmt/bmt_hf.py Normal file

@@ -0,0 +1,92 @@
import os, pdb
import json
import torch
import sys
import shutil
import argparse
from collections import OrderedDict
from transformers import AutoConfig, AutoModelForCausalLM
def transform_to_hf(bmt_model, model_size):
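    # Remap ModelCenter/BMTrain parameter names to the Hugging Face Llama/Mistral state-dict layout.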
model_hf = OrderedDict()
if 'input_embedding.weight' in bmt_model.keys():
model_hf['model.embed_tokens.weight'] = bmt_model["input_embedding.weight"].contiguous().float()
model_hf['model.norm.weight'] = bmt_model["encoder.output_layernorm.weight"].contiguous().float()
try:
model_hf['lm_head.weight'] = bmt_model['output_projection.weight'].contiguous().float()
        except KeyError:  # no separate output projection (tied embeddings); fall back to the input embedding
model_hf['lm_head.weight'] = bmt_model["input_embedding.weight"].contiguous().float()
else:
model_hf['model.embed_tokens.weight'] = bmt_model["LLM.input_embedding.weight"].contiguous().float()
model_hf['model.norm.weight'] = bmt_model["LLM.encoder.output_layernorm.weight"].contiguous().float()
try:
model_hf['lm_head.weight'] = bmt_model['LLM.output_projection.weight'].contiguous().float()
        except KeyError:  # same fallback for the "LLM."-prefixed checkpoint layout
model_hf['lm_head.weight'] = bmt_model["LLM.input_embedding.weight"].contiguous().float()
if model_size == "7b":
layernum = 32
elif model_size == "13b" or model_size == "13b-2":
layernum = 40
elif model_size == "65b":
layernum = 80
for lnum in range(layernum):
hf_pfx = f"model.layers.{lnum}"
if 'input_embedding.weight' in bmt_model.keys():
bmt_pfx = f"encoder.layers.{lnum}"
else:
bmt_pfx = f"LLM.encoder.layers.{lnum}"
model_hf[f"{hf_pfx}.input_layernorm.weight"] = bmt_model[f"{bmt_pfx}.self_att.layernorm_before_attention.weight"].contiguous().float()
model_hf[f"{hf_pfx}.self_attn.q_proj.weight"] = bmt_model[f"{bmt_pfx}.self_att.self_attention.project_q.weight"].contiguous().float()
model_hf[f"{hf_pfx}.self_attn.k_proj.weight"] = bmt_model[f"{bmt_pfx}.self_att.self_attention.project_k.weight"].contiguous().float()
model_hf[f"{hf_pfx}.self_attn.v_proj.weight"] = bmt_model[f"{bmt_pfx}.self_att.self_attention.project_v.weight"].contiguous().float()
model_hf[f"{hf_pfx}.self_attn.o_proj.weight"] = bmt_model[f"{bmt_pfx}.self_att.self_attention.attention_out.weight"].contiguous().float()
model_hf[f"{hf_pfx}.post_attention_layernorm.weight"] = bmt_model[f"{bmt_pfx}.ffn.layernorm_before_ffn.weight"].contiguous().float()
model_hf[f"{hf_pfx}.mlp.gate_proj.weight"] = bmt_model[f"{bmt_pfx}.ffn.ffn.w_in.w_0.weight"].contiguous().float()
model_hf[f"{hf_pfx}.mlp.up_proj.weight"] = bmt_model[f"{bmt_pfx}.ffn.ffn.w_in.w_1.weight"].contiguous().float()
model_hf[f"{hf_pfx}.mlp.down_proj.weight"] = bmt_model[f"{bmt_pfx}.ffn.ffn.w_out.weight"].contiguous().float()
for key in model_hf:
model_hf[key] = model_hf[key].bfloat16()
return model_hf
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in_path", type=str)
parser.add_argument("--output_path", type=str)
parser.add_argument("--original_mistral_path", type=str)
args = parser.parse_args()
os.makedirs(args.output_path, exist_ok=True)
print("transforming " + args.in_path)
model_size = "7b"
ckpt = [name for name in os.listdir(args.in_path) if name.endswith(".pt")]
bmt_model = torch.load(os.path.join(args.in_path, ckpt[0]))
hf_state_dict = transform_to_hf(bmt_model, model_size)
print(f"start saving to {args.output_path}")
model_config = AutoConfig.from_pretrained(args.original_mistral_path)
model = AutoModelForCausalLM.from_config(model_config)
model.load_state_dict(hf_state_dict)
for param in model.parameters():
param.data = param.data.to(torch.bfloat16)
model.save_pretrained(args.output_path, safe_serialization=False)
for file_name in ["tokenizer_config.json", "special_tokens_map.json", "tokenizer.model", "tokenizer.json"]:
if os.path.exists(os.path.join(args.in_path, file_name)):
shutil.copy(os.path.join(args.in_path, file_name), os.path.join(args.output_path, file_name))
print("saved huggingface checkpoint")

108
hf_bmt/hf_2_bmtrain.py Normal file

@@ -0,0 +1,108 @@
from transformers import LlamaConfig
from transformers import AutoModelForCausalLM
import torch, os
import json
from collections import OrderedDict
import shutil, pdb
import argparse
def initialize():
# get arguments
parser = argparse.ArgumentParser("")
    # Output directory for the BMTrain weights.
    parser.add_argument("--out_path", type=str, default=f"/Mistral-{ver}-bmtrain")
    # Path where you downloaded the Mistral-7B Hugging Face weights.
    parser.add_argument('--in_path', type=str, default=f"/Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24")
args = parser.parse_args()
return args
ver = "7b"
# Change these two
# Output Directory for the bmt train weights.
# outpath = f"/Mistral-{ver}-bmtrain"
# Path where you downloaded mistral-7b hugging face weight
# inpath = f"/Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24"
def convert_weights(args):
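    # Derive a ModelCenter config from the HF config, load all HF weight shards, then remap each tensor to BMTrain names.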
hf_config = LlamaConfig.from_pretrained(args.in_path)
config = {
'dim_model': hf_config.hidden_size,
'dim_ff': hf_config.intermediate_size,
'num_layers': hf_config.num_hidden_layers,
'num_heads': hf_config.num_attention_heads,
'num_heads_kv': hf_config.num_key_value_heads,
'dim_head': hf_config.hidden_size // hf_config.num_attention_heads,
'norm_eps': hf_config.rms_norm_eps,
}
os.makedirs(args.out_path, exist_ok=True)
with open(os.path.join(args.out_path, "config.json"), 'w') as f:
json.dump(config, f)
layernum = config['num_layers']
model_hf = OrderedDict()
ckpt_num = None
if 'v0.1' in args.in_path:
prefix = "pytorch_model-"
endtext = ".bin"
else:
prefix = "model-"
endtext = ".safetensors"
for name in os.listdir(args.in_path):
if name.startswith(prefix) and name.endswith(endtext):
ckpt_num =int(name.split(endtext)[0].split('-')[-1])
for i in range(1, ckpt_num + 1):
if 'v0.1' in args.in_path:
part = torch.load(os.path.join(args.in_path, f"pytorch_model-{i:05d}-of-{ckpt_num:05d}.bin"))
else:
from safetensors import safe_open
with safe_open(os.path.join(args.in_path, f"model-{i:05d}-of-{ckpt_num:05d}.safetensors"), framework="pt", device=0) as f:
part = {}
for k in f.keys():
part[k] = f.get_tensor(k)
model_hf.update(part)
out = OrderedDict()
out["input_embedding.weight"] = model_hf['model.embed_tokens.weight'].contiguous()
out["encoder.output_layernorm.weight"] = model_hf['model.norm.weight'].contiguous()
out['output_projection.weight'] = model_hf['lm_head.weight'].contiguous()
for lnum in range(layernum):
hf_pfx = f"model.layers.{lnum}"
bmt_pfx = f"encoder.layers.{lnum}"
out[f"{bmt_pfx}.self_att.layernorm_before_attention.weight"] = model_hf[f"{hf_pfx}.input_layernorm.weight"].contiguous()
out[f"{bmt_pfx}.self_att.self_attention.project_q.weight"] = model_hf[f"{hf_pfx}.self_attn.q_proj.weight"].contiguous()
out[f"{bmt_pfx}.self_att.self_attention.project_k.weight"] = model_hf[f"{hf_pfx}.self_attn.k_proj.weight"].contiguous()
out[f"{bmt_pfx}.self_att.self_attention.project_v.weight"] = model_hf[f"{hf_pfx}.self_attn.v_proj.weight"].contiguous()
out[f"{bmt_pfx}.self_att.self_attention.attention_out.weight"] = model_hf[f"{hf_pfx}.self_attn.o_proj.weight"].contiguous()
out[f"{bmt_pfx}.ffn.layernorm_before_ffn.weight"] = model_hf[f"{hf_pfx}.post_attention_layernorm.weight"].contiguous()
out[f"{bmt_pfx}.ffn.ffn.w_in.w_0.weight"] = model_hf[f"{hf_pfx}.mlp.gate_proj.weight"].contiguous()
out[f"{bmt_pfx}.ffn.ffn.w_in.w_1.weight"] = model_hf[f"{hf_pfx}.mlp.up_proj.weight"].contiguous()
out[f"{bmt_pfx}.ffn.ffn.w_out.weight"] = model_hf[f"{hf_pfx}.mlp.down_proj.weight"].contiguous()
for key in out:
out[key] = out[key].half()
if not os.path.exists(args.out_path):
os.makedirs(args.out_path)
torch.save(out, os.path.join(args.out_path, "pytorch_model.pt"))
for file_name in ["tokenizer_config.json", "special_tokens_map.json", "tokenizer.model", "tokenizer.json"]:
if os.path.exists(os.path.join(args.in_path, file_name)):
shutil.copy(os.path.join(args.in_path, file_name), os.path.join(args.out_path, file_name))
print("BMT weights created sucessfully")
def main():
args = initialize()
convert_weights(args)
if __name__ == "__main__":
main()

13
hf_bmt/hf_2_bmtrain.sh Normal file

@@ -0,0 +1,13 @@
#!/bin/bash
IN_PATH="your-path-to-hf-model"
OUT_PATH="your-wanted-path-to-bm-model"
OPTS=""
OPTS+="--in_path ${IN_PATH} "
OPTS+="--out_path ${OUT_PATH}"
CMD="python3 hf_2_bmtrain.py ${OPTS}"
echo "-------final CMD is------"
echo "${CMD}"
echo "-------final CMD end------"
eval ${CMD}

158
inference/inference.py Normal file

@@ -0,0 +1,158 @@
from transformers import AutoTokenizer, AutoModelForCausalLM
import argparse
import os, pdb
import numpy as np
import json
from pathlib import Path
import json
from tqdm import tqdm
from cprint import cprint
import evaluate
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:32"
import logging
import torch
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
from sentence_transformers import SentenceTransformer, util
def initialize():
parser = argparse.ArgumentParser("")
parser.add_argument("--model_name_or_path", type=str, default='')
parser.add_argument("--embedding_model_path", type=str, default="")
parser.add_argument("--train_data_dir", type=str, default='')
parser.add_argument("--test_data_dir", type=str, default='')
parser.add_argument("--prompt_file", type=str, default=None, help="The file for loading the prompt")
args = parser.parse_args()
return args
def get_tokenizer(args):
tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path, device_map={"":0})
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'
return tokenizer
def get_model(args):
model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path, device_map={"":0})
return model
def setup_model_and_tokenizer(args):
tokenizer = get_tokenizer(args)
model = get_model(args)
return tokenizer, model
def read_json_file(filename):
with open(filename, 'r') as infile:
data = json.load(infile)
return data
def format_one_action(action):
return f"- {action}\n"
def format_actions_list(actions):
actions_str = ""
for action in actions:
actions_str += format_one_action(action)
return actions_str
def preprocess_data(task, args):
with open(args.prompt_file, 'r') as file:
task_description = file.read().split('===')
input_str = f"## Website:\n{task['website_en']}\n\n## Domain:\n{task['domain_en']}\n\n## Sub-domain:\n{task['subdomain_en']}\n\n## Actions (Each line is one action):\n{format_actions_list(task['task_subintention'])}\n## Sub-intentions summarised from these actions:\n{format_actions_list(task['steps'])}"
query_inputs = f"{task_description[0]}\n{input_str}{task_description[1]}\n"
summary_str = task['task_description']
summary_str = summary_str[0].upper() + summary_str[1:] + "."
test_prompt = f"User: {query_inputs}\nAgent:"
return {"task": summary_str, "prompt": test_prompt}
def load_raw_dataset(data, args):
tasks = []
for d in tqdm(data):
processed_task = preprocess_data(d, args)
tasks.append(processed_task)
return tasks
def main_loop(args, test_dataset, tokenizer, model, sacrebleu, rouge, meteor, embedding_model, mark):
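    # For each test sample: reuse the cached per-sample result if it exists, otherwise generate a summary and score it (SacreBLEU, ROUGE, METEOR, sentence-embedding similarity/distance).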
os.makedirs(args.model_name_or_path+"/results/", exist_ok=True)
global_sacrebleu, global_rouge1, global_rouge2, global_rougeL, global_rougeLsum, global_meteor, global_cosine, global_distance = [], [], [], [], [], [], [], []
for i, data in tqdm(enumerate(test_dataset)):
save_task_response_filename = args.model_name_or_path + f"/results/{mark}_{i}_insert_mistral.json"
if os.path.exists(save_task_response_filename):
with open(save_task_response_filename, 'r') as f:
save_dict = json.load(f)
else:
prompt = data["prompt"]
task = data["task"]
save_dict = {}
model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
generated_ids = model.generate(**model_inputs,max_new_tokens=1024, do_sample=False, top_p= 0.95, repetition_penalty=1.2)
pred = tokenizer.batch_decode(generated_ids)[0]
response = pred.split("[SUMMARY]")[-1].replace('</s>','').strip()
rouge_calc = rouge.compute(predictions = [response], references=[[task]], use_aggregator=True)
sacrebleu_calc = sacrebleu.compute(predictions = [response], references=[[task]])
meteor_calc = meteor.compute(predictions = [response], references=[[task]])
GT_Embedding= embedding_model.encode(task.lower(), convert_to_tensor=True)
Prediction_Embedding = embedding_model.encode(response.lower(), convert_to_tensor=True)
cosine_similarity = util.cos_sim(GT_Embedding, Prediction_Embedding).item()
euclidean_disance = torch.sqrt(torch.sum(torch.pow(torch.subtract(GT_Embedding, Prediction_Embedding), 2))).item()
save_dict["prompt"] = prompt
save_dict["prediction"] = response
save_dict["task"] = task
save_dict["sacrebleu"] = sacrebleu_calc
save_dict["rouge"] = rouge_calc
save_dict["meteor"] = meteor_calc
save_dict["cosine_similarity"] = cosine_similarity
save_dict["euclidean_disance"] = euclidean_disance
with open(save_task_response_filename, 'w') as f:
json.dump(save_dict, f)
global_sacrebleu.append(save_dict["sacrebleu"]["score"])
global_rouge1.append(save_dict["rouge"]["rouge1"])
global_rouge2.append(save_dict["rouge"]["rouge2"])
global_rougeL.append(save_dict["rouge"]["rougeL"])
global_rougeLsum.append(save_dict["rouge"]["rougeLsum"])
global_meteor.append(save_dict["meteor"]["meteor"])
global_cosine.append(save_dict["cosine_similarity"])
global_distance.append(save_dict["euclidean_disance"])
return global_sacrebleu, global_rouge1, global_rouge2, global_rougeL, global_rougeLsum, global_meteor, global_cosine, global_distance
def main(mark):
args = initialize()
assert 'Mind2Web' in args.test_data_dir
tokenizer, model = setup_model_and_tokenizer(args)
    sacrebleu = evaluate.load('sacrebleu', module_type="metric")
    rouge = evaluate.load('rouge', module_type="metric")
    meteor = evaluate.load('meteor', module_type="metric")
embedding_model = SentenceTransformer(args.embedding_model_path, device="cuda")
test_folders_names = ["test_domain", "test_task", "test_website"]
for name in test_folders_names:
test_folder_path = Path(os.path.join(args.test_data_dir,name))
global_sacrebleu, global_rouge1, global_rouge2, global_rougeL, global_rougeLsum, global_meteor, global_cosine, global_distance = [], [], [], [], [], [], [], []
for json_file in test_folder_path.rglob('*_with_steps_insert_mistral.json'):
with json_file.open('r') as f:
data = json.load(f)
raw_tasks = load_raw_dataset(data, args)
sacrebleu_calc, rouge1_calc, rouge2_calc, rougeL_calc, rougeLsum_calc, meteor_calc, cosine_calc, distance_calc = main_loop(args, raw_tasks, tokenizer, model, sacrebleu, rouge, meteor, embedding_model, 'test_%s'%(name))
global_sacrebleu.extend(sacrebleu_calc)
global_rouge1.extend(rouge1_calc)
global_rouge2.extend(rouge2_calc)
global_rougeL.extend(rougeL_calc)
global_rougeLsum.extend(rougeLsum_calc)
global_meteor.extend(meteor_calc)
global_cosine.extend(cosine_calc)
global_distance.extend(distance_calc)
print(mark, name)
print("%.3f" % (np.mean(global_cosine)))
print("%.3f" % (np.mean(global_sacrebleu)/100.0))
print("%.3f" % (np.mean(global_rougeL)))
print("%.3f" % (np.mean(global_meteor)))
if __name__ == "__main__":
main('test')

30
inference/inference.sh Normal file

@@ -0,0 +1,30 @@
PROJECT_PATH="your-project-path"
EMBEDDING_MODEL_PATH="${PROJECT_PATH}/sentence-transformer/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/e4ce9877abf3edfe10b0d82785e83bdcb973e22e"
OPTS=""
OPTS+=" --embedding_model_path ${EMBEDDING_MODEL_PATH}"
OPTS+=" --test_data_dir ${PROJECT_PATH}/data/Mind2Web/test"
OPTS+=" --train_data_dir ${PROJECT_PATH}/data/Mind2Web/train/train_with_steps_insert_mistral.json"
OPTS+=" --prompt_file ${PROJECT_PATH}/prompts/summarisation/summarisation_prompt.txt"
MODEL_NAME_OR_PATH_BMT="${PROJECT_PATH}/ckpts/experiment/epoch_14"
MODEL_NAME_OR_PATH_HF="${MODEL_NAME_OR_PATH_BMT}-hf"
MODEL_NAME_OR_PATH_ORIGINAL_MISTRAL="${PROJECT_PATH}/Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24"
# Convert the model to HF format
if [ ! -f "${MODEL_NAME_OR_PATH_HF}/config.json" ]; then
CMD="python3 ${PROJECT_PATH}/hf_bmt/bmt_hf.py --in_path ${MODEL_NAME_OR_PATH_BMT} --output_path ${MODEL_NAME_OR_PATH_HF} --original_mistral_path ${MODEL_NAME_OR_PATH_ORIGINAL_MISTRAL}"
echo "-------BMT -> HF CMD is------"
echo "CMD: ${CMD}"
echo "-------BMT -> HF CMD end------"
eval ${CMD}
fi
OPTS+=" --model_name_or_path ${MODEL_NAME_OR_PATH_HF}"
CMD="python3 inference.py ${OPTS}"
echo "-------final CMD is------"
echo "${CMD}"
echo "-------final CMD end------"
eval ${CMD}

135
preprocess/convert_dataset.py Normal file

@@ -0,0 +1,135 @@
import os, pdb
import re
import json
from enum import Enum
from tqdm import tqdm
from bs4 import BeautifulSoup
def read_json_file(filename):
with open(filename, 'r') as infile:
data = json.load(infile)
return data
def convert_string(string_or_list):
# Add escaping symbols to English quotes in string
if isinstance(string_or_list, str):
return string_or_list.replace('"', '\\"')
elif isinstance(string_or_list, list):
return [convert_string(s) for s in string_or_list]
def is_visible(element):
bounding_box = element.get('bounding_box_rect')
return bounding_box != "-1,-1,-1,-1"
def clean_text(text):
cleaned_text = text.strip()
cleaned_text = cleaned_text.replace('\n', ' ').replace('\t', ' ')
cleaned_text = re.sub(' +', ' ', cleaned_text)
return cleaned_text
def find_semantic_info(element):
element_text = clean_text(element.get_text(strip=True))
if element_text:
return element_text
label = element.find_previous(lambda x: x.name == 'label' and is_visible(x))
if label:
label_text = clean_text(label.get_text(strip=True))
if label_text:
return label_text
return None
def action_description(ui_element_name, ui_element_text, operation_type, value):
ret_en = ""
if operation_type == "TYPE":
if ui_element_text != "":
ret_en += f'Type text "{value}" into {ui_element_name} with text "{ui_element_text}" on it'
else:
ret_en += f'Type text "{value}" into {ui_element_name}'
elif operation_type == "SELECT":
if ui_element_text != "":
ret_en += f'Select "{value}" from {ui_element_name} with text "{ui_element_text}" on it'
else:
ret_en += f'Select "{value}" from {ui_element_name}.'
elif operation_type == "CLICK":
if ui_element_text != "":
ret_en += f'Click the {ui_element_name} element with text "{ui_element_text}" on it'
else:
ret_en += f'Click the {ui_element_name} element'
return ret_en
def process_one_task(task):
base_info = {
"website_en": task["website"],
"domain_en": task["domain"],
"subdomain_en": task["subdomain"],
"annotation_id":task["annotation_id"],
"task_description": task["confirmed_task"],
"action_reprs" : task["action_reprs"]
}
action_descriptions_en = []
for action_index, action in enumerate(task["actions"]):
action_repr = task["action_reprs"][action_index]
ui_element, _ = action_repr.split(" -> ")
assert ui_element.count("] ")==1
ui_element_name, ui_element_text = ui_element.split("] ")
ui_element_name = ui_element_name[1:]
ui_element_text = ui_element_text.strip()
if ui_element_text == "":
raw_html = action["raw_html"]
soup2 = BeautifulSoup(raw_html, 'html.parser')
selected_element2 = soup2.find(attrs={"data_pw_testid_buckeye": action["action_uid"]})
ui_element_text = find_semantic_info(selected_element2)
if ui_element_text is not None:
ui_element_text = clean_text(ui_element_text)
task["action_reprs"][action_index] = f"[{ui_element_name}] {ui_element_text} -> {task['action_reprs'][action_index].split(' -> ')[1]}"
else:
                print(f'Warning: {task["annotation_id"]}, cannot find semantic info for {action["action_uid"]}')
        action_description_en = action_description(ui_element_name, ui_element_text, action["operation"]["op"], action["operation"]["value"])
action_descriptions_en.append(action_description_en)
base_info["task_subintention"] = action_descriptions_en
return base_info
if __name__ == "__main__":
for foldername in ['train','test_domain','test_website','test_task']:
SAVE_PATH = f"your-path-to-data/{foldername}"
for idx in range(100):
savejsonfilename = os.path.join(SAVE_PATH,f'{foldername}_{idx}_with_actions_description_insert.json')
if os.path.exists(savejsonfilename):
continue
else:
jsonfilename = f"{SAVE_PATH}/{foldername}_{idx}.json"
if not os.path.exists(jsonfilename):
break
dataset = read_json_file(jsonfilename)
Mind2Web_with_subintentions = []
for task in tqdm(dataset):
base_info = process_one_task(task)
Mind2Web_with_subintentions.append(base_info)
assert len(Mind2Web_with_subintentions) == len(dataset)
if 'test' in foldername:
with open(os.path.join(SAVE_PATH,f'{foldername}_{idx}_with_actions_description.json'), 'r') as json_file:
Mind2Web_with_subintentions_saved = json.load(json_file)
for i in range(len(Mind2Web_with_subintentions)):
if i>=len(Mind2Web_with_subintentions_saved):
break
if Mind2Web_with_subintentions[i] != Mind2Web_with_subintentions_saved[i]:
for key in Mind2Web_with_subintentions[i].keys():
if Mind2Web_with_subintentions[i][key] != Mind2Web_with_subintentions_saved[i][key]:
found = False
for j in range(len(Mind2Web_with_subintentions_saved)):
if Mind2Web_with_subintentions[i][key] == Mind2Web_with_subintentions_saved[j][key]:
found = True
break
if not found:
print(found, i, j, jsonfilename)
with open(savejsonfilename, 'w') as json_file:
json.dump(Mind2Web_with_subintentions, json_file)

97
preprocess/create_steps.py Normal file

@@ -0,0 +1,97 @@
from tqdm import tqdm
import json
import os
from transformers import AutoTokenizer, AutoModelForCausalLM
def get_tokenizer(model_name_or_path):
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, device_map={"":0})
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'
return tokenizer
def get_model(model_name_or_path):
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map={"":0})
return model
def read_json_file(filename):
with open(filename, 'r') as infile:
data = json.load(infile)
return data
if __name__ == "__main__":
model_name_or_path = "Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24"
tokenizer = get_tokenizer(model_name_or_path)
model = get_model(model_name_or_path)
# load prompts
with open("your-path-to-data/train_prompt.txt", "r") as f:
train_prompt = f.read()
with open("your-path-to-data/test_prompt.txt", "r") as f:
test_prompt = f.read()
for foldername in ['train','test_domain','test_website','test_task']:
SAVE_PATH = f"your-path-to-data/{foldername}"
for idx in range(100):
savejsonfilename = f"{SAVE_PATH}/{foldername}_{idx}_with_steps_insert_mistral.json"
jsonfilename = f"{SAVE_PATH}/{foldername}_{idx}_with_actions_description_insert.json"
if not os.path.exists(jsonfilename):
break
data = read_json_file(jsonfilename)
if os.path.exists(savejsonfilename):
data = read_json_file(savejsonfilename)
actions_steps = []
for i in tqdm(range(len(data)), desc="Steps_Creation"):
if "train" in foldername: # include task
message = f"""Website: {data[i]["website_en"]}
Domain: {data[i]["domain_en"]}
Sub-domain: {data[i]["subdomain_en"]}
Task: {data[i]["task_description"]}
Actions: {data[i]["task_subintention"]}\n
# OUTPUT #
"""
prompt = train_prompt
else: # exclude task
message = f"""Website: {data[i]["website_en"]}
Domain: {data[i]["domain_en"]}
Sub-domain: {data[i]["subdomain_en"]}
Actions: {data[i]["task_subintention"]}\n
# OUTPUT #
"""
prompt = test_prompt
                # The chat-style message list is flattened into a single prompt string, since the base Mistral model is used without a chat template.
                messages = 'System: ' + prompt + 'User: ' + message
model_inputs = tokenizer(messages, return_tensors="pt").to("cuda")
assert len(model_inputs['input_ids'])<=4096
generated_ids = model.generate(**model_inputs,max_new_tokens=1024, do_sample=False, top_p= 0.95, repetition_penalty=1.2)
json_object = tokenizer.batch_decode(generated_ids)[0]
answer = json_object.split('Sub-intentions: [')[-1].split('\n')
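                # Clean up the generated list: strip the EOS token, surrounding quotes and trailing commas/brackets, and drop empty entries.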
final_answer = []
for a in answer:
a = a.strip()
if '</s>' in a:
a = a.split('</s>')[0]
if len(a)==0:
continue
while a[0]=='"':
a = a[1:]
if len(a)==0:
break
if len(a)==0:
continue
while a[-1] in ['"', ',', ']', ]:
a = a[:-1]
if len(a)==0:
break
if len(a)==0:
continue
final_answer.append(a)
data[i]['steps'] = final_answer
with open(savejsonfilename, 'w') as json_file:
json.dump(data, json_file)

374
train/train.py Normal file

@@ -0,0 +1,374 @@
import argparse
import torch
from tqdm import tqdm
from torch.utils.data import DataLoader
import bmtrain as bmt
from functools import partial
import time
import os, pdb, shutil
import random
import json
from model_center.model import Llama
from model_center.tokenizer import LlamaTokenizer
from functools import partial
from dataset_wrapper import PromptIterableDataset, collator
import wandb
import csv
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:32"
import logging
import numpy as np
import math
from sentence_transformers import SentenceTransformer, util
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def set_seed(seed):
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
def get_tokenizer(args):
tokenizer = LlamaTokenizer.from_pretrained(args.model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'
return tokenizer
def get_model(args):
model = Llama.from_pretrained(args.model_name_or_path)
if args.load_ckpt is not None:
logger.info(f"loading model from {args.load_ckpt}")
bmt.load(model, os.path.join(args.load_ckpt, "pytorch_model.pt"))
return model
def get_optimizer(args, model):
optimizer = bmt.optim.AdamOffloadOptimizer(
model.parameters(),
weight_decay=args.weight_decay,
eps=1e-5,
betas=(0.9, 0.95)
)
if args.load_ckpt is not None:
file_name = os.path.join(args.load_ckpt, "optim.rank-{}.opt".format(bmt.rank()))
logger.info(file_name)
if os.path.exists(file_name):
logger.info("start to load gradient ckpt {}".format(file_name))
states = torch.load(file_name)
optimizer.load_state_dict(states)
return optimizer
def get_learning_rate_scheduler(args, optimizer):
if args.lr_decay_iters is None:
args.lr_decay_iters = args.train_iters
if args.lr_decay_style == "linear":
lr_scheduler = bmt.lr_scheduler.Linear(
optimizer,
start_lr=args.lr,
warmup_iter=int(args.warmup_iters),
end_iter=args.lr_decay_iters,
num_iter=args.start_step,
)
elif args.lr_decay_style == "cosine":
bmt.print_rank("use cosine")
class Cosine(bmt.lr_scheduler.WarmupLRScheduler):
def get_lr_warmup(self, num_iter) -> float:
return self.start_lr * num_iter / self.warmup_iter
def get_lr_decay(self, num_iter) -> float:
progress = (num_iter - self.warmup_iter) / max(1, (self.end_iter - self.warmup_iter))
return max(self.start_lr * 0.1, self.start_lr * (0.1 + 0.45 * (1.0 + math.cos(progress * math.pi))))
lr_scheduler = Cosine(
optimizer,
start_lr=args.lr,
warmup_iter=int(args.warmup_iters),
end_iter=args.lr_decay_iters,
num_iter=args.start_step,
)
elif args.lr_decay_style == "noam":
logger.info("use noam")
lr_scheduler = bmt.lr_scheduler.Noam(
optimizer,
start_lr=args.lr,
warmup_iter=int(args.warmup_iters),
end_iter=args.lr_decay_iters,
num_iter=args.start_step,
)
else:
raise NotImplementedError
return lr_scheduler
def setup_model_and_optimizer(args):
# get the tokenizer
tokenizer = get_tokenizer(args)
# get the model
model = get_model(args)
bmt.synchronize()
# get the optimizer and lr_scheduler
optimizer = get_optimizer(args, model)
lr_scheduler = get_learning_rate_scheduler(args, optimizer)
bmt.synchronize()
return tokenizer, model, optimizer, lr_scheduler
def initialize():
parser = argparse.ArgumentParser("")
# model training arguments
parser.add_argument("--lr", type=float, default=1e-5)
parser.add_argument("--model_name_or_path")
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--max_seq_length", default=2048, type=int)
parser.add_argument("--batch_size_per_device", default=2, type=int)
parser.add_argument("--logging_step", default=100, type=int)
parser.add_argument("--save_step", default=50000, type=int)
parser.add_argument("--gradient_accumulation_steps", default=1, type=int)
parser.add_argument("--wandb", default= True ,action="store_true")
parser.add_argument("--with_eval", action="store_true")
parser.add_argument("--clip_grad", type=float, default=1.0, help="gradient clipping")
parser.add_argument("--weight_decay", type=float, default=0.0, help="weight decay rate")
parser.add_argument("--loss_scale", type=float, default=6553600, help="loss scale")
parser.add_argument("--train_iters", type=int, default=2000000)
# loss parameters
parser.add_argument("--action_weight", type=float, help="weight of the tokens that match the action")
parser.add_argument("--embedding_model_path", type=str, help="The path to the sentence embedding model")
# data parameters
parser.add_argument('--data_setting', type=str ,help='MTSD or MTMD', default="MTMD")
parser.add_argument('--data_dir', type=str, help='The directory for saving the dataset')
parser.add_argument('--max_train_samples', type=int, help='The maximum number of training samples')
parser.add_argument('--cache_dir', type=str, help='The directory for cache')
parser.add_argument("--save_dir", type=str, default="")
parser.add_argument("--save_limit", type=int, default=None, help="ckpt saved limit number")
parser.add_argument("--warmup_iters", type=int, default=1000)
parser.add_argument(
"--lr_decay_style",
type=str,
default="cosine",
choices=["constant", "linear", "cosine", "exponential", "noam"],
help="learning rate decay function",
)
parser.add_argument("--lr_decay_iters", type=int, default=None, help="lr decay steps")
parser.add_argument("--start_step", type=int, default=0, help="step to start or continue training")
parser.add_argument("--load_ckpt", type=str, default=None, help="resumed ckpt")
parser.add_argument("--save_processed_data", action='store_true', help="wheather or no save the processed data")
parser.add_argument("--prompt_file", type=str, default=None, help="The file for loading the prompt")
args = parser.parse_args()
# init bmt
bmt.init_distributed(seed=args.seed)
set_seed(args.seed)
# wandb
if args.wandb and bmt.rank() == 0:
        wandb.init(project='Mistral-Interact', config=args, name=os.path.basename(os.path.normpath(args.save_dir)), save_code=True, settings=wandb.Settings(code_dir="."))  # run name follows the save directory
return args
def format_one_action(action):
return f"- {action}\n"
def format_actions_list(actions):
actions_str = ""
for action in actions:
actions_str += format_one_action(action)
return actions_str
def read_json_file(filename):
with open(filename, 'r') as infile:
data = json.load(infile)
return data
def load_Mind2Web_dataset(args, save_dataset= False):
# read text from a file (file name is args.prompt_file)
with open(args.prompt_file, 'r') as file:
task_description = file.read().split('===')
raw_dataset = read_json_file(args.data_dir)
dataset=[]
for idx, d in enumerate(raw_dataset):
sequences = []
input_str = f"## Website:\n{d['website_en']}\n\n## Domain:\n{d['domain_en']}\n\n## Sub-domain:\n{d['subdomain_en']}\n\n## Actions (Each line is one action):\n{format_actions_list(d['task_subintention'])}\n## Sub-intentions summarised from these actions:\n{format_actions_list(d['steps'])}"
query_inputs = f"{task_description[0]}\n{input_str}{task_description[1]}\n"
sequences.append(query_inputs)
summary_str = d['task_description']
summary_str = "[SUMMARY] " + summary_str[0].upper() + summary_str[1:]
sequences.append(summary_str)
dataset.append({"data": sequences.copy()})
random.shuffle(dataset)
if args.max_train_samples is not None:
dataset = dataset[:args.max_train_samples]
return dataset
def load_MoTIF_dataset(args, save_dataset= False):
with open(args.prompt_file, 'r') as file:
task_description = file.read().split('===')
raw_dataset = []
for filename in os.listdir(args.data_dir):
if filename.endswith('_steps.json'):
file_path = os.path.join(args.data_dir, filename)
with open(file_path, 'r', encoding='utf-8') as json_file:
try:
content = json.load(json_file)
raw_dataset.append(content)
except json.JSONDecodeError as e:
raise ValueError(f"Error decoding JSON from file {filename}: {e}")
dataset=[]
for d in raw_dataset:
sequences = []
input_str = f"## Application:\n{d['app']}\n\n## Actions (Each line is one action):\n{format_actions_list(d['instr'])}\n## Sub-intentions summarised from these actions:\n{format_actions_list(d['steps'])}"
query_inputs = f"{task_description[0]}\n{input_str}{task_description[1]}\n"
sequences.append(query_inputs)
summary_str = d['goal']
summary_str = "[SUMMARY] " + summary_str[0].upper() + summary_str[1:]
sequences.append(summary_str)
dataset.append({"data": sequences.copy()})
random.shuffle(dataset)
if args.max_train_samples is not None:
dataset = dataset[:args.max_train_samples]
return dataset
def finetune(args, tokenizer, model, optimizer, lr_scheduler, dataset):
embedding_model = SentenceTransformer(args.embedding_model_path, device="cuda")
for param in embedding_model.parameters():
param.requires_grad = False
logger.info(f"total training instance number: {len(dataset)}")
loss_func = bmt.loss.FusedCrossEntropy(ignore_index=-100, reduction="none")
optim_manager = bmt.optim.OptimManager(loss_scale=args.loss_scale)
optim_manager.add_optimizer(optimizer, lr_scheduler)
bmt.synchronize()
avg_time_recorder = bmt.utils.AverageRecorder()
avg_loss_recorder = bmt.utils.AverageRecorder()
train_start_time = time.time()
global_step = 0
logger.info("split data for each process")
data_per_gpu = len(dataset) // bmt.world_size()
dataset = dataset[bmt.rank() * data_per_gpu: (bmt.rank() + 1) * data_per_gpu]
bmt.print_rank("training on [%d, %d] of the dataset" % (bmt.rank() * data_per_gpu, (bmt.rank() + 1) * data_per_gpu))
dataset = PromptIterableDataset(
dataset,
tokenizer=tokenizer,
max_seq_length=args.max_seq_length,
teacher_forcing=True,
truncate_method="tail",
)
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters: {total_params}")
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")
for epoch in range(args.epochs):
savefolder = os.path.join(args.save_dir, f"epoch_{epoch}")
os.makedirs(savefolder, exist_ok=True)
dataloader = DataLoader(dataset, batch_size=args.batch_size_per_device)
progress_bar = tqdm(range(len(dataloader)), disable=not bmt.rank()==0, desc=f"epoch {epoch}")
logger.info(f"*******start {epoch} epoch training********")
for step, inputs in enumerate(dataloader):
if global_step < args.start_step:
global_step += 1
progress_bar.update(1)
continue
st = time.time()
with bmt.inspect.inspect_tensor() as inspector:
for k in inputs:
inputs[k] = inputs[k].cuda()
labels = inputs.pop("labels")
weight_idxs = inputs.pop('weight_idxs')
logits = model(**inputs).logits
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
# Flatten the tokens
shift_logits = shift_logits.view(-1, len(tokenizer))
shift_labels = shift_labels.view(-1).to(shift_logits.device)
ntp_loss = loss_func(shift_logits, shift_labels)
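                # Action-weighted loss: token positions flagged in weight_idxs (tokens matching actions) are weighted by args.action_weight instead of 1.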
sample_specific_weights = torch.ones_like(shift_logits)
weight_idxs = weight_idxs[:, 1:, :].contiguous()
weight_idxs = weight_idxs.view(-1, weight_idxs.size(-1))
assert weight_idxs.shape[0] == sample_specific_weights.shape[0], "310"
sample_specific_weights[weight_idxs==1] = args.action_weight
sample_specific_weights = sample_specific_weights[torch.arange(sample_specific_weights.size(0)), shift_labels]
ntp_loss = (ntp_loss * sample_specific_weights).mean()
next_token_loss_item = bmt.sum_loss(ntp_loss).item()
global_loss = next_token_loss_item
optim_manager.backward(ntp_loss)
if (step + 1) % args.gradient_accumulation_steps == 0 or step == len(dataloader) - 1:
optim_manager.clip_grad_norm(optimizer.param_groups, max_norm=args.clip_grad)
optim_manager.step()
optim_manager.zero_grad()
global_step += 1
progress_bar.update(1)
# record time and loss
iteration_time = time.time() - st
avg_time_recorder.record(iteration_time)
if not np.isnan(global_loss):
avg_loss_recorder.record(global_loss)
# print time and loss
if global_step % args.logging_step == 0:
bmt.print_rank(
"| Iter: {:6d} | loss: {:.4f} average_loss: {:.4f} | lr: {:.4e} | time: {:.4f} seconds | total_time_passed: {:.4f} minutes".format(
global_step,
global_loss,
avg_loss_recorder.value,
lr_scheduler.current_lr,
avg_time_recorder.value,
(time.time() - train_start_time) / 60
)
)
if args.wandb and bmt.rank() == 0:
wandb.log({
"loss": global_loss,
"next_token_loss": next_token_loss_item,
"average_loss": avg_loss_recorder.value,
"lr": lr_scheduler.current_lr,
}, step=global_step)
if global_step == args.train_iters:
break
bmt.save(model, os.path.join(savefolder, "pytorch_model.pt"))
if bmt.rank() == 0:
tokenizer.save_pretrained(savefolder)
bmt.print_rank(f"model saved at {savefolder}")
def main():
args = initialize()
if "Mind2Web" in args.data_dir:
dataset = load_Mind2Web_dataset(args, save_dataset=True)
else:
assert "MoTIF" in args.data_dir
dataset = load_MoTIF_dataset(args, save_dataset=True)
args.train_iters = min(args.epochs * (len(dataset) // (bmt.world_size() * args.batch_size_per_device) + 1), args.train_iters)
tokenizer, model, optimizer, lr_scheduler = setup_model_and_optimizer(args)
finetune(args, tokenizer, model, optimizer, lr_scheduler, dataset)
if __name__ == "__main__":
main()

44
train/train.sh Executable file

@@ -0,0 +1,44 @@
#! /bin/bash
MASTER_ADDR=localhost
MASTER_PORT=12345
NNODES=1
NODE_RANK=0
GPUS_PER_NODE=2
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--node_rank $NODE_RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT"
PROJECT_PATH="your-project-path"
OPTS=""
# model config
MAXSEQLEN=1024
OPTS+=" --max_seq_length ${MAXSEQLEN}"
OPTS+=" --model_name_or_path ${PROJECT_PATH}/Mistral-7b-bmtrain"
# training config
OPTS+=" --logging_step 4"
BATCHSIZE=16
OPTS+=" --batch_size_per_device ${BATCHSIZE}"
OPTS+=" --save_step 500"
OPTS+=" --epochs 15"
LR=1e-6
OPTS+=" --lr ${LR}"
OPTS+=" --warmup_iters 0"
OPTS+=" --start_step 0"
OPTS+=" --loss_scale 6400"
ACTIONWEIGHT=2
OPTS+=" --action_weight ${ACTIONWEIGHT}"
EMBEDDING_MODEL_PATH="${PROJECT_PATH}/sentence-transformer/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/e4ce9877abf3edfe10b0d82785e83bdcb973e22e"
OPTS+=" --embedding_model_path ${EMBEDDING_MODEL_PATH}"
OPTS+=" --prompt_file ${PROJECT_PATH}/prompts/summarisation/summarisation_prompt.txt"
OPTS+=" --save_dir ${PROJECT_PATH}/ckpts/experiment"
CMD="torchrun ${DISTRIBUTED_ARGS} train.py ${OPTS}"
echo "-------final CMD is------"
echo "${CMD}"
echo "-------final CMD end------"
${CMD}