Init
28
.gitignore
vendored
Normal file
|
@ -0,0 +1,28 @@
|
|||
src/overcooked_teacher_layout_imgs
|
||||
|
||||
*~
|
||||
.venv
|
||||
venv
|
||||
env
|
||||
!.gitkeep
|
||||
tmp
|
||||
.DS_Store
|
||||
.idea
|
||||
*.log
|
||||
*.map
|
||||
*.pyc
|
||||
*.h5
|
||||
__pycache__/
|
||||
.pytest_cache
|
||||
dist/
|
||||
**/data/
|
||||
**/logs/
|
||||
**/results/
|
||||
**/images/
|
||||
**/wandb/
|
||||
**/figures/
|
||||
**/config/wandb.json
|
||||
!docs/images
|
||||
src/*.egg-info
|
||||
**/.ipynb_checkpoints/
|
||||
!src/minimax/config
|
4
.vscode/settings.json
vendored
Normal file
|
@ -0,0 +1,4 @@
|
|||
{
|
||||
"editor.codeActionsOnSave": {},
|
||||
"git.ignoreLimitWarning": true
|
||||
}
|
203
LICENSE
Normal file
|
@ -0,0 +1,203 @@
|
|||
|
||||
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright [yyyy] [name of copyright owner]
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
496
README.md
Normal file
|
@ -0,0 +1,496 @@
|
|||
<h1 align="center">The Overcooked Generalisation Challenge</h1>
|
||||
|
||||
<h1 align="center"><img src="docs/images/OvercookedDCD.png" width="90%" /></h1>
|
||||
|
||||
|
||||
This repository houses the Overcooked Generalisation Challenge, a novel cooperative UED environment that explores the effect of generalisation on cooperative agents with a focus on zero-shot cooperation.
|
||||
We built this work on top of [minimax](https://github.com/facebookresearch/minimax) (original README included below) and are inspired by many of their implementation details.
|
||||
|
||||
We require a Python version above 3.9 and below 3.12; we use 3.10.12.
|
||||
To install this research code, run `pip install -r requirements.txt`.
|
||||
|
||||
## Structure
|
||||
|
||||
Our project includes the following major components:
|
||||
|
||||
- Overcooked UED
|
||||
- Multi-Agent UED Runners
|
||||
- Scripts for training and evaluations
|
||||
- Holdout populations for evaluation (accessible [here](https://drive.google.com/drive/folders/11fxdhrRCSTmB7BvfqMGqdIhvJUDv_0zP?usp=share_link))
|
||||
|
||||
We highlight our additions to minimax below, often with additional comments.
|
||||
We chose minimax as the basis because it is tested and intended for this use case.
|
||||
The project is structured as follows:
|
||||
|
||||
```
|
||||
docs/
|
||||
envs/
|
||||
...
|
||||
overcooked.md (<- We document OvercookedUED here)
|
||||
images/
|
||||
...
|
||||
examples/*
|
||||
src/
|
||||
config/
|
||||
configs/
|
||||
maze/*
|
||||
overcooked/* (<- Our configurations for all runs in the paper)
|
||||
minimax/
|
||||
agents/
|
||||
...
|
||||
mappo.py (<- Our MAPPO interface for training)
|
||||
config/* (<- logic related to configs, and getting commands, OvercookedUED included)
|
||||
envs/
|
||||
...
|
||||
overcooked_proc/ (<- home of overcooked procedural content generation for UED)
|
||||
...
|
||||
overcooked_mutators.py (<- For ACCEL)
|
||||
overcooked_ood.py (<- Testing layouts (can be extended!))
|
||||
overcooked_ued.py (<- UED interface)
|
||||
overcooked.py (<- Overcooked capable of being run in parallel across layouts)
|
||||
models/
|
||||
...
|
||||
overcooked/
|
||||
...
|
||||
models.py (<- Models we use in the paper are defined here)
|
||||
runners/*
|
||||
runners_ma/* (<- multi-agent runners for Overcooked UED and potentially others)
|
||||
tests/*
|
||||
utils/*
|
||||
arguments.py
|
||||
count_params.py
|
||||
evaluate_against_baseline.py
|
||||
evaluate_against_population.py
|
||||
evaluate_baseline_against_population.py
|
||||
evaluate_from_pckl.py
|
||||
evaluate.py
|
||||
extract_fcp.py
|
||||
train.py (<- minimax starting point, also for our work)
|
||||
populations/
|
||||
fcp/* (see below)
|
||||
baseline_train__${what} (Trains multiple self play agents across seeds)
|
||||
eval_xpid_${what} (Evals populations, stay and random agents)
|
||||
eval_xpid.sh (Evals a run based on its XPID)
|
||||
extract_fcp.sh (Extracts FCP checkpoint from self-play agents)
|
||||
make_cmd.sh (Extended with our work)
|
||||
train_baseline_${method}_${architecture}.sh (Trains all methods in the paper)
|
||||
train_maze_s5.sh
|
||||
train_maze.sh
|
||||
```
|
||||
|
||||
## Overcooked UED
|
||||
We provide a detailed explanation of the environment in the paper.
|
||||
OvercookedUED provides interfaces to edit-based, generator-based, and curator-based DCD methods.
|
||||
For an overview see the figure above.
|
||||
|
||||
## Multi-Agent UED Runners
|
||||
Multi-Agent runners are placed under `src/minimax/runners_ma`.
|
||||
They extend the minimax runners with support for multiple agents, e.g. by carrying around hidden states.
|
||||
Note: Our current implementation only features two agents.
|
||||
|
||||
## Scripts
|
||||
|
||||
Reproducibility is important to us.
|
||||
We therefore store in this repository all the important scripts that produce the policies discussed in the paper.
|
||||
To generate a command, run `make_cmd.sh` with `overcooked` and the config file name:
|
||||
|
||||
```bash
|
||||
> ./make_cmd.sh overcooked baseline_dr_softmoe_lstm
|
||||
python -m train \
|
||||
--seed=1 \
|
||||
--agent_rl_algo=ppo \
|
||||
--n_total_updates=30000 \
|
||||
--train_runner=dr \
|
||||
--n_devices=1 \
|
||||
--student_model_name=default_student_actor_cnn \
|
||||
--student_critic_model_name=default_student_critic_cnn \
|
||||
--env_name=Overcooked \
|
||||
--is_multi_agent=True \
|
||||
--verbose=False \
|
||||
--log_dir=~/logs/minimax \
|
||||
--log_interval=10 \
|
||||
--from_last_checkpoint=False \
|
||||
...
|
||||
```
|
||||
|
||||
The training scripts are named `train_baseline_${method}_${architecture}.sh` and can be found in `src`.
|
||||
`${method}` specifies the DCD method and is one of {`p_accel`, `dr`, `pop_paired`, `p_plr`}, which correspond to parallel ACCEL (https://arxiv.org/abs/2203.01302 & https://arxiv.org/abs/2311.12716), domain randomisation (https://arxiv.org/abs/1703.06907), population PAIRED (https://arxiv.org/abs/2012.02096) and parallel PLR (https://arxiv.org/abs/2010.03934 & https://arxiv.org/abs/2311.12716).
|
||||
`${architecture}`, on the other hand, corresponds to the neural network architecture employed and is one of {`lstm`, `s5`, `softmoe`}.
|
||||
To use them, please set the environment variable `${WANDB_ENTITY}` to your wandb user name or specify `wandb_mode=offline`.
|
||||
The scripts can be called like this:
|
||||
|
||||
```bash
|
||||
./train_baseline_p_plr_s5.sh $device $seed
|
||||
```
|
||||
|
||||
The scripts run `src/minimax/train.py` and store their results in the configured locations (see the config JSONs and the `--log_dir` flag), usually somewhere under `~/logs/` in your home directory.
|
||||
There are 12 train scripts, plus helper scripts that run several of them one after the other, e.g. `train_baselines_s56x9.sh`, which trains all 4 DCD methods with an S5 policy:
|
||||
|
||||
```bash
|
||||
DEFAULTVALUE=4
|
||||
DEFAULTSEED=1
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
seed="${2:-$DEFAULTSEED}"
|
||||
echo "Using device ${device} and seed ${seed}"
|
||||
|
||||
./train_baseline_p_plr_s5.sh $device $seed
|
||||
./train_baseline_p_accel_s5.sh $device $seed
|
||||
./train_baseline_pop_paired_s5.sh $device $seed
|
||||
./train_baseline_dr_s5.sh $device $seed
|
||||
```
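The naming scheme described above gives 4 methods times 3 architectures, which matches the 12 train scripts mentioned. As a minimal sketch, assuming a script exists for every combination (the README does not list them individually), the names can be enumerated like this; `train_script_names` is a hypothetical helper, not part of the repository:

```python
from itertools import product

# Values copied from the naming scheme in this README.
METHODS = ["p_accel", "dr", "pop_paired", "p_plr"]
ARCHITECTURES = ["lstm", "s5", "softmoe"]


def train_script_names():
    """Return every train_baseline_${method}_${architecture}.sh name.

    Assumption: one script exists per (method, architecture) pair.
    """
    return [
        f"train_baseline_{method}_{arch}.sh"
        for method, arch in product(METHODS, ARCHITECTURES)
    ]


if __name__ == "__main__":
    for name in train_script_names():
        print(name)
```

Each printed name can then be called with a device and a seed, in the same way as the scripts shown above.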
|
||||
|
||||
Evaluation is performed via scripts starting with `eval`.
|
||||
One can evaluate against scripted agents via `eval_stay_against_population.sh` and random ones via `eval_random_against_population.sh`.
|
||||
To evaluate a trained agent against a population, use `eval_xpid_against_population.sh`. For example, with device 4 and the agent's XPID `YOUR_XPID`, run `./eval_xpid_against_population.sh 4 YOUR_XPID`.
|
||||
|
||||
## Holdout populations for evaluation
|
||||
|
||||
The populations can be accessed here: https://drive.google.com/drive/folders/11fxdhrRCSTmB7BvfqMGqdIhvJUDv_0zP?usp=share_link.
|
||||
They need to be placed under `src/populations` to work with the provided scripts.
|
||||
Alternatively, populations can be obtained by running `src/baseline_train__all.sh`, or by using `src/baseline_train__8_seeds.sh` for the desired layout, e.g. via:
|
||||
|
||||
```bash
|
||||
./baseline_train__8_seeds.sh $device coord_ring_6_9
|
||||
```
|
||||
|
||||
We exclude the detailed calls here as they are too verbose.
|
||||
The resulting directory structure for including the populations should look like the following:
|
||||
|
||||
```txt
|
||||
src/
|
||||
minimax
|
||||
...
|
||||
populations/
|
||||
fcp/
|
||||
Overcooked-AsymmAdvantages6_9/
|
||||
1/
|
||||
high.pkl
|
||||
low.pkl
|
||||
meta.json
|
||||
mid.pkl
|
||||
xpid.txt
|
||||
2/*
|
||||
...
|
||||
8/*
|
||||
population.json
|
||||
Overcooked-CoordRing6_9/*
|
||||
Overcooked-CounterCircuit6_9/*
|
||||
Overcooked-CrampedRoom6_9/*
|
||||
Overcooked-ForcedCoord6_9/*
|
||||
```
|
||||
|
||||
To work with these populations, meta files point to the correct files.
|
||||
These meta files, called `population.json` (see above), are included in the downloadable zip and should look like this:
|
||||
|
||||
```json
|
||||
{
|
||||
"population_size": 24,
|
||||
"1": "populations/fcp/Overcooked-AsymmAdvantages6_9/1/low.pkl",
|
||||
"2": "populations/fcp/Overcooked-AsymmAdvantages6_9/1/mid.pkl",
|
||||
...
|
||||
"24": "populations/fcp/Overcooked-AsymmAdvantages6_9/8/high.pkl",
|
||||
"1_meta": "populations/fcp/Overcooked-AsymmAdvantages6_9/1/meta.json",
|
||||
"2_meta": "populations/fcp/Overcooked-AsymmAdvantages6_9/1/meta.json",
|
||||
...
|
||||
"24_meta": "populations/fcp/Overcooked-AsymmAdvantages6_9/8/meta.json"
|
||||
}
|
||||
```
|
||||
|
||||
They help our evaluation to keep track of the correct files to use.
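To illustrate how such a file can be consumed, here is a minimal sketch that reads the structure shown above: a `population_size` entry, numbered member keys, and matching `<n>_meta` keys. The function names `read_population` and `load_population_file` are hypothetical and are not taken from the repository's evaluation code.

```python
import json
from pathlib import Path


def read_population(population: dict) -> list:
    """Return (member_path, meta_path) pairs for members 1..population_size.

    Assumes the layout shown in this README: keys "1".."N" hold checkpoint
    paths and keys "1_meta".."N_meta" hold the matching meta.json paths.
    """
    size = population["population_size"]
    members = []
    for idx in range(1, size + 1):
        member_key, meta_key = str(idx), f"{idx}_meta"
        missing = [k for k in (member_key, meta_key) if k not in population]
        if missing:
            raise KeyError(f"population.json is missing keys: {missing}")
        members.append((population[member_key], population[meta_key]))
    return members


def load_population_file(path):
    """Load a population.json from disk and return its member pairs."""
    with Path(path).open() as f:
        return read_population(json.load(f))
```

A call such as `load_population_file("populations/fcp/Overcooked-CoordRing6_9/population.json")` would then return one pair per population member, or raise a `KeyError` if an entry is missing.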
|
||||
|
||||
To check whether they work correctly, use something along the lines of the following (compare the eval scripts):
|
||||
|
||||
```bash
|
||||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
|
||||
for env in "Overcooked-CoordRing6_9" "Overcooked-ForcedCoord6_9" "Overcooked-CounterCircuit6_9" "Overcooked-AsymmAdvantages6_9" "Overcooked-CrampedRoom6_9";
|
||||
do
|
||||
CUDA_VISIBLE_DEVICES=${device} LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.evaluate_baseline_against_population \
|
||||
--env_names=${env} \
|
||||
--population_json="populations/fcp/${env}/population.json" \
|
||||
--n_episodes=100 \
|
||||
--is_random=True
|
||||
done
|
||||
```
|
||||
|
||||
## Credit the minimax authors
|
||||
|
||||
For attribution in academic contexts, please also cite the original work on minimax:
|
||||
```
|
||||
@article{jiang2023minimax,
|
||||
title={minimax: Efficient Baselines for Autocurricula in JAX},
|
||||
author={Jiang, Minqi and Dennis, Michael and Grefenstette, Edward and Rocktäschel, Tim},
|
||||
booktitle={Agent Learning in Open-Endedness Workshop at NeurIPS},
|
||||
year={2023}
|
||||
}
|
||||
```
|
||||
|
||||
The original readme is included below.
|
||||
|
||||
<br>
|
||||
|
||||
<br>
|
||||
|
||||
<br>
|
||||
|
||||
<br>
|
||||
|
||||
<h1 align="center">Original Minimax Readme</h1>
|
||||
|
||||
<h1 align="center"><img src="docs/images/minimax_logo.png" width="60%" /></h1>
|
||||
|
||||
<h3 align="center"><i>Efficient baselines for autocurricula in JAX</i></h3>
|
||||
|
||||
<p align="center">
|
||||
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/license-Apache2.0-blue.svg"/></a>
|
||||
<a href="https://pypi.python.org/pypi/minimax-lib"><img src="https://badge.fury.io/py/minimax-lib.svg"/></a>
|
||||
<a href= "https://drive.google.com/drive/folders/15Vi7OsY6OrVaM5ZnY3Bt7J-0s5o_KV9b?usp=drive_link"><img src="https://colab.research.google.com/assets/colab-badge.svg"/></a>
|
||||
<a href="https://arxiv.org/abs/2311.12716"><img src="https://img.shields.io/badge/arXiv-2311.12716-b31b1b.svg"/></a>
|
||||
</p>
|
||||
|
||||
## Contents
|
||||
- [Why `minimax`?](#-why-minimax)
|
||||
- [Hardware-accelerated baselines](#-hardware-accelerated-baselines)
|
||||
- [Install](#%EF%B8%8F-install)
|
||||
- [Quick start](#-quick-start)
|
||||
- [Dive deeper](#-dive-deeper)
|
||||
- [Training](#training)
|
||||
- [Logging](#logging)
|
||||
- [Checkpointing](#checkpointing)
|
||||
- [Evaluating](#evaluating)
|
||||
- [Environments](#%EF%B8%8F-environments)
|
||||
- [Supported environments](#supported-environments)
|
||||
- [Adding environments](#adding-environments)
|
||||
- [Agents](#-agents)
|
||||
- [Roadmap](#-roadmap)
|
||||
- [License](#-license)
|
||||
- [Citation](#-citation)
|
||||
|
||||
## 🐢 Why `minimax`?
|
||||
|
||||
Unsupervised Environment Design (UED) is a promising approach to generating autocurricula for training robust deep reinforcement learning (RL) agents. However, existing implementations of common baselines require excessive amounts of compute. In some cases, experiments can require more than a week to complete using V100 GPUs. **This long turn-around slows the rate of research progress in autocurriculum methods**. `minimax` provides fast, [JAX-based](https://github.com/google/jax) implementations of key UED baselines, which are based on the concept of _minimax_ regret. By making use of fully-tensorized environment implementations, `minimax` baselines are fully-jittable and thus take full advantage of the hardware acceleration offered by JAX. In timing studies done on V100 GPUs and Xeon E5-2698 v4 CPUs, we find `minimax` baselines can run **over 100x faster than previous reference implementations**, like those in [facebookresearch/dcd](https://github.com/facebookresearch/dcd).
|
||||
|
||||
All autocurriculum algorithms implemented in `minimax` also support multi-device training, which can be activated through a [single command line flag](#multi-device-training). Using multiple devices for training can lead to further speed ups and allows scaling these autocurriculum methods to much larger batch sizes.
|
||||
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: dark)" srcset="docs/images/minimax_speedups_darkmode.png#gh-dark-mode-only">
|
||||
<img alt="Shows Anuraghazra's GitHub Stats." src="docs/images/minimax_speedups.png">
|
||||
</picture>
|
||||
|
||||
### 🐇 Hardware-accelerated baselines
|
||||
|
||||
`minimax` includes JAX-based implementations of
|
||||
|
||||
- [Domain Randomization (DR)](https://arxiv.org/abs/1703.06907)
|
||||
|
||||
- [Minimax adversary](https://arxiv.org/abs/2012.02096)
|
||||
|
||||
- [PAIRED](https://arxiv.org/abs/2012.02096)
|
||||
|
||||
- [Population PAIRED](https://arxiv.org/abs/2012.02096)
|
||||
|
||||
- [Prioritized Level Replay (PLR)](https://arxiv.org/abs/2010.03934)
|
||||
|
||||
- [Robust Prioritized Level Replay (PLR$`^{\perp}`$)](https://arxiv.org/abs/2110.02439)
|
||||
|
||||
- [ACCEL](https://arxiv.org/abs/2203.01302)
|
||||
|
||||
Additionally, `minimax` includes two new variants of PLR and ACCEL that further reduce wall time by better leveraging the massive degree of environment parallelism enabled by JAX:
|
||||
|
||||
- Parallel PLR (PLR$`^{||}`$)
|
||||
|
||||
- Parallel ACCEL (ACCEL$`^{||}`$)
|
||||
|
||||
In brief, these two new algorithms collect rollouts for new level evaluation, level replay, and, in the case of Parallel ACCEL, mutation evaluation, all in parallel (i.e. rather than sequentially, as done by Robust PLR and ACCEL). As a simple example for why this parallelization improves wall time, consider how Robust PLR with replay probability of `0.5` would require approximately 2x as many rollouts in order to reach the same number of RL updates as a method like DR, because updates are only performed on rollouts based on level replay. Parallelizing level replay rollouts alongside new level evaluation rollouts by using 2x the environment parallelism reduces the total number of parallel rollouts to equal the total number of updates desired, thereby matching the 1:1 rollout to update ratio of DR. The diagram below summarizes this difference.
|
||||
|
||||
![Parallel DCD overview](docs/images/parallel_dcd_overview.png)
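The 2x figure for Robust PLR can be checked with a short calculation. This is a sketch under a simplifying assumption: each rollout cycle is a replay cycle with probability `replay_prob`, and only replay cycles produce an RL update, so the expected number of cycles per update is `1 / replay_prob`. The function name and the update count are illustrative only.

```python
def expected_rollout_cycles(n_updates: int, replay_prob: float) -> float:
    """Expected rollout cycles needed when only replay cycles yield an update."""
    if not 0.0 < replay_prob <= 1.0:
        raise ValueError("replay_prob must be in (0, 1]")
    return n_updates / replay_prob


# Robust PLR with replay probability 0.5 versus DR, where every cycle updates.
robust_plr_cycles = expected_rollout_cycles(30_000, 0.5)
dr_cycles = expected_rollout_cycles(30_000, 1.0)
print(robust_plr_cycles / dr_cycles)  # 2.0
```

Running both kinds of rollouts side by side, as the parallel variants do, brings this ratio back to one cycle per update at the cost of 2x the environment parallelism.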
|
||||
|
||||
`minimax` includes a fully-tensorized implementation of a maze environment that we call [`AMaze`](docs/envs/maze.md). This environment exactly reproduces the MiniGrid-based mazes used in previous UED studies in terms of dynamics, reward function, observation space, and action space, while running many orders of magnitude faster in wall time, with increasing environment parallelism.
|
||||
|
||||
|
||||
## 🛠️ Install
|
||||
|
||||
1. Use a virtual environment manager like `conda` or `mamba` to create a new environment for your project:
|
||||
|
||||
```bash
|
||||
conda create -n minimax
|
||||
conda activate minimax
|
||||
```
|
||||
|
||||
2. Install `minimax` via either `pip install minimax-lib` or `pip install ued`.
|
||||
|
||||
3. That's it!
|
||||
|
||||
⚠️ Note that to enable hardware acceleration on GPU, you will need to make sure to install the latest version of `jax>=0.4.19` and `jaxlib>=0.4.19` that is compatible with your CUDA driver (requires minimum CUDA version of `11.8`). See [the official JAX installation guide](https://jax.readthedocs.io/en/latest/installation.html#pip-installation-gpu-cuda-installed-via-pip-easier) for detailed instructions.
|
||||
|
||||
## 🏁 Quick start
|
||||
|
||||
The easiest way to get started is to play with the Python notebooks in the [examples folder](examples) of this repository. We also host Colab versions of these notebooks:
|
||||
|
||||
- DR [[IPython](examples/dr.ipynb), [Colab](https://colab.research.google.com/drive/1HhgQgcbt77uEtKnV1uSzDsWEMlqknEAM)]
|
||||
|
||||
- PAIRED [[IPython](examples/paired.ipynb), [Colab](https://colab.research.google.com/drive/1NjMNbQ4dgn8f5rt154JKDnXmQ1yV0GbT?usp=drive_link)]
|
||||
|
||||
- PLR and ACCEL*: [[IPython](examples/plr.ipynb), [Colab](https://colab.research.google.com/drive/1XqVRgcIXiMDrznMIQH7wEXjGZUdCYoG9?usp=drive_link)]
|
||||
|
||||
*Depending on how the top-level flags are set, this notebook runs PLR, Robust PLR, Parallel PLR, ACCEL, or Parallel ACCEL.
|
||||
|
||||
`minimax` comes with high-performing hyperparameter configurations for several algorithms, including domain randomization (DR), PAIRED, PLR, and ACCEL for 60-block mazes. You can train using these settings by first creating the training command for executing `minimax.train` using the convenience script [`minimax.config.make_cmd`](docs/make_cmd.md):
|
||||
|
||||
`python -m minimax.config.make_cmd --config maze/[dr,paired,plr,accel] | pbcopy`,
|
||||
|
||||
followed by pasting and executing the resulting command into the command line.
|
||||
|
||||
[See the docs](docs/make_cmd.md) for `minimax.config.make_cmd` to learn more about how to use this script to generate training commands from JSON configurations. You can browse the available JSON configurations for various autocurriculum methods in the [configs folder](config/configs).
|
||||
|
||||
Note that when logging and checkpointing are enabled, the main `minimax.train` script outputs this data as `logs.csv` and `checkpoint.pkl` respectively in an experiment directory located at `<log_dir>/<xpid>`, where `log_dir` and `xpid` are arguments specified in the command. You can then evaluate the checkpoint by using `minimax.evaluate`:
|
||||
|
||||
```bash
|
||||
python -m minimax.evaluate \
|
||||
--seed 1 \
|
||||
--log_dir <absolute path log directory> \
|
||||
--xpid_prefix <select checkpoints with xpids matching this prefix> \
|
||||
--env_names <csv string of test environment names> \
|
||||
--n_episodes <number of trials per test environment> \
|
||||
--results_path <path to results folder> \
|
||||
--results_fname <filename of output results csv>
|
||||
```
|
||||
|
||||
## 🪸 Dive deeper
|
||||
|
||||
![minimax system diagram](docs/images/minimax_system_diagram.png)
|
||||
|
||||
### Training
|
||||
|
||||
The main entry for training is `minimax.train`. This script configures the training run based on command line arguments. It constructs an instance of `ExperimentRunner` to manage the training process on an update-cycle basis: These duties include constructing and delegating updates to an appropriate training runner for the specified autocurriculum algorithm and conducting logging and checkpointing. The training runner used by `ExperimentRunner` executes all autocurriculum-related logic. The system diagram above describes how these pieces fit together, as well as how `minimax` manages various, hierarchical batch dimensions.
|
||||
|
||||
Currently, `minimax` includes training runners for the following classes of autocurricula:
|
||||
|
||||
| **Runner** | **Algorithm class** | **`--train_runner`** |
|
||||
| -------------- | ------------------------------------------------------------- | --------------------- |
|
||||
| `DRRunner` | Domain randomization | `dr` |
|
||||
| `PLRRunner` | Replay-based curricula, including ACCEL | `plr` |
|
||||
| `PAIREDRunner` | Curricula via a co-adapting teacher environment design policy | `paired` |
|
||||
|
||||
The table below summarizes how various autocurriculum methods map to these runners, along with the key arguments that must be changed from their default settings to switch the runner's behavior to each method.
|
||||
|
||||
| **Algorithm** | **Reference** | **Runner** | **Key args** |
|
||||
| - | - | - | - |
|
||||
| DR | [Tobin et al, 2017](https://arxiv.org/abs/1703.06907) | `DRRunner` | – |
|
||||
| Minimax adversary | [Dennis et al, 2020](https://arxiv.org/abs/2012.02096) | `PAIREDRunner` | `ued_score='neg_return'` |
|
||||
| PAIRED | [Dennis et al, 2020](https://arxiv.org/abs/2012.02096) | `PAIREDRunner` | – |
|
||||
| Population PAIRED | [Dennis et al, 2020](https://arxiv.org/abs/2012.02096) | `PAIREDRunner` | `n_students >= 2`, `ued_score='population_regret'` |
|
||||
| PLR | [Jiang et al, 2021](https://arxiv.org/abs/2010.03934) | `PLRRunner` | `plr_use_robust_plr=False` |
|
||||
| Robust PLR | [Jiang et al, 2021a](https://arxiv.org/abs/2110.02439) | `PLRRunner` | – |
|
||||
| ACCEL | [Parker-Holder et al, 2022](https://arxiv.org/abs/2203.01302) | `PLRRunner` | `plr_mutation_fn != None`, `plr_n_mutations > 0` |
|
||||
| Parallel PLR | [Jiang et al, 2023](https://openreview.net/forum?id=vxZgTbmC4L) | `PLRRunner` | `plr_use_parallel_eval=True` |
|
||||
| Parallel ACCEL | [Jiang et al, 2023](https://openreview.net/forum?id=vxZgTbmC4L) | `PLRRunner` | `plr_use_parallel_eval=True`, `plr_mutation_fn != None`, `plr_n_mutations > 0`|
|
||||
|
||||
[See the docs](docs/train_args.md) on `minimax.train` for a comprehensive guide on how to configure command-line arguments for running various autocurriculum methods.
|
||||
|
||||
### Logging
|
||||
|
||||
By default, `minimax.train` generates a folder in the directory specified by the `--log_dir` argument, named according to `--xpid`. This folder contains the main training logs, `logs.csv`, which are updated with a new row every `--log_interval` rollout cycles.
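
As a minimal sketch of reading these logs back for analysis, the snippet below uses Python's standard `csv` module and assumes only the `<log_dir>/<xpid>/logs.csv` layout described above. The helper name `read_training_logs` is hypothetical and not part of `minimax`, and the available columns depend on your run.

```python
import csv
from pathlib import Path


def read_training_logs(log_dir, xpid):
    """Return the rows of <log_dir>/<xpid>/logs.csv as a list of dicts."""
    logs_path = Path(log_dir) / xpid / "logs.csv"
    with logs_path.open(newline="") as f:
        # Each row maps a column header to its (string) value.
        return list(csv.DictReader(f))
```

Each returned row corresponds to one logging step, so the most recent row is `rows[-1]`.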
|
||||
|
||||
### Checkpointing
|
||||
|
||||
**Latest checkpoint:**
|
||||
The latest model checkpoint is saved as `checkpoint.pkl`. The model is checkpointed every `--checkpoint_interval` updates, where each update corresponds to a full rollout and update cycle for each participating agent. For the same number of environment interaction steps, methods may differ in the number of gradient updates performed by participating agents, so checkpointing based on the number of update cycles controls for this potential discrepancy. For example, methods based on Robust PLR, like ACCEL, do not perform student gradient updates every rollout cycle.
|
||||
|
||||
**Archived checkpoints:**
|
||||
Separate archived model checkpoints can be saved at specific intervals by specifying a positive value for the argument `--archive_interval`. For example, setting `--archive_interval=1000` will result in saving model checkpoints every 1000 updates, named `checkpoint_1000.tar`, `checkpoint_2000.tar`, and so on. These archived models are saved in addition to `checkpoint.pkl`, which always stores the latest checkpoint, based on `--checkpoint_interval`.
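
To make the interplay between the two intervals concrete, here is a small sketch. It assumes both intervals are checked independently with a simple modulus on the update count. The function `checkpoint_files_for_update` is hypothetical, and the actual scheduling logic in `minimax` may differ in detail.

```python
def checkpoint_files_for_update(n_updates, checkpoint_interval, archive_interval=0):
    """List the checkpoint files that would be written after update `n_updates`.

    Assumption: an interval of 0 disables the corresponding checkpoint type.
    """
    files = []
    if checkpoint_interval > 0 and n_updates % checkpoint_interval == 0:
        files.append("checkpoint.pkl")  # overwritten with the latest model
    if archive_interval > 0 and n_updates % archive_interval == 0:
        files.append(f"checkpoint_{n_updates}.tar")  # kept as a separate archive
    return files
```

Under these assumptions, with `--checkpoint_interval=100` and `--archive_interval=1000`, update 500 refreshes only `checkpoint.pkl`, while update 1000 also writes `checkpoint_1000.tar`.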
|
||||
|
||||
### Evaluating
|
||||
|
||||
Once training completes, you can evaluate the resulting `checkpoint.pkl` on test environments using `minimax.evaluate`. This script can evaluate an individual checkpoint or group of checkpoints created via training runs with a shared experiment ID prefix (`--xpid` value), e.g. each corresponding to different training seeds of the same experiment configuration. Each checkpoint is evaluated over `--n_episodes` episodes for each of the test environments, specified via a csv string of test environment names passed in via `--env_names`. The evaluation results can be optionally written to a csv file in `--results_path`, if a `--results_fname` is provided.
|
||||
|
||||
[See the docs](docs/evaluate_args.md) on `minimax.evaluate` for a comprehensive guide on how to configure its command-line arguments.
|
||||
|
||||
### Multi-device training
|
||||
|
||||
All autocurriculum algorithms in `minimax` support multi-device training via `shmap` across the environment batch dimension (see the system diagram above). In order to shard rollouts and gradient updates along the environment batch dimension across `N` devices, simply pass `minimax.train` the additional argument `--n_devices=N`. By default, `n_devices=1`.
|
||||
|
||||
|
||||
## 🏝️ Environments
|
||||
|
||||
### Supported environments
|
||||
|
||||
![Maze Overview](docs/images/env_maze_overview.png)
|
||||
|
||||
`minimax` currently includes [`AMaze`](docs/envs/maze.md), a fully-tensorized implementation of the partially-observable maze navigation environments featured in previous UED studies (see example `AMaze` environments in the figure above). The `minimax` implementation of the maze environment fully replicates the original MiniGrid-based dynamics, reward functions, observation space, and action space. See the environment docs for more details.
|
||||
|
||||
We look forward to working with the greater RL community in continually expanding the set of environments integrated with `minimax`.
|
||||
|
||||
### Adding environments
|
||||
|
||||
In order to integrate into `minimax`'s fully-jittable training logic, environments should be implemented in a tensorized fashion via JAX. All environments must implement the `Environment` interface. At a high level, `Environment` subclasses should implement reset and step logic assuming a single environment instance (no environment parallelism). Parallelism is automatically achieved via the training runner logic included with `minimax` (see the [paper]() and system diagram above for a quick overview of how this is performed).
|
||||
|
||||
A key design decision of `minimax` is to separate environment parameters into two groups:
|
||||
|
||||
- **Static parameters** are fixed throughout training. These parameters are frozen hyperparameters defining some unchanging aspect of the underlying environment distribution, e.g. the width, height, or maximum number of walls of maze environments considered during training. These static parameters are encapsulated in an `EnvParams` dataclass.
|
||||
|
||||
- **Free parameters** can change per environment instance (e.g. across each instance in a parallel rollout batch). These parameters might correspond to aspects like the specific wall map defining the maze layout or the starting position of the agent. Free parameters are simply treated as part of the fully-traceable `EnvState`, taking the form of an arbitrary pytree.
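
The split between static and free parameters can be sketched in plain Python as follows. This is an illustration only: `ToyEnvParams`, `ToyEnvState`, and `make_state` are hypothetical names, and the tuples stand in for what would be JAX arrays inside a pytree in an actual `minimax` environment.

```python
from dataclasses import dataclass
from typing import NamedTuple, Tuple


@dataclass(frozen=True)
class ToyEnvParams:
    """Static parameters: fixed for the whole training run."""
    height: int = 3
    width: int = 3
    n_walls: int = 2


class ToyEnvState(NamedTuple):
    """Free parameters plus dynamic state: may differ per environment instance."""
    wall_map: Tuple[Tuple[bool, ...], ...]  # height x width, True where a wall is
    agent_pos: Tuple[int, int]              # (x, y)
    time: int


def make_state(params, wall_positions, agent_pos):
    """Build the state of one instance from its free parameters."""
    wall_map = tuple(
        tuple((x, y) in wall_positions for x in range(params.width))
        for y in range(params.height)
    )
    return ToyEnvState(wall_map=wall_map, agent_pos=agent_pos, time=0)


params = ToyEnvParams()
# Two instances share the same static params but differ in their free parameters.
state_a = make_state(params, {(0, 0), (1, 1)}, agent_pos=(2, 2))
state_b = make_state(params, {(2, 0), (0, 2)}, agent_pos=(1, 1))
```

Keeping the static parameters in a frozen container and everything instance-specific in the state is what lets a single set of parameters be shared across a whole batch of differing instances.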
|
||||
|
||||
All environments supporting the `Environment` interface will interoperate with `DRRunner` and `PLRRunner` (though for ACCEL mode, where `mutation_fn != None`, a mutation operator [must additionally be defined](#environment-operators)).
|
||||
|
||||
#### Environment design with a co-adapting teacher policy
|
||||
|
||||
In PAIRED-style autocurricula, a teacher policy generates environment instances in order to maximize some curriculum objective, e.g. relative regret. The teacher's decision-making process corresponds to its own MDP.
|
||||
|
||||
To support such autocurricula, `minimax` follows the pattern of implementing separate `Environment` subclasses for the student and teacher MDPs. A convenience class called `UEDEnvironment` is then initialized with instances of the student and teacher MDPs. The `UEDEnvironment` instance exposes a unified interface for resetting and stepping the teacher and student, which is then used in the training runner. For example, stepping the `UEDEnvironment` instance for the teacher (via the `step_teacher` method) produces an environment instance, which can then be passed to the `reset_student` method to reset the state of the `UEDEnvironment` object to that particular environment instance. Subsequent calls to the `step_student` method then operate within this environment instance. Following this pattern, integrating a new environment with `PAIREDRunner` requires implementing the corresponding `Environment` subclass for the teacher MDP (the teacher's decision process). See [`minimax/envs/maze/maze_ued.py`](src/minimax/envs/maze/maze_ued.py) for an example based on the maze environment.
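
The teacher-then-student flow described above can be illustrated with a toy, pure-Python stand-in. Only the method names `step_teacher`, `reset_student`, and `step_student` come from the description above. The class `ToyUEDEnvironment`, the `reset_teacher` method, and all signatures are assumptions made for this sketch and do not reflect the actual `UEDEnvironment` API.

```python
class ToyUEDEnvironment:
    """Toy wrapper exposing teacher and student steps behind one interface."""

    def __init__(self, n_cells, n_walls):
        self.n_cells = n_cells  # number of cells in a 1D corridor
        self.n_walls = n_walls  # teacher episode length (one wall per step)

    def reset_teacher(self):
        # Teacher state: the wall positions chosen so far.
        return ()

    def step_teacher(self, teacher_state, action):
        # Each teacher action places one wall; the episode ends after n_walls steps.
        teacher_state = teacher_state + (action % self.n_cells,)
        done = len(teacher_state) == self.n_walls
        return teacher_state, done

    def reset_student(self, env_instance):
        # Student state: agent position plus the teacher-designed walls.
        return {"agent_pos": 0, "walls": frozenset(env_instance)}

    def step_student(self, student_state, action):
        # Action +1/-1 moves along the corridor unless blocked by a wall or edge.
        target = student_state["agent_pos"] + action
        blocked = target in student_state["walls"] or not 0 <= target < self.n_cells
        new_pos = student_state["agent_pos"] if blocked else target
        return {**student_state, "agent_pos": new_pos}


env = ToyUEDEnvironment(n_cells=5, n_walls=2)
teacher_state, done = env.reset_teacher(), False
for action in (3, 4):  # stand-in for actions sampled from a teacher policy
    teacher_state, done = env.step_teacher(teacher_state, action)

student_state = env.reset_student(teacher_state)  # enter the designed instance
for _ in range(3):  # the third move is blocked by the wall at cell 3
    student_state = env.step_student(student_state, +1)
```

The point of the pattern is that the training runner only talks to the wrapper, so the teacher's finished design becomes the student's starting state without the runner knowing anything about the underlying environment.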
|
||||
|
||||
#### Environment operators
|
||||
|
||||
Custom environment operators can also be defined in `minimax`.
|
||||
|
||||
- **Comparators** take two environment instances, as represented by their `EnvState` pytrees and return `True` iff the two instances are deemed equal. If a comparator is registered for an environment, training runners can use the comparator to enforce uniqueness of environment instances for many purposes, e.g. making sure the members of the PLR buffer are all unique.
|
||||
|
||||
- **Mutators** take an environment instance, as represented by an `EnvState` pytree, and apply some modification to the instance, returning the modified (or "mutated") instance. Mutators are used by ACCEL to mutate environment instances in the PLR buffer. New environments seeking integration with the ACCEL mode of the `PLRRunner` should implement and register a default mutation operator.
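
A toy illustration of both operator types is sketched below, using plain dictionaries in place of `EnvState` pytrees. The function names `wall_maps_equal` and `flip_wall` are hypothetical, and the real operators in `minimax` are tensorized JAX functions.

```python
def wall_maps_equal(state_a, state_b):
    """Comparator: two instances are equal iff their wall maps match."""
    return state_a["wall_map"] == state_b["wall_map"]


def flip_wall(state, index):
    """Mutator: return a copy of `state` with one wall cell toggled."""
    wall_map = list(state["wall_map"])
    wall_map[index] = not wall_map[index]
    return {**state, "wall_map": tuple(wall_map)}


parent = {"wall_map": (False, True, False, False)}
child = flip_wall(parent, index=2)

# A buffer that uses the comparator to keep only unique instances.
buffer = [parent]
for candidate in (child, parent):
    if not any(wall_maps_equal(candidate, member) for member in buffer):
        buffer.append(candidate)
```

Here the mutated `child` is admitted to the buffer, while the second copy of `parent` is rejected as a duplicate.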
|
||||
|
||||
#### Registration
|
||||
|
||||
Each new `Environment` subclass should be registered with the `envs` module:
|
||||
|
||||
- **Student environments** should be registered using `envs.registration.register`. See [`envs/maze/maze.py`](src/minimax/envs/maze/maze.py) for an example.
|
||||
|
||||
- **Teacher environments** should be registered using `envs.registration.register_ued`. See [`envs/maze/maze_ued.py`](src/minimax/envs/maze/maze_ued.py) for an example.
|
||||
|
||||
- **Mutators** should be registered using `envs.registration.register_mutator`. See `envs/maze/maze_mutators.py` for an example.
|
||||
|
||||
- **Comparators** should be registered using `envs.registration.register_comparator`. See `envs/maze/maze_comparators.py` for an example.
|
||||
|
||||
## 🤖 Agents
|
||||
|
||||
In `minimax`, *agents* correspond to a particular data-seeking learning algorithm, e.g. PPO. A *model* corresponds to a module that implements the policy (or value function) used by the agent. Any agent that follows the [`Agent`](src/minimax/agents/agent.py) interface should be usable in any `minimax`-compatible environment.
|
||||
|
||||
Model forward passes are assumed to return a tuple of `(value_prediction, policy_logits, carry)`.
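
As a rough sketch of this return convention, the toy model below returns the expected 3-tuple from its forward pass. The class `ToyActorCritic`, its `initialize_carry` method, and the `(obs, carry)` call signature are assumptions for illustration. Real `minimax` models are JAX modules, and their exact signatures are not specified here.

```python
class ToyActorCritic:
    """Toy recurrent model returning (value_prediction, policy_logits, carry)."""

    def __init__(self, n_actions):
        self.n_actions = n_actions

    def initialize_carry(self):
        # A running sum stands in for a recurrent hidden state.
        return 0.0

    def __call__(self, obs, carry):
        carry = carry + sum(obs)
        value_prediction = carry / (1.0 + abs(carry))
        policy_logits = [carry * (i + 1) for i in range(self.n_actions)]
        return value_prediction, policy_logits, carry


model = ToyActorCritic(n_actions=3)
carry = model.initialize_carry()
value, logits, carry = model([0.5, 0.5], carry)
```

Returning the carry alongside the predictions lets the caller thread recurrent state through a rollout explicitly, which is the usual pattern for functional code.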
|
||||
|
||||
#### Registration
|
||||
|
||||
Custom model classes should be registered for the particular environment for which they are designed. See [`models/maze/gridworld_models.py`](src/minimax/models/maze/gridworld_models.py) for an example. After registration, the model can be retrieved via `models.make(env_name, model_name, **model_kwargs)`.
|
||||
|
||||
## 🚀 Roadmap
|
||||
|
||||
Many exciting features are planned for future releases of `minimax`. Features planned for near-term release include:
|
||||
|
||||
- [ ] Add support for [JaxMARL](https://github.com/flairox/jaxmarl) (multi-agent RL environments) via an IPPO mode in `DRRunner` and `PLRRunner`.
|
||||
- [ ] Extend `Parsnip` with methods for composing argument specs across multiple files to reduce the size of `arguments.py` currently used for `train.py`.
|
||||
- [ ] Add support for [Jumanji](https://github.com/instadeepai/jumanji) (combinatorial optimization environments), via an appropriate decorator class.
|
||||
|
||||
You can suggest new features or ways to improve current functionality by creating an issue in this repository.
|
||||
|
||||
## 🪪 License
|
||||
|
||||
`minimax` is licensed under [Apache 2.0](LICENSE).
|
||||
|
||||
## 📜 Citation
|
||||
For attribution in academic contexts, please cite this work as
|
||||
```
|
||||
@inproceedings{jiang2023minimax,
|
||||
title={minimax: Efficient Baselines for Autocurricula in JAX},
|
||||
author={Jiang, Minqi and Dennis, Michael and Grefenstette, Edward and Rocktäschel, Tim},
|
||||
booktitle={Agent Learning in Open-Endedness Workshop at NeurIPS},
|
||||
year={2023}
|
||||
}
|
||||
```
|
126
docs/envs/maze.md
Normal file
|
@ -0,0 +1,126 @@
|
|||
# `AMaze`
|
||||
|
||||
## 🧭 Partially-observable navigation in procedural mazes.
|
||||
|
||||
![Maze Overview](../images/env_maze_overview.png)
|
||||
|
||||
The `AMaze` environment reproduces the MiniGrid-based, partially-observable maze navigation environments featured in previous works. Specifically, `AMaze` provides feature parity with the previous reference implementation of the maze environment in [facebookresearch/dcd](https://github.com/facebookresearch/dcd).
|
||||
|
||||
## Student environment
|
||||
View source: [`envs/maze/maze.py`](../../src/minimax/envs/maze/maze.py)
|
||||
|
||||
### Static EnvParams
|
||||
|
||||
The table below summarizes the configurable static environment parameters of `AMaze` and marks those that can be set via `minimax.train` by default. The corresponding command-line argument is the name of the parameter preceded by the prefix `maze_`, e.g. `maze_n_walls` for specifying `n_walls`.
|
||||
|
||||
Similarly, evaluation parameters can be specified via the prefix `maze_eval_`, e.g. `maze_eval_see_agent` for specifying `see_agent`. Currently, `minimax.train` only accepts `maze_eval_see_agent` and `maze_eval_normalize_obs`.
|
||||
|
||||
Note that `AMaze` treats `height` and `width` as parameterizing only the portion of the maze grid that can vary, and thus excludes the 1-tile wall border surrounding each maze instance. Thus, a 15x15 maze in the prior `MiniGrid`-based implementation corresponds to an `AMaze` parameterization with `height=13` and `width=13`.
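
The size conversion above amounts to removing (or adding back) the 1-tile border on each side. A minimal sketch, using hypothetical helper names that are not part of `minimax`:

```python
def minigrid_to_amaze_size(minigrid_size, border=1):
    """Convert a MiniGrid side length (border included) to the AMaze equivalent."""
    return minigrid_size - 2 * border


def amaze_to_minigrid_size(amaze_size, border=1):
    """Convert an AMaze `height` or `width` back to the MiniGrid side length."""
    return amaze_size + 2 * border
```

For example, `minigrid_to_amaze_size(15)` gives `13`, matching the 15x15 case described above.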
|
||||
|
||||
| Parameter | Description| Command-line support |
|
||||
| - | - | - |
|
||||
| `height` | Height of maze | ✅ |
|
||||
| `width` | Width of maze | ✅ |
|
||||
| `n_walls` | Number of walls to place per maze | ✅ |
|
||||
| `agent_view_size` | Size of forward-facing partial observation seen by agent | ✅ |
|
||||
| `replace_wall_pos` | Wall positions are sampled with replacement if `True` | ✅ |
|
||||
| `see_agent` | Agent sees itself in its partial observation if `True` | ✅ |
|
||||
| `normalize_obs`| Scale observation values to [0,1] if `True`| ✅ |
|
||||
| `sample_n_walls` | Sample # walls placed between [0, `n_walls`] if `True` | ✅ |
|
||||
| `obs_agent_pos` | Include `agent_pos` in the partial observation | ✅ |
|
||||
| `max_episode_steps` | Maximum # steps per episode | ✅ |
|
||||
| `singleton_seed` | Fix the random seed to this value, making the environment a singleton | |
|
||||
|
||||
### State space
|
||||
| Variable | Description|
|
||||
| - | - |
|
||||
| `agent_pos` | Agent's (x,y) position |
|
||||
| `agent_dir` | Agent's orientation vector |
|
||||
| `agent_dir_idx` | Agent's orientation enum |
|
||||
| `goal_pos` | Goal (x,y) position |
|
||||
| `wall_map` | H x W bool tensor, `True` in wall positions |
|
||||
| `maze_map` | Full maze map with all objects for rendering |
|
||||
| `time` | Time step |
|
||||
| `terminal` | `True` iff episode is done |
|
||||
|
||||
|
||||
### Observation space
|
||||
| Variable | Description|
|
||||
| - | - |
|
||||
| `image`| Partial observation seen by agent |
|
||||
| `agent_dir` | Agent's orientation enum |
|
||||
| `agent_pos` | Agent's (x,y) position (not included by default) |
|
||||
|
||||
|
||||
### Action space
|
||||
| Action index | Description|
|
||||
| - | - |
|
||||
| `0` | Left |
|
||||
| `1` | Right |
|
||||
| `2` | Forward |
|
||||
| `3` | Pick up |
|
||||
| `4` | Drop |
|
||||
| `5` | Toggle |
|
||||
| `6` | Done |
|
||||
|
||||
Note that the navigation environments only use actions `0` through `2`; however, all actions are included for parity with the original `MiniGrid`-based environments.
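
The action table can be mirrored as an `IntEnum` for readability when writing scripts against the environment. The names `MazeAction`, `NAVIGATION_ACTIONS`, and `is_navigation_action` below are hypothetical. Only the index-to-meaning mapping is taken from the table above.

```python
from enum import IntEnum


class MazeAction(IntEnum):
    left = 0
    right = 1
    forward = 2
    pick_up = 3
    drop = 4
    toggle = 5
    done = 6


# Only the first three actions are used by the navigation environments.
NAVIGATION_ACTIONS = (MazeAction.left, MazeAction.right, MazeAction.forward)


def is_navigation_action(action_index):
    """Return True if the action index is one used by navigation environments."""
    return MazeAction(action_index) in NAVIGATION_ACTIONS
```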
|
||||
|
||||
|
||||
## Teacher environment
|
||||
View source: [`envs/maze/maze_ued.py`](../../src/minimax/envs/maze/maze_ued.py)
|
||||
|
||||
To support autocurricula generated by a co-adapting teacher policy (e.g. PAIRED), `AMaze` includes `UEDMaze`, which implements the teacher's MDP for designing `Maze` instances. By design, a pair of `Maze` and `UEDMaze` objects (corresponding to a specific setting of `EnvParams`) can be wrapped into a `UEDEnvironment` object for use in a training runner (see `PAIREDRunner` for an example).
|
||||
|
||||
The parameters that can be provided via `minimax.train` by default are marked in the table below. The corresponding command-line argument is the name of the parameter preceded by the prefix `maze_ued_`, e.g. `maze_ued_n_walls` for specifying `n_walls`. Note that when the corresponding `maze_*` and `maze_ued_*` arguments conflict, those specified in `maze_*` take precedence.
|
||||
|
||||
### Static EnvParams
|
||||
| Variable | Description| Command-line support |
|
||||
| - | - | - |
|
||||
| `height` | Height of maze | ✅ |
|
||||
| `width` | Width of maze | ✅ |
|
||||
| `n_walls` | Wall budget | ✅ |
|
||||
| `noise_dim` | Size of noise vector in the observation | ✅ |
|
||||
| `replace_wall_pos` | If `True`, placing an object over an existing wall replaces it. Otherwise, the object is placed in a random unused position. | ✅ |
|
||||
| `fixed_n_wall_steps` | First `n_walls` actions are wall positions if `True`. Otherwise, the first action only determines the fraction of wall budget to use. | ✅ |
|
||||
| `first_wall_pos_sets_budget` | First wall position also determines the fraction of wall budget to use (rather than using a separate first action to separately determine this fraction) | ✅ |
|
||||
| `set_agent_dir` | If `True`, the action in an extra last time step determines the agent's initial orientation index | ✅ |
|
||||
| `normalize_obs` | If `True`, scale observation values to [0,1] | ✅ |
|
||||
|
||||
|
||||
### State space
|
||||
| Variable | Description|
|
||||
| - | - |
|
||||
| `encoding` | A 1D vector encoding the running action sequence of the teacher |
|
||||
| `time` | Current time step |
|
||||
| `terminal` | `True` if the episode is done |
|
||||
|
||||
### Observation space
|
||||
| Variable | Description|
|
||||
| - | - |
|
||||
| `image` | Full `maze_map` of the maze instance under construction |
|
||||
| `time` | Time step |
|
||||
| `noise` | A noise vector sampled from Uniform(0,1) |
|
||||
|
||||
### Action space
|
||||
The action space corresponds to integers in [0,`height*width`]. Each action corresponds to a selected wall location in the flattened maze grid, with the exception of the last two actions, which correspond to the goal position and the agent's starting position. This interpretation of the action sequence can change based on the specific configuration of `EnvParams`:
|
||||
|
||||
- If `params.fixed_n_wall_steps=False`, the first action determines how many walls to place in the current episode, as a fraction of the wall budget `n_walls`.
|
||||
|
||||
- If `params.set_agent_dir=True`, an additional step is appended to the episode, where the action corresponds to the agent's initial orientation index.
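
The default interpretation of the teacher's action sequence can be sketched as below. This is a hedged illustration: it assumes row-major flattening of the grid and that the goal action precedes the agent action. Both helper functions are hypothetical and do not reproduce the actual `UEDMaze` logic.

```python
def action_to_position(action, width):
    """Map a flat grid index to an (x, y) position, assuming row-major order."""
    return action % width, action // width


def decode_teacher_actions(actions, width):
    """Split a full teacher action sequence into walls, goal, and agent start.

    Assumes every action is a wall location except the last two, which give
    the goal position and then the agent's starting position.
    """
    *wall_actions, goal_action, agent_action = actions
    return {
        "walls": [action_to_position(a, width) for a in wall_actions],
        "goal": action_to_position(goal_action, width),
        "agent": action_to_position(agent_action, width),
    }


# Three wall actions followed by the goal and agent actions on a 13-wide grid.
layout = decode_teacher_actions([0, 14, 27, 168, 84], width=13)
```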
|
||||
|
||||
## OOD test environments
|
||||
The `AMaze` module includes the set of OOD, human-designed environments for testing zero-shot transfer from previous studies (See the figure above for a summary of these environments). Several of these environments are procedurally-generated:
|
||||
|
||||
- `Maze-SmallCorridor`
|
||||
- `Maze-LargeCorridor`
|
||||
- `Maze-FourRooms`
|
||||
- `Maze-Crossing`
|
||||
- `Maze-PerfectMaze*`
|
||||
|
||||
The OOD maze environments are defined in [`envs/maze/maze_ood.py`](../../src/minimax/envs/maze/maze_ood.py). They each subclass `Maze` and support customization via the `EnvParams` configuration, e.g. changing the default `height` or `width` values to generate larger or smaller instances.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
110
docs/envs/overcooked.md
Normal file
|
@ -0,0 +1,110 @@
|
|||
# `OvercookedUED`
|
||||
|
||||
## 🍳 Cooperative cooking in procedurally generated Overcooked layouts.
|
||||
|
||||
![Overcooked Overview](../images/Training6x9SmallStylised.png)
|
||||
|
||||
The `OvercookedUED` environment reproduces the classic Overcooked environment described by Carroll et al. (https://github.com/HumanCompatibleAI/overcooked_ai), while adding parallelization across layouts and support for layouts designed by a teacher agent.
|
||||
The observation and action spaces are consistent with the original environment and are therefore not described here.
|
||||
The student environment builds on the implementation in the JaxMARL project: https://github.com/FLAIROx/JaxMARL.
|
||||
|
||||
## Student environment
|
||||
View source: [`envs/overcooked_proc/overcooked.py`](../../src/minimax/envs/overcooked_proc/overcooked.py)
|
||||
|
||||
### Static EnvParams
|
||||
|
||||
As with the `AMaze` environment, the static parameters of the environment are described below.
|
||||
These parameters are used in the same way as those of `AMaze`.
|
||||
Parameters marked with ✅ in the table can be set from the command line.
|
||||
|
||||
| Parameter | Description| Command-line support |
|
||||
| - | - | - |
|
||||
| `height` | Height of Overcooked layout | ✅ |
|
||||
| `width` | Width of Overcooked layout | ✅ |
|
||||
| `h_min` | Minimum height of Overcooked layout | - |
|
||||
| `w_min` | Minimum width of Overcooked layout | - |
|
||||
| `n_walls` | Number of walls to place per Overcooked layout | ✅ |
|
||||
| `replace_wall_pos` | Wall positions are sampled with replacement if `True` | ✅ |
|
||||
| `normalize_obs`| Scale observation values to [0,1] if `True`| ✅ |
|
||||
| `sample_n_walls` | Sample # walls placed between [0, `n_walls`] if `True` | ✅ |
|
||||
| `max_steps` | Steps in Overcooked until termination | ✅ |
|
||||
| `max_episode_steps` | Same as `max_steps` for consistency | ✅ |
|
||||
| `singleton_seed` | Fix the random seed to this value, making the environment a singleton | |
|
||||
|
||||
### State space
|
||||
| Variable | Description|
|
||||
| - | - |
|
||||
| `agent_pos` | Agent's (x,y) position |
|
||||
| `agent_dir` | Agent's orientation vector |
|
||||
| `agent_dir_idx` | Agent's orientation enum |
|
||||
| `agent_inv` | Agent's inventory |
|
||||
| `goal_pos` | Where serving locations are |
|
||||
| `pot_pos` | Where pots are |
|
||||
| `wall_map` | Boolean wall map |
|
||||
| `maze_map` | H x W x 3 map of the full layout |
|
||||
| `bowl_pile_pos` | Where bowl piles are |
|
||||
| `onion_pile_pos` | Where onion piles are |
|
||||
| `time` | Number of steps taken |
|
||||
| `terminal` | `True` iff episode is done |
|
||||
|
||||
|
||||
## Teacher environment
|
||||
View source: [`envs/overcooked_proc/overcooked_ued.py`](../../src/minimax/envs/overcooked_proc/overcooked_ued.py)
|
||||
|
||||
As with `AMaze`, we document the teacher environment below.
|
||||
`UEDOvercooked` is the teacher's MDP for setting the env params described above.
|
||||
As above, its static parameters are summarized in the following table.
|
||||
|
||||
### Static EnvParams
|
||||
| Variable | Description| Command-line support |
|
||||
| - | - | - |
|
||||
| `height` | Height of Overcooked layout | ✅ |
|
||||
| `width` | Width of Overcooked layout | ✅ |
|
||||
| `n_walls` | Wall budget | ✅ |
|
||||
| `noise_dim` | Size of noise vector in the observation | ✅ |
|
||||
| `replace_wall_pos` | If `True`, placing an object over an existing wall replaces it. Otherwise, the object is placed in a random unused position. | ✅ |
|
||||
| `fixed_n_wall_steps` | First `n_walls` actions are wall positions if `True`. Otherwise, the first action only determines the fraction of wall budget to use. | ✅ |
|
||||
| `first_wall_pos_sets_budget` | First wall position also determines the fraction of wall budget to use (rather than using a separate first action to separately determine this fraction) | ✅ |
|
||||
| `use_seq_actions` | Whether to use sequential actions (always `True`) | ✅ |
|
||||
| `normalize_obs` | If `True`, scale observation values to [0,1] | ✅ |
|
||||
| `sample_n_walls` | Whether to sample the number of walls placed | ✅ |
|
||||
| `max_steps` | See above | ✅ |
|
||||
| `singleton_seed` | See above | ✅ |
|
||||
| `max_episode_steps` | See above | ✅ |
|
||||
|
||||
|
||||
### State space
|
||||
| Variable | Description|
|
||||
| - | - |
|
||||
| `encoding` | A 1D vector encoding the running action sequence of the teacher |
|
||||
| `time` | Current time step |
|
||||
| `terminal` | `True` if the episode is done |
|
||||
|
||||
### Observation space
|
||||
| Variable | Description|
|
||||
| - | - |
|
||||
| `image` | Full `maze_map` of the Overcooked instance under construction: hxwx3 |
|
||||
| `time` | Time step |
|
||||
| `noise` | A noise vector sampled from Uniform(0,1) |
|
||||
|
||||
### Action space
|
||||
As in `AMaze`, the action space corresponds to integers in [0,`height*width`]. Each action corresponds to a selected wall location in the flattened grid, with the exception of the last few actions, which place objects in the environment. This interpretation of the action sequence can change based on the specific configuration of `EnvParams`:
|
||||
|
||||
- If `params.fixed_n_wall_steps=False`, the first action determines how many walls to place in the current episode, as a fraction of the wall budget `n_walls`.
|
||||
|
||||
- If `params.set_agent_dir=True`, an additional step is appended to the episode, where the action corresponds to the agent's initial orientation index.
|
||||
|
||||
The actions are:
|
||||
```python
|
||||
class SequentialActions(IntEnum):
|
||||
skip = 0
|
||||
wall = 1
|
||||
goal = 2
|
||||
agent = 3
|
||||
onion = 4
|
||||
soup = 5
|
||||
bowls = 6
|
||||
```
|
||||
|
||||
## OOD test environments
|
||||
We include the five original layouts, along with additional layouts, for OOD testing in [`envs/overcooked_proc/overcooked_ood.py`](../../src/minimax/envs/overcooked_proc/overcooked_ood.py).
|
37
docs/evaluate_args.md
Normal file
|
@ -0,0 +1,37 @@
|
|||
# Command-line usage guide for `minimax.evaluate`
|
||||
|
||||
You can evaluate student agent checkpoints using `minimax.evaluate` as follows:
|
||||
|
||||
```bash
|
||||
python -m minimax.evaluate \
|
||||
--seed 1 \
|
||||
--log_dir <absolute path to log directory> \
|
||||
--xpid_prefix <select checkpoints with xpids matching this prefix> \
|
||||
--env_names <csv string of test environment names> \
|
||||
--n_episodes <number of trials per test environment> \
|
||||
--results_path <path to results folder> \
|
||||
--results_fname <filename of output results csv>
|
||||
```
|
||||
|
||||
Some behaviors of `minimax.evaluate` to be aware of:
|
||||
- This command will search `log_dir` for all experiment directories with names matching `xpid_prefix` and evaluate the checkpoint named `<checkpoint_name>.pkl`.
|
||||
- `minimax.evaluate` assumes xpid values end with a unique index, so that they match the regex `.*_[0-9]+$`.
|
||||
- The results will be averaged over all such checkpoints (at most one checkpoint per matching experiment folder). The `--xpid_prefix` argument is useful for evaluating checkpoints that correspond to the same experimental configuration trained with different seeds, and that thus share an xpid prefix, e.g. `<xpid_prefix>_0`, `<xpid_prefix>_1`, `<xpid_prefix>_2`.
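
The prefix-matching behavior described in the list above can be approximated as follows. This sketch uses only the regex stated above. The function `match_xpids` is hypothetical, and the actual directory search in `minimax.evaluate` may differ in detail.

```python
import re

# xpids are assumed to end with a unique numeric index, e.g. `my_run_0`.
XPID_PATTERN = re.compile(r".*_[0-9]+$")


def match_xpids(dir_names, xpid_prefix):
    """Return directory names that start with the prefix and end in `_<index>`."""
    return sorted(
        name for name in dir_names
        if name.startswith(xpid_prefix) and XPID_PATTERN.match(name)
    )


candidates = ["plr_maze_0", "plr_maze_1", "plr_maze_final", "dr_maze_0"]
matched = match_xpids(candidates, "plr_maze")
```

In this example, `plr_maze_final` is skipped because it does not end in a numeric index, and `dr_maze_0` is skipped because it does not share the prefix.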
|
||||
|
||||
If you would like to evaluate a checkpoint for only a single experiment, specify the full experiment directory name using `--xpid` instead of using `--xpid_prefix`.
|
||||
|
||||
|
||||
## All command-line arguments
|
||||
| Argument | Description |
|
||||
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `seed` | Random seed for evaluation |
|
||||
| `log_dir` | Directory containing experiment folders |
|
||||
| `xpid` | Name of experiment folder, i.e. the experiment ID |
|
||||
| `xpid_prefix` | Evaluate and average results over checkpoints for experiments with experiment IDs matching this prefix (ignores `--xpid` if set) |
|
||||
| `checkpoint_name` | Name of checkpoint to evaluate (in each matching experiment folder) |
|
||||
| `env_names`       | Csv string of test environment names                                                                                             |
|
||||
| `n_episodes`      | Number of trials per test environment                                                                                            |
|
||||
| `agent_idxs` | Indices of student agents to evaluate (csv of indices or `*` for all indices) |
|
||||
| `results_path`    | Path to results folder                                                                                                           |
|
||||
| `results_fname`   | Filename of output results csv                                                                                                   |
|
||||
| `render_mode` | If set, renders the evaluation episode. Requires disabling JIT. Use `'ipython'` if rendering inside an IPython notebook. |
|
BIN
docs/images/OvercookedDCD.png
Normal file
After Width: | Height: | Size: 156 KiB |
BIN
docs/images/Training6x9SmallStylised.pdf
Normal file
BIN
docs/images/Training6x9SmallStylised.png
Normal file
After Width: | Height: | Size: 214 KiB |
BIN
docs/images/env_maze_overview.png
Normal file
After Width: | Height: | Size: 132 KiB |
BIN
docs/images/minimax_logo.png
Normal file
After Width: | Height: | Size: 119 KiB |
BIN
docs/images/minimax_speedups.png
Normal file
After Width: | Height: | Size: 42 KiB |
BIN
docs/images/minimax_speedups_darkmode.png
Normal file
After Width: | Height: | Size: 37 KiB |
BIN
docs/images/minimax_system_diagram.png
Normal file
After Width: | Height: | Size: 230 KiB |
BIN
docs/images/parallel_dcd_overview.png
Normal file
After Width: | Height: | Size: 130 KiB |
28
docs/make_cmd.md
Normal file
|
@ -0,0 +1,28 @@
|
|||
# Generating commands
|
||||
|
||||
The `minimax.config.make_cmd` module enables generating batches of commands from a JSON configuration file, e.g. for running array jobs with Slurm. The JSON should adhere to the following format:
|
||||
- Each key is a valid command-line argument for `minimax.train`.
|
||||
- Each value is a list of values for the corresponding command-line argument. Commands are generated for each combination of command-line argument values.
|
||||
- Boolean values should be specified as 'True' or 'False'.
|
||||
- If a value is specified as `null`, the associated command-line argument is not included in the generated command (and thus would take on the default value specified when defining the argument parser).
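
The expansion rules in the list above amount to a Cartesian product over the per-argument value lists. The sketch below illustrates the idea under stated assumptions: `make_commands` is a hypothetical function, the `--key=value` output format is a guess, and the real `minimax.config.make_cmd` may format and post-process commands differently.

```python
import itertools
import json


def make_commands(config_json, module="minimax.train"):
    """Expand a JSON config of value lists into one command per combination."""
    config = json.loads(config_json)
    keys = list(config)
    # Tolerate a bare value (including a bare `null`) by wrapping it in a list.
    value_lists = [v if isinstance(v, list) else [v] for v in config.values()]
    commands = []
    for combo in itertools.product(*value_lists):
        args = [
            f"--{key}={value}"
            for key, value in zip(keys, combo)
            if value is not None  # `null` means: omit and use the parser default
        ]
        commands.append(" ".join([f"python -m {module}", *args]))
    return commands


example = '{"train_runner": ["dr", "plr"], "n_devices": [1], "archive_interval": [null]}'
commands = make_commands(example)
```

With this example configuration, two commands are produced (one per `train_runner` value), and `archive_interval` is left out of both because its value is `null`.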
|
||||
|
||||
You can try it out by running the following command in your project root directory:
|
||||
|
||||
```
|
||||
python -m minimax.config.make_cmd --config maze/plr
|
||||
```
|
||||
|
||||
The above command will create a directory called `config` in the calling directory with a subdirectory `config/maze` containing configuration files for several autocurriculum methods.
|
||||
|
||||
By default, `minimax.config.make_cmd` searches for configuration files inside `config`. You can create your own JSON configuration files within `config`. If your JSON configuration is located at `config/path/to/my/json`, then you can generate commands with it by calling `python -m minimax.config.make_cmd --config path/to/my/json`.
|
||||
|
||||
## Configuring `wandb`
|
||||
|
||||
If your configuration includes the argument `wandb_project`, then `minimax.config.make_cmd` will look for a JSON dictionary with your credentials at `config/wandb.json`. The expected format of this JSON file is
|
||||
|
||||
```json
|
||||
{
|
||||
"base_url": <URL for wandb API endpoint, e.g. https://api.wandb.ai>,
|
||||
"api_key": <Your wandb API key>
|
||||
}
|
||||
```
|
131
docs/parsnip.md
Normal file
|
@ -0,0 +1,131 @@
|
|||
# `Parsnip`
|
||||
|
||||
## 🥕 `argparse` with conditional argument groups.
|
||||
|
||||
As `minimax.train` is the single point-of-entry for training, its command-line arguments can grow quickly in number with each additional autocurriculum method supported in `minimax`. This complexity arises for several reasons:
|
||||
|
||||
- New components in the form of training runners, environments, agents, and models may require additional arguments
|
||||
- New components may require existing arguments shared with previous components
|
||||
- New components may overload the meaning of existing arguments used by other components
|
||||
|
||||
We use a custom module called `Parsnip` to manage the complexity of specifying and parsing command-line arguments. `Parsnip` supports named argument groups, so new arguments can be added while keeping them explicitly separated into namespaces. Each argument group results in its own kwarg dictionary when parsed.
|
||||
|
||||
`Parsnip` directly builds on `argparse` by adding the notion of a "subparser". Here, a subparser is simply an `argparse` parser responsible for a named argument group. Subparsers enable some useful behavior:
|
||||
- Arguments can be added to the top-level `Parsnip` parser or to a subparser.
|
||||
- Each subparser is initialized with a `name` for its corresponding argument group. All arguments under this subparser will be contained in a nested kwarg dictionary under the key equal to `name`.
|
||||
- Each subparser can be initialized with an optional `prefix`, in which case all command-line arguments added to the subparser will be prepended with the value of `prefix` (see example below), thus creating a namespace for the corresponding argument group.
|
||||
- Subparsers can be added conditionally, based on the specific value of a top-level argument (with support for the wildcard `*`).
|
||||
- After parsing, `Parsnip` produces a kwargs dictionary containing a key:value pair for each top-level argument, plus a nested kwargs dictionary, under the key `<prefix>`, containing the parsed arguments managed by each active subparser initialized with `prefix=<prefix>`.
|
||||
|
||||
Other than these details, `Parsnip`'s interface remains identical to that of `argparse`.
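
To make the idea of prefixed argument groups concrete, here is a minimal sketch that emulates the behavior with plain `argparse`. It does not use or reproduce `Parsnip`'s implementation; the function `parse_with_groups` and the `<prefix>_args` key naming are assumptions chosen to resemble the example output shown later on this page.

```python
import argparse


def parse_with_groups(argv, groups):
    """Parse `argv`, then nest prefixed arguments into per-group dictionaries.

    `groups` maps a prefix to a dict of {argument name: type}.
    (Hypothetical helper for illustration only.)
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--kind", type=str, choices=["apple", "radish"])
    for prefix, specs in groups.items():
        for arg_name, arg_type in specs.items():
            # The prefix is prepended to create a namespace for the group
            parser.add_argument(f"--{prefix}_{arg_name}", type=arg_type)

    flat = vars(parser.parse_args(argv))
    nested = {}
    for key, value in flat.items():
        for prefix in groups:
            if key.startswith(prefix + "_"):
                group = nested.setdefault(f"{prefix}_args", {})
                group[key[len(prefix) + 1:]] = value
                break
        else:
            # Not owned by any group: keep it as a top-level argument
            nested[key] = value
    return nested


result = parse_with_groups(
    ["--kind", "radish", "--crop_n_acres", "150"],
    {"crop": {"n_acres": int}},
)
print(result)
```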
|
||||
|
||||
## A minimal example
|
||||
In this example, we assume the parser is used inside a script called `run.py`.
|
||||
|
||||
```python
|
||||
from util.parsnip import Parsnip
|
||||
|
||||
# Create a new Parsnip parser
|
||||
parser = Parsnip()
|
||||
|
||||
# Add some top-level arguments (same as argparse)
|
||||
parser.add_argument(
|
||||
'--name',
|
||||
type=str,
|
||||
help='Name of my farm.')
|
||||
parser.add_argument(
|
||||
'--kind',
|
||||
type=str,
|
||||
choices=['apple', 'radish'],
|
||||
help='What kind of farm I run.')
|
||||
parser.add_argument(
|
||||
'--n_acres',
|
||||
    type=int,
|
||||
help='Size of my farm in acres.')
|
||||
|
||||
# Create a nested argument group with a prefix
|
||||
crop_subparser = parser.add_subparser(name='crop', prefix='crop')
|
||||
crop_subparser.add_argument(
    '--n_acres',
    type=int,
    help='Size of land for growing crops, in acres.')
|
||||
|
||||
# Create a conditional argument group
|
||||
radish_subparser = parser.add_subparser(
|
||||
name='radish',
|
||||
prefix='radish',
|
||||
    dependency={'kind': 'radish'},
|
||||
dest='crop')
|
||||
radish_subparser.add_argument(
|
||||
    '--is_pickled',
|
||||
type=str2bool,
|
||||
default=False,
|
||||
help='Whether my farm produces pickled radish.')
|
||||
|
||||
# Create another conditional argument group
|
||||
apple_subparser = parser.add_subparser(
|
||||
name='apple',
|
||||
prefix='apple',
|
||||
    dependency={'kind': 'apple'},
|
||||
dest='crop')
|
||||
apple_subparser.add_argument(
|
||||
    '--kind',
|
||||
type=str,
|
||||
choices=['fuji', 'mcintosh'],
|
||||
default='fuji',
|
||||
    help='What kind of apple my farm grows.')
|
||||
|
||||
args = parser.parse_args()
|
||||
```
|
||||
|
||||
Then running this command
|
||||
|
||||
```bash
|
||||
python run.py \
|
||||
--name 'Radelicious Farms' \
|
||||
--kind radish \
|
||||
--n_acres 200 \
|
||||
--crop_n_acres 150 \
|
||||
--radish_is_pickled True
|
||||
```
|
||||
|
||||
would produce this kwargs dictionary:
|
||||
|
||||
```python
|
||||
{
|
||||
'name': 'Radelicious Farms',
|
||||
'kind': 'radish',
|
||||
'n_acres': 200,
|
||||
'crop_args': {
|
||||
'n_acres': 150,
|
||||
'is_pickled': True
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Notice how the `prefix` for each subparser is prepended to the name of each argument added to that subparser (e.g. `n_acres` became `crop_n_acres`, and `is_pickled` became `radish_is_pickled`). Also notice that the `radish_is_pickled` argument became active because its subparser is conditioned on `kind=radish`, as we specified when defining `radish_subparser`.
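
The activation rule can be sketched as a small predicate. This is an illustration under stated assumptions, not `Parsnip`'s code: it assumes a dependency is a dictionary mapping a top-level argument name to its required value, that `*` matches any value, and `is_active` is a hypothetical helper name.

```python
def is_active(dependency, top_level_args):
    """Return True if every dependency entry is satisfied by the parsed top-level arguments."""
    return all(
        expected == "*" or top_level_args.get(name) == expected
        for name, expected in dependency.items()
    )


top_level = {"name": "Radelicious Farms", "kind": "radish", "n_acres": 200}

radish_active = is_active({"kind": "radish"}, top_level)  # dependency satisfied
apple_active = is_active({"kind": "apple"}, top_level)    # dependency not satisfied
any_kind_active = is_active({"kind": "*"}, top_level)     # wildcard matches any value
```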
|
||||
|
||||
Likewise, running this command
|
||||
|
||||
```bash
|
||||
python run.py \
|
||||
--name 'Appledores Farms' \
|
||||
--kind apple \
|
||||
--n_acres 200 \
|
||||
--crop_n_acres 150 \
|
||||
--apple_kind fuji
|
||||
```
|
||||
|
||||
results in this kwargs dictionary:
|
||||
|
||||
```python
|
||||
{
|
||||
'name': 'Appledores Farms',
|
||||
'kind': 'apple',
|
||||
'n_acres': 200,
|
||||
'crop_args': {
|
||||
'n_acres': 150,
|
||||
'kind': 'fuji'
|
||||
}
|
||||
}
|
||||
```
|
125
docs/train_args.md
Normal file
|
@ -0,0 +1,125 @@
|
|||
# Command-line usage guide for `minimax.train`
|
||||
|
||||
Parsing command-line arguments is handled by [`Parsnip`](parsnip.md).
|
||||
|
||||
You can quickly generate batches of training commands from a JSON configuration file using [`minimax.config.make_cmd`](make_cmd.md).
|
||||
|
||||
## General arguments
|
||||
|
||||
| Argument | Description |
|
||||
| ----------------------- | ---------------------------------------------------------------------------------------------------- |
|
||||
| `seed` | Random seed, should be unique per experimental run |
|
||||
| `agent_rl_algo` | Base RL algorithm used for training (e.g. PPO) |
|
||||
| `n_total_updates` | Total number of updates for the training run |
|
||||
| `train_runner` | Which training runner to use, e.g. `dr`, `plr`, or `paired` |
|
||||
| `n_devices` | Number of devices over which to shard the environment batch dimension |
|
||||
| `n_students` | Number of students in the autocurriculum |
|
||||
| `n_parallel` | Number of parallel environments |
|
||||
| `n_eval` | Number of parallel trials per environment (environment batch dimension is then `n_parallel*n_eval`) |
|
||||
| `n_rollout_steps` | Number of steps per rollout (used for each update cycle) |
|
||||
| `lr` | Learning rate |
|
||||
| `lr_final` | Final learning rate, based on linear schedule. Defaults to `None`, corresponding to no schedule. |
|
||||
| `lr_anneal_steps` | Number of steps over which to linearly anneal from `lr` to `lr_final` |
|
||||
| `student_value_loss_coef` | Value loss coefficient |
|
||||
| `student_entropy_coef` | Entropy bonus coefficient |
|
||||
| `student_n_unroll_update` | Unroll multi-gradient updates this many times (can lead to speed ups) |
|
||||
| `max_grad_norm` | Clip gradients beyond this magnitude |
|
||||
| `adam_eps` | Value of $`\epsilon`$ numerical stability constant for Adam |
|
||||
| `discount` | Discount factor $`\gamma`$ for the student's RL optimization |
|
||||
| `n_unroll_rollout` | Unroll rollout scans this many times (can lead to speed ups) |
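
The interaction between `lr`, `lr_final`, and `lr_anneal_steps` can be sketched as follows. This is a plain-Python illustration of a linear schedule, not the code used by `minimax.train`; it assumes that `lr_final=None` or `lr_anneal_steps=0` means a constant learning rate, and `linear_lr` is a hypothetical name.

```python
def linear_lr(step, lr, lr_final=None, lr_anneal_steps=0):
    """Linearly interpolate from `lr` to `lr_final` over `lr_anneal_steps` steps."""
    if lr_final is None or lr_anneal_steps <= 0:
        # No schedule: the learning rate stays constant
        return lr
    fraction = min(step, lr_anneal_steps) / lr_anneal_steps
    return lr + fraction * (lr_final - lr)


start = linear_lr(0, lr=1e-3, lr_final=1e-4, lr_anneal_steps=100)
middle = linear_lr(50, lr=1e-3, lr_final=1e-4, lr_anneal_steps=100)
end = linear_lr(500, lr=1e-3, lr_final=1e-4, lr_anneal_steps=100)
constant = linear_lr(500, lr=1e-3)
```

Past `lr_anneal_steps`, the sketch holds the rate at `lr_final` rather than extrapolating.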
|
||||
|
||||
## Logging arguments
|
||||
|
||||
| Argument | Description |
|
||||
| ------------------- | -------------------------------------------------------- |
|
||||
| `verbose` | Print training logs to the console if `True` |
|
||||
| `track_env_metrics` | Track per rollout batch environment metrics if `True` |
|
||||
| `log_dir` | Path to directory storing all experiment folders |
|
||||
| `xpid` | Unique name for experiment folder, stored in `--log_dir` |
|
||||
| `log_interval` | Log training statistics every this many rollout cycles |
|
||||
| `wandb_base_url` | Base API URL if logging with `wandb` |
|
||||
| `wandb_api_key` | API key for `wandb` |
|
||||
| `wandb_entity` | `wandb` entity associated with the experiment run |
|
||||
| `wandb_project` | `wandb` project for the experiment run |
|
||||
| `wandb_group` | `wandb` group for the experiment run |
|
||||
|
||||
## Checkpointing arguments
|
||||
|
||||
| Argument | Description |
|
||||
| ---------------------- | ----------------------------------------------------------------------------- |
|
||||
| `checkpoint_interval` | Save the latest checkpoint every this many rollout cycles |
|
||||
| `from_last_checkpoint` | Begin training from latest `checkpoint.pkl`, if any, in the experiment folder |
|
||||
| `archive_interval` | Save an additional archived checkpoint every this many rollout cycles |
|
||||
|
||||
## Evaluation arguments
|
||||
|
||||
| Argument | Description |
|
||||
| ----------------- | -------------------------------------------------------------------- |
|
||||
| `test_env_names` | Comma-separated list of test environment names, e.g. `Maze-SixteenRooms,Maze-Labyrinth` |
|
||||
| `test_n_episodes` | Average test results over this many episodes per test environment |
|
||||
| `test_agent_idxs` | Test agents at these indices (csv of indices or `*` for all indices) |
|
||||
|
||||
## PPO arguments
|
||||
|
||||
These arguments activate when `--agent_rl_algo=ppo`.
|
||||
|
||||
| Argument | Description |
|
||||
| ----------------------------- | ----------------------------------------------------------- |
|
||||
| `student_ppo_n_epochs` | Number of PPO epochs per update cycle |
|
||||
| `student_ppo_n_minibatches` | Number of minibatches per PPO epoch |
|
||||
| `student_ppo_clip_eps` | Clip coefficient for PPO |
|
||||
| `student_ppo_clip_value_loss` | Perform value clipping if `True` |
|
||||
| `student_gae_lambda` | Lambda discount factor for Generalized Advantage Estimation |
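
For readers unfamiliar with how the discount factor and the GAE lambda interact, here is a minimal sketch of Generalized Advantage Estimation over a single rollout. It is a textbook formulation written in plain Python, not the `minimax` implementation, and it assumes the rollout contains no episode terminations; `compute_gae` is a hypothetical name.

```python
def compute_gae(rewards, values, last_value, discount, gae_lambda):
    """Return GAE advantages for one rollout without episode boundaries.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error
        delta = rewards[t] + discount * next_value - values[t]
        # Exponentially weighted sum of future TD errors
        running = delta + discount * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


single_step = compute_gae([1.0], [0.5], last_value=0.0, discount=0.99, gae_lambda=0.95)
two_steps = compute_gae([0.0, 1.0], [0.0, 0.0], last_value=0.0, discount=1.0, gae_lambda=1.0)
```

With `gae_lambda=1.0` the estimate reduces to the discounted return minus the value estimate, and with `gae_lambda=0.0` it reduces to the one-step TD error.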
|
||||
|
||||
## PAIRED arguments
|
||||
|
||||
The arguments in this section activate when `--train_runner=paired`.
|
||||
|
||||
| Argument | Description |
|
||||
| ------------------------- | --------------------------------------------------------------------- |
|
||||
| `teacher_lr` | Learning rate |
|
||||
| `teacher_lr_final` | Anneal learning rate to this value (defaults to `teacher_lr`) |
|
||||
| `teacher_lr_anneal_steps` | Number of steps over which to linearly anneal from `teacher_lr` to `teacher_lr_final` |
|
||||
| `teacher_discount` | Discount factor, $`\gamma`$ |
|
||||
| `teacher_value_loss_coef` | Value loss coefficient |
|
||||
| `teacher_entropy_coef` | Entropy bonus coefficient |
|
||||
| `teacher_n_unroll_update` | Unroll multi-gradient updates this many times (can lead to speed ups) |
|
||||
| `ued_score` | Name of UED objective, e.g. `relative_regret` |
|
||||
|
||||
These PPO-specific arguments for teacher optimization further activate when `--agent_rl_algo=ppo`.
|
||||
|
||||
| Argument | Description |
|
||||
| ----------------------------- | ----------------------------------------------------------- |
|
||||
| `teacher_ppo_n_epochs` | Number of PPO epochs per update cycle |
|
||||
| `teacher_ppo_n_minibatches` | Number of minibatches per PPO epoch |
|
||||
| `teacher_ppo_clip_eps` | Clip coefficient for PPO |
|
||||
| `teacher_ppo_clip_value_loss` | Perform value clipping if `True` |
|
||||
| `teacher_gae_lambda` | Lambda discount factor for Generalized Advantage Estimation |
|
||||
|
||||
## PLR arguments
|
||||
|
||||
The arguments in this section activate when `--train_runner=plr`.
|
||||
|
||||
| Argument | Description |
|
||||
| ----------------------------- | ------------------------------------------------------------------------------------------------------------- |
|
||||
| `ued_score` | Name of UED objective (aka PLR scoring function) |
|
||||
| `plr_replay_prob` | Replay probability |
|
||||
| `plr_buffer_size` | Size of level replay buffer |
|
||||
| `plr_staleness_coef` | Staleness coefficient |
|
||||
| `plr_temp` | Score distribution temperature |
|
||||
| `plr_use_score_ranks` | Use rank-based prioritization (rather than proportional) |
|
||||
| `plr_min_fill_ratio` | Only replay once level replay buffer is filled above this ratio |
|
||||
| `plr_use_robust_plr` | Use robust PLR (i.e. only update policy on replay levels) |
|
||||
| `plr_force_unique` | Force level replay buffer members to be unique |
|
||||
| `plr_use_parallel_eval` | Use Parallel PLR or Parallel ACCEL (if `plr_mutation_fn` is set) |
|
||||
| `plr_mutation_fn` | If set, PLR becomes ACCEL. Use `'default'` for default mutation operator per environment. |
|
||||
| `plr_n_mutations` | Number of applications of `plr_mutation_fn` per mutation cycle. |
|
||||
| `plr_mutation_criterion` | How replay levels are selected for mutation (e.g. `batch`, `easy`, `hard`). |
|
||||
| `plr_mutation_subsample_size` | Number of replay levels selected for mutation according to the criterion (ignored if using `batch` criterion) |
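
To show how `plr_temp`, `plr_staleness_coef`, and `plr_use_score_ranks` could combine into a replay distribution, here is a sketch that follows the general formulation of Prioritized Level Replay. It is an illustration only and is not taken from the `minimax` source; the function name and argument layout are assumptions.

```python
def replay_distribution(scores, last_seen, current_step, temp,
                        staleness_coef, use_score_ranks=True):
    """Mix a score-based distribution with a staleness-based distribution."""
    n = len(scores)
    if use_score_ranks:
        # Rank 1 goes to the highest score; weight is 1 / rank
        order = sorted(range(n), key=lambda i: -scores[i])
        ranks = [0] * n
        for rank, index in enumerate(order, start=1):
            ranks[index] = rank
        base = [1.0 / rank for rank in ranks]
    else:
        base = list(scores)

    # A lower temperature sharpens the distribution towards high scores
    weights = [b ** (1.0 / temp) for b in base]
    total = sum(weights)
    p_score = [w / total for w in weights]

    # Levels not sampled recently receive more staleness mass
    staleness = [current_step - t for t in last_seen]
    staleness_total = sum(staleness)
    if staleness_total > 0:
        p_stale = [s / staleness_total for s in staleness]
    else:
        p_stale = [1.0 / n] * n

    return [
        (1.0 - staleness_coef) * ps + staleness_coef * pc
        for ps, pc in zip(p_score, p_stale)
    ]


probs = replay_distribution(
    scores=[0.1, 0.9, 0.5],
    last_seen=[5, 5, 5],
    current_step=10,
    temp=0.1,
    staleness_coef=0.3,
)
```

In this sketch, `plr_replay_prob` would decide separately whether a given update samples from this distribution at all or generates new levels instead.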
|
||||
|
||||
## Environment-specific arguments
|
||||
|
||||
### Maze
|
||||
|
||||
See the [`AMaze`](envs/maze.md) docs for details on how to specify [training](envs/maze.md#student-environment), [evaluation](envs/maze.md#student-environment), and [teacher-specific](envs/maze.md#teacher-environment) environment parameters via the command line.
|
10
requirements.txt
Normal file
|
@ -0,0 +1,10 @@
|
|||
numpy>=1.25,<1.26
|
||||
pandas==1.5.3
|
||||
jax>=0.4.19
|
||||
jaxlib>=0.4.19
|
||||
flax>=0.7.4
|
||||
optax>=0.1.7
|
||||
chex>=0.1.83
|
||||
wandb>=0.13
|
||||
ipython>=7.34.0
|
||||
GitPython>=3.1.29
|
0
src/__init__.py
Normal file
72
src/baseline_train__8_seeds.sh
Executable file
|
@ -0,0 +1,72 @@
|
|||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
layout=$2
|
||||
|
||||
seed_max=8
|
||||
|
||||
for seed in $(seq "${seed_max}");
|
||||
do
|
||||
echo "seed is ${seed}:"
|
||||
CUDA_VISIBLE_DEVICES=${device} XLA_PYTHON_CLIENT_MEM_FRACTION=.40 LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.train \
|
||||
--wandb_mode=online \
|
||||
--wandb_project=overcooked-minimax-jax \
|
||||
--wandb_entity=${WANDB_ENTITY} \
|
||||
--seed=${seed} \
|
||||
--agent_rl_algo=ppo \
|
||||
--n_total_updates=1000 \
|
||||
--train_runner=dr \
|
||||
--n_devices=1 \
|
||||
--student_model_name=default_student_actor_cnn \
|
||||
--student_critic_model_name=default_student_critic_cnn \
|
||||
--env_name=Overcooked \
|
||||
--is_multi_agent=True \
|
||||
--verbose=False \
|
||||
--log_dir=~/logs/minimax \
|
||||
--log_interval=10 \
|
||||
--from_last_checkpoint=False \
|
||||
--checkpoint_interval=25 \
|
||||
--archive_interval=25 \
|
||||
--archive_init_checkpoint=False \
|
||||
--test_interval=50 \
|
||||
--n_students=1 \
|
||||
--n_parallel=100 \
|
||||
--n_eval=1 \
|
||||
--n_rollout_steps=400 \
|
||||
--lr=3e-4 \
|
||||
--lr_anneal_steps=0 \
|
||||
--max_grad_norm=0.5 \
|
||||
--adam_eps=1e-05 \
|
||||
--track_env_metrics=True \
|
||||
--discount=0.99 \
|
||||
--n_unroll_rollout=10 \
|
||||
--render=False \
|
||||
--student_gae_lambda=0.95 \
|
||||
--student_entropy_coef=0.01 \
|
||||
--student_value_loss_coef=0.5 \
|
||||
--student_n_unroll_update=5 \
|
||||
--student_ppo_n_epochs=5 \
|
||||
--student_ppo_n_minibatches=1 \
|
||||
--student_ppo_clip_eps=0.2 \
|
||||
--student_ppo_clip_value_loss=True \
|
||||
--student_hidden_dim=64 \
|
||||
--student_n_hidden_layers=3 \
|
||||
--student_n_conv_layers=3 \
|
||||
--student_n_conv_filters=32 \
|
||||
--student_n_scalar_embeddings=4 \
|
||||
--student_scalar_embed_dim=5 \
|
||||
--student_agent_kind=mappo \
|
||||
--overcooked_height=6 \
|
||||
--overcooked_width=9 \
|
||||
--overcooked_n_walls=15 \
|
||||
--overcooked_replace_wall_pos=True \
|
||||
--overcooked_sample_n_walls=True \
|
||||
--overcooked_normalize_obs=True \
|
||||
--overcooked_max_steps=400 \
|
||||
--overcooked_random_reset=False \
|
||||
--overcooked_fix_to_single_layout=${layout} \
|
||||
--n_shaped_reward_steps=3000000 \
|
||||
--test_n_episodes=10 \
|
||||
--test_env_names=Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9 \
|
||||
--overcooked_test_normalize_obs=True \
|
||||
--xpid=8SEED_${seed}_dr-overcookedNonexNonewNone_fs_FIX${layout}_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr3e-5g0.99cv0.5ce0.01e5mb1l0.95_pc0.2_h64cf32fc2se5ba_re_0
|
||||
done
|
8
src/baseline_train__all.sh
Executable file
|
@ -0,0 +1,8 @@
|
|||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
|
||||
./baseline_train__8_seeds.sh $device coord_ring_6_9
|
||||
./baseline_train__8_seeds.sh $device counter_circuit_6_9
|
||||
./baseline_train__8_seeds.sh $device forced_coord_6_9
|
||||
./baseline_train__8_seeds.sh $device cramped_room_6_9
|
||||
./baseline_train__8_seeds.sh $device asymm_advantages_6_9
|
71
src/baseline_train__holdout_sp.sh
Executable file
|
@ -0,0 +1,71 @@
|
|||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
|
||||
seed=42
|
||||
|
||||
for layout in "coord_ring_6_9" "forced_coord_6_9" "cramped_room_6_9" "asymm_advantages_6_9" "counter_circuit_6_9";
|
||||
do
|
||||
echo "layout is ${layout}:"
|
||||
CUDA_VISIBLE_DEVICES=${device} XLA_PYTHON_CLIENT_MEM_FRACTION=.40 LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.train \
|
||||
--wandb_mode=online \
|
||||
--wandb_project=overcooked-minimax-jax \
|
||||
--wandb_entity=${WANDB_ENTITY} \
|
||||
--seed=${seed} \
|
||||
--agent_rl_algo=ppo \
|
||||
--n_total_updates=1000 \
|
||||
--train_runner=dr \
|
||||
--n_devices=1 \
|
||||
--student_model_name=default_student_actor_cnn \
|
||||
--student_critic_model_name=default_student_critic_cnn \
|
||||
--env_name=Overcooked \
|
||||
--is_multi_agent=True \
|
||||
--verbose=False \
|
||||
--log_dir=~/logs/minimax \
|
||||
--log_interval=10 \
|
||||
--from_last_checkpoint=False \
|
||||
--checkpoint_interval=25 \
|
||||
--archive_interval=25 \
|
||||
--archive_init_checkpoint=False \
|
||||
--test_interval=50 \
|
||||
--n_students=1 \
|
||||
--n_parallel=100 \
|
||||
--n_eval=1 \
|
||||
--n_rollout_steps=400 \
|
||||
--lr=3e-4 \
|
||||
--lr_anneal_steps=0 \
|
||||
--max_grad_norm=0.5 \
|
||||
--adam_eps=1e-05 \
|
||||
--track_env_metrics=True \
|
||||
--discount=0.99 \
|
||||
--n_unroll_rollout=10 \
|
||||
--render=False \
|
||||
--student_gae_lambda=0.95 \
|
||||
--student_entropy_coef=0.01 \
|
||||
--student_value_loss_coef=0.5 \
|
||||
--student_n_unroll_update=5 \
|
||||
--student_ppo_n_epochs=5 \
|
||||
--student_ppo_n_minibatches=1 \
|
||||
--student_ppo_clip_eps=0.2 \
|
||||
--student_ppo_clip_value_loss=True \
|
||||
--student_hidden_dim=64 \
|
||||
--student_n_hidden_layers=3 \
|
||||
--student_n_conv_layers=3 \
|
||||
--student_n_conv_filters=32 \
|
||||
--student_n_scalar_embeddings=4 \
|
||||
--student_scalar_embed_dim=5 \
|
||||
--student_agent_kind=mappo \
|
||||
--overcooked_height=6 \
|
||||
--overcooked_width=9 \
|
||||
--overcooked_n_walls=15 \
|
||||
--overcooked_replace_wall_pos=True \
|
||||
--overcooked_sample_n_walls=True \
|
||||
--overcooked_normalize_obs=True \
|
||||
--overcooked_max_steps=400 \
|
||||
--overcooked_random_reset=False \
|
||||
--overcooked_fix_to_single_layout=${layout} \
|
||||
--n_shaped_reward_steps=3000000 \
|
||||
--test_n_episodes=10 \
|
||||
--test_env_names=Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9 \
|
||||
--overcooked_test_normalize_obs=True \
|
||||
--xpid=9SEED_${seed}_dr-overcookedNonexNonewNone_fs_FIX${layout}_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr3e-5g0.99cv0.5ce0.01e5mb1l0.95_pc0.2_h64cf32fc2se5ba_re_0
|
||||
done
|
73
src/config/configs/maze/accel.json
Normal file
|
@ -0,0 +1,73 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.8],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [false],
|
||||
"plr_force_unique": [true],
|
||||
"plr_mutation_fn": ["default"],
|
||||
"plr_n_mutations": [20],
|
||||
"plr_mutation_criterion": ["batch"],
|
||||
"plr_mutation_subsample_size": [4],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [0],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
59
src/config/configs/maze/dr.json
Normal file
|
@ -0,0 +1,59 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["dr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
73
src/config/configs/maze/paccel.json
Normal file
|
@ -0,0 +1,73 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.8],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"plr_mutation_fn": ["default"],
|
||||
"plr_n_mutations": [10],
|
||||
"plr_mutation_criterion": ["batch"],
|
||||
"plr_mutation_subsample_size": [4],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [0],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
84
src/config/configs/maze/paired.json
Normal file
|
@ -0,0 +1,84 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.995],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.05],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [5],
|
||||
"teacher_ppo_n_minibatches": [1],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [256],
|
||||
"teacher_hidden_dim": [32],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [false],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"maze_ued_replace_wall_pos": [true],
|
||||
"maze_ued_fixed_n_wall_steps": [true],
|
||||
"maze_ued_first_wall_pos_sets_budget": [false],
|
||||
"maze_ued_noise_dim": [50],
|
||||
"maze_ued_n_walls": [60],
|
||||
"maze_ued_set_agent_dir": [false],
|
||||
"maze_ued_normalize_obs": [true],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
69
src/config/configs/maze/plr.json
Normal file
|
@ -0,0 +1,69 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
    "log_interval": [10],
    "from_last_checkpoint": [false],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [5e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "ued_score": ["max_mc"],
    "plr_replay_prob": [0.5],
    "plr_buffer_size": [4000],
    "plr_staleness_coef": [0.3],
    "plr_temp": [0.1],
    "plr_use_score_ranks": [true],
    "plr_min_fill_ratio": [0.5],
    "plr_use_robust_plr": [true],
    "plr_use_parallel_eval": [false],
    "plr_force_unique": [true],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.0],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["lstm"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [60],
    "maze_replace_wall_pos": [true],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
69
src/config/configs/maze/pplr.json
Normal file
|
@ -0,0 +1,69 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [30000],
    "train_runner": ["plr"],
    "n_devices": [1],
    "student_model_name": ["default_student_cnn"],
    "env_name": ["Maze"],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [false],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [0.0001],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.999],
    "n_unroll_rollout": [10],
    "render": [false],
    "ued_score": ["max_mc"],
    "plr_replay_prob": [0.5],
    "plr_buffer_size": [4000],
    "plr_staleness_coef": [0.3],
    "plr_temp": [0.3],
    "plr_use_score_ranks": [true],
    "plr_min_fill_ratio": [0.5],
    "plr_use_robust_plr": [true],
    "plr_use_parallel_eval": [true],
    "plr_force_unique": [true],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.0],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["lstm"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [60],
    "maze_replace_wall_pos": [true],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
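Every value under `"args"` in these configs is wrapped in a list, which suggests that each file describes a grid of runs rather than a single run. The sketch below is one way such a grid could be expanded; it is an illustration under that assumption, not the repository's own tooling. The helper names `expand_grid` and `to_flags`, the `--name=value` flag format, and the two-seed example are all hypothetical.

```python
import itertools
import json


def expand_grid(config):
    """Yield one flat {name: value} dict per combination of the listed values."""
    args = config["args"]
    names = list(args)
    for combo in itertools.product(*(args[name] for name in names)):
        yield dict(zip(names, combo))


def to_flags(run_args):
    """Render one combination as hypothetical --name=value flags."""
    flags = []
    for name, value in run_args.items():
        # JSON booleans are written back as true/false rather than True/False.
        text = json.dumps(value) if isinstance(value, bool) else str(value)
        flags.append(f"--{name}={text}")
    return flags


# Hypothetical excerpt: two seeds are listed here only to show the expansion;
# the configs in this listing all use a single seed.
example = json.loads(
    '{"args": {"seed": [1, 2], "lr": [0.0001], "plr_use_parallel_eval": [true]}}'
)
runs = list(expand_grid(example))
for run in runs:
    print(" ".join(to_flags(run)))
```

With single-element lists, as in the files here, the expansion yields exactly one run per config.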
78
src/config/configs/maze/s5_accel.json
Normal file
|
@ -0,0 +1,78 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [30000],
    "train_runner": ["plr"],
    "n_devices": [1],
    "student_model_name": ["default_student_cnn"],
    "env_name": ["Maze"],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [true],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.999],
    "n_unroll_rollout": [10],
    "render": [false],
    "ued_score": ["max_mc"],
    "plr_replay_prob": [0.8],
    "plr_buffer_size": [4000],
    "plr_staleness_coef": [0.3],
    "plr_temp": [0.3],
    "plr_use_score_ranks": [true],
    "plr_min_fill_ratio": [0.5],
    "plr_use_robust_plr": [true],
    "plr_use_parallel_eval": [false],
    "plr_force_unique": [true],
    "plr_mutation_fn": ["default"],
    "plr_n_mutations": [10],
    "plr_mutation_criterion": ["batch"],
    "plr_mutation_subsample_size": [4],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["post"],
    "student_s5_activation": ["half_glu1"],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [0],
    "maze_replace_wall_pos": [true],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "test_agent_idxs": ["\"*\""],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
63
src/config/configs/maze/s5_dr.json
Normal file
|
@ -0,0 +1,63 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [30000],
    "train_runner": ["dr"],
    "n_devices": [1],
    "student_model_name": ["default_student_cnn"],
    "env_name": ["Maze"],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [true],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["post"],
    "student_s5_activation": ["half_glu1"],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [60],
    "maze_replace_wall_pos": [true],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
77
src/config/configs/maze/s5_paccel.json
Normal file
|
@ -0,0 +1,77 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [30000],
    "train_runner": ["plr"],
    "n_devices": [1],
    "student_model_name": ["default_student_cnn"],
    "env_name": ["Maze"],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [true],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [1e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.999],
    "n_unroll_rollout": [10],
    "render": [false],
    "ued_score": ["max_mc"],
    "plr_replay_prob": [0.8],
    "plr_buffer_size": [4000],
    "plr_staleness_coef": [0.3],
    "plr_temp": [0.3],
    "plr_use_score_ranks": [true],
    "plr_min_fill_ratio": [0.5],
    "plr_use_robust_plr": [true],
    "plr_use_parallel_eval": [true],
    "plr_force_unique": [true],
    "plr_mutation_fn": ["default"],
    "plr_n_mutations": [20],
    "plr_mutation_criterion": ["batch"],
    "plr_mutation_subsample_size": [4],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.0],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["post"],
    "student_s5_activation": ["half_glu1"],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [0],
    "maze_replace_wall_pos": [true],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
94
src/config/configs/maze/s5_paired.json
Normal file
|
@ -0,0 +1,94 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [30000],
    "train_runner": ["paired"],
    "n_devices": [1],
    "student_model_name": ["default_student_cnn"],
    "env_name": ["Maze"],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [true],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [2],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [0.0001],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "ued_score": ["relative_regret"],
    "student_gae_lambda": [0.98],
    "teacher_discount": [0.995],
    "teacher_lr": [0.0001],
    "teacher_lr_anneal_steps": [0],
    "teacher_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "teacher_entropy_coef": [0.001],
    "teacher_value_loss_coef": [0.5],
    "teacher_n_unroll_update": [5],
    "teacher_ppo_n_epochs": [5],
    "teacher_ppo_n_minibatches": [1],
    "teacher_ppo_clip_eps": [0.2],
    "teacher_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["post"],
    "student_s5_activation": ["half_glu1"],
    "teacher_model_name": ["default_teacher_cnn"],
    "teacher_recurrent_arch": ["s5"],
    "teacher_recurrent_hidden_dim": [256],
    "teacher_hidden_dim": [32],
    "teacher_n_hidden_layers": [1],
    "teacher_n_conv_filters": [32],
    "teacher_scalar_embed_dim": [10],
    "teacher_s5_n_blocks": [2],
    "teacher_s5_n_layers": [2],
    "teacher_s5_layernorm_pos": ["post"],
    "teacher_s5_activation": ["half_glu1"],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [60],
    "maze_replace_wall_pos": [false],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "maze_ued_replace_wall_pos": [true],
    "maze_ued_fixed_n_wall_steps": [true],
    "maze_ued_first_wall_pos_sets_budget": [false],
    "maze_ued_noise_dim": [50],
    "maze_ued_n_walls": [60],
    "maze_ued_set_agent_dir": [false],
    "maze_ued_normalize_obs": [true],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "test_agent_idxs": ["\"*\""],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
73
src/config/configs/maze/s5_plr.json
Normal file
|
@ -0,0 +1,73 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [30000],
    "train_runner": ["plr"],
    "n_devices": [1],
    "student_model_name": ["default_student_cnn"],
    "env_name": ["Maze"],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [true],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.999],
    "n_unroll_rollout": [10],
    "render": [false],
    "ued_score": ["max_mc"],
    "plr_replay_prob": [0.5],
    "plr_buffer_size": [4000],
    "plr_staleness_coef": [0.3],
    "plr_temp": [0.3],
    "plr_use_score_ranks": [true],
    "plr_min_fill_ratio": [0.5],
    "plr_use_robust_plr": [true],
    "plr_use_parallel_eval": [false],
    "plr_force_unique": [true],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["pre"],
    "student_s5_activation": ["half_glu1"],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [60],
    "maze_replace_wall_pos": [true],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
73
src/config/configs/maze/s5_pplr.json
Normal file
|
@ -0,0 +1,73 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [30000],
    "train_runner": ["plr"],
    "n_devices": [1],
    "student_model_name": ["default_student_cnn"],
    "env_name": ["Maze"],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [true],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [256],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "ued_score": ["max_mc"],
    "plr_replay_prob": [0.5],
    "plr_buffer_size": [4000],
    "plr_staleness_coef": [0.3],
    "plr_temp": [0.3],
    "plr_use_score_ranks": [true],
    "plr_min_fill_ratio": [0.5],
    "plr_use_robust_plr": [true],
    "plr_use_parallel_eval": [true],
    "plr_force_unique": [true],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [32],
    "student_n_hidden_layers": [1],
    "student_n_conv_filters": [16],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["post"],
    "student_s5_activation": ["half_glu1"],
    "maze_height": [13],
    "maze_width": [13],
    "maze_n_walls": [60],
    "maze_replace_wall_pos": [true],
    "maze_sample_n_walls": [false],
    "maze_see_agent": [false],
    "maze_normalize_obs": [true],
    "maze_obs_agent_pos": [false],
    "maze_max_episode_steps": [250],
    "test_n_episodes": [10],
    "test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
    "maze_test_see_agent": [false],
    "maze_test_normalize_obs": [true]
  }
}
|
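Because these configs share most of their keys, it can be easier to look at only the entries that differ between two of them. The sketch below compares the `"args"` blocks of two configs; `diff_configs` is a hypothetical helper, and the two dictionaries are short excerpts of `s5_plr.json` and `s5_pplr.json` as listed above, not the full files.

```python
def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every arg that differs.

    A key present in only one config is reported with None on the other side.
    """
    a_args, b_args = a["args"], b["args"]
    keys = sorted(set(a_args) | set(b_args))
    return {
        key: (a_args.get(key), b_args.get(key))
        for key in keys
        if a_args.get(key) != b_args.get(key)
    }


# Excerpts only: the values are copied from the two listings above.
s5_plr = {"args": {
    "lr": [3e-05],
    "discount": [0.999],
    "plr_use_parallel_eval": [False],
    "student_s5_layernorm_pos": ["pre"],
}}
s5_pplr = {"args": {
    "lr": [3e-05],
    "discount": [0.995],
    "plr_use_parallel_eval": [True],
    "student_s5_layernorm_pos": ["post"],
}}

changed = diff_configs(s5_plr, s5_pplr)
for key, (left, right) in changed.items():
    print(f"{key}: {left} -> {right}")
```

In practice the two dictionaries would come from `json.load` on the respective files rather than from hand-written excerpts.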
|
@ -0,0 +1,63 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [1000],
    "train_runner": ["dr"],
    "n_devices": [1],
    "student_model_name": ["default_student_actor_cnn"],
    "student_critic_model_name": ["default_student_critic_cnn"],
    "env_name": ["Overcooked"],
    "is_multi_agent": [true],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [false],
    "checkpoint_interval": [25],
    "archive_interval": [25],
    "archive_init_checkpoint": [false],
    "test_interval": [50],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [400],
    "lr": [5e-4],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.99],
    "n_unroll_rollout": [10],
    "render": [false],
    "student_gae_lambda": [0.95],
    "student_entropy_coef": [0.01],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_hidden_dim": [64],
    "student_n_hidden_layers": [2],
    "student_n_conv_layers": [3],
    "student_n_conv_filters": [32],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_agent_kind": ["mappo"],
    "overcooked_height": [6],
    "overcooked_width": [9],
    "overcooked_n_walls": [15],
    "overcooked_replace_wall_pos": [true],
    "overcooked_sample_n_walls": [true],
    "overcooked_normalize_obs": [true],
    "overcooked_max_steps": [400],
    "overcooked_random_reset": [false],
    "overcooked_fix_to_single_layout": ["asymm_advantages_6_9"],
    "n_shaped_reward_steps": [5000000],
    "test_n_episodes": [10],
    "test_env_names": [
      "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
    ],
    "overcooked_test_normalize_obs": [true]
  }
}
|
|
@ -0,0 +1,69 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [1000],
    "train_runner": ["dr"],
    "n_devices": [1],
    "student_model_name": ["default_student_actor_cnn"],
    "student_critic_model_name": ["default_student_critic_cnn"],
    "env_name": ["Overcooked"],
    "is_multi_agent": [true],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [false],
    "checkpoint_interval": [100],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [400],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [64],
    "student_n_hidden_layers": [2],
    "student_n_conv_layers": [3],
    "student_n_conv_filters": [32],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["pre"],
    "student_s5_activation": ["half_glu1"],
    "student_agent_kind": ["mappo"],
    "overcooked_height": [6],
    "overcooked_width": [9],
    "overcooked_n_walls": [15],
    "overcooked_replace_wall_pos": [true],
    "overcooked_sample_n_walls": [true],
    "overcooked_normalize_obs": [true],
    "overcooked_max_steps": [400],
    "overcooked_random_reset": [false],
    "overcooked_fix_to_single_layout": ["asymm_advantages_6_9"],
    "n_shaped_reward_steps": [5000000],
    "test_n_episodes": [10],
    "test_env_names": [
      "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
    ],
    "overcooked_test_normalize_obs": [true]
  }
}
|
69
src/config/configs/overcooked/baseline__s5_coord_ring.json
Normal file
|
@ -0,0 +1,69 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [1000],
    "train_runner": ["dr"],
    "n_devices": [1],
    "student_model_name": ["default_student_actor_cnn"],
    "student_critic_model_name": ["default_student_critic_cnn"],
    "env_name": ["Overcooked"],
    "is_multi_agent": [true],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [false],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [400],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [64],
    "student_n_hidden_layers": [2],
    "student_n_conv_layers": [3],
    "student_n_conv_filters": [32],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["pre"],
    "student_s5_activation": ["half_glu1"],
    "student_agent_kind": ["mappo"],
    "overcooked_height": [6],
    "overcooked_width": [9],
    "overcooked_n_walls": [15],
    "overcooked_replace_wall_pos": [true],
    "overcooked_sample_n_walls": [true],
    "overcooked_normalize_obs": [true],
    "overcooked_max_steps": [400],
    "overcooked_random_reset": [true],
    "overcooked_fix_to_single_layout": ["coord_ring_6_9"],
    "n_shaped_reward_steps": [5000000],
    "test_n_episodes": [10],
    "test_env_names": [
      "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
    ],
    "overcooked_test_normalize_obs": [true]
  }
}
|
|
@ -0,0 +1,69 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [1000],
    "train_runner": ["dr"],
    "n_devices": [1],
    "student_model_name": ["default_student_actor_cnn"],
    "student_critic_model_name": ["default_student_critic_cnn"],
    "env_name": ["Overcooked"],
    "is_multi_agent": [true],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [false],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [400],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [64],
    "student_n_hidden_layers": [2],
    "student_n_conv_layers": [3],
    "student_n_conv_filters": [32],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["pre"],
    "student_s5_activation": ["half_glu1"],
    "student_agent_kind": ["mappo"],
    "overcooked_height": [6],
    "overcooked_width": [9],
    "overcooked_n_walls": [15],
    "overcooked_replace_wall_pos": [true],
    "overcooked_sample_n_walls": [true],
    "overcooked_normalize_obs": [true],
    "overcooked_max_steps": [400],
    "overcooked_random_reset": [true],
    "overcooked_fix_to_single_layout": ["counter_circuit_6_9"],
    "n_shaped_reward_steps": [5000000],
    "test_n_episodes": [10],
    "test_env_names": [
      "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
    ],
    "overcooked_test_normalize_obs": [true]
  }
}
|
69
src/config/configs/overcooked/baseline__s5_cramped_room.json
Normal file
|
@ -0,0 +1,69 @@
{
  "args": {
    "seed": [1],
    "agent_rl_algo": ["ppo"],
    "n_total_updates": [1000],
    "train_runner": ["dr"],
    "n_devices": [1],
    "student_model_name": ["default_student_actor_cnn"],
    "student_critic_model_name": ["default_student_critic_cnn"],
    "env_name": ["Overcooked"],
    "is_multi_agent": [true],
    "verbose": [false],
    "log_dir": ["~/logs/minimax"],
    "log_interval": [10],
    "from_last_checkpoint": [false],
    "checkpoint_interval": [1000],
    "archive_interval": [0],
    "archive_init_checkpoint": [false],
    "test_interval": [100],
    "n_students": [1],
    "n_parallel": [32],
    "n_eval": [1],
    "n_rollout_steps": [400],
    "lr": [3e-05],
    "lr_anneal_steps": [0],
    "max_grad_norm": [0.5],
    "adam_eps": [1e-05],
    "track_env_metrics": [true],
    "discount": [0.995],
    "n_unroll_rollout": [10],
    "render": [false],
    "student_gae_lambda": [0.98],
    "student_entropy_coef": [0.001],
    "student_value_loss_coef": [0.5],
    "student_n_unroll_update": [5],
    "student_ppo_n_epochs": [5],
    "student_ppo_n_minibatches": [1],
    "student_ppo_clip_eps": [0.2],
    "student_ppo_clip_value_loss": [true],
    "student_recurrent_arch": ["s5"],
    "student_recurrent_hidden_dim": [256],
    "student_hidden_dim": [64],
    "student_n_hidden_layers": [2],
    "student_n_conv_layers": [3],
    "student_n_conv_filters": [32],
    "student_n_scalar_embeddings": [4],
    "student_scalar_embed_dim": [5],
    "student_s5_n_blocks": [2],
    "student_s5_n_layers": [2],
    "student_s5_layernorm_pos": ["pre"],
    "student_s5_activation": ["half_glu1"],
    "student_agent_kind": ["mappo"],
    "overcooked_height": [6],
    "overcooked_width": [9],
    "overcooked_n_walls": [15],
    "overcooked_replace_wall_pos": [true],
    "overcooked_sample_n_walls": [true],
    "overcooked_normalize_obs": [true],
    "overcooked_max_steps": [400],
    "overcooked_random_reset": [true],
    "overcooked_fix_to_single_layout": ["cramped_room_6_9"],
    "n_shaped_reward_steps": [5000000],
    "test_n_episodes": [10],
    "test_env_names": [
      "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
    ],
    "overcooked_test_normalize_obs": [true]
  }
}
|
69
src/config/configs/overcooked/baseline__s5_forced_coord.json
Normal file
|
@ -0,0 +1,69 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [1000],
        "train_runner": ["dr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-05],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.995],
        "n_unroll_rollout": [10],
        "render": [false],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.001],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [5],
        "student_ppo_n_minibatches": [1],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["s5"],
        "student_recurrent_hidden_dim": [256],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [2],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_s5_n_blocks": [2],
        "student_s5_n_layers": [2],
        "student_s5_layernorm_pos": ["pre"],
        "student_s5_activation": ["half_glu1"],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [6],
        "overcooked_width": [9],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [true],
        "overcooked_fix_to_single_layout": ["forced_coord_6_9"],
        "n_shaped_reward_steps": [5000000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
64
src/config/configs/overcooked/baseline_dr_lstm.json
Normal file
|
@ -0,0 +1,64 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["dr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [6],
        "overcooked_width": [9],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
64
src/config/configs/overcooked/baseline_dr_lstm5x5.json
Normal file
|
@ -0,0 +1,64 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["dr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [5],
        "overcooked_width": [5],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
68
src/config/configs/overcooked/baseline_dr_s5.json
Normal file
|
@ -0,0 +1,68 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["dr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["s5"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_s5_n_blocks": [2],
        "student_s5_n_layers": [2],
        "student_s5_layernorm_pos": ["pre"],
        "student_s5_activation": ["half_glu1"],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [6],
        "overcooked_width": [9],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
68
src/config/configs/overcooked/baseline_dr_s55x5.json
Normal file
|
@ -0,0 +1,68 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["dr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["s5"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_s5_n_blocks": [2],
        "student_s5_n_layers": [2],
        "student_s5_layernorm_pos": ["pre"],
        "student_s5_activation": ["half_glu1"],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [5],
        "overcooked_width": [5],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
67
src/config/configs/overcooked/baseline_dr_softmoe_lstm.json
Normal file
|
@ -0,0 +1,67 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["dr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [2],
        "student_is_soft_moe": [true],
        "student_soft_moe_num_experts": [4],
        "student_soft_moe_num_slots": [32],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [6],
        "overcooked_width": [9],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
|
@ -0,0 +1,67 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["dr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [2],
        "student_is_soft_moe": [true],
        "student_soft_moe_num_experts": [4],
        "student_soft_moe_num_slots": [32],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [5],
        "overcooked_width": [5],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
78
src/config/configs/overcooked/baseline_p_accel_lstm.json
Normal file
|
@ -0,0 +1,78 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["plr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "ued_score": ["max_mc"],
        "plr_replay_prob": [0.8],
        "plr_buffer_size": [4000],
        "plr_staleness_coef": [0.3],
        "plr_temp": [0.1],
        "plr_use_score_ranks": [true],
        "plr_min_fill_ratio": [0.5],
        "plr_use_robust_plr": [true],
        "plr_use_parallel_eval": [true],
        "plr_force_unique": [true],
        "plr_mutation_fn": ["default"],
        "plr_n_mutations": [20],
        "plr_mutation_criterion": ["batch"],
        "plr_mutation_subsample_size": [4],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [6],
        "overcooked_width": [9],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
78
src/config/configs/overcooked/baseline_p_accel_lstm5x5.json
Normal file
|
@ -0,0 +1,78 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["plr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "ued_score": ["max_mc"],
        "plr_replay_prob": [0.8],
        "plr_buffer_size": [4000],
        "plr_staleness_coef": [0.3],
        "plr_temp": [0.1],
        "plr_use_score_ranks": [true],
        "plr_min_fill_ratio": [0.5],
        "plr_use_robust_plr": [true],
        "plr_use_parallel_eval": [true],
        "plr_force_unique": [true],
        "plr_mutation_fn": ["default"],
        "plr_n_mutations": [20],
        "plr_mutation_criterion": ["batch"],
        "plr_mutation_subsample_size": [4],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [5],
        "overcooked_width": [5],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
82
src/config/configs/overcooked/baseline_p_accel_s5.json
Normal file
|
@ -0,0 +1,82 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["plr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "ued_score": ["max_mc"],
        "plr_replay_prob": [0.8],
        "plr_buffer_size": [4000],
        "plr_staleness_coef": [0.3],
        "plr_temp": [0.1],
        "plr_use_score_ranks": [true],
        "plr_min_fill_ratio": [0.5],
        "plr_use_robust_plr": [true],
        "plr_use_parallel_eval": [true],
        "plr_force_unique": [true],
        "plr_mutation_fn": ["default"],
        "plr_n_mutations": [20],
        "plr_mutation_criterion": ["batch"],
        "plr_mutation_subsample_size": [4],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["s5"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_s5_n_blocks": [2],
        "student_s5_n_layers": [2],
        "student_s5_layernorm_pos": ["pre"],
        "student_s5_activation": ["half_glu1"],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [6],
        "overcooked_width": [9],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
82
src/config/configs/overcooked/baseline_p_accel_s55x5.json
Normal file
|
@ -0,0 +1,82 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["plr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "ued_score": ["max_mc"],
        "plr_replay_prob": [0.8],
        "plr_buffer_size": [4000],
        "plr_staleness_coef": [0.3],
        "plr_temp": [0.1],
        "plr_use_score_ranks": [true],
        "plr_min_fill_ratio": [0.5],
        "plr_use_robust_plr": [true],
        "plr_use_parallel_eval": [true],
        "plr_force_unique": [true],
        "plr_mutation_fn": ["default"],
        "plr_n_mutations": [20],
        "plr_mutation_criterion": ["batch"],
        "plr_mutation_subsample_size": [4],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["s5"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [3],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_s5_n_blocks": [2],
        "student_s5_n_layers": [2],
        "student_s5_layernorm_pos": ["pre"],
        "student_s5_activation": ["half_glu1"],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [5],
        "overcooked_width": [5],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
|
@ -0,0 +1,81 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["plr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "ued_score": ["max_mc"],
        "plr_replay_prob": [0.8],
        "plr_buffer_size": [4000],
        "plr_staleness_coef": [0.3],
        "plr_temp": [0.1],
        "plr_use_score_ranks": [true],
        "plr_min_fill_ratio": [0.5],
        "plr_use_robust_plr": [true],
        "plr_use_parallel_eval": [true],
        "plr_force_unique": [true],
        "plr_mutation_fn": ["default"],
        "plr_n_mutations": [20],
        "plr_mutation_criterion": ["batch"],
        "plr_mutation_subsample_size": [4],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [2],
        "student_is_soft_moe": [true],
        "student_soft_moe_num_experts": [4],
        "student_soft_moe_num_slots": [32],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [6],
        "overcooked_width": [9],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
        "n_shaped_reward_updates": [30000],
        "test_n_episodes": [10],
        "test_env_names": [
            "Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
        ],
        "overcooked_test_normalize_obs": [true]
    }
}
|
@ -0,0 +1,81 @@
{
    "args": {
        "seed": [1],
        "agent_rl_algo": ["ppo"],
        "n_total_updates": [30000],
        "train_runner": ["plr"],
        "n_devices": [1],
        "student_model_name": ["default_student_actor_cnn"],
        "student_critic_model_name": ["default_student_critic_cnn"],
        "env_name": ["Overcooked"],
        "is_multi_agent": [true],
        "verbose": [false],
        "log_dir": ["~/logs/minimax"],
        "log_interval": [10],
        "from_last_checkpoint": [false],
        "checkpoint_interval": [1000],
        "archive_interval": [0],
        "archive_init_checkpoint": [false],
        "test_interval": [100],
        "n_students": [1],
        "n_parallel": [32],
        "n_eval": [1],
        "n_rollout_steps": [400],
        "lr": [3e-4],
        "lr_anneal_steps": [0],
        "max_grad_norm": [0.5],
        "adam_eps": [1e-05],
        "track_env_metrics": [true],
        "discount": [0.999],
        "n_unroll_rollout": [10],
        "render": [false],
        "ued_score": ["max_mc"],
        "plr_replay_prob": [0.8],
        "plr_buffer_size": [4000],
        "plr_staleness_coef": [0.3],
        "plr_temp": [0.1],
        "plr_use_score_ranks": [true],
        "plr_min_fill_ratio": [0.5],
        "plr_use_robust_plr": [true],
        "plr_use_parallel_eval": [true],
        "plr_force_unique": [true],
        "plr_mutation_fn": ["default"],
        "plr_n_mutations": [20],
        "plr_mutation_criterion": ["batch"],
        "plr_mutation_subsample_size": [4],
        "student_gae_lambda": [0.98],
        "student_entropy_coef": [0.01],
        "student_value_loss_coef": [0.5],
        "student_n_unroll_update": [5],
        "student_ppo_n_epochs": [8],
        "student_ppo_n_minibatches": [4],
        "student_ppo_clip_eps": [0.2],
        "student_ppo_clip_value_loss": [true],
        "student_recurrent_arch": ["lstm"],
        "student_recurrent_hidden_dim": [64],
        "student_hidden_dim": [64],
        "student_n_hidden_layers": [2],
        "student_is_soft_moe": [true],
        "student_soft_moe_num_experts": [4],
        "student_soft_moe_num_slots": [32],
        "student_n_conv_layers": [3],
        "student_n_conv_filters": [32],
        "student_n_scalar_embeddings": [4],
        "student_scalar_embed_dim": [5],
        "student_agent_kind": ["mappo"],
        "overcooked_height": [5],
        "overcooked_width": [5],
        "overcooked_n_walls": [15],
        "overcooked_replace_wall_pos": [true],
        "overcooked_sample_n_walls": [true],
        "overcooked_normalize_obs": [true],
        "overcooked_max_steps": [400],
        "overcooked_random_reset": [false],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
74
src/config/configs/overcooked/baseline_p_plr_lstm.json
Normal file
|
@ -0,0 +1,74 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
74
src/config/configs/overcooked/baseline_p_plr_lstm5x5.json
Normal file
|
@ -0,0 +1,74 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [5],
|
||||
"overcooked_width": [5],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
78
src/config/configs/overcooked/baseline_p_plr_s5.json
Normal file
|
@ -0,0 +1,78 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["pre"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
78
src/config/configs/overcooked/baseline_p_plr_s55x5.json
Normal file
|
@ -0,0 +1,78 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["pre"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [5],
|
||||
"overcooked_width": [5],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
|
@ -0,0 +1,77 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [2],
|
||||
"student_is_soft_moe": [true],
|
||||
"student_soft_moe_num_experts": [4],
|
||||
"student_soft_moe_num_slots": [32],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
|
@ -0,0 +1,77 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [2],
|
||||
"student_is_soft_moe": [true],
|
||||
"student_soft_moe_num_experts": [4],
|
||||
"student_soft_moe_num_slots": [32],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [5],
|
||||
"overcooked_width": [5],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
86
src/config/configs/overcooked/baseline_pop_paired_lstm.json
Normal file
|
@ -0,0 +1,86 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"verbose": [false],
|
||||
"is_multi_agent": [true],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.999],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.01],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [8],
|
||||
"teacher_ppo_n_minibatches": [4],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [64],
|
||||
"teacher_hidden_dim": [64],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [5],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"overcooked_ued_replace_wall_pos": [true],
|
||||
"overcooked_ued_fixed_n_wall_steps": [false],
|
||||
"overcooked_ued_first_wall_pos_sets_budget": [true],
|
||||
"overcooked_ued_noise_dim": [50],
|
||||
"overcooked_ued_n_walls": [15],
|
||||
"overcooked_ued_normalize_obs": [true],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
|
@ -0,0 +1,86 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"verbose": [false],
|
||||
"is_multi_agent": [true],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.999],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.01],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [8],
|
||||
"teacher_ppo_n_minibatches": [4],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [64],
|
||||
"teacher_hidden_dim": [64],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"overcooked_height": [5],
|
||||
"overcooked_width": [5],
|
||||
"overcooked_n_walls": [5],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"overcooked_ued_replace_wall_pos": [true],
|
||||
"overcooked_ued_fixed_n_wall_steps": [false],
|
||||
"overcooked_ued_first_wall_pos_sets_budget": [true],
|
||||
"overcooked_ued_noise_dim": [50],
|
||||
"overcooked_ued_n_walls": [15],
|
||||
"overcooked_ued_normalize_obs": [true],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
90
src/config/configs/overcooked/baseline_pop_paired_s5.json
Normal file
|
@ -0,0 +1,90 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"verbose": [false],
|
||||
"is_multi_agent": [true],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.999],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.01],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [8],
|
||||
"teacher_ppo_n_minibatches": [4],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["pre"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [64],
|
||||
"teacher_hidden_dim": [64],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [5],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"overcooked_ued_replace_wall_pos": [true],
|
||||
"overcooked_ued_fixed_n_wall_steps": [false],
|
||||
"overcooked_ued_first_wall_pos_sets_budget": [true],
|
||||
"overcooked_ued_noise_dim": [50],
|
||||
"overcooked_ued_n_walls": [15],
|
||||
"overcooked_ued_normalize_obs": [true],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
90
src/config/configs/overcooked/baseline_pop_paired_s55x5.json
Normal file
|
@ -0,0 +1,90 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"verbose": [false],
|
||||
"is_multi_agent": [true],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.999],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.01],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [8],
|
||||
"teacher_ppo_n_minibatches": [4],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [3],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["pre"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [64],
|
||||
"teacher_hidden_dim": [64],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"overcooked_height": [5],
|
||||
"overcooked_width": [5],
|
||||
"overcooked_n_walls": [5],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"overcooked_ued_replace_wall_pos": [true],
|
||||
"overcooked_ued_fixed_n_wall_steps": [false],
|
||||
"overcooked_ued_first_wall_pos_sets_budget": [true],
|
||||
"overcooked_ued_noise_dim": [50],
|
||||
"overcooked_ued_n_walls": [15],
|
||||
"overcooked_ued_normalize_obs": [true],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
|
@ -0,0 +1,89 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"verbose": [false],
|
||||
"is_multi_agent": [true],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.999],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.01],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [8],
|
||||
"teacher_ppo_n_minibatches": [4],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [2],
|
||||
"student_is_soft_moe": [true],
|
||||
"student_soft_moe_num_experts": [4],
|
||||
"student_soft_moe_num_slots": [32],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [64],
|
||||
"teacher_hidden_dim": [64],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [5],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"overcooked_ued_replace_wall_pos": [true],
|
||||
"overcooked_ued_fixed_n_wall_steps": [false],
|
||||
"overcooked_ued_first_wall_pos_sets_budget": [true],
|
||||
"overcooked_ued_noise_dim": [50],
|
||||
"overcooked_ued_n_walls": [15],
|
||||
"overcooked_ued_normalize_obs": [true],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
|
@ -0,0 +1,89 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"verbose": [false],
|
||||
"is_multi_agent": [true],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-4],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.999],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.01],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.01],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [8],
|
||||
"teacher_ppo_n_minibatches": [4],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [64],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [2],
|
||||
"student_is_soft_moe": [true],
|
||||
"student_soft_moe_num_experts": [4],
|
||||
"student_soft_moe_num_slots": [32],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [64],
|
||||
"teacher_hidden_dim": [64],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"overcooked_height": [5],
|
||||
"overcooked_width": [5],
|
||||
"overcooked_n_walls": [5],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [false],
|
||||
"overcooked_ued_replace_wall_pos": [true],
|
||||
"overcooked_ued_fixed_n_wall_steps": [false],
|
||||
"overcooked_ued_first_wall_pos_sets_budget": [true],
|
||||
"overcooked_ued_noise_dim": [50],
|
||||
"overcooked_ued_n_walls": [15],
|
||||
"overcooked_ued_normalize_obs": [true],
|
||||
"n_shaped_reward_updates": [30000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing5_5,Overcooked-ForcedCoord5_5,Overcooked-CrampedRoom5_5"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
83
src/config/configs/overcooked/paired.json
Normal file
|
@ -0,0 +1,83 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [1000000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_moe"],
|
||||
"student_critic_model_name": ["default_student_critic_moe"],
|
||||
"env_name": ["Overcooked"],
|
||||
"verbose": [false],
|
||||
"is_multi_agent": [true],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [100],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.995],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.05],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [5],
|
||||
"teacher_ppo_n_minibatches": [1],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [256],
|
||||
"teacher_hidden_dim": [32],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [5],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [true],
|
||||
"overcooked_ued_replace_wall_pos": [true],
|
||||
"overcooked_ued_fixed_n_wall_steps": [false],
|
||||
"overcooked_ued_first_wall_pos_sets_budget": [true],
|
||||
"overcooked_ued_noise_dim": [50],
|
||||
"overcooked_ued_n_walls": [15],
|
||||
"overcooked_ued_normalize_obs": [true],
|
||||
"test_n_episodes": [10],
|
||||
"n_shaped_reward_steps": [5000000],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
71
src/config/configs/overcooked/plr.json
Normal file
|
@ -0,0 +1,71 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [1000000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_moe"],
|
||||
"student_critic_model_name": ["default_student_critic_moe"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [100],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [5e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.99],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [false],
|
||||
"plr_use_parallel_eval": [false],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [true],
|
||||
"test_n_episodes": [10],
|
||||
"n_shaped_reward_steps": [5000000],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
78
src/config/configs/overcooked/plr_s5.json
Normal file
|
@ -0,0 +1,78 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [100000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_actor_cnn"],
|
||||
"student_critic_model_name": ["default_student_critic_cnn"],
|
||||
"env_name": ["Overcooked"],
|
||||
"is_multi_agent": [true],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [128],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [400],
|
||||
"lr": [3e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [false],
|
||||
"plr_use_parallel_eval": [false],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [8],
|
||||
"student_ppo_n_minibatches": [4],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [64],
|
||||
"student_n_hidden_layers": [2],
|
||||
"student_n_conv_layers": [3],
|
||||
"student_n_conv_filters": [32],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["pre"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"student_agent_kind": ["mappo"],
|
||||
"overcooked_height": [6],
|
||||
"overcooked_width": [9],
|
||||
"overcooked_n_walls": [15],
|
||||
"overcooked_replace_wall_pos": [true],
|
||||
"overcooked_sample_n_walls": [true],
|
||||
"overcooked_normalize_obs": [true],
|
||||
"overcooked_max_steps": [400],
|
||||
"overcooked_random_reset": [true],
|
||||
"n_shaped_reward_steps": [5000000],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": [
|
||||
"Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9"
|
||||
],
|
||||
"overcooked_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
9
src/eval_all_xpid_against_population_in_all_layouts.sh
Executable file
|
@ -0,0 +1,9 @@
|
|||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
|
||||
# "Overcooked-CoordRing6_9" "Overcooked-ForcedCoord6_9" "Overcooked-CounterCircuit6_9" "Overcooked-AsymmAdvantages6_9" "Overcooked-CrampedRoom6_9"
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device Overcooked-CoordRing6_9 9SEED_9_dr-overcookedNonexNonewNone_fs_FIXcoord_ring_6_9_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr3e-5g0.99cv0.5ce0.01e5mb1l0.95_pc0.2_h64cf32fc2se5ba_re_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device Overcooked-ForcedCoord6_9 9SEED_9_dr-overcookedNonexNonewNone_fs_FIXforced_coord_6_9_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr3e-5g0.99cv0.5ce0.01e5mb1l0.95_pc0.2_h64cf32fc2se5ba_re_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device Overcooked-CounterCircuit6_9 9SEED_9_dr-overcookedNonexNonewNone_fs_FIXcounter_circuit_6_9_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr3e-5g0.99cv0.5ce0.01e5mb1l0.95_pc0.2_h64cf32fc2se5ba_re_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device Overcooked-AsymmAdvantages6_9 9SEED_9_dr-overcookedNonexNonewNone_fs_FIXasymm_advantages_6_9_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr3e-5g0.99cv0.5ce0.01e5mb1l0.95_pc0.2_h64cf32fc2se5ba_re_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device Overcooked-CrampedRoom6_9 9SEED_9_dr-overcookedNonexNonewNone_fs_FIXcramped_room_6_9_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr3e-5g0.99cv0.5ce0.01e5mb1l0.95_pc0.2_h64cf32fc2se5ba_re_0
|
11
src/eval_random_against_population.sh
Executable file
|
@ -0,0 +1,11 @@
|
|||
DEFAULTVALUE=4
device="${1:-$DEFAULTVALUE}"

for env in "Overcooked-CoordRing6_9" "Overcooked-ForcedCoord6_9" "Overcooked-CounterCircuit6_9" "Overcooked-AsymmAdvantages6_9" "Overcooked-CrampedRoom6_9";
do
    CUDA_VISIBLE_DEVICES="${device}" LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.evaluate_baseline_against_population \
        --env_names="${env}" \
        --population_json="populations/fcp/${env}/population.json" \
        --n_episodes=100 \
        --is_random=True
done
|
10
src/eval_stay_against_population.sh
Executable file
|
@ -0,0 +1,10 @@
|
|||
DEFAULTVALUE=4
device="${1:-$DEFAULTVALUE}"

for env in "Overcooked-AsymmAdvantages6_9" "Overcooked-CrampedRoom6_9" "Overcooked-CoordRing6_9" "Overcooked-ForcedCoord6_9" "Overcooked-CounterCircuit6_9";
do
    CUDA_VISIBLE_DEVICES="${device}" LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.evaluate_baseline_against_population \
        --env_names="${env}" \
        --population_json="populations/fcp/${env}/population.json" \
        --n_episodes=100
done
|
6
src/eval_xpid.sh
Executable file
|
@ -0,0 +1,6 @@
|
|||
DEFAULTVALUE=4
device="${1:-$DEFAULTVALUE}"
CUDA_VISIBLE_DEVICES="${device}" LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.evaluate \
    --xpid="$2" \
    --env_names=Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9 \
    --n_episodes=1000
|
10
src/eval_xpid_against_population.sh
Executable file
|
@ -0,0 +1,10 @@
|
|||
DEFAULTVALUE=4
# Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9
device="${1:-$DEFAULTVALUE}"
ENV=Overcooked-AsymmAdvantages6_9
XPID=$2
CUDA_VISIBLE_DEVICES="${device}" LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.evaluate_against_population \
    --xpid="${XPID}" \
    --env_names="${ENV}" \
    --population_json="populations/fcp/${ENV}/population.json" \
    --n_episodes=100
|
14
src/eval_xpid_against_population_in_all_layouts.sh
Executable file
|
@ -0,0 +1,14 @@
|
|||
DEFAULTVALUE=4
device="${1:-$DEFAULTVALUE}"
NAME=$2
XPID=$3

for env in "Overcooked-CoordRing6_9" "Overcooked-ForcedCoord6_9" "Overcooked-CounterCircuit6_9" "Overcooked-AsymmAdvantages6_9" "Overcooked-CrampedRoom6_9";
do
    echo "Evaluating ${NAME} against population in ${env} for xpid ${XPID}"
    CUDA_VISIBLE_DEVICES="${device}" LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.evaluate_against_population \
        --xpid="${XPID}" \
        --env_names="${env}" \
        --population_json="populations/fcp/${env}/population.json" \
        --n_episodes=100
done
|
19
src/eval_xpid_all_cnn_lstm.sh
Executable file
|
@ -0,0 +1,19 @@
|
|||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_CNN-LSTM_SEED1 dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_CNN-LSTM_SEED2 SEED_2_dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_CNN-LSTM_SEED3 SEED_3_dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_CNN-LSTM_SEED1 plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_CNN-LSTM_SEED2 SEED_2_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_CNN-LSTM_SEED3 SEED_3_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_CNN-LSTM_SEED1 paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_CNN-LSTM_SEED2 SEED_2_paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_CNN-LSTM_SEED3 SEED_3_paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_CNN-LSTM_SEED1 plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_CNN-LSTM_SEED2 SEED_2_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_CNN-LSTM_SEED3 SEED_3_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lstm_h64_0
|
||||
|
19
src/eval_xpid_all_cnn_s5.sh
Executable file
|
@ -0,0 +1,19 @@
|
|||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_CNN-S5_SEED1 dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_CNN-S5_SEED2 SEED_2_dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_CNN-S5_SEED3 SEED_3_dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_CNN-S5_SEED1 plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_CNN-S5_SEED2 SEED_2_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_CNN-S5_SEED3 SEED_3_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_CNN-S5_SEED1 paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_CNN-S5_SEED2 SEED_2_paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_CNN-S5_SEED3 SEED_3_paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_CNN-S5_SEED1 plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_CNN-S5_SEED2 SEED_2_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_CNN-S5_SEED3 SEED_3_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc3se5ba_re_lpr_ahg1_s5_h64nb2nl2_0
|
||||
|
19
src/eval_xpid_all_softmoe.sh
Executable file
|
@ -0,0 +1,19 @@
|
|||
DEFAULTVALUE=4
|
||||
device="${1:-$DEFAULTVALUE}"
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_SoftMoE_SEED1 dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_SoftMoE_SEED2 SEED_2_dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device DR_SoftMoE_SEED3 SEED_3_dr-overcooked6x9w15_fs_IMAGE-r1s_32p_1e_400t_ae1e-05-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_SoftMoE_SEED1 plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_SoftMoE_SEED2 SEED_2_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PLR_SoftMoE_SEED3 SEED_3_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.5b4000t0.1s0.3m0.5r_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_SoftMoE_SEED1 paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_SoftMoE_SEED2 SEED_2_paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device PAIRED_SoftMoE_SEED3 SEED_3_paired-overcooked6x9w5_ld50_rb-r2s_32p_1e_400t_ae1e-05_sr-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___tch_ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98pc0.2_h64cf128fc1se10ba_re_lstm_h64_0
|
||||
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_SoftMoE_SEED1 plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_SoftMoE_SEED2 SEED_2_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
./eval_xpid_against_population_in_all_layouts.sh $device ACCEL_SoftMoE_SEED3 SEED_3_plr-overcooked6x9w15_fs_IMAGE-rpf_p0.8b4000t0.1s0.3m0.5r_mdef20bat_r1s_32p_1e_400t_ae1e-05_smm-ppo_lr0.0003g0.999cv0.5ce0.01e8mb4l0.98_pc0.2_h64cf32fc2se5ba_re_lstm_h64__SoftMoE_4E_32S___0
|
||||
|
14
src/extract_fcp.sh
Executable file
|
@ -0,0 +1,14 @@
|
|||
DEFAULTVALUE=4
ENV=Overcooked-CrampedRoom5_5 # Overcooked-CoordRing6_9,Overcooked-ForcedCoord6_9,Overcooked-CounterCircuit6_9,Overcooked-AsymmAdvantages6_9,Overcooked-CrampedRoom6_9
device="${1:-$DEFAULTVALUE}"

seed_max=8

for seed in $(seq "${seed_max}");
do
    CUDA_VISIBLE_DEVICES="${device}" LD_LIBRARY_PATH="" nice -n 5 python3 -m minimax.extract_fcp \
        --xpid="8SEED_${seed}_$2" \
        --env_names="${ENV}" \
        --n_episodes=100 \
        --trained_seed="${seed}"
done
|
1
src/make_cmd.sh
Executable file
|
@ -0,0 +1 @@
|
|||
python3 -m minimax.config.make_cmd --config "$1/$2"
|
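Every value in the JSON configs above is wrapped in a list, which suggests that `minimax.config.make_cmd` treats each config as a grid and emits one command per combination of values. The source of `make_cmd` is not shown here, so the following is only a sketch under that assumption. The helper name `expand_grid`, the target module `minimax.train`, and the `--name=value` flag format are hypothetical choices for illustration.

```python
import itertools
import json


def expand_grid(config_text, module="minimax.train"):
    """Return one command string per combination of the listed argument values."""
    args = json.loads(config_text)["args"]
    names = list(args)
    commands = []
    # itertools.product yields every combination, varying the last argument fastest.
    for values in itertools.product(*(args[name] for name in names)):
        flags = " ".join(f"--{name}={value}" for name, value in zip(names, values))
        commands.append(f"python3 -m {module} {flags}")
    return commands


# Hypothetical three-key config: two seeds give two commands.
example = '{"args": {"seed": [1, 2], "train_runner": ["paired"], "render": [false]}}'
for command in expand_grid(example):
    print(command)
```

Under this reading, a config whose lists all have a single entry, like the ones in this directory, expands to exactly one command.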
9
src/minimax/__init__.py
Normal file
|
@ -0,0 +1,9 @@
|
|||
from . import envs
from . import agents
from . import models
from . import runners
from . import util
from . import arguments
from . import evaluate
# from . import train
from . import config
|
15
src/minimax/agents/__init__.py
Normal file
|
@ -0,0 +1,15 @@
|
|||
"""
|
||||
Copyright (c) Meta Platforms, Inc. and affiliates.
All rights reserved.

This source code is licensed under the license found in the
LICENSE file in the root directory of this source tree.
"""

from .ppo import PPOAgent
from .mappo import MAPPOAgent


__all__ = [
    "PPOAgent", "MAPPOAgent"
]
|
40
src/minimax/agents/agent.py
Normal file
|
@ -0,0 +1,40 @@
|
|||
"""
|
||||
Copyright (c) Meta Platforms, Inc. and affiliates.
All rights reserved.

This source code is licensed under the license found in the
LICENSE file in the root directory of this source tree.
"""

from abc import ABC


class Agent(ABC):
    """
    Generic interface for an agent.
    """
    @property
    def is_recurrent(self):
        pass

    @property
    def action_info_keys(self):
        pass

    def init_params(self, rng, obs, carry=None):
        pass

    def init_carry(self, rng, batch_dims):
        pass

    def act(self, *args, **kwargs):
        pass

    def get_action_dist(self, dist_params, dtype):
        pass

    def evaluate(self, *args, **kwargs):
        pass

    def update(self, *args, **kwargs):
        pass
|
449
src/minimax/agents/mappo.py
Normal file
|
@ -0,0 +1,449 @@
|
|||
"""
|
||||
Copyright (c) Meta Platforms, Inc. and affiliates.
|
||||
All rights reserved.
|
||||
|
||||
This source code is licensed under the license found in the
|
||||
LICENSE file in the root directory of this source tree.
|
||||
"""
|
||||
|
||||
from functools import partial
|
||||
from collections import OrderedDict
|
||||
|
||||
import einops
|
||||
import jax
|
||||
import jax.numpy as jnp
|
||||
import optax
|
||||
from flax.training.train_state import TrainState
|
||||
from tensorflow_probability.substrates import jax as tfp
|
||||
|
||||
from .agent import Agent
|
||||
|
||||
|
||||
class MAPPOAgent(Agent):
|
||||
def __init__(
|
||||
self,
|
||||
actor,
|
||||
critic,
|
||||
n_epochs=5,
|
||||
n_minibatches=1,
|
||||
value_loss_coef=0.5,
|
||||
entropy_coef=0.0,
|
||||
clip_eps=0.2,
|
||||
clip_value_loss=True,
|
||||
track_grad_norm=False,
|
||||
n_unroll_update=1,
|
||||
n_devices=1):
|
||||
|
||||
self.actor = actor
|
||||
self.critic = critic
|
||||
|
||||
self.n_epochs = n_epochs
|
||||
self.n_minibatches = n_minibatches
|
||||
self.value_loss_coef = value_loss_coef
|
||||
self.entropy_coef = entropy_coef
|
||||
self.clip_eps = clip_eps
|
||||
self.clip_value_loss = clip_value_loss
|
||||
self.track_grad_norm = track_grad_norm
|
||||
self.n_unroll_update = n_unroll_update
|
||||
self.n_devices = n_devices
|
||||
|
||||
self.actor_grad_fn = jax.value_and_grad(self._actor_loss, has_aux=True)
|
||||
self.critic_grad_fn = jax.value_and_grad(
|
||||
self._critic_loss, has_aux=True)
|
||||
|
||||
@property
|
||||
def is_recurrent(self):
|
||||
# Actor and Critic need to share arch for now.
|
||||
return self.actor.is_recurrent
|
||||
|
||||
def init_params(self, rng, obs):
|
||||
"""
|
||||
Returns initialized parameters and RNN hidden state for a specific
|
||||
observation shape.
|
||||
"""
|
||||
if len(obs) == 2:
|
||||
obs, shared_obs = obs
|
||||
else:
|
||||
raise ValueError("Obs should always be a two tuple for MAPPO!")
|
||||
|
||||
rng, subrng = jax.random.split(rng)
|
||||
is_recurrent = self.actor.is_recurrent
|
||||
if is_recurrent:
|
||||
batch_size = jax.tree_util.tree_leaves(obs)[0].shape[1]
|
||||
actor_carry = self.actor.initialize_carry(
|
||||
rng=subrng, batch_dims=(batch_size,))
|
||||
critic_carry = self.critic.initialize_carry(
|
||||
rng=subrng, batch_dims=(batch_size,))
|
||||
reset = jnp.zeros((1, batch_size), dtype=jnp.bool_)
|
||||
|
||||
rng, subrng = jax.random.split(rng)
|
||||
|
||||
# Note: these inputs differ from the observations seen later, but they have the form needed for initialization.
|
||||
actor_params = self.actor.init(
|
||||
subrng, obs[:, :, 0], actor_carry, reset)
|
||||
critic_params = self.critic.init(
|
||||
subrng, shared_obs[:, :, 0], critic_carry, reset)
|
||||
else:
|
||||
|
||||
obs = jnp.concatenate(obs, axis=0)
|
||||
shared_obs = jnp.concatenate(shared_obs, axis=0)
|
||||
actor_params = self.actor.init(subrng, obs, None)
|
||||
critic_params = self.critic.init(subrng, shared_obs, None)
|
||||
|
||||
return (actor_params, critic_params)
|
||||
|
||||
def init_carry(self, rng, batch_dims=(1,)):
|
||||
actor_carry = self.actor.initialize_carry(
|
||||
rng=rng, batch_dims=batch_dims)
|
||||
# This is for evaluation where we throw away the critic
|
||||
if self.critic is not None:
|
||||
critic_carry = self.critic.initialize_carry(
|
||||
rng=rng, batch_dims=batch_dims)
|
||||
else:
|
||||
critic_carry = None
|
||||
return actor_carry, critic_carry
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def act(self, actor_params, obs, carry=None, reset=None):
|
||||
logits, carry = self.actor.apply(
|
||||
actor_params, obs, carry, reset)
|
||||
|
||||
return None, logits, carry
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def get_value(self, params, shared_obs, carry=None, reset=None):
|
||||
value, new_carry = self.critic.apply(params, shared_obs, carry, reset)
|
||||
return value, new_carry
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def evaluate_action(
|
||||
self, actor_params, action, obs, actor_carry=None, reset=None
|
||||
):
|
||||
dist_params, actor_carry = self.actor.apply(
|
||||
actor_params, obs, actor_carry, reset)
|
||||
dist = self.get_action_dist(dist_params, dtype=action.dtype)
|
||||
log_prob = dist.log_prob(action)
|
||||
entropy = dist.entropy()
|
||||
|
||||
return log_prob.squeeze(), \
|
||||
entropy.squeeze(), \
|
||||
actor_carry
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def evaluate(self, params, action, obs, carry=None, reset=None):
# NOTE: MAPPOAgent sets self.actor and self.critic but never self.model, so this
# method (which mirrors PPOAgent.evaluate) raises AttributeError if called.
# See evaluate_action and get_value for the actor and critic paths.
|
||||
value, dist_params, carry = self.model.apply(params, obs, carry, reset)
|
||||
dist = self.get_action_dist(dist_params, dtype=action.dtype)
|
||||
log_prob = dist.log_prob(action)
|
||||
entropy = dist.entropy()
|
||||
|
||||
return value.squeeze(), \
|
||||
log_prob.squeeze(), \
|
||||
entropy.squeeze(), \
|
||||
carry
|
||||
|
||||
def get_action_dist(self, dist_params, dtype=jnp.uint8):
|
||||
return tfp.distributions.Categorical(logits=dist_params, dtype=dtype)
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def update(self, rng, train_state, batch):
|
||||
rngs = jax.random.split(rng, self.n_epochs)
|
||||
|
||||
def _scan_epoch(carry, rng):
|
||||
brng, urng = jax.random.split(rng)
|
||||
batch, train_state = carry
|
||||
minibatches = self._get_minibatches(brng, batch)
|
||||
train_state, stats = \
|
||||
self._update_epoch(
|
||||
urng, train_state, minibatches)
|
||||
|
||||
return (batch, train_state), stats
|
||||
|
||||
(_, train_state), stats = jax.lax.scan(
|
||||
_scan_epoch,
|
||||
(batch, train_state),
|
||||
rngs,
|
||||
length=len(rngs)
|
||||
)
|
||||
|
||||
stats = jax.tree_util.tree_map(lambda x: x.mean(), stats)
|
||||
train_state = train_state.increment_updates()
|
||||
|
||||
return train_state, stats
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def get_empty_update_stats(self):
|
||||
keys = [
|
||||
'total_loss', # actor_loss + critic_loss
|
||||
'actor_loss', # loss_actor - entropy_coef*entropy
|
||||
'critic_loss', # value_loss_coef*value_loss
|
||||
'actor_loss_actor', # Without the entropy term added
|
||||
'actor_l2_reg_weight_loss',
|
||||
'actor_entropy',
|
||||
'actor_mean_target',
|
||||
'actor_mean_gae',
|
||||
'critic_value_loss',
|
||||
'critic_l2_reg_weight_loss',
|
||||
'critic_mean_value',
|
||||
'critic_mean_target',
|
||||
'critic_mean_gae',
|
||||
'actor_grad_norm',
|
||||
'critic_grad_norm',
|
||||
]
|
||||
|
||||
return OrderedDict({k: -jnp.inf for k in keys})
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def _update_epoch(
|
||||
self,
|
||||
rng,
|
||||
train_state: TrainState,
|
||||
minibatches):
|
||||
|
||||
def _update_minibatch(carry, step):
|
||||
rng, minibatch = step
|
||||
train_state = carry
|
||||
|
||||
(actor_loss, actor_aux_info), actor_grads = self.actor_grad_fn(
|
||||
train_state.actor_params,
|
||||
train_state.actor_apply_fn,
|
||||
minibatch,
|
||||
rng,
|
||||
)
|
||||
|
||||
(critic_loss, critic_aux_info), critic_grads = self.critic_grad_fn(
|
||||
train_state.critic_params,
|
||||
train_state.critic_apply_fn,
|
||||
minibatch,
|
||||
rng,
|
||||
)
|
||||
|
||||
total_loss = actor_loss + critic_loss
|
||||
loss_info = (total_loss, actor_loss, critic_loss,) + \
|
||||
actor_aux_info + critic_aux_info
|
||||
loss_info = loss_info + \
|
||||
(optax.global_norm(actor_grads), optax.global_norm(critic_grads),)
|
||||
|
||||
if self.n_devices > 1:
|
||||
loss_info = jax.tree_util.tree_map(
|
||||
lambda x: jax.lax.pmean(x, 'device'), loss_info)
|
||||
actor_grads = jax.tree_util.tree_map(
|
||||
lambda x: jax.lax.pmean(x, 'device'), actor_grads)
|
||||
critic_grads = jax.tree_util.tree_map(
|
||||
lambda x: jax.lax.pmean(x, 'device'), critic_grads)
|
||||
|
||||
train_state = train_state.apply_gradients(
|
||||
actor_grads=actor_grads,
|
||||
critic_grads=critic_grads)
|
||||
|
||||
stats_def = jax.tree_util.tree_structure(OrderedDict({
|
||||
k: 0 for k in [
|
||||
'total_loss', # actor_loss + critic_loss
|
||||
'actor_loss', # loss_actor - entropy_coef*entropy
|
||||
'critic_loss', # value_loss_coef*value_loss
|
||||
'actor_loss_actor', # Without the entropy term added
|
||||
'actor_l2_reg_weight_loss',
|
||||
'actor_entropy',
|
||||
'actor_mean_target',
|
||||
'actor_mean_gae',
|
||||
'critic_value_loss',
|
||||
'critic_l2_reg_weight_loss',
|
||||
'critic_mean_value',
|
||||
'critic_mean_target',
|
||||
'critic_mean_gae',
|
||||
'actor_grad_norm',
|
||||
'critic_grad_norm',
|
||||
]}))
|
||||
|
||||
loss_stats = jax.tree_util.tree_unflatten(
|
||||
stats_def, jax.tree_util.tree_leaves(loss_info))
|
||||
return train_state, loss_stats
|
||||
|
||||
rngs = jax.random.split(rng, self.n_minibatches)
|
||||
train_state, loss_stats = jax.lax.scan(
|
||||
_update_minibatch,
|
||||
train_state,
|
||||
(rngs, minibatches),
|
||||
length=self.n_minibatches,
|
||||
unroll=self.n_unroll_update
|
||||
)
|
||||
|
||||
loss_stats = jax.tree_util.tree_map(
|
||||
lambda x: x.mean(axis=0), loss_stats)
|
||||
|
||||
return train_state, loss_stats
|
||||
|
||||
@partial(jax.jit, static_argnums=(0, 2, 4))
|
||||
def _actor_loss(
|
||||
self,
|
||||
params,
|
||||
apply_fn,
|
||||
batch,
|
||||
rng=None
|
||||
):
|
||||
"""Currently the shape of elements is n_rollout_steps x n_envs x n_env_agents x ...shape.
|
||||
This is one more than intended for the actor and critic. The extra dimension is for the
|
||||
env agents. We thus need to merge it into the n_envs dimension.
|
||||
"""
|
||||
carry = None
|
||||
|
||||
if self.is_recurrent:
|
||||
"""
|
||||
Elements have batch shape of n_rollout_steps x n_envs//n_minibatches
|
||||
"""
|
||||
batch = jax.tree_util.tree_map(
|
||||
lambda x: einops.rearrange(
|
||||
x, 't n a ... -> t (n a) ...'), batch
|
||||
)
|
||||
carry = jax.tree_util.tree_map(
|
||||
lambda x: x[0, :], batch.actor_carry)
|
||||
obs, _, action, rewards, dones, log_pi_old, value_old, target, gae, carry_old, _ = batch
|
||||
|
||||
if self.is_recurrent:
|
||||
dones = dones.at[1:, :].set(dones[:-1, :])
|
||||
dones = dones.at[0, :].set(False)
|
||||
_batch = batch._replace(dones=dones)
|
||||
|
||||
# Returns LxB and LxBxH tensors
|
||||
obs, _, action, _, done, _, _, _, _, _, _ = _batch
|
||||
log_pi, entropy, carry = apply_fn(
|
||||
params, action, obs, carry, done)
|
||||
else:
|
||||
log_pi, entropy, carry = apply_fn(
|
||||
params, action, obs, carry_old)
|
||||
else:
|
||||
batch = jax.tree_util.tree_map(
|
||||
lambda x: einops.rearrange(x, 'n a ... -> (n a) ...'), batch
|
||||
)
|
||||
obs, _, action, rewards, dones, log_pi_old, value_old, target, gae, _, _ = batch
|
||||
log_pi, entropy, _ = apply_fn(params, action, obs, carry)
|
||||
|
||||
ratio = jnp.exp(log_pi - log_pi_old)
|
||||
norm_gae = (gae - gae.mean()) / (gae.std() + 1e-5)
|
||||
loss_actor1 = ratio * norm_gae
|
||||
loss_actor2 = jnp.clip(ratio, 1.0 - self.clip_eps,
|
||||
1.0 + self.clip_eps) * norm_gae
|
||||
loss_actor = -jnp.minimum(loss_actor1, loss_actor2).mean()
|
||||
|
||||
entropy = entropy.mean()
|
||||
|
||||
l2_reg_actor = 0.0
|
||||
|
||||
actor_loss = loss_actor - self.entropy_coef * entropy + l2_reg_actor
|
||||
|
||||
return actor_loss, (
|
||||
loss_actor,
|
||||
l2_reg_actor,
|
||||
entropy,
|
||||
target.mean(),
|
||||
gae.mean()
|
||||
)
|
||||
|
||||
@partial(jax.jit, static_argnums=(0, 2, 4))
|
||||
def _critic_loss(
|
||||
self,
|
||||
params,
|
||||
apply_fn,
|
||||
batch,
|
||||
rng=None
|
||||
):
|
||||
|
||||
carry = None
|
||||
|
||||
if self.is_recurrent:
|
||||
"""
|
||||
Elements have batch shape of n_rollout_steps x n_envs//n_minibatches
|
||||
"""
|
||||
# Same as in the actor loss: merge the env-agent axis into the env axis.
|
||||
batch = jax.tree_util.tree_map(
|
||||
lambda x: einops.rearrange(
|
||||
x, 't n a ... -> t (n a) ...'), batch
|
||||
)
|
||||
carry = jax.tree_util.tree_map(
|
||||
lambda x: x[0, :], batch.critic_carry)
|
||||
_, obs_shared, action, rewards, dones, log_pi_old, value_old, target, gae, _, carry_old = batch
|
||||
|
||||
if self.is_recurrent:
|
||||
dones = dones.at[1:, :].set(dones[:-1, :])
|
||||
dones = dones.at[0, :].set(False)
|
||||
_batch = batch._replace(dones=dones)
|
||||
|
||||
# Returns LxB and LxBxH tensors
|
||||
_, obs_shared, action, _, done, _, _, _, _, _, _ = _batch
|
||||
value, carry = apply_fn(
|
||||
params, obs_shared, carry, done)
|
||||
else:
|
||||
value, carry = apply_fn(
|
||||
params, obs_shared, carry_old)
|
||||
value = value.squeeze(-1)
|
||||
else:
|
||||
batch = jax.tree_util.tree_map(
|
||||
lambda x: einops.rearrange(x, 'n a ... -> (n a) ...'), batch
|
||||
)
|
||||
obs, obs_shared, action, rewards, dones, log_pi_old, value_old, target, gae, _, _ = batch
|
||||
value, _ = apply_fn(params, obs_shared, carry)
|
||||
|
||||
if self.clip_value_loss:
|
||||
value_pred_clipped = value_old + (value - value_old).clip(
|
||||
-self.clip_eps, self.clip_eps
|
||||
)
|
||||
value_losses = jnp.square(value - target)
|
||||
value_losses_clipped = jnp.square(value_pred_clipped - target)
|
||||
value_loss = 0.5 * \
|
||||
jnp.maximum(value_losses, value_losses_clipped).mean()
|
||||
else:
|
||||
value_pred_clipped = value_old + (value - value_old).clip(
|
||||
-self.clip_eps, self.clip_eps
|
||||
)
|
||||
value_loss = optax.huber_loss(
|
||||
value_pred_clipped, target, delta=10.0).mean()
|
||||
|
||||
l2_reg_critic = 0.0
|
||||
|
||||
critic_loss = self.value_loss_coef*value_loss + l2_reg_critic
|
||||
|
||||
return critic_loss, (
|
||||
value_loss,
|
||||
l2_reg_critic,
|
||||
value.mean(),
|
||||
target.mean(),
|
||||
gae.mean()
|
||||
)
|
||||
|
||||
@partial(jax.jit, static_argnums=0)
|
||||
def _get_minibatches(self, rng, batch):
|
||||
# get dims based on dones
|
||||
n_rollout_steps, n_envs = batch.dones.shape[0:2]
|
||||
if self.is_recurrent:
|
||||
"""
|
||||
Reshape elements into a batch shape of
|
||||
n_minibatches x n_envs//n_minibatches x n_rollout_steps.
|
||||
"""
|
||||
assert n_envs % self.n_minibatches == 0, \
|
||||
'Number of environments must be divisible into number of minibatches.'
|
||||
|
||||
n_env_per_minibatch = n_envs//self.n_minibatches
|
||||
shuffled_idx = jax.random.permutation(rng, jnp.arange(n_envs))
|
||||
|
||||
shuffled_batch = jax.tree_util.tree_map(
|
||||
lambda x: jnp.take(x, shuffled_idx, axis=1), batch)
|
||||
|
||||
minibatches = jax.tree_util.tree_map(
|
||||
lambda x: x.swapaxes(0, 1).reshape(
|
||||
self.n_minibatches,
|
||||
n_env_per_minibatch,
|
||||
n_rollout_steps,
|
||||
*x.shape[2:]
|
||||
).swapaxes(1, 2), shuffled_batch)
|
||||
else:
|
||||
n_txns = n_envs*n_rollout_steps
|
||||
assert n_envs*n_rollout_steps % self.n_minibatches == 0
|
||||
|
||||
shuffled_idx = jax.random.permutation(rng, jnp.arange(n_txns))
|
||||
shuffled_batch = jax.tree_util.tree_map(
|
||||
lambda x: jnp.take(
|
||||
x.reshape(n_txns, *x.shape[2:]),
|
||||
shuffled_idx, axis=0), batch)
|
||||
minibatches = jax.tree_util.tree_map(
|
||||
lambda x: x.reshape(self.n_minibatches, -1, *x.shape[1:]), shuffled_batch)
|
||||
|
||||
return minibatches
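The recurrent branch of `_get_minibatches` can be checked in isolation. The following is a minimal sketch, assuming a toy array of shape `(n_rollout_steps, n_envs)` and using NumPy in place of `jax.numpy`; the helper name `split_recurrent_minibatches`, the toy sizes, and the fixed permutation are illustrative and not part of the repository. It mirrors the `take` / `swapaxes` / `reshape` / `swapaxes` sequence to show which axis is shuffled and what shape each minibatch ends up with.

```python
import numpy as np


def split_recurrent_minibatches(x, shuffled_idx, n_minibatches):
    """Split x of shape (T, N, ...) into (n_minibatches, T, N // n_minibatches, ...).

    Hypothetical helper that mirrors the recurrent branch of _get_minibatches.
    """
    n_steps, n_envs = x.shape[:2]
    assert n_envs % n_minibatches == 0, "n_envs must be divisible by n_minibatches"
    per_mb = n_envs // n_minibatches

    # Permute environments only; the time axis is left untouched.
    shuffled = np.take(x, shuffled_idx, axis=1)

    return (
        shuffled.swapaxes(0, 1)                              # (N, T, ...)
        .reshape(n_minibatches, per_mb, n_steps, *x.shape[2:])
        .swapaxes(1, 2)                                      # (M, T, N // M, ...)
    )


# Toy batch: entry [t, n] encodes time step t of environment n as 10 * n + t.
T, N, M = 3, 4, 2
x = np.array([[10 * n + t for n in range(N)] for t in range(T)])
idx = np.array([2, 0, 3, 1])  # fixed "shuffle" so the result is reproducible
mbs = split_recurrent_minibatches(x, idx, M)
print(mbs.shape)
print(mbs[0, :, 0])  # full trajectory of one environment, in time order
```

Because only the environment axis is permuted, each minibatch holds whole trajectories, which is what a recurrent model needs in order to unroll its carry over time. The final `swapaxes(1, 2)` puts the time axis first within each minibatch, matching the `n_rollout_steps x n_envs//n_minibatches` layout that the loss functions expect.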
|
304
src/minimax/agents/ppo.py
Normal file
|
@ -0,0 +1,304 @@
|
|||
"""
|
||||
Copyright (c) Meta Platforms, Inc. and affiliates.
|
||||
All rights reserved.
|
||||
|
||||
This source code is licensed under the license found in the
|
||||
LICENSE file in the root directory of this source tree.
|
||||
"""
|
||||
|
||||
from functools import partial
|
||||
from typing import Any, Callable, Tuple
|
||||
from collections import defaultdict, OrderedDict
|
||||
|
||||
import jax
|
||||
import jax.numpy as jnp
|
||||
import optax
|
||||
from flax.training.train_state import TrainState
|
||||
from tensorflow_probability.substrates import jax as tfp
|
||||
|
||||
from .agent import Agent
|
||||
|
||||
|
||||
class PPOAgent(Agent):
|
||||
def __init__(
|
||||
self,
|
||||
model,
|
||||
n_epochs=5,
|
||||
n_minibatches=1,
|
||||
value_loss_coef=0.5,
|
||||
entropy_coef=0.0,
|
||||
clip_eps=0.2,
|
||||
clip_value_loss=True,
|
||||
track_grad_norm=False,
|
||||
n_unroll_update=1,
|
||||
n_devices=1):
|
||||
|
||||
self.model = model
|
||||
self.n_epochs = n_epochs
|
||||
self.n_minibatches = n_minibatches
|
||||
self.value_loss_coef = value_loss_coef
|
||||
self.entropy_coef = entropy_coef
|
||||
self.clip_eps = clip_eps
|
||||
self.clip_value_loss = clip_value_loss
|
||||
self.track_grad_norm = track_grad_norm
|
||||
self.n_unroll_update = n_unroll_update
|
||||
self.n_devices = n_devices
|
||||
|
||||
self.grad_fn = jax.value_and_grad(self._loss, has_aux=True)
|
||||
|
||||
@property
|
||||
def is_recurrent(self):
|
||||
return self.model.is_recurrent
|
||||
|
||||
def init_params(self, rng, obs):
|
||||
"""
|
||||
Returns initialized parameters and RNN hidden state for a specific
|
||||
observation shape.
|
||||
"""
|
||||
rng, subrng = jax.random.split(rng)
|
||||
if self.model.is_recurrent:
|
||||
batch_size = jax.tree_util.tree_leaves(obs)[0].shape[1]
|
||||
carry = self.model.initialize_carry(
|
||||
rng=subrng, batch_dims=(batch_size,))
|
||||
reset = jnp.zeros((1, batch_size), dtype=jnp.bool_)
|
||||
rng, subrng = jax.random.split(rng)
|
||||
params = self.model.init(subrng, obs, carry, reset)
|
||||
else:
|
||||
params = self.model.init(subrng, obs)
|
||||
|
||||
return params
|
||||
|
||||
def init_carry(self, rng, batch_dims=(1,)):
|
||||
return self.model.initialize_carry(rng=rng, batch_dims=batch_dims)
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def act(self, params, obs, carry=None, reset=None):
|
||||
value, logits, carry = self.model.apply(params, obs, carry, reset)
|
||||
|
||||
return value, logits, carry
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def get_value(self, params, obs, carry=None, reset=None):
|
||||
value, _, carry = self.model.apply(params, obs, carry, reset)
|
||||
return value, carry
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def evaluate(self, params, action, obs, carry=None, reset=None):
|
||||
value, dist_params, carry = self.model.apply(params, obs, carry, reset)
|
||||
dist = self.get_action_dist(dist_params, dtype=action.dtype)
|
||||
log_prob = dist.log_prob(action)
|
||||
entropy = dist.entropy()
|
||||
|
||||
return value.squeeze(), \
|
||||
log_prob.squeeze(), \
|
||||
entropy.squeeze(), \
|
||||
carry
|
||||
|
||||
def get_action_dist(self, dist_params, dtype=jnp.uint8):
|
||||
return tfp.distributions.Categorical(logits=dist_params, dtype=dtype)
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def update(self, rng, train_state, batch):
|
||||
rngs = jax.random.split(rng, self.n_epochs)
|
||||
|
||||
def _scan_epoch(carry, rng):
|
||||
brng, urng = jax.random.split(rng)
|
||||
batch, train_state = carry
|
||||
minibatches = self._get_minibatches(brng, batch)
|
||||
train_state, stats = \
|
||||
self._update_epoch(
|
||||
urng, train_state, minibatches)
|
||||
|
||||
return (batch, train_state), stats
|
||||
|
||||
(_, train_state), stats = jax.lax.scan(
|
||||
_scan_epoch,
|
||||
(batch, train_state),
|
||||
rngs,
|
||||
length=len(rngs)
|
||||
)
|
||||
|
||||
stats = jax.tree_util.tree_map(lambda x: x.mean(), stats)
|
||||
train_state = train_state.increment_updates()
|
||||
|
||||
return train_state, stats
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def get_empty_update_stats(self):
|
||||
keys = ['total_loss',
|
||||
'actor_loss',
|
||||
'value_loss',
|
||||
'entropy',
|
||||
'mean_value',
|
||||
'mean_target',
|
||||
'mean_gae',
|
||||
'grad_norm']
|
||||
|
||||
return OrderedDict({k: -jnp.inf for k in keys})
|
||||
|
||||
@partial(jax.jit, static_argnums=(0,))
|
||||
def _update_epoch(
|
||||
self,
|
||||
rng,
|
||||
train_state: TrainState,
|
||||
minibatches):
|
||||
|
||||
def _update_minibatch(carry, step):
|
||||
rng, minibatch = step
|
||||
train_state = carry
|
||||
|
||||
(loss, aux_info), grads = self.grad_fn(
|
||||
train_state.params,
|
||||
train_state.apply_fn,
|
||||
minibatch,
|
||||
rng,
|
||||
)
|
||||
|
||||
loss_info = (loss,) + aux_info
|
||||
loss_info = loss_info + (optax.global_norm(grads),)
|
||||
|
||||
if self.n_devices > 1:
|
||||
loss_info = jax.tree_util.tree_map(
|
||||
lambda x: jax.lax.pmean(x, 'device'), loss_info)
|
||||
grads = jax.tree_util.tree_map(
|
||||
lambda x: jax.lax.pmean(x, 'device'), grads)
|
||||
|
||||
train_state = train_state.apply_gradients(grads=grads)
|
||||
|
||||
stats_def = jax.tree_util.tree_structure(OrderedDict({
|
||||
k: 0 for k in [
|
||||
'total_loss',
|
||||
'actor_loss',
|
||||
'value_loss',
|
||||
'entropy',
|
||||
'mean_value',
|
||||
'mean_target',
|
||||
'mean_gae',
|
||||
'grad_norm',
|
||||
]}))
|
||||
|
||||
loss_stats = jax.tree_util.tree_unflatten(
|
||||
stats_def, jax.tree_util.tree_leaves(loss_info))
|
||||
|
||||
return train_state, loss_stats
|
||||
|
||||
rngs = jax.random.split(rng, self.n_minibatches)
|
||||
train_state, loss_stats = jax.lax.scan(
|
||||
_update_minibatch,
|
||||
train_state,
|
||||
(rngs, minibatches),
|
||||
length=self.n_minibatches,
|
||||
unroll=self.n_unroll_update
|
||||
)
|
||||
|
||||
loss_stats = jax.tree_util.tree_map(
|
||||
lambda x: x.mean(axis=0), loss_stats)
|
||||
|
||||
return train_state, loss_stats
|
||||
|
||||
@partial(jax.jit, static_argnums=(0, 2, 4))
|
||||
def _loss(
|
||||
self,
|
||||
params,
|
||||
apply_fn,
|
||||
batch,
|
||||
rng=None):
|
||||
carry = None
|
||||
|
||||
if self.is_recurrent:
|
||||
"""
|
||||
Elements have batch shape of n_rollout_steps x n_envs//n_minibatches
|
||||
"""
|
||||
carry = jax.tree_util.tree_map(lambda x: x[0, :], batch.carry)
|
||||
obs, action, rewards, dones, log_pi_old, value_old, target, gae, carry_old = batch
|
||||
|
||||
if self.is_recurrent:
|
||||
dones = dones.at[1:, :].set(dones[:-1, :])
|
||||
dones = dones.at[0, :].set(False)
|
||||
_batch = batch._replace(dones=dones)
|
||||
|
||||
# Returns LxB and LxBxH tensors
|
||||
obs, action, _, done, _, _, _, _, _ = _batch
|
||||
value, log_pi, entropy, carry = apply_fn(
|
||||
params, action, obs, carry, done)
|
||||
else:
|
||||
value, log_pi, entropy, carry = apply_fn(
|
||||
params, action, obs, carry_old)
|
||||
else:
|
||||
obs, action, rewards, dones, log_pi_old, value_old, target, gae, _ = batch
|
||||
value, log_pi, entropy, _ = apply_fn(params, action, obs, carry)
|
||||
|
||||
if self.clip_value_loss:
|
||||
value_pred_clipped = value_old + (value - value_old).clip(
|
||||
-self.clip_eps, self.clip_eps
|
||||
)
|
||||
value_losses = jnp.square(value - target)
|
||||
value_losses_clipped = jnp.square(value_pred_clipped - target)
|
||||
value_loss = 0.5 * \
|
||||
jnp.maximum(value_losses, value_losses_clipped).mean()
|
||||
else:
|
||||
value_loss = optax.huber_loss(value, target).mean()
|
||||
|
||||
if self.model.value_ensemble_size > 1:
|
||||
gae = gae.at[..., 0].get()
|
||||
|
||||
ratio = jnp.exp(log_pi - log_pi_old)
|
||||
norm_gae = (gae - gae.mean()) / (gae.std() + 1e-5)
|
||||
loss_actor1 = ratio * norm_gae
|
||||
loss_actor2 = jnp.clip(ratio, 1.0 - self.clip_eps,
|
||||
1.0 + self.clip_eps) * norm_gae
|
||||
loss_actor = -jnp.minimum(loss_actor1, loss_actor2).mean()
|
||||
|
||||
entropy = entropy.mean()
|
||||
|
||||
total_loss = (
|
||||
loss_actor + self.value_loss_coef*value_loss - self.entropy_coef*entropy
|
||||
)
|
||||
|
||||
return total_loss, (
|
||||
loss_actor,
|
||||
value_loss,
|
||||
entropy,
|
||||
value.mean(),
|
||||
target.mean(),
|
||||
gae.mean()
|
||||
)
|
||||
|
||||
@partial(jax.jit, static_argnums=0)
|
||||
def _get_minibatches(self, rng, batch):
|
||||
# get dims based on dones
|
||||
n_rollout_steps, n_envs = batch.dones.shape[0:2]
|
||||
if self.is_recurrent:
|
||||
"""
|
||||
Reshape elements into a batch shape of
|
||||
n_minibatches x n_envs//n_minibatches x n_rollout_steps.
|
||||
"""
|
||||
assert n_envs % self.n_minibatches == 0, \
|
||||
'Number of environments must be divisible into number of minibatches.'
|
||||
|
||||
n_env_per_minibatch = n_envs//self.n_minibatches
|
||||
shuffled_idx = jax.random.permutation(rng, jnp.arange(n_envs))
|
||||
|
||||
shuffled_batch = jax.tree_util.tree_map(
|
||||
lambda x: jnp.take(x, shuffled_idx, axis=1), batch)
|
||||
|
||||
minibatches = jax.tree_util.tree_map(
|
||||
lambda x: x.swapaxes(0, 1).reshape(
|
||||
self.n_minibatches,
|
||||
n_env_per_minibatch,
|
||||
n_rollout_steps,
|
||||
*x.shape[2:]
|
||||
).swapaxes(1, 2), shuffled_batch)
|
||||
else:
|
||||
n_txns = n_envs*n_rollout_steps
|
||||
assert n_envs*n_rollout_steps % self.n_minibatches == 0
|
||||
|
||||
shuffled_idx = jax.random.permutation(rng, jnp.arange(n_txns))
|
||||
shuffled_batch = jax.tree_util.tree_map(
|
||||
lambda x: jnp.take(
|
||||
x.reshape(n_txns, *x.shape[2:]),
|
||||
shuffled_idx, axis=0), batch)
|
||||
minibatches = jax.tree_util.tree_map(
|
||||
lambda x: x.reshape(self.n_minibatches, -1, *x.shape[1:]), shuffled_batch)
|
||||
|
||||
return minibatches
|
1023
src/minimax/arguments.py
Normal file
0
src/minimax/config/__init__.py
Normal file
73
src/minimax/config/configs/maze/accel.json
Normal file
|
@ -0,0 +1,73 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.8],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [false],
|
||||
"plr_force_unique": [true],
|
||||
"plr_mutation_fn": ["default"],
|
||||
"plr_n_mutations": [20],
|
||||
"plr_mutation_criterion": ["batch"],
|
||||
"plr_mutation_subsample_size": [4],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [0],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
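Every value under `"args"` in this config is wrapped in a list, which suggests that a config can describe a grid of runs. The following is a minimal sketch of that idea, assuming the grid is the Cartesian product of the lists and that each argument is rendered as a `--key=value` flag; this is an assumption for illustration and not a description of what `minimax.config.make_cmd` actually does. The names `expand_grid` and `to_flags` are hypothetical, and the toy config uses two seeds purely to make the expansion visible.

```python
import itertools
import json


def expand_grid(config):
    """Yield one {arg: value} dict per combination of the listed values.

    Hypothetical helper: assumes config["args"] maps names to lists.
    """
    args = config["args"]
    keys = list(args)
    for values in itertools.product(*(args[k] for k in keys)):
        yield dict(zip(keys, values))


def to_flags(assignment):
    """Render one assignment as command-line flags (assumed --key=value form)."""
    return " ".join(f"--{k}={v}" for k, v in assignment.items())


# Toy config in the same shape as the file above, with an extra seed.
config = json.loads(
    '{"args": {"seed": [1, 2], "train_runner": ["plr"], "lr": [0.0001]}}'
)
commands = [to_flags(a) for a in expand_grid(config)]
for cmd in commands:
    print(cmd)
```

With single-element lists, as in the config above, such an expansion yields exactly one run; adding a second value to any list doubles the number of runs.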
|
59
src/minimax/config/configs/maze/dr.json
Normal file
|
@ -0,0 +1,59 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["dr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
73
src/minimax/config/configs/maze/paccel.json
Normal file
|
@ -0,0 +1,73 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.8],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"plr_mutation_fn": ["default"],
|
||||
"plr_n_mutations": [10],
|
||||
"plr_mutation_criterion": ["batch"],
|
||||
"plr_mutation_subsample_size": [4],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [0],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
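Every value under `args` in these configs is wrapped in a list, which suggests that each key is one axis of a hyperparameter grid and that a single-element list simply pins the value. The sketch below illustrates that reading, assuming the grid is the Cartesian product of the lists. `expand_grid` is a hypothetical helper, and the second `lr` value is made up for illustration; it is not part of any config shown here.

```python
import itertools
import json

# Excerpt in the style of the configs above. The two-valued `lr` axis is an
# illustrative assumption, not taken from any shipped config.
CONFIG_TEXT = """
{
  "args": {
    "seed": [1],
    "train_runner": ["plr"],
    "lr": [0.0001, 0.00005],
    "plr_replay_prob": [0.8]
  }
}
"""


def expand_grid(grid):
    """Return one dict per combination of the list-valued entries in `grid`."""
    keys = list(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


grid = json.loads(CONFIG_TEXT)["args"]
settings = expand_grid(grid)
for setting in settings:
    print(setting)
```

With only single-element lists, as in the files in this commit, the expansion yields exactly one setting per config.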
|
84
src/minimax/config/configs/maze/paired.json
Normal file
|
@ -0,0 +1,84 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.995],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.05],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [5],
|
||||
"teacher_ppo_n_minibatches": [1],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["lstm"],
|
||||
"teacher_recurrent_hidden_dim": [256],
|
||||
"teacher_hidden_dim": [32],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [128],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [false],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"maze_ued_replace_wall_pos": [true],
|
||||
"maze_ued_fixed_n_wall_steps": [true],
|
||||
"maze_ued_first_wall_pos_sets_budget": [false],
|
||||
"maze_ued_noise_dim": [50],
|
||||
"maze_ued_n_walls": [60],
|
||||
"maze_ued_set_agent_dir": [false],
|
||||
"maze_ued_normalize_obs": [true],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
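The PAIRED config mixes shared settings with role-specific ones that are distinguished only by key prefix (`student_`, `teacher_`, `maze_ued_`, and so on). When reading or auditing such a file it can help to split the keys by prefix. The following is a minimal sketch, not part of the repository: `split_by_prefix` is a hypothetical helper, and the prefix list is an assumption read off the key names above.

```python
# Prefixes are an assumption read off the key names in the config above.
# Longer prefixes come first so that "maze_ued_" wins over "maze_".
PREFIXES = ("maze_ued_", "maze_test_", "student_", "teacher_", "maze_", "test_")

# A few entries copied from the PAIRED config for illustration.
ARGS_EXCERPT = {
    "train_runner": ["paired"],
    "n_students": [2],
    "student_entropy_coef": [0.001],
    "teacher_entropy_coef": [0.05],
    "teacher_n_conv_filters": [128],
    "maze_n_walls": [60],
    "maze_ued_n_walls": [60],
    "maze_test_see_agent": [False],
}


def split_by_prefix(args, prefixes=PREFIXES):
    """Group keys under the first matching prefix; the rest go under 'shared'."""
    groups = {}
    for key, value in args.items():
        group = next((p for p in prefixes if key.startswith(p)), "shared")
        groups.setdefault(group, {})[key] = value
    return groups


groups = split_by_prefix(ARGS_EXCERPT)
for name, members in groups.items():
    print(name, sorted(members))
```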
|
69
src/minimax/config/configs/maze/plr.json
Normal file
|
@ -0,0 +1,69 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [5e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.1],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [false],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
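The `plr.json` and `paccel.json` configs share most keys and differ in only a handful of values, which is hard to see when the files are read one after the other. A small sketch for listing such differences follows. `diff_args` is a hypothetical helper, and the two dicts are excerpts copied from the files above rather than the full configs.

```python
# Excerpts from paccel.json and plr.json (JSON true/false become True/False).
PACCEL = {
    "train_runner": ["plr"],
    "lr": [0.0001],
    "plr_replay_prob": [0.8],
    "plr_use_parallel_eval": [True],
    "plr_n_mutations": [10],
    "maze_n_walls": [0],
}
PLR = {
    "train_runner": ["plr"],
    "lr": [5e-05],
    "plr_replay_prob": [0.5],
    "plr_use_parallel_eval": [False],
    "maze_n_walls": [60],
}


def diff_args(a, b):
    """Map each key whose value differs to (value_in_a, value_in_b).

    A key missing on one side shows up as None on that side.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


differences = diff_args(PACCEL, PLR)
for key, (left, right) in differences.items():
    print(f"{key}: paccel={left} plr={right}")
```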
|
69
src/minimax/config/configs/maze/pplr.json
Normal file
|
@ -0,0 +1,69 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [false],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.3],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["lstm"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
78
src/minimax/config/configs/maze/s5_accel.json
Normal file
|
@ -0,0 +1,78 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [3e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.8],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.3],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [false],
|
||||
"plr_force_unique": [true],
|
||||
"plr_mutation_fn": ["default"],
|
||||
"plr_n_mutations": [10],
|
||||
"plr_mutation_criterion": ["batch"],
|
||||
"plr_mutation_subsample_size": [4],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["post"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [0],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"test_agent_idxs": ["\"*\""],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
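Note that `test_agent_idxs` is written as `"\"*\""`, so the decoded JSON string contains literal double quotes around the asterisk. Presumably this keeps the value quoted when it is pasted into a shell command, so that `*` is not glob-expanded. The sketch below shows what the JSON decodes to and how a POSIX-style shell would tokenize the resulting flag; the `--key=value` rendering is an assumption about how the value is consumed.

```python
import json
import shlex

# The same escaping as in the config above, as a raw JSON document.
raw = r'{"test_agent_idxs": ["\"*\""]}'

value = json.loads(raw)["test_agent_idxs"][0]
print(value)  # the decoded string still includes the double quotes

# Assumed rendering of the value as a command-line flag.
arg = f"--test_agent_idxs={value}"
print(arg)

# shlex follows POSIX shell quoting rules: the quotes are consumed and the
# asterisk is passed through literally instead of being expanded.
tokens = shlex.split(arg)
print(tokens)
```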
|
63
src/minimax/config/configs/maze/s5_dr.json
Normal file
|
@ -0,0 +1,63 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["dr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [3e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["post"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
77
src/minimax/config/configs/maze/s5_paccel.json
Normal file
|
@ -0,0 +1,77 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [1e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.8],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.3],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"plr_mutation_fn": ["default"],
|
||||
"plr_n_mutations": [20],
|
||||
"plr_mutation_criterion": ["batch"],
|
||||
"plr_mutation_subsample_size": [4],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.0],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["post"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [0],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
94
src/minimax/config/configs/maze/s5_paired.json
Normal file
|
@ -0,0 +1,94 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["paired"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [2],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [0.0001],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["relative_regret"],
|
||||
"student_gae_lambda": [0.98],
|
||||
"teacher_discount": [0.995],
|
||||
"teacher_lr": [0.0001],
|
||||
"teacher_lr_anneal_steps": [0],
|
||||
"teacher_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"teacher_entropy_coef": [0.001],
|
||||
"teacher_value_loss_coef": [0.5],
|
||||
"teacher_n_unroll_update": [5],
|
||||
"teacher_ppo_n_epochs": [5],
|
||||
"teacher_ppo_n_minibatches": [1],
|
||||
"teacher_ppo_clip_eps": [0.2],
|
||||
"teacher_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["post"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"teacher_model_name": ["default_teacher_cnn"],
|
||||
"teacher_recurrent_arch": ["s5"],
|
||||
"teacher_recurrent_hidden_dim": [256],
|
||||
"teacher_hidden_dim": [32],
|
||||
"teacher_n_hidden_layers": [1],
|
||||
"teacher_n_conv_filters": [32],
|
||||
"teacher_scalar_embed_dim": [10],
|
||||
"teacher_s5_n_blocks": [2],
|
||||
"teacher_s5_n_layers": [2],
|
||||
"teacher_s5_layernorm_pos": ["post"],
|
||||
"teacher_s5_activation": ["half_glu1"],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [false],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"maze_ued_replace_wall_pos": [true],
|
||||
"maze_ued_fixed_n_wall_steps": [true],
|
||||
"maze_ued_first_wall_pos_sets_budget": [false],
|
||||
"maze_ued_noise_dim": [50],
|
||||
"maze_ued_n_walls": [60],
|
||||
"maze_ued_set_agent_dir": [false],
|
||||
"maze_ued_normalize_obs": [true],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"test_agent_idxs": ["\"*\""],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
73
src/minimax/config/configs/maze/s5_plr.json
Normal file
|
@ -0,0 +1,73 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [3e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.999],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.3],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [false],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["pre"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
73
src/minimax/config/configs/maze/s5_pplr.json
Normal file
|
@ -0,0 +1,73 @@
|
|||
{
|
||||
"args": {
|
||||
"seed": [1],
|
||||
"agent_rl_algo": ["ppo"],
|
||||
"n_total_updates": [30000],
|
||||
"train_runner": ["plr"],
|
||||
"n_devices": [1],
|
||||
"student_model_name": ["default_student_cnn"],
|
||||
"env_name": ["Maze"],
|
||||
"verbose": [false],
|
||||
"log_dir": ["~/logs/minimax"],
|
||||
"log_interval": [10],
|
||||
"from_last_checkpoint": [true],
|
||||
"checkpoint_interval": [1000],
|
||||
"archive_interval": [0],
|
||||
"archive_init_checkpoint": [false],
|
||||
"test_interval": [100],
|
||||
"n_students": [1],
|
||||
"n_parallel": [32],
|
||||
"n_eval": [1],
|
||||
"n_rollout_steps": [256],
|
||||
"lr": [3e-05],
|
||||
"lr_anneal_steps": [0],
|
||||
"max_grad_norm": [0.5],
|
||||
"adam_eps": [1e-05],
|
||||
"track_env_metrics": [true],
|
||||
"discount": [0.995],
|
||||
"n_unroll_rollout": [10],
|
||||
"render": [false],
|
||||
"ued_score": ["max_mc"],
|
||||
"plr_replay_prob": [0.5],
|
||||
"plr_buffer_size": [4000],
|
||||
"plr_staleness_coef": [0.3],
|
||||
"plr_temp": [0.3],
|
||||
"plr_use_score_ranks": [true],
|
||||
"plr_min_fill_ratio": [0.5],
|
||||
"plr_use_robust_plr": [true],
|
||||
"plr_use_parallel_eval": [true],
|
||||
"plr_force_unique": [true],
|
||||
"student_gae_lambda": [0.98],
|
||||
"student_entropy_coef": [0.001],
|
||||
"student_value_loss_coef": [0.5],
|
||||
"student_n_unroll_update": [5],
|
||||
"student_ppo_n_epochs": [5],
|
||||
"student_ppo_n_minibatches": [1],
|
||||
"student_ppo_clip_eps": [0.2],
|
||||
"student_ppo_clip_value_loss": [true],
|
||||
"student_recurrent_arch": ["s5"],
|
||||
"student_recurrent_hidden_dim": [256],
|
||||
"student_hidden_dim": [32],
|
||||
"student_n_hidden_layers": [1],
|
||||
"student_n_conv_filters": [16],
|
||||
"student_n_scalar_embeddings": [4],
|
||||
"student_scalar_embed_dim": [5],
|
||||
"student_s5_n_blocks": [2],
|
||||
"student_s5_n_layers": [2],
|
||||
"student_s5_layernorm_pos": ["post"],
|
||||
"student_s5_activation": ["half_glu1"],
|
||||
"maze_height": [13],
|
||||
"maze_width": [13],
|
||||
"maze_n_walls": [60],
|
||||
"maze_replace_wall_pos": [true],
|
||||
"maze_sample_n_walls": [false],
|
||||
"maze_see_agent": [false],
|
||||
"maze_normalize_obs": [true],
|
||||
"maze_obs_agent_pos": [false],
|
||||
"maze_max_episode_steps": [250],
|
||||
"test_n_episodes": [10],
|
||||
"test_env_names": ["Maze-SixteenRooms,Maze-Labyrinth,Maze-StandardMaze"],
|
||||
"maze_test_see_agent": [false],
|
||||
"maze_test_normalize_obs": [true]
|
||||
}
|
||||
}
|
287
src/minimax/config/make_cmd.py
Normal file
|
@ -0,0 +1,287 @@
|
|||
"""
|
||||
Copyright (c) Meta Platforms, Inc. and affiliates.
All rights reserved.

This source code is licensed under the license found in the
LICENSE file in the root directory of this source tree.
"""

import argparse
import json
import os
import pathlib

import numpy as np

from minimax.util.dotdict import DefaultDotDict
import minimax.config.xpid_maker as xpid_maker


def get_wandb_config():
    wandb_config_path = os.path.join(os.path.abspath(os.getcwd()), 'config', 'wandb.json')
    if os.path.exists(wandb_config_path):
        with open(wandb_config_path, 'r') as config_file:
            config = json.load(config_file)
            if len(config) == 2:
                return {
                    'wandb_base_url': config['base_url'],
                    'wandb_api_key': config['api_key'],
                }

    return {}


def generate_train_cmds(
    cmd, params, num_trials=1, start_index=0, newlines=False,
    xpid_generator=None, xpid_prefix='',
    include_wandb_group=False,
    count_set=None):
    """Return one `python -m <cmd>` command string per trial."""
    separator = ' \\\n' if newlines else ' '

    cmds = []

    if xpid_generator:
        params['xpid'] = xpid_generator(cmd, params, xpid_prefix)
        if include_wandb_group:
            params['wandb_group'] = params['xpid']

    start_seed = params['seed']

    for t in range(num_trials):
        # Each trial gets its own seed, offset from the configured start seed.
        params['seed'] = start_seed + t + start_index

        _cmd = [f'python -m {cmd}']

        trial_idx = t + start_index
        for k,v in params.items():
            if v is None:
                continue

            if k == 'xpid':
                # Suffix the xpid with the trial index so each run is distinct.
                v = f'{v}_{trial_idx}'

                assert len(v) < 256, f'{v} exceeds 256 characters!'

                if count_set is not None:
                    count_set.add(v)

            # Quote a bare asterisk so the shell does not glob-expand it.
            if v == "*":
                v = '"*"'

            _cmd.append(f'--{k}={v}')

        _cmd = separator.join(_cmd)

        cmds.append(_cmd)

    return cmds


def generate_all_params_for_grid(grid, defaults={}):
    def update_params_with_choices(prev_params, param, choices):
        updated_params = []
        for v in choices:
            for p in prev_params:
                updated = p.copy()
                updated[param] = v
                updated_params.append(updated)

        return updated_params

    all_params = [{}]
    for param, choices in grid.items():
        all_params = update_params_with_choices(all_params, param, choices)

    full_params = []
    for p in all_params:
        d = defaults.copy()
        d.update(p)
        full_params.append(d)

    return full_params


def parse_args():
    parser = argparse.ArgumentParser(description='Make commands')

    parser.add_argument(
        '--dir',
        type=str,
        default='config/configs/',
        help='Path to directory with .json configs')

    parser.add_argument(
        '--config', '-c',
        type=str,
        default=None,
        help='Name of .json config for hyperparameter search-grid')

    parser.add_argument(
        '--n_trials',
        type=int,
        default=1,
        help='Number of trials (consecutive seeds) to generate per setting')

    parser.add_argument(
        '--start_index',
        default=0,
        type=int,
        help='Starting trial index of xpid runs')

    parser.add_argument(
        '--count',
        action='store_true',
        help='Print number of generated commands at the end of output.')

    parser.add_argument(
        "--checkpoint",
        action='store_true',
        help='Whether to start from checkpoint'
    )

    parser.add_argument(
        "--wandb_base_url",
        type=str,
        default=None,
        help='wandb base url'
    )
    parser.add_argument(
        "--wandb_api_key",
        type=str,
        default=None,
        help='wandb api key'
    )
    parser.add_argument(
        '--wandb_project',
        type=str,
        default=None,
        help='wandb project name')

    parser.add_argument(
        '--include_wandb_group',
        action="store_true",
        help='Whether to include wandb group in cmds.')

    return parser.parse_args()


def xpid_from_params(cmd, p, prefix=''):
    p = DefaultDotDict(p)

    env_info = xpid_maker.get_env_info(p)
    runner_info = xpid_maker.get_runner_info(p)
    a_algo_info = xpid_maker.get_algo_info(p, role='student')

    a_info = a_algo_info
    if cmd != 'finetune':
        a_model_info = xpid_maker.get_model_info(p, role='student')
        a_info = f"{a_info}_{a_model_info}"
        pt_info = ''
    else:
        pt_agent_info = 'tch' if p.get('ft_teacher') else 'st'
        pt_info = f"-{p.get('checkpoint_name', 'checkpoint')}_{pt_agent_info}"

    tch_info = ''
    train_runner = p.get('train_runner', 'dr')
    if train_runner == 'paired':
        tch_algo_info = xpid_maker.get_algo_info(p, role='teacher')
        tch_model_info = xpid_maker.get_model_info(p, role='teacher')
        tch_info = f"_tch_{tch_algo_info}_{tch_model_info}"

    xpid = f"{train_runner}-{env_info}-{runner_info}-{a_info}{tch_info}{pt_info}"

    return xpid


def setup_config_dir():
    config_dir = 'config/configs'
    if not os.path.exists(os.path.join(config_dir, 'maze')):
        os.makedirs(config_dir, exist_ok=True)

        import shutil

        this_path = os.path.dirname(os.path.abspath(__file__))
        src_path = os.path.join(this_path, 'configs')

        for item in os.listdir(src_path):
            src_item = os.path.join(src_path, item)
            dst_item = os.path.join(config_dir, item)

            if os.path.isdir(src_item):
                shutil.copytree(src_item, dst_item, symlinks=True)
            else:
                shutil.copy(src_item, dst_item)


if __name__ == '__main__':
    args = parse_args()

    # Default parameters
    params = {
        # Not needed.
    }

    setup_config_dir()

    json_filename = args.config
    if not json_filename.endswith('.json'):
        json_filename += '.json'

    grid_path = os.path.join(os.path.expandvars(os.path.expanduser(args.dir)), json_filename)
    with open(grid_path) as grid_file:
        config = json.load(grid_file)
    cmd = config.get('cmd', 'train')
    grid = config['args']
    xpid_prefix = '' if 'xpid_prefix' not in config else config['xpid_prefix']

    if args.checkpoint:
        params['checkpoint'] = True

    # Use the command-line project name as a default; a `wandb_project`
    # entry in the grid still takes precedence over it.
    if args.wandb_project:
        params['wandb_project'] = args.wandb_project

    if args.wandb_base_url:
        params['wandb_base_url'] = args.wandb_base_url
    if args.wandb_api_key:
        params['wandb_api_key'] = args.wandb_api_key

    params.update(get_wandb_config())

    # Generate all parameter combinations within grid, using defaults for fixed params
    all_params = generate_all_params_for_grid(grid, defaults=params)

    unique_xpids = None
    if args.count:
        unique_xpids = set()

    # Print all commands
    if cmd == 'eval':
        xpid_generator = None
    else:
        xpid_generator = xpid_from_params
    count = 0
    for p in all_params:
        cmds = generate_train_cmds(
            cmd, p,
            num_trials=args.n_trials,
            start_index=args.start_index,
            newlines=True,
            xpid_generator=xpid_generator,
            xpid_prefix=xpid_prefix,
            include_wandb_group=args.include_wandb_group,
            count_set=unique_xpids)

        for c in cmds:
            print(c + '\n')
            count += 1

    if args.count:
        print(f'Generated {len(unique_xpids)} unique commands.')
        print('Sweep over')
        grid_sizes = []
        for k,v in grid.items():
            if len(v) > 1:
                grid_sizes.append(len(v))
                print(f'{k}: {len(v)}')

        # np.prod of an empty list is the float 1.0, so cast for a clean count.
        print(f'Total num settings: {int(np.prod(grid_sizes))}')
|
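To illustrate the command format that `make_cmd.py` prints, here is a simplified, self-contained sketch of the per-trial rendering: the seed is offset by the trial index, `None` values are skipped, and every remaining parameter becomes a `--key=value` flag. `render_cmds` is a hypothetical stand-in written for this example; it omits xpid generation and the W&B options, and unlike the original it copies the parameter dict instead of mutating it.

```python
def render_cmds(cmd, params, num_trials=1, start_index=0, newlines=False):
    """Render one `python -m <cmd>` line per trial, bumping the seed each time."""
    separator = ' \\\n' if newlines else ' '
    cmds = []
    for t in range(num_trials):
        trial = dict(params)  # copy so the caller's dict is left untouched
        trial['seed'] = params['seed'] + t + start_index
        parts = [f'python -m {cmd}']
        # Parameters set to None are omitted from the command line.
        parts += [f'--{k}={v}' for k, v in trial.items() if v is not None]
        cmds.append(separator.join(parts))
    return cmds


example = render_cmds(
    'train',
    {'seed': 1, 'lr': 0.0001, 'wandb_project': None},
    num_trials=2,
)
for line in example:
    print(line)
```

Passing `newlines=True` joins the flags with a backslash-newline instead of a space, which is the form the script itself uses so that long commands stay readable when pasted into a shell.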