code to train classifiers

This commit is contained in:
Sabrina Hoppe 2018-05-05 22:05:03 +02:00
parent 0403f2ce55
commit 34ff6100e6
7 changed files with 438 additions and 0 deletions

View file

@ -25,7 +25,25 @@ reproducing the paper results step by step:
1. __Extract features from raw gaze data__:
`python 00_compute_features.py` to compute gaze features for all participants
Once extracted, the features are stored in `features/ParticipantXX/window_features_YY.npy` where XX is the participant number and YY the length of the sliding window in seconds.
2. __Train random forest classifiers__
`./01 train_classifiers.sh` to reproduce the evaluation setting described in the paper in which each classifier was trained 100 times.
`./02_train_specialized_classifiers.sh` to train specialized classifiers on parts of the data (specifically on data from inside the shop or on the way).
If the scripts cannot be executed, you might not have the right access permissions to do so. On Linux, you can try `chmod +x 01_train_classifiers.sh`,`chmod +x 02_train_specialized_classifiers.sh` and `chmod +x 03_label_permutation_test.sh` (see below for when/how to use the last script).
In case you want to call the script differently, e.g. to speed-up the computation or try with different parameters, you can pass the following arguments to `classifiers.train_classifier`:
`-t` trait index between 0 and 6
`-l` lowest number of repetitions, e.g. 0
`-m` max number of repetitions, e.g. 100
`-a` using partial data only: 0 (all data), 1 (way data), 2(shop data)
In case of performance issues, it might be useful to check `_conf.py` and change `max_n_jobs` to restrict the number of jobs (i.e. threads) running in parallel.
The results will be saved in `results/A0` for all data, `results/A1` for way data only and `results/A2` for data inside a shop. Each file is named `TTT_XXX.npz`, where TTT is the abbreviation of the personality trait (`O`,`C`,`E`,`A`,`N` for the Big Five and `CEI` or `PCS` for the two curiosity measures). XXX enumerates the classifiers (remember that we always train 100 classifiers for evaluation because there is some randomness involved in the training process).
3. __Evaluate Baselines__
* To train a classifier that always predicts the most frequent personality score range from its current training set, please execute `python 03_train_baseline.py`
* To train classifiers on permuted labels, i.e. perform the so-called label permutation test, please execute `./04_label_permutation_test.sh`
## Citation