code to train classifiers

2018-05-05 22:05:03 +02:00 · 2018-05-05 22:05:03 +02:00 · 34ff6100e6
commit 34ff6100e6
parent 0403f2ce55
7 changed files with 438 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -25,7 +25,25 @@ reproducing the paper results step by step:
 1. __Extract features from raw gaze data__:    
   `python 00_compute_features.py` to compute gaze features for all participants  
   Once extracted, the features are stored in `features/ParticipantXX/window_features_YY.npy` where XX is the participant number and YY the length of the sliding window in seconds.  
+2. __Train random forest classifiers__  
+    `./01_train_classifiers.sh` to reproduce the evaluation setting described in the paper, in which each classifier was trained 100 times.  
+    `./02_train_specialized_classifiers.sh` to train specialized classifiers on parts of the data (specifically on data from inside the shop or on the way).

+    If the scripts cannot be executed, you may be missing execute permission for them. On Linux, you can try `chmod +x 01_train_classifiers.sh`, `chmod +x 02_train_specialized_classifiers.sh` and `chmod +x 03_label_permutation_test.sh` (see below for when and how to use the last script).
+
+    If you want to call the script differently, e.g. to speed up the computation or to try different parameters, you can pass the following arguments to `classifiers.train_classifier`:  
+      `-t`   trait index between 0 and 6  
+      `-l`   lowest number of repetitions, e.g. 0  
+      `-m`   max number of repetitions, e.g. 100  
+      `-a`   which part of the data to use: 0 (all data), 1 (way data), 2 (shop data)  
+
+    In case of performance issues, it might be useful to check `_conf.py` and change `max_n_jobs` to restrict the number of jobs (i.e. threads) running in parallel.
+
+    The results will be saved in `results/A0` for all data, `results/A1` for way data only and `results/A2` for data inside a shop. Each file is named `TTT_XXX.npz`, where TTT is the abbreviation of the personality trait (`O`,`C`,`E`,`A`,`N` for the Big Five and `CEI` or `PCS` for the two curiosity measures). XXX enumerates the classifiers (remember that we always train 100 classifiers for evaluation because there is some randomness involved in the training process).  
+
+3. __Evaluate Baselines__
+   * To train a classifier that always predicts the most frequent personality score range from its current training set, please execute `python 03_train_baseline.py`  
+   * To train classifiers on permuted labels, i.e. perform the so-called label permutation test, please execute `./04_label_permutation_test.sh`    
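
The arguments listed in step 2 can be combined to split the repetitions into smaller chunks, for example to run them on several machines. The following is a minimal sketch and not part of the repository: it assumes the module is invoked as `python -m classifiers.train_classifier` and that `-l` is inclusive while `-m` is exclusive. `chunked_commands` is a hypothetical helper, so please check the shell scripts for the exact call before relying on it.

```python
def chunked_commands(trait, total=100, chunk=25, data_part=0):
    """Build one argument list per chunk of repetitions (hypothetical helper)."""
    commands = []
    for low in range(0, total, chunk):
        high = min(low + chunk, total)
        commands.append([
            # Assumed invocation; the repository's scripts may call it differently.
            "python", "-m", "classifiers.train_classifier",
            "-t", str(trait),      # trait index between 0 and 6
            "-l", str(low),        # lowest repetition of this chunk
            "-m", str(high),       # max repetition of this chunk (assumed exclusive)
            "-a", str(data_part),  # 0 (all data), 1 (way data), 2 (shop data)
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for cmd in chunked_commands(trait=0):
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch free of side effects; each line can then be started by hand or handed to a job scheduler.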

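The `TTT_XXX.npz` naming scheme for result files described in step 2 can be parsed with a few lines of Python. This is only a sketch: it assumes that `XXX` is a plain decimal number (the README does not say whether it is zero-padded), and `parse_result_name` is a hypothetical helper rather than a function from the repository.

```python
import re

# Trait abbreviations as listed in the README.
TRAITS = ("O", "C", "E", "A", "N", "CEI", "PCS")

# Assumed pattern: trait abbreviation, underscore, decimal classifier index.
_PATTERN = re.compile(r"^(?P<trait>[A-Z]+)_(?P<index>\d+)\.npz$")


def parse_result_name(filename):
    """Return (trait, classifier_index), or None if the name does not match."""
    match = _PATTERN.match(filename)
    if match is None or match.group("trait") not in TRAITS:
        return None
    return match.group("trait"), int(match.group("index"))
```

For instance, a file in `results/A0` could be grouped by trait this way before loading it with `numpy.load`.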

 ## Citation