evaluation code

2018-05-05 22:22:21 +02:00 · 2018-05-05 22:22:21 +02:00 · 3d3cebb956
commit 3d3cebb956
parent fc7973a49b
6 changed files with 660 additions and 13 deletions
--- a/README.md
+++ b/README.md
@ -25,27 +25,48 @@ reproducing the paper results step by step:
 1. __Extract features from raw gaze data__:    
   `python 00_compute_features.py` to compute gaze features for all participants  
   Once extracted, the features are stored in `features/ParticipantXX/window_features_YY.npy` where XX is the participant number and YY the length of the sliding window in seconds.  
+
+
 2. __Train random forest classifiers__  
-     `./01 train_classifiers.sh` to reproduce the evaluation setting described in the paper in which each classifier was trained 100 times.  
-    `./02_train_specialized_classifiers.sh` to train specialized classifiers on parts of the data (specifically on data from inside the shop or on the way).
+   `./01 train_classifiers.sh` to reproduce the evaluation setting described in the paper in which each classifier was trained 100 times.  
+  `./02_train_specialized_classifiers.sh` to train specialized classifiers on parts of the data (specifically on data from inside the shop or on the way).

-    If the scripts cannot be executed, you might not have the right access permissions to do so. On Linux, you can try `chmod +x 01_train_classifiers.sh`,`chmod +x 02_train_specialized_classifiers.sh` and `chmod +x 03_label_permutation_test.sh` (see below for when/how to use the last script).
+  If the scripts cannot be executed, you might not have the right access permissions to do so. On Linux, you can try `chmod +x 01_train_classifiers.sh`,`chmod +x 02_train_specialized_classifiers.sh` and `chmod +x 03_label_permutation_test.sh` (see below for when/how to use the last script).

-    In case you want to call the script differently, e.g. to speed-up the computation or try with different parameters, you can pass the following arguments to `classifiers.train_classifier`:  
-      `-t` 	trait index between 0 and 6  
-      `-l`   lowest number of repetitions, e.g. 0   
-      `-m`   max number of repetitions, e.g. 100  
-      `-a`   using partial data only: 0 (all data), 1 (way data), 2(shop data)  
+  In case you want to call the script differently, e.g. to speed-up the computation or try with different parameters, you can pass the following arguments to `classifiers.train_classifier`:  
+    `-t` 	trait index between 0 and 6  
+    `-l`   lowest number of repetitions, e.g. 0   
+    `-m`   max number of repetitions, e.g. 100  
+    `-a`   using partial data only: 0 (all data), 1 (way data), 2(shop data)  

-    In case of performance issues, it might be useful to check `_conf.py` and change `max_n_jobs` to restrict the number of jobs (i.e. threads) running in parallel.
+  In case of performance issues, it might be useful to check `_conf.py` and change `max_n_jobs` to restrict the number of jobs (i.e. threads) running in parallel.

-    The results will be saved in `results/A0` for all data, `results/A1` for way data only and `results/A2` for data inside a shop. Each file is named `TTT_XXX.npz`, where TTT is the abbreviation of the personality trait (`O`,`C`,`E`,`A`,`N` for the Big Five and `CEI` or `PCS` for the two curiosity measures). XXX enumerates the classifiers (remember that we always train 100 classifiers for evaluation because there is some randomness involved in the training process).  
+  The results will be saved in `results/A0` for all data, `results/A1` for way data only and `results/A2` for data inside a shop. Each file is named `TTT_XXX.npz`, where TTT is the abbreviation of the personality trait (`O`,`C`,`E`,`A`,`N` for the Big Five and `CEI` or `PCS` for the two curiosity measures). XXX enumerates the classifiers (remember that we always train 100 classifiers for evaluation because there is some randomness involved in the training process).  

-3. __Evaluate Baselines__
-   * To train a classifier that always predicts the most frequent personality score range from its current training set, please execute `python 03_train_baseline.py`  
-   * To train classifiers on permuted labels, i.e. perform the so-called label permutation test, please execute `./04_label_permutation_test.sh`    
+3. __Train baselines__
+  * To train a classifier that always predicts the most frequent personality score range from its current training set, please execute `python 03_train_baseline.py`  
+  * To train classifiers on permuted labels, i.e. perform the so-called label permutation test, please execute `./04_label_permutation_test.sh`    


+4. __Performance analysis__
+   * Run `python 05_plot_weights.py` to extract feature importance scores. These scores will be visualized in `figures/figure2.pdf` which corresponds to Figure 2 in the paper and `figures/table2.tex` which is shown in Table 2 in the supplementary information.
+   (additionally this step computes F1 scores which are required for the next step, so do not skip it)
+   * The results obtained from both baselines will be written to disk and read once you execute `python 06_baselines.py`.
+   A figure illustrating the actual classifiers' performance along with the random results will be written to `figures/figure1.pdf` as well as `figures/figure1.csv` and correspond to Figure 1 in the paper.  
+
+
+5. __Context comparison__  
+`python 07_evaluation_across_contexts.py` to compute the average correlation coefficients between predictions based on data from different contexts. The table with all coefficients will be written to `figures/table1-5.csv` which can be found in Table 1 and Table 5 in supplementary information.  
+If (some) files in the results folder are missing, try re-running all one of the bash (\*.sh) scripts again.
+
+6. __Descriptive analysis__  
+`python 08_descriptive.py` to compute the correlation between each participant's average feature for the most frequently chosen time window and their personality score range. Results are written to four files `figures/table4-1.tex`,`figures/table4-2.tex`,`figures/table4-3.tex`,`figures/table4-4.tex` and are shown together in Table 4 in the supplementary information.  
+
+7. __Window Size Histogram__    
+`python 09_plot_ws_hist.py` to plot a histogram of window sizes chosen during the nested cross validation routine to `figures/ws_hist.pdf`.
+
+All these scripts write intermediate results to disk, i.e. if you start a script a second time, it will be much faster - but the first run can take some time, e.g. up to 8 hours to train classifiers for one context on a 16 core machine; 1 hour to compute correlations between contexts.  
+
 ## Citation  
 If you want to cite this project, please use the following Bibtex format: