From 9d7716895496698f6c25a78ae58a67baa9195b4d Mon Sep 17 00:00:00 2001
From: apenzko
Date: Tue, 19 Oct 2021 14:49:38 +0200
Subject: [PATCH] Updated README

---
 processing/README.md | 61 ++++++++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 16 deletions(-)

diff --git a/processing/README.md b/processing/README.md
index 3f4312a..ba2ee7b 100644
--- a/processing/README.md
+++ b/processing/README.md
@@ -4,20 +4,49 @@
 conda env create -f conan_windows.yml
 conda activate conan_windows_env
 ```
-
-### OpenPose
-### RT-Gene
-- Run [processing/install_RTGene.py](/processing/install_RTGene.py)
-- [OPTIONAL] Provide camera calibration file calib.pkl
-- Provide maximum number of people in the video
-### JAA-Net
-### AVA-Active Speaker
-### Apriltag
-
-[https://www.wikihow.com/Install-FFmpeg-on-Windows](https://www.wikihow.com/Install-FFmpeg-on-Windows)
-### Training
+## Usage
+Run [ConAn_RunProcessing.ipynb](ConAn_RunProcessing.ipynb) to extract all frames from video and run processing models.
+### Body Movement
+For body movement detection we selected [OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose). For our case, we used the 18-keypoint model,
+which takes the full frame as input and jointly predicts anatomical keypoints and a measurement
+for the degree of association between them.
+If you're using this processing step in your research please cite:
 ```
-conda install -c anaconda cupy
-conda install -c anaconda chainer
-conda install -c anaconda ipykernel
-```
\ No newline at end of file
+@article{8765346,
+ author = {Z. {Cao} and G. {Hidalgo Martinez} and T. {Simon} and S. {Wei} and Y. A. {Sheikh}},
+ journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
+ title = {OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
+ year = {2019}
+}
+```
+### Eye Gaze
+For eye gaze estimation we selected [RT-GENE](https://github.com/Tobias-Fischer/rt_gene). In addition to feeding each video frame to the model,
+we also input a version of the frame where the left side and the right side are wrapped together.
+This enables us to detect when a person moves over the edge of the video, as none of the models account for this.
+As this is a single frame estimation, we then track all subjects throughout the video using a minimal euclidean distance heuristic.
+
+If you're using this processing step in your research please cite:
+```
+@inproceedings{FischerECCV2018,
+author = {Tobias Fischer and Hyung Jin Chang and Yiannis Demiris},
+title = {{RT-GENE: Real-Time Eye Gaze Estimation in Natural Environments}},
+booktitle = {European Conference on Computer Vision},
+year = {2018},
+month = {September},
+pages = {339--357}
+}
+```
+Notes:
+- Before using [process_RTGene.py](process_RTGene.py) you need to run [install_RTGene.py](install_RTGene.py)!
+- [OPTIONAL] You can provide a camera calibration file calib.pkl to improve detections.
+- You need to provide maximum number of people in the video for the sorting algorithm.
+### Facial Expression
+Under construction
+### Speaking Activity
+Under construction
+### Object Tracking
+We assume that you are most likely able to define your own study procedure,
+therefore we decided to simplify object tracking by employing the visual fiducial system [AprilTag 2](https://github.com/AprilRobotics/apriltag),
+where the tag positions are extracted with their tailored detector.
+
+Note: For Windows we use [pupil_apriltags](https://github.com/pupil-labs/apriltags).
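
The Eye Gaze section of the patch above says subjects are tracked across frames with a minimal Euclidean distance heuristic, and that the maximum number of people must be supplied for the sorting step. The sketch below shows one way such a heuristic might look. It is not taken from process_RTGene.py: the function name `match_subjects`, its parameters, the 2D point representation, and the greedy closest-pair-first strategy are all assumptions made for illustration.

```python
import math


def match_subjects(prev_positions, detections, max_people):
    """Assign this frame's detections to subject IDs (hypothetical helper).

    prev_positions: dict mapping subject ID -> (x, y) from the previous frame.
    detections:     list of (x, y) positions detected in the current frame.
    max_people:     upper bound on the number of distinct subject IDs.

    Returns a dict mapping subject ID -> (x, y) for the current frame.
    """
    # Build every (distance, subject, detection index) pair, closest first.
    pairs = sorted(
        (math.dist(pos, det), sid, j)
        for sid, pos in prev_positions.items()
        for j, det in enumerate(detections)
    )

    assigned = {}
    used = set()
    # Greedy matching: take the closest pair whose subject and detection
    # are both still free. This is an assumed strategy, not the original one.
    for _, sid, j in pairs:
        if sid in assigned or j in used:
            continue
        assigned[sid] = detections[j]
        used.add(j)

    # Leftover detections become new subjects until max_people is reached.
    known = set(prev_positions)
    next_id = max(prev_positions, default=-1) + 1
    for j, det in enumerate(detections):
        if j in used:
            continue
        if len(known) >= max_people:
            break
        assigned[next_id] = det
        known.add(next_id)
        next_id += 1

    return assigned
```

Calling `match_subjects({0: (10.0, 10.0), 1: (100.0, 50.0)}, [(98.0, 52.0), (12.0, 9.0)], max_people=2)` keeps subject 0 on the detection near (10, 10) and subject 1 on the one near (100, 50), regardless of the order in which the detector returned them. A greedy match is simple but can give a suboptimal overall assignment when subjects are close together; an optimal assignment method would be an alternative.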