REAL-TIME HUMAN HAND POSE ESTIMATION AND TRACKING USING DEPTH SENSORS
by Furkan Kıraç
Department of Computer Science, Ozyegin University

MOTIVATION & SCOPE
Depth image → Algorithm → Pose configuration (angles of the skeleton)

OUTLINE
- Single Frame Pose Estimation Using Pixel Classification
  - Randomized Decision Forests (RDF)
  - Shape Recognition (RDF-S)
  - Hand Joint Estimation Using Pixel Classification (RDF-C)
  - Hand Joint Estimation Using Hybrid Forests (RDF-H)
- Single Frame Pose Estimation Using Regression
  - Randomized Regression Forests (RDF-R)
  - Hand Joint Estimation Using Regression Forests (RDF-R)
  - Taking the Hierarchy into Account (RDF-R+)
- Dynamic Hand Gesture Tracking
  - Manifold Extraction and Kalman Tracking over the Manifold

Illumination Problem

Segmentation (Use the Depth Image?)

Single Frame Pose Estimation Using Pixel Classification (2011)

Randomized Decision Forest (*)
* Tin Kam Ho. Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition, volume 1, pages 278–282, 1995.

Randomized Decision Forests: Features
* Cem Keskin, Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Real time hand pose estimation using depth sensors. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 1228–1234. IEEE, 2011.

Randomized Decision Forests: Training and Inference on the Tree Structure

Randomized Decision Forests: Shape Recognition (RDF-S)

American Sign Language Letters
We work on the ASL alphabet. Most ASL letters are static, single-frame hand shapes; only the letters ‘j’ and ‘z’ are dynamic gestures.

RDF-S Results: Confusion Matrix
- Leave one subject out
- Half training, half validation

RDF-S Results: Examining the Confused Shapes
The confusion matrix results are meaningful: the shapes that get confused are genuinely very similar.

RDF-S Performance Results
- 4 trees with a depth of 20 are used.
- 1000 features are sampled at each node.
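The RDF-S pipeline on these slides (per-pixel depth features, routing each pixel down every tree to a leaf, averaging the trees, and deciding one shape per image) can be sketched in Python as below. This is a minimal sketch under stated assumptions: it uses a depth-comparison feature in the style of Shotton et al., with pixel offsets divided by the depth at the probed pixel; the names `Node`, `depth_feature`, `classify_shape` and the `BACKGROUND_DEPTH` constant are hypothetical; and summing the per-pixel shape posteriors before taking the argmax is an assumed aggregation rule, not necessarily the one used in the cited papers.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional, Tuple

BACKGROUND_DEPTH = 10_000.0  # hypothetical depth returned for probes off the hand/image


def probe(depth: np.ndarray, y: int, x: int) -> float:
    """Depth at (y, x), or a large constant if outside the image or on background (0)."""
    h, w = depth.shape
    if 0 <= y < h and 0 <= x < w and depth[y, x] > 0:
        return float(depth[y, x])
    return BACKGROUND_DEPTH


def depth_feature(depth, px, u, v) -> float:
    """Difference of two probes whose offsets u, v are divided by the depth at px,
    so the same feature covers a similar hand region whether the hand is near or far."""
    y, x = px
    d = float(depth[y, x])
    a = probe(depth, int(round(y + u[0] / d)), int(round(x + u[1] / d)))
    b = probe(depth, int(round(y + v[0] / d)), int(round(x + v[1] / d)))
    return a - b


@dataclass
class Node:
    u: Tuple[float, float] = (0.0, 0.0)
    v: Tuple[float, float] = (0.0, 0.0)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    posterior: Optional[np.ndarray] = None  # class histogram, set only at leaves


def tree_posterior(node: Node, depth, px) -> np.ndarray:
    """Route one pixel from the root to a leaf by thresholding split features."""
    while node.posterior is None:
        f = depth_feature(depth, px, node.u, node.v)
        node = node.left if f < node.threshold else node.right
    return node.posterior


def forest_posterior(forest, depth, px) -> np.ndarray:
    """Average the leaf posteriors reached in every tree of the forest."""
    return np.mean([tree_posterior(t, depth, px) for t in forest], axis=0)


def classify_shape(forest, depth) -> int:
    """Assumed RDF-S decision rule: sum per-pixel shape posteriors over hand pixels, take argmax."""
    ys, xs = np.nonzero(depth > 0)
    total = sum(forest_posterior(forest, depth, (y, x)) for y, x in zip(ys, xs))
    return int(np.argmax(total))
```

Training is not shown: it chooses `u`, `v`, and the threshold at each split node from randomly sampled candidates (1000 per node in the RDF-S experiments), typically by maximizing information gain, and stores the class histogram of the training pixels that reach each leaf.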
Shape Recognition Performance
Method              Leave One Subject Out   Half Training / Half Validation
Pugeault (SURREY)   47.0%                   69.0%
RDF-S (*)           97.8%                   84.3%
* Cem Keskin, Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In Computer Vision–ECCV 2012, pages 852–863. Springer, 2012.

RDF for Hand Joint Estimation
The texture is designed so that the centroid of each colored region corresponds to a hand joint.

Hand Joint Estimation Using Pixel Classification (RDF-C)
Hand part centroid finding with mean-shift

Hand Pose Estimation Steps Using RDF-C
Depth image → Per-pixel classification → After mean-shift → Connecting the dots

Randomized Hybrid Forests: Boosting the Classification Accuracy
- We have shape classification forests (RDF-S).
- We have pixel classification forests (RDF-C).
- We now use them both:
  - RDF-S + RDF-C = RDF-H (hybrid forests)
  - We control the training phase by dividing it into two layers.
* Cem Keskin, Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Multi-layered randomized classification forests for hand pose estimation using depth sensors. In Computer Vision and Pattern Recognition Workshops on Gesture Recognition (CVPR 2012).

Randomized Hybrid Forests: RDF-H

Local Expert Network (LEN)
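The RDF-C step that turns per-pixel part posteriors into joint locations ("Hand part centroid finding with mean-shift") can be sketched as a weighted mean-shift over the pixels' 3D positions, each pixel weighted by its posterior for the part being located. This is a minimal sketch under assumptions: the Gaussian kernel, the 10 mm default bandwidth, starting from the most confident pixel, and the names `weighted_mean_shift` and `part_centroid` are all hypothetical choices for illustration.

```python
import numpy as np


def weighted_mean_shift(points, weights, start, bandwidth, max_iter=100, tol=1e-6):
    """Climb to a mode of the weighted Gaussian kernel density estimate."""
    mode = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        sq_dist = np.sum((points - mode) ** 2, axis=1)
        k = weights * np.exp(-sq_dist / (2.0 * bandwidth ** 2))  # kernel x pixel weight
        if k.sum() == 0.0:
            break  # no support near the current estimate
        new_mode = (k[:, None] * points).sum(axis=0) / k.sum()
        converged = np.linalg.norm(new_mode - mode) < tol
        mode = new_mode
        if converged:
            break
    return mode


def part_centroid(points, part_probs, part, bandwidth=10.0):
    """points: (N, 3) pixel positions in mm; part_probs: (N, P) per-pixel part posteriors."""
    w = part_probs[:, part]
    start = points[np.argmax(w)]  # hypothetical: start from the most confident pixel
    return weighted_mean_shift(points, w, start, bandwidth)
```

Because each pixel is weighted by its posterior, a few misclassified pixels far from the true part contribute little to the mode the search converges to, which a plain weighted average would not guarantee.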
Global Expert Network (GEN)

RDF-H Results
We used 60k depth images synthesized by interpolating ASL digit animations.

Method          Pixel classification rate
RDF-C           68.0%
RDF-H (GEN)     91.2%
RDF-H (LEN)     90.9%

Clusters (K)                 1 (RDF-C)   5       10      20      30      30
SCF height                   N/A         15      16      17      18      19
PCF height                   20          18      17      16      15      14
Total height                 20          33      33      33      33      33
Relative memory              1.00        0.87    0.90    0.96    0.88    0.81
Relative recognition time    1.00        1.35    1.37    1.38    1.40    1.42
Pixel classification rate    81.3%       86.6%   91.2%   77.4%   64.3%   68.0%
(SCF: shape classification forest, first layer; PCF: pixel classification forest, second layer.)

Major Problems in Classification RDFs
- Confusion of small parts and self-occlusion
- Confusion and missed detection of small parts

Single Frame Pose Estimation Using Regression

Randomized Regression Forests (RDF-R)
RDF-C vs. RDF-R

Randomized Regression Forests: Training for Regression
Ideally, each leaf collects pixels that are spatially close. What if those spatially close pixels vote for each joint's position relative to their own positions?
* Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Hierarchically constrained 3D hand pose estimation using regression forests from single frame depth data. Submitted for publication, 2013.

Randomized Regression Forests: Leaf Votes per Joint
Consider the votes cast for a specific joint from a leaf node. Note that the vote distributions in the leaves may be multi-modal. Storing every vote in the leaves is infeasible, so we compress the votes using mean-shift.

Randomized Regression Forests: Inference of the Joints from a Depth Image
The fingertip is located at the maxima of its vote distribution.
Only relative votes are stored.

Randomized Regression Forests (RDF-R): Comparison of Regression vs. Classification Based Methods
Left: regression forest result. Middle: ground-truth pixel classification. Right: occlusion- and classification-based problems.

Randomized Regression Forests: RDF-R Problems
[Figure: 1st, 2nd, and 3rd modes of a joint's vote distribution]
1) How to handle multi-modal per-joint vote distributions?
2) How to incorporate the hierarchical dependencies between the joints?

Randomized Regression Forests with Bone Length Constraints (RDF-R+)
- Hierarchy between joints c and p
- Posterior of a joint j
- Number of local modes of a joint j
- A mode of the distribution for joint j
- An ordering defined over the modes

Randomized Regression Forests with Bone Length Constraints (RDF-R+)
Minimize the total penalty of f to obtain the best skeletal configuration.
- Corresponds to RDF-R
- Corresponds to RDF-R+
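The RDF-R+ slides define per-joint vote modes, a joint hierarchy, and a total penalty to minimize, but the exact penalty is not given here. As a hypothetical instance, the sketch below scores each mode by the negative log of its weight, adds a squared deviation from an expected bone length between each joint and its parent, and finds the exact minimum over the kinematic tree by dynamic programming (min-sum from the leaves up, then backtracking from the root). The function name `select_modes`, the weight `lam`, and the quadratic bone term are assumptions. With `lam=0` every joint simply takes its strongest mode, which mirrors unconstrained RDF-R behaviour.

```python
import numpy as np


def select_modes(parents, modes, weights, bone_len, lam=1.0):
    """Choose one vote mode per joint, minimizing a hypothetical total penalty.

    parents[j]  : index of joint j's parent, -1 for the root; parents must come
                  before their children in this ordering.
    modes[j]    : (M_j, 3) array of candidate 3D positions (mean-shift modes).
    weights[j]  : length-M_j positive weights of those modes.
    bone_len[j] : expected distance between joint j and its parent (mm).
    """
    n = len(parents)
    unary = [-np.log(np.asarray(w, dtype=float)) for w in weights]
    children = [[] for _ in range(n)]
    for j, p in enumerate(parents):
        if p >= 0:
            children[p].append(j)

    subtree = [None] * n     # subtree[j][m]: best penalty of j's subtree if j takes mode m
    best_child = [None] * n  # best_child[c][m_p]: best mode of c given its parent's mode m_p
    for j in reversed(range(n)):  # children are processed before their parents
        total = unary[j].copy()
        for c in children[j]:
            dist = np.linalg.norm(modes[j][:, None, :] - modes[c][None, :, :], axis=2)
            pair = lam * (dist - bone_len[c]) ** 2 + subtree[c][None, :]
            best_child[c] = np.argmin(pair, axis=1)
            total += pair.min(axis=1)
        subtree[j] = total

    # Backtrack from the root(s) down the tree.
    choice = [0] * n
    for j in range(n):
        if parents[j] < 0:
            choice[j] = int(np.argmin(subtree[j]))
        else:
            choice[j] = int(best_child[j][choice[parents[j]]])
    return choice
```

For example, if a fingertip's strongest mode sits 50 mm from its parent joint but a weaker mode sits at the expected 30 mm bone length, `lam=0` keeps the strong mode while a positive `lam` switches to the anatomically consistent one.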
Randomized Regression Forests with Bone Length Constraints (RDF-R+)
[Figure: posterior of the index fingertip with its 1st and 2nd modes, RDF-R vs. RDF-R+]

Difficulties of Real Samples: SURREY Dataset (Pugeault et al.)
- Cropped
- Missing data
- Depth variations and segmentation problems (missing parts)

Experiments
- Training dataset: 30k samples, 40 different hand poses, 160x160 resolution.
- Cropped dataset: only the central 80x80 region of the training dataset is used.
- RPSLS dataset: rock, paper, scissors, lizard, Spock poses.
- SURREY dataset: we landmarked 55 selected frames.

Experiments: Fine-Tuning the Hyper-parameters
[Figure: joint error of RDF-C and RDF-R versus forest size (1 to 7 trees), maximum probe distance (30 to 90), and tree depth (7 to 25)]

CMC Curves for the TRAIN and CROP Datasets
[Figure: acceptance rate versus acceptance threshold (0 to 40 mm) for RDF-C, RDF-R, and RDF-R+ on the TRAIN and CROP datasets]

CMC Curves for the RPSLS and SURREY Datasets
[Figure: acceptance rate versus acceptance threshold (0 to 40 mm) for RDF-C, RDF-R, and RDF-R+ on the RPSLS and SURREY datasets]

CMC Performances at 10 mm Acceptance Threshold

CMC Performances on SURREY at 10 mm Acceptance Threshold

Joint Estimation Examples in Test Datasets

Hand Joint Estimations in the SURREY Dataset
Left to right: ground-truth pixel classification, RDF-C, RDF-R, RDF-R+

Now?
- 2014 performance from Yann LeCun (Head of AI at Facebook)
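The CMC curves in the experiments plot, for each acceptance threshold in millimetres, the fraction of estimates whose error falls within that threshold. A minimal sketch, assuming per-joint Euclidean error and counting every joint independently (a per-frame variant that requires all joints of a frame to pass would also be plausible); `cmc_curve` is a hypothetical helper, not code from the cited work.

```python
import numpy as np


def cmc_curve(pred, gt, thresholds_mm):
    """Acceptance rate per threshold: fraction of joints whose 3D error <= threshold.

    pred, gt: arrays of shape (frames, joints, 3), in millimetres.
    """
    err = np.linalg.norm(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float), axis=-1)
    return np.array([(err <= t).mean() for t in thresholds_mm])
```

The single-number summaries reported at 10 mm correspond to evaluating one point of this curve, e.g. `cmc_curve(pred, gt, [10.0])[0]`.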
