REAL-TIME HUMAN HAND POSE ESTIMATION
AND TRACKING USING DEPTH SENSORS
by Furkan Kıraç
Department of Computer Science
Ozyegin University
MOTIVATION & SCOPE
Depth Image → Algorithm → Pose Configuration (angles of the skeleton)
OUTLINE
- Single Frame Pose Estimation Using Pixel Classification
  - Randomized Decision Forests (RDF)
  - Shape Recognition (RDF-S)
  - Hand Joint Estimation Using Pixel Classification (RDF-C)
  - Hand Joint Estimation Using Hybrid Forests (RDF-H)
- Single Frame Pose Estimation Using Regression
  - Randomized Regression Forests (RDF-R)
  - Hand Joint Estimation Using Regression Forests (RDF-R)
  - Taking The Hierarchy Into Account (RDF-R+)
- Dynamic Hand Gesture Tracking
  - Manifold Extraction and Kalman Tracking over the Manifold
Illumination Problem
Segmentation (Use Depth Image?)
Single Frame Pose Estimation
Using Pixel Classification
(2011)
Randomized Decision Forest (*)
* Tin Kam Ho. Random decision forests. In Proceedings of the Third International Conference on Document
Analysis and Recognition, volume 1, pages 278-282, 1995.
Randomized Decision Forests
Features
* Cem Keskin, Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Real time hand pose estimation using depth sensors. In Computer
Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 1228–1234. IEEE, 2011.
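The feature figure did not survive extraction. The sketch below assumes the depth-comparison features of Shotton et al. (2011), which this line of work adopts: for a pixel x with depth d(x) and random offsets u and v, f(x) = d(x + u/d(x)) - d(x + v/d(x)). Dividing the offsets by d(x) keeps the probe's physical extent roughly constant as the hand moves closer or farther. `BACKGROUND_DEPTH`, `depth_at` and `depth_feature` are hypothetical names.

```python
from typing import Sequence, Tuple

# Hypothetical large depth returned for probes that land outside the image.
BACKGROUND_DEPTH = 10_000.0


def depth_at(depth: Sequence[Sequence[float]], row: int, col: int) -> float:
    """Depth at (row, col), or BACKGROUND_DEPTH when the probe is off the image."""
    if 0 <= row < len(depth) and 0 <= col < len(depth[0]):
        return depth[row][col]
    return BACKGROUND_DEPTH


def depth_feature(depth: Sequence[Sequence[float]], x: Tuple[int, int],
                  u: Tuple[float, float], v: Tuple[float, float]) -> float:
    """Depth-comparison feature f(x) = d(x + u/d(x)) - d(x + v/d(x))."""
    row, col = x
    d = depth[row][col]  # assumed > 0 for foreground (hand) pixels
    # Offsets are scaled by 1/d so the feature is roughly depth invariant.
    pu = (round(row + u[0] / d), round(col + u[1] / d))
    pv = (round(row + v[0] / d), round(col + v[1] / d))
    return depth_at(depth, *pu) - depth_at(depth, *pv)
```

During training, each node would sample many random (u, v) pairs and thresholds; a split test is then simply `depth_feature(...) < threshold`.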
Randomized Decision Forests
Training and Inference on the Tree Structure
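A compact sketch of randomized training and inference under simplifying assumptions: each sample is a vector of precomputed feature responses (in the real system these would be depth features evaluated on the fly), each node tries `n_candidates` random (feature, threshold) pairs and keeps the one with the highest information gain, and inference averages the leaf class posteriors over all trees. All names are illustrative; the RDF-S settings reported later (4 trees, depth 20, 1000 features per node) would roughly correspond to `n_trees`, `max_depth` and `n_candidates`.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a label multiset."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


class Node:
    """Split node (feature index + threshold) or leaf (class posterior)."""

    def __init__(self) -> None:
        self.feature: Optional[int] = None
        self.threshold = 0.0
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.posterior: Dict[int, float] = {}


def make_leaf(y: Sequence[int]) -> Node:
    leaf = Node()
    leaf.posterior = {k: c / len(y) for k, c in Counter(y).items()}
    return leaf


def train_tree(X, y, depth, max_depth, n_candidates, rng) -> Node:
    if depth == max_depth or len(set(y)) == 1:
        return make_leaf(y)
    parent_h, best = entropy(y), None
    for _ in range(n_candidates):
        # Randomized split: random feature, threshold drawn from the node's samples.
        f = rng.randrange(len(X[0]))
        t = rng.choice(X)[f]
        left = [i for i, x in enumerate(X) if x[f] < t]
        right = [i for i, x in enumerate(X) if x[f] >= t]
        if not left or not right:
            continue
        child_h = (len(left) * entropy([y[i] for i in left])
                   + len(right) * entropy([y[i] for i in right])) / len(y)
        gain = parent_h - child_h  # information gain of this candidate
        if best is None or gain > best[0]:
            best = (gain, f, t, left, right)
    if best is None:
        return make_leaf(y)
    _, f, t, left, right = best
    node = Node()
    node.feature, node.threshold = f, t
    node.left = train_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, n_candidates, rng)
    node.right = train_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, n_candidates, rng)
    return node


def train_forest(X, y, n_trees, max_depth, n_candidates, seed=0) -> List[Node]:
    rng = random.Random(seed)
    return [train_tree(X, y, 0, max_depth, n_candidates, rng) for _ in range(n_trees)]


def predict(forest: List[Node], x: Sequence[float]) -> Dict[int, float]:
    """Drop x down every tree and average the leaf posteriors."""
    total: Dict[int, float] = {}
    for tree in forest:
        node = tree
        while node.feature is not None:
            node = node.left if x[node.feature] < node.threshold else node.right
        for k, p in node.posterior.items():
            total[k] = total.get(k, 0.0) + p / len(forest)
    return total
```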
Randomized Decision Forests
Shape Recognition (RDF-S)
American Sign Language Letters
We work on the ASL alphabet.
Most ASL letters are static hand shapes that can be recognized from a single frame; only 'j' and 'z' are dynamic gestures.
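Shape recognition needs one label per image, whereas a forest outputs one posterior per pixel. A minimal sketch of one way to bridge the two, assuming each hand pixel's forest posterior over letters is available and the image-level decision is the average of those posteriors (pixel voting); `classify_shape` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_shape(pixel_posteriors: Iterable[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Average per-pixel shape posteriors and return (best shape, image posterior)."""
    totals: Dict[str, float] = defaultdict(float)
    n = 0
    for posterior in pixel_posteriors:
        for shape, p in posterior.items():
            totals[shape] += p
        n += 1
    if n == 0:
        raise ValueError("no hand pixels to vote")
    image_posterior = {shape: s / n for shape, s in totals.items()}
    best = max(image_posterior, key=image_posterior.get)
    return best, image_posterior
```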
RDF-S Results
Confusion Matrix
Leave one subject out
Half training-half validation
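The two protocols differ only in how samples are split: leave-one-subject-out trains on all signers but one and tests on the held-out signer, so the test hand is never seen during training, whereas a half/half split can place samples of the same signer in both halves. A minimal sketch, assuming every sample is tagged with a subject id; the helper names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_subject_out(subjects: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices), one fold per subject."""
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test


def confusion_matrix(true: Sequence[str], pred: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Counts of (true label, predicted label) pairs; off-diagonal entries are confusions."""
    return dict(Counter(zip(true, pred)))


def accuracy(cm: Dict[Tuple[str, str], int]) -> float:
    correct = sum(n for (t, p), n in cm.items() if t == p)
    return correct / sum(cm.values())
```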
RDF-S Results
Examining the confused shapes
The confusions are meaningful: the confused shapes are genuinely very similar.
RDF-S Performance
Results
- 4 trees, each with a depth of 20.
- 1000 candidate features are sampled at each node.
Shape Recognition Performance
Method              Leave One Subject Out   Half Training / Half Validation
Pugeault (SURREY)   47.0%                   69.0%
RDF-S (*)           97.8%                   84.3%
* Cem Keskin, Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Hand pose
estimation and hand shape classification using multi-layered randomized decision
forests. In Computer Vision–ECCV 2012, pages 852–863. Springer, 2012.
RDF for Hand Joint Estimation
The texture is designed so that the centroid of each colored region corresponds to a hand joint.
Hand Joint Estimation
Using Pixel Classification (RDF-C)
Hand Part Centroid Finding with Mean-shift
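Mean-shift moves a point uphill on a kernel density estimate until it settles on a mode, so a few stray misclassified pixels far from the part barely affect the result, unlike a plain centroid. A minimal sketch with a Gaussian kernel over 3D points; weighting pixels by their class probability, the bandwidth and the starting point are assumptions, and `mean_shift_mode` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_shift_mode(points: Sequence[Point], weights: Sequence[float], start: Point,
                    bandwidth: float, tol: float = 1e-6, max_iter: int = 100) -> Point:
    """Follow weighted Gaussian mean-shift from `start` to a mode of the point density."""
    x = start
    for _ in range(max_iter):
        num, den = [0.0, 0.0, 0.0], 0.0
        for p, w in zip(points, weights):
            k = w * math.exp(-math.dist(p, x) ** 2 / (2 * bandwidth ** 2))
            for i in range(3):
                num[i] += k * p[i]
            den += k
        if den == 0.0:  # start too far from every point for the kernel to reach
            break
        new_x = (num[0] / den, num[1] / den, num[2] / den)
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x
```

For example, five pixels clustered at the origin plus one outlier at (50, 50, 50) pull the plain centroid more than 10 units away, yet mean-shift started from that centroid (bandwidth 2) converges back to the cluster at the origin.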
Hand Pose Estimation Steps
Using RDF-C
Depth image → Per-pixel classification → Mean-shift centroids → Connecting the dots
Randomized Hybrid Forests
Boosting the classification accuracy
- We have shape classification forests (RDF-S).
- We have pixel classification forests (RDF-C).
- We now use them both: RDF-S + RDF-C = RDF-H (hybrid forests).
- We control the training phase by dividing it into two layers.
* Cem Keskin, Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Multi–layered Randomized Classification Forests
for Hand Pose Estimation using Depth Sensors. In Computer Vision and Pattern Recognition Workshops on
Gesture Recognition–CVPR 2012.
Randomized Hybrid Forests
RDF-H
Local Expert Network
(LEN)
Global Expert Network
(GEN)
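The expert-network figures did not survive extraction. A minimal sketch of how the two layers can be combined, under our assumptions: a first-layer shape classification forest gives each pixel a posterior over K shape clusters, and each cluster has its own second-layer pixel classification forest (an "expert"). GEN weights the experts by one image-level cluster posterior shared by all pixels, while LEN weights them by each pixel's own cluster posterior. The data layout and function names are hypothetical.

```python
from typing import Dict, List, Sequence

Posterior = Dict[str, float]


def mix_experts(cluster_weights: Sequence[float], expert_posts: Sequence[Posterior]) -> Posterior:
    """P(part | pixel) = sum_k w_k * P_k(part | pixel), one expert forest per cluster k."""
    mixed: Posterior = {}
    for w, post in zip(cluster_weights, expert_posts):
        for part, p in post.items():
            mixed[part] = mixed.get(part, 0.0) + w * p
    return mixed


def global_expert_network(cluster_posts: Sequence[Sequence[float]],
                          expert_posts: Sequence[Sequence[Posterior]]) -> List[Posterior]:
    """GEN: every pixel uses the image-level cluster posterior (mean over pixels)."""
    n, k = len(cluster_posts), len(cluster_posts[0])
    image_post = [sum(cp[j] for cp in cluster_posts) / n for j in range(k)]
    return [mix_experts(image_post, ep) for ep in expert_posts]


def local_expert_network(cluster_posts: Sequence[Sequence[float]],
                         expert_posts: Sequence[Sequence[Posterior]]) -> List[Posterior]:
    """LEN: every pixel uses its own cluster posterior."""
    return [mix_experts(cp, ep) for cp, ep in zip(cluster_posts, expert_posts)]
```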
RDF-H Results
We used 60k depth images synthesized by interpolating ASL digit animations.
Method                      RDF-C   RDF-H (GEN)   RDF-H (LEN)
Pixel classification rate   68.0%   91.2%         90.9%
Clusters (K)                1 (RDF-C)   5       10      20      30      30
SCF height                  N/A         15      16      17      18      19
PCF height                  20          18      17      16      15      14
Total height                20          33      33      33      33      33
Relative memory             1.00        0.87    0.90    0.96    0.88    0.81
Relative recognition time   1.00        1.35    1.37    1.38    1.40    1.42
Pixel classification rate   68.0%       81.3%   86.6%   91.2%   77.4%   64.3%
SCF: shape classification forest (first layer); PCF: pixel classification forest (second layer).
Major Problems in Classification RDFs
- Confusion of small parts and self-occlusion
- Small parts are confused or missed entirely
Single Frame Pose Estimation
Using Regression
Randomized Regression Forests
(RDF-R)
RDF-C
RDF-R
Randomized Regression Forests
Training for Regression
Ideally, each leaf contains pixels that are spatially close.
What if those spatially close pixels vote for each joint relative to their own positions?
* Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Hierarchically constrained 3d hand pose estimation using
regression forests from single frame depth data. Submitted for publication, 2013.
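A minimal sketch of what voting "relative to their own positions" means at training time: every training pixel that reaches a leaf stores, for each joint, the 3D offset from the pixel to that joint. The sample layout and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# One training pixel: (3D position, id of the leaf it reached, ground-truth joints of its image).
Sample = Tuple[Vec3, int, Dict[str, Vec3]]


def collect_leaf_votes(samples: Sequence[Sample]) -> Dict[int, Dict[str, List[Vec3]]]:
    """For each leaf and joint, gather offsets (joint position - pixel position)."""
    votes: Dict[int, Dict[str, List[Vec3]]] = defaultdict(lambda: defaultdict(list))
    for pos, leaf, joints in samples:
        for name, joint in joints.items():
            offset = tuple(j - p for j, p in zip(joint, pos))
            votes[leaf][name].append(offset)
    return {leaf: dict(per_joint) for leaf, per_joint in votes.items()}
```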
Randomized Regression Forests
Leaf Votes per Joint
Consider the votes cast for a specific joint
from a leaf node.
Note that the vote distributions in the leaves may be multi-modal.
Storing all the votes in the leaves is infeasible, so we compress them using mean-shift.
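A minimal sketch of the compression step under stated assumptions: mean-shift with a flat kernel is run from every stored vote, converged points closer than `merge_radius` are merged into one mode, and only the strongest `max_modes` modes are kept together with the fraction of votes each one absorbed. Kernel, bandwidth, merge radius and mode count are illustrative choices, not the paper's settings.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _flat_shift(x: Vec3, votes: Sequence[Vec3], bandwidth: float) -> Vec3:
    """One mean-shift step with a flat kernel: mean of the votes within `bandwidth` of x."""
    # Starting from a vote, some vote always stays within `bandwidth`, so `near` is never empty.
    near = [v for v in votes if math.dist(v, x) <= bandwidth]
    return tuple(sum(v[i] for v in near) / len(near) for i in range(3))


def compress_votes(votes: Sequence[Vec3], bandwidth: float = 1.0, merge_radius: float = 0.5,
                   max_modes: int = 2, max_iter: int = 50) -> List[Tuple[Vec3, float]]:
    """Return up to `max_modes` (mode offset, fraction of votes) pairs, strongest first."""
    converged = []
    for v in votes:
        x = v
        for _ in range(max_iter):
            nx = _flat_shift(x, votes, bandwidth)
            if math.dist(nx, x) < 1e-9:
                break
            x = nx
        converged.append(x)
    modes: List[list] = []  # each entry: [center, number of votes that converged there]
    for x in converged:
        for m in modes:
            if math.dist(m[0], x) <= merge_radius:
                m[1] += 1
                break
        else:
            modes.append([x, 1])
    modes.sort(key=lambda m: -m[1])
    return [(tuple(c), n / len(votes)) for c, n in modes[:max_modes]]
```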
Randomized Regression Forests
Inference of the joints from a depth image
Fingertip’s maxima in the vote distribution
ONLY RELATIVE VOTES ARE STORED
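At test time the stored offsets are added back to each pixel's own position, which is why only relative votes need to live in the leaves. A minimal sketch for a single joint, assuming each pixel has already been routed to a leaf that stores a few weighted relative modes for this joint (all weights positive); the joint estimate is taken as the strongest mode of the absolute votes, found with weighted Gaussian mean-shift started from the heaviest vote. Bandwidth, starting rule and names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vote = Tuple[Vec3, float]  # (3D point, weight)


def absolute_votes(pixels: Sequence[Vec3], leaf_modes: Sequence[Sequence[Vote]]) -> List[Vote]:
    """Turn each pixel's stored relative modes into absolute, weighted joint votes."""
    votes: List[Vote] = []
    for pos, modes in zip(pixels, leaf_modes):
        for offset, weight in modes:
            votes.append((tuple(p + o for p, o in zip(pos, offset)), weight))
    return votes


def estimate_joint(votes: Sequence[Vote], bandwidth: float = 1.0, max_iter: int = 100) -> Vec3:
    """Weighted Gaussian mean-shift started from the heaviest vote."""
    x = max(votes, key=lambda v: v[1])[0]
    for _ in range(max_iter):
        num, den = [0.0, 0.0, 0.0], 0.0
        for point, weight in votes:
            k = weight * math.exp(-math.dist(point, x) ** 2 / (2 * bandwidth ** 2))
            for i in range(3):
                num[i] += k * point[i]
            den += k
        new_x = (num[0] / den, num[1] / den, num[2] / den)
        if math.dist(new_x, x) < 1e-9:
            return new_x
        x = new_x
    return x
```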
Randomized Regression Forests (RDF-R)
Comparison of regression vs
classification based methods
Left: regression forest result
Middle: ground truth pixel classification
Right: occlusion and classification-based problems
Randomized Regression Forests
RDF-R Problems
[Figure: 1st, 2nd and 3rd modes of a per-joint vote distribution]
1)  How to handle multi-modal per-joint vote distributions?
2)  How to incorporate hierarchical dependencies of the joints?
Randomized Regression Forests
with bone length constraints (RDF-R+)
Hierarchy between joints c (child) and p (parent)
Posterior of a joint j
Number of local modes of a joint j
A mode of the distribution for joint j
Define an ordering over the modes
Randomized Regression Forests
with bone length constraints (RDF-R+)
Minimize total penalty of f
Best skeletal configuration
Corresponds to the RDF-R
Corresponds to the RDF-R+
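The slide's formulas did not survive extraction, so the following is only a hypothetical formulation of the same idea. Each joint keeps its modes ordered by weight; taking the first mode for every joint reproduces RDF-R, whereas RDF-R+ chooses one mode per joint to minimize a total penalty. Here we assume that penalty is `rank_weight` times each chosen mode's rank plus the squared difference between every child-parent distance and an expected bone length. Because the hand skeleton is a tree, the minimum can be found exactly by dynamic programming from the fingertips to the root. The penalty terms, weights and names are our assumptions, not the paper's.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def best_configuration(modes: Dict[str, List[Vec3]], parent: Dict[str, str],
                       bone_length: Dict[str, float], root: str,
                       rank_weight: float = 1.0) -> Dict[str, int]:
    """Pick one mode index per joint minimizing rank penalty + squared bone-length error.

    modes[j] lists joint j's modes by decreasing weight (index 0 = the RDF-R choice);
    parent[c] is the parent of joint c and bone_length[c] the expected |c - parent[c]|.
    """
    children: Dict[str, List[str]] = {j: [] for j in modes}
    for c, p in parent.items():
        children[p].append(c)
    cost: Dict[str, List[float]] = {}  # cost[j][i]: best penalty of j's subtree if j takes mode i
    best_child: Dict[Tuple[str, int, str], int] = {}  # (j, i, c) -> best mode of child c

    def solve(j: str) -> None:
        for c in children[j]:
            solve(c)  # children first (post-order)
        cost[j] = []
        for i, pos in enumerate(modes[j]):
            total = rank_weight * i  # penalty for not using the strongest mode
            for c in children[j]:
                val, best_k = min(
                    (cost[c][k] + (math.dist(pos, cpos) - bone_length[c]) ** 2, k)
                    for k, cpos in enumerate(modes[c]))
                best_child[(j, i, c)] = best_k
                total += val
            cost[j].append(total)

    solve(root)
    choice = {root: min(range(len(modes[root])), key=lambda i: cost[root][i])}
    stack = [root]
    while stack:  # backtrack from the root using the stored argmins
        j = stack.pop()
        for c in children[j]:
            choice[c] = best_child[(j, choice[j], c)]
            stack.append(c)
    return choice
```

With a wrist at the origin, an expected bone length of 5, and fingertip modes at distance 10 (strongest) and 5 (second), the constrained solution switches to the second mode; a very large `rank_weight` falls back to the RDF-R choice.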
Randomized Regression Forests
with bone length constraints (RDF-R+)
[Figure: posterior of the index fingertip with its 1st and 2nd modes; RDF-R vs. RDF-R+ estimates]
Difficulties of Real Samples
SURREY Dataset (Pugeault et al.)
Cropped
Missing data
Depth variations, segmentation problems (missing parts).
Experiments
Training Dataset: 30k samples of 40 different hand poses at 160x160 resolution.
Cropped Dataset: only the central 80x80 region of each training image is used.
RPSLS Dataset: rock, paper, scissors, lizard and Spock poses.
SURREY Dataset: we landmarked 55 selected frames.
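For the cropped set, the central 80x80 window of a 160x160 image spans rows and columns 40 to 119. A minimal sketch of that crop on a row-major image stored as nested lists; `center_crop` is a hypothetical helper.

```python
from typing import List, Sequence


def center_crop(image: Sequence[Sequence[float]], size: int) -> List[List[float]]:
    """Keep the central size x size window of a row-major image."""
    height, width = len(image), len(image[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [list(row[left:left + size]) for row in image[top:top + size]]
```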
Experiments
Fine tuning the hyper-parameters
[Figure: joint error of RDF-C and RDF-R versus forest size, maximum probe distance and tree depth]
* Furkan Kıraç, Yunus Emre Kara, and Lale Akarun. Hierarchically constrained 3d hand pose estimation using
regression forests from single frame depth data, 2013.
CMC Curves for the Datasets
TRAIN and CROP
[Figure: CMC curves (acceptance rate vs. acceptance threshold in mm) of RDF-C, RDF-R and RDF-R+ on the TRAIN and CROP datasets]
CMC Curves for the Datasets
RPSLS and SURREY
[Figure: CMC curves (acceptance rate vs. acceptance threshold in mm) of RDF-C, RDF-R and RDF-R+ on the RPSLS and SURREY datasets]
CMC Performances
at 10 mm acceptance threshold
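Each CMC curve reports, for an acceptance threshold t in millimetres, the fraction of joint estimates whose Euclidean error is at most t; the 10 mm figures read one point off those curves. A minimal sketch, assuming errors are pooled over all joints and frames (the slides do not say whether joints are pooled or counted per frame); the helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def joint_errors(pred: Sequence[Vec3], truth: Sequence[Vec3]) -> List[float]:
    """Euclidean distance (mm) between each estimated joint and its ground truth."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def acceptance_rate(errors: Sequence[float], threshold_mm: float) -> float:
    """Fraction of joints whose error does not exceed the threshold."""
    return sum(e <= threshold_mm for e in errors) / len(errors)


def cmc_curve(errors: Sequence[float], thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(threshold, acceptance rate) pairs, one point of the CMC curve per threshold."""
    return [(t, acceptance_rate(errors, t)) for t in thresholds]
```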
CMC Performances on SURREY
at 10 mm acceptance threshold
Joint Estimation Examples
in test datasets
Hand joint estimations in the SURREY dataset
Left to right: ground truth, pixel classification, RDF-C, RDF-R, RDF-R+
Now?
2014 performance from Yann LeCun (Head of AI at Facebook)