ars_proj_pres

CRF-based Activity Recognition
on Manifolds
Presented by
Arshad Jamal and Prakhar Banga
Introduction
• Objective: To explore the STIP based activity analysis by
finding a manifold structure for the HoG-HOF descriptor to
reduce the dimension of the data and learn a hCRF based
discriminative classifier to classify actions
• Challenges: Huge diversity in the data (view-points,
appearance, motion, lighting etc.)
• Applications: video surveillance, video indexing and
retrieval and human computer interaction
Related Works
1. Various variants of
spatio-temporal interest
points (STIPs) based
approaches exist
2. Generative models like
Bayesian Networks to
model key pose
changes
3. Discriminative models like a CRF network to model the
temporal dynamics of silhouettes based features
4. Learning and Matching of Dynamic Shape Manifolds for
Human Action Recognition
Proposed Approach
Labeled Training
Dataset
STIP Detector
&
Descriptor
STIP
Detector &
Descriptor
Test Video
Manifold
Learning
(LPP)
Learn CRF
Classifier
Dimensionality
reduction
Classifier
Action Class
Algorithm Details: STIP Detector
• Looks for distinctive neighborhood in the video
• High image variation in space and time
• Describe it using distribution of gradient and optical flow
 Lxx

H   Lxy
L
 xt
Lxy
L yy
L yt
Lxt 

L yt  Where Lij is
Ltt 
L
ij
Any (x, y, t) location in the video is STIP if
det( H )    trace ( H )  TH
3
I. Laptev. On Space-Time Interest Points. IJCV, 2005
Algorithm Details: STIP Descriptor
• Small spatio-temporal neighborhood extracted
• Divided into 3x3x2 tiles
I. Laptev. On Space-Time Interest Points. IJCV, 2005
Dimensionality Reduction
A broad classification
• Linear:
• Cannot capture inherent non-linearity in the
manifold
• Non-linear
• May not be defined everywhere
• Do not preserve neighborhood
Locality preserving projections
(LPP)
Concentrates on preserving locality
rather than minimizing the Least Square
Error
Capable of learning the non-linear
manifold structure as optimally as
possible
Algorithm Details: HCRF based Classifier
Output
Hidden
states
y
h1
h2
hm
x1
x2
xm
Observations
1. HCRF is a discriminative classifier conditioned globally
on all the observations
2. Model parameters are found by maximizing the
conditional log likelihood on the labelled training data
3. Flexible configuration and connectivity of the Hidden
variables
Wang, Mori NIPS 2008
Algorithm Details: HCRF based Classifier
Given a sequence of STIPs x and it’s class label y  
We wish to find p(y/x)
p( y, x, h; )
p( y, x; )

h m
p( y / x; ) 

p(x; )
y ' hm p( y' , x, h' ; )


 
h m
y '
exp( ( y, x, h; ))
h m
exp( ( y' , x, h' ; ))
 ( y, h, x;  )    T   ( x j , h j )   T   ( y, h j ) 
jV
jV
T

   ( y , h j ,h k )
( j , k )E
Wang, Mori NIPS 2008
Current Status
1. Datasets:
•
KTH: 600 videos of 6 actions by 25 actors in 4 scenarios
•
UCF-50 dataset
2. Features: 3D-Harris corner as STIP, 162-dim (72+90)
HoG-HoF descriptors computed for the two datasets
•
Script based approach to add new datasets
3. Dimensionality reduction using LPP
•
Learn a low dimensional manifold using all the STIPs obtained
from the full dataset
•
Project the data onto the learned manifold
4. HCRF model is learned using the training dataset
•
Initial results obtained
Results: STIPs
Result: LPP output
~1.2Lac STIPs collected
from all action classes in
KTH dataset
3-dimensions
Plotted for
visualization
To be done…
• Testing the classifier for different dataset
• Compiling the action classification results for different
datasets
• Code debugging
References
• I. Laptev. On Space-Time Interest Points. IJCV, 2005
• I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic
•
•
•
•
•
•
•
human actions from movies. In CVPR, 2008
F. Lv and R. Nevatia. Single view human action recognition using key
pose matching and viterbi path searching. In CVPR, 2007
S. Wang, A. Quattoni, L.-P. Morency, D. Demirdjian, and T. Darrell.
Hidden Conditional Random Fields for Gesture Recognition. In CVPR,
2006
L.-P. Morency, A. Quattoni, and T. Darrell. Latent-Dynamic Discriminative
Models for Continuous Gesture recognition. In CVPR, June 2007
A. Quattoni, S. Wang, L.-P. Morency, M. Collins, and T. Darrell. Hidden
conditional random fields. PAMI, Oct. 2007
C. Sminchisescu. Selection and context for action recognition. In ICCV,
2009
J. Sun, X. Wu, S. Yan, L. Cheong, T. Chua, and J. Li. Hierarchical spatiotemporal context modelling for action recognition. In CVPR, 2009
L Wang, D Suter, Learning and Matching of Dynamic Shape Manifolds for
Human Action Recognition, IEEE TIP, 2007
Thank You