Spring 2015 Deep learning for Human action Recognition

Deep learning for Human action Recognition
Dr. Z. R. Ghassabi
[email protected]
Spring 2015
Outline
• Introduction to human action recognition
• Introduction to deep learning
• Is deep learning useful for human action recognition?
Introduction to Human action recognition
Vision-based human activity recognition
Sensor-based human activity recognition
Introduction to Human action recognition
• Segmentation
• Feature representation
• Feature Classification
Usually hand-crafted: SIFT, HOG, etc…
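The three stages above can be sketched on a toy example. This is only an illustration under stated assumptions: it uses a 1-D sensor-like signal rather than images, mean and standard deviation stand in for hand-crafted descriptors such as SIFT or HOG, and a nearest-centroid rule stands in for the classifier. All function names and data values are hypothetical.

```python
import math

def sliding_windows(signal, size, step):
    """Segmentation: split a 1-D signal into fixed-size, possibly overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def handcrafted_features(window):
    """Feature representation: (mean, standard deviation) of one window."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    return (mean, math.sqrt(var))

def nearest_centroid(feature, centroids):
    """Feature classification: return the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))

# Toy training signals (made up for illustration only).
still = [0.0, 0.1, 0.0, -0.1] * 4
walking = [1.0, -1.0, 1.2, -0.9] * 4
centroids = {
    "still": handcrafted_features(still),
    "walking": handcrafted_features(walking),
}

# Segment an unseen signal, describe each window, then classify it.
query = [0.9, -1.1, 1.0, -1.0, 1.1, -0.8, 0.9, -1.0]
predictions = [
    nearest_centroid(handcrafted_features(w), centroids)
    for w in sliding_windows(query, size=4, step=2)
]
print(predictions)
```

Note that the features here are fixed by hand before any learning happens; only the final stage uses the training data.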
The real challenge: image features
Representation
• Examples of descriptors
Unsupervised feature learning
• Until very recently, learning played no major role before the
classification stage, by which point much of the input
information has already been lost.
• Features are now learned directly from the data, with no
feature-engineering or research effort
• Results are equally good, if not better
Hierarchies in high-level vision
Deep Learning and AI
Unsupervised learning: optimizes Φ from
unlabeled data distribution
Unsupervised feature learning
• Distributed representations
– many-to-many relationship
between concepts and
variables
• Each concept is represented by
many variables
• Each variable participates in
the representation of many
concepts
Distributed color-shape representation
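A small sketch may make the many-to-many relationship concrete. It assumes three colors and three shapes (hypothetical values chosen for illustration) and compares a local code, with one unit per color-shape concept, against a distributed code, with one unit per attribute value.

```python
COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "triangle"]

def local_code(color, shape):
    """Local code: one unit per (color, shape) concept, 3 * 3 = 9 units, one active."""
    code = [0] * (len(COLORS) * len(SHAPES))
    code[COLORS.index(color) * len(SHAPES) + SHAPES.index(shape)] = 1
    return code

def distributed_code(color, shape):
    """Distributed code: one unit per attribute value, 3 + 3 = 6 units, two active."""
    code = [0] * (len(COLORS) + len(SHAPES))
    code[COLORS.index(color)] = 1
    code[len(COLORS) + SHAPES.index(shape)] = 1
    return code

red_square = distributed_code("red", "square")
blue_square = distributed_code("blue", "square")

print(local_code("red", "square"))  # [0, 1, 0, 0, 0, 0, 0, 0, 0]
print(red_square)                   # [1, 0, 0, 0, 1, 0]
print(blue_square)                  # [0, 0, 1, 0, 1, 0]
```

In the distributed code each concept activates several units, and the "square" unit is shared by both concepts, so nine concepts are covered by six units instead of nine.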
Hierarchical sparse coding
Unsupervised feature learning
• Boltzmann machine
• Deep Neural Networks
• Convolutional Neural
Networks
Deep Neural Network for AR
• A key advantage of DNNs is their
representation of input features.
• DNNs can model diverse activities
with much less training data.
Deep Neural Network for AR
• Supervised:
– Restricted Boltzmann Machine (RBM)
• Unsupervised:
– Shift-Invariant Sparse Coding
• RBM and Sparse Coding are fully connected DNN models. Therefore, they do not
capture local dependencies of the time series signals.
Fully and Locally Connected NN
Convolutional NN
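One way to see the difference between the three connection schemes is to count weights for a single 1-D layer. The sketch below assumes stride 1, no padding, one filter, and no bias terms; the layer sizes are hypothetical.

```python
def fully_connected_weights(n_in, n_out):
    """Every output unit is connected to every input sample."""
    return n_in * n_out

def locally_connected_weights(n_in, kernel):
    """Each output sees only `kernel` neighbouring inputs, with its own weights."""
    n_out = n_in - kernel + 1  # stride 1, no padding
    return n_out * kernel

def convolutional_weights(kernel):
    """Local connections whose weights are shared across all positions."""
    return kernel

n_in, kernel = 64, 5
n_out = n_in - kernel + 1  # 60 output positions

print(fully_connected_weights(n_in, n_out))     # 3840
print(locally_connected_weights(n_in, kernel))  # 300
print(convolutional_weights(kernel))            # 5
```

Restricting each unit to a local window is what lets the layer model local dependencies in a time series; sharing the weights then reduces the number of parameters further.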
Advantages of applying CNN to AR
• Can capture local dependency and scale-invariance
features of activity signals.
• Variations of the same activity can be effectively captured through the extracted
features.
CNN for sensor-based AR
• Consists of one or more
pairs of convolution and
pooling layers
• Local dependencies by
Convolutional layers
• Scale-invariance by max-pooling layers
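A single convolution and pooling pair can be sketched in plain Python. This is a minimal illustration, not the network from the slides: the kernel is fixed by hand rather than learned, the signals are made up, and the example only demonstrates the tolerance to small shifts that pooling provides, not scale invariance in general.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation (what CNN libraries call convolution), stride 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    """Keep positive responses, zero out the rest."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

kernel = [-1.0, 1.0]  # hand-picked filter that responds to a rising edge

# The same pulse, shifted by one sample in the second signal.
signal_a = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
signal_b = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

pooled_a = max_pool1d(relu(conv1d(signal_a, kernel)), size=4)
pooled_b = max_pool1d(relu(conv1d(signal_b, kernel)), size=4)

print(pooled_a)  # [1.0, 0.0]
print(pooled_b)  # [1.0, 0.0]
```

The convolution responds to the local pattern wherever it occurs, and the pooled outputs are identical for both signals because the shifted response still falls inside the same pooling window.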
Activity Recognition
Criticism on Deep Learning
• Computationally intensive
• A lot of parameters to tune