Deep Learning for Human Action Recognition
Dr. Z. R. Ghassabi
[email protected]
Spring 2015

Outline
• Introduction to human action recognition
• Introduction to deep learning
• Is deep learning useful for human action recognition?

Introduction to Human Action Recognition
• Vision-based human activity recognition
• Sensor-based human activity recognition

Introduction to Human Action Recognition
• Segmentation
• Feature representation
• Feature classification
Features are usually hand-crafted: SIFT, HOG, etc.

The real challenge: image features

Representation
• Examples of descriptors

Unsupervised feature learning
• Until very recently, learning played no major role before the classification stage, by which point much of the information in the input has already been lost.
• Features are now learned directly from the data, with no feature-engineering or research effort.
• The learned features are equally good, if not better.

Hierarchies in high-level vision

Deep Learning and AI
Unsupervised learning: optimizes Φ from the unlabeled data distribution

Unsupervised feature learning
• Distributed representations
– Many-to-many relationship between concepts and variables
• Each concept is represented by many variables
• Each variable participates in the representation of many concepts

Distributed color-shape representation

Hierarchical sparse coding

Unsupervised feature learning
• Boltzmann machine
• Deep Neural Networks
• Convolutional Neural Networks

Deep Neural Network for AR
• A key advantage of a DNN is its representation of the input features.
• A DNN can model diverse activities with much less training data.

Deep Neural Network for AR
• Supervised:
– Restricted Boltzmann Machine (RBM)
• Unsupervised:
– Shift-Invariant Sparse Coding
• RBM and Sparse Coding are fully connected DNN models. Therefore, they do not capture the local dependencies of time-series signals.
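The last point, that fully connected models do not capture local dependencies, can be illustrated with a small sketch. It is not taken from the slides: the signal, the weights, the permutation and the function names are all hypothetical values chosen for illustration. It shows that a fully connected unit gives the same output when the time steps are scrambled (as long as its weights are scrambled the same way), so nothing in the architecture encodes which samples are neighbours, whereas a small filter slid over adjacent samples responds differently.

```python
# Minimal sketch (hypothetical values): a fully connected unit versus a local filter.

def dense_unit(signal, weights):
    """One fully connected unit: a separate weight for every time step."""
    return sum(w * x for w, x in zip(weights, signal))

def local_filter(signal, kernel):
    """Slide one small shared kernel over neighbouring samples (no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]           # a short burst of activity
dense_weights = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2]  # arbitrary, not learned
edge_kernel = [-1.0, 1.0]                         # hand-picked difference filter

# Scramble the time order, permuting the dense weights the same way.
perm = [3, 0, 5, 1, 4, 2]
scrambled_signal = [signal[p] for p in perm]
scrambled_weights = [dense_weights[p] for p in perm]

dense_original = dense_unit(signal, dense_weights)
dense_scrambled = dense_unit(scrambled_signal, scrambled_weights)
local_original = local_filter(signal, edge_kernel)
local_scrambled = local_filter(scrambled_signal, edge_kernel)

# The dense unit cannot tell the two orderings apart.
print(abs(dense_original - dense_scrambled) < 1e-12)
# The local filter responds to which samples are adjacent.
print(local_original != local_scrambled)
```

The dense unit also needs one weight per time step, while the local filter reuses the same two weights at every position.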
Fully and Locally Connected NN

Convolutional NN

Advantages of applying CNN to AR
• Can capture the local dependency and scale invariance of activity signals.
• Variations of the same activity can be effectively captured through the extracted features.

CNN for sensor-based AR
• Consists of one or more pairs of convolution and pooling layers
• Local dependencies are captured by the convolutional layers
• Scale invariance is provided by the max-pooling layers

Activity Recognition

Criticism of Deep Learning
• Computationally intensive
• Many parameters to tune
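Returning to the "CNN for sensor-based AR" slide, one convolution and pooling pair can be sketched roughly as follows. This is only an illustration of the mechanics under stated assumptions: the one-axis sensor trace is made up, the kernel is hand-picked rather than learned, and the ReLU step and the pooling window of 3 are choices made for this example, not details given in the slides.

```python
# Minimal sketch (hypothetical data): one convolution + max-pooling pair on a 1-D signal.

def conv1d(signal, kernel):
    """Apply one shared kernel to every window of neighbouring samples (no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Keep positive responses and zero out the rest."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Keep the strongest response in each non-overlapping window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

# Hypothetical one-axis accelerometer trace with two bursts of movement.
trace = [0.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, 3.0, 4.0, 0.0]
kernel = [-1.0, 1.0]  # hand-picked: responds to a rise between adjacent samples

feature_map = relu(conv1d(trace, kernel))  # local dependencies between neighbours
pooled = max_pool(feature_map, 3)          # coarser summary of each region

print(feature_map)
print(pooled)
```

In a trained CNN the kernel weights are learned from labelled activity data, and several such pairs are stacked before the final classifier. The number of kernels, the kernel width and the pooling size are among the parameters that have to be tuned.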