Recognizing Human Actions from Video Data

Retrieving Actions in Group Contexts
Tian Lan, Yang Wang, Greg Mori, Stephen Robinovitch
Simon Fraser University
Sept. 11, 2010
Outline
• Contextual Representation of Actions
• Action Retrieval as Ranking
• Results and Future Work
Nursing Home
• Fall analysis in nursing home surveillance videos
– the goal is a system that automatically ranks the videos by their relevance to the fall action
Action-Action Context
Context: what are other people doing?
Actions in Group Context
• Motivation
– human actions are rarely performed in isolation; the actions of
individuals in a group can serve as context for each other
• Goal
– explore the benefit of contextual information in
action retrieval in challenging real-world
applications
Action Context Descriptor
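[Figure: for each person, a feature descriptor (e.g. HOG by Dalal & Triggs) is passed to a multi-class SVM, which outputs a score per action class. The action context descriptor combines (+) the focal person's action scores with the context action scores, where each context score is the max score for that action class. Diagram labels: τ, z.]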
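The construction of the action context descriptor can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the per-person multi-class SVM scores are already computed, uses a single circular context region of hypothetical size `radius`, and fills the context part with zeros when nobody is nearby. The function name and all parameters are assumptions made for this example.

```python
import math

def action_context_descriptor(scores, positions, focal, radius):
    """Concatenate the focal person's action scores with context scores.

    scores:    one list of per-class SVM scores for each person
    positions: one (x, y) location for each person
    focal:     index of the focal person
    radius:    size of the context region (hypothetical parameter)
    """
    focal_scores = list(scores[focal])
    # Collect the score vectors of the other people inside the context region.
    neighbors = [
        s for k, (s, p) in enumerate(zip(scores, positions))
        if k != focal and math.dist(p, positions[focal]) <= radius
    ]
    if neighbors:
        # Context score per action class: max over the nearby people.
        context = [max(column) for column in zip(*neighbors)]
    else:
        # Assumption: an empty context region contributes zeros.
        context = [0.0] * len(focal_scores)
    return focal_scores + context
```

For example, with two action classes and three people, `action_context_descriptor([[0.9, 0.1], [0.2, 0.8], [0.4, 0.3]], [(0, 0), (1, 0), (10, 0)], focal=0, radius=2.0)` keeps the focal scores and appends the scores of the only nearby person.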
Classification or Retrieval
• Previous Work
– Most work in human action understanding
focuses on action classification.
• Most surveillance tasks are retrieval tasks
– retrieve the small video segments that contain a particular action
from thousands of hours of video
• The “action of interest” is a rare event
– extremely imbalanced classes
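A small sketch of why extreme class imbalance makes plain classification accuracy uninformative. The numbers are hypothetical (1 fall among 1000 segments) and are not taken from the datasets in this talk.

```python
def accuracy(labels, preds):
    """Fraction of segments whose predicted label matches the true label."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)

def recall(labels, preds, positive=1):
    """Fraction of true positive segments that were actually retrieved."""
    hits = sum(1 for l, p in zip(labels, preds) if l == positive and p == positive)
    total = sum(1 for l in labels if l == positive)
    return hits / total

labels = [1] + [0] * 999   # hypothetical: 1 fall among 1000 segments
preds = [0] * 1000         # trivial classifier that never predicts a fall

print(accuracy(labels, preds))  # 0.999
print(recall(labels, preds))    # 0.0
```

The trivial classifier looks almost perfect by accuracy while retrieving no falls at all, which is one reason to rank segments by relevance instead.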
Action Retrieval
Query: fall
Rank videos according to their relevance to falls
Learning
• Input: document-rank pairs (x_i, y_i)
• Optimization
Joachims, KDD 06
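A common pairwise form of the Ranking SVM optimization is given below as an assumption; the exact formulation used in the talk may differ. The weight vector w, slack variables ξ_ij and trade-off constant C are notation introduced for illustration, with ranking function h(x) = wᵀx.

```latex
\begin{aligned}
\min_{w,\;\xi \ge 0} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{(i,j):\, y_i > y_j} \xi_{ij} \\
\text{s.t.} \quad & w^{\top} x_i \;\ge\; w^{\top} x_j + 1 - \xi_{ij}
\qquad \forall\, (i,j) \text{ with } y_i > y_j
\end{aligned}
```

Each constraint asks that a document with a higher rank label score at least one margin unit above a document with a lower rank label, up to the slack ξ_ij.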
Ranking SVM
• Ranking function h(x)
[Figure: the ranking function h(x) orders the items into ranks r1, r2, r3.]
Action Retrieval - training
• Relevance levels: irrelevant, relevant, very relevant
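Training a ranking function from graded relevance labels can be sketched as follows. This is a minimal pairwise Ranking SVM trained by plain subgradient descent, not the solver from Joachims (KDD 06) and not the authors' code. The function names, the learning rate `lr`, the epoch count and the integer encoding of the relevance levels (0 = irrelevant, 1 = relevant, 2 = very relevant) are assumptions for illustration.

```python
def train_rank_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Learn w so that h(x) = w . x scores higher-relevance items higher."""
    # Every pair (i, j) where item i is more relevant than item j.
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        for i, j in pairs:
            diff = [a - b for a, b in zip(X[i], X[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            if margin < 1.0:  # pair violates the margin: hinge loss is active
                grad = [g - C * d for g, d in zip(grad, diff)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w

def h(w, x):
    """Ranking function: a higher score means more relevant."""
    return sum(wk * xk for wk, xk in zip(w, x))

def rank(w, X):
    """Indices of the items in X, most relevant first."""
    return sorted(range(len(X)), key=lambda i: h(w, X[i]), reverse=True)
```

With three toy items `X = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.0]]` labeled `y = [0, 1, 2]`, calling `rank(train_rank_svm(X, y), X)` places the "very relevant" item first and the "irrelevant" item last.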
Dataset
• Nursing Home Dataset
– 5 action categories (per person): walking, standing, sitting, bending and falling
– 18 video clips
– Query: fall
• Collective Activity Dataset (Choi et al., VS 09)
– 5 action categories (per person): crossing, waiting, queuing, walking, talking
– 44 video clips
– Query: each of the five actions
[Figure: Nursing Home Dataset]
[Figure: Collective Activity Dataset]
System Overview
Video → Person Detector → Person Descriptor → Rank SVM
• Person Detector
– Pedestrian Detection by Felzenszwalb et al.
– Background Subtraction
• Person Descriptor
– HOG by Dalal & Triggs
– LST by Loy et al. at CVPR 09
Baselines
• Context vs No Context
– Action Context Descriptor
– Original feature descriptors, e.g. HOG (Dalal & Triggs at
CVPR 05), LST (Loy et al. at CVPR 09)
• RankSVM vs SVM
• Methods
– Context + RankSVM (our method)
– Context + SVM
– No Context + RankSVM
– No Context + SVM
Retrieval Results
Nursing Home Dataset
Retrieval Results
Collective Activity Dataset
Action Classification
[10] Choi et al. in VS. 09
Collective Activity Dataset
Conclusion
• A new contextual feature descriptor to
represent actions
– action context (AC) descriptor
• Action recognition formulated as a retrieval task rather than
a classification task
Future Work
• Contextual Feature Descriptors
– How to encode only useful context?
• Rank-SVM loss: optimize the NDCG score directly
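For reference, NDCG (normalized discounted cumulative gain) can be computed as in the sketch below. It uses one widely used variant, with gain 2^r − 1 and a log2 position discount; whether this matches the exact variant intended in the talk is an assumption. The input is the list of graded relevance labels in the order the system ranked the items.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance labels."""
    rels = relevances[:k] if k is not None else relevances
    # Position 0 is the top of the ranking; the discount is log2(position + 2).
    return sum((2 ** r - 1) / math.log2(pos + 2) for pos, r in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfect ranking such as `ndcg([2, 1, 0])` scores 1.0, while ranking the most relevant item last lowers the score because its gain is discounted more heavily.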
Thank you!