Retrieving Actions in Group Contexts
Tian Lan, Yang Wang, Greg Mori, Stephen Robinovitch
Simon Fraser University
Sept. 11, 2010

Outline
• Contextual Representation of Actions
• Action Retrieval as Ranking
• Results and Future Work

Nursing Home
• Fall analysis in nursing home surveillance videos
  – The goal is a system that automatically ranks videos by their relevance to the fall action.

Action-Action Context
• Context: what are other people doing?

Actions in Group Context
• Motivation
  – Human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other.
• Goal
  – Explore the benefit of contextual information for action retrieval in challenging real-world applications.

Action Context Descriptor
[Figure: focal person action + context action; diagram labels: τ, z]

Action Context Descriptor
[Figure labels: feature descriptor (e.g. HOG by Dalal & Triggs), multi-class SVM, action class scores, max score, action context descriptor]

Outline
• Contextual Representation of Actions
• Action Retrieval as Ranking
• Results and Future Work

Classification or Retrieval
• Previous Work
  – Most work in human action understanding focuses on action classification.

Classification or Retrieval
• Most surveillance tasks are typical retrieval tasks
  – Retrieve a short video segment that contains a particular action from thousands of hours of video.
• The “action of interest” is a rare event
  – Extremely imbalanced classes

Action Retrieval
• Query: fall
• Rank videos according to their relevance to falls

Learning
• Input: document-rank pairs (x_i, y_i)
• Optimization: Ranking SVM (Joachims, KDD 06)
• Ranking function h(x)
[Figure: h(x) with ranks r1, r2, r3]

Action Retrieval: Training
[Figure labels: irrelevant, relevant, very relevant]

Outline
• Contextual Representation of Actions
• Action Retrieval as Ranking
• Results and Future Work

Dataset
• Nursing Home Dataset
  – 5 action categories (per person): walking, standing, sitting, bending and falling
  – 18 video clips
  – Query: fall
• Collective Activity Dataset (Choi et al., VS 09)
  – 5 action categories (per person): crossing, waiting, queuing, walking, talking
  – 44 video clips
  – Query: each of the five actions

Dataset
• Nursing Home Dataset

Dataset
• Collective Activity Dataset

System Overview
Video → Person Detector → Person Descriptor → Rank SVM
• Person Detector
  – Pedestrian detection by Felzenszwalb et al.
  – Background subtraction
• Person Descriptor
  – HOG by Dalal & Triggs
  – LST by Loy et al. at CVPR 09

Baselines
• Context vs. No Context
  – Action Context Descriptor
  – Original feature descriptors, e.g. HOG (Dalal & Triggs at CVPR 05), LST (Loy et al. at CVPR 09)
• RankSVM vs. SVM
• Methods
  – Context + RankSVM (our method)
  – Context + SVM
  – No Context + RankSVM
  – No Context + SVM

Retrieval Results
• Nursing Home Dataset

Retrieval Results
• Collective Activity Dataset (three slides)

Action Classification
• Collective Activity Dataset
[Figure: the slide cites [10] Choi et al. in VS 09]

Conclusion
• A new contextual feature descriptor to represent actions
  – Action context (AC) descriptor
• We formulate our problem as a retrieval task.

Future Work
• Contextual Feature Descriptors
  – How can we encode only useful context?
• Rank-SVM loss: optimize the NDCG score

Thank you!
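
The action context descriptor slides pair the focal person's action class scores with a "max score" taken over the context. A minimal sketch of that idea follows, under several assumptions that are not in the slides: the function name `action_context_descriptor` is hypothetical, the scores are hand-written stand-ins for multi-class SVM outputs, everyone in the context is pooled in a single region, and an empty context is filled with zeros.

```python
from typing import List, Sequence


def action_context_descriptor(
    focal_scores: Sequence[float],
    context_scores: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate the focal person's class scores with max-pooled context scores.

    focal_scores:   one classifier score per action class for the focal person.
    context_scores: one score vector per other person in the context (may be empty).
    """
    num_classes = len(focal_scores)
    if context_scores:
        # "Max score": for each action class, keep the highest score
        # among the people in the context.
        pooled = [
            max(person[k] for person in context_scores)
            for k in range(num_classes)
        ]
    else:
        # Assumption: with nobody in the context, use zeros.
        pooled = [0.0] * num_classes
    return list(focal_scores) + pooled


# Class order assumed here: walking, standing, sitting, bending, falling.
focal = [0.1, 0.2, -0.3, 0.0, 1.5]
neighbours = [
    [0.9, 0.1, -0.5, 0.2, -1.0],
    [0.2, 0.4, -0.1, 1.1, -0.8],
]
descriptor = action_context_descriptor(focal, neighbours)
print(descriptor)
```

The resulting vector has twice as many entries as there are action classes: the first half describes the focal person and the second half describes what the surrounding people are doing. Such a vector could then be passed to the ranking stage in place of the original feature descriptor.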