Location-Based Activity Recognition

Lin Liao
University of Washington
NIPS
Goal Of Research
• Enable computers to recognize people's activities with capabilities similar to those of humans.
To Achieve This Goal
• We must provide computers with the 2
types of functions that ordinary people
possess
1. Sensing
2. Learning and inference
Challenges Of Activity Recognition
1. Function of sensing
– It is difficult to develop a sensing platform that can collect information over long periods of time without being intrusive to users.
Challenges Of Activity Recognition
2. Gap between low-level activity inference and high-level activities
Challenges Of Activity Recognition
3. Some factors are hard to determine precisely.
   e.g., preferences, capabilities
4. Activities are strongly correlated.
5. The system requires a large amount of domain knowledge.
What model to use
• The uncertainty and variability of human behavior make it impractical to model activities in a deterministic manner.
• Probabilistic reasoning
  - Discriminative approach
  - Generative approach
Probabilistic Reasoning
• Discriminative models (e.g., logistic regression, conditional random fields)
  – Directly represent the conditional distribution of the hidden labels given all the observations
  – During training, the parameters can be adjusted to maximize the conditional likelihood of the data
• Generative models (e.g., naïve Bayesian models, HMMs, dynamic Bayesian networks)
  – Represent the joint distribution and use Bayes' rule to obtain the conditional distribution
  – Usually trained by maximizing the joint likelihood
Probabilistic Reasoning
• Apply both approaches
1. Discriminative models (CRF, RMN)
   • Activity classification
2. Generative models (DBN)
   • Inferring transportation routines
Conditional Random Fields
• CRFs are undirected graphical models.
• A CRF directly defines the conditional distribution p(y | x)
  – y : hidden states
  – x : observations
• Let's start with an example
Conditional Random Fields
Conditional Random Fields
• The fully connected subgraphs of a CRF are its cliques.
• A CRF factorizes the conditional distribution into a product of clique potentials.
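A minimal sketch of this factorization in standard CRF notation (the symbols φ_c and Z are the usual ones, assumed here rather than copied from the slides):

    p(y | x) = (1 / Z(x)) ∏_{c ∈ C} φ_c(x_c, y_c)

where the partition function Z(x) = Σ_{y'} ∏_{c ∈ C} φ_c(x_c, y'_c) sums over all possible label configurations.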
Conditional Random Fields
• Computing the partition function Z is exponential in the number of hidden states, since it requires summing over all possible label configurations.
Conditional Random Fields
• WLOG, each clique potential is written as a log-linear combination of features:
    φ_c(x_c, y_c) = exp( w_c · f_c(x_c, y_c) )
  – w_c : weight vector, f_c : feature vector
• Then,
    p(y | x) = (1 / Z(x)) exp( Σ_{c ∈ C} w_c · f_c(x_c, y_c) )
Relational Markov Networks
• These cliques represent the same type of relation
• Their weights and features should therefore be tied
• Parameter sharing
Relational Markov Networks
• RMNs extend CRFs by providing
1. A relational language for describing clique structures
2. Parameter sharing enforced at the template level
Relational Markov Networks
Relational Markov Networks
Relational Markov Networks
• By applying the query to a data instance
I, we get the query result I(C).
• In summary, given a specific data
instance I, an RMN defines a conditional
distribution p(y|x) by instantiating the
CRF.
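A minimal Python sketch of template-level parameter sharing, under the assumption that a template is just one weight vector reused by every clique it instantiates (the class and function names are hypothetical, not from the paper):

    import numpy as np

    class CliqueTemplate:
        # One weight vector shared by every clique this template instantiates.
        def __init__(self, n_features):
            self.w = np.zeros(n_features)            # tied (shared) parameters

        def log_potential(self, features):
            # features: feature vector f_c(x_c, y_c) of one instantiated clique
            return self.w @ features

    def unnormalized_log_score(templates, instantiated_cliques):
        # instantiated_cliques: list of (template_name, feature_vector) pairs;
        # every clique from the same template reuses that template's weights.
        return sum(templates[name].log_potential(f)
                   for name, f in instantiated_cliques)

Training then updates each template's weight vector once, and the change applies to all cliques that template generated.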
Relational Markov Networks
Extensions of RMN
1. Aggregate features
– Aggregate functions : count, sum, mean, …
2. Structural uncertainty
Aggregate Features
1. Allow aggregates of tuples, such as COUNT( ), SUM( ), …, in the SELECT clause
2. The WHERE clause can include label attributes
Aggregate Features
Aggregate Features
Structural Uncertainty
• So far, we have assumed we know the
exact structures of the instantiated CRFs
and thus the only uncertainties in our
models are the hidden labels.
• We focus on instance uncertainty.
Structural Uncertainty
• A new class "Restaurant" stores the list of restaurants the person has been to.
• Heuristic: BEST(n)
Structural Uncertainty
• The most likely (BEST(1)) sequence of labels is inferred first.
• Find the "Dining Out" activities.
• Insert their locations into the "Restaurant" set.
Now we can build the RMN model from a given input.
Inference
• We first instantiate the CRF from the data instance, then we do inference.
• Exact inference in a general Markov network is NP-hard.
• Two widely used approximate algorithms: Markov chain Monte Carlo (MCMC) and belief propagation (BP)
Meaning of Inference
• Inference in a CRF means
1. Estimating the posterior distribution
2. Estimating the most likely configuration (MAP)
Monte Carlo
• Approximate p(y|x) by drawing enough samples and simply counting frequencies.
• But the state space of p(y|x) grows exponentially with the number of hidden variables.
• This motivates MCMC.
MCMC
• Basic idea
1. Start from some point in the state space
2. Sample the next point based on the current point and a given transition matrix
3. Repeat until enough samples have been collected
MCMC - Initialization
• Randomly sample each hidden variable
MCMC – State Transition
• Current configuration y^(i)
• The next configuration y^(i+1) is sampled from the transition matrix, conditioned on y^(i)
1. Gibbs sampling
2. Metropolis-Hastings sampling
Gibbs sampling
• The Gibbs sampler flips one label at a time:
    y_j^(i+1) ~ p( y_j | y_{-j}^(i), x )
• y_{-j} stands for all the labels except label j
• This conditional depends only on MB(y_j), the Markov blanket of label j
• Block Gibbs sampler: several labels are updated jointly
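A minimal Gibbs-sampling sketch in Python for a pairwise chain CRF (an illustrative simplification; the array layout and the 20% burn-in below are assumptions made for this sketch):

    import numpy as np

    def gibbs_chain_crf(local, pair, n_sweeps=1000, rng=None):
        # local: (T, K) array of log potentials log phi(x_t, y_t)
        # pair:  (K, K) array of log potentials log phi(y_t, y_{t+1})
        rng = rng or np.random.default_rng(0)
        T, K = local.shape
        y = rng.integers(K, size=T)                  # random initialization
        counts = np.zeros((T, K))
        burn_in = n_sweeps // 5                      # discard the first 20% of sweeps
        for sweep in range(n_sweeps):
            for j in range(T):                       # flip one label at a time
                logp = local[j].copy()               # Markov blanket: neighbors j-1, j+1
                if j > 0:
                    logp += pair[y[j - 1], :]
                if j < T - 1:
                    logp += pair[:, y[j + 1]]
                p = np.exp(logp - logp.max())
                y[j] = rng.choice(K, p=p / p.sum())
            if sweep >= burn_in:                     # count label frequencies
                counts[np.arange(T), y] += 1
        return counts / counts.sum(axis=1, keepdims=True)   # approximate marginals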
MCMC – Stop Criterion
• There is no satisfactory theoretical answer yet.
• In practice, statistical tests are used.
MCMC – Approximation of the distribution
• Discard the initial 20% of the samples (burn-in).
• Then simply count the frequency of labels for each hidden variable.
Belief Propagation
• Basic idea
  – Send local messages through the graph structure defined by the CRF
  – Tree-structured graphs: exact
  – Graphs with loops: loopy BP
  – We discuss BP for pairwise CRFs.
BP
• Message
  – A distribution sent from node i to its neighbor j about which state variable y_j should be in.
• We discuss the sum-product algorithm for posterior estimation.
BP – Message Initialization
• Usually the initial messages m_ij(y_j) are uniform distributions.
BP – Message Update Rule
• The message m_ij(y_j) is updated based on
1. The local potential φ(x_i, y_i)
2. The pairwise potential φ(y_i, y_j)
3. All messages sent to i, except the one from j:
    m_ij(y_j) ∝ Σ_{y_i} φ(x_i, y_i) φ(y_i, y_j) ∏_{k ∈ n(i)\j} m_ki(y_i)
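A minimal numpy sketch of this update for a pairwise CRF (the array shapes and the function name are assumptions made for illustration):

    import numpy as np

    def update_message(local_i, pair_ij, incoming_to_i_except_j):
        # Compute m_ij(y_j) = sum_{y_i} phi(x_i, y_i) phi(y_i, y_j) prod_k m_ki(y_i).
        #   local_i:  (K,)   local potential phi(x_i, y_i)
        #   pair_ij:  (K, K) pairwise potential phi(y_i, y_j)
        #   incoming_to_i_except_j: list of (K,) messages m_ki(y_i), k != j
        belief_i = local_i.copy()
        for m_ki in incoming_to_i_except_j:      # product of messages into i, except from j
            belief_i = belief_i * m_ki
        m_ij = belief_i @ pair_ij                # sum over y_i
        return m_ij / m_ij.sum()                 # normalize for numerical stability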
BP – Message Update Order
• Iterate the message update rule until it converges.
BP – Convergence conditions
• BP measures the difference between the
old message and the updated one.
Belief Propagation
Efficient Inference
• Large cliques can easily make BP intractable.
• MCMC and Gibbs sampling are slow in complex models.
1. Summation features: BP with FFT
2. Generic features: BP with MCMC
BP with FFT
• Dynamically builds a summation tree:
    y_jk = y_j + y_k
• Upward message
• Downward message
BP with FFT
• Upward message
• Computation complexity
• K is max number of states in y_i and y_j
BP with FFT
• Downward message
BP with FFT
• Suppose there are n upward messages at the bottom of the tree, and the maximum size of a message is k (n >> k).
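A minimal sketch of the key step, assuming (consistently with the summation tree above) that the upward message for y_jk = y_j + y_k is the convolution of the messages for y_j and y_k, which the FFT computes in O(K log K) rather than the naive O(K^2):

    import numpy as np

    def upward_message_sum_node(m_j, m_k):
        # m_j, m_k: discrete distributions over 0..K-1;
        # the sum y_jk = y_j + y_k ranges over 0..2K-2.
        K = len(m_j)
        n = 2 * K - 1
        # Linear convolution via FFT (same result as np.convolve(m_j, m_k)).
        m_jk = np.fft.irfft(np.fft.rfft(m_j, n) * np.fft.rfft(m_k, n), n)
        m_jk = np.clip(m_jk, 0.0, None)          # remove tiny negative round-off
        return m_jk / m_jk.sum()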
Now we turn to real data
Extracting places and activities
• Inference using BP
• Parameter learning using MPL (maximum pseudo-likelihood)
Extracting places and activities
• GPS Measurements
– Input to our model
– Segment GPS trace by 10m
– Generate discrete sequence of activity nodes
Extracting places and activities
• Activities
  – Estimated based on the segmented GPS trace
  – Label the person's activity whenever she passes through or stays at a 10 m patch
  – Significant activities
  – Navigation activities
  – To determine activities: features (e.g., duration) and geographic information
Extracting places and activities
• Significant places
  – Places the person typically uses
Extracting places and activities
GPS to Street Map Association
• For example, in order to relate locations to addresses in the map.
• We have to account for
1. GPS noise and map uncertainty
2. Temporal consistency
3. Smoothness
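A minimal sketch of how these three factors could enter the model as log-potentials (the functional forms, thresholds, and names below are illustrative assumptions, not the paper's exact features):

    import numpy as np

    def measurement_log_potential(gps_xy, patch_xy, sigma=10.0):
        # GPS noise / map uncertainty: Gaussian-style penalty on the distance
        # between the GPS reading and the candidate street patch.
        d2 = np.sum((np.asarray(gps_xy) - np.asarray(patch_xy)) ** 2)
        return -d2 / (2.0 * sigma ** 2)

    def temporal_consistency_log_potential(patch_t, patch_t1, adjacency):
        # Temporal consistency: consecutive readings should map to the same
        # or neighboring street patches.
        ok = patch_t1 == patch_t or patch_t1 in adjacency[patch_t]
        return 0.0 if ok else -5.0               # assumed penalty weight

    def smoothness_log_potential(heading_t, heading_t1):
        # Smoothness: penalize abrupt changes of direction between steps.
        return -abs(heading_t1 - heading_t)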
GPS noise and map uncertainty
Temporal consistency
Smoothness
GPS to Street Map Association
Extracting and Inferring Activities
• This template instantiates the activities using the MAP sequence of street patches.
• The TimeOfDay of an activity is obtained from its starting timestamp, MIN(Timestamp).
Extracting and Inferring Activities
• After extracting the activity instances, we define relational clique templates that connect the labels of the activities with all sources of evidence.
• There are four types of evidence.
Sources of Evidence
1. Temporal information, such as TimeOfDay and DayOfWeek, which is discretized
2. Average speed through a segment
3. Information extracted from geographic databases
Sources of Evidence
4. Each activity node is connected to its neighbors.
Extracting and Inferring Activities
• Each activity node a_i is connected to
observed local evidence nodes e_ij
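A minimal sketch of assembling the local evidence for one activity node (the discretization bins, thresholds, and attribute names are illustrative assumptions, not the paper's feature definitions):

    def local_evidence(activity):
        # `activity` is assumed to expose start_hour, day_of_week, avg_speed,
        # and geographic flags; the names are hypothetical.
        time_of_day = ("morning" if activity.start_hour < 12 else
                       "afternoon" if activity.start_hour < 18 else "evening")
        speed_bin = "slow" if activity.avg_speed < 2.0 else "fast"   # m/s cutoff, assumed
        return {
            "TimeOfDay": time_of_day,                    # 1. discretized temporal information
            "DayOfWeek": activity.day_of_week,
            "SpeedBin": speed_bin,                       # 2. average speed through the segment
            "NearRestaurant": activity.near_restaurant,  # 3. geographic database information
        }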
Extracting and Labeling Significant Places
• Since places are unknown a priori, we again encounter the problem of instance uncertainty.
Extracting and Labeling Significant Places
• Look for all the significant activities using a filter function
Extracting and Labeling Significant
Places
• The activities that occur at a place
strongly indicate the type of the place.
• Our template captures the dependencies
between the place labels and the activity
labels.
Extracting and Labeling Significant Places
• A person usually has only a limited number of homes or workplaces.
• To use this knowledge, an aggregate count feature over the place labels is added.
• The likelihood of the labels decreases exponentially as the count increases.
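A minimal sketch of such a count feature as a log-potential over the place labels (the exponential decay follows the bullet above; the weight is an illustrative assumption):

    def count_penalty_log_potential(place_labels, label="home", weight=1.0):
        # Aggregate feature: the more places labeled `label`, the lower the
        # likelihood; potential = exp(-weight * count) decays exponentially.
        count = sum(1 for p in place_labels if p == label)
        return -weight * count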
Inference
• Given the structure of the CRF, run BP to get the MAP labeling.
• But we have structural uncertainty (places), so inference iterates (see the sketch after this list):
1. The GPS trace is segmented.
2. Activity nodes are generated.
3. The MAP activity sequence is used to extract a set of significant places.
4. Repeat step 3 until the set of places converges.
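A minimal Python-style sketch of this loop (the callables stand in for the steps above and are placeholders, not an actual API):

    def infer_activities_and_places(activities, instantiate_crf, run_bp_map,
                                    extract_places):
        # Iterative inference under place (structural) uncertainty.
        places = frozenset()
        while True:
            crf = instantiate_crf(activities, places)   # build CRF with current places
            map_labels = run_bp_map(crf)                # MAP labeling via (loopy) BP
            new_places = frozenset(extract_places(map_labels))
            if new_places == places:                    # stop when the place set is stable
                return map_labels, places
            places = new_places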
Experimental Results
• 4 persons, 6 days
• 40,000 GPS measurements
• 9,000 segments
• Manually labeled all activities and significant places
• Leave-one-out cross-validation
• MPL for learning
• Less than 1 minute to converge
Experimental Results
• Extracting significant places
Experimental Results
• Labeling places and activities
• Accuracy is 86.0%
• Performing joint inference over activities and places
increases the quality
Experimental Results
• Labeling places and activities
• Overall accuracy in place detection and
labeling is 90.6%
Indoor Activity Recognition
• The CRF model is simply a linear chain, so BP inference is very efficient and exact.
• However, because there is a large number of continuous observations in the sensor data, parameter learning using ML or MPL does not work well.
• Therefore, we apply VEB (virtual evidence boosting) for this task.
Indoor Activity Recognition
• Shoulder-mounted multi-sensor board
• Audio, acceleration, light
Questions ?