Chapter 1
Synthesising Generative Probabilistic Models for High-Level Activity
Recognition
Christoph Burghardt, Maik Wurdel, Sebastian Bader, Gernot Ruscher and Thomas Kirste
Institute for Computer Science, University of Rostock, 18059 Rostock, Germany
[email protected]
High-level (hierarchical) behaviour with long-term correlations is difficult to describe with first-order Markovian models such as hidden Markov models. We therefore discuss different approaches to synthesising generative probabilistic models for activity recognition based on different symbolic high-level descriptions. Those descriptions of complex activities are compiled into robust generative models. The underlying assumptions of our work are that (i) robust activity recognition systems for the real world need probabilistic models, (ii) those models should not necessarily rely on an extensive training phase, and (iii) available background knowledge should be used to initialise them. We show how to construct such models based on different symbolic representations.
1.1. Introduction & Motivation
Activity recognition is an important part of most ubiquitous computing applications.1 With the help of activity recognition, the system can interpret human behaviour in order to assist the user,2–4 to control complex environments,5,6 or to detect difficult or hazardous situations.7
Activity recognition became very successful with the rise of machine-learning techniques8 that helped to build systems which gain domain knowledge from simple training data.9–11 However, while it is simple to gather training data for inferring, e.g., the gait of the user,12 this process does not scale up for longer and more complex activities.13 Long-term behaviour of users, or scenarios with multiple interacting people, leads to a state explosion that systematically hinders the process of collecting training data. Higher-level activities like "making tea", "printing a paper" or "holding a meeting" typically consist of more than five steps which can – at least partially – vary in execution order or even be omitted. Under these circumstances, gathering or even annotating training data is in general very complex and expensive. Furthermore, the training data is less valuable because it is much more difficult to generalise to unseen instances.
Another desirable feature of an activity recognition system is that the system should
work in an ad-hoc fashion and exhibit sensible behaviour right from the start. It is not
possible to gather training data for every ad-hoc session.
The aim of this paper is to briefly present earlier research and explore three formalisms of human behaviour modelling with respect to their suitability to express different high-level activities, thus automatically turning our explicit knowledge into a probabilistic framework for inference tasks.
Our work is based on the following three hypotheses:
i) Probabilistic generative models are needed: To cope with noisy data, we need some kind of probabilistic system, because crisp systems usually fail in such domains. One goal of activity recognition is to provide pro-active assistance to the user. For this, we need generative models which allow forecasts of future activities, which can then be supported by the system. Therefore, we need probabilistic and in particular also generative models.
ii) Activity recognition systems should not rely on a training phase: We believe that an
activity recognition system should work right from the start. I.e., we should not have
to train it before being able to use it. Nonetheless, it should profit from available training data. As argued above, in most interesting situations the collection of training data is quite expensive and sometimes even infeasible, because there are just too many different situations the system is supposed to work in.
iii) Available (symbolic) background knowledge can and should be used: To create systems with the desired properties, we need to use available background knowledge to
synthesise such probabilistic models.
Because different modelling paradigms have individual strengths and weaknesses in terms of describing sequences and hierarchical structures of activities, we discuss three different approaches in this paper. After introducing the preliminaries, we discuss how to convert process-based hierarchical approaches based on task models (Section 1.4.1) and grammars (Section 1.4.3), and one causal approach based on preconditions and effects (Section 1.4.2). To leverage the individual strengths of each formal description, we also discuss a first approach to combine those approaches into a single model. In Section 1.5 we discuss the advantages and disadvantages of the approaches and conclude this paper with an outlook on future research avenues.
We are using hidden Markov models as the target formalism, because they are simple to
illustrate and yet powerful enough to abstract every realistic (finite) problem. Furthermore,
powerful and efficient inference mechanisms for HMMs are well known.
Even though it is clear that propositional symbolic descriptions can be transformed
into a hidden Markov model, a concise treatment of such approaches is missing in the
literature. Here we make a first attempt by discussing the transformation of different formal
descriptions into HMMs utilising the same notation and thus provide a starting point for a
further unification of such approaches.
1.2. Related Work
Among the most mature approaches to activity recognition are logic-based ones that try to infer the current activity based on logical explanations for the observed actions. Kautz et al. introduced a formal theory on "plan recognition".14 However, as also argued by Charniak and Goldman,15 in the real world sensor data is inherently uncertain and therefore a probabilistic framework is needed. This made Bayesian inference methods very popular for activity and context recognition.10,11,16–18 Many approaches use dynamic Bayesian networks (DBNs) and especially hidden Markov models (HMMs)16,19,20 as the most simple DBN to infer the current activity of the user. Numerous effective inference and learning algorithms exist for HMMs and we can map every realistic finite problem onto them.
Many recent activity recognition systems21,22 employ discriminative approaches because of their more efficient use of training data. E.g., in23,24 it is argued that discriminative
models achieve a lower asymptotic error bound, but that generative models achieve their
(typically higher) error bound faster. That is, with little training data, a generative approach
(built on prior knowledge) is able to provide better estimates. If a generative model does
correctly reflect the dynamics of the system, it achieves a better performance with limited
or no training data.
Different approaches and techniques (e.g. bootstrapping,25,26 cross-domain activity recognition27) are under research to minimise or omit the need for training data. E.g., the Proact system16 data-mined structured action descriptions from the Internet. In later publications they extended the data-mining basis to also include knowledge databases.28,29 We complement and extend this approach by taking (different) formal descriptions of human behaviour and turning these into probabilistic generative models usable for activity recognition. However, in our work we seek to construct generative models in order to predict future behaviour.
1.3. Preliminaries
We now discuss some related approaches and introduce some preliminary concepts important for the sequel. In particular, we discuss hidden Markov models (HMMs), the collaborative task modelling language (CTML), the planning domain definition language (PDDL)
and probabilistic context-free grammars (PCFGs) to some extent.
1.3.1. Hidden Markov Models
Inferring activities from noisy and ambiguous sensor data is a problem akin to tasks such as
estimating the position of vehicles from velocity and direction information, or estimating
a verbal utterance from speech sounds. These tasks are typically tackled by employing
probabilistic inference methods.
In a probabilistic framework, the basic inference problem is to compute the distribution function p(X_t | Y_{1:t} = y_{1:t}); the probability distribution of the state random variable X_t given a specific realisation of the sequence of observation random variables Y_{1:t}. The probability of having a specific state x_t is then given by p(x_t | y_{1:t}).
In a model-based setting, we assume that our model specifies the state space X of the system and provides information on the system dynamics: It will make a statement about which future state x_{t+1} ∈ X the system will be in, if it is currently in state x_t ∈ X. In a deterministic setting, the system dynamics will be a function x_{t+1} = f(x_t). However, many systems (e.g. humans) act in a non-deterministic fashion; therefore, the model generally provides a probability for reaching a state x_{t+1} given a state x_t, denoted by p(x_{t+1} | x_t). This is the system model. The system model is first-order Markovian: the current state depends only on the previous state and no other states.
Furthermore, we typically cannot directly observe the current state of the system; instead, we have sensors that make observations Y. These observations y_t will depend on the current system state x_t, again non-deterministically in general, giving the distribution p(y_t | x_t) as observation model. Let us consider a small example:
 
Example 1.1. Figure 1.1 shows a graphical representation of the HMM ⟨S, π, T, τ, O, P⟩ with S = {One, Two}, π(One) = π(Two) = 0.5, T = S × S with τ((One, One)) = τ((Two, Two)) = 0.9 and τ((One, Two)) = τ((Two, One)) = 0.1, using O = ℝ, P(One, i) = N_(−1,2)(i) and P(Two, i) = N_(1,1)(i). (Throughout the paper we use N_(µ,σ) to denote the normal distribution with mean µ and standard deviation σ.)

Fig. 1.1. A graphical representation of the transition and the observation model of the HMM from Ex. 1.1.
We can write this example more formally:
Definition 1.1 (HMM). A hidden Markov model is a tuple ⟨S, π, T, τ, O, P⟩ with S being a set of states and π assigning an initial probability to these states with ∑_{s∈S} π(s) = 1, T ⊆ S × S being the state transition relation and τ : T → ℝ mapping transitions to probabilities with ∑_{s′:(s,s′)∈T} τ((s, s′)) = 1 for all s ∈ S, O being a set of possible observations and P : S × O → ℝ mapping states and observations to probabilities with ∑_{o∈O} P(s, o) = 1 for all s ∈ S.
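To make Definition 1.1 and Example 1.1 concrete, the following minimal Python sketch encodes the two-state HMM and performs a few steps of Bayesian filtering (the inference task introduced above). The dictionary-based representation and the use of SciPy's Gaussian densities are our own illustrative assumptions, not part of the original formalisation.

```python
# A minimal sketch (not the authors' implementation): the HMM of Example 1.1
# as plain Python dictionaries, with SciPy providing the densities N_(mu, sigma).
from scipy.stats import norm

states = ["One", "Two"]
prior  = {"One": 0.5, "Two": 0.5}                        # pi
trans  = {("One", "One"): 0.9, ("One", "Two"): 0.1,      # tau
          ("Two", "One"): 0.1, ("Two", "Two"): 0.9}
obs    = {"One": norm(-1, 2), "Two": norm(1, 1)}         # P(s, .) as densities

def filter_step(belief, y):
    """One filtering step: predict with tau, then correct with the observation y."""
    predicted = {s2: sum(belief[s1] * trans[(s1, s2)] for s1 in states)
                 for s2 in states}
    unnorm = {s: predicted[s] * obs[s].pdf(y) for s in states}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

belief = dict(prior)
for y in [0.8, 1.3, -0.2]:       # a hypothetical observation sequence
    belief = filter_step(belief, y)
print(belief)                    # approximates p(X_t | y_1:t)
```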
The states and the corresponding state-transition model describe the sub-steps of an activity and the order in which they have to be performed. In the literature there exist many different paradigms to model human behaviour. However, few have been applied to describe human behaviour with respect to activity recognition. In this paper, we discuss different approaches to describe human long-term and high-level activities such that those formal descriptions (task models, planning operators and grammars) can be used to initialise HMMs.
1.3.2. Planning Problem
Automated planning is a branch of artificial intelligence that concerns the realisation of
strategies or action sequences, typically for execution by intelligent agents, autonomous
robots and unmanned vehicles. In the STRIPS formalism30 actions are described using
preconditions and effects. A state of the world is described using a set of ground propositions, which are assumed to be true in the current state. A precondition is an arbitrary
function-free first order logic sentence. The effects are imposed on the current world-state.
Problems are defined with respect to a domain (a set of actions) and specify an initial situation and the goal condition which the planner should achieve. All predicates which are
not explicitly said to be true in the initial conditions are assumed to be false (closed-world
assumption).
Example 1.2. Consider a planning problem with three simple actions a, b, and c with effects p, q, and r, respectively. We can define this problem as I = {p, q, r}, I_s = {}, G = {p, q, r}, A = {a, b, c}, pre(a) = pre(b) = {}, pre(c) = {p, q}, eff⁺(a) = {p}, eff⁺(b) = {q}, eff⁺(c) = {r}, eff⁻(a) = eff⁻(b) = eff⁻(c) = {}. Starting from an empty initial world state we are looking for a state in which every operator has been applied at least once. The corresponding state graph is shown in Figure 1.2.
Fig. 1.2. Transition graph for the planning problem from Example 1.2.
Definition 1.2 (Planning Problem). We define a planning problem formally as a tuple ⟨I, I_s, G, A, pre, eff⁺, eff⁻⟩ with I being the set of all ground propositions in the domain, I_s ⊆ I being the initial state of the world, G ⊆ I being the set of propositions that must hold in the goal state, A being a set of actions with pre : A → P(I) mapping actions to preconditions and eff⁺, eff⁻ : A → P(I) mapping actions to positive and negative effects, respectively.
The purpose of a planner is to build up a valid sequence of actions that changes the world from the initial state I_s to a state where the goal condition G is true, as described by the problem description. By describing each action with preconditions and effects, this modelling approach describes the resulting valid processes implicitly, i.e. in a bottom-up fashion. The automatic emergence of all valid process sequences is a very useful property for activity recognition.
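As a small illustration of Definition 1.2, the planning problem of Example 1.2 can be written down directly as data; applying an action then reduces to a single set operation. The representation below (frozensets as world states) is a sketch of our own choosing, not the PDDL syntax mentioned later.

```python
# A minimal sketch: the planning problem of Example 1.2 as plain Python data.
problem = {
    "I":  {"p", "q", "r"},                # all ground propositions
    "Is": frozenset(),                    # empty initial world state
    "G":  {"p", "q", "r"},                # goal condition
    "A":  {"a": {"pre": set(),        "eff+": {"p"}, "eff-": set()},
           "b": {"pre": set(),        "eff+": {"q"}, "eff-": set()},
           "c": {"pre": {"p", "q"},   "eff+": {"r"}, "eff-": set()}},
}

def apply_action(state, action):
    """Return the successor world state, or None if the precondition fails."""
    if not action["pre"] <= state:
        return None
    return frozenset((state - action["eff-"]) | action["eff+"])

s1 = apply_action(problem["Is"], problem["A"]["a"])   # {} --a--> {p}
s2 = apply_action(s1, problem["A"]["b"])              # {p} --b--> {p, q}
s3 = apply_action(s2, problem["A"]["c"])              # {p, q} --c--> {p, q, r}
print(problem["G"] <= s3)                             # True: goal reached
```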
1.3.3. Task Models
In Human Computer Interaction (HCI), task analysis and modelling is a commonly accepted method to elicit requirements when developing a software system. This also applies to smart environments, but due to the complexity of such an environment the modelling task is quite challenging. Different task-based specification techniques exist (e.g. HTA, TAG, TKS, CTT; for an overview we refer to31), but none incorporates the complexity involved in such a scenario. According to task analysis, humans tend to decompose complex tasks into simpler ones until an atomic unit, the action, has been reached. Moreover, tasks are not only performed sequentially; decision making and interleaved task performance are also common. Therefore the basic concepts incorporated by most modelling languages are hierarchical decomposition and temporal operators restricting the potential execution order of tasks (e.g. sequential, concurrent or disjunctive execution).
An appropriate modelling notation, accompanied by a tool, supports the user in several ways. First, task-based specifications have the asset of being understandable to stakeholders; thus modelling results can be discussed and revised based on feedback. Second, task models can be animated, which further fosters understandability and offers the opportunity to generate early prototypes. Last but not least, task-based specifications can also be used in design stages to derive lower-level models such as HMMs.32 Task models are usually understood as descriptions of user interaction with a software system. They have mostly been applied to model-based UI development even though the application areas are versatile. For intention recognition, task modelling can be employed to specify knowledge about the behaviour of users in an understandable manner which may be inspected at a very early stage.
CTML (the Collaborative Task Modelling Language) is a task modelling language dedicated to smart environments which incorporates cooperation aspects as well as location modelling,33 device modelling, team modelling and domain modelling.34 It is a high-level approach starting with role-based task specifications defining the stereotypical behaviour of actors in the environment. The interplay of tasks and the environment is defined by preconditions and effects. A precondition defines a required state which is needed to execute a task. An effect denotes a state change of the model through the performance of the corresponding task. Therefore CTML specifications allow for specifying complex dependencies between task models and other relevant models (e.g. domain model, location model) via preconditions and/or effects. Here, we focus on a subset of CTML only, called Custom CTML (CCTML).
Example 1.3. To clarify the intuition behind task models, we give a brief example of a CCTML specification. It will not only highlight the basic concepts of CCTML but will also serve as foundation for a more elaborate example in Section 1.4.1. The example is given in Figure 1.3. It specifies, in a very basic manner, how a presenter may give a talk. More precisely, it defines that a sequence of the actions Introduce, Configure Equipment, Start Presentation, Show Slides and Leave Room constitutes a successful presentation. With respect to the formal definitions below we can formalise the example as follows (instead of using the task names we use the prefixed numbers of the tasks in Figure 1.3): T = {1., 2., 3., 4., 5.}, γ = (1., 2., 3., 4., 5.), prio(t) = 1, O = {O_1, O_2, O_3}, o(t) = {(1., O_1), (2., O_1), (3., O_2), (4., O_3), (5., O_3)}.
Fig. 1.3. Simplified CCTML model for "Giving a Presentation": 1. Give Talk decomposes into the sequence 2. Introduce >> 3. Configure Equipment >> 4. Start Presentation >> 5. Show Slides >> 6. Leave Room. Please note that O, prio and o have been excluded from the visual representation.
To specify such a model formally, we first introduce the basic task expressions as follows:
Definition 1.3 (Task Expression (TE)). Let T be a set of atomic tasks. Let t1 and t2 be task expressions and λ ∈ T; then the following expressions are also task expressions: t1 [] t2, t1 |=| t2, t1 ||| t2, t1 [> t2, t1 |> t2, t1 >> t2, λ, λ*, [λ], (λ).
A CCTML model is defined by a set of actions, denoting the atomic units of execution, and a task expression γ. We extended the usual notion by introducing the function prio assigning a priority to each action in the CCTML model and the function o assigning observations to actions.
Definition 1.4 (CCTML Model). A CCTML model is defined as a tuple ⟨T, γ, O, prio, o⟩ with T being a set of atomic actions, γ ∈ TE being the task expression, prio : T → ℕ assigning a priority to each action and o : T → O assigning an observation to each atomic action.
Please note that TE defines only binary operators in infix notation. Nested TEs ((t_x [] (t_y [] t_z))) can easily be translated into the n-ary operators which are used in the examples. The task expression TE defines the structure of the CCTML model. It is a recursive definition: each task expression e ∈ TE is defined by the nesting of (an-)other task expression(s) plus an operator until an action has been reached. More precisely, there are binary and unary operators. Additionally, the actions need to be defined in the set of actions (T) of the corresponding CCTML model.
For CTML an interleaving semantics, similar to operational semantics in process algebras,35 is defined via inference rules. Basically, for every temporal operator (such as choice ([]), enabling (>>), etc.) a set of inference rules is declared which enables the derivation of a labelled transition system or an HMM. As the comprehensive definition includes more than 30 rules, only an example for the choice operator is given. Informally, this operator defines that a set of tasks is enabled concurrently. A task may be activated, but due to its execution the others become disabled. Thus, it implements an exclusive choice of task execution. Formally, the behaviour of the operator can be defined as follows (t_i ∈ t_1 . . . t_n):
$$\frac{t_i \xrightarrow{act} }{[]\,(t_1, t_2 \ldots t_n) \xrightarrow{act} } \quad (1.1) \qquad\qquad \frac{t_i \xrightarrow{act} t_i'}{[]\,(t_1, t_2 \ldots t_n) \xrightarrow{act} t_i'} \quad (1.2)$$
Taking the second rule, we explain how such a rule is to be read. Given that we may transit from t_i to t_i′ by executing act, then a choice expression can be translated to t_i′ by executing act, with t_i ∈ t_1 . . . t_n. The first rule declares that if an execution of act in t_i leads to a successful termination, then a choice expression may also terminate successfully by executing act, with t_i ∈ t_1 . . . t_n. This rule is applied if t_i is an atomic unit, a so-called action. Further reading about the inference rules for task modelling can be found in36.
The definition of a CCTML model is based on task names and a complex expression which specifies the structure of the model. The transformation of a CCTML model into a state transition system via the inference rules always terminates, as expressions are simplified until no applicable rule remains. Picking up the example above, a choice expression is transformed into a simple task expression (more precisely, a state representing [] (t_1, t_2 . . . t_n) into a state representing t_i′) which is in turn further simplified. Therefore, eventually the state representing the empty task expression is created. This expression cannot be further simplified.
1.3.4. Probabilistic Context-Free Grammars
Probabilistic context free grammars (PCFGs) are usually applied in speech recognition
problems. They extend the notion of context free grammars by assigning probabilities to
rules. For a general introduction we refer to37. Valid words of the language are derived by replacing non-terminals according to the rules, starting with the start symbol. Using the
probabilities attached to the rules, we can compute the overall probability that a word has
been generated using a given grammar.
Example 1.4. As a running example we use the following PCFG with the set of terminal symbols T = {indoor, walk, stop, carStop, carSlow, carFast}, the non-terminals N = {Day, Trans, Car}, N_1 = Day, and

R = {(r_1 := 1.0 : Day → Trans, indoor, Trans),
     (r_2 := 0.3 : Trans → walk),
     (r_3 := 0.7 : Trans → carStop, Car, carStop),
     (r_4 := 0.2 : Car → carSlow, carFast, carSlow),
     (r_5 := 0.8 : Car → carSlow, carFast, carSlow, Car)}

in which every rule (r_1, . . . , r_5) is annotated with its probability π. A valid sequence of actions is walk, indoor, walk. Another valid sequence, in which the first walk has been exchanged, is carStop, carSlow, carFast, carSlow, carStop, indoor, walk.
Definition 1.5 (PCFG). A probabilistic context-free grammar is defined as a quintuple ⟨T, N, N_1, R, π⟩, with T being a set of terminal symbols, N being a set of non-terminals, N_1 ∈ N being the start symbol, R (rewrite rules) being a relation between non-terminals and words ζ ∈ (N ∪ T)*, such that for every non-terminal n ∈ N there is at least one rule (n, ζ) ∈ R, and π : R → ℝ assigning probabilities to rules such that for all n ∈ N we find ∑_{(n,ζ)∈R} π((n, ζ)) = 1.
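Since a PCFG is a generative model, the definition can be illustrated by sampling valid words from the grammar of Example 1.4. The dict-based rule table below is our own illustrative encoding; the left-to-right expansion of non-terminals is the standard derivation process described above.

```python
# A minimal sketch: sampling words from the PCFG of Example 1.4.
import random

rules = {   # non-terminal -> list of (probability, body)
    "Day":   [(1.0, ["Trans", "indoor", "Trans"])],
    "Trans": [(0.3, ["walk"]),
              (0.7, ["carStop", "Car", "carStop"])],
    "Car":   [(0.2, ["carSlow", "carFast", "carSlow"]),
              (0.8, ["carSlow", "carFast", "carSlow", "Car"])],
}

def sample_word(symbol="Day"):
    """Recursively replace non-terminals according to the rule probabilities."""
    if symbol not in rules:                    # terminal symbol: emit it
        return [symbol]
    probs, bodies = zip(*rules[symbol])
    body = random.choices(bodies, weights=probs)[0]
    return [t for s in body for t in sample_word(s)]

print(sample_word())   # e.g. ['walk', 'indoor', 'carStop', 'carSlow', ...]
```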
1.4. Synthesising Probabilistic Models
The following three sections show different approaches for the construction of hidden
Markov models based on planning operators, task models and grammars, respectively. Afterwards, we discuss a first approach to combine the different modelling approaches to
produce a single joint HMM.
1.4.1. From Task Models to Hidden Markov Models
Using the definitions presented in Section 1.3, we define the corresponding HMM for a given CCTML model as follows:
Definition 1.6. Let ⟨T, γ, O, prio, o⟩ be a CCTML model. We define the corresponding HMM ⟨S, π, T′, τ, O, P⟩ as follows:

• $S = \{\gamma\} \cup \{e' \mid e \in S,\ e \xrightarrow{act} e'\}$
• $\pi(s) = \begin{cases} 1 & \text{if } s = \gamma \\ 0 & \text{otherwise} \end{cases}$
• $T' = \{(t_1, t_2) \mid t_1, t_2 \in S,\ t_1 \xrightarrow{act} t_2\}$
• $\tau(t_i, t_j) = \dfrac{prio(act)}{\sum_{(t_i,t) \in T' \text{ and } t_i \xrightarrow{act'} t} prio(act')}$ with $t_i \xrightarrow{act} t_j$
• $P(t_i, o) = \dfrac{1}{|\{o(act) \mid (t, t_i) \in T' \text{ and } t \xrightarrow{act} t_i\}|}$
For a given CCTML model we derive the HMM by extracting all potential states from the CCTML model via the inference rules. The process starts with the task expression γ, and all applicable inference rules are fired. The resulting task expressions are added to the set S. This process is continued until the empty task expression is derived. Therefore the set of states of the HMM consists of all potential states of the CCTML model. Task models clearly specify the initial state by their structure; therefore the initial probability of γ is 1 and that of all other states is 0. In the same vein as the states, the transitions are derived: we define a transition in the HMM for each pair of states which can be reached by executing an action. The transition probabilities are calculated by means of the function prio. As a transition in the HMM coincides with the execution of an action in the task model, transition probabilities are calculated as the ratio of the priority of the task under execution and the sum of the priorities of all potential task executions. The function o(t) assigns an observation to an action. The probability of the occurrence of a certain observation in a state is uniformly distributed over the observations assigned to the incoming actions.
Fig. 1.4. CCTML model for "Giving a Presentation" (tasks: 1. Give Talk, 2. Introduce, 3. Configure Equipment, 4. Start Presentation, 5. Next Slide, 6. End Presentation, 7. Leave Room, 8. Connect Laptop & Projector, 9. Set to Presentation, 10. Show Next Slide, 11. Explain Slide).
Example 1.5. Let us examine the transformation by the example depicted in Figure 1.4. Nodes denote tasks, whereas edges either represent hierarchical task decomposition (vertical) or temporal operators (horizontal). The model specifies how a presenter may give a presentation. First the presenter introduces herself, then the presenter configures her equipment by connecting the laptop to the projector and setting her laptop to presentation mode. Note that these two actions may be executed in arbitrary order (denoted by the order-independence operator (|=|)). After doing so the presenter starts her talk. The talk itself is performed by presenting slides iteratively (denoted by the unary iteration operator (*)). After ending the presentation, which aborts the iteration (due to the disabling operator ([>)), the presenter leaves the room.
Using the prefixed numbers, the task model can be paraphrased by the formal task expression term: >> (2., |=|(8., 9.), 4., [> (>> (10., 11.)*, 6.), 7.). Please note that there are also modelling elements which have no visual counterpart (e.g. the observations O). The same applies to the prio function, which assigns each task the priority 1 except for the following: (8., 4), (10., 8), (6., 2). The first number denotes the task whereas the second denotes the assigned priority. Thus we have all information to construct the HMM.
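Before looking at the resulting HMM, a minimal sketch makes the priority-based transition probabilities of Definition 1.6 concrete: after Introduce (task 2.), tasks 8. (prio 4) and 9. (prio 1) are enabled, and their priorities are normalised. The dictionary representation below is our own illustrative assumption.

```python
# A minimal sketch of tau from Definition 1.6: the priorities of the actions
# enabled in a state are normalised to transition probabilities.
prio = {"8.": 4, "9.": 1}   # priorities from Example 1.5

def transition_probs(enabled):
    """Normalise the priorities of the enabled actions to probabilities."""
    total = sum(prio[a] for a in enabled)
    return {a: prio[a] / total for a in enabled}

print(transition_probs(["8.", "9."]))   # {'8.': 0.8, '9.': 0.2}, cf. Figure 1.5
```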
Fig. 1.5. Corresponding HMM to Figure 1.4.
The resulting HMM is depicted in Figure 1.5. Again, not all modelling elements are visualised (the observations (O) and the initial probability (π) are hidden). Nodes represent elements of S, whereas edges define state transitions (T). Moreover, the labels of the transitions mark the transition probabilities (τ) of the corresponding transitions. Labels inside nodes denote the set of tasks executed within the state (again, for reasons of brevity we use the prefixed numbers).
1.4.2. From Planning Problems to Hidden Markov Models
Given a planning problem as defined in the preliminaries, we first create the directed acyclic graph (and therefore the HMM transition model) that contains all operator sequences that reach a state where the goal condition G is true. Every vertex corresponds to a state of the world and the edges are annotated with actions. As humans normally behave situation-driven, our implementation performs a forward search, consecutively applying each operator to a world state I_1. If the preconditions of the operator are satisfied, we derive a new world state I_2. We continue until either the goal condition is satisfied or we run out of applicable actions. Afterwards we generate the observations O and the observation model P by applying a mapping from actions and observations to probabilities. This process is defined as follows:
Definition 1.7. Let ⟨I, I_s, G, A, pre, eff⁺, eff⁻⟩ be a planning problem, O be a set of possible observations and P : A × O → ℝ be a mapping from actions and observations to probabilities. Then, we define the corresponding HMM ⟨S, π, T, τ, O, P⟩ as follows:

• $S = \{(I_s)\} \cup \{(s') \mid (s) \in S,\ G \not\subseteq s,\ a \in A,\ pre(a) \subseteq s,\ s' := s \setminus \mathit{eff}^-(a) \cup \mathit{eff}^+(a)\}$
• $T = \{((s), (s')) \mid a \in A \text{ and } s' := s \setminus \mathit{eff}^-(a) \cup \mathit{eff}^+(a)\}$
• $\tau(s_1, s_2) = \dfrac{1}{|\{(s_1, s') \mid (s_1, s') \in T\}|}$
• $\pi((s)) = \begin{cases} p & \text{if } s = I_s \\ 0 & \text{otherwise} \end{cases}$ with $p = \dfrac{1}{|\{(s) \mid (s) \in S \text{ and } s \subseteq I_s\}|}$
• $P((s), o) = P(a, o)$
Example 1.6. Reconsider the planning problem from Example 1.2, the set of observables {o_a, o_b, o_c} and P(a, o_a) = 1, P(b, o_b) = 1, P(c, o_c) = 1. The states, prior probabilities and transitions of the resulting HMM are depicted in Figure 1.6, and we find P(({}, a), o_a) = P(({}, b), o_b) = P(({b}, a), o_a) = P(({a}, b), o_b) = 1.
Fig. 1.6. A graphical representation of the HMM from Example 1.6.
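A minimal sketch of the forward state-space expansion described above (and formalised in Definition 1.7), applied to the planning problem of Example 1.2, could look as follows; the breadth-first strategy and the frozenset encoding are our own illustrative choices, not the authors' implementation.

```python
# A minimal sketch of the forward search that enumerates all world states and
# transitions up to the goal condition (planning problem of Example 1.2).
from collections import deque

actions = {"a": (set(),        {"p"}),   # name -> (preconditions, positive effects)
           "b": (set(),        {"q"}),
           "c": ({"p", "q"},   {"r"})}   # no negative effects in this example
goal, init = {"p", "q", "r"}, frozenset()

def expand(init, actions, goal):
    """Breadth-first expansion; goal states are not expanded further."""
    states, transitions, queue = {init}, [], deque([init])
    while queue:
        s = queue.popleft()
        if goal <= s:
            continue
        for name, (pre, add) in actions.items():
            if pre <= s:
                s2 = frozenset(s | add)
                transitions.append((s, name, s2))
                if s2 not in states:
                    states.add(s2)
                    queue.append(s2)
    return states, transitions

states, transitions = expand(init, actions, goal)
print(len(states), "states,", len(transitions), "transitions")
```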
While the resulting transition model T is already usable, we can enhance it by applying heuristic functions to weight each transition individually and generate transition probabilities that better reflect individual human behaviour. Possible heuristic functions come e.g. from the planning domain38 (plan length, goal-distance) or from knowledge-driven systems.39 We can apply these heuristics as follows:
Salience means explicitly prioritising operators. We can give each operator a certain weight that will be added to all transitions resulting from this operator.
Refractoriness denotes that once an operator has been applied, it should not be applied again. This rule can be modelled in PDDL by introducing history propositions that store whether an operator has been applied.
Recency indicates firing the most recently activated rule first. It can be implemented by calculating the goal-distance of each operator, in order to penalise actions that lead to plans with more plan steps.
Specificity means choosing an operator with the most conditions or the most specific conditions ahead of a more general rule (e.g. based on the number of ground propositions in the effects of an action that are also part of the goal condition).
These different strategies generate different transition probabilities τ for a HMM. We
are currently investigating which rules are more successful in mimicking human behaviour.
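As one concrete reading of the salience heuristic above, the sketch below turns per-operator weights into transition probabilities; the weight values themselves are hypothetical and only serve as an illustration.

```python
# A minimal sketch of salience-weighted transition probabilities: each operator
# carries a weight, and the weights of the applicable operators are normalised.
salience = {"a": 1.0, "b": 1.0, "c": 3.0}   # hypothetical weights, prefer c

def weighted_tau(outgoing):
    """outgoing: list of (action, successor_state) pairs leaving one state."""
    total = sum(salience[a] for a, _ in outgoing)
    return {(a, s2): salience[a] / total for a, s2 in outgoing}

print(weighted_tau([("a", "pq"), ("b", "pq"), ("c", "pqr")]))
# {('a', 'pq'): 0.2, ('b', 'pq'): 0.2, ('c', 'pqr'): 0.6}
```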
The derivation of the observation probabilities in the example is arguably very simple, given a direct action-observation mapping. However, this mapping can be generated by more sophisticated approaches from the related literature. Philipose et al.13 mined these observation probabilities for RFID-based activity recognition from the web. In later publications, Pentney further enhanced this approach28 by data-mining and combining additional common-sense databases like Cyc40 and OpenMind.41 This and similar ongoing research complements our approach of automatically synthesising probabilistic models for activity recognition.
1.4.3. From Probabilistic Context-Free Grammars to Hidden Markov Models
As already argued above, humans tend to describe complex actions in a hierarchical manner. One approach to describe complex tasks has been discussed in Section 1.4.1. Here we pursue a second approach by employing probabilistic context-free grammars (PCFGs). Many users find writing grammar rules more intuitive than, e.g., a causal language description. PCFGs are also used in the intention recognition literature by Kiefer et al.42
Our approach to construct HMMs from probabilistic context-free grammars is as follows: We first construct a so-called hierarchical hidden Markov model which captures the
structure of a given grammar and then we flatten this hierarchical model to obtain a normal
HMM. Obviously one could also use the hierarchical model for inferences, but as already
argued we would like to obtain a plain hidden Markov model. Therefore, we convert the
hierarchical model into a flat version.
But before diving into the technical details of the transformation, we introduce extended PCFGs and hierarchical hidden Markov models formally. An extended PCFG (EPCFG) extends a PCFG by adding the set of possible observations and the observation probabilities as follows:
Definition 1.8 (EPCFG). Let ⟨T, N, N_1, R, π⟩ be a PCFG as defined above, let I be a set of input symbols and P : T × I → ℝ be a mapping from terminal symbols and input symbols to observation probabilities, such that for all t ∈ T we find ∑_{i∈I} P(t, i) = 1. Then we call ⟨T, N, N_1, R, π, I, P⟩ an extended PCFG.
To illustrate the concept, we use the following simple example in which a working day
is modelled. The task was to infer the current activity based on the speed obtained from a
GPS-sensor.
Example 1.7. Reconsider the grammar shown in Example 1.4. Using the current speed as input, we have I = ℝ and we define the observation probabilities as follows:

P_indoor = N_(2,3),  P_walk = N_(4,3),  P_carSlow = N_(25,15),  P_carFast = N_(50,20),  P_carStop = N_(0,2)

The observation probabilities shown above define the probability of certain activities with respect to the current speed of the user.
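A minimal sketch of this observation model, assuming SciPy's Gaussian densities: given a GPS speed reading, each terminal activity is assigned a likelihood. The concrete speed value is hypothetical.

```python
# A minimal sketch of the observation model of Example 1.7: each terminal
# activity emits the current GPS speed according to a Gaussian N_(mu, sigma).
from scipy.stats import norm

obs_model = {
    "indoor":  norm(2, 3),
    "walk":    norm(4, 3),
    "carSlow": norm(25, 15),
    "carFast": norm(50, 20),
    "carStop": norm(0, 2),
}

speed = 30.0   # hypothetical GPS speed reading
likelihoods = {act: dist.pdf(speed) for act, dist in obs_model.items()}
print(max(likelihoods, key=likelihoods.get))   # 'carSlow' is most likely here
```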
Hierarchical hidden Markov models43 extend the usual notion of an HMM by allowing the definition of sub-models c ∈ C. Those sub-models can be activated by so-called internal states. This call relation is captured in the function J, with J(c, i) = c′ stating that state i in sub-model c activates sub-model c′; for non-internal states we define J(c, i) = ε. A given sub-model can be left in any state, with ξ(c, i) being the probability of sub-model c terminating in state i. To simplify the notation below, we use natural numbers to denote the states within a sub-model of an HHMM.
Definition 1.9 (HHMM). A hierarchical hidden Markov model is defined as a tuple ⟨C, C_1, | · |, π, τ, ξ, J, O, P⟩ with C being a set of model names and C_1 ∈ C being the start model, | · | : C → ℕ defining the size of a given sub-model, π : C × ℕ → ℝ being a prior probability function with π(c, i) being the probability that model c starts in state i and for all c ∈ C we find ∑_{i=0}^{|c|} π(c, i) = 1, τ : C × ℕ × ℕ → ℝ defining the transition probabilities between two states within a given model with ∑_{j=0}^{|c|} τ(c, i, j) = 1 for all c and 0 ≤ i < |c|, ξ : C × ℕ → ℝ defining the exit probability, J : C × ℕ → C ∪ {ε} being a function from states to other sub-models indicating sub-model calls, O being a set of input symbols and P : C × ℕ × O → ℝ being a function from sub-model states and inputs to observation probabilities.
Please note that we use the model name as an index rather than a parameter, e.g., we write π_c(i) instead of π(c, i).
Fig. 1.7. A graphical representation of the HHMM from Example 1.8 (sub-models Day, Trans and Car). Prior probabilities are depicted by an in-going arrow, exit probabilities using an outgoing arrow, and calls to a sub-model using a dashed line. All unlabelled links are weighted with 1.0.
Example 1.8. Consider the HHMM ⟨C, C_1, | · |, π, τ, ξ, J, O, P⟩ with C = {D, T, C}, C_1 = D, |D| = 3, |T| = 4 and |C| = 7, π_D(0) = 1, π_T(0) = 0.3, π_T(1) = 0.7, π_C(0) = 0.2, π_C(3) = 0.8 and π_c(i) = 0 otherwise, ξ_D(2) = 1, ξ_T(0) = 1, ξ_T(3) = 1, ξ_C(2) = 1, ξ_C(6) = 1 and ξ_c(i) = 0 otherwise, J_D(0) = T, J_D(2) = T, J_T(2) = C and J_C(6) = C. The transition probabilities are shown as arrows in Figure 1.7, which contains a graphical representation of this HHMM. Every arrow represents a transition with non-zero probability. Please note that this example defines the structure of the HHMM only and neither the input symbols nor the observation probabilities are depicted here.
1.4.3.1. From PCFGs to HHMMs
As a first step in constructing an HMM, we construct an HHMM corresponding to a given EPCFG, where corresponding informally means "doing the same". For the moment, we consider acyclic PCFGs only. After presenting the definition and discussing the underlying intuition, we extend our transformation to PCFGs allowing for cycles of length 1, i.e., references from a rule back to its own non-terminal.
For convenience, we assume the set of rules to be ordered and define the following function ι : N × R × ℕ → ℕ mapping a given non-terminal, a rule and the position of some symbol within the body of the rule to a global offset with respect to the non-terminal. Considering the PCFG from Example 1.7, we find ι(Day, r_1, 1) = 1 and ι(Car, r_5, 2) = 5.
Definition 1.10. Let ⟨T, N, N_1, R, π, I, P⟩ be an extended PCFG. We define the corresponding hierarchical hidden Markov model ⟨C, C_1, | · |, π′, τ, ξ, J, O′, P′⟩ as follows:

• $C := N$, $C_1 := N_1$, $|n| := \sum_{(n \to \zeta) \in R} |\zeta|$ and $O' := I$
• $\pi'_c(i) := \begin{cases} \pi(r) & \text{if } i = \iota(c, r, 1) \\ 0 & \text{otherwise} \end{cases}$
• $\tau_c(i, j) := \begin{cases} 1 & \text{if } i = \iota(c, (c \to \zeta), i'),\ j = i + 1 \text{ and } i' < |\zeta| \\ 0 & \text{otherwise} \end{cases}$
• $\xi_c(i) := \begin{cases} 1 & \text{if } i = \iota(c, (c \to \zeta), i') \text{ and } i' = |\zeta| \\ 0 & \text{otherwise} \end{cases}$
• $J_c(i) := c'$ iff $i = \iota(c, (c \to \zeta), i')$, $\zeta[i'] = c'$ and $c' \in N$
• $P'_c(i, s) := P(t, s)$ iff $i = \iota(c, (c \to \zeta), i')$, $\zeta[i'] = t$ and $t \in T$
For a given grammar, we construct the corresponding HHMM by inserting a sub-model for every non-terminal. For every occurrence of a (non-)terminal symbol within a rule for the non-terminal we insert a state into the corresponding sub-model. E.g., for the non-terminal Day from the example above, we create a sub-model containing 3 states.
In our running example, we use a cycle of length 1 to reference sub-model C from state
C6 . Those cycles can be handled by changing the transition model of the corresponding
sub-state. Without discussing the technical details, the result is shown in the following
example.
Example 1.9. Reconsidering the EPCFG from Example 1.7 we find the corresponding
HHMM to be the one discussed in Example 1.8.
1.4.3.2. Flattening HHMMs
After constructing a hierarchical hidden Markov model from a given PCFG, we have to flatten this model into a “normal” HMM. Considering the procedural semantics of a HHMM,
we find that the model’s global state can be described using the current call-stack, which is
represented as a list of numbers. E.g., in the HHMM from Figure 1.7, the state 0 of the start
model Day calls sub-model Trans, being there in state 2 is represented as call stack [0, 2].
Please note furthermore, that for a given stack [s1 , . . . , sn ], the stack context [1s , . . . , sn−1 ]
completely specifies the sub-model of state sn . Therefore, we can use stack contexts to refer to sub-models in the definitions below. We use the notation [s1:n ] to refer to [s1 , . . . , sn ].
Before defining a flattened HMM, we compute the set of reachable stacks as follows:

Definition 1.11 (Stacks). Let ⟨C, C_1, | · |, π, τ, ξ, J, O, P⟩ be a given HHMM. We define the set of reachable stacks as follows:

$$\begin{aligned}
\mathcal{S} = {} & \{[i] \mid 0 \le i < |C_1| \text{ and } \pi_{C_1}(i) > 0\}\ \cup \\
& \{[s_{1:n-1}, i] \mid [s_{1:n}] \in \mathcal{S},\ 0 \le i < |[s_{1:n-1}]| \text{ and } \tau_{[s_{1:n-1}]}(s_n, i) > 0\}\ \cup \\
& \{[s_{1:n}, i] \mid [s_{1:n}] \in \mathcal{S},\ J_{[s_{1:n-1}]}(s_n) = c,\ 0 \le i < |c| \text{ and } \pi_c(i) > 0\}
\end{aligned}$$
This definition collects all reachable stacks by simply traversing the HHMM. To obtain
a flattened HMM, we construct a state for every possible stack ending with a non-internal
state, and afterwards, we compute the transition probabilities between different stacks.
Definition 1.12. Let ⟨C, C_1, | · |, π, τ, ξ, J, O, P⟩ be a given HHMM and $\mathcal{S}$ be the set of stacks. We define the corresponding HMM ⟨S, π, T, τ, O, P⟩ as follows:

• $S := \{[s_{1:n}] \mid [s_{1:n}] \in \mathcal{S} \text{ and } J_{[s_{1:n-1}]}(s_n) = \varepsilon\}$
• $\pi : S \to \mathbb{R} : s \mapsto \begin{cases} \pi_{C_1}(i) & \text{iff } s = [i] \\ 0 & \text{otherwise} \end{cases}$
• $T := S \times S$
• $\tau : T \to \mathbb{R} : ([s_{1:n}], [r_{1:m}]) \mapsto \sum_{i=1}^{\min(n,m)} p([s_{1:n}], [r_{1:m}], i)$
• $P([s_{1:n}], o) = P_{[s_{1:n-1}]}(s_n, o)$

with $p : \mathcal{S} \times \mathcal{S} \times \mathbb{N} \to \mathbb{R}$ being defined as follows:

$$p([s_{1:n}], [r_{1:m}], i) = \prod_{j=n}^{i} \xi_{[s_{1:j-1}]}(s_j) \cdot (1 - \xi_{[s_{1:i-1}]}(s_i)) \cdot \tau_{[s_{1:i-1}]}(s_i, r_i) \cdot \prod_{j=m}^{i} \pi_{[r_{1:j-1}]}(r_j)$$
The intuition behind the definition of p is as follows: to reach a stack configuration [r_{1:m}] from a stack [s_{1:n}] using a transition on level i, we have to leave all sub-models from level n back to level i ($\prod_{j=n}^{i} \xi_{[s_{1:j-1}]}(s_j)$); afterwards, we multiply by the probability of not leaving sub-model i ($1 - \xi_{[s_{1:i-1}]}(s_i)$) and of transiting to state r_i ($\tau_{[s_{1:i-1}]}(s_i, r_i)$); finally we call all necessary sub-models to reach the desired stack configuration ($\prod_{j=m}^{i} \pi_{[r_{1:j-1}]}(r_j)$).
Example 1.10. Figure 1.8 shows the flattened version of the hierarchical HMM from Example 1.8.
Using the process described above, we can construct a HMM for a given extended
PCFG such that the HMM captures the ideas of the grammar. I.e., all valid sequences of
actions are represented within the HMM and the corresponding observation models are
attached to the states. As mentioned above, we applied this approach in a first simple
scenario, in which we try to infer the users activity based on her current speed. The experiments show that the general temporal structure of a day can easily be captured in a PCFG
and thus be transformed into an equivalent HMM. Here, we did not discuss the duration
of single actions. But instead of using just a single state for a given action, we can use a
sequence of states. But this extension is beyond the scope of this paper.
Fig. 1.8. A graphical representation of the flattened HHMM from Example 1.8. As above, prior probabilities are depicted by an in-going arrow, exit probabilities using an outgoing arrow, and unlabelled links are weighted 1.0.
1.4.4. Joint HMMs
In the previous sections three approaches have been presented which allow us to model human activities in a more convenient manner than is achievable by directly utilising the hidden Markov model (HMM) approach. We have seen that it is feasible to synthesise an HMM based on a description of the domain and problem of the intended scenario in terms of precondition/effect operators, on a set of rules of a probabilistic context-free grammar (PCFG), or on a process algebra described in the CTML language, respectively. In each case we obtain a well-defined hidden Markov model as a result, which allows us to infer human activities in the specific application case.
As mentioned in Section 1.3.1, we can assume that the real-world processes which can be described by one of the previously depicted formalisms, and from which an HMM can subsequently be synthesised, consist of time-sequential activities, each of them beginning and ending at specific points in time and following one after another. Thus, a state s_i ∈ S of the HMM ⟨S, π, T, τ, O, P⟩ can be seen as a specific activity in the underlying process. The transition (i, j) ∈ T connecting the states s_i and s_j can, if τ((i, j)) > 0, hence be seen as the existence of a potential consecutiveness between the two corresponding activities of this process.
Note that how well one of the specified formalisms fits the process to be modelled varies and is heavily scenario-dependent. Consider having gained two or more scenario-complementary hidden Markov models by utilising different synthesising algorithms. One would like to join these models, e.g. by inserting novel inter-HMM transitions, which would introduce additional potential consecutiveness between the corresponding activities. It is thereby evident that the result of this joining operation needs to be a well-defined hidden Markov model itself, preserving the advantages of this probabilistic approach.
Definition 1.13 (JointHMM). Let H be a set of n well-defined hidden Markov models ⟨S_i, π_i, T_i, τ_i, O_i, P_i⟩ with 1 ≤ i ≤ n, S_i ∩ S_j = ∅ for i ≠ j, S = ∪_i S_i and O_i = O_j. Let R be a set of connections with R ⊆ S × S and ρ : R → ℝ be a mapping from inter-model connections to probabilities. We define the joint hidden Markov model ⟨S, π, T, τ, O, P⟩ as follows:

• $\pi : S \to \mathbb{R} : s \mapsto \dfrac{\pi_i(s)}{n}$ with $s \in S_i$
• $T = \bigcup_i T_i \cup R$
• $\tau : S \times S \to \mathbb{R} : (s_1, s_2) \mapsto \begin{cases} \rho((s_1, s_2)) & \text{if } (s_1, s_2) \in R \\ \tau_i(s_1, s_2) \cdot f & \text{with } s_1, s_2 \in S_i \text{ and } f = 1 - \sum_{(s_1, s') \in R} \rho((s_1, s')) \end{cases}$
• $P : S \times O \to \mathbb{R} : (s, o) \mapsto P_i(s, o)$ with $s \in S_i$
Given two or more hidden Markov models compliant with Definition 1.1, as sextuples ⟨S_i, π_i, T_i, τ_i, O_i, P_i⟩, the question arises of how to join these models. In a first step, we unify the original sets of states S_1, S_2, . . . , S_n of the n HMMs to obtain the set of states S of the joint HMM, while assuming them to be pairwise disjoint.
Furthermore, the values of the joint prior probability function π should sum up to 1. As a straightforward approach we divide all the values π_i by the number of original HMMs, n. Incidentally, one could also define (some kind of global) prior weights over the different models and combine these with the priors of each model. We decided not to do so for reasons of simplicity.
The joint set of transitions T can be determined by unifying the particular sets of transitions T_1, T_2, . . . , T_n and R, which is a set of novel inter-HMM transitions. This step also requires the assumption that the subsets T_1, T_2, . . . , T_n, R are pairwise disjoint. The set R can hereby be imagined as a set of rules that allow an intuitive way to fully specify novel inter-HMM transitions. As an example, one of these rules might denote a transition between the states s and s′ of the HMMs A and B, respectively, together with the assignment of the probability p: A(s) →_p B(s′).
The probability value of an inter-HMM transition is herewith interpreted as the chance of exiting a sub-HMM and entering the targeted sub-HMM. The joint transition probability function τ is therefore determined by the original τ_i, except for those transitions that originate from a state s ∈ S_i which is itself the starting point of at least one inter-HMM transition. In such cases, the probabilities of the intra-HMM transitions coming from s have to be normalised with the factor f from Definition 1.13.
Finally, coming to the observation model, for reasons of simplicity we assume the sets of observations O_i of the sub-HMMs to be identical. Thus, the observation probability function P is fully determined by the P_i of the sub-HMMs.
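The joining operation of Definition 1.13 is straightforward to express in code. The following sketch assumes each sub-HMM is given as a pair of prior and transition dictionaries with disjoint state names; observation models are omitted since they are taken over unchanged. The state names echo the example below but are purely illustrative.

```python
# A minimal sketch of Definition 1.13: join n HMMs given as (prior, trans)
# dictionaries, plus inter-HMM connections R with probabilities rho.
def join_hmms(hmms, inter):
    """hmms: list of (prior, trans); inter: dict {(s, s'): rho}."""
    n = len(hmms)
    prior, trans = {}, {}
    for pi_i, tau_i in hmms:
        for s, p in pi_i.items():
            prior[s] = p / n                                  # pi(s) = pi_i(s) / n
        for (s1, s2), p in tau_i.items():
            f = 1.0 - sum(r for (si, _), r in inter.items() if si == s1)
            trans[(s1, s2)] = p * f                           # rescale intra-HMM links
    trans.update(inter)                                       # add inter-HMM links
    return prior, trans

# Hypothetical fragment of Example 1.11: leaving demo HMM D for meeting HMM M.
D = ({"DemoEnding": 1.0},   {("DemoEnding", "DemoEnding"): 1.0})
M = ({"Presentation": 1.0}, {("Presentation", "Presentation"): 1.0})
prior, trans = join_hmms([D, M], {("DemoEnding", "Presentation"): 0.2})
print(trans[("DemoEnding", "DemoEnding")])   # 0.8 after rescaling with f
```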
Example 1.11. Consider now having modelled a meeting scenario held in an intelligent meeting room, e.g. by using PDDL operators, and furthermore having modelled a technical infrastructure demonstration in this room using a set of PCFG rules. After having translated these two model descriptions, we have obtained two disparate hidden Markov models, each recognising a different set of activities. Once we imagine the meeting to be preceded by such a technical demonstration, the question of joining these HMMs arises.
Let M(Presentation) denote the state Presentation of a hidden Markov model M, with M modelling the meeting scenario, and D(DemoEnding) denote a state DemoEnding of HMM D, with D modelling the technical demonstration. We would now like to introduce the existence of a potential transition between these two activities, expressing a potential consecutiveness between the ending of a technical demonstration and a presentation within a project meeting. For this purpose we bring in a novel inter-HMM transition and establish a rule which connects the two states: D(DemoEnding) →_{0.2} M(Presentation).
The probability value 0.2 indicates a chance of 0.2 to leave the sub-HMM D and enter M. Each intra-HMM transition which originates from the state D(DemoEnding) needs to be multiplied by 1 − 0.2 = 0.8. At this point, we have everything we need to join the HMMs D and M according to Definition 1.13.
1.5. Discussion
The underlying idea of all our approaches is to turn a formal description of the user's activity into a probabilistic generative model that is able to recognise the desired activities. Over time, many different formal descriptions for human behaviour have emerged. In this paper we have explored three alternatives: a top-down description, where we have chosen CTML as a representative task modelling language; a bottom-up modelling approach, where we use a causal description with preconditions and effects; and, as a third approach, PCFGs. To leverage the advantages of the individual modelling approaches we combine the simpler HMMs generated by each modelling approach into a joint HMM.
Fig. 1.9. Overview of our model synthesis research: PDDL, CCTML and PCFG descriptions are each synthesised into an HMM (HMM1, HMM2, HMM3), which are then joined into a joint HMM.
In this section we discuss the individual strengths and weaknesses of the different approaches when modelling real-world scenarios, providing some guidelines on when to use a certain modelling paradigm and when not.
1.5.1. Planning operators
“Classical” planning has a long history of searching for solutions in order to efficiently
schedule resources or control the behaviour of robots. We translated this approach to humans in order to describe (simplified) human behaviour. The advantage of the planning
approach is the implicit description of all processes that can occur, clearly an advantage as human behaviour is often interleaved with different activities. As the processes are modelled implicitly, no permutation can be forgotten. Also, the possible interactions between multiple humans emerge automatically, a clear advantage for team scenarios.
Furthermore, it is easy to extend the application domain with new operators, as these can simply be added to the action repository and are automatically inserted by the algorithm at the right positions. Thus devices in a smart environment could describe their possible human interactions with preconditions and effects and become part of the activity recognition process.44
Two main challenges arise from this planning approach. The first is the description of the world-state I: it is difficult to define which information belongs in the current world-state and, therefore, what the preconditions and effects of an action need to take into account. The second challenge is the state explosion, which is much more severe here than in the other two approaches. However, by employing approximate inference we can construct a particle filter where the planning process efficiently describes the state transition process, thus avoiding the need to explore the whole state space a priori.
1.5.2. Task models
Task analysis and modelling has been employed for specifying user actions for decades. It is commonly agreed in HCI that task analysis is an important device for interaction development (which we consider a super-domain of intention recognition). As CTML is rooted in the task modelling approaches of HCI, it inherits the assets of those approaches: understandability, intuitiveness, tool support and embedment into a methodology. These assets are also relevant for intention recognition, especially while creating the model.
Top-down modelling has the advantage of incorporating gradual refinement of models in an intuitive manner. Actions can be further decomposed as necessary. Moreover, as CTML defines a formal semantics, verification algorithms can be employed to assure certain quality criteria (e.g. deadlock freedom). Another advantage of CTML is the opportunity of validating the model by animation. As the semantics are well defined, an interactive animation can be created directly.
However, it is more important how well such a modelling approach works for highly dynamic systems like smart environments at runtime. One has to admit that task modelling is a rather static approach, as task models are created interactively by software designers at design time. Adding new tasks is rather cumbersome, as the task model needs to be adapted (which is not the case using the precondition/effect description formalism). On the other hand, designed models exhibit a higher quality than automatically composed ones.
In contrast to context-free grammars, CTML is able to specify concurrent behaviour. This is a major advantage, as humans usually perform not only one task at a time but mingle tasks in an interleaving manner.
Especially for multi-user specifications, CTML models can become quite complex, which naturally leads to very large HMMs. This is a problem that can be solved with approximate inference by employing a particle filter. The CTML model is intuitively a compact representation of the state transition function. With a particle filter we only keep track of the executed tasks and employ a lazy (on-demand) state creation. The treatment of observations is currently rather basic. More sophisticated approaches need to be developed to create more realistic HMMs, which is one item currently under investigation.
1.5.3. Probabilistic Context-Free Grammars
Probabilistic context-free grammars have been applied quite successfully in the area of
speech recognition. Here we used them to initialise a hidden Markov model for recognising
human behaviour. In first experiments we used the approach to model the overall structure
of a day and to infer the user's activity based on her current speed as computed from a GPS
sensor. Those experiments showed that our approach can indeed be used to model high-level activities and their temporal structure quite intuitively. But they also showed that, e.g.,
interleaving activities are hard to model, whereas the overall structure can be
captured quite naturally. Due to space constraints we did not discuss timing information
here, but it can be integrated into the model using the standard time-expansion of
states: if a certain state is supposed to last for three time steps, it is expanded into a
sequence of three identical states. In a similar fashion, and using self-transitions in all states
of the sequence, it is possible to implement a negative binomial distribution for
the expected time a state should last.
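For completeness, the duration distribution implied by this construction can be stated explicitly (a standard result, sketched here under the assumption that each of the $k$ expanded copies carries the same self-transition probability $q$): the total dwell time $D$ is the sum of $k$ independent geometric waiting times and hence negative binomially distributed,

$$
P(D = d) \;=\; \binom{d-1}{k-1}\,(1-q)^{k}\,q^{\,d-k}, \qquad d = k, k+1, \dots
$$

so that the expected duration $\mathbb{E}[D] = k/(1-q)$ can be tuned via the chain length $k$ and the self-transition probability $q$.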
1.5.4. Joint Hidden Markov Models
The major advantage of the presented HMM-joining algorithm is that it addresses
the need for a formalism to combine a set of hidden Markov models that have (potentially)
been synthesised using differing formalisms.
Therefore we have introduced a simple operator $\xrightarrow{p}$, that allows us to deploy a set of
rules, with each one introducing a novel inter-HMM transition, plus the assignment of a
probability value $p$ which obeys an intuitive semantics: the chance of leaving the current
sub-HMM through this particular transition.
As a result, we gain a well-defined hidden Markov model, incorporating the previously
shown advantages of probabilistic generative models and symbolic high-level description
of complex human activities.
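Purely as an illustration (a sketch under the simplifying assumption that both sub-HMMs share the same observation alphabet, and not necessarily the exact joining algorithm described earlier), a rule $(i, j, p)$ can be realised on the transition matrices by embedding them into a block matrix and rescaling the outgoing distribution of the source state:

```python
import numpy as np

def join_hmms(A1, A2, rules):
    """Join two sub-HMM transition matrices A1 (n1 x n1) and A2 (n2 x n2).

    rules: list of (i, j, p) meaning: add a transition from state i of the
    first sub-HMM to state j of the second one, taken with probability p.
    Returns the (n1 + n2) x (n1 + n2) transition matrix of the joint HMM.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    A = np.zeros((n1 + n2, n1 + n2))
    A[:n1, :n1] = A1
    A[n1:, n1:] = A2
    for i, j, p in rules:
        A[i, :] *= (1.0 - p)        # rescale so row i stays a distribution
        A[i, n1 + j] += p           # new inter-HMM transition with weight p
    return A

# Two toy 2-state sub-HMMs, joined by leaving state 1 of the first
# model towards state 0 of the second one with probability 0.3.
A1 = np.array([[0.7, 0.3], [0.4, 0.6]])
A2 = np.array([[0.9, 0.1], [0.2, 0.8]])
A = join_hmms(A1, A2, [(1, 0, 0.3)])
print(A.sum(axis=1))                # every row still sums to 1
```

Since the observation alphabet is shared, the emission parameters of the two sub-HMMs can simply be stacked, and every row of the joint transition matrix remains a proper probability distribution.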
We are currently investigating how we can weaken the precondition $O_i = O_j$ and
thereby determine the two parameters $O$ and $P$ of the joint HMM in a more general manner, so that the set of observations which can be recognised by the available sensors, and the state-observation probabilities, respectively, are allowed to differ between
sub-HMMs.
1.6. Summary and Outlook
In order to describe hierarchical behaviour easily, we employ high-level description languages to derive probabilistic models for activity recognition. In this paper we have shown
three alternatives, based on a process-based language, a causal language, and a grammar,
for synthesising generative probabilistic models for activity recognition from different high-level modelling paradigms. These different approaches comprise top-down and
bottom-up modelling and thus complement each other.
We have further sketched a method to combine the languages into a joint
HMM, thus leveraging the individual strengths and compensating for the weaknesses of the different modelling
paradigms.
Building on the basics presented in this paper, we seek to construct activity recognition systems for complex domains such as team-activity recognition or the support of a day-care nurse, which comprise many
hierarchical, long-term, intermingled activities and thus lead to very large state spaces that are
very challenging to describe.
References
1. D. J. Cook, J. C. Augusto, and V. R. Jakkula, Ambient intelligence: Technologies, applications, and opportunities, Pervasive and Mobile Computing. 5(4), 277–298 (August, 2009). ISSN
15741192. doi: 10.1016/j.pmcj.2009.04.001. URL http://dx.doi.org/10.1016/j.pmcj.
2009.04.001.
2. G. D. Abowd and E. D. Mynatt, Designing for the human experience in smart environments. pp.
151–174 (September, 2004). doi: 10.1002/047168659X.ch7. URL http://dx.doi.org/10.
1002/047168659X.ch7.
3. G. M. Youngblood and D. J. Cook, Data mining for hierarchical model creation, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 37(4), 561–572
(July, 2007). ISSN 1094-6977. doi: 10.1109/TSMCC.2007.897341. URL http://dx.doi.
org/10.1109/TSMCC.2007.897341.
4. F. Doctor, H. Hagras, and V. Callaghan, A fuzzy embedded agent-based approach for realizing
ambient intelligence in intelligent inhabited environments, IEEE Transactions on Systems, Man,
and Cybernetics, Part A. 35(1), 55–65, (2005). doi: 10.1109/TSMCA.2004.838488. URL http:
//dx.doi.org/10.1109/TSMCA.2004.838488.
5. E. Chávez, R. Ide, and T. Kirste. Samoa: An experimental platform for situation-aware mobile
assistance. In eds. C. H. Cap, W. Erhard, and W. Koch, ARCS,
pp. 231–236. VDE Verlag, (1999). ISBN 3-8007-2482-0. URL http://dblp.uni-trier.de/
rec/bibtex/conf/arcs/ChavezIK99.
6. A. Fox, B. Johanson, P. Hanrahan, and T. Winograd, Integrating information appliances into
an interactive workspace, IEEE Computer Graphics and Applications. 20(3), 54–65 (August,
2000). ISSN 0272-1716. doi: 10.1109/38.844373. URL http://dx.doi.org/10.1109/38.
844373.
7. S. A. Velastin, B. A. Boghossian, B. Ping, L. Lo, J. Sun, and M. A. Vicencio-silva. Prismatica:
Toward ambient intelligence in public transport environments. In Good Practice for the Management and Operation of Town Centre CCTV. European Conf. on Security and Detection, vol. 35,
pp. 164–182, (2005). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.
1.1.110.2113.
8. Z. Chen. Bayesian filtering: From kalman filters to particle filters, and beyond. Technical report, McMaster University, (2003). URL http://math1.unice.fr/~delmoral/chen_bayesian.pdf.
9. L. Liao, D. Fox, and H. Kautz. Location-based activity recognition using relational markov
networks. In IJCAI’05: Proceedings of the 19th international joint conference on Artificial intelligence, pp. 773–778, San Francisco, CA, USA, (2005). Morgan Kaufmann Publishers Inc.
URL http://portal.acm.org/citation.cfm?id=1642293.1642417.
10. D. J. Patterson, L. Liao, K. Gajos, M. Collier, N. Livic, K. Olson, S. Wang, D. Fox, and H. A.
Kautz. Opportunity knocks: A system to provide cognitive assistance with transportation services. In eds. N. Davies, E. D. Mynatt, and I. Siio, Ubicomp, vol. 3205, Lecture Notes in Computer Science, pp. 433–450. Springer, (2004). ISBN 3-540-22955-8. URL http://dblp.uni-trier.de/rec/bibtex/conf/huc/PattersonLGCLOWFK04.
11. D. H. Hu and Q. Yang. Cigar: concurrent and interleaving goal and activity recognition. In AAAI'08: Proceedings of the 23rd national conference on Artificial intelligence, pp. 1363–1368. AAAI Press, (2008). ISBN 978-1-57735-368-3. URL http://portal.acm.org/citation.cfm?id=1620286.
12. A. Hein and T. Kirste, A hybrid approach for recognizing adls and care activities using inertial sensors and rfid. 5615, 178–188, (2009). doi: 10.1007/978-3-642-02710-9_21. URL http://dx.doi.org/10.1007/978-3-642-02710-9_21.
13. M. Perkowitz, M. Philipose, K. Fishkin, and D. J. Patterson. Mining models of human activities from the web. In WWW '04: Proceedings of the 13th international conference on World Wide Web, pp. 573–582, New York, NY, USA, (2004). ACM. ISBN 1-58113-844-X. doi: 10.1145/988672.988750. URL http://dx.doi.org/10.1145/988672.988750.
14. H. A. Kautz. A formal theory of plan recognition and its implementation. In eds. J. F. Allen, H. A. Kautz, R. Pelavin, and J. Tenenberg, Reasoning About Plans, pp. 69–125. Morgan Kaufmann Publishers, San Mateo (CA), USA, (1991). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1583.
15. R. P. Goldman, C. W. Geib, and C. A. Miller, A bayesian model of plan recognition, Artificial Intelligence. 64, 53–79, (1993). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.4744.
16. M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Fox, H. Kautz, and D. Hahnel, Inferring activities from interactions with objects, IEEE Pervasive Computing. 3(4), 50–57 (October, 2004). ISSN 1536-1268. doi: 10.1109/MPRV.2004.7. URL http://dx.doi.org/10.1109/MPRV.2004.7.
17. D. W. Albrecht, I. Zukerman, and A. E. Nicholson, Bayesian models for keyhole plan recognition in an adventure game, User Modeling and User-Adapted Interaction. 8(1), 5–47 (March, 1998). doi: 10.1023/A:1008238218679. URL http://dx.doi.org/10.1023/A:1008238218679.
18. H. H. Bui. A general model for online probabilistic plan recognition. In IJCAI'03: Proceedings of the 18th international joint conference on Artificial intelligence, pp. 1309–1315, San Francisco, CA, USA, (2003). Morgan Kaufmann Publishers Inc. URL http://portal.acm.org/citation.cfm?id=1630846.
19. E. Kim, S. Helal, and D. Cook, Human activity recognition and pattern discovery, IEEE Pervasive Computing. 9(1), 48–53 (January, 2010). ISSN 1536-1268. doi: 10.1109/MPRV.2010.7. URL http://dx.doi.org/10.1109/MPRV.2010.7.
20. G. Singla, D. J. Cook, and M. Schmitter-Edgecombe, Recognizing independent and joint activities among multiple residents in smart environments, Journal of Ambient Intelligence and Humanized Computing. 1(1), 57–63. ISSN 1868-5137. doi: 10.1007/s12652-009-0007-1. URL http://dx.doi.org/10.1007/s12652-009-0007-1.
21. T. Gu, Z. Wu, X. Tao, H. K. Pung, and J. Lu. epsicar: An emerging patterns based approach to sequential, interleaved and concurrent activity recognition. In 2009 IEEE International Conference on Pervasive Computing and Communications, vol. 0, pp. 1–9, Los Alamitos, CA, USA (March, 2009). IEEE. ISBN 978-1-4244-3304-9. doi: 10.1109/PERCOM.2009.4912776. URL http://dx.doi.org/10.1109/PERCOM.2009.4912776.
22. R. Hamid, S. Maddi, A. Johnson, A. Bobick, I. Essa, and C. Isbell, A novel sequence representation for unsupervised analysis of human activities, Artificial Intelligence. 173(14), 1221–1244 (September, 2009). ISSN 00043702. doi: 10.1016/j.artint.2009.05.002. URL http://dx.doi.org/10.1016/j.artint.2009.05.002.
23. A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes, (2001). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.9829.
24. J.-H. Xue and D. M. Titterington, Comment on "On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes", Neural Processing Letters. 28(3), 169–187 (December, 2008). ISSN 1370-4621. doi: 10.1007/s11063-008-9088-7. URL http://dx.doi.org/10.1007/s11063-008-9088-7.
25. S. Dasgupta, M. L. Littman, and D. McAllester. Pac generalization bounds for co-training, (2001). URL http://ttic.uchicago.edu/~dmcallester/cotrain01.ps.
26. A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the em algorithm, Journal of the Royal Statistical Society. Series B (Methodological). 39(1), 1–38, (1977). ISSN 00359246. doi: 10.2307/2984875. URL http://dx.doi.org/10.2307/2984875.
27. V. W. Zheng, D. H. Hu, and Q. Yang. Cross-domain activity recognition. In Ubicomp '09: Proceedings of the 11th international conference on Ubiquitous computing, pp. 61–70, New York, NY, USA, (2009). ACM. ISBN 978-1-60558-431-7. doi: 10.1145/1620545.1620554. URL http://dx.doi.org/10.1145/1620545.1620554.
28. W. Pentney. Large scale use of common sense for activity recognition and analysis. URL http://citeseer.ist.psu.edu/rd/32044135%2C761213%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs2/285/http:zSzzSzwww.cs.washington.eduzSzhomeszSzbillzSzgenerals.pdf/pentney05large.pdf.
29. W. Pentney, M. Philipose, and J. A. Bilmes. Structure learning on large scale common sense statistical models of human state. In eds. D. Fox and C. P. Gomes, AAAI, pp. 1389–1395. AAAI Press, (2008). ISBN 978-1-57735-368-3. URL http://dblp.uni-trier.de/rec/bibtex/conf/aaai/PentneyPB08.
30. R. E. Fikes and N. J. Nilsson, Strips: A new approach to the application of theorem proving to problem solving, Artificial Intelligence. 2(3-4), 189–208, (1971). doi: 10.1016/0004-3702(71)90010-5. URL http://dx.doi.org/10.1016/0004-3702(71)90010-5.
31. Q. Limbourg and J. Vanderdonckt. Comparing task models for user interface design. In eds. D. Diaper and N. Stanton, The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum Associates, (2003).
32. M. Giersich, P. Forbrig, G. Fuchs, T. Kirste, D. Reichart, and H. Schumann, Towards an integrated approach for task modeling and human behavior recognition, Human-Computer Interaction. 4550, 1109–1118, (2007).
33. M. Wurdel. Towards an holistic understanding of tasks, objects and location in collaborative environments. In ed. M. Kurosu, HCI (10), vol. 5619, Lecture Notes in Computer Science, pp. 357–366. Springer, (2009). ISBN 978-3-642-02805-2.
34. M. Wurdel, D. Sinnig, and P. Forbrig, Ctml: Domain and task modeling for collaborative environments, JUCS. 14(Human-Computer Interaction), (2008).
35. A. W. Roscoe, C. A. R. Hoare, and B. Richard, The Theory and Practice of Concurrency. (Prentice Hall PTR, 1997).
36. F. Paterno and C. Santoro. The concurtasktrees notation for task modelling. Technical report, (2001).
37. K. Lari and S. Young, The estimation of stochastic context-free grammars using the inside-outside algorithm, Computer Speech and Language. (4), 35–56, (1990).
38. J. Hoffmann and B. Nebel. The ff planning system: Fast plan generation through heuristic search, (2001). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.673.
39. L. Brownston, Programming expert systems in OPS5: an introduction to rule-based programming. The Addison-Wesley series in artificial intelligence, (Addison-Wesley, 1985). ISBN 978. URL http://www.worldcat.org/isbn/978.
40. D. B. Lenat, Cyc: a large-scale investment in knowledge infrastructure, Commun. ACM. 38(11), 33–38 (November, 1995). ISSN 0001-0782. doi: 10.1145/219717.219745. URL http://dx.doi.org/10.1145/219717.219745.
41. P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu. Open mind common sense: Knowledge acquisition from the general public. In On the Move to Meaningful Internet Systems, 2002 - DOA/CoopIS/ODBASE 2002 Confederated International Conferences DOA, CoopIS and ODBASE 2002, pp. 1223–1237, London, UK, (2002). Springer-Verlag. ISBN 3-540-00106-9. URL http://portal.acm.org/citation.cfm?id=701499.
42. P. Kiefer and K. Stein. A framework for mobile intention recognition in spatially structured environments. In eds. B. Gottfried and H. K. Aghajan, BMI, vol. 396, CEUR Workshop Proceedings, pp. 28–41. CEUR-WS.org, (2008). URL http://dblp.uni-trier.de/rec/bibtex/conf/ki/KieferS08.
43. S. Fine and Y. Singer. The hierarchical hidden markov model: Analysis and applications. In Machine Learning, pp. 41–62, (1998).
44. C. Reisse, C. Burghardt, F. Marquardt, T. Kirste, and A. Uhrmacher. Smart environments meet the semantic web. In MUM '08: Proceedings of the 7th International Conference on Mobile and Ubiquitous Multimedia, pp. 88–91, New York, NY, USA, (2008). ACM. ISBN 978-1-60558-192-7. doi: 10.1145/1543137.1543154. URL http://dx.doi.org/10.1145/1543137.1543154.