Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Chapter 1 Synthesising Generative Probabilistic Models for High-Level Activity Recognition Christoph Burghardt, Maik Wurdel, Sebastian Bader, Gernot Ruscher and Thomas Kirste Institute for Computer Science, University of Rostock, 18059 Rostock, Germany [email protected] High-level (hierarchical) behaviour with long-term correlations is difficult to describe with first-order Markovian models like Hidden Markov models. We therefore discuss different approaches to synthesise generative probabilistic models for activity recognition based on different symbolic high-level description. Those descriptions of complex activities are compiled into robust generative models. The underlying assumptions for our work are (i) we need probabilistic models in robust activity recognition systems for the real world, (ii) those models should not necessarily rely on an extensive training phase and (iii) we should use available background knowledge to initialise them. We show how to construct such models based on different symbolic representations. 1.1. Introduction & Motivation Activity recognition is an important part of most ubiquitous computing applications.1 With the help of activity recognition, the system can interpret human behaviour to assist the user,2–4 being able to control complex environments5,6 or detect difficult or hazardous situations.7 Activity recognition got very successful with the rise of machine-learning techniques8 that helped to build systems which gain domain knowledge from simple training data.9–11 However while it is simple to gather training data for inferring e.g. the gait of the user,12 this process does not scale up for longer and more complex activities.13 Long-term behaviour of users, or scenarios with multiple interacting people, lead to a state explosion that systematically hinders the process of collecting training data. Higher-level activities like “making tea”, “printing a paper” or “holding a meeting” typically consist of more than five steps which can – at least partially – vary in execution order or even be omitted. Under these circumstances gathering or even annotating training data is in general very complex and expensive. Furthermore the training data is less valuable because it is much more difficult to generalise to unseen instances. Another desirable feature of an activity recognition system is that the system should work in an ad-hoc fashion and exhibit sensible behaviour right from the start. It is not possible to gather training data for every ad-hoc session. The aim of this paper is to briefly present earlier research and explore three formalisms 1 Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 2 Burghardt, Wurdel, Bader, Ruscher and Kirste of human behaviour modelling with respect to their suitability to express different highlevel activities, thus turning our explicit knowledge automatically to a probabilistic framework for inference tasks. Our work is based on the following three hypotheses: i) Probabilistic generative models are needed: To cope with noisy data, we need some kind of probabilistic system, because crisp systems usually fail in such domains. One goal of activity recognition is to provide pro-active assistance to the user. For this, we do need generative models which allow forecasts into to future activities, which then can be supported by the system. 
Therefore, we need probabilistic and in particular also generative models. ii) Activity recognition systems should not rely on a training phase: We believe that an activity recognition system should work right from the start. I.e., we should not have to train it before being able to use it. Nonetheless it should profit from available training data. As argued above, in most interesting situations the collection of training data is quite expensive and sometimes even unfeasible, because there are just to many different situations the system is supposed to work in. iii) Available (symbolic) background knowledge can and should be used: To create systems with the desired properties, we need to use available background knowledge to synthesise such probabilistic models. Because different modelling paradigms have individual strength and weaknesses in terms of describing sequences and hierarchical structures of activities, we discuss three different approaches in this paper. After introducing the preliminaries, we discuss how to convert process-based hierarchical approaches based on task models and grammars in section 1.4.1 to 1.4.3. and one causal approach, based on preconditions and effects in section 1.4.2. To leverage the individual strength of each formal description, we also discuss a first approach to combine those approaches into a single model. In section 1.5 we discuss the advantages and disadvantages of the approaches and conclude this paper with an outlook for future research avenues. We are using hidden Markov models as the target formalism, because they are simple to illustrate and yet powerful enough to abstract every realistic (finite) problem. Furthermore, powerful and efficient inference mechanism for HMMs are well known. Even though it is clear that propositional symbolic descriptions can be transformed into a hidden Markov model, a concise treatment of such approaches is missing in the literature. Here we make a first attempt by discussing the transformation of different formal descriptions into HMMs utilising the same notation and thus provide a starting point for a further unification of such approaches. 1.2. Related Work Among the most mature approaches to activity recognition are logic-based ones that try to infer the current activity based on logical explanations for the observed actions. K AUTZ et Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 3 al. introduced a formal theory on “plan recognition”.14 However, also argued by C HAR NIAK AND G OLDMAN ,15 in the real world sensor data is inherently uncertain and therefore a probabilistic framework is needed. This made Bayesian inference methods very popular for activity and context recognition.10,11,16–18 Many approaches use dynamic Bayesian networks (DBNs) and especially hidden Markov models (HMMs)16,19,20 as the most simple DBN to infer the current activity of the user. Numerous effective inference and learning algorithms exists for HMMs and we can map every realistic finite problem onto them. Many recent activity recognition systems21,22 employ discriminative approaches because of their more efficient use of training data. E.g., in23,24 it is argued that discriminative models achieve a lower asymptotic error bound, but that generative models achieve their (typically higher) error bound faster. 
That is, with little training data, a generative approach (built on prior knowledge) is able to provide better estimates. If a generative model does correctly reflect the dynamics of the system, it achieves a better performance with limited or no training data. Different approaches and techniques (e.g. bootstrapping,25,26 cross-domain activity recognition27 ) are under research to minimise or omit the need of training data. E.g., the Proact system16 data-mined structured action descriptions from the Internet. In later publications their extended the data-mining basis to also include knowledge databases.28,29 We complement and extend this approach by taking (different) formal descriptions of human behaviour and turn these into probabilistic generative models usable for activity recognition. However, in our work we seek to construct generative models in order to predict future behaviour. 1.3. Preliminaries We now discuss some related approaches and introduce some preliminary concepts important for the sequel. In particular, we discuss hidden Markov models (HMMs), the collaborative task modelling language (CTML), the planning domain definition language (PDDL) and probabilistic context-free grammars (PCFGs) to some extent. 1.3.1. Hidden Markov Models Inferring activities from noisy and ambiguous sensor data is a problem akin to tasks such as estimating the position of vehicles from velocity and direction information, or estimating a verbal utterance from speech sounds. These tasks are typically tackled by employing probabilistic inference methods. In a probabilistic framework, the basic inference problem is to compute the distribution function p(Xt |Y1:t = y1:t ); the probability distribution of the state random variable Xt given a specific realisation of the sequence of observation random variables Y1:t . The probability of having a specific state xt is then given by p(xt | y1:t ). In a model-based setting, we assume that our model specifies the state space X of the system and provide information on the system dynamics: It will make a statement Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 4 Burghardt, Wurdel, Bader, Ruscher and Kirste about in which future state xt+1 ∈ X it will be in, if it is currently in state xt ∈ X. In a deterministic setting, system dynamics will be a function xt+1 = f (xt ). However, many systems (e.g. humans) act in a non-deterministic fashion, therefore, the model generally provides a probability for reaching a state xt+1 given a state xt , denoted by p(xt+1 | pt ). This is the system model. The system model is first order Markovian: the current state depends only on the previous state and no other states. Furthermore, we typically cannot observe directly the current state of the system, we have sensors that make observations Y . These observations yt will depend on the current system state xt , again non-deterministically in general, giving the distribution p(yt | xt ) as observation model. Let us consider a small example: Example 1.1. Figure 1.1 shows a graphical representation of the HMM hS, π, T, τ, O, Pi with S = {One, Two}, π(One) = π(Two) = 0.5, τ = S × S with τ(One, One) = τ((Two, Two)) = 0.9 and τ(One, Two) = τ((Two, One)) = 0.1, using O = R and P(One, i) = N(−1,2) (i) and P(Two, i) = N(1,1) (i). a Two One 0.5 0.1 0.1 0.9 0.9 0.5 Fig. 1.1. A graphical representation of the transition and the observation model of the HMM from Ex. 1.1 We can write this example more formally: Definition 1.1 (HMM). 
A hidden Markov model is a tuple (S, π, T, τ, O, P) with S being a set of states and π assigning an initial probability to these states with ∑s∈S π(s) = 1 , T ⊆ S × S being the state transition relation and τ : T → R mapping transition to probabilities with ∑(s,s0 )∈T τ((s, s0 )) = 1 for all s ∈ S , O being a set of possible observations and P : S × O → R mapping states and observations to probabilities with ∑o∈O P(s, o) = 1 for all s ∈ S. The states and the corresponding state-transition model describe the sub-steps of an activity and the order that they have to be done. In the literature there exists many difa Throughout the paper we use N(µ,σ ) to denote the normal distribution with mean µ and standard deviation σ . Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 5 ferent paradigms to model human behaviour. However, few have been applied to describe human behaviour with respect to activity recognition. In this paper, we discuss different approaches to describe human long term and high-level activities such that those formal descriptions (task models, planning operators and grammars) can be used to initialise HMMs. 1.3.2. Planning Problem Automated planning is a branch of artificial intelligence that concerns the realisation of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. In the STRIPS formalism30 actions are described using preconditions and effects. A state of the world is described using a set of ground proposition, which are assumed to be true in the current state. A precondition is an arbitrary function-free first order logic sentence. The effects are imposed on the current world-state. Problems are defined with respect to a domain (a set of actions) and specify an initial situation and the goal condition which the planner should achieve. All predicates which are not explicitly said to be true in the initial conditions are assumed to be false (closed-world assumption). Example 1.2. Consider a planning problem with i.e., three simple actions a, b, and c with effects p, q, and r respectively. We can define this problem as I = {a, b, c}, Is = {}, G = {p, q, r}, A = {a, b, c}, pre(a) = pre(b) = {}, pre(c) = {p, q}, eff+ (a) = {p}, eff+ (b) = {q}, eff+ (c) = {r}, eff− (a) = eff− (b) = eff− (c) = {}. Starting from an empty initial world state we are looking for a state in which every operator is applied at least once. The corresponding state graph is shown in Figure 1.2. aa {} {a} ab ab aa {a,b} ac {a,b,c} {b} Fig. 1.2. Transition graph for the planning problem from Example 1.2 Definition 1.2 (Planning Problem). We define a planning problem formally as a tuple hI, Is , G, A, pre, eff+ , eff− i with I being the set of all ground propositions in the domain, Is ⊆ I being the initial state of the world and G ⊆ I being the set of propositions that must hold in the goal state, A is a set of actions with pre : A → P(I) mapping actions to preconditions and eff+ , eff− : A → P(I) mapping actions to positive and negative effects respectively. 
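To make the formalism concrete, the following sketch encodes the planning problem of Example 1.2 and enumerates the reachable world states in the sense of Figure 1.2. It is an illustrative sketch only, not the authors' implementation: all identifiers (ACTIONS, expand, and so on) are our own, and states are labelled here by the propositions that hold, whereas Figure 1.2 labels them by the operators applied.

```python
# A sketch of the planning problem from Example 1.2, with a forward expansion
# of the reachable world states (cf. Definition 1.2 and Figure 1.2). All names
# are our own illustrative choices, not the authors' implementation.

ACTIONS = {
    "a": {"pre": set(),       "add": {"p"}, "del": set()},
    "b": {"pre": set(),       "add": {"q"}, "del": set()},
    "c": {"pre": {"p", "q"},  "add": {"r"}, "del": set()},
}
INITIAL = frozenset()          # I_s = {}: the empty initial world state
GOAL = {"p", "q", "r"}         # G: every effect achieved at least once


def applicable(state, name):
    """An action is applicable if its preconditions hold in the current state."""
    return ACTIONS[name]["pre"] <= state


def successor(state, name):
    """s' = (s minus eff-(a)) plus eff+(a): the effects imposed on the world state."""
    act = ACTIONS[name]
    return frozenset((state - act["del"]) | act["add"])


def expand(initial, goal):
    """Enumerate world states reachable before the goal holds, together with
    the labelled transitions between them (the state graph of Figure 1.2)."""
    states, edges, frontier = {initial}, set(), [initial]
    while frontier:
        s = frontier.pop()
        if goal <= s:                      # goal states are not expanded further
            continue
        for name in ACTIONS:
            if applicable(s, name):
                t = successor(s, name)
                if t == s:                 # skip transitions that change nothing
                    continue
                edges.add((s, name, t))
                if t not in states:
                    states.add(t)
                    frontier.append(t)
    return states, edges


def fmt(state):
    """Pretty-print a world state as a set of propositions."""
    return "{" + ", ".join(sorted(state)) + "}"


states, edges = expand(INITIAL, GOAL)
for s, name, t in sorted(edges, key=lambda e: (sorted(e[0]), e[1])):
    print(fmt(s), "--" + name + "-->", fmt(t))
```

The same forward expansion reappears in Section 1.4.2, where it supplies the state set of the synthesised HMM.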
The purpose of a planner is to build up a valid sequence of actions that change the world from the initial state Is to a state where the goal condition G is true, as described Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 6 Burghardt, Wurdel, Bader, Ruscher and Kirste by the problem description. By describing each action with precondition and effects, this modelling approach describes the resulting valid processes implicit, thus in a bottom up fashion. The automatic emergence of all valid process sequences is a very useful property for activity recognition. 1.3.3. Task Models In Human Computer Interaction (HCI), task analysis and modelling is a commonly accepted method to elicit requirements when developing a software system. This also applies for smart environments but due to the complexity of such an environment the modelling tasks is quite challenging. Different task-based specification techniques exist (e.g.: HTA TAG, TKS, CTT (for an overview we refer to31 ) but none incorporates the complexity involved in such a scenario. According to task analysis humans, tend to decompose complex tasks into more simple ones until an atomic unit, the action, has been reached. Moreover tasks are performed not only sequential but also decision making and interleaving task performance is common to humans. Therefore the basic concepts incorporated by most modelling languages are hierarchical decomposition and temporal operators restricting the potential execution order of tasks (e.g. sequential, concurrent or disjunct execution). An appropriate modelling notation, accompanied by a tool, supports the user in several ways. First, task-based specifications have the asset of being understandable to stakeholders. Thus modelling results can be discussed and revised based on feedback. Second, task models can be animated which even fosters the understandability and offers the opportunity to generate early prototypes. Last but not least, task-based specifications can also be used in design stages to derive lower level models such as HMMs.32 Task models are usually understood as descriptions of user interaction with a software system. They have been mostly applied to model-based UI development even though the application areas are versatile. For intention recognition task modelling can be employed to specify knowledge about the behaviour of users in an understandable manner which may be inspected at a very early stage. CTML (The Collaborative Task Modelling Language) is a task modelling language dedicated for smart environments which incorporates cooperational aspects as well as location modelling,33 device modelling, team modelling and domain modelling.34 It is a high level approach starting with role-based task specifications defining the stereotypical behaviour of actors in the environment. The interplay of tasks and the environment is defined by preconditions and effects. A precondition defines a required state which is needed to execute a task. An effect denotes a state change of the model through the performance of the corresponding task. Therefore CTML specifications allow for specifying complex dependencies of task models and relevant other models (e.g. domain model, location model) via preconditions and/or effects. Here, we focus on a subset of CTML only, called Custom CTML (CCTML). Example 1.3. 
To clarify the intuition behind task models, we give a brief example of a CCTML Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 7 specification. It will not only highlight the basic concepts of CCTML but will also serve as foundation for a more elaborate example in Section 1.4.1. In Figure 1.3 the example is given. It specifies how a presenter may give a talk in a very basic manner. More precisely it defines that a sequence of the actions Introduce, Configure Equipment, Start Presentation, Show Slides and Leave Room constitute to a successful presentation. With respect to the formal definitions above we can formalise the example as follows (instead of using the task names we use the prefixed numbers of task in Figure 1.3): T = {1., 2., 3., 4., 5.}, γ = (1., 2., 3., 4., 5.), prio(t) = 1, O = {O1 , O2 , O3 },o(t) = {(1., O1 ), (2., O1 ), (3., O2 ), (4., O3 ), (5., O3 )} 1. Give Talk 2. Introduce >> 3. Configure Equipment >> 4. Start Presentation >> 5. Show Slides >> 6. Leave Room Fig. 1.3. Simplified CCTML Model for “Giving a Presentation”. Please note that O, prio and o have been excluded from the visual representation. To specify such a model formally, we first introduce the basic task expressions as follows: Definition 1.3 (Task Expression (TE)). Let T be a set of atomic tasks. Let t1 and t2 be task expressions and λ ∈ T, then the following expression are also task expressions: t1 []t2 ,t1 | = |t2 ,t1 |||t2 ,t1 [>t2 ,t1 | >t2 ,t1 t2 , λ , λ ∗ , [λ ], (λ ). A CCTML model is defined by a set of action, denoting the atomic units of execution and a task expression γ. We extended the usual notion by introducing the function prio assigning a priority to each action in the CCTML model and the function o assigning observations to actions. Definition 1.4 (CCTML Model). A CCTML model is defined as a tuple hT, γ, O, prio, oi with T being a set atomic actions, γ ∈ TE being the task, prio : T → N assigning a priority to each action and o : T → O assigning an observation to each atomic action. Please note that TE defines only binary operators in infix notation. Nested TEs ((tx [] (ty []tz ))) can be easily translated into n-ary operators which are used in the examples. The task expressions TE defines the structure of the CCTML model. It is an recursive definition. Each task expression e ∈ TE is therefore defined by nesting of (an-) other task expression(s) plus an operator until an action has been reached. More precisely there are binary and unary operators. Additionally the actions need to be defined in the set of actions (T ) in the corresponding CCTML. For CTML an interleaving semantics, similar to operational semantics in process algebras,35 is defined via inference rules. Basically for every temporal operator (such as Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 8 Burghardt, Wurdel, Bader, Ruscher and Kirste choice([]), enabling (), etc.) a set of inference rules is declared which enables a derivation of a Labeled Transition System or a HMM. As the comprehensive definition includes more then 30 rules only an example for the choice operator is given. Informally this operator defines that a set of tasks are enabled concurrently. A task may be activated but due to its execution the others become disabled. Thus, it implements an exclusive choice of task execution. 
Formally, the behaviour of the operator is defined by the following two inference rules (with ti ∈ {t1, . . . , tn}; we write t −act→ t′ for a transition labelled with the action act, and an arrow without a target expression denotes successful termination):

ti −act→   ⟹   [ ](t1, t2, . . . , tn) −act→   (1.1)

ti −act→ ti′   ⟹   [ ](t1, t2, . . . , tn) −act→ ti′   (1.2)

Taking the second rule, we explain how such a rule is to be read: given that we may transit from ti to ti′ by executing act, the choice expression can be rewritten to ti′ by executing act, with ti ∈ {t1, . . . , tn}. The first rule declares that if executing act in ti leads to successful termination, then the choice expression may also terminate successfully by executing act. This rule applies if ti is an atomic unit, a so-called action. Further reading on the inference rules for task modelling can be found in the literature.36

The definition of a CCTML model is based on task names and a complex expression which specifies the structure of the model. The transformation of a CCTML model into a state transition system by the inference rules always terminates, because expressions are simplified until no rule is applicable anymore. Picking up the example above, a choice expression is transformed into a simpler task expression (more precisely, a state representing [ ](t1, t2, . . . , tn) into a state representing ti′), which is in turn further simplified. Eventually the state representing the empty task expression is created; this expression cannot be simplified any further.

1.3.4. Probabilistic Context-Free Grammars

Probabilistic context-free grammars (PCFGs) are commonly applied to speech recognition problems. They extend context-free grammars by assigning probabilities to rules; for a general introduction we refer to the literature.37 Valid words of the language are derived by replacing non-terminals according to the rules, starting with the start symbol. Using the probabilities attached to the rules, we can compute the overall probability that a word has been generated by a given grammar.

Example 1.4. As a running example we use the following PCFG with the set of terminal symbols T = {indoor, walk, stop, carStop, carSlow, carFast}, the non-terminals N = {Day, Trans, Car}, N1 = Day, and
R = {(r1 := 1.0 : Day → Trans, indoor, Trans),
(r2 := 0.3 : Trans → walk),
(r3 := 0.7 : Trans → carStop, Car, carStop),
(r4 := 0.2 : Car → carSlow, carFast, carSlow),
(r5 := 0.8 : Car → carSlow, carFast, carSlow, Car)}
in which every rule (r1, . . . , r5) is annotated with its probability π. A valid sequence of actions is walk, indoor, walk. Another valid sequence, in which the first walk has been exchanged, is carStop, carSlow, carFast, carSlow, carStop, indoor, walk.

Definition 1.5 (PCFG). A probabilistic context-free grammar is defined as a quintuple ⟨T, N, N1, R, π⟩, with T being a set of terminal symbols, N being a set of non-terminals, N1 ∈ N being the start symbol, R (rewrite rules) being a relation between non-terminals and words ζ ∈ (N ∪ T)∗, such that for every non-terminal n ∈ N there is at least one rule (n, ζ) ∈ R, and π : R → R assigning probabilities to rules such that for all n ∈ N we find ∑(n,ζ)∈R π((n, ζ)) = 1.

1.4. Synthesising Probabilistic Models

The following three sections show how to construct hidden Markov models from task models, planning operators and grammars, respectively.
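Before turning to the individual constructions, it may help to fix the target structure once in code. The sketch below is one possible rendering of the HMM tuple of Definition 1.1, instantiated with the two-state model of Example 1.1, together with a standard forward-filtering step for p(Xt | y1:t). The class layout and all helper names are our own and are meant purely as an illustration, not as the authors' implementation.

```python
import math

# One possible rendering of the HMM tuple (S, pi, tau, O, P) of Definition 1.1,
# instantiated with the two-state model of Example 1.1. The class layout and
# the forward-filtering helper are our own illustration.

class HMM:
    def __init__(self, states, prior, trans, obs):
        self.states = states      # S
        self.prior = prior        # pi : S -> [0, 1]
        self.trans = trans        # tau : (s, s') -> [0, 1]
        self.obs = obs            # P : (s, o) -> probability / density

    def filter(self, observations):
        """p(X_t | y_1:t), computed with the standard forward recursion."""
        belief = normalise({s: self.prior[s] * self.obs(s, observations[0])
                            for s in self.states})
        for y in observations[1:]:
            belief = normalise({
                s2: self.obs(s2, y) * sum(belief[s1] * self.trans.get((s1, s2), 0.0)
                                          for s1 in self.states)
                for s2 in self.states})
        return belief


def normalise(dist):
    z = sum(dist.values())
    return {s: p / z for s, p in dist.items()}


def gaussian(mu, sigma):
    """Density of N(mu, sigma), used as observation model as in Example 1.1."""
    return lambda x: (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                      / (sigma * math.sqrt(2 * math.pi)))


densities = {"One": gaussian(-1, 2), "Two": gaussian(1, 1)}
hmm = HMM(states=["One", "Two"],
          prior={"One": 0.5, "Two": 0.5},
          trans={("One", "One"): 0.9, ("One", "Two"): 0.1,
                 ("Two", "Two"): 0.9, ("Two", "One"): 0.1},
          obs=lambda s, y: densities[s](y))

print(hmm.filter([-1.2, 0.3, 1.4]))   # posterior state distribution after three readings
```

All of the constructions below produce exactly such a tuple; only the way the states, the transition model and the observation model are obtained differs.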
Afterwards, we discuss a first approach to combine the different modelling approaches to produce a single joint HMM.

1.4.1. From Task Models to Hidden Markov Models

Using the definitions presented in Section 1.3, we define the corresponding HMM for a given CCTML model as follows:

Definition 1.6. Let ⟨T, γ, O, prio, o⟩ be a CCTML model. We define the corresponding HMM ⟨S, π, T′, τ, O, P⟩ as follows:

• S = {γ} ∪ {e′ | e ∈ S, e −act→ e′}
• π(s) = 1 if s = γ, and 0 otherwise
• T′ = {(t1, t2) | t1, t2 ∈ S, t1 −act→ t2}
• τ(ti, tj) = prio(act) / ∑ { prio(act′) | (ti, t) ∈ T′ and ti −act′→ t }, where ti −act→ tj
• P(ti, o) = 1 / |{o(act) | (t, ti) ∈ T′ and t −act→ ti}| if o is an observation assigned to an incoming action of ti, and 0 otherwise

For a given CCTML model we derive the HMM by extracting all potential states from the CCTML model via the inference rules. We start with the task expression γ and fire all applicable action relations; the resulting task expressions are added to the set S. This process continues until the empty task expression is derived. The set of states of the HMM therefore consists of all potential states of the CCTML model. Task models clearly specify the initial state by their structure, so the initial probability of γ is 1 and 0 for all other states. Transitions are derived in the same vein: we define a transition in the HMM for each pair of states that can be reached by executing an action. The transition probabilities are calculated by means of the function prio. As a transition in the HMM coincides with the execution of an action in the task model, its probability is the ratio of the priority of the task under execution to the sum of the priorities of all potential task executions. The function o(t) assigns an observation to an action. The probability of a certain observation occurring in a state is uniformly distributed over the observations assigned to the incoming actions.

Fig. 1.4. CCTML model for "Giving a Presentation" with the tasks 1. Give Talk, 2. Introduce, 3. Configure Equipment, 4. Start Presentation, 5. Next Slide, 6. End Presentation, 7. Leave Room, 8. Connect Laptop & Projector, 9. Set to Presentation, 10. Show Next Slide and 11. Explain Slide.

Example 1.5. Let us examine the transformation by the example depicted in Figure 1.4. Nodes denote tasks, whereas edges represent either hierarchical task decomposition (vertical) or temporal operators (horizontal). The model specifies how a presenter may give a presentation. First the presenter introduces herself, then she configures her equipment by connecting the laptop to the projector and setting the laptop to presentation mode. Note that these two actions may be executed in arbitrary order (denoted by the order-independence operator | = |). After doing so the presenter starts her talk. The talk itself is performed by presenting slides iteratively (denoted by the unary iteration operator ∗). After ending the presentation, which aborts the iteration (due to the disabling operator [>), the presenter leaves the room. Using the prefixed numbers, the task model can be paraphrased by the formal task expression >> (2., | = |(8., 9.), 4., [> (>> (10., 11.)∗, 6.), 7.). Please note that there are also modelling elements that have no visual counterpart (e.g. the observations O).
The same applies for prio function which assigns for each task the priority 1 but for the following: (8., 4), (10., 8), (6., 2) The first number denotes the task to be assigned whereas the second illustrates the priority. Thus we have all information to construct the HMM. Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 11 Synthesising Generative Probabilistic Models for High-Level Activity Recognition {4, 10, 3, 2} {2,8} {} 1.0 0.727 1.0 0.8 {2} {3,2} 0.2 1.0 {4,3,2} 0.8 0.667 {4,3, 2,5} 0.333 {4,3,2, 5,6} 0.182 1.0 0.2 1.0 0.091 {1} {2,9} Fig. 1.5. Corresponding HMM to Figure 1.4 The resulting HMM is depicted in Figure 1.5. Again not all modelling elements are visualised (observations (O) and initial probability (π) are hidden). Nodes represent elements of S whereas edges define state transitions (T). Moreover labels of transitions mark transition probabilities (τ) for the corresponding transitions. Labels inside of nodes denote the set of task executed within the state (again, for reason of brevity we use the prefixed numbers). 1.4.2. From Planning Problems to Hidden Markov Models Given a planning problem as defined in the preliminaries, we first create the directed acyclic graph (and therefore the HMM transition model) that contains all operator sequences that reach a state where the goal condition G is true. Every vertex corresponds to a state of the world and the edges are annotated with actions. As humans normally behave situationdriven, we execute in our implementation a forward search, applying consecutive each operator to a world state I1 . If the preconditions of the operator are satisfied, we derive a new world-state I2 . We continue until either the goal condition is satisfied or we ran out of applicable actions. Afterwards we generate the observations O and the observation model P by applying a P mapping. This process is defined as follows: Definition 1.7. Let hI, Is , G, A, pre, eff+ , eff− i be a planning problem, O be a set of possible observations and P : A × O → R be a mapping from actions and observations to probabilities. Then, we define the corresponding HMM hS, π, T, τ, O, Pi as follows: • S = (Is ) ∪ {(s0 ) | (s) ∈ S, G 6⊆ s, a ∈ A, pre(a) ⊆ s, s0 := s \ eff− (a) ∪ eff+ (a)} • T = {((s), (s0 )), a ∈ A and s0 := s \ eff− (a) ∪ eff+ (a)} • τ(s1 , s2 ) = |{(s ,s0 )|(s1 ,s0 )∈T}| 1 1 ( p if s = Is • π((s)) = and p = |{(s)|(s)∈S1and s⊆Is }| 0 otherwise • P((s), o) = P(a, o) Example 1.6. Reconsider the planning problem from Example 1.2, the set of observables {oa , ob , oc } and P(a, oa ) = 1, P(b, ob ) = 1, P(c, oc ) = 1. The states, prior probabilities and transitions of the resulting HMM are depicted in Figure 1.6 and we find P(({}, a), oa ) = Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 12 Burghardt, Wurdel, Bader, Ruscher and Kirste P(({}, b), ob ) = P(({b}, a), oa ) = P(({a}, b), ob ) = 1. 0.5 {} aa 1.0 {a} ab 0.5 1.0 {} ab Fig. 1.6. 1.0 1.0 {a,b} ac {b} aa A graphical representation of the HMM from Example 1.6. While the resulting transition model T is already usable, we can enhance it by applying heuristic functions to weight each transition individually and generate better transition probabilities that reflect more the individual human behaviour. Possible heuristic functions come e.g. 
from the planning domain38 (plan length, goal-distance) or from knowledgedriven systems.39 We can apply these heuristics as follows: Salience implies to explicit prioritise operators. We can just give each operator a certain weight that will added to all transitions from this operator. Refractoriness denotes that once a operator is applied, don’t apply it again. This rule can be modelled in PDDL by introducing history-propositions that store whether an operator has been applied. Recency indicates to fire most recently activated rule first. It can be implemented by calculating the goal-distance to each operator, in order to add penalty to actions that lead to to plans with more plan-steps. Specificity is to choose an operator with the most conditions or the most specific conditions ahead of a more general rule (e.g. the number of ground propositions the effects of an action that are also part of the goal condition. These different strategies generate different transition probabilities τ for a HMM. We are currently investigating which rules are more successful in mimicking human behaviour. The derivation of the observation probabilities in the example is arguably very simple with a given direct action-observation mapping. However this mapping can be generated by more sophisticated approaches from related literature. P HILOPOSE et. al13 mined these observation probabilities for RFID-based activity recognition from the web. In later publications, P ENTNEY further enhanced this approach28 by data-mining and combining more common-sense databases like Cyc40 and OpenMind.41 These and similar ongoing research complements our approach of automatically synthesising probabilistic models for activity recognition. Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 13 1.4.3. From Probabilistic Context-Free Grammars to Hidden Markov Models As already argued above, humans tend to describe complex actions in a hierarchical manner. One approach to describe complex tasks has been discussed in Section 1.4.1. Here we pursue a second approach by employing probabilistic context free grammars (PCFG). Many users find writing grammar rules more intuitive than e.g. a causal language description. PCFGs are also used in the literature intention recognition by K IEFER ET. AL.42 Our approach to construct HMMs from probabilistic context-free grammars is as follows: We first construct a so-called hierarchical hidden Markov model which captures the structure of a given grammar and then we flatten this hierarchical model to obtain a normal HMM. Obviously one could also use the hierarchical model for inferences, but as already argued we would like to obtain a plain hidden Markov model. Therefore, we convert the hierarchical model into a flat version. But before diving into the technical details of the transformation, we introduce extended PCFGs and hierarchical hidden Markov models formally. Extended PCFGs (EPCFG) extends a PCFG by adding the set of possible observations and the observation probabilities as follows: Definition 1.8 (EPCFG). Let hT, N, N1 , R, πi be a PCFG as defined above, let I be a set of input symbols and P : T × I → R mapping input symbols and terminal states to observation probabilities, such that for all t ∈ T we find ∑i∈I P(t, i) = 1. Then we call hT, N, N1 , R, π, I, Pi an extended PCFG. 
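As an illustration of what an extended PCFG specifies, the following sketch encodes the running grammar of Example 1.4 and, anticipating the speed densities of Example 1.7 below, samples a terminal sequence together with a matching observation sequence. The dictionaries and function names are our own; only the rule probabilities and the Gaussian parameters are taken from the text.

```python
import random

# A sketch of the extended PCFG built from the working-day grammar of
# Example 1.4 and the speed densities of Example 1.7. The sampler itself is
# only our illustration of what "generating" with an EPCFG means.

RULES = {
    "Day":   [(1.0, ["Trans", "indoor", "Trans"])],
    "Trans": [(0.3, ["walk"]),
              (0.7, ["carStop", "Car", "carStop"])],
    "Car":   [(0.2, ["carSlow", "carFast", "carSlow"]),
              (0.8, ["carSlow", "carFast", "carSlow", "Car"])],
}

# Observation densities over the current speed, given as (mu, sigma).
SPEED = {"indoor": (2, 3), "walk": (4, 3),
         "carSlow": (25, 15), "carFast": (50, 20), "carStop": (0, 2)}


def sample_word(symbol="Day"):
    """Expand non-terminals left to right until only terminals remain."""
    if symbol not in RULES:                        # terminal symbol
        return [symbol]
    probs, bodies = zip(*RULES[symbol])
    body = random.choices(bodies, weights=probs)[0]
    return [t for part in body for t in sample_word(part)]


def sample_observations(word):
    """For each terminal, draw one speed reading from its Gaussian."""
    return [random.gauss(*SPEED[t]) for t in word]


random.seed(1)
activities = sample_word()
print(activities)                                  # e.g. ['walk', 'indoor', 'carStop', ...]
print([round(v, 1) for v in sample_observations(activities)])
```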
To illustrate the concept, we use the following simple example in which a working day is modelled. The task was to infer the current activity based on the speed obtained from a GPS-sensor. Example 1.7. Reconsider the grammar shown in Example 1.4. Using the current speed as input, we have I = R and we define the observation probabilities as follows: Pindoor = N(2,3) Pwalk = N(4,3) PcarSlow = N(25,15) PcarFast = N(50,20) PcarStop = N(0,2) The observations probabilities shown above define the probability of certain activities with respect to the current speed of the user. Hierarchical hidden Markov models43 extend the usual notation of a HMM, by allowing the definition of sub-models c ∈ C. Those sub-models can be activated by so called internal states. This call-relation is captured in the function J with J(c, i) = c0 stating that state i in sub-model c activates sub-model c0 , for non-internal states we define J(c, i) = . A given sub-model can be left in any state with ξ (c, i) being the probability of sub-model c terminating in state i. To simplify the notation below, we use natural numbers to denote the states within a sub-model of a HHMM. Definition 1.9 (HHMM). A hierarchical hidden Markov model is defined as a octuple hC,C1 , | · |, π, τ, ξ , J, O, Pi with C being a set of model names and C1 ∈ C be the start model, Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 14 Burghardt, Wurdel, Bader, Ruscher and Kirste | · | : C → N defining the size of a given sub-model, π : C × N → R being a prior probability function with π(c, i) being the probability that model c starts in state i and for all c ∈ C |c| we find ∑i=0 π(c, i) = 1, τ : C × N × N → R defining the transition probabilities between to |c| states within a given model with ∑ j=0 τ(c, i, j) = 1 for all c and 0 ≤ i < |c|, ξ : C × N → R defining the exit probability, J : C × N → C ∪ {} being a function from states to other submodels indicating sub-model-calls, O being a set of input symbols and P : C × N × O → R being a function from sub-model states and inputs to observation probabilities. Please note that we use the model name as an index rather than a parameter, e.g., we write πc (i) instead of π(c, i). D Day: T 0.3 Trans: 0 1 2 Trans indoor Trans 0.7 0 1 2 3 walk carStop Car carStop C 0.2 Car: 0.8 0 1 2 3 4 5 6 carSlow carFast carSlow carSlow carFast carSlow Car Fig. 1.7. A graphical representation of the HHMM from Example 1.8. Prior probabilities are depicted by an in-going arrow, exit probabilities using an outgoing arrow, and calls to a sub-model using a dashed line. All unlabelled links are weighted with 1.0. Example 1.8. Consider the HHMM hC,C1 , | · |, π, τ, ξ , J, O, Pi with C = {D, T,C}, C1 = D, |D| = 3, |T | = 4 and |C| = 7, πD (0) = 1, πT (0) = 0.3, πT (1) = 0.7, πC (0) = 0.2, πC (3) = 0.8 and πc (i) = 0 otherwise, ξD (2) = 1, ξT (0) = 1, ξT (3) = 1, ξC (2) = 1, ξC (6) = 1 and ξc (i) = 0 otherwise, JD (0) = T , JD (2) = T , JT (2) = C and JC (6) = C. The transition probabilities are shown as arrows in Figure 1.7, which contains a graphical representation of this HHMM. Every arrow represents a transition with non-zero probability. Please note that this example defines the structure of the HHMM only and neither the input symbols nor the observation probabilities are depicted here. 1.4.3.1. From PCFGs to HHMMs As a first step while constructing a HMM, we construct a HHMM corresponding to a given EPCFG, with corresponding meaning informally “doing the same”. 
For the moment, we consider acyclic PCFGs only. After presenting the definition and discussing the underlying intuition, we extend our transformation to PCFGs allowing for cycles of length 1, i.e., references from a rule to the non-terminal. For convenience, we assume the set of rules to be ordered and define the following function ι : N×R×N → N mapping a given non-terminal, rule and position of some symbol Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 15 within the body of the rule to a global offset within respect to the non-terminal. Considering the PCFG from Example 1.7, we find ι(Day, r1 , 1) = 1 and ι(Car, r5 , 2) = 5. Definition 1.10. Let hT, N, N1 , R, π, I, Pi be an extended PCFG. We define the corresponding hierarchical hidden Markov model hC,C1 , | · |, π 0 , τ, ξ , J, O0 , P0 i as follows: • C := N, C1 := N1 , |N| := ∑(n→ζ )∈R |ζ | and O0 := I ( π(r) if i = ι(c, r, 1) 0 • πc (i) := 0 otherwise ( 1 if i = ι(c, (c → ζ ), i0 ), j = i + 1 and i0 < |ζ | • τc (i, j) := 0 otherwise ( 1 if i = ι(c, (c → ζ ), i0 ) and i0 = |ζ | • ξc := 0 otherwise 0 • Jc (i) := c0 iff i = ι(c, (c → ζ ), i0 ), ζ [i ] = c0 and c0 ∈ N 0 • Pc0 (i, s) := P(t, s) iff i = ι(c, (c → ζ ), i0 ), ζ [i ] = t and t ∈ T For a given grammar, we construct the corresponding HHMM by inserting a sub-model for every non-terminal. For every occurrence of a (non)-terminal symbol within a rule for the non-terminal we insert a state into the corresponding sub-model. E.g., for the nonterminal Day from the example above, we create a sub-model containing 3 states. In our running example, we use a cycle of length 1 to reference sub-model C from state C6 . Those cycles can be handled by changing the transition model of the corresponding sub-state. Without discussing the technical details, the result is shown in the following example. Example 1.9. Reconsidering the EPCFG from Example 1.7 we find the corresponding HHMM to be the one discussed in Example 1.8. 1.4.3.2. Flattening HHMMs After constructing a hierarchical hidden Markov model from a given PCFG, we have to flatten this model into a “normal” HMM. Considering the procedural semantics of a HHMM, we find that the model’s global state can be described using the current call-stack, which is represented as a list of numbers. E.g., in the HHMM from Figure 1.7, the state 0 of the start model Day calls sub-model Trans, being there in state 2 is represented as call stack [0, 2]. Please note furthermore, that for a given stack [s1 , . . . , sn ], the stack context [1s , . . . , sn−1 ] completely specifies the sub-model of state sn . Therefore, we can use stack contexts to refer to sub-models in the definitions below. We use the notation [s1:n ] to refer to [s1 , . . . , sn ]. Before defining a flatted HMM, we compute the set of reachable stacks as follows: Definition 1.11 (Stacks). Let hC,C1 , | · |, π, τ, ξ , J, O, Pi be a given HHMM. We define the Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 16 Burghardt, Wurdel, Bader, Ruscher and Kirste set of reachable stacks as follows: S = {[i] | 0 ≤ i < |C1 | and πC1 (i) > 0} ∪ {[si:n−1 , i] | [si:n ] ∈ S, 0 ≤ i < |[si:n−1 ]| and τ[si:n−1 ] (sn , i) > 0} ∪ {[si:n , i] | [si:n ] ∈ S, J[si:n−1 ] (sn ) = c, 0 ≤ i < |c| and πc (i) > 0} This definition collects all reachable stacks by simply traversing the HHMM. 
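The stack enumeration of Definition 1.11 can be sketched as a simple traversal. The fragment below restricts Example 1.8 to the Day and Trans sub-models (the recursive Car sub-model, whose length-1 cycle the text handles by adapting the transition model, is left out here to keep the enumeration finite and short); the encoding of the HHMM as dictionaries and all identifiers are our own.

```python
# A sketch of the reachable-stack enumeration of Definition 1.11, restricted to
# the Day and Trans sub-models of Example 1.8. Encoding and names are our own.

SIZE = {"Day": 3, "Trans": 4}
START = "Day"
PRIOR = {("Day", 0): 1.0, ("Trans", 0): 0.3, ("Trans", 1): 0.7}
TRANS = {("Day", 0, 1): 1.0, ("Day", 1, 2): 1.0,
         ("Trans", 1, 2): 1.0, ("Trans", 2, 3): 1.0}
CALLS = {("Day", 0): "Trans", ("Day", 2): "Trans"}     # J: the internal (call) states


def submodel(stack):
    """The sub-model in which the top of the stack lives (its stack context)."""
    model = START
    for state in stack[:-1]:
        model = CALLS[(model, state)]
    return model


def reachable_stacks():
    stacks = {(i,) for i in range(SIZE[START]) if PRIOR.get((START, i), 0) > 0}
    frontier = list(stacks)
    while frontier:
        stack = frontier.pop()
        model, top = submodel(stack), stack[-1]
        successors = set()
        # sibling states reachable by a transition inside the same sub-model
        for j in range(SIZE[model]):
            if TRANS.get((model, top, j), 0) > 0:
                successors.add(stack[:-1] + (j,))
        # entering a called sub-model in one of its start states
        child = CALLS.get((model, top))
        if child is not None:
            for j in range(SIZE[child]):
                if PRIOR.get((child, j), 0) > 0:
                    successors.add(stack + (j,))
        for s in successors - stacks:
            stacks.add(s)
            frontier.append(s)
    return stacks


for s in sorted(reachable_stacks()):
    print(list(s))       # [0], [0, 0], [0, 1], [0, 2], [0, 3], [1], [2], [2, 0], ...
```

In the full example, the Car sub-model contributes the deeper stacks that show up as the states labelled 0.2.0 to 0.2.5 and 2.2.0 to 2.2.5 in Figure 1.8.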
To obtain a flattened HMM, we construct a state for every possible stack ending with a non-internal state, and afterwards, we compute the transition probabilities between different stacks. Definition 1.12. Let hC,C1 , | · |, π, τ, ξ , J, O, Pi be a given HHMM and S be the set of stacks. We define the corresponding HMM hS, π, T, τ, O, Pi as follows: • S := {[s1:n ] | [s1:n ] ∈ S and J[s1:n−1 ] (sn ) = } ( πC1 (i) iff s = [i] • π : S → R : s 7→ 0 otherwise • T := S × S min(n,m) • τ : T → R : ([s1:n ], [r1:m ]) 7→ ∑i=1 p([s1:n ], [r1:m ], i) • P([s1:n ], o) = P[s1:n−1 ] (sn , o) with p : S × S × N → R being defined as follows: i i p([s1:n ], [r1:m ], i) = ∏ ξ[s1: j−1 ] (s j ) · (1 − ξ[s1:i−1 ] (si )) · τ[s1:i−1 ] (si , ri ) · ∏ π[r1: j−1 ] (r j ) j=n j=m The intuition behind the definition of p is as follows: to reach a stack configuration [r1:m ] from a stack [s1:n ] using a transition on level i, we have to leave all sub-models from level n back to level i (∏ij=n ξ[s1: j−1 ] (s j )), afterwards, we multiply the probability of not leaving sub-model i (1 − ξ[s1:i−1 ] (si )) and transiting to state ri (τ[s1:i−1 ] (si , ri )), finally we call all necessary sub-models to reach the desired stack-configuration (∏ij=m π[r1: j−1 ] (r j )). Example 1.10. Figure 1.8 shows the flattened version of the hierarchical HMM from Example 1.8. Using the process described above, we can construct a HMM for a given extended PCFG such that the HMM captures the ideas of the grammar. I.e., all valid sequences of actions are represented within the HMM and the corresponding observation models are attached to the states. As mentioned above, we applied this approach in a first simple scenario, in which we try to infer the users activity based on her current speed. The experiments show that the general temporal structure of a day can easily be captured in a PCFG and thus be transformed into an equivalent HMM. Here, we did not discuss the duration of single actions. But instead of using just a single state for a given action, we can use a sequence of states. But this extension is beyond the scope of this paper. Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 17 0.3 0.0 1 0.3 2.0 0.7 0.7 0.2 0.1 0.2.0 0.2.1 0.2.2 0.3 0.2 0.8 0.2.3 0.2.4 0.8 0.2 0.2.5 2.1 2.2.0 2.2.1 2.2.2 2.3 0.2 0.8 2.2.3 2.2.4 2.2.5 0.8 Fig. 1.8. A graphical representation of the flattened HHMM from Example 1.8. As above, prior probabilities are depicted by an in-going arrow, exit probabilities using an outgoing arrow, and unlabelled links are weighted 1.0. 1.4.4. Joint HMMs In the previous sections three approaches have been presented which allow us to model human activities in a more convenient manner than it is achievable by instantly utilising the hidden Markov models (HMM) approach. We have seen that it is feasible to synthesise a HMM based on a description of domain and problem of the intended scenario in terms of precondition/effect operators or on a set of rules of a Probabilistic Context-Free Grammar (PCFG) or on a process algebra described in the CTML language, respectively. In each case we obtain a well-defined hidden Markov model as a result, that allows us to infer human activities, given the specific case of application. 
As mentioned in Section 1.3.1, we can assume that the real-world processes which can be described by one of the previously depicted formalisms, and from which a HMM can then be synthesised, consist of time-sequential activities: each activity begins and ends at specific points in time, and one follows another. Thus, a state si ∈ S of the HMM ⟨S, π, T, τ, O, P⟩ can be seen as a specific activity in the underlying process. A transition (i, j) ∈ T connecting the states si and sj can, if τ((i, j)) > 0, be seen as a potential consecutiveness between the two corresponding activities of this process. Note that how well one of the specified formalisms fits the process to be modelled varies and is heavily scenario-dependent.

Consider having obtained two or more scenario-complementary hidden Markov models by utilising different synthesising algorithms. One would like to join these models, e.g. by inserting novel inter-HMM transitions, which introduce additional potential consecutiveness between the corresponding activities. It is thereby evident that the result of this joining operation needs to be a well-defined hidden Markov model itself, preserving the advantages of the probabilistic approach.

Definition 1.13 (JointHMM). Let H be a set of n well-defined hidden Markov models ⟨Si, πi, Ti, τi, Oi, Pi⟩ with 1 ≤ i ≤ n, Si ∩ Sj = ∅ for i ≠ j, S = ∪i Si and Oi = Oj for all i, j. Let R ⊆ S × S be a set of inter-model connections and ρ : R → R a mapping from inter-model connections to probabilities. We define the joint hidden Markov model ⟨S, π, T, τ, O, P⟩ as follows:

• π : S → R : s ↦ πi(s)/n with s ∈ Si
• T = ∪i Ti ∪ R
• τ : S × S → R : (s1, s2) ↦ ρ((s1, s2)) if (s1, s2) ∈ R, and τi(s1, s2) · f if (s1, s2) ∈ Ti with s1, s2 ∈ Si, where f = 1 − ∑(s1,s′)∈R ρ((s1, s′))
• P : S × O → R : (s, o) ↦ Pi(s, o) with s ∈ Si

Given two or more hidden Markov models compliant with Definition 1.1, written as sextuples ⟨Si, πi, Ti, τi, Oi, Pi⟩, the question is how to join these models. In a first step, we unify the original sets of states S1, S2, . . . , Sn of the n HMMs to obtain the set of states S of the joint HMM, assuming them to be pairwise disjoint. Furthermore, the values of the joint prior probability function π should sum up to 1; as a straightforward approach we divide all values πi by the number of original HMMs, n. Incidentally, one could also define (some kind of global) prior weights over the different models and combine these with the priors of each model; we decided not to do so for reasons of simplicity. The joint set of transitions T is determined by unifying the particular sets of transitions T1, T2, . . . , Tn and R, a set of novel inter-HMM transitions. This step also requires the subsets T1, T2, . . . , Tn, R to be pairwise disjoint. The set R can be thought of as a set of rules that allow novel inter-HMM transitions to be specified in an intuitive way. As an example, one of these rules might denote a transition between the states s and s′ of the HMMs A and B, respectively, together with the assigned probability p: A(s) −p→ B(s′). The probability value of an inter-HMM transition is interpreted as the chance of exiting a sub-HMM and entering the targeted sub-HMM.
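To illustrate the joining operation of Definition 1.13, the sketch below combines two small hand-made sub-HMMs and applies the normalisation factor f to the intra-model transitions leaving a state that has an outgoing inter-model connection; it anticipates the rule D(DemoEnding) → M(Presentation) with probability 0.2 from Example 1.11 further below. The data layout, the extra states and all remaining numbers are invented for illustration only.

```python
# A sketch of the joining operation of Definition 1.13 for two already
# synthesised HMMs with disjoint state sets and one shared observation set
# {noise, silence, speech}; zero-probability observation entries are omitted.
# Data layout and names are our own; the numbers are made up for illustration.

def join_hmms(models, connections):
    """models: list of dicts with keys 'prior', 'trans', 'obs' (disjoint states).
    connections: dict mapping new inter-model transitions (s, s') to rho."""
    n = len(models)
    prior, trans, obs = {}, {}, {}
    for m in models:
        # priors are divided by the number of sub-HMMs so that they sum to 1 again
        prior.update({s: p / n for s, p in m["prior"].items()})
        obs.update(m["obs"])
        for (s1, s2), p in m["trans"].items():
            # probability mass leaving s1 through the new inter-model connections
            leak = sum(rho for (a, _), rho in connections.items() if a == s1)
            trans[(s1, s2)] = p * (1.0 - leak)      # the factor f of Definition 1.13
    trans.update(connections)                        # add the inter-HMM transitions R
    return {"prior": prior, "trans": trans, "obs": obs}


# Toy sub-HMMs: D models a technical demonstration, M a meeting (cf. Example 1.11).
demo = {"prior": {"DemoRunning": 1.0},
        "trans": {("DemoRunning", "DemoRunning"): 0.7,
                  ("DemoRunning", "DemoEnding"): 0.3,
                  ("DemoEnding", "DemoEnding"): 1.0},
        "obs": {("DemoRunning", "noise"): 1.0, ("DemoEnding", "silence"): 1.0}}
meeting = {"prior": {"Presentation": 1.0},
           "trans": {("Presentation", "Presentation"): 1.0},
           "obs": {("Presentation", "speech"): 1.0}}

# Rule D(DemoEnding) --0.2--> M(Presentation).
joint = join_hmms([demo, meeting], {("DemoEnding", "Presentation"): 0.2})
print(joint["trans"][("DemoEnding", "DemoEnding")])    # 1.0 * (1 - 0.2) = 0.8
print(joint["trans"][("DemoEnding", "Presentation")])  # 0.2
```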
The joint transition probability function τ is therefore determined by the original τi except for the particular transitions that originate from a state s ∈ Si and this s itself is a starting point of at least one inter-HMM transition. In such cases, the probabilities of the intra-HMM transitions coming from s have to be normalised with the factor f by definition 1.13. Finally coming to the observation model, for simplicity reasons we assume the sets of observations Oi of the sub-HMMs to be identical. Thus, the observation probability function P is fully determined by the Pi of the sub-HMMs. Example 1.11. Consider now having modelled a meeting scenario held in an intelligent meeting room, e.g. by using PDDL operators, and furthermore the modelling of a technical infrastructure demonstration in this room using a set of PCFG rules. After having translated these two model descriptions, we have as a result obtained two disparate hidden Markov models, each recognising diverse sets of activities. Once we imagine the meeting to be preceded by such a technical demonstration the concern for joining these HMMs is coming up. Let now denote M(Presentation) the state Presentation of a hidden Markov model M, with M modelling the meeting scenario, and D(DemoEnding) denote a state DemoEnding Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 19 of HMM D, with D modelling the technical demonstration. We now would like to introduce the existence of a potential transition between these two activities, expressing a potential consecutiveness between the ending of a technical demonstration and a presentation within a project meeting. For this purpose we bring in a novel inter-HMM transition and establish 0.2 a rule which connects the two states: D(DemoEnding) −→ M(Presentation) The probability value 0.2 indicates a chance of 0.2 to leave the sub-HMM D and enter M. Each intra-HMM transition which comes from the state D(DemoEnding) needs to be multiplied by 1 − 0.2 = 0.8. At this point, we have everything we need to join the HMMs D and M by definition 1.13. 1.5. Discussion The underlying idea of all our approaches is to turn a formal description of the users activity into a probabilistic generative model that is able to recognise the desired activities. Over time, many different formal descriptions for human behaviour have emerged. In this paper we have explored three alternatives: a top-down description where we have chosen CTML as a representative for task modelling language. The second approach is a bottomup modelling approach where we use a causal description with preconditions and effects. Our third approach is to use PCFGs. To leverage the advantages of the individual modelling approaches we combine the simpler HMMs generated by each modelling approach to a joint HMM. PDDL Synthesise HMM1 CCTML Synthesise HMM2 PCFG Synthesise HMM3 Fig. 1.9. Join Joint HMM Overview of our model synthesis research In this section we discuss the individual strength and weaknesses of the different approaches when modelling real world scenarios, creating some guidelines when to use a certain modelling paradigm and when not. 1.5.1. Planning operators “Classical” planning has a long history of searching for solutions in order to efficiently schedule resources or control the behaviour of robots. We translated this approach to humans in order to describe (simplified) human behaviour. 
The advantage of the planning approach is the implicit description of all processes that can occur, clearly an advantage as human behaviour often is interleaved with different activities. As the processes are Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 20 Burghardt, Wurdel, Bader, Ruscher and Kirste modelled implicit, no permutation can be forgotten. Also the possible interaction between multiple humans emerge automatically, a clear advantage for team scenarios. Furthermore it is easy to extend the application domain with new operators as these can be simply added to the action repository and are automatically added by the algorithm in the right positions. Thus devices in a smart environment could describe their possible human interactions with preconditions and effects and become a part in the activity recognition process.44 Two main challenges arise from this planning approach: The first is the description of the world-state I. It is difficult to define which information belongs in the current worldstate and what therefore all the preconditions and effects of an action need to take into account. The second challenge is the state explosion which is here much more imminent than in the other two approaches. However by employing approximate inference we can construct a particle filter where the planning process efficiently describes the state transition process, thus avoiding the need of exploring the whole state space a-priori. 1.5.2. Task models Task analysis and modelling has been employed for specifying user actions over decades. It is a commonly agreed in HCI that task analysis is a important device for interaction development (which we consider as a super domain of intention recognition). As CTML is rooted in task modelling approaches of HCI, it supports the asset of those approaches: understandability, intuitiveness, tool support and embedment into a methodology. The assets are also relevant for intention recognition especially while creating the model. Top-down modelling has the advantage of incorporating gradual refinement of models in an intuitive manner. Actions can be further decomposed as necessary. Moreover as CTML define a formal semantics verification algorithms can be employed to assure certain quality criteria (e.g. deadlock freedom). Another advantage of CTML is the opportunity of validating the model by animation. As the semantics are well defined an interactive animation can be directly created. However it is more important how such a modelling approach is valid for highly dynamic systems like smart environments at runtime. One has to admit that task modelling is a rather static approach as task models are created interactively by software designers at design time. Adding new tasks is rather cumbersome as the task model needs to be adapted (which is not the case using the precondition/effects description formalism). On the contrary designed models exhibit a higher quality than automatically composed ones. In contrast to context-free grammars CTML is able to specify concurrent behaviour. This is a major advantage as humans usually perform not only one task a time but mingling task in an interleaving manner. Especially for multi-user specifications CTML models can become quite complex which leads naturally to very large HMMs. This is a problem that can be solved with approximate inference by employing a particle filter. The CTML is intuitively a compact representation of the state transition function. 
With a particle filter we keep only track of the executed task and employ a lazy (on demand) state creation. The treatment of observa- Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com Synthesising Generative Probabilistic Models for High-Level Activity Recognition 21 tion is currently rather basic. More sophisticated approaches need to be developed to create more realistic HMMs which is one item currently under investigation. 1.5.3. Probabilistic Context-Free Grammars Probabilistic context-free grammars have quite successfully been applied in the area of speech recognition. Here we used them to initialise a hidden Markov model to recognise human behaviour. In first experiments we used the approach to model the overall structure of a day and to infer the users activity based on her current speed as computed by a GPS sensor. Those experiments showed that our approach can indeed be used to model highlevel activities and their temporal structure quite intuitively. But they also showed that e.g., interleaving activities are hard to model. On the other hand, the overall structure can be modelled quite intuitive. Due to space constraints we did not discuss timing information here. But those can be integrated into the model using the standard time-expansion of states. E.g., if a certain state is supposed to last for three time-steps, it is expanded into a sequence of three identical states. In a similar fashion and using self transitions in all states of the sequence it is possible to implement a negative binomial distribution with respect to the expected time a state should last. 1.5.4. Joint Hidden Markov Models The major advantage in the presented HMM-joining algorithm can be seen in addressing the need for a formalism to combine a set of hidden Markov models that (potentially) have been synthesised by using differing formalisms. p Therefore we have introduced a simple operator − →, that allows us to deploy a set of rules, with each one introducing a novel inter-HMM transition, plus the assignment of a probability value p which obeys an intuitive semantics: the chance of leaving the current sub-HMM through this particular transition. As a result, we gain a well-defined hidden Markov model, incorporating the previously shown advantages of probabilistic generative models and symbolic high-level description of complex human activities. We are currently investigating, how we can weaken the precondition Oi = O j and thereby determine the two parameters O and P of the joint HMM in a more generalising manner, so that the set of observations which can be recognised by the available sensors, and the state-observation probabilities, respectively, are allowed to differ between sub-HMMs. 1.6. Summary and Outlook In order to easily describe hierarchical behaviour, we employ high-level description languages to derive probabilistic models for activity recognition. In this paper we have shown three alternatives based on a process-based language, a causal language, and a grammar, to synthesise generative probabilistic models for activity recognition from different high- Preprint – The final publication is available with DOI 10.2991/978-94-91216-05-3_10 at link.springer.com 22 Burghardt, Wurdel, Bader, Ruscher and Kirste level description modelling paradigms. These different approaches comprise top-down and bottom-up modelling, and are thus complementing each other. 
1.6. Summary and Outlook
In order to easily describe hierarchical behaviour, we employ high-level description languages to derive probabilistic models for activity recognition. In this paper we have shown three alternatives, based on a process-based language, a causal language, and a grammar, for synthesising generative probabilistic models for activity recognition from different high-level description modelling paradigms. These different approaches comprise top-down and bottom-up modelling and thus complement each other. We have further sketched a method to combine the languages to derive a joint HMM, thus leveraging the individual strengths of the different modelling paradigms while mitigating their weaknesses. Given the basics laid out in this paper, we seek to build activity recognition systems for complex domains like team-activity recognition or day-care nursing, which comprise many hierarchical, long-term, intermingled activities, leading to very large state spaces that are very challenging to describe.
References
1. D. J. Cook, J. C. Augusto, and V. R. Jakkula, Ambient intelligence: Technologies, applications, and opportunities, Pervasive and Mobile Computing. 5(4), 277–298 (August, 2009). ISSN 1574-1192. doi: 10.1016/j.pmcj.2009.04.001. URL http://dx.doi.org/10.1016/j.pmcj.2009.04.001.
2. G. D. Abowd and E. D. Mynatt, Designing for the human experience in smart environments. pp. 151–174 (September, 2004). doi: 10.1002/047168659X.ch7. URL http://dx.doi.org/10.1002/047168659X.ch7.
3. G. M. Youngblood and D. J. Cook, Data mining for hierarchical model creation, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 37(4), 561–572 (July, 2007). ISSN 1094-6977. doi: 10.1109/TSMCC.2007.897341. URL http://dx.doi.org/10.1109/TSMCC.2007.897341.
4. F. Doctor, H. Hagras, and V. Callaghan, A fuzzy embedded agent-based approach for realizing ambient intelligence in intelligent inhabited environments, IEEE Transactions on Systems, Man, and Cybernetics, Part A. 35(1), 55–65, (2005). doi: 10.1109/TSMCA.2004.838488. URL http://dx.doi.org/10.1109/TSMCA.2004.838488.
5. E. Chávez, R. Ide, and T. Kirste. Samoa: An experimental platform for situation-aware mobile assistance. In eds. C. H. Cap, W. Erhard, and W. Koch, ARCS, pp. 231–236. VDE Verlag, (1999). ISBN 3-8007-2482-0. URL http://dblp.uni-trier.de/rec/bibtex/conf/arcs/ChavezIK99.
6. A. Fox, B. Johanson, P. Hanrahan, and T. Winograd, Integrating information appliances into an interactive workspace, IEEE Computer Graphics and Applications. 20(3), 54–65 (August, 2000). ISSN 0272-1716. doi: 10.1109/38.844373. URL http://dx.doi.org/10.1109/38.844373.
7. S. A. Velastin, B. A. Boghossian, B. Ping, L. Lo, J. Sun, and M. A. Vicencio-Silva. Prismatica: Toward ambient intelligence in public transport environments. In Good Practice for the Management and Operation of Town Centre CCTV. European Conf. on Security and Detection, vol. 35, pp. 164–182, (2005). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.2113.
8. Z. Chen. Bayesian filtering: From kalman filters to particle filters, and beyond. Technical report, McMaster University, (2003). URL http://math1.unice.fr/~delmoral/chen_bayesian.pdf.
9. L. Liao, D. Fox, and H. Kautz. Location-based activity recognition using relational markov networks. In IJCAI’05: Proceedings of the 19th international joint conference on Artificial intelligence, pp. 773–778, San Francisco, CA, USA, (2005). Morgan Kaufmann Publishers Inc. URL http://portal.acm.org/citation.cfm?id=1642293.1642417.
10. D. J. Patterson, L. Liao, K. Gajos, M. Collier, N. Livic, K. Olson, S. Wang, D. Fox, and H. A. Kautz. Opportunity knocks: A system to provide cognitive assistance with transportation services. In eds. N. Davies, E. D. Mynatt, and I. Siio, Ubicomp, vol. 3205, Lecture Notes in Computer Science, pp. 433–450. Springer, (2004). ISBN 3-540-22955-8. URL http://dblp.uni-trier.de/rec/bibtex/conf/huc/PattersonLGCLOWFK04.
11. D. H. Hu and Q. Yang. Cigar: concurrent and interleaving goal and activity recognition. In AAAI’08: Proceedings of the 23rd national conference on Artificial intelligence, pp. 1363–1368. AAAI Press, (2008). ISBN 978-1-57735-368-3. URL http://portal.acm.org/citation.cfm?id=1620286.
12. A. Hein and T. Kirste, A hybrid approach for recognizing adls and care activities using inertial sensors and rfid. 5615, 178–188, (2009). doi: 10.1007/978-3-642-02710-9_21. URL http://dx.doi.org/10.1007/978-3-642-02710-9_21.
13. M. Perkowitz, M. Philipose, K. Fishkin, and D. J. Patterson. Mining models of human activities from the web. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pp. 573–582, New York, NY, USA, (2004). ACM. ISBN 1-58113-844-X. doi: 10.1145/988672.988750. URL http://dx.doi.org/10.1145/988672.988750.
14. H. A. Kautz. A formal theory of plan recognition and its implementation. In eds. J. F. Allen, H. A. Kautz, R. Pelavin, and J. Tenenberg, Reasoning About Plans, pp. 69–125. Morgan Kaufmann Publishers, San Mateo (CA), USA, (1991). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1583.
15. E. Charniak and R. P. Goldman, A bayesian model of plan recognition, Artificial Intelligence. 64, 53–79, (1993). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.4744.
16. M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Fox, H. Kautz, and D. Hahnel, Inferring activities from interactions with objects, IEEE Pervasive Computing. 3(4), 50–57 (October, 2004). ISSN 1536-1268. doi: 10.1109/MPRV.2004.7. URL http://dx.doi.org/10.1109/MPRV.2004.7.
17. D. W. Albrecht, I. Zukerman, and A. E. Nicholson, Bayesian models for keyhole plan recognition in an adventure game, User Modeling and User-Adapted Interaction. 8(1), 5–47 (March, 1998). doi: 10.1023/A:1008238218679. URL http://dx.doi.org/10.1023/A:1008238218679.
18. H. H. Bui. A general model for online probabilistic plan recognition. In IJCAI’03: Proceedings of the 18th international joint conference on Artificial intelligence, pp. 1309–1315, San Francisco, CA, USA, (2003). Morgan Kaufmann Publishers Inc. URL http://portal.acm.org/citation.cfm?id=1630846.
19. E. Kim, S. Helal, and D. Cook, Human activity recognition and pattern discovery, IEEE Pervasive Computing. 9(1), 48–53 (January, 2010). ISSN 1536-1268. doi: 10.1109/MPRV.2010.7. URL http://dx.doi.org/10.1109/MPRV.2010.7.
20. G. Singla, D. J. Cook, and M. Schmitter-Edgecombe, Recognizing independent and joint activities among multiple residents in smart environments, Journal of Ambient Intelligence and Humanized Computing. 1(1), 57–63. ISSN 1868-5137. doi: 10.1007/s12652-009-0007-1. URL http://dx.doi.org/10.1007/s12652-009-0007-1.
21. T. Gu, Z. Wu, X. Tao, H. K. Pung, and J. Lu. epsicar: An emerging patterns based approach to sequential, interleaved and concurrent activity recognition. In 2009 IEEE International Conference on Pervasive Computing and Communications, vol. 0, pp. 1–9, Los Alamitos, CA, USA (March, 2009). IEEE. ISBN 978-1-4244-3304-9. doi: 10.1109/PERCOM.2009.4912776. URL http://dx.doi.org/10.1109/PERCOM.2009.4912776.
22. R. Hamid, S. Maddi, A. Johnson, A. Bobick, I. Essa, and C. Isbell, A novel sequence representation for unsupervised analysis of human activities, Artificial Intelligence. 173(14), 1221–1244 (September, 2009). ISSN 0004-3702. doi: 10.1016/j.artint.2009.05.002. URL http://dx.doi.org/10.1016/j.artint.2009.05.002.
23. A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes, (2001). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.9829.
24. J.-H. Xue and D. M. Titterington, Comment on "On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes", Neural Processing Letters. 28(3), 169–187 (December, 2008). ISSN 1370-4621. doi: 10.1007/s11063-008-9088-7. URL http://dx.doi.org/10.1007/s11063-008-9088-7.
25. S. Dasgupta, M. L. Littman, and D. McAllester. Pac generalization bounds for co-training, (2001). URL http://ttic.uchicago.edu/~dmcallester/cotrain01.ps.
26. A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the em algorithm, Journal of the Royal Statistical Society. Series B (Methodological). 39(1), 1–38, (1977). ISSN 0035-9246. doi: 10.2307/2984875. URL http://dx.doi.org/10.2307/2984875.
27. V. W. Zheng, D. H. Hu, and Q. Yang. Cross-domain activity recognition. In Ubicomp ’09: Proceedings of the 11th international conference on Ubiquitous computing, pp. 61–70, New York, NY, USA, (2009). ACM. ISBN 978-1-60558-431-7. doi: 10.1145/1620545.1620554. URL http://dx.doi.org/10.1145/1620545.1620554.
28. W. Pentney. Large scale use of common sense for activity recognition and analysis. URL http://citeseer.ist.psu.edu/rd/32044135%2C761213%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs2/285/http:zSzzSzwww.cs.washington.eduzSzhomeszSzbillzSzgenerals.pdf/pentney05large.pdf.
29. W. Pentney, M. Philipose, and J. A. Bilmes. Structure learning on large scale common sense statistical models of human state. In eds. D. Fox and C. P. Gomes, AAAI, pp. 1389–1395. AAAI Press, (2008). ISBN 978-1-57735-368-3. URL http://dblp.uni-trier.de/rec/bibtex/conf/aaai/PentneyPB08.
30. R. E. Fikes and N. J. Nilsson, Strips: A new approach to the application of theorem proving to problem solving, Artificial Intelligence. 2(3-4), 189–208, (1971). doi: 10.1016/0004-3702(71)90010-5. URL http://dx.doi.org/10.1016/0004-3702(71)90010-5.
31. Q. Limbourg and J. Vanderdonckt. Comparing task models for user interface design. In eds. D. Diaper and N. Stanton, The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum Associates, (2003).
32. M. Giersich, P. Forbrig, G. Fuchs, T. Kirste, D. Reichart, and H. Schumann, Towards an integrated approach for task modeling and human behavior recognition, Human-Computer Interaction. 4550, 1109–1118, (2007).
33. M. Wurdel. Towards an holistic understanding of tasks, objects and location in collaborative environments. In ed. M. Kurosu, HCI (10), vol. 5619, Lecture Notes in Computer Science, pp. 357–366. Springer, (2009). ISBN 978-3-642-02805-2.
34. M. Wurdel, D. Sinnig, and P. Forbrig, Ctml: Domain and task modeling for collaborative environments, JUCS. 14(Human-Computer Interaction), (2008).
35. A. W. Roscoe, C. A. R. Hoare, and B. Richard, The Theory and Practice of Concurrency. (Prentice Hall PTR, 1997).
36. F. Paterno and C. Santoro. The concurtasktrees notation for task modelling. Technical report, (2001).
37. K. Lari and S. Young, The estimation of stochastic context-free grammars using the inside-outside algorithm, Computer Speech and Language. 4, 35–56, (1990).
38. J. Hoffmann and B. Nebel. The ff planning system: Fast plan generation through heuristic search, (2001). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.673.
39. L. Brownston, Programming expert systems in OPS5: an introduction to rule-based programming. The Addison-Wesley series in artificial intelligence, (Addison-Wesley, 1985). ISBN 978. URL http://www.worldcat.org/isbn/978.
40. D. B. Lenat, Cyc: a large-scale investment in knowledge infrastructure, Commun. ACM. 38(11), 33–38 (November, 1995). ISSN 0001-0782. doi: 10.1145/219717.219745. URL http://dx.doi.org/10.1145/219717.219745.
41. P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu. Open mind common sense: Knowledge acquisition from the general public. In On the Move to Meaningful Internet Systems, 2002 - DOA/CoopIS/ODBASE 2002 Confederated International Conferences DOA, CoopIS and ODBASE 2002, pp. 1223–1237, London, UK, (2002). Springer-Verlag. ISBN 3-540-00106-9. URL http://portal.acm.org/citation.cfm?id=701499.
42. P. Kiefer and K. Stein. A framework for mobile intention recognition in spatially structured environments. In eds. B. Gottfried and H. K. Aghajan, BMI, vol. 396, CEUR Workshop Proceedings, pp. 28–41. CEUR-WS.org, (2008). URL http://dblp.uni-trier.de/rec/bibtex/conf/ki/KieferS08.
43. S. Fine and Y. Singer. The hierarchical hidden markov model: Analysis and applications. Machine Learning, pp. 41–62, (1998).
44. C. Reisse, C. Burghardt, F. Marquardt, T. Kirste, and A. Uhrmacher. Smart environments meet the semantic web. In MUM ’08: Proceedings of the 7th International Conference on Mobile and Ubiquitous Multimedia, pp. 88–91, New York, NY, USA, (2008). ACM. ISBN 978-1-60558-192-7. doi: 10.1145/1543137.1543154. URL http://dx.doi.org/10.1145/1543137.1543154.