Representing Hierarchical POMDPs as DBNs for Multi-Scale Robot Localization
G. Theocharous, K. Murphy, L. Kaelbling
Presented by: Hannaneh Hajishirzi

Outline
• Define H-HMMs
  – Flattening H-HMMs
• Define H-POMDPs
  – Flattening H-POMDPs
• Approximate H-POMDPs with DBNs
• Inference and learning in H-POMDPs

Introduction
• H-POMDPs represent the state space at multiple levels of abstraction
  – They scale much better to large environments
  – They simplify planning: abstract states are more deterministic
  – They simplify learning: the number of free parameters is reduced

Hierarchical HMMs
• A generalization of HMMs for modeling domains with hierarchical structure (applications include NLP)
• Concrete states emit a single observation
• Abstract states emit strings of observations
• The strings emitted by abstract states are governed by sub-HMMs

Example
• An HHMM representing the regular language a(xy)+b | c(xy)+d
• When a sub-HHMM finishes, control returns to wherever it was called from

HHMM to HMM
• Create a flat state for every leaf in the HHMM
• Flat transition probability = the sum of the probabilities of all paths in the HHMM between the corresponding leaves
• Disadvantages:
  – Flattening loses modularity
  – Learning requires more samples

Representing HHMMs as DBNs
• Q^d_t: the state at level d
• F^d_t = 1 if the HMM at level d has finished

H-POMDPs
• HHMMs with inputs (actions) and a reward function
• Problems:
  – Planning: find a mapping from belief states to actions
  – Filtering: compute the belief state P(X_t | y_{1:t}, u_{1:t}) online
  – Smoothing: compute P(X_t | y_{1:T}, u_{1:T}) offline
  – Learning: find the maximum-likelihood estimate of the model parameters

H-POMDP for Robot Navigation
• Flat model: robot position X_t ∈ {1, ..., 10}
• Hierarchical model:
  – Abstract state (4 values)
  – Concrete state (3 values)
  – Observation Y_t (4 bits)
• This paper ignores the problem of how to choose actions

State Transition Diagram for a 2-Level H-POMDP
[Figure: state transition diagram with a sample path]

State Transition Diagram for a Corridor Environment
[Figure: abstract states, concrete states, entry states, and exit states]

Flattening H-POMDPs
• Advantages of an H-POMDP over the corresponding flat POMDP:
  – Learning is easier: sub-models can be learned separately
  – Planning is easier: reason in terms of "macro" actions

Dynamic Bayesian Networks
[Figure: a state POMDP vs. a factored DBN POMDP over two time slices]
• Number of parameters, state POMDP: 12 × 9 = 108
• Number of parameters, factored DBN POMDP: 40

Representing H-POMDPs as DBNs
[Figure, built up incrementally over several slides: the state H-POMDP and the factored DBN H-POMDP, with nodes U_t, L^2_t, E_t, L^1_t, θ_t, Y_t unrolled over two time slices]

H-POMDPs as DBNs
• L^2_t: abstract location
• L^1_t: concrete location
• θ_t: orientation
• Y_t: observation
• U_t: action node
• E_t: exit node (5 values, representing no-exit and the s-, n-, l-, r-exits)

Transition Model
• Abstract horizontal transition:
  P(L^2_t = j | L^2_{t-1} = i, E_{t-1} = e) = δ(i, j) if e = no-exit, and H^2(i, e, j) otherwise,
  where H^2 is the abstract horizontal transition matrix.
• Exit probability:
  P(E_t = e | L^1_t = j, θ_t, L^2_t = a) = X(j, a, θ_t, e), the probability of entering exit state e.
• Concrete transition:
  P(L^1_t = j | L^1_{t-1} = i, E_{t-1} = e, θ_{t-1}, L^2_t = a) = H^1(i, θ_{t-1}, a, j) if e = no-exit, and V(e, a, j) otherwise,
  where H^1 is the concrete horizontal transition matrix and V is the concrete vertical entry vector.
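The three conditional distributions above compose into one generative step of the two-level model. Below is a minimal Python sketch of that step; the array shapes and the e = 0 encoding of no-exit are assumptions for illustration, and the action input U_t and the orientation dynamics are omitted for brevity, so this shows the factorization rather than the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

class HPOMDPTransition:
    """One generative step of the factored two-level transition model.

    Shapes (illustrative assumptions, following the slides' notation):
      H2[i, e, j]      abstract horizontal transitions H^2(i, e, j)
      H1[i, th, a, j]  concrete horizontal transitions H^1(i, theta, a, j)
      X[j, a, th, e]   exit probabilities X(j, a, theta, e); e = 0 is no-exit
      V[e, a, j]       concrete vertical entry vectors V(e, a, j)
    """

    def __init__(self, H2, H1, X, V):
        self.H2, self.H1, self.X, self.V = H2, H1, X, V

    def step(self, l2, l1, theta, e_prev):
        # Abstract level: stays put unless the concrete sub-model exited,
        # i.e. delta(i, j) when e = no-exit, H^2(i, e, j) otherwise.
        if e_prev == 0:
            l2_new = l2
        else:
            l2_new = rng.choice(len(self.H2[l2, e_prev]), p=self.H2[l2, e_prev])
        # Concrete level: horizontal move within the same abstract state,
        # or vertical entry into the new abstract state's sub-model.
        if e_prev == 0:
            p = self.H1[l1, theta, l2_new]   # H^1(i, theta_{t-1}, a, j)
        else:
            p = self.V[e_prev, l2_new]       # V(e, a, j)
        l1_new = rng.choice(len(p), p=p)
        # Exit node for this step: X(j, a, theta, e).
        e_new = rng.choice(self.X.shape[-1], p=self.X[l1_new, l2_new, theta])
        return l2_new, l1_new, e_new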
Observation Model
• The observation is the probability of seeing a wall or an opening on each of the 4 sides of the robot.
• Naïve Bayes assumption: P(Y_t | X_t) = ∏_{i=1}^{4} P(Y^i_t | X_t), where X_t = (L^1_t, L^2_t, θ_t).
• Map the global coordinate frame into the robot's local coordinate frame: let B(a, j, θ, y) = P(Y_t = y | L^1_t = j, L^2_t = a, θ_t = θ); then, for example, P(Y_t = y | L^1_t = j, L^2_t = a, θ_t) = B(a, j, R_{180}(θ_t), y) when the robot's frame is rotated by 180°.
• Learn the appearance of each cell in all 4 directions.

Example
[Figure: a small numeric example of the transition matrices H^2 and H^1, with entries such as 0.1 and 0.9; the matrix layout does not survive extraction]

Inference
• Online filtering: compute P(X_t | y_{1:t}, u_{1:t})
  – The input to the controller is the MLE of the abstract and concrete states
• Offline smoothing: compute P(X_t | y_{1:T}, u_{1:T})
  – Exact inference costs O(D K^{1.5D} T), where D is the number of levels and K is the number of states per level
  – The exponent 1.5D is the size of the largest clique in the DBN: the state nodes at time t-1 plus half of the state nodes at time t
  – Approximate inference (belief propagation) costs O(D K T)

Learning
• Maximum-likelihood parameter estimation using EM
• In the E step, compute P(V_t, Pa(V_t) | y_{1:T}, u_{1:T}) for each node V
• In the M step, normalize the matrix of expected counts; for the abstract horizontal transition matrix, with
  H^2(t, i, e, j) = P(L^2_t = j, L^2_{t-1} = i, E_{t-1} = e | O):
  H^2(i, e, j) = Σ_{t=2}^{T} H^2(t, i, e, j) / Σ_{j'} Σ_{t=2}^{T} H^2(t, i, e, j')

Learning (Cont.)
• Concrete horizontal transition matrix:
  H^1(t, i, θ_{t-1}, a, j) = P(L^1_{t-1} = i, E_{t-1} = no-exit, θ_{t-1}, L^2_t = a, L^1_t = j | O)
• Exit probabilities:
  X(t, j, θ_t, a, e) = P(L^1_t = j, θ_t, L^2_t = a, E_t = e | O)
• Vertical transition vector:
  V(t, e, a, j) = P(E_{t-1} = e, θ_{t-1}, L^2_t = a, L^1_t = j | O), for e ≠ no-exit

Estimating the Observation Model
• Map local (robot-centered) observations into world-centered coordinates
[Figure: the probability of observing y when facing north]

The Hierarchical Models Localize Better
[Figure: localization accuracy before and after training for the state POMDP, the state H-POMDP, and the factored DBN H-POMDP]
• Localization accuracy = (1/T) Σ_{t=1}^{T} b_t(s_t), where b_t(s) = P(X_t = s | y_{1:t}, u_{1:t}) and s_t is the true state

Conclusions
• H-POMDPs can be represented as DBNs
  – Large models can be learned from less data
• Difference from SLAM: SLAM is harder to generalize

Complexity of Inference
• State H-POMDP: O(S T^3)
• Factored DBN H-POMDP: O(S^{1.5} T)
• S = K^D is the number of flat states
[Figure: the state H-POMDP and the factored DBN H-POMDP, with nodes A_t, L^2_t, E_t, L^1_t, θ_t, O_t over two time slices]
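As a concrete reference point for these costs, here is a minimal sketch of one step of online filtering, b_t(s) = P(X_t = s | y_{1:t}, u_{1:t}), written on a flattened state space for clarity; the model and numbers are illustrative assumptions, and the paper's point is that the factored DBN supports the same predict-correct update without flattening.

import numpy as np

def filter_step(b_prev, T_u, O_y):
    """One predict-correct step of online filtering.

    b_prev : belief over the S flat states at time t-1
    T_u    : S x S transition matrix for the executed action, T_u[s, s']
    O_y    : likelihood vector, O_y[s'] = P(y_t | X_t = s')
    """
    b = O_y * (b_prev @ T_u)   # predict with the motion model, weight by evidence
    return b / b.sum()         # normalize to obtain b_t

# Toy 3-state usage (numbers are illustrative only):
b = np.array([1.0, 0.0, 0.0])
T = np.array([[0.1, 0.9, 0.0],
              [0.0, 0.1, 0.9],
              [0.9, 0.0, 0.1]])
O = np.array([0.2, 0.7, 0.1])
print(filter_step(b, T, O))    # belief after one action/observation pair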