
Representing hierarchical POMDPs as DBNs for multi-scale robot localization
G. Theocharous, K. Murphy, L. Kaelbling
Presented by: Hannaneh Hajishirzi
Outline
• Define H-HMM
– Flattening H-HMM
• Define H-POMDP
– Flattening H-POMDP
• Approximate H-POMDP with DBN
• Inference and Learning in H-POMDP
Introduction
• H-POMDPs represent the state space at multiple levels of abstraction
  – They scale much better to large environments
  – They simplify planning
    • Abstract states are more deterministic
  – They simplify learning
    • The number of free parameters is reduced
Hierarchical HMMs
• A generalization of HMMs for modeling domains with hierarchical structure
  – Application: NLP
• Concrete states emit single observations
• Abstract states emit strings of observations
  – The strings emitted by abstract states are generated by sub-HMMs
Example
• HHMM representing the regular expression a(xy)+b | c(xy)+d
• When a sub-HMM finishes, control returns to the state that called it
HHMM to HMM
• Create a state for every leaf in HHMM
• Flat transition probability between two leaves = sum of P(path) over all paths in the HHMM connecting them (see the sketch below)
• Disadvantages:
  – Flattening loses modularity
  – Learning requires more samples
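As an illustration of what flattening computes (the sum over hierarchy paths above), here is a minimal Python sketch for a 2-level HHMM; the parameter names and the assumption of a single exit probability per concrete state are illustrative, not the slides' notation.

```python
import numpy as np

# Minimal sketch of flattening a 2-level HHMM into an ordinary HMM over its leaves.
# Assumed (illustrative) parameterization:
#   A_top[a, b]    : top-level transition from abstract state a to b after an exit
#   A_sub[a][i, j] : within-sub-HMM transition for abstract state a
#   p_exit[a][i]   : probability that concrete state i of sub-HMM a exits
#   entry[a][j]    : probability of entering sub-HMM a at concrete state j
def flatten(A_top, A_sub, p_exit, entry):
    leaves = [(a, i) for a in range(len(A_sub)) for i in range(A_sub[a].shape[0])]
    flat = np.zeros((len(leaves), len(leaves)))
    for m, (a, i) in enumerate(leaves):
        for n, (b, j) in enumerate(leaves):
            p = 0.0
            if a == b:
                # path that stays inside sub-HMM a
                p += (1.0 - p_exit[a][i]) * A_sub[a][i, j]
            # path that exits sub-HMM a, transitions a -> b at the top level,
            # and enters sub-HMM b at concrete state j
            p += p_exit[a][i] * A_top[a, b] * entry[b][j]
            flat[m, n] = p
    return leaves, flat
```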
Representing HHMMs as DBNs
• Q^d_t : state at level d at time t
• F^d_t = 1 if the HMM at level d has finished at time t
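A rough sketch of the per-time-slice variables this DBN encoding introduces (notation only; the CPDs tying the Q and F nodes together are not reproduced here):

```python
from dataclasses import dataclass

# Illustrative container for one time slice of the HHMM-as-DBN encoding.
@dataclass
class Slice:
    Q: list[int]    # Q[d] = state at level d (0 = top/abstract ... D-1 = leaf)
    F: list[bool]   # F[d] = True if the HMM at level d has finished at this step
```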
H-POMDPs
• HHMMs with inputs (actions) and a reward function
• Problems:
  – Planning: find a mapping from belief states to actions
  – Filtering: compute the belief state P(X_t | y_1:t, u_1:t) online
  – Smoothing: compute P(X_t | y_1:T, u_1:T) offline
  – Learning: find the MLE of the model parameters
H-POMDP for Robot Navigation
• Flat model
  – Robot position: X_t (1..10)
• Hierarchical model
  – Abstract state: X^1_t (1..4)
  – Concrete state: X^2_t (1..3)
• Observation: Y_t (4 bits)
• In this paper we ignore the problem of how to choose actions
State Transition Diagram for a 2-H-POMDP
[Figure: sample path through the 2-level H-POMDP state transition diagram]
State Transition Diagram for the Corridor Environment
[Figure: abstract states, each containing entry states, concrete states, and exit states]
Flattening H-POMDPs
• Advantages of H-POMDP over corresponding POMDP:
– Learning is easier: Learn sub-models
– Planning is easier: Reason in terms of “macro” actions
Dynamic Bayesian Networks
[Figure: a state POMDP over the joint state vs. a factored DBN POMDP with separate location (L_t) and orientation (Θ_t) nodes and an action node U_t]
• State POMDP: # of parameters = 12 × 9 = 108
• Factored DBN POMDP: # of parameters = 40
Representing H-POMDPs as DBNs
[Figure, shown over several build slides: the state H-POMDP for the WEST/EAST corridor and the corresponding factored DBN H-POMDP, with nodes U_t, L^2_t, E_t, L^1_t, Θ_t, Y_t in each time slice]
H-POMDPs as DBNs
• L^2_t : abstract location
• L^1_t : concrete location
• Θ_t : orientation
• Y_t : observation
• U_t : action node
• E_t : exit node (5 values: no-exit, s-, n-, l-, r-exit)
Transition Model
P(L^2_t = j | L^2_{t-1} = i, E_{t-1} = e) =
    δ(i, j)        if e = no-exit
    H^2(i, e, j)   otherwise
H^2 : abstract horizontal transition matrix
Transition Model
P(L^2_t = j | L^2_{t-1} = i, E_{t-1} = e) =
    δ(i, j)        if e = no-exit
    H^2(i, e, j)   otherwise

P(E_t = e | L^1_t = j, Θ_t, L^2_t = a) = X(j, a, Θ_t, e)
X : probability of entering exit state e

P(L^1_t = j | L^1_{t-1} = i, E_{t-1} = e, Θ_{t-1}, L^2_t = a) =
    H^1(i, Θ_{t-1}, a, j)   if e = no-exit
    V^1(e, a, j)            otherwise
H^1 : concrete horizontal transition matrix
V^1 : concrete vertical entry vector
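To make the three conditional distributions concrete, here is a minimal Python sketch; the array shapes, the tables as numpy-style arrays, and the value 0 for no-exit are assumptions for illustration, not the paper's code.

```python
NO_EXIT = 0   # assumed encoding of the exit node: 0 = no-exit, 1..4 = s/n/l/r-exit

# Assumed array shapes (illustrative only):
#   H2[i, e, j]     : abstract horizontal transition matrix, per exit type e
#   X[j, a, th, e]  : exit probabilities for concrete state j in abstract state a
#   H1[i, th, a, j] : concrete horizontal transition within abstract state a
#   V1[e, a, j]     : concrete vertical entry vector after an e-exit

def p_abstract(j, i, e, H2):
    """P(L^2_t = j | L^2_{t-1} = i, E_{t-1} = e)."""
    if e == NO_EXIT:
        return 1.0 if j == i else 0.0     # abstract state is unchanged
    return H2[i, e, j]

def p_exit(e, j, a, th, X):
    """P(E_t = e | L^1_t = j, Theta_t = th, L^2_t = a)."""
    return X[j, a, th, e]

def p_concrete(j, i, e, th_prev, a, H1, V1):
    """P(L^1_t = j | L^1_{t-1} = i, E_{t-1} = e, Theta_{t-1} = th_prev, L^2_t = a)."""
    if e == NO_EXIT:
        return H1[i, th_prev, a, j]       # keep moving within the current sub-model
    return V1[e, a, j]                    # enter the (possibly new) sub-model
```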
Observation Model
• Probability of seeing a wall or an opening on each of the 4 sides of the robot
• Naïve Bayes assumption:
  P(Y_t | X_t) = ∏_{i=1..4} P(Y^i_t | X_t),   where X_t = (L^1_t, L^2_t, Θ_t)
• Map the global coordinate frame to the robot's local coordinate frame:
  B(a, j, φ, y) = P(Y^φ_t = y | L^1_t = j, L^2_t = a)
  Then, for the robot's back side (the heading rotated by 180°):
  P(Y^B_t = y | L^1_t = j, L^2_t = a, Θ_t) = B(a, j, R(Θ_t, 180°), y)
  → Learn the appearance of the cell in all directions
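A minimal sketch of this observation model; the appearance table B[a, j, phi, y] and the side/direction conventions in the comments are assumptions for illustration.

```python
# Assumptions (not from the paper):
#   B[a, j, phi, y] : learned appearance of concrete cell j in abstract state a,
#                     for world-centered direction phi (0=N, 1=E, 2=S, 3=W),
#                     with y in {0 = opening, 1 = wall}
#   y_bits          : 4-bit observation indexed by robot-relative side
#                     (0 = front, 1 = right, 2 = back, 3 = left)
#   theta           : robot heading as a world-centered direction index

def p_observation(y_bits, a, j, theta, B):
    """P(Y_t = y_bits | L^1_t = j, L^2_t = a, Theta_t = theta), naive Bayes."""
    p = 1.0
    for side, y in enumerate(y_bits):
        # rotate the robot-relative side into the world frame; e.g. the back
        # side (side = 2) is the heading rotated by 180 degrees
        phi = (theta + side) % 4
        p *= B[a, j, phi, y]
    return p
```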
Example
H^2 = | .1  .9 |
      | .9  .1 |

H^1(·, 1) = | .1  .9   0 |
            |  0  .1  .9 |
            |  0   0   1 |
Inference
• Online filtering: P(X_t | y_1:t, u_1:t)
  – Input to the controller: the MLE of the abstract and concrete states
• Offline smoothing: P(X_t | y_1:T, u_1:T)
  – Exact: O(D K^1.5D T), where D = # of levels (dimensions) and K = # of states at each level
  – 1.5D = size of the largest clique in the DBN = the state nodes at t-1 plus half of the state nodes at t
  – Approximation (belief propagation): O(D K T)
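For intuition, a minimal sketch of one filtering update on the flattened joint state X = (L^2, L^1, Θ, E); the joint transition matrix T_u and the likelihood vector lik are assumptions about how the slides' CPDs could be combined, not the paper's code.

```python
import numpy as np

# T_u[x, x'] : joint transition matrix under action u (it would be assembled
#              from H^2, H^1, V^1, X and the orientation dynamics)
# lik[x']    : observation likelihood P(y_t | x')

def filter_step(belief, T_u, lik):
    predicted = T_u.T @ belief       # predict: sum_x P(x' | x, u) * b(x)
    new_belief = lik * predicted     # correct with the observation likelihood
    return new_belief / new_belief.sum()
```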
Learning
• Maximum likelihood parameter estimation using EM
• In the E step, compute  Σ_t P(V, Pa(V) | y_1:T, u_1:T)  for each node V
• In the M step, normalize the matrices of expected counts, e.g. for the abstract horizontal transition matrix:
  H^2(t, i, e, j) = P(L^2_t = j, L^2_{t-1} = i, E_{t-1} = e | O)
  H^2(i, e, j) = Σ_{t=2..T} H^2(t, i, e, j) / Σ_{j'} Σ_{t=2..T} H^2(t, i, e, j')
Learning (Cont.)
Concrete horizontal transition matrix:
  H^1(t, i, Θ_{t-1}, a, j) = P(L^1_{t-1} = i, E_{t-1} = no-exit, Θ_{t-1}, L^2_t = a, L^1_t = j | O)
Exit probabilities:
  X(t, j, Θ_t, a, e) = P(L^1_t = j, Θ_t, L^2_t = a, E_t = e | O)
Vertical transition vector:
  V(t, e, a, j) = P(E_{t-1} = e, Θ_{t-1}, L^2_t = a, L^1_t = j | O),   e ≠ no-exit
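As an illustration of the M step, a minimal Python sketch for the abstract horizontal transition matrix; it assumes the E step (smoothing in the DBN) has already produced the pairwise marginals xi2[t, i, e, j], and the array name and shape are illustrative.

```python
import numpy as np

# xi2 has shape (T-1, n_abstract, n_exit, n_abstract):
#   xi2[t, i, e, j] = P(L^2_t = j, L^2_{t-1} = i, E_{t-1} = e | O)
def m_step_H2(xi2):
    counts = xi2.sum(axis=0)                       # sum the expected counts over t
    totals = counts.sum(axis=-1, keepdims=True)    # sum over j' for each (i, e)
    return counts / np.maximum(totals, 1e-12)      # H^2[i, e, j], rows sum to 1
```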
Estimating the Observation Model
• Map local (robot-centered) observations into world-centered observations
[Figure: probability of observing y when facing North]
The Hierarchical Models Localize Better
[Figure: localization accuracy before and during training for the factored DBN H-POMDP, the state H-POMDP, and the state POMDP]
localization accuracy = Σ_{t=1..T} b_t(s),   where b_t(s) = P(X_t = s | y_1:t, u_1:t) and s is the true state
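A minimal sketch of this accuracy metric, assuming beliefs[t] is the filtered belief vector at time t and true_states[t] the ground-truth (flattened) state:

```python
# Probability mass assigned to the true state, summed over the run.
def localization_accuracy(beliefs, true_states):
    return sum(b[s] for b, s in zip(beliefs, true_states))
```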
Conclusions
• Represent H-POMDPs with DBNs
– Learn large models with less data
• Difference from SLAM:
  – SLAM is harder to generalize
Complexity of Inference
• Number of states: S = K^D
• State H-POMDP: O(S T^3)
• Factored DBN H-POMDP: O(S^1.5 T)
[Figure: the factored DBN H-POMDP for the WEST/EAST corridor, with nodes A_t, L^2_t, E_t, L^1_t, Θ_t, O_t per time slice]