
Hidden Markov Models
First Story!
Majid Hajiloo, Aria Khademi
Outline
This lecture is devoted to the problem of inference in probabilistic
graphical models (aka Bayesian nets). The goal is for you to:
• Practice marginalization and conditioning
• Practice Bayes rule
• Learn the HMM representation (model)
• Learn how to do prediction and filtering using HMMs
Assistive Technology
Assume you have a little robot that is trying to estimate the
posterior probability that you are happy or sad
Assistive Technology
Given that the robot has observed whether you are:
• watching Game of Thrones (w)
• sleeping (s)
• crying (c)
• Facebooking (f)
Let the unknown state be X = h if you are happy and X = s if you are sad.
Let Y denote the observation, which can be w, s, c, or f.
Assistive Technology
We want to answer queries, such as:
• P(X = h | Y = f) = ?
• P(X = s | Y = c) = ?
Assistive Technology
Assume that an expert has compiled the following prior and likelihood models (states ordered [s, h], observations ordered [w, s, c, f]):

P(X) = [0.2  0.8]

P(Y | X):
          Y=w   Y=s   Y=c   Y=f
  X = s   0.1   0.3   0.5   0.1
  X = h   0.4   0.4   0.2   0

For example:

P(X = h | Y = w) = P(Y = w | X = h) · P(X = h) / [P(Y = w | X = h) · P(X = h) + P(Y = w | X = s) · P(X = s)]
                 = (0.4 × 0.8) / (0.4 × 0.8 + 0.1 × 0.2) ≈ 0.94
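The Bayes-rule calculation above is easy to check numerically. A minimal Python sketch (the table values come from the slides; the dictionary layout and the function name are my own):

```python
# Prior over mood and the likelihood table P(Y | X) from the slides.
prior = {"s": 0.2, "h": 0.8}
likelihood = {
    "s": {"w": 0.1, "s": 0.3, "c": 0.5, "f": 0.1},  # P(Y | X = s)
    "h": {"w": 0.4, "s": 0.4, "c": 0.2, "f": 0.0},  # P(Y | X = h)
}

def posterior(x, y):
    """P(X = x | Y = y) by Bayes rule: likelihood times prior, normalized."""
    evidence = sum(likelihood[xp][y] * prior[xp] for xp in prior)
    return likelihood[x][y] * prior[x] / evidence

print(round(posterior("h", "w"), 2))  # 0.32 / 0.34 -> 0.94
```

The same function answers the other queries on the earlier slide, e.g. `posterior("s", "c")`.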
Assistive Technology
But what if, instead of an absolute prior, we have a temporal (transition) prior? That is, we assume a dynamical system:
• Transition model:
  P(X_2 | X_1) = [0.9  0.1]
                 [0    1  ]
• For a strongly persistent sad state we might have, e.g., P(X_2 = s | X_1 = s) = 0.99
• Initial distribution: P(X_0) = [0.2  0.8]
Given a history of observations, say Y_1 = w, Y_2 = f, Y_3 = c, we want to compute the posterior probability that you are happy at step 3. That is, we want to estimate:
P(X_3 = h | Y_1 = w, Y_2 = f, Y_3 = c)
Dynamic Model
In general, we assume we have an initial distribution P(X_0), a transition model P(X_t | X_{t-1}), and an observation model P(Y_t | X_t).

[Graphical model: hidden chain X_0 → X_1 → X_2, with emissions X_1 → Y_1 and X_2 → Y_2]

For 2 time steps:

P(X_0, X_1, X_2, Y_1, Y_2) = P(X_0) · P(X_1 | X_0) · P(X_2 | X_1) · P(Y_1 | X_1) · P(Y_2 | X_2)
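The factorization can be written directly as code. A sketch, assuming dictionaries for the three model components; the transition numbers follow the earlier slide, read with both axes ordered [s, h] (that labeling is my assumption, since the slide does not show axis labels):

```python
# P(X0), P(X_t | X_{t-1}), and P(Y_t | X_t) as plain dictionaries.
init = {"s": 0.2, "h": 0.8}
trans = {"s": {"s": 0.9, "h": 0.1},   # assumed labeling of the 2x2 matrix
         "h": {"s": 0.0, "h": 1.0}}
emit = {"s": {"w": 0.1, "s": 0.3, "c": 0.5, "f": 0.1},
        "h": {"w": 0.4, "s": 0.4, "c": 0.2, "f": 0.0}}

def joint(x0, x1, x2, y1, y2):
    """P(X0, X1, X2, Y1, Y2) via the two-time-step factorization above."""
    return (init[x0] * trans[x0][x1] * trans[x1][x2]
            * emit[x1][y1] * emit[x2][y2])

print(round(joint("h", "h", "h", "w", "w"), 3))  # 0.8 * 1 * 1 * 0.4 * 0.4 = 0.128
```

Any query on the model (marginals, conditionals) can in principle be answered by summing this joint over the unobserved variables; the recursion on the next slides does this efficiently.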
Optimal Filtering
Our goal is to compute, for all t, the posterior (aka filtering)
distribution:
P(X_t | Y_{1:t}) = P(X_t | Y_1, Y_2, …, Y_t)

We derive a recursion to compute P(X_t | Y_{1:t}), assuming that we have as input P(X_{t-1} | Y_{1:t-1}). The recursion has two steps:
1. Prediction
2. Bayesian update
Prediction
First, we compute the state prediction P(X_t | Y_{1:t-1}):

P(X_t | Y_{1:t-1}) = Σ_{X_{t-1} ∈ {h,s}} P(X_t, X_{t-1} | Y_{1:t-1})
                   = Σ_{X_{t-1}} P(X_t | X_{t-1}, Y_{1:t-1}) · P(X_{t-1} | Y_{1:t-1})
                   = Σ_{X_{t-1}} P(X_t | X_{t-1}) · P(X_{t-1} | Y_{1:t-1})

The last step uses the Markov property: given X_{t-1}, the state X_t is independent of the past observations Y_{1:t-1}.
Bayes Update
Second, we use Bayes rule to obtain P(X_t | Y_{1:t}):

P(X_t | Y_{1:t}) = P(X_t | Y_t, Y_{1:t-1})
                 = P(Y_t | X_t, Y_{1:t-1}) · P(X_t | Y_{1:t-1}) / Σ_{X_t} P(Y_t | X_t, Y_{1:t-1}) · P(X_t | Y_{1:t-1})
                 = P(Y_t | X_t) · P(X_t | Y_{1:t-1}) / Σ_{X_t} P(Y_t | X_t) · P(X_t | Y_{1:t-1})

The last step uses the fact that, given X_t, the observation Y_t is conditionally independent of the past observations Y_{1:t-1}.
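The two-step recursion can be sketched as a pair of functions on a two-entry belief vector ordered [s, h] (the function and variable names are mine):

```python
# trans[i][j] = P(X_t = j | X_{t-1} = i); belief and lik are ordered [s, h].

def predict(belief, trans):
    """Prediction step: P(X_t | Y_{1:t-1}), summing over X_{t-1}."""
    n = len(belief)
    return [sum(trans[i][j] * belief[i] for i in range(n)) for j in range(n)]

def update(belief_pred, lik):
    """Bayes update with lik[j] = P(Y_t = y | X_t = j); normalize at the end."""
    unnorm = [lik[j] * belief_pred[j] for j in range(len(belief_pred))]
    z = sum(unnorm)  # denominator: sum over X_t of P(Y_t | X_t) P(X_t | Y_{1:t-1})
    return [u / z for u in unnorm]
```

Filtering a whole observation sequence is then just alternating `predict` and `update`, starting from the initial distribution P(X_0).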
HMM Algorithm
Example. Assume (states ordered [s, h], observations ordered [w, s, c, f]):

P(X_{t-1} | Y_{1:t-1}) = [0.2  0.8]

P(X_t | X_{t-1}) = [0.9  0.1]
                   [0    1  ]

P(Y_t | X_t) = [0.1  0.3  0.5  0.1]
               [0.4  0.4  0.2  0  ]

• Prediction:

P(X_t = h | Y_{1:t-1}) = Σ_{X_{t-1}} P(X_t = h | X_{t-1}) · P(X_{t-1} | Y_{1:t-1})
  = P(X_t = h | X_{t-1} = s) · P(X_{t-1} = s | Y_{1:t-1}) + P(X_t = h | X_{t-1} = h) · P(X_{t-1} = h | Y_{1:t-1})
  = 0 × 0.2 + 1 × 0.8 = 0.8

• Bayes update: if we observe Y_t = w, the Bayes update yields:

P(X_t = h | Y_t = w, Y_{1:t-1})
  = P(Y_t = w | X_t = h) · P(X_t = h | Y_{1:t-1}) / [P(Y_t = w | X_t = h) · P(X_t = h | Y_{1:t-1}) + P(Y_t = w | X_t = s) · P(X_t = s | Y_{1:t-1})]
  = (0.4 × 0.8) / (0.4 × 0.8 + 0.1 × 0.2) ≈ 0.94
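Both numbers in this example can be reproduced in a few lines. Note the transition entries below are read directly off the worked arithmetic above (P(X_t = h | X_{t-1} = s) = 0 and P(X_t = h | X_{t-1} = h) = 1, which forces P(X_t = s | X_{t-1} = s) = 1 and P(X_t = s | X_{t-1} = h) = 0); vectors are ordered [s, h]:

```python
# One filtering step with the slide's numbers.
belief = [0.2, 0.8]          # P(X_{t-1} | Y_{1:t-1}), ordered [s, h]
trans = [[1.0, 0.0],         # trans[i][j] = P(X_t = j | X_{t-1} = i),
         [0.0, 1.0]]         # entries taken from the worked computation
emit_w = [0.1, 0.4]          # P(Y_t = w | X_t = s), P(Y_t = w | X_t = h)

# Prediction: P(X_t | Y_{1:t-1})
pred = [sum(trans[i][j] * belief[i] for i in range(2)) for j in range(2)]
print(pred[1])               # 0.8

# Bayes update after observing Y_t = w
unnorm = [emit_w[j] * pred[j] for j in range(2)]
post = [u / sum(unnorm) for u in unnorm]
print(round(post[1], 2))     # 0.94
```

With this (deterministic) transition the prediction leaves the belief unchanged, which is why the update here matches the static-prior calculation from the earlier slide exactly.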
The END