Hidden Markov Models
By
Manish Shrivastava
Hidden Markov Model
A colored-ball-choosing example:

Urn 1: 30 Red, 50 Green, 20 Blue
Urn 2: 10 Red, 40 Green, 50 Blue
Urn 3: 60 Red, 10 Green, 30 Blue

Probability of transition to another urn after picking a ball:

        U1    U2    U3
U1     0.1   0.4   0.5
U2     0.6   0.2   0.2
U3     0.3   0.4   0.3
Hidden Markov Model

Set of states: S, where |S| = N
Output alphabet: V
Transition probabilities: A = {a_ij}
Emission probabilities: B = {b_j(o_k)}
Initial state probabilities: π

λ = (A, B, π)
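A minimal sketch of λ = (A, B, π) for the urn example above, in Python with NumPy. The variable names are illustrative, and π is not given on the slides, so a uniform start is assumed here:

import numpy as np

# States S = {U1, U2, U3}; output alphabet V = {Red, Green, Blue}
states = ["U1", "U2", "U3"]
symbols = ["R", "G", "B"]

# Transition probabilities A = {a_ij}, row i -> column j, from the table above
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])

# Emission probabilities B = {b_j(o_k)}: ball counts normalized per urn
B = np.array([[0.30, 0.50, 0.20],   # Urn 1: 30 R, 50 G, 20 B
              [0.10, 0.40, 0.50],   # Urn 2: 10 R, 40 G, 50 B
              [0.60, 0.10, 0.30]])  # Urn 3: 60 R, 10 G, 30 B

# Initial state probabilities π: assumed uniform (not specified on the slides)
pi = np.ones(3) / 3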
Three Basic Problems of HMM

1. Given observation sequence O = {o_1 … o_T}, efficiently estimate P(O | λ)

2. Given observation sequence O = {o_1 … o_T}, get the best Q = {q_1 … q_T}, i.e., maximize P(Q | O, λ)

3. How to adjust λ = (A, B, π) to best maximize P(O | λ), i.e., re-estimate λ
Solutions

Problem 1: Likelihood of a sequence
  Forward Procedure
  Backward Procedure

Problem 2: Best state sequence
  Viterbi Algorithm

Problem 3: Re-estimation
  Baum-Welch (Forward-Backward Algorithm)
Problem 1

Consider O = {o_1 … o_T} and Q = {q_1 … q_T}. Then, since each observation depends only on the state that emitted it,

P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1) \cdots b_{q_T}(o_T)

And

P(Q \mid \lambda) = \pi_{q_1} a_{q_1 q_2} \cdots a_{q_{T-1} q_T}

We know

P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda)

Then,

P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda) \, P(Q \mid \lambda)
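To make the two factors concrete, here is a quick check for one fixed state sequence, reusing the urn parameters A, B, pi from the sketch above (the particular O and Q are arbitrary illustrations):

# O = (Red, Green, Blue) as symbol indices; Q = (U1, U3, U2) as state indices
O = [0, 1, 2]
Q = [0, 2, 1]

p_O_given_Q = B[Q[0], O[0]] * B[Q[1], O[1]] * B[Q[2], O[2]]   # product of b_q(o)
p_Q = pi[Q[0]] * A[Q[0], Q[1]] * A[Q[1], Q[2]]                # pi * transitions

# One term of the sum defining P(O | lambda)
print(p_O_given_Q * p_Q)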
Problem 1
P(O |  ) 

q1
aq1q2 ....aqT 1qT bq1 (o1 ). ... bqT (oT )
q1 ... qT



Order 2TNT
Definitely not efficient!!
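A naive enumeration sketch of that sum, reusing A, B, pi from above; it works for tiny T but is hopeless at realistic lengths, which is exactly the point:

from itertools import product

def likelihood_brute_force(O, A, B, pi):
    """P(O | lambda) by summing over every one of the N^T state sequences."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):        # all N^T sequences
        p = pi[Q[0]] * B[Q[0], O[0]]
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
        total += p
    return total

print(likelihood_brute_force([0, 1, 2], A, B, pi))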
Is there a method to tackle this problem? Yes.

Forward or Backward Procedure
Forward Procedure

Define the forward variable as

\alpha_t(i) = P(o_1 \ldots o_t, q_t = S_i \mid \lambda)

the probability that the state at position t is S_i, and of the partial observation o_1 … o_t, given the model λ.

Forward Step:

Initialization: \alpha_1(i) = \pi_i \, b_i(o_1)

Induction: \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] b_j(o_{t+1})

Termination: P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)
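A sketch of the forward procedure following these equations, reusing numpy and the A, B, pi defined earlier (0-indexed, so alpha[t] holds what the slides call alpha_{t+1}):

def forward(O, A, B, pi):
    """alpha[t, i]: probability of o_1..o_{t+1} with state S_i at position t+1."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                          # initialization
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # induction
    return alpha

alpha = forward([0, 1, 2], A, B, pi)
print(alpha[-1].sum())   # termination: P(O | lambda), matches the brute force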
Backward Procedure

Define the backward variable as

\beta_t(i) = P(o_{t+1} \ldots o_T \mid q_t = S_i, \lambda)

Backward Step:

Initialization: \beta_T(i) = 1

Induction: \beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)
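And the matching backward pass, under the same assumed names:

def backward(O, A, B):
    """beta[t, i]: probability of o_{t+2}..o_T given state S_i at position t+1."""
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))                           # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1]) # induction
    return beta

beta = backward([0, 1, 2], A, B)
print((pi * B[:, 0] * beta[0]).sum())   # P(O | lambda) again, from the other end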
Forward-Backward Procedure

Benefit

Order N^2 T, as compared to 2T N^T for the direct computation. For N = 3 and T = 10, that is about 90 operations versus roughly 1.2 million.

Only the Forward or the Backward procedure is needed for Problem 1.
Problem 2

Given observation sequence O = {o_1 … o_T}, get the "best" Q = {q_1 … q_T}.

Solution: two notions of "best":

1. Best state individually likely at a position t
2. Best state sequence given all the previously observed states and observations

The second is what the Viterbi Algorithm computes.
Viterbi Algorithm

Define

\delta_t(i) = \max_{q_1 \ldots q_{t-1}} P(q_1 \ldots q_{t-1}, q_t = S_i, o_1 \ldots o_t \mid \lambda)

i.e., the sequence which has the best joint probability so far.

By induction, we have

\delta_{t+1}(j) = \Big[ \max_i \delta_t(i) \, a_{ij} \Big] b_j(o_{t+1})

Recording the maximizing i in a backpointer \psi_{t+1}(j) = \arg\max_i \delta_t(i) \, a_{ij} allows the best state sequence to be recovered by backtracking from \arg\max_i \delta_T(i).
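A sketch of the Viterbi recurrence with backtracking, reusing the assumed parameter names from the earlier sketches:

def viterbi(O, A, B, pi):
    """Most likely state sequence Q* and its joint probability with O."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)               # backpointers
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A          # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    Q = [int(delta[-1].argmax())]                   # best final state
    for t in range(T - 1, 0, -1):
        Q.append(int(psi[t, Q[-1]]))                # follow backpointers
    return Q[::-1], float(delta[-1].max())

print(viterbi([0, 1, 2], A, B, pi))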
Problem 3

How to adjust λ = (A, B, π) to best maximize P(O | λ): re-estimate λ.

Solution:

To re-estimate (iteratively update and improve) the HMM parameters A, B, π, use the Baum-Welch algorithm.
Baum-Welch Algorithm

Define

\xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)

Putting in the forward and backward variables,

\xi_t(i, j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{P(O \mid \lambda)}

Define

\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)

Then the expected number of transitions from S_i is \sum_{t=1}^{T-1} \gamma_t(i), and the expected number of transitions from S_i to S_j is \sum_{t=1}^{T-1} \xi_t(i, j).
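One re-estimation pass built from ξ and γ, reusing the forward and backward sketches above. The update for a_ij is expected transitions from S_i to S_j over expected transitions from S_i; the emission and initial-state updates shown are the standard Baum-Welch ones:

def baum_welch_step(O, A, B, pi):
    """One Baum-Welch update of (A, B, pi) from a single observation sequence."""
    T = len(O)
    alpha, beta = forward(O, A, B, pi), backward(O, A, B)
    p_O = alpha[-1].sum()                               # P(O | lambda)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / p_O
    gamma = xi.sum(axis=2)                              # gamma_t(i) for t < T
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]
    # For emissions, gamma is also needed at t = T
    gamma_full = np.vstack([gamma, alpha[-1] * beta[-1] / p_O])
    obs = np.array(O)
    new_B = np.stack([gamma_full[obs == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1)
    new_B /= gamma_full.sum(axis=0)[:, None]
    return new_A, new_B, new_pi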
Baum-Welch Algorithm

Baum et al. have proved that the above equations lead to a model as good as or better than the previous one.
Questions?