Hidden Markov Model
Si
aij
a ji
Sj
Observation : O1,O2, . . .
O1 , O2 , O3 ,, Ot
States in time : q1, q2, . .
.
q1 , q2 , q3 , , qt
All states : s1, s2, . . .
1
Hidden Markov Model (Cont’d)
Discrete Markov Model
P(qt s j | qt 1 si , qt 2 sk ,, q1 s z )
P(qt s j | qt 1 si )
Degree 1 Markov Model
2
Hidden Markov Model (Cont’d)
aij : Transition Probability from Si to Sj ,
1 i, j N
ai , j P(qt s j | qt 1 si )
3
Hidden Markov Model
Example
S1 : The weather is rainy
S2 : The weather is cloudy
S3 : The weather is sunny
rainy cloudy sunny
0.4 0.3 0.3 rainy
A {aij } 0.2 0.6 0.2 cloudy
0.1 0.1 0.8 sunny
4
Hidden Markov Model Example
(Cont’d)
Question 1:How much is this probability:
Sunny-Sunny-Sunny-Rainy-Rainy-Sunny-Cloudy-Cloudy
q1q2 q3 q4 q5 q6 q7 q8
s3 s3 s3 s1s1s3 s2 s2 1.536 10
a33a33a31a11a13a32a22
5
4
Hidden Markov Model Example
(Cont’d)
The probability of being in state i in time t=1
i P(q1 si ),1 i N
Question 2:The probability of staying in a state for d
days if we are in state Si?
d 1
ii
P(si si si s j i ) a (1 aii ) Pi (d )
d Days
6
HMM Components
N : Number Of States
M : Number Of Outputs
A : State Transition Probability Matrix
B : Output Occurrence Probability in
each state
: Primary Occurrence Probability
( A, B, )
: Set of HMM Parameters
7
Three Basic HMM Problems
Given an HMM and a sequence of
observations O,what is the probability P(O | ) ?
Given a model and a sequence of
observations O, what is the most likely state
sequence in the model that produced the
observations?
Given a model and a sequence of
observations O, how should we adjust model
parameters in order to maximize P(O | ) ?
8
First Problem Solution
T
T
t 1
t 1
P(o | q, ) P(ot | qt , ) bqt (ot )
P(q | ) q1aq1q2 aq2 q3 aqT 1qT
We Know That:
And
P( x, y) P( x | y) P( y)
P( x, y | z ) P( x | y, z ) P( y | z )
9
First Problem Solution (Cont’d)
P(o, q | ) P(o | q, ) P(q | )
P(o, q | )
q1 bq1 (o1 )aq1q2 bq2 (o2 ) aqT 1qT bqT (oT )
P(o | ) P(o, q | )
q
q1q2 qT
q1
bq1 (o1 )aq1q2 bq2 (o2 ) aqT 1qT bqT (oT )
T
Account Order : O(2TN )
10
Forward Backward Approach
t (i) P(o1 , o2 ,, ot , qt i | )
Computing t (i )
1) Initialization
1 (i) i bi (o1 ),1 i N
11
Forward Backward Approach
(Cont’d)
2) Induction :
N
t 1 ( j ) [ t (i )aij ]b j (ot 1 )
i 1
1 t T 1,1 j N
3) Termination :
N
P(o | ) T (i)
i 1
2
Account Order : O( N T )
12
Backward Variable Approach
t (i) P(ot 1 , ot 2 ,, oT | qt i, )
1) Initialization
T (i) 1,1 i N
2)Induction
N
t (i ) aijb j (ot 1 ) t 1 ( j )
j 1
t T 1, T 2, ,1And1 j N
13
Second Problem Solution
Finding the most likely state sequence
P (o, qt i | )
t (i ) P (qt i | o, )
P (o | )
P (o, qt i | )
t (i ) t (i )
N
N
P(o, qt i | ) t (i) t (i)
i 1
i 1
Individually most likely state :
q arg max[ t (i)],1 t T ,1 t n N
*
t
14
Viterbi Algorithm
Define :
t (i )
max P[q1 , q2 , , qt 1 , qt i, o1 , o2 , , ot | ]
q1 , q2 , , qt 1
1 i N
P is the most likely state sequence with this
conditions : state i , time t and observation o
15
Viterbi Algorithm (Cont’d)
t 1 ( j ) [max t (i)aij ].b j (ot 1 )
i
1) Initialization
1 (i ) i bi (o1 ),1 i N
1 (i ) 0
t (i ) Is the most likely state before state i
at time t-1
16
Viterbi Algorithm (Cont’d)
2) Recursion
t ( j ) max [ t 1 (i )aij ]b j (ot )
1i N
t ( j ) arg max [ t 1 (i )aij ]
1i N
2 t T ,1 j N
17
Viterbi Algorithm (Cont’d)
3) Termination:
p max [ T (i )]
*
1i N
q arg max [ T (i )]
*
T
1i N
4)Backtracking:
q t 1 (q ), t T 1, T 2,,1
*
t
*
t 1
18
Third Problem Solution
Parameters Estimation using BaumWelch Or Expectation Maximization
(EM) Approach
t (i, j ) P (qt i, qt 1 j | o, )
Define:
P (o, qt i, qt 1 j | )
P (o | )
t (i )aijb j (ot 1 ) t 1 ( j )
N
N
(i)a b
i 1 j 1
t
ij
j
(ot 1 ) t 1 ( j )
19
Third Problem Solution
(Cont’d)
N
t (i) t (i, j )
j 1
T 1
t (i)
t 1
T
: Expectation value of the number of
jumps from state i
Expectation value of the number of
(i, j ):jumps
from state i to state j
t 1
t
20
Third Problem Solution
(Cont’d)
T
( j)
T
aij
(i, j )
t 1
T
t
t (i)
t 1
t
b j (k )
t 1
ot Vk
T
( j)
t 1
t
i 1 (i)
21
Baum Auxiliary Function
Q( | ) P(o, q | ' ) log P(o, q | )
'
q
if : Q( , ) Q( , ' )
P(o | ) P(o | )
'
'
By this approach we will reach to a local
optimum
22
Restrictions Of
Reestimation Formulas
N
i 1
N
a
j 1
i
ij
M
1
1,1 i N
b (k ) 1,1
k 1
j
jN
23
Continuous Observation
Density
We have amounts of a PDF instead of
b j (k ) P(ot Vk | qt j )
We have
M
b j (o) C jk (o, jk , jk ), b j (o)do 1
k 1
Mixture
Coefficients
Average
Variance
24
Continuous Observation
Density
Mixture in HMM
M1|1 M2|1
M1|2 M2|2
M1|3 M2|3
M3|1 M4|1
M3|2 M4|2
M3|3 M4|3
S2
S3
S1
Dominant Mixture:
b j (o) Max C jk (o, jk , jk )
k
25
Continuous Observation
Density (Cont’d)
Model Parameters:
( A, , C , , )
N×N
1×N
N×M N×M×K N×M×K×K
N : Number Of States
M : Number Of Mixtures In Each State
K : Dimension Of Observation Vector
26
Continuous Observation
Density (Cont’d)
T
C jk
t 1
T
M
t
( j, k )
t 1 k 1
T
jk
t 1
T
t
t 1
t
( j, k )
( j , k )ot
t
( j, k )
27
Continuous Observation
Density (Cont’d)
T
jk
( j , k ) (o
t 1
t
t
jk ) (o t jk )
T
( j, k )
t 1
t
t ( j, k ) Probability of event j’th state
and k’th mixture at time t
28
State Duration Modeling
aij
Si
Sj
a ji
Probability of staying d times in state i :
d 1
ii
Pi (d ) a
(1 aii )
29
State Duration Modeling
(Cont’d)
HMM With clear duration
Pi (d )
aij
…….
…….
Si
Pj (d )
a ji
Sj
30
State Duration Modeling
(Cont’d)
HMM consideration with State Duration :
– Selecting q1 i
using i ‘s
– Selecting d1 using Pq (d )
– Selecting Observation Sequence O1 , O2 ,, Od
using bq (O1 , O2 , , Od )
1
1
in practice we assume the following independence:
1
d1
bq1 (O1 , O2 ,, Od1 ) bq1 (t , Ot )
t 1
– Selecting next state q2 j using transition
probabilities aq1q2 . We also have an additional
constraint:
aq1q1 0
31
Training In HMM
Maximum Likelihood (ML)
Maximum Mutual Information (MMI)
Minimum Discrimination Information
(MDI)
32
Training In HMM
Maximum Likelihood (ML)
P(o | 1 )
P(o | 2 )
P(o | 3 )
.
.
.
P ( o | n )
P Maximum[ P(O | V )]
*
r
Observation
Sequence
33
Training In HMM (Cont’d)
Maximum Mutual Information (MMI)
P(O , | )
I (O , ) log
P (O ) P( )
, {v }
Mutual Information
I (O , ) log P(O | v )
v
log P(O | w , w) P( w)
w 1
34
Training In HMM (Cont’d)
Minimum Discrimination Information
(MDI)
Observation :
O (O1 , O2 ,, OT )
Auto correlation : R ( R1 , R2 ,, Rt )
( R, P ) inf I (Q : P )
Q (R)
q (o )
I (Q : P ) q(o) log
do
P (o | )
35
© Copyright 2026 Paperzz