Hidden Markov Models
A Hidden Markov Model consists of:
1. A sequence of states $\{X_t \mid t \in T\} = \{X_1, X_2, \ldots, X_T\}$, and
2. A sequence of observations $\{Y_t \mid t \in T\} = \{Y_1, Y_2, \ldots, Y_T\}$.
Some basic problems, from the observations $\{Y_1, Y_2, \ldots, Y_T\}$:
1. Determine the sequence of states $\{X_1, X_2, \ldots, X_T\}$ (assuming the model):
   - the Viterbi path, and
   - the state probabilities given the observations $\{Y_1, Y_2, \ldots, Y_T\}$.
2. Determine (or estimate) the parameters of the stochastic process that is generating the states and the observations.
Computing Likelihood
Let $p_{ij} = P[X_{t+1} = j \mid X_t = i]$ and $\mathbf{P} = (p_{ij})$ = the Markov model transition matrix.
Let $p_i^0 = P[X_1 = i]$ and $\mathbf{p}^0 = (p_1^0, p_2^0, \ldots, p_M^0)$ = the initial distribution over the states. Then

$$P[X_1 = i_1, X_2 = i_2, \ldots, X_T = i_T] = p_{i_1}^0\, p_{i_1 i_2}\, p_{i_2 i_3} \cdots p_{i_{T-1} i_T}$$
Computing Likelihood
$$P[X_1 = i_1, X_2 = i_2, \ldots, X_T = i_T, Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T] = P[X = i, Y = y]$$
$$= p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}$$

Therefore

$$P[Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T] = P[Y = y]
= \sum_{i_1, i_2, \ldots, i_T} p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}
= L(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta}),$$

where $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_M)$.
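As an illustrative sketch of this sum over all $M^T$ state paths, the following Python fragment evaluates the likelihood by brute-force enumeration for a tiny discrete-emission example (the array names `p0`, `P`, `theta` and the numerical values are illustrative, not from the notes); it is only practical for very small $M$ and $T$.

```python
import itertools
import numpy as np

def likelihood_bruteforce(p0, P, theta, y):
    """Sum p0[i1]*theta[i1,y1]*P[i1,i2]*theta[i2,y2]*... over all M^T state paths.

    p0    : (M,)   initial state distribution
    P     : (M, M) transition matrix
    theta : (M, K) emission probabilities theta[i, y] = P[Y_t = y | X_t = i]
    y     : length-T sequence of observed symbols (integers 0..K-1)
    """
    M, T = len(p0), len(y)
    total = 0.0
    for path in itertools.product(range(M), repeat=T):
        prob = p0[path[0]] * theta[path[0], y[0]]
        for t in range(1, T):
            prob *= P[path[t - 1], path[t]] * theta[path[t], y[t]]
        total += prob
    return total

# Tiny 2-state, 2-symbol example (illustrative numbers only)
p0 = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
theta = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
y = [0, 1, 1, 0]
print(likelihood_bruteforce(p0, P, theta, y))
```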
In the case when $Y_1, Y_2, \ldots, Y_T$ are continuous random variables or continuous random vectors, let $f(y \mid \theta_i)$ denote the conditional density of $Y_t$ given $X_t = i$. Then the joint density of $Y_1, Y_2, \ldots, Y_T$ is given by

$$L(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta}) = f(y_1, y_2, \ldots, y_T) = f(\mathbf{y})
= \sum_{i_1, i_2, \ldots, i_T} p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}$$

where $\theta_{i_t y_t} = f(y_t \mid \theta_{i_t})$.
Efficient Methods for Computing Likelihood
The Forward Method
1. $\alpha_1(i_1) = p_{i_1}^0\, \theta_{i_1 y_1}$
2. $\alpha_{t+1}(i_{t+1}) = \sum_{i_t} \alpha_t(i_t)\, p_{i_t i_{t+1}}\, \theta_{i_{t+1} y_{t+1}}$
3. Then $P[Y = y] = \sum_{i_T} \alpha_T(i_T)$
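A minimal Python sketch of this forward recursion, using the same illustrative array layout (`p0`, `P`, `theta`) as the brute-force example above; on that small example it reproduces the brute-force likelihood in $O(TM^2)$ time.

```python
import numpy as np

def forward(p0, P, theta, y):
    """Forward method: alpha[t, i] = P[Y_1..Y_t = y_1..y_t, X_t = i]."""
    T, M = len(y), len(p0)
    alpha = np.zeros((T, M))
    alpha[0] = p0 * theta[:, y[0]]                   # step 1
    for t in range(T - 1):                            # step 2
        alpha[t + 1] = (alpha[t] @ P) * theta[:, y[t + 1]]
    return alpha, alpha[-1].sum()                     # step 3: P[Y = y]

# forward(p0, P, theta, y)[1] should match likelihood_bruteforce(p0, P, theta, y)
```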
The Backward Procedure
1. $\alpha_{T-1}^*(i_{T-1}) = \sum_{i_T} p_{i_{T-1} i_T}\, \theta_{i_T y_T}$
2. Then $\alpha_t^*(i_t) = \sum_{i_{t+1}} p_{i_t i_{t+1}}\, \theta_{i_{t+1} y_{t+1}}\, \alpha_{t+1}^*(i_{t+1})$
3. and $P[Y = y] = \sum_{i_1} p_{i_1}^0\, \theta_{i_1 y_1}\, \alpha_1^*(i_1)$
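A matching Python sketch of the backward recursion, under the same assumed array layout; the last row is initialized to 1 by convention so that step 1 above falls out of the general recursion.

```python
import numpy as np

def backward(p0, P, theta, y):
    """Backward procedure: alpha_star[t, i] = P[Y_{t+1}..Y_T = y_{t+1}..y_T | X_t = i]."""
    T, M = len(y), len(p0)
    alpha_star = np.ones((T, M))                        # alpha_star[T-1] = 1 by convention
    for t in range(T - 2, -1, -1):                      # steps 1 and 2
        alpha_star[t] = P @ (theta[:, y[t + 1]] * alpha_star[t + 1])
    like = np.sum(p0 * theta[:, y[0]] * alpha_star[0])  # step 3: P[Y = y]
    return alpha_star, like
```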
Prediction of states from the observations and the model:

$$P[X_t = i_t \mid Y = y] = \frac{\alpha_t(i_t)\, \alpha_t^*(i_t)}{\sum_{i_T} \alpha_T(i_T)}$$
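Combining the two sketches above gives the posterior state probabilities (a short fragment that assumes the `forward` and `backward` functions defined earlier):

```python
def state_posteriors(p0, P, theta, y):
    """gamma[t, i] = P[X_t = i | Y = y], using the forward and backward sketches above."""
    alpha, like = forward(p0, P, theta, y)
    alpha_star, _ = backward(p0, P, theta, y)
    return alpha * alpha_star / like      # each row sums to 1
```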
The Viterbi Algorithm
(Viterbi Paths)
The Viterbi path is the sequence of states $X_1 = i_1, X_2 = i_2, \ldots, X_T = i_T$ that maximizes

$$P[X_1 = i_1, \ldots, X_T = i_T, Y_1 = y_1, \ldots, Y_T = y_T]
= p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}$$

for a given set of observations $Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T$.
Summary of calculations of the Viterbi Path
1. $V_1(i_1) = -\ln\!\left(p_{i_1}^0\, \theta_{i_1 y_1}\right)$, for $i_1 = 1, 2, \ldots, M$
2. $V_{t+1}(i_{t+1}) = \min_{i_t}\left[V_t(i_t) - \ln\!\left(p_{i_t i_{t+1}}\, \theta_{i_{t+1} y_{t+1}}\right)\right]$, for $i_{t+1} = 1, 2, \ldots, M$ and $t = 1, \ldots, T-2$
3. $V_T = \min_{i_T}\,\min_{i_{T-1}}\left[V_{T-1}(i_{T-1}) - \ln\!\left(p_{i_{T-1} i_T}\, \theta_{i_T y_T}\right)\right] = \min_{i_1, i_2, \ldots, i_T} U(i_1, i_2, \ldots, i_T)$

where $U(i_1, \ldots, i_T) = -\ln P[X_1 = i_1, \ldots, X_T = i_T, Y_1 = y_1, \ldots, Y_T = y_T]$; the Viterbi path is recovered by tracing back the minimizing states.
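A Python sketch of this min-of-negative-logs recursion with backtracking, again assuming the illustrative discrete-emission arrays `p0`, `P`, `theta` used earlier (strictly positive probabilities assumed so that the logs are finite):

```python
import numpy as np

def viterbi(p0, P, theta, y):
    """Viterbi path via the V_t recursion summarized above, plus a backtrace."""
    T, M = len(y), len(p0)
    V = np.zeros((T, M))
    back = np.zeros((T, M), dtype=int)
    V[0] = -np.log(p0 * theta[:, y[0]])                        # step 1
    for t in range(T - 1):                                      # steps 2 and 3
        # cost[i, j] = V_t(i) - ln(p_ij * theta_{j, y_{t+1}})
        cost = V[t][:, None] - np.log(P * theta[:, y[t + 1]][None, :])
        back[t + 1] = np.argmin(cost, axis=0)
        V[t + 1] = np.min(cost, axis=0)
    # trace back the minimizing states
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmin(V[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1][path[t + 1]]
    return path, V[-1].min()      # min over all paths of U(i_1, ..., i_T)
```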
Estimation of Parameters of a Hidden
Markov Model
If both the sequence of observations $Y_1, Y_2, \ldots, Y_T$ and the sequence of states $X_1, X_2, \ldots, X_T$ are observed, $Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T$, $X_1 = i_1, X_2 = i_2, \ldots, X_T = i_T$, then the likelihood is given by:

$$L(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta}) = p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}$$

and the log-likelihood is given by:

$$l(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta}) = \ln L(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta})
= \ln p_{i_1}^0 + \ln \theta_{i_1 y_1} + \ln p_{i_1 i_2} + \ln \theta_{i_2 y_2} + \ln p_{i_2 i_3} + \ln \theta_{i_3 y_3} + \cdots + \ln p_{i_{T-1} i_T} + \ln \theta_{i_T y_T}$$
$$= \sum_{i=1}^{M} f_i^0 \ln p_i^0 + \sum_{i=1}^{M}\sum_{j=1}^{M} f_{ij} \ln p_{ij} + \sum_{i=1}^{M}\sum_{y \in \mathcal{Y}_i} \ln \theta_{iy}$$

where
- $f_i^0$ = the number of times state $i$ occurs as the first state,
- $f_{ij}$ = the number of times state $i$ changes to state $j$,
- $\theta_{iy} = f(y \mid \theta_i)$ (or $p(y \mid \theta_i)$ in the discrete case), and
- $\mathcal{Y}_i$ = the set of all observations $y_t$ where $X_t = i$.
In this case the Maximum Likelihood estimates are:

$$\hat{p}_i^0 = \frac{f_i^0}{1}, \qquad \hat{p}_{ij} = \frac{f_{ij}}{\sum_{j=1}^{M} f_{ij}}, \qquad \text{and} \qquad \hat{\theta}_i = \text{the MLE of } \theta_i \text{ computed from the observations } y_t \text{ where } X_t = i.$$
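A minimal Python sketch of these counting-based MLEs for the fully observed case with discrete emissions (function and argument names are illustrative; states that never occur produce 0/0 rows, which are not handled here):

```python
import numpy as np

def mle_complete_data(states, obs, M, K):
    """MLEs from a fully observed (state, observation) sequence, discrete emissions.

    states : length-T sequence of states (integers 0..M-1)
    obs    : length-T sequence of symbols (integers 0..K-1)
    """
    T = len(states)
    f0 = np.zeros(M)
    f = np.zeros((M, M))
    e = np.zeros((M, K))
    f0[states[0]] = 1.0                           # f_i^0: count of the first state
    for t in range(T - 1):
        f[states[t], states[t + 1]] += 1.0        # f_ij: transition counts
    for t in range(T):
        e[states[t], obs[t]] += 1.0               # emission counts for each state
    p0_hat = f0 / f0.sum()
    P_hat = f / f.sum(axis=1, keepdims=True)      # p_ij-hat = f_ij / sum_j f_ij
    theta_hat = e / e.sum(axis=1, keepdims=True)  # MLE of theta_i from obs with X_t = i
    return p0_hat, P_hat, theta_hat
```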
MLE (states unknown)
If only the sequence of observations $Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T$ is observed, then the likelihood is given by:

$$L(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta}) = \sum_{i_1, i_2, \ldots, i_T} p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}$$
• It is difficult to find the Maximum Likelihood Estimates directly from the likelihood function.
• The techniques that are used are:
  1. The Segmental K-means Algorithm
  2. The Baum-Welch (E-M) Algorithm
The Segmental K-means
Algorithm
In this method the parameters $\lambda = (\boldsymbol{\pi}^0, \boldsymbol{\Pi}, \boldsymbol{\theta})$ are adjusted to maximize

$$L(\boldsymbol{\pi}^0, \boldsymbol{\Pi}, \boldsymbol{\theta} \mid \mathbf{y}, \mathbf{i}) = L(\lambda \mid \mathbf{y}, \mathbf{i})
= p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}$$

where $\mathbf{i} = (i_1, i_2, \ldots, i_T)$ is the Viterbi path.
Consider this with the special case:

Case: The observations $\{Y_1, Y_2, \ldots, Y_T\}$ are continuous multivariate normal with mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$ when $X_t = i$, i.e.

$$\theta_i(\mathbf{y}_t) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\tfrac{1}{2} (\mathbf{y}_t - \boldsymbol{\mu}_i)' \boldsymbol{\Sigma}_i^{-1} (\mathbf{y}_t - \boldsymbol{\mu}_i)\right]$$
1. Pick arbitrarily M centroids $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$. Assign each of the T observations $\mathbf{y}_t$ (kT if multiple realizations are observed) to a state $i_t$ by determining:
$$\min_i \|\mathbf{y}_t - \mathbf{a}_i\|$$
2. Then
$$\hat{p}_i^0 = \frac{\text{Number of times } i_1 = i}{k}, \qquad \hat{p}_{ij} = \frac{\text{Number of transitions from } i \text{ to } j}{\text{Number of transitions from } i}$$
3. And
$$\hat{\boldsymbol{\mu}}_i = \frac{\sum_{t:\, i_t = i} \mathbf{y}_t}{N_i}, \qquad \hat{\boldsymbol{\Sigma}}_i = \frac{\sum_{t:\, i_t = i} (\mathbf{y}_t - \hat{\boldsymbol{\mu}}_i)(\mathbf{y}_t - \hat{\boldsymbol{\mu}}_i)'}{N_i}$$
where $N_i$ = the number of observations assigned to state $i$.
4. Calculate the Viterbi path $(i_1, i_2, \ldots, i_T)$ based on the parameters of steps 2 and 3.
5. If there is a change in the sequence $(i_1, i_2, \ldots, i_T)$, repeat steps 2 to 4.
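A rough Python sketch of these five steps for one realization (k = 1) of multivariate-normal emissions; it reuses the `viterbi` sketch above by passing the per-time densities $\theta_i(\mathbf{y}_t)$ as the "emission matrix". The function name, pseudocounts, and fallback to the pooled covariance for nearly empty states are simplifying assumptions, not part of the notes.

```python
import numpy as np
from scipy.stats import multivariate_normal

def segmental_kmeans(Y, M, n_iter=20, seed=0):
    """Segmental K-means sketch for one realization Y of shape (T, p)."""
    rng = np.random.default_rng(seed)
    T = len(Y)
    # Step 1: arbitrary centroids, nearest-centroid state assignment
    centroids = Y[rng.choice(T, size=M, replace=False)]
    path = np.argmin(np.linalg.norm(Y[:, None, :] - centroids[None, :, :], axis=2), axis=1)
    for _ in range(n_iter):
        # Steps 2 and 3: re-estimate p0, P, mu_i, Sigma_i from the current assignment
        p0 = np.full(M, 1e-12); p0[path[0]] += 1.0; p0 /= p0.sum()
        P = np.full((M, M), 1e-12)
        for t in range(T - 1):
            P[path[t], path[t + 1]] += 1.0
        P /= P.sum(axis=1, keepdims=True)
        mus = np.array([Y[path == i].mean(axis=0) if np.any(path == i) else Y.mean(axis=0)
                        for i in range(M)])
        Sigmas = np.array([np.cov(Y[path == i].T, bias=True) if np.sum(path == i) > 1
                           else np.cov(Y.T, bias=True) for i in range(M)])
        # Step 4: Viterbi path under the re-estimated parameters
        dens = np.array([multivariate_normal.pdf(Y, mean=mus[i], cov=Sigmas[i])
                         for i in range(M)])        # (M, T) matrix of theta_i(y_t)
        new_path, _ = viterbi(p0, P, dens, list(range(T)))
        # Step 5: stop when the state sequence no longer changes
        if np.array_equal(new_path, path):
            break
        path = new_path
    return p0, P, mus, Sigmas, path
```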
The Baum-Welch (E-M)
Algorithm
• The E-M algorithm was designed originally
to handle “Missing observations”.
• In this case the missing observations are the
states {X1, X2, ... , XT}.
• Assuming a model, the states are estimated
by finding their expected values under this
model. (The E part of the E-M algorithm).
• With these values the model is estimated by
Maximum Likelihood Estimation (The M
part of the E-M algorithm).
• The process is repeated until the estimated
model converges.
The E-M Algorithm
Let $f(\mathbf{Y}, \mathbf{X} \mid \boldsymbol{\theta}) = L(\mathbf{Y}, \mathbf{X}, \boldsymbol{\theta})$ denote the joint distribution of $\mathbf{Y}, \mathbf{X}$.
Consider the function:

$$Q(\boldsymbol{\theta}, \boldsymbol{\theta}^*) = E_{\mathbf{X}}\!\left[\ln L(\mathbf{Y}, \mathbf{X}, \boldsymbol{\theta}) \mid \mathbf{Y}, \boldsymbol{\theta}^*\right]$$

Starting with an initial estimate $\boldsymbol{\theta}^{(1)}$, a sequence of estimates $\boldsymbol{\theta}^{(m)}$ is formed by finding $\boldsymbol{\theta} = \boldsymbol{\theta}^{(m+1)}$ to maximize $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(m)})$ with respect to $\boldsymbol{\theta}$.
The sequence of estimates $\boldsymbol{\theta}^{(m)}$ converges to a local maximum of the likelihood $L(\mathbf{Y}, \boldsymbol{\theta}) = f(\mathbf{Y} \mid \boldsymbol{\theta})$.
In the case of an HMM the log-Likelihood is
given by:
$$l(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta}) = \ln L(\mathbf{p}^0, \mathbf{P}, \boldsymbol{\theta})
= \ln p_{i_1}^0 + \ln \theta_{i_1 y_1} + \ln p_{i_1 i_2} + \ln \theta_{i_2 y_2} + \ln p_{i_2 i_3} + \ln \theta_{i_3 y_3} + \cdots + \ln p_{i_{T-1} i_T} + \ln \theta_{i_T y_T}$$
$$= \sum_{i=1}^{M} f_i^0 \ln p_i^0 + \sum_{i=1}^{M}\sum_{j=1}^{M} f_{ij} \ln p_{ij} + \sum_{i=1}^{M}\sum_{y \in \mathcal{Y}_i} \ln \theta_{iy}$$

where
- $f_i^0$ = the number of times state $i$ occurs as the first state,
- $f_{ij}$ = the number of times state $i$ changes to state $j$,
- $\theta_{iy} = f(y \mid \theta_i)$ (or $p(y \mid \theta_i)$ in the discrete case), and
- $\mathcal{Y}_i$ = the set of all observations $y_t$ where $X_t = i$.
Recall

$$\gamma_t(i) = P[X_t = i \mid Y = y]
= \frac{\alpha_t(i)\, \alpha_t^*(i)}{\sum_j \alpha_T(j)}
= \frac{\alpha_t(i)\, \alpha_t^*(i)}{\sum_j \alpha_t(j)\, \alpha_t^*(j)}$$

and

$$\sum_{t=1}^{T-1} \gamma_t(i) = \text{Expected no. of transitions from state } i.$$
Let

$$\xi_t(i, j) = P[X_t = i, X_{t+1} = j \mid Y = y]
= \frac{P[X_t = i, X_{t+1} = j, Y = y]}{P[Y = y]}$$
$$= \frac{P[X_t = i, Y^{(t)} = y^{(t)}, X_{t+1} = j, Y_{t+1} = y_{t+1}, Y^{*(t+1)} = y^{*(t+1)}]}{P[Y = y]}
= \frac{\alpha_t(i)\, p_{ij}\, \theta_{j y_{t+1}}\, \alpha_{t+1}^*(j)}{\sum_j \alpha_T(j)}$$

where $Y^{(t)} = (Y_1, \ldots, Y_t)$ and $Y^{*(t+1)} = (Y_{t+2}, \ldots, Y_T)$. Then

$$\sum_{t=1}^{T-1} \xi_t(i, j) = \text{Expected no. of transitions from state } i \text{ to state } j.$$
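A short Python sketch that computes $\gamma_t(i)$ and $\xi_t(i, j)$ directly from the `forward` and `backward` sketches given earlier (same illustrative discrete-emission arrays):

```python
import numpy as np

def gammas_and_xis(p0, P, theta, y):
    """gamma_t(i) and xi_t(i, j) from the forward/backward quantities above."""
    alpha, like = forward(p0, P, theta, y)
    alpha_star, _ = backward(p0, P, theta, y)
    T, M = alpha.shape
    gamma = alpha * alpha_star / like                    # gamma[t, i] = P[X_t = i | Y = y]
    xi = np.zeros((T - 1, M, M))
    for t in range(T - 1):
        # xi[t, i, j] = alpha_t(i) p_ij theta_{j, y_{t+1}} alpha*_{t+1}(j) / P[Y = y]
        xi[t] = (alpha[t][:, None] * P *
                 (theta[:, y[t + 1]] * alpha_star[t + 1])[None, :]) / like
    return gamma, xi
```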
The E-M Re-estimation Formulae
Case 1: The observations $\{Y_1, Y_2, \ldots, Y_T\}$ are discrete with K possible values and $\theta_{iy} = P[Y_t = y \mid X_t = i]$. Then

$$\hat{p}_i^0 = \gamma_1(i), \qquad
\hat{p}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \text{and} \qquad
\hat{\theta}_{iy} = \frac{\sum_{t=1,\, y_t = y}^{T} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$$
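A compact Python sketch of the full Case 1 iteration, building on `gammas_and_xis` above; for brevity it runs a fixed number of iterations rather than testing convergence, and it assumes numpy arrays for the parameters.

```python
import numpy as np

def baum_welch_discrete(p0, P, theta, y, n_iter=50):
    """Baum-Welch re-estimation (Case 1, discrete emissions), iterated n_iter times."""
    p0, P, theta = p0.copy(), P.copy(), theta.copy()
    y = np.asarray(y)
    for _ in range(n_iter):
        gamma, xi = gammas_and_xis(p0, P, theta, y)
        p0 = gamma[0]                                         # p_i^0-hat = gamma_1(i)
        P = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]  # sum_t xi_t(i,j) / sum_t gamma_t(i)
        K = theta.shape[1]
        for k in range(K):                                    # theta_iy-hat
            theta[:, k] = gamma[y == k].sum(axis=0) / gamma.sum(axis=0)
    return p0, P, theta
```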
Case 2: The observations $\{Y_1, Y_2, \ldots, Y_T\}$ are continuous multivariate normal with mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$ when $X_t = i$, i.e.

$$\theta_i(\mathbf{y}_t) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\tfrac{1}{2} (\mathbf{y}_t - \boldsymbol{\mu}_i)' \boldsymbol{\Sigma}_i^{-1} (\mathbf{y}_t - \boldsymbol{\mu}_i)\right]$$

Then

$$\hat{p}_i^0 = \gamma_1(i), \qquad
\hat{p}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \text{and}$$

$$\hat{\boldsymbol{\mu}}_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, \mathbf{y}_t}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad
\hat{\boldsymbol{\Sigma}}_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, (\mathbf{y}_t - \hat{\boldsymbol{\mu}}_i)(\mathbf{y}_t - \hat{\boldsymbol{\mu}}_i)'}{\sum_{t=1}^{T} \gamma_t(i)}$$
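A small Python sketch of the Case 2 M-step, computing the $\gamma$-weighted means and covariances given a posterior matrix (the function name and argument layout are illustrative):

```python
import numpy as np

def reestimate_gaussian(gamma, Y):
    """M-step for Case 2: gamma-weighted means and covariances.

    gamma : (T, M) state posteriors gamma_t(i)
    Y     : (T, p) observations
    """
    w = gamma / gamma.sum(axis=0)                # per-state weights; each column sums to 1
    mus = w.T @ Y                                # mu_i-hat = sum_t gamma_t(i) y_t / sum_t gamma_t(i)
    M, p = mus.shape
    Sigmas = np.zeros((M, p, p))
    for i in range(M):
        d = Y - mus[i]
        Sigmas[i] = (w[:, i][:, None] * d).T @ d  # Sigma_i-hat: gamma-weighted outer products
    return mus, Sigmas
```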
Measuring distance between two HMMs
Let

$$\lambda_1 = (\boldsymbol{\pi}^{01}, \boldsymbol{\Pi}^1, \boldsymbol{\theta}^1) \qquad \text{and} \qquad \lambda_2 = (\boldsymbol{\pi}^{02}, \boldsymbol{\Pi}^2, \boldsymbol{\theta}^2)$$

denote the parameters of two different HMM models. We now consider defining a distance between these two models.
The Kullback-Leibler distance
Consider two discrete distributions $p^1(y)$ and $p^2(y)$ ($f^1(y)$ and $f^2(y)$ in the continuous case), then define

$$I(p^1, p^2) = \sum_y \ln\!\left(\frac{p^1(y)}{p^2(y)}\right) p^1(y) = E_{p^1}\!\left[\ln p^1(y) - \ln p^2(y)\right]$$

and in the continuous case:

$$I(f^1, f^2) = \int \ln\!\left(\frac{f^1(y)}{f^2(y)}\right) f^1(y)\, dy = E_{f^1}\!\left[\ln f^1(y) - \ln f^2(y)\right]$$

These measures of distance between the two distributions are not symmetric but can be made symmetric by the following:

$$I_s(p^1, p^2) = \frac{I(p^1, p^2) + I(p^2, p^1)}{2}$$
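A minimal Python sketch of $I(p^1, p^2)$ and its symmetrized version for discrete distributions given as probability vectors (terms with $p^1(y) = 0$ contribute zero by convention):

```python
import numpy as np

def kl_divergence(p1, p2):
    """I(p1, p2) = sum_y p1(y) * (ln p1(y) - ln p2(y))."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    mask = p1 > 0                     # terms with p1(y) = 0 contribute 0
    return float(np.sum(p1[mask] * (np.log(p1[mask]) - np.log(p2[mask]))))

def symmetric_kl(p1, p2):
    """I_s(p1, p2) = (I(p1, p2) + I(p2, p1)) / 2."""
    return 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))

# e.g. symmetric_kl([0.7, 0.3], [0.5, 0.5])
```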
In the case of a Hidden Markov Model,

$$p^i(\mathbf{y}) = p(\mathbf{y} \mid \lambda_i) = p(\mathbf{y} \mid \boldsymbol{\pi}^{0i}, \boldsymbol{\Pi}^i, \boldsymbol{\theta}^i) = \sum_{\mathbf{i}} p(\mathbf{y}, \mathbf{i} \mid \boldsymbol{\pi}^{0i}, \boldsymbol{\Pi}^i, \boldsymbol{\theta}^i)$$

where

$$p(\mathbf{y}, \mathbf{i} \mid \boldsymbol{\pi}^0, \boldsymbol{\Pi}, \boldsymbol{\theta}) = p_{i_1}^0\, \theta_{i_1 y_1}\, p_{i_1 i_2}\, \theta_{i_2 y_2}\, p_{i_2 i_3}\, \theta_{i_3 y_3} \cdots p_{i_{T-1} i_T}\, \theta_{i_T y_T}$$

The computation of $I(p^1, p^2)$ in this case is formidable.
Juang and Rabiner distance
Let $\mathbf{Y}_T^{(i)} = \left(Y_1^{(i)}, Y_2^{(i)}, \ldots, Y_T^{(i)}\right)$ denote a sequence of observations generated from the HMM with parameters:

$$\lambda_i = (\boldsymbol{\pi}^{0i}, \boldsymbol{\Pi}^i, \boldsymbol{\theta}^i)$$

Let $\mathbf{i}^{*(i)}(\mathbf{y}) = \left(i_1^{(i)}(\mathbf{y}), i_2^{(i)}(\mathbf{y}), \ldots, i_T^{(i)}(\mathbf{y})\right)$ denote the optimal (Viterbi) sequence of states assuming HMM model $\lambda_i = (\boldsymbol{\pi}^{0i}, \boldsymbol{\Pi}^i, \boldsymbol{\theta}^i)$.

Then define:

$$D(\lambda_1, \lambda_2) \overset{\text{def}}{=} \lim_{T \to \infty} \frac{1}{T}\left[\ln p\!\left(\mathbf{Y}_T^{(1)}, \mathbf{i}^{*(1)}\!\left(\mathbf{Y}_T^{(1)}\right) \mid \lambda_1\right) - \ln p\!\left(\mathbf{Y}_T^{(1)}, \mathbf{i}^{*(2)}\!\left(\mathbf{Y}_T^{(1)}\right) \mid \lambda_2\right)\right]$$

and

$$D_s(\lambda_1, \lambda_2) = \frac{D(\lambda_1, \lambda_2) + D(\lambda_2, \lambda_1)}{2}$$
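A finite-T Monte Carlo sketch of this distance for discrete-emission HMMs, approximating the limit with one long simulated sequence and reusing the `viterbi` sketch above (whose returned minimum equals $-\ln p(\mathbf{Y}, \mathbf{i}^*)$). The function names, the `model` tuples `(p0, P, theta)`, and the default sequence length are assumptions; strictly positive probabilities are assumed so the logs are finite.

```python
import numpy as np

def sample_hmm(p0, P, theta, T, rng):
    """Generate one observation sequence of length T from a discrete-emission HMM."""
    M, K = theta.shape
    y = np.zeros(T, dtype=int)
    state = rng.choice(M, p=p0)
    for t in range(T):
        y[t] = rng.choice(K, p=theta[state])   # emit from the current state
        state = rng.choice(M, p=P[state])      # then move to the next state
    return y

def juang_rabiner_D(model1, model2, T=2000, seed=0):
    """Approximate D(lambda_1, lambda_2) with a length-T sequence drawn from lambda_1."""
    rng = np.random.default_rng(seed)
    y = sample_hmm(*model1, T, rng)
    _, negloglike1 = viterbi(*model1, list(y))   # -ln p(Y, Viterbi path | lambda_1)
    _, negloglike2 = viterbi(*model2, list(y))   # -ln p(Y, Viterbi path | lambda_2)
    return ((-negloglike1) - (-negloglike2)) / T

def juang_rabiner_Ds(model1, model2, T=2000, seed=0):
    """Symmetrized distance D_s(lambda_1, lambda_2)."""
    return 0.5 * (juang_rabiner_D(model1, model2, T, seed) +
                  juang_rabiner_D(model2, model1, T, seed))
```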