Hidden Markov Models
Hidden Markov Models (HMMs)
[Figure: two HMM chain diagrams, one over states S1, S2, S3, …, Si-1, Si, Si+1, … and one over R1, R2, R3, …, Ri-1, Ri, Ri+1, …, each paired with nodes X1, X2, X3, …, Xi-1, Xi, Xi+1, …]
This HMM depicts the factorization:

p(s1,…,sL, r1,…,rL) = p(s1) p(r1|s1) Π_{i=2..L} p(si | si-1) p(ri | si)

where p(si | si-1) is given by a k×k transition matrix.
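As a concrete illustration, here is a minimal Python sketch that evaluates this factorization for given state and observation sequences. The 2-state transition and emission tables are made-up example values, not taken from the slides.

```python
import numpy as np

# Hypothetical 2-state HMM; all numbers are made-up example values.
p_s1 = np.array([0.6, 0.4])               # p(s1)
A = np.array([[0.7, 0.3],                 # A[s, s'] = p(si = s' | si-1 = s), a k x k transition matrix
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                 # B[s, r] = p(ri = r | si = s)
              [0.3, 0.7]])

def joint_probability(states, obs):
    """p(s1,...,sL, r1,...,rL) = p(s1) p(r1|s1) * prod_{i>=2} p(si|si-1) p(ri|si)."""
    p = p_s1[states[0]] * B[states[0], obs[0]]
    for i in range(1, len(states)):
        p *= A[states[i - 1], states[i]] * B[states[i], obs[i]]
    return p

print(joint_probability(states=[0, 0, 1], obs=[0, 1, 1]))
```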
Application in communication: the message sent is (s1,…,sL), but we receive (r1,…,rL). Compute: what is the most likely message sent?
Application in forecasting: the weather condition in the next time slot.
Application in tracking missiles: the next location, given sensor data.
Queries of interest (MAP)
[Figure: HMM with hidden states H1, H2, …, HL-1, HL and observations X1, X2, …, XL-1, XL]
The Maximum A Posteriori (MAP) query:

(h1*,…,hL*) = argmax_{(h1,…,hL)} p(h1,…,hL | x1,…,xL)

An efficient solution, assuming the local probability tables are known, is called the Viterbi algorithm.
The same problem results if we instead maximize the joint distribution p(h1,…,hL, x1,…,xL).
An answer to this query gives the most probable message
sent.
Queries of interest (Belief Update)
Posterior Decoding
[Figure: HMM with hidden states H1, H2, …, Hi, …, HL-1, HL and observations X1, X2, …, Xi, …, XL-1, XL]
1. Compute the posterior belief in Hi (for a specific i) given the evidence {x1,…,xL}, for each of Hi's values hi; namely, compute p(hi | x1,…,xL).
2. Do the same computation for every Hi, but without repeating the first task L times.
Local probability tables are assumed to be known. An answer to this
query gives the probability of a value at an arbitrary location.
Decomposing the computation of Belief
update (Posterior decoding)
[Figure: HMM with hidden states H1, H2, …, Hi, …, HL-1, HL and observations X1, X2, …, Xi, …, XL-1, XL]
Belief update: P(hi | x1,…,xL) = (1/K) P(x1,…,xL, hi), where K = Σ_{hi} P(x1,…,xL, hi).

P(x1,…,xL, hi) = P(x1,…,xi, hi) P(xi+1,…,xL | x1,…,xi, hi)
             = P(x1,…,xi, hi) P(xi+1,…,xL | hi)
             = f(hi) b(hi)

The second equality holds due to the conditional independence IP({xi+1,…,xL}, Hi, {x1,…,xi}): given Hi, the observations after slot i are independent of those up to slot i.
The forward algorithm
[Figure: HMM prefix with hidden states H1, H2, …, Hi and observations X1, X2, …, Xi]
The task: Compute f(hi) = P(x1,…,xi,hi) for i=1,…,L (namely,
considering evidence up to time slot i).
{Basis step}
P(x1, h1) = P(h1) P(x1|h1)

{Second step}
P(x1, x2, h2) = Σ_{h1} P(x1, h1, h2, x2)
             = Σ_{h1} P(x1, h1) P(h2 | x1, h1) P(x2 | x1, h1, h2)
             = Σ_{h1} P(x1, h1) P(h2 | h1) P(x2 | h2)     (last equality due to conditional independence)

{Step i}
P(x1,…,xi, hi) = Σ_{hi-1} P(x1,…,xi-1, hi-1) P(hi | hi-1) P(xi | hi)
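A minimal Python sketch of this forward recursion, assuming a NumPy representation and the same made-up 2-state tables used earlier (p_h1 for P(h1), A for the transition matrix, B for the emission table):

```python
import numpy as np

p_h1 = np.array([0.6, 0.4])               # P(h1); made-up example values
A = np.array([[0.7, 0.3], [0.2, 0.8]])    # A[h, h'] = P(hi = h' | hi-1 = h)
B = np.array([[0.9, 0.1], [0.3, 0.7]])    # B[h, x] = P(xi = x | hi = h)

def forward(obs):
    """Return f with f[i, h] = P(x1,...,x(i+1), h(i+1) = h) (rows are 0-based)."""
    L, k = len(obs), len(p_h1)
    f = np.zeros((L, k))
    f[0] = p_h1 * B[:, obs[0]]                  # basis step: P(x1, h1) = P(h1) P(x1|h1)
    for i in range(1, L):
        f[i] = (f[i - 1] @ A) * B[:, obs[i]]    # step i: sum over hi-1 of f(hi-1) P(hi|hi-1) P(xi|hi)
    return f

print(forward([0, 1, 1]))
```

Each step costs O(k²) work, so the whole pass is O(k²L), matching the complexity stated later in the deck.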
The backward algorithm
[Figure: HMM suffix with hidden states Hi, Hi+1, …, HL-1, HL and observations Xi+1, …, XL-1, XL]
The task: Compute b(hi) = P(xi+1,…,xL|hi) for i=L-1,…,1
(namely, considering evidence after time slot i).
{First step}
b(hL-1) = P(xL | hL-1) = Σ_{hL} P(xL, hL | hL-1) = Σ_{hL} P(hL | hL-1) P(xL | hL-1, hL)
        = Σ_{hL} P(hL | hL-1) P(xL | hL)     (last equality due to conditional independence)

{Step i}
b(hi) = P(xi+1,…,xL | hi) = Σ_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) P(xi+2,…,xL | hi+1)
      = Σ_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) b(hi+1)
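A matching Python sketch of the backward recursion, under the same assumptions (made-up 2-state tables A and B):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.2, 0.8]])    # A[h, h'] = P(hi = h' | hi-1 = h); made-up example values
B = np.array([[0.9, 0.1], [0.3, 0.7]])    # B[h, x] = P(xi = x | hi = h)

def backward(obs):
    """Return b with b[i, h] = P(x(i+2),...,xL | h(i+1) = h) (rows are 0-based); the last row is all ones."""
    L, k = len(obs), A.shape[0]
    b = np.ones((L, k))                               # b(hL) = 1 by convention
    for i in range(L - 2, -1, -1):
        b[i] = A @ (B[:, obs[i + 1]] * b[i + 1])      # step i: sum over hi+1 of P(hi+1|hi) P(xi+1|hi+1) b(hi+1)
    return b

print(backward([0, 1, 1]))
```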
The combined answer
[Figure: HMM with hidden states H1, H2, …, Hi, …, HL-1, HL and observations X1, X2, …, Xi, …, XL-1, XL]
1. To compute the posterior belief in Hi (for a specific i) given the evidence {x1,…,xL}: run the forward algorithm to compute f(hi) = P(x1,…,xi, hi) and the backward algorithm to compute b(hi) = P(xi+1,…,xL | hi); the product f(hi) b(hi), computed for every possible value hi and normalized over hi, is the answer.
2. To compute the posterior belief for every Hi, simply run the forward and backward algorithms once, storing f(hi) and b(hi) for every i (and every value hi), and compute f(hi) b(hi) for every i.
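Putting the two passes together, a sketch of posterior decoding that returns P(hi | x1,…,xL) for every i and every value hi, again with made-up 2-state tables:

```python
import numpy as np

p_h1 = np.array([0.6, 0.4])               # made-up example tables
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])

def posterior_decoding(obs):
    """Return a matrix whose row i holds P(h(i+1) | x1,...,xL) for every value of h(i+1)."""
    L, k = len(obs), len(p_h1)
    f, b = np.zeros((L, k)), np.ones((L, k))
    f[0] = p_h1 * B[:, obs[0]]
    for i in range(1, L):                             # forward pass: f(hi) = P(x1,...,xi, hi)
        f[i] = (f[i - 1] @ A) * B[:, obs[i]]
    for i in range(L - 2, -1, -1):                    # backward pass: b(hi) = P(xi+1,...,xL | hi)
        b[i] = A @ (B[:, obs[i + 1]] * b[i + 1])
    fb = f * b                                        # f(hi) b(hi) = P(x1,...,xL, hi)
    return fb / fb.sum(axis=1, keepdims=True)         # divide by K = sum over hi of P(x1,...,xL, hi)

print(posterior_decoding([0, 1, 1]))
```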
Computing Likelihood of Evidence
[Figure: HMM with hidden states H1, H2, …, Hi, …, HL-1, HL and observations X1, X2, …, Xi, …, XL-1, XL]
1. To compute the likelihood of evidence P(x1,…,xL), do one more step in the forward algorithm, namely,

Σ_{hL} f(hL) = Σ_{hL} P(x1,…,xL, hL)
2. Alternatively, do one more step in the backward
algorithm, namely,
Σ_{h1} b(h1) P(h1) P(x1|h1) = Σ_{h1} P(x2,…,xL | h1) P(h1) P(x1|h1)
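Both routes to the likelihood can be checked against each other with a short sketch (same made-up 2-state tables); the two printed numbers agree:

```python
import numpy as np

p_h1 = np.array([0.6, 0.4])               # made-up example tables
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
obs = [0, 1, 1]

f = p_h1 * B[:, obs[0]]                   # forward messages, updated in place
for x in obs[1:]:
    f = (f @ A) * B[:, x]
likelihood_forward = f.sum()              # sum over hL of f(hL)

b = np.ones(len(p_h1))                    # backward messages, updated in place
for x in reversed(obs[1:]):
    b = A @ (B[:, x] * b)
likelihood_backward = (b * p_h1 * B[:, obs[0]]).sum()   # sum over h1 of b(h1) P(h1) P(x1|h1)

print(likelihood_forward, likelihood_backward)
```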
Time and Space Complexity of the
forward/backward algorithms
[Figure: HMM with hidden states H1, H2, …, Hi, …, HL-1, HL and observations X1, X2, …, Xi, …, XL-1, XL]
Time complexity is linear in the length of the chain, provided the number of states of each variable is a constant. More precisely, time complexity is O(k²L), where k is the maximum domain size of each variable. Space complexity is also O(k²L).
The MAP query in HMM
[Figure: HMM with hidden states H1, H2, …, Hi, …, HL-1, HL and observations X1, X2, …, Xi, …, XL-1, XL]
1. Recall that the query asking for the likelihood of evidence is to compute

P(x1,…,xL) = Σ_{(h1,…,hL)} P(x1,…,xL, h1,…,hL)

2. Now we wish to compute a similar quantity (MAP):

P*(x1,…,xL) = max_{(h1,…,hL)} P(x1,…,xL, h1,…,hL)

And, of course, we wish to find a MAP assignment (h1*,…,hL*) that achieves this maximum.
Example: Revisiting likelihood of evidence
[Figure: three-step HMM with hidden states H1, H2, H3 and observations X1, X2, X3]
P(x1, x2, x3) = Σ_{h1} P(h1) P(x1|h1) Σ_{h2} P(h2|h1) P(x2|h2) Σ_{h3} P(h3|h2) P(x3|h3)
             = Σ_{h1} P(h1) P(x1|h1) Σ_{h2} b(h2) P(h2|h1) P(x2|h2)
             = Σ_{h1} b(h1) P(h1) P(x1|h1)
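The derivation above can be verified numerically: summing the joint over all (h1, h2, h3) by brute force gives the same number as pushing the sums inward. A sketch with made-up 2-state tables and arbitrarily chosen observations:

```python
import numpy as np
from itertools import product

p_h1 = np.array([0.6, 0.4])               # made-up example tables
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
x1, x2, x3 = 0, 1, 1                      # arbitrary observed values

# Brute force: sum the joint over all (h1, h2, h3).
brute = sum(p_h1[h1] * B[h1, x1] * A[h1, h2] * B[h2, x2] * A[h2, h3] * B[h3, x3]
            for h1, h2, h3 in product(range(2), repeat=3))

# Pushing the sums inward, exactly as in the derivation above.
b_h2 = A @ B[:, x3]                       # b(h2) = sum over h3 of P(h3|h2) P(x3|h3)
b_h1 = A @ (B[:, x2] * b_h2)              # b(h1) = sum over h2 of P(h2|h1) P(x2|h2) b(h2)
nested = (p_h1 * B[:, x1] * b_h1).sum()   # sum over h1 of b(h1) P(h1) P(x1|h1)

print(brute, nested)                      # identical values
```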
Example: Computing the MAP assignment
[Figure: three-step HMM with hidden states H1, H2, H3 and observations X1, X2, X3]
Replace sums with taking maximums:

maximum = max_{h1} P(h1) P(x1|h1) max_{h2} P(h2|h1) P(x2|h2) max_{h3} P(h3|h2) P(x3|h3)
        = max_{h1} P(h1) P(x1|h1) max_{h2} b_{h3}(h2) P(h2|h1) P(x2|h2)
        = max_{h1} b_{h2}(h1) P(h1) P(x1|h1)     {Finding the maximum}

h1* = argmax_{h1} b_{h2}(h1) P(h1) P(x1|h1)     {Finding the MAP assignment}
h2* = x*_{h2}(h1*);  h3* = x*_{h3}(h2*)

where x*_{h3}(h2) and x*_{h2}(h1) record the maximizing values of h3 and h2, respectively.
Viterbi’s algorithm
[Figure: HMM with hidden states H1, H2, …, Hi, …, HL-1, HL and observations X1, X2, …, Xi, …, XL-1, XL]

Backward phase:

b_{hL+1}(hL) = 1
For i = L-1 downto 1 do
  b_{hi+1}(hi) = max_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) b_{hi+2}(hi+1)
  x*_{hi+1}(hi) = argmax_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) b_{hi+2}(hi+1)
(Storing the best value as a function of the parent's values)

Forward phase (tracing the MAP assignment):

h1* = argmax_{h1} P(h1) P(x1|h1) b_{h2}(h1)
For i = 1 to L-1 do
  hi+1* = x*_{hi+1}(hi*)
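A Python sketch of Viterbi's algorithm in this backward-then-trace form, with the same made-up 2-state tables; it returns the MAP assignment and the maximized joint probability P*(x1,…,xL):

```python
import numpy as np

p_h1 = np.array([0.6, 0.4])               # made-up example tables
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])

def viterbi(obs):
    """Return (h1*,...,hL*) and the maximum over (h1,...,hL) of P(x1,...,xL, h1,...,hL)."""
    L, k = len(obs), len(p_h1)
    # Backward phase: the backward recursion with sums replaced by maximizations,
    # storing the best child value as a function of the parent's value.
    b = np.ones((L, k))                               # b(hL) = 1
    best_next = np.zeros((L, k), dtype=int)           # best_next[i, h] = x*(hi = h)
    for i in range(L - 2, -1, -1):
        scores = A * (B[:, obs[i + 1]] * b[i + 1])    # scores[h, h'] = P(h'|h) P(xi+1|h') b(h')
        b[i] = scores.max(axis=1)
        best_next[i] = scores.argmax(axis=1)
    # Forward phase: trace the MAP assignment.
    path = np.zeros(L, dtype=int)
    start = p_h1 * B[:, obs[0]] * b[0]                # P(h1) P(x1|h1) b(h1)
    path[0] = start.argmax()                          # h1*
    for i in range(L - 1):
        path[i + 1] = best_next[i, path[i]]           # hi+1* = x*(hi*)
    return path, start.max()

print(viterbi([0, 1, 1]))
```

Ties are broken arbitrarily by argmax; any maximizing assignment is a valid MAP answer.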
Factorial HMM
[Figure: factorial HMM — n hidden Markov chains H^1, …, H^n, with variables H^1_i, …, H^n_i at each time slot i, and a single observation Xi per slot, i = 1,…,L]
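For reference, the factorization usually associated with a factorial HMM of this shape (n hidden chains, one shared observation per slot) is:

p(x1,…,xL, h^1_1,…,h^n_L) = [ Π_{m=1..n} p(h^m_1) Π_{i=2..L} p(h^m_i | h^m_{i-1}) ] · Π_{i=1..L} p(xi | h^1_i,…,h^n_i)

that is, each chain evolves independently a priori, and the chains are coupled only through the shared observations. This is the structure exploited by the elimination argument on the next slide.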
Improved complexity via variable elimination
[Figure: factorial HMM — n hidden Markov chains H^1, …, H^n, with variables H^1_i, …, H^n_i at each time slot i, and a single observation Xi per slot, i = 1,…,L]
Eliminating the hidden variables in top-down, left-to-right order yields time complexity of O(n·2^n·L), rather than the naïve complexity of O(2^{2n}·L) obtained by clustering the Hi variables at each time slot i.