Hidden Markov Models in Machine Learning
Patricia Francis-Lyon
Machine Learning
• The machine learns from experience without
being explicitly programmed
• The machine has learned if its measured performance with respect to the task improves with experience
• Used in many fields and applications
Optical Character Recognition (OCR)
Backpropagation Neural Network used by US Postal Service
Coupled with synthesized speech to make virtually any
printed material accessible to the blind
Autonomous Vehicles
This Google car is licensed to drive in the states of NV, FL, and CA
See it race around cones:
http://www.youtube.com/watch?v=J17Qgc4a8xY&feature=related
Earlier car driven by Carnegie Mellon NN
Speech recognition
Translation of spoken words into text
Hidden Markov Model (HMM) Viterbi Algorithm
Neural Network
Hidden Markov Models
• Discover a data sequence that is not observable, but other data that depends on this sequence is observable
• Sequential data (including temporal): speech,
gesture (Kinect), handwriting recognition,
patterns in DNA, proteins
HMM: graph model
• Hidden states represented by vertices (nodes)
• Transitions from state to state represented by
edges
Ex:
Fair/loaded dice at the occasionally
dishonest casino:
[Diagram: two-state HMM. The fair die emits 1–6 with probability 1/6 each; the loaded die emits 1–5 with probability 1/10 each and 6 with probability 1/2. Transitions: fair→fair 0.95, fair→loaded 0.05, loaded→loaded 0.9, loaded→fair 0.1.]
HMM: fair/loaded dice
• The observed sequence is the numbers rolled:
1,4,4,5,4,3,6,3
• But with a fair or loaded die? This is hidden:
F,F,F,F,F,F,L,L
HMM: probabilistic model
• Transitions from state to state occur according to transition probabilities
• Each state generates output from a given alphabet (ex: {1,2,3,4,5,6}) according to emission probabilities
[Same fair/loaded dice diagram as above, with the edge weights labeled as transition probs and the per-face distributions labeled as emission probs.]
HMM formal definition
• N hidden states = {S1, …, SN}
• M emitted (observed) symbols
• Initial state probability distribution vector π
where πi =probability of starting in state i
• Transition probability matrix τ where
τij=probability of transition from state i to state j
where i is row index, j is column index of matrix τ
• Emission probability matrix e where ei(c) = probability that state i emits character c
Dishonest Casino
N = 2 states
M = 6 symbols
Initial prob distribution vector π = {1, 0} <- always start with fair die
Transition probs τ:
        F      L
  F    0.95   0.05
  L    0.1    0.9
  (each row sums to 1)
Emission probs e:
        1      2      3      4      5      6
  F    1/6    1/6    1/6    1/6    1/6    1/6
  L    1/10   1/10   1/10   1/10   1/10   1/2
  (each row sums to 1)
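As a minimal sketch (not from the lecture), the formal definition above can be instantiated for the dishonest casino directly in Python; the names below (pi, tau, e, generate) are assumptions chosen to mirror π, τ and the emission table:

import random

states  = ["F", "L"]                    # N = 2 hidden states
symbols = [1, 2, 3, 4, 5, 6]            # M = 6 observable symbols
pi  = {"F": 1.0, "L": 0.0}              # always start with the fair die
tau = {"F": {"F": 0.95, "L": 0.05},     # transition probabilities
       "L": {"F": 0.10, "L": 0.90}}
e   = {"F": {s: 1/6 for s in symbols},  # fair die: uniform emissions
       "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}  # loaded die

def generate(T):
    """Sample a hidden state path and the observed rolls it emits."""
    state = random.choices(states, weights=[pi[s] for s in states])[0]
    path, rolls = [], []
    for _ in range(T):
        path.append(state)
        rolls.append(random.choices(symbols, weights=[e[state][s] for s in symbols])[0])
        state = random.choices(states, weights=[tau[state][s] for s in states])[0]
    return path, rolls

print(generate(10))  # e.g. (['F', 'F', 'F', 'L', ...], [3, 6, 1, 6, ...])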
3 Problems we can solve
• Scoring: The probability that a given sequence
was generated by a given HMM model
• Best Path: The most likely sequence of states
given a sequence generated by an HMM
• Training: The most likely parameters of an HMM given a large set of emitted sequences
Calculate with Viterbi
• Best Path: The most likely sequence of states given an observed sequence generated by an HMM
• Naïve approach: calculate the joint probability of each hidden state path emitting the observed sequence, then choose the best
• The naïve approach has exponential complexity; we can do better with Viterbi
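A brute-force sketch of this naïve approach (an illustration, not the lecture's code), assuming the dictionary parameterization (pi, tau, e) from the casino sketch above; it scores every one of the N^T possible hidden paths, which is exactly the exponential cost Viterbi avoids:

from itertools import product

def best_path_naive(obs, states, pi, tau, e):
    """Enumerate all N**T hidden paths and return the most probable one."""
    best_p, best_path = 0.0, None
    for path in product(states, repeat=len(obs)):
        # joint probability of this hidden path and the observed sequence
        p = pi[path[0]] * e[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= tau[path[t-1]][path[t]] * e[path[t]][obs[t]]
        if p > best_p:
            best_p, best_path = p, path
    return best_path, best_p

# e.g. best_path_naive([1, 6, 6, 6], states, pi, tau, e) with the casino model above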
Joint Probability
Joint probability (product rule):
P(A,B) = P(A|B) * P(B)
Definition of independence:
A and B are independent events iff:
P(A) = P(A|B) and P(B) = P(B|A)
Joint probability for A, B independent:
P(A,B) = P(A) * P(B)
Ex: for two independent fair dice, P(first = 6, second = 6) = 1/6 * 1/6 = 1/36
HMM assumptions
• The Markov assumption
Current state is dependent only on previous
state, independent of other states
• Independence Assumption
Observed output depends only on current
state
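Under these two assumptions, the joint probability of a hidden path and an observed sequence factors into initial, transition, and emission terms; a minimal sketch, assuming the same pi/tau/e dictionaries as before:

def joint_probability(obs, path, pi, tau, e):
    """P(observations, path) = pi[q1]*e[q1](O1) * product over t of tau[q(t-1)][qt]*e[qt](Ot)."""
    p = pi[path[0]] * e[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= tau[path[t-1]][path[t]]   # Markov assumption: depends only on the previous state
        p *= e[path[t]][obs[t]]        # independence assumption: output depends only on the current state
    return p

# e.g. P(rolls 6,6 with the casino staying fair) = 1 * 1/6 * 0.95 * 1/6 ≈ 0.026
# joint_probability([6, 6], ["F", "F"], pi, tau, e)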
HMM emits DNA sequence
N = 3 states
M = 4 symbols
Initial prob distribution vector π = {.25, .5, .25}
Transition probs τ:
        S1    S2    S3
  S1    ¼     ½     ¼
  S2    ¼     ¼     ½
  S3    ½     ½     0
Emission probs e:
        a     c     t     g
  S1    1     0     0     0
  S2    ¼     ½     0     ¼
  S3    ¼     ¼     ¼     ¼
Viterbi: best path
Cache each sequence step: fill in an array
Viterbi recurrence relations:
Initialization: δ1(i) = πi * ei(O1)
Iteration: δt+1(i) = ei(Ot+1) * max over j ∈ states of ( δt(j) * τji )
Viterbi trellis for the observed sequence c, c, t: one row per state (S1, S2, S3), one column per observation, filled in column by column using the recurrence above.
Column 1 (observation c):
δ1(S1) = ¼ * 0 = 0
δ1(S2) = ½ * ½ = ¼
δ1(S3) = ¼ * ¼ = 1/16
Column 2 (observation c):
δ2(S1) = 0 * max{…} = 0
δ2(S2) = ½ * max{ 0, ¼ * ¼, 1/16 * ½ } = ½ * 1/16 = 1/32
δ2(S3) = ¼ * max{ 0, ¼ * ½, 1/16 * 0 } = ¼ * 1/8 = 1/32
t
Viterbi recurrence relations:
Initialization: 1(i) = iei(O1)
Iteration:
t+1(i) = ei (O t+1) max ( t(j) * τji )
j ϵ states
Initial prob distribution vector π = {.25, .5, .25}
Transition probs
Emission probs
S1
S2
S3
S1
¼
½
¼
S2
¼
¼
S3
½
½
a
c
t
g
S1
1
0
0
0
½
S2
¼
½
0
¼
0
S3
¼
¼
¼
¼
c
S1
S2
S3
c
t
0 * max{…} = 0
0 * max {…} = 0
½*½=¼
{0
}
½ * max { ¼* ¼ } = 1/32
{ 1/16 * ½ }
0 * max {…} = 0
¼ * ¼ = 1/16
{0
}
¼ * max { ¼* ½ } = 1/32
{ 1/16 * 0}
{ 0
}
¼ * max { 1/32 * ½ } = 1/256
{ 1/32 * 0 }
¼ *0=0
Viterbi backtrace
Completed trellis (observations c, c, t):
        c       c       t
  S1    0       0       0
  S2    ¼       1/32    0
  S3    1/16    1/32    1/256
Most likely state sequence: S2, S2, S3
Also possible, but less likely: S3, S2, S3
Worst-case runtime: O(N²T); backtrace part: O(T)
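A minimal Viterbi sketch in Python (an illustration, not the lecture's code) for the 3-state DNA HMM above; it fills the same trellis and backtraces the same path, using Fractions so the values match the slides exactly:

from fractions import Fraction as F

states = ["S1", "S2", "S3"]
pi  = {"S1": F(1, 4), "S2": F(1, 2), "S3": F(1, 4)}
tau = {"S1": {"S1": F(1, 4), "S2": F(1, 2), "S3": F(1, 4)},
       "S2": {"S1": F(1, 4), "S2": F(1, 4), "S3": F(1, 2)},
       "S3": {"S1": F(1, 2), "S2": F(1, 2), "S3": F(0)}}
e   = {"S1": {"a": F(1), "c": F(0), "t": F(0), "g": F(0)},
       "S2": {"a": F(1, 4), "c": F(1, 2), "t": F(0), "g": F(1, 4)},
       "S3": {"a": F(1, 4), "c": F(1, 4), "t": F(1, 4), "g": F(1, 4)}}

def viterbi(obs):
    # delta[t][i] = best probability of any state path ending in state i at time t
    delta = [{i: pi[i] * e[i][obs[0]] for i in states}]
    back = []  # back[t][i] = best predecessor of state i at time t
    for t in range(1, len(obs)):
        row, ptr = {}, {}
        for i in states:
            best_j = max(states, key=lambda j: delta[t-1][j] * tau[j][i])
            row[i] = e[i][obs[t]] * delta[t-1][best_j] * tau[best_j][i]
            ptr[i] = best_j
        delta.append(row)
        back.append(ptr)
    # backtrace from the most probable final state
    last = max(states, key=lambda i: delta[-1][i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[-1][last]

print(viterbi(["c", "c", "t"]))   # expected: (['S2', 'S2', 'S3'], Fraction(1, 256))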
Underflow: use logs (ex: base 2)
Same trellis with log2 probabilities (multiplication becomes addition):
Column 1 (c):
δ1(S1) = -2 + -inf = -inf
δ1(S2) = -1 + -1 = -2
δ1(S3) = -2 + -2 = -4
Column 2 (c):
δ2(S1) = -inf + max{…} = -inf
δ2(S2) = -1 + max{ -inf, -2 + -2, -4 + -1 } = -5
δ2(S3) = -2 + max{ -inf, -2 + -1, -4 + -inf } = -5
Column 3 (t):
δ3(S1) = -inf + max{…} = -inf
δ3(S2) = -inf + max{…} = -inf
δ3(S3) = -2 + max{ -inf, -5 + -1, -5 + -inf } = -8
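To avoid the underflow shown above when using floating point, the Viterbi sketch from the previous slide can be moved into log space; the only change (a sketch, with log2_or_neg_inf as an assumed helper name) is that probabilities become base-2 logs and products become sums:

from math import log2, inf

def log2_or_neg_inf(p):
    """log2 of a probability, treating log2(0) as -infinity."""
    return log2(p) if p > 0 else -inf

# In the recurrence, products become sums, e.g. the iteration step
#   delta[t][i] = e[i][obs[t]] * delta[t-1][best_j] * tau[best_j][i]
# becomes
#   delta[t][i] = log2_or_neg_inf(e[i][obs[t]]) + delta[t-1][best_j] + log2_or_neg_inf(tau[best_j][i])
# and the initialization becomes log2_or_neg_inf(pi[i]) + log2_or_neg_inf(e[i][obs[0]]).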
HMM Implementations
Open Source:
• NLTK in Python
• Apache Mahout in Java
• HMMER (Sean Eddy) in C
Logs are exponents
log10(100) = 2
another way to say 10^2 = 100
When you multiply numbers with the same base:
10^2 * 10^3 = 10^(2+3)
(10*10) * (10*10*10) = 10^5
log10(100*1000) = log10(100) + log10(1000) = 2 + 3
You add the exponents:
loga(x*y) = loga(x) + loga(y)
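A quick numeric check (not from the slides) of the identity above, and of the underflow it protects Viterbi against:

from math import log2, log10

# log(x*y) = log(x) + log(y), as on the slide
print(log10(100 * 1000), log10(100) + log10(1000))   # 5.0  5.0

# Why this matters for Viterbi: a long product of probabilities underflows,
# but the equivalent sum of logs stays perfectly representable.
p = 1 / 6
print(p ** 1000)          # 0.0  (underflow in double precision)
print(1000 * log2(p))     # about -2585.0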