Hidden Markov Models

Martin Emms
March 22, 2017
Outline
Best path through an HMM: Viterbi Decoding
Illustration: Part of Speech Tagging
Best path through an HMM: Viterbi Decoding
Decoding
want to find the most-probable hidden state sequence for a given sequence of
visible observations

decode(o_1:T) = argmax_{s_1:T} [ π(s_1) b_{s_1}(o_1) × ∏_{t=2}^{T} b_{s_t}(o_t) a_{s_{t−1} s_t} ]
◮ N possible states
◮ N^T possible state sequences for o_1:T
◮ not computationally feasible to simply enumerate the possible state sequences
◮ Viterbi algorithm is an efficient, dynamic programming method for avoiding this
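For contrast, here is what the infeasible enumeration looks like; a minimal brute-force decoder in Python (a sketch for tiny N and T only; the dict-of-dicts parameters pi, A, B and the helper name brute_force_decode are this sketch's assumptions, not notation from the slides):

from itertools import product

def brute_force_decode(obs, states, pi, A, B):
    """Enumerate all N**T state sequences, keep the most probable one.
    pi[s]: initial prob, A[i][j]: transition prob, B[s][o]: emission prob;
    missing dict entries count as probability 0."""
    best_seq, best_p = None, 0.0
    for seq in product(states, repeat=len(obs)):          # N**T candidates
        p = pi.get(seq[0], 0) * B.get(seq[0], {}).get(obs[0], 0)
        for t in range(1, len(obs)):
            p *= A.get(seq[t-1], {}).get(seq[t], 0) * B.get(seq[t], {}).get(obs[t], 0)
        if p > best_p:
            best_seq, best_p = seq, p
    return best_seq, best_p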
Part of Speech tagging example
[Figure: HMM state diagram with part-of-speech states PNI, VVZ, AT0, NN1 and STOP; arcs carry transition probabilities (e.g. .5 on VVZ → AT0, .3 elsewhere) and states emit words: PNI emits one, VVZ emits wants/tries with probabilities .002/.003, AT0 emits a/the, NN1 emits pause]

In this example
◮ states are part-of-speech tags
◮ observation symbols are words
◮ e.g. P(tag at t is AT0 | tag at t − 1 is VVZ) = 0.5
◮ e.g. P(word at t is wants | tag at t is VVZ) = 0.002
This part-of-speech tagging scenario will be used to illustrate the Viterbi
algorithm. The following few slides just explain the POS tags a little, though
this is not really necessary to follow the algorithm
Tag   Description                                   Examples
AJ0   Adjective (general or positive)               good, old, beautiful
AJC   Comparative adjective                         better, older
AJS   Superlative adjective                         best, oldest
AT0   Article                                       the, a, an, no
AV0   General adverb                                often, well, longer
AVP   Adverb particle                               up, off, out
AVQ   Wh-adverb                                     when, where, how, why, wherever
CJC   Coordinating conjunction                      and, or, but
CJS   Subordinating conjunction                     although, when
CJT   The subordinating conjunction that            that
CRD   Cardinal number                               one, 3, fifty-five, 3609
DPS   Possessive determiner                         your, their, his
DT0   General determiner                            this
DTQ   Wh-determiner                                 which, what, whose, whichever
EX0   Existential there                             there, as in there is
ITJ   Interjection                                  oh, yes, mhm, wow
NN0   Common noun, neutral for number               aircraft, data, committee
NN1   Singular common noun                          pencil, goose, time, revelation
NN2   Plural common noun                            pencils, geese, times, revelations
NP0   Proper noun                                   London, Michael, Mars
ORD   Ordinal numeral                               first, sixth, 77th, last
PNI   Indefinite pronoun                            none, everything, one
PNP   Personal pronoun                              I, you, them, ours
PNQ   Wh-pronoun                                    who, whoever, whom
PNX   Reflexive pronoun                             myself, yourself, itself, ourselves
POS   The possessive or genitive marker             ’s
PRF   The preposition of                            of
PRP   Preposition (except for of)                   about, at, in, on
PUL   Punctuation: left bracket                     ( or [
PUN   Punctuation: general separating mark          . , ! : ; - or ?
PUQ   Punctuation: quotation mark                   ’
PUR   Punctuation: right bracket                    ) or ]
TO0   Infinitive marker to                          to
UNC   Unclassified items                            formulae
VBB   The present tense forms of the verb BE        am, are, ’m, ’re
VBD   The past tense forms of the verb BE           was, were
VBG   The -ing form of the verb BE                  being
VBI   The infinitive form of the verb BE            be
VBN   The past participle form of the verb BE       been
VBZ   The -s form of the verb BE                    is, ’s
VDB   The finite base form of the verb DO           do
VDD   The past tense form of the verb DO            did
VDG   The -ing form of the verb DO                  doing
VDI   The infinitive form of the verb DO            do
VDN   The past participle form of the verb DO       done
VDZ   The -s form of the verb DO                    does, ’s
VHB   The finite base form of the verb HAVE         have, ’ve
VHD   The past tense form of the verb HAVE          had, ’d
VHG   The -ing form of the verb HAVE                having
VHI   The infinitive form of the verb HAVE          have
VHN   The past participle form of the verb HAVE     had
VHZ   The -s form of the verb HAVE                  has, ’s
VM0   Modal auxiliary verb                          will, would, can, could, ’ll, ’d
VVB   The finite base form of lexical verbs         forget, send, live, return
VVD   The past tense form of lexical verbs          forgot, sent, lived, returned
VVG   The -ing form of lexical verbs                forgetting, sending, living, returning
VVI   The infinitive form of lexical verbs          forget, send, live, return
VVN   The past participle form of lexical verbs     forgotten, sent, lived, returned
VVZ   The -s form of lexical verbs                  forgets, sends, lives, returns
XX0   The negative particle                         not or n’t
ZZ0   Alphabetical symbols                          A, a, B, b, c, d
this part-of-speech example provides a model which is intended to assign
probabilities to state+obs sequences like

s:  PNI  VVZ    AT0  NN1
o:  one  wants  a    pause

s:  AT0  NN1  VVZ
o:  the  cup  wins

etc.
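For the first of these pairs the model's probability is just the product in the decode formula above; a worked sketch, using the numeric values given later on the "defining probabilities" slide:

P(PNI VVZ AT0 NN1, one wants a pause)
    = π(PNI) P(one|PNI) × P(VVZ|PNI) P(wants|VVZ) × P(AT0|VVZ) P(a|AT0) × P(NN1|AT0) P(pause|NN1)
    = (.001)(.001) × (.5)(.002) × (.5)(.45) × (.5)(.002)
    = 2.25e−13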
Decoding
◮ best_path(t, i): best path through the HMM which
    accounts for the first t obs symbols
    and ends with state i
◮ abs_best_path(t): best path through the HMM which
    accounts for the first t obs symbols
◮ abs_best_path(T) is what you eventually want, but since clearly
    abs_best_path(t) = max_{1 ≤ i ≤ N} best_path(t, i)
  tabulating best_path(t, i) suffices
Viterbi in words
◮ abs_best_path(t) is impossible to calculate from abs_best_path(t − 1)
◮ best_path(t, ·) is easy to calculate from best_path(t − 1, ·), in outline as follows:
    for a given state j at time t,
    consider every possible immediate predecessor i for j,
    and compare best_path(t − 1, i) × a_ij b_j(o_t);
    take the maximum and remember which i was j's best predecessor
◮ can implement by tabulating values of best_path(t, i), tabulating entries
  for t − 1 before entries for t
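In symbols, the tabulation step for one cell is

best_path(t, j).prob = max_{1 ≤ i ≤ N} [ best_path(t − 1, i).prob × a_ij b_j(o_t) ]

with the maximising i stored as best_path(t, j).prev_state.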
Viterbi Pseudo code

Initialisation:
for (i = 1; i ≤ N; i++) {
    best_path(1, i).prob = π(i) b_i(o_1);
}

Iteration:
for (t = 2; t ≤ T; t++) {
    for (j = 1; j ≤ N; j++) {
        max = 0;
        for (i = 1; i ≤ N; i++) {
            p = best_path(t − 1, i).prob × a_ij b_j(o_t);
            if (p > max) { max = p; prev_state = i; }
        }
        best_path(t, j).prob = max;
        best_path(t, j).prev_state = prev_state;
    }
}
The cost of this algorithm is of the order of N^2 T, compared to the brute-force
cost N^T.
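A runnable Python sketch of the same tabulation (the dict layout of pi, A, B matches the brute-force sketch earlier and is this sketch's assumption; the backtrace at the end follows the stored prev_state entries, which the pseudocode records but never reads):

def viterbi(obs, states, pi, A, B):
    """Most probable state sequence for obs, by the tabulation above.
    V[t][s] holds best_path(t+1, s).prob; back[t][s] holds its prev_state."""
    V = [{s: pi.get(s, 0) * B.get(s, {}).get(obs[0], 0) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):                 # one column per observation
        V.append({}); back.append({})
        for j in states:
            best_p, best_prev = 0.0, None
            for i in states:                     # compare every predecessor i
                p = V[t-1][i] * A.get(i, {}).get(j, 0) * B.get(j, {}).get(obs[t], 0)
                if p > best_p:
                    best_p, best_prev = p, i
            V[t][j], back[t][j] = best_p, best_prev
    last = max(states, key=lambda s: V[-1][s])   # abs_best_path(T)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):         # backtrace via prev_state
        path.append(back[t][path[-1]])
    return list(reversed(path)), V[-1][last]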
Illustration: Part of Speech Tagging
Trellis
◮ operation of the algorithm is best visualised with a trellis; this has as many
  columns as there are observation symbols, and the column at t shows the
  states i with non-zero P(o_t | i)
◮ the next picture shows such a trellis for a part-of-speech tagging example,
  where the observation sequence is one wants a pause
The defining probabilities
initial:      π(CRD) = .004        π(PNI) = .001
transition:   VVZ|CRD = .0001      VVZ|PNI = .5
              NN2|CRD = .45        NN2|PNI = .001
              AT0|VVZ = .5         AT0|NN2 = .01
              NN1|AT0 = .5         VVB|AT0 = .01       VVI|AT0 = .01
emission:     one|CRD = .001       one|PNI = .001
              wants|VVZ = .002     wants|NN2 = .0002
              a|AT0 = .45
              pause|NN1 = .002     pause|VVB = .001    pause|VVI = .001
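These numbers drop straight into the Python sketches above; a hedged encoding (the dict layout is the sketches' assumption, and any pair not listed is simply absent, i.e. probability 0):

states = ["CRD", "PNI", "VVZ", "NN2", "AT0", "NN1", "VVB", "VVI"]
pi = {"CRD": .004, "PNI": .001}
A = {  # A[i][j] = P(tag j at t | tag i at t-1)
    "CRD": {"VVZ": .0001, "NN2": .45},
    "PNI": {"VVZ": .5,    "NN2": .001},
    "VVZ": {"AT0": .5},
    "NN2": {"AT0": .01},
    "AT0": {"NN1": .5, "VVB": .01, "VVI": .01},
}
B = {  # B[tag][word] = P(word | tag)
    "CRD": {"one": .001},   "PNI": {"one": .001},
    "VVZ": {"wants": .002}, "NN2": {"wants": .0002},
    "AT0": {"a": .45},
    "NN1": {"pause": .002}, "VVB": {"pause": .001}, "VVI": {"pause": .001},
}

print(viterbi("one wants a pause".split(), states, pi, A, B))
# -> (['PNI', 'VVZ', 'AT0', 'NN1'], ~2.25e-13), matching the trellis walk-through below;
# brute_force_decode returns the same pair, just exponentially more slowly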
A trellis

[Figure: trellis with one column per observed word; the one column contains CRD and PNI, the wants column VVZ and NN2, the a column AT0, and the pause column NN1, VVB and VVI]

assign parts of the equation to each arc in this trellis
A trellis

[Figure: the same trellis with each arc labelled by its factor of the equation; along the path PNI → VVZ → AT0 → NN1 the arcs carry
    π(PNI) P(one | PNI)
    × P(VVZ | PNI) P(wants | VVZ)
    × P(AT0 | VVZ) P(a | AT0)
    × P(NN1 | AT0) P(pause | NN1)]
A trellis

[Figure: the full trellis with every arc labelled by its numeric transition × emission product, e.g. (.0001)(.002) on CRD → VVZ, (.001)(.0002) on PNI → NN2, (.01)(.001) on AT0 → VVB]

problem is now to find the best path through this trellis,
where total path prob is product of segments
Viterbi initialisation

[Figure: trellis with the one column annotated: CRD gets (.004)(.001) = 4e−6, PNI gets (.001)(.001) = 1e−6]

best_path(1,CRD) = 4e−6
best_path(1,PNI) = 1e−6
best path to wants/VVZ

[Figure: trellis; the arc CRD → VVZ is labelled (.0001)(.002), the arc PNI → VVZ is labelled (.5)(.002)]

for best_path(2,VVZ) compare
    CRD VVZ  (4e−6)(2e−7) = 8e−13
    PNI VVZ  (1e−6)(1e−3) = 1e−9    winner!
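An aside not on these slides: products of small probabilities underflow floating point quickly (the winner here is already at 1e−9 after two words), so practical implementations usually run Viterbi on sums of log probabilities. The comparison above, in log space (a sketch with the same numbers):

import math
candidates = {"CRD": math.log(4e-6) + math.log(2e-7),   # log(8e-13)
              "PNI": math.log(1e-6) + math.log(1e-3)}   # log(1e-9)
print(max(candidates, key=candidates.get))              # -> PNI, the winner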
best path to wants/NN2

[Figure: trellis; the arc CRD → NN2 is labelled (.45)(.0002), the arc PNI → NN2 is labelled (.0001)(.0002)]

for best_path(2,NN2) compare
    CRD NN2  (4e−6)(9e−5) = 3.6e−10    winner!
    PNI NN2  (1e−6)(2e−8) = 2e−14
best_path(2,.) finished

[Figure: trellis with the wants column annotated: VVZ carries 1e−9, NN2 carries 3.6e−10]

best_path(2,VVZ) = PNI VVZ  1e−9
best_path(2,NN2) = CRD NN2  3.6e−10
best path to a/AT0

[Figure: trellis; the arc VVZ → AT0 is labelled (.5)(.45), the arc NN2 → AT0 is labelled (.01)(.45)]

for best_path(3,AT0) compare
    PNI VVZ AT0  (1e−9)(2.25e−1) = 2.25e−10    winner!
    CRD NN2 AT0  (3.6e−10)(4.5e−3) = 1.6e−12
best_path(3,.) finished

[Figure: trellis with the a column annotated: AT0 carries 2.25e−10]

best_path(3,AT0) = PNI VVZ AT0  2.25e−10
best path to pause/NN1

[Figure: trellis; the arc AT0 → NN1 is labelled (.5)(.002)]

best_path(4,NN1) = PNI VVZ AT0 NN1  (2.25e−10)(1e−3) = 2.25e−13
best path to pause/VVB

[Figure: trellis; the arc AT0 → VVB is labelled (.01)(.001)]

best_path(4,VVB) = PNI VVZ AT0 VVB  (2.25e−10)(1e−5) = 2.25e−15
best path to pause/VVI

[Figure: trellis; the arc AT0 → VVI is labelled (.01)(.001)]

best_path(4,VVI) = PNI VVZ AT0 VVI  (2.25e−10)(1e−5) = 2.25e−15
best_path(4,.) finished

[Figure: trellis with the pause column annotated: NN1 carries 2.25e−13, VVB and VVI carry 2.25e−15]

best_path(4,NN1) = PNI VVZ AT0 NN1  2.25e−13    (final max)
best_path(4,VVB) = PNI VVZ AT0 VVB  2.25e−15
best_path(4,VVI) = PNI VVZ AT0 VVI  2.25e−15
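Reading off the answer: abs_best_path(4) is the final max, best_path(4,NN1) = 2.25e−13, and following the stored prev_state entries back from NN1 (NN1 ← AT0 ← VVZ ← PNI) and reversing gives

decode(one wants a pause) = PNI VVZ AT0 NN1

which is also what the viterbi sketch above returns for this model.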