Hidden Markov Models

Martin Emms

March 22, 2017

Outline

Best path through an HMM: Viterbi Decoding
Illustration: Part of Speech Tagging

Best path through an HMM: Viterbi Decoding

Decoding

We want to find the most probable hidden state sequence for a given sequence of visible observations:

decode(o_{1:T}) = argmax_{s_{1:T}} [ π(s_1) b_{s_1}(o_1) × ∏_{t=2}^{T} b_{s_t}(o_t) a_{s_{t−1} s_t} ]

◮ N possible states
◮ N^T possible state sequences for o_{1:T}
◮ it is not computationally feasible to simply enumerate the possible state sequences
◮ the Viterbi algorithm is an efficient, dynamic-programming method for avoiding this

Part of Speech tagging example

[Figure: a small HMM whose states are the tags PNI, VVZ, AT0 and NN1 (plus a STOP state), drawn with transition arcs (e.g. VVZ → AT0, probability .5) and word emissions (e.g. wants emitted from VVZ with probability .002).]

In this example

◮ states are part-of-speech tags
◮ observation symbols are words
◮ e.g.
P(tag at t is AT0 | tag at t − 1 is VVZ) = 0.5
◮ e.g. P(word at t is wants | tag at t is VVZ) = 0.002

This part-of-speech tagging scenario will be used to illustrate the Viterbi algorithm. The following few slides just explain the POS tags a little, though this is not really necessary to follow the algorithm.

AJ0   Adjective (general or positive)        good, old, beautiful
AJC   Comparative adjective                  better, older
AJS   Superlative adjective                  best, oldest
AT0   Article                                the, a, an, no
AV0   General adverb                         often, well, longer
AVP   Adverb particle                        up, off, out
AVQ   Wh-adverb                              when, where, how, why, wherever
CJC   Coordinating conjunction               and, or, but
CJS   Subordinating conjunction              although, when
CJT   The subordinating conjunction that     that
CRD   Cardinal number                        one, 3, fifty-five, 3609
DPS   Possessive determiner                  your, their, his
DT0   General determiner                     this
DTQ   Wh-determiner                          which, what, whose, whichever
EX0   Existential there                      there (in "there is")
ITJ   Interjection                           oh, yes, mhm, wow
NN0   Common noun, neutral for number        aircraft, data, committee
NN1   Singular common noun                   pencil, goose, time, revelation
NN2   Plural common noun                     pencils, geese, times, revelations
NP0   Proper noun                            London, Michael, Mars

ORD   Ordinal numeral                        first, sixth, 77th, last
PNI   Indefinite pronoun                     none, everything, one
PNP   Personal pronoun                       I, you, them, ours
PNQ   Wh-pronoun                             who, whoever, whom
PNX   Reflexive pronoun                      myself, yourself, itself, ourselves
POS   The possessive or genitive marker      's
PRF   The preposition of                     of
PRP   Preposition (except for of)            about, at, in, on
PUL   Punctuation: left bracket              ( or [
PUN   Punctuation: general separating mark   . , !
: ; - or ?
PUQ   Punctuation: quotation mark            '
PUR   Punctuation: right bracket             ) or ]
TO0   Infinitive marker to                   to
UNC   Unclassified items                     formulae

VBB   The present tense forms of the verb BE       am, are, 'm, 're
VBD   The past tense forms of the verb BE          was, were
VBG   The -ing form of the verb BE                 being
VBI   The infinitive form of the verb BE           be
VBN   The past participle form of the verb BE      been
VBZ   The -s form of the verb BE                   is, 's
VDB   The finite base form of the verb DO          do
VDD   The past tense form of the verb DO           did
VDG   The -ing form of the verb DO                 doing
VDI   The infinitive form of the verb DO           do
VDN   The past participle form of the verb DO      done
VDZ   The -s form of the verb DO                   does, 's
VHB   The finite base form of the verb HAVE        have, 've
VHD   The past tense form of the verb HAVE         had, 'd
VHG   The -ing form of the verb HAVE               having
VHI   The infinitive form of the verb HAVE         have
VHN   The past participle form of the verb HAVE    had
VHZ   The -s form of the verb HAVE                 has, 's
VM0   Modal auxiliary verb                         will, would, can, could, 'll, 'd
VVB   The finite base form of lexical verbs        forget, send, live, return
VVD   The past tense form of lexical verbs         forgot, sent, lived, returned
VVG   The -ing form of lexical verbs               forgetting, sending, living, returning
VVI   The infinitive form of lexical verbs         forget, send, live, return
VVN   The past participle form of lexical verbs    forgotten, sent, lived, returned
VVZ   The -s form of lexical verbs                 forgets, sends, lives, returns
XX0   The negative particle                        not or n't
ZZ0   Alphabetical symbols                         A, a, B, b, c, d

This part-of-speech example provides a model which is intended to assign probabilities to state+observation sequences like

s: PNI   VVZ    AT0   NN1
o: one   wants  a     pause

s: AT0   NN1   VVZ
o: the   cup   wins    etc.

Decoding

◮ best_path(t, i): the best path through the HMM which accounts for the first t observation symbols and ends with state i
◮ abs_best_path(t): the best path through the HMM which accounts for the first t observation symbols
◮ abs_best_path(T) is what you eventually want, but since clearly

  abs_best_path(t) = max_{1≤i≤N} best_path(t, i)

  it suffices to compute best_path(t, i)

Viterbi in words

◮ abs_best_path(t) is impossible to calculate from abs_best_path(t − 1) alone
◮ best_path(t, ·) is easy to calculate from best_path(t − 1, ·), in outline as follows: for a given state j at time t, consider every possible immediate predecessor i of j and compare best_path(t − 1, i) × a_ij b_j(o_t); take the maximum, and remember which i was j's best predecessor
◮ this can be implemented by tabulating values of best_path(t, i), filling the entries for t − 1 before the entries for t

Viterbi pseudocode

Initialisation:

  for (i = 1; i ≤ N; i++) {
      best_path(1, i).prob = π(i) b_i(o_1);
  }

Iteration:

  for (t = 2; t ≤ T; t++) {
      for (j = 1; j ≤ N; j++) {
          max = 0;
          for (i = 1; i ≤ N; i++) {
              p = best_path(t − 1, i).prob × a_ij b_j(o_t);
              if (p > max) { max = p; prev_state = i; }
          }
          best_path(t, j).prob = max;
          best_path(t, j).prev_state = prev_state;
      }
  }

The cost of this algorithm is of the order of N²T, compared to the brute-force cost N^T.
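The pseudocode above can be turned into a short runnable sketch. This is not code from the lecture: the dictionary-based representation of π, a and b, and the two-state toy model at the bottom, are invented here for illustration.

```python
def viterbi(obs, states, pi, a, b):
    """Tabulate best_path(t, j) = (prob, prev_state), as in the pseudocode:
    cost O(N^2 T) rather than the brute-force N^T."""
    # Initialisation: best_path(1, i) = pi(i) * b_i(o_1)
    V = [{i: (pi.get(i, 0.0) * b.get((i, obs[0]), 0.0), None) for i in states}]
    for t in range(1, len(obs)):                 # iteration: t = 2 .. T
        V.append({})
        for j in states:
            best_p, prev = 0.0, None
            for i in states:                     # every possible predecessor i
                p = V[t - 1][i][0] * a.get((i, j), 0.0) * b.get((j, obs[t]), 0.0)
                if p > best_p:
                    best_p, prev = p, i          # remember j's best predecessor
            V[t][j] = (best_p, prev)
    # Pick the best final state, then follow the backpointers to read the path
    last = max(V[-1], key=lambda j: V[-1][j][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    path.reverse()
    return path, V[-1][last][0]

# Invented two-state toy model (NOT the lecture's POS model):
states = ["X", "Y"]
pi = {"X": 0.6, "Y": 0.4}
a = {("X", "X"): 0.7, ("X", "Y"): 0.3, ("Y", "X"): 0.4, ("Y", "Y"): 0.6}
b = {("X", "u"): 0.9, ("X", "v"): 0.1, ("Y", "u"): 0.2, ("Y", "v"): 0.8}

path, prob = viterbi(["u", "u", "v"], states, pi, a, b)
print(path, prob)   # best path ['X', 'X', 'Y'], prob ≈ 0.0816
```

Using dictionaries keyed by state names rather than indices 1..N changes nothing essential; missing transition or emission entries simply get probability 0.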
Illustration: Part of Speech Tagging

Trellis

◮ the operation of the algorithm is best visualised with a trellis; this has as many columns as there are observation symbols, and the column at t shows the states i with non-zero P(o_t | i)
◮ the next picture shows such a trellis for a part-of-speech tagging example, where the observation sequence is one wants a pause

The defining probabilities

Initial:      π(CRD) = .004     π(PNI) = .001

Emission:     one|CRD = .001    one|PNI = .001
              wants|VVZ = .002  wants|NN2 = .0002
              a|AT0 = .45
              pause|NN1 = .002  pause|VVB = .001  pause|VVI = .001

Transition:   VVZ|CRD = .0001   VVZ|PNI = .5
              NN2|CRD = .45     NN2|PNI = .001
              AT0|VVZ = .5      AT0|NN2 = .01
              NN1|AT0 = 0.5     VVB|AT0 = 0.01    VVI|AT0 = 0.01

A trellis

[Figure: the trellis for one wants a pause; the column for one contains CRD and PNI, the column for wants contains VVZ and NN2, the column for a contains AT0, and the column for pause contains NN1, VVB and VVI. Parts of the decoding equation are assigned to each arc, e.g. π(PNI) P(one | PNI) on the start arc into PNI, P(VVZ | PNI) P(wants | VVZ) on PNI → VVZ, P(AT0 | VVZ) P(a | AT0) on VVZ → AT0, and P(NN1 | AT0) P(pause | NN1) on AT0 → NN1.]

[Figure: the same trellis with numeric values on the arcs, e.g. (.004)(.001) into CRD, (.001)(.001) into PNI, (.0001)(.002) on CRD → VVZ, (.5)(.002) on PNI → VVZ, (.5)(.45) on VVZ → AT0, and (.5)(.002) on AT0 → NN1.]
The problem is now to find the best path through this trellis, where the total path probability is the product of the arc values.

Viterbi initialisation

best_path(1, CRD) = π(CRD) P(one | CRD) = (.004)(.001) = 4e−6
best_path(1, PNI) = π(PNI) P(one | PNI) = (.001)(.001) = 1e−6

Best path to wants/VVZ

for best_path(2, VVZ) compare
  CRD → VVZ: (4e−6)(.0001)(.002) = (4e−6)(2e−7) = 8e−13
  PNI → VVZ: (1e−6)(.5)(.002) = (1e−6)(1e−3) = 1e−9   (winner!)

Best path to wants/NN2

for best_path(2, NN2) compare
  CRD → NN2: (4e−6)(.45)(.0002) = (4e−6)(9e−5) = 3.6e−10   (winner!)
  PNI → NN2: (1e−6)(.001)(.0002) = (1e−6)(2e−7) = 2e−13

best_path(2, ·) finished

best_path(2, VVZ) = PNI VVZ = 1e−9
best_path(2, NN2) = CRD NN2 = 3.6e−10

Best path to a/AT0

for best_path(3, AT0) compare
  PNI VVZ → AT0: (1e−9)(.5)(.45) = (1e−9)(2.25e−1) = 2.25e−10   (winner!)
  CRD NN2 → AT0: (3.6e−10)(.01)(.45) = (3.6e−10)(4.5e−3) = 1.6e−12

best_path(3, ·)
finished

best_path(3, AT0) = PNI VVZ AT0 = 2.25e−10

Best path to pause/NN1

best_path(4, NN1) = PNI VVZ AT0 NN1 = (2.25e−10)(.5)(.002) = (2.25e−10)(1e−3) = 2.25e−13

Best path to pause/VVB

best_path(4, VVB) = PNI VVZ AT0 VVB = (2.25e−10)(.01)(.001) = (2.25e−10)(1e−5) = 2.25e−15

Best path to pause/VVI

best_path(4, VVI) = PNI VVZ AT0 VVI = (2.25e−10)(.01)(.001) = (2.25e−10)(1e−5) = 2.25e−15

best_path(4, ·) finished

best_path(4, NN1) = PNI VVZ AT0 NN1 = 2.25e−13   (final max)
best_path(4, VVB) = PNI VVZ AT0 VVB = 2.25e−15
best_path(4, VVI) = PNI VVZ AT0 VVI = 2.25e−15

So the decoded tag sequence for one wants a pause is PNI VVZ AT0 NN1.
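As a cross-check on the worked example, here is a sketch that encodes the lecture's defining probabilities as Python dictionaries (the dictionary representation is a choice made here; the numbers are from the slides) and runs the tabulation. It recovers the winning sequence PNI VVZ AT0 NN1 with probability 2.25e−13.

```python
# The lecture's defining probabilities, transcribed from the table above.
pi = {"CRD": 0.004, "PNI": 0.001}
a = {("CRD", "VVZ"): 0.0001, ("PNI", "VVZ"): 0.5,
     ("CRD", "NN2"): 0.45,   ("PNI", "NN2"): 0.001,
     ("VVZ", "AT0"): 0.5,    ("NN2", "AT0"): 0.01,
     ("AT0", "NN1"): 0.5,    ("AT0", "VVB"): 0.01, ("AT0", "VVI"): 0.01}
b = {("CRD", "one"): 0.001,   ("PNI", "one"): 0.001,
     ("VVZ", "wants"): 0.002, ("NN2", "wants"): 0.0002,
     ("AT0", "a"): 0.45,
     ("NN1", "pause"): 0.002, ("VVB", "pause"): 0.001, ("VVI", "pause"): 0.001}
states = ["CRD", "PNI", "VVZ", "NN2", "AT0", "NN1", "VVB", "VVI"]

def viterbi(obs):
    # V[t][j] = (prob of best path ending in j, best predecessor);
    # missing arcs in the tables have probability 0.
    V = [{i: (pi.get(i, 0.0) * b.get((i, obs[0]), 0.0), None) for i in states}]
    for t in range(1, len(obs)):
        V.append({})
        for j in states:
            best_p, prev = 0.0, None
            for i in states:
                p = V[t - 1][i][0] * a.get((i, j), 0.0) * b.get((j, obs[t]), 0.0)
                if p > best_p:
                    best_p, prev = p, i
            V[t][j] = (best_p, prev)
    last = max(V[-1], key=lambda j: V[-1][j][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    path.reverse()
    return path, V[-1][last][0]

path, prob = viterbi(["one", "wants", "a", "pause"])
print(path, prob)   # ['PNI', 'VVZ', 'AT0', 'NN1'], prob ≈ 2.25e-13
```

The intermediate table entries match the slides: 4e−6 and 1e−6 at initialisation, 1e−9 for wants/VVZ, 2.25e−10 for a/AT0, and 2.25e−13 for the final maximum at pause/NN1.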