Lecture Slides for ETHEM ALPAYDIN © The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

Introduction

- Modeling dependencies in the input; the observations are no longer iid.
- Sequences:
  - Temporal: in speech, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language); in handwriting, pen movements.
  - Spatial: in a DNA sequence, base pairs.

Discrete Markov Process

- N states: S_1, S_2, ..., S_N
- State at "time" t: q_t = S_i
- First-order Markov property: P(q_{t+1} = S_j | q_t = S_i, q_{t-1} = S_k, ...) = P(q_{t+1} = S_j | q_t = S_i)
- Transition probabilities: a_ij ≡ P(q_{t+1} = S_j | q_t = S_i), with a_ij ≥ 0 and Σ_{j=1}^N a_ij = 1
- Initial probabilities: π_i ≡ P(q_1 = S_i), with Σ_{i=1}^N π_i = 1

Stochastic Automaton

The probability of an observed state sequence O = Q = {q_1, q_2, ..., q_T} is

  P(O = Q | A, Π) = P(q_1) ∏_{t=2}^T P(q_t | q_{t-1}) = π_{q_1} a_{q_1 q_2} ⋯ a_{q_{T-1} q_T}

Example: Balls and Urns

Three urns, each full of balls of one color: S_1: red, S_2: blue, S_3: green.

  Π = [0.5, 0.2, 0.3]^T

  A = | 0.4  0.3  0.3 |
      | 0.2  0.6  0.2 |
      | 0.1  0.1  0.8 |

For the observation sequence O = {S_1, S_1, S_3, S_3}:

  P(O | A, Π) = P(S_1) P(S_1 | S_1) P(S_3 | S_1) P(S_3 | S_3)
              = π_1 · a_11 · a_13 · a_33
              = 0.5 · 0.4 · 0.3 · 0.8 = 0.048

Balls and Urns: Learning

Given K example sequences of length T:

  π̂_i = (# sequences starting with S_i) / (# sequences) = Σ_k 1(q_1^k = S_i) / K

  â_ij = (# transitions from S_i to S_j) / (# transitions from S_i)
       = Σ_k Σ_{t=1}^{T-1} 1(q_t^k = S_i and q_{t+1}^k = S_j) / Σ_k Σ_{t=1}^{T-1} 1(q_t^k = S_i)

Hidden Markov Models

- States are not observable.
- Discrete observations {v_1, v_2, ..., v_M} are recorded; they are a probabilistic function of the state.
- Emission probabilities: b_j(m) ≡ P(O_t = v_m | q_t = S_j)
- Example: in each urn, there are balls of different colors, but with different probabilities.
- For each observation sequence, there are multiple possible state sequences.

HMM Unfolded in Time

(figure: the model unrolled over time steps, hidden states emitting observations)

Elements of an HMM

- N: number of states
- M: number of observation symbols
- A = [a_ij]: N × N state transition probability matrix
- B = [b_j(m)]: N × M observation probability matrix
- Π = [π_i]: N × 1 initial state probability vector
- λ = (A, B, Π): parameter set of the HMM

Three Basic Problems of HMMs

1. Evaluation: given λ and O, calculate P(O | λ).
2. State sequence: given λ and O, find Q* such that P(Q* | O, λ) = max_Q P(Q | O, λ).
3. Learning: given X = {O^k}_k, find λ* such that P(X | λ*) = max_λ P(X | λ).

(Rabiner, 1989)

Evaluation

Forward variable: α_t(i) ≡ P(O_1 ⋯ O_t, q_t = S_i | λ)

  Initialization: α_1(i) = π_i b_i(O_1)
  Recursion:      α_{t+1}(j) = [Σ_{i=1}^N α_t(i) a_ij] b_j(O_{t+1})

  P(O | λ) = Σ_{i=1}^N α_T(i)

Backward variable: β_t(i) ≡ P(O_{t+1} ⋯ O_T | q_t = S_i, λ)

  Initialization: β_T(i) = 1
  Recursion:      β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j)

Finding the State Sequence

  γ_t(i) ≡ P(q_t = S_i | O, λ) = α_t(i) β_t(i) / Σ_{j=1}^N α_t(j) β_t(j)

Choose the state that has the highest probability at each time step, q_t* = argmax_i γ_t(i)?
No! The individually most likely states need not form a feasible path (two consecutive choices may have a_ij = 0); the Viterbi algorithm below finds the single best state sequence instead.
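To make the evaluation and state-posterior computations concrete, here is a minimal NumPy sketch of the forward-backward recursions. It is not from the slides: the names (forward_backward, pi, A, B, obs) are illustrative, and it omits the scaling or log-space arithmetic a practical implementation needs for long sequences. Setting B to the identity matrix makes each state emit its own color, which reduces the HMM to the observable balls-and-urns Markov chain, so the likelihood of {S_1, S_1, S_3, S_3} should come out as 0.048.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward-backward recursions for a discrete-observation HMM.

    pi : (N,)   initial state probabilities pi_i
    A  : (N, N) transition probabilities a_ij = P(q_{t+1}=S_j | q_t=S_i)
    B  : (N, M) emission probabilities  b_j(m) = P(O_t=v_m | q_t=S_j)
    obs: length-T sequence of observation indices in {0, ..., M-1}
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    # Forward pass: alpha_t(j) = P(O_1..O_t, q_t = S_j | lambda)
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta_t(i) = P(O_{t+1}..O_T | q_t = S_i, lambda)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[T - 1].sum()      # P(O | lambda)
    gamma = alpha * beta / likelihood    # gamma_t(i) = P(q_t = S_i | O, lambda)
    return likelihood, gamma

# Balls-and-urns numbers from the slides; identity emissions make the states observable.
pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
B = np.eye(3)
likelihood, gamma = forward_backward(pi, A, B, [0, 0, 2, 2])
print(likelihood)   # ≈ 0.048, matching the slide example
```

Each row of gamma sums to 1, and gamma[t].argmax() is the per-step guess discussed above; the Viterbi algorithm that follows replaces it with a search over whole paths.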
Viterbi's Algorithm

  δ_t(i) ≡ max_{q_1 q_2 ⋯ q_{t-1}} p(q_1 q_2 ⋯ q_{t-1}, q_t = S_i, O_1 ⋯ O_t | λ)

  Initialization:     δ_1(i) = π_i b_i(O_1),  ψ_1(i) = 0
  Recursion:          δ_t(j) = max_i [δ_{t-1}(i) a_ij] b_j(O_t),  ψ_t(j) = argmax_i δ_{t-1}(i) a_ij
  Termination:        p* = max_i δ_T(i),  q_T* = argmax_i δ_T(i)
  Path backtracking:  q_t* = ψ_{t+1}(q_{t+1}*),  t = T-1, T-2, ..., 1

Learning

  ξ_t(i, j) ≡ P(q_t = S_i, q_{t+1} = S_j | O, λ)
            = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / Σ_k Σ_l α_t(k) a_kl b_l(O_{t+1}) β_{t+1}(l)

Baum-Welch algorithm (EM), with indicator variables

  z_i^t  = 1 if q_t = S_i, 0 otherwise
  z_ij^t = 1 if q_t = S_i and q_{t+1} = S_j, 0 otherwise

Baum-Welch (EM)

E-step:

  E[z_i^t]  = γ_t(i)
  E[z_ij^t] = ξ_t(i, j)

M-step:

  π̂_i    = Σ_{k=1}^K γ_1^k(i) / K
  â_ij   = Σ_{k=1}^K Σ_{t=1}^{T_k - 1} ξ_t^k(i, j) / Σ_{k=1}^K Σ_{t=1}^{T_k - 1} γ_t^k(i)
  b̂_j(m) = Σ_{k=1}^K Σ_{t=1}^{T_k - 1} γ_t^k(j) 1(O_t^k = v_m) / Σ_{k=1}^K Σ_{t=1}^{T_k - 1} γ_t^k(j)

Continuous Observations

Discrete observations:

  P(O_t | q_t = S_j, λ) = ∏_{m=1}^M b_j(m)^{r_m^t}, where r_m^t = 1 if O_t = v_m and 0 otherwise

Gaussian mixture (discretize using k-means):

  P(O_t | q_t = S_j, λ) = Σ_{l=1}^L P(G_l) p(O_t | q_t = S_j, G_l, λ), with components ~ N(μ_l, Σ_l)

Continuous:

  P(O_t | q_t = S_j, λ) ~ N(μ_j, σ_j²)

Use EM to learn the parameters, e.g.,

  μ̂_j = Σ_t γ_t(j) O_t / Σ_t γ_t(j)

HMM with Input

Input-dependent observations:

  P(O_t | q_t = S_j, x^t, λ) ~ N(g_j(x^t | θ_j), σ_j²)

Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996):

  P(q_{t+1} = S_j | q_t = S_i, x^t)

Time-delay input:

  x^t = f(O_{t-τ}, ..., O_{t-1})

Model Selection in HMM

Left-to-right HMMs:

  A = | a_11  a_12  a_13   0   |
      |  0    a_22  a_23  a_24 |
      |  0     0    a_33  a_34 |
      |  0     0     0    a_44 |

In classification, for each class C_i, estimate P(O | λ_i) with a separate HMM and use Bayes' rule:

  P(λ_i | O) = P(O | λ_i) P(λ_i) / Σ_j P(O | λ_j) P(λ_j)
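The δ/ψ recursions from the Viterbi slide above translate almost line for line into code. The sketch below is again illustrative rather than the book's implementation: the names are mine, and it works in log space (a standard trick the slides do not show) so that long sequences do not underflow.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence Q* = argmax_Q P(Q | O, lambda), via log-space delta/psi."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):                 # zero probabilities become -inf
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)

    delta = np.zeros((T, N))                           # delta_t(j): best log-prob ending in S_j at t
    psi = np.zeros((T, N), dtype=int)                  # psi_t(j): best predecessor of S_j at t

    # Initialization and recursion
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A         # scores[i, j] = delta_{t-1}(i) + log a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]

    # Termination and path backtracking
    q = np.zeros(T, dtype=int)
    q[T - 1] = delta[T - 1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[T - 1].max()                       # state indices and log p*
```

Unlike the per-step argmax over γ_t(i), the backtracked path is guaranteed to be a feasible sequence under A.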
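Putting the E- and M-steps together, here is one Baum-Welch reestimation iteration for a single observation sequence, as a sketch under the same assumptions as above: illustrative names, unscaled α/β, no smoothing of zero counts.

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM reestimation step (pi, A, B) -> (pi', A', B') for one observation sequence."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)

    # E-step: forward/backward variables, then gamma_t(i) and xi_t(i, j)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[T - 1].sum()                      # P(O | lambda)

    gamma = alpha * beta / likelihood                    # E[z_i^t]
    xi = np.zeros((T - 1, N, N))                         # E[z_ij^t]
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / likelihood

    # M-step: reestimate from the expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for m in range(B.shape[1]):
        # B update sums over all t here; the slide's M-step truncates at T_k - 1 for convenience
        new_B[:, m] = gamma[obs == m].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, likelihood
```

Iterating this step until the likelihood stops increasing gives the usual EM training loop for problem 3; with multiple sequences, the expected counts are summed over k before normalizing, as in the M-step formulas above.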