STAT 530: Hidden Markov Model
Ping Ma

Outline
Markov Chain
Hidden Markov Model
  – Observations, hidden states, initial, transition and emission probabilities
Three problems
  – P(observations): forward, backward procedure
  – Infer hidden states: forward-backward, Viterbi
  – Estimate parameters: Baum-Welch
Protein sequence motif using HMM

Markov Chain
iid: independent and identically distributed
  – The outcome of event A does not have any impact on B
  – e.g. P(girl | boy) = P(girl), P(H | H) = P(H)
Discrete Markov process
  – Distinct states: S1, S2, …, Sn
  – Regularly spaced discrete times: t = 1, 2, …
  – Markov chain: the future state depends only on the present state, not on the path taken to reach it

Markov Chain Example 1
States: 1 – rain; 2 – cloudy; 3 – sunny
State transition probability
Given state 3 (sunny) at t = 1

Markov Chain Example 2
States: fair coin F, unfair (biased) coin B
Initial probability: πF = 0.6, πB = 0.4

Hidden Markov Model
Coin toss example
Coin transition is a Markov chain
Probability of H/T depends on the coin used
The observed H/T sequence follows a hidden Markov model (the coin state is hidden)

Hidden Markov Model
Elements of an HMM (coin toss)
  – N, the number of states (F / B)
  – M, the number of distinct observation symbols (H / T)

Basic Problems for HMM
1. Given observation sequence O = O1O2…OT and λ, how to compute P(O|λ)
   • Probability of observing HTTHHHT …
   • Forward procedure, backward procedure
2. Given observation sequence O = O1O2…OT and λ, how to choose state sequence Q = q1q2…qT
   • What is the hidden coin behind each flip
   • Forward-backward, Viterbi
3.
   How to estimate λ = (A, B, π) so as to maximize P(O|λ)
   • How to estimate coin parameters λ
   • Baum-Welch (expectation maximization)

Problem 1: P(O|λ)

Solution to Problem 1: Forward Procedure
Use dynamic programming
Sum over states at every time point
Keep previous subproblem solutions to speed up the current calculation

Forward Procedure
Coin toss, O = HTTHHHT
Initialization
Induction

Solution to Problem 1: Backward Procedure
Coin toss, O = HTTHHHT
Initialization
Induction

Solution to Problem 2: Forward-Backward Procedure
Coin toss
Probabilistic prediction at every time point
Forward-backward maximizes the expected number of correctly predicted states

Solution to Problem 2: Viterbi Algorithm
Get the most likely path of hidden states
  – Initialization
  – Recursion
  – Termination
  – Path (state sequence) backtracking

Viterbi Algorithm
Observe: HTTHHHT
Initialization
Recursion: max instead of +, keep track of path
Termination: pick the state that gives the best final δ score, then backtrack to get the path

Solution to Problem 3
No known analytic solution for the global maximum, so find a local maximum
Baum-Welch algorithm (a special case of EM)
  – Randomly initialize λ = (A, B, π)
  –
    Run Viterbi based on λ and O
  – Update λ = (A, B, π)
    • π: % of F vs B on Viterbi path
    • A: frequency of F/B transitions on Viterbi path
    • B: frequency of H/T emitted by F/B
  – Counting on the Viterbi path is the hard-assignment variant (Viterbi training); standard Baum-Welch uses expected counts from forward-backward instead

HMM for Sequence Motifs
Standard profile HMM architecture
Three types of states:
  – Match, Insert, Delete
Start and end “dummy” states

Standard Profile HMM Architecture
Match states
Delete states

HMM for Sequence Motifs
Known alignment

HMM for Sequence Motifs
If sequences are unaligned
  – Use sequences and Baum-Welch to estimate HMM parameters
  – Randomly initialize HMM, find alignment, re-estimate transitions/emissions, repeat…
  – End product: trained HMM parameters (sequence profile) + local multiple alignment
At least 20 sequences required to train the model; training may get stuck in a local optimum
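
Worked sketch: coin toss in Python
Returning to the coin-toss example, the forward, backward, forward-backward and Viterbi procedures can be sketched as below. This is a minimal illustration, not the course's own code. Only the initial probabilities (πF = 0.6, πB = 0.4) come from the slides; the transition matrix A and the biased coin's emission probabilities are assumed values chosen for illustration, since the slide figures that held them were not recovered. The names PI, A, E, forward, backward, posterior and viterbi are likewise hypothetical.

```python
# Coin-toss HMM. PI comes from the slides; A and the biased-coin row of E
# are illustrative ASSUMPTIONS (the original slide figures were not recovered).
STATES = ("F", "B")                      # fair coin, biased coin
PI = {"F": 0.6, "B": 0.4}                # initial probabilities (from the slides)
A = {"F": {"F": 0.9, "B": 0.1},          # assumed transition probabilities
     "B": {"F": 0.1, "B": 0.9}}
E = {"F": {"H": 0.5, "T": 0.5},          # a fair coin emits H/T equally
     "B": {"H": 0.8, "T": 0.2}}          # assumed bias toward heads


def forward(obs):
    """Return alpha, where alpha[t][s] = P(O_1..O_t, q_t = s | lambda)."""
    alpha = [{s: PI[s] * E[s][obs[0]] for s in STATES}]           # initialization
    for o in obs[1:]:                                             # induction
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * A[r][s] for r in STATES) * E[s][o]
                      for s in STATES})
    return alpha


def backward(obs):
    """Return beta, where beta[t][s] = P(O_{t+1}..O_T | q_t = s, lambda)."""
    beta = [{s: 1.0 for s in STATES}]                             # initialization
    for o in reversed(obs[1:]):                                   # induction
        nxt = beta[0]
        beta.insert(0, {s: sum(A[s][r] * E[r][o] * nxt[r] for r in STATES)
                        for s in STATES})
    return beta


def posterior(obs):
    """Forward-backward: gamma[t][s] = P(q_t = s | O, lambda)."""
    alpha, beta = forward(obs), backward(obs)
    p_obs = sum(alpha[-1].values())
    return [{s: a[s] * b[s] / p_obs for s in STATES}
            for a, b in zip(alpha, beta)]


def viterbi(obs):
    """Return the most likely hidden state path for obs as a string."""
    delta = {s: PI[s] * E[s][obs[0]] for s in STATES}             # initialization
    back = []                                                     # backpointers
    for o in obs[1:]:                                             # recursion: max instead of +
        ptr = {s: max(STATES, key=lambda r: delta[r] * A[r][s]) for s in STATES}
        delta = {s: delta[ptr[s]] * A[ptr[s]][s] * E[s][o] for s in STATES}
        back.append(ptr)
    last = max(STATES, key=lambda s: delta[s])                    # termination
    path = [last]
    for ptr in reversed(back):                                    # backtracking
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


obs = "HTTHHHT"
alpha, beta = forward(obs), backward(obs)
p_forward = sum(alpha[-1].values())
p_backward = sum(PI[s] * E[s][obs[0]] * beta[0][s] for s in STATES)
print("P(O|lambda) via forward: ", p_forward)
print("P(O|lambda) via backward:", p_backward)
print("Viterbi path:", viterbi(obs))
print("Posterior P(q_t = F | O):", [round(g["F"], 3) for g in posterior(obs)])
```

The forward and backward totals should agree up to rounding, which is a convenient sanity check. The Viterbi path and the per-position posterior answer different questions (best single path versus best state at each time point), so they need not pick the same coin at every flip. For long sequences, a practical implementation would work in log space or rescale alpha and beta to avoid underflow.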