Hidden Markov Models I
Biology 162 Computational Genetics
Todd Vision
14 Sep 2004

Hidden Markov Models I
• Markov chains
• Hidden Markov models
  – Transition and emission probabilities
  – Decoding algorithms
    • Viterbi
    • Forward
    • Forward-backward
  – Parameter estimation
    • Baum-Welch algorithm

Markov chain
• A particular class of Markov process
  – Finite set of states
  – The probability of being in state i at time t+1 depends only on the state at time t (the Markov property)
• Can be described by
  – A transition probability matrix
  – An initial probability distribution

Markov chain
(Diagram: a three-state chain with states 1, 2, and 3, and transition probabilities a11, a12, a13, a21, a22, a23, a31, a32, a33 labeling the arrows.)

Transition probability matrix
• A square matrix with dimensions equal to the number of states
• Entry a_ij is the probability of going from state i to state j in the next step
• Each row must sum to 1:

        | a11  a12  a13 |
    A = | a21  a22  a23 |,    Σ_j a_ij = 1 for every row i
        | a31  a32  a33 |

Multistep transitions
• The probability of a two-step transition from i to j is the sum of the probabilities of all one-step paths through an intermediate state k:

    a_ij^(2) = Σ_k a_ik · a_kj,   i.e.  A^(2) = A·A = A^2

• And so on for n steps:  A^(n) = A^n

Stationary distribution
• A vector of state frequencies φ = (φ_1, φ_2, …, φ_N) that exists if the chain
  – Is irreducible: each state can eventually be reached from every other
  – Is aperiodic: the state sequence does not necessarily cycle
• As n → ∞, every row of A^(n) approaches the same vector φ, with Σ_i φ_i = 1

Reducibility
(Diagram: a reducible chain, in which some states cannot be reached from others.)

Periodicity
(Diagram: a periodic chain, in which the states are visited in a fixed cycle.)

Applications
• Substitution models
  – PAM
  – DNA and codon substitution models
• Phylogenetics and molecular evolution
• Hidden Markov models

Hidden Markov models: applications
• Alignment and homology search
• Gene finding
• Physical mapping
• Genetic linkage mapping
• Protein secondary structure prediction

Hidden Markov models
• Observed sequence of symbols
• Hidden sequence of underlying states
• Transition probabilities still govern the transitions among states
• Emission probabilities govern the likelihood of observing a symbol in a particular state

Hidden Markov models
Let π represent the state and x represent the symbol.

    Transition probabilities:  a_xy  = P(π_i = y | π_{i-1} = x)
    Emission probabilities:    e_k(b) = P(x_i = b | π_i = k)

A coin flip HMM
• Two coins
  – Fair: 50% Heads, 50% Tails
  – Loaded: 90% Heads, 10% Tails
• What is the probability of each of these sequences, assuming one coin or the other?
  – A: HHTHTHTTHT
  – B: HHHHHTHHHH

    P_{A,F} = (0.5)^10 ≈ 1 × 10^-3      P_{A,L} = (0.9)^5 (0.1)^5 ≈ 6 × 10^-6
    P_{B,F} = (0.5)^10 ≈ 1 × 10^-3      P_{B,L} = (0.9)^9 (0.1)^1 ≈ 4 × 10^-2

A coin flip HMM
• Now imagine the coin is switched with some probability

    Symbol: HTTHHTHHHTHHHHHTHHTHTTHTTHTTH
    State:  FFFFFFFLLLLLLLLFFFFFFFFFFFFFL

    Symbol: HHHHTHHHTHTTHTTHHTTHHTHHTHHHHHHHTTHTT
    State:  LLLLLLLLFFFFFFFFFFFFFFLLLLLLLLLLFFFFF

The formal model
(Diagram: two states, F and L, connected by transitions a_FF, a_FL, a_LF, a_LL; F emits H and T with probability 0.5 each, L emits H with probability 0.9 and T with probability 0.1.)

    where a_FF, a_LL > a_FL, a_LF

Probability of a state path
    Symbol: T H H H
    State:  F F L L
    P(x, π) = a_0F · e_F(T) · a_FF · e_F(H) · a_FL · e_L(H) · a_LL · e_L(H)

    Symbol: T H H H
    State:  L L F F
    P(x, π) = a_0L · e_L(T) · a_LL · e_L(H) · a_LF · e_F(H) · a_FF · e_F(H)

Generally,

    P(x, π) = a_{0,π_1} · Π_{i=1..L} e_{π_i}(x_i) · a_{π_i,π_{i+1}}
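As a concrete check on this formula, the following short Python sketch (not part of the original slides) evaluates P(x, π) for the two paths above, using the coin-HMM parameters that appear later in the Viterbi example (a_FF = a_LL = 0.7, a_FL = a_LF = 0.3, a_0 = (0.5, 0.5)); treat the code as illustrative only.

    # Joint probability P(x, pi) for the fair/loaded coin HMM (illustrative sketch).
    a0 = {"F": 0.5, "L": 0.5}                      # initial state probabilities
    a  = {("F", "F"): 0.7, ("F", "L"): 0.3,        # transition probabilities
          ("L", "F"): 0.3, ("L", "L"): 0.7}
    e  = {"F": {"H": 0.5, "T": 0.5},               # emission probabilities
          "L": {"H": 0.9, "T": 0.1}}

    def joint_prob(symbols, states):
        # P(x, pi) = a_{0,pi_1} * prod_i [ e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}} ]
        p = a0[states[0]]
        for i, (sym, st) in enumerate(zip(symbols, states)):
            p *= e[st][sym]
            if i + 1 < len(states):
                p *= a[(st, states[i + 1])]
        return p

    print(joint_prob("THHH", "FFLL"))   # state path F F L L
    print(joint_prob("THHH", "LLFF"))   # state path L L F F

Enumerating P(x, π) over every possible path this way quickly becomes infeasible, which motivates the dynamic-programming algorithms on the following slides.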
HMMs as sequence generators
• An HMM can generate an infinite number of sequences
  – There is a probability associated with each one
  – This is unlike regular expressions
• With a given sequence
  – We might want to ask how often that sequence would be generated by a given HMM
  – The problem is that there are many possible state paths, even for a single HMM
• Forward algorithm
  – Gives us the summed probability over all state paths

Decoding
• How do we infer the “best” state path?
  – We can observe the sequence of symbols
  – Assume we also know
    • Transition probabilities
    • Emission probabilities
    • Initial state probabilities
• Two ways to answer that question
  – Viterbi algorithm: finds the single most likely state path
  – Forward-backward algorithm: finds the probability of each state at each position
  – These may give different answers

Viterbi algorithm
• We use dynamic programming again
• Maximum likelihood path:  π* = argmax_π P(x, π)
• Assume we know v_k(i), the probability of the most probable path ending in state k at position i
• We can then recursively find the most probable path ending in state l at the next position:

    v_l(i+1) = e_l(x_{i+1}) · max_k [ v_k(i) · a_kl ]

Viterbi with coin example
• Let a_FF = a_LL = 0.7, a_FL = a_LF = 0.3, a_0 = (0.5, 0.5)

    v_k(i):   B     T       H        H        H
       B      1     0       0        0        0
       F      0     0.25    0.0875   0.0306   0.0107
       L      0     0.05    0.0675   0.0425   0.0268

• Tracing back from the largest value in the final column gives the most probable path: F L L L
• Better to use log probabilities!

Forward algorithm
• Gives us the sum over all paths through the model:

    f_k(i) = P(x_1 … x_i, π_i = k)

• The recursion is similar to Viterbi, but with a twist
  – Rather than taking the maximum over states k at position i, we take the sum over all possible states k:

    f_l(i+1) = e_l(x_{i+1}) · Σ_k f_k(i) · a_kl

Forward with coin example
• Let a_FF = a_LL = 0.7, a_FL = a_LF = 0.3, a_0 = (0.5, 0.5)
• e_L(H) = 0.9

    f_k(i):   B     T       H        H     H
       B      1     0       0        0     0
       F      0     0.25    0.095    ?     ?
       L      0     0.05    0.099    ?     ?

Forward-backward algorithm
• We wish to calculate P(π_i = k | x), the probability of being in state k at position i given the whole sequence:

    P(π_i = k | x) = P(x_1 … x_i, π_i = k) · P(x_{i+1} … x_L | π_i = k) / P(x) = f_k(i) · b_k(i) / P(x)

    where b_k(i) is the backward variable
• We calculate b_k(i) just like f_k(i), but starting at the end of the sequence

Posterior decoding
• We can use the forward-backward algorithm to define a single state sequence, as in Viterbi:

    π̂_i = argmax_k P(π_i = k | x)

• Or we can use it to look at ‘composite states’
  – Example: a gene prediction HMM
  – The model contains states for UTRs, exons, introns, etc., versus noncoding sequence
  – A composite state for a gene would consist of all of the above except noncoding sequence
  – We can calculate the probability of finding a gene, independent of the specific match states

Parameter estimation
• Design of the model (specific to the application)
  – What states are there?
  – How are they connected?
• Assigning values to
  – Transition probabilities
  – Emission probabilities

Model training
• Assume the states and connectivity are given
• We use a training set from which our model will learn the parameters
  – An example of machine learning
  – The likelihood is the probability of the data given the model
  – Calculate the likelihood assuming the j = 1..n sequences in the training set are independent:

    l(x^1, …, x^n | θ) = log P(x^1, …, x^n | θ) = Σ_{j=1..n} log P(x^j | θ)

When the state sequence is known
• Maximum likelihood estimators
    A_kl   = observed number of transitions from state k to state l
    E_k(b) = observed number of emissions of symbol b in state k

    â_kl = A_kl / Σ_{l'} A_{kl'}        ê_k(b) = E_k(b) / Σ_{b'} E_k(b')

• Adjusted with pseudocounts

When the state sequence is unknown
• Baum-Welch algorithm
  – An example of a general class of EM (Expectation-Maximization) algorithms
  – Initialize with a guess at a_kl and e_k(b)
  – Iterate until convergence
    • Calculate likely paths with the current parameters
    • Recalculate the parameters from those likely paths
  – A_kl and E_k(b) are calculated from posterior decoding (i.e., the forward-backward algorithm; see the sketch below) at each iteration
  – Can get stuck on local optima
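To make the last few slides concrete, here is a minimal Python sketch (not from the original slides) of the forward and backward passes and the resulting posterior state probabilities for the coin HMM, using the same illustrative parameters as the worked examples above (a_FF = a_LL = 0.7, a_FL = a_LF = 0.3, a_0 = (0.5, 0.5)). The forward pass also fills in the ‘?’ entries of the forward table; Baum-Welch would recompute the expected counts A_kl and E_k(b) from exactly these forward and backward quantities at each iteration.

    # Forward-backward for the fair/loaded coin HMM (illustrative sketch).
    states = ["F", "L"]
    a0 = {"F": 0.5, "L": 0.5}                      # initial state probabilities
    a  = {("F", "F"): 0.7, ("F", "L"): 0.3,        # transition probabilities
          ("L", "F"): 0.3, ("L", "L"): 0.7}
    e  = {"F": {"H": 0.5, "T": 0.5},               # emission probabilities
          "L": {"H": 0.9, "T": 0.1}}

    def forward(x):
        # f[i][k] = P(x_1..x_i, pi_i = k)
        f = [{k: a0[k] * e[k][x[0]] for k in states}]
        for i in range(1, len(x)):
            f.append({l: e[l][x[i]] * sum(f[i - 1][k] * a[(k, l)] for k in states)
                      for l in states})
        return f

    def backward(x):
        # b[i][k] = P(x_{i+1}..x_L | pi_i = k); equal to 1 at the last position
        b = [dict.fromkeys(states, 1.0) for _ in x]
        for i in range(len(x) - 2, -1, -1):
            b[i] = {k: sum(a[(k, l)] * e[l][x[i + 1]] * b[i + 1][l] for l in states)
                    for k in states}
        return b

    x = "THHH"
    f, b = forward(x), backward(x)
    px = sum(f[-1][k] for k in states)             # P(x), summed over all state paths
    for i in range(len(x)):
        post = {k: f[i][k] * b[i][k] / px for k in states}
        print(x[i], post)                          # posterior P(pi_i = k | x)

For realistic sequence lengths these products underflow quickly, so practical implementations work with scaled values or log probabilities, as the Viterbi slide already recommends.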
Preview: Profile HMMs

Reading assignment
• Continue studying:
  – Durbin et al. (1998), pp. 46–79, Biological Sequence Analysis