Biology 162: Computational Genetics

Hidden Markov Models I
Todd Vision
14 Sep 2004
Hidden Markov Models I
• Markov chains
• Hidden Markov models
– Transition and emission probabilities
– Decoding algorithms
• Viterbi
• Forward
• Forward-backward
– Parameter estimation
• Baum-Welch algorithm
Markov Chain
• A particular class of Markov process
– Finite set of states
– Probability of being in state i at time t+1
depends only on state at time t (Markov
property)
• Can be described by
– Transition probability matrix
– Initial probability distribution $\pi^{(0)}$
Markov chain
[Figure: a three-state Markov chain with states 1, 2, and 3, connected by transition probabilities $a_{ij}$ (e.g., $a_{12}$, $a_{21}$, $a_{13}$, $a_{31}$, $a_{23}$, $a_{32}$) and self-transitions $a_{11}$, $a_{22}$, $a_{33}$.]
Transition probability matrix
• Square matrix with dimensions equal to the
number of states
• Describes the probability of going from state i
to state j in the next step
• Sum of each row must equal 1
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \qquad \sum_j a_{ij} = 1$$
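As a concrete sketch (the numerical values below are invented for illustration, not from the lecture), a transition probability matrix can be stored as a row-stochastic array:

```python
# A minimal sketch, assuming NumPy; transition values are illustrative.
import numpy as np

A = np.array([
    [0.7, 0.2, 0.1],   # a11, a12, a13
    [0.3, 0.4, 0.3],   # a21, a22, a23
    [0.2, 0.3, 0.5],   # a31, a32, a33
])

# Each row is a probability distribution over the next state.
assert np.allclose(A.sum(axis=1), 1.0)
```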
Multistep transitions
• The probability of a two-step transition from i to j is the sum, over all intermediate states k, of the products of the one-step transition probabilities
• And so on for n steps
$$a_{ij}^{(2)} = \sum_k a_{ik}\, a_{kj}, \qquad A^{(2)} = A^2, \qquad A^{(n)} = A^n$$
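In code, the n-step relation is just a matrix power; a sketch reusing the illustrative matrix above:

```python
# Sketch: multistep transition probabilities as matrix powers.
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

A2 = A @ A                            # a_ij^(2) = sum_k a_ik * a_kj
A10 = np.linalg.matrix_power(A, 10)   # ten-step transition probabilities
print(A2[0, 2], A10[0, 2])            # P(state 3 | state 1), 2 and 10 steps out
```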
Stationary distribution
• A vector of state frequencies $\pi$ that exists if the chain
– Is irreducible: each state can eventually be reached from every other
– Is aperiodic: the state sequence does not necessarily cycle
$$\lim_{n \to \infty} A^{(n)} = \begin{pmatrix} \pi_1 & \pi_2 & \cdots & \pi_N \\ \vdots & \vdots & & \vdots \\ \pi_1 & \pi_2 & \cdots & \pi_N \end{pmatrix}, \qquad \sum_i \pi_i = 1$$
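Two standard numerical routes to the stationary distribution, sketched with the same illustrative matrix (generic linear algebra, not anything specific to the lecture):

```python
# Sketch: stationary distribution of an irreducible, aperiodic chain.
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Approach 1: raise A to a large power; every row converges to pi.
pi_power = np.linalg.matrix_power(A, 100)[0]

# Approach 2: pi is the left eigenvector of A with eigenvalue 1,
# normalized so its entries sum to 1.
vals, vecs = np.linalg.eig(A.T)
pi_eig = np.real(vecs[:, np.argmax(np.isclose(vals, 1.0))])
pi_eig /= pi_eig.sum()

print(pi_power, pi_eig)   # the two estimates agree
```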
Reducibility
Periodicity
Applications
• Substitution models
– PAM
– DNA and codon substitution models
• Phylogenetics and molecular evolution
• Hidden Markov models
Hidden Markov models: applications
• Alignment and homology search
• Gene finding
• Physical mapping
• Genetic linkage mapping
• Protein secondary structure prediction
Hidden Markov models
• Observed sequence of symbols
• Hidden sequence of underlying states
• Transition probabilities still govern
transitions among states
• Emission probabilities govern the
likelihood of observing a symbol in a
particular state
Hidden Markov models
Let $\pi$ represent the state and $x$ represent the symbol.
Transition probabilities: $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$
Emission probabilities: $e_k(b) = P(x_i = b \mid \pi_i = k)$
A coin flip HMM
• Two coins
– Fair: 50% Heads, 50% Tails
– Loaded: 90% Heads, 10% Tails
• What is the probability of each of these sequences, assuming one coin or the other?

A: HHTHTHTTHT
B: HHHHHTHHHH

$P_{A,F} = (0.5)^{10} \approx 1 \times 10^{-3}$    $P_{A,L} = (0.9)^{5}(0.1)^{5} \approx 6 \times 10^{-6}$
$P_{B,F} = (0.5)^{10} \approx 1 \times 10^{-3}$    $P_{B,L} = (0.9)^{9}(0.1)^{1} \approx 4 \times 10^{-2}$
A coin flip HMM
• Now imagine the coin is switched with some
probability
Symbol: HTTHHTHHHTHHHHHTHHTHTTHTTHTTH
State: FFFFFFFLLLLLLLLFFFFFFFFFFFFFL
HHHHTHHHTHTTHTTHHTTHHTHHTHHHHHHHTTHTT
LLLLLLLLFFFFFFFFFFFFFFLLLLLLLLLLFFFFF
The formal model
[Figure: a two-state HMM. State F emits H and T with probability 0.5 each; state L emits H with probability 0.9 and T with probability 0.1. The states are connected by transition probabilities $a_{FF}$, $a_{FL}$, $a_{LF}$, $a_{LL}$, where $a_{FF}, a_{LL} > a_{FL}, a_{LF}$.]
Probability of a state path
Symbol: T H H H
State:  F F L L

$P(x, \pi) = a_{0F}\, e_F(T)\, a_{FF}\, e_F(H)\, a_{FL}\, e_L(H)\, a_{LL}\, e_L(H)$

Symbol: T H H H
State:  L L F F

$P(x, \pi) = a_{0L}\, e_L(T)\, a_{LL}\, e_L(H)\, a_{LF}\, e_F(H)\, a_{FF}\, e_F(H)$

Generally,

$$P(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$$

(with $\pi_{L+1}$ taken to be the end state, or the final transition omitted as in the examples above)
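A small sketch that evaluates this formula for the two example paths, using the coin-HMM parameters from the Viterbi example later in the lecture ($a_{FF} = a_{LL} = 0.7$, $a_{FL} = a_{LF} = 0.3$, $a_0 = (0.5, 0.5)$); like the slide's examples, it omits the final transition to an end state:

```python
# Sketch: P(x, pi) for a given symbol sequence and state path.
a0 = {'F': 0.5, 'L': 0.5}
a = {('F', 'F'): 0.7, ('F', 'L'): 0.3, ('L', 'F'): 0.3, ('L', 'L'): 0.7}
e = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.9, 'T': 0.1}}

def path_probability(symbols, states):
    """P(x, pi) = a_{0,pi_1} * prod_i [ e_{pi_i}(x_i) * a_{pi_i, pi_i+1} ]."""
    p = a0[states[0]]
    for i, (x, s) in enumerate(zip(symbols, states)):
        p *= e[s][x]                       # emission at position i
        if i + 1 < len(states):
            p *= a[(s, states[i + 1])]     # transition to the next state
    return p

print(path_probability('THHH', 'FFLL'))   # first path from the slide
print(path_probability('THHH', 'LLFF'))   # second path from the slide
```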
HMMs as sequence generators
• An HMM can generate an infinite number of
sequences
– There is a probability associated with each one
– This is unlike regular expressions
• With a given sequence
– We might want to ask how often that sequence
would be generated by a given HMM
– The problem is there are many possible state
paths even for a single HMM
• Forward algorithm
– Gives us the summed probability of all state paths
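A sketch of such a generator for the coin HMM (the switching probabilities here are invented; only the emission probabilities come from the slides):

```python
# Sketch: sampling symbol/state sequences from the coin HMM.
import random

a0 = {'F': 0.5, 'L': 0.5}
a = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.1, 'L': 0.9}}
e = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.9, 'T': 0.1}}

def sample(length, rng=random.Random(0)):
    def draw(dist):
        return rng.choices(list(dist), weights=list(dist.values()))[0]
    states = [draw(a0)]
    for _ in range(length - 1):
        states.append(draw(a[states[-1]]))      # Markov transition
    symbols = [draw(e[s]) for s in states]      # emission per state
    return ''.join(symbols), ''.join(states)

print(*sample(30), sep='\n')   # one of infinitely many possible outputs
```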
Decoding
• How do we infer the “best” state path?
– We can observe the sequence of symbols
– Assume we also know
• Transition probabilities
• Emission probabilities
• Initial state probabilities
• Two ways to answer that question
– Viterbi algorithm - finds the single most likely state
path
– Forward-backward algorithm - finds the probability
of each state at each position
– These may give different answers
Viterbi algorithm
We use dynamic programming again
Maximum likelihood path: $\pi^* = \arg\max_{\pi} P(x, \pi)$

Assume we know $v_k(i)$, the probability of the most probable path ending in state k at position i. We can then recursively find the most probable path ending in each state l at the next position:

$$v_l(i+1) = e_l(x_{i+1}) \max_k \left( v_k(i)\, a_{kl} \right)$$
Viterbi with coin example
• Let $a_{FF} = a_{LL} = 0.7$, $a_{FL} = a_{LF} = 0.3$, $a_0 = (0.5, 0.5)$
          T        H        H        H
B   1     0        0        0        0
F   0     0.25*    0.0875   0.0306   0.0107
L   0     0.05     0.0675*  0.0425*  0.0268*

• * = F L L L (the most probable state path)
• Better to use log probabilities!
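A minimal log-space Viterbi sketch for this example (an illustration of the recursion above, not code from the course), following the slide's advice about log probabilities:

```python
# Sketch: Viterbi decoding in log space for the coin HMM.
import math

states = ['F', 'L']
a0 = {'F': 0.5, 'L': 0.5}
a = {('F', 'F'): 0.7, ('F', 'L'): 0.3, ('L', 'F'): 0.3, ('L', 'L'): 0.7}
e = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.9, 'T': 0.1}}

def viterbi(x):
    # v[k] = log probability of the best path ending in state k
    v = {k: math.log(a0[k]) + math.log(e[k][x[0]]) for k in states}
    back = []
    for sym in x[1:]:
        prev, v, ptr = v, {}, {}
        for l in states:
            # best predecessor k for state l
            k_best = max(states, key=lambda k: prev[k] + math.log(a[(k, l)]))
            v[l] = prev[k_best] + math.log(a[(k_best, l)]) + math.log(e[l][sym])
            ptr[l] = k_best
        back.append(ptr)
    # traceback from the best final state
    last = max(states, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return ''.join(reversed(path)), v[last]

print(viterbi('THHH'))   # recovers the path F L L L from the table above
```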
Forward algorithm
• Gives us the summed probability of all paths through the model
$$f_k(i) = P(x_1 \ldots x_i,\ \pi_i = k)$$

• Recursion similar to Viterbi, but with a twist
– Rather than taking the maximum over states k at position i, we take the sum over all possible states k:

$$f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$$
Forward with coin example
• Let $a_{FF} = a_{LL} = 0.7$, $a_{FL} = a_{LF} = 0.3$, $a_0 = (0.5, 0.5)$
• $e_L(H) = 0.9$
          T        H        H        H
B   1     0        0        0        0
F   0     0.25     0.095    ?        ?
L   0     0.05     0.099    ?        ?
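A sketch of the forward recursion that reproduces this table and fills in the '?' cells at run time:

```python
# Sketch: the forward algorithm for the coin HMM.
states = ['F', 'L']
a0 = {'F': 0.5, 'L': 0.5}
a = {('F', 'F'): 0.7, ('F', 'L'): 0.3, ('L', 'F'): 0.3, ('L', 'L'): 0.7}
e = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.9, 'T': 0.1}}

def forward(x):
    """Return the table of f_k(i) = P(x_1..x_i, pi_i = k)."""
    f = [{k: a0[k] * e[k][x[0]] for k in states}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: e[l][sym] * sum(prev[k] * a[(k, l)] for k in states)
                  for l in states})
    return f

table = forward('THHH')
for i, col in enumerate(table):
    print(i + 1, col)                         # fills in the '?' cells above
print('P(x) =', sum(table[-1].values()))      # summed over all state paths
```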
Forward-Backward algorithm
We wish to calculate $P(\pi_i = k \mid x)$:

$$P(\pi_i = k \mid x) = \frac{P(x_1 \ldots x_i,\ \pi_i = k)\, P(x_{i+1} \ldots x_L \mid \pi_i = k)}{P(x)} = \frac{f_k(i)\, b_k(i)}{P(x)}$$

where $b_k(i)$ is the backward variable. We calculate $b_k(i)$ like $f_k(i)$, but starting at the end of the sequence.
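A matching sketch of the backward recursion, with the same illustrative coin-HMM parameters:

```python
# Sketch: b_k(i) = P(x_{i+1}..x_L | pi_i = k), computed right to left.
states = ['F', 'L']
a = {('F', 'F'): 0.7, ('F', 'L'): 0.3, ('L', 'F'): 0.3, ('L', 'L'): 0.7}
e = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.9, 'T': 0.1}}

def backward(x):
    L = len(x)
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in states}     # no symbols remain after x_L
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(a[(k, l)] * e[l][x[i + 1]] * b[i + 1][l]
                       for l in states)
                for k in states}
    return b

print(backward('THHH')[0])   # b_k(1) for each state k
```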
Posterior decoding
• We can use the forward-backward algorithm
to define a simple state sequence, as in
Viterbi
$$\hat{\pi}_i = \arg\max_k P(\pi_i = k \mid x)$$

(see the code sketch after this list)
• Or we can use it to look at ‘composite states’
– Example: a gene prediction HMM
– Model contains states for UTRs, exons, introns,
etc. versus noncoding sequence
– A composite state for a gene would consist of all
the above except for noncoding sequence
– We can calculate the probability of finding a gene,
independent of the specific match states
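A short continuation of the forward() and backward() sketches above (it assumes those definitions are in scope, so it is not standalone) that performs simple posterior decoding:

```python
# Sketch: posterior decoding, reusing forward() and backward() from above.
def posterior_decode(x):
    f, b = forward(x), backward(x)
    px = sum(f[-1].values())     # P(x): forward variables summed at the end
    post = [{k: f[i][k] * b[i][k] / px for k in ('F', 'L')}
            for i in range(len(x))]
    # pick the individually most probable state at each position
    return ''.join(max(p, key=p.get) for p in post), post

path, post = posterior_decode('THHH')
print(path, post[0])
```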
Parameter estimation
• Design of model (specific to application)
– What states are there?
– How are they connected?
• Assigning values to
– Transition probabilities
– Emission probabilities
Model training
• Assume the states and connectivity are given
• We use a training set from which our model
will learn the parameters $\theta$
– An example of machine learning
– The likelihood is the probability of the data given the model
– Calculate the likelihood assuming the $j = 1, \ldots, n$ sequences in the training set are independent:

$$\ell(x^1, \ldots, x^n \mid \theta) = \log P(x^1, \ldots, x^n \mid \theta) = \sum_{j=1}^{n} \log P(x^j \mid \theta)$$
When the state sequence is known
• Maximum likelihood estimators:

$A_{kl}$ = observed number of transitions from $k$ to $l$
$E_k(b)$ = observed number of emissions of symbol $b$ in state $k$

$$\hat{a}_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}} \qquad \hat{e}_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$

• Adjusted with pseudocounts (see the sketch below)
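A sketch of these counting estimators with pseudocounts, applied to one invented labeled training pair:

```python
# Sketch: ML estimation of a and e from a known state path,
# with pseudocounts. The training pair below is invented.
from collections import defaultdict

symbols = 'HTTHHTHHHT'   # observed symbols
path    = 'FFFFFLLLLL'   # known state path

A = defaultdict(float)   # A[k, l]: observed transitions k -> l
E = defaultdict(float)   # E[k, b]: observed emissions of b in state k
pseudo = 1.0             # pseudocount, avoids zero-probability estimates

for i in range(len(path)):
    E[path[i], symbols[i]] += 1
    if i + 1 < len(path):
        A[path[i], path[i + 1]] += 1

states, alphabet = 'FL', 'HT'
a_hat = {(k, l): (A[k, l] + pseudo) / sum(A[k, m] + pseudo for m in states)
         for k in states for l in states}
e_hat = {(k, b): (E[k, b] + pseudo) / sum(E[k, c] + pseudo for c in alphabet)
         for k in states for b in alphabet}
print(a_hat)
print(e_hat)
```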
When the state sequence is unknown
• Baum-Welch algorithm
– An example of the general class of EM (Expectation-Maximization) algorithms
– Initialize with a guess at $a_{kl}$ and $e_k(b)$
– Iterate until convergence:
• Calculate likely paths with the current parameters
• Recalculate the parameters from the likely paths
– $A_{kl}$ and $E_k(b)$ are calculated by posterior decoding (i.e., the forward-backward algorithm) at each iteration
– Can get stuck on local optima
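A compact Baum-Welch sketch (NumPy, no log-scaling, so only suitable for short sequences; initial guesses and training data are illustrative, and $a_0$ is held fixed at uniform for simplicity):

```python
# Sketch: Baum-Welch (EM) for a discrete HMM. Symbols are integers
# (H = 0, T = 1).
import numpy as np

def baum_welch(seqs, n_states=2, n_syms=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    a0 = np.full(n_states, 1.0 / n_states)
    a = rng.random((n_states, n_states)); a /= a.sum(axis=1, keepdims=True)
    e = rng.random((n_states, n_syms));   e /= e.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        A = np.zeros((n_states, n_states))   # expected transition counts
        E = np.zeros((n_states, n_syms))     # expected emission counts
        for x in seqs:
            L = len(x)
            f = np.zeros((L, n_states)); b = np.zeros((L, n_states))
            f[0] = a0 * e[:, x[0]]                      # forward pass
            for i in range(1, L):
                f[i] = e[:, x[i]] * (f[i - 1] @ a)
            b[L - 1] = 1.0                              # backward pass
            for i in range(L - 2, -1, -1):
                b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
            px = f[-1].sum()                            # P(x | theta)
            # E-step: accumulate posterior expected counts
            for i in range(L - 1):
                A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
            for i in range(L):
                E[:, x[i]] += f[i] * b[i] / px
        # M-step: re-estimate parameters from expected counts
        a = A / A.sum(axis=1, keepdims=True)
        e = E / E.sum(axis=1, keepdims=True)
    return a, e

seqs = [np.array([0, 1, 1, 0, 0, 0, 0, 1, 0, 0] * 5)]   # invented data
a_hat, e_hat = baum_welch(seqs)
print(a_hat); print(e_hat)
```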
Preview: Profile HMMs
Reading assignment
• Continue studying:
– Durbin et al. (1998), pp. 46-79, Biological Sequence Analysis