
Chapter 4: Hidden Markov Models
4.2 HMM: Computing Likelihood
Prof. Yechiam Yemini (YY)
Computer Science Department
Columbia University
The Problem
 How likely is a given sequence of observations?
 Let X=X1…Xn be the observed sequence
 Compute the probability P(X)
P(X1,...,Xn) = Σπ1..πn P(X1,...,Xn, π1,...,πn) = Σπ P(X,π)
The event of “observing X” is the disjoint union of the events
<X,π> of “following path π and emitting X”.
 This involves summing over an exponential number of paths (see the brute-force sketch below)
 Recursion can reduce this complexity
 Forward/backward algorithms
[Figure: two-state coin HMM. Start → F and Start → B each with probability .5;
transitions a(F,F)=.6, a(F,B)=.4, a(B,F)=.8, a(B,B)=.2;
emissions e(F,H)=0.5, e(F,T)=0.5, e(B,H)=0.9, e(B,T)=0.1.]
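To make the exponential sum concrete, here is a minimal brute-force sketch in Python for the coin HMM in the figure; it enumerates every path π and adds up P(X,π) exactly as the formula above states. The dictionary and function names (start, trans, emit, brute_force_likelihood) are illustrative, not from the slides.

```python
from itertools import product

# The two-state coin HMM from the figure (F = Fair, B = Biased).
start = {"F": 0.5, "B": 0.5}                 # a(Start, k)
trans = {"F": {"F": 0.6, "B": 0.4},          # a(k, m)
         "B": {"F": 0.8, "B": 0.2}}
emit = {"F": {"H": 0.5, "T": 0.5},           # e(k, x)
        "B": {"H": 0.9, "T": 0.1}}

def brute_force_likelihood(X):
    """P(X) = sum over all 2^n paths pi of P(X, pi) -- exponential in n."""
    total = 0.0
    for path in product("FB", repeat=len(X)):
        p = start[path[0]] * emit[path[0]][X[0]]
        for i in range(1, len(X)):
            p *= trans[path[i - 1]][path[i]] * emit[path[i]][X[i]]
        total += p
    return total

print(brute_force_likelihood("HHH"))         # 0.26748
```

This is exact but useless in practice: for K states and n observations it touches K^n paths, which is what the forward/backward recursions avoid.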
Why Is This Problem of Interest?
 Computing likelihood
 E.g., what is the likelihood that a sequence X belongs to a class C?
C = CpG island, intron, promoter…, protein active site, transmembrane protein…
 Computing p-value
 Learning HMM
 Define θ = {akm, ek(x)} as the HMM parameters (transition and emission probabilities)
 For a sequence X = (X1…Xn) define the log-likelihood L(X|θ) = log P(X|θ)
 For a sample of observed sequences {Xs} define L(θ)=Σs L(Xs|θ)
 L(θ) is the log-likelihood of observing this sample given the model θ
 Maximum Likelihood Estimation (MLE): find θ maximizing L(θ)
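As a small illustration of these definitions, a minimal sketch of computing L(θ): the list below stands in for the per-sequence likelihoods P(Xs|θ), each of which would in practice come from the forward algorithm of the next slide. The value 0.26748 is P(HHH) for the coin HMM above; the other numbers are invented for illustration.

```python
import math

# Stand-in per-sequence likelihoods P(Xs | theta); 0.26748 is P(HHH) for
# the coin HMM above, the other two values are purely illustrative.
likelihoods = [0.26748, 0.031, 0.0042]

# L(theta) = sum_s log P(Xs | theta); MLE searches for the theta
# (transition/emission probabilities) that maximizes this sum.
log_likelihood = sum(math.log(p) for p in likelihoods)
print(log_likelihood)
```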
The Forward Algorithm
Key idea: use recursion to reduce complexity
Define fk(i) = P(X1…Xi, πi = k)
 fk(i) is the likelihood of emitting X1..Xi and reaching state k
 P(X1…Xi) = Σk fk(i)
Recursive formula to compute fk(i):
Initialization: f0(0) = 1, fk(0) = 0 for k > 0
Recursion:
fm(i) = P(X1,...,Xi, πi = m)
      = Σπ1..πi-1 P(X1,...,Xi-1, π1,...,πi = m) em(Xi)
      = Σk Σπ1..πi-2 P(X1,...,Xi-1, π1,...,πi-1 = k) akm em(Xi)
      = em(Xi) Σk fk(i-1) akm
Completion: P(X1,...,Xn) = Σk fk(n)
This derivation is based on two facts:
1. If an event A is a disjoint union of events A = ∪π B[π],
   then P(A) = Σπ P(B[π]); here A = (X1…Xi, πi = m) and
   B[π] = (X1…Xi, π1…πi = m).
2. P(B[π] ∩ C) = P(C) P(B[π] | C); here C = (X1…Xi-1, π1…πi = m)
   and P(B[π] | C) = em(Xi).
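A minimal Python sketch of this recursion, assuming the start/trans/emit tables from the brute-force sketch above; it replaces the exponential sum over paths with an O(K²n) table fill.

```python
def forward(X, start, trans, emit):
    """Forward algorithm: returns the table f[i][k] = P(X1..Xi, pi_i = k)
    and the likelihood P(X), in O(K^2 n) time instead of O(K^n)."""
    states = list(start)
    # Initialization (absorbing f0(0) = 1): fk(1) = a(Start,k) * ek(X1)
    f = [{k: start[k] * emit[k][X[0]] for k in states}]
    # Recursion: fm(i) = em(Xi) * sum_k fk(i-1) * a(k,m)
    for x in X[1:]:
        prev = f[-1]
        f.append({m: emit[m][x] * sum(prev[k] * trans[k][m] for k in states)
                  for m in states})
    # Completion: P(X) = sum_k fk(n)
    return f, sum(f[-1].values())
```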
Example
 What is the probability P(HHH)?
Initialization: f0(0) = 1, fk(0) = 0
Recursion: fm(i) = em(Xi) Σk fk(i-1) akm
Completion: P(X1,...,Xn) = Σk fk(n)
[Figure: the same two-state coin HMM as before.]
Forward table fm(i) (rounded to three decimals):
              i=0     i=1     i=2     i=3
m=Start        1       0       0       0
m=F (Fair)     0      0.25    0.255   0.145
m=B (Biased)   0      0.45    0.171   0.123
fF(1) = eF(H) Σk fk(0) akF = 0.5*0.5 = 0.25
fB(1) = eB(H) Σk fk(0) akB = 0.9*0.5 = 0.45
fF(2) = eF(H) Σk fk(1) akF = .5*[.25*.6 + .45*.8] = 0.255
fB(2) = eB(H) Σk fk(1) akB = .9*[.25*.4 + .45*.2] = 0.171
fF(3) = eF(H) Σk fk(2) akF = .5*[.255*.6 + .171*.8] = 0.145
fB(3) = eB(H) Σk fk(2) akB = .9*[.255*.4 + .171*.2] = 0.123
P(HHH) = fF(3) + fB(3) = 0.145 + 0.123 = 0.268
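Running the forward sketch from the previous slide on HHH reproduces this table:

```python
f, p = forward("HHH", start, trans, emit)
for i, fi in enumerate(f, start=1):
    print(i, {k: round(v, 3) for k, v in fi.items()})
# 1 {'F': 0.25, 'B': 0.45}
# 2 {'F': 0.255, 'B': 0.171}
# 3 {'F': 0.145, 'B': 0.123}
print(p)  # 0.26748; the slide's 0.268 sums the rounded entries 0.145 + 0.123
```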
The Backward Algorithm
Compute: P(πi = k, X)
Solution: use recursion on P(Xi+1…Xn | πi = k)
P(πi = k, X) = P(X1…Xi, πi = k, Xi+1…Xn)
             = P(X1…Xi, πi = k) P(Xi+1…Xn | X1…Xi, πi = k)    [chain rule]
             = P(X1…Xi, πi = k) P(Xi+1…Xn | πi = k)           [Markov property]
             = fk(i) bk(i)
The first factor is the forward variable fk(i); the second,
bk(i) = P(Xi+1…Xn | πi = k), is the backward variable.
The Recursion
bk(i) = P(Xi+1,...,Xn | πi = k)
      = Σπi+1..πn P(Xi+1,...,Xn, πi+1,...,πn | πi = k)
      = Σm Σπi+2..πn P(Xi+1,...,Xn, πi+1 = m, πi+2,...,πn | πi = k)
      = Σm em(Xi+1) akm Σπi+2..πn P(Xi+2,...,Xn, πi+2,...,πn | πi+1 = m)
      = Σm em(Xi+1) akm bm(i+1)
The Backward Algorithm
Algorithm:
Initialization:
bk(n) = ak0, for all k (ak0 is the transition to an end state; set bk(n) = 1 if the model has none)
Iteration:
bk(i) = Σm em(xi+1) akm bm(i+1)
Termination:
P(X) = Σm a0m em(x1) bm(1)
Complexity: O(K²n) time, O(Kn) space, for K states and n observations
Numerical note: the fk(i), bk(i) underflow for long sequences; rescale by constants at each step (or work in log space)
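A matching Python sketch of the backward pass, again assuming the model tables from the earlier sketches. Because that model has no explicit end state, it initializes bk(n) = 1 rather than ak0; the final lines check that the backward termination reproduces P(HHH) and illustrate the factorization P(πi = k, X) = fk(i) bk(i) from the previous slide.

```python
def backward(X, start, trans, emit):
    """Backward algorithm: returns b[i][k] = P(Xi+1..Xn | pi_i = k) and P(X)."""
    states = list(start)
    # Initialization: bk(n) = 1 (no explicit end state in this model)
    b = [{k: 1.0 for k in states}]
    # Iteration: bk(i) = sum_m em(Xi+1) * a(k,m) * bm(i+1)
    for x in reversed(X[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(emit[m][x] * trans[k][m] * nxt[m] for m in states)
                     for k in states})
    # Termination: P(X) = sum_m a(Start,m) * em(X1) * bm(1)
    return b, sum(start[m] * emit[m][X[0]] * b[0][m] for m in states)

b, p = backward("HHH", start, trans, emit)
print(p)                      # 0.26748 -- agrees with the forward result
f, _ = forward("HHH", start, trans, emit)
print(f[1]["B"] * b[1]["B"])  # P(pi_2 = B, X) = fB(2) * bB(2)
```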
Conclusions
 The forward/backward algorithms compute likelihood
 May be used for
Estimation (maximum likelihood) and learning
Analysis (p-values)