Forward-backward algorithm
LING 572
Fei Xia
02/23/06
Outline
• Forward and backward probability
• Expected counts and update formulae
• Relation with EM
HMM
• An HMM is a tuple (S, Σ, Π, A, B):
  – A set of states S = {s1, s2, …, sN}.
  – A set of output symbols Σ = {w1, …, wM}.
  – Initial state probabilities Π = {πi}.
  – State transition prob: A = {aij}.
  – Symbol emission prob: B = {bijk}.
• State sequence: X1…XT+1
• Output sequence: o1…oT
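A minimal sketch of this parameter layout in NumPy; the array names and toy sizes/values are assumptions for illustration, not from the lecture:

```python
import numpy as np

# Arc-emission HMM (S, Sigma, Pi, A, B) as arrays; toy sizes assumed.
N, M = 2, 3                            # N states, M output symbols
pi = np.array([0.6, 0.4])              # pi[i] = P(X_1 = s_i)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])             # A[i, j] = a_ij
rng = np.random.default_rng(0)
B = rng.dirichlet(np.ones(M), size=(N, N))
                                       # B[i, j, k] = b_ijk = P(w_k | arc i -> j)

# The constraints on the next slide: each distribution sums to 1.
assert np.isclose(pi.sum(), 1)
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=2), 1)
```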
Constraints

$$\sum_{i=1}^{N} \pi_i = 1$$

$$\forall i: \quad \sum_{j=1}^{N} a_{ij} = 1$$

$$\forall i, j: \quad \sum_{k=1}^{M} b_{ijk} = 1 \qquad \Rightarrow \qquad \sum_{j=1}^{N} \sum_{k=1}^{M} a_{ij} b_{ijk} = 1$$
Decoding

• Given the observation O1,T = o1…oT, find the state sequence X1,T+1 = X1…XT+1 that maximizes P(X1,T+1 | O1,T).

[Figure: trellis with states X1, X2, …, XT, XT+1 and outputs o1, o2, …, oT]

• Solved by the Viterbi algorithm.
Notation

• A sentence: O1,T = o1…oT, where T is the sentence length.
• The state sequence X1,T+1 = X1…XT+1.
• t: time t, ranging from 1 to T+1.
• Xt: the state at time t.
• i, j: states si, sj.
• k: word wk in the vocabulary.
Forward and backward probabilities
Forward probability

The probability of producing O1,t-1 while ending up in state si:

$$\alpha_i(t) \stackrel{\text{def}}{=} P(O_{1,t-1}, X_t = i)$$
Calculating forward probability

Initialization:

$$\alpha_i(1) = \pi_i$$

Induction (a code sketch follows below):

$$\begin{aligned}
\alpha_j(t+1) &= P(O_{1,t}, X_{t+1} = j) \\
&= \sum_i P(O_{1,t}, X_t = i, X_{t+1} = j) \\
&= \sum_i P(O_{1,t-1}, X_t = i) \cdot P(o_t, X_{t+1} = j \mid O_{1,t-1}, X_t = i) \\
&= \sum_i P(O_{1,t-1}, X_t = i) \cdot P(o_t, X_{t+1} = j \mid X_t = i) \\
&= \sum_i \alpha_i(t) \, a_{ij} \, b_{ij o_t}
\end{aligned}$$
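A sketch of this recursion in NumPy, 0-indexed so that `alpha[t-1]` below is αi(t) on the slide; `pi`, `A`, `B` are the assumed arrays from the earlier sketch:

```python
def forward(pi, A, B, obs):
    """Forward pass; obs is a list of symbol indices of length T."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi                              # base case: alpha_i(1) = pi_i
    for t in range(T):
        # alpha_j(t+1) = sum_i alpha_i(t) * a_ij * b_{ij, o_t}
        alpha[t + 1] = alpha[t] @ (A * B[:, :, obs[t]])
    return alpha
```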
Backward probability

• The probability of producing the sequence Ot,T, given that at time t we are at state si:

$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)$$
Calculating backward probability

Initialization:

$$\beta_i(T+1) = 1$$

Induction (sketched in code below):

$$\begin{aligned}
\beta_i(t) &\stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i) \\
&= \sum_j P(o_t, O_{t+1,T}, X_{t+1} = j \mid X_t = i) \\
&= \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \cdot P(O_{t+1,T} \mid X_t = i, X_{t+1} = j, o_t) \\
&= \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \cdot P(O_{t+1,T} \mid X_{t+1} = j) \\
&= \sum_j \beta_j(t+1) \, a_{ij} \, b_{ij o_t}
\end{aligned}$$
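The companion sketch for the backward pass, under the same assumed array layout:

```python
def backward(A, B, obs):
    """Backward pass; beta[t-1] here is beta_i(t) on the slide."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0                              # base case: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        # beta_i(t) = sum_j beta_j(t+1) * a_ij * b_{ij, o_t}
        beta[t] = (A * B[:, :, obs[t]]) @ beta[t + 1]
    return beta
```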
Calculating the prob of the observation

$$P(O) = \sum_{i=1}^{N} \alpha_i(T+1)$$

$$P(O) = \sum_{i=1}^{N} \pi_i \, \beta_i(1)$$

$$P(O) = \sum_{i=1}^{N} P(O, X_t = i) = \sum_{i=1}^{N} \alpha_i(t) \, \beta_i(t)$$
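These three expressions agree for any t; a quick check using the sketches above (the toy observation sequence is an assumption):

```python
obs = [0, 2, 1]                             # assumed toy sequence, T = 3
alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)
p1 = alpha[-1].sum()                        # sum_i alpha_i(T+1)
p2 = (pi * beta[0]).sum()                   # sum_i pi_i * beta_i(1)
p3 = (alpha[1] * beta[1]).sum()             # sum_i alpha_i(t) * beta_i(t)
assert np.isclose(p1, p2) and np.isclose(p1, p3)
```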
Estimating parameters

• The prob of traversing a certain arc at time t given O (denoted by pt(i, j) in M&S):

$$\gamma_{ij}(t) = P(X_t = i, X_{t+1} = j \mid O) = \frac{P(X_t = i, X_{t+1} = j, O)}{P(O)} = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$

• The prob of being at state i at time t given O:

$$\gamma_i(t) = P(X_t = i \mid O) = \sum_{j=1}^{N} P(X_t = i, X_{t+1} = j \mid O) = \sum_{j=1}^{N} \gamma_{ij}(t)$$
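A sketch of both quantities as arrays (again 0-indexed along t), using the fact that the denominator Σm αm(t) βm(t) equals P(O) for every t:

```python
def expected_counts(alpha, beta, A, B, obs):
    """gamma_ij: shape (T, N, N); gamma_i: shape (T, N)."""
    T = len(obs)
    p_O = alpha[-1].sum()                   # P(O)
    # gamma_ij[t][i, j] = alpha_i(t) a_ij b_{ij, o_t} beta_j(t+1) / P(O)
    gamma_ij = np.array([
        alpha[t][:, None] * A * B[:, :, obs[t]] * beta[t + 1][None, :]
        for t in range(T)]) / p_O
    gamma_i = gamma_ij.sum(axis=2)          # gamma_i(t) = sum_j gamma_ij(t)
    return gamma_ij, gamma_i
```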
Expected counts

Sum over the time index:

• Expected # of transitions from state i to j in O:

$$\sum_{t=1}^{T} \gamma_{ij}(t)$$

• Expected # of transitions from state i in O:

$$\sum_{t=1}^{T} \gamma_i(t) = \sum_{t=1}^{T} \sum_{j=1}^{N} \gamma_{ij}(t) = \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_{ij}(t)$$
Update parameters

$$\hat{\pi}_i = \text{expected frequency in state } i \text{ at time } t = 1 = \gamma_i(1)$$

$$\hat{a}_{ij} = \frac{\text{expected \# of transitions from state } i \text{ to } j}{\text{expected \# of transitions from state } i} = \frac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{t=1}^{T} \gamma_i(t)} = \frac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_{ij}(t)}$$

$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j} = \frac{\sum_{t=1}^{T} \delta(o_t, w_k) \, \gamma_{ij}(t)}{\sum_{t=1}^{T} \gamma_{ij}(t)}$$
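A sketch of the three update formulae under the assumed array layout; the Kronecker delta δ(ot, wk) is realised by indexing on the observed symbol:

```python
def update(gamma_ij, gamma_i, obs, M):
    """Re-estimate (pi, A, B) from the expected counts."""
    T, N, _ = gamma_ij.shape
    pi_new = gamma_i[0]                                  # pi_i = gamma_i(1)
    a_new = gamma_ij.sum(axis=0) / gamma_i.sum(axis=0)[:, None]
    b_new = np.zeros((N, N, M))
    for t in range(T):
        b_new[:, :, obs[t]] += gamma_ij[t]               # delta(o_t, w_k) numerator
    b_new /= gamma_ij.sum(axis=0)[:, :, None]            # shared denominator
    return pi_new, a_new, b_new
```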
Final formulae

$$\gamma_{ij}(t) = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_{ij}(t)}$$

$$\hat{b}_{ijk} = \frac{\sum_{t=1}^{T} \delta(o_t, w_k) \, \gamma_{ij}(t)}{\sum_{t=1}^{T} \gamma_{ij}(t)}$$
Emission probabilities

Arc-emission HMM:

$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j} = \frac{\sum_{t=1}^{T} \delta(o_t, w_k) \, \gamma_{ij}(t)}{\sum_{t=1}^{T} \gamma_{ij}(t)}$$
The inner loop for the forward-backward algorithm

Given an input sequence and (S, Σ, Π, A, B):

1. Calculate forward probability:
   • Base case: $\alpha_i(1) = \pi_i$
   • Recursive case: $\alpha_j(t+1) = \sum_i \alpha_i(t) \, a_{ij} \, b_{ij o_t}$
2. Calculate backward probability:
   • Base case: $\beta_i(T+1) = 1$
   • Recursive case: $\beta_i(t) = \sum_j \beta_j(t+1) \, a_{ij} \, b_{ij o_t}$
3. Calculate expected counts:
   $$\gamma_{ij}(t) = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$
4. Update the parameters:
   $$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_{ij}(t)} \qquad \hat{b}_{ijk} = \frac{\sum_{t=1}^{T} \delta(o_t, w_k) \, \gamma_{ij}(t)}{\sum_{t=1}^{T} \gamma_{ij}(t)}$$
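Composing the earlier sketches, one pass of this inner loop looks as follows; a sketch, not the lecture's reference implementation:

```python
def forward_backward_step(pi, A, B, obs, M):
    """One EM-style iteration over a single observation sequence."""
    alpha = forward(pi, A, B, obs)                               # step 1
    beta = backward(A, B, obs)                                   # step 2
    gamma_ij, gamma_i = expected_counts(alpha, beta, A, B, obs)  # step 3
    return update(gamma_ij, gamma_i, obs, M)                     # step 4
```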
Relation to EM

• An HMM is a PM (Product of Multinomial) model.
• The forward-backward algorithm is a special case of the EM algorithm for PM models.
• X (observed data): each data point is an O1,T.
• Y (hidden data): the state sequence X1,T+1.
• Θ (parameters): aij, bijk, πi.
Relation to EM (cont)

$$\begin{aligned}
count(a_{ij}) &= \sum_Y P(Y \mid X, \Theta) \cdot count(X, Y, a_{ij}) \\
&= \sum_{X_{1,T+1}} P(X_{1,T+1} \mid O_{1,T}, \Theta) \cdot count(O_{1,T}, X_{1,T+1}, a_{ij}) \\
&= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1,T}, \Theta) \\
&= \sum_{t=1}^{T} \gamma_{ij}(t)
\end{aligned}$$

$$\begin{aligned}
count(b_{ijk}) &= \sum_Y P(Y \mid X, \Theta) \cdot count(X, Y, b_{ijk}) \\
&= \sum_{X_{1,T+1}} P(X_{1,T+1} \mid O_{1,T}, \Theta) \cdot count(O_{1,T}, X_{1,T+1}, b_{ijk}) \\
&= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1,T}, \Theta) \cdot \delta(o_t, w_k) \\
&= \sum_{t=1}^{T} \gamma_{ij}(t) \, \delta(o_t, w_k)
\end{aligned}$$
Iterations

• Each iteration provides values for all the parameters.
• The new model never decreases the likelihood of the training data:

$$P(O \mid \hat{\Theta}) \geq P(O \mid \Theta)$$

• The algorithm is not guaranteed to reach a global maximum (see the usage sketch below).
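A minimal driver for the iteration, reusing the sketches above; the iteration cap and stopping tolerance are assumed choices:

```python
prev_ll = -np.inf
for _ in range(100):                                  # iteration cap (assumed)
    pi, A, B = forward_backward_step(pi, A, B, obs, M)
    ll = np.log(forward(pi, A, B, obs)[-1].sum())     # log P(O | new model)
    if ll - prev_ll < 1e-6:                           # improvement has stalled
        break
    prev_ll = ll
```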
Summary

• A way of estimating the parameters of an HMM:
  – Define forward and backward probabilities, which can be calculated efficiently with dynamic programming (DP).
  – Given an initial parameter setting, we re-estimate the parameters at each iteration.
  – The forward-backward algorithm is a special case of the EM algorithm for PM models.
Additional slides
Definitions so far

• The prob of producing O1,t-1 and ending at state si at time t:

$$\alpha_i(t) \stackrel{\text{def}}{=} P(O_{1,t-1}, X_t = i)$$

• The prob of producing the sequence Ot,T, given that at time t we are at state si:

$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)$$

• The prob of being at state i at time t given O:

$$\gamma_i(t) = P(X_t = i \mid O) = \frac{P(X_t = i, O)}{P(O)} = \frac{\alpha_i(t) \, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t) \, \beta_j(t)}$$

• The prob of traversing the arc from i to j at time t given O:

$$\gamma_{ij}(t) = P(X_t = i, X_{t+1} = j \mid O) = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$

$$\gamma_i(t) = \sum_{j=1}^{N} \gamma_{ij}(t)$$
Emission probabilities

Arc-emission HMM:

$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j} = \frac{\sum_{t=1}^{T} \delta(o_t, w_k) \, \gamma_{ij}(t)}{\sum_{t=1}^{T} \gamma_{ij}(t)}$$

State-emission HMM:

$$\hat{b}_{jk} = \frac{\text{expected \# of transitions to } j \text{ with } k \text{ observed}}{\text{expected \# of transitions to } j} = \frac{\sum_{t=1}^{T} \delta(o_t, w_k) \, \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)}$$
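The state-emission variant sketched under the same assumed layout, with a 2-D emission table b[j, k] mirroring the formula above (the function name is hypothetical):

```python
def update_state_emission(gamma_i, obs, M):
    """Re-estimate b_jk = P(w_k | state j) from gamma_j(t)."""
    T, N = gamma_i.shape
    b = np.zeros((N, M))
    for t in range(T):
        b[:, obs[t]] += gamma_i[t]              # delta(o_t, w_k) * gamma_j(t)
    return b / gamma_i.sum(axis=0)[:, None]     # / sum_t gamma_j(t)
```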