The Expectation Maximization (EM) Algorithm

• Formally outlined by Dempster, Laird, and
Rubin (1977) in “Maximum likelihood from
incomplete data via the EM algorithm”
• Complete data: $y^C = (y^O, y^U)$, where $y^O$ is the observed part and $y^U$ is the unobserved part
• Parameter: $\theta$
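The EM Algorithm
• Initialize parameter: set $\theta$ to $\theta^{(0)}$
• For $t = 0, 1, \ldots$
– E-step: compute the conditional expectation of the complete-data log-likelihood, given the observed data and the current parameter
$Q(\theta \mid \theta^{(t)}) = E[\log f(y^C \mid \theta) \mid y^O, \theta^{(t)}]$
– M-step: maximize over $\theta$
$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$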
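The two steps above can be sketched as a generic loop. This is a minimal sketch, not a definitive implementation: the names `run_em`, `e_step`, `m_step`, and `distance` are hypothetical, and it assumes the E-step can be summarized by expected statistics that the M-step then maximizes over. The toy usage estimates a normal mean with known variance when `k` values are missing, a case where EM should approach the mean of the observed values.

```python
def run_em(theta0, e_step, m_step, distance, max_iter=1000, tol=1e-8):
    """Generic EM loop (sketch).

    e_step(theta)  -> expected statistics under the current theta^(t)
    m_step(stats)  -> theta^(t+1) maximizing Q(. | theta^(t))
    distance(a, b) -> non-negative change between two parameter values
    """
    theta = theta0
    for t in range(max_iter):
        stats = e_step(theta)          # E-step
        theta_new = m_step(stats)      # M-step
        if distance(theta_new, theta) < tol:
            return theta_new, t + 1    # converged
        theta = theta_new
    return theta, max_iter


# Toy usage (assumed example): normal mean, known variance, k missing values.
observed = [1.0, 2.0, 3.0]
k_missing = 2
n_total = len(observed) + k_missing

mu_hat, n_iter = run_em(
    theta0=0.0,
    e_step=lambda mu: sum(observed) + k_missing * mu,  # expected complete-data sum
    m_step=lambda expected_sum: expected_sum / n_total,
    distance=lambda a, b: abs(a - b),
)
print(mu_hat, n_iter)
```

Passing the steps as functions keeps the loop independent of any particular model; the stopping rule on parameter change is one common choice among several.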

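• Ascent property
– The M step ensures that the algorithm never decreases $Q$: $Q(\theta^{(t+1)} \mid \theta^{(t)}) \ge Q(\theta^{(t)} \mid \theta^{(t)})$
– It can be shown that improving $Q(\theta \mid \theta^{(t)})$ implies improving $L(\theta) = f(y^O \mid \theta)$ (the observed-data likelihood)
• Convergence to local maxima
– EM may stop at a local rather than the global maximum, so run it from multiple sets of initial values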
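The ascent property can be made explicit with the standard argument, sketched here. The symbol $R$ is introduced only for this sketch and does not appear in the original slides. The first line follows from $f(y^C \mid \theta) = f(y^O \mid \theta)\, f(y^U \mid y^O, \theta)$ after taking logs and the conditional expectation given $y^O$ under $\theta^{(t)}$.

```latex
\begin{align*}
\log L(\theta) &= Q(\theta \mid \theta^{(t)}) - R(\theta \mid \theta^{(t)}),
\qquad
R(\theta \mid \theta^{(t)}) = E\left[\log f(y^U \mid y^O, \theta) \mid y^O, \theta^{(t)}\right] \\
R(\theta \mid \theta^{(t)}) &\le R(\theta^{(t)} \mid \theta^{(t)})
\quad \text{for all } \theta \quad \text{(Jensen's inequality)} \\
\log L(\theta^{(t+1)}) - \log L(\theta^{(t)})
&= \left[ Q(\theta^{(t+1)} \mid \theta^{(t)}) - Q(\theta^{(t)} \mid \theta^{(t)}) \right]
 - \left[ R(\theta^{(t+1)} \mid \theta^{(t)}) - R(\theta^{(t)} \mid \theta^{(t)}) \right]
\ge 0
\end{align*}
```

The first bracket is non-negative because of the M step, and the second is non-positive by the inequality on $R$, so the observed-data likelihood cannot decrease.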
Example 1: Allele Frequencies for
the ABO Blood Group
• Suppose nA=186, nB=38, nAB=13, and nO=284 were observed.

Genotype   Phenotype   Count
AA, AO     A           nA = 186
AB         AB          nAB = 13
BB, BO     B           nB = 38
OO         O           nO = 284

• Question: what are the allele frequencies $p_A = P(A)$, $p_B = P(B)$, $p_O = P(O)$?
• There are more possible classes (genotypes) than distinguishable classes (phenotypes)
– If a person has type A (B), the underlying genotype could be either AA (BB) or AO (BO)
• The likelihood of the "complete data" is simple
$(n_{AA}, n_{AO}, n_{BB}, n_{BO}, n_{AB}, n_{OO}) \sim \text{Multinomial}(n;\ p_A^2,\ 2p_Ap_O,\ p_B^2,\ 2p_Bp_O,\ 2p_Ap_B,\ p_O^2)$
• The available data are incomplete
– $n_{AA}$, $n_{AO}$, $n_{BB}$, $n_{BO}$ are unknown
• Consider the EM algorithm
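To see how the six genotype cells collapse into four phenotype cells, the multinomial cell probabilities above can be written out directly. This is an illustrative sketch; the function names `genotype_probs` and `phenotype_probs` are hypothetical, and the values 0.3, 0.2, 0.5 are just an example set of allele frequencies.

```python
def genotype_probs(p_a, p_b, p_o):
    """Cell probabilities of the complete-data multinomial, one per genotype."""
    return {
        "AA": p_a ** 2,
        "AO": 2 * p_a * p_o,
        "BB": p_b ** 2,
        "BO": 2 * p_b * p_o,
        "AB": 2 * p_a * p_b,
        "OO": p_o ** 2,
    }


def phenotype_probs(p_a, p_b, p_o):
    """Observed-data cell probabilities: genotypes with the same phenotype are merged."""
    g = genotype_probs(p_a, p_b, p_o)
    return {
        "A": g["AA"] + g["AO"],   # AA and AO both show as type A
        "B": g["BB"] + g["BO"],   # BB and BO both show as type B
        "AB": g["AB"],
        "O": g["OO"],
    }


geno = genotype_probs(0.3, 0.2, 0.5)
pheno = phenotype_probs(0.3, 0.2, 0.5)
print(geno)
print(pheno)
```

Both sets of probabilities sum to one whenever the allele frequencies do, since $(p_A + p_B + p_O)^2 = 1$.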
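• Observed data: $n^O = (n_A, n_B, n_{AB}, n_O)$ (the superscript $O$ means "observed"; the subscript $O$ is the blood type)
• Unobserved data: $n^U = (n_{AA}, n_{AO}, n_{BB}, n_{BO})$
• Complete data: $n^C = (n_{AA}, n_{AO}, n_{BB}, n_{BO}, n_{AB}, n_{OO})$
– $n_{AA} + n_{AO} = n_A$
– $n_{BB} + n_{BO} = n_B$
– $n_{OO} = n_O$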
• Log of complete-data likelihood
$$\ln f(n^C \mid p) = n_{AA} \ln(p_A^2) + n_{AO} \ln(2 p_A p_O) + n_{BB} \ln(p_B^2) + n_{BO} \ln(2 p_B p_O) + n_O \ln(p_O^2) + n_{AB} \ln(2 p_A p_B) + \ln \binom{n}{n_{AA},\ n_{AO},\ n_{BB},\ n_{BO},\ n_{AB},\ n_O}$$
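As a rough illustration, the complete-data log-likelihood can be evaluated for a given set of genotype counts. The function name `complete_data_loglik` is hypothetical, and the genotype split used in the usage lines is invented for the example (the real split of type A and type B individuals is unobserved). The multinomial coefficient is computed with `math.lgamma`; it does not depend on the allele frequencies.

```python
import math


def complete_data_loglik(counts, p_a, p_b, p_o):
    """ln f(n^C | p) for genotype counts keyed by 'AA', 'AO', 'BB', 'BO', 'AB', 'OO'."""
    probs = {
        "AA": p_a ** 2,
        "AO": 2 * p_a * p_o,
        "BB": p_b ** 2,
        "BO": 2 * p_b * p_o,
        "AB": 2 * p_a * p_b,
        "OO": p_o ** 2,
    }
    n = sum(counts.values())
    # log of the multinomial coefficient n! / (n_AA! n_AO! ... n_OO!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts.values())
    return log_coef + sum(counts[g] * math.log(probs[g]) for g in probs)


# Hypothetical complete data: one possible split consistent with the observed counts.
example_counts = {"AA": 40, "AO": 146, "BB": 6, "BO": 32, "AB": 13, "OO": 284}
print(complete_data_loglik(example_counts, 0.3, 0.2, 0.5))
```

Because the counts enter the $p$-dependent part only as linear coefficients, the expectation needed in the E step reduces to expected counts.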
• The E step: calculate $Q(p \mid p^{(t)})$
– Take the expectation of $\ln f(n^C \mid p)$ conditional on the observed counts $n_A, n_B, n_{AB}, n_O$ and the current parameters $p^{(t)}$
– To do that, we need to calculate $E(n^U \mid n^O, p^{(t)})$:
$$E(n_{AA} \mid n^O, p^{(t)}) = n_A \frac{(p_A^{(t)})^2}{(p_A^{(t)})^2 + 2 p_A^{(t)} p_O^{(t)}}$$
$$E(n_{AO} \mid n^O, p^{(t)}) = n_A - E(n_{AA} \mid n^O, p^{(t)})$$
$$E(n_{BB} \mid n^O, p^{(t)}) = n_B \frac{(p_B^{(t)})^2}{(p_B^{(t)})^2 + 2 p_B^{(t)} p_O^{(t)}}$$
$$E(n_{BO} \mid n^O, p^{(t)}) = n_B - E(n_{BB} \mid n^O, p^{(t)})$$
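The expected counts can be computed directly. Given $n_A$, the count $n_{AA}$ is binomial with success probability $p_A^2 / (p_A^2 + 2 p_A p_O)$, and similarly for $n_{BB}$, which is where the fractions come from. The sketch below uses a hypothetical function name `e_step` and evaluates it at the observed counts with an example parameter value of (0.3, 0.2, 0.5).

```python
def e_step(n_a, n_b, p_a, p_b, p_o):
    """Expected unobserved genotype counts given phenotype counts and current p."""
    # Probability that a type-A individual is AA rather than AO
    frac_aa = p_a ** 2 / (p_a ** 2 + 2 * p_a * p_o)
    # Probability that a type-B individual is BB rather than BO
    frac_bb = p_b ** 2 / (p_b ** 2 + 2 * p_b * p_o)
    e_aa = n_a * frac_aa
    e_bb = n_b * frac_bb
    return {"AA": e_aa, "AO": n_a - e_aa, "BB": e_bb, "BO": n_b - e_bb}


expected = e_step(n_a=186, n_b=38, p_a=0.3, p_b=0.2, p_o=0.5)
for genotype, value in expected.items():
    print(genotype, round(value, 3))
```

The expected AA and AO counts always add back up to $n_A$, and the BB and BO counts to $n_B$, so the observed totals are preserved.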
• The M step
– Maximizes $Q(p \mid p^{(t)})$
– Notice the constraint $p_A + p_B + p_O = 1$
– Introduce a Lagrange multiplier
$H(p, \lambda) = Q(p \mid p^{(t)}) + \lambda (p_A + p_B + p_O - 1)$
– Setting the partial derivatives of $H$ to zero leads to
$$p_A^{(t+1)} = \frac{2 E(n_{AA} \mid n^O, p^{(t)}) + E(n_{AO} \mid n^O, p^{(t)}) + n_{AB}}{2n}$$
$$p_B^{(t+1)} = \frac{2 E(n_{BB} \mid n^O, p^{(t)}) + E(n_{BO} \mid n^O, p^{(t)}) + n_{AB}}{2n}$$
$$p_O^{(t+1)} = \frac{E(n_{AO} \mid n^O, p^{(t)}) + E(n_{BO} \mid n^O, p^{(t)}) + 2 n_O}{2n}$$
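The updates amount to allele counting: each of the $n$ individuals carries two alleles, so each frequency is an expected allele count divided by $2n$. A minimal sketch follows; the function name `m_step` is hypothetical, and the expected counts in the usage lines are computed from the example value $p = (0.3, 0.2, 0.5)$ with the observed counts 186, 38, 13, 284.

```python
def m_step(e_aa, e_ao, e_bb, e_bo, n_ab, n_o):
    """Updated allele frequencies from expected genotype counts (allele counting)."""
    n = e_aa + e_ao + e_bb + e_bo + n_ab + n_o   # number of individuals
    p_a = (2 * e_aa + e_ao + n_ab) / (2 * n)     # A alleles: 2 per AA, 1 per AO or AB
    p_b = (2 * e_bb + e_bo + n_ab) / (2 * n)     # B alleles: 2 per BB, 1 per BO or AB
    p_o = (e_ao + e_bo + 2 * n_o) / (2 * n)      # O alleles: 1 per AO or BO, 2 per OO
    return p_a, p_b, p_o


# Expected counts under p = (0.3, 0.2, 0.5), using the E-step formulas
e_aa = 186 * 0.3 ** 2 / (0.3 ** 2 + 2 * 0.3 * 0.5)
e_bb = 38 * 0.2 ** 2 / (0.2 ** 2 + 2 * 0.2 * 0.5)
p_next = m_step(e_aa, 186 - e_aa, e_bb, 38 - e_bb, n_ab=13, n_o=284)
print(p_next)
```

The three updated frequencies sum to one by construction, because the numerators together count all $2n$ alleles.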
• Take an initial guess: $p_A = 0.3$, $p_B = 0.2$, $p_O = 0.5$

Iteration   pA      pB      pO
0           .3000   .2000   .5000
1           .2321   .0550   .7129
2           .2160   .0503   .7337
3           .2139   .0502   .7359
4           .2136   .0501   .7363
5           .2136   .0501   .7363
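The iterations can be reproduced with a short script that alternates the E-step and M-step formulas from the initial guess. This is a sketch under the formulas given earlier; the names `abo_em` and `observed_loglik` are hypothetical, and the log-likelihood omits the constant multinomial coefficient. Printed values should agree with the table up to rounding in the last digit.

```python
import math


def abo_em(n_a, n_b, n_ab, n_o, p0=(0.3, 0.2, 0.5), n_iter=5):
    """Run n_iter EM iterations; return the list of (pA, pB, pO) including p0."""
    n = n_a + n_b + n_ab + n_o
    p_a, p_b, p_o = p0
    history = [(p_a, p_b, p_o)]
    for _ in range(n_iter):
        # E-step: expected genotype counts under the current frequencies
        e_aa = n_a * p_a ** 2 / (p_a ** 2 + 2 * p_a * p_o)
        e_ao = n_a - e_aa
        e_bb = n_b * p_b ** 2 / (p_b ** 2 + 2 * p_b * p_o)
        e_bo = n_b - e_bb
        # M-step: allele counting with the expected counts
        p_a = (2 * e_aa + e_ao + n_ab) / (2 * n)
        p_b = (2 * e_bb + e_bo + n_ab) / (2 * n)
        p_o = (e_ao + e_bo + 2 * n_o) / (2 * n)
        history.append((p_a, p_b, p_o))
    return history


def observed_loglik(n_a, n_b, n_ab, n_o, p):
    """Observed-data log-likelihood, up to an additive constant."""
    p_a, p_b, p_o = p
    return (n_a * math.log(p_a ** 2 + 2 * p_a * p_o)
            + n_b * math.log(p_b ** 2 + 2 * p_b * p_o)
            + n_ab * math.log(2 * p_a * p_b)
            + n_o * math.log(p_o ** 2))


history = abo_em(186, 38, 13, 284)
for t, p in enumerate(history):
    ll = observed_loglik(186, 38, 13, 284, p)
    print(t, " ".join(f"{x:.4f}" for x in p), f"{ll:.4f}")
```

Tracking the observed-data log-likelihood alongside the estimates gives a simple check of the ascent property: it should not decrease from one iteration to the next.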