Introduction to EM Algorithm

PL Lab.
Liu, Xiao
November 23, 2016
Outline
• What is EM algorithm?
• Definition
• Where can EM be used?
• Background knowledge
• Likelihood
• Binomial distribution
• The EM computation steps
• An example to understand EM: tossing coins
• A little exercise
EM definition
• Expectation Maximization (EM) Algorithm
• Introduced by Dempster, Laird and Rubin in 1977
• An iterative estimation method that derives maximum likelihood (ML) estimates in the presence of missing or hidden data ("incomplete data")
Where to use the EM algorithm?
• Frequently used for data clustering in machine learning and computer vision
• Moves from random initial clusters to the most appropriate clusters
Likelihood & Binomial distribution
• Likelihood estimates a parameter from a set of observations.
• Probability: uses the parameter to predict the outcome
• Likelihood: uses the observed outcome to estimate the parameter
• A binomial distribution models 𝑛 independent yes/no trials, each with success probability 𝑝.
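The probability/likelihood distinction can be sketched in Python (a minimal illustration; the helper name is mine, not from the slides):

```python
from math import comb

# Binomial pmf: probability of k successes in n yes/no trials,
# each succeeding with probability p (illustrative helper)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability: fix the parameter p, ask about an outcome
print(binom_pmf(5, 10, 0.5))   # chance of exactly 5 heads from a fair coin

# Likelihood: fix the outcome (7 heads in 10 tosses),
# ask which parameter value explains it best
for p in (0.5, 0.7, 0.9):
    print(p, binom_pmf(7, 10, p))   # p = 0.7 gives the largest value
```

Scanning candidate values of 𝑝 against a fixed outcome is exactly the "using result to estimate parameter" direction.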
EM computation steps
• Expectation step (E-step)
• Computes the expected complete-data log-likelihood (complete data 𝑥, observation 𝑦) under the current estimate 𝜃𝑘
• 𝑄(𝜃 | 𝜃𝑘) = 𝐸[log 𝑓(𝑥 | 𝜃) | 𝑦, 𝜃𝑘]
• Maximization step (M-step)
• Maximizes the Q-function over 𝜃 to obtain the next estimate
• 𝜃𝑘+1 = arg max_𝜃 𝑄(𝜃 | 𝜃𝑘)
Example of EM algorithm
• There are two coins 𝐶𝑜𝑖𝑛𝐴 and 𝐶𝑜𝑖𝑛𝐵
• One is more likely to get Heads, the other more likely to get Tails
• We pick one at random and toss it. Which one was it?
Tossing coins (1)
• Pick a coin randomly
• Toss it 10 times
• Record the number of heads and tails
• Do this for 5 rounds
• Compute the average fraction of heads for each coin
Tossing coins (2)
5 sets, each set tossed 10 times

Coin | Tosses              | Result
  B  | H T T T H H T H T H | 5 H, 5 T
  A  | H H H H T H H H H H | 9 H, 1 T
  A  | H T H H H H H T H H | 8 H, 2 T
  B  | H T H T T T H H T T | 4 H, 6 T
  A  | T H H H T H H H T H | 7 H, 3 T

Totals: 𝐶𝑜𝑖𝑛𝐴: 24 H, 6 T; 𝐶𝑜𝑖𝑛𝐵: 9 H, 11 T

𝜃𝐴 = 24 / (24 + 6) = 0.80
𝜃𝐵 = 9 / (9 + 11) = 0.45

Conclusion: 𝑪𝒐𝒊𝒏𝑨 yields heads 80% of the time, 𝑪𝒐𝒊𝒏𝑩 45% of the time
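With the coin identity observed for every round, these totals are a plain complete-data ML estimate. A minimal sketch (round data taken from the table above; variable names are mine):

```python
# Complete-data ML estimate: the coin label is known for each round
# (coin label, heads, tails) per round, from the slide
rounds = [("B", 5, 5), ("A", 9, 1), ("A", 8, 2), ("B", 4, 6), ("A", 7, 3)]

heads = {"A": 0, "B": 0}
tails = {"A": 0, "B": 0}
for coin, h, t in rounds:
    heads[coin] += h
    tails[coin] += t

theta_a = heads["A"] / (heads["A"] + tails["A"])  # 24 / 30
theta_b = heads["B"] / (heads["B"] + tails["B"])  # 9 / 20
print(theta_a, theta_b)  # 0.8 0.45
```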
Tossing coins (3)
• What if we are given ONLY the results of our coin tosses?
• Can we guess the percentage of the heads each coin yields?
• Can we guess which coin was picked for each set of 10 coin tosses?
1 H T T T H H T H T H
2 H H H H T H H H H H
3 H T H H H H H T H H
4 H T H T T T H H T T
5 T H H H T H H H T H
Tossing coins (4)
• Solve the problems using EM algorithm
1. Assign random initial biases to both coins (initialization)
2. For each of the 5 rounds of 10 coin tosses (E-step):
a) Count the number of heads
b) Compute the probability that the round came from each coin
c) Compute the expected number of heads and tails: using that probability as a weight, multiply it by the observed counts
d) Record those numbers
3. Re-compute new means for 𝐶𝑜𝑖𝑛𝐴 and 𝐶𝑜𝑖𝑛𝐵 from the accumulated expected counts (M-step)
4. With these new means, go back to step 2 and repeat until the estimates converge
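The steps above can be sketched end to end in Python (a minimal sketch; the data are the five rounds from the slides and all names are illustrative):

```python
# EM for the two-coin problem, following the numbered steps above
heads = [5, 9, 8, 4, 7]  # heads observed in each round of n = 10 tosses
n = 10

def em_step(theta_a, theta_b):
    """One EM iteration: the E-step assigns each round a responsibility for
    each coin, the M-step re-estimates both biases from expected counts."""
    ha = ta = hb = tb = 0.0
    for h in heads:
        t = n - h
        # Binomial likelihood of the round under each coin
        # (the C(n, h) coefficient cancels in the normalization)
        like_a = theta_a**h * (1 - theta_a)**t
        like_b = theta_b**h * (1 - theta_b)**t
        nor_a = like_a / (like_a + like_b)  # P(Coin A | this round)
        nor_b = 1.0 - nor_a
        ha += nor_a * h   # expected heads credited to Coin A
        ta += nor_a * t
        hb += nor_b * h
        tb += nor_b * t
    return ha / (ha + ta), hb / (hb + tb)  # M-step: new biases

theta_a, theta_b = 0.6, 0.5  # initial guesses, as on the later slides
for i in range(10):
    theta_a, theta_b = em_step(theta_a, theta_b)
    print(f"iteration {i + 1}: theta_A = {theta_a:.2f}, theta_B = {theta_b:.2f}")
```

The first iteration starting from 𝜃𝐴 = 0.6, 𝜃𝐵 = 0.5 reproduces the 0.71 and 0.58 values worked out on the following slides.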
Tossing coins (9)
1 H T T T H H T H T H
2 H H H H T H H H H H
3 H T H H H H H T H H
4 H T H T T T H H T T
5 T H H H T H H H T H

Assume 𝜃𝐴 = 0.6 and 𝜃𝐵 = 0.5

Compute the likelihood that it was 𝐶𝑜𝑖𝑛𝐴 or 𝐶𝑜𝑖𝑛𝐵 using the binomial distribution with mean probability 𝜃 on 𝑛 trials with 𝑘 successes:

𝑝(𝑘) = C(𝑛, 𝑘) 𝜃^𝑘 (1 − 𝜃)^(𝑛−𝑘)
Tossing coins (10)
1 H T T T H H T H T H
2 H H H H T H H H H H
3 H T H H H H H T H H
4 H T H T T T H H T T
5 T H H H T H H H T H

Assume 𝜃𝐴 = 0.6 and 𝜃𝐵 = 0.5

Let’s take the first round: 5 heads and 5 tails
(The coefficient C(10, 5) is the same for both coins and cancels in the normalization, so it is omitted.)

Likelihood of 𝐶𝑜𝑖𝑛𝐴: like(𝐶𝑜𝑖𝑛𝐴) = 𝜃𝐴^5 (1 − 𝜃𝐴)^(10−5) = 0.0007962624
Likelihood of 𝐶𝑜𝑖𝑛𝐵: like(𝐶𝑜𝑖𝑛𝐵) = 𝜃𝐵^5 (1 − 𝜃𝐵)^(10−5) = 0.0009765625

Normalization: nor(A) = like(𝐶𝑜𝑖𝑛𝐴) / (like(𝐶𝑜𝑖𝑛𝐴) + like(𝐶𝑜𝑖𝑛𝐵)) = 0.45
nor(B) = like(𝐶𝑜𝑖𝑛𝐵) / (like(𝐶𝑜𝑖𝑛𝐴) + like(𝐶𝑜𝑖𝑛𝐵)) = 0.55
Tossing coins (11)
1 H T T T H H T H T H
2 H H H H T H H H H H
3 H T H H H H H T H H
4 H T H T T T H H T T
5 T H H H T H H H T H

Assume 𝜃𝐴 = 0.6 and 𝜃𝐵 = 0.5

Let’s take the first round: 5 heads and 5 tails
Recap: nor(A) = 0.45, nor(B) = 0.55

Estimating the likely number of heads and tails from each coin:
𝐶𝑜𝑖𝑛𝐴: H = 0.45 × 5 heads ≈ 2.2 heads, T = 0.45 × 5 tails ≈ 2.2 tails
𝐶𝑜𝑖𝑛𝐵: H = 0.55 × 5 heads ≈ 2.8 heads, T = 0.55 × 5 tails ≈ 2.8 tails
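The round-1 numbers above can be checked directly (a sketch; the responsibilities are kept unrounded here, which is why 0.45 × 5 appears as 2.2 rather than 2.25):

```python
theta_a, theta_b = 0.6, 0.5
h, t = 5, 5  # round 1: 5 heads, 5 tails

# The coefficient C(10, 5) is common to both coins and cancels below
like_a = theta_a**h * (1 - theta_a)**t   # 0.0007962624
like_b = theta_b**h * (1 - theta_b)**t   # 0.0009765625
nor_a = like_a / (like_a + like_b)       # ≈ 0.45
nor_b = like_b / (like_a + like_b)       # ≈ 0.55

# Expected counts: weight the observed heads/tails by each responsibility
print(round(nor_a * h, 1), round(nor_a * t, 1))  # 2.2 2.2 (Coin A)
print(round(nor_b * h, 1), round(nor_b * t, 1))  # 2.8 2.8 (Coin B)
```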
Tossing coins (12)
Assume 𝜃𝐴 = 0.6 and 𝜃𝐵 = 0.5

Compute the new probabilities for each coin from the expected counts pooled over all 5 rounds:

𝜃𝐴¹ = H / (H + T) = 21.3 / (21.3 + 8.6) = 0.71
𝜃𝐵¹ = H / (H + T) = 11.7 / (11.7 + 8.4) = 0.58
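This M-step update is just a ratio of expected counts; checking the slide's pooled values (numbers taken from the slide, rounded to one decimal):

```python
# Expected counts pooled over all 5 rounds, as shown on the slide
h_a, t_a = 21.3, 8.6   # Coin A
h_b, t_b = 11.7, 8.4   # Coin B

theta_a1 = h_a / (h_a + t_a)
theta_b1 = h_b / (h_b + t_b)
print(round(theta_a1, 2), round(theta_b1, 2))  # 0.71 0.58
```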
Tossing coins (13)
(figure only: the E- and M-steps are repeated with the new means until 𝜃𝐴 and 𝜃𝐵 converge)
Exercise & references
• Implement the EM algorithm
• Write a Python program to compute the probabilities of the coin tosses presented on slides “Tossing coins (9)” ~ “(12)”
• References
• https://www.youtube.com/watch?v=7e65vXZEv5Q
• http://nipunbatra.github.io/2014/04/em/
Thank you!