Expectation Maximization (EM)
Northwestern University
EECS 395/495
Special Topics in Machine Learning
Most slides from
http://www.autonlab.org/tutorials/
Outline
• Objective
• Simple example
• Complex example
Objective
• Learning with missing/unobservable data
E B A J
1 1 1 1
1 0 1 1
0 0 0 0
…
[Figure: Bayes net over the variables E, B, A, J]
Maximum likelihood
Objective
• Learning with missing/unobservable data
E B A J
1 1 ? 1
1 0 ? 1
0 0 ? 0
…
[Figure: Bayes net over the variables E, B, A, J; A is unobserved]
Optimize what?
Outline
• Objective
• Simple example
• Complex example
Simple example
Maximize likelihood
Same Problem with Hidden Information
[Figure: hidden variable Grade → observable variable Score]
EM for our example
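The example's update equations are not reproduced in these slides, so the sketch below assumes the standard parameterization from the Auton Lab tutorial: grades a, b, c, d with probabilities 1/2, μ, 2μ, 1/2 − 3μ, where only a combined count h of a's and b's is observed. Both the setup and the counts in the usage line are my reconstruction, not text from the deck.

```python
def em_grades(h, c, d, mu=0.1, iters=100):
    """EM for the grades example (assumed parameterization, see above).

    h = count of scores where the grade (a or b) is hidden,
    c, d = counts of grades c and d observed directly."""
    for _ in range(iters):
        # E-step: expected number of b's among the h hidden-grade scores,
        # since P(b | a or b) = mu / (1/2 + mu)
        b = h * mu / (0.5 + mu)
        # M-step: maximize the complete-data likelihood in mu,
        # which gives mu = (b + c) / (6 (b + c + d))
        mu = (b + c) / (6.0 * (b + c + d))
    return mu

mu_hat = em_grades(h=14, c=6, d=10)  # illustrative counts, not from the slides
```

The fixed point of this iteration is the maximum-likelihood estimate of μ given the incomplete counts.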
EM Convergence
Generalization
• X: observable data (score ∈ {h, c, d})
• z: missing data (grade ∈ {a, b, c, d})
• θ: model parameters to estimate (here, μ)
• E-step: given θ, compute the expectation of z
• M-step: use the z obtained in the E-step to maximize the likelihood with respect to θ
Outline
• Objective
• Simple example
• Complex example
Gaussian Mixtures
Gaussian Mixtures
• Know
– The data points
– The number of Gaussian components
• Don't know
– The label of each data point (which component generated it)
• Objective
– Estimate the component parameters (means, variances, mixing weights)
The GMM assumption
[Figure: data generated by k Gaussian components, each with its own mean and variance]
The data generated
[Figure: each point carries a hidden label (its component) and observed coordinates]
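The generative story can be sketched directly: for each point, first draw a hidden label (which component), then draw its coordinates from that component's Gaussian. The component parameters in the usage line are illustrative, not from the slides.

```python
import numpy as np

def sample_gmm(n, w, mu, var, seed=0):
    """Sample n points from a 1-D Gaussian mixture.

    w = mixing weights, mu/var = per-component means and variances."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(w), size=n, p=w)               # hidden label per point
    coords = rng.normal(mu[labels], np.sqrt(var[labels]))  # observed coordinate
    return labels, coords

labels, coords = sample_gmm(
    2000, np.array([0.5, 0.5]), np.array([0.0, 10.0]), np.array([1.0, 1.0])
)
```

EM then tries to recover `w`, `mu`, and `var` from `coords` alone, with `labels` discarded.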
Computing the likelihood
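The quantity being maximized, again a sketch in my own notation since the slide's formula is not reproduced here, is the log likelihood of the data under the mixture: log p(x) = Σ_i log Σ_j w_j N(x_i; μ_j, var_j).

```python
import numpy as np

def gmm_log_likelihood(x, w, mu, var):
    """Log likelihood of 1-D data under a Gaussian mixture."""
    x = np.asarray(x, dtype=float)
    d = (x[:, None] - mu) ** 2                             # (n, k) squared distances
    comp = w * np.exp(-0.5 * d / var) / np.sqrt(2 * np.pi * var)
    return np.log(comp.sum(axis=1)).sum()                  # sum over points of log-mixture
```

Each EM iteration is guaranteed not to decrease this value, which is the usual way to monitor convergence.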
EM for GMMs
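A minimal 1-D implementation of the two steps, assuming the usual GMM updates (E-step: responsibilities; M-step: weighted re-estimation of weights, means, and variances). Variable names and the quantile-based initialization are my choices, not the deck's notation.

```python
import numpy as np

def em_gmm(x, k, iters=50):
    """EM for a 1-D Gaussian mixture with k components."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = np.full(k, 1.0 / k)                         # mixing weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread-out initial means
    var = np.full(k, np.var(x))                     # initial variances
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        d = (x[:, None] - mu) ** 2
        p = w * np.exp(-0.5 * d / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var
```

On well-separated data the means converge to the cluster centers in a handful of iterations; as the final-comments slide notes, the result is only a local optimum and depends on the initialization.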
Generalization
• X: observable data
• z: unobservable data
• θ: model parameters to estimate
• E-step: given θ, compute the “expectation” of z
• M-step: use the z obtained in the E-step to maximize the likelihood with respect to θ
For distributions in the exponential family
• Exponential family
– Yes: normal, exponential, beta, Bernoulli,
binomial, multinomial, Poisson…
– No: Cauchy and uniform
• EM using sufficient statistics
– S1: compute the expectation of the sufficient statistics
– S2: set the parameters to their maximum-likelihood values given those statistics
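As one concrete instance (my example, not the deck's): for Exponential data with some right-censored observations, the sufficient statistic is the total of the x's, and the two steps reduce to one line each, using the memoryless property E[x | x > c] = c + 1/λ.

```python
def em_censored_exponential(observed, censor_points, lam=1.0, iters=100):
    """EM for an Exponential(rate lam) with right-censored observations.

    observed = fully observed values; censor_points = thresholds c where
    we only know x > c."""
    n = len(observed) + len(censor_points)
    for _ in range(iters):
        # S1: expected sufficient statistic T = sum of all x's; for a
        # censored point the memoryless property gives E[x | x > c] = c + 1/lam
        T = sum(observed) + sum(c + 1.0 / lam for c in censor_points)
        # S2: maximum-likelihood rate from the expected statistic
        lam = n / T
    return lam
```

The fixed point here matches the closed-form censored-data MLE, λ = (#uncensored) / (total observed time), which is a handy sanity check.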
What EM really is
• X: observable data
• z: missing data
• Maximize the expected log likelihood
• E-step: determine the expectation (of the complete-data log likelihood over z)
• M-step: maximize that expectation with respect to θ
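Written out (my reconstruction of the standard statement, since the slides' own formula is not reproduced here), with θ^(t) the current parameter estimate:

```latex
Q(\theta \mid \theta^{(t)})
  = \mathbb{E}_{z \mid X,\, \theta^{(t)}}\!\left[\log p(X, z \mid \theta)\right],
\qquad
\theta^{(t+1)} = \operatorname*{arg\,max}_{\theta}\; Q(\theta \mid \theta^{(t)}).
```

The E-step builds Q; the M-step maximizes it; each round cannot decrease the observed-data likelihood.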
Final comments
• Deals with missing data and latent variables
• Maximizes the expected log likelihood
• Converges only to a local optimum; sensitive to initialization