
A Unifying Review of Linear Gaussian Models
Summary Presentation 2/15/10 – Dae Il Kim
Department of Computer Science
Graduate Student
Advisor: Erik Sudderth Ph.D.
Overview
• Introduce the Basic Model
• Discrete Time Linear Dynamical System (Kalman Filter)
• Some nice properties of Gaussian distributions
• Graphical Model: Static Model (Factor Analysis, PCA, SPCA)
• Learning & Inference: Static Model
• Graphical Model: Gaussian Mixture & Vector Quantization
• Learning & Inference: GMMs & Quantization
• Graphical Model: Discrete-State Dynamic Model (HMMs)
• Independent Component Analysis
• Conclusion
The Basic Model
• Basic Model: Discrete Time Linear Dynamical System (Kalman Filter)
Generative Model
x_{t+1} = A x_t + w_t = A x_t + w_•
y_t = C x_t + v_t = C x_t + v_•
A = k × k state transition matrix
C = p × k observation / generative matrix
Additive Gaussian Noise
w_• ~ N(0, Q)
v_• ~ N(0, R)
Variations of this model produce:
Factor Analysis
Principal Component Analysis
Mixtures of Gaussians
Vector Quantization
Independent Component Analysis
Hidden Markov Models
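To make the generative process concrete, here is a minimal sampling sketch for the linear dynamical system above, assuming NumPy; the dimensions and the particular A, C, Q, R values are illustrative choices, not values from the review.

import numpy as np

rng = np.random.default_rng(0)
k, p, T = 2, 3, 100                       # state dim, observation dim, sequence length (illustrative)
A = 0.95 * np.eye(k)                      # k x k state transition matrix
C = rng.standard_normal((p, k))           # p x k observation / generative matrix
Q = 0.1 * np.eye(k)                       # state noise covariance
R = 0.2 * np.eye(p)                       # observation noise covariance

x = np.zeros((T, k))
y = np.zeros((T, p))
x[0] = rng.multivariate_normal(np.zeros(k), Q)                         # initial state
for t in range(T):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)          # y_t = C x_t + v_t
    if t + 1 < T:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)  # x_{t+1} = A x_t + w_t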
Nice Properties of Gaussians
• Conditional Independence
P(x_{t+1} | x_t) = N(A x_t, Q) |_{x_{t+1}}
• Markov Property
P(y_t | x_t) = N(C x_t, R) |_{y_t}
P({x_1, …, x_τ}, {y_1, …, y_τ}) = P(x_1) ∏_{t=1}^{τ−1} P(x_{t+1} | x_t) ∏_{t=1}^{τ} P(y_t | x_t)
• Inference in these models
P({x_1, …, x_τ} | {y_1, …, y_τ}) = P({x_1, …, x_τ}, {y_1, …, y_τ}) / P({y_1, …, y_τ})
Smoothing: P(x_t | {y_1, …, y_τ})
Filtering: P(x_t | {y_1, …, y_t})
• Learning via Expectation Maximization (EM)
E-step: Q^{k+1} = argmax_Q ∫ Q(X) log [ P(X, Y | θ^k) / Q(X) ] dX
M-step: θ^{k+1} = argmax_θ ∫ P(X | Y, θ^k) log P(X, Y | θ) dX
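For the dynamic model with Gaussian noise, the filtering distribution P(x_t | y_1, …, y_t) stays Gaussian and can be updated recursively. The sketch below is a minimal Kalman filter in NumPy; mu0 and V0 (the prior mean and covariance of the first state) are assumed inputs, not quantities defined on the slide.

import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, V0):
    # Filtering: P(x_t | y_1, ..., y_t) = N(mu_t, V_t), computed recursively.
    T = y.shape[0]
    k = A.shape[0]
    mus = np.zeros((T, k))
    Vs = np.zeros((T, k, k))
    mu_pred, V_pred = mu0, V0                    # prediction for x_1 (the prior)
    for t in range(T):
        # Measurement update: condition the predicted Gaussian on y_t.
        S = C @ V_pred @ C.T + R                 # innovation covariance
        K = V_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
        mus[t] = mu_pred + K @ (y[t] - C @ mu_pred)
        Vs[t] = V_pred - K @ C @ V_pred
        # Time update: propagate through x_{t+1} = A x_t + w_t.
        mu_pred = A @ mus[t]
        V_pred = A @ Vs[t] @ A.T + Q
    return mus, Vs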
Graphical Model for Static Models
Generative Model
A = 0  ⇒  x_• = w_•
y_• = C x_• + v_•
Additive Gaussian Noise
v_• ~ N(0, R)
w_• ~ N(0, Q)
Factor Analysis: Q = I and R is diagonal
SPCA: Q = I and R = αI
PCA: Q = I and R = lim_{ε→0} εI
Figure (Bishop, 2006): example of the generative process for PCA, mapping a 1-dimensional latent variable z into a 2-dimensional observation space x, together with the marginal distribution p(x).
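As a rough illustration of how the choice of R separates these static models, the sketch below samples from x ~ N(0, I), y = Cx + v under the three noise covariances; the dimensions, variable names, and numeric values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
k, p, n = 2, 5, 20000
C = rng.standard_normal((p, k))

# Observation-noise covariances for the three static models (values are illustrative):
R_fa   = np.diag(rng.uniform(0.1, 0.5, size=p))   # Factor Analysis: R diagonal
R_spca = 0.3 * np.eye(p)                          # SPCA: R = alpha * I
R_pca  = 1e-6 * np.eye(p)                         # PCA: R -> 0, the limit of eps * I

def sample_static(C, R, n, rng):
    # x ~ N(0, I) (Q = I), y = C x + v, v ~ N(0, R)
    x = rng.standard_normal((n, C.shape[1]))
    v = rng.multivariate_normal(np.zeros(C.shape[0]), R, size=n)
    return x @ C.T + v

y = sample_static(C, R_fa, n, rng)
# The empirical covariance approaches the marginal C Q C^T + R = C C^T + R.
print(np.round(np.cov(y.T) - (C @ C.T + R_fa), 2))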
Learning & Inference: Static Models
Analytically integrating over the joint, we obtain the marginal distribution of y.
y ~ N (0, CQC T  R)
We can calculate our poterior using Bayes rule
p( x | y ) 

p( y | x ) p( x )
p ( y )
N (Cx , R ) | y N (0, I ) | x
N (0, CC T  R ) | y
Our posterior now becomes another Gaussian
P( x | y )  N (y , I  C) | x
Where beta is equal to:
  C T (CC T  R) 1
Note: Filtering and smoothing reduce to the same problem in the static model since the time dependence is gone. We want to find P(x_• | y_•), a single hidden state given a single observation. Inference can be performed simply by linear matrix projection, and the result is also Gaussian.
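A minimal sketch of this projection, assuming NumPy and Q = I; the function name is just for illustration.

import numpy as np

def static_posterior(y, C, R):
    # P(x | y) = N(beta y, I - beta C) with beta = C^T (C C^T + R)^{-1}  (Q = I).
    p, k = C.shape
    beta = C.T @ np.linalg.inv(C @ C.T + R)      # k x p projection matrix
    mean = beta @ y                              # posterior mean
    cov = np.eye(k) - beta @ C                   # posterior covariance (same for every y)
    return mean, cov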
Graphical Model: Gaussian Mixture Models & Vector Quantization
Generative Model
A = 0  ⇒  x_• = WTA[w_•]
y_• = C x_• + v_•
Additive Gaussian Noise
v_• ~ N(0, R)
w_• ~ N(μ, Q)
WTA[·] (Winner-Take-All) returns a new vector with unity in the position of the largest coordinate of the input and zeros in all other positions, e.g. [0 0 1]^T.
Note: Each state x_• is generated independently according to a fixed discrete probability histogram controlled by the mean and covariance of w_•.
This model becomes a vector quantization model when:
R = lim_{ε→0} εI
Learning & Inference: GMMs & Quantization
Computing the likelihood of the data is straightforward:
P(y) = ∑_{i=1}^{k} P(x = e_i, y) = ∑_{i=1}^{k} N(C e_i, R)|_y P(x = e_i) = ∑_{i=1}^{k} N(C e_i, R)|_y π_i
Calculating the posterior responsibility for each cluster is analogous to the E-step in this model.
(x̂)_j = P(x = e_j | y) = P(x = e_j, y) / P(y)
       = N(C e_j, R)|_y P(x = e_j) / ∑_{i=1}^{k} N(C e_i, R)|_y P(x = e_i)
       = N(C e_j, R)|_y π_j / ∑_{i=1}^{k} N(C e_i, R)|_y π_i
π_j is the probability assigned by the Gaussian N(μ, Q) to the region of k-space in which the jth coordinate is larger than all the others.
Figure: Gaussian mixture model, showing the joint distribution p(y, x) and the marginal distribution p(y).
π_j = P(x = e_j)
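As a sketch of the E-step responsibilities above, assuming NumPy and SciPy's multivariate_normal for the Gaussian density; the function name is illustrative.

import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(y, C, R, pi):
    # Posterior responsibilities (x_hat)_j = P(x = e_j | y) for one observation y.
    k = C.shape[1]
    # Likelihood of y under each cluster, N(y; C e_j, R), weighted by pi_j.
    weighted = np.array([pi[j] * multivariate_normal.pdf(y, mean=C[:, j], cov=R)
                         for j in range(k)])
    return weighted / weighted.sum()             # dividing by P(y) normalizes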
Graphical Model: Discrete-State Dynamic Models
Generative Model
x_{t+1} = WTA[A x_t + w_t] = WTA[A x_t + w_•]
y_t = C x_t + v_t = C x_t + v_•
Additive Gaussian Noise
v_• ~ N(0, R)
w_• ~ N(μ, Q)
Independent Component Analysis
• ICA can be seen either as a linear generative model with non-Gaussian priors for the hidden variables, or as a nonlinear generative model with Gaussian priors for the hidden variables.
Generative Model
A = 0  ⇒  x_• = g(w_•),  w_• ~ N(0, Q)
y_• = C x_• + v_•,  v_• ~ N(0, R)
g(·) is a general nonlinearity that is invertible and differentiable, e.g.
g(w) = ln tan( (π/4) (1 + erf(w / √2)) )
The gradient learning rule to increase the likelihood:
ΔW ∝ W^{−T} + f(Wy) y^T,  where  f(x) = d log p_x(x) / dx
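A minimal sketch of one such gradient step for square, noiseless ICA, assuming NumPy and a hyperbolic-secant source prior (for which f(u) = −tanh(u)); the function name and learning rate are illustrative.

import numpy as np

def ica_gradient_step(W, Y, lr=0.01):
    # One maximum-likelihood gradient step for square, noiseless ICA.
    # Y is p x n data; a hyperbolic-secant source prior gives f(u) = -tanh(u).
    n = Y.shape[1]
    U = W @ Y                                             # current source estimates, u = W y
    grad = np.linalg.inv(W).T + (-np.tanh(U)) @ Y.T / n   # W^{-T} + E[ f(W y) y^T ]
    return W + lr * grad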
Conclusion
Many more potential models!