
Adversarial Autoencoders
Cem Benar
Introduction
• Autoencoders are neural networks that learn the identity
function f(x) = x, reconstructing their input
• To achieve this goal, they have to pass through an intermediate
layer (latent space), whose dimensionality is much lower than that
of the input space
• Encoder – Decoder
• Reconstruction loss: L_R(x, y) = ||x - y||² (sketched in code below)
• Applications: image/audio denoising, compression, scene
understanding, representation learning
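As an illustration (not part of the original slides), a minimal PyTorch sketch of an encoder-decoder trained with the squared-error reconstruction loss; the layer sizes, learning rate, and random stand-in batch are assumptions.

```python
# Minimal autoencoder sketch (illustrative sizes: 784-dim input, 32-dim latent).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder maps the input to a lower-dimensional latent code z.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Decoder maps z back to input space to reconstruct x.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)              # stand-in batch; real data would go here
x_rec = model(x)
loss = ((x - x_rec) ** 2).mean()     # reconstruction loss L_R(x, y) = ||x - y||^2
opt.zero_grad()
loss.backward()
opt.step()
```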
Variational Autoencoders
• Traditional AE + imposed prior distribution (becomes a generative
model)
• Aim: to match model distribution to true data distribution
• Minimizing the KL divergence (a measure of how much one distribution differs from another)
• Latent loss: the KL divergence measuring how closely the
latent variables match a unit Gaussian (see the sketch below)
• Maximizing a lower bound on the log-likelihood of the data
(Alireza Makhzani et al., Adversarial Autoencoders, 2015)
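A hedged sketch of the VAE objective described above: reconstruction loss plus the KL "latent loss" against a unit Gaussian, using the standard closed form and the reparameterization trick; the network shapes and batch are illustrative assumptions.

```python
# VAE latent loss sketch: KL(q(z|x) || N(0, I)) in closed form + reconstruction term.
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 32)   # outputs mean and log-variance of q(z|x) (sizes assumed)
dec = nn.Linear(32, 784)

x = torch.rand(64, 784)
mu, logvar = enc(x).chunk(2, dim=1)

# Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
x_rec = dec(z)

recon = ((x - x_rec) ** 2).sum(dim=1).mean()
# Closed-form KL divergence between N(mu, sigma^2) and the unit Gaussian prior.
kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
loss = recon + kl              # negative of the lower bound on the log-likelihood (up to constants)
```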
Generative Adversarial Networks (GANs)
• Generator: a (typically deconvolutional) neural network that maps samples z
from the prior p(z) to the data space
• Discriminator: a (typically convolutional) neural network that decides
whether an image is real or generated (i.e., whether a point x in data space
comes from the data distribution)
• Aim: to match the model distribution to the true data distribution (the generator
network outputs images that the discriminator cannot distinguish from real
images); see the sketch below
(Radford et al., Deep convolutional generative adversarial nets, 2015)
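A minimal sketch of the adversarial game on images described above; the generator and discriminator are shown as tiny fully connected nets for brevity (a DCGAN would use deconvolutional/convolutional stacks), and all sizes and learning rates are assumptions.

```python
# GAN sketch: discriminator learns real vs. generated; generator tries to fool it.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784))   # z -> data space
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))     # image -> real/fake logit
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

x_real = torch.rand(64, 784)                 # stand-in batch of real images
z = torch.randn(64, 100)                     # samples from the prior p(z)

# Discriminator step: push real images toward label 1, generated images toward 0.
d_loss = bce(D(x_real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: make the discriminator output 1 on generated images.
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```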
Adversarial Autoencoders
• x: input, z: latent (hidden) code vector
• p(z): prior distribution
• q(z|x): encoding distribution
• p(x|z): decoding distribution
• p_d(x): data distribution
• p(x): model distribution
• q(z): aggregated posterior distribution (defined below)
• Aim: to match q(z) to p(z)
• Reconstruction phase + regularization phase
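The aggregated posterior referred to above is the encoding distribution averaged over the data distribution, and it is this distribution that the adversarial network pushes toward the prior p(z):

```latex
q(\mathbf{z}) \;=\; \int_{\mathbf{x}} q(\mathbf{z}\mid\mathbf{x})\, p_d(\mathbf{x})\, d\mathbf{x}
```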
How are Adversarial AEs trained?
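A hedged sketch of the two-phase SGD loop for one minibatch: in the reconstruction phase the encoder and decoder minimize reconstruction error; in the regularization phase a discriminator on the latent code learns to separate samples of p(z) from encoder codes, and the encoder (acting as generator) is updated to fool it. The module shapes, optimizers, and learning rates are illustrative assumptions.

```python
# Two-phase AAE training sketch (one minibatch).
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 8))   # q(z|x)
dec = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 784))   # p(x|z)
disc = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))      # discriminator on latent codes
bce = nn.BCEWithLogitsLoss()
opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(enc.parameters(), lr=1e-3)

x = torch.rand(64, 784)

# 1) Reconstruction phase: update encoder + decoder on the reconstruction error.
recon = ((x - dec(enc(x))) ** 2).mean()
opt_ae.zero_grad(); recon.backward(); opt_ae.step()

# 2a) Regularization phase: discriminator separates prior samples from encoder codes.
z_prior = torch.randn(64, 8)                    # samples from p(z)
d_loss = bce(disc(z_prior), torch.ones(64, 1)) + bce(disc(enc(x).detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2b) Regularization phase: the encoder (generator) tries to fool the discriminator,
#     which pushes the aggregated posterior q(z) toward the prior p(z).
g_loss = bce(disc(enc(x)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```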
Adversarial AE vs. VAE
• Entropy term: encourages large variances in q(z)
• VAE uses entropy and cross-entropy terms (i.e., the KL divergence; closed form below)
• AAE instead uses adversarial training to impose the prior
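For a Gaussian encoding distribution and a unit-Gaussian prior, the KL regularizer used by the VAE has the standard closed form below (written per latent dimension); the entropy part is what rewards large variances, while the AAE replaces this whole term with the adversarial game:

```latex
\mathrm{KL}\!\left(\mathcal{N}(\mu,\sigma^{2})\,\middle\|\,\mathcal{N}(0,1)\right)
 = \underbrace{\tfrac{1}{2}\left(\mu^{2}+\sigma^{2}\right)+\tfrac{1}{2}\log 2\pi}_{\text{cross-entropy}}
 \;-\; \underbrace{\tfrac{1}{2}\log\!\left(2\pi e\,\sigma^{2}\right)}_{\text{entropy}}
 = \tfrac{1}{2}\left(\mu^{2}+\sigma^{2}-\log\sigma^{2}-1\right)
```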
Adversarial AE vs. GAN
• Adversarial AEs impose a simple prior distribution on the much lower-dimensional
latent space rather than on the data space, which results in a better test likelihood
How well AAE and VAE impose prior distr.
(Alireza Makhzani et al., Adversarial Autoencoders, 2015)
Incorporating label information
• Labeled data helps to better shape q(z)
• Each mode of a mixture-of-Gaussians prior is forced to represent a
single MNIST digit label
• 11 classes (10 digit labels plus one for unlabeled examples) for a mixture of
10 2D Gaussian distributions (see the sketch below)
(Alireza Makhzani et al., Adversarial Autoencoders, 2015)
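To ground the idea above, a hedged sketch of letting a label select one mode of a mixture-of-Gaussians prior; placing the 10 means on a circle and the particular radius/standard deviation are illustrative choices, not the paper's exact configuration. During the regularization phase the discriminator would also receive the one-hot label, so each mode is tied to a single digit class.

```python
# Sample from a 10-mode 2D mixture-of-Gaussians prior, one mode per digit label.
import math
import torch

def sample_prior(labels, num_modes=10, radius=4.0, std=0.5):
    """labels: LongTensor of digit labels in [0, num_modes). Returns [N, 2] prior samples."""
    angles = 2 * math.pi * labels.float() / num_modes
    means = torch.stack([radius * torch.cos(angles), radius * torch.sin(angles)], dim=1)
    return means + std * torch.randn(labels.shape[0], 2)

labels = torch.randint(0, 10, (64,))
z_prior = sample_prior(labels)   # fed to the discriminator together with the one-hot label
```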
Supervised AAE
• It can separate class-label information from image-style
information (see the sketch below)
(Alireza Makhzani et al., Adversarial Autoencoders, 2015)
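A minimal sketch of the supervised setup: the decoder is conditioned on the one-hot class label in addition to the latent code, so z is free to capture only style; the layer sizes and random stand-in data are assumptions.

```python
# Supervised AAE sketch: decoder sees (one-hot label, style code z).
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 8))       # style code z
dec = nn.Sequential(nn.Linear(8 + 10, 256), nn.ReLU(), nn.Linear(256, 784))  # (label, z) -> image

x = torch.rand(64, 784)
y = F.one_hot(torch.randint(0, 10, (64,)), num_classes=10).float()

z = enc(x)                                    # adversarially pushed toward p(z) as before
x_rec = dec(torch.cat([y, z], dim=1))         # label carries class info, z carries style
recon = ((x - x_rec) ** 2).mean()
```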
Semi-supervised AAE
• Assumes the vector y has a categorical distribution and the vector z has a
Gaussian distribution
• Training consists of reconstruction + regularization + semi-supervised
classification phases (sketch below)
(Alireza Makhzani et al., Adversarial Autoencoders, 2015)
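A hedged sketch of the inference network for the semi-supervised case: one head produces the categorical y (adversarially matched to a categorical prior), the other the continuous z (adversarially matched to a Gaussian prior), and on labeled minibatches a cross-entropy classification loss is additionally applied to y. The architecture details below are assumptions.

```python
# Semi-supervised AAE sketch: encoder outputs categorical y and Gaussian-style z.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, input_dim=784, num_classes=10, latent_dim=8):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.y_head = nn.Linear(256, num_classes)   # logits for the categorical y
        self.z_head = nn.Linear(256, latent_dim)    # continuous style code z

    def forward(self, x):
        h = self.trunk(x)
        return self.y_head(h), self.z_head(h)       # (class logits, style code)

enc = Encoder()
x_lab = torch.rand(32, 784)                          # stand-in labeled minibatch
targets = torch.randint(0, 10, (32,))

y_logits, z = enc(x_lab)
y = F.softmax(y_logits, dim=1)    # categorical y, adversarially matched to a categorical prior;
                                  # z is adversarially matched to a Gaussian prior, as before
# Semi-supervised phase: standard classification loss on the labeled minibatch.
cls_loss = F.cross_entropy(y_logits, targets)
```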
Unsupervised AAE
• No semi-supervised classification stage
• The one-hot vector’s dimension is equal to the number of clusters (see the sketch below)
(Alireza Makhzani et al., Adversarial Autoencoders, 2015)
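A short sketch of the clustering readout implied above: with no classification phase, the argmax of the categorical head is simply read off as the cluster assignment; the number of clusters and the network shape are assumptions.

```python
# Unsupervised AAE clustering readout: argmax of the categorical head = cluster id.
import torch
import torch.nn as nn

num_clusters = 16                                     # illustrative; equals the one-hot dimension
enc_y = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, num_clusters))

x = torch.rand(64, 784)
cluster_id = enc_y(x).argmax(dim=1)                   # no labels are used anywhere in training
```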
Dimensionality reduction
(Alireza Makhzani et al., Adversarial Autoencoders, 2015)
Thank you for listening