Energy-based GAN

Hung-yi Lee
Original Idea
• Discriminator leads the generator
(Figure: evolution of the discriminator output D(x) over the data (target) distribution and the generated distribution)
• Is it the only explanation of GAN?
• Original GAN: the discriminator is flat in the end.
Source: https://www.youtube.com/watch?v=ebMei6bYeWw (credit: Benjamin Striner)
Evaluation Function
• We want to find an evaluation function F(x)
• Input: object x; output: scalar F(x) (how "good" the object is)
• E.g., x are images
• Real x has high F(x)
• F(x) can be a network
• We can generate good x by F(x): find x with large F(x)
• How to find F(x)? In practice, you cannot decrease F(x) for all the x other than real data.
(Figure: F(x) curve, with high values at the real data)
Evaluation Function - Structured Perceptron
• Input: training data set {(x^1, ŷ^1), (x^2, ŷ^2), …, (x^r, ŷ^r), …}
• Output: weight vector w, with F(x, y) = w · φ(x, y)
• Algorithm: initialize w = 0
  • do
    • For each training example (x^r, ŷ^r)
      • Find the label ỹ^r maximizing F(x^r, y):
        ỹ^r = arg max_{y∈Y} F(x^r, y)   (can be an issue)
      • If ỹ^r ≠ ŷ^r, update w:
        w → w + φ(x^r, ŷ^r) − φ(x^r, ỹ^r)
        (increase F(x^r, ŷ^r), decrease F(x^r, ỹ^r))
  • until w is not updated. We are done!
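The update loop above can be sketched for a toy multiclass case. This is a minimal illustration, not the slide's exact setup: the block one-hot joint feature map `phi` and the toy two-example dataset are assumptions made for the sketch.

```python
import numpy as np

def phi(x, y, num_classes):
    """Joint feature map phi(x, y): copy the input features into
    the block corresponding to label y (a standard multiclass choice)."""
    f = np.zeros(num_classes * len(x))
    f[y * len(x):(y + 1) * len(x)] = x
    return f

def structured_perceptron(data, num_classes, max_epochs=100):
    """data: list of (x, y_hat) pairs. Returns weight vector w
    such that F(x, y) = w . phi(x, y) ranks the correct label highest."""
    w = np.zeros(num_classes * len(data[0][0]))
    for _ in range(max_epochs):
        updated = False
        for x, y_hat in data:
            # Find the label y_tilde maximizing F(x, y) = w . phi(x, y)
            scores = [w @ phi(x, y, num_classes) for y in range(num_classes)]
            y_tilde = int(np.argmax(scores))
            if y_tilde != y_hat:
                # Increase F(x, y_hat), decrease F(x, y_tilde)
                w += phi(x, y_hat, num_classes) - phi(x, y_tilde, num_classes)
                updated = True
        if not updated:  # until w is not updated: we are done
            break
    return w

# Toy, linearly separable data (hypothetical)
data = [(np.array([1.0, 0.1]), 0), (np.array([0.1, 1.0]), 1)]
w = structured_perceptron(data, num_classes=2)
```

Note the step the slide flags as "can be an issue": the arg max here is a brute-force loop over labels, which is only feasible when the label set Y is small.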
How about GAN?
• The generator is an intelligent way to find the negative examples.
• "Experience replay", parameters from last iteration
(Figure: evolution of F(x) during training; in the end, F(x) is high only near the real data)
Toy experiments:
• Pdata = "origin": G = 1 hidden layer (100), D = 1 hidden layer (100); 100 iterations on G, 100 iterations on D
• Pdata = "line" (100 iterations on D):
  • G = 1 hidden layer (100), D = 1 hidden layer (100)
  • G = 2 hidden layers (100), D = 1 hidden layer (100)
  • G = 1 hidden layer (100), D = 2 hidden layers (100)
• Pdata = 1-D Gaussian (100 iterations on D): G = 2 hidden layers (100), D = 1 hidden layer (100)
Energy-based GAN (EBGAN)
• Views the discriminator as an energy function (a negative evaluation function)
• Auto-encoder as discriminator (energy function)
• Loss function with margin for discriminator training
• Generates reasonable-looking images from the ImageNet dataset at 256×256 pixel resolution
  • without a multiscale approach
Junbo Zhao, Michael Mathieu, Yann LeCun, "Energy-based Generative Adversarial Network", arXiv preprint, 2016
EBGAN
• Discriminator D is an auto-encoder: x → EN → DE → x̂, with D(x) = ‖x − x̂‖ (a scalar)
• Generator: z → G → G(z)
Training:
• Sample real example x; sample code z from the prior distribution
• Update discriminator D to minimize
  L_D(x, z) = D(x) + max(0, m − D(G(z)))
• Sample code z from the prior distribution
• Update generator G to minimize
  L_G(z) = D(G(z))
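The two losses above are simple enough to write down directly. A minimal sketch, assuming D(x) and D(G(z)) have already been computed as non-negative reconstruction errors ‖x − x̂‖:

```python
def ebgan_losses(d_real, d_fake, m=10.0):
    """EBGAN losses, where d_real = D(x) and d_fake = D(G(z)) are
    autoencoder reconstruction errors (non-negative scalars).
    Discriminator: L_D = D(x) + max(0, m - D(G(z)))
    Generator:     L_G = D(G(z))
    The margin m = 10.0 here is an arbitrary illustrative value."""
    loss_d = d_real + max(0.0, m - d_fake)
    loss_g = d_fake
    return loss_d, loss_g

# The hinge stops pushing once the fake energy exceeds the margin:
ebgan_losses(1.0, 3.0)   # hinge active: L_D = 1.0 + (10 - 3) = 8.0
ebgan_losses(1.0, 12.0)  # hinge inactive: L_D = 1.0
```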
EBGAN
Discriminator D: L_D(x, z) = D(x) + max(0, m − D(G(z)))
Generator G: L_G(z) = D(G(z))
• For generated samples, the gradient −D(G(z)) only pushes their energy up to the margin m.
(Figure: energy of real vs. generated samples; real examples are hard to reconstruct, generated ones are easy to "destroy" by pushing their energy up to m)
EBGAN
Discriminator D: L_D(x, z) = D(x) + max(0, m − D(G(z)))
Generator G: L_G(z) = D(G(z))
• For an auto-encoder, the region with low value is limited.
• What would happen if x and G(z) have the same distribution?
  • The energy γ settles at a value between 0 and m.
(Figure: energy landscape when real = gen; both D(x) and D(G(z)) would be large under the losses above, and the equilibrium level γ lies between 0 and m)
More about EBGAN
(Diagram: x_i → EN → e_i → DE → x̂_i)
• Pulling-away term for training the generator
  • Given a batch S = {… x_i … x_j …} from the generator, with e_i the encoder output for x_i:
    f_PT(S) = Σ_{i,j, i≠j} cos(e_i, e_j)
  • Minimized to increase diversity
• Better way to learn the auto-encoder?
  • If the auto-encoder only learns to minimize the reconstruction error of real images, it can end up nearly an identity function (if the structure is not properly designed).
  • Giving larger reconstruction error for fake images regularizes the auto-encoder.
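The pulling-away term can be sketched in a few lines. This follows the formula as written on the slide, a sum of pairwise cosines; note the EBGAN paper itself uses the squared cosine averaged over N(N−1):

```python
import numpy as np

def pulling_away_term(E):
    """Pulling-away term on a batch of encoder outputs E (N x d):
    sum of cos(e_i, e_j) over all pairs i != j.
    Minimizing it pushes the codes of generated samples apart."""
    # Normalize each code vector, so dot products become cosines
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    C = U @ U.T                   # C[i, j] = cos(e_i, e_j)
    return C.sum() - np.trace(C)  # drop the i == j terms (each cos = 1)

pulling_away_term(np.eye(3))      # orthogonal codes: term is 0
pulling_away_term(np.ones((2, 3)))  # identical codes: term is maximal
```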
Margin Adaptation GAN (MAGAN)
L_D(x, z) = D(x) + max(0, m − D(G(z)))
• Dynamic margin m
  • As the generator generates better images, the margin becomes smaller and smaller.
Loss-sensitive GAN (LSGAN)
• Reference: Guo-Jun Qi, "Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities", arXiv preprint, 2017
• LSGAN lets the generator focus on improving poor data points that are far from real examples.
• Connecting LSGAN with WGAN
LSGAN (assuming D(x) is the energy function)
Discriminator minimizes: D(x) + max(0, Δ(x, G(z)) + D(x) − D(G(z)))
(Figure: D(x) is pushed down at real x; each generated sample G(z) should have energy at least Δ(x, G(z)) above D(x), so samples farther from x get larger margins)
Analogy to structured learning with margin:
• Without margin: F(x^n, ŷ^n) ≥ F(x^n, y)
• With margin: F(x^n, ŷ^n) − F(x^n, y) ≥ Δ(ŷ^n, y)
Boundary Equilibrium Generative Adversarial Networks (BEGAN)
• Ref: David Berthelot, Thomas Schumm, Luke Metz, "BEGAN: Boundary Equilibrium Generative Adversarial Networks", arXiv preprint, 2017
• Auto-encoder based GAN
(Generated faces; not from CelebA)
BEGAN
• Discriminator D is an auto-encoder: x → EN → DE → x̂, with D(x) = ‖x − x̂‖ (a scalar)
For discriminator: L_D = D(x) − k_t · D(G(z))
For generator: L_G = D(G(z))
For each training step t:
  k_{t+1} = k_t + λ (γ D(x) − D(G(z)))
• k_t increases if γ D(x) > D(G(z)), i.e., if D(G(z)) / D(x) < γ
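One bookkeeping step of BEGAN is easy to sketch, given the two reconstruction losses. Clipping k_t to [0, 1] follows the BEGAN paper; the values of γ and λ below are illustrative, not prescribed by the slide:

```python
import numpy as np

def began_step(d_real, d_fake, k, gamma=0.5, lam=1e-3):
    """One BEGAN step, with d_real = D(x) and d_fake = D(G(z))
    the autoencoder reconstruction losses.
    L_D = D(x) - k_t * D(G(z));  L_G = D(G(z))
    k_{t+1} = k_t + lambda * (gamma * D(x) - D(G(z))), clipped to [0, 1]."""
    loss_d = d_real - k * d_fake
    loss_g = d_fake
    k_next = float(np.clip(k + lam * (gamma * d_real - d_fake), 0.0, 1.0))
    return loss_d, loss_g, k_next

# k grows while gamma * D(x) > D(G(z)), i.e. D(G(z)) / D(x) < gamma
loss_d, loss_g, k1 = began_step(d_real=1.0, d_fake=0.2, k=0.0, lam=0.1)
```

The control variable k_t thus regulates how hard the discriminator pushes fake energies up, keeping the ratio D(G(z)) / D(x) near the target γ.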
้™ณๆŸๆ–‡ (ๅคงๅ››) ๆไพ›ๅฏฆ้ฉ—็ตๆžœ (using CelebA)