Energy-based GAN
Hung-yi Lee

Original Idea
• The discriminator leads the generator: D(x) is pushed up on the data (target) distribution and pushed down on the generated distribution.
• Is this the only explanation of GAN? In the original GAN, the discriminator becomes flat in the end.
Source: https://www.youtube.com/watch?v=ebMei6bYeWw (credit: Benjamin Striner)

Evaluation Function
• We want to find an evaluation function F(x)
  • Input: an object x (e.g. an image); output: a scalar F(x) measuring how "good" the object is
  • Real x should have high F(x); F(x) can be a network
• Given F(x), we can generate good x: find x with large F(x)
• How do we find F(x)? In practice, you cannot push down F(x) for every x other than the real data.

Evaluation Function - Structured Perceptron
• Input: training data set {(x^1, ŷ^1), (x^2, ŷ^2), …, (x^r, ŷ^r), …}
• Output: a weight vector w defining F(x, y) = w · φ(x, y)
• Algorithm:
  Initialize w = 0
  do
    For each training example (x^r, ŷ^r):
      Find the label ỹ^r maximizing F(x^r, y): ỹ^r = arg max_{y ∈ Y} F(x^r, y)   ← can be an issue
      If ỹ^r ≠ ŷ^r, update w: w → w + φ(x^r, ŷ^r) − φ(x^r, ỹ^r)
        (increase F(x^r, ŷ^r), decrease F(x^r, ỹ^r))
  until w is not updated — we are done!

How about GAN?
• The generator is an intelligent way to find the negative examples.
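The structured perceptron loop above can be sketched numerically. This is a toy illustration, not the lecture's code: the feature map `phi` (a block one-hot joint feature) and the three-label dataset are hypothetical stand-ins for a real structured task.

```python
import numpy as np

LABELS = [0, 1, 2]
DIM = 4  # dimension of each input x

def phi(x, y):
    """Joint feature map phi(x, y): copy x into the block for label y."""
    f = np.zeros(DIM * len(LABELS))
    f[y * DIM:(y + 1) * DIM] = x
    return f

def train(data, max_epochs=100):
    w = np.zeros(DIM * len(LABELS))  # initialize w = 0
    for _ in range(max_epochs):
        updated = False
        for x, y_hat in data:
            # Find the label maximizing F(x, y) = w . phi(x, y)
            y_tilde = max(LABELS, key=lambda y: w @ phi(x, y))
            if y_tilde != y_hat:
                # Increase F(x, y_hat), decrease F(x, y_tilde)
                w += phi(x, y_hat) - phi(x, y_tilde)
                updated = True
        if not updated:  # until w is not updated
            break
    return w

# Hypothetical toy data: three separable examples
data = [(np.array([1., 0., 0., 0.]), 0),
        (np.array([0., 1., 0., 0.]), 1),
        (np.array([0., 0., 1., 0.]), 2)]
w = train(data)
```

Each update raises the score of the correct pair (x^r, ŷ^r) and lowers the score of the current best wrong pair, which is exactly the "increase F, decrease F" step on the slide. The arg max over all labels is what "can be an issue" refers to: here Y has three elements, but in real structured tasks it is exponentially large.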
Training tricks
• "Experience replay": also reuse negative examples (discriminator parameters) from the last iteration.
[Figure: toy experiments with Pdata = "origin", "line", and a 1-D Gaussian, comparing G and D with 1 or 2 hidden layers (100 units each), 100 iterations on G and 100 iterations on D; in the end F(x) becomes flat on the data.]

Energy-based GAN (EBGAN)
• Views the discriminator as an energy function (a negative evaluation function)
• Uses an auto-encoder as the discriminator (energy function)
• Trains the discriminator with a margin loss
• Generates reasonable-looking images from the ImageNet dataset at 256 × 256 pixel resolution, without a multiscale approach
Junbo Zhao, Michael Mathieu, Yann LeCun, "Energy-based Generative Adversarial Network", arXiv preprint, 2016

EBGAN
• The discriminator is an autoencoder: x → EN → DE → x̂, with D(x) = ‖x − x̂‖ (a scalar)
• In each iteration:
  • Sample a real example x and a code z from the prior distribution
  • Update the discriminator D to minimize L_D(x, z) = D(x) + max(0, m − D(G(z)))
  • Sample a code z from the prior distribution
  • Update the generator G to minimize L_G(z) = D(G(z))
• The margin m keeps the discriminator from pushing D(G(z)) up without bound on generated samples: it is hard to reconstruct well, but easy to destroy (produce large reconstruction error).
• For an auto-encoder, the low-energy region is limited. What would happen if x and G(z) have the same distribution?
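The two EBGAN losses can be sketched numerically. This is a minimal illustration of the formulas only: the "autoencoder" is a hypothetical stand-in (identity scaled by 0.9) used just to produce nonzero reconstruction errors, not a trained network.

```python
import numpy as np

M = 1.0  # margin m

def D(x, reconstruct):
    """Energy = autoencoder reconstruction error ||x - x_hat||."""
    return np.linalg.norm(x - reconstruct(x))

def L_D(x_real, x_gen, reconstruct, m=M):
    # Discriminator loss: L_D(x, z) = D(x) + max(0, m - D(G(z)))
    return D(x_real, reconstruct) + max(0.0, m - D(x_gen, reconstruct))

def L_G(x_gen, reconstruct):
    # Generator loss: L_G(z) = D(G(z))
    return D(x_gen, reconstruct)

reconstruct = lambda x: 0.9 * x  # hypothetical stand-in autoencoder
x_real = np.array([1.0, 0.0])    # a "real" example
x_gen = np.array([0.0, 2.0])     # a "generated" example G(z)
```

Note the asymmetry the slide describes: minimizing L_D pushes the reconstruction error of real x toward 0, but only pushes the error of G(z) up until it reaches the margin m, after which the hinge term max(0, m − D(G(z))) is zero and exerts no further pressure.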
• If x and G(z) have the same distribution, D converges to some constant γ between 0 and m on both; a large margin m allows a large γ when real = gen.

More about EBGAN
• Pulling-away term for training the generator: given a batch S = {x̂_1, …, x̂_N} from the generator, with encoder outputs e_i = EN(x̂_i), add
  PT(S) = Σ_{i≠j} cos²(e_i, e_j)
  to the generator loss, to increase diversity.
• A better way to learn the auto-encoder?
  • If the auto-encoder only learns to minimize the reconstruction error of real images, it can end up close to an identity function (when the structure is not properly designed).
  • Giving larger reconstruction error to fake images makes it a regularized auto-encoder.

Margin Adaptation GAN (MAGAN)
L_D(x, z) = D(x) + max(0, m − D(G(z)))
• Dynamic margin m: as the generator generates better images, the margin becomes smaller and smaller.

Loss-sensitive GAN (LSGAN)
• Reference: Guo-Jun Qi, "Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities", arXiv preprint, 2017
• LSGAN allows the generator to focus on improving poor data points that are far from the real examples.
• LSGAN can be connected with WGAN.

LSGAN
• Assume D(x) is the energy function. The discriminator minimizes
  D(x) + max(0, Δ(x, G(z)) + D(x) − D(G(z)))
  i.e. D(G(z)) should exceed D(x) by a data-dependent margin Δ(x, G(z)), which is larger for generated samples G(z′) farther from x.
• Compare with the structured perceptron: the perceptron only requires F(x^r, ŷ^r) ≥ F(x^r, y), while the margin version requires F(x^r, ŷ^r) − F(x^r, y) ≥ Δ(ŷ^r, y).

Boundary Equilibrium Generative Adversarial Networks (BEGAN)
• Ref: David Berthelot, Thomas Schumm, Luke Metz, "BEGAN: Boundary Equilibrium Generative Adversarial Networks", arXiv preprint, 2017
• Auto-encoder based GAN
[Figure: generated face images, not from CelebA]

BEGAN
• The discriminator is an autoencoder: x → EN → DE → x̂, with D(x) = ‖x − x̂‖ (a scalar)
• For the discriminator: L_D = D(x) − k_t · D(G(z))
• For the generator: L_G = D(G(z))
• For each training step t: k_{t+1} = k_t + λ (γ · D(x) − D(G(z)))
  k_t increases when γ · D(x) > D(G(z)), i.e. when D(G(z)) / D(x) < γ.
[Experimental results (using CelebA); contributor's name garbled in the source.]
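The BEGAN balance-variable update above can be sketched in a few lines. The clipping of k to [0, 1] follows the BEGAN paper (the slide does not show it); the gamma and lambda values below are illustrative, not the paper's settings.

```python
def update_k(k_t, d_real, d_gen, gamma=0.5, lam=0.001):
    """One BEGAN step: k_{t+1} = k_t + lambda * (gamma * D(x) - D(G(z))).

    d_real is D(x), d_gen is D(G(z)); k weights the generated-sample
    term in the discriminator loss L_D = D(x) - k_t * D(G(z)).
    """
    k_next = k_t + lam * (gamma * d_real - d_gen)
    return min(max(k_next, 0.0), 1.0)  # keep k in [0, 1], as in the paper
```

At equilibrium the update is zero, i.e. D(G(z)) = γ · D(x): γ directly sets the target ratio between generated and real reconstruction errors, which is why it acts as a diversity/quality knob.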