
AdaGAN Summary
With the advent of Generative Adversarial Networks (GANs), state-of-the-art results have been
produced on a variety of generative tasks (such as DeepDream). However, one of the main issues
that troubles GANs is training instability, along with their propensity to miss modes of the data
distribution. The AdaGAN paper introduces a novel method that helps to alleviate this issue
without adding much overhead to the GAN itself.
AdaGAN is an iterative procedure in which a new component is added at each step to a
mixture model by running the GAN on a reweighted sample of the training data. This is
inspired by boosting algorithms (such as AdaBoost, from which the paper derives its name),
which apply a similar concept at the level of weak learners rather than at the GAN level.
The paper does not aim to improve the performance of GANs themselves; rather, AdaGAN acts
as a supplement that allows for faster convergence and can be used alongside existing GAN
implementations, so the required code changes are not extensive. For this reason the authors call
it a meta-algorithm.
The actual AdaGAN algorithm consists of several important steps. First, the GAN algorithm
(or some other generative model) must be run in the usual way to initialize the generative model,
yielding the generator G_1. Then, at every t-th step, the following must be performed (a rough
code sketch of this loop follows the list):
o pick the mixture weight β_t for the next component,
o update the weights W_t of the training examples in such a way as to bias the
next component towards the "hard" examples not yet covered by the current
mixture of generators G_{t-1},
o run the GAN algorithm again, this time importance-sampling mini-batches according to
the updated weights W_t, resulting in a new generator G^c_t, and finally
o update the mixture of generators: G_t = (1 − β_t) G_{t-1} + β_t G^c_t (notation expressing
the mixture of G_{t-1} and G^c_t with probabilities 1 − β_t and β_t).
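To make the loop concrete, here is a minimal Python sketch of the procedure summarized above. The helpers train_gan (an existing GAN trainer that accepts per-example weights), compute_example_weights (the paper's reweighting rule), beta_schedule, and the generators' sample method are hypothetical placeholders standing in for real implementations, not actual library calls.

import numpy as np

def adagan(train_data, num_steps, beta_schedule, train_gan, compute_example_weights):
    # Step 1: train the first generator G_1 the usual way, with uniform data weights.
    n = len(train_data)
    weights = np.full(n, 1.0 / n)
    generators = [train_gan(train_data, weights)]   # hypothetical GAN trainer
    mixture_weights = [1.0]

    for t in range(2, num_steps + 1):
        beta_t = beta_schedule(t)                   # mixture weight for the new component
        # Bias data weights toward "hard" examples the current mixture G_{t-1} misses.
        weights = compute_example_weights(train_data, generators, mixture_weights)
        # Train the new component G^c_t, importance-sampling mini-batches by these weights.
        g_c = train_gan(train_data, weights)
        # Mixture update: G_t = (1 - beta_t) * G_{t-1} + beta_t * G^c_t
        mixture_weights = [w * (1.0 - beta_t) for w in mixture_weights] + [beta_t]
        generators.append(g_c)

    return generators, mixture_weights

def sample_from_mixture(generators, mixture_weights, num_samples, rng=np.random):
    # Pick a component with probability equal to its mixture weight, then sample from it.
    picks = rng.choice(len(generators), size=num_samples, p=mixture_weights)
    return [generators[c].sample(1) for c in picks]  # assumes a .sample method exists

Sampling from the final model only requires choosing a component according to the mixture weights and drawing from that generator, so the overhead at generation time stays small.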
Following the specifics of the algorithm, the paper then turns to a measure of error
called the f-divergence. It then goes through various proofs regarding the validity of the algorithm
and the optimal β_t value at each step. I will skip these sections, as they are not the main focus
of our class, which is the implementation and application of neural networks.
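For reference (since the proofs build on it), the f-divergence family the paper analyzes has the standard form

D_f(Q \,\|\, P) = \int f\!\left(\frac{dQ}{dP}\right) dP,

where f is a convex function with f(1) = 0. Particular choices of f recover the Jensen–Shannon divergence optimized by the original GAN, the Kullback–Leibler divergence, and the total-variation distance.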
Following the proof of concept for the algorithm itself, the paper compares the AdaGAN-boosted
GAN to a couple of more naïve baselines, namely Best-of-N and Ensemble, along with a vanilla
GAN. Best-of-N runs N independent GAN instances and takes the run that returns the best
result on the validation set. Ensemble is a mixture of T GANs trained independently and
combined with equal weights.
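For comparison with the AdaGAN sketch above, the two baselines could be set up roughly as follows; train_gan (here training on unweighted data) and validation_score are again hypothetical stand-ins for an existing GAN trainer and whatever validation metric the experiments use.

import numpy as np

def best_of_n(train_data, val_data, n, train_gan, validation_score):
    # Train n independent GANs and keep the one scoring best on the validation set.
    candidates = [train_gan(train_data) for _ in range(n)]
    scores = [validation_score(g, val_data) for g in candidates]
    return candidates[int(np.argmax(scores))]

def ensemble(train_data, t, train_gan):
    # Train t independent GANs and mix them with equal weights.
    generators = [train_gan(train_data) for _ in range(t)]
    return generators, [1.0 / t] * t

Unlike AdaGAN, neither baseline reweights the training data between runs.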
In their tests, the GAN boosted using AdaGAN performed significantly better than the vanilla
GAN and produced the best results and fastest convergence rates compared to both the
Best-of-N and Ensemble GANs.
The reason this paper interested me is that AdaGAN can essentially be applied on top of ANY
current GAN implementation and reduce missed modes. Thus it pertains not just to our own
individual group projects, but also to every other project within the class. One case where
AdaGAN could see use is face-generation GANs, which often have convergence issues where at
times only one face is generated.