
Deep Adversarial Gaussian Mixture Auto-Encoder for Clustering

Warith HARCHAOUI, Pierre-Alexandre MATTEI, Charles BOUVEYRON
Université Paris Descartes, MAP5
Oscaro.com Research & Development
February 2017
1/17
Clustering
Clustering is grouping similar objects together!
2/17
Thesis
Representation Learning and Clustering operate in symbiosis
3/17
Gaussian Mixture Model
- Density Estimation applied to Clustering for K modes/clusters (sketched below)
- Linear complexity, suitable for Large Scale Problems
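As a point of reference (not the DAC pipeline itself), a minimal sketch of K-component GMM clustering with scikit-learn; the data, dimensionality and K below are placeholder assumptions:

```python
# Minimal sketch: a K-component Gaussian Mixture Model used for clustering.
# Uses scikit-learn's EM-based GaussianMixture; data and K are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.randn(1000, 16)      # placeholder: 1000 points in R^16
K = 10                             # number of modes/clusters

gmm = GaussianMixture(n_components=K, covariance_type="full", random_state=0)
gmm.fit(X)                         # EM, linear in the number of points per iteration
labels = gmm.predict(X)            # hard cluster assignments
responsibilities = gmm.predict_proba(X)   # soft assignments per cluster
```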
4/17
Learning Representations
- Successful in a supervised context (Kernel SVM)
- Successful in an unsupervised context (Spectral Clustering)
5/17
Auto-Encoder
An auto-encoder is a neural network that consists of:
- an Encoder: E : R^D → R^d (compression)
- a Decoder: D : R^d → R^D (decompression)

with D >> d and D(E(x)) ≈ x
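A minimal PyTorch sketch of such an auto-encoder; the layer sizes, dimensions and activations are illustrative assumptions, not the architecture used in the talk:

```python
# Minimal auto-encoder sketch: E : R^D -> R^d and D : R^d -> R^D with D >> d.
# Sizes and activations are placeholders, not the talk's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_in, d = 784, 10                    # e.g. flattened 28x28 images -> 10-dim codes
encoder = nn.Sequential(nn.Linear(D_in, 256), nn.ReLU(), nn.Linear(256, d))
decoder = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, D_in))

x = torch.randn(32, D_in)            # a batch of inputs
x_hat = decoder(encoder(x))          # D(E(x)) should approximate x
recon_loss = F.mse_loss(x_hat, x)    # reconstruction loss to minimize
```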
6/17
Optimization Scheme
[Diagram labels: Input Space, Encoder, Code Space, Decoder, Discriminator, GMM with Gaussian Clusters (π, µ, Σ)]

Figure: Global Optimization Scheme for DAC
7/17
Adversarial Auto-Encoder
An adversarial auto-encoder is a neural network that consists of:
- an Encoder: E : R^D → R^d (compression)
- a Decoder: D : R^d → R^D (decompression)
- a Prior: P : R^d → R with ∫_{R^d} P = 1, associated with a random generator of distribution P (see the sketch below)
- a Discriminator: A : R^d → [0, 1] ⊂ R that distinguishes fake data from the random generator and real data from the encoder
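Below, a sketch of the two ingredients added on top of the plain auto-encoder above: a random generator for the prior P over the code space and a discriminator A : R^d → [0, 1]. The fixed isotropic Gaussian mixture used as the prior here is only a placeholder; in DAC the mixture parameters (π, µ, Σ) are themselves estimated.

```python
# Sketch of the adversarial ingredients: a random generator for the prior P over
# the code space and a discriminator A : R^d -> [0, 1].
# The fixed, isotropic mixture below is a placeholder prior; in DAC the mixture
# parameters (pi, mu, Sigma) are estimated rather than fixed.
import torch
import torch.nn as nn

d, K = 10, 10
mu = torch.randn(K, d)                       # placeholder component means

def sample_prior(n):
    """Draw n 'fake' codes from the mixture prior."""
    ks = torch.randint(K, (n,))              # pick a component per sample
    return mu[ks] + 0.5 * torch.randn(n, d)  # isotropic noise around each mean

discriminator = nn.Sequential(
    nn.Linear(d, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid()           # output: a probability in [0, 1]
)
```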
8/17
Optimizations
Three objectives (sketched below):

- The encoder and decoder try to minimize the reconstruction loss
- The discriminator tries to distinguish fake codes (from the random generator associated with the prior) from real codes (from the encoder)
- The encoder also tries to fool the discriminator (the opposite of the discriminator's loss function)
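A hedged sketch of one training iteration combining these three objectives; it reuses `encoder`, `decoder`, `discriminator` and `sample_prior` from the earlier sketches, and the optimizers, label convention and update order are assumptions rather than the talk's exact recipe:

```python
# One illustrative training step for the three objectives above.
# Reuses encoder, decoder, discriminator, sample_prior from the earlier sketches;
# optimizers, label convention and update order are assumptions.
import torch
import torch.nn.functional as F

def training_step(x, opt_ae, opt_disc, opt_enc):
    n = x.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # 1) Reconstruction: encoder + decoder minimize ||D(E(x)) - x||^2
    recon = F.mse_loss(decoder(encoder(x)), x)
    opt_ae.zero_grad(); recon.backward(); opt_ae.step()

    # 2) Discriminator: label prior samples as 1 ("fake" codes from the random
    #    generator) and encoder outputs as 0 ("real" codes)
    d_loss = F.binary_cross_entropy(discriminator(sample_prior(n)), ones) \
           + F.binary_cross_entropy(discriminator(encoder(x).detach()), zeros)
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 3) Encoder fools the discriminator: push its codes toward the label 1
    #    (the opposite of the discriminator's objective)
    g_loss = F.binary_cross_entropy(discriminator(encoder(x)), ones)
    opt_enc.zero_grad(); g_loss.backward(); opt_enc.step()
```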
9/17
Results
Method                                              MNIST-70k   Reuters-10k   HHAR
DAC EC (Ensemble Clustering)                        96.50       73.34         81.24
DAC                                                 94.08       72.14         80.50
GMVAE                                               88.54       -             -
DEC                                                 84.30       72.17         79.86
AE + GMM (full covariances, median over 10 runs)    82.56       70.12         78.48
GMM                                                 53.73       54.72         60.34
KM                                                  53.47       54.04         59.98
Table: Experimental accuracy results (%, the higher, the better) based on
the Hungarian method
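The accuracy in the table matches predicted clusters to ground-truth classes with the Hungarian algorithm before counting agreements. A minimal sketch of that metric using scipy (a tooling assumption, not the authors' code):

```python
# Unsupervised clustering accuracy: find the best one-to-one mapping between
# cluster ids and true labels with the Hungarian algorithm, then count matches.
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """y_true, y_pred: integer numpy arrays of the same length."""
    K = max(y_true.max(), y_pred.max()) + 1
    contingency = np.zeros((K, K), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        contingency[p, t] += 1                        # cluster-by-class counts
    rows, cols = linear_sum_assignment(-contingency)  # maximize matched counts
    return contingency[rows, cols].sum() / y_true.size
```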
10/17
Visualizations

[Figure: 10 × 10 confusion matrix, actual class (rows) vs. predicted class (columns), for the digits 0–9]

Figure: Confusion matrix for DAC on MNIST. (best seen in color)
11/17
Visualizations
[Figure rows: µ_k, µ_k + 0.5σ, µ_k + 1σ, µ_k + 1.5σ, µ_k + 2σ, µ_k + 2.5σ, µ_k + 3σ, µ_k + 3.5σ]

Figure: Generated digit images. From left to right, the ten classes found by DAC, ordered thanks to the Hungarian algorithm. From top to bottom, we go further and further in random directions from the centroids (the first row being the decoded centroids).
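A sketch of how such a grid can be produced: decode points that move away from each decoded centroid µ_k along a random unit direction in steps of 0.5σ. It reuses `decoder`, `mu`, `d` and `K` from the earlier sketches, and the scalar `sigma` is a stand-in for the learned per-component covariance:

```python
# Decode points at increasing distance from each centroid along a random
# direction (reuses decoder, mu, d, K from the earlier sketches; sigma is a
# scalar stand-in for the learned covariance).
import torch

sigma = 0.5                                    # placeholder scale
steps = torch.arange(0.0, 4.0, 0.5)            # 0, 0.5, ..., 3.5 (the rows)
direction = torch.randn(d)
direction = direction / direction.norm()       # random unit direction

with torch.no_grad():
    for k in range(K):                         # the columns: one cluster each
        codes = mu[k] + steps[:, None] * sigma * direction
        images = decoder(codes).reshape(-1, 28, 28)   # decoded digit images
```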
12/17
Visualizations
Figure: Principal Component Analysis rendering of the code space for
MNIST at the end of the DAC optimization, with colors indicating the
true labels. (best seen in color)
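A minimal sketch of this kind of rendering, assuming codes Z from the encoder and true labels y are available (scikit-learn and matplotlib here are tooling assumptions, not the authors' plotting code):

```python
# 2-D PCA rendering of the code space, colored by the true labels.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_code_space(Z, y):
    """Z: (n, d) array of codes from the encoder; y: (n,) true labels (colors only)."""
    Z2 = PCA(n_components=2).fit_transform(Z)
    plt.scatter(Z2[:, 0], Z2[:, 1], c=y, s=5, cmap="tab10")
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()
```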
13/17
Conclusion
Representation Learning and Clustering operate in symbiosis
14/17
References I
Christopher M. Bishop.
Pattern Recognition and Machine Learning.
Springer, 2006.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Deep Learning.
MIT Press, 2016.
http://www.deeplearningbook.org.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu,
David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua
Bengio.
Generative adversarial nets.
In Advances in Neural Information Processing Systems, pages
2672–2680, 2014.
15/17
References II
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, and Ian
Goodfellow.
Adversarial autoencoders.
arXiv preprint arXiv:1511.05644, 2015.
Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua
Bengio, and Pierre-Antoine Manzagol.
Stacked denoising autoencoders: Learning useful
representations in a deep network with a local denoising
criterion.
Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
16/17