Random Moments for Sketched Mixture Learning

Nicolas Keriven (1,2), Rémi Gribonval (2), Gilles Blanchard (3), Yann Traonmilin (2)
(1) Université Rennes 1  (2) Inria Rennes – Bretagne Atlantique  (3) University of Potsdam

SPARS 2017
Outline
Introduction
Illustration
Main results
Conclusion
Statistical Learning
Hypothesis: a database of n vectors x_1, …, x_n in R^d, drawn i.i.d. from an unknown distribution π.
Loss function: ℓ(x, h) for a hypothesis h (instances: PCA, classification, regression, k-means, density estimation).
Goal: minimize the expected risk R(h, π) = E_{x∼π} ℓ(x, h).
Classical approach: Empirical Risk Minimization (ERM), i.e. minimize (1/n) Σ_i ℓ(x_i, h) over h.
Problem: ERM becomes costly for large d or n.
Proposed approach: compress the database before learning.
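To make these objects concrete, here is a minimal numerical rendering of the risk/ERM pair for the k-means loss (not from the talk; all names are illustrative):

```python
import numpy as np

def kmeans_risk(X, centroids):
    """Empirical risk for the k-means loss l(x, h) = min_l ||x - c_l||^2:
    mean squared distance of each sample to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    return d2.min(axis=1).mean()

rng = np.random.default_rng(0)
centers = np.array([[-4.0, 0.0], [4.0, 0.0]])
X = centers[rng.integers(0, 2, size=1000)] + rng.normal(scale=0.5, size=(1000, 2))
print(kmeans_risk(X, centers))           # near 2 * 0.5**2: good hypothesis
print(kmeans_risk(X, np.zeros((2, 2))))  # much larger: bad hypothesis
```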
Compressive Statistical Learning
Hypothesis: a database of n vectors. Three ways to compress it:
• Large d: dimensionality reduction — random projection, feature selection → database of compressed vectors (see e.g. [Calderbank 2009, Boutsidis 2010]).
• Large n: subsampling, coresets — uniform sampling (naive), adaptive weighted sampling, hierarchical constructions → reduced database (see e.g. [Feldman 2010]).
• Linear sketch [Thaper 2002, Cormode 2011]: the sketch of a union of databases = the sum of the sketches — extremely convenient for streaming / parallel computing. Used so far for simple queries: can we do learning? (See the sketch below.)
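This linearity is easy to verify; a minimal sketch with a toy feature map (illustrative names) showing that per-database sketches merge into the sketch of the union without revisiting the raw data:

```python
import numpy as np

def sketch(X, Phi):
    """Linear sketch: average of a feature map Phi over the database."""
    return Phi(X).mean(axis=0)

Phi = lambda X: np.stack([X[:, 0], X[:, 0]**2, X[:, 0]**3], axis=1)  # toy moments

rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(300, 1)), rng.normal(size=(700, 1))
full = sketch(np.vstack([X1, X2]), Phi)
merged = (300 * sketch(X1, Phi) + 700 * sketch(X2, Phi)) / 1000
assert np.allclose(full, merged)  # sketch of union = (weighted) sum of sketches
```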
Random Sketching operator
A linear sketch is a vector of empirical generalized moments, ẑ = (1/n) Σ_{i=1..n} Φ(x_i) …
… i.e. a linear measurement of the underlying probability distribution: ẑ ≈ A(π) := E_{x∼π} Φ(x).
Reminiscent of Compressive Sensing: random design of the measurement operator A.
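A minimal numerical version of such a sketching operator (the frequency distribution here is an illustrative choice, not the tuned one from the papers):

```python
import numpy as np

def fourier_sketch(X, Omega):
    """Empirical generalized moments z_j = (1/n) sum_i exp(i <w_j, x_i>):
    a linear measurement of the empirical distribution of X."""
    return np.exp(1j * X @ Omega).mean(axis=0)  # shape (m,)

rng = np.random.default_rng(0)
d, m, n = 10, 100, 20_000
Omega = rng.normal(size=(d, m))   # random frequencies (illustrative design)
X = rng.normal(size=(n, d))       # samples from pi = N(0, I)
z = fourier_sketch(X, Omega)
# z estimates the characteristic function E exp(i w^T x) = exp(-||w||^2 / 2):
print(np.abs(z - np.exp(-0.5 * (Omega ** 2).sum(axis=0))).max())  # small
```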
Outline
Introduction
Illustration (previous work)
Main results
Conclusion
Experimental illustration
Compressive Learning-OMP algorithm [Keriven 2015, 2016] (OMP + non-convex updates; a simplified rendering is sketched below).
[Figures: k-means and GMM results, d = 10, k = 10.]
Comparison with Matlab's kmeans and VLFeat's gmm:
• faster and more memory-efficient on large databases;
• the number of measurements does not depend on n.
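The talk contains no code; the following is a hedged, heavily simplified rendering of the CL-OMP idea for mixtures of Diracs ("k-means"), using generic scipy optimizers in place of the authors' tuned implementation (their CLOMPR variant adds replacement steps and other refinements not reproduced here):

```python
import numpy as np
from scipy.optimize import minimize, nnls

def atom(c, Omega):
    """Sketch of a Dirac at c: A(delta_c) = exp(i Omega^T c)."""
    return np.exp(1j * Omega.T @ c)

def ri(v):
    """Stack real and imaginary parts so scipy works over the reals."""
    return np.concatenate([v.real, v.imag])

def clomp(z, Omega, k, n_restarts=5, seed=0):
    rng = np.random.default_rng(seed)
    d = Omega.shape[0]
    C = []  # centroids found so far
    for _ in range(k):
        # Residual of z after projecting on current atoms (nonnegative weights).
        if C:
            M = np.stack([ri(atom(c, Omega)) for c in C], axis=1)
            w, _ = nnls(M, ri(z))
            r = ri(z) - M @ w
        else:
            r = ri(z)
        # Step 1 (OMP-like): add the atom most correlated with the residual,
        # found by non-convex local maximization from random initializations.
        def neg_corr(c):
            return -(ri(atom(c, Omega)) @ r)
        best = min((minimize(neg_corr, rng.normal(size=d)) for _ in range(n_restarts)),
                   key=lambda res: res.fun)
        C.append(best.x)
        # Step 2: global non-convex update of all centroids jointly.
        def resid(theta):
            M = np.stack([ri(atom(c, Omega)) for c in theta.reshape(-1, d)], axis=1)
            _, rss = nnls(M, ri(z))
            return rss
        C = list(minimize(resid, np.concatenate(C), method="Nelder-Mead").x.reshape(-1, d))
    return np.array(C)

# Toy usage: sketch a 3-cluster dataset, then decode centroids from it.
rng = np.random.default_rng(1)
d, k, m = 2, 3, 50
Omega = rng.normal(scale=0.7, size=(d, m))
true = rng.normal(scale=3.0, size=(k, d))
X = true[rng.integers(0, k, size=2000)] + 0.1 * rng.normal(size=(2000, d))
z = np.exp(1j * X @ Omega).mean(axis=0)
print(clomp(z, Omega, k))  # often close to the rows of `true`, up to permutation
```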
Outline
Introduction
Illustration
Main results
Conclusion
Statistical Learning (recap)
Hypothesis: a database of n i.i.d. vectors; a loss function ℓ(x, h); goal: minimize the expected risk.
Here, two instances:
• k-means
• GMM with known covariance
k-means
Hypothesis class:
• sets of k centroids c_1, …, c_k,
• with a minimal separation between centroids and a bounded domain (constraints on the centroids, not on the samples; a quick membership check is sketched below).
Loss function: ℓ(x, h) = min_l ‖x − c_l‖² (the usual k-means cost).
Sketching operator:
• (weighted) random Fourier sampling,
• "smoothing" weights.
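A trivial but concrete rendering of these constraints (illustrative names, not the toolbox API):

```python
import numpy as np

def in_kmeans_class(C, eps, R):
    """Membership test for the k-means hypothesis class: centroids pairwise
    eps-separated and inside the ball of radius R (the constraints bear on
    the centroids, not on the samples)."""
    C = np.asarray(C)
    k = len(C)
    gaps = [np.linalg.norm(C[i] - C[j]) for i in range(k) for j in range(i + 1, k)]
    return min(gaps) >= eps and np.linalg.norm(C, axis=1).max() <= R

print(in_kmeans_class([[0.0, 0.0], [3.0, 0.0]], eps=1.0, R=5.0))  # True
print(in_kmeans_class([[0.0, 0.0], [0.5, 0.0]], eps=1.0, R=5.0))  # False
```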
k-means: result
A: (weighted) random Fourier sampling.
If the sketch size m is large enough (polynomially in k and d, up to logarithmic factors), then with high probability on the draw of the frequencies, the estimator obtained by matching the sketch over the hypothesis class has controlled excess risk (schematic form below).
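The precise statement was lost in extraction; schematically, the bound has the instance-optimality shape that the RIP analysis of [Bourrier 2014] yields (a reconstruction, with constants, log factors and the exact dependence of m on k and d omitted):

```latex
% Schematic shape of the k-means guarantee (a reconstruction):
% if m is large enough, then w.h.p. on the draw of the frequencies, any
%    \hat{h} \in \arg\min_h \|\hat{z} - \mathcal{A}(h)\|_2
% satisfies
\mathcal{R}(\hat{h}, \pi) \;\le\; \min_h \mathcal{R}(h, \pi)
  \;+\; C_1\, d(\pi, \mathfrak{S}) \;+\; C_2\, \eta_n
% where d(\pi, \mathfrak{S}) is a modeling error (distance from \pi to the
% model set) and \eta_n = O(1/\sqrt{n}) is the empirical estimation error.
```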
GMM with known covariance
Hypothesis class:
• mixtures of k Gaussians with a fixed, known covariance,
• means with a minimal separation, in a bounded domain (constraints on the means, not on the samples).
Loss function: negative log-likelihood.
Sketching operator:
• random Fourier sampling,
• with a frequency distribution linked to the separation (a closed-form expression for the sketch of a Gaussian is given below).
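One reason Fourier sampling suits GMMs: the sketch of a Gaussian is available in closed form (its characteristic function), so fitting the sketch never requires sampling from the model. A minimal check (illustrative parameters):

```python
import numpy as np

def gaussian_sketch(mu, Sigma, Omega):
    """Closed-form sketch of N(mu, Sigma) under Fourier sampling:
    E exp(i w^T x) = exp(i w^T mu - w^T Sigma w / 2), per frequency."""
    quad = np.einsum('jm,jk,km->m', Omega, Sigma, Omega)  # w^T Sigma w
    return np.exp(1j * (Omega.T @ mu) - 0.5 * quad)

rng = np.random.default_rng(0)
d, m = 5, 50
Omega = rng.normal(size=(d, m))
mu, Sigma = rng.normal(size=d), 0.3 * np.eye(d)
X = rng.multivariate_normal(mu, Sigma, size=50_000)
emp = np.exp(1j * X @ Omega).mean(axis=0)
print(np.abs(emp - gaussian_sketch(mu, Sigma, Omega)).max())  # small
```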
GMM: result
A: random Fourier sampling.
If the sketch size m is large enough, then with high probability the sketched estimator has controlled excess risk; the required m trades off against the minimal separation of the means (next slide).
GMM trade-off
Trade-off between the minimal separation of the means and the size of the sketch: resolving smaller separations requires sampling more high frequencies, hence a larger sketch. [Figure: size of sketch vs. separation of means.]
Sketch Size
k-means: non-convex optimization; greedy heuristic CL-OMP [Keriven 2016]. [Figure: SSE of the k recovered points vs. sketch size.] In theory, a sketch size of at least the order of k²d (up to logarithmic factors) is required; empirically, m of the order of kd suffices.
GMMs, known covariance: [Figure: relative log-likelihood vs. sketch size.]
Sketch of proof
Key idea 1: the sketching operator samples a kernel mean embedding.
Step 1: relate the risk to a kernel metric — kernel mean embedding [Smola 2007] + random features [Rahimi 2007] (a numerical check is sketched below).
Key idea 2 / Step 2: Compressive Sensing analysis — A satisfies the RIP on the model set [Bourrier 2014].
Main difficulty: controlling metrics between mixtures that get close to each other in infinite-dimensional space. [Figure: mixtures without vs. with the separation hypothesis.]
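Step 1 rests on the random-features fact that the ℓ² distance between Fourier sketches approximates the kernel MMD between the underlying distributions. A self-contained numerical check (illustrative parameters, Gaussian kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, sigma = 2, 500, 5000, 1.0

# Frequencies drawn from N(0, I / sigma^2) induce the Gaussian kernel
# k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) via E_w exp(i w^T (x - y)).
Omega = rng.normal(scale=1.0 / sigma, size=(d, m))
sketch = lambda X: np.exp(1j * X @ Omega).mean(axis=0) / np.sqrt(m)

X = rng.normal(size=(n, d))                         # pi_1 = N(0, I)
Y = rng.normal(size=(n, d)) + np.array([2.0, 0.0])  # pi_2 = N((2,0), I)

# || A(pi_1) - A(pi_2) ||_2 concentrates around MMD(pi_1, pi_2):
print(np.linalg.norm(sketch(X) - sketch(Y)))

def kbar(delta, s2):
    """E k(x, y) for x ~ N(a, I), y ~ N(b, I), with delta = a - b."""
    return (s2 / (s2 + 2.0)) ** (d / 2) * np.exp(-delta @ delta / (2 * (s2 + 2.0)))

mmd2 = 2 * kbar(np.zeros(d), sigma ** 2) - 2 * kbar(np.array([2.0, 0.0]), sigma ** 2)
print(np.sqrt(mmd2))  # close to the sketch distance above
```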
Outline
Introduction
Illustration
Main results
Conclusion
Conclusions
Contributions
• An efficient sketched mixture learning framework, using random generalized moments.
• A combination of many tools:
• kernel mean embeddings,
• random Fourier features,
• an analysis inspired by Compressive Sensing.
Outlook
• Bridge the gap between theory and practice.
• Other models (already done in practice), with other sketching operators.
• Non-linear sketches? (neural networks…)
The SketchMLbox
SketchMLbox (sketchml.gforge.inria.fr)
• Mixtures of Diracs ("k-means")
• GMMs with known covariance
• GMMs with unknown diagonal covariance
• Soon:
• mixtures of multivariate alpha-stable distributions (the only known algorithm!)
• Gaussian Locally Linear Mapping [Deleforge 2014]
• Optimized for user-defined …
Thank you!
• Keriven, Bourrier, Gribonval, Pérez. Sketching for Large-Scale Learning of Mixture Models. ICASSP 2016.
• Keriven, Bourrier, Gribonval, Pérez. Sketching for Large-Scale Learning of Mixture Models (extended version). Submitted to Information and Inference, arXiv:1606.0238.
• Keriven, Tremblay, Gribonval, Traonmilin. Compressive K-means. ICASSP 2017.
• Keriven, Tremblay, Gribonval. SketchMLbox (sketchml.gforge.inria.fr).
• Gribonval, Blanchard, Keriven, Traonmilin. Compressive Statistical Learning. Online soon.
Appendix: CLOMPR