The Noisy Power Method

Moritz Hardt (IBM)
Eric Price (IBM → UT Austin)

2014-10-31
Problem

Common problem: find a low-rank approximation to a matrix A.
- PCA: apply to the covariance matrix.
- Spectral analysis: PageRank, Cheeger's inequality for cuts, etc.
Simple algorithm: the power method

AKA subspace iteration, subspace power iteration.

Choose random X_0 ∈ R^{n×k}.
Repeat:
    Y_{t+1} = A X_t        (with noise: Y_{t+1} = A X_t + G)
    X_{t+1} = orthonormalize(Y_{t+1})

Converges towards U, the span of the top k eigenvectors.

Question 1: how quickly?
- [Stewart '69, ..., Halko-Martinsson-Tropp '10]

Question 2: how robust to noise?
- Application-specific bounds: [Hardt-Roth '13, Mitliagkas-Caramanis-Jain '13, Jain-Netrapalli-Sanghavi '13]
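The noiseless iteration above can be sketched in a few lines of NumPy (a minimal illustration; the test matrix and parameters are made up for the example, and QR is used as the orthonormalization step):

```python
import numpy as np

def subspace_iteration(A, k, iters=100, seed=0):
    """Noiseless power method on a k-dimensional subspace:
    Y_{t+1} = A X_t, then X_{t+1} = orthonormalize(Y_{t+1})."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = rng.standard_normal((n, k))   # random X_0 in R^{n x k}
    for _ in range(iters):
        Y = A @ X                     # Y_{t+1} = A X_t
        X, _ = np.linalg.qr(Y)        # orthonormalize via thin QR
    return X

# Toy check: a symmetric matrix with a known eigengap (lambda_2 = 9, lambda_3 = 3).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
eigs = np.concatenate(([10.0, 9.0], np.linspace(3.0, 0.1, 48)))
A = Q @ np.diag(eigs) @ Q.T
U = Q[:, :2]                          # top-2 eigenvectors

X = subspace_iteration(A, k=2, iters=200)
residual = np.linalg.norm((np.eye(50) - X @ X.T) @ U, 2)  # small when span(X) covers U
```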
Basic power method, k = 1

Choose a random unit vector x ∈ R^n.
Output A^q x, renormalized to a unit vector.
- Equivalently: x_{t+1} = A x_t / ‖A x_t‖ for t = 0, ..., q − 1.

Suppose A has eigenvectors v_1, ..., v_n and eigenvalues λ_1 > λ_2 ≥ ... ≥ λ_n ≥ 0.
Start with x_0 = α_1 v_1 + α_2 v_2 + ... + α_n v_n.
After q iterations,

    A^q x_0 = sum_i λ_i^q α_i v_i ∝ v_1 + sum_{i≥2} (λ_i/λ_1)^q (α_i/α_1) v_i.

For q ≥ log_{λ_1/λ_2}(d / (ε α_1)), A^q x is proportional to v_1 ± O(ε).
For a random unit start, α_1 ≈ 1/√n, giving

    q = O( (λ_1 / (λ_1 − λ_2)) · log n ).
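A quick numerical sanity check of this convergence claim, using a toy diagonal matrix (chosen for illustration) so that the eigenvectors are the standard basis and v_1 = e_1:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([1.0, 0.5, 0.3, 0.1])   # lambda_1 > lambda_2 >= ... >= 0
A = np.diag(lam)                       # eigenvectors are the standard basis

x = rng.standard_normal(4)
x /= np.linalg.norm(x)                 # random unit vector x_0
for _ in range(50):                    # x_{t+1} = A x_t / ||A x_t||
    x = A @ x
    x /= np.linalg.norm(x)

# The component along v_1 = e_1 dominates at rate (lambda_2/lambda_1)^q = 0.5^50.
alignment = abs(x[0])
```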
Handling noise, k = 1

Consider the iteration

    y_{t+1} = A x_t + G
    x_{t+1} = y_{t+1} / ‖y_{t+1}‖

What conditions on G will cause this to converge to within ε?
- G must make progress at the beginning:  |G_1| ≤ (λ_1 − λ_2) / √d.
- G must not perturb by ε at the end:  ‖G‖ ≤ ε (λ_1 − λ_2).
- Looser requirements in the middle.

Theorem: converges to v_1 ± O(ε) if every G satisfies

    |G_1| ≤ (λ_1 − λ_2) / √d    and    ‖G‖ ≤ ε (λ_1 − λ_2)

in O( (λ_1 / (λ_1 − λ_2)) · log(d/ε) ) iterations.
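A small simulation of the noisy iteration under these conditions (a sketch with made-up parameters: Gaussian noise scaled so that, with high probability, ‖G‖ ≲ ε(λ_1 − λ_2) and |G_1| ≲ (λ_1 − λ_2)/√d):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 200
lam1, lam2 = 1.0, 0.5
A = np.diag(np.concatenate(([lam1], np.full(d - 1, lam2))))  # v_1 = e_1

eps = 0.01
sigma = eps * (lam1 - lam2) / np.sqrt(d)  # ||G|| ~ sigma*sqrt(d) <= eps*(lam1-lam2)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
for _ in range(300):
    G = sigma * rng.standard_normal(d)    # fresh noise each iteration
    y = A @ x + G                         # y_{t+1} = A x_t + G
    x = y / np.linalg.norm(y)             # x_{t+1} = y_{t+1} / ||y_{t+1}||

error = 1.0 - abs(x[0])                   # misalignment with v_1; should be O(eps)
```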
Noisy convergence proof (k = 1)

Use a potential-based argument to show progress at each step. Potential: the angle θ between x and v_1, with

    tan θ = sqrt( sum_{j>1} α_j² ) / α_1.

With no noise:

    tan θ_{t+1} = sqrt( sum_{j>1} λ_j² α_j² ) / (λ_1 α_1) ≤ (λ_2 / λ_1) tan θ_t.

With noise G satisfying the conditions (|G_1|, ‖G‖ small enough),

    tan θ_{t+1} ≤ ( λ_2 sqrt( sum_{j>1} α_j² ) + ‖G‖ ) / ( λ_1 α_1 − |G_1| ) ≤ ε + (λ_2 / λ_1)^{1/4} tan θ_t.
Noisy convergence proof (general k)

Use the "principal angle" θ from X to U: let U ∈ R^{d×k} have the top k eigenvectors, and V = U^⊥. Then

    tan θ := ‖V^T X‖ / ‖U^T X‖ = sqrt( sum_{j>k} α_j² / sum_{j≤k} α_j² ).

(Figure: the subspace X at angle θ to U.)

With no noise:

    tan θ_{t+1} = sqrt( sum_{j>k} λ_j² α_j² / sum_{j≤k} λ_j² α_j² ) ≤ (λ_{k+1} / λ_k) tan θ_t.

With noise G "small enough" we will have

    tan θ_{t+1} ≤ ( λ_{k+1} ‖V^T X‖ + ‖G‖ ) / ( λ_k ‖U^T X‖ − ‖U^T G‖ ) ≤ ε + (λ_{k+1} / λ_k)^{1/4} tan θ_t.
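The tan Θ potential can be computed numerically. Below is one sketch (the sin-via-projection / cos-via-SVD route is a standard choice, not necessarily the authors'), checked on a subspace rotated by a known angle:

```python
import numpy as np

def tan_principal_angle(X, U):
    """tan of the largest principal angle between span(X) and span(U),
    for orthonormal X (d x p) and U (d x k): sin from ||(I - U U^T) X||,
    cos from the smallest singular value of U^T X."""
    d = U.shape[0]
    sin_t = np.linalg.norm((np.eye(d) - U @ U.T) @ X, 2)
    cos_t = np.linalg.svd(U.T @ X, compute_uv=False).min()
    return sin_t / cos_t

# Toy check: rotate e_1 toward e_2 by a known angle theta.
theta = 0.3
U = np.eye(3)[:, :1]
X = np.array([[np.cos(theta)], [np.sin(theta)], [0.0]])
val = tan_principal_angle(X, U)   # should equal tan(theta)
```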
Noisy power method lemma

Theorem. Consider running the noisy power method on a random starting space X_0 ∈ R^{d×2k}. Let U ∈ R^{d×k} have the top k eigenvectors of A. If at each iteration

    5 ‖G‖ ≤ ε (λ_k − λ_{k+1})    and    5 ‖U^T G‖ ≤ (λ_k − λ_{k+1}) sqrt(k/d),

then after L = O( (λ_k / (λ_k − λ_{k+1})) · log(d/ε) ) iterations,

    tan Θ(X_L, U) ≲ ε  ⟺  ‖(I − X_L X_L^T) U‖ ≲ ε.

Can also iterate on a p > k dimensional subspace.
- The kth singular value of a random X is typically (√p − √(k−1)) / √d.
- If G is fairly uniform, expect ‖U^T G‖ ≈ ‖G‖ · sqrt(k/d).
- The first condition is the main one: the iteration will converge to ε ≈ ‖G‖ / (λ_k − λ_{k+1}).
Conjectures to remove eigengap

If λ_k = λ_{k+1}, our theorem is useless.

Conjecture (can depend on the λ_k − λ_{2k+1} eigengap). Consider running the noisy power method on a random starting space X_0 ∈ R^{d×2k}. Let U ∈ R^{d×k} have the top k eigenvectors of A. If at each iteration

    5 ‖G‖ ≤ ε (λ_k − λ_{2k+1})    and    5 ‖U^T G‖ ≤ (λ_k − λ_{2k+1}) sqrt(k/d),

then after L = O( (λ_k / (λ_k − λ_{2k+1})) · log(d/ε) ) iterations,

    tan Θ(X_L, U) ≲ ε  ⟺  ‖(I − X_L X_L^T) U‖ ≲ ε.

If X is n × p, maybe the relevant eigengap is λ_k − λ_{p+1}?
But do we need any eigengap at all?
Conjectures to remove eigengap

Do we need any eigengap at all?
- Yes, for X to approximate U:  ‖(I − X X^T) U‖ ≤ ε.
- Not clear for X to approximate A:  ‖(I − X X^T) A‖ ≤ λ_{k+1} + ε.
  - This is weaker: it doesn't imply Frobenius approximation.

Conjecture. Consider running the noisy power method on a random starting space X_0 ∈ R^{d×2k}. Let U ∈ R^{d×k} have the top k eigenvectors of A. If at each iteration

    ‖G‖ ≤ ε    and    ‖U^T G‖ ≤ ε sqrt(k/d),

then after L = O( (λ_{k+1} / ε) · log(d/ε) ) iterations,

    ‖(I − X_L X_L^T) A‖ ≤ λ_{k+1} + O(ε).
Review of our theorem

Theorem. Consider running the noisy power method on a random starting space X_0 ∈ R^{d×2k}. Let U ∈ R^{d×k} have the top k eigenvectors of A. If at each iteration

    5 ‖G‖ ≤ ε (λ_k − λ_{k+1})    and    5 ‖U^T G‖ ≤ (λ_k − λ_{k+1}) sqrt(k/d),

then after L = O( (λ_k / (λ_k − λ_{k+1})) · log(d/ε) ) iterations,

    tan Θ(X_L, U) ≲ ε  ⟺  ‖(I − X_L X_L^T) U‖ ≲ ε.

Gaussian G: if G_{i,j} ∼ N(0, σ²) then ‖G‖ ≲ √d σ and ‖U^T G‖ ≲ √k σ with high probability. Hence σ = ε (λ_k − λ_{k+1}) / √d is tolerable.
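The two Gaussian norm bounds are easy to observe empirically (a sketch; dimensions and σ are arbitrary choices for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 400, 5, 0.1
G = sigma * rng.standard_normal((d, k))            # G_{i,j} ~ N(0, sigma^2)
U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # an arbitrary k-dim subspace

norm_G = np.linalg.norm(G, 2)          # on the order of sqrt(d) * sigma
norm_UtG = np.linalg.norm(U.T @ G, 2)  # on the order of sqrt(k) * sigma, much smaller
```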
Outline

1. Applications
Applications of the Noisy Power Method

Will discuss two applications of our theorem:
- Privacy-preserving spectral analysis [Hardt-Roth '13]
- Streaming PCA [Mitliagkas-Caramanis-Jain '13]

In both cases, we get improved bounds.
Privacy-preserving spectral analysis

Can we find differentially private approximations to the top eigenvectors?

Think of A as related to the adjacency matrix of a graph (web links, social network, etc.).
- Top eigenvectors are useful to study and reveal (e.g. PageRank, Cheeger cuts).
- Don't want to reveal whether x and y are friends.

A randomized algorithm f is (ε, δ)-differentially private if: for any A, A′ with ‖A − A′‖ ≤ 1, and for any subset S of the range,

    Pr[f(A) ∈ S] ≤ e^ε Pr[f(A′) ∈ S] + δ.

Typical dependence is poly( (1/ε) log(1/δ) ).
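The private algorithm discussed next is exactly the noisy power method with per-round Gaussian noise. A sketch of that iteration follows; note that in the actual private setting σ must be calibrated to A's sensitivity and the privacy budget (ε, δ) (as in Hardt-Roth), whereas here it is just a free parameter on a toy matrix:

```python
import numpy as np

def noisy_power_method(A, k, iters, sigma, seed=0):
    """Power iteration with fresh Gaussian noise each round: X -> orth(A X + G).
    sigma is NOT privacy-calibrated here; this only shows the utility side."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((d, 2 * k)))  # p = 2k columns
    for _ in range(iters):
        G = sigma * rng.standard_normal((d, 2 * k))       # Gaussian noise matrix
        X, _ = np.linalg.qr(A @ X + G)
    return X

# Toy run: diagonal matrix with eigengap lambda_2 - lambda_3 = 0.7, mild noise.
d, k = 50, 2
A = np.diag(np.concatenate(([1.0, 0.9], np.full(d - 2, 0.2))))
X = noisy_power_method(A, k, iters=100, sigma=1e-3)
U = np.eye(d)[:, :k]
residual = np.linalg.norm((np.eye(d) - X @ X.T) @ U, 2)   # roughly ||G|| / eigengap
```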
Privacy-preserving spectral analysis
Hardt-Roth ’12: Noisy power method X → AX + G preserves
privacy if Gi,j ∼ N(0, σ 2 ) for large enough σ at each stage.
Moritz Hardt, Eric Price (IBM)
The Noisy Power Method
2014-10-31
15 / 18
Privacy-preserving spectral analysis
Hardt-Roth ’12: Noisy power method X → AX + G preserves
privacy if Gi,j ∼ N(0, σ 2 ) for large enough σ at each stage.
Apply our theorem to see how well the result approximates U.
Moritz Hardt, Eric Price (IBM)
The Noisy Power Method
2014-10-31
15 / 18
Privacy-preserving spectral analysis
Hardt-Roth ’12: Noisy power method X → AX + G preserves
privacy if Gi,j ∼ N(0, σ 2 ) for large enough σ at each stage.
Apply our theorem to see how well the result approximates U.
Complicated expression, but for example:
Moritz Hardt, Eric Price (IBM)
The Noisy Power Method
2014-10-31
15 / 18
Privacy-preserving spectral analysis
Hardt-Roth ’12: Noisy power method X → AX + G preserves
privacy if Gi,j ∼ N(0, σ 2 ) for large enough σ at each stage.
Apply our theorem to see how well the result approximates U.
Complicated expression, but for example:
I
Suppose A represents random graph with a planted sparse cut.
Moritz Hardt, Eric Price (IBM)
The Noisy Power Method
2014-10-31
15 / 18
Privacy-preserving spectral analysis
Hardt-Roth ’12: Noisy power method X → AX + G preserves
privacy if Gi,j ∼ N(0, σ 2 ) for large enough σ at each stage.
Apply our theorem to see how well the result approximates U.
Complicated expression, but for example:
I
I
Suppose A represents random graph with a planted sparse cut.
Then the (, δ)-differentially private result is very close to U.
Moritz Hardt, Eric Price (IBM)
The Noisy Power Method
2014-10-31
15 / 18
Privacy-preserving spectral analysis
Hardt-Roth ’12: Noisy power method X → AX + G preserves
privacy if Gi,j ∼ N(0, σ 2 ) for large enough σ at each stage.
Apply our theorem to see how well the result approximates U.
Complicated expression, but for example:
I
I
I
Suppose A represents random graph with a planted sparse cut.
Then the (, δ)-differentially private result is very close to U.
[Hardt-Roth: k = 1 case.]
Moritz Hardt, Eric Price (IBM)
The Noisy Power Method
2014-10-31
15 / 18
Privacy-preserving spectral analysis
Hardt-Roth ’12: the noisy power method X → AX + G preserves
privacy if G_{i,j} ∼ N(0, σ²) for large enough σ at each stage.
Apply our theorem to see how well the result approximates U.
Complicated expression, but for example:
  - Suppose A represents a random graph with a planted sparse cut.
  - Then the (ε, δ)-differentially private result is very close to U.
  - [Hardt-Roth ’12: k = 1 case.]
Added bonus: the algorithm exploits the sparsity of A.
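As a concrete illustration, here is a minimal numpy sketch (not the authors' implementation) of the noisy iteration above: each step computes AX, adds i.i.d. Gaussian noise G, and re-orthonormalizes.

```python
import numpy as np

def noisy_power_method(A, k, sigma, iters, rng=None):
    """Sketch of the noisy power method: X -> orthonormalize(A X + G),
    where G has i.i.d. N(0, sigma^2) entries at each stage."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))  # random orthonormal start
    for _ in range(iters):
        Y = A @ X + sigma * rng.standard_normal((n, k))  # noisy product AX + G
        X, _ = np.linalg.qr(Y)                           # orthonormalize
    return X
```

For small σ the columns of the output align with the top-k eigenspace U, which one can check by looking at the singular values of UᵀX.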
Streaming PCA
Can take samples x_1, x_2, . . . ∼ D in R^d.
Want to estimate the covariance matrix Σ = E[xx^T].
Easy answer is the empirical covariance:
  Σ̂ = (1/n) ∑_{i=1}^n x_i x_i^T
But this uses d² space. Can we use O(dk) space if Σ is nearly low rank?
[Mitliagkas-Caramanis-Jain ’13]: Yes, using more samples. Can do one
iteration of the power method in small space:
  X_{t+1} = Σ̂ X = (1/n) ∑_{i=1}^n x_i (x_i^T X)
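The small-space iteration above can be sketched as follows (illustrative code, with a hypothetical `sample` callback standing in for the stream): only the d × k iterate is stored, and Σ̂X is accumulated one sample at a time via x_i (x_i^T X).

```python
import numpy as np

def streaming_power_iteration(sample, d, k, iters, samples_per_iter, rng=None):
    """Sketch of streaming PCA via the power method: per iteration,
    X <- orthonormalize((1/n) sum_i x_i (x_i^T X)), using O(dk) memory
    and never forming the d x d covariance matrix."""
    rng = np.random.default_rng(rng)
    X, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(iters):
        Y = np.zeros((d, k))
        for _ in range(samples_per_iter):
            x = sample(rng)             # one fresh sample from D, shape (d,)
            Y += np.outer(x, x @ X)     # x (x^T X): O(dk) work and space
        X, _ = np.linalg.qr(Y / samples_per_iter)
    return X
```

Each batch of samples plays the role of one (noisy) multiplication by Σ.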
Streaming PCA
The algorithm: in every iteration, take a batch of samples to move in
the correct direction.
How many iterations, and how many samples per iteration?
This is then a noisy power method problem:
  X_{t+1} = ΣX + (Σ̂ − Σ)X
Just need to bound the norm of (Σ̂ − Σ)X in each iteration.
Nice case: spiked covariance model
  - Gaussian, where Σ has k eigenvalues λ_1, . . . , λ_k = Θ(1), perturbed
    by Gaussian noise N(0, σ²) in each coordinate.
  - Õ((1 + σ⁶) dk / ε²) samples suffice.
  - Factor k improvement on [Mitliagkas-Caramanis-Jain ’13].
But it also applies to less nice cases:
  - Strong whenever D has exponential concentration.
  - Nontrivial result for general distributions.
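To make the key quantity concrete, a small numpy experiment (illustrative only, with an assumed identity-aligned spike) measures the noise term ‖(Σ̂ − Σ)X‖ and how it shrinks as the batch size n grows:

```python
import numpy as np

def noise_norm(n, d, k, sigma, rng):
    """Spectral norm of (Sigma_hat - Sigma) X in a toy spiked covariance
    model: k unit spikes plus N(0, sigma^2) noise in each coordinate,
    with X the (known) top-k eigenvector frame."""
    Sigma = np.diag(np.r_[np.ones(k), np.zeros(d - k)]) + sigma**2 * np.eye(d)
    xs = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # n samples
    Sigma_hat = xs.T @ xs / n                                 # empirical covariance
    X = np.eye(d)[:, :k]
    return np.linalg.norm((Sigma_hat - Sigma) @ X, 2)
```

The per-iteration noise decays roughly like 1/√n, which is what drives the sample counts per iteration.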
Recap and open questions
The noisy power method is a useful tool.
Can show tan Θ(X_L, U) ≤ ε if
  5‖G‖ ≤ ε(λ_k − λ_{k+1})   and   5‖U^T G‖ ≤ ε(λ_k − λ_{k+1}) √(k/d)
Can we apply it to more problems?
Can we prove a theorem without the eigengap?
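The quantity tan Θ(X_L, U) in the guarantee can be computed from principal angles; a short numpy sketch (assuming both frames are orthonormal and their spans are not orthogonal):

```python
import numpy as np

def tan_theta(X, U):
    """Tangent of the largest principal angle between the column spans of
    orthonormal frames X and U: the cosines of the principal angles are
    the singular values of U^T X."""
    cos = np.linalg.svd(U.T @ X, compute_uv=False).min()  # smallest cosine
    sin = np.sqrt(max(0.0, 1.0 - cos**2))
    return sin / cos
```

When X spans U exactly this is 0, and it blows up as the spans become orthogonal.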