Sketching as a Tool for Numerical Linear Algebra (Part 2)

David P. Woodruff
presented by Sepehr Assadi

o(n) Big Data Reading Group
University of Pennsylvania
February, 2015
Goal

New survey by David Woodruff:
- Sketching as a Tool for Numerical Linear Algebra

Topics:
- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds
Introduction

You have “Big” data!
- Computationally expensive to deal with
- Excessive storage requirement
- Hard to communicate
- ...

Summarize your data:
- Sampling: a representative subset of the data
- Sketching: an aggregate summary of the whole data
Model

Input:
- matrix A ∈ R^{n×d}
- vector b ∈ R^n

Output: function F(A, b, ...)
- e.g., least squares regression

Different goals:
- Faster algorithms
- Streaming
- Distributed
Linear Sketching

Input:
- matrix A ∈ R^{n×d}

Let r ≪ n and let S ∈ R^{r×n} be a random matrix.
Let S · A be the sketch.
Compute F(S · A) instead of F(A).
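To make the sizes concrete, here is a minimal numpy sketch of this pipeline. The Gaussian choice of S, the dimensions, and the 1/√r scaling are illustrative assumptions, not something the slides prescribe:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 10_000, 20, 400                  # r << n

A = rng.standard_normal((n, d))

# A Gaussian sketching matrix; the 1/sqrt(r) scaling makes E[||Sy||^2] = ||y||^2.
S = rng.standard_normal((r, n)) / np.sqrt(r)

SA = S @ A                                 # the sketch: r x d instead of n x d
print(SA.shape)                            # (400, 20)
```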
Linear Sketching (cont.)

Pros:
- Compute on an r × d matrix instead of n × d
- Smaller representation and faster computation
- Linearity: S · (A + B) = S · A + S · B
  - We can compose linear sketches! (See the demo below.)

Cons:
- F(S · A) is an approximation of F(A)
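A small demo of the linearity property; the two-chunk setup (say, two machines each holding part of the data) and the Gaussian S are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 5_000, 10, 200
S = rng.standard_normal((r, n)) / np.sqrt(r)

# Two chunks of data held separately (e.g., two streams or two machines).
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d))

# S(A + B) = SA + SB: sketches of the chunks can be combined by addition,
# without ever materializing A + B in one place.
assert np.allclose(S @ (A + B), S @ A + S @ B)
```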
Approximate ℓ2-regression

Input:
- matrix A ∈ R^{n×d} (full column rank)
- vector b ∈ R^n
- parameter 0 < ε < 1

Output: x̂ ∈ R^d such that
  ‖Ax̂ − b‖_2 ≤ (1 + ε) min_x ‖Ax − b‖_2
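A minimal sketch-and-solve run for this problem. The Gaussian sketch stands in for whatever subspace embedding one prefers, and the sizes and noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, r = 20_000, 10, 500

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least-squares solution, for reference.
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch-and-solve: solve the r x d problem min_x ||SAx - Sb|| instead.
S = rng.standard_normal((r, n)) / np.sqrt(r)
x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

cost = lambda x: np.linalg.norm(A @ x - b)
print(cost(x_hat) / cost(x_opt))           # typically 1 + small epsilon
```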
Subspace Embedding

Definition (ℓ2-subspace embedding)
A (1 ± ε) ℓ2-subspace embedding for a matrix A ∈ R^{n×d} is a matrix S for which, for all x ∈ R^d,
  ‖SAx‖_2^2 = (1 ± ε) ‖Ax‖_2^2

Strictly speaking, this is a subspace embedding for the column space of A.
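An empirical check of the definition, again with a Gaussian S as an assumed embedding; for r sufficiently larger than d, the ratio stays near 1 over many directions x:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 20_000, 10, 500
A = rng.standard_normal((n, d))
S = rng.standard_normal((r, n)) / np.sqrt(r)

# Sample ||SAx||^2 / ||Ax||^2 over random directions x in R^d.
ratios = [
    np.linalg.norm(S @ (A @ x)) ** 2 / np.linalg.norm(A @ x) ** 2
    for x in rng.standard_normal((1000, d))
]
print(min(ratios), max(ratios))            # both close to 1
```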
Previous Session

Oblivious ℓ2-subspace embedding:
- The distribution from which S is chosen is oblivious to A
- One very common tool: the Johnson-Lindenstrauss transform (JLT)
- Immediately yields an approximation for the ℓ2-regression problem
Today

Non-oblivious ℓ2-subspace embedding:
- The distribution from which S is chosen depends on A
- One very common tool: leverage score sampling
- Can still be used to approximate the ℓ2-regression problem
Leverage Scores

Thin Singular Value Decomposition (SVD) of A:
- A_{n×d} = U_{n×d} · Σ_{d×d} · V_{d×d}^T
- U is an orthonormal basis of the column space of A

Leverage score of the i-th row of A:
  ℓ_i = ‖U_(i)‖_2

Properties:
- Independent of the basis (a property of the column space)
- The ℓ_i^2 form a probability distribution (by simple normalization)
- Let H = A(A^T A)^{-1} A^T; then ℓ_i^2 = H_{i,i}
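The definitions translate directly into a few lines of numpy; this checks the two stated properties on an arbitrary example matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 1_000, 5
A = rng.standard_normal((n, d))

# Thin SVD: A = U @ np.diag(sigma) @ Vt with U of shape n x d.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Leverage score of row i, in the deck's convention: l_i = ||U_(i)||_2.
ell = np.linalg.norm(U, axis=1)

# Checks: the l_i^2 sum to d (so l_i^2 / d is a distribution), and l_i^2 is
# the i-th diagonal entry of H = A (A^T A)^{-1} A^T.
H = A @ np.linalg.solve(A.T @ A, A.T)
assert np.isclose(ell @ ell, d)
assert np.allclose(ell ** 2, np.diag(H))
```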
Leverage Score Sampling

Definition (SampleRescale(n, s, p))
We define the procedure S = SampleRescale(n, s, p) by S_{s×n} = D · Ω, where each row of Ω is a random standard basis vector in R^n chosen according to the probability distribution p, and D is a diagonal matrix with D_{i,i} = 1/√(p_j · s) if e_j is chosen for the i-th row of Ω.

Leverage Score Sampling (p = LS-Sampling(A, β)):
- p = (p_1, ..., p_n) is a probability distribution satisfying p_i ≥ β · ℓ_i^2 / d, where ℓ_i is the i-th leverage score of A_{n×d}
- Compute S = SampleRescale(n, s, p)
- Return S · A (see the sketch below)
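A direct implementation of SampleRescale and of the β = 1 case of LS-Sampling (exact scores, p_i = ℓ_i^2 / d). The dense S is for clarity only; in practice one would keep just the sampled, rescaled rows:

```python
import numpy as np

def sample_rescale(n, s, p, rng):
    """S = D @ Omega: draw s row indices i.i.d. from p; if e_j is chosen for
    row i, that row of S is e_j scaled by 1 / sqrt(p_j * s)."""
    idx = rng.choice(n, size=s, p=p)
    S = np.zeros((s, n))
    S[np.arange(s), idx] = 1.0 / np.sqrt(p[idx] * s)
    return S

def ls_sampling(A):
    """Exact leverage-score distribution p_i = l_i^2 / d (the beta = 1 case)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    ell2 = np.einsum('ij,ij->i', U, U)     # l_i^2 = squared row norms of U
    return ell2 / ell2.sum()               # sum of l_i^2 is d

rng = np.random.default_rng(5)
n, d, s = 10_000, 8, 600
A = rng.standard_normal((n, d))
SA = sample_rescale(n, s, ls_sampling(A), rng) @ A   # the s x d sketch
```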
Subspace Embedding via LS-Sampling

Theorem
Let s = Θ(d log d / (βε^2)), S = SampleRescale(n, s, p) for p = LS-Sampling(A, β), and let U be an orthonormal basis of the column space of A. Then, with probability 0.99, simultaneously for all i ∈ [d],
  1 − ε ≤ σ_i^2(S · U) ≤ 1 + ε

This immediately implies a subspace embedding: writing Ax = Uy, we get ‖SAx‖_2^2 = ‖SUy‖_2^2 = (1 ± ε)‖y‖_2^2 = (1 ± ε)‖Ax‖_2^2.
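A quick numerical look at the theorem's conclusion, using the β = 1 sampling from above (sizes chosen ad hoc; larger s tightens the interval):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, s = 20_000, 6, 2_000

A = rng.standard_normal((n, d))
U, _, _ = np.linalg.svd(A, full_matrices=False)

# Exact leverage-score distribution and the SampleRescale sketch (beta = 1).
p = np.einsum('ij,ij->i', U, U) / d
idx = rng.choice(n, size=s, p=p)
S = np.zeros((s, n))
S[np.arange(s), idx] = 1.0 / np.sqrt(p[idx] * s)

# All squared singular values of S @ U should lie in [1 - eps, 1 + eps].
sv2 = np.linalg.svd(S @ U, compute_uv=False) ** 2
print(sv2.min(), sv2.max())
```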
Subspace Embedding via LS-Sampling (cont.)

Proof.
The main tool is a matrix Chernoff bound: suppose X_1, ..., X_s are independent copies of a symmetric random matrix X ∈ R^{d×d} with E[X] = 0, ‖X‖ ≤ γ, and ‖E[X^T X]‖ ≤ σ^2, and let W = (1/s) Σ_{i=1}^s X_i; then
  Pr(‖W‖ > ε) ≤ 2d · exp(−sε^2 / (2σ^2 + 2γε/3))
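As a guess at how the bound is instantiated here (a reconstruction following the survey's proof outline, not necessarily the exact argument given on the board): sample row indices j_1, ..., j_s i.i.d. from p and set

```latex
X_i = I_d - \frac{1}{p_{j_i}} U_{(j_i)}^\top U_{(j_i)} ,
\qquad
\mathbb{E}[X_i] = I_d - \sum_{j=1}^{n} U_{(j)}^\top U_{(j)} = I_d - U^\top U = 0 ,
\qquad
W = \frac{1}{s} \sum_{i=1}^{s} X_i = I_d - (SU)^\top (SU) .
```

Since p_j ≥ β ℓ_j^2 / d gives ‖X_i‖ ≤ 1 + d/β, the Chernoff bound with s = Θ(d log d / (βε^2)) yields ‖I_d − (SU)^T(SU)‖ ≤ ε with good probability, which is exactly the singular-value statement of the theorem.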
Linear Regression via LS-Sampling

Theorem
Let s = Θ(d log d / (βε^2)), S = SampleRescale(n, s, p) for p = LS-Sampling(A, β), and x̂ = arg min_x ‖SAx − Sb‖. Then, with probability 0.99,
  ‖Ax̂ − b‖_2 ≤ (1 + ε) min_x ‖Ax − b‖_2
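The same sketch-and-solve recipe as before, now with the leverage-score sketch (β = 1 with exact scores; all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, s = 20_000, 8, 1_000

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Leverage-score distribution and the SampleRescale sketch.
U, _, _ = np.linalg.svd(A, full_matrices=False)
p = np.einsum('ij,ij->i', U, U) / d
idx = rng.choice(n, size=s, p=p)
S = np.zeros((s, n))
S[np.arange(s), idx] = 1.0 / np.sqrt(p[idx] * s)

# Solve the small sampled problem and compare costs on the full data.
x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
cost = lambda x: np.linalg.norm(A @ x - b)
print(cost(x_hat) / cost(x_opt))           # approximately 1 + eps
```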
Linear Regression via LS-Sampling (cont.)

Theorem (Approximate Matrix Multiplication)
For an orthonormal matrix C_{n×m}, an arbitrary vector d_{n×1}, and probabilities p = (p_1, ..., p_n) such that
  p_k ≥ β ‖C_(k)‖^2 / ‖C‖_F^2,
let S = SampleRescale(n, s, p); then, with probability 0.99,
  ‖(SC)^T (Sd) − C^T d‖_F ≤ O(1/√(sβ)) · ‖C‖_F ‖d‖_F

Warning: this statement is neither general nor precise! See [DKM06].
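A Monte Carlo look at the statement with β = 1 row-norm probabilities (the vector is named d_vec to avoid clashing with the dimension d; everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, s = 50_000, 5, 2_000

# Orthonormal C (n x m) and an arbitrary vector d_vec in R^n.
C, _ = np.linalg.qr(rng.standard_normal((n, m)))
d_vec = rng.standard_normal(n)

# Row-norm probabilities p_k = ||C_(k)||^2 / ||C||_F^2 (the beta = 1 case).
p = np.einsum('ij,ij->i', C, C)
p /= p.sum()

idx = rng.choice(n, size=s, p=p)
S = np.zeros((s, n))
S[np.arange(s), idx] = 1.0 / np.sqrt(p[idx] * s)

err = np.linalg.norm((S @ C).T @ (S @ d_vec) - C.T @ d_vec)
bound = np.linalg.norm(C) * np.linalg.norm(d_vec) / np.sqrt(s)
print(err, bound)                          # err is on the order of the bound
```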
Linear Regression via LS-Sampling (cont.)

Proof (of the regression theorem above).
On the board.
Approximating Leverage Scores

Computing leverage scores exactly is as hard as solving the regression problem!

Can we approximate them?
- For β = 1/2, in time O(nd log n + d^3) [DMIMW12]
- Improved to O(nnz(A) log n + d^3) [CW13]
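A sketch of the approximation idea behind these results: embed A once, take R from a QR factorization of the sketch, and read off approximate scores as row norms of A R^{-1}. A plain Gaussian S stands in below for the fast transforms that give the quoted running times; treat it as an illustration of the idea, not of the papers' algorithms:

```python
import numpy as np

rng = np.random.default_rng(9)
n, d = 20_000, 6
A = rng.standard_normal((n, d))

# Sketch A and factor the small matrix: S @ A = Q @ R.
r = 40 * d
S = rng.standard_normal((r, n)) / np.sqrt(r)
_, R = np.linalg.qr(S @ A)

# If S embeds the column space, A @ R^{-1} has nearly orthonormal columns,
# so its row norms approximate the leverage scores l_i.
ell_approx = np.linalg.norm(np.linalg.solve(R.T, A.T).T, axis=1)

# Compare against the exact scores.
U, _, _ = np.linalg.svd(A, full_matrices=False)
ell_exact = np.linalg.norm(U, axis=1)
print(np.max(np.abs(ell_approx - ell_exact) / ell_exact))   # modest relative error
```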
Questions?
References

[CW13] Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), pages 81–90. ACM, 2013.

[DKM06] Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication. SIAM Journal on Computing, 36(1):132–157, 2006.

[DMIMW12] Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, and David P. Woodruff. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13(1):3475–3506, 2012.