Sketching as a Tool for Numerical Linear Algebra
David P. Woodruff
presented by Sepehr Assadi
o(n) Big Data Reading Group
University of Pennsylvania
February 2015
Goal
New survey by David Woodruff:
  - Sketching as a Tool for Numerical Linear Algebra

Topics:
  - Subspace Embeddings
  - Least Squares Regression
  - Least Absolute Deviation Regression
  - Low Rank Approximation
  - Graph Sparsification
  - Sketching Lower Bounds
Introduction
You have “Big” data!
  - Computationally expensive to deal with
  - Excessive storage requirement
  - Hard to communicate
  - ...

Summarize your data:
  - Sampling: a representative subset of the data
  - Sketching: an aggregate summary of the whole data
Model
Input:
  - matrix A ∈ R^{n×d}
  - vector b ∈ R^n

Output: function F(A, b, ...)
  - e.g. least squares regression

Different goals:
  - Faster algorithms
  - Streaming
  - Distributed
Linear Sketching
Input:
  - matrix A ∈ R^{n×d}

Let r ≪ n and let S ∈ R^{r×n} be a random matrix.
Let S · A be the sketch.
Compute F(S · A) instead of F(A). (A minimal numerical sketch follows.)
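
To make this concrete, here is a minimal numpy sketch of the pipeline, assuming a Gaussian sketching matrix (one common choice; other constructions appear later). All sizes and the seed are illustrative.

    # A linear sketch: compress A from n rows down to r rows.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, r = 10_000, 20, 500                      # r << n

    A = rng.standard_normal((n, d))                # the "big" input matrix
    S = rng.standard_normal((r, n)) / np.sqrt(r)   # random sketching matrix

    SA = S @ A                                     # the sketch: an r x d matrix
    print(SA.shape)                                # (500, 20)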
Sepehr Assadi (Penn)
Sketching for Numerical Linear Algebra
Big Data Reading Group
7 / 25
Linear Sketching (cont.)
Pros:
  - Compute on an r × d matrix instead of an n × d one
  - Smaller representation and faster computation
  - Linearity:
      * S · (A + B) = S · A + S · B
      * We can compose linear sketches! (See the demo below.)

Cons:
  - F(S · A) is an approximation of F(A)
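
A quick demonstration of why linearity matters, assuming the data arrives as two pieces A and B (say, two machines or two stream updates); the small sketches can be combined without touching the raw data again.

    # Sketch each piece separately, then combine the small sketches.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, r = 1000, 10, 100
    S = rng.standard_normal((r, n)) / np.sqrt(r)

    A = rng.standard_normal((n, d))
    B = rng.standard_normal((n, d))

    # S(A + B) = SA + SB, so sketches compose.
    print(np.allclose(S @ (A + B), S @ A + S @ B))   # True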
Least Squares Regression (ℓ₂-regression)

Input:
  - matrix A ∈ R^{n×d} (full column rank)
  - vector b ∈ R^n

Output x* ∈ R^d:

    x* = arg min_x ‖Ax − b‖₂

Closed-form solution (a reference implementation follows):

    x* = (AᵀA)⁻¹ Aᵀ b

Θ(nd²)-time algorithm using naive matrix multiplication
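
For reference, a direct solver; the normal-equations line mirrors the closed form above, while np.linalg.lstsq is the numerically safer route.

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 5000, 10
    A = rng.standard_normal((n, d))      # full column rank w.h.p.
    b = rng.standard_normal(n)

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # (A^T A)^{-1} A^T b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # QR/SVD-based solver

    print(np.allclose(x_normal, x_lstsq))              # True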
Approximate ℓ₂-regression

Input:
  - matrix A ∈ R^{n×d} (full column rank)
  - vector b ∈ R^n
  - parameter 0 < ε < 1

Output x̂ ∈ R^d:

    ‖Ax̂ − b‖₂ ≤ (1 + ε) · min_x ‖Ax − b‖₂
Approximate ℓ₂-regression (cont.)

A sketching algorithm:
  - Sample a random matrix S ∈ R^{r×n}
  - Compute S · A and S · b
  - Output x̂ = arg min_x ‖(SA)x − (Sb)‖₂

Which randomized family of matrices S, and what value of r?
Approximate ℓ₂-regression (cont.)

An introductory construction:
  - Let r = Θ(d/ε²)
  - Let S ∈ R^{r×n} be a matrix of i.i.d. normal random variables with mean zero and variance 1/r

Proof Sketch.
On the board. (A toy numerical run of this construction follows.)
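
A toy end-to-end run of sketch-and-solve with this Gaussian construction. The constant hidden in r = Θ(d/ε²) is set to 1 here purely for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d, eps = 20_000, 10, 0.25
    r = int(d / eps**2)                                # r = 160 << n

    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    # Exact solution and its cost (the Theta(n d^2) baseline).
    x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
    opt = np.linalg.norm(A @ x_opt - b)

    # Sketched solution: solve the much smaller r x d problem.
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

    print(np.linalg.norm(A @ x_hat - b) / opt)         # typically below 1 + eps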
Approximate ℓ₂-regression (cont.)

Problems:
  - Computing S · A takes Θ(nrd) time
  - Constructing S requires Θ(nr) space

Different constructions for S:
  - Fast Johnson-Lindenstrauss transforms: O(nd log d) + poly(d/ε) time [Sarlos, FOCS '06]
  - Optimal O(nnz(A)) + poly(d/ε) time algorithm [Clarkson, Woodruff, STOC '13] (a CountSketch-style illustration follows)
  - Random sign matrices with Θ(d)-wise independent entries: O((d²/ε) log(nd))-space streaming algorithm [Clarkson, Woodruff, STOC '09]
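
To see why nnz(A) time is plausible, here is a minimal CountSketch-style transform in the spirit of the Clarkson-Woodruff construction (a sketch under simplifying assumptions, not their full algorithm): each column of S has a single random ±1 entry, so applying S touches every nonzero of A exactly once.

    import numpy as np

    def countsketch(A, r, rng):
        """Return S @ A without ever materializing S."""
        n = A.shape[0]
        rows = rng.integers(0, r, size=n)         # h(i): target row of S
        signs = rng.choice([-1.0, 1.0], size=n)   # sigma(i): random sign
        SA = np.zeros((r, A.shape[1]))
        np.add.at(SA, rows, signs[:, None] * A)   # scatter-add signed rows
        return SA

    rng = np.random.default_rng(4)
    A = rng.standard_normal((10_000, 8))
    print(countsketch(A, r=500, rng=rng).shape)   # (500, 8)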
Subspace Embedding

Definition (ℓ₂-subspace embedding)
A (1 ± ε) ℓ₂-subspace embedding for a matrix A ∈ R^{n×d} is a matrix S for which, for all x ∈ R^d,

    ‖SAx‖₂² = (1 ± ε) ‖Ax‖₂²

This is really a subspace embedding for the column space of A.

Oblivious ℓ₂-subspace embedding:
  - The distribution from which S is chosen is oblivious to A

One very common tool for (oblivious) ℓ₂-subspace embeddings is the Johnson-Lindenstrauss transform (JLT).
Johnson-Lindenstrauss transform

Definition (JLT(ε, δ, f))
A random matrix S ∈ R^{r×n} forms a JLT(ε, δ, f) if, with probability at least 1 − δ, for any f-element subset V ⊆ R^n, it holds that:

    ∀ v, v′ ∈ V: |⟨Sv, Sv′⟩ − ⟨v, v′⟩| ≤ ε ‖v‖₂ ‖v′‖₂

Usual statement (i.e. the original Johnson-Lindenstrauss lemma):

Lemma (JLL)
Given N points q₁, ..., q_N ∈ R^n, there exists a matrix S ∈ R^{t×n} (a linear map) with t = Θ(ε⁻² log N) such that, with high probability, simultaneously for all pairs qᵢ and qⱼ,

    ‖S(qᵢ − qⱼ)‖₂ = (1 ± ε) ‖qᵢ − qⱼ‖₂
Johnson-Lindenstrauss transform (cont.)

A simple construction of JLT(ε, δ, f):

Theorem
Let 0 < ε, δ < 1 and S = (1/√r) · R ∈ R^{r×n}, where the entries Rᵢⱼ are independent standard normal random variables. If r = Ω(ε⁻² log(f/δ)), then S is a JLT(ε, δ, f).
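
A quick empirical check of this theorem on a random f-point set; the leading constant 4 in r is an illustrative stand-in for the one hidden in the Ω(·).

    import numpy as np

    rng = np.random.default_rng(5)
    n, f, eps, delta = 1000, 50, 0.2, 0.1
    r = int(np.ceil(4 * eps**-2 * np.log(f / delta)))

    V = rng.standard_normal((f, n))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit vectors, for simplicity

    S = rng.standard_normal((r, n)) / np.sqrt(r)
    SV = V @ S.T                                    # row i is (S v_i)^T

    # Worst inner-product distortion |<Sv, Sv'> - <v, v'>| over all pairs.
    err = np.abs(SV @ SV.T - V @ V.T).max()
    print(r, err)                                   # err is typically <= eps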
Other constructions:
  - Random sign matrices [Achlioptas, '03], [Clarkson, Woodruff, STOC '09]
  - Random sparse matrices [Dasgupta, Kumar, Sarlos, STOC '10], [Kane, Nelson, J. ACM '14]
  - Fast Johnson-Lindenstrauss transforms [Ailon, Chazelle, STOC '06]
JLT results in ℓ₂-subspace embedding

Claim
S = JLT(ε, δ, f) is an oblivious ℓ₂-subspace embedding for A ∈ R^{n×d} (for a suitable f; f = 5^d suffices, as shown later)

Challenge:
  - JLT(ε, δ, f) provides a guarantee only for a single finite set in R^n
  - An ℓ₂-subspace embedding requires the guarantee for an infinite set, i.e. the column space of A
JLT results in ℓ₂-subspace embedding (cont.)

Let 𝒮 be the unit sphere in the column space of A:

    𝒮 = {y ∈ R^n | y = Ax for some x ∈ R^d and ‖y‖₂ = 1}

We seek a finite subset N ⊆ 𝒮 so that if

    ∀ w, w′ ∈ N: ⟨Sw, Sw′⟩ = ⟨w, w′⟩ ± ε

then

    ∀ y ∈ 𝒮: ‖Sy‖₂ = (1 ± ε) ‖y‖₂
JLT results in ℓ₂-subspace embedding (cont.)

Lemma (½-net for 𝒮)
It suffices to choose any N such that

    ∀ y ∈ 𝒮 ∃ w ∈ N s.t. ‖y − w‖₂ ≤ 1/2

Proof.
1. Decompose y as y = y^(0) + y^(1) + y^(2) + ..., where ‖y^(i)‖₂ ≤ 1/2^i and y^(i)/‖y^(i)‖₂ ∈ N.
2. Then ‖Sy‖₂² = ‖S(y^(0) + y^(1) + y^(2) + ...)‖₂² = 1 ± O(ε). (The expansion is spelled out below.)
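
Spelling out the last equality (done on the board in the talk): each cross term is controlled by the net guarantee applied to the normalized pieces y^(i)/‖y^(i)‖₂, and the error series is geometric. In LaTeX:

    \|Sy\|_2^2
      = \sum_{i,j} \langle S y^{(i)}, S y^{(j)} \rangle
      = \sum_{i,j} \left( \langle y^{(i)}, y^{(j)} \rangle
          \pm \varepsilon \, \|y^{(i)}\|_2 \, \|y^{(j)}\|_2 \right)
      = \|y\|_2^2 \pm \varepsilon \sum_{i,j} 2^{-(i+j)}
      = 1 \pm 4\varepsilon = 1 \pm O(\varepsilon).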
½-net of 𝒮

Lemma
There exists a ½-net N of 𝒮 for which |N| ≤ 5^d
Proof.
1. Take a maximal set N′ of points on the unit sphere of R^d such that no two points are within distance 1/2 of each other. A standard volume argument (balls of radius 1/4 around the points are disjoint and fit inside a ball of radius 5/4) gives |N′| ≤ 5^d.
2. Let U ∈ R^{n×d} be an orthonormal basis for the column space of A.
3. Set N = {y ∈ R^n | y = Ux for some x ∈ N′}; each such y has ‖y‖₂ = 1.
Subspace Embedding via JLT
Theorem
Let 0 < ε, δ < 1 and let S = JLT(ε, δ, 5^d). For any fixed matrix A ∈ R^{n×d}, with probability 1 − δ, S is a (1 ± ε) ℓ₂-subspace embedding for A, i.e.

    ∀ x ∈ R^d: ‖SAx‖₂ = (1 ± ε) ‖Ax‖₂

(as checked numerically below). Results in:
  - O(nnz(A) · ε⁻¹ log d)-time algorithm using the column-sparsity transform of Kane and Nelson [Kane, Nelson, J. ACM '14]
  - O(nd log n)-time algorithm using the Fast Johnson-Lindenstrauss transform of Ailon and Chazelle [Ailon, Chazelle, STOC '06]
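
An empirical check of the theorem, assuming a Gaussian JLT for concreteness: S embeds the column space of A exactly when every singular value of S·U lies in [1 − ε, 1 + ε], where U is an orthonormal basis of range(A). The constant 8 in r is illustrative.

    import numpy as np

    rng = np.random.default_rng(6)
    n, d, eps = 5000, 10, 0.25
    r = int(np.ceil(8 * d / eps**2))      # illustrative constant

    A = rng.standard_normal((n, d))
    U, _ = np.linalg.qr(A)                # orthonormal basis of range(A)

    S = rng.standard_normal((r, n)) / np.sqrt(r)
    sv = np.linalg.svd(S @ U, compute_uv=False)
    print(sv.min(), sv.max())             # typically inside [1 - eps, 1 + eps]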
Other Subspace Embedding Algorithms
Non-JLT-based subspace embeddings:
  - O(nnz(A)) + poly(d/ε)-time algorithm [Clarkson, Woodruff, STOC '13]

Non-oblivious subspace embeddings:
  - Based on leverage score sampling [Drineas, Mahoney, Muthukrishnan, SODA '06] (a minimal sampling sketch follows)
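
A minimal leverage-score sampling sketch (an illustration of the idea, not the exact algorithm of the paper): row i is kept with probability proportional to its leverage score ‖Uᵢ‖₂², then rescaled so expectations are preserved. The probabilities depend on A, hence non-oblivious.

    import numpy as np

    rng = np.random.default_rng(7)
    n, d, r = 5000, 10, 400

    A = rng.standard_normal((n, d))
    U, _ = np.linalg.qr(A)
    lev = (U**2).sum(axis=1)                     # leverage score of row i
    p = lev / d                                  # scores sum to d, so p sums to 1

    idx = rng.choice(n, size=r, p=p)             # sample r rows i.i.d. from p
    SA = A[idx] / np.sqrt(r * p[idx])[:, None]   # rescale sampled rows

    # E[(SA)^T (SA)] = A^T A; the spectra agree up to small error.
    print(np.linalg.norm(SA.T @ SA - A.T @ A, 2) / np.linalg.norm(A, 2)**2)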
ℓ₂-regression via Oblivious Subspace Embedding

Theorem
Let S ∈ R^{r×n} be a (1 ± ε/3) oblivious ℓ₂-subspace embedding for the column space of A together with b, and let x̂ = arg min_x ‖SAx − Sb‖₂; then,

    ‖Ax̂ − b‖₂ ≤ (1 + ε) · min_x ‖Ax − b‖₂

Proof.
1. Let U ∈ R^{n×(d+1)} be an orthonormal basis for the column space of A together with the vector b.
2. Suppose S is an ℓ₂-subspace embedding for U; the conclusion follows from the derivation below.
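
Completing the proof (a short derivation under the assumption that S preserves, up to 1 ± ε/3, the norm of every vector in the column space of U, in particular every vector of the form Ax − b): with x* = arg min_x ‖Ax − b‖₂,

    \|A\hat{x} - b\|_2
      \le \frac{\|SA\hat{x} - Sb\|_2}{1 - \varepsilon/3}
      \le \frac{\|SAx^* - Sb\|_2}{1 - \varepsilon/3}
      \le \frac{1 + \varepsilon/3}{1 - \varepsilon/3}\,\|Ax^* - b\|_2
      \le (1 + \varepsilon)\,\|Ax^* - b\|_2,

    % the middle inequality uses that \hat{x} minimizes \|SAx - Sb\|_2,
    % and the last one holds for 0 < \varepsilon \le 1.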
Questions?