Low-Rank Approximation of Kernel Matrices

Leopold Cambier
ICME, Stanford University
February 21, 2017
Motivation
Integral equations

    ∫_X K(x, y) u(x) dx = f(y)

for given K : X × Y → R, f : Y → R and unknown u : X → R.

Discretization gives

    ∑_i K(x_i, y_j) u(x_i) = f(y_j)

Typical situation in BEM:
X and Y are two parts of a 3D mesh
They may or may not be well-separated

For instance,

    K(x, y) = 1 / ‖x − y‖₂

When X and Y are well-separated, K(x, y) = 1/‖x − y‖ is smooth and hence
low-rank.

[Figure: two well-separated point clouds (1581 × 1608 points) and the spectrum
of K_ij: the normalized singular values σ_i/σ_1 decay rapidly with i.]

The problem

Given X = {x_1, …, x_M}, Y = {y_1, …, y_N} and K : X × Y → R, build a low-rank
representation of K_ij = K(x_i, y_j), i.e.

    K ≈ U S V^T

for U ∈ R^{M×r}, V ∈ R^{N×r}, S ∈ R^{r×r}.

Outline
1 Motivation
2 Low-Rank Kernel Approximation
3 Multivariate Polynomial Interpolation
4 Mappings
5 Numerical Results
Plates
Ellipsoid
Torus
6 Conclusion
Low-Rank Kernel Approximation
Naive Solution

Naive idea:
Compute K_ij = K(x_i, y_j) - O(MN)
Rank-revealing QR - up to O(MNr)

The bottleneck is the O(MN) complexity.

Main Idea

Assume we are given

    K(x, y) ≈ ∑_{p=1}^r u_p(x) v_p(y)

Then we immediately have

    K_ij ≈ ∑_{p=1}^r u_p(x_i) v_p(y_j) = ∑_{p=1}^r u_{ip} v_{jp}

How to compute u_p(x) and v_p(y)?

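To make the identity concrete, here is a minimal numpy sketch (not from the
slides): a truncated Taylor expansion of exp(−xy) plays the role of the
separable expansion, and evaluating the factor functions at the points gives
the low-rank factors directly.

```python
import numpy as np
from math import factorial

# Illustrative separable kernel (an assumption, not from the slides):
# exp(-x*y) = sum_p ((-x)^p / p!) * y^p, truncated after r terms.
r = 8
x = np.linspace(0.0, 1.0, 500)   # points x_i in X
y = np.linspace(0.0, 1.0, 400)   # points y_j in Y

U = np.stack([(-x) ** p / factorial(p) for p in range(r)], axis=1)  # u_p(x_i), M x r
V = np.stack([y ** p for p in range(r)], axis=1)                    # v_p(y_j), N x r

K = np.exp(-np.outer(x, y))                             # exact M x N matrix
print(np.linalg.norm(K - U @ V.T) / np.linalg.norm(K))  # small for modest r
```
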
The Answer

Interpolation!

Quick Review

Given a function f : X ⊂ R^d → R, we can write

    f(x) ≈ f̄(x) = ∑_{k=1}^K f(x̄_k) T_k(x)

where the x̄_k are interpolation nodes and the T_k are Lagrange basis functions,
i.e. T_k(x̄_k) = 1 and T_k(x̄_i) = 0 for i ≠ k.

This directly implies (interpolation)

    f(x̄_k) = f̄(x̄_k).

Low-Rank Kernel Approximation

Given this, write

    K(x, y) ≈ ∑_{k=1}^K R_k(x) K(x̄_k, y)
            ≈ ∑_{k=1}^K R_k(x) ∑_{l=1}^L T_l(y) K(x̄_k, ȳ_l)
            = ∑_{k=1}^K ∑_{l=1}^L R_k(x) K(x̄_k, ȳ_l) T_l(y)
            = K̄(x, y)

where R_k and T_l are the Lagrange basis functions on the nodes x̄_k and ȳ_l.
We have an interpolation scheme on X × Y since

    K̄(x̄_k, ȳ_l) = K(x̄_k, ȳ_l).

Low-Rank Kernel Approximation

    K̄(x, y) = ∑_{k=1}^K ∑_{l=1}^L R_k(x) K(x̄_k, ȳ_l) T_l(y)

In matrix form,

    K̄(x, y) = [R_1(x) ⋯ R_K(x)] · [K(x̄_k, ȳ_l)]_{k=1..K, l=1..L} · [T_1(y) ⋯ T_L(y)]^T

which has rank r_0 = min(K, L).

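A small numpy/scipy sketch of this factorization in 1d. The geometry
(X = [0, 1], Y = [3, 4]) and the helper `lagrange_basis_matrix` are assumptions
of mine; the helper builds the Lagrange basis by interpolating unit vectors
with scipy's barycentric interpolator.

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

def lagrange_basis_matrix(pts, nodes):
    """Rows: evaluation points; columns: Lagrange cardinal functions.
    Column k is obtained by interpolating the k-th unit vector at `nodes`."""
    return BarycentricInterpolator(nodes, np.eye(len(nodes)))(pts)

# Assumed 1d setup: well-separated intervals and the kernel 1/|x - y|.
kern = lambda x, y: 1.0 / np.abs(x[:, None] - y[None, :])
x = np.linspace(0.0, 1.0, 300)
y = np.linspace(3.0, 4.0, 280)

Kn = Ln = 10                                      # interpolation orders K and L
xbar = 0.5 + 0.5 * np.cos((2 * np.arange(1, Kn + 1) - 1) * np.pi / (2 * Kn))
ybar = 3.5 + 0.5 * np.cos((2 * np.arange(1, Ln + 1) - 1) * np.pi / (2 * Ln))

R = lagrange_basis_matrix(x, xbar)                # M x K, R_k(x_i)
T = lagrange_basis_matrix(y, ybar)                # N x L, T_l(y_j)
S = kern(xbar, ybar)                              # K x L, K(xbar_k, ybar_l)
print(np.linalg.norm(kern(x, y) - R @ S @ T.T) / np.linalg.norm(kern(x, y)))
```
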
Given such a representation, the complexity becomes

    O(MK + KL + LN) ≈ O(r_0 n)

versus

    O(MNr) ≈ O(r n²)

before.

Recompression

Given

    K̄ = R K T^T,    rank K̄ = r_0

(where K now denotes the K × L matrix of kernel values at the nodes), we
further recompress K̄ as

    K̄ = (Q_R R_R) K (Q_T R_T)^T
       = Q_R (R_R K R_T^T) Q_T^T
       = Q_R U_K S_K V_K^T Q_T^T
       = (Q_R U_K) S_K (Q_T V_K)^T
       = U S V^T

where rank K̄ = r_1. This requires ≈ O(n r_0 r_1) + O(r_0² r_1) work.

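A numpy sketch of this recompression; I write the core node matrix as `S` to
avoid reusing the symbol K, and the relative truncation threshold `tol` is an
assumed convention, not something the slide specifies.

```python
import numpy as np

def recompress(R, S, T, tol=1e-10):
    """Recompress K_bar = R @ S @ T.T (rank r0) into U @ S1 @ V.T (rank r1).
    Mirrors the slide: thin QR of both outer factors, then an SVD of the
    small r0 x r0 core, truncated at relative threshold `tol`."""
    Qr, Rr = np.linalg.qr(R)                      # R = Qr @ Rr
    Qt, Rt = np.linalg.qr(T)                      # T = Qt @ Rt
    Uc, s, Vch = np.linalg.svd(Rr @ S @ Rt.T)     # SVD of the small core
    r1 = int(np.sum(s > tol * s[0]))              # truncated rank r1 <= r0
    U = Qr @ Uc[:, :r1]                           # M x r1
    V = Qt @ Vch[:r1].T                           # N x r1
    return U, np.diag(s[:r1]), V
```
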
One Thing Left Unanswered

How to obtain the (multivariate) interpolation?

Multivariate Polynomial Interpolation
Univariate Polynomial Interpolation

The best way to interpolate a smooth function f on [a, b] with polynomials is
to use Chebyshev-like nodes that cluster at the boundary, e.g.

    x̄_k = (a + b)/2 + (b − a)/2 · cos((2k − 1)π / (2n)),    k = 1, …, n

Using the barycentric formula (for stability), this gives

    f(x) ≈ f̄(x) = ∑_{k=1}^n f(x̄_k) T_k(x),
    T_k(x) = (w̄_k / (x − x̄_k)) / ∑_{j=1}^n (w̄_j / (x − x̄_j))

with

    w̄_j = 1 / ∏_{k≠j} (x̄_j − x̄_k)

A short numpy sketch of these formulas follows.

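A numpy sketch of the nodes, weights, and barycentric evaluation (the Runge
function in the usage lines is an illustrative choice, not from the slides):

```python
import numpy as np

def chebyshev_nodes(a, b, n):
    """x_k = (a+b)/2 + (b-a)/2 cos((2k-1)pi/(2n)), k = 1..n, as on the slide."""
    k = np.arange(1, n + 1)
    return (a + b) / 2 + (b - a) / 2 * np.cos((2 * k - 1) * np.pi / (2 * n))

def barycentric_weights(nodes):
    """w_j = 1 / prod_{k != j} (x_j - x_k)."""
    diff = nodes[:, None] - nodes[None, :]
    np.fill_diagonal(diff, 1.0)
    return 1.0 / diff.prod(axis=1)

def barycentric_eval(x, nodes, fvals):
    """Evaluate the interpolant at x via the stable barycentric formula."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    w = barycentric_weights(nodes)
    d = x[:, None] - nodes[None, :]
    hit = d == 0                                   # x falls exactly on a node
    d[hit] = 1.0                                   # avoid 0/0; patched below
    c = w / d
    out = (c @ fvals) / c.sum(axis=1)
    rows = hit.any(axis=1)
    out[rows] = fvals[hit.argmax(axis=1)[rows]]    # exact values at the nodes
    return out

# Usage: interpolate the Runge function on [-1, 1] with 30 Chebyshev nodes.
xb = chebyshev_nodes(-1.0, 1.0, 30)
f = lambda t: 1.0 / (1.0 + 25.0 * t ** 2)
xs = np.linspace(-1.0, 1.0, 1001)
print(np.max(np.abs(barycentric_eval(xs, xb, f(xb)) - f(xs))))
```
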
Multivariate Polynomial Interpolation

If X = I_1 × ⋯ × I_d where I_i = [a_i, b_i], use a tensor product of univariate
interpolation rules:

    f(x) ≈ f̄(x) = ∑_{k_1=1}^{K_1} ⋯ ∑_{k_d=1}^{K_d} T_{k_1}(x_1) ⋯ T_{k_d}(x_d) f(x̄_{k_1}, …, x̄_{k_d})

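A 2d sketch of the tensor-product rule; the function `tensor_interp_2d`, its
test function, and the use of scipy's barycentric interpolator for the Lagrange
basis are all choices of mine.

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

def cheb(a, b, n):
    k = np.arange(1, n + 1)
    return (a + b) / 2 + (b - a) / 2 * np.cos((2 * k - 1) * np.pi / (2 * n))

def tensor_interp_2d(f, box, orders, pts):
    """Evaluate sum_{k1,k2} T_{k1}(x_1) T_{k2}(x_2) f(n1_{k1}, n2_{k2}) at pts
    (m x 2) on box = [(a1, b1), (a2, b2)] with orders = (K1, K2)."""
    (a1, b1), (a2, b2) = box
    n1, n2 = cheb(a1, b1, orders[0]), cheb(a2, b2, orders[1])
    F = f(n1[:, None], n2[None, :])                    # K1 x K2 grid values
    # Lagrange basis matrices: column k is T_k evaluated at the query points.
    T1 = BarycentricInterpolator(n1, np.eye(orders[0]))(pts[:, 0])
    T2 = BarycentricInterpolator(n2, np.eye(orders[1]))(pts[:, 1])
    return np.einsum('ik,kl,il->i', T1, F, T2)         # the double sum above

# Usage (illustrative): f(x) = 1/(2 + x_1 + x_2) on [0, 1]^2 with 12 x 12 nodes.
pts = np.random.rand(1000, 2)
f = lambda u, v: 1.0 / (2.0 + u + v)
print(np.max(np.abs(tensor_interp_2d(f, [(0, 1), (0, 1)], (12, 12), pts)
                    - f(pts[:, 0], pts[:, 1]))))
```
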
Mappings
Basic idea

Find a mapping from X ⊂ R^d to R = I_1 × ⋯ × I_{d′} such that R is "small".

Mapping 1: Box

Find a box aligned with the data X = {x_1, …, x_M} using PCA (a sketch follows
below):
Translate the points to the origin: x̃_i = x_i − c, where c is the center
Compute the axes of the box as the eigenvectors of the covariance matrix
C = ∑_i x̃_i x̃_i^T
Compute the length of each axis

Works well if the points lie on planes (d′ < d) or almost planar surfaces.

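A numpy sketch of the box fit; reading the covariance as the d × d scatter
matrix of the centered points is my interpretation of the slide.

```python
import numpy as np

def pca_box(X):
    """Fit a PCA-aligned box around the points X (M x d): center the data,
    take the covariance eigenvectors as axes, measure the extent per axis."""
    c = X.mean(axis=0)                         # center
    Xt = X - c                                 # translated points
    C = Xt.T @ Xt / len(X)                     # d x d covariance matrix
    _, P = np.linalg.eigh(C)                   # columns of P: box axes
    coords = Xt @ P                            # points in box coordinates
    lo, hi = coords.min(axis=0), coords.max(axis=0)   # extent along each axis
    return c, P, lo, hi

# The mapping to R = I_1 x ... x I_{d'} is then x -> P.T @ (x - c); axes with
# near-zero extent hi - lo can be dropped, giving d' < d for (almost) flat data.
```
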
Mapping 2: Ellipsoid

Find an ellipsoid tightly fitting the data. In general,

    f(x) = (x − x_c)^T A (x − x_c) = 1

Find x_c by taking the mean of the data
Find A by solving

    A = argmin_{A = A^T} ∑_{i=1}^n ((x_i − x_c)^T A (x_i − x_c) − 1)²

A = P Λ P^T gives the axes (p_i) and the lengths (1/√λ_i) of the ellipsoid
(see the sketch below)

In 3d, use this to build a polar-coordinate representation of the data,
minimizing the range of every coordinate.

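A numpy sketch of the least-squares ellipsoid fit in 3d. The vectorization of
the symmetric unknown A is my bookkeeping, and the points are assumed to lie
roughly on an ellipsoid so that A comes out positive definite.

```python
import numpy as np

def fit_ellipsoid(X):
    """Fit (x - xc)^T A (x - xc) = 1 with A symmetric by linear least squares;
    return the center, the axes p_i, and the semi-axis lengths 1/sqrt(lam_i)."""
    xc = X.mean(axis=0)                        # center: mean of the data
    Z = X - xc
    # Unknowns a = (A11, A22, A33, A12, A13, A23); z^T A z is linear in a.
    M = np.column_stack([Z[:, 0]**2, Z[:, 1]**2, Z[:, 2]**2,
                         2 * Z[:, 0] * Z[:, 1],
                         2 * Z[:, 0] * Z[:, 2],
                         2 * Z[:, 1] * Z[:, 2]])
    a, *_ = np.linalg.lstsq(M, np.ones(len(X)), rcond=None)
    A = np.array([[a[0], a[3], a[4]],
                  [a[3], a[1], a[5]],
                  [a[4], a[5], a[2]]])
    lam, P = np.linalg.eigh(A)                 # A = P diag(lam) P^T
    return xc, P, 1.0 / np.sqrt(lam)           # assumes lam > 0 (ellipsoidal data)
```
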
Numerical Results
Algorithm

Given
K : X × Y → R,
{x_1, …, x_M} ⊂ X and {y_1, …, y_N} ⊂ Y
Test points {x̃_1, …, x̃_{M′}} ⊂ X and {ỹ_1, …, ỹ_{N′}} ⊂ Y
Mappings M_x : X → I_x^1 × ⋯ × I_x^{d_x} and M_y : Y → I_y^1 × ⋯ × I_y^{d_y}
Tolerance δ

Start with n_x = [0, 0, …, 0] ∈ N^{d_x} and n_y = [0, 0, …, 0] ∈ N^{d_y} and
build K̄_{n_x,n_y} interpolating K(M_x⁻¹(·), M_y⁻¹(·)).

While ε > δ:
For z ∈ {x, y} and for i = 1, …, d_z:
    Increase n_z[i] by 1, build a temporary K̄_{n_x,n_y} and compute
    ε_{z,i} = ‖K̄_{n_x,n_y}(x̃, ỹ) − K(x̃, ỹ)‖ / ‖K(x̃, ỹ)‖
Pick (z, i) = argmin_{z,i} ε_{z,i} and set ε = ε_{z,i}
n_z[i] = n_z[i] + 1, update K̄_{n_x,n_y}

This gives K̄_{n_x,n_y} of rank r_0. Recompress to get K̄′_{n_x,n_y} of rank r_1.
A sketch of the greedy loop follows.

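A sketch of the greedy loop, with `test_err(nx, ny)` standing in for "build K̄
for these orders and measure the relative error on the test points" (a
hypothetical helper wrapping the interpolation machinery above).

```python
def greedy_orders(test_err, d_x, d_y, delta):
    """Greedy order selection from the slide: at each step, try raising the
    interpolation order in every direction of x and y, keep the single raise
    with the smallest test error, and stop once the error drops below delta."""
    nx, ny = [1] * d_x, [1] * d_y          # smallest rules (the slide starts at 0)
    err = test_err(nx, ny)
    while err > delta:
        trials = []
        for z, i in [('x', i) for i in range(d_x)] + [('y', i) for i in range(d_y)]:
            tx, ty = list(nx), list(ny)
            (tx if z == 'x' else ty)[i] += 1       # bump one direction
            trials.append((test_err(tx, ty), z, i))
        err, z, i = min(trials)                    # argmin over (z, i)
        (nx if z == 'x' else ny)[i] += 1           # commit the best increase
    return nx, ny, err
```
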
Experiments

2d and 3d geometries
Multiple radial kernels
Compared against the optimal low-rank factorization (SVD)
Recompression (from rank r_0 to r_1) always brings the rank very close (within
~5-10%) to the optimal value r and is omitted in the plots; we show r_0 (before
recompression) for the usual "Tensor" method and the more involved "Sparse
Grids", together with the optimal rank r
Sparse Grids ideas (i.e. removing some well-chosen nodes from the tensor grid)
are also used

Plates

Parallel Plates: 1/r

[Figure: parallel plates at distance 1, kernel 1/r. Rank vs. tolerance (10⁻¹⁰
to 10⁻¹) for Tensor, Sparse Grids, and the optimal rank; 2500 × 2500 points.]

Parallel Plates: Sparse Grids

[Figure: interpolation nodes on the plates. Left: Tensor Grid; right: Sparse
Grid, which keeps only a well-chosen subset of the tensor nodes.]

Parallel Plates: r² log(r)

[Figure: parallel plates at distance 1, kernel r² log(r). Rank vs. tolerance
for Tensor, Sparse Grids, and the optimal rank.]

Parallel Plates: 1/r²

[Figure: parallel plates at distance 1, kernel 1/r². Rank vs. tolerance for
Tensor, Sparse Grids, and the optimal rank.]

Parallel Plates, Increasing the Distance: 1/r

[Figure: parallel plates at distance 3, kernel 1/r. Rank vs. tolerance for
Tensor, Sparse Grids, and the optimal rank.]

Perpendicular Plates: 1/r

[Figure: perpendicular plates, kernel 1/r. Rank vs. tolerance for Tensor,
Sparse Grids, and the optimal rank.]

Perpendicular Plates: √(1 + r²)

[Figure: perpendicular plates, kernel √(1 + r²). Rank vs. tolerance for Tensor,
Sparse Grids, and the optimal rank.]

45° Plates: 1/r

[Figure: plates at 45 degrees, kernel 1/r. Rank vs. tolerance for Tensor,
Sparse Grids, and the optimal rank.]

45° Plates: r³

[Figure: plates at 45 degrees, kernel r³. Rank vs. tolerance for Tensor,
Sparse Grids, and the optimal rank.]

Ellipsoid

Ellipsoid 2d: 1/r

[Figure: facing ellipsoids in 2d, kernel 1/r. Rank vs. tolerance for Tensor,
Sparse Grids, and the optimal rank.]

Ellipsoid 2d: r² log(r)

[Figure: facing ellipsoids in 2d, kernel r² log(r).]

Ellipsoid 3d: 1/r

[Figure: facing ellipsoids in 3d, kernel 1/r. Rank vs. tolerance for Tensor,
Sparse Grids, and the optimal rank.]

Ellipsoid 3d: √(1 + r²)

[Figure: facing ellipsoids in 3d, kernel √(1 + r²). Rank vs. tolerance for
Tensor, Sparse Grids, and the optimal rank.]

Torus

Torus 2d: 1/r

[Figure: a torus split into partitions, with three target partitions A, B, and
C at increasing angular distance from the source; 50 × 50 = 2500 points per
partition.]

Torus 2d, A: 1/r

[Figure: torus, partition A (angle 2), kernel 1/r. Rank vs. tolerance for
Tensor, Sparse Grids, and the optimal rank.]

Torus 2d, B: 1/r

[Figure: torus, partition B (angle 8), kernel 1/r. Rank vs. tolerance for
Tensor, Sparse Grids, and the optimal rank.]

Torus 2d, C: 1/r

[Figure: torus, partition C (angle 16), kernel 1/r. Rank vs. tolerance for
Tensor, Sparse Grids, and the optimal rank.]

Conclusion

Low-Rank Kernel Matrix Approximation
A method to approximate kernel matrices
Independent of the size of the matrix
Independent of the geometry, ...
but requires a tight parametrization of the surface
Can be improved by removing some well-selected nodes ("Sparse Grids")

References

Berrut, Jean-Paul, and Lloyd N. Trefethen. "Barycentric Lagrange
Interpolation." SIAM Review 46.3 (2004): 501-517.
Kaarnioja, Vesa. "Smolyak Quadrature." Master's thesis, 2013.

Additional Results

Parallel Plates: √(1 + r²)

[Figure: parallel plates at distance 1, kernel √(1 + r²). Rank vs. tolerance
for Tensor, Sparse Grids, and the optimal rank.]

Parallel Plates: r³

[Figure: parallel plates at distance 1, kernel r³. Rank vs. tolerance for
Tensor, Sparse Grids, and the optimal rank.]

Torus 2d, A: Accuracy

[Figure: torus, partition A (angle 2), kernel 1/r. Achieved relative error vs.
requested tolerance for Tensor and Sparse Grids, against the Tol reference
line.]

Torus 2d, B: Accuracy

[Figure: torus, partition B (angle 8), kernel 1/r. Achieved relative error vs.
requested tolerance for Tensor and Sparse Grids, against the Tol reference
line.]

Torus 2d, C: Accuracy

[Figure: torus, partition C (angle 16), kernel 1/r. Achieved relative error vs.
requested tolerance for Tensor and Sparse Grids, against the Tol reference
line.]