Interpolation via weighted ℓ1 minimization
Rachel Ward
University of Texas at Austin
December 12, 2014
Joint work with Holger Rauhut (Aachen University)
Function interpolation
Given a function f : D → C on a domain D, interpolate or approximate f from sample values y_1 = f(u_1), ..., y_m = f(u_m).

Assume the form

f(u) = Σ_{j∈Γ} x_j ψ_j(u),   |Γ| = N
Approaches:
- Standard interpolation: choose m = N. Find appropriate Γ and sampling points u_1, u_2, ..., u_m.
- Least squares regression: choose N < m; minimize ‖y − Ax‖_2, where A_{ℓ,j} = ψ_j(u_ℓ) is the sampling matrix.
- Compressive sensing methods: choose N > m. Exploit approximate sparsity of the coefficient vector x to solve the underdetermined system y = Ax.
Orthonormal systems
[Figure: L²-normalized Legendre polynomials]
D: domain endowed with a probability measure ν.
ψ_j : D → C, j ∈ Γ (finite or infinite)
{ψ_j}_{j∈Γ} is an orthonormal system: ∫_D ψ_j(t) ψ̄_k(t) dν(t) = δ_{j,k}
Examples:
- Trigonometric system: ψ_j(t) = e^{2πijt}, D = [0, 1], dν(t) = dt, ‖ψ_j‖_∞ ≤ 1.
- L²-normalized Legendre polynomials L_j: D = [−1, 1], dν(t) = dt/2, ‖L_j‖_∞ = √(2j + 1).
- Tensor products of Legendre polynomials: D = [−1, 1]^d, j = (j_1, j_2, ..., j_d), L_j = ∏_{k=1}^{d} L_{j_k}, ‖L_j‖_∞ = ∏_{k=1}^{d} √(2j_k + 1).
In high-dimensional problems, smoothness alone is not enough to avoid the curse of dimension – it is too local a notion! We will combine smoothness and sparsity.
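As a quick numerical check of the Legendre example (a minimal sketch, not from the talk; it assumes the normalization L_j = √(2j+1) P_j against dν(t) = dt/2), one can verify orthonormality by quadrature and the bound ‖L_j‖_∞ = √(2j + 1):

```python
# Sketch: verify orthonormality and sup-norm of L2(dnu)-normalized Legendre
# polynomials L_j = sqrt(2j+1) * P_j, with dnu(t) = dt/2 on [-1, 1].
import numpy as np
from numpy.polynomial import legendre

def L(j, t):
    """Evaluate the normalized Legendre polynomial of degree j at points t."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return np.sqrt(2 * j + 1) * legendre.legval(t, coeffs)

nodes, weights = legendre.leggauss(50)   # Gauss-Legendre quadrature on [-1, 1]

for j in range(4):
    for k in range(4):
        ip = 0.5 * np.sum(weights * L(j, nodes) * L(k, nodes))  # w.r.t. dt/2
        assert abs(ip - float(j == k)) < 1e-12

grid = np.linspace(-1, 1, 10001)
for j in range(6):
    # the sup norm is attained at t = +/-1, where P_j(1) = 1
    print(j, np.max(np.abs(L(j, grid))), np.sqrt(2 * j + 1))
```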
Smoothness and weights

In general, controlling ‖f‖_{L∞} + ‖f′‖_{L∞} promotes smoothness.

Consider ψ_j(t) = e^{2πijt}, j ∈ ℤ, t ∈ [0, 1].
The derivatives satisfy ‖ψ′_j‖_∞ = 2π|j|, j ∈ ℤ.

For f(t) = Σ_j x_j ψ_j(t),

  ‖f‖_{L∞} + ‖f′‖_{L∞} = ‖Σ_j x_j ψ_j‖_{L∞} + ‖Σ_j x_j ψ′_j‖_{L∞}
                        ≤ Σ_{j∈ℤ} |x_j| (‖ψ_j‖_{L∞} + ‖ψ′_j‖_{L∞})
                        = Σ_{j∈ℤ} |x_j| (1 + 2π|j|) =: ‖x‖_{ω,1}.

The weighted ℓ1-coefficient norm promotes smoothness. It also promotes sparsity!
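A small numerical illustration of this bound (a sketch with made-up coefficients, not from the talk): compare ‖f‖_{L∞} + ‖f′‖_{L∞}, estimated on a fine grid, with ‖x‖_{ω,1} for ω_j = 1 + 2π|j|.

```python
# Sketch: check ||f||_inf + ||f'||_inf <= sum_j |x_j| (1 + 2*pi*|j|) for a
# random trigonometric polynomial f(t) = sum_j x_j e^{2 pi i j t}.
import numpy as np

rng = np.random.default_rng(0)
freqs = np.arange(-10, 11)
x = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)

t = np.linspace(0.0, 1.0, 4001)
psi = np.exp(2j * np.pi * np.outer(freqs, t))      # psi_j(t)
dpsi = (2j * np.pi * freqs)[:, None] * psi         # psi_j'(t)

lhs = np.max(np.abs(x @ psi)) + np.max(np.abs(x @ dpsi))
rhs = np.sum(np.abs(x) * (1 + 2 * np.pi * np.abs(freqs)))   # ||x||_{omega,1}
print(f"{lhs:.2f} <= {rhs:.2f}:", lhs <= rhs)
```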
Weighted norms and weighted sparsity
Weighted ℓ1 norm: pay a higher price to include certain indices

‖x‖_{ω,1} = Σ_{j∈Γ} ω_j |x_j|

New: weighted sparsity:

‖x‖_{ω,0} := Σ_{j : x_j ≠ 0} ω_j²

x is called weighted s-sparse if ‖x‖_{ω,0} ≤ s.

Weighted best s-term approximation error:

σ_s(x)_{ω,1} := inf_{z : ‖z‖_{ω,0} ≤ s} ‖x − z‖_{ω,1}
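These definitions translate directly into code; a minimal sketch (illustrative names and values, not from the talk):

```python
import numpy as np

def weighted_l1(x, w):
    """||x||_{omega,1} = sum_j w_j |x_j|"""
    return np.sum(w * np.abs(x))

def weighted_l0(x, w):
    """||x||_{omega,0} = sum of w_j^2 over the support of x"""
    return np.sum(w[x != 0] ** 2)

w = np.sqrt(np.arange(1, 9))                        # increasing weights
x = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.2])

print(weighted_l1(x, w))                            # weighted l1 norm
print(weighted_l0(x, w))                            # weighted sparsity = 1 + 3 + 8 = 12
print(weighted_l0(x, w) <= 12)                      # so x is weighted 12-sparse
```

Note that evaluating σ_s(x)_{ω,1} exactly amounts to a small knapsack problem over the support (choose indices with total weight budget Σ ω_j² ≤ s capturing as much weighted ℓ1 mass as possible); in practice it can be estimated greedily.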
(Weighted) Compressive Sensing
Recover a weighted s-sparse (or weighted-compressible) vector x from measurements y = Ax, where A ∈ C^{m×N} with m < N.

Weighted ℓ1-minimization:

min_{z∈C^N} ‖z‖_{ω,1}  subject to  Az = y

"Noisy" version:

min_{z∈C^N} ‖z‖_{ω,1}  subject to  ‖Az − y‖_2 ≤ η
Keep in mind the sampling matrix:

A = [ ψ_1(u_1)  ψ_2(u_1)  ...  ψ_N(u_1) ]
    [ ψ_1(u_2)  ψ_2(u_2)  ...  ψ_N(u_2) ]
    [    ⋮          ⋮               ⋮    ]
    [ ψ_1(u_m)  ψ_2(u_m)  ...  ψ_N(u_m) ]
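In practice the noisy program can be handed to any convex solver. A minimal sketch with CVXPY on a real-valued toy instance (the matrix, weights, and noise level below are placeholders, not from the talk):

```python
# Sketch: weighted l1 minimization  min ||z||_{omega,1}  s.t.  ||Az - y||_2 <= eta
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, N = 30, 100
w = 1.0 + np.arange(N, dtype=float)               # weights omega_j >= 1
A = rng.standard_normal((m, N)) / np.sqrt(m)      # placeholder sampling matrix
x_true = np.zeros(N)
x_true[[2, 5, 11]] = [1.0, -0.7, 0.4]             # weighted-sparse ground truth
eta = 1e-3
y = A @ x_true + eta * rng.standard_normal(m) / np.sqrt(m)

z = cp.Variable(N)
prob = cp.Problem(cp.Minimize(cp.norm1(cp.multiply(w, z))),
                  [cp.norm(A @ z - y, 2) <= eta])
prob.solve()
print("recovery error:", np.linalg.norm(z.value - x_true))
```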
Numerical example - comparing different reconstructions
[Figure: the original function and its reconstructions – exact interpolation (D=30), least squares (D=15), unweighted ℓ1 minimizer, weighted ℓ1 minimizer (weights ω_j = j^{1/2})]
- f(u) = 1/(1 + 25u²). Draw m = 30 sampling points u_ℓ i.i.d. from the uniform measure on [−1, 1].
- Interpolate (exactly or approximately) the samples y_ℓ = f(u_ℓ) by various choices of {x_j} in f^♯(u) = Σ_{j=0}^{80} x_j e^{πiju}.
*Stability of unweighted ℓ1 minimization given exact sparsity: Rauhut, W. ’09
*Stability of least squares regression: Cohen, Davenport, Leviatan 2011
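A sketch of this experiment's setup (my reading of the slide: f is Runge's function, the dictionary is {e^{πiju}}_{j=0}^{80}, and the weights grow like j^{1/2}; the weight at j = 0 and the solver settings are assumptions, and a CVXPY version with complex-variable support is required):

```python
# Sketch: setup of the Runge-function experiment with weighted l1-minimizing
# interpolation (illustrative, not the authors' code).
import numpy as np
import cvxpy as cp

f = lambda u: 1.0 / (1.0 + 25.0 * u ** 2)          # Runge's function
m, deg = 30, 80
rng = np.random.default_rng(2)

u = rng.uniform(-1.0, 1.0, m)                      # uniform sampling points
y = f(u)
J = np.arange(deg + 1)
A = np.exp(1j * np.pi * np.outer(u, J))            # A[l, j] = e^{pi i j u_l}

w = np.sqrt(np.maximum(J, 1)).astype(float)        # assumed weights ~ j^{1/2}
z = cp.Variable(deg + 1, complex=True)
cp.Problem(cp.Minimize(cp.norm1(cp.multiply(w, z))), [A @ z == y]).solve()

grid = np.linspace(-1.0, 1.0, 1000)
f_sharp = np.real(np.exp(1j * np.pi * np.outer(grid, J)) @ z.value)
print("max error on grid:", np.max(np.abs(f_sharp - f(grid))))
```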
Numerical example - comparing different reconstructions
Different trials correspond to different random draws of m = 30
sampling points from uniform measure on [−1, 1].
Numerical example - comparing different reconstructions
Same experiment, but now interpolating/approximating by Legendre polynomials, with sampling points drawn from the Chebyshev measure on [−1, 1], dν(u) = du / (π√(1 − u²)).
What is going on?
[Figure: Runge's function and its Legendre polynomial coefficients]

Compare the coefficient indices picked up by the various reconstruction methods: least squares, unweighted ℓ1 minimization, weighted ℓ1 minimization.
Back to Compressive Sensing setting
Recover a weighted s-sparse (or weighted-compressible) vector x from measurements y = Ax, where A ∈ C^{m×N} with m < N.

Weighted ℓ1-minimization:

min_{z∈C^N} Σ_{j=1}^{N} ω_j |z_j|  subject to  Az = y
Keep in mind the sampling matrix A_{ℓ,j} = ψ_j(u_ℓ) displayed earlier.
Weighted restricted isometry property (WRIP)
Definition (with H. Rauhut ’13)
The weighted restricted isometry constant δ_{ω,s} of a matrix A ∈ C^{m×N} is defined to be the smallest constant such that

(1 − δ_{ω,s}) ‖x‖_2² ≤ ‖Ax‖_2² ≤ (1 + δ_{ω,s}) ‖x‖_2²

for all x ∈ C^N with ‖x‖_{ω,0} = Σ_{j : x_j ≠ 0} ω_j² ≤ s.

Unweighted "uniform" restricted isometry property (Candès, Tao ’05): ω ≡ 1.
Since we assume ω_j ≥ 1, the WRIP is weaker than the "uniform" RIP.
Related: model-based compressive sensing (Baraniuk, Cevher, Duarte, Hegde, Wakin ’10)
Weights allow us to analyze sampling matrices arising from function systems with unbounded ‖·‖_∞ norm. The uniform RIP does not hold for such matrices!
Weighted RIP of random sampling matrix
ψ_j : D → C, an orthonormal system w.r.t. the probability measure ν, with ‖ψ_j‖_∞ ≤ ω_j.
Sampling points u_1, ..., u_m taken i.i.d. at random according to ν.
Random sampling matrix A ∈ C^{m×N} with entries A_{ℓ,j} = ψ_j(u_ℓ).
Fix s, and choose N so that ω_1², ω_2², ..., ω_N² ≤ s/2.

Theorem (with H. Rauhut, ’13)
If

m ≥ C δ^{−2} s log³(s) log(N),

then the weighted restricted isometry constant of (1/√m) A satisfies δ_{ω,s} ≤ δ with high probability.

Generalizes previous results (Candès, Tao, Rudelson, Vershynin) for uniformly bounded systems, ‖ψ_j‖_∞ ≤ K for all j.
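The theorem is a statement about all weighted s-sparse vectors simultaneously; a small experiment (a sketch only, not a certificate of the WRIP) can nonetheless probe the scaling for the Legendre system with ω_j = √(2j+1):

```python
# Sketch: probe ||(1/sqrt(m)) A z||^2 / ||z||^2 for random weighted s-sparse z,
# where A[l, j] = L_j(u_l) and u_l ~ uniform on [-1, 1] (the measure dnu = dt/2).
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)
m, N, s = 400, 60, 20
u = rng.uniform(-1.0, 1.0, m)

A = np.column_stack([np.sqrt(2 * j + 1) * legendre.legval(u, np.eye(N)[j])
                     for j in range(N)]) / np.sqrt(m)
w = np.sqrt(2.0 * np.arange(N) + 1.0)              # omega_j = ||L_j||_inf

ratios = []
for _ in range(200):
    # build a random support respecting the weight budget sum omega_j^2 <= s
    support, budget = [], float(s)
    for j in rng.permutation(N):
        if w[j] ** 2 <= budget:
            support.append(j)
            budget -= w[j] ** 2
    z = np.zeros(N)
    z[support] = rng.standard_normal(len(support))
    ratios.append(np.linalg.norm(A @ z) ** 2 / np.linalg.norm(z) ** 2)

print("empirical range of ||Az||^2 / ||z||^2:", min(ratios), max(ratios))
```

Increasing m tightens the empirical range around 1, consistent with the theorem.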
Recovery via weighted ℓ1-minimization

Theorem (with H. Rauhut)
Let A ∈ C^{m×N} have the WRIP with weights ω and δ_{ω,3s} < 1/3. For x ∈ C^N and y = Ax + e with ‖e‖_2 ≤ η, let x^♯ be a minimizer of

min ‖z‖_{ω,1}  subject to  ‖Az − y‖_2 ≤ η.

Then

‖x − x^♯‖_{ω,1} ≤ C_1 σ_s(x)_{ω,1} + D_1 √s η.

Generalizes unweighted ℓ1 minimization results (Candès, Romberg, Tao ’06, Donoho ’06).

Should be of independent interest in structured sparse recovery problems / sparse recovery problems with nonuniform priors on the support.
Prior work on weighted ℓ1 minimization

Weighted ℓ1 norm: ‖x‖_{ω,1} = Σ_{j∈Γ} ω_j |x_j|

There are many previous works on the analysis of weighted ℓ1 minimization! These focused on the finite-dimensional setting, with analysis based on unweighted sparsity:
- von Borries, Miosso, and Potes 2007
- Vaswani and Lu 2009, Jacques 2010, Xu 2010
- Khajehnejad, Xu, Avestimehr, and Hassibi (2009 & 2010)
- Friedlander, Mansour, Saab, and Yilmaz 2012, Misra and Parrilo 2013
- Peng, Hampton, Doostan 2013 – sparse polynomial chaos approximation of high-dimensional stochastic functions
Back to function interpolation
Suppose that |Γ| = N < ∞. Given samples y_1 = f(u_1), ..., y_m = f(u_m) of

f(u) = Σ_{j∈Γ} x_j ψ_j(u),

reconstruction amounts to solving y = Ax, where A ∈ C^{m×N} is the sampling matrix with entries A_{ℓ,j} = ψ_j(u_ℓ).

- The previous analysis suggests: use weighted ℓ1-minimization with weights ω_j ≥ ‖ψ_j‖_{L∞} or ω_j ≥ ‖ψ′_j‖_{L∞} to recover weighted-sparse or weighted-compressible x when m < N.
- Choose u_1, ..., u_m i.i.d. at random according to ν_ψ in order to analyze the WRIP of the sampling matrix.
- x will not be exactly sparse. In which norm should the residual error f − f^♯ be measured? Recall that ‖g‖_{L∞} + ‖g′‖_{L∞} ≤ |||g|||_{ω,1} if ω_j ≥ ‖ψ_j‖_{L∞} + ‖ψ′_j‖_{L∞}.
Interpolation via weighted ℓ1 minimization

Suppose f(u) = Σ_{j∈Γ} x_j ψ_j(u) with |Γ| = N < ∞, and set weights ω_j > ‖ψ_j‖_∞.

Theorem
From m ≳ s log³(s) log(N) samples y_1 = f(u_1), ..., y_m = f(u_m), where the u_j ∼ ν_ψ, if x^♯ is the solution to

min_{z∈C^Γ} ‖z‖_{ω,1}  subject to  Az = y

and f^♯(u) = Σ_{j∈Γ} x^♯_j ψ_j(u), then with high probability,

‖f − f^♯‖_∞ ≤ |||f − f^♯|||_{ω,1} ≤ C_1 σ_s(f)_{ω,1}.

More realistic: |Γ| = ∞. How do we pass to a finite-dimensional approximation in a principled way?
Weighted function spaces
A_{ω,p} = {f : f(u) = Σ_{j∈Γ} x_j ψ_j(u), |||f|||_{ω,p} := ‖x‖_{ω,p} < ∞}

Interesting range: 0 ≤ p ≤ 1.
A_{ω,p} ⊂ A_{ω,q} if p < q.

Solving m ≳ s log³(s) log(N) for s, a Stechkin-type error bound yields

‖f − f^♯‖_{L∞} ≤ |||f − f^♯|||_{ω,1} ≤ C_1 ( log⁴(N) / m )^{1/p − 1} |||f|||_{ω,p}
Approximation in infinite-dimensional spaces
Suppose |Γ| = ∞, lim_{|j|→∞} ω_j = ∞, and ω_j > ‖ψ_j‖_∞.

Theorem
Suppose f ∈ A_{ω,p} for some 0 < p < 1. Fix s, and set Γ_s = {j ∈ Γ : ω_j² ≤ s/2}.
Draw m ≥ C s log³(s) log(|Γ_s|) samples y_1 = f(u_1), ..., y_m = f(u_m) according to ν_ψ, and let x^♯ be the solution to

min_{z∈C^{Γ_s}} ‖z‖_{ω,1}  subject to  ‖Az − y‖_2 ≤ √(m/s) |||f − f_{Γ_s}|||_{ω,1}.

Put f^♯ = Σ_{j∈Γ_s} x^♯_j ψ_j. Then with high probability,

‖f − f^♯‖_{L∞} ≤ |||f − f^♯|||_{ω,1} ≤ C_1 ( log⁴(N) / m )^{1/p − 1} |||f|||_{ω,p}

**Using a greedy alternative such as weighted iterative hard thresholding, one does not need to know √(m/s) |||f − f_{Γ_s}|||_{ω,1} and obtains the same bound.
Function approximation in high dimensions
Important: For domains D = [−1, 1]^d, we don't want the number of measurements m in the resulting bound to grow much with d.

L²-normalized Chebyshev polynomials:

T_j(u) = √2 cos(j arccos u),  T_0(u) ≡ 1,  j ∈ ℕ.

Tensorized Chebyshev polynomials:

T_k(u) = ∏_{j=1}^{d} T_{k_j}(u_j) = ∏_{j∈supp k} T_{k_j}(u_j),  u = (u_j)_{j≥1},  k ∈ F,

where F = {k = (k_1, k_2, ...) : k_j ∈ ℕ_0}.

Product probability measure:

dη = ⨂_{j≥1} du_j / (π √(1 − u_j²)) on D.

{T_k}_{k∈F} forms an orthonormal basis for L²(D, dη).
Sparse recovery for tensorized Chebyshev polynomials
L∞-bound for the tensorized Chebyshev polynomials T_k:  ‖T_k‖_∞ = 2^{‖k‖_0 / 2}.

Choice of weights:

ω_k = ∏_{j=1}^{d} (k_j + 1)^{1/2} ≥ 2^{‖k‖_0 / 2}

This encourages both sparse and low-order tensor products.

The subset of indices with bounded weight forms a hyperbolic cross:

Λ_0 = {k ∈ ℕ_0^d : ω_k² ≤ s}

[Kühn, Sickel, Ullrich 2014; Cohen, DeVore, Foucart, Rauhut 2011]:

|Λ_0| ≤ e^d s^{2 + log(d)}
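A minimal sketch (not from the talk) that enumerates this hyperbolic cross by recursion; the condition ω_k² ≤ s is exactly ∏_j (k_j + 1) ≤ s:

```python
# Sketch: enumerate Lambda_0 = {k in N_0^d : prod_j (k_j + 1) <= s}.
import numpy as np

def hyperbolic_cross(d, s):
    """All multi-indices k of length d with prod (k_j + 1) <= s."""
    if d == 1:
        return [(k,) for k in range(int(s))]         # k + 1 <= s
    return [(k1,) + tail
            for k1 in range(int(s))
            for tail in hyperbolic_cross(d - 1, s / (k1 + 1))]

def weight(k):
    return np.sqrt(np.prod([kj + 1.0 for kj in k]))  # omega_k

Lam0 = hyperbolic_cross(d=10, s=32)
print("|Lambda_0| =", len(Lam0))
print("largest number of active coordinates:",
      max(sum(kj > 0 for kj in k) for k in Lam0))
print("max weight:", max(weight(k) for k in Lam0))   # <= sqrt(s)
```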
Apply weighted `1 theory:
Consider a function f = Σ_{k∈Λ} x_k T_k on [−1, 1]^d, with f ∈ A_{ω,p}.

- Fix s, and fix Λ_0 = {k ∈ ℕ_0^d : ω_k² ≤ s} (reduced basis).
- Fix the number of samples m ≥ C s log⁴(s) log(d).
- Draw m samples y_ℓ = f(u_ℓ) according to the Chebyshev measure.
- Let A ∈ C^{m×N} be the sampling matrix with entries A_{ℓ,k} = T_k(u_ℓ).
- Let x^♯ be the solution of
  min ‖x‖_{ω,1}  subject to  ‖Ax − y‖_2 ≤ √(m/s) |||f − f_{Λ_0}|||_{ω,1},
  and set f^♯ = Σ_{k∈Λ_0} x^♯_k T_k.

Then with probability exceeding 1 − e^{−d log³(s)},

‖f − f^♯‖_{L∞} ≤ C_1 ( log⁴(s) log(d) / m )^{1/p − 1} |||f|||_{ω,p}
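A sketch of the sampling step of this pipeline (illustrative sizes only; the weighted ℓ1 solve itself would proceed as in the earlier CVXPY sketch):

```python
# Sketch: tensorized Chebyshev sampling matrix on a hyperbolic-cross index set.
import numpy as np

def hyperbolic_cross(d, s):
    if d == 1:
        return [(k,) for k in range(int(s))]
    return [(k1,) + tail for k1 in range(int(s))
            for tail in hyperbolic_cross(d - 1, s / (k1 + 1))]

rng = np.random.default_rng(4)
d, s, m = 10, 32, 200                               # illustrative sizes
Lam0 = hyperbolic_cross(d, s)

# Chebyshev (arcsine) measure on [-1, 1]^d: u_j = cos(pi * U_j), U_j ~ U[0, 1]
u = np.cos(np.pi * rng.uniform(size=(m, d)))

def T(k, u):
    """T_k(u) = product over active coordinates of sqrt(2) cos(k_j arccos u_j)."""
    val = np.ones(u.shape[0])
    for j, kj in enumerate(k):
        if kj > 0:
            val *= np.sqrt(2.0) * np.cos(kj * np.arccos(u[:, j]))
    return val

A = np.column_stack([T(k, u) for k in Lam0])        # A[l, k] = T_k(u_l)
w = np.array([np.sqrt(np.prod([kj + 1.0 for kj in k])) for k in Lam0])

print("N = |Lambda_0| =", A.shape[1], " m =", A.shape[0], " max omega_k =", w.max())
```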
Limitations of weighted ℓ1 approach

In the previous example,

N = |Λ_0| = e^d s^{2 + log₂(d)}.

Weighted ℓ1 minimization as a reconstruction method on such large-scale problems is impractical.

Fix s̃ ≪ s. Least squares projection onto a reduced basis of size Ñ = e^d s̃^{2 + log₂(d)} is faster, but too greedy.

Can we meet in the middle?
Summary
- We introduced weighted ℓ1 minimization for stable and robust function interpolation, taking into account both the sparsity and the smoothness present in natural functions of interest.
- Along the way, we extended the notion of the restricted isometry property to the weighted restricted isometry property, a milder condition that allows us to treat function systems with increasing ‖·‖_∞ norm.
- Weighted ℓ1 minimization can overcome the curse of dimension w.r.t. the number of samples in high-dimensional approximation problems.

Extensions:
- We observe empirically that weighted ℓ1 minimization is "faster" than unweighted ℓ1 minimization; the steeper the weights, the faster. Justification?
- We observe that, with error on the measurements y = f(u) + ξ, reconstruction results are similar, provided that the regularization parameter is chosen correctly in regularization methods. Equality-constrained ℓ1 minimization, weighted or not, leads to overfitting. Why?
References
Rauhut, Ward. “Interpolation via weighted ℓ1 minimization.” arXiv preprint, 2013.

Rauhut, Ward. “Sparse Legendre expansions via ℓ1-minimization.” Journal of Approximation Theory 164(5) (2012), 517–533.

Cohen, Davenport, Leviatan. “On the stability and accuracy of least squares approximations.” Foundations of Computational Mathematics 13 (2013), 819–834.

Jo. “Iterative Hard Thresholding for weighted sparse approximation.” arXiv preprint, 2014.

Burq, Dyatlov, Ward, Zworski. “Weighted eigenfunction estimates with applications to compressed sensing.” SIAM J. Math. Anal. 44(5) (2012), 3481–3501.