Methods for Sparse Optimization

F. Rinaldi
Dipartimento di Matematica, Università di Padova
Padova, 21 June 2015
Why Sparse is Better
A simple approximate solution of a given optimization problem is often preferable to a (more complex) exact solution:
Simple solutions are more robust to data inexactness
Simple solutions are easier to implement, store, and explain
⇓
Simplicity in our case corresponds to Sparsity
(a solution with only a few nonzero components)
SparseLand
Popular keywords: sparse, sparsity, sparse representations, sparse approximations, sparse decompositions
A new research front (Institute for Scientific Information - June 2006)
SparseLand ≡ Sparse Modelling:
How to fit models where only a few terms out of many will be used
How to sparsely model important natural data types
Sparse Problem Formulations
ℓ0-Constrained Formulation:

    min_{x ∈ X}  f(x)    s.t.  ‖x‖₀ ≤ T        (1)

Function-Constrained Formulation:

    min_{x ∈ X}  ‖x‖₀    s.t.  f(x) ≤ f̄        (2)

Penalty Formulation:

    min_{x ∈ X}  f(x) + λ‖x‖₀        (3)

‖x‖₀ is the zero-norm of x, defined as ‖x‖₀ = card{ x_i : x_i ≠ 0 }.
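As a concrete illustration (not part of the original slides), a minimal numpy sketch of the zero-norm and of the penalty objective in (3); the tolerance eps and the generic callable f are assumptions made for the example.

```python
import numpy as np

def zero_norm(x, eps=0.0):
    """||x||_0 = card{ x_i : x_i != 0 }; a small tolerance eps is often used in practice."""
    return int(np.count_nonzero(np.abs(x) > eps))

def l0_penalty_objective(x, f, lam):
    """Objective of the penalty formulation (3): f(x) + lambda * ||x||_0."""
    return f(x) + lam * zero_norm(x)

# Purely illustrative smooth f: f(x) = 0.5 * ||x - c||^2 for some target c.
c = np.array([1.0, 0.0, 0.0, -2.0])
x = np.array([0.9, 0.0, 0.0, -1.8])
print(zero_norm(x), l0_penalty_objective(x, lambda v: 0.5 * np.sum((v - c) ** 2), lam=0.1))
```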
How to Deal with the ℓ0-norm
The zero-norm is a nonconvex, discontinuous function
⇓
The resulting problems are hard to solve
⇓
ℓ1-norm Approximation
Use of ‖x‖₁ has long been known to promote sparsity in x. Moreover:
it can be handled without discrete variables;
it maintains convexity.
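A small numerical illustration (a sketch on assumed random data, not from the slides) of why the convex ℓ1 surrogate favours sparse solutions: for an underdetermined system Ax = b generated by a sparse vector, the minimum-ℓ2 solution is dense, and its ℓ1 norm is typically larger than that of the sparse generator, so minimizing ‖x‖₁ tends to select the sparse one.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 12))                 # underdetermined system: 5 equations, 12 unknowns
x_sparse = np.zeros(12)
x_sparse[[2, 7]] = [1.5, -2.0]                   # sparse "ground truth"
b = A @ x_sparse

x_l2 = np.linalg.pinv(A) @ b                     # minimum-l2-norm solution: dense in general

for name, x in [("sparse generator", x_sparse), ("min-l2 solution", x_l2)]:
    l0 = np.count_nonzero(np.abs(x) > 1e-8)
    print(f"{name}: ||x||_0 = {l0}, ||x||_1 = {np.linalg.norm(x, 1):.3f}")
```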
Some Relevant Examples
LASSO problem
Given the training set
    T = { (a_i, b_i) : a_i ∈ R^n, b_i ∈ R, i = 1, …, m }
GOAL: Find a sparse linear model describing the data.
[R. Tibshirani, JRSS B, 1996]
Basis Pursuit Denoising Problem
Given a discrete-time input signal b and a dictionary
    D = { a_j ∈ R^m : j = 1, …, n }
of elementary discrete-time signals, usually called atoms.
GOAL: Find a sparse linear combination of the atoms that approximates the given signal.
[S. Chen, D. Donoho, and M. Saunders, SIAM JSC, 1998]
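To make the BPDN setting concrete, a minimal sketch (with assumed synthetic data, not from the slides) that stacks the atoms a_j as columns of a matrix and builds a noisy signal from a sparse combination of them; this is the kind of data (A, b) used in the formulations that follow.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 64, 256                                    # signal length and number of atoms
A = rng.standard_normal((m, n))                   # dictionary: column j is the atom a_j
A /= np.linalg.norm(A, axis=0)                    # normalize the atoms

coeffs = np.zeros(n)
coeffs[rng.choice(n, size=5, replace=False)] = rng.standard_normal(5)   # sparse combination
b = A @ coeffs + 0.01 * rng.standard_normal(m)    # observed noisy signal
```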
ℓ1-regularized least squares problems
Problem formulation
The problem we deal with is the ℓ1-regularized least-squares problem

    min_{x ∈ R^n}  (1/2)‖Ax − b‖₂² + τ‖x‖₁        (4)

where A ∈ R^{m×n}, b ∈ R^m, τ ∈ R₊, and ‖x‖₁ = Σ_{i=1}^n |x_i|.
The objective is convex but nonsmooth.
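A direct numpy transcription of the objective in (4) (a sketch; the helper name is mine): the quadratic term is smooth, while the ℓ1 term is convex but nondifferentiable at zero, which is exactly what the methods below are designed to handle.

```python
import numpy as np

def l1_ls_objective(x, A, b, tau):
    """Objective of problem (4): 0.5 * ||A x - b||_2^2 + tau * ||x||_1."""
    r = A @ x - b
    return 0.5 * r @ r + tau * np.abs(x).sum()
```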
Motivations
ℓ1-regularized least squares is a standard approach for finding sparse solutions of large underdetermined systems of linear equations.
Application contexts:
statistics
signal and image processing
astrophysics
optics
...
General Purpose Methods
The ℓ1-regularized least squares problem can be transformed into a convex quadratic problem with linear inequality constraints:

    min_{x,y ∈ R^n}  (1/2)‖Ax − b‖₂² + λ e^T y
    s.t.  −y ≤ x ≤ y                                        (5)

which can then be solved using a constrained optimization method (e.g. a projected conjugate gradient method or an interior-point method).
An example of an interior-point method for ℓ1-regularized least squares is l1-ls.
[S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, IEEE JSTSP, 2007]
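A sketch of how reformulation (5) can be handed to a general-purpose convex solver; cvxpy is used here purely for illustration (it is not mentioned in the slides), and the data and parameter values are assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, lam = 40, 100, 0.1
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

x = cp.Variable(n)
y = cp.Variable(n)                                   # y_i bounds |x_i|
objective = cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + lam * cp.sum(y))
constraints = [x <= y, -y <= x]                      # i.e. -y <= x <= y
cp.Problem(objective, constraints).solve()           # general-purpose (interior-point style) solver
x_hat = x.value
```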
Special Purpose Techniques for solving BPDP
General Purpose Techniques
PRO: a good way to get a reliable solution with little programming effort
CON: slow when dealing with large-scale applications
⇓
Special Purpose Techniques
Iterative Shrinkage/Thresholding
Decomposition Methods
Augmented Lagrangian Methods
...
Iterative Shrinkage/Thresholding
Widely studied family of iterative approximate solution techniques
⇓
Iterative Shrinkage/Thresholding
We use ∇f(x) to build a quadratic approximation of f at x.
At step k: given a current approximate solution x^k, solve

    min_{z ∈ R^n}  (z − x^k)^T ∇f(x^k) + (α^k/2)‖z − x^k‖₂² + λ‖z‖₁        (6)

where f(x) = (1/2)‖Ax − b‖₂² and α^k ∈ R₊.
[Daubechies, Defrise, De Mol; 2004]
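Subproblem (6) has a closed-form solution given by soft thresholding of a gradient step (this is made explicit on the SpaRSA slide below); a minimal numpy sketch of one IST iteration, with function names of my choosing.

```python
import numpy as np

def soft_threshold(u, kappa):
    """Componentwise solution of min_z 0.5*(z - u)^2 + kappa*|z|."""
    return np.sign(u) * np.maximum(np.abs(u) - kappa, 0.0)

def ist_step(x, A, b, lam, alpha):
    """One iterative shrinkage/thresholding step: closed-form solution of subproblem (6)."""
    grad = A.T @ (A @ x - b)               # gradient of f(x) = 0.5*||Ax - b||^2
    return soft_threshold(x - grad / alpha, lam / alpha)
```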
SpaRSA Algorithm
Problem (6) can be rewritten as follows:

    min_{z ∈ R^n}  (1/2)‖z − u^k‖₂² + (λ/α^k)‖z‖₁        (7)

where u^k = x^k − (1/α^k)∇f(x^k) and α^k ∈ R₊.
This problem is separable, hence:

    x_i^{k+1} ∈ arg min_{z ∈ R}  (1/2)(z − u_i^k)² + (λ/α^k)|z|,    i = 1, …, n.        (8)

If α^k is properly chosen, we have descent at each iteration and convergence.
α^k can be adaptively changed (e.g. by the Barzilai-Borwein method).
[Wright, Nowak, Figueiredo; 2009]
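A simplified sketch of a SpaRSA-style loop (assumptions: a fixed iteration count, safeguarding bounds of my choosing, and no acceptance/backtracking test, which the actual algorithm includes): each iteration is an IST step with α^k set by a Barzilai-Borwein spectral estimate.

```python
import numpy as np

def sparsa_sketch(A, b, lam, n_iter=200, alpha=1.0, alpha_min=1e-8, alpha_max=1e8):
    """Simplified SpaRSA-style iteration: IST steps with a Barzilai-Borwein choice of alpha^k."""
    x = np.zeros(A.shape[1])
    grad = A.T @ (A @ x - b)
    for _ in range(n_iter):
        u = x - grad / alpha
        x_new = np.sign(u) * np.maximum(np.abs(u) - lam / alpha, 0.0)   # solve (7)/(8) in closed form
        grad_new = A.T @ (A @ x_new - b)
        s, r = x_new - x, grad_new - grad
        if s @ s > 0:
            alpha = np.clip((s @ r) / (s @ s), alpha_min, alpha_max)    # BB spectral estimate of alpha^k
        x, grad = x_new, grad_new
    return x
```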
Block Coordinate Gradient Descent Method
Coordinate descent to generate an improving direction d at x
+
∇f(x) to build a quadratic approximation of f at x
⇓
A Coordinate Gradient Descent Approach
Set f(x) = (1/2)‖Ax − b‖₂². At step k, choose:
a nonempty index subset J^k ⊆ N
a symmetric matrix H^k ≻ 0 (approximating the Hessian ∇²f(x^k))
and move x along the following direction:

    d_{H^k}(x^k; J^k) = arg min_d { ∇f(x^k)^T d + (1/2) d^T H^k d + λ‖x^k + d‖₁  |  d_j = 0 ∀ j ∉ J^k }.        (9)
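For a diagonal H^k (see the remarks on the next slide), direction (9) has a closed form on the block J^k; a minimal numpy sketch, with names and the index-array interface of my choosing.

```python
import numpy as np

def cgd_direction(x, A, b, lam, J, h):
    """Direction (9) for H = diag(h): coordinates outside the block J keep d_j = 0."""
    d = np.zeros_like(x)
    grad = A.T @ (A @ x - b)                       # gradient of f(x) = 0.5*||Ax - b||^2
    u = x[J] - grad[J] / h[J]
    z = np.sign(u) * np.maximum(np.abs(u) - lam / h[J], 0.0)   # soft thresholding on the block
    d[J] = z - x[J]
    return d
```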
Some Remarks
Case of a Diagonal Matrix H
If H is diagonal, then (9) decomposes into subproblems that can be solved in parallel.
Stepsize Rule
Various stepsize rules from smooth optimization can be adapted (e.g. Armijo-like rules).
Subset Choice
For convergence, the index subset J^k is chosen in a Gauss-Seidel manner, e.g., J^k cycles through {1}, {2}, …, {n} or, more generally, J^0, J^1, … collectively cover {1, 2, …, n} every T consecutive iterations (where T ≥ 1), i.e.

    J^k ∪ J^{k+1} ∪ · · · ∪ J^{k+T−1} = N,    k = 0, 1, …

or in a Gauss-Southwell manner.
[Tseng and Yun, 2009]
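A small sketch of a Gauss-Seidel-style (cyclic) choice of the blocks J^k satisfying the covering condition above; the generator-based interface is my own illustration, not part of the cited method.

```python
import numpy as np

def cyclic_blocks(n, num_blocks):
    """Yield index subsets J^0, J^1, ... that jointly cover {0, ..., n-1}
    every T = num_blocks consecutive iterations (Gauss-Seidel rule)."""
    blocks = np.array_split(np.arange(n), num_blocks)
    k = 0
    while True:
        yield blocks[k % num_blocks]
        k += 1
        # A Gauss-Southwell rule would instead pick the block with the
        # largest optimality violation at the current iterate.
```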
Augmented Lagrangian Approach
Constrained Reformulation

    min_{x,z ∈ R^n}  (1/2)‖Ax − b‖₂² + λ‖z‖₁
    s.t.  x − z = 0                                        (10)

Augmented Lagrangian Formulation

    (z^k, v^k) = arg min_{z,v}  (1/2)‖Av − b‖₂² + λ‖z‖₁ + (σ^k)^T (v − z) + (µ^k/2)‖v − z‖₂²        (11)

    σ^{k+1} = σ^k + µ^k (v^k − z^k)

with µ^k > 0.
The Approach
Take alternating steps in z and v.
[Afonso, Bioucas-Dias, Figueiredo, 2010]
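A minimal numpy sketch of the alternating scheme for (11), in the spirit of SALSA (assumptions: µ^k kept fixed, a fixed iteration count, and a dense linear solve for the v-step): the z-step is a soft thresholding and the v-step is a linear system.

```python
import numpy as np

def salsa_sketch(A, b, lam, mu=1.0, n_iter=100):
    """Alternating minimization of (11) with a multiplier update; mu fixed for simplicity."""
    m, n = A.shape
    v = np.zeros(n)
    sigma = np.zeros(n)
    AtA_mu = A.T @ A + mu * np.eye(n)          # the v-step is the linear system (A^T A + mu I) v = rhs
    Atb = A.T @ b
    for _ in range(n_iter):
        # z-step: prox of lam*||.||_1 at v + sigma/mu (soft thresholding)
        u = v + sigma / mu
        z = np.sign(u) * np.maximum(np.abs(u) - lam / mu, 0.0)
        # v-step: minimize the smooth part of (11) in v
        v = np.linalg.solve(AtA_mu, Atb + mu * z - sigma)
        # multiplier update
        sigma = sigma + mu * (v - z)
    return z
```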
ℓ1-regularized least squares problems
Proposed approaches
Several special purpose techniques have been proposed for solving ℓ1-regularized least squares:
Iterative Shrinkage/Thresholding (IST) methods (e.g. [Wright, Nowak, Figueiredo; 2009] [Beck, Teboulle; 2009] [Combettes, Wajs; 2005] [Daubechies, Defrise, De Mol; 2004])
Augmented Lagrangian approaches (e.g. [Afonso, Bioucas-Dias, Figueiredo; 2010])
Sequential and parallel block coordinate approaches (e.g. [Qin, Scheinberg, Goldfarb; 2013] [Richtárik, Takáč; 2012] [Yun, Toh; 2011] [Tseng, Yun; 2009])
Second order methods (e.g. [Byrd, Chi, Nocedal, Oztoprak; 2012] [Fountoulakis, Gondzio; 2013])
Active set strategies ...
ℓ1-regularized least squares problems
Use of active set strategies
Identifying the active set for the ℓ1-regularized LS problem,

    A(x*) = { i ∈ {1, …, n} : x*_i = 0 },

is becoming a crucial task, since it can guarantee savings in terms of CPU time.
FPC-AS [Wen, Yin, Goldfarb, Zhang; 2010]: an estimate of the active variable set is obtained by using a first-order iterative shrinkage method.
[Byrd, Chi, Nocedal, Oztoprak; 2012]: a rule to identify the active manifold is combined with a semi-smooth Newton approach.
[Wright; 2012]: the identification of the active manifold allows acceleration techniques to be applied on a reduced space.
[Yuan, Chang, Hsieh, Lin; 2010]: a shrinking technique is proposed and combined with a coordinate descent algorithm.
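A hypothetical sketch (not any of the specific rules cited above) of how an active set estimate can be formed from first-order information, so that subsequent updates are restricted to the free variables and the working problem becomes smaller.

```python
import numpy as np

def active_set_estimate(x, A, b, lam, eps=1e-6):
    """Estimate A(x) = {i : x_i = 0}: variables at zero whose optimality
    condition |grad_i f(x)| <= lam is (approximately) satisfied stay fixed."""
    grad = A.T @ (A @ x - b)                       # gradient of f(x) = 0.5*||Ax - b||^2
    active = (np.abs(x) <= eps) & (np.abs(grad) <= lam + eps)
    free = np.flatnonzero(~active)                 # updates are restricted to these indices
    return active, free
```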