Learning with Structured Sparsity

Authors:
Junzhou Huang, Tong Zhang, Dimitris Metaxas
Presented by Zhennan Yan
Introduction
 Fixed set of $p$ basis vectors $\{x_1, \ldots, x_p\}$, where $x_j \in \mathbb{R}^n$ for each $j$; stacked as columns, they form the matrix $X \in \mathbb{R}^{n \times p}$.
 Given a random observation $y = [y_1, \ldots, y_n]^\top \in \mathbb{R}^n$ that depends on an underlying coefficient vector $\bar\beta \in \mathbb{R}^p$: $y \approx X\bar\beta$.
 Assume the target coefficient vector $\bar\beta$ is sparse.
 Throughout the paper, assume $X$ is fixed; randomization is with respect to the noise in the observation $y$.
Introduction
 Define the support of a vector $\beta \in \mathbb{R}^p$ as $\mathrm{supp}(\beta) = \{j : \beta_j \neq 0\}$, so that $\|\beta\|_0 = |\mathrm{supp}(\beta)|$.
 A natural method for sparse learning is $L_0$ regularization with a desired sparsity $s$:
$$\hat\beta_{L_0} = \arg\min_{\beta} \hat Q(\beta) \quad \text{subject to} \quad \|\beta\|_0 \le s.$$
 Here, we only consider the least squares loss $\hat Q(\beta) = \|X\beta - y\|_2^2$.
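To make the combinatorial nature of this problem concrete, here is a minimal sketch (our own illustration, assuming NumPy; not from the paper) that solves it by enumerating supports, which is only feasible for very small $p$:

```python
import itertools
import numpy as np

def l0_least_squares(X, y, s):
    """Solve min ||X beta - y||^2 s.t. ||beta||_0 <= s by brute force:
    enumerate every support of size <= s and refit by least squares.
    Exponential in p -- usable only for toy problems."""
    n, p = X.shape
    best_beta, best_loss = np.zeros(p), float(np.sum(y ** 2))  # empty support
    for k in range(1, s + 1):
        for F in itertools.combinations(range(p), k):
            cols = list(F)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            loss = float(np.sum((X[:, cols] @ coef - y) ** 2))
            if loss < best_loss:
                best_loss = loss
                best_beta = np.zeros(p)
                best_beta[cols] = coef
    return best_beta
```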
Introduction
 NP-hard!
 Standard approaches:
 Relaxation of $L_0$ to $L_1$ (Lasso)
 Greedy algorithms such as OMP (sketched below)
 In practical applications, we often know a structure on $\bar\beta$ in addition to sparsity:
 Group sparsity: variables in the same group tend to be zero or nonzero together.
 Tonal and transient structures: sparse decompositions of audio signals.
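For reference, a minimal OMP sketch (our simplified version, assuming NumPy): each step adds the single column most correlated with the current residual, then refits by least squares.

```python
import numpy as np

def omp(X, y, s):
    """Orthogonal Matching Pursuit: greedily grow a support of size s."""
    p = X.shape[1]
    support, coef, residual = [], np.zeros(0), y.copy()
    for _ in range(s):
        scores = np.abs(X.T @ residual)      # correlation with the residual
        scores[support] = -np.inf            # never re-pick a chosen column
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef  # refit, then update residual
    beta = np.zeros(p)
    beta[support] = coef
    return beta
```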
Structured Sparsity
 Denote the index set of the coefficients by $\mathcal{I} = \{1, \ldots, p\}$.
 For any sparse subset $F \subseteq \mathcal{I}$, assign a code length $\mathrm{cl}(F)$ satisfying $\sum_{F \subseteq \mathcal{I}} 2^{-\mathrm{cl}(F)} \le 1$.
 The coding complexity of $F$ is defined as $c(F) = |F| + \mathrm{cl}(F)$.
Structured Sparsity
 If a coefficient vector has a small coding complexity, it can be efficiently learned.
 Why? The number of bits needed to encode $F$ is $\mathrm{cl}(F)$, and the number of bits needed to encode the nonzero coefficients in $F$ is $O(|F|)$.
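As a tiny worked example (ours; the $\mathrm{cl}$ used here is the standard-sparsity code length derived on the following slides), the coding complexity is just this bit count $|F| + \mathrm{cl}(F)$:

```python
import math

def cl_standard(F, p):
    """Standard-sparsity code length: each index in F costs log2(p) bits
    plus one extra bit (hence log2(2p) per element)."""
    return len(F) * math.log2(2 * p)

def coding_complexity(F, p, cl=cl_standard):
    """c(F) = |F| + cl(F): bits for the nonzero values plus bits for F."""
    return len(F) + cl(F, p)

# A 10-element support in p = 512 dimensions:
print(coding_complexity(range(10), 512))  # 10 + 10 * log2(1024) = 110.0
```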
General Coding Scheme
 Block Coding: consider a small set $\mathcal{B}$ of base blocks (each element of $\mathcal{B}$ is a subset of $\mathcal{I}$) such that every subset $F \subseteq \mathcal{I}$ of interest can be expressed as a union of blocks in $\mathcal{B}$.
 Define a code length $\mathrm{cl}_0$ on $\mathcal{B}$, where $\sum_{b \in \mathcal{B}} 2^{-\mathrm{cl}_0(b)} \le 1$.
 The induced block code length is then
$$\mathrm{cl}(F) = \min\Big\{ \textstyle\sum_{j=1}^{g} \big(\mathrm{cl}_0(b_j) + 1\big) \;:\; F = \bigcup_{j=1}^{g} b_j,\ b_j \in \mathcal{B} \Big\}.$$
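A small sketch (our illustration): the cost of one particular block representation of $F$. The true $\mathrm{cl}(F)$ minimizes over all representations, which is a set-cover-type problem, so in practice a decomposition is supplied or found greedily.

```python
import math

def block_representation_cost(rep, cl0):
    """Cost of representing F as the union of the blocks in `rep`:
    each block b costs cl0[b] bits plus 1 bit of per-block overhead."""
    return sum(cl0[b] + 1 for b in rep)

# Standard sparsity as block coding: p single-element blocks,
# each with cl0 = log2(p).
p = 512
cl0 = {(j,): math.log2(p) for j in range(p)}
rep = [(3,), (17,), (42,)]                  # F = {3, 17, 42}
print(block_representation_cost(rep, cl0))  # 3 * log2(2p) = 30.0
```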
General Coding Scheme
 A structured greedy algorithm that takes advantage of the block structure is efficient: instead of searching over all subsets of $\mathcal{I}$ up to a fixed coding complexity $s$ (exponentially many), we greedily add blocks from $\mathcal{B}$ one at a time.
 $\mathcal{B}$ is supposed to contain only a manageable number of base blocks.
General Coding Scheme
 Standard Sparsity: $\mathcal{B}$ consists only of single-element sets, and each base block has coding length $\mathrm{cl}_0 = \log_2 p$. This uses $k \log_2(2p)$ bits to code each subset of cardinality $k$.
 Group Sparsity and Graph Sparsity: special cases developed on the next slides.
General Coding Scheme
 Group Sparsity: consider a partition $\mathcal{I} = \bigcup_{j=1}^{m} G_j$ into $m$ disjoint groups. Let $\mathcal{B}_1$ contain the $m$ groups and $\mathcal{B}_2$ contain the $p$ single-element blocks. Elements of $\mathcal{B}_2$ have $\mathrm{cl}_0 = \infty$ and elements of $\mathcal{B}_1$ have $\mathrm{cl}_0 = \log_2 m$, so $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ only looks for signals consisting of whole groups.
 The resulting coding length is $\mathrm{cl}(F) = g \log_2(2m)$ if $F$ can be represented as a union of $g$ disjoint groups.
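A quick numeric check of this formula (our illustration), obtained by plugging the group blocks into the block-coding cost above:

```python
import math

def group_code_length(g, m):
    """cl(F) when F is a union of g disjoint groups out of m:
    each group costs log2(m) bits plus 1 bit of overhead."""
    return g * math.log2(2 * m)

# g = 2 groups out of m = 64: 2 * log2(128) = 14 bits, versus
# |F| * log2(2p) bits under standard sparsity for the same support.
print(group_code_length(2, 64))  # 14.0
```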
General Coding Scheme
 Graph Sparsity: a generalization of group sparsity. It employs a directed graph structure $G$ on $\mathcal{I}$: each element of $\mathcal{I}$ is a node of $G$, but $G$ may contain additional nodes.
 At each node $v$, we define a coding length $\mathrm{cl}_v(S)$ on the subsets $S$ of the neighborhood $N_v$ of $v$, as well as a coding length $\mathrm{cl}_v(u)$ for any other single node $u$, such that
$$\sum_{S \subseteq N_v} 2^{-\mathrm{cl}_v(S)} + \sum_{u} 2^{-\mathrm{cl}_v(u)} \le 1.$$
General Coding Scheme
 Example of graph sparsity: an image-processing problem.
 Each pixel has 4 adjacent pixels, so the number of subsets of its neighborhood is $2^4 = 16$, giving a coding length of $O(1)$ bits per pixel; all other pixels are encoded by a random jump with coding length $O(\log_2 p)$ bits.
 If a connected region $F$ is composed of $g$ connected sub-regions, the coding length is $O(g \log_2 p + |F|)$, while the standard sparse coding length is $O(|F| \log_2 p)$.
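A minimal sketch of this count (our illustration, using SciPy's `scipy.ndimage.label` for 4-connected components; the 5 bits per pixel is an assumed constant in the spirit of the neighborhood code above):

```python
import numpy as np
from scipy.ndimage import label

def graph_code_length(mask, bits_per_pixel=5.0):
    """Approximate graph-sparsity code length of a 2D support `mask`:
    one random jump of log2(p) bits per connected region (4-connectivity),
    plus a constant number of bits for each pixel inside the regions."""
    p = mask.size
    _, g = label(mask)            # g = number of connected sub-regions
    k = int(mask.sum())           # |F| = number of nonzero pixels
    return g * np.log2(p) + bits_per_pixel * k

# Two small square regions in a 48x48 image:
mask = np.zeros((48, 48), dtype=bool)
mask[5:9, 5:9] = True
mask[30:34, 20:24] = True
print(graph_code_length(mask))    # ~ 2*log2(2304) + 5*32  (about 182 bits)
print(32 * np.log2(2 * 48 * 48))  # standard sparsity: about 389 bits
```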
Algorithms for Structured Sparsity
 Recall the $L_0$ problem: $\hat\beta_{L_0} = \arg\min_{\beta} \hat Q(\beta)$ subject to $\|\beta\|_0 \le s$. Structured sparsity replaces the constraint with the coding complexity: $c\big(\mathrm{supp}(\beta)\big) \le s$.
Algorithms for Structured Sparsity
 Extend forward greedy algorithms to use the block structure; the structure is only used to limit the search space. This yields the structured greedy algorithm StructOMP (sketched after the next slide).
Algorithms for Structured Sparsity
 At step $k$, maximize the gain ratio over base blocks:
$$\phi(b) = \frac{\hat Q(\beta^{(k-1)}) - \hat Q(\beta)}{c\big(b \cup F^{(k-1)}\big) - c\big(F^{(k-1)}\big)},$$
where $\beta$ is the least squares fit on the candidate support $b \cup F^{(k-1)}$.
 Using least squares regression, the gain is
$$\hat Q(\beta^{(k-1)}) - \hat Q(\beta) = \big\|P_{b \cup F^{(k-1)}}\big(X\beta^{(k-1)} - y\big)\big\|_2^2,$$
 where $P_F = X_F (X_F^\top X_F)^{-1} X_F^\top$ is the projection matrix onto the subspace generated by the columns of $X_F$.
 Select the next block by $b^{(k)} = \arg\max_{b \in \mathcal{B}} \phi(b)$.
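A simplified sketch of this structured greedy selection (our own, assuming NumPy; the coding-complexity bookkeeping is reduced to a user-supplied `cost_increase` function, so this is a sketch of the idea rather than the paper's implementation):

```python
import numpy as np

def struct_omp(X, y, blocks, cost_increase, s):
    """StructOMP-style structured greedy selection (simplified sketch).
    blocks: the base blocks B, as tuples of column indices.
    cost_increase(F_new, F_old): increase in coding complexity c(.).
    Grows the support until the complexity budget s is spent."""
    n, p = X.shape
    F, beta, complexity = set(), np.zeros(p), 0.0
    while complexity < s:
        residual = y - X @ beta
        best = None
        for b in blocks:
            F_new = F | set(b)
            if F_new == F:
                continue                       # block adds nothing new
            cols = sorted(F_new)
            # Gain = ||P_{F_new} residual||^2, via a least squares fit.
            coef, *_ = np.linalg.lstsq(X[:, cols], residual, rcond=None)
            gain = float(np.sum((X[:, cols] @ coef) ** 2))
            delta = cost_increase(F_new, F)
            if delta > 0 and (best is None or gain / delta > best[0]):
                best = (gain / delta, F_new, delta)
        if best is None:
            break
        _, F, delta = best
        complexity += delta
        cols = sorted(F)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        beta = np.zeros(p)
        beta[cols] = coef                      # refit on the enlarged support
    return beta
```

For standard sparsity, `blocks` would be the $p$ singletons with `cost_increase` returning $1 + \log_2(2p)$ per new element; for the image case, blocks can be pixels together with their neighborhoods.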
Experiments-1D
 1D structured sparse signal with values $\pm 1$:
 $p = 512$, $k = 32$, $g = 2$
 Zero-mean Gaussian noise with standard deviation $\sigma$ is added to the measurements.
 $n = 4k = 128$ measurements.
 Recovery results for Lasso, OMP, and StructOMP are shown on the next slide.
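A sketch of this setup (our reconstruction; the random design matrix and the noise level `sigma` are our assumptions, since the slide leaves them unspecified):

```python
import numpy as np

def make_1d_experiment(p=512, k=32, g=2, sigma=0.1, seed=0):
    """1D structured sparse signal: k nonzeros with values +-1, arranged
    in g contiguous runs at random positions (runs may rarely overlap;
    fine for illustration), observed through a random Gaussian design
    with n = 4k noisy measurements."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    run = k // g                                   # length of each run
    for start in rng.choice(p - run, size=g, replace=False):
        beta[start:start + run] = rng.choice([-1.0, 1.0], size=run)
    n = 4 * k
    X = rng.standard_normal((n, p)) / np.sqrt(n)   # random design matrix
    y = X @ beta + sigma * rng.standard_normal(n)  # noisy measurements
    return X, y, beta
```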
Experiments-1D
[Figure: recovered 1D signals by Lasso, OMP, and StructOMP, compared with the ground truth]
Experiments-2D
 Generate a 2D structured sparse image by putting four letters at random locations:
 $p = H \times W = 48 \times 48$
 $k = 160$
 $g = 4$
 $m = 4k = 640$ measurements
 Strongly sparse signal: Lasso is better than OMP!
Experiments-2D
[Figure: recovered 2D images by Lasso, OMP, and StructOMP]
Experiments for sample size
[Figure: recovery performance as a function of sample size]
Experiment on Tree-Structured Sparsity
 2D wavelet coefficients
 Weakly sparse signal
Experiments-Background Subtracted Images
[Figure: recovery results on background-subtracted images]
Experiments for sample size
[Figure: recovery performance as a function of sample size]