
Learning Incoherent Sparse
and Low-Rank Patterns from
Multiple Tasks
Jianhui Chen
Computer Science & Engineering
Arizona State University
Joint work with Ji Liu and Jieping Ye
Learning Multiple Tasks
Task 1: $f_1(x_1) \approx y_1$
Task 2: $f_2(x_2) \approx y_2$
...
Task m: $f_m(x_m) \approx y_m$

Single Task Learning (STL): learn $f_1$, learn $f_2$, ..., learn $f_m$ separately.
Multi-Task Learning (MTL): learn $f_1, f_2, \ldots, f_m$ simultaneously.
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Motivation
Goal of multi-task learning?
• Improve the overall generalization performance.
How can the performance be improved?
• Learn the tasks simultaneously by exploiting the relationships among the tasks.
When do we need multi-task learning?
• When there are a number of related tasks, but the training data for each task is limited.
Introduction
MTL has been studied from many perspectives:
• share hidden units of neural networks among similar tasks [Caruana’97; Baxter’00]
• model task relatedness via a common prior distribution in hierarchical Bayesian models [Bakker’03; Schwaighofer’04; Yu’05; Zhang’05]
• learn the parameters of the Gaussian Process covariance from multiple tasks [Lawrence’04]
• extend kernel methods and regularization networks to the MTL setting [Evgeniou’05]
• learn a shared low-rank structure from multiple tasks [Ando’05; Chen’09]
• employ trace norm regularization for multi-task learning [Abernethy’09; Argyriou’08; Obozinski’09; Pong’09]
Applications
MTL has been applied in many areas such as
• Bioinformatics [Ando’07]
• Medical image analysis [Bi’08]
• Web search ranking [Chapelle’10]
• Computer vision [Quattoni’07]
• ...
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Learning Multiple Tasks
Task 1: $f_1(x_1) \approx y_1$, training data $\{(x_1, y_1)\}$
Task 2: $f_2(x_2) \approx y_2$, training data $\{(x_2, y_2)\}$
...
Task m: $f_m(x_m) \approx y_m$, training data $\{(x_m, y_m)\}$

Linear classifiers: $f_\ell(x_\ell) = z_\ell^T x_\ell$, $\ell = 1, \ldots, m$.

Task 1: $f_1(x_1) = z_1^T x_1$
Task 2: $f_2(x_2) = z_2^T x_2$
...
Task m: $f_m(x_m) = z_m^T x_m$

The tasks are correlated via an underlying relationship.
Sparse and Low-Rank Structure
The task relationship can be captured as a low-rank structure over $z_1, z_2, \ldots, z_m$ (Argyriou’08; Pong’09).

Multiple tasks may be sufficiently different, and the discriminative features can be sparse: a sparse structure over $z_1, z_2, \ldots, z_m$.

Both low-rank and sparse structures are desirable.
Sparse and Low-Rank Structure
Transformation matrix: $Z = [z_1, z_2, \ldots, z_m] \in \mathbb{R}^{d \times m}$

Incoherent sparse and low-rank structure:
Transformation Matrix = Sparse Component + Low-Rank Component
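As a toy illustration of this decomposition (not from the slides; the sizes, sparsity level, and rank below are arbitrary choices), one can synthesize a transformation matrix Z that is exactly the sum of an entrywise-sparse component and a low-rank component:

import numpy as np

# Hypothetical sizes: d features, m tasks (chosen only for illustration).
d, m, r, sparsity = 50, 10, 3, 0.05
rng = np.random.default_rng(0)

# Low-rank component Q = U V^T with rank r << min(d, m).
Q = rng.standard_normal((d, r)) @ rng.standard_normal((r, m))

# Sparse component P: most entries are exactly zero.
P = rng.standard_normal((d, m)) * (rng.random((d, m)) < sparsity)

# Incoherent sparse + low-rank transformation matrix.
Z = P + Q
print(np.linalg.matrix_rank(Q), np.count_nonzero(P), Z.shape)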
Sparse and Low-Rank Structure
[Figure: the sparse component captures detailed facial marks; the low-rank component captures the rough shape of the faces.]
Multi-Task Learning Formulation
The proposed MTL formulation:

$$\min_{Z, P, Q} \; \sum_{\ell=1}^{m} \sum_{i=1}^{n} L\!\left(z_\ell^T x_i^{\ell}, y_i^{\ell}\right) + \gamma \|P\|_0 \quad \text{subject to} \quad Z = P + Q, \;\; \operatorname{rank}(Q) \le \tau$$

• $L(\cdot,\cdot)$ is a smooth convex loss
• $\|P\|_0$ encodes the sparse structure
• $\operatorname{rank}(Q) \le \tau$ encodes the low-rank structure
• $Z = P + Q$ encodes the incoherent structure

The proposed formulation is non-convex; this problem is NP-hard.
Convex Envelope
We consider a convex relaxation in which each non-convex term is replaced by its convex envelope:

$$\|P\|_0 \;\rightarrow\; \|P\|_1, \qquad \operatorname{rank}(Q) \;\rightarrow\; \|Q\|_*$$

• The convex envelope is the tightest convex function that approximates the non-convex function from below.
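For reference, the precise envelope statements behind the two substitutions (standard facts, not stated on the slides) are:

\|P\|_1 \;=\; \text{convex envelope of } \|P\|_0 \text{ on } \{P : \|P\|_\infty \le 1\},
\qquad
\|Q\|_* \;=\; \sum_i \sigma_i(Q) \;=\; \text{convex envelope of } \operatorname{rank}(Q) \text{ on } \{Q : \|Q\|_2 \le 1\}.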
Convex Formulation
A convex relaxation (with the non-convex terms replaced by the substituted convex envelopes):

$$\min_{Z, P, Q} \; \sum_{\ell=1}^{m} \sum_{i=1}^{n} L\!\left(z_\ell^T x_i^{\ell}, y_i^{\ell}\right) + \gamma \|P\|_1 \quad \text{subject to} \quad Z = P + Q, \;\; \|Q\|_* \le \tau$$

The formulation is
• convex
• non-smooth
• constrained
It can be reformulated as a semi-definite program.
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Projected Gradient Scheme
A compact form of the convex relaxation:

$$\min_{T \in \mathcal{M}} \; f(T) + g(T)$$

• smooth term: $f(T) = \sum_{\ell=1}^{m} \sum_{i=1}^{n} L\!\left(z_\ell^T x_i^{\ell}, y_i^{\ell}\right)$
• non-smooth term: $g(T) = \gamma \|T_P\|_1$
• convex domain set: $\mathcal{M} = \{\, T = (Z, P, Q) : Z = P + Q,\; \|Q\|_* \le \tau \,\}$

The projected gradient scheme finds the optimal solution $T^*$ via a sequence

$$T_1 \rightarrow T_2 \rightarrow T_3 \rightarrow \cdots \rightarrow T^*$$
Projected Gradient Scheme
The key component in the projected gradient scheme, with searching point $S_i$ and step size $1/L_i$:

$$T_{i+1} = \arg\min_{T \in \mathcal{M}} \left\{ \frac{L_i}{2} \left\| T - \left( S_i - \frac{1}{L_i} \nabla f(S_i) \right) \right\|_F^2 + g(T) \right\}$$

• If $g(T) = 0$ and $\mathcal{M}$ is the whole space, then $T_{i+1} = S_i - \frac{1}{L_i} \nabla f(S_i)$.
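The update above is the minimizer of a quadratic majorization of $f$ at the searching point $S_i$; the equivalence follows by completing the square (a standard identity, not shown on the slides):

\frac{L_i}{2}\,\|T - S_i\|_F^2 + \langle \nabla f(S_i),\, T - S_i \rangle
\;=\;
\frac{L_i}{2}\,\Bigl\|T - \bigl(S_i - \tfrac{1}{L_i}\nabla f(S_i)\bigr)\Bigr\|_F^2
\;-\; \frac{1}{2 L_i}\,\|\nabla f(S_i)\|_F^2,

so when $g \equiv 0$ and $\mathcal{M}$ is the whole space, the minimizer over $T$ is exactly the gradient step $S_i - \frac{1}{L_i}\nabla f(S_i)$.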
Projected Gradient Scheme
For the convex relaxation, the key component is

$$\min_{T_P, T_Q} \; \frac{L}{2} \left\| \begin{bmatrix} T_P \\ T_Q \end{bmatrix} - \begin{bmatrix} \hat{S}_P \\ \hat{S}_Q \end{bmatrix} \right\|_F^2 + \gamma \|T_P\|_1 \quad \text{subject to} \quad \|T_Q\|_* \le \tau,$$

which decouples into two sub-problems:

$$\min_{T_P} \; \frac{L}{2}\|T_P - \hat{S}_P\|_F^2 + \gamma \|T_P\|_1$$
• a closed-form solution (entrywise soft-thresholding)

$$\min_{T_Q} \; \frac{1}{2} \|T_Q - \hat{S}_Q\|_F^2 \quad \text{subject to} \quad \|T_Q\|_* \le \tau$$
• solved via an SVD plus a projection of the singular values:

$$\min_{\{\gamma_i\}_{i=1}^{q}} \; \sum_{i=1}^{q} (\gamma_i - \sigma_i)^2 \quad \text{subject to} \quad \sum_{i=1}^{q} \gamma_i \le \tau, \;\; \gamma_i \ge 0.$$
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Main Algorithms
Main components of the algorithms:
• find the appropriate step size $1/L_i$ via line search
• solve the optimization

$$T_{i+1} = \arg\min_{T \in \mathcal{M}} \left\{ \frac{L_i}{2} \left\| T - \left( S_i - \frac{1}{L_i} \nabla f(S_i) \right) \right\|_F^2 + g(T) \right\}$$

Projected gradient algorithm (PG)
• set $S_i = T_i$
• attains a convergence rate of $O(1/k)$

Accelerated projected gradient algorithm (AG)
• set $S_i = (1 + \alpha_i) T_i - \alpha_i T_{i-1}$
• attains a convergence rate of $O(1/k^2)$
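A schematic of the AG iteration, assuming a gradient oracle grad_f and a prox_step routine that performs the projection/shrinkage step sketched earlier; a fixed step size 1/L replaces the line search, and the alpha recurrence below is the usual Nesterov choice, which the slides do not specify.

def accelerated_projected_gradient(grad_f, prox_step, T0, L, n_iter=100):
    # AG scheme: searching point S_i = (1 + alpha_i) T_i - alpha_i T_{i-1},
    # followed by the projected/proximal gradient step with step size 1/L.
    T_prev, T = T0, T0
    t = 1.0
    for _ in range(n_iter):
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        alpha = (t - 1.0) / t_next
        S = T + alpha * (T - T_prev)          # searching point
        T_prev, T = T, prox_step(S - grad_f(S) / L, L)
        t = t_next
    return T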
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Performance Evaluation
Key observations
• Incoherent structure improves performance
• Sparse structure is effective on multi-media data
• Low-rank structure is important on image and gene data
Efficiency Comparison
PG: $O(1/k)$    AG: $O(1/k^2)$
• AG is more efficient than PG for solving the proposed MTL formulation.
Conclusion
Main Contributions
• Propose MTL formulations and efficient algorithms
• Conduct experiments for demonstration
Future Work
• Conduct theoretical analysis of the MTL formulations
• Apply the MTL formulations to real-world applications