Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks
Jianhui Chen
Computer Science & Engineering
Arizona State University
Joint work with Ji Liu and Jieping Ye
Learning Multiple Tasks
Task 1: $f_1(x_1) \to y_1$
Task 2: $f_2(x_2) \to y_2$
...
Task m: $f_m(x_m) \to y_m$
Single Task Learning (STL): learn $f_1, f_2, \dots, f_m$ separately.
Multi-Task Learning (MTL): learn $f_1, f_2, \dots, f_m$ simultaneously.
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Motivation
What is the goal of multi-task learning?
• Improve the overall generalization performance.
How can the performance be improved?
• Learn the tasks simultaneously by exploiting the relationships among tasks.
When do we need multi-task learning?
• When there are a number of related tasks, but the training data for each task is limited.
Introduction
MTL has been studied from many perspectives:
• share hidden units of neural networks among similar tasks
[Caruana’97; Baxter’00]
• model task relatedness via a common prior distribution in hierarchical Bayesian models [Bakker'03; Schwaighofer'04; Yu'05; Zhang'05]
• learn the parameters of Gaussian Process covariance from
multiple tasks [Lawrence’04]
• extend kernel methods and regularization networks to MTL
setting [Evgeniou’05]
• learn a shared low-rank structure from multiple tasks
[Ando’05; Chen’09]
• employ trace norm regularization for multi-task learning
[Abernethy’09; Argyriou’08; Obozinski’09; Pong’09]
Applications
MTL has been applied in many areas, such as:
• Bioinformatics [Ando'07]
• Medical image analysis [Bi'08]
• Web search ranking [Chapelle'10]
• Computer vision [Quattoni'07]
• ...
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Learning Multiple Tasks
Task 1: $f_1(x_1) \to y_1$, training data $\{(x_1, y_1)\}$
Task 2: $f_2(x_2) \to y_2$, training data $\{(x_2, y_2)\}$
...
Task m: $f_m(x_m) \to y_m$, training data $\{(x_m, y_m)\}$
Linear classifiers: $f_\ell(x_\ell) = z_\ell^T x_\ell, \quad \ell = 1, \dots, m$
The weight vectors $z_1, \dots, z_m$ are correlated via an underlying relationship.
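For concreteness, the m linear classifiers can be stacked as the columns of a single weight matrix; a minimal sketch, where the dimensions, data, and function names are hypothetical:

```python
import numpy as np

# m linear classifiers f_l(x) = z_l^T x, stacked as columns of Z (d x m).
d, m = 10, 4                      # hypothetical feature dimension and task count
rng = np.random.default_rng(0)
Z = rng.standard_normal((d, m))   # column l is the weight vector z_l of task l

def predict(Z, x, task):
    """Binary prediction of task `task` on a single example x."""
    return np.sign(Z[:, task] @ x)

x = rng.standard_normal(d)
print([predict(Z, x, l) for l in range(m)])
```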
Sparse and Low-Rank Structure
The task relationship can be captured via a low-rank structure on $[z_1, z_2, \dots, z_m]$ (Argyriou'08; Pong'09).
Multiple tasks may also differ substantially, and the discriminative features can be sparse, giving a sparse structure on $[z_1, z_2, \dots, z_m]$.
Both low-rank and sparse structures are desirable.
Sparse and Low-Rank Structure
Transformation matrix: $Z = [z_1, z_2, \dots, z_m] \in \mathbb{R}^{d \times m}$
Incoherent sparse and low-rank structure:
Transformation Matrix = Sparse Component + Low-Rank Component, i.e., $Z = P + Q$.
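A minimal synthetic sketch of this assumed structure; the sizes, sparsity level, and rank below are hypothetical:

```python
import numpy as np

# Synthetic incoherent structure: Z = P (sparse) + Q (low-rank).
d, m, r = 100, 20, 3              # hypothetical dimension, task count, rank
rng = np.random.default_rng(0)

# Sparse component P: keep roughly 5% of the entries.
P = rng.standard_normal((d, m)) * (rng.random((d, m)) < 0.05)

# Low-rank component Q = U V^T with rank at most r.
Q = rng.standard_normal((d, r)) @ rng.standard_normal((r, m))

Z = P + Q
print(np.count_nonzero(P), np.linalg.matrix_rank(Q))  # sparsity and rank
```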
Sparse and Low-Rank Structure
Example (face images): the sparse component captures detailed facial marks, while the low-rank component captures the rough shape of the faces.
Multi-task Learning Formulation
The proposed MTL formulation (with a smooth convex loss $L$):
$$\min_{Z,P,Q}\; \sum_{\ell=1}^{m} \sum_{i=1}^{n} L\!\left(z_\ell^T x_i^\ell,\, y_i^\ell\right) + \gamma \|P\|_0 \quad \text{(sparse structure)}$$
$$\text{subject to} \quad Z = P + Q \;\text{(incoherent structure)}, \qquad \operatorname{rank}(Q) \le \tau \;\text{(low-rank structure)}$$
The proposed formulation is non-convex; this problem is NP-hard.
Convex Envelope
We consider a convex relaxation via substitution of the non-convex terms with their convex envelopes:
$$\|P\|_0 \;\longrightarrow\; \|P\|_1, \qquad \operatorname{rank}(Q) \;\longrightarrow\; \|Q\|_*$$
• The convex envelope is the tightest convex function that approximates the non-convex function from below.
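For reference, the standard envelope results behind these two substitutions (due to Fazel) can be stated as follows; the unit-ball restrictions are part of the classical statements:

```latex
% Convex envelope facts underlying the substitutions (Fazel'02):
% each relaxation is tight from below on the corresponding unit ball.
\begin{align*}
\|P\|_1 &= \text{convex envelope of } \|P\|_0
          \ \text{on } \{P : \|P\|_\infty \le 1\},\\
\|Q\|_* &= \text{convex envelope of } \operatorname{rank}(Q)
          \ \text{on } \{Q : \|Q\|_2 \le 1\}.
\end{align*}
```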
Convex Formulation
A convex relaxation with the substituted convex envelopes:
$$\min_{Z,P,Q}\; \sum_{\ell=1}^{m} \sum_{i=1}^{n} L\!\left(z_\ell^T x_i^\ell,\, y_i^\ell\right) + \gamma \|P\|_1$$
$$\text{subject to} \quad Z = P + Q, \qquad \|Q\|_* \le \tau$$
The formulation is
• convex
• non-smooth
• constrained
It can be reformulated as a semidefinite program.
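A minimal sketch of evaluating this relaxed objective and checking feasibility; the least-squares loss, the function names, and the parameters are assumptions for illustration only:

```python
import numpy as np

def objective(P, Q, X_list, y_list, gamma):
    """Relaxed MTL objective: sum of per-task losses + gamma * ||P||_1.
    A least-squares loss stands in for the generic smooth convex loss L."""
    Z = P + Q
    loss = sum(np.sum((X @ Z[:, l] - y) ** 2)
               for l, (X, y) in enumerate(zip(X_list, y_list)))
    return loss + gamma * np.abs(P).sum()

def feasible(Q, tau):
    """Check the trace-norm constraint ||Q||_* <= tau."""
    return np.linalg.svd(Q, compute_uv=False).sum() <= tau
```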
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Projected Gradient Scheme
A compact form of the convex relaxation:
$$\min_{T \in \mathcal{M}}\; f(T) + g(T)$$
• Smooth term: $f(T) = \sum_{\ell=1}^{m} \sum_{i=1}^{n} L\!\left(z_\ell^T x_i^\ell,\, y_i^\ell\right)$
• Non-smooth term: $g(T) = \gamma \|T_P\|_1$
• Convex domain set: $\mathcal{M} = \{T = (Z, P, Q) : Z = P + Q,\; \|Q\|_* \le \tau\}$
The projected gradient scheme finds the optimal solution $T^*$ via a sequence
$$T_1 \to T_2 \to T_3 \to \cdots \to T^*$$
Projected Gradient Scheme
The key component in the projected gradient scheme:
$$T_{i+1} = \arg\min_{T \in \mathcal{M}}\; \frac{L_i}{2} \left\| T - \left( S_i - \frac{1}{L_i} \nabla f(S_i) \right) \right\|_F^2 + g(T)$$
where $T_{i+1}$ is the solution point, $S_i$ the searching point, and $1/L_i$ the step size.
• If $g(T) = 0$ and $\mathcal{M}$ is the whole space, this reduces to a standard gradient step:
$$T_{i+1} = S_i - \frac{1}{L_i} \nabla f(S_i)$$
Projected Gradient Scheme
For the convex relaxation, the key component is
$$\min_{T_P, T_Q}\; \frac{L}{2} \left\| T_P - \hat{S}_P \right\|_F^2 + \frac{L}{2} \left\| T_Q - \hat{S}_Q \right\|_F^2 + \gamma \|T_P\|_1 \quad \text{subject to} \quad \|T_Q\|_* \le \tau,$$
which decouples into two independent subproblems:
$$\min_{T_P}\; \frac{1}{2} \left\| T_P - \hat{S}_P \right\|_F^2 + \frac{\gamma}{L} \|T_P\|_1$$
• A closed-form solution (entrywise soft-thresholding).
$$\min_{T_Q}\; \frac{1}{2} \left\| T_Q - \hat{S}_Q \right\|_F^2 \quad \text{subject to} \quad \|T_Q\|_* \le \tau$$
• Solved via an SVD + a projection: with $\{\hat{\sigma}_i\}_{i=1}^q$ the singular values of $\hat{S}_Q$,
$$\min_{\{\sigma_i\}_{i=1}^q}\; \sum_{i=1}^{q} \left( \sigma_i - \hat{\sigma}_i \right)^2 \quad \text{subject to} \quad \sum_{i=1}^{q} \sigma_i \le \tau, \; \sigma_i \ge 0.$$
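A minimal sketch of the two subproblem solvers; the function names are mine, and the singular-value step uses the standard sorting-based projection onto the scaled simplex:

```python
import numpy as np

def soft_threshold(S_hat, thresh):
    """Closed-form solution of min_T 0.5*||T - S_hat||_F^2 + thresh*||T||_1:
    entrywise soft-thresholding."""
    return np.sign(S_hat) * np.maximum(np.abs(S_hat) - thresh, 0.0)

def project_l1_ball(v, tau):
    """Euclidean projection of a nonnegative vector v onto
    {s : s >= 0, sum(s) <= tau}, via the standard sorting-based scheme."""
    if v.sum() <= tau:
        return v
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - tau) / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)    # optimal shift
    return np.maximum(v - theta, 0.0)

def project_trace_ball(S_hat, tau):
    """Solve min_T 0.5*||T - S_hat||_F^2 s.t. ||T||_* <= tau
    via an SVD plus a projection of the singular values."""
    U, s, Vt = np.linalg.svd(S_hat, full_matrices=False)
    return U @ np.diag(project_l1_ball(s, tau)) @ Vt
```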
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Main Algorithms
Main components of the algorithms (see the sketch after this list):
• find the appropriate step size $1/L_i$ via line search
• solve the optimization
$$T_{i+1} = \arg\min_{T \in \mathcal{M}}\; \frac{L_i}{2} \left\| T - \left( S_i - \frac{1}{L_i} \nabla f(S_i) \right) \right\|_F^2 + g(T)$$
Projected gradient algorithm (PG)
• set $S_i = T_i$
• attains a convergence rate of $O(1/k)$
Accelerated projected gradient algorithm (AG)
• set $S_i = (1 + \alpha_i) T_i - \alpha_i T_{i-1}$
• attains a convergence rate of $O(1/k^2)$
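A minimal sketch of the PG/AG iterations, reusing `soft_threshold` and `project_trace_ball` from the earlier sketch; the Nesterov-style $\alpha_i$ schedule and the fixed constant L standing in for the line search are assumptions for illustration:

```python
import numpy as np

def pg_step(P, Q, grad_P, grad_Q, L, gamma, tau):
    """One projected gradient step on the pair T = (P, Q)."""
    P_new = soft_threshold(P - grad_P / L, gamma / L)
    Q_new = project_trace_ball(Q - grad_Q / L, tau)
    return P_new, Q_new

def solve(grad_f, P, Q, L, gamma, tau, iters=100, accelerate=True):
    """PG (accelerate=False) or AG (accelerate=True) iteration.
    grad_f maps (P, Q) -> (grad_P, grad_Q); the constant L plays the
    role of the line-searched 1/step-size on the slides."""
    P_prev, Q_prev, t = P, Q, 1.0
    for _ in range(iters):
        if accelerate:  # searching point S_i = (1+a_i) T_i - a_i T_{i-1}
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            alpha = (t - 1.0) / t_new
            S_P = P + alpha * (P - P_prev)
            S_Q = Q + alpha * (Q - Q_prev)
            t = t_new
        else:           # plain PG: S_i = T_i
            S_P, S_Q = P, Q
        P_prev, Q_prev = P, Q
        P, Q = pg_step(S_P, S_Q, *grad_f(S_P, S_Q), L, gamma, tau)
    return P, Q
```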
Outline
Introduction
Multi-Task Learning Formulations
Projected Gradient Scheme
Main Algorithms
Experiments
Performance Evaluation
Key observations
• Incoherent structure improves performance
• Sparse structure is effective on multimedia data
• Low-rank structure is important on image and gene data
Efficiency Comparison
PG: $O(1/k)$ vs. AG: $O(1/k^2)$
• AG is more efficient than PG for solving the proposed MTL formulation.
Conclusion
Main Contributions
• Propose the MTL formulation and efficient algorithms
• Conduct experiments for demonstration
Future Work
• Conduct theoretical analysis of the MTL formulation
• Apply the MTL formulation to real-world applications