
Discriminative, Unsupervised, Convex Learning
Dale Schuurmans
Department of Computing Science
University of Alberta
MITACS Workshop, August 26, 2005
Current Research Group
• PhD Tao Wang: reinforcement learning
• PhD Ali Ghodsi: dimensionality reduction
• PhD Dana Wilkinson: action-based embedding
• PhD Yuhong Guo: ensemble learning
• PhD Feng Jiao: bioinformatics
• PhD Jiayuan Huang: transduction on graphs
• PhD Qin Wang: statistical natural language
• PhD Adam Milstein: robotics, particle filtering
• PhD Dan Lizotte: optimization, everything
• PhD Linli Xu: unsupervised SVMs
• PDF Li Cheng: computer vision
Current Research Group
• PhD Tao Wang: reinforcement learning
• PhD Dana Wilkinson: action-based embedding
• PhD Feng Jiao: bioinformatics
• PhD Qin Wang: statistical natural language
• PhD Dan Lizotte: optimization, everything
• PDF Li Cheng: computer vision
Today I will talk about:
One Current Research Direction
Learning Sequence Classifiers (HMMs)
• Discriminative
• Unsupervised
• Convex
EM?
Outline
• Unsupervised SVMs
• Discriminative, unsupervised, convex HMMs
• Tao, Dana, Feng, Qin, Dan, Li
Unsupervised Support Vector Machines
Joint work with Linli Xu
Main Idea
• Unsupervised SVMs (and semi-supervised SVMs)
• Harder computational problem than supervised SVMs
• Convex relaxation: a semidefinite program (polynomial time)
Background: Two-class SVM
• Supervised classification learning
• Labeled data → linear discriminant $w \cdot x + b = 0$
• Classification rule: $y = \mathrm{sgn}(w \cdot x + b)$
[Figure: linearly separable data with several candidate hyperplanes]
Some better than others?
Maximum Margin Linear Discriminant
Choose a linear discriminant $w \cdot x + b = 0$ to maximize
$$\min_{(x_i,\, y_i)} \; \mathrm{dist}\big((x_i, y_i),\; \text{plane } w \cdot x + b = 0\big)$$
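To make the objective concrete, here is a minimal numpy sketch (my illustration, not from the talk) of the quantity being maximized; the names X, y, w, b are assumptions.

import numpy as np

def geometric_margin(X, y, w, b):
    # Signed distance of each labeled point (x_i, y_i) to the plane
    # w.x + b = 0; positive when the point lies on its correct side.
    # The maximum margin discriminant maximizes the smallest of these.
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)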
Unsupervised Learning
• Given unlabeled data, how to infer classifications?
• Organize objects into groups: clustering
Idea: Maximum Margin Clustering
• Given unlabeled data, find the maximum margin separating hyperplane
• Clusters the data
• Constraint: class balance, bounding the difference in sizes between the classes
Challenge
• Find a label assignment that results in a large margin
• Hard
• Convex relaxation based on semidefinite programming
How to Derive Unsupervised SVM?
Two-class case:
1. Start with the supervised algorithm. Given a vector of assignments $y$, solve
$$\omega^{*2} \;=\; \max_{\lambda}\; \lambda^\top e \;-\; \tfrac{1}{2}\,\langle K \circ \lambda\lambda^\top,\; y y^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
(the inverse squared margin)
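As a concrete reading of this dual, the following numpy sketch (an illustration under my own naming, not the talk's solver) approximates the objective for a fixed labeling by projected gradient ascent over the box constraint:

import numpy as np

def svm_dual_value(K, y, steps=5000):
    # Approximates max_lambda lambda'e - 1/2 <K o lambda lambda', y y'>
    # subject to 0 <= lambda <= 1, for fixed y in {-1,+1}^n.
    Q = K * np.outer(y, y)                          # K o (y y^T)
    n = len(y)
    lam = np.full(n, 0.5)
    step = 1.0 / max(np.linalg.norm(Q, 2), 1e-12)   # 1/L step for smooth ascent
    for _ in range(steps):
        grad = np.ones(n) - Q @ lam                 # gradient of the objective
        lam = np.clip(lam + step * grad, 0.0, 1.0)  # project onto the box
    return lam.sum() - 0.5 * lam @ Q @ lam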
How to Derive Unsupervised SVM?
2. Think of $\omega^{*2}$ as a function of $y$. If given $y$, would then solve
$$\omega^{*2}(y) \;=\; \max_{\lambda}\; \lambda^\top e \;-\; \tfrac{1}{2}\,\langle K \circ \lambda\lambda^\top,\; y y^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Goal: choose $y$ to minimize the inverse squared margin.
Problem: not a convex function of $y$.
How to Derive Unsupervised SVM?
3. Re-express the problem with indicators comparing $y$ labels. New variable: the equivalence relation matrix $M = y y^\top$,
$$M_{ij} \;=\; \begin{cases} +1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$
If given $M$, would then solve
$$\omega^{*2}(M) \;=\; \max_{\lambda}\; \lambda^\top e \;-\; \tfrac{1}{2}\,\langle K \circ \lambda\lambda^\top,\; M \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Note: $\omega^{*2}(M)$ is a convex function of $M$, since a maximum of linear functions is convex.
How to Derive Unsupervised SVM?
4. Get a constrained optimization problem. Solve for $M$:
$$\min_{M}\; \omega^{*2}(M) \quad \text{subject to} \quad M \in \{-1, 1\}^{n \times n},\;\; M = y y^\top,\;\; -\epsilon\, e \le M e \le \epsilon\, e \;\;\text{(class balance)}$$
Not convex! But $M \in \{-1, 1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$ and $\mathrm{diag}(M) = e$, so the problem can be rewritten as
$$\min_{M}\; \omega^{*2}(M) \quad \text{subject to} \quad M \in \{-1, 1\}^{n \times n},\;\; M \succeq 0,\;\; \mathrm{diag}(M) = e,\;\; -\epsilon\, e \le M e \le \epsilon\, e$$
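A quick numeric check of this characterization (my illustration): any $M = y y^\top$ with $y \in \{-1,+1\}^n$ has unit diagonal and is positive semidefinite.

import numpy as np

y = np.array([1, -1, -1, 1])
M = np.outer(y, y)                              # M_ij = y_i y_j
assert np.all(np.diag(M) == 1)                  # diag(M) = e
assert np.all(np.linalg.eigvalsh(M) >= -1e-12)  # M is PSD (rank one)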
How to Derive Unsupervised SVM?
5. Relax the indicator variables $M \in \{-1,1\}^{n \times n}$ to $M \in [-1,1]^{n \times n}$ to obtain a convex optimization problem. Solve for $M$:
$$\min_{M}\; \omega^{*2}(M) \quad \text{subject to} \quad M \in [-1, 1]^{n \times n},\;\; M \succeq 0,\;\; \mathrm{diag}(M) = e,\;\; -\epsilon\, e \le M e \le \epsilon\, e$$
A semidefinite program (polynomial time).
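For intuition about where the semidefinite program comes from, here is one way the relaxation can be written out explicitly in cvxpy: dualize the inner maximization over $\lambda$ (multipliers $\mu, \nu \ge 0$ for the box constraints) and apply a Schur complement. This is a hedged sketch of those standard steps, not the talk's implementation; all names are mine.

import numpy as np
import cvxpy as cp

def max_margin_clustering_sdp(K, eps):
    n = K.shape[0]
    M = cp.Variable((n, n), symmetric=True)
    mu = cp.Variable(n, nonneg=True)        # multiplier for lambda >= 0
    nu = cp.Variable(n, nonneg=True)        # multiplier for lambda <= 1
    delta = cp.Variable()                   # upper-bounds omega*2(M)
    z = np.ones(n) + mu - nu
    # Schur complement encoding of: delta - nu'e >= 1/2 z'(K o M)^{-1} z
    lmi = cp.bmat([[cp.multiply(K, M), cp.reshape(z, (n, 1))],
                   [cp.reshape(z, (1, n)),
                    cp.reshape(2 * (delta - cp.sum(nu)), (1, 1))]])
    cons = [(lmi + lmi.T) / 2 >> 0,         # symmetrized so cvxpy accepts the LMI
            M >> 0, cp.diag(M) == 1,        # relaxed equivalence relation
            M >= -1, M <= 1,                # relaxed indicators
            cp.abs(M @ np.ones(n)) <= eps]  # class balance
    cp.Problem(cp.Minimize(delta), cons).solve()
    return M.value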
Multi-class Unsupervised SVM?
1. Start with a supervised algorithm (Crammer & Singer 01). Given a vector of assignments $y$, solve
$$\omega(y) \;=\; \max_{\lambda}\; n \;-\; \sum_{i,r} \lambda_{ir}\, y_{i,r} \;-\; \tfrac{1}{2} \sum_{i,j} K_{ij}\, (\mathbf{1}_{y_i} - \lambda_i)^\top (\mathbf{1}_{y_j} - \lambda_j) \quad \text{subject to } \lambda_i \ge 0,\; \lambda_i^\top e = 1 \;\; \forall i$$
(the margin loss; $\mathbf{1}_{y_i}$ is the indicator vector of class $y_i$ and $y_{i,r} = 1(y_i = r)$)
Multi-class Unsupervised SVM?
2. Think of $\omega$ as a function of $y$. If given $y$, would then solve
$$\omega(y) \;=\; \max_{\lambda}\; n \;-\; \sum_{i,r} \lambda_{ir}\, y_{i,r} \;-\; \tfrac{1}{2} \sum_{i,j} K_{ij}\, (\mathbf{1}_{y_i} - \lambda_i)^\top (\mathbf{1}_{y_j} - \lambda_j) \quad \text{subject to } \lambda_i \ge 0,\; \lambda_i^\top e = 1 \;\; \forall i$$
Goal: choose $y$ to minimize the margin loss.
Problem: not a convex function of $y$. (Crammer & Singer 01)
Multi-class Unsupervised SVM?
3. Re-express the problem with indicators comparing $y$ labels. New variables $M$ and $D$:
$$M_{ij} = 1(y_i = y_j), \qquad D_{ir} = 1(y_i = r), \qquad M = D D^\top$$
If given $M$ and $D$, would then solve
$$\omega(M, D) \;=\; \max_{\Lambda}\; Q(\Lambda, M, D) \quad \text{subject to } \Lambda \ge 0,\; \Lambda e = e$$
where
$$Q(\Lambda, M, D) \;=\; n \;-\; \langle D, \Lambda \rangle \;+\; \langle K D, \Lambda \rangle \;-\; \tfrac{1}{2} \langle \Lambda \Lambda^\top, K \rangle \;-\; \tfrac{1}{2} \langle K, M \rangle$$
(the margin loss; a convex function of $M$ and $D$)
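Translated literally into numpy (illustrative names; shapes Lam, D: n x k and K, M: n x n), the objective reads:

import numpy as np

def margin_loss_Q(Lam, M, D, K):
    # Q(Lambda, M, D) with <A, B> the trace inner product sum(A * B).
    n = K.shape[0]
    return (n - np.sum(D * Lam) + np.sum((K @ D) * Lam)
            - 0.5 * np.sum((Lam @ Lam.T) * K) - 0.5 * np.sum(K * M))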
Multi-class Unsupervised SVM?
4. Get a constrained optimization problem. Solve for $M$ and $D$:
$$\min_{M, D}\; \omega(M, D) \quad \text{subject to} \quad M = D D^\top,\;\; \mathrm{diag}(M) = e,\;\; M \in \{0,1\}^{n \times n},\;\; D \in \{0,1\}^{n \times k}$$
$$\left(\tfrac{1}{k} - \epsilon\right) n\, e \;\le\; M e \;\le\; \left(\tfrac{1}{k} + \epsilon\right) n\, e \quad \text{(class balance)}$$
Multi-class Unsupervised SVM?
5. Relax the indicator variables to obtain a convex optimization problem. Solve for $M$ and $D$:
$$\min_{M, D}\; \omega(M, D) \quad \text{subject to} \quad M \succeq D D^\top,\;\; \mathrm{diag}(M) = e,\;\; 0 \le M \le 1,\;\; 0 \le D \le 1$$
$$\left(\tfrac{1}{k} - \epsilon\right) n\, e \;\le\; M e \;\le\; \left(\tfrac{1}{k} + \epsilon\right) n\, e$$
A semidefinite program.
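A sketch of how this relaxed feasible set might be posed in cvxpy: the non-convex $M = D D^\top$ is weakened to $M \succeq D D^\top$, which a Schur complement turns into a linear matrix inequality. Hedged illustration; the function and variable names are mine.

import numpy as np
import cvxpy as cp

def relaxed_multiclass_set(n, k, eps):
    M = cp.Variable((n, n), symmetric=True)
    D = cp.Variable((n, k))
    # [[I, D'], [D, M]] >> 0  <=>  M - D D' >> 0  (Schur complement)
    schur = cp.bmat([[np.eye(k), D.T], [D, M]])
    cons = [(schur + schur.T) / 2 >> 0,
            cp.diag(M) == 1,
            M >= 0, M <= 1, D >= 0, D <= 1,
            M @ np.ones(n) >= (1.0 / k - eps) * n,   # class balance
            M @ np.ones(n) <= (1.0 / k + eps) * n]
    return M, D, cons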
Experimental Results
[Figure: clustering results for k-means, spectral clustering, and the semidefinite (SemiDef) method]
[Figure: additional clustering results]
[Table: percentage of misclassification errors on the digit dataset]
Extension to Semi-Supervised Algorithm
Matrix $M$: clamp the labeled block. For labeled examples $i, j \in \{1, \dots, t\}$, fix
$$M_{ij} = y_i y_j \quad \text{(clamped)}$$
and constrain each unlabeled row $i \in \{t+1, \dots, n\}$ through its sum against the labeled block, $\sum_{j=1}^{t} M_{ij}$.
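In the same cvxpy style, the semi-supervised variant just adds clamping constraints on the labeled block (a sketch; y_lab holding the t known +/-1 labels is my naming):

import numpy as np

def clamp_labeled_block(M, y_lab):
    # Fix M_ij = y_i y_j on the labeled t x t block of the cvxpy variable M.
    t = len(y_lab)
    return [M[:t, :t] == np.outer(y_lab, y_lab)]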
Experimental Results
[Table: percentage of misclassification errors on the face dataset]
[Figure: additional results]
Discriminative, Unsupervised, Convex HMMs
Joint work with Linli Xu
With help from Li Cheng and Tao Wang
Hidden Markov Model
Must coordinate the local classifiers $y_i = f(x_i)$.
[Diagram: a chain of "hidden" states $y_1 \to y_2 \to y_3$ emitting observations $x_1, x_2, x_3$]
• Joint probability model $P(\mathbf{x}, \mathbf{y})$
• Viterbi classifier $\arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})$
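For reference, a minimal log-space Viterbi decoder (my sketch) for $\arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})$, with initial distribution pi, transitions T, and emissions E as assumed names:

import numpy as np

def viterbi(x, pi, T, E):
    # x: observation indices; pi: (S,); T: (S, S); E: (S, O).
    S, n = len(pi), len(x)
    logd = np.log(pi) + np.log(E[:, x[0]])   # best log-score per end state
    back = np.zeros((n, S), dtype=int)       # back-pointers
    for t in range(1, n):
        scores = logd[:, None] + np.log(T)   # scores[i, j]: state i -> state j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(E[:, x[t]])
    y = [int(logd.argmax())]
    for t in range(n - 1, 0, -1):            # follow back-pointers
        y.append(int(back[t, y[-1]]))
    return y[::-1]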
HMM Training: Supervised
• Given $(\mathbf{x}^1, \mathbf{y}^1), (\mathbf{x}^2, \mathbf{y}^2), \dots, (\mathbf{x}^n, \mathbf{y}^n)$
Maximum likelihood:
$$\max \prod_{i=1}^{n} P(\mathbf{x}^i, \mathbf{y}^i) \;=\; \max \prod_{i=1}^{n} P(\mathbf{y}^i \mid \mathbf{x}^i)\, P(\mathbf{x}^i)$$
(the $P(\mathbf{x}^i)$ factor models the input distribution)
Conditional likelihood:
$$\max \prod_{i=1}^{n} P(\mathbf{y}^i \mid \mathbf{x}^i)$$
Discriminative (CRFs)
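A counting view of the supervised case (a minimal sketch under my own naming): with the state sequences observed, joint maximum likelihood reduces to normalized transition and emission counts.

import numpy as np

def fit_hmm_supervised(pairs, S, O, smooth=1e-3):
    # pairs: list of (x, y) integer sequences; S states, O symbols.
    pi = np.full(S, smooth)                   # initial-state counts
    T = np.full((S, S), smooth)               # transition counts
    E = np.full((S, O), smooth)               # emission counts
    for x, y in pairs:
        pi[y[0]] += 1
        for t in range(len(y)):
            E[y[t], x[t]] += 1
            if t + 1 < len(y):
                T[y[t], y[t + 1]] += 1
    return (pi / pi.sum(),
            T / T.sum(axis=1, keepdims=True),
            E / E.sum(axis=1, keepdims=True))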
HMM Training: Unsupervised
• Given only $\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^n$
• Now what? EM!
Marginal likelihood:
$$\max \prod_{i=1}^{n} P(\mathbf{x}^i)$$
Exactly the part we don't care about.
39
HMM Training: Unsupervised
 Given only
x1 , x 2 ,... x n
The problem with EM:




Not convex
Wrong objective
Too popular
Doesn’t work
40
HMM Training: Unsupervised
• Given only $\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^n$
The dream:
• Convex training
• Discriminative training: $P(\mathbf{y} \mid \mathbf{x})$
When will someone invent unsupervised CRFs?
HMM Training: Unsupervised
• Given only $\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^n$
The question:
• How to learn $P(\mathbf{y} \mid \mathbf{x})$ effectively without seeing any $\mathbf{y}$'s?
The answer:
• That's what we already did!
• Unsupervised SVMs
HMM Training: Unsupervised
• Given only $\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^n$
The plan:

                       supervised       unsupervised
  single $y$:          SVM         →    unsupervised SVM
  sequence $\mathbf{y}$:  M3N      →    ?
M3N: Max Margin Markov Nets
• Relational SVMs
[Diagram: chain $y_1 - y_2 - y_3$ with observations $x_1, x_2, x_3$ and pairwise potentials $f_x(y_1, y_2)$]
• Supervised training:
  • Given $(\mathbf{x}^1, \mathbf{y}^1), (\mathbf{x}^2, \mathbf{y}^2), \dots, (\mathbf{x}^n, \mathbf{y}^n)$
  • Solve a factored QP
Unsupervised M3Ns
• Strategy:
  • Start with the supervised M3N QP
  • Re-express $y$-labels in local $M, D$ equivalence relations
  • Impose class balance
  • Relax non-convex constraints
• Then solve a really big SDP
  • But still polynomial size
Unsupervised M3Ns
• SDP
[The full semidefinite program was shown on the slide]
Some Initial Results
• Synthetic HMM
• Protein secondary structure prediction
Current Research Group
• PhD Tao Wang: reinforcement learning
• PhD Dana Wilkinson: action-based embedding
• PhD Feng Jiao: bioinformatics
• PhD Qin Wang: statistical natural language
• PhD Dan Lizotte: optimization, everything
• PDF Li Cheng: computer vision
Brief Research Background
• Sequential PAC Learning
• Linear Classifiers: Boosting, SVMs
• Metric-Based Model Selection
• Greedy Importance Sampling
• Adversarial Optimization & Search
• Large Markov Decision Processes