Discriminative, Unsupervised, Convex Learning
Dale Schuurmans
Department of Computing Science
University of Alberta
MITACS Workshop, August 26, 2005
Current Research Group
PhD Tao Wang: reinforcement learning
PhD Ali Ghodsi: dimensionality reduction
PhD Dana Wilkinson: action-based embedding
PhD Yuhong Guo: ensemble learning
PhD Feng Jiao: bioinformatics
PhD Jiayuan Huang: transduction on graphs
PhD Qin Wang: statistical natural language
PhD Adam Milstein: robotics, particle filtering
PhD Dan Lizotte: optimization, everything
PhD Linli Xu: unsupervised SVMs
PDF Li Cheng: computer vision
Current Research Group
PhD Tao Wang: reinforcement learning
PhD Dana Wilkinson: action-based embedding
PhD Feng Jiao: bioinformatics
PhD Qin Wang: statistical natural language
PhD Dan Lizotte: optimization, everything
PDF Li Cheng: computer vision
Today I will talk about:
One Current Research Direction
Learning Sequence Classifiers (HMMs)
Discriminative
Unsupervised
Convex
EM?
Outline
Unsupervised SVMs
Discriminative, unsupervised, convex HMMs
Tao, Dana, Feng, Qin, Dan, Li
Unsupervised Support Vector Machines
Joint work with Linli Xu
Main Idea
Unsupervised SVMs (and semi-supervised SVMs)
A harder computational problem than SVMs
Convex relaxation: a semidefinite program (polynomial time)
Background: Two-class SVM
Supervised classification learning
Labeled data: find a linear discriminant $w^\top x + b = 0$
Classification rule: $y = \operatorname{sgn}(w^\top x + b)$
Many planes separate the data; are some better than others?
Maximum Margin Linear Discriminant
Choose a linear discriminant $w^\top x + b = 0$ to maximize
$$\min_{(x_i, y_i)} \operatorname{dist}\big((x_i, y_i),\ \text{plane } w^\top x + b = 0\big)$$
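A small numerical illustration of the two slides above, on hypothetical toy data: the classification rule, and the signed distance from a plane to its closest labeled point (the quantity the maximum margin discriminant maximizes). Two planes that both separate the data get different margins, which is one concrete sense in which some are better than others. The function names and data are illustrative only, not from the talk.

```python
import numpy as np

def classify(w, b, X):
    """Classification rule y = sgn(w.x + b), applied to each row of X."""
    return np.sign(X @ w + b)

def geometric_margin(w, b, X, y):
    """min_i y_i (w.x_i + b) / ||w||: distance from the plane to the closest
    labeled point, positive only if every point is on its correct side."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Hypothetical toy data: two labeled points per class.
X = np.array([[-2.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Two planes that both separate the data.
w1, b1 = np.array([1.0, 0.0]), 0.0     # the plane x_1 = 0
w2, b2 = np.array([1.0, 1.0]), 0.5     # a tilted plane
print(classify(w1, b1, X))             # agrees with y
print(geometric_margin(w1, b1, X, y))  # 1.0
print(geometric_margin(w2, b2, X, y))  # about 0.354: a smaller margin
```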
Unsupervised Learning
Given unlabeled data, how to infer classifications?
Organize objects into groups: clustering
Idea: Maximum Margin Clustering
Given unlabeled data, find the maximum margin separating hyperplane
This clusters the data
Constraint: class balance (bound the difference in sizes between classes)
Challenge
Find the label assignment that results in a large margin
Hard
Convex relaxation: based on semidefinite programming
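How to Derive Unsupervised SVM?
Two-class case:
1. Start with the supervised algorithm
Given a vector of assignments $y$, solve for the inverse squared margin
$$\gamma^{*-2} = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ yy^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
($K$: kernel matrix; $\circ$: elementwise product; $e$: vector of ones)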
How to Derive Unsupervised SVM?
2. Think of $\gamma^{*-2}$ as a function of $y$
If given $y$, would then solve
$$\gamma^{*-2}(y) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ yy^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Goal: choose $y$ to minimize the inverse squared margin
Problem: not a convex function of $y$
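For tiny problems the goal on this slide can be met by brute force: evaluate the inverse squared margin for every balanced labeling and keep the smallest. This sketch assumes a linear kernel, exact balance ($e^\top y = 0$), and hypothetical data; it solves the box-constrained dual (no bias term, as in the slide's dual) by projected gradient ascent. The names `inverse_sq_margin` and `brute_force_mmc` are illustrative. The loop visits $O(2^n)$ labelings, which is the cost the convex relaxation is meant to avoid.

```python
import itertools
import numpy as np

def inverse_sq_margin(K, y, iters=5000):
    """Approximate gamma*^{-2}(y) = max_{0 <= lam <= 1} lam.e - 1/2 <K o lam lam^T, y y^T>.

    Projected gradient ascent on the concave dual: take a gradient step, then clip
    back into the box [0, 1]^n. No bias term, matching the dual on the slide.
    """
    A = K * np.outer(y, y)                    # K o y y^T (elementwise product)
    step = 1.0 / np.linalg.eigvalsh(K).max()  # 1/L; A and K share eigenvalues since y is +-1
    lam = np.zeros(len(y))
    for _ in range(iters):
        lam = np.clip(lam + step * (1.0 - A @ lam), 0.0, 1.0)
    return lam.sum() - 0.5 * lam @ A @ lam

def brute_force_mmc(K, balance=0):
    """Return (value, y) minimizing gamma*^{-2}(y) over labelings with |e.y| <= balance.

    y and -y give the same y y^T, so y[0] is fixed to +1. The loop is O(2^n).
    """
    n = K.shape[0]
    best = None
    for rest in itertools.product([-1.0, 1.0], repeat=n - 1):
        y = np.array((1.0,) + rest)
        if abs(y.sum()) > balance:
            continue
        val = inverse_sq_margin(K, y)
        if best is None or val < best[0]:
            best = (val, y)
    return best

# Hypothetical data: two well separated groups of three points.
X = np.array([[-3.0, 0.0], [-2.5, 0.5], [-3.0, 1.0],
              [3.0, 0.0], [2.5, 0.5], [3.0, 1.0]])
K = X @ X.T                                   # linear kernel
val, y_best = brute_force_mmc(K)
print(y_best, round(val, 3))  # expect left group vs right group, value about 0.08
```

Fixing `y[0]` only removes the $y \leftrightarrow -y$ symmetry; the search is still exponential in $n$.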
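How to Derive Unsupervised SVM?
3. Re-express the problem with indicators comparing $y$ labels
New variables: $M = yy^\top$, an equivalence relation matrix
$$M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \neq y_j \end{cases}$$
If given $y$, would then solve
$$\gamma^{*-2}(y) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ yy^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$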
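How to Derive Unsupervised SVM?
3. Re-express the problem with indicators comparing $y$ labels
New variables: $M = yy^\top$, an equivalence relation matrix ($M_{ij} = 1$ if $y_i = y_j$, $M_{ij} = -1$ if $y_i \neq y_j$)
If given $M$, would then solve
$$\gamma^{*-2}(M) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ M \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Note: this is a convex function of $M$. For each fixed $\lambda$ the objective is linear in $M$, and the maximum of linear functions is convex.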
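The convexity argument can be checked numerically. This sketch replaces the maximization over the whole box $0 \le \lambda \le 1$ with a maximum over a fixed random sample of feasible $\lambda$'s (an assumption made for illustration); the result is still a pointwise maximum of functions that are affine in $M$, so midpoint convexity must hold for any symmetric $M_1, M_2$. The kernel, the sample, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.normal(size=(n, 2))
K = X @ X.T                                     # a PSD (linear) kernel matrix
lams = rng.uniform(0.0, 1.0, size=(200, n))     # finite sample of lambdas in [0, 1]^n

def h(lam, M):
    """Objective for one fixed lambda: lam.e - 1/2 <K o lam lam^T, M>, affine in M."""
    return lam.sum() - 0.5 * np.sum(K * np.outer(lam, lam) * M)

def g(M):
    """Pointwise max of affine functions of M, hence convex in M.
    Maximizing over all of [0, 1]^n instead of the sample gives gamma*^{-2}(M)."""
    return max(h(lam, M) for lam in lams)

def random_sym(n):
    """A random symmetric matrix with entries in [-1, 1]."""
    A = rng.uniform(-1.0, 1.0, size=(n, n))
    return (A + A.T) / 2

# Midpoint convexity: g((M1 + M2)/2) <= (g(M1) + g(M2))/2 on every pair tried.
for _ in range(100):
    M1, M2 = random_sym(n), random_sym(n)
    assert g((M1 + M2) / 2) <= (g(M1) + g(M2)) / 2 + 1e-9
print("midpoint convexity held on all sampled pairs")
```

The same check in terms of $y$ would fail in general, because $M = yy^\top$ is quadratic in $y$.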
How to Derive Unsupervised SVM?
4. Get a constrained optimization problem
Solve for $M$:
$$\min_{M}\ \gamma^{*-2}(M)$$
subject to $M \in \{-1, 1\}^{n \times n}$, $M = yy^\top$ (not convex!)
Class balance: $-\ell e \le M e \le \ell e$
$M \in \{-1, 1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$ and $\operatorname{diag}(M) = e$
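The characterization on this slide, together with the class-balance constraint, can be tested directly on small matrices. This is a sketch with hypothetical helper names and data, and an assumed balance bound $\ell = 1$. A two-class labeling gives $M = yy^\top$, which passes; a non-transitive $\pm 1$ matrix (1 related to 2, 2 related to 3, but 1 and 3 marked different) fails the positive semidefiniteness test.

```python
import numpy as np

def is_equivalence_encoding(M, tol=1e-9):
    """Slide criterion for a {-1,1} matrix: M is positive semidefinite and diag(M) = e."""
    M = np.asarray(M, dtype=float)
    if not np.allclose(M, M.T):
        return False
    return bool(np.linalg.eigvalsh(M).min() >= -tol and np.allclose(np.diag(M), 1.0))

def is_balanced(M, ell):
    """Class balance: -ell*e <= M e <= ell*e (for M = y y^T this is |e.y| <= ell)."""
    Me = np.asarray(M, dtype=float) @ np.ones(len(M))
    return bool(np.all(np.abs(Me) <= ell))

y = np.array([1, 1, -1, -1, 1])
M = np.outer(y, y)                                       # two classes: M = y y^T
print(is_equivalence_encoding(M), is_balanced(M, ell=1))  # True True

# Not transitive: 1~2 and 2~3, but 1 and 3 are marked as different.
M_bad = np.array([[1, 1, -1],
                  [1, 1, 1],
                  [-1, 1, 1]])
print(is_equivalence_encoding(M_bad))   # False: (1, -1, 1) is an eigenvector with eigenvalue -1
```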
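How to Derive Unsupervised SVM?
4. Get a constrained optimization problem
Solve for $M$:
$$\min_{M}\ \gamma^{*-2}(M)$$
subject to $M \in \{-1, 1\}^{n \times n}$, $M \succeq 0$, $\operatorname{diag}(M) = e$, $-\ell e \le M e \le \ell e$
($M \in \{-1, 1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$ and $\operatorname{diag}(M) = e$)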
How to Derive Unsupervised SVM?
5. Relax the indicator variables to obtain a convex optimization problem
Solve for $M$:
$$\min_{M}\ \gamma^{*-2}(M)$$
subject to $M \in [-1, 1]^{n \times n}$, $M \succeq 0$, $\operatorname{diag}(M) = e$, $-\ell e \le M e \le \ell e$
Semidefinite program
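The slides state that the relaxed problem is a semidefinite program without showing the conversion. One standard route is sketched below, under the assumption that $K \circ M$ is positive definite; the multipliers $\mu, \nu$ and the epigraph variable $t$ are introduced only for this sketch. It dualizes the inner maximization and then applies a Schur complement.

```latex
% Inner problem for fixed M, writing A = K \circ M (assumed positive definite);
% note \lambda^\top A \lambda = \langle K \circ \lambda\lambda^\top, M \rangle:
\gamma^{*-2}(M) = \max_{0 \le \lambda \le 1}\ \lambda^\top e - \tfrac{1}{2}\lambda^\top A \lambda
% Multipliers \mu \ge 0 for \lambda \ge 0 and \nu \ge 0 for \lambda \le 1; strong duality
% holds for this convex QP, and the Lagrangian is maximized at \lambda = A^{-1}(e + \mu - \nu):
\gamma^{*-2}(M) = \min_{\mu \ge 0,\ \nu \ge 0}\ \tfrac{1}{2}(e + \mu - \nu)^\top A^{-1}(e + \mu - \nu) + \nu^\top e
% An epigraph variable t and a Schur complement give one SDP over (M, \mu, \nu, t):
\min_{M, \mu, \nu, t}\ t
\quad \text{s.t.} \quad
\begin{bmatrix} K \circ M & e + \mu - \nu \\ (e + \mu - \nu)^\top & 2(t - \nu^\top e) \end{bmatrix} \succeq 0,
\quad \mu \ge 0,\ \nu \ge 0,
\quad M \in [-1, 1]^{n \times n},\ M \succeq 0,\ \operatorname{diag}(M) = e,\ -\ell e \le M e \le \ell e
```

Every constraint is either linear in $(M, \mu, \nu, t)$ or a linear matrix inequality, so the joint minimization is a semidefinite program of polynomial size.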
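Multi-class Unsupervised SVM?
1. Start with the supervised algorithm
Given a vector of assignments $y$, solve (margin loss)
$$\max_{\Lambda}\ \sum_{i,r} \lambda_{ir}\big(1 - \delta_{r, y_i}\big) - \frac{1}{2\beta}\sum_{i,j} K_{ij}\,(\mathbf{1}_{y_i} - \lambda_i)^\top(\mathbf{1}_{y_j} - \lambda_j)$$
subject to $\lambda_i \ge 0$, $\lambda_i^\top e = 1$ for all $i$
($\lambda_i$: row $i$ of $\Lambda$; $\mathbf{1}_{y_i}$: indicator vector of class $y_i$; $\beta$: regularization constant)
(Crammer & Singer 01)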
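Multi-class Unsupervised SVM?
2. Think of the margin loss as a function of $y$
If given $y$, would then solve
$$\zeta(y) = \max_{\Lambda}\ \sum_{i,r} \lambda_{ir}\big(1 - \delta_{r, y_i}\big) - \frac{1}{2\beta}\sum_{i,j} K_{ij}\,(\mathbf{1}_{y_i} - \lambda_i)^\top(\mathbf{1}_{y_j} - \lambda_j)$$
subject to $\lambda_i \ge 0$, $\lambda_i^\top e = 1$ for all $i$
Goal: choose $y$ to minimize the margin loss
Problem: not a convex function of $y$
(Crammer & Singer 01)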
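Multi-class Unsupervised SVM?
3. Re-express the problem with indicators comparing $y$ labels
New variables: $M$ and $D$, with $M_{ij} = 1(y_i = y_j)$ and $D_{ir} = 1(y_i = r)$, so that $M = DD^\top$
If given $y$, would then solve
$$\zeta(y) = \max_{\Lambda}\ \sum_{i,r} \lambda_{ir}\big(1 - \delta_{r, y_i}\big) - \frac{1}{2\beta}\sum_{i,j} K_{ij}\,(\mathbf{1}_{y_i} - \lambda_i)^\top(\mathbf{1}_{y_j} - \lambda_j)$$
subject to $\lambda_i \ge 0$, $\lambda_i^\top e = 1$ for all $i$
(Crammer & Singer 01)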
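Multi-class Unsupervised SVM?
3. Re-express the problem with indicators comparing $y$ labels
New variables: $M$ and $D$, with $M_{ij} = 1(y_i = y_j)$, $D_{ir} = 1(y_i = r)$, and $M = DD^\top$
If given $M$ and $D$, would then solve
$$\zeta(M, D) = \max_{\Lambda}\ Q(\Lambda, M, D) \quad \text{subject to } \Lambda \ge 0,\ \Lambda e = e$$
where
$$Q(\Lambda, M, D) = n - \langle D, \Lambda \rangle + \frac{1}{\beta}\langle KD, \Lambda \rangle - \frac{1}{2\beta}\langle K, M \rangle - \frac{1}{2\beta}\langle \Lambda\Lambda^\top, K \rangle$$
A convex function of $M$ and $D$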
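The rewrite from $y$ to $(M, D)$ can be verified numerically: with $M = DD^\top$ and rows of $\Lambda$ summing to one, the label-based objective and $Q(\Lambda, M, D)$ agree. This is a sketch on random hypothetical data; the value of $\beta$ is an arbitrary assumption. Because $Q$ is linear in $M$ and $D$ for fixed $\Lambda$, its maximum over $\Lambda$ is convex in $(M, D)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, beta = 6, 3, 0.5                 # beta: regularization constant (assumed value)
X = rng.normal(size=(n, 2))
K = X @ X.T                            # linear kernel
labels = rng.integers(0, k, size=n)
D = np.eye(k)[labels]                  # D_ir = 1(y_i = r)
M = D @ D.T                            # M_ij = 1(y_i = y_j)
Lam = rng.uniform(size=(n, k))
Lam /= Lam.sum(axis=1, keepdims=True)  # feasible: Lam >= 0, Lam e = e

def supervised_objective(Lam):
    """Margin-loss objective written with the labels (through D's rows 1_{y_i})."""
    loss = np.sum(Lam * (1.0 - D))               # sum_{i,r} lam_ir (1 - delta(r, y_i))
    R = D - Lam                                  # row i: 1_{y_i} - lam_i
    return loss - np.sum(K * (R @ R.T)) / (2 * beta)

def Q(Lam, M, D):
    """The same objective written only through M and D."""
    return (n - np.sum(D * Lam) + np.sum((K @ D) * Lam) / beta
            - np.sum(K * M) / (2 * beta) - np.sum(K * (Lam @ Lam.T)) / (2 * beta))

print(np.isclose(supervised_objective(Lam), Q(Lam, M, D)))  # True
```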
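Multi-class Unsupervised SVM?
4. Get a constrained optimization problem
Solve for $M$ and $D$:
$$\min_{M, D}\ \zeta(M, D)$$
subject to $M = DD^\top$, $\operatorname{diag}(M) = e$, $M \in \{0, 1\}^{n \times n}$, $D \in \{0, 1\}^{n \times k}$
Class balance: $\left(\tfrac{1}{k} - \ell\right) n e \le M e \le \left(\tfrac{1}{k} + \ell\right) n e$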
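Multi-class Unsupervised SVM?
5. Relax the indicator variables to obtain a convex optimization problem
Solve for $M$ and $D$:
$$\min_{M, D}\ \zeta(M, D)$$
subject to $M \succeq DD^\top$, $\operatorname{diag}(M) = e$, $0 \le M \le 1$, $0 \le D \le 1$, $\left(\tfrac{1}{k} - \ell\right) n e \le M e \le \left(\tfrac{1}{k} + \ell\right) n e$
Semidefinite program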
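The relaxed constraint $M \succeq DD^\top$ is what makes this a semidefinite program: by the Schur complement it is equivalent to the linear matrix inequality $\begin{bmatrix} I_k & D^\top \\ D & M \end{bmatrix} \succeq 0$, which is linear in $M$ and $D$ jointly. A quick numerical check on hypothetical labels (helper names are illustrative):

```python
import numpy as np

def min_eig(A):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(A).min()

def schur_block(M, D):
    """[[I_k, D^T], [D, M]]: PSD exactly when M - D D^T is PSD (Schur complement)."""
    k = D.shape[1]
    return np.block([[np.eye(k), D.T], [D, M]])

labels = np.array([0, 0, 1, 2, 2])
D = np.eye(3)[labels]                         # one-hot class indicators
M = D @ D.T                                   # equivalence relation matrix
print(min_eig(schur_block(M, D)) >= -1e-9)    # True: M = D D^T satisfies M >= D D^T

# diag(M) = e still holds, but I - D D^T is not PSD (points 0 and 1 share a class).
print(min_eig(schur_block(np.eye(5), D)) < 0)  # True
```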
Experimental Results
[Figure: clustering results from k-means, spectral clustering, and the semidefinite (SemiDef) method]
Experimental Results
Experimental Results
Percentage of misclassification errors (digit dataset)
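Extension to Semi-Supervised Algorithm
Matrix $M$, with the $t$ labeled points ordered first:
Labeled block (clamped): $M_{ij} = y_i y_j$ for $i, j \in \{1, \dots, t\}$
Unlabeled rows: for $i \in \{t+1, \dots, n\}$, a constraint on $\sum_{j=1}^{t} M_{ij}$ involving $2t$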
Experimental Results
Percentage of misclassification errors (face dataset)
Experimental Results
Discriminative, Unsupervised, Convex HMMs
Joint work with Linli Xu
With help from Li Cheng and Tao Wang
Hidden Markov Model
Must coordinate local classifiers $y_i = f(x_i)$
[Figure: a chain of "hidden" states $y_1, y_2, y_3$ over observations $x_1, x_2, x_3$]
Joint probability model $P(x, y)$
Viterbi classifier: $\arg\max_{y} P(y \mid x)$
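The Viterbi classifier named above can be sketched for a discrete HMM as follows. The parameter layout (`log_pi`, `log_A`, `log_B`) and the toy model are assumptions for illustration. Maximizing $P(x, y)$ over $y$ gives the same answer as maximizing $P(y \mid x)$, because $P(x)$ does not depend on $y$.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most probable hidden state sequence argmax_y P(y | x) for a discrete HMM.

    log_pi[s]   : log P(y_1 = s)
    log_A[s, t] : log P(y_{i+1} = t | y_i = s)
    log_B[s, o] : log P(x_i = o | y_i = s)
    obs         : observed symbols x_1..x_T
    """
    T, S = len(obs), len(log_pi)
    delta = log_pi + log_B[:, obs[0]]         # best log-score of paths ending in each state
    back = np.zeros((T, S), dtype=int)
    for i in range(1, T):
        scores = delta[:, None] + log_A       # scores[s, t]: extend best path at s to t
        back[i] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[i]]
    path = [int(delta.argmax())]
    for i in range(T - 1, 0, -1):             # follow back-pointers to recover the path
        path.append(int(back[i, path[-1]]))
    return path[::-1]

# Hypothetical "sticky" two-state model with informative emissions.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
print(viterbi(np.log(pi), np.log(A), np.log(B), [0, 0, 1, 1]))  # [0, 0, 1, 1]
```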
HMM Training: Supervised
Given $(x^1, y^1), (x^2, y^2), \dots, (x^n, y^n)$
Maximum likelihood: $\max \prod_{i=1}^{n} P(x^i, y^i) = \max \prod_{i=1}^{n} P(y^i \mid x^i)\, P(x^i)$, which models the input distribution
Conditional likelihood: $\max \prod_{i=1}^{n} P(y^i \mid x^i)$, discriminative (CRFs)
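For a discrete HMM, the supervised maximum likelihood objective $\prod_i P(x^i, y^i)$ factors into initial, transition, and emission terms, so it is maximized by normalized counts. A minimal sketch with hypothetical names and data; the optional add-one smoothing is an extra assumption (set `smoothing=0.0` for the plain ML estimate, which needs every state to appear).

```python
import numpy as np

def fit_hmm_supervised(seqs, S, V, smoothing=1.0):
    """Maximum likelihood HMM parameters from labeled sequences, by counting.

    seqs: list of (x, y) pairs of equal-length integer lists (observations, states).
    S, V: number of hidden states and of observation symbols.
    """
    pi = np.full(S, smoothing)
    A = np.full((S, S), smoothing)
    B = np.full((S, V), smoothing)
    for x, y in seqs:
        pi[y[0]] += 1                          # initial state counts
        for s, t in zip(y[:-1], y[1:]):
            A[s, t] += 1                       # transition counts
        for s, o in zip(y, x):
            B[s, o] += 1                       # emission counts
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))

data = [([0, 0, 1], [0, 0, 1]), ([1, 1, 0], [1, 1, 0])]   # hypothetical labeled sequences
pi, A, B = fit_hmm_supervised(data, S=2, V=2, smoothing=0.0)
print(pi)   # each state starts one sequence
print(B)    # here each state always emits its own symbol
```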
HMM Training: Unsupervised
Given only $x^1, x^2, \dots, x^n$
Now what? EM!
Marginal likelihood: $\max \prod_{i=1}^{n} P(x^i)$
Exactly the part we don't care about
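The marginal likelihood that EM climbs sums the joint model over every hidden sequence, $P(x) = \sum_y P(x, y)$. The forward algorithm computes it in time linear in the sequence length; this sketch (hypothetical parameters and names) rescales at each step to avoid underflow and checks the result against explicit enumeration.

```python
import itertools
import numpy as np

def log_marginal_likelihood(pi, A, B, obs):
    """log P(x) = log sum_y P(x, y), by the forward algorithm with per-step rescaling."""
    alpha = pi * B[:, obs[0]]                 # alpha_1(s) = P(x_1, y_1 = s)
    c = alpha.sum()
    log_px, alpha = np.log(c), alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]         # sum over the previous state, then emit x_i
        c = alpha.sum()
        log_px, alpha = log_px + np.log(c), alpha / c
    return log_px

def brute_force_log_px(pi, A, B, obs):
    """Same quantity by summing P(x, y) over every state sequence y (exponential cost)."""
    total = 0.0
    for y in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[y[0]] * B[y[0], obs[0]]
        for i in range(1, len(obs)):
            p *= A[y[i - 1], y[i]] * B[y[i], obs[i]]
        total += p
    return np.log(total)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2]
print(log_marginal_likelihood(pi, A, B, obs), brute_force_log_px(pi, A, B, obs))
```

Note that no $y$ appears in this objective at all, which is the point of the slide's complaint.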
HMM Training: Unsupervised
Given only $x^1, x^2, \dots, x^n$
The problem with EM:
Not convex
Wrong objective
Too popular
Doesn't work
HMM Training: Unsupervised
Given only $x^1, x^2, \dots, x^n$
The dream:
Convex training
Discriminative training of $P(y \mid x)$
When will someone invent unsupervised CRFs?
HMM Training: Unsupervised
Given only $x^1, x^2, \dots, x^n$
The question: how to learn $P(y \mid x)$ effectively without seeing any $y$'s?
The answer: that's what we already did! Unsupervised SVMs
HMM Training: Unsupervised
Given only $x^1, x^2, \dots, x^n$
The plan:
             supervised   unsupervised
single y     SVM          unsup SVM
sequence y   M3N          ?
M3N: Max Margin Markov Nets
Relational SVMs
[Figure: a chain of labels $y_1, y_2, y_3$ over inputs $x_1, x_2, x_3$, with pairwise potentials $f_x(y_1, y_2)$]
Supervised training:
Given $(x^1, y^1), (x^2, y^2), \dots, (x^n, y^n)$
Solve a factored QP
Unsupervised M3Ns
Strategy:
Start with the supervised M3N QP
Re-express the $y$-labels as local equivalence relations $M$, $D$
Impose class balance
Relax the non-convex constraints
Then solve a really big SDP, but one that is still polynomial in size
Unsupervised M3Ns
SDP
Some Initial Results
Synthetic HMM
Protein secondary structure prediction
Brief Research Background
Sequential PAC Learning
Linear Classifiers: Boosting, SVMs
Metric-Based Model Selection
Greedy Importance Sampling
Adversarial Optimization & Search
Large Markov Decision Processes