Learning Structural SVMs with Latent Variables
Chun-Nam Yu
Dept. of Computer Science, Cornell University
October 8-9, IBM SMiLe Workshop

Structured Output Prediction

Traditional classification and regression

Structured output prediction

Introduction to Structural SVMs

Structural SVM (Margin rescaling) [Tsochantaridis et al. '04]

    min_{w, ξ}   (1/2) ‖w‖² + C Σ_{i=1}^{n} ξ_i

    s.t. for 1 ≤ i ≤ n, for all output structures ŷ ∈ Y,

        w · Φ(x_i, y_i) − w · Φ(x_i, ŷ)  ≥  Δ(y_i, ŷ) − ξ_i

Here w · Φ(x_i, y_i) is the score of the correct parse tree and w · Φ(x_i, ŷ) the score of a wrong parse tree, so each constraint requires the correct structure to outscore every alternative.

Loss function Δ controls the penalty of predicting ŷ instead of y_i
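The constraint above can be checked one example at a time: the slack ξ_i is however much the most violating output ŷ beats the required margin. A minimal sketch, assuming user-supplied callables `phi` (the joint feature map Φ), `loss` (Δ), and an enumerable output set `outputs`; these names are illustrative and not from the slides.

```python
import numpy as np

def slack_margin_rescaling(w, x, y_true, outputs, phi, loss):
    """Smallest xi satisfying  w.Phi(x,y) - w.Phi(x,yhat) >= loss(y,yhat) - xi  for all yhat."""
    score_true = np.dot(w, phi(x, y_true))
    violations = [loss(y_true, y_hat) - (score_true - np.dot(w, phi(x, y_hat)))
                  for y_hat in outputs]
    return max(0.0, max(violations))
```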

Solving Margin-based Training Problems with the Cutting-Plane Algorithm

Exponentially many constraints, but solvable in polynomial time:

Using the cutting-plane algorithm to speed up training of structural SVMs [Joachims, Finley & Yu, MLJ'09]

Using approximate cutting-plane models to build faster and sparser kernel SVMs [Yu & Joachims, KDD'08], [Joachims & Yu, ECML'09; Best Machine Learning Paper]
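As a rough illustration of how such a working-set method proceeds (a sketch in the spirit of, not copied from, [Joachims, Finley & Yu, MLJ'09]): alternate between finding the most violated constraint for each example and re-solving a small QP over the constraints collected so far. `most_violated`, `phi`, `loss`, and `solve_qp` are assumed user-supplied helpers; a full implementation would also compare each violation against the example's current slack.

```python
import numpy as np

def cutting_plane_train(data, dim, most_violated, phi, loss, solve_qp, eps=1e-3, max_iter=100):
    w = np.zeros(dim)
    working_set = []                                   # constraints (x, y, y_hat) found so far
    for _ in range(max_iter):
        added = 0
        for x, y in data:
            y_hat = most_violated(w, x, y)             # separation oracle: argmax_yhat loss + score
            violation = loss(y, y_hat) - (w @ phi(x, y) - w @ phi(x, y_hat))
            if violation > eps:                        # simplified check (ignores current slack)
                working_set.append((x, y, y_hat))
                added += 1
        if added == 0:                                 # no constraint violated by more than eps
            break
        w = solve_qp(working_set)                      # re-optimize over the working set
    return w
```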

Incomplete Label Information and Latent Variables

Discriminative motif finding

Noun phrase coreference

Latent Structural Support Vector Machines

Latent Structural SVM [Yu & Joachims, ICML'09]

    min_{w, ξ}   (1/2) ‖w‖² + C Σ_{i=1}^{n} ξ_i

    s.t. for 1 ≤ i ≤ n, for all outputs ŷ ∈ Y,

        max_{h ∈ H} w · Φ(x_i, y_i, h)  −  max_{ĥ ∈ H} w · Φ(x_i, ŷ, ĥ)  ≥  Δ(y_i, ŷ, ĥ) − ξ_i

The best latent completion h of the correct output y_i (left max) must outscore the best latent completion ĥ of any wrong output ŷ (right max), by a margin set by the loss Δ.
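For one example and one competing output ŷ, the slack required by the constraint above follows from maximizing over the latent completions on each side. A small sketch, assuming an enumerable latent set `latent_vars` and hypothetical callables `phi` and `loss`:

```python
import numpy as np

def latent_slack(w, x, y_true, y_hat, latent_vars, phi, loss):
    """Slack so that  max_h w.Phi(x,y,h) - max_hhat w.Phi(x,yhat,hhat) >= loss - xi."""
    score_true = max(np.dot(w, phi(x, y_true, h)) for h in latent_vars)        # max over h
    h_star = max(latent_vars, key=lambda h: np.dot(w, phi(x, y_hat, h)))       # argmax over h-hat
    score_wrong = np.dot(w, phi(x, y_hat, h_star))
    return max(0.0, loss(y_true, y_hat, h_star) - (score_true - score_wrong))
```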

Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP) [Yuille & Rangarajan '03]

1. Decompose the objective into a convex and a concave part
2. Upper bound the concave part with a hyperplane
3. Minimize the resulting convex sum. Iterate until convergence.

Recent works employing the CCCP algorithm: [Collobert et al. '06], [Smola et al. '05], [Chapelle et al. '08]

Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP)

(1) Decompose the objective into convex and concave parts:

    [ (1/2) ‖w‖² + C Σ_{i=1}^{n} max_{(ŷ,ĥ) ∈ Y×H} [ w · Φ(x_i, ŷ, ĥ) + Δ(y_i, ŷ, ĥ) ] ]        (convex)

    − [ C Σ_{i=1}^{n} max_{h ∈ H} w · Φ(x_i, y_i, h) ]        (concave)

Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP)

(2) Upper bound the concave part with a hyperplane at w_t:

    ∀ w,   − [ C Σ_{i=1}^{n} max_{h ∈ H} w · Φ(x_i, y_i, h) ]   ≤   − [ C Σ_{i=1}^{n} w · Φ(x_i, y_i, h_i*) ]
               (concave)                                               (linear)

    where  h_i* = argmax_{h ∈ H} w_t · Φ(x_i, y_i, h)

Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP)

(3) Minimize the resulting convex sum to get w_{t+1}:

    w_{t+1} = argmin_w   [ (1/2) ‖w‖² + C Σ_{i=1}^{n} max_{(ŷ,ĥ) ∈ Y×H} [ w · Φ(x_i, ŷ, ĥ) + Δ(y_i, ŷ, ĥ) ] ]        (convex)

                         − [ C Σ_{i=1}^{n} w · Φ(x_i, y_i, h_i*) ]        (linear)
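Putting steps (1)-(3) together gives a short alternating loop: impute h_i* under the current model, then solve the resulting convex Structural SVM (e.g. with the cutting-plane method above). A minimal sketch; `solve_convex_ssvm` is an assumed helper that returns the new weight vector and objective value, and is not part of the slides.

```python
import numpy as np

def cccp_train(data, latent_vars, phi, solve_convex_ssvm, dim, max_outer=50, tol=1e-4):
    w, prev_obj = np.zeros(dim), float("inf")
    for _ in range(max_outer):
        # Step 2: impute h_i* = argmax_h w.Phi(x_i, y_i, h) at the current w
        imputed = [max(latent_vars, key=lambda h: np.dot(w, phi(x, y, h))) for x, y in data]
        # Step 3: minimize the convex upper bound with the imputed h_i* held fixed
        w, obj = solve_convex_ssvm(data, imputed)
        if prev_obj - obj < tol:                      # objective never increases; stop when flat
            break
        prev_obj = obj
    return w
```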

Analogy to Expectation-Maximization

E-step: equivalent to computing the upper bounding hyperplane

M-step: equivalent to minimizing the convex sum

Point estimate for latent variables; no normalization with a partition function required

Related discriminative probabilistic models with latent variables: [Gunawardana et al. '05], [Wang et al. '06], [Petrov & Klein '07]

Noun Phrase Coreference

[from Cardie & Wagstaff '99]

Input x: Noun phrases with edge features

Label y: Clusters of noun phrases

Latent variable h: 'Strong' links as trees

Task: Cluster the noun phrases using single-link agglomerative clustering

Inference: Minimum Spanning Tree

Noun Phrase Coreference: Results

Test on MUC 6 data, using the same features as in [Ng & Cardie '02]

Initialize spanning trees by chronological order

10-fold CV results:

    Algorithm                                MITRE loss
    SVM cluster [Finley & Joachims '05]      41.3
    Latent Structural SVM                    35.6

Discriminative Motif Finding

Input x: DNA sequences containing ARS from S. cerevisiae and S. kluyveri

Label y: Whether the sequence replicates in S. cerevisiae

Latent variable h: Position of the motif

Task: Find the predictive motif

Inference: Enumerate all positions h

Discriminative Motif Finding: Results

Data: 197 yeast DNA sequences from S. cerevisiae and S. kluyveri; ∼6000 intergenic sequences for background estimation

10-fold CV, 10 random restarts for each parameter setting:

    Algorithm                         Error rate
    Gibbs Sampler (w=11)              37.9%
    Gibbs Sampler (w=17)              35.06%
    Latent Structural SVM (w=11)      11.09%
    Latent Structural SVM (w=17)      12.00%

Conclusions and Future Directions

A new formulation of latent-variable Structural SVM with an efficient solution algorithm

A modular algorithm that achieves very good accuracy on two example structured prediction tasks

Potential extensions to semi-supervised settings

Also looking at structured output learning settings where unlabeled data in the output domain Y are plentiful

Discriminative Motif Finding - Formulation

Feature vector Φ: position-specific weight matrix plus parameters for a Markov background model

    Φ(x, y, h) = Σ_{i=1}^{h} φ_BG(x_i)  +  Σ_{j=1}^{l} φ_PSM^(j)(x_{h+j})  +  Σ_{i=h+l+1}^{n} φ_BG(x_i)
                  (background)              (motif)                            (background)

[motif model figure from Wasserman 2004]

Loss function Δ: zero-one loss

Inference: enumeration, as y is binary and the number of positions h is linear in the sequence length
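Since y is binary and h ranges over motif start positions, prediction can simply enumerate all (y, h) pairs and keep the highest-scoring one. An illustrative sketch; `phi` stands for the background-plus-PSM feature map above, and scoring the y = 0 case through the same call is an assumption of this sketch rather than a detail from the slides.

```python
import numpy as np

def predict_motif(w, x, motif_len, phi):
    n = len(x)
    candidates = [(y, h) for y in (0, 1) for h in range(n - motif_len + 1)]
    y_best, h_best = max(candidates, key=lambda yh: np.dot(w, phi(x, yh[0], yh[1])))
    return y_best, h_best            # predicted label and motif start position
```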

Noun Phrase Coreference - Formulation

Feature vector Φ: sum of tree edge features

    Φ(x, y, h) = Σ_{(i,j) ∈ h} x_ij

Loss function Δ:

    Δ(y, ŷ, ĥ) = n(y) − k(y) + Σ_{(i,j) ∈ ĥ} ℓ(y, (i, j))

    where n(y) is the number of nodes, k(y) the number of clusters (connected components), and ℓ(y, (i, j)) contributes +1 or −1 per edge of ĥ depending on whether it links noun phrases from different clusters of y or from the same cluster

Inference: any Maximum Spanning Tree algorithm
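The inference step can be sketched with an off-the-shelf spanning-tree routine: score every noun-phrase pair with w · x_ij, negate, and run a minimum-spanning-tree solver to obtain a maximum spanning tree. The dense score matrix and the `pair_features` layout are assumptions of this sketch (and it silently drops edges whose score is exactly zero).

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def best_latent_tree(w, pair_features):
    """pair_features[i][j] holds the edge feature vector x_ij for noun phrases i < j."""
    n = len(pair_features)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            scores[i, j] = np.dot(w, pair_features[i][j])    # edge score w . x_ij
    mst = minimum_spanning_tree(-scores)                     # negate scores: max spanning tree
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))           # edges (i, j) of the latent tree h
```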

Optimizing Precision@k

Input x: A query with an associated collection of documents (e.g. query q: "ICML 2009")

Label y: Relevance judgments of each document

Latent variable h: Top k relevant documents

Optimizing Precision@k - Formulation

Feature vector Φ: sum of features from the top k documents

    Φ(x, y, h) = Σ_{j=1}^{k} x_{h_j}

Loss function Δ: one minus Precision@k

    Δ(y, ŷ, ĥ) = 1 − (1/k) Σ_{j=1}^{k} [y_{ĥ_j} == 1]

    (depends only on the top k documents selected by ĥ)

Inference: sorting documents by score
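Both pieces are cheap to compute: inference sorts the documents by score and keeps the top k, and the loss is one minus the fraction of those that are relevant. A short sketch; the row-per-document feature matrix is an assumption of this illustration.

```python
import numpy as np

def top_k_inference(w, doc_features, k):
    scores = doc_features @ w                 # one score per document (rows are documents)
    return np.argsort(-scores)[:k]            # indices of the k highest-scoring documents

def precision_at_k_loss(y, h_hat):
    k = len(h_hat)
    return 1.0 - sum(y[j] == 1 for j in h_hat) / k     # Delta = 1 - Precision@k
```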

Optimizing Precision@k - Results

OHSUMED dataset from the LETOR 3.0 benchmark

Initialize h with a weight vector trained on classification accuracy

5-fold CV results: