
Structured Learning
Exploiting Hidden Information
Hung-yi Lee
Previously on Object Detection …
• An object can have more than one type
(Figure: example detections — Bicycle, Train, Haruhi)
Previously on Object Detection …
• An object can have more than one type
• Evaluation
For “type 1”, 𝐹(𝑥, 𝑦) = 𝑤1 ∙ 𝜙(𝑥, 𝑦); for “type 2”, 𝐹(𝑥, 𝑦) = 𝑤2 ∙ 𝜙(𝑥, 𝑦)
(Figure: input image 𝑥 with bounding box 𝑦, scored separately for each type)
Each type is represented in the typical way (𝐹(𝑥, 𝑦) = 𝑤 ∙ 𝜙(𝑥, 𝑦)).
Previously on Object Detection …
• An object can have more than one type
• Evaluation
For “type 1”, 𝐹(𝑥, 𝑦) = 𝑤1 ∙ 𝜙(𝑥, 𝑦); for “type 2”, 𝐹(𝑥, 𝑦) = 𝑤2 ∙ 𝜙(𝑥, 𝑦)
The two cases merge into one form: 𝐹(𝑥, 𝑦, ℎ) = 𝑤 ∙ Ψ(𝑥, 𝑦, ℎ)
ℎ: the type of Haruhi (type 1 or type 2)
Ψ(𝑥, 𝑦, ℎ): a feature vector for 𝑥, 𝑦 and type ℎ; its length is twice that of 𝜙(𝑥, 𝑦)
𝑤: a weight vector to be learned; its length is twice that of 𝑤1 or 𝑤2
Previously on Object Detection …
• An object can have more than one type
• Evaluation: 𝐹(𝑥, 𝑦, ℎ) = 𝑤 ∙ Ψ(𝑥, 𝑦, ℎ)
𝑤 = [𝑤1; 𝑤2] (the two type-specific weight vectors stacked)
Ψ(𝑥, 𝑦, ℎ = “type 1”) = [𝜙(𝑥, 𝑦); 0]
Ψ(𝑥, 𝑦, ℎ = “type 2”) = [0; 𝜙(𝑥, 𝑦)]
For “type 1”, 𝐹(𝑥, 𝑦, ℎ) = 𝑤1 ∙ 𝜙(𝑥, 𝑦) + 𝑤2 ∙ 0
For “type 2”, 𝐹(𝑥, 𝑦, ℎ) = 𝑤1 ∙ 0 + 𝑤2 ∙ 𝜙(𝑥, 𝑦)
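As a concrete sketch of the stacking (not from the original slides; the helper name make_psi and the use of NumPy are my own assumptions):

```python
import numpy as np

def make_psi(phi, h, num_types=2):
    """Return Psi(x, y, h): phi(x, y) placed in the block selected by h."""
    d = len(phi)
    psi = np.zeros(num_types * d)   # length is num_types times that of phi
    psi[h * d:(h + 1) * d] = phi    # all other blocks stay zero
    return psi
```

With 𝑤 = [𝑤1; 𝑤2] stacked the same way, np.dot(w, make_psi(phi, 0)) recovers 𝑤1 ∙ 𝜙 and np.dot(w, make_psi(phi, 1)) recovers 𝑤2 ∙ 𝜙.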
Previously on Object Detection …
• An object can have more than one type
• Inference
(Figure: if we knew the input were type 1, we would score every candidate bounding box with 𝑤1 ∙ 𝜙)
But we don’t actually know the type of the input image.
Previously on Object Detection …
• An object can have more than one type
• Inference
(Figure: since the type is unknown, every candidate bounding box is scored with both 𝑤1 ∙ 𝜙 and 𝑤2 ∙ 𝜙)
Previously on Object Detection …
• An object can have more than one type
• Inference
𝑦 = arg max_𝑦 max{𝑤1 ∙ 𝜙(𝑥, 𝑦), 𝑤2 ∙ 𝜙(𝑥, 𝑦)}
 = arg max_𝑦 max_ℎ 𝐹(𝑥, 𝑦, ℎ)
 = arg max_𝑦 max_ℎ 𝑤 ∙ Ψ(𝑥, 𝑦, ℎ), where ℎ = “type 1” or “type 2”
(𝑤 is what we are going to learn)
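A minimal sketch of this inference rule, reusing make_psi from above (candidates and make_phi stand in for a real detector’s candidate boxes and feature extractor — both assumptions):

```python
import numpy as np

def infer(w, x, candidates, make_phi, num_types=2):
    """Return the (y, h) maximizing F(x, y, h) = w . Psi(x, y, h)."""
    best_y, best_h, best_score = None, None, float("-inf")
    for y in candidates:                 # outer arg max over candidate y
        phi = make_phi(x, y)
        for h in range(num_types):       # inner max over the hidden type h
            score = float(np.dot(w, make_psi(phi, h, num_types)))
            if score > best_score:
                best_y, best_h, best_score = y, h, score
    return best_y, best_h
```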
Previously on Object Detection …
• An object can have more than one type
• Training
(Figure: for each training example, 𝑤 ∙ Ψ of the correct (𝑦, ℎ) must be larger than 𝑤 ∙ Ψ of every incorrect 𝑦 by the margin ∆, minus the slack 𝜀)
Previously on Object Detection …
• An object can have more than one type
• Training
(Figure: pairs with the correct 𝑦 but an incorrect ℎ are left unconstrained)
We don’t care about them. Why? Inference takes the max over ℎ, so only the label 𝑦 is evaluated; any ℎ that lets the correct 𝑦 win is acceptable.
Previously on Object Detection …
• An object can have more than one type
• Training
Given training data: {(𝑥^1, 𝑦^1, ℎ^1), ⋯, (𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟), ⋯, (𝑥^𝑅, 𝑦^𝑅, ℎ^𝑅)}
Find 𝑤, 𝜀^1, …, 𝜀^𝑟, …, 𝜀^𝑅 minimizing: (1/2)‖𝑤‖² + 𝜆 Σ_{𝑟=1}^{𝑅} 𝜀^𝑟
subject to ∀𝑟, ∀𝑦 ∈ 𝕐, 𝑦 ≠ 𝑦^𝑟, ∀ℎ ∈ ℍ:
𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟) − 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ) ≥ ∆(𝑦^𝑟, 𝑦) − 𝜀^𝑟, with 𝜀^𝑟 ≥ 0
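To make the constraints concrete: for a fixed 𝑤, the smallest feasible slack 𝜀^𝑟 is found by scanning all constrained (𝑦, ℎ) pairs. A sketch, with make_phi, make_psi and delta carried over as assumptions from the earlier snippets:

```python
import numpy as np

def min_slack(w, x_r, y_r, h_r, candidates, make_phi, delta, num_types=2):
    """Smallest eps_r >= 0 satisfying every margin constraint of example r."""
    correct = np.dot(w, make_psi(make_phi(x_r, y_r), h_r, num_types))
    eps = 0.0
    for y in candidates:
        if y == y_r:
            continue                        # only incorrect y are constrained
        phi = make_phi(x_r, y)
        for h in range(num_types):          # ... but for every hidden type h
            margin = correct - np.dot(w, make_psi(phi, h, num_types))
            eps = max(eps, delta(y_r, y) - margin)  # violation inflates slack
    return eps
```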
The world is not that easy ……
• The useful information is usually hidden
(Figure: training examples whose types — type 1 or type 2 — are not annotated)
How to deal with hidden information with Structured SVM?
Training with Hidden Information
• No types? Try to generate them ourselves
Training pairs: (𝑥^1, 𝑦^1), (𝑥^2, 𝑦^2), (𝑥^3, 𝑦^3), (𝑥^4, 𝑦^4)
𝐹(𝑥, 𝑦, ℎ) = 𝑤 ∙ Ψ(𝑥, 𝑦, ℎ) evaluates the compatibility of 𝑥, 𝑦 and ℎ
Randomly initialize 𝑤 = 𝑤^0
ℎ = arg max_ℎ 𝑤 ∙ Ψ(𝑥, 𝑦, ℎ): given 𝑥 and 𝑦, find the most compatible ℎ
Training with Hidden Information
• No types? Try to generate them ourselves
For r = 1, …, 4: ℎ^𝑟 = arg max_ℎ 𝑤^0 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
ℎ^1 = type 1, ℎ^2 = type 2, ℎ^3 = type 1, ℎ^4 = type 2
A good guess? Of course not, because 𝑤^0 is random.
Training with Hidden Information
• With the types we generated, we can find a 𝑤
Using ℎ^1 = type 1, ℎ^2 = type 2, ℎ^3 = type 1, ℎ^4 = type 2:
Find 𝑤, 𝜀^1, 𝜀^2, 𝜀^3, 𝜀^4 minimizing: (1/2)‖𝑤‖² + 𝜆 Σ_{𝑟=1}^{4} 𝜀^𝑟
subject to ∀𝑟 = 1, …, 4, ∀𝑦 ∈ 𝕐, 𝑦 ≠ 𝑦^𝑟, ∀ℎ ∈ ℍ:
𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟) − 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ) ≥ ∆(𝑦^𝑟, 𝑦) − 𝜀^𝑟, with 𝜀^𝑟 ≥ 0
Training with Hidden Information
Solving this QP gives 𝑤^1, trained from the random ℎ. Is 𝑤^1 a good weight vector? Probably not.
Re-estimate the hidden variables with 𝑤^1 — for r = 1, …, 4: ℎ^𝑟 = arg max_ℎ 𝑤^1 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
ℎ^1 = type 1, ℎ^2 = type 2, ℎ^3 = type 2 (changed from type 1), ℎ^4 = type 2
Training with Hidden Information
Solving the QP with the new ℎ gives 𝑤^2. Is 𝑤^2 better than 𝑤^1? Yes (?)
Re-estimate again — for r = 1, …, 4: ℎ^𝑟 = arg max_ℎ 𝑤^2 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
ℎ^1 = type 1, ℎ^2 = type 1 (changed from type 2), ℎ^3 = type 2, ℎ^4 = type 2
Use the new ℎ to solve the same QP again, iteratively (a runnable sketch of the loop follows).
Why can we get a better weight vector after each iteration?
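The whole alternating procedure in one sketch (solve_qp is a stand-in for any structured-SVM QP solver — an assumption, as the slides don’t name one):

```python
import numpy as np

def train_latent_ssvm(data, w0, num_iters, solve_qp, make_phi, num_types=2):
    """Alternate between guessing h and re-solving the QP."""
    w = w0                                   # e.g., randomly initialized
    for _ in range(num_iters):
        # Step 1: with w fixed, pick the most compatible h for each (x, y)
        hs = [max(range(num_types),
                  key=lambda h: float(np.dot(
                      w, make_psi(make_phi(x, y), h, num_types))))
              for x, y in data]
        # Step 2: with the h fixed, solve the (now convex) QP for a new w
        w = solve_qp(data, hs)
    return w
```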
Structured SVM
Training data: {(𝑥^1, 𝑦^1), ⋯, (𝑥^𝑟, 𝑦^𝑟), ⋯, (𝑥^𝑅, 𝑦^𝑅)}
Find 𝑤 such that, if ỹ = arg max_{𝑦∈𝕐} 𝑤 ∙ 𝜙(𝑥, 𝑦), the cost function is minimized:
𝐶(𝑤) = Σ_{𝑟=1}^{𝑅} ∆(𝑦^𝑟, ỹ^𝑟)
Instead, minimize the surrogate 𝐶′′(𝑤) = (1/2)‖𝑤‖² + Σ_{𝑟=1}^{𝑅} 𝐿^𝑟(𝑤), which can be formulated as a QP similar to SVM, where ∆(𝑦^𝑟, ỹ^𝑟) ≤ 𝐿^𝑟(𝑤) and
𝐿^𝑟(𝑤) = max_𝑦 [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦)] − 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦^𝑟)
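A two-step check of the bound ∆(𝑦^𝑟, ỹ^𝑟) ≤ 𝐿^𝑟(𝑤) (standard reasoning, written out here since the slides only state it):

```latex
% Since \tilde{y}^r maximizes w \cdot \phi(x^r, \cdot), we have
% w \cdot \phi(x^r, \tilde{y}^r) - w \cdot \phi(x^r, y^r) \ge 0, so
\Delta(y^r, \tilde{y}^r)
  \le \Delta(y^r, \tilde{y}^r) + w \cdot \phi(x^r, \tilde{y}^r)
      - w \cdot \phi(x^r, y^r)
  \le \max_{y} \big[ \Delta(y^r, y) + w \cdot \phi(x^r, y) \big]
      - w \cdot \phi(x^r, y^r)
  = L^r(w)
```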
Structured SVM
𝐿^𝑟(𝑤) = max_𝑦 [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦)] − 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦^𝑟)
(Figure: each term ∆(𝑦^𝑟, 𝑦1) + 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦1), ∆(𝑦^𝑟, 𝑦2) + 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦2), … is linear in 𝑤; their max is a convex, piecewise-linear function of 𝑤)
Structured SVM
𝐿^𝑟(𝑤) = max_𝑦 [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦)] − 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦^𝑟)
The max over lines is convex, and subtracting the line 𝑤 ∙ 𝜙(𝑥^𝑟, 𝑦^𝑟) keeps it convex: convex − line = convex.
Structured SVM
𝐶′′(𝑤) = (1/2)‖𝑤‖² + Σ_{𝑟=1}^{𝑅} 𝐿^𝑟(𝑤)
(1/2)‖𝑤‖² is convex and every 𝐿^𝑟(𝑤) is convex, so 𝐶′′(𝑤) is convex.
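These convexity claims rest on one standard fact — the pointwise max of convex (here: linear) functions is convex. The one-line verification (my addition, not on the slides):

```latex
% For f(w) = \max_i f_i(w) with each f_i convex and any 0 \le t \le 1:
f\big(tw + (1-t)w'\big) = \max_i f_i\big(tw + (1-t)w'\big)
  \le \max_i \big[\, t f_i(w) + (1-t) f_i(w') \,\big]
  \le t \max_i f_i(w) + (1-t) \max_i f_i(w')
  = t f(w) + (1-t) f(w')
```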
Structured SVM with Hidden Information
Find 𝑤 such that 𝑦 = arg max_𝑦 max_ℎ 𝐹(𝑥, 𝑦, ℎ) = arg max_𝑦 max_ℎ 𝑤 ∙ Ψ(𝑥, 𝑦, ℎ)
Minimizing the cost function 𝐶(𝑤) = Σ_{𝑟=1}^{𝑅} ∆(𝑦^𝑟, ỹ^𝑟) is again done through the surrogate
𝐶′′(𝑤) = (1/2)‖𝑤‖² + Σ_{𝑟=1}^{𝑅} 𝐿^𝑟(𝑤), with ∆(𝑦^𝑟, ỹ^𝑟) ≤ 𝐿^𝑟(𝑤), where now
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
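A sketch of this new 𝐿^𝑟(𝑤), reusing the assumed helpers make_psi, make_phi and delta from the earlier snippets:

```python
import numpy as np

def latent_loss(w, x_r, y_r, candidates, make_phi, delta, num_types=2):
    """L^r(w): an augmented max over all (y, h), minus a max over h alone."""
    augmented = max(                     # convex: max of linear functions
        delta(y_r, y) + float(np.dot(
            w, make_psi(make_phi(x_r, y), h, num_types)))
        for y in candidates for h in range(num_types))
    correct = max(                       # also convex, but it is subtracted
        float(np.dot(w, make_psi(make_phi(x_r, y_r), h, num_types)))
        for h in range(num_types))
    return augmented - correct
```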
Structured SVM with Hidden Information
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
(Figure: 𝐿^𝑟(𝑤), with the curves for ℎ = “type 1” and ℎ = “type 2”)
Structured SVM with Hidden Information
• Cost function to be minimized
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
Both max terms are convex in 𝑤.
Structured SVM with Hidden Information
• Cost function to be minimized
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
The first max is convex; subtracting the second max contributes a concave part. convex + concave = ?
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)  (convex part + concave part)
Auxiliary function 𝐴^𝑟(𝑤) at 𝑤^0:
1. 𝐴^𝑟(𝑤^0) = 𝐿^𝑟(𝑤^0)
2. Upper bound of 𝐿^𝑟(𝑤)
3. Easy to be minimized
(Figure: the minimum of 𝐴^𝑟(𝑤) is at 𝑤^1)
𝐴^𝑟(𝑤^1) ≤ 𝐴^𝑟(𝑤^0) and 𝐿^𝑟(𝑤^1) ≤ 𝐴^𝑟(𝑤^1), so 𝐿^𝑟(𝑤^1) ≤ 𝐿^𝑟(𝑤^0)
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
Auxiliary function 𝐴^𝑟(𝑤) at 𝑤^1:
1. 𝐴^𝑟(𝑤^1) = 𝐿^𝑟(𝑤^1)
2. Upper bound of 𝐿^𝑟(𝑤)
3. Easy to be minimized
(Figure: the minimum of the new 𝐴^𝑟(𝑤) is at 𝑤^2)
𝐴^𝑟(𝑤^2) ≤ 𝐴^𝑟(𝑤^1) and 𝐿^𝑟(𝑤^2) ≤ 𝐴^𝑟(𝑤^2), so 𝐿^𝑟(𝑤^2) ≤ 𝐿^𝑟(𝑤^1)
We thus find a 𝑤 that makes 𝐿^𝑟(𝑤) smaller at each iteration, but can only reach a local minimum.
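The inequality chain behind these figures, written out (standard majorize-minimize reasoning):

```latex
% A^r upper-bounds L^r everywhere, w^1 minimizes A^r,
% and the bound is tight at w^0, hence:
L^r(w^1) \;\le\; A^r(w^1) \;\le\; A^r(w^0) \;=\; L^r(w^0)
% Each iteration can only lower L^r, never raise it; but because
% L^r itself is not convex, the procedure may stop at a local minimum.
```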
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
(Figure: 𝐿^𝑟(𝑤) = convex part + concave part. At 𝑤^0, replace the concave part with a line that touches it at 𝑤^0; the sum is then convex.)
Auxiliary function 𝐴^𝑟(𝑤) at 𝑤^0:
1. 𝐴^𝑟(𝑤^0) = 𝐿^𝑟(𝑤^0)
2. Upper bound of 𝐿^𝑟(𝑤)
3. Easy to be minimized (it is convex)
What is the relation to the above process?
ℎ^𝑟 = arg max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ): fixing ℎ^𝑟 with the current 𝑤 forms the auxiliary function, and solving the QP finds its minimum value.
After each iteration, the 𝑤 obtained decreases the cost function.
ℎ^𝑟 = arg max_ℎ 𝑤^0 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ): fixing ℎ^𝑟 at 𝑤^0 forms the auxiliary function for
𝐿^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
(Figure: the concave part −max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ) is the lower envelope of the lines −𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ = 1) and −𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ = 2); fixing ℎ = ℎ^𝑟 picks the line touching the envelope at 𝑤^0.)
𝐴^𝑟(𝑤) = max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] − 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟)
Find its minimum value by solving a QP:
𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟) − max_𝑦 max_ℎ [∆(𝑦^𝑟, 𝑦) + 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ)] = −𝐴^𝑟(𝑤)
∀𝑦 ∈ 𝕐, ∀ℎ ∈ ℍ: 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟) − 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ) − ∆(𝑦^𝑟, 𝑦) ≥ −𝐴^𝑟(𝑤)
∀𝑦 ∈ 𝕐, ∀ℎ ∈ ℍ: 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟) − 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ) ≥ ∆(𝑦^𝑟, 𝑦) − 𝐴^𝑟(𝑤)
So 𝐴^𝑟(𝑤) plays the role of the slack variable 𝜀^𝑟 in the QP.
This is a general framework.
Framework of Structured SVM
with Hidden Information
• You can involve any hidden information ℎ in the task you are considering.
• Problem 1: Evaluation — 𝐹(𝑥, 𝑦, ℎ) = 𝑤 ∙ Ψ(𝑥, 𝑦, ℎ)
• Problem 2: Inference — 𝑦 = arg max_𝑦 max_ℎ 𝐹(𝑥, 𝑦, ℎ)
Framework of Structured SVM
with Hidden Information
• Problem 3: Training
Training data: {(𝑥^1, 𝑦^1), (𝑥^2, 𝑦^2), ⋯, (𝑥^𝑟, 𝑦^𝑟), ⋯, (𝑥^𝑅, 𝑦^𝑅)}
Initialize 𝑤, then iterate two steps:
1. ℎ^𝑟 = arg max_ℎ 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ)
2. Find 𝑤, 𝜀^1, ⋯, 𝜀^𝑅 minimizing (1/2)‖𝑤‖² + 𝜆 Σ_{𝑟=1}^{𝑅} 𝜀^𝑟
subject to ∀𝑟, ∀𝑦 ∈ 𝕐, 𝑦 ≠ 𝑦^𝑟, ∀ℎ ∈ ℍ:
𝑤 ∙ Ψ(𝑥^𝑟, 𝑦^𝑟, ℎ^𝑟) − 𝑤 ∙ Ψ(𝑥^𝑟, 𝑦, ℎ) ≥ ∆(𝑦^𝑟, 𝑦) − 𝜀^𝑟, with 𝜀^𝑟 ≥ 0
(a QP sketch for step 2 follows)
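The slides only say step 2 is "solving a QP" without naming a solver; below is a minimal sketch using cvxpy (my choice of solver, and the input layout is an assumption), where the Ψ vectors and losses are precomputed:

```python
import cvxpy as cp

def solve_qp(triples_per_example, lam=1.0):
    """triples_per_example[r] lists (psi_correct, psi_wrong, delta_val):
    the vectors Psi(x^r, y^r, h^r) and Psi(x^r, y, h) plus the loss
    Delta(y^r, y), precomputed for every incorrect y and every h."""
    R = len(triples_per_example)
    d = triples_per_example[0][0][0].shape[0]
    w = cp.Variable(d)
    eps = cp.Variable(R, nonneg=True)       # slack variables, eps^r >= 0
    constraints = [
        (psi_c - psi_w) @ w >= delta_val - eps[r]  # one margin constraint each
        for r, triples in enumerate(triples_per_example)
        for psi_c, psi_w, delta_val in triples]
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + lam * cp.sum(eps))
    cp.Problem(objective, constraints).solve()
    return w.value
```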
More ……
• Structured SVM + hidden information
• Chun-Nam John Yu and Thorsten Joachims, “Learning Structural SVMs with Latent Variables,” ICML 2009
• Yang Wang and Greg Mori, “Max-Margin Hidden Conditional Random Fields for Human Action Recognition,” CVPR 2009
• Structured perceptron + hidden information
• Xu Sun et al., “Latent Variable Perceptron Algorithm for Structured Classification,” IJCAI 2009
• CRF + hidden information
• Yang Wang and Greg Mori, “Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
Thanks for your attention!
How to use structured SVM with hidden information for summarization:
http://140.112.21.28/~tlkagk/homepage/courses/MLDS_2015/Structured%20Lecture/Summarization%20Hidden.fsp/index.html
Appendix
Discussion
• Hidden information vs. hidden layers in a DNN
• Learning with hidden information can be difficult because there is no guide for the hidden information.
• Initialization is important.
• The model described in these slides is similar to GMM, k-means, etc.
Can any continuous function be represented as a sum of a convex and a concave function?
http://math.stackexchange.com/questions/13386/can-any-continuous-function-be-represented-as-a-sum-of-convex-and-concave-functi