Structured Learning: Exploiting Hidden Information
Hung-yi Lee

Previously on Object Detection …
• An object can have more than one type (e.g., Haruhi can be drawn in two styles, "type 1" or "type 2"; compare Bicycle vs. Train).
• Evaluation: with one weight vector per type we would write
  For "type 1": F(x, y) = w₁·φ(x, y)
  For "type 2": F(x, y) = w₂·φ(x, y)
  This can still be represented in the typical way as F(x, y, h) = w·Ψ(x, y, h), where
  h: the type of Haruhi ("type 1" or "type 2")
  Ψ(x, y, h): a feature vector for x, y and type h; its length is twice that of φ(x, y)
  w: the weight vector to be learned; its length is twice that of w₁ or w₂
• Concretely, w = [w₁; w₂] and
  Ψ(x, y, h = "type 1") = [φ(x, y); 0]
  Ψ(x, y, h = "type 2") = [0; φ(x, y)]
  so for "type 1", F(x, y, h) = w₁·φ(x, y) + w₂·0, and for "type 2", F(x, y, h) = w₁·0 + w₂·φ(x, y).
• Inference: if we knew the input was type 1, we could score every candidate y with w₁·φ(x, y). But we don't actually know the type of the input image, so we maximize over the type as well:
  ỹ = arg max_y max{ w₁·φ(x, y), w₂·φ(x, y) }
    = arg max_y max_h F(x, y, h)
    = arg max_y max_h w·Ψ(x, y, h),  h = "type 1" or "type 2"
• Training: the score w·Ψ of the correct pair (y^r, h^r) should be larger than the score of any pair with an incorrect y, by a margin Δ(y^r, y) minus a slack ε^r. Pairs with the correct y but an incorrect h are not constrained — we don't care. Why?
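The slot-stacking construction of Ψ above can be sketched in code. A minimal sketch assuming a generic NumPy vector for φ(x, y); the names `psi` and `score` are illustrative, not from the slides.

```python
import numpy as np

def psi(phi_xy: np.ndarray, h: int, num_types: int = 2) -> np.ndarray:
    """Joint feature Psi(x, y, h): copy phi(x, y) into the slot for type h and
    leave the other slots zero, so that w = [w1; w2] scores type h with w_h."""
    d = phi_xy.shape[0]
    out = np.zeros(num_types * d)
    out[h * d:(h + 1) * d] = phi_xy
    return out

def score(w: np.ndarray, phi_xy: np.ndarray, h: int) -> float:
    """Evaluation: F(x, y, h) = w . Psi(x, y, h)."""
    return float(w @ psi(phi_xy, h, num_types=w.shape[0] // phi_xy.shape[0]))
```

With w = [w₁; w₂], `score(..., h=0)` reduces to w₁·φ(x, y) and `score(..., h=1)` to w₂·φ(x, y), matching the two cases above.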
• We don't care about pairs with the correct y but an incorrect h because inference takes the maximum over h: as long as some h gives the correct y the highest score, the prediction is correct.
• Training: given training data {(x¹, y¹, h¹), …, (x^r, y^r, h^r), …, (x^R, y^R, h^R)}, find w, ε¹, …, ε^R

  minimize ½‖w‖² + λ Σ_{r=1}^{R} ε^r
  subject to ∀r, ∀y ∈ 𝕐 with y ≠ y^r, ∀h ∈ ℍ:
    w·Ψ(x^r, y^r, h^r) − w·Ψ(x^r, y, h) ≥ Δ(y^r, y) − ε^r,  ε^r ≥ 0

The world is not that easy ……
• The useful information is usually hidden: in practice the training images come labeled with y only; the types are not annotated. How do we deal with hidden information in Structured SVM?

Training with Hidden Information
• No types? Try to generate them ourselves.
• Randomly initialize w = w⁰. Since F(x, y, h) = w·Ψ(x, y, h) evaluates the compatibility of x, y and h, given each x^r and y^r we can find the most compatible h:
  for r = 1, …, R:  h^r = arg max_h w⁰·Ψ(x^r, y^r, h)
  In the running example with four pairs (x¹, y¹), …, (x⁴, y⁴), this yields e.g. h¹ = type 1, h² = type 2, h³ = type 1, h⁴ = type 2.
• Is this a good guess? Of course not, because w⁰ is random.
• With the types we generated, we can find a w: solve the QP above with these h^r fixed — find w, ε¹, …, ε⁴ minimizing ½‖w‖² + λ Σ_{r=1}^{4} ε^r under the same constraints. Call the solution w¹.
• Is w¹ a good weight vector? Probably not — it was trained from randomly guessed h.
• Re-infer the hidden types with w¹: h^r = arg max_h w¹·Ψ(x^r, y^r, h). Some of the h^r change. Solve the same QP again with the new h to obtain w².
• Is w² better than w¹? Yes (?). Iterate: use the new h to solve the same QP again, and so on. Why do we get a better weight vector after each iteration?
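The h-generation step above is just an arg max per training pair. A minimal sketch, assuming the feature map `psi(x, y, h) → NumPy vector` is supplied by the caller; all names are illustrative.

```python
import numpy as np

def impute_hidden(w, data, psi, types=(0, 1)):
    """Step 1 of each iteration: for every training pair (x^r, y^r), pick
    h^r = arg max_h  w . Psi(x^r, y^r, h), i.e. the hidden type most
    compatible with the current weight vector w."""
    return [max(types, key=lambda h: float(w @ psi(x, y, h)))
            for (x, y) in data]
```

Starting from a random w⁰ these guesses are arbitrary, exactly as the slides say; they only become meaningful as w improves over the iterations.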
Structured SVM
• Training data: (x¹, y¹), …, (x^r, y^r), …, (x^R, y^R). Find w such that ỹ = arg max_{y∈𝕐} w·φ(x, y).
• We would like to minimize the cost C(w) = Σ_{r=1}^{R} Δ(y^r, ỹ^r), but instead we minimize the surrogate
  C''(w) = ½‖w‖² + Σ_{r=1}^{R} L^r(w)
  which can be formulated as a QP similar to SVM. This works because Δ(y^r, ỹ^r) ≤ L^r(w), where
  L^r(w) = max_y [Δ(y^r, y) + w·φ(x^r, y)] − w·φ(x^r, y^r)
• L^r(w) is convex: each candidate y contributes a linear function of w (e.g., Δ(y^r, y₁) + w·φ(x^r, y₁), Δ(y^r, y₂) + w·φ(x^r, y₂), …), and the max of linear functions is convex; subtracting the linear term w·φ(x^r, y^r) preserves convexity. C''(w) is then a sum of convex functions (½‖w‖² plus the L^r), hence convex.

Structured SVM with Hidden Information
• Find w such that ỹ = arg max_y max_h F(x, y, h) = arg max_y max_h w·Ψ(x, y, h).
• Again minimize C''(w) = ½‖w‖² + Σ_{r=1}^{R} L^r(w) with Δ(y^r, ỹ^r) ≤ L^r(w), where now
  L^r(w) = max_y max_h [Δ(y^r, y) + w·Ψ(x^r, y, h)] − max_h w·Ψ(x^r, y^r, h)
• The first term is convex (a max over linear functions of w). The second term is the negative of a max over linear functions (one per h, e.g., h = "type 1" and h = "type 2"), so it is concave. Thus L^r(w) = convex + concave, which is not convex in general. How do we minimize it?
• Build an auxiliary function A^r(w) at the current point w⁰ such that:
  1. A^r(w⁰) = L^r(w⁰)
  2. A^r(w) is an upper bound of L^r(w)
  3. A^r(w) is easy to minimize
• Let w¹ be the minimum of A^r(w). Then A^r(w¹) < A^r(w⁰), and L^r(w¹) ≤ A^r(w¹), so L^r(w¹) < L^r(w⁰).
• Now build a new auxiliary function A^r(w) at w¹ with the same three properties.
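The convex + concave decomposition of L^r(w) can be checked numerically over finite candidate sets. A minimal sketch; `psi`, `delta` and the toy sets in the usage below are illustrative assumptions, not from the slides.

```python
import numpy as np

def loss_r(w, x_r, y_r, psi, delta, ys, hs):
    """L^r(w) = max_{y,h} [Delta(y^r, y) + w . Psi(x^r, y, h)]
                - max_h  w . Psi(x^r, y^r, h).
    The first term is a max over linear functions of w (convex); the second
    is minus such a max (concave), so L^r is convex + concave overall."""
    first = max(delta(y_r, y) + float(w @ psi(x_r, y, h))
                for y in ys for h in hs)
    second = max(float(w @ psi(x_r, y_r, h)) for h in hs)
    return first - second
```

Because the candidate y = y^r with Δ = 0 appears in the first max, L^r(w) ≥ 0 for every w, and L^r(w) ≥ Δ(y^r, ỹ^r), as a surrogate loss must satisfy.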
• Minimizing the new auxiliary function gives w²: A^r(w²) < A^r(w¹), L^r(w²) ≤ A^r(w²), so L^r(w²) < L^r(w¹). At each iteration we find a w that makes L^r(w) smaller. Note that this procedure can only reach a local minimum.
• How do we construct A^r(w)? Decompose L^r(w) into its two parts:
  convex: max_y max_h [Δ(y^r, y) + w·Ψ(x^r, y, h)]
  concave: −max_h w·Ψ(x^r, y^r, h)
  The concave part is the negative of a piecewise-linear max over h (the lines w·Ψ(x^r, y^r, h = 1) and w·Ψ(x^r, y^r, h = 2) in the figure). At w⁰, fix h^r = arg max_h w⁰·Ψ(x^r, y^r, h) and replace the concave part with the linear function −w·Ψ(x^r, y^r, h^r). Since max_h w·Ψ(x^r, y^r, h) ≥ w·Ψ(x^r, y^r, h^r) for every w, with equality at w⁰, the resulting function
  A^r(w) = max_y max_h [Δ(y^r, y) + w·Ψ(x^r, y, h)] − w·Ψ(x^r, y^r, h^r)
  equals L^r at w⁰, upper-bounds L^r everywhere, and is convex.
• What is the relation to the process above? Computing h^r = arg max_h w·Ψ(x^r, y^r, h) forms the auxiliary function, and solving the QP finds its minimum value. That is why, after each iteration, the w obtained decreases the cost function.
• Why is minimizing A^r(w) a QP? Writing
  w·Ψ(x^r, y^r, h^r) − max_y max_h [Δ(y^r, y) + w·Ψ(x^r, y, h)] = −A^r(w)
  is equivalent to the set of constraints
  ∀y ∈ 𝕐, ∀h ∈ ℍ: w·Ψ(x^r, y^r, h^r) − w·Ψ(x^r, y, h) + Δ(y^r, y) ≥ −A^r(w)
  i.e., ∀y ∈ 𝕐, ∀h ∈ ℍ: w·Ψ(x^r, y^r, h^r) − w·Ψ(x^r, y, h) ≥ Δ(y^r, y) − A^r(w)
  with A^r(w) playing the role of the slack variable ε^r. This is a general framework.

Framework of Structured SVM with Hidden Information
• You can involve any hidden information h in the task you are considering.
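The auxiliary function can be built directly from the loss decomposition. A minimal sketch under the same illustrative setup (finite y/h sets, caller-supplied `psi` and `delta`); the names are assumptions, not from the slides.

```python
import numpy as np

def aux_r(w, w0, x_r, y_r, psi, delta, ys, hs):
    """A^r(w) at the point w0: freeze h^r = arg max_h w0 . Psi(x^r, y^r, h)
    and replace the concave term -max_h w . Psi(x^r, y^r, h) by the linear
    -w . Psi(x^r, y^r, h^r).  A^r is convex, touches L^r at w0, and
    upper-bounds L^r everywhere, since max_h w . Psi >= w . Psi(., ., h^r)."""
    h_r = max(hs, key=lambda h: float(w0 @ psi(x_r, y_r, h)))
    first = max(delta(y_r, y) + float(w @ psi(x_r, y, h))
                for y in ys for h in hs)
    return first - float(w @ psi(x_r, y_r, h_r))
```

Minimizing this convex function over w (the QP of the previous slide) yields the next iterate w¹ with L^r(w¹) ≤ A^r(w¹) < A^r(w⁰) = L^r(w⁰).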
• Problem 1: Evaluation — F(x, y, h) = w·Ψ(x, y, h)
• Problem 2: Inference — ỹ = arg max_y max_h F(x, y, h)
• Problem 3: Training — training data: (x¹, y¹), (x², y²), …, (x^r, y^r), …, (x^R, y^R)
  Initialize w, then iterate:
  1. h^r = arg max_h w·Ψ(x^r, y^r, h)
  2. Find w, ε¹, …, ε^R minimizing ½‖w‖² + λ Σ_{r=1}^{R} ε^r subject to
     ∀r, ∀y ∈ 𝕐 with y ≠ y^r, ∀h ∈ ℍ:
     w·Ψ(x^r, y^r, h^r) − w·Ψ(x^r, y, h) ≥ Δ(y^r, y) − ε^r,  ε^r ≥ 0

More ……
• Structured SVM + Hidden Information
  • Chun-Nam John Yu and Thorsten Joachims, "Learning Structural SVMs with Latent Variables," ICML 2009
  • Yang Wang and Greg Mori, "Max-Margin Hidden Conditional Random Fields for Human Action Recognition," CVPR 2009
• Structured Perceptron + Hidden Information
  • Xu Sun et al., "Latent Variable Perceptron Algorithm for Structured Classification," IJCAI 2009
• CRF + Hidden Information
  • Yang Wang and Greg Mori, "Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011

Thanks for your attention!
How to use structured SVM with hidden information for summarization:
http://140.112.21.28/~tlkagk/homepage/courses/MLDS_2015/Structured%20Lecture/Summarization%20Hidden.fsp/index.html

Appendix — Discussion
• Hidden information vs. hidden layers in a DNN.
• Learning with hidden information can be difficult because there is no supervision to guide the hidden information.
• Initialization is important.
• The model described in these slides is similar to GMM, k-means, etc.: it alternates between imputing hidden variables and updating parameters.
• Can any continuous function be represented as a sum of a convex and a concave function?
  http://math.stackexchange.com/questions/13386/can-any-continuous-function-be-represented-as-a-sum-of-convex-and-concave-functi
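The Problem 3 loop can be sketched end-to-end on a toy task. This is a minimal sketch, not the slides' implementation: subgradient descent on the convex upper bound stands in for the QP solver, and `psi`, `delta`, the data, and the hyperparameters are all illustrative assumptions.

```python
import numpy as np

def train_latent(data, psi, delta, ys, hs, dim,
                 lam=1.0, outer=5, inner=50, lr=0.01, seed=0):
    """Alternating training (Problem 3).  Step 1 imputes the hidden h^r with
    the current w; step 2 approximately minimizes the convex upper bound
    1/2 ||w||^2 + lam * sum_r hinge_r(w) by subgradient descent, standing in
    for the QP solver used in the slides."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal(dim)                 # random w0
    for _ in range(outer):
        # Step 1: h^r = arg max_h w . Psi(x^r, y^r, h)
        hr = [max(hs, key=lambda h: float(w @ psi(x, y, h))) for x, y in data]
        # Step 2: subgradient steps with the h^r fixed
        for _ in range(inner):
            g = w.copy()                                # gradient of 1/2||w||^2
            for (x, y_r), h_r in zip(data, hr):
                # most violated (y, h) with y != y^r ("don't care" about h alone)
                y_b, h_b = max(((y, h) for y in ys if y != y_r for h in hs),
                               key=lambda yh: delta(y_r, yh[0])
                               + float(w @ psi(x, yh[0], yh[1])))
                viol = (delta(y_r, y_b) + float(w @ psi(x, y_b, h_b))
                        - float(w @ psi(x, y_r, h_r)))
                if viol > 0:                            # active margin constraint
                    g += lam * (psi(x, y_b, h_b) - psi(x, y_r, h_r))
            w -= lr * g
    return w
```

A returned w can then be used for inference exactly as in Problem 2: ỹ = arg max_y max_h w·Ψ(x, y, h).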