
Optimistic Knowledge Gradient Policy for
Optimal Budget Allocation in Crowdsourcing
Xi Chen
Mentor: Denny Zhou
In collaboration with: Qihang Lin
Machine Learning Department
Carnegie Mellon University
Crowdsourcing Services
 Building predictive models: collecting reliable labels for training
 Crowdsourcing: outsourcing labeling tasks to a group of
workers, who usually are not experts on these tasks
Challenge: budget allocation in crowdsourcing
Labels from the crowd are very noisy: use repeated labeling. Aggregate the labels to infer the truth; more labels lead to higher confidence.
No free lunch: each label costs money!
 Different workers have different reliability
Goal: given a fixed budget, how to sequentially allocate it across item-worker pairs so that the overall labeling accuracy is maximized?
Estimate items' difficulty and workers' reliability, and incorporate these estimates into the sequential allocation process.
 Simplest setting:
 Binary Labeling
Homogeneous Workers
Binary Labeling by Homogeneous Workers
𝐾 items: 𝑖 ∈ {1, … , 𝐾}
Soft label 𝜃𝑖 = Pr(𝑍𝑖 = 1) ∈ [0, 1]: unknown
Easy: 𝜃𝑖 → 0 or 𝜃𝑖 → 1. Difficult: 𝜃𝑖 → 0.5
Positive set: 𝐻∗ = {𝑖: 𝜃𝑖 ≥ 0.5}
 Example: identify whether individuals are adult or not
Binary Labeling by Homogeneous Workers
Homogeneous workers: each worker's label for item 𝑖 is provided according to Bernoulli(𝜃𝑖)
Coin-tossing analogy [CrowdSynth: Kamar et al. '12]:
There are 𝐾 different biased coins; the positive ("head") set is unknown
Labeling procedure: tossing the coin of the chosen item
 Total budget 𝑇 : the number of labels that we can acquire
Challenge: how to dynamically allocate the budget over the 𝐾 items so that the overall accuracy is maximized?
Binary Labeling by Homogeneous Workers
 Dynamic Budget Allocation:
Step 1: dynamically acquire labels from the crowd for different items
Step 2: when the budget 𝑇 is exhausted, make an inference about the positive set 𝐻𝑇 based on the collected labels
Homogeneous workers: majority vote
Heterogeneous workers: aggregation that involves the workers' reliability
Goal: maximize the expected accuracy of 𝐻𝑇 with respect to the true positive set 𝐻∗ (a minimal aggregation sketch follows)
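The aggregation and accuracy computation for homogeneous workers can be sketched in a few lines of Python; the helper names below are illustrative, not the talk's notation.

```python
# A minimal sketch: majority-vote aggregation for homogeneous workers and the
# resulting labeling accuracy against the true positive set.
def majority_vote(labels):
    """labels[i] is the list of +1/-1 labels collected for item i."""
    return {i for i, ys in enumerate(labels) if sum(ys) >= 0}

def accuracy(H_hat, H_star, K):
    """Fraction of the K items whose positive/negative call is correct."""
    return sum((i in H_hat) == (i in H_star) for i in range(K)) / K

labels = [[+1, +1, -1], [-1, -1], [+1]]   # three items, noisy labels
print(accuracy(majority_vote(labels), H_star={0, 2}, K=3))
```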
 Theory Tools: Bayesian Markov Decision Process
 Bayesian Statistical Decision
 Bayesian Sequential Optimization
 Bayesian Reinforcement Learning
Roadmap
Bayesian Markov Decision Process &
Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers’ Reliability
Extensions: Incorporating Feature Information
Extensions: Multi-Class Labeling
Bayesian Markov Decision Process
Beta prior 𝜃𝑖 ∼ Beta(𝑎𝑖, 𝑏𝑖): the conjugate prior of the Bernoulli
𝑎𝑖: count of 1s (plus the prior pseudo-count 𝑎𝑖0)
𝑏𝑖: count of -1s (plus the prior pseudo-count 𝑏𝑖0)
At each stage 𝑡:
Current state: the posterior parameters (𝑎𝑖𝑡, 𝑏𝑖𝑡) of all items
Decision rule: choose the next item 𝑖𝑡 to label; the decision rule is Markovian (it depends only on the current state)
Taking the observation: the label 𝑦𝑡 ∈ {−1, 1} of the chosen item
Allocation policy: the sequence of decision rules across the 𝑇 stages
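A minimal sketch of this state representation, assuming the state is simply the per-item Beta parameters as above (variable names are illustrative only):

```python
# Sketch: the MDP state is the per-item Beta posterior (a_i, b_i); observing a
# label is a conjugate Beta-Bernoulli update.
import numpy as np

K = 5
a = np.ones(K)   # a_i0 = 1: Beta(1, 1) prior
b = np.ones(K)   # b_i0 = 1

def observe(a, b, i, label):
    """Update item i's posterior after seeing label in {+1, -1}."""
    if label == +1:
        a[i] += 1.0
    else:
        b[i] += 1.0
```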
Bayesian Markov Decision Process
State transition and transition probability: labeling item 𝑖 moves (𝑎𝑖, 𝑏𝑖) to (𝑎𝑖 + 1, 𝑏𝑖) with probability 𝑎𝑖/(𝑎𝑖 + 𝑏𝑖), and to (𝑎𝑖, 𝑏𝑖 + 1) otherwise
State: 𝑆𝑡 = {(𝑎𝑖𝑡, 𝑏𝑖𝑡)}, 𝑖 = 1, … , 𝐾
Action: the item 𝑖𝑡 selected at stage 𝑡
Sample path: the sequence of states, actions, and observed labels
Filtration: the history of actions and labels observed up to stage 𝑡
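A sketch of one state transition under this model (the posterior predictive probability 𝑎𝑖/(𝑎𝑖 + 𝑏𝑖) for the next label is the assumption spelled out above):

```python
# One state transition: the next label of item i is +1 with the posterior
# predictive probability a_i / (a_i + b_i).
import numpy as np

rng = np.random.default_rng(0)

def step(a, b, i):
    p_plus = a[i] / (a[i] + b[i])              # transition probability
    label = 1 if rng.random() < p_plus else -1
    if label == 1:
        a[i] += 1.0                            # move to (a_i + 1, b_i)
    else:
        b[i] += 1.0                            # move to (a_i, b_i + 1)
    return label
```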
Final Inference on Positive Set
 When the budget 𝑇 is exhausted, make an inference about
the positive set 𝐻𝑇 based on collected labels.
Bayesian decision rule: include item 𝑖 in 𝐻𝑇 if Pr(𝜃𝑖 ≥ 0.5 | 𝑎𝑖𝑇, 𝑏𝑖𝑇) ≥ 0.5
𝑎𝑖𝑇: count of observed 1s plus 𝑎𝑖0
𝑏𝑖𝑇: count of observed -1s plus 𝑏𝑖0
With a symmetric prior (𝑎𝑖0 = 𝑏𝑖0), this reduces to including item 𝑖 whenever 𝑎𝑖𝑇 ≥ 𝑏𝑖𝑇
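A hedged sketch of this final inference step (using the posterior tail probability of the Beta distribution; names are illustrative):

```python
# Put item i into H_T when the posterior probability that theta_i >= 0.5 is at
# least 0.5 (with a symmetric prior this is just a_iT >= b_iT).
from scipy.stats import beta

def infer_positive_set(a_T, b_T):
    H_T = set()
    for i, (a, b) in enumerate(zip(a_T, b_T)):
        if beta.sf(0.5, a, b) >= 0.5:   # Pr(theta_i >= 0.5 | a, b)
            H_T.add(i)
    return H_T

print(infer_positive_set([3, 1, 2], [1, 4, 2]))   # -> {0, 2}
```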
Expected Accuracy Maximization
Value function: the expected final accuracy, optimized over allocation policies
Optimization: Markov decision process and dynamic programming
The reward is the final accuracy, received only at stage 𝑇: no stage-wise reward yet
Stage-wise Reward
Value function
Telescope expansion [Ng et al. '99; Xie et al. '11]: rewrite the final accuracy as the prior accuracy plus a sum of per-stage increments
Expected stage-wise reward: the expected increase in accuracy from acquiring one more label (see the sketch below)
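A sketch of one possible stage-wise reward, assuming the per-item terminal value is h(a, b) = max(Pr(θ ≥ 0.5), Pr(θ < 0.5)), which matches the Bayesian decision rule above; the exact notation may differ from the slides.

```python
# h(a, b) is the probability that the final Bayesian decision on an item with
# posterior Beta(a, b) is correct; the stage-wise reward is the increment of h
# caused by one more label.
from scipy.stats import beta

def h(a, b):
    p = beta.sf(0.5, a, b)        # Pr(theta >= 0.5 | a, b)
    return max(p, 1.0 - p)

def reward_plus(a, b):            # R1: reward if the new label is +1
    return h(a + 1, b) - h(a, b)

def reward_minus(a, b):           # R2: reward if the new label is -1
    return h(a, b + 1) - h(a, b)
```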
Markov Decision Process
 Value Function
 Stage-wise Reward
State
 Markov Decision Process (Finite-horizon)
Optimal Policy via Dynamic Programming
 Finite Horizon Markov Decision Process:
 Dynamic Programming (a.k.a. Backward Induction)
Curse of dimensionality: the joint state space grows exponentially in the number of items, so exact dynamic programming is intractable
Approximate policies are needed (a toy backward-induction sketch for tiny 𝐾 and 𝑇 follows)
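A toy illustration of exact backward induction, feasible only for very small 𝐾 and 𝑇, using the stage-wise reward sketched earlier (illustrative code, not the talk's implementation):

```python
# Exact backward induction over the joint state space of per-item Beta
# parameters; the state space explodes for realistic K and T.
from functools import lru_cache
from scipy.stats import beta

def h(a, b):
    p = beta.sf(0.5, a, b)
    return max(p, 1.0 - p)

@lru_cache(maxsize=None)
def V(state, remaining):
    """state = ((a_1, b_1), ..., (a_K, b_K)); optimal expected future reward."""
    if remaining == 0:
        return 0.0
    best = float("-inf")
    for i, (a, b) in enumerate(state):
        p1 = a / (a + b)                                  # Pr(next label = +1)
        s_plus = state[:i] + ((a + 1, b),) + state[i + 1:]
        s_minus = state[:i] + ((a, b + 1),) + state[i + 1:]
        q = (p1 * (h(a + 1, b) - h(a, b) + V(s_plus, remaining - 1)) +
             (1 - p1) * (h(a, b + 1) - h(a, b) + V(s_minus, remaining - 1)))
        best = max(best, q)
    return best

print(V(((1, 1), (1, 1)), 3))   # two items, Beta(1,1) priors, budget T = 3
```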
Roadmap
Bayesian Markov Decision Process &
Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers’ Reliability
Extensions: Incorporating Feature Information
Extensions: Multi-Class Labeling
Approximate Policies
Uniform Sampling: allocate labels to items uniformly at random
Finite-Horizon Gittins Index Rule [J. C. Gittins, '79]:
Decompose the joint state space into a per-item state space of size 𝑂(𝑇²)
Infinite-horizon, discounted reward: the Gittins index rule is optimal
Finite-horizon, non-discounted reward: a suboptimal policy
• Exact computation (time & space)
• Approximate computation [Niño-Mora, '11] (time & space)
Knowledge Gradient
Knowledge Gradient (KG) [Powell, '07]: select the item with the largest expected one-step reward
Myopic/single-step look-ahead policy: optimal if only one label remains
Deterministic KG: break ties by choosing the smallest index
Randomized KG: break ties at random
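A minimal sketch of the KG index under the stage-wise reward defined earlier (a sketch under those assumptions, not the talk's exact formulas):

```python
# KG: pick the item with the largest *expected* one-step reward.
import numpy as np
from scipy.stats import beta

def h(a, b):
    p = beta.sf(0.5, a, b)
    return max(p, 1.0 - p)

def kg_index(a, b):
    p1 = a / (a + b)
    return p1 * (h(a + 1, b) - h(a, b)) + (1 - p1) * (h(a, b + 1) - h(a, b))

def kg_choose(a_vec, b_vec):
    scores = [kg_index(a, b) for a, b in zip(a_vec, b_vec)]
    return int(np.argmax(scores))   # deterministic KG: ties -> smallest index
```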
Optimistic Knowledge Gradient
Optimistic Knowledge Gradient: select the item with the largest optimistic (best-case) one-step reward 𝑅+(𝑎, 𝑏) = max(𝑅1(𝑎, 𝑏), 𝑅2(𝑎, 𝑏))
The OKG policy is consistent. Proof sketch:
𝑅+(𝑎, 𝑏) > 0
lim𝑎+𝑏→∞ 𝑅+(𝑎, 𝑏) = 0
As 𝑇 → ∞, each item will be labeled infinitely many times
By the strong law of large numbers, 𝐻𝑇 = 𝐻∗ a.s.
Pessimistic Knowledge Gradient (take the worst-case one-step reward): an inconsistent policy
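A sketch of the optimistic index, assuming 𝑅1 and 𝑅2 are the one-step rewards from the earlier stage-wise-reward sketch:

```python
# OKG scores an item by the *best case* of its two possible one-step rewards;
# the pessimistic variant takes the worst case.
from scipy.stats import beta

def h(a, b):
    p = beta.sf(0.5, a, b)
    return max(p, 1.0 - p)

def okg_index(a, b):
    r_plus = h(a + 1, b) - h(a, b)    # reward if the next label is +1
    r_minus = h(a, b + 1) - h(a, b)   # reward if the next label is -1
    return max(r_plus, r_minus)       # optimistic KG: consistent

def pkg_index(a, b):
    r_plus = h(a + 1, b) - h(a, b)
    r_minus = h(a, b + 1) - h(a, b)
    return min(r_plus, r_minus)       # pessimistic KG: inconsistent
```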
Conditional Value-at-Risk
Conditional Value-at-Risk (CVaR) [Rockafellar and Uryasev, '02]
Value-at-Risk VaR𝛼(𝑅): the 𝛼-upper quantile of the reward 𝑅
Conditional Value-at-Risk CVaR𝛼(𝑅): the expected reward exceeding VaR𝛼(𝑅)
The CVaR-based index spans a spectrum from the Optimistic Knowledge Gradient (max reward, 𝛼 → 0) to the Knowledge Gradient (expected reward, 𝛼 → 1)
CVaR is a consistent policy for any 𝛼 < 1
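A hedged sketch of CVaR for the two-point one-step reward distribution, taken here as the expectation over the top-𝛼 fraction of outcomes (an assumed reading of "expected reward exceeding VaR𝛼"):

```python
# CVaR_alpha of a reward that equals r_plus with probability p and r_minus
# with probability 1 - p.  alpha -> 0 recovers the optimistic (max-reward)
# index; alpha -> 1 recovers the KG (expected-reward) index.
def cvar_index(r_plus, r_minus, p, alpha):
    hi, lo = max(r_plus, r_minus), min(r_plus, r_minus)
    p_hi = p if r_plus >= r_minus else 1.0 - p   # probability of the better outcome
    if alpha <= p_hi:
        return hi                                # the top-alpha tail is all "hi"
    return (p_hi * hi + (alpha - p_hi) * lo) / alpha
```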
Experiments
Simulated Data
𝐾 = 50, 𝜃𝑖 ∼ Beta(1,1)
𝑇 = 2𝐾, 3𝐾, … , 10𝐾
Recognizing Textual Entailment Data
(Snow et al., EMNLP'08)
𝐾 = 800, 𝜃𝑖 ∼ Beta(1,1)
𝑇 = 2𝐾, 3𝐾, … , 10𝐾
Roadmap
Bayesian Markov Decision Process &
Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers’ Reliability
Extensions: Incorporating Feature Information
Extensions: Multi-Class Labeling
Heterogeneous Workers: modeling reliability
Labeling matrix: 𝑍𝑖𝑗 ∈ {−1, 1}
Heterogeneous workers:
Model workers' reliability to facilitate estimation of the true labels
Assign more items to reliable workers
 𝑁 items 1 ≤ 𝑖 ≤ 𝑁
 𝜃𝑖 = Pr(𝑍𝑖 = 1)
𝜃𝑖 ∼ Beta(𝑎𝑖0 , 𝑏𝑖0 )
𝑀 workers, 1 ≤ 𝑗 ≤ 𝑀
Reliability [Dawid and Skene, '79]: 𝜌𝑗 = Pr(𝑍𝑖𝑗 = 𝑍𝑖 | 𝑍𝑖)
𝜌𝑗 ∼ Beta(𝑐𝑗0, 𝑑𝑗0)
Action space: (𝑖, 𝑗) ∈ {1, … , 𝑁} × {1, … , 𝑀}
Likelihood (law of total probability): Pr(𝑍𝑖𝑗 = 1) = 𝜃𝑖𝜌𝑗 + (1 − 𝜃𝑖)(1 − 𝜌𝑗)
Setting 𝜌𝑗 ≡ 1 recovers the homogeneous worker model
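A one-line sketch of this likelihood (illustrative names):

```python
# Law of total probability: a worker with reliability rho_j reports the true
# label with probability rho_j and flips it otherwise.
def prob_positive_label(theta_i, rho_j):
    """Pr(Z_ij = +1) for item i with soft label theta_i and worker j."""
    return theta_i * rho_j + (1.0 - theta_i) * (1.0 - rho_j)

# rho_j = 1 recovers the homogeneous (perfect-worker) model:
assert abs(prob_positive_label(0.7, 1.0) - 0.7) < 1e-12
```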
Variational Approximation and Moment Matching
Prior: product of Beta distributions
Likelihood: the worker-reliability model from the previous slide
Posterior: no longer a product of Beta distributions
Variational approximation: approximate the posterior by the product of its marginal distributions
Approximate each marginal by a Beta distribution using moment matching (see the sketch below)
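A minimal moment-matching sketch (fitting a Beta to a given mean and variance; assumed helper, not the talk's exact derivation):

```python
# Fit Beta(a, b) to an approximate marginal with mean m and variance v by
# matching the first two moments.
def beta_moment_match(m, v):
    common = m * (1.0 - m) / v - 1.0   # must be positive for a valid Beta
    return m * common, (1.0 - m) * common

print(beta_moment_match(0.7, 0.01))    # -> roughly (14.0, 6.0)
```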
Two-coin model: allow each worker's reliability to differ between positive and negative items (two reliability parameters per worker)
Optimistic Knowledge Gradient
Reward of getting label +1
Reward of getting label -1
(each acquired label now updates both the item's posterior and the worker's reliability posterior)
Experiments
Simulated Data
𝐾 = 50, 𝜃𝑖 ∼ Beta(1,1)
𝑀 = 10, 𝜌𝑗 ∼ Beta(4,1)
𝑇 = 2𝐾, 3𝐾, … , 10𝐾
Recognizing Textual Entailment Data
(Snow et al., EMNLP'08)
𝐾 = 800, 𝜃𝑖 ∼ Beta(1,1), 𝑀 = 164, 𝜌𝑗 ∼ Beta(4,1)
𝑇 = 2𝐾, 3𝐾, … , 10𝐾
Homogeneous (Perfect) Workers
Heterogeneous Workers
Best accuracy (92.25%) achieved with only 40% of the budget
Roadmap
Bayesian Markov Decision Process &
Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers’ Reliability
Extensions: Incorporating Feature Information
Extensions: Multi-Class Labeling
Incorporating Feature Information
Each item 𝑖 has a feature vector 𝒙𝑖 ∈ ℝ𝑝
Prior: Gaussian prior on the weights 𝒘 of a logistic model linking 𝒙𝑖 to 𝜃𝑖
Posterior: no longer Gaussian, so approximations are needed
 Laplace Approximation:
Updated mean
Updated covariance: rank-1 update via the Sherman-Morrison formula (see the sketch after this slide's bullets)
 Bottleneck:
Variational Bayesian logistic regression update [Jaakkola & Jordan, '00]
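A hedged sketch of a rank-1 Sherman-Morrison covariance update: if one observation adds a rank-1 term λ·𝒙𝒙ᵀ to the posterior precision, the covariance can be refreshed without re-inverting a 𝑝 × 𝑝 matrix. The weight `lam` below is an illustrative curvature term, not notation from the talk.

```python
import numpy as np

def rank1_covariance_update(Sigma, x, lam):
    """Return (Sigma^{-1} + lam * x x^T)^{-1} via Sherman-Morrison, in O(p^2)."""
    Sx = Sigma @ x
    return Sigma - np.outer(Sx, Sx) * (lam / (1.0 + lam * (x @ Sx)))
```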
Incorporating Feature Information
Simulated Data
𝐾 = 50, feature dimension 𝑝 = 10
𝒘 ∼ 𝑁(0, 0.1 · 𝑰), 𝒙 ∼ 𝑁(0, 𝚺) with 𝚺𝑖𝑗 = 0.3^|𝑖−𝑗|
Roadmap
Bayesian Markov Decision Process &
Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers’ Reliability
Extensions: Incorporating Feature Information
Extensions: Multi-Class Labeling
Multi-Class Labeling
Classes: 𝑐 = 1, … , 𝐶
Binary labeling:
𝜃𝑖 ∈ [0, 1]: underlying probability of being positive
𝜃𝑖: Beta prior
𝑦𝑖𝑡 ∈ {−1, 1}: Bernoulli(𝜃𝑖)
Multi-class labeling:
𝜽𝑖 = (𝜃𝑖1, … , 𝜃𝑖𝐶) with 𝜃𝑖1 + ⋯ + 𝜃𝑖𝐶 = 1
𝜃𝑖𝑐: underlying probability of belonging to class 𝑐
𝜽𝑖: Dirichlet prior
𝑦𝑖𝑡 ∈ {1, 2, … , 𝐶}: Categorical(𝜽𝑖)
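A minimal sketch of the multi-class analogue of the Beta-Bernoulli state (illustrative names):

```python
# Each item keeps a Dirichlet posterior over C classes, updated by
# incrementing the count of the observed class.
import numpy as np

C = 4
alpha = np.ones(C)                  # Dirichlet(1, ..., 1) prior for one item

def observe_class(alpha, c):
    """Conjugate Dirichlet update after observing class c in {0, ..., C-1}."""
    alpha[c] += 1.0
    return alpha

print(np.argmax(observe_class(alpha, 2)))   # most probable class so far -> 2
```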
Real Experiment
Stanford Image Data (4 classes of dogs)
(Zhou et al., NIPS'12; labeled via Amazon MTurk)
𝐾 = 807, 𝜃𝑖 ∼ Dirichlet(1, … ,1)
𝑇 = 2𝐾, 3𝐾, … , 10𝐾
Bing Search Relevance Data
(5 Ratings)
𝐾 = 2653, 𝜃𝑖 ∼ Dirichlet(1, … , 1)
𝑇 = 2𝐾, 3𝐾, … , 6𝐾
Conclusions
 A general MDP framework for budget allocation in crowdsourcing
 Optimistic Knowledge Gradient Policy: approximate dynamic
programming
Future work:
Reducing computational cost (e.g., in the feature-based and multi-class settings)
 Budget allocation in other settings in crowdsourcing (e.g., rating)
 Make the current framework more practical: batch assignment
(assign a set of items to a worker at each stage)
Apply the algorithms to real platforms at Bing
Acknowledgement
Great summer at Redmond: May 1st ~ Oct 12th
CLUES Group
Machine Learning Department
Theory Group
Interns