Adaptive entity resolution
with human computation
龚赛赛 (Saisai Gong)
2015-10-26
Contents
Background and Motivation
Related Work
Model Overview
Background
• ER: identify entity descriptions that refer to the same real-world object
• Hybrid human-machine approaches
  • Leverage human intelligence
  • Improving the quality of user-judged results is important
• Users usually have diverse competence w.r.t. resolving different entities
  • Users have different backgrounds and domain knowledge
  • Unfamiliar resolution tasks -> low accuracy
• So, it is necessary to assign tasks to competent users
Motivation
• In the Semantic Web community, few works consider users’ diverse
competence across resolution tasks.
• In other communities (e.g., ML and DB), several works on handling
crowdsourcing tasks propose approaches to estimate users’ competence
and adaptively assign tasks.
• However, these approaches do not fully exploit the characteristics of Linked
Data and need to be tailored.
• Our goal: estimate user competence based on similar completed tasks
and adaptively assign tasks to competent users
Related Work
• Use crowdsourcing to acquire user contributions, with various goals
  • Infer true labels from the crowd in the presence of unreliable users
  • Reduce cost, e.g.:
    • Ref. [1] (VLDB ’12) uses clustering algorithms to reduce verification tasks for crowd ER
    • Ref. [2] (VLDB ’13) selects the best question to ask next for crowd ER
  • Others
• Common algorithms: EM; others such as minimax entropy
• Two important components (unified in a single model or kept separate):
  • Estimating user competence (also called expertise or quality)
  • Adaptive task assignment
• Heterogeneous vs. homogeneous tasks
Related Work
• Estimating user competence
  • Based on prior knowledge of a probabilistic distribution
  • Based on similar tasks
  • Based on ground truth
    • The fraction of tasks with ground truth that a user labeled correctly (see the sketch below)
• Adaptive task assignment
  • Assign each task to the most competent users
    • For a task, assign it to the users with the highest estimated competence values
  • Global optimization
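The ground-truth variant reduces to a simple ratio. A minimal Python sketch (the function name and data layout are illustrative, not from any cited system):

```python
# Competence = fraction of gold-standard tasks the user labeled correctly.
# `answers` maps task id -> the user's label; `gold` maps task id -> truth.
def competence_from_gold(answers, gold):
    scored = [t for t in answers if t in gold]   # tasks with ground truth
    if not scored:
        return None                              # no evidence yet
    correct = sum(answers[t] == gold[t] for t in scored)
    return correct / len(scored)

# Example: 2 of 3 gold tasks correct -> competence ~0.67
print(competence_from_gold({1: "match", 2: "non-match", 3: "match"},
                           {1: "match", 2: "match", 3: "match"}))
```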
Related Work
• Estimating user competence based on prior knowledge of a probabilistic
distribution (Ref. [3], ICDM ’12)
  • Notation: X: instance feature vector; Z: true label; Y: user label;
    A: user competence
  • N: number of instances; M: number of users; i: instance index; j: user index
Related Work
• Estimating user competence based on prior knowledge of a probabilistic
distribution (cont.)
  • E step: compute the posterior of each true label Z given the user labels Y
    (sketched below)
  • M step: re-estimate the user competences A
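The E- and M-step equations on the original slide were images; below is a minimal LaTeX sketch of a standard one-coin EM formulation for crowd labels, using the notation above. The exact model in Ref. [3] may differ (it also involves the features X, which this sketch omits).

```latex
% One-coin model (assumed): user j answers correctly with probability a_j.
% E step: posterior over the true label z_i given all user labels y_ij
\mu_i(z) \;\propto\; P(z)\prod_{j=1}^{M}
  a_j^{\mathbb{1}[y_{ij}=z]}\,(1-a_j)^{\mathbb{1}[y_{ij}\neq z]}

% M step: competence = expected fraction of the user's answers that are correct
a_j \;=\; \frac{1}{N}\sum_{i=1}^{N} \mu_i(y_{ij})
```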
Related Work
• Estimating user competence based on similar tasks (Ref. [4], SIGMOD ’15)
  • Intuition: similar tasks should have similar estimated accuracies
  • The estimated accuracy p should also stay close to the real (observed)
    accuracy q on completed tasks
  • Solved by a PageRank-style iterative computation (a sketch follows)
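A minimal Python sketch of the "similar tasks, similar accuracies" idea, assuming a precomputed task-similarity matrix; the damping constant and update rule are illustrative, not iCrowd's exact formulation:

```python
import numpy as np

def estimate_accuracies(S, q, observed, d=0.85, iters=50):
    """S: n x n task-similarity matrix for one user;
    q: observed accuracies on completed tasks (length n);
    observed: boolean mask marking the completed tasks."""
    # Row-normalize S so each task averages over its similar tasks
    W = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)
    p = np.full(len(q), 0.5)           # neutral starting estimate
    for _ in range(iters):
        p = d * (W @ p) + (1 - d) * p  # pull toward similar tasks' estimates
        p[observed] = q[observed]      # anchor p to the real accuracy q
    return p
```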
Related Work
• Estimating user competence based on similar tasks (Ref. [6], AAAI ’14)
  • X’: a high-level representation of instances learned via transfer
    learning, used in place of the raw features
Related Work
• Adaptive task assignment with global optimization (Ref. [4], SIGMOD ’15)
  • Optimization target: maximize the overall estimated accuracy of the
    task-user assignment (formula lost in extraction)
  • Greedy assignment: iteratively pick the assignment that maximizes the
    estimated accuracy gain (see the sketch below)
  • More sophisticated techniques exist, e.g., the online primal-dual
    technique of Ref. [5]
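A minimal Python sketch of one greedy step, assuming per-task competence estimates and a per-user capacity limit (the names and the capacity model are illustrative, not iCrowd's exact procedure):

```python
import heapq

def greedy_assign(task, competence, capacity, k=3):
    """competence: {user: estimated accuracy on `task`};
    capacity: {user: remaining task slots}; returns the k users chosen."""
    eligible = [(c, u) for u, c in competence.items() if capacity[u] > 0]
    chosen = [u for c, u in heapq.nlargest(k, eligible)]
    for u in chosen:
        capacity[u] -= 1               # consume one slot per assignment
    return chosen

# Example: u3 is the most competent but has no remaining capacity
caps = {"u1": 2, "u2": 1, "u3": 0}
print(greedy_assign("t7", {"u1": 0.9, "u2": 0.8, "u3": 0.95}, caps, k=2))
# -> ['u1', 'u2']
```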
Model overview
[Architecture diagram: tasks {ti} (entity pairs) feed the Competence
Estimator, which derives user competences {cij} from task similarities
{ti, tk, uj} and observed accuracies {tk, pk, uj}; the Task Selector and
Assigner issues the next task to its assigned users (User 1 ... User m),
and the users’ task answers flow back into the estimator.]
Model overview
• Task similarity between entity pairs <ei, ej> and <ea, eb> (see the sketch below)
  • Sim_ea = max(sim(ea, ei), sim(ea, ej)); Sim_eb analogously
  • Task similarity = (Sim_ea + Sim_eb) / 2
  • Entity similarity sim(ea, ei) draws on:
    • Same data source and similar types
      • ISub similarity of class names, or shortest path length
    • Overlap of the properties used in the descriptions
      • ISub similarity of property names
    • Connected neighbors
      • Random walk
    • Semantics based
      • e.g., owl:sameAs
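A minimal Python sketch of the max/average combination above. The entity-level sim() (aggregating the type, property, neighbor, and owl:sameAs evidence) is assumed to be given and to return values in [0, 1]:

```python
def task_similarity(task_a, task_b, sim):
    """task_a = (ea, eb), task_b = (ei, ej): the two entity pairs to compare."""
    (ea, eb), (ei, ej) = task_a, task_b
    sim_ea = max(sim(ea, ei), sim(ea, ej))  # best counterpart for ea
    sim_eb = max(sim(eb, ei), sim(eb, ej))  # best counterpart for eb
    return (sim_ea + sim_eb) / 2
```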
Model overview
• Task selector by uncertainty
  • Entropy based: pick the task whose current answers are most uncertain
    (see the sketch below)
• Task assigner
  • The top-k users with the highest estimated competences
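A minimal Python sketch of entropy-based selection, assuming each task carries a current P(match) estimate (the data layout is illustrative):

```python
import math

def entropy(p):
    """Binary entropy of P(match) = p; maximal at p = 0.5."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_next_task(match_probs):
    """match_probs: {task: current P(match)} -> the most uncertain task."""
    return max(match_probs, key=lambda t: entropy(match_probs[t]))

# Example: t2 sits at 0.5, the point of maximum uncertainty
print(select_next_task({"t1": 0.9, "t2": 0.5, "t3": 0.1}))  # -> t2
```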
References
1. Wang, J., Kraska, T., Franklin, M. J., Feng, J.: CrowdER: Crowdsourcing entity
resolution. VLDB, 5(11):1483-1494, 2012
2. Whang, S. E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity
resolution. VLDB, 6(6):349-360, 2013
3. Fang, M., et al.: Self-taught active learning from crowds. In: ICDM, pp. 858-863, 2012
4. Fan, J., Li, G., Ooi, B. C., Tan, K., Feng, J.: iCrowd: An adaptive crowdsourcing
framework. In: SIGMOD, pp. 1015-1030, 2015
5. Ho, C. J., Jabbari, S., Vaughan, J. W.: Adaptive task assignment for crowdsourced
classification. In: ICML, pp. 534-542, 2013
6. Fang, M., Yin, J., Tao, D.: Active learning for crowdsourcing using knowledge
transfer. In: AAAI, pp. 1809-1815, 2014