Vote Calibration in Community
Question-Answering Systems
Bee-Chung Chen (LinkedIn), Anirban Dasgupta (Yahoo!
Labs), Xuanhui Wang (Facebook), Jie Yang (Google)
SIGIR 2012
This work was conducted when all authors were
affiliated with Yahoo!
1
Why Am I Presenting This Paper?
• Vote bias exists in many social media
platforms
• This paper addresses a problem in a relatively old context, CQA, from a new perspective: crowdsourcing the identification of quality content
2
Outline
• Motivation
• Related Work
• Data Set
• Vote Calibration Model
• Exploratory Analysis
• Features
• Experimental Results
• Conclusion
3
Community Question Answering
Crowd-sourced alternative to search engines for providing information
4
Community Question Answering
Commercial spam: can mostly be tackled by conventional machine learning
Low-quality content: difficult for machines to detect!
Crowdsourcing quality content identification
5
Voting Mechanism
• Votes provide signals of content quality
• Votes provide signals of user expertise
6
Voting in Yahoo! Answers
• The asker votes for the best answer
• If the asker does not vote for a best answer within a certain period, other users in the community vote
• Thumb-up or thumb-down votes on each individual answer
• However… are users’ votes always unbiased?
7
Potential Bias
• Vote more positively for friends’ answers
• Use votes to show appreciation rather than to identify high-quality content
• Game the system to obtain high status: multiple accounts vote for one another
• For questions about opinions, vote for answers that share the same opinion
• …
8
Potential Bias
• Trained human editors judge answers based on a set of well-defined guidelines
• Raw user votes have low correlation with the editorial judgments
9
Motivation
• Propose the problem of vote calibration in
CQA systems
• Based on exploratory data analysis, identify a
variety of potential factors that bias the votes
• Develop a supervised, content-agnostic model for vote calibration
10
Related Work
• Predicting user-voted best answer
– Assumption: readily available user-voted best answers are ground truth
• Predicting editorial judgments
– User votes are used as features; calibration of each individual vote has not been studied
• Content-agnostic user expertise estimation
11
Dataset
• Editorial data
– Sample questions and answers from Yahoo!
Answers
– Assign a quality grade to each answer according to a predetermined set of editorial guidelines: excellent, good, fair, bad
– 21,525 editorially judged answers on 7,372 questions
12
Dataset
• The distribution of editorial grades for best answers is not very different from that for non-best answers: low correlation between users’ best-answer votes and answer quality
• A significant percentage (>70%) of best answers are not even good
• Many non-best answers are actually good or
excellent
13
Dataset
• Numeric quality scores: excellent = 1, good = 0.5, fair = 0, bad = -0.5
• Voting data: 1.3M questions, 7.0M answers, 0.5M asker best-answer votes, 2.1M community best-answer votes, 9.1M thumb-up/down votes
14
Vote Calibration Model
15
Vote Calibration Model
• Three types of votes
– Asker votes: best-answer votes by the asker
• +1 for the best answer
• -1 for the other answers
– CBA votes: community best-answer votes
• +1 from a voter for the answer they vote as best
• -1 from that voter for the other answers
– Thumb votes: thumb-up and thumb-down votes
• +1 for a thumb-up
• -1 for a thumb-down
16
Average Vote of An Answer
(Formula on slide: the type-t average vote of an answer averages the calibrated type-t votes, with pseudo votes acting as a prior.)
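A minimal sketch of one plausible form of this average (the symbols are my own, not from the slide): assuming answer $j$ receives calibrated type-$t$ votes $\tilde{v}_{tij}$ from the voters in $V_{tj}$, smoothed by $\alpha_t$ pseudo votes with prior mean $\mu_t$,

$\bar{v}^{\mathrm{ans}}_{tj} = \dfrac{\alpha_t \mu_t + \sum_{i \in V_{tj}} \tilde{v}_{tij}}{\alpha_t + |V_{tj}|}$

With few real votes the prior dominates; with many votes the calibrated votes dominate.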
17
Average Vote of An Answerer/User
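Presumably defined analogously at the user level; a sketch under the same assumed notation, pooling the calibrated type-$t$ votes over all answers $A_u$ written by user $u$ and again smoothing with pseudo votes,

$\bar{v}^{\mathrm{user}}_{tu} = \dfrac{\alpha'_t \mu'_t + \sum_{j \in A_u} \sum_{i \in V_{tj}} \tilde{v}_{tij}}{\alpha'_t + \sum_{j \in A_u} |V_{tj}|}$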
18
Quality Prediction Function
Calibrated vote aggregation model (formula on slide: a bias term plus answer-level and user-level components for each vote type).
Quality prediction: a weighted sum of the answer-level and user-level average vote values of all types for an answer.
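A minimal sketch of this weighted sum, continuing the assumed notation above, where $u(j)$ is the answerer of answer $j$, $b$ is the bias term, and the weights $w^{\mathrm{ans}}_t, w^{\mathrm{user}}_t$ are learned:

$\hat{q}_j = b + \sum_{t \in \{\text{asker},\,\text{CBA},\,\text{thumb}\}} \big( w^{\mathrm{ans}}_t \, \bar{v}^{\mathrm{ans}}_{tj} + w^{\mathrm{user}}_t \, \bar{v}^{\mathrm{user}}_{t,u(j)} \big)$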
19
Training Algorithm
• Determine the model parameters by minimizing a loss function over the editorially judged answers (formula omitted here)
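A plausible form of that loss, assuming squared error against the editorial quality scores $q_j$ and an L2 regularizer with weight $\lambda$ (both assumptions on my part):

$\mathcal{L}(\theta) = \sum_{j \in \text{labeled}} \big( \hat{q}_j - q_j \big)^2 + \lambda \lVert \theta \rVert^2$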
• The loss is minimized by gradient descent
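As a toy illustration of this training step only, here is a gradient-descent fit of the weighted-sum predictor, assuming the six average-vote features per labeled answer are already computed (the data, learning rate, and regularization strength are made up; the per-vote calibration functions are omitted):

import numpy as np

# Toy data: 6 average-vote features (answer/user level x 3 vote types)
# and editorial scores generated from a hypothetical linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
true_w = np.array([0.5, 0.3, 0.2, 0.4, 0.1, 0.2])
q = X @ true_w + 0.1 * rng.normal(size=1000)

w, b = np.zeros(6), 0.0
lr, lam = 0.1, 1e-3
for _ in range(500):
    err = X @ w + b - q                                  # prediction error
    w -= lr * (2 * X.T @ err / len(q) + 2 * lam * w)     # gradient of squared loss + L2
    b -= lr * 2 * err.mean()

print("learned weights:", np.round(w, 2), "bias:", round(b, 3))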
20
Self Voting
• Self votes account for 33% of all CBA votes
• For users who cast at least 20 votes, the percentage of self votes rises above 40%
21
Vote Spread and Reciprocity
22
Interaction Bias
• A chi-squared statistic and a randomized test show that past interaction could provide useful features for vote calibration (a toy version of such a test is sketched below)
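A minimal sketch of the kind of independence test this refers to, on a hypothetical 2×2 table of past interaction (yes/no) versus vote polarity (positive/negative); the counts below are invented for illustration:

from scipy.stats import chi2_contingency

# Hypothetical counts: rows = voter had past interaction with the answerer (yes/no),
# columns = vote polarity (positive/negative).
table = [[900, 100],   # past interaction
         [600, 400]]   # no past interaction
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")  # a small p suggests votes depend on past interaction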
23
Features
• Voter features
24
Features
• Relation features
25
Feature Transformation
• For each count feature C, include log(1 + C) as an additional feature
• For each ratio feature R, include the quadratic term R² (see the sketch below)
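A minimal sketch of these transformations; the feature names are hypothetical, not taken from the paper:

import math

def expand_features(counts, ratios):
    """Add log(1 + C) companions for count features and R^2 companions for ratio features."""
    expanded = {}
    for name, c in counts.items():
        expanded[name] = c
        expanded[name + "_log"] = math.log1p(c)   # log(1 + C)
    for name, r in ratios.items():
        expanded[name] = r
        expanded[name + "_sq"] = r * r            # R^2
    return expanded

# Hypothetical example values
print(expand_features({"num_votes_cast": 42}, {"self_vote_ratio": 0.4}))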
26
Experimental Results
• User-level expert ranking
– How well we rank users based on the predicted
user-level scores
• Answer ranking
– How well we rank answers based on the predicted
answer-level scores
27
Experimental Results
28
Comparison of Calibration Models
29
Impact on Heavy Users
30
Conclusion
• Introduce the vote calibration problem for CQA
• Analyze potential biases in users’ voting behavior and propose a set of features to capture them
• Supervised calibrated models outperform their non-calibrated versions
31
• Thanks
• Q&A
32