Vote Calibration in Community Question-Answering Systems
Bee-Chung Chen (LinkedIn), Anirban Dasgupta (Yahoo! Labs), Xuanhui Wang (Facebook), Jie Yang (Google)
SIGIR 2012
This work was conducted when all authors were affiliated with Yahoo!

Why I Am Presenting This Paper
• Vote bias exists in many social media platforms
• This paper tackles a problem in a relatively old context, CQA, from a new perspective: crowdsourcing the identification of quality content

Outline
• Motivation
• Related Work
• Data Set
• Vote Calibration Model
• Exploratory Analysis
• Features
• Experimental Results
• Conclusion

Community Question Answering
• A crowdsourced alternative to search engines for providing information

Community Question Answering
• Commercial spam: can mostly be tackled by conventional machine learning
• Low-quality content: difficult for machines to detect!
• Hence: crowdsourcing quality content identification

Voting Mechanism
• Content quality
• User expertise

Voting in Yahoo! Answers
• The asker votes for the best answer
• If the asker does not vote for a best answer within a certain period, other users in the community vote
• Thumb-up or thumb-down votes on each individual answer
• However... are users' votes always unbiased?

Potential Bias
• Voting more positively for friends' answers
• Using votes to show appreciation instead of identifying high-quality content
• Gaming the system to obtain high status: multiple accounts that vote for one another
• For opinion questions, voting for answers that share the same opinion
• ...

Potential Bias
• Trained human editors judged answers based on a set of well-defined guidelines
• Raw user votes have low correlation with the editorial judgments

Motivation
• Propose the problem of vote calibration in CQA systems
• Based on exploratory data analysis, identify a variety of potential factors that bias the votes
• Develop a model for vote calibration based on supervised learning, using a content-agnostic approach

Related Work
• Predicting the user-voted best answer
  – Assumption: the readily available user-voted best answers are ground truth
• Predicting editorial judgments
  – User votes are used as features; calibration of each individual vote has not been studied
• Content-agnostic user expertise estimation

Dataset
• Editorial data
  – Questions and answers sampled from Yahoo! Answers
  – Each answer is given a quality grade according to a predetermined set of editorial guidelines: excellent, good, fair, or bad
  – 21,525 editorially judged answers on 7,372 questions

Dataset
• The distribution of editorial grades for best answers is not very different from that for non-best answers
• Low correlation between users' best-answer votes and answer quality
• A significant percentage (>70%) of best answers are not even good
• Many non-best answers are actually good or excellent

Dataset
• Numeric quality scores: excellent = 1, good = 0.5, fair = 0, bad = -0.5
• Voting data: 1.3M questions, 7.0M answers, 0.5M asker best-answer votes, 2.1M community best-answer votes, 9.1M thumb-up/down votes

Vote Calibration Model

Vote Calibration Model
• Three types of votes
  – Asker votes: best-answer votes by the asker
    • +1 for the best answer
    • -1 for the other answers
  – CBA votes: community best-answer votes
    • +1 from a voter for the answer the voter selects as best
    • -1 from that voter for the other answers
  – Thumb votes: thumb-up and thumb-down votes
    • +1 for a thumb up
    • -1 for a thumb down

Average Vote of an Answer
• The average of the calibrated type-t votes on the answer, smoothed with pseudo votes that act as a prior

Average Vote of an Answerer/User
• The same smoothed average of calibrated votes, computed over all answers by the user

Quality Prediction Function
• Calibrated vote aggregation model: a bias term plus answer-level and user-level terms
• Quality prediction: a weighted sum of the answer-level and user-level average vote values of all vote types on an answer

Training Algorithm
• Model parameters are determined by minimizing a loss function over the editorially judged training data
• Gradient descent is used to find the parameters
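To make the aggregation and training slides concrete, here is a minimal sketch of how a calibrated vote aggregation of this kind could be implemented. All names (calibrated_vote, smoothed_average, predict_quality, fit_weights), the logistic per-vote weighting, the pseudo-count smoothing, and the squared-error loss are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def calibrated_vote(raw_vote, features, w_t):
    """Reweight a raw +1/-1 vote by a per-vote calibration weight in (0, 1),
    computed from the vote's feature vector with a per-type weight vector."""
    weight = 1.0 / (1.0 + np.exp(-np.dot(w_t, features)))
    return weight * raw_vote

def smoothed_average(calibrated_votes, prior_mean=0.0, pseudo_count=5.0):
    """Average of calibrated votes, shrunk toward a prior via pseudo votes."""
    total = sum(calibrated_votes) + prior_mean * pseudo_count
    return total / (len(calibrated_votes) + pseudo_count)

def predict_quality(answer_avgs, user_avgs, alpha, beta, b):
    """Predicted quality = bias + weighted sum of answer-level and user-level
    average vote values over all vote types (asker, CBA, thumb)."""
    score = b
    for t in answer_avgs:
        score += alpha[t] * answer_avgs[t] + beta[t] * user_avgs[t]
    return score

def fit_weights(examples, types=("asker", "cba", "thumb"), lr=0.01, n_iters=500):
    """Fit (alpha, beta, b) by gradient descent on squared error against
    editorial quality scores; a simplified stand-in for the paper's training,
    which would learn all parameters jointly."""
    alpha = {t: 0.0 for t in types}
    beta = {t: 0.0 for t in types}
    b = 0.0
    for _ in range(n_iters):
        for answer_avgs, user_avgs, y in examples:
            err = predict_quality(answer_avgs, user_avgs, alpha, beta, b) - y
            b -= lr * err
            for t in types:
                alpha[t] -= lr * err * answer_avgs[t]
                beta[t] -= lr * err * user_avgs[t]
    return alpha, beta, b
```

In the full model, the per-vote calibration weights (w_t in this sketch) would be learned jointly with the aggregation weights from the voter and relation features, rather than fixed beforehand.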
Self Voting
• Self votes contribute 33% of all CBA votes
• Among users who cast at least 20 votes, the percentage of self votes rises above 40%

Vote Spread and Reciprocity

Interaction Bias
• A chi-squared statistic and a randomization test show that past interactions could be useful features for vote calibration

Features
• Voter features

Features
• Relation features

Feature Transformation
• For each count feature C, consider log(1 + C) as an additional feature
• For each ratio feature R, include a quadratic term R²
• (a small sketch of these transformations follows after the final slide)

Experimental Results
• User-level expert ranking
  – How well we rank users based on the predicted user-level scores
• Answer ranking
  – How well we rank answers based on the predicted answer-level scores

Experimental Results

Comparison of Calibration Models

Impact on Heavy Users

Conclusion
• Introduces the vote calibration problem for CQA systems
• Proposes a set of features to capture bias, based on an analysis of potential bias in users' voting behavior
• Supervised calibrated models outperform their non-calibrated versions

Thanks
Q&A
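Following up on the Feature Transformation slide, here is a minimal sketch of the count and ratio transformations. The feature names (votes_count, self_vote_ratio) and the suffix-based dispatch are made up for illustration; they are not the paper's feature set.

```python
import math

def transform_features(raw):
    """Expand raw features: add log(1 + C) for count features C and a
    quadratic term R^2 for ratio features R."""
    out = dict(raw)
    for name, value in raw.items():
        if name.endswith("_count"):      # count feature C -> log(1 + C)
            out[name + "_log"] = math.log1p(value)
        elif name.endswith("_ratio"):    # ratio feature R -> R^2
            out[name + "_sq"] = value * value
    return out

# Example: a voter with 120 lifetime votes, 40% of which are self votes
print(transform_features({"votes_count": 120, "self_vote_ratio": 0.4}))
```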