Machine Learning in Microsoft’s Online Services: TrueSkill, AdPredictor, and Matchbox

Thore Graepel
Online Services and Advertising Research
Microsoft Research Cambridge, UK
Joint work with Thomas Borchert, Joaquin Quiñonero Candela, Ralf Herbrich, David Stern, and Tom Minka
Industry Day at ECML/PKDD 2010 - Barcelona

Outline
TrueSkill™: Ranking and Matchmaking
• Ranking and Matchmaking Task
• TrueSkill Model and Inference
AdPredictor™: Click-Through Rate Prediction
• Paid Search Advertising and Click-Through Rate Prediction
• AdPredictor Model and Inference, and Results
Matchbox: Large-Scale Recommendations
• Recommendation with Meta-Data
• Matchbox Model and Inference, and Results

TRUESKILL
Thore Graepel, Ralf Herbrich, Tom Minka, and our friends from Xbox Live
• Ralf Herbrich, Tom Minka, and Thore Graepel, TrueSkill(TM): A Bayesian Skill Rating System, in Advances in Neural Information Processing Systems 20, MIT Press, January 2007
• Pierre Dangauthier, Ralf Herbrich, Tom Minka, and Thore Graepel, TrueSkill Through Time: Revisiting the History of Chess, in Advances in Neural Information Processing Systems 20, MIT Press, 2008

Ranking and Matchmaking
• Competition is central to our lives
  – Evolutionary principle
  – Driving principle of many sports
• Chess Rating for fair competition
  – ELO: Developed in 1960 by Árpád Imre Élő
  – Matchmaking system for tournaments
• Challenges of online gaming
  – Learn from few match outcomes efficiently
  – Support multiple teams and multiple players per team

The Skill Rating Problem

Multiple Team Match Outcome Model
[Figure: graphical model with skill variables s1, s2, s3, s4, team variables t1, t2, t3, and outcome variables y12, y23]

Efficient Approximate Inference
[Figure: factor graph with Gaussian Prior Factors on s1, s2, s3, s4, team variables t1, t2, t3, and Ranking Likelihood Factors on y12, y23]

Applications to Online Gaming
• Matchmaking
  – For gamers: Most uncertain outcome
  – For inference: Most informative
  – Equivalent!
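
The matchmaking point above can be made concrete with a minimal sketch of the two-player, no-draw case of the TrueSkill update described in the paper cited on the TRUESKILL slide. This is an illustration under stated assumptions, not the production Xbox Live code: the function names are hypothetical, the defaults (prior mean 25, prior standard deviation 25/3, performance noise 25/6) are assumed from the published description rather than from these slides, and teams, draws, and skill dynamics are left out.

```python
import math

# Assumed defaults (taken from the published TrueSkill description, not from the slides).
MU0 = 25.0            # prior mean of a new player's skill
SIGMA0 = 25.0 / 3.0   # prior standard deviation of a new player's skill
BETA = 25.0 / 6.0     # standard deviation of the per-game performance noise


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(player_a, player_b, beta=BETA):
    """Predicted probability that player_a beats player_b (no draws).

    Each player is a (mu, sigma) pair. A value near 0.5 is the
    'most uncertain outcome' that the matchmaking slide refers to.
    """
    mu_a, sigma_a = player_a
    mu_b, sigma_b = player_b
    c = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return normal_cdf((mu_a - mu_b) / c)


def update_two_players(winner, loser, beta=BETA):
    """Approximate Gaussian posterior skills after one game without draws."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c2 = 2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    # Moment-matching correction terms for a truncated Gaussian.
    # The max() guards against division by zero for extreme upsets.
    v = normal_pdf(t) / max(normal_cdf(t), 1e-300)
    w = v * (v + t)
    new_mu_w = mu_w + sigma_w ** 2 / c * v
    new_mu_l = mu_l - sigma_l ** 2 / c * v
    new_sigma_w = sigma_w * math.sqrt(max(1.0 - sigma_w ** 2 / c2 * w, 1e-12))
    new_sigma_l = sigma_l * math.sqrt(max(1.0 - sigma_l ** 2 / c2 * w, 1e-12))
    return (new_mu_w, new_sigma_w), (new_mu_l, new_sigma_l)


if __name__ == "__main__":
    newcomer = (MU0, SIGMA0)
    print("P(win) between two new players:", win_probability(newcomer, newcomer))
    print("Skills after one game:", update_two_players(newcomer, newcomer))
```

In this sketch a matchmaker would prefer pairings whose win_probability is closest to 0.5. Those are also the games whose outcome moves the skill estimates most, which is one way to read the "Equivalent!" remark.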

Convergence Speed
[Figure: Level (0 to 40) against Number of Games (0 to 400) for players char and SQLWildman, comparing TrueSkill™ with Halo 2 rank]

Xbox 360 & Halo 3
• Xbox 360 Live
  – Launched in September 2005
  – Every game uses TrueSkill™ to match players
  – > 20 million players
  – > 2 million matches per day
  – > 2 billion hours of game play
• Halo 3
  – Launched on 25th September 2007
  – Largest entertainment launch in history
  – > 200,000 players concurrently (peak: 1,000,000)

Halo 3 in Action

Halo 3 Analysis: Fair Matches?

My former boss Bill…
Why are they wasting their time on computer games?
Great stuff, guys, but how about online advertising?
Are the ads not competing for the attention of the users just like players in a game?!

ADPREDICTOR
Thore Graepel, Joaquin Quiñonero Candela, Thomas Borchert, Ralf Herbrich & our friends from Microsoft adCenter
• Thore Graepel, Joaquin Quiñonero Candela, Thomas Borchert, and Ralf Herbrich, Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine, in ICML 2010, Invited Applications Track, June 2010

Display to users (expected bid)      Charge advertisers (per click)
$1.00 * 10% = $0.10                  $0.80
$2.00 *  4% = $0.08                  $1.25
$0.10 * 50% = $0.05                  $0.05

Importance of accurate probability estimates:
• Increase user satisfaction by better targeting
• Provide better deal to advertisers
• Increase revenue by showing ads with high click-through rate

AdPredictor: Bayesian Probit Regression
Impression Level Click-Through Rate Prediction
[Figure: sparse input features Client IP (102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86), Match Type (Exact Match, Broad Match) and Position (ML-1, SB-1, SB-2), combined by a sum (+) into p(pClick)]

Closed Form Updates for “Click”

Weight Parameters from Production
• Means and variances of the weights for display position
• Mainline ML, sidebar SB
• High mean → High CTR
• Low variance → Shown often
• Means and variances of weights for user ID feature.
• Initialisation at zero mean, unit variance (top)
• Very high mean → fraud/bots

Comparison with Naïve Bayes
• Calibration: Ratio of Empirical CTR / Predicted CTR
• Horizontal line at 1 is perfect calibration

Algorithm                   RIG        AUC
adPredictor                 61.42%     95.6%
adPredictor (calibrated)    61.35%     95.6%
Naïve Bayes                 -41.54%    89.4%
Naïve Bayes (calibrated)    33.86%     89.3%

Web Scale Implementation
• AdPredictor drives almost 100% of Bing’s Sponsored Search traffic
• System’s predictions determine composition of future training sample
• Exploration
• Continuous Model Training
• Maintain controlled memory footprint by pruning

MATCHBOX
Thore Graepel, David Stern, Ralf Herbrich and the Emporia team
• David Stern, Ralf Herbrich, and Thore Graepel, Matchbox: Large Scale Bayesian Recommendations, in Proceedings of the 18th International World Wide Web Conference, 2009

Collaborative Filtering
[Figure: rating matrix of Users (A, B, C, D) by Items (1 to 6) with unknown entries (?), and the question “Metadata?”]

Map Sparse Features To ‘Trait’ Space
[Figure: user features User ID (234566, 456457, 13456, 654777), Gender (Male, Female), Country (UK, USA) and Height (1.2m); item features Item ID (34, 345, 64, 5474) and Movie Genre (Horror, Drama, Comedy, Documentary)]

Matchbox With Metadata
[Figure: factor graph in which User Metadata (ID=234, Male, British) with weights u01, u11, u21, u02, u12, u22 is summed (+) into user traits s1, s2; Item Metadata (Camera, SLR) with weights v11, v21, v12, v22 is summed (+) into item traits t1, t2; the traits are multiplied (*) and summed (+) into the rating potential r]

Message Passing For Matchbox
[Figure: the Matchbox factor graph over u, v, s, t and r, repeated across animation steps]
Message update functions powered by Infer.NET

User/Item Trait Space Adaptation
[Figure: two-dimensional trait space, both axes from -1.5 to 1.5, showing Users and Movies (24: Season 2, 24: Season 3, A Clockwork Orange, A Knights Tale, AI: Artificial Intelligence, A Cinderella Story) and the ‘Preference Cone’ for user 145035]

Feedback Models
[Figure: the Matchbox factor graph over u, v, s, t and r, extended with variables r and q and comparison factors (>, <) against thresholds t0, t1, t2, t3]

MovieLens – 1,000,000 ratings
• 6,040 users (User ID)
• 3,900 movies (Movie ID)
• User Job: Other, Lawyer, Academic, Programmer, Artist, Retired, Admin, Sales, Student, Scientist, Customer Service, Self-Employed, Health Care, Technician, Managerial, Craftsman, Farmer, Unemployed, Homemaker, Writer
• User Age: <18, 18-25, 25-34, 35-44, 45-49, 50-55, >55
• User Gender: Male, Female
• Movie Genre: Action, Horror, Adventure, Musical, Animation, Mystery, Children’s, Romance, Comedy, Thriller, Crime, Sci-Fi, Documentary, War, Drama, Western, Fantasy, Film Noir

MovieLens with Thresholds Model
[Figure: Mean Absolute Error (ADF), Training Time = 1 Minute]

Applications
• Ranking of content on web portals
• Online advertising (Display and Paid Search)
• Personalised web search
• Algorithm portfolio management
• Tweet/News recommendation
• Friends recommendation on social platforms
www.projectemporia.com

Conclusions
Probabilistic Graphical Models Work at Scale!
• Single pass learning benefits from explicit model of uncertainty
• The models make accurate and calibrated predictions
• They are efficient if we use problem-specific independence structure
• They are modular and can be composed into more powerful models
Wide Range of Applications
• Skill estimation in online games: TrueSkill
• Click-through rate estimation in Paid Search: AdPredictor
• Large scale recommendations of content: Matchbox / Emporia
• Hopefully many more to come…

Thank you!
[email protected]