AdPredictor - VideoLectures.NET

Machine Learning in Microsoft’s
Online Services: TrueSkill,
AdPredictor, and Matchbox
Thore Graepel
Online Services and Advertising Research
Microsoft Research Cambridge, UK
Joint work with Thomas Borchert, Joaquin Quiñonero Candela, Ralf
Herbrich, David Stern, and Tom Minka
Industry Day at ECML/PKDD 2010 - Barcelona
Outline
TrueSkillTM: Ranking and Matchmaking
• Ranking and Matchmaking Task
• TrueSkill Model and Inference
AdPredictorTM: Click-Through Rate Prediction
• Paid Search Advertising and Click-Through Rate prediction
• AdPredictor Model and Inference, and results
Matchbox: Large-Scale Recommendations
• Recommendation with Meta-Data
• Matchbox model and inference, and results
Thore Graepel, Ralf Herbrich, Tom Minka, and our friends from Xbox Live
TRUESKILL
• Ralf Herbrich, Tom Minka, and Thore Graepel,
TrueSkill(TM): A Bayesian Skill Rating System,
in Advances in Neural Information Processing Systems 20, MIT Press, January 2007
• Pierre Dangauthier, Ralf Herbrich, Tom Minka, and Thore Graepel,
TrueSkill Through Time: Revisiting the History of Chess,
in Advances in Neural Information Processing Systems 20, MIT Press, 2008
Ranking and Matchmaking
• Competition is central to our lives
– Evolutionary principle
– Driving principle of many sports
• Chess Rating for fair competition
– ELO: Developed in 1960 by Árpád Imre Élő
– Matchmaking system for tournaments
• Challenges of online gaming
– Learn from few match outcomes efficiently
– Support multiple teams and multiple players per
team
The Skill Rating Problem
Multiple Team Match Outcome Model
s
s
s
s
1
2
3
4
t1
t2
y12
t3
y23
Efficient Approximate Inference
Gaussian Prior Factors
s1
s2
t1
s3
s4
t2
t3
y
y
Ranking
Likelihood Factors
12
23
Applications to Online Gaming
• Matchmaking
– For gamers: Most uncertain outcome
– For inference: Most informative
– Equivalent!
Convergence Speed
40
35
30
Level
25
20
15
char (TrueSkill™)
10
SQLWildman (TrueSkill™)
char (Halo 2 rank)
SQLWildman (Halo 2 rank)
5
0
0
100
200
Number of Games
300
400
Xbox 360 & Halo 3
• Xbox 360 Live
– Launched in September 2005
– Every game uses TrueSkill™ to match
players
– > 20 million players
– > 2 million matches per day
– > 2 billion hours of game play
• Halo 3
– Launched on 25th September 2007
– Largest entertainment launch in history
– > 200,000 player concurrently (peak: 1,000,000)
Halo 3 in Action
Halo 3 Analysis: Fair Matches?
My former boss Bill…
Why are they
wasting their
time on
computer
games?
Great stuff, guys, but
how about online
advertising? Are the
ads not competing
for the attention of
the users just like
players in a game?!
Thore Graepel, Joaquin Quiñonero Candela, Thomas Borchert, Ralf
Herbrich & our friends from Microsoft adCenter
ADPREDICTOR
• Thore Graepel, Joaquin Quiñonero Candela, Thomas Borchert, and Ralf Herbrich,
Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising
in Microsoft’s Bing Search Engine, in ICML 2010, Invited Applications Track, June 2010
$1.00
* 10%
=$0.10
$0.80
$2.00
* 4%
=$0.08
$1.25
$0.10
* 50%
=$0.05
$0.05
Display to users (expected bid)
Charge advertisers (per click)
Importance of accurate probability estimates:
• Increase user satisfaction by better targeting
• Provide better deal to advertisers
• Increase revenue by showing ads with high click-thru rate
AdPredictor: Bayesian Probit Regression
102.34.12.201
Client IP
Impression Level ClickThrough Rate Prediction
15.70.165.9
221.98.2.187
92.154.3.86
Match
Type
Exact Match
Broad Match
ML-1
Position
SB-1
SB-2
+
p(pClick)
Closed Form Updates for “Click”
Weight Parameters from Production
• Means and variances of the
weights for display position
• Mainline ML, sidebar SB
• High mean  High CTR
• Low variance  Shown often
• Means and variances of
weights for user ID feature.
• Initialisation at zero mean unit
variance (top)
• Very high mean  fraud/bots
Comparison with Naïve Bayes
• Calibration: Ratio of Empirical
CTR / Predicted CTR
• Horizontal line at 1 is perfect
calibration
Algorithm
RIG
AUC
adPredictor
61.42%
95.6%
adPredictor (calibrated)
61.35%
95.6%
Naïve Bayes
-41.54%
89.4%
Naïve Bayes (calibrated)
33.86%
89.3%
Web Scale Implementation
AdPredictor drives almost 100% of
Bing’s Sponsored Search traffic
System’s
predictions
determine
composition of
future training
sample
Exploration
Continuous
Model Training
Maintain
controlled
memory footprint
by pruning
Thore Graepel, David Stern, Ralf Herbrich and the Emporia team
MATCHBOX
• David Stern, Ralf Herbrich, and Thore Graepel,
Matchbox: Large Scale Bayesian Recommendations,
in Proceedings of the 18th International World Wide Web Conference, 2009
Collaborative Filtering
Items
1
Users
A
2
3
4
5
6
Metadata?
B
C
D
?
?
?
Map Sparse Features To ‘Trait’ Space
234566
User ID
456457
13456
654777
Gender
Country
Height
Male
Female
UK
USA
1.2m
34
345
Item ID
64
5474
Horror
Drama
Comedy
Movie
Genre
Documentary
Matchbox With Metadata
User Metadata
ID=234
u01
Male British
u11
u21
+
u02
u12
User
Item s
Item Metadata
Camera SLR
v11
1
+
+
User ‘trait’ 1 t1
u22
v21
v12
s2 User ‘trait’ 2 t2
Rating potential ~
*
r
v22
+
Message Passing For Matchbox
u01
u11
u21
+
u02
u12
v11
s1
+
+
t1
u22
v21
v12
s2
t2
*
r
v22
+
Message Passing For Matchbox
u01
u11
u21
+
u02
u12
v11
s1
*
+
+
t1
u22
v21
v12
s2
*
t2
v22
+
+
r
Message update functions
powered by Infer.net
1,5
User/Item Trait Space
Adaptation
24: Season 3
1
24: Season 2
0,5
‘Preference Cone’ for user
145035
0
-1,5
-1
-0,5
0
0,5A
Clockwork1Orange
1,5
A Knights Tale
-0,5
AI: Artificial Intelligence
-1
Users
A Cinderella Story
-1,5
Movies
Feedback Models
u01
u11
u21
+
u02
u12
v11
s1
+
+
t1
u22
v21
v12
s2
t2
*
r
v22
+
Feedback Models
u01
u11
u21
+
u02
u12
v11
s1
+
+
t1
u22
v21
v12
s2
t2
*
r
v22
+
Feedback Models
r
q
>
>
<
<
t0
t1
t2
t3
MovieLens – 1,000,000 ratings
6,040 users
User ID
3,900 movies
Movie ID
User Job
User Age
Movie Genre
Other
Lawyer
<18
Action
Horror
Academic
Programmer
18-25
Adventure
Musical
Artist
Retired
25-34
Animation
Mystery
Admin
Sales
35-44
Children’s
Romance
Student
Scientist
45-49
Comedy
Thriller
Customer
Service
Self-Employed
50-55
Crime
Sci-Fi
Health Care
Technician
>55
Documentary
War
Managerial
Craftsman
User Gender
Drama
Western
Farmer
Unemployed
Male
Fantasy
Film Noir
Homemaker
Writer
Female
MovieLens with Thresholds Model
Mean Absolute Error
(ADF), Training Time= 1 Minute
Applications
Ranking of content on web portals
Online advertising (Display and Paid Search)
Personalised web search
Algorithm portfolio management
Tweet/News recommendation
Friends recommendation on social platforms
www.projectemporia.com
Conclusions
Probabilistic Graphical Models Work at Scale!
• Single pass learning benefits from explicit model of uncertainty
• The models make accurate and calibrated predictions
• They are efficient if we use problem-specific independence structure
• They are modular and can be composed into more powerful models
Wide Range of Applications
• Skill estimation in online games TrueSkill
• Click-through rate estimation in Paid Search  AdPredictor
• Large scale recommendations of content  Matchbox / Emporia
• Hopefully many more to come…
Thank you!
[email protected]