TAC-AA 2010 TAU Agent

TAU Agent
Team:
Yishay Mansour
Mariano Schain
Tel Aviv University
TAC-AA 2010
Overview
• Machine Learning approach:
– Regret Minimization
• Simple: Adaptive scheme
– Robust: Performance Bounds
• Low dependency on the exact models
• Started (very) late.
– 3 weeks (for everything)
• Influenced many of the strategic decisions
Regret Minimization: Overview
• Setting: Single player, multiple actions
• At every time step:
– Player chooses a distribution over actions
– Observes the gain of each action
• Can even be an adversarial model
• Partial information model (MAB)
• Goal:
– Maximize cumulative gain
• Benchmarks:
– Best static choice of action (‘external regret’)
• Guarantee:
– Near optimal
• W.r.t. benchmark
• Vanishing average regret
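
The benchmark and guarantee above can be written out explicitly. The following is the standard textbook definition of external regret, not a formula taken from the slides; it assumes m actions, gains g_{i,t} for action i at step t, and a player distribution p_{i,t}, with T denoting the number of time steps (T is a symbol introduced here for illustration).

```latex
% External regret after T steps: best static action vs. the player's expected gain
R_T \;=\; \max_{1 \le i \le m} \sum_{t=1}^{T} g_{i,t}
      \;-\; \sum_{t=1}^{T} \sum_{i=1}^{m} p_{i,t}\, g_{i,t}

% "Vanishing average regret" means
\frac{R_T}{T} \;\longrightarrow\; 0 \quad \text{as } T \to \infty
```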
RM Algorithm (full information)
• Main idea: Smoothed Greedy
– Best action: highest weight
– Near-best action: high weight
– Inferior actions: low weight
• Non-trivial analysis
• Many algorithms
Polynomial Weights:
• Parameter $u$
• Maintain weights $w_{i,t}$, with $p_{i,t} = w_{i,t} / W_t$
• Initially $w_{i,1} = 1$, $W_1 = m$
• At time step $t$, given observed gains $g_{i,t-1}$:
$w_{i,t} = w_{i,t-1} (1 + u \, g_{i,t-1})$
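
The Polynomial Weights update above can be sketched in a few lines of Python. This is a minimal illustration, not the TAU agent's code: the function name `polynomial_weights` and the input layout (`gains[t][i]` is the gain of action i at step t) are assumptions made here, W_t is taken to be the sum of the weights (consistent with W_1 = m), and gains are assumed to lie in [0, 1].

```python
from typing import List, Sequence


def polynomial_weights(gains: Sequence[Sequence[float]], u: float) -> List[List[float]]:
    """Return the distribution played at each step (hypothetical helper).

    gains[t][i] is the gain of action i at step t, assumed to lie in [0, 1].
    u is the learning-rate parameter from the slide.
    """
    m = len(gains[0])
    weights = [1.0] * m          # initial weights are 1, so their sum is m
    history: List[List[float]] = []
    for step_gains in gains:
        total = sum(weights)     # W_t, assumed to be the sum of the weights
        history.append([w / total for w in weights])   # p_i = w_i / W
        # Multiplicative update using the gains just observed.
        weights = [w * (1.0 + u * g) for w, g in zip(weights, step_gains)]
    return history


if __name__ == "__main__":
    # Two actions; the second is consistently better.
    played = polynomial_weights([[0.1, 0.9]] * 50, u=0.5)
    print(played[0], played[-1])
```

The first distribution is uniform, and probability mass shifts toward the better action as its weight grows polynomially faster.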
Applying Regret Minimization to AA: Challenges
• Partial Information
– Explore vs. Exploit
– Partial-information (MAB) regret minimization algorithms exist
– Similar regret bounds
• Higher dependency on the action space
• More time for initial exploration
• Very Large Action Space
– Action = (bid, ad type, budget limit) for every query
– Observed ‘gain’ = Value Per Unit Sold for every query
– Theoretical results may not directly apply
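
The slides do not name a specific partial-information algorithm. As one standard example, the sketch below implements Exp3, in which only the gain of the chosen action is observed and an importance-weighted estimate stands in for the full gain vector. The function name `exp3`, the `gain_of` callback, and the parameter `gamma` are assumptions made for illustration; gains are assumed to lie in [0, 1].

```python
import math
import random
from typing import Callable, List


def exp3(gain_of: Callable[[int, int], float], num_actions: int, rounds: int,
         gamma: float = 0.1, seed: int = 0) -> List[float]:
    """Run Exp3 and return the final action distribution (illustrative sketch).

    gain_of(t, action) returns the gain in [0, 1] of the action played at step t;
    gains of the other actions are never observed.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_actions

    def distribution() -> List[float]:
        total = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        return [(1 - gamma) * w / total + gamma / num_actions for w in weights]

    for t in range(rounds):
        probs = distribution()
        action = rng.choices(range(num_actions), weights=probs)[0]
        gain = gain_of(t, action)
        estimate = gain / probs[action]   # importance-weighted gain estimate
        weights[action] *= math.exp(gamma * estimate / num_actions)
        top = max(weights)                # rescale to avoid overflow
        weights = [w / top for w in weights]
    return distribution()


if __name__ == "__main__":
    arm_gains = [0.2, 0.5, 0.9]
    print(exp3(lambda t, a: arm_gains[a], num_actions=3, rounds=2000))
```

The uniform-exploration term `gamma / num_actions` keeps every action's probability bounded away from zero, which reflects the explore-versus-exploit trade-off and the heavier dependence on the size of the action space noted above.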
The elements of TAU scheme
• (Almost) constant ‘high’ bids on specialty queries:
– Reduce action space!
– Win impression for every user in population – ease exploration!
– Also… High conversion rate, High click-through rate, High revenue
• Adaptive score: based on Value Per Unit Sold:
– Main limitation is capacity units
– Use regret minimization to select action distribution
• Fractional allocation of capacity based on score
– Based on regret minimization output
• Profitable queries get most of the capacity
– Maintain exploration
• a minimum budget to ‘probe’ all queries and adapt to trends
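
One way to picture fractional allocation with a guaranteed probing budget is the sketch below. It is an assumption-laden illustration, not the agent's actual rule: `allocate_capacity`, `explore_fraction`, and the even split of the exploration budget across queries are all hypothetical choices. The input `probabilities` stands for the regret-minimization output over queries.

```python
from typing import List, Sequence


def allocate_capacity(probabilities: Sequence[float], capacity: float,
                      explore_fraction: float = 0.1) -> List[float]:
    """Split capacity across queries (hypothetical helper).

    A fraction explore_fraction of the capacity is spread evenly so every
    query keeps a minimum 'probe' budget; the rest follows the
    regret-minimization distribution.
    """
    n = len(probabilities)
    floor = explore_fraction * capacity / n          # minimum per-query budget
    exploit = (1.0 - explore_fraction) * capacity    # share driven by the scores
    return [floor + exploit * p for p in probabilities]


if __name__ == "__main__":
    # Four queries; the last one currently looks unprofitable but is still probed.
    print(allocate_capacity([0.7, 0.2, 0.1, 0.0], capacity=100.0))
```

With this split, the most profitable query receives most of the capacity while even a query with zero probability keeps a small budget, so a later change in its value can still be detected.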
Software
[Architecture diagram. Recoverable block labels: sales reports; Overall Capacity Control; Analysis (repeated block; outputs: score, est. sales); Allocation (quota; scores, est. sales); Bid (repeated block; cpc, limit; est. cpc, est. conv. rate)]
Plans / Enhancements
• Features:
– Burst identification
– ‘Bottom fishing’
– Tuning parameters to capacity
– ML to estimate sales
– Reinforcement learning of capacity allocation decisions
• Post-competition analysis:
– Validate robustness
• Varying game simulation parameters
Thank You
Mariano Schain