TAU Agent
Team: Yishay Mansour, Mariano Schain
Tel Aviv University
TAC-AA 2010

Overview
• Machine Learning approach:
  – Regret Minimization
• Simple: Adaptive scheme
  – Robust: Performance Bounds
• Low dependency on the exact models
• Started (very) late
  – 3 weeks (for everything)
• Influenced many of the strategic decisions

Regret Minimization: Overview
• Setting: Single player, multiple actions
• At every time step:
  – Player chooses a distribution over actions
  – Observes the gain of each action
    • Can even be an adversarial model
    • Partial information model (MAB)
• Goal:
  – Maximize cumulative gain
• Benchmarks:
  – Best static choice of action (‘external regret’)
• Guarantee:
  – Near optimal
    • W.r.t. benchmark
    • Vanishing average regret

RM Algorithm (full information)
• Main idea: Smoothed Greedy
• Best action
  – Highest weight
• Near-best action
  – High weight
• Inferior actions
  – Low weight
• Non-trivial analysis
• Many algorithms

Polynomial Weights:
• Parameter $u$
• Maintain weights $w_{i,t}$, with $p_{i,t} = w_{i,t} / W_t$
• Initially $w_{i,1} = 1$, $W_1 = m$
• At time step $t$, with observed gains $g_{i,t-1}$: $w_{i,t} = w_{i,t-1} (1 + u \cdot g_{i,t-1})$

Applying Regret Minimization to AA: Challenges
• Partial Information
  – Explore vs. Exploit
  – There are Partial Information (MAB) Regret Minimization algorithms
  – Similar regret bounds
    • Higher dependency on the action space
    • More time for initial exploration
• Very Large Action Space
  – Action = (bid, ad type, budget limit) for every query
  – Observed ‘gain’ = Value Per Unit Sold for every query
  – Theoretical results may not directly apply

The elements of TAU scheme
• (Almost) constant ‘high’ bids on specialty queries:
  – Reduce action space!
  – Win impression for every user in population
  – Ease exploration!
  – Also: high conversion rate, high click-through rate, high revenue
• Adaptive score, based on Value Per Unit Sold:
  – Main limitation is capacity units
  – Use regret minimization to select action distribution
• Fractional allocation of capacity based on score
  – Based on regret minimization output
• Profitable queries get most of the capacity
  – Maintain exploration
    • A minimum budget to ‘probe’ all queries and adapt to trends

Software
[Architecture diagram. Recoverable labels: sales reports; Overall Capacity Control; Allocation quota; Analysis (four boxes): score, est.; scores, est. sales; Bid (four boxes): cpc, limit; est. cpc, est. conv. rate]

Plans / Enhancements
• Features:
  – Burst Identification
  – ‘Bottom fishing’
  – Tuning parameters to capacity
  – ML to estimate sales
  – Reinforcement learning of capacity allocation decisions
• Post-competition analysis: Validate Robustness
  – Varying game simulation parameters

Thank You
Mariano Schain
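
The update rule on the Polynomial Weights slide can be written out in a few lines. The following is a minimal sketch of the full-information case only, not the agent's actual code. It assumes gains are scaled to [0, 1] and uses an arbitrary value u = 0.1; the function name `polynomial_weights` and the example gains are hypothetical.

```python
def polynomial_weights(gains, u=0.1):
    """Run the Polynomial Weights update over a sequence of gain vectors.

    gains: list of rounds, each a list of per-action gains
           (assumed here to lie in [0, 1]).
    u:     the parameter from the slide (0.1 is an arbitrary choice).
    Returns the distribution p_t played in each round.
    """
    m = len(gains[0])
    weights = [1.0] * m                  # w_{i,1} = 1, hence W_1 = m
    history = []
    for round_gains in gains:
        total = sum(weights)             # W_t
        # p_{i,t} = w_{i,t} / W_t
        history.append([w / total for w in weights])
        # After observing the gains: w_{i,t+1} = w_{i,t} * (1 + u * g_{i,t})
        weights = [w * (1.0 + u * g) for w, g in zip(weights, round_gains)]
    return history


# Hypothetical example: three actions, the middle one always pays the most.
rounds = [[0.2, 0.9, 0.5]] * 50
dists = polynomial_weights(rounds, u=0.1)
print(dists[0])    # uniform start
print(dists[-1])   # most of the mass has moved to the best action
```

The weight of the best action grows fastest, so it ends up with the highest probability, while inferior actions keep a small but nonzero share. This is the ‘smoothed greedy’ behaviour described on the RM Algorithm slide.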
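
The ‘fractional allocation of capacity based on score’ idea, together with the minimum budget kept for probing every query, might look roughly like the sketch below. The slides do not give the actual formula, so everything here is an assumption: the function name `allocate_capacity`, the `min_share` floor, the proportional split of the remainder, and the example query names and scores are all hypothetical.

```python
def allocate_capacity(scores, capacity, min_share=0.05):
    """Split capacity units across queries in proportion to their scores.

    scores:    dict mapping query -> non-negative score
               (e.g. a weight produced by a regret minimization step).
    capacity:  total capacity units to distribute.
    min_share: fraction of capacity reserved for every query so that all
               queries keep being probed (an assumed exploration floor).
    """
    if not scores:
        raise ValueError("at least one query is required")
    n = len(scores)
    if n * min_share > 1.0:
        raise ValueError("min_share is too large for this many queries")

    total_score = sum(scores.values())
    remaining = 1.0 - n * min_share      # share left after the floors
    quotas = {}
    for query, score in scores.items():
        # Fall back to an even split if every score is zero.
        proportion = score / total_score if total_score > 0 else 1.0 / n
        quotas[query] = capacity * (min_share + remaining * proportion)
    return quotas


# Hypothetical example: one profitable query, one average, one weak.
example = allocate_capacity(
    {"q_specialty": 8.0, "q_other": 2.0, "q_weak": 0.0},
    capacity=400,
    min_share=0.05,
)
for query, quota in example.items():
    print(query, round(quota, 1))
```

With these numbers the profitable query receives most of the capacity, while the weak query still gets its floor, so a later change in its value can be noticed.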