Minimizing Regret with Multiple Reserves
Joshua R. Wang (Stanford)
Tim Roughgarden (Stanford)
EC’16: Maastricht, The Netherlands
July 27th, 2016
Reserve Prices
• Seller has a single item and wants to maximize revenue.
• [Myerson ‘81]: if valuations are i.i.d. from a regular distribution, use a
second-price auction with the reserve price set to the monopoly price.
• Bayesian Optimization: Given a prior distribution over bidder
valuations, try to maximize expected revenue.
Reserve Price Problems
• Bayesian Optimization: Given a prior distribution over bidder
valuations, maximize expected revenue.
• Batch Learning: Unknown distribution 𝐹 over bidder valuations, given
𝑚 i.i.d. samples from 𝐹, maximize the expected revenue w.r.t. 𝐹.
• Online No-Regret Learning: Valuation profiles arrive one by one. At
time $t$, choose an auction as a function of the previously seen profiles
$\mathbf{v}^1, \mathbf{v}^2, \dots, \mathbf{v}^{t-1}$. Maximize time-averaged revenue relative to the time-averaged revenue of the best auction in hindsight.
• Offline Optimization: Given 𝑚 valuation profiles, maximize the
average revenue across these profiles.
Non-Anonymous Reserve Prices
• Natural Extension: For asymmetric bidders, use non-anonymous
reserve prices.
• Real-life examples of discriminating between bidders include ad
quality adjustments and the different opening bids in the FCC Incentive
Auction [Cramton et al. ‘15].
Non-Anonymous Reserve Prices
• Natural Extension: For asymmetric bidders, use non-anonymous
reserve prices.
• We can again consider Bayesian optimization, batch learning, online
no-regret learning, and offline optimization.
• [Hartline and Roughgarden ‘09] studied Bayesian optimization where
bidders’ valuations are independently but not identically distributed;
setting each bidder’s reserve to the monopoly price for her
distribution yields near-optimal expected revenue in many scenarios.
• [Paes Leme et al. ‘16] showed that offline optimization is NP-hard.
Eager Versus Lazy Reserve Prices

Bidder   Valuation   Reserve
1        6           7
2        4           6
3        2           1

• Eager: bidders who miss their personal reserves are removed first; bidder 3 is the only one remaining, so she wins and pays 1.
• Lazy: the second-price auction is run first; bidder 1 wins but falls short of her reserve of 7, so there is no winner.
• We focus on eager reserve prices, which are superior from both a welfare and revenue standpoint [Paes Leme et al. ‘16].
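
To make the comparison concrete, here is a minimal Python sketch of the two payment rules on the example above (the function names are illustrative, not from the paper):

```python
def eager_revenue(values, reserves):
    """Eager: drop bidders below their personal reserves first, then the
    highest remaining bidder wins and pays the larger of her reserve and
    the highest remaining competing bid."""
    eligible = [i for i, v in enumerate(values) if v >= reserves[i]]
    if not eligible:
        return 0.0  # nobody clears a reserve: no sale
    winner = max(eligible, key=lambda i: values[i])
    second = max((values[i] for i in eligible if i != winner), default=0.0)
    return max(reserves[winner], second)

def lazy_revenue(values, reserves):
    """Lazy: pick the highest bidder first; sell only if she meets her own
    reserve, at the larger of her reserve and the second-highest bid."""
    winner = max(range(len(values)), key=lambda i: values[i])
    if values[winner] < reserves[winner]:
        return 0.0  # the auction winner misses her reserve: no sale
    second = max((values[i] for i in range(len(values)) if i != winner),
                 default=0.0)
    return max(reserves[winner], second)

values, reserves = [6, 4, 2], [7, 6, 1]  # the table's example
print(eager_revenue(values, reserves))   # 1  (bidder 3 wins)
print(lazy_revenue(values, reserves))    # 0  (no winner)
```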
Offline Maximizing Multiple Reserves (MMR)
• Offline Optimization Problem
• Input: $m$ valuation profiles $\mathbf{v}^1, \dots, \mathbf{v}^m$
• Output: a vector $\mathbf{r}$ of reserve prices that maximizes the average revenue
• Without loss of generality, each $r_i$ can be set to some observed valuation $v_i^j$.
• Brute-force: consider all $m^n$ possible reserve vectors (each of the $n$ reserves takes one of $m$ observed values); see the sketch below.
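
For contrast with the algorithm on the later slides, a brute-force sketch of this search, reusing eager_revenue from the sketch above (feasible only for tiny $m$ and $n$):

```python
from itertools import product

def brute_force_mmr(profiles):
    """Try every reserve vector built from observed valuations: each of the
    n reserves ranges over that bidder's m observed values, so m^n candidates."""
    n = len(profiles[0])
    candidates = [sorted({v[i] for v in profiles}) for i in range(n)]
    return max(product(*candidates),
               key=lambda r: sum(eager_revenue(v, r) for v in profiles))
```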
Offline Maximizing Multiple Reserves (MMR)
[Figure: example valuation profiles for Bidder 1 and Bidder 2 (values such as 6, 4, and 2), annotated with per-profile revenue changes of +2, −2, −2 from changing a reserve price.]
Offline MMR Algorithm
• Let $S_i$ denote the profiles where bidder $i$ has the highest valuation, and let $v_2^j$ denote the second-highest valuation in $\mathbf{v}^j$.
• Let $q^j(r)$ indicate whether reserve $r$ improves on the second-highest price on profile $j$, i.e., $q^j(r) = 1$ iff $v_2^j < r \le v_1^j$. For example, on a profile with valuations 6 and 4 (so $v_2^j = 4$), $q^j(5) = 1$ but $q^j(3) = 0$.
• For each bidder $i = 1, 2, \dots, n$: choose $r_i$ to maximize $\sum_{j \in S_i} q^j(r_i) \cdot r_i$.
• Return either $\mathbf{r}$ or the all-zero vector, whichever generates more revenue.
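
A Python sketch of this greedy procedure, assuming the indicator $q^j(r) = 1$ exactly when $v_2^j < r \le v_1^j$ and reusing eager_revenue from the earlier sketch:

```python
from collections import defaultdict

def second_highest(v):
    return sorted(v)[-2] if len(v) >= 2 else 0.0

def offline_mmr(profiles):
    """Greedy 1/2-approximation for offline MMR (sketch)."""
    n = len(profiles[0])
    # S[i] holds the profiles on which bidder i has the highest valuation.
    S = defaultdict(list)
    for v in profiles:
        S[max(range(n), key=lambda i: v[i])].append(v)
    r = [0.0] * n
    for i in range(n):
        best = 0.0
        for cand in {v[i] for v in profiles}:  # WLOG an observed valuation
            # Sum of q^j(cand) * cand over the profiles j in S_i.
            gain = sum(cand for v in S[i] if second_highest(v) < cand <= v[i])
            if gain > best:
                best, r[i] = gain, cand
    # Return r or the all-zero vector, whichever earns more on the samples.
    zero = [0.0] * n
    revenue = lambda res: sum(eager_revenue(v, res) for v in profiles)
    return r if revenue(r) >= revenue(zero) else zero
```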
Offline MMR Algorithm
• Theorem 1. The offline MMR algorithm is a 1/2-approximation.
• Also extends to matroid environments.
Offline MMR Algorithm Proof
• The total revenue OPT obtains is at most
$$\sum_{j=1}^{m} v_2^j \;+\; \sum_{i=1}^{n} \sum_{j \in S_i} q^j(r_i^*)\, r_i^*$$
since on each profile the winner pays $\max(v_2^j, r^*) \le v_2^j + q^j(r^*)\, r^*$. For example, with valuations 6 and 4, reserve 5 yields revenue $5 \le 4 + 5$, while reserve 3 yields revenue $4 \le 4 + 0$.
• Our computed reserves $\mathbf{r}$ achieve at least the second term; the all-zero vector achieves at least the first term.
Offline MMR Lower Bound
• Theorem 2. The MMR problem is NP-hard to approximate within a
1 − 𝜖 factor for some constant 𝜖 > 0.
• Proof Idea: Reduction from a variant of Set Cover. Each set is
represented by a bidder; a low reserve corresponds to picking the set.
• Using a result of [Chlebik and Chlebikova ‘08], the offline
MMR problem cannot be approximated to within a factor better than 884/885
(this holds even for computing the optimal revenue, not just finding the reserves).
Consequences for Batch MMR
• Batch Learning Problem:
• Unknown distribution 𝐹, draw 𝑚 i.i.d. samples. Algorithm uses samples to
maximize expected revenue on 𝐹.
• Computationally efficient batch learning essentially amounts to
efficiently computing the Empirical Risk Minimizer (ERM).
• ERM here is exactly the offline MMR problem, which we showed is APX-hard.
• We can use our 1/2-approximation algorithm with polynomially many samples
to obtain roughly half of the maximum possible revenue.
Online Maximizing Multiple Reserves (MMR)
• Online Optimization Problem
• Each round, an adversary chooses a valuation profile $\mathbf{v}^t$ and the algorithm chooses a
reserve vector $\mathbf{r}^t$.
• After $T$ rounds, the algorithm is compared against the best fixed $\mathbf{r}$ in hindsight.
• Number of actions (corresponding to reserve vectors) is exponential
in 𝑛.
• Cannot simply apply black-box techniques such as [Kakade et al. ‘09]
because MMR is nonlinear.
𝛼-Regret [Kakade et al. 2009]
• 𝛼-regret notion merges regret with 𝛼-approximation.
• The (time-averaged) $\alpha$-regret of a sequence $\mathbf{r}^1, \dots, \mathbf{r}^T$ with respect to
valuation profiles $\mathbf{v}^1, \dots, \mathbf{v}^T$ is
$$\alpha \cdot \max_{\mathbf{r}} \frac{1}{T} \sum_{t=1}^{T} R(\mathbf{r}, \mathbf{v}^t) \;-\; \frac{1}{T} \sum_{t=1}^{T} R(\mathbf{r}^t, \mathbf{v}^t)$$
• Goal is still to drive this towards 0 as quickly as possible, for 𝛼 as close
to 1 as possible.
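
As an illustration, the time-averaged $\alpha$-regret of a sequence of played reserves can be measured empirically. The sketch below brute-forces the best fixed $\mathbf{r}$ over observed valuations (only feasible for tiny instances) and takes the revenue function, e.g. eager_revenue above, as a parameter:

```python
from itertools import product

def alpha_regret(alpha, played, profiles, revenue):
    """Time-averaged alpha-regret of the reserve vectors actually played,
    against the best fixed reserve vector in hindsight.
    revenue(values, reserves) returns the revenue on one profile."""
    T = len(profiles)
    n = len(profiles[0])
    # WLOG candidate reserves per bidder: 0 or any observed valuation.
    candidates = [{0.0} | {v[i] for v in profiles} for i in range(n)]
    best_fixed = max(sum(revenue(v, r) for v in profiles)
                     for r in product(*candidates))
    earned = sum(revenue(v, r) for v, r in zip(profiles, played))
    return alpha * best_fixed / T - earned / T
```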
Online MMR Algorithm
• Based on Follow the Perturbed Leader (FTPL) [Kalai and Vempala ‘03].
• Choose $K = T$ and $\epsilon = \sqrt{(\log K)/T}$.
• For each bidder $i = 1, 2, \dots, n$:
• Let $S_i$ denote the previous rounds in which $i$ had the highest valuation.
• For each reserve price $r \in \{1/K, 2/K, \dots, 1\}$: draw $X_{i,r}$ from the standard exponential distribution and set $Y_{i,r} = \pm X_{i,r}$ uniformly at random.
• Choose $r_i^t \in \{1/K, 2/K, \dots, 1\}$ to maximize $Y_{i,r_i^t} + \sum_{j \in S_i} q^j(r_i^t) \cdot r_i^t$.
• Return either r 𝑡 or the all-zero vector, each with 1/2 probability.
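
A minimal sketch of one round of this algorithm, assuming valuations lie in $[0, 1]$ and reusing second_highest from the offline sketch (so the perturbed objective matches $Y_{i,r} + \sum_{j \in S_i} q^j(r)\, r$):

```python
import random

def ftpl_round(history, n, T):
    """Choose the reserve vector for the current round (sketch).
    history: the valuation profiles seen in previous rounds."""
    K = T  # reserves are discretized to {1/K, 2/K, ..., 1}
    prices = [k / K for k in range(1, K + 1)]
    r = [0.0] * n
    for i in range(n):
        # Previous rounds in which bidder i had the highest valuation.
        S_i = [v for v in history if max(range(n), key=lambda b: v[b]) == i]
        def perturbed(p):
            # Y_{i,p}: a standard exponential with a uniformly random sign.
            y = random.expovariate(1.0) * random.choice((-1.0, 1.0))
            # Historical q^j(p) * p, with q^j(p) = 1 iff v_2^j < p <= v_1^j.
            return y + sum(p for v in S_i if second_highest(v) < p <= v[i])
        r[i] = max(prices, key=perturbed)
    # Play either r or the all-zero vector, each with probability 1/2.
    return r if random.random() < 0.5 else [0.0] * n
```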
Online MMR Algorithm
• Theorem 3. The 1/2-regret of the online MMR algorithm is $O\!\left(n \sqrt{(\log T)/T}\right)$.
Online MMR Algorithm Proof
• Proof by picture:
$$\mathrm{OPT} \;\le\; \underbrace{\sum_{j=1}^{T} v_2^j}_{\text{all-zero reserves}} \;+\; \underbrace{\sum_{i=1}^{n} \sum_{j \in S_i} q^j(r_i^*)\, r_i^*}_{\text{FTPL for each bidder}}$$
• The revenue decomposition allows us to move from an exponential search
space to linear search spaces.
Online MMR Lower Bound
• No $\alpha$-regret: for every fixed $\epsilon > 0$, the number of rounds $T$ needed to drive
the $\alpha$-regret down to $\epsilon$ is bounded by a polynomial in the number of
bidders $n$.
• Theorem 4. For all constants $\alpha > 884/885$, there is no polynomial-time
algorithm for online MMR with no $\alpha$-regret, unless $NP = RP$.
• Similar to [folklore] and [Daskalakis and Syrgkanis ‘16].
Open Questions
• Extend algorithms beyond matroid environments?
• Consider other classes of auctions (e.g., further discretized virtual
welfare maximizers)?
• Improper learning problem? See [Devanur, Huang, and Psomas ‘16] for
general revenue maximization from samples of an unknown distribution.