Adaptively Learning Tolls to Induce Target Flows

Aaron Roth
Joint work with Jon Ullman and Steven Wu
Non-Atomic Congestion Games
• A graph representing a road network
• A latency function ℓ_e(x) on each edge
• Infinitely many (infinitesimally small) players
• A mass of players m_{s,t} who want to route flow between s and t, with Σ_{s,t} m_{s,t} = 1
• Actions for players are paths in the graph
• Action profile ↔ a multi-commodity flow.
Equilibrium Flows
• "Nobody can improve their total latency by switching paths"
• Let P_{s,t} denote the set of s → t paths in the graph.
• Let ℓ(p, f) = Σ_{e∈p} ℓ_e(f_e)

Definition: A feasible multi-commodity flow f is a Wardrop equilibrium if for every (s, t) with m_{s,t} > 0, and for all paths p, p′ ∈ P_{s,t} with f_p > 0:

ℓ(p, f) ≤ ℓ(p′, f)
Routing games are potential games
• Equilibrium flows minimize the following potential function, among all feasible multi-commodity flows:

φ(f) = Σ_{e∈E} ∫_0^{f_e} ℓ_e(x) dx

• Convex so long as the ℓ_e are non-decreasing
• Strongly convex if the ℓ_e are strictly increasing → the equilibrium is unique.
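To make the potential concrete, here is a minimal sketch on Pigou's classic two-link example (the network and numbers are an illustrative choice, not from the slides): minimizing φ over feasible flows recovers the Wardrop equilibrium.

```python
# Potential minimization on Pigou's two-link network (illustrative):
# l1(x) = x (congestible), l2(x) = 1 (constant), unit mass from s to t.
#   phi(f) = integral_0^{f1} x dx + integral_0^{1-f1} 1 dx
#          = f1^2/2 + (1 - f1)

def potential(f1):
    return f1 * f1 / 2.0 + (1.0 - f1)

# Crude grid search over feasible flows f1 in [0, 1].
f1_star = min((i / 1000.0 for i in range(1001)), key=potential)

# All mass takes the congestible link: l1(1) = 1 = l2, so no user can
# lower her latency by switching paths -- exactly the Wardrop condition.
assert f1_star == 1.0
```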
Manipulating equilibrium flow (classic problem)
• Suppose you can set tolls t_e on each edge:
  • ℓ_e(t; x) = ℓ_e(x) + t_e
• Potential function becomes:

φ(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e )

• Changes the equilibrium flow.
• Goal: Set tolls to induce some target flow g in equilibrium
  • E.g. the socially optimal flow.
• Always possible
• Computationally tractable
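Continuing the same illustrative two-link example (an assumption, not from the slides), a short sketch of how a toll shifts the potential's minimizer, here from the equilibrium flow to the socially optimal split:

```python
# Tolled potential on the two-link example (illustrative choice):
# l1(x) = x, l2(x) = 1, unit mass of flow.
#   phi(f, t) = f1^2/2 + f1*t1 + (1 - f1)*(1 + t2)

def tolled_potential(f1, t1=0.0, t2=0.0):
    return f1 * f1 / 2.0 + f1 * t1 + (1.0 - f1) * (1.0 + t2)

grid = [i / 1000.0 for i in range(1001)]   # feasible flows f1 in [0, 1]

# Without tolls the minimizer is the equilibrium f1 = 1; with the
# marginal-cost toll t1 = 1/2 it moves to the optimal split f1 = 1/2.
untolled = min(grid, key=tolled_potential)
tolled = min(grid, key=lambda f1: tolled_potential(f1, t1=0.5))
```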
A Natural Problem [Bhaskar, Ligett, Schulman, Swamy FOCS 14]
β€’ You don’t know the latency functions…
β€’ But you have the power to set tolls and see what happens.
β€’ Agents play equilibrium flow given your tolls.
  • i.e. you have access to an oracle that takes a toll vector t and returns f*(t), the equilibrium flow given t:

f*(t) ≔ arg min_f φ(f, t)

• Want to learn tolls t* that induce some target flow g in polynomially many rounds.
The [BLSS] solution in a nutshell
• Assume the latency functions ℓ_e are convex polynomials of fixed degree (the only unknowns are the coefficients)
  • E.g. ℓ_e(x) = a_e·x + b_e
• Write down a convex program and try to solve, via the Ellipsoid algorithm, for coefficients and tolls that induce the target flow
  • i.e. the variables are (a_e, b_e, t_e)_e
• Every day, use the tolls t_e at the centroid of the current ellipsoid
• If f*(t) ≠ g, a separating hyperplane can be found
• So: number of rounds to convergence ↔ running time of Ellipsoid.
The [BLSS] solution in a nutshell
β€’ Very neat! (Read the paper!)
β€’ A couple of limitations:
β€’ Latency functions must have a simple, known form: only unknowns are a
small number of coefficients.
β€’ Latency functions must be convex
β€’ Heavy machinery
β€’ Have to run Ellipsoid. Computationally intensive. Centralized.
Some desiderata
β€’ Remove assumptions on latency functions
β€’ No known parametric form
β€’ Not necessarily convex
β€’ Not necessarily Lipschitz
β€’ Make update elementary
β€’ Ideally decentralized.
Proposed algorithm: Tâtonnement.
1. Initialize t_e^0 ← 0 for all edges. Let r = 0.
2. Observe the equilibrium flow f = f*(t^0).
3. While ‖f − g‖ ≥ ε:
   1. For each edge e set: t_e^{r+1} ← [t_e^r + η(f_e − g_e)]_+
   2. Set r ← r + 1 and f ← f*(t^r).
• Natural, simple, distributed.
• Why should it work?
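A runnable sketch of the loop. The equilibrium oracle below is a hypothetical stand-in: the two-link network ℓ1(x) = x, ℓ2(x) = 1 with unit flow, for which f*(t) has a closed form. The network, step size, and tolerances are illustrative choices, not from the talk.

```python
# Tatonnement sketch on a two-link example network (an assumption).

def equilibrium(t1, t2):
    # Minimizer of phi(f, t) = f1^2/2 + f1*t1 + (1 - f1)*(1 + t2):
    # first-order condition f1 + t1 = 1 + t2, clipped to [0, 1].
    f1 = min(max(1.0 + t2 - t1, 0.0), 1.0)
    return (f1, 1.0 - f1)

def tatonnement(g, eta=0.1, eps=1e-4, max_rounds=10_000):
    t = (0.0, 0.0)                    # 1. initialize tolls to zero
    f = equilibrium(*t)               # 2. observe the equilibrium flow
    r = 0
    while max(abs(fe - ge) for fe, ge in zip(f, g)) >= eps:
        if r >= max_rounds:
            break
        # 3.1 raise tolls on over-used edges, cut them on under-used
        #     ones, projecting back onto t_e >= 0.
        t = tuple(max(te + eta * (fe - ge), 0.0)
                  for te, fe, ge in zip(t, f, g))
        f = equilibrium(*t)           # 3.2 re-observe the equilibrium
        r += 1
    return t, f, r

# Target the socially optimal split (1/2, 1/2):
tolls, flow, rounds = tatonnement(g=(0.5, 0.5))
```

On this toy instance the toll on the congestible link climbs geometrically toward 1/2 and the flow approaches the target.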
Reverse engineering why it should work
β€’ View the interaction as repeated play of a game between a toll player
and a flow player.
β€’ What game are they playing?
β€’ Flow player’s strategies are feasible flows, toll player’s strategies are tolls in
some bounded range
β€’ What are their cost functions?
β€’ How are they playing it?
Behavior of the flow player is clear.
β€’ Every day, the flow player plays:
π‘“βˆ—
≔ arg min πœ™ 𝑓, 𝑑 = arg min
𝑓
𝑓
π‘’βˆˆπΈ
𝑓𝑒
ℓ𝑒
0
π‘₯ 𝑑π‘₯ + 𝑓𝑒 𝑑𝑒
β€’ If we define the flow player’s cost to be:
𝑓𝑒
𝑐1 𝑓, 𝑑 =
ℓ𝑒 π‘₯ 𝑑π‘₯ + 𝑓𝑒 𝑑𝑒
π‘’βˆˆπΈ
0
then the flow player is playing a best response every day.
Behavior of the toll player?
β€’ π‘‘π‘’π‘Ÿ+1 ← π‘‘π‘’π‘Ÿ + πœ‚ 𝑓𝑒 βˆ’ 𝑔𝑒
+
β€’ Consistent with playing online gradient descent with loss function:
β„“π‘Ÿ+1
= 𝑔𝑒 βˆ’ 𝑓𝑒
𝑒
β€’ Which is the gradient of cost function:
𝑐2 𝑓, 𝑑 =
𝑑𝑒 𝑔𝑒 βˆ’ 𝑓𝑒
𝑒
So the algorithm is consistent with:
• Repeated play of the following game:
  • A_flow = { f : f is a feasible multi-commodity flow }
  • A_toll = { t : t_e ≥ 0 ∀e }
  • c_flow(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e )
  • c_toll(f, t) = Σ_e t_e (g_e − f_e)
• Where, in rounds:
  • The toll player updates his strategy with online gradient descent, and
  • The flow player best responds.
Questions
β€’ Does the equilibrium of this game correspond to the target flow?
β€’ Does this repeated play converge (quickly) to equilibrium?
Questions
β€’ Does the equilibrium of this game correspond to the target flow?
β€’ Yes
• Suppose tolls t* induce flow g.
  • c_toll(g, t*) = Σ_e t_e*(g_e − g_e) = 0, so t* ∈ arg min_t c_toll(g, t)
  • g = arg min_f Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e* ) = arg min_f c_flow(f, t*)
  • ⇒ (g, t*) is a Nash equilibrium.
• Suppose (f, t) is a Nash equilibrium.
  • f = g, else ∃ an edge e with f_e > g_e, and the toll player would set t_e → ∞
  • Thinking a little bit harder: if ℓ_e(x) ≤ B for x ∈ [0,1], then tolls t_e ∈ [0, mB] suffice
Questions
β€’ Does this repeated play converge (quickly) to equilibrium?
• It does in zero-sum games! [Freund-Schapire '96]
  • Take the actual strategy of the gradient-descent player, and the empirical average strategy of the best-response player.
So it converges in a zero sum game.

c_flow(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e )

c_toll(f, t) = Σ_e t_e (g_e − f_e)

c_flow(f, t) + c_toll(f, t) = Σ_e ( ∫_0^{f_e} ℓ_e(x) dx + g_e t_e ) ≠ 0
Strategic Equivalence
β€’ Adding a strategy-independent term to a player’s cost function does
not change that player’s best response function
β€’ And so doesn’t change the equilibria of the game…
So it converges in a zero sum game.
• Subtract the strategy-independent term Σ_{e∈E} g_e t_e from the flow player's cost, and Σ_{e∈E} ∫_0^{f_e} ℓ_e(x) dx from the toll player's cost:

c_flow(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e ) − Σ_{e∈E} g_e t_e

c_toll(f, t) = Σ_e t_e (g_e − f_e) − Σ_{e∈E} ∫_0^{f_e} ℓ_e(x) dx

c_flow(f, t) + c_toll(f, t) = 0
So the dynamics converge!

‖t*‖_2 ≤ B·m^{3/2}
‖f − g‖_2 ≤ √m

So by the regret bound of OGD, the toll player reaches an ε-approximate min-max strategy in T rounds for:

T ≤ B^2 m^4 / ε^2
Do approximate min-max tolls guarantee the approximate target flow?
• Yes! Recall that if the latency functions are strictly increasing, then c_flow(f, t) is strongly convex in f for all t.
Upshot
So: For any fixed class of latency functions L such that:
1. Each ℓ(x) ∈ L is bounded in the range [0,1]
2. Each ℓ(x) ∈ L is strictly increasing
this simple process results in a flow f such that ‖f − g‖_2^2 ≤ ε in T = O(m^4/ε^2) rounds.
• Can get better bounds with further assumptions
  • E.g. the ℓ(x) are Lipschitz-continuous
Questions
β€’ Exact convergence without assumptions on latency functions?
β€’ Extensions to other games?
β€’ Seem to crucially use the fact that equilibrium is the solution to a convex
optimization problem…