Adaptively Learning Tolls to Induce Target Flows

Aaron Roth
Joint work with Jon Ullman and Steven Wu
Non-Atomic Congestion Games
• A graph representing a road network
• A latency function ℓ_e(x) on each edge
• Infinitely many (infinitesimally small) players
• A mass of players m_{s,t} who want to route flow between s and t, with Σ_{s,t} m_{s,t} = 1
• Actions for players are paths in the graph
• Action profile ↔ a multi-commodity flow.
Equilibrium Flows
• "Nobody can improve their total latency by switching paths"
• Let P_{s,t} denote the set of s → t paths in the graph.
• Let ℓ(p, f) = Σ_{e∈p} ℓ_e(f_e)

Definition: A feasible multi-commodity flow f is a Wardrop equilibrium if for every (s, t) with m_{s,t} > 0, and for all paths p, p′ ∈ P_{s,t} with f_p > 0:

ℓ(p, f) ≤ ℓ(p′, f)
Routing games are potential games
• Equilibrium flows minimize the following potential function, among all feasible multi-commodity flows:

φ(f) = Σ_{e∈E} ∫_0^{f_e} ℓ_e(x) dx

• Convex so long as the ℓ_e are non-decreasing
• Strongly convex if the ℓ_e are strictly increasing → the equilibrium is unique.
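To make the potential concrete, here is a minimal sketch on Pigou's classic two-link example (the network and numbers are an illustrative choice, not from the slides): minimizing φ over feasible flows recovers the Wardrop equilibrium.

```python
# Potential minimization on Pigou's two-link network (illustrative):
# l1(x) = x (congestible), l2(x) = 1 (constant), unit mass from s to t.
#   phi(f) = integral_0^{f1} x dx + integral_0^{1-f1} 1 dx
#          = f1^2/2 + (1 - f1)

def potential(f1):
    return f1 * f1 / 2.0 + (1.0 - f1)

# Crude grid search over feasible flows f1 in [0, 1].
f1_star = min((i / 1000.0 for i in range(1001)), key=potential)

# All mass takes the congestible link: l1(1) = 1 = l2, so no user can
# lower her latency by switching paths -- exactly the Wardrop condition.
assert f1_star == 1.0
```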
Manipulating equilibrium flow (classic problem)
• Suppose you can set tolls t_e on each edge:
  • ℓ_e(t; x) = ℓ_e(x) + t_e
• Potential function becomes:

φ(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e )

• Changes the equilibrium flow.
• Goal: Set tolls to induce some target flow g in equilibrium
  • E.g. the socially optimal flow.
• Always possible
• Computationally tractable
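Continuing the same illustrative two-link example (an assumption, not from the slides), a short sketch of how a toll shifts the potential's minimizer, here from the equilibrium flow to the socially optimal split:

```python
# Tolled potential on the two-link example (illustrative choice):
# l1(x) = x, l2(x) = 1, unit mass of flow.
#   phi(f, t) = f1^2/2 + f1*t1 + (1 - f1)*(1 + t2)

def tolled_potential(f1, t1=0.0, t2=0.0):
    return f1 * f1 / 2.0 + f1 * t1 + (1.0 - f1) * (1.0 + t2)

grid = [i / 1000.0 for i in range(1001)]   # feasible flows f1 in [0, 1]

# Without tolls the minimizer is the equilibrium f1 = 1; with the
# marginal-cost toll t1 = 1/2 it moves to the optimal split f1 = 1/2.
untolled = min(grid, key=tolled_potential)
tolled = min(grid, key=lambda f1: tolled_potential(f1, t1=0.5))
```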
A Natural Problem [Bhaskar, Ligett, Schulman, Swamy FOCS 14]
β€’ You don’t know the latency functions…
β€’ But you have the power to set tolls and see what happens.
β€’ Agents play equilibrium flow given your tolls.
  • i.e. you have access to an oracle that takes a toll vector t and returns f*(t), the equilibrium flow given t:

f*(t) ≔ arg min_f φ(f, t)

• Want to learn tolls t* that induce some target flow g in polynomially many rounds.
The [BLSS] solution in a nutshell
• Assume the latency functions ℓ_e are convex polynomials of fixed degree (the only unknowns are the coefficients)
  • E.g. ℓ_e(x) = a_e·x + b_e
• Write down a convex program and try to solve, via the Ellipsoid algorithm, for coefficients and tolls that induce the target flow
  • i.e. the variables are (a_e, b_e, t_e)_e
• Every day, use the tolls t_e at the centroid of the current ellipsoid
• If f*(t) ≠ g, a separating hyperplane can be found
• So: number of rounds to convergence ↔ running time of Ellipsoid.
The [BLSS] solution in a nutshell
β€’ Very neat! (Read the paper!)
β€’ A couple of limitations:
β€’ Latency functions must have a simple, known form: only unknowns are a
small number of coefficients.
β€’ Latency functions must be convex
β€’ Heavy machinery
β€’ Have to run Ellipsoid. Computationally intensive. Centralized.
Some desiderata
β€’ Remove assumptions on latency functions
β€’ No known parametric form
β€’ Not necessarily convex
β€’ Not necessarily Lipschitz
β€’ Make update elementary
β€’ Ideally decentralized.
Proposed algorithm: Tâtonnement.
1. Initialize t_e^0 ← 0 for all edges. Let r = 0.
2. Observe the equilibrium flow f = f*(t^0).
3. While ‖f − g‖ ≥ ε:
   1. For each edge e set: t_e^{r+1} ← [t_e^r + η(f_e − g_e)]_+
   2. Set r ← r + 1 and f ← f*(t^r).
• Natural, simple, distributed.
• Why should it work?
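A runnable sketch of the loop. The equilibrium oracle below is a hypothetical stand-in: the two-link network ℓ1(x) = x, ℓ2(x) = 1 with unit flow, for which f*(t) has a closed form. The network, step size, and tolerances are illustrative choices, not from the talk.

```python
# Tatonnement sketch on a two-link example network (an assumption).

def equilibrium(t1, t2):
    # Minimizer of phi(f, t) = f1^2/2 + f1*t1 + (1 - f1)*(1 + t2):
    # first-order condition f1 + t1 = 1 + t2, clipped to [0, 1].
    f1 = min(max(1.0 + t2 - t1, 0.0), 1.0)
    return (f1, 1.0 - f1)

def tatonnement(g, eta=0.1, eps=1e-4, max_rounds=10_000):
    t = (0.0, 0.0)                    # 1. initialize tolls to zero
    f = equilibrium(*t)               # 2. observe the equilibrium flow
    r = 0
    while max(abs(fe - ge) for fe, ge in zip(f, g)) >= eps:
        if r >= max_rounds:
            break
        # 3.1 raise tolls on over-used edges, cut them on under-used
        #     ones, projecting back onto t_e >= 0.
        t = tuple(max(te + eta * (fe - ge), 0.0)
                  for te, fe, ge in zip(t, f, g))
        f = equilibrium(*t)           # 3.2 re-observe the equilibrium
        r += 1
    return t, f, r

# Target the socially optimal split (1/2, 1/2):
tolls, flow, rounds = tatonnement(g=(0.5, 0.5))
```

On this toy instance the toll on the congestible link climbs geometrically toward 1/2 and the flow approaches the target.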
Reverse engineering why it should work
β€’ View the interaction as repeated play of a game between a toll player
and a flow player.
β€’ What game are they playing?
β€’ Flow player’s strategies are feasible flows, toll player’s strategies are tolls in
some bounded range
β€’ What are their cost functions?
β€’ How are they playing it?
Behavior of the flow player is clear.
β€’ Every day, the flow player plays:
π‘“βˆ—
≔ arg min πœ™ 𝑓, 𝑑 = arg min
𝑓
𝑓
π‘’βˆˆπΈ
𝑓𝑒
ℓ𝑒
0
π‘₯ 𝑑π‘₯ + 𝑓𝑒 𝑑𝑒
β€’ If we define the flow player’s cost to be:
𝑓𝑒
𝑐1 𝑓, 𝑑 =
ℓ𝑒 π‘₯ 𝑑π‘₯ + 𝑓𝑒 𝑑𝑒
π‘’βˆˆπΈ
0
then the flow player is playing a best response every day.
Behavior of the toll player?
β€’ π‘‘π‘’π‘Ÿ+1 ← π‘‘π‘’π‘Ÿ + πœ‚ 𝑓𝑒 βˆ’ 𝑔𝑒
+
β€’ Consistent with playing online gradient descent with loss function:
β„“π‘Ÿ+1
= 𝑔𝑒 βˆ’ 𝑓𝑒
𝑒
β€’ Which is the gradient of cost function:
𝑐2 𝑓, 𝑑 =
𝑑𝑒 𝑔𝑒 βˆ’ 𝑓𝑒
𝑒
So the algorithm is consistent with:
• Repeated play of the following game:
  • A_flow = { f : f is a feasible multi-commodity flow }
  • A_toll = { t : t_e ≥ 0 ∀e }
  • c_flow(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e )
  • c_toll(f, t) = Σ_e t_e (g_e − f_e)
• Where, in rounds:
  • The toll player updates his strategy with online gradient descent, and
  • The flow player best responds.
Questions
β€’ Does the equilibrium of this game correspond to the target flow?
β€’ Does this repeated play converge (quickly) to equilibrium?
Questions
β€’ Does the equilibrium of this game correspond to the target flow?
β€’ Yes
• Suppose tolls t* induce flow g.
  • c_toll(g, t*) = Σ_e t_e*(g_e − g_e) = 0, so t* ∈ arg min_t c_toll(g, t)
  • g = arg min_f Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e* ) = arg min_f c_flow(f, t*)
  • ⇒ (g, t*) is a Nash equilibrium.
• Suppose (f, t) is a Nash equilibrium.
  • f = g, else ∃ an edge e with f_e > g_e, and the toll player would set t_e → ∞
  • Thinking a little bit harder: if ℓ_e(x) ≤ B for x ∈ [0,1], then tolls t_e ∈ [0, mB] suffice
Questions
β€’ Does this repeated play converge (quickly) to equilibrium?
• It does in zero-sum games! [Freund-Schapire '96]
  • Take the actual strategy of the gradient-descent player, and the empirical average strategy of the best-response player.
So it converges in a zero sum game.

c_flow(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e )

c_toll(f, t) = Σ_e t_e (g_e − f_e)

c_flow(f, t) + c_toll(f, t) = Σ_e ( ∫_0^{f_e} ℓ_e(x) dx + g_e t_e ) ≠ 0
Strategic Equivalence
β€’ Adding a strategy-independent term to a player’s cost function does
not change that player’s best response function
β€’ And so doesn’t change the equilibria of the game…
So it converges in a zero sum game.
• Subtract the strategy-independent term Σ_{e∈E} g_e t_e from the flow player's cost, and Σ_{e∈E} ∫_0^{f_e} ℓ_e(x) dx from the toll player's cost:

c_flow(f, t) = Σ_{e∈E} ( ∫_0^{f_e} ℓ_e(x) dx + f_e t_e ) − Σ_{e∈E} g_e t_e

c_toll(f, t) = Σ_e t_e (g_e − f_e) − Σ_{e∈E} ∫_0^{f_e} ℓ_e(x) dx

c_flow(f, t) + c_toll(f, t) = 0
So the dynamics converge!

‖t*‖_2 ≤ B·m^{3/2}
‖f − g‖_2 ≤ √m

So by the regret bound of OGD, the toll player reaches an ε-approximate min-max strategy in T rounds for:

T ≤ B^2 m^4 / ε^2
Do approximate min-max tolls guarantee the approximate target flow?
• Yes! Recall that if the latency functions are strictly increasing, then c_flow(f, t) is strongly convex in f for all t.
Upshot
So: For any fixed class of latency functions L such that:
1. Each ℓ(x) ∈ L is bounded in the range [0,1]
2. Each ℓ(x) ∈ L is strictly increasing
this simple process results in a flow f such that ‖f − g‖_2^2 ≤ ε in T = O(m^4/ε^2) rounds.
• Can get better bounds with further assumptions
  • E.g. the ℓ(x) are Lipschitz-continuous
Questions
β€’ Exact convergence without assumptions on latency functions?
β€’ Extensions to other games?
β€’ Seem to crucially use the fact that equilibrium is the solution to a convex
optimization problem…