Adaptively Learning Tolls to Induce Target Flows
Aaron Roth
Joint work with Jon Ullman and Steven Wu

Non-Atomic Congestion Games
• A graph representing a road network
• A latency function $\ell_e(x)$ on each edge
• Infinitely many (infinitesimally small) players
• A mass of players $n_{s,t}$ who want to route flow between $s$ and $t$.
• $\sum_{s,t} n_{s,t} = 1$
• Actions for players are paths in the graph
• An action profile is a multi-commodity flow.

Equilibrium Flows
• "Nobody can improve their total latency by switching paths"
• Let $\mathcal{P}_{s,t}$ denote the set of $s$-$t$ paths in the graph.
• Let $\ell(P, f) = \sum_{e \in P} \ell_e(f_e)$
Definition: A feasible multi-commodity flow $f$ is a Wardrop equilibrium if for every $(s,t)$ with $n_{s,t} > 0$, and for all paths $P, P' \in \mathcal{P}_{s,t}$ with $f_P > 0$:
  $\ell(P, f) \le \ell(P', f)$

Routing games are potential games
• Equilibrium flows minimize the following potential function, among all feasible multi-commodity flows:
  $\Phi(f) = \sum_{e \in E} \int_0^{f_e} \ell_e(x)\,dx$
• Convex so long as the $\ell_e$ are non-decreasing
• Strongly convex if the $\ell_e$ are strictly increasing, so the equilibrium is unique.

Manipulating equilibrium flow (classic problem)
• Suppose you can set tolls $t_e$ on each edge:
  • $\ell_e(t; x) = \ell_e(x) + t_e$
• Potential function becomes
  $\Phi(f, t) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t_e \right)$
• Changes the equilibrium flow.
• Goal: Set tolls to induce some target flow $\hat{f}$ in equilibrium
  • E.g. the socially optimal flow.
• Always possible
• Computationally tractable

A Natural Problem [Bhaskar, Ligett, Schulman, Swamy FOCS 14]
• You don't know the latency functions…
• But you have the power to set tolls and see what happens.
• Agents play equilibrium flow given your tolls.
  • i.e. you have access to an oracle that takes toll vectors $t$ and returns $f^*(t)$, the equilibrium flow given $t$.
  • $f^*(t) \in \arg\min_f \Phi(f, t)$
• Want to learn tolls $t^*$ that induce some target flow $\hat{f}$ in polynomially many rounds.
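To make the oracle $f^*(t)$ concrete, here is a minimal sketch for a special case that the slides do not single out: one unit of flow routed over parallel links between a single source and sink, with affine latencies $\ell_e(x) = a_e x + b_e$, $a_e > 0$. The function name `equilibrium_flow` and the bisection approach are illustrative assumptions, not the authors' implementation. At a Wardrop equilibrium every used link has the same tolled latency, so the code bisects on that common level.

```python
def equilibrium_flow(a, b, tolls, iters=100):
    """Wardrop equilibrium for unit demand on parallel links.

    Assumed setting (not from the slides): link e has tolled latency
    a[e] * x + b[e] + tolls[e] with a[e] > 0.
    """
    def flows_at(level):
        # Flow each link carries when every used link costs `level`.
        return [max(0.0, (level - b_e - t_e) / a_e)
                for a_e, b_e, t_e in zip(a, b, tolls)]

    # At `lo` no link carries flow; at `hi` every link carries at least 1.
    lo = min(b_e + t_e for b_e, t_e in zip(b, tolls))
    hi = max(a_e + b_e + t_e for a_e, b_e, t_e in zip(a, b, tolls))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(flows_at(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return flows_at(hi)


# Two links: l_0(x) = x and l_1(x) = x + 0.5.
no_toll = equilibrium_flow([1.0, 1.0], [0.0, 0.5], [0.0, 0.0])
with_toll = equilibrium_flow([1.0, 1.0], [0.0, 0.5], [0.5, 0.0])
```

Without tolls the common latency level is 0.75 and the flow splits as (0.75, 0.25). A toll of 0.5 on the first link makes the two links identical, so the flow splits evenly.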
The [BLSS] solution in a nutshell
• Assume latency functions $\ell_e$ are convex polynomials of fixed degree (the only thing unknown is the coefficients)
  • E.g. $\ell_e(x) = a_e x + b_e$
• Write down a convex program to try and solve for coefficients and tolls that induce the target flow using Ellipsoid
  • i.e. $(a_e, b_e, t_e)_e$
• Every day use tolls $t_e$ at the centroid of the ellipsoid
  • If $f^*(t) \ne \hat{f}$, can find a separating hyperplane
• So: number of rounds to convergence ≈ running time of ellipsoid.
• Very neat! (Read the paper!)
• A couple of limitations:
  • Latency functions must have a simple, known form: only unknowns are a small number of coefficients.
  • Latency functions must be convex
  • Heavy machinery
    • Have to run Ellipsoid. Computationally intensive. Centralized.

Some desiderata
• Remove assumptions on latency functions
  • No known parametric form
  • Not necessarily convex
  • Not necessarily Lipschitz
• Make update elementary
  • Ideally decentralized.

Proposed algorithm: Tâtonnement.
1. Initialize $t_e^0 \leftarrow 0$ for all edges. Let $k = 0$.
2. Observe equilibrium flow $f = f^*(t^0)$
3. While $\|f - \hat{f}\| \ge \varepsilon$:
   1. For each edge $e$ set: $t_e^{k+1} \leftarrow \left[ t_e^k + \eta (f_e - \hat{f}_e) \right]_+$
   2. Set $k \leftarrow k + 1$ and $f \leftarrow f^*(t^k)$
• Natural, simple, distributed.
• Why should it work?

Reverse engineering why it should work
• View the interaction as repeated play of a game between a toll player and a flow player.
• What game are they playing?
  • Flow player's strategies are feasible flows, toll player's strategies are tolls in some bounded range
  • What are their cost functions?
• How are they playing it?

Behavior of the flow player is clear.
• Every day, the flow player plays:
  $f^* \in \arg\min_f \Phi(f, t) = \arg\min_f \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t_e \right)$
• If we define the flow player's cost to be
  $c_1(f, t) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t_e \right)$
  then the flow player is playing a best response every day.

Behavior of the toll player?
• $t_e^{k+1} \leftarrow \left[ t_e^k + \eta (f_e - \hat{f}_e) \right]_+$
• Consistent with playing online gradient descent on a linear loss with gradient:
  $\nabla^{k+1} = (\hat{f}_e - f_e)_{e \in E}$
• Which is the gradient (with respect to $t$) of the cost function:
  $c_2(f, t) = \sum_e t_e (\hat{f}_e - f_e)$

So algorithm is consistent with:
• Repeated play of the following game:
  • $A_{flow} = \{ f : f \text{ is a feasible multi-commodity flow} \}$
  • $A_{toll} = \{ t : t_e \ge 0 \ \forall e \}$
  • $c_{flow}(f, t) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t_e \right)$
  • $c_{toll}(f, t) = \sum_e t_e (\hat{f}_e - f_e)$
• Where, in rounds:
  • The toll player updates his strategy with online gradient descent, and
  • The flow player best responds.

Questions
• Does the equilibrium of this game correspond to the target flow?
  • Yes
  • Suppose tolls $t^*$ induce flow $\hat{f}$.
    • $c_{toll}(\hat{f}, t^*) = \sum_e t^*_e (\hat{f}_e - \hat{f}_e) = 0$; $t^* \in \arg\min_t c_{toll}(\hat{f}, t)$
    • $\hat{f} = \arg\min_f \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t^*_e \right) = \arg\min_f c_{flow}(f, t^*)$
    • So $(\hat{f}, t^*)$ is a Nash equilibrium.
  • Suppose $(f, t)$ is a Nash equilibrium.
    • $f = \hat{f}$, else there is an edge $e$ with $f_e > \hat{f}_e$ and the toll player would set $t_e \to \infty$
    • Thinking a little bit harder, if $\ell_e(x) \le B$ for $x \in [0,1]$, then restricting to $t_e \in [0, mB]$ is sufficient, where $m$ is the number of edges
• Does this repeated play converge (quickly) to equilibrium?
  • It does in zero sum games! [FreundSchapire96?]
  • Actual strategy of GD player, empirical average of BR player.

So it converges in a zero sum game.
• As defined, the game is not zero sum:
  $c_{flow}(f, t) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t_e \right)$
  $c_{toll}(f, t) = \sum_e t_e (\hat{f}_e - f_e)$
  $c_{flow}(f, t) + c_{toll}(f, t) = \sum_e \left( \int_0^{f_e} \ell_e(x)\,dx + \hat{f}_e t_e \right) \ne 0$

Strategic Equivalence
• Adding a strategy-independent term to a player's cost function does not change that player's best response function
• And so doesn't change the equilibria of the game…
• Subtract $\sum_{e \in E} \hat{f}_e t_e$, which does not depend on $f$, from the flow player's cost:
  $c_{flow}(f, t) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t_e \right) - \sum_{e \in E} \hat{f}_e t_e$
  $c_{toll}(f, t) = \sum_e t_e (\hat{f}_e - f_e)$
  $c_{flow}(f, t) + c_{toll}(f, t) = \sum_e \int_0^{f_e} \ell_e(x)\,dx$
• With the strategy-independent terms subtracted from both players' costs:
  $c_{flow}(f, t) = \sum_{e \in E} \left( \int_0^{f_e} \ell_e(x)\,dx + f_e t_e \right) - \sum_{e \in E} \hat{f}_e t_e$
  $c_{toll}(f, t) = \sum_e t_e (\hat{f}_e - f_e) - \sum_{e \in E} \int_0^{f_e} \ell_e(x)\,dx$
  $c_{flow}(f, t) + c_{toll}(f, t) = 0$

So the dynamics converge!
• $\|t^*\|_2 \le B m^{3/2}$ and $\|f - \hat{f}\|_2 \le \sqrt{m}$
• So by the regret bound of OGD: Toll player reaches an $\varepsilon$-approximate min-max strategy in $T$ rounds for:
  $T \le \dfrac{B^2 m^4}{\varepsilon^2}$

Do approximate min-max tolls guarantee the approximate target flow?
• Yes! Recall that if latency functions are strictly increasing, then $c_{flow}(f, t)$ is strongly convex in $f$ for all $t$.

Upshot
So: For any fixed class of latency functions $L$ such that:
1. Each $\ell(x) \in L$ is bounded in the range $[0,1]$
2. Each $\ell(x) \in L$ is strictly increasing
This simple process results in a flow $f$ such that $\|f - \hat{f}\|_2^2 \le \varepsilon$ in $T = O\!\left( \dfrac{m^4}{\varepsilon^2} \right)$ rounds.
• Can get better bounds with further assumptions
  • E.g. $\ell(x)$ are Lipschitz-continuous

Questions
• Exact convergence without assumptions on latency functions?
• Extensions to other games?
  • Seem to crucially use the fact that equilibrium is the solution to a convex optimization problem…
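The tâtonnement update from the proposed-algorithm slide can be sketched on a toy instance. Everything specific here is an assumption made for illustration: the parallel-link network with affine latencies, the step size `eta = 0.5`, the round cap `max_rounds`, and the helper names `equilibrium_flow` and `tatonnement`. The toll setter only ever calls the oracle; it never reads the latency coefficients.

```python
import math


def equilibrium_flow(a, b, tolls, iters=100):
    """Oracle f*(t) for unit demand on parallel links (assumed toy setting):
    link e has tolled latency a[e] * x + b[e] + tolls[e], with a[e] > 0."""
    def flows_at(level):
        return [max(0.0, (level - b_e - t_e) / a_e)
                for a_e, b_e, t_e in zip(a, b, tolls)]

    # Bisect on the common latency level of the used links.
    lo = min(b_e + t_e for b_e, t_e in zip(b, tolls))
    hi = max(a_e + b_e + t_e for a_e, b_e, t_e in zip(a, b, tolls))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(flows_at(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return flows_at(hi)


def tatonnement(oracle, target, eta=0.5, eps=1e-3, max_rounds=10_000):
    """Raise tolls on over-used edges, lower them on under-used ones,
    and project back onto t_e >= 0 after every step."""
    tolls = [0.0] * len(target)
    flow = oracle(tolls)
    rounds = 0
    while math.dist(flow, target) >= eps and rounds < max_rounds:
        tolls = [max(0.0, t_e + eta * (f_e - g_e))
                 for t_e, f_e, g_e in zip(tolls, flow, target)]
        flow = oracle(tolls)
        rounds += 1
    return tolls, flow, rounds


# Hidden latencies l_0(x) = x and l_1(x) = x + 0.5; target is an even split.
def oracle(t):
    return equilibrium_flow([1.0, 1.0], [0.0, 0.5], t)


target = [0.5, 0.5]
tolls, flow, rounds = tatonnement(oracle, target)
```

On this instance the toll on the first link climbs toward 0.5 while the toll on the second link stays at 0, which equalizes the two tolled latencies at the target split. The step size that works here is not a recommendation for general networks.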