Are One-Player Games Always Simple?
Daniel Andersson & Sergei Vorobyov
Information Technology Department
Uppsala University, Sweden
Games’06, Cambridge, July 2006
Motivation
1. Despite their simple appearances, “trivial” one-player games deserve special
consideration
2. Algorithms for 1-player games are important subroutines in algorithms for
2-player games
3. 1-player games represent, in a simplified, miniaturized form, BIG algorithmic
problems similar to those we encounter (and cannot solve) for 2-player games
4. Can we handle these (1-player) problems adequately?
5. A YES answer will give insights into the 2-player case
A Simple Question
1. Suppose we have a polynomial algorithm for a (game-theoretic) problem.
2. Are we happy?
Possible Answers
1. YES!
2. NO!
3. IT DEPENDS...
One-Player Mean Payoff Games
1. In an edge-weighted leafless digraph, a single player wants to find
a Minimal Average Cost cycle reachable from a given vertex
2. Karp’s algorithm: O(nm)
(n - number of vertices, m - number of edges; independent of the edge weights!)
(a small sketch of Karp’s method follows below)
3. Orlin’s algorithm: O(√n · m log(nW))
(W - largest absolute edge weight)
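For concreteness, here is a minimal Python sketch of Karp’s O(nm) minimum mean cycle method (our illustration, not the authors’ code; the graph encoding and the function name are assumptions, and every vertex is assumed reachable from the start vertex s):

def min_mean_cycle_value(n, edges, s=0):
    """Karp's algorithm: n vertices 0..n-1, edges = [(u, v, w), ...].
    Returns the minimum mean weight of a cycle reachable from s (None if no cycle)."""
    INF = float("inf")
    # d[k][v] = minimum weight of a walk with exactly k edges from s to v
    d = [[INF] * n for _ in range(n + 1)]
    d[0][s] = 0.0
    for k in range(1, n + 1):
        for (u, v, w) in edges:
            if d[k - 1][u] < INF and d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = None
    for v in range(n):
        if d[n][v] == INF:
            continue  # no walk of exactly n edges ends at v
        # Karp's formula: max over k of (d[n][v] - d[k][v]) / (n - k)
        val = max((d[n][v] - d[k][v]) / (n - k)
                  for k in range(n) if d[k][v] < INF)
        best = val if best is None else min(best, val)
    return best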
One-Player Discounted Payoff Games
1. In an edge-weighted digraph (without sinks) a solitary player wants to find
an infinite path (a sequence of edges e0, e1, e2, ... in which consecutive edges share endpoints)
maximizing
$\sum_{i=0}^{+\infty} \lambda^i\, w(e_i)$,
where λ ∈ (0, 1) and w(·) is the edge weight function (a small worked example follows below).
2. Any (algorithmic) suggestions?
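As a quick sanity check (our example, not from the slides): a play that stays forever on a single self-loop of weight w has discounted payoff

$\sum_{i=0}^{+\infty} \lambda^i w \;=\; \frac{w}{1-\lambda}$,

so, e.g., w = 3 and λ = 0.9 give payoff 30.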
LP for One-Player Discounted Payoff Games
1. Write the system S of linear MD2-constraints:
xi ≥ λ xj + wij
for each edge (i, j) of the graph, where wij is its weight
2. Solve the LP: minimize $\sum_i x_i$ subject to S
(a small worked sketch follows below)
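A minimal sketch of this LP in Python/SciPy (our illustration, not the authors’ code; the tiny example graph is made up):

import numpy as np
from scipy.optimize import linprog

lam = 0.5
n = 2
edges = [(0, 1, 4.0), (1, 0, -2.0), (1, 1, 1.0)]  # (i, j, w_ij)

# linprog solves min c^T x s.t. A_ub x <= b_ub, so rewrite each constraint
# x_i >= lam * x_j + w_ij   as   -x_i + lam * x_j <= -w_ij.
A_ub, b_ub = [], []
for (i, j, w) in edges:
    row = np.zeros(n)
    row[i] += -1.0
    row[j] += lam      # += so that self-loops (i == j) are handled correctly
    A_ub.append(row)
    b_ub.append(-w)

res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
print(res.x)  # approx. [5., 2.] -- the least element of S for this example

At the optimum, the constraints for the edge (0, 1) and the self-loop at vertex 1 are tight (satisfied as equalities).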
Is It Satisfactory?
1. YES (polynomial)
2. NO — only WEAKLY (i.e., non-strongly) POLYNOMIAL:
the number of operations depends not only on the number of variables n and
constraints m, but also on the magnitude of the coefficients in the constraints.
λ = 0.99999999999999999999999999999999999999999999999999999999999999
Actually, the runtime of all known general-purpose LP solvers for min(c^T x | Ax ≥ b) is
O(poly(n, m) · size(A, b, c))
- size(A, b, c) – the total space in bits needed to write ALL the entries of A, b, c
- sometimes you would prefer to stay with the exponential simplex method,
- or rely on strongly subexponential LP-methods,
- rather than on WEAKLY polynomial ones
Possible Objection
1. we never have such crazy coefficients!
2. YES, WE DO!
One-Player Discounted Payoff Games
1. If you reduce from a parity game with n vertices and k colors, you get a
discounted game with the discounting factor as close to 1 as
λ = 1 − 1/n^{k+3}
2. Plugging in, e.g., n = 10^6 ≈ 2^20 and k = 3 gives you
λ = 1 − 10^{-36} ≈ 1 − 2^{-120},
i.e., you need about 241 bits to write this λ as a binary fraction:
$\underbrace{1\cdots1}_{120}\, /\, 1\underbrace{0\cdots0}_{120}$
(a quick check follows below)
3. Are you still satisfied with weakly polynomial algorithms?
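A quick check of that bit count (our snippet, not from the slides):

num, den = 10**36 - 1, 10**36   # lambda = 1 - 10**-36 as an exact fraction
print(num.bit_length(), den.bit_length())  # 120 120 -> about 241 bits in total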
One-Player Stochastic Games
Solving a one-player stochastic game amounts to solving a linear program
minimize
$\sum_i x_i$
subject to a system of linear constraints of the MD-form (Monotonic Discounted)
$x_i \;\ge\; \sum_j p_{ij}\, x_j + \beta$,
where $p_{ij} \ge 0$ and $\sum_j p_{ij} = 1$, or $\sum_j p_{ij} < 1$ (to make the game stopping with
probability 1)
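For instance (a made-up two-variable stopping system, not from the slides), for

$x_1 \ge 0.5\,x_1 + 0.3\,x_2 + 1, \qquad x_2 \ge 0.9\,x_2 + 2,$

the second constraint forces $x_2 \ge 2/(1-0.9) = 20$ for every feasible point, and then the first forces $x_1 \ge (1 + 0.3 \cdot 20)/(1-0.5) = 14$; the LP optimum is therefore $(x_1, x_2) = (14, 20)$, with both constraints tight.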
Strongly vs. Weakly Polynomial Algorithms
1. One of the most important open problems in mathematical optimization is
the existence of a strongly polynomial algorithm for linear programming.
2. In the (weakly) polynomial ellipsoid algorithm due to Khachiyan and the
interior-point algorithms of Karmarkar, the number of operations depends
not only on the number of variables n and constraints m, but also on the
magnitude of the coefficients.
3. The quest for a strongly polynomial algorithm, with a running time
polynomial in n, m, while independent of the coefficients, has continued for
the last quarter of a century.
Challenge: Miniaturized Quest for S/Poly LP
Can we, using a special structure of linear programs resulting from 1-player
games, suggest strongly polynomial algorithms for these games?
(Not relying on linear programming)
Particular Questions
1. Are discrete/combinatorial methods possible (like shortest paths, iterative
improvement, etc.)?
(non-interior-point, non-ellipsoid-like)
2. Can something nontrivial be proved about iterative improvement?
(Many advocate iterative improvement for 2-player games ... good luck!)
Positive answers will give insights into the 2-player case
Noticeable Example: Min Cost Max Flow
1. Was initially solved by reduction to linear programming
(weakly polynomial)
2. Strongly polynomial algorithms were eventually developed (Tardos, Orlin)
LPs for 1-Player SGs are Very Well Structured
1. A linear program
$\min\bigl(\,\sum_i x_i \;\big|\; S\,\bigr)$
for a 1-player (Max) SG has many special properties.
2. The constraints of S are monotonic and discounted (MD), i.e., have the form
$x_i \;\ge\; \sum_{j \ne i} a_j\, x_j + \beta$,
with $0 \le a_j$ and $\sum_j a_j < 1$.
3. Least Element Property: The feasible region of S contains a unique least
element, which is also the solution to the program and the game.
(least – w.r.t. the ≤-ordering in R^n; e.g., the 0 vector is the least element of the
nonnegative orthant; a contrasting example follows below)
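To see that this is a genuinely special property (our illustrative contrast, not from the slides):

The half-plane $\{(x_1, x_2) : x_1 + x_2 \ge 1\}$ has many minimal feasible points but no least element, whereas the MD-system $\{x_1 \ge 0.5\,x_2,\ x_2 \ge 0.5\,x_1\}$ has the least element $(0, 0)$: chaining the two constraints gives $x_1 \ge 0.25\,x_1$ and $x_2 \ge 0.25\,x_2$, i.e., $x_1, x_2 \ge 0$ for every feasible point.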
Basic Properties
Proposition. The feasible region for an MD-system is always nonempty and we
can easily find a feasible solution, in strongly polynomial time.
(This is in contrast to general linear programs!)
Proposition. A feasible solution for a given MD-system can be computed in
O(mn) time. For an MD2-system, O(m) time suffices.
Idea. Pick any point. If it is infeasible, go in the direction (1, . . . , 1) until you
satisfy all the constraints (= intersect the farthest bounding hyperplane); a sketch for the MD2 case follows below.
But: (just) feasible solutions are not enough! We need the least one!
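A minimal sketch of the Idea above for the equal-λ (MD2=λ) case, in Python (our illustration, not the authors’ code):

def feasible_point(n, constraints, lam):
    """constraints = [(i, j, beta), ...] encoding x_i >= lam * x_j + beta_ij.
    Moving from the origin along (1, ..., 1) raises every slack at rate (1 - lam),
    so one pass over the m constraints gives a feasible point in O(m) time."""
    t = max(0.0, max(beta / (1.0 - lam) for (_, _, beta) in constraints))
    return [t] * n  # every constraint holds: t - lam * t >= beta_ij

# e.g. feasible_point(2, [(0, 1, 4.0), (1, 0, -2.0), (1, 1, 1.0)], 0.5) -> [8.0, 8.0]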
Full System of MD2=λ-Constraints (1-DPG Case)
n groups of constraints, one for each variable; each variable is bounded from below.
The discounting factor λ ∈ (0, 1) is the same in all constraints!

x_1 ≥ λ x_... + β_...
x_1 ≥ λ x_... + β_...
...
x_n ≥ λ x_... + β_...
x_n ≥ λ x_... + β_...
...
Least Element Problem
The main problem we concentrate upon is the following.
MD2-Least Element Problem (MD2-LEP).
Given: a full system S of MD2-constraints.
Find: the least element of the convex polyhedron defined by S.
Theorem. This least element always exists and is uniquely defined.
Equal Discounting Factors. Coloring Tight Edges
1. Each constraint has form xi ≥ λxj + βij
2. For each variable xi , there is at least one such constraint with xi on the left
(full system) (= no sinks)
3. Let x be a feasible solution of the full system of constraints
4. Some constraints are satisfied as equalities by x, called tight
5. Consider the graph G with variables as vertices, in which a constraint
xi ≥ λxj + βij is represented as a directed edge from xi to xj with weight βij
6. An edge is called tight if the corresponding constraint is satisfied as equality
by the feasible solution x
7. Tight edges may be colored red, but at most one outgoing red edge per
vertex is allowed
8. A vertex without outgoing red edges is called a root
Red Zones
Proposition. If every vertex has an outgoing red edge (i.e., there are no roots),
then x is the unique least solution of the constraint system (least element).
(A restatement of: if a feasible solution satisfies at least one constraint in each
group as equality, this solution is the unique minimal solution/least element)
A red zone is a maximal subgraph G′ of G induced by red edges such that for
every two vertices u, v of G′ there is a vertex of G′ reachable from both u and v
by red paths (possibly empty) in G′
Every red zone is either a red tree with a root (no outgoing red edges), or a red
sun with a unique cycle and incoming rays (directed simple paths)
Red Zones: Trees and Suns
The Pull-Down Algorithm
1. starting from a feasible solution, we move in a nonpositive direction within
the feasible region, making sure the number of red edges is nondecreasing
2. this has a clear intuitive interpretation as pulling a red tree down by its root
3. a pull-down of a red tree is performed as follows.
The coordinates of the feasible solution x corresponding to nodes in the tree
are decreased in such a way that the tree edges remain tight, while all other
coordinates are kept fixed, until some constraint is just about to be violated.
An edge corresponding to such a constraint (satisfied as equality) is then
chosen, and is made the unique new outgoing red edge from its tail.
Note that the tail of this new red edge will always be a node originating from
the red tree that was pulled down, due to the monotonicity of the constraints.
Also, roots are never created, i.e., once a vertex has acquired an outgoing red
edge, it will always have some outgoing red edge. Thus, a pull-down results
in one of three events (root elimination, subtree defection, or internal switch), listed on the next slide.
Possible Results of a Pull-Down Operation
Monotonicity
1. in a pull-down operation the number of red edges monotonically
non-decreases
2. in root elimination this number increases
3. in subtree defection and internal switch it remains the same
4. in subtree defection the number of the tree nodes decreases
Critical Monotonicity Property
Proposition. In an internal switch the depth of the switching node increases
Corollary 1. The number of successive pull-downs needed to eliminate the root
of a red tree is at most t^2, where t is the number of nodes in the tree before the
first pull-down.
Remark. The bound in the Corollary is asymptotically tight: it is possible to
give an example where Θ(t^2) pull-downs are needed to eliminate one root.
(We have an explicit example.)
Corollary 2. O(n^3) pull-down operations suffice to eliminate all roots
(i.e., to find the unique least element and solve the game)
The Greedy Strategy
Always select the SMALLEST tree to pull down and eliminate its root!
Proposition. With the greedy strategy, O(n^2) pull-down operations suffice to
eliminate all roots.
Proof. If there are k trees, at least one of them must have at most ⌊n/k⌋ nodes,
and thus the total number of pull-downs is bounded above by
$\sum_{k=1}^{n} \left(\frac{n}{k}\right)^{2} \;\le\; n^{2} \sum_{k=1}^{\infty} \frac{1}{k^{2}} \;=\; \frac{\pi^{2} n^{2}}{6} \;=\; O(n^{2}).$
Overall Complexity
Proposition. A single pull-down operation can be performed in O(m) time
Intuition: you have a descent direction d, following which you keep all red edges
of the tree tight, while reducing node values. Find the first hyperplane (out of m),
which you reach/hit going in direction d (constant time per 2-variable constraint).
Main Theorem.
1. The unique least element of a full MD2-system of linear constraints with
equal discounting factors, as well as
2. the unique solution of a 1-player discounted payoff game
can be found in strongly polynomial time O(mn^2), where n is the number of
variables (vertices), and m is the number of constraints (edges).
Comparison
• O(mn^2) seems the best currently available bound for the problem
(asymptotically, this is like running Bellman-Ford from every vertex)
• Lower bounds – unknown (trying to reduce from Shortest Paths)
• Extensive literature on special classes of LPs (Megiddo, Hochbaum,...)
– all suggest worse complexity bounds
2-Player Discounted Payoff Games
Combined, as a subroutine, with the combinatorial randomized optimization
scheme developed for Recursive Local Global Functions optimization
[Björklund & V., TCS’05, 349(3)], it gives the best currently available
$O(mn^{2}\, e^{\sqrt{n \log m}})$
algorithm (which has a simultaneous pseudopolynomial bound)
1. RLG-functions — a useful abstraction for strategy evaluation functions
unifying many common games.
(defined on finite Cartesian products, every local optimum is global, and this
holds recursively on subproducts)
2. Once you have an RLG-function for your favorite strategy evaluation, you
can immediately conclude a subexponential bound.
Ergodic Partition for 2-Player Mean Payoff Games
Used as a subroutine, the new algorithm also provides for a new improved
strongly subexponential $O(mn^{2}\, e^{\sqrt{n \log m}})$ algorithm for the
Ergodic Partition for MPGs. Given a 2-player MPG, factor its vertices into
classes with equal values.
1. Our previous randomized subexponential algorithm from [Björklund & V.,
Discrete Applied Maths, to appear] was not strongly subexponential — it
proceeds by dichotomy and hence has a factor log W , where W is the largest
absolute edge weight
2. (Strongly subexponential for the decision version: whether the value is > B)
3. The new algorithm roughly gives the gain $\sqrt{T(n, m)}$ compared with the $T(n, m)$
that was possible before.
Generalized Discounted Payoff Games
1. Edges may have different discounting factors
2. There may exist multiple edges between vertices
3. The previous algorithm can, in principle, handle 1-player Generalized DPGs,
after a simple modification (correctness/completeness preserved!)
4. But we cannot claim (for now) a strongly polynomial bound
5. More precisely, we can no longer bound the number of internal switches
during pull-downs
(this bound was O(n^2) before — by the depth-increasing argument)
A New Challenge: Is a Strongly Polynomial
Algorithm Possible?
Algorithm for 1-Player Generalized DPGs
1. Extensive results/algorithms/literature on special classes of linear programs:
- 2 or 3 variables per constraint
- not necessarily monotonic
- Aspvall, Shiloach, Cohen, Megiddo, Hochbaum, Naor, ...
2. Usually such algorithms work for feasibility (rather than optimality)
(they cannot optimize, in general, in polynomial time)
3. Feasibility for MD-constraints is trivial, O(mn) (strongly polynomial)
4. We need optimality/minimality/least elements – not trivial!
5. We selected the asymptotically best algorithm [Hochbaum & Naor’94] and
adapted it to solving MD2-constraints (with different λ’s)
Basic Idea: Clever Fourier-Motzkin Elimination
• To eliminate a variable xi ,
• all inequalities containing xi are replaced with inequalities L ≤ U for each
pair L ≤ xi and xi ≤ U in the original system.
• Feasibility is preserved, and the method can be applied recursively to
compute a feasible solution, or to determine that none exists (a tiny worked elimination follows after this list).
• Crux: the number of inequalities created during a straightforward
application of such repeated eliminations may be exponential.
• For the 2-variable case it is possible to keep this growth linear, by eliminating
lots of redundant inequalities
(quick binary search of the bounds on a feasible solution among breakpoints
of piecewise-linear bounding envelopes in the 2-dimensional plane [Hochbaum
& Naor’94], using a modified Bellman-Ford)
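A tiny worked elimination (our example, not from the paper): eliminating $x$ from the 2-variable system

$\{\,x \ge 2y + 1,\ \ x \ge y - 3,\ \ x \le 5\,\}$

pairs each lower bound with each upper bound, giving $\{\,2y + 1 \le 5,\ \ y - 3 \le 5\,\} = \{\,y \le 2,\ \ y \le 8\,\}$; any $y \le 2$ then extends back to a feasible $(x, y)$, e.g. $y = 2,\ x = 5$.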
Main Result
Theorem. The running time of the algorithm for solving MD2-constraints
(with different discounting factors) to optimality (least element) is O(mn^2 log m),
where n is the number of variables and m is the number of constraints.
Corollary. Generalized 1-player DPGs (with different discounts) can be solved
in strongly polynomial time O(mn^2 log m), where n is the number of vertices and
m is the number of edges.
Corollary. Generalized 2-player DPGs can be solved in randomized strongly
subexponential time
$O(mn^{2} \log m \cdot e^{\sqrt{n \log m}})$
Remark: n in the exponent can be replaced by the smaller of the two players’
numbers of vertices
Conclusions
1. New strongly polynomial algorithm for some popular 1-player games
2. Previously unavailable
3. Purely combinatorial
4. Not based on Linear Programming
5. Not based on iterative improvement
6. Heavily exploits the structure of the problem
7. When combined with subexponential RLG-optimization, produces the
asymptotically best currently available algorithms
8. Next step: 1-player stochastic games (they still resist)
9. Several (generic) candidate algorithms are currently being analyzed
10. Not completely discrete, but much simpler/easier to implement compared
with ellipsoid and interior-point algorithms