The complexity of interior point methods
for solving discounted turn-based stochastic games
Thomas Dueholm Hansen (1, 2) and Rasmus Ibsen-Jensen (1)
(1) Department of Computer Science, Aarhus University, Denmark
(2) School of Computer Science, Tel Aviv University, Israel
July 1, 2013
Hansen and Ibsen-Jensen
Interior point methods for 2TBSGs
Page 1/21
An open problem
We study the problem of solving two-player turn-based
stochastic games (2TBSGs), a special case of Shapley’s
stochastic games (1953).
The status of 2TBSGs resembles that of linear programming
40 years ago:
2TBSGs can be solved efficiently using
strategy iteration algorithms that resemble
the simplex method.
The corresponding decision problem is in
NP ∩ coNP.
Major open problem: No polynomial
time algorithm is known.
A natural question
The interior point method, introduced by Karmarkar (1984),
solves linear programs in polynomial time.
Can the technique be used to solve 2TBSGs in polynomial
time?
Suggested by, e.g., Jurdziński and Savani (2008) and Hansen,
Miltersen, and Zwick (2011).
Gärtner and Rüst (2005), and Jurdziński and Savani (2008):
2TBSGs can be solved via P-matrix linear complementarity
problems (LCPs).
P-matrix linear complementarity problems can be solved by
interior point methods.
We analyze known interior point methods when used to solve
the P-matrix LCPs arising from 2TBSGs.
Two-player turn-based stochastic games (2TBSGs)
[Figure: an example 2TBSG. States are drawn as circles; the actions a1, ..., a6 are labelled with costs and transition probabilities.]
A 2TBSG consists of a set of states $S = S_1 \cup S_2$ (circles)
where:
$S_1$ is controlled by player 1, the minimizer.
$S_2$ is controlled by player 2, the maximizer.
We let $|S| = n$.
Every state $i \in S$ is associated with a non-empty set of
actions $A_i$.
Every action $a$ is associated with a cost $c_a$ and a probability
distribution $P_a \in \mathbb{R}^{1 \times n}$ such that $P_{a,j}$ is the probability of
moving to state $j$ when using action $a$.
A strategy $\sigma_1$ (or $\sigma_2$) for player 1 (or player 2) is a choice of
an action for each state $i \in S_1$ (or $i \in S_2$).
A strategy profile $\sigma = (\sigma_1, \sigma_2)$ is a pair of strategies,
defining a Markov chain, $P_\sigma \in \mathbb{R}^{n \times n}$, with costs, $c_\sigma \in \mathbb{R}^n$.
[Figure: the Markov chain obtained from the example game by fixing a strategy profile; only the chosen actions a1, a4, and a5 remain.]
Let $\gamma < 1$ be a discount factor.
The value of a state $i$ for a strategy profile $\sigma = (\sigma_1, \sigma_2)$ is the
expected total discounted cost accumulated when starting in
state $i$:
$$\mathrm{val}_{\sigma_1,\sigma_2}(i) = \sum_{t=0}^{\infty} \sum_{j \in S} \gamma^t \, (P_\sigma^t)_{i,j} \, c_{\sigma(j)}$$
Values
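The values can be found as the unique solution to the system
of linear equations:
$$\forall i: \quad \mathrm{val}_\sigma(i) = c_{\sigma(i)} + \gamma \sum_j P_{\sigma(i),j} \cdot \mathrm{val}_\sigma(j)$$
[Figure: state 1 uses an action with cost 7 that moves to state 2 with probability 1/3 and to state 3 with probability 2/3.]
$$\mathrm{val}_\sigma(1) = 7 + \gamma \left( \tfrac{1}{3} \mathrm{val}_\sigma(2) + \tfrac{2}{3} \mathrm{val}_\sigma(3) \right)$$
The value vector $v_\sigma \in \mathbb{R}^n$, where $(v_\sigma)_i = \mathrm{val}_\sigma(i)$ for all $i$,
then satisfies:
$$v_\sigma = c_\sigma + \gamma P_\sigma v_\sigma \iff (I - \gamma P_\sigma) v_\sigma = c_\sigma.$$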
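As an illustration of the linear system for the values, the following sketch computes the value vector of a fixed strategy profile with exact rational arithmetic. State 1 follows the example on the slide (cost 7, moving to state 2 with probability 1/3 and to state 3 with probability 2/3). The behaviour of states 2 and 3 (absorbing, with costs 1 and 0), the choice γ = 1/2, and the helper names `solve` and `value_vector` are assumptions made only for this example.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(A)
    aug = [[F(a) for a in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [a / scale for a in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def value_vector(P, c, gamma):
    """Return v with (I - gamma * P) v = c for a fixed strategy profile."""
    n = len(P)
    A = [[(1 if i == j else 0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, c)


# Hypothetical 3-state chain (states 1, 2, 3 stored at indices 0, 1, 2).
P = [[0, F(1, 3), F(2, 3)],
     [0, 1, 0],
     [0, 0, 1]]
c = [7, 1, 0]
gamma = F(1, 2)

v = value_vector(P, c, gamma)
# val(2) = 1 / (1 - gamma) = 2, val(3) = 0,
# val(1) = 7 + gamma * (1/3 * 2 + 2/3 * 0) = 22/3
print(v)
```

Exact fractions are used here only so that the result can be compared with the hand calculation; for larger games a floating-point linear solver would be the usual choice.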
Optimal strategies
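$\sigma_1^*$ and $\sigma_2^*$ are optimal for player 1 and 2, respectively, iff:
$$\forall i: \quad \sigma_1^* \in \operatorname*{argmin}_{\sigma_1} \max_{\sigma_2} \mathrm{val}_{\sigma_1,\sigma_2}(i)$$
$$\forall i: \quad \sigma_2^* \in \operatorname*{argmax}_{\sigma_2} \min_{\sigma_1} \mathrm{val}_{\sigma_1,\sigma_2}(i)$$
Shapley (1953): Optimal strategies always exist.
$\sigma^* = (\sigma_1^*, \sigma_2^*)$ is optimal iff $(\sigma_1^*, \sigma_2^*)$ is a Nash equilibrium:
$$\forall i: \quad \mathrm{val}_{\sigma_1^*,\sigma_2^*}(i) = \min_{\sigma_1} \mathrm{val}_{\sigma_1,\sigma_2^*}(i) = \max_{\sigma_2} \mathrm{val}_{\sigma_1^*,\sigma_2}(i)$$
We say that $v^* = v_{\sigma^*}$ is the optimal value vector. Note that
$v^*$ is unique.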
Optimality condition
$\sigma^* = (\sigma_1^*, \sigma_2^*)$ is optimal iff:
$$\forall i \in S_1, \forall a \in A_i: \quad \mathrm{val}_{\sigma^*}(i) \le c_a + \gamma \sum_j P_{a,j} \cdot \mathrm{val}_{\sigma^*}(j)$$
$$\forall i \in S_2, \forall a \in A_i: \quad \mathrm{val}_{\sigma^*}(i) \ge c_a + \gamma \sum_j P_{a,j} \cdot \mathrm{val}_{\sigma^*}(j)$$
Note that for every state $i$, the inequality is tight for $a = \sigma^*(i)$,
and that these equalities define the values.
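The optimality condition can be checked mechanically for a candidate value vector: at every state the value must equal the best one-step backup, the minimum for player 1 and the maximum for player 2. The sketch below does this for a hypothetical two-state game. The game data, the discount factor 0.5, and the names `ACTIONS`, `MIN_STATES`, `one_step`, and `satisfies_optimality` are assumptions for illustration, not taken from the talk.

```python
GAMMA = 0.5
MIN_STATES = {0}  # S1: states of player 1 (the minimizer); state 1 belongs to player 2

# Each action: (state it belongs to, cost c_a, transition probabilities P_a)
ACTIONS = [
    (0, 1.0, [1.0, 0.0]),  # state 0: pay 1 and stay
    (0, 0.0, [0.0, 1.0]),  # state 0: pay 0 and move to state 1
    (1, 3.0, [0.0, 1.0]),  # state 1: pay 3 and stay
    (1, 0.0, [1.0, 0.0]),  # state 1: pay 0 and move to state 0
]


def one_step(cost, probs, v):
    """Right-hand side c_a + gamma * sum_j P_{a,j} v_j of the condition."""
    return cost + GAMMA * sum(p * vj for p, vj in zip(probs, v))


def satisfies_optimality(v, tol=1e-9):
    """Check that every inequality holds and is tight for some action."""
    for i in range(len(v)):
        backups = [one_step(c, p, v) for (s, c, p) in ACTIONS if s == i]
        best = min(backups) if i in MIN_STATES else max(backups)
        if abs(v[i] - best) > tol:
            return False
    return True


print(satisfies_optimality([2.0, 6.0]))  # values when both players stay
print(satisfies_optimality([3.0, 6.0]))  # values when player 1 moves to state 1
```

In this toy game the vector (2, 6) passes the check, while (3, 6) fails at state 0 because staying would cost player 1 only 1 + 0.5 · 3 = 2.5.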
Solving 2TBSGs
The optimal value vector $v^* \in \mathbb{R}^n$ can be found as the unique
solution to:
$$\forall i \in S_1, \forall a \in A_i: \quad (v^*)_i + z_a = c_a + \gamma \sum_j P_{a,j} \cdot (v^*)_j$$
$$\forall i \in S_2, \forall a \in A_i: \quad (v^*)_i - z_a = c_a + \gamma \sum_j P_{a,j} \cdot (v^*)_j$$
$$\forall i \in S_1 \cup S_2: \quad \prod_{a \in A_i} z_a = 0$$
$$\forall a: \quad z_a \ge 0$$
Solving 2TBSGs
Let $G$ be a 2TBSG for which the actions can be partitioned
into two disjoint strategy profiles $\sigma$ and $\tau$.
Let $\mathcal{I} \in \{-1, 0, 1\}^{n \times n}$ be a diagonal matrix with $\mathcal{I}_{i,i} = -1$ for
$i \in S_1$, and $\mathcal{I}_{i,i} = 1$ for $i \in S_2$ (written $\mathcal{I}$ to distinguish it from the identity matrix $I$).
Then the optimal value vector can be found as the unique
solution $v^*, w, z \in \mathbb{R}^n$ to:
$$(I - \gamma P_\sigma) v^* - \mathcal{I} w = c_\sigma$$
$$(I - \gamma P_\tau) v^* - \mathcal{I} z = c_\tau$$
$$w^T z = 0$$
$$w, z \ge 0$$
P-matrix linear complementarity problems (LCPs)
The linear complementarity problem $(M, q)$, where
$M \in \mathbb{R}^{n \times n}$ and $q \in \mathbb{R}^n$, is the problem of finding a solution
$z, w \in \mathbb{R}^n$, if it exists, satisfying:
$$w = q + Mz$$
$$w^T z = 0$$
$$w, z \ge 0$$
The problem of solving an LCP is in general NP-complete.
Hence, it is common to make assumptions about $M$.
$M$ is a P-matrix iff every principal submatrix of $M$ has
positive determinant, or, equivalently, for every $x \ne 0$ there is
an index $i \in \{1, \dots, n\}$ with $x_i (Mx)_i > 0$.
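For small matrices the determinant characterization of P-matrices can be tested directly by enumerating all principal submatrices. The sketch below does so by brute force, which takes exponential time in n and is meant only to make the definition concrete. The function names `det` and `is_p_matrix` and the two example matrices are assumptions for illustration.

```python
from itertools import combinations


def det(A):
    """Determinant by Laplace expansion along the first row (small n only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total


def is_p_matrix(M):
    """True iff every principal submatrix of M has positive determinant."""
    n = len(M)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            sub = [[M[i][j] for j in idx] for i in idx]
            if det(sub) <= 0:
                return False
    return True


# A P-matrix that is not positive semi-definite: x = (1, 1) gives x^T M x = -1.
print(is_p_matrix([[1, -3], [0, 1]]))
# Not a P-matrix: the full determinant is 1 - 4 = -3.
print(is_p_matrix([[1, 2], [2, 1]]))
```

The first example shows that a P-matrix need not be positive semi-definite, which is why the running-time analyses of interior point methods for P-matrix LCPs involve extra matrix-dependent parameters.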
Reduction from 2TBSGs to LCPs
Jurdziński and Savani (2008) showed that a deterministic
2TBSG can be solved by solving an LCP $(M_{G,\sigma,\tau}, q_{G,\sigma,\tau})$:
$$M_{G,\sigma,\tau} = \mathcal{I} (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} \mathcal{I}$$
$$q_{G,\sigma,\tau} = \mathcal{I} (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} c_\tau - \mathcal{I} c_\sigma,$$
and that $M_{G,\sigma,\tau}$ is a P-matrix. Here $I$ is the identity matrix and $\mathcal{I}$ is the
diagonal sign matrix with entries $-1$ on $S_1$ and $1$ on $S_2$.
Result: We show that the same is true for 2TBSGs in general.
A similar reduction was made by Gärtner and Rüst (2005) for
simple stochastic games, a class of games that is
polynomially equivalent to 2TBSGs.
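The matrix and vector of this LCP can be recovered from the complementarity system for the partition into $\sigma$ and $\tau$ by eliminating $v^*$. The following is a short sketch of that step, writing $\mathcal{I}$ for the diagonal sign matrix (to keep it apart from the identity $I$) and using $\mathcal{I}^2 = I$. The matrix $I - \gamma P_\tau$ is invertible because $\gamma < 1$ and $P_\tau$ is stochastic.

```latex
\begin{align*}
(I - \gamma P_\tau) v^* - \mathcal{I} z = c_\tau
  &\;\Longrightarrow\;
  v^* = (I - \gamma P_\tau)^{-1} (c_\tau + \mathcal{I} z), \\
(I - \gamma P_\sigma) v^* - \mathcal{I} w = c_\sigma
  &\;\Longrightarrow\;
  \mathcal{I} w = (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} (c_\tau + \mathcal{I} z) - c_\sigma \\
  &\;\Longrightarrow\;
  w = \underbrace{\mathcal{I} (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} c_\tau - \mathcal{I} c_\sigma}_{q_{G,\sigma,\tau}}
    + \underbrace{\mathcal{I} (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} \mathcal{I}}_{M_{G,\sigma,\tau}} \, z .
\end{align*}
```

Together with $w^T z = 0$ and $w, z \ge 0$, this is exactly the form $w = q + Mz$ of a linear complementarity problem.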
Interior point methods for P-matrix LCPs
There exist several interior point methods for solving P-matrix
LCPs. The running time is typically expressed as a function of
parameters that depend on M.
Kojima, Megiddo, Noma, and Yoshise (1991): A unified
interior point method that runs in time
$$O((1 + \kappa) n^{3.5} L)$$
where $L$ is the number of bits needed to describe $M$.
If $\kappa = 0$ then $M$ is positive semi-definite, and in general:
$$\kappa = \min \Big\{ \kappa \ge 0 \;\Big|\; \forall x \in \mathbb{R}^n : x^T (Mx) + 4\kappa \sum_{i \in \delta_+(M)} x_i (Mx)_i \ge 0 \Big\}$$
where $\delta_+(M) = \{ i \in [n] \mid x_i (Mx)_i > 0 \}$.
Interior point methods for P-matrix LCPs
Kojima, Megiddo, and Ye (1992): An interior point
potential reduction algorithm that produces a solution with
$w^T z < \epsilon$ in time
$$O\left( \frac{-\delta}{\theta} \, n^4 \log \epsilon^{-1} \right).$$
$\delta$ is the smallest eigenvalue of $\frac{M + M^T}{2}$.
$\theta$ is the positive P-matrix number:
$$\theta = \min_{\|x\|_2 = 1} \max_{i \in [n]} x_i (Mx)_i.$$
Related work and results
Let $G$ be a 2TBSG with $n$ states and discount factor $\gamma < 1$, for
which the actions can be partitioned into two strategy profiles $\sigma$
and $\tau$, and let $M = M_{G,\sigma,\tau}$.
Rüst (2007): $\frac{-\delta}{\theta} = \Omega\left( \frac{1}{1-\gamma} \right)$ for simple stochastic games.
We show:
$$-\delta = \Theta\left( \frac{\sqrt{n}}{1-\gamma} \right), \qquad \frac{1}{\theta} = \Theta\left( \frac{n}{(1-\gamma)^2} \right), \qquad \kappa = \Theta\left( \frac{n}{(1-\gamma)^2} \right)$$
All our lower bounds are obtained using the same game.
Related work and results
Result: The analysis of Kojima, Megiddo, Noma, and Yoshise
(1991) gives the following bound for 2TBSGs:
$$O((1 + \kappa) n^{3.5} L) = O\left( \frac{n^{4.5} L}{(1-\gamma)^2} \right).$$
Result: The analysis of Kojima, Megiddo, and Ye (1992)
gives the following bound for 2TBSGs:
$$O\left( \frac{-\delta}{\theta} \, n^4 \log \epsilon^{-1} \right) = O\left( \frac{n^{5.5} \log \epsilon^{-1}}{(1-\gamma)^3} \right).$$
Result: These bounds cannot be improved by better bounds
for $\kappa$, $\delta$, and $\theta$.
Note that the bounds are not polynomial when $\gamma$ is part of
the input, since $1/(1-\gamma)$ can be exponential in the number of bits used to encode $\gamma$.
Shapley’s value iteration algorithm (1953) solves such
2TBSGs in time $O\left( \frac{n^2 L}{1-\gamma} \log \frac{1}{1-\gamma} \right)$.
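For comparison with the interior point bounds, value iteration itself is easy to sketch: starting from the zero vector, every state repeatedly takes the best one-step backup, the minimum for player 1 and the maximum for player 2. The two-state game, the discount factor 0.5, the iteration count, and the name `value_iteration` below are assumptions chosen for illustration only.

```python
GAMMA = 0.5
MIN_STATES = {0}  # S1: states of player 1 (the minimizer); state 1 belongs to player 2

# Each action: (state it belongs to, cost c_a, transition probabilities P_a)
ACTIONS = [
    (0, 1.0, [1.0, 0.0]),  # state 0: pay 1 and stay
    (0, 0.0, [0.0, 1.0]),  # state 0: pay 0 and move to state 1
    (1, 3.0, [0.0, 1.0]),  # state 1: pay 3 and stay
    (1, 0.0, [1.0, 0.0]),  # state 1: pay 0 and move to state 0
]


def value_iteration(n_states, iterations):
    """Apply the min/max one-step backup synchronously for a fixed number of rounds."""
    v = [0.0] * n_states
    for _ in range(iterations):
        new_v = []
        for i in range(n_states):
            backups = [c + GAMMA * sum(p * vj for p, vj in zip(probs, v))
                       for (s, c, probs) in ACTIONS if s == i]
            new_v.append(min(backups) if i in MIN_STATES else max(backups))
        v = new_v
    return v


v = value_iteration(2, 100)
print(v)  # approaches the optimal values (2, 6) of this toy game
```

Each round shrinks the distance to the optimal value vector by a factor of γ, which is where the dependence on 1/(1 − γ) in the running time comes from.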
Lower bound for κ
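Let $M := M_{G,\sigma,\tau} = \mathcal{I} (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} \mathcal{I}$, where $I$ is the identity
and $\mathcal{I}$ is the diagonal sign matrix, and recall that
$$\kappa = \min \Big\{ \kappa \ge 0 \;\Big|\; \forall x \in \mathbb{R}^n : x^T (Mx) + 4\kappa \sum_{i \in \delta_+(M)} x_i (Mx)_i \ge 0 \Big\}$$
where $\delta_+(M) = \{ i \in [n] \mid x_i (Mx)_i > 0 \}$.
Find $G$, $\sigma$, $\tau$, and $x = \mathcal{I} x' \in \mathbb{R}^n$ giving a large value for:
$$\kappa \ge \frac{-x^T \left( (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} x \right)}{4 \sum_{i \in \delta_+(M)} x_i \left( (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} x \right)_i}$$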
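The lower-bound strategy rests on the fact that any single vector x certifies a lower bound on κ: rearranging the defining inequality gives κ ≥ −xᵀMx / (4 Σ x_i (Mx)_i), with the sum taken over the indices where x_i (Mx)_i is positive. The sketch below evaluates this ratio for a generic matrix. The function name `kappa_lower_bound` and the 2 × 2 example are assumptions for illustration and are not the game used in the talk.

```python
def kappa_lower_bound(M, x):
    """Lower bound on kappa certified by the single vector x."""
    n = len(M)
    Mx = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    terms = [x[i] * Mx[i] for i in range(n)]   # x_i (Mx)_i for every i
    quad = sum(terms)                          # x^T M x
    positive = sum(t for t in terms if t > 0)  # sum over the positive terms
    if quad >= 0:
        return 0.0  # this x puts no constraint on kappa
    if positive == 0:
        raise ValueError("no finite kappa satisfies the inequality for this x")
    return -quad / (4 * positive)


M = [[1, -3], [0, 1]]  # a P-matrix that is not positive semi-definite
print(kappa_lower_bound(M, [1, 1]))  # Mx = (-2, 1), so the bound is 1 / (4 * 1)
```

Maximizing this ratio over x gives κ itself, so a lower bound only needs one well-chosen vector.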
Lower bound for κ
[Figure: a deterministic game on the states 1, 2, ..., n−3, n−2, n−1, n.]
$\sigma$ consists of solid arrows, and $\tau$ consists of dashed arrows.
Let $x \in \mathbb{R}^n$ be defined by, for all $i \in [n]$:
$$x_i = \begin{cases} 1 - \frac{1}{1-\gamma} & \text{if } i \le n-2 \\ 1 & \text{if } i = n-1 \\ -1 & \text{if } i = n \end{cases}$$
Note that the game is deterministic and that there is only one
player.
Lower bound for κ
Simple calculations show that:
$$\kappa \ge \frac{-x^T \left( (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} x \right)}{4 \sum_{i \in \delta_+(M)} x_i \left( (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} x \right)_i} = \Omega\left( \frac{n}{(1-\gamma)^2} \right).$$
It is not difficult to show that $(I - \gamma P_\tau)^{-1} = \sum_{t=0}^{\infty} (\gamma P_\tau)^t$.
Note that $(I - \gamma P_\tau)^{-1} x$ corresponds to the value vector for $\tau$
when the costs are replaced by $x$.
This fact is also used for proving our upper bounds.
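The series identity can be checked numerically: accumulating the terms (γP)^t x one by one should approach the vector y that solves (I − γP) y = x, which is the value vector of the chain when x is used as the cost vector. The two-state cycle, the vector x, the discount factor, and the function names below are assumptions for illustration.

```python
def mat_vec(P, y):
    """Matrix-vector product P y for a matrix given as a list of rows."""
    return [sum(p * yj for p, yj in zip(row, y)) for row in P]


def neumann_partial_sum(P, x, gamma, terms):
    """Return sum_{t=0}^{terms-1} (gamma P)^t x, accumulated term by term."""
    total = [0.0] * len(x)
    term = list(x)  # (gamma P)^0 x
    for _ in range(terms):
        total = [s + t for s, t in zip(total, term)]
        term = [gamma * t for t in mat_vec(P, term)]
    return total


# Hypothetical deterministic chain: state 0 moves to state 1 and back.
P_tau = [[0.0, 1.0],
         [1.0, 0.0]]
x = [1.0, 0.0]
gamma = 0.5

y = neumann_partial_sum(P_tau, x, gamma, 100)
# Residual of the linear system (I - gamma P) y = x; it should be close to zero.
residual = [yi - gamma * pi - xi
            for yi, pi, xi in zip(y, mat_vec(P_tau, y), x)]
print(y, residual)
```

For this chain the exact solution is y = (4/3, 2/3), since y_0 = 1 + γ² y_0 and y_1 = γ y_0.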
Concluding remarks
We have shown that known bounds for interior point methods
for solving 2TBSGs are not polynomial when γ is part of the
input.
This should only be viewed as a first step.
It remains open whether the analysis of known interior point
methods can be specialized for 2TBSGs, and whether other
variants can solve the problem in polynomial time.
The end
Thank you for listening!