Available online at www.sciencedirect.com

Electronic Notes in Discrete Mathematics 55 (2016) 155–159
www.elsevier.com/locate/endm

Determining the Optimal Strategies for Zero-Sum Average Stochastic Positional Games

Dmitrii Lozovanu
Institute of Mathematics and Computer Science, Academy of Sciences of Moldova, Chisinau, Moldova
Email: [email protected]

Stefan Pickl
Institute for Theoretical Computer Science, Mathematics and Operations Research, Universität der Bundeswehr München, Neubiberg-München, Germany
Email: [email protected]

Abstract

We consider a class of zero-sum stochastic positional games that generalizes the deterministic antagonistic positional games with average payoffs. We prove the existence of saddle points for the considered class of games and propose an approach for determining the optimal stationary strategies of the players.

Keywords: Average Markov decision process, Zero-sum stochastic positional games, Optimal strategies, Saddle points.

http://dx.doi.org/10.1016/j.endm.2016.10.039
1571-0653/© 2016 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1 Introduction and Problem Formulation

The aim of this paper is to prove the existence of saddle points for a class of antagonistic stochastic games that extends the deterministic positional games with average payoffs from [1,2,5]. We formulate the considered class of games using the framework of a Markov decision process $(X, A, c, p, x_0)$ with a finite set of states $X$; a finite set of actions $A = \cup_{x \in X} A(x)$, where $A(x)$ is the set of actions in the state $x \in X$; a probability transition function $p : X \times A \times X \to [0,1]$ that satisfies the condition $\sum_{y \in X} p^a_{x,y} = 1$, $\forall x \in X$, $a \in A(x)$; and a cost function $c : X \times X \to \mathbb{R}$.
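For concreteness, the tuple $(X, A, c, p, x_0)$ admits a direct computational encoding. The following sketch is purely illustrative (all states, actions and numbers are invented for the example, not taken from the paper): it stores the transition probabilities, checks the normalization condition $\sum_{y \in X} p^a_{x,y} = 1$, and computes the expected immediate costs $\mu_{x,a} = \sum_{y \in X} p^a_{x,y}\, c_{x,y}$ that appear in the payoff below.

```python
import numpy as np

# Hypothetical encoding of (X, A, c, p, x0): states are 0..n-1,
# actions[x] lists the actions available in state x, p[(x, a)] is the
# probability row over successor states, c[x][y] is the transition cost.
n = 3
actions = {0: ["u", "v"], 1: ["u"], 2: ["u", "w"]}
p = {
    (0, "u"): np.array([0.0, 1.0, 0.0]),
    (0, "v"): np.array([0.5, 0.0, 0.5]),
    (1, "u"): np.array([0.0, 0.0, 1.0]),
    (2, "u"): np.array([1.0, 0.0, 0.0]),
    (2, "w"): np.array([0.0, 0.5, 0.5]),
}
c = np.array([[0.0, 2.0, 4.0],
              [1.0, 0.0, 3.0],
              [2.0, 5.0, 0.0]])

# Sanity check of the condition sum_y p^a_{x,y} = 1 for all x, a in A(x).
for (x, a), row in p.items():
    assert a in actions[x] and abs(row.sum() - 1.0) < 1e-12

# Expected immediate cost mu_{x,a} = sum_y p^a_{x,y} * c_{x,y}.
mu = {(x, a): float(p[(x, a)] @ c[x]) for (x, a) in p}
print(mu[(0, "u")])   # 2.0: "u" in state 0 moves to state 1 w.p. 1
```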
We assume that the Markov process describes the evolution of a dynamical system that is controlled by two players as follows. The set of states is divided into two subsets $X = X_1 \cup X_2$ with $X_1 \cap X_2 = \emptyset$, where $X_1$ represents the position set of the first player and $X_2$ represents the position set of the second player. At time moment $t = 0$ the dynamical system is in the state $x_0$. If this state belongs to the set of positions of the first player, then the action $a_0 \in A(x_0)$ in this state is selected by the first player; otherwise the action $a_0 \in A(x_0)$ is chosen by the second player. After that the dynamical system passes randomly to another state according to the probability distribution $\{p^{a_0}_{x_0,y}\}$. At time moment $t = 1$ the players observe the state $x_1 \in X$ of the system. If $x_1$ belongs to the set of positions of the first player, then the action $a_1 \in A(x_1)$ is chosen by the first player; otherwise the action is chosen by the second one, and so on, indefinitely. In this process the first player intends to maximize
$$\liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mu(x_{\tau-1}), \quad \text{where } \mu(x_\tau) = \sum_{y \in X} p^{a_\tau}_{x_\tau,y}\, c_{x_\tau,y},$$
and the second player intends to minimize
$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mu(x_{\tau-1}).$$
We assume that the players use stationary strategies for selecting the actions in their position sets. We define the stationary strategies of the players as maps
$$s^1 : x \to a \in A(x) \ \text{for} \ x \in X_1; \qquad s^2 : x \to a \in A(x) \ \text{for} \ x \in X_2.$$
Let $s^1, s^2$ be arbitrary strategies of the players. Then $s = (s^1, s^2)$ determines a Markov process induced by the probability distributions $\{p^{s^i(x)}_{x,y}\}$ in the states $x \in X_i$, $i = 1, 2$, and a given starting state $x_0$. For this Markov process with transition costs $c_{x,y}$, $x, y \in X$, we can determine the average cost per transition $F_{x_0}(s^1, s^2)$. The function $F_{x_0}(s^1, s^2)$ on $S = S^1 \times S^2$, where $S^1$ and $S^2$ represent the corresponding sets of stationary strategies of player 1 and player 2, defines an antagonistic game.
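For fixed stationary strategies $s^1, s^2$ the induced process is an ordinary Markov chain, so the average cost per transition $F_{x_0}(s^1, s^2)$ can be evaluated directly from the stationary distribution. A minimal sketch, assuming the induced chain has a single recurrent class so that the average cost does not depend on the starting state (the transition matrix and costs below are illustrative):

```python
import numpy as np

def average_cost(P, mu):
    """Average cost per transition of a Markov chain with transition
    matrix P (n x n) and expected immediate costs mu (length n),
    assuming a single recurrent class (unichain case)."""
    n = P.shape[0]
    # Stationary distribution q: solve q P = q together with sum(q) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(q @ mu)

# Two-state example: the fixed strategies induce a chain that alternates
# deterministically between states 0 and 1 with immediate costs 1 and 3.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
mu = np.array([1.0, 3.0])
print(average_cost(P, mu))   # 2.0
```

Here the stationary distribution is $(1/2, 1/2)$, so the average cost per transition is $(1 + 3)/2 = 2$.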
This game is determined by the position sets $X_1, X_2$, the set of actions $A$, the probability function $p$, the cost function $c$ and the starting state $x_0$. We denote this game by $(X_1, X_2, A, c, p, x_0)$ and call it a zero-sum average stochastic positional game. In this game we seek saddle points.

2 The Main Results

We show that for the considered zero-sum game there exists a saddle point, i.e., there exist $s^{1*}, s^{2*}$ for which
$$F_x(s^{1*}, s^{2*}) = \max_{s^1 \in S^1} \min_{s^2 \in S^2} F_x(s^1, s^2) = \min_{s^2 \in S^2} \max_{s^1 \in S^1} F_x(s^1, s^2).$$

Theorem 2.1 Let $(X_1, X_2, A, c, p, x)$ be an antagonistic stochastic positional game with average payoff $F_x(s^1, s^2)$. Then the system of equations
$$\begin{cases} \varepsilon_x + \omega_x = \max\limits_{a \in A(x)} \Big\{ \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y \Big\}, & \forall x \in X_1; \\ \varepsilon_x + \omega_x = \min\limits_{a \in A(x)} \Big\{ \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y \Big\}, & \forall x \in X_2 \end{cases} \tag{1}$$
has a solution on the set of solutions of the system of equations
$$\begin{cases} \omega_x = \max\limits_{a \in A(x)} \sum\limits_{y \in X} p^a_{x,y}\, \omega_y, & \forall x \in X_1; \\ \omega_x = \min\limits_{a \in A(x)} \sum\limits_{y \in X} p^a_{x,y}\, \omega_y, & \forall x \in X_2, \end{cases} \tag{2}$$
i.e., the system of equations (2) has a solution $\omega^*_x$, $x \in X$, for which there exists a solution $\varepsilon^*_x$, $x \in X$, of the system of equations
$$\begin{cases} \varepsilon_x + \omega^*_x = \max\limits_{a \in A(x)} \Big\{ \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y \Big\}, & \forall x \in X_1; \\ \varepsilon_x + \omega^*_x = \min\limits_{a \in A(x)} \Big\{ \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y \Big\}, & \forall x \in X_2. \end{cases}$$
The optimal stationary strategies $s^{1*}, s^{2*}$ of the players can be found by fixing arbitrary maps $s^{1*}(x) \in A(x)$ for $x \in X_1$ and $s^{2*}(x) \in A(x)$ for $x \in X_2$ such that
$$s^{1*}(x) \in \operatorname{Arg\,max}_{a \in A(x)} \Big\{ \sum_{y \in X} p^a_{x,y}\, \omega^*_y \Big\} \cap \operatorname{Arg\,max}_{a \in A(x)} \Big\{ \mu_{x,a} + \sum_{y \in X} p^a_{x,y}\, \varepsilon^*_y \Big\}, \quad x \in X_1,$$
$$s^{2*}(x) \in \operatorname{Arg\,min}_{a \in A(x)} \Big\{ \sum_{y \in X} p^a_{x,y}\, \omega^*_y \Big\} \cap \operatorname{Arg\,min}_{a \in A(x)} \Big\{ \mu_{x,a} + \sum_{y \in X} p^a_{x,y}\, \varepsilon^*_y \Big\}, \quad x \in X_2.$$

Proof (Sketch) Let $x \in X$ be an arbitrary state and consider stationary strategies $\bar{s}^1 \in S^1$, $\bar{s}^2 \in S^2$ for which $F_x(\bar{s}^1, \bar{s}^2) = \min_{s^2 \in S^2} \max_{s^1 \in S^1} F_x(s^1, s^2)$.
We show that $F_x(\bar{s}^1, \bar{s}^2) = \max_{s^1 \in S^1} \min_{s^2 \in S^2} F_x(s^1, s^2)$, i.e., we show that $\bar{s}^1 = s^{1*}$, $\bar{s}^2 = s^{2*}$. If we consider the Markov decision process induced by the strategies $\bar{s}^1, \bar{s}^2$, then, according to the properties of the bias equations for this decision process, the system of linear equations
$$\begin{cases} \varepsilon_x + \omega_x = \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y, & \forall x \in X_1,\ a = \bar{s}^1(x); \\ \varepsilon_x + \omega_x = \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y, & \forall x \in X_2,\ a = \bar{s}^2(x); \\ \omega_x = \sum\limits_{y \in X} p^a_{x,y}\, \omega_y, & \forall x \in X_1,\ a = \bar{s}^1(x); \\ \omega_x = \sum\limits_{y \in X} p^a_{x,y}\, \omega_y, & \forall x \in X_2,\ a = \bar{s}^2(x) \end{cases} \tag{3}$$
has a solution $\varepsilon^*_x, \omega^*_x$ ($x \in X$) which, for the fixed strategy $\bar{s}^2 \in S^2$, satisfies the condition
$$\begin{cases} \varepsilon^*_x + \omega^*_x \ge \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon^*_y, & \forall x \in X_1,\ a \in A(x); \\ \varepsilon^*_x + \omega^*_x = \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon^*_y, & \forall x \in X_2,\ a = \bar{s}^2(x); \\ \omega^*_x \ge \sum\limits_{y \in X} p^a_{x,y}\, \omega^*_y, & \forall x \in X_1,\ a \in A(x); \\ \omega^*_x = \sum\limits_{y \in X} p^a_{x,y}\, \omega^*_y, & \forall x \in X_2,\ a = \bar{s}^2(x) \end{cases}$$
and $F_x(\bar{s}^1, \bar{s}^2) = \omega^*_x$, $\forall x \in X$. Taking into account that $F_x(\bar{s}^1, \bar{s}^2) = \min_{s^2 \in S^2} F_x(\bar{s}^1, s^2)$, then for the fixed strategy $\bar{s}^1 \in S^1$ the solution $\varepsilon^*_x, \omega^*_x$ ($x \in X$) satisfies the condition
$$\begin{cases} \varepsilon^*_x + \omega^*_x = \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon^*_y, & \forall x \in X_1,\ a = \bar{s}^1(x); \\ \varepsilon^*_x + \omega^*_x \le \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon^*_y, & \forall x \in X_2,\ a \in A(x); \\ \omega^*_x = \sum\limits_{y \in X} p^a_{x,y}\, \omega^*_y, & \forall x \in X_1,\ a = \bar{s}^1(x); \\ \omega^*_x \le \sum\limits_{y \in X} p^a_{x,y}\, \omega^*_y, & \forall x \in X_2,\ a \in A(x). \end{cases}$$
So, the following system
$$\begin{cases} \varepsilon_x + \omega_x \ge \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y, & \forall x \in X_1,\ a \in A(x); \\ \varepsilon_x + \omega_x \le \mu_{x,a} + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y, & \forall x \in X_2,\ a \in A(x); \\ \omega_x \ge \sum\limits_{y \in X} p^a_{x,y}\, \omega_y, & \forall x \in X_1,\ a \in A(x); \\ \omega_x \le \sum\limits_{y \in X} p^a_{x,y}\, \omega_y, & \forall x \in X_2,\ a \in A(x) \end{cases}$$
has a solution which satisfies condition (3). This means that $\bar{s}^1 = s^{1*}$, $\bar{s}^2 = s^{2*}$ and
$$F_x(s^{1*}, s^{2*}) = \max_{s^1 \in S^1} \min_{s^2 \in S^2} F_x(s^1, s^2) = \min_{s^2 \in S^2} \max_{s^1 \in S^1} F_x(s^1, s^2), \quad \forall x \in X,$$
i.e., the theorem holds. ✷
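The max/min structure in Theorem 2.1 also suggests a numerical route. One standard approach, shown here only as an illustrative sketch under simplifying assumptions and not as the authors' algorithm, is Shapley-type value iteration: iterate $v_{t+1}(x) = \max_{a \in A(x)} \{\mu_{x,a} + \sum_{y} p^a_{x,y} v_t(y)\}$ for $x \in X_1$, and with $\min$ in place of $\max$ for $x \in X_2$; then $v_t(x)/t$ approximates the game value $\omega_x$ when the limit exists. All data in the example are invented.

```python
import numpy as np

def value_iteration(X1, actions, p, mu, n, T=1000):
    """Shapley-type value iteration for the positional game:
    v_{t+1}(x) = max_a {mu_{x,a} + sum_y p^a_{x,y} v_t(y)}  for x in X1,
    and with min over a for the other states (player 2's positions).
    Returns v_T / T as an approximation of the game value omega_x."""
    v = np.zeros(n)
    for _ in range(T):
        w = np.empty(n)
        for x in range(n):
            vals = [mu[(x, a)] + p[(x, a)] @ v for a in actions[x]]
            w[x] = max(vals) if x in X1 else min(vals)
        v = w
    return v / T

# Deterministic two-state example: player 1 owns state 0, player 2 owns
# state 1; every action moves to the other state with probability 1.
actions = {0: ["u", "v"], 1: ["u", "v"]}
p = {(0, "u"): np.array([0.0, 1.0]), (0, "v"): np.array([0.0, 1.0]),
     (1, "u"): np.array([1.0, 0.0]), (1, "v"): np.array([1.0, 0.0])}
mu = {(0, "u"): 1.0, (0, "v"): 4.0, (1, "u"): 3.0, (1, "v"): 0.0}
omega = value_iteration({0}, actions, p, mu, n=2)
print(np.round(omega, 3))   # [2. 2.]
```

In this example player 1 picks the cost-4 action and player 2 the cost-0 action, so optimal play cycles between the two states with mean cost $(4 + 0)/2 = 2$, which the iteration recovers. Plain value iteration only approximates $\omega$; iterative schemes of the kind cited for average Markov decision problems in [3,4] refine this idea.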
The obtained saddle point condition for zero-sum stochastic positional games generalizes the saddle point condition for deterministic average positional games from [1,5]. Based on Theorem 2.1 we may conclude that the optimal strategies of the players in the considered game can be found by determining a solution of the systems of equations (1), (2). We have shown that a solution of these equations can be determined using iterative algorithms similar to the algorithms for determining the optimal solutions of an average Markov decision problem [3,4].

References

[1] Ehrenfeucht, A., Mycielski, J., Positional strategies for mean payoff games, International Journal of Game Theory 8 (1979), 109–113.

[2] Lozovanu, D., Pickl, S., "Optimization and Multiobjective Control of Time-Discrete Systems", Springer, 2009.

[3] Lozovanu, D., Pickl, S., "Optimization of Stochastic Discrete Systems and Control on Complex Networks", Springer, 2015.

[4] Puterman, M., "Markov Decision Processes: Discrete Stochastic Dynamic Programming", John Wiley, New Jersey, 2005.

[5] Zwick, U., Paterson, M., The complexity of mean payoff games on graphs, Theoretical Computer Science 158 (1996), 343–359.