Markov Game Analysis for Attack and Defense of Power Networks

Chris Y. T. Ma, David K. Y. Yau,
Xin Lou, and Nageswara S. V. Rao
Power Networks are Important Infrastructures
(And Vulnerable to Attacks)
• Growing reliance on electricity
• Aging infrastructure
• More connected digital sensing and control devices
(which also attract cyber attacks)
• Hard and expensive to protect
• Limited budget
• How to allocate the limited resources?
– Optimal deployment to maximize long-term payoff
Modeling the Interactions –
Game Theoretic Approaches
• Static game
– Each player has a set of actions available
– Outcome and payoff determined by the actions of all
players
– Players act simultaneously
Static Game
• Example: 2×2 payoff matrix over the joint actions
– Defend & Attack
– Defend & No attack
– No defend & Attack
– No defend & No attack
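The four joint outcomes above can be written as a payoff matrix and searched for a pure-strategy equilibrium. The sketch below is illustrative only: the payoff numbers and the names `PAYOFF` and `pure_saddle_points` are assumptions, since the slide's actual payoffs are not available here.

```python
# Hypothetical defender payoffs (negative = loss); NOT taken from the slides.
# Rows are defender actions, columns are adversary actions.
DEFENDER_ACTIONS = ["defend", "no_defend"]
ADVERSARY_ACTIONS = ["attack", "no_attack"]
PAYOFF = [
    [-2, -1],   # defend:    vs attack, vs no_attack
    [-10, 0],   # no_defend: vs attack, vs no_attack
]


def pure_saddle_points(payoff):
    """Return all (row, col) pure-strategy equilibria of a zero-sum game.

    The row player maximizes the payoff and the column player minimizes it,
    so an equilibrium entry is the minimum of its row and the maximum of
    its column.
    """
    points = []
    for i, row in enumerate(payoff):
        for j, value in enumerate(row):
            column = [r[j] for r in payoff]
            if value == min(row) and value == max(column):
                points.append((i, j))
    return points


if __name__ == "__main__":
    for i, j in pure_saddle_points(PAYOFF):
        print(DEFENDER_ACTIONS[i], ADVERSARY_ACTIONS[j], PAYOFF[i][j])
```

With these assumed numbers, neither player can improve by deviating alone from the reported cell. A game without such a cell would need mixed strategies.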
Modeling the Interactions –
Game Theoretic Approaches
• Leader-follower game (Stackelberg game)
– Defender as the leader
– Adversary as the follower
– Bi-level optimization – minimax operation
• Inner level: follower maximizes its payoff given a
leader’s strategy
• Outer level: leader maximizes its payoff subject to the
follower’s solution of the inner problem
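The bi-level structure can be sketched by brute force for pure strategies: the inner step computes the follower's best response, and the outer step picks the leader action with the best resulting payoff. All payoff values and function names below are hypothetical, and the sketch uses general-sum payoffs rather than the zero-sum minimax case.

```python
# Hypothetical payoffs, NOT from the slides:
# (leader_action, follower_action) -> (leader_payoff, follower_payoff)
PAYOFFS = {
    ("defend", "attack"): (-3, -1),
    ("defend", "no_attack"): (-1, 0),
    ("no_defend", "attack"): (-10, 5),
    ("no_defend", "no_attack"): (0, 0),
}
LEADER_ACTIONS = ["defend", "no_defend"]
FOLLOWER_ACTIONS = ["attack", "no_attack"]


def follower_best_response(leader_action):
    """Inner level: the follower maximizes its own payoff given the leader's action."""
    return max(FOLLOWER_ACTIONS, key=lambda f: PAYOFFS[(leader_action, f)][1])


def stackelberg_solution():
    """Outer level: the leader maximizes its payoff, anticipating the follower's reply."""
    best = None
    for leader_action in LEADER_ACTIONS:
        reply = follower_best_response(leader_action)
        leader_payoff = PAYOFFS[(leader_action, reply)][0]
        if best is None or leader_payoff > best[2]:
            best = (leader_action, reply, leader_payoff)
    return best


if __name__ == "__main__":
    print(stackelberg_solution())
```

Enumeration is only practical for small action sets. Committing to a mixed strategy would require an optimization solver instead.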
Stackelberg Game
• Example: game tree
– Defender (leader) moves first: Defend or No defend
– Adversary (follower) then responds: Attack or No attack
Only models one-time interactions
Modeling the Interactions –
Markov Decision Process
• Markov Decision Process (MDP)
– System modeled as set of states with Markov
transitions between them
– Transition depends on action of one player and
some passive disruptors of known probabilistic
behaviors (acts of nature)
Markov Decision Process (MDP)
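• Example (2 states, each with 2 actions available)
– State up: actions Defend and No defend
– State down: actions Recover and No recover
– Each action leads to up or down with a fixed probability
(edge labels in the diagram: 0.9, 0.1, 0.6, 0.4)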
Only models one intelligent player
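A two-state MDP of this kind can be solved with value iteration. The sketch below is a minimal illustration: the assignment of probabilities to actions, the rewards, the costs, and the discount factor are all assumptions, not values recovered from the slide.

```python
# ASSUMED model: P_UP[state][action] = probability that the next state is "up".
# The numbers echo the slide's edge labels, but their assignment is a guess.
P_UP = {
    "up": {"defend": 0.9, "no_defend": 0.6},
    "down": {"recover": 0.9, "no_recover": 0.1},
}
STATE_REWARD = {"up": 1.0, "down": 0.0}   # hypothetical reward for the next state
ACTION_COST = {"defend": 0.2, "no_defend": 0.0, "recover": 0.3, "no_recover": 0.0}
GAMMA = 0.9                               # hypothetical discount factor


def q_value(state, action, values):
    """Expected discounted return of taking `action` in `state`."""
    p = P_UP[state][action]
    to_up = STATE_REWARD["up"] + GAMMA * values["up"]
    to_down = STATE_REWARD["down"] + GAMMA * values["down"]
    return p * to_up + (1 - p) * to_down - ACTION_COST[action]


def value_iteration(tol=1e-9, max_iter=10_000):
    """Repeat the Bellman update until the values stop changing."""
    values = {s: 0.0 for s in P_UP}
    for _ in range(max_iter):
        new = {s: max(q_value(s, a, values) for a in P_UP[s]) for s in P_UP}
        delta = max(abs(new[s] - values[s]) for s in P_UP)
        values = new
        if delta < tol:
            break
    policy = {s: max(P_UP[s], key=lambda a: q_value(s, a, values)) for s in P_UP}
    return values, policy


if __name__ == "__main__":
    print(value_iteration())
```

Only the defender chooses actions here. The adversary appears, at most, as fixed probabilities, which is the limitation the slide points out.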
Our Approach – Markov Game
• Generalization of MDP to an adversarial
setting
– Models the continual interactions between
multiple players
• Players interact in the new state with different payoffs
– Models probabilistic state transition because of
inherent uncertainty in the underlying physical
system (e.g., random acts of nature)
Problem Formulation
• Defender and adversary of a power network
– Two-player zero-sum game
• Game formulation:
– Adversary
• Actions: which link to attack
• Payoff: cost of load shedding by the defender because of the
attack
– Defender
• Actions: which (up) link to reinforce or which (down) link to
recover
• Payoff: negative of the cost of load shedding because of the attack (the game is zero-sum)
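The formulation above suggests a simple data structure: a state is a tuple with one up/down flag per link, and each player's action set is derived from it. The following is a sketch under that assumption; the function names and the five-link default are illustrative choices, not the authors' implementation.

```python
from itertools import product

NUM_LINKS = 5  # assumption: five links, matching the example used later in the slides


def all_states(num_links=NUM_LINKS):
    """Every combination of up ('u') and down ('d') links."""
    return list(product("ud", repeat=num_links))


def adversary_actions(state):
    """The adversary chooses which link to attack (links are numbered from 1)."""
    return list(range(1, len(state) + 1))


def defender_actions(state):
    """The defender reinforces an up link or recovers a down link."""
    return [
        ("reinforce" if status == "u" else "recover", link)
        for link, status in enumerate(state, start=1)
    ]


if __name__ == "__main__":
    state = tuple("uuudu")
    print(len(all_states()))
    print(adversary_actions(state))
    print(defender_actions(state))
```

The number of states doubles with every added link, so exhaustive enumeration like this is only reasonable for small networks.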
Markov Game – Reward Overview
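• Assume five links; link 4 both attacked and defended
[Figure: transition tree. From (u,u,u,u,u): with probability p1 the state
stays (u,u,u,u,u), with probability 1-p1 it becomes (u,u,u,d,u). From
(u,u,u,d,u): with probability p2 it returns to (u,u,u,u,u), with
probability 1-p2 it stays (u,u,u,d,u).]
• Immediate reward of such actions is the weighted sum of
successful attack and successful defense
• Assume at state (u,u,u,d,u), link 4 is both attacked and defended
again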
• Immediate reward at state (u,u,u,d,u) is then the weighted sum
of successful recovery and failed recovery
• This immediate reward is further “propagated” back to the
original state (u,u,u,u,u) with a discount factor
• Hence, actions taken in a state will accrue a long-term reward
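The reward propagation described above can be written out as a short worked equation. The notation is an assumption introduced for illustration: s_0 = (u,u,u,u,u), s_1 = (u,u,u,d,u), γ is the discount factor, V is the long-term value of a state, and r_def, r_att, r_rec, r_fail are the immediate rewards of a successful defense, a successful attack, a successful recovery, and a failed recovery. It also assumes that p1 is the probability of staying in s_0 and p2 the probability of returning to s_0.

```latex
\begin{align*}
r(s_0) &= p_1\, r_{\text{def}} + (1 - p_1)\, r_{\text{att}} \\
r(s_1) &= p_2\, r_{\text{rec}} + (1 - p_2)\, r_{\text{fail}} \\
Q(s_0) &= r(s_0) + \gamma \bigl[\, p_1 V(s_0) + (1 - p_1) V(s_1) \,\bigr] \\
Q(s_1) &= r(s_1) + \gamma \bigl[\, p_2 V(s_0) + (1 - p_2) V(s_1) \,\bigr]
\end{align*}
```

The term γ V(s_1) inside Q(s_0) is what carries the reward at (u,u,u,d,u) back to the original state.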
Solving the Markov Game – Definitions
Finding the Optimal Strategy –
Solving a Linear Program
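For reference, the standard linear program for the stage game of a zero-sum Markov game is sketched below. It is the textbook maximin formulation and may differ in notation from the one on the original slide. The symbols are assumptions: π is the maximizing player's mixed strategy over its actions A, O is the opponent's action set, and Q(s, a, o) is the long-term payoff of the joint action (a, o) in state s.

```latex
\begin{align*}
\max_{\pi,\, v} \quad & v \\
\text{s.t.} \quad & \sum_{a \in A} \pi(a)\, Q(s, a, o) \ge v \quad \forall\, o \in O, \\
& \sum_{a \in A} \pi(a) = 1, \qquad \pi(a) \ge 0 \quad \forall\, a \in A .
\end{align*}
```

The optimal v is the value of state s, and the optimal π is the strategy that guarantees it against any opponent action.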
Solving the Markov Game –
Value Iteration
• Dynamic program (value iteration) to solve the
Markov game
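Value iteration for a zero-sum Markov game repeatedly solves one matrix game per state and feeds the resulting values back in. The sketch below uses a hypothetical two-state toy game, not the five-link network, and replaces the linear program with the closed-form solution of a 2×2 zero-sum game so that it runs without a solver. All rewards and probabilities are assumptions.

```python
GAMMA = 0.3  # discount factor; the value quoted for the experiments

# Hypothetical toy game. Rows: defender actions, columns: adversary actions
# (attack, no_attack). Rewards are the defender's (negative cost).
REWARD = {
    "up": [[-2.0, -1.0], [-5.0, 0.0]],     # rows: defend, no_defend
    "down": [[-4.0, -4.0], [-3.0, -3.0]],  # rows: recover, no_recover
}
P_UP = {  # probability that the next state is "up"
    "up": [[0.9, 1.0], [0.4, 1.0]],
    "down": [[0.5, 0.9], [0.0, 0.0]],
}


def solve_2x2_zero_sum(m):
    """Value of a 2x2 zero-sum game for the row player (the maximizer)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # a pure saddle point exists
        return maximin
    # No saddle point: both players mix (standard closed form).
    return (a * d - b * c) / (a + d - b - c)


def stage_matrix(state, values):
    """Long-term payoff of each joint action: reward plus discounted next value."""
    return [
        [
            REWARD[state][i][j]
            + GAMMA * (P_UP[state][i][j] * values["up"]
                       + (1 - P_UP[state][i][j]) * values["down"])
            for j in range(2)
        ]
        for i in range(2)
    ]


def markov_game_values(tol=1e-10, max_iter=1000):
    """Iterate the per-state matrix-game solution until the values converge."""
    values = {"up": 0.0, "down": 0.0}
    for _ in range(max_iter):
        new = {s: solve_2x2_zero_sum(stage_matrix(s, values)) for s in values}
        delta = max(abs(new[s] - values[s]) for s in values)
        values = new
        if delta < tol:
            break
    return values


if __name__ == "__main__":
    print(markov_game_values())
```

For games with more than two actions per player, `solve_2x2_zero_sum` would be replaced by a linear program solved once per state and per iteration.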
Experiment Results
Link diagram
State {u,u,u,u,u}
Links 4 and 5 both connect to generators, and the generator at bus 4 has the higher output
Experiment Results
Payoff Matrix of state {u,u,u,u,u} for the static game.
Payoff Matrix of state {u,u,u,u,u} for the Markov game.
(γ = 0.3)
Conclusions
• A Markov game models the attack and
defense of a power network between two
players
• Results show that the actions of the players depend
not only on the current state, but also on later states
– To obtain the optimal long-term benefit