A Symposium on Complex Data Analysis 2017

Learning in Games
Chi-Jen Lu
Academia Sinica
Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
TWO-PLAYER
ZERO-SUM GAMES
Zero-sum games

[Figure: the two players' payoff matrices; rows are player 1's actions, columns are player 2's actions. E.g., the rock-paper-scissors pattern.]

utility (reward) of player 1:
     0  -1   1
     1   0  -1
    -1   1   0

utility (reward) of player 2 (the negation, since the game is zero-sum):
     0   1  -1
    -1   0   1
     1  -1   0
Zero-sum games
- K_i: action set of player i
- U: utility matrix of player 1; U(a, b) is player 1's reward when the players play a and b
- Player 1 maximizes U; player 2 minimizes U
- Neither player wants to move first:
      max_{a∈K1} min_{b∈K2} U(a, b)  ≤  min_{b∈K2} max_{a∈K1} U(a, b),
  with strict inequality (<) in many games
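As a concrete check, a short Python sketch (using the standard rock-paper-scissors payoff matrix as an example) computes both sides of the inequality for pure strategies:

```python
# Max-min vs. min-max for pure strategies in a zero-sum game.
# U[a][b] is player 1's reward; player 1 maximizes, player 2 minimizes.
U = [
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
]

# Player 1 moves first: for each of her actions, player 2 best-responds (min).
max_min = max(min(row) for row in U)

# Player 2 moves first: for each of his actions, player 1 best-responds (max).
min_max = min(max(U[a][b] for a in range(3)) for b in range(3))

print(max_min, min_max)  # strict gap: whoever moves first is worse off
```

Here the gap is strict (max-min is -1, min-max is 1): with pure strategies, moving first is a real disadvantage.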
Zero-sum games
- Minimax Theorem: over mixed strategies, i.e., distributions A over K1 and B over K2,
      max_A min_B U(A, B) = min_B max_A U(A, B),
  where U(A, B) = E_{a~A, b~B}[U(a, b)]
- How to find such A, B efficiently?
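Continuing the rock-paper-scissors example: the theorem can be verified directly there, since the uniform mixed strategies close the pure-strategy gap (a minimal sketch):

```python
# With mixed strategies, the rock-paper-scissors game has value 0:
# uniform play guarantees each player at least/at most 0 in expectation.
U = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
A = B = [1/3, 1/3, 1/3]  # uniform mixed strategies

# What player 1 guarantees by playing A: the worst column payoff against A.
guarantee_1 = min(sum(A[a] * U[a][b] for a in range(3)) for b in range(3))

# What player 2 concedes by playing B: the best row payoff against B.
concede_2 = max(sum(U[a][b] * B[b] for b in range(3)) for a in range(3))

print(guarantee_1, concede_2)  # both 0: max-min = min-max over distributions
```

So (A, B) = (uniform, uniform) is a minimax pair here; the interesting question, as the slide asks, is how to find such a pair efficiently in general.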
ONLINE LEARNING
Online learning / decision
- Making decisions/predictions repeatedly and then paying the prices ("I wish I had...")

Many examples
- Predicting weather, trading stocks, commuting to work, ...
- Network routing
- Scheduling
- Resource allocation
- Online advertising
- ...
Problem formulation
- Play for T rounds, from action set K
- In round t,
  - play an action x^(t) ∈ K, or a distribution x^(t) ∈ Δ(K) over K
  - receive a reward r^(t)(x^(t)); for a distribution, the expected reward E_{a~x^(t)}[r^(t)(a)]
- How to choose x^(t)? Goal?
Goal: minimize regret
- Regret:
      max_{x*} sum_{t=1..T} r^(t)(x*)  -  sum_{t=1..T} r^(t)(x^(t))
  i.e., the total reward of the best fixed strategy ("I wish I had...") minus the total reward of the online algorithm
No-regret algorithms
- "No regret": T-step regret ≈ √T, so average regret per step ≈ 1/√T → 0
- Finite action space K: time, space per step ≈ |K|
- Convex K ⊆ R^d and concave r^(t)'s: time, space per step ≈ d
  (gradient-descent type algorithms)
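For a finite action set, one classic no-regret algorithm of this kind is multiplicative weights (Hedge). A minimal sketch; the function name and the step-size choice are illustrative, not from the talk:

```python
import math

def hedge(reward_rounds, n_actions, eta):
    """Multiplicative-weights (Hedge) update over a finite action set.

    reward_rounds: one reward vector per round, entries in [0, 1].
    Returns (played distributions, expected reward collected per round).
    """
    w = [1.0] * n_actions
    dists, gains = [], []
    for r in reward_rounds:
        total = sum(w)
        p = [wi / total for wi in w]           # play the normalized weights
        dists.append(p)
        gains.append(sum(pi * ri for pi, ri in zip(p, r)))
        # exponentially boost the weight of actions that did well this round
        w = [wi * math.exp(eta * ri) for wi, ri in zip(w, r)]
    return dists, gains
```

With eta ≈ √(ln|K| / T), the regret is O(√(T ln|K|)), and each step costs time and space ≈ |K|, matching the bullet above.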
Applications in other areas
- algorithms: approximation algorithms
- complexity: hardcore set for derandomization
- optimization: LP duality
- biology: evolution
- game theory: minimax theorem
Zero-sum games
- Minimax Theorem (one-shot game): over distributions A and B,
      max_A min_B U(A, B) = min_B max_A U(A, B),   (find A, B achieving this up to ε)
  where U(A, B) = E_{a~A, b~B}[U(a, b)]
How to find such A, B efficiently?
- Play no-regret algorithms against each other:
  - Get x^(1), ..., x^(T) and y^(1), ..., y^(T)
  - A = (1/T) sum_{t=1..T} x^(t),  B = (1/T) sum_{t=1..T} y^(t),  with T ≈ 1/ε²
  - Time, space ≈ #(actions)   (huge for many games?)
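This recipe can be sketched end to end on the rock-paper-scissors matrix: both players run a multiplicative-weights (Hedge) update against each other, and the time-averaged strategies form a near-minimax pair. The step size and the deliberately asymmetric starting weights below are illustrative choices:

```python
import math

# U[a][b]: player 1's reward; player 1 maximizes, player 2 minimizes.
U = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
n, T, eta = 3, 5000, 0.02

wx = [1.0, 2.0, 3.0]   # asymmetric starts, away from the equilibrium
wy = [3.0, 1.0, 2.0]
ax = [0.0] * n          # running averages of the played distributions
ay = [0.0] * n
for _ in range(T):
    x = [w / sum(wx) for w in wx]
    y = [w / sum(wy) for w in wy]
    for i in range(n):
        ax[i] += x[i] / T
        ay[i] += y[i] / T
    # each player's reward vector against the other's current mixed strategy
    rx = [sum(U[a][b] * y[b] for b in range(n)) for a in range(n)]   # maximizer
    ry = [-sum(x[a] * U[a][b] for a in range(n)) for b in range(n)]  # minimizer
    wx = [w * math.exp(eta * r) for w, r in zip(wx, rx)]
    wy = [w * math.exp(eta * r) for w, r in zip(wy, ry)]

# exploitability (duality gap) of the averaged pair A = ax, B = ay
gap = (max(sum(U[a][b] * ay[b] for b in range(n)) for a in range(n))
       - min(sum(ax[a] * U[a][b] for a in range(n)) for b in range(n)))
```

The no-regret guarantee bounds the gap of the averaged pair by (Regret1 + Regret2)/T ≈ 1/√T, which drops below ε once T ≈ 1/ε², as on the slide.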
INFLUENCE
MAXIMIZATION
GAMES
Opinion formation in social net
- A population of n individuals, each with some internal opinion from [-1, 1]
- Each tries to express an opinion close to neighbors' opinions and her internal one
Opinion formation in social net
- Zero-sum game between a "darker" player/party and a "lighter" player/party:
  - goal: make the n shades of grey darker vs. lighter
  - actions: control the opinions of k individuals, so C(n, k) actions per player
- Find minimax strategy?
Opinion formation in social net
- The same zero-sum game, with C(n, k) actions per player
- Find minimax strategy?
- Solution: no-regret algorithm for online combinatorial optimization
  (follow the perturbed leader)
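Follow the perturbed leader makes the C(n, k)-sized action set manageable: maximizing a linear reward over size-k subsets is just a top-k selection, so the subsets never need to be enumerated. A minimal sketch; the function name and the exponential noise scale are illustrative assumptions:

```python
import random

def ftpl_topk(reward_rounds, n, k, noise_scale, rng):
    """Follow the perturbed leader for online top-k selection.

    reward_rounds: per-round reward vectors over the n individuals.
    Each round, pick the k individuals maximizing cumulative reward
    plus fresh exponential noise: a linear optimization over the
    C(n, k) subsets, done by sorting instead of enumeration.
    """
    cum = [0.0] * n
    picks = []
    for r in reward_rounds:
        noisy = [cum[i] + rng.expovariate(1.0 / noise_scale) for i in range(n)]
        chosen = sorted(range(n), key=lambda i: noisy[i], reverse=True)[:k]
        picks.append(set(chosen))
        for i in range(n):
            cum[i] += r[i]       # the leader's scores get updated afterward
    return picks
```

Each step costs O(n log n) time and O(n) space, even though the nominal action set has C(n, k) elements.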
MARKOV
GAMES
Games with states
- States: e.g., board configurations
- Policy: states → actions (randomized)
- Minimax theorem:
      max_A min_B U(A, B) = min_B max_A U(A, B),
  where A and B now range over policies
Games with states
- Minimax theorem over policies, as above
- How to find such A, B efficiently?
- #(policies) ≈ #(actions)^#(states): huge?
Games with states
- Solution: no-regret algorithm for two-player Markov decision processes
- Time, space ≈ poly(#(states), #(actions)): still huge for many games
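For intuition about how state values arise in such games, here is a sketch of minimax value iteration in the turn-based special case, where only one player moves in each state, so the per-state matrix game collapses to a plain max or min. The tiny game in the test is an invented example, not from the talk:

```python
def minimax_value_iteration(n_states, n_actions, owner, reward, nxt, gamma, iters):
    """Value iteration for a turn-based zero-sum Markov game.

    owner[s] = +1 if the maximizer moves in state s, -1 for the minimizer.
    reward(s, a): immediate reward to the maximizer; nxt(s, a): next state.
    Each sweep backs up every state with a max (or min) over actions.
    """
    V = [0.0] * n_states
    for _ in range(iters):
        V = [
            (max if owner[s] > 0 else min)(
                reward(s, a) + gamma * V[nxt(s, a)] for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V
```

The cost per sweep is #(states) x #(actions), i.e., polynomial in the state and action counts, which is exactly why the slide's poly(#(states), #(actions)) bound can still be too large when the state space itself is huge (e.g., board configurations).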
Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
ALGORITHMS
VS.
ADVERSARIES
No-regret algorithm
- ∃ algorithm A s.t. ∀ sequence of loss functions c = (c^(1), ..., c^(T)):
      Regret(A, c) ≤ O(√T)
- In minimax form:
      min_A max_c Regret(A, c) ≤ O(√T)   and   ≥ Ω(√T)
  (the lower bound comes from finding an adversarial c; for a benign class of c, smaller regret such as log T is possible)
More generally...
- For any algorithm design problem and any cost measure (regret, time, space, ...)
- ∃ algorithm A s.t. ∀ input x: cost(A, x) ≤ ...
- In minimax form: min_A max_x cost(A, x) ≤ ... and ≥ ...
GENERATIVE
ADVERSARIAL
NETWORKS
Learning generative models
fake images!
Learning generative models
- Training data: real face images
- Learn generative model G: random seeds → (novel / fake) face images
- How to train a good G?
- If we can evaluate how bad/good G is...
- Discriminator D(x) ≈ 1 if x is fake, −1 if x is real
- How to get a good D?
Play the zero-sum game
- D tries to distinguish fake images from real ones by behaving differently on them
- G tries to fool D
- Objective:
      min_G max_D  E_z[D(G(z))] − E_{x~real}[D(x)]
Learning generative model G
⇒ Finding a minimax solution to the game
- Still not an easy task! G, D: deep neural nets (huge action sets)
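A toy illustration of why this is hard: even on the bilinear game min_g max_d g·d, a one-dimensional stand-in for the G-vs-D objective (not an actual GAN), naive simultaneous gradient steps spiral away from the equilibrium at (0, 0):

```python
import math

g, d = 1.0, 1.0          # start away from the equilibrium (0, 0)
eta = 0.1
start_norm = math.hypot(g, d)
for _ in range(100):
    # simultaneous gradient descent for g, ascent for d, on f(g, d) = g * d
    g, d = g - eta * d, d + eta * g
end_norm = math.hypot(g, d)
# each step multiplies g**2 + d**2 by exactly (1 + eta**2): the iterates diverge
```

Stabilizing such minimax dynamics (iterate averaging, as in the no-regret recipe above, or careful step sizes) is part of what makes GAN training delicate.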