A Symposium on Complex Data Analysis 2017

Learning in Games
Chi-Jen Lu
Academia Sinica
Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
TWO-PLAYER
ZERO-SUM GAMES
Zero-sum games

[Figure: the two players' payoff matrices; rows are player 1's actions, columns are player 2's actions. E.g., the rock-paper-scissors pattern.]

utility (reward) of player 1:
     0  -1   1
     1   0  -1
    -1   1   0

utility (reward) of player 2 (the negation, since the game is zero-sum):
     0   1  -1
    -1   0   1
     1  -1   0
Zero-sum games
- K_i: action set of player i
- U: utility matrix of player 1; U(a, b) is player 1's reward when the players play a and b
- Player 1 maximizes U; player 2 minimizes U
- Neither player wants to move first:
      max_{a∈K1} min_{b∈K2} U(a, b)  ≤  min_{b∈K2} max_{a∈K1} U(a, b),
  with strict inequality (<) in many games
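As a concrete check, a short Python sketch (using the standard rock-paper-scissors payoff matrix as an example) computes both sides of the inequality for pure strategies:

```python
# Max-min vs. min-max for pure strategies in a zero-sum game.
# U[a][b] is player 1's reward; player 1 maximizes, player 2 minimizes.
U = [
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
]

# Player 1 moves first: for each of her actions, player 2 best-responds (min).
max_min = max(min(row) for row in U)

# Player 2 moves first: for each of his actions, player 1 best-responds (max).
min_max = min(max(U[a][b] for a in range(3)) for b in range(3))

print(max_min, min_max)  # strict gap: whoever moves first is worse off
```

Here the gap is strict (max-min is -1, min-max is 1): with pure strategies, moving first is a real disadvantage.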
Zero-sum games
- Minimax Theorem: over mixed strategies, i.e., distributions A over K1 and B over K2,
      max_A min_B U(A, B) = min_B max_A U(A, B),
  where U(A, B) = E_{a~A, b~B}[U(a, b)]
- How to find such A, B efficiently?
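Continuing the rock-paper-scissors example: the theorem can be verified directly there, since the uniform mixed strategies close the pure-strategy gap (a minimal sketch):

```python
# With mixed strategies, the rock-paper-scissors game has value 0:
# uniform play guarantees each player at least/at most 0 in expectation.
U = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
A = B = [1/3, 1/3, 1/3]  # uniform mixed strategies

# What player 1 guarantees by playing A: the worst column payoff against A.
guarantee_1 = min(sum(A[a] * U[a][b] for a in range(3)) for b in range(3))

# What player 2 concedes by playing B: the best row payoff against B.
concede_2 = max(sum(U[a][b] * B[b] for b in range(3)) for a in range(3))

print(guarantee_1, concede_2)  # both 0: max-min = min-max over distributions
```

So (A, B) = (uniform, uniform) is a minimax pair here; the interesting question, as the slide asks, is how to find such a pair efficiently in general.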
ONLINE LEARNING
Online learning / decision
- Making decisions/predictions repeatedly and then paying the prices ("I wish I had...")

Many examples
- Predicting weather, trading stocks, commuting to work, ...
- Network routing
- Scheduling
- Resource allocation
- Online advertising
- ...
Problem formulation
- Play for T rounds, from action set K
- In round t,
  - play an action x^(t) ∈ K, or a distribution x^(t) ∈ Δ(K) over K
  - receive a reward r^(t)(x^(t)); for a distribution, the expected reward E_{a~x^(t)}[r^(t)(a)]
- How to choose x^(t)? Goal?
Goal: minimize regret
- Regret:
      max_{x*} sum_{t=1..T} r^(t)(x*)  -  sum_{t=1..T} r^(t)(x^(t))
  i.e., the total reward of the best fixed strategy ("I wish I had...") minus the total reward of the online algorithm
No-regret algorithms
- "No regret": T-step regret ≈ √T, so average regret per step ≈ 1/√T → 0
- Finite action space K: time, space per step ≈ |K|
- Convex K ⊆ R^d and concave r^(t)'s: time, space per step ≈ d
  (gradient-descent type algorithms)
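For a finite action set, one classic no-regret algorithm of this kind is multiplicative weights (Hedge). A minimal sketch; the function name and the step-size choice are illustrative, not from the talk:

```python
import math

def hedge(reward_rounds, n_actions, eta):
    """Multiplicative-weights (Hedge) update over a finite action set.

    reward_rounds: one reward vector per round, entries in [0, 1].
    Returns (played distributions, expected reward collected per round).
    """
    w = [1.0] * n_actions
    dists, gains = [], []
    for r in reward_rounds:
        total = sum(w)
        p = [wi / total for wi in w]           # play the normalized weights
        dists.append(p)
        gains.append(sum(pi * ri for pi, ri in zip(p, r)))
        # exponentially boost the weight of actions that did well this round
        w = [wi * math.exp(eta * ri) for wi, ri in zip(w, r)]
    return dists, gains
```

With eta ≈ √(ln|K| / T), the regret is O(√(T ln|K|)), and each step costs time and space ≈ |K|, matching the bullet above.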
Applications in other areas
- algorithms: approximation algorithms
- complexity: hardcore set for derandomization
- optimization: LP duality
- biology: evolution
- game theory: minimax theorem
Zero-sum games
- Minimax Theorem (one-shot game): over distributions A and B,
      max_A min_B U(A, B) = min_B max_A U(A, B),   (find A, B achieving this up to ε)
  where U(A, B) = E_{a~A, b~B}[U(a, b)]
How to find such A, B efficiently?
- Play no-regret algorithms against each other:
  - Get x^(1), ..., x^(T) and y^(1), ..., y^(T)
  - A = (1/T) sum_{t=1..T} x^(t),  B = (1/T) sum_{t=1..T} y^(t),  with T ≈ 1/ε²
  - Time, space ≈ #(actions)   (huge for many games?)
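This recipe can be sketched end to end on the rock-paper-scissors matrix: both players run a multiplicative-weights (Hedge) update against each other, and the time-averaged strategies form a near-minimax pair. The step size and the deliberately asymmetric starting weights below are illustrative choices:

```python
import math

# U[a][b]: player 1's reward; player 1 maximizes, player 2 minimizes.
U = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
n, T, eta = 3, 5000, 0.02

wx = [1.0, 2.0, 3.0]   # asymmetric starts, away from the equilibrium
wy = [3.0, 1.0, 2.0]
ax = [0.0] * n          # running averages of the played distributions
ay = [0.0] * n
for _ in range(T):
    x = [w / sum(wx) for w in wx]
    y = [w / sum(wy) for w in wy]
    for i in range(n):
        ax[i] += x[i] / T
        ay[i] += y[i] / T
    # each player's reward vector against the other's current mixed strategy
    rx = [sum(U[a][b] * y[b] for b in range(n)) for a in range(n)]   # maximizer
    ry = [-sum(x[a] * U[a][b] for a in range(n)) for b in range(n)]  # minimizer
    wx = [w * math.exp(eta * r) for w, r in zip(wx, rx)]
    wy = [w * math.exp(eta * r) for w, r in zip(wy, ry)]

# exploitability (duality gap) of the averaged pair A = ax, B = ay
gap = (max(sum(U[a][b] * ay[b] for b in range(n)) for a in range(n))
       - min(sum(ax[a] * U[a][b] for a in range(n)) for b in range(n)))
```

The no-regret guarantee bounds the gap of the averaged pair by (Regret1 + Regret2)/T ≈ 1/√T, which drops below ε once T ≈ 1/ε², as on the slide.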
INFLUENCE
MAXIMIZATION
GAMES
Opinion formation in social net
- A population of n individuals, each with some internal opinion from [-1, 1]
- Each tries to express an opinion close to neighbors' opinions and her internal one
Opinion formation in social net
- Zero-sum game between a "darker" player/party and a "lighter" player/party:
  - goal: make the n shades of grey darker vs. lighter
  - actions: control the opinions of k individuals, so C(n, k) actions per player
- Find minimax strategy?
Opinion formation in social net
- The same zero-sum game, with C(n, k) actions per player
- Find minimax strategy?
- Solution: no-regret algorithm for online combinatorial optimization
  (follow the perturbed leader)
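Follow the perturbed leader makes the C(n, k)-sized action set manageable: maximizing a linear reward over size-k subsets is just a top-k selection, so the subsets never need to be enumerated. A minimal sketch; the function name and the exponential noise scale are illustrative assumptions:

```python
import random

def ftpl_topk(reward_rounds, n, k, noise_scale, rng):
    """Follow the perturbed leader for online top-k selection.

    reward_rounds: per-round reward vectors over the n individuals.
    Each round, pick the k individuals maximizing cumulative reward
    plus fresh exponential noise: a linear optimization over the
    C(n, k) subsets, done by sorting instead of enumeration.
    """
    cum = [0.0] * n
    picks = []
    for r in reward_rounds:
        noisy = [cum[i] + rng.expovariate(1.0 / noise_scale) for i in range(n)]
        chosen = sorted(range(n), key=lambda i: noisy[i], reverse=True)[:k]
        picks.append(set(chosen))
        for i in range(n):
            cum[i] += r[i]       # the leader's scores get updated afterward
    return picks
```

Each step costs O(n log n) time and O(n) space, even though the nominal action set has C(n, k) elements.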
MARKOV
GAMES
Games with states
- States: e.g., board configurations
- Policy: states → actions (randomized)
- Minimax theorem:
      max_A min_B U(A, B) = min_B max_A U(A, B),
  where A and B now range over policies
Games with states
- Minimax theorem over policies, as above
- How to find such A, B efficiently?
- #(policies) ≈ #(actions)^#(states): huge?
Games with states
- Solution: no-regret algorithm for two-player Markov decision processes
- Time, space ≈ poly(#(states), #(actions)): still huge for many games
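For intuition about how state values arise in such games, here is a sketch of minimax value iteration in the turn-based special case, where only one player moves in each state, so the per-state matrix game collapses to a plain max or min. The tiny game in the test is an invented example, not from the talk:

```python
def minimax_value_iteration(n_states, n_actions, owner, reward, nxt, gamma, iters):
    """Value iteration for a turn-based zero-sum Markov game.

    owner[s] = +1 if the maximizer moves in state s, -1 for the minimizer.
    reward(s, a): immediate reward to the maximizer; nxt(s, a): next state.
    Each sweep backs up every state with a max (or min) over actions.
    """
    V = [0.0] * n_states
    for _ in range(iters):
        V = [
            (max if owner[s] > 0 else min)(
                reward(s, a) + gamma * V[nxt(s, a)] for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V
```

The cost per sweep is #(states) x #(actions), i.e., polynomial in the state and action counts, which is exactly why the slide's poly(#(states), #(actions)) bound can still be too large when the state space itself is huge (e.g., board configurations).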
Outline
- What machine learning can do for game theory
- What game theory can do for machine learning
ALGORITHMS
VS.
ADVERSARIES
No-regret algorithm
- ∃ algorithm A s.t. ∀ sequence of loss functions c = (c^(1), ..., c^(T)):
      Regret(A, c) ≤ O(√T)
- In minimax form:
      min_A max_c Regret(A, c) ≤ O(√T)   and   ≥ Ω(√T)
  (the lower bound comes from finding an adversarial c; for a benign class of c, smaller regret such as log T is possible)
More generally...
- For any algorithm design problem and any cost measure (regret, time, space, ...)
- ∃ algorithm A s.t. ∀ input x: cost(A, x) ≤ ...
- In minimax form: min_A max_x cost(A, x) ≤ ... and ≥ ...
GENERATIVE
ADVERSARIAL
NETWORKS
Learning generative models
fake images!
Learning generative models
- Training data: real face images
- Learn generative model G: random seeds → (novel / fake) face images
- How to train a good G?
- If we can evaluate how bad/good G is...
- Discriminator D(x) ≈ 1 if x is fake, −1 if x is real
- How to get a good D?
Play the zero-sum game
- D tries to distinguish fake images from real ones by behaving differently on them
- G tries to fool D
- Objective:
      min_G max_D  E_z[D(G(z))] − E_{x~real}[D(x)]
Learning generative model G
⇒ Finding a minimax solution to the game
- Still not an easy task! G, D: deep neural nets (huge action sets)
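A toy illustration of why this is hard: even on the bilinear game min_g max_d g·d, a one-dimensional stand-in for the G-vs-D objective (not an actual GAN), naive simultaneous gradient steps spiral away from the equilibrium at (0, 0):

```python
import math

g, d = 1.0, 1.0          # start away from the equilibrium (0, 0)
eta = 0.1
start_norm = math.hypot(g, d)
for _ in range(100):
    # simultaneous gradient descent for g, ascent for d, on f(g, d) = g * d
    g, d = g - eta * d, d + eta * g
end_norm = math.hypot(g, d)
# each step multiplies g**2 + d**2 by exactly (1 + eta**2): the iterates diverge
```

Stabilizing such minimax dynamics (iterate averaging, as in the no-regret recipe above, or careful step sizes) is part of what makes GAN training delicate.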