Learning in Games
Chi-Jen Lu, Academia Sinica

Outline
- What machine learning can do for game theory
- What game theory can do for machine learning

TWO-PLAYER ZERO-SUM GAMES

Zero-sum games
- Example (rock-paper-scissors); rows are player 1's actions, columns player 2's:

    utility (reward) of player 1      utility (reward) of player 2
         0  -1   1                         0   1  -1
         1   0  -1                        -1   0   1
        -1   1   0                         1  -1   0

Zero-sum games
- K_i: action set of player i.
- U: utility matrix of player 1, with entries U(a, b).
- Player 1 maximizes; player 2 minimizes.
- Neither player wants to play first:
    max_{a∈K1} min_{b∈K2} U(a, b) ≤ min_{b∈K2} max_{a∈K1} U(a, b),
  with strict inequality (<) in many games.

Zero-sum games
- Minimax Theorem: over distributions A, B on the two action sets,
    max_A min_B U(A, B) = min_B max_A U(A, B),
  where U(A, B) = E_{a~A, b~B}[U(a, b)].
- How to find such A, B efficiently?

ONLINE LEARNING

Online learning / decision
- Making decisions/predictions repeatedly and then paying the prices ("I wish I had…").

Many examples
- Predicting weather, trading stocks, commuting to work, …
- Network routing
- Scheduling
- Resource allocation
- Online advertising
- …

Problem formulation
- Play for T rounds, with action set K.
- In round t:
  - play an action x^(t) ∈ K, or a distribution x^(t) ∈ Δ(K);
  - receive the reward r^(t)(x^(t)), or the expected reward E_{a~x^(t)}[r^(t)(a)].
- How to choose x^(t)? With what goal?
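The claim above that max min ≤ min max, with strict inequality in many games, can be checked directly on the rock-paper-scissors matrix. A quick sketch (nothing here beyond the example's own numbers):

```python
import numpy as np

# Utility matrix of player 1 for rock-paper-scissors
# (rows: player 1's action, columns: player 2's action).
U = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

# With pure strategies, whoever commits first does worse:
max_min = U.min(axis=1).max()   # max_a min_b U(a, b)
min_max = U.max(axis=0).min()   # min_b max_a U(a, b)
print(max_min, min_max)         # -1 1: a strict gap, as claimed
```

The gap closes only once the players may randomize, which is exactly what the Minimax Theorem says.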
Goal: minimize regret
- Regret = max_{x*} Σ_{t=1..T} r^(t)(x*) − Σ_{t=1..T} r^(t)(x^(t)):
  the total reward of the best fixed strategy ("I wish I had…") minus the
  total reward of the online algorithm.

No-regret algorithms
- "No regret": T-step regret ≈ √T, so the average regret per step ≈ 1/√T → 0.
- Finite action space K: time and space per step ≈ |K|.
- Convex K ⊆ R^d and concave r^(t)'s: time and space per step ≈ d,
  via gradient-descent-type algorithms.

Applications in other areas
- algorithms: approximation algorithms
- complexity: hardcore sets for derandomization
- optimization: LP duality
- biology: evolution
- game theory: minimax theorem

Zero-sum games
- Minimax Theorem for the one-shot game:
    max_A min_B U(A, B) = min_B max_A U(A, B)
  over distributions A, B, where U(A, B) = E_{a~A, b~B}[U(a, b)].
- How to find such A, B efficiently? Have the players run no-regret
  algorithms against each other:
  - collect the plays x^(1), …, x^(T) and y^(1), …, y^(T);
  - output A = (1/T) Σ_{t=1..T} x^(t) and B = (1/T) Σ_{t=1..T} y^(t);
  - T ≈ 1/ε² rounds suffice for an ε-approximate minimax pair.
- Time and space ≈ #(actions): huge?

INFLUENCE MAXIMIZATION GAMES

Opinion formation in social networks
- A population of n individuals, each with some internal opinion from [-1, 1].
- Each tries to express an opinion close to her neighbors' expressed opinions
  and to her internal one.
- Zero-sum game between two players/parties:
  - goal: make the n shades of grey darker vs. lighter;
  - actions: each player controls the opinions of k of the n individuals.
- Find the minimax strategy?
- Solution: a no-regret algorithm for online combinatorial optimization
  (follow the perturbed leader).
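The self-play scheme for zero-sum games above can be sketched concretely. This uses multiplicative weights (Hedge) as the no-regret algorithm for both players; the function name, step size, round count, and biased starting weights are illustrative choices, not from the slides:

```python
import numpy as np

# Player 1's utility matrix for rock-paper-scissors; its minimax value is 0.
U = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def solve_zero_sum(U, T=4000, eta=0.02):
    """Both players run multiplicative weights (Hedge) against each other.
    The time-averaged strategies A, B form an approximate minimax pair,
    with error on the order of the average regret, roughly 1/sqrt(T)."""
    n, m = U.shape
    w1 = np.linspace(1.0, float(n), n)  # biased start, so play is nontrivial
    w2 = np.ones(m)
    A, B = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x = w1 / w1.sum()               # player 1's mixed strategy this round
        y = w2 / w2.sum()               # player 2's mixed strategy this round
        A += x
        B += y
        w1 *= np.exp(eta * (U @ y))     # maximizer upweights high-reward rows
        w2 *= np.exp(-eta * (U.T @ x))  # minimizer downweights costly columns
    return A / T, B / T

A, B = solve_zero_sum(U)
value = A @ U @ B   # close to the game's minimax value 0
```

Each step touches every action once, which is where the "time, space ≈ #(actions)" cost on the slide comes from.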
MARKOV GAMES

Games with states
- Example: board configurations in board games.
- A policy maps states to (randomized) actions.
- Minimax theorem, now over policies:
    max_A min_B U(A, B) = min_B max_A U(A, B),
  with A, B ranging over (mixtures of) policies.
- How to find such A, B efficiently?
  #(policies) ≈ #(actions)^#(states): huge?
- Solution: a no-regret algorithm for the two-player Markov decision process.
  - Time and space ≈ poly(#(states), #(actions)): still huge for many games.

Outline
- What machine learning can do for game theory
- What game theory can do for machine learning

ALGORITHMS VS. ADVERSARIES

No-regret algorithms
- ∃ algorithm A such that ∀ sequence of loss functions c = (c^(1), …, c^(T)):
    Regret(A, c) ≤ O(√T).
- Equivalently: min_A max_c Regret(A, c) ≤ O(√T).
- Can we do better, say log T? In the worst case, no: there is a matching
  lower bound min_A max_c Regret(A, c) ≥ Ω(√T), shown by finding an
  adversarial c. Over a benign class of c, smaller regret is possible.

More generally…
- For any algorithm-design problem and any cost measure
  (regret, time, space, …):
  ∃ algorithm A such that ∀ input x: cost(A, x) ≤ …
- That is, an upper bound min_A max_x cost(A, x) ≤ ⋯,
  ideally matched by a lower bound ≥ ⋯.

GENERATIVE ADVERSARIAL NETWORKS

Learning generative models
- Training data: real face images.
- Learn a generative model G: random seeds → novel ("fake") face images.
- How to train a good G? If only we could evaluate how good or bad G is…
- Discriminator D(x) → 1 if x is fake, −1 if x is real.
  How to get a good D?

Play the zero-sum game
- D tries to distinguish fake images from real ones by behaving differently
  on them; G tries to fool D:
    min_G max_D  E_z[D(G(z))] − E_{x~real}[D(x)]

Learning the generative model G
- Amounts to finding the minimax solution to this game.
- Still not an easy task! G and D are deep neural nets: huge "action sets".
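As a hint of why this is hard, consider a toy one-parameter version of the GAN game (an illustrative model, not from the slides): the generator outputs θ, the real data has mean μ, and the discriminator is linear, D(x) = w·x, so the objective reduces to the bilinear game min_θ max_w w·(θ − μ) with minimax solution θ = μ, w = 0. Alternating gradient updates cycle around this solution rather than settling at it, but, as with no-regret play, the averaged iterates do approach it:

```python
# Toy 1-D "GAN" (illustrative): generator outputs theta, real data has
# mean mu, discriminator is D(x) = w * x.
# Expected objective: min over theta, max over w, of w * (theta - mu).
mu = 2.0
theta, w = 0.0, 1.0
lr, T = 0.05, 2000
theta_sum = 0.0
for _ in range(T):
    w += lr * (theta - mu)   # discriminator: gradient ascent step
    theta -= lr * w          # generator: gradient descent step (uses new w)
    theta_sum += theta
theta_avg = theta_sum / T    # averaged iterate, close to mu = 2.0
```

The last iterate keeps orbiting the equilibrium; only the time average converges. With deep networks as G and D the objective is no longer bilinear (or even concave-convex), which is one reason GAN training is delicate in practice.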