Emergence of Conventions through social learning

LIP6 multiagent team seminar
Stéphane Airiau
LAMSADE
joint work with Sandip Sen and Daniel Villatoro
Convention
“A convention as an equilibrium that everyone expects in interactions
that have more than one equilibrium”
Young (1996). The Economics of Convention. Journal of Economic Perspectives
examples:
picking the side of the road
picking a symbol / word for an object
picking the notation for retweeting
Kooti, Yang, Cha, Gummadi, Mason (2012). The Emergence of Conventions in Online Social Networks. AAAI Conference on Weblogs and Social Media
A norm is more than a convention: there is a deontic aspect
(obligation/sanctions)
Convention and Norms in MAS
Some examples of research directions:
designing tools to build communities of agents that use norms
(e.g. reasoning about norms, implementation of sanctions)
modeling or understanding the establishment of human norms or
conventions (e.g. language)
Some (overlapping) communities interested in norms
DEON: International Conference on Deontic Logic and
Normative Systems (12 editions)
NorMAS: Normative Multiagent Systems, i.e. systems where
individual and collective behavior is affected by norms
Related work on emergence
agents can observe other agents’ interactions
H.P. Young Econometrica 1993 (Markov Chain)
Epstein Computational Economics 2001 (majority rule),
Axelrod American Political Science Review 1986 (evolutionary)
Hao et al. AAMAS 2013, AAAI 2014
agents communicate their model
Verhagen Social Science Computer Review 2001
use of social networks
Delgado Artificial Intelligence 2002
Yu et al AAMAS 2013, Hao et al. AAMAS 2013, AAAI 2014
human agents
Centola and Baronchelli PNAS 2015
Kooti et al. in AAAI conference on Weblogs and Social Media 2012
private information
Shoham and Tennenholtz Artificial Intelligence 1997 (local learning algorithm: the HCR,
highest cumulative reward, rule)
Our work
When multiple conventions are possible, how does a society of
artificial agents collectively adopt a norm?
voting would be a possibility, but someone needs to organise the
election
the convention may instead emerge from interactions between agents
Contribution:
previous work assumes that agents can observe interactions between
other agents
→ will a convention emerge even if all interactions are private?
learning in repeated interactions is usually studied with agents that
keep interacting with the same agent(s)
→ does learning converge when the interacting agents keep on changing?
Social Learning Framework
We consider only conventions between pairs of agents
(example of the road, shaking hands, etc)
$N$ is the set of $n$ agents
$A_r$: set of actions of the row role
$A_c$: set of actions of the column role
$G_i$: payoff matrix of agent $i \in N$:
we do not assume interpersonal comparison of utility
but we assume the same ordering over all joint actions
In most cases: $A_r = A_c$ and $\forall i, j \; G_i = G_j$
interconnection topology: a topology may restrict the
interactions between agents.
Social Learning Framework
Interaction protocol:
One iteration:
  initialisation: all agents are available
  repeat until no pair of available agents remains:
    randomly pick a pair (i, j) of available agents
    randomly assign the roles row and column to i and j
    row selects an action in $A_r$, column selects an action in $A_c$
    row and column receive the corresponding payoffs
    row and column update their learning algorithms
    update the set of available agents
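The protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: there is no topology restriction, the `RandomAgent` class and the `act`/`update` method names are hypothetical placeholders for a real learner, and the 4 / -1 coordination payoffs are made up for the example.

```python
import random

def run_iteration(agents, payoffs, rng):
    """One iteration: pair up available agents at random; each pair plays once.

    agents  : list of objects with act(role) and update(role, action, reward)
    payoffs : dict mapping (row_action, col_action) -> (row_reward, col_reward)
    Returns the list of joint actions played during the iteration.
    """
    available = list(range(len(agents)))
    rng.shuffle(available)              # random order gives random pairs
    played = []
    while len(available) >= 2:          # stop when no pair is left
        i, j = available.pop(), available.pop()
        if rng.random() < 0.5:          # random role assignment
            i, j = j, i
        row, col = agents[i], agents[j]
        a_row, a_col = row.act("row"), col.act("col")
        r_row, r_col = payoffs[(a_row, a_col)]
        row.update("row", a_row, r_row)
        col.update("col", a_col, r_col)
        played.append((a_row, a_col))
    return played


class RandomAgent:
    """Hypothetical placeholder learner: picks uniformly and learns nothing."""
    def __init__(self, actions, rng):
        self.actions, self.rng = actions, rng

    def act(self, role):
        return self.rng.choice(self.actions)

    def update(self, role, action, reward):
        pass


rng = random.Random(0)
# Illustrative coordination payoffs (assumed values, not taken from the talk).
coordination = {("L", "L"): (4, 4), ("L", "R"): (-1, -1),
                ("R", "L"): (-1, -1), ("R", "R"): (4, 4)}
population = [RandomAgent(["L", "R"], rng) for _ in range(10)]
joint_actions = run_iteration(population, coordination, rng)
print(len(joint_actions))  # 5 pairs for 10 agents
```

With an odd number of agents, one agent simply sits out the iteration. A topology could be added by restricting which pairs may be drawn.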
Stability
Definition (convention)
A convention is a pure Nash equilibrium of the game.
in practice, conventions are pure strategies
if the game has a unique pure Nash equilibrium, there is nothing to
choose between, so we are not really studying a convention.
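To make the definition concrete, here is a minimal sketch that enumerates the pure Nash equilibria of a two-player game given as two payoff matrices. The function name is hypothetical, and the payoffs are one reading of the intersection example used in this talk (index 0 = Go, index 1 = Yield).

```python
from itertools import product

def pure_nash_equilibria(row_payoff, col_payoff):
    """Return all (row_action, col_action) index pairs where neither player
    can gain by deviating unilaterally."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player compares against the other rows in column c.
        row_best = all(row_payoff[r][c] >= row_payoff[k][c] for k in range(n_rows))
        # Column player compares against the other columns in row r.
        col_best = all(col_payoff[r][c] >= col_payoff[r][k] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

# Assumed payoffs for the intersection game: 0 = Go, 1 = Yield.
row_payoff = [[-1, 3], [2, 1]]
col_payoff = [[-1, 2], [3, 1]]
candidates = pure_nash_equilibria(row_payoff, col_payoff)
print(candidates)  # [(0, 1), (1, 0)]
```

Two pure equilibria are found, (Go, Yield) and (Yield, Go), so the population does face a choice between candidate conventions.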
Examples
Each cell lists the row player's payoff, then the column player's payoff.
Intersection game
            Go        Yield
Go        -1, -1      3, 2
Yield      2, 3       1, 1
Picking a side of the road
            L          R
L          4, 2     -1, -10
R        -1, -10      4, 2
Learning algorithms
Fictitious Play
Q-learning with ε-greedy exploration
Watkins and Dayan (1992) Machine Learning
Win or Learn Fast, Policy Hill Climbing (WoLF-PHC)
Bowling and Veloso (2002) Artificial Intelligence
First algorithm: Fictitious Play
The learner believes its opponent is playing a fixed mixed strategy
given by the empirical distribution of the opponent's previous actions.
→ the learner plays a best response to this mixed strategy.
initialize the frequencies p of the actions played by the opponent
repeat
  play a best response to p
  observe the action played by the opponent and update the frequencies
Theorem
If the empirical distribution of each player's strategies converges in fictitious play, then it converges to a Nash equilibrium.
the empirical play converges to a NE, but the players may never actually play a NE and may not
receive a NE expected payoff (e.g. in an anti-coordination game)
convergence is not always guaranteed (e.g. Shapley's game, a non-zero-sum variant of rock-paper-scissors)
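The fictitious-play loop can be sketched as follows for two players in a symmetric game. This is a minimal illustration, not the implementation used in the experiments: the function names, the initial counts, the lowest-index tie-breaking rule, and the 4 / -1 coordination payoffs are all assumptions made for the example.

```python
def best_response(payoff, opponent_counts):
    """Action index maximising expected payoff against the empirical
    distribution of the opponent's past actions (ties: lowest index)."""
    total = sum(opponent_counts)
    expected = [
        sum(payoff[a][b] * opponent_counts[b] for b in range(len(opponent_counts))) / total
        for a in range(len(payoff))
    ]
    return max(range(len(expected)), key=expected.__getitem__)

def fictitious_play(payoff, rounds, counts_1, counts_2):
    """payoff[mine][theirs] is shared by both players (symmetric game).
    counts_k holds how often player k has seen each opponent action."""
    c1, c2 = list(counts_1), list(counts_2)
    history = []
    for _ in range(rounds):
        a1 = best_response(payoff, c1)
        a2 = best_response(payoff, c2)
        c1[a2] += 1                 # player 1 observes player 2's action
        c2[a1] += 1                 # player 2 observes player 1's action
        history.append((a1, a2))
    return history

# Assumed coordination game: 0 = L, 1 = R.
payoff = [[4, -1], [-1, 4]]
# Player 1 initially believes the opponent favours L, player 2 believes R.
history = fictitious_play(payoff, rounds=20, counts_1=[2, 1], counts_2=[1, 2])
print(history[0], history[-1])  # (0, 1) (0, 0)
```

The two players miscoordinate in the first round because their initial beliefs disagree, then settle on (L, L) once the observed frequencies line up.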
Q-learning with ε-greedy
We apply the Q-learning algorithm with only one state!
The update rule of Q-learning is $Q(a) \leftarrow Q(a) + \alpha (r - Q(a))$
For exploration, we use the ε-greedy method:
$$a_t = \begin{cases} \arg\max_{a \in A(s)} Q(a) & \text{with probability } 1 - \epsilon, \\ \text{a sample from } U(A(s)) & \text{with probability } \epsilon, \end{cases}$$
where $U(S)$ is the uniform probability distribution over a set of
alternatives $S$.
ε may decrease during learning.
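A single-state Q-learner with ε-greedy exploration fits in a small class. This is a sketch under assumptions: the class name and the default values of α and ε are hypothetical, and an agent that learns a separate policy per role would keep one such table for the row role and one for the column role.

```python
import random

class StatelessQLearner:
    """Q-learning with a single state: one Q-value per action."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, rng=None):
        self.actions = list(actions)
        self.alpha, self.epsilon = alpha, epsilon
        self.rng = rng or random.Random()
        self.q = {a: 0.0 for a in self.actions}

    def act(self):
        # Explore uniformly with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=self.q.get)

    def update(self, action, reward):
        # Q(a) <- Q(a) + alpha * (r - Q(a))
        self.q[action] += self.alpha * (reward - self.q[action])


learner = StatelessQLearner(["L", "R"], alpha=0.5, epsilon=0.0)
learner.update("R", 4)   # Q(R) = 0 + 0.5 * (4 - 0) = 2.0
learner.update("R", 4)   # Q(R) = 2 + 0.5 * (4 - 2) = 3.0
print(learner.q["R"], learner.act())  # 3.0 R
```

Setting `epsilon=0.0` in the usage lines makes the choice deterministic for the demonstration. In a simulation ε would be positive and could be decayed over time.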
Detecting convergence
Even if all agents use the same strategy,
if they use some exploration
we will observe some deviation.
For simplicity, we use a threshold:
if a strategy profile is played by 95% of the population,
we declare convergence.
note: some authors used a different threshold (90%).
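The threshold test can be written as a short helper. The sketch below assumes that "played by 95% of the population" is measured as the share of pairs that played the same joint action during one iteration. That interpretation and the function name are assumptions.

```python
from collections import Counter

def detect_convention(joint_actions, threshold=0.95):
    """Return the joint action played by at least `threshold` of the pairs,
    or None if no joint action is that common."""
    if not joint_actions:
        return None
    profile, count = Counter(joint_actions).most_common(1)[0]
    return profile if count / len(joint_actions) >= threshold else None

# 96 of 100 pairs coordinate on (L, L); 4 pairs deviate through exploration.
plays = [("L", "L")] * 96 + [("L", "R")] * 4
print(detect_convention(plays))                  # ('L', 'L')
print(detect_convention(plays, threshold=0.97))  # None
```

The threshold has to leave room for the deviations that exploration keeps producing. A stricter value may never be reached while ε stays positive.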
Simulation results
Intersection game
$(G, Y_L)$ emerges 506 times and $(Y_R, G)$ 494 times.
$(G, Y_L)$ emerges 534 times and $(Y_R, G)$ 466 times.
Dynamics
Dynamics of each agent's probability of playing R
each agent is represented by two lines: its policy as a row player and
its policy as a column player
the lighter the cell, the more likely the agent is to play L; the darker
the cell, the more likely it is to play R.
The convention of choosing action L emerges through social learning.
Influence of population size
[Figure: average payoff of a learner (y-axis) versus number of iterations (x-axis, 0 to 1400), one curve per population size: 2, 10, 20, 50, 100, 150, 200, 300, 400 and 500 agents.]
Dynamics of the average payoff of learners using WoLF with
different population sizes (average over 100 runs).
Varying the size of the game (number of actions)
[Figure: coordination game. Average payoff (y-axis) versus iterations (x-axis, 0 to 3000) for 200 WoLF learners, one curve per game size: 2x2, 3x3 and 4x4.]
Dynamics of the payoff of learners using WoLF with different game
sizes (average over 100 runs).
Using different learning algorithms
[Figure: average payoff (y-axis) versus iterations (x-axis, ticks at 100, 1000, 10000 and 100000) in a population of 200 agents, one curve per learning algorithm or mix: FP, QL, WoLF, FP+QL, FP+WoLF, QL+WoLF and QL+WoLF+FP.]
Dynamics of the payoff of learners using different learning
algorithms (population of 200 agents, average over 100 runs).
Influence of fixed agents
[Figure: effect of fixed agents. Fraction of the time the population converged to (0,0) and to (1,1) (y-axis, 0 to 1) versus the number of additional agents playing fixed strategy 1 (x-axis, 0 to 5).]
Number of times each convention emerges (average over 100 runs)
a small imbalance in the number of agents using a pure strategy is
enough to influence an entire population.
In a network
Convergence Times for Different Neighborhood Sizes and Different
Learning Algorithms
Emergence of different sub-conventions
[Figure: two groups evolving a norm with different degrees of isolation. Fraction of the time the system converges to (0,0) and to (1,1) (y-axis, 0 to 1) versus the probability of interaction between agents of different groups (x-axis: 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01).]
Two groups of 100 agents each evolve conventions with different
interaction frequencies (average over 1,000 runs).
When the probability of interaction is low, the groups can evolve
different conventions.
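One possible way to model the two-group setting is to draw each partner from the other group with a fixed probability. The sketch below is an assumption about how such a setup could be coded, not a description of the actual experiment. The function name, the parameter `p_cross`, and the group sizes in the usage lines are hypothetical.

```python
import random

def pick_partner(agent, groups, p_cross, rng):
    """Pick a partner from the other group with probability p_cross,
    otherwise a different agent from the agent's own group."""
    own = next(g for g in groups if agent in g)
    other = next(g for g in groups if agent not in g)
    if rng.random() < p_cross:
        return rng.choice(other)
    return rng.choice([a for a in own if a != agent])

rng = random.Random(1)
groups = [list(range(0, 5)), list(range(5, 10))]   # two small groups of agent ids
isolated = [pick_partner(0, groups, 0.0, rng) for _ in range(50)]
mixed = [pick_partner(0, groups, 1.0, rng) for _ in range(50)]
print(all(p in groups[0] for p in isolated), all(p in groups[1] for p in mixed))  # True True
```

With `p_cross` close to 0 each group mostly learns from itself, which is the regime where the two groups can settle on different conventions.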
Emergence of different sub-conventions
subconventions may emerge in scale-free networks
k-n connected star networks
Conclusion
private interactions are enough for a convention to emerge
agents may use generic learning mechanisms
subconventions may emerge and remain stable
Stéphane Airiau, Sandip Sen, Daniel Villatoro. Emergence of Conventions through Social Learning. JAAMAS, 2014