There`s no such thing as a single agent system.

Multiagent Systems
LECTURE 6:
MULTIAGENT
INTERACTIONS
There's no such thing as a
single agent system.
An Introduction to MultiAgent Systems
http://www.csc.liv.ac.uk/~mjw/pubs/imas
6-2
What are Multiagent Systems?
MultiAgent Systems
Thus a multiagent system contains a
number of agents…
6-3

…which interact through communication…

…are able to act in an environment…

…have different “spheres of influence” (which
may coincide)…

…will be linked by other (organizational)
relationships
6-4
1
Utilities and Preferences





What is Utility?
Assume we have just two agents: Ag = {i, j}
Agents are assumed to be self-interested (ιδιοτελείς):
they have preferences over how the environment is
Assume {12…} is the set of “outcomes” that
agents have preferences over
We capture preferences by utility functions:
uj =   
ui =   


Utility is not money (but it is a useful analogy)
Typical relationship between utility & money:
Utility functions lead to preference orderings over
outcomes:
 ≥i ’ means ui() ≥ ui(’)
6-5
Multiagent Encounters (συμπλοκή)




Multiagent Encounters
We need a model of the environment in which these
agents will act…

6-6
agents simultaneously choose an action to perform, and as a
result of the actions they select, an outcome in  will result
the actual outcome depends on the combination of actions
assume each agent has just two possible actions that it can
perform, C (“cooperate”) and D (“defect”)

Here is a state transformer function:

(This environment is sensitive to actions of both
agents.)
Here is another:

(Neither agent has any influence in this
environment.)
And here is another:
Environment behavior given by state transformer
function:
(This environment is controlled by j.)
6-7
6-8
2
Rational Action
Payoff Matrices

Suppose we have the case where both agents can
influence the outcome, and they have utility functions
as follows:

With a bit of abuse of notation:

Then agent i’s preferences are:

“C” is the rational choice for i.
(Because i prefers all outcomes that arise through C
over all outcomes that arise through D.)

We can characterize the previous scenario in
a payoff matrix (πίνακας απολαβών):

Agent i is the column player
Agent j is the row player

6-9
Dominant Strategies (κυρίαρχες στρατηγικές)





6-10
Nash Equilibrium
Given any particular strategy (either C or D) of agent
i, there will be a number of possible outcomes
We say s1 dominates s2 if every outcome possible by i
playing s1 is preferred over every outcome possible
by i playing s2
A rational agent will never play a dominated strategy
(κυριαρχούμενη στρατηγική)
So in deciding what to do, we can delete dominated
strategies
Unfortunately, there isn’t always a unique
undominated strategy
6-11
In general, we will say that two strategies s1 and s2
are in Nash equilibrium if:

1.
2.
under the assumption that agent i plays s1, agent j can do
no better than play s2; and
under the assumption that agent j plays s2, agent i can do
no better than play s1.
Neither agent has any incentive to deviate from a
Nash equilibrium
Unfortunately:


1.
2.
Not every interaction scenario has a Nash equilibrium
Some interaction scenarios have more than one Nash
equilibrium
6-12
3
Competitive and Zero-Sum Interactions




The Prisoner’s Dilemma
Where preferences of agents are
diametrically opposed we have strictly
competitive scenarios
Zero-sum encounters are those where utilities
sum to zero:
ui() + uj() = 0 for all   
Zero sum implies strictly competitive
Zero sum encounters in real life are very rare
… but people tend to act in many scenarios
as if they were zero sum


Two men are collectively charged with a
crime and held in separate cells, with no way
of meeting or communicating.
They are told that:



if one confesses and the other does not, the
confessor will be freed, and the other will be jailed
for three years
if both confess, then each will be jailed for two
years
Both prisoners know that if neither confesses,
then they will each be jailed for one year
6-13
The Prisoner’s Dilemma

6-14
Dominant Strategy?
Payoff matrix for
prisoner’s dilemma:

Agent i:


Confess ≡ Defect, Not confess ≡ cooperate





Top left: If both defect, then both get punishment
for mutual defection
Top right: If i cooperates and j defects, i gets
sucker’s payoff of 1, while j gets 4
Bottom left: If j cooperates and i defects, j gets
sucker’s payoff of 1, while i gets 4
Bottom right: Reward for mutual cooperation


Agent j:



6-15
ui(D,D)=2, ui(D,C)=4, ui(C,D)=1, ui(C,C)=3
D,C ≥i C,C ≥i D,D ≥i C,D
No dominant strategy
uj(D,D)=2, uj(D,C)=1, uj(C,D)=4, uj(C,C)=3
C,D ≥j C,C ≥j D,D ≥j D,C
No dominant strategy
6-16
4
The Prisoner’s Dilemma



Nash Equilibrium?
The individual rational action is defect
This guarantees a payoff of no worse than 2,
whereas cooperating guarantees a payoff of at
most 1
So defection is the best response to all
possible strategies: both agents defect, and
get payoff = 2
But intuition says this is not the best outcome:
Surely they should both cooperate and each
get payoff of 3!

(D,D)




(C,D)




If i plays D, then for j the best is D (2 vs. 1)
If j plays D, then for i the best is D (2 vs. 1)
(D,D) is in Nash equilibrium
If i plays C, then for j the best is D (4 vs. 3)
If j plays D, then for i the best is D (2 vs. 1)
(C,D) is not in Nash equilibrium
etc.
6-17
Arguments for Recovering Cooperation
The Prisoner’s Dilemma


This apparent paradox is the fundamental
problem of multi-agent interactions.
It appears to imply that cooperation will not
occur in societies of self-interested agents.
Real world examples:





6-18

Conclusions that some have drawn from this
analysis:


nuclear arms reduction (“why don’t I keep mine. . . ”)
free rider systems — public transport;
in the UK — television licenses.
The prisoner’s dilemma is ubiquitous.
Can we recover cooperation?

Arguments to recover cooperation:



6-19
the game theory notion of rational action is wrong!
somehow the dilemma is being formulated
wrongly
We are not all Machiavelli!
The other prisoner is my twin!
The shadow of the future…
6-20
5
The Iterated Prisoner’s Dilemma



Backwards Induction
One answer: play the game more than
once
If you know you will be meeting your
opponent again, then the incentive to
defect appears to evaporate
Cooperation is the rational choice in the
infinititely repeated prisoner’s dilemma
(Hurrah!)


But…suppose you both know that you will
play the game exactly n times
On round n, you have an incentive to defect,
to gain that extra bit of payoff…
But this makes round n – 1 the last “real”, and
so you have an incentive to defect there, too.
This is the backwards induction problem.
Playing the prisoner’s dilemma with a fixed,
finite, pre-determined, commonly known
number of rounds, defection is the best
strategy
6-21
Axelrod’s Tournament
6-22
Strategies in Axelrod’s Tournament



Suppose you play iterated prisoner’s dilemma
against a range of opponents…
What strategy should you choose, so as to
maximize your overall payoff?
Axelrod (1984) investigated this problem, with
a computer tournament for programs playing
the prisoner’s dilemma
ALLD:


RANDOM


Ignore what the opponent did on the previous
round.
TIT-FOR-TAT:
1.
2.

6-23
“Always defect” — the hawk strategy;
On round u = 0, cooperate
On round u > 0, do what your opponent did on
round u – 1
Simplest strategy: Only 5 lines of Fortran code!
6-24
6
Recipes for Success in Axelrod’s Tournament
Strategies in Axelrod’s Tournament
TESTER:




Intends to exploit weak opponents that do not
punish defection
On 1st round, defect. If the opponent retaliated,
then play TIT-FOR-TAT. Otherwise intersperse
cooperation and defection (2 rounds
cooperation, then defect).
Axelrod suggests the following rules for
succeeding in his tournament:



JOSS:



Intends to exploit weak opponents
As TIT-FOR-TAT, except periodically (10%)
defect

Don’t be envious: Don’t play as if it were zero
sum!
Be nice: Start by cooperating, and reciprocate
cooperation
Retaliate appropriately: Always punish defection
immediately, but use “measured” force — don’t
overdo it
Don’t hold grudges: Always reciprocate
cooperation immediately
6-25
Recipes for Success in Axelrod’s Tournament

6-26
Axelrod’s Tournament
Do not be too clever:



The most complex strategies attempted to
develop a model of the behaviour of the opponent
while ignoring the fact that the opponent was
watching! (reciprocal learning, αμοιβαία μάθηση)

The overall best strategy was TITFOR-TAT

When confronted with ALLD, it lost!
Most complex strategies over-generalized when
seeing their opponent defect (not forgiving)

When confronted with less aggressive
strategies that favour co-operation it did
well!
Many strategies had a very complex behaviour to
be understood (seemed rather random to their
opponents)
6-27
6-28
7
Stag Hunt
Game of Chicken

Consider another type of encounter — the game of
chicken:




(Think of James Dean in Rebel without a Cause:
swerving = coop, driving straight = defect.)
Difference to prisoner’s dilemma:
Mutual defection is most feared outcome.
(Whereas sucker’s payoff is most feared in prisoner’s
dilemma.)
Strategies (c,d) and (d,c) are in Nash equilibrium



i defects i cooperates
You and your friend decide
to show up last day at
school with funny haircuts
The best would be both to
have the haircut
The worst would be that only
you show up with a haircut
and your friend sells you up
It would be fun to watch your
friend with the haircut, but you
do not have it.
It would not be fun if both do
not have the haircut
j defects
1
1
0
2
j cooperates
0
2
3
3

2 Nash
equilibria

Mutiny
scenarios
6-30
6-29
Other Symmetric 2 x 2 Games






Some Other Interaction Scenarios
Given the 4 possible outcomes of (symmetric)
cooperate/defect games, there are 4!=24 possible
orderings on outcomes
CC >i CD >i DC >i DD
Cooperation dominates
DC >i DD >i CC >i CD
Deadlock. You will always do best by defecting
DC >i CC >i DD >i CD
Prisoner’s dilemma
DC >i CC >i CD >i DD
Chicken
CC >i DC >i DD >i CD
Stag hunt
i
defects
j
defects
3
3
j cooperates
1




6-31
i cooperates
4
2
1
2
4
i
defects
j
defects
i cooperates
-1
-1
j cooperates
2
2
1
1
-1
-1
What the agents should do?
Classify preferences.
Are there strongly (or weakly) dominated
strategies?
Are there any Nash equilibria?
6-32
8
Dependence Relations





Dependence Relations
A dependence relation exists between 2 agents if
one of the agents requires the other in order to
achieve one of its goals
Independence: There is no dependency.
Unilateral: One agent depends on the other, but not
vice versa.
Mutual (αμοιβαίος): Both agents depend on each
other for the same goal.
Reciprocal (ανταποδοτικός): The first agent
depends on the second for some goal. The second
also depends on the first for some goal. The 2 goals
are not necessarily the same.
6-33



Locally believed relationships: One agent
believes the dependence exists, but does not
believe that the other agent believes it exists.
Mutually believed relationships: One agent
believes the dependence exists, and also
believes that the other agent is aware of it.
DepNet (Sichman et al., 1994): A social
reasoning system that given a description of
a MAS, computes the relationships between
agents
6-34
9