Practicum Kennistechnologie

Design of Multi-Agent Systems
Teacher
Bart Verheij
Student assistants
Albert Hankel
Elske van der Vaart
Web site
http://www.ai.rug.nl/~verheij/teaching/dmas/
(Nestor contains a link)
Student presentations
Week 37
* C. Jonker et al. (2002). BDI-Modelling of
Intracellular Dynamics.
Joris Ijsselmuiden
* R. Wulfhorst et al. (2003). A Multiagent Approach
for Musical Interactive Systems.
Rosemarijn Looije
* M. Dastani, J. Hulstijn, F. Dignum, J.-J.Ch. Meyer
(2004). Issues in Multiagent System Development.
Sander van Dijk
Student presentations
Week 38
* W. C. Stirling, M. A. Goodrich and D. J.
Packard (2002). Satisficing Equilibria: A Non-Classical Theory of Games and Decisions.
Dimitri Vrehen
* A. Bazzan and R.H. Bordini (2001). A
framework for the simulation of agents with
emotions. Report on Experiments with the
Iterated Prisoner's Dilemma.
Stijn Colen
* I. Dickinson and M. Wooldridge (2003).
Towards Practical Reasoning Agents for the
Semantic Web.
* E. Norling (2004). Folk Psychology for
Human Modelling: Extending the BDI Paradigm.
Some practical matters
* Please submit exercises to [email protected].
* Please use the naming conventions for file names and message subjects.
* Please read your student mail.
Overview
Introduction
Evaluation criteria & equilibria
Social welfare
Pareto efficiency
Nash equilibria
The Prisoner’s Dilemma
Loose end: dominant strategies
(Not in the book, or treated differently there.)
Typical structure of a multi-agent system
Interactions:
* Communication
* Influence on environment ('spheres of influence')
* Organizations, communities, coalitions
* Hierarchical relations
* Cooperation, competition
Utilities & preferences
How do we measure the results of a multi-agent
system? In terms of preferences and utilities.
Some notation:
Ω = {ω1, ω2, …}   the 'outcomes': possible future environmental states
ω ≽ ω′ ≽ ω″ ≽ …   group preferences (assumes cooperation)
ω ≽_i ω′ ≽_i ω″ ≽_i …   individual preferences of agent i
Preferences
Strict preferences
ω ≻ ω′ if and only if ω ≽ ω′ and not ω′ ≽ ω
Properties
Reflexive:   for all ω ∈ Ω: ω ≽_i ω
Transitive:  if ω ≽_i ω′ and ω′ ≽_i ω″, then ω ≽_i ω″
Comparable:  for all ω, ω′ ∈ Ω: ω ≽_i ω′ or ω′ ≽_i ω
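To make the three properties concrete, here is a minimal Python sketch (not part of the course material): it represents a preference relation over three hypothetical outcomes as a set of pairs and checks reflexivity, transitivity and comparability.

```python
from itertools import product

# Illustrative sketch: a preference relation over a small set of outcomes,
# represented as the set of pairs (w, w2) meaning "w is at least as good as w2",
# with checks for the three properties above.
outcomes = {"w1", "w2", "w3"}
prefers = {("w1", "w1"), ("w2", "w2"), ("w3", "w3"),
           ("w1", "w2"), ("w2", "w3"), ("w1", "w3")}   # w1 >= w2 >= w3

reflexive  = all((w, w) in prefers for w in outcomes)
transitive = all((a, c) in prefers
                 for a, b, c in product(outcomes, repeat=3)
                 if (a, b) in prefers and (b, c) in prefers)
comparable = all((a, b) in prefers or (b, a) in prefers
                 for a, b in product(outcomes, repeat=2))

print(reflexive, transitive, comparable)   # True True True
```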
Utilities
According to utility theory, preferences can be
measured in terms of real numbers
u : Ω → R
u_i(ω) ≥ u_i(ω′) if and only if ω ≽_i ω′
Example: money
But money isn’t always the right measure: think of
the subjective value of a million dollars when you
have nothing or when you are Bill Gates.
Utility & money
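A minimal sketch of the point above, assuming a logarithmic utility-of-money function purely for illustration (utility theory does not prescribe this particular function): the same extra million yields far less extra utility for a very rich agent.

```python
import math

# Illustrative sketch only: a concave (logarithmic) utility-of-money curve.
# The exact function is an assumption made for this example.
def utility(money: float) -> float:
    return math.log10(money)

gain = 1_000_000
for wealth in (1_000, 100_000_000_000):   # "almost nothing" vs "Bill Gates"
    extra = utility(wealth + gain) - utility(wealth)
    print(f"wealth {wealth:>15,}: extra utility of an extra million = {extra:.6f}")
# The same million adds far less utility for the very wealthy agent.
```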
Zero-sum & constant-sum games
Simplification: two agents
Constant sum games
The sum of all players' payoffs is the same for
any outcome.
u_i(ω) + u_j(ω) = C for all ω ∈ Ω
Zero-sum games
All outcomes involve a sum of the players’
payoffs of 0:
u_i(ω) + u_j(ω) = 0 for all ω ∈ Ω
Example: chess, with the standard scores 0, ½, 1 (a constant-sum game: the two
players' scores always add up to 1); shifting the scores to −½, 0, ½ turns it
into a zero-sum game.
Zero-sum & constant-sum games
One agent’s gain is another agent’s loss.
Zero-sum games are necessarily competitive.
But there are many non-zero sum situations.
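The definitions above can be tested mechanically. A small sketch, assuming a game is represented as a dictionary from outcomes to payoff pairs (a representation chosen here for illustration):

```python
# Sketch (assumed representation): a two-player game as a dict mapping each
# outcome to the payoff pair (u_i, u_j). The helpers just test the definitions above.
def is_constant_sum(game: dict) -> bool:
    sums = {ui + uj for (ui, uj) in game.values()}
    return len(sums) == 1

def is_zero_sum(game: dict) -> bool:
    return all(ui + uj == 0 for (ui, uj) in game.values())

# Chess-style outcomes, scored 1 / ½ / 0 for the two players:
chess = {"white wins": (1, 0), "draw": (0.5, 0.5), "black wins": (0, 1)}
print(is_constant_sum(chess), is_zero_sum(chess))   # True False
```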
Overview
Introduction
Evaluation criteria & equilibria
Social welfare
Pareto efficiency
Nash equilibria
The Prisoner’s Dilemma
Loose end: dominant strategies
Kinds of evaluation criteria & equilibria
Social welfare
Pareto efficiency
Nash equilibrium
Social welfare
Social welfare measures the sum of the individual agents' utilities:
sw(ω) = Σ_i u_i(ω).
Optimal social welfare may not be achievable when individuals are
self-interested: individual agents follow their own (different) utility
functions.
Example 1
highest social welfare: the outcome (5,6)

                    Agent a2
                    s2,1     s2,2
Agent a1    s1,1    (5,6)    (4,3)
            s1,2    (1,2)    (6,4)
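As an illustration, the social welfare of each outcome of Example 1 can be computed directly; the dictionary representation of the game below is an assumption made for this sketch.

```python
# Sketch: Example 1 as a dict from strategy pairs to payoff pairs,
# with social welfare as the sum of the agents' utilities.
example1 = {
    ("s1,1", "s2,1"): (5, 6), ("s1,1", "s2,2"): (4, 3),
    ("s1,2", "s2,1"): (1, 2), ("s1,2", "s2,2"): (6, 4),
}

social_welfare = {outcome: sum(payoffs) for outcome, payoffs in example1.items()}
best = max(social_welfare, key=social_welfare.get)
print(social_welfare)               # sums 11, 7, 3, 10
print(best, social_welfare[best])   # ('s1,1', 's2,1') 11, i.e. the outcome (5,6)
```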
Overview
Introduction
Evaluation criteria & equilibria
Social welfare
Pareto efficiency
Nash equilibria
The Prisoner’s Dilemma
Loose end: dominant strategies
Pareto efficiency or optimality
An outcome is Pareto optimal (Pareto efficient) if there is no other outcome
that makes one agent better off without making some other agent worse off.
An outcome with the highest social welfare is Pareto optimal. However, a Pareto
optimal outcome need not be desirable: e.g., a dictatorship can be Pareto optimal.
Pareto improvement: a change to an outcome that is better for some agent
without hurting any other agent.
Example 1
Pareto efficient outcomes: (5,6) and (6,4)
Pareto improvements: e.g. from (1,2) or (4,3) to (5,6)

                    Agent a2
                    s2,1     s2,2
Agent a1    s1,1    (5,6)    (4,3)
            s1,2    (1,2)    (6,4)
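A sketch of how the Pareto optimal outcomes can be computed, using the same assumed dictionary representation as before; applied to Example 1, it returns the strategy pairs with payoffs (5,6) and (6,4).

```python
# Sketch: find the Pareto optimal outcomes of a two-player game given as a dict
# from strategy pairs to payoff pairs.
def dominates(p, q):
    """p Pareto-dominates q: at least as good for both agents, and not identical."""
    return all(pi >= qi for pi, qi in zip(p, q)) and p != q

def pareto_optimal(game: dict):
    return [s for s, p in game.items()
            if not any(dominates(q, p) for q in game.values())]

example1 = {
    ("s1,1", "s2,1"): (5, 6), ("s1,1", "s2,2"): (4, 3),
    ("s1,2", "s2,1"): (1, 2), ("s1,2", "s2,2"): (6, 4),
}
print(pareto_optimal(example1))   # the strategy pairs with payoffs (5,6) and (6,4)
```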
Overview
Introduction
Evaluation criteria & equilibria
Social welfare
Pareto efficiency
Nash equilibria
The Prisoner’s Dilemma
Loose end: dominant strategies
Nash equilibrium
Two strategies s1 and s2 are in Nash equilibrium if:
1. under the assumption that agent i plays s1, agent j can do no
better than play s2; and
2. under the assumption that agent j plays s2, agent i can do no
better than play s1.
No individual has the incentive to unilaterally change
strategy
Example: driving on the right side of the road
Nash equilibria do not always exist (in pure strategies) and are not always unique
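A minimal sketch of this definition: for a two-player game given as a payoff matrix (an assumed representation), it enumerates all strategy pairs and keeps those from which neither agent can gain by deviating unilaterally. Applied to Example 1 below, it finds the two equilibria discussed next.

```python
from itertools import product

# Sketch (assumed representation): payoffs[i][j] = (u1, u2) when agent a1 plays
# its i-th strategy and agent a2 its j-th. A pair (i, j) is a pure Nash
# equilibrium if neither agent gains by deviating unilaterally.
def pure_nash_equilibria(payoffs):
    rows, cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        u1, u2 = payoffs[i][j]
        best_for_a1 = all(payoffs[k][j][0] <= u1 for k in range(rows))
        best_for_a2 = all(payoffs[i][k][1] <= u2 for k in range(cols))
        if best_for_a1 and best_for_a2:
            equilibria.append((i, j))
    return equilibria

example1 = [[(5, 6), (4, 3)],
            [(1, 2), (6, 4)]]
print(pure_nash_equilibria(example1))   # [(0, 0), (1, 1)] -> outcomes (5,6) and (6,4)
```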
Example 1
'Nash incentives' and the outcomes corresponding to strategies in Nash equilibrium

                    Agent a2
                    s2,1     s2,2
Agent a1    s1,1    (5,6)    (4,3)
            s1,2    (1,2)    (6,4)

From (4,3) and from (1,2), each agent can gain by unilaterally switching strategy.
From (5,6) and from (6,4), neither agent can: these outcomes correspond to
strategy pairs in Nash equilibrium.
Example 2
no Nash equilibrium (in pure strategies)

                    Agent a2
                    s2,1     s2,2
Agent a1    s1,1    (3,6)    (5,3)
            s1,2    (6,2)    (2,5)
Example 3
unique Nash equilibrium: (1,1)
highest social welfare & Pareto efficient: (3,3)

                    Agent a2
                    s2,1     s2,2
Agent a1    s1,1    (1,1)    (5,0)
            s1,2    (0,5)    (3,3)
Overview
Introduction
Evaluation criteria & equilibria
Social welfare
Pareto efficiency
Nash equilibria
The Prisoner’s Dilemma
Loose end: dominant strategies
The Prisoner’s Dilemma
Two men are collectively charged with a crime and
held in separate cells, with no way of meeting or
communicating. They are told that:
– if one confesses and the other does not, the confessor
will be freed, and the other will be jailed for three years
– if both confess, then each will be jailed for two years
Both prisoners know that if neither confesses,
then they will each be jailed for one year
The Prisoner’s Dilemma
The prisoners can either defect or cooperate.
The rational action for each individual prisoner is
to defect.
Example 3 is a prisoner's dilemma (but note that
its table shows utilities, not prison years: fewer years in
prison means a higher utility).
Real life: nuclear arms reduction, free riders
The Prisoner’s Dilemma
The Prisoner’s Dilemma is the fundamental
problem of multi-agent interactions.
It appears to imply that cooperation will not occur
in societies of self-interested agents.
Recovering cooperation ...
Conclusions that some have drawn from this
analysis:
– the game theory notion of rational action is wrong!
– somehow the dilemma is being formulated wrongly
Arguments to recover cooperation:
– We are not all Machiavelli!
– The other prisoner is my twin!
– The shadow of the future…
The Iterated Prisoner’s Dilemma
One answer: play the game more than once
If you know you will be meeting your opponent
again, then the incentive to defect appears to
evaporate
When you know how many times you'll meet your
opponent, defection is again rational (reasoning backwards from the final round)
Axelrod’s tournament
Suppose you play iterated prisoner’s dilemma against
a range of opponents…
What strategy should you choose, so as to
maximize your overall payoff?
Axelrod (1984) investigated this problem, with a
computer tournament for programs playing the
prisoner’s dilemma
Strategies in Axelrod’s tournament
ALL-D:
Always defect
TIT-FOR-TAT:
At the first meeting of an opponent: cooperate. Then do
what your opponent did on the previous meeting
TESTER:
First: defect. If the opponent retaliates, play TIT-FOR-TAT.
Otherwise intersperse cooperation and defection.
JOSS:
As TIT-FOR-TAT, except periodically defect
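A small sketch of how such a tournament can be simulated; the payoff values and the 10% defection rate for JOSS are assumptions made for illustration, not Axelrod's exact settings.

```python
import random

# Sketch of an iterated prisoner's dilemma match in the spirit of Axelrod's
# tournament. Strategies see their own and the opponent's move history.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def all_d(own, opp):        return "D"
def tit_for_tat(own, opp):  return "C" if not opp else opp[-1]
def joss(own, opp):         # like TIT-FOR-TAT, but defects randomly ~10% of the time
    return "D" if random.random() < 0.1 else tit_for_tat(own, opp)

def play(strat1, strat2, rounds=200):
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        p1, p2 = PAYOFFS[(m1, m2)]
        h1.append(m1); h2.append(m2)
        score1 += p1; score2 += p2
    return score1, score2

print(play(tit_for_tat, all_d))   # (199, 204): TIT-FOR-TAT loses this pairing narrowly
```

Note that TIT-FOR-TAT loses its pairing against ALL-D by a small margin; this is exactly why "don't be envious" matters below: the tournament is decided by overall payoff, not by winning individual pairings.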
Reasons for TIT-FOR-TAT’s success
– Don’t be envious:
Don’t play as if it were zero sum!
– Be nice:
Start by cooperating, and reciprocate cooperation
– Retaliate appropriately:
Always punish defection immediately, but use
“measured” force — don’t overdo it
– Don’t hold grudges:
Always reciprocate cooperation immediately
Overview
Introduction
Evaluation criteria & equilibria
Social welfare
Pareto efficiency
Nash equilibria
The Prisoner’s Dilemma
Loose end: dominant strategies
Dominant strategy
A strategy is dominant for an agent if it is that agent's best choice
regardless of what the other agents do
Dominant strategy equilibrium: each agent uses a
dominant strategy
A dominant strategy equilibrium is always a Nash
equilibrium (but not every Nash equilibrium is a dominant strategy equilibrium).
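A sketch that checks for dominant strategies in the same assumed matrix representation as before (using weak dominance: at least as good in every case), applied to the matrix of Example 4 on the next slide.

```python
# Sketch: payoffs[i][j] = (u1, u2) for a1's i-th and a2's j-th strategy.
# A strategy is (weakly) dominant if it is at least as good as every other
# strategy of the same agent against every strategy of the opponent.
def dominant_strategies_a1(payoffs):
    rows, cols = len(payoffs), len(payoffs[0])
    return [i for i in range(rows)
            if all(payoffs[i][j][0] >= payoffs[k][j][0]
                   for k in range(rows) for j in range(cols))]

def dominant_strategies_a2(payoffs):
    rows, cols = len(payoffs), len(payoffs[0])
    return [j for j in range(cols)
            if all(payoffs[i][j][1] >= payoffs[i][k][1]
                   for k in range(cols) for i in range(rows))]

example4 = [[(2, 3), (4, 5)],
            [(1, 2), (2, 3)]]
print(dominant_strategies_a1(example4), dominant_strategies_a2(example4))  # [0] [1]
```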
Example 4
s1,1 is dominant for a1; s2,2 is dominant for a2

                    Agent a2
                    s2,1     s2,2
Agent a1    s1,1    (2,3)    (4,5)
            s1,2    (1,2)    (2,3)
Just to play with: new roads
- There are 6 cars going from A to D each day.
- (A,B) and (C,D) are highways: time(c) = 5 + 2c, where c is the number of cars on the road.
- (B,D) and (A,C) are local roads: time(c) = 20 + c.
- [Diagram: road network with nodes A, B, C, D and the routes A-B-D and A-C-D.]
- What will happen when a new highway is made between B and C?
and C?