Learning to Cooperate in a Continuous Tragedy of the Commons

(Extended Abstract)

Steven de Jong
Dep. of Knowledge Engineering, Maastricht Univ.
Postbus 616, 6200 MD Maastricht, Netherlands
[email protected]

Karl Tuyls
Eindhoven University of Technology
Postbus 513, 5600 MB Eindhoven, Netherlands
[email protected]

Categories and Subject Descriptors
I.2.6 [Learning]; I.2.11 [Distributed Artificial Intelligence]; J.4 [Social and Behavioral Sciences]

General Terms
Algorithms, Design, Human Factors

Keywords
Tragedy of the commons, punishment, learning, complex networks

1. RESEARCH SUMMARY
In previous work, we discussed that social dilemmas are often present in multi-agent systems []. Social dilemmas are problems in which we can only find a good solution if we consider the benefit of others in addition to our own benefit. Altruistic punishment has been identified as an important mechanism to enforce this consideration. However, as the punishment is altruistic, deciding whether to punish essentially entails a second-order social dilemma. We developed a methodology that allowed individually learning agents to reach satisfactory solutions in a social dilemma with a continuous strategy space, called the Ultimatum Game []. We extended this methodology to thousands of agents, using social networks []. Moreover, we devoted attention to the tragedy of the commons, a social dilemma typically exemplified by the Public Goods Game (PGG) []. In this game, which is played repeatedly, every agent i (out of n) has to decide on an investment $\mu_i \in [0, C]$. The summed investment is multiplied by a factor $1 < r < n$ and equally distributed over all agents. Agent i's individual benefit (or reward) is maximized by $\mu_i = 0$, whereas the group gains the most by collectively playing $\mu_i = C$. Altruistic punishment, i.e., reducing another agent's reward by an amount e with a cost $c < e$ to the punisher, allows agents to force others to invest a higher amount, but performing such punishment is clearly not individually rational. In earlier work, we restricted ourselves to a small number of strategies and/or agents in this game [].

This paper unites and extends all our previous work. We develop a methodology that allows thousands of individually learning agents to reach desirable, cooperative solutions to the PGG (i.e., all agents invest C, the maximum amount allowed, instead of the individually rational investment of 0), even in a continuous strategy space. The only requirement is that there are initially some agents already investing C. The methodology is based on the following four elements.

Inequity aversion. Research in behavioral economics identified that the human tendency to perform altruistic punishment may be motivated by inequity aversion []. We found that inequity aversion may indeed also be used to motivate altruistic punishment in the PGG []. Thus, we include altruistic punishment: all agents i consider punishing their peer j after interacting with it, iff $\mu_j < \mu_i$.

Probabilistic punishment. Even if all agents are willing to punish, they should not always do so. The main problem is that, in a continuous strategy space, many learning algorithms (e.g., CALA, see below) optimize by performing a great deal of local search. To solve this problem, we propose the mechanism of probabilistic punishment, i.e., the probability that an agent i punishes an agent j should depend on the actual strategies $\mu_i$ and $\mu_j$, as well as the resulting rewards $r_i$ and $r_j$. Punishment is more often performed for higher differences between these rewards. More precisely, we may derive that the punishment probability should be set to $p_i() > \frac{1}{e}(1 - 0.5r)\Delta$, with e denoting the effect of punishment on the reward of the agent being punished, and $\Delta = r_i - r_j$ [].

Continuous-action learning automata (CALA). Agents learn individual behavior by means of CALA []. Each agent i keeps track of its current strategy in the PGG, $\mu_i \in [0, C]$, which is adapted after each pairwise interaction with a neighbor j in the network.

Interaction in dynamic social networks. Social network structures have a strong influence on the strategies interacting agents converge to, especially if we allow agents to rewire their neighbor relations []. We therefore structure our population of agents in a scale-free structure. This allows us to use thousands of agents. Each newly introduced agent connects to one, two or three existing ones, with a preference for agents that are already densely connected []. To model the fact that relative cooperators (i.e., those with a high $\mu_i$) may want to avoid interacting with relative defectors again, an agent i unwires from its neighbor j after interacting with it, with a probability $p_r = \frac{1}{C}(\mu_i - \mu_j)$. If unwiring happens, i connects to a random neighbor of j, as in []. Allowing agent i to select a new neighbor would give i the opportunity to actively exploit this neighbor.
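
The mechanics described in this section can be made concrete with a small sketch of one pairwise interaction. It is an illustration under stated assumptions, not the authors' implementation. The net reward (share of the multiplied pot minus own investment), the use of the absolute reward difference for Δ, the `margin` factor that keeps the punishment probability above the quoted bound, and all function names are hypothetical choices. The CALA learning step is left out because the text does not spell out its update rule.

```python
import random


def pairwise_rewards(mu_i, mu_j, r):
    """Net rewards in a two-agent PGG (assumption: equal share of the
    multiplied pot minus the agent's own investment)."""
    share = r * (mu_i + mu_j) / 2.0
    return share - mu_i, share - mu_j


def punishment_lower_bound(delta, e, r):
    """Bound (1/e)(1 - 0.5 r) * delta from the text, clipped to [0, 1]."""
    return min(1.0, max(0.0, (1.0 / e) * (1.0 - 0.5 * r) * delta))


def unwiring_probability(mu_i, mu_j, C):
    """p_r = (1/C)(mu_i - mu_j); clipped at 0 (assumption) so that only
    the relative cooperator can unwire."""
    return max(0.0, (mu_i - mu_j) / C)


def interact(mu_i, mu_j, C=10.0, r=1.5, c=1.0, e=3.0, margin=1.05, rng=random):
    """One interaction seen from agent i.

    Returns (reward_i, reward_j, punished, unwire). `margin` is a
    hypothetical factor > 1 that puts the probability above the bound.
    """
    r_i, r_j = pairwise_rewards(mu_i, mu_j, r)
    punished = False
    if mu_j < mu_i:  # inequity aversion: only lower investors are considered
        delta = abs(r_i - r_j)  # assumption: magnitude of the reward gap
        p = min(1.0, margin * punishment_lower_bound(delta, e, r))
        if rng.random() < p:
            r_i -= c  # cost to the punisher
            r_j -= e  # effect on the punished agent
            punished = True
    unwire = rng.random() < unwiring_probability(mu_i, mu_j, C)
    return r_i, r_j, punished, unwire
```

For example, `interact(8.0, 2.0)` uses the parameter values reported in Section 2 as defaults. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.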

Due to the space constraints of this extended abstract, we omit many relevant references to other authors. We refer the interested reader to our previous work as given in the list of references. Our papers are available at http://www.cs.unimaas.nl/stevendejong

Cite as: Learning to Cooperate in a Continuous Tragedy of the Commons (Extended Abstract), Steven de Jong, Karl Tuyls, Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), Decker, Sichman, Sierra and Castelfranchi (eds.), May, 10–15, 2009, Budapest, Hungary, pp. 1185–1186.
Copyright © 2009, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Example of why unconditional punishment fails under local search: imagine agent j with $\mu_j = 2$ playing against agent i with $\mu_i = 8$. Agent j may want to try $\mu_j = 3$. Due to inequity aversion, agent i will punish j in both cases. Therefore, the essential idea underlying punishment, i.e., a reversal of the inverse relation between contribution and reward, fails to work: $\mu_j = 2$ gives a higher reward than $\mu_j = 3$, because both strategies are punished.
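
The arithmetic behind this example can be checked with a short sketch. It assumes a two-agent game whose net reward is the equal share of the multiplied pot minus the agent's own investment, and it borrows r = 1.5 and e = 3 from the experimental section, since the example itself gives no values. The `scale` factor, which places the punishment probability a little above the bound (1/e)(1 − 0.5r)Δ, and the function names are hypothetical. Δ is taken as the size of the reward gap, which equals the investment gap under this reward assumption.

```python
def reward_j(mu_j, mu_i=8.0, r=1.5):
    """Net reward of agent j before punishment (assumed reward definition)."""
    return r * (mu_i + mu_j) / 2.0 - mu_j


def reward_always_punished(mu_j, mu_i=8.0, r=1.5, e=3.0):
    """Reward of j when i punishes every lower investment."""
    return reward_j(mu_j, mu_i, r) - e


def expected_reward_probabilistic(mu_j, mu_i=8.0, r=1.5, e=3.0, scale=1.2):
    """Expected reward of j when i punishes with a probability slightly
    above the bound (1/e)(1 - 0.5 r) * delta; `scale` > 1 is hypothetical."""
    delta = mu_i - mu_j  # size of the reward gap under the assumed rewards
    p = min(1.0, scale * (1.0 / e) * (1.0 - 0.5 * r) * delta)
    return reward_j(mu_j, mu_i, r) - p * e


if __name__ == "__main__":
    for mu_j in (2.0, 3.0):
        print(mu_j, reward_always_punished(mu_j), expected_reward_probabilistic(mu_j))
```

Under these assumptions, unconditional punishment leaves j better off at an investment of 2 (2.5) than at 3 (2.25), whereas the probabilistic rule makes the expected reward grow with the investment (about 3.7 against 3.75), so a local step upward pays off.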

2. EXPERIMENTS AND RESULTS
In an extensive set of experiments, of which this abstract shows a small selection, we study the behavior of a collective of n agents, of which a certain percentage are fixed-strategy (FS) agents, i.e., agents that always play the same strategy, namely investing C, the desired strategy. The remaining agents are dynamic strategy (DS) agents, which learn using CALA after every pairwise interaction. Initially, half of the DS agents present have $\mu_i = 0$ and half have $\mu_i = 0.7C$. The agents are organised based on a randomly generated scale-free network. We compare a static network (without rewiring) to a dynamic network (with rewiring). We set the PGG's parameters to C = 10, r = 1.5, c = 1 and e = 3, which allows us to set $p_i() = C$. We study collectives of $n \in \{100, 1000, 10000\}$, and for each n, we vary the percentage of FS agents between 0% and 100%, in steps of 10%. All experiments are repeated 100 times. We report results for n = 100; results for other n are highly similar.

For an overview of results, see Figure 1. We use the same measurements as in []. From top to bottom, we first look at the number of pairwise games per agent required to obtain convergence, with an imposed maximum of 6000. A static network requires slightly more games, but still requires drastically fewer games than reported in related work []. Second, we look at the average converged strategy of the collective, and mention two observations: without any FS agents present, the collective converges to investing 7, which is the most cooperative strategy present; with an increasing percentage of FS agents, the collective learns to invest 10, which is remarkably easier for agents that are able to rewire. Third, we examine the performance at convergence, which expresses the fraction of neighboring agents in the network that have similar strategies. Results are in line with the observed average strategy. For static networks, we see that the DS agents have difficulty aligning themselves with a low number of FS agents. This problem is not present in dynamic networks, where the performance is nearly perfect, even with only 10% FS agents providing the 'good example'. Fourth and last, we report the degree distribution of the network of interaction. For reference, results for static networks are also included. We see that a few agents connect to approximately 20% of the others, while most agents have only a few neighbors. In dynamic networks, we see that the maximum degree increases significantly, especially when there are few FS agents to learn the desired behavior from. This emergence of stronger hubs is very useful. The fact that the topology of the network changes quite a lot is surprising, given our measurements, i.e., on average, rewiring only happens once in approximately 1000 games.

Figure 1: Influence of percentage FS agents. Two columns of panels (WITH REWIRING and WITHOUT REWIRING); from top to bottom: games per agent, average converged strategy, performance at convergence, and degree distribution, each plotted against the percentage of FS agents.

3. CONCLUSION
We present a methodology aimed at allowing a population of learning agents to find and maintain cooperative, desired strategies in a game modelling a tragedy of the commons with a continuous strategy space, i.e., the PGG. We show that our methodology, combining inequity aversion, probabilistic punishment, and dynamic social networks, allows individually learning agents to reach the best (most cooperative) strategy initially present. A certain percentage of our agents initially plays in an individually rational, uncooperative manner. We show that our methodology forces these agents to become more cooperative. Thus, the methodology may also be applied in open systems, where we are not able to control the behavior of all agents. The methodology is therefore useful in many problems commonly addressed by multi-agent systems, e.g., resource allocation.
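
As a concrete illustration of the experimental setup in Section 2, the following sketch builds a population on a randomly grown scale-free network and assigns the initial strategies. It is a sketch under assumptions rather than the authors' code. The two-agent seed network, the way preferential attachment is sampled (drawing from a list of link endpoints), the random placement of FS agents, and all names are hypothetical.

```python
import random


def build_scale_free_network(n, rng):
    """Grow a network in which each new agent links to 1-3 existing agents,
    preferring agents that already have many links."""
    neighbors = {0: {1}, 1: {0}}  # assumed seed: two connected agents
    endpoints = [0, 1]            # every agent appears once per link end
    for new in range(2, n):
        m = min(rng.randint(1, 3), new)  # cannot exceed the agents present
        targets = set()
        while len(targets) < m:
            # Sampling link endpoints picks agents proportionally to degree.
            targets.add(rng.choice(endpoints))
        neighbors[new] = set()
        for t in targets:
            neighbors[new].add(t)
            neighbors[t].add(new)
            endpoints.extend([new, t])
    return neighbors


def initial_strategies(n, fs_percentage, C, rng):
    """FS agents invest C; DS agents start half at 0 and half at 0.7 C."""
    n_fs = round(n * fs_percentage / 100)
    ids = list(range(n))
    rng.shuffle(ids)  # assumption: FS agents are placed at random
    fs_agents = set(ids[:n_fs])
    mu = {a: float(C) for a in fs_agents}
    for k, a in enumerate(ids[n_fs:]):
        mu[a] = 0.0 if k % 2 == 0 else 0.7 * C
    return mu, fs_agents


if __name__ == "__main__":
    rng = random.Random(0)
    net = build_scale_free_network(100, rng)
    mu, fs = initial_strategies(100, 30, 10.0, rng)
    print(max(len(v) for v in net.values()), len(fs))
```

Sampling from the endpoint list is one simple way to obtain degree-proportional attachment; other generators of scale-free networks would serve the same illustrative purpose.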

4. REFERENCES
[1] S. de Jong and K. Tuyls. Learning to cooperate in public-goods interactions. Presented at the EUMAS Workshop, Bath, UK, December.
[2] S. de Jong, K. Tuyls, and K. Verbeeck. Artificial Agents Learning Human Fairness. In Proceedings of the international joint conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
[3] S. de Jong, K. Tuyls, and K. Verbeeck. Fairness in multi-agent systems. Knowledge Engineering Review.
[4] S. de Jong, S. Uyttendaele, and K. Tuyls. Learning to Reach Agreement in a Continuous Ultimatum Game. Journal of Artificial Intelligence Research.
[5] E. Fehr and K. Schmidt. A Theory of Fairness, Competition and Cooperation. Quart. J. of Economics.
[6] F. C. Santos, J. M. Pacheco, and T. Lenaerts. Cooperation Prevails When Individuals Adjust Their Social Ties. PLoS Comput. Biol.
[7] M. A. L. Thathachar and P. S. Sastry. Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic Publishers, Dordrecht, the Netherlands.