Predicting Bargaining Behaviors:
Out-of-Sample Estimates from a Social Utility Model
with Quantal Response
Arnaud De Bruyn ♣
The Pennsylvania State University
Gary E. Bolton
The Pennsylvania State University
Draft version 3.9
June 2003
♣ 701 Business Administration Building, Smeal College of Business
Administration, The Pennsylvania State University, University Park, PA 16802.
Tel.: (+1) (814) 865-4091. Fax: (+1) (814) 865-3015. Email: [email protected].
PREDICTING BARGAINING BEHAVIORS:
OUT-OF-SAMPLE ESTIMATES FROM A SOCIAL UTILITY MODEL
WITH QUANTAL RESPONSE
Substantial evidence from experimental game theory suggests that people
are motivated not only by their selfish monetary gains when they play
games, but also by non-monetary considerations such as equity or
reciprocity. In this paper, we lay the groundwork to quantify these
motives. We fit a model to 1-round sequential (ultimatum) bargaining game
data, and use the model to obtain out-of-sample estimates of play in
multiple-round sequential bargaining games. The model embeds a social
utility function in a quantal response framework, and has 3 fitted
parameters: 1 to capture utility trade-offs, 1 to represent heterogeneity
and randomness of behaviors, and 1 to capture experience effects. The data
come from 6 previously reported studies, encompassing 19 distinct
parameterizations of the sequential bargaining game. The model is
remarkably accurate with respect to directional findings and to
out-of-sample estimates of average first offers, accounting for 90% of the
variability in the data. Out-of-sample estimates of rejection behavior
account for nearly half the variability. Furthermore, the parameter
estimates achieve reasonable stability when fitted to different datasets.
The results suggest that the influence of social utility on bargainers'
decisions can be reliably quantified for forecasting, and that the model
generalizes fairly well to predict actual behaviors in very different game
settings.
TABLE OF CONTENTS
INTRODUCTION
A UTILITY-DECISION FRAMEWORK MODEL FOR THE ULTIMATUM GAME
    Operationalization of the ERC function
    The ultimatum game
    The seller
    The buyer
    A closer look at the decision parameters
ESTIMATING THE PARAMETERS OF THE MODEL
    Optimization procedure and results
    A note on scaling
    Summary
REPLICATIONS OF OTHER GAMES
    Binmore, Shaked and Sutton (1985)
    Neelin, Sonnenschein and Spiegel (1988)
    Güth and Tietz (1988)
    Ochs and Roth (1989)
    Bolton (1991)
        Two-round bargaining game
        Truncation game
    Güth and van Damme (1998)
    Summary
STABILITY OF THE PARAMETER ESTIMATES
ULTIMATUM GAME SIMULATIONS WITH ALTERNATIVE ASSUMPTIONS ABOUT PLAYERS' CHARACTERISTICS
DISCUSSION AND CONCLUSIONS
    Comparisons to reinforcement learning models
    Limitations and directions for future research
    Contributions and conclusion
REFERENCES
INTRODUCTION
When agents' behaviors are hypothesized to be driven solely by their
self-interested material gains, classical equilibrium game theory fails to
explain numerous experimental findings. People do reject positive offers
in ultimatum games; they do give money to other players in dictator games;
and they make decisions that are clearly inconsistent with what classical
game theory would have predicted.
It is now widely acknowledged that people do not seem to be motivated only
by their selfish, monetary interests when they make decisions, and the
urge to better understand such phenomena has triggered an entire new
stream of research in business economics and game theory. In the last
decade, various models have been proposed to include a variety of
non-monetary motivations in people's behaviors, such as fairness (or
aversion to unfairness), equity, reciprocity, or altruism (Bolton 1991;
Bolton and Ockenfels 2000; Camerer 1990; Fehr et al. 1993; Fehr and
Schmidt 1997; Ochs and Roth 1989; Rabin 1993). Others have introduced
reinforcement-learning models to explain how players adapt their behaviors
(sometimes not optimally) to a competitive game environment (Camerer and
Ho 1999; Cheung and Friedman 1997; Cheung and Friedman 1995; Cooper and
Feltovich 1996; Cox et al. 1995; Erev et al. 1999; Erev and Roth 1995;
Erev and Roth 1998; Feltovich 2000; Grosskopf 1999; Hopkins 1999; Rapoport
and Erev 1998; Swarthout and Walker 1999).
In this paper, we investigate one particular model of social utility that
may explain numerous so-called "inconsistent" behaviors in games, namely,
the Equity-Reciprocity-Competition theory, hereafter ERC (Bolton 1991;
Bolton and Ockenfels 2000). This theory suggests that people gauge the
outcome of a game in terms of both absolute and relative money: when
people form an opinion about how well they performed in a game, they take
into account not only their actual payoffs, but also how their payoffs
compare to others'.
Bolton and Ockenfels (2000) propose a specific ERC utility functional form
where the outcome of a game is described by a scaling factor c (i.e., the
size of the pie) and the proportion σ of the pie the player actually gets.
A player's utility is modeled as a weighted sum of both his absolute gains
(the more the better) and his relative gains (the more equitable, that is,
the closer to a 50:50 split –if only two players are involved– the
better). Two amplitude parameters, a and b, measure the relative
importance of these two conflicting motivations. If a is equal to 0, the
player's only motivation is fairness and equity; if b is equal to 0, the
utility function is equivalent to the pure self-interested material gain
model.
$$U(\sigma) = a\,\sigma c - \frac{b}{2}\left(\sigma - \frac{1}{2}\right)^2$$
Equation 1 – ERC social utility function (Bolton and Ockenfels 2000). Utility is a
function of the size of the pie c, the proportion σ of the pie the player gets, and
two parameters a and b measuring the relative importance of absolute and
relative gains. a is usually set equal to 1.
This model has several interesting properties. First, the pure-pecuniary
model can be expressed as a special case of this more general utility
function (i.e., b=0). Second, Equation 1 spans a large variety of mixed
self-interested and altruistic motivations, motivations that can be scaled
by simply modifying the amplitude of either a or b. Finally, the authors
have demonstrated that this particular functional form could explain
several surprising findings in various versions of the ultimatum game
(including the Güth-van Damme 3-person bargaining version) as well as in
the dictator game (Bolton and Ockenfels 2000; Bolton and Ockenfels 1998),
at least theoretically.
But is the ERC utility function only a normative or descriptive model, or
can it be used to predict actual players' behaviors? In other words, is it
possible to find the actual value of the parameter b (a being fixed to a
constant) in one particular game, and if yes, is this parameter stable
enough to predict how players might behave in other games? The aim of this
paper is to show that this is actually the case, and to demonstrate the
good predictive accuracy of the ERC theory in a large variety of games.
This paper is organized in five sections. First, we develop a
utility-decision framework, based on the underlying ERC theory, to
describe how people make their decisions when they play the simple version
of the ultimatum game. In other words, we build a model that links the ERC
utility function to the actual decisions made by players through a
decision rule that incorporates players' heterogeneity, uncertainty and
choice randomness. Then, in the second section, we fit the parameters of
this model to the observations made by Roth et al. in their multi-country
bargaining experiments (Roth et al. 1991), and describe the model's fit
and face validity properties. In the third section, we show that the
proposed framework (i.e., ERC utility function + Gibbs decision rule) can
predict to a very large extent how people actually played a variety of
games (even games played at different points in time with different
populations), namely (i) the 2-round, (ii) the 3-round and (iii) the
5-round versions of the ultimatum game, (iv) the truncation game, and (v)
the Güth-van Damme 3-person bargaining game. Out-of-sample predictions are
compared to observations reported in 6 distinct papers, totaling 19
different experimental conditions. All major experimental findings are
directionally replicated, often with surprising accuracy. In the fourth
section, we re-estimate the parameters of the model on two different,
large datasets, and show an adequate stability of the parameter estimates.
We conclude this paper with a discussion of our results and by suggesting
several directions for future research.
A UTILITY-DECISION FRAMEWORK MODEL FOR THE ULTIMATUM GAME
Operationalization of the ERC function
In this paper, we amend the ERC utility function (Equation 1) in three
ways. First, we set a equal to 1, without loss of generality. Second, one
can note that, in the original model, only the absolute term of the
utility function was multiplied by c. This did not affect the theoretical
results (c was set equal to 1 for the sake of the argument throughout
Bolton and Ockenfels' paper), but we argue that the model would be more
realistic if both the absolute and the relative terms of the utility
function were multiplied by the size of the pie. Without this
modification, (i) players would be upset by an unfair split of a pie of
size 0 (that is, players would be upset by having 0% of nothing, but not
by having 50% of nothing); and (ii) players would be proportionally more
driven by their selfish interests as pie size grows, an assumption that
has also been partly contradicted in the literature. Multiplying the
relative element of the utility function by the size of the pie takes care
of these two inconsistencies without affecting the theoretical discussion
originally proposed by the authors.
Finally, it has been argued that the ERC function is not necessarily
symmetric, and that the importance of positive reciprocity (disutility
generated by unfair splits in disfavor of others) and negative reciprocity
(disutility generated by unfair splits in one's disfavor) might differ. In
his first model, Bolton (1991) argued that unfair splits generate
disutility if and only if the inequity is in disfavor of the player (i.e.,
no positive reciprocity, b=0 if σ>½). Later, Bolton and Ockenfels (2000)
relaxed this assumption, and introduced a perfectly symmetric reciprocity
effect in the ERC function. Others have argued that the truth might lie in
between; players' behaviors might be driven by both positive and negative
reciprocities, but negative reciprocity's amplitude is likely to be
larger. In any case, the games we intend to fit and replicate in this
paper are likely to involve negative reciprocity only, and mainly trigger
competitive behaviors among players. Although we do believe players might
be driven by positive reciprocity considerations, too, the datasets we
analyze do not give us the chance to estimate such considerations, and we
therefore assume perfect asymmetry in the utility function.
The ERC utility function we will use throughout this paper is therefore as
follows:
$$U(\sigma) = \begin{cases} c\left(\sigma - \dfrac{b}{2}\left(\sigma - \dfrac{1}{2}\right)^2\right) & \text{if } \sigma < \dfrac{1}{2} \\ c\,\sigma & \text{if } \sigma \geq \dfrac{1}{2} \end{cases}$$
Equation 2 – Utility function (ERC functional form) used throughout this paper. c
is the size of the pie, σ the proportion of the pie the player gets, and b measures
the relative importance of relative gains (without positive reciprocity).
It contains only one parameter to be estimated, namely b, while both the
size of the pie and the proportion the player gets are exogenous to the
utility function.
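As an illustration, Equation 2 can be written as a short function. This is only a minimal sketch: the function name `erc_utility` and the value b = 2.0 used in the example are assumptions made here for illustration, not part of the original model description.

```python
def erc_utility(sigma, c, b):
    """Utility of receiving a share `sigma` of a pie of size `c` (Equation 2).

    `b` weighs the disutility of a disadvantageous split; the name and
    signature of this helper are illustrative assumptions.
    """
    if sigma < 0.5:
        # Disadvantageous split: absolute gain minus the inequity penalty,
        # both scaled by the size of the pie.
        return c * (sigma - (b / 2.0) * (sigma - 0.5) ** 2)
    # Equal or advantageous split: only the absolute gain matters.
    return c * sigma


if __name__ == "__main__":
    # b = 2.0 is an arbitrary illustrative value, not an estimate.
    for share in (0.0, 0.2, 0.5, 0.8):
        print(share, erc_utility(share, c=10.0, b=2.0))
```

Because both terms are multiplied by c, a pie of size 0 yields a utility of 0 whatever the split, which is the property the amended function is meant to have.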
The ultimatum game
In the simplest version of the ultimatum game (one round per game), say α
are buyers, and β are sellers. α makes an offer to β, proposing that β
keep a proportion σ of a pie of size c. If β accepts, the pie is divided
accordingly: β gets σc and α gets (1-σ)c. If β refuses, both α and β get
nothing. For convenience, we consider that σ can only take a finite number
of values, and varies between 0 and 1 in increments of 0.1. All possible
values for σ are noted σi, with 0 ≤ i ≤ 10, and σ0=0, σ1=0.1, …, σ10=1.
The seller
Say Pβ(σi) is the aggregate probability for β, the sellers, to accept an
offer of σi. For instance, Pβ(0.5) = .95 means that, on average, 95% of
players β accept an offer of σ=0.5 (i.e., an equal split of the pie). By
definition, Pβ(σi) ∈ [0,1] ∀i.
If the player accepts the offer, the pie is split accordingly, and his
utility is a function of both "absolute" and "relative" money as given by
the ERC utility function (see Equation 2). If he refuses the offer (noted
∅), no player receives any payoff. Since the size of the pie shrinks to
nothing, U(∅), the utility associated with a refusal, can be obtained by
setting c=0 in the ERC equation. Consequently, U(∅) = 0, ∀σi.
How can we model the probability that sellers will accept a particular
offer? One of the basic assumptions of most mathematical learning theories
proposed in psychology is that choice behavior is probabilistic (Bush and
Mosteller 1955; Estes 1950; Luce 1959; Suppes and Atkinson 1960). At an
aggregate level, we therefore hypothesize that Pβ(σi), the probability of
accepting a particular offer σi, can be expressed by a Gibbs distribution:
$$P_\beta(\sigma_i) = \frac{e^{\tau_\beta U(\sigma_i)}}{e^{\tau_\beta U(\emptyset)} + e^{\tau_\beta U(\sigma_i)}}$$
where U(∅) and U(σi) are the utilities of rejecting or accepting the offer,
respectively. Since U(∅) = 0, it follows that:
$$P_\beta(\sigma_i) = \frac{e^{\tau_\beta U(\sigma_i)}}{1 + e^{\tau_\beta U(\sigma_i)}}$$
Equation 3 – Sellers' probability to accept an offer of σi.
In Equation 3, τβ is a positive coefficient of certitude¹. We will also
use the term "decision parameter" interchangeably. If τβ → ∞, Pβ(σi) → 1
if U(σi) > U(∅), and Pβ(σi) → 0 if U(σi) < U(∅); the larger τβ, the higher
the probability for the seller to follow the strategy that produces the
highest utility. At the other extreme, if τβ=0, the seller has a 50:50
chance to accept any offer, independently of the actual value of σi.
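The two limiting cases described above can be checked numerically. The following is a minimal sketch of Equation 3, assuming the utility of the offer has already been computed; the name `accept_probability` is introduced here for illustration only.

```python
import math


def accept_probability(utility, tau_beta):
    """Equation 3: Gibbs (logistic) probability that the seller accepts.

    `utility` is U(sigma_i); the utility of refusing is 0 by construction.
    """
    x = tau_beta * utility
    # Numerically stable form of exp(x) / (1 + exp(x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


if __name__ == "__main__":
    print(accept_probability(3.0, 0.0))     # tau = 0: coin flip
    print(accept_probability(3.0, 50.0))    # large tau, U > 0: near 1
    print(accept_probability(-3.0, 50.0))   # large tau, U < 0: near 0
```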
In this model, the meaning of τβ is twofold. First, it is an indicator of
individuals' choice consistency. It has been observed in various
experiments that some players are inconsistent over time, accepting an
offer of .4 in one game, and refusing a better offer of .5 in the very
next game. The probabilistic nature of the decision rule, which matters
most when τβ takes a relatively small value, takes care of that phenomenon
and introduces some uncertainty in the strategy the same player will
follow over time. In other words, a small value for τβ reflects the fact
that players might very well not be certain of their own preferences, or
might show some inconsistencies in their choices.
Also, since Equation 3 is an aggregated decision rule, introducing some
uncertainty takes care of the heterogeneity of different players'
strategies to either accept or refuse the same offers. Consequently, the
coefficient of amplitude τβ is an elegant way to aggregate both
individuals' uncertainty and choice inconsistencies as well as players'
heterogeneity.
¹ This term is similar to the "alpha-rule" used in some Logit models to link products' preferences
to their actual market shares.
The buyer
Say Pα(σi) is the aggregate probability for α, the buyers, to make an
offer of σi to the sellers. For instance, Pα(.5) = .2 indicates that
buyers propose on average to split the pie equally 20% of the time. By
definition (with I = 10 here),
$$\sum_{i=0}^{I} P_\alpha(\sigma_i) = 1.$$
We hypothesize that buyers' decision to offer σi (and to propose to keep
1-σi for themselves) follows a Gibbs distribution, too:
$$P_\alpha(\sigma_i) = \frac{e^{\tau_\alpha E(U(1-\sigma_i))}}{\sum_{j=0}^{I} e^{\tau_\alpha E(U(1-\sigma_j))}}$$
Equation 4 – Buyers' decision rule: probability for the buyers to make an offer of
σi.
where E(U(1−σi)) is the expected utility of offering σi to the seller. If
buyers have perfect knowledge of the true probabilities for the sellers to
accept any particular offer and if they are risk-indifferent (two
assumptions we will make for the time being), the expected utility of an
offer σi is equal to Pβ(σi)⋅U(1−σi), where U(1−σi) is given by the ERC
utility function (with the same parameters as the sellers'), and thus:
$$P_\alpha(\sigma_i) = \frac{e^{\tau_\alpha P_\beta(\sigma_i) U(1-\sigma_i)}}{\sum_{j=0}^{I} e^{\tau_\alpha P_\beta(\sigma_j) U(1-\sigma_j)}}$$
Equation 5 – Buyers' decision rule: probability for the buyers to make an offer of
σi, re-expressed as a function of sellers' probability to accept such offer.
This expression guarantees that $\sum_{i=0}^{I} P_\alpha(\sigma_i) = 1$,
and that the offers with the highest expected utilities are likely to be
chosen more often. Again, if τα → ∞, buyers systematically make the offer
σi that yields the highest expected utility. If τα=0, Pα(σi) = Pα(σj),
∀i, j.
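The buyers' rule in Equation 5 can be sketched by combining the utility function of Equation 2 with the sellers' rule of Equation 3. The helper names below and the parameter values in the example are illustrative assumptions; the sketch also assumes, as in the text, that buyers know the sellers' true acceptance probabilities.

```python
import math

SHARES = [i / 10 for i in range(11)]  # sigma_0 ... sigma_10


def erc_utility(sigma, c, b):
    """Equation 2."""
    if sigma < 0.5:
        return c * (sigma - (b / 2.0) * (sigma - 0.5) ** 2)
    return c * sigma


def accept_probability(sigma, c, b, tau_beta):
    """Equation 3, evaluated at the seller's share sigma."""
    return 1.0 / (1.0 + math.exp(-tau_beta * erc_utility(sigma, c, b)))


def offer_distribution(c, b, tau_alpha, tau_beta):
    """Equation 5: probability of each offer sigma_i (a softmax over
    the expected utilities P_beta(sigma_i) * U(1 - sigma_i))."""
    expected = [
        accept_probability(s, c, b, tau_beta) * erc_utility(1.0 - s, c, b)
        for s in SHARES
    ]
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(tau_alpha * e for e in expected)
    weights = [math.exp(tau_alpha * e - top) for e in expected]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Arbitrary illustrative values, not estimates.
    for share, p in zip(SHARES, offer_distribution(10.0, 5.0, 0.8, 0.25)):
        print(f"{share:.1f}  {p:.3f}")
```

Setting `tau_alpha` to 0 makes every offer equally likely (1/11 each), which is the uniform case mentioned above.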
A closer look at the decision parameters
So far, we have hypothesized that buyers and sellers had two different
decision parameters, τα and τβ, and that these parameters were independent
of players' experience and constant over time. We now relax these
hypotheses.
Independently of the psychological aspects of human decision-making they
capture, decision parameters are also influenced by the number of
alternatives players have to choose from. For instance, if the number of
irrelevant alternatives were to be artificially increased while keeping
the decision parameter constant, the probability of choosing the action
with the highest utility would decrease. To reconcile players' decision
parameters while at the same time taking into account such "dilution"
effects, we hypothesize that α and β players' decision parameters are
equal to a common "root", multiplied by the natural logarithm of the
number of alternatives players have to choose from (that is, either 11 for
the first player or 2 for the second).
Furthermore, the model as is does not take into account learning effects
that are likely to occur, and does not differentiate decisions made during
the first games from decisions made later during the experiment, for which
players had more experience. We present here what are in our opinion the
two most likely learning phenomena.
First, buyers probably begin the game with a large variety of expectations
about sellers' likelihood to accept various offers, and these expectations
converge toward more reliable estimates after a few games. That creates a
greater homogeneity of buyers' beliefs about the most likely outcomes of a
given offer, and thus a greater homogeneity of buyers' behaviors (i.e.,
offers).
Second, more experienced sellers should learn what a "fair offer" is.
Players should eventually become more certain whether they should accept
or refuse any given offer or, in other words, they should learn their own
preferences and make more consistent decisions after a few games.
Given the interpretation of the decision parameter (i.e., indicator of
players' heterogeneity and choice randomness), these two learning
phenomena should translate into decision parameters that increase as a
function of players' experience. Therefore, we re-express the decision
parameters as a linear function of the number of games played, where τ0 is
the intercept (i.e., value of the decision parameter at the very first
game), and τ1 is the slope representing the increase in the decision
parameter due to players' experience. Thus, we have:
$$\tau_\alpha = (\tau_0 + \tau_1 g) \cdot \ln(11)$$
$$\tau_\beta = (\tau_0 + \tau_1 g) \cdot \ln(2)$$
Equation 6 – Decision parameters of the players re-expressed as a function of a
common parameter τ0 and a learning trend τ1, scaled to the number of
alternatives (g is the number of games already played).
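Equation 6 amounts to a few lines of arithmetic. The sketch below is illustrative only: the function name `decision_parameters` is hypothetical, and the values τ0 = 0.3 and τ1 = 0.02 are arbitrary numbers chosen for the example.

```python
import math


def decision_parameters(tau0, tau1, games_played, n_offers=11, n_responses=2):
    """Equation 6: decision parameters of the buyer and of the seller.

    The common root tau0 + tau1 * g grows with experience and is scaled by
    the log of the number of alternatives each player chooses from.
    """
    root = tau0 + tau1 * games_played
    tau_alpha = root * math.log(n_offers)     # buyer: 11 possible offers
    tau_beta = root * math.log(n_responses)   # seller: accept or refuse
    return tau_alpha, tau_beta


if __name__ == "__main__":
    for g in (0, 5, 9):
        tau_alpha, tau_beta = decision_parameters(0.3, 0.02, g)
        print(g, round(tau_alpha, 4), round(tau_beta, 4))
```

Whatever the experience level, the ratio of the two parameters stays at ln(11)/ln(2), so only the common root carries the learning trend.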
ESTIMATING THE PARAMETERS OF THE MODEL
Optimization procedure and results
The above decision-utility framework attempts to normatively model how
people play the simple version of the ultimatum game. It has three
parameters to be estimated: b, the unique parameter of the ERC utility
function, and τ0 and τ1, the two parameters that drive players'
coefficients of certitude. Note that the parameter c (the size of the pie)
is a function of the game design and is given a priori.
We fit the model to the multi-country bargaining experiment conducted by
Roth and his colleagues (Roth and Erev 1995; Roth et al. 1991). In this
well-known experiment, 270 participants from 4 different countries played
10 games each of the simple ultimatum game, either as α or as β. The
dataset contains 1,350 observations, each with an offer (σi) and an
outcome (i.e., the seller either accepts or rejects the offer). The size
of the pie c was $10, divided into 1,000 tokens of 1¢ each, but for
estimation purposes we simplify the dataset to the case of 10 tokens of $1
each.
We find the optimal values of the three parameters of the model using
maximum likelihood estimation. As nothing is assumed about the underlying
process that generated the data, standard deviations are estimated using
nonparametric bootstrap variance estimation (for a review of the
advantages of this method, see Davison and Hinkley 1997; Efron and
Tibshirani 1993; Manly 1997; Shao and Tu 1996), and are shown in
parentheses.
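The estimation procedure can be sketched as follows. The exact likelihood is not spelled out in the text, so this sketch assumes that each observation contributes the log-probability of the observed offer (Equation 5) plus the log-probability of the observed response (Equation 3). The function names, the data layout, and the toy observations are all hypothetical; any numerical optimizer could then be used to maximize `log_likelihood` over (b, τ0, τ1).

```python
import math
import random

SHARES = [i / 10 for i in range(11)]


def utility(sigma, c, b):
    """Equation 2."""
    if sigma < 0.5:
        return c * (sigma - (b / 2.0) * (sigma - 0.5) ** 2)
    return c * sigma


def log_sigmoid(x):
    """log(1 / (1 + exp(-x))), written to avoid overflow."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_likelihood(params, data, c=10.0):
    """Assumed joint log-likelihood of offers and responses.

    `data` is a list of (offer_index, accepted, games_played) tuples.
    """
    b, tau0, tau1 = params
    total = 0.0
    for i, accepted, g in data:
        root = tau0 + tau1 * g                              # Equation 6
        tau_a, tau_b = root * math.log(11), root * math.log(2)
        p_acc = [math.exp(log_sigmoid(tau_b * utility(s, c, b))) for s in SHARES]
        scores = [tau_a * p * utility(1.0 - s, c, b) for p, s in zip(p_acc, SHARES)]
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(v - top) for v in scores))
        total += scores[i] - log_norm                       # offer, Equation 5
        x = tau_b * utility(SHARES[i], c, b)
        total += log_sigmoid(x) if accepted else log_sigmoid(-x)  # Equation 3
    return total


def bootstrap_se(estimator, data, n_resamples=200, seed=0):
    """Nonparametric bootstrap standard error of `estimator(data)`."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]  # draw with replacement
        draws.append(estimator(resample))
    mean = sum(draws) / len(draws)
    return math.sqrt(sum((d - mean) ** 2 for d in draws) / (len(draws) - 1))


if __name__ == "__main__":
    toy = [(4, True, 0), (2, False, 1), (5, True, 2), (3, False, 3)]  # made up
    print(log_likelihood((5.0, 0.3, 0.02), toy))
    print(bootstrap_se(lambda d: sum(a for _, a, _ in d) / len(d), toy))
```

In a full application, the `estimator` passed to `bootstrap_se` would refit the model on each resample and return one of the fitted parameters; the acceptance rate is used above only to keep the example short.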
The parameter estimates we obtained are b=10.742 (.995), τ0=.3478 (.0189)
and τ1=.0159 (.0038). All parameters are significant at p<.01.
As expected, b is positive (players are not only greedy but also seem to
evaluate their gains in terms of "relative" payoffs), τ0 is positive but
relatively small, and τ1 is positive and significant (there is a learning
trend).
The correlations between observations and predictions are high, with
Rα=.907 and Rβ=.973. Specifically, the model predicts that the average
offer will be 4.06, with an average rejection rate of 30.9%. These numbers
are actually 4.07 and 26.4% in the original dataset. Successful Student
t-tests at p<0.05 on these two measures confirm a good statistical fit of
the model.
σi     Pα(σi)                      Pβ(σi)
       Observations    Model       Observations    Model
0.0    .006            .006        .000            .026
0.1    .020            .012        .333            .113
0.2    .107            .053        .424            .312
0.3    .076            .223        .534            .561
0.4    .416            .359        .714            .734
0.5    .338            .253        .928            .811
0.6    .025            .083        .855            .847
0.7    .003            .010        1.000           .873
0.8    .002            .000        1.000           .893
0.9    .000            .000        1.000           .909
1.0    .007            .000        1.000           .924
Table 1 – Probability for the buyer to make an offer of σi, and probability for the
seller to accept such offer: observations vs. model (source: Roth et al., 1991).
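The aggregate figures quoted in the text can be approximately recovered from Table 1. The sketch below uses the four columns of the table as read here; the helper names are illustrative, and the rejection rates computed this way are only an approximation, since the table is aggregated over games whereas the model is evaluated game by game.

```python
SHARES = [i / 10 for i in range(11)]

# Columns of Table 1 (offers sigma_0 ... sigma_10).
OFFER_OBS = [.006, .020, .107, .076, .416, .338, .025, .003, .002, .000, .007]
OFFER_MODEL = [.006, .012, .053, .223, .359, .253, .083, .010, .000, .000, .000]
ACCEPT_OBS = [.000, .333, .424, .534, .714, .928, .855, 1.0, 1.0, 1.0, 1.0]
ACCEPT_MODEL = [.026, .113, .312, .561, .734, .811, .847, .873, .893, .909, .924]


def mean_offer(p_offer, pie=10.0):
    """Expected offer in dollars under the offer distribution."""
    return pie * sum(s * p for s, p in zip(SHARES, p_offer))


def rejection_rate(p_offer, p_accept):
    """Share of games ending in a refusal, weighting by offer frequency."""
    return sum(p * (1.0 - a) for p, a in zip(p_offer, p_accept))


if __name__ == "__main__":
    print("mean offer, observed:", round(mean_offer(OFFER_OBS), 2))
    print("mean offer, model:   ", round(mean_offer(OFFER_MODEL), 2))
    print("rejection, observed: ", round(rejection_rate(OFFER_OBS, ACCEPT_OBS), 3))
    print("rejection, model:    ", round(rejection_rate(OFFER_MODEL, ACCEPT_MODEL), 3))
```

The mean offers come out at about 4.07 (observations) and 4.06 (model), and the implied rejection rates land within about a tenth of a percentage point of the 26.4% and 30.9% reported above.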
[Figure 1: chart of the probability to make an offer of σi (horizontal axis: σi from 0.0 to 1.0; vertical axis: probability), observations vs. model]
Figure 1 – Probability for the buyer to make an offer of σi: observations vs.
model (source: Roth et al., 1991).
[Figure 2: chart of the probability to accept an offer of σi (horizontal axis: σi from 0.0 to 1.0; vertical axis: probability from 0.0 to 1.0), observations vs. model]
Figure 2 – Probability for the seller to accept an offer of σi: observations vs.
model (source: Roth et al., 1991).
The following figure shows the shape of the estimated ERC utility
function.
[Figure 3: chart of utility (vertical axis, from -15 to 10) as a function of σi (horizontal axis, from 0.0 to 1.0), for b=0 and b=10.742]
Figure 3 – Seller's utility function for different values of σi (and a pie size of
$10). The buyer's utility function is symmetric (i.e., the main argument is [1-σi]
instead of σi).
The game we used to fit the model did not seem appropriate to estimate
possible positive reciprocity considerations, and since no estimation of
the right part of the curve could be performed given the data at hand, we
assumed perfect linearity of the social utility function when σ>.5. We now
check the statistical validity of this assumption.
The log-likelihood of the asymmetric model was -5,266.5, with a standard
deviation of 80.1 (estimated using bootstrap variance estimation). The
log-likelihood of the model with perfectly symmetric reciprocity
considerations (b≠0, ∀σ) was -5,254.2, with a standard deviation of 76.2.
The difference between the two log-likelihoods is small relative to their
standard deviations: we cannot reject the null hypothesis that no positive
reciprocity considerations are involved in this game, and no statistical
difference could be found between the two models. The original, asymmetric
model was therefore retained.
In terms of learning, the model predicts that buyers will learn to offer
more to sellers over time, which will in turn increase the likelihood of
acceptance.
Figure 4 shows how learning affects the mean opening offer (model versus
observations).
[Figure 4: two charts of the probability of each offer σi (horizontal axis: σi from 0.0 to 1.0; vertical axis: probability); one chart compares Round 1 with Round 10, the other compares Rounds 1-2 with Rounds 9-10]
Figure 4 – Probability for the buyer to make an offer of σi during the first and
last periods, model (left chart) versus observations (right chart). Players learn
that small offers are likely to be rejected, and learn to make offers that tend to
converge around 40% of the pie.
A note on scaling
What if the size of the pie were not $10 but $1, or $100? The shape of the
ERC utility function (see Equation 2) would be unaltered, but its
amplitude would change. This effect is intuitively easy to support: having
half of a $100 pie should procure more satisfaction than having half of a
$10 pie. Likewise, a player should be more upset by an unfair split of a
large pie than by a similarly unfair split of a smaller one.
But what about the Gibbs decision rules employed to transform expected
levels of utility into probabilities of actions? Obviously, scale does
matter in Gibbs distributions. If the amplitude of the utility function
were to be increased, the distribution of actions' likelihood would be
much more concentrated around the ones with the highest expected rewards.
At the very extreme, if the size of the pie were infinite, the player
would choose the action with the highest expected utility with probability
1. In fact, increasing the size of the pie has the same effect in this
model as increasing the decision parameter. Consequently, the model
structure implies that players select actions with the highest expected
rewards more frequently when the size of the pie is large. In other words,
players' decisions are expected to be more consistent and less erratic
when the stakes are high.
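The equivalence between pie size and decision parameter follows from the fact that the utility of Equation 2 is linear in c, so the Gibbs rule only ever sees the product τ·c. A minimal numerical check, with hypothetical helper names and arbitrary illustrative values for b and τ:

```python
import math


def utility(sigma, c, b):
    """Equation 2."""
    if sigma < 0.5:
        return c * (sigma - (b / 2.0) * (sigma - 0.5) ** 2)
    return c * sigma


def accept_probability(sigma, c, b, tau):
    """Equation 3."""
    return 1.0 / (1.0 + math.exp(-tau * utility(sigma, c, b)))


if __name__ == "__main__":
    # A pie ten times larger with the same decision parameter ...
    larger_pie = accept_probability(0.3, c=100.0, b=2.0, tau=0.05)
    # ... behaves like the original pie with a parameter ten times larger.
    larger_tau = accept_probability(0.3, c=10.0, b=2.0, tau=0.5)
    baseline = accept_probability(0.3, c=10.0, b=2.0, tau=0.05)
    print(baseline, larger_pie, larger_tau)
```

The last two probabilities coincide up to floating-point rounding, and both exceed the baseline because the offer has positive utility in this example.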
Summary
We have presented evidence that the ERC utility function, linked to actual
players' decisions by a Gibbs distribution that incorporates both players'
heterogeneity and individuals' choice randomness, offers an appropriate
fit for the simple version of the ultimatum game. We have also shown that
strategies followed by both buyers and sellers tended to become more
homogeneous and less random over time. Players learn what the most likely
outcomes of the game are, and make more clear-cut and consistent decisions
when experienced, which translates into statistically significant learning
trends in the decision parameters.
REPLICATIONS OF OTHER GAMES
The model presented herein is based on the solid theoretical framework of
the ERC utility function, and offers an appropriate representation of the
specific dataset we used to compute the parameters of the model. But can
it accurately predict outcomes of other games? In other words, to what
extent can the specific parameters we computed to fit this particular
dataset be generalized to other games, other populations, other
experiments?
To assess the generalization properties of our model (and of its estimated
parameters), we compute the out-of-sample predictions of the model for 5
other games (the 2-round, 3-round and 5-round versions of the ultimatum
game, the truncation game, and the Güth-van Damme 3-person bargaining
game), and compare the games' outcomes we obtain to well-known experiments
in the literature. Expected utilities are computed using perfect backward
induction, and unless specified otherwise, simulations of repeated games
are averaged over the same number of games (with learning trend) as in the
experiment used as a benchmark.
Comparisons between the predictions made by the model and the outcomes
actually observed in these games are performed only ex post; that is, we
apply our model without any additional information specific to the games
played, except for the structure (i.e., rules) of the games themselves.
Even so, we show that our predictions are often very similar to
observations.
Binmore, Shaked and Sutton (1985)
The first experiment on the so-called ultimatum game was run by Güth
and his colleagues in 1982. The results are consistent with the ones
replicated later by Roth et al. in their multi-country bargaining
experiment, and are therefore not reproduced here (Güth et al. 1982).
The main finding, similar to what has been shown in the previous
section, was that the pecuniary model of game theory had little predictive
power and could not explain subjects' behaviors in the simple ultimatum
game. This work triggered many responses, the first of which came from
Binmore, Shaked and Sutton (1985).
In Binmore et al.'s experiment, 164 subjects played a 2-round version of
the ultimatum game (Binmore et al. 1985), and bargained over a pie of
100 pence (which was approximately equivalent to $1.15 at the time of
the experiment, i.e., c=1.15). In the 2-round version of the ultimatum
game, also known as the 2-round sequential-bargaining game, the game
does not stop if the second player refuses the offer. Instead, a second
round is played where the roles are inverted: the second player has the
opportunity to make a counteroffer to the first player, and the latter can
either accept or refuse it. If both the first and the second player's
offers are rejected, players get no payoff.
The game is made more complex by incorporating discount factors in the
second round. If players do not agree in the first round, the portion of
the pie each player actually receives (if they eventually agree) is
multiplied by δ_α and δ_β (∈ [0,1]), respectively. The closer the discount
factors are to 0, the more the pie shrinks, which is a strong incentive
for both players to reach an agreement during the first round. Note that
the combination of discount factors (0, 0) corresponds to the simple
version of the ultimatum game.
The discount factor was ¼ (i.e., the pie size was 25 pence in the second
round), and was the same for both players α and β. Game theory suggests
that players β should accept any offer greater than or equal to 25% of
the pie, allowing the first players to keep 75% for themselves in the
first round. Binmore and his colleagues report a mean opening offer of
41.6 pence in the first game; our model predicts a mean opening offer of
41.4 pence, closely replicating the authors' finding.
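The 25% benchmark follows from backward induction on money payoffs alone. The sketch below is a minimal illustration, not the model fitted in this paper: it assumes purely money-maximizing players, a responder who accepts when indifferent, alternating proposers starting with player α, and payoffs in round t multiplied by the receiving player's discount factor raised to the power t−1. The function name is hypothetical.

```python
def pecuniary_opening_offer(delta_alpha, delta_beta, n_rounds):
    """Share of the pie offered to player beta in round 1 at the
    subgame-perfect equilibrium of purely money-maximizing players."""
    # In the last round the responder accepts anything,
    # so the proposer keeps the whole (discounted) pie.
    proposer_share = 1.0
    # Walk backward from the next-to-last round to round 1.
    for rnd in range(n_rounds - 1, 0, -1):
        # Alpha proposes in odd rounds, so beta responds in odd rounds.
        responder_is_beta = (rnd % 2 == 1)
        delta_resp = delta_beta if responder_is_beta else delta_alpha
        # The responder accepts any share worth at least what he would
        # keep as proposer in the next round, discounted once more.
        proposer_share = 1.0 - delta_resp * proposer_share
    return 1.0 - proposer_share

# Binmore et al. (1985): 2 rounds, discount factor 1/4, pie of 100 pence.
print(100 * pecuniary_opening_offer(0.25, 0.25, 2))
```

With discount factors (0, 0) the function returns an offer of zero, which is the pecuniary prediction for the simple one-round ultimatum game.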
For completeness, we should also mention that the authors asked the β
players what offer they would hypothetically make if they had to play a
second game, this time as proposers. The average answer was 33.2 pence.
Neelin, Sonnenschein and Spiegel (1988)
Neelin, Sonnenschein and Spiegel challenged Binmore, Shaked and
Sutton's findings by extending the bargaining sequences to more than 2
rounds of negotiations (Neelin et al. 1988), and ran 2-, 3- and 5-round
bargaining games.
The 3-round version of the ultimatum game is an extension of the 2-round
version, where players α can refuse the counteroffer and go to a third
round of negotiations by making a counter-counteroffer that players β
can either accept or reject. The 5-round version is a yet longer version
of the game that relies on the same sequential bargaining mechanisms.
Discount factors are cumulatively applied at each round.
This experiment is one of the few that extend the ultimatum game up to
five rounds of negotiations (Neelin et al. 1988), and is therefore worth
mentioning.
In this experiment, 80 participants played 3 multi-round ultimatum
games, either as player α or player β (they did not switch roles). The pie
size was $5 for all 3 games (c=5). The first game was a 2-round version
with symmetric discount factors of ¼ (i.e., δ_α=δ_β=0.25); the second a
3-round version with a discount factor of ½; and the last was a 5-round
ultimatum game with a discount factor of 1/3. The discount rates were
chosen so that the seller's Rubinstein mean opening offer was $1.25 in
every game (Rubinstein 1982).
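The $1.25 benchmark can be checked by hand. The derivation below is a sketch under stated assumptions: a common discount factor δ applied cumulatively at each round, purely money-maximizing players, and a responder who accepts when indifferent. The symbol x_T (the opening offer as a share of the pie c in a T-round game) is introduced here for illustration only.

```latex
% x_T: equilibrium opening offer, as a share of the pie c, in a T-round game
\begin{align*}
x_2 &= \delta
    && \Rightarrow\quad c\,x_2 = 5 \times \tfrac{1}{4} = 1.25 \\
x_3 &= \delta\,(1-\delta)
    && \Rightarrow\quad c\,x_3 = 5 \times \tfrac{1}{2} \times \tfrac{1}{2} = 1.25 \\
x_5 &= \delta\bigl(1-\delta\bigl(1-\delta\,(1-\delta)\bigr)\bigr)
    && \Rightarrow\quad c\,x_5 = 5 \times \tfrac{20}{81} \approx 1.23
\end{align*}
```

With δ taken as exactly 1/3, the 5-round expression gives about $1.23 rather than $1.25; the small gap is consistent with the discount factor being reported here in rounded form.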
The observed mean opening offer was $1.37 (27.4%) in the 2-round
ultimatum game, replicating the findings of Binmore and his colleagues
that the pecuniary model of game theory had some predictive power.
With a pie size of $5, our model predicts $1.73 (34.6%). As a reminder,
Binmore's results were 41.9% in the first game, and 33.2% in their
hypothetical second game. This finding did not hold for the 3-round and
5-round versions, though:
R1: In the 3-round and 5-round versions of the ultimatum game, neither
“gamemenship” nor “fairmenship” theories predict the actual outcomes.
Neelin et al. found that mean opening offers in the 3-round and 5-round
versions of the ultimatum game were inconsistent with both the
pecuniary model of game theory and the equal split predictions. The
mean opening offers were $2.36 in the second game (3 rounds), and
$1.71 in the third (5 rounds). Our model predicts $1.80 and $1.77,
respectively.
Interestingly, Neelin and his colleagues concluded that players were
acting myopically, and wrote that “[sellers] always act as if they were in
the 2-round game.” They seem to suggest that the increasing number of
rounds (and players' inability to find the subgame perfect equilibrium as
the game becomes more complex) was the principal factor explaining the
discrepancies between observations and Rubinstein's predictions. In our
opinion, this explanation has some merit; one might find it difficult to
believe that players are able to mentally simulate up to five rounds of
negotiations using backward induction and correctly find the subgame
perfect equilibrium.
Another explanation of Neelin et al.'s findings could be that the key
driver of the results was the discount factors, not the number of rounds.
This alternative explanation could be formulated as follows:
R2: Mean opening offers increase as discount factors increase.
In other words, independently of the number of rounds played, the mean
opening offer gets closer to the 50:50 split as the discount factors
increase and favor multiple rounds of negotiations. Although our model
fails to replicate the amplitude of this trend, it predicts a similar
phenomenon:
Discount factor      0.25     0.33     0.50
# of rounds             2        5        3
Observations         1.37     1.71     2.36
Simulations          1.73     1.77     1.80
Table 2 – Mean opening offers in the 2-, 3- and 5-round versions of the
ultimatum game, with different discount factors, observations versus model
(source: Neelin, Sonnenschein and Spiegel, 1988).
Güth and Tietz (1988)
Güth and Tietz had a similar intuition, and ran a 2-round version of the
ultimatum game, only with much more extreme values for the discount
factors (Güth and Tietz 1988). 42 players participated in the experiment,
and played the 2-round ultimatum game under different combinations
of pie size (DM 5, 15 and 35) and discount factors (discount factors were
the same for both players, and equal to either 0.1 or 0.9). Note that the
pecuniary model of game theory suggests that in the δ=0.9 condition,
player α should propose 90% of the pie to player β and keep 10% for
himself.
R3: The mean demanded shares are always greater than 0.5.
On average and across all conditions, players α demanded 64.6% of the
pie and proposed 35.4% to the second player. Our model predicts 62.1%
and 37.9%, respectively.
R4: When the time costs of bargaining are rather low, subjects tend to
bargain longer (p.10).
The authors note a dramatic difference in the first-round rejection rate
across conditions. In the δ=0.1 condition, the rejection rate is 19.0%, but
it increases to 61.9% in the δ=0.9 condition. Our model predicts an
increase from 38.1% to 55.1%.
Note that the data were not sufficient to highlight a clear influence of the
size of the pie on either the rejection rates or the mean opening offers.
Since subjects played the game twice, the authors noticed that
experience seemed to induce a tendency to play fair, that is, to make
offers closer to the equal split. This trend was previously mentioned
(and replicated) in the simple version of the ultimatum game.
However, in Güth and Tietz's experiment, subjects played under completely
different conditions during the second game: they swapped places and
played the role of the other player, and were invited to bargain over
a pie of a different size, with a different discount factor. It is quite
challenging to postulate (and to replicate in computer simulations) how
these changes affected learning. Fortunately, Ochs and Roth's
systematic analysis of multi-round ultimatum games will shed some
light on these aspects of the model.
Ochs and Roth (1989)
Ochs and Roth tested four combinations of discount factors (δ_α, δ_β) for
both 2-round and 3-round ultimatum games, namely (.4, .4), (.4, .6),
(.6, .4) and (.6, .6). Participants bargained over a pie of $30 (i.e., c=30)
divided into 100 tokens. For instance, with a discount factor δ of .6, the
face value of a token was 30¢ in the first round, 18¢ (=30 × .6) in the
second, and 11¢ (≈30 × .6²) in the third round, if any.
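The token arithmetic can be reproduced with a short sketch. It assumes, as in the example above, that the face value of a token in a given round is the first-round value multiplied by the discount factor once per elapsed round; the function name is hypothetical.

```python
def token_value_cents(pie_dollars, n_tokens, delta, rnd):
    """Face value (in cents) of one token in round `rnd` (1-indexed),
    with the discount factor applied cumulatively at each round."""
    first_round_value = 100.0 * pie_dollars / n_tokens
    return first_round_value * delta ** (rnd - 1)

# Ochs and Roth (1989): $30 pie, 100 tokens, discount factor .6.
for rnd in (1, 2, 3):
    print(rnd, round(token_value_cents(30, 100, 0.6, rnd), 1))
```

The third-round value of 10.8¢ is the figure rounded to 11¢ in the text.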
There were a total of 8 conditions (2-round or 3-round versions, 4
combinations of discount factors per version), referred to by the authors
as cell 1 to cell 8. Each game was played 10 times. There were between 8
and 10 α- and β-bargainers per cell (Ochs and Roth 1989). Since the
authors find many empirical similarities between the 2- and 3-round
bargaining games, we will study them jointly.
We find the equilibrium of the game by backward induction, in the form
of a decision tree where each node represents a decision made by one of
the players based on the expected utilities of the strategies he can follow.
We then compare our predictions to several regularities found by Ochs
and Roth (Ochs and Roth 1989), and later commented on by Bolton and
Ockenfels (Bolton and Ockenfels 2000).
R5: There is a consistent first-mover advantage: α bargainers receive
more than β bargainers, regardless of the value of δ_β.
Although the authors do not report a particular measure of first-mover
advantage per se, we computed the ratio of α bargainers' payoffs over β
bargainers' payoffs (including the cases where no payoffs are received, or
when they are discounted). In the 2-round version of the ultimatum
game, the model suggests that first movers' payoffs are on average
18.9% higher than second movers' payoffs, ranging from 5.7% to 29.9%
depending on the particular combination of discount factors. That
confirms Ochs and Roth's observation that α bargainers receive more than
β bargainers, regardless of the value of δ_β. More specifically, the model
predicts that the first-mover advantage is largest when the second
player's payoff is heavily discounted (a 22% to 33% first-mover advantage
when δ_β=.4, versus 6% to 16% when δ_β=.6). In other words, the
first-player advantage is maximized when the second player has a great
deal to lose by going to the second round of negotiation, and thus has
less negotiation power. A similar pattern of findings is obtained with the
3-round version.
Ochs and Roth also note that for each of the four combinations of
discount factors they tested, the mean opening offer favored the first
player to the detriment of the second, that is, the first player
systematically proposed to keep more than half of the pie. This finding is
linked with the second regularity they found and that we replicate, too:
R6: Observed mean opening offers deviate from the pecuniary equilibrium
in the direction of the equal money division.
We compare the mean opening offer (made by the first player) to
observations averaged over all conditions (8 cells, 10 games per cell, 8 to
10 observations per game). The model predicts that the first player's
average offer will range between $12.7 and $13.9, with a mean of $13.1.
In the original experiment, the range was between $12.4 and $14.6, with a
mean of $13.6. Both observations and the model's simulations are closer to
the equal-split division than what the pure pecuniary model would have
suggested.
                   Two-round ultimatum game               Three-round ultimatum game
                cell 1   cell 2   cell 3   cell 4      cell 5   cell 6   cell 7   cell 8
(δ_α, δ_β)     (.4,.4)  (.6,.4)  (.6,.6)  (.4,.6)     (.4,.4)  (.6,.4)  (.6,.6)  (.4,.6)
Observations     12.4     14.6     14.2     13.7        13.0     13.4     13.6     14.0
Model            12.8     13.0     13.9     13.4        12.7     12.7     13.2     13.1
Figure 5 – Mean opening offer in the 2-round and 3-round ultimatum game,
observations versus model (source: Ochs and Roth, 1989), averaged over 10
games. The first player offers less than half of the pie to the second player,
whatever the discount factors (pie size=30). The mean opening offers are closer
to the equal-split division than suggested by the pecuniary model.
Note that the model suggests that the first player is likely to offer the
smallest share of the pie to the second player in the (.4, .4) condition,
that is, when both players have the most to lose by going to further
rounds of negotiation, both for the 2-round and 3-round versions. This
particular phenomenon is also observed in the original experiment.
R7: There are learning trends.
Independently of the mean opening offers, which vary from one condition
to another, the authors observed important learning trends over
time. Players modify their behaviors in light of recently gained
experience. Our model replicates many of these trends.
For instance, in the (.4, .4) condition of the 2-round ultimatum game, first
players decrease their opening offers over time, from 13.2 in the first
game to 12.0 in the last. Our predictions are 13.0 and 12.6, respectively.
At the other extreme, in the (.6, .6) condition of the 2-round ultimatum
game, first players learn to increase the offer they make to the second
player, from 13.9 in the first game to 14.7 in the tenth game, on average.
Our model predicts this upward trend, too, from 13.6 to 14.2 after 10
games.
[Figure 6: four panels, one per combination of discount factors
(δ_α, δ_β) = (.4, .4), (.6, .4), (.6, .6) and (.4, .6); horizontal axis: game
number (1 to 10); vertical axis: mean opening offer (10 to 16).]
Figure 6 – Mean opening offers in the 2-round version of the ultimatum game,
for the first ten games, with different discount factors for players α and β,
observations versus model (source: Ochs and Roth, 1989).
In our simulations of the 2-round ultimatum games, though, we fail to
replicate the learning involved in the (.4, .6) condition during the first few
games (although both simulations and observations converge to the same
equilibrium). Furthermore, the model correctly predicts the direction of
the learning trend in the (.6, .4) condition, but underestimates the mean
opening offer by an average of $1.60.
[Figure 7: four panels, one per combination of discount factors
(δ_α, δ_β) = (.4, .4), (.6, .4), (.6, .6) and (.4, .6); horizontal axis: game
number (1 to 10); vertical axis: mean opening offer (10 to 16).]
Figure 7 – Mean opening offers in the 3-round version of the ultimatum game,
for the first ten games, with different discount factors for players α and β,
observations versus model (source: Ochs and Roth, 1989).
The replications of the learning trends observed during 3-round
ultimatum games are quite satisfactory for all conditions except for the
(.4, .6) condition, where neither the amplitude nor the direction of
learning is correctly replicated.
R8: A substantial proportion of first-period offers are rejected.
The authors report an average 15.8% first-period rejection rate across all
conditions in the 2-round ultimatum game. The model predicts a
very similar rejection rate of 16.5%. Besides, the predictions are
directionally consistent with observations: the first-period rejection rate
is lowest in the (.4, .4) condition and highest in the (.6, .6) condition,
in both the model's predictions and observations.
                   Two-round ultimatum game               Three-round ultimatum game
                cell 1   cell 2   cell 3   cell 4      cell 5   cell 6   cell 7   cell 8
(δ_α, δ_β)     (.4,.4)  (.6,.4)  (.6,.6)  (.4,.6)     (.4,.4)  (.6,.4)  (.6,.6)  (.4,.6)
Observations     .100     .150     .188     .200        .120     .140     .144     .289
Model            .130     .167     .202     .170        .137     .164     .226     .180
Table 3 – Average rejection rate of first-period offers in the 2-round and
3-round versions of the ultimatum game, with different discount factors for
players α and β, observations versus model (source: Ochs and Roth, 1989).
Besides, rejection rates in the 3-round version of the ultimatum game
are slightly higher than in the 2-round version, both in observations
(17.1% vs. 15.8%) and in our predictions (17.5% vs. 16.5%).
Finally, as noted by the authors, first-offer rejection rates are higher
when the second player's discount factor is high. In other words, second
players are more likely to reject an offer when they have less to lose by
going to the second round. This finding is replicated by the model, too:
whether in the 2-round or 3-round versions, rejection rates are
higher when δ_β=.6 than when δ_β=.4.
R9: A substantial proportion of rejected first-period offers are followed
by disadvantageous counteroffers.
In Ochs and Roth's 2-round ultimatum game (1989), 101 of the 125
first-round rejections (observed across all conditions) are followed by
disadvantageous counteroffers (81.0%). Players end up refusing offers only
to make counteroffers that, in the end, give them less money (because of
the discount factors). This ratio is comparable to findings in other
experiments (Binmore et al. 1985; Neelin et al. 1988). The model predicts
96.8% of disadvantageous counteroffers. Despite the overestimation, this
is not inconsistent with the similarly high ratios of disadvantageous
counteroffers found in other experiments. For instance, Bolton (1991)
reports in his 2-round bargaining experiment that 24 of the 25 observed
counteroffers (93.3%) were disadvantageous (p.1103).
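The notion of a disadvantageous counteroffer can be made concrete with a small sketch. It assumes that shares are expressed as fractions of the original pie and that the rejecting player's second-round payoff is multiplied once by his own discount factor; the function name and the example figures are hypothetical.

```python
def is_disadvantageous(rejected_share, counter_demand, delta_responder):
    """True when the player who rejected `rejected_share` in round 1 asks,
    in round 2, for a share that is worth less once discounted."""
    return delta_responder * counter_demand < rejected_share

# Hypothetical example: beta rejects 40% of the pie, then demands 60%
# of a pie that is worth only .6 of its original value to him.
print(is_disadvantageous(0.40, 0.60, 0.6))   # 36% of the original pie
print(is_disadvantageous(0.40, 0.70, 0.6))   # 42% of the original pie
```

In the first case the counteroffer, even if accepted, leaves the second player with less money than the offer he turned down.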
R10: The value of δ_α influences the outcome.
According to the pecuniary equilibrium, the proportional allocation should
depend exclusively on the value of δ_β. Still, actual payoffs and observed
rejection rates are influenced by δ_α, as shown in both observations and
the model's predictions.
Bolton (1991)
Two-round bargaining game
Bolton also ran a 2-round version of the ultimatum game (Bolton 1991),
although players bargained over a pie of $12, and 2 different
combinations of discount factors were tested, namely (2/3, 1/3) and
(1/3, 2/3). The game was played 8 and 7 times, respectively.
Findings similar to the ones reported by Ochs and Roth were replicated by
Bolton (1991). We briefly report the results of our simulations, and
compare them to the observations made by the author.
The mean opening offers were 4.80 (40.0%) in the (2/3, 1/3) condition, and
5.78 (48.2%) in the (1/3, 2/3) condition on average across all games.
Simulations predict 4.97 (41.4%) and 5.21 (43.4%), respectively.
The rejection rates in the original experiment were similar across both
conditions, at 18.8% and 18.4% respectively. The model does not predict
any variation in the rejection rates across conditions either, but largely
overestimates these figures at 32.8% and 33.5% respectively.
The proportion of disadvantageous counteroffers was 85% and 20%
respectively in the observations, versus 95.5% and 69.0% predicted by the
model. Predictions are consistent with observations (disadvantageous
counteroffers are not rare, and occur more often in the (2/3, 1/3) condition
than in the (1/3, 2/3) one) but overestimated. One has to remember,
however, that the original ratios were computed on only about 10
observations each.
[Figure 8: two panels, one per combination of discount factors
(δ_α, δ_β) = (2/3, 1/3) (games 1 to 8) and (1/3, 2/3) (games 1 to 7);
horizontal axis: game number; vertical axis: mean opening offer.]
Figure 8 – Mean opening offers in the 2-round version of the ultimatum game,
for the first eight or seven games, with different discount factors for players α
and β, observations versus model (source: Bolton, 1991).
Truncation game
The truncation game is very similar to the 2-round version of the
ultimatum game, except that, if the second player refuses the first-round
offer and decides to make a counteroffer, the first player has no choice
but to accept (i.e., in the second round, the second player becomes a
dictator). As in the 2-round ultimatum game, discount factors
apply in the second round. In Bolton (1991), players bargained over a pie
of $12 divided into 100 tokens (c=12), and two combinations of discount
factors were tested, namely (2/3, 1/3) and (1/3, 2/3). Each game was
played 8 times.
R11: In the (2/3, 1/3) condition, observed mean opening offers deviate
from the pecuniary equilibrium in the direction of the equal money
division. The difference widens with experience.
The pecuniary equilibrium of the (2/3, 1/3) condition is for the second
player to accept any offer above 1/3 of the pie, and therefore for the first
player to make an offer of $4.08 (in Bolton (1991), each of the 100 tokens
had a face value of 12¢, and therefore 34 tokens with a total dollar value
of $4.08 was the smallest offer above $4 a player could possibly make).
On average, however, opening offers were equal to $4.62 in the first
game, and increased up to $5.21 in the last game. Our model predicts
$5.59 and $5.85, respectively.
The observed rejection rate was 39.1%. Our model predicts 37.0%.
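The token arithmetic behind the $4.08 equilibrium offer can be sketched as follows. The sketch works in whole cents to avoid rounding issues; the function name is hypothetical, and the 12¢ token value is the one reported for Bolton (1991).

```python
def smallest_offer_above(threshold_cents, token_cents):
    """Smallest whole number of tokens whose value strictly exceeds the
    threshold, returned together with that value in cents."""
    tokens = threshold_cents // token_cents + 1
    return tokens, tokens * token_cents

# Second player can secure 1/3 of $12 (= $4.00) by waiting:
print(smallest_offer_above(400, 12))
# Second player can secure 2/3 of $12 (= $8.00) by waiting:
print(smallest_offer_above(800, 12))
```

The two calls give 34 tokens ($4.08) and 67 tokens ($8.04), the pecuniary opening offers for the two discount-factor conditions of the truncation game.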
R12: In the (1/3, 2/3) condition, observed mean opening offers deviate
from the pecuniary equilibrium in the direction of the equal money
division. The difference narrows with experience.
The pecuniary equilibrium of the (1/3, 2/3) condition is for the second
player to accept any offer above $8, and therefore for the first player to
make an offer of $8.04 (67 tokens). In the observations, the mean
opening offer was $7.64 in the first game, and became exactly equal to the
equilibrium at $8.04 after 7 games. Our model predicts $6.77 in the first
game and $7.22 in the last. Since predictions underestimate the mean
opening offer, the model also overestimates the rejection rate (26.6% in
observations versus 53.5% in simulations).
Although the model overestimates the bias toward the equal division
split, both the patterns of findings and the directions of learning are
replicated.
[Figure 9: two panels, one per combination of discount factors
(δ_α, δ_β) = (2/3, 1/3) and (1/3, 2/3); horizontal axis: game number
(1 to 8); vertical axis: mean opening offer.]
Figure 9 – Mean opening offers in the truncation game, for the first eight games,
with different discount factors for players α and β, observations versus model
(source: Bolton, 1991).
Güth and van Damme (1998)
The Güth-van Damme three-person bargaining game is similar to
the simple, one-round version of the ultimatum game, except that there
is a third player with whom the pie has to be shared. The first player
proposes to the second player a division of the pie among all three
players, and the latter either accepts or rejects the offer. If the
proposition is rejected, players do not receive any payoffs; otherwise the
pie is split accordingly. In any case, the third player has no say.
This game challenges many conventional theories about fairness and
equity, and is therefore worth studying (Bolton and Ockenfels 1998; Güth
et al. 2002; Güth and van Damme 1998).
We compare our predictions to the simplest version (i.e., essential
information condition, constant mode) of the original experiment
conducted by Güth and van Damme (Güth and van Damme 1998). Players
had to share a pie of 24 Dutch guilders (divided into 120 tokens), which
was worth approximately $13.6 at the time (c=13.6).
Applying the model we developed so far to a 3-person game requires a
small modification of the utility function, though. Since 3 players are
involved, the reference share of each player's payoff is expected to be
one-third instead of one-half of the pie, and the deviation in terms of
"relative" money has to be modified accordingly, that is, the term
(σ − 1/2) in Equation 2 is replaced by (σ − 1/3). No other modification is
made, and the parameter b of the equation remains unchanged.
R13: The amount the dummy receives is very small.
Proposers generally offer much less than a third of the pie to the third
player (the dummy). On average during the first six games, the dummy's
share was 7.8 out of 120 tokens in the observations (6.5%). Our model
predicts 8.4 (7.0%), as shown below.
                   Observations    Model
Proposer (x)           79.1         76.3
Responder (y)          33.1         35.3
Dummy (z)               7.8          8.4
Table 4 – Average amounts (pie size=120 tokens) allocated to the three players
by the proposer in the essential information condition of the Güth-van Damme
game, observations versus model (source: Güth and van Damme, 1998).
R14: Rejection rates are lower in the 3-person Güth-van Damme game than
in the 2-person ultimatum game.
The average rejection rate in the original simple ultimatum game dataset
we used to fit our model (Roth et al. 1991) was .264 (.309 predicted), a
rate that is consistent with, although higher than, the typical 15-20
percent rejection rate observed in 2-person ultimatum games (Roth 1995).
It is .079 in the original Güth-van Damme dataset (p.241). Our model does
not capture this finding, and predicts a very high rejection rate of .281.
R15: There is a learning trend.
We have shown that a learning trend could be described by expressing
the decision parameter of the decision rule as a linear function of
players' experience. We apply the same scheme in this model to replicate
players' learning by predicting the game's outcome at the first, sixth,
twentieth and fiftieth games. As shown below, this does not affect y's
payoff much, but increases the proposer's payoff to the detriment of the
dummy's. In other words, the proposer learns that he can keep the
dummy's share of the pie without affecting the responder's likelihood of
accepting his proposals. The exact same pattern was found during
Güth and van Damme's actual experiment (p.239).
Model              Round 1    Round 6    Round 20    Round 50
Proposer (x)         75.9       76.3       78.7        82.8
Responder (y)        35.0       35.3       35.2        32.0
Dummy (z)             9.2        8.4        6.1         5.2
Table 5 – Division of the pie when players gain experience. The model replicates
observations: dummy's payoff decreases and proposer's payoff increases with
learning.
Note that the model correctly estimates both the nature and the direction
of players' learning, but underestimates the pace at which it occurs.
In fact, the predictions made by the model for the fiftieth game are very
close to the observations already made during the sixth game of Güth
and van Damme's experiment, namely x=80.8, y=33.3 and z=5.8.
Furthermore, the model predicts that the rejection rate will decrease
dramatically to .076 after 50 games, comparable to the actual rejection
rate observed during the experiment. It seems that the large overestimation
of the rejection rate (.281 predicted versus .079 observed) is mainly the
consequence of the model's inability to predict the pace at which learning
occurs, rather than the direction, nature or effects of such learning.
Summary
In this section, we have simulated 5 games (the 2-round, 3-round and
5-round versions of the ultimatum game, the truncation game, and the
Güth-van Damme 3-person bargaining game) and compared our out-of-sample
predictions to existing experiments in the literature. Most of the 15
major findings are replicated, and predictions are not only directionally
consistent with observations, but also often accurate. These results seem
to underline the strong generalization properties and predictive power of
the ERC utility-decision framework we have developed.
                                                  Observations    Model
GAME USED TO FIT THE MODEL
ONE-ROUND ULTIMATUM GAME (Roth et al., 1991)
  Mean opening offer                                  40.6%       40.7%
  Average rejection rate                              26.4%       30.9%
Table 6 – Summary of findings. Roth et al.'s simple version of the ultimatum
game has been used to fit the parameters of the model.
                                                  Observations    Model
OUT-OF-SAMPLE PREDICTIONS OF OTHER GAMES
TWO-ROUND ULTIMATUM GAME (Binmore, Shaked and Sutton, 1985)
  Mean opening offer                                  41.6%       41.4%
TWO-, THREE- AND FIVE-ROUND ULTIMATUM GAMES (Neelin et al., 1988)
  Mean opening offer (2R, δ=1/4)                      27.4%       34.6%
  Mean opening offer (5R, δ=1/3)                      34.2%       35.4%
  Mean opening offer (3R, δ=1/2)                      47.2%       36.0%
  Mean opening offer increases with discount factor
TWO-ROUND ULTIMATUM GAME (Güth and Tietz, 1988)
  Mean opening offer                                  35.4%       37.9%
  Demanded shares always greater than 50%
  Rejection rate in (δ=.1) condition                  19.0%       38.1%
  Rejection rate in (δ=.9) condition                  61.9%       55.1%
  Rejection rates increase with discount factors
TWO- AND THREE-ROUND ULTIMATUM GAMES (Ochs and Roth, 1989)
  Consistent first-mover advantage
  Mean opening offer (2R)                             45.8%       44.3%
  Mean opening offer (3R)                             45.0%       44.0%
  Mean opening minimum in (.4, .4) condition
  There are learning trends
  Average rejection rate (2R)                         15.8%       16.5%
  Average rejection rate (3R)                         17.1%       17.5%
  Rejection rate minimum in (.4, .4) condition
  Rejection rate maximum in (.6, .6) condition
  Disadvantageous counteroffers                       81.0%       96.8%
  Value of δ_α influences the outcome
Table 6 (cont'd) – Summary of findings. Major experimental findings are
replicated, and predictions are not only directionally consistent with
observations, but often very accurate.
Observations
Model
OUT-OF-SAMPLE PREDICTIONS OF OTHER GAMES (cont'd)
TWO-ROUND ULTIMATUM GAME (Bolton, 1991)
Mean opening offer in (2/3, 1/3) condition
Mean opening offer in (1/3, 2/3) condition
40.0%
41.4%
48.2%
43.4%
Mean opening minimum in (2/3, 1/3) condition
Average rejection rate
18.6%
33.2%
Disadv. counteroffers in (2/3, 1/3) condition
85.0%
95.5%
Disadv. counteroffers in (1/3, 2/3) condition
20.0%
69.0%
Disadv. counteroffers maximum in (2/3, 1/3) condition
TRUNCATION GAME (Bolton, 1991)
Mean opening in (2/3, 1/3) condition
Mean opening in (1/3, 2/3) condition
40.7%
47.8%
65.3%
58.3%
Mean opening deviates from equilibrium
Widens over time in (2/3, 1/3) condition
Narrows over time in (1/3, 2/3) condition
GÜTH-VAN DAMME ULTIMATUM GAME (Güth and van Damme, 1998)
Proposer's share
65.9%
63.6%
Responder's share
27.6%
29.4%
6.5%
7.0%
7.9%
28.1%
Dummy's share
Average rejection rate
(1)
Rejection rate lower than ultimatum game
no
There is a learning trend
(1)
Note: Unless indicated otherwise, all figures are related to offers (including the ones
eventually rejected) made during the first round (if more than one).
(1) The model predicts the average rejection rate will eventually decrease to 7.6%
after 50 games.
Table 6 (cont’d) – Summary of findings. Major experimental findings are
replicated, and predictions are not only directionally consistent with
observations, but often very accurate.
By plotting observations versus model predictions across all the games,
for both mean opening offers and rejection rates, one can see that mean
opening offers are often very accurately predicted across games.
Rejection rates, however, are often overestimated. One of the reasons
might be that the game we used to fit the model (i.e., Roth et al. 1991)
already has a high rejection rate (26.4%) compared to the 15-20% usually
found in one-round ultimatum games.
[Figure: two scatter plots of predictions (vertical axis, 0% to 70%)
against observations (horizontal axis, 0% to 70%).]
Figure 10 – Plots of mean opening offers (left chart) and first-round
rejection rates (right chart), observations versus predictions.
Squared-correlation statistics are .895 and .474, respectively. Mean opening
offers are predicted very satisfactorily across games and conditions, but
rejection rates are usually overestimated by the model. Note that rejection
rates are not always reported in all experiments (e.g., Binmore et al., 1985).
STABILITY OF THE PARAMETER ESTIMATES
We estimated the parameters of our model using Roth et al.’s
multi-country bargaining dataset (Roth et al. 1991), and then used these
parameter estimates to make out-of-sample predictions for several
other games. We used this particular dataset to fit our model for two
reasons. First, the one-round ultimatum game is the keystone of all
bargaining games; it is therefore natural to fit the model on the
simplest setting possible, and then test the out-of-sample validity of the
predictions on more elaborate versions of the game. Second, and more
practically, Roth et al.’s dataset is one of the rare very large datasets
available. Since maximum likelihood estimates can be heavily biased for
small samples, the latter reason is not trivial.
One might wonder, however, whether the parameter estimates we
obtained would have been different if we had fitted the model to another
dataset. To suit our needs, such a dataset should have the following two
desirable characteristics:
First, since one of the parameters of the model is a learning parameter,
the game should have been played repeatedly (i.e., over several periods)
with the same players to allow the model to capture learning trends.
Second, since the desirable properties of maximum likelihood estimates
are only achieved asymptotically (Eliason 1993), the dataset should have
a large number of observations per period. For instance, Eliason insists
that “in the typical ML estimation procedure, one would want to have a
large sample size because the desirable properties of the MLE (…) are
justified only in large sample situations.”
Fortunately, despite the lack of potential candidates, Ochs and Roth’s
datasets meet all the above criteria (Ochs and Roth 1989). We therefore
re-estimated the parameters of the model on the observations made by
Ochs and Roth in their 2-round and 3-round ultimatum games
separately. To maximize the size of each dataset, we estimated one set of
parameters to fit simultaneously all 4 conditions of each game (i.e., 4
combinations of discount factors), thus leading to two datasets of 380
observations each. To avoid artificial biases, observations were weighted
within each dataset so that each condition would contribute equally to
the respective log-likelihood function.
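The weighting scheme described above can be sketched as follows. This is a minimal illustration, not the estimation code used for the paper: the function names are hypothetical, and the normalization (weights summing to the number of observations) is an assumption, since the text only states that each condition contributes equally.

```python
from collections import Counter


def condition_weights(conditions):
    """Return one weight per observation so that every condition
    carries the same total weight in the log-likelihood.

    Assumption (not stated in the paper): weights are scaled so that
    they sum to the number of observations.
    """
    counts = Counter(conditions)
    n_obs = len(conditions)
    n_conditions = len(counts)
    # Each condition gets a total weight of n_obs / n_conditions,
    # shared evenly among its own observations.
    return [n_obs / (n_conditions * counts[c]) for c in conditions]


def weighted_log_likelihood(log_probs, weights):
    """Sum of per-observation log-probabilities, each multiplied by its weight."""
    return sum(w * lp for w, lp in zip(weights, log_probs))
```

With this scaling, a condition with few observations receives larger individual weights, so pooling unbalanced conditions does not let the largest one dominate the fit.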
Table 7 reports the parameter estimates fitted on the 3 datasets (the
original ultimatum game from Roth et al. used in the first sections of
this paper and the 2-round and 3-round versions of the ultimatum game
from Ochs and Roth), using maximum likelihood estimation.
Although most differences are statistically significant (the estimate of
b, however, is not statistically different between the first and the
second datasets, nor is τ1 between the first and the third datasets),
parameter estimates seem to be reasonably stable across games.
                                                   b          τ0         τ1
ONE-ROUND ULTIMATUM GAME (Roth et al., 1991)       10.742     0.3478     0.0159
                                                   (.995)     (.0189)    (.0038)
TWO-ROUND ULTIMATUM GAME (Ochs & Roth, 1989)       10.566     0.2704     0.0016
                                                   (2.258)    (.0267)    (.0038)
THREE-ROUND ULTIMATUM GAME (Ochs & Roth, 1989)     12.579     0.2206     0.0180
                                                   (1.485)    (.0190)    (.0050)
Table 7 – Parameter estimates of the model, using maximum likelihood
estimation and 3 different datasets. Estimates seem to be reasonably stable
across games.
ULTIMATUM GAME SIMULATIONS WITH ALTERNATIVE ASSUMPTIONS
ABOUT PLAYERS’ CHARACTERISTICS
In the first sections of this paper, we have assumed that the structural
form of players’ utility function followed an asymmetric ERC shape (i.e.,
presence of negative reciprocity, but no positive reciprocity), and that
choices were the outcome of a random process. The statistical fit
obtained on the simple version of the ultimatum game seemed to
confirm our hypotheses. First, introducing positive reciprocity did not
improve the overall fit of the model, and the null hypothesis that there
were no positive reciprocity considerations in players’ behaviors could
not be rejected; second, the decision parameter estimate was sufficiently
small to allow a great deal of variability in the decisions’ outcomes.
Taking the simple ultimatum game for illustration purposes, we could
modify the parameter values of the model for one or more players, and
see how these modifications affect the outcome of the game. The
following modifications can be applied to the model:
1. The decision parameter can be set to infinity (τ→∞), transforming
the probabilistic decision rule into a deterministic one; the
strategy with the highest expected utility would then be chosen
with a probability of 1.
2. The parameter b of the utility function can be set equal to 0,
transforming the nonlinear ERC utility function into a standard
linear function; players’ motivations become self-interested,
driven by monetary gains only, without equity considerations, as
suggested by the pecuniary model of classic game theory
(“greedy” utility function).
3. On the other hand, the ERC function can be made symmetric,
and b can be set positive even for σ>½, introducing some
altruistic considerations in players’ motivations (“symmetric
ERC” utility function).
These changes can be readily applied to the proposer, the responder, or
both. Table 8 shows the mean opening offer in the ultimatum game as
predicted by our simulations for all possible combinations. Cells in bold
are not statistically different from the observations made by Roth et al.
(1991), used to fit the model in the first place.
Responder
Greedy
Asymmetric ERC
(b>0 ⇔ σ<½)
(b=0)
Proposer
τ→∞
τ→∞
Greedy
τ→∞
.001
.039
Asymmetric
ERC
τ→∞
Symmetric
ERC
τ→∞
.410
(e)
.440
(e)
.410
.407
(e)
.435
(e)
.410
.280
.400
(d)
.280
.400
(d)
.191
.319
.419
(d)
.319
.415
(d)
.001
.140
.280
.400
(d)
.280
.400
(d)
.039
.189
.319
.408
(c)
.319
.408
(d)
(e)
.450
(d)
.410
(e)
.450
(d)
(e)
.455
(d)
.410
(e)
.454
(d)
(a)
.140
Symmetric ERC
(b>0, )
τ→∞
(b)
Table 8 – Mean opening offer (in percent) in the simple ultimatum game
(c=$10), as given by simulations, with different hypotheses about the
parameters. The cells in bold are not statistically different from the results
obtained by Roth et al. (1991).
For illustration purposes, cell (a) corresponds to the subgame perfect
equilibrium: the proposer and the responder are both only motivated by
their pecuniary gains (b=0), and they systematically choose the strategy
with the highest (expected) utility. Therefore, the first player offers the
smallest possible share of the pie to the second player, and the latter
accepts with a probability of 1. This illustrates the fact that the subgame
perfect equilibrium can be viewed as a special case of our model, where
b=0 and the decision parameter is set to an extremely large value.
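The limiting behavior of the decision rule can be illustrated with a short sketch. It assumes the Gibbs rule takes the usual form, with the probability of a strategy proportional to exp(τ·u); the function name and the utility values are hypothetical and chosen only for illustration.

```python
import math


def gibbs_choice_probabilities(utilities, tau):
    """Choice probabilities proportional to exp(tau * utility).

    Assumed functional form of the Gibbs decision rule. The maximum
    utility is subtracted before exponentiating to avoid overflow;
    this does not change the resulting probabilities.
    """
    u_max = max(utilities)
    weights = [math.exp(tau * (u - u_max)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical utilities of three strategies.
utilities = [1.0, 2.0, 3.0]
uniform = gibbs_choice_probabilities(utilities, 0.0)            # tau = 0: pure randomness
near_deterministic = gibbs_choice_probabilities(utilities, 1000.0)  # very large tau
```

With τ = 0 every strategy is equally likely; with a very large τ essentially all the probability mass falls on the strategy with the highest utility, which is the deterministic choice assumed in cell (a).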
It has been suggested that, under some conditions, even “greedy” and
“rational” players might choose to offer more than the minimum
suggested by game theory, in order to secure a rational response from
the second player. For instance, Binmore wrote that “the first player
might be dissuaded from making an opening demand at, or close to, the
‘optimum’ level, because his opponent would then incur a negligible cost
in making an ‘irrational’ rejection” (Binmore et al. 1985, p.1180). Cell (b)
replicates this reasoning: although both players are greedy, the
randomness associated with the second player’s decision to reject small
offers has to be compensated (by increasing the share offered to the
second player), so that the cost of an irrational rejection increases.
Our simulations suggest that this effect alone could explain an increase
in the mean opening offer up to .080, but is far from sufficient to
explain the .408 observed in the original experiment.
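The reasoning behind cell (b) can be sketched with a toy calculation. This is not the simulation used in the paper: the logit acceptance rule, the payoff scale (shares of a pie of size 1), the offer grid, and both function names are assumptions, so the resulting numbers are not comparable to those in Table 8.

```python
import math


def acceptance_probability(offer_share, tau):
    """Probability that a greedy but noisy responder accepts an offer.

    Assumption: accepting is worth offer_share, rejecting is worth 0,
    and the choice between the two follows a logit rule with
    decision parameter tau.
    """
    return 1.0 / (1.0 + math.exp(-tau * offer_share))


def best_offer(tau, grid_size=100):
    """Offer share that maximizes the greedy proposer's expected payoff,
    searched over an evenly spaced grid of shares between 0 and 1."""
    offers = [i / grid_size for i in range(grid_size + 1)]
    return max(offers, key=lambda s: (1.0 - s) * acceptance_probability(s, tau))


noisy_case = best_offer(10.0)              # responder makes frequent "errors"
almost_deterministic = best_offer(1000.0)  # responder almost never errs
```

Under these assumptions, the proposer facing an almost deterministic responder offers the smallest positive share on the grid, while the proposer facing a noisy responder raises the offer to make rejection more costly, which is the direction of the effect described above.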
Cell (c) is the standard model we used throughout this paper: players
share the same utility function (an asymmetric ERC curve), and both
players’ decisions are probabilistic.
Interestingly, as shown in cells (d), only two conditions suffice to
replicate the observations made by Roth et al. (1991). First, responders
must have an aversion to game outcomes that are in their disfavor
(b>0 if σ<½), and second, responders’ choices must be probabilistic.
Other than that, nothing matters. For instance, characteristics associated
with the first player are irrelevant to explain the results. If the two above
conditions were met, even a “greedy” (b=0) and “rational” (τ→∞) robot
that would maximize its expected gains by systematically following the
strategy with the highest expected utility would still have to select a
mean opening offer not statistically different from what Roth et al. have
observed in their multi-country bargaining experiment.
Also, as shown in cells (e), the proposer’s altruistic motives could by
themselves explain a large mean opening offer, although such a
hypothesis could be refuted by observations made in other games
(see for instance the Güth-van Damme game, where the proposer does
not seem to care about the dummy).
DISCUSSION AND CONCLUSIONS
In this paper, we have developed a utility-decision framework to explain
players’ behaviors in various games, based on the underlying ERC utility
function (to take into account players’ consideration for fairness and
equity) linked to actual decisions by a probabilistic Gibbs distribution (to
incorporate choice randomness and players’ heterogeneity). After fitting
the model to the ultimatum game and obtaining a good fit, we have used
this model to predict players’ behaviors in 5 other games, totaling 19
different experimental conditions, and replicated 15 major experimental
findings, often with surprising accuracy.
This model has many elements in common with other models proposed
in the literature to explain players’ behaviors, and yet presents many
original and remarkable properties that we will highlight by comparing it
to two well-known reinforcement learning models: Roth and Erev’s
simple reinforcement learning model (Erev and Roth 1998; Roth and Erev
1995), hereafter RE; and Sarin and Vahid’s dynamic model of choice
(Sarin and Vahid 1999; Sarin and Vahid 2001), hereafter SV.
Comparisons to reinforcement learning models
RE (i.e., Roth and Erev’s reinforcement learning model) models players’
propensities q_t(i) to choose action i at time t: at the beginning of the
game, each player has an initial propensity to play each possible strategy
(i.e., q_0(i) are exogenous to the model). The chosen strategy is then
determined by a linear probabilistic decision rule, where the probability
p_t(i) to choose action i at time t is defined by:
$$ p_t(i) = \frac{q_t(i)}{\sum_{j=1}^{N} q_t(j)} $$
Equation 7 – Linear probabilistic decision rule in the RE model (Roth and Erev
1995). Probability to take each action i at time t is a linear function of its
propensity q_t(i).
Then, the actual payoff x_t(i) that results from having chosen the ith action
at time t is observed, and the propensities are updated by the following
formula:
$$ q_{t+1}(i) = q_t(i) + x_t(i) - x_{\min} $$
Equation 8 – Updating procedure in the RE model (Roth and Erev 1995).
Propensities q(i) are linearly augmented with the actual, observed payoffs for
taking this action.
where x_min is the minimum payoff that can be experienced, so that
choosing an action that leads to the smallest possible reward (usually
zero for most games) is not reinforced.
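Equations 7 and 8 can be sketched in a few lines of Python. This is a minimal illustration of the basic RE rule as described above, not Roth and Erev's own implementation; the function names are hypothetical, and leaving the propensities of unchosen actions unchanged is an assumption consistent with Equation 8.

```python
import random


def choice_probabilities(propensities):
    """Equation 7: each probability is the action's share of total propensity."""
    total = sum(propensities)
    return [q / total for q in propensities]


def update_propensities(propensities, action, payoff, x_min=0.0):
    """Equation 8: add (payoff - x_min) to the propensity of the chosen action.

    Propensities of the other actions are left unchanged (assumption).
    """
    updated = list(propensities)
    updated[action] += payoff - x_min
    return updated


def play_round(propensities, payoff_fn, x_min=0.0, rng=random):
    """Draw an action from Equation 7, observe its payoff, apply Equation 8.

    payoff_fn is a hypothetical callable mapping an action index to a payoff.
    """
    probs = choice_probabilities(propensities)
    action = rng.choices(range(len(propensities)), weights=probs)[0]
    payoff = payoff_fn(action)
    return action, update_propensities(propensities, action, payoff, x_min)
```

For example, starting from propensities [1.0, 3.0], the second action is chosen with probability .75; an action that earns exactly x_min leaves all propensities unchanged.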
The SV model, although based on a similar set of equations, differs in
many ways from the RE model. First, SV does not model players’
propensities to choose different actions, but rather expected utilities of
these actions (Sarin and Vahid 2001), where expected utilities “represent
the subjective assessment of the player regarding the payoff she would
obtain from the choice of any strategy at any time” (p.105). Second,
players’ choices are not hypothesized to be probabilistic: they choose the
action with the highest expected utility with a probability of 1. In that
sense, the SV model is myopic [2]. Finally, the updating procedure also
differs in the sense that, after choosing the action i at time t and
observing the actual payoff x_t(i) of this choice, expected utilities are
updated by comparing them to actual payoffs as follows:
$$ u_{t+1}(i) = (1 - \lambda)\, u_t(i) + \lambda\, x_t(i) $$
Equation 9 – Updating procedure in the SV model (Sarin and Vahid 2001).
Expected utilities u(i) are iteratively shifted by a constant λ toward the latest
observed payoff of taking this action.
where λ is a small positive constant smaller than 1 (usually equal to
0.01). At the very extreme, if λ=0, there are no updates in expected
utilities and u_t(i) = u_0(i), ∀t.
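The SV choice rule and Equation 9 can be sketched as follows. This is a minimal illustration under the description above, not Sarin and Vahid's own code; the function names are hypothetical, and breaking ties in favor of the lowest action index is an assumption.

```python
def greedy_action(expected_utilities):
    """Deterministic SV choice: index of the highest expected utility.

    Ties are broken in favor of the lowest index (assumption).
    """
    return max(range(len(expected_utilities)), key=lambda i: expected_utilities[i])


def update_expected_utilities(expected_utilities, action, payoff, lam=0.01):
    """Equation 9: move the chosen action's expected utility toward the payoff.

    Expected utilities of the other actions are left unchanged.
    """
    updated = list(expected_utilities)
    updated[action] = (1.0 - lam) * updated[action] + lam * payoff
    return updated
```

With lam=0 the expected utilities never move from their initial values, which matches the extreme case noted above; with lam=1 the expected utility of the chosen action is replaced by the latest payoff.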
[2] In learning theory, it is also said that players follow a “greedy” decision rule.
Our model diverges noticeably from these two models in three ways.
First, although the utility-decision framework we have presented
in this paper has some learning components embedded in the decision
rule (i.e., the coefficient of certitude is a linear function of players’
experience), it is not a learning model per se. Players’ behaviors are not
reinforced by trial and error (as in the RE model), and changes in
strategy are not explained by continuously refined estimates of expected
utilities thanks to experience (as in the SV model). In a sense, our model
incorporates a learning component that is independent of the actual
choices made by players, which is a possible limitation of our model.
Second, our model explicitly incorporates a nonlinear social utility
function, while RE and SV implicitly use a linear utility function: RE
(resp. SV) updates the propensities of choices (resp. expected utilities)
through a linear transformation of game payoffs, without taking into
account decreasing marginal utilities, expectations of fairness or sense of
equity.
Third, our model does not need prior knowledge of any sort to be able to
replicate and simulate new games. While RE [3] needs initial propensities
q_0, and SV needs initial expected utilities u_0, our model does not have
these prerequisites, and therefore is more general and less
information-sensitive.
[3] There has been a recent attempt by Roth and Erev to circumvent this model limitation, although
the presence of initial propensities still seems to be critical to predict the outcome of a large
family of games.
Limitations and directions for future research
The model we have developed, despite its good statistical fit and
predictive power, has several limitations we address here.
First, the learning component of the ERC utility-decision framework is
independent of players’ past actions, which seems unrealistic. The
learning trend embedded in the model is expected to arise from a
decrease in players’ heterogeneity and in choice randomness, but the way
these phenomena are linked to experience and trial and error (i.e.,
reinforcement learning) has still to be investigated.
A second limitation of this model is the way variability is accounted for.
One can argue that players’ heterogeneity and individuals’ choice
randomness (the two main components that explain why game
outcomes are probabilistic) are not very well captured by our model, the
reason being that they are confounded in the parameter of the
decision rule, and therefore cannot be analyzed and estimated
separately.
A third limitation is that it is unclear so far whether or not the
parameters of the model are actually stable, and whether they should be
modified (if at all) to better account for certain populations (e.g.,
students vs. other types of participants) or certain game configurations
(e.g., varying number of players).
Contributions and conclusion
The first contribution of our model is that it does not rely upon any kind
of prior knowledge about players’ initial expectations, utilities, or
propensities to play certain strategies. Most of the criticisms that have
been made against reinforcement learning models are indeed about the
critical role of these initial values in the models: where they come from,
and the amount of information they contain (that is, whether or not the
models proposed to explain players’ behaviors only “magnify”
information already contained in these initial values external to the
model, and therefore do not explain much). Erev and Roth (1998) say
their model is “agnostic about where the initial propensities come from”.
Similarly, Sarin and Vahid (2001) write that “initial assessments u_0 may
have been formed by hearsay, strategy labels, or by similarity of the
decision situation to other decision problems that the individual may
have faced in the past […]”. We offer an alternative to these ad hoc
approaches by linking behaviors to an underlying utility function that is
stable over time, constant whatever the game played, and shared by all
players whatever their specific roles in the game. Consequently, unlike
existing models, our model can hypothetically be used to
predict ex ante the outcomes of never-before-played games.
The second contribution of this paper is to address one of the major
criticisms made of social utility theory, namely its lack of
quantification. By quantifying the unique parameter of the utility
function we employ, our model gives a sense of magnitude about the
players’ motivations. Although most of the findings presented here have
already been explained by an exploration of the theoretical properties of
the ERC utility function (Bolton 1991; Bolton and Ockenfels 2000), these
properties could not be quantified without computing the most likely
values of the parameters.
A third contribution is the remarkable generalization properties of our
model when applied across different datasets, collected at different
points in time by different authors and for different games.
This finding seems to underline the relatively good stability of the
model’s parameters.
Finally, this paper also lays the groundwork for more sophisticated
models that should investigate a different way to model variability by
disentangling heterogeneity of players and randomness of individuals’
choices, and investigate further the possible asymmetry between positive
and negative reciprocities, possibly by studying games that involve both
types of behaviors.
REFERENCES
Binmore, Kenneth, Avner Shaked, and J. Sutton (1985), "Testing
Noncooperative Bargaining Theory: A Preliminary Study," The American
Economic Review, 75 (December), 1178-80.
Bolton, Gary E. (1991), "A Comparative Model of Bargaining: Theory and
Evidence," The American Economic Review, 81 (5), 1096-136.
Bolton, Gary E. and Axel Ockenfels (2000), "ERC: A Theory of Equity,
Reciprocity, and Competition," The American Economic Review, 90 (1),
166-93.
---- (1998), "Strategy and Equity: An ERC-Analysis of the Güth-van
Damme Game," Journal of Mathematical Psychology, 42 (2/3), 215-26.
Bush, Robert and Frederick Mosteller (1955), Stochastic Models for
Learning. New York: Wiley.
Camerer, Colin (1990), "Behavioral Game Theory," in Insights in Decision
Making: A Tribute to Hillel J. Einhorn, Robin Hogarth, Ed. Chicago:
University of Chicago Press.
Camerer, Colin and Teck Hua Ho (1999), "Experience-weighted Attraction
Learning in Normal Form Games," Econometrica, 67 (4), 827-74.
Cheung, Yin-Wong and Daniel Friedman (1997), "Individual Learning in
Normal Form Games: Some Laboratory Results," Games and Economic
Behavior (19), 46-76.
---- (1995), "Individual Learning in Normal Form Games: Some
Laboratory Results,". Santa Cruz: Mimeo, University of California.
Cooper, David and Nick Feltovich (1996), "Reinforcement-Based Learning
vs. Bayesian Learning: A Comparison,": Mimeo, University of Pittsburgh.
Cox, James, Jason Shacht, and Mark Walker (1995), "An Experiment to
Evaluate Bayesian Learning of Nash Equilibrium,": Mimeo, University of
Arizona.
Davison, A.C. and D.V. Hinkley (1997), Bootstrap Methods and Their
Application. Cambridge Series in Statistical and Probabilistic
Mathematics: Cambridge University Press.
Efron, B. and R. J. Tibshirani (1993), An Introduction to the Bootstrap.
Monographs on Statistics and Applied Probability. London: Chapman &
Hall/CRC.
Eliason, Scott R. (1993), Maximum Likelihood Estimation: Logic and
Practice. Sage University Paper on Quantitative Applications in the Social
Sciences. Thousand Oaks, CA: Sage.
Erev, Ido, Yoella Bereby-Meyer, and Alvin E. Roth (1999), "The Effect of
Adding a Constant to All Payoffs: Experimental Investigation, and a
Reinforcement Learning Model with Self-Adjusting Speed of Learning,"
Journal of Economic Behavior and Organization, 39 (1), 111-28.
Erev, Ido and Alvin E. Roth (1995), "On the need for low rationality,
cognitive game theory: Reinforcement learning in experimental games
with unique, mixed strategy equilibria," in MIMEO. University of
Pittsburgh.
---- (1998), "Predicting how people play games: Reinforcement learning
in experimental games with unique, mixed strategy equilibria," American
Economic Review, 88 (4), 848-81.
Estes, William K. (1950), "Toward a Statistical Theory of Learning,"
Psychological Review, 57 (2), 94-107.
Fehr, Ernst, Georg Kirchsteiger, and Arno Riedl (1993), "Does Fairness
Prevent Market Clearing? An Experimental Investigation," The Quarterly
Journal of Economics, 108 (2), 437-59.
Fehr, Ernst and Klaus Schmidt (1997), "How to Account for Fair and
Unfair Outcomes - A Model of Biased Inequality Aversion," in Symposium
on Economic Theory. Gerzensee, Switzerland.
Feltovich, Nick (2000), "Reinforcement-based vs. belief-based learning
models in experimental asymmetric-information games," Econometrica,
68 (3), 605-41.
Grosskopf, Brit (1999), "Competition, Aspiration and Learning in the
Ultimatum Game: An Experimental Investigation," in 1999 European
Economics Association Meetings. Universitat Pompeu Fabra.
Güth, Werner, Carsten Schmidt, and Matthias Sutter (2002), "Bargaining
Outside The Lab – A Newspaper Experiment Of A Three Person Ultimatum Game,".
Güth, Werner, R. Schmittberger, and B. Schwarze (1982), "An
Experimental Analysis of Ultimatum Bargaining," Journal of Economic
Behavior and Organization, 3, 367-88.
Güth, Werner and Reinhard Tietz (1988), "Ultimatum Bargaining for a
Shrinking Cake, An Experimental Analysis," in Working paper.
Güth, Werner and Eric van Damme (1998), "Information, Strategic
Behavior and Fairness in Ultimatum Bargaining, An Experimental Study,"
Journal of Mathematical Psychology, 42 (2/3), 227-47.
Hopkins, Ed (1999), "Learning, Matching, and Aggregation," Games and
Economic Behavior, 26, 79-110.
Luce, Duncan R. (1959), Individual choice behaviour. New York: Wesley.
Manly, Bryan F. J. (1997), Randomization, Bootstrap and Monte Carlo
Methods in Biology, Second Edition: CRC Press.
Neelin, Janet, Hugo Sonnenschein, and Matthew Spiegel (1988), "A Further
Test of Noncooperative Bargaining Theory," The American Economic
Review, 78 (September), 824-36.
Ochs, Jack and Alvin E. Roth (1989), "An Experimental Study of
Sequential Bargaining," The American Economic Review, 79 (June), 355-84.
Rabin, Matthew (1993), "Incorporating Fairness into Game Theory and
Economics," American Economic Review, 83 (5), 1281-302.
Rapoport, Amnon and Ido Erev (1998), "Coordination, "magic", and
reinforcement learning in a market entry game," Games and Economic
Behavior, 23 (2), 146-75.
Roth, Alvin E. (1995), "Bargaining Experiments," in Handbook of
Experimental Economics, J. Kagel and A. E. Roth, Ed. Princeton: Princeton
University Press.
Roth, Alvin E. and Ido Erev (1995), "Learning in Extensive-Form Games:
Experimental Data and Simple Dynamic Models in the Intermediate
Term," Games and Economic Behavior, 8 (Special Issue: Nobel
Symposium), 164-212.
Roth, Alvin E., Vesna Prasnikar, Masahiro Okuno-Fujiwara, and Shmuel
Zamir (1991), "Bargaining and Market Behavior in Jerusalem, Ljubljana,
Pittsburgh, and Tokyo," American Economic Review, 81 (5), 1068-95.
Rubinstein, A. (1982), "Perfect Equilibrium in a Bargaining Model,"
Econometrica, January.
Sarin, Rajiv and Farshid Vahid (1999), "Payoff Assessments Without
Probabilities: A Simple Dynamic Model of Choice," Games and Economic
Behavior, 28, 294-309.
---- (2001), "Predicting How People Play Games: A Simple Dynamic
Model of Choice," Games and Economic Behavior, 34 (1), 104-22.
Shao, Jun and Dongsheng Tu (1996), The Jackknife and Bootstrap.
Springer Series in Statistics: Springer Verlag.
Suppes, P. and R. C. Atkinson (1960), "Markov Learning Models for
Multiperson Inter-Actions," Review of Metaphysics, 15, 196.
Swarthout, Todd and Mark Walker (1999), "Reinforcement, Belief
Learning, and Information Processing," in Summer 1999 ESA Meeting.