Combining “Real Effort” with Induced Effort Costs: The Ball-Catching Task
Simon Gächter, Lingbo Huang, Martin Sefton *
CeDEx and School of Economics, University of Nottingham
14 February, 2015
PRELIMINARY: PLEASE DO NOT CITE WITHOUT PERMISSION
Abstract
We introduce a novel computerized real effort task which combines "real" effort in the lab with an
induced material cost of effort. The central feature of this task is that it allows researchers to
manipulate the cost of effort function as well as the production function, which permits quantitative
predictions on effort provision. In an experiment with piece-rate incentive schemes we find that the
predictions on effort provision are remarkably accurate. We also present experimental findings from
some classic experiments, namely team production, gift exchange and tournament, using the task. All
of the results are closely in line with the stylized facts from experiments using purely induced values.
Keywords: Experimental design, real effort task, incentives.
Acknowledgement: We would like to thank participants at the CBESS-CeDEx-CREED meeting in
Nottingham, 2014, the ESA European meeting, Prague, 2014, the Nordic Conference on Behavioral
and Experimental Economics, Aarhus, 2014, and the Max Planck Institute for Tax Law and Public
Finance for valuable comments. Simon Gächter acknowledges support from the European Research
Council Advanced Investigator Grant 295707 COOPERATION.
* Email: Simon Gächter ([email protected]); Lingbo Huang ([email protected]);
Martin Sefton ([email protected]).
1. Introduction
Experiments using “real effort” tasks enjoy increasing popularity among experimental economists.
Some frequently used tasks include, for instance, number-addition tasks (e.g., Niederle & Vesterlund
(2007)), counting-zeros tasks (e.g., Abeler et al. (2010)) and slider-positioning tasks (Gill & Prowse
(2012)). 1 In this paper, we present a novel computerized real effort task, called the "ball-catching
task", which combines "real" effort in the lab with an induced material cost of effort.
This task shares an advantage of other real effort tasks in that subjects are required to do something
tangible in order to achieve a level of performance, as opposed to simply choosing a number (as is
done in experiments that implement cost of effort functions using a pure induced value method, where
different number choices are directly linked with different financial costs). A drawback, however, of
existing real effort tasks is that in using them the researcher sacrifices considerable control over the
cost of effort function. As noted by Falk and Fehr (2003, p. 404): “while ‘real effort’ surely adds
realism to the experiment, one should also note that it is realized at the cost of losing control. Since
the experimenter does not know the workers’ effort cost, it is not possible to derive precise
quantitative predictions”. Incorporating material effort costs re-establishes a degree of control over
effort costs and, as we shall demonstrate, allows researchers to manipulate observable effort costs and
make point predictions on effort provision.
In the ball-catching task, a subject has a fixed amount of time to catch balls that fall randomly from
the top of the screen by using mouse clicks to move a tray at the bottom of the screen. Control over
the cost of effort is achieved by attaching material costs to mouse clicks that move the tray.
In our initial study we examine performance on this task under piece-rate incentives. Subjects incur a
cost for each mouse click and receive a prize for each ball caught. We first estimate the relationship
between clicks and catches and then use this to predict how the number of clicks will vary as the costs
of clicking and the benefits of catching are manipulated. We find that efforts, as measured by the
number of mouse clicks, are close to predicted efforts both at the aggregate and at the individual level.
These findings also add to the literature on empirical testing of incentive theories (Prendergast (1999))
by presenting real effort experimental evidence that basic piece-rate incentive theory works
quantitatively well. By comparison, the prominent field evidence reported by Lazear (2000) and lab
evidence reported by Dickinson (1999) provide support for comparative static predictions of basic
incentive theory, whereas we show that the theory also predicts exact effort levels accurately.
1
To our knowledge, the first experimental study using a real effort task for the purpose of testing incentive
theory is Dickinson (1999) in which subjects were asked to type paragraphs in a 4-day period. Another early
study implementing a real effort task within a typical laboratory experiment is van Dijk, Sonnemans & van
Winden (2001). See Appendix A for a comprehensive list of existing real effort tasks.
In the second part of the paper, we report further experiments that demonstrate how the task can be
implemented in some classic experiments. We administer the task in three classic experiments used to
study cooperation, fairness and competition, namely, team production (e.g., Nalbantian & Schotter
(1997)), gift exchange (e.g., Fehr, Kirchsteiger & Riedl (1993)) and tournament (e.g., Bull, Schotter &
Weigelt (1987)). In all three experiments, the results reproduce the stylized findings from previous
experiments using purely induced values.
In our last study, we introduce an online version of the ball-catching task and conduct the same
experiment as in the first study using Amazon Mechanical Turk workers. The comparative statics
results are largely replicated, but we note caveats about making quantitative predictions with this
particular online version.
2. Design of the Ball-Catching Task
The ball-catching task is a computerized task programmed in z-Tree (Fischbacher (2007)), and
requires subjects to catch falling balls by moving a tray on their computer screens. Figure 1 shows a
screenshot of the task. In the middle of the screen there is a rectangular task box with four hanging
balls at the top and one tray at the bottom. Once a subject presses the “Start the task” button at the
lower right corner of the screen, the timer starts and the balls fall from the top of the task box. In the
version used in this paper, balls fall one after another at a fixed time interval and the column in
which each ball falls is randomly determined by the computer. The software allows adjusting the
speed of the falling balls and the time interval between successive balls. It is also possible to change
the number of columns and even to fix the falling pattern rather than randomizing it. As will be
discussed later, flexibility in all these parameters allows tight control over the production function in
this task, that is, the relationship between the number of balls caught and the number of clicks made.
To catch the falling balls, the subject can move the tray by mouse clicking the “LEFT” or “RIGHT”
buttons below the task box. At the top of the screen, the number of balls caught (CATCHES) and the
number of clicks made (CLICKS) are updated in real time. As will be clear later, we will interpret the
number of clicks as the amount of effort exerted by the subject in this task.
What distinguishes our ball-catching task from other real effort tasks is that we attach pecuniary costs
to mouse clicks. By specifying the relation between clicks and pecuniary costs we can implement any
material cost of effort function. The most convenient specification might be to induce a linear cost
function by simply attaching a fixed cost to every mouse click, but it is also possible to induce any
convex, concave or even irregular cost function. In Figure 1 the subject incurs a cost of 5 tokens for
each mouse click. Accumulated costs (EXPENSE) are updated and displayed in real time. It is also
possible to attach pecuniary benefits to catches. In Figure 1 the subject receives 20 tokens for each
ball caught and accumulated benefits (SCORE) are updated on screen in real time.
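As a minimal sketch, induced cost schedules of this kind could look as follows. The linear case mirrors the 5-tokens-per-click cost of Figure 1; the convex variant is a hypothetical example:

```python
# Minimal sketch of induced cost-of-effort schedules attached to
# clicks. The linear case mirrors Figure 1 (5 tokens per click);
# the convex case is a hypothetical alternative.
def linear_cost(clicks, cost_per_click=5):
    return cost_per_click * clicks    # constant marginal cost

def convex_cost(clicks, scale=1):
    return scale * clicks ** 2        # marginal cost rises with clicks

print(linear_cost(10))   # 50 tokens after 10 clicks
print(convex_cost(10))   # 100 tokens after 10 clicks
```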
Figure 1: A screenshot of the ball-catching task
Another distinction from existing real effort tasks is that in those tasks output and effort are typically
indistinguishable. For instance, the output in the slider task, that is, the number of sliders correctly
positioned, is also interpreted as effort. In the ball-catching task there is a clear distinction between
the catches and the clicks variables, with the natural interpretation being that the former represents
output and the latter input. Moreover, by choosing the time constraint, ball speed, etc., the researcher
has flexibility in selecting the production technology.
Evidence collected in a post-experimental questionnaire suggests that subjects find the ball-catching
task easy to understand and learn. In the next section we examine in more detail how subjects perform
on the task under piece-rate incentives.
3. Study 1: Testing the Ball-Catching Task
3.1. Experimental design
Study 1 examines performance on the ball-catching task under piece-rate incentives. Each subject
worked on the same ball-catching task for 36 periods. Each period lasted 60 seconds. In each period
one combination of prize-per-catch (either 10 or 20 tokens) and cost-per-click (0, 5 or 10 tokens) was
used, giving six treatments that are varied within subjects (see Table 1). The first 6 periods, one
period of each treatment in random order, served as practice periods for participants to familiarize
themselves with the task. Token earnings from these periods were not converted to cash. The
following 30 periods, five periods of each treatment in completely randomized order (i.e., unblocked
and randomized), were paid out for real. In all, 64 subjects participated in the experiment, with
average earnings of £13.80 for a session lasting about one hour. 2
Table 1: Treatments in Study 1

Treatment No.          1    2    3    4    5    6
Prize per catch (P)    10   10   10   20   20   20
Cost per click (C)     0    5    10   0    5    10
Given a particular piece-rate incentive, what level of effort would subjects exert? For illustrative
purposes, suppose utility is increasing in the financial reward, which is given by \(PQ - Ce\),
where \(Q\) is the number of catches and \(e\) is the number of clicks. Assume the relationship between \(Q\)
and \(e\) is given by \(Q = f_i(e, \epsilon_i)\), where the function \(f_i\) is an individual-specific production function,
with \(f_i' > 0\) and \(f_i'' < 0\), and \(\epsilon_i\) is a random shock uncorrelated with effort. Given these assumptions
the expected-utility-maximizing number of clicks satisfies:

    \(e_i^* = f_i'^{-1}(C/P)\).    (1)
Note that optimal effort is homogeneous of degree zero in \(C\) and \(P\). We recognize that the marginal
production function might be different for each individual and therefore the optimal individual effort
may also vary. However, to be able to make predictions at the aggregate level, we will estimate the
average production function and then use this estimate along with our incentive parameters to predict
average optimal effort. This approach relies on several assumptions. First, the production function
should be concave. Second, the production function should be independent of \(C\) and \(P\). Third, there is
no other subjective valuation in agents' decision-making.
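To illustrate equation (1), suppose, purely hypothetically, that the production function is \(f(e) = a\sqrt{e}\); then \(f'(e) = a/(2\sqrt{e})\) and \(e^* = (aP/2C)^2\), which is unchanged when \(C\) and \(P\) are scaled proportionately:

```python
# Illustration of equation (1) for a hypothetical concave production
# function f(e) = a*sqrt(e); the paper instead estimates f empirically.
def optimal_clicks(P, C, a=5.0):
    if C == 0:
        return float("inf")  # no interior optimum without clicking costs
    # f'(e) = a / (2*sqrt(e)) = C/P  =>  e* = (a*P / (2*C))**2
    return (a * P / (2 * C)) ** 2

print(optimal_clicks(P=10, C=5))   # 25.0
print(optimal_clicks(P=20, C=10))  # 25.0: homogeneous of degree zero
print(optimal_clicks(P=20, C=5))   # 100.0: higher P/C, more clicks
```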
This model assumes that there are no cognitive or psychological costs/benefits associated with
working on the task. We can think of two plausible types of unobservable cost/benefit. First, output
may be a function of cognitive effort as well as the number of clicks. For example, output may depend
not just on how many clicks a subject makes but also on how intelligently a subject uses her clicks. If
the production function is given by \(f(e, t)\), where \(t\) represents cognitive effort, then \(e^*\) will reflect a
trade-off between \(C/P\) and \(t\). If \(e\) and \(t\) are substitute inputs then a proportionate increase in \(C\) and \(P\) will
result in a decrease in \(e^*\) as the subject substitutes more expensive clicking with more careful thinking.
Second, subjects may incur psychological costs from the action of clicking or enjoy psychological
benefits from catching balls. For example, if utility is increasing in \((P + B)Q - Ce\), where \(B\) is a non-
monetary benefit from a catch, then a proportionate increase in \(C\) and \(P\) will result in a decrease in \(e^*\).

2
The experiment was run in two sessions at the CeDEx lab at the University of Nottingham with subjects
recruited using the online campus recruitment system ORSEE (Greiner (2004)). Instructions are reproduced in
Appendix B.
Our experiment is designed to directly examine the importance of incorporating material costs of
effort. One could use the ball-catching task without material costs of effort, basing comparative static
predictions (e.g., that the number of catches will increase as the prize per catch increases) on
psychological costs of effort. However, as with many existing real effort tasks, the ball-catching task
without material effort costs might exhibit a "ceiling effect", that is, unresponsiveness of effort to
varying prize incentives. 3 A ceiling effect in the ball-catching task would indicate that inducing
material costs is a desirable design feature. For this reason our design includes two treatments in
which the material cost of clicking is zero (treatments 1 and 4 in Table 1). These allow us to test
whether there is a ceiling effect in the ball-catching task without induced effort costs.
Our experimental treatments also allow us to test whether unobservable costs/benefits matter relative
to induced effort costs in the ball-catching task. Our design includes two treatments that vary \(C\) and
\(P\) while keeping the ratio \(C/P\) constant (treatments 2 and 6 in Table 1). In the absence of
unobserved costs/benefits, the distribution of clicks should be the same in these two treatments; the
presence of unobserved costs/benefits would instead lead to systematic differences. Note that with this
design the prediction that optimal effort is homogeneous of degree zero in \(C\) and \(P\) can be tested
without the need to derive the underlying production function \(f\), since all that is needed is a
comparison of the distributions of clicks between these two treatments. Our design further includes
four other treatments that also vary the ratio \(C/P\). Along with the first two treatments, this variation
in incentives allows us to recover a more accurate estimate of the underlying production function. We
then use the estimated function to predict the optimal effort in each treatment.
3.2. Comparative statics results
Before presenting results on the quantitative predictions, we first discuss the comparative statics
results with an emphasis on the importance of inducing financial costs in the ball-catching task.
A comparison between treatments 1 and 4 examines, in the context of the ball-catching task, the
ceiling effect that is observed in many other real effort tasks. If there is no "real" psychological
cost/benefit associated with working on the task, we should observe the same distribution of the
number of clicks in these two treatments, thus exemplifying the typical ceiling effect. The closeness
of the distributions in treatments 1 and 4 is statistically supported by a two-sample Kolmogorov-
Smirnov test (p=0.843) and by a Wilcoxon matched-pairs signed-ranks test (p=0.215), using each
subject's average effort per treatment as the unit of observation. Our interpretation is that without
clicking costs subjects simply catch as many balls as they can. We can further examine
intra-individual treatment differences by computing the within-subject difference in average effort;
this difference is within ±12% of the average effort for about 80% of our subjects.

3
See an early review in Camerer and Hogarth (1999). Another possible reason for the "ceiling effect" is that
subjects may simply work on the paid task due to experimenter demand effects, particularly in the absence of
salient outside options.
Figure 2: The distributions and the Kernel density distributions of the number of clicks
Will induced costs of effort reduce the number of clicks? And will clicks depend on the prize when
there are clicking costs? Figure 2 depicts the distribution of the number of clicks for each treatment.
First, we perform a horizontal comparison across treatments 1, 2 and 3, where the prize is always 10,
and a horizontal comparison across treatments 4, 5 and 6, where the prize is always 20. In both
comparisons we observe a clear shift of the distribution of the number of clicks when moving from
the treatment with lower induced effort costs to the one with higher induced effort costs. A Friedman
test for systematic differences in matched subjects' observations, using each subject's average effort
per treatment as the unit of observation, shows that the differences across treatments are highly
significant in both comparisons (p<0.001). Next, we perform two vertical comparisons, between
treatments 2 and 5 and between treatments 3 and 6. Holding the clicking costs constant, we find that a
higher prize leads to a higher number of clicks in both comparisons (Wilcoxon matched-pairs
signed-ranks test: p<0.001).
The sharp contrast between the treatments with and without induced effort costs illustrates the
importance of inducing financial costs in the ball-catching task, both in avoiding the undesirable
ceiling effect and in moving effort provision in the predicted direction.
Our final comparative statics result concerns the effect of financial stakes on effort, that is, the effect
of changing stakes without altering the ratio of the marginal cost to the marginal prize. As discussed
before, if subjects derive unobserved costs/benefits we would observe systematic differences in the
distributions of clicks between the high-stakes and low-stakes treatments. However, using each
subject's average effort per treatment as the unit of observation, neither a two-sample Kolmogorov-
Smirnov test (p=0.843) nor a Wilcoxon matched-pairs signed-ranks test (p=0.880) suggests
significant differences between treatments. 4 Financial stakes do not appear to affect subjects'
average effort.
For a more detailed overview of individual effort, in Appendix C1 we summarize the average number
of clicks in each treatment for each individual. As a further comparative statics exercise, we can
check whether each individual exhibited qualitatively consistent behavior, by which we mean that the
ranking of actual efforts is the same as the ranking of predicted efforts across all treatments. As
treatments 2 and 6 predict the same effort level, we take the average of the actual efforts in these two
treatments and enter only this average into the test of consistent behavior for each subject; the same
step applies to treatments 1 and 4. Only 3 of the 64 subjects (less than 5%) behaved inconsistently: in
all three cases the effort in treatment 5 is lower than the average effort in treatments 2 and 6.
In sum, we have shown that the comparative statics predictions hold well in the experimental data. It
is not the marginal cost or the marginal prize alone that affects behavior, but their ratio. This also
implies that inducing financial costs circumvents the ceiling effect.
3.3. The production function

As discussed above, our approach depends on inducing a concave production function that is
independent of the incentives. The mathematical derivation of the exact functional form is
unfortunately almost intractable and, therefore, we instead estimate the production functional
form. 5 Our empirical strategy is to first retrieve the production functional form from the full sample
and check some of its basic properties, including concavity. Next, we check whether the estimates
(intercept and shape) of the production function, given its functional form, are invariant to varying
levels of piece-rate incentives. If so, then the estimated production function is itself unaltered across
different economic environments. We will also examine the stability of the production function out of
sample.

Figure 3: The estimated production functional form from a fractional polynomial regression
Note: The first entry in (*,*) denotes the prize per catch and the second the cost per click. The fitted production
functional form is given by \(Q = \alpha_0 + \alpha_1 e^{0.5} + \alpha_2 e^2\), where \(Q\) denotes the
number of catches and \(e\) the number of clicks. The coefficient estimates are from a fractional polynomial
regression.

In Figure 3 we visualize the production function with the estimates from the fractional polynomial
regression. 6 We see a clearly concave production function, reminiscent of the neoclassical
assumption of a production technology with diminishing marginal returns to inputs. A slight decrease
at the tail suggests that this production function also features a "technological ceiling" beyond which
higher effort provision may actually lead to lower production levels. However, most of these
observations come from treatments with zero cost of clicking. As one of the main advantages of the
ball-catching task is precisely that clicking can be made costly, the drop in the tail should be of little
concern, and the empirical production function largely preserves its monotonic nature.

4
An intra-individual treatment difference examination shows that the treatment difference is within ±25% of
the average effort for about 83% of our subjects.
5
In the version of the ball-catching task used in Study 1, it is almost impossible to catch every ball. Given the
randomness of the falling pattern, even if we were able to simulate the optimal number of clicks for a given
ratio of cost and prize, the fact that the subject cannot move fast enough to catch every ball he/she should catch
prevents us from deriving a useful production function that describes actual behaviour.
6
Fractional polynomials afford more flexibility than conventional polynomials by allowing logarithms and
non-integer powers in the models.
Using the production functional form suggested by the fractional polynomial regression, we move on
to estimate the following random coefficients panel data model:

    \(Catches_{i,r} = \beta_0 + \beta_1 Clicks_{i,r}^{0.5} + \beta_2 Clicks_{i,r}^{2} + \delta_r + \omega_i Clicks_{i,r} + u_{i,r}\),    (2)

where \(Catches_{i,r}\) and \(Clicks_{i,r}\) are respectively the number of catches and the number of
clicks of subject \(i\) in period \(r\), and \(\delta_r\) are period dummies (with the first period as the
omitted category). The random effect \(\omega_i\) is multiplicative in clicks, is assumed to vary only
across subjects, and has variance \(\sigma_\omega^2\). The random error \(u_{i,r}\) is assumed to be
identically and independently distributed over subjects and periods with variance \(\sigma_u^2\). The
motivation for this specification is that we observe non-negligible variation in individual effort in
every treatment, and output also appears more variable at larger numbers of clicks. Multiplicative
heterogeneity allows for persistent individual differences in the marginal production function. The
inherent randomness of the falling balls is captured by the random error in the production function:
the location from which a ball falls is random, but not the speed or the total number of balls, so a
subject's total number of clicks should not be correlated with the random error. 7 All equations are
estimated by maximum likelihood and estimates are reported in Table 2.
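As a check on the estimation logic (not the authors' actual maximum likelihood procedure), one can simulate data from a model of this form with assumed parameter values and verify that the fractional-polynomial shape is recovered. The parameter values below are illustrative, loosely inspired by Table 2; a simple least-squares fit suffices here because the random slope has mean zero and is independent of clicks:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_periods = 64, 30

# Assumed parameter values, loosely inspired by Table 2 (illustrative
# only, not the authors' data or exact estimates).
b0, b1, b2, sd_w, sd_u = 10.0, 5.5, -0.003, 0.07, 3.2

subj = np.repeat(np.arange(n_subj), n_periods)
clicks = rng.integers(5, 61, size=n_subj * n_periods).astype(float)
w = rng.normal(0.0, sd_w, n_subj)[subj]   # multiplicative random effect
u = rng.normal(0.0, sd_u, clicks.size)    # i.i.d. error
catches = b0 + b1 * np.sqrt(clicks) + b2 * clicks**2 + w * clicks + u

# Recover the fractional-polynomial shape by least squares.
X = np.column_stack([np.ones_like(clicks), np.sqrt(clicks), clicks**2])
beta, *_ = np.linalg.lstsq(X, catches, rcond=None)
print(beta)  # close to the assumed (b0, b1, b2)
```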
Table 2: Random coefficients regressions for model (2) in Study 1

Dep. Var.: Catches    (1) Full sample    (2) Prize=10    (3) Prize=20
Intercept             10.093***          11.247***       8.943***
                      (0.526)            (0.791)         (0.788)
Clicks^0.5            5.517***           5.269***        5.773***
                      (0.103)            (0.131)         (0.161)
Clicks^2              -0.003***          -0.003***       -0.004***
                      (0.000)            (0.001)         (0.000)
σ_ω                   0.070***           0.066***        0.071***
                      (0.007)            (0.007)         (0.011)
σ_u                   3.162***           3.134***        3.136***
                      (0.052)            (0.074)         (0.074)
N                     1920               960             960

Note: All period dummies are included and insignificant except for period 2 using the full sample.
***p < 0.01
7
This assumption is supported by a one-way analysis of variance of the number of clicks for each treatment. In
all six treatments, an F-test strongly suggests that most of the variation in clicking is accounted for by
between-subjects variation (p<0.001).
Columns (1), (2) and (3) in Table 2 report the coefficient estimates for the full sample, the sub-sample
with a prize of 10 and the sub-sample with a prize of 20. One notable observation is the closeness of
the estimates of the production function parameters across all equations. The figure in Appendix C2
visualizes this by comparing the fitted production functions for the two sub-samples with different
prizes using the estimates in Table 2. The two production functions almost coincide in both intercept
and shape.
To formally test whether the production function is invariant over different incentives, we estimate an
augmented model adding interactions of the covariates \(Clicks^{0.5}\) and \(Clicks^{2}\) (i.e., the
parameters of the shape of the production function) with a binary variable indicating whether the prize
is 10 or 20. This approach ensures sufficient observations over the whole range of the production
function for both sub-samples with different prizes, while still allowing an informative test. By
placing restrictions on all these interaction terms, we can perform a likelihood ratio test of whether the
shape of the production function interacts with different incentives. We cannot reject the null
hypothesis that the same shape of the production function applies to all treatments (χ²(2)=3.43,
p=0.180). Note that we claim that the shape of the production function is invariant to the different
incentives, but not the intercept. Although in principle the intercept should also not vary across
treatments, its coefficient may be imprecisely estimated for treatments with the higher prize because
most of the low or zero clicks are observed in the treatment with the highest cost-to-prize ratio
(treatment 3). Indeed, estimating instead an augmented model with interactions of the intercept,
\(Clicks^{0.5}\) and \(Clicks^{2}\) with the prize dummy, a likelihood ratio test comparing its
log-likelihood to that of model (2) on the full sample rejects the null hypothesis that the production
function is invariant in both intercept and shape across sub-samples (χ²(3)=8.18, p=0.042). We do not
think, however, that this weaker claim is detrimental to our main purpose: after all, only the shape of
the production function matters for the quantitative predictions of effort.
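The p-values reported for these likelihood ratio tests follow directly from the chi-square distribution with degrees of freedom equal to the number of restrictions. A stdlib-only sketch, using the closed-form survival functions for 2 and 3 degrees of freedom:

```python
import math

# Chi-square survival function, closed forms for df = 2 and df = 3,
# enough to reproduce the p-values of the likelihood ratio tests.
def chi2_sf(x, df):
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + \
               math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 2 and df = 3 in this sketch")

print(round(chi2_sf(3.43, 2), 3))  # 0.18  (shape restrictions)
print(round(chi2_sf(8.18, 3), 3))  # 0.042 (intercept + shape)
print(round(chi2_sf(1.26, 3), 3))  # 0.739 (across sessions)
```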
To assess the stability of the production function out of sample, we split the full sample by
experimental session and fit an augmented model adding interactions of the intercept, \(Clicks^{0.5}\)
and \(Clicks^{2}\) with the session dummy. We cannot reject the null hypothesis that both the intercept
and the shape of the production function are invariant across sessions (χ²(3)=1.26, p=0.739). 8 This
implies that there is essentially no harm in using the production function estimated from the full
sample to make predictions of effort even on the original sample: whichever sub-sample is used as the
basis for out-of-sample predictions of effort on the other sub-sample, the predictions will be the same.
This result also supports our earlier conjecture that the intercept appears to vary with incentives
simply because the fewer observations of low clicks in the treatments with a prize of 20 render the
estimate of the intercept imprecise.

8
A graphical demonstration of the closeness of the two fitted production functions is shown in Appendix C3.
Turning back to the estimates from the full sample, we observe that the random effect is statistically
significant. The model can therefore explain the dispersion of effort within each treatment as partly
due to persistent individual tendencies to click more or less often. The variance of the random error is
also statistically significant, so randomness in the falling of the balls affects the level of the
production function.

In sum, we find that the shape of the estimated production function is invariant to different levels of
incentives, and the production function is also stable out of sample. There exist both persistent and
transitory individual differences in clicking, which explain the dispersion in effort.
3.4. Predicted and actual effort levels: Do induced effort costs dominate psychological
costs/benefits?
With the production function estimated from model (2) and the treatment parameters, we can now see
how the quantitative predictions on effort perform. The evidence can be readily seen in Table 3,
which compares the predicted effort derived from (1) with the actual number of clicks for every
treatment. 9 Figure 4 visualizes these results, combining categories whenever treatments have the
same predicted efforts.
Table 3: Comparisons between the predicted number of clicks and the actual number of clicks

Treatment No.          1       2       3       4       5       6
Prize per catch (P)    10      10      10      20      20      20
Cost per click (C)     0       5       10      0       5       10
Predicted clicks       57      20      8       57      34      20
Actual clicks          57.8    18.6    8.8*    58.7    30.9*   18.4
                       (1.15)  (0.96)  (0.48)  (1.20)  (1.73)  (0.99)

Note: standard errors in parentheses (clustered at the subject level). One-sample two-tailed t-test:
*p < 0.1.
Consistent with the absence of unobserved costs/benefits, we find that the predictions in all treatments
are remarkably accurate, although subjects slightly over-provide effort in treatment 3 and slightly
under-provide effort in treatment 5 (both significantly different from the predicted efforts at the 10%
level). 10 Overall, our subjects seem to understand the incentive structure very well and respond to it
in the predicted way. The results are surprising given that subjects could not know the production
function a priori and therefore could not deliberately calculate the optimal level of effort.
Nonetheless, they behaved as if they knew the underlying structural parameters and responded to
them optimally.

9
Note that we have assumed a continuous production function. This assumption is made mainly for
expositional and analytical convenience; in reality, the production function is a discrete relationship between
catches and clicks. We report as the predicted effort the integer number of clicks that maximizes profit in each
treatment.
10
In Appendix C4, we also produce a similar table by separately calculating the actual number of clicks for
each experimental session. To the extent that this can be viewed as an out-of-sample exercise, the resulting
statistics corroborate our claims here.
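The prediction exercise in Table 3 amounts to a discrete profit maximization. A sketch using the rounded full-sample coefficients from Table 2 (because the published coefficients are rounded, the resulting integers only approximate the predicted clicks reported above):

```python
# Discrete profit maximization over clicks, using the rounded
# full-sample estimates from Table 2; rounding means the integers
# below only approximate the paper's predicted clicks.
b0, b1, b2 = 10.093, 5.517, -0.003

def f(e):
    """Estimated production function: expected catches from e clicks."""
    return b0 + b1 * e**0.5 + b2 * e**2

def predicted_clicks(P, C, max_e=100):
    """Integer e maximizing expected profit P*f(e) - C*e."""
    return max(range(max_e + 1), key=lambda e: P * f(e) - C * e)

for P, C in [(10, 0), (10, 5), (10, 10), (20, 0), (20, 5), (20, 10)]:
    print((P, C), predicted_clicks(P, C))
```

In particular, the optimum depends only on the ratio C/P, so treatments (10, 5) and (20, 10) yield the same predicted clicks.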
Figure 4: The distributions and the Kernel density distributions of the actual number of clicks and the
predicted clicks
Note: The vertical line in each sub-figure represents the predicted average effort.
We also administered a post-experimental questionnaire in Study 2 (reported in the next section) in
which we asked subjects to rate the difficulty, enjoyableness and boredom of the task. On average
subjects found the task very easy to do and had neutral attitudes towards its enjoyableness and
boredom, which also supports our assumptions about psychological costs/benefits.
A comparison between treatments 1 and 4, where the cost of clicking is zero, provides another lens on
the relevance of psychological costs/benefits. In the absence of psychological costs/benefits, effort
provision should be similar in these two treatments. If instead there were "real" psychological
costs of effort, as often assumed in other real effort tasks, effort should be higher in
the higher-powered treatment 4 than in the lower-powered treatment 1. Table 3 shows that
this is clearly not the case: actual effort and its variation are very similar in the two treatments.
A two-sample Kolmogorov-Smirnov test for equality of the distributions of clicks also finds no
significant difference (two-tailed, p=0.521). This finding suggests that relying solely on an alleged "real"
psychological cost of effort has a serious limitation: it leaves the task exposed to ceiling effects. This is
true for the ball-catching task and probably applies to many other real effort tasks.
We conclude that incentive theory predicts behavior well both at the aggregate and at the individual
level. Material incentives seem to dominate cognitive and psychological concerns when subjects work
on the task.
4. Study 2: Applications - Team Production, Gift Exchange and Tournament
The previous section has shown the accuracy of predictions on effort using the ball-catching task. In
this section, we implement the same task in three classic experiments that have been used to study
cooperation, fairness and competition. These examples showcase that the ball-catching
task is not confined to individual choice environments but is widely applicable to interactive
environments. In particular, we are interested in whether our real effort task produces behavior
similar to findings from experiments using induced value techniques. An important
feature of the design in this study is that we utilize the estimated production function
from Study 1 to derive predictions on effort whenever possible.
In each of the following experiments, we recruited 32 subjects per treatment, and each treatment ran for 10
periods. We ran five sessions and recruited 160 subjects in total. After each session, a post-experimental
questionnaire was administered asking about subjects' perceptions of the ball-catching task
itself, including its difficulty, enjoyableness and boredom. Further details of the design, including the matching
protocol, the parameters of the task and the payoff structure, are specific to each experiment and are
therefore explained separately below. All of the experiments were run at the CeDEx
lab at the University of Nottingham. No session lasted more than one hour and average
earnings were around £13.0.
4.1. Team production
The understanding of free-riding incentives in team production is at the heart of contract theory and
organizational economics (Holmstrom (1982)). The standard experimental framework for studying
team production is the voluntary contribution mechanism, in which the socially desirable outcome
conflicts with individual free-riding incentives (see Chaudhuri (2011) for a recent survey in the context
of public goods).
We implemented a team production (TP) experiment in which four team members worked on the ball-catching
task independently. The same four subjects played as a team for the entire 10 periods. Each
ball caught contributed 20 tokens to the team output, while each subject bore his/her own cost of
clicking of 5 tokens per click. At the end of each period, the total team output
was shared equally among the four team members, and each member's payoff was
his/her share of the output net of his/her individual cost of clicking. The
revenue-sharing rule in team production therefore reduces each individual's marginal reward from catching balls
by 75% without changing the marginal cost of clicking. We also ran two control treatments, each of
which is essentially the piece rate experiment of Study 1. The first treatment (PR20) has a prize per
catch of 20 tokens and a cost per click of 5 tokens; the second (PR5) has a prize per catch of 5 tokens
and a cost per click of 5 tokens.
Consider a subject in the TP treatment. If the subject behaves selfishly,
his/her effort provision should be the same as in PR5; if instead he/she is primarily concerned
with maximizing total team production, his/her effort provision should be the same as in
PR20. Our hypothesis is that free-riding incentives drive effort towards the level predicted by
selfishness, as observed in many similar experiments using induced values (e.g., Nalbantian &
Schotter (1997) and many public goods experiments using voluntary contribution mechanisms).
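The free-riding logic can be made concrete with a small sketch of the TP payoff structure (the function and parameter names are ours, not taken from the experimental software):

```python
# Sketch of the TP incentives: each catch adds 20 tokens to team output,
# output is shared equally among n = 4 members, and each member privately
# pays 5 tokens per click.
def tp_payoff(own_catches, own_clicks, others_catches, prize=20, cost=5, n=4):
    team_output = prize * (own_catches + others_catches)
    return team_output / n - cost * own_clicks
```

The private marginal reward per catch is 20/4 = 5 tokens, exactly the PR5 incentive, while the marginal reward at the team level remains 20 tokens, the PR20 incentive; the marginal cost of clicking is 5 tokens in both cases.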
Figure 6 displays the average numbers of clicks in all treatments. The two horizontal lines represent
the Nash predictions on optimal effort levels in PR20 and PR5 respectively. Note that we have used
the estimated production function from Study 1 to compute the optimal effort levels.
Figure 6: Average clicks over time in team production
Figure 6 shows a clear decline in average effort over time in TP: average effort falls
from 30 clicks in the first period to just above 17 clicks in the last. By comparison, average effort in PR20
decreases from 38 to 32 and in PR5 from 16 to 8, consistent with our findings in Study 1.
Subjects in TP recognize the free-riding incentives from the very first period and steadily decrease their
effort towards the individually optimal level. Although average effort does not reach that level,
recall that our design has only 10 periods and uses a partner matching protocol. This empirical
result is certainly in line with findings from experiments using induced values.
4.2. Gift exchange
The gift exchange experiment (Fehr, Kirchsteiger & Riedl, 1993) postulates that firms
offer workers more than competitive wages because they understand the reciprocal nature of
workers' effort provision, a mechanism coined "gift exchange" in the efficiency wage literature.
We implemented a bilateral gift exchange experiment similar to Gächter & Falk (2002). The firm
could offer a wage between 0 and 1000 tokens to the matched worker, who then worked on the ball-catching
task. We made two key changes to the task for this experiment.
First, we deliberately reduced the number of balls that could be caught within 60
seconds from 52 to 20 by increasing the time interval between falling balls. We made this change to
reduce the influence of random shocks as much as possible: by making it possible to catch
every ball, workers' reciprocal behavior can be reflected in their efforts as well as in
their actual outputs. Second, the cost schedule was changed to a convex function in accordance with
the parameters used in most gift exchange experiments. The full cost schedule is reported in Table 5.
Table 5: The cost schedule in gift exchange

No.    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
Cost   5    5    6    6    6    7    7    7    7    8    8    8    8    8    9

No.    16   17   18   19   20   21   22   23   24   25   26   27   28   29   30+
Cost   9    9    9    9    10   10   10   10   10   11   11   11   11   11   12
For example, the column with No. 6 means that the 6th click costs 6 tokens, the column with No. 21
means that the 21st click costs 10 tokens, and finally the last column with No. 30+ means that the
30th and any further click costs 12 tokens. Notice that if, for example, the worker makes a total of
three clicks she will incur a total cost of 5 + 5 + 6 = 16 tokens.
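The cumulative cost implied by the schedule can be sketched as follows; the marginal-cost list transcribes the schedule of Table 5 as reconstructed here, so it should be checked against the original before reuse.

```python
# Sketch: total clicking cost under the convex schedule of Table 5.
# MARGINAL_COST[k] is the cost of the (k+1)-th click; the 30th and any
# further click costs 12 tokens.
MARGINAL_COST = [5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8,
                 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11]

def total_cost(clicks):
    base = sum(MARGINAL_COST[:min(clicks, len(MARGINAL_COST))])
    extra = max(0, clicks - len(MARGINAL_COST)) * 12
    return base + extra
```

For instance, three clicks cost 5 + 5 + 6 = 16 tokens, matching the worked example in the text.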
For the firm, each ball caught by his/her matched worker adds 50 tokens to his/her payoff. For the
worker, the payoff is the wage received net of the cost of clicking. To compensate for
any possible losses during the experiment, every firm and worker receives 300 tokens at the beginning
of each period.
We ran two sessions, one with stranger matching and the other with partner matching.
Figure 7 shows the relationship between outputs and wages in the upper panel and the
relationship between efforts and wages in the lower panel. The data suggest a clear reciprocal
pattern in both treatments, whether we look at outputs or efforts. As predicted from the experiments of
Falk, Gächter & Kovács (1999) and Gächter & Falk (2002), who also compared partners and strangers
in gift exchange, the reciprocal pattern is stronger when it is possible to build up a reputation between
a firm and a worker. Our findings for strangers are also comparable to an early real
effort gift exchange experiment by Gneezy (2002), who used a maze-solving task to measure workers'
performance, although his experiment was conducted in a one-shot setting.
Figure 7: Reciprocal patterns in gift exchange

Note: The upper panel shows the relationship between outputs (score) and wages received in both
treatments, and the lower panel shows the relationship between efforts (clicks) and wages received.
The two sub-figures for the stranger matching treatment are on the left and the two sub-figures
for the partner matching treatment are on the right. The fitted lines are estimated from
non-parametric Lowess regressions (bandwidth = 0.8).
To provide some statistical evidence, we estimate the following random effects panel data model of
the number of clicks on the wage received:

Clicks_{i,r} = β · wage_{i,r} + ω_i + δ_r + u_{i,r}

where ω_i denotes unobserved individual heterogeneity, δ_r denotes the period
dummies with the first period providing the omitted category, and u_{i,r} is an idiosyncratic
error term. ω_i is assumed to be identically and independently distributed over subjects with
variance σ_ω², and u_{i,r} is assumed to be identically and independently distributed over subjects
and periods with variance σ_u².
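For intuition, the wage coefficient can be illustrated with a bare-bones pooled OLS slope. This is a sketch only: the random effects estimator used in the paper additionally models the subject-level effects ω_i and the period dummies, both of which are omitted here.

```python
# Illustration only: pooled OLS slope of clicks on wage. The paper's
# random effects estimator additionally models subject effects and period
# dummies; this stripped-down version captures just the reciprocity slope.
def ols_slope(wages, clicks):
    n = len(wages)
    w_bar, c_bar = sum(wages) / n, sum(clicks) / n
    cov = sum((w - w_bar) * (c - c_bar) for w, c in zip(wages, clicks))
    var = sum((w - w_bar) ** 2 for w in wages)
    return cov / var
```

Applied to exactly linear hypothetical data, e.g. wages (0, 500, 1000) paired with clicks (3, 10, 17), the slope is 0.014, the order of magnitude of the partner-treatment estimate in Table 6.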
Table 6 reports the parameter estimates. We observe statistically significant reciprocal responses
by workers, who click more often at higher wages, and the strength of reciprocity is stronger between
partners than between strangers.
Table 6: Random effects regressions for worker's clicks in gift exchange

Dep. Var.: Clicks             (1) Stranger        (2) Partner
wage                          0.003**             0.014***
                              (0.001)             (0.003)
Constant                      3.279***            3.746***
                              (0.681)             (1.444)
σ_ω                           1.753               3.346
σ_u                           2.649               4.397
Hausman test for random       df=10               df=10
versus fixed effects          p=1.000             p=0.956
N                             160                 160

Note: the period dummies are included and insignificant.
***p < 0.01, **p < 0.05.
4.3. Tournament
Tournament incentive schemes, such as sales competitions and job promotions, are an important
example of relative performance incentives (Lazear & Rosen (1981)). An early laboratory
experiment by Bull, Schotter & Weigelt (1987) found that tournament incentives indeed induced
efforts as predicted theoretically, but that the variance of behavior was much larger under tournament
incentives than under piece rate incentives.
We implemented a simple simultaneous tournament in which two players compete for a prize worth
1200 tokens. The winner in each period earns 1200 tokens net of any cost of clicking, whereas the
loser receives 200 tokens net of any cost of clicking. The cost per click is always 5 tokens. Each
player's probability of winning follows a piecewise linear success function (Che & Gale, 2000; Gill &
Prowse, 2012): (own output − opponent's output + 50)/100. We use this contest success function
instead of the rank-order tournament of Bull, Schotter & Weigelt (1987) because it allows us to make
a point prediction on expected effort regardless of the shape of the random shocks in the production
function. The marginal benefit of catching a ball equals the marginal probability of
winning multiplied by the prize spread between the winner and loser prizes. With the
specified piecewise linear success function, the marginal probability of winning is always
1/100, so the marginal benefit of catching equals 10 tokens while the marginal cost per click remains
5. Once again, we can simply utilize the estimated production function from Study 1 to compute the
optimal effort level, which turns out to be 20.
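The tournament payoffs can be sketched directly from the contest success function (the function and variable names are ours):

```python
# Sketch of the tournament payoffs: piecewise linear contest success
# function (own output - opponent's output + 50)/100, clipped to [0, 1].
def win_prob(own_output, opp_output):
    return min(1.0, max(0.0, (own_output - opp_output + 50) / 100))

def expected_payoff(own_output, own_clicks, opp_output,
                    winner_prize=1200, loser_prize=200, cost_per_click=5):
    p = win_prob(own_output, opp_output)
    return p * winner_prize + (1 - p) * loser_prize - cost_per_click * own_clicks
```

In the interior of the success function, one extra catch raises the winning probability by 1/100, so the marginal benefit is (1200 − 200)/100 = 10 tokens against a marginal click cost of 5 tokens.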
Figure 8 displays average effort across all subjects over time. We observe a quick convergence
towards the predicted effort level. The variance of effort in the tournament also appears to be larger
than that observed in the treatments with the same predicted effort in Study 1: the standard deviation of
effort is around 12 in the former and 9.6 in the latter. However, this difference is smaller than what
Bull, Schotter & Weigelt (1987) found in their induced value experiment, in which the standard
deviation of effort under tournament incentives was more than double that under piece rate incentives.
Figure 8: Average clicks over time in tournament
In sum, in Study 2 we found that our real effort task produces behavior that is closely in line with the
stylized findings from three classic experiments that previously used induced values: free-riding
incentives in team production drove down average effort, workers reciprocated higher wages by
catching more balls, and behavior in tournaments was more varied than under piece-rate schemes.
5. Study 3: Online Ball-Catching Task
In this section, we introduce an online version of the ball-catching task.11 The online task has been
designed to resemble the lab version as closely as possible. The purpose of this section is to show the
potential for administering the ball-catching task in online experiments, which have proven
to be a valuable complement to physical laboratory experiments.

We ran the same experiment as in Study 1 on the popular online labor market platform
Amazon Mechanical Turk (MTurk). In total, we recruited 95 subjects from MTurk, 74 of whom
finished the task. Recruitment took only around 10 minutes. Given the unusually long duration of
the task (50 minutes), the 78% completion rate suggests that our promised payment was sufficiently
attractive to most workers on MTurk. The average payment, including a $3 participation fee, was
around $5.9, which is well above what most MTurk tasks offer. The average age of our subjects
was 35, ranging from 20 to 66, and 52% of participants were male.
Figure 9: The distributions and the Kernel density distributions of the number of clicks in Study 3
Paralleling the presentation in Study 1, Figure 9 summarizes the distribution and the kernel density
distribution of the number of clicks for each treatment. In general, we find that the comparative statics
results are very similar to those in Study 1. A Wilcoxon signed-ranks test using subjects' average
clicks per treatment as the unit of observation suggests that the difference in clicks between the two
treatments with the same cost-to-prize ratio is not systematic (p=0.309), and nor is the
difference between the two treatments with no cost of clicking (p=0.832). Similarly, when comparing
across treatments with the same prize, Friedman tests strongly suggest that the treatment differences
across costs are systematic (p<0.001 in both comparisons). However, the variance of clicking in
each treatment appears to be higher for MTurkers than for the student subjects in the lab.

11
The code, programmed in PHP, is available upon request.
Second, we examine the properties of the production function in the online task. Figure D1 in
Appendix D displays the fitted production function from a fractional polynomial regression using the
online data. The production function shares almost all of the features of the lab task, except that its tail
shows almost no decrease in catches. We perform the same test as in Study 1 to check whether the
production function is invariant to different incentives. A likelihood ratio test cannot reject the null
hypothesis that the shape of the production function (though not the intercept) is invariant to different
prize incentives (χ²(2)=2.68, p=0.262). As in Study 1, when we randomly split the full sample
into two halves (37 subjects in each), we cannot reject the null hypothesis that both the intercept and
the shape of the production function are invariant (χ²(3)=3.38, p=0.336).
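The fractional polynomial form reported in Appendix D, Q = α0 + α1·e^0.5 + α2·e^2, can be fitted by ordinary least squares on the transformed regressors. The sketch below is self-contained and is not the packaged routine actually used for the estimates in the paper:

```python
import math

# Sketch: fit Q = a0 + a1*sqrt(e) + a2*e^2 by ordinary least squares,
# solving the normal equations (X'X) a = X'y with Gaussian elimination.
def fit_production(clicks, catches):
    # design matrix rows: [1, sqrt(e), e^2]
    X = [[1.0, math.sqrt(e), float(e * e)] for e in clicks]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * q for row, q in zip(X, catches)) for i in range(3)]
    # augmented matrix, Gaussian elimination with partial pivoting
    A = [XtX[i][:] + [Xty[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        coef[i] = (A[i][3] - sum(A[i][j] * coef[j]
                                 for j in range(i + 1, 3))) / A[i][i]
    return coef
```

Fitting this form to (clicks, catches) data recovers an intercept, a concave square-root term and a quadratic penalty term, which together reproduce the hump shape of the estimated production function.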
Lastly, Table D1 in Appendix D shows the performance of the quantitative predictions of effort. Contrary
to what we find with the lab task, actual clicks are statistically higher than predicted
when C/P is larger than ½ and statistically lower than predicted
when C/P is no more than ½. The inaccuracy of the quantitative predictions might be a consequence of
the technology represented by the underlying production function: its concavity is
perhaps not sufficiently strong to allow for sharp predictions of effort.
In sum, we find that this version of the online ball-catching task preserves all of
the comparative statics predictions and the stability properties of the underlying production function
found with the lab task. However, the quantitative predictions do not hold as well as with the lab task.

One caveat is in order for potential users who want to administer the online ball-catching task in their
own research: it is important to calibrate the parameters of the task, such as the total number of balls,
the speed of the falling balls and the columns in which they fall, to fit the research purpose at hand. In
our case, it would be desirable to make the production function sufficiently concave to allow for more
accurate predictions of effort. One possible solution is to induce a convex cost of effort function to
sharpen predictions.
6. Conclusion
In this paper we have introduced a novel real effort task that combines "real", i.e. non-pecuniary,
effort costs with material effort costs. The task offers flexibility in controlling the cost of effort
function as well as the production function while retaining "real" effort. Its greatest advantage over
other real effort tasks lies in its power to make a priori point predictions on effort.
As shown in Study 1, the underlying production function of the ball-catching task has the
important properties of a production technology: it displays diminishing returns to clicking and is
invariant to incentives. With a stable production function, we find that the predicted effort levels are
remarkably accurate on average under various piece-rate incentive schemes and highly consistent at
the individual level. Our results underscore the importance of inducing a material cost of effort in the
ball-catching task. With material costs, we find that efforts are invariant to scaling up the stakes and
are close to the predicted efforts, a finding that suggests any cognitive and psychological costs/benefits
are largely dominated by the induced material costs. Without material costs we observe a ceiling effect:
efforts are unresponsive to increases in the prize for catching balls.
We illustrate the usefulness of the ball-catching task by implementing it in classic interactive
environments that have been used to study cooperation, fairness and competition. Using the ball-catching
task in place of purely induced effort costs, we reproduce the stylized findings from these
experiments. Finally, we introduce an online version of the ball-catching task for researchers who are
interested in using it in their online experiments.

We believe that the ball-catching task will be a valuable tool for experimental researchers interested in
using "real effort" in the lab. One of our future projects is to implement the online ball-catching task
among more general populations.
References
Abeler, J., Falk, A., Goette, L., & Huffman, D. (2011). Reference Points and Effort Provision.
American Economic Review, 101(2), 470–492.
Bull, C., Schotter, A., & Weigelt, K. (1987). Tournaments and Piece Rates: An Experimental Study.
Journal of Political Economy, 95(1), 1–33.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review
and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1-3), 7–42.
Chaudhuri, A. (2011). Sustaining cooperation in laboratory public goods experiments: a selective
survey of the literature. Experimental Economics, 14(1), 47–83.
Che, Y.-K., & Gale, I. (2000). Difference-Form Contests and the Robustness of All-Pay Auctions.
Games and Economic Behavior, 30(1), 22–43.
Corgnet, B., Hernán-González, R., & Schniter, E. (2014). Why real leisure really matters: incentive
effects on real effort in the laboratory. Forthcoming in Experimental Economics.
Dickinson, D. L. (1999). An Experimental Examination of Labor Supply and Work Intensities.
Journal of Labor Economics, 17(4), 638–670.
Eckartz, K. M. (2014). Task enjoyment and opportunity costs in the lab – the effect of financial
incentives on performance in real effort tasks. Jena Economic Research Papers.
Falk, A., Gächter, S., & Kovács, J. (1999). Intrinsic motivation and extrinsic incentives in a repeated
game with incomplete contracts. Journal of Economic Psychology, 20(3), 251–284.
Fehr, E., Kirchsteiger, G., & Riedl, A. (1993). Does Fairness Prevent Market Clearing? An
Experimental Investigation. Quarterly Journal of Economics, 108(2), 437–459.
Fischbacher, U. (2007). z-Tree: Zurich toolbox for ready-made economic experiments. Experimental
Economics, 10(2), 171–178.
Gächter, S., & Falk, A. (2002). Reputation and Reciprocity: Consequences for the Labour Relation.
Scandinavian Journal of Economics, 104(1), 1–26.
Gill, D., & Prowse, V. (2012). A Structural Analysis of Disappointment Aversion in a Real Effort
Competition. American Economic Review, 102(1), 469–503.
Gneezy, U. (2002). Does high wage lead to high profits? An experimental study of reciprocity using
real effort. Unpublished.
Greiner, B. (2004). An Online Recruitment System for Economic Experiments. In M. Kremer & V.
Macho (Eds.), Forschung Und Wissenschaftliches Rechnen Gwdg Bericht 63 (pp. 79–93). Göttingen:
Ges. für Wiss. Datenverarbeitung.
Holmstrom, B. (1982). Moral hazard in teams. Bell Journal of Economics, 13(2), 324–340.
Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: conducting experiments
in a real labor market. Experimental Economics, 14(3), 399–425.
Lazear, E. (2000). Performance Pay and Productivity. American Economic Review, 90(5), 1346-1361.
Lazear, E. P., & Rosen, S. (1981). Rank-order tournaments as optimum labor contracts. Journal of
Political Economy, 89(5), 841–864.
Mohnen, A., Pokorny, K., & Sliwka, D. (2008). Transparency, Inequity Aversion, and the Dynamics
of Peer Pressure in Teams: Theory and Evidence. Journal of Labor Economics, 26(4), 693–720.
Nalbantian, H., & Schotter, A. (1997). Productivity under group incentives: An experimental study.
American Economic Review, 87(3), 314–341.
Niederle, M., & Vesterlund, L. (2007). Do Women Shy Away From Competition? Do Men Compete
Too Much? Quarterly Journal of Economics, 122(3), 1067–1101.
Prendergast, C. (1999). The Provision of Incentives in Firms. Journal of Economic Literature, 37(1),
7–63.
Smith, V. L. (1982). Microeconomic systems as an experimental science. American Economic
Review, 72(5), 923–955.
van Dijk, F., Sonnemans, J., & van Winden, F. (2001). Incentive systems in a real effort experiment.
European Economic Review, 45(2), 187–214.
Appendix A. Existing Real Effort Tasks 12

Mathematical skills
- solve mathematical equation; multiply one-digit with two-digit numbers: Sutter & Weck-Hannemann (2003); Dohmen & Falk (2011); Kuhnen & Tymula (2012)
- multiply two-digit numbers; addition of two-digit numbers; select a subset of the numbers that add up to 100; decode a number from a grid of letters: Brüggen & Strobel (2007); Niederle & Vesterlund (2007); Healy & Pate (2011); Sloof & van Praag (2010); Eriksson, Poulsen & Villeval (2009); Cason, Masters & Sheremeta (2010); Heyman & Ariely (2004); Lévy-Garboua, Masclet & Montmarquette (2009); Dutcher & Saral (2012)

Linguistic skills
- solve anagrams: Charness & Villeval (2009)

Memory and logic
- mazes: Gneezy (2002); Gneezy, Niederle & Rustichini (2003); Freeman & Gelber (2010)
- memory games: Ivanova-Stenzel & Kübler (2011)
- Sudoku: Calsamiglia, Franke & Rey-Biel (2011)
- IQ test: Gneezy & Rustichini (2000); Pokorny (2008)

Trial and error
- two-variable optimization task: van Dijk, Sonnemans & van Winden (2001); Bosman, Sutter & van Winden (2005); Bosman & van Winden (2002); Sloof & van Praag (2008)
- search height of a curve: Montmarquette et al. (2004); Dickinson & Villeval (2008)

Secretary tasks
- type abstracts: Hennig-Schmidt, Rockenbach & Sadrieh (2010)
- type paragraphs: Dickinson (1999)
- log library books into database: Gneezy & List (2006); Kube, Maréchal & Puppe (2012); Kube, Maréchal & Puppe (2011)
- copy fictitious students into database; copy numbers from questionnaires at computer; stuff letters into envelopes: Ortona, Ottone, Ponzano & Scacciati (2008); Greiner, Ockenfels & Werner (2011); Konow (2000); Falk & Ichino (2006); Carpenter, Matthews & Schirm (2010)

Manual tasks
- count zeros; count numbers; drag a computerized ball to a specific location; slider-moving; sort and count coins: Abeler, Falk, Goette & Huffman (2011); Pokorny (2008); Heyman & Ariely (2004); Gill & Prowse (2012); Gill & Prowse (2011); Gill, Prowse & Vlassopoulos (2012); Dörrenberg & Duncan (2012); Hammermann, Mohnen & Nieken (2012); Bortolotti, Devetag & Ortmann (2009)

12
This list was compiled on our own. We later found an independent effort by Christina Gravert,
who compiled a similar list of real effort tasks and circulated it within the ESA Google group community
(available at https://groups.google.com/forum/#!searchin/esa-discuss/real$20effort$20tasks/esa-discuss/P9MuBRweawc/Amc6ylQAvgYJ).
Appendix B. Instructions Used in Study 1
Welcome to the experiment. You are about to participate in an experiment on decision making.
Throughout the experiment you must not communicate with other participants. If you follow these
instructions carefully, you could earn a considerable amount of money.
For participating in this experiment you will receive a £3 show-up fee. In addition you can earn
money by completing a task. Your earnings from the task will depend only on your own performance.
You will be asked to work on a computerized ball-catching task for 36 periods. Each period lasts
one minute. In each period, there will be a task box in the middle of the task screen like the one shown
below:
Once you click on the "Start the Task" button, the timer will start and balls will fall randomly
from the top of the task box. You can move the tray at the bottom of the task box to catch the balls by
using the mouse to click on the LEFT or RIGHT buttons. To catch a ball, your tray must be below the
ball before the ball reaches the bottom of the task box. When the ball touches the tray, your catches
increase by one.
You will receive a prize (in tokens) for each ball you catch and incur a cost (in tokens) for each
mouse click you make. At the beginning of each period you will be informed of your prize for each
ball caught, which will be either 10 or 20, and your cost for each click, which will be either 0, 5 or 10.
In each period, the number of balls you have caught so far (displayed as CATCHES), the number of clicks
you have made so far (CLICKS), your accumulated prizes so far (SCORE) and your accumulated costs so
far (EXPENSE) are shown right above the task box. SCORE is CATCHES multiplied by your prize
per catch for the period and EXPENSE is CLICKS multiplied by your cost per click for the period.
At the end of the period your earnings in tokens for the period will be your SCORE minus your
EXPENSE. Please note that catching more balls by moving the tray more often does not necessarily
lead to higher earnings, because both SCORE and EXPENSE matter for your earnings.
The first six periods will be practice periods which will not affect your earnings in the experiment
in any way. At the end of the experiment, the tokens you earned from periods 7 to 36 will be
converted to cash at the rate of 1200 tokens = 1 pound. You will be paid this amount in addition to
your £3 show-up fee.
Appendix C. Additional Tables and Figures in Study 1
C1. Summary of individual effort
Table C1: Individual average number of clicks in each treatment
Subject   1:(10,0)   2:(10,5)   3:(10,10)   4:(20,0)   5:(20,5)   6:(20,10)   Consistent behavior?
101       59.8       27.4       12          64.6       45         27.6        √
102       62.6       18.4       12.2        63.6       30.2       20          √
103       51         14.8       9.4         65         22.8       14.2        √
104       21.6       6          3           20.2       5.4        4.4         √
105       62         15.8       3.6         60.6       18         16          √
106       56.8       18.8       9           58.6       34.8       19.2        √
107       57.2       20.8       11.2        57.8       29.6       26          √
108       47.4       13.4       11.6        54.6       25.8       10.8        √
109       62.6       18.8       9.6         65         26         14.2        √
110       60.2       20.8       7.2         61.4       39         23.8        √
111       61.4       5.8        5           63.8       9          7           √
112       48.6       6.4        2.8         49.2       8.4        6.8         √
113       64.6       25         16          69.8       48.2       29.6        √
114       49.2       17.4       14.6        59         24.2       16.8        √
115       64         18.8       9.2         64.4       30.2       21.4        √
116       63.8       29.8       11          66         52.6       29.8        √
201       56.4       36.2       13.2        53         39.2       34.6        √
202       38.2       9.4        4.6         38         11.6       7           √
203       61.8       19.2       11.4        58.4       27.6       17          √
204       64         24.4       6.8         55         37.8       23.2        √
205       65.4       21         10.6        58.4       37.2       16.6        √
206       64.6       24.6       11.4        61.6       45.6       25.8        √
207       67.2       31.4       14.8        60.8       53.2       32.2        √
208       53         19.6       6.8         53         27.8       17.4        √
209       51.2       28.6       5.4         49.2       43.6       22.2        √
210       72.8       10.6       8           65.2       16.8       10.6        √
211       62.4       20.8       8.8         63         40         22.4        √
212       63.2       23.4       10.2        68.4       53         28.6        √
213       63.4       20         12.2        58         31.4       19.2        √
214       65.8       30.4       18.8        64.2       24.2       25.4        ×
215       59.2       24.8       13.6        53.4       36.6       25.2        √
216       58.4       7.6        3.8         59.6       31.8       12.2        √
301       61.2       3.4        3.2         60.8       5.6        13.4        ×
302       63.2       16.4       9           70         21.8       13          √
303       46.8       17.2       11          50.8       32.2       11.6        √
304       38.2       5          1.4         30.6       5.6        1.4         √
305       52.4       14.2       7.2         60         23.6       13.2        √
306       67         14.6       7.4         63.2       36.2       13.6        √
307       56.6       16.8       5.6         64.4       29.6       17.2        √
308       44.6       29.4       8.6         48.6       27.8       28.6        ×
309       52.8       7.4        4           56.6       37         9.4         √
310       44         17.2       9.8         50.2       29.8       21.8        √
311       52.6       15.8       7.4         66.4       29.6       11.8        √
312       69.4       13.8       6.2         62.4       26.4       9.8         √
313       60.8       17         7.4         71.6       40.4       23.6        √
314       56.6       23.6       10          64.6       56.8       24          √
315       58.4       21.8       8.2         63         23         16.2        √
316       63.2       15.8       15.2        59.8       23.4       18.6        √
401       59.6       13.2       13.2        64.4       15.6       16.8        √
402       65         20.8       8.6         66.2       42.2       23.2        √
403       41.2       16.2       10.4        42.8       15.4       13          √
404       57.2       24.4       5.2         60.2       38.4       27.6        √
405       51.6       16.8       8           54         25.8       15          √
406       62.4       25.8       11.6        69         58.4       29          √
407       67         8.2        2.8         69         10         6.8         √
408       67.6       17.2       7.2         62.6       28.6       13.8        √
409       62.4       16         4.8         58.6       42         14.6        √
410       47.4       11.4       8.6         35.6       13.6       11.6        √
411       54.4       13         6.4         49.4       20.4       13.6        √
412       66.4       34.8       0           62.6       52.2       41.4        √
413       71.8       13.8       13.4        67.6       23.2       16.8        √
414       62         27.6       9.8         64.2       45         30          √
415       48.4       16         11.8        55.6       22.8       15.2        √
416       67.8       37.2       11.8        70.4       66         17.8        √

Note: we consider the average clicks of treatments 2 and 6 and the average clicks of treatments 1 and 4
when evaluating consistency at the individual level.
C2. Stability of the production function over different incentives

Figure C1: The fitted production functions for sub-samples with different prizes (P=10 and P=20)
using model (2)

C3. Stability of the production function over experimental sessions

Figure C2: The fitted production functions for different sessions (Session 1 and Session 2) using
model (2)
C4. Out-of-sample exercises
We split the full sample into two halves, S1 and S2, by experimental sessions and estimate the
production function for each. The actual number of clicks in each sub-sample is calculated and
reported in Table C2. Overall, the out-of-sample predictions of effort perform quite well.
Table C2: Comparisons between the predicted number of clicks and the actual number of clicks

Treatment No.          1          2          3          4          5          6
Prize per catch (P)    10         10         10         20         20         20
Cost per click (C)     0          5          10         0          5          10
Predicted clicks       57         20         8          57         34         20
Actual clicks (S1)     58.119     19.694     9.619**    58.213     31.456     19.600
                       (1.710)    (1.371)    (0.703)    (1.678)    (2.307)    (1.403)
Actual clicks (S2)     57.500     17.556*    7.975      59.225     30.263     17.294*
                       (1.568)    (1.764)    (0.621)    (1.735)    (2.620)    (1.405)

Note: the standard errors are in parentheses (clustered at the subject level). One-sample two-tailed t-test:
**p < 0.05, *p < 0.1.
Appendix D. Additional Figures and Tables in Study 3
D1: The fitted production function
Figure D1: The estimated production function from a fractional polynomial regression in the online
experiment

Note: The first entry in (*,*) denotes the prize per catch and the second the cost per click. The fitted
functional form is Q = α0 + α1·e^0.5 + α2·e^2, where Q denotes the number of catches and e the number
of clicks. The coefficient estimates are from a fractional polynomial regression.
D2: Predicted effort and actual effort

Table D1: Comparisons between the predicted number of clicks and the actual number of clicks

Treatment No.          1          2          3          4          5          6
Prize per catch (P)    10         10         10         20         20         20
Cost per click (C)     0          5          10         0          5          10
Predicted clicks       95         25         8          95         51         25
Actual clicks          55.3***    22.0***    13.7***    55.9***    35.6***    22.7***
                       (1.00)     (0.80)     (0.74)     (0.96)     (0.85)     (0.85)

Note: the standard errors are in parentheses (clustered at the subject level). One-sample two-tailed t-test:
***p < 0.01.