Combining “Real Effort” with Induced Effort Costs: The Ball-Catching Task

Simon Gächter, Lingbo Huang, Martin Sefton*

CeDEx and School of Economics, University of Nottingham

14 February, 2015

PRELIMINARY: PLEASE DO NOT CITE WITHOUT PERMISSION

Abstract

We introduce a novel computerized real effort task which combines “real” effort in the lab with an induced material cost of effort. The central feature of this task is that it allows researchers to manipulate the cost of effort function as well as the production function, which permits quantitative predictions on effort provision. In an experiment with piece-rate incentive schemes we find that the predictions on effort provision are remarkably accurate. We also present experimental findings from some classic experiments implemented with the task, namely team production, gift exchange and tournament. All of the results are closely in line with the stylized facts from experiments using purely induced values.

Keywords: Experimental design, real effort task, incentives.

Acknowledgements: We would like to thank participants at the CBESS-CeDEx-CREED meeting in Nottingham, 2014, the ESA European meeting, Prague, 2014, the Nordic Conference on Behavioral and Experimental Economics, Aarhus, 2014, and the Max Planck Institute for Tax Law and Public Finance for valuable comments. Simon Gächter acknowledges support from the European Research Council Advanced Investigator Grant 295707 COOPERATION.

* Email: Simon Gächter ([email protected]); Lingbo Huang ([email protected]); Martin Sefton ([email protected]).

1. Introduction

Experiments using “real effort” tasks enjoy increasing popularity among experimental economists. Frequently used tasks include number-addition tasks (e.g., Niederle & Vesterlund (2007)), counting-zeros tasks (e.g., Abeler et al. (2010)) and slider-positioning tasks (Gill & Prowse (2012)).[1]
In this paper, we present a novel computerized real effort task, called the “ball-catching task”, which combines “real” effort in the lab with an induced material cost of effort. This task shares an advantage of other real effort tasks in that subjects are required to do something tangible in order to achieve a level of performance, as opposed to simply choosing a number (as is done in experiments that implement cost of effort functions using a pure induced value method, where different number choices are directly linked with different financial costs). A drawback of existing real effort tasks, however, is that in using them the researcher sacrifices considerable control over the cost of effort function. As noted by Falk and Fehr (2003, p. 404): “while ‘real effort’ surely adds realism to the experiment, one should also note that it is realized at the cost of losing control. Since the experimenter does not know the workers’ effort cost, it is not possible to derive precise quantitative predictions”. Incorporating material effort costs re-establishes a degree of control over effort costs and, as we shall demonstrate, allows researchers to manipulate observable effort costs and make point predictions on effort provision.

In the ball-catching task, a subject has a fixed amount of time to catch balls that fall randomly from the top of the screen by using mouse clicks to move a tray at the bottom of the screen. Control over the cost of effort is achieved by attaching material costs to the mouse clicks that move the tray. In our initial study we examine performance on this task under piece-rate incentives. Subjects incur a cost for each mouse click and receive a prize for each ball caught. We first estimate the relationship between clicks and catches and then use this to predict how the number of clicks will vary as the costs of clicking and the benefits of catching are manipulated.
We find that efforts, as measured by the number of mouse clicks, are close to predicted efforts both at the aggregate and at the individual level. These findings also add to the literature on empirical testing of incentive theories (Prendergast (1999)) by presenting real effort experimental evidence that basic piece-rate incentive theory works quantitatively well. By comparison, the prominent field evidence reported by Lazear (2000) and the lab evidence reported by Dickinson (1999) support the comparative static predictions of basic incentive theory, whereas we show that the theory also predicts exact effort levels accurately.

Footnote 1: To our knowledge, the first experimental study using a real effort task for the purpose of testing incentive theory is Dickinson (1999), in which subjects were asked to type paragraphs over a 4-day period. Another early study implementing a real effort task within a typical laboratory experiment is van Dijk, Sonnemans & van Winden (2001). See Appendix A for a comprehensive list of existing real effort tasks.

In the second part of the paper, we report further experiments that demonstrate how the task can be implemented in some classic experiments. We administer the task in three classic experiments used to study cooperation, fairness and competition, namely, team production (e.g., Nalbantian & Schotter (1997)), gift exchange (e.g., Fehr, Kirchsteiger & Riedl (1993)) and tournaments (e.g., Bull, Schotter & Weigelt (1987)). In all three experiments, the results reproduce the stylized findings from previous experiments using purely induced values. In our last study, we introduce an online version of the ball-catching task and conduct the same experiment as in the first study with Amazon Mechanical Turk workers. The comparative statics results are largely replicated, though we note caveats about making quantitative predictions with this particular online version.
2. Design of the Ball-Catching Task

The ball-catching task is a computerized task programmed in z-Tree (Fischbacher (2007)), which requires subjects to catch falling balls by moving a tray on their computer screens. Figure 1 shows a screenshot of the task. In the middle of the screen there is a rectangular task box with four hanging balls at the top and one tray at the bottom. Once a subject presses the “Start the task” button at the lower right corner of the screen, the balls fall from the top of the task box. In the version used in this paper, the timer starts and balls fall one after another at a fixed time interval, and the column in which each ball falls is randomly determined by the computer. The software allows adjusting the speed of the falling balls and the time interval between them. It is also possible to change the number of columns and even to fix a falling pattern rather than a random one. As will be discussed later, flexibility in all these parameters allows tight control over the production function in this task, that is, the relationship between the number of balls caught and the number of clicks made.

To catch the falling balls, the subject can move the tray by mouse-clicking the “LEFT” or “RIGHT” buttons below the task box. At the top of the screen, the number of balls caught (CATCHES) and the number of clicks made (CLICKS) are updated in real time. As will become clear later, we interpret the number of clicks as the amount of effort exerted by the subject in this task.

What distinguishes our ball-catching task from other real effort tasks is that we attach pecuniary costs to mouse clicks. By specifying the relation between clicks and pecuniary costs we can implement any material cost of effort function. The most convenient specification might be to induce a linear cost function by simply attaching a fixed cost to every mouse click, but it is also possible to induce any convex, concave or even irregular cost function.
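As a concrete illustration of this last point, the mapping from cumulative clicks to material cost can take any shape. The sketch below is purely hypothetical; the function names and parameters are ours and are not part of the software:

```python
# Hypothetical cost-of-effort schedules attachable to mouse clicks.
# A linear schedule charges a fixed cost per click; a convex schedule
# makes each additional click more expensive than the last.

def linear_cost(clicks, c=5):
    """Total cost with a fixed cost c per click."""
    return c * clicks

def convex_cost(clicks, a=0.5):
    """Total cost rising quadratically in the number of clicks."""
    return a * clicks ** 2

print(linear_cost(20))  # 20 clicks at 5 tokens each -> 100 tokens
print(convex_cost(10))  # 0.5 * 10^2 -> 50.0 tokens
```

Under the linear schedule the marginal cost of a click is constant, while under the convex schedule it increases with the number of clicks already made.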
In Figure 1 the subject incurs a cost of 5 tokens for each mouse click. Accumulated costs (EXPENSE) are updated and displayed in real time. It is also possible to attach pecuniary benefits to catches. In Figure 1 the subject receives 20 tokens for each ball caught and accumulated benefits (SCORE) are updated on screen in real time.

Figure 1: A screenshot of the ball-catching task

Another distinction from existing real effort tasks is that in those tasks output and effort are typically indistinguishable. For instance, the output in the slider task, that is, the number of sliders correctly positioned, is also interpreted as effort. In the ball-catching task there is a clear distinction between the catches and the clicks variables, with the natural interpretation being that the former represents output and the latter input. Moreover, by choosing the time constraint, ball speed, etc., the researcher has flexibility in selecting the production technology.

Evidence collected in a post-experimental questionnaire suggests that subjects find the ball-catching task easy to understand and learn. In the next section we examine in more detail how subjects perform on the task under piece-rate incentives.

3. Study 1: Testing the Ball-Catching Task

3.1. Experimental design

Study 1 examines performance on the ball-catching task under piece-rate incentives. Each subject worked on the same ball-catching task for 36 periods. Each period lasted 60 seconds. In each period one combination of prize-per-catch (either 10 or 20 tokens) and cost-per-click (0, 5 or 10 tokens) was used, giving six treatments that were varied within subjects (see Table 1). The first 6 periods, one period of each treatment in random order, served as practice periods for participants to familiarize themselves with the task. Token earnings from these periods were not converted to cash.
The following 30 periods, five periods of each treatment in completely randomized order (i.e., unblocked and randomized), were paid out for real. In all, 64 subjects participated in the experiment, with average earnings of £13.80 for a session lasting about one hour.[2]

Table 1: Treatments in Study 1

Treatment No.         1    2    3    4    5    6
Prize per catch (P)   10   10   10   20   20   20
Cost per click (C)    0    5    10   0    5    10

Given a particular piece-rate incentive, what level of effort would subjects exert? For illustrative purposes, suppose utility is increasing in financial rewards, which are given by PQ − Ce, where Q is the number of catches and e is the number of clicks. Assume the relationship between Q and e is given by Q = f_i(e, ε_i), where f_i is an individual-specific production function, with f_i′ > 0 and f_i′′ < 0, and ε_i is a random shock uncorrelated with effort. Given these assumptions the expected utility maximizing number of clicks satisfies:

e_i* = f_i′^(−1)(C/P).    (1)

Note that optimal effort is homogeneous of degree zero in C and P. We recognize that the marginal production function might differ across individuals and therefore optimal individual effort may also vary. However, to be able to make predictions at the aggregate level, we estimate the average production function and then use this estimate along with our incentive parameters to predict average optimal effort. This approach relies on several assumptions. First, the production function should be concave. Second, the production function should be independent of C and P. Third, there should be no other subjective valuation in agents’ decision-making. This model assumes that there are no cognitive or psychological costs/benefits associated with working on the task. We can think of two plausible types of unobservable cost/benefit. First, output may be a function of cognitive effort as well as the number of clicks.
For example, output may depend not just on how many clicks a subject makes but also on how intelligently a subject uses her clicks. If the production function is given by f(e, t), where t represents cognitive effort, then e* will reflect a trade-off between e and t. If e and t are substitute inputs, then a proportionate increase in C and P will result in a decrease in e* as the subject substitutes more expensive clicking with more careful thinking. Second, subjects may incur psychological costs from the action of clicking or enjoy psychological benefits from catching balls. For example, if utility is increasing in (P + B)Q − Ce, where B is a non-monetary benefit from a catch, then a proportionate increase in C and P will result in a decrease in e*.

Footnote 2: The experiment was run in two sessions at the CeDEx lab at the University of Nottingham with subjects recruited using the online campus recruitment system ORSEE (Greiner (2004)). Instructions are reproduced in Appendix B.

Our experiment is designed to directly examine the importance of incorporating material costs of effort. One could use the ball-catching task without material costs of effort, basing comparative static predictions (e.g. that the number of catches will increase as the prize per catch increases) on psychological costs of effort. However, as in many existing real effort tasks, the ball-catching task without material effort costs might exhibit a “ceiling effect”, that is, unresponsiveness of effort to varying prize incentives.[3] A ceiling effect in the ball-catching task would indicate that inducing material costs is a desirable design feature. For this reason our design includes two treatments where the material cost of clicking is zero (treatments 1 and 4 in Table 1). These allow us to test whether there is a ceiling effect in the ball-catching task without induced effort costs.
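To illustrate how the optimality condition e* = f′^(−1)(C/P) in (1) pins down effort, the following sketch solves the problem numerically for a hypothetical concave production function; the functional form and parameters are illustrative only and are not the estimates reported below:

```python
# Illustrative sketch: maximize P*f(e) - C*e over integer numbers of
# clicks for a hypothetical concave production function f(e) = a*sqrt(e).

def f(e, a=5.0):
    """Hypothetical concave production function: catches from clicks."""
    return a * e ** 0.5

def optimal_clicks(P, C, e_max=200):
    """Profit-maximizing number of clicks, found by grid search."""
    return max(range(e_max + 1), key=lambda e: P * f(e) - C * e)

# Optimal effort depends only on the ratio C/P (homogeneity of degree
# zero): doubling both the prize and the click cost leaves e* unchanged.
print(optimal_clicks(P=10, C=5), optimal_clicks(P=20, C=10))
```

For this particular f, the first-order condition gives e* = 6.25·(P/C)², so the grid search returns the same 25 clicks for (P, C) = (10, 5) and (20, 10).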
Our experimental treatments also allow us to test whether unobservable costs/benefits matter compared with induced effort costs in the ball-catching task. Our design includes two treatments that vary C and P while keeping the ratio C/P constant (treatments 2 and 6 in Table 1). In the absence of unobserved costs/benefits, the distribution of clicks should be the same in these two treatments. The presence of unobserved costs/benefits would instead lead to systematic differences. Note that with this design the prediction that optimal effort is homogeneous of degree zero in C and P can be tested without the need to derive the underlying production function, f, since all that is needed is a comparison of the distributions of clicks between these two treatments. Our design further includes four other treatments that also vary the ratio C/P. Along with the first two treatments, this variation in incentives allows us to recover a more accurate estimate of the underlying production function. We then use the estimated function to predict the optimal effort in each treatment.

3.2. Comparative statics results

Before presenting results on the quantitative predictions, we first discuss the comparative statics results, with an emphasis on the importance of inducing financial costs in the ball-catching task. A comparison between treatments 1 and 4 offers an examination, in the context of the ball-catching task, of the ceiling effect observed in many other real effort tasks. If there is no “real” psychological cost/benefit associated with working on the task, we should observe the same distribution of the number of clicks in these two treatments, thus exemplifying the typical ceiling effect.

Footnote 3: See an early review in Camerer and Hogarth (1999). Another possible reason for the “ceiling effect” is that subjects may simply work on the paid task due to experimenter demand effects, particularly in the absence of salient outside options.

The closeness
of the distributions in treatments 1 and 4 is statistically supported by a two-sample Kolmogorov-Smirnov test (p=0.843) and a Wilcoxon matched-pairs signed-ranks test (p=0.215), using each subject’s average effort per treatment as the unit of observation. Our interpretation is that without clicking costs subjects simply catch as many balls as they can. We can further examine intra-individual treatment differences by computing the within-subject treatment difference in average effort. The treatment difference is within ±12% of the average effort for about 80% of our subjects.

Figure 2: The distributions and the kernel density distributions of the number of clicks

Will induced costs of effort reduce the number of clicks? And will clicks depend on the prize when there are clicking costs? Figure 2 depicts the distribution of the number of clicks for each treatment. First, we perform a horizontal comparison across treatments 1, 2 and 3, where the prize is always 10, and a horizontal comparison across treatments 4, 5 and 6, where the prize is always 20. In both comparisons we observe a clear shift of the distribution of the number of clicks when moving from the treatment with lower induced effort costs to the one with higher induced effort costs. A Friedman test for systematic differences in matched subjects’ observations, using each subject’s average effort per treatment as the unit of observation, shows that the differences across treatments are highly significant in both comparisons (p<0.001). Next, we perform two vertical comparisons, between treatments 2 and 5 and between treatments 3 and 6. Holding the clicking costs constant, we find that a higher prize leads to a higher number of clicks in both comparisons (Wilcoxon matched-pairs signed-ranks test: p<0.001).
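These nonparametric comparisons use standard tests. For readers who want to replicate the pipeline, a sketch on synthetic per-subject averages (the data below are simulated for illustration, not the experimental data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 64  # one average number of clicks per subject per treatment

# Synthetic per-subject average clicks for three matched "treatments"
t1 = rng.normal(57, 8, n)  # zero click cost
t2 = rng.normal(20, 5, n)  # click cost 5
t3 = rng.normal(9, 3, n)   # click cost 10

# Friedman test: systematic differences across matched treatments
stat_f, p_friedman = stats.friedmanchisquare(t1, t2, t3)

# Wilcoxon matched-pairs signed-ranks test for two matched treatments
stat_w, p_wilcoxon = stats.wilcoxon(t2, t3)

# Two-sample Kolmogorov-Smirnov test for equality of distributions
stat_ks, p_ks = stats.ks_2samp(t1, t2)

print(p_friedman, p_wilcoxon, p_ks)
```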
The sharp contrast between the treatments with and without induced effort costs illustrates the importance of inducing financial costs in the ball-catching task, both for avoiding the undesirable ceiling effect and for moving effort provision in a predicted way. Our final comparative statics result concerns the effect of financial stakes on effort, that is, the effect of changing stakes without altering the ratio of the marginal cost to the marginal prize. As discussed above, if subjects derive unobserved costs/benefits we should observe systematic differences in the distributions of clicks between the high-stakes and low-stakes treatments. However, using each subject’s average effort per treatment as the unit of observation, neither a two-sample Kolmogorov-Smirnov test (p=0.843) nor a Wilcoxon matched-pairs signed-ranks test (p=0.880) suggests significant differences between treatments.[4] Financial stakes do not appear to affect subjects’ average effort.

To have a more detailed overview of individual effort, in Appendix C1 we summarize the average number of clicks in each treatment for each individual. As one of the comparative statics exercises, we can check whether each individual exhibited qualitatively consistent behavior, by which we mean that the ranking of actual efforts is the same as the ranking of predicted efforts across all treatments. As treatments 2 and 6 predict the same effort level, we take the average of actual efforts in these two treatments and enter only this average into the test of consistent behavior for each subject. The same applies to treatments 1 and 4. Only 3 of the 64 subjects (less than 5%) behaved inconsistently: in all three cases the effort in treatment 5 is lower than the average effort in treatments 2 and 6. In sum, we have shown that the comparative statics predictions hold well in the experimental data.
It is not just the marginal cost or the marginal prize alone that affects behavior; it is their ratio that drives behavior. This also implies that inducing some level of financial costs circumvents the ceiling effect.

Footnote 4: An intra-individual examination shows that the treatment difference is within ±25% of the average effort for about 83% of our subjects.

3.3. The production function

As discussed above, our approach depends on inducing a concave production function that is independent of the incentives. The mathematical derivation of the exact functional form is unfortunately almost intractable, and we therefore estimate the functional form instead.[5] Our empirical strategy for estimating the production function is to first retrieve the functional form from the full sample and check some of its basic properties, including concavity. Next, we check whether the estimates (intercept and shape) of the production function, given its functional form, are invariant to varying levels of piece-rate incentives. If so, then the estimated production function is in and of itself unaltered over different economic environments. We will also examine the stability of the production function out of sample.

Footnote 5: In the version of the ball-catching task used in Study 1, it is almost impossible to catch every ball. Along with the randomness of the falling pattern, even if we were able to simulate the optimal number of clicks given a certain ratio of cost and prize, the fact that the subject cannot move fast enough to catch all the balls he/she should catch prevents us from deriving a useful production function that describes actual behaviour.

Figure 3: The estimated production functional form from a fractional polynomial regression

Note: The first entry in (*,*) denotes the prize per catch and the second the cost per click. The fitted production functional form is given by Q = α0 + α1·e^0.5 + α2·e^2, where Q denotes the number of catches and e the number of clicks. The coefficient estimates are from a fractional polynomial regression.

In Figure 3 we visualize the production function with estimates from the fractional polynomial regression.[6] We can see a clearly concave production function, reminiscent of the neoclassical assumption of a production technology with diminishing marginal returns to inputs. A slight decrease at the tail suggests that this production function also features a “technological ceiling” beyond which higher effort provision may actually lead to lower production levels. However, most of these observations come from the treatments with a zero cost of clicking. As one of the main advantages of using the ball-catching task is precisely that clicking can be made costly, the drop in the tail should be of little concern, and the empirical production function largely preserves its monotonic nature.

Footnote 6: Fractional polynomials afford more flexibility than conventional polynomials by allowing logarithms and non-integer powers in the models.

Using the production functional form suggested by the fractional polynomial regression, we estimate the following random coefficients panel data model:

Catches_{i,r} = β0 + β1·Clicks_{i,r}^0.5 + β2·Clicks_{i,r}^2 + δ_r + ω_i·Clicks_{i,r} + u_{i,r},    (2)

where Catches_{i,r} and Clicks_{i,r} are respectively the number of catches and the number of clicks of subject i in period r, and δ_r are period dummies (with the first period as the omitted category). The random effect ω_i is assumed to be multiplicative with clicks, to vary only across subjects, and to have variance σ_ω². The random error u_{i,r} is assumed to be identically and independently distributed over subjects and periods with variance σ_u².
The motivation for this specification is that we observe non-negligible variation in individual effort in every treatment, and there also seems to be higher variation in output at larger numbers of clicks. Our specification of multiplicative heterogeneity allows for persistent individual differences in the marginal production function. The inherent randomness of the falling balls is represented by the random error in the production function: the location from which a ball falls is random but the speed and total number of balls are not, so a subject’s total number of clicks should not be correlated with the random error.[7] All equations are estimated using maximum likelihood methods and estimates are reported in Table 2.

Table 2: Random coefficients regressions for model (2) in Study 1

Dep. Var.: Catches   (1) full sample     (2) Prize=10        (3) Prize=20
Intercept            10.093*** (0.526)   11.247*** (0.791)   8.943*** (0.788)
Clicks^0.5           5.517*** (0.103)    5.269*** (0.131)    5.773*** (0.161)
Clicks^2             -0.003*** (0.000)   -0.003*** (0.001)   -0.004*** (0.000)
σ_ω                  0.070*** (0.007)    0.066*** (0.007)    0.071*** (0.011)
σ_u                  3.162*** (0.052)    3.134*** (0.074)    3.136*** (0.074)
N                    1920                960                 960

Note: All period dummies are included and insignificant except for period 2 using the full sample. ***p < 0.01.

Footnote 7: This assumption is supported by a one-way analysis of variance of the number of clicks for each treatment. In all six treatments, an F-test strongly suggests that most of the variation in clicking can be accounted for by between-subjects variation (p<0.001).

Columns (1), (2) and (3) in Table 2 report the coefficient estimates for the full sample, the sub-sample with a prize of 10 and the sub-sample with a prize of 20. One observation is the closeness of the estimates of the parameters of the production function across all equations.
The figure in Appendix C2 visualizes this statistical observation by comparing the fitted production functions for the two sub-samples with different prizes using the estimates in Table 2. The two production functions almost coincide in both intercept and shape. To formally test whether the production function is invariant to different incentives, we estimate an augmented model that adds interactions of the covariates Clicks^0.5 and Clicks^2 (i.e. the parameters governing the shape of the production function) with a binary variable indicating whether the prize is 10 or 20. This approach ensures sufficient observations over the whole range of the production function for both sub-samples, while still permitting an informative test. By placing restrictions on these interaction terms, we can perform a likelihood ratio test of whether the shape of the production function interacts with incentives. We cannot reject the null hypothesis that the same shape of the production function applies to all treatments (χ²(2)=3.43, p=0.180).

Note that we claim that the shape of the production function is invariant to the different incentives, but not the intercept. Although in principle the intercept should also not vary across treatments, it may be imprecisely estimated for treatments with the higher prize, because most of the low or zero click counts are observed in the treatment with the highest cost-to-prize ratio (treatment 3). Indeed, estimating an augmented model with interactions of the intercept, Clicks^0.5 and Clicks^2 with a binary variable indicating whether the prize is 10 or 20, a likelihood ratio test comparing its log-likelihood to that of model (2) on the full sample rejects the null hypothesis that the production function is invariant in both intercept and shape across sub-samples (χ²(3)=8.18, p=0.042).
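The reported likelihood-ratio statistics map into p-values through the χ² survival function. A minimal check of the arithmetic (using SciPy; the log-likelihood arguments passed to lr_test below are placeholders, not the fitted values):

```python
from scipy.stats import chi2

def lr_test(ll_restricted, ll_full, df):
    """Likelihood-ratio statistic and its chi-squared p-value."""
    lr = 2 * (ll_full - ll_restricted)
    return lr, chi2.sf(lr, df)

# Reproducing the p-values from the statistics reported in the text:
print(round(chi2.sf(3.43, 2), 3))  # 0.18  (shape restrictions only)
print(round(chi2.sf(8.18, 3), 3))  # 0.042 (intercept and shape)
```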
We do not think, however, that this weaker claim is detrimental to our main purpose. After all, it is only the shape of the production function that matters for the quantitative predictions of effort. To assess the stability of the production function out of sample, we split the full sample by experimental session and fit an augmented model that adds interactions of the intercept, Clicks^0.5 and Clicks^2 with a session dummy. We cannot reject the null hypothesis that both the intercept and the shape of the production function are invariant across sessions (χ²(3)=1.26, p=0.739).[8] This implies that there is essentially no harm in using the production function estimated from the full sample to make predictions of effort even on the original sample, because whichever sub-sample is used as a basis for making out-of-sample predictions on the other, the predictions will be the same. This result also supports our earlier conjecture that the intercept seems to vary with incentives simply because the scarcity of low click counts in treatments with a prize of 20 renders the estimate of the intercept imprecise.

Footnote 8: A graphic demonstration of the closeness of the two fitted production functions is shown in Appendix C3.

Turning back to the estimates using the full sample, we observe that the random effect is statistically significant. The model can thus explain the dispersion of effort within each treatment as partly due to persistent individual tendencies to click more or less often than others. The variance of the random error is also statistically significant, so randomness in the falling balls affects the level of production. In sum, we find that the shape of the estimated production function is invariant to different levels of incentives. The production function is also stable out of sample.
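Given the estimated functional form, predicted effort for any (P, C) pair is found by maximizing expected profit P·f(e) − C·e over integer click counts. A sketch using the rounded full-sample coefficients from Table 2 (the paper's predictions use unrounded estimates, so the integers here can differ from the published ones by a click or two):

```python
# Predicted clicks from the estimated production function
# f(e) = b0 + b1*e^0.5 + b2*e^2, with rounded full-sample coefficients
# from Table 2 (unrounded estimates may shift predictions slightly).
B0, B1, B2 = 10.093, 5.517, -0.003

def f(e):
    return B0 + B1 * e ** 0.5 + B2 * e ** 2

def predicted_clicks(P, C, e_max=120):
    """Integer click count maximizing expected profit P*f(e) - C*e."""
    return max(range(e_max + 1), key=lambda e: P * f(e) - C * e)

for P, C in [(10, 0), (10, 5), (10, 10), (20, 0), (20, 5), (20, 10)]:
    print(P, C, predicted_clicks(P, C))
```

Because the objective is homogeneous of degree zero in (P, C), treatments with the same cost-to-prize ratio, such as (10, 5) and (20, 10), yield the same prediction.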
There exist both persistent and transitory individual differences in clicking, which explain the dispersion in effort.

3.4. Predicted and actual effort levels: Do induced effort costs dominate psychological costs/benefits?

With the production function estimated in Section 3.3 and the treatment parameters, we are ready to see how the quantitative predictions on effort perform. The evidence can be readily seen in Table 3, which compares the predicted effort derived from (1) with the actual number of clicks for every treatment.[9] Figure 4 visualizes these results by combining categories whenever treatments have the same predicted efforts.

Table 3: Comparisons between the predicted number of clicks and the actual number of clicks

Treatment No.        1       2       3       4       5       6
Prize per catch (P)  10      10      10      20      20      20
Cost per click (C)   0       5       10      0       5       10
Predicted clicks     57      20      8       57      34      20
Actual clicks        57.8    18.6    8.8*    58.7    30.9*   18.4
                     (1.15)  (0.96)  (0.48)  (1.20)  (1.73)  (0.99)

Note: standard errors in parentheses (clustered at the subject level). One-sample two-tailed t-test: *p < 0.1.

Consistent with the absence of unobserved costs/benefits, we find that the predictions in all treatments are remarkably accurate, although subjects slightly over-provide effort in the lower-incentivized treatment 5 and under-provide effort in the higher-incentivized treatment 3 (both significantly different from the predicted efforts at the 10% level).[10] Overall, our subjects seem to understand the incentive structure in the current context very well and respond to it in the predicted way. The results are striking given that subjects cannot know the production function a priori and therefore cannot calculate the optimal level of effort deliberately. Nonetheless, they behaved as if they knew the underlying structural parameters and responded to them optimally.

Footnote 9: Note that we have assumed a continuous form of the production function. This assumption is made mainly for expositional and analytical convenience. In reality, the production function is a discrete relationship between catches and clicks. We find the predicted effort, rounded to the closest integer, that maximizes profit in each treatment.

Footnote 10: In Appendix C4, we also produce a similar table by separately calculating the actual number of clicks for each experimental session. To the extent that this can be viewed as an out-of-sample exercise, the resulting statistics corroborate our claims here.

Figure 4: The distributions and the kernel density distributions of the actual number of clicks and the predicted clicks

Note: The vertical line in each sub-figure represents the predicted average effort.

We also administered a post-experimental questionnaire in Study 2 (reported in the next section) in which we asked subjects to rate the difficulty, enjoyableness and boredom of the task. On average subjects found the task very easy to do and had neutral attitudes towards its enjoyableness and boredom, which also supports our assumptions on psychological costs/benefits.

A comparison between treatments 1 and 4, where the cost of clicking is zero, provides another lens for examining the relevance of psychological costs/benefits. The absence of psychological costs/benefits would lead to similar effort provision in these two treatments. The presence of “real” psychological costs of effort, as often assumed in other real effort tasks, would instead lead to higher effort levels in the higher-powered treatment 4 than in the lower-powered treatment 1. We can see from Table 3 that this is clearly not the case. The actual effort and its variation are very similar in these two treatments. A two-sample Kolmogorov-Smirnov test for equality of the distributions of clicks also does not suggest a significant difference (two-tailed, p=0.521).
The similarity between treatments 1 and 4 suggests that relying solely on alleged "real" psychological costs runs into a serious limitation: the danger of a ceiling effect. This is true for the ball-catching task and probably applies to many other real effort tasks. We conclude that incentive theory predicts behavior well both at the aggregate and at the individual level. Material incentives seem to dominate cognitive and psychological concerns when subjects work on the task.

4. Study 2: Applications - Team Production, Gift Exchange and Tournament

The previous section has shown the accuracy of predictions on effort using the ball-catching task. In this section, we implement the same task in three classic experiments that have been used to study cooperation, fairness and competition. These examples show that the power of the ball-catching task is not confined to individual choice environments but is widely applicable to interactive environments. In particular, we are interested in whether our real effort task produces behavior similar to the findings from experiments using induced value techniques. An important feature of the experimental method in this study is that we utilize the estimated production function from Study 1 to derive predictions on effort whenever possible.

For each of the following experiments, we recruited 32 subjects per treatment, and each treatment ran for 10 periods. We ran five sessions with 160 subjects in total. After each session, a post-experimental questionnaire was administered asking about subjects' perception of the ball-catching task itself, including its difficulty, enjoyableness and boredom. Further details of the design, including the matching protocol, the parameters of the task and the payoff structure, are specific to each experiment and are therefore explained separately below. All of the experiments were run at the CeDEx lab at the University of Nottingham.
All treatments lasted no more than one hour and average earnings were around £13.0.

4.1. Team production

Understanding free-riding incentives in team production is at the heart of contract theory and organizational economics (Holmstrom (1982)). The standard experimental framework for studying team production is the voluntary contribution mechanism, in which the socially desirable outcome conflicts with individual free-riding incentives (see the recent survey by Chaudhuri (2011) in the context of public goods).

We implemented a team production (TP) experiment in which four team members worked on the ball-catching task independently. The same four subjects played as a team for the entire 10 periods. Each ball caught contributed 20 tokens to the team, while the subject bore his/her own cost of clicking at 5 tokens per click. At the end of each period, the total team production in tokens was shared equally among the four team members. Each member's payoff was his/her share of the production net of his/her individual cost of clicking. The revenue-sharing rule in team production therefore reduces each individual's marginal reward from catching balls by 75% without changing the marginal cost of clicking.

We also ran two control treatments, each of which is essentially the piece rate experiment of Study 1. The first (PR20) has a prize per catch of 20 tokens and a cost per click of 5 tokens; the second (PR5) has a prize per catch of 5 tokens and a cost per click of 5 tokens. Consider a subject in the TP treatment: if the subject behaves selfishly, his/her effort provision should be the same as in PR5, whereas if he/she is primarily concerned with maximizing total team production, his/her effort provision should be the same as in PR20.
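The incentive comparison can be made concrete with a small sketch; the parameters come from the text (20 tokens per catch to the team pot, equal sharing among four members, 5 tokens per click borne privately), but the payoff function itself is our paraphrase, not the experiment's actual code.

```python
# Sketch of one member's period payoff in the TP treatment (parameters from
# the text; the function is our paraphrase, not the experiment's code).
TEAM_SIZE = 4
TOKENS_PER_CATCH = 20
COST_PER_CLICK = 5

def tp_payoff(own_catches, own_clicks, others_catches):
    """Period payoff: equal share of team output minus private click costs."""
    team_output = TOKENS_PER_CATCH * (own_catches + others_catches)
    return team_output / TEAM_SIZE - COST_PER_CLICK * own_clicks

# The private marginal reward per catch is 20/4 = 5 tokens, exactly the PR5
# piece rate, while a team-output maximizer values a catch at the full 20
# tokens, as in PR20.
print(tp_payoff(1, 0, 0) - tp_payoff(0, 0, 0))
```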
Our hypothesis is that free-riding incentives drive effort towards the level predicted by selfishness, as observed in many similar experiments using induced values (e.g., Nalbantian & Schotter (1997) and many public goods experiments using voluntary contribution mechanisms).

Figure 6 displays the average number of clicks in all treatments. The two horizontal lines represent the Nash predictions of optimal effort levels in PR20 and PR5 respectively. Note that we used the estimated production function from Study 1 to compute the optimal effort levels.

Figure 6: Average clicks over time in team production

The figure shows clearly declining average effort over time in TP. Average effort decreases from 30 clicks to just above 17 clicks in the last period. By comparison, average effort in PR20 decreases from 38 to 32 and in PR5 from 16 to 8, consistent with our findings in Study 1. Subjects in TP recognize the free-riding incentives from the very first period and steadily decrease their effort towards the individually optimal level. Although average effort does not reach that level, recall that our design has only 10 periods and uses a partner matching protocol. This result is certainly in line with findings from experiments using induced values.

4.2. Gift exchange

The gift exchange experiment (Fehr, Kirchsteiger & Riedl, 1993) postulates that firms offer more than competitive wages to workers because they understand the reciprocal nature of workers' effort provision, coined "gift exchange" in the efficiency wage literature. We implemented a bilateral gift exchange experiment similar to Gächter & Falk (2002). The firm could offer a wage between 0 and 1000 tokens to the matched worker, who then worked on the ball-catching task. We made two key changes to the version of the task used in this experiment.
First, we deliberately reduced the number of balls that could be caught within 60 seconds from 52 to 20 by increasing the time interval between falling balls. We made this change to reduce the influence of random shocks as much as possible: by making it possible to catch every ball, workers' reciprocal behavior could be reflected in their effort as well as in their actual output. Second, the cost schedule was changed to a convex function in line with the parameters used in most gift exchange experiments. The full cost schedule is reported in Table 5.

Table 5: The cost schedule in gift exchange

No.    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Cost   5   5   6   6   6   6   7   7   7   7   8   8   8   8   8

No.    16  17  18  19  20  21  22  23  24  25  26  27  28  29  30+
Cost   9   9   9   9   10  10  10  10  10  11  11  11  11  11  12

For example, the column with No. 6 means that the 6th click costs 6 tokens, the column with No. 21 means that the 21st click costs 10 tokens, and the last column with No. 30+ means that the 30th and any further click costs 12 tokens. Notice that if, for example, the worker makes a total of three clicks, she incurs a total cost of 5 + 5 + 6 = 16 tokens.

For the firm, each ball caught by his/her matched worker adds 50 tokens to his/her payoff. For the worker, the payoff is the wage received net of the cost of clicking. To compensate for any possible losses during the experiment, every firm and worker received 300 tokens at the beginning of each period. We ran two sessions, one with stranger matching and the other with partner matching.

Figure 7 shows the relationship between outputs and wages (upper panel) and between efforts and wages (lower panel). The data suggest a clear reciprocal pattern in both treatments, whether we look at outputs or efforts.
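Returning to the worker's cost of effort, the total clicking cost accumulates according to the marginal cost schedule in Table 5. A minimal sketch; only the first three marginal costs, which are quoted in the text, and the 12-token top rate from the 30th click onwards are hard-coded here:

```python
# Sketch of cumulative clicking costs under a convex marginal-cost schedule
# like Table 5. Only the first three marginal costs (quoted in the text) are
# hard-coded below; in actual use, pass the full schedule from Table 5.
def total_cost(n_clicks, marginal_costs, top_rate=12):
    """Sum of the per-click costs of the first n_clicks clicks; any click
    beyond the listed schedule costs top_rate (12 tokens from the 30th
    click on, once the full Table 5 schedule is supplied)."""
    return sum(marginal_costs[k] if k < len(marginal_costs) else top_rate
               for k in range(n_clicks))

first_three = [5, 5, 6]  # marginal costs of the 1st, 2nd and 3rd click
print(total_cost(3, first_three))  # 5 + 5 + 6 = 16 tokens, as in the text
```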
As predicted by the experiments of Falk, Gächter & Kovács (1999) and Gächter & Falk (2002), who also compared partners and strangers in gift exchange, the reciprocal pattern is stronger when it is possible to build up a reputation between a firm and a worker. Our findings for strangers are also consistent with an early real effort gift exchange experiment by Gneezy (2002), who used a maze-solving task to measure worker performance, although his experiment was conducted in a one-shot setting.

Figure 7: Reciprocal patterns in gift exchange
Note: the upper panel shows the relationship between outputs and wages and the lower panel the relationship between efforts and wages. The two sub-figures for the stranger matching treatment are on the left and the two sub-figures for the partner matching treatment on the right. The fitted lines are estimated from non-parametric Lowess regressions (bandwidth = 0.8).

To provide statistical evidence, we estimate the following random effects panel data model of the number of clicks on the wage received:

Clicks_{i,t} = β0 + β1 wage_{i,t} + ω_i + δ_t + u_{i,t}

where ω_i denotes unobserved individual heterogeneity across subjects, δ_t denotes period dummies with the first period as the omitted category, and u_{i,t} denotes the idiosyncratic error varying across subjects and periods.
ω_i is assumed to be independently and identically distributed over subjects with variance σ²_ω, and u_{i,t} is assumed to be independently and identically distributed over subjects and periods with variance σ²_u. Table 6 reports the parameter estimates. We observe statistically significant reciprocal responses by workers, who respond to higher wages by clicking more often, and the strength of reciprocity is greater between partners than between strangers.

Table 6: Random effects regressions for workers' clicks in gift exchange

Dep. Var.: Clicks           (1) Stranger    (2) Partner
wage                        0.003**         0.014***
                            (0.001)         (0.003)
Constant                    3.279***        3.746***
                            (0.681)         (1.444)
σ_ω                         1.753           3.346
σ_u                         2.649           4.397
Hausman test for random     df=10           df=10
versus fixed effects        p=1.000         p=0.956
N                           160             160

Note: period dummies are included and insignificant. ***p < 0.01, **p < 0.05.

4.3. Tournament

Tournament incentive schemes, such as sales competitions and job promotions, are an important example of relative performance incentives (Lazear & Rosen (1981)). An early laboratory experiment by Bull, Schotter & Weigelt (1987) found that tournament incentives induced effort as predicted theoretically, but the variance of behavior was much larger under tournament incentives than under piece rate incentives.

We implemented a simple simultaneous tournament in which two players compete for a prize worth 1200 tokens. The winner in each period earns 1200 tokens net of any cost of clicking, whereas the loser receives 200 tokens net of any cost of clicking. The cost per click is always 5 tokens. Each player's probability of winning follows a piecewise linear success function (Che & Gale, 2000; Gill & Prowse, 2012): (own output – opponent's output + 50)/100. We use this contest success function rather than the rank-order tournament of Bull, Schotter & Weigelt (1987) because it allows us to make a point prediction of expected effort regardless of the shape of the random shocks in the production function.
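The tournament payoff just described can be sketched as follows; the clipping of the winning probability to [0, 1] for output gaps larger than 50 is our assumption, as the text only states the linear form.

```python
# Sketch of the two-player tournament payoff (prizes and cost from the text).
# ASSUMPTION: the winning probability is clipped to [0, 1] when outputs
# differ by more than 50; the text only gives the linear form.
WIN_PRIZE = 1200
LOSE_PRIZE = 200
COST_PER_CLICK = 5

def win_prob(own_output, opp_output):
    """Piecewise linear contest success function."""
    return min(1.0, max(0.0, (own_output - opp_output + 50) / 100))

def expected_payoff(own_output, opp_output, own_clicks):
    p = win_prob(own_output, opp_output)
    return p * WIN_PRIZE + (1 - p) * LOSE_PRIZE - COST_PER_CLICK * own_clicks

# On the linear segment, one extra catch raises the winning probability by
# 1/100, so the marginal benefit of a catch is (1200 - 200)/100 = 10 tokens.
print(expected_payoff(26, 25, 0) - expected_payoff(25, 25, 0))
```

With a marginal benefit of 10 per catch and a marginal cost of 5 per click, the point prediction coincides with that of a piece rate of 10 per catch and a click cost of 5.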
With this success function, the marginal benefit of catching equals the marginal probability of winning multiplied by the spread between the winner and loser prizes. The marginal probability of winning is always 1/100, so the marginal benefit of catching equals 10, while the marginal cost of a click remains 5. Once again, we can use the estimated production function from Study 1 to compute the optimal effort level, which turns out to be 20.

Figure 8 displays average effort across all subjects over time. We observe quick convergence towards the predicted effort level. The variance of effort in the tournament also appears to be larger than in the Study 1 treatments with the same predicted effort: the standard deviation of effort is around 12 in the former and 9.6 in the latter. However, this difference is smaller than what Bull, Schotter & Weigelt (1987) found in their induced value experiment, in which the standard deviation of effort under tournament incentives was more than double that under piece rate incentives.

Figure 8: Average clicks over time in tournament

In sum, in Study 2 we found that our real effort task produces behavior closely in line with the stylized findings from three classic experiments that previously used induced values: free-riding incentives in team production drove down average effort, workers reciprocated higher wages by catching more balls, and behavior in tournaments was more varied than under piece-rate schemes.

5. Study 3: Online Ball-Catching Task

In this section, we introduce an online version of the ball-catching task. [11] The online task has been designed to resemble the lab version as closely as possible.
The purpose of this section is to show the potential for administering the ball-catching task in online experiments, which have proven to be a valuable complement to physical laboratory experiments. We ran the same experiment as in Study 1 on the popular online labor market platform Amazon Mechanical Turk (MTurk). In total, we recruited 95 subjects from MTurk, 74 of whom finished the task. Recruitment took only around 10 minutes. Given the unusually long duration of the task (50 minutes), the 78% completion rate suggests that our promised payment was sufficiently attractive to most workers on MTurk. The average payment, including a $3 participation fee, was around $5.9, which is well above what most MTurk tasks offer. The average age of our subjects is 35, ranging from 20 to 66, and 52% of the participants are male.

Figure 9: The distributions and the Kernel density distributions of the number of clicks in Study 3

Paralleling the presentation in Study 1, Figure 9 summarizes the distribution and the Kernel density distribution of the number of clicks for each treatment. In general, the comparative statics results are very similar to those in Study 1. A Wilcoxon signed-ranks test using each subject's average clicks per treatment as the unit of observation suggests that the difference in clicks between the two treatments with the same cost-prize ratio is not systematic (p=0.309), and neither is the difference between the two treatments with no cost of clicking (p=0.832). By contrast, when comparing treatments with the same prize, Friedman tests strongly suggest that the differences across costs are systematic (p<0.001 in both comparisons). However, the variance of clicking in each treatment appears to be higher for MTurkers than for the student subjects in the lab.

[11] The code, programmed in PHP, is available upon request.
Second, we examine the properties of the production function in the online task. Figure D1 in Appendix D displays the fitted production function from a fractional polynomial regression using the online data. The production function shares almost all of the features of the lab task, except that its tail shows almost no decrease in catches. We perform the same test as in Study 1 to check whether the production function is invariant to different incentives. A likelihood ratio test cannot reject the null hypothesis that the shape of the production function (though not necessarily the intercept) is invariant to different prize incentives (χ²(2)=2.68, p=0.262). As in Study 1, when we randomly split the full sample into two halves (37 subjects in each), we cannot reject the null hypothesis that both the intercept and the shape of the production function are invariant (χ²(3)=3.38, p=0.336).

Lastly, Table D1 in Appendix D shows the performance of the quantitative predictions of effort. Contrary to what we find with the lab task, the actual number of clicks is statistically higher than predicted when C/P is larger than 1/2 and statistically lower than predicted when C/P is no more than 1/2. The inaccuracy of the quantitative predictions might be a consequence of the technology represented by the underlying production function. In particular, the concavity of the production function is perhaps not sufficiently "strong" to allow sharp predictions of effort.

In sum, performance in this version of the online ball-catching task preserves all of the comparative statics predictions and the stability properties of the underlying production function found with the lab task, but the quantitative predictions do not hold as well. One caveat is in order for potential users who want to administer the online ball-catching task in their own research.
It is important to calibrate the parameters of the ball-catching task, such as the total number of balls, the speed of falling and the columns in which balls fall, to fit one's research purposes. In our case, it is desirable to make the production function sufficiently concave to allow more accurate predictions of effort. One possible solution is to induce a convex cost of effort function to sharpen predictions.

6. Conclusion

In this paper we have introduced a novel real effort task that combines "real", i.e. non-pecuniary, effort costs with material effort costs. The task offers flexibility in controlling the cost of effort function as well as the production function while retaining "real" effort. Its greatest advantage over other real effort tasks lies in its power to make a priori point predictions of effort. As shown in Study 1, the underlying production function of the ball-catching task preserves important properties of a production technology: it displays diminishing returns to clicking and is invariant to incentives. With a stable production function, we find that the predicted effort levels are remarkably accurate on average under various piece-rate incentive schemes and highly consistent at the individual level.

Our results underscore the importance of inducing material costs of effort in the ball-catching task. With material costs, we find that effort is invariant to increasing the stakes and close to the predicted levels, a finding that suggests any cognitive and psychological costs/benefits are largely dominated by the induced material costs. Without material costs, we observe a ceiling effect: effort is unresponsive to increases in the prize for catching balls. We illustrate the usefulness of the ball-catching task by implementing it in classic interactive environments that have been used to study cooperation, fairness and competition.
Using the ball-catching task in place of purely induced effort costs, we reproduce the stylized findings from these experiments. Finally, we introduce an online version of the ball-catching task for researchers interested in using it in their online experiments.

We believe that the ball-catching task will be a valuable tool for experimental researchers interested in using "real effort" in the lab. One of our future projects is to implement the online version of the ball-catching task among a more general population, for example, Amazon Mechanical Turk workers.

References

Abeler, J., Falk, A., Goette, L., & Huffman, D. (2011). Reference Points and Effort Provision. American Economic Review, 101(2), 470–492.
Bull, C., Schotter, A., & Weigelt, K. (1987). Tournaments and Piece Rates: An Experimental Study. Journal of Political Economy, 95(1), 1–33.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1-3), 7–42.
Chaudhuri, A. (2011). Sustaining cooperation in laboratory public goods experiments: a selective survey of the literature. Experimental Economics, 14(1), 47–83.
Che, Y.-K., & Gale, I. (2000). Difference-Form Contests and the Robustness of All-Pay Auctions. Games and Economic Behavior, 30(1), 22–43.
Corgnet, B., Hernán-González, R., & Schniter, E. (2014). Why real leisure really matters: incentive effects on real effort in the laboratory. Forthcoming in Experimental Economics.
Dickinson, D. L. (1999). An Experimental Examination of Labor Supply and Work Intensities. Journal of Labor Economics, 17(4), 638–670.
Eckartz, K. M. (2014). Task enjoyment and opportunity costs in the lab – the effect of financial incentives on performance in real effort tasks. Jena Economic Research Papers.
Falk, A., Gächter, S., & Kovács, J. (1999).
Intrinsic motivation and extrinsic incentives in a repeated game with incomplete contracts. Journal of Economic Psychology, 20(3), 251–284.
Fehr, E., Kirchsteiger, G., & Riedl, A. (1993). Does Fairness Prevent Market Clearing? An Experimental Investigation. Quarterly Journal of Economics, 108(2), 437–459.
Fischbacher, U. (2007). z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics, 10(2), 171–178.
Gächter, S., & Falk, A. (2002). Reputation and Reciprocity: Consequences for the Labour Relation. Scandinavian Journal of Economics, 104(1), 1–26.
Gill, D., & Prowse, V. (2012). A Structural Analysis of Disappointment Aversion in a Real Effort Competition. American Economic Review, 102(1), 469–503.
Gneezy, U. (2002). Does high wage lead to high profits? An experimental study of reciprocity using real effort. Unpublished.
Greiner, B. (2004). An Online Recruitment System for Economic Experiments. In M. Kremer & V. Macho (Eds.), Forschung und wissenschaftliches Rechnen, GWDG Bericht 63 (pp. 79–93). Göttingen: Ges. für Wiss. Datenverarbeitung.
Holmstrom, B. (1982). Moral hazard in teams. Bell Journal of Economics, 13(2), 324–340.
Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: conducting experiments in a real labor market. Experimental Economics, 14(3), 399–425.
Lazear, E. (2000). Performance Pay and Productivity. American Economic Review, 90(5), 1346–1361.
Lazear, E. P., & Rosen, S. (1981). Rank-order tournaments as optimum labor contracts. Journal of Political Economy, 89(5), 841–864.
Mohnen, A., Pokorny, K., & Sliwka, D. (2008). Transparency, Inequity Aversion, and the Dynamics of Peer Pressure in Teams: Theory and Evidence. Journal of Labor Economics, 26(4), 693–720.
Nalbantian, H., & Schotter, A. (1997). Productivity under group incentives: An experimental study. American Economic Review, 87(3), 314–341.
Niederle, M., & Vesterlund, L. (2007). Do Women Shy Away From Competition? Do Men Compete Too Much?
Quarterly Journal of Economics, 122(3), 1067–1101.
Prendergast, C. (1999). The Provision of Incentives in Firms. Journal of Economic Literature, 37(1), 7–63.
Smith, V. L. (1982). Microeconomic systems as an experimental science. American Economic Review, 72(5), 923–955.
van Dijk, F., Sonnemans, J., & van Winden, F. (2001). Incentive systems in a real effort experiment. European Economic Review, 45(2), 187–214.

Appendix A. Existing Real Effort Tasks [12]

Mathematical skills
Tasks: solve mathematical equations; multiply one-digit by two-digit numbers; multiply two-digit numbers; add two-digit numbers; select a subset of numbers that adds up to 100; decode a number from a grid of letters.
Studies: Sutter & Weck-Hannemann (2003); Dohmen & Falk (2011); Kuhnen & Tymula (2012); Brüggen & Strobel (2007); Niederle & Vesterlund (2007); Healy & Pate (2011); Sloof & van Praag (2010); Eriksson, Poulsen & Villeval (2009); Cason, Masters & Sheremeta (2010); Heyman & Ariely (2004); Lévy-Garboua, Masclet & Montmarquette (2009); Dutcher & Saral (2012).

Linguistic skills
Tasks: solve anagrams. Studies: Charness & Villeval (2009).

Memory and logic
Tasks: mazes; trial-and-error memory games; Sudoku; IQ tests. Studies: Gneezy (2002); Gneezy, Niederle & Rustichini (2003); Freeman & Gelber (2010); Ivanova-Stenzel & Kübler (2011); Calsamiglia, Franke & Rey-Biel (2011); Gneezy & Rustichini (2000); Pokorny (2008).

Search
Tasks: two-variable optimization task; secretary tasks; search for the height of a curve. Studies: van Dijk, Sonnemans & van Winden (2001); Bosman, Sutter & van Winden (2005); Bosman & van Winden (2002); Sloof & van Praag (2008); Montmarquette et al. (2004); Dickinson & Villeval (2008).

Tasks: type abstracts; type paragraphs; log library books into a database. Studies: Hennig-Schmidt, Rockenbach & Sadrieh (2010); Dickinson (1999); Gneezy & List (2006); Kube, Maréchal & Puppe (2012).

[12] We compiled this list ourselves.
Later we found an independent effort by Christina Gravert, who compiled a similar list of real effort tasks and circulated it in the ESA Google group community (available at https://groups.google.com/forum/#!searchin/esa-discuss/real$20effort$20tasks/esadiscuss/P9MuBRweawc/Amc6ylQAvgYJ).

Tasks: copy fictitious student records into a database; copy numbers from questionnaires at a computer; stuff letters into envelopes. Studies: Kube, Maréchal & Puppe (2011); Ortona, Ottone, Ponzano & Scacciati (2008); Greiner, Ockenfels & Werner (2011); Konow (2000); Falk & Ichino (2006); Carpenter, Matthews & Schirm (2010).

Manual tasks
Tasks: count zeros; count numbers; drag a computerized ball to a specific location; slider-moving; sort and count coins. Studies: Abeler, Falk, Goette & Huffman (2011); Pokorny (2008); Heyman & Ariely (2004); Gill & Prowse (2012); Gill & Prowse (2011); Gill, Prowse & Vlassopoulos (2012); Dörrenberg & Duncan (2012); Hammermann, Mohnen & Nieken (2012); Bortolotti, Devetag & Ortmann (2009).

Appendix B. Instructions Used in Study 1

Welcome to the experiment. You are about to participate in an experiment on decision making. Throughout the experiment you must not communicate with other participants. If you follow these instructions carefully, you could earn a considerable amount of money. For participating in this experiment you will receive a £3 show-up fee. In addition, you can earn money by completing a task. Your earnings from the task will depend only on your own performance.

You will be asked to work on a computerized ball-catching task for 36 periods. Each period lasts one minute. In each period, there will be a task box in the middle of the task screen like the one shown below. Once you click on the "Start the Task" button, the timer will start and balls will fall randomly from the top of the task box. You can move the tray at the bottom of the task box to catch the balls by using the mouse to click on the LEFT or RIGHT buttons.
To catch a ball, your tray must be positioned under the ball before the ball reaches the bottom of the task box. When the ball touches the tray, your catches increase by one. You will receive a prize (in tokens) for each ball you catch and incur a cost (in tokens) for each mouse click you make. At the beginning of each period you will be informed of your prize for each ball caught, which will be either 10 or 20, and your cost for each click, which will be either 0, 5 or 10.

In each period, the number of balls you have caught so far (displayed as CATCHES), the number of clicks you have made so far (CLICKS), your accumulated prizes so far (SCORE) and your accumulated costs so far (EXPENSE) are shown right above the task box. SCORE is CATCHES multiplied by your prize per catch for the period, and EXPENSE is CLICKS multiplied by your cost per click for the period. At the end of the period, your earnings in tokens for the period will be your SCORE minus your EXPENSE. Please note that catching more balls by moving the tray more often does not necessarily lead to higher earnings, because both SCORE and EXPENSE matter for your earnings.

The first six periods will be practice periods, which will not affect your earnings in the experiment in any way. At the end of the experiment, the tokens you earned in periods 7 to 36 will be converted to cash at the rate of 1200 tokens = 1 pound. You will be paid this amount in addition to your £3 show-up fee.

Appendix C. Additional Tables and Figures in Study 1

C1.
Summary of individual effort

Table C1: Individual average number of clicks in each treatment

Subject  1:(10,0)  2:(10,5)  3:(10,10)  4:(20,0)  5:(20,5)  6:(20,10)  Consistent behavior?
101      59.8      27.4      12         64.6      45        27.6       √
102      62.6      18.4      12.2       63.6      30.2      20         √
103      51        14.8      9.4        65        22.8      14.2       √
104      21.6      6         3          20.2      5.4       4.4        √
105      62        15.8      3.6        60.6      18        16         √
106      56.8      18.8      9          58.6      34.8      19.2       √
107      57.2      20.8      11.2       57.8      29.6      26         √
108      47.4      13.4      11.6       54.6      25.8      10.8       √
109      62.6      18.8      9.6        65        26        14.2       √
110      60.2      20.8      7.2        61.4      39        23.8       √
111      61.4      5.8       5          63.8      9         7          √
112      48.6      6.4       2.8        49.2      8.4       6.8        √
113      64.6      25        16         69.8      48.2      29.6       √
114      49.2      17.4      14.6       59        24.2      16.8       √
115      64        18.8      9.2        64.4      30.2      21.4       √
116      63.8      29.8      11         66        52.6      29.8       √
201      56.4      36.2      13.2       53        39.2      34.6       √
202      38.2      9.4       4.6        38        11.6      7          √
203      61.8      19.2      11.4       58.4      27.6      17         √
204      64        24.4      6.8        55        37.8      23.2       √
205      65.4      21        10.6       58.4      37.2      16.6       √
206      64.6      24.6      11.4       61.6      45.6      25.8       √
207      67.2      31.4      14.8       60.8      53.2      32.2       √
208      53        19.6      6.8        53        27.8      17.4       √
209      51.2      28.6      5.4        49.2      43.6      22.2       √
210      72.8      10.6      8          65.2      16.8      10.6       √
211      62.4      20.8      8.8        63        40        22.4       √
212      63.2      23.4      10.2       68.4      53        28.6       √
213      63.4      20        12.2       58        31.4      19.2       √
214      65.8      30.4      18.8       64.2      24.2      25.4       ×
215      59.2      24.8      13.6       53.4      36.6      25.2       √
216      58.4      7.6       3.8        59.6      31.8      12.2       √
301      61.2      3.4       3.2        60.8      5.6       13.4       ×
302      63.2      16.4      9          70        21.8      13         √
303      46.8      17.2      11         50.8      32.2      11.6       √
304      38.2      5         1.4        30.6      5.6       1.4        √
305      52.4      14.2      7.2        60        23.6      13.2       √
306      67        14.6      7.4        63.2      36.2      13.6       √
307      56.6      16.8      5.6        64.4      29.6      17.2       √
308      44.6      29.4      8.6        48.6      27.8      28.6       ×
309      52.8      7.4       4          56.6      37        9.4        √
310      44        17.2      9.8        50.2      29.8      21.8       √
311      52.6      15.8      7.4        66.4      29.6      11.8       √
312      69.4      13.8      6.2        62.4      26.4      9.8        √
313      60.8      17        7.4        71.6      40.4      23.6       √
314      56.6      23.6      10         64.6      56.8      24         √
315      58.4      21.8      8.2        63        23        16.2       √
316      63.2      15.8      15.2       59.8      23.4      18.6       √
401      59.6      13.2      13.2       64.4      15.6      16.8       √
402      65        20.8      8.6        66.2      42.2      23.2       √
403      41.2      16.2      10.4       42.8      15.4      13         √
404      57.2      24.4      5.2        60.2      38.4      27.6       √
405      51.6      16.8      8          54        25.8      15         √
406      62.4      25.8      11.6       69        58.4      29         √
407      67        8.2       2.8        69        10        6.8        √
408      67.6      17.2      7.2        62.6      28.6      13.8       √
409      62.4      16        4.8        58.6      42        14.6       √
410      47.4      11.4      8.6        35.6      13.6      11.6       √
411      54.4      13        6.4        49.4      20.4      13.6       √
412      66.4      34.8      0          62.6      52.2      41.4       √
413      71.8      13.8      13.4       67.6      23.2      16.8       √
414      62        27.6      9.8        64.2      45        30         √
415      48.4      16        11.8       55.6      22.8      15.2       √
416      67.8      37.2      11.8       70.4      66        17.8       √

Note: we consider the average clicks of treatments 2 and 6 and the average clicks of treatments 1 and 4 when evaluating consistency at the individual level.

C2. Stability of the production function over different incentives

Figure C1: The fitted production functions for the sub-samples with different prizes (P=10 and P=20), using model (2)

C3. Stability of the production function over experimental sessions

Figure C2: The fitted production functions for different sessions (Session 1 and Session 2), using model (2)

C4. Out-of-sample exercises

We split the full sample into two halves, S1 and S2, by experimental session and estimate the production functions separately. The actual number of clicks in each sub-sample is calculated and reported in Table C2. Overall, the out-of-sample predictions of effort perform quite well.

Table C2: Comparisons between the predicted number of clicks and the actual number of clicks
Treatment No.         1         2         3         4         5         6
Prize per catch (P)   10        10        10        20        20        20
Cost per click (C)    0         5         10        0         5         10
Predicted clicks      57        20        8         57        34        20
Actual clicks (S1)    58.119    19.694    9.619**   58.213    31.456    19.600
                      (1.710)   (1.371)   (0.703)   (1.678)   (2.307)   (1.403)
Actual clicks (S2)    57.500    17.556*   7.975     59.225    30.263    17.294*
                      (1.568)   (1.764)   (0.621)   (1.735)   (2.620)   (1.405)

Note: standard errors are in parentheses (clustered at the subject level). One-sample two-tailed t-test: **p < 0.05, *p < 0.1.

Appendix D. Additional Figures and Tables in Study 3

D1: The fitted production function

Figure D1. The estimated production function from a fractional polynomial regression in the online experiment
Note: the first entry in (*,*) denotes the prize per catch and the second the cost per click. The fitted functional form is Q = α0 + α1 e^0.5 + α2 e^2, where Q denotes the number of catches and e the number of clicks. The coefficient estimates are from a fractional polynomial regression.

D2: Predicted effort and actual effort

Table D1. Comparisons between the predicted number of clicks and the actual number of clicks

Treatment No.         1         2         3         4         5         6
Prize per catch (P)   10        10        10        20        20        20
Cost per click (C)    0         5         10        0         5         10
Predicted clicks      95        25        8         95        51        25
Actual clicks         55.3***   22.0***   13.7***   55.9***   35.6***   22.7***
                      (1.00)    (0.80)    (0.74)    (0.96)    (0.85)    (0.85)

Note: standard errors are in parentheses (clustered at the subject level). One-sample two-tailed t-test: ***p < 0.01.
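The fractional polynomial form in Figure D1 can be estimated by ordinary least squares on transformed regressors. The sketch below uses synthetic data, since the experimental data are not reproduced here; only the functional form Q = α0 + α1 e^0.5 + α2 e^2 comes from the text.

```python
# Sketch: least-squares fit of the fractional polynomial from Figure D1,
# Q = a0 + a1 * e**0.5 + a2 * e**2. The data are SYNTHETIC; only the
# functional form comes from the paper.
import numpy as np

rng = np.random.default_rng(0)
clicks = rng.integers(1, 80, size=200).astype(float)
catches = 2.0 + 5.0 * np.sqrt(clicks) - 0.003 * clicks**2
catches = catches + rng.normal(0.0, 1.5, size=200)  # noisy observations

# Design matrix with the two fractional polynomial terms e^0.5 and e^2.
X = np.column_stack([np.ones_like(clicks), np.sqrt(clicks), clicks**2])
coef, *_ = np.linalg.lstsq(X, catches, rcond=None)
print(coef)  # estimates of (a0, a1, a2)
```

Any OLS routine reproduces the fit once the regressors are transformed; dedicated fractional polynomial commands simply search over candidate sets of powers such as (0.5, 2).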