INCENTIVES FOR BOUNDEDLY RATIONAL AGENTS March 2, 2001 Suren Basov Department of Economics Boston University 270 Bay State Road Boston, MA 02215 Acknowledgments: I am grateful to Adam Brandenburger, Hsueh-Ling Huynh, Dilip Mookherjee, Robert W. Rosenthal, and Marc Rysman for helpful comments and discussion. None of them is responsible for any mistakes in the paper. Keywords: bounded rationality, incentives, principal-agent model. JEL classi…cation numbers: C60, D82, D83 e-mail: [email protected] 1 Abstract This paper develops a theoretical framework for analyzing incentive schemes under bounded rationality. It starts from a standard principal-agent model and then superimposes an assumption of boundedly rational behavior on the part of the agent. Boundedly rational behavior is modeled as an explicit optimization procedure which combines gradient dynamics with a speci…c form of social learning called imitation of scope. The model creates a framework for addressing the issues of the robustness and complexity of incentive schemes. The results help to shed light on the underprovision of optimal incentives and on gift-exchange behavior in the real world. As a by-product, a standard su¢cient statistics result from the agency literature is seen not to hold in this world. 2 1 INTRODUCTION Agency relationships form an important part of economic life. Among the most common examples are managers acting on behalf of an owner, workers supplying labor to a …rm, and customers buying coverage from an insurance company. The common feature of all these examples is that unobservable actions undertaken by one party have payo¤ relevant consequences for another. This creates a moral hazard problem. The early papers that incorporated agency relationships into formal economic models were Spence and Zeck.. hauser (1971), Ross (1973), Mirrlees (1974, 1976), and Holmstrom (1979). The …rst three authors used a state-space formulation, while the last two switched to a parametrized distribution formulation. The latter formulation has become more popular in part because of its greater tractability. Though the assumptions of the basic models, pioneered by Mirrlees (1974, 1976), were quite general, they were capable of generating some predictions that were in broad agreement with casual empiricism. The main predictive .. power of the models is due to a su¢cient statistics result [Holmstrom (1979, 1982), Shavell (1979), Dewatripont, Jewitt, and Tirole (1999)]. Among other things, it leads to statements about the use of relative performance evaluations that matches empirical evidence. The result states that optimal compensation to an agent should be based on a function of observable measures of e¤ort which is a su¢cient statistic for e¤ort. Intuitively, this means that the principal pretends not to know the e¤ort level undertaken by an agent1 and uses the best statistical estimate based on performance. Adding other measures of performance on top of the su¢cient statistic would only add additional noise without improving statistical inference. Given that the optimal scheme should balance risk sharing against creation of proper incentives, such additions impair performance. However, the basic agency model has some well-recognized di¢culties. The main one is its sensitivity to distributional assumptions, which manifests itself in the complexity of the optimal sharing rules. We see that sharing rules are much simpler in practice than the basic agency model would suggest. 
One explanation for such simplicity is a demand that the optimal scheme should be robust to a variety of hypotheses. The demand for robustness can be understood by imagining an agent with a richer set of actions. Since intricate schemes are too …nely tuned, the more options such an agent has, 1 In fact the principal knows the e¤ort level on the equilibrium path. 3 the worse the old optimal scheme will perform. This issue was addressed in .. the paper of Holmstrom and Milgrom (1987), which found a setting in which a simple scheme is optimal. They considered a dynamic context, where the agent is paid at the end of a period (say a month), but she observes her own performance during the month so that she can adjust her e¤ort based on the partially realized path of output. Though feasible incentive schemes can .. be very complicated and conditioned on the entire history, Holmstrom and Milgrom showed, in the case when principal and agent have CARA utility functions, that the optimal scheme pays wages that are a linear function of the …nal output, while the agent chooses a constant e¤ort in time, independently of the realized history. Though the linearity result can shed some light on the simplicity of compensation schemes used in practice, the assumptions of the model are quite restrictive. For example, for the result to hold the agent must be prevented from making private investments, which does not seem to be very realis.. tic. As Hart and Holmstrom (1987) write: one will have to go outside the Bayesian framework and introduce bounded rationality in order to capture the true sense in which incentive schemes need to be robust in the real world. This paper can be considered as a …rst step in incorporating bounded rationality into the incentive problems. It does not go so far as to derive linear incentive schemes (such a scheme is simply assumed in the paper). However, it provides a framework in which robustness issues can be addressed in future research. Before going further it is important to say what is meant by “bounded rationality” in this paper. Full rationality implies that an agent has wellde…ned preferences over set of possible outcomes and is capable of …nding the optimal outcome instantaneously. I drop this assumption and model the agent’s behavior as a search algorithm. Think about maximizing a function on a computer. To do so one has to write a program. If one has to use the same program for many problems and require that it works in reasonable time (grid search is excluded), one should not expect it to solve all problems perfectly. By “bounded rationality” I mean that the optimum should be learned rather than solved be instantaneously and the agents should not always be assumed to succeed in learning it. I do not consider bounded rationality in the senses of “limited memory” or “limited information”. Basov (2000a) investigates the general properties of learning algorithms and parametrizes such algorithms in terms of coe¢cients of the generator of a stochastic learning process. Some particular cases of those learning 4 algorithms are applied in this paper to the standard principal-agent model. I …nd that when a principal knows that a population of agents behaves according to such a process and plans accordingly, the contracts she o¤ers can be very di¤erent from those of standard models, and the resulting behavior of both principal and agents can be more realistic than is depicted in standard models. 
The rest of the paper is organized as follows: Section 2 discusses general learning models, Section 3 reviews a standard principal-agent model, and Section 4 considers agents following gradient dynamics (an example of a learning algorithm). The central part of the paper is Section 5, where a social learning rule is proposed for a population of agents and the main results about the nature of the contract and the resulting behavior of agents are presented. I revisit the sufficient statistic question in Section 6 and conclude in Section 7.

2 BOUNDED RATIONALITY AND LEARNING

As I stated in the Introduction, by "bounded rationality" I mean that the optimal choice should be learned by agents rather than made instantaneously. I will assume that learning generates a stochastic process on the space of possible choices. Introducing stochastic elements into a learning algorithm makes sense even for purely computational reasons. For instance, they may prevent an optimization algorithm from being trapped in a local (as opposed to global) maximum. In fact, many computational optimization algorithms (for example, simulated annealing) use stochastic elements. The stochastic nature of the learning algorithm also gives rise to probabilistic choice models, which have been routinely used to explain behavior in psychological experiments; see, for example, Estes (1950) and Bush and Mosteller (1955). These models were introduced to economics by Luce (1959). Several stochastic learning models have found applications in economics; see, for example, Kandori, Mailath, and Rob (1993), Fudenberg and Harris (1992), Young (1993), and Anderson, Goeree, and Holt (1997). These papers make specific assumptions about the source and type of randomness. I do not want to specify the source of randomness explicitly. Instead, I ask what is the most general form of a stochastic learning algorithm. To answer this question I use an axiomatic approach.2 I assume that an agent adjusts her choice stochastically. She does so in a social environment, after observing the choice of another individual.

Axiom 1 The probability of reaching a particular set is determined solely by the current choice and the current observation.

Axiom 1 is rather weak. Indeed, assume an agent keeps track of her choices and observations at discrete moments of time and remembers only finitely many choices and observations. Then one can always redefine the choice space in such a way that Axiom 1 will hold. Hence, Axiom 1 is essentially a finite-memory assumption.

Axiom 2 The transition probability conditional on the current choice and the current observation is represented by a probability measure which is absolutely continuous with respect to Lebesgue measure and whose Radon-Nikodym derivative is sufficiently smooth.

Axiom 3 The stochastic process defined by the transition probabilities is Khinchine continuous and is uniquely defined by its generator.3

The axioms stated so far are technical in nature. They are satisfied by many reasonable stochastic processes; for example, Brownian motion with drift satisfies them. Axioms 1-3 allow us to decompose any stochastic behavioral algorithm into a deterministic and a stochastic part. Furthermore, this decomposition is unique. The stochastic part can be decomposed further; I will return to this question below. A detailed proof of this assertion can be found in Basov (2000a), and a sketch of the proof is given in the Appendix. Besides those technical axioms, I would like to capture the intuition that the deterministic part of the adjustment process should represent a form of learning.
In particular, it should be payo¤ improving over time. De…nition Let x(t) denote the choice of the agent at time t if she were governed by the deterministic part of the process alone. I will say that choice x1 is strictly revealed preferred to choice x2 (x1 6= x2 ) if there exists t1 > t2 ¸ 0, such that x(t1 ) = x1 and x(t2 ) = x2 . In this case write x1 Rx2 . This de…nition says that the choice which is made later should be better for the agent, if her behavior is strictly deterministic. Intuitively, that is what learning would achieve. 2 In the main text axioms are presented informally. For a formal presentation of axioms see Appendix. 3 For a de…nition of technical terms see Kanan (1979). 6 Axiom 4 The revealed preference relation R can be represented by a continuously di¤erentiable utility function. Axioms 4 allows us to put restrictions on the deterministic part of learning process, namely it implies that deterministic adjustment takes place in the direction of increased utility. This result is rather intuitive, formal proof can be found in Basov (2000a). Now the main result about the form of the learning algorithm can be restated in a following way. Any stochastic learning algorithm can be decomposed into four behavioral components: adjustment in the direction of increased payo¤s (generalized gradient dynamics)4 , experimentation, direct imitation, and imitation of scope. While the …rst three components are self explanatory and rather intuitive, the last needs some explanation. By imitation of scope I mean a procedure like the following: an agent observes the choice of another agent in the population and decides to experiment randomly within a window the width of which is proportional to the Euclidean distance between her current choice and the observation. Note that the outcome of the experiment may fall closer to or further from the observed choice. To understand the intuition behind such a procedure, note that experimentation pays o¤ when an agent is far from the optimum. Assume that all agents are ex-ante symmetrical and have the same preferences. Then on average the distance of their choices from the optimum should be the same. If two choices are close to an optimum, then the distance between them should also be small by the triangle inequality. Alternatively, a large distance between two choices means that at least one of them is far from the optimum. Symmetry implies that there is at least a 50% chance that the observing agent is far enough and hence the value of experimentation is high for her. This paper uses a particular version of the model of bounded rationality developed in Basov (2000a) and described above. In Section 4 I consider agents who follow gradient dynamics. In Section 5 I combine it with the imitation of scope. I will not introduce direct imitation and experimentation. The reason is that their introduction will make analysis less tractable but is unlikely to o¤er any new insights. Indeed, direct imitation will simply facilitate gradient dynamics and experimentation will add exogenous noise to the endogenous one created by the imitation of scope. It is important to note that the model of bounded rationality developed 4 If adjustment takes place in the direction of maximum increase of payo¤s it is called gradient dynamics. 7 in Basov (2000a) and used in this paper can be applied to a wide variety of problems. In principal, any problem that can be considered using the conventional model can be reconsidered in this model. 
Hence, this model is not tailored to a particular application as many other models of bounded rationality are. For a discussion of other potential applications see Basov (1999). 3 A SIMPLE PRINCIPAL-AGENT MODEL In this section I will consider a simple conventional principal-agent model. Let the gross pro…t of the principal be given by ¦ = z + "; (1) where z is e¤ort undertaken by the agent, and " is random noise with zero mean and variance ¾ 2 . Only ¦ is observable by the principal. The utility of the agent is given by: Á z2 U = E(w) ¡ V ar(w) ¡ ; 2 2 (2) where w is the agent’s payment (wage) conditioned on z through ¦. The principal wants to maximize expected pro…ts net of the wage, subject to the incentive compatibility constraint: z2 Á z 2 arg max(E(w) ¡ V ar(w) ¡ ) 2 2 (3) and the individual rationality (participation) constraint: Á z2 E(w) ¡ V ar(w) ¡ ¸ 0: 2 2 (4) I will concentrate attention on a¢ne payment schemes: w = ®¦ + ¯: 8 (5) It is straightforward to show that the optimal a¢ne contract has: ®= 1 ; 1 + Á¾ 2 ¯= Á¾ 2 ¡ 1 : 2(1 + Á¾ 2 )2 (6) To see this, note that ® is chosen to maximize a total surplus W de…ned as (7) W = E(U + ¦ ¡ w); subject to (3), and ¯ is chosen to insure that (4) holds. Since in this case the objective function of the agent is strictly concave, the incentive constraint can be replaced by the …rst order condition z = ®. Plugging this into (7), solving the maximization program, and using (4) to obtain ¯, yields (6). The net pro…t of the principal and the utility of the agent under the optimal a¢ne compensation scheme are given by: E(¦ ¡ w) = 1 ; 2(1 + Á¾ 2 ) (8) U = 0: One can see that the slope ® of the optimal compensation scheme and the pro…t of the principal are decreasing in ¾, while the utility of the agent is determined by the reservation utility, which is normalized at zero here. Hence, noise damps incentives and dissipates social surplus. By way of contrast, consider a nonlinear scheme: w = ®¦ + °¦2 + ¯: (9) The utility of the agent will then be given by Á U = ®z + °(z 2 + ¾ 2 ) + ¯ ¡ (4z¾ 2 °[z° + ®] + 2 ° 2 V ar("2 ) + ®2 ¾ 2 + [4° 2 z + 2®°]E("3 )) ¡ 9 z2 : 2 (10) This formula can be obtained by plugging (9) into (2) and evaluating relevant expectations and variances. The terms V ar("2 ) and E("3 ) depend in general on higher moments of the distribution of ", that is on the third and fourth moments. For nonlinear schemes higher order moments will generally play a role. This means that knowledge of the entire distribution becomes important for general nonlinear schemes. Hence, they are not robust with respect to the distributional assumptions. 4 OPTIMAL INCENTIVES UNDER GRADIENT DYNAMICS In this section I will assume that the agent is boundedly rational and adjusts her choices in the direction of increasing payo¤. Let the general structure of the model be the same as in Section 2. As in the previous section, restrict attention to a¢ne compensation schemes. The main di¤erence between the model in this section and that of Section 2 is that the agent, instead of responding optimally to the compensation scheme, adjusts her choices according to the di¤erential equation: dz (11) = ®(t) ¡ z; dt conditionally on the decision to participate. The function on the right hand side of (11) is the derivative of agent’s utility with respect to z. This is the reason I call equation (11) gradient dynamics. It is worth mentioning that units of utility have meaning in this framework since they determine the speed of adjustment. 
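As a minimal numerical sketch of this point (the constant slope, the utility scale factor, and all parameter values below are illustrative assumptions, not taken from the model), note that rescaling the agent's utility by a factor λ turns (11) into dz/dt = λ(α − z) and changes only how fast the agent approaches the myopic best response z = α:

# Sketch: gradient dynamics dz/dt = lam*(alpha - z) for a constant slope alpha.
# lam rescales the agent's utility and therefore the speed of adjustment;
# all parameter values are illustrative only.
import numpy as np

def simulate(alpha=0.5, lam=1.0, z0=0.0, dt=0.01, T=10.0):
    z = z0
    path = [z]
    for _ in range(int(T / dt)):
        z += lam * (alpha - z) * dt   # Euler step for dz/dt = lam*(alpha - z)
        path.append(z)
    return np.array(path)

slow = simulate(lam=1.0)
fast = simulate(lam=2.0)
print(slow[-1], fast[-1])    # both converge to the best response z = alpha = 0.5
print(slow[200], fast[100])  # doubling utility roughly halves the time to reach any given effort level
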
This contrasts with the rational paradigm where utility units are arbitrary. The agent participates if her instantaneous utility at time t is nonnegative, that is if: ®z ¡ z2 1 2 2 ¡ Á® ¾ + ¯ ¸ 0: 2 2 (12) (To obtain the left-hand side of (12), plug the compensation scheme (5) into the agent’s utility function (2) and use the assumed expectation and variance.) 10 The principal seeks to maximize the discounted expected present value of net pro…ts, subject to (11) and (12). Solving (12) with equality for ¯; one gets the following optimal control problem for the principal: M ax Z1 e¡½t (z ¡ z 2 Á[®(t)]2 ¾ 2 ¡ )dt 2 2 (13) 0 subject to (11). The integrand is discounted total surplus. The present-value Hamiltonian for this problem has the form: H =z¡ z 2 Á®2 ¾ 2 ¡ + ¹(® ¡ z): 2 2 (14) The evolution of the costate variable ¹ is governed by: d¹ = (1 + ½)¹ ¡ 1 + z; dt (15) together with the transversality condition: lim ¹(t)e¡½t = 0: t!1 (16) The time discount rate ½ will be assumed to be small, in fact I will …x it at zero after using the transversality condition. This is to make the comparison of the steady state with the outcome of the static model meaningful. (In the opposite case with large discount rate the solution is trivial: take the level of e¤ort as given and pay the constant wage that ensures that the participation constraint is satis…ed. Indeed, since the principal does not care about the future and since current e¤ort is given, the only problem is one of optimal risk sharing, which implies the above outcome since the principal is risk neutral and the agent risk averse. Admittedly, this case is not very interesting.) The maximum principle states that the present-value Hamiltonian should be maximized with respect to ®, which implies ®(t) = 11 ¹(t) : Á¾ 2 (17) Combining (17) with (15)-(16) gives the complete system characterizing the optimal incentive schedule: d® z 1 = (1 + ½)® + 2 ¡ 2 ; dt Á¾ Á¾ dz = ® ¡ z; dt z(0) = z0 ; lim ®(t)e¡½t = 0; t!1 (18) (19) (20) where z0 is the initial e¤ort exerted by the agent. De…ne ° by the expression q (½ + 2)2 + °= 4 Á¾ 2 ¡½ 2 : (21) Then the only solution to the system (18)-(20) is: 1 1 + (z0 ¡ )(1 ¡ °)e¡°t 2 1 + Á¾ (1 + ½) 1 + Á¾ 2 (1 + ½) 1 1 z(t) = + (z0 ¡ )e¡°t : 2 1 + Á¾ (1 + ½) 1 + Á¾ 2 (1 + ½) ®(t) = (22) (23) Two things are worth mentioning here. First, the slope of the compensation scheme ®(t) converges to a stationary value, which coincides with the slope of the optimal compensation scheme for a rational agent in the static model if ½ = 0. Second, on the dynamic path z(t) > ®(t) provided that z0 is greater than the steady-state level; that is, the agent exerts more e¤ort than would be myopically optimal, though the di¤erence shrinks in time. It is straightforward to show that the present value of expected pro…ts of the principal as a function of the initial conditions have the form: ¦(z0 ) = Az02 + Bz0 + C; (24) with A < 0 and C < 0. (One has simply to plug (22)-(23) into (13) and carry out the integration.) This implies that pro…ts would be negative if initial e¤ort is too high. In that case the principal would prefer to stay out 12 of business. This makes intuitive sense since the principal has to compensate agent for wasted e¤ort due to the participation constraint. Note that the model of this section provides an additional rationale for simple linear schemes. Under suitable assumptions such schemes provide the agent a strictly concave objective function. 
Agents who search myopically would eventually learn their globally maximizing e¤ort level independently of initial e¤ort level. Since this level of e¤ort maximizes the total surplus and the agent’s utility is …xed by the participation constraint, the principle wants the agent to learn the global maximizer. With nonlinear compensation schemes, the agent’s objective function need not be concave and hence may have local maxima which are not global. In that case, what the agent learns might depend on the initial e¤ort level. 5 OPTIMAL INCENTIVES WHEN AGENTS ALSO IMITATE In this section I consider the problem of designing of an optimal compensation scheme when a population of agents is engaged in a social learning process. Assume that there is a continuum of identical agents working for the same principal. Assume that the principal can pay a wage based only on the output produced by an agent but not on relative performance, and is bound to pay di¤erent agents the same wage for the same performance. Each agent chooses e¤ort x from the interval [c; d] (d > c > 0) at each point in time.5 Under the wage schedule and cost of e¤ort speci…ed in Section 2, the instantaneous payo¤ to the agent from choosing x at t is given by the function: U (x; t) = ®(t)x ¡ x2 + const: 2 (25) The constant term in (25) does not depend on x but depends on ®. Each agent starts at some exogenously speci…ed e¤ort level x0 2 and adjusts it at times k¢t, where k is a natural number and ¢t is some …xed time interval length. To describe the adjustment rule it is necessary to specify 5 The symbol z will now be used for mean e¤ort level in the agent population. 13 the information available to the agent at the moment of adjustment and the rules of information processing. I will assume that each agent knows the gradient (i.e. derivative) of the payo¤ function at the point of her current choice. During the time interval ¢t the agent also observes the choice of some other randomly selected member of the population of agents. Let yt denote the observed choice. The agent adjusts her choice of x according to the rule: xt+¢t = (1 ¡ °(¢t))(xt + @U (xt ) ¢t) + °(¢t)(yt ¡ xt ): @x (26) The …rst term represents gradient dynamics, while the second term represents imitation. In the above formula the imitation weight °(¢t) is assumed to be itself a nondegenerate random variable with a compact support and such that E(°(¢t)) = 0. This implies a particular form of imitation: imitation of scope, which was informally discussed in Section 2. Since °(¢t) assumes both positive and negative values with positive probability, the agent does not imitate directly the choice of the other agent. Instead, she opens a search window the width of which is determined by the degree of disagreement between her current behavior and the observed choice of another agent, that is by (yt ¡ xt ). Intuitively, since the observation the agent makes is the choice of another boundedly rational agent, there is no good reason to imitate the choice directly. On the other hand, the spread of the choices in the population indicates that society as a whole does not know the optimal choice and hence that there may be returns to experimentation. The second term in (26) embodies a simple version of this intuition: jyt ¡ xt j increases probabilistically in the population’s spread. It will be convenient to assume °(¢t) = ®(¢t)u, where ®(¢t) is a deterministic function and u is a random variable, independent of x, with a compact support and such that E(u) = 0; and V ar(u) = 1. 
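A single adjustment step of rule (26) can be written out directly. The following Python sketch is only an illustration: the uniform distribution for u and the numerical values are assumptions made here for concreteness, since the model only requires u to have zero mean, unit variance, and compact support.

# One adjustment step of rule (26): a gradient step combined with imitation of scope.
# u is drawn uniformly on [-sqrt(3), sqrt(3)], which has mean 0 and variance 1;
# this particular distribution and all parameter values are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def adjust(x, y, alpha, dt, gamma_scale):
    """x: current effort, y: observed effort of another agent, gamma_scale = alpha(dt)."""
    grad = alpha - x                              # dU/dx for the payoff (25)
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0))  # E(u)=0, Var(u)=1, compact support
    gamma = gamma_scale * u                       # imitation weight gamma(dt) = alpha(dt)*u
    return (1.0 - gamma) * (x + grad * dt) + gamma * (y - x)

print(adjust(x=0.3, y=0.8, alpha=0.5, dt=0.01, gamma_scale=0.1))

Because γ(Δt) takes both signs, the new choice can land on either side of the observation; the disagreement |y − x| only widens the window from which the next choice is effectively drawn.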
To proceed further I will make the following assumption.

Assumption 1 $\alpha(\Delta t) = \sqrt{\Delta t}$.

The assumption guarantees that in the continuous-time limit both the gradient dynamics and the imitation terms will be of the same order of magnitude. If α(Δt) converged to zero at a faster rate as Δt goes to zero, the continuous-time limit would be described by the gradient dynamics already considered in the previous section. If α(Δt) converged to zero at a slower rate, the compensation scheme would have no impact on behavior in the continuous-time limit, which does not seem very realistic. So I will consider behavior generated by (26) in the continuous-time limit under Assumption 1.

Let f(x, t) denote the density of the choices in the population of agents at time t. (If we normalize the mass of the population to one, an equivalent interpretation of f(x, t) is the probability density of the choice of an individual at time t.) Its evolution is described by the following theorem:

Theorem 1 Let the adjustment rule be given by (26) under Assumption 1. Then the continuous-time evolution of the density f(x, t) is well defined and is governed by:

\frac{\partial f}{\partial t} + \frac{\partial}{\partial x}\bigl((\alpha(t) - x)f\bigr) = \frac{1}{2}\frac{\partial^2}{\partial x^2}\bigl(((x - z(t))^2 + v(t))f\bigr)   (27)

z(t) = \int_c^d x f(x, t)\,dx   (28)

v(t) = \int_c^d (x - z(t))^2 f(x, t)\,dx.   (29)

Proof. Let Q(t, Δt, x_t, ·) denote the probability of ending up in a given set at time t + Δt if the choice at time t is x_t, and let V_η denote the complement of the η-neighborhood of x_t. Using (26) and the fact that γ(Δt) = α(Δt)u, where α(Δt) is a deterministic function and u is a random variable, independent of x, with compact support, E(u) = 0, and Var(u) = 1, it is straightforward to see that

1. Q(t, \Delta t, x_t, V_\eta) = o(\Delta t)

2. \int_{U_\eta} (w_t - x_t)\,Q(t, \Delta t, x_t, dw_t) = (\alpha(t) - x)\Delta t + o(\Delta t)

3. \int_{U_\eta} (w_t - x_t)^2\,Q(t, \Delta t, x_t, dw_t) = ((x - z(t))^2 + v(t))\Delta t + o(\Delta t).

Now Theorem 1 follows from a well-known result from the theory of stochastic processes (see Kanan (1979)).

The system (27)-(29) should be supplemented by initial and boundary conditions. The initial condition is arbitrary, but I will impose the following boundary condition:

(\alpha - x)f - \frac{1}{2}\frac{\partial}{\partial x}\bigl(((x - z)^2 + v)f\bigr) = 0 \quad \text{for } x = c, d, \; \forall t \geq 0.   (30)

The boundary condition (30) guarantees conservation of the population mass. (Basov (1999a) gives a derivation of (27)-(29) in a multidimensional context under assumptions slightly more general than Assumption 1; that paper also derives some general properties of the dynamics specified by this system and discusses the behavioral foundations of the boundary condition (30).)

An important feature of this model is the existence of special kinds of solutions: wave packets and quasistationary states. Intuitively, a wave packet is a solution to (27)-(29) in which the mean moves according to the gradient dynamics and the variance shrinks so slowly that in a first-order approximation it can be considered constant. As the mean approaches its steady-state value under the gradient dynamics, a wave packet converges to a quasistationary state, that is, to a distribution with a very slowly changing mean and variance.
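A simulation makes the wave-packet behavior easy to see. The Python sketch below is only a heuristic check with illustrative parameters: it holds the slope α fixed, ignores the boundaries c and d (the packet stays in the interior, in the spirit of condition (31) below), and uses a discrete rule whose conditional mean (α − x)Δt and conditional variance (y − x)²Δt are exactly the moments computed in the proof of Theorem 1.

# Heuristic check of the wave-packet behavior: mean effort follows dz/dt = alpha - z
# while the population variance stays nearly constant. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N, dt, steps, alpha = 100_000, 0.01, 500, 0.5

x = rng.uniform(0.7, 0.9, size=N)              # initial effort distribution
for _ in range(steps):
    y = x[rng.integers(0, N, size=N)]          # each agent observes a randomly drawn agent
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=N)   # E(u)=0, Var(u)=1, compact support
    x = x + (alpha - x) * dt + np.sqrt(dt) * u * (y - x)   # gradient step + imitation of scope

print(x.mean())   # close to 0.5 + (0.8 - 0.5)*exp(-5): the mean tracks the gradient dynamics
print(x.var())    # close to the initial variance 0.2**2/12: the packet's variance barely moves
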
To demonstrate the existence of wave packets and quasistationary states, I need to assume that the choice space is big in the sense that c << z0 ; p 1 << d; c << a; b << d; v0 << d ¡ c 2 1 + Á(¾ + v0 ) (31) Here << means “much less,” z0 and v0 are the mean and variance of the population’s e¤ort distribution at time zero, respectively; and the numbers a and b (a < b) are bounds on the support of z0 which guarantee that the expected pro…t of the principal at time zero is positive. The inequalities in (31) say that the initial distribution and quasistationary state are concentrated far from the boundary points, and that the principal will never force a large probability mass close to the boundary.6 Under (31) I can derive di¤erential equations that govern the evolution of z(t) and v(t). To do this, di¤erentiate equations (28) and (29) with respect to time, integrate by parts, and use the boundary condition (30). This will yield: dz 1 = ®(t) ¡ z ¡ g(z; d; v)f (d; t) ¡ g(z; c; v)f (c; t) dt 2 dv = ¡(d ¡ z)g(z; d; v)f (d; t) ¡ (z ¡ c)g(z; c; v)f (c; t) dt 6 (32) (33) The argument for this last point is the same as in the next-to-last paragraph of Section 3. 16 where g(z; ³; v) = (³ ¡ z)2 + v. Under the conditions imposed on the initial density function, the boundary terms will be small. Indeed, they are small at time zero by assumption and remain small because the variance shrinks in time (due to (33)) and the principal has no incentive to push probability mass close to the boundary. Hence, the system (32)-(33) can be rewritten approximately in the form: dz (34) = ®(t) ¡ z dt dv (35) = 0: dt System (34)-(35) implies that the mean follows the gradient dynamics, while the variance remains constant (wave packet). Next, to formulate the principal’s problem, I need to reformulate the participation constraint. I will assume that each agent observes the variance of the output in population and uses it to evaluate wage variance and, hence, her expected utility and participates as long as it is greater then zero. This pins down ¯(t). Given this, either all agents will decide to participate or all will drop out. Hence, the principal’s problem is M ax Z1 e¡½t (z ¡ z 2 + v(t) Á[®(t)]2 (¾ 2 + v(t)) ¡ )dt 2 2 (36) 0 dz (37) s:t: = ®(t) ¡ z: dt The variance v(t) will be taken to be constant according to the approximation (35); hence I will omit the argument t in the function v(t) below. The derivation of (36) is similar to the derivation of (13). The only di¤erence is that e¤ort is now stochastic. This randomness is re‡ected in the expected cost of e¤ort (E(x2 ) = z 2 + v) and the variance of output which now has two components, exogenous ¾ 2 and endogenous v. The solution to (36)-(37) is given by: (1 ¡ ±) 1 ¡ + (z )e¡±t (38) 0 2 2 1 + Á(¾ + v)(1 + ½) 1 + Á(¾ + v)(1 + ½) 1 1 z(t) = + (z0 ¡ )e¡±t ; (39) 2 2 1 + Á(¾ + v)(1 + ½) 1 + Á(¾ + v)(1 + ½) ®(t) = 17 where q (½ + 2)2 + 4 Á(¾ 2 +v) ¡½ (40) : 2 From (38)-(39) one can see that the steady-state incentive ®(1) is lower than under pure gradient dynamics, and convergence takes longer. On the dynamic path, z(t) > ®(t) provided that z0 is greater than the steady-state level; that is, agents work harder than they would if they myopically responded optimally to the compensation scheme (and to the optimal static model incentive scheme). Such a situation becomes more likely as the endogenous variance v increases. 
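A small numerical illustration (with purely illustrative parameter values) shows the size of the effect. Take φ = 1, σ² = 1 and let ρ → 0. Under pure gradient dynamics the steady-state slope in (22) is 1/(1 + φσ²) = 0.5, and the rate in (21) simplifies to γ = √(1 + 1/(φσ²)) = √2 ≈ 1.41. If the population's endogenous variance is v = 1, the steady-state slope in (38) drops to 1/(1 + φ(σ² + v)) = 1/3, and the rate in (40) falls to δ = √(1 + 1/(φ(σ² + v))) = √1.5 ≈ 1.22: incentives are weaker and convergence is slower.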
Hence, social learning decreases the power of the optimal incentive scheme but makes it more likely that agents will overwork relative to that scheme. It is also important to note that the endogenous variance decreases both social welfare and the principal's profits. Hence, the value of incentive contracts goes down under bounded rationality. This implies that the parties may abandon incentive contracts altogether and rely on reciprocal schemes instead. The role of reciprocity and gift-exchange behavior was first discussed by Akerlof (1982). Fehr (2000) demonstrated the existence of reciprocal behavior in a series of controlled experiments, and Fehr and Gachter (2000) analyzed the role of reciprocity in a general incomplete-contracting framework. Since the value of incentive contracts diminishes under bounded rationality, reciprocity-based contracts might become important even if the proportion of reciprocal agents in the population is small. For further elaboration of this point see Basov (2000b).

At the end of the previous section I mentioned that gradient dynamics provides an additional rationale for simple linear incentive schemes. With social learning this argument is even stronger. While under gradient dynamics what the agent learns depends on her initial effort level, now it depends only on the initial distribution of effort in the population. Statistically, a single initial effort is much easier to estimate than the entire initial population distribution of effort. Hence, the robustness problem already mentioned becomes even more severe in the presence of social learning, making the use of simple incentive schemes even more important.

In all of the above I have assumed a strictly convex quadratic effort cost. If the effort cost is linear, that is, g(x) = κx, then, neglecting boundary terms, the equations for z and v become:

\frac{dz}{dt} = \alpha(t) - \kappa   (41)

\frac{dv}{dt} = 2v.   (42)

The derivation of (41)-(42) is similar to the derivation of (34)-(35). If κ < 1, the solution to the principal's problem subject to (41)-(42) is:

\alpha(t) = \frac{1 - \kappa}{\rho\phi(\sigma^2 + v_0 e^{2t})}.   (43)

Under the assumption of a small discount rate (ρ close to zero), (43) implies superpowerful incentives for small t (α is large, since it is proportional to 1/ρ), which then decrease. Strictly speaking, however, (43) is valid only for small t, since the variance increases exponentially, which can put a large probability mass near the boundary and make the approximation used to derive (41)-(42) inapplicable. To avoid this complication, use the cost-of-effort function

g(x) = \kappa x + \frac{x^2}{2}   (44)

and assume (31) and

c \ll z_0 \ll \kappa \ll \frac{1}{1 + \phi(\sigma^2 + v_0)} \ll d.   (45)

Then for small t the solution is described by (43); that is, the principal will provide superpowerful incentives. As z(t) reaches and passes κ, which eventually happens given a sufficiently small discount rate, the second term in the cost of effort begins to dominate and the evolution of the slope of the incentive scheme is approximately described by (38). This says that individuals should be given superpowerful incentives at the beginnings of their careers and less powerful incentives towards the end. It implies that wages will increase with tenure, at a rate that is higher at the beginning of a career (since expected effort increases) and stabilizes afterwards. Williams (1991), for example, found that tenure increases wages only in the first several years of employment, which is consistent with this implication of the model.
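To get a feel for the magnitudes (the numbers below are purely illustrative), take κ = 0.1, φ = 1, σ² + v₀ = 1, and ρ = 0.01. Early in the career, (43) gives a slope of α(0) = (1 − κ)/(ρφ(σ² + v₀)) = 0.9/0.01 = 90, while the later quasi-stationary slope implied by (38) is below one. Under these assumed values the initial incentives are roughly two orders of magnitude more powerful than the mature-career ones.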
19 6 SUFFICIENT STATISTICS REVISITED One of the main results of conventional contract theory is that when several signals of e¤ort are observed optimal contracts should be based on the su¢cient statistics of e¤ort. Since in the …rst best the principal should compensate the agent only for e¤ort, it seems quite intuitive that the secondbest compensation is based on the best possible estimate of e¤ort available.7 However, this is not the case under bounded rationality. Intuitively, the reason is that under bounded rationality e¤ort can be only partly attributed to the incentive scheme. The other part comes from the social learning process. Formally, consider a model similar to the model of the previous section but allow for two measures of output ¦1 and ¦2 . Let them be determined as follows: ¦1 = x + "1 ¦2 = x + "2 : (46) (47) Here x is e¤ort exerted by the agent, "1 and "2 are independent normal random variables with zero means and variances ¾ 21 and ¾ 22 , respectively. Suppose the principal is interested in ¦1 only.8 The optimal contract under perfect rationality should be based on a su¢cient statistics for x, namely on ¦1 ¦2 + 2. ¾ 21 ¾2 (48) This is rather intuitive: in the optimal contract an observation that conveys more information should be given higher weight. This is re‡ected by the fact that the ratio of the coe¢cients before ¦1 and ¦2 is ¾ 22 =¾ 21 . In the case of bounded rationality, going through the same calculations as in the previous section one can obtain that this ratio changes to (¾ 22 + v)=(¾ 21 + v)9 . That is, the optimal contract will have the form 7 This intuition is a little misleading since in the equilibrium the principal knows the e¤ort. However, to create correct incentives, she should pretend that she does not know and behave as a statistician who tries to estimate e¤ort from available data. 8 Everything below would be true if the principal were interested in any convex combination of ¦1 and ¦2 . 9 To obtain this result one has to assume that agents observe variances of both measures 20 w(t) = ®(t)( ¦2 ¦1 + 2 ) + ¯(t): + v ¾2 + v (49) ¾ 21 The optimal slope ® will be given by ®(t) = 1 (1 ¡ ´) + (z0 ¡ )e¡´t 1 + Á³(1 + ½) 1 + Á³(1 + ½) (50) where (¾ 2 + ¾ 22 + v) ³= 21 ; ´= (¾ 1 + v)(¾ 22 + v) q (½ + 2)2 + 2 4 Á³ ¡½ : (51) The intercept ¯ is given by the participation constraint. Its exact value is not interesting here. (To obtain all these results, one needs to go through calculations similar to those of the previous two sections.) It is worth mentioning that the ratio (¾ 22 + v)=(¾ 21 + v) is less sensitive to the changes in the variances of exogenous noise then ratio ¾ 22 =¾ 21 . If social noise dominates technological uncertainty (v >> max(¾ 21 ; ¾ 22 )) then (49) can be rewritten approximately as w(t) = 2®(t) ¦1 + ¦2 ( ) + ¯(t) v 2 (52) This means that in this case the agents’ payments depend on the arithmetic mean of the signals. Intuition for this is straightforward: the principal wants to pay only for the part of e¤ort that responds to the incentive scheme. To do so, she must …lter out both technological uncertainly and social noise. Since social noise is common to both signals, taking the arithmetic mean is the best way to …lter it. When social noise is much greater then technological uncertainty, then the issue of …ltering out social noise dominates the issue of …ltering out technological uncertainty. of output and treat them as independent when calculating the variance of wage. 
In fact, the behavior of the agents creates correlation between Π1 and Π2 (though they are independent conditional on effort), but the agents fail to understand this. If we assume instead that agents observe the wage variance directly, the results change quantitatively but not qualitatively.

It is interesting to examine some empirical evidence in the light of this result. Bertrand and Mullainathan (1999) found that actual payment does not filter out all technological noise in a variety of contexts. This is inconsistent with standard agency theory but consistent with my model. Indeed, let v ≫ σ2² ≫ σ1². In this case the optimal compensation for rational agents should be almost independent of Π2; in other words, Π2 should be filtered out. In the case of imitating agents, however, the optimal compensation should depend approximately on the arithmetic mean of Π1 and Π2. This means that the optimal contract here does not filter out all technological noise, as it would in standard theory.

7 CONCLUSION

This paper develops a theoretical framework for analyzing incentive schemes when agents behave in a boundedly rational manner. Even though the models take very simple forms, they help to create a framework for addressing robustness and complexity issues. They help to explain the underprovision of optimal incentives, deviations from the sufficient statistics result, and gift-exchange behavior. Even though gift-exchange behavior is not a direct consequence of the model, the model helps to shed light on its prevalence. Indeed, the behavioral assumption that replaces rationality in the model of Section 5 is that agents' choices follow a continuous stochastic process whose stochastic component is determined by a social learning rule. The important property of this social learning rule is that it allows endogenous variance in the distribution of choices to persist in the steady state. This in turn dissipates social surplus and the principal's profits, and it makes incentive schemes rather unattractive in some cases, in which they may be replaced by reciprocity-based schemes. If the endogenous variance of agents' choices under bounded rationality is large enough, even a small proportion of reciprocal workers suffices to make reciprocity-based schemes more attractive. Underprovision of optimal incentives, deviations from the sufficient statistics result in the direction predicted by the model, and gift-exchange behavior are all ubiquitous in the real world. The fact that incentive contracts become less attractive under social learning confirms the intuition, expressed, for example, in Akerlof (1982), that gift exchange is a result of social interaction. Even though many questions (for example, the robustness of the optimal incentive scheme) remain open, this paper can be viewed as a useful first step in incorporating bounded rationality into incentive problems.

REFERENCES

Akerlof, G. A. "Labor Contracts as Partial Gift Exchange," Quarterly Journal of Economics, 1982, 97, pp. 543-569.

Anderson, S. P., J. K. Goeree, and C. A. Holt. Stochastic Game Theory: Adjustment to Equilibrium under Bounded Rationality, unpublished draft, 1997.

Basov, S. "Axiomatic Model of Learning," unpublished draft, 2000a.

Basov, S. "Bounded Rationality, Reciprocity, and Incomplete Contracts," unpublished draft, 2000b.

Bertrand, M., and S. Mullainathan. "Are CEOs Rewarded for Luck? A Test of Performance Filtering," NBER Working Paper #7604.

Bush, R., and F. Mosteller. Stochastic Models for Learning, New York: Wiley, 1955.

Dewatripont, M., I.
Jewitt, and J. Tirole “The Economics of Career Concerns, Part I,” Review of Economic Studies, 1999, 66, pp.183-198. Estes W. K., “Towards Statistical Theory of Learning,” Psychological Review, 1950, 57(2), pp.94-107. Fehr, E. Do Incentive Contracts Crowd Out Voluntary Cooperation? mimeo, University of Zurich, 2000. Fehr, E., and S. Gachter Fairness and Retaliation: The Economics of Reciprocity. Forthcoming in the Journal of Economic Perspectives, 2000. Fudenberg, D., and C. Harris. “Evolutionary Dynamics with Aggregate Shocks,” Journal of Economic Theory, 1992, 57, pp. 420-441. .. Hart, O., and B. Holmstrom “The Theory of Contracts,” in T. Bewley (ed.) Advances in Economic Theory, Fifth World Congress, 1987, New York: Cambridge University Press. .. Holmstrom, B. “Moral Hazard and Observability,” Bell Journal of Economics, 1979, 10, pp.74-91 .. Holmstrom, B. “Moral Hazard in Teams,” Bell Journal of Economics, 1982, 13, pp. 324-340. Kanan, D. An Introduction to Stochastic Processes, Elsevier North Holland, Inc., 1979. Kandori, M., G. Mailath, and R. Rob “Learning, Mutation and Long Run Equilibria in Games,” Econometrica, 1993, 61, pp. 29-56. Luce R. D. Individual Choice Behavior, Wiley, 1959. Mirrlees, J. “Notes on Welfare Economics, Information and Uncertainty,” in M. Balch, D. McFadden, and S. Wu (eds.) Essays in Economic Behavior 24 Under Uncertainty, 1974, pp.243-258, Amsterdam: North-Holland. Mirrlees, J. The Optimal Structure of Authority and Incentives Within an Organization,” Bell Journal of Economics,1976, 7, pp. 105-131. Rogers L. C. G., and D. Williams Di¤usions, Markov Processes and Martingales, Wiley Series in Probability and Mathematical Analysis, John Wiley & Sons, 1994. Ross, S. “The Economic Theory of Agency: The Principal’s Problem,” American Economic Review, 1973, 63, pp.134-139. Shavell, S. “Risk Sharing and Incentives in the Principal and Agent Relationship,” Bell Journal of Economics, 1979, 10, pp. 55-73. Spence, M. and R. Zeckhauser. “Insurance, Information and Individual Action,” American Economic Review (Papers and Proceedings), 1971, 61, pp. 380-387. Williams, N. “Reexamining the Wage, Tenure and Experience Relationship,” The Review of Economics and Statistics, 1991, 73, pp.512-517. Young, P. “The Evolution of Conventions,” Econometrica, 1993, 61, pp. 57-84. 25 8 APPENDIX In this appendix I will introduce formal notation, formulate axioms needed for decomposition of a stochastic behavioral algorithm, and prove the decomposition theorem. Assume that an individual is faced with repeated choices over time from the set of alternatives ½ Rn ; which is assumed to be compact, and let § be a sigma-algebra on . Other individuals are making similar decisions simultaneously, and, although their decisions do not a¤ect each other directly, they are assumed to carry relevant information. At time t the individual observes the current choice of another member of the population, yt . For any ¡ 2 §, de…ne the transition probability P (fx(t)g; fy(t)g; ¡; t; ¿ ), that is the probability that agent who at time t has history of choices fx(t)g and history of observations fy(t)g, will make a choice w 2 ¡ at time t + ¿ . Here P (fx(t)g; fy(t)g; ¡; t; ¿ ) is assumed to be a continuous functional of fx(t)g, and fy(t)g, continuous function of t and ¿ , and for each fx(t)g; fy(t)g; t; ¿ , de…nes a measure on § which satis…es P (fx(t)g; fy(t)g; ; t; ¿ ) = 1. Assumption 1 P (fx(t)g; fy(t)g; ¡; t; ¿ ) = P (x; y; ¡; t; ¿ ); where x = x(t); and y = y (t). 
Assumption 1 says that only current choice and current observation determine transition probabilities. In other was, if choices in population at time t are determined by a density function f(y; t) then a process with a transition probability P (x; ¡; t; ¿ ) = Z P (x; y; ¡; t; ¿ )f(y; t)dy (53) will be a Markov process. Assumption 1 is rather weak. Indeed, assume an agent keeps track about her choices and observations at discrete moments of time and remembers only …nitely many choices and observations. Then one can always rede…ne choice space in such a way that Assumption 1 will hold. Hence, Assumption 1 is essentially …nite memory assumption. Assumption 2 There exists a function p(x; y; z; t; ¿ ) > 0 measurable in z and twice continuously di¤erentiable in ¿ such that for any ¡ 2 § transition probability is given by P (x; y; ¡; t; ¿ ) = Z p(x; y; z; t; ¿ )dz ¡ 26 De…ne a set V± (x) = fw 2 : kw ¡ xk < ±g, where k¢k denotes Euclidean norm. Assumption 3 For any ± > 0 and any x 2 transition probability satis…es P (x; y; V±c (x); t; ¿ ) = o(¿ ) and the following limits exist: 1 lim E(x(t + ¿ ) ¡ x(¿ )j x(t); y(t)) ¿ !0 ¿ 1 lim V ar(x(t + ¿ ) ¡ x(¿ )j x(t); y(t)): ¿ !0 ¿ Here and throughout the Appendix superindex c near set denotes the complement of the set. Assumption 4 For any x 2 and any neighborhood V (x) transition probability P (x; y; V c (x); t; ¿ ) considered as function of y achieves minimum at y = x. This assumption says that observing y di¤erent from x increases the probability to moving away from x at increasing rate. This assumption makes random term involved in imitation of scope and exogenous experimentation independent. Since this independence is not important for my results I omitted it in the text. Finally, I will assume: Assumption 5 p(x; y; z; t) is four times continuously di¤erentiable in x and y for any t ¸ 0 and any realization of z. Assumptions 2 and 5 are combined in the main text in Axiom 2. Theorem 2 Assume that Assumptions 1-5 are satis…ed. Let f (¢; t) denotes the density of population choices at time t. Then the exist twice continuously di¤erentiable vector function ¹1 (x; t); and matrix valued functions ¹2 (x; y; t), ¡1 (x; t), and ¡2 (x; y; t) such that matrices ¡1 (x; t), and ¡2 (x; y; t) are positive semide…nite. Then the generator of the stochastic behavioral algorithm is given by: $ = (¹1 + Z 1 T r(¡1 + 2 Z ¹2 (y ¡ x)f (y; t)dy)r + (y ¡ x)T ¡2 (y ¡ x)f(y; t)dy)D2 ; 27 (54) where @ @ @2 2 r=( ; :::; ); fD gij = : @x1 @xn @xi @xj (55) and T r denotes trace of a matrix. Proof. By de…nition (Rogers and Williams (1994)) the generator of a Markov process is de…ned by: P (x; ¡; t; ¿ ) ¡ I ; ¿ !0 ¿ (56) $ = lim where I is identity operator. It can be shown (Kanan (1979)) that under assumptions of the theorem $ = a(x; t)r + C(x; t)D2 ; (57) where 1 a(x; t) = lim E(x(t + ¿ ) ¡ x(¿ )) ¿ !0 ¿ 1 C(x; t) = lim V ar(x(t + ¿ ) ¡ x(¿ )): ¿ !0 ¿ Assumption 3 guarantees that the above limits exist. The law of iterated expectations implies that a(x; t) = Z ¹(x; y; t)f (y; t)dy; C(x; t) = Z ¡(x; y; t)f(y; t)dy (58) where 1 ¹(x; y; t) = lim ¿ !0 ¿ Z 1 (z ¡ x)pdz; ¡(x; y; t) = lim ¿ !0 ¿ Using Assumption 5 one can write: 28 Z (z ¡ x)(z ¡ x)T pdz (59) ¹(x; y; t) = ¹(x; x; t) + ¹0 (x; y; t)(y ¡ x) ¡ij (x; y; t) = ¡ij (x; x; t) + n X k=1 n X k;`=1 (60) ¡0ijk (x; x; t)(y ¡ x)k + ¡00ijk` (x; y; t)(y ¡ x)k (y ¡ x)` (61) Assumptions 4 and 5 imply that ¡0ijk (x; x; t) = 0. 
De…ne ¹1 (x; t) = ¹(x; x; t); ¹2 (x; y; t) = ¹0 (x; y; t); n X ¡1ij (x; t) = ¡ij (x; x; t); ¡2k` (x; y; t) = ¡00ijk` (x; y; t) (62) (63) i:j=1 Positive semide…nitness of matrices ¡1 and ¡2 follows from their de…nition and Assumption 4. Finally, $ = ¹1 + Z 1 T r(¡1 + 2 Z ¹2 (y ¡ x)f (y; t)dy)r + (y ¡ x)T ¡2 (y ¡ x)f (y; t)dy)D2 (64) and the theorem is proved. To interpret Theorem 2 let us consider a speci…c behavioral model. The framework is similar to general model except for time is assumed to be discrete and behavioral rule is given explicitly by: xt+¿ ¡ xt = ·(xt ; yt ; t)¿ + B(¿ ; xt ; yt ; t)(yt ¡ xt )"t + ¤(xt ; t)» t 29 (65) Here ·(xt ; yt ; t) = ¸(xt ; t) + ±(xt ; yt ; t), where ±(xt ; yt ) = º(xt ; yt )(yt ¡ xt ) for some matrix º. All functions are assumed to be twice continuously differentiable in x and y, and continuously di¤erentiable in t. The random variables " and » are assumed to be independent, have compact support, E(") = E(») = 0 and V ar(") = V ar(») = 1. The …rst term in the behavioral rule describes the deterministic adjustment and the direct imitation of choices, the second describes imitation of scope, and the third one describes exogenous experimentation. I will study continuous time limit of the stochastic process generated this behavioral rule. To pass to this limit assume: matrix valued Assumption 6 There exist twice continuously di¤erentiable p 2m function b : R ! R, such that B(¿ ; xt ; yt ; t) = b(xt ; yt ) ¿ : ¹1 = ¸, ¹2 = º, ¡1 = bT b, and ¡2 = ¤T ¤. Theorem 3 Let Assumption 6 be satis…ed. The the generator of stochastic process (64) is given by (63) with ¹1 = ¸, ¹2 = º, ¡1 = bT b, and ¡2 = ¤T ¤. Hence, any stochastic behavioral algorithm can be decomposed into deterministic adjustment, direct imitation, experimentation and imitation of scope. To prove this theorem one can use a technique similar to one used in the prove of Theorem 2. For details see Basov (2000a). 30