INCENTIVES FOR BOUNDEDLY RATIONAL AGENTS
March 2, 2001
Suren Basov
Department of Economics
Boston University
270 Bay State Road
Boston, MA 02215
Acknowledgments: I am grateful to Adam Brandenburger, Hsueh-Ling
Huynh, Dilip Mookherjee, Robert W. Rosenthal, and Marc Rysman for helpful comments and discussion. None of them is responsible for any mistakes
in the paper.
Keywords: bounded rationality, incentives, principal-agent model.
JEL classification numbers: C60, D82, D83
e-mail: [email protected]
Abstract
This paper develops a theoretical framework for analyzing incentive schemes under bounded rationality. It starts from a standard principal-agent model and then superimposes an assumption of boundedly rational behavior on the part of the agent. Boundedly rational behavior is modeled as an explicit optimization procedure which combines gradient dynamics with a specific form of social learning called imitation of scope. The model creates a framework for addressing the issues of the robustness and complexity of incentive schemes. The results help to shed light on the underprovision of optimal incentives and on gift-exchange behavior in the real world. As a by-product, a standard sufficient statistics result from the agency literature is seen not to hold in this world.
1 INTRODUCTION
Agency relationships form an important part of economic life. Among the most common examples are managers acting on behalf of an owner, workers supplying labor to a firm, and customers buying coverage from an insurance company. The common feature of all these examples is that unobservable actions undertaken by one party have payoff-relevant consequences for another. This creates a moral hazard problem. The early papers that incorporated agency relationships into formal economic models were Spence and Zeckhauser (1971), Ross (1973), Mirrlees (1974, 1976), and Holmström (1979). The first three authors used a state-space formulation, while the last two switched to a parametrized distribution formulation. The latter formulation has become more popular, in part because of its greater tractability.
Though the assumptions of the basic models, pioneered by Mirrlees (1974, 1976), were quite general, they were capable of generating some predictions that were in broad agreement with casual empiricism. The main predictive power of the models is due to a sufficient statistics result [Holmström (1979, 1982), Shavell (1979), Dewatripont, Jewitt, and Tirole (1999)]. Among other things, it leads to statements about the use of relative performance evaluation that match empirical evidence. The result states that optimal compensation to an agent should be based on a function of observable measures of effort which is a sufficient statistic for effort. Intuitively, this means that the principal pretends not to know the effort level undertaken by the agent¹ and uses the best statistical estimate based on performance. Adding other measures of performance on top of the sufficient statistic would only add noise without improving statistical inference. Given that the optimal scheme should balance risk sharing against the creation of proper incentives, such additions impair performance.

¹ In fact the principal knows the effort level on the equilibrium path.
However, the basic agency model has some well-recognized difficulties. The main one is its sensitivity to distributional assumptions, which manifests itself in the complexity of the optimal sharing rules. We see that sharing rules are much simpler in practice than the basic agency model would suggest. One explanation for such simplicity is a demand that the optimal scheme should be robust to a variety of hypotheses. The demand for robustness can be understood by imagining an agent with a richer set of actions. Since intricate schemes are too finely tuned, the more options such an agent has, the worse the old optimal scheme will perform. This issue was addressed in the paper of Holmström and Milgrom (1987), which found a setting in which a simple scheme is optimal. They considered a dynamic context, where the agent is paid at the end of a period (say a month), but she observes her own performance during the month so that she can adjust her effort based on the partially realized path of output. Though feasible incentive schemes can be very complicated and conditioned on the entire history, Holmström and Milgrom showed, in the case where the principal and the agent have CARA utility functions, that the optimal scheme pays wages that are a linear function of the final output, while the agent chooses a constant effort over time, independently of the realized history.
Though the linearity result can shed some light on the simplicity of compensation schemes used in practice, the assumptions of the model are quite restrictive. For example, for the result to hold the agent must be prevented from making private investments, which does not seem to be very realistic. As Hart and Holmström (1987) write, "one will have to go outside the Bayesian framework and introduce bounded rationality in order to capture the true sense in which incentive schemes need to be robust in the real world."

This paper can be considered a first step in incorporating bounded rationality into incentive problems. It does not go so far as to derive linear incentive schemes (such a scheme is simply assumed in the paper). However, it provides a framework in which robustness issues can be addressed in future research.
Before going further it is important to say what is meant by “bounded rationality” in this paper. Full rationality implies that an agent has well-defined preferences over the set of possible outcomes and is capable of finding the optimal outcome instantaneously. I drop this assumption and model the agent’s behavior as a search algorithm. Think about maximizing a function on a computer. To do so one has to write a program. If one has to use the same program for many problems and requires that it work in reasonable time (so that grid search is excluded), one should not expect it to solve all problems perfectly. By “bounded rationality” I mean that the optimum has to be learned rather than found instantaneously, and that agents should not always be assumed to succeed in learning it. I do not consider bounded rationality in the senses of “limited memory” or “limited information.”
Basov (2000a) investigates the general properties of learning algorithms and parametrizes such algorithms in terms of the coefficients of the generator of a stochastic learning process. Some particular cases of those learning algorithms are applied in this paper to the standard principal-agent model. I find that when a principal knows that a population of agents behaves according to such a process and plans accordingly, the contracts she offers can be very different from those of standard models, and the resulting behavior of both principal and agents can be more realistic than is depicted in standard models.

The rest of the paper is organized as follows: Section 2 discusses general learning models, Section 3 reviews a standard principal-agent model, and Section 4 considers agents following gradient dynamics (an example of a learning algorithm). The central part of the paper is Section 5, where a social learning rule is proposed for a population of agents, and the main results about the nature of the contract and the resulting behavior of agents are presented. I revisit the sufficient statistic question in Section 6 and conclude in Section 7.
2 BOUNDED RATIONALITY AND LEARNING
As I stated in the Introduction, by “bounded rationality” I mean that the optimal choice should be learned by agents rather than made instantaneously. I will assume that learning generates a stochastic process on the space of possible choices. Introducing stochastic elements into a learning algorithm makes sense even for purely computational reasons. For instance, they may prevent an optimization algorithm from being trapped in a local (as opposed to global) maximum. In fact, many computational optimization algorithms (for example, simulated annealing) use stochastic elements. The stochastic nature of the learning algorithm also gives rise to probabilistic choice models, which have been routinely used to explain behavior in psychological experiments; see, for example, Estes (1950) and Bush and Mosteller (1955). These models were introduced to economics by Luce (1959).
Several stochastic learning models have found applications in economics. For example, see Kandori, Mailath, and Rob (1993), Fudenberg and Harris (1992), Young (1993), and Anderson, Goeree, and Holt (1997). These papers make specific assumptions about the source and type of randomness. I do not want to specify the source of randomness explicitly. Instead, I ask what is the most general form of a stochastic learning algorithm. To answer this question I use an axiomatic approach.² I assume that an agent adjusts her choice stochastically. She does so in a social environment, after observing the choice of another individual.
Axiom 1 The probability of reaching a particular set is determined solely by the current choice and the current observation.

Axiom 1 is rather weak. Indeed, assume an agent keeps track of her choices and observations at discrete moments of time and remembers only finitely many choices and observations. Then one can always redefine the choice space in such a way that Axiom 1 will hold. Hence, Axiom 1 is essentially a finite memory assumption.
Axiom 2 The transition probability conditional on the current choice and current observation is represented by a probability measure which is absolutely continuous with respect to Lebesgue measure and whose Radon-Nikodym derivative is sufficiently smooth.

Axiom 3 The stochastic process defined by the transition probabilities is Khinchine continuous and is uniquely defined by its generator.³
The axioms stated so far are technical in nature. They are satisfied by many reasonable stochastic processes. For example, Brownian motion with drift satisfies them.

Axioms 1-3 allow us to decompose any stochastic behavioral algorithm into a deterministic and a stochastic part. Furthermore, this decomposition is unique. The stochastic part can be decomposed further; I will return to this question below. A detailed proof of this assertion can be found in Basov (2000a). A sketch of the proof is given in the Appendix.

Besides those technical axioms, I would like to capture the intuition that the deterministic part of the adjustment process should represent a form of learning. In particular, it should be payoff improving over time.
Definition Let x(t) denote the choice of the agent at time t if she were governed by the deterministic part of the process alone. I will say that choice x₁ is strictly revealed preferred to choice x₂ (x₁ ≠ x₂) if there exists t₁ > t₂ ≥ 0 such that x(t₁) = x₁ and x(t₂) = x₂. In this case write x₁ R x₂.

This definition says that the choice which is made later should be better for the agent, if her behavior is strictly deterministic. Intuitively, that is what learning would achieve.
² In the main text the axioms are presented informally. For a formal presentation of the axioms see the Appendix.

³ For a definition of the technical terms see Kanan (1979).
Axiom 4 The revealed preference relation R can be represented by a continuously differentiable utility function.

Axiom 4 allows us to put restrictions on the deterministic part of the learning process; namely, it implies that deterministic adjustment takes place in the direction of increased utility. This result is rather intuitive; a formal proof can be found in Basov (2000a). Now the main result about the form of the learning algorithm can be restated in the following way. Any stochastic learning algorithm can be decomposed into four behavioral components: adjustment in the direction of increased payoffs (generalized gradient dynamics)⁴, experimentation, direct imitation, and imitation of scope. While the first three components are self-explanatory and rather intuitive, the last needs some explanation. By imitation of scope I mean a procedure like the following: an agent observes the choice of another agent in the population and decides to experiment randomly within a window whose width is proportional to the Euclidean distance between her current choice and the observation. Note that the outcome of the experiment may fall closer to or further from the observed choice. To understand the intuition behind such a procedure, note that experimentation pays off when an agent is far from the optimum. Assume that all agents are ex ante symmetric and have the same preferences. Then on average the distance of their choices from the optimum should be the same. If two choices are close to the optimum, then the distance between them should also be small, by the triangle inequality. Alternatively, a large distance between two choices means that at least one of them is far from the optimum. Symmetry implies that there is at least a 50% chance that the observing agent is far from the optimum, and hence the value of experimentation is high for her.
This paper uses a particular version of the model of bounded rationality developed in Basov (2000a) and described above. In Section 4 I consider agents who follow gradient dynamics. In Section 5 I combine it with imitation of scope. I will not introduce direct imitation and experimentation. The reason is that their introduction would make the analysis less tractable but is unlikely to offer any new insights. Indeed, direct imitation would simply facilitate the gradient dynamics, and experimentation would add exogenous noise to the endogenous noise created by the imitation of scope.

It is important to note that the model of bounded rationality developed in Basov (2000a) and used in this paper can be applied to a wide variety of problems. In principle, any problem that can be considered using the conventional model can be reconsidered in this model. Hence, this model is not tailored to a particular application, as many other models of bounded rationality are. For a discussion of other potential applications see Basov (1999).

⁴ If adjustment takes place in the direction of the maximum increase of payoffs, it is called gradient dynamics.
3 A SIMPLE PRINCIPAL-AGENT MODEL
In this section I will consider a simple conventional principal-agent model. Let the gross profit of the principal be given by

$$\Pi = z + \varepsilon, \qquad (1)$$

where z is the effort undertaken by the agent, and ε is random noise with zero mean and variance σ². Only Π is observable by the principal. The utility of the agent is given by:

$$U = E(w) - \frac{\phi}{2}\,\mathrm{Var}(w) - \frac{z^2}{2}, \qquad (2)$$

where w is the agent's payment (wage), conditioned on z through Π. The principal wants to maximize expected profits net of the wage, subject to the incentive compatibility constraint:

$$z \in \arg\max\Big(E(w) - \frac{\phi}{2}\,\mathrm{Var}(w) - \frac{z^2}{2}\Big) \qquad (3)$$

and the individual rationality (participation) constraint:

$$E(w) - \frac{\phi}{2}\,\mathrm{Var}(w) - \frac{z^2}{2} \geq 0. \qquad (4)$$

I will concentrate attention on affine payment schemes:

$$w = \alpha\Pi + \beta. \qquad (5)$$
It is straightforward to show that the optimal affine contract has:

$$\alpha = \frac{1}{1+\phi\sigma^2}, \qquad \beta = \frac{\phi\sigma^2 - 1}{2(1+\phi\sigma^2)^2}. \qquad (6)$$

To see this, note that α is chosen to maximize the total surplus W defined as

$$W = E(U + \Pi - w), \qquad (7)$$

subject to (3), and β is chosen to ensure that (4) holds. Since in this case the objective function of the agent is strictly concave, the incentive constraint can be replaced by the first-order condition z = α. Plugging this into (7), solving the maximization program, and using (4) to obtain β yields (6).

The net profit of the principal and the utility of the agent under the optimal affine compensation scheme are given by:

$$E(\Pi - w) = \frac{1}{2(1+\phi\sigma^2)}, \qquad U = 0. \qquad (8)$$

One can see that the slope α of the optimal compensation scheme and the profit of the principal are decreasing in σ, while the utility of the agent is determined by the reservation utility, which is normalized at zero here. Hence, noise damps incentives and dissipates social surplus.
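As a quick consistency check, the closed forms in (6) and (8) can be reproduced symbolically. The sketch below is my addition rather than part of the original argument; it substitutes the first-order condition z = α into the surplus, maximizes over α, and backs β out of the binding participation constraint.

```python
# Sketch: symbolic check of the optimal affine contract (6) and the profit in (8).
# Setup follows Section 3: w = alpha*Pi + beta, Pi = z + eps, E(eps) = 0,
# Var(eps) = sigma**2, effort cost z**2/2, risk-aversion parameter phi.
import sympy as sp

alpha, beta, z, phi, sigma = sp.symbols('alpha beta z phi sigma', positive=True)

Ew = alpha*z + beta                       # expected wage under the affine scheme (5)
Varw = alpha**2 * sigma**2                # wage variance
U = Ew - phi/2*Varw - z**2/2              # agent's utility (2)

z_star = sp.solve(sp.diff(U, z), z)[0]    # incentive constraint (3) reduces to z = alpha
W = (z - z**2/2 - phi/2*Varw).subs(z, z_star)             # total surplus (7) with z = alpha
alpha_star = sp.solve(sp.diff(W, alpha), alpha)[0]         # optimal slope
beta_star = sp.solve(U.subs(z, z_star).subs(alpha, alpha_star), beta)[0]  # binding (4)

profit = (z - Ew).subs(z, z_star).subs(alpha, alpha_star).subs(beta, beta_star)

print(sp.simplify(alpha_star - 1/(1 + phi*sigma**2)))                         # 0
print(sp.simplify(beta_star - (phi*sigma**2 - 1)/(2*(1 + phi*sigma**2)**2)))  # 0
print(sp.simplify(profit - 1/(2*(1 + phi*sigma**2))))                         # 0
```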
By way of contrast, consider a nonlinear scheme:

$$w = \alpha\Pi + \gamma\Pi^2 + \beta. \qquad (9)$$

The utility of the agent will then be given by

$$U = \alpha z + \gamma(z^2+\sigma^2) + \beta - \frac{\phi}{2}\Big(4z\sigma^2\gamma[z\gamma+\alpha] + \gamma^2\mathrm{Var}(\varepsilon^2) + \alpha^2\sigma^2 + [4\gamma^2 z + 2\alpha\gamma]E(\varepsilon^3)\Big) - \frac{z^2}{2}. \qquad (10)$$

This formula can be obtained by plugging (9) into (2) and evaluating the relevant expectations and variances. The terms Var(ε²) and E(ε³) depend in general on higher moments of the distribution of ε, that is, on the third and fourth moments. For nonlinear schemes higher-order moments will generally play a role. This means that knowledge of the entire distribution becomes important for general nonlinear schemes. Hence, they are not robust with respect to the distributional assumptions.
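To see this dependence explicitly, the following sketch (mine, with the symbolic moments m3 = E(ε³) and m4 = E(ε⁴) introduced as notation of my own) recomputes the agent's utility under the quadratic scheme (9) and confirms that it is not invariant to the third and fourth moments of ε.

```python
# Sketch: the agent's utility under the quadratic wage scheme (9) depends on the
# third and fourth moments of eps (kept symbolic here as m3 and m4).
import sympy as sp

alpha, beta, gamma, z, phi, sigma, eps, m3, m4 = sp.symbols(
    'alpha beta gamma z phi sigma eps m3 m4', real=True)

def expect(expr):
    """Expectation of a polynomial in eps, using E(eps)=0, E(eps^2)=sigma^2,
    E(eps^3)=m3, E(eps^4)=m4."""
    moments = {0: sp.Integer(1), 1: sp.Integer(0), 2: sigma**2, 3: m3, 4: m4}
    poly = sp.Poly(sp.expand(expr), eps)
    return sum(coeff * moments[exp[0]] for exp, coeff in poly.terms())

w = alpha*(z + eps) + gamma*(z + eps)**2 + beta                 # wage scheme (9)
U = expect(w) - phi/2*(expect(w**2) - expect(w)**2) - z**2/2    # utility (2)

print(sp.simplify(U.diff(m3)))   # -phi*gamma*(alpha + 2*gamma*z): third moment matters
print(sp.simplify(U.diff(m4)))   # -phi*gamma**2/2: fourth moment matters
```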
4 OPTIMAL INCENTIVES UNDER GRADIENT DYNAMICS
In this section I will assume that the agent is boundedly rational and adjusts her choices in the direction of increasing payoff. Let the general structure of the model be the same as in Section 3. As in the previous section, restrict attention to affine compensation schemes. The main difference between the model in this section and that of Section 3 is that the agent, instead of responding optimally to the compensation scheme, adjusts her choices according to the differential equation:

$$\frac{dz}{dt} = \alpha(t) - z, \qquad (11)$$

conditionally on the decision to participate. The function on the right-hand side of (11) is the derivative of the agent's utility with respect to z. This is the reason I call equation (11) gradient dynamics. It is worth mentioning that units of utility have meaning in this framework, since they determine the speed of adjustment. This contrasts with the rational paradigm, where utility units are arbitrary.

The agent participates if her instantaneous utility at time t is nonnegative, that is, if:

$$\alpha z - \frac{z^2}{2} - \frac{1}{2}\phi\alpha^2\sigma^2 + \beta \geq 0. \qquad (12)$$

(To obtain the left-hand side of (12), plug the compensation scheme (5) into the agent's utility function (2) and use the assumed expectation and variance.)
The principal seeks to maximize the discounted expected present value of net profits, subject to (11) and (12). Solving (12) with equality for β, one gets the following optimal control problem for the principal:

$$\max \int_0^\infty e^{-\rho t}\Big(z - \frac{z^2}{2} - \frac{\phi[\alpha(t)]^2\sigma^2}{2}\Big)\,dt \qquad (13)$$

subject to (11). The integrand is discounted total surplus.
The current-value Hamiltonian for this problem has the form:

$$H = z - \frac{z^2}{2} - \frac{\phi\alpha^2\sigma^2}{2} + \mu(\alpha - z). \qquad (14)$$

The evolution of the costate variable μ is governed by:

$$\frac{d\mu}{dt} = (1+\rho)\mu - 1 + z, \qquad (15)$$

together with the transversality condition:

$$\lim_{t\to\infty}\mu(t)e^{-\rho t} = 0. \qquad (16)$$
The time discount rate ρ will be assumed to be small; in fact, I will fix it at zero after using the transversality condition. This is to make the comparison of the steady state with the outcome of the static model meaningful. (In the opposite case with a large discount rate the solution is trivial: take the level of effort as given and pay the constant wage that ensures that the participation constraint is satisfied. Indeed, since the principal does not care about the future and since current effort is given, the only problem is one of optimal risk sharing, which implies the above outcome since the principal is risk neutral and the agent risk averse. Admittedly, this case is not very interesting.)

The maximum principle states that the current-value Hamiltonian should be maximized with respect to α, which implies

$$\alpha(t) = \frac{\mu(t)}{\phi\sigma^2}. \qquad (17)$$
Combining (17) with (15)-(16) gives the complete system characterizing the optimal incentive schedule:

$$\frac{d\alpha}{dt} = (1+\rho)\alpha + \frac{z}{\phi\sigma^2} - \frac{1}{\phi\sigma^2}, \qquad (18)$$

$$\frac{dz}{dt} = \alpha - z, \qquad (19)$$

$$z(0) = z_0, \qquad \lim_{t\to\infty}\alpha(t)e^{-\rho t} = 0, \qquad (20)$$

where z₀ is the initial effort exerted by the agent.

Define γ by the expression

$$\gamma = \frac{\sqrt{(\rho+2)^2 + \dfrac{4}{\phi\sigma^2}} - \rho}{2}. \qquad (21)$$
Then the only solution to the system (18)-(20) is:

$$\alpha(t) = \frac{1}{1+\phi\sigma^2(1+\rho)} + \Big(z_0 - \frac{1}{1+\phi\sigma^2(1+\rho)}\Big)(1-\gamma)e^{-\gamma t}, \qquad (22)$$

$$z(t) = \frac{1}{1+\phi\sigma^2(1+\rho)} + \Big(z_0 - \frac{1}{1+\phi\sigma^2(1+\rho)}\Big)e^{-\gamma t}. \qquad (23)$$

Two things are worth mentioning here. First, the slope of the compensation scheme α(t) converges to a stationary value, which coincides with the slope of the optimal compensation scheme for a rational agent in the static model if ρ = 0. Second, on the dynamic path z(t) > α(t) provided that z₀ is greater than the steady-state level; that is, the agent exerts more effort than would be myopically optimal, though the difference shrinks in time.
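The closed forms (22)-(23) can also be verified directly. The sketch below is my addition; the parameter values are illustrative. It checks numerically that (22)-(23) satisfy the system (18)-(19) and that effort stays above the wage slope along the path when z₀ exceeds the steady state.

```python
# Sketch: numerical check that the closed forms (22)-(23) satisfy the system (18)-(19).
# Parameter values (phi, sigma2, rho, z0) are illustrative choices of mine.
import numpy as np

phi, sigma2, rho, z0 = 2.0, 0.5, 0.05, 1.0             # sigma2 stands for sigma**2

zstar = 1.0 / (1.0 + phi * sigma2 * (1.0 + rho))        # steady-state slope and effort
gamma = (np.sqrt((rho + 2.0)**2 + 4.0 / (phi * sigma2)) - rho) / 2.0   # eq. (21)

def alpha(t):  # eq. (22)
    return zstar + (z0 - zstar) * (1.0 - gamma) * np.exp(-gamma * t)

def z(t):      # eq. (23)
    return zstar + (z0 - zstar) * np.exp(-gamma * t)

t = np.linspace(0.0, 5.0, 501)
h = 1e-6
dz_dt = (z(t + h) - z(t - h)) / (2 * h)                 # numerical derivative of (23)
da_dt = (alpha(t + h) - alpha(t - h)) / (2 * h)         # numerical derivative of (22)

resid_19 = dz_dt - (alpha(t) - z(t))                                      # eq. (19)
resid_18 = da_dt - ((1 + rho) * alpha(t) + (z(t) - 1.0) / (phi * sigma2)) # eq. (18)

print(np.max(np.abs(resid_19)), np.max(np.abs(resid_18)))  # tiny (finite-difference error)
print(np.all(z(t) > alpha(t)))                              # True: z(t) > alpha(t) when z0 > zstar
```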
It is straightforward to show that the present value of expected profits of the principal as a function of the initial conditions has the form:

$$\Pi(z_0) = Az_0^2 + Bz_0 + C, \qquad (24)$$

with A < 0 and C < 0. (One has simply to plug (22)-(23) into (13) and carry out the integration.) This implies that profits would be negative if initial effort is too high. In that case the principal would prefer to stay out of business. This makes intuitive sense, since the principal has to compensate the agent for wasted effort due to the participation constraint.

Note that the model of this section provides an additional rationale for simple linear schemes. Under suitable assumptions such schemes give the agent a strictly concave objective function. Agents who search myopically will eventually learn their globally maximizing effort level independently of the initial effort level. Since this level of effort maximizes the total surplus and the agent's utility is fixed by the participation constraint, the principal wants the agent to learn the global maximizer. With nonlinear compensation schemes, the agent's objective function need not be concave and hence may have local maxima which are not global. In that case, what the agent learns might depend on the initial effort level.
5 OPTIMAL INCENTIVES WHEN AGENTS ALSO IMITATE
In this section I consider the problem of designing an optimal compensation scheme when a population of agents is engaged in a social learning process. Assume that there is a continuum of identical agents working for the same principal. Assume that the principal can pay a wage based only on the output produced by an agent but not on relative performance, and is bound to pay different agents the same wage for the same performance. Each agent chooses effort x from the interval [c, d] (d > c > 0) at each point in time.⁵ Under the wage schedule and cost of effort specified in Section 3, the instantaneous payoff to the agent from choosing x at t is given by the function:

$$U(x,t) = \alpha(t)x - \frac{x^2}{2} + \text{const}. \qquad (25)$$

The constant term in (25) does not depend on x but depends on α.

Each agent starts at some exogenously specified effort level x₀ ∈ [c, d] and adjusts it at times kΔt, where k is a natural number and Δt is some fixed time interval length. To describe the adjustment rule it is necessary to specify the information available to the agent at the moment of adjustment and the rules of information processing.

⁵ The symbol z will now be used for the mean effort level in the agent population.
I will assume that each agent knows the gradient (i.e., the derivative) of the payoff function at the point of her current choice. During the time interval Δt the agent also observes the choice of some other randomly selected member of the population of agents. Let y_t denote the observed choice. The agent adjusts her choice of x according to the rule:

$$x_{t+\Delta t} = x_t + \frac{\partial U(x_t)}{\partial x}\,\Delta t + \gamma(\Delta t)\,(y_t - x_t). \qquad (26)$$

The first two terms represent gradient dynamics, while the last term represents imitation. In the above formula the imitation weight γ(Δt) is assumed to be itself a nondegenerate random variable with compact support and such that E(γ(Δt)) = 0. This implies a particular form of imitation: imitation of scope, which was informally discussed in Section 2. Since γ(Δt) takes both positive and negative values with positive probability, the agent does not imitate the choice of the other agent directly. Instead, she opens a search window whose width is determined by the degree of disagreement between her current behavior and the observed choice of the other agent, that is, by (y_t − x_t). Intuitively, since the observation the agent makes is the choice of another boundedly rational agent, there is no good reason to imitate the choice directly. On the other hand, the spread of the choices in the population indicates that society as a whole does not know the optimal choice and hence that there may be returns to experimentation. The last term in (26) embodies a simple version of this intuition: |y_t − x_t| increases probabilistically with the population's spread.

It will be convenient to assume γ(Δt) = α(Δt)u, where α(Δt) is a deterministic function and u is a random variable, independent of x, with compact support and such that E(u) = 0 and Var(u) = 1.

To proceed further I will make the following assumption.

Assumption 1 α(Δt) = √Δt.

The assumption guarantees that in the continuous-time limit both the gradient dynamics and the imitation terms are of the same order of magnitude. If α(Δt) converged to zero at a faster rate as Δt goes to zero, the continuous-time limit would be described by the gradient dynamics already considered in the previous section. If α(Δt) converged to zero at a slower rate, the compensation scheme would have no impact on behavior in the continuous-time limit, which does not seem very realistic. So I will consider behavior generated by (26) in the continuous-time limit under Assumption 1.
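The following simulation sketch is my illustration of the rule (26), not part of the original analysis; the population size, time step, interval [c, d], the fixed slope α, and the uniform specification of u are all assumptions made for the example. It shows the population mean tracking the gradient dynamics while the variance of choices moves only slowly, anticipating the wave-packet behavior derived below.

```python
# Sketch: Monte Carlo simulation of the adjustment rule (26) for a population of agents.
# All numerical choices (N, dt, c, d, the fixed slope alpha_t, the law of u) are mine.
import numpy as np

rng = np.random.default_rng(0)
N, dt, T = 20000, 1e-3, 3.0
c, d = 0.05, 3.0
alpha_t = 1.0                                    # wage slope held fixed for the experiment

x = rng.uniform(1.5, 2.5, size=N)                # initial distribution of effort
means, variances = [x.mean()], [x.var()]

for _ in range(int(T / dt)):
    y = x[rng.integers(0, N, size=N)]            # each agent observes a random other agent
    u = rng.uniform(-np.sqrt(3), np.sqrt(3), N)  # E(u) = 0, Var(u) = 1, compact support
    gamma = np.sqrt(dt) * u                      # gamma(dt) = sqrt(dt) * u  (Assumption 1)
    x = x + (alpha_t - x) * dt + gamma * (y - x) # rule (26): dU/dx = alpha(t) - x
    x = np.clip(x, c, d)                         # keep choices inside [c, d]
    means.append(x.mean()); variances.append(x.var())

t = np.arange(len(means)) * dt
z_theory = alpha_t + (means[0] - alpha_t) * np.exp(-t)   # gradient dynamics of the mean
print(np.max(np.abs(np.array(means) - z_theory)))        # small (Monte Carlo error only)
print(variances[0], variances[-1])                       # of the same order: a "wave packet"
```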
Let f(x, t) denote the density of the choices in the population of agents at time t. (If we normalize the mass of the population to be one, an equivalent interpretation of f(x, t) is as the probability density of the choice of an individual at time t.) Its evolution is described by the following theorem:

Theorem 1 Let the adjustment rule be given by (26) under Assumption 1. Then the continuous-time evolution of the density f(x, t) is well-defined and is governed by:

$$\frac{\partial f}{\partial t} + \frac{\partial}{\partial x}\big((\alpha(t) - x)f\big) = \frac{1}{2}\frac{\partial^2}{\partial x^2}\Big(\big((x - z(t))^2 + v(t)\big)f\Big), \qquad (27)$$

$$z(t) = \int_c^d x f(x,t)\,dx, \qquad (28)$$

$$v(t) = \int_c^d (x - z(t))^2 f(x,t)\,dx. \qquad (29)$$
Proof. Let Q(t, Δt, x_t, Ω) denote the probability of ending up in a set Ω at time t + Δt if the choice at time t is given by x_t, and let V_η denote the complement of the η-neighborhood U_η of x_t. Using (26) and the fact that γ(Δt) = α(Δt)u, where α(Δt) is a deterministic function and u is a random variable, independent of x, with compact support and such that E(u) = 0 and Var(u) = 1, it is straightforward to see that

$$1.\;\; Q(t,\Delta t, x_t, V_\eta) = o(\Delta t);$$

$$2.\;\; \int_{U_\eta}(w_t - x_t)\,Q(t,\Delta t, x_t, dw_t) = (\alpha(t) - x)\Delta t + o(\Delta t);$$

$$3.\;\; \int_{U_\eta}(w_t - x_t)^2\,Q(t,\Delta t, x_t, dw_t) = \big((x - z(t))^2 + v(t)\big)\Delta t + o(\Delta t).$$

Now Theorem 1 follows from a well-known result from the theory of stochastic processes (see Kanan (1979)).
The system (27)-(29) should be supplemented by initial and boundary conditions. The initial condition is arbitrary, but I will impose the following boundary condition:

$$(\alpha - x)f - \frac{1}{2}\frac{\partial}{\partial x}\Big(\big((x-z)^2 + v\big)f\Big) = 0 \quad \text{for } x = c, d, \;\; \forall t \geq 0. \qquad (30)$$

The boundary condition (30) guarantees conservation of the population mass. (Basov (1999a) gives a derivation of (27)-(29) in a multidimensional context under assumptions slightly more general than Assumption 1, derives some general properties of the dynamics specified by this system, and discusses the behavioral foundations of the boundary conditions (30).)
An important feature of this model is the existence of special kinds of solutions: wave packets and quasistationary states. Intuitively, a wave packet is a solution to (27)-(29) in which the mean moves according to the gradient dynamics and the variance shrinks so slowly that to a first-order approximation it can be considered constant. As the mean approaches its steady-state value under the gradient dynamics, a wave packet converges to a quasistationary state, that is, to a distribution with a very slowly changing mean and variance.

To demonstrate the existence of wave packets and quasistationary states, I need to assume that the choice space is big in the sense that

$$c \ll z_0, \qquad \frac{1}{1+\phi(\sigma^2+v_0)} \ll d, \qquad c \ll a, \qquad b \ll d, \qquad \sqrt{v_0} \ll d - c. \qquad (31)$$

Here ≪ means "much less than"; z₀ and v₀ are the mean and variance of the population's effort distribution at time zero, respectively; and the numbers a and b (a < b) are bounds on the support of z₀ which guarantee that the expected profit of the principal at time zero is positive. The inequalities in (31) say that the initial distribution and the quasistationary state are concentrated far from the boundary points, and that the principal will never force a large probability mass close to the boundary.⁶

⁶ The argument for this last point is the same as in the next-to-last paragraph of Section 3.
Under (31) I can derive differential equations that govern the evolution of z(t) and v(t). To do this, differentiate equations (28) and (29) with respect to time, integrate by parts, and use the boundary condition (30). This yields:

$$\frac{dz}{dt} = \alpha(t) - z - \frac{1}{2}\Big(g(z,d,v)f(d,t) - g(z,c,v)f(c,t)\Big), \qquad (32)$$

$$\frac{dv}{dt} = -(d-z)\,g(z,d,v)f(d,t) - (z-c)\,g(z,c,v)f(c,t), \qquad (33)$$

where g(z, ζ, v) = (ζ − z)² + v. Under the conditions imposed on the initial density function, the boundary terms will be small. Indeed, they are small at time zero by assumption and remain small because the variance shrinks in time (due to (33)) and the principal has no incentive to push probability mass close to the boundary. Hence, the system (32)-(33) can be rewritten approximately in the form:

$$\frac{dz}{dt} = \alpha(t) - z, \qquad (34)$$

$$\frac{dv}{dt} = 0. \qquad (35)$$

System (34)-(35) implies that the mean follows the gradient dynamics, while the variance remains constant (a wave packet).
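For completeness, here is a sketch of the integration-by-parts step behind (32)-(33); it is my reconstruction of the calculation the text describes rather than a derivation spelled out in the original. Write D(x, t) = (x − z(t))² + v(t) and J = (α − x)f − ½ ∂(Df)/∂x, so that (27) reads ∂f/∂t = −∂J/∂x and (30) says J = 0 at x = c, d. Using ∫(x − z)f dx = 0,

$$\frac{dz}{dt} = \int_c^d x\,\frac{\partial f}{\partial t}\,dx = -\big[xJ\big]_c^d + \int_c^d J\,dx = \alpha(t) - z - \frac{1}{2}\big[Df\big]_c^d,$$

$$\frac{dv}{dt} = \int_c^d (x-z)^2\,\frac{\partial f}{\partial t}\,dx = 2\int_c^d (x-z)(\alpha - x)f\,dx + \int_c^d Df\,dx - \big[(x-z)Df\big]_c^d = -2v + 2v - \big[(x-z)Df\big]_c^d,$$

and evaluating the boundary terms with D(d, t) = g(z, d, v) and D(c, t) = g(z, c, v) gives (32)-(33). The boundary terms are the only place where f(c, t) and f(d, t) enter, which is why (34)-(35) obtain when the mass near the boundary is negligible.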
Next, to formulate the principal's problem, I need to reformulate the participation constraint. I will assume that each agent observes the variance of output in the population and uses it to evaluate the wage variance and, hence, her expected utility, and participates as long as it is greater than zero. This pins down β(t). Given this, either all agents will decide to participate or all will drop out. Hence, the principal's problem is

$$\max \int_0^\infty e^{-\rho t}\Big(z - \frac{z^2 + v(t)}{2} - \frac{\phi[\alpha(t)]^2(\sigma^2 + v(t))}{2}\Big)\,dt \qquad (36)$$

$$\text{s.t.}\quad \frac{dz}{dt} = \alpha(t) - z. \qquad (37)$$

The variance v(t) will be taken to be constant according to the approximation (35); hence I will omit the argument t in the function v(t) below. The derivation of (36) is similar to the derivation of (13). The only difference is that effort is now stochastic. This randomness is reflected in the expected cost of effort (E(x²) = z² + v) and in the variance of output, which now has two components, exogenous σ² and endogenous v. The solution to (36)-(37) is given by:

$$\alpha(t) = \frac{1}{1+\phi(\sigma^2+v)(1+\rho)} + \Big(z_0 - \frac{1}{1+\phi(\sigma^2+v)(1+\rho)}\Big)(1-\delta)e^{-\delta t}, \qquad (38)$$

$$z(t) = \frac{1}{1+\phi(\sigma^2+v)(1+\rho)} + \Big(z_0 - \frac{1}{1+\phi(\sigma^2+v)(1+\rho)}\Big)e^{-\delta t}, \qquad (39)$$

where

$$\delta = \frac{\sqrt{(\rho+2)^2 + \dfrac{4}{\phi(\sigma^2+v)}} - \rho}{2}. \qquad (40)$$
From (38)-(39) one can see that the steady-state incentive α(∞) is lower than under pure gradient dynamics, and convergence takes longer. On the dynamic path, z(t) > α(t) provided that z₀ is greater than the steady-state level; that is, agents work harder than they would if they myopically responded optimally to the compensation scheme (and to the optimal static-model incentive scheme). Such a situation becomes more likely as the endogenous variance v increases. Hence, social learning decreases the power of the optimal incentive scheme but makes it more likely that agents will overwork relative to the incentive scheme.
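To make the comparison concrete, the sketch below (parameter values are illustrative choices of mine) evaluates the steady-state slope from (38), the convergence rate from (40), and the steady-state flow of the integrand in (36), with and without the endogenous variance.

```python
# Sketch: effect of the endogenous variance v on steady-state incentives, speed of
# convergence, and the steady-state surplus flow. Parameter values are illustrative.
import numpy as np

phi, sigma2, rho = 2.0, 0.5, 0.0
for v in (0.0, 0.3):                                  # v = 0 reproduces Section 4
    var_tot = sigma2 + v
    alpha_ss = 1.0 / (1.0 + phi * var_tot * (1.0 + rho))                   # steady state of (38)
    delta = (np.sqrt((rho + 2.0)**2 + 4.0 / (phi * var_tot)) - rho) / 2.0  # eq. (40)
    surplus = alpha_ss - (alpha_ss**2 + v) / 2 - phi * alpha_ss**2 * var_tot / 2  # integrand of (36) at z = alpha_ss
    print(f"v={v}: slope={alpha_ss:.3f}, rate={delta:.3f}, surplus flow={surplus:.3f}")
# The run with v > 0 shows a lower slope, slower convergence, and a smaller surplus flow.
```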
It is also important to note that the endogenous variance leads to a decrease of both social welfare and the principal's profits. Hence, the value of incentive contracts goes down under bounded rationality. This implies that parties may abandon the idea of incentive contracts altogether and rely on reciprocal schemes instead. The role of reciprocity and gift-exchange behavior was first discussed by Akerlof (1982). Fehr (2000) demonstrated the existence of reciprocal behavior in a series of controlled experiments, and Fehr and Gachter (2000) examined the role of reciprocity in a general incomplete contracting framework. Since the value of incentive contracts under bounded rationality diminishes, reciprocity-based contracts might become important even if the proportion of reciprocal agents in the population is small. For further elaboration of this point see Basov (2000b).

At the end of the previous section I mentioned that gradient dynamics provides an additional rationale for simple linear incentive schemes. With social learning this argument is even stronger. While under gradient dynamics what the agent learns depends on the initial effort level of the agent, now it depends only on the initial distribution of efforts in the population. Statistically, a single initial effort is much easier to estimate than the entire initial population distribution of effort. Hence, the robustness problem already mentioned becomes even more severe in the presence of social learning of effort, making the use of simple incentive schemes even more important.
In all of the above I have assumed a strictly convex quadratic effort cost. If the effort cost is linear, that is, g(x) = κx, then, neglecting boundary terms, the equations for z and v become:

$$\frac{dz}{dt} = \alpha(t) - \kappa, \qquad (41)$$

$$\frac{dv}{dt} = 2v. \qquad (42)$$

The derivation of (41)-(42) is similar to the derivation of (34)-(35). If κ < 1 the solution to the principal's problem subject to (41)-(42) is:

$$\alpha(t) = \frac{1-\kappa}{\rho\phi(\sigma^2 + v_0 e^{2t})}. \qquad (43)$$

Under the assumption of a small discount rate (ρ close to zero), (43) implies superpowerful incentives for small t (α is large since it is proportional to ρ⁻¹), which then decrease.

Strictly speaking, however, (43) is valid only for small t, since the variance increases exponentially, which can put a large probability mass near the boundary and make the approximation used to derive (41)-(42) inapplicable. To avoid this complication, use the cost-of-effort function

$$g(x) = \kappa x + \frac{x^2}{2} \qquad (44)$$

and assume (31) and

$$c \ll z_0 \ll \kappa \ll \frac{1}{1+\phi(\sigma^2+v_0)} \ll d. \qquad (45)$$

Then for small t the solution is described by (43); that is, the principal will provide superpowerful incentives. As z(t) reaches and passes κ, which eventually happens given a sufficiently small discount rate, the second term in the cost of effort begins to dominate and the change in the slope of the incentive scheme is approximately described by (38). This says that individuals should be given superpowerful incentives at the beginnings of their careers and less powerful incentives towards the end. This would imply that wages will increase with tenure, at a rate that is higher at the beginning of a career (since expected effort will increase) and stabilizes afterwards. Williams (1991), for example, found that tenure increases wages only in the first several years of employment, which is consistent with the above consequence of my model.
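A small numerical illustration of (43) follows; the parameter values are mine and purely illustrative. It shows the career profile just described: the slope is of order 1/ρ at the start and then falls as the endogenous variance grows.

```python
# Sketch: the early-career incentive profile implied by (43) under a small discount rate.
# Parameter values (rho, phi, sigma2, v0, kappa) are illustrative choices of mine.
import numpy as np

rho, phi, sigma2, v0, kappa = 0.01, 2.0, 0.5, 0.05, 0.2

def alpha(t):                      # eq. (43), valid while z(t) is still below kappa
    return (1.0 - kappa) / (rho * phi * (sigma2 + v0 * np.exp(2.0 * t)))

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, round(alpha(t), 1))
# alpha(0) is of order 1/rho (here roughly 73), i.e. "superpowerful" incentives early on,
# and alpha(t) falls as the endogenous variance v0*exp(2t) grows with tenure.
```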
6 SUFFICIENT STATISTICS REVISITED
One of the main results of conventional contract theory is that when several signals of effort are observed, optimal contracts should be based on a sufficient statistic for effort. Since in the first best the principal should compensate the agent only for effort, it seems quite intuitive that the second-best compensation is based on the best possible estimate of effort available.⁷ However, this is not the case under bounded rationality. Intuitively, the reason is that under bounded rationality effort can be only partly attributed to the incentive scheme. The other part comes from the social learning process.

Formally, consider a model similar to the model of the previous section but allow for two measures of output, Π₁ and Π₂. Let them be determined as follows:

$$\Pi_1 = x + \varepsilon_1, \qquad (46)$$

$$\Pi_2 = x + \varepsilon_2. \qquad (47)$$

Here x is the effort exerted by the agent, and ε₁ and ε₂ are independent normal random variables with zero means and variances σ₁² and σ₂², respectively. Suppose the principal is interested in Π₁ only.⁸ The optimal contract under perfect rationality should be based on a sufficient statistic for x, namely on

$$\frac{\Pi_1}{\sigma_1^2} + \frac{\Pi_2}{\sigma_2^2}. \qquad (48)$$

This is rather intuitive: in the optimal contract an observation that conveys more information should be given a higher weight. This is reflected in the fact that the ratio of the coefficients on Π₁ and Π₂ is σ₂²/σ₁². In the case of bounded rationality, going through the same calculations as in the previous section, one obtains that this ratio changes to (σ₂² + v)/(σ₁² + v).⁹

⁷ This intuition is a little misleading, since in equilibrium the principal knows the effort. However, to create correct incentives, she should pretend that she does not know it and behave as a statistician who tries to estimate effort from the available data.

⁸ Everything below would be true if the principal were interested in any convex combination of Π₁ and Π₂.

⁹ To obtain this result one has to assume that agents observe the variances of both measures of output and treat them as independent when calculating the variance of the wage. In fact, the behavior of the agents creates correlation between Π₁ and Π₂ (though they are independent conditional on effort), but agents fail to understand this. If we assume instead that agents observe the wage variance directly, the results would change quantitatively but not qualitatively.

That is, the optimal contract will have the form
$$w(t) = \alpha(t)\Big(\frac{\Pi_1}{\sigma_1^2 + v} + \frac{\Pi_2}{\sigma_2^2 + v}\Big) + \beta(t). \qquad (49)$$

The optimal slope α will be given by

$$\alpha(t) = \frac{1}{1+\phi\zeta(1+\rho)} + \Big(z_0 - \frac{1}{1+\phi\zeta(1+\rho)}\Big)(1-\eta)e^{-\eta t}, \qquad (50)$$

where

$$\zeta = \frac{\sigma_1^2 + \sigma_2^2 + v}{(\sigma_1^2 + v)(\sigma_2^2 + v)}, \qquad \eta = \frac{\sqrt{(\rho+2)^2 + \dfrac{4}{\phi\zeta}} - \rho}{2}. \qquad (51)$$

The intercept β is given by the participation constraint. Its exact value is not interesting here. (To obtain all these results, one needs to go through calculations similar to those of the previous two sections.)
It is worth mentioning that the ratio (σ₂² + v)/(σ₁² + v) is less sensitive to changes in the variances of the exogenous noise than the ratio σ₂²/σ₁². If social noise dominates technological uncertainty (v ≫ max(σ₁², σ₂²)), then (49) can be rewritten approximately as

$$w(t) = \frac{2\alpha(t)}{v}\Big(\frac{\Pi_1 + \Pi_2}{2}\Big) + \beta(t). \qquad (52)$$

This means that in this case the agents' payments depend on the arithmetic mean of the signals. The intuition for this is straightforward: the principal wants to pay only for the part of effort that responds to the incentive scheme. To do so, she must filter out both technological uncertainty and social noise. Since social noise is common to both signals, taking the arithmetic mean is the best way to filter it out. When social noise is much greater than technological uncertainty, the issue of filtering out social noise dominates the issue of filtering out technological uncertainty.
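A brief numerical illustration of these two points; the variances used are illustrative choices of mine. It compares the sensitivity of the rational weight ratio σ₂²/σ₁² with that of the boundedly rational ratio (σ₂² + v)/(σ₁² + v), and shows that for large v the two signals receive nearly equal weight, as in (52).

```python
# Sketch: sensitivity of the relative weights on Pi_1 and Pi_2.
# The rational contract (48) uses sigma2sq/sigma1sq; the boundedly rational
# contract (49) uses (sigma2sq + v)/(sigma1sq + v). Numerical values are illustrative.
sigma1sq, v = 0.2, 5.0

for sigma2sq in (0.4, 0.8, 1.6):
    rational = sigma2sq / sigma1sq
    bounded = (sigma2sq + v) / (sigma1sq + v)
    print(f"sigma2^2 = {sigma2sq}: rational ratio = {rational:.2f}, "
          f"bounded ratio = {bounded:.2f}")
# Quadrupling sigma2^2 multiplies the rational ratio by four (2 -> 8) but moves the
# boundedly rational ratio only from about 1.04 to 1.27; with v much larger than both
# variances the ratio stays near one, i.e. the signals get nearly equal weight as in (52).
```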
It is interesting to examine some empirical evidence in the light of this result. Bertrand and Mullainathan (1999) found that actual pay does not filter out all technological noise in a variety of contexts. This is inconsistent with standard agency theory but consistent with my model. Indeed, let v ≫ σ₂² ≫ σ₁². In this case the optimal compensation for rational agents should be almost independent of Π₂; in other words, Π₂ should be filtered out. In the case of imitating agents, however, the optimal compensation should depend approximately on the arithmetic mean of Π₁ and Π₂. This means that the optimal contract here does not filter out all technological noise, in contrast to standard theory.
7 CONCLUSION
This paper develops a theoretical framework for analyzing incentive schemes when agents behave in a boundedly rational manner. Even though they take very simple forms, the models help to create a framework for addressing robustness and complexity issues. They help to explain the underprovision of optimal incentives, deviations from the sufficient statistics result, and gift-exchange behavior.

Even though gift-exchange behavior is not a direct consequence of the model, the model helps to shed light on its prevalence. Indeed, the behavioral assumption that replaces rationality in my model of Section 5 is that agents' choices follow a continuous stochastic process. The stochastic component of the adjustment rule is determined by a social learning rule. The important property of this social learning rule is that it allows an endogenous variance in the distribution of choices to persist in the steady state. This in turn leads to a dissipation of social surplus and of the principal's profits. It makes incentive schemes rather unattractive in some cases, in which they are replaced by reciprocity-based schemes. If the endogenous variance in agents' choices under bounded rationality is big enough, even a small proportion of reciprocal workers would suffice to make reciprocity-based schemes more attractive.

Underprovision of optimal incentives, deviations from the sufficient statistics result in the direction predicted by the model, and gift-exchange behavior are all ubiquitous in the real world. The fact that under social learning incentive contracts become less attractive confirms the intuition expressed, for example, in Akerlof (1982) that gift exchange is a result of social interaction. Even though many questions (for example, the robustness of the optimal incentive scheme) remain unsolved in this paper, it can be viewed as a useful first step in incorporating bounded rationality into incentive problems.
REFERENCES

Akerlof, G. A. "Labor Contracts as Partial Gift Exchange," Quarterly Journal of Economics, 1982, 97, pp. 543-569.

Anderson, S. P., J. K. Goeree, and C. A. Holt. Stochastic Game Theory: Adjustment to Equilibrium under Bounded Rationality, unpublished draft, 1997.

Basov, S. "Axiomatic Model of Learning," unpublished draft, 2000a.

Basov, S. "Bounded Rationality, Reciprocity, and Incomplete Contracts," unpublished draft, 2000b.

Bertrand, M., and S. Mullainathan. "Are CEOs Rewarded for Luck? A Test of Performance Filtering," NBER Working Paper #7604.

Bush, R., and F. Mosteller. Stochastic Models for Learning, New York: Wiley, 1955.

Dewatripont, M., I. Jewitt, and J. Tirole. "The Economics of Career Concerns, Part I," Review of Economic Studies, 1999, 66, pp. 183-198.

Estes, W. K. "Towards Statistical Theory of Learning," Psychological Review, 1950, 57(2), pp. 94-107.

Fehr, E. Do Incentive Contracts Crowd Out Voluntary Cooperation? mimeo, University of Zurich, 2000.

Fehr, E., and S. Gachter. Fairness and Retaliation: The Economics of Reciprocity. Forthcoming in the Journal of Economic Perspectives, 2000.

Fudenberg, D., and C. Harris. "Evolutionary Dynamics with Aggregate Shocks," Journal of Economic Theory, 1992, 57, pp. 420-441.

Hart, O., and B. Holmström. "The Theory of Contracts," in T. Bewley (ed.) Advances in Economic Theory, Fifth World Congress, 1987, New York: Cambridge University Press.

Holmström, B. "Moral Hazard and Observability," Bell Journal of Economics, 1979, 10, pp. 74-91.

Holmström, B. "Moral Hazard in Teams," Bell Journal of Economics, 1982, 13, pp. 324-340.

Kanan, D. An Introduction to Stochastic Processes, Elsevier North Holland, Inc., 1979.

Kandori, M., G. Mailath, and R. Rob. "Learning, Mutation and Long Run Equilibria in Games," Econometrica, 1993, 61, pp. 29-56.

Luce, R. D. Individual Choice Behavior, Wiley, 1959.

Mirrlees, J. "Notes on Welfare Economics, Information and Uncertainty," in M. Balch, D. McFadden, and S. Wu (eds.) Essays in Economic Behavior Under Uncertainty, 1974, pp. 243-258, Amsterdam: North-Holland.

Mirrlees, J. "The Optimal Structure of Authority and Incentives Within an Organization," Bell Journal of Economics, 1976, 7, pp. 105-131.

Rogers, L. C. G., and D. Williams. Diffusions, Markov Processes and Martingales, Wiley Series in Probability and Mathematical Analysis, John Wiley & Sons, 1994.

Ross, S. "The Economic Theory of Agency: The Principal's Problem," American Economic Review, 1973, 63, pp. 134-139.

Shavell, S. "Risk Sharing and Incentives in the Principal and Agent Relationship," Bell Journal of Economics, 1979, 10, pp. 55-73.

Spence, M., and R. Zeckhauser. "Insurance, Information and Individual Action," American Economic Review (Papers and Proceedings), 1971, 61, pp. 380-387.

Williams, N. "Reexamining the Wage, Tenure and Experience Relationship," The Review of Economics and Statistics, 1991, 73, pp. 512-517.

Young, P. "The Evolution of Conventions," Econometrica, 1993, 61, pp. 57-84.
8 APPENDIX

In this appendix I introduce formal notation, formulate the axioms needed for the decomposition of a stochastic behavioral algorithm, and prove the decomposition theorem.
Assume that an individual is faced with repeated choices over time from a set of alternatives Ω ⊂ Rⁿ, which is assumed to be compact, and let Σ be a sigma-algebra on Ω. Other individuals are making similar decisions simultaneously, and, although their decisions do not affect each other directly, they are assumed to carry relevant information. At time t the individual observes the current choice of another member of the population, y_t. For any Γ ∈ Σ, define the transition probability P({x(t)}, {y(t)}, Γ, t, τ), that is, the probability that an agent who at time t has a history of choices {x(t)} and a history of observations {y(t)} will make a choice w ∈ Γ at time t + τ. Here P({x(t)}, {y(t)}, Γ, t, τ) is assumed to be a continuous functional of {x(t)} and {y(t)}, a continuous function of t and τ, and, for each {x(t)}, {y(t)}, t, τ, it defines a measure on Σ which satisfies P({x(t)}, {y(t)}, Ω, t, τ) = 1.
Assumption 1 P({x(t)}, {y(t)}, Γ, t, τ) = P(x, y, Γ, t, τ), where x = x(t) and y = y(t).

Assumption 1 says that only the current choice and the current observation determine the transition probabilities. In other words, if choices in the population at time t are distributed according to a density function f(y, t), then a process with transition probability

$$P(x, \Gamma, t, \tau) = \int_\Omega P(x, y, \Gamma, t, \tau)\,f(y,t)\,dy \qquad (53)$$

will be a Markov process. Assumption 1 is rather weak. Indeed, assume an agent keeps track of her choices and observations at discrete moments of time and remembers only finitely many choices and observations. Then one can always redefine the choice space in such a way that Assumption 1 will hold. Hence, Assumption 1 is essentially a finite memory assumption.
Assumption 2 There exists a function p(x, y, z, t, τ) > 0, measurable in z and twice continuously differentiable in τ, such that for any Γ ∈ Σ the transition probability is given by

$$P(x, y, \Gamma, t, \tau) = \int_\Gamma p(x, y, z, t, \tau)\,dz.$$
Define the set V_δ(x) = {w ∈ Ω : ‖w − x‖ < δ}, where ‖·‖ denotes the Euclidean norm.

Assumption 3 For any δ > 0 and any x ∈ Ω the transition probability satisfies P(x, y, V_δᶜ(x), t, τ) = o(τ), and the following limits exist:

$$\lim_{\tau\to 0}\frac{1}{\tau}\,E\big(x(t+\tau) - x(t)\,\big|\, x(t), y(t)\big), \qquad \lim_{\tau\to 0}\frac{1}{\tau}\,\mathrm{Var}\big(x(t+\tau) - x(t)\,\big|\, x(t), y(t)\big).$$

Here and throughout the Appendix the superscript c on a set denotes the complement of the set.

Assumption 4 For any x ∈ Ω and any neighborhood V(x), the transition probability P(x, y, Vᶜ(x), t, τ), considered as a function of y, achieves its minimum at y = x.

This assumption says that observing y different from x increases the probability of moving away from x, at an increasing rate. It also makes the random terms involved in imitation of scope and exogenous experimentation independent. Since this independence is not important for my results, I omitted it in the main text.

Finally, I will assume:

Assumption 5 p(x, y, z, t) is four times continuously differentiable in x and y for any t ≥ 0 and any realization of z.

Assumptions 2 and 5 are combined in the main text in Axiom 2.
Theorem 2 Assume that Assumptions 1-5 are satisfied. Let f(·, t) denote the density of population choices at time t. Then there exist a twice continuously differentiable vector function μ₁(x, t) and matrix-valued functions μ₂(x, y, t), Γ₁(x, t), and Γ₂(x, y, t), with Γ₁(x, t) and Γ₂(x, y, t) positive semidefinite, such that the generator of the stochastic behavioral algorithm is given by:

$$\mathcal{L} = \Big(\mu_1 + \int_\Omega \mu_2(y-x)f(y,t)\,dy\Big)\nabla + \frac{1}{2}\,\mathrm{Tr}\Big(\Big(\Gamma_1 + \int_\Omega (y-x)^T\Gamma_2(y-x)f(y,t)\,dy\Big)D^2\Big), \qquad (54)$$

where

$$\nabla = \Big(\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n}\Big), \qquad \{D^2\}_{ij} = \frac{\partial^2}{\partial x_i\,\partial x_j}, \qquad (55)$$

and Tr denotes the trace of a matrix.
Proof. By definition (Rogers and Williams (1994)) the generator of a Markov process is given by:

$$\mathcal{L} = \lim_{\tau\to 0}\frac{P(x,\Gamma,t,\tau) - I}{\tau}, \qquad (56)$$

where I is the identity operator. It can be shown (Kanan (1979)) that under the assumptions of the theorem

$$\mathcal{L} = a(x,t)\nabla + C(x,t)D^2, \qquad (57)$$

where

$$a(x,t) = \lim_{\tau\to 0}\frac{1}{\tau}\,E\big(x(t+\tau) - x(t)\big), \qquad C(x,t) = \lim_{\tau\to 0}\frac{1}{\tau}\,\mathrm{Var}\big(x(t+\tau) - x(t)\big).$$
Assumption 3 guarantees that the above limits exist. The law of iterated expectations implies that

$$a(x,t) = \int_\Omega \mu(x,y,t)f(y,t)\,dy, \qquad C(x,t) = \int_\Omega \Gamma(x,y,t)f(y,t)\,dy, \qquad (58)$$

where

$$\mu(x,y,t) = \lim_{\tau\to 0}\frac{1}{\tau}\int_\Omega (z-x)\,p\,dz, \qquad \Gamma(x,y,t) = \lim_{\tau\to 0}\frac{1}{\tau}\int_\Omega (z-x)(z-x)^T\,p\,dz. \qquad (59)$$

Using Assumption 5 one can write:
$$\mu(x,y,t) = \mu(x,x,t) + \mu'(x,y,t)(y-x), \qquad (60)$$

$$\Gamma_{ij}(x,y,t) = \Gamma_{ij}(x,x,t) + \sum_{k=1}^{n}\Gamma'_{ijk}(x,x,t)(y-x)_k + \sum_{k,\ell=1}^{n}\Gamma''_{ijk\ell}(x,y,t)(y-x)_k(y-x)_\ell. \qquad (61)$$

Assumptions 4 and 5 imply that Γ'_{ijk}(x, x, t) = 0. Define

$$\mu_1(x,t) = \mu(x,x,t), \qquad \mu_2(x,y,t) = \mu'(x,y,t), \qquad (62)$$

$$\Gamma_{1ij}(x,t) = \Gamma_{ij}(x,x,t), \qquad \Gamma_{2k\ell}(x,y,t) = \sum_{i,j=1}^{n}\Gamma''_{ijk\ell}(x,y,t). \qquad (63)$$

Positive semidefiniteness of the matrices Γ₁ and Γ₂ follows from their definition and Assumption 4. Finally,
$$\mathcal{L} = \Big(\mu_1 + \int_\Omega \mu_2(y-x)f(y,t)\,dy\Big)\nabla + \frac{1}{2}\,\mathrm{Tr}\Big(\Big(\Gamma_1 + \int_\Omega (y-x)^T\Gamma_2(y-x)f(y,t)\,dy\Big)D^2\Big), \qquad (64)$$

and the theorem is proved.
To interpret Theorem 2, let us consider a specific behavioral model. The framework is similar to the general model except that time is assumed to be discrete and the behavioral rule is given explicitly by:

$$x_{t+\tau} - x_t = \kappa(x_t, y_t, t)\tau + B(\tau, x_t, y_t, t)(y_t - x_t)\varepsilon_t + \Lambda(x_t, t)\xi_t. \qquad (65)$$

Here κ(x_t, y_t, t) = λ(x_t, t) + δ(x_t, y_t, t), where δ(x_t, y_t) = ν(x_t, y_t)(y_t − x_t) for some matrix ν. All functions are assumed to be twice continuously differentiable in x and y, and continuously differentiable in t. The random variables ε and ξ are assumed to be independent, to have compact support, and to satisfy E(ε) = E(ξ) = 0 and Var(ε) = Var(ξ) = 1.

The first term in the behavioral rule describes the deterministic adjustment and the direct imitation of choices, the second describes imitation of scope, and the third describes exogenous experimentation.

I will study the continuous-time limit of the stochastic process generated by this behavioral rule. To pass to this limit, assume:
Assumption 6 There exists a twice continuously differentiable matrix-valued function b : R^{2m} → R such that B(τ, x_t, y_t, t) = b(x_t, y_t)√τ.

Theorem 3 Let Assumption 6 be satisfied. Then the generator of the stochastic process defined by (65) is given by (54) with μ₁ = λ, μ₂ = ν, Γ₁ = bᵀb, and Γ₂ = ΛᵀΛ. Hence, any stochastic behavioral algorithm can be decomposed into deterministic adjustment, direct imitation, experimentation, and imitation of scope.

To prove this theorem one can use a technique similar to the one used in the proof of Theorem 2. For details see Basov (2000a).