Optimal Feedback and Wage Policies∗

Alex Smolin†

April 15, 2015

[WORK IN PROGRESS]

Abstract

We consider a principal-agent setting in which the principal has superior expertise to assess the performance of the agent. The agent, if capable and making an effort, generates successes exponentially distributed over time. The principal observes successes and may disclose them to the agent via (i) a feedback policy, if transfers are not allowed, or (ii) a wage policy, if transfers are allowed. We solve for the principal's optimal policies and find that they are coarse in both cases; the principal postpones revealing information about the agent's performance. When transfers are not allowed, the optimal feedback policy prescribes a single revision at a fixed date; it leaves the agent with procrastination rents when his actions are not observable. When transfers are allowed, the optimal wage policy starts with a probation period that is followed by permanent employment if the agent has ever been successful; it satisfies limited liability and extracts full surplus even when the agent's actions are not observable.

Keywords: principal-agent, communication, experimentation.
JEL Codes: D82, D83, M52

∗ I thank Dirk Bergemann, Johannes Hörner, and Larry Samuelson for constant support and encouragement. I am indebted to the Yale economics program for an outstanding research environment and to many of its participants for insightful discussions.
† Department of Economics, Yale University, 28 Hillhouse Ave., New Haven, CT, 06520, USA, [email protected]

1 Introduction

In many situations a principal in charge has greater expertise than a subordinate agent to assess the agent's performance. For example, an adviser can better evaluate the progress of her student, or a manager can better evaluate the work of her employee, than the student or the employee can. While providing feedback the principal faces a natural trade-off. On the one hand, some feedback is required for the agent to make informed decisions. On the other hand, constant feedback may discourage the agent too quickly after a sequence of bad performance reviews. What is the optimal policy that the principal should pursue in this case?

In this paper we formalize the question posed above in a setting of exponential bandit experimentation. The agent, if capable and exerting effort, generates successes exponentially distributed over time; to achieve a success both talent and hard work, as well as a bit of luck, are required. The principal observes successes and may disclose them to the agent. In the main setting we endow the principal with full commitment and solve for the principal's optimal policies in settings where she can use transfers and those where she cannot. In both cases the optimal policies are coarse; the principal postpones revealing information about the agent's performance.

When transfers are not allowed, an optimal policy is a deterministic revision: it provides full feedback at a pre-specified date. The revision time depends on whether the agent's actions are observable or not. If actions are observable, then the principal prolongs experimentation by placing the revision as far in the future as possible to leave no surplus to the agent; the agent is threatened with a permanent feedback shutdown if ever seen shirking. If actions are not observable, then the principal must place the revision earlier in time and leave the agent with procrastination rents; otherwise, the agent would procrastinate early in the game.
These optimal policies are practical for implementation: first, they are conceptually easy because the revision time is a fixed calendar date, commonly known to both players, that does not depend on past messages or outcomes; second, the policies are "inexpensive" in that they require attention from the principal only at a single point in time.

When transfers are allowed, an optimal policy consists of probation followed by employment if the agent succeeds. Namely, it starts with a probation period during which the agent receives neither feedback nor salary. After that, if a success has ever occurred the agent is permanently hired and enjoys a wage premium depending on how successful he was during probation; the size of the premium is tailored to provide minimal incentives for the agent to work. This policy allows the principal to extract full surplus even under moral hazard and limited liability constraints. The key idea behind the surplus extraction is that the efficient policy is informationally coarse: it prescribes working until a pre-specified date and then continuing to work if and only if a success has ever occurred. Consequently, it does not require revealing any information to the agent until that date and allows the principal to shut down the information rents that the agent would carry in the case of full success observability. This result draws attention to the informational content of a salary and suggests that the employer's incentives to muddle feedback may translate into increased wage rigidity.

Related literature: We model experimentation as a bandit problem that can be formally viewed as an agent sequentially pulling different slot machine arms with unknown returns. This problem in discrete time was first developed by Robbins (1952) and introduced in economics by Rothschild (1974). Since then it has been applied extensively in many different economic settings, as it concisely captures the tradeoff between exploration and exploitation. See the recent review by Bergemann and Valimaki (2008) for more detail. We use its continuous time extension, an exponential bandit, with a good arm yielding successes at exponentially distributed random times. It was developed by Pressman and Sonin (1990) for a single-agent problem and popularized in economics by Keller et al. (2005) in the analysis of moral hazard in teams.

The bandit problem has recently been applied in many design settings for experimentation and innovation. This literature starts with the principal-agent conflict highlighted by Holmström (1979) and designs schemes that help the principal mitigate it. Bonatti and Hörner (2013) derive optimal monetary schemes to incentivize a worker to put in more effort in the presence of career concerns. Hörner and Samuelson (2013) analyze a similar setting in which the principal provides funding for experimentation and cannot commit to future contract terms, akin to Bergemann and Hege (1998). Halac et al. (2013) design an optimal scheme in a setting where moral hazard is coupled with an adverse selection problem; Guo (2014) considers a delegation problem in a setting with adverse selection. Ederer (2013) investigates optimal wage schemes to incentivize experimentation in teams. In all of these papers agents immediately observe their own successes. Instead, this paper concentrates on a dynamic management problem where the successes are privately observed by the principal, who can then disclose them to the agent.
The problem of static information management was considered in the economic literature as early as Milgrom and Weber (1982) but has usually been parametrized and auxiliary to the mechanism design problem; see Bergemann and Valimaki (2005) for a survey. Bergemann and Pesendorfer (2007) consider a general problem of information management in the context of an optimal auction. Later, Kamenica and Gentzkow (2011) characterize an optimal disclosure policy in a principal-agent problem, whereas Bergemann and Morris (2013a) and Bergemann and Morris (2013b) further analyze settings with interacting players. While the problem of static information management is sufficiently developed, the problem of dynamic information management is just emerging in the economic literature. Bergemann and Wambach (2013) jointly design allocation and disclosure policies in a general auction setting and show that a sequential disclosure policy is superior to a static one. Ely (2013) analyzes a principal-agent problem where the information arrives exogenously over time and the principal wants to maximize the expected time before the agent's belief hits a certain threshold. Finally, Che and Hörner (2014) consider the problem of designing a recommendation system that aggregates information generated by consumers to maximize aggregate sales. In their setting, as in ours, the information is endogenously generated over time and publicly revealed according to a disclosure policy chosen by the principal. Yet the agents in their setting are short-lived and thus do not internalize future information externalities. As a result, their conclusions and optimal policies are different from ours even in the case with no transfers.

The rest of the paper is organized as follows. Section 2 presents a formal model of experimentation and feedback and derives optimal policies when transfers are not allowed. Section 3 derives the corresponding optimal policies when transfers are allowed. Section 4 discusses possible extensions. Finally, the Appendix contains details of the proofs omitted in the main body of the paper.

2 Feedback policies

A dissertation adviser wants to uncover academic talent in her student; the talent is initially unknown but is required to produce an excellent dissertation. The student starts working on a challenging yet promising project and receives constant feedback from the adviser on how well he is doing. At some point, not seeing any major success, he gives up and switches to an easy project that secures an average dissertation and a decent but not stellar job in a private company. He doesn't pursue research projects anymore and never realizes that he was just a month away from a major breakthrough when he gave up the project. Could the adviser persuade the student to work on the project for another month?

2.1 Model

Two risk neutral players, a principal (she) and an agent (he), i ∈ {p, a}, interact in continuous time t ∈ [0, ∞). At any point in time the agent decides whether to work or to shirk.¹ Work requires costly effort from the agent but generates random successes if (and only if) the agent is capable. Both players value successes and discount the future at the common rate r. The principal observes successes and commits to a feedback policy that governs what information she discloses to the agent and at what times. More precisely, the agent's type ω ∈ {0, 1}, incapable or capable, is unknown to both players, with p0 being the prior probability that the agent is capable.

¹ This setting is equivalent to having a continuum of efforts a ∈ [0, 1] with instantaneous probability of success λa and cost ca.
If the agent is not working, at = 0, then over an interval of time [t, t + dt) both players receive a deterministic stream of payoffs, normalized to zero for the principal and to c dt for the agent. If the agent is working, at = 1, over an interval of time [t, t + dt), then a success yt = 1 is generated with probability λω dt and is worth hi > 0 to player i. The principal observes signals s ≜ {st}t≥0 over time. We consider two main cases, observable and unobservable actions. In the former case, the principal constantly observes both the actions of the agent and the successes, st = {at, yt}; in the latter case, the principal observes only the successes, st = {yt}.

At the beginning of the game the principal commits to a feedback policy, m = {mt}t≥0, that maps her private histories into the messages she sends to the agent:

$$m_t : \left(m^{t-}, s^t\right) \to \Delta(M),$$

where Δ(M) denotes the set of probability distributions over M, and the choice of m includes the choice of the message set M. We place no additional constraints on the size of the possible message set M, thus concentrating on the strategic limitations of communication rather than on the limitations of information channel capacity. In fact, as argued later, it suffices that |M| = |A| = 2.

The agent does not observe successes himself but receives information from the principal according to m. At each point in time his behavioral working strategy, a = {at}t≥0, maps all past messages and actions into a possibly random action, to work or to shirk:

$$a_t : \left(m^{t-}, a^{t-}\right) \to \Delta(A).$$

Denote by $\mathcal{M}$ and $\mathcal{A}$ the sets of possible feedback policies and working strategies. Given a feedback policy m ∈ $\mathcal{M}$ and a working strategy a ∈ $\mathcal{A}$, the normalized payoffs of the players are

$$U_a(a, m) = \mathbb{E}\left[\int_0^{\infty} r e^{-rt}\left(a_t \lambda h_a \omega + (1 - a_t)\, c\right) dt \,\Big|\, a, m\right],$$

$$U_p(a, m) = \mathbb{E}\left[\int_0^{\infty} r e^{-rt}\, a_t \lambda h_p \omega\, dt \,\Big|\, a, m\right].$$

Given this specification, the principal wants the agent to work all the time, while the agent would like to work only if he is sufficiently certain that he is capable of producing a success.

2.2 Optimal feedback policy

The principal cannot use monetary transfers as in mechanism design problems or directly control the actions of the agent as in delegation problems. Instead, she manages the information available to the agent. The optimal feedback problem is thus a dynamic information management problem that can be stated as

$$\max_{m \in \mathcal{M},\, a \in \mathcal{A}} U_p(a, m) \quad \text{s.t.} \quad a \in \arg\max_{\hat{a} \in \mathcal{A}} U_a(\hat{a}, m). \tag{IC}$$

To highlight the complexity of the problem we list some possible disclosure policies m ∈ $\mathcal{M}$:

• no disclosure: m ≡ ∅,
• full disclosure: m ≡ y,
• random monitoring: mt = yt with probability ρ and ∅ with probability 1 − ρ for some ρ ∈ (0, 1),
• scheduled monitoring: mt = yt for t ∈ T and mt = ∅ for t ∉ T for some T ⊂ [0, ∞),
• scheduled revision: mt = y^t ≜ {ys}s≤t for t ∈ T and mt = ∅ for t ∉ T for some T ⊂ [0, ∞).

Despite the seeming complexity of the problem, the solution can be readily obtained if the agent is sufficiently optimistic or sufficiently pessimistic. If the agent is so optimistic that the expected flow benefits are greater than the opportunity costs, then the principal can achieve her maximal payoff by no disclosure; the agent's belief will stand still and he will keep working forever. At the same time, if the agent is so pessimistic that he wouldn't work even under the full feedback policy, then the principal cannot persuade him to work, and no information will ever be generated in the relationship.
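These two cases correspond to the belief thresholds defined next. The belief dynamics behind them are the standard ones for exponential bandits: the belief stands still while the agent shirks or receives no feedback, drifts down while he works under full disclosure with no success, and jumps to one after a disclosed success. A minimal numerical sketch of this updating and of the myopic work incentive (the parameter values are those used in the paper's figures; all function names are illustrative):

```python
import math

# Illustrative parameters, matching the paper's figures:
# prior p0, success rate lam of a capable agent, discount rate r,
# agent's benefit ha per success, flow opportunity cost c of working.
p0, lam, r, ha, c = 0.5, 3.0, 0.5, 2.0, 4.0

def belief_no_success(p0, lam, t):
    """Posterior that the agent is capable after working for time t
    under full disclosure when no success has occurred (Bayes' rule)."""
    return p0 * math.exp(-lam * t) / (p0 * math.exp(-lam * t) + 1 - p0)

def myopic_gain(p, lam, ha, c):
    """Instantaneous expected gain of working over shirking at belief p:
    positive iff the agent is myopically willing to work."""
    return p * lam * ha - c

for t in [0.0, 0.5, 1.0, 2.0]:
    p = belief_no_success(p0, lam, t)
    print(f"t = {t:.1f}: belief = {p:.3f}, flow gain of working = {myopic_gain(p, lam, ha, c):+.3f}")
```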
In particular, we define the myopic threshold as

$$\overline{p} \triangleq \frac{c}{\lambda h_a}$$

and the full info threshold as

$$\underline{p} \triangleq \frac{1}{1 + (\lambda/r + 1)(\lambda h_a/c - 1)}.$$

The former is the threshold at which the immediate expected gains of working exceed its opportunity costs for the agent. The latter is the threshold at which the agent stops working under the full disclosure policy, as derived by Pressman and Sonin (1990). Note that $\overline{p} > \underline{p}$, as the latter threshold takes into account the informational benefits of working.²

² There are no further restrictions on the pair $(\underline{p}, \overline{p})$ besides $\underline{p} < \overline{p}$; depending on the parameters it can be anywhere in (0, 1)².

Theorem 1. If $p_0 \geq \overline{p}$ then an optimal feedback policy is to provide no feedback, m ≡ ∅. If $p_0 \leq \underline{p}$ then under any feedback policy the agent's optimal response is not to work, a ≡ 0.

Proof. If $p_0 \geq \overline{p}$ then under the no feedback policy the agent's belief stands still and he works forever. This achieves the principal's maximal utility over all a, m and thus is optimal. If $p_0 \leq \underline{p}$ then even under the full feedback policy the agent never works, a ≡ 0, and his uniquely optimal strategy is to shirk whenever $p_t \leq \underline{p}$. Since less feedback limits the choice of available working strategies while leaving a ≡ 0 available to the agent, the result follows.

The theorem holds for both observable and unobservable actions, leaving unsolved only the case of intermediate prior beliefs $p_0 \in (\underline{p}, \overline{p})$. The main result of the section shows that an optimal feedback policy in this case is a deterministic revision policy. Before we state the result it is instrumental to calculate the payoff U(t, h, c) that an arbitrary person with benefit h and cost c receives if working until a pre-specified time t and continuing to work after that time if and only if a success has occurred:

$$U(t, h, c) = \left(1 - e^{-rt}\right) p_0 \lambda h + e^{-rt}\left[p_0\left(1 - e^{-\lambda t}\right) \lambda h + \left(1 - p_0\left(1 - e^{-\lambda t}\right)\right) c\right]$$
$$= p_0 \lambda h \left(1 - e^{-\lambda t} e^{-rt}\right) + c \left(1 - p_0 + p_0 e^{-\lambda t}\right) e^{-rt}.$$

The first term in the first line captures the expected payoffs from blind experimentation. The second term captures the continuation payoffs, weighted by the corresponding probability of a success, $p_0(1 - e^{-\lambda t})$, and the complementary probability of no successes. The payoffs of the principal and the agent from the corresponding policy are then

$$U_a(t) \triangleq U(t, h_a, c) = p_0 \lambda h_a \left(1 - e^{-\lambda t} e^{-rt}\right) + c \left(1 - p_0 + p_0 e^{-\lambda t}\right) e^{-rt},$$
$$U_p(t) \triangleq U(t, h_p, 0) = p_0 \lambda h_p \left(1 - e^{-\lambda t} e^{-rt}\right).$$

The payoff functions $U_a(t)$, $U_p(t)$ are illustrated in Figure 1. Note that the time at which the belief drops from $p_0$ to $\underline{p}$ under the full feedback policy is exactly the time $t_a^*$ that maximizes $U_a(t)$.

[Figure 1: Payoff functions $U_a(t)$ and $U_p(t)$ under deterministic feedback policies. $p_0 = 1/2$, $r = 1/2$, $\lambda = 3$, $h = 2$, $c = 4$.]

Theorem 2. If $p_0 \in (\underline{p}, \overline{p})$ then an optimal feedback policy is a deterministic revision policy:

if actions are observable, then $m_t = y^t$ if $t = T_o$ and $a_{T_o-} \equiv 1$; $m_t = \emptyset$ otherwise;

if actions are not observable, then $m_t = y^t$ if $t = T_u$; $m_t = \emptyset$ otherwise.

The optimal revision times satisfy

$$U_a(T_o) = c, \qquad U_a(T_u) = c - \frac{U_a'(T_u)}{r}.$$
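To make these conditions concrete, the following sketch (using the parameter values of Figure 1; the bisection helper is ad hoc and purely illustrative) computes the two revision times and the implied procrastination rent $-U_a'(T_u)/r$:

```python
import math

p0, lam, r, ha, c = 0.5, 3.0, 0.5, 2.0, 4.0  # parameters of Figure 1

def Ua(t):
    """Agent's payoff from working until t and continuing iff a success has occurred."""
    return (p0 * lam * ha * (1 - math.exp(-(lam + r) * t))
            + c * (1 - p0 + p0 * math.exp(-lam * t)) * math.exp(-r * t))

def dUa(t, eps=1e-6):
    """Numerical derivative of Ua."""
    return (Ua(t + eps) - Ua(t - eps)) / (2 * eps)

def bisect(f, lo, hi):
    """Simple bisection for a function with a sign change on [lo, hi]."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Observable actions: the latest time at which Ua(To) = c (the root beyond the peak of Ua).
To = bisect(lambda t: Ua(t) - c, 0.5, 10.0)
# Unobservable actions: Ua(Tu) + Ua'(Tu)/r = c.
Tu = bisect(lambda t: Ua(t) + dUa(t) / r - c, 1e-3, 10.0)

print(f"To = {To:.3f}, Tu = {Tu:.3f}, procrastination rent = {-dUa(Tu) / r:.3f}")
assert Tu < To  # moral hazard forces an earlier revision
```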
The formal proof is relegated to the Appendix; here we briefly outline its main ideas. The proofs for the observable and unobservable action cases go along the same lines. First, we show that an optimal feedback policy can be characterized by a distribution F(t) over times at which the principal recommends that the agent permanently stop working if no success has ever occurred. This complexity reduction roughly stems from Pareto efficiency considerations: under an optimal policy the agent never procrastinates, i.e. never works after a period of shirking, so any switch to shirking is permanent; furthermore, the agent should never switch to shirking if a success has ever occurred. Second, we show that randomization over stopping times is detrimental to the principal; for any random stopping policy there is a deterministic stopping policy that preserves the agent's incentives and benefits the principal. Consequently, the optimal F(t) is concentrated at a single point and can thus be implemented by a deterministic revision policy. Within this class the agent's hardest incentive constraint is always at time 0: his belief stands still all the time before the revision, so he will always prefer to procrastinate earlier rather than later; the flow value of shirking is the same, while the informational benefits lie later in time. It is then straightforward to show that at the optimal revision times the time-zero incentive constraint binds, leading to the optimality conditions of the theorem.

[Figure 2: Left: illustration of revision times at $p_0 = 1/2$, $r = 1/2$, $\lambda = 3$, $h = 2$, $c = 4$. Right: comparative statics of revision times in $p_0 \in (\underline{p}, \overline{p})$ at $r = 1/2$, $\lambda = 3$, $h = 2$, $c = 4$. $T_a$ – full disclosure revision time, maximizes the agent's utility; $T_u$ – optimal revision time under moral hazard; $T_o$ – optimal revision time under observable actions.]

2.3 Discussion

Note that the deterministic revision policy has a number of properties that make it practical for implementation: first, the policy is conceptually easy because the deadline is a fixed calendar date, commonly known to both players, that does not depend on past messages or outcomes; second, the policy is "inexpensive" in that it requires attention from the principal only at a single point in time.

Further, note how moral hazard constrains the principal. If she tried to implement the deterministic revision that is optimal under observable actions, she would face a procrastination problem. Instead of working all the time before the deadline, the agent would shirk at the beginning and start working exactly $T_u$ units of time before the deadline. Consequently, the optimal revision time is closer, $T_u < T_o$, and the agent enjoys positive procrastination rents, measured as $-U_a'(T_u)/r$.

Finally, note that there is no discontinuity between Theorems 1 and 2. As the prior belief approaches $\overline{p}$, the revision time diverges to infinity, which is equivalent to the no feedback policy. The revision times and their respective comparative statics are illustrated in Figure 2.

3 Wage policies

A firm owner hopes that an intern she just hired will be a good addition to her small R&D firm. She gives him a list of basic tasks and promises that as soon as the intern impresses her with his work she will reward him with a bonus. The intern starts working hard, full of expectations, but after several months of hard work sees no bonus and gets discouraged. After another few weeks of no reward he hands in his resignation and quits the job. He starts working as a barista in a local coffee shop, not knowing that he was just a week away from a major breakthrough. Could the firm owner have kept the worker if she had put him on a probation period with a fixed wage for one year, during which she would provide no feedback?

3.1 Model

The setting is the same as before but with the addition of transfers available between the players.
As usual, these transfers can represent any one-to-one shift of the surplus between the players, but we will adopt the common and natural interpretation of monetary transfers and will call them "wages". In particular, at any point in time a transfer of wt ∈ R happens between the players. If wt > 0 then the transfer goes from the principal to the agent, and if wt < 0 then the transfer goes from the agent to the principal.

The definitions of strategies and payoffs are augmented in a straightforward way. The behavioral strategy of the principal maps her private history into messages and wages,³

$$m_t : \left(w^{t-}, m^{t-}, s^t\right) \to \Delta(M), \qquad w_t : \left(w^{t-}, m^{t-}, s^t\right) \to \mathbb{R}.$$

The behavioral strategy of the agent maps all past messages, actions, and wages into a possibly random action, to work or to shirk:

$$a_t : \left(w^{t-}, m^{t-}, a^{t-}\right) \to \Delta(A).$$

Denote by $\mathcal{M}$, $\mathcal{W}$, $\mathcal{A}$ the sets of possible feedback, wage, and work policies. Given m ∈ $\mathcal{M}$, w ∈ $\mathcal{W}$, and a ∈ $\mathcal{A}$, the normalized payoffs of the players are

$$U_a(a, m, w) \triangleq \mathbb{E}\left[\int_0^{\infty} r e^{-rt}\left(a_t \lambda h_a \omega + (1 - a_t)\, c + w_t\right) dt \,\Big|\, a, m, w\right],$$

$$U_p(a, m, w) \triangleq \mathbb{E}\left[\int_0^{\infty} r e^{-rt}\left(a_t \lambda h_p \omega - w_t\right) dt \,\Big|\, a, m, w\right].$$

In what follows, with a slight abuse of terminology, we will refer to the tuple of feedback and wage policies (m, w) simply as a wage policy.

³ Importantly, the wage policy itself has informational content. In fact, as will be shown, optimal policies provide no extra feedback, m ≡ ∅.

3.2 Optimal wage policy

We show that an optimal wage policy allows the principal to extract the full surplus that can be generated in the relationship. By Pressman and Sonin (1990), to maximize the joint surplus $W \triangleq U_a + U_p$ the agent needs to work until his belief drops low enough under full feedback. As before, the policy can be implemented by working until a pre-specified time $T^*$ that maximizes $U(T^*, h_a + h_p, c)$ and then continuing to experiment if and only if there has ever been a success. The maximal surplus is then

$$W^* = U(T^*, h_a + h_p, c),$$

where $T^*$ can be calculated to satisfy

$$e^{-\lambda T^*} = \frac{r}{r + \lambda}\, \frac{1 - p_0}{p_0}\, \frac{c}{\lambda (h_a + h_p) - c}.$$

Since individual rationality requires the agent to achieve at least c, the maximal payoff that the principal can possibly achieve in this game is $W^* - c$.
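A short numerical sketch (again with the parameter values of the figures and $h_p = h_a = 2$; all names are illustrative) computes the efficient stopping time $T^*$, the maximal surplus $W^*$, and the principal's first-best payoff $W^* - c$:

```python
import math

p0, lam, r, c = 0.5, 3.0, 0.5, 4.0
ha, hp = 2.0, 2.0  # the figures use a common benefit h = 2 for both players

def U(t, h, cost):
    """Payoff from working until t with benefit h per success and flow cost `cost`,
    continuing after t iff a success has occurred."""
    return (p0 * lam * h * (1 - math.exp(-(lam + r) * t))
            + cost * (1 - p0 + p0 * math.exp(-lam * t)) * math.exp(-r * t))

# Efficient stopping time from e^{-lam T*} = r/(r+lam) * (1-p0)/p0 * c/(lam (ha+hp) - c).
T_star = -math.log((r / (r + lam)) * ((1 - p0) / p0) * (c / (lam * (ha + hp) - c))) / lam
W_star = U(T_star, ha + hp, c)  # maximal joint surplus W*
print(f"T* = {T_star:.3f}, W* = {W_star:.3f}, principal's first-best payoff = {W_star - c:.3f}")
```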
Without any further restrictions the principal can achieve this first-best payoff with very simple policies. If actions are observable, then the principal can reimburse the agent for his actions as long as she is still sufficiently optimistic, i.e. $w_t = c$ if $t < T^*$ and $a_t = 1$, $w_t = 0$ otherwise; $m_t \equiv \emptyset$. This policy implements the efficient allocation with the agent receiving his minimal surplus, c; consequently the principal achieves the first-best payoff. If actions are not observable, then the principal can "sell the arm" to the agent with the provision of constant supervision, i.e. $w_0 = -W^* + c$, $w_t \equiv y_t$ for $t > 0$, and $m_t \equiv \emptyset$. Again, this policy implements the efficient allocation with the agent receiving his minimal surplus; consequently the principal achieves the first-best payoff. These full extraction arguments have already been pointed out by Halac et al. (2013) in an environment with a single success and full observability.

However, as pointed out by Moroni (2015), the policy of "selling the arm" requires the agent to make a big upfront payment, $-w_0$. In fact, Moroni (2015) shows that if the agent is subject to limited liability then he must be left with welfare rents. The argument is straightforward. For efficiency the agent must work for some time even if no successes have occurred. To extract full surplus the principal must then promise greater transfers after such histories, as the agent becomes discouraged. However, the agent can then secure positive rents by procrastinating; this way he interprets the absence of successes more optimistically than he otherwise would and places positive value on the transfers promised by the principal after such histories. The procrastination rent argument presented above hinges crucially on the full observability of successes and the corresponding discouragement effect. Indeed, the main result of this section shows that the principal can extract full surplus even under the (strong) limited liability of the agent.

Definition 1. The wage contract w satisfies limited liability if $w_t \geq 0$ for all $t \geq 0$ pathwise.

Theorem 3. An optimal wage policy is a probation policy: $w_t \equiv 0$ for $t < T^*$, $w_t = w(1 + \gamma)\,\mathbb{I}(y_t = 1)$ for $t \geq T^*$, and $m \equiv \emptyset$, where $\gamma$ is a wage premium that depends on the number and timing of successes during the probation period:

$$\gamma = \begin{cases} -1, & \text{if } N^* = 0,\\[4pt] \dfrac{r}{\lambda p_0} \displaystyle\sum_{k=1}^{N^*} e^{r(T^* - \tau_k)}, & \text{if } N^* \geq 1, \end{cases}$$

where $w \triangleq c/\lambda$ is the minimal wage per success to induce the agent to work if known to be capable, and $N^*$ and $\tau_k$, $k = 1, \dots, N^*$, are the number and times of successes during the probation period. This policy satisfies limited liability and allows the principal to extract full surplus under both observable and unobservable actions.

The details of the proof are relegated to the Appendix, but the optimal policy is very transparent. During the probation period the agent receives no information, so his belief stands still at the prior $p_0$. If at least one success has happened by the end of the probation period, the agent is known to be capable and is "permanently employed" at a wage $w(1 + \gamma)$ per success; otherwise he is effectively fired, with $w_t \equiv 0$. The size of the premium $\gamma$ is tailored to provide minimal incentives for the agent to work during probation; the compensation of $c/(\lambda p_0)$ per success is spread throughout the permanent employment.⁴ Hence, the premium increases with the number of successes during probation, with more weight put on the most distant successes. This policy implements the efficient allocation with the agent left at his minimal surplus; consequently the principal achieves the first-best payoff.

⁴ It could alternatively be paid out as a single bonus at the end of probation.
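The following sketch evaluates the premium formula of Theorem 3 for a few hypothetical probation histories (the success times are made up for illustration; $T^*$ is approximately the value computed in the sketch above):

```python
import math

p0, lam, r, c = 0.5, 3.0, 0.5, 4.0
T_star = 0.88            # probation length, approximately T* for the figure parameters
w = c / lam              # minimal per-success wage once the agent is known to be capable

def premium(success_times, T_star):
    """Wage premium of Theorem 3 given the list of success times during probation."""
    if not success_times:   # N* = 0: the agent is effectively fired (zero wage)
        return -1.0
    return (r / (lam * p0)) * sum(math.exp(r * (T_star - tau)) for tau in success_times)

for history in [[], [0.5], [0.2, 0.7]]:
    g = premium(history, T_star)
    print(f"successes at {history}: premium = {g:+.3f}, wage per success = {w * (1 + g):.3f}")
```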
3.3 Discussion

The full surplus extraction hinges on the coarse informativeness of the efficient policy. Since it prescribes working until time $T^*$ in any case, it is possible to achieve efficiency without disclosing any extra information. This way the principal shuts down any information rents the agent would carry in the case of full success observability.

This argument suggests that the full extraction result can be extended from the conclusive good news case to a general learning technology, e.g. when the payoffs from working follow an arbitrary Lévy process with unknown parameters. Indeed, as shown by Cohen and Solan (2013), the efficient policy in this case is a threshold policy that prescribes working until the belief drops below a time-independent level. While this policy isn't coarse, in that it gradually reveals information, the information comes in the shape of "bad news": whether the belief has fallen below the threshold or not. Consequently, if merely informed to continue working, the agent becomes more optimistic over time; hence the transfers promised by the principal can become lower as time passes, shutting down the procrastination benefits. The only caveat is that at some finite time the principal still needs to pay out the promised payoff, thus revealing some information and potentially reducing efficiency. However, these losses vanish as long as the payout is postponed far into the future via big bonuses. Interestingly, this agrees with the common use of "golden parachutes" in practice.

More broadly, the surplus extraction result draws attention to the informational content of a salary and suggests that the employer's incentives to muddle feedback about worker abilities may translate into increased wage rigidity.

4 Extensions

The current setting is stylized to preserve tractability and highlight the main insights of the paper. At the same time it allows for straightforward extensions to fit various applications. Here we briefly outline some of them.

4.1 Optimal wage policies beyond full extraction

The full surplus extraction by the optimal wage policy relies heavily on the coarse informativeness of the efficient allocation policy. While such coarseness is possible in the stylized setting considered here, each of the following modifications requires the efficient policy to be fully revealing: if only one success can ever be achieved, the efficient policy prescribes that the agent stop working right after the success occurs, hence it is fully revealing; if there is a continuum of effort levels with corresponding convex costs, then the efficient policy prescribes that the agent gradually reduce effort if no success has occurred, hence it is fully revealing; if there are promotion opportunities for capable agents, then the efficient policy may prescribe promoting the agent right after a success occurs, hence it is fully revealing. Consequently, in these cases the principal can hardly extract full surplus, and the search for second-best policies is a natural yet challenging research question.

4.2 No commitment policies

We assumed that the principal has full commitment in designing feedback and wage policies. While this assumption is plausible in some settings, it can be less appealing in others. In a broader sense, it is important to understand the role that commitment power plays in the structure of optimal policies. To this end, one can consider a setting in which the contract between the principal and the agent is purely relational, i.e. when neither player is able to commit to informative messages or to promised transfers.

The setting with no commitment belongs to the area of repeated games with private information, which is known to be quite challenging to analyze. However, it is clear that the current full surplus extraction wouldn't work, as the principal would be understating the surplus generated by the agent both during probation and during permanent employment. Further, it is possible to obtain some positive results: there is an informative equilibrium even in the game with no commitment, limited liability, and moral hazard when transfers are allowed. The main idea is that the principal is deterred from underreporting by a switch to babbling forever whenever she reports a small generated surplus. Incentivizing informative communication seems to be harder without monetary transfers, since the range of possible continuation payoffs is limited. For example, it seems to be harder to deliver small continuation payoffs to the principal on paths after which the agent's belief goes above $\overline{p}$; switching to babbling is then not helpful, as it induces the agent to work forever.
The small continuation payoffs for beliefs above $\overline{p}$ are, however, essential for an informative equilibrium: the agent's belief must cross $\overline{p}$ after some paths to make him willing to work; a smaller continuation payoff is then required to make the principal's message credible.

References

Bergemann, D. and U. Hege (1998): "Venture capital financing, moral hazard, and learning," Journal of Banking & Finance, 22, 703–735.

Bergemann, D. and S. Morris (2013a): "Bayes correlated equilibrium and the comparison of information structures," Tech. rep., Cowles Foundation for Research in Economics, Yale University.

——— (2013b): "Robust predictions in games with incomplete information," Econometrica, 81, 1251–1308.

Bergemann, D. and M. Pesendorfer (2007): "Information structures in optimal auctions," Journal of Economic Theory, 137, 580–609.

Bergemann, D. and J. Valimaki (2005): "Information in Mechanism Design."

——— (2008): "Bandit problems," The New Palgrave Dictionary of Economics, 2nd ed., Macmillan Press.

Bergemann, D. and A. Wambach (2013): "Sequential Information Disclosure in Auctions," Cowles Foundation Discussion Paper No. 1900.

Bonatti, A. and J. Hörner (2013): "Career Concerns and Market Structure," Working paper.

Che, Y.-K. and J. Hörner (2014): "Optimal Design for Social Learning," Working paper.

Cohen, A. and E. Solan (2013): "Bandit Problems with Levy Processes," Mathematics of Operations Research, 38, 92–107.

Ederer, F. (2013): "Incentives for Parallel Innovation," Working paper, SSRN.

Ely, J. (2013): "Beeps," Working paper.

Guo, Y. (2014): "Dynamic Delegation of Experimentation," Working paper.

Halac, M., N. Kartik, and Q. Liu (2013): "Optimal Contracts for Experimentation," Working paper.

Holmström, B. (1979): "Moral hazard and observability," The Bell Journal of Economics, 10, 74–91.

Hörner, J. and L. Samuelson (2013): "Incentives for Experimenting Agents," The RAND Journal of Economics, 44, 632–663.

Kamenica, E. and M. Gentzkow (2011): "Bayesian Persuasion," The American Economic Review, 101, 2590–2615.

Keller, G., S. Rady, and M. Cripps (2005): "Strategic Experimentation with Exponential Bandits," Econometrica, 73, 39–68.

Milgrom, P. R. and R. J. Weber (1982): "A theory of auctions and competitive bidding," Econometrica, 50, 1089–1122.

Moroni, S. (2015): "Experimentation in Organizations," Working paper.

Pressman, E. and I. Sonin (1990): Sequential Control with Incomplete Information: The Bayesian Approach to Multi-Armed Bandit Problems, translated and edited by E. Medova-Dempster and M. Dempster.

Robbins, H. (1952): "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., 58, 527–535.

Rothschild, M. (1974): "A two-armed bandit theory of market pricing," Journal of Economic Theory, 9, 185–202.

5 Appendix

A Proof of Theorem 2

The proof goes along the lines outlined in the main body of the paper. We first show that, in the case of both observable and unobservable actions, an optimal feedback policy is a stopping recommendation policy that never recommends stopping if a success has ever occurred.

Lemma 1. Under an optimal feedback policy, along any path the agent first works and then shirks forever; switching to shirking is permanent.

Proof. First, note that under moral hazard we can restrict attention to pure working strategies; otherwise just let the agent work whenever he is indifferent.
Now, to the contrary, assume that under an optimal feedback policy m the event that the agent shirks and then works at some point in the future has positive measure. During these (possibly random) periods of shirking the players commonly know that the agent is not working, and hence no new information is generated in the relationship. Consequently, the principal can shift the continuation play earlier by speeding up the disclosure by a factor of two over these events. The resulting feedback policy is more informative than m, thus giving the agent more incentives to work. Further, it cuts the probability of procrastination in half, bringing the agent's effort earlier in time, thus benefitting the principal and contradicting the optimality of m.

Lemma 2. Under an optimal feedback policy the agent never stops working if a success has ever occurred.

Proof. Under an optimal feedback policy, at the time of (permanent) stopping the principal should disclose all of the past history; this can only increase the incentives to work and can also persuade the agent to continue working. Since the agent would never stop working after observing a success, the result follows.

Furthermore, since the principal has full commitment power, the standard mediator argument applies and we can restrict attention to recommendation policies, so that $m_t \in \{0, 1\}$ prescribes whether the agent should work or not. By the previous lemmas, any optimal disclosure policy can be characterized by a commonly known distribution F(t) of times at which the principal recommends that the agent stop working if no success has occurred. Under any such policy the agent's belief weakly increases given no stopping recommendation and plummets if the stopping recommendation occurs.

Consider an arbitrary incentive compatible stopping recommendation policy F(t) with non-trivial support that delivers payoffs $(U_a, U_p)$ to the players. Since the randomization over t is made independently of the history of successes, the players' expected payoffs can be calculated as

$$U_a = \int_0^{\infty} U_a(t)\, dF(t), \qquad U_p = \int_0^{\infty} U_p(t)\, dF(t),$$

where the integral is Lebesgue–Stieltjes, taking care of potential atoms in the distribution. We construct alternative deterministic policies $\hat{F}_j(t) = \mathbb{I}(t \geq \hat{T}_j)$, $j = o, u$, that preserve the incentives of the agent and deliver (weakly) greater payoffs to the principal.

Observable actions. Choose $T_o$ to preserve the agent's expected payoff,

$$U_a(T_o) = \mathbb{E}_F[U_a(t)] = \int_0^{\infty} U_a(t)\, dF(t).$$

Since any deviation by the agent triggers a permanent shutdown of disclosure, his outside option of disobedience is equal to c. The incentive compatibility of $\hat{F}_o$ at time 0 then follows from the incentive compatibility of the original policy at time 0, as $U_a(T_o) = \mathbb{E}_F[U_a(t)] \geq c$. The incentive compatibility at times $t \in (0, \hat{T})$ follows from the incentive compatibility at time 0, since the belief of the agent stands still and the revision time becomes closer. The incentive compatibility at times $t \in [\hat{T}, \infty)$, after the revision, is trivially satisfied. Moreover, $U_p(u_a)$ is strictly concave and hence $\hat{F}_o$ delivers strictly greater payoffs to the principal (see Figure 3 for an illustration). We show the concavity of $U_a(u_p)$, which implies the result. By the definition of the payoffs it follows that $e^{-t} = (1 - u_p/(p_0 \lambda h))^{1/(\lambda + r)}$; hence the agent's payoff can be written as

$$U_a(u_p) = u_p + c (1 - p_0) \left(1 - \frac{u_p}{p_0 \lambda h}\right)^{\frac{r}{\lambda + r}} + c\, p_0 \left(1 - \frac{u_p}{p_0 \lambda h}\right).$$

Consequently,

$$\frac{d^2 U_a}{d u_p^2} = c (1 - p_0)\, \frac{r}{\lambda + r} \left(\frac{r}{\lambda + r} - 1\right) \left(\frac{1}{p_0 \lambda h}\right)^2 \left(1 - \frac{u_p}{p_0 \lambda h}\right)^{\frac{r}{\lambda + r} - 2} < 0.$$
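A quick numerical check of this concavity (a finite-difference sketch over the feasible range of $u_p$, using the figure parameters with $h_a = h_p = h$; purely illustrative):

```python
import math

p0, lam, r, h, c = 0.5, 3.0, 0.5, 2.0, 4.0

def Ua_of_up(up):
    """Agent's payoff as a function of the principal's payoff along deterministic policies."""
    x = 1 - up / (p0 * lam * h)
    return up + c * (1 - p0) * x ** (r / (lam + r)) + c * p0 * x

# Second differences on an interior grid of feasible principal payoffs (0 < up < p0*lam*h).
ups = [0.01 * k * p0 * lam * h for k in range(1, 100)]
second_diffs = [Ua_of_up(ups[i - 1]) - 2 * Ua_of_up(ups[i]) + Ua_of_up(ups[i + 1])
                for i in range(1, len(ups) - 1)]
print("max second difference:", max(second_diffs))  # negative, so Ua(up) is concave
```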
Unobservable actions. Choose $T_u$ to preserve the agent's expected payoff adjusted for procrastination rents,

$$U_a(T_u) + \frac{U_a'(T_u)}{r} = \mathbb{E}_F\left[U_a(t) + \frac{U_a'(t)}{r}\right] = \int_0^{\infty} \left(U_a(t) + \frac{U_a'(t)}{r}\right) dF(t).$$

When the agent decides whether to work at time 0 he compares the flow cost of the effort with its informational benefits. In particular, consider the incentives of the agent if the stopping time is deterministic. If he works all the time until t, then he obtains a payoff $U_a(t)$. If he doesn't work on the interval [0, dt], then he obtains a payoff $rc\,dt + (1 - r\,dt)(U_a(t) - U_a'(t)\,dt)$. He will work only if the former is greater than the latter, i.e. if $U_a(t) \geq c - U_a'(t)/r$. Consequently, the necessary condition for the agent to work at time 0 under an arbitrary policy F(t) is

$$\int_0^{\infty} U_a(t)\, dF(t) \geq c - \int_0^{\infty} \frac{U_a'(t)}{r}\, dF(t).$$

It then follows by construction that $U_a(T_u) + U_a'(T_u)/r = \int_0^{\infty} (U_a(t) + U_a'(t)/r)\, dF(t) \geq c$, and $\hat{F}_u$ is incentive compatible at time zero. The incentive compatibility at times $t \in (0, \hat{T})$ follows from the incentive compatibility at time 0, since the belief of the agent stands still and the revision time becomes closer. The incentive compatibility at times $t \in [\hat{T}, \infty)$ is trivially satisfied. Moreover, $U_p(u_a + u_a'/r)$ is linear and hence $\hat{F}_u$ delivers the same payoffs to the principal (see Figure 3 for an illustration). Indeed,

$$U_a(t) + \frac{U_a'(t)}{r} = p_0 \lambda h \left(1 + \frac{\lambda h - c}{r h}\, e^{-\lambda t} e^{-rt}\right).$$

Since the dependence of the right-hand side on t comes linearly through $e^{-\lambda t} e^{-rt}$, the same as in $U_p(t)$, the result follows.

Now notice that any stopping recommendation policy that is concentrated at a single point can be implemented by a deterministic revision with the same deadline; both policies induce the same belief dynamics and the same response by the agent. The only difference is that a recommendation policy does not reveal how many successes occurred if there was more than one; however, this information is redundant for the agent. Therefore, in the search for an optimal policy it is sufficient to consider the class of deterministic revision policies. Within this class the hardest incentive constraints are always at time 0 for the reasons spelled out above: the belief of the agent stands still, hence the expected costs of effort are the same while the informational benefits become closer over time. It is then straightforward to show that at the optimal revision times the incentive constraint binds, leading to the optimality conditions of the theorem.

[Figure 3: Frontiers used in the proof. Left: observable actions, agent's payoffs vs. principal's payoffs. Right: unobservable actions, agent's adjusted payoffs vs. principal's payoffs. $p_0 = 1/2$, $r = 1/2$, $\lambda = 3$, $h = 2$, $c = 4$.]

B Proof of Theorem 3

Most of the proof is presented in the main body of the paper. Here we derive the size of the wage premium $\gamma$. The size of the premium is constructed to provide minimal incentives for the agent to work at any instant during the probation period. In particular, if rewarded $\Delta$ per success during the permanent employment, the agent's flow payoff from working at time $t < T^*$ is

$$r p_0 \lambda\, dt \int_{T^*}^{\infty} \lambda \Delta\, e^{-rz}\, dz = p_0 \lambda\, e^{-rT^*} \lambda \Delta\, dt,$$

and from not working it is $r e^{-rt} c\, dt$. Equating these two expressions we obtain the minimal compensation per success:

$$\Delta = \frac{r}{\lambda p_0}\, e^{r(T^* - t)}\, \frac{c}{\lambda} = \frac{r}{\lambda p_0}\, e^{r(T^* - t)}\, w,$$

which translates into the final formula for the premium,

$$\gamma = \frac{r}{\lambda p_0} \sum_{k=1}^{N^*} e^{r(T^* - \tau_k)}.$$
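As a sanity check on this derivation, the following sketch (illustrative; it reuses the figure parameters and the approximate $T^*$ from Section 3) verifies that with the compensation $\Delta(t)$ above, the flow payoffs from working and from shirking coincide at every probation instant:

```python
import math

p0, lam, r, c = 0.5, 3.0, 0.5, 4.0
T_star = 0.88            # probation length, approximately T* for the figure parameters
w = c / lam              # minimal per-success wage for an agent known to be capable

def delta(t):
    """Minimal per-success compensation promised for working at probation time t."""
    return (r / (lam * p0)) * math.exp(r * (T_star - t)) * w

for t in [0.0, 0.3, 0.6]:
    work_flow = p0 * lam * math.exp(-r * T_star) * lam * delta(t)  # flow value of working at t
    shirk_flow = r * math.exp(-r * t) * c                          # flow value of shirking at t
    print(f"t = {t:.1f}: work = {work_flow:.4f}, shirk = {shirk_flow:.4f}")
```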