Negative marginal tax rates and heterogeneity

Philippe Choné and Guy Laroque
INSEE-CREST

May 10, 2007
Preliminary, comments welcome

We have benefited from the comments of Mark Armstrong, Richard Blundell, Martin Hellwig, Bruno Jullien, Wojciech Kopczuk, Etienne Lehmann, Pierre Pestieau, Jean-Charles Rochet, Emmanuel Saez, Gilles Saint Paul, Bernard Salanié, John Weymark and of the participants in seminars in Toulouse, University College London, Louvain la Neuve and ESEM 2006 in Vienna.

Abstract

This article suggests a rationale for the use of negative marginal tax rates. Workers differ through productivity and opportunity cost of work. This cost may enter the cardinal measure of utility either positively or negatively; in the former case, it is interpreted as handicap, in the latter as laziness. Standard utilitarianism, understood as maximizing the sum of a concave transformation of cardinal utilities in the economy, selects a subset of the second best allocations, as in Mirrlees optimal tax theory. However the dependence of cardinal utilities on the opportunity costs of work invalidates the result that the marginal tax rate is everywhere nonnegative: negative tax rates may make it possible to force the lazy agents into work. We illustrate the screening motivation of negative tax rates in a simple intensive model, where the set of second best optimal allocations is easily characterized.

JEL classification numbers: H21, H31.
Keywords: optimal taxation, heterogeneity, welfare.

Please address correspondence to Guy Laroque, [email protected], INSEE-CREST J360, 15 boulevard Gabriel Péri, 92245 Malakoff Cedex, France.

1 Introduction

The bulk of the theory of optimal taxation recommends that the marginal tax rate be everywhere positive: there is no room for subsidizing work.
Indeed it is well established in the Mirrlees setup (continuous labor supply or intensive margin, unobserved productivity, constant opportunity cost of work, utilitarian planner) that the marginal tax rate is everywhere non negative at the optimum (Seade (1977), Seade (1982), Werning (2000), Hellwig (2005)). The purpose of the present paper is to suggest a rationale for negative marginal tax rates, keeping with the intensive setup largely studied in the literature, when there are multiple dimensions of heterogeneity.[1] Workers differ through productivity and opportunity cost of work. This cost enters the social criterion either positively or negatively; in the former case, it is interpreted as handicap, in the latter as laziness.

A starting point is to discuss criteria for social choice. We first recall the definition of the set of second best allocations. Given the informational constraints, these are feasible allocations which are not Pareto dominated by other feasible allocations. The definition is ordinal, contrary to that of utilitarianism, which begins with an a priori choice of a cardinal utility for every participant in the economy. The cardinal utility function of the typical tax payer usually depends on her type.[2] This feature is general in the presence of multiple dimensions of heterogeneity: the opportunity cost of work is reflected in the cardinal utility, with different specifications when it reflects a handicap or mere laziness. Then standard utilitarianism consists in maximizing the sum of an increasing concave transformation of the individual cardinal utility levels across the population. Concavity measures the government's aversion to inequality. It is easy to see that, given the cardinal choice of utility functions, the set of optimal allocations consistent with utilitarianism, when one varies the government's concave transformation, is a proper subset of the set of second best allocations: standard utilitarianism is restrictive.
But it is tempting to consider what we call an extended utilitarianism, where the concave transformation applied by the social decision maker depends on the tax payer's type: the government may have its own evaluation of laziness or handicap, which it may want to impose upon the cardinal measures. We argue, however, that extended utilitarianism is essentially non binding and generates all the second best allocations, for some suitable choice of the transformation function of the government. This preliminary discussion motivates our approach, which we apply to a theoretical example: we first look at the set of all second best allocations. Then we categorize the choices of cardinal utilities according to the shapes of the associated tax schedules under standard utilitarianism.

[1] Indeed Mirrlees (1976), in his Section 4, indicates, along a line that will be pursued further here, that the sign of the marginal tax rate cannot be predicted when the agents in the economy differ along several dimensions of heterogeneity.

[2] Note that in some versions of the optimal taxation model, e.g. Mirrlees (1971), the utility function is a function of hours worked and after tax income, independent of the person's productivity, which only plays a role through the budget constraint and the arbitrage between consumption and leisure.

We want to illustrate the use of negative marginal tax rates as a screening device in the simplest possible model. In particular, we do not want to deal with the technicalities implied by the multidimensional setup.[3] To this aim, we take a separable specification for the agents' tastes, without income effects as in Diamond (1998), much simpler than the standard Mirrlees specification: utility is linear in commodity and, for the participating agents, labor supply has a constant elasticity with respect to wages. The study follows in the steps of Sandmo (1993), but we allow for a general non-linear tax.
This specification makes it possible to subsume the two dimensions of heterogeneity (productivity and an opportunity cost of work) into a single one. A technical contribution of the paper is a full characterization of the set of second best allocations, including the ones that involve pooling, in line with the general analysis of Jullien (2000). Heterogeneity comes into play in the measurement of the agents' utilities, which increase with productivity and may either decrease or increase with the work opportunity cost. It is likely to decrease when the cost is associated with poor living conditions (i.e. a handicap); it increases when the cost reflects a taste for leisure or opportunities outside the legal market (such as gardening at home or black market activities).

We find that the Mirrlees result, of positive marginal tax rates, extends here whenever the distribution of opportunity costs is independent of that of productivities, whatever the impact of these costs on the agents' utilities. We give an example of optimal negative marginal rates in an economy where agents with low productivities exhibit a large spectrum of opportunity costs, and are better off, the larger their costs. The negative tax rate serves to screen out the agents with large costs, who anyway benefit from working at home or on the black market, in the spirit of the imperfect screening literature (e.g. Akerlof (1978) or Salanié (2002)).

To summarize, the result that marginal tax rates are non negative at the optimum, which holds under utilitarianism in the Mirrlees model, appears to be non robust to the presence of heterogeneity, apart from that affecting productivity, in the economy. Then upwards distortions in labor supply may be useful for screening purposes. This occurs here when the low income people are thought to be well off agents who shirk and have outside opportunities.[4]

[3] Beaudry and Blackorby (2004) have studied a model with several `true' dimensions of heterogeneity.
[4] In a companion paper, Choné and Laroque (2007), we give a set of natural conditions such that in the extensive model, under utilitarianism, the less skilled workers have their work subsidized at the optimum.

2 Second best optimality and utilitarianism

We first characterize the second best optima in the standard optimal taxation model, and discuss their relationship with the allocations obtained by a utilitarian planner.

Consider an economy with a continuum of agents of measure 1. An agent is described through her productivity $\omega$ and her opportunity cost of work $\alpha$, so that her type $\theta = (\alpha, \omega)$ is an element of $\mathbb{R}^2_+$. An economy is a collection of agents, represented with a probability measure $H$ with support $\Theta$.

An agent supplies labor $h$, to produce a quantity $y = \omega h$ of commodity, her before tax income. The government observes $y$, but not the agent's type nor her hours of work. It collects taxes and distributes benefits or transfers, based on $y$. An agent of before tax income $y$ has an after tax income $R(y)$, where $R$, which summarizes the government's actions, is a non linear function from $\mathbb{R}_+$ into $\mathbb{R}$.

The agent has a utility function $U(c, h; \theta)$, which depends on her consumption of good $c = R(\omega h)$, equal to her after tax income, and on her labor supply $h$. Faced with the tax schedule $R$, the agent chooses her labor supply $h$ through the program

$$u(R; \alpha, \omega) = \max_{h \ge 0} U[R(\omega h), h; \theta]. \tag{1}$$

Given a function $R$, an allocation $y_R$ is a function from $\Theta$ into $\mathbb{R}_+$ such that, for all $\theta$, $y_R(\theta) = \omega h$ for some $h$ that maximizes the program (1) of the agent of characteristics $\theta$. The allocation $y_R$, and the associated function $R$, are feasible when

$$\int_\Theta [y_R(\theta) - R(y_R(\theta))] \, dH(\theta) = 0.$$

2.1 Second best optimality

An allocation $y_{R^*}$ and the associated transfer function $R^*$ are second best optimal when there does not exist another feasible allocation which gives at least as much utility to everyone in the economy and strictly more to a subgroup of agents of positive measure.
Assuming that $U$ is strictly increasing with respect to its first argument, $R^*$ is second best optimal if and only if the program

$$\max_R \int_\Theta [y_R(\theta) - R(y_R(\theta))] \, dH(\theta) \quad \text{subject to} \quad u(R; \theta) \ge u(R^*; \theta) \text{ for all } \theta \text{ in } \Theta \tag{2}$$

has solution $R^*$ and value 0. Provided an appropriate differentiability structure is put on the set of functions $R$, a version of the standard necessary condition for optimality then holds (see proof in the Appendix):[5]

[5] We thank Bruno Jullien and Martin Hellwig for pointing out an error at this stage in earlier versions of the paper.

Lemma 1. Consider a second best optimum $R^*$ such that the maximand and the constraint in (2) are Fréchet differentiable with respect to $R$ at $R^*$. Then there exists a non negative measure $\Pi$ on $\Theta$, such that the function $R^*$ is a local extremum of the Lagrangian

$$L = \int_\Theta \left[ u(R; \theta) \, d\Pi(\theta) + (y_R(\theta) - R(y_R(\theta))) \, dH(\theta) \right]. \tag{3}$$

When the measure $\Pi$ is absolutely continuous with respect to the distribution of the agents' characteristics, for any measurable set $A$, one has

$$\Pi(A) = \int_A \pi(\theta) \, dH(\theta),$$

and $\pi(\theta)$ can be interpreted as the social weight of the agents of characteristics $\theta$. This is the situation on which we shall mainly focus in the paper, even though the results are typically valid for a general measure, possibly with discrete masses: they cover in particular the Rawlsian optimum, which corresponds to a unit mass on the agents with the lowest utility level.

The optimal taxation literature typically assumes that governments maximize a utilitarian criterion, that is the sum of a concave transformation of the utilities of the agents in the economy. Any optimal allocation for a utilitarian planner is of course second best optimal. But does a utilitarian criterion rule out some of the second best optima? It depends on the precise notion of utilitarianism which is used.

2.2 Standard utilitarianism

In the usual definition,[6] the agents are endowed with a cardinal utility function $U(c, h; \theta)$.
The utilitarian government maximizes the sum of $\Psi(U)$ over all the agents, where $\Psi$ is a non decreasing concave real function whose concavity is a measure of the aversion of society for inequality, and does not depend on the agents' types. A second best allocation is consistent with utilitarianism if there is a function $\Psi$ such that it maximizes the government program. Then the allocations consistent with utilitarianism are a proper subset of the second best optima. Indeed identifying the first order conditions associated with the two, utilitarian and second best, programs shows that a necessary condition for an allocation to be a solution of both programs is

$$\pi(\theta) = a \, \Psi'\!\left( U\!\left( R(y_R(\theta)), \frac{y_R(\theta)}{\omega}; \theta \right) \right), \quad \text{for some positive constant } a, \tag{4}$$

assuming enough regularity. This condition implies $\pi(\theta_1) > \pi(\theta_2)$ if and only if $u(R, \theta_1) < u(R, \theta_2)$, from which one derives the familiar property that the social weight is decreasing with income and productivity at a utilitarian optimum in the Mirrlees setup.

[6] In Mirrlees (1971), the utility function does not directly depend on productivity, contrary to Mirrlees (1976). This is a benign restriction there, since the indirect utility is increasing in productivity anyhow, and there are no strong reasons to revert this sense of variation. In a multidimensional setup, the normal case is when cardinal utility directly depends on the agent type. For instance when the value of leisure varies across the population, the utilities have to be a function of the work opportunity cost.

[Figure 1: Second best optimality and utilitarianism. Axes $U_1$ (horizontal) and $U_2$ (vertical); frontier through the points A, B, C, D, E.]

This is illustrated on the stylized Figure 1, which sketches a hypothetical economy with two types of agents in the plane of their utilities $(U_1, U_2)$. The second best frontier is the curve ABCDE.
The set of allocations consistent with utilitarianism is a part of the thick line, union of AB and CD, where B and D are the points where the frontier has slope $-1$: it must have a tangent of slope smaller than 1 in absolute value below the 45 degree line, and larger than 1 above the 45 degree line. Allocations consistent with utilitarianism are only a part of the thick line: the indifference hypersurfaces $\Psi(U_1) + \Psi(U_2) = \text{constant}$ are symmetric with respect to the 45 degree line, and this imposes further restrictions. A geometric representation of this constraint is as follows. For any point $P$, note $P'$ its symmetric with respect to the 45 degree line. Then a point $P$ on the Pareto frontier represents a utilitarian allocation if and only if the interior of the segment $[PP']$ does not intersect the set of feasible utilities.

2.3 Extended utilitarianism

Society may have different valuations of the types than the ones coded into the individual cardinal utilities. Formally the social criterion then takes the form $\Psi(u(R, \theta), \theta)$, where $\Psi$ is concave and non decreasing[7] in its first argument. Without further restrictions, it is easy to see that in these circumstances essentially any second best optimal allocation is consistent with utilitarianism.

The adverb `essentially' refers to boundary allocations that cannot be attained with a suitable function $\Psi$. Only the allocations associated with a multiplier $\Pi$ absolutely continuous with respect to the Lebesgue measure can be reached. Then any $\pi(\theta)$ can be made proportional to $\Psi'_u(u(R, \theta), \theta)$, since this quantity depends in an arbitrary way on its second argument.

The fact that the utilitarian criterion is a sum of the individual $\Psi(u; \theta)$, rather than any non linear non decreasing function, is not very restrictive. Consider the case of two types as in Figure 1.
To support a couple $(U_1^*, U_2^*)$, one has to find two non decreasing functions $\Psi_1$ and $\Psi_2$ such that the indifference curve $\Psi_1(U_1) + \Psi_2(U_2) = \Psi_1(U_1^*) + \Psi_2(U_2^*)$ in the plane $(U_1, U_2)$ intersects the set of feasible utilities $U$ at the unique point $(U_1^*, U_2^*)$. This is always possible on Figure 1. In dimension 2, the only case where the construction is impossible is at a point where the Pareto frontier is non differentiable, with a horizontal tangent from the right and a vertical tangent from the left.

[7] Monotonicity is required for consistency with private values. It may be worth emphasizing that we stick here to a purely welfarist viewpoint. We do not consider situations where the social objective includes moral considerations other than the effects of policies on individual utilities, as discussed in Sen (1982) and Kaplow and Shavell (2001).

The previous discussion leads us to structure the remainder of the paper as follows. We consider a simple model with two dimensions of heterogeneity, productivity and work opportunity cost. First, we characterize the set of second best optimal allocations. Then, adopting standard utilitarianism, we concentrate on relating shapes of cardinal utilities to subsets of second best allocations. Specifically, we seek to identify circumstances, i.e. shapes of cardinal utilities, leading to negative marginal tax rates. The benchmark is of course Mirrlees (1971), where a utility independent of productivity leads to positive marginal tax rates everywhere. Whether a large work opportunity cost is interpreted as laziness (low social weight) or handicap (high social weight) is crucial for the results.

3 Second best optima: an example

Faced with the function $R$, the agent chooses her labor supply $h$ by maximizing

$$\max_{h \ge 0} \; R(\omega h) - \alpha v(h). \tag{5}$$

The optimization of the agent is linear in commodity (labor supply does not depend on the income level: no income effect).
The disutility of labor is described with the function $v(h)$, which we specify as

$$v(h) = \frac{h^{1 + \frac{1}{e}}}{1 + \frac{1}{e}}.$$

The parameter $e$, $e \ge 0$, common to all the agents in the economy, is the elasticity of the labor supply of the participating agents with respect to wage. This specification is adopted for convenience, but is in line with a number of works in the literature. Atkinson (1990) uses it for empirical purposes. Diamond (1998) studies the shape of the optimal tax rates in a linear in consumption model, with a more general function $v$ than here. Boadway, Marchand, Pestieau, and del Mar Racionero (2002) also study similar issues in a quasi linear model with heterogeneous preferences for leisure, but in a discrete setup, with four different types of agents.

Note that by quasi-linearity of the utilities, the solution to program (2) where a constant $a$ is added to $R^*$ is $a + R^*$, with $y_{R^*}$ equal to $y_{a + R^*}$. Therefore

$$\int_\Theta d\Pi(\theta) = 1,$$

and $\Pi$ is a probability measure. Furthermore, when looking for all the second best allocations, one may ignore the feasibility condition, which only fixes the intercept of the function $R^*$.

In general when there are several dimensions of heterogeneity (productivity, disutility of labor) and the government has only one dimension of observation (income), a major difficulty is to identify the set of idiosyncratic shocks that are associated with a given level of income, which typically depends on the announced tax function. Here, the specification of the utility function and of the way shocks enter the model makes it possible to reduce the problem to a single dimension of heterogeneity from the start, independently of the function $R$. It also enables us to work directly in the function space of indirect utilities, applying vector space techniques and avoiding the Hamiltonian approach.

Proposition 1.

1. Consider a function $R : \mathbb{R}_+ \to \mathbb{R}$. Let

$$V(\beta) = \max_{y \ge 0} \left\{ R(y) - \beta \, \frac{y^{1 + \frac{1}{e}}}{1 + \frac{1}{e}} \right\}, \quad \text{where} \quad \beta = \frac{\alpha}{\omega^{1 + \frac{1}{e}}}.$$
$V$ is a convex nonincreasing function, which satisfies $V'(\beta) = -v(y_R(\beta))$ whenever it is differentiable, so that $R(y_R(\beta)) = V(\beta) - \beta V'(\beta)$.

2. Conversely, to any convex nonincreasing function $V$ corresponds a real function $\tilde{R} : \mathbb{R}_+ \to \mathbb{R}_+$ through

$$\tilde{R}(q) = \min_{\beta \ge 0} [V(\beta) + \beta q].$$

$\tilde{R}(\cdot)$ is concave non decreasing in its argument. If $V$ has been derived from a function $R$ as in 1., $\tilde{R}(\cdot)$ coincides with the function $R \circ v^{-1}$ when $R \circ v^{-1}$ is concave, which implies that $R$ itself is non decreasing.

From the point of view of the agents the only thing that matters is the level $V(\beta)$ of their utility, and Proposition 1 shows that without loss of generality we can consider any convex nonincreasing function. Also, without loss of generality, the government can restrict the $R$ functions to be non decreasing and such that $R \circ v^{-1}$ be concave. Throughout, we work with the indirect utility function $V$ rather than with the schedule $R$.

4 Optimal tax

The admissible functions $V$ belong to the convex cone of convex nonincreasing functions on $\mathbb{R}_+$, and the second best program (2) can be written

$$\max \int \left[ v^{-1}(-V') + \beta V' - V \right] dG(\beta) \tag{6}$$

subject to the constraints $V$ convex non increasing and $V \ge V^*$. Since the function $v^{-1}$ is concave, the maximand in (6) is concave in $(V, V')$ and under mild regularity conditions the program has a unique solution. Theorem 1 of Section 8.3 of Luenberger (1969) yields the existence of the Lagrange multiplier for the inequality constraint.[8] Thanks to the convexity of the problem, the first order conditions are sufficient for an optimum. Denoting $\Pi(\beta)$ the cumulative distribution function of the probability measure which represents the Lagrange multiplier, the Lagrangian can be written as

$$L = \int V \, d\Pi(\beta) + \int [y - (V - \beta V')] \, dG(\beta). \tag{7}$$

Conversely, take any probability measure $\Pi$. The primal problem

$$\max \int V \, d\Pi(\beta) \quad \text{subject to} \quad \int \left[ v^{-1}(-V') + \beta V' - V \right] dG(\beta) \ge 0,$$

[8] We apply Luenberger's result with $\Omega$ the convex cone of convex functions. The inequality constraint $V \ge V^*$
is obviously differentiable in $V$.

and $V$ convex non increasing, is well posed. Its Lagrangian is concave and coincides with (7). It has a unique solution for any measure $\Pi$. Our aim in this section is to relate the weights $\Pi$ to the corresponding optimal allocation and tax scheme.

An integration by parts in (7) (apply Lemma A.1 in Appendix with $F = V$, $f = V'$, $y = \Pi$, $y(\theta = 0)$ and $y(\theta) = 1$) yields

$$L = \int (y + \beta V') \, dG(\beta) + \int V' (G - \Pi) \, d\beta,$$

where $V' = -v(y)$ from Proposition 1. Note that it depends only on the derivative of the indirect utility (and not on its level $V$), i.e. on the allocation $y$. We have proven the following Lemma:

Lemma 2. An allocation $y : [\underline{\beta}, \bar{\beta}] \to \mathbb{R}_+$ is second best optimal if and only if there exists a nondecreasing function $\Pi : [\underline{\beta}, \bar{\beta}] \to [0, 1]$ such that $V'(\beta) = -v(y)$ is the solution to

$$\max L = \int \left( v^{-1}(-V') + \beta V' \right) dG(\beta) + \int V' (G - \Pi) \, d\beta,$$

on the set of nondecreasing and negative functions $V'$.

The set of second best optimal allocations is easy to describe when the distribution of heterogeneity is continuous, i.e.

Assumption 1 (Continuous distribution). The parameter $\beta$ is distributed in the economy with the c.d.f. $G$ of support $[\underline{\beta}, \bar{\beta}]$, $0 < \underline{\beta} < \bar{\beta} < \infty$. Furthermore $G$ has a continuous positive density $g$.

We have:

Proposition 2. Suppose that Assumption 1 holds. A non negative decreasing function $y(\beta)$ defined on $[\underline{\beta}, \bar{\beta}]$ is a second best allocation if and only if the function

$$\Pi(\beta) = \begin{cases} G(\beta) - g(\beta) \left[ \dfrac{1}{v'(y(\beta))} - \beta \right] & \text{for } \beta \text{ in } [\underline{\beta}, \bar{\beta}) \\[2mm] 1 & \text{for } \beta = \bar{\beta} \end{cases} \tag{8}$$

is non negative and non decreasing. Then both $y(\beta)$ and $\Pi(\beta)$ are continuous on $(\underline{\beta}, \bar{\beta})$. There is no distortion at the top when $\Pi$ is continuous at $\bar{\beta}$: $\bar{\beta} v'(y(\bar{\beta})) = 1$. There is no distortion at the bottom when $\Pi(\underline{\beta}) = 0$: $\underline{\beta} v'(y(\underline{\beta})) = 1$. The social weights $\pi(\beta)$ associated with this allocation are the (Stieltjes) derivative of $\Pi(\beta)$.

Proof: I) Necessity.
Since $y$ is decreasing, $V'$ is strictly negative and a necessary condition for optimality is that the pointwise derivative of the Lagrangian in Lemma 2 be equal to zero. This yields the condition of the Proposition.

Continuity is proved as follows. Since $y(\beta)$ is decreasing, any discontinuity has to be downwards. That creates a downwards discontinuity for $-1/v'(y)$ and therefore for $\Pi$, a contradiction with the fact that $\Pi$ is non decreasing. The no distortion properties are straightforward consequences of the first order condition.

II) Sufficiency. The measure $\Pi$ defined in the proposition is an adequate multiplier for the second best program. The function

$$V(\beta) = \int_\beta^{\bar{\beta}} v(y(x)) \, dx$$

is convex non increasing. It maximizes the Lagrangian of Lemma 2: its derivative is the pointwise maximum of the concave function of $V'$ whose integral is equal to the Lagrangian.

Where $R$ is differentiable, the program of the typical consumer yields the first order condition $R'(y) = \beta v'(y)$. The tax function $T$ corresponding to $R$ is $T(y) = y - R(y)$. The first order condition can be written, using the equality $R' = 1 - T'$,

$$\frac{T'(y)}{1 - T'(y)} = \frac{1}{\beta v'(y)} - 1.$$

Let $p(\beta)$ be the average value of the social weights of all the agents with idiosyncratic characteristics smaller than $\beta$:

$$p(\beta) = \frac{\Pi(\beta)}{G(\beta)} = \frac{1}{G(\beta)} \int_{\underline{\beta}}^{\beta} \pi(x) \, dG(x). \tag{9}$$

An immediate consequence of Proposition 2 is

Corollary 3. The function $p(\beta)$ corresponds to an increasing optimal allocation $y$ if and only if

1. the function $pG$ is nondecreasing and takes its values in $[0, 1]$;
2. the function $\beta + \frac{G}{g}(1 - p)$ is increasing (and therefore always positive for $\beta \ge \underline{\beta} \ge 0$).

The (typically large) set of allowed functions $p$ therefore depends on the distribution $G$ of characteristics. The first condition just restates that $\Pi(\beta) = p(\beta) G(\beta)$ is the cdf of a probability distribution. The second condition characterizes the set of weights for which there is no pooling at the optimum, so that $y$ is strictly increasing.
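The two conditions of Corollary 3 can be checked numerically for a candidate weight function. The sketch below is only an illustration: the function name `check_corollary3`, the uniform distribution on [1, 2] and the two candidate functions p are assumptions chosen for the example, not objects taken from the paper, and a grid check is of course no substitute for a proof.

```python
import numpy as np

def check_corollary3(p, G, g, beta):
    """Check the two conditions of Corollary 3 on a grid of beta values.

    p, G, g are callables (average weight, c.d.f., density) acting on arrays.
    Returns (condition_1, condition_2) as booleans.
    """
    pG = p(beta) * G(beta)
    # Condition 1: pG is nondecreasing with values in [0, 1].
    cond1 = bool(np.all(np.diff(pG) >= 0) and pG.min() >= 0 and pG.max() <= 1)
    # Condition 2: beta + (G / g) * (1 - p) is increasing.
    h = beta + G(beta) / g(beta) * (1.0 - p(beta))
    cond2 = bool(np.all(np.diff(h) > 0))
    return cond1, cond2

# Hypothetical economy: beta uniformly distributed on [1, 2].
lo, hi = 1.0, 2.0
beta = np.linspace(lo, hi, 2001)
G = lambda b: (b - lo) / (hi - lo)
g = lambda b: np.full_like(b, 1.0 / (hi - lo))

# Candidate 1: p = G, i.e. Pi = G**2 and pi = 2 G (weights never above 2).
no_pooling = check_corollary3(lambda b: G(b), G, g, beta)
# Candidate 2: p = G**2, i.e. Pi = G**3 and pi = 3 G**2 (weights above 2 near the upper end).
pooling = check_corollary3(lambda b: G(b) ** 2, G, g, beta)
```

Under these assumptions the first candidate passes both conditions, while the second satisfies condition 1 but fails condition 2, so that the associated optimum would involve pooling.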
Remark 4.2 below makes explicit the relationship between the weights and the allocations in case of pooling.

Using Proposition 2, we get an expression of the optimal tax rate as a function of the distribution of the heterogeneity in the population and of the social weights:

$$\frac{T'(y(\beta))}{1 - T'(y(\beta))} = [1 - p(\beta)] \, \frac{G(\beta)}{\beta g(\beta)}. \tag{10}$$

Under Assumption 1, $G/g$ is well defined and positive for all $\beta$ larger than $\underline{\beta}$, and the marginal tax rate has the same sign as $(1 - p(\beta))$.

Remark 4.1. There is no simple description, in economic terms, of the set of Pareto optimal tax schedules: they are the images of the set of allowed functions $p$ under the above transformation. In the absence of pooling, the Laffer tax schedule $T_L$ has a simple equation, since it obtains when all the weights are zero, but for a mass point at $\bar{\beta}$:

$$\frac{T_L'(y_L(\beta))}{1 - T_L'(y_L(\beta))} = \frac{G(\beta)}{\beta g(\beta)}.$$

Note that any optimal tax schedule imposes a smaller marginal tax rate than the Laffer rate to any tax payer $\beta$:

$$\frac{T'(y(\beta))}{1 - T'(y(\beta))} \le \frac{T_L'(y_L(\beta))}{1 - T_L'(y_L(\beta))}.$$

Remark 4.2. Here is a general version of Proposition 2, with proof in the Appendix, which allows for pooling (i.e. $y$ may be constant on some interval). In what follows, a pooling interval is a maximal interval where $y$ is constant.

Proposition 4. Suppose that Assumption 1 holds. A nonnegative nonincreasing function $y(\beta)$ defined on $\Gamma$ is a second best allocation if and only if there exists a nonnegative and nondecreasing function $\Pi(\beta)$ with values in $[0, 1]$ such that

$$\int_{\underline{\beta}}^{\beta} \left\{ G(\tilde{\beta}) - g(\tilde{\beta}) \left[ \frac{1}{v'(y(\tilde{\beta}))} - \tilde{\beta} \right] \right\} d\tilde{\beta} \ge \int_{\underline{\beta}}^{\beta} \Pi(\tilde{\beta}) \, d\tilde{\beta} \tag{11}$$

for all $\beta$, and (11) is an equality at any $\beta$ where $y$ is decreasing and for $\beta = \bar{\beta}$.

In an interval where $y$ is decreasing, (11) holds as an equality. Differentiating this equality yields equation (8). Proposition 2 has established the existence of a one-to-one relationship between distributions of social weights and second best allocations in the absence of pooling.
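To make formula (10) and the Laffer bound of Remark 4.1 concrete, here is a minimal sketch that evaluates both at one point. The helper names `tax_wedge` and `marginal_rate`, the uniform distribution on [1, 2] and the choice p = G are assumptions made for illustration only.

```python
def tax_wedge(beta, p, G, g):
    """Right-hand side of (10): T'/(1 - T') = (1 - p(beta)) * G(beta) / (beta * g(beta))."""
    return (1.0 - p(beta)) * G(beta) / (beta * g(beta))

def marginal_rate(wedge):
    """Invert T'/(1 - T') = wedge, which gives T' = wedge / (1 + wedge)."""
    return wedge / (1.0 + wedge)

# Hypothetical economy: beta uniformly distributed on [1, 2].
lo, hi = 1.0, 2.0
G = lambda b: (b - lo) / (hi - lo)
g = lambda b: 1.0 / (hi - lo)

# Hypothetical average weights p = G (so that Pi = G**2 and pi = 2 G).
p_example = lambda b: G(b)
# Laffer benchmark: all the weight sits on a mass point at the upper end, so p = 0 below it.
p_laffer = lambda b: 0.0

b0 = 1.5
wedge = tax_wedge(b0, p_example, G, g)
wedge_laffer = tax_wedge(b0, p_laffer, G, g)
rate = marginal_rate(wedge)
rate_laffer = marginal_rate(wedge_laffer)
```

With these assumed inputs the marginal rate at β = 1.5 is 1/7 under the example weights and 1/4 on the Laffer schedule, in line with the inequality of Remark 4.1.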
This property does not hold any more when we allow for the possibility of pooling (case of Proposition 4): different distributions may, in general, give rise to the same second best allocation. It turns out that pooling intervals can be generated by mass points in the distribution of social weights, but, in general, they can also be generated by (many) smooth distributions of weights. In the appendix, we explain geometrically how to construct the (set of) cumulative distribution functions $\Pi$ associated with a given optimal allocation $y$.

5 Utilitarianism and marginal tax rates

Propositions 2 and 4 describe fully the set of second best allocations in the economy. In any optimal allocation, all the agents of type $\theta = (\alpha, \omega)$ with the same $\beta = \alpha / \omega^{1 + 1/e}$ are treated equally. From (10), the marginal tax rate supported by an agent of type $\beta_0$ is directly related to the average social weight of the agents of type $\beta$ smaller than $\beta_0$. To describe the utilitarian tax rates, we have to relate the values of these social weights to the social objective.

As discussed in Section 2, standard utilitarianism first requires specifying cardinal utilities. To fix ideas, we posit

$$u(R; \alpha, \omega) = K\!\left[ V\!\left( \frac{\alpha}{\omega^{1 + \frac{1}{e}}} \right), \alpha \right],$$

and we assume

Assumption 2. The function $K$ is non decreasing and concave in its first argument.

Both productivity and the work opportunity cost enter the first argument of cardinal utility, in a way similar to what productivity alone does in the Mirrlees model (where work opportunity cost is constant across the population). The second argument allows for a separate effect of the work opportunity cost. As always in utilitarianism, it plays a role at the margin, through its impact on the marginal social value of a change in utility $K'_V[V, \alpha]$. When $K'_V$ is increasing in its second argument, large $\alpha$'s, holding $\beta$ constant, go with large social weights: a large opportunity cost is associated with a handicap that may deserve some social compensation.
When $K'_V$ is decreasing in its second argument, a large opportunity cost of work reduces the social weights, perhaps because non market time allows lucrative activities on the black market or enjoyable leisure.

Standard utilitarianism is associated with a concave increasing transformation $\Psi$ of the cardinal utilities leading to a social welfare function

$$\int \Psi\!\left( K\!\left[ V\!\left( \frac{\alpha}{\omega^{1 + \frac{1}{e}}} \right), \alpha \right] \right) dH(\theta).$$

Taking as unknown the function $V$ as in Section 3, noting $\lambda$ the multiplier of the government budget constraint, the associated Lagrangian of the primal program is

$$L_p = \int \left\{ \int \Psi(K[V, \alpha]) \, dF(\alpha | \beta) + \lambda [y - (V - \beta V')] \right\} dG(\beta), \tag{12}$$

where $F(\alpha | \beta)$ is the distribution of $\alpha$ conditional on the parameter $\beta$. Taking the derivative of the first term, the social welfare function, with respect to $V$, it follows that a necessary and sufficient condition for a second best allocation to be consistent with standard utilitarianism is that the social weight $\pi(\beta)$, underlying equations (9) and (10), be given by

$$\pi(\beta) = \int \tilde{\pi}(\beta, \alpha) \, dF(\alpha | \beta) \quad \text{and} \quad \tilde{\pi}(\beta, \alpha) = \frac{\Psi' K'_V[V(\beta), \alpha]}{\int \Psi' K'_V[V(\beta), \alpha] \, dH(\theta)},$$

with

$$\Pi(\beta) = \int_{\underline{\beta}}^{\beta} \pi(x) \, dG(x).$$

Given the cardinal utilities $K$, standard utilitarianism is associated with all the weights $\tilde{\pi}(\beta, \alpha)$ that verify the above equalities for some increasing concave transformation $\Psi$. It is important to note that, under Assumption 2 and utilitarianism, $\tilde{\pi}(\beta, \alpha)$ is increasing in $\beta$.

5.1 Ruling out negative marginal tax rates

Consider the standard Mirrlees case where $\alpha$ is constant across the population, and $\omega$ has a continuous distribution on $[\underline{\omega}, \bar{\omega}]$. Then

$$\underline{\beta} = \frac{\alpha}{\bar{\omega}^{1 + \frac{1}{e}}}, \qquad \bar{\beta} = \frac{\alpha}{\underline{\omega}^{1 + \frac{1}{e}}},$$

and productivity, as well as cardinal utility, decreases with $\beta$. Then, as we have just seen, social weights increase with $\beta$, and from (9), $p(\beta)$ as well. Since $p(\bar{\beta}) = 1$, $p(\beta) < 1$ for all $\beta < \bar{\beta}$, and (10) gives the standard result: the marginal tax rate is always positive, and it is equal to zero for the richest, at $\underline{\beta}$.[9]

The situation changes when there are several dimensions of heterogeneity.
Nevertheless there are a variety of situations where tax rates are non negative:

[9] The fact that the marginal tax rate is zero for the low skilled, in $\bar{\beta}$, is a consequence of the shape of the function $v$, which satisfies $v'(0) = 0$.

[Figure 2: Social weights and negative marginal tax rates. Curves $\pi(\beta)$ and $p(\beta)$ plotted against $\beta$, with $\beta_m$ and $\bar{\beta}$ marked on the horizontal axis, the level 1 on the vertical axis, and the points A and B.]

Proposition 5. Assume that $K'_V[V, \alpha]$ is non decreasing (resp. decreasing) in $\alpha$ and that the distribution of $\alpha$, conditional on $\beta$, is first order stochastically non decreasing (resp. decreasing) in $\beta$. Then under standard utilitarianism, the weights $\pi(\beta)$ are increasing and marginal tax rates are everywhere non negative.

Proof: Let

$$f(a, b) = \int \tilde{\pi}(a, \alpha) \, dF(\alpha | b).$$

$f$ is increasing in $a$, since $\tilde{\pi}$, proportional to $\Psi' K'_V[V(a), \alpha]$, is. It is increasing in $b$ by first order stochastic dominance. It follows that $\pi(\beta) = f(\beta, \beta)$ is also increasing in its argument.

Since $\alpha = \beta \omega^{1 + 1/e}$, a plausible situation is when the conditional distribution of $\alpha$ given $\beta$, $F(\alpha | \beta)$, is first order stochastically increasing in $\beta$. Then if $K'_V$ is increasing (or non decreasing) in $\alpha$, i.e. larger opportunity costs increase the value of a marginal change in utility, holding $\beta$ constant, possibly due to a handicap, the optimal marginal tax rates are non negative.

5.2 When are negative marginal tax rates optimal for low skilled workers?

There are nevertheless a number of circumstances where negative marginal tax rates on low incomes are optimal. First a theoretical remark is useful. To see negative marginal tax rates bearing on the lowest incomes, i.e. in a neighborhood of $\bar{\beta}$, from (10), in the absence of pooling it is necessary and sufficient that $p(\beta)$ be larger than 1 in this neighborhood. Since by construction $p(\bar{\beta}) = 1$, assuming differentiability, this amounts to

$$p'(\beta) < 0$$

in a neighbourhood of $\bar{\beta}$. Since $p'(\beta) = (\pi(\beta) - p(\beta)) g(\beta) / G(\beta)$, this only occurs if $\pi$, the social weight of the agents, is smaller than 1, the average social weight in the economy, in this neighbourhood.[10]
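For completeness, the expression for $p'$ used in the last step follows from differentiating definition (9). The short derivation below is a sketch that assumes $\pi$ is continuous near $\bar{\beta}$, so that $\Pi' = \pi g$ there.

```latex
% Start from (9): p(\beta) G(\beta) = \Pi(\beta) = \int_{\underline{\beta}}^{\beta} \pi(x) g(x) \, dx.
\begin{align*}
p'(\beta) G(\beta) + p(\beta) g(\beta) &= \pi(\beta) g(\beta), \\
p'(\beta) &= \bigl(\pi(\beta) - p(\beta)\bigr) \frac{g(\beta)}{G(\beta)}, \\
p'(\bar{\beta}) &= \bigl(\pi(\bar{\beta}) - 1\bigr) g(\bar{\beta})
  \qquad \text{since } p(\bar{\beta}) = G(\bar{\beta}) = 1 .
\end{align*}
```

Since $g(\bar{\beta}) > 0$ under Assumption 1, $p'(\bar{\beta}) < 0$ exactly when $\pi(\bar{\beta}) < 1$.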
This property has more generality:

Proposition 6. A necessary condition for the lowest income agents to face negative marginal tax rates at the optimum is that their social weight $\pi(\bar{\beta})$ be smaller than the average social weight in the economy. This condition is sufficient when there is no pooling at $\bar{\beta}$. When the functions $\tilde{\pi}(\beta, \alpha)$, $F(\alpha|\beta)$ and $G(\beta)$ are twice continuously differentiable with respect to their arguments, one has
\[
\pi(\bar{\beta}) - 1 = \iint \left[ \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \alpha) \frac{\partial F}{\partial \alpha}(\alpha|\beta) - \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) \frac{\partial F}{\partial \beta}(\alpha|\beta) \right] G(\beta)\, d\alpha\, d\beta. \tag{13}
\]

Remark 5.1. A feature of the intensive model, in line with the above property, is that the agents facing negative marginal tax rates are worse off than in the absence of such negative rates. Indeed they are induced to work more than otherwise, which reduces at the margin their utility levels. This is in stark contrast with what happens in the extensive model, where the extra work goes with an increase in the welfare of the concerned agents (see Choné and Laroque (2007)).

We first provide an illustrative example of the previous proposition. The economy is described as follows. At the lowest wage rate $\underline{\omega}$, there are a variety of $\alpha$'s, with a continuous distribution on $[\underline{\alpha}, \bar{\alpha}]$. For all the wage rates above the minimum, which have a continuous distribution on $(\underline{\omega}, \bar{\omega}]$, there is a unique value of $\alpha$, equal to $\underline{\alpha}$. In terms of $\beta$'s, we have:
\[
\underline{\beta} = \frac{\underline{\alpha}}{\bar{\omega}^{1+\frac{1}{e}}}, \qquad
\beta_m = \frac{\underline{\alpha}}{\underline{\omega}^{1+\frac{1}{e}}}, \qquad
\bar{\beta} = \frac{\bar{\alpha}}{\underline{\omega}^{1+\frac{1}{e}}}.
\]
The agent $\underline{\beta}$ is the most productive, with the smallest opportunity cost of work. All the agents of the segment $[\underline{\beta}, \beta_m]$ differ only by their productivities. All the agents in $[\beta_m, \bar{\beta}]$ have the same low productivity $\underline{\omega}$, but have different, increasing, opportunity costs [11].

[10] This has already been noted by Saez (2002), page 1054: negative marginal tax rates at the bottom of the wage distribution can only occur if the social weight of the concerned agent is smaller than the average social weight.
Figure 2 represents a possible profile of $\pi(\beta)$, when the social weights are decreasing in $\alpha$: here large $\alpha$'s stand for laziness, or unlawful activities on the black market. Following standard utilitarianism, $\pi$ is increasing on $[\underline{\beta}, \beta_m]$; it is supposed to decrease further on, the home production effect more than compensating the mechanical increase in $\beta$ as $\alpha$ rises. The agent with the largest social weight is the $\beta_m$ person, with lowest productivity and opportunity cost of work. The associated function $p(\beta)$, which measures the average height of $\pi(x)$ for $x$ smaller than $\beta$, is also represented: $p(\beta)$ increases whenever it lies under the graph of $\pi$, decreases when it is above the graph, and has a horizontal tangent when it crosses the $\pi$ curve. Also, we know that $p(\bar{\beta}) = 1$. In the case where the distribution of $\beta$ is uniform, differentiating (8) shows that the optimal allocation is without pooling if and only if $\pi(\beta)$ is everywhere smaller than 2. From (10), in the situation depicted on Figure 2, all the agents in the segment AB then face negative tax rates.

The previous example has a degenerate distribution of $\alpha$ conditional on $\beta$. Proposition 6 gives an expression for the weight $\pi(\bar{\beta})$ in the situation where the distributions are smooth. There are two terms in the formula:

1. The first term is positive, from the standard utilitarianism motive of aversion to income inequality ($\partial \tilde{\pi}/\partial \beta \geq 0$). It is only equal to zero when $K'_V$ is independent of $V$, the cardinal utility is linear in $V$, and there is no desire for redistribution across the $\beta$ characteristic.

2. The second term, $-\partial \tilde{\pi}/\partial \alpha \; \partial F/\partial \beta$, cannot be signed in general. It is equal to zero in a number of cases, for instance if the two parameters are independently distributed ($\partial F/\partial \beta = 0$), or if the social weight $\tilde{\pi}$ does not depend on $\alpha$. Then the marginal tax rate is positive at the bottom of the income distribution.

When $\alpha$ first order stochastically increases in $\beta$ ($\partial F/\partial \beta \leq 0$) and $\tilde{\pi}$ is non decreasing in $\alpha$, then the second term is positive, which yields the analog to Proposition 5 at the point $\bar{\beta}$.

In practice, for negative tax rates to be optimal, the second term must be negative and larger in absolute value than the first one. A special case where this is easier to achieve is when society has no aversion to income inequality, so that the only redistribution motive is linked to the $\alpha$ parameter, i.e. $K[V, \alpha] = K(\alpha) V$. Then $\tilde{\pi}(\beta, \alpha) = K(\alpha)$, the first term vanishes and we get
\[
\begin{aligned}
\pi(\bar{\beta}) - 1 &= - \iint \frac{\partial \tilde{\pi}}{\partial \alpha} \frac{\partial F}{\partial \beta}\, G(\beta)\, d\alpha\, d\beta \\
&= - \int_{\alpha} K'(\alpha) \left[ \int_{\beta} \frac{\partial F}{\partial \beta}\, G(\beta)\, d\beta \right] d\alpha \\
&= \int_{\alpha} K'(\alpha) \left[ \Phi(\alpha) - F(\alpha|\bar{\beta}) \right] d\alpha \\
&= \int_{\alpha} K(\alpha) \left[ f(\alpha|\bar{\beta}) - \phi(\alpha) \right] d\alpha,
\end{aligned}
\]
where $\Phi$ and $\phi$ denote respectively the cdf and the pdf of the marginal distribution of the parameter $\alpha$ in the economy. The third line follows from an integration by parts in $\beta$, using $\int_{\beta} F(\alpha|\beta)\, g(\beta)\, d\beta = \Phi(\alpha)$. The social weight is smaller than 1 when the agents with low values of $K(\alpha)$ are more numerous in the subpopulation of lowest income ($\beta = \bar{\beta}$) than in the economy as a whole.

Negative tax rates are therefore useful to redistribute away from socially undeserving members of the society with characteristics $(\beta, \alpha)$ when they are relatively more numerous at $\bar{\beta}$ than in society as a whole. This force works against the traditional Mirrlees redistributive motive, which must not be too large for negative rates to be optimal.

[11] Technically, given $\beta$, the distribution of $\alpha$ is degenerate: for $\beta \leq \beta_m$, $\alpha$ is equal to $\underline{\alpha}$, while for $\beta \geq \beta_m$, $\alpha$ is equal to $\beta \underline{\omega}^{1+\frac{1}{e}}$. In the latter interval, $\pi(\beta) = \tilde{\pi}(\beta, \alpha(\beta))$ can take any shape, depending on the specification of the cardinal utility.

References

Akerlof, G. A. (1978): The Economics of "Tagging" as Applied to the Optimal Income Tax, Welfare Programs, and Manpower Planning, American Economic Review, 68(1), 8–19.
Atkinson, A. (1990): Public economics and the economic public, European Economic Review, 34, 225–248.
Beaudry, P., and C.
Blackorby (2004): Taxes and Employment Subsidies in Optimal Redistribution Programs, Discussion paper, University of British Columbia.
Boadway, R., M. Marchand, P. Pestieau, and M. del Mar Racionero (2002): Optimal redistribution with heterogeneous preferences for leisure, Journal of Public Economic Theory, 4(4), 475–498.
Choné, P., and G. Laroque (2007): Should labor force participation be subsidized, Discussion paper, INSEE-CREST.
Diamond, P. (1998): Optimal Income Taxation: An Example with a U-Shaped Pattern of Optimal Marginal Tax Rates, American Economic Review, 88, 83–95.
Hellwig, M. F. (2005): A Contribution to the Theory of Optimal Utilitarian Income Taxation, Discussion paper, Max Planck Institute for Research on Collective Goods.
Jullien, B. (2000): Participation Constraints in Adverse Selection Models, Journal of Economic Theory, 93, 1–47.
Kaplow, L., and S. Shavell (2001): Any Non-welfarist Method of Policy Assessment Violates the Pareto Principle, Journal of Political Economy, 109(2), 281–286.
Luenberger, D. (1969): Optimization by vector space methods. Wiley, New York.
Mirrlees, J. (1971): An Exploration in the Theory of Optimum Income Taxation, Review of Economic Studies, 38, 175–208.
Mirrlees, J. (1976): Optimal Tax Theory: A Synthesis, Journal of Public Economics, 6, 327–358.
Rockafellar, T. (1970): Convex analysis. Princeton Univ. Press, Princeton.
Saez, E. (2002): Optimal Income Transfer Programs: Intensive versus Extensive Labor Supply Responses, Quarterly Journal of Economics, 117, 1039–1073.
Salanié, B. (2002): Optimal Demogrants with Imperfect tagging, Economics Letters, 75, 319–324.
Sandmo, A. (1993): Optimal Redistribution When Tastes Differ, Finanzarchiv, 50(2), 149–163.
Seade, J. (1977): On the Shape of Optimal Tax Schedules, Journal of Public Economics, 7, 203–236.
Seade, J. (1982): On the Sign of the Optimum Marginal Income Tax, Review of Economic Studies, 49, 637–643.
Sen, A. K. (1982): Equality of What?, in Choice, Welfare and Measurement, ed. by A. K.
Sen, pp. 353–369. Cambridge University Press.
Werning, I. (2000): An Elementary Proof of Positive Optimal Marginal Taxes, Discussion paper.

A Appendix

Proof of Lemma 1

The inequality constraints can be written $G(R) \leq 0$, where the map $G$ is given by $G(R) = u(R^\star; \theta) - u(R; \theta)$. The Lemma assumes that there exist functional spaces $X$ and $Z$ such that $G : X \to Z$ is Fréchet differentiable.

Giving one additional dollar to every agent in the economy makes all the inequalities slack, so that $G(R^\star + 1) = u(R^\star; \theta) - u(R^\star + 1; \theta) < 0$. It follows that $R^\star$ is a regular point of the inequality constraint $G(R) \leq 0$. Theorem 1 of Section 9.4 of Luenberger (1969) yields the existence of a Lagrange multiplier for the inequality constraints, which is denoted $z_0^\star$ in Luenberger's book and $\Pi$ in the present paper [12]. The multiplier belongs to the positive cone of $Z^\star$, the dual of $Z$.

The space $Z$ shall be chosen according to the specifics of each case under consideration. It will depend in particular on the utility function $u$. (For instance, $Z$ can be the space $L^2(\mathbb{R}_+)$ for a particular measure on $\mathbb{R}_+$, see examples below.) In any case, $Z^\star$ is included in a space of distributions, whose positive cone is made of nonnegative measures. Therefore the multiplier $\Pi$ is a nonnegative measure.

Proof of Proposition 1

Part 1 of the Proposition follows from the envelope theorem and from the fact that a maximum of affine functions is convex. Turning to part 2, we write
\[
V(\beta) = \max_{q \geq 0} \left\{ R(v^{-1}(q)) - \beta q \right\} = \max_{q \geq 0} \left\{ -\beta q - S(q) \right\}, \tag{14}
\]
where $S(q) = -R(v^{-1}(q))$. Note that the function $V$ can be extended on the real line through $V(\beta) = +\infty$ for $\beta < 0$. Equation (14) expresses the fact that $V(-\beta)$ is the Fenchel-Legendre transform of $S(q)$. As shown in Rockafellar (1970), applying this transform twice yields the original function:
\[
S(q) = \max_{\beta} \left\{ \beta q - V(-\beta) \right\} = \max_{\beta} \left\{ -\beta q - V(\beta) \right\} = - \min_{\beta} \left\{ \beta q + V(\beta) \right\}
\]
or
\[
R(v^{-1}(q)) = \min_{\beta} \left\{ \beta q + V(\beta) \right\}.
\]
The minimum can be taken on $\beta \geq 0$ only, since $V(\beta) = +\infty$ for $\beta < 0$, which completes the proof of Proposition 1.
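The duality used in the proof of Proposition 1 can be checked numerically on a simple case. The sketch below is only an illustration under an assumed functional form: it takes $R(v^{-1}(q)) = 2\sqrt{q}$ (a hypothetical choice, not taken from the paper), for which $V(\beta) = 1/\beta$ in closed form, computes $V$ by a grid maximisation as in (14), and then recovers $R(v^{-1}(q))$ as $\min_\beta \{\beta q + V(\beta)\}$. The grids and the names `rho`, `V_TABLE` and `recovered` are assumptions made for the example.

```python
import math

# Hypothetical after-tax income expressed as a function of q = v(y).
# Assumption for illustration only: rho(q) = R(v^{-1}(q)) = 2*sqrt(q).
def rho(q):
    return 2.0 * math.sqrt(q)

# Grids for q >= 0 and beta > 0 (step 0.01 on both).
Q_GRID = [0.01 * j for j in range(2501)]          # q in [0, 25]
BETA_GRID = [0.2 + 0.01 * i for i in range(481)]  # beta in [0.2, 5]
RHO_GRID = [rho(q) for q in Q_GRID]

# Equation (14): V(beta) = max_{q >= 0} rho(q) - beta * q, on the grid.
V_TABLE = [
    max(r - beta * q for q, r in zip(Q_GRID, RHO_GRID))
    for beta in BETA_GRID
]

def recovered(q):
    """Second transform: min over beta >= 0 of beta * q + V(beta)."""
    return min(beta * q + v for beta, v in zip(BETA_GRID, V_TABLE))

if __name__ == "__main__":
    for q in (0.25, 1.0, 4.0):
        print(q, rho(q), recovered(q))
```

For values of $q$ whose minimising $\beta = 1/\sqrt{q}$ lies well inside the grid, the recovered value agrees with $2\sqrt{q}$ up to the discretisation error of the two grids.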
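[12] Actually, Luenberger's result only requires that $G$ is Gateaux differentiable and that the Gateaux differentials are linear in their increments.

Lemma A.1. Let $f$ be in $L^1(\underline{\theta}, \bar{\theta})$ and $F$ be given by $F(\theta) = \int_{\underline{\theta}}^{\theta} f(t)\, dt$. Let $y$ be a nondecreasing and bounded function on $[\underline{\theta}, \bar{\theta}]$. Then the following integration by parts formula holds
\[
\int_{\underline{\theta}}^{\bar{\theta}} f(\theta) y(\theta)\, d\theta = F(\bar{\theta}) y(\bar{\theta}) - F(\underline{\theta}) y(\underline{\theta}) - \int_{\underline{\theta}}^{\bar{\theta}} F\, dy, \tag{15}
\]
where $\int_{\underline{\theta}}^{\bar{\theta}} F\, dy$ is defined as a Riemann-Stieltjes integral, that is, as the limit of
\[
S = \sum_{i=0}^{n} F(t_i) \left[ y(\theta_{i+1}) - y(\theta_i) \right]
\]
for any mesh $(\theta_0 = \underline{\theta}, \theta_1, \ldots, \theta_n, \theta_{n+1} = \bar{\theta})$ and any $t_i \in (\theta_i, \theta_{i+1})$, when the mesh size $\max_i |\theta_{i+1} - \theta_i|$ tends to zero.

Proof of Lemma A.1

First note that the left hand side of Eq. (15) is well defined since the function $f y$ is Lebesgue integrable. Note also that the function $F$ is continuous and almost everywhere differentiable with $F' = f$ a.e. A simple computation shows that
\[
\begin{aligned}
S &= -F(t_0) y(\underline{\theta}) - y(\theta_1)[F(t_1) - F(t_0)] - \ldots - y(\theta_n)[F(t_n) - F(t_{n-1})] + F(t_n) y(\bar{\theta}) \\
&= -F(t_0) y(\underline{\theta}) + F(t_n) y(\bar{\theta}) - \sum_{i=1}^{n} y(\theta_i) \int_{t_{i-1}}^{t_i} f(t)\, dt.
\end{aligned}
\]
By the Lebesgue Theorem, the last sum tends to $\int_{\underline{\theta}}^{\bar{\theta}} f(\theta) y(\theta)\, d\theta$ when the mesh size tends to zero, which (together with the continuity of $F$) gives (15).

Proof of Proposition 4

Suppose first that $y$ is second best optimal. The derivative of the Lagrangian is
\[
\langle dL, H \rangle = \int \left[ -\frac{1}{v'(y)} + \beta \right] \dot{H}\, dG(\beta) + \int \dot{H} (G - \Pi)\, d\beta.
\]
Since the problem is concave, a function $V$ is the solution if and only if $\langle dL, H \rangle \leq 0$ for all admissible variations $H$ (i.e., for all functions $H$ such that $\dot{V} + \varepsilon \dot{H}$ is negative and non decreasing for $\varepsilon$ small enough).

When $y$ is strictly decreasing, $\langle dL, H \rangle$ must be zero for all $H$ (since, in that case, $\dot{V}$ and $\dot{V} + \varepsilon \dot{H}$ are increasing for small $\varepsilon$). It follows that we have in the no pooling region
\[
\Pi(\beta) = G(\beta) - g(\beta) \left[ \frac{1}{v'(y)} - \beta \right].
\]
In a pooling interval $[\underline{\beta}_i, \bar{\beta}_i]$, the functions $y$ and $\dot{V}$ are constant and any $H$ such that $\dot{H}$ is decreasing is not an admissible test function (since $\dot{V} + \varepsilon \dot{H}$ is decreasing in $[\underline{\beta}_i, \bar{\beta}_i]$). It is easy to check that if $H$ satisfies
\[
\dot{H} = \begin{cases} 1 & \text{in } [\underline{\beta}_i, \bar{\beta}_i] \\ 0 & \text{otherwise,} \end{cases} \tag{16}
\]
then $H$ and $-H$ are admissible variations, so we must have $\langle dL, H \rangle = 0$. It follows that
\[
\int_{\underline{\beta}_i}^{\bar{\beta}_i} \left\{ G(\tilde{\beta}) - g(\tilde{\beta}) \left[ \frac{1}{v'(y)} - \tilde{\beta} \right] \right\} d\tilde{\beta} = \int_{\underline{\beta}_i}^{\bar{\beta}_i} \Pi(\tilde{\beta})\, d\tilde{\beta}. \tag{17}
\]
Now if $H$ satisfies
\[
\dot{H}(\tilde{\beta}) = \begin{cases} -1 & \text{for } \tilde{\beta} < \beta \text{ in } [\underline{\beta}_i, \bar{\beta}_i] \\ 0 & \text{for } \tilde{\beta} > \beta \text{ in } [\underline{\beta}_i, \bar{\beta}_i] \end{cases} \tag{18}
\]
for some $\beta \in [\underline{\beta}_i, \bar{\beta}_i]$, then $H$ is admissible (but $-H$ is not) and we must have $\langle dL, H \rangle \leq 0$. It follows that
\[
\int_{\underline{\beta}_i}^{\beta} \left\{ G - g \left[ \frac{1}{v'(y_i)} - \tilde{\beta} \right] \right\} d\tilde{\beta} \geq \int_{\underline{\beta}_i}^{\beta} \Pi(\tilde{\beta})\, d\tilde{\beta}. \tag{19}
\]
The conditions (17) and (19) are equivalent to the necessity part of the proposition. The sufficient part follows from the fact that conditions (17) and (19) are equivalent to $\langle dL, H \rangle \leq 0$ for all admissible variations $H$, since the set of nonincreasing functions $\dot{H}$ on $[\underline{\beta}_i, \bar{\beta}_i]$ is generated by the set of functions $H$ satisfying (16) and (18).

Construction of social weights associated to an optimal allocation

[Figure 3: Pooling in the intensive case. The figure plots $Y(\gamma)$, which is convex on either side of the pooling interval (where $y$ is constant), together with a convex function below $Y$ (and $Y^\star$).]

When there is no pooling (case of Proposition 2), there is a one-to-one relationship between the allocation $y$ and the distribution of social weights $\Pi$. This is not the case in general. In this subsection, we explain how to construct social weights for which a given allocation is optimal. To this aim, we first present a geometric interpretation of Proposition 4. First observe that the function $Z(\beta) = \int_{\underline{\beta}}^{\beta} \Pi(\tilde{\beta})\, d\tilde{\beta}$ is convex in $\beta$, its derivative $\Pi$ being between 0 and 1. Let $Y$ be defined by
\[
Y(\beta) = \int_{\underline{\beta}}^{\beta} \left\{ G(\tilde{\beta}) - g(\tilde{\beta}) \left[ \frac{1}{v'(y(\tilde{\beta}))} - \tilde{\beta} \right] \right\} d\tilde{\beta}.
\]
According to Proposition 4, for an allocation $y$ to be optimal, there must exist a convex function $Z$ with slope between 0 and 1 such that $Y \geq Z$, and $Y = Z$ when $y$ is strictly increasing. In other words, $Y$ must be convex outside pooling intervals and bounded below by a convex function with slope between 0 and 1 in such intervals.

The largest convex function below $Y$ is, by definition, its convex hull $Y^\star$. Then $y$ is second best optimal if and only if the slope of $Y^\star$ is in $[0, 1]$ and $Y = Y^\star$ outside the pooling intervals; the derivative of $Y^\star$ is the cumulative distribution function of a social weight distribution for which the allocation $y$ is optimal.

The distribution of social weights $\Pi$ is unique outside pooling intervals, but it is not unique in the pooling intervals. Indeed there might exist other convex functions $Z \leq Y^\star \leq Y$, with $0 \leq Z' \leq 1$ and $Y = Z$ whenever $y$ increases. The derivatives of such functions $Z$ also correspond to c.d.f. of social weight distributions for which the allocation $y$ is optimal. The lowest function $Z$, represented on Figure 3 by the dotted line, corresponds to the supremum of two segments; in the pooling interval, the support of the associated distribution $\Pi$ consists of one unique point.

Proof of Proposition 6

We first show that $\pi(\bar{\beta}) < 1$, i.e. $p(\beta) > 1$ for $\beta$ close to $\bar{\beta}$, is a necessary condition for negative marginal tax rates when there is pooling. Let $[\beta_0, \bar{\beta}]$ be the pooling interval. From Proposition 4, equation (11), it follows that
\[
\int_{\beta}^{\bar{\beta}} G(\tilde{\beta}) \left[ 1 - p(\tilde{\beta}) \right] d\tilde{\beta} \leq \int_{\beta}^{\bar{\beta}} g(\tilde{\beta}) \left[ \frac{1}{v'(y(\beta))} - \tilde{\beta} \right] d\tilde{\beta} \tag{20}
\]
for all $\beta \geq \beta_0$ (note that the interval ends at the top, which reverses the inequality compared with the Proposition). For $\beta$ sufficiently close to $\bar{\beta}$, a negative marginal tax rate, $\frac{1}{v'(y(\bar{\beta}))} - \bar{\beta} < 0$, implies that the left hand side be negative, that is $p(\beta) > 1$.

The expression for $\pi(\bar{\beta})$ is derived from the following computation:
\[
\pi(\beta) = \int_{\alpha} \tilde{\pi}(\beta, \alpha)\, dF(\alpha|\beta) = \tilde{\pi}(\beta, \bar{\alpha}) - \int_{\alpha} \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) F(\alpha|\beta)\, d\alpha,
\]
whence, for $\beta = \bar{\beta}$,
\[
\pi(\bar{\beta}) = \int_{\alpha} \tilde{\pi}(\bar{\beta}, \alpha)\, dF(\alpha|\bar{\beta}) = \tilde{\pi}(\bar{\beta}, \bar{\alpha}) - \int_{\alpha} \frac{\partial \tilde{\pi}}{\partial \alpha}(\bar{\beta}, \alpha) F(\alpha|\bar{\beta})\, d\alpha.
\]
It follows that
\[
\begin{aligned}
1 = \int_{\beta} \pi(\beta) g(\beta)\, d\beta &= \int_{\beta} \tilde{\pi}(\beta, \bar{\alpha}) g(\beta)\, d\beta - \iint \frac{\partial \tilde{\pi}}{\partial \alpha} F(\alpha|\beta) g(\beta)\, d\alpha\, d\beta \\
&= \tilde{\pi}(\bar{\beta}, \bar{\alpha}) - \int_{\beta} \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \bar{\alpha}) G(\beta)\, d\beta - \iint \frac{\partial \tilde{\pi}}{\partial \alpha} F(\alpha|\beta) g(\beta)\, d\alpha\, d\beta.
\end{aligned}
\]
By difference, we get:
\[
\begin{aligned}
\pi(\bar{\beta}) - 1 &= \int_{\beta} \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \bar{\alpha}) G(\beta)\, d\beta
+ \int_{\beta} \left[ \int_{\alpha} \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) F(\alpha|\beta)\, d\alpha - \int_{\alpha} \frac{\partial \tilde{\pi}}{\partial \alpha}(\bar{\beta}, \alpha) F(\alpha|\bar{\beta})\, d\alpha \right] g(\beta)\, d\beta \\
&= \int_{\beta} \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \bar{\alpha}) G(\beta)\, d\beta
- \int_{\beta} \left[ \int_{\alpha} \frac{\partial^2 \tilde{\pi}}{\partial \alpha \partial \beta}(\beta, \alpha) F(\alpha|\beta)\, d\alpha + \int_{\alpha} \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) \frac{\partial F}{\partial \beta}(\alpha|\beta)\, d\alpha \right] G(\beta)\, d\beta \\
&= \int_{\beta} \int_{\alpha} \left[ \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \alpha) \frac{\partial F}{\partial \alpha}(\alpha|\beta) - \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) \frac{\partial F}{\partial \beta}(\alpha|\beta) \right] d\alpha\, G(\beta)\, d\beta,
\end{aligned}
\]
which yields (13).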
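The link between $\pi(\bar{\beta}) < 1$ and $p(\beta) > 1$ near $\bar{\beta}$ can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for the example, not taken from the paper: $\beta$ is uniform on $[0, 1]$, and the weight profile `pi_raw` is a hypothetical piecewise linear function that first rises and then falls, in the spirit of Figure 2. The code normalises the weights to average 1, computes $p(\beta)$ as the average of $\pi$ below $\beta$, and collects the grid points where $p(\beta) > 1$, which by the argument above are those facing negative marginal rates in the absence of pooling.

```python
# Illustrative sketch only: the weight profile is hypothetical.

def pi_raw(x):
    """Unnormalised social weight on [0, 1]: rises up to 0.6, then falls."""
    if x <= 0.6:
        return 0.4 + 2.0 * x           # from 0.4 at x = 0 to 1.6 at x = 0.6
    return 1.6 - 2.25 * (x - 0.6)      # from 1.6 at x = 0.6 to 0.7 at x = 1

N = 2000                                # number of cells on [0, 1]
H = 1.0 / N
MIDS = [(i + 0.5) * H for i in range(N)]
MEAN_RAW = sum(pi_raw(x) for x in MIDS) * H   # average weight before normalising

def pi(x):
    """Social weight normalised so that its average under uniform G is 1."""
    return pi_raw(x) / MEAN_RAW

def p_profile():
    """Pairs (beta, p(beta)), with p the average of pi over [0, beta]."""
    out, cum = [], 0.0
    for k, x in enumerate(MIDS, start=1):
        cum += pi(x) * H                # midpoint rule for the integral of pi
        beta = k * H
        out.append((beta, cum / beta))  # uniform case: G(beta) = beta
    return out

PROFILE = p_profile()
# Grid points where p exceeds the average weight 1 (small tolerance for rounding).
ABOVE_ONE = [beta for beta, p in PROFILE if p > 1.0 + 1e-9]

if __name__ == "__main__":
    print("pi at the top:", pi(1.0))
    print("p > 1 on roughly [%.3f, %.3f]" % (min(ABOVE_ONE), max(ABOVE_ONE)))
```

With this particular profile the weight at the top is below 1 and stays below 2 everywhere, and $p$ exceeds 1 on an interval ending just below $\bar{\beta} = 1$, which plays the role of the segment AB in Figure 2.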