Negative marginal tax rates and heterogeneity

Philippe Choné and Guy Laroque
INSEE-CREST
May 10, 2007
Preliminary, comments welcome
1 We have benefited from the comments of Mark Armstrong, Richard Blundell, Martin Hellwig, Bruno Jullien, Wojciech Kopczuk, Etienne Lehmann, Pierre Pestieau, Jean-Charles Rochet, Emmanuel Saez, Gilles Saint Paul, Bernard Salanié, John Weymark and of the participants in seminars in Toulouse, University College London, Louvain-la-Neuve and ESEM 2006 in Vienna.
Abstract
This article suggests a rationale for the use of negative marginal tax rates. Workers differ through productivity and opportunity cost of work. This cost may enter the cardinal measure of utility either positively or negatively; in the former case, it is interpreted as handicap, in the latter as laziness. Standard utilitarianism, understood as maximizing the sum of a concave transformation of cardinal utilities in the economy, selects a subset of the second best allocations, as in Mirrlees optimal tax theory. However the dependence of cardinal utilities on the opportunity costs of work invalidates the result that the marginal tax rate is everywhere nonnegative: negative tax rates may make it possible to force the lazy agents into work.
We illustrate the screening motivation of negative tax rates in a simple intensive
model, where the set of second best optimal allocations is easily characterized.
JEL classification numbers: H21, H31.
Keywords: optimal taxation, heterogeneity, welfare.
Please address correspondence to Guy Laroque, [email protected], INSEE-CREST J360, 15 boulevard Gabriel Péri, 92245 Malakoff Cedex, France.
1 Introduction
The bulk of the theory of optimal taxation recommends that the marginal tax
rate be everywhere positive: there is no room for subsidizing work.
Indeed it
is well established in the Mirrlees setup (continuous labor supply or intensive
margin, unobserved productivity, constant opportunity cost of work, utilitarian
planner) that the marginal tax rate is everywhere non negative at the optimum
(Seade (1977), Seade (1982), Werning (2000), Hellwig (2005)).
The purpose of the present paper is to suggest a rationale for negative marginal tax rates, keeping with the intensive setup largely studied in the literature, when there are multiple dimensions of heterogeneity.¹ Workers differ through productivity and opportunity cost of work. This cost enters the social criterion either positively or negatively; in the former case, it is interpreted as handicap, in the latter as laziness.
A starting point is to discuss criteria for social choice. We first recall the definition of the set of second best allocations. Given the informational constraints, these are feasible allocations which are not Pareto dominated by other feasible allocations. The definition is ordinal, contrary to that of utilitarianism, which begins with an a priori choice of a cardinal utility for every participant in the economy. The cardinal utility function of the typical tax payer usually depends on her type.² This feature is general in the presence of multiple dimensions of heterogeneity: the opportunity cost of work is reflected in the cardinal utility, with different specifications when it reflects a handicap or mere laziness. Then standard utilitarianism consists in maximizing the sum of an increasing concave transformation of the individual cardinal utility levels across the population. Concavity measures the government's aversion to inequality. It is easy to see that, given the cardinal choice of utility functions, the set of optimal allocations consistent with utilitarianism, when one varies the government's concave transformation, is a proper subset of the set of second best allocations: standard utilitarianism is restrictive. But it is tempting to consider what we call an extended utilitarianism, where the concave transformation applied by the social decision maker depends on the tax payer's type: the government may have its own evaluation of laziness or handicap, which it may want to impose upon the cardinal measures. We however argue that extended utilitarianism is essentially non binding and generates all the second best allocations, for some suitable choice of the transformation function of the government. This preliminary discussion motivates our approach,
1 Indeed Mirrlees (1976) in its Section 4 indicates, along a line that will be pursued further here, that the sign of the marginal tax rate cannot be predicted when the agents in the economy differ along several dimensions of heterogeneity.
2 Note that in some versions of the optimal taxation model, e.g. Mirrlees (1971), the utility function is a function of hours worked and after tax income, independent of the person's productivity which only plays a role through the budget constraint and the arbitrage between consumption and leisure.
which we apply to a theoretical example: we first look at the set of all second best allocations. Then we categorize the choices of cardinal utilities according to the shapes of the associated tax schedules under standard utilitarianism.
We want to illustrate the use of negative marginal tax rates as a screening device in the simplest possible model. In particular, we do not want to deal with the technicalities implied by the multidimensional setup.³ To this aim, we take a separable specification for the agents' tastes, without income effects as in Diamond (1998), much simpler than the standard Mirrlees specification: utility is linear in commodity and, for the participating agents, labor supply has a constant elasticity with respect to wages. The study follows in the steps of Sandmo (1993), but we allow for a general non-linear tax. This specification makes it possible to subsume the two dimensions of heterogeneity (productivity and an opportunity cost of work) into a single one. A technical contribution of the paper is a full characterization of the set of second best allocations, including the ones that involve pooling, in line with the general analysis of Jullien (2000). Heterogeneity comes into play in the measurement of the agents' utilities, which increase with productivity and may either decrease or increase with the work opportunity cost. Utility is likely to decrease when the cost is associated with poor living conditions (i.e. a handicap); it increases when the cost reflects a taste for leisure or opportunities outside the legal market (such as gardening at home or black market activities).
We find that the Mirrlees result, of positive marginal tax rates, extends here whenever the distribution of opportunity costs is independent of that of productivities, whatever the impact of these costs on the agents' utilities. We give an example of optimal negative marginal rates in an economy where agents with low productivities exhibit a large spectrum of opportunity costs, and are better off, the larger their costs. The negative tax rate serves to screen out the agents with large costs, who anyway benefit from working at home or on the black market, in the spirit of the imperfect screening literature (e.g. Akerlof (1978) or Salanié (2002)).
To summarize, the result that marginal tax rates are non negative at the optimum, which holds under utilitarianism in the Mirrlees model, appears to be non robust to the presence of heterogeneity, apart from that affecting productivity, in the economy. Then upwards distortions in labor supply may be useful for screening purposes. This occurs here when the low income people are thought to be well off agents who shirk and have outside opportunities.⁴
3 Beaudry and Blackorby (2004) have studied a model with several `true' dimensions of heterogeneity.
4 In a companion paper Choné and Laroque (2007), we give a set of natural conditions such that in the extensive model, under utilitarianism, the less skilled workers have their work subsidized at the optimum.
2 Second best optimality and utilitarianism
We first characterize the second best optima in the standard optimal taxation model, and discuss their relationship with the allocations obtained by a utilitarian planner.
Consider an economy with a continuum of agents of measure 1. An agent is described through her productivity $\omega$ and her opportunity cost of work $\alpha$, so that her type $\theta = (\alpha, \omega)$ is an element of $\mathbb{R}_+^2$. An economy is a collection of agents, represented with a probability measure with support $\Theta$. An agent supplies labor $h$, to produce a quantity $y = \omega h$ of commodity, her before tax income. The government observes $y$, but not the agent's type nor her hours of work. It collects taxes and distributes benefits or transfers, based on $y$. An agent of before tax income $y$ has an after tax income $R(y)$, where $R$, which summarizes the government's actions, is a non linear function from $\mathbb{R}_+$ into $\mathbb{R}$.
The agent has a utility function $U(c, h; \theta)$, which depends on her consumption of good $c = R(\omega h)$, equal to her after tax income, and on her labor supply $h$. Faced with the tax schedule $R$, the agent chooses her labor supply $h$ through the program
$$u(R; \alpha, \omega) = \max_{h \ge 0} U[R(\omega h), h; \theta]. \tag{1}$$
Given a function $R$, an allocation $y_R$ is a function from $\Theta$ into $\mathbb{R}_+$ such that, for all $\theta$, $y_R(\theta) = \omega h$ for some $h$ that maximizes the program (1) of the agent of characteristics $\theta$. The allocation $y_R$, and the associated function $R$, are feasible when
$$\int_\Theta [y_R(\theta) - R(y_R(\theta))] \, dH(\theta) = 0.$$
2.1 Second best optimality
An allocation $y_{R^*}$ and the associated transfer function $R^*$ are second best optimal when there does not exist another feasible allocation which gives at least as much utility to everyone in the economy and strictly more to a subgroup of agents of positive measure. Assuming that $U$ is strictly increasing with respect to its first argument, $R^*$ is second best optimal if and only if the program
$$\max_R \int_\Theta [y_R(\theta) - R(y_R(\theta))] \, dH(\theta) \quad \text{subject to} \quad u(R; \theta) \ge u(R^*; \theta) \ \text{for all } \theta \text{ in } \Theta \tag{2}$$
has solution $R^*$ and value 0. Provided an appropriate differentiability structure is put on the set of functions $R$, a version of the standard necessary condition for optimality then holds⁵ (see proof in the Appendix):
5 We thank Bruno Jullien and Martin Hellwig for pointing out an error at this stage in earlier versions of the paper.
Lemma 1. Consider a second best optimum $R^*$ such that the maximand and the constraint in (2) are Fréchet differentiable with respect to $R$ at $R^*$. Then there exists a non negative measure $\Pi$ on $\Theta$, such that the function $R^*$ is a local extremum of the Lagrangian
$$L = \int_\Theta \left[ u(R; \theta) \, d\Pi(\theta) + \big( y_R(\theta) - R(y_R(\theta)) \big) \, dH(\theta) \right]. \tag{3}$$
When the measure $\Pi$ is absolutely continuous with respect to the distribution of the agents' characteristics, for any measurable set $A$, one has
$$\Pi(A) = \int_A \pi(\theta) \, dH(\theta),$$
and $\pi(\theta)$ can be interpreted as the social weight of the agents of characteristics $\theta$.
This is the situation on which we shall mainly focus in the paper, even though the
results are typically valid for a general measure, possibly with discrete masses:
they cover in particular the Rawlsian optimum, which corresponds to a unit mass
on the agents with the lowest utility level.
The optimal taxation literature typically assumes that governments maximize
a utilitarian criterion, that is the sum of a concave transformation of the utilities
of the agents in the economy. Any optimal allocation for a utilitarian planner is
of course second best optimal. But does a utilitarian criterion rule out some of
the second best optima? It depends on the precise notion of utilitarianism which
is used.
2.2 Standard utilitarianism
In the usual definition,⁶ the agents are endowed with a cardinal utility function $U(c, h; \theta)$. The utilitarian government maximizes the sum of $\Psi(U)$ over all the agents, where $\Psi$ is a non decreasing concave real function whose concavity is a measure of the aversion of society for inequality, and does not depend on the agents' types. A second best allocation is consistent with utilitarianism if there is a function $\Psi$ such that it maximizes the government program. Then the allocations consistent with utilitarianism are a proper subset of the second best optima. Indeed identifying the first order conditions associated with the two, utilitarian and second best, programs shows that a necessary condition for an allocation to
6 In Mirrlees (1971), the utility function does not directly depend on productivity, contrary to Mirrlees (1976). This is a benign restriction there, since the indirect utility is increasing in productivity anyhow, and there are no strong reasons to reverse this direction of variation. In a multidimensional setup, the normal case is when cardinal utility directly depends on the agent type. For instance when the value of leisure varies across the population, the utilities have to be a function of the work opportunity cost.
[Figure 1: Second best optimality and utilitarianism. Horizontal axis U1, vertical axis U2; the second best frontier runs through the points A, B, C, D, E.]
be a solution of both programs is
$$\pi(\theta) = a \, \Psi'\!\left( U\!\left( y_R(\theta), \frac{y_R(\theta)}{\omega}; \theta \right) \right) \tag{4}$$
for some positive constant $a$, assuming enough regularity. This condition implies
$$\pi(\theta_1) > \pi(\theta_2) \quad \text{if and only if} \quad u(R, \theta_1) < u(R, \theta_2),$$
from which one derives the familiar property that the social weight is decreasing with income and productivity at a utilitarian optimum in the Mirrlees setup. This is illustrated on the stylized Figure 1, which sketches a hypothetical economy with two types of agents in the plane of their utilities $(U_1, U_2)$. The second best frontier is the curve $ABCDE$. The set of allocations consistent with utilitarianism is a part of the thick line, union of $AB$ and $CD$, where $B$ and $D$ are the points where the frontier has slope $-1$: it must have a tangent of slope smaller than 1 in absolute value below the 45 degree line, and larger than 1 above the 45 degree line.
Allocations consistent with utilitarianism are only a part of the thick line: the indifference hypersurfaces $\Psi(U_1) + \Psi(U_2) = \text{constant}$ are symmetric with respect to the 45 degree line, and this imposes further restrictions. A geometric representation of this constraint is as follows. For any point $P$, denote by $P'$ its mirror image with respect to the 45 degree line. Then a point $P$ on the Pareto frontier represents a utilitarian allocation if and only if the interior of the segment $[PP']$ does not intersect the set of feasible utilities.
2.3 Extended utilitarianism
Society may have different valuations of the types than the ones coded into the individual cardinal utilities. Formally the social criterion then takes the form $\Psi(u(R, \theta), \theta)$, where $\Psi$ is concave and non decreasing⁷ in its first argument. Without further restrictions, it is easy to see that in these circumstances essentially any second best optimal allocation is consistent with utilitarianism.
The adverb `essentially' refers to boundary allocations that cannot be attained with a suitable function $\Psi$. Only the allocations associated with a multiplier $\Pi$ absolutely continuous with respect to the Lebesgue measure can be reached. Then any $\pi(\theta)$ can be made proportional to $\Psi'_u(u(R, \theta), \theta)$, since this quantity depends in an arbitrary way on its second argument. The fact that the utilitarian criterion is a sum of the individual $\Psi(u; \theta)$, rather than any non linear non decreasing function, is hardly restrictive. Consider the case of two types as in Figure 1. To support a couple $(U_1^*, U_2^*)$, one has to find two non decreasing functions $\Psi_1$ and $\Psi_2$ such that the indifference curve $\Psi_1(U_1) + \Psi_2(U_2) = \Psi_1(U_1^*) + \Psi_2(U_2^*)$ in the plane $(U_1, U_2)$ intersects the set of feasible utilities $U$ at the unique point $(U_1^*, U_2^*)$. This is always possible on Figure 1. In dimension 2, the only case where the construction is impossible is at a point where the Pareto frontier is non differentiable, with a horizontal tangent from the right and a vertical tangent from the left.
The previous discussion leads us to structure the remainder of the paper as follows. We consider a simple model with two dimensions of heterogeneity, productivity and work opportunity cost. First, we characterize the set of second best optimal allocations. Then, adopting standard utilitarianism, we concentrate on relating shapes of cardinal utilities to subsets of second best allocations. Specifically, we seek to identify circumstances, i.e. shapes of cardinal utilities, leading to negative marginal tax rates. The benchmark is of course Mirrlees (1971), where a utility independent of productivity leads to positive marginal tax rates everywhere. Whether a large work opportunity cost is interpreted as laziness (low social weight) or handicap (high social weight) is crucial for the results.
3 Second best optima: an example
Faced with the function $R$, the agent chooses her labor supply $h$ by solving
$$\max_{h \ge 0} \; R(\omega h) - \alpha v(h). \tag{5}$$
The optimization of the agent is linear in commodity (labor supply does not
7 Monotonicity is required for consistency with private values. It may be worth emphasizing that we stick here to a purely welfarist viewpoint. We do not consider situations where the social objective includes moral considerations other than the effects of policies on individual utilities, as discussed in Sen (1982) and Kaplow and Shavell (2001).
depend on the income level: no income effect). The arduousness of labor is described with the function $v(h)$, which we specify as
$$v(h) = \frac{h^{1 + 1/e}}{1 + 1/e}.$$
The parameter $e$, $e \ge 0$, common to all the agents in the economy, is the elasticity of the labor supply of the participating agents with respect to wage. This specification is adopted for convenience, but is in line with a number of works in the literature. Atkinson (1990) uses it for empirical purposes. Diamond (1998) studies the shape of the optimal tax rates in a linear in consumption model, with a more general function $v$ than here. Boadway, Marchand, Pestieau, and del Mar Racionero (2002) also study similar issues in a quasi linear model with heterogeneous preferences for leisure, but in a discrete setup, with four different types of agents.
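To make the role of the parameter e concrete, here is a minimal numerical sketch. It assumes, purely for illustration, a flat tax rate `t` (so that R(y) = (1 - t) y up to a constant); this linear schedule and all function names are hypothetical and not part of the model above. Under that assumption the first order condition of program (5) gives h = ((1 - t) ω / α)^e, and the wage elasticity of hours equals e.

```python
import math

def disutility(h, e):
    """Disutility of labor v(h) = h**(1 + 1/e) / (1 + 1/e)."""
    k = 1.0 + 1.0 / e
    return h ** k / k

def labor_supply(omega, alpha, e, t=0.0):
    """Hours solving (1 - t) * omega = alpha * h**(1/e).

    The flat tax rate t is an illustrative assumption, not taken from the text.
    """
    return ((1.0 - t) * omega / alpha) ** e

def net_payoff(h, omega, alpha, e, t=0.0):
    """Objective of program (5) under the hypothetical flat tax."""
    return (1.0 - t) * omega * h - alpha * disutility(h, e)

def log_elasticity(omega, alpha, e, t=0.0, step=1e-6):
    """Numerical elasticity of hours with respect to the wage omega."""
    h_lo = labor_supply(omega, alpha, e, t)
    h_hi = labor_supply(omega * (1.0 + step), alpha, e, t)
    return (math.log(h_hi) - math.log(h_lo)) / math.log(1.0 + step)

e, alpha, omega = 0.5, 2.0, 10.0
h_star = labor_supply(omega, alpha, e)
elasticity = log_elasticity(omega, alpha, e)
```

In this sketch `elasticity` recovers the value of `e` whatever the values chosen for `alpha` and `omega`, which is the constant elasticity property used in the text.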
Note that by quasi-linearity of the utilities, the solution to program (2) where a constant $a$ is added to $R^*$ is $a + R^*$, with $y_{R^*}$ equal to $y_{a + R^*}$. Therefore
$$\int_\Theta d\Pi(\theta) = 1,$$
and $\Pi$ is a probability measure. Furthermore, when looking for all the second best allocations, one may ignore the feasibility condition, which only fixes the intercept of the function $R^*$.
In general when there are several dimensions of heterogeneity (productivity, arduousness of labor) and the government has only one dimension of observation (income), a major difficulty is to identify the set of idiosyncratic shocks that are associated with a given level of income, which typically depends on the announced tax function. Here, the specification of the utility function and of the way shocks enter the model makes it possible to reduce the problem to a single dimension of heterogeneity from the start, independently of the function $R$. It also enables us to work directly in the function space of indirect utilities, applying vector space techniques and avoiding the Hamiltonian approach.
Proposition 1.
1. Consider a function $R : \mathbb{R}_+ \to \mathbb{R}$. Let
$$V(\beta) = \max_{y \ge 0} \left\{ R(y) - \beta \, \frac{y^{1 + 1/e}}{1 + 1/e} \right\}, \quad \text{where} \quad \beta = \frac{\alpha}{\omega^{1 + 1/e}}.$$
$V$ is a convex nonincreasing function, which satisfies $V'(\beta) = -v(y_R(\beta))$, whenever it is differentiable, so that $R(y_R(\beta)) = V(\beta) - \beta V'(\beta)$.
2. Conversely, to any convex nonincreasing function $V$ corresponds a real function $\tilde{R} : \mathbb{R}_+ \to \mathbb{R}_+$ through
$$\tilde{R}(q) = \min_{\beta \ge 0} [V(\beta) + \beta q].$$
$\tilde{R}(\cdot)$ is concave non decreasing in its argument. If $V$ has been derived from a function $R$ as in 1., $\tilde{R}(\cdot)$ coincides with the function $R \circ v^{-1}$ when $R \circ v^{-1}$ is concave, which implies that $R$ itself is non decreasing.
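The envelope property in part 1 can be checked numerically. The sketch below assumes a linear schedule R(y) = b + (1 - t) y, which is only an illustrative choice (the parameters `t` and `b` and the function names are hypothetical); for that schedule the optimal income is y = ((1 - t) / β)^e, and a finite difference of V should match -v(y).

```python
def beta_of(alpha, omega, e):
    """One-dimensional type beta = alpha / omega**(1 + 1/e)."""
    return alpha / omega ** (1.0 + 1.0 / e)

def v(y, e):
    """v(y) = y**(1 + 1/e) / (1 + 1/e)."""
    k = 1.0 + 1.0 / e
    return y ** k / k

def income_choice(beta, e, t):
    """Income maximizing (1 - t) * y - beta * v(y); assumes a linear schedule."""
    return ((1.0 - t) / beta) ** e

def indirect_utility(beta, e, t, b):
    """V(beta) for the hypothetical schedule R(y) = b + (1 - t) * y."""
    y = income_choice(beta, e, t)
    return b + (1.0 - t) * y - beta * v(y, e)

e, t, b, beta = 0.5, 0.3, 1.0, 2.0
step = 1e-5
# Central finite difference approximating V'(beta).
slope = (indirect_utility(beta + step, e, t, b)
         - indirect_utility(beta - step, e, t, b)) / (2.0 * step)
y_star = income_choice(beta, e, t)
envelope_gap = slope + v(y_star, e)  # should be close to zero
# After tax income recovered as V(beta) - beta * V'(beta).
recovered_income = indirect_utility(beta, e, t, b) - beta * slope
```

With these sample values `envelope_gap` is numerically negligible and `recovered_income` agrees with b + (1 - t) y, in line with the two identities stated in part 1.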
From the point of view of the agents the only thing that matters is the level $V(\beta)$ of their utility, and Proposition 1 shows that without loss of generality we can consider any convex nonincreasing function. Also, without loss of generality, the government can restrict the $R$ functions to be non decreasing and such that $R \circ v^{-1}$ be concave. Throughout, we work with the indirect utility function $V$ rather than with the schedule $R$.
4 Optimal tax
The admissible functions $V$ belong to the convex cone of convex nonincreasing functions on $\mathbb{R}_+$, and the second best program (2) can be written
$$\max \int \left[ v^{-1}(-V') + \beta V' - V \right] dG(\beta) \tag{6}$$
subject to the constraints $V$ convex non increasing and $V \ge V^\star$. Since the function $v^{-1}$ is concave, the maximand in (6) is concave in $(V, V')$ and under mild regularity conditions the program has a unique solution. Theorem 1 of Section 8.3 of Luenberger (1969) yields the existence of the Lagrange multiplier for the inequality constraint.⁸ Thanks to the convexity of the problem, the first order conditions are sufficient for an optimum.
Denoting $\Pi(\beta)$ the cumulative distribution function of the probability measure which represents the Lagrange multiplier, the Lagrangian can be written as
$$L = \int V \, d\Pi(\beta) + \int [y - (V - \beta V')] \, dG(\beta). \tag{7}$$
Conversely, take any probability measure $\Pi$. The primal problem
$$\max \int V \, d\Pi(\beta)$$
subject to
$$\int \left[ v^{-1}(-V') + \beta V' - V \right] dG(\beta) \ge 0,$$
8 We apply Luenberger's result with $\Omega$ the convex cone of convex functions. The inequality constraint $V \ge V^\star$ is obviously differentiable in $V$.
and $V$ convex non increasing, is well posed. Its Lagrangian is concave and coincides with (7). It has a unique solution for any measure $\Pi$. Our aim in this section is to relate the weights $\Pi$ to the corresponding optimal allocation and tax scheme.
An integration by parts in (7) (apply Lemma A.1 in Appendix with $F = V$, $f = V'$, $y = \Pi$, $y(\theta = 0)$ and $y(\theta) = 1$) yields
$$L = \int (y + \beta V') \, dG(\beta) + \int V' (G - \Pi) \, d\beta,$$
where $V' = -v(y)$ from Proposition 1. Note that it depends only on the derivative of the indirect utility (and not on its level $V$), i.e. on the allocation $y$. We have proven the following Lemma:
Lemma 2. An allocation $y : [\underline{\beta}, \bar{\beta}] \to \mathbb{R}_+$ is second best optimal if and only if there exists a nondecreasing function $\Pi : [\underline{\beta}, \bar{\beta}] \to [0, 1]$ such that $V'(\beta) = -v(y)$ is the solution to
$$\max L = \int \left( v^{-1}(-V') + \beta V' \right) dG(\beta) + \int V' (G - \Pi) \, d\beta,$$
on the set of nondecreasing and negative functions $V'$.
The set of second best optimal allocations is easy to describe when the distribution of heterogeneity is continuous, i.e. when the following holds.
Assumption 1 (Continuous distribution). The parameter $\beta$ is distributed in the economy with the c.d.f. $G$ of support $[\underline{\beta}, \bar{\beta}]$, $0 < \underline{\beta} < \bar{\beta} < \infty$. Furthermore $G$ has a continuous positive density $g$.
We have
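Proposition 2. Suppose that Assumption 1 holds. A non negative decreasing function $y(\beta)$ defined on $[\underline{\beta}, \bar{\beta}]$ is a second best allocation if and only if the function
$$\Pi(\beta) = \begin{cases} G(\beta) - g(\beta) \left[ \dfrac{1}{v'(y(\beta))} - \beta \right] & \text{for } \beta \text{ in } [\underline{\beta}, \bar{\beta}) \\ 1 & \text{for } \beta = \bar{\beta} \end{cases} \tag{8}$$
is non negative and non decreasing.
Then both $y(\beta)$ and $\Pi(\beta)$ are continuous on $(\underline{\beta}, \bar{\beta})$. There is no distortion at the top when $\Pi$ is continuous at $\bar{\beta}$: $\bar{\beta} v'(y(\bar{\beta})) = 1$. There is no distortion at the bottom when $\Pi(\underline{\beta}) = 0$: $\underline{\beta} v'(y(\underline{\beta})) = 1$. The social weights $\pi(\beta)$ associated with this allocation are the (Stieltjes) derivative of $\Pi(\beta)$.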
Proof: I) Necessity. Since $y$ is decreasing, $V'$ is strictly negative and a necessary condition for optimality is that the pointwise derivative of the Lagrangian in Lemma 2 be equal to zero. This yields the condition of the Proposition.
Continuity is proved as follows. Since $y(\beta)$ is decreasing, any discontinuity has to be downwards. That creates a downwards discontinuity for $-1/v'(y)$ and therefore for $\Pi$, a contradiction with the fact that $\Pi$ is non decreasing. The no distortion properties are straightforward consequences of the first order condition.
II) Sufficiency. The measure $\Pi$ defined in the proposition is an adequate multiplier for the second best program. The function
$$V(\beta) = \int_\beta^{\bar{\beta}} v(y(x)) \, dx$$
is convex non increasing. It maximizes the Lagrangian of Lemma 2: its derivative is the pointwise maximum of the concave function of $V'$ whose integral is equal to the Lagrangian.
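The criterion of Proposition 2 lends itself to a direct numerical check. The sketch below evaluates the function Π of equation (8) on a grid for two candidate allocations, under assumptions that are ours and purely illustrative: β uniform on [1, 2], e = 0.5, an undistorted allocation y = β^(-e), and the allocation induced by a hypothetical flat 20% tax. The helper names are not from the paper.

```python
def v_prime(y, e):
    """Derivative of v(y) = y**(1 + 1/e) / (1 + 1/e), i.e. y**(1/e)."""
    return y ** (1.0 / e)

def multiplier(beta, y, G, g, e):
    """Pi(beta) from equation (8), for beta below the upper bound."""
    return G(beta) - g(beta) * (1.0 / v_prime(y, e) - beta)

def passes_criterion(betas, allocation, G, g, e, tol=1e-9):
    """Check on a grid that Pi is non negative and non decreasing."""
    values = [multiplier(b, allocation(b), G, g, e) for b in betas]
    nonnegative = all(val >= -tol for val in values)
    nondecreasing = all(hi >= lo - tol for lo, hi in zip(values, values[1:]))
    return nonnegative and nondecreasing

E = 0.5

def G(beta):
    """Assumed c.d.f.: uniform distribution on [1, 2]."""
    return beta - 1.0

def g(beta):
    """Density of the assumed uniform distribution."""
    return 1.0

def undistorted(beta):
    """Allocation with beta * v'(y) = 1 everywhere."""
    return beta ** (-E)

def flat_tax(beta):
    """Allocation under a hypothetical flat 20% marginal tax rate."""
    return (0.8 / beta) ** E

grid = [1.0 + 0.01 * i for i in range(100)]  # grid on [1, 1.99]
ok_undistorted = passes_criterion(grid, undistorted, G, g, E)
ok_flat_tax = passes_criterion(grid, flat_tax, G, g, E)
```

For the undistorted allocation Π coincides with G, so every agent carries the same weight. For the flat tax, Π is negative near the lower bound of the support in this parametrization, so the grid check fails there.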
Where $R$ is differentiable, the program of the typical consumer yields the first order condition
$$R'(y) = \beta v'(y).$$
The tax function $T$ corresponding to $R$ is $T(y) = y - R(y)$. The first order condition can be written, using the equality $R' = 1 - T'$,
$$\frac{T'(y)}{1 - T'(y)} = \frac{1}{\beta v'(y)} - 1.$$
Let $p(\beta)$ be the average value of the social weights of all the agents with idiosyncratic characteristics smaller than $\beta$:
$$p(\beta) = \frac{\Pi(\beta)}{G(\beta)} = \frac{1}{G(\beta)} \int_{\underline{\beta}}^{\beta} \pi(x) \, dG(x). \tag{9}$$
An immediate consequence of Proposition 2 is
Corollary 3. The function $p(\beta)$ corresponds to a decreasing optimal allocation $y$ if and only if
1. the function $pG$ is nondecreasing and takes its values in $[0, 1]$;
2. the function $\beta + \frac{G}{g}(1 - p)$ is increasing (and therefore always positive for $\beta \ge \underline{\beta} \ge 0$).
The (typically large) set of allowed functions $p$ therefore depends on the distribution $G$ of characteristics. The first condition just restates that $\Pi(\beta) = p(\beta) G(\beta)$ is the cdf of a probability distribution. The second condition characterizes the set of weights for which there is no pooling at the optimum, so that $y$ is strictly decreasing in $\beta$. Remark 4.2 below makes explicit the relationship between the weights and the allocations in case of pooling.
Using Proposition 2, we get an expression of the optimal tax rate as a function of the distribution of the heterogeneity in the population and of the social weights:
$$\frac{T'(y(\beta))}{1 - T'(y(\beta))} = \frac{G(\beta)}{\beta g(\beta)} \, [1 - p(\beta)]. \tag{10}$$
Under Assumption 1, $G/g$ is well defined and positive for all $\beta$ larger than $\underline{\beta}$, and the marginal tax rate has the same sign as $(1 - p(\beta))$.
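Equation (10) can be turned into a small calculator for the marginal tax rate. The sketch below assumes, for illustration only, that β is uniform on [1, 2] and uses constant sample values of p that are not checked for admissibility; the function names are hypothetical.

```python
def tax_wedge(beta, G, g, p):
    """Right-hand side of (10): G(beta) * (1 - p(beta)) / (beta * g(beta))."""
    return G(beta) * (1.0 - p(beta)) / (beta * g(beta))

def marginal_tax_rate(beta, G, g, p):
    """Solve T' / (1 - T') = x for T', where x is the wedge in (10)."""
    x = tax_wedge(beta, G, g, p)
    return x / (1.0 + x)

def G(beta):
    """Assumed c.d.f.: uniform distribution on [1, 2]."""
    return beta - 1.0

def g(beta):
    """Density of the assumed uniform distribution."""
    return 1.0

# Sample average weights below, at and above one (illustrative values only).
rate_low_weight = marginal_tax_rate(1.5, G, g, lambda b: 0.5)
rate_unit_weight = marginal_tax_rate(1.5, G, g, lambda b: 1.0)
rate_high_weight = marginal_tax_rate(1.5, G, g, lambda b: 1.2)
```

The three rates are respectively positive, zero and negative, which is the sign pattern stated in the text: the marginal rate has the sign of 1 - p(β).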
Remark 4.1. There is no simple description, in economic terms, of the set of Pareto optimal tax schedules: they are the images of the set of allowed functions $p$ under the above transformation. In the absence of pooling, the Laffer tax schedule $T_L$ has a simple equation, since it obtains when all the weights are zero, but for a mass point at $\bar{\beta}$:
$$\frac{T_L'(y_L(\beta))}{1 - T_L'(y_L(\beta))} = \frac{G(\beta)}{\beta g(\beta)}.$$
Note that any optimal tax schedule imposes a smaller marginal tax rate than the Laffer rate to any tax payer $\beta$:
$$\frac{T'(y(\beta))}{1 - T'(y(\beta))} \le \frac{T_L'(y_L(\beta))}{1 - T_L'(y_L(\beta))}.$$
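The Laffer bound of Remark 4.1 can be illustrated numerically. The sketch assumes a uniform distribution of β on [1, 2] and a handful of sample values for the average weight p; both are illustrative assumptions, and the function names are hypothetical.

```python
def laffer_rate(beta, G, g):
    """Marginal rate solving T' / (1 - T') = G(beta) / (beta * g(beta))."""
    return G(beta) / (G(beta) + beta * g(beta))

def rate_from_weight(beta, G, g, p_value):
    """Marginal rate implied by (10) for a given average weight p_value."""
    x = G(beta) * (1.0 - p_value) / (beta * g(beta))
    return x / (1.0 + x)

def G(beta):
    """Assumed c.d.f.: uniform distribution on [1, 2]."""
    return beta - 1.0

def g(beta):
    """Density of the assumed uniform distribution."""
    return 1.0

betas = [1.0 + 0.1 * i for i in range(11)]
weights = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]  # sample values of p >= 0

# Smallest gap between the Laffer rate and the rate implied by each weight.
worst_gap = min(
    laffer_rate(b, G, g) - rate_from_weight(b, G, g, p)
    for b in betas
    for p in weights
)
top_laffer_rate = laffer_rate(2.0, G, g)
```

Since p is non negative, the gap is never negative on the grid; it is zero exactly when p = 0, which is the Laffer case.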
Remark 4.2. Here is a general version of Proposition 2 with proof in the Appendix, which allows for pooling (i.e. $y$ may be constant on some interval). In what follows, a pooling interval is a maximal interval where $y$ is constant.
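Proposition 4. Suppose that Assumption 1 holds. A nonnegative nonincreasing function $y(\beta)$ defined on $\Gamma$ is a second best allocation if and only if there exists a nonnegative and nondecreasing function $\Pi(\beta)$ with values in $[0, 1]$ such that
$$\int_{\underline{\beta}}^{\beta} \left\{ G(\tilde{\beta}) - g(\tilde{\beta}) \left[ \frac{1}{v'(y(\tilde{\beta}))} - \tilde{\beta} \right] \right\} d\tilde{\beta} \ge \int_{\underline{\beta}}^{\beta} \Pi(\tilde{\beta}) \, d\tilde{\beta} \tag{11}$$
for all $\beta$, and (11) is an equality at any $\beta$ where $y$ is decreasing and for $\beta = \bar{\beta}$.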
In an interval where $y$ is decreasing, (11) holds as an equality. Differentiating this equality yields equation (8). Proposition 2 has established the existence of a one-to-one relationship between distributions of social weights and second best allocations in the absence of pooling. This property does not hold any more when we allow for the possibility of pooling (case of Proposition 4): different distributions may, in general, give rise to the same second best allocation. It turns out that pooling intervals can be generated by mass points in the distribution of social weights, but, in general, they can also be generated by (many) smooth distributions of weights. In the appendix, we explain geometrically how to construct the (set of) cumulative distribution functions $\Pi$ associated with a given optimal allocation $y$.
5 Utilitarianism and marginal tax rates
Propositions 2 and 4 describe fully the set of second best allocations in the economy. In any optimal allocation, all the agents of type $\theta = (\alpha, \omega)$ with the same $\beta = \alpha / \omega^{1 + 1/e}$ are treated equally. From (10), the marginal tax rate supported by an agent of type $\beta_0$ is directly related to the average social weight of the agents of type $\beta$ smaller than $\beta_0$. To describe the utilitarian tax rates, we have to relate the values of these social weights to the social objective. As discussed in Section 2, standard utilitarianism first requires specifying cardinal utilities. To fix ideas, we posit
$$u(R; \alpha, \omega) = K\!\left[ V\!\left( \frac{\alpha}{\omega^{1 + 1/e}} \right), \alpha \right],$$
and we assume
Assumption 2. The function $K$ is non decreasing and concave in its first argument.
Both productivity and the work opportunity cost enter the first argument of cardinal utility, in a way similar to what productivity alone does in the Mirrlees model (where work opportunity cost is constant across the population). The second argument allows for a separate effect of the work opportunity cost. As always in utilitarianism, it plays a role at the margin, through its impact on the marginal social value of a change in utility $K'_V[V, \alpha]$. When $K'_V$ is increasing in its second argument, large $\alpha$'s, holding $\beta$ constant, go with large social weights: a large opportunity cost is associated with a handicap that may deserve some social compensation. When $K'_V$ is decreasing in its second argument, a large opportunity cost of work reduces the social weights, perhaps because non market time allows lucrative activities on the black market or enjoyable leisure.
Standard utilitarianism is associated with a concave increasing transformation $\Psi$ of the cardinal utilities leading to a social welfare function
$$\int \Psi\!\left( K\!\left[ V\!\left( \frac{\alpha}{\omega^{1 + 1/e}} \right), \alpha \right] \right) dH(\theta).$$
Taking as unknown the function $V$ as in Section 3, noting $\lambda$ the multiplier of the government budget constraint, the associated Lagrangian of the primal program is
$$L_p = \int \left\{ \int \Psi(K[V, \alpha]) \, dF(\alpha|\beta) + \lambda [y - (V - \beta V')] \right\} dG(\beta), \tag{12}$$
where $F(\alpha|\beta)$ is the distribution of $\alpha$ conditional on the parameter $\beta$.
Taking the derivative of the first term, the social welfare function, with respect to $V$, it follows that a necessary and sufficient condition for a second best allocation to be consistent with standard utilitarianism is that the social weight $\pi(\beta)$, underlying equations (9) and (10), be given by
$$\pi(\beta) = \int \tilde{\pi}(\beta, \alpha) \, dF(\alpha|\beta) \quad \text{and} \quad \tilde{\pi}(\beta, \alpha) = \frac{\Psi' K'_V[V(\beta), \alpha]}{\int \Psi' K'_V[V(\beta), \alpha] \, dH(\theta)},$$
with
$$\Pi(\beta) = \int_{\underline{\beta}}^{\beta} \pi(x) \, dG(x).$$
Given the cardinal utilities $K$, standard utilitarianism is associated with all the weights $\tilde{\pi}(\beta, \alpha)$ that verify the above equalities for some increasing concave transformation $\Psi$. It is important to note that, under Assumption 2 and utilitarianism, $\tilde{\pi}(\beta, \alpha)$ is increasing in $\beta$.
5.1 Ruling out negative marginal tax rates
Consider the standard Mirrlees case where $\alpha$ is constant across the population, and $\omega$ has a continuous distribution on $[\underline{\omega}, \bar{\omega}]$. Then
$$\underline{\beta} = \frac{\alpha}{\bar{\omega}^{1 + 1/e}}, \qquad \bar{\beta} = \frac{\alpha}{\underline{\omega}^{1 + 1/e}},$$
and productivity, as well as cardinal utility, decreases with $\beta$. Then, as we have just seen, social weights increase with $\beta$, and from (9), $p(\beta)$ as well. Since $p(\bar{\beta}) = 1$, $p(\beta) < 1$ for all $\beta < \bar{\beta}$, and (10) gives the standard result: the marginal tax rate is always positive, and it is equal to zero⁹ for the richest, at $\underline{\beta}$.
The situation changes when there are several dimensions of heterogeneity.
Nevertheless there are a variety of situations where tax rates are non negative:
9 The fact that the marginal tax rate is zero for the low skilled, in $\bar{\beta}$, is a consequence of the shape of the function $v$, which satisfies $v'(0) = 0$.
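[Figure 2: Social weights and negative marginal tax rates. Curves π(β) and p(β) plotted against β; the level 1 is marked on the vertical axis, β_m and β̄ on the horizontal axis, with labelled points A and B.]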
Proposition 5. Assume that $K'_V[V, \alpha]$ is non decreasing (resp. decreasing) in $\alpha$ and that the distribution of $\alpha$, conditional on $\beta$, is first order stochastically non decreasing (resp. decreasing) in $\beta$. Then under standard utilitarianism, the weights $\pi(\beta)$ are increasing and marginal tax rates are everywhere non negative.
Proof: Let
$$f(a, b) = \int \tilde{\pi}(a, \alpha) \, dF(\alpha|b).$$
$f$ is increasing in $a$, since $\tilde{\pi}$, proportional to $\Psi' K'_V[V(a), \alpha]$, is. It is increasing in $b$ by first order stochastic dominance. It follows that $\pi(\beta) = f(\beta, \beta)$ is also increasing in its argument.
Since $\alpha = \beta \omega^{1 + 1/e}$, a plausible situation is when the conditional distribution of $\alpha$ given $\beta$, $F(\alpha|\beta)$, is first order stochastically increasing in $\beta$. Then if $K'_V$ is increasing (or non decreasing) in $\alpha$, i.e. larger opportunity costs increase the value of a marginal change in utility, holding $\beta$ constant, possibly due to a handicap, the optimal marginal tax rates are non negative.
5.2 When are negative marginal tax rates optimal for low skilled workers?
There are nevertheless a number of circumstances where negative marginal tax
rates on low incomes are optimal.
First a theoretical remark is useful. To see negative marginal tax rates bearing on the lowest incomes, i.e. in a neighborhood of $\overline{\beta}$, from (10), in the absence of pooling it is necessary and sufficient that $p(\beta)$ be larger than 1 in this neighborhood. Since by construction $p(\overline{\beta}) = 1$, assuming differentiability, this amounts to $p'(\beta) < 0$ in a neighbourhood of $\overline{\beta}$. Since

$$p'(\beta) = \big(\pi(\beta) - p(\beta)\big)\, g(\beta)/G(\beta),$$

this only occurs if $\pi$, the social weight of the agents, is smaller than 1, the average social weight in the economy, in this neighbourhood$^{10}$. This property has more generality:
Proposition 6. A necessary condition for the lowest income agents to face negative marginal tax rates at the optimum is that their social weight $\pi(\overline{\beta})$ be smaller than the average social weight in the economy. This condition is sufficient when there is no pooling at $\overline{\beta}$.
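When the functions $\tilde{\pi}(\beta, \alpha)$, $F(\alpha|\beta)$ and $G(\beta)$ are twice continuously differentiable with respect to their arguments, one has

$$\pi(\overline{\beta}) - 1 = \iint \left[ \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \alpha)\, \frac{\partial F}{\partial \alpha}(\alpha|\beta) - \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha)\, \frac{\partial F}{\partial \beta}(\alpha|\beta) \right] G(\beta)\, d\alpha\, d\beta. \qquad (13)$$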
Remark 5.1. A feature of the intensive model, in line with the above property, is
that the agents facing negative marginal tax rates are worse off than in the absence
of such negative rates.
Indeed they are induced to work more than otherwise,
which reduces at the margin their utility levels. This is in stark contrast with
what happens in the extensive model, where the extra work goes with an increase
in the welfare of the concerned agents (see Choné and Laroque (2007)).
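We first provide an illustrative example of the previous proposition. The economy is described as follows. At the lowest wage rate $\underline{\omega}$, there are a variety of $\alpha$'s, a continuous distribution on $[\underline{\alpha}, \overline{\alpha}]$. For all the wage rates above the minimum, a continuous distribution on $(\underline{\omega}, \overline{\omega}]$, there is a unique value of $\alpha$, equal to $\underline{\alpha}$. In terms of $\beta$'s, we have:

$$\underline{\beta} = \frac{\underline{\alpha}}{\overline{\omega}^{\,1+\frac{1}{e}}}, \qquad \beta_m = \frac{\underline{\alpha}}{\underline{\omega}^{\,1+\frac{1}{e}}}, \qquad \overline{\beta} = \frac{\overline{\alpha}}{\underline{\omega}^{\,1+\frac{1}{e}}}.$$

$^{10}$ This has already been noted by Saez (2002), page 1054: negative marginal tax rates at the bottom of the wage distribution can only occur if the social weight of the concerned agent is smaller than the average social weight.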
The agent $\underline{\beta}$ is the most productive with the smallest opportunity cost to work. All the agents of the segment $[\underline{\beta}, \beta_m]$ differ only by their productivities. All the agents in $[\beta_m, \overline{\beta}]$ have the same low productivity $\underline{\omega}$, but have different, increasing, opportunity costs$^{11}$. Figure 2 represents a possible profile of $\pi(\beta)$, when the social weights are decreasing in $\alpha$: here large $\alpha$'s stand for laziness, or unlawful activities on the black market. Following standard utilitarianism, $\pi$ is increasing on $[\underline{\beta}, \beta_m]$; it is supposed to decrease further on, the home production effect more than compensating the mechanical increase in $\beta$ as $\alpha$ rises. The agent with the largest social weight is the $\beta_m$ person with lowest productivity and opportunity cost to work. The associated function $p(\beta)$, which measures the average height of $\pi(x)$ for $x$ smaller than $\beta$, is also represented: $p(\beta)$ increases whenever it lies under the graph of $\pi$, decreases when it is above the graph, and has a horizontal tangent when it crosses the $\pi$ curve. Also, we know that $p(\overline{\beta}) = 1$. In the case where the distribution of $\beta$ is uniform, differentiating (8) shows that the optimal allocation is without pooling if and only if $\pi(\beta)$ is everywhere smaller than 2.

From (10), in the situation depicted in Figure 2, all the agents in the segment AB then face negative tax rates.
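The mechanics of this example can be reproduced numerically. The following is only a sketch under assumed numbers: the uniform distribution of $\beta$ on $[0, 1]$, the value $\beta_m = 0.5$ and the piecewise linear hump-shaped profile of $\pi$ are all hypothetical and are not taken from the paper. The sketch computes $p(\beta)$ as the running average of $\pi$, and flags the types for which $p(\beta) > 1$, the criterion stated in the text for negative marginal tax rates in the absence of pooling.

```python
import numpy as np

def cumulative_trapezoid(values, grid):
    """Cumulative integral of `values` over `grid`, starting at 0 (trapezoid rule)."""
    steps = 0.5 * (values[1:] + values[:-1]) * np.diff(grid)
    return np.concatenate(([0.0], np.cumsum(steps)))

# Hypothetical economy: beta uniform on [0, 1], with beta_m = 0.5.
beta = np.linspace(0.0, 1.0, 2001)
g = np.ones_like(beta)                       # uniform density (assumption)
G = cumulative_trapezoid(g, beta)            # c.d.f. of beta

# Hypothetical hump-shaped weights: rising up to beta_m, falling afterwards.
pi_raw = np.where(beta <= 0.5, 0.5 + 2.0 * beta, 1.5 - 2.4 * (beta - 0.5))
pi = pi_raw / cumulative_trapezoid(pi_raw * g, beta)[-1]   # average weight = 1

# p(beta): average of pi over the types below beta.
p = np.empty_like(beta)
p[1:] = cumulative_trapezoid(pi * g, beta)[1:] / G[1:]
p[0] = pi[0]                                 # limit value at the lower end

# Criterion from the text (no pooling): negative rates where p(beta) > 1.
negative_rates = p > 1.0 + 1e-9

print("p at the top of the beta range:", p[-1])
print("pi at the top of the beta range:", pi[-1])
print("negative rates start near beta =", beta[negative_rates][0])
print("max of pi (no pooling requires < 2):", pi.max())
```

In this assumed profile the weight of the top-$\beta$ agents is below the average weight, so $p$ rises above 1 before coming back to 1 at the upper end, which is the qualitative pattern described for the segment AB.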
The previous example has a degenerate distribution of $\alpha$ conditional on $\beta$. Proposition 6 gives an expression for the weight $\pi(\overline{\beta})$ in the situation where the distributions are smooth. There are two terms in the formula:

1. The first term is positive, from the standard utilitarianism motive of aversion to income inequality ($\partial \tilde{\pi}/\partial \beta \geq 0$). It is only equal to zero when $K'_V$ is independent of $V$, the cardinal utility is linear in $V$, and there is no desire for redistribution across the $\beta$ characteristic.

2. The second term, $-\partial \tilde{\pi}/\partial \alpha \; \partial F/\partial \beta$, cannot be signed in general. It is equal to zero in a number of cases, for instance if the two parameters are independently distributed ($\partial F/\partial \beta = 0$), or if the social weight does not depend on $\alpha$. Then the marginal tax rate is positive at the bottom of the income distribution. When $\alpha$ first order stochastically increases in $\beta$ ($\partial F/\partial \beta \leq 0$) and $\tilde{\pi}$ is non decreasing in $\alpha$, then the second term is positive, which yields the analog to Proposition 5 at the point $\overline{\beta}$.
In practice, for negative tax rates to be optimal, the second term must be negative and larger in absolute value than the first one. A special case where this is easier to achieve is when society has no aversion to income inequality, so that the only redistribution motive is linked to the $\alpha$ parameter, i.e. $K[V, \alpha] = K(\alpha)V$.

$^{11}$ Technically, given $\beta$, the distribution of $\alpha$ is degenerate: for $\beta \leq \beta_m$, $\alpha$ is equal to $\underline{\alpha}$ while for $\beta \geq \beta_m$, $\alpha$ is equal to $\beta\,\underline{\omega}^{1+\frac{1}{e}}$. In the latter interval, $\pi(\beta) = \tilde{\pi}(\beta, \alpha(\beta))$ can take any shape, depending on the specification of the cardinal utility.
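Then $\tilde{\pi}(\beta, \alpha) = K(\alpha)$, the first term vanishes and we get

$$\begin{aligned}
\pi(\overline{\beta}) - 1 &= -\iint \frac{\partial \tilde{\pi}}{\partial \alpha} \frac{\partial F}{\partial \beta}\, G(\beta)\, d\alpha\, d\beta \\
&= -\int_\alpha K'(\alpha) \left[ \int_\beta \frac{\partial F}{\partial \beta}\, G(\beta)\, d\beta \right] d\alpha \\
&= \int_\alpha K'(\alpha)\, [\Phi(\alpha) - F(\alpha|\overline{\beta})]\, d\alpha \\
&= \int_\alpha K(\alpha)\, [f(\alpha|\overline{\beta}) - \phi(\alpha)]\, d\alpha,
\end{aligned}$$

where $\Phi$ and $\phi$ denote respectively the cdf and the pdf of the marginal distribution of the parameter $\alpha$ in the economy. The social weight is smaller than 1, when the agents with low value of $\alpha$ are more numerous in the subpopulation of lowest income ($\beta = \overline{\beta}$) than in the economy as a whole.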
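A small discrete sketch may help to see how the last two lines of this computation work. Everything in it is hypothetical: three values of $\alpha$, three $\beta$ groups with assumed masses and conditional distributions, and weights $K(\alpha)$ taken to be increasing in $\alpha$ and normalised to average 1. With increasing $K$, the formula gives $\pi(\overline{\beta}) < 1$ when low values of $\alpha$ are over-represented in the top-$\beta$ group. The code evaluates $\pi(\overline{\beta}) - 1$ both from the densities and from the c.d.f.s (the discrete counterpart of the integration by parts).

```python
import numpy as np

alpha = np.array([1.0, 2.0, 3.0])            # hypothetical support of alpha
g = np.array([0.3, 0.4, 0.3])                # hypothetical masses of three beta groups
# Rows: conditional distribution of alpha in each beta group (assumed numbers);
# the last row is the group with the highest beta (lowest income).
f_cond = np.array([[0.1, 0.3, 0.6],
                   [0.3, 0.4, 0.3],
                   [0.6, 0.3, 0.1]])

phi = g @ f_cond                             # marginal distribution of alpha
K = alpha / (alpha @ phi)                    # increasing weights, normalised to mean 1

f_top = f_cond[-1]
gap_density = K @ (f_top - phi)              # sum over alpha of K * (f(.|top) - phi)

# Same quantity after summation by parts: increments of K times (Phi - F(.|top)).
Phi, F_top = np.cumsum(phi), np.cumsum(f_top)
gap_cdf = np.diff(K) @ (Phi - F_top)[:-1]

print("pi(top) - 1 from densities:", gap_density)
print("pi(top) - 1 from c.d.f.s  :", gap_cdf)
```

Both expressions coincide, and the sign is negative here because the assumed top group has more low-$\alpha$ agents than the population at large while $K$ increases with $\alpha$.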
Negative tax rates are therefore useful to redistribute away from socially undeserving members of the society with characteristics $(\beta, \alpha)$ when they are relatively more numerous at $\overline{\beta}$ than in society as a whole. This force works against the traditional Mirrlees redistributive motive, which must not be too large for negative rates to be optimal.
References

Akerlof, G. A. (1978): "The Economics of 'Tagging' as Applied to the Optimal Income Tax, Welfare Programs, and Manpower Planning," American Economic Review, 68(1), 8-19.

Atkinson, A. (1990): "Public economics and the economic public," European Economic Review, 34, 225-248.

Beaudry, P., and C. Blackorby (2004): "Taxes and Employment Subsidies in Optimal Redistribution Programs," Discussion paper, University of British Columbia.

Boadway, R., M. Marchand, P. Pestieau, and M. del Mar Racionero (2002): "Optimal redistribution with heterogeneous preferences for leisure," Journal of Public Economic Theory, 4(4), 475-498.

Choné, P., and G. Laroque (2007): "Should labor force participation be subsidized," Discussion paper, INSEE-CREST.

Diamond, P. (1998): "Optimal Income Taxation: An Example with a U-Shaped Pattern of Optimal Marginal Tax Rates," American Economic Review, 88, 83-95.

Hellwig, M. F. (2005): "A Contribution to the Theory of Optimal Utilitarian Income Taxation," Discussion paper, Max Planck Institute for Research on Collective Goods.

Jullien, B. (2000): "Participation Constraints in Adverse Selection Models," Journal of Economic Theory, 93, 1-47.

Kaplow, L., and S. Shavell (2001): "Any Non-welfarist Method of Policy Assessment Violates the Pareto Principle," Journal of Political Economy, 109(2), 281-286.

Luenberger, D. (1969): Optimization by vector space methods. Wiley, New York.

Mirrlees, J. (1971): "An Exploration in the Theory of Optimum Income Taxation," Review of Economic Studies, 38, 175-208.

Mirrlees, J. (1976): "Optimal Tax Theory: A Synthesis," Journal of Public Economics, 6, 327-358.

Rockafellar, T. (1970): Convex analysis. Princeton Univ. Press, Princeton.

Saez, E. (2002): "Optimal Income Transfer Programs: Intensive versus Extensive Labor Supply Responses," Quarterly Journal of Economics, 117, 1039-1073.

Salanié, B. (2002): "Optimal Demogrants with Imperfect tagging," Economics Letters, 75, 319-324.

Sandmo, A. (1993): "Optimal Redistribution When Tastes Differ," Finanzarchiv, 50(2), 149-163.

Seade, J. (1977): "On the Shape of Optimal Tax Schedules," Journal of Public Economics, 7, 203-236.

Seade, J. (1982): "On the Sign of the Optimum Marginal Income Tax," Review of Economic Studies, 49, 637-643.

Sen, A. K. (1982): "Equality of What?," in Choice, Welfare and Measurement, ed. by A. K. Sen, pp. 353-369. Cambridge University Press.

Werning, I. (2000): "An Elementary Proof of Positive Optimal Marginal Taxes," Discussion paper.
A Appendix

Proof of Lemma 1

The inequality constraints can be written $G(R) \leq 0$, where the map $G$ is given by $G(R) = u(R^\star; \theta) - u(R; \theta)$. The Lemma assumes that there exist functional spaces $X$ and $Z$ such that $G : X \to Z$ is Fréchet differentiable.

Giving one additional dollar to every agent in the economy makes all the inequalities slack, so that $u(R^\star + 1; \theta) - u(R^\star; \theta) < 0$. It follows that $R^\star$ is a regular point of the inequality constraint $G(R) \leq 0$.

Theorem 1 of Section 9.4 of Luenberger (1969) yields the existence of a Lagrange multiplier for the inequality constraints, which is denoted $z_0^\star$ in Luenberger's book and $\Pi$ in the present paper.$^{12}$ The multiplier belongs to the positive cone of $Z^\star$, the dual of $Z$.

The space $Z$ shall be chosen according to the specifics of each case under consideration. It will depend in particular on the utility function $u$. (For instance, $Z$ can be the space $L^2(\mathbb{R}_+)$ for a particular measure on $\mathbb{R}_+$, see examples below.) In any case, $Z^\star$ is included in a space of distributions, whose positive cone is made of nonnegative measures. Therefore the multiplier $\Pi$ is a nonnegative measure.
Proof of Proposition 1
Part 1 of the Proposition follows from the envelope theorem and from the fact that a maximum of affine functions is convex.

Turning to part 2, we write

$$V(\beta) = \max_{q \geq 0}\; R(v^{-1}(q)) - \beta q = \max_{q \geq 0}\; -\beta q - S(q), \qquad (14)$$

where $S(q) = -R(v^{-1}(q))$. Note that the function $V$ can be extended on the real line through $V(\beta) = +\infty$ for $\beta < 0$. Equation (14) expresses the fact that $V(-\beta)$ is the Fenchel-Legendre transform of $S(q)$. As shown in Rockafellar (1970), applying twice this transform yields the original function

$$S(q) = \max_{\beta}\; \beta q - V(-\beta) = \max_{\beta}\; -\beta q - V(\beta) = -\min_{\beta}\; \beta q + V(\beta)$$

or

$$R(v^{-1}(q)) = \min_{\beta}\; \beta q + V(\beta).$$

The minimum can be taken on $\beta \geq 0$ only, since $V(\beta) = +\infty$ for $\beta < 0$, which completes the proof of Proposition 1.
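The duality used in part 2 can be checked on a grid. The sketch below is only illustrative: it assumes the hypothetical convex function $S(q) = -2\sqrt{q}$, for which $V(\beta) = 1/\beta$ in closed form, and the grids for $q$ and $\beta$ are arbitrary choices. It first computes $V$ from (14) and then recovers $R(v^{-1}(q)) = 2\sqrt{q}$ as $\min_\beta\, \beta q + V(\beta)$.

```python
import numpy as np

q = np.linspace(0.0, 30.0, 3001)             # grid for q >= 0 (assumption)
beta = np.linspace(0.2, 5.0, 481)            # grid for beta > 0 (assumption)

S = -2.0 * np.sqrt(q)                        # hypothetical convex S(q) = -R(v^{-1}(q))

# V(beta) = max over q of [-beta*q - S(q)], for every beta on the grid.
V = np.max(-np.outer(beta, q) - S[None, :], axis=1)

# Recover R(v^{-1}(q)) = min over beta of [beta*q + V(beta)] at a few test points.
q_test = np.array([0.25, 1.0, 4.0])
recovered = np.min(np.outer(q_test, beta) + V[None, :], axis=1)

print("recovered R(v^{-1}(q)):", recovered)
print("closed form 2*sqrt(q) :", 2.0 * np.sqrt(q_test))
```

The two printed vectors agree up to the discretisation error of the grids. The recovery relies on $S$ being convex; for a non convex $S$ the double transform would return its convex envelope instead.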
$^{12}$ Actually, Luenberger's result only requires that $G$ is Gateaux differentiable and that the Gateaux differentials are linear in their increments.
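Lemma A.1. Let $f$ be in $L^1(\underline{\theta}, \overline{\theta})$ and $F$ be given by $F(\theta) = \int_{\underline{\theta}}^{\theta} f(t)\, dt$. Let $y$ be a nondecreasing and bounded function on $[\underline{\theta}, \overline{\theta}]$. Then the following integration by parts formula holds

$$\int_{\underline{\theta}}^{\overline{\theta}} f(\theta) y(\theta)\, d\theta = F(\overline{\theta}) y(\overline{\theta}) - F(\underline{\theta}) y(\underline{\theta}) - \int_{\underline{\theta}}^{\overline{\theta}} F\, dy, \qquad (15)$$

where $\int_{\underline{\theta}}^{\overline{\theta}} F\, dy$ is defined as a Riemann-Stieltjes integral, that is, as the limit of

$$S = \sum_{i=0}^{n} F(t_i)\, [y(\theta_{i+1}) - y(\theta_i)]$$

for any mesh $(\theta_0 = \underline{\theta}, \theta_1, \ldots, \theta_n, \theta_{n+1} = \overline{\theta})$ and any $t_i \in (\theta_i, \theta_{i+1})$, when the mesh size $\max_i |\theta_{i+1} - \theta_i|$ tends to zero.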
Proof of Lemma A.1
First note that the left hand side of Eq. (15) is well defined since the function $f y$ is Lebesgue integrable. Note also that the function $F$ is continuous and almost everywhere differentiable with $F' = f$ a.e.

A simple computation shows that

$$\begin{aligned}
S &= -F(t_0) y(\underline{\theta}) - y(\theta_1)[F(t_1) - F(t_0)] - \ldots - y(\theta_n)[F(t_n) - F(t_{n-1})] + F(t_n) y(\overline{\theta}) \\
&= -F(t_0) y(\underline{\theta}) + F(t_n) y(\overline{\theta}) - \sum_{i=1}^{n} y(\theta_i) \int_{t_{i-1}}^{t_i} f(t)\, dt.
\end{aligned}$$

By the Lebesgue Theorem, the last sum tends to $\int_{\underline{\theta}}^{\overline{\theta}} f(\theta) y(\theta)\, d\theta$ when the mesh size tends to zero, which (together with the continuity of $F$) gives (15).
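Formula (15) can be checked numerically on a case where $y$ has a jump, which is the situation the Riemann-Stieltjes formulation is meant to cover. The functions below are hypothetical choices made for this sketch only: $f(t) = 2t$ on $[0, 1]$, so that $F(\theta) = \theta^2$, and a step function $y$ jumping from 0 to 1 at $\theta = 0.5$. The mesh is uniform and the points $t_i$ are taken as cell midpoints.

```python
import numpy as np

def f(t):
    return 2.0 * t                           # hypothetical integrable f on [0, 1]

def F(t):
    return t ** 2                            # F(theta) = integral of f from 0 to theta

def y(t):
    return np.where(t >= 0.5, 1.0, 0.0)      # nondecreasing, bounded, with a jump

n = 4000
theta = np.linspace(0.0, 1.0, n + 1)         # uniform mesh (assumption)
t_mid = 0.5 * (theta[1:] + theta[:-1])       # one point t_i inside each cell

# Left hand side of (15): midpoint rule for the integral of f*y.
lhs = np.sum(f(t_mid) * y(t_mid) * np.diff(theta))

# Riemann-Stieltjes sum for the integral of F dy, then the right hand side of (15).
stieltjes = np.sum(F(t_mid) * np.diff(y(theta)))
rhs = F(1.0) * y(1.0) - F(0.0) * y(0.0) - stieltjes

print("left hand side :", lhs)
print("right hand side:", rhs)
```

The two sides differ only by a term that shrinks with the mesh size, since the Stieltjes sum picks up $F$ at a point close to the jump of $y$.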
Proof of Proposition 4
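Suppose first that $y$ is second best optimal. The derivative of the Lagrangian is

$$\langle dL, H \rangle = \int \left[ -\frac{1}{v'(y)} + \beta \right] \dot{H}\, dG(\beta) + \int \dot{H}\, (G - \Pi)\, d\beta.$$

Since the problem is concave, a function $V$ is the solution if and only if $\langle dL, H \rangle \leq 0$ for all admissible variations $\dot{H}$ (i.e., for all functions $\dot{H}$ such that $\dot{V} + \varepsilon \dot{H}$ is negative and non decreasing for $\varepsilon$ small enough).

When $y$ is strictly decreasing, $\langle dL, H \rangle$ must be zero for all $\dot{H}$ (since, in that case, $\dot{V}$ and $\dot{V} + \varepsilon \dot{H}$ are increasing for small $\varepsilon$). It follows that we have in the no pooling region

$$\Pi(\beta) = G(\beta) - g(\beta) \left[ \frac{1}{v'(y)} - \beta \right].$$

In a pooling interval $[\underline{\beta}_i, \overline{\beta}_i]$, the functions $y$ and $\dot{V}$ are constant and any $H$ such that $\dot{H}$ is decreasing is not an admissible test function (since $\dot{V} + \varepsilon \dot{H}$ is decreasing in $[\underline{\beta}_i, \overline{\beta}_i]$). It is easy to check that if $H$ satisfies

$$\dot{H} = \begin{cases} 1 & \text{in } [\underline{\beta}_i, \overline{\beta}_i] \\ 0 & \text{otherwise,} \end{cases} \qquad (16)$$

then $H$ and $-H$ are admissible variations, so we must have: $\langle dL, H \rangle = 0$. It follows that

$$\int_{\underline{\beta}_i}^{\overline{\beta}_i} \left\{ G(\tilde{\beta}) - g(\tilde{\beta}) \left[ \frac{1}{v'(y)} - \tilde{\beta} \right] \right\} d\tilde{\beta} = \int_{\underline{\beta}_i}^{\overline{\beta}_i} \Pi(\tilde{\beta})\, d\tilde{\beta}. \qquad (17)$$

Now if $H$ satisfies

$$\dot{H}(\tilde{\beta}) = \begin{cases} -1 & \text{for } \tilde{\beta} < \beta \text{ in } [\underline{\beta}_i, \overline{\beta}_i] \\ 0 & \text{for } \tilde{\beta} > \beta \text{ in } [\underline{\beta}_i, \overline{\beta}_i] \end{cases} \qquad (18)$$

for some $\beta \in [\underline{\beta}_i, \overline{\beta}_i]$, then $H$ is admissible (but $-H$ is not) and we must have: $\langle dL, H \rangle \leq 0$. It follows that

$$\int_{\underline{\beta}_i}^{\beta} \left\{ G - g \left[ \frac{1}{v'(y_i)} - \tilde{\beta} \right] \right\} d\tilde{\beta} \geq \int_{\underline{\beta}_i}^{\beta} \Pi(\tilde{\beta})\, d\tilde{\beta}. \qquad (19)$$

The conditions (17) and (19) are equivalent to the necessity part of the proposition.

The sufficient part follows from the fact that conditions (17) and (19) are equivalent to $\langle dL, H \rangle \leq 0$ for all admissible variations $H$, since the set of nonincreasing functions $\dot{H}$ on $[\underline{\beta}_i, \overline{\beta}_i]$ is generated by the set of functions $H$ satisfying (16) and (18).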
Construction of social weights associated to an optimal allocation
[Figure 3: Pooling in the intensive case. The function $Y(\gamma)$ is plotted against $\gamma$: $Y$ is convex outside the pooling interval (where $y$ is constant), and a convex function lies below $Y$ (and $Y^\star$) on that interval.]
When there is no pooling (case of Proposition 2), there is a one-to-one relationship between the allocation $y$ and the distribution of social weights $\Pi$. This is not the case in general. In this subsection, we explain how to construct social weights for which a given allocation is optimal.
To this aim, we first present a geometric interpretation of Proposition 4. First observe that the function $Z(\beta) = \int_{\underline{\beta}}^{\beta} \Pi(\tilde{\beta})\, d\tilde{\beta}$ is convex in $\beta$, its derivative $\Pi$ being between 0 and 1. Let $Y$ be defined by

$$Y(\beta) = \int_{\underline{\beta}}^{\beta} \left\{ G(\tilde{\beta}) - g(\tilde{\beta}) \left[ \frac{1}{v'(y(\tilde{\beta}))} - \tilde{\beta} \right] \right\} d\tilde{\beta}.$$

According to Proposition 4, for an allocation $y$ to be optimal, there must exist a convex function $Z$ with slope between 0 and 1 such that $Y \geq Z$, and $Y = Z$ when $y$ is strictly increasing. In other words, $Y$ must be convex outside pooling intervals and bounded below by a convex function with slope between 0 and 1 in such intervals.
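This geometric condition can be checked numerically once $Y$ is sampled on a grid, by computing the largest convex function lying below the sampled points. The sketch below uses a standard lower convex hull scan; the function name `lower_convex_envelope` and the five sample values of $Y$ are hypothetical and chosen only for illustration. It returns the envelope, the grid points where $Y$ lies strictly above it (candidate pooling points), and the slopes of the envelope, which should lie between 0 and 1.

```python
import numpy as np

def lower_convex_envelope(x, y):
    """Largest convex function below the points (x[i], y[i]), evaluated at x.

    x must be strictly increasing. Returns the envelope values and the
    indices of the points that are vertices of the envelope.
    """
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Cross product <= 0 means b lies on or above the chord from a to i.
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull]), hull

# Hypothetical sampled values of Y on a grid of beta (not from the paper).
beta = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
Y = np.array([0.0, 0.02, 0.20, 0.22, 0.45])

Y_star, vertices = lower_convex_envelope(beta, Y)
pooled = Y > Y_star + 1e-12                  # points where Y is strictly above its envelope
slopes = np.diff(Y_star) / np.diff(beta)     # slopes of the envelope, cell by cell

print("envelope:", Y_star)
print("vertices:", vertices)
print("candidate pooling points:", beta[pooled])
print("slopes in [0, 1]:", bool(np.all((slopes >= 0) & (slopes <= 1))))
```

In this assumed example the sampled $Y$ fails to be convex around $\beta = 0.5$, so the envelope replaces it by a chord there, while it coincides with $Y$ at the other grid points.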
The largest convex function below $Y$ is, by definition, its convex hull $Y^\star$. Then $y$ is second best optimal if and only if the slope of $Y^\star$ is in $[0, 1]$ and $Y = Y^\star$ outside the pooling intervals; the derivative of $Y^\star$ is the cumulative distribution function of a social weight distribution for which the allocation $y$ is optimal.

The distribution of social weights $\Pi$ is unique outside pooling intervals, but it is not unique in the pooling intervals. Indeed there might exist other convex functions $Z \leq Y^\star \leq Y$, with $0 \leq Z' \leq 1$ and $Y = Z$ whenever $y$ increases. The derivatives of such functions $Z$ also correspond to c.d.f. of social weight distributions for which the allocation $y$ is optimal. The lowest function $Z$, represented on Figure 3 by the dotted line, corresponds to the supremum of two segments; in the pooling interval, the support of the associated distribution $\Pi$ consists of one unique point.

Proof of Proposition 6

We first show that $\pi(\overline{\beta}) < 1$, i.e. $p(\beta) > 1$ for $\beta$ close to $\overline{\beta}$, is a necessary condition for negative marginal tax rates when there is pooling.
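Let $[\beta_0, \overline{\beta}]$ be the pooling interval. From Proposition 4, equation (11), it follows that

$$\int_{\beta}^{\overline{\beta}} G(\tilde{\beta}) \left[ 1 - p(\tilde{\beta}) \right] d\tilde{\beta} \leq \int_{\beta}^{\overline{\beta}} g(\tilde{\beta}) \left[ \frac{1}{v'(y(\tilde{\beta}))} - \tilde{\beta} \right] d\tilde{\beta} \qquad (20)$$

for all $\beta \geq \beta_0$ (note that the interval ends at the top, which reverses the inequality compared with the Proposition). For $\beta$ sufficiently close to $\overline{\beta}$, a negative marginal tax rate, $\frac{1}{v'(y(\overline{\beta}))} - \overline{\beta} < 0$, implies that the left hand side be negative, that is $p(\beta) > 1$.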
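The expression for $\pi(\overline{\beta})$ is derived from the following computation:

$$\pi(\beta) = \int_\alpha \tilde{\pi}(\beta, \alpha)\, dF(\alpha|\beta) = \tilde{\pi}(\beta, \overline{\alpha}) - \int_\alpha \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) F(\alpha|\beta)\, d\alpha$$

whence, for $\beta = \overline{\beta}$,

$$\pi(\overline{\beta}) = \int_\alpha \tilde{\pi}(\overline{\beta}, \alpha)\, dF(\alpha|\overline{\beta}) = \tilde{\pi}(\overline{\beta}, \overline{\alpha}) - \int_\alpha \frac{\partial \tilde{\pi}}{\partial \alpha}(\overline{\beta}, \alpha) F(\alpha|\overline{\beta})\, d\alpha.$$

It follows that

$$\begin{aligned}
1 = \int_\beta \pi(\beta) g(\beta)\, d\beta &= \int_\beta \tilde{\pi}(\beta, \overline{\alpha}) g(\beta)\, d\beta - \iint \frac{\partial \tilde{\pi}}{\partial \alpha} F(\alpha|\beta) g(\beta)\, d\alpha\, d\beta \\
&= \tilde{\pi}(\overline{\beta}, \overline{\alpha}) - \int_\beta \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \overline{\alpha}) G(\beta)\, d\beta - \iint \frac{\partial \tilde{\pi}}{\partial \alpha} F(\alpha|\beta) g(\beta)\, d\alpha\, d\beta.
\end{aligned}$$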
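By difference, we get:

$$\begin{aligned}
\pi(\overline{\beta}) - 1 &= \int_\beta \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \overline{\alpha}) G(\beta)\, d\beta \\
&\quad + \int_\beta \left[ \int_\alpha \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) F(\alpha|\beta)\, d\alpha - \int_\alpha \frac{\partial \tilde{\pi}}{\partial \alpha}(\overline{\beta}, \alpha) F(\alpha|\overline{\beta})\, d\alpha \right] g(\beta)\, d\beta \\
&= \int_\beta \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \overline{\alpha}) G(\beta)\, d\beta \\
&\quad - \int_\beta \left[ \int_\alpha \frac{\partial^2 \tilde{\pi}}{\partial \alpha \partial \beta}(\beta, \alpha) F(\alpha|\beta)\, d\alpha + \int_\alpha \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) \frac{\partial F}{\partial \beta}(\alpha|\beta)\, d\alpha \right] G(\beta)\, d\beta \\
&= \int_\beta \int_\alpha \left[ \frac{\partial \tilde{\pi}}{\partial \beta}(\beta, \alpha) \frac{\partial F}{\partial \alpha}(\alpha|\beta) - \frac{\partial \tilde{\pi}}{\partial \alpha}(\beta, \alpha) \frac{\partial F}{\partial \beta}(\alpha|\beta) \right] d\alpha\, G(\beta)\, d\beta,
\end{aligned}$$

which yields (13).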