So close yet so unequal: Reconsidering spatial inequality in U.S. cities

So close yet so unequal: Reconsidering spatial
inequality in U.S. cities
Francesco Andreoli∗
Eugenio Peluso†
Preliminary version: October 9, 2016
Abstract
We study spatial inequality through new Gini-type indices that capture inequality
in the distribution of incomes both within and between individual neighborhoods
of variable size. We characterize redistribution schemes that reduce citywide income inequality along with spatial inequality. We apply this new methodology on
a rich income database covering the 30 largest U.S. cities from 1980 to nowadays.
We identify remarkable similarities in spatial inequality patterns across these cities,
summarized by few stylized facts: within neighborhood inequality is substantial
irrespectively of the neighborhood size, and generally stable over time, while inequality between individual neighborhoods strongly depends on neighborhood size
and it has been on the rise during the last 35 years.
Keywords: Gini, neighborhood, between inequality, variogram, geostatistics, flat tax,
census, ACS, urban, U.S.
JEL codes: C34, D31, H24, P25.
∗
†
Luxembourg Institute of Socio-Economic Research, LISER. Email: [email protected].
DSE University of Verona
1
1
Introduction
The uprising economic debate on the spatial component of income inequality in the U.S.
has made clear that not all cities are made equal. Despite large national income inequality,
there is great heterogeneity across U.S states and across their major cities (Chetty, Hendren, Kline and Saez 2014). Inequality in some cities has skyrocketed in the last decades,
while inequality has remained relatively low in others. For instance, the Gini index of
disposable equivalent household income in New York City in 2014 is larger than 0.5, while
the figure is much closer to, and even below 0.4 in other large cities like Washington (DC).
This heterogeneity, which can be explained by differences in skills and human capital composition across cities (Glaeser, Resseger and Tobio 2009), bears important consequences
for local intervention, for targeting program participation and for designing federal to
local redistribution of resources (Sampson 2008, Reardon and Bischoff 2011).
What this picture misses is that not all neighborhoods of the same city are made
equally unequal. Works at the frontier of economics, sociology and urban geography have
recognized that neighborhood inequality is generally not representative of citywide inequality in the U.S. The discrepancy between local inequalities and citywide inequality, reflecting association between incomes and location of income holders, has been addressed to
as urban spatial inequality. Various contributions have proposed to assess spatial inequality by using the administrative partition of the urban space to identify neighborhoods,
and to measure it as the share of citywide inequality due to differences across neighborhoods (Shorrocks and Wan 2005, Dawkins 2007, Wheeler and La Jeunesse 2008, Kim and
Jargowsky 2009). Evaluations based on this approach, however, have been criticized for
putting the administrative neighborhood and not the individual, who is responsible for
localization decisions, at the center.
Following the geographic approach (Galster 2001, Clark, Anderson, Östh and Malmberg
2015), we consider the notion of individual neighborhood, namely the group of neighbors
located within a given distance range from an individual, as the relevant concept of neighborhood. On the one hand, individual neighborhoods are meant to capture the space in
2
which different factors play a role in determining the pattern of location of rich and poor
people within the urban space, and that the contribution of these factors can be captured
by information on the incomes of one’s neighbors. On the other hand, the individual
neighborhood is a meaningful representation of the space in which local inequality may
affect individuals, for instance because individuals care about the income composition of
the neighborhood (generating sorting patterns à la Schelling 1969), or via externalities
(also called neighborhood effects, see Durlauf 2004). Social experiments such as Moving
to Opportunity, for instance, leverage on peer effects in the hope of improving the situation of the poor (Sampson 2008, Ludwig, Duncan, Gennetian, Katz, Kessler, Kling and
Sanbonmatsu 2013) and of his offspring (Chetty, Hendren and Katz 2016).
In this paper, we study inequalities that take place at the urban level. Our contribution
integrates the notion of individual neighborhood in inequality measurement and develops
in three interconnected directions.
Our first concern is positive, and has to do with how spatial inequality should be
modeled and measured. We develop a new notion of spatial inequality where individuals
(and not the administrative partition of the urban space) are put at the central stage and
spatial inequality is assessed by using information on the distribution of other income
recipients on a reference space around them. We parametrize the comparison set by
using the geographic distance around each individual, such that the spatial pattern of
inequality is accounted by a single and continuous parameter, notably the size of individual
neighborhoods. In this way each individual neighborhood identifies the potential peers
which the individual is exposed to.
If there is association between space and income, we expect that the average income
of the neighbors varies across individuals depending on their location. Otherwise, each
individual’s neighborhood is representative of the city income distribution, implying no
heterogeneity across individual neighborhoods. This point is made clear by figure 1. City
A and City B are two stylized cities displaying the same overall incomes inequality. The
two cities differ only for the location of the middle-class income recipient. In City A,
the middle class individual is located close to the poor individual, while in City B he is
3
$4
$4
$2
$2
$1
i
$1
j
k
i
City A
j
k
City B
Figure 1: Spatial distribution of three individuals (the rich, the middle-class, and the
poor) with their incomes (vertical spikes) in two linear cities of length 3d.
located closer to the rich individual. Consider now the individual neighborhoods of size d,
obtained by drawing a circle of ray d around each individual. Simple calculations reveal
that there is more inequality between neighborhoods in City B compared to City A. This
happens because the rich individual, who has an income substantially higher than the
poor and middle-class income, is isolated in City B. The ”between” dimension of spatial
inequality then captures the extent of inequality between individual neighborhoods.1
Heterogeneity of neighborhoods affluence overlooks heterogeneity within the neighborhoods. Figure 1 suggests that the degree of heterogeneity at which individuals are
exposed to is substantially larger in City A compared to City B. This occurs because
the rich and the middle-class individuals, whose incomes are very unequal, are very close
neighbors only in City A, while in City B the middle-class person is a close neighbor
of the poor person, whose income is just smaller than the middle-class income. Being
exposed to peers who are either very affluent or very deprived might represent the society
as a very risky one, where either success comes with very high income or failure comes
with poverty. The ”within” dimension of spatial inequality captures the average degree
of heterogeneity in peers’ incomes within individual neighborhoods of fixed size.
Assessing spatial inequality along these two dimensions provides relevant information
to support opposite views about the effects of (local) inequality: if negative externalities
arise from deprivation feeling and envy (Luttmer 2005), then the distance between rich and
1
This perspective becomes relevant, for instance, if individual opportunities are proxied by the average
income among the peers, a common perspective in peer effects literature.
4
poor people could mitigate part of these externalities, making City A the preferred one.
However, the proximity among people from different social status might rise ambitions
and opportunities of the poor and also benefit the rich (Ellen, Mertens Horn and O’ Regan
2013). In this case, City B might be considered as the favored one.
To quantify the extent of spatial inequality along these two dimensions, we introduce
the Gini Individual Neighborhood Inequality (GINI) indices. The GINI between index is
the Gini inequality index in the population, where individual incomes are replaced by
individual neighborhood average incomes. The size of the individual neighborhoods is
parametrized by d, the distance between the income recipients on the urban space.2
To assess the within perspective of inequality, we rely on the probabilistic interpretation of the Gini index pioneered in Pyatt (1976). Pyatt shows that the Gini inequality
coefficient can be written as the average of the gains and losses that each individual might
realize if he had to exchange his own income with the income of any other income recipient randomly drawn from the population. To assess spatial inequality, we imagine that
these comparisons are confined to individual neighborhoods rather than to the overall
population. This gives the GINI within index, also parametrized by a distance parameter
capturing the extent of individual neighborhood. We formalize these indices in Section
2, where we also derive spatial inequality curves, depicting the evolution of the GINI
indices when d varies. We argue in a dedicated section of the paper that the traditional
approach to spatial inequality has some conceptual limitations, and we motivate in what
sense our proposed measurement setting can be seen as superior. We also uncover relations of the GINI indices with tools developed in geostatistics literature, such as the
variogram function (Section 4), and commonly adopted to model spatial dependence in
random processes. A methodological appendix develops innovative asymptotic results
based on stationarity assumptions that are common in geostatistics.
Our second concern is normative, and has to do with the implications of redistribution
on inequality when the spatial dimensions of the income distribution is accounted for. We
recognize that changes in overall inequality might not necessarily distribute evenly across
2
For a discussion about the use of alternative notions of distance, see Conley and Topa (2002).
5
the urban space. Taking the location of the individuals as given, we study the features of
the redistribution scheme that, when applied to gross incomes in the city, produce postfisc incomes that display less citywide and less spatial inequality. Results are developed
in Section 3.
Our third and last concern is empirical. We make use of rich census data on income
distributions in 30 largest U.S. cities over four decades to assess the pattern of spatial
inequality in American urban areas (Section 5). Overall, we find very strong resemblances
in patterns of spatial inequality among the cities considered in the study. We conclude
that the pattern of localization of income in a city has little to do with the differences in
the organization of the urban space of the cities or with differences in citywide income
distributions, and more with the preferences of inhabitants for the relative composition
of their neighborhood and with costs and benefit these preferences might give rise to.
Section 6 concludes drawing lines of future research on the issue.
2
Spatial inequality measurement
2.1
The GINI indices
We consider a population of n ≥ 3 individuals, indexed by i = 1, ..., n. Let yi ∈ R+ be the
income of individual i and y = (y1 , y2 , ..., yn ) the income vector with average µ > 0. A
P P
popular measure of inequality in y is the Gini index, defined as G(y) = 2n12 µ i j |yj −yi |.
We focus on inequality within the city, and we assume that information on the income
distribution comes with information about the location of each income recipient in the
urban space. We simply use income distribution to refer to the income-location distribution of individuals on the city map. Building on this logic, we put each individual at
the center stage and we identify her own set of neighbors, whose composition depends
on the distribution of people in the relevant space. To define individual neighborhoods,
we consider the spatial distance between pairs of individuals. The set containing all the
neighbors placed within a distance ray d from individual i is denoted di , such that j ∈ di
6
if the distance between individuals i and j is less than, or equal to d. We also denote nid
the cardinality of di , that is the number of people living within a range d from i (including
i). The average affluence of individual i’s neighborhood of length d is then measured by
P
µid =
j∈di
nid
yj
.
To assess inequality between individual neighborhoods we compute the Gini index for
the vector (µ1d , . . . , µnd ). The elements of these vectors depend upon individuals’ location
and proximity. For instance, if a rich individual lives near to many poor individuals, her
income contributes in rising the mean income in all their neighborhoods, while if she lives
isolated, her income does not generate any positive effect on other people’s neighborhoods,
provided that the notion of individual neighborhood is sufficiently exclusive. This implies
that the average value of the vector (µ1d , .., µnd ), denoted µd , generally differs from µ. The
between dimension of spatial inequality is captured by the Gini Individual Neighborhood
Inequality between index GIN IB , defined as:
GIN IB (y,d) =
1 XX
|µid − µjd |.
2n2 µd i j
As expected, GIN IB (y, d) ∈ [0, 1] for any y and d. The index is equal to G(y) at a zerodistance and whenever all incomes within each individual neighborhood of length d are
equal. GIN IB converges to zero when d approaches the size of the city. The in-between
pattern depends on the association between incomes and locations.
We also propose a GINI within index that captures the within component of spatial
inequality. Inspired by the work of Pyatt (1976), who provides a probabilistic interpretation to the Gini inequality index, we consider, for each individual i, the average absolute
income differences with respect to her neighbors, normalized by the average income in the
(individual) neighborhood. This quantity captures the extent of gains and losses that i
would expect if her income was replaced with the income of the neighbors identified by
d. It can be computed as follows:
∆i (y, d) =
1 X |yi− yj |
.
µid j∈d nid
i
7
Notice that, given the relevant notion of individual neighborhood parametrized by d,
individual i has 1/nid chances of interacting with one of his neighbors. This probability
also changes across individuals, despite uniform weighting. We define the GINI within
index as the average of ∆i across the whole population:
n
X
1
GIN IW (y, d) =
∆i (y, d).
n
i=1
The GIN IW index hence captures the overall degree of inequality we would observe if
income comparisons were limited only to neighbors located at a distance smaller than d.
It inherits some properties of the Gini index. It is bounded, with GIN IW (y, d) ∈ [0, 1] for
any y and d. Moreover, it is endowed with specular properties with respect to GIN IB :
for instance, GIN IW (y, d) = 0 if and only if all incomes within individual neighborhoods
of length d are equal. Notice that this cannot exclude inequalities among people located
at a distance larger than d. Additionally, GIN IW (y, d) can take on values that are either
higher or smaller than G(y). When d reaches the size of the city, then spatial inequality
ends up coinciding with overall inequality, that is GIN IW (y, ∞) = G(y).
2.2
The spatial inequality curves
Computing GIN I indices for different values of d and plotting their values on a graph
with d on the horizontal axes provides a simple and insightful picture of spatial inequality
patterns. The curve interpolating all these points is called the spatial inequality curve
generated by either the GINI within or the GINI between index. More precisely, the
curve originated by GIN IB takes the same value of the Gini index when each individual
is considered as isolated (that is, when d = 0) and approaches 0 when each individual
neighborhood includes the whole city. The curve originated by GIN IW can exhibit a less
regular shape. First, it can locally decrease or increase in d accordingly to the spatial
dependency of incomes. Second, when each individual neighborhood coincides with the
whole city, GIN IW (y, d) approaches G(y). Third, the graph of GIN IW (y, d) can be
flat for any d, meaning that incomes are randomized across locations and the spatial
8
component of inequality is irrelevant. Fourth, if the curve is increasing with d, then
individuals with similar incomes tend to sort themselves in the city, generating income
segregation. The shape of the spatial inequality curves are also informative on the degree
at which citywide income inequality can be correctly inferred from randomly sampling
individuals.3
For a given size of the individual neighborhood, spatial inequality comparisons can be
carried over by looking at the level of the GINI within or between index corresponding
to the selected distance parameter. These evaluations generate a complete ranking of the
income distributions, but the robustness of the ranking is disputable vis-à-vis the size
of the neighborhood (generally unknown). We design GINI between and within partial
orders that are based on comparisons of spatial inequality curves.
Definition 1 Given the income distributions y and y0 , we say that:
a) y0 dominates y in terms of between individual neighborhood inequality, and we write
y0 GB y, if and only if GIN IB (y0 , d) ≤ GIN IB (y, d) for any distance d ≥ 0.
b) y0 dominates y in terms of within individual neighborhood inequality, and we write
y0 GW y, if and only if GIN IW (y0 , d) ≤ GIN IW (y, d) for any distance d ≥ 0.
Both within and between dimensions of spatial inequality dominance always imply
dominance in terms of citywide inequality, as captured by the Gini coefficient. The two
dominance criteria can be hence seen as tests for assessing the degree at which changes
in citywide distributions are robust to the spatial component of inequality. We do not
attribute a normative value to the spatial inequality dominance per se, as long as it might
reflect both effects of changes in the income distribution, as well as the implications of
changes in the sorting dynamic of rich and poor people within the same city. While redistribution policies have to do with income recipients budgets and can be motivated on
welfaristic or equity concerns, sorting dynamics is guided by preferences, hence complicating the normative judgement about changes in spatial inequality. It is nevertheless
3
When the role played by space is negligible, i.e., the spatial inequality curves are rather flat, we
expect that any random sample of individuals taken from a given point in the space is representative of
overall inequality. When space is relevant and people locations are stratified according to income, then a
sample of neighbors randomly drawn could underestimate the level of citywide inequality.
9
interesting to characterize redistribution schemes on the bases of their implications for
spatial inequality changes, when the location of individuals is taken as given.
2.3
Discussion
Inequality literature largely agrees that any valuable income inequality measure should
satisfy some normatively relevant properties (see Cowell 2000): (i) invariance with respect
to population replication; (ii) invariance to the measurement scale; (iii) anonymity, that
is, invariance to any permutation of the incomes recipients names; (iv) the Pigou-Dalton
principle, recognizing that every rich-to-poor income transfer unambiguously reduces inequality. Measuring spatial inequality means accounting for the joint information about
the distribution of incomes and the location of individuals in an urban space. While
properties (i) and (ii) have desirable implications also for the measurement of spatial
inequality, namely that populations of different sizes and affluence can be made comparable, anonymity conflicts with the idea that individual locations matter in inequality
evaluations. A configuration where a rich person lives nearby rich people should not necessarily yield the same degree of spatial inequality as a configuration where poor and rich
people are neighbors, provided that incomes of rich and poor people respectively coincide
in the two situations. Based on this arguments, it is not clear whether all permutations
of incomes across individuals, whose location on the urban space is given, are sources of
indifference for inequality assessments. Hence, anonymity should be relaxed in spatial
inequality assessments. The existing literature does so in a very peculiar way.
The spatial dimension of inequality is usually associated with the magnitude of the
between groups inequality, where the partition of the population rests on the administrative neighborhoods where individuals reside: urban blocks, tracts, etc (see Shorrocks
and Wan 2005). One common trait in this literature is the idea of partitioning the city
into a pre-determined spatial grid, defining neighborhoods, and grouping individuals living in the city according to the neighborhood of residence. Dawkins (2007) and Kim
and Jargowsky (2009) propose to decompose overall inequality in the within and between
10
neighborhood components, and to assess spatial inequality as the contribution of the between component, capturing differences across neighborhoods, relative to the overall city
inequality. Reardon and Bischoff (2011) builds on this approach, but focus instead on
the degree of disproportionality of rich and poor individuals across neighborhoods. Their
approach captures a particular aspect of spatial inequality, called income segregation (as
the measurement apparatus bears from the racial segregation literature), which is insensitive to the overall income distribution in the city (rich and poor groups are defined on
the basis of the ranks of individuals in the overall population).
The aforementioned approaches retain anonymity at two levels: first, among individuals living in same neighborhoods; second, when aggregating summary measures associated
to the representative individuals of different neighborhoods (average income or dispersion
in her neighborhood). Retaining anonymity at both levels implies that current spatial
inequality measures are not robust to the underlying partition of the urban space (the
so-called Modifiable Areal Units Problem, MAUP, see Openshaw 1983, Wong 2009). More
precisely, results are sensitive to the initial scale and to the shape of the partition (respectively, the scaling and the zoning components of the MAUP). To overcome this issues,
propositions have been made in the literature that involve assessing inequality between
neighborhood at different scales of aggregation of the initial partition (Shorrocks and
Wan 2005, Wheeler and La Jeunesse 2008). This procedure, however, implies strengthening the implications of anonymity within the neighborhood, as the partition of the urban
space gets courser. This is undesirable. Another potential pitfall of this approach to
spatial inequality is that measured inequality relies too strongly on the initial partition of
the urban space. Based on U.S. census data Hardman and Ioannides (2004) and Wheeler
and La Jeunesse (2008) evaluate spatial inequality at different spatial scales. Both works
recognize the importance of incorporating overall inequality in the evaluation of spatial
inequality, to properly contextualize the relevance of the within and between component
in comparisons across cities. They also propose to investigate how patterns of spatial
inequality change when pre-determined neighborhoods are aggregated in larger spatial
units. The procedure is motivated from the need of accounting for the size of the space
11
of individual interactions.
Our approach to spatial inequality measurement, based on the GINI indices, fully incorporates this logic by considering individual neighborhoods and the income distribution
therein as the primary information about the spatial location of individuals. We do not
assume representative individuals within administrative areas of the city. In the logic of
construction of individual neighborhoods, the fact that individual k is neighbor of individual i and of individual j does not imply that i and j are also neighbors. This logic
discards anonymity within the neighborhoods, which is retained only to assess inequality
in the summary measures (means or dispersion) associated to each individual neighborhood. On these grounds, we claim that our approach is robust to space partitioning, while
the scale of analysis can be controlled for by constructing more or less inclusive individual
neighborhoods.
Along with anonymity, every inequality measure should be sensitive to the PigouDalton principle. Traditional inequality measures should always decrease when income is
transferred from the rich to poor individual. Whether this principle is desirable or not in
a context where anonymity is (partially or completely) dropped, is an issue. For instance,
transferring money from a rich individual in a poor neighborhood to a poor individual in a
rich neighborhood arguably increases between neighborhood inequality. In line with this
objection, most of the spatial inequality indices proposed so far are not consistent with the
implications postulated by rich-to-poor transfers. Rather, they are consistent with more
demanding notions of income transfers involving information about the position (i.e., the
neighborhood), where the poor and rich person live.4 Our approach is even more robust
to the implications of anonymity than the traditional approach. Hence, the implications
of rich-to-poor transfers on GINI within and between indices are even more ambiguous.5
Performing rich-to-poor transfers in this setting may be extremely information demanding.
4
For instance, PD transfers do not increase the spatial inequality when they take place within the
neighborhood, while PD transfers takeing place between neighborhoods decrease spatial inequality only
when income is transferred from a rich person living in a rich neighborhood to a poor person living in a
a poor neighborhood.
5
This occurs, for instance, as long as the same rich neighbor in a rich individual neighborhood is also
the poor neighbor in another relatively richer individual neighborhood.
12
We rather focus on the implications of redistribution schemes on inequality and spatial
inequality.
3
Spatial inequality reducing income redistribution
In this section, we shift the focus from rich-to-poor transfers, involving individual comparisons of incomes, to less sophisticated redistribution schemes that apply to incomes
and that are common to the citywide distribution, but do not require the knowledge of
the position of individuals within the city. More precisely, we investigate redistributive
schemes that are able to reduce within or between spatial inequality, according to the
dominance criteria defined in terms of GINI indices. To investigate the effect of redistributive policies on spatial inequality, we consider policies generating transformations of
incomes such as tax-benefit rules, involving the whole population.
Definition 2 A redistributive scheme is a mapping f : R+ → R+ that associates a postredistribution income f (y) to any y.
Notice that the final income f (y) only depends on the initial income y, while other
characteristics such as location are neglected. Let F be the general set of redistribution
schemes defined on R+ . We restrict our attention rank-preserving distributive schemes,
restricting the domain to cases where f is non-decreasing over R+ and the ranking of
individuals after the policy is the same of the pre-tax income distribution. This requirement is justified by the need of keeping incentives from relocations of individual across
the city, related to differential in utilities, as constant (provided all other features of the
city remain constant). Let us denote F ∗ the set of rank-preserving redistributive schemes
defined on R++ that admit left and right derivatives over the entire domain. The following
proposition identifies the largest subset of redistribution schemes in F ∗ that are able to
reduce spatial inequality according to the within perspective.
Lemma 1 Given f ∈ F ∗ , the following claims are equivalent:
13
i) f (y1 ), . . . , f (yn ) GW y for any y ∈ Rn+ and for any location of the n individuals.
ii) f is star-shaped from above (ssa), that is
f (y)
y
is decreasing for any positive y.
Proof. i)⇒ii). This is due to Proposition 2.1.a and Remark 2.4. p.279 in Moyes (1994),
which proves that a ssa redistribution scheme applied to the whole population reduces
inequality in the relative Lorenz sense within any possible group of individuals of the initial
population (and even when groups are overlapping). The Gini index then decreases as well
within any possible group (notice that we do not refer to groups generating a partition of
the whole population).
ii)⇒i). Consider a neighborhood containing only the two poorest individuals, endowed
with incomes y and y + h. To reduce the Gini coefficient after applying the redistributive
scheme f we need
f (y+h)−f (y)
f (y+h)+f (y)
<
h
.
2y+h
By taking the limit for h → 0, we get f 0 (y) <
f (y)
.
y
Since the sum of ssa functions is a ssa function, a simple corollary of the previous
proposition is that applying any progressive tax schedule t (star shaped from below,
i.e.,
t(y)
y
is increasing) and redistributing the tax liability through a star-shaped function
reduces the within neighborhood inequality everywhere.
The following proposition provides an answer for the GINI between case by further
restricting the class of admissible redistribution schemes.
Lemma 2 Given f ∈ F ∗ , the following claims are equivalent:
i) f (y1 ), . . . , f (yn ) GB y for any y ∈ Rn+ and for any location of the n individuals.
ii) f is an affine function f (y) = αy + β, with
β
α
≥ 0.
PP
Proof. ii)⇒i). This is immediate, since GIN IB (y,d) = n21µd
j |µid − µjd |, whilst for
i
P
P
y0 it holds that GIN IB (αy + β,d) = n2 µ1+ β
j |µid − µjd | and β/α ≥ 0.
( d α) i
i)⇒ii). First, observe that the same proof of the within case can be applied to show the
necessity of progressive redistributive schemes. Consider a city with a single individual
14
with income ȳ living in a neighborhood of ray d and two individuals with income y1
and y2 , such that ȳ =
y1 +y2
2
who live in a similar neighborhood but separated from the
first one. The GIN Ib is then equal to zero. Requiring the same value after redistribution
2
)=
implies: f ( y1 +y
2
f (y1 )+f (y2 )
.
2
This is the Jensen’s functional equation, that is well-known
to admit only affine solutions (see Theorem 1, p.43 in Aczel 1966).
The two lemmas altogether can be combined to provide an intuitive proof of an important result, which we highlight in the following theorem.
Theorem 3 There is less citywide and less spatial inequality in post-redistribution incomes compared to pre-redistribution incomes, i.e. f (y1 ), . . . , f (yn ) GB (GW )y for any
initial distribution y ∈ Rn+ and for any location of the n individuals, if and only if f is a
basic income-flat tax (BIFT) scheme.
Proof. The result follows from Lemma 1 and 2 by imposing the government budget
P
P
constraint i f (yi ) = i yi . By setting α = 1 − t and β = tµ, the redistribution scheme
discussed in Lemma 2 identifies a BIFT scheme, with tax rate t.
The theorem has sound implications for local and urban policymakers, who take the
distribution of people in the city as given and and only control upon the distribution of
incomes. The theorem states that, for a given distribution of income holders in a city,
the only option left for a policymaker interested in reducing overall inequality in the city
(as measured, for instance, by G(y)) through taxation while being robust to patterns of
inequality across the urban space, is to implement a BIFT scheme. Rich to poor transfers
of incomes, which are highly demanding in terms of transfers operations that have to
be conducted, might not be suitable in some cases, unless one is willing to introduce
additional information about the location of individuals.6 Additionally, the proposition
6
Notice that this form of progressive redistribution unambiguously reduces inequality in the space,
pushing down both the within and between GINI curves. However, its implications for the affluence at
the level of individual neighborhood are ambiguous. To understand this point, imagine for instance a
rich individual who lives near to several poor individuals, who are located far from each other. In this
case we can expect µd > µ and a BIFT scheme takes away a proportion of the rich income dividing it
among several poor individuals. Apart from this direct redistributive effect, there is also another effect
in the opposite direction, consisting in the reduction of the (potential) positive externality generated by
the presence of a rich people in the neighborhood of poor income recipients.
15
bears important consequences for national politics. Robustness of the results with respect
to the initial distribution of individuals helps the spatial inequality minded policymaker
in shaping national tax rules, a key policy leverage, that apply indistinctly across different
cities differing both in the distribution of incomes and in the distribution of people within
the city. Our result says, for instance, that the only federal tax schedule for the U.S.
(i.e., a federal tax scheme does not depend upon the specificities of a single city and the
distribution of individual therein) granting a reduction in citywide and spatial inequality
both in New York City and in Los Angeles (two cities that differ in geography, in residential
patterns and in skills distributions), is the BIFT scheme. Other redistribution schemes
might reduce inequality in one city at the cost of rising spatial inequality in another city.
4
Empirical analysis of spatial inequality
4.1
Relations with geostatistics
The GINI indices capture a relation between the degree of inequality in incomes and the
distribution of these incomes in a geographic space. The indices do so in a way that is
analogous to how spatial processes are treated in geostatistics literature (Cressie 1991).
In this section, we investigate more in details these connections.
We assume that the data generating process is {Ys : s ∈ S}, where Y is a random variable distributed over a random field S. This process’s distribution is FS (y), representing
the degree of spatial dependence between the random variables. Through geolocalization,
it is possible to compute the distance ||.|| between locations s, v ∈ S. We generally write
||s − v|| ≤ d to indicate that the distance between the two location is smaller than d, or
equivalently v ∈ ds . The cardinality of the set of locations ds is nds while n is the total
number of locations. The observed income distribution y is a particular realization of
such process, where one observation i is observed in one location s.
Consider first the GINI within index of the spatial process FS . It can be written in
terms of first order moments of the random variables Ys as follows:
16
GIN IW (FS , d) =
XX
s
v∈ds
1 E[|Ys − Yv |]
.
2n nds
E[Yv ]
The distribution FS expresses the degree of spatial dependence between the random
variables Ys . Consider first the case with no spatial dependence, that is the random
variables Ys and Yv are i.i.d. for any s, v ∈ S. The distribution of the stochastic process
distributed as FS is thus independent of the random field S. In this case, we obtain
that GIN IW (FS , d) =
E[|Ys −Yv |]
E[Yv ]
where Ys and Yv are two i.i.d. random variables, which
coincides with the definition of the standard Gini inequality coefficient in Muliere and
Scarsini (1989).7
If, instead, spatial dependence is at stake, then the expectation E[|Ys − Yv |] varies
across locations and cannot be identified and estimated from the observation of just one
data point in each location. It is standard in geostatistics to rely on standard stationarity
assumptions about the distribution FS (Cressie and Hawkins 1980, Cressie 1991). The first
assumption is that the random variables Ys have stationary expectations over the random
field, i.e., E[Yv ] = µ for any v. The second assumption is that the spatial dependence in
incomes registered between two locations s and v only depends on the distance between
the two locations, ||s − v||, and not on their position on the random field. Here, we
consider radial distance measures for simplicity, so that ||s − v|| = d. This allows to
write E[(Ys − Yv )2 ] = 2γ(||s − v||) = 2γ(d), where the function 2γ is the variogram of the
distribution FS (Matheron 1963).
The variogram displays the evolution of the variance in the difference (a measure of
correlation) between any two elements of the random process defined on S as a function
of their spatial lag. Thus 2γ(d) is informative of the correlation between two random
variables that are exactly d distance units away one from the other. The shape of the
function 2γ(d) represents the degree of spatial dependence of the random process, conceptualized as the contribution of spatial association on incomes variability. Generally,
2γ(d) → 0 as d approaches 0, indicating that random variables that are very close in space
7
Biondi and Qeadan (2008) explore the use of connected estimator to assess dependency across time
in paleorecords observed on a given point in space.
17
tend to be strongly spatially correlated. Conversely, 2γ(d) → 2σ 2 when d is sufficiently
large, indicating spatial independence between two random variables Ys and Yv far apart
on the random field.
Together, the two assumptions listed above depict a form of intrinsic stationarity of
the process (Cressie and Hawkins 1980, Cressie 1991, Chilès and Delfiner 2012). If we
also assume that Ys is gaussian with mean µ and variance σ 2 , it is possible to show that
the GINI within index is a function of the variogram, as follows
GIN IW (FS , d) =
XX
s
v∈ds
p
1 2 γ(||s − v||)/π
.
2n nds
µ
With more algebra, a similar relation can also be established for the GINI between
index. Both formulas are formally derived in the appendix. These results are suggestive of
two important facts. The first fact is that the GINI indices measure spatial inequality as
a direct expression of the spatial dependence in the data generating process, represented
under stationarity assumptions by the variogram, without imposing external normative
hypotheses about the interactions between incomes, income inequality and space. The
second fact is that the empirical counterpart of the variogram can be used to produce
asymptotically valid estimators of the GINI indices standard errors, which can be used to
test hypothesis on the extent and dynamics of spatial inequality.
4.2
Testing hypothesis about spatial inequality
In the methodology appendix, we derive distribution free, non-parametric estimators for
the GINI indices in a general setting where sample information about the process FS is
available.8 GINI indices estimators can be used to assess inequality in a city for a given
concept of individual neighborhood, and more broadly to construct spatial inequality
curves. Methodology for their construction is discussed in the appendix.
8
The GINI between index estimator can be computed as a plug-in estimator as in as in Binder
and Kovacevic (1995) and Bhattacharya (2007), provided individual neighborhood averages are properly
estimated. The GINI within estimator involves instead compering individual income realizations.
18
The GINI spatial inequality curves can be used to test hypothesis about the shape and
dynamics of spatial inequality. (i) By contrasting the level of spatial inequality measured
by the GINI curves at a given distance d with the overall level of inequality captured by
the Gini index, it is possible to assess if, and to what extent, average income inequality
experienced within a neighborhood of size d is different from the level of inequality in the
city. (ii) Moreover, by contrasting the level of the GINI curves for at d and at d0 > d, it is
possible to state if, and by how much, inequality is increased with an increase in the range
of the average individual neighborhood. (iii) Lastly, by comparing the levels of the GINI
curves at distance d registered in different periods within the same city, it is possible to
conclude about the dynamics of spatial inequality. Furthermore, if one is confident that
spatial distance is comparable across cities, one can use GINI spatial inequality curves to
rank these cities in terms of urban inequality.9
In the appendix, we maintain the intrinsic stationarity and the gaussian assumptions
to derive standard errors for the GINI within and between estimators. We show that
standard errors can be written as averages of variogram functions of the process and that
the GINI indices estimators sampling distribution converge to asymptotic normality.10
It follows that the statistical relevant of the hypothesis about spatial inequality can be
assessed via standard t-statistics.
In the empirical investigation of spatial inequality across U.S. cities, based on decennial
census data and repeated surveys, we employ standard error estimators to account for
issues related to data reporting (which come in forms of summary tables for each element
of a very fine spatial partition of U.S. urban territories).11
9
One is compelled to conclude in favor of spatial inequality only if there is strong evidence against the
null hypothesis that the level of the GINI curve at d is the same as the Gini inequality index, and that
the level of spatial inequality captured by the GINI curves does not change with d. When comparing two
GINI (either between or within) curves, one cannot reject that a strong increase or reduction in spatial
inequality if there is strong evidence against the null hypothesis that the two curves coincide at every d.
10
Standard errors for GINI indices can be derived using results on ratio-measures estimators (see
Hoeffding 1948, Goodman and Hartley 1958, Tin 1965, Xu 2007, Davidson 2009) under distributional
assumptions that are common in the geostatistics literature (Cressie and Hawkins 1980, Cressie 1985),
such as intrinsic stationarity.
11
A Stata routine implementing the GINI between and within indices and curves, along with their
standard error estimators, is available on the authors web-pages.
19
5
Spatial inequality in U.S. cities: 1980-2014
In this section we recover and study the patters of spatial inequality in U.S. cities along
four decades. Our aim is to illustrate the functioning of spatial inequality measures on
the spatial distribution of income and to provide stylized facts about the shape of spatial
income inequality in U.S. cities. We focus first on the city of Chicago (IL) as a case study
of the spatial dimension of inequality in a large U.S. metropolitan area. Then, we provide
stylized facts about patterns of spatial inequality across the 30 largest U.S. cities.
5.1
Data
We follow the dynamics of income within U.S. cities for four decades. We make use of
the Census data from the U.S. Census Bureau for 1980, 1990 and 2000. For these years,
we select information from the Summary Tape File 3A about population counts, income
levels and family composition at a very fine spatial grid.12 The STF 3A data are reported
in the form of statistical tables for each block group, due to anonymization issues. After
2000, the STF 3A file of the Censuses have been replaced with survey-based evidence from
the American Community Survey (ACS), running yearly on representative samples of U.S.
resident population. For our analysis, we consider the 2010-2014 ACS 5-years Estimates
module obtained from representative samples of the U.S. population.13 Sampling rates
vary independently at the census block level according to 2010 census population counts,
covering on average 2% of U.S. population.
The block group scale provides course information about the individual incomes, given
the substantial heterogeneity in incomes observed within blocks. The block groups define,
however, sufficiently small spatial units that can be safely used to construct individual
neighborhoods of minimal size.
12
Available information is a snapshot of overall inequality in the Census year with a universal statistical
coverage. The Census STF 3A provides data for States and their subareas in hierarchical sequence down
to the block group level, which is the finest geographical information available in the Census.
13
The ACS reports estimates of incomes, population counts and household size measures at the level
of the census block from data collected over the five years span. Hence, estimated counts do not have an
annual validity, but rather should be interpreted as average measures across the 2010-2014 time span.
20
To account for within block group income inequality, we construct synthetic measures of income from the counts information. We retain households made of one or more
individuals as the income recipient units. We use two income variables to model the distribution of household gross income. The first variable is total (i.e. aggregate) income in
the block group. The second variable is the count of the households living in each block
group. Tables of households counts are also provided by household size (scoring from 1
to 7 or more individuals), so that equivalence scales representative at the block group
level can be constructed. The use of demographic equivalent scales is necessary to draw
conclusions about the distribution of inequality across households that are otherwise not
comparable. A second table reports households counts per income category at the block
group level. There are 17 income bins in 1980, 25 in 1990 and 16 in 2000 and in the ACS
sample. Each bin consists in two income levels, the bottom and top incomes marking
that bins. The last income bin defines an open interval instead, for which only the lower
income bound is reported. Census and ACS also report, for each income bin, data of the
population of households whose gross income falls in that bin. Using methodology similar
to Nielsen and Alderson (1997), based on Pareto distribution fitting (for an alternative
methodology based on parametric models, leading to very close results, see Wheeler and
La Jeunesse 2008), tables about the distribution of household counts across bins are transformed into a vector of incomes levels and the associated vector of households frequencies
corresponding to these incomes that are representative of the block group specific income
distribution.14 Incomes and household frequencies vary across block groups, implying
strong within city heterogeneity. These income values can be equivalized by dividing for
the appropriate equivalent scales, so that households are made comparable within and
across block groups.15
We iterate the procedure for each year considered in this study. The resulting income
14
Estimation is constrained in a way that average income inferred from this distribution of incomes
and frequencies matches the average income in the Census or ACS within the block group, obtained by
dividing total income in the block by the number of households there residing.
15
In most of the cases, it turns out that the reference income category associated to a bin is simply
the income bin’s midpoint. For the top income bin, the reference value is adjusted so that the average
estimated income coincides with data provided by the Census.
21
City
Year
Chicago (IL)
1980
1990
2000
2010/14
# Blocks
3756
4444
4691
4763
Hh/block
1122
1217
1173
1060
Eq. scale
1.630
2.029
1.625
1.575
Mean
13794
21859
41193
55710
Equivalent household income
20%
80%
Gini 90%/10%
5798 20602 0.434
11.351
9132 32316 0.461
11.903
16076 61667 0.473
11.533
20022 89856 0.486
13.452
Table 1: Income and population distribution across block groups in Chicago (IL)
database consists in strings of incomes and frequency weights for each block group in
the cities considered. We treat these income levels as observations with their associated
sample weight. Thus, weighted variants of the GINI indices estimators are used.
Census and ACS identifier is defined at the census block-group level. Each block group
is geocoded, and measures of distance between the bock groups centroids can be hence
constructed. All income observations within the same block group are hence on the same
spatial location. To identify cities, we resort on the Census definition of a Metropolitan
Statistical Area, based on the 1980 Census definition.16
5.2
Spatial inequality in Chicago, IL
We use the 1980 definition of the Chicago Primary MSA area provided by the Census,
which comprises Cook County, Du Page County and McHenry County surface. By sticking to this definition of the Chicago metro area we can construct measures of income
inequality in a well defined, yet large, urban space.17 Table 1 reports summary information of household population and the respective income distribution in Chicago.18 Average
equivalent household income has increased fourfold over 1980-2014 in nominal terms. De16
Details about the U.S. counties defining the Chicago (as well as the other cities) metropolitan area
can be found at this link: http://www.census.gov/population/metro/files/lists/historical/80mfips.txt.
17
Note that for some of the block groups of 1980 Census it is not possible to establish geocoded
references. Hence, these units cannot be included in the index computation and might have an impact on
estimation of the GINI patterns. Other works, such as (Reardon and Bischoff 2011), have demonstrated,
however, that the the impact of this missing information is negligible on overall trends of inequality within
the 100 largest U.S. cities.
18
Throughout the four decades considered in this study, the block group partition of Chicago has
become finer, with the number of block groups increasing of 1000 units. This change keeps track of the
demographic boom in Chicago, implying a roughly stable demographic composition in each block group
(around 1100 households on average).
22
Figure 2: Spatial GINI indices of income inequality for Chicago (IL), 1980, 1990, 2000
and 2010/14
(a) GINI within
(b) GINI between
Note: Authors elaborations on U.S. Census data.
spite the relative gap between the poor (bottom 20%) and rich population (top 20%)
has remained stable across time, we observe from Table 1 that the top 10% to bottom
10% ratio has sharply increased from 11.533 in 2000 to almost 13.5 in 2010/2014 period,
indicating increased dispersion at the tails of the distribution. The citywide income inequality, measured by the Gini index, has contextually raised from 0.43 to 0.48 during
the same period.
We compute GINI within and between indices for 1980, 1990, 2000 and 2010/2014
Census waves using the estimators discussed in section 4 to assess how inequality in
equivalent household income across individual neighborhoods has evolved in this period.
The estimators of the GINI indices are non-parametric, implying that we measure the level
of inequality at every empirical distance (expressed in meters) estimated from the data.
At distance zero, the GINI within index captures the average inequality in estimated
income levels within block groups. Data confirm a substantial heterogeneity of within
income inequality across block groups: the Gini index at the level of the block groups
varies, in fact, between 0.2 to above 0.6 in 2000.19 Inequality remains above 0.4 even
19
For this year block group level estimates of inequality are reported in the Census.
23
averaging across block groups. This explains the relatively high starting point of the
GINI within curves shown in Figure 2.a. Inequality slightly decreases as the neighborhood
size reaches 2km and then quickly raises to reach its city-wide level when the size of the
neighborhood is larger than 25 kilometers. Comparing the GINI within curves of the
different decades, within neighborhood inequality appears increasing over time, for any
size of the neighborhood.
In figure 2.b we plot GINI between curves from 1980 to 2010/14. When neighborhoods are narrow (basically, we compare average income across block groups), we find
lower inequality values compared to the within case (the GINI between index values are
generally smaller than 0.3). The most striking evidence in figure 2.b is that between
inequality decreases with the size of the neighborhood, but in a very smooth manner.
For neighborhoods of size 2 kilometers, the GINI between is generally larger than 0.25.
It decreases to 0.1 only for neighborhoods at least of 20 kilometers range. Overall, this
pattern is robust across Census years. Contrasting the GINI between curves over the last
three decades, between inequality is on the rise up to the 1990, and then remains stable.
Are these patterns statistically significant? To answer this question, we first compute empirical estimators of the variograms produced from data, and then we use results
presented in appendix A to derive standard errors and confidence intervals of the GINI
within and between indices at pre-selected abscissae. Confidence bounds are drawn for
each spatial inequality curve, and dominance across curves is tested making use of pointwise t-statistics.20
All together, we obtain evidence of the following patterns of spatial inequality in
Chicago over the time span considered: i) GINI within inequality is quite high and weakly
increases with the size of the neighborhood; ii) GINI between inequality is smaller than
GINI within (and than overall inequality) when the neighborhood size is small and de20
To do so, we compute all pairwise differences in GINI within or between spatial inequality curves
across all the decades under analysis. We then plot these differences measured at pre-determined distance
abscissae (along with the associated confidence bounds) on a graph. If the horizontal line passing from the
origin of the graph (indicating the null hypothesis of no differences in spatial inequality at every distance
threshold) falls within these bounds, we conclude that the gap in the spatial inequality curves under
scrutiny are not significant at 5% confidence level. For a detailed description of results, see appendix B.
24
creases smoothly with the neighborhood size, implying spatial persistence of inequality;
iii) GINI within inequality levels do not increase significantly over time; iv) GINI between
inequality rose in the two first decades up to 2000, while stabilizing between 2000 and
2014.
The previous facts suggest that largest increases in absolute income deviations are registered for individuals living in neighborhoods experiencing a growth in average income.
This might be the case if, on average, rich households get richer in those neighborhoods
that are composed predominantly by rich households where poor households income remained stable, or poor households get poorer in neighborhoods that are composed predominantly by poor households where rich households income remained stable, implying
little variations across time of the GINI within index. Similar results are consistent with
a case in which income growth is evenly distributed across poor and rich households (so
relative incomes do not change), but rich and poor households distance increases. However, data show that inequality at the tails of the distribution increased substantially
over 1980-2014 (table 1), affecting as well the citywide inequality, on the rise. The result
highlights the prominent implications for spatial inequality that are due to changes in the
income distribution, rather than to changes in sorting patterns.
5.3
Stylized facts about spatial inequality in the U.S.
Can we recover the same patterns of spatial inequality registered in Chicago also in other
U.S. cities? We now try to fix new stylized facts about the dynamics of spatial inequality
across the U.S. major urban areas. We consider the 30 most populated cities in 2014.21
The spatial dimension of the city is established by the city metropolitan statistical area
as defined from the 1980 Census, so that within-city patterns of spatial inequality can be
meaningfully compared across decades.22
21
Data on urban demographic size are from the Census Bureau and can be downloaded from:
http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk. The list of cities,
ordered by their size, can be found in table 4 in the appendix.
22
Information on the county composition of a city metropolitan statistical are are taken from the Census
1980 definition: http://www.census.gov/population/metro/files/lists/historical/83mfips.txt.
25
Figure 3: GINI within and between for 30 most populated U.S. cities
(a) GINI within, Census 1980
(b) GINI within, ACS 2010/2014
(c) GINI between, Census 1980
(d) GINI between, ACS 2010/2014
Note: Authors elaboration on U.S. Census data. A full description of the cities used in this study can be
found in the appendix, table 4.
We compute within and between GINI for each of these cities in each year at predetermined distances. We summarize our main findings in Figure 3, which shows spatial
inequality curves for the years 1980 and 2010/2014. There are 30 curves in each plot,
one for each city considered in the study. These curves are suggestive of two basic facts.
First, that spatial within and between inequality is larger in 2010/2014 than it was in
1980, at every distance abscissa. Second, the dynamics of spatial inequality displayed by
the between and within curves are substantially similar to that registered for Chicago:
26
the GINI within index is very high even for small distances and rapidly converges to the
citywide level of inequality. The GINI between index fluctuates around 0.3 and smoothly
converges to zero for large individual neighborhoods. The black curves in the figure
represent a fifth degree polynomial fit of the relation between the values of GINI within
and between indices and the neighborhood size. Their shapes reflect the two stylized facts
described above on the pattern of spatial inequality in U.S. metropolitan areas.
Although resting on a completely different methodology, our results support the findings of Wheeler and La Jeunesse (2008). They considered two different exogenous spatial
partitions of US metropolitan areas and reported quite strong levels of spatial inequality
within block groups, also stable over time. They also pointed out that the major changes
over 1980-2000 were driven by the between component of inequality.
The GINI within and between indices capture dimensions of inequality that are not
predictable looking at city inequality or affluence. Figure 4.a relates overall city inequality
(measured through the Gini index) and the values of spatial GINI inequality indices in
our data. The degree of association is visualized by the slopes of the regression lines.
We examine both within and between spatial inequality for the Census year 1980 and for
ACS 2010/14 data, for a neighborhood size of 2Km. As expected, citywide and withinneighborhood inequality are positively correlated. Correlation evolves from 94% in 1980 to
72% in 2014. Also inequality between neighborhoods is positively associated with citywide
inequality. Heterogeneity of GINI between indices around the regression lines is, however,
substantially larger than heterogeneity in GINI within, thus indicating less reliability in
these correlation estimates. Correlation evolves from 62% in 1980 to barely 35% in 2014.
In both cases, the degree of association between spatial and citywide inequality is slightly
decreasing over time.
Figure 4.b similarly shows the association among GINI spatial measures and city
affluence (measured by the average equivalent income). Results about the relation between
GINI inequality and average households income in U.S. cities are less clear-cut. We do
not detect a remarkable association between city affluence and GINI spatial inequality,
both in the within and the between form.
27
Figure 4: City-level income inequality, average income and spatial inequality for different
individual neighborhood concepts: Census 1980 and ACS 2010/2014
(a) Distance range: 0.5Km
(b) Distance range: 5Km
Note: Authors elaboration on U.S. Census data. Citywide average income normalized by weighted sample
28
mean.
6
Concluding remarks
We study spatial inequality at the urban level from the perspective of the individual. We
use information about the income distribution in the neighborhood surrounding each individual, defined by measures of spatial distance between pairs of individuals, to assess the
spatial component of inequality. We provide simple principles for spatial inequality and
derive measures connected to the Gini inequality index. We conclude that only flat-tax
minimum-income redistributive schemes are coherent with spatial inequality reduction.
We then investigated spatial inequality patterns in the 30 largest U.S. cities from 1980 to
2014 and we established four stylized facts about spatial inequality: i) Spatial inequality
within individual neighborhoods of any size is relatively high in all the periods considered.
ii) Spatial inequality between individual neighborhoods is smaller than within inequality
when the neighborhood concepts is restricted to close neighbors. Inequality decreases
smoothly with the size of individual neighborhood in all the periods considered. iii) Spatial inequality has been on the rise over time, especially in the between component the
two first decades. iv) Spatial inequality is only modestly associated with citywide income
inequality and does not depend on city affluence.
The stable patterns of spatial inequality over time mainly suggest that urban inequality has been driven mainly by changes income distribution that affected differently rich
an poor, depending from where they are located in the city, rather then by dramatic
changes in sorting dynamics. From the policy perspective, we remark that an effective
strategy able to check this tendency from the redistribution side (avoiding policies from
the sorting side, as the Moving to Opportunity experiment) is given by basic incomeflat tax redistributive schemes. The stylized facts presented above are consistent with a
paradoxical picture of incomes and households sorting in US cities: a high and persistent
neighborhood inequality is accompanied by poor and rich neighborhoods more and more
”linearly” separated in the city. This picture deserves further investigation, both from
the measurement side and from its possible explanation through spatial sorting models.
29
References
Aczel, J. (1966). Lectures on functional equations and their applications, Academic Press,
New York.
Bhattacharya, D. (2007). Inference on inequality from household survey data, Journal of
Econometrics 137(2): 674 – 707.
Binder, D. and Kovacevic, M. (1995). Estimating some measures of income inequality
from survey data: An application of the estimating equations approach., Survey
Methodology 21: 137–45.
Biondi, F. and Qeadan, F. (2008). Inequality in paleorecords, Ecology 89(4): 1056–1067.
Bishop, J. A., Chakraborti, S. and Thistle, P. D. (1989). Asymptotically distribution-free
statistical inference for generalized Lorenz curves, The Review of Economics and
Statistics 71(4): pp. 725–727.
Chetty, R., Hendren, N. and Katz, L. F. (2016). The effects of exposure to better neighborhoods on children: New evidence from the Moving to Opportunity experiment,
American Economic Review 106(4): 855–902.
Chetty, R., Hendren, N., Kline, P. and Saez, E. (2014). Where is the land of opportunity?
The geography of intergenerational mobility in the United States, The Quarterly
Journal of Economics 129(4): 1553–1623.
Chilès, J.-P. and Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, John
Wiley & Sons, New York.
Clark, W. A. V., Anderson, E., Östh, J. and Malmberg, B. (2015). A multiscalar analysis
of neighborhood composition in Los Angeles, 2000-2010: A location-based approach
to segregation and diversity, Annals of the Association of American Geographers
105(6): 1260–1284.
Conley, T. G. and Topa, G. (2002). Socio-economic distance and spatial patterns in
unemployment, Journal of Applied Econometrics 17(4): 303–327.
Cowell, F. (2000). Chapter 2 measurement of inequality, Vol. 1 of Handbook of Income
Distribution, Elsevier, pp. 87 – 166.
30
Cressie, N. (1985). Fitting variogram models by weighted least squares, Journal of the
International Association for Mathematical Geology 17(5): 563–586.
Cressie, N. A. C. (1991). Statistics for Spatal Data, John Wiley & Sons, New York.
Cressie, N. and Hawkins, D. M. (1980). Robust estimation of the variogram: I, Journal
of the International Association for Mathematical Geology 12(2): 115–125.
Dardanoni, V. and Forcina, A. (1999). Inference for Lorenz curve orderings, Econometrics
Journal 2: 49–75.
Davidson, R. (2009). Reliable inference for the Gini index, Journal of Econometrics
150(1): 30 – 40.
Dawkins, C. J. (2007). Space and the measurement of income segregation, Journal of
Regional Science 47: 255–272.
Durlauf, S. N. (2004). Neighborhood effects, Vol. 4 of Handbook of Regional and Urban
Economics, Elsevier, chapter 50, pp. 2173–2242.
Ellen, I. G., Mertens Horn, K. and O’ Regan, K. M. (2013). Why do higher-income
households choose low-income neighbourhoods? Pioneering or thrift?, Urban Studies
50(12): 2478–2495.
Galster, G. (2001). On the nature of neighbourhood, Urban Studies 38(12): 2111–2124.
Glaeser, E., Resseger, M. and Tobio, K. (2009). Inequality in cities, Journal of Regional
Science 49(4): 617–646.
Goodman, L. A. and Hartley, H. O. (1958). The precision of unbiased ratio-type
estimators, Journal of the American Statistical Association 53(282): 491–508.
Hardman, A. and Ioannides, Y. (2004). Neighbors’ incom distribution: Economic segregation and mixing in US urban neighborhoods, Journal of Housing Economics
13(4): 368–382.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution, The
Annals of Mathematical Statistics 19(3): 293–325.
Kim, J. and Jargowsky, P. A. (2009). The Gini-coefficient and segregation on a continuous
variable, Vol. Occupational and Residential Segregation of Research on Economic
Inequality, Emerald Group Publishing Limited, pp. 57 – 70.
31
Ludwig, J., Duncan, G. J., Gennetian, L. A., Katz, L. F., Kessler, R. C., Kling, J. R. and
Sanbonmatsu, L. (2013). Long-term neighborhood effects on low-income families:
Evidence from moving to opportunity, American Economic Review 103(3): 226–31.
Luttmer, E. F. (2005). Neighbors as negatives: Relative earnings and well-being, The
Quarterly Journal of Economics 120(3): 963–1002.
Matheron, G. (1963). Principles of geostatistics, Economic Geology 58(8): 1246–1266.
Moyes, P. (1994). Inequality reducing and inequality preserving transformations of
incomes: Symmetric and individualistic transformations, Journal of Economic
Theory 63(2): 271 – 298.
Muliere, P. and Scarsini, M. (1989). A note on stochastic dominance and inequality
measures, Journal of Economic Theory 49(2): 314 – 323.
Nielsen, F. and Alderson, A. S. (1997). The Kuznets curve and the great u-turn: Income
inequality in U.S. counties, 1970 to 1990, American Sociological Review 62(1): 12–33.
Openshaw, S. (1983). The modifiable areal unit problem, Norwick: Geo Books.
Pyatt, G. (1976). On the interpretation and disaggregation of Gini coefficients, The
Economic Journal 86(342): 243–255.
Reardon, S. F. and Bischoff, K. (2011). Income inequality and income segregation,
American Journal of Sociology 116(4): 1092–1153.
Sampson, R. J. (2008). Moving to inequality: Neighborhood eeffects and experiments
meet social structure, American Journal of Sociology 114(1): 189–231.
Schelling, T. C. (1969). Models of segregation, The American Economic Review 59(2): pp.
488–493.
Shorrocks, A. and Wan, G. (2005). Spatial decomposition of inequality, Journal of Economic Geography 5(1): 59–81.
Tin, M. (1965). Comparison of some ratio estimators, Journal of the American Statistical
Association 60(309): 294–307.
32
Wheeler, C. H. and La Jeunesse, E. A. (2008). Trends in neighborhood income inequality
in the U.S.: 1980–2000, Journal of Regional Science 48(5): 879–891.
Wong, D. (2009). The modifiable areal unit problem (MAUP), The SAGE handbook of
spatial analysis pp. 105–124.
Xu, K. (2007). U-statistics and their asymptotic results for some inequality and poverty
measures, Econometric Reviews 26(5): 567–577.
33
Appendix
A
Standard errors and confidence bounds for spatial
inequality measures
A.1
Setting
Let S denote a random field. The spatial process {Ys : s = 1, . . . , n} with s ∈ S is defined
on the random field and is jointly distributed as FS . Suppose data come equally spaced
on a grid, so that for any two points s, v ∈ S such that ||v − s|| = h we write v = s + h.
The process distributed as FS is said to display intrinsic (second-order) stationarity if
E[Ys ] = µ, V ar[Ys ] = σ 2 and Cov[Ys , Yv ] = c(h) when the covariance function is isotropic
and v = s + h. Under these circumstances, we denote V ar[Ys+h − Ys ] = E[(Ys+h − Ys )2 ] =
2σ 2 − 2c(h) = 2γ(h), the variogram of the process at distance lag h.
Noticing that E[Ys+h · Ys ] = σ 2 − γ(h) + µ2 , we can derive a simple formulation of the
covariance between differences in random variables, notably Cov[(Ys+h1 − Ys ), (Yv+h2 −
Yv )] = γ(s − v + h1 ) + γ(s − v − h2 ) − γ(s − v) − γ(s − v + h1 − h2 ) as in Cressie
and Hawkins (1980). This assumption holds, in particular, if the spatial data occur on
a transect. Denote s − v = h where h indicates that the random variables are located
within a distance lag of h units. We can hence write Cov[(Ys+h1 − Ys ), (Yv+h2 − Yv )] =
γ(|h + min{h1 , h2 }|) + γ(|h − max{h1 , h2 }|) − γ(|h|) − γ(|h − |h1 − h2 ||), which yields
the formula above when h1 > h2 . We adopt the convention that γ(−h) = γ(h) in what
follows.
We now introduce one additional distributional assumption. Assume that Ys is gaussian with mean µ and variance σ 2 . The random variable (Ys+h − Ys ) is also gaussian
with variance 2γ(h), which implies |Ys+h − Ys | is folded-normal distributed, with expectap
p
tion E[|Ys+h − Ys |] = 2/πV ar[Ys+h − Ys ] = 2 γ(h)/π and variance V ar[|Ys+h − Ys |] =
(1 − 2/π)2γ(h).
34
A.2
GINI indices and the variogram
Under the assumptions listed above, we now show that the GINI indices of spatial inequality in the population can be written as explicit functions of the variogram. We maintain
the assumption that the spatial random process is defined on a transect, and occurs at
equally spaced lags. For given d, we can thus partition the distance spectrum [0, d] into
Bd intervals of fixed size d/Bd . Each interval is denoted by the index b with b = 1, . . . , Bd .
Abusing notation, we denote with dbi the set of locations at interval b (and thus distant
b · d/Bd ) within the range d from location si . The cardinality of this set is ndbi ≤ ndi ≤ n.
Under these assumptions, the GINI within index rewrites
1 E[|Ysj − Ysi |]
2n ndi
µ
i j∈di
p
XX 1
4γ(||sj − si ||)/π
=
2n ndi
µ
i j∈di
p
Bd
X1X
4γ(si + b − si )/π
ndbi X 1
=
n b=1 ndi j∈d 2 nddi
µ
i
bi
!
p
Bd
X nd
4γ(b)/π
1X
bi
=
.
2 b=1
n
n
µ
d
i
i
GIN IW (FS , d) =
XX
(1)
The GINI within index is an average of a concave transformation of the (semi)variogram
function, weighted by the average density of locations at given distance lag b on the
transect. This average is then normalized by the average income, to produce a scaleinvariant measure of inequality. The index can be also conceptualized as a coefficient
of variation, where the standard deviation is replaced by a measure of dispersion that
accounts for the spatial dependence of the underlying process.
Similarly, the spatial GINI between index can also be written as a function of the
variogram. The result holds under the assumption that the process Ys is gaussian, as
P
above, which implies that µsi d = ndi1+1 Ysi + j∈di Ysj is also gaussian under the intrinsic stationarity assumption, with expectation E[µsi d ] = µ for any i. From this, it
follows that the difference in random variables |µsi d − µs` d | occurring in two locations si
35
and s` is a folded-normal distributed random variable with expectation E[|µsi d − µs` d |] =
p
2/π V ar[µsi d − µs` d ]. The variance term can be decomposed as follows:
V ar[µsi d − µs` d ] = V ar[µsi d ] + V ar[µs` d ] − 2Cov[µsi d ; µs` d ].
(2)
Developing the variance and covariance terms we obtain:
"
!#
X
1
V ar[µsi d ] = V ar
Ysi +
Ysj
ndi + 1
j∈di
X
X
1
E[Ysj Ysk ] − µ2
=
2
(ndi + 1)
j∈di ∪{i} k∈di ∪{i}
=
=
1
(ndi + 1)2
Bd
X
X
b=1 j∈dbi
= σ2 −
X
X
c(||sj − sk ||)
(3)
Bd X
1
1 X
c(|si + b − (si + b0 )|)
ndi + 1 b0 =1 k∈d ndi + 1
(4)
j∈di ∪{i} k∈di ∪{i}
b0 i
Bd X
Bd
X
ndbi ndb0 i
γ(b − b0 ),
2
(n
+
1)
di
b=1 b0 =1
(5)
where (3) follows from the definition of the covariogram, (4) is a consequence of the
assumption that the process can be represented on a transect and, for simplicity, it is
assumed that the set of location at b = 1 is d1i ∪ {i} with cardinality ndbi + 1, as it
includes location si , while (5) follows from the definition of the variogram. Similarly, the
covariance term in (2) can be manipulated to obtain the following:
Cov[µsi d ; µs` d ] =
XX
j∈di k∈d`
2
= σ −
1
E[Yj Yk ] − µ2
(ndi + 1)(nd` + 1)
Bd X
Bd
X
ndbi ndb0 `
γ(si − s` + |b − b0 |),
(n
+
1)(n
+
1)
d
d
i
`
b=1 b0 =1
(6)
where the assumption that the process can be represented on a transect allows to write
the variogram as a function of si − s` . Plugging (5) and (6) into (2), and by denoting
i − ` = m to recall that the gap between si and sj is m, with m positive integer such that
36
m ≤ B with B being the maximal distance between any two locations on the transect,
we obtain
V ar[µsi d − µs` d ] =
=
Bd X
Bd
X
ndbi ndb0 `
γ(si − s` + |b − b0 |) −
(n
+
1)(n
+
1)
di
d`
b=1 b0 =1
!
Bd X
Bd
X
ndbi ndb0 i ndb` ndb0 `
γ(b − b0 )
+
−
2
2
ndi
nd`
b=1 b0 =1
Bd X
Bd
X
2
ndbi ndb0 i+m
γ(m + |b − b0 |) −
(n
+
1)(n
+
1)
d
d
i
i+m
b=1 b0 =1
!
B
B
d
d X
X
ndbi ndb0 i ndb i+m ndb0 i+m
+
γ(b − b0 )
−
2
2
n
n
di
di+m
b=1 b0 =1
2
= V (γ, i, m).
(7)
Using the notation (7), we derive an alternative formulation of the GINI between index:
GIN IB (FS , d) =
1
1 XX
E[|µsi d − µs` d |]
2 i `6=i n(n − 1)
µ
B
1X1 X X
1 E[|µsi d − µs` d |]
=
2 i n m=1 `∈n (n − 1)
µ
bi
P
p
1 nbi
B
2V
(γ,
i,
m)/π
i n (n−1)
1X
=
.
2 m=1
µ
(8)
Under stationarity assumptions about the spatial process, we can show that the GINI between index of spatial inequality can be written as an average of coefficients of variations,
each discounted by a weight controlling for the spatial dependency of the process.
Formulations of the GINI within and between indices in (1) and (8) clarify the role of
spatial dependence on the measurement of spatial inequality. Spatial dependence can be
modeled via the variogram. Standard errors and confidence intervals of the GINI indices
can be calculated accordingly.
37
A.3
Standard errors for the spatial GINI within index
In this section, we derive confidence interval bounds for the GINI within index under
three key assumptions: that the underling spatial process is stationary, that the spatial
process occurs on a transect at equally spaced points, and the gaussian assumption. This
allows to build confidence intervals for the empirical GINI within index estimator of the
ˆ I W (y, d) ± zα SEW d , where zα is the standardized normal critical value for
form GIN
confidence level α and SEW d is the standard error of the GINI within estimator. For a
given empirical income distribution, the confidence interval changes as a function of the
distance parameter selected. Hence, we can use the confidence interval estimator to trace
confidence bounds for the GINI within curve. Null hypothesis of dominance or equality
for the GINI within curves can be formulated by using confidence bounds as the rejection
region and by defining null hypothesis at each distance point separately (alike to statistical
tests for strong forms of stochastic dominance relations, as in Bishop, Chakraborti and
Thistle 1989, Dardanoni and Forcina 1999).
Asymptotic standard errors (SE in brief) are derived for the weighted GINI within
index for the spatial process {Ys : s ∈ S}. We assume that S is limited to n locations.
We index these locations for simplicity by i such that i = 1, . . . , n. The spatial process is
then a collection of n random variables {Yi : i = 1, . . . , n} that are spatially correlated.
The joint distribution of the process is F . Each location is associated with a weight
P
wi ≥ 0 with w = i wi , which might reflect the underling population density at a given
location. These weights are assumed not to be stochastic. We also assume intrinsic
stationarity as before. The first implication is that, asymptotically, the random variable
P
P
w
µid = j∈di ∪{i} P j wj Yj is equivalent in expectation to µ̃ = i Pwiwi Yi , i.e., E[µ̃] = µ.
j∈di ∪{i}
i
The second implication is that the spatial correlation exhibited by F is stationary in the
distance d and can be represented through the variogram of F , denoted 2γ(d).
An asymptotically equivalent version of the weighted GINI within index of the process
38
distributed as F where individual neighborhood have size d is
n
1 XX
1
wi wj
P
GIN IW (F, d) =
|Yi − Yj | =
∆W d .
2µ i=1 j∈d 2 w j∈di wj
2µ
(9)
i
The GINI within index can thus be expressed as a ratio of two random variables. Asymptotic SE for ratios of random variables have been developed in Goodman and Hartley
(1958) and Tin (1965). Related results have also been derived from the theory of Ustatistics pioneered in Hoeffding (1948) and adopted to derive asymptotic SE for the Gini
coefficient of inequality under simple and complex random sampling by Xu (2007) and
Davidson (2009). Based on these results, we derive the asymptotic variance of the GINI
within index in (9):
V ar [GIN IW (F, d)] =
1
(GIN IW (F, d))2
V
ar[∆
]
+
V ar[µ̃] −
Wd
4nµ2
nµ2
GIN IW (F, d)
Cov[∆W d , µ̃] + O(n−2 ),
nµ2
where the asymptotic SE is SEW d =
(10)
p
V ar [GIN IW (F, d)] at any d.
The variance and covariance terms in (10) are shown to be relate to the variogram. To
obtain this result, we have to introduce two additional assumptions. The first assumption
is that the process distributed as F occurs on a transect, as explained before. We use
scalars m, b,0 and so on to identify equally spaced points on the transect. Second, we
assume that Yi is gaussian with expectation µ and variance σ 2 . These assumptions are
taken from Cressie and Hawkins (1980). Under these assumptions, the variance of µ̃
39
writes
X wi X wj
E[Yi Yj ] − µ2
w j w
i
B P
X wi X
X
w
j∈dmi wj
P j
=
c(||si − sj ||)
w
w
w
j
j∈d
mi
m=1
i
j∈dmi
!
P
B
X X wi j∈d wj
mi
=
c(|m|)
w
w
m=1
i
V ar[µ̃] =
2
= σ −
B
X
(11)
(12)
ω(m)γ(m),
(13)
m=1
where (13) is obtained from (12) by renaming the weight score, which satisfies
PB
m=1
ω(m) =
1, and by using the definition of the variogram.
The second variance component of (10) can be written as follows:
V ar[∆W d ] =
n X
X
i=1 j∈di
−
ww
Pi j
w
j∈di
n X
X
wj
X wi X
i
w
wj
P
j∈di
`=1 k∈d`
j∈di
wj
w
ww
P` k
wk
!2
E[|Yi − Yj ||Y` − Yk |]
k∈d`
E[|Yj − Yi |]
.
The first component of V ar[∆W d ] cannot be further simplified, as the absolute value operator enters the expectation term in multiplicative way. Under the gaussian assumption,
the expectation can be nevertheless simulated. This can be done acknowledging that the
random vector (Yj , Yi , Yk , Y` ) is normally distributed with expectations (µ, µ, µ, µ) and
variance-covariance matrix Cov[(Yj , Yi , Yk , Y` )] given by:




Cov[(Yj , Yi , Yk , Y` )] = 



σ
2
c(||sj − si ||) c(||sj − sk ||) c(||sj − s` ||)
σ
2



c(||si − sk ||) c(||si − s` ||) 
.

σ2
c(||sk − s` ||) 

2
σ
Data occur on a transect at equally spaced points, where sj = si + b and sk = s` + b0
40
for the positive integers b ≤ Bd and b0 ≤ Bd . We take the convention that b0 > b and
we further assume that there is a positive gap m, with m ≤ B between points si and s` .
Using this notation, we can express the variance-covariance matrix as a function of the
variogram

σ
2
σ − γ(b)
σ − γ(m − |b − b|)
σ − γ(m + min{b , b})

σ2
σ 2 − γ(m − max{b0 , b})
σ 2 − γ(m)
σ2
σ 2 − γ(b0 )



.



2



Cov[(Yj , Yi , Yk , Y` )] = 



0
2
0
2
σ2
The expectation E[|Yi − Yj ||Y` − Yk |] can be simulated from a large number S (say,
S = 10, 000) of independent draws (y1s , y2s , y3s , y4s ), with s = 1, . . . , S, from the random
vector (Yj , Yi , Yk , Y` ). The simulated expectation is a function of the variogram parameters
m, b, b0 and d and of σ 2 . It is denoted θW (m, b, b0 , d, σ 2 ) and estimated as follows:
S
1X
|y2s − y1s | · |y4s − y3s |.
θW (m, b, b , d, σ ) =
S s=1
0
2
With some algebra, and using the fact that E[|Y` − Yi |] = 2
p
γ(m)/π for locations ` and
i at distance m ≤ B one from each other, it is then possible to write the term V ar[∆W d ]
as follows:
V ar[∆W d ] =
Bd X
Bd
B X
X
ω(m, b, b0 , d)θW (m, b, b0 , d, σ 2 )
m=1 b=1 b0 =1
−4
Bd
X
!2
p
ω(m, d) γ(m)/π
.
(14)
m
P
P
P
w
In the formula, ω(m, b, b0 , d) = i wwi j∈dbi P j wj `∈dmi
j∈di
P wi P
w
P j
are
calculated
as
before.
j∈dmi
i w
wj
w`
w
P
k∈db0 `
P wk
k∈d`
wk
while ω(m, d) =
j∈di
The third component of (10) is the covariance term. It also depends on the variogram.
The result relies on the following equivalence, when the process is define on the transect
41
and i and j are separated by b units of spacing while i and ` are separated by m unit of
spacing:
E[|Yj − Yi |Y` ] = E[|Yj Y` − Yi Y` |] = E[Yj Y` ] − E[Yi Y` ] − 2E[min{Yj Y` − Yi Y` , 0}]
= c(||sj − s` ||) + µ2 − c(||si − s` ||) − µ2 − 2E[min{Yj Y` − Yi Y` , 0}]
= γ(m) − γ(m − b) − 2E[min{Yj Y` − Yi Y` , 0}].
(15)
The expectation E[min{Yj Y` − Yi Y` , 0}] is non-liner in the underlying random variables.
Under the gaussian hypothesis it can be nevertheless simulated from a large number
S (say, S = 10, 000) of independent draws (y1s , y2s , y3s ), with s = 1, . . . , S, from the
random vector (Yj , Yi , Y` ) which is normally distributed with expectations (µ, µ, µ) and
variance-covariance matrix Cov[(Yj , Yi , Y` )]. As the process occurs on the transect, the
variance-covariance matrix writes

σ
2
2
σ − γ(b)


Cov[(Yj , Yi , Y` )] = 

σ2
2
σ − γ(m)



σ 2 − γ(m − b) 

2
σ
for given m, b and d. The resulting simulated expectation is denoted φW (m, b, d, σ 2 ) and
computed as follows:
S
1X
φW (m, b, d, σ ) =
min{y2s y3s − y1s y3s , 0}.
S s=1
2
42
Based on this result, we develop the covariance term in (10) as follows:
Cov[∆W d , µ̃] =
X wi X
X w`
wj
E[|Yj − Yi |Y` ]
w j∈d
j∈di wj ` w
i
X wi X
w
P j
E[|Yj − Yi |]
−µ
w
w
j
j∈d
i
i
j∈d
P
i
i
=
Bd
B X
X
ω(m, b, d) γ(m) − γ(m − b) − 2φW (m, b, d, σ 2 )
m=1 b=1
Bd
X
−2µ
ω(m, d)
p
γ(m)/π.
(16)
m=1
P
P
P
w
The weights in (16) coincide respectively with ω(m, b, d) = i wwi `∈dmi ww` j∈dbi P j wj
j∈di
P
P
w
and ω(m, d) = i wwi j∈dmi P j wj . The variogram appears in the second term of (16)
j∈di
is it was the case in (14).
ˆ W d , can be obtained by plugging into
A consistent estimator for the SE, denoted SE
(10) the empirical counterparts of the variogram and the lag-dependent weights, using
the formulas in (13), (14) and (16). These estimators are discussed in section A.5.
A.4
Standard errors for the spatial GINI between index
ˆ I B (y, d) ± zα SEB d for the GINI between
Estimation of confidence interval bounds GIN
index are obtained under the same assumptions outlined in the previous section. As
before, we assume that the spatial process {Ys : s ∈ S} is limited to n locations. We
index these locations for simplicity by i such that i = 1, . . . , n. The spatial process is then
a collection of n random variables {Yi : i = 1, . . . , n} that are spatially correlated. The
joint distribution of the process is F . Each location is associated with a weight wi ≥ 0
P
with w = i wi . These weights are assumed not to be stochastic.
P
w
Under stationary assumptions, the neighborhood averages µid = j∈di ∪{i} P j wj Yj
j∈di ∪{i}
P
and µd = i wwi µid are equivalent in distribution to µ̃, and hence µ̃ can be used to assess
V ar[µd ], as V ar[µd ] = V ar[µ̃]. Similar conclusions cannot be drawn for measures of linear
association involving µd .
43
An asymptotically equivalent version of the weighted GINI within index of the process
distributed as F where individual neighborhood have size d is
n
GIN IW (F, d) =
n
1 X X wi wj
1
∆B d .
|µ
−
µ
|
=
id
jd
2µ i=1 j=1 w2
2µ
(17)
We use results on variance estimators for ratios to derive the SE of (17):
V ar [GIN IB (F, d)] =
1
(GIN IB (F, d))2
V
ar[∆
]
+
V ar[µ̃] −
Bd
4nµ2
nµ2
GIN IB (F, d)
Cov[∆B d , µd ] + O(n−2 ),
−
nµ2
where the asymptotic SE is SEB d =
(18)
p
V ar [GIN IB (F, d)] at any d.
The variance and covariance terms in (18) are shown to be related to the variogram.
To obtain this result, we have to further introduce two assumption. The first assumption
is that the process distributed as F occurs on a transect, as explained before. We use
scalars m, b, b0 and so on to identify equally spaced points on the transect. Second, we
assume that Yi is gaussian with expectation µ and variance σ 2 .
The variance V ar[µ̃] which represents the population estimator for µ is given as in
(12).
The second variance component in (18) can be written as follows:
V ar[∆B d ] =
X X wi wj X X w` wk
i
−
j
w2
`
k
X wi X wj
i
w
j
w
w2
E[|µid − µjd ||µ`d − µkd |]
!2
E[|µjd − µid |]
.
(19)
The first component of V ar[∆B d ] cannot be further simplified as the absolute value operator enters the expectation term in multiplicative way. Under the gaussian assumption,
the expectation can be nevertheless simulated. This can be done acknowledging that the
random vector (µjd , µid , µkd , µ`d ) is normally distributed with expectations (µ, µ, µ, µ) and
variance-covariance matrix C of size 4 × 4. The cells in the matrix C are indexed accord44
ingly to vector (µjd , µid , µkd , µ`d ), so that element C12 is used, for instance, to indicate the
covariance between the random variables µjd and µid . The sample occurs on a transect.
We use scalars b and b0 to denote a well defined distance gap between any observation
indexed by {j, i, k, `} and any other observation that is b or b0 units away from it, within a
distance range d. We use scalars m to indicate the gap between i and `, so that ` = i + m;
we use m0 to indicate the gap between i and j, so that j = i + m0 and we use m00 to
indicate the gap between k and `, so that ` = k + m0 . Based on this notation, we can
construct a weighted analog of (6) to explicitly write the elements of C as transformations
45
of the variogram. This gives:
2
C11 = σ −
C22 = σ 2 −
C33 = σ 2 −
C44 = σ 2 −
C12 = σ 2 −
C13 = σ 2 −
C14 = σ 2 −
C23 = σ 2 −
C24 = σ 2 −
C34 = σ 2 −
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd
Bd X
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
ω1 (b, d)ω1 (b0 , d)γ(b − b0 ),
ω2 (b, d)ω2 (b0 , d)γ(b − b0 ),
ω3 (b, d)ω3 (b0 , d)γ(b − b0 ),
ω4 (b, d)ω4 (b0 , d)γ(b − b0 ),
ω1 (b, d)ω2 (b0 , d)γ(m0 + |b − b0 |),
ω1 (b, d)ω3 (b0 , d)γ(m + |b − b0 |),
ω1 (b, d)ω4 (b0 , d)γ(m + |b − b0 |),
ω2 (b, d)ω3 (b0 , d)γ(m + |b − b0 |),
ω2 (b, d)ω4 (b0 , d)γ(m + |b − b0 |),
ω3 (b, d)ω4 (b0 , d)γ(m00 + |b − b0 |),
b=1 b0 =1
where we denote, for instance, ω1 (b, d) =
wj
j w
P
P
j 0 ∈dj
wj 0
P
j 0 ∈dj
wj
and similarly for the other
elements.
The expectation E[|µjd − µid ||µkd − µ`d |] can be simulated from a large number S (say,
S = 10, 000) of independent draws (ȳ1s , ȳ2s , ȳ3s , ȳ4s ), with s = 1, . . . , S, of the random
vector (µjd , µid , µkd , µ`d ). The simulated expectation will be a function of the variogram
parameters m, m0 , m00 and d and of σ 2 . It is denoted θB (m, m0 , m00 , d, σ 2 ) and estimated
46
as follows:
S
1X
θB (m, m , m , d, σ ) =
|ȳ2s − ȳ1s | · |ȳ4s − ȳ3s |.
S s=1
0
00
2
The summations in V ar[∆B d ] run over four indices i, j, k, `. These can be equivalently
represented through summations at given distance lags m, m0 , m00 . For instance, we write
P wi P wj
PB P wi P
wj
P
=
to indicate that i and j are separated by
i w
j w
m0 =1
i w
j∈dm0 i
wj
j∈dm0 i
0
a lag of m units on the transect. Repeating this for each of the three pairs of indices i, j
and `, k and i, ` we end up with three summations over m0 , m00 and m respectively, where
the aggregate weight is denoted
B X
B X
B X
X
X
X
wi X
w` X
wi X
wj
wk
w
P
P
P `
·
·
.
ω(m, m , m , d) =
w j∈d
j∈dm0 i wj m00 =1 ` w k∈d
k∈dm00 ` wk m=1 i w `∈d
`∈dmi w`
m0 =1 i
0
00
m0 i
m00 `
wi wj
w2
mi
w` wk
E[|µid
w2
− µjd ||µ`d − µkd |],
ω(m, m0 , m00 , d)θB (m, m0 , m00 , d, σ 2 ).
(20)
Hence, the first term of the V ar[∆B d ],
P P
i
j
P P
`
k
can be written as follows:
B X
B
B
X
X
m=1
m0 =1
m00 =1
As of the second term of V ar[∆B d ], we make use of the gaussian assumption and the
variogram properties to express the square of the expectation as a weighted analog of (8),
47
that is
"
V ar[∆B d ] = E
X X wi wj
i
j
w2
X X wi wj
=
i
j
w2
#2
|µid − µjd |
!2
E[|µid − µjd |]
r !2
2
V ar[|µid − µjd |]
=
2
w
π
i
j
!2
B P
q
2 X wi X j∈dmi wj X
wj
P
=
V ar[|µid − µjd |]
π
w m0 =1
w
j∈dmi wj
i
j∈dmi
!2
B
q
2 XX X
ωij (m, d) V ar[|µid − µjd |]
(21)
=
π m0 =1 i j∈d
X X wi wj q
mi
where
V ar[|µid − µjd |] =
Bd X
Bd
X
2ωij (b, b0 , d)γ(m − |b − b0 |) − (ωi (b, b0 , d) + ωj (b, b0 , d))γ(b − b0 ).
b=1 b0 =1
Both weighting schemes in (20) and in (21) cannot be easily estimated in reasonable
computation time: they involve multiple loops across the observed locations, so that the
length of estimation increases exponentially with the density of the spatial structure. In
section A.5 we discuss estimators of the weights ωij (m, m0 , m00 , d), ωij (m, d), ωij (b, b0 , d),
ωi (b, b0 , d) and ωj (b, b0 , d) that are feasible, and provide the empirical estimator of the
variance V ar[∆B d ].
The third component of (18) is the covariance term Cov[∆B d , µd ]. The indices i, j, `
identify three locations and their respective neighborhood income averages vector (µid , µjd , µ`d ).
48
Under normality and stationarity assumptions, we can write the covariance term as follows
X X wi wj
X w`
Cov[∆B d , µd ] = Cov[
µ`d ]
|µ
−
µ
|,
id
jd
w2
w
i
j
`
X X wi wj
X w`
Cov[
=
|µid − µjd |, µ`d ]
2
w
w
i
j
`
X w` X X wi wj
=
E[|µid − µjd |µ`d ] −
w i j w2
`
X X wi wj
X w`
E[µ`d ]
−
E[|µid − µjd |]
2
w
w
i
j
`
X w` X X wi wj
E[|µid − µjd |µ`d ] −
=
2
w
w
i
j
`
r
q
2 X X wi wj
−
µ
V ar[µid − µjd ].
π i j w2
(22)
The first term of (22) is the expectation of a non-linear function of convex combinations of
normally distributed random variables. Under the gaussian hypothesis, the expectation
E[|µid −µjd |µ`d ] can be nevertheless simulated from a large number S (say, S = 10, 000) of
independent draws (ȳ1s , ȳ2s , ȳ3s ) with s = 1, . . . , S from the random vector (µid , µjd , µ`d )
which is normally distributed with expectations (µ, µ, µ) and variance-covariance matrix
C of size 3 × 3. Let use scalars b and b0 to denote a well defined distance gap between any
observation indexed by {j, i, `} and any other observation that is b or b0 units away from
it, within a distance boundary d. We use scalars m0 to indicate the gap between i and j,
so that j = i + m0 ; we use m00 to indicate the gap between i and `, so that ` = i + m00 .
Based on this notation, we obtain a convenient formulation for the covariances of mean
49
neighborhood incomes that are weighted analog of (6), thus giving:
2
C11 = σ −
C22 = σ 2 −
C33 = σ 2 −
C12 = σ 2 −
C13 = σ 2 −
C23 = σ 2 −
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd X
Bd
X
b=1 b0 =1
Bd
Bd X
X
b=1 b0 =1
Bd
Bd X
X
b=1 b0 =1
Bd X
Bd
X
ω1 (b, d)ω1 (b0 , d)γ(b − b0 ),
ω2 (b, d)ω2 (b0 , d)γ(b − b0 ),
ω3 (b, d)ω3 (b0 , d)γ(b − b0 ),
ω1 (b, d)ω2 (b0 , d)γ(m0 + |b − b0 |),
ω1 (b, d)ω3 (b0 , d)γ(m00 + |b − b0 |),
ω2 (b, d)ω3 (b0 , d)γ(|m0 − m00 | + |b − b0 |),
b=1 b0 =1
where we denote, for instance, ω1 (b, d) =
P
wj
j w
P
wj 0
j 0 ∈dj
P
j 0 ∈dj
wj
and similarly for the other
elements. See previous notation for further details. The expectation E[|µjd − µid |µ`d ] is
simulated from a number S of independent draws (ȳ1s , ȳ2s , ȳ3s ) with s = 1, . . . , S of
the random vector (µjd , µid , µ`d ). The simulated expectation will be a function of the
variogram parameters m0 , m00 and d and of σ 2 . It is denoted θB (m, m0 , m00 , d, σ 2 ) and
estimated as follows:
φB (m0 , m00 , d, σ 2 ) =
S
1X
|ȳ2s − ȳ1s |ȳ3s .
S s=1
This element is constant over m0 and m00 . Hence, we use φB (m0 , m00 , d, σ 2 ) as a simulated
P
P P ww
analog for E[|µid −µjd |µ`d ], so that the covariance term ` ww` i j wi 2 j E[|µid − µjd |µ`d ]
P
P P ww
writes ` ww` i j wi 2 j φB (m0 , m00 , d, σ 2 ), or equivalently
B
X wi X
i
w
m0 =1
P
j∈dm0 i
w
B
wj X
P
`∈dm00 i
m00 =1
50
w
w`
φB (m0 , m00 , d, σ 2 ),
which is denoted
PB
m0 =1
PB
m00 =1
ω(m0 , m00 , d)φB (m0 , m00 , d, σ 2 ).
The second term of (22) is calculated as in (21). Overall, we are now allowed to write
the covariance term as follows:
Cov[∆B d , µd ] =
B
B
X
X
ω(m0 , m00 , d)φB (m0 , m00 , d, σ 2 )
m0 =1 m00 =1
r
−
B
q
2 XX X
µ
ωij (m, d) V ar[|µid − µjd |],
π m0 =1 i j∈d
(23)
mi
where
q
V ar[|µid − µjd |] =
Bd
Bd X
X
!.5
2ωij (b, b0 , d)γ(m − |b − b0 |) − (ωi (b, b0 , d) + ωj (b, b0 , d))γ(b − b0 )
b=1 b0 =1
The weights have been already defined in (21). Plugging (13), (20), (21) and (23) into
(18) we derive an estimator for the GINI within index SE. The last section discuss feasible
estimators.
A.5
Implementation
Consider a sample of size n of income realizations yi with i = 1, . . . , n. The income vector
y = (y1 , . . . , yn ) is a draw from the spatial random process {Ys }, while for each location
s ∈ S we assume to observe, at most, one income realization. Information about location
of an observation i in the geographic space S under analysis is denoted by si ∈ S, so that
a location s identifies a precise point on a map. Information about latitude and longitude
coordinates of si are given. Furthermore, observed incomes are associated with weights
P
wi ≥ 0 and are indexed according to the sample units, with w = i wi . It is often the
case that the sample weights give the inverse probability of selection of an observation
from the population.
The mean income in an individual neighborhood of range d, µid , is estimated by
51
µ
bid =
Pn
j=1
ŵj yj where
wj 1(||si − sj || ≤ d)
ŵj := P
j wj 1(||si − sj || ≤ d)
P
ŵj = 1, and 1(.) is the indicator function. The estimator of the average
Pn wi
neighborhood mean income is instead µ̂d =
bid . The estimator of the GINI
i=1 w µ
so that
j
ˆ I B (y, d), is the Gini inequality index
between index of spatial inequality, denoted GIN
of the vector of estimated average incomes (µ̂1d , . . . , µ̂nd ), indexed by the size d of the
individual neighborhood. It can be computed by mean of the plug-in estimators as in
Binder and Kovacevic (1995) and Bhattacharya (2007). The estimator of the GINI within
ˆ I W (y, d), is the sample weighted average of the
index of spatial inequality, denoted GIN
mean absolute deviation of the income of an individual located in s from the income of
other individuals located in s0 , with ||s − s0 || ≤ d. Formally
ˆ I W (y, d) =
GIN
n
n
X
wi 1 X
ŵj |yi − yj |,
w
2µ̂
id
i=1
j=1
where ŵj is defined as above.
The estimation of the GINI indices is conditional on d, which is a parameter under
control of the researcher. The distance d is conventionally reported in meters and is meant
to capture a continuous measure of individual neighborhood. In practice, however, one
cannot produce estimates of spatial inequality for a continuum of neighborhoods, and so
in applications the neighborhood size is parametrized by the product of the number and
size of lags between observations. The GINI indices are estimated for a finite number of
lags and for a given size of the lags. The maximum number of lags indicates the point
at which distance between observations is large enough that the spatial GINI indices
converge to their respective asymptotic values. For a given neighborhood of size d, we
can then partition the distance interval [0, d], defining the size of a neighborhood, into
K intervals d0 , d1 , . . . , dK of equal size, with d0 = 0. Any pair of observations i and j
at distance dk−1 < ||si − sj || ≤ dk is assumed to be located at distance dk one from the
ˆ I B (y, dk )) and (dk , GIN
ˆ I W (y, dk )) for any k = 1, . . . , K can
other. The pairs (dk , GIN
52
be hence plotted on a graph. The curves resulting by linearly interpolating these points
are the empirical equivalent of the GINI spatial inequality curves.
A plug-in estimator for the asymptotic standard error of the GINI indices can be
derived under the assumptions listed in the previous sections. The SE estimator crucially
depend on four components: (i) the consistent estimator for the average µ̃, denoted µ̂,
which coincides with the sample average; (ii) the consistent estimator for variance σ 2 ,
denoted σ̂ 2 , which is given by the sample variance; (iii) the consistent estimator for the
variogram; (iv) the estimator of the weighting schemes.
A sample of incomes y = (y1 , . . . , yn ) of n distinct units is drawn from F . These
incomes represent realizations of the spatial process {Ys : s ∈ S}, where we assume that
income realizations occur at distinct locations. Locations are geocoded. In this way,
distance measures between locations can be easily constructed. In applications involving
geographic representations, the latitude and longitude coordinates of any pair of income
yi , yj can be combined to obtain the geodesic distance among the locations of i and of j.
We also denote with wi the weight of each observation, often reflecting the inverse of the
probability that observation i is observed in the population.
Empirical estimators µ̂ and σ̂ 2 are standard. The robust non-parametric estimator
of the variogram proposed by Cressie and Hawkins (1980) can be used to assess the
pattern of spatial dependency from spatial data on income realizations. The empirical
variogram is defined for given spatial lags, meaning that it produces a measure of spatial
dependence among observations that are located at a given distance lag one from the
others. Under the assumption that data occur on the transect at equally spaced points,
we use b = 1, . . . , B to partition the empirical spectrum of distances between observed
locations into equally spaced lags, and we estimate the variogram on each of these lags.
This means that 2γ(b) refers to the correlation between incomes placed at distance lags of
exactly b distance units. It is understood that the size of the sample is large compared to
B, in the sense that the sampling rate per unit area remains constant when the partition
into lags becomes finer. This assumption allows to estimate a non-parametric version
of the variogram at every distance lag. Following Cressie (1985), we use weighted least
53
squares to fit a theoretical variogram model to the empirical variogram estimates. The
theoretical model consists in a continuous parametric function mapping distance into the
corresponding variogram level. In the application, we choose the spherical variogram
model for γ (see Cressie 1985). We also assume that γ(0) → 0 and that γ(a) = σ 2 , where
a is the so-called range level: beyond distance a, the random variables Ys+h and Ys with
h > a are spatially uncorrelated. Under the assumption that data occur on a transect,
we set the max number of lags B so that 2B = a. The parameters of the variogram
model are estimated via weighted least squares, where the non-parametric variogram
coordinates are regressed on distance lags. The estimated parameters are then used to
draw parametric predictions for the estimator 2γ̂ of the variogram at pre-determined
abscissae (distance lags). The predictions are then plugged into the GINI indices SE
estimators. Cressie (1985) has shown that this methodology leads to consistent estimates
of the true variogram function under the stationarity assumptions mentioned above.
Finally, SE estimation requires to produce reliable estimators of the weights ω. These
can be non-parametrically identified from the formulas provided above. In some cases,
however, computation of the exact weights requires looping more than once across observations. The overall computation time thus increases exponentially in the number of
observations and the procedure becomes quickly unfeasible. We propose alternative, feasible estimator for these weights, denoted ω̂, that are expressed as linear averages. The
computational time is, nevertheless, quadratic in the number of observations as it requires
at least one loop across all observations.
ˆ W d in (10) and
We consider here only the weights that appear in the estimators SE
ˆ B d in (18) that cannot be directly inferred (i.e., are computationally unfeasible) from
SE
P
observed weights. For a given observation i, define w(b, i) = j∈dbi wj for any gap b =
1, . . . , Bd , . . . , B the weight associated with income realizations that are exactly located
P
P d
b lags away from i. Then, denote w(d, i) = j∈di wj = B
b=1 w(b, i). We construct the
54
following estimators for the weights appeasing in the GINI within SE estimator:
X wi w(b, i) w(m, i) w(m + b0 , i)
For (14) : ω̂(m, b, b0 , d) =
w w(d, i)
i
For (16) : ω̂(m, b0 , d) =
w
X wi w(m, i) w(b0 , i)
i
w
w
w(d, i)
w(m + d, i)
,
,
To compute these weights, one has to loop over all observations twice, and assign to each
observation i the total weight w(b, i) of those observations j 6= i that are located exactly
at distance b from i. Then, ω̂(m, b, b0 , d) and ω̂(m, b0 , d) are obtained by averaging these
weights across i’s. The key feature of these estimators is that second-order loops across
observations placed at distance b0 from an observation at distance m from i are estimated
by averaging across all observations i the relative weight of observations at distance m + b0
from ay i.
For the computation of the GINI between index, one needs to construct the relative
weights by taking as a reference the maximum distance achievable, and not the reference
abscissa d for which the index is calculated. We hence assume that beyond the threshold d,
indicating half of the the maximum distance achievable in the sample, spatial correlation
is negligible and weights can thus be set to zero. We implicitly maintain that d ≤ d. We
then propose the following estimators:
For (17) : ω̂(m, m0 , m00 , d) =
X wi w(m, i) w(m, i) w(m + m0 , i) w(m + m00 , i)
i
w
w
w(d, i) w(m + d, i) w(m + d, i)
w(b, i) w(m + b0 , i)
w(d, i) w(m + d, i)
w(b, i) w(b0 , i)
For (18) : ω̂i (b, b0 , d) =
w(d, i) w(d, i)
w(m + b, i) w(m + b0 , i)
For (18) : ω̂j (b, b0 , d) =
w(m + d, i) w(m + d, i)
X X
X wi w(m, i)
For (18) :
ω̂ij (m, d) =
w
w
i j∈d
i
For (18) : ω̂ij (b, b0 , d) =
mi
By plugging these estimators into (19) we obtain the implementable estimator of the
55
variance component V ar[∆B d ], defined as follows:
Vd
ar[∆B d ] =
B X
B X
B
X
ω̂(m, m0 , m00 , d)θB (m, m0 , m00 , d, σ̂ 2 ) −
m=1 m0 m00 =1
2
π
!2
q
B X
X
wi w(m, i) d
V ar[|µid − µjd |]
w
w
0
m =1 i
(24)
where
q
Vd
ar[|µid − µjd |] =
Bd
Bd X
X
!.5
0
0
0
0
0
2ω̂ij (b, b , d)γ̂(m − |b − b |) − (ω̂i (b, b , d) + ω̂j (b, b , d))γ̂(b − b )
b=1 b0 =1
An equivalent procedure, based on analogous weighting scheme, has to be replicated
to determine the empirical estimator for (23).
56
.
B
Additional tables of results
Figure 6 show that, in general, the gap in GINI within indices is small and never significant, not even at 10% confidence level. Essentially, there is no statistical support to
conclude that the GINI curves for within spatial inequality have changed across time, a
result which holds irrespectively of the extent of individual neighborhood. We draw a
different conclusion for what concerns changes associated to the GINI between inequality
curves. Pairwise differences across these curves, along with their confidence intervals,
are reported in figure 5. The differences in inequality curves compared to the spatial
inequality curve of the year 1980 (panels a, b and c of the figure) are generally positive
and significant at 5% confidence level. This indicates that spatial inequality between individual neighborhood has increased compared to the initial period, roughly homogenously
with respect to the individual neighborhood spatial extension. After that period, data
display very little statistical support to changes in inequality across the 1990’ and 2000’.
Spatial between inequality has slightly increased after 1990 (panels d and e), while it has
remained stable after 2000 (panel f). In the latter case, the confidence bounds of the
difference in spatial inequality curves of years 2010/2014 and 2000 fluctuates around the
horizontal axis.
We use estimates of the GINI standard errors to study the pattern of the spatial inequality curves. More specifically, we compute differences in the GINI within GIN IW (d)
or between GIN IB (d) indices at various abscissae d, then we compute the standard errors
of these differences, and finally we test if these differences are significantly different than
zero. If they are, we study how spatial inequality evolves with the size of the neighborhood. In particular, the sign of these differences predicts the direction of the change
in spatia inequality. We refer to five distance thresholds defining neighborhoods that
are very small (100 meters, 300 meters), relatively large (1km, 5km), and very inclusive
neighborhoods (10km, 25km), which include most of the urban space under analysis. The
resulting differences are reported in Table 2. We note that the dip in the spatial inequality curve associated with the GINI within is not statistically significant, since most of
57
Index
GIN IW
Year
1980
1990
2000
2010
GIN IB
1980
1990
2000
2010
300m vs
100m
-0.004
(0.019)
-0.006
(0.022)
-0.004
(0.017)
-0.000
(0.017)
-0.020**
(0.002)
-0.012**
(0.004)
-0.009**
(0.002)
-0.019**
(0.002)
1km vs
100m
-0.012
(0.020)
-0.019
(0.022)
-0.016
(0.017)
-0.004
(0.018)
-0.087**
(0.002)
-0.084**
(0.003)
-0.060**
(0.002)
-0.083**
(0.002)
Differences across distances
5km vs 25km vs 10km vs
100m
100m
2km
-0.006
0.015
0.013
(0.020)
(0.021)
(0.021)
0.003
0.037*
0.036
(0.021)
(0.021)
(0.022)
-0.002
0.035*
0.034
(0.020)
(0.021)
(0.021)
0.001
0.033
0.019
(0.019)
(0.021)
(0.021)
-0.151** -0.239** -0.061**
(0.003)
(0.003)
(0.002)
-0.171** -0.280** -0.097**
(0.004)
(0.004)
(0.003)
-0.130** -0.237** -0.095**
(0.002)
(0.003)
(0.003)
-0.160** -0.261** -0.084**
(0.003)
(0.003)
(0.003)
25km vs
2km
0.025
(0.023)
0.051**
(0.023)
0.050**
(0.022)
0.036
(0.023)
-0.120**
(0.002)
-0.160**
(0.003)
-0.152**
(0.003)
-0.141**
(0.002)
25km vs
10km
0.012
(0.023)
0.015
(0.021)
0.016
(0.024)
0.017
(0.024)
-0.059**
(0.003)
-0.064**
(0.004)
-0.057**
(0.003)
-0.058**
(0.003)
Table 2: Patterns of GINI indices across distance levels
Note: Authors elaboration on U.S. Census data. Each column report differences in GINI indices at
various distance thresholds. SE of the distance estimate are reported in brackets. Significance levels:
∗ = 10% and ∗∗ = 5%.
the changes in spatial inequality in very large neighborhoods is substantially equivalent
to the spatial inequality observed for very small neighborhoods (between 100 to 300 meters of size). For 1990 and 2000, we find however a statistically significant increase in
inequality when average size neighborhoods (1km of radius) are compared with very large
concepts of neighborhoods. Overall, the GINI within pattern is substantially flat when
the distance increases beyond 5km. The pattern registered for the GINI between index is
much more clear-cut: generally, the spatial inequality curve constructed from the index
is decreasing in distance (differences in GINI between are generally negative), and the
patterns of changes are also significant at 5%, indicating strong reliability on the pattern
of heterogeneity in average income distribution across neighborhoods.
58
Figure 5: Differences in spatial GINI within indices across decades
(a) GIN IW 1990 - GIN IW 1980
(b) GIN IW 2000 - GIN IW 1980
(c) GIN IW 2014 - GIN IW 1980
(d) GIN IW 2000 - GIN IW 1990
(e) GIN IW 2014 - GIN IW 1990
(f) GIN IW 2014 - GIN IW 2000
Note: Authors elaboration on U.S. Census data. Confidence bounds at 5% are based on standard error
estimators discussed in the appendix.
59
Figure 6: Differences in spatial GINI between indices across decades
(a) GIN IB 1990 - GIN IB 1980
(b) GIN IB 2000 - GIN IB 1980
(c) GIN IB 2014 - GIN IB 1980
(d) GIN IB 2000 - GIN IB 1990
(e) GIN IB 2014 - GIN IB 1990
(f) GIN IB 2014 - GIN IB 2000
Note: Authors elaboration on U.S. Census data. Confidence bounds at 5% are based on standard error
estimators discussed in the appendix.
60
C
Statistics for selected U.S. cities
City
Year
# Blocks
Hh/block
Eq. scale
Mean
Equivalent household income
20%
80%
Gini 90%/10%
New York (NY)
1980
1990
2000
2010/14
6319
6774
6618
7182
1318
1664
1537
1140
1.572
2.058
1.604
1.566
12289
22763
41061
56558
4601
7799
12196
19749
19034
35924
66542
92656
0.474
0.507
0.549
0.502
11.247
13.013
25.913
17.323
Los Angeles (CA)
1980
1990
2000
2010/14
5059
5905
6103
6385
1052
1585
1158
1107
1.615
2.012
1.690
1.649
14697
26434
38844
55224
6167
10509
13720
19056
22248
41048
59767
90324
0.441
0.475
0.509
0.505
10.735
12.391
19.256
13.628
Chicago (IL)
1980
1990
2000
2010/14
3756
4444
4691
4763
1122
1217
1173
1060
1.630
2.029
1.625
1.575
13794
21859
41193
55710
5798
9132
16076
20022
20602
32316
61667
89856
0.434
0.461
0.473
0.486
11.351
11.903
11.533
13.452
Houston (TX)
1980
1990
2000
2010/14
1238
2531
2318
2781
1253
1291
1418
2148
1.624
1.994
1.667
1.644
15419
22827
39231
55841
6900
10203
16619
22156
22718
33287
57539
88033
0.428
0.462
0.472
0.484
10.233
11.771
10.736
12.394
Philadelphia (PA)
1980
1990
2000
2010/14
3978
3300
4212
3819
855
1384
982
1124
1.650
2.001
1.602
1.566
12651
21816
38995
56205
5589
9601
15788
21567
18557
31606
57841
89602
0.410
0.442
0.454
0.465
10.245
11.788
10.972
13.174
Phoenix (AZ)
1980
1990
2000
2010/14
697
1857
1984
2494
1155
961
1222
1110
1.609
1.970
1.622
1.590
12854
21233
37860
48194
5920
9831
17098
20218
18741
30732
54998
73509
0.401
0.439
0.437
0.456
8.972
9.803
8.541
10.906
San Antonio (TX)
1980
1990
2000
2010/14
597
1101
1065
1220
891
890
1189
1307
1.686
1.983
1.651
1.623
10501
17350
31592
44773
4364
7569
13726
19048
15399
25243
45517
68074
0.451
0.455
0.454
0.454
10.206
9.903
16.081
11.225
San Diego (CA)
1980
1990
2000
2010/14
908
1628
1678
1789
1471
1473
1172
1546
1.577
1.961
1.637
1.615
12759
24194
39537
55564
5628
11007
16698
21947
18338
35191
57219
88783
0.412
0.434
0.451
0.452
8.893
11.239
9.644
11.978
Table 4: Income and population distribution across block groups, U.S. 30 largest cities
61
City
Year
# Blocks
Continued
Hh/block Eq. scale
Mean
Equivalent household income
20%
80%
Gini 90%/10%
Dallas (TX)
1980
1990
2000
2010/14
1141
2310
2189
2696
931
965
1251
1251
1.620
1.993
1.633
1.625
14614
24074
43913
54729
6759
11287
19306
23689
21494
35141
65158
84291
0.425
0.454
0.464
0.460
9.522
11.691
10.093
11.163
San Jose (CA)
1980
1990
2000
2010/14
571
1016
965
1071
1417
1400
1169
1427
1.633
1.954
1.689
1.664
16762
32120
59428
82154
8441
15598
24663
30785
24258
47103
91637
137435
0.365
0.405
0.433
0.455
7.215
8.339
9.465
14.295
Austin (TX)
1980
1990
2000
2010/14
296
718
644
899
1084
1345
1416
1662
1.517
2.019
1.569
1.576
11407
18968
38993
55093
4867
8497
17418
23478
17064
27339
55766
85981
0.440
0.461
0.442
0.443
9.902
10.522
9.455
11.403
Jacksonville (FL)
1980
1990
2000
2010/14
434
628
505
688
1000
1509
2358
1757
1.622
1.973
1.590
1.550
10868
19217
34398
46517
4602
8365
14528
18370
15546
27219
49341
71941
0.428
0.435
0.434
0.450
9.415
9.512
8.629
10.883
San Francisco (CA)
1980
1990
2000
2010/14
1083
1226
1105
1210
1166
1477
1316
1328
1.514
2.040
1.549
1.525
16322
28783
60967
85755
6927
11624
20961
28440
24339
44191
97430
145763
0.424
0.467
0.494
0.482
9.864
13.379
13.179
16.858
Indianapolis (IN)
1980
1990
2000
2010/14
730
1029
944
1030
1073
1395
1395
1639
1.617
1.985
1.573
1.568
12550
20996
37021
47262
5958
9806
16392
19870
18183
29406
52896
71036
0.388
0.425
0.423
0.450
9.032
9.515
8.317
10.624
Columbus (OH)
1980
1990
2000
2010/14
758
1281
1140
1269
1105
1128
986
1293
1.593
1.988
1.553
1.560
12427
19865
35926
48270
5984
9262
16152
21115
17840
28819
51815
72778
0.394
0.427
0.431
0.439
8.874
9.649
8.848
11.633
Fort Worth (TX)
1980
1990
2000
2010/14
640
1203
1101
1326
650
956
1147
1294
1.615
1.972
1.638
1.625
12873
21517
37074
50540
5870
10428
17140
21830
18794
30620
52607
75565
0.409
0.424
0.429
0.449
9.169
9.835
8.719
10.553
Charlotte (NC)
1980
1990
2000
2010/14
346
930
856
1172
1169
1032
1195
1299
1.614
1.959
1.583
1.579
11411
20366
39683
47697
5203
8961
16640
19231
16277
29519
59188
74717
0.400
0.424
0.451
0.452
8.864
9.445
9.145
11.757
Detroit (MI)
1980
1990
2000
2010/14
2184
4531
3954
3798
764
974
963
986
1.638
1.990
1.603
1.560
12853
22673
40742
46492
5587
10194
17362
18592
19246
33441
59654
71604
0.415
0.445
0.439
0.456
10.783
12.181
9.817
11.856
El Paso (TX)
1980
1990
2000
2010/14
218
425
418
511
897
1042
960
1142
1.759
1.969
1.750
1.694
8525
15009
23862
33277
3572
6372
9095
13049
12373
21601
33972
51000
0.443
0.456
0.476
0.462
9.182
8.963
16.668
11.060
62
City
Year
# Blocks
Continued
Hh/block Eq. scale
Mean
Equivalent household income
20%
80%
Gini 90%/10%
Seattle (WA)
1980
1990
2000
2010/14
1405
2255
2473
2475
885
1004
855
1087
1.540
1.984
1.568
1.555
14437
22563
42386
59626
6481
10601
18650
24442
21204
31905
60276
92751
0.398
0.416
0.427
0.438
8.514
10.514
8.448
10.314
Denver (CO)
1980
1990
2000
2010/14
1054
1694
1711
1908
899
983
1038
1230
1.575
2.005
1.578
1.561
14283
22072
43300
58203
6866
10791
20142
24081
20352
31410
62101
90216
0.396
0.432
0.425
0.450
8.081
11.069
8.100
10.751
Washington (DC)
1980
1990
2000
2010/14
1580
2540
2642
3335
1608
2193
1409
1360
1.619
1.968
1.603
1.600
18273
32091
53263
80366
9281
16818
24898
35929
26315
45700
78715
124973
0.390
0.404
0.425
0.420
8.361
7.758
8.968
10.665
Memphis (TN)
1980
1990
2000
2010/14
478
920
783
764
1021
903
1153
1380
1.639
1.997
1.605
1.573
11370
17888
33086
42700
4852
8072
13753
17702
16693
26052
47853
65757
0.457
0.471
0.471
0.465
10.804
10.945
18.640
11.492
Boston (MA)
1980
1990
2000
2010/14
3662
4497
3963
4082
809
1032
961
1058
1.622
1.997
1.584
1.566
12696
24633
43840
64422
5417
10314
16776
23196
18790
37112
66109
105048
0.406
0.436
0.458
0.470
10.048
12.226
11.004
13.712
Nashville (TN)
1980
1990
2000
2010/14
375
755
723
911
1043
1260
1374
1535
1.605
1.979
1.555
1.568
12416
19811
36360
49714
5382
8712
15118
20024
18373
28653
52565
76735
0.442
0.442
0.448
0.452
10.358
9.710
9.000
10.444
Baltimore (MD)
1980
1990
2000
2010/14
1517
1965
1780
1932
900
1269
1204
1182
1.641
1.972
1.588
1.567
12751
23987
38615
59954
5932
11302
16954
25171
18442
34591
55517
93398
0.400
0.426
0.431
0.439
10.075
11.780
9.565
11.158
Oklahoma City (OK)
1980
1990
2000
2010/14
709
1034
880
1015
720
854
941
1021
1.573
1.993
1.557
1.562
12933
17551
30578
45377
5777
7499
12488
18504
18878
26072
44422
68795
0.419
0.445
0.447
0.457
9.075
9.616
15.739
10.504
Portland (OR)
1980
1990
2000
2010/14
696
1145
1141
1374
1077
1131
1111
1211
1.526
1.991
1.586
1.567
12819
19987
37618
49201
5411
8840
16409
19927
18704
28511
53854
74485
0.404
0.424
0.417
0.428
9.155
9.403
8.385
10.490
Las Vegas (NV)
1980
1990
2000
2010/14
150
318
796
1284
2018
2570
1396
1215
1.554
1.976
1.620
1.592
12756
20006
36442
44657
5568
8888
16095
18771
17713
27960
51823
66044
0.406
0.431
0.430
0.442
8.542
9.310
8.202
9.525
Louisville (KY)
1980
1990
2000
2010/14
582
957
742
840
873
938
1021
1087
1.592
1.990
1.542
1.536
11451
18323
32264
45220
5036
7864
13213
17798
17218
27067
46595
69576
0.414
0.445
0.444
0.451
9.188
9.771
15.196
10.739
63