So close yet so unequal: Reconsidering spatial inequality in U.S. cities Francesco Andreoli∗ Eugenio Peluso† Preliminary version: October 9, 2016 Abstract We study spatial inequality through new Gini-type indices that capture inequality in the distribution of incomes both within and between individual neighborhoods of variable size. We characterize redistribution schemes that reduce citywide income inequality along with spatial inequality. We apply this new methodology on a rich income database covering the 30 largest U.S. cities from 1980 to nowadays. We identify remarkable similarities in spatial inequality patterns across these cities, summarized by few stylized facts: within neighborhood inequality is substantial irrespectively of the neighborhood size, and generally stable over time, while inequality between individual neighborhoods strongly depends on neighborhood size and it has been on the rise during the last 35 years. Keywords: Gini, neighborhood, between inequality, variogram, geostatistics, flat tax, census, ACS, urban, U.S. JEL codes: C34, D31, H24, P25. ∗ † Luxembourg Institute of Socio-Economic Research, LISER. Email: [email protected]. DSE University of Verona 1 1 Introduction The uprising economic debate on the spatial component of income inequality in the U.S. has made clear that not all cities are made equal. Despite large national income inequality, there is great heterogeneity across U.S states and across their major cities (Chetty, Hendren, Kline and Saez 2014). Inequality in some cities has skyrocketed in the last decades, while inequality has remained relatively low in others. For instance, the Gini index of disposable equivalent household income in New York City in 2014 is larger than 0.5, while the figure is much closer to, and even below 0.4 in other large cities like Washington (DC). This heterogeneity, which can be explained by differences in skills and human capital composition across cities (Glaeser, Resseger and Tobio 2009), bears important consequences for local intervention, for targeting program participation and for designing federal to local redistribution of resources (Sampson 2008, Reardon and Bischoff 2011). What this picture misses is that not all neighborhoods of the same city are made equally unequal. Works at the frontier of economics, sociology and urban geography have recognized that neighborhood inequality is generally not representative of citywide inequality in the U.S. The discrepancy between local inequalities and citywide inequality, reflecting association between incomes and location of income holders, has been addressed to as urban spatial inequality. Various contributions have proposed to assess spatial inequality by using the administrative partition of the urban space to identify neighborhoods, and to measure it as the share of citywide inequality due to differences across neighborhoods (Shorrocks and Wan 2005, Dawkins 2007, Wheeler and La Jeunesse 2008, Kim and Jargowsky 2009). Evaluations based on this approach, however, have been criticized for putting the administrative neighborhood and not the individual, who is responsible for localization decisions, at the center. Following the geographic approach (Galster 2001, Clark, Anderson, Östh and Malmberg 2015), we consider the notion of individual neighborhood, namely the group of neighbors located within a given distance range from an individual, as the relevant concept of neighborhood. On the one hand, individual neighborhoods are meant to capture the space in 2 which different factors play a role in determining the pattern of location of rich and poor people within the urban space, and that the contribution of these factors can be captured by information on the incomes of one’s neighbors. On the other hand, the individual neighborhood is a meaningful representation of the space in which local inequality may affect individuals, for instance because individuals care about the income composition of the neighborhood (generating sorting patterns à la Schelling 1969), or via externalities (also called neighborhood effects, see Durlauf 2004). Social experiments such as Moving to Opportunity, for instance, leverage on peer effects in the hope of improving the situation of the poor (Sampson 2008, Ludwig, Duncan, Gennetian, Katz, Kessler, Kling and Sanbonmatsu 2013) and of his offspring (Chetty, Hendren and Katz 2016). In this paper, we study inequalities that take place at the urban level. Our contribution integrates the notion of individual neighborhood in inequality measurement and develops in three interconnected directions. Our first concern is positive, and has to do with how spatial inequality should be modeled and measured. We develop a new notion of spatial inequality where individuals (and not the administrative partition of the urban space) are put at the central stage and spatial inequality is assessed by using information on the distribution of other income recipients on a reference space around them. We parametrize the comparison set by using the geographic distance around each individual, such that the spatial pattern of inequality is accounted by a single and continuous parameter, notably the size of individual neighborhoods. In this way each individual neighborhood identifies the potential peers which the individual is exposed to. If there is association between space and income, we expect that the average income of the neighbors varies across individuals depending on their location. Otherwise, each individual’s neighborhood is representative of the city income distribution, implying no heterogeneity across individual neighborhoods. This point is made clear by figure 1. City A and City B are two stylized cities displaying the same overall incomes inequality. The two cities differ only for the location of the middle-class income recipient. In City A, the middle class individual is located close to the poor individual, while in City B he is 3 $4 $4 $2 $2 $1 i $1 j k i City A j k City B Figure 1: Spatial distribution of three individuals (the rich, the middle-class, and the poor) with their incomes (vertical spikes) in two linear cities of length 3d. located closer to the rich individual. Consider now the individual neighborhoods of size d, obtained by drawing a circle of ray d around each individual. Simple calculations reveal that there is more inequality between neighborhoods in City B compared to City A. This happens because the rich individual, who has an income substantially higher than the poor and middle-class income, is isolated in City B. The ”between” dimension of spatial inequality then captures the extent of inequality between individual neighborhoods.1 Heterogeneity of neighborhoods affluence overlooks heterogeneity within the neighborhoods. Figure 1 suggests that the degree of heterogeneity at which individuals are exposed to is substantially larger in City A compared to City B. This occurs because the rich and the middle-class individuals, whose incomes are very unequal, are very close neighbors only in City A, while in City B the middle-class person is a close neighbor of the poor person, whose income is just smaller than the middle-class income. Being exposed to peers who are either very affluent or very deprived might represent the society as a very risky one, where either success comes with very high income or failure comes with poverty. The ”within” dimension of spatial inequality captures the average degree of heterogeneity in peers’ incomes within individual neighborhoods of fixed size. Assessing spatial inequality along these two dimensions provides relevant information to support opposite views about the effects of (local) inequality: if negative externalities arise from deprivation feeling and envy (Luttmer 2005), then the distance between rich and 1 This perspective becomes relevant, for instance, if individual opportunities are proxied by the average income among the peers, a common perspective in peer effects literature. 4 poor people could mitigate part of these externalities, making City A the preferred one. However, the proximity among people from different social status might rise ambitions and opportunities of the poor and also benefit the rich (Ellen, Mertens Horn and O’ Regan 2013). In this case, City B might be considered as the favored one. To quantify the extent of spatial inequality along these two dimensions, we introduce the Gini Individual Neighborhood Inequality (GINI) indices. The GINI between index is the Gini inequality index in the population, where individual incomes are replaced by individual neighborhood average incomes. The size of the individual neighborhoods is parametrized by d, the distance between the income recipients on the urban space.2 To assess the within perspective of inequality, we rely on the probabilistic interpretation of the Gini index pioneered in Pyatt (1976). Pyatt shows that the Gini inequality coefficient can be written as the average of the gains and losses that each individual might realize if he had to exchange his own income with the income of any other income recipient randomly drawn from the population. To assess spatial inequality, we imagine that these comparisons are confined to individual neighborhoods rather than to the overall population. This gives the GINI within index, also parametrized by a distance parameter capturing the extent of individual neighborhood. We formalize these indices in Section 2, where we also derive spatial inequality curves, depicting the evolution of the GINI indices when d varies. We argue in a dedicated section of the paper that the traditional approach to spatial inequality has some conceptual limitations, and we motivate in what sense our proposed measurement setting can be seen as superior. We also uncover relations of the GINI indices with tools developed in geostatistics literature, such as the variogram function (Section 4), and commonly adopted to model spatial dependence in random processes. A methodological appendix develops innovative asymptotic results based on stationarity assumptions that are common in geostatistics. Our second concern is normative, and has to do with the implications of redistribution on inequality when the spatial dimensions of the income distribution is accounted for. We recognize that changes in overall inequality might not necessarily distribute evenly across 2 For a discussion about the use of alternative notions of distance, see Conley and Topa (2002). 5 the urban space. Taking the location of the individuals as given, we study the features of the redistribution scheme that, when applied to gross incomes in the city, produce postfisc incomes that display less citywide and less spatial inequality. Results are developed in Section 3. Our third and last concern is empirical. We make use of rich census data on income distributions in 30 largest U.S. cities over four decades to assess the pattern of spatial inequality in American urban areas (Section 5). Overall, we find very strong resemblances in patterns of spatial inequality among the cities considered in the study. We conclude that the pattern of localization of income in a city has little to do with the differences in the organization of the urban space of the cities or with differences in citywide income distributions, and more with the preferences of inhabitants for the relative composition of their neighborhood and with costs and benefit these preferences might give rise to. Section 6 concludes drawing lines of future research on the issue. 2 Spatial inequality measurement 2.1 The GINI indices We consider a population of n ≥ 3 individuals, indexed by i = 1, ..., n. Let yi ∈ R+ be the income of individual i and y = (y1 , y2 , ..., yn ) the income vector with average µ > 0. A P P popular measure of inequality in y is the Gini index, defined as G(y) = 2n12 µ i j |yj −yi |. We focus on inequality within the city, and we assume that information on the income distribution comes with information about the location of each income recipient in the urban space. We simply use income distribution to refer to the income-location distribution of individuals on the city map. Building on this logic, we put each individual at the center stage and we identify her own set of neighbors, whose composition depends on the distribution of people in the relevant space. To define individual neighborhoods, we consider the spatial distance between pairs of individuals. The set containing all the neighbors placed within a distance ray d from individual i is denoted di , such that j ∈ di 6 if the distance between individuals i and j is less than, or equal to d. We also denote nid the cardinality of di , that is the number of people living within a range d from i (including i). The average affluence of individual i’s neighborhood of length d is then measured by P µid = j∈di nid yj . To assess inequality between individual neighborhoods we compute the Gini index for the vector (µ1d , . . . , µnd ). The elements of these vectors depend upon individuals’ location and proximity. For instance, if a rich individual lives near to many poor individuals, her income contributes in rising the mean income in all their neighborhoods, while if she lives isolated, her income does not generate any positive effect on other people’s neighborhoods, provided that the notion of individual neighborhood is sufficiently exclusive. This implies that the average value of the vector (µ1d , .., µnd ), denoted µd , generally differs from µ. The between dimension of spatial inequality is captured by the Gini Individual Neighborhood Inequality between index GIN IB , defined as: GIN IB (y,d) = 1 XX |µid − µjd |. 2n2 µd i j As expected, GIN IB (y, d) ∈ [0, 1] for any y and d. The index is equal to G(y) at a zerodistance and whenever all incomes within each individual neighborhood of length d are equal. GIN IB converges to zero when d approaches the size of the city. The in-between pattern depends on the association between incomes and locations. We also propose a GINI within index that captures the within component of spatial inequality. Inspired by the work of Pyatt (1976), who provides a probabilistic interpretation to the Gini inequality index, we consider, for each individual i, the average absolute income differences with respect to her neighbors, normalized by the average income in the (individual) neighborhood. This quantity captures the extent of gains and losses that i would expect if her income was replaced with the income of the neighbors identified by d. It can be computed as follows: ∆i (y, d) = 1 X |yi− yj | . µid j∈d nid i 7 Notice that, given the relevant notion of individual neighborhood parametrized by d, individual i has 1/nid chances of interacting with one of his neighbors. This probability also changes across individuals, despite uniform weighting. We define the GINI within index as the average of ∆i across the whole population: n X 1 GIN IW (y, d) = ∆i (y, d). n i=1 The GIN IW index hence captures the overall degree of inequality we would observe if income comparisons were limited only to neighbors located at a distance smaller than d. It inherits some properties of the Gini index. It is bounded, with GIN IW (y, d) ∈ [0, 1] for any y and d. Moreover, it is endowed with specular properties with respect to GIN IB : for instance, GIN IW (y, d) = 0 if and only if all incomes within individual neighborhoods of length d are equal. Notice that this cannot exclude inequalities among people located at a distance larger than d. Additionally, GIN IW (y, d) can take on values that are either higher or smaller than G(y). When d reaches the size of the city, then spatial inequality ends up coinciding with overall inequality, that is GIN IW (y, ∞) = G(y). 2.2 The spatial inequality curves Computing GIN I indices for different values of d and plotting their values on a graph with d on the horizontal axes provides a simple and insightful picture of spatial inequality patterns. The curve interpolating all these points is called the spatial inequality curve generated by either the GINI within or the GINI between index. More precisely, the curve originated by GIN IB takes the same value of the Gini index when each individual is considered as isolated (that is, when d = 0) and approaches 0 when each individual neighborhood includes the whole city. The curve originated by GIN IW can exhibit a less regular shape. First, it can locally decrease or increase in d accordingly to the spatial dependency of incomes. Second, when each individual neighborhood coincides with the whole city, GIN IW (y, d) approaches G(y). Third, the graph of GIN IW (y, d) can be flat for any d, meaning that incomes are randomized across locations and the spatial 8 component of inequality is irrelevant. Fourth, if the curve is increasing with d, then individuals with similar incomes tend to sort themselves in the city, generating income segregation. The shape of the spatial inequality curves are also informative on the degree at which citywide income inequality can be correctly inferred from randomly sampling individuals.3 For a given size of the individual neighborhood, spatial inequality comparisons can be carried over by looking at the level of the GINI within or between index corresponding to the selected distance parameter. These evaluations generate a complete ranking of the income distributions, but the robustness of the ranking is disputable vis-à-vis the size of the neighborhood (generally unknown). We design GINI between and within partial orders that are based on comparisons of spatial inequality curves. Definition 1 Given the income distributions y and y0 , we say that: a) y0 dominates y in terms of between individual neighborhood inequality, and we write y0 GB y, if and only if GIN IB (y0 , d) ≤ GIN IB (y, d) for any distance d ≥ 0. b) y0 dominates y in terms of within individual neighborhood inequality, and we write y0 GW y, if and only if GIN IW (y0 , d) ≤ GIN IW (y, d) for any distance d ≥ 0. Both within and between dimensions of spatial inequality dominance always imply dominance in terms of citywide inequality, as captured by the Gini coefficient. The two dominance criteria can be hence seen as tests for assessing the degree at which changes in citywide distributions are robust to the spatial component of inequality. We do not attribute a normative value to the spatial inequality dominance per se, as long as it might reflect both effects of changes in the income distribution, as well as the implications of changes in the sorting dynamic of rich and poor people within the same city. While redistribution policies have to do with income recipients budgets and can be motivated on welfaristic or equity concerns, sorting dynamics is guided by preferences, hence complicating the normative judgement about changes in spatial inequality. It is nevertheless 3 When the role played by space is negligible, i.e., the spatial inequality curves are rather flat, we expect that any random sample of individuals taken from a given point in the space is representative of overall inequality. When space is relevant and people locations are stratified according to income, then a sample of neighbors randomly drawn could underestimate the level of citywide inequality. 9 interesting to characterize redistribution schemes on the bases of their implications for spatial inequality changes, when the location of individuals is taken as given. 2.3 Discussion Inequality literature largely agrees that any valuable income inequality measure should satisfy some normatively relevant properties (see Cowell 2000): (i) invariance with respect to population replication; (ii) invariance to the measurement scale; (iii) anonymity, that is, invariance to any permutation of the incomes recipients names; (iv) the Pigou-Dalton principle, recognizing that every rich-to-poor income transfer unambiguously reduces inequality. Measuring spatial inequality means accounting for the joint information about the distribution of incomes and the location of individuals in an urban space. While properties (i) and (ii) have desirable implications also for the measurement of spatial inequality, namely that populations of different sizes and affluence can be made comparable, anonymity conflicts with the idea that individual locations matter in inequality evaluations. A configuration where a rich person lives nearby rich people should not necessarily yield the same degree of spatial inequality as a configuration where poor and rich people are neighbors, provided that incomes of rich and poor people respectively coincide in the two situations. Based on this arguments, it is not clear whether all permutations of incomes across individuals, whose location on the urban space is given, are sources of indifference for inequality assessments. Hence, anonymity should be relaxed in spatial inequality assessments. The existing literature does so in a very peculiar way. The spatial dimension of inequality is usually associated with the magnitude of the between groups inequality, where the partition of the population rests on the administrative neighborhoods where individuals reside: urban blocks, tracts, etc (see Shorrocks and Wan 2005). One common trait in this literature is the idea of partitioning the city into a pre-determined spatial grid, defining neighborhoods, and grouping individuals living in the city according to the neighborhood of residence. Dawkins (2007) and Kim and Jargowsky (2009) propose to decompose overall inequality in the within and between 10 neighborhood components, and to assess spatial inequality as the contribution of the between component, capturing differences across neighborhoods, relative to the overall city inequality. Reardon and Bischoff (2011) builds on this approach, but focus instead on the degree of disproportionality of rich and poor individuals across neighborhoods. Their approach captures a particular aspect of spatial inequality, called income segregation (as the measurement apparatus bears from the racial segregation literature), which is insensitive to the overall income distribution in the city (rich and poor groups are defined on the basis of the ranks of individuals in the overall population). The aforementioned approaches retain anonymity at two levels: first, among individuals living in same neighborhoods; second, when aggregating summary measures associated to the representative individuals of different neighborhoods (average income or dispersion in her neighborhood). Retaining anonymity at both levels implies that current spatial inequality measures are not robust to the underlying partition of the urban space (the so-called Modifiable Areal Units Problem, MAUP, see Openshaw 1983, Wong 2009). More precisely, results are sensitive to the initial scale and to the shape of the partition (respectively, the scaling and the zoning components of the MAUP). To overcome this issues, propositions have been made in the literature that involve assessing inequality between neighborhood at different scales of aggregation of the initial partition (Shorrocks and Wan 2005, Wheeler and La Jeunesse 2008). This procedure, however, implies strengthening the implications of anonymity within the neighborhood, as the partition of the urban space gets courser. This is undesirable. Another potential pitfall of this approach to spatial inequality is that measured inequality relies too strongly on the initial partition of the urban space. Based on U.S. census data Hardman and Ioannides (2004) and Wheeler and La Jeunesse (2008) evaluate spatial inequality at different spatial scales. Both works recognize the importance of incorporating overall inequality in the evaluation of spatial inequality, to properly contextualize the relevance of the within and between component in comparisons across cities. They also propose to investigate how patterns of spatial inequality change when pre-determined neighborhoods are aggregated in larger spatial units. The procedure is motivated from the need of accounting for the size of the space 11 of individual interactions. Our approach to spatial inequality measurement, based on the GINI indices, fully incorporates this logic by considering individual neighborhoods and the income distribution therein as the primary information about the spatial location of individuals. We do not assume representative individuals within administrative areas of the city. In the logic of construction of individual neighborhoods, the fact that individual k is neighbor of individual i and of individual j does not imply that i and j are also neighbors. This logic discards anonymity within the neighborhoods, which is retained only to assess inequality in the summary measures (means or dispersion) associated to each individual neighborhood. On these grounds, we claim that our approach is robust to space partitioning, while the scale of analysis can be controlled for by constructing more or less inclusive individual neighborhoods. Along with anonymity, every inequality measure should be sensitive to the PigouDalton principle. Traditional inequality measures should always decrease when income is transferred from the rich to poor individual. Whether this principle is desirable or not in a context where anonymity is (partially or completely) dropped, is an issue. For instance, transferring money from a rich individual in a poor neighborhood to a poor individual in a rich neighborhood arguably increases between neighborhood inequality. In line with this objection, most of the spatial inequality indices proposed so far are not consistent with the implications postulated by rich-to-poor transfers. Rather, they are consistent with more demanding notions of income transfers involving information about the position (i.e., the neighborhood), where the poor and rich person live.4 Our approach is even more robust to the implications of anonymity than the traditional approach. Hence, the implications of rich-to-poor transfers on GINI within and between indices are even more ambiguous.5 Performing rich-to-poor transfers in this setting may be extremely information demanding. 4 For instance, PD transfers do not increase the spatial inequality when they take place within the neighborhood, while PD transfers takeing place between neighborhoods decrease spatial inequality only when income is transferred from a rich person living in a rich neighborhood to a poor person living in a a poor neighborhood. 5 This occurs, for instance, as long as the same rich neighbor in a rich individual neighborhood is also the poor neighbor in another relatively richer individual neighborhood. 12 We rather focus on the implications of redistribution schemes on inequality and spatial inequality. 3 Spatial inequality reducing income redistribution In this section, we shift the focus from rich-to-poor transfers, involving individual comparisons of incomes, to less sophisticated redistribution schemes that apply to incomes and that are common to the citywide distribution, but do not require the knowledge of the position of individuals within the city. More precisely, we investigate redistributive schemes that are able to reduce within or between spatial inequality, according to the dominance criteria defined in terms of GINI indices. To investigate the effect of redistributive policies on spatial inequality, we consider policies generating transformations of incomes such as tax-benefit rules, involving the whole population. Definition 2 A redistributive scheme is a mapping f : R+ → R+ that associates a postredistribution income f (y) to any y. Notice that the final income f (y) only depends on the initial income y, while other characteristics such as location are neglected. Let F be the general set of redistribution schemes defined on R+ . We restrict our attention rank-preserving distributive schemes, restricting the domain to cases where f is non-decreasing over R+ and the ranking of individuals after the policy is the same of the pre-tax income distribution. This requirement is justified by the need of keeping incentives from relocations of individual across the city, related to differential in utilities, as constant (provided all other features of the city remain constant). Let us denote F ∗ the set of rank-preserving redistributive schemes defined on R++ that admit left and right derivatives over the entire domain. The following proposition identifies the largest subset of redistribution schemes in F ∗ that are able to reduce spatial inequality according to the within perspective. Lemma 1 Given f ∈ F ∗ , the following claims are equivalent: 13 i) f (y1 ), . . . , f (yn ) GW y for any y ∈ Rn+ and for any location of the n individuals. ii) f is star-shaped from above (ssa), that is f (y) y is decreasing for any positive y. Proof. i)⇒ii). This is due to Proposition 2.1.a and Remark 2.4. p.279 in Moyes (1994), which proves that a ssa redistribution scheme applied to the whole population reduces inequality in the relative Lorenz sense within any possible group of individuals of the initial population (and even when groups are overlapping). The Gini index then decreases as well within any possible group (notice that we do not refer to groups generating a partition of the whole population). ii)⇒i). Consider a neighborhood containing only the two poorest individuals, endowed with incomes y and y + h. To reduce the Gini coefficient after applying the redistributive scheme f we need f (y+h)−f (y) f (y+h)+f (y) < h . 2y+h By taking the limit for h → 0, we get f 0 (y) < f (y) . y Since the sum of ssa functions is a ssa function, a simple corollary of the previous proposition is that applying any progressive tax schedule t (star shaped from below, i.e., t(y) y is increasing) and redistributing the tax liability through a star-shaped function reduces the within neighborhood inequality everywhere. The following proposition provides an answer for the GINI between case by further restricting the class of admissible redistribution schemes. Lemma 2 Given f ∈ F ∗ , the following claims are equivalent: i) f (y1 ), . . . , f (yn ) GB y for any y ∈ Rn+ and for any location of the n individuals. ii) f is an affine function f (y) = αy + β, with β α ≥ 0. PP Proof. ii)⇒i). This is immediate, since GIN IB (y,d) = n21µd j |µid − µjd |, whilst for i P P y0 it holds that GIN IB (αy + β,d) = n2 µ1+ β j |µid − µjd | and β/α ≥ 0. ( d α) i i)⇒ii). First, observe that the same proof of the within case can be applied to show the necessity of progressive redistributive schemes. Consider a city with a single individual 14 with income ȳ living in a neighborhood of ray d and two individuals with income y1 and y2 , such that ȳ = y1 +y2 2 who live in a similar neighborhood but separated from the first one. The GIN Ib is then equal to zero. Requiring the same value after redistribution 2 )= implies: f ( y1 +y 2 f (y1 )+f (y2 ) . 2 This is the Jensen’s functional equation, that is well-known to admit only affine solutions (see Theorem 1, p.43 in Aczel 1966). The two lemmas altogether can be combined to provide an intuitive proof of an important result, which we highlight in the following theorem. Theorem 3 There is less citywide and less spatial inequality in post-redistribution incomes compared to pre-redistribution incomes, i.e. f (y1 ), . . . , f (yn ) GB (GW )y for any initial distribution y ∈ Rn+ and for any location of the n individuals, if and only if f is a basic income-flat tax (BIFT) scheme. Proof. The result follows from Lemma 1 and 2 by imposing the government budget P P constraint i f (yi ) = i yi . By setting α = 1 − t and β = tµ, the redistribution scheme discussed in Lemma 2 identifies a BIFT scheme, with tax rate t. The theorem has sound implications for local and urban policymakers, who take the distribution of people in the city as given and and only control upon the distribution of incomes. The theorem states that, for a given distribution of income holders in a city, the only option left for a policymaker interested in reducing overall inequality in the city (as measured, for instance, by G(y)) through taxation while being robust to patterns of inequality across the urban space, is to implement a BIFT scheme. Rich to poor transfers of incomes, which are highly demanding in terms of transfers operations that have to be conducted, might not be suitable in some cases, unless one is willing to introduce additional information about the location of individuals.6 Additionally, the proposition 6 Notice that this form of progressive redistribution unambiguously reduces inequality in the space, pushing down both the within and between GINI curves. However, its implications for the affluence at the level of individual neighborhood are ambiguous. To understand this point, imagine for instance a rich individual who lives near to several poor individuals, who are located far from each other. In this case we can expect µd > µ and a BIFT scheme takes away a proportion of the rich income dividing it among several poor individuals. Apart from this direct redistributive effect, there is also another effect in the opposite direction, consisting in the reduction of the (potential) positive externality generated by the presence of a rich people in the neighborhood of poor income recipients. 15 bears important consequences for national politics. Robustness of the results with respect to the initial distribution of individuals helps the spatial inequality minded policymaker in shaping national tax rules, a key policy leverage, that apply indistinctly across different cities differing both in the distribution of incomes and in the distribution of people within the city. Our result says, for instance, that the only federal tax schedule for the U.S. (i.e., a federal tax scheme does not depend upon the specificities of a single city and the distribution of individual therein) granting a reduction in citywide and spatial inequality both in New York City and in Los Angeles (two cities that differ in geography, in residential patterns and in skills distributions), is the BIFT scheme. Other redistribution schemes might reduce inequality in one city at the cost of rising spatial inequality in another city. 4 Empirical analysis of spatial inequality 4.1 Relations with geostatistics The GINI indices capture a relation between the degree of inequality in incomes and the distribution of these incomes in a geographic space. The indices do so in a way that is analogous to how spatial processes are treated in geostatistics literature (Cressie 1991). In this section, we investigate more in details these connections. We assume that the data generating process is {Ys : s ∈ S}, where Y is a random variable distributed over a random field S. This process’s distribution is FS (y), representing the degree of spatial dependence between the random variables. Through geolocalization, it is possible to compute the distance ||.|| between locations s, v ∈ S. We generally write ||s − v|| ≤ d to indicate that the distance between the two location is smaller than d, or equivalently v ∈ ds . The cardinality of the set of locations ds is nds while n is the total number of locations. The observed income distribution y is a particular realization of such process, where one observation i is observed in one location s. Consider first the GINI within index of the spatial process FS . It can be written in terms of first order moments of the random variables Ys as follows: 16 GIN IW (FS , d) = XX s v∈ds 1 E[|Ys − Yv |] . 2n nds E[Yv ] The distribution FS expresses the degree of spatial dependence between the random variables Ys . Consider first the case with no spatial dependence, that is the random variables Ys and Yv are i.i.d. for any s, v ∈ S. The distribution of the stochastic process distributed as FS is thus independent of the random field S. In this case, we obtain that GIN IW (FS , d) = E[|Ys −Yv |] E[Yv ] where Ys and Yv are two i.i.d. random variables, which coincides with the definition of the standard Gini inequality coefficient in Muliere and Scarsini (1989).7 If, instead, spatial dependence is at stake, then the expectation E[|Ys − Yv |] varies across locations and cannot be identified and estimated from the observation of just one data point in each location. It is standard in geostatistics to rely on standard stationarity assumptions about the distribution FS (Cressie and Hawkins 1980, Cressie 1991). The first assumption is that the random variables Ys have stationary expectations over the random field, i.e., E[Yv ] = µ for any v. The second assumption is that the spatial dependence in incomes registered between two locations s and v only depends on the distance between the two locations, ||s − v||, and not on their position on the random field. Here, we consider radial distance measures for simplicity, so that ||s − v|| = d. This allows to write E[(Ys − Yv )2 ] = 2γ(||s − v||) = 2γ(d), where the function 2γ is the variogram of the distribution FS (Matheron 1963). The variogram displays the evolution of the variance in the difference (a measure of correlation) between any two elements of the random process defined on S as a function of their spatial lag. Thus 2γ(d) is informative of the correlation between two random variables that are exactly d distance units away one from the other. The shape of the function 2γ(d) represents the degree of spatial dependence of the random process, conceptualized as the contribution of spatial association on incomes variability. Generally, 2γ(d) → 0 as d approaches 0, indicating that random variables that are very close in space 7 Biondi and Qeadan (2008) explore the use of connected estimator to assess dependency across time in paleorecords observed on a given point in space. 17 tend to be strongly spatially correlated. Conversely, 2γ(d) → 2σ 2 when d is sufficiently large, indicating spatial independence between two random variables Ys and Yv far apart on the random field. Together, the two assumptions listed above depict a form of intrinsic stationarity of the process (Cressie and Hawkins 1980, Cressie 1991, Chilès and Delfiner 2012). If we also assume that Ys is gaussian with mean µ and variance σ 2 , it is possible to show that the GINI within index is a function of the variogram, as follows GIN IW (FS , d) = XX s v∈ds p 1 2 γ(||s − v||)/π . 2n nds µ With more algebra, a similar relation can also be established for the GINI between index. Both formulas are formally derived in the appendix. These results are suggestive of two important facts. The first fact is that the GINI indices measure spatial inequality as a direct expression of the spatial dependence in the data generating process, represented under stationarity assumptions by the variogram, without imposing external normative hypotheses about the interactions between incomes, income inequality and space. The second fact is that the empirical counterpart of the variogram can be used to produce asymptotically valid estimators of the GINI indices standard errors, which can be used to test hypothesis on the extent and dynamics of spatial inequality. 4.2 Testing hypothesis about spatial inequality In the methodology appendix, we derive distribution free, non-parametric estimators for the GINI indices in a general setting where sample information about the process FS is available.8 GINI indices estimators can be used to assess inequality in a city for a given concept of individual neighborhood, and more broadly to construct spatial inequality curves. Methodology for their construction is discussed in the appendix. 8 The GINI between index estimator can be computed as a plug-in estimator as in as in Binder and Kovacevic (1995) and Bhattacharya (2007), provided individual neighborhood averages are properly estimated. The GINI within estimator involves instead compering individual income realizations. 18 The GINI spatial inequality curves can be used to test hypothesis about the shape and dynamics of spatial inequality. (i) By contrasting the level of spatial inequality measured by the GINI curves at a given distance d with the overall level of inequality captured by the Gini index, it is possible to assess if, and to what extent, average income inequality experienced within a neighborhood of size d is different from the level of inequality in the city. (ii) Moreover, by contrasting the level of the GINI curves for at d and at d0 > d, it is possible to state if, and by how much, inequality is increased with an increase in the range of the average individual neighborhood. (iii) Lastly, by comparing the levels of the GINI curves at distance d registered in different periods within the same city, it is possible to conclude about the dynamics of spatial inequality. Furthermore, if one is confident that spatial distance is comparable across cities, one can use GINI spatial inequality curves to rank these cities in terms of urban inequality.9 In the appendix, we maintain the intrinsic stationarity and the gaussian assumptions to derive standard errors for the GINI within and between estimators. We show that standard errors can be written as averages of variogram functions of the process and that the GINI indices estimators sampling distribution converge to asymptotic normality.10 It follows that the statistical relevant of the hypothesis about spatial inequality can be assessed via standard t-statistics. In the empirical investigation of spatial inequality across U.S. cities, based on decennial census data and repeated surveys, we employ standard error estimators to account for issues related to data reporting (which come in forms of summary tables for each element of a very fine spatial partition of U.S. urban territories).11 9 One is compelled to conclude in favor of spatial inequality only if there is strong evidence against the null hypothesis that the level of the GINI curve at d is the same as the Gini inequality index, and that the level of spatial inequality captured by the GINI curves does not change with d. When comparing two GINI (either between or within) curves, one cannot reject that a strong increase or reduction in spatial inequality if there is strong evidence against the null hypothesis that the two curves coincide at every d. 10 Standard errors for GINI indices can be derived using results on ratio-measures estimators (see Hoeffding 1948, Goodman and Hartley 1958, Tin 1965, Xu 2007, Davidson 2009) under distributional assumptions that are common in the geostatistics literature (Cressie and Hawkins 1980, Cressie 1985), such as intrinsic stationarity. 11 A Stata routine implementing the GINI between and within indices and curves, along with their standard error estimators, is available on the authors web-pages. 19 5 Spatial inequality in U.S. cities: 1980-2014 In this section we recover and study the patters of spatial inequality in U.S. cities along four decades. Our aim is to illustrate the functioning of spatial inequality measures on the spatial distribution of income and to provide stylized facts about the shape of spatial income inequality in U.S. cities. We focus first on the city of Chicago (IL) as a case study of the spatial dimension of inequality in a large U.S. metropolitan area. Then, we provide stylized facts about patterns of spatial inequality across the 30 largest U.S. cities. 5.1 Data We follow the dynamics of income within U.S. cities for four decades. We make use of the Census data from the U.S. Census Bureau for 1980, 1990 and 2000. For these years, we select information from the Summary Tape File 3A about population counts, income levels and family composition at a very fine spatial grid.12 The STF 3A data are reported in the form of statistical tables for each block group, due to anonymization issues. After 2000, the STF 3A file of the Censuses have been replaced with survey-based evidence from the American Community Survey (ACS), running yearly on representative samples of U.S. resident population. For our analysis, we consider the 2010-2014 ACS 5-years Estimates module obtained from representative samples of the U.S. population.13 Sampling rates vary independently at the census block level according to 2010 census population counts, covering on average 2% of U.S. population. The block group scale provides course information about the individual incomes, given the substantial heterogeneity in incomes observed within blocks. The block groups define, however, sufficiently small spatial units that can be safely used to construct individual neighborhoods of minimal size. 12 Available information is a snapshot of overall inequality in the Census year with a universal statistical coverage. The Census STF 3A provides data for States and their subareas in hierarchical sequence down to the block group level, which is the finest geographical information available in the Census. 13 The ACS reports estimates of incomes, population counts and household size measures at the level of the census block from data collected over the five years span. Hence, estimated counts do not have an annual validity, but rather should be interpreted as average measures across the 2010-2014 time span. 20 To account for within block group income inequality, we construct synthetic measures of income from the counts information. We retain households made of one or more individuals as the income recipient units. We use two income variables to model the distribution of household gross income. The first variable is total (i.e. aggregate) income in the block group. The second variable is the count of the households living in each block group. Tables of households counts are also provided by household size (scoring from 1 to 7 or more individuals), so that equivalence scales representative at the block group level can be constructed. The use of demographic equivalent scales is necessary to draw conclusions about the distribution of inequality across households that are otherwise not comparable. A second table reports households counts per income category at the block group level. There are 17 income bins in 1980, 25 in 1990 and 16 in 2000 and in the ACS sample. Each bin consists in two income levels, the bottom and top incomes marking that bins. The last income bin defines an open interval instead, for which only the lower income bound is reported. Census and ACS also report, for each income bin, data of the population of households whose gross income falls in that bin. Using methodology similar to Nielsen and Alderson (1997), based on Pareto distribution fitting (for an alternative methodology based on parametric models, leading to very close results, see Wheeler and La Jeunesse 2008), tables about the distribution of household counts across bins are transformed into a vector of incomes levels and the associated vector of households frequencies corresponding to these incomes that are representative of the block group specific income distribution.14 Incomes and household frequencies vary across block groups, implying strong within city heterogeneity. These income values can be equivalized by dividing for the appropriate equivalent scales, so that households are made comparable within and across block groups.15 We iterate the procedure for each year considered in this study. The resulting income 14 Estimation is constrained in a way that average income inferred from this distribution of incomes and frequencies matches the average income in the Census or ACS within the block group, obtained by dividing total income in the block by the number of households there residing. 15 In most of the cases, it turns out that the reference income category associated to a bin is simply the income bin’s midpoint. For the top income bin, the reference value is adjusted so that the average estimated income coincides with data provided by the Census. 21 City Year Chicago (IL) 1980 1990 2000 2010/14 # Blocks 3756 4444 4691 4763 Hh/block 1122 1217 1173 1060 Eq. scale 1.630 2.029 1.625 1.575 Mean 13794 21859 41193 55710 Equivalent household income 20% 80% Gini 90%/10% 5798 20602 0.434 11.351 9132 32316 0.461 11.903 16076 61667 0.473 11.533 20022 89856 0.486 13.452 Table 1: Income and population distribution across block groups in Chicago (IL) database consists in strings of incomes and frequency weights for each block group in the cities considered. We treat these income levels as observations with their associated sample weight. Thus, weighted variants of the GINI indices estimators are used. Census and ACS identifier is defined at the census block-group level. Each block group is geocoded, and measures of distance between the bock groups centroids can be hence constructed. All income observations within the same block group are hence on the same spatial location. To identify cities, we resort on the Census definition of a Metropolitan Statistical Area, based on the 1980 Census definition.16 5.2 Spatial inequality in Chicago, IL We use the 1980 definition of the Chicago Primary MSA area provided by the Census, which comprises Cook County, Du Page County and McHenry County surface. By sticking to this definition of the Chicago metro area we can construct measures of income inequality in a well defined, yet large, urban space.17 Table 1 reports summary information of household population and the respective income distribution in Chicago.18 Average equivalent household income has increased fourfold over 1980-2014 in nominal terms. De16 Details about the U.S. counties defining the Chicago (as well as the other cities) metropolitan area can be found at this link: http://www.census.gov/population/metro/files/lists/historical/80mfips.txt. 17 Note that for some of the block groups of 1980 Census it is not possible to establish geocoded references. Hence, these units cannot be included in the index computation and might have an impact on estimation of the GINI patterns. Other works, such as (Reardon and Bischoff 2011), have demonstrated, however, that the the impact of this missing information is negligible on overall trends of inequality within the 100 largest U.S. cities. 18 Throughout the four decades considered in this study, the block group partition of Chicago has become finer, with the number of block groups increasing of 1000 units. This change keeps track of the demographic boom in Chicago, implying a roughly stable demographic composition in each block group (around 1100 households on average). 22 Figure 2: Spatial GINI indices of income inequality for Chicago (IL), 1980, 1990, 2000 and 2010/14 (a) GINI within (b) GINI between Note: Authors elaborations on U.S. Census data. spite the relative gap between the poor (bottom 20%) and rich population (top 20%) has remained stable across time, we observe from Table 1 that the top 10% to bottom 10% ratio has sharply increased from 11.533 in 2000 to almost 13.5 in 2010/2014 period, indicating increased dispersion at the tails of the distribution. The citywide income inequality, measured by the Gini index, has contextually raised from 0.43 to 0.48 during the same period. We compute GINI within and between indices for 1980, 1990, 2000 and 2010/2014 Census waves using the estimators discussed in section 4 to assess how inequality in equivalent household income across individual neighborhoods has evolved in this period. The estimators of the GINI indices are non-parametric, implying that we measure the level of inequality at every empirical distance (expressed in meters) estimated from the data. At distance zero, the GINI within index captures the average inequality in estimated income levels within block groups. Data confirm a substantial heterogeneity of within income inequality across block groups: the Gini index at the level of the block groups varies, in fact, between 0.2 to above 0.6 in 2000.19 Inequality remains above 0.4 even 19 For this year block group level estimates of inequality are reported in the Census. 23 averaging across block groups. This explains the relatively high starting point of the GINI within curves shown in Figure 2.a. Inequality slightly decreases as the neighborhood size reaches 2km and then quickly raises to reach its city-wide level when the size of the neighborhood is larger than 25 kilometers. Comparing the GINI within curves of the different decades, within neighborhood inequality appears increasing over time, for any size of the neighborhood. In figure 2.b we plot GINI between curves from 1980 to 2010/14. When neighborhoods are narrow (basically, we compare average income across block groups), we find lower inequality values compared to the within case (the GINI between index values are generally smaller than 0.3). The most striking evidence in figure 2.b is that between inequality decreases with the size of the neighborhood, but in a very smooth manner. For neighborhoods of size 2 kilometers, the GINI between is generally larger than 0.25. It decreases to 0.1 only for neighborhoods at least of 20 kilometers range. Overall, this pattern is robust across Census years. Contrasting the GINI between curves over the last three decades, between inequality is on the rise up to the 1990, and then remains stable. Are these patterns statistically significant? To answer this question, we first compute empirical estimators of the variograms produced from data, and then we use results presented in appendix A to derive standard errors and confidence intervals of the GINI within and between indices at pre-selected abscissae. Confidence bounds are drawn for each spatial inequality curve, and dominance across curves is tested making use of pointwise t-statistics.20 All together, we obtain evidence of the following patterns of spatial inequality in Chicago over the time span considered: i) GINI within inequality is quite high and weakly increases with the size of the neighborhood; ii) GINI between inequality is smaller than GINI within (and than overall inequality) when the neighborhood size is small and de20 To do so, we compute all pairwise differences in GINI within or between spatial inequality curves across all the decades under analysis. We then plot these differences measured at pre-determined distance abscissae (along with the associated confidence bounds) on a graph. If the horizontal line passing from the origin of the graph (indicating the null hypothesis of no differences in spatial inequality at every distance threshold) falls within these bounds, we conclude that the gap in the spatial inequality curves under scrutiny are not significant at 5% confidence level. For a detailed description of results, see appendix B. 24 creases smoothly with the neighborhood size, implying spatial persistence of inequality; iii) GINI within inequality levels do not increase significantly over time; iv) GINI between inequality rose in the two first decades up to 2000, while stabilizing between 2000 and 2014. The previous facts suggest that largest increases in absolute income deviations are registered for individuals living in neighborhoods experiencing a growth in average income. This might be the case if, on average, rich households get richer in those neighborhoods that are composed predominantly by rich households where poor households income remained stable, or poor households get poorer in neighborhoods that are composed predominantly by poor households where rich households income remained stable, implying little variations across time of the GINI within index. Similar results are consistent with a case in which income growth is evenly distributed across poor and rich households (so relative incomes do not change), but rich and poor households distance increases. However, data show that inequality at the tails of the distribution increased substantially over 1980-2014 (table 1), affecting as well the citywide inequality, on the rise. The result highlights the prominent implications for spatial inequality that are due to changes in the income distribution, rather than to changes in sorting patterns. 5.3 Stylized facts about spatial inequality in the U.S. Can we recover the same patterns of spatial inequality registered in Chicago also in other U.S. cities? We now try to fix new stylized facts about the dynamics of spatial inequality across the U.S. major urban areas. We consider the 30 most populated cities in 2014.21 The spatial dimension of the city is established by the city metropolitan statistical area as defined from the 1980 Census, so that within-city patterns of spatial inequality can be meaningfully compared across decades.22 21 Data on urban demographic size are from the Census Bureau and can be downloaded from: http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk. The list of cities, ordered by their size, can be found in table 4 in the appendix. 22 Information on the county composition of a city metropolitan statistical are are taken from the Census 1980 definition: http://www.census.gov/population/metro/files/lists/historical/83mfips.txt. 25 Figure 3: GINI within and between for 30 most populated U.S. cities (a) GINI within, Census 1980 (b) GINI within, ACS 2010/2014 (c) GINI between, Census 1980 (d) GINI between, ACS 2010/2014 Note: Authors elaboration on U.S. Census data. A full description of the cities used in this study can be found in the appendix, table 4. We compute within and between GINI for each of these cities in each year at predetermined distances. We summarize our main findings in Figure 3, which shows spatial inequality curves for the years 1980 and 2010/2014. There are 30 curves in each plot, one for each city considered in the study. These curves are suggestive of two basic facts. First, that spatial within and between inequality is larger in 2010/2014 than it was in 1980, at every distance abscissa. Second, the dynamics of spatial inequality displayed by the between and within curves are substantially similar to that registered for Chicago: 26 the GINI within index is very high even for small distances and rapidly converges to the citywide level of inequality. The GINI between index fluctuates around 0.3 and smoothly converges to zero for large individual neighborhoods. The black curves in the figure represent a fifth degree polynomial fit of the relation between the values of GINI within and between indices and the neighborhood size. Their shapes reflect the two stylized facts described above on the pattern of spatial inequality in U.S. metropolitan areas. Although resting on a completely different methodology, our results support the findings of Wheeler and La Jeunesse (2008). They considered two different exogenous spatial partitions of US metropolitan areas and reported quite strong levels of spatial inequality within block groups, also stable over time. They also pointed out that the major changes over 1980-2000 were driven by the between component of inequality. The GINI within and between indices capture dimensions of inequality that are not predictable looking at city inequality or affluence. Figure 4.a relates overall city inequality (measured through the Gini index) and the values of spatial GINI inequality indices in our data. The degree of association is visualized by the slopes of the regression lines. We examine both within and between spatial inequality for the Census year 1980 and for ACS 2010/14 data, for a neighborhood size of 2Km. As expected, citywide and withinneighborhood inequality are positively correlated. Correlation evolves from 94% in 1980 to 72% in 2014. Also inequality between neighborhoods is positively associated with citywide inequality. Heterogeneity of GINI between indices around the regression lines is, however, substantially larger than heterogeneity in GINI within, thus indicating less reliability in these correlation estimates. Correlation evolves from 62% in 1980 to barely 35% in 2014. In both cases, the degree of association between spatial and citywide inequality is slightly decreasing over time. Figure 4.b similarly shows the association among GINI spatial measures and city affluence (measured by the average equivalent income). Results about the relation between GINI inequality and average households income in U.S. cities are less clear-cut. We do not detect a remarkable association between city affluence and GINI spatial inequality, both in the within and the between form. 27 Figure 4: City-level income inequality, average income and spatial inequality for different individual neighborhood concepts: Census 1980 and ACS 2010/2014 (a) Distance range: 0.5Km (b) Distance range: 5Km Note: Authors elaboration on U.S. Census data. Citywide average income normalized by weighted sample 28 mean. 6 Concluding remarks We study spatial inequality at the urban level from the perspective of the individual. We use information about the income distribution in the neighborhood surrounding each individual, defined by measures of spatial distance between pairs of individuals, to assess the spatial component of inequality. We provide simple principles for spatial inequality and derive measures connected to the Gini inequality index. We conclude that only flat-tax minimum-income redistributive schemes are coherent with spatial inequality reduction. We then investigated spatial inequality patterns in the 30 largest U.S. cities from 1980 to 2014 and we established four stylized facts about spatial inequality: i) Spatial inequality within individual neighborhoods of any size is relatively high in all the periods considered. ii) Spatial inequality between individual neighborhoods is smaller than within inequality when the neighborhood concepts is restricted to close neighbors. Inequality decreases smoothly with the size of individual neighborhood in all the periods considered. iii) Spatial inequality has been on the rise over time, especially in the between component the two first decades. iv) Spatial inequality is only modestly associated with citywide income inequality and does not depend on city affluence. The stable patterns of spatial inequality over time mainly suggest that urban inequality has been driven mainly by changes income distribution that affected differently rich an poor, depending from where they are located in the city, rather then by dramatic changes in sorting dynamics. From the policy perspective, we remark that an effective strategy able to check this tendency from the redistribution side (avoiding policies from the sorting side, as the Moving to Opportunity experiment) is given by basic incomeflat tax redistributive schemes. The stylized facts presented above are consistent with a paradoxical picture of incomes and households sorting in US cities: a high and persistent neighborhood inequality is accompanied by poor and rich neighborhoods more and more ”linearly” separated in the city. This picture deserves further investigation, both from the measurement side and from its possible explanation through spatial sorting models. 29 References Aczel, J. (1966). Lectures on functional equations and their applications, Academic Press, New York. Bhattacharya, D. (2007). Inference on inequality from household survey data, Journal of Econometrics 137(2): 674 – 707. Binder, D. and Kovacevic, M. (1995). Estimating some measures of income inequality from survey data: An application of the estimating equations approach., Survey Methodology 21: 137–45. Biondi, F. and Qeadan, F. (2008). Inequality in paleorecords, Ecology 89(4): 1056–1067. Bishop, J. A., Chakraborti, S. and Thistle, P. D. (1989). Asymptotically distribution-free statistical inference for generalized Lorenz curves, The Review of Economics and Statistics 71(4): pp. 725–727. Chetty, R., Hendren, N. and Katz, L. F. (2016). The effects of exposure to better neighborhoods on children: New evidence from the Moving to Opportunity experiment, American Economic Review 106(4): 855–902. Chetty, R., Hendren, N., Kline, P. and Saez, E. (2014). Where is the land of opportunity? The geography of intergenerational mobility in the United States, The Quarterly Journal of Economics 129(4): 1553–1623. Chilès, J.-P. and Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, John Wiley & Sons, New York. Clark, W. A. V., Anderson, E., Östh, J. and Malmberg, B. (2015). A multiscalar analysis of neighborhood composition in Los Angeles, 2000-2010: A location-based approach to segregation and diversity, Annals of the Association of American Geographers 105(6): 1260–1284. Conley, T. G. and Topa, G. (2002). Socio-economic distance and spatial patterns in unemployment, Journal of Applied Econometrics 17(4): 303–327. Cowell, F. (2000). Chapter 2 measurement of inequality, Vol. 1 of Handbook of Income Distribution, Elsevier, pp. 87 – 166. 30 Cressie, N. (1985). Fitting variogram models by weighted least squares, Journal of the International Association for Mathematical Geology 17(5): 563–586. Cressie, N. A. C. (1991). Statistics for Spatal Data, John Wiley & Sons, New York. Cressie, N. and Hawkins, D. M. (1980). Robust estimation of the variogram: I, Journal of the International Association for Mathematical Geology 12(2): 115–125. Dardanoni, V. and Forcina, A. (1999). Inference for Lorenz curve orderings, Econometrics Journal 2: 49–75. Davidson, R. (2009). Reliable inference for the Gini index, Journal of Econometrics 150(1): 30 – 40. Dawkins, C. J. (2007). Space and the measurement of income segregation, Journal of Regional Science 47: 255–272. Durlauf, S. N. (2004). Neighborhood effects, Vol. 4 of Handbook of Regional and Urban Economics, Elsevier, chapter 50, pp. 2173–2242. Ellen, I. G., Mertens Horn, K. and O’ Regan, K. M. (2013). Why do higher-income households choose low-income neighbourhoods? Pioneering or thrift?, Urban Studies 50(12): 2478–2495. Galster, G. (2001). On the nature of neighbourhood, Urban Studies 38(12): 2111–2124. Glaeser, E., Resseger, M. and Tobio, K. (2009). Inequality in cities, Journal of Regional Science 49(4): 617–646. Goodman, L. A. and Hartley, H. O. (1958). The precision of unbiased ratio-type estimators, Journal of the American Statistical Association 53(282): 491–508. Hardman, A. and Ioannides, Y. (2004). Neighbors’ incom distribution: Economic segregation and mixing in US urban neighborhoods, Journal of Housing Economics 13(4): 368–382. Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution, The Annals of Mathematical Statistics 19(3): 293–325. Kim, J. and Jargowsky, P. A. (2009). The Gini-coefficient and segregation on a continuous variable, Vol. Occupational and Residential Segregation of Research on Economic Inequality, Emerald Group Publishing Limited, pp. 57 – 70. 31 Ludwig, J., Duncan, G. J., Gennetian, L. A., Katz, L. F., Kessler, R. C., Kling, J. R. and Sanbonmatsu, L. (2013). Long-term neighborhood effects on low-income families: Evidence from moving to opportunity, American Economic Review 103(3): 226–31. Luttmer, E. F. (2005). Neighbors as negatives: Relative earnings and well-being, The Quarterly Journal of Economics 120(3): 963–1002. Matheron, G. (1963). Principles of geostatistics, Economic Geology 58(8): 1246–1266. Moyes, P. (1994). Inequality reducing and inequality preserving transformations of incomes: Symmetric and individualistic transformations, Journal of Economic Theory 63(2): 271 – 298. Muliere, P. and Scarsini, M. (1989). A note on stochastic dominance and inequality measures, Journal of Economic Theory 49(2): 314 – 323. Nielsen, F. and Alderson, A. S. (1997). The Kuznets curve and the great u-turn: Income inequality in U.S. counties, 1970 to 1990, American Sociological Review 62(1): 12–33. Openshaw, S. (1983). The modifiable areal unit problem, Norwick: Geo Books. Pyatt, G. (1976). On the interpretation and disaggregation of Gini coefficients, The Economic Journal 86(342): 243–255. Reardon, S. F. and Bischoff, K. (2011). Income inequality and income segregation, American Journal of Sociology 116(4): 1092–1153. Sampson, R. J. (2008). Moving to inequality: Neighborhood eeffects and experiments meet social structure, American Journal of Sociology 114(1): 189–231. Schelling, T. C. (1969). Models of segregation, The American Economic Review 59(2): pp. 488–493. Shorrocks, A. and Wan, G. (2005). Spatial decomposition of inequality, Journal of Economic Geography 5(1): 59–81. Tin, M. (1965). Comparison of some ratio estimators, Journal of the American Statistical Association 60(309): 294–307. 32 Wheeler, C. H. and La Jeunesse, E. A. (2008). Trends in neighborhood income inequality in the U.S.: 1980–2000, Journal of Regional Science 48(5): 879–891. Wong, D. (2009). The modifiable areal unit problem (MAUP), The SAGE handbook of spatial analysis pp. 105–124. Xu, K. (2007). U-statistics and their asymptotic results for some inequality and poverty measures, Econometric Reviews 26(5): 567–577. 33 Appendix A Standard errors and confidence bounds for spatial inequality measures A.1 Setting Let S denote a random field. The spatial process {Ys : s = 1, . . . , n} with s ∈ S is defined on the random field and is jointly distributed as FS . Suppose data come equally spaced on a grid, so that for any two points s, v ∈ S such that ||v − s|| = h we write v = s + h. The process distributed as FS is said to display intrinsic (second-order) stationarity if E[Ys ] = µ, V ar[Ys ] = σ 2 and Cov[Ys , Yv ] = c(h) when the covariance function is isotropic and v = s + h. Under these circumstances, we denote V ar[Ys+h − Ys ] = E[(Ys+h − Ys )2 ] = 2σ 2 − 2c(h) = 2γ(h), the variogram of the process at distance lag h. Noticing that E[Ys+h · Ys ] = σ 2 − γ(h) + µ2 , we can derive a simple formulation of the covariance between differences in random variables, notably Cov[(Ys+h1 − Ys ), (Yv+h2 − Yv )] = γ(s − v + h1 ) + γ(s − v − h2 ) − γ(s − v) − γ(s − v + h1 − h2 ) as in Cressie and Hawkins (1980). This assumption holds, in particular, if the spatial data occur on a transect. Denote s − v = h where h indicates that the random variables are located within a distance lag of h units. We can hence write Cov[(Ys+h1 − Ys ), (Yv+h2 − Yv )] = γ(|h + min{h1 , h2 }|) + γ(|h − max{h1 , h2 }|) − γ(|h|) − γ(|h − |h1 − h2 ||), which yields the formula above when h1 > h2 . We adopt the convention that γ(−h) = γ(h) in what follows. We now introduce one additional distributional assumption. Assume that Ys is gaussian with mean µ and variance σ 2 . The random variable (Ys+h − Ys ) is also gaussian with variance 2γ(h), which implies |Ys+h − Ys | is folded-normal distributed, with expectap p tion E[|Ys+h − Ys |] = 2/πV ar[Ys+h − Ys ] = 2 γ(h)/π and variance V ar[|Ys+h − Ys |] = (1 − 2/π)2γ(h). 34 A.2 GINI indices and the variogram Under the assumptions listed above, we now show that the GINI indices of spatial inequality in the population can be written as explicit functions of the variogram. We maintain the assumption that the spatial random process is defined on a transect, and occurs at equally spaced lags. For given d, we can thus partition the distance spectrum [0, d] into Bd intervals of fixed size d/Bd . Each interval is denoted by the index b with b = 1, . . . , Bd . Abusing notation, we denote with dbi the set of locations at interval b (and thus distant b · d/Bd ) within the range d from location si . The cardinality of this set is ndbi ≤ ndi ≤ n. Under these assumptions, the GINI within index rewrites 1 E[|Ysj − Ysi |] 2n ndi µ i j∈di p XX 1 4γ(||sj − si ||)/π = 2n ndi µ i j∈di p Bd X1X 4γ(si + b − si )/π ndbi X 1 = n b=1 ndi j∈d 2 nddi µ i bi ! p Bd X nd 4γ(b)/π 1X bi = . 2 b=1 n n µ d i i GIN IW (FS , d) = XX (1) The GINI within index is an average of a concave transformation of the (semi)variogram function, weighted by the average density of locations at given distance lag b on the transect. This average is then normalized by the average income, to produce a scaleinvariant measure of inequality. The index can be also conceptualized as a coefficient of variation, where the standard deviation is replaced by a measure of dispersion that accounts for the spatial dependence of the underlying process. Similarly, the spatial GINI between index can also be written as a function of the variogram. The result holds under the assumption that the process Ys is gaussian, as P above, which implies that µsi d = ndi1+1 Ysi + j∈di Ysj is also gaussian under the intrinsic stationarity assumption, with expectation E[µsi d ] = µ for any i. From this, it follows that the difference in random variables |µsi d − µs` d | occurring in two locations si 35 and s` is a folded-normal distributed random variable with expectation E[|µsi d − µs` d |] = p 2/π V ar[µsi d − µs` d ]. The variance term can be decomposed as follows: V ar[µsi d − µs` d ] = V ar[µsi d ] + V ar[µs` d ] − 2Cov[µsi d ; µs` d ]. (2) Developing the variance and covariance terms we obtain: " !# X 1 V ar[µsi d ] = V ar Ysi + Ysj ndi + 1 j∈di X X 1 E[Ysj Ysk ] − µ2 = 2 (ndi + 1) j∈di ∪{i} k∈di ∪{i} = = 1 (ndi + 1)2 Bd X X b=1 j∈dbi = σ2 − X X c(||sj − sk ||) (3) Bd X 1 1 X c(|si + b − (si + b0 )|) ndi + 1 b0 =1 k∈d ndi + 1 (4) j∈di ∪{i} k∈di ∪{i} b0 i Bd X Bd X ndbi ndb0 i γ(b − b0 ), 2 (n + 1) di b=1 b0 =1 (5) where (3) follows from the definition of the covariogram, (4) is a consequence of the assumption that the process can be represented on a transect and, for simplicity, it is assumed that the set of location at b = 1 is d1i ∪ {i} with cardinality ndbi + 1, as it includes location si , while (5) follows from the definition of the variogram. Similarly, the covariance term in (2) can be manipulated to obtain the following: Cov[µsi d ; µs` d ] = XX j∈di k∈d` 2 = σ − 1 E[Yj Yk ] − µ2 (ndi + 1)(nd` + 1) Bd X Bd X ndbi ndb0 ` γ(si − s` + |b − b0 |), (n + 1)(n + 1) d d i ` b=1 b0 =1 (6) where the assumption that the process can be represented on a transect allows to write the variogram as a function of si − s` . Plugging (5) and (6) into (2), and by denoting i − ` = m to recall that the gap between si and sj is m, with m positive integer such that 36 m ≤ B with B being the maximal distance between any two locations on the transect, we obtain V ar[µsi d − µs` d ] = = Bd X Bd X ndbi ndb0 ` γ(si − s` + |b − b0 |) − (n + 1)(n + 1) di d` b=1 b0 =1 ! Bd X Bd X ndbi ndb0 i ndb` ndb0 ` γ(b − b0 ) + − 2 2 ndi nd` b=1 b0 =1 Bd X Bd X 2 ndbi ndb0 i+m γ(m + |b − b0 |) − (n + 1)(n + 1) d d i i+m b=1 b0 =1 ! B B d d X X ndbi ndb0 i ndb i+m ndb0 i+m + γ(b − b0 ) − 2 2 n n di di+m b=1 b0 =1 2 = V (γ, i, m). (7) Using the notation (7), we derive an alternative formulation of the GINI between index: GIN IB (FS , d) = 1 1 XX E[|µsi d − µs` d |] 2 i `6=i n(n − 1) µ B 1X1 X X 1 E[|µsi d − µs` d |] = 2 i n m=1 `∈n (n − 1) µ bi P p 1 nbi B 2V (γ, i, m)/π i n (n−1) 1X = . 2 m=1 µ (8) Under stationarity assumptions about the spatial process, we can show that the GINI between index of spatial inequality can be written as an average of coefficients of variations, each discounted by a weight controlling for the spatial dependency of the process. Formulations of the GINI within and between indices in (1) and (8) clarify the role of spatial dependence on the measurement of spatial inequality. Spatial dependence can be modeled via the variogram. Standard errors and confidence intervals of the GINI indices can be calculated accordingly. 37 A.3 Standard errors for the spatial GINI within index In this section, we derive confidence interval bounds for the GINI within index under three key assumptions: that the underling spatial process is stationary, that the spatial process occurs on a transect at equally spaced points, and the gaussian assumption. This allows to build confidence intervals for the empirical GINI within index estimator of the ˆ I W (y, d) ± zα SEW d , where zα is the standardized normal critical value for form GIN confidence level α and SEW d is the standard error of the GINI within estimator. For a given empirical income distribution, the confidence interval changes as a function of the distance parameter selected. Hence, we can use the confidence interval estimator to trace confidence bounds for the GINI within curve. Null hypothesis of dominance or equality for the GINI within curves can be formulated by using confidence bounds as the rejection region and by defining null hypothesis at each distance point separately (alike to statistical tests for strong forms of stochastic dominance relations, as in Bishop, Chakraborti and Thistle 1989, Dardanoni and Forcina 1999). Asymptotic standard errors (SE in brief) are derived for the weighted GINI within index for the spatial process {Ys : s ∈ S}. We assume that S is limited to n locations. We index these locations for simplicity by i such that i = 1, . . . , n. The spatial process is then a collection of n random variables {Yi : i = 1, . . . , n} that are spatially correlated. The joint distribution of the process is F . Each location is associated with a weight P wi ≥ 0 with w = i wi , which might reflect the underling population density at a given location. These weights are assumed not to be stochastic. We also assume intrinsic stationarity as before. The first implication is that, asymptotically, the random variable P P w µid = j∈di ∪{i} P j wj Yj is equivalent in expectation to µ̃ = i Pwiwi Yi , i.e., E[µ̃] = µ. j∈di ∪{i} i The second implication is that the spatial correlation exhibited by F is stationary in the distance d and can be represented through the variogram of F , denoted 2γ(d). An asymptotically equivalent version of the weighted GINI within index of the process 38 distributed as F where individual neighborhood have size d is n 1 XX 1 wi wj P GIN IW (F, d) = |Yi − Yj | = ∆W d . 2µ i=1 j∈d 2 w j∈di wj 2µ (9) i The GINI within index can thus be expressed as a ratio of two random variables. Asymptotic SE for ratios of random variables have been developed in Goodman and Hartley (1958) and Tin (1965). Related results have also been derived from the theory of Ustatistics pioneered in Hoeffding (1948) and adopted to derive asymptotic SE for the Gini coefficient of inequality under simple and complex random sampling by Xu (2007) and Davidson (2009). Based on these results, we derive the asymptotic variance of the GINI within index in (9): V ar [GIN IW (F, d)] = 1 (GIN IW (F, d))2 V ar[∆ ] + V ar[µ̃] − Wd 4nµ2 nµ2 GIN IW (F, d) Cov[∆W d , µ̃] + O(n−2 ), nµ2 where the asymptotic SE is SEW d = (10) p V ar [GIN IW (F, d)] at any d. The variance and covariance terms in (10) are shown to be relate to the variogram. To obtain this result, we have to introduce two additional assumptions. The first assumption is that the process distributed as F occurs on a transect, as explained before. We use scalars m, b,0 and so on to identify equally spaced points on the transect. Second, we assume that Yi is gaussian with expectation µ and variance σ 2 . These assumptions are taken from Cressie and Hawkins (1980). Under these assumptions, the variance of µ̃ 39 writes X wi X wj E[Yi Yj ] − µ2 w j w i B P X wi X X w j∈dmi wj P j = c(||si − sj ||) w w w j j∈d mi m=1 i j∈dmi ! P B X X wi j∈d wj mi = c(|m|) w w m=1 i V ar[µ̃] = 2 = σ − B X (11) (12) ω(m)γ(m), (13) m=1 where (13) is obtained from (12) by renaming the weight score, which satisfies PB m=1 ω(m) = 1, and by using the definition of the variogram. The second variance component of (10) can be written as follows: V ar[∆W d ] = n X X i=1 j∈di − ww Pi j w j∈di n X X wj X wi X i w wj P j∈di `=1 k∈d` j∈di wj w ww P` k wk !2 E[|Yi − Yj ||Y` − Yk |] k∈d` E[|Yj − Yi |] . The first component of V ar[∆W d ] cannot be further simplified, as the absolute value operator enters the expectation term in multiplicative way. Under the gaussian assumption, the expectation can be nevertheless simulated. This can be done acknowledging that the random vector (Yj , Yi , Yk , Y` ) is normally distributed with expectations (µ, µ, µ, µ) and variance-covariance matrix Cov[(Yj , Yi , Yk , Y` )] given by: Cov[(Yj , Yi , Yk , Y` )] = σ 2 c(||sj − si ||) c(||sj − sk ||) c(||sj − s` ||) σ 2 c(||si − sk ||) c(||si − s` ||) . σ2 c(||sk − s` ||) 2 σ Data occur on a transect at equally spaced points, where sj = si + b and sk = s` + b0 40 for the positive integers b ≤ Bd and b0 ≤ Bd . We take the convention that b0 > b and we further assume that there is a positive gap m, with m ≤ B between points si and s` . Using this notation, we can express the variance-covariance matrix as a function of the variogram σ 2 σ − γ(b) σ − γ(m − |b − b|) σ − γ(m + min{b , b}) σ2 σ 2 − γ(m − max{b0 , b}) σ 2 − γ(m) σ2 σ 2 − γ(b0 ) . 2 Cov[(Yj , Yi , Yk , Y` )] = 0 2 0 2 σ2 The expectation E[|Yi − Yj ||Y` − Yk |] can be simulated from a large number S (say, S = 10, 000) of independent draws (y1s , y2s , y3s , y4s ), with s = 1, . . . , S, from the random vector (Yj , Yi , Yk , Y` ). The simulated expectation is a function of the variogram parameters m, b, b0 and d and of σ 2 . It is denoted θW (m, b, b0 , d, σ 2 ) and estimated as follows: S 1X |y2s − y1s | · |y4s − y3s |. θW (m, b, b , d, σ ) = S s=1 0 2 With some algebra, and using the fact that E[|Y` − Yi |] = 2 p γ(m)/π for locations ` and i at distance m ≤ B one from each other, it is then possible to write the term V ar[∆W d ] as follows: V ar[∆W d ] = Bd X Bd B X X ω(m, b, b0 , d)θW (m, b, b0 , d, σ 2 ) m=1 b=1 b0 =1 −4 Bd X !2 p ω(m, d) γ(m)/π . (14) m P P P w In the formula, ω(m, b, b0 , d) = i wwi j∈dbi P j wj `∈dmi j∈di P wi P w P j are calculated as before. j∈dmi i w wj w` w P k∈db0 ` P wk k∈d` wk while ω(m, d) = j∈di The third component of (10) is the covariance term. It also depends on the variogram. The result relies on the following equivalence, when the process is define on the transect 41 and i and j are separated by b units of spacing while i and ` are separated by m unit of spacing: E[|Yj − Yi |Y` ] = E[|Yj Y` − Yi Y` |] = E[Yj Y` ] − E[Yi Y` ] − 2E[min{Yj Y` − Yi Y` , 0}] = c(||sj − s` ||) + µ2 − c(||si − s` ||) − µ2 − 2E[min{Yj Y` − Yi Y` , 0}] = γ(m) − γ(m − b) − 2E[min{Yj Y` − Yi Y` , 0}]. (15) The expectation E[min{Yj Y` − Yi Y` , 0}] is non-liner in the underlying random variables. Under the gaussian hypothesis it can be nevertheless simulated from a large number S (say, S = 10, 000) of independent draws (y1s , y2s , y3s ), with s = 1, . . . , S, from the random vector (Yj , Yi , Y` ) which is normally distributed with expectations (µ, µ, µ) and variance-covariance matrix Cov[(Yj , Yi , Y` )]. As the process occurs on the transect, the variance-covariance matrix writes σ 2 2 σ − γ(b) Cov[(Yj , Yi , Y` )] = σ2 2 σ − γ(m) σ 2 − γ(m − b) 2 σ for given m, b and d. The resulting simulated expectation is denoted φW (m, b, d, σ 2 ) and computed as follows: S 1X φW (m, b, d, σ ) = min{y2s y3s − y1s y3s , 0}. S s=1 2 42 Based on this result, we develop the covariance term in (10) as follows: Cov[∆W d , µ̃] = X wi X X w` wj E[|Yj − Yi |Y` ] w j∈d j∈di wj ` w i X wi X w P j E[|Yj − Yi |] −µ w w j j∈d i i j∈d P i i = Bd B X X ω(m, b, d) γ(m) − γ(m − b) − 2φW (m, b, d, σ 2 ) m=1 b=1 Bd X −2µ ω(m, d) p γ(m)/π. (16) m=1 P P P w The weights in (16) coincide respectively with ω(m, b, d) = i wwi `∈dmi ww` j∈dbi P j wj j∈di P P w and ω(m, d) = i wwi j∈dmi P j wj . The variogram appears in the second term of (16) j∈di is it was the case in (14). ˆ W d , can be obtained by plugging into A consistent estimator for the SE, denoted SE (10) the empirical counterparts of the variogram and the lag-dependent weights, using the formulas in (13), (14) and (16). These estimators are discussed in section A.5. A.4 Standard errors for the spatial GINI between index ˆ I B (y, d) ± zα SEB d for the GINI between Estimation of confidence interval bounds GIN index are obtained under the same assumptions outlined in the previous section. As before, we assume that the spatial process {Ys : s ∈ S} is limited to n locations. We index these locations for simplicity by i such that i = 1, . . . , n. The spatial process is then a collection of n random variables {Yi : i = 1, . . . , n} that are spatially correlated. The joint distribution of the process is F . Each location is associated with a weight wi ≥ 0 P with w = i wi . These weights are assumed not to be stochastic. P w Under stationary assumptions, the neighborhood averages µid = j∈di ∪{i} P j wj Yj j∈di ∪{i} P and µd = i wwi µid are equivalent in distribution to µ̃, and hence µ̃ can be used to assess V ar[µd ], as V ar[µd ] = V ar[µ̃]. Similar conclusions cannot be drawn for measures of linear association involving µd . 43 An asymptotically equivalent version of the weighted GINI within index of the process distributed as F where individual neighborhood have size d is n GIN IW (F, d) = n 1 X X wi wj 1 ∆B d . |µ − µ | = id jd 2µ i=1 j=1 w2 2µ (17) We use results on variance estimators for ratios to derive the SE of (17): V ar [GIN IB (F, d)] = 1 (GIN IB (F, d))2 V ar[∆ ] + V ar[µ̃] − Bd 4nµ2 nµ2 GIN IB (F, d) Cov[∆B d , µd ] + O(n−2 ), − nµ2 where the asymptotic SE is SEB d = (18) p V ar [GIN IB (F, d)] at any d. The variance and covariance terms in (18) are shown to be related to the variogram. To obtain this result, we have to further introduce two assumption. The first assumption is that the process distributed as F occurs on a transect, as explained before. We use scalars m, b, b0 and so on to identify equally spaced points on the transect. Second, we assume that Yi is gaussian with expectation µ and variance σ 2 . The variance V ar[µ̃] which represents the population estimator for µ is given as in (12). The second variance component in (18) can be written as follows: V ar[∆B d ] = X X wi wj X X w` wk i − j w2 ` k X wi X wj i w j w w2 E[|µid − µjd ||µ`d − µkd |] !2 E[|µjd − µid |] . (19) The first component of V ar[∆B d ] cannot be further simplified as the absolute value operator enters the expectation term in multiplicative way. Under the gaussian assumption, the expectation can be nevertheless simulated. This can be done acknowledging that the random vector (µjd , µid , µkd , µ`d ) is normally distributed with expectations (µ, µ, µ, µ) and variance-covariance matrix C of size 4 × 4. The cells in the matrix C are indexed accord44 ingly to vector (µjd , µid , µkd , µ`d ), so that element C12 is used, for instance, to indicate the covariance between the random variables µjd and µid . The sample occurs on a transect. We use scalars b and b0 to denote a well defined distance gap between any observation indexed by {j, i, k, `} and any other observation that is b or b0 units away from it, within a distance range d. We use scalars m to indicate the gap between i and `, so that ` = i + m; we use m0 to indicate the gap between i and j, so that j = i + m0 and we use m00 to indicate the gap between k and `, so that ` = k + m0 . Based on this notation, we can construct a weighted analog of (6) to explicitly write the elements of C as transformations 45 of the variogram. This gives: 2 C11 = σ − C22 = σ 2 − C33 = σ 2 − C44 = σ 2 − C12 = σ 2 − C13 = σ 2 − C14 = σ 2 − C23 = σ 2 − C24 = σ 2 − C34 = σ 2 − Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd Bd X X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd X Bd X ω1 (b, d)ω1 (b0 , d)γ(b − b0 ), ω2 (b, d)ω2 (b0 , d)γ(b − b0 ), ω3 (b, d)ω3 (b0 , d)γ(b − b0 ), ω4 (b, d)ω4 (b0 , d)γ(b − b0 ), ω1 (b, d)ω2 (b0 , d)γ(m0 + |b − b0 |), ω1 (b, d)ω3 (b0 , d)γ(m + |b − b0 |), ω1 (b, d)ω4 (b0 , d)γ(m + |b − b0 |), ω2 (b, d)ω3 (b0 , d)γ(m + |b − b0 |), ω2 (b, d)ω4 (b0 , d)γ(m + |b − b0 |), ω3 (b, d)ω4 (b0 , d)γ(m00 + |b − b0 |), b=1 b0 =1 where we denote, for instance, ω1 (b, d) = wj j w P P j 0 ∈dj wj 0 P j 0 ∈dj wj and similarly for the other elements. The expectation E[|µjd − µid ||µkd − µ`d |] can be simulated from a large number S (say, S = 10, 000) of independent draws (ȳ1s , ȳ2s , ȳ3s , ȳ4s ), with s = 1, . . . , S, of the random vector (µjd , µid , µkd , µ`d ). The simulated expectation will be a function of the variogram parameters m, m0 , m00 and d and of σ 2 . It is denoted θB (m, m0 , m00 , d, σ 2 ) and estimated 46 as follows: S 1X θB (m, m , m , d, σ ) = |ȳ2s − ȳ1s | · |ȳ4s − ȳ3s |. S s=1 0 00 2 The summations in V ar[∆B d ] run over four indices i, j, k, `. These can be equivalently represented through summations at given distance lags m, m0 , m00 . For instance, we write P wi P wj PB P wi P wj P = to indicate that i and j are separated by i w j w m0 =1 i w j∈dm0 i wj j∈dm0 i 0 a lag of m units on the transect. Repeating this for each of the three pairs of indices i, j and `, k and i, ` we end up with three summations over m0 , m00 and m respectively, where the aggregate weight is denoted B X B X B X X X X wi X w` X wi X wj wk w P P P ` · · . ω(m, m , m , d) = w j∈d j∈dm0 i wj m00 =1 ` w k∈d k∈dm00 ` wk m=1 i w `∈d `∈dmi w` m0 =1 i 0 00 m0 i m00 ` wi wj w2 mi w` wk E[|µid w2 − µjd ||µ`d − µkd |], ω(m, m0 , m00 , d)θB (m, m0 , m00 , d, σ 2 ). (20) Hence, the first term of the V ar[∆B d ], P P i j P P ` k can be written as follows: B X B B X X m=1 m0 =1 m00 =1 As of the second term of V ar[∆B d ], we make use of the gaussian assumption and the variogram properties to express the square of the expectation as a weighted analog of (8), 47 that is " V ar[∆B d ] = E X X wi wj i j w2 X X wi wj = i j w2 #2 |µid − µjd | !2 E[|µid − µjd |] r !2 2 V ar[|µid − µjd |] = 2 w π i j !2 B P q 2 X wi X j∈dmi wj X wj P = V ar[|µid − µjd |] π w m0 =1 w j∈dmi wj i j∈dmi !2 B q 2 XX X ωij (m, d) V ar[|µid − µjd |] (21) = π m0 =1 i j∈d X X wi wj q mi where V ar[|µid − µjd |] = Bd X Bd X 2ωij (b, b0 , d)γ(m − |b − b0 |) − (ωi (b, b0 , d) + ωj (b, b0 , d))γ(b − b0 ). b=1 b0 =1 Both weighting schemes in (20) and in (21) cannot be easily estimated in reasonable computation time: they involve multiple loops across the observed locations, so that the length of estimation increases exponentially with the density of the spatial structure. In section A.5 we discuss estimators of the weights ωij (m, m0 , m00 , d), ωij (m, d), ωij (b, b0 , d), ωi (b, b0 , d) and ωj (b, b0 , d) that are feasible, and provide the empirical estimator of the variance V ar[∆B d ]. The third component of (18) is the covariance term Cov[∆B d , µd ]. The indices i, j, ` identify three locations and their respective neighborhood income averages vector (µid , µjd , µ`d ). 48 Under normality and stationarity assumptions, we can write the covariance term as follows X X wi wj X w` Cov[∆B d , µd ] = Cov[ µ`d ] |µ − µ |, id jd w2 w i j ` X X wi wj X w` Cov[ = |µid − µjd |, µ`d ] 2 w w i j ` X w` X X wi wj = E[|µid − µjd |µ`d ] − w i j w2 ` X X wi wj X w` E[µ`d ] − E[|µid − µjd |] 2 w w i j ` X w` X X wi wj E[|µid − µjd |µ`d ] − = 2 w w i j ` r q 2 X X wi wj − µ V ar[µid − µjd ]. π i j w2 (22) The first term of (22) is the expectation of a non-linear function of convex combinations of normally distributed random variables. Under the gaussian hypothesis, the expectation E[|µid −µjd |µ`d ] can be nevertheless simulated from a large number S (say, S = 10, 000) of independent draws (ȳ1s , ȳ2s , ȳ3s ) with s = 1, . . . , S from the random vector (µid , µjd , µ`d ) which is normally distributed with expectations (µ, µ, µ) and variance-covariance matrix C of size 3 × 3. Let use scalars b and b0 to denote a well defined distance gap between any observation indexed by {j, i, `} and any other observation that is b or b0 units away from it, within a distance boundary d. We use scalars m0 to indicate the gap between i and j, so that j = i + m0 ; we use m00 to indicate the gap between i and `, so that ` = i + m00 . Based on this notation, we obtain a convenient formulation for the covariances of mean 49 neighborhood incomes that are weighted analog of (6), thus giving: 2 C11 = σ − C22 = σ 2 − C33 = σ 2 − C12 = σ 2 − C13 = σ 2 − C23 = σ 2 − Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd X Bd X b=1 b0 =1 Bd Bd X X b=1 b0 =1 Bd Bd X X b=1 b0 =1 Bd X Bd X ω1 (b, d)ω1 (b0 , d)γ(b − b0 ), ω2 (b, d)ω2 (b0 , d)γ(b − b0 ), ω3 (b, d)ω3 (b0 , d)γ(b − b0 ), ω1 (b, d)ω2 (b0 , d)γ(m0 + |b − b0 |), ω1 (b, d)ω3 (b0 , d)γ(m00 + |b − b0 |), ω2 (b, d)ω3 (b0 , d)γ(|m0 − m00 | + |b − b0 |), b=1 b0 =1 where we denote, for instance, ω1 (b, d) = P wj j w P wj 0 j 0 ∈dj P j 0 ∈dj wj and similarly for the other elements. See previous notation for further details. The expectation E[|µjd − µid |µ`d ] is simulated from a number S of independent draws (ȳ1s , ȳ2s , ȳ3s ) with s = 1, . . . , S of the random vector (µjd , µid , µ`d ). The simulated expectation will be a function of the variogram parameters m0 , m00 and d and of σ 2 . It is denoted θB (m, m0 , m00 , d, σ 2 ) and estimated as follows: φB (m0 , m00 , d, σ 2 ) = S 1X |ȳ2s − ȳ1s |ȳ3s . S s=1 This element is constant over m0 and m00 . Hence, we use φB (m0 , m00 , d, σ 2 ) as a simulated P P P ww analog for E[|µid −µjd |µ`d ], so that the covariance term ` ww` i j wi 2 j E[|µid − µjd |µ`d ] P P P ww writes ` ww` i j wi 2 j φB (m0 , m00 , d, σ 2 ), or equivalently B X wi X i w m0 =1 P j∈dm0 i w B wj X P `∈dm00 i m00 =1 50 w w` φB (m0 , m00 , d, σ 2 ), which is denoted PB m0 =1 PB m00 =1 ω(m0 , m00 , d)φB (m0 , m00 , d, σ 2 ). The second term of (22) is calculated as in (21). Overall, we are now allowed to write the covariance term as follows: Cov[∆B d , µd ] = B B X X ω(m0 , m00 , d)φB (m0 , m00 , d, σ 2 ) m0 =1 m00 =1 r − B q 2 XX X µ ωij (m, d) V ar[|µid − µjd |], π m0 =1 i j∈d (23) mi where q V ar[|µid − µjd |] = Bd Bd X X !.5 2ωij (b, b0 , d)γ(m − |b − b0 |) − (ωi (b, b0 , d) + ωj (b, b0 , d))γ(b − b0 ) b=1 b0 =1 The weights have been already defined in (21). Plugging (13), (20), (21) and (23) into (18) we derive an estimator for the GINI within index SE. The last section discuss feasible estimators. A.5 Implementation Consider a sample of size n of income realizations yi with i = 1, . . . , n. The income vector y = (y1 , . . . , yn ) is a draw from the spatial random process {Ys }, while for each location s ∈ S we assume to observe, at most, one income realization. Information about location of an observation i in the geographic space S under analysis is denoted by si ∈ S, so that a location s identifies a precise point on a map. Information about latitude and longitude coordinates of si are given. Furthermore, observed incomes are associated with weights P wi ≥ 0 and are indexed according to the sample units, with w = i wi . It is often the case that the sample weights give the inverse probability of selection of an observation from the population. The mean income in an individual neighborhood of range d, µid , is estimated by 51 µ bid = Pn j=1 ŵj yj where wj 1(||si − sj || ≤ d) ŵj := P j wj 1(||si − sj || ≤ d) P ŵj = 1, and 1(.) is the indicator function. The estimator of the average Pn wi neighborhood mean income is instead µ̂d = bid . The estimator of the GINI i=1 w µ so that j ˆ I B (y, d), is the Gini inequality index between index of spatial inequality, denoted GIN of the vector of estimated average incomes (µ̂1d , . . . , µ̂nd ), indexed by the size d of the individual neighborhood. It can be computed by mean of the plug-in estimators as in Binder and Kovacevic (1995) and Bhattacharya (2007). The estimator of the GINI within ˆ I W (y, d), is the sample weighted average of the index of spatial inequality, denoted GIN mean absolute deviation of the income of an individual located in s from the income of other individuals located in s0 , with ||s − s0 || ≤ d. Formally ˆ I W (y, d) = GIN n n X wi 1 X ŵj |yi − yj |, w 2µ̂ id i=1 j=1 where ŵj is defined as above. The estimation of the GINI indices is conditional on d, which is a parameter under control of the researcher. The distance d is conventionally reported in meters and is meant to capture a continuous measure of individual neighborhood. In practice, however, one cannot produce estimates of spatial inequality for a continuum of neighborhoods, and so in applications the neighborhood size is parametrized by the product of the number and size of lags between observations. The GINI indices are estimated for a finite number of lags and for a given size of the lags. The maximum number of lags indicates the point at which distance between observations is large enough that the spatial GINI indices converge to their respective asymptotic values. For a given neighborhood of size d, we can then partition the distance interval [0, d], defining the size of a neighborhood, into K intervals d0 , d1 , . . . , dK of equal size, with d0 = 0. Any pair of observations i and j at distance dk−1 < ||si − sj || ≤ dk is assumed to be located at distance dk one from the ˆ I B (y, dk )) and (dk , GIN ˆ I W (y, dk )) for any k = 1, . . . , K can other. The pairs (dk , GIN 52 be hence plotted on a graph. The curves resulting by linearly interpolating these points are the empirical equivalent of the GINI spatial inequality curves. A plug-in estimator for the asymptotic standard error of the GINI indices can be derived under the assumptions listed in the previous sections. The SE estimator crucially depend on four components: (i) the consistent estimator for the average µ̃, denoted µ̂, which coincides with the sample average; (ii) the consistent estimator for variance σ 2 , denoted σ̂ 2 , which is given by the sample variance; (iii) the consistent estimator for the variogram; (iv) the estimator of the weighting schemes. A sample of incomes y = (y1 , . . . , yn ) of n distinct units is drawn from F . These incomes represent realizations of the spatial process {Ys : s ∈ S}, where we assume that income realizations occur at distinct locations. Locations are geocoded. In this way, distance measures between locations can be easily constructed. In applications involving geographic representations, the latitude and longitude coordinates of any pair of income yi , yj can be combined to obtain the geodesic distance among the locations of i and of j. We also denote with wi the weight of each observation, often reflecting the inverse of the probability that observation i is observed in the population. Empirical estimators µ̂ and σ̂ 2 are standard. The robust non-parametric estimator of the variogram proposed by Cressie and Hawkins (1980) can be used to assess the pattern of spatial dependency from spatial data on income realizations. The empirical variogram is defined for given spatial lags, meaning that it produces a measure of spatial dependence among observations that are located at a given distance lag one from the others. Under the assumption that data occur on the transect at equally spaced points, we use b = 1, . . . , B to partition the empirical spectrum of distances between observed locations into equally spaced lags, and we estimate the variogram on each of these lags. This means that 2γ(b) refers to the correlation between incomes placed at distance lags of exactly b distance units. It is understood that the size of the sample is large compared to B, in the sense that the sampling rate per unit area remains constant when the partition into lags becomes finer. This assumption allows to estimate a non-parametric version of the variogram at every distance lag. Following Cressie (1985), we use weighted least 53 squares to fit a theoretical variogram model to the empirical variogram estimates. The theoretical model consists in a continuous parametric function mapping distance into the corresponding variogram level. In the application, we choose the spherical variogram model for γ (see Cressie 1985). We also assume that γ(0) → 0 and that γ(a) = σ 2 , where a is the so-called range level: beyond distance a, the random variables Ys+h and Ys with h > a are spatially uncorrelated. Under the assumption that data occur on a transect, we set the max number of lags B so that 2B = a. The parameters of the variogram model are estimated via weighted least squares, where the non-parametric variogram coordinates are regressed on distance lags. The estimated parameters are then used to draw parametric predictions for the estimator 2γ̂ of the variogram at pre-determined abscissae (distance lags). The predictions are then plugged into the GINI indices SE estimators. Cressie (1985) has shown that this methodology leads to consistent estimates of the true variogram function under the stationarity assumptions mentioned above. Finally, SE estimation requires to produce reliable estimators of the weights ω. These can be non-parametrically identified from the formulas provided above. In some cases, however, computation of the exact weights requires looping more than once across observations. The overall computation time thus increases exponentially in the number of observations and the procedure becomes quickly unfeasible. We propose alternative, feasible estimator for these weights, denoted ω̂, that are expressed as linear averages. The computational time is, nevertheless, quadratic in the number of observations as it requires at least one loop across all observations. ˆ W d in (10) and We consider here only the weights that appear in the estimators SE ˆ B d in (18) that cannot be directly inferred (i.e., are computationally unfeasible) from SE P observed weights. For a given observation i, define w(b, i) = j∈dbi wj for any gap b = 1, . . . , Bd , . . . , B the weight associated with income realizations that are exactly located P P d b lags away from i. Then, denote w(d, i) = j∈di wj = B b=1 w(b, i). We construct the 54 following estimators for the weights appeasing in the GINI within SE estimator: X wi w(b, i) w(m, i) w(m + b0 , i) For (14) : ω̂(m, b, b0 , d) = w w(d, i) i For (16) : ω̂(m, b0 , d) = w X wi w(m, i) w(b0 , i) i w w w(d, i) w(m + d, i) , , To compute these weights, one has to loop over all observations twice, and assign to each observation i the total weight w(b, i) of those observations j 6= i that are located exactly at distance b from i. Then, ω̂(m, b, b0 , d) and ω̂(m, b0 , d) are obtained by averaging these weights across i’s. The key feature of these estimators is that second-order loops across observations placed at distance b0 from an observation at distance m from i are estimated by averaging across all observations i the relative weight of observations at distance m + b0 from ay i. For the computation of the GINI between index, one needs to construct the relative weights by taking as a reference the maximum distance achievable, and not the reference abscissa d for which the index is calculated. We hence assume that beyond the threshold d, indicating half of the the maximum distance achievable in the sample, spatial correlation is negligible and weights can thus be set to zero. We implicitly maintain that d ≤ d. We then propose the following estimators: For (17) : ω̂(m, m0 , m00 , d) = X wi w(m, i) w(m, i) w(m + m0 , i) w(m + m00 , i) i w w w(d, i) w(m + d, i) w(m + d, i) w(b, i) w(m + b0 , i) w(d, i) w(m + d, i) w(b, i) w(b0 , i) For (18) : ω̂i (b, b0 , d) = w(d, i) w(d, i) w(m + b, i) w(m + b0 , i) For (18) : ω̂j (b, b0 , d) = w(m + d, i) w(m + d, i) X X X wi w(m, i) For (18) : ω̂ij (m, d) = w w i j∈d i For (18) : ω̂ij (b, b0 , d) = mi By plugging these estimators into (19) we obtain the implementable estimator of the 55 variance component V ar[∆B d ], defined as follows: Vd ar[∆B d ] = B X B X B X ω̂(m, m0 , m00 , d)θB (m, m0 , m00 , d, σ̂ 2 ) − m=1 m0 m00 =1 2 π !2 q B X X wi w(m, i) d V ar[|µid − µjd |] w w 0 m =1 i (24) where q Vd ar[|µid − µjd |] = Bd Bd X X !.5 0 0 0 0 0 2ω̂ij (b, b , d)γ̂(m − |b − b |) − (ω̂i (b, b , d) + ω̂j (b, b , d))γ̂(b − b ) b=1 b0 =1 An equivalent procedure, based on analogous weighting scheme, has to be replicated to determine the empirical estimator for (23). 56 . B Additional tables of results Figure 6 show that, in general, the gap in GINI within indices is small and never significant, not even at 10% confidence level. Essentially, there is no statistical support to conclude that the GINI curves for within spatial inequality have changed across time, a result which holds irrespectively of the extent of individual neighborhood. We draw a different conclusion for what concerns changes associated to the GINI between inequality curves. Pairwise differences across these curves, along with their confidence intervals, are reported in figure 5. The differences in inequality curves compared to the spatial inequality curve of the year 1980 (panels a, b and c of the figure) are generally positive and significant at 5% confidence level. This indicates that spatial inequality between individual neighborhood has increased compared to the initial period, roughly homogenously with respect to the individual neighborhood spatial extension. After that period, data display very little statistical support to changes in inequality across the 1990’ and 2000’. Spatial between inequality has slightly increased after 1990 (panels d and e), while it has remained stable after 2000 (panel f). In the latter case, the confidence bounds of the difference in spatial inequality curves of years 2010/2014 and 2000 fluctuates around the horizontal axis. We use estimates of the GINI standard errors to study the pattern of the spatial inequality curves. More specifically, we compute differences in the GINI within GIN IW (d) or between GIN IB (d) indices at various abscissae d, then we compute the standard errors of these differences, and finally we test if these differences are significantly different than zero. If they are, we study how spatial inequality evolves with the size of the neighborhood. In particular, the sign of these differences predicts the direction of the change in spatia inequality. We refer to five distance thresholds defining neighborhoods that are very small (100 meters, 300 meters), relatively large (1km, 5km), and very inclusive neighborhoods (10km, 25km), which include most of the urban space under analysis. The resulting differences are reported in Table 2. We note that the dip in the spatial inequality curve associated with the GINI within is not statistically significant, since most of 57 Index GIN IW Year 1980 1990 2000 2010 GIN IB 1980 1990 2000 2010 300m vs 100m -0.004 (0.019) -0.006 (0.022) -0.004 (0.017) -0.000 (0.017) -0.020** (0.002) -0.012** (0.004) -0.009** (0.002) -0.019** (0.002) 1km vs 100m -0.012 (0.020) -0.019 (0.022) -0.016 (0.017) -0.004 (0.018) -0.087** (0.002) -0.084** (0.003) -0.060** (0.002) -0.083** (0.002) Differences across distances 5km vs 25km vs 10km vs 100m 100m 2km -0.006 0.015 0.013 (0.020) (0.021) (0.021) 0.003 0.037* 0.036 (0.021) (0.021) (0.022) -0.002 0.035* 0.034 (0.020) (0.021) (0.021) 0.001 0.033 0.019 (0.019) (0.021) (0.021) -0.151** -0.239** -0.061** (0.003) (0.003) (0.002) -0.171** -0.280** -0.097** (0.004) (0.004) (0.003) -0.130** -0.237** -0.095** (0.002) (0.003) (0.003) -0.160** -0.261** -0.084** (0.003) (0.003) (0.003) 25km vs 2km 0.025 (0.023) 0.051** (0.023) 0.050** (0.022) 0.036 (0.023) -0.120** (0.002) -0.160** (0.003) -0.152** (0.003) -0.141** (0.002) 25km vs 10km 0.012 (0.023) 0.015 (0.021) 0.016 (0.024) 0.017 (0.024) -0.059** (0.003) -0.064** (0.004) -0.057** (0.003) -0.058** (0.003) Table 2: Patterns of GINI indices across distance levels Note: Authors elaboration on U.S. Census data. Each column report differences in GINI indices at various distance thresholds. SE of the distance estimate are reported in brackets. Significance levels: ∗ = 10% and ∗∗ = 5%. the changes in spatial inequality in very large neighborhoods is substantially equivalent to the spatial inequality observed for very small neighborhoods (between 100 to 300 meters of size). For 1990 and 2000, we find however a statistically significant increase in inequality when average size neighborhoods (1km of radius) are compared with very large concepts of neighborhoods. Overall, the GINI within pattern is substantially flat when the distance increases beyond 5km. The pattern registered for the GINI between index is much more clear-cut: generally, the spatial inequality curve constructed from the index is decreasing in distance (differences in GINI between are generally negative), and the patterns of changes are also significant at 5%, indicating strong reliability on the pattern of heterogeneity in average income distribution across neighborhoods. 58 Figure 5: Differences in spatial GINI within indices across decades (a) GIN IW 1990 - GIN IW 1980 (b) GIN IW 2000 - GIN IW 1980 (c) GIN IW 2014 - GIN IW 1980 (d) GIN IW 2000 - GIN IW 1990 (e) GIN IW 2014 - GIN IW 1990 (f) GIN IW 2014 - GIN IW 2000 Note: Authors elaboration on U.S. Census data. Confidence bounds at 5% are based on standard error estimators discussed in the appendix. 59 Figure 6: Differences in spatial GINI between indices across decades (a) GIN IB 1990 - GIN IB 1980 (b) GIN IB 2000 - GIN IB 1980 (c) GIN IB 2014 - GIN IB 1980 (d) GIN IB 2000 - GIN IB 1990 (e) GIN IB 2014 - GIN IB 1990 (f) GIN IB 2014 - GIN IB 2000 Note: Authors elaboration on U.S. Census data. Confidence bounds at 5% are based on standard error estimators discussed in the appendix. 60 C Statistics for selected U.S. cities City Year # Blocks Hh/block Eq. scale Mean Equivalent household income 20% 80% Gini 90%/10% New York (NY) 1980 1990 2000 2010/14 6319 6774 6618 7182 1318 1664 1537 1140 1.572 2.058 1.604 1.566 12289 22763 41061 56558 4601 7799 12196 19749 19034 35924 66542 92656 0.474 0.507 0.549 0.502 11.247 13.013 25.913 17.323 Los Angeles (CA) 1980 1990 2000 2010/14 5059 5905 6103 6385 1052 1585 1158 1107 1.615 2.012 1.690 1.649 14697 26434 38844 55224 6167 10509 13720 19056 22248 41048 59767 90324 0.441 0.475 0.509 0.505 10.735 12.391 19.256 13.628 Chicago (IL) 1980 1990 2000 2010/14 3756 4444 4691 4763 1122 1217 1173 1060 1.630 2.029 1.625 1.575 13794 21859 41193 55710 5798 9132 16076 20022 20602 32316 61667 89856 0.434 0.461 0.473 0.486 11.351 11.903 11.533 13.452 Houston (TX) 1980 1990 2000 2010/14 1238 2531 2318 2781 1253 1291 1418 2148 1.624 1.994 1.667 1.644 15419 22827 39231 55841 6900 10203 16619 22156 22718 33287 57539 88033 0.428 0.462 0.472 0.484 10.233 11.771 10.736 12.394 Philadelphia (PA) 1980 1990 2000 2010/14 3978 3300 4212 3819 855 1384 982 1124 1.650 2.001 1.602 1.566 12651 21816 38995 56205 5589 9601 15788 21567 18557 31606 57841 89602 0.410 0.442 0.454 0.465 10.245 11.788 10.972 13.174 Phoenix (AZ) 1980 1990 2000 2010/14 697 1857 1984 2494 1155 961 1222 1110 1.609 1.970 1.622 1.590 12854 21233 37860 48194 5920 9831 17098 20218 18741 30732 54998 73509 0.401 0.439 0.437 0.456 8.972 9.803 8.541 10.906 San Antonio (TX) 1980 1990 2000 2010/14 597 1101 1065 1220 891 890 1189 1307 1.686 1.983 1.651 1.623 10501 17350 31592 44773 4364 7569 13726 19048 15399 25243 45517 68074 0.451 0.455 0.454 0.454 10.206 9.903 16.081 11.225 San Diego (CA) 1980 1990 2000 2010/14 908 1628 1678 1789 1471 1473 1172 1546 1.577 1.961 1.637 1.615 12759 24194 39537 55564 5628 11007 16698 21947 18338 35191 57219 88783 0.412 0.434 0.451 0.452 8.893 11.239 9.644 11.978 Table 4: Income and population distribution across block groups, U.S. 30 largest cities 61 City Year # Blocks Continued Hh/block Eq. scale Mean Equivalent household income 20% 80% Gini 90%/10% Dallas (TX) 1980 1990 2000 2010/14 1141 2310 2189 2696 931 965 1251 1251 1.620 1.993 1.633 1.625 14614 24074 43913 54729 6759 11287 19306 23689 21494 35141 65158 84291 0.425 0.454 0.464 0.460 9.522 11.691 10.093 11.163 San Jose (CA) 1980 1990 2000 2010/14 571 1016 965 1071 1417 1400 1169 1427 1.633 1.954 1.689 1.664 16762 32120 59428 82154 8441 15598 24663 30785 24258 47103 91637 137435 0.365 0.405 0.433 0.455 7.215 8.339 9.465 14.295 Austin (TX) 1980 1990 2000 2010/14 296 718 644 899 1084 1345 1416 1662 1.517 2.019 1.569 1.576 11407 18968 38993 55093 4867 8497 17418 23478 17064 27339 55766 85981 0.440 0.461 0.442 0.443 9.902 10.522 9.455 11.403 Jacksonville (FL) 1980 1990 2000 2010/14 434 628 505 688 1000 1509 2358 1757 1.622 1.973 1.590 1.550 10868 19217 34398 46517 4602 8365 14528 18370 15546 27219 49341 71941 0.428 0.435 0.434 0.450 9.415 9.512 8.629 10.883 San Francisco (CA) 1980 1990 2000 2010/14 1083 1226 1105 1210 1166 1477 1316 1328 1.514 2.040 1.549 1.525 16322 28783 60967 85755 6927 11624 20961 28440 24339 44191 97430 145763 0.424 0.467 0.494 0.482 9.864 13.379 13.179 16.858 Indianapolis (IN) 1980 1990 2000 2010/14 730 1029 944 1030 1073 1395 1395 1639 1.617 1.985 1.573 1.568 12550 20996 37021 47262 5958 9806 16392 19870 18183 29406 52896 71036 0.388 0.425 0.423 0.450 9.032 9.515 8.317 10.624 Columbus (OH) 1980 1990 2000 2010/14 758 1281 1140 1269 1105 1128 986 1293 1.593 1.988 1.553 1.560 12427 19865 35926 48270 5984 9262 16152 21115 17840 28819 51815 72778 0.394 0.427 0.431 0.439 8.874 9.649 8.848 11.633 Fort Worth (TX) 1980 1990 2000 2010/14 640 1203 1101 1326 650 956 1147 1294 1.615 1.972 1.638 1.625 12873 21517 37074 50540 5870 10428 17140 21830 18794 30620 52607 75565 0.409 0.424 0.429 0.449 9.169 9.835 8.719 10.553 Charlotte (NC) 1980 1990 2000 2010/14 346 930 856 1172 1169 1032 1195 1299 1.614 1.959 1.583 1.579 11411 20366 39683 47697 5203 8961 16640 19231 16277 29519 59188 74717 0.400 0.424 0.451 0.452 8.864 9.445 9.145 11.757 Detroit (MI) 1980 1990 2000 2010/14 2184 4531 3954 3798 764 974 963 986 1.638 1.990 1.603 1.560 12853 22673 40742 46492 5587 10194 17362 18592 19246 33441 59654 71604 0.415 0.445 0.439 0.456 10.783 12.181 9.817 11.856 El Paso (TX) 1980 1990 2000 2010/14 218 425 418 511 897 1042 960 1142 1.759 1.969 1.750 1.694 8525 15009 23862 33277 3572 6372 9095 13049 12373 21601 33972 51000 0.443 0.456 0.476 0.462 9.182 8.963 16.668 11.060 62 City Year # Blocks Continued Hh/block Eq. scale Mean Equivalent household income 20% 80% Gini 90%/10% Seattle (WA) 1980 1990 2000 2010/14 1405 2255 2473 2475 885 1004 855 1087 1.540 1.984 1.568 1.555 14437 22563 42386 59626 6481 10601 18650 24442 21204 31905 60276 92751 0.398 0.416 0.427 0.438 8.514 10.514 8.448 10.314 Denver (CO) 1980 1990 2000 2010/14 1054 1694 1711 1908 899 983 1038 1230 1.575 2.005 1.578 1.561 14283 22072 43300 58203 6866 10791 20142 24081 20352 31410 62101 90216 0.396 0.432 0.425 0.450 8.081 11.069 8.100 10.751 Washington (DC) 1980 1990 2000 2010/14 1580 2540 2642 3335 1608 2193 1409 1360 1.619 1.968 1.603 1.600 18273 32091 53263 80366 9281 16818 24898 35929 26315 45700 78715 124973 0.390 0.404 0.425 0.420 8.361 7.758 8.968 10.665 Memphis (TN) 1980 1990 2000 2010/14 478 920 783 764 1021 903 1153 1380 1.639 1.997 1.605 1.573 11370 17888 33086 42700 4852 8072 13753 17702 16693 26052 47853 65757 0.457 0.471 0.471 0.465 10.804 10.945 18.640 11.492 Boston (MA) 1980 1990 2000 2010/14 3662 4497 3963 4082 809 1032 961 1058 1.622 1.997 1.584 1.566 12696 24633 43840 64422 5417 10314 16776 23196 18790 37112 66109 105048 0.406 0.436 0.458 0.470 10.048 12.226 11.004 13.712 Nashville (TN) 1980 1990 2000 2010/14 375 755 723 911 1043 1260 1374 1535 1.605 1.979 1.555 1.568 12416 19811 36360 49714 5382 8712 15118 20024 18373 28653 52565 76735 0.442 0.442 0.448 0.452 10.358 9.710 9.000 10.444 Baltimore (MD) 1980 1990 2000 2010/14 1517 1965 1780 1932 900 1269 1204 1182 1.641 1.972 1.588 1.567 12751 23987 38615 59954 5932 11302 16954 25171 18442 34591 55517 93398 0.400 0.426 0.431 0.439 10.075 11.780 9.565 11.158 Oklahoma City (OK) 1980 1990 2000 2010/14 709 1034 880 1015 720 854 941 1021 1.573 1.993 1.557 1.562 12933 17551 30578 45377 5777 7499 12488 18504 18878 26072 44422 68795 0.419 0.445 0.447 0.457 9.075 9.616 15.739 10.504 Portland (OR) 1980 1990 2000 2010/14 696 1145 1141 1374 1077 1131 1111 1211 1.526 1.991 1.586 1.567 12819 19987 37618 49201 5411 8840 16409 19927 18704 28511 53854 74485 0.404 0.424 0.417 0.428 9.155 9.403 8.385 10.490 Las Vegas (NV) 1980 1990 2000 2010/14 150 318 796 1284 2018 2570 1396 1215 1.554 1.976 1.620 1.592 12756 20006 36442 44657 5568 8888 16095 18771 17713 27960 51823 66044 0.406 0.431 0.430 0.442 8.542 9.310 8.202 9.525 Louisville (KY) 1980 1990 2000 2010/14 582 957 742 840 873 938 1021 1087 1.592 1.990 1.542 1.536 11451 18323 32264 45220 5036 7864 13213 17798 17218 27067 46595 69576 0.414 0.445 0.444 0.451 9.188 9.771 15.196 10.739 63
© Copyright 2026 Paperzz