Spatial Composite and Disaggregate Indicators: Chow-Lin Methods and Applications
Francesco Vidoli*
Claudio Mazziotta**
Key words: Spatial Data Analysis, Composite Indicators Disaggregation, Chow-Lin approach.
Abstract This paper tests a statistical procedure for moving from aggregate to disaggregate indicators, where the terms aggregate and disaggregate refer to territorial areas and not to categories. More specifically, we attempt to reconstruct indicators of infrastructural endowment at the provincial level (in Italy) from analogous indicators available at the regional level. For this purpose, following the Chow-Lin approach, we test some covariates representing the territorial demand for infrastructure, related to productive, demographic and tourism aspects. Comparing the results obtained with this approach against the “real” data (available at the provincial level) allows us to verify the coherence between demand factors and the actual distribution of infrastructure in Italy.
Introduction
This paper aims to verify whether, and to what extent, spatial disaggregation methods derived from the Chow-Lin approach allow the infrastructural index at the disaggregate level to be reconstructed with reasonable precision from the corresponding indicator at the higher territorial level. This essentially means verifying whether the infrastructural levels of territorial units (the Italian provinces) may be “explained” by what should be their natural generating factors on the “demand” side, in particular factors of a demographic and economic nature.
More specifically, having obtained an estimate of the provincial infrastructure indicators with the aforementioned methods, these indicators are compared with the “real” ones, available from the statistics provided by ISTAT (2006). If this comparison shows a good approximation between the “real” data and the estimated ones, it may be deduced that the provincial infrastructural endowment conforms to the corresponding demographic and economic demand factors; the opposite conclusion is drawn in case of a mismatch between the two data series.
* Francesco Vidoli, “Roma Tre” University, Department of Public Institutions, Economy and Society
** Claudio Mazziotta, “Roma Tre” University, Department of Public Institutions, Economy and Society
THE METHODOLOGICAL APPROACH
In order to derive indicators at the disaggregate level from those at the aggregate level¹, we propose a model based on the approach of Chow and Lin (1971), which is founded on three fundamental hypotheses:
• Structural similarity: the aggregate model and the disaggregate model are structurally similar. This implies that the relationships between the variables observed at the aggregate level are the same as those at the disaggregate level, that is, the regression parameters are the same in both models.
• Error similarity: the spatially correlated errors have the same structure at both the aggregate and the disaggregate level; that is, the spatial correlations are not significantly different.
• Reliable indicators: the interpolating variables have sufficiently large predictive power at both the aggregate and the disaggregate level, that is, the R² (or the F test) of the regression model differs significantly from zero.
Note that violating hypothesis 1 leads to systematically biased estimates; violating hypothesis 2 involves spillover effects that largely contribute to the estimates; violating hypothesis 3 implies that the disaggregate estimates merely mirror a simple proportion of the aggregate ones.
Such models have mainly been used to construct monthly or quarterly series from annual series but, in recent years, they have also been used in some spatial applications. For Italy, it is worth mentioning the work of Bollino and Polinori (2007), who reconstruct Value Added at the municipal level in Umbria from the point of view of the convergence between suburban townships and urban municipalities that benefit from higher growth, explained by contiguity factors and agglomeration mechanisms.
The model is characterized both by an econometric relationship between the composite indicator
at the provincial level and a series of explanatory variables observable at the disaggregate level
(and also obviously at the aggregate one), and by a methodology of inferring unknown parameters.
Mazziotta and Vidoli (2009b) tested a first application of the model to infrastructural data.
The model is based on the assumption that at the disaggregate level a linear econometric relationship
is valid:
$$ y_d = X_d \beta_d + \varepsilon_d \qquad (1) $$
where $y_d$ is an (n × 1) vector of observations of the composite indicator at the disaggregate level, $X_d$ is an (n × k) matrix of observations of k explanatory variables observable at the disaggregate level, and n is the number of provinces.
1 For the sake of clarity, the term “aggregate indicators” refers to an aggregate territorial measure (for example the average value at regional level) of a disaggregate indicator (at provincial level). “Aggregate” should not be confused with “composite”, which describes a summary measure of simple indicators all at the same territorial level. Formally, the relationship (for example, through the mean operator) between the aggregate indicator and the disaggregate ones can be described as:
$$ I_a = \frac{\sum_{i=1}^{n} I_{d_i}}{n}, \quad \forall i \in I(a) $$
where I(a) represents the territorial area (region) in which the individual units (provinces) are included.
It is assumed that C is a matrix of dimension (n × N), where n is the number of Italian provinces, capable of transforming the disaggregate observations into aggregate ones; such a transformation may be obtained through any aggregation operator.
In particular, if the sum operator is chosen, regional estimates are obtained as the sum of the corresponding provincial values ($y_a = \sum y_d$) and the generic element $C_{i,j}$ is constructed as:
$$ C_{i,j} = \begin{cases} 1, & \text{if province } i \in \text{region } j \\ 0, & \text{elsewhere} \end{cases} $$
If the arithmetic mean operator is chosen instead, C is built as:
$$ C_{i,j} = \begin{cases} \frac{1}{k}, & \text{if province } i \in \text{region } j \\ 0, & \text{elsewhere} \end{cases} $$
where k is the number of provinces belonging to region j, and regional estimates are reconstructed as the average of the provincial estimates² ($y_a = E(y_d)$).
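The construction of C under the two operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five provinces, their assignment to two regions and the values in `y_d` are hypothetical, and C is stored with one row per region and one column per province so that the product $C y_d$ is conformable.

```python
def aggregation_matrix(region_of, n_regions, operator="sum"):
    """Build C (assumed layout: one row per region, one column per province)."""
    counts = [0] * n_regions
    for r in region_of:
        counts[r] += 1
    C = [[0.0] * len(region_of) for _ in range(n_regions)]
    for province, r in enumerate(region_of):
        # 1 for the sum operator, 1/k for the arithmetic mean operator
        C[r][province] = 1.0 if operator == "sum" else 1.0 / counts[r]
    return C


def aggregate(C, y_d):
    """Compute y_a = C y_d."""
    return [sum(c * y for c, y in zip(row, y_d)) for row in C]


# Hypothetical data: provinces 0-2 belong to region 0, provinces 3-4 to region 1
region_of = [0, 0, 0, 1, 1]
y_d = [2.0, 4.0, 6.0, 1.0, 3.0]

C_sum = aggregation_matrix(region_of, 2, "sum")
C_mean = aggregation_matrix(region_of, 2, "mean")

y_a_sum = aggregate(C_sum, y_d)    # regional totals
y_a_mean = aggregate(C_mean, y_d)  # regional averages
```

With the sum operator each regional value is the total of its provinces; with the mean operator it is their average, so the two results differ by the factor k of each region.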
Therefore, assuming the hypothesis of structural similarity ($\beta_d = \hat{\beta}_a$), it is possible to write:
$$ y_a = X_a \beta_d + \varepsilon_a \qquad (2) $$
under the aggregation constraints $y_a = C y_d$, $X_a = C X_d$ and $\varepsilon_a = C \varepsilon_d$.
Recently, Polasek and Sellner (2008) presented an interesting generalization of the model, introducing a spatial autocorrelation term³ into the classical multivariate regression (equation (2)). From an application point of view, this means that the value of the dependent variable in a specific area depends not only on its own independent variables, but also on the level of the variable in the neighbouring areas.
Indeed, assuming that spatial correlation effects exist among competing provinces, and especially among very similar provinces, then (see, for example, Anselin, 1988), given a matrix of spatial weights $W_N$ and a spatial lag parameter $\rho \in [0,1]$, it is possible to hypothesize at the disaggregate level a “mixed regressive spatial autoregressive” relationship:
$$ y_d = \rho_d W_N y_d + X_d \beta_d + \varepsilon_d, \qquad \varepsilon_d \sim N[0, \sigma_d^2 I_N] \qquad (3) $$
The reduced form of equation (3) makes it easier to see the spatial component through which the contribution of $X_d$ is filtered:
$$ y_d = (I - \rho_d W_N)^{-1} X_d \beta_d + (I - \rho_d W_N)^{-1} \varepsilon_d \qquad (4) $$
2 In this specific case, the arithmetic average is used.
3 For a preliminary introduction of statistical applications in the field of urban planning, please see: Bickman et al. (1998) and for
more recent approaches Ayuga-Téllez et al. (2011).
More specifically, this spatial filter acts in proportion to distance: if the expression $(I - \rho_d W_N)^{-1}$ is expanded as a series (similarly to a Leontief inverse matrix), the following is obtained:
$$ E(y_d \mid X_d) = (I + \rho_d W_N + \rho_d^2 W_N^2 + \dots) X_d \beta_d \qquad (5) $$
Equation (5) shows more clearly how all the contiguous areas are involved in the estimate of $y_d$, each through a coefficient that decreases with distance (distance decay).
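The series expansion and the resulting distance decay can be checked numerically. The sketch below is only illustrative: the three areas arranged on a line, the row-standardised weights in `W` and the value `rho = 0.37` are assumptions chosen for the example. It sums the first terms of the series and verifies that multiplying by $(I - \rho W)$ returns approximately the identity.

```python
def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_sum(W, rho, terms):
    """Return I + rho*W + rho^2*W^2 + ... truncated after `terms` terms."""
    n = len(W)
    total = identity(n)
    power = identity(n)
    for _ in range(1, terms):
        # next term of the series: rho^k * W^k
        power = [[rho * v for v in row] for row in matmul(power, W)]
        total = [[t + p for t, p in zip(tr, pr)]
                 for tr, pr in zip(total, power)]
    return total


# Hypothetical areas on a line: 0 - 1 - 2 (areas 0 and 2 are not neighbours)
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
rho = 0.37

S = neumann_sum(W, rho, terms=60)
R = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(3)]
     for i in range(3)]

# If the series has converged, R * S is (numerically) the identity matrix
check = matmul(R, S)
max_error = max(abs(check[i][j] - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))

direct_neighbour_weight = S[0][1]    # area 1 is contiguous to area 0
second_order_weight = S[0][2]        # area 2 reaches area 0 only via area 1
```

In this toy configuration the weight attached to the second-order neighbour is positive but smaller than the weight of the direct neighbour, which is the distance-decay behaviour described above.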
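It is therefore possible to rewrite the reduced form of equation (4) with $R_N = (I - \rho_d W_N)$:
$$ y_d = R_N^{-1} X_d \beta_d + R_N^{-1} \varepsilon_d, \qquad R_N^{-1} \varepsilon_d \sim N[0, \Sigma_d] \qquad (6) $$
with the covariance matrix $\Sigma_d$ equal to:
$$ \Sigma_d = \sigma_d^2 (R_N' R_N)^{-1} \qquad (7) $$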
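The form of the covariance matrix in equation (7) follows in one step from the error assumption of equation (3). A short derivation, using only the symbols already defined and assuming $R_N$ is invertible:

```latex
\begin{aligned}
\operatorname{Cov}\!\left(R_N^{-1}\varepsilon_d\right)
  &= R_N^{-1}\, E\!\left[\varepsilon_d \varepsilon_d'\right] \left(R_N^{-1}\right)' \\
  &= \sigma_d^2\, R_N^{-1} \left(R_N'\right)^{-1} \\
  &= \sigma_d^2 \left(R_N' R_N\right)^{-1}
\end{aligned}
```

The last line uses $(R_N' R_N)^{-1} = R_N^{-1} (R_N')^{-1}$.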
The unknown terms of the model at the disaggregate level are therefore $\rho_d$, $\beta_d$ and the variance $\sigma_d^2$. To estimate these unknowns it is possible, under the basic hypotheses, to exploit the relationship between y and X at the aggregate level and to estimate the mixed autoregressive model (see, for example, LeSage, 1998) at the aggregate level, in the form:
$$ y_a = \rho_a W_N y_a + C X_d \beta_a + \varepsilon_a, \qquad \varepsilon_a \sim N[0, \sigma_a^2 I_N] \qquad (8) $$
obtaining $\hat{\rho}_a$ and $\hat{\sigma}_a^2$.
By the structural similarity ($\rho_d = \hat{\rho}_a$, $\beta_d = \hat{\beta}_a$) and error similarity ($\sigma_d^2 = \hat{\sigma}_a^2$) hypotheses, the estimated parameters can be substituted into equations (3) and (6).
As for the estimate of $\beta_a$, following the classic Chow-Lin method, the following is obtained⁴:
$$ \hat{\beta}_{a,GLS} = \left( X_a' (C \hat{\Sigma}_d C')^{-1} X_a \right)^{-1} X_a' (C \hat{\Sigma}_d C')^{-1} y_a \qquad (9) $$
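and the estimate of $y_d$ at the disaggregate level can be constructed as:
$$ \hat{y}_d = \underbrace{R_N^{-1} X_d \hat{\beta}_a}_{\text{1st term}} + \underbrace{\hat{\Sigma}_d C' (C \hat{\Sigma}_d C')^{-1} (y_a - C \hat{R}_N^{-1} C' X_a \hat{\beta}_a)}_{\text{2nd term}} \qquad (10) $$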
The first term of equation (10) therefore represents the naïve estimate of the unknown vector $y_d$, while the second term distributes the estimation error observed at the aggregate level across the disaggregate units through the so-called “gain projection matrix” G (Goldberger, 1962):
$$ G = \hat{\Sigma}_d C' (C \hat{\Sigma}_d C')^{-1} \qquad (11) $$
4 Please note that $\hat{\beta}_{a,GLS}$ does not depend on $\hat{\sigma}_a^2$, but rather on $\hat{\rho}_a$.
This quantity depends crucially on the spatial lag parameter $\hat{\rho}_a$ at the aggregate level. Note that if $\hat{\rho}_a = 0$, the matrix $\hat{\Sigma}_d$ is proportional to the identity matrix and G reduces to the transposed projection matrix $G = C' (CC')^{-1}$, as in the base model of equation (1). The parameter $\rho_d$ and the matrix $W_N$ therefore prevent the residual at the aggregate level from being assigned in equal 1/N parts to all the disaggregate units; instead, it is filtered through the spatial weights matrix.
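The non-spatial special case can be illustrated with a small Python sketch. Everything numerical here is hypothetical (five provinces in two regions, the naïve estimates and the regional totals), and the sum operator is assumed, with C holding one row per region. Under these assumptions $CC'$ is diagonal with the number of provinces per region, so $G = C'(CC')^{-1}$ simply gives each province an equal share of its region's residual.

```python
def equal_share_gain(region_of, n_regions):
    """G = C'(CC')^{-1} for the sum operator (the rho = 0 case).

    CC' is diagonal with the province counts k_j, so
    G[i][j] = 1/k_j if province i lies in region j, and 0 otherwise.
    """
    counts = [0] * n_regions
    for r in region_of:
        counts[r] += 1
    return [[1.0 / counts[r] if r == j else 0.0 for j in range(n_regions)]
            for r in region_of]


# Hypothetical example: provinces 0-2 in region 0, provinces 3-4 in region 1
region_of = [0, 0, 0, 1, 1]
naive = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical first-term estimates
y_a = [9.0, 8.0]                    # hypothetical observed regional totals

G = equal_share_gain(region_of, 2)

# Residual at the aggregate level: observed totals minus aggregated naive values
aggregated_naive = [sum(v for v, r in zip(naive, region_of) if r == j)
                    for j in range(2)]
residual = [obs - agg for obs, agg in zip(y_a, aggregated_naive)]

# Corrected disaggregate estimates: naive value plus the share of the residual
y_hat = [v + sum(G[i][j] * residual[j] for j in range(2))
         for i, v in enumerate(naive)]

# After the correction, the provincial estimates add up to the regional totals
reaggregated = [sum(v for v, r in zip(y_hat, region_of) if r == j)
                for j in range(2)]
```

With a non-zero spatial lag parameter the shares would no longer be equal within a region, since the residual would be spread according to the estimated covariance matrix rather than uniformly.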
An application to the infrastructure indicators
The application⁵ of the model illustrated above requires the following information: i) synthetic infrastructural indicators at the regional level (and at the provincial level, for the subsequent verification of the model’s accuracy); ii) demographic and economic variables correlated with infrastructural needs, at the provincial and regional level. The former are derived by applying a particular method of synthesizing elementary indicators (source: ISTAT), used by the authors in a previous work (Mazziotta and Vidoli, 2009a)⁶. The latter are reported in Table 1.
Table 1 Generation factors of infrastructure endowment (sources used)
| Variable | Source | Year |
| --- | --- | --- |
| Gross domestic product | Ist. Tagliacarne | 2007 |
| Share of population residing in municipalities with more than 50 thousand inhabitants | ISTAT | 2007 |
| Accommodation capacity of low to mid range hotels per inhabitant | ISTAT | 2006 |
On the basis of these data, we estimate a spatial simultaneous autoregressive lag (mixed) model at the aggregate (regional) level, $y_a = \rho_a W_N y_a + C X_d \beta_a + \varepsilon_a$, seeking a good level of efficiency, in line with hypothesis 3 on reliable indicators.
The variables in the model were selected following two criteria: first, an economic criterion, choosing dimensions logically related to the dependent variable; second, a statistical criterion, choosing a model that is statistically significant and has good predictive capability.
5 The analysis was developed in R (package spdep). The R code, written by the authors, is available on request.
6 ISTAT indicators refer to the publication “Atlante statistico territoriale delle infrastrutture”, ISTAT 2008, available at http://www3.istat.it/dati/catalogo/20080805_01/.
The logic that links territorial development and infrastructure indicators lies in the idea that, from the viewpoint of regional analysis (see, in particular, the regional development potential approach of Biehl, 1994), differences in starting conditions weigh heavily on the growth opportunities of a specific area. Among these differences, infrastructural ones are of great importance because infrastructure is a (relatively) immovable factor that affects the efficiency of production processes. A better provision of public capital increases productivity and lowers the acquisition costs of private production factors (i.e., private capital and skilled labour), making them more profitable and thus increasing the probability of attracting them to, or keeping them in, a given region.
Table 2 Estimated model results⁷ at the regional level* (ρ = 0.37, R² = 0.43, AIC = -26.497)
| Variable | Estimate | Std. Error⁸ | z value | Pr(>|z|) |
| --- | --- | --- | --- | --- |
| Gross domestic product | 7.893E-06 | 0.000 | 2.1024 | 0.0355 |
| Share of the population residing in municipalities with more than 50 thousand inhabitants | 3.916E-01 | 0.181 | 2.1587 | 0.0309 |
| Accommodation capacity of low to mid range hotels per inhabitant | -1.182E-02 | 0.005 | -2.2568 | 0.0240 |
* See equation (8)
The estimated regression is satisfactory both for the meaning of the variables included as regressors (proxies of economic development, of demographic density and of the supply of qualified tourism) and for its statistical properties: Table 2 shows that the coefficients of the independent variables are all significant, that ρ equals 0.37 (p-value 0.043) and that the goodness of fit measured by R² equals 0.43. Once $\hat{\beta}_a$ and $\hat{\rho}_a$ were obtained, the hypotheses of structural similarity and error similarity allowed these parameters to be substituted into equation (10) to obtain the estimated infrastructural indicator at the disaggregate level.
The results show a considerable gap with respect to the “real” data available at the provincial level; this emerges both from the graphical examination of the distributions (Figure 1) and from the application of a specific indicator of spatial robustness (called IRS, see Table 3). The latter is applied to the ranks (here, the positions held by individual provinces on the basis of the infrastructural indicators, both the “real” ones and those resulting from the model), varies from 0 to 1 and is built to highlight not only the average rank differences but also between which territorially identified units these differences arose. For greater detail we refer to a previous work (Mazziotta and Vidoli, 2009b); here it suffices to note that the indicator results from the product of two matrices: the first (contiguity matrix, W) identifies the territorial contiguity among the units (the provinces, in our case); the second (transition matrix, T) records the rank differences and between which units each difference occurred. Multiplying the (I-W) matrix by T yields an index that, comparing the ranks held by each province in the two situations considered (the “real” infrastructural indicator and the one calculated with the model), highlights the rank changes that involved only spatially non-contiguous units.
The indicator of spatial robustness (IRS) between two ranking distributions (see the Appendix) may be expressed in algebraic form as follows:
$$ IRS_{R_0,R_1} = \frac{\sum_{i,j} T_{i,j} (1 - w_{i,j})}{MaxI} $$
It should be noted that the maximum of the proposed index (MaxI) corresponds to the worst situation in terms of conformity between the two rankings, that is, the one in which the unit i that was in first position in the R0 ranking is in last position in the R1 ranking and so on, for as many non-contiguous units, (n-1) * (n-2) * …, or as many times as there is a value greater than zero in the matrix T(I-W).
7 Please note that the R² value for a spatial lag regression (or a spatial error regression) is not defined in the same way as for ordinary least squares. We also report the AIC value, which we use to compare different models (please note that AIC can be negative and that the most negative value is the best).
8 Numerical Hessian approximate standard errors.
In Table 3 and Figure 1, there are many marked differences between the two rankings, given that the
IRS is equal to 0.38 with an average change in the ranks among different areas (in this specific case,
regions) that is particularly high (25).
Table 3 Index of spatial robustness (IRS) of “real” indicators vs the results from the model
| IRS | N° of extra-area ranking shifts | Extra-area mean ranking difference |
| --- | --- | --- |
| 0.384 | 75 | 25 |
Figure 1 Ranking of Italian provinces, based on “real” indicators (on the left) and “calculated” indicators (on the right).
CONCLUSIONS
The objective of the work was to recreate the infrastructural indicators (of land transportation, in this
specific case), at a level of high territorial detail (provincial) counting on the availability of the same
indicators at a higher territorial level (regions) and proceeding with their disaggregation through
the application of a model derived from the Chow-Lin approach, applied according to the version
supplemented by Polasek and Sellner.
Given the variables used in applying the model at the provincial level, it can be argued that the results identify the levels of infrastructure that the demand factors in each province require.
Indeed, using socio-economic variables as regressors (infrastructure demand factors) in such models gives the comparison of the two rankings, the one built on “real” provincial infrastructural data and the one “recreated” with the model, the meaning of a comparison between the supply of and the demand for infrastructure at the territorially disaggregate level.
Obviously, this interpretation is better founded the higher the statistical fit of the model. In our application, the goodness-of-fit measures used indicate that the model results can be considered statistically satisfactory, given the cross-section application and the territorial units. A better (or larger) selection of variables would be an important improvement in the model application.
At present, in any case, a considerable gap between the estimated levels and those assumed to be “real” may be interpreted, even conservatively, as confirmation of a mismatch between the demand for and the supply of infrastructure expressed by the territory. This seems, in the final analysis, to be the result that is statistically most evident and economically most interesting: the territorial distribution of the transport infrastructural endowment is not in line with the “theoretical” generation factors present in the Italian provinces.
References
Anselin L. (1988) Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Dordrecht.
Ayuga-Téllez E., Contato-Carol M., González C., Grande-Ortiz M., Velázquez J. (2011) Applying Multivariate Data Analysis as Objective Method for Calculating the Location Index for Use in Urban Tree Appraisal, J. Urban Plann. Dev., 137(3): 230-237.
Bickman L., Rog D. J. (eds.) (1998) Handbook of Applied Social Research Methods, Sage Publications.
Biehl D. (1994) The role of infrastructure in regional policy, OECD, Working Party No. 6, Regional Development Policies, Paris.
Bollino C. A., Polinori P. (2007) Ricostruzione del valore aggiunto su scala comunale e percorsi di crescita a livello micro-territoriale: il caso dell’Umbria, Rivista di Scienze regionali, fascicolo 2.
Chow G. C., Lin A. (1971) Best linear unbiased interpolation, distribution, and extrapolation of time series by related series, The Rev. of Economics and Statistics, 53(4): 372-375.
Goldberger A. S. (1962) Best linear unbiased prediction in the generalized linear regression model, American Statistical Association J., 57: 369-375.
ISTAT (2006) Le infrastrutture in Italia. Un’analisi provinciale della dotazione e della funzionalità, Roma.
LeSage J. P. (1998) Spatial econometrics, Technical report, University of Toledo.
Mazziotta C., Vidoli F. (2009a) La costruzione di un indicatore sintetico ponderato. Un’applicazione della procedura Benefit of Doubt al caso della dotazione infrastrutturale in Italia, Italian J. of Regional Science, vol. 8, n° 1, Franco Angeli.
Mazziotta C., Vidoli F. (2009b) Robustezza e stabilità spaziale di indicatori di dotazione infrastrutturale: una verifica per le province italiane, XXX Conferenza Italiana di Scienze Regionali, Firenze.
Polasek W., Sellner R. (2008) Spatial Chow-Lin methods: Bayesian and ML forecast comparisons, Rimini Centre for Economic Analysis (RCEA), working paper 38-08.
Appendix
Spatial Robustness Indicator (IRS)
The main purpose of the Spatial Robustness Indicator (IRS) is to analyze the stability of the results obtained when comparing alternative rankings of territorial units, that is, to measure how far rankings persist within wider geographical areas.
To clarify the approach, consider an example of four units belonging to two different geographical areas, with a composite indicator whose ranking is R0:
R0 =
| Unit | Rank | Area |
| --- | --- | --- |
| A | 1 | Area1 |
| B | 2 | Area1 |
| C | 3 | Area2 |
| D | 4 | Area2 |
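After some changes in the key assumptions used to construct the indicator, or when comparing methods, two cases may arise, R1 and R2:
R1 =
| Unit | Rank | Area |
| --- | --- | --- |
| A | 2 | Area1 |
| B | 1 | Area1 |
| C | 4 | Area2 |
| D | 3 | Area2 |
R2 =
| Unit | Rank | Area |
| --- | --- | --- |
| A | 4 | Area1 |
| B | 2 | Area1 |
| C | 3 | Area2 |
| D | 1 | Area2 |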
Both situations have equal mean rank differences, but ranking R1 is preferable to R2 because it is more spatially stable (in R1 permutations occur only within the areas, while in R2 they take place between different areas).
A robustness indicator applied to spatial ranks should, therefore, highlight not only the mean rank differences, but also where, or rather between which territorially identified units, these differences have arisen.
To achieve this, keeping the proposed example in mind, two matrices may be introduced into the analysis. The first (W) identifies which units belong to the same wider geographical area (for example, provinces within regions), or identifies the territorial contiguity of the units (contiguity matrix).
W =
|   | A | B | C | D |
| --- | --- | --- | --- | --- |
| A | - | 1 | 0 | 0 |
| B | 1 | - | 0 | 0 |
| C | 0 | 0 | - | 1 |
| D | 0 | 0 | 1 | - |
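The second matrix, defined as the “transition” matrix, records the differences in the ranks and between which units each difference occurred. The transition matrices between R0 and R1 and between R0 and R2, respectively, are therefore:
T_{R0,R1} =
|   | A | B | C | D |
| --- | --- | --- | --- | --- |
| A | 0 | 1 | 0 | 0 |
| B | 1 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 1 |
| D | 0 | 0 | 1 | 0 |
T_{R0,R2} =
|   | A | B | C | D |
| --- | --- | --- | --- | --- |
| A | 0 | 0 | 0 | 3 |
| B | 0 | 1 | 0 | 0 |
| C | 0 | 0 | 1 | 0 |
| D | 3 | 0 | 0 | 0 |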
Reading these matrices is straightforward. The number in each cell gives the size of the displacement between one territorial unit and another as a result of changes to the indicators: the number 3 in the second matrix, for example, shows that territorial unit A lost 3 places in the transition from R0 to R2, moving from rank 1 to rank 4.
Furthermore, the position of the number 3 at the intersection of row A and column D means that the rank displacement involved these two territorial units: A lost 3 positions in favour of D, which in turn gained the same 3 positions to the detriment of A.
Multiplying the matrix (I-W) by T we obtain an indicator that shows the rank changes for units not belonging to the same geographical area (or involving spatially non-contiguous units, depending on the type of W matrix used).
The spatial robustness indicator (IRS), in algebraic form, can therefore be written as:
$$ IRS_{R_0,R_1} = \frac{\sum_{i,j} T_{i,j} (1 - w_{i,j})}{maxI} $$
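The numerator of the indicator can be computed directly from the example matrices. The following Python sketch is illustrative only: it uses the W and T matrices of the four-unit example, it sets the diagonal of W (shown as "-" in the text) to 1 on the assumption that a unit belongs to its own area, and it leaves out the normalisation by maxI.

```python
UNITS = ["A", "B", "C", "D"]

# Contiguity / same-area matrix from the example.
# Assumption: the diagonal, printed as "-" in the text, is set to 1.
W = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]

# Transition matrices from the example (R0 -> R1 and R0 -> R2)
T_R0_R1 = [[0, 1, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]
T_R0_R2 = [[0, 0, 0, 3],
           [0, 1, 0, 0],
           [0, 0, 1, 0],
           [3, 0, 0, 0]]


def irs_numerator(T, W):
    """Sum of T[i][j] * (1 - w[i][j]): rank shifts between different areas."""
    n = len(T)
    return sum(T[i][j] * (1 - W[i][j]) for i in range(n) for j in range(n))


within_area_case = irs_numerator(T_R0_R1, W)    # 0: all shifts stay inside areas
between_area_case = irs_numerator(T_R0_R2, W)   # 6: A and D swap across areas
```

Under these assumptions the numerator is zero for R1, where every rank change stays inside an area, and positive for R2, where the exchange between A and D crosses areas.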
As regards the estimation of the maximum level of the indicator (MaxI), we must consider the worst condition, namely the one in which the unit i that was in first place moves to the last one and so on, as many times as there are non-contiguous units, (n-1)*(n-2)*..., or as many times as there is a value greater than zero in the matrix T(I-W).
In the formulation of the matrix W proposed in this paper, the distances between different territorial units are unweighted, regardless of how large the distance between them is.
This limitation can be removed by computing a symmetric matrix P of distances between territorial units (e.g. kilometres between centroids), so that non-adjacent units receive a greater weight. P, for example, may be:
P =
|   | A | B | C | D |
| --- | --- | --- | --- | --- |
| A | 0 | 40 | 60 | 80 |
| B | 40 | 0 | 30 | 45 |
| C | 60 | 30 | 0 | 70 |
| D | 80 | 45 | 70 | 0 |
With this further specification the spatial robustness indicator can be written as:
$$ IRS^{P}_{R_0,R_1} = \frac{\sum_{i,j} T_{i,j} (1 - w_{i,j}) \, p_{i,j}}{maxI^{P}} $$
where $maxI^{P}$ is equal to (n-1) * (n-2) * ... as many times as there is a value greater than zero in the matrix T(I-W)P.
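As a worked illustration, the distance-weighted numerator can be evaluated for the transition from R0 to R2 using the example matrices W, T and P given in this Appendix. The diagonal terms vanish because $p_{i,i} = 0$, so only the exchange between A and D contributes; the normalisation by $maxI^{P}$ is left out here.

```latex
\begin{aligned}
\sum_{i,j} T_{i,j}\,(1 - w_{i,j})\,p_{i,j}
  &= T_{A,D}\,(1 - w_{A,D})\,p_{A,D} + T_{D,A}\,(1 - w_{D,A})\,p_{D,A} \\
  &= 3 \cdot (1 - 0) \cdot 80 + 3 \cdot (1 - 0) \cdot 80 \\
  &= 480
\end{aligned}
```

For the transition from R0 to R1 the same sum is zero, since every non-zero entry of the transition matrix falls on a pair of units with $w_{i,j} = 1$.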