Can we increase neighborhood disorder estimation accuracy by
incorporating spatially located covariates in kriging models?
Stephen J Mooney, Michael DM Bader, Gina S Lovasi, Kathryn M Neckerman, Andrew G Rundle, Julien O
Teitler
Population Association of America
Brief Abstract (150 word limit)
Ordinary kriging, a geostatistical technique that leverages spatial correlation between observed sample points to
estimate at unobserved locations, has been used in the social sciences to estimate contextual measures such as
neighborhood physical disorder. Universal kriging extends ordinary kriging by supplementing the spatial model
with additional covariates measured at the estimate location.
We will use an existing measure of neighborhood physical disorder collected using Google Street View imagery
from 1,826 sampled block faces across 4 US cities (New York, Philadelphia, Detroit, and San Jose) to test whether
universal kriging can improve disorder estimation. The first analysis will use all measured points within each city to
compare accuracy between kriging methods. The second will use random subsamples of the virtual audited data to
explore the relationship of sampling density to estimation accuracy. Preliminary results using convenient but not
theorized covariates in Philadelphia achieved a minor (-1 to 7%) decrease in error.
Extended Abstract (2-4 pages)
Introduction
In recent years, social science and public health researchers have been increasingly interested in
neighborhood contextual factors such as physical disorder or pedestrian infrastructure as drivers of a
diverse set of health outcomes (Entwisle 2007). Much of this research has characterized neighborhoods
using systematic social observations, wherein trained neighborhood auditors rate street segments on a
number of characteristics (Reiss 1971). Such audits have been performed in person (Keyes, McLaughlin
et al. 2012), using filmed street images (Sampson and Raudenbush 1999), or using Google Street View
imagery (Badland, Opit et al. 2010, Rundle, Bader et al. 2011, Wilson, Kelly et al. 2012).
However, because auditing every block in a city is usually prohibitively expensive and time-consuming,
many audits have selected a spatial sample of blocks to audit and used geostatistical tools such as
ordinary kriging (Cressie 1988) and land use regression (Shmool, Kubzansky et al. 2014) to estimate at
unobserved locations. This approach can substantially reduce audit costs for neighborhood factors
whose spatial characteristics are amenable to such techniques while allowing researchers to define
neighborhood measures at theoretically relevant scales (Bader and Ailshire 2014).
However, because ordinary kriging relies only on spatial covariance to construct estimates, ordinary
kriged estimates include error due to differences in small-scale street characteristics such as the
presence of retail, due to jurisdictional boundaries, or due to any other causes of differences in the
neighborhood construct being measured that cannot be
predicted from the spatial covariance structure alone. For
example, because pedestrians create litter, places with higher
population densities and consequently more pedestrian traffic
may have more litter on average, independent of other
sources of disorder. However, ordinary kriging cannot
incorporate population density into its disorder estimates. By
contrast, land use regression can incorporate such covariates,
but does not incorporate distances between sample points or
spatial covariance.
Figure 1. A map of neighborhood disorder in Philadelphia, PA estimated using ordinary kriging. Green
indicates more disorder.
Universal kriging, however, incorporates both spatial correlation and covariates such as land use in
prediction models. For example, a universal kriging model could be fitted to spatial data wherein each
point in the sample includes a location, a level of physical disorder, and the population
density in the census tract where the sample was taken. Estimates at unobserved locations would then
be made by first estimating the level of disorder as in ordinary kriging, then ‘correcting’ for the
population density by adding the expected value of the ordinary kriged residual for that population
density. In principle, this approach should improve estimation accuracy if the covariates incorporate
additional information about the level of disorder (Hengl, Heuvelink et al. 2003). However, to the best of
our knowledge, universal kriging has not been explored to improve accuracy of neighborhood disorder
estimates.
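One common way to write the model described above is the regression-kriging form compared with kriging with external drift in Hengl, Heuvelink et al. (2003). The notation below is introduced here for illustration only: Z(s) is disorder at location s, x(s) is a covariate vector whose first element is 1 (for example, 1 and tract population density), beta is the vector of trend coefficients, and lambda_i are kriging weights derived from the spatial covariance of the residuals.

```latex
Z(s) = \mathbf{x}(s)^{\top}\boldsymbol{\beta} + \varepsilon(s),
\qquad
\hat{Z}(s_0) = \mathbf{x}(s_0)^{\top}\hat{\boldsymbol{\beta}}
  + \sum_{i=1}^{n} \lambda_i \left[ Z(s_i) - \mathbf{x}(s_i)^{\top}\hat{\boldsymbol{\beta}} \right]
```

Under this notation, ordinary kriging is the special case in which x(s) contains only the constant 1, so the trend reduces to an unknown constant mean.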
Methods
Disorder Data
This analysis will use a measure of neighborhood disorder estimated from 9 virtual audit items collected
from a structured spatial sample on 1,826 block faces in four US cities: New York (N=532), Philadelphia
(N=503), Detroit (N=502) and San Jose (N=289) using the Computer-Assisted Neighborhood Visual
Assessment System (CANVAS) and imagery from Google Street View (Mooney, Bader et al. 2014, Bader,
Mooney et al. 2015). The construction and validation of the measure has been described elsewhere
(Mooney, Bader et al. 2014).
Covariate Data
We will use spatial data analysis R packages, including ‘rgdal’ and ‘rgeos’, to incorporate covariate
information from additional datasets, a ‘high-variety Big Data’ approach (Mooney, Westreich et al. 2015).
Specifically, we will merge data from the US Census to identify population density in the census tract
surrounding each sampled location (United States Census Bureau 2013). Particularly in cities,
population density is a driver of high pedestrian volume, which contributes to minor indicators of
disorder such as litter. We will also identify street functional class from TIGER 2015 data released by the
US Census Bureau. We anticipate more indicators of disorder to be present on larger streets owing to
the presence of retail and higher pedestrian volumes on such streets. Finally, within New York City, we will
identify the community district for each location. Within New York, Community Districts are the smallest
unit of municipal government and advocate for a neighborhood’s priorities, potentially including
reducing physical disorder.
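The merge step amounts to a point-in-polygon join: each sampled location inherits the attributes of the polygon that contains it. The proposed analysis would do this with R spatial packages. The Python sketch below only illustrates the logic, and its tract shapes, tract names, and density values are made up for the example.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside poly, a list of (x, y) vertices."""
    x, y = pt
    inside = False
    # Walk each edge, pairing every vertex with the next one (wrapping around).
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical census tracts with invented population densities.
tracts = {
    "tract_A": {"polygon": [(0, 0), (4, 0), (4, 4), (0, 4)], "pop_density": 12000.0},
    "tract_B": {"polygon": [(4, 0), (8, 0), (8, 4), (4, 4)], "pop_density": 3500.0},
}

def tract_density(pt):
    """Return the density of the first tract containing pt, or None if no tract does."""
    for tract in tracts.values():
        if point_in_polygon(pt, tract["polygon"]):
            return tract["pop_density"]
    return None

# Hypothetical sampled block-face locations; the last one falls outside both tracts.
samples = [(1.0, 2.0), (6.5, 1.0), (9.0, 9.0)]
densities = [tract_density(p) for p in samples]
```

Locations that fall outside every polygon receive None, so they can be inspected rather than silently assigned a value.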
Universal Kriging vs. Ordinary Kriging
Next, to assess the incremental accuracy gained by incorporating covariates into the kriging model, we will use
the R package ‘gstat’ to krige the disorder measure four ways for each city: (1) using ordinary kriging, (2)
using universal kriging incorporating population density as a predictor, (3) using universal kriging
incorporating street functional class code, and (4) using universal kriging incorporating community
district (for New York City only). For each kriging method in each city, we will use leave-one-out (jackknife)
cross-validation to estimate the root mean squared error (RMSE) from the model (Bivand, Pebesma et
al. 2008). By comparing the RMSE of the universal kriging approach to the RMSE of the ordinary kriging
approach we can estimate the prediction accuracy benefit (or harm) from incorporating additional
covariates into the kriging model. Because each city’s kriging model will be estimated independently,
consistent findings across all cities may provide evidence of generalizability to other contexts.
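The comparison described above can be sketched in a few lines. The example below is a minimal pure-Python illustration, not the gstat workflow: it assumes a fixed exponential covariance (sill and range chosen arbitrarily, no fitted variogram) and uses synthetic data in which a hypothetical covariate called density shifts the outcome. Ordinary and universal kriging differ only in the trend columns passed to the same kriging system.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def cov(p, q, sill=1.0, corr_range=3.0):
    """Exponential covariance; sill and range are assumed, not fitted."""
    return sill * math.exp(-math.dist(p, q) / corr_range)

def krige(coords, z, trend, s0, x0):
    """Predict at s0. Rows of `trend` are [1] for ordinary kriging or
    [1, covariate] for universal kriging; x0 is the matching row at s0."""
    n, k = len(z), len(x0)
    a = [[cov(coords[i], coords[j]) for j in range(n)] + trend[i] for i in range(n)]
    a += [[trend[i][j] for i in range(n)] + [0.0] * k for j in range(k)]
    b = [cov(c, s0) for c in coords] + list(x0)
    weights = solve(a, b)[:n]  # drop the Lagrange multipliers
    return sum(w * zi for w, zi in zip(weights, z))

def loo_rmse(coords, z, trend):
    """Leave-one-out cross-validated root mean squared error."""
    sq = 0.0
    for i in range(len(z)):
        keep = [j for j in range(len(z)) if j != i]
        pred = krige([coords[j] for j in keep], [z[j] for j in keep],
                     [trend[j] for j in keep], coords[i], trend[i])
        sq += (pred - z[i]) ** 2
    return math.sqrt(sq / len(z))

# Synthetic data: a smooth spatial surface plus a covariate effect plus noise.
rng = random.Random(0)
coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(40)]
density = [rng.uniform(0, 1) for _ in coords]  # hypothetical covariate
z = [math.sin(x / 2) + math.cos(y / 2) + 2.0 * d + rng.gauss(0, 0.1)
     for (x, y), d in zip(coords, density)]

trend_ok = [[1.0] for _ in z]
trend_uk = [[1.0, d] for d in density]
rmse_ok = loo_rmse(coords, z, trend_ok)
rmse_uk = loo_rmse(coords, z, trend_uk)
improvement = 100 * (rmse_ok - rmse_uk) / rmse_ok
print(f"OK RMSE {rmse_ok:.3f}, UK RMSE {rmse_uk:.3f}, improvement {improvement:.0f}%")
```

Because the synthetic covariate has no spatial structure of its own, ordinary kriging cannot recover its effect from neighboring points, which is the situation in which a covariate is expected to help. Real data may show a much smaller gain, or none.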
Sampling Density
Finally, in order to assess the sensitivity of model accuracy to sampling density, we will estimate the
RMSE for the ordinary kriging approach and the best-performing universal kriging approach after
deleting sample points. Specifically, we will select a random number between 0 and 100, delete that
percentage of sample points from the spatial dataset, then compute the leave-one-out cross-validation
RMSE for the resulting model. If too few points remain to estimate a variogram automatically, we will
consider the model to have an infinite RMSE. By plotting the relationship of estimated RMSE to
sample point count, we can estimate the relationship between sampling density and measure accuracy.
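The subsampling procedure can be sketched as follows. To keep the example short and self-contained, leave-one-out prediction uses inverse-distance weighting as a stand-in for kriging. The MIN_POINTS threshold, the synthetic surface, and the number of repetitions are all assumptions made for illustration.

```python
import math
import random

MIN_POINTS = 10  # hypothetical threshold below which no variogram would be fitted

def loo_rmse_idw(points):
    """Leave-one-out RMSE using inverse-distance weighting, a stand-in
    for the kriging cross-validation described in the text."""
    sq = 0.0
    for i, (xi, yi, zi) in enumerate(points):
        num = den = 0.0
        for j, (xj, yj, zj) in enumerate(points):
            if j != i:
                w = 1.0 / (math.dist((xi, yi), (xj, yj)) ** 2 + 1e-12)
                num += w * zj
                den += w
        sq += (num / den - zi) ** 2
    return math.sqrt(sq / len(points))

def subsample_rmse(points, rng):
    """Delete a random percentage of points, then cross-validate the remainder."""
    pct_deleted = rng.uniform(0, 100)
    n_keep = round(len(points) * (1 - pct_deleted / 100))
    if n_keep < MIN_POINTS:
        return n_keep, math.inf  # too few points: treat the RMSE as infinite
    kept = rng.sample(points, n_keep)
    return n_keep, loo_rmse_idw(kept)

# Synthetic (x, y, value) sample on a smooth surface.
rng = random.Random(1)
xy = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
points = [(x, y, math.sin(x / 2) + math.cos(y / 2)) for x, y in xy]

# Each entry is (points retained, RMSE); sorting orders by sample size for plotting.
results = sorted(subsample_rmse(points, rng) for _ in range(30))
for n_keep, rmse in results:
    print(n_keep, rmse)
```

Plotting the RMSE against the number of retained points for many repetitions would trace the density-accuracy relationship. Runs with infinite RMSE would need to be shown separately or excluded from any fitted curve.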
Preliminary Results
As a preliminary analysis, we explored the RMSE resulting from incorporating other variables using a
universal kriging model. Because this analysis was preliminary, we used data only from 503 block faces
in Philadelphia, and incorporated covariates that had been assessed by virtual street audit rather than
the Census measures of more theoretical relevance. As expected, covariates with little relation to
physical disorder, such as conditions of the roadway surface, had little impact on estimates. Low
sidewalk quality, which is in some cases a marker of neighborhood abandonment, was the only
covariate that markedly improved performance. Table 1 displays the results of this preliminary analysis.
Table 1: Preliminary cross-validation results from universal kriging and ordinary kriging models
estimating physical disorder in Philadelphia, PA
Geographic Feature included as a     Root Mean Squared Error    Percent improvement
Universal Kriging Covariate          (lower is better)          over ordinary kriging
Nothing (Ordinary Kriging)           0.441                      -
Number of lanes for cars             0.442                      -<1%
Presence of a bus stop               0.447                      -1%
Condition of the roadway surface     0.438                      1%
Presence of any visible billboard    0.440                      <1%
Condition of the sidewalk            0.412                      7%
Presence of any rowhouses            0.435                      1%
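The percent improvement column can be reproduced from the RMSE column, assuming it is the relative reduction in RMSE compared with ordinary kriging (the formula is not stated in the abstract, so this is an inference). A short check in Python:

```python
# RMSE values copied from Table 1 (Philadelphia, preliminary analysis).
rmse = {
    "Nothing (ordinary kriging)": 0.441,
    "Number of lanes for cars": 0.442,
    "Presence of a bus stop": 0.447,
    "Condition of the roadway surface": 0.438,
    "Presence of any visible billboard": 0.440,
    "Condition of the sidewalk": 0.412,
    "Presence of any rowhouses": 0.435,
}

baseline_name = "Nothing (ordinary kriging)"
baseline = rmse[baseline_name]

# Assumed formula: relative reduction in RMSE, in percent (negative means worse).
improvement = {
    name: 100 * (baseline - value) / baseline
    for name, value in rmse.items()
    if name != baseline_name
}

for name, pct in improvement.items():
    print(f"{name}: {pct:+.1f}%")
```

Under this assumption, sidewalk condition rounds to a 7% reduction in error and bus stop presence to a 1% increase, matching the range reported in the brief abstract.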
Conclusions
The proposed project will advance methodological knowledge regarding accuracy and efficiency gains
that may be available to social science and public health researchers using universal kriging to assess
contextual measures on study subjects. If we identify accuracy or efficiency gains, we anticipate
pursuing independent funding to explore different sampling and spatial interpolation techniques and
working to identify relevant, frequently measured covariates to improve estimates.
References
Bader, M. D., S. J. Mooney, Y. J. Lee, D. Sheehan, K. M. Neckerman, A. G. Rundle and J. O. Teitler (2015).
"Development and deployment of the Computer Assisted Neighborhood Visual Assessment System
(CANVAS) to measure health-related neighborhood conditions." Health & place 31: 163-172.
Bader, M. D. M. and J. A. Ailshire (2014). "Creating Measures of Theoretically Relevant Neighborhood
Attributes at Multiple Spatial Scales [Available online ahead of print February 7, 2014]." Sociological
Methodology: (doi:10.1177/0081175013516749).
Badland, H. M., S. Opit, K. Witten, R. A. Kearns and S. Mavoa (2010). "Can Virtual Streetscape Audits
Reliably Replace Physical Streetscape Audits?" Journal of Urban Health-Bulletin of the New York
Academy of Medicine 87(6): 1007-1016.
Bivand, R. S., E. J. Pebesma and V. Gomez-Rubio (2008). Applied spatial data analysis with
R, Springer.
Cressie, N. (1988). "Spatial prediction and ordinary kriging." Mathematical Geology 20(4): 405-421.
Entwisle, B. (2007). "Putting people into place." Demography 44(4): 687-703.
Hengl, T., G. B. Heuvelink and A. Stein (2003). "Comparison of kriging with external drift and
regression-kriging." Technical note, ITC 51.
Keyes, K. M., K. A. McLaughlin, K. C. Koenen, E. Goldmann, M. Uddin and S. Galea (2012). "Child
maltreatment increases sensitivity to adverse social contexts: neighborhood physical disorder and
incident binge drinking in Detroit." Drug and alcohol dependence 122(1): 77-85.
Mooney, S. J., M. D. Bader, G. S. Lovasi, K. M. Neckerman, J. O. Teitler and A. G. Rundle (2014). "Validity
of an ecometric neighborhood physical disorder measure constructed by virtual street audit." American
journal of epidemiology: kwu180.
Mooney, S. J., D. J. Westreich and A. M. El-Sayed (2015). "Commentary: Epidemiology in the Era of Big
Data." Epidemiology 26(3): 390-394.
Reiss, A. J. (1971). "Systematic observation of natural social phenomena." Sociological methodology
3(1): 3-33.
Rundle, A. G., M. D. Bader, C. A. Richards, K. M. Neckerman and J. O. Teitler (2011). "Using Google Street
View to audit neighborhood environments." Am J Prev Med 40(1): 94-100.
Sampson, R. J. and S. W. Raudenbush (1999). "Systematic social observation of public spaces: A new look
at disorder in urban neighborhoods." American Journal of Sociology 105(3): 603-651.
Shmool, J. L., L. D. Kubzansky, O. D. Newman, J. Spengler, P. Shepard and J. E. Clougherty (2014). "Social
stressors and air pollution across New York City communities: a spatial approach for assessing
correlations among multiple exposures." Environmental Health 13(1): 91.
United States Census Bureau. (2013). "TIGER/Line Shapefiles." Retrieved July 31, 2013, from
http://www.census.gov/geo/maps-data/data/tiger-line.html.
Wilson, J. S., C. M. Kelly, M. Schootman, E. A. Baker, A. Banerjee, M. Clennin and D. K. Miller (2012).
"Assessing the Built Environment Using Omnidirectional Imagery." American Journal of Preventive
Medicine 42(2): 193-199.