Epidemiologic Reviews
Copyright © 2000 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 22, No. 1
Printed in U.S.A.
Update: Spatial Aspects of Epidemiology: The Interface with Medical
Geography
Gregory E. Glass
Received for publication November 17, 1999, and accepted for publication May 16, 2000.
Abbreviation: GIS, geographic information systems.
From the W. Harry Feinstone Department of Molecular Microbiology and Immunology, The Johns Hopkins School of Public Health, Baltimore, MD 21205. (e-mail: [email protected]).
(Reprints to Dr. Glass at this address.)
INTRODUCTION
Medical geography and epidemiology share the common goal of understanding disease processes and improving methods of health interventions. While they share a common history, the fields have diverged and developed different, often complementary, approaches to the same types of problems. Medical geography differs from epidemiology in its underlying focus on applying the concepts and methods of geography to investigate health-related topics (1). Much like other subdisciplines within its field, medical geography attempts to integrate concepts and methods from various social, physical, and biological sciences to create a unique specialty. Throughout much of its development and continuing to the present, the focus of medical geography remains very much on an ecologic perspective of health. Its emphasis is on the interplay of the behavior of humans and the environment to generate observed health outcomes so that, in addition to the traditional issue of spatial variation in disease incidence or prevalence, research related to understanding variation in individual risk perception and movement patterns has come into play (2, 3). Coupled with this is the practical application of these research issues, such as the optimal structure and network design of health care delivery systems (4).
In contrast, epidemiology is purposive for health promotion and disease prevention, with a focus on providing paradigms for the investigation of specific health problems and providing etiologic explanations for these problems, especially from a biological perspective (5). From a medical geography perspective, epidemiology provides an additional rich source of methods, concepts, and information that can be incorporated to study problems of interest.
Although antecedent, individual studies in medical geography exist, Meade et al. (1) trace the emergence of systematic interest in the field to the early 1950s. The field developed over the subsequent decade before a substantial international focus was created. Much of this delay reflected technologic limitations rather than conceptual difficulties (6). Recently created analytic methods and technologic advances suggest that it may be appropriate to reevaluate the contributions of each field to the study of problems of disease and disease control and the identification of areas of research in which the two fields can more actively interact with one another. This review focuses on recent technologic and analytic developments in medical geography, especially where they apply to epidemiologic study.
Epidemiology has, historically, been rooted in dealing with acute public health phenomena and has investigated these problems from the perspective of implicating specific agents and/or exposures as the cause of disease. This has led to a significant focus on study design and analytic methods. In contrast, medical geography has maintained an ecologic perspective to study disease patterns, especially emphasizing the cultural-environmental interactions that lead to disease events (7). Thus, while each deals with aspects considered by the other, the primary difference appears to be the focus of medical geography on the spatial context of health-related issues, an aspect that epidemiology recognizes but rarely explicitly considers.
In recent years, environmental epidemiology has bridged the distinction between medical geography and epidemiology. This occurred especially when environmental epidemiology shifted its focus to exposure-centered rather than disease outcomes. This shift has occurred because of the increased attention on chronic, degenerative diseases; because of the recognition that their etiologies are often multifactorial and dose dependent; and because of substantial technologic improvement in measuring environmental contaminants (8). These factors, in combination, have created conditions favoring an integrated, environmental approach in which the spatial relations of features, such as populations at risk and environmental pollutants, are key and methods to assess exposure for the populations require a spatially explicit modeling of environmental conditions. This approach fits quite comfortably within the medical geographic approach while avoiding many of the difficulties associated with strictly group-level analyses. However, issues of "ecologic bias" will probably need to be considered and evaluated on a study-by-study basis.
ANALYTIC APPROACHES
Two major areas of analytic study and application in medical geography have been the development of methods to identify disease clusters and the application of methods to estimate conditions or outcomes in places where measurements have not been obtained. The first of these formalizes the methods of earlier efforts and also addresses the important methodological problem of identifying environmental conditions associated with rare disease events, such as (potential) clusters of childhood leukemia cases (9, 10), or acute disease events, such as zoonotic disease outbreaks (11). Identification of significant clustering then serves as the starting point for further investigation and the creation of hypothesized relations between the environment and health outcomes.
The perspective of spatial clusters that has developed recognizes that clustering occurs at a variety of
spatial scales, from regional to local. Underlying
these can be geographic changes in risk factors, which
introduce a degree of spatial autocorrelation in the outcome. Spatial autocorrelation of cases can occur
because there is a contagious nature to the underlying
process, as with many infectious diseases, because of a
single time and place of exposure, or because there is
localized heterogeneity in the environmental covariates of risk (12). Identification of risk factors and the
extent of their effects (see below) that are used to
adjust the spatial analysis need to be explicit (13).
Commonly, the approach for cluster identification is
to partition the study region into a large number of
small areas and identify stratum-specific reference
rates that can be coupled with counts of the population
at risk for each small area (14). Under the null hypothesis, the number of cases in each small area has a nominally Poisson distribution with the expected number
of cases equal to the mean of the Poisson process.
Clustering occurs when the departure from the null
hypothesis results in increased variability in the number of cases (14). When arbitrary partitions of the
study region, such as political boundaries, are made
under the null hypothesis, the case locations become a
realization of a nonhomogeneous Poisson process
whose intensity depends on the strata-specific population distribution and large-scale geographic factors,
making identification of the null model more problematic. Similarly, the distribution of environmental
covariates affects the underlying model. Approaches
for estimating the effects of environmental covariates
are a significant focus of research (see below).
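The small-area calculation described above can be sketched in Python. This is a minimal illustration rather than the procedure of reference 14: the age strata, reference rates, populations, and case counts are hypothetical, and the Pearson-type dispersion statistic is one simple, assumed way to summarize variability beyond that expected under the Poisson null.

```python
from typing import Dict, List


def expected_cases(pop_by_stratum: Dict[str, int], ref_rates: Dict[str, float]) -> float:
    """Expected cases in one small area: sum over strata of population x reference rate."""
    return sum(pop_by_stratum[s] * ref_rates[s] for s in pop_by_stratum)


def dispersion_statistic(observed: List[int], expected: List[float]) -> float:
    """Pearson-type sum of (O - E)^2 / E; large values indicate extra-Poisson variability."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical stratum-specific reference rates (cases per person per year).
ref_rates = {"0-14": 0.0004, "15-64": 0.0001, "65+": 0.0010}

# Hypothetical small areas: population at risk by stratum and observed case counts.
areas = [
    {"pop": {"0-14": 5000, "15-64": 20000, "65+": 3000}, "cases": 7},
    {"pop": {"0-14": 2500, "15-64": 10000, "65+": 1000}, "cases": 3},
    {"pop": {"0-14": 5000, "15-64": 10000, "65+": 2000}, "cases": 14},
]

expected = [expected_cases(a["pop"], ref_rates) for a in areas]
observed = [a["cases"] for a in areas]
chi2 = dispersion_statistic(observed, expected)

for i, (o, e) in enumerate(zip(observed, expected)):
    print(f"area {i}: observed={o}, expected={e:.1f}")
print(f"dispersion statistic = {chi2:.1f}")
```

In this toy example only the third area departs from its expected count. With externally supplied reference rates and expected counts that are not too small, such a statistic would be compared approximately with a chi-square distribution whose degrees of freedom equal the number of areas.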
Distance methods have been especially useful for
identifying clusters (14) and correspond
closely in design to epidemiologic studies. Both methods rely on using locations of cases and controls to
specify the nuisance function associated with local
population structure (14). The method compares the
spatial distribution of cases relative to a random labeling of the combined case and control locations (13).
No particular distribution is assumed for either the
cases or controls.
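A random-labeling comparison of this kind can be sketched as follows. The statistic used here, the number of cases whose nearest neighbor is also a case, is a nearest-neighbor count assumed for illustration in the spirit of reference 13. The coordinates, the choice of k = 1, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_case_count(points: Sequence[Point], is_case: Sequence[bool], k: int = 1) -> int:
    """For every case, count how many of its k nearest neighbours are also cases."""
    total = 0
    for i, p in enumerate(points):
        if not is_case[i]:
            continue
        # Rank all other locations by distance to location i and keep the k nearest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        total += sum(1 for j in others[:k] if is_case[j])
    return total


def random_labelling_test(points, is_case, k=1, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle case/control labels over the fixed locations."""
    rng = random.Random(seed)
    observed = knn_case_count(points, is_case, k)
    labels: List[bool] = list(is_case)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if knn_case_count(points, labels, k) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical locations: six tightly grouped cases and twelve dispersed controls.
cases = [(0.0, 0.0), (0.10, 0.02), (0.03, 0.12), (0.14, 0.11), (0.06, 0.06), (0.20, 0.05)]
controls = [(2.0, 0.3), (2.4, 0.5), (4.0, 0.2), (4.3, 0.7), (1.0, 2.0), (1.2, 2.5),
            (3.0, 2.2), (3.5, 2.6), (0.5, 4.0), (0.9, 4.3), (2.5, 4.1), (4.2, 4.0)]
points = cases + controls
is_case = [True] * len(cases) + [False] * len(controls)

observed, p_value = random_labelling_test(points, is_case, k=1)
print(f"observed statistic = {observed}, Monte Carlo p-value = {p_value:.3f}")
```

Because the labels are permuted over the observed locations, the control series absorbs the uneven distribution of the population, and no parametric form is needed for either group.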
An alternative approach that explicitly addresses the
spatial scale of clusters is proposed by Diggle (15). In
this approach, the second-moment properties of the
process generating the disease are used. The mean
number of cases within a specified distance of an arbitrarily selected case is determined relative to the local
population density, with the same term defined for the
control series. The statistic is the difference between
the two values over all distances. Positive values of the
statistic are indicative of clustering. In both
approaches (13, 15), the spatial distribution of the local
population is a nuisance parameter that requires a random sample of unaffected individuals or sites. As such,
both fall closely within case-control study design
methodology.
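The second-moment comparison can be illustrated with a naive estimate of the K function for each series. This sketch assumes a study region of known area and applies no edge correction. The point coordinates are hypothetical, and in practice the significance of the difference would be assessed by random relabeling of cases and controls rather than read directly from the values.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_function(points: Sequence[Point], area: float, distances: Sequence[float]) -> List[float]:
    """Naive K estimate at each distance: area x (ordered pairs within s) / (n (n - 1))."""
    n = len(points)
    estimates = []
    for s in distances:
        pairs = sum(
            1
            for i in range(n)
            for j in range(n)
            if i != j and math.dist(points[i], points[j]) <= s
        )
        estimates.append(area * pairs / (n * (n - 1)))
    return estimates


def k_difference(cases, controls, area, distances) -> List[float]:
    """D(s) = K_cases(s) - K_controls(s); positive values suggest clustering of cases."""
    k_cases = k_function(cases, area, distances)
    k_controls = k_function(controls, area, distances)
    return [kc - k0 for kc, k0 in zip(k_cases, k_controls)]


# Hypothetical 10 x 10 study region: four grouped cases and four dispersed controls.
cases = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.2), (1.2, 1.2)]
controls = [(2.0, 2.0), (8.0, 2.0), (2.0, 8.0), (8.0, 8.0)]
distances = [0.5, 7.0]

d_values = k_difference(cases, controls, area=100.0, distances=distances)
for s, d in zip(distances, d_values):
    print(f"s = {s}: D(s) = {d:.2f}")
```

Evaluating the difference over a range of distances is what makes the spatial scale of any clustering explicit. In this toy example the excess is largest at the shortest distance.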
Identification of clusters, however, does not address
issues related to identifying the environmental determinants of disease. The presence of spatial variation in
the underlying environmental risk factors and the estimation of their effects remains a substantial area of
further research. In part, this reflects the underlying
effect of spatial autocorrelation of environmental conditions. One recent approach to estimate these effects
and partition their impact as local or regional follows
from methods proposed by Cressie (16). The purpose
of these analyses is to identify a suite of environmental covariates associated with the outcome variables of
interest and relies on hierarchical modeling in which
risk, the variable of interest, is unknown but the outcome—counts or locations of individual cases—is
observed. The outcome is modeled as a function of the
risk, e.g., in the situation of counts, a Poisson function
might be appropriate. Then risk is modeled as a function of the environmental factors, which explicitly
consider that variances are location specific and the
spatial autocorrelation is used as a measure of spatial
dependence among sites (12). Parameterizations for
the spatial covariances follow standard structures used
in geostatistics (16), and parameter estimation is performed using the Monte Carlo Newton-Raphson
method proposed by McCulloch (17).
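One way to write down a hierarchical model of this kind for small-area counts is sketched below. The log link, the Gaussian spatial effect, and the exponential covariance are assumptions chosen for illustration and are not specified in the text. Here $Y_i$ is the observed count at site $i$, $E_i$ the expected count, $\lambda_i$ the unobserved relative risk, $\mathbf{x}_i$ the environmental covariates, and $d_{ij}$ the distance between sites $i$ and $j$.

```latex
\begin{aligned}
Y_i \mid \lambda_i &\sim \mathrm{Poisson}(E_i \lambda_i), \\
\log \lambda_i &= \mathbf{x}_i^{\top} \boldsymbol{\beta} + S_i, \\
\mathbf{S} = (S_1, \dots, S_n)^{\top} &\sim \mathrm{N}\bigl(\mathbf{0}, \boldsymbol{\Sigma}\bigr), \\
\Sigma_{ij} &= \sigma_i \sigma_j \exp\!\left(-d_{ij} / \phi\right).
\end{aligned}
```

In this sketch the $\sigma_i^2$ are the location-specific variances, $\phi$ governs how quickly spatial dependence decays with distance, and $\boldsymbol{\beta}$ carries the effects of the environmental covariates on risk.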
Estimating outcomes or conditions at unmeasured
sites has been an area of special concern in geography
because of the (usually positive) spatial correlation
among sites that renders the assumption of independent observations in traditional statistical methods
inappropriate (18). Various methods have been proposed in geostatistics to efficiently interpolate spatial
data. Statistical methods such as kriging, which is the best linear unbiased predictor for estimating surfaces, nevertheless tend to smooth the data, overestimating low values and underestimating high values. This smoothing is a practical problem if the surface being generated is related to
disease risk.
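The smoothing can be seen in the simplest case, simple kriging of a second-order stationary process with known mean $\mu$ and variance $\sigma^2$; these are assumptions made here for illustration. Let $\mathbf{Z}$ be the vector of observations, $\mathbf{C}$ its covariance matrix, and $\mathbf{c}$ the vector of covariances between the observations and the value $Z(s_0)$ at an unmeasured site $s_0$.

```latex
\begin{aligned}
\hat{Z}(s_0) &= \mu + \mathbf{c}^{\top} \mathbf{C}^{-1} (\mathbf{Z} - \mu \mathbf{1}), \\
\sigma^2_{\mathrm{SK}}(s_0) &= \sigma^2 - \mathbf{c}^{\top} \mathbf{C}^{-1} \mathbf{c} \;\ge\; 0, \\
\operatorname{Var}\bigl[\hat{Z}(s_0)\bigr] &= \mathbf{c}^{\top} \mathbf{C}^{-1} \mathbf{c}
  = \sigma^2 - \sigma^2_{\mathrm{SK}}(s_0) \;\le\; \sigma^2 .
\end{aligned}
```

The predicted surface therefore varies less than the process it estimates, with predictions drawn toward the mean. This is the overestimation of low values and underestimation of high values noted above.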
METHODOLOGICAL DEVELOPMENTS
Probably the key development in medical geography
during the past 15 years has been the increase in the
availability and power of computing systems and the
development of software to deal with the relations of
spatially explicit data. Moore and Carpenter (18) provide an excellent summary of recent key studies applying geographic information systems (GIS) to epidemiologic research, as well as a summary of various
spatial analytic approaches. The ability to input, store,
manipulate, and present the spatial relation of health
events to other key features of interest is the critical
characteristic of the field. As such, GIS, which have
experienced explosive growth as an application tool,
are likely to continue to impact the field (6, 19), especially as these tools are coupled with epidemiologic
analyses (20-22). Several studies have demonstrated
the feasibility of using mapped environmental features, stored in GIS, as predictor variables for outcomes of epidemiologic studies with a substantial
degree of success (22, 23). More recently, Cubbin et al.
(24) explored using mapped environmental features to
develop qualitative tests of hypotheses for three different social hypotheses related to patterns of violence. A
key feature of these studies is that GIS was able to calculate variables that were, from a practical perspective,
nearly impossible to obtain from field studies. Thus,
the approach uses the observed occurrence of disease
to identify the risk factors, based on stored environmental information, and then uses the spatial distribution of the identified environmental risk factors to
characterize the sizes and locations of risk areas. These
results can then be presented as maps to identify
regions of high or low risk for the outcome.
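The second half of this workflow, turning a stored environmental layer into a derived covariate and then into a risk map, can be sketched as follows. The raster, the land-cover coding, the window radius, and the 0.5 threshold are hypothetical. In a real study the threshold or risk function would come from a model fitted to the observed disease occurrence.

```python
from typing import List

Grid = List[List[int]]


def fraction_within(grid: Grid, row: int, col: int, radius: int, target: int) -> float:
    """Share of cells in the square window around (row, col) whose class equals target."""
    cells = [
        grid[r][c]
        for r in range(max(0, row - radius), min(len(grid), row + radius + 1))
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1))
    ]
    return sum(1 for v in cells if v == target) / len(cells)


def risk_map(grid: Grid, radius: int, target: int, threshold: float) -> List[List[str]]:
    """Label each cell 'high' or 'low' by comparing the derived covariate to a threshold."""
    return [
        [
            "high" if fraction_within(grid, r, c, radius, target) >= threshold else "low"
            for c in range(len(grid[0]))
        ]
        for r in range(len(grid))
    ]


# Hypothetical 4 x 4 land-cover raster: 1 = forest, 0 = other.
land_cover = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

risk = risk_map(land_cover, radius=1, target=1, threshold=0.5)
for row in risk:
    print(" ".join(f"{label:>4}" for label in row))
```

A neighborhood summary of this kind is an example of a variable that is easy to compute from mapped data but would be impractical to measure in the field for every location.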
Historically, GIS has been shaped by cartographic influences that have minimized the extent of statistical
analysis. This is evident in the limited number and
types of statistics that are resident on most software
systems. Most of these statistics are simple, univariate,
first- and second-order statistics. However, as epidemiology and medical geography have begun to
interact in this area, we can anticipate that there will be
substantial cross-fertilization of approaches.
One area that GIS has significantly impacted is
applied medical geography. In particular, the design
and development of health systems (4) and studies of
the patterns of land use (25, 26) for planning have
incorporated GIS technology and methodology heavily. This suggests that GIS might similarly impact several aspects of applied epidemiology, such as identification of study populations. For example, GIS was
used to overlay environmental risk factors for Lyme
disease with street address maps to recruit volunteers
for a vaccine trial study (6). This stratification procedure produced a significant enrichment in the numbers
of at-risk individuals relative to that obtained by other
methods.
FUTURE DIRECTIONS
The conceptual framework of environmental epidemiology, especially when it focuses on exposures or
the multifactorial nature of many chronic diseases,
provides an excellent starting point for communication
between epidemiologists and medical geographers.
The most critical area for future development that
would provide an area of common communication is
spatial analysis. The appropriate methods for dealing
with spatially autocorrelated health outcomes, either
because they are by their nature autocorrelated or
because of the spatial pattern of underlying environmental risk factors, remain an area that must be
addressed. Substantial progress has been made in the
past decade in dealing with spatially explicit data, but
methods need to be developed to deal with estimation
procedures and study design. For example, common
study designs, such as matched case-control designs,
are poorly characterized for spatially explicit situations (27). In unmatched designs, under the null
hypothesis, the spatial distributions of cases and controls are assumed to be drawn from the same, underlying spatial pattern so that the expected difference in
their distributions is zero. However, in matched
designs, even under the null model using random relabeling of points, the expected difference in their distributions is no longer zero.
In applied and environmental epidemiology studies,
we can anticipate an increased need for up-to-date,
accurate measures of environmental conditions that
extend over large geographic regions. Increasingly,
remotely sensed data gathered from various platforms,
such as satellites, provide real-time or near real-time
characterizations of the environment that are used to
interpret the dynamics of environmental conditions
that could impact health outcomes. GIS provides an
important methodology that bridges epidemiology and
medical geography by providing the tools for explicit
spatial characterization of data. However, a practical
issue with these systems will be the substantial size of
the environmental data generated (often reaching terabytes of data) at high repeat frequencies (days or
weeks). Analyzing and interpreting results from these
inputs will provide a substantial challenge that cannot
be met by traditional data analytic and management
approaches (28). Methods such as context-based data
search and retrieval, which abstract the data with no
loss of relevant information, will need to be developed
further to deal with these issues (28).
Integration and analysis of distributed data sets will
represent a serious issue in the near future as well,
especially as they relate to data quality. Standards
for data quality and the appropriate metadata needed for both efficient searching and documentation of quality are still being developed. The Federal
Geographic Data Committee and the National Spatial
Data Infrastructure represent attempts to coordinate
geographic data that would meet appropriate standards
for data quality.
ACKNOWLEDGMENTS
Supported by a NASA Earth Sciences Enterprise program ESIP to IBM T. J. Watson Research Center NCC 5-305.
The author thanks Drs. A. Das, S. R. Lele, and N. Cressie for their discussion of the issues involved with spatial analyses.
REFERENCES
1. Meade MS, Florin JW, Gesler WM. Medical geography. New York, NY: The Guilford Press, 1988.
2. Slovic P. Perception of risk. Science 1987;236:280-5.
3. Phillips DR. Spatial variations in attendance at general practitioner services. Soc Science Med 1979;13D:169-81.
4. Westcott B. GIS and GPS—the backbone of Vermont's statewide E911 implementation. PE&RS 1999;65:1269-76.
5. Garfield E. Mapping the world of epidemiology. Part 2. Current Contents 1988;36:3-10.
6. Glass GE. Geographic information systems. In: Nelson KE, Masters CF, Graham NMH, eds. Infectious disease epidemiology. Frederick, MD: Aspen Publishers (in press).
7. Dyck I. Health and health care experiences of the immigrant woman: questions of culture, context and gender. In: Hayes MV, Foster LT, Foster HD, eds. Community, environment and health: geographic perspectives. Victoria, British Columbia, Canada: University of Victoria, 1992:231-56.
8. Terracini B. Environmental epidemiology: a historical perspective. In: Elliott P, Cuzick J, English D, et al., eds. Geographical and environmental epidemiology: methods for small-area studies. Oxford, England: Oxford Medical Publications, 1996:253-63.
9. Baron J. Cancer mortality in small areas around nuclear facilities in England and Wales. Br J Cancer 1984;50:815-24.
10. Gardner MJ. Childhood leukaemia around the Sellafield nuclear plant. In: Elliott P, Cuzick J, English D, et al., eds. Geographical and environmental epidemiology: methods for small-area studies. Oxford, England: Oxford Medical Publications, 1996:291-309.
11. Kitron U, Michael J, Swanson J, et al. Spatial analysis of the distribution of LaCrosse encephalitis in Illinois, using a geographic information system and local and global spatial statistics. Am J Trop Med Hyg 1997;57:469-75.
12. Das A. Topics in spatial statistics. PhD dissertation. The Johns Hopkins University, Baltimore, MD, 1998.
13. Cuzick J, Edwards J. Tests for spatial clustering in heterogeneous populations. J Roy Stat Soc Series B 1990;52:73-104.
14. Alexander FE, Cuzick J. Methods for the assessment of disease clusters. In: Elliott P, Cuzick J, English D, et al., eds. Geographical and environmental epidemiology: methods for small-area studies. Oxford, England: Oxford Medical Publications, 1996:238-50.
15. Diggle PJ. Statistical analysis of spatial point patterns. London, England: Academic Press, Inc., 1983.
16. Cressie N. Statistics for spatial data. New York, NY: John Wiley & Sons, 1993.
17. McCulloch CE. Maximum likelihood algorithms for generalized linear mixed models. J Am Stat Assoc 1997;92:162-70.
18. Moore DA, Carpenter TE. Spatial analytical methods and geographic information systems: use in health research and epidemiology. Epidemiol Rev 1999;21:143-61.
19. Vine MF, Degnan D, Hanchette C. Geographic information systems: their use in environmental epidemiologic research. Environ Health Perspect 1997;105:598-605.
20. Kitron U. Landscape epidemiology and epidemiology of vector-borne diseases: tools for spatial analysis. J Med Entomol 1998;35:435-45.
21. Zenilman JM, Ellish N, Freisa A, et al. The geography of sexual partnerships in Baltimore: applications of core theory dynamics using a geographic information system. Sex Transm Dis 1999;26:75-81.
22. Latkin C, Glass GE, Duncan T. Using geographic information systems to assess spatial patterns of drug use, selection bias and attrition among a sample of injection drug users. Drug Alcohol Depend 1998;50:167-75.
23. Childs JE, McLafferty SL, Sadek R, et al. Epidemiology of rodent bites and prediction of rat infestation in New York City. Am J Epidemiol 1998;148:78-87.
24. Cubbin C, Pickle LW, Fingerhut L. Social context and geographic patterns of homicide among U.S. Black and White males. Am J Public Health 2000;90:579-87.
25. McCracken SD, Brondizio ES, Nelson D, et al. Remote sensing and GIS at farm property level: demography and deforestation in the Brazilian Amazon. PE&RS 1999;65:1311-20.
26. Buchanan JT, Acevedo W. Studying urban sprawl using a temporal database. Geo Info Systems 1996;6:42-7.
27. Diggle PJ, Morris SE, Wakefield JC. Point-source modelling using matched case-control data. Biostatistics (in press).
28. Li C-S, Bergman L, Castelli V, et al. Model-based mining of remotely sensed data for environmental and public health applications. In: S Wong, ed. Med Databases (in press).