Use of Generalized Linear Mixed Models in the

American Journal of Epidemiology
Copyright © 2001 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 153, No. 12
Printed in U.S.A.
Spatial Analysis of Malaria Incidence Rates Kleinschmidt et al.
Use of Generalized Linear Mixed Models in the Spatial Analysis of Small-Area
Malaria Incidence Rates in KwaZulu Natal, South Africa
I. Kleinschmidt,1 B. L. Sharp,1 G. P. Y. Clarke,2 B. Curtis,1 and C. Fraser1
Spatial statistical analysis of 1994–1995 small-area malaria incidence rates in the population of the
northernmost districts of KwaZulu Natal, South Africa, was undertaken to identify factors that might explain very
strong heterogeneity in the rates. In this paper, the authors describe a method of adjusting the regression
analysis results for strong spatial correlation in the rates by using generalized linear mixed models and
variograms. The results of the spatially adjusted, multiple regression analysis showed that malaria incidence was
significantly positively associated with higher winter rainfall and a higher average maximum temperature and
was significantly negatively associated with increasing distance from water bodies. The statistical model was
used to produce a map of predicted malaria incidence in the area, taking into account local variation from the
model prediction if this variation was supported by the data. The predictor variables showed that even small
differences in climate can have very marked effects on the intensity of malaria transmission, even in areas
subject to malaria control for many years. The results of this study have important implications for malaria control
programs in the area. Am J Epidemiol 2001;153:1213–21.
malaria; models, statistical; spatial analysis; variogram
The districts of Ngwavuma and Ubombo in the northern
part of the province of KwaZulu Natal have the highest
malaria incidence rates in South Africa (1). Very large variation in the rates in these districts has not been properly
accounted for, although it has been ascribed as much to
human factors such as cross-border migration as to natural
forces such as climate and environment.
The purpose of this study was to undertake a spatial statistical analysis of malaria incidence to identify important
predictor variables and to produce an incidence map of the
area illustrating the variation in malaria risk. A secondary,
but important aim was to advance methodology for the spatial analysis and modeling of malaria transmission data in
the context of the MARA/ARMA project (2), which, among
other objectives, seeks to produce maps of malaria risk for
the continent of Africa (3, 4).
In Africa, the predominant species of malaria-causing
parasite is Plasmodium falciparum. When a person is bitten
by an infected anopheles mosquito, the parasite, called a
sporozoite during this stage of its cycle, enters the body via
the mosquito’s saliva, which is injected into the person’s
blood. The parasites multiply in the liver and reinvade the
blood, via the red blood cells, as merozoites. The merozoites
multiply sexually, some becoming gametocytes. Uninfected
anopheles mosquitoes become infected if they feed on a person whose blood contains gametocytes. In the insect, the
gametocytes undergo another phase of reproduction called
the “sporogony” cycle. At the end of this cycle, the mosquito becomes infective as a new generation of sporozoites
is able to infect another human host.
Therefore, malaria is bound closely to conditions that
favor survival of the anopheles mosquito and the life cycle
of the parasite. These conditions are determined predominantly by climatic factors, vegetation cover, and the vector’s
access to water surfaces for breeding (5–7). Human population movement from areas in which malaria is endemic to
those in which the disease has been at least partially eradicated can also contribute to malaria transmission (8).
Accurate knowledge regarding the distribution of malaria
is a crucial tool in planning and evaluating malaria control
(9). Explaining this distribution is also important, since it
provides a rationale for interventions and makes it possible to
predict transmission intensity in places in which it has not
been measured. The houses in the area under consideration
have been sprayed with insecticide over several decades (1),
resulting in reduced malaria incidence rates compared with
the era before control measures were introduced (10). In
recent years, incidence rates have again risen steeply. The
study area is situated on the southern fringe of climatic suitability for endemic malaria distribution in Africa (11).
Summer rainfall (6-month average, 68 mm per month) and
temperatures (average maximum daily temperature, 29.4˚C)
are generally suitable for malaria transmission. On the other
hand, winter conditions are suboptimal and could limit trans-
Received for publication September 21, 1999, and accepted for
publication October 20, 2000.
Abbreviations: GIS, geographic information system; GLMM, generalized linear mixed model; SD, standard deviation.
1
Medical Research Council (South Africa), Durban, South Africa.
2
Department of Statistics and Biometry, University of Natal,
Pietermaritzburg, South Africa.
Reprint requests to Immo Kleinschmidt, Medical Research
Council (South Africa), P.O. Box 17120, Congella, Durban 4013,
South Africa (e-mail: [email protected]).
1213
1214
Kleinschmidt et al.
mission of the disease (5) (average rainfall, 19 mm per
month; average maximum daily temperature, 25.9˚C).
We sought to test the hypothesis that the spatial distribution of climatic conditions in winter accounts for much of
the variation in the intensity of malaria transmission in this
region. To this end, we undertook a spatial statistical analysis of small-area malaria incidence rates in relation to rainfall and temperature. Because of their potential confounding
effects, we included proximity to permanent water bodies
and distance to the Mozambique border. The latter was used
as a proxy for migration from Mozambique, where routine
malaria control had not been implemented. Water bodies
included all permanent freshwater surfaces except rivers.
In both the Ngwavuma and Ubombo districts, a smallarea malaria incidence reporting system has been in operation for a number of years as part of the provincial malaria
control program (12). A component of the control strategy is
to identify and treat all infected persons. Therefore, both
passive and active case finding is practiced. The latter consists of screening measures in which teams visit the community to encourage those persons suspected of having
malaria to be tested. While this system may not achieve 100
percent coverage, it is thought to identify the vast majority
of cases. Low levels of exposure to P. falciparum in the past
have made it unlikely that the host population, apart from
recent immigrants from Mozambique, has naturally
acquired immunity. Therefore, persons are unlikely to possess clinical tolerance to parasite infection and to recover
from it without treatment (5).
As part of the control program, a homestead-level population census was conducted in 1994, thus making it possible, with the use of a geographic information system (GIS),
to derive population counts for the same small areas for
which cases were being reported. It was decided to base the
analysis on the July 1, 1994–June 30, 1995 malaria season,
since the denominator data for this time period were the
most reliable available for calculating incidence rates.
DATA
Cases of malaria (ascertained by passive and active detection) in the mid-1994 to mid-1995 season in the districts of
Ngwavuma and Ubombo, northern KwaZulu Natal, were
extracted from the malaria control program database and
were allocated to 220 magisterial subdivisions referred to as
sections. Population totals for each section, based on a census conducted during the 1994–1995 year, were obtained
from the same source.
The total number of cases was 2,418, including 400
patients whose place of residence was not known and who
therefore were excluded from the spatial analysis. Many of
these latter persons were probably from Mozambique, with
no fixed residence in the study area. The total population for
the study area was 241,397 persons, resulting in a crude
overall malaria incidence rate of 8.4 per 1,000 person-years.
Population per section ranged from 11 to 5,482 persons
(median, 858; mean, 1,102; standard deviation (SD), 886).
Area per section varied from 1.1 km2 to 263.2 km2 (median,
19.0; mean, 25.9; SD, 27.5).
For each section, a crude incidence rate for 1994–1995
was calculated. To calculate rates, we assumed that the
entire population of an area was exposed to the risk of
malaria during the period studied, that is, that each person
contributed exactly 1 person-year of exposure. Crude incidence rates per section ranged from 0 to 306 cases of
malaria per 1,000 person-years (mean, 11.2; SD, 32.0;
interquartile range, 0–8 per 1,000). In all but 14 sections, the
rate was less than 50 per 1,000 person-years. In 66 sections,
there were no cases. Figure 1 shows the distribution of incidence rates by section.
Long-term average climatic data (rainfall, daily maximum temperature) by calendar month were obtained from
climate databases for Africa (13). By using GIS (14), we
calculated the average value of each variable for each section by averaging the pixel values of the variable over the
area in the section. The monthly averages were combined
into 6-month averages to coincide with the winter (April–
September) and summer (October–March) seasons. The
average daily maximum temperature in winter was highly
correlated with the average daily maximum temperature in
summer (correlation coefficient 0.995). Therefore, we
combined these two variables into a single average daily
maximum temperature for the whole year. We also calculated the average monthly rainfall and the average maximum daily temperature for the midwinter months of June
and July combined.
The centroid of each section was also derived by using
GIS. For each of these centroids, distance to water bodies
(lakes, reservoirs, and dams; figure 1) and distance to the
Mozambican border were calculated.
ANALYSIS AND RESULTS
Smoothed incidence map
Relatively small population totals per section result in
incidence rates subject to considerable random error (15).
The map of smoothed rates (figure 1) was produced by
using the smoothing method proposed by Kafadar (16). The
smoothing effect of the incidence rate in one area on another
was limited to those areas whose centroids were less than 15
km apart.
Heterogeneity test
A Potthoff-Whittinghill heterogeneity test (17) was
applied to the observed rates by using Monte Carlo simulation to distribute cases randomly among areas but in proportion to the populations of each area. This test showed that the
observed distribution of cases was very unlikely to have
resulted from chance alone (p < 0.0001) given a null hypothesis of no differences in underlying risk between areas.
Test for spatial pattern
The nonparametric D statistic (18) was used to test the
observed incidence rates in the sections for a significant spatial pattern. The statistic is calculated as a weighted average
of rank differences in incidence rates for all pairs of secAm J Epidemiol Vol. 153, No. 12, 2001
Spatial Analysis of Malaria Incidence Rates
1215
FIGURE 1. Water bodies, malaria incidence rates, and smoothed malaria incidence rates for the population of the districts of Ngwavuma and
Ubombo, KwaZulu Natal, South Africa, July 1994–June 1995 (Source: KwaZulu Natal Malaria Control Programme).
tions; the weights favor pairs of areas in close proximity,
which results in a low D value if there is marked spatial corAm J Epidemiol Vol. 153, No. 12, 2001
relation. We used binary mutual neighborhood weights (i.e.,
a weight of 1 if two areas shared a common boundary, 0 oth-
1216
Kleinschmidt et al.
erwise), which were obtained by using a GIS program written for this purpose. The calculated value of D for the
observed incidence rates was 34.3, while the expected value
of D given the adjacency configuration of the areas was 194
(SD, 5.67), indicating that there was strong evidence of a
spatial pattern in these rates (p < 0.0001), as might be
expected.
Association between malaria incidence and climatic
and environmental factors
To determine the climatic and environmental covariates
that are associated with observed incidence rates while
allowing for spatial correlation of the data, we used an iterative approach of variograms and generalized linear mixed
models (GLMM) (19). We chose this approach since we
expected it to include the optimal properties (best linear
unbiased prediction (20)) of the mixed model.
The Appendix provides details of how the spatial correlation of the data can be derived from a variogram of model
residuals and how this spatial correlation can be taken into
account by implementing the GLMM using SAS software
(21). The overall strategy for modeling spatially correlated
incidence data is described in the paragraphs that follow.
The counts of malaria cases in each section were represented by a GLMM with a Poisson distribution, a logarithmic
link function, and a correlated error structure as described in
the Appendix, and with the population of the section as an offset. (The offset is a term added to the Poisson model on the
logarithmic scale so the count of cases is adjusted for population size.) Potential explanatory variables were the average
values, for the winter and summer seasons separately, of
monthly rainfall, annual daily maximum temperature, distance
to the nearest water body, and distance to the Mozambique
border. Average monthly rainfall and average daily maximum
temperature combined for the midwinter months of June and
July also were included as candidate explanatory variables.
Specification of the correlated error structure in the GLMM
due to a spatial pattern in the data was improved iteratively by
examining the model residuals and respecifying the covariance matrix, as detailed in the Appendix.
The final model contained three variables that were significantly associated with malaria incidence: average daily
maximum temperature, average monthly rainfall during
March–September, and distance to water bodies. Once these
variables were selected, we investigated, for each of the
three variables in turn, whether any of a set of transformations of the variable would result in a significant reduction
in deviance in the multiple regression Poisson model, following a procedure suggested by Royston et al. (22). The
transformation producing the biggest reduction in residual
deviance was chosen if this reduction was at least 3.84 compared with the untransformed variable. Transformations that
were tried for each variable x were 1/x2, 1/x, 1/x0.5, ln(x), x0.5,
and x2, without simultaneous inclusion of the untransformed
variable (x1) in the model. Transformations are useful in representing relations in which the log incidence rate increases
more rapidly than a straight line at low values of x and more
slowly at high values, or vice versa. Any resulting improvement in the fit of the model enhances its utility for the purpose of prediction.
The variogram of residuals of the final model (figure 2)
shows that there was residual spatial dependence after fitting the model. Specification of a covariance structure in the
GLMM based on the spatial correlation of residuals ensures
that the results are adjusted for this spatial correlation.
The incidence rate ratios for the final model are shown in
table 1, together with the estimated effect that these variables by themselves would have on incidence rates.
Distance to water and winter rainfall were fitted as square
root and logarithmic transformations, respectively, whereas
maximum temperature fit best without any transformation.
To facilitate reporting and interpretation of transformed
variables, we calculated the model-based incidence rate
ratio for the transformed variable for two separate values of
the variable relative to a third (referent) value of the variable
(22). The referent value chosen was the mean of the untransformed variable minus one standard deviation; the other two
values were the mean itself and the mean plus one standard
deviation. Therefore, the two incidence rate ratios demonstrating the association between the transformed variable
and malaria incidence were one and two standard deviations
FIGURE 2. Variogram of deviance residuals of the final model of malaria incidence rates for the population of the districts of Ngwavuma and
Ubombo, KwaZulu Natal, South Africa, for the July 1994–June 1995 season.
Am J Epidemiol Vol. 153, No. 12, 2001
Spatial Analysis of Malaria Incidence Rates
1217
TABLE 1. Final model of the climatic and topographic factors and their effect on malaria incidence in
the districts of Ngwavuma and Ubombo, KwaZulu Natal, South Africa, 1994–1995*
Malaria incidence
Single-variable
analysis
Variable
IRR
Average daily maximum temperature§
(TMAX) (change per °C)
Average monthly rainfall, March to
September (RAI0409) (mm)
13¶
19
25
Distance to water (DSTWTR) (km)
3¶
8
13
6.7
p < 0.0001
95% CI‡
3.5, 13.0
Poisson multiple
regression analysis†
IRR
95% CI
135.6
p < 0.0001
61.2, 300.4
1
0.48
0.29
p = 0.005
0.29, 0.80
0.12, 0.68
1
18.6
153.8
p < 0.0001
10.0, 34.4
53.1, 445.5
1
0.43
0.24
p < 0.0001
0.32, 0.59
0.14, 0.41
1
0.60
0.43
p < 0.0001
0.50, 0.73
0.31, 0.58
* All results were adjusted for spatial autocorrelation.
† For this model, incidence rate ratios (IRR) were calculated from the following model: log(IRR) = 4.91 * TMAX
+ 7.70 * ln(RAI0409) – 0.46 * sqrt(DSTWTR).
‡ CI, confidence interval.
§ Range, 26.5–29.0°C.
¶ Reference.
removed from the referent value, and they provided some
indication of the nonlinear nature of the association.
Throughout this analysis, there was evidence of considerable overdispersion of the Poisson model, with a variance
greater than the mean (dispersion factor in the final model,
11.5). This overdispersion was accounted for by inflating
the standard errors in the model by the square root of the
overdispersion factor (19). Overdispersion indicated other
unmeasured sources of variation that could not be taken into
account.
We grouped observed and predicted malaria incidence
rates per 1,000 into categories of less than 1, 1–5, 5–10,
10–50, and more than 50. Doing so provided a weighted
kappa statistic of 83 percent for agreement between
observed and predicted categories.
Model-based prediction map
Using the model derived above, we produced a map of
predicted malaria incidence rates by following a two-stage
approach developed previously to map malaria prevalences
(4). Kriged residuals from the final model were added to the
model predictions and exponentiated (exp(linear prediction
+ kriged residual)) to produce a map of incidence rates that
took into account local deviation from the model predictions, when supported by the data (figure 3).
DISCUSSION
Historical comparisons of malaria incidence with the
period prior to introduction of malaria control (1) and comAm J Epidemiol Vol. 153, No. 12, 2001
parisons with neighboring countries suggest that house spraying has resulted in a significant reduction in the intensity of
transmission in KwaZulu Natal. Our analysis found that even
with this relatively low level of transmission, the overwhelming factors related to variation in malaria incidence were the
same climatic factors that are also associated with malaria distribution in endemic areas (5, 11). The prediction map (figure
3) shows that the combination of proximity to water bodies,
winter rainfall, and maximum temperature was closely associated with high malaria risk in the northwest portion of the
study area and with a moderately high intensity of transmission in the border areas along the northern and western parts
of the area. Although cross-border migration of infected persons and the proximity of uncontrolled areas across the border
may further add to the intensity of transmission in border
areas, our model showed that natural factors help to explain
the higher transmission levels found in these areas. Distance to
the Mozambique border (the surrogate measure of crossborder migration), summer rainfall, and average rainfall and
average daily maximum temperature in June and July were not
significant once other variables were included in the model.
Our model underlines the importance of adjusting for confounding effects when investigating the association between
malaria incidence and climatic factors. Our study area was
marked by a significantly higher winter rainfall along the
coast than in the interior. On the other hand, the coastal areas
had lower daily maximum temperatures than the inland areas.
Both of these factors were closely associated with malaria
incidence, but, because of the negative correlation between
winter rainfall and maximum temperature (correlation coefficient –0.89), the former by itself was negatively associated
1218
Kleinschmidt et al.
FIGURE 3. Map showing the model-based prediction of malaria incidence for the population of the districts of Ngwavuma and Ubombo,
KwaZulu Natal, South Africa, for the July 1994–June 1995 season.
with malaria incidence (table 1). The same negative confounding effect between rainfall and temperature was responsible for the large difference between the sizes of the adjusted
and unadjusted rate ratios for maximum temperature. The
multiple regression model showed that, after adjustment for
maximum temperature, winter rainfall was positively associated with malaria incidence. Similarly, the effect of temperature, after adjustment for the effect of rainfall and, to a lesser
Am J Epidemiol Vol. 153, No. 12, 2001
Spatial Analysis of Malaria Incidence Rates
extent, distance to water, was much larger than the singlepredictor relation between temperature and malaria incidence
suggests. The large magnitude in the rate ratios should be
viewed in the context of very low incidence rates in much of
the study area; incidence rates in the areas in which risk was
higher were many tens of times higher. For this reason, the
incidence rate ratios cannot be extrapolated to situations of
higher endemicity levels, found in the areas further north.
Furthermore, the likely absence of immunity in the study population has the effect that any increase in transmission intensity is directly translated into higher incidence rates.
The very high collinearity between the average daily
maximum temperatures in summer and in winter means that
we were unable to test the hypothesis that variation in the
average daily maximum temperature in winter is a factor in
determining variation in malaria risk. In the study area, there
is virtually no variation in average daily maximum temperature in winter independent of average daily maximum temperature in summer.
According to our results, average daily maximum temperature has a large effect on malaria incidence, with a rate ratio
of 135.6 per degree centigrade. Our study area is at the southern limit of malaria distribution, and an association with temperature, particularly winter temperature, would not be surprising because of its effect on both parasite and vector
development (11). In the study area, annual average daily
maximum temperatures vary from 26.5˚C to 29˚C; some
places have much lower maxima during some of the winter
months. The temperature of small water bodies around which
the vectors overwinter is likely to be well below the maximum daily air temperature. Low water temperatures have
been shown to have a significant negative impact on mosquito abundance because of long larval duration (23), while
fairly high air temperatures for at least part of the day keep
adult mosquito life spans reasonably short. At the same time,
any reduction in air temperature leads to an increased incubation period in the vector (the time needed for production of
sporozoites in the female mosquito that infect humans) (5),
thereby reducing the chances that adult vectors will survive
long enough to become infective. Therefore, this result is in
accordance with the Macdonald model, which expresses the
dependence of the basic reproduction rate of malaria in terms
of the daily survival probability of the vector and the length
of the incubation period (5, 24).
Winter rainfall obviously affects winter vegetation and
sustains smaller water bodies in which vector populations
can survive. The range in winter rainfall is quite large in the
study area, and some places are fairly dry, averaging less
than 15 mm per month during the winter half of the year.
This fact would suggest a lengthy period during which no
breeding sites are available in these areas. The nonlinear
nature of the relation between winter rainfall and malaria
incidence suggests that some places are below the threshold
needed before additional rain can be associated with a large
increase in malaria incidence. The distribution of winter
rainfall in this area is a plausible constraint on vector survival in winter and hence on malaria transmission.
Under these circumstances, mosquitoes may be forced to
overwinter around permanent water bodies from which popAm J Epidemiol Vol. 153, No. 12, 2001
1219
ulations can spread out as more transient water bodies form
after early summer rains. Permanent water bodies may be a
last-resort breeding habitat for Anopheles gambiae because
of the presence of predators (25). However, these water bodies are often surrounded by smaller, less permanent ones
that are more suitable for breeding. Whatever the dynamics
of survival around these sites, our model shows that their
proximity is an additional risk factor for malaria. As a result
of the nonlinearity of the relation, the effect of a unit
increase in distance from water bodies attenuates with the
distance from them.
Spatial correlation in the model residuals up to 20 km
between pairs of sections (figure 2) suggests that there are
unmeasured spatially structured sources of variation not
accounted for by the model. The range of this spatial pattern
in residuals is in excess of the flight distances of mosquitoes
(6). Some of this unexplained variation could be due to deviation of climatic effects during the particular study year from
the pattern of the long-term averages used in the analysis.
Other possible sources of spatially dependent variation in
incidence rates may be the diligence with which insecticide
spraying was carried out by teams in particular sections,
emergence of insecticide and drug resistance, and differences
in the host populations.
The two significant climatic factors in our analysis both
influence survival of vector populations in winter.
Therefore, an important conclusion is that in this area, control activities can become more effective by focusing on the
eradication of winter populations of vectors. Any areas with
a combination of high average daily maximum temperature
and high winter rainfall are at risk. The additional risk posed
by water bodies should be considered when new irrigation
schemes are envisaged. The highly uneven distribution of
malaria in the Ngwavuma and Ubombo districts justifies an
uneven application of control activities, with more effort
being concentrated in the higher risk areas.
This study had a number of limitations that could have
affected the results. First, only 83 percent of the cases could
be allocated to their geographic areas. We did not include
the remaining 17 percent of the cases in the spatial analysis
because we assumed that they were either transients or constituted a random sample of all cases. Second, incidence
rates could not be adjusted for age and sex, since demographic data were not available for the population counts.
Previous studies have shown that the age distribution of
cases in this area closely follows the age distribution of the
population, so adjustment for age would have had a negligible effect on the results (10). Third, large databases of
reporting data are always prone to inaccuracies, omissions,
and duplicates (26). As long as these errors are distributed
randomly, any significant associations are attenuated (27).
Fourth, potential bias caused by the system of active case
finding cannot be ruled out, but we have no reason to think
that such underreporting favored some areas over others. A
recent health survey conducted in this area showed that
health-seeking behavior in general does not appear to be
affected by the distance of a residence from health facilities
(Joyce Tsoka, Medical Research Council, Durban, personal
communication, 1999).
1220
Kleinschmidt et al.
Our study accounted for spatial correlation in the data by
iteratively improving the measurement of residual correlation and subsequently specifying the covariance matrix of
the data in a GLMM. Ignoring spatial correlation can result
in explanatory variables apparently being associated with
incidence as a result of overstatement of the degrees of freedom in the data and consequent underestimation of the sizes
of standard errors. In our analysis, the standard errors of the
regression coefficients were underestimated by as much as
35 percent in the model that ignored spatial effects compared
with the model that adjusted for spatial effects. The advantage of using the GLMM approach is that it can be extended
to incorporate additional data requiring further random
effects to be specified. For example, we would like to extend
this analysis to consider several years of data for the same
small areas by specifying the year of each annual count as a
random effect and allowing for temporal correlation. Such
spatial-temporal analysis would show whether unusual
weather conditions in a given winter predict unusual malaria
incidence during the ensuing malaria season.
11.
12.
13.
14.
15.
16.
17.
18.
19.
ACKNOWLEDGMENTS
20.
21.
The authors thank Dr. Tom Smith for his critical comments on an earlier draft of this paper and Carrin Martin for
providing the digitized data on water bodies.
22.
23.
REFERENCES
24.
1. Sharp BL, Le Sueur D. Malaria in South Africa—the past, the
present and selected implications for the future. S Afr Med J
1996;86:83–9.
2. Towards an atlas of malaria risk in Africa: first technical report
of the MARA/ARMA collaboration. Durban, South Africa:
MARA/ARMA, December 1998. (http://www.mara.org.za).
3. Snow RW, Gouws E, Omumbo JA, et al. Models to predict the
intensity of Plasmodium falciparum transmission: applications
to the burden of disease in Kenya. Trans R Soc Trop Med Hyg
1998;92:601–6.
4. Kleinschmidt I, Bagayako M, Clarke GPY, et al. A spatial statistical approach to malaria mapping. Int J Epidemiol 2000;29:
355–61.
5. Molineaux L. The epidemiology of human malaria as an explanation of its distribution, including some implications for its
control. In: Wernsdorfer WH, McGregor I, eds. Malaria: principles and practice of malariology. Vol 2. London, United
Kingdom: Churchill Livingstone, 1988:445, 936–8.
6. Gillies MT, De Meillon B. The Anophelinae of Africa south of
the Sahara. Johannesburg, South Africa: The South African
Institute for Medical Research, 1968:212–19.
7. Ghebreyesus TA, Haile M, Witten KH, et al. Incidence of
malaria among children living near dams in northern Ethiopia:
community based incidence survey. BMJ 1999;319:663–6.
8. Martens P, Hall L. Malaria on the move: human population
movement and malaria transmission. Emerg Infect Dis
2000;March–April:6. (Online serial, World Wide Web URL:
http://www.cdc.gov/ncidod/EID/eid.htm).
9. Snow RW, Marsh K, LeSeur D. The need for maps of transmission intensity to guide malaria control in Africa. Parasitol
Today 1996;12:455–7.
10. Sharp BL, Ngxongo S, Botha MJ, et al. An analysis of 10 years
25.
26.
27.
28.
29.
30.
of retrospective malaria data from the KwaZulu areas of Natal.
S Afr J Sci 1988;84:102–6.
Craig MH, Snow RW, Le Sueur D. A climate-based distribution model of malaria transmission in Africa. Parasitol Today
1999;15:105–11.
Sharp B, Fraser C, Naidoo K, et al. Computer assisted health
information system for malaria control. Presented at the MIM
(Multilateral Initiative on Malaria) African Malaria
Conference, Durban, South Africa, March 14–19, 1999.
Hutchinson MF, Nix HA, McMahon JP, et al. Climate data: a
topographic and climate database. (CD-ROM). Canberra,
Australia: Centre for Resource and Environmental Studies,
The Australian National University, 1995.
Clark Labs. Idrisi for Windows version 2.008. Worcester, MA:
The Idrisi Project, Clark University, 1998.
Cuzick J, Elliott P. Small-area studies: purpose and methods.
In: Cuzick J, Elliott P, eds. Geographical and environmental
epidemiology: methods for small area studies. Oxford, United
Kingdom: Oxford University Press, 1992:14–21.
Kafadar K. Smoothing geographical data, particularly rates of
disease. Stat Med 1996;15:2539–60.
Potthoff RF, Whittinghill M. Testing for homogeneity. II. The
Poisson distribution. Biometrika 1966;53:167–82.
Walter SD. A simple test for spatial pattern in regional health
data. Stat Med 1994;13:1037–44.
Littell RC, Milliken GA, Stroup WW, et al. SAS system for
mixed models. Cary, NC: SAS Institute Inc, 1996:229–251, 305,
307, 423–60.
Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York, NY: Springer, 2000:80.
SAS Institute Inc. SAS system version 6.12. Cary, NC: SAS
Institute Inc, 1996.
Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk variables in epidemiology.
Int J Epidemiol 1999;28:964–74.
Le Sueur D. The ecology, over-wintering and population
dynamics of the pre-imaginal stages of the Anopheles gambiae
Giles complex (Diptera: Culicidae) in northern Natal, South
Africa. Doctoral thesis. Natal, South Africa: University of
Natal, Pietermaritzburg, 1991:134–85.
Bruce-Chwatt LJ. Essential malariology. London, United
Kingdom: Heineman Medical Books Ltd, 1980:149–59.
Le Sueur D, Sharp BL. The breeding requirements of three
members of the Anopheles gambiae Giles complex in the
endemic malaria area of Natal, South Africa. Bull Entomol Res
1988;78:549–60.
Kleinschmidt I, Pattenden P, Walls P, et al. A national health
statistics database: data quality requirements for small area
health studies. In: Frederiksen P, ed. Proceedings of Eurocarto
XII Conference on Geo related databases, Copenhagen,
Denmark, October 1994.
MacMahon S, Peto R, Cutler J, et al. Blood pressure, stroke,
and coronary heart disease. Part 1. Prolonged differences in
blood pressure: prospective observational studies corrected for
the regression dilution bias. Lancet 1990;335:765–74.
McCullagh P, Nelder JA. Generalised linear models. 2nd ed.
New York, NY: Chapman and Hall, 1989:37–40.
Cressie NAC. Statistics for spatial data. New York, NY: John
Wiley & Sons, 1991.
GS+ geostatistics for the environmental sciences, version
3.10.2 beta. Plainwell, MI: Gamma Design Software, 1998.
APPENDIX
Iterative Procedure for Specifying Spatial Dependence
in the Generalized Linear Mixed Model
The SAS procedure MIXED (SAS Institute, Inc., Cary,
North Carolina) is an implementation of mixed-model
Am J Epidemiol Vol. 153, No. 12, 2001
Spatial Analysis of Malaria Incidence Rates
methodology for Gaussian response variables. For nonGaussian responses, such as incidence rates, the macro
GLIMMIX implements the generalized linear mixed model.
GLIMMIX produces estimates via the PROC MIXED procedure (SAS Institute, Inc.) and hence provides similar functionality. We used GLIMMIX to adjust for spatial dependence
in the regression analysis
Standard generalized linear model
With the classical generalized linear model (28), a vector
of observations y is assumed to have uncorrelated elements.
In our particular application, the model assumes that the yi
b,
are Poisson distributed; if mean(y) µ, then log(µ) Xb
a linear function of the explanatory variables X; var(y) V,
a diagonal matrix with unequal elements, since the variance
of a Poisson variate depends on the mean.
Linear mixed model allowing for a correlated error
structure
In the linear mixed model, as implemented in the SAS procedure MIXED, it is possible for observations, y, which are
assumed to be distributed normally, to have a spatial correlation structure. In particular, if dij denotes the distance between
points i and j, where observations yi and yj were made, V Iσ12 Fσ2, where Fij exp(–dij /ρ). The unknown parameters in this model, namely, σ12 (the nuggett), σ2 (the sill), and
ρ (the range), can be obtained from a variogram of the data,
σ21
as described below. They are jointly referred to as u £ σ2 § .
ρ
This particular spatial model is known as the exponential
model, and it effectively postulates that the correlation
increases as points occur closer together, whereas points at
distances greater than the range from one another are uncorrelated. Fij can be defined by other spatial models (19). In the
PROC MIXED implementation, only the variogram parameters, u, are specified so that the SAS procedure can be used to
calculate V. For nonnormally distributed data, u is specified
in the macro call to enable GLIMMIX to estimate the appropriate correlation matrix for a spatially correlated Poisson
model. However, since the Poisson model does not satisfy the
requirement of constant variance, estimation of u is carried
out on standardized residuals, as explained below.
Variogram approach to estimating θ
In classical kriging (29), the following approach is
adopted: If it can be assumed that the mean value of y does
not change from point to point (the characteristic of stationarity), a variogram can be constructed. The pairs of
observations are arranged so that all pairs a given distance
of h±h/2 apart are pooled into one class, and the semivariance γ(h) 1⁄2 average(yi – yj)2 is calculated. The variogram is a plot of γ(h) against distance for distances h, 2h,
3h, 4h . . . etc. (19). Estimates of u are obtained graphically.
Am J Epidemiol Vol. 153, No. 12, 2001
1221
We used the software package GS-Windows (30) for this
purpose.
In our application, the variogram was not constructed
from y because of the nonstationarity and the Poisson distribution (implying nonconstant variance). Instead, we used
the signed deviance residuals calculated from r sign(y –
µ) 521ylog1y>µ2 y µ261>2, where y and µ are the
observed and fitted values of the response variable, respectively (28).
Fitting the GLMM and iteration between the variogram
and GLMM
The mixed-model estimates of b and u (19) can be
obtained jointly via the SAS procedure MIXED. However,
in the context of the GLIMMIX implementation (i.e., nonGaussian models), interpretation of the output relating to the
spatial parameters u is not straightforward, and convergence
problems can arise. Our approach was to fix u; use GLIMMIX to estimate b; and then, from the variogram of residuals, r, revise the estimate of u. In other words, accommodation of correlation between elements of y due to spatial
effects was achieved by converting from the observed y to
standardized residuals r, which have the property of constant variance and mean. The following steps were carried
out:
b̂0 of b was made by assuming no
1. An initial estimate b
spatial correlation, that is, with V as a diagonal matrix.
b̂0,
The deviance residuals r0 were calculated by using b
and a variogram was constructed by using GSWindows. Initial estimates uû0 of u were made as
described above.
2. GLIMMIX was then used with u being fixed at the
values uû0.
b̂1 was
3. From GLIMMIX, a new set of estimates b
found; hence, a new set of deviance residuals r1 was
calculated. These were used in a new cycle to redraw
the variogram and thus derive fresh estimates of uû1.
Steps 2 and 3 form an iterative cycle that continued until
there was no further change in the estimates. Assuming that
any spatial correlation was positive rather than negative,
b̂0 might have been underestimated rather
standard errors of b
than overestimated. Adjustment for spatial correlation may
therefore lead to removal of some variables from the model
whose contributions were overstated initially. The criterion
for dropping a variable from the model was p > 0.05.
Producing predicted values by kriging
Figure 2 shows that the residuals of the final model
remained spatially correlated. Therefore, map predictions
could be improved by kriging the residuals. We did so by
b̂ and then adding to this
calculating the log(mean y) using b
(on the log scale) the kriged predictor for the deviance residual, r, at that point (4). These kriged predictions were then
transformed back into predicted incidence rates (figure 3).