Combining data from multiple spatially referenced surveys

Combining data from multiple spatially referenced surveys:
geostatistical analysis of childhood malaria in Chikhwawa
District, Malawi
Emanuele Giorgi 1
Sanie S. S. Sesay 2
Peter J. Diggle1
Dianne J. Terlouw2
1
Medical School, Lancaster University, Lancaster, UK
2
Liverpool School of Tropical Medicine, Liverpool, UK
CHICAS
combining
health information,
computation and statistics
Leahurst, University of Liverpool, 4-5 November 2013
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
0
Overview
• Motivation: quality and temporal variation across surveys.
• Data: two Malaria Indicator Surveys and an Easy Access
Group study in Chikwawa District, Southern Malawi.
• Methods: a tri-variate generalised linear geostatistical
model.
• Monte Carlo maximum likelihood.
• Results: estimation and prediction.
• Discussion.
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
1
Introduction
Randomised and Convenience sampling
• Ideally every subject is drawn from the target population
and is chosen randomly, such that each individual has the
same probability of being chosen (random sample).
• Convenience sampling is a non-probability sampling
technique where subjects are selected because of their
convenient accessibility and proximity to the researcher.
• Convenience sampling is tempting in resource-poor settings
because it is more economical but it can potentially
produce bias estimates.
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
2
Introduction
Temporal variation
• When surveys are repeated over time, it may be of interest
to estimate changes over time.
• If surveys are correlated, there may be some in gain in the
use of a joint model.
• A joint model is usually more advantageous if surveys do not
use the same sampling locations.
• Surveys that are conducted further apart in time may not
may provide two independent pieces of information.
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
3
Introduction
The problem
1
How to combine data in a joint model?
2
How to account for bias from non-randomised surveys?
3
Is there any gain in spatial predictions from a joint model?
Some (non-spatial) answers from the literature: Moriarity and
Schoren (2001); Elliot and Davis (2005); Lohr and Rao (2006);
Manzi, Spiegelhalter, Turner, Flowers and Thompson (2011); Hedt
and Pagano (2011).
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
4
Data
Malaria and Malawi
Republic of Malawi
Capital: Lilongwe
Official language: English,
Chichewa
Population: 14.901.000
Data
Malaria and Malawi
Republic of Malawi
Capital: Lilongwe
Official language: English,
Chichewa
Population: 14.901.000
Data
Malaria and Malawi
Republic of Malawi
Capital: Lilongwe
Official language: English,
Chichewa
Population: 14.901.000
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
5
Data
Three surveys: 1st Malaria Indicator Survey (May 2010 - April 2011)
• Target population: children
under 5 years.
• Sampling scheme: (first step)
selection of 7 or 8 villages every
month, and each village twice
a year (rainy and dry season);
(second step); random
selection of households based
on a household list.
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
6
Data
Three surveys: 2nd Malaria Indicator Survey (May 2011 - April 2012)
0
2000
4000
• Target population: all groups of
age.
• Sampling scheme: (first step)
selection of 7 or 8 villages every
month, and each village twice
a year (rainy and dry season);
(second step); random
selection of households based
on a spin-the-bottle approach.
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
7
Data
Three surveys: Convenience survey (May 2011 - April 2012)
Percentage of access
0.4016 - 1.5261
1.5261 - 2.6506
2.6506 - 3.7751
3.7751 - 4.8996
4.8996 - 6.0241
6.0241 - 7.1486
7.1486 - 8.2731
8.2731 - 9.3976
9.3976 - 10.5221
10.5221 - 11.6466
OpenStreetMap
0
Emanuele Giorgi et al.
2000
• Target population: children
under 5 years.
• Sampling scheme: enrolment
of children who came to the
hospital for their childhood
vaccines.
4000
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
8
Methods
Model: a tri-variate GLGM (1)
Yij = ‘‘number of positive RDTs for the
j-th household in the i-th survey’’
Yij |
random effect ∼ Binomial(nij , pij )
Two main sources of heterogeneity across the three surveys:
• temporal variation of the underlying prevalence;
• quality-variation between randomised and convenience
surveys.
Spatial covariates: Insecticide Treated Net (ITN), Indoor Residual
Spraying (IRS), Rainy season (RS), Distance from closest waterway
(DW), Socio-Economic-Status (SES).
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
9
Methods
Model: a tri-variate GLGM (2)
2
1
S2
S1
B2
*
2
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
10
Methods
Model: a tri-variate GLGM (3)
• 1st Malaria Indicator Survey (May 2010 - April 2011)
log{p1j /(1 − p1j )} = d (x1j )> β + S1 (xij ) + Z1j , j = 1, . . . , 475.
• 2nd Malaria Indicator Survey (May 2011 - April 2012)
log{p2j /(1 − p2j )}
cor(S1 (x ), S2 (x ))
=
d (x2j )> β + S2 (x2j ) + Z2j , j = 1, . . . , 425.
= α ∈ (−1, 1).
• Convenience survey (May 2011 - April 2012)
log{p3j /(1 − p3j )}
=
d (x3j )> β + S2 (x3j ) + β ∗ SES3j + B(x3j ) + Z3j ,
j = 1, . . . , 249.
Si (x ) and B(x ) are stationary isotropic RGPs with exponential correlation for
i = 1, 2; Zij is Gaussian noise. We use Monte Carlo Maximum Likelihood to fit
the model to the data.
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
11
Methods
Monte Carlo maximum likelihood
Let T and θ denote the vector of the random effects and model
parameters, respectively, for given data y.
• Likelihood function
Z
L (θ) = fY (y; θ) =
fT (t ; θ)fY |T (y|t ) dt
(1)
Let f˜(y, t ) = fT (t , θ0 )fY |T (y|t ), then (1) can be expressed as
Z
L (θ) =
fT (t ; θ)fY |T (y|t )
f˜(y, t )
f˜(y, t ) dt ∝
Z
fT (t ; θ)
fT (t ; θ0 )
= ẼT |Y
Emanuele Giorgi et al.
[email protected]
f˜T |Y (t |y) dt
fT (t ; θ)
fT (t ; θ0 )
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
12
Results
Parameter estimates
Term
Intercept
ITN
IRS
RS
DW
SES
SES (spatial bias)
α
Emanuele Giorgi et al.
Estimate
0.130
-0.840
-0.485
0.550
-0.282
-0.150
-0.160
0.859
2.5%
-1.741
-1.004
-0.666
0.348
-0.742
-0.238
-0.239
0.483
[email protected]
97.5%
1.027
-0.680
-0.305
0.761
0.175
-0.065
0.072
0.924
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
13
Results
Prediction of S2 (x )
-15.95
-15.90
S
-16.00
2
1
-16.05
0
-16.10
-1
-16.15
-2
34.65
34.70
34.75
34.80
34.85
34.90
Results
Prediction of S2 (x )
-15.95
-15.90
S
-16.00
2
1
-16.05
0
-16.10
-1
-16.15
-2
34.65
Emanuele Giorgi et al.
34.70
34.75
34.80
34.85
34.90
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
14
Results
Prediction of B(x )
-15.95
-15.90
Bias (excluding SES)
-16.00
1.0
0.5
-16.05
0.0
-16.10
-0.5
-16.15
-1.0
34.65
34.70
34.75
34.80
34.85
34.90
Results
Prediction of B(x )
-15.90
Bias (excluding SES)
-15.95
SES
-16.00
1.0
Poor
Middle
Rich
OpenStreetMap
0.5
-16.05
0.0
-16.10
-0.5
-16.15
-1.0
34.65
34.70
34.75
34.80
34.85
34.90
0
2000
4000
Results
Prediction of B(x )
Bias (excluding SES)
0.5
-16.05
0.0
-15.95
1.0
-16.00
-16.00
1.0
0.5
0.0
-16.05
-15.95
-15.90
-15.90
Bias (including SES)
-1.0
-16.15
-16.15
-1.0
-0.5
-16.10
-16.10
-0.5
34.65
Emanuele Giorgi et al.
34.70
34.75
34.80
34.85
34.90
34.65
[email protected]
34.70
34.75
34.80
34.85
34.90
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
15
Results
0.8
1.0
1.2
1.4
Standard errors S2 (eMIS)
Emanuele Giorgi et al.
1.6
1.2
1.0
0.6
0.8
Standard errors S2 (eMIS + rMIS + EAG)
1.2
1.0
0.6
0.8
Standard errors S2 (eMIS + rMIS + EAG)
1.2
1.0
0.8
0.6
Standard errors S2 (eMIS + rMIS)
1.4
1.6
Prediction (3)
0.6
0.8
1.0
1.2
Standard errors S2 (eMIS +rMIS)
[email protected]
1.4
1.6
0.8
1.0
1.2
1.4
1.6
Standard errors S2 (eMIS)
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
16
Discussion
Further research
Potential applications in disease control are:
• development of computational procedures to inform
improved prospective data collection for efficient hybrid
sampling approaches;
• more accurate local spatio-temporal risk stratification maps
that can inform more targeted control efforts.
• it is time to rethink about convenience sampling.
• how do we test the gold-standard assumption?
• the method is applicable under any general form of biased
sampling but further validation is needed.
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
17
Discussion
Thanks for the attention
Emanuele Giorgi et al.
[email protected]
Lancaster University
Combining data from multiple spatially referenced surveys: geostatistical analysis of childhood malaria in Chikhwawa District, Malawi
18