Adjusting for sampling variability in disease mapping using an interval soft data model

Sampling Variability
• Disease rates are a common measure of disease risk since higher rates reflect greater chances for contracting the disease
• Mapping raw rates may obscure the underlying (latent) risk due to noise generated by sampling variability.
  - Dominated by areas with small populations since small changes to the observed cases will result in large changes to the rate
  - High rates based on small populations likely to be artificially elevated due to high variability in the estimates
  - Particularly affects maps of rare diseases calculated from health data indexed at a small geographical resolution (sparse data)

Sampling Variability
• Ex: Study Area Rate = Cases / Population = 2 / 2500 = 0.0008
• Subregional Rate = Cases / Population
  [Figure: the study area divided into nine subregions, labeled cases/population: 1/1000 (0.001), 0/500, 0/100, 0/500, 1/100 (0.01), 0/75, 0/100, 0/75, 0/50]

Sampling Variability
• Upscaling causes a loss in the resolution of the data
• ‘Spatial smoothing’ allows areas to borrow strength from neighboring regions to produce a more stable estimate of the value associated with each region. Ex:
  - Disk smoothing obtains a smoothed value for each region by averaging the rates of all regions falling within a defined neighborhood window
  - Empirical Bayes and Bayesian hierarchical modeling that ‘shrink’ rates toward a mean trend as estimated by known covariates (Clayton and Kaldor, 1987)

Model Definitions
• Disease risk may be defined as the probability of disease x occurring in a population of size n as n approaches infinity.
  $X_{it} = \lim_{n_{it} \to \infty} \frac{Y_{it}}{n_{it}}, \quad i = 1, \ldots, I; \; t = 1, \ldots, T$
• In real world situations, only a finite population $n_{it}$ may be sampled, producing a finite number of cases $Y_{it}$ and an observed rate $R_{it} = Y_{it} / n_{it}$.
• The difference between $R_{it}$ and $X_{it}$ may be expressed as:
  $R_{it} = X_{it} + \varepsilon_{it}$
  where $\varepsilon_{it}$ is the error due to sampling variability.
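The study-area example can be reproduced numerically. The sketch below is illustrative only: it assumes the example's subregions read as nine (cases, population) pairs, whose totals match the 2 cases and 2,500 people of the study area, and the order of the pairs is arbitrary. The names `subregions`, `rate`, and `one_case_shift` are hypothetical. It shows how a single case moves the observed rate far more in a small subregion than in a large one.

```python
# Assumed reading of the example: nine subregions as (cases, population).
# The ordering is arbitrary; only the totals are checked against the slide.
subregions = [
    (1, 1000), (0, 500), (0, 100),
    (0, 500), (1, 100), (0, 75),
    (0, 100), (0, 75), (0, 50),
]


def rate(cases: int, population: int) -> float:
    """Observed rate R = Y / n."""
    return cases / population


total_cases = sum(cases for cases, _ in subregions)
total_population = sum(population for _, population in subregions)
study_area_rate = rate(total_cases, total_population)

# Raw subregional rates: mostly zero, with two isolated nonzero values.
subregion_rates = [rate(cases, population) for cases, population in subregions]

# Change in the observed rate caused by one additional case: 1 / n.
one_case_shift = [rate(1, population) for _, population in subregions]

if __name__ == "__main__":
    print(f"study area: {total_cases}/{total_population} = {study_area_rate:.4f}")
    for (cases, population), r, shift in zip(subregions, subregion_rates, one_case_shift):
        print(f"{cases}/{population}: rate = {r:.4f}, one extra case adds {shift:.4f}")
```

Under this reading, the 1/100 subregion shows a rate 12.5 times the study-area rate on the strength of a single case, while one extra case in the 50-person subregion would shift its rate by 0.02.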
Model Definitions
• Given $X_{it}$ and $n_{it}$ measured without error and representative of the population at risk:
  $Y_{it} = \mathrm{round}(X_{it} \, n_{it})$
• Due to rounding error:
  $X_{it} \in \left[ \frac{Y_{it} - 0.5}{n_{it}}, \; \frac{Y_{it} + 0.5}{n_{it}} \right]$
• or
  $X_{it} \in \left[ R_{it} - \frac{0.5}{n_{it}}, \; R_{it} + \frac{0.5}{n_{it}} \right]$
• Smaller population sizes generate larger confidence intervals and higher variability around observed rates

Model Definitions
• In most datasets, there will be additional errors associated with the measurement of cases given a sample of the population at risk.
• To include additional uncertainty, we define a smoothing factor $\delta$ such that:
  $\delta = 0.5 + \eta$
  where $\eta$ is the error in observed cases due to random effects other than sample size.
• Therefore,
  $X_{it} \in \left[ R_{it} - \frac{\delta}{n_{it}}, \; R_{it} + \frac{\delta}{n_{it}} \right]$
• In the special case that $Y_{it}$ and $n_{it}$ are measured without error, $\eta = 0$ and $\delta = 0.5$.

Conceptual Example
[Figure: panels labeled "Random number generator", $Y_{it} = \mathrm{round}(X_{it} \, n_{it})$, and $R_{it} = Y_{it} / n_{it}$]

Spatial Covariance
[Figure: experimental covariance (points) and covariance model (line) for (a) the observed rate field R(s) and (b) the risk field X(s)]

Model Fit
[Figure: MSE: 1.52e-06; MSE: 6.43e-07]

Observed Rate as Risk Estimate
[Figure: MSE: 1.52e-06; MSE: 2.75e-07]

Application: Spatio-temporal Mapping of HIV in North Carolina
[Figure: map of North Carolina with city labels (Asheville, Charlotte, Fayetteville, Greensboro, Raleigh, Wilmington), interstate highways (I-40, I-77, I-85, I-95), and urbanized areas]
[Figure: North Carolina VCT Locations, a map of VCT sites]

Hard/Soft Data Cross-validation

Threshold (percentile of testing population)      0.6        0.7        0.8        0.9
MSE, hard data                                    0.000053   0.000051   0.000048   0.000054
MSE, soft data ($\delta = 0.5$)                   0.000054   0.000049   0.000043   0.000045
Error reduction (% MSE from hard to soft data)    1.7        -3.6       -11.6      -17.3

[Figure legend: hard data; soft data with $\delta = 0.5$, $\delta = 1.0$, and $\delta = 2.0$]
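The interval bounds defined under Model Definitions can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `soft_interval` and `simulate_cases` are hypothetical, the smoothing factor is passed as a `delta` argument, the latent risk of 0.0008 is a made-up value, and lower bounds are left unclipped even where they fall below zero.

```python
from __future__ import annotations


def soft_interval(cases: int, population: int, delta: float = 0.5) -> tuple[float, float]:
    """Return (R - delta/n, R + delta/n), where R = cases / population."""
    if population <= 0:
        raise ValueError("population must be positive")
    observed_rate = cases / population
    half_width = delta / population
    # Assumption: the lower bound is not clipped at zero.
    return observed_rate - half_width, observed_rate + half_width


def simulate_cases(risk: float, population: int) -> int:
    """Error-free case count Y = round(X * n)."""
    return round(risk * population)


if __name__ == "__main__":
    risk = 0.0008  # hypothetical latent risk X
    for population in (100, 1000, 2500):
        cases = simulate_cases(risk, population)
        lower, upper = soft_interval(cases, population)
        print(f"n={population}: Y={cases}, interval=({lower:.4f}, {upper:.4f})")
```

With `delta = 0.5`, one case in 100 people gives an interval from 0.005 to 0.015, while one case in 1,000 people gives 0.0005 to 0.0015, so the smaller population carries the wider interval. Raising `delta` widens every interval by the same factor.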