TWO-STAGE CASE-CONTROL STUDIES USING EXPOSURE ESTIMATES FROM A GEOGRAPHICAL INFORMATION SYSTEM

Jonas Björk (1) & Ulf Strömberg (2)
(1) Competence Center for Clinical Research
(2) Occupational and Environmental Medicine
Lund University Hospital

OUTLINE OF TALK
• Previous project: What have we done? (Jonas Björk)
• Ongoing project: What shall we do? (Ulf Strömberg)

Two-stage procedure for case-control studies
1st stage: Complete data obtained from registries
• Disease status
• General characteristics
• Group affiliation (e.g. occupation or residential area)
• Group-level exposure XG
2nd stage: Individual exposure data for a subset of the 1st stage sample

Exposure database: group-level exposure
• JEM = Job Exposure Matrix
  Occupational group: proportion exposed
• GIS
  Residential group (area): average concentration of an air pollutant

JEM - proportion exposed
[Figure: proportion exposed (axis from 0 to 0.5) for Group 0 to Group 4. Most data typically in groups with low XG.]

Linear Relation between Proportion Exposed and Relative Risk
• No confounding between/within groups
Example: RR (exposed vs. unexposed) = 2.0

Proportion exposed XG    Average RR
0%                       1.0
10%                      0.10 * 2.0 + 0.90 * 1.0 = 1.1
50%                      0.50 * 2.0 + 0.50 * 1.0 = 1.5
100%                     2.0

Linear OR model: OR(XG) = 1 + β XG
XG = Exposure proportion
OR for exposed vs. unexposed = OR(1) = 1 + β
[Figure: OR(XG) plotted against XG from 0 to 1, rising linearly from 1 to OR(1). Most data typically in groups with low XG.]

Confounding between groups
• General confounders (e.g. gender and age) can normally be adjusted for
• Assuming no confounding within groups and no effect modification in any stratum sk:
  OR(XG; s1, s2, ..., sk) = (1 + β XG) exp(Σ γk sk)

Combining 1st and 2nd stage data
• Assumption: 2nd stage data are missing at random (MAR) conditional on disease status and 1st stage group affiliation
• For subjects with missing 2nd stage data: use 1st stage data to calculate the expected number of exposed/unexposed
• Expectation-maximization (EM) algorithm

EM algorithm (Wacholder & Weinberg 1994)
1. Select a starting value, e.g. OR = 1
2. E-step: Among the non-participants, calculate the expected number of exposed/unexposed cases and controls in each group
3. M-step: Maximize the likelihood for the observed + expected cell frequencies using the chosen risk model for individual-level data (not necessarily linear). This gives a new OR estimate
4. Repeat 2. and 3. until convergence

E-step in our situation (Strömberg & Björk, submitted)
ÔR = Current OR estimate
Complete the data in each group G:
• m0 controls with missing 2nd stage data:
  expected number of exposed = m0 * XG
• m1 cases with missing 2nd stage data:
  expected number of exposed = m1 * XG * ÔR / [1 + (ÔR - 1) * XG]

Simulated case-control studies
• 400 cases, 1200 controls in the 1st stage
• 2nd stage participation: 75% of the cases, 25% of the controls
• Selective participation of 2nd stage controls: Corr(Participation, XG) = 0, > 0 or < 0
• 1000 replications in each scenario
• True OR = 3

Simulations - Results

                      1st stage data only     2nd stage data only     EM method
                      (400 + 1200)            (300 + 300)             (400 + 1200)
Participation         OR   SD    Coverage     OR   SD    Coverage     OR   SD    Coverage
Corr(Part., XG) = 0   3.0  0.18  95.0%        3.0  0.23  95.6%        3.0  0.15  95.5%
Corr(Part., XG) < 0   3.0  0.18  95.0%        5.3  0.29  45.8%        3.0  0.15  95.0%
Corr(Part., XG) > 0   3.0  0.18  95.0%        1.8  0.20  32.9%        3.0  0.15  95.5%

SD = Empirical standard deviation of the ln(OR) estimates
Coverage = Coverage of 95% confidence intervals

Simulations - Conclusions
Combining 1st and 2nd stage data using the EM method can:
1. Improve precision
2. Remove bias from selective participation
The method is sensitive to errors in the (1st stage) external exposure data!

Simulations - Conclusions II
The EM method is sensitive to
1. Violations of the MAR assumption (conditional on disease status and 1st stage group affiliation)
2. Errors in the (1st stage) external exposure data

Ongoing methodological research project
• Focus on exposure estimates from a GIS

GIS data: NO2 (Scania)

Two-stage exposure assessment procedure
1st stage: XG represents mean exposure levels rather than proportion exposed
[Diagram: groups with XG = 4.8, XG = 10.1, XG = 20.1, ..., each containing individual exposures xi]
2nd stage: xi is a continuous, rather than a dichotomous, exposure variable

Assume a linear relation between xi and disease odds (cf. radon exposure and lung cancer [Weinberg et al., 1996]).
[Figure: Odds plotted against xi]
For the "only 1st stage" subjects: no bias is expected from using their XG values (Berkson errors), provided MAR in each group, independent of disease status.
EM method? Exposure variation in each group?

Two-stage exposure assessment procedure - related work
• Multilevel studies with applications to a study of air pollution [Navidi et al., 1994]: pooling exposure effect estimates based on individual-level and group-level models, respectively

Collecting data on confounders or effect modifiers at 2nd stage
1st stage: XG = mean exposure levels
[Diagram: groups with XG = 4.8, XG = 10.1, XG = 20.1, ..., each containing individual covariate values ci]
2nd stage: ci is a covariate, e.g. smoking history

Data on confounders or effect modifiers at 2nd stage - estimation of exposure effect
• Confounder adjustment based on logistic regression: pseudo-likelihood approach [Cain & Breslow, 1988]
• More general approach: EM method [Wacholder & Weinberg, 1994]

Design stage ("stage 0")
1st stage: How many geographical areas (groups)?
[Diagram: Group 1, Group 2, Group 3, ..., each marked "Subjects?"]
2nd stage: Fractions of the 1st stage cases and controls?

Design stage - related work
• Two-stage exposure assessment: power depends more strongly on the number of groups than on the number of subjects per group [Navidi et al., 1994]

References I
• Björk & Strömberg. Int J Epidemiol 2002;31:154-60.
• Strömberg & Björk. "Incorporating group-level exposure information in case-control studies with missing data on dichotomous exposures". Submitted.

References II
• Cain & Breslow. Am J Epidemiol 1988;128:1198-1206.
• Navidi et al. Environ Health Perspect 1994;102(Suppl 8):25-32.
• Wacholder & Weinberg. Biometrics 1994;50:350-7.
• Weinberg et al. Epidemiology 1996;7:190-7.
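
Supplementary sketch: EM iteration for a dichotomous exposure

The E-step rule from the slide "E-step in our situation" can be illustrated with a short Python sketch. It is not the authors' implementation. The class `Group`, its field names, and the function names are hypothetical and chosen only for illustration. The M-step assumes the simplest possible individual-level risk model, a single dichotomous exposure with no strata or covariates, for which the maximum-likelihood OR of the completed table is the pooled cross-product ratio. It also assumes that no completed cell is empty.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Counts for one exposure group G (all field names are illustrative)."""
    xg: float                  # group-level proportion exposed, X_G
    cases_exposed: float       # 2nd stage cases observed as exposed
    cases_unexposed: float     # 2nd stage cases observed as unexposed
    controls_exposed: float    # 2nd stage controls observed as exposed
    controls_unexposed: float  # 2nd stage controls observed as unexposed
    m1: float                  # cases with missing 2nd stage data
    m0: float                  # controls with missing 2nd stage data


def expected_exposed_controls(m0, xg):
    """E-step for controls: m0 * X_G."""
    return m0 * xg


def expected_exposed_cases(m1, xg, or_hat):
    """E-step for cases: m1 * X_G * OR / [1 + (OR - 1) * X_G]."""
    return m1 * xg * or_hat / (1.0 + (or_hat - 1.0) * xg)


def em_odds_ratio(groups, or_start=1.0, tol=1e-10, max_iter=500):
    """Alternate E- and M-steps until the OR estimate stabilises.

    Assumption: the M-step uses the pooled cross-product ratio, which is
    the ML estimate only for an unstratified dichotomous-exposure model.
    """
    or_hat = or_start
    for _ in range(max_iter):
        a = b = c = d = 0.0  # completed 2x2 table, summed over groups
        for g in groups:
            e1 = expected_exposed_cases(g.m1, g.xg, or_hat)
            e0 = expected_exposed_controls(g.m0, g.xg)
            a += g.cases_exposed + e1
            b += g.cases_unexposed + (g.m1 - e1)
            c += g.controls_exposed + e0
            d += g.controls_unexposed + (g.m0 - e0)
        or_new = (a * d) / (b * c)  # M-step: cross-product ratio
        if abs(or_new - or_hat) < tol:
            return or_new
        or_hat = or_new
    return or_hat


if __name__ == "__main__":
    # One hypothetical group with X_G = 0.10; the observed 2nd stage
    # counts were constructed to be consistent with an OR of 3.
    example = [Group(0.10, 25, 75, 10, 90, m1=100, m0=900)]
    print(round(em_odds_ratio(example), 4))
```

In this constructed example the iteration starts at OR = 1 and should settle at an OR of about 3, because the observed counts were built to agree with that value. With general confounders or a linear OR model, the M-step would instead need a fitted risk model, as described in step 3 of the EM algorithm.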