American Journal of Epidemiology
© The Author 2007. Published by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected].
Vol. 167, No. 3
DOI: 10.1093/aje/kwm292
Advance Access publication November 6, 2007

Practice of Epidemiology

Census and Geographic Differences between Respondents and Nonrespondents in a Case-Control Study of Non-Hodgkin Lymphoma

Min Shen1, Wendy Cozen2, Lan Huang3, Joanne Colt1, Anneclaire J. De Roos4,5, Richard K. Severson6,7, James R. Cerhan8, Leslie Bernstein2, Lindsay M. Morton1, Linda Pickle3, and Mary H. Ward1

1 Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD.
2 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA.
3 Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD.
4 Fred Hutchinson Cancer Research Center, Seattle, WA.
5 Department of Epidemiology, School of Public Health and Community Medicine, University of Washington, Seattle, WA.
6 Department of Family Medicine and Public Health Sciences, School of Medicine, Wayne State University, Detroit, MI.
7 Karmanos Cancer Institute, Detroit, MI.
8 Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN.

Received for publication April 27, 2007; accepted for publication September 1, 2007.

To quantify nonresponse bias and estimate its potential impact, the authors compared census-based socioeconomic and demographic factors and geographic locations among respondents and nonrespondents in a multicenter case-control study of non-Hodgkin lymphoma (1998–2000). Using a geographic information system, the authors mapped current addresses and linked them to the 2000 US Census database to determine group-level demographic and socioeconomic information. They used logistic regression analysis to compute the risk of being a nonrespondent, separately for cases and controls.
They used spatial scan methods to evaluate spatial clustering at each study center. Among cases at one or more centers, nonresponse was significantly associated with non-White race, lower household income, a greater proportion of multiple-unit housing, fewer years of education, and living in a more urbanized area. For most factors, the authors observed similar patterns among controls, although findings were mostly nonsignificant. They found two nonrandom elliptical clusters in Los Angeles, California, and Detroit, Michigan, that disappeared after adjustment for the demographic factors. The authors determined the bias in non-Hodgkin lymphoma risk associated with census-tract educational level by comparing risks among respondents and all subjects. The bias was 8%, indicating that the socioeconomic and demographic differences between respondents and nonrespondents did not result in a large bias in the risk estimate for education.

bias (epidemiology); case-control studies; censuses; epidemiologic methods; geographic information systems; lymphoma, non-Hodgkin

Abbreviations: NCI, National Cancer Institute; NHL, non-Hodgkin lymphoma; SEER, Surveillance, Epidemiology, and End Results.

Correspondence to Dr. Mary H. Ward, Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Blvd., Executive Plaza South 8006, MSC 7240, Rockville, MD 20892-7240 (e-mail: wardm@mail.nih.gov).

Am J Epidemiol 2008;167:350–361

Case-control studies have yielded important insights into risk factors for many diseases. Studies of the etiology of cancer and other rare diseases often require a case-control design with multiple data collection components. Investigators carrying out such studies face increasing challenges in obtaining sufficiently high response rates (1).
Nonresponse bias occurs when participating cases, controls, or both differ from nonparticipants with respect to the distribution of disease risk factors (2). When patterns of nonresponse differ between cases and controls, biased risk estimates can result. Higher rates of nonresponse increase the likelihood of nonresponse bias and errors in risk estimates. Through the Surveillance, Epidemiology, and End Results (SEER) Program, the National Cancer Institute (NCI) compiles data on cancer incidence and survival from numerous population-based cancer registries. We conducted a large, multicenter, population-based case-control study of non-Hodgkin lymphoma (NHL) at four SEER centers: Seattle, Washington; Los Angeles, California; Detroit, Michigan; and the state of Iowa (the NCI-SEER NHL Study). The study examined various environmental exposures as possible risk factors for NHL, including pesticides, occupational exposures, hair dyes, tobacco and alcohol, dietary factors, and viruses. Because response rates were relatively low and because many of the exposures of interest may vary by race, sex, age, socioeconomic status, and geographic location, we compared these factors among participating and nonparticipating cases and controls to evaluate the potential for nonresponse bias. MATERIALS AND METHODS Study design and data sources The methods of the NCI-SEER NHL Study have been described in detail elsewhere (3). A total of 2,248 potentially eligible cases were identified in four population-based SEER cancer registries from July 1, 1998, through June 30, 2000. We did not approach 520 NHL cases because they had died (14 percent), their physician refused (3 percent), they could not be found (6 percent), or they had moved out of the area (1 percent). Of the remaining 1,728 cases, 274 (16 percent) refused to participate and 133 (8 percent) were not interviewed for reasons such as illness, impairment, or our inability to contact them. 
We selected controls stratified on age, sex, race, and center. Controls aged 20–64 years were selected by one-step list-assisted random digit dialing, sampling from banks with one or more telephone numbers listed in the White Pages (residential) directory. Telephone numbers were matched against the Yellow Pages (commercial) directory to remove business numbers and were autodialed to remove nonworking numbers. Controls aged 65 years or older were selected from Medicare files. Of 2,409 eligible controls selected from the general population in the four registry areas, 1,352 did not participate in the study because they either could not be located (13 percent), had died (1 percent), had moved out of the area (1 percent), refused (35 percent), or were not interviewed because of illness, a language barrier, or other reasons (6 percent). All participants were defined as respondents, and all selected eligible subjects who did not participate were defined as nonrespondents. The overall response rates were 58.8 percent for cases and 43.9 percent for controls.

Using a geographic information system, we linked the current address of each respondent and nonrespondent to the 2000 US Census database to compare respondents with nonrespondents on the basis of demographic and socioeconomic factors. The addresses of respondents and nonrespondents were geocoded using ArcView 3.2 software (ESRI, Inc., Redlands, California) and Geographic Data Technology’s MatchMaker SDK Professional Version 4.3 street database (Geographic Data Technology, Inc., Lebanon, New Hampshire) (4). Addresses for 2,183 of 2,378 respondents (92 percent) and 1,996 of 2,279 nonrespondents (88 percent) were located or were matched to the nearest intersection; the remainder could not be geocoded because the addresses were incomplete or could not be found in the database.
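The response and geocoding percentages quoted above follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the counts come from the text, while the variable and function names are illustrative only.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Case counts as reported in the text.
eligible_cases = 2248
cases_not_approached = 520
cases_refused = 274
cases_not_interviewed = 133
case_respondents = (eligible_cases - cases_not_approached
                    - cases_refused - cases_not_interviewed)

# Control counts as reported in the text.
eligible_controls = 2409
control_nonrespondents = 1352
control_respondents = eligible_controls - control_nonrespondents

# Response rate = respondents / all potentially eligible subjects.
case_response_rate = percent(case_respondents, eligible_cases)
control_response_rate = percent(control_respondents, eligible_controls)

# Geocoding success, using the counts given for each group.
geocoded_respondents = percent(2183, 2378)
geocoded_nonrespondents = percent(1996, 2279)

print(f"Case response rate: {case_response_rate:.1f}%")
print(f"Control response rate: {control_response_rate:.1f}%")
print(f"Geocoded respondents: {geocoded_respondents:.0f}%")
print(f"Geocoded nonrespondents: {geocoded_nonrespondents:.0f}%")
```

Note that the denominator for the case response rate is all 2,248 potentially eligible cases, including those who were never approached.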
Among both cases and controls, persons with addresses that could be geocoded were comparable to those whose addresses could not be geocoded with respect to age and gender. Information on race and education was missing for many nonrespondents, but among those with this information, there were no substantial differences based on geocoding status. Geocoded addresses were linked to the corresponding 2000 US Census geographic units (block, block group, and census tract). We also linked the addresses to the 1990 US Census (block group) to obtain information on water source for each subject, which was not available in the 2000 Census. For 117 subjects, the geocoded location of their residence was linked to a census block with no population, probably because of a geocoding error. These subjects were excluded, leaving 2,109 respondents (1,166 cases, 943 controls) and 1,953 nonrespondents (783 cases, 1,170 controls) in the analysis. Data analysis The decennial census data are available in hierarchical order of increasing geographic and population size (e.g., census block, block group, and census tract). Information on block demographic factors and block group socioeconomic factors and housing characteristics was compiled from the questions asked of all people (Summary File 1) and a one-in-six sample of all people (Summary File 3), respectively. We obtained this information from the US Census Bureau website (http://www.census.gov) for each census unit where respondents and nonrespondents lived. All variables except for block-group median household income were listed in the Census database as the number of people in the census unit defined by the particular demographic factor (table 1). We calculated the percentage of the census-unit population in each subgroup (e.g., the proportion of non-Hispanic Whites in a census block). We also calculated years of education for specific demographic subgroups as a weighted average across the census unit. 
We used a weight of 5 years for “less than 9th grade,” 10 for “9th–12th grade, no diploma,” 12 for “high school graduate (includes equivalency),” 14 for “some college, no degree,” 15 for “associate degree,” 16 for “bachelor’s degree,” and 18 for “graduate or professional degree.”

TABLE 1. US Census data sources and derived variables in a US multicenter study of non-Hodgkin lymphoma, 1998–2000

Data source | Level | Median population of census unit (and interquartile range) | Variables | Derived variables
Census 2000 (SF* 1) | Census block | 98 (48–214) | Population stratified by age, sex, and race | Proportion of people by race
Census 2000 (SF 3) | Block group | 1,226 (902–1,683) | Median household income in 1999; population stratified by housing type; population stratified by sex, race, and educational attainment; population in urbanized and rural areas | Proportion of people with different housing types; estimated educational attainment by race and sex; proportion of people living in urbanized areas (Iowa)
Census 2000 (SF 3) | Tract | 3,269 (2,513–4,248) | Population stratified by sex, age, and educational attainment | Estimated educational attainment by sex and age
Census 1990 (SF 3) | Block group | 1,226 (902–1,683) | Source of water (public system, private well, or other source) | Proportion of people using a public water system (Iowa)

* SF, Summary File.

Self-reported information on race was available for nonrespondents in Detroit and Los Angeles to allow for oversampling of African Americans at the two study centers (3); this was not the case for Seattle and Iowa. For persons with self-reported information on individual race, we categorized race into White (designated 1) versus non-White (designated 0). For persons with missing information on race, we calculated the proportion of Whites by sex and 10-year age group in the census blocks where they resided, and we assigned the appropriate sex- and age-specific proportion of Whites (range, 0–1).
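The weighted-average years of education described earlier in this section can be sketched as follows. The weights are those stated in the text; the function name, the dictionary layout, and the example counts are hypothetical and are not taken from the study's analysis code.

```python
from typing import Mapping

# Years of schooling assigned to each census attainment category,
# as stated in the text.
EDUCATION_YEARS = {
    "less than 9th grade": 5,
    "9th-12th grade, no diploma": 10,
    "high school graduate (includes equivalency)": 12,
    "some college, no degree": 14,
    "associate degree": 15,
    "bachelor's degree": 16,
    "graduate or professional degree": 18,
}


def mean_years_of_education(counts: Mapping[str, int]) -> float:
    """Weighted average of years of education for one census unit.

    `counts` maps each attainment category to the number of people
    in the demographic subgroup of interest.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no population in this subgroup")
    weighted = sum(EDUCATION_YEARS[cat] * n for cat, n in counts.items())
    return weighted / total


# Hypothetical census unit: 50 high school graduates, 50 bachelor's degrees.
example_counts = {
    "high school graduate (includes equivalency)": 50,
    "bachelor's degree": 50,
}
example_mean = mean_years_of_education(example_counts)
print(example_mean)
```

For the hypothetical unit shown, the result is (12 × 50 + 16 × 50)/100 = 14 years.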
There was a good correlation between the census block and individual race variables for persons with individual race information (Spearman correlation, r = 0.60).

The amount of cropland surrounding the residence was found to be a predictor of residential exposure to agricultural pesticides among respondents in Iowa (5). We computed the average crop acreage within 500 m of the residence during 1998–2000 for each of 196 respondents and 112 nonrespondents in Iowa whose residence fell within the boundaries of crop maps for central Iowa. As previously described (5), maps for 1998, 1999, and 2000 were created from spring and summer Landsat multispectral satellite images from two path/rows in south-central Iowa, using Farm Service Agency records for validation.

We used logistic regression to estimate the risk (odds ratio) of being a nonrespondent for cases and controls separately and by study center. We categorized the proportions of census units comprising different races, incomes, and housing types into four levels based on the quartile cutpoints among all respondents, by center, using the lowest quartile as the reference group. We categorized education into three levels: <12 years, 12–15 years, and ≥16 years. Crop acreage near homes was categorized into three levels based on tertiles among the respondents. We adjusted all models for individual-level information on sex, age (continuous), and race (White, non-White). The census race variable was used when information on individual race was missing. All statistical tests were two-sided.

Seattle, Los Angeles, and Detroit are highly urbanized areas, with 97 percent of the population using public water supplies. However, Iowa has more variation in the percentage of the census block group population using public water supplies and living in urbanized areas. Therefore, we evaluated these variables for Iowa only and categorized them into two levels (<50 percent vs. ≥50 percent) because the distributions tended to be bimodal.
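The quartile-based categorization described above (cutpoints taken from respondents, then applied to every subject) can be illustrated with a minimal Python sketch. The quantile algorithm used in the study is not stated, so the `method="inclusive"` choice, the function names, and the example values are assumptions made for illustration.

```python
import statistics
from bisect import bisect_left
from typing import List, Sequence


def quartile_cutpoints(respondent_values: Sequence[float]) -> List[float]:
    """Three cutpoints dividing the respondents' values into quartiles.

    Assumption: linear interpolation with the 'inclusive' method; the
    original analysis may have used a different quantile definition.
    """
    return statistics.quantiles(respondent_values, n=4, method="inclusive")


def quartile_level(value: float, cutpoints: Sequence[float]) -> int:
    """Level 0 is the lowest quartile (reference group), 3 the highest.

    Intervals are closed on the right, matching table categories of the
    form "<=a", ">a-b", ">b-c", ">c".
    """
    return bisect_left(cutpoints, value)


# Hypothetical census-unit percentages among respondents at one center.
respondent_values = [0, 10, 20, 30, 40]
cutpoints = quartile_cutpoints(respondent_values)

# Apply the respondent-based cutpoints to respondents and nonrespondents alike.
levels = [quartile_level(v, cutpoints) for v in (5, 10, 10.5, 30, 35)]
print(cutpoints, levels)
```

Deriving the cutpoints from respondents only, and then applying them to nonrespondents, keeps the reference group defined identically for both groups within a center.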
We used stepwise logistic regression to identify the most significant variables associated with nonresponse at each center using significance levels of 0.20 and 0.15 for entering an effect into the model and keeping it in the model, respectively. Age, sex, and race were retained in all models. We assessed spatial autocorrelation by repeating the logistic regression analysis with the same covariates plus additional random effects for latitude and longitude of the geocoded residences, using the GLIMMIX procedure in SAS (6). No unexplained spatial autocorrelation was found, so we present results from the fixed-effects logistic models. We evaluated the potential for bias in the estimate of NHL risk associated with census tract educational level and median household income by comparing the odds ratio based on respondents only with that based on all subjects. Educational level and household income were analyzed because they are major components of socioeconomic status and they may be correlated with certain occupational and environmental exposures. We analyzed spatial clustering of respondents and nonrespondents separately for cases and controls at each center, using a spatial scan method based on a Bernoulli model developed by Kulldorff and colleagues (7–9). The maximum size of the geographic cluster was set to include no more than 49 percent of all subjects. The analysis identified circular or elliptical clusters in which the ratio of nonrespondents to respondents differed from the overall ratio in the study population. We compared the demographic and socioeconomic characteristics of individuals within a significant cluster to those of persons outside the cluster using a t test, or the Satterthwaite method if variances were not equal. 
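A Bernoulli-model spatial scan compares the proportion of nonrespondents inside a candidate zone with the proportion outside it by means of a log likelihood ratio. The sketch below computes that ratio for a single zone using the standard Bernoulli form; it is not the software used in the study, the function names and example counts are hypothetical, and the Monte Carlo replication that yields the p value for the maximum ratio is omitted.

```python
import math


def _k_log_ratio(k: int, m: int) -> float:
    """Return k * ln(k / m), with the convention 0 * ln(0) = 0."""
    return 0.0 if k == 0 else k * math.log(k / m)


def bernoulli_llr(n_zone: int, c_zone: int, n_total: int, c_total: int) -> float:
    """Log likelihood ratio for one candidate zone under a Bernoulli model.

    n_zone  : subjects (respondents + nonrespondents) inside the zone
    c_zone  : nonrespondents inside the zone
    n_total : all subjects in the study area
    c_total : all nonrespondents in the study area
    """
    n_out = n_total - n_zone
    c_out = c_total - c_zone
    # Log likelihood allowing different proportions inside and outside.
    alt = (_k_log_ratio(c_zone, n_zone)
           + _k_log_ratio(n_zone - c_zone, n_zone)
           + _k_log_ratio(c_out, n_out)
           + _k_log_ratio(n_out - c_out, n_out))
    # Log likelihood under one common proportion everywhere.
    null = (_k_log_ratio(c_total, n_total)
            + _k_log_ratio(n_total - c_total, n_total))
    return alt - null


# Hypothetical counts: 100 subjects in the zone, 60 of them nonrespondents,
# against 500 subjects and 200 nonrespondents overall.
llr_excess = bernoulli_llr(100, 60, 500, 200)
# A zone with the same proportion as the whole area gives a ratio of zero.
llr_same = bernoulli_llr(100, 40, 500, 200)
print(llr_excess, llr_same)
```

In a full scan, this ratio would be evaluated over many circular or elliptical zones (subject to the maximum cluster size), and the largest value would be compared with its distribution under random relabeling of respondents and nonrespondents.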
To determine whether the spatial clustering was explained by the census factors that were associated with nonresponse, we conducted the spatial clustering analysis again using a normal model-based spatial scan method (9) on the residuals of the multivariate logistic model that included the census demographic and socioeconomic factors.

RESULTS

For Los Angeles, Detroit, Seattle, and Iowa, response rates were 52 percent, 55 percent, 62 percent, and 67 percent, respectively, among cases and 38 percent, 39 percent, 44 percent, and 58 percent, respectively, among controls. Table 2 shows the distribution of demographic and other individual-level information for respondents and nonrespondents. Overall, response rates were higher among female cases (62 percent) than among male cases (56 percent). Cases and controls who responded tended to be younger than nonrespondents (median age in cases, 58 years vs. 60 years; median age in controls, 61 years vs. 63 years). Self-reported information on race was not available from a high proportion of nonrespondents. Information on educational level was obtained from all respondents but only from a small proportion of nonrespondents who completed a brief questionnaire. The histologic distributions of NHL were significantly different between respondents and nonrespondents, with a greater proportion of follicular lymphomas and a lower proportion of diffuse large B-cell lymphomas being seen among respondents, mainly because of the higher fatality rate for diffuse large B-cell lymphomas as compared with follicular and other lymphomas. Approximately 18.0 percent of persons with diffuse large B-cell lymphomas were deceased when we attempted to contact them, as compared with 2.8 percent of persons with follicular lymphomas.
With the exception of race, the distributions of these factors were similar across the four centers, although for some centers the differences did not achieve statistical significance (results not shown). There were more African Americans at the Los Angeles and Detroit centers (18–31 percent) than at the Seattle and Iowa centers (<2.5 percent) (results not shown).

Tables 3–6 present comparisons of the census block, block group, or tract demographic and socioeconomic characteristics of respondents and nonrespondents at each study center. Los Angeles had a much higher percentage of Hispanics than the other centers. Among cases, the odds of being a nonrespondent increased significantly as the proportion of Hispanics increased. The odds decreased significantly with increasing proportions of non-Hispanic Whites, median household income, proportions living in single-family homes, and proportions in years-of-education categories among men and women overall and among Hispanics (table 3). Among controls, nonresponse was not statistically significantly associated with the census variables; however, as was observed for cases, odds ratios for nonresponse were inversely associated with years of education.

In Detroit (table 4), the odds of being a nonresponding control decreased significantly as median household income increased; the same pattern was observed among cases, although the trend was not statistically significant. Educational level was higher for both men and women in responding cases’ census tracts as compared with nonresponding cases’ census tracts. Among African-American women, a higher educational level was inversely associated with nonresponse. We observed similar patterns for educational level among male African-American cases and controls, although these did not reach statistical significance.
In Seattle (table 5), controls from census blocks with fewer Hispanics and more non-Hispanic Whites and from census block groups with higher household incomes were more likely to participate. Cases and controls from census block groups with ≥50 percent single-family housing units were more likely to participate; however, the association was statistically significant only among cases in the third quartile.

In Iowa (table 6), cases from census blocks with more Hispanics and fewer non-Hispanic Whites were less likely to participate in the study. Among cases and controls, being a nonrespondent was associated with living in an urbanized area, although the association was statistically significant only among cases. For both cases and controls, the proportions who obtained their water from a public utility were similar between respondents and nonrespondents. Although respondents were more likely to live in less urbanized areas, the acreage of cropland within 500 m of homes did not differ significantly between respondents and nonrespondents.

Several of the census variables we evaluated were correlated with each other (e.g., household income and living in a single-family home (r = 0.51), household income and educational level (r = 0.64)). Adjusting all of the census factors for each other in multivariable analyses, we found that, among cases, non-White race (all centers), older age (Detroit and Iowa), fewer years of education (Los Angeles and Detroit), male sex (Iowa and Seattle), and a lower percentage of single-family homes (Los Angeles) were statistically significant predictors of nonresponse. Among controls, non-White race (Los Angeles and Seattle; non-Whites were mostly African-American), lower income (Detroit), and fewer years of education (Seattle) were significant predictors of nonresponse.
We evaluated the association between census tract educational level and household income and risk of NHL among respondents only and among all eligible subjects (table 7). When results were based on respondents only, the odds ratios for 12–15 years of education versus <12 years (odds ratio = 1.41, 95 percent confidence interval: 1.12, 1.77) and for ≥16 years of education versus <12 years (odds ratio = 1.37, 95 percent confidence interval: 1.01, 1.86) were underestimated by 1 percent and 8 percent, respectively, as compared with the respective odds ratios for education among all subjects. The bias related to household income was 6 percent, 8 percent, and 7 percent for the three higher income levels, respectively, each compared with the lowest income level.

In the spatial analysis, we found a nonrandom elliptical cluster of 105 cases (p = 0.025) in Detroit in which there were fewer respondents than expected (35 vs. 59) (figure 1). In this cluster, there were fewer Whites (p < 0.001), and cases inside the ellipse were less educated (p < 0.001) than cases outside the ellipse. The differences by education and race were observed for both respondents and nonrespondents. In Los Angeles, there was also an elliptical cluster of 269 cases (p = 0.022) in which the number of respondents was lower than expected (109 vs. 140). There were fewer Whites inside the ellipse (p < 0.001) than outside, and subjects inside the ellipse were less educated (p < 0.001) than subjects outside the ellipse. The differences by education and race were seen for both respondents and nonrespondents. At both centers, spatial analysis of the residuals from the logistic regression analyses adjusted for census race, sex, age, income, education, and proportion of single-family housing units showed no significant clustering, indicating that all of the geographic clustering of nonrespondents could be explained by socioeconomic and demographic factors. There were no statistically significant geographic clusters of nonrespondents in Iowa or Seattle.

TABLE 2. Individual demographic information for respondents and nonrespondents, by case-control status, in a US multicenter study of non-Hodgkin lymphoma, 1998–2000
Controls Respondents (n = 1,057) Cases Nonrespondents (n = 1,352) No. % No. % Los Angeles, California 273 26 442 33 Detroit, Michigan 214 20 332 25 Seattle, Washington 294 28 376 Iowa 276 26 202 Respondents (n = 1,321) No. Nonrespondents (n = 927) % No. % 319 24 296 32 319 24 259 28 28 322 24 198 21 15 361 27 174 19 Study center p value* <0.0001 <0.0001 Sex Male 546 52 Female 511 48 p value* 730 54 711 54 622 46 610 46 0.25 555 60 372 40 28 3 0.004 Age group (years) <30 27 3 60 4 37 3 30–39 79 7 106 8 110 8 82 9 40–49 145 14 193 14 235 18 122 13 50–59 226 21 223 16 314 24 213 23 60–69 372 35 438 32 409 31 281 30 70–79 208 20 331 24 216 16 201 22 p value* 0.0008 0.006 Self-reported race White 843 80 725 54 1,123 85 332 36 Black, mixed Black/other 151 14 256 19 110 8 128 14 Asian 22 2 33 2 40 3 28 3 Other 16 2 62 5 21 2 52 6 Missing data 25 2 276 20 27 2 387 42 Educational level Unknown 0 5 Refused to answer 0 0 0.4 0 0 1 0.1 1 0.1 12 years 111 11 20 1 128 10 11 1 13–15 years 616 58 97 7 815 62 26 3 16 years 330 31 52 4 377 29 1,178 87 0 Missing data 0 17 2 872 94 Histologic typey Follicular 319 24 134 14 Diffuse 41 417 32 376 T-cell 82 6 90 8 Other 459 35 297 32 44 3 30 3 Unknown p value* <0.0001
* Based on the Pearson chi-squared test.
y Based on the Revised European–American Lymphoma/World Health Organization classification (21).

TABLE 3. Comparison of selected census variables between respondents and nonrespondents, by case-control status, at the Los Angeles, California, center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000
Controls Variable Cases 95% CI* No.
of respondents 85 54 1.0 0.90 0.55, 1.49 74 61 1.23 0.76, 2.01 86 0.79 0.48, 1.30 72 71 1.42 0.87, 2.30 119 0.88 0.55, 1.41 58 78 2.00 1.23, 3.26 No. of respondents No. of nonrespondents 7.5 46 79 1.0 >7.5–22 57 88 >22–50 59 >50 73 OR*,y No. of nonrespondents ORy 95% CI Proportion of Hispanics (%)z p-trend 0.54 0.0049 Proportion of non-Hispanic Whites (%)z 13 74 158 1.0 57 93 1.0 >13–42 67 76 0.63 0.40, 0.99 64 62 0.66 0.40, 1.08 >42–71 49 74 0.89 0.54, 1.47 82 65 0.57 0.35, 0.93 >71 45 64 0.88 0.51, 1.49 86 44 0.38 0.22, 0.65 p-trend 0.84 0.0004 Median household income in 1999z $36,000 63 132 1.0 68 96 >$36,000–$49,000 63 87 0.76 0.48, 1.20 69 62 0.70 0.44, 1.12 >$49,000–$66,000 64 82 0.70 0.44, 1.11 66 57 0.69 0.42, 1.11 >$66,000 45 71 0.91 0.55, 1.52 86 49 0.47 0.29, 0.76 p-trend 1.0 0.52 0.0032 Living in a single-family housing unit (%)z 28 57 89 1.0 74 79 1.0 >28–63 68 118 1.15 0.73, 1.81 63 78 1.06 0.66, 1.70 >63–91 61 88 0.93 0.58, 1.48 70 56 0.74 0.46, 1.19 >91 49 77 1.10 0.66, 1.83 82 51 0.61 0.38, 0.99 p-trend 0.99 0.020 Years of education for Hispanic men aged >25 years 131 222 1.0 143 166 1.0 12–15 83 123 1.02 0.70, 1.48 118 84 0.65 0.45, 0.94 16 15 16 0.75 0.36, 1.59 20 8 0.39 0.17, 0.93 <12 p-trend 0.69 0.0040 Years of education for Hispanic women aged >25 years <12 12–15 16 148 249 1.0 155 176 1.0 75 110 1.03 0.71, 1.49 114 78 0.66 0.46, 0.95 6 6 0.69 0.22, 2.21 11 8 0.68 0.26, 1.77 p-trend 0.86 0.033 Years of education for men aged 45–64 years <12 12–15 16 74 148 1.0 64 91 1.0 145 205 0.81 0.56, 1.17 198 157 0.62 0.42, 0.91 16 19 0.73 0.35, 1.54 27 16 0.49 0.24, 0.99 p-trend 0.23 0.0096 Years of education for women aged 45–64 years <12 12–15 16 p-trend 80 158 1.0 154 211 0.78 1 3 0.55, 1.12 —§ 0.24 * OR, odds ratio; CI, confidence interval. y Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race. z Quartile categories based on the distribution among all respondents. 
§ Odds ratio was not calculated because of small numbers. Am J Epidemiol 2008;167:350–361 77 99 1.0 210 164 0.68 2 1 0.47, 0.98 —§ 0.039 356 Shen et al. TABLE 4. Comparison of selected census variables between respondents and nonrespondents, by case-control status, at the Detroit, Michigan, center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000 Controls Variable Proportion of Hispanics (%)z 0 >0–0.38 >0.38–2.08 >2.08 p-trend Proportion of non-Hispanic Whites (%)z 60 >60–91 >91–97 >97 p-trend Median household income in 1999z $41,000 >$41,000–$54,000 >$54,000–$72,000 >$72,000 p-trend Living in a single-family housing unit (%)z 57 >57–84 >84–98 >98 p-trend Years of education for AfricanAmerican men aged >25 years <12 12–15 16 p-trend Years of education for AfricanAmerican women aged >25 years <12 12–15 16 p-trend Years of education for men aged 45–64 years <12 12–15 16 p-trend Years of education for women aged 45–64 years <12 12–15 16 p-trend Cases No. of respondents No. of nonrespondents 103 3 42 47 146 3 73 91 1.30 1.44 59 50 49 37 103 70 72 68 1.0 1.04 1.13 1.40 60 43 47 45 127 77 61 48 1.0 0.84 0.61 0.49 44 54 49 48 73 95 85 60 1.0 1.05 1.02 0.77 36 76 15 74 122 14 1.0 0.80 0.45 22 82 14 43 143 19 1.0 0.89 0.72 30 152 13 59 236 18 1.0 0.84 0.76 23 171 1 42 267 4 1.0 0.88 OR*,y 95% CI* 1.0 —§ 0.82, 2.07 0.93, 2.24 0.08 0.52, 2.08 0.54, 2.36 0.66, 2.99 0.28 0.51, 1.39 0.36, 1.03 0.28, 0.85 0.0062 0.64, 1.74 0.61, 1.72 0.45, 1.31 0.36 0.49, 1.33 0.19, 1.06 0.093 0.50, 1.60 0.29, 1.76 0.49 0.51, 1.39 0.32, 1.79 0.46 0.51, 1.53 —§ 0.89 No. of respondents No. 
of nonrespondents 137 2 81 76 105 2 58 65 0.93 1.15 63 74 74 85 72 59 46 53 1.0 0.84 0.72 0.69 63 80 77 76 82 49 50 49 1.0 0.53 0.55 0.58 79 69 73 75 58 57 58 57 1.0 1.20 1.11 1.19 38 111 21 48 88 14 1.0 0.67 0.64 23 115 25 36 103 17 1.0 0.59 0.46 25 249 22 41 173 16 1.0 0.44 0.46 19 275 2 28 201 1 1.0 0.46 ORy 95% CI 1.0 —§ 0.60, 1.43 0.75, 1.77 0.67 0.42, 1.71 0.34, 1.53 0.32, 1.46 0.28 0.31, 0.88 0.32, 0.93 0.34, 0.99 0.080 0.73, 1.98 0.68, 1.82 0.72, 1.96 0.57 0.40, 1.15 0.27, 1.48 0.18 0.32, 1.06 0.20, 1.06 0.053 0.25, 0.78 0.20, 1.09 0.030 0.24, 0.88 —§ 0.019 * OR, odds ratio; CI, confidence interval. y Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race. z Quartile categories based on the distribution among all respondents. § Odds ratio was not calculated because of small numbers. Am J Epidemiol 2008;167:350–361 Respondent/Nonrespondent Differences in an NHL Case-Control Study 357 TABLE 5. Comparison of selected census variables between respondents and nonrespondents, by case-control status, at the Seattle, Washington, center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000 Controls Variable No. of respondents No. of nonrespondents Cases OR*,y 95% CI* No. of respondents No. 
of nonrespondents ORy 95% CI Proportion of Hispanics (%)z 0.62 76 70 1.0 67 35 1.0 >0.62–2.43 72 66 0.96 0.60, 1.54 72 29 0.81 0.44, 1.51 >2.43–5.22 62 75 1.21 0.75, 1.94 82 32 0.80 0.44, 1.47 >5.22 67 104 1.56 0.99, 2.45 77 46 1.14 0.64, 2.03 p-trend 0.033 0.63 Proportion of Non-Hispanic Whites (%)z 76 68 117 1.0 >76–86 68 69 0.62 0.39, 0.98 75 48 1.0 76 34 0.85 0.48, 1.52 >86–92 75 59 0.50 0.31, 0.79 73 24 0.64 0.35, 1.20 >92 66 70 0.70 0.44, 1.11 74 36 0.95 0.53, 1.69 p-trend 0.062 0.67 Median household income in 1999z $47,000 67 98 1.0 >$47,000–$59,000 69 83 0.86 0.55, 1.35 77 47 1.0 75 33 0.81 0.46, 1.45 >$59,000–$73,000 70 74 0.79 0.50, 1.25 74 29 0.79 0.44, 1.43 >$73,000 71 60 0.63 0.39, 1.01 72 33 0.80 0.45, 1.43 p-trend 0.053 0.45 Living in a single-family housing unit (%)z 49 72 105 1.0 72 48 1.0 >49–78 68 64 0.68 0.43, 1.08 76 29 0.57 0.32, 1.04 >78–94 62 75 0.89 0.56, 1.41 81 29 0.56 0.31, 1.00 >94 75 71 0.70 0.44, 1.10 69 36 0.80 0.45, 1.41 p-trend 0.23 0.37 Years of education for men aged 45–64 years§ <12 12–15 16 1 2 1.0 252 285 24 28 1.06 5 6 1.0 268 302 1.10 4 7 1.73 1 2 274 132 23 8 0.74 0.31, 1.75 5 3 1.0 0.32, 3.74 288 138 0.31, 9.85 5 1 0.60, 1.89 1.0 Years of education for women aged 45–64 years <12 12–15 16 p-trend 0.53 1.94 0.41, 9.08 —{ 0.70
* OR, odds ratio; CI, confidence interval.
y Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race.
z Quartile categories based on the distribution among all respondents.
§ Odds ratios were based on the comparison between 16 years and <16 years.
{ Odds ratio was not calculated because of small numbers.

DISCUSSION

Response rates in epidemiologic studies, particularly population-based case-control studies, have been decreasing for two decades (1). Low response rates increase the likelihood of nonresponse bias, the systematic difference in exposures and other factors between responding and nonresponding groups. In our study, the response rates ranged from 52 percent to 67 percent for cases and from 38 percent to 58 percent for controls, and among cases, the distributions of sex, age, and histologic type differed significantly between respondents and nonrespondents.

TABLE 6. Comparison of selected census variables and amount of surrounding cropland between respondents and nonrespondents, by case-control status, at the Iowa center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000
Controls Variable Proportion of Hispanics (%) 0 >0–1.07 >1.07 p-trend Proportion of non-Hispanic Whites (%) 94 >94 Median household income in 1999z $34,000 >$34,000–$40,000 >$40,000–$46,000 >$46,000 p-trend Living in a single-family housing unit (%)z 71 >71–83 >83–91 >91 p-trend Years of education for men aged 45–64 years <12 12–15 16 p-trend Years of education for women aged 45–64 years <12 12–15 16 p-trend Proportion of people living in an urbanized area (%){ 50 <50 Water obtained from public water system (%) 50 <50 Amount of cropland (acres#) within 500 m of subject’s residence**,yy 6.7 >6.7–36.9 >36.9 p-trend Cases No. of respondents No. of nonrespondents OR*,y 161 9 66 118 9 43 1.0 1.39 0.90 58 178 42 128 1.0 0.99 63 53 58 62 45 35 41 49 1.0 0.94 1.01 1.14 53 61 63 59 52 37 43 38 1.0 0.59 0.67 0.63 10 225 1 9 160 1 1.0 0.83 10 226 6 164 1.0 1.48 95% CI* 0.53, 3.62 0.57, 1.44 0.78 0.62, 1.59 0.53, 1.68 0.58, 1.77 0.66, 1.97 0.60 0.34, 1.05 0.39, 1.17 0.36, 1.12 0.17 0.32, 2.14 —§ 0.78 0.46, 4.73 No. of respondents No.
of nonrespondents ORy 203 17 63 87 9 51 1.0 1.19 1.84 72 211 54 93 1.0 0.61 67 80 69 67 40 44 24 39 1.0 0.94 0.58 0.92 75 70 69 69 48 31 35 33 1.0 0.75 0.81 0.69 14 264 5 11 134 2 1.0 0.59 10 271 2 11 136 1.0 0.46 95% CI 0.49, 2.85 1.14, 2.97 0.015 0.39, 0.96 0.53, 1.65 0.31, 1.11 0.51, 1.66 0.49 0.42, 1.36 0.46, 1.45 0.38, 1.24 0.26 0.24, 1.42 —§ 0.28 0.18, 1.18 0.066 70 166 65 105 1.0 0.66 0.44, 1.02 97 186 66 81 1.0 0.60 0.39, 0.93 198 38 142 28 1.0 1.04 0.61, 1.78 233 50 126 21 1.0 0.85 0.48, 1.52 22 30 31 20 17 14 1.0 0.63 0.51 44 35 34 23 21 17 1.0 1.14 0.95 0.27, 1.48 0.21, 1.25 0.14 0.54, 2.40 0.44, 2.06 0.93 * OR, odds ratio; CI, confidence interval. y Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race. z Quartile categories based on the distribution among all respondents. § Odds ratio was not calculated because of small numbers. { An urbanized area is defined by the Census Bureau as a densely settled territory that contains 50,000 or more people. # 1 acre ¼ 0.4 hectares. ** Categorization into tertiles of total acres of cropland within 500 m of the study subject’s house, averaged over three years (1998–2000), was based on the distribution for all respondents. yy Odds ratios were not adjusted for race. Am J Epidemiol 2008;167:350–361 Respondent/Nonrespondent Differences in an NHL Case-Control Study 359 TABLE 7. Odds ratios for non-Hodgkin lymphoma according to census tract educational level and block group household income, for respondents only and for all subjects, in a US multicenter study of non-Hodgkin lymphoma, 1998–2000 Respondents only Variable No. of controls No. of cases OR*,y All subjects 95% CI* No. of controls No. 
of cases Bias (%)z ORy 95% CI Years of education <12 260 233 1.0 670 483 12–15 538 735 1.41 1.12, 1.77 1,145 1,157 1.43 1.0 1.22, 1.66 1 16 145 198 1.37 1.01, 1.86 298 309 1.48 1.20, 1.84 8 $36,000 213 232 1.0 529 467 1.0 >$36,000–$47,000 241 296 1.05 0.82, 1.36 528 479 0.99 0.83, 1.19 6 >$47,000–$63,000 230 299 1.11 0.84, 1.45 528 469 1.02 0.85, 1.23 8 >$63,000 259 339 1.10 0.84, 1.45 528 534 1.18 0.97, 1.42 7 Median household income in 1999§ * OR, odds ratio; CI, confidence interval. y Adjusted for age, sex, race, and study center. z {[OR(respondents) – OR(all subjects)]/OR(respondents)} 3 100. § Quartile categories based on the distribution among all controls. Therefore, we were concerned about possible differences in the distribution of exposure variables between respondents and nonrespondents. Since we had little individual information for nonrespondents, we selected census-derived variables that characterized the socioeconomic status of the population living in FIGURE 1. Geocoded addresses of respondents and nonrespondents among cases of non-Hodgkin lymphoma (NHL) at the Detroit, Michigan, center of a US multicenter study of NHL, 1998–2000. The oval shows the location of a significant elliptical cluster of nonrespondents. The cluster of nonrespondents is explained by census demographic and socioeconomic characteristics. Semiminor axis: 8.2 km; semimajor axis: 16.3 km. Am J Epidemiol 2008;167:350–361 360 Shen et al. the study subjects’ geographic census units and compared this information between respondents and nonrespondents. To the best of our knowledge, this is the first study to have used a geographic information system to evaluate census data and spatial clustering among respondents and nonrespondents in epidemiologic research. Census variables may be useful surrogates for individual information because of the small size of census units and the tendency for them to be fairly homogeneous with regard to socioeconomic features. 
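As a worked check on Table 7, the percent bias defined in its footnote can be recomputed from the two odds ratios in each row. The Python sketch below is illustrative only and is not the authors' code: the names `percent_bias`, `TABLE7_ROWS`, and `check_rows` are hypothetical, the row labels are shorthand, and the odds ratios are copied from Table 7. Applied as written, the formula gives signed values; their magnitudes, rounded to whole percentages, agree with the printed Bias (%) column, which suggests that the table reports absolute values.

```python
def percent_bias(or_respondents: float, or_all: float) -> float:
    """Signed percent bias: {[OR(respondents) - OR(all subjects)] / OR(respondents)} x 100."""
    if or_respondents <= 0:
        raise ValueError("odds ratio among respondents must be positive")
    return (or_respondents - or_all) / or_respondents * 100.0


# (label, OR among respondents only, OR among all subjects, Bias (%) as printed in Table 7)
# Labels are shorthand chosen for this sketch; the numbers are taken from Table 7.
TABLE7_ROWS = [
    ("Education 12-15 years", 1.41, 1.43, 1),
    ("Education >=16 years", 1.37, 1.48, 8),
    ("Income >$36,000-$47,000", 1.05, 0.99, 6),
    ("Income >$47,000-$63,000", 1.11, 1.02, 8),
    ("Income >$63,000", 1.10, 1.18, 7),
]


def check_rows(rows):
    """Return (label, signed bias, rounded magnitude, printed value) for each row."""
    results = []
    for label, or_resp, or_all, printed in rows:
        signed = percent_bias(or_resp, or_all)
        results.append((label, signed, round(abs(signed)), printed))
    return results


if __name__ == "__main__":
    for label, signed, magnitude, printed in check_rows(TABLE7_ROWS):
        print(f"{label}: signed {signed:+.1f}%, magnitude {magnitude}% (table: {printed}%)")
```

A negative signed value means that the odds ratio among respondents was smaller than the odds ratio among all subjects, as in the highest education category (1.37 versus 1.48).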
Linkage of participants in epidemiologic studies to their census units has been used to estimate neighborhood socioeconomic characteristics in public health studies (10).

Our analysis of individual census variables showed that among cases at one or more study centers, nonresponse was statistically significantly associated with Hispanic ethnicity and non-White race, lower household income, multiple-unit housing, less education, and living in a more urbanized area. Differences between respondents and nonrespondents were most marked in Los Angeles. There were no significant differences in census characteristics among responding and nonresponding controls, except in Seattle, where nonresponse was associated with living in census blocks with a greater proportion of Hispanics, and in Detroit, where nonresponse was associated with lower income. However, for many of the other census characteristics, the patterns of risk of nonresponse among controls were similar to those observed among cases. This was confirmed in multivariable analyses, in which most census variables showed similar associations among cases and controls, although there were more significant contributors among cases. Our results indicate that there was higher participation in census units with higher socioeconomic status. Such a pattern of lower participation rates in populations with lower socioeconomic status has been demonstrated in several other studies (11–14).

Our spatial clustering analysis confirmed that the low response rates in specific geographic areas in Detroit and Los Angeles were completely explained by demographic and socioeconomic factors. In future studies, it may be useful to monitor participation rates in small census-defined areas using the cluster analysis approach to target efforts to increase response rates.

Disparities between respondents and nonrespondents may render case and control groups unrepresentative of the base population with respect to exposure prevalence.
However, differences in exposure between respondents and nonrespondents do not necessarily lead to biased risk estimates (15–17). Risk estimates are biased only if a risk factor (or another factor related to a risk factor) differentially influences participation among cases and controls (16). In other words, the odds ratio among respondents may be unbiased even though the odds of exposure in responding cases and controls are misrepresented, because the odds ratio is determined by the combined effect of response rates on all exposure-disease combinations (15). In spite of the low response rates in our study, the analysis of educational level and household income and NHL risk comparing respondents with all eligible subjects revealed that the nonresponse bias in the odds ratio estimate was not large. The small bias in the odds ratio was probably due to the consistent nonresponse bias patterns by educational level or household income for cases and controls. Other studies in which the distribution of socioeconomic status has varied between respondents and nonrespondents have similarly demonstrated minimal bias in risk estimates for socioeconomic status and exposures that are correlated with socioeconomic status (12, 18). These findings are consistent with research demonstrating that the degree of bias is tolerable as long as the determinants of participation do not act dramatically differently for cases and controls and are not very highly correlated with the exposure of interest (19).

In summary, we have shown that responding and nonresponding cases differed by demographic and socioeconomic characteristics in the NCI-SEER NHL Study. Our analysis underscores the need for efforts to improve response rates among subjects who are older or have a lower socioeconomic status (20). Cluster analysis may be a useful method for identifying particular neighborhoods to target for these efforts.
Caution should be used in inferring characteristics of the target population based on respondents only. Nevertheless, in this study, nonresponse did not have a substantial impact on the risk of NHL associated with educational level and household income, offering some assurance that the magnitude of bias due to nonresponse is not likely to have been large. However, we were unable to assess the potential effect of nonresponse on specific exposure variables.

ACKNOWLEDGMENTS

This research was partly supported by the Intramural Research Program of the National Institutes of Health, National Cancer Institute, and conducted with contracts N01-PC-67010, N01-PC-67008, N02-PC-71105, N02-CP31003, N01-PC-67009, and N01-PC-65064.

The authors thank Matthew Airola of Westat, Inc. (Rockville, Maryland) for obtaining the census data and mapping the study subjects. They acknowledge Muhammad T. Salam for assistance with preliminary data analysis. The authors thank Dr. Barry I. Graubard for statistical consultation. They also thank Nathan Appel of Information Management Services, Inc. (Silver Spring, Maryland) for preparing the data set.

Contributions of each author: M. S. conducted the statistical analysis and was the main author of the manuscript. M. H. W. designed the analysis. J. C., M. W., W. C., R. K. S., L. B., and J. R. C. designed and conducted the case-control study. L. H. and L. P. were responsible for the spatial scan analysis. L. M. M. and A. J. D. contributed to the manuscript Discussion section. All authors contributed to manuscript review and editing.

Conflict of interest: none declared.

REFERENCES

1. Morton LM, Cahill J, Hartge P. Reporting participation in epidemiologic studies: a survey of practice. Am J Epidemiol 2006;163:197–203.
2. Rothman KJ, Greenland S. Precision and validity in epidemiologic studies. In: Rothman KJ, Greenland S, eds. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott Williams & Wilkins, 1998:115–34.
3. Chatterjee N, Hartge P, Cerhan JR, et al. Risk of non-Hodgkin’s lymphoma and family history of lymphatic, hematologic, and other cancers. Cancer Epidemiol Biomarkers Prev 2004;13:1415–21.
4. Ward MH, Nuckols JR, Giglierano J, et al. Positional accuracy of two methods of geocoding. Epidemiology 2005;16:542–7.
5. Ward MH, Lubin J, Giglierano J, et al. Proximity to crops and residential exposure to agricultural herbicides in Iowa. Environ Health Perspect 2006;114:893–7.
6. SAS Institute, Inc. The GLIMMIX procedure. Cary, NC: SAS Institute, Inc, 2005.
7. Kulldorff M. A spatial scan statistic. Commun Stat Theory Methods 1997;26:1481–96.
8. Kulldorff M, Huang L, Pickle L, et al. An elliptic spatial scan statistic. Stat Med 2006;25:3929–43.
9. Kulldorff M. SaTScan: software for the spatial and space-time scan statistics. Version 7.0. Silver Spring, MD: Information Management Services, Inc, 2006.
10. Krieger N, Chen JT, Waterman PD, et al. Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: does the choice of area-based measure and geographic level matter? The Public Health Disparities Geocoding Project. Am J Epidemiol 2002;156:471–82.
11. Benfante R, Reed D, MacLean C, et al. Response bias in the Honolulu Heart Program. Am J Epidemiol 1989;130:1088–100.
12. Richiardi L, Boffetta P, Merletti F. Analysis of nonresponse bias in a population-based case-control study on lung cancer. J Clin Epidemiol 2002;55:1033–40.
13. Psaty BM, Cheadle A, Koepsell TD, et al. Race- and ethnicity-specific characteristics of participants lost to follow-up in a telephone cohort. Am J Epidemiol 1994;140:161–71.
14. Sonne-Holm S, Sorensen TI, Jensen G, et al. Influence of fatness, intelligence, education and sociodemographic factors on response rate in a health survey. J Epidemiol Community Health 1989;43:369–74.
15. Criqui MH. Response bias and risk ratios in epidemiologic studies. Am J Epidemiol 1979;109:394–9.
16. Austin MA, Criqui MH, Barrett-Connor E, et al. The effect of response bias on the odds ratio. Am J Epidemiol 1981;114:137–43.
17. Greenland S, Criqui MH. Are case-control studies more vulnerable to response bias? Am J Epidemiol 1981;114:175–7.
18. Chang ET, Zheng T, Weir EG, et al. Childhood social environment and Hodgkin’s lymphoma: new findings from a population-based case-control study. Cancer Epidemiol Biomarkers Prev 2004;13:1361–70.
19. Chen J, Wacholder S, Morton LM, et al. Quantifying selection bias in epidemiologic studies. (Abstract 577). Am J Epidemiol 2005;161(suppl):S145.
20. Hartge P. Raising response rates: getting to yes. Epidemiology 1999;10:105–7.
21. Harris NL, Jaffe ES, Diebold J, et al. Lymphoma classification—from controversy to consensus: the R.E.A.L. and WHO Classification of lymphoid neoplasms. Ann Oncol 2000;11(suppl 1):3–10.