American Journal of Epidemiology
© The Author 2007. Published by the Johns Hopkins Bloomberg School of Public Health.
All rights reserved. For permissions, please e-mail: [email protected].
Vol. 167, No. 3
DOI: 10.1093/aje/kwm292
Advance Access publication November 6, 2007
Practice of Epidemiology
Census and Geographic Differences between Respondents and Nonrespondents
in a Case-Control Study of Non-Hodgkin Lymphoma
Min Shen1, Wendy Cozen2, Lan Huang3, Joanne Colt1, Anneclaire J. De Roos4,5, Richard K.
Severson6,7, James R. Cerhan8, Leslie Bernstein2, Lindsay M. Morton1, Linda Pickle3, and
Mary H. Ward1
1 Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD.
2 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA.
3 Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD.
4 Fred Hutchinson Cancer Research Center, Seattle, WA.
5 Department of Epidemiology, School of Public Health and Community Medicine, University of Washington, Seattle, WA.
6 Department of Family Medicine and Public Health Sciences, School of Medicine, Wayne State University, Detroit, MI.
7 Karmanos Cancer Institute, Detroit, MI.
8 Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN.
Received for publication April 27, 2007; accepted for publication September 1, 2007.
To quantify nonresponse bias and estimate its potential impact, the authors compared census-based socioeconomic and demographic factors and geographic locations among respondents and nonrespondents in a multicenter
case-control study of non-Hodgkin lymphoma (1998–2000). Using a geographic information system, the authors
mapped current addresses and linked them to the 2000 US Census database to determine group-level demographic and socioeconomic information. They used logistic regression analysis to compute the risk of being
a nonrespondent, separately for cases and controls. They used spatial scan methods to evaluate spatial clustering
at each study center. Among cases at one or more centers, nonresponse was significantly associated with non-White race, lower household income, a greater proportion of multiple-unit housing, fewer years of education, and
living in a more urbanized area. For most factors, the authors observed similar patterns among controls, although
findings were mostly nonsignificant. They found two nonrandom elliptical clusters in Los Angeles, California, and
Detroit, Michigan, that disappeared after adjustment for the demographic factors. The authors determined the bias
in non-Hodgkin lymphoma risk associated with census-tract educational level by comparing risks among respondents and all subjects. The bias was 8%, indicating that the socioeconomic and demographic differences between
respondents and nonrespondents did not result in a large bias in the risk estimate for education.
bias (epidemiology); case-control studies; censuses; epidemiologic methods; geographic information systems;
lymphoma, non-Hodgkin
Abbreviations: NCI, National Cancer Institute; NHL, non-Hodgkin lymphoma; SEER, Surveillance, Epidemiology, and End
Results.
Case-control studies have yielded important insights
into risk factors for many diseases. Studies of the etiology
of cancer and other rare diseases often require a case-control design with multiple data collection components.
Correspondence to Dr. Mary H. Ward, Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics,
National Cancer Institute, 6120 Executive Blvd., Executive Plaza South 8006, MSC 7240, Rockville, MD 20892-7240 (e-mail: wardm@mail.nih.gov).
Am J Epidemiol 2008;167:350–361
Respondent/Nonrespondent Differences in an NHL Case-Control Study
Investigators carrying out such studies face increasing challenges in obtaining sufficiently high response rates (1). Nonresponse bias occurs when participating cases, controls, or
both differ from nonparticipants with respect to the distribution of disease risk factors (2). When patterns of nonresponse
differ between cases and controls, biased risk estimates can
result. Higher rates of nonresponse increase the likelihood of
nonresponse bias and errors in risk estimates.
Through the Surveillance, Epidemiology, and End Results (SEER) Program, the National Cancer Institute (NCI)
compiles data on cancer incidence and survival from numerous population-based cancer registries. We conducted
a large, multicenter, population-based case-control study
of non-Hodgkin lymphoma (NHL) at four SEER centers:
Seattle, Washington; Los Angeles, California; Detroit,
Michigan; and the state of Iowa (the NCI-SEER NHL
Study). The study examined various environmental exposures as possible risk factors for NHL, including pesticides,
occupational exposures, hair dyes, tobacco and alcohol, dietary factors, and viruses. Because response rates were relatively low and because many of the exposures of interest
may vary by race, sex, age, socioeconomic status, and geographic location, we compared these factors among participating and nonparticipating cases and controls to evaluate
the potential for nonresponse bias.
MATERIALS AND METHODS
Study design and data sources
The methods of the NCI-SEER NHL Study have been
described in detail elsewhere (3). A total of 2,248 potentially eligible cases were identified in four population-based
SEER cancer registries from July 1, 1998, through June 30,
2000. We did not approach 520 NHL cases because they had
died (14 percent), their physician refused (3 percent), they
could not be found (6 percent), or they had moved out of the
area (1 percent). Of the remaining 1,728 cases, 274 (16
percent) refused to participate and 133 (8 percent) were
not interviewed for reasons such as illness, impairment, or
our inability to contact them. We selected controls stratified
on age, sex, race, and center. Controls aged 20–64 years
were selected by one-step list-assisted random digit dialing,
sampling from banks with one or more telephone numbers
listed in the White Pages (residential) directory. Telephone
numbers were matched against the Yellow Pages (commercial) directory to remove business numbers and were autodialed to remove nonworking numbers. Controls aged 65
years or older were selected from Medicare files. Of 2,409
eligible controls selected from the general population in the
four registry areas, 1,352 did not participate in the study
because they either could not be located (13 percent), had
died (1 percent), had moved out of the area (1 percent),
refused (35 percent), or were not interviewed because of
illness, a language barrier, or other reasons (6 percent).
All participants were defined as respondents, and all selected eligible subjects who did not participate were defined
as nonrespondents. The overall response rates were 58.8
percent for cases and 43.9 percent for controls.
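The response rates quoted above follow directly from the counts given in this section. A minimal sketch of that arithmetic is shown below; the function and variable names are illustrative and are not part of the study's analysis code.

```python
def response_rate(participants: int, eligible: int) -> float:
    """Percentage of eligible subjects who participated."""
    return 100.0 * participants / eligible


# Counts taken from the text above.
eligible_cases = 2248
cases_not_approached = 520      # died, physician refused, not found, or moved
cases_refused = 274
cases_not_interviewed = 133     # illness, impairment, or could not be contacted
case_respondents = (
    eligible_cases - cases_not_approached - cases_refused - cases_not_interviewed
)

eligible_controls = 2409
control_nonrespondents = 1352
control_respondents = eligible_controls - control_nonrespondents

case_rate = response_rate(case_respondents, eligible_cases)
control_rate = response_rate(control_respondents, eligible_controls)
print(f"cases: {case_rate:.1f}%  controls: {control_rate:.1f}%")
```

Note that the denominators are all selected eligible subjects, including those who were never approached, which is why the case rate is lower than the participation rate among the 1,728 cases who were approached.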
Using a geographic information system, we linked the
current address of each respondent and nonrespondent to
the 2000 US Census database to compare respondents with
nonrespondents on the basis of demographic and socioeconomic factors. The addresses of respondents and nonrespondents were geocoded using ArcView 3.2 software (ESRI, Inc.,
Redlands, California) and Geographic Data Technology’s
MatchMaker SDK Professional Version 4.3 street database
(Geographic Data Technology, Inc., Lebanon, New Hampshire) (4). Addresses for 2,183 of 2,378 respondents (92 percent) and 1,996 of 2,279 nonrespondents (88 percent) were
located or were matched to the nearest intersection; the remainder could not be geocoded because the addresses were
incomplete or could not be found in the database. Among
both cases and controls, persons with addresses that could be
geocoded were comparable to those whose addresses could
not be geocoded with respect to age and gender. Information
on race and education was missing for many nonrespondents,
but among those with this information, there were no substantial differences based on geocoding status. Geocoded
addresses were linked to the corresponding 2000 US Census
geographic units (block, block group, and census tract). We
also linked the addresses to the 1990 US Census (block
group) to obtain information on water source for each subject, which was not available in the 2000 Census. For 117
subjects, the geocoded location of their residence was linked
to a census block with no population, probably because of
a geocoding error. These subjects were excluded, leaving
2,109 respondents (1,166 cases, 943 controls) and 1,953 nonrespondents (783 cases, 1,170 controls) in the analysis.
Data analysis
The decennial census data are available in hierarchical
order of increasing geographic and population size (e.g.,
census block, block group, and census tract). Information
on block demographic factors and block group socioeconomic factors and housing characteristics was compiled
from the questions asked of all people (Summary File 1)
and a one-in-six sample of all people (Summary File 3),
respectively. We obtained this information from the US
Census Bureau website (http://www.census.gov) for each
census unit where respondents and nonrespondents lived.
All variables except for block-group median household income were listed in the Census database as the number of
people in the census unit defined by the particular demographic factor (table 1). We calculated the percentage of the
census-unit population in each subgroup (e.g., the proportion of non-Hispanic Whites in a census block). We also
calculated years of education for specific demographic subgroups as a weighted average across the census unit. We
used a weight of 5 years for ‘‘less than 9th grade,’’ 10 for
‘‘9th–12th grade, no diploma,’’ 12 for ‘‘high school graduate
(includes equivalency),’’ 14 for ‘‘some college, no degree,’’
15 for ‘‘associate degree,’’ 16 for ‘‘bachelor’s degree,’’ and
18 for ‘‘graduate or professional degree.’’
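The weighted average described above can be sketched as follows. The year weights are those listed in the text; the example counts are hypothetical, and the handling of averages that fall between 15 and 16 years is an assumption, since the text does not say how such values were assigned to the three analysis categories.

```python
# Years of education assigned to each census attainment category (from the text).
YEARS_BY_CATEGORY = {
    "less than 9th grade": 5,
    "9th-12th grade, no diploma": 10,
    "high school graduate (includes equivalency)": 12,
    "some college, no degree": 14,
    "associate degree": 15,
    "bachelor's degree": 16,
    "graduate or professional degree": 18,
}


def mean_years_of_education(counts):
    """Population-weighted mean years of education for one census unit.

    `counts` maps an attainment category to the number of people in it.
    Returns None when the unit has no population in the subgroup.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    weighted = sum(YEARS_BY_CATEGORY[cat] * n for cat, n in counts.items())
    return weighted / total


def education_level(mean_years):
    """Three analysis levels; treating 15 < x < 16 as '12-15' is an assumption."""
    if mean_years < 12:
        return "<12"
    if mean_years < 16:
        return "12-15"
    return ">=16"


# Hypothetical census tract with 100 people in the subgroup.
example_counts = {
    "less than 9th grade": 10,
    "9th-12th grade, no diploma": 20,
    "high school graduate (includes equivalency)": 40,
    "some college, no degree": 15,
    "associate degree": 5,
    "bachelor's degree": 7,
    "graduate or professional degree": 3,
}
example_mean = mean_years_of_education(example_counts)
print(example_mean, education_level(example_mean))
```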
TABLE 1. US Census data sources and derived variables in a US multicenter study of non-Hodgkin lymphoma, 1998–2000
Data source | Level | Median population of census unit (interquartile range) | Variables | Derived variables
Census 2000 (SF* 1) | Census block | 98 (48–214) | Population stratified by age, sex, and race | Proportion of people by race
Census 2000 (SF 3) | Block group | 1,226 (902–1,683) | Median household income in 1999; population stratified by housing type; population stratified by sex, race, and educational attainment; population in urbanized and rural areas | Proportion of people with different housing types; estimated educational attainment by race and sex; proportion of people living in urbanized areas (Iowa)
Census 2000 (SF 3) | Tract | 3,269 (2,513–4,248) | Population stratified by sex, age, and educational attainment | Estimated educational attainment by sex and age
Census 1990 (SF 3) | Block group | 1,226 (902–1,683) | Source of water (public system, private well, or other source) | Proportion of people using a public water system (Iowa)
* SF, Summary File.
Self-reported information on race was available for nonrespondents in Detroit and Los Angeles to allow for oversampling of African Americans at the two study centers (3);
this was not the case for Seattle and Iowa. For persons with
self-reported information on individual race, we categorized
race into White (designated 1) versus non-White (designated
0). For persons with missing information on race, we calculated the proportion of Whites by sex and 10-year age group
in the census blocks where they resided, and we assigned the
appropriate sex- and age-specific proportion of Whites
(range, 0–1). There was a good correlation between the census block and individual race variables for persons with individual race information (Spearman correlation, r = 0.60).
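The race covariate described above takes the value 1 or 0 when race was self-reported and a census-block proportion otherwise. A minimal sketch of that rule follows; the function name, the string coding of race, and the use of `None` for missing values are assumptions made for illustration.

```python
def race_covariate(self_reported_race, block_proportion_white):
    """Race value used for adjustment.

    self_reported_race: "White", any other string for non-White, or None if missing.
    block_proportion_white: sex- and age-specific proportion of Whites (0 to 1)
        in the subject's census block, used only when race is missing.
    """
    if self_reported_race is not None:
        return 1.0 if self_reported_race == "White" else 0.0
    if not 0.0 <= block_proportion_white <= 1.0:
        raise ValueError("block proportion must lie between 0 and 1")
    return block_proportion_white


print(race_covariate("White", 0.35))   # self-report takes precedence
print(race_covariate(None, 0.35))      # falls back to the census-block proportion
```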
The amount of cropland surrounding the residence was
found to be a predictor of residential exposure to agricultural pesticides among respondents in Iowa (5). We computed the average crop acreage within 500 m of the
residence during 1998–2000 for each of 196 respondents
and 112 nonrespondents in Iowa whose residence fell within
the boundaries of crop maps for central Iowa. As previously
described (5), maps for 1998, 1999, and 2000 were created
from spring and summer Landsat multispectral satellite images from two path/rows in south-central Iowa, using Farm
Service Agency records for validation.
We used logistic regression to estimate the risk (odds
ratio) of being a nonrespondent for cases and controls separately and by study center. We categorized the proportions
of census units comprising different races, incomes, and
housing types into four levels based on the quartile cutpoints
among all respondents, by center, using the lowest quartile
as the reference group. We categorized education into three
levels: <12 years, 12–15 years, and ≥16 years. Crop acreage near homes was categorized into three levels based on
tertiles among the respondents. We adjusted all models for
individual-level information on sex, age (continuous), and
race (White, non-White). The census race variable was used
when information on individual race was missing. All statistical tests were two-sided.
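For orientation, the odds ratio for nonresponse in one category relative to the reference category can be approximated from a 2 × 2 table. The sketch below computes a crude (unadjusted) odds ratio with a Wald confidence interval; it is not the adjusted logistic regression used in the study. The example counts are the Los Angeles case counts for the highest versus lowest quartile of census-block Hispanic proportion (table 3), for which the adjusted odds ratio reported in that table is 2.00 (95 percent confidence interval: 1.23, 3.26).

```python
import math


def crude_odds_ratio(nonresp_exposed, resp_exposed, nonresp_ref, resp_ref, z=1.96):
    """Crude odds ratio of being a nonrespondent, with a Wald confidence interval."""
    odds_ratio = (nonresp_exposed * resp_ref) / (resp_exposed * nonresp_ref)
    se_log_or = math.sqrt(
        1 / nonresp_exposed + 1 / resp_exposed + 1 / nonresp_ref + 1 / resp_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Los Angeles cases (table 3): highest quartile had 58 respondents and
# 78 nonrespondents; the reference quartile had 85 respondents and 54 nonrespondents.
or_hat, ci_low, ci_high = crude_odds_ratio(78, 58, 54, 85)
print(f"crude OR = {or_hat:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

The crude estimate differs somewhat from the published value because the published model also adjusts for age, sex, and race.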
Seattle, Los Angeles, and Detroit are highly urbanized
areas, with 97 percent of the population using public water
supplies. However, Iowa has more variation in the percentage of the census block group population using public water
supplies and living in urbanized areas. Therefore, we evaluated these variables for Iowa only and categorized them
into two levels (<50 percent vs. ≥50 percent) because the
distributions tended to be bimodal.
We used stepwise logistic regression to identify the most
significant variables associated with nonresponse at each
center using significance levels of 0.20 and 0.15 for entering
an effect into the model and keeping it in the model, respectively. Age, sex, and race were retained in all models.
We assessed spatial autocorrelation by repeating the logistic
regression analysis with the same covariates plus additional
random effects for latitude and longitude of the geocoded
residences, using the GLIMMIX procedure in SAS (6). No
unexplained spatial autocorrelation was found, so we present results from the fixed-effects logistic models.
We evaluated the potential for bias in the estimate of NHL
risk associated with census tract educational level and median household income by comparing the odds ratio based
on respondents only with that based on all subjects. Educational level and household income were analyzed because
they are major components of socioeconomic status and
they may be correlated with certain occupational and environmental exposures.
We analyzed spatial clustering of respondents and nonrespondents separately for cases and controls at each center, using a spatial scan method based on a Bernoulli model
developed by Kulldorff and colleagues (7–9). The maximum size of the geographic cluster was set to include no
more than 49 percent of all subjects. The analysis identified
circular or elliptical clusters in which the ratio of nonrespondents to respondents differed from the overall ratio
in the study population. We compared the demographic
and socioeconomic characteristics of individuals within
a significant cluster to those of persons outside the cluster
using a t test, or the Satterthwaite method if variances
were not equal. To determine whether the spatial clustering
was explained by the census factors that were associated
with nonresponse, we conducted the spatial clustering
analysis again using a normal model-based spatial scan
method (9) on the residuals of the multivariate logistic
model that included the census demographic and socioeconomic factors.
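The Bernoulli scan statistic ranks candidate zones by a log-likelihood ratio that compares the proportion of nonrespondents inside a zone with the proportion outside it. The sketch below evaluates that ratio for a single zone; it omits the search over circular and elliptical windows and the Monte Carlo replication used to obtain p values. The example uses the Detroit case cluster described in the Results (105 cases, 35 of them respondents) and assumes the Detroit totals of 296 responding and 230 nonresponding geocoded cases from table 4; whether exactly these totals entered the scan is an assumption.

```python
import math


def _xlog(count, prob):
    """count * log(prob), with the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(prob)


def bernoulli_llr(c_in, n_in, c_total, n_total):
    """Log-likelihood ratio for one zone under a Bernoulli model.

    c_in: nonrespondents inside the zone; n_in: all subjects inside the zone.
    c_total, n_total: the same quantities for the whole study area.
    Requires 0 < n_in < n_total.
    """
    c_out, n_out = c_total - c_in, n_total - n_in
    p_in, p_out, p_all = c_in / n_in, c_out / n_out, c_total / n_total
    ll_alt = (
        _xlog(c_in, p_in) + _xlog(n_in - c_in, 1 - p_in)
        + _xlog(c_out, p_out) + _xlog(n_out - c_out, 1 - p_out)
    )
    ll_null = _xlog(c_total, p_all) + _xlog(n_total - c_total, 1 - p_all)
    return ll_alt - ll_null


respondents_total, nonrespondents_total = 296, 230   # assumed Detroit case totals
n_total = respondents_total + nonrespondents_total
cluster_size, cluster_respondents = 105, 35

expected_respondents = cluster_size * respondents_total / n_total
llr = bernoulli_llr(cluster_size - cluster_respondents, cluster_size,
                    nonrespondents_total, n_total)
print(f"expected respondents: {expected_respondents:.1f}, LLR: {llr:.2f}")
```

Under these assumed totals, the expected number of respondents in the cluster rounds to 59, which agrees with the figure given in the Results.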
RESULTS
For Los Angeles, Detroit, Seattle, and Iowa, response
rates were 52 percent, 55 percent, 62 percent, and 67 percent, respectively, among cases and 38 percent, 39 percent,
44 percent, and 58 percent, respectively, among controls.
Table 2 shows the distribution of demographic and other
individual-level information for respondents and nonrespondents. Overall, response rates were higher among female
cases (62 percent) than among male cases (56 percent).
Cases and controls who responded tended to be younger
than nonrespondents (median age in cases, 58 years vs. 60
years; median age in controls, 61 years vs. 63 years). Self-reported information on race was not available from a high
proportion of nonrespondents. Information on educational
level was obtained from all respondents but only from
a small proportion of nonrespondents who completed a brief
questionnaire. The histologic distributions of NHL were
significantly different between respondents and nonrespondents, with a greater proportion of follicular lymphomas
and a lower proportion of diffuse large B-cell lymphomas
being seen among respondents, mainly because of the
higher fatality rate for diffuse large B-cell lymphomas as
compared with follicular and other lymphomas. Approximately 18.0 percent of persons with diffuse large B-cell
lymphomas were deceased when we attempted to contact
them, as compared with 2.8 percent of persons with follicular lymphomas. With the exception of race, the distributions
of these factors were similar across the four centers, although
for some centers the differences did not achieve statistical
significance (results not shown). There were more African
Americans at the Los Angeles and Detroit centers (18–31
percent) than at the Seattle and Iowa centers (<2.5 percent)
(results not shown).
Tables 3–6 present comparisons of the census block,
block group, or tract demographic and socioeconomic characteristics of respondents and nonrespondents at each study
center. Los Angeles had a much higher percentage of Hispanics than the other centers. Among cases, the odds of
being a nonrespondent increased significantly as the proportion of Hispanics increased. The odds decreased significantly with increasing proportions of non-Hispanic Whites,
median household income, proportions living in single-family homes, and proportions in years-of-education categories among men and women overall and among Hispanics
(table 3). Among controls, nonresponse was not statistically
significantly associated with the census variables; however,
as was observed for cases, odds ratios for nonresponse were
inversely associated with years of education.
In Detroit (table 4), the odds of being a nonresponding
control decreased significantly as median household income
increased; the same pattern was observed among cases,
although the trend was not statistically significant. Educational level was higher for both men and women in responding cases’ census tracts as compared with nonresponding
cases’ census tracts. Among African-American women,
a higher educational level was inversely associated with
nonresponse. We observed similar patterns for educational
level among male African-American cases and controls,
but these did not reach statistical significance.
In Seattle (table 5), controls from census blocks with fewer
Hispanics and more non-Hispanic Whites and from census
block groups with higher household incomes were more
likely to participate. Cases and controls from census block
groups with ≥50 percent single-family housing units were
more likely to participate; however, the association was statistically significant only among cases in the third quartile.
In Iowa (table 6), cases from census blocks with more
Hispanics and fewer non-Hispanic Whites were less likely to
participate in the study. Among cases and controls, being a
nonrespondent was associated with living in an urbanized area,
although the association was statistically significant only among
cases. For both cases and controls, the proportions who obtained their water from a public utility were similar between
respondents and nonrespondents. Although respondents
were more likely to live in less urbanized areas, the acreage
of cropland within 500 m of homes did not differ significantly
between respondents and nonrespondents.
Several of the census variables we evaluated were correlated with each other (e.g., household income and living in
a single-family home (r = 0.51), household income and
educational level (r = 0.64)). Adjusting all of the census
factors for each other in multivariable analyses, we found
that, among cases, non-White race (all centers), older age
(Detroit and Iowa), fewer years of education (Los Angeles
and Detroit), male sex (Iowa and Seattle), and a lower
percentage of single-family homes (Los Angeles) were statistically significant predictors of nonresponse. Among controls, non-White race (Los Angeles and Seattle; non-Whites
were mostly African-American), lower income (Detroit),
and fewer years of education (Seattle) were significant predictors of nonresponse.
We evaluated the association between census tract educational level and household income and risk of NHL
among respondents only and among all eligible subjects
(table 7). When results were based on respondents only,
the odds ratios for 12–15 years of education versus <12
years (odds ratio = 1.41, 95 percent confidence interval:
1.12, 1.77) and for ≥16 years of education versus <12 years
(odds ratio = 1.37, 95 percent confidence interval: 1.01,
1.86) were underestimated by 1 percent and 8 percent, respectively, as compared with the respective odds ratios for
education among all subjects. The bias related to household
income was 6 percent, 8 percent, and 7 percent for comparisons between the three higher income levels, respectively, as compared with the lowest income level.
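One common way to express this kind of bias is the relative difference between the odds ratio among respondents and the odds ratio among all subjects. The sketch below assumes that definition, which the text does not state explicitly. The all-subjects odds ratio of 1.49 is a hypothetical value chosen only to illustrate an underestimate of about 8 percent; it is not taken from table 7.

```python
def relative_bias_percent(or_respondents, or_all_subjects):
    """Percent difference of the respondent-only odds ratio from the all-subjects one.

    Negative values mean the respondent-only analysis underestimates the odds ratio.
    """
    return 100.0 * (or_respondents - or_all_subjects) / or_all_subjects


or_respondents = 1.37          # >=16 vs. <12 years of education, respondents only
or_all_hypothetical = 1.49     # hypothetical all-subjects value, for illustration only
bias = relative_bias_percent(or_respondents, or_all_hypothetical)
print(f"relative bias: {bias:.1f}%")
```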
In the spatial analysis, we found a nonrandom elliptical
cluster of 105 cases (p = 0.025) in Detroit in which there
were fewer respondents than expected (35 vs. 59) (figure 1).
In this cluster, there were fewer Whites (p < 0.001), and
cases inside the ellipse were less educated (p < 0.001) than
cases outside the ellipse. The differences by education and
race were observed for both respondents and nonrespondents. In Los Angeles, there was also an elliptical cluster of
269 cases (p = 0.022) in which the number of respondents
was lower than expected (109 vs. 140). There were fewer
Whites inside the ellipse (p < 0.001) than outside, and
subjects inside the ellipse were less educated (p < 0.001)
than subjects outside the ellipse. The differences by education and race were seen for both respondents and
nonrespondents. At both centers, spatial analysis of the residuals from the logistic regression analyses adjusted for
census race, sex, age, income, education, and proportion
of single-family housing units showed no significant clustering,
indicating that all of the geographic clustering of
nonrespondents could be explained by socioeconomic and
demographic factors. There were no statistically significant
geographic clusters of nonrespondents in Iowa or Seattle.
TABLE 2. Individual demographic information for respondents and nonrespondents, by case-control status, in a US multicenter
study of non-Hodgkin lymphoma, 1998–2000
Characteristic | Controls, respondents (n = 1,057): No. | % | Controls, nonrespondents (n = 1,352): No. | % | Cases, respondents (n = 1,321): No. | % | Cases, nonrespondents (n = 927): No. | %
Study center
Los Angeles, California | 273 | 26 | 442 | 33 | 319 | 24 | 296 | 32
Detroit, Michigan | 214 | 20 | 332 | 25 | 319 | 24 | 259 | 28
Seattle, Washington | 294 | 28 | 376 | 28 | 322 | 24 | 198 | 21
Iowa | 276 | 26 | 202 | 15 | 361 | 27 | 174 | 19
p value* | <0.0001 | | | | <0.0001 | | |
Sex
Male | 546 | 52 | 730 | 54 | 711 | 54 | 555 | 60
Female | 511 | 48 | 622 | 46 | 610 | 46 | 372 | 40
p value* | 0.25 | | | | 0.004 | | |
Age group (years)
<30 | 27 | 3 | 60 | 4 | 37 | 3 | 28 | 3
30–39 | 79 | 7 | 106 | 8 | 110 | 8 | 82 | 9
40–49 | 145 | 14 | 193 | 14 | 235 | 18 | 122 | 13
50–59 | 226 | 21 | 223 | 16 | 314 | 24 | 213 | 23
60–69 | 372 | 35 | 438 | 32 | 409 | 31 | 281 | 30
70–79 | 208 | 20 | 331 | 24 | 216 | 16 | 201 | 22
p value* | 0.0008 | | | | 0.006 | | |
Self-reported race
White | 843 | 80 | 725 | 54 | 1,123 | 85 | 332 | 36
Black, mixed Black/other | 151 | 14 | 256 | 19 | 110 | 8 | 128 | 14
Asian | 22 | 2 | 33 | 2 | 40 | 3 | 28 | 3
Other | 16 | 2 | 62 | 5 | 21 | 2 | 52 | 6
Missing data | 25 | 2 | 276 | 20 | 27 | 2 | 387 | 42
Educational level
Unknown or refused to answer | 0 | 0 | 5 | 0.4 | 1 | 0.1 | 1 | 0.1
≤12 years | 111 | 11 | 20 | 1 | 128 | 10 | 11 | 1
13–15 years | 616 | 58 | 97 | 7 | 815 | 62 | 26 | 3
≥16 years | 330 | 31 | 52 | 4 | 377 | 29 | 17 | 2
Missing data | 0 | 0 | 1,178 | 87 | 0 | 0 | 872 | 94
Histologic type†
Follicular | | | | | 319 | 24 | 134 | 14
Diffuse | | | | | 417 | 32 | 376 | 41
T-cell | | | | | 82 | 6 | 90 | 8
Other | | | | | 459 | 35 | 297 | 32
Unknown | | | | | 44 | 3 | 30 | 3
p value* | | | | | <0.0001 | | |
* Based on the Pearson chi-squared test.
† Based on the Revised European–American Lymphoma/World Health Organization classification (21).
TABLE 3. Comparison of selected census variables between respondents and nonrespondents, by case-control status, at the
Los Angeles, California, center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000
Variable | Controls: no. of respondents | Controls: no. of nonrespondents | Controls: OR*,† | Controls: 95% CI* | Cases: no. of respondents | Cases: no. of nonrespondents | Cases: OR† | Cases: 95% CI
Proportion of Hispanics (%)‡
≤7.5 | 46 | 79 | 1.0 | | 85 | 54 | 1.0 |
>7.5–22 | 57 | 88 | 0.90 | 0.55, 1.49 | 74 | 61 | 1.23 | 0.76, 2.01
>22–50 | 59 | 86 | 0.79 | 0.48, 1.30 | 72 | 71 | 1.42 | 0.87, 2.30
>50 | 73 | 119 | 0.88 | 0.55, 1.41 | 58 | 78 | 2.00 | 1.23, 3.26
p-trend | | | 0.54 | | | | 0.0049 |
Proportion of non-Hispanic Whites (%)‡
≤13 | 74 | 158 | 1.0 | | 57 | 93 | 1.0 |
>13–42 | 67 | 76 | 0.63 | 0.40, 0.99 | 64 | 62 | 0.66 | 0.40, 1.08
>42–71 | 49 | 74 | 0.89 | 0.54, 1.47 | 82 | 65 | 0.57 | 0.35, 0.93
>71 | 45 | 64 | 0.88 | 0.51, 1.49 | 86 | 44 | 0.38 | 0.22, 0.65
p-trend | | | 0.84 | | | | 0.0004 |
Median household income in 1999‡
≤$36,000 | 63 | 132 | 1.0 | | 68 | 96 | 1.0 |
>$36,000–$49,000 | 63 | 87 | 0.76 | 0.48, 1.20 | 69 | 62 | 0.70 | 0.44, 1.12
>$49,000–$66,000 | 64 | 82 | 0.70 | 0.44, 1.11 | 66 | 57 | 0.69 | 0.42, 1.11
>$66,000 | 45 | 71 | 0.91 | 0.55, 1.52 | 86 | 49 | 0.47 | 0.29, 0.76
p-trend | | | 0.52 | | | | 0.0032 |
Living in a single-family housing unit (%)‡
≤28 | 57 | 89 | 1.0 | | 74 | 79 | 1.0 |
>28–63 | 68 | 118 | 1.15 | 0.73, 1.81 | 63 | 78 | 1.06 | 0.66, 1.70
>63–91 | 61 | 88 | 0.93 | 0.58, 1.48 | 70 | 56 | 0.74 | 0.46, 1.19
>91 | 49 | 77 | 1.10 | 0.66, 1.83 | 82 | 51 | 0.61 | 0.38, 0.99
p-trend | | | 0.99 | | | | 0.020 |
Years of education for Hispanic men aged >25 years
<12 | 131 | 222 | 1.0 | | 143 | 166 | 1.0 |
12–15 | 83 | 123 | 1.02 | 0.70, 1.48 | 118 | 84 | 0.65 | 0.45, 0.94
≥16 | 15 | 16 | 0.75 | 0.36, 1.59 | 20 | 8 | 0.39 | 0.17, 0.93
p-trend | | | 0.69 | | | | 0.0040 |
Years of education for Hispanic women aged >25 years
<12 | 148 | 249 | 1.0 | | 155 | 176 | 1.0 |
12–15 | 75 | 110 | 1.03 | 0.71, 1.49 | 114 | 78 | 0.66 | 0.46, 0.95
≥16 | 6 | 6 | 0.69 | 0.22, 2.21 | 11 | 8 | 0.68 | 0.26, 1.77
p-trend | | | 0.86 | | | | 0.033 |
Years of education for men aged 45–64 years
<12 | 74 | 148 | 1.0 | | 64 | 91 | 1.0 |
12–15 | 145 | 205 | 0.81 | 0.56, 1.17 | 198 | 157 | 0.62 | 0.42, 0.91
≥16 | 16 | 19 | 0.73 | 0.35, 1.54 | 27 | 16 | 0.49 | 0.24, 0.99
p-trend | | | 0.23 | | | | 0.0096 |
Years of education for women aged 45–64 years
<12 | 80 | 158 | 1.0 | | 77 | 99 | 1.0 |
12–15 | 154 | 211 | 0.78 | 0.55, 1.12 | 210 | 164 | 0.68 | 0.47, 0.98
≥16 | 1 | 3 | —§ | | 2 | 1 | —§ |
p-trend | | | 0.24 | | | | 0.039 |
* OR, odds ratio; CI, confidence interval.
† Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race.
‡ Quartile categories based on the distribution among all respondents.
§ Odds ratio was not calculated because of small numbers.
TABLE 4. Comparison of selected census variables between respondents and nonrespondents, by case-control status, at the Detroit,
Michigan, center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000
Variable | Controls: no. of respondents | Controls: no. of nonrespondents | Controls: OR*,† | Controls: 95% CI* | Cases: no. of respondents | Cases: no. of nonrespondents | Cases: OR† | Cases: 95% CI
Proportion of Hispanics (%)‡
0 | 103 | 146 | 1.0 | | 137 | 105 | 1.0 |
>0–0.38 | 3 | 3 | —§ | | 2 | 2 | —§ |
>0.38–2.08 | 42 | 73 | 1.30 | 0.82, 2.07 | 81 | 58 | 0.93 | 0.60, 1.43
>2.08 | 47 | 91 | 1.44 | 0.93, 2.24 | 76 | 65 | 1.15 | 0.75, 1.77
p-trend | | | 0.08 | | | | 0.67 |
Proportion of non-Hispanic Whites (%)‡
≤60 | 59 | 103 | 1.0 | | 63 | 72 | 1.0 |
>60–91 | 50 | 70 | 1.04 | 0.52, 2.08 | 74 | 59 | 0.84 | 0.42, 1.71
>91–97 | 49 | 72 | 1.13 | 0.54, 2.36 | 74 | 46 | 0.72 | 0.34, 1.53
>97 | 37 | 68 | 1.40 | 0.66, 2.99 | 85 | 53 | 0.69 | 0.32, 1.46
p-trend | | | 0.28 | | | | 0.28 |
Median household income in 1999‡
≤$41,000 | 60 | 127 | 1.0 | | 63 | 82 | 1.0 |
>$41,000–$54,000 | 43 | 77 | 0.84 | 0.51, 1.39 | 80 | 49 | 0.53 | 0.31, 0.88
>$54,000–$72,000 | 47 | 61 | 0.61 | 0.36, 1.03 | 77 | 50 | 0.55 | 0.32, 0.93
>$72,000 | 45 | 48 | 0.49 | 0.28, 0.85 | 76 | 49 | 0.58 | 0.34, 0.99
p-trend | | | 0.0062 | | | | 0.080 |
Living in a single-family housing unit (%)‡
≤57 | 44 | 73 | 1.0 | | 79 | 58 | 1.0 |
>57–84 | 54 | 95 | 1.05 | 0.64, 1.74 | 69 | 57 | 1.20 | 0.73, 1.98
>84–98 | 49 | 85 | 1.02 | 0.61, 1.72 | 73 | 58 | 1.11 | 0.68, 1.82
>98 | 48 | 60 | 0.77 | 0.45, 1.31 | 75 | 57 | 1.19 | 0.72, 1.96
p-trend | | | 0.36 | | | | 0.57 |
Years of education for African-American men aged >25 years
<12 | 36 | 74 | 1.0 | | 38 | 48 | 1.0 |
12–15 | 76 | 122 | 0.80 | 0.49, 1.33 | 111 | 88 | 0.67 | 0.40, 1.15
≥16 | 15 | 14 | 0.45 | 0.19, 1.06 | 21 | 14 | 0.64 | 0.27, 1.48
p-trend | | | 0.093 | | | | 0.18 |
Years of education for African-American women aged >25 years
<12 | 22 | 43 | 1.0 | | 23 | 36 | 1.0 |
12–15 | 82 | 143 | 0.89 | 0.50, 1.60 | 115 | 103 | 0.59 | 0.32, 1.06
≥16 | 14 | 19 | 0.72 | 0.29, 1.76 | 25 | 17 | 0.46 | 0.20, 1.06
p-trend | | | 0.49 | | | | 0.053 |
Years of education for men aged 45–64 years
<12 | 30 | 59 | 1.0 | | 25 | 41 | 1.0 |
12–15 | 152 | 236 | 0.84 | 0.51, 1.39 | 249 | 173 | 0.44 | 0.25, 0.78
≥16 | 13 | 18 | 0.76 | 0.32, 1.79 | 22 | 16 | 0.46 | 0.20, 1.09
p-trend | | | 0.46 | | | | 0.030 |
Years of education for women aged 45–64 years
<12 | 23 | 42 | 1.0 | | 19 | 28 | 1.0 |
12–15 | 171 | 267 | 0.88 | 0.51, 1.53 | 275 | 201 | 0.46 | 0.24, 0.88
≥16 | 1 | 4 | —§ | | 2 | 1 | —§ |
p-trend | | | 0.89 | | | | 0.019 |
* OR, odds ratio; CI, confidence interval.
† Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race.
‡ Quartile categories based on the distribution among all respondents.
§ Odds ratio was not calculated because of small numbers.
TABLE 5. Comparison of selected census variables between respondents and nonrespondents, by case-control status, at the Seattle,
Washington, center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000
Variable | Controls: no. of respondents | Controls: no. of nonrespondents | Controls: OR*,† | Controls: 95% CI* | Cases: no. of respondents | Cases: no. of nonrespondents | Cases: OR† | Cases: 95% CI
Proportion of Hispanics (%)‡
≤0.62 | 76 | 70 | 1.0 | | 67 | 35 | 1.0 |
>0.62–2.43 | 72 | 66 | 0.96 | 0.60, 1.54 | 72 | 29 | 0.81 | 0.44, 1.51
>2.43–5.22 | 62 | 75 | 1.21 | 0.75, 1.94 | 82 | 32 | 0.80 | 0.44, 1.47
>5.22 | 67 | 104 | 1.56 | 0.99, 2.45 | 77 | 46 | 1.14 | 0.64, 2.03
p-trend | | | 0.033 | | | | 0.63 |
Proportion of non-Hispanic Whites (%)‡
≤76 | 68 | 117 | 1.0 | | 75 | 48 | 1.0 |
>76–86 | 68 | 69 | 0.62 | 0.39, 0.98 | 76 | 34 | 0.85 | 0.48, 1.52
>86–92 | 75 | 59 | 0.50 | 0.31, 0.79 | 73 | 24 | 0.64 | 0.35, 1.20
>92 | 66 | 70 | 0.70 | 0.44, 1.11 | 74 | 36 | 0.95 | 0.53, 1.69
p-trend | | | 0.062 | | | | 0.67 |
Median household income in 1999‡
≤$47,000 | 67 | 98 | 1.0 | | 77 | 47 | 1.0 |
>$47,000–$59,000 | 69 | 83 | 0.86 | 0.55, 1.35 | 75 | 33 | 0.81 | 0.46, 1.45
>$59,000–$73,000 | 70 | 74 | 0.79 | 0.50, 1.25 | 74 | 29 | 0.79 | 0.44, 1.43
>$73,000 | 71 | 60 | 0.63 | 0.39, 1.01 | 72 | 33 | 0.80 | 0.45, 1.43
p-trend | | | 0.053 | | | | 0.45 |
Living in a single-family housing unit (%)‡
≤49 | 72 | 105 | 1.0 | | 72 | 48 | 1.0 |
>49–78 | 68 | 64 | 0.68 | 0.43, 1.08 | 76 | 29 | 0.57 | 0.32, 1.04
>78–94 | 62 | 75 | 0.89 | 0.56, 1.41 | 81 | 29 | 0.56 | 0.31, 1.00
>94 | 75 | 71 | 0.70 | 0.44, 1.10 | 69 | 36 | 0.80 | 0.45, 1.41
p-trend | | | 0.23 | | | | 0.37 |
Years of education for men aged 45–64 years§
<12 | 1 | 2 | 1.0 | | 1 | 2 | 1.0 |
12–15 | 252 | 285 | | | 274 | 132 | |
≥16 | 24 | 28 | 1.06 | 0.60, 1.89 | 23 | 8 | 0.74 | 0.31, 1.75
Years of education for women aged 45–64 years
<12 | 5 | 6 | 1.0 | | 5 | 3 | 1.0 |
12–15 | 268 | 302 | 1.10 | 0.32, 3.74 | 288 | 138 | 1.94 | 0.41, 9.08
≥16 | 4 | 7 | 1.73 | 0.31, 9.85 | 5 | 1 | —¶ |
p-trend | | | 0.53 | | | | 0.70 |
* OR, odds ratio; CI, confidence interval.
† Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race.
‡ Quartile categories based on the distribution among all respondents.
§ Odds ratios were based on the comparison between ≥16 years and <16 years.
¶ Odds ratio was not calculated because of small numbers.
DISCUSSION
Response rates in epidemiologic studies, particularly
population-based case-control studies, have been decreasing for two decades (1). Low response rates increase the
likelihood of nonresponse bias, the systematic difference
in exposures and other factors between responding and
nonresponding groups. In our study, the response rates
ranged from 52 percent to 67 percent for cases and from
38 percent to 58 percent for controls, and among cases, the
distributions of sex, age, and histologic type differed
significantly between respondents and nonrespondents.
TABLE 6. Comparison of selected census variables and amount of surrounding cropland between respondents and nonrespondents,
by case-control status, at the Iowa center of a US multicenter study of non-Hodgkin lymphoma, 1998–2000
                      Controls                                           Cases
                      No. of       No. of                                        No. of       No. of
Variable              respondents  nonrespondents  OR*,†   95% CI*       respondents  nonrespondents  OR†     95% CI
Proportion of Hispanics (%)
  0                   161          118             1.0                   203          87              1.0
  >0–1.07             9            9               1.39    0.53, 3.62    17           9               1.19    0.49, 2.85
  >1.07               66           43              0.90    0.57, 1.44    63           51              1.84    1.14, 2.97
  p-trend                                                  0.78                                               0.015
Proportion of non-Hispanic Whites (%)
  ≤94                 58           42              1.0                   72           54              1.0
  >94                 178          128             0.99    0.62, 1.59    211          93              0.61    0.39, 0.96
Median household income in 1999‡
  ≤$34,000            63           45              1.0                   67           40              1.0
  >$34,000–$40,000    53           35              0.94    0.53, 1.68    80           44              0.94    0.53, 1.65
  >$40,000–$46,000    58           41              1.01    0.58, 1.77    69           24              0.58    0.31, 1.11
  >$46,000            62           49              1.14    0.66, 1.97    67           39              0.92    0.51, 1.66
  p-trend                                                  0.60                                               0.49
Living in a single-family housing unit (%)‡
  ≤71                 53           52              1.0                   75           48              1.0
  >71–83              61           37              0.59    0.34, 1.05    70           31              0.75    0.42, 1.36
  >83–91              63           43              0.67    0.39, 1.17    69           35              0.81    0.46, 1.45
  >91                 59           38              0.63    0.36, 1.12    69           33              0.69    0.38, 1.24
  p-trend                                                  0.17                                               0.26
Years of education for men aged 45–64 years
  <12                 10           9               1.0                   14           11              1.0
  12–15               225          160             0.83    0.32, 2.14    264          134             0.59    0.24, 1.42
  ≥16                 1            1               —§                    5            2               —§
  p-trend                                                  0.78                                               0.28
Years of education for women aged 45–64 years
  <12                 10           6               1.0                   10           11              1.0
  12–15               226          164             1.48    0.46, 4.73    271          136             0.46    0.18, 1.18
  ≥16                                                                    2
  p-trend                                                                                                     0.066
Proportion of people living in an urbanized area (%)¶
  ≥50                 70           65              1.0                   97           66              1.0
  <50                 166          105             0.66    0.44, 1.02    186          81              0.60    0.39, 0.93
Water obtained from public water system (%)
  ≥50                 198          142             1.0                   233          126             1.0
  <50                 38           28              1.04    0.61, 1.78    50           21              0.85    0.48, 1.52
Amount of cropland (acres#) within 500 m of subject’s residence**,††
  ≤6.7                22           20              1.0                   44           23              1.0
  >6.7–36.9           30           17              0.63    0.27, 1.48    35           21              1.14    0.54, 2.40
  >36.9               31           14              0.51    0.21, 1.25    34           17              0.95    0.44, 2.06
  p-trend                                                  0.14                                               0.93
* OR, odds ratio; CI, confidence interval.
† Multivariable odds ratio for the risk of being a nonrespondent, adjusted for age, sex, and race.
‡ Quartile categories based on the distribution among all respondents.
§ Odds ratio was not calculated because of small numbers.
¶ An urbanized area is defined by the Census Bureau as a densely settled territory that contains 50,000 or more people.
# 1 acre = 0.4 hectares.
** Categorization into tertiles of total acres of cropland within 500 m of the study subject’s house, averaged over three years (1998–2000), was based on
the distribution for all respondents.
†† Odds ratios were not adjusted for race.
TABLE 7. Odds ratios for non-Hodgkin lymphoma according to census tract educational level and block group household income, for
respondents only and for all subjects, in a US multicenter study of non-Hodgkin lymphoma, 1998–2000
                      Respondents only                          All subjects
                      No. of    No. of                          No. of    No. of
Variable              controls  cases     OR*,†   95% CI*       controls  cases     OR†     95% CI        Bias (%)‡
Years of education
  <12                 260       233       1.0                   670       483       1.0
  12–15               538       735       1.41    1.12, 1.77    1,145     1,157     1.43    1.22, 1.66    1
  ≥16                 145       198       1.37    1.01, 1.86    298       309       1.48    1.20, 1.84    8
Median household income in 1999§
  ≤$36,000            213       232       1.0                   529       467       1.0
  >$36,000–$47,000    241       296       1.05    0.82, 1.36    528       479       0.99    0.83, 1.19    6
  >$47,000–$63,000    230       299       1.11    0.84, 1.45    528       469       1.02    0.85, 1.23    8
  >$63,000            259       339       1.10    0.84, 1.45    528       534       1.18    0.97, 1.42    7
* OR, odds ratio; CI, confidence interval.
† Adjusted for age, sex, race, and study center.
‡ {[OR(respondents) − OR(all subjects)]/OR(respondents)} × 100.
§ Quartile categories based on the distribution among all controls.
Therefore, we were concerned about possible differences in
the distribution of exposure variables between respondents
and nonrespondents.
Since we had little individual information for nonrespondents, we selected census-derived variables that characterized the socioeconomic status of the population living in
the study subjects’ geographic census units and compared
this information between respondents and nonrespondents.
To the best of our knowledge, this is the first study to have
used a geographic information system to evaluate census
data and spatial clustering among respondents and nonrespondents in epidemiologic research. Census variables may
be useful surrogates for individual information because of
the small size of census units and the tendency for them to
be fairly homogeneous with regard to socioeconomic features. Linkage of participants in epidemiologic studies to
their census units has been used to estimate neighborhood
socioeconomic characteristics in public health studies (10).
FIGURE 1. Geocoded addresses of respondents and nonrespondents among cases of non-Hodgkin lymphoma (NHL) at the Detroit, Michigan,
center of a US multicenter study of NHL, 1998–2000. The oval shows the location of a significant elliptical cluster of nonrespondents. The cluster of
nonrespondents is explained by census demographic and socioeconomic characteristics. Semiminor axis: 8.2 km; semimajor axis: 16.3 km.
Our analysis of individual census variables showed that
among cases at one or more study centers, nonresponse was
statistically significantly associated with Hispanic ethnicity
and non-White race, lower household income, multiple-unit
housing, less education, and living in a more urbanized area.
Differences between respondents and nonrespondents were
most marked in Los Angeles. There were no significant
differences in census characteristics among responding
and nonresponding controls, except in Seattle, where nonresponse was associated with living in census blocks with
a greater proportion of Hispanics, and in Detroit, where
nonresponse was associated with lower income. However,
for many of the other census characteristics, the patterns of
risk of nonresponse among controls were similar to those
observed among cases. This was confirmed in multivariable
analyses, in which most census variables showed similar
associations among cases and controls, although there were
more significant contributors among cases.
Our results indicate that there was higher participation in
census units with higher socioeconomic status. Such a pattern of lower participation rates in populations with lower
socioeconomic status has been demonstrated in several
other studies (11–14). Our spatial clustering analysis confirmed that the low response rates in specific geographic
areas in Detroit and Los Angeles were completely explained
by demographic and socioeconomic factors. In future studies, it may be useful to monitor participation rates in small
census-defined areas using the cluster analysis approach to
target efforts to increase response rates.
Disparities between respondents and nonrespondents may
render case and control groups unrepresentative of the base
population with respect to exposure prevalence. However,
differences in exposure between respondents and nonrespondents do not necessarily lead to biased risk estimates
(15–17). Risk estimates are biased only if a risk factor (or
another factor related to a risk factor) differentially influences participation among cases and controls (16). In other
words, the odds ratio among respondents may be unbiased
even though the odds of exposure in responding cases and
controls are misrepresented, because the odds ratio is determined by the combined effect of response rates on all
exposure-disease combinations (15).
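The point can be illustrated numerically. Under the standard selection-bias identity, the odds ratio among respondents equals the odds ratio in the eligible population multiplied by the ratio of response probabilities across the four exposure-disease cells. The Python sketch below is only an illustration: the cell counts, the response probabilities, and the function names are hypothetical and are not taken from this study.

```python
import math

def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds ratio from a 2x2 case-control table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)

def respondent_odds_ratio(counts, response):
    """Odds ratio among respondents: each cell is scaled by its response probability."""
    responding = {cell: counts[cell] * response[cell] for cell in counts}
    return odds_ratio(**responding)

# Hypothetical eligible population (illustrative numbers only).
eligible = {"case_exp": 300, "case_unexp": 700, "ctrl_exp": 200, "ctrl_unexp": 800}
true_or = odds_ratio(**eligible)

# Scenario A: exposure lowers response by the same factor in cases and controls.
proportional = {"case_exp": 0.50, "case_unexp": 0.65, "ctrl_exp": 0.40, "ctrl_unexp": 0.52}
# Scenario B: exposure lowers response among cases only.
differential = {"case_exp": 0.50, "case_unexp": 0.65, "ctrl_exp": 0.52, "ctrl_unexp": 0.52}

or_a = respondent_odds_ratio(eligible, proportional)
or_b = respondent_odds_ratio(eligible, differential)
print(f"true OR = {true_or:.2f}; scenario A = {or_a:.2f}; scenario B = {or_b:.2f}")
```

In scenario A the exposed are underrepresented among both responding cases and responding controls, yet the odds ratio is unchanged (1.71). In scenario B, where exposure affects participation among cases only, the odds ratio among respondents falls to about 1.32.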
Despite the low response rates in our study, comparing
respondents with all eligible subjects showed that the
nonresponse bias in the odds ratios relating educational level
and household income to NHL risk was small (at most
8 percent; see table 7). The small bias was probably due to the
similar patterns of nonresponse by educational level and
household income among cases and controls. Other studies in
which the distribution of socioeconomic status varied
between respondents and nonrespondents have similarly
demonstrated minimal bias in risk estimates for socioeconomic status and for exposures that are correlated with socioeconomic status (12, 18). These findings are consistent with
research demonstrating that the degree of bias is tolerable as
long as the determinants of participation do not act dramatically differently for cases and controls and are not very
highly correlated with the exposure of interest (19).
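The percent bias reported in table 7 can be recomputed directly from the published odds ratios using the formula in the table's footnote. The short Python sketch below does this; the function and variable names are illustrative, and treating the Bias (%) column as magnitudes rather than signed values is an assumption made here because several of the signed differences are negative.

```python
def percent_bias(or_respondents, or_all_subjects):
    """Signed percent bias: [OR(respondents) - OR(all subjects)] / OR(respondents) x 100."""
    return (or_respondents - or_all_subjects) / or_respondents * 100

# (OR among respondents only, OR among all subjects), copied from table 7.
table7 = {
    "education 12-15 years": (1.41, 1.43),
    "education >=16 years": (1.37, 1.48),
    "income >$36,000-$47,000": (1.05, 0.99),
    "income >$47,000-$63,000": (1.11, 1.02),
    "income >$63,000": (1.10, 1.18),
}

magnitudes = {}
for category, (or_resp, or_all) in table7.items():
    signed = percent_bias(or_resp, or_all)
    # Assumption: the published column reports the magnitude of the bias.
    magnitudes[category] = round(abs(signed))
    print(f"{category}: signed {signed:+.1f}%, magnitude {magnitudes[category]}%")
```

The rounded magnitudes (1, 8, 6, 8, and 7 percent) match the Bias (%) column. The signed values show that restricting the analysis to respondents slightly understated the odds ratios for education and for the highest income quartile, and slightly overstated those for the two middle income quartiles.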
In summary, we have shown that responding and nonresponding cases differed by demographic and socioeconomic
characteristics in the NCI-SEER NHL Study. Our analysis
underscores the need for efforts to improve response rates
among subjects who are older or have a lower socioeconomic status (20). Cluster analysis may be a useful method
for identifying particular neighborhoods to target for these
efforts. Caution should be used in inferring characteristics of
the target population based on respondents only. Nevertheless, in this study, nonresponse did not have a substantial
impact on the risk of NHL associated with educational level
and household income, offering some assurance that the
magnitude of bias due to nonresponse is not likely to have
been large. However, we were unable to assess the potential
effect of nonresponse on specific exposure variables.
ACKNOWLEDGMENTS
This research was partly supported by the Intramural
Research Program of the National Institutes of Health,
National Cancer Institute, and conducted with contracts
N01-PC-67010, N01-PC-67008, N02-PC-71105, N02-CP31003, N01-PC-67009, and N01-PC-65064.
The authors thank Matthew Airola of Westat, Inc.
(Rockville, Maryland) for obtaining the census data and
mapping the study subjects. They acknowledge Muhammad
T. Salam for assistance with preliminary data analysis. The
authors thank Dr. Barry I. Graubard for statistical consultation. They also thank Nathan Appel of Information
Management Services, Inc. (Silver Spring, Maryland) for
preparing the data set.
Contributions of each author: M. S. conducted the
statistical analysis and was the main author of the
manuscript. M. H. W. designed the analysis. J. C., M. W.,
W. C., R. K. S., L. B., and J. R. C. designed and conducted
the case-control study. L. H. and L. P. were responsible for
the spatial scan analysis. L. M. M. and A. J. D. contributed
to the manuscript Discussion section. All authors contributed to manuscript review and editing.
Conflict of interest: none declared.
REFERENCES
1. Morton LM, Cahill J, Hartge P. Reporting participation in
epidemiologic studies: a survey of practice. Am J Epidemiol
2006;163:197–203.
2. Rothman KJ, Greenland S. Precision and validity in epidemiologic studies. In: Rothman KJ, Greenland S, eds. Modern
epidemiology. 2nd ed. Philadelphia, PA: Lippincott Williams &
Wilkins, 1998:115–34.
3. Chatterjee N, Hartge P, Cerhan JR, et al. Risk of non-Hodgkin’s lymphoma and family history of lymphatic, hematologic, and other cancers. Cancer Epidemiol Biomarkers Prev
2004;13:1415–21.
4. Ward MH, Nuckols JR, Giglierano J, et al. Positional accuracy of two methods of geocoding. Epidemiology 2005;16:
542–7.
5. Ward MH, Lubin J, Giglierano J, et al. Proximity to crops
and residential exposure to agricultural herbicides in Iowa.
Environ Health Perspect 2006;114:893–7.
6. SAS Institute, Inc. The GLIMMIX procedure. Cary, NC: SAS
Institute, Inc, 2005.
7. Kulldorff M. A spatial scan statistic. Commun Stat Theory
Methods 1997;26:1481–96.
8. Kulldorff M, Huang L, Pickle L, et al. An elliptic spatial scan
statistic. Stat Med 2006;25:3929–43.
9. Kulldorff M. SaTScan: software for the spatial and space-time
scan statistics. Version 7.0. Silver Spring, MD: Information
Management Services, Inc, 2006.
10. Krieger N, Chen JT, Waterman PD, et al. Geocoding and
monitoring of US socioeconomic inequalities in mortality
and cancer incidence: does the choice of area-based
measure and geographic level matter? The Public Health
Disparities Geocoding Project. Am J Epidemiol 2002;156:
471–82.
11. Benfante R, Reed D, MacLean C, et al. Response bias in
the Honolulu Heart Program. Am J Epidemiol 1989;130:
1088–100.
12. Richiardi L, Boffetta P, Merletti F. Analysis of nonresponse
bias in a population-based case-control study on lung cancer.
J Clin Epidemiol 2002;55:1033–40.
13. Psaty BM, Cheadle A, Koepsell TD, et al. Race- and ethnicity-specific characteristics of participants lost to follow-up in
a telephone cohort. Am J Epidemiol 1994;140:161–71.
14. Sonne-Holm S, Sorensen TI, Jensen G, et al. Influence of
fatness, intelligence, education and sociodemographic
factors on response rate in a health survey. J Epidemiol
Community Health 1989;43:369–74.
15. Criqui MH. Response bias and risk ratios in epidemiologic
studies. Am J Epidemiol 1979;109:394–9.
16. Austin MA, Criqui MH, Barrett-Connor E, et al. The effect
of response bias on the odds ratio. Am J Epidemiol 1981;
114:137–43.
17. Greenland S, Criqui MH. Are case-control studies more vulnerable to response bias? Am J Epidemiol 1981;114:175–7.
18. Chang ET, Zheng T, Weir EG, et al. Childhood social environment and Hodgkin’s lymphoma: new findings from a
population-based case-control study. Cancer Epidemiol
Biomarkers Prev 2004;13:1361–70.
19. Chen J, Wacholder S, Morton LM, et al. Quantifying selection
bias in epidemiologic studies. (Abstract 577). Am J Epidemiol
2005;161(suppl):S145.
20. Hartge P. Raising response rates: getting to yes. Epidemiology
1999;10:105–7.
21. Harris NL, Jaffe ES, Diebold J, et al. Lymphoma classification—from controversy to consensus: the R.E.A.L. and
WHO Classification of lymphoid neoplasms. Ann Oncol 2000;
11(suppl 1):3–10.