Public Opinion Quarterly, Vol. 81, Special Issue, 2017, pp. 338–356
ASSESSING CHANGES IN COVERAGE BIAS OF WEB
SURVEYS IN THE UNITED STATES
DAVID STERRETT*
DAN MALATO
JENNIFER BENZ
TREVOR TOMPSON
NED ENGLISH
Abstract The rising costs and declining response rates of traditional survey modes have spurred many organizations to conduct
surveys online. Less expensive broadband connections and the popularity of smartphones have also made it easier for many to access the
Internet. The General Social Survey (GSS) shows that the percentage of
American adults who use the Internet increased from 69 percent in 2006
to 86 percent in 2014. The increased access has reduced some concerns
about the representativeness of Internet surveys. However, there remains
little research into coverage bias, which occurs if those not in the sampling frame differ from the target population on variables of interest.
This raises a question: With the increase in Internet access, has there
been any change in the coverage bias of web surveys? To assess coverage bias, we analyze the GSS to determine whether those with Internet
access and those without it became more or less similar between 2006
and 2014. We calculate the potential coverage bias of web-only surveys
over these years for sex, age, education, income, race, political ideology,
urbanicity, and life satisfaction. We also compare the bias observed in
the United States to that in Europe. Our results illustrate that relative
coverage bias associated with education, income, race, and age declined
between 2006 and 2014, but bias still exists.
David Sterrett is a research scientist at NORC at the University of Chicago, Chicago, IL, USA.
Dan Malato is a principal research analyst at NORC at the University of Chicago, Chicago, IL,
USA. Jennifer Benz is a principal research scientist at NORC at the University of Chicago in
Boston, Boston, MA, USA. Trevor Tompson is the vice president for public affairs research at
NORC at the University of Chicago, Chicago, IL, USA. Ned English is a senior research methodologist II at NORC at the University of Chicago, Chicago, IL, USA. The authors thank Rene
Bautista, Ipek Bilgen, Allyson Holbrook, Timothy Johnson, and the anonymous reviewers for
helpful comments on earlier versions of the article. *Address correspondence to David Sterrett,
NORC at the University of Chicago, 55 E. Monroe Street, 30th Floor, Chicago, IL 60603, USA;
e-mail: [email protected].
doi:10.1093/poq/nfx002
© The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research.
All rights reserved. For permissions, please e-mail: [email protected]
The rise of the Internet in the past decade has dramatically changed how many
researchers conduct surveys. Researchers frequently rely on the Internet for
mixed-mode surveys, web-only surveys, and online panels, with the use of web
surveys growing increasingly common across a range of research disciplines
(Tourangeau, Conrad, and Couper 2013). The popular media, policy analysts,
and academic researchers frequently use Internet surveys of Americans, and
many academic publications feature data from surveys conducted online.
Since researchers first began experimenting with web surveys more than a
decade ago, a major concern has been whether they can be representative of the
general population (Couper 2000, 2007; Stern, Adams, and Elsasser 2009). As
of 2014, more than 23 million American households lacked Internet access, and
potential coverage bias resulting from differences between those with Internet
access and those without access could pose a significant challenge for some
uses of web surveys (Bureau of the Census 2014). By comparison, an address-based sample
can reach nearly everyone living in the United States, and only about 2
percent of households do not have access to a cell or landline phone that would
be included in a random-digit-dial sample frame (Pew Research Center 2015).
In spite of coverage bias concerns, web surveys are becoming increasingly
popular as more Americans gain access to the Internet. Less costly broadband
connections and the growth of smartphones have made it easier for many to get
online. The General Social Survey (GSS) finds that the percentage of adults
in the United States who have Internet access at home or through a mobile
device increased from 69 percent in 2006 to 86 percent in 2014 (Smith et al.
1972–2014). As more people are able to get online, coverage of web surveys
improves, but it is still far from complete.
Past studies of coverage bias of web-only surveys in the United States and more
recent research on declines in coverage bias in Europe raise several key questions:
1) Has there been a change in the coverage bias associated with demographic variables for web-only surveys in the United States?
2) Have changes in coverage bias been similar across demographic groups,
or are there differences?
3) How do patterns of change compare to those seen in Europe?
4) And finally, what do the results indicate about the state of coverage bias
for web-only surveys?
This research aims to answer these questions by using the GSS to compare
the demographics, political ideology, and life satisfaction of Americans with
Internet access and those without it from 2006 to 2014. The findings are noteworthy for researchers conducting online surveys in the United States.
Background
Coverage bias is a non-observational error in the total survey error framework
(Groves 1989; Groves and Lyberg 2010). Coverage bias can exist whenever
certain people or households in the population of interest never have a chance
to be studied because they are not part of the sampling frame (Groves et al.
2009). However, coverage bias occurs only when the people who never have
a chance to be studied differ from those who do have a chance to be
studied. Coverage bias depends on two factors: 1) the proportion of the population not covered; and 2) the difference in the statistic of interest between
those who are covered and those who are not covered. Since coverage bias
depends on this difference in the statistic of interest, coverage bias can vary
on a question-by-question basis. In web-only surveys, those who do not have
Internet access are not covered because they do not have an opportunity to
complete such a survey.
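The two factors described above combine multiplicatively in the standard expression for the coverage bias of a mean or proportion. The following Python sketch illustrates that relationship under stated assumptions: the function name and the 40 percent and 15 percent figures are hypothetical and are not GSS estimates; only the 14 percent share without access echoes the 2014 GSS figure cited in this article.

```python
def coverage_bias(share_not_covered: float,
                  covered_value: float,
                  not_covered_value: float) -> float:
    """Bias of the covered-population value relative to the full population.

    share_not_covered: proportion of the target population outside the frame.
    covered_value / not_covered_value: the statistic of interest (a mean or
    proportion) among those inside and outside the frame.
    """
    return share_not_covered * (covered_value - not_covered_value)


# Hypothetical item: 40% of people with Internet access and 15% of people
# without access hold some characteristic; 14% of the population lacks access.
bias = coverage_bias(0.14, 0.40, 0.15)

# Cross-check against the definition: covered value minus full-population value.
full_population_value = (1 - 0.14) * 0.40 + 0.14 * 0.15
assert abs(bias - (0.40 - full_population_value)) < 1e-12

print(f"coverage bias: {bias:.3f}")
```

If either factor is zero (everyone is covered, or the covered and uncovered groups do not differ on the item), the bias is zero, which is why coverage bias can vary from question to question.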
There is a wealth of research highlighting the potential coverage bias of
web-only surveys and documenting disparities in Internet use or access (see
Robinson et al. 2015). Access to the Internet varies across locations (Stern,
Adams, and Elsasser 2009). In particular, there can be significant differences
in connectivity between rural, suburban, and urban communities in the United
States (Mossberger, Tolbert, and McNeal 2007; Sylvester and McGlynn 2010;
Stern and Rookey 2012). Socioeconomic factors (DiMaggio et al. 2004; Stern
and Dillman 2006) and race and ethnicity (Mesch and Talmud 2011; Campos-Castillo 2014) have also been associated with Internet access. Much of this
research, however, was conducted when there were lower levels of Internet
access in the United States (see Tourangeau, Conrad, and Couper 2013 for
a review). Moreover, studies show that the potential coverage bias of web-only surveys has declined in a number of European nations in recent years as
Internet access has increased (Vicente and Reis 2012; Mohorko, de Leeuw,
and Hox 2013).
Measuring Internet Coverage
There are a number of ways to measure and conceptualize Internet access
(see Tourangeau, Conrad, and Couper 2013 for a review). The GSS has asked
about Internet access at home since 2006 and Internet access through a mobile
device since 2010. For this research, Internet access from 2010 to 2014
includes both people with Internet at home and those with access via a mobile
device. For 2006 and 2008, Internet access includes just those with Internet
access at home.
Several other general population surveys attempt to estimate the number of
people with access to the Internet, and these surveys vary in their approach
and the unit of measurement. Despite differences between how these organizations measure Internet access, they have produced relatively similar estimates
of access in the United States during the past decade. A comparison of estimates of Internet access from the GSS, Pew Research Center, and National
Telecommunications and Information Administration (NTIA) can be found in
figure 1. More details on each organization’s methods for measuring access
can be found in appendix 1.
Data for This Analysis
We use the GSS to assess the potential coverage bias of web surveys in the
United States. NORC at the University of Chicago, with funding from the
National Science Foundation, has conducted the GSS at least every two years
since 1972. The data allow for a direct comparison of those with Internet
access and those without Internet access. The GSS features a multi-stage area
probability sample designed to generate a nationally representative survey. A
number of past studies examining coverage bias of web surveys in the United
States and Europe have utilized in-person surveys that ask respondents about
Internet access (Lee 2006; Mohorko, de Leeuw, and Hox 2013; Tourangeau,
Conrad, and Couper 2013). In-person surveys with address-based sampling
designs offer the greatest coverage of the general population and can produce
samples that are highly representative of both those with and without Internet
access (Groves et al. 2009). The GSS, an in-person survey that asks about
Internet access, allows for an analysis of those with and without access based
on the same questions asked at the same time in the same survey mode.1
Figure 1. Estimates of Internet Access across Organizations.
1. The items used to assess coverage bias are part of two question modules. The demographic
questions and mobile Internet access question are part of a core module asked near the start of the
survey. The question about Internet access at home is part of a science module asked in the middle
part of the survey. The placement of all these questions within the modules remains consistent
across years.
GSS data from 2006 to 2014 are used in this analysis for examining changes in
coverage bias over time. Response rates have remained around 70 percent since
2006 (Smith et al. 1972–2014). As with other surveys, nonresponse and sample
design present possible sources of error when measuring Internet access on the
GSS, but the area-probability sampling and the relatively high response rates
help reduce the potential for such error. Response rates and completed interview
totals for the 2006–2014 surveys can be seen in table 1. Descriptive statistics and
sample sizes for variables included in models can be found in table 2.
The following measures are used for our analysis. The question wording
and coding for each variable are in appendix 2.
INTERNET ACCESS
As noted above, Internet access is calculated based on a question about Internet
access at home and a question about Internet access via a mobile device.
The variable is coded 0 for people who do not have access at home or
through a mobile device, and 1 for people who have access either at home or
through a mobile device. About 2 percent of people reported having Internet
access through a mobile device but not at home in 2014, and including these
people in the access group provides an inclusive measure of coverage. This
more inclusive measure of access assumes that a web-only survey can be completed via a mobile device. In addition, this measure assumes that everyone
with access at home has an Internet connection conducive to completing surveys (i.e., a connection that is reliable and fast).
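The coding rule for the combined access measure can be sketched in Python as follows. This is an illustration rather than the authors' code: the function name is hypothetical, the two inputs stand for the home and mobile items (INTRHOME and WEBMOB in appendix 2), and treating a respondent with no valid answer to either item as missing is an assumption that the article does not spell out.

```python
from typing import Optional


def code_internet_access(home: Optional[int],
                         mobile: Optional[int] = None) -> Optional[int]:
    """Return 1 if the respondent reports access at home or via a mobile
    device, 0 if neither, and None if no valid answer is available.

    home, mobile: 1 = yes, 0 = no, None = not asked or not answered.
    The mobile item was not asked before 2010, so it defaults to None.
    """
    answers = [a for a in (home, mobile) if a is not None]
    if not answers:
        return None  # assumption: treated as missing on the access variable
    return 1 if any(a == 1 for a in answers) else 0


# A 2014 respondent with mobile-only access counts as having access.
print(code_internet_access(home=0, mobile=1))
# A 2006 respondent is coded from the home item alone.
print(code_internet_access(home=0))
```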
TIME
To test whether change in coverage over time is linear, the analysis includes
both a linear and a quadratic measure of time. For the linear variable, 2006 = 0,
2008 = 2, 2010 = 4, 2012 = 6, and 2014 = 8. For the quadratic variable,
2006 = 0, 2008 = 4, 2010 = 16, 2012 = 36, and 2014 = 64.
DEMOGRAPHIC CHARACTERISTICS AND ATTITUDES
To make this analysis comparable to that of Mohorko, de Leeuw, and Hox
(2013) on Internet coverage in Europe, we include measures of gender, age,
education, income, race, ethnicity, urbanicity, life satisfaction, and political
ideology.
Table 1. Response Rates for GSS from 2006 to 2014
Year      Response rate      Interviews
2006      71.2               4,510
2008      70.4               2,023
2010      70.3               2,044
2012      71.4               1,974
2014      69.2               2,538
Table 2. Sample Sizes and Descriptive Statistics for GSS Variables by Year 2006–2014
                                      2006         2008         2010         2012         2014
Total GSS sample                      N = 4,510    N = 2,023    N = 2,044    N = 1,974    N = 2,538
Internet variable                     N = 1,862    N = 1,492    N = 712      N = 993      N = 1,238
  Have Internet access                69%          74%          77%          80%          86%
  No Internet access                  31%          26%          23%          20%          14%
Age (missing items)                   18           10           3            5            9
  18–29 years old                     20%          21%          20%          20%          18%
  30–49 years old                     42%          38%          37%          38%          35%
  50–64 years old                     24%          26%          26%          25%          29%
  65 and older                        14%          15%          17%          16%          17%
Race (missing items)                  0            0            0            0            0
  Black                               12%          13%          14%          14%          14%
  Hispanic                            15%          14%          13%          16%          18%
  Non-Hispanic white                  69%          69%          69%          64%          64%
Education (missing items)             3            1            0            0            0
  Less than high school degree        15%          14%          15%          15%          13%
  A high school degree                51%          50%          50%          49%          51%
  More than a high school degree      34%          35%          35%          36%          36%
Income (missing items)                637          249          239          216          224
  Family income < $20,000             18%          17%          21%          18%          17%
  Family income $20,000–$50,000       32%          28%          30%          30%          28%
  Family income > $50,000             50%          55%          49%          51%          55%
Gender (missing items)                0            0            0            0            0
  Female                              54%          53%          55%          54%          54%
  Male                                46%          47%          45%          46%          46%
Urbanicity (missing items)            0            0            0            0            0
  Live in rural area                  11%          10%          11%          11%          10%
  Live in suburban area               33%          32%          31%          35%          35%
  Live in urban area                  57%          58%          58%          55%          55%
Political ideology (missing items)    177          90           71           100          89
  Liberal                             26%          26%          29%          27%          26%
  Moderate                            39%          38%          37%          39%          40%
  Conservative                        35%          36%          34%          34%          34%
Life satisfaction (missing items)     1,524*       8            5            10           8
  Very or pretty happy                88%          86%          86%          87%          88%
  Not too happy                       12%          14%          14%          13%          12%
*Missing or not asked.
Analysis
We model our analysis on the work of Mohorko, de Leeuw, and Hox (2013),
which assessed changes in Internet coverage bias between 2005 and 2009 in
Europe. They designated the Eurobarometer survey samples for those years as
proxies for the target populations of European countries. Similarly, we use the
GSS samples as proxies for the US population. In both the European and US
studies, the findings are subject to possible nonresponse and other errors in the
surveys that serve as population proxies. Thus, we present relative coverage
bias estimates, comparing characteristics of respondents who report having
Internet access to the total achieved GSS samples. In regression analyses of
change over time, like Mohorko, de Leeuw, and Hox, we analyze the absolute
value of the relative coverage bias (the absolute relative coverage bias), to
avoid the problem of positive and negative values for relative coverage bias
possibly canceling each other out and giving a false impression of no change.
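As an illustration of the quantity analyzed here, the Python sketch below computes a relative coverage bias for one category as the weighted share of that category among respondents with Internet access minus its weighted share in the full sample. That reading follows the note to table 4 (a value of 0.01 indicates 1 percentage point of overrepresentation) and is an assumption about the exact computation; the function name, record layout, and the ten-person sample are hypothetical.

```python
from typing import Iterable, Tuple

# (in_category, has_access, weight) for one respondent; layout is hypothetical.
Respondent = Tuple[bool, bool, float]


def relative_coverage_bias(respondents: Iterable[Respondent]) -> float:
    """Weighted share of the category among those with access, minus the
    weighted share of the category in the full sample."""
    total_w = cat_w = access_w = access_cat_w = 0.0
    for in_category, has_access, weight in respondents:
        total_w += weight
        cat_w += weight * in_category
        if has_access:
            access_w += weight
            access_cat_w += weight * in_category
    return access_cat_w / access_w - cat_w / total_w


# Hypothetical sample of 10 equally weighted respondents: 4 are in the
# category (3 of them with access), 6 are not (2 of them with access).
sample = (
    [(True, True, 1.0)] * 3 + [(True, False, 1.0)] * 1
    + [(False, True, 1.0)] * 2 + [(False, False, 1.0)] * 4
)
bias = relative_coverage_bias(sample)
absolute_bias = abs(bias)  # the quantity used in the regressions over time
print(round(bias, 3), round(absolute_bias, 3))
```

Taking the absolute value before modeling change over time keeps an overrepresentation in one year and an underrepresentation in another from offsetting each other.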
We examine how Internet access relates to demographic characteristics and
look at changes from 2006 to 2014. First, we calculate the relative Internet
coverage bias for age, gender, income, education, race and ethnicity, political ideology, urbanicity, and happiness. Then, we use bivariate regressions to
determine whether there has been a change in the absolute relative coverage
bias associated with these characteristics from 2006 to 2014. We test both linear and quadratic measures of time, and we analyze whether the rate of change
in bias varies across characteristics. In addition, we use multivariate regression
to explore which demographic factors were associated with Internet access in
2014. Finally, we compare the demographic changes in access in the United
States to the findings of previous research that document changes in Internet
access in European countries.
We conduct all of the analyses in Stata 14 with the GSS weights
(WTSSNR) that account for differences in both the probability of selection and
nonresponse.
Results
DEMOGRAPHIC DIFFERENCES
Ordinary least squares (OLS) regressions show that absolute relative coverage bias has significantly declined for some, but not all, demographic characteristics in the
United States between 2006 and 2014. Similar to the research on the changes
in coverage bias in Europe (Mohorko, de Leeuw, and Hox 2013), the quadratic
effect of time was never significant and only the linear effect of time was used
in the final models. These models show a significant decline in absolute relative
coverage bias associated with income, education, race and ethnicity, and age
over time (see table 3). In 2006, people with more than a high school degree
were overrepresented on a web-only survey by about 12.3 percentage points,
but there is a decline in absolute relative coverage bias of education of about 1
percentage point a year (coefficient = –1.01, p < .05). A similar pattern emerges
for income, with those with high incomes overrepresented in web-only surveys
by about 13.8 percentage points in 2006, and a decline in absolute relative coverage bias of about 1 percentage point per year (coefficient = –0.97, p < .05).
Overrepresentation of whites also declined significantly between 2006 and 2014
(coefficient = –0.78, p < .01), while the decrease in overrepresentation of people
under age 30 is marginally significant (coefficient = –0.18, p < .10). The changes in
absolute relative coverage bias associated with gender, political ideology, happiness, and urbanicity are not significant.
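The linear time trend for a single characteristic can be sketched with a hand-coded bivariate least squares fit. The example below is an unweighted illustration, not the authors' Stata analysis: the helper name is hypothetical, time is coded as years since 2006 as described under TIME, and the five values are the relative coverage bias figures for family income above $50,000 as read from table 4, converted to percentage points.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


years = [2006, 2008, 2010, 2012, 2014]
time = [year - 2006 for year in years]  # 0, 2, 4, 6, 8

# Relative coverage bias for high income, in percentage points
# (values as read from table 4; rounding there limits precision).
high_income_bias = [14.1, 10.6, 11.5, 7.6, 5.9]

intercept, slope = ols_line(time, [abs(v) for v in high_income_bias])
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Under these assumptions the fitted line comes out close to the income row of table 3 (intercept 13.79, time coefficient –0.97); small differences are to be expected because the inputs are rounded.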
Despite the declines in coverage bias related to education levels, income
levels, race, and age over time, the web-only population remained significantly different from the overall population on several demographic measures
(see table 4). Education, income, age, race, and urbanicity all remain predictive of Internet access in 2014.
Table 3. Change in Absolute Relative Coverage Bias across
Demographic Factors in the United States from 2006 to 2014
Model                                            Intercept (SE)     Time 2006–2014 (SE)
Sex (over-represent male)                        0.82 (.75)         0.07 (.15)
Age (over-represent under 30)                    1.89** (.27)       –0.18+ (.06)
Education (over-represent high education)        12.27** (1.17)     –1.01* (.24)
Income (over-represent high income)              13.79** (.94)      –0.97* (.19)
Race (over-represent white)                      8.37** (.53)       –0.78** (.11)
Urbanicity (over-represent suburb)               3.66** (.60)       –0.19 (.12)
Happiness (over-represent happy)                 3.54 (1.56)        –0.42 (.32)
Political ideology (under-represent moderate)    3.10+ (1.09)       –0.29 (.22)
OLS regressions. +p < .10; *p < .05; **p < .01.
Table 4. Relative Coverage Bias for Demographic Characteristics across Years
                                               Relative coverage bias                          Average relative
                                                                                               coverage bias
                                               2006      2008      2010      2012      2014    2006–2014
Age(a)
  18–29 years old                              0.018     –0.014    0.014     –0.012    0.001   0.001
  30–49 years old                              0.025     0.047     0.032     –0.008    0.026   0.024
  50–64 years old                              0.001     0.015     0.005     0.023     0.004   0.010
  65 and older                                 –0.043    –0.047    –0.051    –0.003    –0.031  –0.035
Race and ethnicity(a)
  Black                                        –0.027    –0.033    –0.016    –0.035    0.003   –0.021
  Hispanic                                     –0.076    –0.030    –0.026    –0.009    –0.027  –0.034
  Non-Hispanic white                           0.090     0.061     0.047     0.042     0.021   0.052
Education(a)
  Less than high school degree                 –0.092    –0.057    –0.050    –0.053    –0.044  –0.059
  A high school degree                         –0.038    –0.024    –0.036    –0.016    0.017   –0.019
  More than a high school degree               0.130     0.082     0.086     0.069     0.027   0.079
Household income(a)
  Family income less than $20,000              –0.067    –0.059    –0.051    –0.051    –0.027  –0.051
  Family income between $20,000 and $50,000    –0.057    –0.043    –0.037    –0.019    –0.024  –0.036
  Family income more than $50,000              0.141     0.106     0.115     0.076     0.059   0.099
Gender
  Female                                       –0.002    –0.010    0.019     –0.021    –0.004  –0.004
  Male                                         0.002     0.010     –0.019    0.021     0.004   0.004
Urbanicity(a)
  Live in rural area                           –0.020    –0.015    –0.001    –0.015    –0.012  –0.013
  Live in suburban area                        0.039     0.024     0.034     0.032     0.017   0.029
  Live in urban area                           –0.019    –0.009    –0.032    –0.018    –0.005  –0.017
Political ideology
  Liberal                                      0.017     0.007     0.035     –0.010    0.008   0.011
  Moderate                                     –0.039    –0.015    –0.010    –0.031    –0.002  –0.020
  Conservative                                 0.023     0.008     –0.025    0.041     –0.006  0.008
Life satisfaction
  Very or pretty happy                         0.024     0.055     0.002     –0.007    0.005   0.016
  Not too happy                                –0.024    –0.055    –0.002    0.007     –0.005  –0.016
Note.—Positive values indicate overrepresentation, and negative values indicate underrepresentation. A value of 0.01 would indicate 1 percentage point of overrepresentation, while a value of –0.045 would indicate 4.5 percentage points of underrepresentation.
(a) Differences among categories within this variable are statistically significant in the 2014 GSS based on multivariate logistic regressions.
COMPARISON TO PAST RESEARCH ON EUROPE
The changes in coverage bias in the United States over time are comparable to,
but distinct from, the declines in coverage bias in Europe detailed by Mohorko,
de Leeuw, and Hox (2013). One key similarity is that in both the United States
and Europe there has been a significant decline in the overrepresentation of
highly educated people in web-only populations. Between 2005 and 2009,
bias in Europe of those who are highly educated declined by an average of .78
percentage points per year (Mohorko, de Leeuw, and Hox 2013), which is similar to the 1-percentage-point decline observed in the United States. The overrepresentation of young people also declined in both Europe and the United
States, although the rate of reduction that Mohorko et al. found in Europe
from 2005 to 2009 (.60 percentage points a year) is greater than the observed
decline in the United States from 2006 to 2014 (.18 percentage points a year).
Differences in coverage patterns also emerge between the United States and
Europe when looking at trends in gender balance and life satisfaction. From
2005 to 2009, Europe showed a significant decline in the overrepresentation
of males and those who say they are happy (Mohorko, de Leeuw, and Hox
2013), though no significant decline in bias among those groups was observed
in the United States, as bias related to those variables was relatively low in the
United States from 2006 to 2014.
The potential Internet coverage error for most subgroups in Europe has
likely decreased since Mohorko, de Leeuw, and Hox’s (2013) analysis, since
levels of Internet access have continued to increase since 2009. Statistics from
Eurostat (see table 5) measuring the percentage of people aged 16 to 74 who
have used the Internet in the past three months show that Internet access for
the Euro area has increased from an average of 53 percent of the population in 2006 to about 78 percent in 2014 (Eurostat 2016). The proportion of
Americans with Internet access in 2014, about 86 percent, is similar to the
levels of access in places such as Germany, Belgium, and France. In contrast, more people had Internet access in places such as Iceland, Denmark,
and Norway (more than 96 percent), while fewer had access in places such as
Poland, Portugal, Greece, and Italy (less than 70 percent).
Discussion
This research illustrates that the relative Internet coverage bias associated with
education, income, race, and age in the United States declined significantly
from 2006 to 2014, and the rate of decline varied across different variables. No
significant decline in coverage bias was observed, however, with gender, political ideology, happiness, or urbanicity. The declines in coverage bias related
to education and age were similar to those observed in Europe from 2005 to
2009 by Mohorko, de Leeuw, and Hox (2013), although declines in bias associated with gender and life satisfaction were observed in Europe and not the
United States. The analysis also shows that education, income, age, race, and
urbanicity remain predictors of Internet access in the United States despite the
declines in coverage bias across time.
Table 5. Percent of Those Aged 16–74 Who Used Internet in Past Three
Months
Country/region                              2006    2008    2010    2012    2014
Iceland                                     88      91      93      96      98
Denmark                                     83      84      88      92      96
Norway                                      81      89      93      95      96
Luxembourg                                  71      81      90      92      95
Netherlands                                 81      87      90      93      93
Sweden                                      86      88      91      93      93
Finland                                     77      83      86      90      92
United Kingdom                              66      76      83      87      92
Germany                                     69      75      80      82      86
Belgium                                     62      69      78      81      85
Estonia                                     61      66      74      78      84
France                                      47      68      75      81      84
Austria                                     61      71      74      80      81
Czech Republic                              44      58      66      73      80
Ireland                                     51      63      67      77      80
Slovakia                                    50      66      76      77      80
Euro area                                   53      62      69      74      78
Spain                                       47      56      64      69      76
Latvia                                      50      61      66      73      76
Hungary                                     44      58      61      70      76
Malta                                       38      49      62      69      73
Lithuania                                   42      53      60      66      72
Slovenia                                    51      56      68      68      72
Croatia                                     NA      42      54      62      69
Cyprus                                      34      39      52      61      69
Former Yugoslav Republic of Macedonia       25      42      52      57      68
Poland                                      40      49      59      62      67
Portugal                                    36      42      51      60      65
Greece                                      29      38      44      55      63
Italy                                       36      42      51      56      62
Bulgaria                                    24      35      43      52      55
Romania                                     21      29      36      46      54
Turkey                                      NA      32      38      43      48
Source.—Eurostat 2016.
This study may underestimate the extent of potential coverage bias because
it assumes that everyone with Internet access has personal capability and adequate facilities to complete a web survey. Beyond having access to the Internet, people
need a certain level of proficiency to complete online tasks such as surveys
(Hargittai and Hsieh 2012), and research indicates that Internet proficiency
and/or skills can vary across groups (Mossberger, Tolbert, and McNeal 2007;
Stern, Adams, and Elsasser 2009). Past studies show that people lacking the
necessary proficiency to complete a survey online are likely to be different
socially, economically, and politically than those with the access and skills
to complete a web survey (Selwyn 2004; Mossberger, Tolbert, and McNeal
2007).
The results of this study have a number of implications for researchers. Our
results show that although potential coverage bias has decreased for demographics like education, income, race, and age, there are still significant differences in access to the Internet based on those demographic variables. Since the
proportion of Americans with Internet access has remained relatively stable in
recent years (Pew reports 84 percent of Americans with access in 2013, 2014,
and 2015), researchers need to continue to consider the potential for coverage bias when conducting web-only surveys. The impact of coverage error
likely depends on the particular study objectives and highlights the need for a
fit-for-purpose research design. The coverage error associated with web-only
surveys poses challenges to tracking changes over time, as apparent shifts in
statistics of interest could be confounded with increases or decreases in coverage
bias among certain subgroups.
The potential coverage bias of key demographic factors inherent to web-only surveys highlights the advantages of using mixed-mode designs to sample and survey the broader population. Mail, telephone, or in-person surveys
all provide an opportunity to reach people without Internet access and reduce
coverage bias across groups. However, the reductions in coverage bias of
mixed-mode surveys depend on the specific survey design, and reductions
in coverage bias need to be weighed against the challenges associated with
multiple modes. In addition, it would be worthwhile to assess the benefits
and consequences of weight adjustments that consider differences in Internet
access within demographic cells.
Potential coverage bias of web-only surveys is declining for several demographic groups, but Americans without Internet access remain a distinct segment of society that should be included in any survey designed to make precise
inferences about the broader public.
Appendix 1. Other Measures of Internet Access Rates
Surveys other than the GSS have also attempted to estimate the number of
people with access to the Internet, and these surveys vary in their approach
and the unit of measurement. Some ask about frequency of Internet use, while
other surveys ask about access to the Internet at various places, such as at
home or work. Some approach this at an individual level, while others look at
it at the household level. The Pew Research Center has tracked Internet use
since 2000, and has used a few different approaches to measuring Internet
access during that time (Perrin and Duggan 2015). From January 2005 through
February 2012, an Internet user was defined as someone who said “yes” to
either “Do you use the Internet, at least occasionally?” or “Do you send or
receive email, at least occasionally?” From April 2012 through April 2013,
users were defined as anyone who said “yes” to at least one of these three
questions: “Do you use the Internet, at least occasionally?”; “Do you send or
receive email, at least occasionally?”; or “Do you access the Internet on a cell
phone, tablet, or other mobile handheld device, at least occasionally?” Since
then, they have included respondents who said “yes” to either “Do you use
the Internet or email, at least occasionally?” or “Do you access the Internet on
a cell phone, tablet, or other mobile handheld device, at least occasionally?”
The National Telecommunications and Information Administration’s
(NTIA) estimates of access are based on the Current Population Survey and
reflect the number of Americans age 15 and older who say they go online
from any location. NTIA also estimates Internet users age three and older
and Internet access by household, but figures for those age 15 and older are
included here to provide a closer comparison to the GSS and Pew’s findings.
The questions used by NTIA to measure access have changed over the years.
In 2007–2009, CPS asked, “(Do you/Does anyone) in this household use the
Internet at any location?” and then followed up by asking which people use the
Internet. Starting in 2010, CPS began to ask if people used the Internet in specific locations, then created an estimate of Internet use based on anyone who
accessed the Internet from any of the locations asked about. While the exact
wording of the questions has changed in small ways and a few additional specific locations have been added to the list asked about between 2010 and 2015,
the most recent versions of these questions are “(Do you/Does anyone in this
household, including you,) use the Internet at [home/work/school/coffee shop
or other business that offers Internet access/while traveling between places/
library, community center, park, or other public place/someone else’s home/
some other location we haven’t covered yet]?” There is a follow-up for each
location to determine who uses the Internet at that location. The NTIA Internet
estimate includes anyone who said yes to any of the locations of Internet use
(National Telecommunications and Information Administration 2016).
Appendix 2. GSS Question Wording and Variable Coding
INTERNET ACCESS VARIABLES
Internet access at home (INTRHOME): Respondents were asked, “Do you
have access to the Internet in your home?” Coded 1 if respondent says yes and
0 if respondent says no.
Internet access, mobile device (WEBMOB): Respondents were asked, “Do
you have access to the Internet or World Wide Web in your home through
an Internet-enabled mobile device like a smart phone, PDA, or BlackBerry?”
Coded 1 if respondent says yes and 0 if respondent says no.
Internet access, at home or through a mobile device (INTRHOMERC):
Respondents were asked, “Do you have access to the Internet in your home?”
Coded 1 if respondent says yes and 0 if respondent says no. Some respondents were also asked, “Do you have access to the Internet or World Wide Web
in your home through an Internet-enabled mobile device like a smart phone,
PDA, or BlackBerry?” Coded 1 if respondent says yes to either of these questions and 0 if they did not say yes to either.
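The combined coding can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function and argument names are hypothetical, and representing a question that was not asked or not answered as None (and counting it as "did not say yes") is an assumption of the sketch.

```python
from typing import Optional


def code_intrhomerc(home_yes: Optional[bool], mobile_yes: Optional[bool]) -> int:
    """Return 1 if the respondent said yes to either access question, else 0.

    None stands for a question that was not asked or not answered; it is
    treated here as "did not say yes" (an assumption of this sketch).
    """
    return 1 if (home_yes is True or mobile_yes is True) else 0


print(code_intrhomerc(False, True))   # 1: access through a mobile device only
print(code_intrhomerc(False, None))   # 0: no home access, mobile item not asked
```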
DEMOGRAPHIC VARIABLES
Age 18–29 years old (AGE): Respondents were coded 1 if between the ages of
18–29 years old and 0 if older than 29 years old.
Age 30–49 years old (AGE): Respondents were coded 1 if between the ages
of 30–49 years old and 0 if younger than 30 or older than 49 years old.
Age 50–64 years old (AGE): Respondents were coded 1 if between the ages
of 50–64 years old and 0 if younger than 50 or older than 64 years old.
Age 65 and older (AGE): Respondents were coded 1 if 65 years old or older
and 0 if younger than 65 years old.
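The four age indicators above are mutually exclusive, which a short Python sketch can make explicit. The function name and the dictionary keys are hypothetical; only the age cutoffs come from the definitions above.

```python
from typing import Dict


def age_dummies(age: int) -> Dict[str, int]:
    """Return the four 0/1 age-group indicators for an adult respondent.

    The sketch assumes age is 18 or older; a younger age would yield all zeros.
    """
    return {
        "age_18_29": int(18 <= age <= 29),
        "age_30_49": int(30 <= age <= 49),
        "age_50_64": int(50 <= age <= 64),
        "age_65_plus": int(age >= 65),
    }


print(age_dummies(49))
# {'age_18_29': 0, 'age_30_49': 1, 'age_50_64': 0, 'age_65_plus': 0}
```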
Black (RACE, HISPANIC): Respondents were asked, “What race do you
consider yourself? Are you Spanish, Hispanic, or Latino/Latina?” Coded 1 for
non-Hispanic black and 0 for not non-Hispanic black.
Hispanic (RACE, HISPANIC): Respondents were asked, “What race do
you consider yourself? Are you Spanish, Hispanic, or Latino/Latina?” Coded
1 for Hispanic and 0 for not Hispanic.
Non-Hispanic white (RACE, HISPANIC): Respondents were asked, “What
race do you consider yourself? Are you Spanish, Hispanic, or Latino/Latina?”
Coded 1 for non-Hispanic white and 0 for not non-Hispanic white.
Less than a high school degree (DEGREE): Respondents’ education is
based on a series of questions about how many years of education they completed and what degrees they received. Coded 1 if respondent reports less than
a high school degree, and 0 if respondent reports a high school degree or more
education.
A high school degree (DEGREE): Respondents’ education is based on a
series of questions about how many years of education they completed and
what degrees they received. Coded 1 if respondent reports a high school
degree, and 0 if respondent reports either less than a high school degree or
more than a high school degree.
More than a high school degree (DEGREE): Respondents’ education is
based on a series of questions about how many years of education they completed and what degrees they received. Coded 1 if respondent reports either
a junior college degree, bachelor’s degree, or graduate degree. Coded 0 if
respondent reports a high school degree or less education.
Family income less than $20,000 (INCOME06): Respondents were asked,
“In which of these groups did your total family income, from all sources, fall
last year before taxes, that is.” Respondents were coded 1 if they reported
income less than $20,000 and 0 if they reported income higher than $20,000.
Family income between $20,000 and $50,000 (INCOME06): Respondents
were asked, “In which of these groups did your total family income, from all
sources, fall last year before taxes, that is.” Respondents were coded 1 if they
reported income between $20,000 and $50,000 and 0 if they reported income
less than $20,000 or income more than $50,000.
Family income more than $50,000 (INCOME06): Respondents were asked,
“In which of these groups did your total family income, from all sources, fall
last year before taxes, that is.” Respondents were coded 1 if they reported
income more than $50,000 and 0 if they reported income less than $50,000.
Gender (SEX): Respondent’s sex is coded 1 for female and 0 for male.
Live in rural area (SRCBELT): Respondents were coded 1 if they lived in an
area designated rural and 0 if they lived in any other type of area.
Live in suburban area (SRCBELT): Respondents were coded 1 if they lived
in an area designated either large suburb or suburb and 0 if they lived in any
other type of area.
Live in urban area (SRCBELT): Respondents were coded 1 if they lived in
an area designated as one of the largest 100 standard metropolitan statistical
areas or other urban area and 0 if they lived in any other type of area.
Conservative (POLVIEWS): Respondents were asked, “We hear a lot of talk
these days about liberals and conservatives. I’m going to show you a seven-point
scale on which the political views that people might hold are arranged from
extremely liberal—point 1—to extremely conservative—point 7.” Coded 1 if
respondent says extremely conservative, conservative, or slightly conservative and
0 if respondent says moderate, extremely liberal, liberal, or slightly liberal.
Moderate (POLVIEWS): Respondents were asked, “We hear a lot of talk
these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged
from extremely liberal—point 1—to extremely conservative—point 7.”
Coded 1 if respondent says moderate and 0 if respondent says extremely
liberal, liberal, slightly liberal, extremely conservative, conservative, or
slightly conservative.
Liberal (POLVIEWS): Respondents were asked, “We hear a lot of talk
these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged
from extremely liberal—point 1—to extremely conservative—point 7. Where
would you place yourself on this scale?” Coded 1 if respondent says extremely
liberal, liberal, or slightly liberal and 0 if respondent says moderate, extremely
conservative, conservative, or slightly conservative.
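The three ideology indicators can be derived from the seven-point scale as in the Python sketch below. It assumes the response is stored as an integer from 1 (extremely liberal) to 7 (extremely conservative), with 4 as the moderate midpoint. The function name and output keys are hypothetical.

```python
from typing import Dict


def polviews_dummies(polviews: int) -> Dict[str, int]:
    """Map a 1-7 POLVIEWS response to liberal, moderate, and conservative indicators.

    Assumes 1-3 are the liberal points, 4 is moderate, and 5-7 are the
    conservative points of the scale.
    """
    if not 1 <= polviews <= 7:
        raise ValueError("polviews must be an integer from 1 to 7")
    return {
        "liberal": int(polviews <= 3),
        "moderate": int(polviews == 4),
        "conservative": int(polviews >= 5),
    }


print(polviews_dummies(5))
# {'liberal': 0, 'moderate': 0, 'conservative': 1}
```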
Life satisfaction (HAPPY): Respondents were asked, “Taken all together,
how would you say things are these days—would you say that you are very
happy, pretty happy, or not too happy?” Coded 1 if respondent says very happy
or pretty happy and 0 if respondent says not too happy.
References
Bureau of the Census. 2014. “2014 American Community Survey 1-Year Estimates.” Available
at http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_14_
1YR_B28002&prodType=table.
Campos-Castillo, Celeste. 2014. “Revisiting the First-Level Digital Divide in the United States:
Gender and Race/Ethnicity Patterns, 2007–2012.” Social Science Computer Review 33:423–39.
Couper, Mick P. 2000. “Web Surveys: A Review of Issues and Approaches.” Public Opinion
Quarterly 64:464–94.
———. 2007. “Issues of Representation in eHealth Research (with a Focus on Web Surveys).”
American Journal of Preventive Medicine 32:S83–S89.
DiMaggio, Paul, Eszter Hargittai, Coral Celeste, and Steven Shafer. 2004. “From Unequal Access
to Differentiated Use: A Literature Review and Agenda for Research on Digital Inequality.” In
Social Inequality, edited by K. Neckerman, 355–400. New York: Russell Sage Foundation.
Eurostat. 2016. “Internet Use by Individuals.” Available at http://ec.europa.eu/eurostat/web/
products-datasets/-/tin00028.
Groves, Robert M. 1989. Survey Costs and Survey Errors. New York: John Wiley & Sons.
Groves, Robert M., Floyd J. Fowler, Mick P. Couper, James M. Lepkowski, Eleanor Singer, and
Roger Tourangeau. 2009. Survey Methodology. New York: John Wiley & Sons.
Groves, Robert M., and Lars Lyberg. 2010. “Total Survey Error: Past, Present, and Future.” Public
Opinion Quarterly 74:849–79.
Hargittai, Eszter, and Yuli Patrick Hsieh. 2012. “Succinct Survey Measures of Web-Use Skills.”
Social Science Computer Review 30:95–107.
Lee, Sunghee. 2006. “Propensity Score Adjustments as a Weighting Scheme for Volunteer Panel
Web Surveys.” Journal of Official Statistics 22:329–49.
Mesch, Gustavo S., and Ilan Talmud. 2011. “Ethnic Differences in Internet Access: The Role of
Occupation and Exposure.” Information, Communication & Society 14:445–71.
Mohorko, Anja, Edith de Leeuw, and Joop Hox. 2013. “Internet Coverage and Coverage Bias
in Europe: Developments Across Countries and Over Time.” Journal of Official Statistics
29:609–22.
Mossberger, Karen, Caroline J. Tolbert, and Ramona S. McNeal. 2007. Digital Citizenship: The
Internet, Society, and Participation. Cambridge, MA: MIT Press.
National Telecommunications and Information Administration. US Department of Commerce.
2016. “Digital Nation Data Explorer—Ages 15+: Uses the Internet (Any Location).”
Available at https://www.ntia.doc.gov/other-publication/2016/digital-nation-data-explorer.
Perrin, Andrew, and Maeve Duggan. 2015. “Americans’ Internet Access: 2000–2015.”
Pew Research Center, June 26. Available at http://www.pewInternet.org/2015/06/26/
americans-Internet-access-2000–2015/.
Pew Research Center. 2015. “Sampling.” Available at http://www.pewresearch.org/
methodology/u-s-survey-research/sampling/.
Robinson, Laura, Shelia R. Cotten, Hiroshi Ono, Anabel Quan-Haase, Gustavo Mesch,
Wenhong Chen, Jeremy Schulz, Timothy M. Hale, and Michael J. Stern. 2015. “Digital
Inequalities and Why They Matter.” Information, Communication & Society 18:569–82.
Selwyn, Neil. 2004. “Reconsidering Political and Popular Understandings of the Digital Divide.”
New Media & Society 6:341–62.
Smith, Tom W., Peter Marsden, Michael Hout, and Jibum Kim. General Social Surveys, 1972–2014
[machine-readable data file]. Principal Investigator, Tom W. Smith; Co-Principal
Investigator, Peter V. Marsden; Co-Principal Investigator, Michael Hout; Sponsored by National
Science Foundation. NORC ed. Chicago: NORC at the University of Chicago [producer and
distributor].
Stern, Michael J., Alison E. Adams, and Shaun Elsasser. 2009. “Digital Inequality and Place: The
Effects of Technological Diffusion on Internet Proficiency and Usage across Rural, Suburban,
and Urban Counties.” Sociological Inquiry 79:391–417.
Stern, Michael J., and Don A. Dillman. 2006. “Community Participation, Social Ties, and Use of
the Internet.” City & Community 5:409–24.
Stern, Michael J., and Bryan D. Rookey. 2012. “The Politics of New Media, Space, and Race:
A Socio-Spatial Analysis of the 2008 Presidential Election.” New Media & Society 15:519–40.
Sylvester, Dari E., and Adam J. McGlynn. 2010. “The Digital Divide, Political Participation, and
Place.” Social Science Computer Review 28:64–74.
Tourangeau, Roger, Fredrick G. Conrad, and Mick P. Couper. 2013. The Science of Web Surveys.
New York: Oxford University Press.
Vicente, Paula, and Elizabeth Reis. 2012. “Coverage Error in Internet Surveys: Can Fixed Phones
Fix It?” International Journal of Market Research 54:323–45.