Qual Life Res
DOI 10.1007/s11136-015-1110-8
SPECIAL SECTION: PROS IN NON-STANDARD SETTINGS (BY INVITATION ONLY)
Mode of administration does not cause bias in patient-reported
outcome results: a meta-analysis
Claudia Rutherford1 • Daniel Costa1 • Rebecca Mercieca-Bebber1,2 • Holly Rice3 • Liam Gabb4 • Madeleine King1,2
Accepted: 18 August 2015
© Springer International Publishing Switzerland 2015
Abstract
Purpose Technological advances in recent decades have
led to the availability of new modes to administer patient-reported outcomes (PROs). To aid selecting optimal modes
of administration (MOA), we undertook a systematic
review to determine whether differences in bias (both size
and direction) exist among modes.
Methods We searched five electronic databases from
2004 (date of last comprehensive review on this topic) to
April 2014, cross-referenced and searched reference lists.
Studies that compared two or more MOA for a health-related PRO measure in adult samples were included. Two
reviewers independently applied inclusion and quality criteria and extracted findings. Meta-analyses and meta-regressions were conducted using random-effects models.
Results Of 5100 papers screened, 222 were considered
potentially relevant and 56 met eligibility criteria. No
evidence of bias was found for: (1) paper versus electronic
self-complete; and (2) self-complete versus assisted MOA.
Heterogeneity for paper versus electronic comparison was
explained by type of construct (i.e. physical vs. psychological). Heterogeneity for self-completion versus assisted
modes was in part explained by setting (clinic vs. home);
the largest bias was introduced when assisted completion
occurred in the clinic and follow-up was by self-completion (either electronic or paper) in the home.
Conclusions Self-complete paper and electronic MOA
can be used interchangeably for research in clinic and
home settings. Self-completion and assisted completion
produce equivalent scores overall, although heterogeneity
may be induced by setting. These results support the use of
mixed MOAs within a research study, which may be a
useful strategy for reducing missing PRO data.
Electronic supplementary material The online version of this
article (doi:10.1007/s11136-015-1110-8) contains supplementary
material, which is available to authorized users.
Keywords Systematic review · Patient-reported outcome · Mode of administration · Bias

Correspondence: Claudia Rutherford, [email protected]

1 Quality of Life Office, Psycho-oncology Co-operative Research Group, School of Psychology, University of Sydney, Level 6 North, Chris O’Brien Lifehouse (C39Z), Sydney, NSW, Australia
2 Sydney Medical School, University of Sydney, Sydney, NSW, Australia
3 Centre for Medical Psychology and Evidence-Based Decision-Making, School of Psychology, University of Sydney, Sydney, NSW, Australia
4 King's College London, London, UK

Introduction
A patient-reported outcome (PRO) is a patient’s report of
their own health status that has not been interpreted by a
clinician or anyone else [1]. The method of collecting PRO
data, or mode of administration (MOA), is receiving
increasing attention in both research and clinical contexts,
largely driven by the rapid development of new methods
and technologies that allow electronic data capture.
Historically, paper-based administration (variably referred to as "paper and pencil" or "hard copy") has been used in the clinic. Touch screen computers and tablets are increasingly used in the clinic, and remote data capture (e.g. via internet or mobile devices) allows PRO assessment at times and locations outside the clinic. These newer methods allow increased ease, speed and efficiency of data capture.
Typically, PRO measures are self-administered, i.e. completed personally by the patient, but they may also be interviewer-administered, whereby the interviewer reads questions to the patient and the patient records their response via a chosen input mode. This difference in MOA may induce bias, even when question (or item) wording is consistent across modes, if, for example, the way a question is psychologically appraised by respondents depends on MOA [2]. This may be particularly apparent when an interviewer is present; participants may produce more candid responses when self-completing than when responding to a questionnaire verbally administered by a researcher, due to phenomena such as social desirability or response acquiescence [3].
Both self- and interviewer-administered PRO assessment can be completed via a range of input methods, including paper, over the phone, or using a computer or other electronic device. The choice of MOA may be determined by the setting in which patients complete PROs, e.g. paper-based MOA may be feasible in clinics if staff are available to hand out and collect questionnaires [4], touch-screens are known to be feasible in clinic but require investment in hardware and software [5], and web-based MOA enables completion at home and therefore at times other than scheduled clinic visits.
High-quality PRO research relies not only on the use of reliable, valid and responsive PRO measures [1, 6], but also on the generation of high-quality PRO data. While there is no agreed definition of data quality, it is clear that higher-quality data will result from higher response rates, greater reliability of responses and absence of bias [7]. A narrative review of the effects of MOA on data quality found five main indicators of data quality: validity of responses (i.e. absence of bias); absence of social desirability; item response rates; amount of information (e.g. in responses to open-ended questions); and similarity of response distributions obtained by different MOA [7]. MOA variations arguably could affect PRO data quality. Bias, a systematic error in scores attributable to variables other than those of primary interest, is of particular interest for MOA of PRO measures because it may confound interpretation of PRO results, potentially leading to erroneous conclusions, for example, about the effect of health conditions and interventions on PROs.
Another major threat to the quality of PRO data is missing data [8]. The use of a mix of MOA within a study may be an effective way to maximise response rates. For example, some participants may prefer web-based administration while others prefer paper-based [9]. Non-responders to the primary MOA, such as paper, may be followed up using an alternative MOA, such as via telephone. Mixed-mode designs may also be used to minimise costs. For example, paper-based MOA may be used when planned PRO assessments coincide with a clinic visit and clinic staff are available to hand out and collect questionnaires, while online administration could be a cost-effective approach at other times. While using a mix of MOA is appealing for a range of practical and logistical reasons, scientific rigour makes it essential to determine whether there are systematic differences due to MOA, and if so, what the size and direction of MOA bias are.
Five review papers have assessed the effect of MOA on data quality. Of these, two reviews were based on studies published up to 2006 that compared paper and electronic self-completion of PROs and found them to be equivalent [2, 10]. On the other hand, a narrative literature review found mixed results for assisted and self-complete methods: some studies found higher item response rates with administered methods than with paper self-completion, while others reported inconsistent effects [11]. One paper found no MOA bias for repeated PRO measurements [12], whereas another did [7].
The last comprehensive review of the literature investigating the impact of MOA on response quality is based on
research published up to 2004 [2]. An updated review of
evidence is warranted, given the greater availability of
computers and increased use of the Internet by researchers
and patients. Therefore, to assist researchers and clinicians in
selecting the most appropriate MOA for PROs, we undertook
a systematic review to investigate bias in MOA (including the size and direction of any bias), specifically between: (a) paper and electronic self-completion; and (b) self-completion and assisted completion [e.g. computer-assisted telephone interview (CATI), researcher-administered].
Methods
We searched five electronic databases: MEDLINE; PsycInfo; CINAHL; EMBASE; and Scopus from 2004 (date of
last comprehensive review on this topic [2]) to April 2014.
The search strategy was developed based on terms used in
published papers that compared MOA and considered data
quality issues. The search strategy consisted of terms for all
possible "MOA", "data quality", "bias" and "PROs". The
search strategy was reviewed by the team and discussed with
a librarian at the University of Sydney. The final search was
tested in MEDLINE to determine whether it retrieved five
relevant key papers (see online appendix). No geographical
restrictions were applied.
An electronic search by author (i.e. authors who had
published key papers identified through our search) was
performed. The electronic search was supplemented by a
search of reference lists of all studies included in the
review and from relevant systematic reviews identified.
Study selection and eligibility criteria

Abstracts from retrieved papers were reviewed against five broad criteria by one reviewer (CR, RMB or HR):

1. Primary research of quantitative design [i.e. randomised controlled trials (RCTs); cohort studies; cross-sectional studies; comparative studies];
2. Comparing two or more administration modes of data collection;
3. Using PRO measures intended for assessing health outcomes, including QOL, symptoms, psychological aspects (e.g. well-being; distress; risk perception) (as opposed to objective patient risk based on patient characteristics), and satisfaction with treatment. We use the umbrella term "PRO" to describe measures (also called questionnaires, surveys, tools, rating scales) that attempt to assess patients' health-related experience from the patient's perspective;
4. For the standardised health-related PRO measures, a minimum of one piece of empirical evidence for reliability and validity was required, i.e. not "study-specific questions" or ad hoc measures;
5. In any adult samples.

These criteria were selected as our intention was to identify all studies comparing different MOA for PRO measures in adults, regardless of quantitative research design or population. If all five criteria were met, the paper was considered potentially relevant and obtained in full. Abstracts clearly not relevant to this review were rejected at this stage. Where relevance was ambiguous, papers were obtained in full for further scrutiny. A second reviewer (RMB and HR) screened 25 % of excluded papers, selected at random, to assess accuracy of classification; there were no disagreements.
Papers were then included if they met the following criteria:

1. Investigated sources of poor data quality or bias, including size and/or direction of bias. Studies that considered techniques for enhancing response rates were included if a comparison between MOA methods was reported;
2. Reported sufficient data to allow inclusion in the quantitative analysis, e.g. means and/or mean difference between MOA groups were reported;
3. Study outcomes were health-related PROs;
4. Published in English.
Papers were excluded if:

1. The study used qualitative approaches to data collection (e.g. unstructured interviews, focus group discussions) or was a literature review, discussion or perspective paper, or a single case study;
2. Study outcomes were not health-related PROs. For example, measures used for: clinical purposes (i.e. variables such as physiologic measurements (e.g. blood pressure) that may assist clinicians in diagnosing or treating an illness); screening/health history taking/counselling; diagnosis/screening; demographics; personality tests; education/learning interventions; medication adherence; health-related behaviours; census data; aptitude assessment (e.g. task performance such as the Stroop test); or ad hoc questions;
3. The PRO measure used did not have a minimum of one piece of empirical evidence for reliability and validity;
4. The study did not include a comparison of the effect of MOA (e.g. the study considered incentives for increasing response rates or predictors of increased/decreased response rates; i.e. only one mode, no comparison between modes);
5. The study addressed the development or refinement of existing PRO measures (e.g. assessed the psychometric properties of specific PRO measures) or reported methods for establishing reliability, validity or responsiveness to change;
6. Analysis for equivalence between two modes was not undertaken/reported or the data were pooled for both modes;
7. Results were not reported by mode or were not useful to our study objectives (i.e. means, mean differences, ICCs, response rates or preference were not reported by mode group on the PRO);
8. Not primary research.
Papers were assessed against the eligibility criteria by two
researchers (CR, RMB or HR) independently. Disagreements were resolved through team discussion. Where study
details were lacking, attempts were made to contact the
authors and invite them to provide additional information.
Quality assessment
The QualSyst quality appraisal tool [13] was used to assess
the quality of studies included in the review (Table 2 lists the
quality criteria items). We modified item 4 in the checklist to
enable assessment of whether ‘‘MOA was clearly described
in the methods’’, in line with our review objectives.
Study quality was assessed by two reviewers independently (CR, RMB, HR or LG). Each quality criterion was
assessed as being met (yes/partial/no), providing a structured approach for the classification of overall study quality
and the strength of the evidence based on methodological
components. Quality assessments were cross-checked for
consistency between assessments. In the case of any disagreements or discrepancies in scores, a third reviewer
independently assessed study quality.
Data extraction
Data from included studies were extracted into pre-prepared
data extraction tables by one reviewer (CR, RMB, HR or LG).
All data extraction was cross-checked for errors and omissions
by another reviewer (CR, RMB, HR or LG). Disagreements or
discrepancies were discussed between the researchers and
confirmed with a third reviewer (MK or DC). Final decisions
were agreed with the research team. Data pertaining to study
aims, sample characteristics, design, data collection methods,
analytical methods, and PRO measures were extracted. Many
studies reported more than one aspect, or domain, of self-reported health. For the meta-analysis, the following results
were extracted for each PRO domain, if reported in the source
papers: means for specific modes or mean difference between
any two modes, standard deviation or standard error of the
mean or mean difference, standardised mean difference,
t statistic, p value, or confidence intervals (CI). Of secondary
interest, where reported, we extracted response rates for each
MOA, the proportion of the sample preferring each MOA, and
whether the study considered covariates of preference,
response rates or missing data.
Synthesis
A meta-analysis was conducted using a random-effects
model, which assumes that the effect size is not fixed but
has a distribution within the population [14]. We considered this appropriate for our review because the different
types of PRO domains (e.g. physical, psychological) and
study designs led us to expect that any heterogeneity in
observed effect sizes might be explained by domain-level
or study-level variables [14].
The comparisons of interest were (a) paper versus electronic self-completion and (b) self-completion versus assisted completion. A small number of studies identified in our
search did not include either of these comparisons, instead
comparing face-to-face interviews with tele-assisted interviews, so this comparison was examined post hoc.
The summary statistic we used was the standardised mean
difference (i.e. mean of one mode minus the other divided by
the pooled standard deviation) and its 95 % CI. Analyses
were conducted using SAS 9.3. To facilitate comparability of
PRO domain scales, we standardised domain score direction
so that higher scores always represent a worse outcome.
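Written out, this summary statistic is the usual Cohen's-d form of the standardised mean difference; the expression below simply restates the verbal definition above (the text does not state whether a small-sample correction such as Hedges' g was applied), with subscripts 1 and 2 indexing the two modes and s_p the pooled standard deviation:

\[
d = \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p}}, \qquad
s_{p} = \sqrt{\frac{(n_{1}-1)\,s_{1}^{2} + (n_{2}-1)\,s_{2}^{2}}{n_{1}+n_{2}-2}}
\]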
Heterogeneity was assessed by examining τ² and its Z and p values, where p < 0.05 was taken to indicate the presence of heterogeneity and τ² = 0 indicates no heterogeneity.
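The review's analyses were run in SAS 9.3 and the exact variance estimator is not stated in the text; purely as an illustration of random-effects pooling with a between-effect variance term τ², the sketch below uses the common DerSimonian–Laird method-of-moments estimator. Function and variable names are hypothetical, not the authors' code.

```python
import numpy as np

def random_effects_pool(effects, variances):
    """Pool effect sizes under a random-effects model (DerSimonian-Laird sketch).

    effects, variances: per-domain (or per-study) standardised mean
    differences and their within-unit variances.
    Returns the pooled effect, its standard error and the tau^2 estimate.
    """
    d = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                  # fixed-effect weights
    d_fixed = np.sum(w * d) / np.sum(w)          # fixed-effect pooled estimate
    q = np.sum(w * (d - d_fixed) ** 2)           # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(d) - 1)) / c)      # between-effect variance
    w_re = 1.0 / (v + tau2)                      # random-effects weights
    d_re = np.sum(w_re * d) / np.sum(w_re)       # random-effects pooled estimate
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return d_re, se_re, tau2

# Example with three hypothetical domain-level effect sizes
print(random_effects_pool([0.02, -0.01, 0.05], [0.004, 0.006, 0.005]))
```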
The primary analysis treated individual domains within studies as the unit of analysis; we refer to this as the domain-level analysis. This allowed us to conduct a meta-regression to assess domain-level and study-level predictors of any observed heterogeneity in the differences due to MOA [15]. The domain-level predictor was construct (physical, psychological, social, or other, where other was used as the reference category). The study-level predictors were type of allocation to mode groups (randomised vs. non-randomised) and setting [home vs. home, clinic vs. clinic, home (self-complete) vs. clinic (assisted) or home (assisted) vs. clinic (self-complete), where this last category was the reference category].

Because the domain-level analysis contained non-independent observations (i.e. PRO domains within a study are expected to be correlated), we conducted another meta-analysis on the study-level effect sizes (study-level analysis), which was a weighted average of the effect sizes within each study [15]. Inter-domain correlations were rarely reported in these studies, so a correlation of 0.5 between every pair of domains within a study was assumed. We used meta-regression to determine whether any heterogeneity was predicted by the study-level variables described above (allocation method and setting).
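As a minimal sketch of this study-level aggregation (illustrative Python rather than the authors' SAS code), the function below averages a study's correlated domain-level effect sizes and computes the variance of that composite under the assumed inter-domain correlation of 0.5, following the composite-outcome formula in Borenstein et al. [15]; names are hypothetical.

```python
import numpy as np

def study_level_effect(domain_effects, domain_variances, r=0.5):
    """Combine correlated domain effect sizes within one study.

    The composite estimate is the simple mean of the m domain effects;
    its variance allows for an assumed correlation r between every pair
    of domains (composite-outcome approach, Borenstein et al.).
    """
    d = np.asarray(domain_effects, dtype=float)
    v = np.asarray(domain_variances, dtype=float)
    m = len(d)
    composite = d.mean()
    s = np.sqrt(v)
    # sum of all pairwise covariances r * s_i * s_j for i != j
    cov_sum = r * (np.outer(s, s).sum() - np.sum(v))
    composite_var = (np.sum(v) + cov_sum) / m ** 2
    return composite, composite_var

# Example: one hypothetical study reporting three PRO domains
print(study_level_effect([0.10, 0.03, -0.02], [0.010, 0.012, 0.011]))
```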
Results

General study characteristics

Of 5100 abstracts screened, 222 were considered potentially relevant and 56 met inclusion criteria; 46 provided data to allow inclusion in a meta-analysis and a further 10 reported response rates or preference for MOA data (Fig. 1). Of the 46 studies included in the meta-analysis, 43 compared two modes and three compared three modes. Twenty studies were conducted in the clinic; 31 randomised participants to MOA groups; and 10 recruited healthy or general population samples. Table 1 summarises the key characteristics of these studies.
The modes used to administer PROs can be categorised into four groups: (1) self-complete paper (hard copy); (2) self-complete electronic (e.g. computer including web, touch screen, hand-held device); (3) telephone-assisted self-complete (paper-based, electronic, voice recorded); and (4) interviewer-administered (face-to-face or telephone). Electronic MOA comparisons included four telephone interactive voice recognition systems, one CATI, two videoconference, six touch screen and three personal digital assistants as the mode of electronic assessment (Table 1).

Fig. 1 Flow of studies through the process: 5098 records were retrieved electronically (minus duplicates) and 2 from reference lists, giving 5100 abstracts reviewed; 4878 were not relevant on abstract review; 222 were obtained in full; 166 were excluded (113 outcomes not validated PROs, 21 not about mode effects, 21 comparison of modes not undertaken/reported or results not useful to our study objectives, 10 about psychometric properties of existing PRO measures, 1 proxy-report compared with patient self-report); 56 met eligibility; 46 were included in the meta-analysis (31 RCTs, 3 longitudinal, 12 cross-sectional) and 10 provided data on response rates and preference for mode only.
Study quality
Study quality is reported in Table 2. Most studies described
their aims or hypotheses, participants, outcomes, analytical
methods and results, and drew appropriate conclusions.
Thirty-one studies (67 %) randomised participants to
MOA. The other studies allocated participants to MOA
through alternate allocation upon study entry, participant preference (i.e. participants self-selected), or by offering the alternative mode when there was non-response to one mode.
The descriptions of the method of subject and comparison group selection, MOA methods and allocation to mode were relatively poor (Table 2). For 54 % of studies, power was justified post hoc (e.g. only mentioning "we had the power to detect differences" in the discussion), and <50 % of studies controlled for confounding (Table 2).
Meta-analysis
There were 273 PRO domain-level effect sizes in the
analytic database from the 46 included studies.
Paper versus electronic self-completion
We found no significant difference in mean scores for
comparisons between administration by paper and electronic
self-completion at the domain level (effect size = 0.01;
Table 3). Figure 2 presents the forest plot of the paper versus electronic self-completion comparison. We found evidence of heterogeneity (τ² = 0.002), which was significantly predicted by construct, such that the mean bias was larger for physical domains (mean effect size = 0.010) than for psychological domains (mean effect size = 0.006; Table 4). The residual unexplained heterogeneity was small. At the study level, there was no significant bias (effect size = 0.01) or heterogeneity (τ² = 0.002).
Self-completion versus assisted completion
We found no significant difference in mean scores for comparisons between self-completion and assisted completion at the domain level (effect size = 0.04; Table 3). Figure 3 presents the forest plot of the self-completion versus assisted completion comparison. However, we found evidence of heterogeneity (τ² = 0.030), which was predicted by setting: the size of the bias (where self-completion exhibited a poorer health outcome than assisted completion) was significantly different for the home versus home comparisons (mean effect size = 0.10), the clinic versus clinic comparisons (mean effect size = 0.05) and the home (self-complete) versus clinic (assisted) comparisons (mean effect size = 0.10) than for the home (assisted) versus clinic (self-complete) comparisons (mean effect size = -0.07; Table 4). Some unexplained heterogeneity remained, but insufficient reporting in the included studies meant that no other predictors (unmeasured or unmeasurable) could be investigated. We found no significant bias (effect size = 0.01) at the study level, but observed significant heterogeneity (τ² = 0.026) that was not explained by any of the predictors.
Table 1 Characteristics of included studies. For each study, the original table reports the first author (year), country; population and sample size for analysis; design (including how participants were allocated to modes and the interval between completions); modes compared; setting (clinic and/or home); and the PRO measures used. The 46 studies were: Ashley (2013) [25], UK; Austin (2009) [26], Sweden and Australia; Bjorner (2014) [27], Denmark; Caute (2012) [28], UK; Cerrada (2014) [29], USA; Chang (2014) [30], Taiwan; Cheung (2006) [31], Singapore; Clayton (2013) [32], USA; Coles (2007) [33], USA; Collins (2004) [34], Australia; Cook (2004) [35], USA; Greene (2008) [36], USA; Grieve (2011) [37], Australia; Gundy (2010) [38], Netherlands; Handa (2008) [39], USA; Hauer (2010) [40], Germany; Hayes (2013) [41], Australia; Hedman (2010) [42], Sweden; Hedman (2013) [43], Sweden; Hollandare (2010) [44], Sweden; Kobak (2004) [45], USA; Kobak (2008) [46], USA; Lall (2012) [47], UK; Lundy (2011) [48], USA; Lungenhausen (2007) [49], Germany; Marceau (2007) [50], USA; Matthew (2007) [51], Canada; Naus (2009) [52], USA; Pinnock (2005) [53], UK; Ramachandran (2008) [54], USA; Reissmann (2011) [55], Germany; Richter (2008) [56], Germany; Ritter (2004) [57], USA; Salaffi (2009) [58], Italy; Salaffi (2013) [59], Italy; Sikorski (2009) [60], USA; Sousa (2009) [61], Brazil; Suris (2007) [62], USA; Swartz (2007) [63], USA; Tiplady (2010) [64], UK; Weiler (2004) [65], USA; Whitehead (2011) [66], New Zealand; Wu (2009) [67], Canada; Yu (2007) [68], Taiwan; Zimmermann (2012) [69], USA; Zuidgeest (2011) [21], Netherlands. Populations ranged from healthy volunteers and university students to patients with cancer, chronic pain, rheumatic disease, heart failure, stroke and mental health conditions; designs included randomised cross-over allocation to modes, randomisation to a single mode group, and cross-sectional or cohort designs with alternate or preference-based allocation; mode comparisons covered hard copy (in clinic or at home/postal), electronic completion (computer, web, touch screen, PDA), telephone interview or automated telephone systems, face-to-face interview and videoconference-assisted interview.
CATI computer-assisted telephone interview, PDA personal digital assistant, PC personal computer, EORTC European Organization for Research and Treatment of Cancer, QOL quality of life, VAS visual analogue scale, FACT Functional Assessment of Cancer Therapy
Table 2 Methodological characteristics of studies comparing two or more modes of PRO measure administration (n = 46)

Quality criteria (% Yes / % Partial / % No):
Question or objective sufficiently described? 60.9 / 39.1 / 0
Design evident and appropriate to answer the author's study question? 56.5 / 41.3 / 2.2
Method of subject selection and comparison group selection is described and appropriate? 34.8 / 58.7 / 6.5
Modes of administration are clearly described in the methods? 47.8 / 52.2 / 0
Subject and comparison group characteristics sufficiently described? 69.6 / 26.1 / 4.3
If random allocation to treatment group was possible, is it described? 23.9 / 50 / 26.1
If interventional and blinding of investigators to intervention was possible, is it reported? (n = 31 RCTs) 0 / 2.2 / 97.8
Outcome well defined and robust to measurement bias? Means of assessment reported? 82.6 / 17.4 / 0
Sample size appropriate? 26.1 / 54.3 / 19.6
Analysis described and appropriate to address the authors' aims? 71.7 / 28.3 / 0
Some estimate of variance (e.g. SD, CI, standard errors) reported for the main results/outcomes (i.e. those directly addressing the study question/objective upon which the conclusions are based)? 73.9 / 23.9 / 2.2
Controlled for confounding? 41.3 / 50 / 8.7
Results reported in sufficient detail? 82.6 / 15.2 / 2.2
Do the results support the conclusions? 84.8 / 15.2 / 0
Table 3 Summary of meta-analysis results, including mean effect size, 95 % confidence interval (CI) and heterogeneity (τ²)

Self-completion: paper versus electronic
  Domain level: 181 domains; mean effect size 0.01 (95 % CI -0.01, 0.03); τ² = 0.002, p = 0.02
  Study level: 31 studies; mean effect size 0.01 (95 % CI -0.02, 0.04); τ² = 0.002, p = 0.09
Self-completion versus assisted completion
  Domain level: 90 domains; mean effect size 0.04 (95 % CI -0.01, 0.08); τ² = 0.030, p < 0.001
  Study level: 14 studies; mean effect size 0.01 (95 % CI -0.10, 0.12); τ² = 0.026, p = 0.02
Interview: face-to-face versus tele-assisted
  Domain level: 4 domains; mean effect size 0.005 (95 % CI -0.39, 0.39); τ² = 0, p –
  Study level: 3 studies; mean effect size 0.005 (95 % CI -0.52, 0.52); τ² = 0, p –
Face-to-face interview versus tele-assisted interview
We found no significant difference in mean scores for comparisons between face-to-face and tele-assisted interview at the domain level (effect size = 0.005; Table 3). We found no evidence of heterogeneity (τ² = 0), nor did we find significant bias (effect size = 0.005) or heterogeneity (τ² = 0) at the study level.
Response rates and mode preference

Although not of primary interest, nine studies contributed information about response rates for different MOAs. Two studies found higher response rates for paper completion (range 61–89 %) compared to electronic MOA (range 18–39 % of total sample) [16, 17], three studies found response rates in favour of electronic MOA (range 39–90 %) compared to paper (range 22–73 %) [9, 18, 19], and four studies found no difference in response rates between modes [20–23]. Five studies contributed some information about participants' preference for MOA (Table 5). In three studies, participants preferred paper when compared to electronic MOA [9, 16, 17]. In one study, participants preferred electronic MOA (e.g. web link, computer) when compared to paper MOA [19], and in another study, participants preferred paper and interactive voice MOA compared to paper alone [23].
Fig. 2 Forest plot (standardised mean difference and 95 % CI) for self-complete electronic versus pen–paper administration at the study level. Negative effect sizes favour electronic.
Table 4 Unstandardised regression parameter estimates (and p values) from meta-regression analyses. Only comparisons exhibiting heterogeneity in the meta-analyses were included in the meta-regressions: (1) self-completion, paper versus electronic (domain level), and (2) self-completion versus assisted completion (domain and study level). Predictors were construct (other, social and psychological, with physical as the reference group), setting (clinic–clinic, clinic (assisted)–home (self) and home (assisted)–clinic (self), with home–home as the reference group) and randomisation (reference group randomised). The estimates represent the covariate-adjusted mean differences between the variable level listed and the reference level; for example, the difference between physical (reference) and psychological constructs for the paper versus electronic comparison is -0.06.
Fig. 3 Forest plot (standardised mean difference and 95 % CI) for self-completion versus assisted completion at the study level. Negative effect sizes favour assisted completion.
Table 5 Preference for mode of administration

First author of study: Paper (%) / Electronic (%) / Interactive voice/paper (%) / No preference (%)
Kongsved [16]: 55.4 / 32.4 / – / –
Reichmann [17]: 61 / 39 / – / –
Shea [23]: 22 / – / 78 / –
Smith [9]: 57 / 43 / – / –
Wijndaele [19]: 21.6 / 39.2 / – / 39.2
Discussion

The results summarised here provide strong evidence that there is no bias attributable to MOA in PRO data collection. The effect sizes (standardised mean differences) for the paper versus electronic self-completion comparison were small and not statistically significant, consistent with other reviews of studies comparing paper and electronic completion of PROs [2, 10]. However, we found evidence of heterogeneity in effect sizes, which was explained mostly by construct, such that mean bias was larger for physical outcomes (e.g. physical functioning, role functioning) than for psychological (e.g. anxiety, distress) outcomes. Nevertheless, in an absolute sense, this bias was small for both types of construct.
Our findings have important practical implications. First,
they legitimize the use of mixed MOA in studies. For
example, participants may complete PROs electronically in
the clinic and subsequently paper versions at home, or vice
versa. Patients can be offered MOA based on their preference, which may in turn increase response rates and subsequently improve data quality. Our findings also suggest
researchers can safely transfer validated, paper-based PRO
measures to electronic interfaces, assuming that the item
wording, response options and layout remain the same. Any
changes to these aspects should be evaluated through cognitive interviewing to ensure that respondents are interpreting the items as intended and responding equivalently across
different MOA [1].
While our findings suggest that studies using a mix of
self-completion and assisted completion will generally produce equivalent scores, some caution may be required if
setting (home vs. clinic) changes as well as self-completion
versus assisted completion, particularly when assisted
completion occurs in the home and self-completion in the
clinic. However, the small number of studies in each category means this may be a chance finding rather than a true interaction of setting by MOA. Until we have more robust
evidence of the relationship between self-completion versus
assisted completion and setting, it may be safest to alter only
one of these two aspects of administration within a study.
The studies included in our review were published in the last decade, during which time social media and online and in-person support groups have become more prevalent, as has patients' familiarity with PRO completion.
Improved patient–clinician communication and shared
decision-making have been encouraged and promoted, and
clinicians are expected to better communicate with their
patients and to consider PROs for individual patient care.
PROs are increasingly collected routinely in clinical practice, thus patients may be more familiar with completing
them. Consequently, social acquiescence bias may be less problematic than it was previously, as people may be more
comfortable with sharing and discussing their feelings and
are aware that honest reporting of their health outcomes may
positively impact their care. Interestingly, of the 14 studies
we identified that compared self-completion and assisted
completion, 11 included patients with a medical or mental
health diagnosis attending clinic, rather than a general
population or student sample (which were relatively common in the other MOA comparisons we meta-analysed).
Preference for mode was not of primary interest, and
therefore, we did not conduct a comprehensive search for
studies designed to assess MOA preference. However, consistent with Shih and Fan [24], the few studies included in our
review that reported information about participants’ preference for mode found mixed results; four studies found participants preferred paper when compared to electronic MOA,
while in two studies, participants preferred electronic over
paper MOA. As we detected no bias between MOA, participants can select MOA based on their preference, which
may increase response rates and reduce missing data. Similarly, there was no systematic difference in response rates;
two studies found response rates in favour of paper compared
to electronic MOA, three studies in favour of electronic
MOA compared to paper, and four found no difference.
These mixed findings are consistent with others [11]; no one
mode produces consistently higher response rates. A possible
association between MOA preference and response rate
should be tested in future research. Other factors not explored
in this review but that may play a role and should be explored
are age, computer literacy and level of health status.
Caveats, recommendations and future directions
An important caveat is that our findings apply to research
studies but not necessarily to use of PRO measures in clinic
to manage individual patients because the evidence base does
not yield any information about the extent to which an
individual’s scores would differ between two MOA. A
combination of electronic and paper self-completion can be
used in the clinic and at home without likelihood of introducing any bias due to MOA. Future mode comparison
studies are required, but these need to be experimentally
designed to enable the direct assessment of the impact of
some of the mediators of MOA effects on data quality.
Specifically, these include the method of contacting respondents (e.g. by post, in person); sensory input channel (e.g.
visual vs. aural); and response mode (e.g. handwritten, keyboard, touchscreen, telephone). Due to the lack of data presented in papers, the impact of these mediators of MOA
effects could not be determined. MOA effects may also differ
between populations (e.g. general population vs. disease populations); however, the heterogeneity in our data did not allow us to examine this variable, which is worthy of future research.
An earlier review exploring self- and administered-MOA
found that MOA did not have an effect on repeated PRO
assessments [12], whereas another found such an effect [7].
The majority of studies in our meta-analysis compared
MOA at one time point rather than longitudinally across
multiple time points. Further comparative studies of MOA
should consider multiple variables, including equivalence of
mode in individuals rather than at group level; longitudinal
comparisons to assess the impact of mixed MOA over time
in terms of equivalence in scores and the quality of response
(non-response bias; validity, reliability and distribution of
responses); and setting. Rather than more studies designed to
describe bias as a function of MOA, what is needed is further consideration of study design features that may explain
any bias, such as setting.
Conclusions
A combination of electronic and paper self-completion and
assisted completion methods can be used in the clinic and
at home, based on patient preference for mode. To fill the
evidence gap and increase our understanding of the impact
of MOA on PROs, further experimental comparative
studies are needed to assess the mediators of mode effects,
measurement equivalence and reliability of assessment for
individuals rather than groups, and the impact of setting
and mixed MOA over time.
Acknowledgments We thank HR and LG for their contribution to
this project as volunteers. We also acknowledge the support from our
faculty librarian, Matthew Davis, in developing the search strategy.
Funding C.R., D.C., R.M.B. and M.K. are supported by the Australian Government through Cancer Australia. No additional funding
was sought for this review.
Compliance with ethical standards
Conflict of interest All authors declare that they have no conflict of
interest.
Ethical approval This article is a secondary analysis of published
literature. It does not contain any studies with human participants or
animals performed by any of the authors.
References
1. Food and Drug Administration. (2009). Patient reported outcome measures: Use in medical product development to support labelling claims. MD: US Department of Health & Human Services, Food & Drug Administration.
2. Hood, K., Robling, M., Ingledew, D., Gillespie, D., Greene, G.,
Ivins, R., et al. (2012). Mode of data elicitation, acquisition and
response to surveys: A systematic review. Health Technology
Assessment, 16(27), 1–162.
3. Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N.
P. (2003). Common method biases in behavioral research: a
critical review of the literature and recommended remedies.
Journal of Applied Psychology, 88(5), 879–903.
4. Basch, E., Abernethy, A. P., Mullins, C. D., Reeve, B. B., Smith,
M. L., Coons, S. J., et al. (2012). Recommendations for incorporating patient-reported outcomes into clinical comparative
effectiveness research in adult oncology. Journal of Clinical
Oncology, 30(34), 4249–4255.
5. Stukenborg, G. J., Blackhall, L., Harrison, J., Barclay, J. S., Dillon, P., Davis, M. A., et al. (2014). Cancer patient-reported outcomes assessment using wireless touch screen tablet computers. Quality of Life Research, 23(5), 1603–1607.
6. Scientific Advisory Committee of the Medical Outcomes Trust. (2002). Assessing health status and quality-of-life instruments: Attributes and review criteria. Quality of Life Research, 11(3), 193–205.
7. Bowling, A. (2005). Mode of questionnaire administration can have serious effects on data quality. Journal of Public Health, 27(3), 281–291.
8. Bernhard, J., Cella, D. F., Coates, A. S., Fallowfield, L., Ganz, P. A., Moinpour, C. M., et al. (1998). Missing quality of life data in cancer clinical trials: Serious problems and challenges. Statistics in Medicine, 17(5–7), 517–532.
9. Smith, A. B., King, M., Butow, P., & Olver, I. (2013). A comparison of data quality and practicality of online versus postal questionnaires in a sample of testicular cancer survivors. Psycho-Oncology, 22(1), 233–237.
10. Gwaltney, C. J., Shields, A. L., & Shiffman, S. (2008). Equivalence of electronic and paper-and-pencil administration of patient-reported outcome measures: A meta-analytic review. Value in Health, 11(2), 322–333.
11. McColl, E., Jacoby, A., Thomas, L., Soutter, J., Bamford, C., Steen, N., et al. (2001). Design and use of questionnaires: A review of best practice applicable to surveys of health service staff and patients. Health Technology Assessment, 5(31), 1–256.
12. Puhan, M. A., Ahuja, A., Van Natta, M. L., Ackatz, L. E., & Meinert, C. (2011). Interviewer versus self-administered health-related quality of life questionnaires—Does it matter? Health and Quality of Life Outcomes. doi:10.1186/1477-7525-1189-1130.
13. Kmet, L., Lee, R., & Cook, L. (2004). Standard quality assessment criteria for evaluating primary research papers from a variety of fields. Health Technology Assessment, 13, 1–294.
14. Lipsey, M., & Wilson, D. (2001). Practical meta-analysis. Thousand Oaks: Sage.
15. Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Oxford: Wiley.
16. Kongsved, S. M., Basnov, M., Holm-Christensen, K., & Hjollund, N. H. (2007). Response rate and completeness of questionnaires: A randomized study of Internet versus paper-and-pencil versions. Journal of Medical Internet Research, 9(3), e25.
17. Reichmann, W. M., Losina, E., Seage, G. R., Arbelaez, C., Safren, S. A., Katz, J. N., et al. (2010). Does modality of survey administration impact data quality: Audio computer assisted self interview (ACASI) versus self-administered pen and paper? PLoS One, 5(1), e8728.
18. Lannin, N. A., Anderson, C., Lim, J., Paice, K., Price, C., Faux, S., et al. (2013). Telephone follow-up was more expensive but more efficient than postal in a national stroke registry. Journal of Clinical Epidemiology, 66(8), 896–902.
19. Wijndaele, K., Matton, L., Duvigneaud, N., Lefevre, J., Duquet, W., Thomis, M., et al. (2007). Reliability, equivalence and respondent preference of computerized versus paper-and-pencil mental health questionnaires. Computers in Human Behavior, 23(4), 1958–1970.
20. Rodriguez, H. P., von Glahn, T., Rogers, W. H., Chang, H., Fanjiang, G., & Safran, D. G. (2006). Evaluating patients' experiences with individual physicians: A randomized trial of mail, internet, and interactive voice response telephone administration of surveys. Medical Care, 44(2), 167–174.
21. Zuidgeest, M., Hendriks, M., Koopman, L., Spreeuwenberg, P., & Rademakers, J. (2011). A comparison of a postal survey and mixed-mode survey using a questionnaire on patients' experiences with breast care. Journal of Medical Internet Research, 13(3), e68.
22. Rutherford, C., Nixon, J., Brown, J. M., Lamping, D. L., & Cano, S. J. (2014). Using mixed methods to select optimal mode of administration for a patient-reported outcome instrument for people with pressure ulcers. BMC Medical Research Methodology, 14(22), 1471–2288.
23. Shea, J. A., Guerra, C. E., Weiner, J., Aguirre, A. C., Ravenell, K. L., & Asch, D. A. (2008). Adapting a patient satisfaction instrument for low literate and Spanish-speaking populations: Comparison of three formats. Patient Education and Counseling, 73(1), 132–140.
24. Shih, T., & Fan, X. (2007). Response rates and mode preferences in web-mail mixed-mode surveys: A meta-analysis. International Journal of Internet Science, 2, 59–82.
25. Ashley, L., Keding, A., Brown, J., Velikova, G., & Wright, P. (2013). Score equivalence of electronic and paper versions of the Social Difficulties Inventory (SDI-21): A randomised crossover trial in cancer patients. Quality of Life Research, 22(6), 1435–1440.
26. Austin, J., Alvero, A. M., Fuchs, M. M., Patterson, L., & Anger, W. K. (2009). Pre-training to improve workshop performance in supervisor skills: An exploratory study of Latino agricultural workers. Journal of Agricultural Safety and Health, 15(3), 273–281.
27. Bjorner, J. B., Rose, M., Gandek, B., Stone, A. A., Junghaenel, D. U., & Ware, J. E., Jr. (2014). Method of administration of PROMIS scales did not significantly impact score level, reliability, or validity. Journal of Clinical Epidemiology, 67(1), 108–113.
28. Caute, A., Northcott, S., Clarkson, L., Pring, T., & Hilari, K. (2012). Does mode of administration affect health-related quality-of-life outcomes after stroke? International Journal of Speech-Language Pathology, 14(4), 329–337.
29. Cerrada, C. J., Weinberg, J., Sherman, K. J., & Saper, R. B. (2014). Inter-method reliability of paper surveys and computer assisted telephone interviews in a randomized controlled trial of yoga for low back pain. BMC Research Notes, 7, 227. doi:10.1186/1756-0500-7-227.
30. Chang, Y. J., Chang, C. H., Peng, C. L., Wu, H. C., Lin, H. C., Wang, J. Y., et al. (2014). Measurement equivalence and feasibility of the EORTC QLQ-PR25: Paper-and-pencil versus touch-screen administration. Health and Quality of Life Outcomes, 12, 23. doi:10.1186/1477-7525-12-23.
31. Cheung, Y. B., Goh, C., Thumboo, J., Khoo, K. S., & Wee, J. (2006). Quality of life scores differed according to mode of administration in a review of three major oncology questionnaires. Journal of Clinical Epidemiology, 59(2), 185–191.
32. Clayton, J. A., Eydelman, M., Vitale, S., Manukyan, Z., Kramm, R., Datiles, M., et al. (2013). Web-based versus paper administration of common ophthalmic questionnaires: Comparison of subscale scores. Ophthalmology, 120(10), 2151–2159.
33. Coles, M. M., Cook, L. M., & Blake, T. R. (2007). Assessing obsessive compulsive symptoms and cognitions on the internet: Evidence for the comparability of paper and Internet administration. Behaviour Research and Therapy, 45(9), 2232–2240.
34. Collins, F. E., & Jones, K. V. (2004). Investigating dissociation online: Validation of a web-based version of the dissociative experiences scale. Journal of Trauma & Dissociation, 5(1), 133–147.
35. Cook, A. J., Roberts, D. A., Henderson, M. D., Van Winkle, L. C., Chastain, D. C., & Hamill-Ruth, R. J. (2004). Electronic pain questionnaires: A randomized, crossover comparison with paper questionnaires for chronic pain assessment. Pain, 110(1–2), 310–317.
36. Greene, J., Speizer, H., & Wiitala, W. (2008). Telephone and web: Mixed-mode challenge. Health Services Research, 43(1 Pt 1), 230–248.
37. Grieve, R., & de Groot, H. T. (2011). Does online psychological test administration facilitate faking? Computers in Human Behavior, 27(6), 2386–2391.
38. Gundy, C. M., & Aaronson, N. K. (2010). Effects of mode of administration (MOA) on the measurement properties of the EORTC QLQ-C30: A randomized study. Health and Quality of Life Outcomes, 8, 35. doi:10.1186/1477-7525-8-35.
39. Handa, V. L., Barber, M. D., Young, S. B., Aronson, M. P.,
Morse, A., & Cundiff, G. W. (2008). Paper versus web-based
administration of the pelvic floor distress inventory 20 and pelvic
floor impact questionnaire 7. International Urogynecology
Journal, 19(10), 1331–1335.
40. Hauer, K., Yardley, L., Beyer, N., Kempen, G., Dias, N.,
Campbell, M., et al. (2010). Validation of the falls efficacy scale
and falls efficacy scale international in geriatric patients with and
without cognitive impairment: results of self-report and interview-based questionnaires. Gerontology, 56(2), 190–199.
41. Hayes, J., & Grieve, R. (2013). Faked depression: Comparing
malingering via the internet, pen-and-paper, and telephone
administration modes. Telemedicine and e-Health, 19(9), 714–716.
42. Hedman, E., Ljotsson, B., Ruck, C., Furmark, T., Carlbring, P.,
Lindefors, N., & Andersson, G. (2010). Internet administration of
self-report measures commonly used in research on social anxiety
disorder: A psychometric evaluation. Computers in Human
Behavior, 26(4), 736–740.
43. Hedman, E., Ljotsson, B., Blom, K., Alaoui, S. E., Kraepelien,
M., Ruck, C., et al. (2013). Telephone versus internet administration of self-report measures of social anxiety, depressive
symptoms, and insomnia: Psychometric evaluation of a method to
reduce the impact of missing data. Journal of Medical Internet
Research, 15(10), 131–138.
44. Hollandare, F., Andersson, G., & Engstrom, I. (2010). A comparison
of psychometric properties between internet and paper versions of
two depression instruments (BDI-II and MADRS-S) administered to
clinic patients. Journal of Medical Internet Research, 12(5), e49.
45. Kobak, K. A. (2004). A comparison of face-to-face and videoconference administration of the Hamilton Depression Rating
Scale. Journal of Telemedicine and Telecare, 10(4), 231–235.
46. Kobak, K. A., Williams, J. B. W., Jeglic, E., Salvucci, D., & Sharp,
I. R. (2008). Face-to-face versus remote administration of the
Montgomery–Asberg Depression Rating Scale using videoconference and telephone. Depression and Anxiety, 25(11), 913–919.
47. Lall, R., Mistry, D., Bridle, C., & Lamb, S. E. (2012). Telephone
interviews can be used to collect follow-up data subsequent to no
response to postal questionnaires in clinical trials. Journal of
Clinical Epidemiology, 65(1), 90–99.
48. Lundy, J. J., & Coons, S. J. (2011). Measurement equivalence of
interactive voice response and paper versions of the EQ-5D in a
cancer patient sample. Value in Health, 14(6), 867–871.
49. Lungenhausen, M., Lange, S., Maier, C., Schaub, C., Trampisch,
H. J., & Endres, H. G. (2007). Randomised controlled comparison
of the Health Survey Short Form (SF-12) and the Graded Chronic
Pain Scale (GCPS) in telephone interviews versus self-administered questionnaires. Are the results equivalent? BMC Medical
Research Methodology, 7, 50. doi:10.1186/1471-2288-7-50.
50. Marceau, L. D., Link, C., Jamison, R. N., & Carolan, S. (2007).
Electronic diaries as a tool to improve pain management: Is there
any evidence? Pain Medicine, 8(Suppl 3), S101–S109.
51. Matthew, A. G., Currie, K. L., Irvine, J., Ritvo, P., Santa Mina,
D., Jamnicky, L., et al. (2007). Serial personal digital assistant
data capture of health-related quality of life: A randomized
controlled trial in a prostate cancer clinic. Health and Quality of Life Outcomes, 5, 38.
52. Naus, M. J., Philipp, L. M., & Samsi, M. (2009). From paper to
pixels: A comparison of paper and computer formats in psychological assessment. Computers in Human Behavior, 25(1), 1–7.
53. Pinnock, H., Juniper, E. F., & Sheikh, A. (2005). Concordance
between supervised and postal administration of the Mini Asthma
Quality of Life Questionnaire (MiniAQLQ) and Asthma Control
Questionnaire (ACQ) was very high. Journal of Clinical Epidemiology, 58(8), 809–814.
54. Ramachandran, S., Lundy, J. J., & Coons, S. J. (2008). Testing
the measurement equivalence of paper and touch-screen versions
of the EQ-5D visual analog scale (EQ VAS). Quality of Life Research, 17(8), 1117–1120.
55. Reissmann, D. R., John, M. T., & Schierz, O. (2011). Influence of administration method on oral health-related quality of life assessment using the Oral Health Impact Profile. European Journal of Oral Sciences, 119(1), 73–78.
56. Richter, J. G., Becker, A., Koch, T., Nixdorf, M., Willers, R., Monser, R., et al. (2008). Self-assessments of patients via Tablet PC in routine patient care: Comparison with standardised paper questionnaires. Annals of the Rheumatic Diseases, 67(12), 1739–1741.
57. Ritter, P., Lorig, K., Laurent, D., & Matthews, K. (2004). Internet versus mailed questionnaires: A randomized comparison. Journal of Medical Internet Research, 6(3), e29.
58. Salaffi, F., Gasparini, S., & Grassi, W. (2009). The use of computer touch-screen technology for the collection of patient-reported outcome data in rheumatoid arthritis: Comparison with standardized paper questionnaires. Clinical and Experimental Rheumatology, 27(3), 459–468.
59. Salaffi, F., Gasparini, S., Ciapetti, A., Gutierrez, M., & Grassi, W. (2013). Usability of an innovative and interactive electronic system for collection of patient-reported data in axial spondyloarthritis: Comparison with the traditional paper-administered format. Rheumatology, 52(11), 2062–2070.
60. Sikorski, A., Given, C. W., Given, B., Jeon, S., & You, M. (2009). Differential symptom reporting by mode of administration of the assessment: Automated voice response system versus a live telephone interview. Medical Care, 47(8), 866–874.
61. Sousa, P. C., Mendes, F. M., Imparato, J. C., & Ardenghi, T. M. (2009). Differences in responses to the Oral Health Impact Profile (OHIP14) used as a questionnaire or in an interview. Pesquisa Odontologica Brasileira—Brazilian Oral Research, 23(4), 358–364.
62. Suris, A., Borman, P. D., Lind, L., & Kashner, T. M. (2007). Aggression, impulsivity, and health functioning in a veteran population: Equivalency and test-retest reliability of computerized and paper-and-pencil administrations. Computers in Human Behavior, 23(1), 97–110.
63. Swartz, R. J., de Moor, C., Cook, K. F., Fouladi, R. T., Basen-Engquist, K., Eng, C., & Taylor, C. L. C. (2007). Mode effects in the Center for Epidemiologic Studies Depression (CES-D) scale: Personal digital assistant vs. paper and pencil administration. Quality of Life Research, 16(5), 803–813.
64. Tiplady, B., Goodman, K., Cummings, G., Lyle, D., Carrington, R., Battersby, C., & Ralston, S. H. (2010). Patient-reported outcomes in rheumatoid arthritis: Assessing the equivalence of electronic and paper data collection. The Patient: Patient-Centered Outcomes Research, 3(3), 133–143.
65. Weiler, K., Christ, A. M., Woodworth, G. G., Weiler, R. L., & Weiler, J. M. (2004). Quality of patient-reported outcome data captured using paper and interactive voice response diaries in an allergic rhinitis study: Is electronic data capture really better? Annals of Allergy, Asthma & Immunology, 92(3), 335–339.
66. Whitehead, L. (2011). Methodological issues in Internet-mediated research: A randomized comparison of internet versus mailed questionnaires. Journal of Medical Internet Research, 13(4), e109.
67. Wu, L. T., Pan, J. J., Blazer, D. G., Tai, B., Brooner, R. K., Stitzer, M. L., et al. (2009). The construct and measurement equivalence of cocaine and opioid dependences: A National Drug Abuse Treatment Clinical Trials Network (CTN) study. Drug and Alcohol Dependence, 103(3), 114–123.
68. Yu, S. C., & Yu, M. N. (2007). Comparison of Internet-based and paper-based questionnaires in Taiwan using multisample invariance approach. Cyberpsychology & Behavior, 10(4), 501–507.
69. Zimmerman, M., & Martinez, J. H. (2012). Web-based assessment of depression in patients treated in clinical practice: Reliability, validity, and patient acceptance. Journal of Clinical Psychiatry, 73(3), 333–338.