Qual Life Res
DOI 10.1007/s11136-015-1110-8

SPECIAL SECTION: PROS IN NON-STANDARD SETTINGS (BY INVITATION ONLY)

Mode of administration does not cause bias in patient-reported outcome results: a meta-analysis

Claudia Rutherford (1), Daniel Costa (1), Rebecca Mercieca-Bebber (1,2), Holly Rice (3), Liam Gabb (4), Madeleine King (1,2)

Accepted: 18 August 2015
© Springer International Publishing Switzerland 2015

1 Quality of Life Office, Psycho-oncology Co-operative Research Group, School of Psychology, University of Sydney, Level 6 North, Chris O'Brien Lifehouse (C39Z), Sydney, NSW, Australia
2 Sydney Medical School, University of Sydney, Sydney, NSW, Australia
3 Centre for Medical Psychology and Evidence-Based Decision-Making, School of Psychology, University of Sydney, Sydney, NSW, Australia
4 King's College London, London, UK
Corresponding author: Claudia Rutherford, [email protected]

Abstract

Purpose Technological advances in recent decades have led to the availability of new modes to administer patient-reported outcomes (PROs). To aid selection of optimal modes of administration (MOA), we undertook a systematic review to determine whether differences in bias (both size and direction) exist among modes.

Methods We searched five electronic databases from 2004 (date of the last comprehensive review on this topic) to April 2014, cross-referenced and searched reference lists. Studies that compared two or more MOA for a health-related PRO measure in adult samples were included. Two reviewers independently applied inclusion and quality criteria and extracted findings. Meta-analyses and meta-regressions were conducted using random-effects models.

Results Of 5100 papers screened, 222 were considered potentially relevant and 56 met eligibility criteria. No evidence of bias was found for: (1) paper versus electronic self-completion; and (2) self-complete versus assisted MOA. Heterogeneity for the paper versus electronic comparison was explained by type of construct (i.e. physical vs. psychological). Heterogeneity for self-completion versus assisted modes was in part explained by setting (clinic vs. home); the largest bias was introduced when assisted completion occurred in the clinic and follow-up was by self-completion (either electronic or paper) in the home.

Conclusions Self-complete paper and electronic MOA can be used interchangeably for research in clinic and home settings. Self-completion and assisted completion produce equivalent scores overall, although heterogeneity may be induced by setting. These results support the use of mixed MOAs within a research study, which may be a useful strategy for reducing missing PRO data.

Keywords Systematic review; Patient-reported outcome; Mode of administration; Bias

Electronic supplementary material The online version of this article (doi:10.1007/s11136-015-1110-8) contains supplementary material, which is available to authorized users.

Introduction

A patient-reported outcome (PRO) is a patient's report of their own health status that has not been interpreted by a clinician or anyone else [1]. The method of collecting PRO data, or mode of administration (MOA), is receiving increasing attention in both research and clinical contexts, largely driven by the rapid development of new methods and technologies that allow electronic data capture. Historically, paper-based administration (variably referred to as "paper and pencil" or "hard copy") has been used in the clinic. Touch screen computers and tablets are increasingly used in the clinic, and remote data capture (e.g.
via internet or mobile devices) allows PRO assessment at times and locations outside the clinic. These newer methods allow increased ease, speed and efficiency of data capture.

Typically, PRO measures are self-administered, i.e. completed personally by the patient, but they may also be interviewer-administered, whereby the interviewer reads questions to the patient and the patient records their response via a chosen input mode. This difference in MOA may induce bias, even when question (or item) wording is consistent across modes, if, for example, the question is appraised differently by respondents depending on MOA [2]. This may be particularly apparent when an interviewer is present; participants may produce more candid responses when self-completing than when responding to a questionnaire verbally administered by a researcher, due to phenomena such as social desirability or response acquiescence [3].

Both self- and interviewer-administered PRO assessment can be completed via a range of input methods, including paper, over the phone, or using a computer or other electronic device. The choice of MOA may be determined by the setting in which patients complete PROs: paper-based MOA may be feasible in clinics if staff are available to hand out and collect questionnaires [4], touch-screens are known to be feasible in clinic but require investment in hardware and software [5], and web-based MOA enables completion at home and therefore at times other than scheduled clinic visits.

High-quality PRO research relies not only on the use of reliable, valid and responsive PRO measures [1, 6], but also on the generation of high-quality PRO data. While there is no agreed definition of data quality, it is clear that higher-quality data will result from higher response rates, greater reliability of responses and absence of bias [7]. A narrative review of the effects of MOA on data quality found five main indicators of data quality: validity of responses (i.e. absence of bias); absence of social desirability; item response rates; amount of information (e.g. from open-ended questions); and similarity of response distributions obtained by different MOA [7]. MOA variations arguably could affect PRO data quality. Bias, a systematic error in scores attributable to variables other than those of primary interest, is of particular interest for MOA of PRO measures because it may confound interpretation of PRO results, potentially leading to erroneous conclusions, for example, about the effect of health conditions and interventions on PROs.

Another major threat to the quality of PRO data is missing data [8]. The use of a mix of MOA within a study may be an effective way to maximise response rates. For example, some participants may prefer web-based administration while others prefer paper-based [9]. Non-responders from the primary MOA, such as paper, may be followed up using an alternative MOA, such as via telephone. Mixed-mode designs may also be used to minimise costs. For example, paper-based MOA may be used when planned PRO assessments coincide with a clinic visit and clinic staff are available to hand out and collect questionnaires, while online administration could be a cost-effective approach at other times. While using a mix of MOA is appealing for a range of practical and logistical reasons, scientific rigour makes it essential to determine whether there are systematic differences due to MOA, and if so, what is the size and direction of MOA bias.
Five review papers have assessed the effect of MOA on data quality. Of these, two reviews were based on studies published up to 2006 that compared paper and electronic self-completion of PROs and found them to be equivalent [2, 10]. On the other hand, a narrative literature review found mixed results for assisted and self-complete methods: some studies found higher item response rates with administered methods than with paper self-completion, while others reported inconsistent effects [11]. One paper found no MOA bias for repeated PRO measurements [12], whereas another did [7]. The last comprehensive review of the literature investigating the impact of MOA on response quality is based on research published up to 2004 [2]. An updated review of evidence is warranted, given the greater availability of computers and increased use of the Internet by researchers and patients. Therefore, to assist researchers and clinicians in selecting the most appropriate MOA for PROs, we undertook a systematic review to investigate bias in MOA (including the size and direction of any bias), specifically between: (a) paper and electronic self-completion; and (b) self-completion and assisted completion [e.g. computer-assisted telephone interview (CATI), researcher-administered].

Methods

We searched five electronic databases: MEDLINE; PsycInfo; CINAHL; EMBASE; and Scopus from 2004 (date of the last comprehensive review on this topic [2]) to April 2014. The search strategy was developed based on terms used in published papers that compared MOA and considered data quality issues. The search strategy consisted of terms for all possible "MOA", "data quality", "bias" and "PROs". The search strategy was reviewed by the team and discussed with a librarian at the University of Sydney. The final search was tested in MEDLINE to determine whether it retrieved five relevant key papers (see online appendix). No geographical restrictions were applied. An electronic search by author (i.e. authors who had published key papers identified through our search) was performed. The electronic search was supplemented by a search of reference lists of all studies included in the review and from relevant systematic reviews identified.

Study selection and eligibility criteria

Abstracts from retrieved papers were reviewed against five broad criteria by one reviewer (CR, RMB or HR):
1. Primary research of quantitative design [i.e. randomised controlled trials (RCTs); cohort study; cross-sectional; comparative study];
2. Comparing two or more administration modes of data collection;
3. Using PRO measures intended for assessing health outcomes including QOL, symptoms, psychological aspects (e.g. well-being; distress; risk perception) (as opposed to objective patient risk based on patient characteristics), and satisfaction with treatment. We use the umbrella term "PRO" to describe measures (also called questionnaires, surveys, tools, rating scales) that attempt to assess patients' health-related experience from the patient's perspective;
4. For the standardised health-related PRO measures, a minimum of one piece of empirical evidence for reliability and validity was required, i.e. not "study-specific questions" or ad hoc measures;
5. In any adult samples.

These criteria were selected as our intention was to identify all studies comparing different MOA for PRO measures in adults, regardless of quantitative research design or population.
If all five criteria were met, the paper was considered potentially relevant and obtained in full. Abstracts clearly not relevant to this review were rejected at this stage. Where relevance was ambiguous, papers were obtained in full for further scrutiny. A second reviewer (RMB and HR) screened 25 % of excluded papers, selected at random to assess accuracy of classification; there were no disagreements. Papers were then included if they met the following criteria:
1. Investigated sources of poor data quality or bias, including size and/or direction of bias. Studies that considered techniques for enhancing response rates were included if a comparison between MOA methods was reported;
2. Reported sufficient data to allow inclusion in the quantitative analysis, e.g. means and/or mean difference between MOA groups were reported;
3. Study outcomes were health-related PROs;
4. Published in English.

Papers were excluded if:
1. The study used qualitative approaches to data collection (e.g. unstructured interviews, focus group discussions) or was a literature review, discussion or perspective paper, or a single case study;
2. Study outcomes were not health-related PROs. For example, measures used for: clinical purposes (i.e. variables such as physiologic measurements (e.g. blood pressure) that may assist clinicians in diagnosing or treating an illness); screening/health history taking/counselling; diagnosis/screening; demographics; personality tests; education/learning interventions; medication adherence; health-related behaviours; census data; aptitude assessment (e.g. task performance such as the Stroop test); or ad hoc questions;
3. The PRO measure used did not have a minimum of one piece of empirical evidence for reliability and validity;
4. The study did not include a comparison of the effect of MOA (e.g. the study considered incentives for increasing response rates or predictors of increased/decreased response rates; i.e. only one mode, no comparison between modes);
5. Studies addressing the development or refinement of existing PRO measures (e.g. studies assessing the psychometric properties of specific PRO measures) or reporting methods for establishing reliability, validity or responsiveness to change;
6. Analysis for equivalence between two modes was not undertaken/reported or the data were pooled for both modes;
7. Results were not reported by mode or were not useful to our study objectives (i.e. means, mean difference, ICC, response rates, preference were not reported by mode groups on the PRO);
8. Not primary research.

Papers were assessed against the eligibility criteria by two researchers (CR, RMB or HR) independently. Disagreements were resolved through team discussion. Where study details were lacking, attempts were made to contact the authors and invite them to provide additional information.

Quality assessment

The QualSyst quality appraisal tool [13] was used to assess the quality of studies included in the review (Table 2 lists the quality criteria items). We modified item 4 in the checklist to enable assessment of whether "MOA was clearly described in the methods", in line with our review objectives. Study quality was assessed by two reviewers independently (CR, RMB, HR or LG). Each quality criterion was assessed as being met (yes/partial/no), providing a structured approach for the classification of overall study quality and the strength of the evidence based on methodological components. Quality assessments were cross-checked for consistency between assessments.
In the case of any disagreements or discrepancies in scores, a third reviewer independently assessed study quality.

Data extraction

Data from included studies were extracted into pre-prepared data extraction tables by one reviewer (CR, RMB, HR or LG). All data extraction was cross-checked for errors and omissions by another reviewer (CR, RMB, HR or LG). Disagreements or discrepancies were discussed between the researchers and confirmed with a third reviewer (MK or DC). Final decisions were agreed with the research team. Data pertaining to study aims, sample characteristics, design, data collection methods, analytical methods, and PRO measures were extracted. Many studies reported more than one aspect, or domain, of self-reported health. For the meta-analysis, the following results were extracted for each PRO domain, if reported in the source papers: means for specific modes or mean difference between any two modes, standard deviation or standard error of the mean or mean difference, standardised mean difference, t statistic, p value, or confidence intervals (CI). Of secondary interest, where reported, we extracted response rates for each MOA, the proportion of the sample preferring each MOA, and whether the study considered covariates of preference, response rates or missing data.

Synthesis

A meta-analysis was conducted using a random-effects model, which assumes that the effect size is not fixed but has a distribution within the population [14]. We considered this appropriate for our review because the different types of PRO domains (e.g. physical, psychological) and study designs led us to expect that any heterogeneity in observed effect sizes might be explained by domain-level or study-level variables [14]. The comparisons of interest were (a) paper versus electronic self-completion and (b) self-completion versus assisted completion. A small number of studies identified in our search did not include either of these comparisons, instead comparing face-to-face interviews with tele-assisted interviews, so this comparison was examined post hoc. The summary statistic we used was the standardised mean difference (i.e. mean of one mode minus the other divided by the pooled standard deviation) and its 95 % CI. Analyses were conducted using SAS 9.3. To facilitate comparability of PRO domain scales, we standardised domain score direction so that higher scores always represent a worse outcome. Heterogeneity was assessed by examining τ² and its Z and p values, where p < 0.05 was taken to indicate the presence of heterogeneity and τ² = 0 indicates no heterogeneity.

The primary analysis treated individual domains within studies as the unit of analysis; we refer to this as the domain-level analysis. This allowed us to conduct a meta-regression to assess domain-level and study-level predictors of any observed heterogeneity in the differences due to MOA [15]. The domain-level predictor was construct (physical, psychological, social, or other, where other was used as the reference category). The study-level predictors were type of allocation to mode groups (randomised vs. non-randomised) and setting [home vs. home, clinic vs. clinic, home (self-complete) vs. clinic (assisted) or home (assisted) vs. clinic (self-complete), where this last category was the reference category]. Because the domain-level analysis contained non-independent observations (i.e. PRO domains within a study are expected to be correlated), we conducted another meta-analysis on the study-level effect sizes (study-level analysis), which was a weighted average of the effect sizes within each study [15]. Inter-domain correlations were rarely reported in these studies so a correlation of 0.5 between every pair of domains within a study was assumed. We used meta-regression to determine whether any heterogeneity was predicted by the study-level variables described above (allocation method and setting).
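To make the pooling step concrete, the sketch below shows one common way to compute a standardised mean difference (SMD) per domain and combine domain-level effects under a random-effects model. The authors ran their analyses in SAS 9.3 and do not state which estimator of the between-domain variance τ² they used; the DerSimonian–Laird estimator, function names and example numbers below are therefore illustrative assumptions, not the authors' code.

```python
# Illustrative sketch: SMD per domain and DerSimonian-Laird random-effects pooling.
# Assumed for illustration only; the paper's analyses were run in SAS 9.3.
import math

def smd_and_variance(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """SMD = (mean of one mode minus the other) / pooled SD, with its
    approximate sampling variance."""
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
    d = (mean_a - mean_b) / pooled_sd
    var = (n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b))
    return d, var

def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooled effect, its 95% CI, and tau^2 (heterogeneity)."""
    w = [1 / v for v in variances]                       # fixed-effect weights
    d_fixed = sum(wi * di for wi, di in zip(w, effects)) / sum(w)
    q = sum(wi * (di - d_fixed) ** 2 for wi, di in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)        # between-domain variance
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * di for wi, di in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Example with made-up numbers: two domains from one paper vs. electronic comparison.
d1, v1 = smd_and_variance(52.1, 10.0, 60, 51.8, 9.5, 60)
d2, v2 = smd_and_variance(47.0, 12.0, 60, 47.5, 11.0, 60)
print(random_effects_pool([d1, d2], [v1, v2]))
```

For the study-level analysis described above, each study's domain SMDs would first be averaged into a single study effect, with the variance of that average computed under the assumed inter-domain correlation of 0.5, before pooling across studies in the same way.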
Results

General study characteristics

Of 5100 abstracts screened, 222 were considered potentially relevant and 56 met inclusion criteria; 46 provided data to allow inclusion in a meta-analysis and a further 10 reported response rates or preference for MOA data (Fig. 1). Of the 46 studies included in the meta-analysis, 43 compared two modes and three compared three modes. Twenty studies were conducted in the clinic; 31 randomised participants to MOA groups; and 10 recruited healthy or general population samples. Table 1 summarises the key characteristics of these studies. The modes used to administer PROs can be categorised into four groups: (1) self-complete paper (hard copy); (2) self-complete electronic (e.g. computer including web, touch screen, hand-held device); (3) telephone-assisted self-complete (paper-based, electronic, voice recorded); and (4) interviewer-administered (face-to-face or telephone). Electronic MOA comparisons included four telephone interactive voice recognition systems, one CATI, two videoconference, six touch screen and three personal digital assistants as the mode of electronic assessment (Table 1).

Fig. 1 Flow of studies through the process: abstracts reviewed n = 5100 (retrieved electronically, minus duplicates, n = 5098; retrieved from reference lists n = 2); not relevant on abstract n = 4878; obtained in full n = 222; excluded n = 166 (113 outcomes not validated PROs; 10 studies about psychometric properties of existing PRO measures; 21 studies not about mode effects; 1 proxy-report compared to patient self-report; 21 comparison of mode not undertaken/reported or results not useful to our study objectives); met eligibility n = 56; included in meta-analysis n = 46 (31 RCTs; 3 longitudinal; 12 cross-sectional); data on response rates and preference for mode only n = 10.

Study quality

Study quality is reported in Table 2. Most studies described their aims or hypotheses, participants, outcomes, analytical methods and results, and drew appropriate conclusions. Thirty-one studies (67 %) randomised participants to MOA. The other studies allocated participants to MOA through alternate allocation upon study entry, participant preference (i.e. participants self-selected), or by offering the alternative mode on non-response to one mode. The description of the method of subject and comparison group selection, MOA methods and allocation to mode was relatively poor (Table 2). For 54 % of studies, power was justified post hoc (e.g. only mentioned "we had the power to detect differences" in the discussion) and <50 % of studies controlled for confounding (Table 2).

Meta-analysis

There were 273 PRO domain-level effect sizes in the analytic database from the 46 included studies.

Paper versus electronic self-completion

We found no significant difference in mean scores for comparisons between administration by paper and electronic self-completion at the domain level (effect size = 0.01; Table 3). Figure 2 presents the forest plot of the paper versus electronic self-completion comparison.
We found evidence of heterogeneity (τ² = 0.002), which was significantly predicted by construct, such that the mean bias was larger for physical domains (mean effect size = 0.010) than for psychological domains (mean effect size = 0.006; Table 4). The residual unexplained heterogeneity was small. At the study level, there was no significant bias (0.01) or heterogeneity (τ² = 0.002).

Self-completion versus assisted completion

We found no significant difference in mean scores for comparisons between self-completion and assisted completion at the domain level (0.04; Table 3). Figure 3 presents the forest plot of the self-completion versus assisted completion comparison. However, we found evidence of heterogeneity (τ² = 0.030), which was predicted by setting: the size of the bias (where self-completion exhibited a poorer health outcome than assisted completion) was significantly different for the home versus home comparisons (mean effect size = 0.10), the clinic versus clinic comparisons (mean effect size = 0.05) and the home (self-complete) versus clinic (assisted) comparisons (mean effect size = 0.10) than for the home (assisted) versus clinic (self-complete) comparisons (mean effect size = -0.07; Table 4). Some unexplained heterogeneity remained, but insufficient reporting in the included studies did not allow any other predictors (unmeasured or unmeasurable) to be investigated. We found no significant bias (0.01) at the study level, but observed significant heterogeneity (τ² = 0.026) that was not explained by any of the predictors.

Table 1 Characteristics of included studies

Swedish and Australian participants with panic disorder, registered to an internet-based intervention (n = 120) Patients with Rheumatology (n = 200) Austin (2009) [26], Sweden; Australia Participants from a RCT exploring yoga for chronic lower back pain (n = 45) Prostate cancer (n = 107) Heterogeneous cancer patients (n = 1317) Individuals with ocular surface disease and those without (controls) (n = 132) Undergraduate psychology students (n = 106) Undergraduate psychology students (paper group); university staff, postgraduate students, general population (electronic group) (n = 531) Patients with Chronic Pain (n = 200) Chang (2014) [30], Taiwan Cheung (2006) [31], Singapore Clayton (2013) [32], USA Coles (2007) [33], USA Collins (2004) [34], Australia Cook (2004) [35], USA Group 2: pts 1 year post-stroke, recruited through central London stroke self-help groups (n = 59) Cerrada (2014) [29], USA Caute (2012) [28], UK Group 1: pts 3–6 months post-stroke, from NHS Hospitals Patients receiving curative treatment for cancer (n = 153) Ashley (2013) [25], UK Bjorner (2014) [27], Denmark Population and sample size for analysis First author (year), country RCT, cross-over, allocation to both modes, 45 min apart Cross-sectional, modes advertised separately to different populations; individuals self-selected RCT, cross-over allocation to mode. Both modes completed within average of 1.5 days Hard copy in clinic versus electronic (touch screen) in clinic Hard copy at home/postal versus electronic (computer) at home Hard copy in clinic versus electronic (computer) at home Hard copy in clinic versus electronic (computer) in clinic Hard copy in clinic versus face-to-face interview Cross-sectional study.
Allocation to mode based on participant preference RCT, cross-over, allocation to both modes, completed within 7 days Hard copy in clinic versus electronic (touch screen) in clinic RCT, cross-over, allocation to both modes, 120 min apart Comparative study: assessed at baseline, 6 and 12 weeks. Modes completed 2–11 days Hard copy in clinic versus CATI at home Face-to-face interview in clinic versus hard copy at home or telephone interview at home Group 1: one mode then the other Group 2: randomised to mode. Target: 3–14 days between modes; actual not reported Electronic (PDA) in clinic versus electronic (PC) in clinic Hard copy at home/postal versus electronic (computer) at home Hard copy at home/postal versus electronic (computer) at home Modes compared RCT, randomised to PDA-PC or PC-PC. Both groups completed on the same day. RCT, cross-over allocation to both modes, completed within 7 days RCT, cross-over allocation to both modes, 2 weeks apart Design University pain management clinic Home College, home Ophthalmology clinic Clinic Clinic Clinic, home Clinic, home Clinic Home Home Setting Short-form McGill Pain Questionnaire; Pain Disability Index Dissociative Experiences Scale (DES) Obsessive Compulsive Inventory (OCI); Obsessive Beliefs Questionnaire (OBQ) Refractive Error QoL Questionnaire (RQL); Visual Function Questionnaire (VFQ); Ocular Surface Disease Index (OSDI) FACT-General; Functional Living Index–Cancer (FLIC); EORTC QLQ-C30 EORTC QOL Questionnaire (EORTC QLQ-PR25) in Taiwan Chinese Roland Morris Disability Questionnaire (RMDQ); SF-36 Stroke and Aphasia QOL Scale (SAQOL-39 g) Body Sensations Questionnaire (BSQ); Agoraphobic Cognitions Questionnaire (ACQ); Mobility Inventory (MI) PROMIS Physical, Fatigue and Depression measures The Social Difficulties Inventory (SDI) PRO measures Qual Life Res Population and sample size for analysis Salaried employees at a large company (n = 751) Undergraduate students and volunteers from general public (n = 223) Cancer patients undergoing treatment at specialist centre (n = 483) Adult women, new patients at centre for pelvic health (n = 62) Adults from two geriatric rehabilitation wards (n = 156) Sample description not recorded (n = 91) Individuals recruited to two studies evaluating the effect of Internetdelivered treatment for social anxiety disorder (n = 121) Psychiatric patients receiving treatment at the Internet-based cognitive behaviour therapy clinic (n = 82) Participants recruited within public health care (n = 112) Participants with an affective disorder (n = 42) First author (year), country Greene (2008) [36], USA Grieve (2011) [37], Australia Gundy (2010) [38], Netherlands Handa (2008) [39], USA Hauer (2010) [40], Germany Hayes (2013) [41], Australia Hedman (2010) [42], Sweden Hedman (2013) [43], Sweden Hollandare (2010) [44], Sweden Kobak (2004) [45], USA Table 1 continued Cross-sectional study. Face-to-face mode first, then alternate allocation to face-to-face or videoconference, 1–3 days apart RCT, cross-over allocation to both modes, completed within an average of 10 days Repeated measures, cohort study. Participants completed electronic mode first, then telephone 3 days later Cross-sectional study. Recruited via media advertisement. Paper (2006 study); electronic (2007 study) RCT, randomised to mode groups; only one mode completed Cross-sectional study. 
Completed one mode after the other (same order) RCT, cross-over, allocation to both modes, within 6 weeks RCT, randomised to mode groups; only one mode completed Cross-sectional study. Mode selected by participant preference RCT, randomised to mode, nonrespondents offered switching mode Design Face-to-face interview in clinic versus interview assisted via videoconference at home Hard copy at home/postal versus electronic (computer) at home Electronic (computer) at home versus telephone interview Hard copy at home versus electronic (computer) at home Electronic (computer) in clinic versus hard copy in clinic versus telephone interview Hard copy in clinic versus face-to-face interview Hard copy at home and in clinic versus electronic (computer) at home and in clinic Hard copy at home/postal versus telephone interview versus hard copy in clinic Electronic (computer) in clinic versus hard copy in clinic Electronic (computer) at home versus telephone interview Modes compared Clinic, home Home Home Hamilton Depression Rating Scale (HAM-D) Montgomery-Asberg Depression Rating Scale (MADRS-S); Beck Depression Inventory (BDI-II) Montgomery–Åsberg Depression Rating Scale Self-rated (MADRS-S) Liebowitz Social Anxiety Scale (LSAS); Social Interaction Anxiety Scale (SIAS); Social Phobia Scale (SPS); Montgomery-Asberg Depression Rating Scale (MADRS-S); Beck Anxiety Inventory (BAI); QoL of Insomniacs Questionnaire (QOLI) Edinburgh Depression Scale (EDDS) Clinic, home Home Falls Efficacy Scale (FES); Falls Efficacy Scale-International (FES-I) Pelvic Floor Distress Inventory (PFDI); Pelvic Floor Impact Questionnaire (PFIQ) EORTC QoL Questionnaire (QLQ-C30v2) Depression, Anxiety, Stress Scales (DASS) Widely used self-report of health status PRO measures Clinic Home & in pelvic health clinic Home and cancer clinic University Home Setting Qual Life Res 123 Population and sample size for analysis Subjects with a DSM-IV-TRdiagnosed mood disorder (n = 70) Patients with lower back pain (n = 701) A non-probability sample of subjects from the Arizona Cancer Center outpatient clinics (n = 184) Patients who received acupuncture treatment, drawn from GERAC acupuncture trial (n = 1087) Adults attending urban pain management centre in a teaching hospital (n = 42) Prostate cancer patients (n = 185) Female undergraduate psychology students (n = 76) Adults with asthma (n = 96) Non-probability, purposive sample of the general adult population (n = 312) Prosthodontic patients (n = 42) Outpatients with inflammatory rheumatic diseases (n = 153) First author (year), country Kobak (2008) [46], USA Lall (2012) [47], UK Lundy (2011) [48], USA Lungenhausen (2007) [49], Germany Marceau (2007) [50], USA Matthew (2007) [51], Canada Naus (2009) [52], USA Pinnock (2005) [53], UK Ramachandran (2008) [54], USA Reissmann (2011) [55], Germany Richter (2008) [56], Germany Table 1 continued 123 RCT, cross-over allocation to both modes. Time between completions not reported RCT, cross-over allocation. All modes completed over 3–4 week period RCT, cross-over, allocation to both modes, 10 min apart Longitudinal cohort study. Alternate completion (postal first then in clinic under supervision at next scheduled visit, completed 1 week apart RCT, cross-over allocation to both modes, completed within 21 days RCT, cross-over allocation to both modes, completed 30 min apart RCT, cross-over, allocation to both modes, completed within 1 week RCT, cross-over, allocation to both modes, completed within 4 weeks Cross-sectional study. 
Nonresponders telephoned to collect data over the phone RCT, cross-over allocation to both modes, completed between 2 and 3 days Allocation to mode via counterbalanced order, completion on same day Cross-sectional study. Design Electronic (tablet computer) in clinic versus hard copy in clinic Telephone interview versus face-to-face interview versus Hard copy in clinic Hard copy in clinic versus electronic (touch screen) in clinic University Clinic Clinic, home Clinic Home, clinic University clinic Hard copy in clinic versus electronic (computer) in clinic Hard copy at home/postal versus assisted hard copy in clinic Outpatient clinic Home Home Home Clinic, home Clinic, home Setting Hard copy in clinic versus electronic (PDA) in clinic Electronic (computer) at home versus hard copy at home/postal Telephone interview at home versus hard copy at home/postal Hard copy in clinic versus telephone interview at home Hard copy at home/postal versus telephone automated Face-to-face interview in clinic versus interview assisted via videoconference at home Modes compared Hannover Functional Questionnaire (FFbH); Health Assessment Questionnaire (HAQ); Bath Ankylosing Spondylitis Disease Activity Index; SF-36 (Physical & Mental) Oral Health Impact Profile (German version) EuroQol Visual Analogue Scale (EQ VAS) Mini Asthma QoL Questionnaire (MiniAQLQ) International Prostate Symptom Score; International Index of Erectile Function; Patient Oriented Prostate Utility Scale Beck Depression Inventory (BDIII); SF-36 Brief Pain Inventory (BPI) SF-12; Graded Chronic Pain Scale Vorn Korff Scale; SF12; EuroQoL Five Dimensions Questionnaire (EQ-5D) EuroQoL Five Dimensions Questionnaire (EQ-5D); EuroQoL VAS Montgomery-Asberg Depression Rating Scale (MADRS) PRO measures Qual Life Res Population and sample size for analysis Convenience sample with chronic disease who had access to and who were familiar with the Internet (n = 462) Patients with rheumatoid arthritis (n = 87) Patients with axial spondyloarthritis (N = 55) Patients with cancer (n = 437) Patients attending an adult dental clinic (n = 90) Veterans with diagnosis of moodand-substance-related disorders, and general medical patients with no psychiatric diagnosis (n = 97) Patients and companions from genitourinary medical oncology, gastrointestinal medical oncology and psychiatric clinics (n = 801) Patients with rheumatoid arthritis (n = 43) First author (year), country Ritter (2004) [57], USA Salaffi (2009) [58], Italy Salaffi (2013) [59], Italy Sikorski (2009) [60], USA Sousa (2009) [61], Brazil Suris (2007) [62], USA Swartz (2007) [63], USA Tiplady (2010) [64], UK Table 1 continued Hard copy in clinic versus electronic (touch screen) in clinic Electronic (PDA) in clinic versus hard copy in clinic RCT, cross-over allocation to mode. Time between mode completion not recorded RCT, cross-over allocation to both modes, 45 min apart Hard copy in clinic versus electronic (computer) in clinic Hard copy in clinic versus face-to-face interview in clinic Telephone interview versus telephone automated Hard copy in clinic versus electronic (touch screen) in clinic Hard copy in clinic versus electronic (touch screen) in clinic Electronic (computer) at home versus hard copy at home/postal Modes compared Cross-sectional study. 
Consecutive assignment to mode RCT, cross-over allocation to mode, completed 2 weeks apart RCT, randomised to mode groups; only one mode completed RCT, cross-over allocation to both modes, 60 min apart RCT, cross-over allocation to both modes, 60 min apart RCT, randomised to mode groups; only one mode completed Design Rheumatology clinic Cancer clinic Clinic University clinic Two Cancer clinics Clinic Clinic Home Setting Brief Pain Inventory (BPI); EuroQoL Five Dimensions Questionnaire (EQ-5D); Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F); Heath Assessment Questionnaire (HAQ); Short-form McGill Pain Questionnaire (SFMPQ); SF-36; Stress and Adaptation in Rheumatoid Arthritis Center for Epidemiologic Studies Depression Scale (CES-D) Aggression Questionnaire; Barratt Impulsiveness Scale-11; SF-36 Oral Health Impact Profile (OHIP14) Symptom index for: pain, fatigue, cough, peripheral neuropathy, dyspnoea, insomnia, dry mouth, difficulty remembering, poor appetite, nausea/vomiting, diarrhoea, constipation, weakness, alopecia Bath Ankylosing Spondylitis Disease Activity Index; Bath Ankylosing Spondylitis Functional Index; Ankylosing Spondylitis Disease Activity Score- preferred and alternative VAS for general health, pain and global activity; Tender Joint Count; Recent-Onset Arthritis Disability Questionnaire VAS for pain, shortness of breath and fatigue; Illness Intrusiveness Ratings Scale (IIRS); Health Distress Thermometer; Health Assessment Questionnaire (HAQ) PRO measures Qual Life Res 123 123 People with allergic rhinitis (n = 82) University students (n = 2000) Patients with chronic heart failure (n = 58) Teachers at elementary, junior and senior high schools (n = 2400) Psychiatric patients receiving treatment for depression (n = 53) Patients with a benign abnormality or with breast cancer (n = 800) Weiler (2004) [65], USA Whitehead (2011) [66], New Zealand Wu (2009) [67], Canada Yu (2007) [68], Taiwan Zimmermann (2012) [69], USA Zuidgeest (2011) [21], Netherlands Cross-sectional study. Nonresponse to one mode, alternative mode offered Cross-sectional study. 
Electronic version completed 48 h before clinic; hard copy completed after clinic RCT, randomised to mode groups; only one mode completed RCT, cross-over allocation to both modes, 2 weeks apart RCT, randomised to mode groups; only one mode completed RCT, cross-over allocation to both modes, 1 week apart Design Hard copy at home/postal versus electronic (computer) at home Hard copy in clinic versus electronic (computer) at home Electronic (computer) at home versus hard copy at home/postal Electronic (computer) in clinic versus hard copy in clinic Electronic (computer) at home versus hard copy at home/postal Hard copy at home/postal versus telephone automated Modes compared Home Clinic and home Home Clinic Home Home Setting CQ-index Breast Care Clinically Useful Depression Outcome Scale Centre for Epidemiological Studies Depression Scale (CES-D) The Kansas City Cardiomyopathy Questionnaire; Minnesota Living with Heart Failure Questionnaire SF-12; Feeling of Satisfaction with Inhaler Questionnaire; Hospital Anxiety and Depression Score (HADS) CompleWare Corporation Symptom Score Card II or Symptom Phone Link PRO measures
CATI computer-assisted telephone interview, PDA personal digital assistant, PC personal computer, EORTC European Organization for Research and Treatment of Cancer, QOL quality of life, VAS visual analogue scale, FACT Functional Assessment of Cancer Therapy

Table 2 Methodological characteristics of studies comparing two or more modes of PRO measure administration (n = 46)
Quality criteria (% Yes / % Partial / % No):
Question or objective sufficiently described? 60.9 / 39.1 / 0
Design evident and appropriate to answer the author's study question? 56.5 / 41.3 / 2.2
Method of subject selection and comparison group selection is described and appropriate? 34.8 / 58.7 / 6.5
Modes of administration are clearly described in the methods? 47.8 / 52.2 / 0
Subject and comparison group characteristics sufficiently described? 69.6 / 26.1 / 4.3
If random allocation to treatment group was possible, is it described? 23.9 / 50 / 26.1
If interventional and blinding of investigators to intervention was possible, is it reported? (n = 31 RCTs) 0 / 2.2 / 97.8
Outcome well defined and robust to measurement bias? Means of assessment reported? 82.6 / 17.4 / 0
Sample size appropriate? 26.1 / 54.3 / 19.6
Analysis described and appropriate to address the authors' aims? 71.7 / 28.3 / 0
Some estimate of variance (e.g. SD, CI, standard errors) reported for the main results/outcomes (i.e. those directly addressing the study question/objective upon which the conclusions are based)? 73.9 / 23.9 / 2.2
Controlled for confounding? 41.3 / 50 / 8.7
Results reported in sufficient detail? 82.6 / 15.2 / 2.2
Do the results support the conclusions? 84.8 / 15.2 / 0
Table 3 Summary of meta-analysis results, including number of domains/studies, mean effect size, 95 % confidence interval (CI) and heterogeneity (τ² and its p value)
Self-completion: paper versus electronic — Domain level: 181 domains; mean effect size 0.01 (95 % CI -0.01, 0.03); τ² = 0.002 (p = 0.02). Study level: 31 studies; 0.01 (-0.02, 0.04); τ² = 0.002 (p = 0.09).
Self-completion versus assisted completion — Domain level: 90 domains; 0.04 (-0.01, 0.08); τ² = 0.030 (p < 0.001). Study level: 14 studies; 0.01 (-0.10, 0.12); τ² = 0.026 (p = 0.02).
Interview: face-to-face versus tele-assisted — Domain level: 4 domains; 0.005 (-0.39, 0.39); τ² = 0 (p –). Study level: 3 studies; 0.005 (-0.52, 0.52); τ² = 0 (p –).

Face-to-face interview versus tele-assisted interview

We found no significant difference (0.005) in mean scores for comparisons between face-to-face and tele-assisted interview at the domain level (Table 3). We found no evidence of heterogeneity (τ² = 0), nor did we find significant bias (0.005) or heterogeneity (τ² = 0) at the study level.

Response rates and mode preference

Although not of primary interest, nine studies contributed information about response rates for different MOAs. Two studies found higher response rates for paper completion (range 61–89 %) compared to electronic MOA (range 18–39 % of total sample) [16, 17], three studies found response rates in favour of electronic MOA (range 39–90 %) compared to paper (range 22–73 %) [9, 18, 19], and four studies found no difference in response rates between modes [20–23]. Five studies contributed some information about participants' preference for MOA (Table 5). In three studies, participants preferred paper when compared to electronic MOA [9, 16, 17]. In one study, participants preferred electronic MOA (e.g. web link, computer) when compared to paper MOA [19] and in another study, participants preferred paper and interactive voice MOA compared to paper alone [23].

Fig. 2 Forest plot (standardised mean difference and 95 % CI) for self-complete electronic versus pen–paper administration at the study level. Negative effect sizes favour electronic.

Table 4 Unstandardised regression parameter estimates (and p values) from meta-regression analyses
Comparison / Predictor: Construct(a) (Other, Social, Psych); Setting(b) (CC, CH, HC); Randomisation(c)
-0.04 (0.06); -0.02 (0.70); -0.16 (<0.01); 0.06 (0.69)
Self-completion: paper versus electronic, Domain level: -0.04 (0.39); -0.04 (0.44); -0.06 (0.03)
Self-completion versus assisted completion, Domain level: -0.04 (0.51); 0.08 (0.44); 0.05 (0.43)
Study level: –; –; –
–; 0.07 (0.09); -0.21 (<0.01); -0.24 (0.03); 0.14 (0.04); 0.08 (0.57); –; 0.03 (0.81)
Only comparisons exhibiting heterogeneity in meta-analyses were included in meta-regressions. The estimates represent the covariate-adjusted mean differences between the variable level listed in the column header and the reference level, e.g. the difference between physical (reference) and psychological constructs is -0.06.
(a) Reference group = physical. (b) CC, clinic–clinic; CH, clinic (assisted)–home (self); HC, home (assisted)–clinic (self); reference group = home–home. (c) Reference group = randomised.

Fig. 3 Forest plot (standardised mean difference and 95 % CI) for self-completion versus assisted completion at the study level. Negative effect sizes favour assisted completion.
Table 5 Preference for mode of administration (% of participants preferring each mode)
Kongsved [16]: paper 55.4; electronic 32.4
Reichmann [17]: paper 61; electronic 39
Shea [23]: paper 22; interactive voice/paper 78
Smith [9]: paper 57; electronic 43
Wijndaele [19]: paper 21.6; electronic 39.2; no preference 39.2

Discussion

The results summarised here provide strong evidence that there is no bias attributable to MOA in PRO data collection. The effect sizes (standardised mean differences) for the paper versus electronic self-completion comparison were small and not statistically significant, consistent with other reviews of studies comparing paper- and electronic-completion of PROs [2, 10]. However, we found evidence of heterogeneity in effect sizes, which was explained mostly by construct, such that mean bias was larger for physical outcomes (e.g. physical functioning, role functioning) than for psychological (e.g. anxiety, distress) outcomes. However, in an absolute sense, this bias was small for both types of construct.

Our findings have important practical implications. First, they legitimize the use of mixed MOA in studies. For example, participants may complete PROs electronically in the clinic and subsequently paper versions at home, or vice versa. Patients can be offered MOA based on their preference, which may in turn increase response rates and subsequently improve data quality. Our findings also suggest researchers can safely transfer validated, paper-based PRO measures to electronic interfaces, assuming that the item wording, response options and layout remain the same. Any changes to these aspects should be evaluated through cognitive interviewing to ensure that respondents are interpreting the items as intended and responding equivalently across different MOA [1].

While our findings suggest that studies using a mix of self-completion and assisted completion will generally produce equivalent scores, some caution may be required if setting (home vs. clinic) changes as well as self-completion versus assisted completion, particularly when assisted completion occurs in the home and self-completion in the clinic. However, the small number of studies in each category means this may be a chance finding rather than a true interaction of setting by MOA. Until we have more robust evidence of the relationship between self-completion versus assisted completion and setting, it may be safest to alter only one of these two aspects of administration within a study.

The studies included in our review were published in the last decade, during which time social media and the emergence of online and in-person support groups have increased, as has patients' familiarity with PRO completion. Improved patient–clinician communication and shared decision-making have been encouraged and promoted, and clinicians are expected to better communicate with their patients and to consider PROs for individual patient care. PROs are increasingly collected routinely in clinical practice, thus patients may be more familiar with completing them. Consequently, social acquiescence bias may be less problematic than previously, as people may be more comfortable with sharing and discussing their feelings and are aware that honest reporting of their health outcomes may positively impact their care. Interestingly, of the 14 studies we identified that compared self-completion and assisted completion, 11 included patients with a medical or mental health diagnosis attending clinic, rather than a general population or student sample (which were relatively common in the other MOA comparisons we meta-analysed).
Preference for mode was not of primary interest, and therefore, we did not conduct a comprehensive search for studies designed to assess MOA preference. However, consistent with Shih and Fan [24], the few studies included in our review that reported information about participants' preference for mode found mixed results; four studies found participants preferred paper when compared to electronic MOA, while in two studies, participants preferred electronic over paper MOA. Having detected no bias between MOA, participants can select MOA based on their preference, which may increase response rates and reduce missing data. Similarly, there was no systematic difference in response rates; two studies found response rates in favour of paper compared to electronic MOA, three studies in favour of electronic MOA compared to paper, and four found no difference. These mixed findings are consistent with others [11]; no one mode produces consistently higher response rates. A possible association between MOA preference and response rate should be tested in future research. Other factors not explored in this review but that may play a role and should be explored are age, computer literacy and level of health status.

Caveats, recommendations and future directions

An important caveat is that our findings apply to research studies but not necessarily to use of PRO measures in clinic to manage individual patients, because the evidence base does not yield any information about the extent to which an individual's scores would differ between two MOA. A combination of electronic and paper self-completion can be used in the clinic and at home without likelihood of introducing any bias due to MOA.

Future mode comparison studies are required, but these need to be experimentally designed to enable the direct assessment of the impact of some of the mediators of MOA effects on data quality. Specifically, these include the method of contacting respondents (e.g. by post, in person); sensory input channel (e.g. visual vs. aural); and response mode (e.g. handwritten, keyboard, touchscreen, telephone). Due to the lack of data presented in papers, the impact of these mediators of MOA effects could not be determined. MOA effects may also differ between populations (e.g. general population vs. diseased); however, the heterogeneity in our data did not allow us to examine this variable, but it is worthy of future research. An earlier review exploring self- versus interviewer-administered MOA found that MOA did not have an effect on repeated PRO assessments [12], whereas another found such an effect [7]. The majority of studies in our meta-analysis compared MOA at one time point rather than longitudinally across multiple time points. Further comparative studies of MOA should consider multiple variables, including equivalence of mode in individuals rather than at group level; longitudinal comparisons to assess the impact of mixed MOA over time in terms of equivalence in scores and the quality of response (non-response bias; validity, reliability and distribution of responses); and setting. Rather than more studies designed to describe bias as a function of MOA, what is needed is further consideration of study design features that may explain any bias, such as setting.

Conclusions

A combination of electronic and paper self-completion and assisted completion methods can be used in the clinic and at home, based on patient preference for mode.
To fill the evidence gap and increase our understanding of the impact of MOA on PROs, further experimental comparative studies are needed to assess the mediators of mode effects, measurement equivalence and reliability of assessment for individuals rather than groups, and the impact of setting and mixed MOA over time.

Acknowledgments We thank HR and LG for their contribution to this project as volunteers. We also acknowledge the support from our faculty librarian, Matthew Davis, in developing the search strategy.

Funding C.R., D.C., R.M.B. and M.K. are supported by the Australian Government through Cancer Australia. No additional funding was sought for this review.

Compliance with ethical standards

Conflict of interest All authors declare that they have no conflict of interest.

Ethical approval This article is a secondary analysis of published literature. It does not contain any studies with human participants or animals performed by any of the authors.

References

1. Food and Drug Administration. (2009). Patient reported outcome measures: Use in medical product development to support labelling claims. MD: US Department of Health & Human Services, Food & Drug Administration.
2. Hood, K., Robling, M., Ingledew, D., Gillespie, D., Greene, G., Ivins, R., et al. (2012). Mode of data elicitation, acquisition and response to surveys: A systematic review. Health Technology Assessment, 16(27), 1–162.
3. Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879–903.
4. Basch, E., Abernethy, A. P., Mullins, C. D., Reeve, B. B., Smith, M. L., Coons, S. J., et al. (2012). Recommendations for incorporating patient-reported outcomes into clinical comparative effectiveness research in adult oncology. Journal of Clinical Oncology, 30(34), 4249–4255.
5. Stukenborg, G. J., Blackhall, L., Harrison, J., Barclay, J. S., Dillon, P., Davis, M. A., et al. (2014). Cancer patient-reported outcomes assessment using wireless touch screen tablet computers. Quality of Life Research, 23(5), 1603–1607.
6. Scientific Advisory Committee of the Medical Outcomes Trust. (2002). Assessing health status and quality-of-life instruments: Attributes and review criteria. Quality of Life Research, 11(3), 193–205.
7. Bowling, A. (2005). Mode of questionnaire administration can have serious effects on data quality. Journal of Public Health, 27(3), 281–291.
8. Bernhard, J., Cella, D. F., Coates, A. S., Fallowfield, L., Ganz, P. A., Moinpour, C. M., et al. (1998). Missing quality of life data in cancer clinical trials: Serious problems and challenges. Statistics in Medicine, 17(5–7), 517–532.
9. Smith, A. B., King, M., Butow, P., & Olver, I. (2013). A comparison of data quality and practicality of online versus postal questionnaires in a sample of testicular cancer survivors. Psycho-Oncology, 22(1), 233–237.
10. Gwaltney, C. J., Shields, A. L., & Shiffman, S. (2008). Equivalence of electronic and paper-and-pencil administration of patient-reported outcome measures: A meta-analytic review. Value in Health, 11(2), 322–333.
11. McColl, E., Jacoby, A., Thomas, L., Soutter, J., Bamford, C., Steen, N., et al. (2001). Design and use of questionnaires: A review of best practice applicable to surveys of health service staff and patients. Health Technology Assessment, 5(31), 1–256.
12. Puhan, M. A., Ahuja, A., Van Natta, M. L., Ackatz, L. E., & Meinert, C. (2011). Interviewer versus self-administered health-related quality of life questionnaires—Does it matter? Health and Quality of Life Outcomes. doi:10.1186/1477-7525-1189-1130.
13. Kmet, L., Lee, R., & Cook, L. (2004). Standard quality assessment criteria for evaluating primary research papers from a variety of fields. Health Technology Assessment, 13, 1–294.
14. Lipsey, M., & Wilson, D. (2001). Practical meta-analysis. Thousand Oaks: Sage.
15. Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Oxford: Wiley.
16. Kongsved, S. M., Basnov, M., Holm-Christensen, K., & Hjollund, N. H. (2007). Response rate and completeness of questionnaires: A randomized study of Internet versus paper-and-pencil versions. Journal of Medical Internet Research, 9(3), e25.
17. Reichmann, W. M., Losina, E., Seage, G. R., Arbelaez, C., Safren, S. A., Katz, J. N., et al. (2010). Does modality of survey administration impact data quality: Audio computer assisted self interview (ACASI) versus self-administered pen and paper? PLoS One, 5(1), e8728.
18. Lannin, N. A., Anderson, C., Lim, J., Paice, K., Price, C., Faux, S., et al. (2013). Telephone follow-up was more expensive but more efficient than postal in a national stroke registry. Journal of Clinical Epidemiology, 66(8), 896–902.
19. Wijndaele, K., Matton, L., Duvigneaud, N., Lefevre, J., Duquet, W., Thomis, M., et al. (2007). Reliability, equivalence and respondent preference of computerized versus paper-and-pencil mental health questionnaires. Computers in Human Behavior, 23(4), 1958–1970.
20. Rodriguez, H. P., von Glahn, T., Rogers, W. H., Chang, H., Fanjiang, G., & Safran, D. G. (2006). Evaluating patients' experiences with individual physicians: A randomized trial of mail, internet, and interactive voice response telephone administration of surveys. Medical Care, 44(2), 167–174.
21. Zuidgeest, M., Hendriks, M., Koopman, L., Spreeuwenberg, P., & Rademakers, J. (2011). A comparison of a postal survey and mixed-mode survey using a questionnaire on patients' experiences with breast care. Journal of Medical Internet Research, 13(3), e68.
22. Rutherford, C., Nixon, J., Brown, J. M., Lamping, D. L., & Cano, S. J. (2014). Using mixed methods to select optimal mode of administration for a patient-reported outcome instrument for people with pressure ulcers. BMC Medical Research Methodology, 14(22), 1471–2288.
23. Shea, J. A., Guerra, C. E., Weiner, J., Aguirre, A. C., Ravenell, K. L., & Asch, D. A. (2008). Adapting a patient satisfaction instrument for low literate and Spanish-speaking populations: Comparison of three formats. Patient Education and Counseling, 73(1), 132–140.
24. Shih, T., & Fan, X. (2007). Response rates and mode preferences in web-mail mixed-mode surveys: A meta-analysis. International Journal of Internet Science, 2, 59–82.
25. Ashley, L., Keding, A., Brown, J., Velikova, G., & Wright, P. (2013). Score equivalence of electronic and paper versions of the Social Difficulties Inventory (SDI-21): A randomised crossover trial in cancer patients. Quality of Life Research, 22(6), 1435–1440.
26. Austin, J., Alvero, A. M., Fuchs, M. M., Patterson, L., & Anger, W. K. (2009). Pre-training to improve workshop performance in supervisor skills: An exploratory study of Latino agricultural workers. Journal of Agricultural Safety and Health, 15(3), 273–281.
27. Bjorner, J. B., Rose, M., Gandek, B., Stone, A. A., Junghaenel, D. U., & Ware, J. E., Jr. (2014). Method of administration of PROMIS scales did not significantly impact score level, reliability, or validity. Journal of Clinical Epidemiology, 67(1), 108–113.
28. Caute, A., Northcott, S., Clarkson, L., Pring, T., & Hilari, K. (2012). Does mode of administration affect health-related quality-of-life outcomes after stroke? International Journal of Speech-Language Pathology, 14(4), 329–337.
29. Cerrada, C. J., Weinberg, J., Sherman, K. J., & Saper, R. B. (2014). Inter-method reliability of paper surveys and computer assisted telephone interviews in a randomized controlled trial of yoga for low back pain. BMC Research Notes, 7, 227. doi:10.1186/1756-0500-7-227.
30. Chang, Y. J., Chang, C. H., Peng, C. L., Wu, H. C., Lin, H. C., Wang, J. Y., et al. (2014). Measurement equivalence and feasibility of the EORTC QLQ-PR25: Paper-and-pencil versus touch-screen administration. Health and Quality of Life Outcomes, 12, 23. doi:10.1186/1477-7525-12-23.
31. Cheung, Y. B., Goh, C., Thumboo, J., Khoo, K. S., & Wee, J. (2006). Quality of life scores differed according to mode of administration in a review of three major oncology questionnaires. Journal of Clinical Epidemiology, 59(2), 185–191.
32. Clayton, J. A., Eydelman, M., Vitale, S., Manukyan, Z., Kramm, R., Datiles, M., et al. (2013). Web-based versus paper administration of common ophthalmic questionnaires: Comparison of subscale scores. Ophthalmology, 120(10), 2151–2159.
33. Coles, M. M., Cook, L. M., & Blake, T. R. (2007). Assessing obsessive compulsive symptoms and cognitions on the internet: Evidence for the comparability of paper and Internet administration. Behaviour Research and Therapy, 45(9), 2232–2240.
34. Collins, F. E., & Jones, K. V. (2004). Investigating dissociation online: Validation of a web-based version of the Dissociative Experiences Scale. Journal of Trauma & Dissociation, 5(1), 133–147.
35. Cook, A. J., Roberts, D. A., Henderson, M. D., Van Winkle, L. C., Chastain, D. C., & Hamill-Ruth, R. J. (2004). Electronic pain questionnaires: A randomized, crossover comparison with paper questionnaires for chronic pain assessment. Pain, 110(1–2), 310–317.
36. Greene, J., Speizer, H., & Wiitala, W. (2008). Telephone and web: Mixed-mode challenge. Health Services Research, 43(1 Pt 1), 230–248.
37. Grieve, R., & de Groot, H. T. (2011). Does online psychological test administration facilitate faking? Computers in Human Behavior, 27(6), 2386–2391.
38. Gundy, C. M., & Aaronson, N. K. (2010). Effects of mode of administration (MOA) on the measurement properties of the EORTC QLQ-C30: A randomized study. Health and Quality of Life Outcomes, 8, 35. doi:10.1186/1477-7525-8-35.
39. Handa, V. L., Barber, M. D., Young, S. B., Aronson, M. P., Morse, A., & Cundiff, G. W. (2008). Paper versus web-based administration of the Pelvic Floor Distress Inventory 20 and Pelvic Floor Impact Questionnaire 7. International Urogynecology Journal, 19(10), 1331–1335.
40. Hauer, K., Yardley, L., Beyer, N., Kempen, G., Dias, N., Campbell, M., et al. (2010). Validation of the Falls Efficacy Scale and Falls Efficacy Scale International in geriatric patients with and without cognitive impairment: Results of self-report and interview-based questionnaires. Gerontology, 56(2), 190–199.
41. Hayes, J., & Grieve, R. (2013). Faked depression: Comparing malingering via the internet, pen-and-paper, and telephone administration modes. Telemedicine and e-Health, 19(9), 714–716.
42. Hedman, E., Ljotsson, B., Ruck, C., Furmark, T., Carlbring, P., Lindefors, N., & Andersson, G. (2010). Internet administration of self-report measures commonly used in research on social anxiety disorder: A psychometric evaluation. Computers in Human Behavior, 26(4), 736–740.
Internet administration of self-report measures commonly used in research on social anxiety disorder: A psychometric evaluation. Computers in Human Behavior, 26(4), 736–740. 43. Hedman, E., Ljotsson, B., Blom, K., Alaoui, S. E., Kraepelien, M., Ruck, C., et al. (2013). Telephone versus internet administration of self-report measures of social anxiety, depressive symptoms, and insomnia: Psychometric evaluation of a method to reduce the impact of missing data. Journal of Medical Internet Research, 15(10), 131–138. 44. Hollandare, F., Andersson, G., & Engstrom, I. (2010). A comparison of psychometric properties between internet and paper versions of two depression instruments (BDI-II and MADRS-S) administered to clinic patients. Journal of Medical Internet Research, 12(5), e49. 45. Kobak, K. A. (2004). A comparison of face-to-face and videoconference administration of the Hamilton Depression Rating Scale. Journal of Telemedicine and Telecare, 10(4), 231–235. 46. Kobak, K. A., Williams, J. B. W., Jeglic, E., Salvucci, D., & Sharp, I. R. (2008). Face-to-face versus remote administration of the Montgomery–Asberg Depression Rating Scale using videoconference and telephone. Depression and Anxiety, 25(11), 913–919. 47. Lall, R., Mistry, D., Bridle, C., & Lamb, S. E. (2012). Telephone interviews can be used to collect follow-up data subsequent to no response to postal questionnaires in clinical trials. Journal of Clinical Epidemiology, 65(1), 90–99. 48. Lundy, J. J., & Coons, S. J. (2011). Measurement equivalence of interactive voice response and paper versions of the EQ-5D in a cancer patient sample. Value in Health, 14(6), 867–871. 49. Lungenhausen, M., Lange, S., Maier, C., Schaub, C., Trampisch, H. J., & Endres. H. G. (2007). Randomised controlled comparison of the Health Survey Short Form (SF-12) and the Graded Chronic Pain Scale (GCPS) in telephone interviews versus self-administered questionnaires. Are the results equivalent? BMC Medical Research Methodology, 7(50). doi:10.1186/1471-2288-7-50. 50. Marceau, L. D., Link, C., Jamison, R. N., & Carolan, S. (2007). Electronic diaries as a tool to improve pain management: Is there any evidence? Pain Medicine, 8(Suppl 3), S101–S109. 51. Matthew, A. G., Currie, K. L., Irvine, J., Ritvo, P., Santa Mina, D., Jamnicky, L., et al. (2007). Serial personal digital assistant data capture of health-related quality of life: A randomized controlled trial in a prostate cancer clinic. Health Qual Life Outcomes, 5, 38. 52. Naus, M. J., Philipp, L. M., & Samsi, M. (2009). From paper to pixels: A comparison of paper and computer formats in psychological assessment. Computers in Human Behavior, 25(1), 1–7. 53. Pinnock, H., Juniper, E. F., & Sheikh, A. (2005). Concordance between supervised and postal administration of the Mini Asthma Quality of Life Questionnaire (MiniAQLQ) and Asthma Control Questionnaire (ACQ) was very high. Journal of Clinical Epidemiology, 58(8), 809–814. 54. Ramachandran, S., Lundy, J. J., & Coons, S. J. (2008). Testing the measurement equivalence of paper and touch-screen versions 123 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. of the EQ-5D visual analog scale (EQ VAS). Quality of Life Research, 17(8), 1117–1120. Reissmann, D. R., John, M. T., & Schierz, O. (2011). Influence of administration method on oral health-related quality of life assessment using the Oral Health Impact Profile. European Journal of Oral Sciences, 119(1), 73–78. Richter, J. G., Becker, A., Koch, T., Nixdorf, M., Willers, R., Monser, R., et al. (2008). 
Self-assessments of patients via Tablet PC in routine patient care: Comparison with standardised paper questionnaires. Annals of the Rheumatic Diseases, 67(12), 1739–1741. Ritter, P., Lorig, K., Laurent, D., & Matthews, K. (2004). Internet versus mailed questionnaires: A randomized comparison. Journal of Medical Internet Research, 6(3), e29. Salaffi, F., Gasparini, S., & Grassi, W. (2009). The use of computer touch-screen technology for the collection of patient-reported outcome data in rheumatoid arthritis: Comparison with standardized paper questionnaires. Clinical and Experimental Rheumatology, 27(3), 459–468. Salaffi, F., Gasparini, S., Ciapetti, A., Gutierrez, M., & Grassi, W. (2013). Usability of an innovative and interactive electronic system for collection of patient-reported data in axial spondyloarthritis: Comparison with the traditional paper-administered format. Rheumatology, 52(11), 2062–2070. Sikorski, A., Given, C. W., Given, B., Jeon, S., & You, M. (2009). Differential symptom reporting by mode of administration of the assessment: Automated voice response system versus a live telephone interview. Medical Care, 47(8), 866–874. Sousa, P. C., Mendes, F. M., Imparato, J. C., & Ardenghi, T. M. (2009). Differences in responses to the Oral Health Impact Profile (OHIP14) used as a questionnaire or in an interview. Pesquisa Odontologica Brasileira—Brazilian Oral Research, 23(4), 358–364. Suris, A., Borman, P. D., Lind, L., & Kashner, T. M. (2007). Aggression, impulsivity, and health functioning in a veteran population: Equivalency and test-retest reliability of computerized and paper-and-pencil administrations. Computers in Human Behavior, 23(1), 97–110. Swartz, R. J., de Moor, C., Cook, K. F., Fouladi, R. T., BasenEngquist, K., Eng, C., & Taylor, C. L. C. (2007). Mode effects in the center for epidemiologic studies depression (CES-D) scale: Personal digital assistant vs. paper and pencil administration. Quality of Life Research, 16(5), 803–813. Tiplady, B., Goodman, K., Cummings, G., Lyle, D., Carrington, R., Battersby, C., & Ralston, S. H. (2010). Patient-reported outcomes in rheumatoid arthritis: Assessing the equivalence of electronic and paper data collection. The Patient: Patient Centered Outcomes Research, 3(3), 133–143. Weiler, K., Christ, A. M., Woodworth, G. G., Weiler, R. L., & Weiler, J. M. (2004). Quality of patient-reported outcome data captured using paper and interactive voice response diaries in an allergic rhinitis study: Is electronic data capture really better? Annals of Allergy, Asthma & Immunology, 92(3), 335–339. Whitehead, L. (2011). Methodological issues in Internet-mediated research: A randomized comparison of internet versus mailed questionnaires. Journal of Medical Internet Research, 13(4), e109. Wu, L. T., Pan, J. J., Blazer, D. G., Tai, B., Brooner, R. K., Stitzer, M. L., et al. (2009). The construct and measurement equivalence of cocaine and opioid dependences: A National Drug Abuse Treatment Clinical Trials Network (CTN) study. Drug and Alcohol Dependence, 103(3), 114–123. Yu, S. C., & Yu, M. N. (2007). Comparison of Internet-based and paper-based questionnaires in Taiwan using multisample invariance approach. Cyberpsychology & Behavior, 10(4), 501–507. Zimmerman, M., & Martinez, J. H. (2012). Web-based assessment of depression in patients treated in clinical practice: Reliability, validity, and patient acceptance. Journal of Clinical Psychiatry, 73(3), 333–338.