RHEUMATOLOGY
Rheumatology 2014;53:1161–1171
doi:10.1093/rheumatology/ket374
Advance Access publication 18 November 2013

Review

Measurement properties of instruments assessing self-efficacy in patients with rheumatic diseases

Andrew M. Garratt1,2, Ida Løchting3, Geir Smedslund1,2 and Kåre B. Hagen1,4

Abstract

The measurement properties of instruments assessing self-efficacy (SE) in patients with rheumatic diseases were reviewed. The consensus-based standards for the selection of health measurement instruments (COSMIN) checklist was applied following systematic searches of seven electronic databases from 1989 to December 2011. Fifteen articles met the inclusion criteria; they covered the arthritis SE scales (ASES), generalized SE scale (GSES), joint protection SE scale (JP-SES), Marcus & Resnick SE exercise behaviour (SEEB) instruments, and RA SE scale (RASE). The ASES and RASE have undergone more than one evaluation. There was little formal evaluation of content validity for the instruments. Evidence for the RASE suggests that it is not unidimensional. The JP-SES and SEEB were evaluated using modern psychometric methods. The instruments require further evaluation before application. The quality of the evidence for the ASES and RASE is generally poor. The generic focus of the GSES limits its relevance. The JP-SES and SEEB have each undergone only one evaluation, and that of the SEEB was narrow in scope. Future studies should address these methodological weaknesses.

Key words: arthritis self-efficacy scales, COSMIN, questionnaire, reliability, self-efficacy, survey, validity.

Introduction

Self-efficacy (SE) is a psychosocial variable that has gained importance within rheumatology over the past 20 years [1]. It has been defined as a patient’s belief that they have the ability to perform a task to achieve a desired outcome [2].
SE has been found to be associated with health behaviours, including physical activity and treatment compliance, and with mental and physical aspects of health status and health outcomes [1, 3–6]. Interventions to increase SE, including patient education and self-management courses, have been developed and widely implemented within rheumatology with the aim of improving patient outcomes more generally.

Several instruments have been developed to assess SE within rheumatology and these have been used to assess the outcomes of these interventions more widely [1, 3]. The instruments differ in content and focus and it is not clear which are the most appropriate for application. Moreover, following a systematic review of evidence for an association between SE and physical health in RA, it was concluded that the use of different instruments leads to problems of comparability [3]. Studies included in the review often failed to report the measurement properties of instruments and a more consistent use of instruments was deemed desirable. Another systematic review relating to chronic diseases concluded that the development and evaluation of the majority of SE instruments had major limitations [7]. Reviews of patient-reported outcome (PRO) instruments have been criticized for lacking methodological quality and the introduction of guidelines has been recommended [8].

1National Resource Centre for Rehabilitation in Rheumatology, Diakonhjemmet Hospital, 2Norwegian Knowledge Centre for the Health Services, 3Communication and Research Unit for Musculoskeletal Disorders (FORMI), Oslo University Hospital and 4Institute of Health and Society, Faculty of Medicine, University of Oslo, Oslo, Norway.
Submitted 21 March 2013; revised version accepted 25 September 2013.
Correspondence to: Andrew M. Garratt, National Resource Centre for Rehabilitation in Rheumatology, Diakonhjemmet Hospital, PO Box 23 Vinderen, 0319 Oslo, Norway. E-mail: [email protected]
Existing reviews of SE instruments have identified a number of instruments, but they have not focused on rheumatic disease [7, 9] or have not included the search strategies and inclusion criteria on which they are based [1, 10]. Hence there is uncertainty regarding what instruments are available for patients with rheumatic disease and whether they are appropriate, reliable and valid in this population. A structured review of the measurement properties of instruments that have undergone evaluation in these patients will inform the selection of SE instruments for future applications in rheumatology [11]. Moreover, the findings of such a review can identify important areas that have been neglected and that future research can address, including the testing of specific measurement properties. The review that follows assesses available evidence for the SE instruments that have been developed and evaluated in patients with rheumatic disease. The review follows the consensus-based standards for the selection of health measurement instruments (COSMIN) checklist for assessing the methodological quality of studies, which was based on an international Delphi panel of 43 experts in the field of PROs [12].

© The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: [email protected]

Methods

Search strategy

Electronic searches were used to retrieve articles that included the development and/or evaluation of an instrument designed to measure SE in patients with rheumatic diseases. Search terms included all rheumatic diseases AND SE/self-concept AND the terms relevant to measurement, including instrument, measure, reliability, validity and questionnaire. Searches were limited to articles in English or the Scandinavian languages.
The databases included AMED, British Nursing Index, Cinahl, Embase, Medline, PsycINFO and Svemed+ for the years 1989, when the arthritis SE scales (ASES) [13] were first published, to December 2011. Searches were then conducted using the names of identified instruments in PubMed. Three researchers (A.G., I.L., G.S.) independently assessed the titles and abstracts of a random selection of 50 records against the inclusion criteria of the development and/or evaluation of measurement properties of SE instruments in patients with rheumatic diseases. This included translations and cross-cultural evaluations of existing instruments. The reviewers then discussed their findings to ensure that they were applying the criteria correctly before assessing the remainder. One reviewer (A.G.) assessed all of the records and the other two each assessed half. The findings were pooled, consensus achieved and articles meeting the inclusion criteria retrieved. Other potentially relevant articles were retrieved from reference lists.

Data extraction

Data extraction followed the COSMIN checklist for PRO instrument measurement properties [12, 14]. Two reviewers (A.G., G.S.) independently extracted the information using a data extraction sheet after comparing their results for two articles to assess consistency. The COSMIN checklist includes 10 boxes relating to the methodological quality of the studies describing the development and evaluation of PROs based on an international Delphi study [12]. The boxes comprise 5 to 18 items relating to internal consistency, reliability (relative measures including test–retest), measurement error, content validity (including face validity), structural validity (factor analysis), hypothesis testing, cross-cultural validity, criterion validity, responsiveness and interpretability. Two additional boxes have items relating to item response theory and generalizability of results. It is also possible to calculate a methodological quality score where each item is rated on a four-point scale of poor, fair, good and excellent [14]. The overall score per box is determined by the item with the lowest score, hence a poor score on any item represents a fatal flaw. COSMIN does not take account of the findings of the study. Thus, as recommended by the COSMIN Group [12], an overall quality rating that is designed to evaluate and compare the quality of instruments [15], and which has been used in previous systematic reviews [16–18], was also included. The ratings of positive (+), indeterminate (?), negative (−) or no information available (0) relate to floor and ceiling effects, internal consistency, interpretability, qualitative and quantitative aspects of validity, reproducibility and responsiveness [15].

Results

Search strategy

After removal of duplicates there were 1300 articles, of which 15 met the inclusion criteria. The articles related to six SE instruments that had been evaluated in patients with rheumatic disease. Two studies of education interventions from the Netherlands [19] and Turkey [20] that included the ASES were excluded because the methods of translation were not reported and testing for measurement properties was limited. Both studies reported Cronbach’s α. The Turkish study used a modified ASES with four scales and reported test–retest reliability, but only a single value was given for test–retest reliability and for Cronbach’s α [20]. Searches based on instrument names did not give any additional articles that met the inclusion criteria.

Table 1 shows that several studies included more than one population. Two instruments were evaluated in different countries and languages [21–29], but there were no cross-cultural comparisons of measurement properties and equivalence [12] and the COSMIN ratings that follow relate to the translation procedures only. Where stated, the instruments were all self-administered with the exception of one study [23].

Arthritis SE scales

The ASES was originally developed in the USA to measure perceived SE to cope with the consequences of arthritis and to understand change processes relating to patient education programmes and rehabilitation outcomes [13]. The content of the ASES was developed by a rheumatologist based on the 1981 Conference on Outcome Measures, with further items being developed following focus groups with patients [13]. The original ASES comprises 20 items contributing to three scales of SE related to physical function (9 items), other symptoms (6 items) and pain (5 items) (Table 2). The items ask if patients think they can perform a given level of activity and to rate their confidence in that judgement. Items are scaled on a 10-point (10–100), end- and midpoint-anchored scale.

USA, two community based RCTs of physical activity programs in adult arthritis Switzerland (German), rheumatologists, consecutive Clinical quality management project UK, outpatients, consecutive Outpatients, not undergoing admission/ASMC, consecutive Routine self-management groups UK, education programme with normal care Denmark, outpatients, consecutive Outpatients in educational sessions, consecutive UK, arthritis care, study targeting those >50 years old ASMC attendees recruited through arthritis care, rheumatology clinics, general practice ASMC in adult education setting UK, secondary care rheumatology centres, random selection Germany, hospital centre for rheumatology Outpatients, interdisciplinary group therapy, consecutive USA/Venezuela (Spanish) Hispanics, rheumatology clinics, interview Senior citizen and Hispanic service centres, interview/self-administration USA (English, Spanish), community health centre and enrolled in NIAMS Natural History of Rheumatic Disease in Minority Communities protocol, consecutive USA, RCT of arthritis self-management course (ASMC) Post-ASMC patients Sweden, pain clinic outpatients,
consecutive Sweden, rheumatology outpatients being discharged or in regular contact Sweden, RCT of FM self-management and physical training UK, ASMC ASMC in an adult education setting Older people with arthritis project Country (language)a,b, setting, selection 595g 101 (88.6) 126 (72.0) (82.2) (74.2) (77.4) (86.5) (90.0) — 58.6 (13.7) 59.0 (12.9) 61.8 (11.5) 62.3 (10.7) 55.4 (12.4) 56.2 (11.6) 60.5 (13.8) 65 (9.6) 56.1 (11.9) 66 (82.9) 88 23 48 128 72 106 68.4 (8.0) 56.0 (12.5) 50.8 (12.2) 57.7 (6.4) 51.4 (9.4) 51.3 (13.2) 80 (82.1) 79 (72.4) 612 (64) 148 (60.4) 43 109 62.5 (21.9) 151 (8.3) (12.5) (11.9) (8.0) 52.8 (14.4) 46.5 56.0 56.1 68.4 64.7 (12.9) 63.7 (12.9) 37.5 52.6 Mean age (S.D.), years 272 99 145 (72.4) 66 (82.9) 80 (82.1) 97 144 25 24 n (response rate %) — 77.2 73.8 79.5 69.6 68.8 80 83.3 67.0 79 91 81 28.4 92 100 75.2 84.1 88.6 100 81 79 91 86 80 68 75 Female, % RA RA RA RA RA RA — 100 RA 100 RA 100 100 100 100 100 100 30 RA, 38 OA, 32 other 29 RA, 38 OA, 33 other 31 RA, 44 OA, 25 other 100 AS 100 FM 100 FM — 71 OA, 18 RA, 2 SLE, 9 other 13 OA, 73 RA, 10 SLE, 4 other 54 RA, 21 arthritis, 17 FM, 4 back/knee paind 100 FM 48 RA, 46 OA, 36 other 42 RA, 53 OA, 26 other 46 RA, 60 OA, 53 other 56 OA, 15 RA, 29 other arthritis 58 OA, 22 RA, 20 other arthritis Disease(s)c, % — 12.8 (8.3) 13.0 (9.3) 10.5 (239) 17.6 (10.6) 20.5 (13.1) 13.1 (10.1) 1 (0.547)f 13.3 (10.0) 25.3 (16.2) 18.1 (13.3) 17.3 (11.7) 11.8 (8.3) 5.6 (3.7) — — — 7.5 18.1 (13.3) 13.3 (10.0) 25.3 (16.2) — — 5.20 10.25 Mean disease duration (S.D./range), years RCT: randomized controlled trial; NIAMS: National Institute of Arthritis and Musculoskeletal and Skin Diseases. aGiven when the instrument was administered in languages other than that of the main language of country. bAll instruments were self-administered unless stated. cIn some studies patients had more than one disease and hence percentages do not add up to 100. 
(d) Percentages do not add to 100 due to rounding errors (Lomi, 1992) or because patients had multiple diagnoses (Barlow, 1997). (e) The wording is specific to FM and AS. (f) Median (interquartile range). (g) A total of 595 (85.0%) of the trial participants had complete data for the two instruments.

TABLE 1 Generalizability adapted from COSMIN: populations in which instruments were evaluated

Instrument/Author:
ASES: Lorig et al., 1989 [13]; Lomi, 1992 [21]; Lomi et al., 1995 [24]; Barlow et al., 1997 [25]
ASES-8: Gonzalez et al., 1995 [23]; Wallen et al., 2011 [28]
ASES-8 fibromyalgia(e): Mueller et al., 2003 [26]
ASES-8 AS(e): Sandhu et al., 2010 [27]
GSES: Barlow et al., 1996 [31]
RASE: Hewlett et al., 2001 [32]; Hewlett et al., 2008 [33]; Primdahl et al., 2010 [29]
JP-SES: Niedermann et al., 2011 [34]
Marcus SEEB/Resnick SEEB: Mielenz et al., 2011 [35]

TABLE 2 Instrument content

| Instrument/dimensions (items) | Item formulation/timeframe | Scaling |
|---|---|---|
| ASES: other symptoms (6), pain (5), physical function (9) | ‘How certain are you that you can . . .’ followed by tasks/now | 10-point numerical rating scale with end- and midpoint descriptors of very uncertain, moderately uncertain and very certain. The midpoint lies between two scale points(a) |
| ASES-8: unidimensional (8) | ‘How certain(b) are you that you can . . .’ followed by tasks/now | 10-point numerical rating scale with endpoint-only descriptors: very uncertain and very certain |
| GSES: unidimensional (10) | Item specific/none | 4-point: not at all true, barely true, moderately true, exactly true |
| RASE: unidimensional (28) | ‘I believe I could . . .’ followed by tasks/none | 5-point: strongly disagree, disagree, neither, agree, strongly agree |
| JP-SES: unidimensional (10) | ‘I am confident that I care for my joints, (even) when I . . .’/none | 4-point: not at all confident, a little confident, quite confident, very confident |
| Marcus SEEB: unidimensional (5) | ‘I am confident that I can participate in regular exercise when . . .’/none | 5-point: not at all confident, somewhat confident, moderately confident, very confident, completely confident |
| Resnick SEEB: unidimensional (9) | ‘How confident are you right now that you could exercise three times per week for 20 min if . . .’/now | 10-point numerical rating scale with endpoint-only descriptors: not confident and very confident(c) |

(a) The Swedish version goes in the opposite direction from very certain to very uncertain with moderately certain corresponding to the midpoint of the scale [21]. (b) ‘How confident . . .’ is used for two items. (c) The original article describing the development of the instrument has an 11-point scale from 0 to 10 [37].

This version has been evaluated in patients in the USA and Sweden (Table 2) that included patients with OA, RA and other rheumatic diseases [13, 21, 22] and FM [24]. The Swedish version underwent forward–backwards translation before evaluation [21]. The two scales of other symptoms and pain (ASES-11) were evaluated in UK patients with OA and RA [25]. This study used different scaling, with the 10-point scale being changed to 1–10 without a midpoint. The same scaling was used with the ASES-8, whose items were derived from the other symptoms and pain scales, with two additional items based on the experiences of the authors; the items sum to a single score [23]. The ASES-8 was developed in the USA and forward–backwards translated to Spanish and evaluated in Hispanic patients in the USA and Venezuela [23]. FM- and AS-specific versions of the ASES-8 were subsequently evaluated in Germany [26] and the UK [27], respectively.
The FM-specific ASES-8 underwent forward–backwards translation in the German study [26]. English and Spanish versions have been evaluated in urban Hispanic and African American patients with rheumatic disease [28].

Generalized SE scale

The generalized SE scale (GSES) is a generic measure of SE that was developed in Germany to assess perceived coping ability across a wide range of demanding situations [30]. The 10 items have a four-point descriptive scale and sum to a single score. The GSES has undergone one evaluation in patients with rheumatic diseases in the UK [31].

RA SE scale

The RA SE scale (RASE) was developed in the UK to assess SE for self-management and is specific to RA [32, 33]. Content development was based on interviews with 19 health professionals who identified common problems experienced by RA patients and associated self-management strategies. The former were then grouped into problem themes and 17 RA patients were asked to describe how they dealt with the problems, giving further items [32]. The final RASE comprises 28 items with a five-point descriptive scale that sum to a single score (Table 2). The instrument was evaluated in outpatients and self-management group attendees. The RASE has also been evaluated in Denmark following forward–backwards translation [29].

Joint protection SE scale

The joint protection SE scale (JP-SES) was developed in Switzerland and was designed to assess the perceived ability of patients with RA to use methods of joint protection [34]. Content is based on existing instruments, experiences and consensus between five occupational therapists and interviews with 10 RA patients who assessed item relevance and suggested additional items. The final JP-SES includes 10 items with a four-point descriptive scale that sum to a single score (Table 2).
It was evaluated in patients recruited by rheumatologists and participants of a clinical quality management project [34].

SE scale for exercise behaviour

The Marcus SE scale for exercise behaviour (SEEB) and Resnick SEEB instruments were evaluated in the same study, the first within rheumatology [35], following their development in working adults [36] and older adults [37], respectively. The Marcus has five items with a five-point descriptive scale. The Resnick has nine items with a 10-point numerical rating scale with end-only descriptors. The items within both instruments sum to a single score. The instruments were evaluated in arthritis patients taking part in community-based randomized trials of physical activity programs [35].

Measurement properties

The information extracted from the articles relating to the COSMIN criteria is shown in Tables 3 and 4, and supplementary Table S1, available at Rheumatology Online, and summarized in Table 5. Table 6 gives a summary of the assessment of measurement properties that includes the results of testing [15]. The COSMIN criteria of measurement error and criterion validity are not included in Table 5 because they were not evaluated for any instrument. Just two studies relating to the JP-SES [34] and the two SEEB [35] instruments used modern psychometric methods that were of good methodological quality according to COSMIN (Table 5). The tables do not include the COSMIN box of interpretability because very little information was available from studies on data quality and scale scores and none of the studies gave the minimal important change (MIC) or minimal important difference (MID).

Levels of Cronbach’s α were generally acceptable for all studies (Table 3).
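For readers less familiar with the statistic, Cronbach’s α is conventionally computed from the variances of the individual items and the variance of the summed score, α = k/(k − 1) × (1 − Σ item variances / variance of total score), where k is the number of items. The Python sketch below is a minimal illustration only: it assumes complete data with no missing responses, and the function name and the example responses are hypothetical and are not taken from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for complete data.

    responses: list of rows, one row per respondent, one column per item.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) and of the summed score (row totals).
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("summed scores do not vary; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: four respondents, three items on a 10-100 scale.
example_responses = [
    [80, 70, 90],
    [40, 50, 30],
    [60, 60, 70],
    [20, 30, 20],
]
print(round(cronbach_alpha(example_responses), 2))
```

Because α is computed on a summed score, it is only interpretable when the items form a single dimension, which is why the text that follows does not comment on internal consistency for a scale whose unidimensionality lacks supporting evidence.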
However, for the ASES and its variants, the methodological quality was often found to be poor (Table 5) because structural validity was not assessed [24, 27, 28] or the sample sizes were small [13, 21, 24]. The RASE is treated as a unidimensional instrument, but there is strong evidence to the contrary [32, 33]; the other aspects of quantitative testing, including internal consistency, are based on a unidimensional scale that lacks supporting evidence and hence these are not commented on further.

Test–retest estimates for the ASES were generally acceptable (Table 3). However, problems with design, including lengthy time intervals [13, 21, 26] and the use of Pearson’s rather than intraclass correlation [13, 21, 23], resulted in fair ratings for the original ASES and poor ratings for the ASES-8. Just one of the ASES studies [27] had a COSMIN rating of good or excellent and an overall quality assessment of positive for reliability (Tables 5 and 6).

TABLE 3 Internal consistency and reliability

Test–retest Internal consistency/Cronbach’s α Instrument/author ASES Lorig et al., 1989 [13] Lomi, 1992 [21] Lomi et al., 1995 [24] Barlow et al., 1997 [25] ASES-8 Gonzalez et al., 1995 [23] Mueller et al., 2003 [26] Sandhu et al., 2010 [27] Wallen et al., 2011 [28] GSES Barlow et al., 1996 [31] RASE Hewlett et al., 2001 [32] Hewlett et al., 2008 [33] Primdahl et al., 2010 [29] JP-SES Niedermann et al., 2011 [34] Interval in days Pearson’s r (unless stated) 9.4 (range 2–29) – Other 0.90, pain 0.87, physical 0.85 Other 0.75, pain 0.94, physical 0.87a – – – 0.88–0.96 0.90 0.93 10–14 56 (range 39–73) 14 0.87 – 0.69 ICC 0.51 ICC 0.77 (95% CI 0.72, 0.81)b ICC 0.78 (95% CI 0.73, 0.82)c – 0.88–0.91 124 0.63 0.89 0.91 7 and 28 – 7 0.69–0.89 – ICC 0.88 0.92 21 Spearman’s rank 0.78 Other symptoms 0.87, pain 0.75–0.76, physical 0.89–0.90 Other symptoms 0.82–0.92, pain 0.90–0.92, physical 0.93–0.94a Other symptoms 0.90, pain 0.90, physical 0.89 Other symptoms 0.89–0.91, pain 0.82–0.85 21 ICC: intraclass
correlation coefficient. aRheumatology patients only. bPatients reporting no change in AS-specific health. Patients reporting no change in general health. c www.rheumatology.oxfordjournals.org 1165 1166 Hypothesis — — FA and confirmatory FA (i) Association with ASES pain only (ii) No association with disease activity and disability No hypotheses Association with health status ASES-8 more predictive of German Pain Coping Questionnaire scores than GSES, locus of control, optimism/pessimism scale scores (i) Moderate (0.40.5) correlations: BASDI, BASFI, EASI-QOL (physical, social, disease activity), SF-36 (general health, physical, pain, role-physical, social, vitality) (ii) Moderate to high correlations (>0.5): EASI-QOL emotional well-being, Hospital Anxiety and Depression Scale, Pain NRS and SF-36 (mental health, role-emotional) (iii) ASES associated with education level and employment status (i) Associations with baseline and future health status (ii) Modest association with GSES Association with health status (i) Baseline SE associated with health status at 0, 1 and 2 years (ii) Change in SE associated with change in health/quality of life (i) Association with health status at 0 and 4 months (ii) Association with improved SE/health status (iii) Moderate correlation with home task performance PCA — PCA indicated several factors PCA — PCA PCA: pain item 2 loaded on other symptoms scale; other symptoms item 7 loaded on pain scale PCA: other symptoms; item 2 had higher loading on pain scale — FA: pain item 2 loaded on other symptoms scale; physical item 7 loaded on pain scale PCA Structural validity Correlations/resultsa — — — (i) ASES: function 0.21 ns, other 0.56, pain 0.44 (ii) DAS28 0.26 ns; HAQ 0.15 ns Centre for Epidemiological StudiesDepression Scale, HAQ, Fatigue VAS, Medical Outcomes Study Health Distress and Positive Affect scales, Pain VAS, Social Support Survey 0.17 ns0.52 (i) 0.560.69 (ii) 0.420.70 (iii) Mean ASES scores: higher education 5.89, lower 
education 5.30; employed 6.07, unemployed 4.84 ASES-8 was the most important independent variable in multiple regression analysesc Disease duration, pain intensity, pain frequency 0.05 ns0.38b (i) Beck Depression Index, Fibromyalgia Attitudes Index, Fibromyalgia Impact Questionnaire, Quality of Life Scale, McGill Pain Questionnaire, Walking Performance 0.18 ns0.59 (ii) Fibromyalgia Impact Questionnaire 0.370.50 Not all data reported (i) Centre for Epidemiological Studies Depression Scale, HAQ, Fatigue VAS, Medical Outcomes Study Health Distress and Positive Effect scales, Pain VAS, Social Support (ii) GSES 0.22 ns0.45 (i) Beck Depression Scale, HAQ, Pain VAS: 0.210.76 (ii) Beck Depression Scale, HAQ, Pain VAS: 0.290.41 (iii) Home task performance 0.61 Hypothesis testing PCA: principal component analysis; ns: not significant; VAS: visual analogue scale; NRS: numerical rating scale; FA: factor analysis; EASI-QOL: Evaluation of AS Quality of Life; SF-36: 36-item Short Form Health Survey; DAS28: 28-joint Disease Activity Score. aNegative coefficients have been made positive for ease of understanding. All results are statistically significant (P < 0.05 or higher) unless stated. bRheumatology patients only. cStudies that used multiple regression analysis are included if there were specific hypotheses. JP-SES Niedermann et al., 2011 [34] Marcus and Resnick SEEB Mielenz et al. 2011 [35] RASE Hewlett et al., 2001 [32], Hewlett et al. [33], 2008 Primdahl et al., 2010 [29] GSES Barlow et al., 1996 [31] Sandhu et al., 2011 [27] ASES-8 Mueller et al., 2003 [26] Barlow et al., 1997 [25] Lomi, 1992 [21] Lomi et al., 1995 [24] Lomi, 1992 [21] ASES Lorig et al., 1989 [13] Instrument/author TABLE 4 Structural validity and hypothesis testing Andrew M. Garratt et al. 
www.rheumatology.oxfordjournals.org Review of self-efficacy instruments TABLE 5 COSMIN checklist ratinga of methodological quality of articles Measurement properties evaluated across the studiesb Instrument/author ASES Lorig et al., 1989 [13] Lomi, 1992 [21] Lomi et al., 1992 [22] Lomi et al., 1995 [24] Barlow et al., 1997 [25] ASES-8 Gonzalez et al., 1995 [23] Mueller et al., 2003 [26] Sandhu et al., 2011 [27] Wallen et al., 2011 [28] GSES Barlow et al., 1996 [31] RASE Hewlett et al., 2001 [32] Hewlett et al., 2008 [33] Primdahl et al., 2010 [29] JP-SES Niedermann et al., 2011 [34] Marcus SEEB Mielenz et al., 2011 [35] Resnick SEEB Mielenz et al., 2011 [35] IRT models Internal consistency Reliability Poor Poor Fair Fair Content validity Poor Fair Structural validity Hypothesis testing Poor Poor Fair Fair Fair Good Poor Fair Poor Poor Poor Poor Excellent Fair Fair Fair Fair Poor Fair Fair Poor Good Poor Fair Fair Poor Poor Poor Fair Good Good Fair Poor Good Good Good Good Good Good Fair Good Poor Fair Poor Poor Translationc Responsiveness Fair Good Good Good Excellent Fair Excellent Fair Fair Fair Poor IRT: Item Response Theory. aFour-point rating scale of excellent, good, fair and poor. bMeasurement error and criterion validity were not reported in any of the articles and hence are not included in the table. cRatings relate to quality of the translation procedure only, as cross-cultural validity was not assessed. 
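The box-level ratings in Table 5 follow the scoring rule described in the Methods: each item within a COSMIN box is rated on the four-point scale of poor, fair, good and excellent, and the lowest item rating becomes the rating for the whole box [14]. The Python sketch below illustrates that rule only; the function name and the item ratings are hypothetical and do not reproduce the checklist items or any rating from the reviewed studies.

```python
# Ordered from worst to best, following the four-point scale named in the text.
RATINGS = ("poor", "fair", "good", "excellent")
RANK = {name: position for position, name in enumerate(RATINGS)}


def box_rating(item_ratings):
    """Return the rating for one COSMIN box: the lowest rating among its items."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    unknown = [rating for rating in item_ratings if rating not in RANK]
    if unknown:
        raise ValueError(f"unknown rating(s): {unknown}")
    return min(item_ratings, key=RANK.__getitem__)


# Hypothetical item ratings for a single box; one poor item decides the outcome.
hypothetical_box = ["excellent", "good", "poor", "excellent"]
print(box_rating(hypothetical_box))  # poor
```

Taking the minimum rather than an average is what makes a single poor item a fatal flaw for that box, regardless of how well the remaining items were rated.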
TABLE 6 Summary of the assessment of measurement propertiesa,b of all instruments

Instrument, author, year ASES Lorig et al., 1989 [13] Lomi, 1992 [21] Lomi et al., 1992 [22] Lomi et al., 1995 [24] Barlow et al., 1997 [25] ASES-8 Gonzalez et al., 1995 [23] Mueller et al., 2003 [26] Sandhu et al., 2011 [27] Wallen et al., 2011 [28] GSES Barlow et al., 1997 [25] RASE Hewlett et al., 2001 [32] Hewlett et al., 2008 [33] Primdahl et al., 2010 [29] JP-SES Niedermann et al., 2011 [34] Marcus SEEB Mielenz et al., 2011 [35] Resnick SEEB Mielenz et al., 2011 [35] Content validity Internal consistency Construct validity Reliability Responsiveness Floor/ceiling effect ? 0 0 0 0 ? ? 0 ? + ? 0 ? ? ? ? ? 0 0 0 ? 0 0 ? 0 0 0 0 0 0 0 0 0 0 ? 0 0 0 0 ? + ? ? 0 ? + 0 ? ? + 0 0 ? + 0 0 + + 0 ? ? ? 0 0 + ? ? 0 0 ? ? 0 ? ? ? ? ? ? ? ? 0 ? ? ? ? 0 0 0 ? ? ? ? + ? ? 0 ? ? 0 + 0 0 0 0 0 0 + 0 0 0 0 0 Interpretability

a Three-point rating of positive rating (+), indeterminate rating (?), negative rating (−) and no information available (0). b Agreement and criterion validity were not reported in any of the articles and hence are not included in the table.

Just three studies explicitly addressed qualitative aspects of validity testing, including content and face validity [32–34], but there were problems with methodology (Tables 5 and 6). Patient focus groups were used to assess item relevance in the development of the ASES [13] and comprehensibility and understanding were assessed in subsequent studies by means of patient interviews [25, 28]. However, as with the other studies relating to the ASES and the short-form versions, content validity was not explicitly considered and hence was not rated in relation to the COSMIN checklist. Patients contributed to item development within the RASE and item selection included the consideration of clinical judgment, clinical relevance and face validity [32].
The instrument was also assessed for face validity by means of patient interviews in Denmark [29]. The development of the JP-SES included semi-structured interviews with patients who assessed content relevance [34]; however, the content validity of the instrument was not further considered. The SEEB instruments were not assessed for content validity [35]. The GSES was assessed for comprehensibility [31].

The results of testing for structural validity by means of factor or principal component analysis (PCA) largely supported the existence of the three scales within the original ASES [13], two scales within the ASES-11 [25] and the unidimensional ASES-8 [26] (Table 4). However, the COSMIN ratings were fair to poor across studies because of small sample sizes [13, 21, 24–26] (Table 5). There was one poorly performing item within each scale, with some consistency across Swedish and UK populations [21, 24, 25]. The GSES has evidence for unidimensionality [31], but the lack of information relating to the handling of missing data meant that a rating of fair was given. The PCA of the final 28-item RASE gave eight components [32, 33], which is at odds with the unidimensional scale subsequently used. Moreover, the sample sizes in these two studies were smaller than required for the highest COSMIN rating and overall rating of quality [14, 15]. There was evidence to support the unidimensionality of the JP-SES [34] and two SEEB [35] instruments (Table 4). However, the structural validity of the JP-SES was only assessed after item removal [34].

With the exception of the first UK evaluation of the ASES [25], all studies had poor to fair ratings for one or more aspects of hypothesis testing, or what is often referred to as construct validity. All but one of these studies [31] lacked some information on the other self-completed PRO instruments used in testing (Tables 4 and 5). The two SEEB instruments were not tested for this form of validity [35].
Only the evaluation of the ASES-8 for AS was consistent in including hypotheses for the expected size of correlations [27].

Cross-cultural validity was not assessed and hence the ratings relate to the quality of the translation procedures for the ASES [21], ASES-8 [23, 26] and RASE [29]. These were of good to excellent methodological quality and included forward–backwards translation by independent translators (Table 5).

Responsiveness was evaluated for the ASES and RASE (see supplementary Table S1, available at Rheumatology Online) and was of fair or better methodological quality (Table 5). However, none of the studies gave explicit consideration to the smallest detectable change and the MID. Hence the summary ratings for responsiveness were generally indeterminate (Table 6).

Finally, very little information was given to aid in the interpretability of instrument scores. MIC was not reported for any study and when means and S.D.s of scores were reported it was for fewer than four important subgroups. One study included a Bland–Altman plot [38], but the MIC was not given [29]. Floor and ceiling effects for instrument scores were reported in two studies relating to the ASES-8 [26, 27] and one for the RASE [29]. For the JP-SES it was simply reported that no such effects were present [34].

Discussion

The methodological quality of studies relating to instruments that are designed to assess different forms of SE in rheumatology is generally poor to fair for the ASES, its variants and the RASE according to the COSMIN four-point rating scale [14]. The three more recently evaluated instruments have COSMIN ratings of fair to good. Where information was available, the overall ratings of quality, including results [15], were generally indeterminate. The relatively few positive ratings were associated with more recent studies including the short-form ASES-8 [26, 27], JP-SES [34] and two SEEB [35] instruments. The poorest evidence relates to the older instruments.
This is a common finding within reviews, with more recent instruments being developed and evaluated according to more stringent criteria that also reflect developments in psychometric and statistical analysis [11]. However, the more recent instruments [34, 35] have not been evaluated in relation to two or more important criteria within the COSMIN checklist.

The ASES, including the short forms, was developed earlier than the other instruments and is more broadly applicable. Hence this instrument has had the greatest application, which, in addition to the evaluation of arthritis self-management programmes [39], has included applications as an outcome measure within rheumatology more generally [3, 40]. However, the content validity of this instrument and whether it adequately reflects SE was not explicitly addressed as part of the development [13]. Quantitative criteria, including measurement error, have been neglected, and the application of confirmatory factor analysis and item response theory could further our understanding of this instrument [41].

The GSES may be too general in focus for interventions that are designed to improve SE in rheumatology, but might have use in assessing the construct validity of SE instruments specific to rheumatic diseases [25, 32]. The RASE has evidence for multidimensionality and yet the items have been summed to give a single score [29, 32, 33]. The three more recently evaluated instruments relating to joint protection [34] and exercise behaviour [35] should be evaluated for reliability, content validity and construct validity through hypothesis testing as well as responsiveness before they can be recommended for applications within rheumatology.

The lack of explicit consideration of qualitative aspects of validity, including content and face validity, was the single most important weakness across the majority of studies.
This includes patient involvement through interviews and focus groups in the development of instrument content and cognitive interviews before psychometric testing [15, 42, 43]. There was no evaluation of content validity for the SEEB instruments [35] and the one for the JP-SES was of poor quality [34]. The two SEEB instruments were originally developed with working adults and older adults [35]. The content validity of these instruments should be assessed before application in patients with rheumatic disease. The assessment of construct validity by means of hypothesis testing was generally of poor to fair quality for the majority of studies, with just one study being rated as good [25] according to COSMIN. Given the lack of a gold standard measure of SE, it is important that future studies give greater consideration to hypothesis testing, including the expected direction and size of associations, with scores for instruments assessing different aspects of SE, health status and other relevant variables [12]. The main strengths of this review include the systematic searches of the literature based on highly relevant search terms and the use of COSMIN and other recognized criteria for assessing the quality of instrument evaluations [12, 15]. The searches were informed by published reviews of PRO instruments [7, 11, 44], a review relating to SE within rheumatology [3] and consultation with a research librarian who undertook the electronic searches. This is the first review of SE instruments that has followed these standards. The review included three instruments [34, 35] and two studies [27, 28] relating to the ASES that were not included in existing reviews [1, 7, 9, 10]. However, the search strategy is not without its limitations. First, only articles published in English and the Scandinavian languages were included, creating a potential language bias. 
Second, the grey literature and unpublished work were not included, but the absence of peer review would have made this work harder to assess and include alongside the findings for the methodological quality and results of the other studies.

The information extracted from articles followed the COSMIN checklist relating to the quality of studies of PRO instruments [12]. No other checklist is available for this purpose that is based on an international panel of experts from the field of PROs. COSMIN focuses on the methodological quality of studies, which is not sufficient for determining which instruments are most appropriate for future application or further evaluation; therefore, a quality rating was also given that includes the results of development and evaluation studies [15]. These criteria do not give an overall rating of an individual instrument or study and the task of synthesising the evidence must follow their application. Moreover, evidence pooled from separate evaluations of the same instrument in similar patients and settings can further contribute to the overall burden of evidence for an instrument in such a context. The concurrent evaluation of instruments is arguably the gold standard method for determining the most appropriate one for a given patient population and health care setting and has been recommended when evidence from systematic reviews is inconclusive [11, 43]. The few studies included in the current review were conducted across several countries and populations and only one included more than one review instrument, in relation to just three COSMIN criteria [35]. These issues, while of limited relevance here, are not explicitly considered by COSMIN or the other review criteria and those undertaking reviews in the future should be aware of their potential importance.

The majority of PRO instruments, including those in this review, are based on psychometric methods including summated rating scales [44].
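For a summated rating scale, item responses are added to give a scale score, and internal consistency is commonly summarised with Cronbach's alpha. The sketch below shows both steps under simple assumptions: complete data, all items scored in the same direction, and sample variances throughout. The function names and the five respondents' answers are hypothetical and are not taken from any instrument in this review.

```python
from statistics import variance


def summated_scores(responses):
    """Scale score per respondent: the sum of that respondent's item responses."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of complete responses."""
    k = len(responses[0])  # number of items
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the summated scale score.
    total_variance = variance(summated_scores(responses))
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three items on a 1-5 scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]

scores = summated_scores(responses)
alpha = cronbach_alpha(responses)
print(f"scale scores: {scores}, alpha = {alpha:.2f}")
```

Alpha is only interpretable for a set of items that form a single dimension, which is why structural validity needs to be established first.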
However, the COSMIN checklist is somewhat less applicable to other types of instruments. For example, internal consistency is of less relevance to individualized and utility instruments. Individualized instruments include domains or items selected by patients and/or importance weightings [44, 45]. Health states within utility instruments have preference weights [44]. Reviews that include such instruments should also consider other relevant criteria [43, 46]. Criteria differ in importance and their relevance may vary depending on the application. It has been argued that an instrument should not be considered for application if it does not have adequate content validity [15, 42, 43]. Rigorous testing for content validity, acceptability and comprehensibility with patients might further the understanding of problems that have been identified in the rheumatology literature [3, 32]. The SE instruments use different item phraseology and scaling, which may have implications for validity and other measurement properties. SE is a belief in one’s ability to carry out a task and not a measure of actual ability, task performance or outcome [2]. It has been argued that the ASES assesses these other factors rather than SE [47]. The developers argue that the content of the RASE reflects beliefs about the potential for initiating tasks rather than other factors [32]. Items within the RASE begin with ‘I believe I could’ [32]. Items within the ASES begin with ‘How certain are you that you can’ [13]. Which approach is the most appropriate for assessing SE is far from clear, but all are more cognitively challenging than the vast majority of PRO instruments, which assess aspects of health status such as physical function [44]. Any theoretical advantages arising from the use of ‘I believe’ within the RASE items might be overshadowed by problems relating to item wording and scaling. 
Several of the statements are very long and/or double-barrelled, including item 15: ‘I believe I could educate my family and friends about my arthritis to help with the strains that arthritis can make on relationships’ [32]. Five items within the ASES function scale relate to timed tasks, which are relatively more cognitively demanding [13]. In contrast, the JP-SES and two SEEB instruments include much shorter statements [34, 35]. The scaling within the ASES has been criticized for its complexity [32]. Ten-point scales appear to be precise, but patients may be reluctant to use all scale points [48, 49]. There is also evidence that endpoint-only descriptors, such as those used in the ASES, draw responses to the scale ends [49]. The SE instruments comprise multi-item scales, which lessens the need for such precision at the item level. However, the five-point agree–disagree scale included alongside positively worded statements in the RASE can lead to acquiescence bias, where respondents tend to agree to questions regardless of whether this reflects their SE or not [50]. In the Danish evaluation of the RASE, >60% of patients responded agree or strongly agree to 21 of the 28 items, which may indicate bias [29].

In conclusion, this review is designed to inform the future application of SE instruments within rheumatology and, through the identification of study and instrument weaknesses, future developmental and evaluative work. The recommendations arising from this review are limited due to the small number of studies, their poor quality or their limited scope. In spite of methodological weaknesses, the ASES and its short forms continue to be widely used in research and clinical applications; hence testing for content validity and the consideration of alternative methods of scaling the items is recommended. Content validity should be further considered for all instruments.
Structural validity should be assessed before the evaluation of internal consistency, reliability, construct validity and responsiveness. There is evidence that the RASE is a multidimensional instrument and the scoring of the instrument must reflect this before further evaluation and application can be considered. Finally, the JP-SES and two SEEB instruments are more specific in their focus and hence have more limited scope for application. The results of initial evaluations are encouraging but, as with all the instruments included in the review, further evaluation is necessary before they can be recommended as measures of SE in rheumatology.

Rheumatology key messages

• Six self-efficacy (SE) instruments are available for patients with rheumatic disease.
• SE instrument testing was generally of poor quality or limited scope.
• Further evaluation of SE instruments that includes content validity is recommended.

Acknowledgements

Hilde Iren Flaaten undertook the searches of the electronic databases and reference management. This work was supported by Diakonhjemmet Hospital.

Disclosure statement: The authors have declared no conflicts of interest.

Supplementary data

Supplementary data are available at Rheumatology Online.

References

1 Brady TJ. Measures of self-efficacy. Arthritis Care Res 2011;63:S473–85.
2 Bandura A. Self-efficacy. Englewood Cliffs, NJ: Prentice Hall, 1986.
3 Primdahl J, Wagner L, Hørslev-Petersen K. Self-efficacy as an outcome measure and its association with physical disease-related variables in persons with rheumatoid arthritis: a literature review. Musculoskeletal Care 2011;9:125–40.
4 Healey EL, Haywood KL, Jordan KP et al. Ankylosing spondylitis and its impact on sexual relationships. Rheumatology 2009;48:1378–81.
5 van Hoogmoed D, Fransen J, Bleijenberg G et al. Physical and psychosocial correlates of severe fatigue in rheumatoid arthritis. Rheumatology 2010;49:1294–302.
6 Somers TJ, Shelby RA, Keefe FJ et al. Disease severity and domain-specific arthritis self-efficacy: relationships to pain and functioning in patients with rheumatoid arthritis. Arthritis Care Res 2010;62:848–56.
7 Frei A, Svarin A, Steurer-Stey C et al. Self-efficacy instruments for patients with chronic diseases suffer from methodological limitations – a systematic review. Health Qual Life Outcomes 2009;7:86.
8 Mokkink LB, Terwee CB, Stratford PW et al. Evaluation of the methodological quality of systematic reviews of health status instruments. Qual Life Res 2009;18:313–33.
9 Miles CL, Pincus T, Carnes D et al. Measuring pain self-efficacy. Clin J Pain 2011;27:461–70.
10 Brady TJ. Measures of self-efficacy, helplessness, mastery, and control. Arthritis Care Res 2003;49:S147–64.
11 Garratt AM, Brealey S, Gillespie W. Patient-assessed health instruments for the knee: a structured review. Rheumatology 2004;43:1414–23.
12 Mokkink LB, Terwee CB, Patrick DL et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status instruments: an international Delphi study. Qual Life Res 2010;19:539–49.
13 Lorig K, Chastain RL, Ung E et al. Development and evaluation of a scale to measure perceived self-efficacy in people with arthritis. Arthritis Rheum 1989;32:37–44.
14 Terwee CB, Mokkink LB, Knol DL et al. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res 2012;21:651–7.
15 Terwee CB, Bot SDM, de Boer MR et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34–42.
16 Tijssen M, van Cingel R, van Melick N et al. Patient reported outcome questionnaires for hip arthroscopy: a systematic review of the clinometric evidence. BMC Musculoskelet Disord 2011;12:117.
17 Thorborg K, Roos EM, Bartels EM et al. Validity, reliability and responsiveness of patient-reported outcome questionnaires when assessing hip and groin disability: a systematic review. Br J Sports Med 2010;44:1186–96.
18 Eechaute C, Vaes P, Van Aerschot L et al. The clinimetric qualities of patient-assessed instruments for measuring chronic ankle instability: a systematic review. BMC Musculoskelet Disord 2007;8:6.
19 Taal E, Riemsma RP, Brus HL et al. Group education for patients with rheumatoid arthritis. Patient Educ Couns 1993;20:177–87.
20 Unsal A, Kasikci MK. Effect of education on perceived self-efficacy for individuals with arthritis. Int J Caring Sci 2010;3:3–11.
21 Lomi C. Evaluation of a Swedish version of the arthritis self-efficacy scale. Scand J Caring Sci 1992;6:131–8.
22 Lomi C, Nordholm LA. Validation of a Swedish version of the arthritis self-efficacy scale. Scand J Rheumatol 1992;21:231–7.
23 Gonzalez WM, Stewart A, Ritter PL et al. Translation and validation of arthritis outcome measures into Spanish. Arthritis Rheum 1995;38:1429–66.
24 Lomi C, Burckhardt C, Nordholm L et al. Evaluation of a Swedish version of the arthritis self-efficacy scales in people with fibromyalgia. Scand J Rheumatol 1995;24:282–7.
25 Barlow JH, Williams B, Wright CC. The reliability and validity of the arthritis self-efficacy scale in a UK context. Psychol Health Med 1997;2:3–17.
26 Mueller A, Hartmann M, Mueller K et al. Validation of the arthritis self-efficacy short-form scale in German fibromyalgia patients. Eur J Pain 2003;7:163–71.
27 Sandhu J, Packham JC, Healey EL et al. Evaluation of a modified arthritis self-efficacy scale for an ankylosing spondylitis UK population. Clin Exp Rheumatol 2011;29:223–30.
28 Wallen GW, Middleton KR, Rivera-Goba MV et al. Validating English- and Spanish-language patient-reported outcome measures in underserved patients with rheumatic disease. Arthritis Res Ther 2011;13:R1.
29 Primdahl J, Wagner L, Hørslev-Petersen K. Self-efficacy in rheumatoid arthritis: translation and test of validity, reliability and sensitivity of the Danish version of the Rheumatoid Arthritis Self-Efficacy Questionnaire (RASE). Musculoskeletal Care 2010;8:123–35.
30 Jerusalem M, Schwarzer R. Self-efficacy as a resource factor in stress appraisal processes. In: Schwarzer R, ed. Self-efficacy: thought control and action. Washington, DC: Hemisphere, 1992.
31 Barlow J, Williams B, Wright C. The generalized self-efficacy scale in people with arthritis. Arthritis Care Res 1996;9:189–96.
32 Hewlett S, Cockshott Z, Kirwan J et al. Development and validation of a self-efficacy scale for use in British patients with rheumatoid arthritis (RASE). Rheumatology 2001;40:1221–30.
33 Hewlett S, Cockshott Z, Almeida C et al. Sensitivity to change of the rheumatoid arthritis self-efficacy scale (RASE) and predictors of change in self-efficacy. Musculoskeletal Care 2008;6:49–67.
34 Niedermann K, Forster A, Ciurea A et al. Development and psychometric properties of a joint protection self-efficacy scale. Scand J Occup Ther 2011;18:143–52.
35 Mielenz TJ, Edwards MC, Callahan LF. Item-response-theory analysis of two scales for self-efficacy for exercise behavior in people with arthritis. J Aging Phys Act 2011;19:239–48.
36 Marcus BH, Owen N. Motivational readiness, self-efficacy and decision-making for exercise. J Appl Soc Psychol 1992;22:3–16.
37 Resnick B, Jenkins LS. Testing the reliability and validity of the self-efficacy for exercise scale. Nurs Res 2000;49:154–9.
38 Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;327:307–10.
39 Barlow J, Turner A, Swaby L et al. An 8-yr follow-up of arthritis self-management programme participants. Rheumatology 2009;48:128–33.
40 Hill J, Lewis M, Bird H. Do OA patients gain additional benefit from care from a clinical nurse specialist? – a randomized clinical trial. Rheumatology 2009;48:658–64.
41 Haugen IK, Moe RH, Christensen BS et al. The AUSCAN subscales, AIMS-2 hand/finger subscale and FIOHA were not unidimensional scales. J Clin Epidemiol 2011;64:1039–46.
42 U.S. Food and Drug Administration. Guidance for industry. Patient-reported outcome measures: use in medical product development to support labeling claims. Rockville, MD: U.S. Food and Drug Administration, 2009. http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf (3 June 2013, date last accessed).
43 Fitzpatrick R, Davey C, Buxton MJ et al. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess 1998;2:1–74.
44 Garratt AM, Schmidt L, Mackintosh A et al. Quality of life measurement: bibliographic study of patient assessed health outcome measures. BMJ 2002;324:1417–9.
45 Klokkerud M, Grotle M, Løchting I et al. Psychometric properties of the Norwegian version of the patient generated index in patients with rheumatic diseases participating in rehabilitation or self-management programmes. Rheumatology 2013;52:924–32.
46 Brazier J, Deverill M, Green C et al. A review of the use of health status measures in economic evaluation. Health Technol Assess 1999;3:1–164.
47 Brady TJ. Do common arthritis self-efficacy measures really measure self-efficacy? Arthritis Care Res 1997;10:1–8.
48 Preston CC, Colman AM. Optimal number of response categories in rating scales: validity, discriminating power, and respondent preferences. Acta Psychologica 2000;104:1–15.
49 Garratt AM, Helgeland J, Gulbrandsen P. Scaling responses within a patient experiences survey: a randomised comparison. J Clin Epidemiol 2011;64:200–7.
50 Saris WE, Revilla M, Krosnick JA et al. Comparing questions with agree/disagree response options to questions with item-specific response options. Surv Res Methods 2010;4:61–79.