Rheumatology 2015;54:1842–1851
doi:10.1093/rheumatology/kev125
Advance Access publication 21 May 2015

Original article

Are patient-reported outcome instruments for ankylosing spondylitis fit for purpose for the axial spondyloarthritis patient? A qualitative and psychometric analysis

Astrid van Tubergen1, Peter M. Black2 and Geoffroy Coteur3

Abstract

Objectives. Several patient-reported outcome (PRO) instruments have been validated in AS. This study aims to evaluate several measurement properties of such PROs in a broad axial SpA (axSpA) population, including both AS and non-radiographic axSpA (nr-axSpA) subpopulations.

Methods. PROs assessed were total and nocturnal back pain, patient global assessment of disease activity, BASDAI, BASFI and the 36-item Short Form Health Survey. A literature review and both clinician and patient qualitative interviews provided information on instrument content validity. Reliability (test–retest and internal consistency), construct validity (PRO–clinical outcome correlations and known-groups validity) and PRO responsiveness were assessed. Data from the RAPID-axSpA trial (NCT01087762) investigating certolizumab pegol efficacy in axSpA, including relevant subpopulations, were utilized.

Results. Concepts identified for the broad axSpA population by both clinician and patient interviews were consistent with those identified through literature review of AS. All PROs demonstrated reliability in the RAPID-axSpA population (n = 325), with test–retest intraclass correlation coefficients and internal consistency Cronbach's α >0.8. Validity was supported by agreement between PROs and clinician-rated measures; except for the 36-item Short Form Health Survey Mental Components Summary, correlations between PROs and physician global assessment of disease activity ranged from 0.28 to 0.42 for week 0 and from 0.53 to 0.65 for week 24.
PRO measures showed good sensitivity to change (effect size >0.8) at weeks 12 and 24 for responders. No variations in measurement properties were noted between the subpopulations.

Conclusion. This study indicates that both content validity and measurement properties of PRO instruments utilized in AS are preserved in the broad axSpA population.

Key words: axial spondyloarthritis, anti-TNF therapy, certolizumab pegol, patient-reported outcomes, validation, non-radiographic axSpA, ankylosing spondylitis, RAPID-axSpA.

Rheumatology key messages
• Commonly used patient-reported outcome instruments were shown to be valid in the broad axial SpA population, including AS and non-radiographic axial SpA.
• Validity for patient-reported outcomes in axial SpA was supported by strong associations between patient-reported outcomes and clinician-rated measures.
• Proposed minimal clinically important differences for patient-reported outcomes were similar regardless of axial SpA patient radiographic status.

1Department of Medicine, Division of Rheumatology, Maastricht University Medical Center, Maastricht, The Netherlands, 2Clinical Outcome Assessments Consulting, ERT, Philadelphia, PA, USA and 3Global Health Outcomes Research, UCB Pharma, Brussels, Belgium

Submitted 15 May 2014; revised version accepted 20 March 2015

Correspondence to: Astrid van Tubergen, Department of Medicine, Division of Rheumatology, Maastricht University Medical Centre, Maastricht, The Netherlands. E-mail: [email protected]

© The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved.
For Permissions, please email: [email protected]

Introduction

Axial spondyloarthritis (axSpA), a chronic, inflammatory rheumatic disease, comprises a spectrum of disease states including patients with definitive radiographic damage in the sacroiliac joints (AS) and a subgroup characterized by little or no change on plain radiographs, referred to as non-radiographic axSpA (nr-axSpA). The most frequently investigated patient subgroup is AS, as classification criteria to include the nr-axSpA subgroup were only recently developed [1, 2]. AS affects approximately twice as many men as women and has a disease onset usually in the second or third decade of life. The nr-axSpA subpopulation shows a more balanced gender ratio [3, 4]. The prevalence of AS worldwide ranges from 0.1% to 1.4% [5–7]; the prevalence of the broad axSpA population in the USA was recently shown to be 0.7% [8].

The majority of studies investigating outcomes in axSpA have been performed on the AS subpopulation. Disability in AS is related to both the degree of inflammatory activity causing pain, stiffness, fatigue and poor quality of sleep and the degree of bony ankylosis causing loss of spinal mobility. Due to these symptoms, AS also has a significant impact on patient health-related quality of life (HRQoL) as measured by patient-reported outcomes (PROs). During the early stages of disease, disability is mostly determined by inflammatory activity, whereas in long-standing disease, both inflammation and ankylosis contribute to disability.

Some recent studies suggest that the patient burden of disease is similar between the AS and nr-axSpA subpopulations [4, 9, 10]. However, there is a lack of qualitative comparisons of concepts impacted by the disease in each patient subpopulation.
Therefore, while many PRO measures have been developed, validated and used in AS [11, 12], only limited data support their use in the entire disease spectrum of axSpA, particularly in the clinical research arena [10, 13].

The aim of this analysis was to qualitatively and quantitatively assess several aspects of validity and reliability of these PRO instruments used in the certolizumab pegol (CZP) RAPID-axSpA trial in a broad axSpA population with objective signs of inflammation, including both AS and nr-axSpA patients. We hypothesized that the PROs would perform similarly in both disease subpopulations. Assessment of reliability, validity and other psychometric characteristics followed the principles outlined in the 2009 US Food and Drug Administration PRO guidance document [14].

Methods

This was a two-part study assessing the validity and reliability of PROs, previously validated in AS, in a broader axSpA population. First, a qualitative analysis was carried out to confirm the concepts included in the existing PROs and their applicability to a wider axSpA patient population (concept validity). Second, the measurement properties of the PROs in the broad axSpA population as well as AS and nr-axSpA subpopulations were analysed quantitatively. Measurement properties assessed were test–retest reliability and internal consistency, construct and known-groups validity as well as sensitivity to change and minimal clinically important difference (MCID).

Qualitative assessment of content validity

To assess whether the concepts measured by existing PROs in AS were relevant to the experience of the broad axSpA population, a systematic literature review and semi-structured interviews of both clinicians and patients were conducted. These aimed to identify the relevant signs, symptoms and subsequent impacts for axSpA patients and whether these concepts were similar in both the AS and nr-axSpA subpopulations.
The systematic literature review was conducted in Medline. The search strategy consisted of a combination of free terms and controlled vocabulary terms relating to axSpA and PROs (see supplementary data, section on search terms used for systematic review, available at Rheumatology Online). Publications that discussed the use and psychometric properties of relevant instruments were identified. Additional searching for references was also performed. When possible, instruments were also reviewed in the PRO and Quality of Life Instruments Database for additional information.

Following the literature review, and independent of it, patient and clinician interviews were carried out. Clinician interviews were conducted with four experts in the field of axSpA. Each expert was interviewed independently and was asked to indicate the most important symptoms and their impact on patients with axSpA. Similarly, 15 axSpA patients underwent semi-structured concept elicitation interviews to identify the relevant signs, symptoms and subsequent impact of axSpA on their HRQoL. Patient interviews were conducted in waves by a trained senior interviewer. The interview waves continued until concept saturation was reached (i.e. no new concepts were identified).

Inclusion and exclusion criteria for the patients interviewed were aligned with those of the RAPID-axSpA clinical trial of CZP in axSpA [15]: patients were ≥18 years of age with a diagnosis of axSpA of ≥3 months duration, as defined by fulfilment of the Assessment of SpondyloArthritis international Society (ASAS) axSpA criteria [1]. This study was performed in accordance with the ethical principles that have their origin in the Declaration of Helsinki and are consistent with Good Clinical Practice and applicable regulatory requirements. All patients provided informed consent. An application for central institutional review board approval was submitted and approved at all sites.
Ethical approval for the study was provided by the Quorum Institutional Review Board.

All recruited patients had active disease, defined by a BASDAI ≥4, back pain ≥4 (on a 0–10 numerical rating scale) and either CRP level greater than the upper limit of normal (7.9 mg/l) or sacroiliitis on MRI according to the ASAS definition [11]. Eligible patients must have previously had an inadequate response to or been intolerant to one or more NSAID. Patients were excluded if they had received previous treatment for axSpA with more than two biologics, previous treatment with two or more TNF inhibitors or had a primary failure to respond to TNF inhibitor therapy. Patients were excluded when total spinal ankylosis was present. Stratification parameters of the study stated that at least 50% of patients had to meet both the ASAS and modified New York (mNY) criteria for AS [16].

Of the 15 patients interviewed, 9 (60%) had AS and 6 had nr-axSpA. Patients came from two sites in the USA and their overall characteristics were intentionally diverse in terms of age, gender and ethnicity. A concept elicitation interview guide was used and the disease concepts derived from the patient interviews were evaluated against the findings of the literature review and expert interviews.

Quantitative assessment of psychometric properties

Patients and study design

Measurement properties were assessed using data from the RAPID-axSpA trial (NCT01087762) [15], an international study on the safety and efficacy of CZP in axSpA patients with objective signs of inflammation. CZP is a PEGylated Fc-free anti-TNF that has been demonstrated to improve PROs in RA [17]. The ongoing RAPID-axSpA trial is double blind and placebo controlled to week 24 and dose blind to week 48. These analyses used data from all 325 axSpA patients enrolled in the RAPID-axSpA trial; of these, 178 were diagnosed with AS and 147 with nr-axSpA.
Outcomes assessed

The PROs assessed for aspects of reliability and validity were the BASDAI (six items, scale 0–10) [18], fatigue (item 1 from the BASDAI, scale 0–10), morning stiffness (items 5 and 6 from the BASDAI, scale 0–10), total and nocturnal back pain (one item each, scale 0–10) [19, 20], patient global assessment of disease activity (PtGADA; one item, scale 0–10) [20], BASFI (10 items, scale 0–10) [20], 36-item Short Form Health Survey (SF-36) physical component score (PCS; 21 items, 4 domains, scale 0–100), SF-36 mental component score (MCS; 14 items, 4 domains, scale 0–100) and SF-36 physical functioning (PF) domain (10 items, scale 0–100) [21]. The measurement properties of these PROs were assessed in the broad axSpA population and the AS and nr-axSpA subpopulations. The clinical outcomes reported in this article are the AS DAS (ASDAS) [22] and physician global assessment of disease activity (PhGADA; one item, scale 0–100) change from baseline.

Data handling and statistical analysis

Missing items were imputed to calculate the domain scores according to the authors' recommendations for each respective questionnaire. However, missing domain scores were not imputed (observed case analysis).

Test–retest reliability of the PROs was tested with intraclass correlation coefficients in a population of patients expected not to change significantly between the first and second assessments. This population was therefore defined as patients with a change in the PtGADA of ±1 point (10-point numerical rating scale) and a PhGADA rating within ±15 scale units (0–100 mm visual analogue scale) between weeks 12 and 16 (stable phase) of the RAPID-axSpA study. For instruments composed of more than two items, internal consistency was measured using Cronbach's α, whereas Spearman's rank correlation was used for instruments with two items. No internal consistency measure was applicable for single-item instruments.
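As an illustration of the internal consistency measure named above, the following is a minimal sketch of Cronbach's α using the standard textbook formula, α = k/(k − 1) × (1 − Σ item variances / variance of total score). It is not the authors' analysis code; the function name `cronbach_alpha` and the input layout (one list of scores per item) are assumptions made for this example, and sample variances (n − 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item instrument.

    `items` is a list of per-item score lists (hypothetical layout):
    items[i][j] is the score of patient j on item i.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("every item needs one score per patient")
    # Total score per patient (sum over items).
    totals = [sum(patient_scores) for patient_scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Under the thresholds used in this study, a value returned by such a function would be read as satisfactory when >0.7, good when >0.8 and excellent when >0.9.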
In terms of test–retest reliability, the threshold for Cronbach's α was defined as satisfactory when >0.7, as good when >0.8 and as excellent when >0.9 [23].

Construct validity was assessed by testing Pearson's correlation coefficients between PROs and clinical outcomes (ASDAS and PhGADA) at weeks 0, 12 and 24. Known-groups validity for PROs used effect sizes calculated as the difference between patients with high and low disease activity (PhGADA median score or ASDAS 2.1 score [24] cut-offs) divided by the S.D. of the full sample at the appropriate time point. Scores were rescaled so that a positive effect size corresponded to a more severe disease state; an effect size >0.8 was considered large [25].

Sensitivity to change was assessed by comparing the PRO response of clinical endpoint responders (PhGADA improvement ≥10 from baseline), stable patients (PhGADA change from baseline <10) and non-responders (PhGADA deterioration ≥10 from baseline) using the effect size (calculated as the change from week 0 to week 24 divided by the S.D. of the value for week 0) and Guyatt's responsiveness statistic (calculated as the change from week 0 to week 24 divided by the S.D. of change for stable patients in the same period) [23].

The MCID for each instrument was also determined using the median and range of values derived by anchor-based and distribution-based approaches. The anchor-based method used a patient global impression of change (PGIC) [26]; this was not evaluated in its own right, but was used to assess meaningful change in other measures. The PGIC is a single-item, 7-point scale for the change from baseline for a condition using the following categories: marked improvement, moderate improvement, slight improvement, no change, slight worsening, moderate worsening, marked worsening.
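The two responsiveness statistics defined above can be sketched as follows. This is an illustrative reading of those definitions, not the trial's analysis code: the function names and the paired-list input layout are hypothetical, the change is taken as the mean of follow-up minus baseline, and no rescaling of sign is applied (so improvement on a symptom scale where lower is better comes out negative).

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change from baseline divided by the S.D. of the baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def guyatt_responsiveness(baseline, follow_up, stable_baseline, stable_follow_up):
    """Mean change in the group of interest divided by the S.D. of change
    observed in stable patients over the same period."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    stable_changes = [
        after - before for before, after in zip(stable_baseline, stable_follow_up)
    ]
    return mean(changes) / stdev(stable_changes)
```

Both statistics share the same numerator; they differ only in the yardstick used for the denominator, which is why a measure can look more or less responsive depending on how variable stable patients are.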
Anchor-based methods used patients achieving a slight or slight to moderate change compared with those demonstrating no change in PGIC as the a priori definition of a clinically meaningful change. A regression coefficient between the PGIC and each PRO was determined and the receiver operating characteristic approach (used to determine the cut-off value of each PRO that provided the best discrimination between external criteria) was also performed. The distribution-based approach defined the MCID as 0.5 S.D. from the baseline score for each of the PROs investigated [27].

Results

Qualitative assessment of content validity

Patients

A total of 15 axSpA patients participated in the concept validation interviews, including 9 patients who also met the mNY criteria for AS (60%). Interviewed patients ranged in age from 28 to 63 years, with a median age of 51 years. Seven patients (47%) were men and 12 (80%) were HLA-B27+.

Content validity

Of the concepts identified in the patient interviews, 55 (86%) were identified in the first wave of interviews, with 5 new concepts in wave 2 and 1 new concept in wave 3. No new concepts were identified in wave 4 (supplementary Table S1, available at Rheumatology Online). Since no concepts were identified in the fourth and final wave, saturation of concepts was determined to be achieved and no new patients were interviewed. Clinicians were interviewed once and their responses separated into the relevant concepts that were mentioned. The concepts described in both patient and clinician interviews were then compared with those that resulted from the systematic literature review.

Concepts identified by all methods included pain (including back pain, hip pain and neck pain), morning stiffness, fatigue, sleep disturbances, physical function and problems performing daily and recreational activities.
In addition to these, the literature reported difficulty with work-related, self-care and household tasks, mobility problems, disease activity, problems with social life/family relationships and depression. Both the clinician and patient interviews also revealed further concepts other than the common concepts listed above. For clinician interviews these were joint pain and swelling, difficulty performing physically demanding activities and low vitality levels. Concepts identified only by patients included difficulty travelling and driving, limitations in physical activity, mobility problems and the emotional impact of axSpA. These additional concepts were identified in interviews with both AS and nr-axSpA patients, underlining the similarity of these diseases in terms of disease burden.

Quantitative assessment of psychometric properties

Patients

Since the qualitative assessment suggested that similar disease-related issues impact the HRQoL of both AS and nr-axSpA patients, the measurement properties of PROs previously validated in AS were analysed in a wider axSpA population; specifically, a set of 325 axSpA patients with objective signs of inflammation (RAPID-axSpA population) [28]. Of these patients, 178 (55%) met the mNY criteria for AS, with the other 147 (45%) patients being diagnosed with nr-axSpA. Baseline demographics were similar between the AS and nr-axSpA subgroups, with the exception of gender ratio (Table 1). Patients meeting the mNY criteria for AS also reported longer symptom duration compared with nr-axSpA patients (11.9 and 8.6 years, respectively). A total of 163 axSpA patients (AS, n = 91; nr-axSpA, n = 72) from the RAPID-axSpA trial met the criteria for stable disease and were included in the test–retest reliability analysis.

Data completeness

For the BASFI, BASDAI, total back pain score and PtGADA, the level of missing items was very low (<1% in all cases).
Slightly higher levels of missing data were seen for the nocturnal back pain score (<2% in all cases), PGIC (1.9% at week 12, 2.3% at week 24), SF-36 PF (1.8% at week 0, 2.6% at week 12, 1.7% at week 24) and SF-36 component summaries (7.1% at week 0, 8.7% at week 12, 5.7% at week 24). For the SF-36, Last Observation Carried Forward imputation for missing components was possible in most cases. After imputation of missing items, ≥97.5% of domain scores at weeks 0, 12 and 24 were evaluable across PRO measures.

Disease burden

At baseline, mean symptom scores indicated a substantial level of disease burden (Table 1). Similar mean scores were reported by AS and nr-axSpA patients. However, the BASFI showed somewhat higher scores (i.e. more disability) in AS patients compared with nr-axSpA patients (5.7 vs 4.9).

Reliability

The test–retest reliability of all PRO instruments was satisfactory in the assessed stable subgroup of the overall RAPID-axSpA population and in each subpopulation (Table 2). The internal consistency of the PROs was good, as indicated by a Cronbach's α >0.8 for all instruments with more than two items (BASFI, BASDAI and the SF-36 measures), and moderate to good correlations were found between BASDAI items 5 and 6 (morning stiffness) of 0.53 (week 0) and 0.88 (week 24). Similar results were observed in the AS and nr-axSpA subpopulations (Table 2).

Construct and known-groups validity

Correlations were calculated between PROs and clinical outcomes (i.e. ASDAS and PhGADA). The correlation between the PROs investigated, excluding SF-36 MCS, and the PhGADA ranged from 0.28 to 0.42 for week 0, 0.58 to 0.66 for week 12 and 0.53 to 0.65 for week 24; correlations with ASDAS ranged from 0.38 to 0.66 for week 0, 0.61 to 0.80 for week 12 and 0.60 to 0.77 for week 24 (Table 3). Known-groups analysis was also undertaken by calculating the effect sizes in PROs between subgroups of patients based on their clinical health status (Table 4).
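The known-groups effect size described in the Methods (difference between high and low disease activity groups divided by the S.D. of the full sample) can be illustrated with a short sketch. The function name and arguments are hypothetical, and assigning patients at exactly the cut-off to the high disease activity group is an assumption made here; the source does not state how ties were handled.

```python
from statistics import mean, stdev


def known_groups_effect_size(pro_scores, clinical_scores, cutoff):
    """Difference in mean PRO score between patients with high and low
    disease activity, divided by the S.D. of the whole sample.

    Patients are split on a clinical measure (for example PhGADA or ASDAS);
    scores at or above `cutoff` count as high disease activity (assumption).
    """
    pairs = list(zip(pro_scores, clinical_scores))
    high = [pro for pro, clinical in pairs if clinical >= cutoff]
    low = [pro for pro, clinical in pairs if clinical < cutoff]
    if not high or not low:
        raise ValueError("both disease activity groups must be non-empty")
    return (mean(high) - mean(low)) / stdev(pro_scores)
```

With PRO scores oriented so that higher means more severe disease, a positive result indicates that the instrument separates the two clinically defined groups in the expected direction; values >0.8 were considered large in this study.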
Although effect sizes were smaller for week 0, at weeks 12 and 24 the effect sizes for all PROs (excluding SF-36 MCS) were large (0.80–1.30). Similar results were found in both AS and nr-axSpA patients (data not shown), suggesting that all PRO instruments (excluding SF-36 MCS) correlated with clinical outcomes in these subpopulations.

Sensitivity to change

Both effect sizes and Guyatt's responsiveness statistic demonstrated a good sensitivity of PROs (with the exception of SF-36 MCS) to change in the PhGADA in axSpA patients, with effect sizes >0.8 and Guyatt's responsiveness statistic values in the range of 1.0–2.0 (Table 5). Similar results were observed in the AS and nr-axSpA subpopulations (data not shown).

MCID

A range of values for the MCID in each PRO was calculated using anchor-based and distribution-based approaches utilizing the RAPID-axSpA data. As there is no accepted algorithm for establishing an MCID value from such results, the median and range for MCID values ascertained using these approaches were reported (Table 6). The MCID results for the AS and nr-axSpA populations were less stable compared with the overall axSpA population; however, there was no tendency for the value of the MCID to differ in either direction for the AS compared with the nr-axSpA subpopulation (Table 6).

TABLE 1 Baseline demographics and clinical characteristics of RAPID-axSpA study population

| Characteristic | axSpA (n = 325) | AS (n = 178) | nr-axSpA (n = 147) |
|---|---|---|---|
| Male, % | 61.5 | 72.5 | 48.3 |
| Age, years | 39.6 (11.9) | 41.5 (11.6) | 37.4 (11.8) |
| White, % | 90.2 | 89.3 | 91.2 |
| Symptom duration, years | 10.4 (9.5) | 11.9 (9.9) | 8.6 (8.6) |
| Positive for HLA-B27, n (%) | 255 (78.5) | 145 (81.5) | 110 (74.8) |
| Prior TNF inhibitor exposure, % | 16.0 | 20.2 | 10.9 |
| CRP^a, median (minimum–maximum), mg/l | 13.9 (0.1–174.8) | 14.3 (0.1–174.8) | 11.9 (0.1–156.2) |
| ASDAS^b | 3.9 (0.9) | 4.0 (0.9) | 3.8 (0.9) |
| BASDAI^b | 6.4 (1.6) | 6.4 (1.6) | 6.5 (1.5) |
| Fatigue^b | 6.7 (1.9) | 6.6 (2.0) | 6.7 (1.8) |
| Morning stiffness^b | 6.6 (1.9) | 6.6 (1.9) | 6.6 (1.9) |
| Total back pain^b | 7.0 (1.9) | 7.1 (1.9) | 7.0 (1.9) |
| Nocturnal back pain^b | 6.9 (2.3) | 6.9 (2.2) | 6.9 (2.5) |
| PtGADA^b | 7.0 (2.1) | 7.0 (2.1) | 7.0 (2.1) |
| PhGADA^b | 61.4 (17.2) | 61.6 (17.6) | 61.3 (16.8) |
| BASFI^b | 5.4 (2.3) | 5.7 (2.2) | 4.9 (2.3) |
| SF-36 MCS^b | 40.5 (12.1) | 40.9 (12.0) | 40.0 (12.3) |
| SF-36 PCS^b | 32.5 (7.5) | 32.0 (7.3) | 33.1 (7.8) |
| SF-36 PF^b | 34.9 (9.3) | 34.3 (9.1) | 35.7 (9.6) |
| MASES^c | 5.1 (3.4) | 4.8 (3.5) | 5.4 (3.4) |
| Peripheral arthritis^d, n (%) | 124 (38.2) | 62 (34.8) | 62 (42.2) |
| Heel enthesitis^e, n (%) | 117 (36.0) | 62 (34.8) | 55 (37.4) |
| Uveitis^e, n (%) | 69 (21.2) | 37 (20.8) | 32 (21.8) |
| Psoriasis^e, n (%) | 20 (6.2) | 7 (3.9) | 13 (8.8) |
| Crohn's disease/ulcerative colitis^e, n (%) | 18 (5.5) | 11 (6.2) | 7 (4.8) |

Except where indicated otherwise, values are given as mean (S.D.). ^a Normal range of CRP <8.0 mg/l. ^b In the FAS population: n = 324 (axSpA), 178 (AS), 146 (nr-axSpA). ^c Shown for those patients with at least one affected joint at baseline: n = 229 (axSpA), 122 (AS), 107 (nr-axSpA). ^d Defined as at least one swollen joint in the 44-joint assessment. ^e Extra-spinal features of axSpA are taken from patients' ASAS classification criteria screening assessments and show both patient history or current diagnosis. ASDAS: AS Disease Activity Score; ASAS: Assessment of SpondyloArthritis international Society; axSpA: axial SpA; FAS: full analysis set; MASES: Maastricht Ankylosing Spondylitis Enthesitis Score; nr-axSpA: non-radiographic axSpA; PhGADA: physician's global assessment of disease activity; PtGADA: patient's global assessment of disease activity; SF-36: 36-item Short Form Health Survey; MCS: Mental Component Summary; PCS: Physical Component Summary; PF: Physical Functioning.

TABLE 2 Test–retest between weeks 12 and 16, and internal consistency of PROs

Columns 2–4 show test–retest reliability; columns 5–10 show internal consistency at weeks 0 and 24 in axSpA (n = 325), AS (n = 178) and nr-axSpA (n = 147).

| Instrument | Test–retest, axSpA (n = 163)^a | Test–retest, AS (n = 91)^a | Test–retest, nr-axSpA (n = 72)^a | axSpA, week 0 | axSpA, week 24 | AS, week 0 | AS, week 24 | nr-axSpA, week 0 | nr-axSpA, week 24 |
|---|---|---|---|---|---|---|---|---|---|
| BASDAI^b | 0.93 | 0.92 | 0.93 | 0.82 | 0.94 | 0.83 | 0.94 | 0.79 | 0.95 |
| Fatigue | 0.85 | 0.84 | 0.86 | NA | NA | NA | NA | NA | NA |
| Morning stiffness^c | 0.91 | 0.90 | 0.93 | 0.54 | 0.88 | 0.53 | 0.87 | 0.56 | 0.89 |
| Total back pain | 0.94 | 0.93 | 0.95 | NA | NA | NA | NA | NA | NA |
| Nocturnal back pain | 0.92 | 0.90 | 0.93 | NA | NA | NA | NA | NA | NA |
| BASFI^b | 0.94 | 0.93 | 0.96 | 0.94 | 0.97 | 0.93 | 0.97 | 0.94 | 0.97 |

Test–retest reliability was assessed with intraclass correlation coefficients. ^a Patients with PtGADA rating within ±1 point (10-point scale) and PhGADA rating within ±15 scale units (0–100 mm visual analogue scale) at weeks 12 and 16. Internal consistency was assessed with either ^b Cronbach's α (>2 item scales) or ^c Spearman's rank (2 item scales). The threshold for Cronbach's α was defined as satisfactory when >0.7, as good when >0.8 and as excellent when >0.9. axSpA: axial SpA; NA: no internal consistency measure was possible for instruments with one item; nr-axSpA: non-radiographic axSpA; PhGADA: physician's global assessment of disease activity; PRO: patient-reported outcome; PtGADA: patient's global assessment of disease activity.

TABLE 3 Correlations between PROs and clinical measures at week 12 for the overall axSpA population

| Measure | PhGADA, week 0 | PhGADA, week 12 | PhGADA, week 24 | ASDAS, week 0 | ASDAS, week 12 | ASDAS, week 24 |
|---|---|---|---|---|---|---|
| BASDAI | 0.39 | 0.66 | 0.64 | 0.66 | 0.80 | 0.75 |
| Fatigue | 0.28 | 0.59 | 0.54 | 0.41 | 0.63 | 0.60 |
| Morning stiffness | 0.33 | 0.58 | 0.58 | 0.54 | 0.73 | 0.64 |
| Total back pain | 0.32 | 0.66 | 0.64 | 0.44 | 0.77 | 0.71 |
| Nocturnal back pain | 0.30 | 0.62 | 0.65 | 0.43 | 0.75 | 0.71 |
| PtGADA | 0.38 | 0.64 | 0.58 | 0.62 | 0.78 | 0.77 |
| BASFI | 0.42 | 0.66 | 0.60 | 0.52 | 0.72 | 0.66 |
| SF-36 MCS | 0.17 | 0.35 | 0.36 | 0.08 | 0.37 | 0.33 |
| SF-36 PCS | 0.40 | 0.65 | 0.59 | 0.45 | 0.65 | 0.65 |
| SF-36 PF | 0.37 | 0.58 | 0.55 | 0.39 | 0.63 | 0.62 |

Data shown are Pearson's correlation coefficients. ASDAS: AS disease activity score; MCS: Mental Component Summary; PCS: Physical Component Summary; PF: Physical Functioning; PhGADA: physician's global assessment of disease activity; PtGADA: patient's global assessment of disease activity; SF-36: 36-item Short Form Health Survey.

TABLE 4 Known-groups validity between PROs and clinical outcomes (PhGADA and ASDAS) in axSpA

Effect sizes are shown for the PhGADA median score cut-off and the ASDAS 2.1 score cut-off.

| Instrument | PhGADA, week 0 (cut-off 63.0) | PhGADA, week 12 (cut-off 32.0) | PhGADA, week 24 (cut-off 20.0) | ASDAS, week 0 | ASDAS, week 12 | ASDAS, week 24 |
|---|---|---|---|---|---|---|
| BASDAI | 0.69 | 1.13 | 0.95 | 1.16 | 1.30 | 1.30 |
| Fatigue | 0.46 | 0.99 | 0.79 | 0.43 | 1.01 | 1.02 |
| Morning stiffness | 0.54 | 1.01 | 0.80 | 0.96 | 1.17 | 1.15 |
| Total back pain | 0.56 | 1.18 | 0.96 | 1.12 | 1.30 | 1.26 |
| Nocturnal back pain | 0.51 | 1.13 | 0.95 | 1.08 | 1.22 | 1.28 |
| BASFI | 0.75 | 1.14 | 0.99 | 0.94 | 1.18 | 1.14 |
| SF-36 MCS | 0.20 | 0.54 | 0.46 | 0.04 | 0.64 | 0.54 |
| SF-36 PCS | 0.68 | 1.12 | 1.05 | 0.67 | 1.03 | 1.07 |
| SF-36 PF | 0.65 | 0.99 | 0.96 | 0.73 | 1.00 | 1.03 |

Effect sizes were calculated as the difference between patients with high and low disease activity (as defined by the clinical cut-offs) divided by the S.D. of the whole sample at the appropriate time point. Scores were rescaled so that a positive effect size corresponded to a more severe disease state. ASDAS: AS disease activity score; MCS: Mental Component Summary; PCS: Physical Component Summary; PF: Physical Functioning; PhGADA: physician's global assessment of disease activity; PRO: patient-reported outcome; SF-36: 36-item Short Form Health Survey.

TABLE 5 Responsiveness of the instruments according to PhGADA-defined response

ES: effect size of change from week 0 to week 24; GRS: Guyatt's responsiveness statistic from weeks 0 to 24.

| Instrument | ES, responder | ES, stable | ES, non-responder | GRS, responder | GRS, stable | GRS, non-responder |
|---|---|---|---|---|---|---|
| BASDAI | 2.15 | 0.56 | 0.38 | 2.05 | 0.58 | 0.37 |
| Fatigue | 1.56 | 0.66 | 0.55 | 1.41 | 0.61 | 0.47 |
| Morning stiffness | 1.98 | 0.43 | 0.32 | 2.07 | 0.52 | 0.39 |
| Total back pain | 1.90 | 0.35 | 0.28 | 1.30 | 0.26 | 0.19 |
| Nocturnal back pain | 1.74 | 0.42 | 0.25 | 1.27 | 0.32 | 0.18 |
| BASFI | 1.09 | 0.12 | 0.04 | 1.26 | 0.14 | 0.05 |
| SF-36 MCS | 0.46 | 0.05 | 0.07 | 0.49 | 0.07 | 0.09 |
| SF-36 PCS | 1.31 | 0.30 | 0.08 | 1.76 | 0.47 | 0.14 |
| SF-36 PF | 0.93 | 0.11 | 0.06 | 0.91 | 0.13 | 0.07 |

Patients were defined using the following cut-offs: responder: improvement ≥10 from baseline; stable: change from baseline <10; non-responder: deterioration ≥10 from baseline. Effect size is calculated as the change from weeks 0 to 24 divided by the S.D. of the value for week 0. Guyatt's responsiveness statistic is calculated as the change from weeks 0 to 24 divided by the S.D. of change for stable patients in the same period. MCS: Mental Component Summary; PCS: Physical Component Summary; PF: Physical Functioning; PhGADA: physician's global assessment of disease activity; SF-36: 36-item Short Form Health Survey.

Discussion

The present study demonstrated that concepts identified for the broad axSpA population by clinician and patient interviews were consistent with those identified through the literature review in AS and that the measurement properties of PRO instruments in AS were preserved in the broad axSpA population, including both the AS and nr-axSpA subpopulations.

Baseline PRO scores suggested high levels of symptom burden and impact of disease.
The average scores were >50% of the possible scale maximum for all the disease-specific questions at baseline, with substantial reductions in scores over the study period, typically in the region of 40–50%. A similar symptom burden, as reported through PRO scores, was reported in AS and nr-axSpA patients; only BASFI showed somewhat higher scores (i.e. higher disability) for AS patients, potentially reflecting the presence of structural damage in these patients.

All measures showed high test–retest reliability in the overall axSpA population, with correlation values >0.8. All composite scales also demonstrated good internal consistency. Validity was supported by agreement between PRO measures and external (clinician-rated) measures. Correlations with the ASDAS were higher in general than those for the PhGADA, although there is an element of circularity in these correlations, since the ASDAS includes PRO assessments in its construction. It was included here because the ASDAS has been found to be highly discriminatory and sensitive to change, better reflecting the inflammatory disease processes and more reliably determining the disease activity status of patients than the BASDAI or other measures of disease activity in AS [29]. The PhGADA, being independent of PRO assessments, gave a more independent measure of clinical outcomes. The correlations between PRO measures and PhGADA were higher than those often found between clinician and patient-rated measures, particularly at week 24, and support the validity of the PRO measures in this patient population.

Patients were classified as responders if they showed an improvement of ≥10 points on the PhGADA. All the PRO measures showed good sensitivity to change, with large response sizes (effect size >0.8) [25] for change from baseline at weeks 12 and 24 for responders. Effect sizes for patients classified as non-responders were much smaller.
While there is no standard set of threshold values for interpreting Guyatt's responsiveness statistic, the calculated effect size values were similar to those reported elsewhere [30, 31].

Suggested values for the MCID were derived using an anchor-based or a distribution-based approach. Values tended to be in the same range, or slightly higher, than values seen in previous studies [32–34]. As the minimum improvement that a patient would perceive as being meaningful is likely influenced by their expectations regarding treatment efficacy, these slightly higher values may reflect higher patient expectations for improvement. Treatments now offered are more effective than when the original studies describing MCIDs were published, thus patient expectations regarding minimum improvements may be higher. The observed differences between the current findings and those reported previously did not appear to be due to different study subpopulations (Table 6).

Correlations between the SF-36 MCS and clinical outcomes, as well as SF-36 MCS sensitivity to change, were substantially lower compared with those for other PRO measures, including SF-36 physical scores (PCS and PF). In addition, the SF-36 MCS did not provide a stable estimate of clinically meaningful change. This may be due to the loading of SF-36 MCS towards emotional and social functioning, which are not as closely associated with the axial symptoms of axSpA captured by clinical outcomes as the more physical assessments of HRQoL. This does not necessarily reflect a general problem with the SF-36 since the physical components of the SF-36 performed well.

The limitations of this study include the use of a clinical trial population composed of patients with active disease, which may not accurately represent the entire spectrum of axSpA patients.
However, this patient population did include both AS and nr-axSpA patients, hence measurement properties for PROs validated in AS could be compared not only with previously reported analyses in this subpopulation, but also with the broad axSpA population. These results therefore suggest that the PROs reported here are valid for use in the broad axSpA population, including both AS and nr-axSpA patients. In addition, given that the RAPID-axSpA trial was conducted across multiple centres in Europe, North America and Latin America [28], the applicability of the PROs can be extended to the worldwide axSpA population. In summary, these results provide validation for future use of these PROs in the broader axSpA population, such as the results reported in this issue concerning the impact of CZP on PROs in the broader axSpA population [35].

TABLE 6 MCID estimates comparing approaches in the axSpA population: MCID median and range estimates in all populations

Columns 2–5: anchor-based methods, change from week 0 to week 12 compared with PGIC level in axSpA^a. Column 6: distribution-based method in axSpA^a. Columns 7–9: estimated MCID, median (range).

| PRO measure | Slight improvement^b | Slight or moderate improvement^c | Regression coefficient^d | ROC slight or greater improvement^e | 0.5 S.D. of baseline | axSpA | AS | nr-axSpA | Protocol-defined MCID [reference] |
|---|---|---|---|---|---|---|---|---|---|
| BASDAI | 1.37 | 1.85 | 1.27 | 1.30 | 0.78 | 1.3 (0.8–1.8) | 1.4 (0.9–1.8) | 1.4 (0.9–1.4) | 1.0 [32] |
| Fatigue | 1.07 | 1.50 | 1.06 | 1.00 | 0.95 | 1.1 (1.0–1.5) | 1.0 (1.0–1.4) | 1.0 (1.0–1.4) | 1.0 [32] |
| Morning stiffness | 1.67 | 2.12 | 1.41 | 2.00 | 0.95 | 1.7 (1.0–2.7) | 1.8 (1.0–2.1) | 1.4 (1.0–2.0) | NA |
| Total back pain | 1.63 | 2.23 | 1.46 | 2.00 | 0.95 | 1.6 (1.0–2.2) | 1.5 (1.0–2.4) | 1.6 (1.0–2.0) | 1.0 [34] |
| Nocturnal back pain | 1.75 | 2.38 | 1.56 | 2.00 | 1.15 | 1.8 (1.1–2.4) | 1.5 (1.1–2.3) | 2.1 (1.3–2.5) | 1.0 [34] |
| PtGADA | 1.81 | 2.22 | 1.46 | 2.00 | 1.05 | - | - | - | - |
| BASFI | 0.77 | 1.33 | 0.96 | 1.20 | 1.12 | 1.1 (0.8–1.3) | 1.1 (1.0–1.4) | 1.0 (−0.2 to 1.1) | 1.0 [32] |
| SF-36 MCS | 2.03 | 3.63 | 2.43 | 2.68 | 6.13 | 2.4 (−2.7 to 6.1) | 3.5 (1.8–6.0) | 3.2 (1.5–6.2) | 2.5 [33] |
| SF-36 PCS | 3.19 | 5.30 | 3.93 | 5.22 | 3.81 | 3.8 (3.2–5.3) | 3.6 (2.3–5.3) | 4.4 (3.0–5.6) | 2.5 [33] |
| SF-36 PF | 1.98 | 4.42 | 3.50 | 6.31 | 4.69 | 4.4 (2.0–6.3) | 4.6 (2.4–6.3) | 4.5 (1.6–6.3) | 5.0 [33] |

^a In the overall axSpA population. ^b Difference between mean PRO measure score for patients scoring 1 (slight improvement) on the PGIC and those scoring 0 (no change). ^c Difference between mean PRO measure score for patients scoring 1 (slight improvement) or 2 (moderate improvement) on the PGIC and those scoring 0 (no change). ^d Regression coefficient for the PRO measure against PGIC, with the regression line constrained to pass through the origin. ^e Score on the PRO measure giving the best discrimination between patients scoring ≤0 on the PGIC and those scoring ≥1.

axSpA: axial SpA; MCID: minimal clinically important difference; MCS: Mental Component Summary; NA: not available; PCS: Physical Component Summary; PF: Physical Functioning; PGIC: patient global impression of change; PRO: patient-reported outcome; PtGADA: patient’s global assessment of disease activity; ROC: receiver operating characteristic; SF-36: 36-item Short Form Health Survey.

Conclusion

In conclusion, this study indicates that both the content validity and the measurement properties of PRO instruments used historically in AS patients are preserved in the broad population of axSpA patients with objective signs of inflammation, including both AS and nr-axSpA patients.

Acknowledgements

The authors acknowledge Marine Champsaur, UCB Pharma, Brussels, Belgium, for publication coordination and Costello Medical Consulting, Cambridge, UK, for writing and editorial assistance, which was funded by UCB.
Funding: This work was supported by UCB Pharma.

Disclosure statement: A.v.T. has received grant/research support from Pfizer and Roche, is a member of a speakers’ bureau for AbbVie, MSD, UCB and Pfizer and has consultancies with AbbVie, Pfizer, UCB, Jansen-Cilag and MSD. P.M.B. is a consultant for UCB Pharma. C.G. is an employee of UCB Pharma and holds stock in UCB Pharma.

Supplementary data

Supplementary data are available at Rheumatology Online.

References

1 Rudwaleit M, Landewé R, van der Heijde D et al. The development of Assessment of SpondyloArthritis international Society classification criteria for axial spondyloarthritis (part I): classification of paper patients by expert opinion including uncertainty appraisal. Ann Rheum Dis 2009;68:770–6.
2 Rudwaleit M, van der Heijde D, Landewé R et al. The development of Assessment of SpondyloArthritis international Society classification criteria for axial spondyloarthritis (part II): validation and final selection. Ann Rheum Dis 2009;68:777–83.
3 van Tubergen A, Weber U. Diagnosis and classification in spondyloarthritis: identifying a chameleon. Nat Rev Rheumatol 2012;8:253–61.
4 Rudwaleit M, Haibel H, Baraliakos X et al. The early disease stage in axial spondylarthritis: results from the German Spondyloarthritis Inception Cohort. Arthritis Rheum 2009;60:717–27.
5 Braun J, Baraliakos X, Godolias G, Bohm H. Therapy of ankylosing spondylitis - a review. Part I: conventional medical treatment and surgical therapy. Scand J Rheumatol 2005;34:97–108.
6 Gran JT, Husby G, Hordvik M. Prevalence of ankylosing spondylitis in males and females in a young middle-aged population of Tromsø, northern Norway. Ann Rheum Dis 1985;44:359–67.
7 van der Linden SM, Valkenburg HA, de Jongh BM, Cats A. The risk of developing ankylosing spondylitis in HLA-B27 positive individuals. A comparison of relatives of spondylitis patients with the general population. Arthritis Rheum 1984;27:241–9.
8 Strand V, Rao SA, Shillington AC et al. Prevalence of axial SpA in US rheumatology practices: assessment of ASAS criteria vs. rheumatology expert clinical diagnosis. Arthritis Care Res 2013;65:1299–306.
9 van der Heijde D, Kivitz A, Schiff MH et al. Efficacy and safety of adalimumab in patients with ankylosing spondylitis: results of a multicenter, randomized, double-blind, placebo-controlled trial. Arthritis Rheum 2006;54:2136–46.
10 Haibel H, Rudwaleit M, Listing J et al. Efficacy of adalimumab in the treatment of axial spondylarthritis without radiographically defined sacroiliitis: results of a twelve-week randomized, double-blind, placebo-controlled trial followed by an open-label extension up to week fifty-two. Arthritis Rheum 2008;58:1981–91.
11 Sieper J, Rudwaleit M, Baraliakos X et al. The Assessment of SpondyloArthritis international Society (ASAS) handbook: a guide to assess spondyloarthritis. Ann Rheum Dis 2009;68(Suppl 2):ii1–44.
12 Zochling J. Measures of symptoms and disease status in ankylosing spondylitis: Ankylosing Spondylitis Disease Activity Score (ASDAS), Ankylosing Spondylitis Quality of Life Scale (ASQoL), Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), Bath Ankylosing Spondylitis Functional Index (BASFI), Bath Ankylosing Spondylitis Global Score (BAS-G), Bath Ankylosing Spondylitis Metrology Index (BASMI), Dougados Functional Index (DFI), and Health Assessment Questionnaire for the Spondylarthropathies (HAQ-S). Arthritis Care Res 2011;63(Suppl 11):S47–58.
13 Barkham N, Keen HI, Coates LC et al. Clinical and imaging efficacy of infliximab in HLA-B27-positive patients with magnetic resonance imaging-determined early sacroiliitis. Arthritis Rheum 2009;60:946–54.
14 US Food and Drug Administration. Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims. Fed Reg 2009;74:65132–3.
15 Landewé R, Braun J, Deodhar A et al. Efficacy of certolizumab pegol on signs and symptoms of axial spondyloarthritis including ankylosing spondylitis: 24-week results of a double-blind randomised placebo-controlled phase 3 study. Ann Rheum Dis 2014;73:39–47.
16 van der Linden S, Valkenburg HA, Cats A. Evaluation of diagnostic criteria for ankylosing spondylitis. A proposal for modification of the New York criteria. Arthritis Rheum 1984;27:361–8.
17 Strand V, Smolen JS, van Vollenhoven RF et al. Certolizumab pegol plus methotrexate provides broad relief from the burden of rheumatoid arthritis: analysis of patient-reported outcomes from the RAPID 2 trial. Ann Rheum Dis 2011;70:996–1002.
18 Garrett S, Jenkinson T, Kennedy LG et al. A new approach to defining disease status in ankylosing spondylitis: the Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol 1994;21:2286–91.
19 Sieper J, van der Heijde D, Landewé R et al. New criteria for inflammatory back pain in patients with chronic back pain: a real patient exercise by experts from the Assessment of SpondyloArthritis international Society (ASAS). Ann Rheum Dis 2009;68:784–8.
20 van der Heijde D, Dougados M, Davis J et al. Assessment in Ankylosing Spondylitis International Working Group/Spondylitis Association of America recommendations for conducting clinical trials in ankylosing spondylitis. Arthritis Rheum 2005;52:386–94.
21 Maruish M. User’s Manual for the SF-36v2 Health Survey. 2011.
22 van der Heijde D, Lie E, Kvien TK et al. ASDAS, a highly discriminatory ASAS-endorsed disease activity score in patients with ankylosing spondylitis. Ann Rheum Dis 2009;68:1811–8.
23 Cole BF, Lee M-LT. Statistical Methods for Quality of Life Studies: Design, Measurements and Analysis. 2002.
24 Machado P, Landewé R, Lie E et al. Ankylosing Spondylitis Disease Activity Score (ASDAS): defining cut-off values for disease activity states and improvement scores. Ann Rheum Dis 2011;70:47–53.
25 Cohen J. A power primer. Psychol Bull 1992;112:155–9.
26 Hurst H, Bolton J. Assessing the clinical significance of change scores recorded on subjective outcome measures. J Manipulative Physiol Ther 2004;27:26–35.
27 Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 2003;41:582–92.
28 Landewé R, Braun J, Deodhar A et al. Efficacy of certolizumab pegol on signs and symptoms of axial spondyloarthritis including ankylosing spondylitis: 24-week results of a double-blind randomised placebo-controlled phase 3 study. Ann Rheum Dis 2014;73:39–47.
29 Machado P, van der Heijde D. How to measure disease activity in axial spondyloarthritis? Curr Opin Rheumatol 2011;23:339–45.
30 Piva SR, Gil AB, Moore CG, Fitzgerald GK. Responsiveness of the activities of daily living scale of the knee outcome survey and numeric pain rating scale in patients with patellofemoral pain. J Rehabil Med 2009;41:129–35.
31 Cook KF, Roddey TS, Gartsman GM, Olson SL. Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function. Med Care 2003;41:823–35.
32 Pavy S, Brophy S, Calin A. Establishment of the minimum clinically important difference for the bath ankylosing spondylitis indices: a prospective study. J Rheumatol 2005;32:80–5.
33 Strand V, Scott DL, Emery P et al. Physical function and health related quality of life: analysis of 2-year data from randomized, controlled studies of leflunomide, sulfasalazine, or methotrexate in patients with active rheumatoid arthritis. J Rheumatol 2005;32:590–601.
34 Dworkin RH, Turk DC, Wyrwich KW et al. Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations. J Pain 2008;9:105–21.
35 Sieper J, Kivitz A, van Tubergen A, Deodhar A, Coteur G, Woltering F, Landewé R. Impact of certolizumab pegol on patient-reported outcomes in patients with axial spondyloarthritis. Arthritis Care Res 2015; doi: 10.1002/acr.22594.