RHEUMATOLOGY
Rheumatology 2015;54:1842–1851
doi:10.1093/rheumatology/kev125
Advance Access publication 21 May 2015
Original article
Are patient-reported outcome instruments for
ankylosing spondylitis fit for purpose for the axial
spondyloarthritis patient? A qualitative and
psychometric analysis
Astrid van Tubergen1, Peter M. Black2 and Geoffroy Coteur3
Abstract
Objectives. Several patient-reported outcome (PRO) instruments have been validated in AS. This study
aims to evaluate several measurement properties of such PROs in a broad axial SpA (axSpA) population,
including both AS and non-radiographic axSpA (nr-axSpA) subpopulations.
Methods. PROs assessed were total and nocturnal back pain, patient global assessment of disease
activity, BASDAI, BASFI and the 36-item Short Form Health Survey. A literature review and both clinician
and patient qualitative interviews provided information on instrument content validity. Reliability
(test–retest and internal consistency), construct validity (PROs, clinical-outcome correlations and known-groups validity) and PRO responsiveness were assessed. Data from the RAPID-axSpA trial (NCT01087762)
investigating certolizumab pegol efficacy in axSpA, including relevant subpopulations, were utilized.
Results. Concepts identified for the broad axSpA population by both clinician and patient interviews were
consistent with those identified through literature review of AS. All PROs demonstrated reliability in
the RAPID-axSpA population (n = 325), with test–retest intraclass correlation coefficients and internal consistency Cronbach’s α >0.8. Validity was supported by agreement between PROs and clinician-rated
measures; except for the 36-item Short Form Health Survey Mental Components Summary, correlations
between PROs and physician global assessment of disease activity ranged from 0.28 to 0.42 for week 0
and from 0.53 to 0.65 for week 24. PRO measures showed good sensitivity to change (effect size >0.8) at
weeks 12 and 24 for responders. No variations in measurement properties were noted between the
subpopulations.
Conclusion. This study indicates that both content validity and measurement properties of PRO instruments utilized in AS are preserved in the broad axSpA population.
Key words: axial spondyloarthritis, anti-TNF therapy, certolizumab pegol, patient-reported outcomes, validation, non-radiographic axSpA, ankylosing spondylitis, RAPID-axSpA.
Rheumatology key messages
• Commonly used patient-reported outcome instruments were shown to be valid in the broad axial SpA population, including AS and non-radiographic axial SpA.
• Validity for patient-reported outcomes in axial SpA was supported by strong associations between patient-reported outcomes and clinician-rated measures.
• Proposed minimal clinically important differences for patient-reported outcomes were similar regardless of axial SpA patient radiographic status.
1Department of Medicine, Division of Rheumatology, Maastricht
University Medical Center, Maastricht, The Netherlands, 2Clinical
Outcome Assessments Consulting, ERT, Philadelphia, PA, USA and
3Global Health Outcomes Research, UCB Pharma, Brussels, Belgium
Submitted 15 May 2014; revised version accepted 20 March 2015
Correspondence to: Astrid van Tubergen, Department of Medicine,
Division of Rheumatology, Maastricht University Medical Centre,
Maastricht, The Netherlands. E-mail: [email protected]
© The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: [email protected]
RAPID-axSpA: validation of PRO instruments
Introduction
Axial spondyloarthritis (axSpA), a chronic, inflammatory
rheumatic disease, comprises a spectrum of disease
states including patients with definitive radiographic
damage in the sacroiliac joints (AS) and a subgroup characterized by little or no change on plain radiographs,
referred to as non-radiographic axSpA (nr-axSpA). The
most frequently investigated patient subgroup is AS, as
classification criteria to include the nr-axSpA subgroup
were only recently developed [1, 2]. AS affects approximately twice as many men as women and has a disease
onset usually in the second or third decade of life. The nr-axSpA subpopulation shows a more balanced gender
ratio [3, 4]. The prevalence of AS worldwide ranges from
0.1% to 1.4% [5–7]; the prevalence of the broad axSpA
population in the USA was recently shown to be 0.7% [8].
The majority of studies investigating outcomes in axSpA
have been performed on the AS subpopulation. Disability
in AS is related to both the degree of inflammatory activity
causing pain, stiffness, fatigue and poor quality of sleep
and the degree of bony ankylosis causing loss of spinal
mobility. Due to these symptoms, AS also has a significant
impact on patient health-related quality of life (HRQoL) as
measured by patient-reported outcomes (PROs). During
the early stages of disease, disability is mostly determined
by inflammatory activity, whereas in long-standing disease, both inflammation and ankylosis contribute to
disability.
Some recent studies suggest that the patient burden of
disease is similar between the AS and nr-axSpA subpopulations [4, 9, 10]. However, there is a lack of qualitative
comparisons of concepts impacted by the disease in each
patient subpopulation. Therefore, while many PRO measures have been developed, validated and used in AS [11,
12], only limited data support their use in the entire disease spectrum of axSpA, particularly in the clinical research arena [10, 13].
The aim of this analysis was to qualitatively and quantitatively assess several aspects of validity and reliability of
these PRO instruments used in the certolizumab pegol
(CZP) RAPID-axSpA trial in a broad axSpA population
with objective signs of inflammation, including both AS
and nr-axSpA patients. We hypothesized that the PROs
would perform similarly in both disease subpopulations.
Assessment of reliability, validity and other psychometric
characteristics followed the principles outlined in the 2009
US Food and Drug Administration PRO guidance document [14].
Methods
This was a two-part study assessing the validity and reliability of PROs, previously validated in AS, in a broader
axSpA population. First, a qualitative analysis was carried
out to confirm the concepts included in the existing PROs
and their applicability to a wider axSpA patient population
(content validity). Second, the measurement properties
of the PROs in the broad axSpA population as well
as AS and nr-axSpA subpopulations were analysed
www.rheumatology.oxfordjournals.org
quantitatively. Measurement properties assessed were
test–retest reliability and internal consistency, construct
and known-groups validity as well as sensitivity to
change and minimal clinically important difference (MCID).
Qualitative assessment of content validity
To assess whether the concepts measured by existing
PROs in AS were relevant to the experience of the
broad axSpA population, a systematic literature review
and semi-structured interviews of both clinicians and patients were conducted. These aimed to identify the relevant signs, symptoms and subsequent impacts for axSpA
patients and whether these concepts were similar in both
the AS and nr-axSpA subpopulations.
The systematic literature review was conducted in
Medline. The search strategy consisted of a combination
of free terms and controlled vocabulary terms relating to
axSpA and PROs (see supplementary data, section on
search terms used for systematic review, available at
Rheumatology Online). Publications that discussed the
use and psychometric properties of relevant instruments
were identified. Additional searching for references was
also performed. When possible, instruments were also reviewed in the PRO and Quality of Life Instruments
Database for additional information.
Following the literature review, and independent of it, patient and clinician interviews were carried out. Clinician
interviews were conducted with four experts in the field of
axSpA. Each expert was interviewed independently and
was asked to indicate the most important symptoms and
their impact on patients with axSpA. Similarly, 15 axSpA
patients underwent semi-structured concept elicitation
interviews to identify the relevant signs, symptoms and
subsequent impact of axSpA on their HRQoL. Patient
interviews were conducted in waves by a trained senior
interviewer. The interview waves continued until concept
saturation was reached (i.e. no new concepts were
identified).
Inclusion and exclusion criteria for the patients interviewed were aligned with those of the RAPID-axSpA clinical trial of CZP in axSpA [15]: patients were ≥18 years
of age with a diagnosis of axSpA of ≥3 months’ duration, as defined by fulfilment of the Assessment of
SpondyloArthritis international Society (ASAS) axSpA criteria [1]. This study was performed in accordance with the
ethical principles that have their origin in the Declaration of
Helsinki and are consistent with Good Clinical Practice
and applicable regulatory requirements. All patients provided informed consent. An application for central institutional review board approval was submitted and approved
at all sites. Ethical approval for the study was provided by
the Quorum Institutional Review Board. All recruited patients had active disease, defined by a BASDAI ≥4, back
pain ≥4 (on a 0–10 numerical rating scale) and either CRP
level greater than the upper limit of normal (7.9 mg/l) or
sacroiliitis on MRI according to the ASAS definition [11].
Eligible patients must have previously had an inadequate
response to or been intolerant to one or more NSAID.
Astrid van Tubergen et al.
Patients were excluded if they had received previous
treatment for axSpA with more than two biologics, previous treatment with two or more TNF inhibitors or had a primary failure to respond to TNF inhibitor therapy. Patients
were excluded when total spinal ankylosis was present.
Stratification parameters of the study stated that at least
50% of patients had to meet both the ASAS and modified
New York (mNY) criteria for AS [16]. Of the 15 patients
interviewed, 9 (60%) had AS and 6 had nr-axSpA.
Patients came from two sites in the USA and their overall characteristics were intentionally diverse in terms of
age, gender and ethnicity. A concept elicitation interview
guide was used and the disease concepts derived from
the patient interviews were evaluated against the findings
of the literature review and expert interviews.
Quantitative assessment of psychometric properties
Patients and study design
Measurement properties were assessed using data from
the RAPID-axSpA trial (NCT01087762) [15], an international
study on the safety and efficacy of CZP in axSpA patients
with objective signs of inflammation. CZP is a PEGylated
Fc-free anti-TNF that has been demonstrated to improve
PROs in RA [17]. The ongoing RAPID-axSpA trial is double
blind and placebo controlled to week 24 and dose blind to
week 48. These analyses used data from all 325 axSpA
patients enrolled in the RAPID-axSpA trial; of these, 178
were diagnosed with AS and 147 with nr-axSpA.
Outcomes assessed
The PROs assessed for aspects of reliability and validity
were the BASDAI (six items, scale 0–10) [18], fatigue (item
1 from the BASDAI, scale 0–10), morning stiffness (items 5
and 6 from the BASDAI, scale 0–10), total and nocturnal
back pain (one item each, scale 0–10) [19, 20], patient
global assessment of disease activity (PtGADA; one item,
scale 0–10) [20], BASFI (10 items, scale 0–10) [20], 36-item
Short Form Health Survey (SF-36) physical component
score (PCS; 21 items, 4 domains, scale 0–100), SF-36
mental component score (MCS; 14 items, 4 domains,
scale 0–100) and SF-36 physical functioning (PF) domain
(10 items, scale 0–100) [21]. The measurement properties
of these PROs were assessed in the broad axSpA population and the AS and nr-axSpA subpopulations.
The clinical outcomes reported in this article are the
AS DAS (ASDAS) [22] and physician global assessment
of disease activity (PhGADA; one item, scale 0–100)
change from baseline.
Data handling and statistical analysis
Missing items were imputed to calculate the domain
scores according to the authors’ recommendations
for each respective questionnaire. However, missing
domain scores were not imputed (observed case analysis). Test–retest reliability of the PROs was tested with
intraclass correlation coefficients in a population of patients expected not to change significantly between the
first and second assessments. This population was therefore defined as patients with a change in the PtGADA of
±1 point (10-point numerical rating scale) and a PhGADA
rating within ±15 scale units (0–100 mm visual analogue
scale) between weeks 12 and 16 (stable phase) of the
RAPID-axSpA study. For instruments composed of more
than two items, internal consistency was measured using
Cronbach’s α, whereas Spearman’s rank correlation was
used for instruments with two items. No internal consistency measure was applicable for single-item instruments.
For internal consistency, the threshold for
Cronbach’s α was defined as satisfactory when >0.7, as
good when >0.8 and as excellent when >0.9 [23].
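The reliability criteria above can be illustrated with a minimal Python sketch. It is not the trial's analysis code: the function names, the data layout (one list of item scores per patient), the inclusive stability bounds and the use of the sample (n − 1) variance are all assumptions made for the example.

```python
from statistics import variance


def is_stable(ptgada_w12, ptgada_w16, phgada_w12, phgada_w16):
    """Stable between weeks 12 and 16: PtGADA within ±1 point and
    PhGADA within ±15 scale units (inclusive bounds are assumed)."""
    return (abs(ptgada_w16 - ptgada_w12) <= 1
            and abs(phgada_w16 - phgada_w12) <= 15)


def cronbach_alpha(rows):
    """Cronbach's alpha for a multi-item instrument.

    rows: one list of item scores per patient (hypothetical layout).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across patients, then variance of the sum score.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def rate_alpha(alpha):
    """Apply the thresholds quoted in the text (>0.7, >0.8, >0.9)."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "satisfactory"
    return "below threshold"
```

Because the thresholds are strict inequalities, a value of exactly 0.8 is rated satisfactory rather than good in this sketch.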
Construct validity was assessed by testing Pearson’s
correlation coefficients between PROs and clinical outcomes (ASDAS and PhGADA) at weeks 0, 12 and 24.
Known-groups validity for PROs used effect sizes calculated as the difference between patients with high and
low disease activity (PhGADA median score or ASDAS 2.1
score [24] cut-offs) divided by the S.D. of the full sample at
the appropriate time point. Scores were rescaled so that a
positive effect size corresponded to a more severe disease
state; an effect size >0.8 was considered large [25].
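As a hedged illustration of the known-groups calculation described above, the sketch below splits patients at a clinical cut-off and divides the difference in mean PRO score by the S.D. of the full sample. The function name, the choice to place patients exactly at the cut-off in the high-activity group and the use of the sample (n − 1) S.D. are assumptions of the example, not details taken from the study.

```python
from statistics import mean, stdev


def known_groups_effect_size(pro_scores, clinical_scores, cutoff):
    """(mean PRO in high-activity group - mean PRO in low-activity group)
    divided by the S.D. of the PRO in the full sample.

    Patients with a clinical score >= cutoff are treated as high activity
    (an assumption; the text does not state how ties are handled).
    """
    pairs = list(zip(pro_scores, clinical_scores))
    high = [pro for pro, clin in pairs if clin >= cutoff]
    low = [pro for pro, clin in pairs if clin < cutoff]
    if not high or not low:
        raise ValueError("both groups must contain at least one patient")
    return (mean(high) - mean(low)) / stdev(pro_scores)


# Toy data only: four patients, PhGADA-style cut-off of 50.
es = known_groups_effect_size([2, 4, 6, 8], [10, 20, 70, 80], cutoff=50)
is_large = es > 0.8  # threshold for a large effect quoted in the text
```

For instruments on which higher scores mean better health (such as the SF-36), the sign would need to be flipped first so that a positive value corresponds to a more severe disease state, as the text describes.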
Sensitivity to change was assessed by comparing the
PRO response of clinical endpoint responders (PhGADA
improvement ≥10 from baseline), stable patients
(PhGADA change from baseline <10) and non-responders
(PhGADA deterioration ≥10 from baseline) using the effect
size (calculated as the change from week 0 to week 24
divided by the S.D. of the value for week 0) and Guyatt’s
responsiveness statistic (calculated as the change from
week 0 to week 24 divided by the S.D. of change for
stable patients in the same period) [23].
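The two responsiveness statistics defined above can be sketched as follows. This is an illustrative example under stated assumptions: the function names are hypothetical, improvement is taken as the week 0 value minus the week 24 value (appropriate for scales where lower is better), and the sample (n − 1) S.D. is used.

```python
from statistics import mean, stdev


def classify_response(phgada_improvement):
    """Group a patient by PhGADA improvement from baseline
    (positive = improvement), using the cut-offs in the text."""
    if phgada_improvement >= 10:
        return "responder"
    if phgada_improvement <= -10:
        return "non-responder"
    return "stable"


def effect_size(week0, week24):
    """Mean change from week 0 to week 24 divided by the S.D. at week 0."""
    changes = [a - b for a, b in zip(week0, week24)]
    return mean(changes) / stdev(week0)


def guyatt_statistic(group_changes, stable_changes):
    """Mean change in a group divided by the S.D. of change
    among stable patients over the same period."""
    return mean(group_changes) / stdev(stable_changes)
```

The only difference between the two statistics is the denominator: the effect size scales change by between-patient variability at baseline, whereas Guyatt's statistic scales it by the variability of change among patients who were clinically stable.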
The MCID for each instrument was also determined
using the median and range of values derived by anchor-based and distribution-based approaches. The anchor-based method used a patient global impression of
change (PGIC) [26]; this was not evaluated in its own
right, but was used to assess meaningful change in other
measures. The PGIC is a single-item, 7-point scale for the
change from baseline for a condition using the following
categories: marked improvement, moderate improvement,
slight improvement, no change, slight worsening, moderate
worsening, marked worsening. Anchor-based methods
used patients achieving a slight or slight to moderate
change compared with those demonstrating no change in
PGIC as the a priori definition of a clinically meaningful
change. A regression coefficient between the PGIC and
each PRO was determined and the receiver operating characteristic approach (used to determine the cut-off value of
each PRO that provided the best discrimination between
external criteria) was also performed. The distribution-based approach defined the MCID as 0.5 S.D. from the
baseline score for each of the PROs investigated [27].
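A minimal sketch of how MCID estimates of this kind might be combined is given below. It is an assumption-laden illustration rather than the study's method: the anchor-based estimate is taken here as the difference in mean change between an improved PGIC group and the no-change group, the regression and receiver operating characteristic estimates are omitted, and all function names are hypothetical.

```python
from statistics import mean, median, stdev


def distribution_mcid(baseline_scores):
    """Distribution-based estimate: 0.5 S.D. of the baseline score."""
    return 0.5 * stdev(baseline_scores)


def anchor_mcid(changes, pgic, improved=("slight improvement",)):
    """Anchor-based estimate (assumed form): mean change in patients whose
    PGIC category is in `improved` minus mean change in 'no change' patients."""
    better = [c for c, g in zip(changes, pgic) if g in improved]
    same = [c for c, g in zip(changes, pgic) if g == "no change"]
    if not better or not same:
        raise ValueError("both anchor groups must contain patients")
    return mean(better) - mean(same)


def summarize(estimates):
    """Report the median and range across the available estimates."""
    return median(estimates), min(estimates), max(estimates)
```

Passing `improved=("slight improvement", "moderate improvement")` would give the slight-or-moderate variant of the anchor group.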
Results
Qualitative assessment of content validity
Patients
A total of 15 axSpA patients participated in the concept
validation interviews, including 9 patients who also met
the mNY criteria for AS (60%). Interviewed patients
ranged in age from 28 to 63 years, with a median age of
51 years. Seven patients (47%) were men and 12 (80%)
were HLA-B27+.
Content validity
Of the concepts identified in the patient interviews, 55
(86%) were identified in the first wave of interviews, with
5 new concepts in wave 2 and 1 new concept in wave 3.
No new concepts were identified in wave 4 (supplementary Table S1, available at Rheumatology Online). Since no
concepts were identified in the fourth and final wave, saturation of concepts was determined to be achieved and
no new patients were interviewed. Clinicians were interviewed once and their responses separated into the relevant concepts that were mentioned.
The concepts described in both patient and clinician
interviews were then compared with those that resulted
from the systematic literature review. Concepts identified
by all methods included pain (including back pain, hip pain
and neck pain), morning stiffness, fatigue, sleep disturbances, physical function and problems performing daily
and recreational activities. In addition to these, the literature reported difficulty with work-related, self-care and
household tasks, mobility problems, disease activity,
problems with social life/family relationships and
depression.
Both the clinician and patient interviews also revealed
further concepts other than the common concepts listed
above. For clinician interviews these were joint pain and
swelling, difficulty performing physically demanding activities and low vitality levels. Concepts identified only by
patients included difficulty travelling and driving, limitations in physical activity, mobility problems and the emotional impact of axSpA. These additional concepts were
identified in interviews with both AS and nr-axSpA patients, underlining the similarity of these diseases in
terms of disease burden.
Quantitative assessment of psychometric properties
Patients
Since the qualitative assessment suggested that similar
disease-related issues impact the HRQoL of both AS
and nr-axSpA patients, the measurement properties of
PROs previously validated in AS were analysed in a
wider axSpA population; specifically, a set of 325
axSpA patients with objective signs of inflammation
(RAPID-axSpA population) [28]. Of these patients, 178
(55%) met the mNY criteria for AS, with the other 147
(45%) patients being diagnosed with nr-axSpA. Baseline
demographics were similar between the AS and nr-axSpA
subgroups, with the exception of gender ratio (Table 1).
Patients meeting the mNY criteria for AS also reported
longer symptom duration compared with nr-axSpA
patients (11.9 and 8.6 years, respectively). A total of
163 axSpA patients (AS, n = 91; nr-axSpA, n = 72) from
the RAPID-axSpA trial met the criteria for stable disease and were included in the test–retest reliability
analysis.
Data completeness
For the BASFI, BASDAI, total back pain score and
PtGADA, the level of missing items was very low (<1%
in all cases). Slightly higher levels of missing data were
seen for the nocturnal back pain score (<2% in all cases),
PGIC (1.9% at week 12, 2.3% at week 24), SF-36 PF
(1.8% at week 0, 2.6% at week 12, 1.7% at week 24)
and SF-36 component summaries (7.1% at week 0,
8.7% at week 12, 5.7% at week 24). For the SF-36, Last
Observation Carried Forward imputation for missing components was possible in most cases. After imputation of
missing items, ≥97.5% of domain scores at weeks 0, 12
and 24 were evaluable across PRO measures.
Disease burden
At baseline, mean symptom scores indicated a substantial
level of disease burden (Table 1). Similar mean scores
were reported by AS and nr-axSpA patients. However,
the BASFI showed somewhat higher scores (i.e. more disability) in AS patients compared with nr-axSpA patients
(5.7 vs 4.9).
Reliability
The test–retest reliability of all PRO instruments was satisfactory in the assessed stable subgroup of the overall
RAPID-axSpA population and in each subpopulation
(Table 2). The internal consistency of the PROs was
good, as indicated by a Cronbach’s α >0.8 for all instruments with more than two items (BASFI, BASDAI and the
SF-36 measures), and moderate to good correlations
were found between BASDAI items 5 and 6 (morning stiffness) of 0.53 (week 0) and 0.88 (week 24). Similar results
were observed in the AS and nr-axSpA subpopulations
(Table 2).
Construct and known-groups validity
Correlations were calculated between PROs and clinical
outcomes (i.e. ASDAS and PhGADA). The correlation between the PROs investigated, excluding SF-36 MCS, and
the PhGADA ranged from 0.28 to 0.42 for week 0, 0.58 to
0.66 for week 12 and 0.53 to 0.65 for week 24; correlations with ASDAS ranged from 0.38 to 0.66 for week 0,
0.61 to 0.80 for week 12 and 0.60 to 0.77 for week 24
(Table 3).
Known-groups analysis was also undertaken by calculating the effect sizes in PROs between subgroups of patients based on their clinical health status (Table 4).
Although effect sizes were smaller for week 0, at weeks
12 and 24 the effect sizes for all PROs (excluding SF-36
MCS) were large (0.80–1.30). Similar results were found in
both AS and nr-axSpA patients (data not shown), suggesting that all PRO instruments (excluding SF-36 MCS) correlated with clinical outcomes in these subpopulations.
Sensitivity to change
Both effect sizes and Guyatt’s responsiveness statistic
demonstrated a good sensitivity of PROs (with the exception of SF-36 MCS) to change in the PhGADA in axSpA
patients, with effect sizes >0.8 and Guyatt’s responsiveness statistic values in the range of 1.0–2.0 (Table 5).
TABLE 1 Baseline demographics and clinical characteristics of RAPID-axSpA study population

Characteristic                                axSpA (n = 325)     AS (n = 178)        nr-axSpA (n = 147)
Male, %                                       61.5                72.5                48.3
Age, years                                    39.6 (11.9)         41.5 (11.6)         37.4 (11.8)
White, %                                      90.2                89.3                91.2
Symptom duration, years                       10.4 (9.5)          11.9 (9.9)          8.6 (8.6)
Positive for HLA-B27, n (%)                   255 (78.5)          145 (81.5)          110 (74.8)
Prior TNF inhibitor exposure, %               16.0                20.2                10.9
CRP^a, median (minimum–maximum), mg/l         13.9 (0.1–174.8)    14.3 (0.1–174.8)    11.9 (0.1–156.2)
ASDAS^b                                       3.9 (0.9)           4.0 (0.9)           3.8 (0.9)
BASDAI^b                                      6.4 (1.6)           6.4 (1.6)           6.5 (1.5)
Fatigue^b                                     6.7 (1.9)           6.6 (2.0)           6.7 (1.8)
Morning stiffness^b                           6.6 (1.9)           6.6 (1.9)           6.6 (1.9)
Total back pain^b                             7.0 (1.9)           7.1 (1.9)           7.0 (1.9)
Nocturnal back pain^b                         6.9 (2.3)           6.9 (2.2)           6.9 (2.5)
PtGADA^b                                      7.0 (2.1)           7.0 (2.1)           7.0 (2.1)
PhGADA^b                                      61.4 (17.2)         61.6 (17.6)         61.3 (16.8)
BASFI^b                                       5.4 (2.3)           5.7 (2.2)           4.9 (2.3)
SF-36 MCS^b                                   40.5 (12.1)         40.9 (12.0)         40.0 (12.3)
SF-36 PCS^b                                   32.5 (7.5)          32.0 (7.3)          33.1 (7.8)
SF-36 PF^b                                    34.9 (9.3)          34.3 (9.1)          35.7 (9.6)
MASES^c                                       5.1 (3.4)           4.8 (3.5)           5.4 (3.4)
Peripheral arthritis^d, n (%)                 124 (38.2)          62 (34.8)           62 (42.2)
Heel enthesitis^e, n (%)                      117 (36.0)          62 (34.8)           55 (37.4)
Uveitis^e, n (%)                              69 (21.2)           37 (20.8)           32 (21.8)
Psoriasis^e, n (%)                            20 (6.2)            7 (3.9)             13 (8.8)
Crohn’s disease/ulcerative colitis^e, n (%)   18 (5.5)            11 (6.2)            7 (4.8)

Except where indicated otherwise, values are given as mean (S.D.). ^a Normal range of CRP <8.0 mg/l. ^b In the FAS population:
n = 324 (axSpA), 178 (AS), 146 (nr-axSpA). ^c Shown for those patients with at least one affected joint at baseline: n = 229
(axSpA), 122 (AS), 107 (nr-axSpA). ^d Defined as at least one swollen joint in the 44-joint assessment. ^e Extra-spinal features of
axSpA are taken from patients’ ASAS classification criteria screening assessments and show both patient history or current
diagnosis. ASDAS: AS Disease Activity Score; ASAS: Assessment of SpondyloArthritis international Society; axSpA: axial
SpA; FAS: full analysis set; MASES: Maastricht Ankylosing Spondylitis Enthesitis Score; nr-axSpA: non-radiographic axSpA;
PhGADA: physician’s global assessment of disease activity; PtGADA: patient’s global assessment of disease activity; SF-36:
36-item Short Form Health Survey; MCS: Mental Component Summary; PCS: Physical Component Summary; PF: Physical
Functioning.
TABLE 2 Test–retest between weeks 12 and 16, and internal consistency of PROs

                       Test–retest reliability                   Internal consistency
                       axSpA         AS            nr-axSpA      axSpA (n = 325)      AS (n = 178)         nr-axSpA (n = 147)
Instrument             (n = 163)^a   (n = 91)^a    (n = 72)^a    Week 0    Week 24    Week 0    Week 24    Week 0    Week 24
BASDAI^b               0.93          0.92          0.93          0.82      0.94       0.83      0.94       0.79      0.95
Fatigue                0.85          0.84          0.86          NA        NA         NA        NA         NA        NA
Morning stiffness^c    0.91          0.90          0.93          0.54      0.88       0.53      0.87       0.56      0.89
Total back pain        0.94          0.93          0.95          NA        NA         NA        NA         NA        NA
Nocturnal back pain    0.92          0.90          0.93          NA        NA         NA        NA         NA        NA
BASFI^b                0.94          0.93          0.96          0.94      0.97       0.93      0.97       0.94      0.97

Test–retest reliability was assessed with intraclass correlation coefficients. ^a Patients with PtGADA rating within ±1 point (10-point scale) and PhGADA rating within ±15 scale units (0–100 mm visual analogue scale) at weeks 12 and 16. Internal consistency was assessed with either ^b Cronbach’s α (>2 item scales) or ^c Spearman’s rank (2 item scales). The threshold for
Cronbach’s α was defined as satisfactory when >0.7, as good when >0.8 and as excellent when >0.9. axSpA: axial SpA; NA:
no internal consistency measure was possible for instruments with one item; nr-axSpA: non-radiographic axSpA; PhGADA:
physician’s global assessment of disease activity; PRO: patient-reported outcome; PtGADA: patient’s global assessment of
disease activity.
TABLE 3 Correlations between PROs and clinical measures at weeks 0, 12 and 24 for the overall axSpA population

                       PhGADA                           ASDAS
Measure                Week 0    Week 12    Week 24     Week 0    Week 12    Week 24
BASDAI                 0.39      0.66       0.64        0.66      0.80       0.75
Fatigue                0.28      0.59       0.54        0.41      0.63       0.60
Morning stiffness      0.33      0.58       0.58        0.54      0.73       0.64
Total back pain        0.32      0.66       0.64        0.44      0.77       0.71
Nocturnal back pain    0.30      0.62       0.65        0.43      0.75       0.71
PtGADA                 0.38      0.64       0.58        0.62      0.78       0.77
BASFI                  0.42      0.66       0.60        0.52      0.72       0.66
SF-36 MCS              0.17      0.35       0.36        0.08      0.37       0.33
SF-36 PCS              0.40      0.65       0.59        0.45      0.65       0.65
SF-36 PF               0.37      0.58       0.55        0.39      0.63       0.62

Data shown are Pearson’s correlation coefficients. ASDAS: AS disease activity score; MCS: Mental Component Summary;
PCS: Physical Component Summary; PF: Physical Functioning; PhGADA: physician’s global assessment of disease activity;
PtGADA: patient’s global assessment of disease activity; SF-36: 36-item Short Form Health Survey.
TABLE 4 Known-groups validity between PROs and clinical outcomes (PhGADA and ASDAS) in axSpA

Effect sizes
                       PhGADA median score cut-off                         ASDAS 2.1 score cut-off
                       Week 0            Week 12           Week 24
Instrument             (cut-off 63.0)    (cut-off 32.0)    (cut-off 20.0)    Week 0    Week 12    Week 24
BASDAI                 0.69              1.13              0.95              1.16      1.30       1.30
Fatigue                0.46              0.99              0.79              0.43      1.01       1.02
Morning stiffness      0.54              1.01              0.80              0.96      1.17       1.15
Total back pain        0.56              1.18              0.96              1.12      1.30       1.26
Nocturnal back pain    0.51              1.13              0.95              1.08      1.22       1.28
BASFI                  0.75              1.14              0.99              0.94      1.18       1.14
SF-36 MCS              0.20              0.54              0.46              0.04      0.64       0.54
SF-36 PCS              0.68              1.12              1.05              0.67      1.03       1.07
SF-36 PF               0.65              0.99              0.96              0.73      1.00       1.03

Effect sizes were calculated as the difference between patients with high and low disease activity (as defined by the clinical
cut-offs) divided by the S.D. of the whole sample at the appropriate time point. Scores were rescaled so that a positive effect
size corresponded to a more severe disease state. ASDAS: AS disease activity score; MCS: Mental Component Summary;
PCS: Physical Component Summary; PF: Physical Functioning; PhGADA: physician’s global assessment of disease activity;
PRO: patient-reported outcome; SF-36: 36-item Short Form Health Survey.
Similar results were observed in the AS and nr-axSpA
subpopulations (data not shown).
MCID
A range of values for the MCID in each PRO was
calculated using anchor-based and distribution-based
approaches utilizing the RAPID-axSpA data. As there is
no accepted algorithm for establishing an MCID value
from such results, the median and range for MCID
values ascertained using these approaches were reported
(Table 6). The MCID results for the AS and nr-axSpA
populations were less stable compared with the overall
axSpA population; however, there was no tendency for
the value of the MCID to differ in either direction for the
AS compared with the nr-axSpA subpopulation (Table 6).
Discussion
The present study demonstrated that concepts identified
for the broad axSpA population by clinician and patient
interviews were consistent with those identified through
the literature review in AS and that the measurement
properties of PRO instruments in AS were preserved in
the broad axSpA population, including both the AS and
nr-axSpA subpopulations.
TABLE 5 Responsiveness of the instruments according to PhGADA-defined response

                       Effect size of change                     Guyatt’s responsiveness
                       from week 0 to week 24                    statistic from weeks 0 to 24
Instrument             Responder    Stable    Non-responder      Responder    Stable    Non-responder
BASDAI                 2.15         0.56      0.38               2.05         0.58      0.37
Fatigue                1.56         0.66      0.55               1.41         0.61      0.47
Morning stiffness      1.98         0.43      0.32               2.07         0.52      0.39
Total back pain        1.90         0.35      0.28               1.30         0.26      0.19
Nocturnal back pain    1.74         0.42      0.25               1.27         0.32      0.18
BASFI                  1.09         0.12      0.04               1.26         0.14      0.05
SF-36 MCS              0.46         0.05      0.07               0.49         0.07      0.09
SF-36 PCS              1.31         0.30      0.08               1.76         0.47      0.14
SF-36 PF               0.93         0.11      –0.06              0.91         0.13      0.07

Patients were defined using the following cut-offs: responder: improvement ≥10 from baseline; stable: change from baseline
<10; non-responder: deterioration ≥10 from baseline. Effect size is calculated as the change from weeks 0 to 24 divided by
the S.D. of the value for week 0. Guyatt’s responsiveness statistic is calculated as the change from weeks 0 to 24 divided by
the S.D. of change for stable patients in the same period. MCS: Mental Component Summary; PCS: Physical Component
Summary; PF: Physical Functioning; PhGADA: physician’s global assessment of disease activity; SF-36: 36-item Short Form
Health Survey.
Baseline PRO scores suggested high levels of symptom
burden and impact of disease. The average scores were
>50% of the possible scale maximum for all the disease-specific questions at baseline, with substantial reductions
in scores over the study period, typically in the region of
40–50%. A similar symptom burden, as reported through
PRO scores, was reported in AS and nr-axSpA patients;
only BASFI showed somewhat higher scores (i.e. higher
disability) for AS patients, potentially reflecting the presence of structural damage in these patients.
All measures showed high test–retest reliability in the
overall axSpA population, with correlation values >0.8.
All composite scales also demonstrated good internal
consistency. Validity was supported by agreement between PRO measures and external (clinician-rated) measures. Correlations with the ASDAS were higher in general
than those for the PhGADA, although there is an element
of circularity in these correlations, since the ASDAS includes PRO assessments in its construction. It was
included here because the ASDAS has been found to be
highly discriminatory and sensitive to change, better
reflecting the inflammatory disease processes and more
reliably determining the disease activity status of patients
than the BASDAI or other measures of disease activity in
AS [29]. The PhGADA, being independent of PRO assessments, gave a more independent measure of clinical outcomes. The correlations between PRO measures and
PhGADA were higher than those often found between
clinician and patient-rated measures, particularly at
week 24, and support the validity of the PRO measures
in this patient population.
Patients were classified as responders if they showed
an improvement of ≥10 points on the PhGADA. All the
PRO measures showed good sensitivity to change, with
large response sizes (effect size >0.8) [25] for change
from baseline at weeks 12 and 24 for responders. Effect
sizes for patients classified as non-responders were much
smaller. While there is no standard set of threshold values
for interpreting Guyatt’s responsiveness statistic, the calculated effect size values were similar to those reported
elsewhere [30, 31].
Suggested values for the MCID were derived using an
anchor-based or a distribution-based approach. Values
tended to be in the same range as, or slightly higher than,
values seen in previous studies [32–34]. As the minimum
improvement that a patient would perceive as being
meaningful is likely influenced by their expectations regarding treatment efficacy, these slightly higher values
may reflect higher patient expectations for improvement.
Treatments now offered are more effective than when the
original studies describing MCIDs were published, thus
patient expectations regarding minimum improvements
may be higher. The observed differences between the
current findings and those reported previously did not
appear to be due to different study subpopulations
(Table 6).
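The anchor-based and distribution-based calculations described in the Table 6 footnotes can be sketched as below. This is an illustrative sketch under stated assumptions, not the study's analysis code: change scores are coded so that improvement is positive, the PGIC is coded 0 (no change), 1 (slight improvement) and 2 (moderate improvement), all function names and data are hypothetical, and the ROC-based cut-off is omitted. Summarising the estimates by their median and range follows the "median (range)" heading in Table 6.

```python
from statistics import mean, median, stdev


def mcid_half_sd(baseline_scores):
    """Distribution-based estimate: half the S.D. of baseline scores."""
    return 0.5 * stdev(baseline_scores)


def mcid_anchor_difference(changes, pgic, improved_levels=(1,)):
    """Mean change in improved patients minus mean change in unchanged ones."""
    improved = [c for c, g in zip(changes, pgic) if g in improved_levels]
    unchanged = [c for c, g in zip(changes, pgic) if g == 0]
    return mean(improved) - mean(unchanged)


def mcid_regression_through_origin(changes, pgic):
    """Slope of PRO change on PGIC with the line forced through the origin."""
    return sum(c * g for c, g in zip(changes, pgic)) / sum(g * g for g in pgic)


def summarise(estimates):
    """Median, minimum and maximum of a set of MCID estimates."""
    return median(estimates), min(estimates), max(estimates)


# Hypothetical data for six patients.
baseline = [4.0, 6.0, 5.0, 7.0, 6.0, 8.0]
changes = [0.0, 0.5, 1.5, 2.0, 3.0, 3.5]
pgic = [0, 0, 1, 1, 2, 2]

estimates = [
    mcid_anchor_difference(changes, pgic),                          # slight
    mcid_anchor_difference(changes, pgic, improved_levels=(1, 2)),  # slight or moderate
    mcid_regression_through_origin(changes, pgic),
    mcid_half_sd(baseline),
]
mid, low, high = summarise(estimates)
print(f"MCID estimate: median {mid:.2f} (range {low:.2f} to {high:.2f})")
```

Reporting a median and range across methods, rather than a single figure, makes the spread between approaches visible, which matters for measures such as the SF-36 MCS where the estimates diverge widely.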
Correlations between the SF-36 MCS and clinical outcomes, as well as SF-36 MCS sensitivity to change, were
substantially lower than those for other PRO
measures, including SF-36 physical scores (PCS and
PF). In addition, the SF-36 MCS did not provide a stable
estimate of clinically meaningful change. This may be due
to the loading of SF-36 MCS towards emotional and social
functioning, which are not as closely associated with the
axial symptoms of axSpA captured by clinical outcomes
as the more physical assessments of HRQoL. This does
not necessarily reflect a general problem with the SF-36
since the physical components of the SF-36 performed
well.
The limitations of this study include the use of a clinical
trial population composed of patients with active disease,
which may not accurately represent the entire spectrum
of axSpA patients. However, this patient population
did include both AS and nr-axSpA patients, so
measurement properties for PROs validated in AS
could be compared not only with previously reported
analyses in this subpopulation, but also with the broad
axSpA population. These results therefore suggest that
the PROs reported here are valid for use in the broad
axSpA population, including both AS and nr-axSpA patients. In addition, given that the RAPID-axSpA trial was
conducted across multiple centres in Europe, North
America and Latin America [28], the applicability of the
PROs can be extended to the worldwide axSpA population. In summary, these results provide validation for
future use of these PROs in the broader axSpA population, such as the results reported in this issue concerning
the impact of CZP on PROs in the broader axSpA population [35].

TABLE 6 MCID estimates comparing approaches in the axSpA population: MCID median and range estimates in all populations

Anchor-based methods (change from week 0 to week 12 compared with PGIC level in axSpA^a; first four columns) and distribution-based method in axSpA^a (last column)

PRO measure          Slight          Slight or moderate  Regression      ROC slight or greater  0.5 S.D. of
                     improvement^b   improvement^c       coefficient^d   improvement^e          baseline
BASDAI               1.37            1.85                1.27            1.30                   0.78
Fatigue              1.07            1.50                1.06            1.00                   0.95
Morning stiffness    1.67            2.12                1.41            2.00                   0.95
Total back pain      1.63            2.23                1.46            2.00                   0.95
Nocturnal back pain  1.75            2.38                1.56            2.00                   1.15
PtGADA               1.81            2.22                1.46            2.00                   1.05
BASFI                0.77            1.33                0.96            1.20                   1.12
SF-36 MCS            2.03            3.63                2.43            2.68                   6.13
SF-36 PCS            3.19            5.30                3.93            5.22                   3.81
SF-36 PF             1.98            4.42                3.50            6.31                   4.69

Estimated MCID, median (range), by population, and protocol-defined MCID [reference]

PRO measure          axSpA              AS              nr-axSpA           Protocol-defined MCID [reference]
BASDAI               1.3 (0.8–1.8)      1.4 (0.9–1.8)   1.4 (0.9–1.4)      1.0 [32]
Fatigue              1.1 (1.0–1.5)      1.0 (1.0–1.4)   1.0 (1.0–1.4)      1.0 [32]
Morning stiffness    1.7 (1.0–2.7)      1.8 (1.0–2.1)   1.4 (1.0–2.0)      NA
Total back pain      1.6 (1.0–2.2)      1.5 (1.0–2.4)   1.6 (1.0–2.0)      1.0 [34]
Nocturnal back pain  1.8 (1.1–2.4)      1.5 (1.1–2.3)   2.1 (1.3–2.5)      1.0 [34]
PtGADA               -                  -               -                  -
BASFI                1.1 (0.8–1.3)      1.1 (1.0–1.4)   1.0 (−0.2 to 1.1)  1.0 [32]
SF-36 MCS            2.4 (−2.7 to 6.1)  3.5 (1.8–6.0)   3.2 (1.5–6.2)      2.5 [33]
SF-36 PCS            3.8 (3.2–5.3)      3.6 (2.3–5.3)   4.4 (3.0–5.6)      2.5 [33]
SF-36 PF             4.4 (2.0–6.3)      4.6 (2.4–6.3)   4.5 (1.6–6.3)      5.0 [33]

^a In the overall axSpA population. ^b Difference between mean PRO measure score for patients scoring 1 (slight improvement) on the PGIC and those scoring 0 (no change). ^c Difference
between mean PRO measure score for patients scoring 1 (slight improvement) or 2 (moderate improvement) on the PGIC and those scoring 0 (no change). ^d Regression coefficient for
the PRO measure against PGIC, with the regression line constrained to pass through the origin. ^e Score on the PRO measure giving the best discrimination between patients scoring
≤0 on the PGIC and those scoring ≥1. axSpA: axial SpA; MCID: minimal clinically important difference; MCS: Mental Component Summary; NA: not available; PCS: Physical Component Summary; PF:
Physical Functioning; PGIC: patient global impression of change; PRO: patient-reported outcome; PtGADA: patient’s global assessment of disease activity; ROC: receiver operating
characteristic; SF-36: 36-item Short Form Health Survey.

Conclusion
In conclusion, this study indicates that both the content validity and the measurement properties of PRO instruments
used historically in AS patients are preserved in the broad
population of axSpA patients with objective signs of inflammation, including both AS and nr-axSpA patients.
Acknowledgements
The authors acknowledge Marine Champsaur, UCB
Pharma, Brussels, Belgium, for publication coordination
and Costello Medical Consulting, Cambridge, UK, for writing and editorial assistance, which was funded by UCB.
Funding: This work was supported by UCB Pharma.
Disclosure statement: A.v.T. has received grant/research
support from Pfizer and Roche, is a member of a
speakers’ bureau for AbbVie, MSD, UCB and Pfizer and
has consultancies with AbbVie, Pfizer, UCB, Janssen-Cilag
and MSD. P.M.B. is a consultant for UCB Pharma. G.C. is
an employee of UCB Pharma and holds stock in UCB
Pharma.
Supplementary data
Supplementary data are available at Rheumatology
Online.
References
1 Rudwaleit M, Landewé R, van der Heijde D et al. The development of Assessment of SpondyloArthritis international Society classification criteria for axial
spondyloarthritis (part I): classification of paper patients by
expert opinion including uncertainty appraisal. Ann Rheum
Dis 2009;68:770–6.
2 Rudwaleit M, van der Heijde D, Landewé R et al. The
development of Assessment of SpondyloArthritis international Society classification criteria for axial spondyloarthritis (part II): validation and final selection. Ann Rheum
Dis 2009;68:777–83.
3 van Tubergen A, Weber U. Diagnosis and classification in
spondyloarthritis: identifying a chameleon. Nat Rev
Rheumatol 2012;8:253–61.
4 Rudwaleit M, Haibel H, Baraliakos X et al. The early disease stage in axial spondylarthritis: results from the
German Spondyloarthritis Inception Cohort. Arthritis
Rheum 2009;60:717–27.
5 Braun J, Baraliakos X, Godolias G, Bohm H. Therapy of
ankylosing spondylitis—a review. Part I: conventional
medical treatment and surgical therapy. Scand J
Rheumatol 2005;34:97–108.
6 Gran JT, Husby G, Hordvik M. Prevalence of ankylosing
spondylitis in males and females in a young middle-aged
population of Tromsø, northern Norway. Ann Rheum Dis
1985;44:359–67.
7 van der Linden SM, Valkenburg HA, de Jongh BM, Cats A.
The risk of developing ankylosing spondylitis in HLA-B27
positive individuals. A comparison of relatives of spondylitis patients with the general population. Arthritis
Rheum 1984;27:241–9.
8 Strand V, Rao SA, Shillington AC et al. Prevalence of axial
SpA in US rheumatology practices: assessment of ASAS
criteria vs. rheumatology expert clinical diagnosis. Arthritis
Care Res 2013;65:1299–306.
9 van der Heijde D, Kivitz A, Schiff MH et al. Efficacy and
safety of adalimumab in patients with ankylosing spondylitis: results of a multicenter, randomized, double-blind, placebo-controlled trial. Arthritis Rheum 2006;54:
2136–46.
10 Haibel H, Rudwaleit M, Listing J et al. Efficacy of adalimumab in the treatment of axial spondylarthritis without
radiographically defined sacroiliitis: results of a twelve-week randomized, double-blind, placebo-controlled trial
followed by an open-label extension up to week fifty-two.
Arthritis Rheum 2008;58:1981–91.
11 Sieper J, Rudwaleit M, Baraliakos X et al. The Assessment
of SpondyloArthritis international Society (ASAS) handbook: a guide to assess spondyloarthritis. Ann Rheum Dis
2009;68(Suppl 2):ii1–44.
12 Zochling J. Measures of symptoms and disease status in
ankylosing spondylitis: Ankylosing Spondylitis Disease
Activity Score (ASDAS), Ankylosing Spondylitis Quality of
Life Scale (ASQoL), Bath Ankylosing Spondylitis Disease
Activity Index (BASDAI), Bath Ankylosing Spondylitis
Functional Index (BASFI), Bath Ankylosing Spondylitis
Global Score (BAS-G), Bath Ankylosing Spondylitis
Metrology Index (BASMI), Dougados Functional Index
(DFI), and Health Assessment Questionnaire for the
Spondylarthropathies (HAQ-S). Arthritis Care Res
2011;63(Suppl 11):S47–58.
13 Barkham N, Keen HI, Coates LC et al. Clinical and imaging
efficacy of infliximab in HLA-B27-positive patients with
magnetic resonance imaging-determined early sacroiliitis.
Arthritis Rheum 2009;60:946–54.
14 US Food and Drug Administration. Guidance for industry:
patient-reported outcome measures: use in medical
product development to support labeling claims. Fed Reg
2009;74:65132–3.
15 Landewé R, Braun J, Deodhar A et al. Efficacy of certolizumab pegol on signs and symptoms of axial spondyloarthritis including ankylosing spondylitis: 24-week
results of a double-blind randomised placebo-controlled
phase 3 study. Ann Rheum Dis 2014;73:39–47.
16 van der Linden S, Valkenburg HA, Cats A. Evaluation of
diagnostic criteria for ankylosing spondylitis. A proposal
for modification of the New York criteria. Arthritis Rheum
1984;27:361–8.
17 Strand V, Smolen JS, van Vollenhoven RF et al.
Certolizumab pegol plus methotrexate provides broad
relief from the burden of rheumatoid arthritis: analysis of
patient-reported outcomes from the RAPID 2 trial. Ann
Rheum Dis 2011;70:996–1002.
18 Garrett S, Jenkinson T, Kennedy LG et al. A new approach
to defining disease status in ankylosing spondylitis:
the Bath Ankylosing Spondylitis Disease Activity Index.
J Rheumatol 1994;21:2286–91.
19 Sieper J, van der Heijde D, Landewé R et al. New criteria
for inflammatory back pain in patients with chronic back
pain: a real patient exercise by experts from the
Assessment of SpondyloArthritis international Society
(ASAS). Ann Rheum Dis 2009;68:784–8.
20 van der Heijde D, Dougados M, Davis J et al. Assessment
in Ankylosing Spondylitis International Working Group/
Spondylitis Association of America recommendations for
conducting clinical trials in ankylosing spondylitis. Arthritis
Rheum 2005;52:386–94.
21 Maruish M. User’s Manual for the SF-36v2 Health
Survey. 2011.
22 van der Heijde D, Lie E, Kvien TK et al. ASDAS, a highly
discriminatory ASAS-endorsed disease activity score in
patients with ankylosing spondylitis. Ann Rheum Dis
2009;68:1811–8.
23 Cole BF, Lee M-LT. Statistical Methods for Quality of Life
Studies: Design, Measurements and Analysis. 2002.
24 Machado P, Landewé R, Lie E et al. Ankylosing
Spondylitis Disease Activity Score (ASDAS): defining cut-off
values for disease activity states and improvement
scores. Ann Rheum Dis 2011;70:47–53.
25 Cohen J. A power primer. Psychol Bull 1992;112:155–9.
26 Hurst H, Bolton J. Assessing the clinical significance of
change scores recorded on subjective outcome measures.
J Manipulative Physiol Ther 2004;27:26–35.
27 Norman GR, Sloan JA, Wyrwich KW. Interpretation of
changes in health-related quality of life: the remarkable
universality of half a standard deviation. Med Care
2003;41:582–92.
28 Landewé R, Braun J, Deodhar A et al. Efficacy of
certolizumab pegol on signs and symptoms of axial
spondyloarthritis including ankylosing spondylitis:
24-week results of a double-blind randomised
placebo-controlled phase 3 study. Ann Rheum Dis
2014;73:39–47.
29 Machado P, van der Heijde D. How to measure disease
activity in axial spondyloarthritis? Curr Opin Rheumatol
2011;23:339–45.
30 Piva SR, Gil AB, Moore CG, Fitzgerald GK.
Responsiveness of the activities of daily living scale of the
knee outcome survey and numeric pain rating scale in
patients with patellofemoral pain. J Rehabil Med
2009;41:129–35.
31 Cook KF, Roddey TS, Gartsman GM, Olson SL.
Development and psychometric evaluation of the Flexilevel
Scale of Shoulder Function. Med Care 2003;41:823–35.
32 Pavy S, Brophy S, Calin A. Establishment of the minimum
clinically important difference for the bath ankylosing
spondylitis indices: a prospective study. J Rheumatol
2005;32:80–5.
33 Strand V, Scott DL, Emery P et al. Physical function and
health related quality of life: analysis of 2-year data from
randomized, controlled studies of leflunomide, sulfasalazine, or methotrexate in patients with active rheumatoid
arthritis. J Rheumatol 2005;32:590–601.
34 Dworkin RH, Turk DC, Wyrwich KW et al. Interpreting the
clinical importance of treatment outcomes in chronic pain
clinical trials: IMMPACT recommendations. J Pain
2008;9:105–21.
35 Sieper J, Kivitz A, van Tubergen A, Deodhar A, Coteur G,
Woltering F, Landewé R. Impact of certolizumab pegol
on patient-reported outcomes in patients with axial
spondyloarthritis. Arthritis Care Res 2015; doi: 10.1002/
acr.22594.