Evidence from crossover trials: empirical

Published by Oxford University Press on behalf of the International Epidemiological Association
© The Author 2007; all rights reserved. Advance Access publication 14 February 2007
International Journal of Epidemiology 2007;36:422–430
doi:10.1093/ije/dym001
METHODOLOGY
Evidence from crossover trials: empirical
evaluation and comparison against parallel
arm trials
Dimitrios N Lathyris,1 Thomas A Trikalinos1, 2 and John PA Ioannidis1, 2*
Accepted
20 December 2006
Background We aimed to evaluate empirically how crossover trial results are analysed in
meta-analyses of randomized evidence and whether their results agree with
parallel arm studies on the same questions.
Methods
We used a systematic sample of Cochrane meta-analyses including crossover
trials. We evaluated the methods of analysis for crossover results and compared
the concordance of the estimated effect sizes in crossover vs parallel arm trials.
Results
Of 334 screened reviews, 62 had crossover trials. Of those, 33 meta-analyses
performed quantitative syntheses involving two-arm two-period crossover trials.
There was large variability on how these trials were analysed; only one of the 33
meta-analyses stated that they used the data from both the first and second
period with an appropriate paired approach. Nine meta-analyses used the first
period data only and 14 gave no information at all on what they had done.
Twenty-eight meta-analyses had both crossover (n = 137, sample size n = 7162)
and parallel arm (n = 132, sample size n = 11 398) trials. Effect sizes correlated
well between the two types of designs (ρ = 0.72). Differences on whether the
summary effect had a P < 0.05 or not were common due to limited sample sizes.
The summary relative odds ratio for parallel arm vs crossover designs for
favourable outcomes was 0.87 (95% CI, 0.74–1.02).
Conclusions Crossover designs may contribute evidence in a fifth of systematic reviews, but
few meta-analyses make use of their full data. The results of crossover trials
tend to agree with those of parallel arm trials, although there was a trend for
more conservative treatment effect estimates in parallel arm trials.
Keywords
Crossover studies, parallel studies, meta-analysis, empirical evaluation
Randomized trials may be conducted with either a parallel arm
design or variants of crossover designs. Parallel arm trials are
far more popular in the literature. They are simpler to
design and analyse and are generally feasible. We also have
well-established and accepted methods for their meta-analysis.1
Crossover trials are less common, but not rare. An appraisal
of issue 1, 2001 of the Cochrane Database of Systematic
Reviews found that 184 out of 1000 reviews referred to
crossover trials and 151 included data from crossover designs.2
A Medline search of crossover studies published between
2000 and 2003 in five major medical journals found
40 crossover trials.3
1 Clinical Trials and Evidence-Based Medicine Unit, Department of Hygiene
and Epidemiology, University of Ioannina School of Medicine, Ioannina,
Greece.
2 Institute for Clinical Research and Health Policy Studies, Department of
Medicine, Tufts University School of Medicine, Boston, MA, USA.
* Corresponding author. Department of Hygiene and Epidemiology,
University of Ioannina School of Medicine, Ioannina 45110, Greece.
E-mail: [email protected]
Crossover trials are difficult or inappropriate to apply for
many research questions. Moreover, they represent relatively
complex designs. In a two-treatment crossover study each
patient may be perceived as his own control: ideally one might
simply compare the effect of treatment of the first period to
that of the second period. However, treatment-period interaction
and carry-over effects jeopardize the validity of such simple
inferences. Dealing with these problems poses analytical
challenges.4,5 There are several different approaches for the
analysis of data from crossover trials. Diverse options apply to
the primary analysis of the data; the problem is further
accentuated in meta-analyses, where information to perform
some types of analyses may be missing in published reports.
One may use the first period results only, but this will waste
all the data after crossover and will drastically diminish the
power to detect any intervention effects. Alternatively, one may
use the first and second period data as two different parallel
studies, but this approach is biased, because it ignores
treatment-period interaction. Finally, one may use outcomes
from paired analyses. However, this information is often
missing from published reports.3 Then one can try to
approximate paired analysis by using imputed correlation
coefficients from previous studies with sufficient amount
of data.6
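The approximation of a paired analysis with an imputed correlation can be sketched as follows. This is a minimal illustration, not the method of any particular review: the function name and arguments are hypothetical, the within-patient correlation `r` is assumed to be borrowed from another study, and period and carry-over effects are ignored.

```python
import math


def paired_mean_difference(mean_a, sd_a, mean_b, sd_b, n, r):
    """Mean difference (A - B) and its standard error for a crossover
    trial in which each of n patients received both treatments.

    r is an imputed within-patient correlation between the responses
    under A and B (an assumption, e.g. taken from a similar trial).
    """
    diff = mean_a - mean_b
    # Variance of a within-patient difference: the covariance term is
    # what a naive parallel-group analysis of crossover data leaves out.
    var_within = sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b
    return diff, math.sqrt(var_within / n)
```

With r = 0 the standard error equals that of an unpaired comparison of two groups of n patients; larger positive values of r shrink it, which is why discarding the pairing wastes precision.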
This lack of unanimity and use of suboptimal methods may
introduce bias in the analysis of crossover studies. We
considered that it would be useful to obtain empirical evidence
on the handling of the data and the influence of data from
crossover trials in meta-analyses of randomized evidence. We
tried to answer the following questions. What is the relative
contribution of crossover trials to randomized evidence for
diverse medical interventions? How are crossover trials handled
in current meta-analyses? Do the results of crossover trials
agree with those of parallel studies in the same meta-analysis,
or do they find stronger or weaker treatment effects? To answer
these questions we analysed data from a systematic sample of
meta-analyses.
Methods
Selection of meta-analyses, crossover studies
and outcomes
We perused every fifth review out of the 1669 included in the
Cochrane Library Issue 2, 2003 (in the order listed in the
Cochrane Database) to identify reviews which included
crossover randomized trials in quantitative syntheses (meta-analyses).
The spacing of the evaluated reviews was chosen
arbitrarily. We excluded all non-randomized and pseudo-randomized
trials, and selected only two-treatment, two-period
crossover studies (AB/BA design). Comparisons of
more than two treatments are rare and their interpretation is
rather complicated anyhow. At a second step, we focused on
meta-analyses with at least one parallel and at least one
crossover trial so as to evaluate the concordance of results
obtained with these two designs.
We examined both binary and continuous outcomes. In every
systematic review we selected the primary outcome as stated by
the reviewers. If a primary outcome was not clearly identified,
we selected what we deemed to be the most clinically
important outcome; if this was also not clear, we selected the
one with the larger number of studies; if there were ties, we
selected the one with the larger number of randomized
patients; and then the meta-analysis with fewer zero cells in
the 2 × 2 tables of the constituent studies (for binary outcomes).
Whenever there were two or more independent meta-analyses
from the same systematic review (e.g. on different,
non-overlapping types of patients), we considered them as
separate meta-analyses. Two investigators (D.L. and T.A.T.)
independently selected eligible meta-analyses and crossover
studies. Disagreements were referred to the third investigator
(J.P.A.I.).
Data extraction on primary trials
For every eligible meta-analysis we extracted information on
the specific comparison, and outcome, and the number of
parallel and crossover trials. For each trial we recorded the first
author and year of publication, the design (parallel vs.
crossover), the way the meta-analysts utilized data from
crossover trials [first period only, second period only,
‘combined’ data (and if so how combined), or no comment/
unclear], the approach of the meta-analysis to the carry-over
effect in each crossover trial, and information necessary for the
calculation of the effect sizes (2 × 2 tables for binary outcomes,
and number of patients, mean response and SD of the
responses in each arm for continuous outcomes).
Statistical methods
Meta-analyses
Quantitative analyses were performed in meta-analyses that
included both crossover and parallel studies. For binary
outcomes we used the odds ratio (OR) as the metric of
choice. Continuous outcomes were quantified using standardized mean differences (SMDs), as expressed by Hedges’s g
metric. SMDs express the magnitude of the treatment effects
relative to the within group standard deviations and can be
used to synthesize studies that have quantified treatment
effects in different scales.7
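As an illustration of the SMD metric, the sketch below computes Hedges's g from arm-level summary data. The function name is hypothetical, and the small-sample correction factor and variance formula are the commonly used approximations, assumed here rather than taken from the paper.

```python
import math


def hedges_g(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Hedges's g (experimental minus comparator) and its standard error."""
    df = n_e + n_c - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_e - mean_c) / sd_pooled
    # Approximate small-sample bias correction.
    g = (1 - 3 / (4 * df - 1)) * d
    # Common large-sample approximation to the variance of g.
    var_g = (n_e + n_c) / (n_e * n_c) + g ** 2 / (2 * (n_e + n_c))
    return g, math.sqrt(var_g)
```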
All comparisons were coded so that the experimental
treatment is compared vs the standard/older treatment or no
treatment (or placebo) for a good, favourable outcome. For
example, if the outcome used by the original trials and their
meta-analysis was therapeutic success, we kept the data as is;
however if it was therapeutic failure, then we took the
complementary counts, i.e. we inverted the OR. If the outcome
was measured on a continuous scale, where health was better
with higher scores, we kept the data as is; if the scale meant
worse health with higher scores, then we reversed the sign of
the difference between arms in the calculation of the SMD.
Therefore an OR > 1 or SMD > 0 suggests that the experimental
treatment fares better than the comparator.
For every topic we calculated the summary effect size of
crossover and parallel studies with both fixed and random
effects inverse variance syntheses.8,9 Fixed effects analyses
assume that the true effect of treatment is the same across
synthesized studies, while random effects analyses allow for
between-study heterogeneity (dissimilarity) and incorporate it
in the calculations. For binary outcomes, if any trial arms in the
analysed studies had zero observed events we added 0.5 to all
cells of the pertinent two by two table.
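The steps in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the authors' Stata code: the function names are hypothetical, the 0.5 correction is applied whenever any cell of the table is zero, and the random effects summary uses the DerSimonian-Laird moment estimator, which is assumed here.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log OR and its variance from a 2 x 2 table.

    a, b: events / non-events in the experimental arm
    c, d: events / non-events in the comparator arm
    """
    if 0 in (a, b, c, d):
        # Continuity correction: add 0.5 to all cells of the table.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool(effects, variances):
    """Inverse variance fixed effects and random effects summaries."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q around the fixed effects summary.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance (DerSimonian-Laird), truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1 / sum(w)),
        "random": random, "se_random": math.sqrt(1 / sum(w_re)),
        "tau2": tau2,
    }
```

The same `pool` function applies to Hedges's g values and their variances; when the estimated between-study variance is zero the two summaries coincide.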
We tested for between-study heterogeneity using the χ²
distributed Q statistic and quantified its extent with the I²
statistic.10 I² ranges between 0% and 100%, and values above
75% imply very high heterogeneity.
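A minimal sketch of the two heterogeneity statistics, assuming inverse variance weights; the function name is hypothetical and the P-value for Q (from a χ² distribution with k − 1 degrees of freedom) is left out to keep the example free of external libraries.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, and I2 (in percent)."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I2 is the share of Q in excess of its expectation under
    # homogeneity (df), truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2
```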
Assessment of concordance
We evaluated the agreement between parallel and crossover
designs. Concordance was measured by assessing the Spearman
correlation coefficient for the effect size estimates between the
two designs across all meta-analyses; how often the two
designs showed effects in the same direction; how often
the two designs agreed with respect to having P < 0.05
for the summary effect; and whether the effect size estimates
of the two designs differed beyond chance.11 For the latter
assessment, we calculated the relative odds ratio (ROR) by
dividing the summary OR in parallel trials by the summary
OR in crossover trials and also estimated the variance
and 95% confidence intervals (CIs) of the ROR.12
We transformed the weighted SMDs of the continuous
outcomes to ORs13 using the following formulae:
$\mathrm{OR} = e^{(\pi\sqrt{3}/3)\,g}$, with
$\mathrm{se}[\ln(\mathrm{OR})] = (\pi\sqrt{3}/3)\,\mathrm{se}(g)$.
When ROR > 1, parallel trials yield larger OR estimates than crossover
trials, i.e. the parallel trials give a more favourable picture for
the experimental intervention than the crossover trials, and vice
versa when ROR < 1.
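The conversion and the ROR can be sketched as below. The function names are hypothetical, and the variance of ln(ROR) is taken as the sum of the two variances, which assumes that the crossover and parallel summaries are independent. The usage lines take the summary g values of the haloperidol meta-analysis (reference 24) from Table 2, reading the lower confidence bounds as negative and back-calculating standard errors from the 95% CIs; both steps are assumptions of this sketch.

```python
import math

K = math.pi / math.sqrt(3)  # equals pi * sqrt(3) / 3, about 1.81


def smd_to_log_or(g, se_g):
    """Convert an SMD and its standard error to the ln(OR) scale."""
    return K * g, K * se_g


def relative_odds_ratio(ln_or_par, se_par, ln_or_cross, se_cross):
    """ROR (parallel / crossover) with a 95% confidence interval."""
    ln_ror = ln_or_par - ln_or_cross
    se = math.sqrt(se_par ** 2 + se_cross ** 2)
    return (math.exp(ln_ror),
            math.exp(ln_ror - 1.96 * se),
            math.exp(ln_ror + 1.96 * se))


# Parallel trials: g = 0.12 (-0.10, 0.34); crossover: g = 0.15 (-0.39, 0.69).
par = smd_to_log_or(0.12, (0.34 + 0.10) / (2 * 1.96))
cross = smd_to_log_or(0.15, (0.69 + 0.39) / (2 * 1.96))
ror, lower, upper = relative_odds_ratio(*par, *cross)
```

Under these assumptions the result is close to the ROR of 0.95 (0.33, 2.72) plotted for this meta-analysis in Figure 2; small differences reflect rounding of the tabulated inputs.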
Finally, we assessed whether there was a systematic
difference in the estimated summary effects between the two
designs across all topics. The summary ROR was calculated
across all topics with random effects syntheses and heterogeneity in the ROR estimates across meta-analyses was
measured with the Q and I² statistics.
Subgroup and sensitivity analyses
The main analyses included all topics, regardless of the way
data from the crossover studies were reported to be analysed. In
separate analyses we estimated the summary ROR across the
two designs considering only the crossover studies where only
data from the first period were used; and estimated the
summary ROR considering all other crossover studies. We also
performed a sensitivity analysis by excluding the largest meta-analysis
that included almost half of the crossover studies in
our empirical evaluation.
Analyses were conducted in Intercooled Stata 8.2 and in
SPSS 13.0. All P-values are two tailed. Hypothesis-testing for
between-study heterogeneity is done at α = 0.10.14
Results
Eligible meta-analyses
Figure 1 shows the flow of the systematic screening of the
Cochrane Library. We screened 334 systematic reviews and
excluded 272 of them, as they did not have any crossover trials.
Of the remaining 62 systematic reviews, 26 were finally
selected.15–40 These pertained to 28 independent meta-analyses,
because one review contained eligible data for three different
meta-analyses (on three different types of study populations).
Analysis of data from crossover trials
A variety of approaches were used to incorporate the results of
crossover trials in the 28 meta-analyses that included both
types of designs. Twelve of these 28 meta-analyses did not
mention at all their approach towards crossover trial results.
Nine meta-analyses used only the first period results of
crossover studies. Three meta-analyses combined data from
first and second period, but only one of them described the
method of combination, which was the paired analysis
proposed by the Cochrane group. One meta-analysis used
data from the second period only. Finally, three meta-analyses
did not have a consistent approach towards analysis of
crossover trials, using first period data in some trials and data
from both periods in others. Only four of the 28 meta-analyses
discussed the problem of carry-over effect: three of them
assumed that there was no carry-over effect in the analysed
crossover studies and one relied on the carry-over test results of
crossover studies as reported in their initial publication.
Figure 1 Flow of systematic reviews
334 potentially eligible reviews
  272 without crossover trials
62 with at least one crossover trial
  36 excluded:
    No outcomes with > 1 study (n = 14)
    No quantitative meta-analysis (n = 9)
    All crossover trials had > 2 groups (n = 4)
    Only parallel arm studies in meta-analysis (n = 2)
    No study-level graphical/numerical information (n = 1)
    Identical meta-analysis with another review (n = 1)
31 with at least one AB/BA crossover trial in a meta-analysis
  Five had only crossover trials in their meta-analyses
26 eligible for quantitative analyses
In the five meta-analyses that had included only crossover
trials, two meta-analyses did not mention at all what they had
done, one approximated a paired analysis as described in the
Reviewers’ Handbook of the Cochrane Collaboration, one used
data from the second period, and one combined data from both
periods without clarifying precisely the analysis methods. Only
one of these meta-analyses stated that it assumed that there
was no carry-over effect to the analysed crossover studies, while
the remaining four meta-analyses did not comment at all on
the potential problem of carry-over effect.
Characteristics of crossover and parallel arm trials
In the 28 meta-analyses that included both types of designs
(Table 1), there were 142 crossover trials. We excluded five
crossover trials from two meta-analyses (four trials analysed
more than two groups and one was a partial crossover trial
where many patients continued the first period’s treatment in
the second period). Finally, we analysed 137 crossover studies
with a total sample size of 7162 included in the analyses and
132 parallel studies with 11 398 study subjects. In one topic
(effects of low sodium diet vs high sodium diet on blood
pressure, and biochemical indices29) three different outcomes
were analysed that included 91 crossover analyses with a total
sample size of 4518 and 30 parallel studies with 4467 subjects.
These trials represented a large proportion of the overall
database and thus we performed sensitivity analyses excluding
this topic.
Table 1 Characteristics of included meta-analyses
| Meta-analysis (year) | Reference number | CDSR | Interventions and disease | Outcome used in the meta-analysis: definition | ES > 0 or OR > 1 favours experimental intervention | NStudies (Patients): crossover | NStudies (Patients): parallel | Analysis of crossover trials |
|---|---|---|---|---|---|---|---|---|
| Continuous | | | | | | | | |
| Gotzsche (2002) | 15 | CD000189 | Short-term corticosteroids vs placebo for RA | Joint tenderness | No | 2 (66) | 1 (40) | Combined data |
| Suarez-Almazor (1997) | 16 | CD000957 | MTX vs placebo for RA | Number of tender joints | No | 2 (55) | 3 (164) | Unclear |
| Ferreira (2001) | 17 | CD000998 | Nutrition vs placebo/usual diet for stable COPD | Weight loss | Yes | 1 (25) | 8 (252) | First period |
| Pinelli (2001) | 18 | CD001071 | Non-nutritive sucking vs control in preterm infants | Heart rate change | No | 2 (66) | 1 (20) | Unclear |
| White (2001) | 19 | CD001106 | CPAP vs placebo for obstructive sleep apnea | Epworth Sleepiness Scale | No | 2 (86) | 2 (212) | Unclear |
| Poustie (2003) | 20 | CD001304 | Low PA diet vs no diet for PKU | Blood PA level | No | 1 (32) | 1 (9) | Unclear |
| Suarez-Almazor (2000) | 21 | CD001461 | Azathioprine vs placebo for RA | Number of tender joints | No | 2 (53) | 1 (28) | Mixed |
| Brion (2001) | 22 | CD001817 | Diuretics vs placebo for non-intubated preterms with CLD | Change in compliance | Yes | 1 (20) | 1 (21) | Combined data |
| Lemyre (2001) | 23 | CD002272 | NIPPV vs NCPAP for apnea of prematurity | PCO2 at 4–6 h | No | 1 (40) | 1 (34) | Combined data |
| Lonergan (2002) | 24 | CD002852 | Haloperidol vs placebo for agitation in dementia | Change in agitation | No | 1 (60) | 3 (309) | First period |
| Jones (2001) | 25 | CD003537 | Inhaled corticosteroid vs placebo for asthma and mild COPD | Osteocalcin | Yes | 2 (111) | 1 (30) | Second period |
| Green (2001) | 26 | CD003686 | NSAID vs placebo for lateral elbow pain | Pain VAS | No | 1 (28) | 2 (102) | Unclear |
| Riemsma (2003) | 27 | CD003688 | Patient education vs control in RA | Change in pain score | No | 4 (262) | 33 (1957) | First period |
| McCrory (2003) | 28 | CD003900 | Anticholinergics vs β2-agonists for COPD exacerbation | Change in FEV1 at 90 min | Yes | 2 (58) | 2 (71) | First period |
| Jurgens (2003) | 29a | CD004022 | Low vs high Na diet (Caucasians, normal SBP) | Change in SBP | No | 47 (2612) | 10 (2484) | Unclear |
| Jurgens (2003) | 29b | CD004022 | Low vs high Na diet (Caucasians, elevated SBP) | Change in SBP | No | 38 (1556) | 18 (1811) | Unclear |
| Jurgens (2003) | 29c | CD004022 | Low vs high Na diet (African-Americans, elevated DBP) | Change in SBP | No | 6 (350) | 2 (172) | Unclear |
| Toweed (2002) | 30 | CD004257 | NSAIDS vs acetaminophen for OA | Rest pain | No | 1 (148) | 1 (123) | Unclear |
| Binary | | | | | | | | |
| Hughes (1999) | 31 | CD000057 | Clomiphene citrate vs placebo for unexplained subfertility | Pregnancies per patient | Yes | 3 (357) | 2 (98) | Mixed |
| Cheine (2001) | 32 | CD000234 | β-blocker supplementation vs placebo in schizophrenia | Leaving the study early | No | 1 (8) | 4 (109) | First period |
| Clarke (2001) | 33 | CD000236 | Pergolide vs bromocriptine for L-Dopa complications in PD | Proportion improved | Yes | 1 (114) | 1 (191) | First period |
| Ortiz (1999) | 34 | CD000951 | Folinic acid vs placebo for MTX side effects in RA | GI side effects | No | 2 (33) | 3 (135) | Unclear |
| Higgins (2003) | 35 | CD001015 | Lecithin vs control for AD | Proportion deteriorated | No | 1 (15) | 1 (37) | First period |
| Wiffen (2000) | 36 | CD001133 | Anticonvulsants vs placebo for all neuropathic pain | Proportion improved | Yes | 5 (675) | 2 (380) | Unclear |
| Melchart (2000) | 37 | CD001218 | True vs sham acupuncture for idiopathic headache | Proportion improved | Yes | 1 (18) | 1 (30) | First period |
| Sultana (2000) | 38 | CD001944 | Thioridazine vs typical neuroleptics for schizophrenia | Leaving the study early | No | 2 (81) | 19 (1656) | Combined data |
| Proctor (2001) | 39 | CD002123 | TENS vs placebo for primary dysmenorrhoea | Proportion with pain relief | Yes | 1 (42) | 2 (29) | First period |
| Hood (2001) | 40 | CD002901 | Digitalis vs control for CHF (patients with sinus rhythm) | Clinical deterioration | No | 4 (191) | 6 (894) | Unclear |
AD, Alzheimer’s disease; CDSR, Cochrane database of systematic reviews number; CHF, congestive heart failure; CLD, chronic lung disease; COPD, chronic obstructive pulmonary disease; (N)CPAP, (nasal)
continuous positive airway pressure; DBP, diastolic blood pressure; ES, effect size; h, hours; GI, gastrointestinal; MTX, methotrexate; NIPPV, nasal intermittent positive pressure ventilation; NSAID, non-steroid
anti-inflammatory drugs; NStudies, number of studies; OA, osteoarthritis; OR, odds ratio; PA, phenylalanine; PKU, phenylketonuria; PD, Parkinson’s disease; RA, rheumatoid arthritis; SBP, systolic blood pressure;
TENS, transcutaneous electrical nerve stimulation; VAS, visual analogue scale.
Note: All meta-analyses are included in the Cochrane Library, Issue 2, 2003. The year next to the authors name refers to the last amendment of each meta-analysis. Outcomes are described as defined in the
systematic reviews. For the quantitative analyses, all comparisons were coined so that the experimental treatment is compared vs the standard/older treatment or no treatment (or placebo) for a good, favourable
outcome. Thus outcome definitions were reversed for entries with ‘No’ in the fifth column.
A wide variety of diseases and interventions were represented.
The most common targeted categories of diseases were joint
diseases (n = 7, of which n = 5 for rheumatoid arthritis) and
respiratory (n = 6, of which n = 3 for chronic obstructive
pulmonary disease). Parallel trials had a larger analysed
sample size than crossover trials (median 46 vs 36, P = 0.006).
The crossover trials contributed anywhere between 5% and
79% of the total sample size of analysed subjects across these
meta-analyses. In 16 of 28 meta-analyses, the total sample size
in the parallel trials was larger than the sample size analysed in
the crossover trials. There was no difference in the year of
publication of crossover vs parallel trials (median 1990 vs 1991,
P = 0.36). Of the 28 meta-analyses, 18 used continuous
outcomes (Table 1).
Treatment effects and between-study heterogeneity
Summary SMDs expressed by Hedges’s g for continuous
outcomes and summary OR for binary outcomes and I2 for
the extent of heterogeneity are shown in Table 2. These data
refer to all studies and separately to crossover and parallel
studies. Between-study heterogeneity did not seem to be clearly
explained by differences between crossover and parallel trials.
Only one meta-analysis found very large between-study heterogeneity
overall; one found very large between-study heterogeneity
when separate analyses were performed for crossover
trials; and two when separate analyses were performed for
parallel arm trials.
Concordance between crossover and parallel trial
results
The summary effect sizes for crossover trials were highly
correlated to the summary effects in parallel trials across
all 28 meta-analyses (Spearman correlation coefficient 0.72,
P < 0.001). The results were similar, but based on more sparse
data, separately for meta-analyses of continuous (ρ = 0.68,
P = 0.002) and binary (ρ = 0.68, P = 0.030) outcomes.
In 23 of the 28 meta-analyses, the point summary estimates
were in the same direction with both designs (15/18 continuous
outcomes, 6/8 binary outcomes). When the overall results had
P < 0.05, the two types of designs always found estimates of
effects in the same direction.
In six meta-analyses, evidence from both designs gave
P < 0.05 in the summary results, and in 12 meta-analyses
both designs gave P ≥ 0.05. However, in 10 meta-analyses the
two designs differed in the presence or not of P < 0.05 (in seven
only the crossover data resulted in P < 0.05, in three only the
parallel trials data resulted in P < 0.05). Overall the kappa
coefficient was 0.27 (95% CI, −0.07 to 0.62).
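The kappa coefficient can be recomputed from the counts given in this paragraph (both designs with P < 0.05 in six meta-analyses, only crossover in seven, only parallel in three, neither in 12). The sketch below uses the standard formula for Cohen's kappa on a 2 × 2 agreement table; the function name is hypothetical and the confidence interval is not computed.

```python
def cohens_kappa(both_yes, only_first, only_second, both_no):
    """Cohen's kappa for two binary ratings of the same items."""
    n = both_yes + only_first + only_second + both_no
    observed = (both_yes + both_no) / n
    p_first = (both_yes + only_first) / n
    p_second = (both_yes + only_second) / n
    # Agreement expected by chance from the marginal proportions.
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    return (observed - expected) / (1 - expected)


# "First" rating: crossover summary has P < 0.05; "second": parallel.
kappa = cohens_kappa(both_yes=6, only_first=7, only_second=3, both_no=12)
print(round(kappa, 2))  # 0.27
```

Observed agreement is 18/28 (64%), against about 51% expected by chance, which is why the coefficient is low despite agreement in most meta-analyses.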
ROR analyses
ROR analyses showed that the difference in the effect size
observed in the two types of designs was beyond chance in only
one of the 28 meta-analyses (Figure 2). This meta-analysis
evaluated distal loop diuretics in preterm infants with chronic
lung disease: there was only one small parallel trial (n = 21
infants) and one even smaller crossover trial [n = 10 infants,
counted twice in the analysis (for the two periods)] that found
opposite results. The crossover trial even found formally
statistically significant results, despite its small sample size.
There was considerable variability in the point estimates of
the ROR per meta-analysis and 12 of the ROR point estimates
were either larger than two or smaller than 0.5 (Figure 2).
However, the large majority of RORs had wide CIs, because
usually the amount of evidence was not extensive with either
or both designs. Overall, there was no between-meta-analysis
heterogeneity in the ROR estimates (P = 0.46, I² = 0%). The
summary ROR was 0.87 and its 95% CI marginally encompassed
1.00 (ROR = 0.87, 95% CI 0.74–1.02), suggesting a trend
for more conservative treatment effects in parallel arm trials
than in crossover trials.
Exclusion of the largest meta-analysis yielded similar estimates, with a summary ROR of 0.88. The summary ROR
estimates were similar in meta-analyses that clearly specified
that they had used consistently only the first period data
(ROR = 0.87) and the other meta-analyses (ROR = 0.87). The
ROR estimates were 0.94 for binary and 0.86 for continuous
data, respectively, and these did not differ beyond chance
(P = 0.74). Finally, the ROR was still 0.91 (95% CI 0.73–1.13)
when limited to meta-analyses where the analysed sample size
of the crossover designs was smaller than the sample size of the
parallel arm trials (Table 3).
Discussion
Our empirical evaluation shows that crossover designs may
contribute evidence on the effectiveness of medical interventions in about a fifth of systematic reviews. We found that their
results correlate with those of parallel arm trials in broad terms.
However, if anything, there was a trend for more conservative
treatment effect estimates in parallel arm trials. This trend
should be seen with caution. Perhaps more importantly, the
majority of meta-analyses do not mention at all their approach
towards the carry-over effect problem and none of them
proceeded to reanalysis of data from all crossover periods
with testing for carry-over effect. Systematic reviewers usually
do not state how exactly they have handled the results of such
studies, while a common practice is to use only the first period
results, thus wasting the second period information.
For some disciplines, such as rheumatology, respiratory
medicine and dermatology, crossover trials may be encountered
quite commonly in the literature. This has been demonstrated
also in empirical evaluations of large samples of trials in these
disciplines.41–44 Sometimes medical interventions may have
been evaluated only with crossover trials. It is well appreciated
that crossover designs should not be performed for interventions
that have substantial carry-over effects. One might
reasonably question whether this prerequisite is fulfilled for
several of the topics in Table 1, like for example the use of
clomiphene citrate vs placebo for unexplained subfertility.
Therefore, the caveats associated with the design and conduct
of these designs should be better understood. This applies
both to the design and conduct of the original trials and
to their incorporation into systematic reviews and meta-analyses.
Table 2 Effect sizes and heterogeneity in the eligible meta-analyses overall and per study design
| Meta-analysis (year) | Reference number | All studies combined: N | Effect (95% CI) | I²% (pHet) | Crossover trials only: N | Effect (95% CI) | I²% (pHet) | Parallel trials only: N | Effect (95% CI) | I²% (pHet) |
|---|---|---|---|---|---|---|---|---|---|---|
| Continuous | | | | | | | | | | |
| Gotzsche (2002) | 15 | 3 | 1.32 (0.46, 2.18) | 74 (0.02) | 2 | 1.58 (0.34, 2.82) | 78 (0.03) | 1 | 0.85 (0.20, 1.49) | NA |
| Suarez-Almazor (1997) | 16 | 5 | 0.86 (0.58, 1.14) | 0 (0.58) | 2 | 1.12 (0.55, 1.70) | 0 (0.52) | 3 | 0.77 (0.45, 1.09) | 0 (0.50) |
| Ferreira (2001) | 17 | 9 | 0.07 (−0.25, 0.40) | 37 (0.12) | 1 | −0.58 (−1.39, 0.22) | NA | 8 | 0.15 (−0.18, 0.48) | 31 (0.18) |
| Pinelli (2001) | 18 | 3 | −0.19 (−0.76, 0.38) | 41 (0.18) | 2 | −0.01 (−0.64, 0.62) | 39 (0.20) | 1 | −0.68 (−1.59, 0.23) | NA |
| White (2001) | 19 | 4 | 0.84 (0.52, 1.16) | 38 (0.18) | 2 | 0.46 (−0.26, 1.18) | 51 (0.15) | 2 | 1.02 (0.73, 1.31) | 0 (0.97) |
| Poustie (2003) | 20 | 2 | 2.53 (0.28, 4.78) | 61 (0.11) | 1 | 1.72 (0.89, 2.54) | NA | 1 | 4.15 (1.28, 7.03) | NA |
| Suarez-Almazor (2000) | 21 | 3 | 1.12 (0.30, 1.93) | 61 (0.08) | 2 | 1.51 (0.11, 2.92) | 72 (0.06) | 1 | 0.60 (−0.16, 1.37) | NA |
| Brion (2001) | 22 | 2 | 0.64 (−0.92, 2.21) | 82 (0.02) | 1 | 1.47 (0.45, 2.48) | NA | 1 | −0.13 (−1.00, 0.73) | NA |
| Lemyre (2001) | 23 | 2 | −0.10 (−0.56, 0.36) | 0 (0.89) | 1 | −0.13 (−0.75, 0.49) | NA | 1 | −0.07 (−0.74, 0.61) | NA |
| Lonergan (2002) | 24 | 4 | 0.12 (−0.08, 0.33) | 0 (0.99) | 1 | 0.15 (−0.39, 0.69) | NA | 3 | 0.12 (−0.10, 0.34) | 0 (0.96) |
| Jones (2001) | 25 | 3 | −0.41 (−0.96, 0.13) | 48 (0.15) | 2 | −0.24 (−0.86, 0.38) | 45 (0.18) | 1 | −0.84 (−1.63, −0.05) | NA |
| Green (2001) | 26 | 3 | 0.93 (0.56, 1.29) | 0 (0.38) | 1 | 0.69 (−0.07, 1.46) | NA | 2 | 0.92 (0.33, 1.52) | 33 (0.22) |
| Riemsma (2003) | 27 | 37 | 0.08 (−0.01, 0.17) | 7 (0.36) | 4 | 0.24 (−0.13, 0.60) | 44 (0.15) | 33 | 0.07 (−0.02, 0.16) | 1 (0.44) |
| McCrory (2003) | 28 | 4 | −0.01 (−0.36, 0.33) | 0 (0.97) | 2 | 0.05 (−0.47, 0.56) | 0 (0.97) | 2 | −0.07 (−0.54, 0.40) | 0 (0.73) |
| Jurgens (2003) | 29a | 57 | 0.14 (0.07, 0.22) | 25 (0.05) | 47 | 0.16 (0.06, 0.26) | 28 (0.04) | 10 | 0.13 (0.04, 0.22) | 7 (0.38) |
| Jurgens (2003) | 29b | 56 | 0.27 (0.19, 0.34) | 11 (0.24) | 38 | 0.34 (0.24, 0.44) | 0 (0.48) | 18 | 0.19 (0.08, 0.30) | 18 (0.24) |
| Jurgens (2003) | 29c | 8 | 0.61 (0.24, 0.98) | 72 (<0.01) | 6 | 0.52 (0.04, 1.00) | 76 (<0.01) | 2 | 0.88 (−0.06, 1.82) | 79 (0.03) |
| Toweed (2002) | 30 | 2 | −0.32 (−0.56, −0.08) | 0 (0.96) | 1 | −0.32 (−0.65, 0.00) | NA | 1 | −0.31 (−0.67, 0.04) | NA |
| Binary | | | | | | | | | | |
| Hughes (1999) | 31 | 5 | 2.30 (1.25, 4.23) | 6 (0.37) | 3 | 2.14 (1.20, 3.82) | 0 (0.49) | 2 | 3.41 (0.11, 110.90) | 60 (0.11) |
| Cheine (2001) | 32 | 5 | 0.65 (0.17, 2.56) | 0 (0.63) | 1 | 1.57 (0.02, 98.96) | NA | 4 | 0.59 (0.14, 2.50) | 0 (0.50) |
| Clarke (2001) | 33 | 2 | 2.00 (0.94, 4.28) | 49 (0.16) | 1 | 3.30 (1.25, 8.68) | NA | 1 | 1.48 (0.83, 2.63) | NA |
| Ortiz (1999) | 34 | 5 | 1.74 (0.84, 3.64) | 0 (0.70) | 2 | 0.98 (0.21, 4.53) | 0 (0.90) | 3 | 2.07 (0.90, 4.81) | 0 (0.48) |
| Higgins (2003) | 35 | 2 | 0.33 (0.10, 1.08) | 0 (0.33) | 1 | 0.11 (0.01, 1.41) | NA | 1 | 0.45 (0.12, 1.70) | NA |
| Wiffen (2000) | 36 | 7 | 5.14 (3.38, 7.82) | 34 (0.17) | 5 | 6.42 (3.33, 12.36) | 36 (0.18) | 2 | 4.05 (2.22, 7.37) | 38 (0.20) |
| Melchart (2000) | 37 | 2 | 3.58 (0.92, 13.90) | 0 (0.75) | 1 | 2.80 (0.36, 21.73) | NA | 1 | 4.33 (0.71, 26.53) | NA |
| Sultana (2000) | 38 | 21 | 1.02 (0.74, 1.39) | 9 (0.34) | 2 | 0.81 (0.08, 8.51) | 0 (0.90) | 19 | 1.00 (0.71, 1.41) | 18 (0.23) |
| Proctor (2001) | 39 | 3 | 1.55 (0.54, 4.50) | 0 (0.63) | 1 | 1.50 (0.43, 5.25) | NA | 2 | 1.71 (0.23, 12.71) | 0 (0.34) |
| Hood (2001) | 40 | 10 | 3.51 (2.30, 5.34) | 0 (0.58) | 4 | 2.33 (0.92, 5.95) | 0 (0.41) | 6 | 3.89 (2.43, 6.23) | 0 (0.57) |
N: Number of trials.
Note: The effect size is Hedges g for continuous outcomes and the odds ratio (OR) for binary outcomes. In this table outcomes have been coined, so that g > 0
and OR > 1 imply favourable effects for the experimental intervention.
Although treatment effects correlated well between parallel
arm and crossover trials, there was a trend for crossover
trials to give more favourable results for the experimental
intervention. The trend could be due to chance. Alternatively, it
may reflect the fact that crossover trials tended to be smaller
and it is common for small trials to give more impressive
estimates of treatment effects than larger trials on the same
topic.45 However, the number of trials per meta-analysis was
too small to allow examining small study effects within each
meta-analysis in a meaningful way. A second possibility is that
crossover trials may have shorter follow-up than parallel trials
and sometimes treatment effects may wane over time with
longer follow-up.46 However, follow-up data were not available
in sufficient detail to allow us to explore this explanation
further. Overall, the difference between the two designs, even if
truly present, does not seem large enough that it would suggest
that one type of design in particular yields consistently biased
results.

Continuous outcomes
Author (reference)      ROR (95% CI)
Brion22                 0.06 (0.01, 0.62)
Suarez-Almazor21        0.19 (0.01, 3.47)
Gotzsche15              0.26 (0.02, 3.32)
Pinelli18               0.30 (0.04, 2.22)
Jones25                 0.34 (0.06, 2.08)
Suarez-Almazor16        0.53 (0.16, 1.76)
Riemsma27               0.74 (0.37, 1.45)
Jurgens29               0.77 (0.58, 1.01)
McCrory28               0.81 (0.23, 2.87)
Jurgens29               0.95 (0.74, 1.21)
Lonergan24              0.95 (0.33, 2.72)
Towheed30               1.02 (0.43, 2.44)
Lemyre23                1.12 (0.21, 5.91)
Green26                 1.52 (0.26, 8.80)
Jurgens29               1.93 (0.29, 13.10)
White19                 2.75 (0.67, 11.29)
Ferreira17              3.79 (0.79, 18.28)
Poustie20               83.24 (0.37, >100)
Summary (continuous)    0.86 (0.67, 1.10)

Binary outcomes
Author (reference)      ROR (95% CI)
Cheine32                0.37 (0.01, 30.03)
Clarke33                0.45 (0.15, 1.38)
Wiffen36                0.63 (0.26, 1.53)
Proctor39               1.14 (0.11, 12.13)
Sultana38               1.23 (0.11, 13.16)
Melchart37              1.55 (0.10, 23.85)
Hughes31                1.60 (0.05, 54.39)
Hood40                  1.67 (0.58, 4.75)
Ortiz34                 2.11 (0.37, 12.06)
Higgins35               4.20 (0.23, 76.23)
Summary (binary)        0.94 (0.57, 1.55)

Grand summary           0.87 (0.74, 1.02)

Horizontal axis: ROR and 95% CI (logarithmic scale, 0.01 to 100)

Figure 2 Relative odds ratios (ROR) between crossover and parallel
arm trials. RORs and 95% confidence intervals for each meta-analysis
for the comparison of effect sizes in the two types of designs.
Separate summary ROR estimates are provided for the continuous
outcomes (top) and binary outcomes (bottom). The grand summary
pertains to the summary ROR estimate across all meta-analyses. All
comparisons were coded so that the experimental treatment is
compared vs the standard/older treatment or no treatment (or
placebo) for a good, favourable outcome. Confidence intervals that
extend beyond the border of the graph have been marked with two
crossing lines (like an ‘x’)

Table 3 Summary ROR estimates
                                N     ROR (95% CI)        I2% (pHet)
Overall
  All meta-analyses             28    0.87 (0.74–1.02)    0 (0.46)
  Excluding CD004022            25    0.88 (0.65–1.18)    5 (0.40)
Outcome type
  Continuous                    18    0.86 (0.67–1.10)    20 (0.22)
  Binary                        10    0.94 (0.57–1.55)    0 (0.75)
Crossover trial analyses
  First period only             9     0.87 (0.69–1.10)    13 (0.30)
  Other/unclear                 19    0.87 (0.57–1.34)    0 (0.60)
Total analysed sample size
  Larger in crossover trials    12    0.91 (0.73–1.13)    0 (0.66)
  Larger in parallel trials     16    0.91 (0.66–1.30)    18 (0.25)
N: Number of trials.
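As a rough illustration of how a summary ROR of this kind can be obtained, the sketch below forms a log ROR from two summary odds ratios and then pools log RORs with a DerSimonian and Laird random effects model, reporting I2 as well. It is a minimal sketch and not necessarily the exact procedure used here: the function names are hypothetical, the two designs are assumed to give independent estimates, and the standard errors are back-calculated from the rounded 95% CIs of the binary outcomes printed in Figure 2, so the result should only land close to the reported binary summary.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_ror(or_parallel, se_parallel, or_crossover, se_crossover):
    """Log ROR (parallel vs crossover) and its standard error.

    The SE arguments are standard errors of the two log odds ratios.
    The two estimates are assumed independent, so the variances add.
    """
    estimate = math.log(or_parallel) - math.log(or_crossover)
    se = math.sqrt(se_parallel ** 2 + se_crossover ** 2)
    return estimate, se


def se_from_ci(lower, upper):
    """Recover the SE of a log ROR from its reported 95% CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_dersimonian_laird(estimates, ses):
    """Random effects pooling; returns (pooled, pooled_se, tau2, i2_percent)."""
    weights = [1.0 / s ** 2 for s in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    # Cochran's Q measures dispersion around the fixed effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at 0
    re_weights = [1.0 / (s ** 2 + tau2) for s in ses]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    pooled_se = math.sqrt(1.0 / sum_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, tau2, i2


# (ROR, lower, upper) for the binary outcomes, as printed in Figure 2
BINARY_RORS = [
    (0.37, 0.01, 30.03), (0.45, 0.15, 1.38), (0.63, 0.26, 1.53),
    (1.14, 0.11, 12.13), (1.23, 0.11, 13.16), (1.55, 0.10, 23.85),
    (1.60, 0.05, 54.39), (1.67, 0.58, 4.75), (2.11, 0.37, 12.06),
    (4.20, 0.23, 76.23),
]

log_estimates = [math.log(ror) for ror, _, _ in BINARY_RORS]
ses = [se_from_ci(lower, upper) for _, lower, upper in BINARY_RORS]
pooled, pooled_se, tau2, i2 = pool_dersimonian_laird(log_estimates, ses)

summary_ror = math.exp(pooled)
ci_low = math.exp(pooled - Z95 * pooled_se)
ci_high = math.exp(pooled + Z95 * pooled_se)
print(f"Summary ROR {summary_ror:.2f} ({ci_low:.2f}, {ci_high:.2f}), I2 = {i2:.0f}%")
```

Working on the log scale keeps the confidence intervals symmetric and lets the variances of the two designs simply add. When the between-study variance is estimated as zero, the random effects and fixed effect summaries coincide.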
Some other caveats should be discussed. The number of
meta-analyses that we included in the ROR estimations is not
large, so there was considerable accompanying uncertainty in
the summary ROR estimates. Moreover, for the large majority
of the meta-analyses included here, the amount of evidence
was limited both for parallel and crossover studies. This is not
atypical of randomized evidence in general in the medical
sciences and it leaves considerable uncertainty about the
credibility of the results47,48 obtained with either study
design. Finally, we only examined meta-analyses published in
the Cochrane Library. This may be an advantage, as these
meta-analyses follow specific instructions and standards,43 but also a
disadvantage, as the RevMan statistical package used in
Cochrane reviews cannot easily analyse crossover trial data.49
Surprisingly, only one of the meta-analyses analysed in our
study used the approach proposed by the Reviewers’ Handbook of
the Cochrane Collaboration for incorporating data from crossover
studies into meta-analyses. It is difficult to extrapolate these
findings to meta-analyses that have been published in scientific
journals. Non-Cochrane meta-analyses may be quite differently
reported compared with Cochrane reviews, partly because of
space limitations.
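To illustrate why a paired approach matters for a continuous outcome, the sketch below compares the standard error of the treatment difference in a two-period crossover trial when the pairing is respected and when the two periods are treated as if they were independent parallel arms. It is a minimal sketch under stated assumptions: the function names are hypothetical, the standard deviations, sample size and within-patient correlation r are made-up illustrative values, and r usually has to be taken from the trial report or imputed.

```python
import math


def paired_se(sd_a, sd_b, n, r):
    """SE of the mean within-patient difference in a crossover trial.

    sd_a, sd_b: standard deviations of the outcome under each treatment
    n: number of patients (each contributes both measurements)
    r: within-patient correlation between the two measurements
    """
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b)
    return sd_diff / math.sqrt(n)


def unpaired_se(sd_a, sd_b, n):
    """SE obtained if the two periods are treated as independent arms of size n."""
    return math.sqrt(sd_a ** 2 / n + sd_b ** 2 / n)


# Hypothetical trial: 25 patients, SD of 10 under both treatments
n, sd_a, sd_b = 25, 10.0, 10.0
se_ignoring_pairing = unpaired_se(sd_a, sd_b, n)
se_paired = paired_se(sd_a, sd_b, n, r=0.5)
print(f"unpaired SE = {se_ignoring_pairing:.2f}, paired SE = {se_paired:.2f}")
```

With a positive within-patient correlation the paired standard error is smaller, so analysing a crossover trial as if it were a parallel arm trial gives it too little weight in a meta-analysis. At r = 0 the two formulas give the same value.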
It would be inappropriate to consider the results of parallel
arm trials as the perfect gold standard against which the
crossover trials are judged, or vice versa. Thus agreement
between the two study designs does not guarantee that the
estimates are unbiased. In particular for small trials, there may
be room for biased results with either type of design. In
all, our empirical evaluation highlights the need for better
understanding and improved use of crossover designs and
their data.
Conflict of interest: None declared.
KEY MESSAGES
Crossover designs contribute evidence in about a fifth of systematic reviews in the Cochrane Library.
Full use of crossover data is uncommon in meta-analyses.
The observed effect sizes in crossover and parallel trials do not disagree beyond chance across different topics.
However, there was a trend for more conservative treatment effect estimates in parallel arm trials.
References
1
Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for
examining heterogeneity and combining results from several studies
in meta-analysis. In: Egger M, Davey Smith G and Altman DG (eds).
Systematic Reviews in Health Care. Meta-Analysis in Context. 2nd edn.,
London: BMJ Books, 2001.
2
Elbourne DR, Altman DG, Higgins JPT, Curtin F, Worthington HV,
Vail A. Meta-analyses involving cross-over trials: methodological
issues. Int J Epidemiol 2002;31:140–49.
3
Garcia R, Benet M, Arnau C, Cobo E. Efficiency of the crossover
design: an empirical estimation. Stat Med 2004;23:3773–80.
4
John JA, Russell KG, Whitaker D. Crossover: an algorithm for
the construction of efficient crossover designs. Stat Med
2004;23:2645–58.
5
Senn S. Misunderstandings regarding clinical crossover trials. Stat Med
2005;24:3675–78.
6
Cochrane Collaboration. Reviewers’ Handbook 8.11.3.
7
Rosenthal R. Parametric measures of effect size. In: Cooper H and
Hedges LV (eds). The Handbook of Research Synthesis. New York: Russell
Sage Foundation, 1994.
8
DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin
Trials 1986;7:177–88.
9
Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for
Meta-Analysis in Medical Research. London: Wiley, 2004.
10
Higgins JP, Thompson SG. Quantifying heterogeneity in a
meta-analysis. Stat Med 2002;21:1539–58.
11
Ioannidis JPA, Cappelleri JC, Lau J. Issues in comparisons of
meta-analyses and large trials. JAMA 1998;279:1089–93.
12
Sterne JA, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M.
Statistical methods for assessing the influence of study characteristics
on treatment effects in ‘meta-epidemiological’ research. Stat Med
2002;21:1513–24.
13
Hasselblad V, Hedges LV. Meta-analysis of screening and diagnostic
tests. Psychol Bull 1995;117:167–78.
14
Fleiss JL. Analysis of data from multiclinic trials. Controlled Clin Trials
1986;7:267–75.
15
Gotzsche PC, Johansen HK. Short-term low dose corticosteroids vs
placebo and non-steroidal anti-inflammatory drugs in rheumatoid
arthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford:
Update Software, 2003.
16
Suarez-Almazor ME, Belseck E, Shea B, Wells G, Tugwell P.
Methotrexate for treating rheumatoid arthritis (Cochrane review).
The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
17
Ferreira IM, Brooks D, Lacasse Y, Goldstein RS, White J. Nutritional
supplementation for stable chronic obstructive pulmonary disease
(Cochrane review). The Cochrane Library. Issue 2. Oxford: Update
Software, 2003.
18
Pinelli J, Symington A. Non nutritive sucking for promoting
physiologic stability and nutrition in preterm infants
(Cochrane review). The Cochrane Library. Issue 2. Oxford: Update
Software, 2003.
19
White J, Cates C, Wright J. Continuous positive airways pressure for
obstructive sleep apnoea (Cochrane review). The Cochrane Library.
Issue 2. Oxford: Update Software, 2003.
20
Poustie VJ, Rutherford P. Dietary interventions for phenylketonuria
(Cochrane review). The Cochrane Library. Issue 2. Oxford: Update
Software, 2003.
21
Suarez-Almazor ME, Spooner C, Belseck E. Azathioprine for treating
rheumatoid arthritis (Cochrane review). The Cochrane Library. Issue 2.
Oxford: Update Software, 2003.
22
Brion LP, Primhak RA, Ambrosio-Perez I. Diuretics acting on the
distal renal tubule for preterm infants with (or developing) chronic
lung disease (Cochrane review). The Cochrane Library. Issue 2. Oxford:
Update Software, 2003.
23
Lemyre B, Davis PG, De Paoli AG. Nasal intermittent positive pressure
ventilation (NIPPV) versus nasal continuous positive airway pressure
(NCPAP) for apnea of prematurity (Cochrane review). The Cochrane
Library. Issue 2. Oxford: Update Software, 2003.
24
Lonergan E, Luxenberg J, Colford J. Haloperidol for agitation in
dementia (Cochrane review). The Cochrane Library. Issue 2. Oxford:
Update Software, 2003.
25
Jones A, Fay JK, Burr M, Stone M, Hood K, Roberts G. Inhaled
corticosteroid effects on bone metabolism in asthma and mild chronic
obstructive pulmonary disease (Cochrane review). The Cochrane
Library. Issue 2. Oxford: Update Software, 2003.
26
Green S, Buchbinder R, Barnsley L, Hall S, White M, Smidt N,
Assendelft W. Non-steroidal anti-inflammatory drugs (NSAIDS) for
treating lateral elbow pain in adults (Cochrane review). The Cochrane
Library. Issue 2. Oxford: Update Software, 2003.
27
Riemsma RP, Kirwan JR, Taal E, Rasker JJ. Patient education for
adults with rheumatoid arthritis (Cochrane review). The Cochrane
Library. Issue 2. Oxford: Update Software, 2003.
28
McCrory DC, Brown CD. Anti-cholinergic bronchodilators versus
beta2-sympathomimetic agents for acute exacerbations of chronic
obstructive pulmonary disease (Cochrane review). The Cochrane
Library. Issue 2. Oxford: Update Software, 2003.
29
Jurgens G, Graudal NA. Effects of low sodium diet versus high
sodium diet on blood pressure, renin, aldosterone, catecholamines,
cholesterols and triglyceride (Cochrane review). The Cochrane Library.
Issue 2. Oxford: Update Software, 2003.
30
Towheed TE, Judd MJ, Hochberg MC, Wells G. Acetaminophen for
osteoarthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford:
Update Software, 2003.
31
Hughes E, Collins J, Vandekerckhove P. Clomiphene citrate for
unexplained subfertility in women (Cochrane review). The Cochrane
Library. Issue 2. Oxford: Update Software, 2003.
32
Cheine M, Ahonen J, Wahlbeck K. Beta-blocker supplementation
of standard drug treatment for schizophrenia (Cochrane review).
The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
33
Clarke CE, Speller JM. Pergolide versus bromocriptine for
levodopa-induced complications in Parkinson’s disease
(Cochrane review). The Cochrane Library. Issue 2. Oxford: Update
Software, 2003.
34
Ortiz Z, Shea B, Suarez-Almazor ME, Moher D, Wells G, Tugwell P.
Folic acid and folinic acid for reducing side effects in
patients receiving methotrexate for rheumatoid arthritis
(Cochrane review). The Cochrane Library. Issue 2. Oxford: Update
Software, 2003.
35
Higgins JPT, Flicker L. Lecithin for dementia and cognitive impairment
(Cochrane review). The Cochrane Library. Issue 2. Oxford: Update
Software, 2003.
36
Wiffen P, Collins S, McQuay H, Carroll D, Jadad A, Moore A.
Anticonvulsant drugs for acute and chronic pain (Cochrane review).
The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
37
Melchart D, Linde K, Fischer P et al. Acupuncture for idiopathic
headache (Cochrane review). The Cochrane Library. Issue 2. Oxford:
Update Software, 2003.
38
Sultana A, Reilly J, Fenton M. Thioridazine for schizophrenia (Cochrane
review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
39
Proctor ML, Smith CA, Farquhar CM, Stones RW. Transcutaneous
electrical nerve stimulation and acupuncture for primary dysmenorrhoea (Cochrane review). The Cochrane Library. Issue 2. Oxford:
Update Software, 2003.
40
Hood WB Jr, Dans AL, Guyatt GH, Jaeschke R, McMurray JJV.
Digitalis for treatment of congestive heart failure in patients in sinus
rhythm (Cochrane review). The Cochrane Library. Issue 2. Oxford:
Update Software, 2003.
41
Karassa FB, Tatsioni A, Ioannidis JP. Design, quality, and bias
in randomized controlled trials of systemic lupus erythematosus.
J Rheumatol 2003;30:979–84.
42
Kyriakidi M, Ioannidis JPA. Design and quality considerations for
randomized controlled trials in systemic sclerosis. Arthritis Rheum
2002;47:73–81.
43
Jadad AR, Moher M, Browman GP et al. Systematic reviews and
meta-analyses on treatment of asthma: critical evaluation. Br Med J
2000;320:537–40.
44
Van Coevorden AM, Coenraads PJ, Svensson A et al. European
Dermato-Epidemiology Network (Eden). Overview of studies of
treatments for hand eczema-the EDEN hand eczema survey. Br J
Dermatol 2004;151:446–51.
45
Egger M, Davey Smith G, Schneider M, Minder C. Bias in
meta-analysis detected by a simple, graphical test. Br Med J
1997;315:629–34.
46
Ioannidis JP, Cappelleri JC, Sacks HS, Lau J. The relationship
between study design, results, and reporting of randomized clinical
trials of HIV infection. Control Clin Trials 1997;18:431–44.
47
Sterne JA, Davey Smith G. Sifting the evidence-what’s wrong with
significance tests? Br Med J 2001;322:226–31.
48
Ioannidis JP. Why most published research findings are false. PLoS
Med 2005;2:e124.
49
Senn S. The quality of systematic reviews. Review is biased. Br Med J
2000;321:297.