Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2007; all rights reserved. Advance Access publication 14 February 2007
International Journal of Epidemiology 2007;36:422–430
doi:10.1093/ije/dym001

METHODOLOGY

Evidence from crossover trials: empirical evaluation and comparison against parallel arm trials

Dimitrios N Lathyris,1 Thomas A Trikalinos1,2 and John PA Ioannidis1,2*

Accepted 20 December 2006

Background We aimed to evaluate empirically how crossover trial results are analysed in meta-analyses of randomized evidence and whether their results agree with those of parallel arm studies on the same questions.

Methods We used a systematic sample of Cochrane meta-analyses including crossover trials. We evaluated the methods of analysis for crossover results and compared the concordance of the estimated effect sizes in crossover vs parallel arm trials.

Results Of 334 screened reviews, 62 had crossover trials. Of those, 33 meta-analyses performed quantitative syntheses involving two-arm two-period crossover trials. There was large variability in how these trials were analysed; only one of the 33 meta-analyses stated that it used the data from both the first and second period with an appropriate paired approach. Nine meta-analyses used the first period data only and 14 gave no information at all on what they had done. Twenty-eight meta-analyses had both crossover (n = 137, sample size n = 7162) and parallel arm (n = 132, sample size n = 11 398) trials. Effect sizes correlated well between the two types of designs (ρ = 0.72). Differences on whether the summary effect had a P < 0.05 or not were common due to limited sample sizes. The summary relative odds ratio for parallel arm vs crossover designs for favourable outcomes was 0.87 (95% CI, 0.74–1.02).

Conclusions Crossover designs may contribute evidence in a fifth of systematic reviews, but few meta-analyses make use of their full data.
The results of crossover trials tend to agree with those of parallel arm trials, although there was a trend for more conservative treatment effect estimates in parallel arm trials.

Keywords Crossover studies, parallel studies, meta-analysis, empirical evaluation

1 Clinical Trials and Evidence-Based Medicine Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece.
2 Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts University School of Medicine, Boston, MA, USA.
* Corresponding author. Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece. E-mail: [email protected]

Randomized trials may be conducted with either a parallel arm design or variants of crossover designs. Parallel arm trials are far more popular in the literature. They are simpler to design and analyse and are generally feasible. We also have well-established and accepted methods for their meta-analysis.1 Crossover trials are less common, but not rare. An appraisal of issue 1, 2001 of the Cochrane Database of Systematic Reviews found that 184 out of 1000 reviews referred to crossover trials and 151 included data from crossover designs.2 A Medline search of crossover studies published between 2000 and 2003 in five major medical journals found 40 crossover trials.3

Crossover trials are difficult or inappropriate to apply for many research questions. Moreover, they represent relatively complex designs. In a two-treatment crossover study each patient may be perceived as his own control: ideally one might simply compare the effect of treatment of the first period to that of the second period. However, treatment-period interaction and carry-over effects jeopardize the validity of such simple inferences. Dealing with these problems poses analytical challenges.4,5 There are several different approaches for the analysis of data from crossover trials.
Diverse options apply to the primary analysis of the data; the problem is further accentuated in meta-analyses, where the information needed to perform some types of analyses may be missing from published reports. One may use the first period results only, but this wastes all the data after crossover and drastically diminishes the power to detect any intervention effects. Alternatively, one may use the first and second period data as two different parallel studies, but this approach is biased, because it ignores treatment-period interaction. Finally, one may use outcomes from paired analyses. However, this information is often missing from published reports.3 One can then try to approximate a paired analysis by using correlation coefficients imputed from previous studies with a sufficient amount of data.6 This lack of unanimity and the use of suboptimal methods may introduce bias in the analysis of crossover studies.

We considered that it would be useful to obtain empirical evidence on the handling of the data and the influence of data from crossover trials in meta-analyses of randomized evidence. We tried to answer the following questions. What is the relative contribution of crossover trials to randomized evidence for diverse medical interventions? How are crossover trials handled in current meta-analyses? Do the results of crossover trials agree with those of parallel studies in the same meta-analysis, or do they find stronger or weaker treatment effects? To answer these questions we analysed data from a systematic sample of meta-analyses.

Methods

Selection of meta-analyses, crossover studies and outcomes

We perused every fifth review out of the 1669 included in the Cochrane Library Issue 2, 2003 (in the order listed in the Cochrane Database) to identify reviews that included crossover randomized trials in quantitative syntheses (meta-analyses). The spacing of the evaluated reviews was chosen arbitrarily.
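As a rough illustration of the sampling scheme just described, the sketch below takes every fifth item from an ordered list of 1669 reviews. The starting position and the integer identifiers are assumptions made for illustration; the text states only the spacing and the ordering.

```python
def systematic_sample(items, step, start=0):
    """Return every `step`-th item of an ordered sequence, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(items[start::step])


# Hypothetical stand-in for the 1669 reviews, in database order.
reviews = list(range(1, 1670))

# Assumed starting point: the first review in the list.
screened = systematic_sample(reviews, step=5)
print(len(screened))
```

With 1669 items and a spacing of five, a start at any of the first four positions yields 334 sampled reviews, and a start at the fifth position yields 333.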
We excluded all non-randomized and pseudo-randomized trials, and selected only two-treatment, two-period crossover studies (AB/BA design). Comparisons of more than two treatments are rare and their interpretation is rather complicated. At a second step, we focused on meta-analyses with at least one parallel and at least one crossover trial so as to evaluate the concordance of results obtained with these two designs.

We examined both binary and continuous outcomes. In every systematic review we selected the primary outcome as stated by the reviewers. If a primary outcome was not clearly identified, we selected what we deemed to be the most clinically important outcome; if this was also not clear, we selected the one with the larger number of studies; if there were ties, we selected the one with the larger number of randomized patients; and then the meta-analysis with fewer zero cells in the 2 × 2 tables of the constituent studies (for binary outcomes). Whenever there were two or more independent meta-analyses from the same systematic review (e.g. on different, non-overlapping types of patients), we considered them as separate meta-analyses. Two investigators (D.L. and T.A.T.) independently selected eligible meta-analyses and crossover studies. Disagreements were referred to the third investigator (J.P.A.I.).

Data extraction on primary trials

For every eligible meta-analysis we extracted information on the specific comparison and outcome, and the number of parallel and crossover trials. For each trial we recorded the first author and year of publication, the design (parallel vs crossover), the way the meta-analysts utilized data from crossover trials [first period only, second period only, 'combined' data (and if so how combined), or no comment/unclear], the approach of the meta-analysis to the carry-over effect in each crossover trial, and the information necessary for the calculation of the effect sizes (2 × 2 tables for binary outcomes, and number of patients, mean response and SD of the responses in each arm for continuous outcomes).

Statistical methods

Meta-analyses

Quantitative analyses were performed in meta-analyses that included both crossover and parallel studies. For binary outcomes we used the odds ratio (OR) as the metric of choice. Continuous outcomes were quantified using standardized mean differences (SMDs), as expressed by Hedges's g metric. SMDs express the magnitude of the treatment effects relative to the within-group standard deviations and can be used to synthesize studies that have quantified treatment effects on different scales.7

All comparisons were coded so that the experimental treatment is compared vs the standard/older treatment or no treatment (or placebo) for a good, favourable outcome. For example, if the outcome used by the original trials and their meta-analysis was therapeutic success, we kept the data as is; however, if it was therapeutic failure, then we took the complementary counts, i.e. we inverted the OR. If the outcome was measured on a continuous scale where health was better with higher scores, we kept the data as is; if the scale meant worse health with higher scores, then we reversed the sign of the difference between arms in the calculation of the SMD. Therefore an OR > 1 or SMD > 0 suggests that the experimental treatment fares better than the comparator.
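The direction convention can be made concrete with a short sketch. It computes Hedges's g with the usual small-sample correction and an odds ratio from a 2 × 2 table, then flips the sign or inverts the ratio when the reported outcome is unfavourable. The function names, argument names and the variance approximation for g are illustrative assumptions, not the authors' code; the 0.5 zero-cell correction follows the rule given in the Methods, applied here to any zero cell so that the ratio is always defined.

```python
import math


def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl, higher_is_better=True):
    """Return (g, se_g) for experimental vs comparator, signed so that g > 0 is favourable."""
    df = n_exp + n_ctl - 2
    sd_pooled = math.sqrt(((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (mean_exp - mean_ctl) / sd_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    g = j * d
    # Common large-sample approximation to the variance of g (an assumption here).
    var_g = j ** 2 * ((n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2.0 * (n_exp + n_ctl)))
    if not higher_is_better:
        g = -g  # higher scores mean worse health, so reverse the sign
    return g, math.sqrt(var_g)


def favourable_odds_ratio(events_exp, n_exp, events_ctl, n_ctl, outcome_is_favourable=True):
    """Return (OR, se of ln OR), oriented so that OR > 1 favours the experimental arm."""
    a, b = events_exp, n_exp - events_exp
    c, d = events_ctl, n_ctl - events_ctl
    if 0 in (a, b, c, d):
        # Add 0.5 to all cells of the 2 x 2 table when a zero cell is present.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    if not outcome_is_favourable:
        odds_ratio = 1.0 / odds_ratio  # same as using the complementary counts
    return odds_ratio, se_log_or


# Hypothetical trial: 10/20 failures on the experimental arm vs 5/20 on the comparator.
or_failure, _ = favourable_odds_ratio(10, 20, 5, 20, outcome_is_favourable=False)
# The same trial expressed as successes: 10/20 vs 15/20.
or_success, _ = favourable_odds_ratio(10, 20, 15, 20, outcome_is_favourable=True)
```

Both calls return the same odds ratio, which is the sense in which inverting the OR and taking the complementary counts are equivalent. The standard error of ln OR is unaffected by the inversion.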
For every topic we calculated the summary effect size of crossover and parallel studies with both fixed and random effects inverse variance syntheses.8,9 Fixed effects analyses assume that the true effect of treatment is the same across synthesized studies, while random effects analyses allow for between-study heterogeneity (dissimilarity) and incorporate it in the calculations. For binary outcomes, if any trial arm in the analysed studies had zero observed events, we added 0.5 to all cells of the pertinent 2 × 2 table. We tested for between-study heterogeneity using the χ²-distributed Q statistic and quantified its extent with the I² statistic.10 I² ranges between 0% and 100%, and values above 75% imply very high heterogeneity.

Assessment of concordance

We evaluated the agreement between parallel and crossover designs. Concordance was measured by assessing the Spearman correlation coefficient for the effect size estimates between the two designs across all meta-analyses; how often the two designs showed effects in the same direction; how often the two designs agreed with respect to having P < 0.05 for the summary effect; and whether the effect size estimates of the two designs differed beyond chance.11 For the latter assessment, we calculated the relative odds ratio (ROR) by dividing the summary OR in parallel trials by the summary OR in crossover trials and also estimated the variance and 95% confidence intervals (CIs) of the ROR.12 We transformed the weighted SMDs of the continuous outcomes to ORs13 using the following formulae: $\mathrm{OR} = e^{(\pi\sqrt{3}/3)\,g}$ with $\mathrm{se}(\ln \mathrm{OR}) = (\pi\sqrt{3}/3)\,\mathrm{se}(g)$. When ROR > 1, parallel trials yield larger OR estimates than crossover trials, i.e. the parallel trials give a more favourable picture for the experimental intervention than the crossover trials, and vice versa when ROR < 1.
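The calculations described in this part of the Methods can be sketched as follows. The sketch is illustrative only: it assumes the DerSimonian-Laird moment estimator for the random effects model, a normal approximation for confidence intervals, and independence of the two summary estimates so that the variance of ln ROR is the sum of the two variances. None of these choices is spelled out in the text, and all function names are hypothetical.

```python
import math

# ln(OR) = (pi * sqrt(3) / 3) * g, the conversion quoted in the text.
SMD_TO_LOG_OR = math.pi * math.sqrt(3) / 3


def smd_to_log_or(g, se_g):
    """Convert a standardized mean difference and its SE to the ln(OR) scale."""
    return SMD_TO_LOG_OR * g, SMD_TO_LOG_OR * se_g


def pool_inverse_variance(effects, ses):
    """Fixed and random effects summaries with Q and I^2 (effects are g or ln OR)."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Moment estimate of the between-study variance (assumed estimator).
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, effects)) / total_re
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"fixed": fixed, "se_fixed": math.sqrt(1.0 / total_w),
            "random": random, "se_random": math.sqrt(1.0 / total_re),
            "q": q, "df": df, "tau2": tau2, "i2": i2}


def se_from_ci(lower, upper, z=1.96):
    """Recover the SE of ln(OR) from a reported 95% CI on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def relative_odds_ratio(log_or_parallel, se_parallel, log_or_crossover, se_crossover, z=1.96):
    """ROR = OR(parallel) / OR(crossover) with an approximate 95% CI."""
    log_ror = log_or_parallel - log_or_crossover
    se = math.sqrt(se_parallel ** 2 + se_crossover ** 2)
    return math.exp(log_ror), math.exp(log_ror - z * se), math.exp(log_ror + z * se)


# Worked example with the summary ORs reported in Table 2 for reference 33:
# parallel OR 1.48 (0.83, 2.63), crossover OR 3.30 (1.25, 8.68).
ror, ror_low, ror_high = relative_odds_ratio(
    math.log(1.48), se_from_ci(0.83, 2.63),
    math.log(3.30), se_from_ci(1.25, 8.68),
)
print(f"ROR = {ror:.3f} (95% CI {ror_low:.3f}, {ror_high:.3f})")
```

Under these assumptions the worked example lands close to the 0.45 (0.15, 1.38) shown for that meta-analysis in Figure 2. A P-value for Q would additionally need the χ² distribution with `df` degrees of freedom, which is omitted here to keep the sketch free of third-party dependencies.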
Finally, we assessed whether there was a systematic difference in the estimated summary effects between the two designs across all topics. The summary ROR was calculated across all topics with random effects syntheses, and heterogeneity in the ROR estimates across meta-analyses was measured with the Q and I² statistics.

Subgroup and sensitivity analyses

The main analyses included all topics, regardless of the way data from the crossover studies were reported to be analysed. In separate analyses we estimated the summary ROR across the two designs considering only the crossover studies where only data from the first period were used, and estimated the summary ROR considering all other crossover studies. We also performed a sensitivity analysis by excluding the largest meta-analysis, which included almost half of the crossover studies in our empirical evaluation.

Analyses were conducted in Intercooled Stata 8.2 and in SPSS 13.0. All P-values are two-tailed. Hypothesis testing for between-study heterogeneity is done at α = 0.10.14

Results

Eligible meta-analyses

Figure 1 shows the flow of the systematic screening of the Cochrane Library. We screened 334 systematic reviews and excluded 272 of them, as they did not have any crossover trials. Of the remaining 62 systematic reviews, 26 were finally selected.15–40 These pertained to 28 independent meta-analyses, because one review contained eligible data for three different meta-analyses (on three different types of study populations).

Analysis of data from crossover trials

A variety of approaches were used to incorporate the results of crossover trials in the 28 meta-analyses that included both types of designs. Twelve of these 28 meta-analyses did not mention their approach towards crossover trial results at all. Nine meta-analyses used only the first period results of crossover studies.
Three meta-analyses combined data from the first and second periods, but only one of them described the method of combination, which was the paired analysis proposed by the Cochrane group. One meta-analysis used data from the second period only. Finally, three meta-analyses did not have a consistent approach towards analysis of crossover trials, using first period data in some trials and data from both periods in others. Only four of the 28 meta-analyses discussed the problem of carry-over effect: three of them assumed that there was no carry-over effect in the analysed crossover studies and one relied on the carry-over test results of crossover studies as reported in their initial publication.

Figure 1 Flow of systematic reviews
334 potentially eligible reviews
    272 without crossover trials
62 with at least one crossover trial
    36 excluded:
        No outcomes with > 1 study (n = 14)
        No quantitative meta-analysis (n = 9)
        All crossover trials had > 2 groups (n = 4)
        Only parallel arm studies in meta-analysis (n = 2)
        No study-level graphical/numerical information (n = 1)
        Identical meta-analysis with another review (n = 1)
31 with at least one AB/BA crossover trial in a meta-analysis
    Five had only crossover trials in their meta-analyses
26 eligible for quantitative analyses

In the five meta-analyses that had included only crossover trials, two meta-analyses did not mention at all what they had done, one approximated a paired analysis as described in the Reviewers' Handbook of the Cochrane Collaboration, one used data from the second period, and one combined data from both periods without clarifying precisely the analysis methods. Only one of these meta-analyses stated that it assumed that there was no carry-over effect in the analysed crossover studies, while the remaining four meta-analyses did not comment at all on the potential problem of carry-over effect.
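For readers unfamiliar with an approximate paired analysis, the following sketch shows one common way to obtain it from the summary statistics that crossover reports usually give. It relies on the standard identity for the variance of a difference between correlated measurements. The within-patient correlation `r` is an imputed value, and the function names and example numbers are hypothetical; the text does not specify the exact procedure used by any of the reviews.

```python
import math


def paired_mean_difference(mean_a, sd_a, mean_b, sd_b, n, r):
    """Mean difference (A minus B) and its SE for n patients who each received A and B.

    `r` is an assumed (imputed) within-patient correlation between the two periods.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    md = mean_a - mean_b
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2 - 2.0 * r * sd_a * sd_b)
    return md, sd_diff / math.sqrt(n)


def unpaired_se(sd_a, sd_b, n):
    """SE obtained when the two periods are treated as independent groups of size n."""
    return math.sqrt(sd_a ** 2 / n + sd_b ** 2 / n)


# Hypothetical crossover trial: 16 patients, SD of 4 under each treatment.
md, se_paired = paired_mean_difference(10.0, 4.0, 8.0, 4.0, n=16, r=0.5)
se_parallel_like = unpaired_se(4.0, 4.0, n=16)
print(md, se_paired, se_parallel_like)
```

With a positive correlation the paired standard error is smaller than the one obtained by treating the periods as two parallel groups, and with `r = 0` the two coincide. This is why ignoring the pairing, or discarding the second period, gives up precision that the design was meant to provide.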
Characteristics of crossover and parallel arm trials

In the 28 meta-analyses that included both types of designs (Table 1), there were 142 crossover trials. We excluded five crossover trials from two meta-analyses (four trials analysed more than two groups and one was a partial crossover trial where many patients continued the first period's treatment in the second period). Finally, we analysed 137 crossover studies with a total sample size of 7162 and 132 parallel studies with 11 398 study subjects. In one topic (effects of low sodium diet vs high sodium diet on blood pressure, and biochemical indices29) three different outcomes were analysed that included 91 crossover analyses with a total sample size of 4518 and 30 parallel studies with 4467 subjects. These trials represented a large proportion of the overall database and thus we performed sensitivity analyses excluding this topic.

Table 1 Characteristics of included meta-analyses
Meta-analysis (year) | Reference number | CDSR | Interventions and disease | Outcome used in the meta-analysis: definition | ES > 0 or OR > 1 favours experimental intervention | Crossover: N studies (patients) | Parallel: N studies (patients) | Analysis of crossover trials
Continuous
Gotzsche (2002) | 15 | CD000189 | Short-term corticosteroids vs placebo for RA | Joint tenderness | No | 2 (66) | 1 (40) | Combined data
Suarez-Almazor (1997) | 16 | CD000957 | MTX vs placebo for RA | Number of tender joints | No | 2 (55) | 3 (164) | Unclear
Ferreira (2001) | 17 | CD000998 | Nutrition vs placebo/usual diet for stable COPD | Weight loss | Yes | 1 (25) | 8 (252) | First period
Pinelli (2001) | 18 | CD001071 | Non-nutritive sucking vs control in preterm infants | Heart rate change | No | 2 (66) | 1 (20) | Unclear
White (2001) | 19 | CD001106 | CPAP vs placebo for obstructive sleep apnea | Epworth Sleepiness Scale | No | 2 (86) | 2 (212) | Unclear
Poustie (2003) | 20 | CD001304 | Low PA diet vs no diet for PKU | Blood PA level | No | 1 (32) | 1 (9) | Unclear
Suarez-Almazor (2000) | 21 | CD001461 | Azathioprine vs placebo for RA | Number of tender joints | No | 2 (53) | 1 (28) | Mixed
Brion (2001) | 22 | CD001817 | Diuretics vs placebo for non-intubated preterms with CLD | Change in compliance | Yes | 1 (20) | 1 (21) | Combined data
Lemyre (2001) | 23 | CD002272 | NIPPV vs NCPAP for apnea of prematurity | PCO2 at 4–6 h | No | 1 (40) | 1 (34) | Combined data
Lonergan (2002) | 24 | CD002852 | Haloperidol vs placebo for agitation in dementia | Change in agitation | No | 1 (60) | 3 (309) | First period
Jones (2001) | 25 | CD003537 | Inhaled corticosteroid vs placebo for asthma and mild COPD | Osteocalcin | Yes | 2 (111) | 1 (30) | Second period
Green (2001) | 26 | CD003686 | NSAID vs placebo for lateral elbow pain | Pain VAS | No | 1 (28) | 2 (102) | Unclear
Riemsma (2003) | 27 | CD003688 | Patient education vs control in RA | Change in pain score | No | 4 (262) | 33 (1957) | First period
McCrory (2003) | 28 | CD003900 | Anticholinergics vs β2-agonists for COPD exacerbation | Change in FEV1 at 90 min | Yes | 2 (58) | 2 (71) | First period
Jurgens (2003) | 29a | CD004022 | Low vs high Na diet (Caucasians, normal SBP) | Change in SBP | No | 47 (2612) | 10 (2484) | Unclear
Jurgens (2003) | 29b | CD004022 | Low vs high Na diet (Caucasians, elevated SBP) | Change in SBP | No | 38 (1556) | 18 (1811) | Unclear
Jurgens (2003) | 29c | CD004022 | Low vs high Na diet (African-Americans, elevated DBP) | Change in SBP | No | 6 (350) | 2 (172) | Unclear
Toweed (2002) | 30 | CD004257 | NSAIDS vs acetaminophen for OA | Rest pain | No | 1 (148) | 1 (123) | Unclear
Binary
Hughes (1999) | 31 | CD000057 | Clomiphene citrate vs placebo for unexplained subfertility | Pregnancies per patient | Yes | 3 (357) | 2 (98) | Mixed
Cheine (2001) | 32 | CD000234 | β-blocker supplementation vs placebo in schizophrenia | Leaving the study early | No | 1 (8) | 4 (109) | First period
Clarke (2001) | 33 | CD000236 | Pergoline vs bromocriptine for L-Dopa complications in PD | Proportion improved | Yes | 1 (114) | 1 (191) | First period
Ortiz (1999) | 34 | CD000951 | Folinic acid vs placebo for MTX side effects in RA | GI side effects | No | 2 (33) | 3 (135) | Unclear
Higgins (2003) | 35 | CD001015 | Lecithin vs control for AD | Proportion deteriorated | No | 1 (15) | 1 (37) | First period
Wiffen (2000) | 36 | CD001133 | Anticonvulsants vs placebo for all neuropathic pain | Proportion improved | Yes | 5 (675) | 2 (380) | Unclear
Melchart (2000) | 37 | CD001218 | True vs sham acupuncture for idiopathic headache | Proportion improved | Yes | 1 (18) | 1 (30) | First period
Sultana (2000) | 38 | CD001944 | Thioridazine vs typical neuroleptics for schizophrenia | Leaving the study early | No | 2 (81) | 19 (1656) | Combined data
Proctor (2001) | 39 | CD002123 | TENS vs placebo for primary dysmenorrhoea | Proportion with pain relief | Yes | 1 (42) | 2 (29) | First period
Hood (2001) | 40 | CD002901 | Digitalis vs control for CHF (patients with sinus rhythm) | Clinical deterioration | No | 4 (191) | 6 (894) | Unclear

AD, Alzheimer's disease; CDSR, Cochrane database of systematic reviews number; CHF, congestive heart failure; CLD, chronic lung disease; COPD, chronic obstructive pulmonary disease; (N)CPAP, (nasal) continuous positive airway pressure; DBP, diastolic blood pressure; ES, effect size; h, hours; GI, gastrointestinal; MTX, methotrexate; NIPPV, nasal intermittent positive pressure ventilation; NSAID, non-steroid anti-inflammatory drugs; N studies, number of studies; OA, osteoarthritis; OR, odds ratio; PA, phenylalanine; PKU, phenylketonuria; PD, Parkinson's disease; RA, rheumatoid arthritis; SBP, systolic blood pressure; TENS, transcutaneous electrical nerve stimulation; VAS, visual analogue scale.
Note: All meta-analyses are included in the Cochrane Library, Issue 2, 2003. The year next to the authors' name refers to the last amendment of each meta-analysis. Outcomes are described as defined in the systematic reviews. For the quantitative analyses, all comparisons were coded so that the experimental treatment is compared vs the standard/older treatment or no treatment (or placebo) for a good, favourable outcome. Thus outcome definitions were reversed for entries with 'No' in the column 'ES > 0 or OR > 1 favours experimental intervention'.

A wide variety of diseases and interventions were represented.
The most common targeted categories of diseases were joint diseases (n = 7, of which n = 5 for rheumatoid arthritis) and respiratory diseases (n = 6, of which n = 3 for chronic obstructive pulmonary disease). Parallel trials had a larger analysed sample size than crossover trials (median 46 vs 36, P = 0.006). The crossover trials contributed anywhere between 5% and 79% of the total sample size of analysed subjects across these meta-analyses. In 16 of 28 meta-analyses, the total sample size in the parallel trials was larger than the sample size analysed in the crossover trials. There was no difference in the year of publication of crossover vs parallel trials (median 1990 vs 1991, P = 0.36). Of the 28 meta-analyses, 18 used continuous outcomes (Table 1).

Treatment effects and between-study heterogeneity

Summary SMDs expressed by Hedges's g for continuous outcomes, summary ORs for binary outcomes and I² for the extent of heterogeneity are shown in Table 2. These data refer to all studies and separately to crossover and parallel studies. Between-study heterogeneity did not seem to be clearly explained by differences between crossover and parallel trials. Only one meta-analysis found very large between-study heterogeneity overall; one found very large between-study heterogeneity when separate analyses were performed for crossover trials; and two when separate analyses were performed for parallel arm trials.

Concordance between crossover and parallel trial results

The summary effect sizes for crossover trials were highly correlated with the summary effects in parallel trials across all 28 meta-analyses (Spearman correlation coefficient 0.72, P < 0.001). The results were similar, but based on sparser data, separately for meta-analyses of continuous (ρ = 0.68, P = 0.002) and binary (ρ = 0.68, P = 0.030) outcomes. In 23 of the 28 meta-analyses, the point summary estimates were in the same direction with both designs (15/18 continuous outcomes, 6/8 binary outcomes).
When the overall results had P < 0.05, the two types of designs always found estimates of effects in the same direction. In six meta-analyses, evidence from both designs gave P < 0.05 in the summary results, and in 12 meta-analyses both designs gave P ≥ 0.05. However, in 10 meta-analyses the two designs differed in the presence or not of P < 0.05 (in seven only the crossover data resulted in P < 0.05, in three only the parallel trials data resulted in P < 0.05). Overall the kappa coefficient was 0.27 (95% CI, −0.07 to 0.62).

ROR analyses

ROR analyses showed that the difference in the effect size observed in the two types of designs was beyond chance in only one of the 28 meta-analyses (Figure 2). This meta-analysis evaluated distal loop diuretics in preterm infants with chronic lung disease: there was only one small parallel trial (n = 21 infants) and one even smaller crossover trial [n = 10 infants, counted twice in the analysis (for the two periods)] that found opposite results. The crossover trial found even formally statistically significant results, despite its small sample size.

There was considerable variability in the point estimates of the ROR per meta-analysis and 12 of the ROR point estimates were either larger than two or smaller than 0.5 (Figure 2). However, the large majority of RORs had wide CIs, because usually the amount of evidence was not extensive with either or both designs. Overall, there was no between-meta-analysis heterogeneity in the ROR estimates (P = 0.46, I² = 0%). The summary ROR was 0.87 and its 95% CI marginally encompassed 1.00 (ROR = 0.87, 95% CI 0.74–1.02), suggesting a trend for more conservative treatment effects in parallel arm trials than in crossover trials. Exclusion of the largest meta-analysis yielded similar estimates, with a summary ROR of 0.88. The summary ROR estimates were similar in meta-analyses that clearly specified that they had used consistently only the first period data (ROR = 0.87) and the other meta-analyses (ROR = 0.87).
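The kappa coefficient for agreement on P < 0.05 can be recomputed from the four counts given in the concordance results (both designs P < 0.05 in six meta-analyses, neither in 12, crossover only in seven, parallel only in three). The sketch assumes an unweighted Cohen's kappa, which the text does not name explicitly; the function and argument names are illustrative.

```python
def cohen_kappa_2x2(both_yes, first_only, second_only, both_no):
    """Unweighted Cohen's kappa for two binary ratings summarized as four counts."""
    n = both_yes + first_only + second_only + both_no
    observed = (both_yes + both_no) / n
    p_first = (both_yes + first_only) / n    # proportion rated "yes" by the first rater
    p_second = (both_yes + second_only) / n  # proportion rated "yes" by the second rater
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    return (observed - expected) / (1 - expected)


# "First rater" = crossover trials, "second rater" = parallel trials.
kappa = cohen_kappa_2x2(both_yes=6, first_only=7, second_only=3, both_no=12)
print(round(kappa, 2))  # 0.27
```

Observed agreement is 18/28, but about half of that is expected by chance given how often each design reaches P < 0.05, which is why the coefficient is low.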
The ROR estimates were 0.94 for binary and 0.86 for continuous data, respectively, and these did not differ beyond chance (P = 0.74). Finally, the ROR was still 0.91 (95% CI 0.73–1.13) when limited to meta-analyses where the analysed sample size of the crossover designs was smaller than the sample size of the parallel arm trials (Table 3).

Discussion

Our empirical evaluation shows that crossover designs may contribute evidence on the effectiveness of medical interventions in about a fifth of systematic reviews. We found that their results correlate with those of parallel arm trials in broad terms. However, if anything, there was a trend for more conservative treatment effect estimates in parallel arm trials. This trend should be seen with caution. Perhaps more importantly, the majority of meta-analyses do not mention at all their approach towards the carry-over effect problem and none of them proceeded to reanalysis of data from all crossover periods with testing for carry-over effect. Systematic reviewers usually do not state how exactly they have handled the results of such studies, while a common practice is to use only the first period results, thus wasting the second period information.

For some disciplines, such as rheumatology, respiratory medicine and dermatology, crossover trials may be encountered quite commonly in the literature. This has been demonstrated also in empirical evaluations of large samples of trials in these disciplines.41–44 Sometimes medical interventions may have been evaluated only with crossover trials.
Table 2 Effect sizes and heterogeneity in the eligible meta-analyses overall and per study design
Meta-analysis (year) | Reference number | All studies combined: N | Effect (95% CI) | I²% (pHet) | Crossover trials only: N | Effect (95% CI) | I²% (pHet) | Parallel trials only: N | Effect (95% CI) | I²% (pHet)
Continuous
Gotzsche (2002) | 15 | 3 | 1.32 (0.46, 2.18) | 74 (0.02) | 2 | 1.58 (0.34, 2.82) | 78 (0.03) | 1 | 0.85 (0.20, 1.49) | NA
Suarez-Almazor (1997) | 16 | 5 | 0.86 (0.58, 1.14) | 0 (0.58) | 2 | 1.12 (0.55, 1.70) | 0 (0.52) | 3 | 0.77 (0.45, 1.09) | 0 (0.50)
Ferreira (2001) | 17 | 9 | 0.07 (−0.25, 0.40) | 37 (0.12) | 1 | −0.58 (−1.39, 0.22) | NA | 8 | 0.15 (−0.18, 0.48) | 31 (0.18)
Pinelli (2001) | 18 | 3 | −0.19 (−0.76, 0.38) | 41 (0.18) | 2 | −0.01 (−0.64, 0.62) | 39 (0.20) | 1 | −0.68 (−1.59, 0.23) | NA
White (2001) | 19 | 4 | 0.84 (0.52, 1.16) | 38 (0.18) | 2 | 0.46 (−0.26, 1.18) | 51 (0.15) | 2 | 1.02 (0.73, 1.31) | 0 (0.97)
Poustie (2003) | 20 | 2 | 2.53 (0.28, 4.78) | 61 (0.11) | 1 | 1.72 (0.89, 2.54) | NA | 1 | 4.15 (1.28, 7.03) | NA
Suarez-Almazor (2000) | 21 | 3 | 1.12 (0.30, 1.93) | 61 (0.08) | 2 | 1.51 (0.11, 2.92) | 72 (0.06) | 1 | 0.60 (−0.16, 1.37) | NA
Brion (2001) | 22 | 2 | 0.64 (−0.92, 2.21) | 82 (0.02) | 1 | 1.47 (0.45, 2.48) | NA | 1 | −0.13 (−1.00, 0.73) | NA
Lemyre (2001) | 23 | 2 | −0.10 (−0.56, 0.36) | 0 (0.89) | 1 | −0.13 (−0.75, 0.49) | NA | 1 | −0.07 (−0.74, 0.61) | NA
Lonergan (2002) | 24 | 4 | 0.12 (−0.08, 0.33) | 0 (0.99) | 1 | 0.15 (−0.39, 0.69) | NA | 3 | 0.12 (−0.10, 0.34) | 0 (0.96)
Jones (2001) | 25 | 3 | −0.41 (−0.96, 0.13) | 48 (0.15) | 2 | −0.24 (−0.86, 0.38) | 45 (0.18) | 1 | −0.84 (−1.63, −0.05) | NA
Green (2001) | 26 | 3 | 0.93 (0.56, 1.29) | 0 (0.38) | 1 | 0.69 (−0.07, 1.46) | NA | 2 | 0.92 (0.33, 1.52) | 33 (0.22)
Riemsma (2003) | 27 | 37 | 0.08 (−0.01, 0.17) | 7 (0.36) | 4 | 0.24 (−0.13, 0.60) | 44 (0.15) | 33 | 0.07 (−0.02, 0.16) | 1 (0.44)
McCrory (2003) | 28 | 4 | −0.01 (−0.36, 0.33) | 0 (0.97) | 2 | 0.05 (−0.47, 0.56) | 0 (0.97) | 2 | −0.07 (−0.54, 0.40) | 0 (0.73)
Jurgens (2003) | 29a | 57 | 0.14 (0.07, 0.22) | 25 (0.05) | 47 | 0.16 (0.06, 0.26) | 28 (0.04) | 10 | 0.13 (0.04, 0.22) | 7 (0.38)
Jurgens (2003) | 29b | 56 | 0.27 (0.19, 0.34) | 11 (0.24) | 38 | 0.34 (0.24, 0.44) | 0 (0.48) | 18 | 0.19 (0.08, 0.30) | 18 (0.24)
Jurgens (2003) | 29c | 8 | 0.61 (0.24, 0.98) | 72 (<0.01) | 6 | 0.52 (0.04, 1.00) | 76 (<0.01) | 2 | 0.88 (−0.06, 1.82) | 79 (0.03)
Toweed (2002) | 30 | 2 | −0.32 (−0.56, −0.08) | 0 (0.96) | 1 | −0.32 (−0.65, 0.00) | NA | 1 | −0.31 (−0.67, 0.04) | NA
Binary
Hughes (1999) | 31 | 5 | 2.30 (1.25, 4.23) | 6 (0.37) | 3 | 2.14 (1.20, 3.82) | 0 (0.49) | 2 | 3.41 (0.11, 110.90) | 60 (0.11)
Cheine (2001) | 32 | 5 | 0.65 (0.17, 2.56) | 0 (0.63) | 1 | 1.57 (0.02, 98.96) | NA | 4 | 0.59 (0.14, 2.50) | 0 (0.50)
Clarke (2001) | 33 | 2 | 2.00 (0.94, 4.28) | 49 (0.16) | 1 | 3.30 (1.25, 8.68) | NA | 1 | 1.48 (0.83, 2.63) | NA
Ortiz (1999) | 34 | 5 | 1.74 (0.84, 3.64) | 0 (0.70) | 2 | 0.98 (0.21, 4.53) | 0 (0.90) | 3 | 2.07 (0.90, 4.81) | 0 (0.48)
Higgins (2003) | 35 | 2 | 0.33 (0.10, 1.08) | 0 (0.33) | 1 | 0.11 (0.01, 1.41) | NA | 1 | 0.45 (0.12, 1.70) | NA
Wiffen (2000) | 36 | 7 | 5.14 (3.38, 7.82) | 34 (0.17) | 5 | 6.42 (3.33, 12.36) | 36 (0.18) | 2 | 4.05 (2.22, 7.37) | 38 (0.20)
Melchart (2000) | 37 | 2 | 3.58 (0.92, 13.90) | 0 (0.75) | 1 | 2.80 (0.36, 21.73) | NA | 1 | 4.33 (0.71, 26.53) | NA
Sultana (2000) | 38 | 21 | 1.02 (0.74, 1.39) | 9 (0.34) | 2 | 0.81 (0.08, 8.51) | 0 (0.90) | 19 | 1.00 (0.71, 1.41) | 18 (0.23)
Proctor (2001) | 39 | 3 | 1.55 (0.54, 4.50) | 0 (0.63) | 1 | 1.50 (0.43, 5.25) | NA | 2 | 1.71 (0.23, 12.71) | 0 (0.34)
Hood (2001) | 40 | 10 | 3.51 (2.30, 5.34) | 0 (0.58) | 4 | 2.33 (0.92, 5.95) | 0 (0.41) | 6 | 3.89 (2.43, 6.23) | 0 (0.57)

N: Number of trials.
Note: The effect size is Hedges's g for continuous outcomes and the odds ratio (OR) for binary outcomes. In this table outcomes have been coded so that g > 0 and OR > 1 imply favourable effects for the experimental intervention.

It is well appreciated that crossover designs should not be performed for interventions that have substantial carry-over effects. One might reasonably question whether this prerequisite is fulfilled for several of the topics in Table 1, for example the use of clomiphene citrate vs placebo for unexplained subfertility. Therefore, the caveats associated with the design and conduct of these designs should be better understood.
This applies both to the design and conduct of the original trials and to their incorporation into systematic reviews and meta-analyses.

Although treatment effects correlated well between parallel arm and crossover trials, there was a trend for crossover trials to give more favourable results for the experimental intervention. The trend could be due to chance. Alternatively, it may reflect the fact that crossover trials tended to be smaller and it is common for small trials to give more impressive estimates of treatment effects than larger trials on the same topic.45 However, the number of trials per meta-analysis was too small to allow examining small study effects within each meta-analysis in a meaningful way. A second possibility is that crossover trials may have shorter follow-up than parallel trials and sometimes treatment effects may wane over time with longer follow-up.46 However, follow-up data were not available in sufficient detail to allow us to explore this explanation further.
Overall, the difference between the two designs, even if truly present, does not seem large enough to suggest that one type of design in particular yields consistently biased results.

Figure 2 Relative odds ratios (ROR) between crossover and parallel arm trials. RORs and 95% confidence intervals for each meta-analysis for the comparison of effect sizes in the two types of designs. Separate summary ROR estimates are provided for the continuous outcomes (top) and binary outcomes (bottom). The grand summary pertains to the summary ROR estimate across all meta-analyses. All comparisons were coined so that the experimental treatment is compared vs the standard/older treatment or no treatment (or placebo) for a good, favourable outcome. Confidence intervals that extend beyond the border of the graph have been marked with two crossing lines (like an ‘x’). Horizontal axis: ROR and 95% CI, logarithmic scale from 0.01 to 100.

Continuous outcomes
Author (reference)        ROR (95% CI)
Brion (22)                0.06 (0.01, 0.62)
Suarez Almazor (21)       0.19 (0.01, 3.47)
Gotzsche (15)             0.26 (0.02, 3.32)
Pinelli (18)              0.30 (0.04, 2.22)
Jones (25)                0.34 (0.06, 2.08)
Suarez Almazor (16)       0.53 (0.16, 1.76)
Riemsma (27)              0.74 (0.37, 1.45)
Jurgens (29)              0.77 (0.58, 1.01)
McCrory (28)              0.81 (0.23, 2.87)
Jurgens (29)              0.95 (0.74, 1.21)
Lonergan (24)             0.95 (0.33, 2.72)
Toweed (30)               1.02 (0.43, 2.44)
Lemyre (23)               1.12 (0.21, 5.91)
Green (26)                1.52 (0.26, 8.80)
Jurgens (29)              1.93 (0.29, 13.10)
White (19)                2.75 (0.67, 11.29)
Ferreira (17)             3.79 (0.79, 18.28)
Poustie (20)              83.24 (0.37, >100)
Summary (continuous)      0.86 (0.67, 1.10)

Binary outcomes
Author (reference)        ROR (95% CI)
Cheine (32)               0.37 (0.01, 30.03)
Clarke (33)               0.45 (0.15, 1.38)
Wiffen (36)               0.63 (0.26, 1.53)
Proctor (39)              1.14 (0.11, 12.13)
Sultana (38)              1.23 (0.11, 13.16)
Melchart (37)             1.55 (0.10, 23.85)
Hughes (31)               1.60 (0.05, 54.39)
Hood (40)                 1.67 (0.58, 4.75)
Ortiz (34)                2.11 (0.37, 12.06)
Higgins (35)              4.20 (0.23, 76.23)
Summary (binary)          0.94 (0.57, 1.55)

Grand summary             0.87 (0.74, 1.02)

Table 3 Summary ROR estimates

                                 N     ROR (95% CI)        I2% (pHet)
Overall
  All meta-analyses              28    0.87 (0.74–1.02)    0 (0.46)
  Excluding CD004022             25    0.88 (0.65–1.18)    5 (0.40)
Outcome type
  Continuous                     18    0.86 (0.67–1.10)    20 (0.22)
  Binary                         10    0.94 (0.57–1.55)    0 (0.75)
Crossover trial analyses
  First period only              9     0.87 (0.69–1.10)    13 (0.30)
  Other/unclear                  19    0.87 (0.57–1.34)    0 (0.60)
Total analysed sample size
  Larger in crossover trials     12    0.91 (0.73–1.13)    0 (0.66)
  Larger in parallel trials      16    0.91 (0.66–1.30)    18 (0.25)

N: Number of trials.

Some other caveats should be discussed. The number of meta-analyses that we included in the ROR estimations is not large, so there was considerable accompanying uncertainty in the summary ROR estimates. Moreover, for the large majority of the meta-analyses included here, the amount of evidence was limited both for parallel and crossover studies. This is not atypical of randomized evidence in general in the medical sciences and it leaves considerable uncertainty about the credibility of the results47,48 obtained with either study design.

Finally, we only examined meta-analyses published in the Cochrane Library. This may be an advantage, as these meta-analyses follow specific instructions and standards,43 but also a disadvantage, as the RevMan statistical package used in the Cochrane reviews cannot easily analyse crossover trial data.49 Surprisingly, only one of the analysed meta-analyses in our study used the approach proposed by the Reviewers’ Handbook of the Cochrane Collaboration for incorporation of data of crossover studies in meta-analyses. It is difficult to extrapolate these findings to meta-analyses that have been published in scientific journals.
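To illustrate why a small number of meta-analyses leaves wide uncertainty around a summary ROR, the sketch below pools per-meta-analysis RORs on the log scale with inverse-variance weights and reports Cochran's Q and I2. A DerSimonian and Laird random-effects estimator is assumed here purely for illustration; this section does not spell out the exact computation used, and the function name and input triples are hypothetical rather than the values plotted in Figure 2.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def pool_log_effects(estimates):
    """Pool (ror, lower, upper) triples with DerSimonian-Laird random effects.

    Returns the summary ROR, its 95% CI limits, Cochran's Q and I2 (percent).
    """
    y = [math.log(r) for r, _, _ in estimates]
    # Variance of each log ROR, recovered from the width of its 95% CI.
    v = [((math.log(u) - math.log(l)) / (2 * Z95)) ** 2 for _, l, u in estimates]
    w = [1 / vi for vi in v]

    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Method-of-moments estimate of the between-topic variance, truncated at 0.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(mu), math.exp(mu - Z95 * se), math.exp(mu + Z95 * se), q, i2


# Hypothetical RORs with 95% CIs from three topics (illustrative values only).
hypothetical = [(0.80, 0.50, 1.28), (1.10, 0.60, 2.02), (0.90, 0.70, 1.16)]
summary, lower, upper, q, i2 = pool_log_effects(hypothetical)
print(f"Summary ROR = {summary:.2f} (95% CI {lower:.2f} to {upper:.2f}), I2 = {i2:.0f}%")
```

When the estimated between-topic variance is zero, the random-effects weights reduce to the fixed-effect weights, so the two summaries coincide.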
Non-Cochrane meta-analyses may be reported quite differently from Cochrane reviews, partly because of space limitations.

It would be inappropriate to consider the results of parallel arm trials as the perfect gold standard against which the crossover trials are judged, or vice versa. Thus agreement between the two study designs does not guarantee that the estimates are unbiased. In particular for small trials, there may be room for biased results with either type of design. In all, our empirical evaluation highlights the need for better understanding and improved use of crossover designs and their data.

Conflict of interest: None declared.

KEY MESSAGES
Crossover designs contribute evidence in about a fifth of systematic reviews in the Cochrane Library.
Full use of crossover data is uncommon in meta-analyses.
The observed effect sizes in crossover and parallel trials do not disagree beyond chance across different topics. However, there was a trend for more conservative treatment effect estimates in parallel arm trials.

References
1 Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G and Altman DG (eds). Systematic Reviews in Health Care. Meta-Analysis in Context. 2nd edn. London: BMJ Books, 2001.
2 Elbourne DR, Altman DG, Higgins JPT, Curtin F, Worthington HV, Vail A. Meta-analyses involving cross-over trials: methodological issues. Int J Epidemiol 2002;31:140–49.
3 Garcia R, Benet M, Arnau C, Cobo E. Efficiency of the crossover design: an empirical estimation. Stat Med 2004;23:3773–80.
4 John JA, Russell KG, Whitaker D. Crossover: an algorithm for the construction of efficient crossover designs. Stat Med 2004;23:2645–58.
5 Senn S. Misunderstandings regarding clinical crossover trials. Stat Med 2005;24:3675–78.
6 Cochrane Collaboration. Reviewers’ Handbook 8.11.3.
7 Rosenthal R. Parametric measures of effect size. In: Cooper H and Hedges LV (eds). The Handbook of Research Synthesis. New York: Russell Sage Foundation, 1994.
8 DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin Trials 1986;7:177–88.
9 Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-Analysis in Medical Research. London: Wiley, 2004.
10 Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539–58.
11 Ioannidis JPA, Cappelleri JC, Lau J. Issues in comparisons of meta-analyses and large trials. JAMA 1998;279:1089–93.
12 Sterne JA, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in ‘meta-epidemiological’ research. Stat Med 2002;21:1513–24.
13 Hasselblad V, Hedges LV. Meta-analysis of screening and diagnostic tests. Psychol Bull 1995;117:167–78.
14 Fleiss JL. Analysis of data from multiclinic trials. Controlled Clin Trials 1986;7:267–75.
15 Gotzsche PC, Johansen HK. Short-term low dose corticosteroids vs placebo and non-steroidal anti-inflammatory drugs in rheumatoid arthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
16 Suarez-Almazor ME, Belseck E, Shea B, Wells G, Tugwell P. Methotrexate for treating rheumatoid arthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
17 Ferreira IM, Brooks D, Lacasse Y, Goldstein RS, White J. Nutritional supplementation for stable chronic obstructive pulmonary disease (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
18 Pinelli J, Symington A. Non nutritive sucking for promoting physiologic stability and nutrition in preterm infants (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
19 White J, Cates C, Wright J. Continuous positive airways pressure for obstructive sleep apnoea (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
20 Poustie VJ, Rutherford P. Dietary interventions for phenylketonuria (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
21 Suarez-Almazor ME, Spooner C, Belseck E. Azathioprine for treating rheumatoid arthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
22 Brion LP, Primhak RA, Ambrosio-Perez I. Diuretics acting on the distal renal tubule for preterm infants with (or developing) chronic lung disease (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
23 Lemyre B, Davis PG, De Paoli AG. Nasal intermittent positive pressure ventilation (NIPPV) versus nasal continuous positive airway pressure (NCPAP) for apnea of prematurity (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
24 Lonergan E, Luxenberg J, Colford J. Haloperidol for agitation in dementia (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
25 Jones A, Fay JK, Burr M, Stone M, Hood K, Roberts G. Inhaled corticosteroid effects on bone metabolism in asthma and mild chronic obstructive pulmonary disease (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
26 Green S, Buchbinder R, Barnsley L, Hall S, White M, Smidt N, Assenbelft W. Non-steroidal anti-inflammatory drugs (NSAIDS) for treating lateral elbow pain in adults (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
27 Riemsma RP, Kirwan JR, Taal E, Rasker JJ. Patient education for adults with rheumatoid arthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
28 McCrory DC, Brown CD. Anti-cholinergic bronchodilators versus beta2-sympathomimetic agents for acute exacerbations of chronic obstructive pulmonary disease (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
29 Jurgens G, Graudal NA. Effects of low sodium diet versus high sodium diet on blood pressure, renin, aldosterone, catecholamines, cholesterols and triglyceride (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
30 Toweed TE, Judd MJ, Hochberg MC, Wells G. Acetaminophen for osteoarthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
31 Hughes E, Collins J, Vandekerckhove P. Clomiphene citrate for unexplained subfertility in women (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
32 Cheine M, Ahonen J, Wahlbeck K. Beta-blocker supplementation of standard drug treatment for schizophrenia (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
33 Clarke CE, Speller JM. Pergolide versus bromocriptine for levodopa-induced complications in Parkinson’s disease (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
34 Ortiz Z, Shea B, Suarez-Almazor ME, Moher D, Wells G, Tugwell P. Folic acid and folinic acid for reducing side effects in patients receiving methotrexate for rheumatoid arthritis (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
35 Higgins JPT, Flicker L. Lecithin for dementia and cognitive impairment (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
36 Wiffen P, Collins S, McQuay H, Carroll D, Jadad A, Moore A. Anticonvulsant drugs for acute and chronic pain (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
37 Melchart D, Linde K, Fischer P et al. Acupuncture for idiopathic headache (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
38 Sultana A, Reilly J, Fenton M. Thioridazine for schizophrenia (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
39 Proctor ML, Smith CA, Farquhar CM, Stones RW. Transcutaneous electrical nerve stimulation and acupuncture for primary dysmenorrhoea (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
40 Hood WB Jr, Dans AL, Guyatt GH, Jaeschke R, McMurray JJV. Digitalis for treatment of congestive heart failure in patients in sinus rhythm (Cochrane review). The Cochrane Library. Issue 2. Oxford: Update Software, 2003.
41 Karassa FB, Tatsioni A, Ioannidis JP. Design, quality, and bias in randomized controlled trials of systemic lupus erythematosus. J Rheumatol 2003;30:979–84.
42 Kyriakidi M, Ioannidis JPA. Design and quality considerations for randomized controlled trials in systemic sclerosis. Arthritis Rheum 2002;47:73–81.
43 Jadad AR, Moher M, Browman GP et al. Systematic reviews and meta-analyses on treatment of asthma: critical evaluation. Br Med J 2000;320:537–40.
44 Van Coevorden AM, Coenraads PJ, Svensson A et al. European Dermato-Epidemiology Network (Eden). Overview of studies of treatments for hand eczema-the EDEN hand eczema survey. Br J Dermatol 2004;151:446–51.
45 Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. Br Med J 1997;315:629–34.
46 Ioannidis JP, Cappelleri JC, Sacks HS, Lau J. The relationship between study design, results, and reporting of randomized clinical trials of HIV infection. Control Clin Trials 1997;18:431–44.
47 Sterne JA, Davey Smith G. Sifting the evidence-what’s wrong with significance tests? Br Med J 2001;322:226–31.
48 Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:e124.
49 Senn S. The quality of systematic reviews. Review is biased. Br Med J 2000;321:297.