
What makes a good quality trial?
Professor David Torgerson
York Trials Unit
Background
• Whilst the RCT is the most rigorous research design, some trials are better than others.
• It is important that trials use the best methods and report them clearly.
Reporting Guidelines
• Because of a history of poor trial reporting, a group of trial methodologists developed the CONSORT statement. Subsequently, major medical journals (e.g. BMJ, Lancet, JAMA) have adopted it as editorial policy.
• This sets out the minimum items that trials
should report to be published in these journals.
Internal versus External Validity
• Internal validity is most important: are the
trial results correct for the sample used?
• External validity is less important: is the trial result applicable to the general population?
• A trial cannot be externally valid if it is not
also internally valid.
Important quality items
• Allocation method:
  » method of randomisation;
  » secure randomisation.
• Intention to treat analysis.
• Blinding.
• Attrition.
• Sample size.
Allocation Method
• How was the allocation method devised?
• Was secure allocation used?
• Secure allocation means that generation of the allocation sequence, and the allocation itself, are kept separate from the person recruiting participants.
Secure allocation
• Why do we need secure, preferably
independent, allocation?
• Because some researchers try to ‘subvert’ the allocation.
• In a survey of 25 researchers, 4 (16%) admitted to keeping ‘a log’ of previous allocations to try and predict future allocations.
Brown et al. Statistics in Medicine 2005;24:3715.
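Why does a log help? Under blocked randomisation, the tail of each block is forced. A minimal sketch (my illustration, not from the survey; the block size and arm labels are assumed):

```python
# Sketch of how a recruiter's 'log' subverts blocked randomisation:
# within a block of 4 (two per arm), once one arm has appeared twice,
# every remaining allocation in that block is forced and thus predictable.

BLOCK_SIZE = 4  # assumed block size, for illustration only

def forced_next(log: list[str]) -> str | None:
    """Return the next allocation if the open block forces it, else None."""
    start = len(log) // BLOCK_SIZE * BLOCK_SIZE  # first index of the open block
    block = log[start:]
    if block.count("A") == BLOCK_SIZE // 2:
        return "B"  # arm A is used up in this block
    if block.count("B") == BLOCK_SIZE // 2:
        return "A"  # arm B is used up in this block
    return None     # still unpredictable

print(forced_next(["A", "A"]))       # 'B': the next two patients must get B
print(forced_next(["A", "B"]))       # None: next allocation still unknown
print(forced_next(["A", "B", "B"]))  # 'A': the last slot of the block is forced
```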
Subversion - evidence
• Schulz has described, anecdotally, a number of incidents of researchers subverting allocation by holding sealed envelopes up to x-ray lights.
• Researchers have confessed to breaking open filing cabinets to obtain the randomisation code.
• In a surgical trial with 5 centres, 3 were found to be independently subverting the allocation.
Schulz JAMA 1995;274:1456.
Mean ages of groups
Clinician   Experimental   Control
All         59             63        p < 0.01
1           57             72        p < 0.01
2           33             69        p < 0.001
3           47             72        p = 0.03

Kennedy & Grant. Controlled Clinical Trials 1997;18(3S):77S-78S.
Recent Blocked Trial
“This was a block randomised study (four patients
to each block) with separate randomisation at
each of the three centres. Blocks of four cards
were produced, each containing two cards
marked with "nurse" and two marked with
"house officer." Each card was placed into an
opaque envelope and the envelope sealed. The
block was shuffled and, after shuffling, was
placed in a box.”
Kinley et al. BMJ 2002;325:1323.
Or did they do this?
• “Randomisation was accomplished using a
balanced block design (four patients to
each block) with a separate randomisation
process at each of the three centres. A
separate series of consecutively
numbered, opaque sealed envelopes was
administered at each research centre”
Kinley et al. Health Technology Assessment 2001;5(20):4.
What is wrong here?
              Doctor   Nurse
Southampton   500      511
Sheffield     308      319
Doncaster     118      118

Kinley et al. BMJ 2002;325:1323.
Problem?
• If block randomisation with blocks of 4 were used, then group sizes at each centre should never differ by more than 2 patients.
• Two centres had a numerical disparity of 11. Either blocks of 4 were not used or the sequence was not followed (see the simulation sketch below).
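A quick simulation (my own sketch, assuming two arms and shuffled blocks of four) confirms the arithmetic: however many patients a centre recruits, the within-centre disparity can never exceed 2 if the blocked sequence is followed.

```python
# Simulate block randomisation (blocks of 4, two 'doctor' and two 'nurse'
# cards per block) and record the largest group-size disparity that ever
# occurs: it is at most 2, so a disparity of 11 (e.g. 500 doctor vs 511
# nurse) implies the sequence was not generated or not followed as described.
import random

def block_randomise(n_patients: int, block_size: int = 4) -> list[str]:
    """Allocate patients from shuffled blocks with equal numbers per arm."""
    sequence: list[str] = []
    while len(sequence) < n_patients:
        block = ["doctor"] * (block_size // 2) + ["nurse"] * (block_size // 2)
        random.shuffle(block)
        sequence.extend(block)
    return sequence[:n_patients]

max_disparity = 0
for _ in range(10_000):
    seq = block_randomise(random.randint(100, 1100))
    max_disparity = max(max_disparity,
                        abs(seq.count("doctor") - seq.count("nurse")))

print(max_disparity)  # never exceeds 2, whatever the total recruited
```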
More Evidence
• Hewitt and colleagues examined the association
between p values and adequate concealment in
4 major medical journals.
• Inadequate concealment largely used opaque
envelopes.
• The average p value for inadequately concealed
trials was 0.022 compared with 0.052 for
adequate trials (test for difference p = 0.045).
Hewitt et al. BMJ 2005;330:1057-1058.
Intention to Treat Analysis
• Were all allocated participants analysed in their original groups?
• ‘Active treatment’ analysis, analysing by treatment received, can result in bias (see the sketch below).
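A toy illustration (invented numbers, not from any trial cited here) of how an as-treated analysis can flip a result: two sick patients allocated to treatment never receive it, and moving them into the control group flatters the treatment.

```python
# Intention-to-treat (analyse by allocation) versus 'as treated'
# (analyse by treatment received) on invented data. Two sick patients
# allocated to 'treat' never took the treatment; counting them as
# controls biases the comparison in favour of the treatment.
import statistics

# (allocated arm, arm actually received, outcome score)
patients = [
    ("treat", "treat", 70), ("treat", "treat", 68),
    ("treat", "control", 40), ("treat", "control", 42),  # non-compliers
    ("control", "control", 60), ("control", "control", 62),
    ("control", "control", 58), ("control", "control", 61),
]

def mean_outcome(key: int, arm: str) -> float:
    """Mean outcome, grouping by allocation (key=0) or receipt (key=1)."""
    return statistics.mean(p[2] for p in patients if p[key] == arm)

itt = mean_outcome(0, "treat") - mean_outcome(0, "control")
as_treated = mean_outcome(1, "treat") - mean_outcome(1, "control")
print(f"ITT difference:        {itt:+.2f}")        # -5.25: no apparent benefit
print(f"As-treated difference: {as_treated:+.2f}")  # +15.17: spurious benefit
```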
Non use of ITT - Example
• “It was found in each sample that
approximately 86% of the students with
access to reading supports used them.
Therefore, one-way ANOVAs were
computed for each school sample,
comparing this subsample with subjects
who did not have access to reading
supports.” (Feldman and Fish, J Educ
Computing Res 1991, p 39-31).
Can it change findings?
• In New York a randomised trial of vouchers for private schools was undertaken. Vouchers were offered to poor parents to enable them to send their child to a private school of their choice. The initial analysis examined changes in the children's test scores. However, many pre-tests and some post-tests were missing. Complete case analysis indicated that voucher children got better test scores than children in state schools.
BUT…
• The initial analysis did not use ITT, as some data were missing. A further analysis of post-test scores (state exams), where there was nearly complete case ascertainment, found NO difference in test scores between the groups.
Krueger & Zhu. NBER Working Paper 9418, 2002.
Blinding
• Who knew who got what, when?
• Was the participant blind?
• Was the practitioner blind?
• Most IMPORTANT: was the outcome assessment blind?
• This is particularly important for subjective outcomes or outcomes in a grey area (e.g., when marking an essay, knowledge of group allocation may lead to higher or lower scores).
Attrition
• What was the final number of participants
compared with the number randomised?
• What happened to those lost along the
way?
• Was there equal attrition?
Attrition
• Rule of thumb: attrition < 5% is not really a problem.
• Attrition > 5% needs to be equal between groups, otherwise there is potential bias (a simple check is sketched below).
• Is information on the characteristics of lost
participants presented and does this
suggest that they are similar between
groups?
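A minimal helper encoding the rule of thumb above (my sketch; the 5% thresholds, including using 5 percentage points as the 'equal between groups' margin, are assumptions for illustration):

```python
# Apply the attrition rule of thumb to per-arm counts: overall loss
# under 5% is tolerable; above that, the per-arm loss rates should be
# roughly equal, otherwise attrition bias is a concern.

def attrition_check(randomised: dict[str, int], analysed: dict[str, int]) -> str:
    lost = {arm: randomised[arm] - analysed[arm] for arm in randomised}
    overall = sum(lost.values()) / sum(randomised.values())
    if overall < 0.05:
        return f"overall attrition {overall:.1%}: not really a problem"
    rates = {arm: lost[arm] / randomised[arm] for arm in randomised}
    if max(rates.values()) - min(rates.values()) > 0.05:  # assumed margin
        return f"unequal attrition {rates}: potential bias"
    return f"attrition {overall:.1%}, but roughly equal between groups"

print(attrition_check({"treat": 100, "control": 100},
                      {"treat": 90, "control": 72}))
# -> unequal attrition {'treat': 0.1, 'control': 0.28}: potential bias
```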
Sample size
• Was the sample size adequate to detect a
‘reasonable’ or credible difference?
• How was the sample size calculated?
Sample Size
• Small trials will miss important differences.
• Bigger is better in trials.
• Why was the number chosen? For example, “given an incidence of 10% we wanted to have 80% power to show a halving to 5%”, rather than simply “we enrolled 100 participants”.
• Custom and practice in education trials tends towards sample sizes of around 30.
• Trials should be large enough to detect at least an effect size of 0.5 (i.e., 128 participants or more; see the calculation below).
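The 128 figure can be checked with the standard two-sample formula n per group = 2(z_alpha/2 + z_beta)^2 / d^2 (my worked example, assuming a two-sided 5% significance level and 80% power):

```python
# Sample size per group for a two-sample comparison of means at
# standardised effect size d, using n = 2 * (z_alpha/2 + z_beta)^2 / d^2.
import math

z_alpha = 1.96  # two-sided 5% significance level
z_beta = 0.84   # 80% power
d = 0.5         # target standardised effect size

n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
print(n_per_group, 2 * n_per_group)  # 63 per group, 126 in total
```

The exact formula gives about 63 per group; the slide's 128 matches the common rule of thumb of n ≈ 16/d² per group (16/0.25 = 64, so 128 in total).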
A Quality Comparison of RCTs in
Health & Education
Carole Torgerson1, David Torgerson2,
Yvonne Birks2, Jill Porthouse2
Departments of Educational Studies1 and
Health Sciences2, University of York
Torgerson et al. British Educational Research Journal, 2005, 761.
Are Trials of Good Quality?
• We sought to ascertain whether there was
a differential quality between health care
and educational trials.
• Are trials improving in quality?
• We examined a sample of trials from different journals from 1990 to 2001, comparing those published before and after CONSORT adoption.
Study Characteristics
Characteristic             Drug   Health   Education
Cluster Randomised         1%     36%      18%
Sample size justified      59%    28%      0%
Concealed randomisation    40%    8%       0%
Blinded Follow-up          53%    30%      14%
Use of CIs                 68%    41%      1%
Low Statistical Power      45%    41%      85%
Change in concealed allocation
[Bar chart: percentage of trials with concealed allocation, before 1997 versus 1997 onwards, by trial type. Drug: P = 0.04; No Drug: P = 0.70.]
NB No education trial used concealed allocation
Blinded Follow-up
[Bar chart: percentage of trials with blinded follow-up, before 1997 versus 1997 onwards. Drug: P = 0.54; Health: P = 0.13; Education: P = 0.03.]
Underpowered
[Bar chart: percentage of underpowered trials, before 1997 versus 1997 onwards. Drug: P = 0.01; Health: P = 0.76; Education: P = 0.22.]
Mean Change in Items
[Bar chart: mean change in reported quality items, before 1997 versus 1997 onwards. Drug: P = 0.001; No Drug: P = 0.07; Education: P = 0.03.]
Has CONSORT had an Effect?
• As trialists we KNOW that pre-test post-test or
before and after data are the weakest form of
quantitative evidence.
• Evidence from this BEFORE and AFTER study does NOT support the view that CONSORT has had an effect on the quality of reporting; we need to look at time-series data.
• Before CONSORT there was a strong trend towards improving quality of reporting; this trend has continued since CONSORT.
Mean Items by Year of Publication (Drug Trials Only)
[Line plot: mean number of reported quality items by year of publication, 1991-2002, showing a steady upward trend.]
Quality Improvement
• In a multiple regression analysis, calendar year was a stronger predictor of the number of items scored than pre- versus post-CONSORT status.
• Journal quality was highly predictive with
‘good’ quality general journals reporting
significantly more items than specialist
health journals.
CONSORT Effect
• Although our study seemed not to show an effect of CONSORT, others have. Moher et al. compared the BMJ, Lancet and JAMA (CONSORT adopters) with the N Engl J Med (an initial non-adopter) and found better quality reporting in the adopters.
Moher et al. JAMA 2001;285:1992.
Quality and citations
• Are better quality trials cited more often
than poor quality trials?
• Unfortunately not: a recent citation review suggests that it is journal quality rather than trial quality which dominates citation rates.
Nieminen et al. BMC Medical Research Methodology 2006;6:42.
Conclusion
• Evidence based policy demands good
quality trials that are reported well.
• Many health care trials are of poor quality; educational trials are worse.
• Increasing the numbers of RCTs will not
improve policy making UNLESS these
trials are of good quality.