Basics of Meta-analysis
Steff Lewis, Rob Scholten
Cochrane Statistical Methods Group
(Thanks to the many people who have worked on earlier versions of this presentation)
Introduction

Session plan
• Introduction
• Effect measures – what they mean
– Exercise 1
• Meta-analysis
– Exercise 2
• Heterogeneity
– Exercise 3
• Summary
Before we start… this workshop will discuss binary outcomes only
• e.g. dead or alive, pain free or in pain, smoking or not smoking
• each participant is in one of two possible, mutually exclusive, states
There are other workshops for continuous data, etc.
Where to start
1. You need a pre-defined question
• “Does aspirin increase the chance of survival to 6 months after an acute stroke?”
• “Does inhaling steam decrease the chance of a sinus infection in people who have a cold?”
Where to start
2. Collect data from all the trials and enter them into RevMan
For each trial you need:
• The total number of patients in each treatment group
• The number of patients who had the relevant outcome in each treatment group
Effect measures – what they mean

Which effect measure?
• In RevMan you can choose:
– Risk Ratio (RR), also called Relative Risk
– Odds Ratio (OR)
– Risk Difference (RD), also called Absolute Risk Reduction (ARR)
Risk
• 24 people skiing down a slope, and 6 fall
• risk of a fall = 6 falls / 24 who could have fallen = 6/24 = ¼ = 0.25 = 25%

risk = number of events of interest / total number of observations
Odds
• 24 people skiing down a slope, and 6 fall
• odds of a fall = 6 falls / 18 who did not fall = 6/18 = 1/3 = 0.33 (not usually expressed as a %)

odds = number of events of interest / number without the event
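The two definitions can be sketched in a few lines of Python, using the skiing numbers from the slides:

```python
# Risk vs odds, using the skiing example: 24 skiers, 6 fall.

def risk(events, total):
    """Risk = events of interest / total observations."""
    return events / total

def odds(events, total):
    """Odds = events of interest / observations without the event."""
    return events / (total - events)

fall_risk = risk(6, 24)   # 6/24 = 0.25
fall_odds = odds(6, 24)   # 6/18 = 0.33...

print(f"risk = {fall_risk:.2f}, odds = {fall_odds:.2f}")
```

Note how the two diverge as events become common: with 6 of 24 falling, the risk is 0.25 but the odds are 0.33.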
Expressing it in words
• Risk
– the chances of falling were one in four, or
25%
• Odds
– the chances of falling were one third of the
chances of not falling
– one person fell for every three that didn’t
fall
– the chances of falling were 3 to 1 against
Do risks and odds differ much?
• Blum trial, control arm
– 130 people still dyspeptic out of 164
– chance of still being dyspeptic: risk = 130/164 = 0.79; odds = 130/34 = 3.82
• Tanzania trial, control arm
– 4 cases of pregnancy induced hypertension in 63 women
– risk = 4/63 = 0.063; odds = 4/59 = 0.068

eg1 – Moayeddi et al. BMJ 2000;321:659-64
eg2 – Knight M et al. Antiplatelet agents for preventing and treating pre-eclampsia (Cochrane Review). In: The Cochrane Library, Issue 3, 2000. Oxford: Update Software.
Comparing groups – 2×2 tables

Blum et al    Still dyspeptic   Not still dyspeptic   Total
Treatment     119               45                    164
Control       130               34                    164
Total         249               79                    328
Risk ratio (relative risk)
• risk of event on treatment = 119/164 = 0.726
• risk of event on control = 130/164 = 0.793
• risk ratio = risk on treatment / risk on control = 0.726 / 0.793 = 0.92
Where risk ratio = 1, this implies no difference in effect
Odds ratio
• odds of event on treatment = 119/45 = 2.64
• odds of event on control = 130/34 = 3.82
• odds ratio = odds on treatment / odds on control = 2.64 / 3.82 = 0.69
Where odds ratio = 1, this implies no difference in effect
What is the difference between Peto OR and OR?
• The Peto Odds Ratio is an approximation to the Odds Ratio that works particularly well with rare events
Expressing risk ratios and odds ratios
• Risk ratio 0.92
– the risk of still being dyspeptic on treatment was
about 92% of the risk on control
– treatment reduced the risk by about 8%
– treatment reduced the risk to 92% of what it was
• Odds ratio 0.69
– treatment reduced the odds by about 30%
– the odds of still being dyspeptic in treated patients
were about two-thirds of what they were in
controls
(Absolute) Risk difference
• risk on treatment – risk on control
• for Blum et al: 119/164 – 130/164 = 0.726 – 0.793 = –0.067
• usually expressed as a %: –6.7%
• treatment reduced the risk of being dyspeptic by about 7 percentage points
• Where risk difference = 0, this implies no difference in effect
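The three effect measures for the Blum et al 2×2 table can be sketched as:

```python
# RR, OR and RD for the Blum et al table:
# 119/164 still dyspeptic on treatment, 130/164 on control.

def risk_ratio(e_t, n_t, e_c, n_c):
    return (e_t / n_t) / (e_c / n_c)

def odds_ratio(e_t, n_t, e_c, n_c):
    return (e_t / (n_t - e_t)) / (e_c / (n_c - e_c))

def risk_difference(e_t, n_t, e_c, n_c):
    return e_t / n_t - e_c / n_c

rr = risk_ratio(119, 164, 130, 164)       # about 0.92
or_ = odds_ratio(119, 164, 130, 164)      # about 0.69
rd = risk_difference(119, 164, 130, 164)  # about -0.067
```

The same trial gives 0.92, 0.69 and –0.067 depending on the measure chosen, which is why the choice matters when communicating results.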
What do we want from our summary
statistic?
• Communication of effect
– Users must be able to use the result
• Consistency of effect
– It would be ideal to have one number to
apply in all situations
• Mathematical properties
Summary

                 OR    RR    RD
Communication    –     +     ++
Consistency      +     +     –
Mathematics      ++    –     –

Further info in the “Dealing with dichotomous data” workshop.
Exercise 1
Meta-analysis
What is meta-analysis?
• A way to calculate an average
• Estimates an ‘average’ or ‘common’ effect
• Improves the precision of an estimate by
using all available data
What is a meta-analysis?
• An optional part of a systematic review
[Diagram: meta-analyses shown as a subset within systematic reviews]
When can we do a meta-analysis?
• When more than one study has estimated an
effect
• When there are no differences in the study
characteristics that are likely to substantially
affect outcome
• When the outcome has been measured in
similar ways
• When the data are available (take care with
interpretation when only some data are
available)
Averaging studies
• Starting with the summary statistic for each
study, how should we combine these?
• A simple average gives each study equal
weight
• This seems intuitively wrong
• Some studies are more likely to give an
answer closer to the ‘true’ effect than others
Weighting studies
• More weight to the studies which give us
more information
– More participants
– More events
– Lower variance
• Weight is closely related to the width of the
study confidence interval: wider confidence
interval = less weight
For example

Trial           Deaths on hypothermia   Deaths on control   Weight (%)
Clifton 1992    1/5                      1/5                 3.6
Clifton 1993    8/23                     8/22                21.5
Hirayama 1994   4/12                     5/10                11.3
Jiang 1996      6/23                     14/24               23.4
Marion 1997     9/39                     10/42               30.0
Meissner 1998   3/12                     3/13                9.7
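The weighting idea can be sketched with a simple inverse-variance calculation on the log odds ratio scale, using the hypothermia data from the table above. This is only a sketch: RevMan's default Mantel-Haenszel weighting differs slightly, so these percentages will not exactly reproduce the slide's column, but the ordering is the same.

```python
# Inverse-variance weights on the log odds ratio scale for the
# hypothermia trials above: (events_treat, n_treat, events_ctrl, n_ctrl).

trials = {
    "Clifton 1992":  (1, 5, 1, 5),
    "Clifton 1993":  (8, 23, 8, 22),
    "Hirayama 1994": (4, 12, 5, 10),
    "Jiang 1996":    (6, 23, 14, 24),
    "Marion 1997":   (9, 39, 10, 42),
    "Meissner 1998": (3, 12, 3, 13),
}

def iv_weight(e_t, n_t, e_c, n_c):
    # Variance of log OR = sum of reciprocals of the four cell counts;
    # weight = 1 / variance, so trials with more participants and more
    # events (narrower confidence intervals) get more weight.
    var = 1/e_t + 1/(n_t - e_t) + 1/e_c + 1/(n_c - e_c)
    return 1 / var

weights = {name: iv_weight(*cells) for name, cells in trials.items()}
total = sum(weights.values())
for name, w in weights.items():
    print(f"{name}: {100 * w / total:.1f}%")
```

As in the slide, the largest trial (Marion 1997) gets the most weight and the tiny Clifton 1992 trial the least.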
Displaying results graphically
• RevMan produces forest plots
There’s a label to tell you what the comparison is and what the outcome of interest is.
At the bottom there’s a horizontal line. This is the scale measuring the treatment effect. Here the outcome is death, and towards the left the scale is less than one, meaning the treatment has made death less likely. Take care to read what the labels say – things to the left do not always mean the treatment is better than the control.
The vertical line in the middle is where the treatment and control have the same effect – there is no difference between the two.
For each study there is an id. The data for each trial are shown, divided into the experimental and control groups, along with the % weight given to each study in the pooled analysis.
The data shown in the graph are also given numerically. The label above the graph tells you what statistic has been used.
• Each study is given a blob, placed where the data measure the effect.
• The size of the blob is proportional to the % weight.
• The horizontal line is called a confidence interval and is a measure of how we think the result of this study might vary with the play of chance.
• The wider the horizontal line is, the less confident we are of the observed effect.
The pooled analysis is given a diamond shape, where the widest bit in the middle is located at the calculated best guess (point estimate), and the horizontal width is the confidence interval.

Definition of a 95% confidence interval: if a trial was repeated 100 times, and a confidence interval calculated each time, then about 95 of those 100 intervals would contain the true treatment effect.
Could we just add the data from all the trials
together?
• One approach to combining trials would be to
add all the treatment groups together, add all
the control groups together, and compare the
totals
• This is wrong for several reasons, and it can
give the wrong answer
If we just add up the columns we get 34.3% vs 32.5%, a RR of 1.06 – a higher death rate in the steroids group. From a meta-analysis, we get RR = 0.96 – a lower death rate in the steroids group.
Problems with simple addition of studies
• breaks the power of randomisation
• imbalances within trials introduce bias
In effect we are comparing one trial’s experimental group directly with another trial’s control group – this is not a randomised comparison.
The Pitts trial contributes 17% (201/1194) of all the data to the experimental column, but 8% (74/925) to the control column. Therefore it contributes more information to the average death rate in the experimental column than it does to the control column. There is a high death rate in this trial, so the death rate for the experimental column is higher than in the control column.
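A hypothetical two-trial example (all numbers invented for illustration) shows how simple addition can even reverse the direction of effect when group sizes are unbalanced across trials with different underlying event rates:

```python
# Hypothetical illustration of why simply adding the columns is wrong:
# each trial individually shows RR = 0.9 (treatment better), but the
# naive column totals suggest the treatment is harmful.

trials = [
    # (events_treat, n_treat, events_ctrl, n_ctrl)
    (90, 200, 50, 100),   # high-risk setting: 0.45 / 0.50 = RR 0.9
    (9, 100, 20, 200),    # low-risk setting:  0.09 / 0.10 = RR 0.9
]

def rr(e_t, n_t, e_c, n_c):
    return (e_t / n_t) / (e_c / n_c)

per_trial = [rr(*t) for t in trials]   # 0.9 in both trials

# Naive addition across trials: a non-randomised comparison of totals.
e_t = sum(t[0] for t in trials); n_t = sum(t[1] for t in trials)
e_c = sum(t[2] for t in trials); n_c = sum(t[3] for t in trials)
naive = rr(e_t, n_t, e_c, n_c)         # 99/300 vs 70/300: RR > 1
```

The high-risk trial is over-represented in the experimental column, exactly the imbalance described for the Pitts trial, so the naive totals point the wrong way.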
Interpretation - “Evidence of absence” vs “Absence of evidence”
• If the confidence interval crosses the line of no effect, this does not mean that there is no difference between the treatments
• It means we have found no statistically significant difference in the effects of the two interventions
In the example below, as more data are included, the overall odds ratio remains the same but the confidence interval decreases. It is not true that there is ‘no difference’ shown in the first rows of the plot – there just isn’t enough data to show a statistically significant result.

Study       Treatment n/N   Control n/N   OR (fixed) [95% CI]
1 study     10/100          15/100        0.63 [0.27, 1.48]
2 studies   20/200          30/200        0.63 [0.34, 1.15]
3 studies   30/300          45/300        0.63 [0.38, 1.03]
4 studies   40/400          60/400        0.63 [0.41, 0.96]
5 studies   50/500          75/500        0.63 [0.43, 0.92]

(Forest plot scale from 0.1 to 10; left of 1 favours treatment, right of 1 favours control.)
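The rows of that table can be reproduced with the standard confidence interval on the log odds ratio scale (a sketch of one common method; RevMan's exact calculation may differ in detail):

```python
import math

# Odds ratio with a 95% CI computed on the log odds scale: the point
# estimate stays at 0.63 as data accumulate, but the interval narrows
# until it no longer crosses 1.

def or_with_ci(e_t, n_t, e_c, n_c):
    odds_ratio = (e_t / (n_t - e_t)) / (e_c / (n_c - e_c))
    # Standard error of log OR = sqrt of summed reciprocal cell counts.
    se = math.sqrt(1/e_t + 1/(n_t - e_t) + 1/e_c + 1/(n_c - e_c))
    lo = math.exp(math.log(odds_ratio) - 1.96 * se)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lo, hi

one_study = or_with_ci(10, 100, 15, 100)     # 0.63 [0.27, 1.48]
five_studies = or_with_ci(50, 500, 75, 500)  # 0.63 [0.43, 0.92]
```

With one study the interval crosses 1 (no statistically significant difference); with five times the data the same 0.63 becomes statistically significant.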
Interpretation - Weighing up benefit and harm
• When interpreting results, don’t just emphasise the positive results.
• A treatment might cure acne instantly, but kill one person in 10,000 (very important, as acne is not life threatening).
Interpretation - Quality
• Rubbish studies = unbelievable results
• If all the trials in a meta-analysis were of very low quality, then you should be less certain of your conclusions.
• Instead of “Treatment X cures depression”, try “There is some evidence that Treatment X cures depression, but the data should be interpreted with caution.”
Exercise 2
Heterogeneity
What is heterogeneity?
•
Heterogeneity is variation between the
studies’ results
Causes of heterogeneity
Differences between studies with respect to:
• Patients: diagnosis, in- and exclusion criteria,
etc.
• Interventions: type, dose, duration, etc.
• Outcomes: type, scale, cut-off points,
duration of follow-up, etc.
• Quality and methodology: randomised or
not, allocation concealment, blinding, etc.
How to deal with heterogeneity
1. Do not pool at all
2. Ignore heterogeneity: use fixed effect model
3. Allow for heterogeneity: use random effects
model
4. Explore heterogeneity: (“Dealing with
heterogeneity” workshop )
How to assess heterogeneity from a RevMan forest plot
Statistical measures of heterogeneity
• The Chi² test measures the amount of variation in a set of trials, and tells us if it is more than would be expected by chance
• Small p values suggest that heterogeneity is present
• This test is not very good at detecting heterogeneity. Often a cut-off of p < 0.10 is used, but lack of statistical significance does not mean there is no heterogeneity
Statistical measures of heterogeneity (2)
• A new statistic, I², is available in RevMan 4.2
• I² is the proportion of variation that is due to heterogeneity rather than chance
• Large values of I² suggest heterogeneity
• Roughly, I² values of 25%, 50%, and 75% could be interpreted as indicating low, moderate, and high heterogeneity
• For more info see: Higgins JPT et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:557-60.
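A minimal sketch of how the Chi² statistic (Cochran's Q) and I² relate, following the Higgins et al. formula I² = (Q − df)/Q; the study estimates and weights below are purely illustrative:

```python
# Cochran's Q and I-squared for a set of study estimates.
# Inputs: effect estimates (e.g. log odds ratios) and their
# inverse-variance weights. Numbers below are made up for illustration.

def q_and_i2(estimates, weights):
    # Q = weighted sum of squared deviations from the pooled estimate.
    pooled = sum(w * y for y, w in zip(estimates, weights)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for y, w in zip(estimates, weights))
    df = len(estimates) - 1
    # I2 = percentage of total variation due to heterogeneity rather
    # than chance; truncated at 0 when Q is less than its df.
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
    return q, i2

q, i2 = q_and_i2([-0.5, -0.1, 0.3], [10, 8, 12])
print(f"Q = {q:.2f}, I2 = {i2:.0f}%")
```

Comparing Q to a Chi² distribution with df = number of studies − 1 gives the heterogeneity p value shown in RevMan.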
Fixed effect
Philosophy behind the fixed effect model:
• there is one real value for the treatment effect
• all trials estimate this one value
Problems with ignoring heterogeneity:
• confidence intervals too narrow

Random effects
Philosophy behind the random effects model:
• there are many possible real values for the treatment effect (depending on dose, duration, etc.)
• each trial estimates its own real value
Interpretation of fixed and random effects results
If there is heterogeneity, fixed effect and random effects models:
• may give different pooled estimates
• have different interpretations:
– RD = 0.3, fixed effect model: the best estimate of the one and only real RD is 0.3
– RD = 0.3, random effects model: the best estimate of the mean of all possible real values of the RD is 0.3
• the random effects model gives a wider confidence interval
• in practice, people tend to interpret fixed and random effects results the same way.
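A sketch of why the random effects interval is wider: the DerSimonian-Laird estimate of the between-study variance tau² is added to every study's own variance, which inflates the pooled standard error (one common method; the estimates and variances below are illustrative, not from a real review):

```python
import math

# Fixed effect vs DerSimonian-Laird random effects pooling of study
# estimates (e.g. log odds ratios) with known within-study variances.

def pool(estimates, variances, tau2=0.0):
    # tau2 = 0 gives the fixed effect result; tau2 > 0 the random
    # effects result, with flatter weights and a wider interval.
    weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

def dersimonian_laird_tau2(estimates, variances):
    w = [1 / v for v in variances]
    pooled = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)   # truncated at 0

ys = [-0.5, -0.1, 0.3]    # illustrative study estimates
vs = [0.10, 0.12, 0.08]   # their within-study variances

fixed, fixed_ci = pool(ys, vs)            # tau2 = 0
tau2 = dersimonian_laird_tau2(ys, vs)
rand, rand_ci = pool(ys, vs, tau2)        # wider interval
```

With these heterogeneous studies tau² comes out positive, so the random effects confidence interval is wider than the fixed effect one, as the slide states.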
Exercise 3

Summary
• Precisely define the question you want to answer
• Choose an appropriate effect measure
• Collect data from trials and do a meta-analysis if appropriate
• Interpret the results carefully
– Evidence of absence vs absence of evidence
– Benefit and harm
– Quality
– Heterogeneity
Other sources of help and advice
• The Reviewer’s handbook
– http://www.cochrane.org/resources/handbook/index.htm
• The distance learning material
– http://www.cochrane-net.org/openlearning/
• The RevMan user guide
– http://www.cc-ims.net/RevMan/documentation.htm
• The Collaborative Review Group you are working with