Determinants of quality of evidence: What can lower the quality? (II)

GRADE workshop: Meta-analysis - the basics
Juan A Blasco Amaro, MD MPH
Public Health Policy Support Unit
Institute for Health and Consumer Protection (JRC-IHCP)
Joint Research Centre
The European Commission’s in-house science service
Disclaimer: The contents of this presentation are the views of the author and do not necessarily represent an official position of the European Commission.
© European Union, 2013

GRADE Workshop: Agenda
9:00-10:15 Introduction. Guideline development process and the GRADE approach
10:15-11:00 Types of questions. Framing a question: PICO question. Exercise
11:00-11:15 Coffee
11:15-12:00 Choosing outcomes. Relative importance of outcomes. Exercise
12:00-12:45 Study designs. Exercise
12:45-13:30 Lunch
13:30-14:30 Search of the literature. Exercise
14:30-15:00 Determinants of quality of evidence: What can lower the evidence? (I)
15:00-15:15 Coffee
15:15-17:00 Determinants of quality of evidence: What can lower the evidence? (II) Exercise

9:00-11:15 Determinants of QoE: What can lower the evidence? (III). Exercise. Determinants of quality of evidence: What can upgrade evidence?
11:15-11:30 Coffee
11:30-13:00 Going from the evidence to the recommendation. Exercise
13:00-13:45 Lunch
13:45-16:00 Using the Guideline Development Tool (GDT) software. Exercise
Feedback and conclusions
GRADE Workshop – Ispra – 11-12 December 2013
Outline
a. Introduction
b. Representation
c. Imprecision
d. Inconsistency
e. Publication Bias
f. Exercise
What is a meta-analysis?
a) A combination of studies
b) Estimation of a common value
c) A graphical plot
d) A type of Systematic Review
e) Combined analyses of randomized clinical trials
f) Research project
g) Observational study
h) Statistical method
Meta-analysis History
- Background
- Karl Pearson, 1904
- First “meta-analysis” published, assessing an intervention (what intervention?)
- It was effective in 35%!
- The term was introduced by Glass in 1976 and became popular in the social sciences
- In 1992, a network of epidemiologists and healthcare professionals -> Cochrane Collaboration
Evolution - meta-analyses published in MEDLINE
[Figure: number of meta-analyses published in MEDLINE per year, 2004-2013; vertical axis from 0 to 8000]
Meta-analysis
• Define Question
• Search for Studies
• Apply eligibility criteria
• Collect Data
• Assess studies for Risk of Bias
• Analysis
  • Combining effects
  • Graphical plot
  • Heterogeneity
  • Heterogeneity measures
  • Sensitivity analysis
  • Publication bias
• Interpret results and Draw conclusions
Study level ↓
  Study A: Outcome data → Effect measure
  Study B: Outcome data → Effect measure
  Study C: Outcome data → Effect measure
  Study D: Outcome data → Effect measure
Review level ↓
  Effect measure
Source: Jo McKenzie & Miranda Cumpston
What is a meta-analysis?
• combines the results from two or more studies
• estimates an ‘average’ or ‘common’ effect
• optional part of a systematic review
[Diagram: Systematic reviews and Meta-analyses]
Source: Julian Higgins
Why perform a meta-analysis?
• quantify treatment effects and their uncertainty
• increase power
• increase precision
• explore differences between studies
• settle controversies from conflicting studies
• generate new hypotheses
When not to do a meta-analysis
• mixing apples with oranges
  • each included study must address same question
    consider comparison and outcomes
    requires your subjective judgement
• garbage in – garbage out
  • a meta-analysis is only as good as the studies in it
  • if included studies are biased:
    meta-analysis result will also be incorrect
    pooling will lend the biased result more credibility and a narrower confidence interval
  • if serious reporting biases present:
    unrepresentative set of studies may give misleading result
Source: Julian Higgins
When can you do a meta-analysis?
• more than one study has measured an effect
• the studies are sufficiently similar to produce a meaningful and useful result
• the outcome has been measured in similar ways
• data are available in a format we can use
Software (I)
• MetaXL software page
• ClinTools (commercial)
• Comprehensive Meta-Analysis (commercial)
• MIX 2.0: Professional Excel add-in with Ribbon interface for meta-analysis (free and commercial versions)
• Meta-analysis features for Stata (free add-ons to commercial package)
• The Meta-Analysis Calculator: free on-line tool for conducting a meta-analysis
• Metastat (free)
• Meta-Analyst: free Windows-based tool
• ProMeta: professional software for meta-analysis, Java (commercial)
• RevMan: free software for meta-analysis
• Metafor-project: free software package for meta-analyses in R
• Macros in SPSS: free macros to conduct meta-analyses in SPSS
• MAd GUI: user-friendly graphical user interface package for R (free)
Software (III)
Software (IV)
What’s behind the “software”?
Steps in a meta-analysis
• identify comparisons to be made
• identify outcomes to be reported and statistics to be used
• collect data from each relevant study
• combine the results to obtain the summary of effect
• explore differences between the studies
• interpret the results
Meta-analysis is typically a two-stage process
• a summary statistic is calculated for each study, to describe the observed intervention effect.
• summary (pooled) intervention effect estimate is calculated as a weighted average of the intervention effects estimated in the individual studies
Statistical analysis
Calculating the summary result
• collect a summary statistic from each contributing study
• how do we bring them together?
  • treat as one big study – add intervention & control data?
    breaks randomisation, will give the wrong answer
  • simple average?
    weights all studies equally – some studies closer to the truth
  • weighted average
For example
Headache              Caffeine   Decaf    Weight
Amore-Coffea 2000     2/31       10/34    6.6%
Deliciozza 2004       10/40      9/40     21.9%
Mama-Kaffa 1999       12/53      9/61     22.2%
Morrocona 1998        3/15       1/17     2.9%
Norscafe 1998         19/68      9/64     26.4%
Oohlahlazza 1998      4/35       2/37     5.1%
Piazza-Allerta 2003   8/35       6/37     14.9%
Overview
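The weighted average in the caffeine example can be sketched in a few lines. The slides do not say how the Weight column was obtained; recomputing it suggests fixed-effect inverse-variance weights on the log risk ratio, so that is the assumption made here. The function names and the 1.96 multiplier for a 95% confidence interval are illustrative choices, not part of the original material.

```python
import math

# (study, caffeine events, caffeine total, decaf events, decaf total), as on the slide
STUDIES = [
    ("Amore-Coffea 2000", 2, 31, 10, 34),
    ("Deliciozza 2004", 10, 40, 9, 40),
    ("Mama-Kaffa 1999", 12, 53, 9, 61),
    ("Morrocona 1998", 3, 15, 1, 17),
    ("Norscafe 1998", 19, 68, 9, 64),
    ("Oohlahlazza 1998", 4, 35, 2, 37),
    ("Piazza-Allerta 2003", 8, 35, 6, 37),
]


def log_rr_and_variance(a, n1, c, n2):
    """Stage 1: log risk ratio and its large-sample variance for one study."""
    log_rr = math.log((a / n1) / (c / n2))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_rr, variance


def inverse_variance_pool(studies):
    """Stage 2: weighted average of the study effects, weights = 1 / variance."""
    effects = [log_rr_and_variance(a, n1, c, n2) for _, a, n1, c, n2 in studies]
    weights = [1 / v for _, v in effects]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / total
    se = math.sqrt(1 / total)
    percent = [100 * w / total for w in weights]
    return pooled, se, percent


pooled, se, percent = inverse_variance_pool(STUDIES)
for (name, *_), p in zip(STUDIES, percent):
    print(f"{name:<22}{p:5.1f}%")

# Assumed 95% interval on the log scale, back-transformed to the risk ratio scale
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"Pooled RR {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f})")
```

Under this assumption the printed percentages agree with the Weight column to one decimal place, which is why small studies with few events (such as Morrocona 1998) contribute so little to the pooled result.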
Meta-analysis options
• for dichotomous or continuous data
  • inverse-variance
    straightforward, general method
• for dichotomous data only
  • Mantel-Haenszel (default)
    good with few events – common in Cochrane reviews
    weighting system depends on effect measure
  • Peto
    for odds ratios only
    good with few events and small effect sizes (OR close to 1)
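As an illustration of the Mantel-Haenszel option, the sketch below pools the caffeine example as a risk ratio. It assumes the usual Mantel-Haenszel risk ratio formula, in which each study's weight is (control events × intervention total) / (study total); the function names are hypothetical and no confidence interval is computed.

```python
# (caffeine events, caffeine total, decaf events, decaf total), from the caffeine example
STUDIES = [
    (2, 31, 10, 34),
    (10, 40, 9, 40),
    (12, 53, 9, 61),
    (3, 15, 1, 17),
    (19, 68, 9, 64),
    (4, 35, 2, 37),
    (8, 35, 6, 37),
]


def mh_weights(studies):
    """Mantel-Haenszel weights for the risk ratio: c * n1 / N per study."""
    return [c * n1 / (n1 + n2) for a, n1, c, n2 in studies]


def mantel_haenszel_rr(studies):
    """Pooled risk ratio: sum(a * n2 / N) divided by sum(c * n1 / N)."""
    numerator = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in studies)
    return numerator / sum(mh_weights(studies))


weights = mh_weights(studies=STUDIES)
percent = [100 * w / sum(weights) for w in weights]
rr_mh = mantel_haenszel_rr(STUDIES)

for (a, n1, c, n2), p in zip(STUDIES, percent):
    print(f"{a}/{n1} vs {c}/{n2}: weight {p:.1f}%")
print(f"Mantel-Haenszel RR = {rr_mh:.2f}")
```

The weights depend on the raw counts rather than on each study's variance, which is one way to read the slide's remark that the weighting system depends on the effect measure.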
A forest of lines
Forest plots
[Figure: forest plot, “Headache at 24 hours”]
• headings explain the comparison
• list of included studies
• effect estimate for each study, with CI
• scale and direction of benefit
• pooled effect estimate for all studies, with CI
Interpreting confidence intervals
• always present estimate with a confidence interval
• precision
  • point estimate is the best guess of the effect
  • CI expresses uncertainty – range of values we can be reasonably sure includes the true effect
• significance
  • if the CI includes the null value
    rarely means evidence of no effect
    effect cannot be confirmed or refuted by the available evidence
  • consider what level of change is clinically important
Assessing precision of the Results
When are results imprecise?
• For dichotomous outcomes
  • Sample size, optimal information size
  • Number of events
• For continuous outcomes
  • minimal important difference
Heterogeneity
• clinical diversity (sometimes called clinical heterogeneity)
• methodological diversity (sometimes called methodological heterogeneity)
• statistical heterogeneity (often referred to simply as heterogeneity)
Clinical diversity
• participants
  • e.g. condition, age, gender, location, study eligibility criteria
• interventions
  • intensity/dose, duration, delivery, additional components, experience of practitioners, control (placebo, none, standard care)
• outcomes
  • follow-up duration, ways of measuring, definition of an event, cut-off points
Methodological diversity
• design
  • e.g. randomised vs non-randomised, crossover vs parallel, individual vs cluster randomised
• conduct
  • e.g. risk of bias (allocation concealment, blinding, etc.), approach to analysis
Statistical heterogeneity
• there will always be some random (sampling) variation between the results of different studies
• heterogeneity is variation between the effects being evaluated in the different studies
  • caused by clinical and methodological diversity
  • alternative to homogeneity (identical true effects underlying every study)
  • study results will be more different from each other than if random variation is the only reason for the differences between the estimated intervention effects
Identifying heterogeneity
• Visual inspection of the forest plots
• Chi-squared (χ²) test (Q test)
• I² statistic to quantify heterogeneity
Visual inspection
[Figures: Forest plot A and Forest plot B]
Cochran’s Q and its test
• Cochran's Q statistic (χ²):
  Follows a χ² distribution with k-1 degrees of freedom (k = number of studies).
  Low power when there are few studies (low k).
• I² statistic:
  I² = 100% × (Q – [k-1]) / Q
  Percentage of the variation that is due to heterogeneity rather than chance.
Thresholds for the interpretation of I² can be misleading
• 25% low - might not be important;
• 50% moderate - may represent moderate heterogeneity;
• 75% high - may represent substantial heterogeneity;
• 75% to 100% considerable heterogeneity.
Example
The I² statistic
Fixed-effect vs random-effects
• Two models for meta-analysis available in RevMan
• Make different assumptions about heterogeneity
Fixed-effect model
• assumes all studies are measuring the same treatment effect
• estimates that one effect
• if not for random (sampling) error, all results would be identical
[Figure labels: Study result; Random (sampling) error; Common true effect]
Source: Julian Higgins
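Cochran's Q and I², as defined on the earlier slide, are computed around the fixed-effect pooled estimate, so they can be sketched once that model is in place. The example below reuses the caffeine data and assumes inverse-variance weights on the log risk ratio. Setting negative values of I² to zero is the usual convention and is not stated on the slide; the function names are illustrative.

```python
import math

# (caffeine events, caffeine total, decaf events, decaf total), from the caffeine example
STUDIES = [
    (2, 31, 10, 34),
    (10, 40, 9, 40),
    (12, 53, 9, 61),
    (3, 15, 1, 17),
    (19, 68, 9, 64),
    (4, 35, 2, 37),
    (8, 35, 6, 37),
]


def study_effects(studies):
    """Log risk ratio and variance for each study."""
    return [
        (math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2)
        for a, n1, c, n2 in studies
    ]


def cochran_q(effects):
    """Weighted sum of squared deviations from the fixed-effect estimate."""
    weights = [1 / v for _, v in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))


def i_squared(q, k):
    """I2 = 100% x (Q - [k-1]) / Q, with negative values set to 0 (convention)."""
    if q <= 0:
        return 0.0
    return max(0.0, 100 * (q - (k - 1)) / q)


effects = study_effects(STUDIES)
q = cochran_q(effects)
i2 = i_squared(q, len(effects))
print(f"Q = {q:.2f} on {len(effects) - 1} degrees of freedom, I2 = {i2:.0f}%")
```

With only seven studies the Q test has low power, as the slide notes, so the I² value is best read alongside the forest plot rather than against a fixed threshold.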
Random-effects model
• assumes the treatment effect varies between studies
• estimates the mean of the distribution of effects
• weighted for both within-study and between-study variation (tau², τ²)
[Figure labels: Random error; Study-specific effect; Mean of true effects]
Source: Julian Higgins
No heterogeneity
[Figure: the same data analysed with Fixed and Random models]
Adapted from Ohlsson A, Aher SM. Early erythropoietin for preventing red blood cell transfusion in preterm and/or low birth weight infants. Cochrane Database of Systematic Reviews 2006, Issue 3.
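The weighting for both within-study and between-study variation can be sketched as follows. The slides do not name an estimator for τ²; the DerSimonian-Laird moment estimator is assumed here because it is a common default, and the caffeine data are reused purely for illustration. The function name and return layout are hypothetical.

```python
import math

# (caffeine events, caffeine total, decaf events, decaf total), from the caffeine example
STUDIES = [
    (2, 31, 10, 34),
    (10, 40, 9, 40),
    (12, 53, 9, 61),
    (3, 15, 1, 17),
    (19, 68, 9, 64),
    (4, 35, 2, 37),
    (8, 35, 6, 37),
]


def fixed_and_random(studies):
    """Fixed-effect and DerSimonian-Laird random-effects pooled log risk ratios."""
    y = [math.log((a / n1) / (c / n2)) for a, n1, c, n2 in studies]
    v = [1 / a - 1 / n1 + 1 / c - 1 / n2 for a, n1, c, n2 in studies]

    # Fixed effect: weights use within-study variance only
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se_fixed = math.sqrt(1 / sum(w))

    # Between-study variance tau^2 (assumed DerSimonian-Laird estimator)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / scale)

    # Random effects: weights use within-study plus between-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    random = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_random = math.sqrt(1 / sum(w_re))
    return fixed, se_fixed, random, se_random, tau2


fixed, se_fixed, random, se_random, tau2 = fixed_and_random(STUDIES)
print(f"tau^2 = {tau2:.3f}")
for label, est, se in (("Fixed", fixed, se_fixed), ("Random", random, se_random)):
    low, high = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
    print(f"{label:<7} RR {math.exp(est):.2f} (95% CI {low:.2f} to {high:.2f})")
```

When τ² is estimated as zero the two models give the same answer; when it is positive the random-effects weights are more even across studies and the confidence interval is wider.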
Some heterogeneity
[Figure: the same data analysed with Fixed and Random models]
Adapted from Adams CE, Awad G, Rathbone J, Thornley B. Chlorpromazine versus placebo for schizophrenia. Cochrane Database of Systematic Reviews 2007, Issue 2.
Which to choose?
• Do you expect your results to be very diverse?
• Consider the underlying assumptions of the model
  • fixed-effect
    may be unrealistic – ignores heterogeneity
  • random-effects
    allows for heterogeneity
    estimate of distribution of studies may not be accurate if biases are present, few studies or few events
What to do about heterogeneity
• Check that the data are correct
• Especially if the direction of effect varies
• If heterogeneity is very high
  • Interpret fixed-effect results with caution
    consider sensitivity analysis – would random-effects have made an important difference?
  • May choose not to meta-analyse
    average result may be meaningless in practice
    consider clinical & methodological comparability of studies
• Avoid
  changing your effect measure or analysis model
  excluding outlying studies
• explore heterogeneity
Exploring your results
• what is heterogeneity?
• assumptions about heterogeneity
• identifying heterogeneity
• exploring your results
Two methods available
• subgroup analysis
  • group studies by pre-specified factors
  • look for differences in results and heterogeneity
• meta-regression
  • examine interaction with categorical and continuous variables
  • not available in RevMan
Participant subgroups
Intervention subgroups
Based on Stead LF, Perera R, Bullen C, Mant D, Lancaster T. Nicotine replacement therapy for smoking cessation. Cochrane Database of Systematic Reviews 2008, Issue 1. Art. No.: CD000146. DOI: 10.1002/14651858.CD000146.pub3.
Based on Linde K, Berner MM, Kriston L. St John's wort for major depression. Cochrane Database of Systematic Reviews 2008, Issue 4. Art. No.: CD000448. DOI: 10.1002/14651858.CD000448.pub3.
Sensitivity analysis
• not the same as subgroup analysis
• testing the impact of decisions made during the review
  • inclusion of studies in the review
  • definition of low risk of bias
  • choice of effect measure
  • assumptions about missing data
  • cut-off points for dichotomised ordinal scales
  • correlation coefficients
• repeat analysis using an alternative method or assumption
  • don’t present multiple forest plots – just report the results
  • if difference is minimal, can be more confident of conclusions
  • if difference is large, interpret results with caution
Sensitivity analysis
[Figure]
Adapted from Li J, Zhang Q, Zhang M, Egger M. Intravenous magnesium for acute myocardial infarction. Cochrane Database of Systematic Reviews 2007, Issue 2.
Assessing inconsistency *
Differences in underlying treatment effect
When heterogeneity exists, but investigators fail to identify a plausible explanation, the quality of evidence should be downgraded by one or two levels, depending on the magnitude of the inconsistency in the results
Inconsistency may arise from differences in:
• populations (e.g. drugs may have larger relative effects in sicker populations)
• interventions (e.g. larger effects with higher drug doses)
• outcomes (e.g. diminishing treatment effect with time)
*Proposed by GRADE Working Group
Funnel plots
• plot effect size against study size
  • study size usually indicated by a measure like standard error
• studies will be scattered around the effect estimate
  • larger studies at the top, smaller studies further down
  • small studies expected to scatter more widely
• a symmetrical plot will look like an inverted funnel or triangle
• RevMan can generate funnel plots
  only appropriate with ≥ 10 studies of varying size
Symmetrical funnel plot
[Figure: Standard Error (vertical axis) against Effect (horizontal axis, 0.1 to 10)]
Source: Matthias Egger & Jonathan Sterne
Asymmetrical funnel plot
[Figure: same axes, with a region labelled “Unpublished studies”]
Source: Matthias Egger & Jonathan Sterne
Colloids vs crystalloids for fluid resuscitation
[Figure]
Adapted from Perel P, Roberts I. Colloids versus crystalloids for fluid resuscitation in critically ill patients. Cochrane Database of Systematic Reviews 2011, Issue 3.
Magnesium for myocardial infarction
Death
[Figure]
Adapted from Li J, Zhang Q, Zhang M, Egger M. Intravenous magnesium for acute myocardial infarction. Cochrane Database of Systematic Reviews 2007, Issue 2.
Reasons for funnel plot asymmetry
• chance
• artefact
  • some statistics are correlated to SE, e.g. OR
• clinical diversity
  • populations may differ in small studies
  • implementation may differ in small studies
• methodological diversity
  • greater risk of bias in small studies
• reporting biases (publication bias)
Source: Egger M et al. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315: 629
Outline
• understanding reporting biases
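Funnel plot asymmetry can also be examined numerically. The sketch below lists the funnel plot coordinates (effect against standard error) for the caffeine example and fits a regression of the standardised effect on precision, which is how the graphical test in the Egger et al. reference is usually described; an intercept far from zero suggests asymmetry. This is an assumption-laden illustration: the function name is hypothetical, no significance test is included, and with only seven studies the example falls below the ≥ 10 studies the slides recommend.

```python
import math

# (caffeine events, caffeine total, decaf events, decaf total), from the caffeine example
STUDIES = [
    (2, 31, 10, 34),
    (10, 40, 9, 40),
    (12, 53, 9, 61),
    (3, 15, 1, 17),
    (19, 68, 9, 64),
    (4, 35, 2, 37),
    (8, 35, 6, 37),
]


def egger_regression(effects):
    """Least-squares fit of (effect / SE) on (1 / SE); returns (intercept, slope)."""
    x = [1 / math.sqrt(v) for _, v in effects]       # precision
    z = [y / math.sqrt(v) for y, v in effects]       # standardised effect
    mean_x, mean_z = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


effects = [
    (math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2)
    for a, n1, c, n2 in STUDIES
]

# Funnel plot coordinates: larger studies have smaller standard errors
for log_rr, variance in sorted(effects, key=lambda e: e[1]):
    print(f"RR {math.exp(log_rr):5.2f}   SE {math.sqrt(variance):.2f}")

intercept, slope = egger_regression(effects)
print(f"Regression intercept = {intercept:.2f}, slope = {slope:.2f}")
```

If every study estimated exactly the same effect, the fitted line would pass through the origin, so the intercept is the quantity to look at; as the slide lists, asymmetry can arise from chance, artefact or diversity as well as from reporting biases.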
Reporting biases
• dissemination of research findings is influenced by the nature and direction of results
• statistically significant, ‘positive’ results more likely to be published…
• …therefore more likely to be included
  • leads to exaggerated effects
  • large studies likely to be published anyway, so small studies most likely to be affected
• non-significant results are as important to your review as significant results
The dissemination of evidence
[Diagram: unavailable (unpublished) → available in principle (thesis, conference, small journal) → easily available (Medline-indexed) → actively disseminated (news, drug company)]
Source: Matthias Egger
Evidence for reporting bias
[Figure: proportion of studies not published against years since conducted, for Significant, Non-significant trend and Null results]
Source: Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640-645.
Positive studies are more likely to be
[Diagram stages: Conceived, Performed, Submitted, Published, Cited]
• submitted for publication...
• …and accepted (publication bias)
• …quickly (time lag bias)
• …as more than one paper (multiple publication bias)
• …in English (language bias)
• …in high-impact, indexed journals (location bias)
• …including positive outcomes (selective outcome reporting)
• …and cited by others (citation bias)
Source: Julian Higgins
Assessing publication bias *
Publication bias is a systematic underestimate or overestimate of the underlying beneficial or harmful effect
• Investigators fail to report studies they have undertaken
• If meta-analysis is influenced, downgrade the quality of evidence
*Proposed by GRADE Working Group
Determinants of quality of evidence:
What can upgrade evidence?