UE11 Parcours Spécifique 1 Clinical Research

UE11 Parcours Spécifique 1
Clinical Research - Lecture no. 3
28/10/2015
Agnès Dechartres
RT : Clara Timsit, Deborah To-Puzenat
RL : Eva Bisson
Outcome measures and safety
I. Determining whether the treatment is effective or not
1. How to conclude on the efficacy of the new treatment?
2. How can we express treatment effect?
II. Primary outcome (PO)
1. A single PO/selective outcome reporting
2. How can we choose a good PO?
a) Clinical relevance
b) Surrogate outcomes
c) Objectivity
d) Availability in all patients
e) Composite outcomes
Outcome, also called endpoint = « critère de jugement »
The outcome makes it possible to assess whether the treatment is effective or not. We distinguish primary
and secondary outcomes (critère de jugement principal/secondaires).
Only one primary outcome: to conclude on efficacy. It should correspond to the primary (main) objective
of the trial.
Secondary outcomes (less important):
- other benefits
- mechanism of action
- side effects (= adverse effects)
The conclusion of the trial must be based only on the result of the primary outcome: we
cannot conclude on the efficacy of the experimental intervention based on a secondary outcome.
I. Determining whether the treatment is effective or not
1. How to conclude on the efficacy of the new treatment?
You need to perform a statistical test:
- Null hypothesis (H0): Treatment A = Treatment B
- Alternative hypothesis (H1): Treatment A ≠ Treatment B
- Alpha risk (generally set at 5%): probability of rejecting H0 (the null hypothesis) when it is
true. In other words, it is the probability of a false positive result, i.e. of concluding that there is a
difference when this difference doesn't exist.
- P-value: probability, if H0 is true, of observing a result as extreme as or more extreme than the one
actually observed. The p-value is the result of the statistical test.
If p < 0.05: statistical difference between treatment A and treatment B.
If p ≥ 0.05: no statistical difference between the two treatments. This does not mean that the
treatments are equally effective. We can't conclude: it can mean there isn't any difference between the
treatments or that you did not have sufficient power to show a statistical difference.
Power: probability of observing a statistical difference if this difference exists.
Influence of sample size on p-value: the larger the sample size, the greater the power to show a
statistical difference. (With a small sample, you need a very large difference between the two groups for it
to be significant.)
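The influence of sample size can be illustrated with a short sketch. The lecture does not say which statistical test is used; the pooled two-proportion z-test below is an assumption chosen for simplicity (the normal approximation is rough for small counts), and the two datasets are hypothetical: the same 10% vs 20% risks observed in 20 and in 200 patients per group.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    # Under H0 both groups share one risk, estimated by pooling the events.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (risk_a - risk_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: identical observed risks (10% vs 20%), different sample sizes.
p_small = two_proportion_p_value(2, 20, 4, 20)
p_large = two_proportion_p_value(20, 200, 40, 200)

print(f"20 patients per group:  p = {p_small:.3f}")
print(f"200 patients per group: p = {p_large:.3f}")
```

With the same observed difference, only the larger sample falls below the 5% threshold.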
Warning: a statistical difference does not mean that the difference is important for the patients.
We also need to assess whether the difference is clinically relevant.
2. How can we express treatment effect?
Several methods are used to assess the difference, whether it is large or not. Some of these methods are
adequate and some can be misleading.
We can express treatment effect as:
- relative risk
- relative risk reduction
- absolute risk reduction
- Number Needed to Treat (NNT, number of patients to treat to avoid one event)
Example: difference between relative and absolute differences
"Contraceptive pills double the risk of venous thromboembolism" (report of the UK Committee on Safety
of Medicines, 1995). This example shows that we should not rely on the relative difference alone because it
can be misleading.
This had a negative impact on public opinion ("Kiss of death" on the cover of The Sun) and the number of
abortions increased in the UK after 1995.
In reality, contraceptive pills increase the risk of venous thromboembolism from 1 to 2 women out of
every 7000 women.
Relative difference -> (2/7000) / (1/7000) = 2 -> the relative risk is 2: contraceptive pills double the risk.
Absolute difference -> (2/7000) - (1/7000) = 1/7000 -> it is very small.
We have to assess whether this difference is clinically important.
Assessment of treatment effect:
Table 1
                    Deaths
Treatments       Yes       No     Overall
A (exp)           20      180       200
B                 40      160       200
Overall                             400
Larger sample, less common event:
Table 2
                    Deaths
Treatments       Yes       No     Overall
A (exp)            2     1998      2000
B                  4     1996      2000
Overall                            4000
For Table 1:
Absolute risk, treatment A = 20/200 = 10%
Absolute risk, treatment B = 40/200 = 20%
Absolute risk reduction (ARR) = 20% - 10% = 10%
Relative risk reduction = (20% - 10%) / 20% = 50%
NNT = 1/ARR = 1/0.1 = 10
For Table 2:
Absolute risk, treatment A = 2/2000 = 0.1%
Absolute risk, treatment B = 4/2000 = 0.2%
Absolute risk reduction (ARR) = 0.2% - 0.1% = 0.1%
Relative risk reduction = (0.2% - 0.1%) / 0.2% = 50%
NNT = 1/ARR = 1/0.001 = 1000
3 parameters to consider: the absolute risk in each treatment group, the absolute risk reduction and the NNT.
In these 2 examples, the relative risk reduction is the same but the NNT differs (the treatment in the 1st
example is better because fewer patients have to be treated in order to avoid one death).
There is a risk of erroneous interpretation with relative risk reduction.
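The calculations for Table 1 and Table 2 can be reproduced with a small sketch. The function and key names are illustrative, not part of the lecture; treatment A is taken as the experimental arm and treatment B as the control arm, as in the tables.

```python
def treatment_effect(events_exp, n_exp, events_ctrl, n_ctrl):
    """Return the usual effect measures for a binary outcome (e.g. death)."""
    risk_exp = events_exp / n_exp      # absolute risk, experimental arm
    risk_ctrl = events_ctrl / n_ctrl   # absolute risk, control arm
    arr = risk_ctrl - risk_exp         # absolute risk reduction
    return {
        "risk_exp": risk_exp,
        "risk_ctrl": risk_ctrl,
        "relative_risk": risk_exp / risk_ctrl,
        "relative_risk_reduction": arr / risk_ctrl,
        "absolute_risk_reduction": arr,
        "nnt": 1 / arr,                # number needed to treat
    }


table1 = treatment_effect(20, 200, 40, 200)
table2 = treatment_effect(2, 2000, 4, 2000)

for name, result in (("Table 1", table1), ("Table 2", table2)):
    print(
        f"{name}: RRR = {result['relative_risk_reduction']:.0%}, "
        f"ARR = {result['absolute_risk_reduction']:.2%}, "
        f"NNT = {result['nnt']:.0f}"
    )
```

Both tables give the same relative risk reduction, while the absolute risk reduction and the NNT differ by a factor of 100.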
II. Primary outcome (PO)
The primary outcome should be defined in the protocol before beginning the trial. A sample size
calculation based on this outcome can then be performed.
Sample size calculation is important:
- To recruit a sufficient number of patients to show a statistically significant difference (power).
- Because it is not ethical to include too many patients (they are exposed to risks during the trial).
What is needed for sample size calculation?
o Alpha risk, usually set at 5%
o Power (at least 80%, sometimes 90%) = probability of showing a difference when this
difference really exists
o Prevalence of the outcome: the rarer the outcome, the larger the sample size
o Difference we want to show: the smaller the difference, the larger the sample size
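As a rough illustration of how these four ingredients interact, here is a minimal sketch. The lecture does not give a formula: the one used below is the common normal-approximation formula for comparing two proportions with equal group sizes, and it is an assumption of this example, as are the function name and the risks chosen.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(risk_ctrl, risk_exp, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the alpha risk
    z_power = NormalDist().inv_cdf(power)          # quantile for the power
    variance = risk_ctrl * (1 - risk_ctrl) + risk_exp * (1 - risk_exp)
    difference = risk_ctrl - risk_exp
    return ceil((z_alpha + z_power) ** 2 * variance / difference ** 2)


# Illustrative scenarios (the risks are hypothetical values).
print("20% vs 10%, power 80%:  ", sample_size_per_group(0.20, 0.10))
print("20% vs 10%, power 90%:  ", sample_size_per_group(0.20, 0.10, power=0.90))
print("20% vs 15%, power 80%:  ", sample_size_per_group(0.20, 0.15))
print("0.2% vs 0.1%, power 80%:", sample_size_per_group(0.002, 0.001))
```

The sketch shows the trends listed above: a higher power, a smaller difference or a rarer outcome each increase the required sample size.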
1. A single PO/selective outcome reporting
Why a single PO?
- To keep α risk at 5%
  o 1 statistical test -> α risk is at 5%
  o α risk increases when performing several statistical tests -> increased probability of false
positive results!
- Sample size calculation:
  o Difficult when there are several primary outcomes (α risk needs to be adjusted)
Example:
A single predefined PO that should not change during the trial:
- Especially after seeing the results
- Because of α risk (to avoid an increase of α risk; otherwise the result could be due to chance)
  o Multiplication of statistical tests -> increased alpha risk -> increased probability of
false-positive results
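The increase of the α risk with the number of tests can be sketched as follows. The formula assumes that the tests are independent, which is a simplification not stated in the lecture; the function name is illustrative.

```python
def family_wise_error(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 2, 5, 10):
    risk = family_wise_error(0.05, n_tests)
    print(f"{n_tests:>2} test(s) at alpha = 5%: overall risk = {risk:.1%}")
```

With a single test the overall risk stays at 5%; with ten independent tests it rises to roughly 40%.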
Selective outcome reporting (SOR):
o Selection, on the basis of the results, of a subset of the original variables recorded for
inclusion in the publication of trials (if the PO isn't significant, investigators could be tempted to choose a
secondary outcome or a subgroup for which there is a statistical difference).
o It biases the treatment effect toward more positive results.
How can we assess the risk of SOR?
- Check the protocol
- Compare the PO defined in the protocol with the PO in the published article (it should be the same)
- The protocol can be difficult to obtain
- But since 2005, all trials should be registered, before the recruitment of the first patient, in a free,
publicly available online registry (the best known is clinicaltrials.gov; the main elements of the protocol,
including the PO, are registered)
Examples of changes of PO:
1) In the register: glycosylated hemoglobin concentration (important parameter to monitor diabetic
patients) after 2 years, a continuous outcome. In the article: a binary outcome based on a cut-off value of
glycosylated hemoglobin, and glycaemia was added as a PO.
2) Same primary outcome but the timeframe of assessment changed from 2 years to 1 year.
Different types of outcomes:
Clinical events (usually assessed by physicians): mortality, myocardial infarction
Patient-reported outcomes (PRO): pain, disability/function, quality of life
Other types of outcomes assessed by physicians: clinical scores, biological tests, radiological
tests
2. How can we choose a “good” PO?
a) Clinical relevance/importance for patients
Outcomes considered as important for patients: mortality, morbidity (myocardial infarction, stroke…), pain,
function and quality of life.
Few trials focus on outcomes that are important for patients:
- only 18% of ongoing trials on diabetes
- 23% of cardiovascular trials published between 2005 and 2008 had a PO important for patients.
b) Surrogate outcomes = critères de jugement intermédiaires
Surrogate outcomes are measured before a clinical event occurs and are often used in clinical trials as
substitutes for final patient relevant outcomes like clinical events.
Example:
Disease               Surrogate outcome       Outcome important for patients = clinical event
Hypertension (HTA)    Blood pressure          Death or death by stroke
Osteoporosis          Bone mineral density    Fracture
Why use surrogate outcomes?
- Low number of clinical events (important outcomes)
- Long-term follow-up needed to observe them
- Long time before patients can benefit from therapeutic innovations
Advantages:
- Smaller sample size
- Shorter follow-up duration (from years to weeks)
Ex: drug reducing blood cholesterol level
- Outcome = blood cholesterol level: 100 patients, 3 to 12 months
- Outcome = mortality: thousands of patients, 4 to 5 years
Problems:
Assessment of efficacy can be:
- incomplete
- inadequate
- sometimes a source of misleading conclusions.
Example: treatment against osteoporosis
- Fluoride: increases bone mineral density but also the risk of fractures => not good for
patients
- Bisphosphonate: increases bone mineral density and decreases the risk of fracture
- Raloxifene: no or little effect on bone mineral density and decreases the risk of fracture.
Surrogate outcomes are not always correlated to the occurrence of clinical events. Always better to have
an outcome which is important for the patients; surrogate outcomes are always subject to discussion.
c) Objectivity of outcomes
- For some outcomes, assessment is objective:
  o Mortality
  o Biological tests
- For others, the assessment looks objective but is subjective: radiological tests (not an exact
science: interpretation, influence of movements and food absorption by the patient)
- For many, assessment is subjective:
  o Pain and other patient-reported outcomes (very subjective +++)
  o Myocardial infarction
  o Cardiovascular mortality (because it can be very difficult to determine the cause of death)
The consequences of subjectivity are:
- Variability between outcome assessors
- Bias if the outcome assessor isn't blinded. For objective outcomes, there is no difference
between assessments made in blinded or non-blinded trials, whereas for subjective outcomes there can
be a difference in treatment effect estimates between blinded and non-blinded assessors (more positive
results in non-blinded trials).
How to limit subjectivity?
- Blinded assessment
- Standardization of outcome measurement
  o Standardized form
  o Standardization of clinical assessment
  o Training of outcome assessors
  o Skilled and trained outcome assessors
  o Assessment in duplicate (2 people assess the outcome independently, then see whether they disagree;
if they do, a 3rd person is needed to reach a conclusion) to limit variability
  o Centralization of outcome assessment (central committee to discuss discrepancies)
- Evaluation of reproducibility
  o Intraclass correlation coefficient (if continuous outcome)
  o Kappa coefficient (if binary outcome)
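As an illustration of the kappa coefficient for a binary outcome assessed in duplicate, here is a minimal sketch. It assumes Cohen's kappa for two assessors (the lecture only names "kappa"), and the ratings are hypothetical (1 = event judged present, 0 = absent).

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two assessors rating the same patients as 0 or 1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both assessors must rate the same, non-empty set of patients")
    n = len(ratings_a)
    # Observed agreement: proportion of patients rated identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each assessor's rate of positive ratings.
    positive_a = sum(ratings_a) / n
    positive_b = sum(ratings_b) / n
    expected = positive_a * positive_b + (1 - positive_a) * (1 - positive_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical duplicate assessment of 10 patients.
assessor_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
assessor_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

kappa = cohen_kappa(assessor_1, assessor_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw agreement for the agreement expected by chance: here the assessors agree on 8 patients out of 10, but kappa is lower than 0.8.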
d) Availability in all patients
Objective: to limit missing data.
We also need to limit the exclusion of patients from the analysis (because of the intention-to-treat
analysis) and to limit the number of patients lost to follow-up.
Example: thromboembolic events assessed by venograms, myocardial outcomes assessed by
coronary angiography. With such outcomes, data are frequently missing. They can also be considered as
surrogate outcomes (not important for patients).
Some POs are less likely to have missing data than others: deaths are available in the national registry (état
civil), so the amount of missing data should be very limited.
e) Composite outcomes (CO)
They are commonly used in cardiovascular trials.
Example: death OR myocardial infarction OR stroke (3 components within 1 outcome).
Advantages:
- Increased power for the same sample size: it increases the probability of observing the event.
- Can help investigators select the PO.
Example: trial in patients taking aspirin after myocardial infarction (secondary prevention to avoid
recurrence), comparing maintenance of aspirin vs. discontinuation of aspirin before a surgical
intervention. A CO including both ischemic and bleeding events takes into account the balance between
benefits and risks.
Recommendations in case of CO:
- Interpretation as a CO:
  o If p<0.05: statistically significant reduction of death OR myocardial infarction OR stroke.
  o It's not possible to conclude that the new treatment decreased mortality.
- Components of the CO should be defined as secondary outcomes.
Ex: secondary outcomes = death, myocardial infarction, stroke.
Components of CO:
- Should have the same importance for patients
- Should occur with the same frequency
- Systematic assessment of treatment effect for each component (defined as secondary outcomes):
similar treatment effect?
Problems:
Few COs include components of similar importance for patients with similar treatment effects.
Example: 1715 randomized patients, comparison of irbesartan + amlodipine vs. placebo.
The CO is defined as mortality or end-stage renal disease or doubling of serum creatinine concentration (this
last one could be considered a surrogate outcome: the components of the CO don't have the same importance
for the patient). Treatment effect expressed as a relative risk reduction: there is a statistically significant
reduction for the CO (the 95% confidence interval doesn't include 0).
The choice of the CO is inadequate: there is in fact no difference (the 95% CI includes 0) for end-stage renal
disease and mortality, which are the most important outcomes for patients, but because the surrogate
outcome (doubling of creatinine levels, which is a very common event) is significant, so is the CO. The
results are misleading.
Conclusions:
Reflexes when considering a PO:
- Is the PO important for patients?
Deaths, clinical events, pain, disability and quality of life
- Is the PO subjective or objective?
If subjective:
  o Blinded assessment?
  o Standardization to improve reproducibility?
  o Assessment in duplicate or centralized assessment to improve reproducibility?
- Is the PO available for all patients?
Interpreting the results:
- Absolute risk reduction or NNT
- Avoid relative risk reduction
Abbreviations :
PO: primary outcome
NNT: number needed to treat
SOR: selective outcome reporting
CO: composite outcome
SUMMARY SHEET:

- The primary outcome corresponds to the main objective of the trial. There should be
only one (otherwise the alpha risk will increase). The conclusion on the efficacy of the
treatment is based on its result (and not on the secondary outcomes). It should be defined in
the protocol before beginning the trial. A sample size calculation can be based on it; this
requires the alpha risk, the power, the prevalence of the outcome and the difference we want
to show.

- We can conclude on efficacy after a statistical test. If the p-value (the result of the test)
is < 0.05 (the threshold set by the alpha risk, i.e. the probability of a false positive), there is
a statistical difference between the two treatments. Otherwise, there is no statistical difference,
but we can't conclude that the treatments are equal. The sample size has an effect on the p-value:
the larger the sample size, the greater the power to show a statistical difference (but it is
not ethical to include too many patients).

- The best ways to express treatment effect are the absolute risk reduction (risk in the
control group - risk in the experimental group) and the number needed to treat
(1/absolute risk reduction).

- Selective outcome reporting biases the treatment effect towards more positive results
(e.g. choice of a secondary outcome with a statistical difference). To detect it, check the PO
in the protocol.

- A good PO should be clinically relevant or important for the patient (pain,
mortality, morbidity, quality of life, disability); a surrogate outcome can be used instead. A
surrogate outcome allows a shorter follow-up and a smaller sample size, but the assessment can
be incomplete, misleading or inadequate. The assessment of an outcome can be objective
(mortality) or subjective (pain), which can lead to bias if the outcome assessor isn't blinded
(detection bias). To improve reproducibility for a subjective outcome, we need blinded
assessment, standardization of outcome measurement and evaluation of reproducibility.

- Composite outcomes (commonly used in cardiovascular trials) increase the power for
the same sample size. If p<0.05, it is not possible to conclude that the treatment decreases any
single component. The components should be defined as secondary outcomes, should occur
with the same frequency, should have the same importance for patients, and should show
similar treatment effects.