Lecture slides

Course: Clinical Trials
Regulatory issues in confirmatory clinical trials
Norbert Benda
Feb. 2016 | Page 1/51
Contents
1. Basic principles in clinical development and drug approval
2. Statistical issues in drug approval
• ITT and missing data
• type-1 error control and multiplicity
3. Superiority, non-inferiority and equivalence
• basic principles and differences
• derivation of a non-inferiority margin
• issues in non-inferiority trials
4. Endpoints, effect measures and estimands
• definitions
• issues and examples
5. Missing data
• issues and solutions
Feb. 2016 | Page 2/51
Clinical development of a new medicine
• preclinical development
• Phase I
  – few healthy volunteers
  – clinical pharmacology
  – toxicity, safety
• Phase II
  – medium number of patients
  – proof of concept
  – dose response
• Phase III
  – large trials in patients
  – randomized controlled trials (RCT)
  – demonstration of efficacy and safety
• MAA: marketing authorization application
Feb. 2016 | Page 3/51
Exploration and confirmation
• exploration (phases I and II)
  – estimation of a maximum tolerable dose (MTD)
  – proof of concept using pharmacodynamic parameters
  – selection of indication and population
  – selection of a useful dose (range)
• confirmation (phase III)
  – confirmatory proof of efficacy compared to placebo or active comparator
Feb. 2016 | Page 4/51
Drug approval bodies
EU: European Medicines Agency (EMA) – European Commission
Scientific committee:
Committee for medicinal products for human use (CHMP)
Drug approval bodies worldwide:
• EU: EMA
• USA: FDA
• Canada: Health Canada
• Brazil: ANVISA
• Switzerland: Swissmedic
• Russia: FSKN
• China: China FDA
• Japan: PMDA
• Australia: TGA
Feb. 2016 | Page 5/51
Drug approval requirements
new chemical entity:
confirmatory proof of efficacy
superiority to placebo
or
non-inferiority compared to a reference product
sufficient safety database
positive benefit risk assessment
benefit outweighs risk
generic drug (same active ingredients as a reference product):
• confirmatory proof of bioequivalence
Feb. 2016 | Page 6/51
Clinical trials
• “A clinical trial is a planned experiment on human beings which is designed to evaluate the effectiveness of one or more forms of treatment”
  – Douglas Altman (1991): Practical Statistics for Medical Research
• Evaluation of efficacy / effectiveness usually requires
• comparison / control
• vs placebo or active control
• internal validity
• ensuring causal relation to treatment
• external validity
• extrapolation to other settings
Feb. 2016 | Page 7/51
Exploration and confirmation
• Exploration: early phases
• no control of false positive decisions
• generation of hypotheses to be tested later
• insufficient for licensing
• Confirmation: late phases
• pivotal Phase III study
• control of false positive decisions:
  – probability of a positive study for an ineffective treatment limited (usually to 2.5%)
• unbiased estimate of treatment effect
• at least not biased in favour of the new treatment
• usually at least two successful pivotal trials required for licensing
Feb. 2016 | Page 8/51
Key elements of a confirmatory clinical trial
• Control
• new treatment to be better than placebo (superiority)
• new treatment equal (similar) to a relevant comparator
  – indirect proof of superiority to placebo (non-inferiority)
• Randomization
• proper randomization
• ensure causal relationship between effect and treatment
• exclude confounding with other factors
• ITT principle
• evaluation of all study participants according to randomization
• Blinding
• equal care, equal follow-up, equal assessments, equal measurements
• Type-1 error control
• low probability of falsely declaring efficacy
Feb. 2016 | Page 9/51
Selected statistical issues in drug approval
• Type-1 error control
• proper control of false positive decisions
• Multiplicity, sequential and adaptive designs
• studywise control of any false positive decision in all confirmatory comparisons
• considering design adaptations and interim decisions
• including decisions made for subpopulations
• Surrogate endpoints
• validating and using endpoints intended to replace others
• e.g. tumour response or progression free survival in oncology instead of
overall survival
• Proper survival analyses
• e.g. considering “informative censoring”
• Missing data
• considering patients that discontinue treatment
• Non-inferiority studies
• derivation of a non-inferiority margin
• ensuring sensitivity of a non-inferiority comparison
Feb. 2016 | Page 10/51
Randomization 1
• Randomization to avoid confounding and to ensure causality
• Example, no randomization:
• Patients with more advanced disease receive treatment A
• Patients with less advanced disease receive treatment B
• Efficacy related to disease status
• Apparent superiority of A vs B due to
– disease status / population
– not due to treatment
• similar with other prognostic factors such as age, gender, etc.
  – including unknown prognostic factors
Feb. 2016 | Page 11/51
Randomization 2
• Randomization to avoid confounding and ensure causality
• no systematic difference w.r.t. prognostic factors
• remaining chance difference covered by statistical test procedure
• included in type-1 error control
– probability to obtain a positive efficacy result because of group differences is controlled
• small differences for large sample sizes
• Randomization facilitates interpretation for individual patients:
• mean(effect in A – effect in B) = mean(treatment group A) – mean(treatment group B)
  due to equal chances for each patient to receive a specific treatment
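The effect of confounding, and its removal by randomization, can be checked in a small simulation. A minimal sketch (all numbers hypothetical): response depends only on a prognostic factor, not on treatment, yet the non-randomized comparison shows a spurious treatment difference.

```python
# Minimal sketch (hypothetical numbers): response depends on a prognostic
# factor (disease status), not on treatment.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
advanced = rng.random(n) < 0.5                   # latent prognostic factor
p_response = np.where(advanced, 0.3, 0.5)        # worse prognosis -> lower response
response = rng.random(n) < p_response

trt_confounded = np.where(advanced, "A", "B")    # advanced cases all get A
trt_randomized = rng.choice(["A", "B"], size=n)  # 1:1 randomization

for label, trt in [("confounded", trt_confounded), ("randomized", trt_randomized)]:
    print(label, response[trt == "A"].mean(), "vs", response[trt == "B"].mean())
# confounded: ~0.30 vs ~0.50 (spurious difference); randomized: ~0.40 vs ~0.40
```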
Feb. 2016 | Page 12/51
Randomization and Causality: Example
Population of interest P
• two subpopulations P1 and P2
• two treatments A and B
• endpoint: response (binary)
No randomization:
• all patients in P1 receive A: 50% response
• all patients in P2 receive B: 30% response
→ A better than B?
Imagine instead:
• all patients in P1 received B: 50% response
• all patients in P2 received A: 30% response
→ the better response is due to the population only
Feb. 2016 | Page 13/51
Randomization and Causality: Example
Population of interest P
• two subpopulations P1 and P2
• two treatments A and B
• endpoint: response (binary)
Randomization:
• each patient in P1 and P2 receives A with (e.g.) 50% chance (randomization 1:1)
• differences in the proportions of P1 patients in A and B are due to chance only:
  – large sample sizes – negligible differences
  – probability of a significant response difference caused by population differences ≤ α if no real difference is present
Feb. 2016 | Page 14/51
ITT principle
• ITT principle
• ITT = “intention to treat”
• maintains randomization
• measuring the treatment effect caused by treatment only
• avoids treatment related selection
• evaluation according to randomization
• irrespective of study discontinuation or treatment switch
• Ensures that group differences are due to
• treatment
• not due to populations
• remaining difference in populations by chance
• type 1 error accounts for residual population differences
Feb. 2016 | Page 15/51
ITT principle: Simplistic example (1)
Compare treatments A and B
• Consider two (latent) subpopulations S1 and S2
• Case 1: No drop-out
Randomize 1:1
• Treatment A: subgroup S1 (50%): response; subgroup S2 (50%): no response
• Treatment B: subgroup S1 (50%): response; subgroup S2 (50%): no response
• Response rates: A: 50% vs B: 50%
• No treatment difference
Feb. 2016 | Page 16/51
ITT principle: Simplistic example (1)
Compare treatments A and B
• Consider two (latent) subpopulations S1 and S2
• Case 2: All S2 patients drop out under A
Randomize 1:1
• Treatment A: subgroup S1 (50%): response; subgroup S2 (50%): not observed (drop-out)
• Treatment B: subgroup S1 (50%): response; subgroup S2 (50%): no response
• Response rates in completers: A: 100% vs B: 50%
• Large treatment difference in completers
Feb. 2016 | Page 17/51
ITT principle: Simplistic example (1)
Case 2: All S2 patients drop out under A
Randomize 1:1
• Treatment A: subgroup S1 (50%): response; subgroup S2 (50%): not observed (drop-out)
• Treatment B: subgroup S1 (50%): response; subgroup S2 (50%): no response
• Drop-out in S2 patients may be caused by treatment A
• All S2 patients may have experienced treatment failure under A
• S2 patients may not be identifiable as a specific sub-group
• Two options to follow the ITT principle (illustrated in the sketch below)
• Consider drop-out as failure
• Replace missing data
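A minimal sketch of the first option, using the slide's numbers: counting drop-outs as failures restores the randomized comparison that the completers-only analysis distorts.

```python
# Sketch of the toy example (numbers from the slide): 100 patients per arm,
# 50 in each latent subgroup; under A all S2 patients drop out.
n_s1 = n_s2 = 50

resp_A_completers = 50 / 50        # only S1 observed under A: 100% response
resp_A_itt = 50 / (n_s1 + n_s2)    # "drop-out = failure": 50% response
resp_B = 50 / (n_s1 + n_s2)        # S1 respond, S2 observed failures: 50%

print("completers:", resp_A_completers, "vs", resp_B)  # 1.0 vs 0.5 (biased)
print("ITT:       ", resp_A_itt, "vs", resp_B)         # 0.5 vs 0.5 (no difference)
```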
Feb. 2016 | Page 18/51
ITT principle: Simplistic example (2)
• Study comparing treatment A and B
• Primary or secondary endpoint:
• quality of life score
• Some patients die before the end of treatment
• How to measure quality of life (QoL) in all patients ?
• QoL in survivors = QoL in a selected population
– ITT principle not followed
• QoL in survivors may be biased
Feb. 2016 | Page 19/51
ITT principle: Simplistic example (2)
(latent) sub-populations P1, P2, P3 with prevalence 1/3 each
• Compare treatments A and B

| Subgroup (1/3 each)   | Treatment A                | Treatment B                                     | Comparison      |
|-----------------------|----------------------------|-------------------------------------------------|-----------------|
| P1                    | all die, QoL not given     | all die, QoL not given (similar survival times) | A equal to B    |
| P2                    | all survive, mean QoL = 30 | all die, QoL not given                          | A better than B |
| P3                    | all survive, mean QoL = 60 | all survive, mean QoL = 50                      | A better than B |
| mean QoL in survivors | 45                         | 50                                              |                 |

• A better than or equal to B in all subgroups
• but: B better than A re. mean QoL in survivors
Feb. 2016 | Page 20/51
ITT principle: Simplistic example (2)
Quality of life (QoL) analysis in studies with high mortality:
Possible analyses discussed
• QoL score in patients who die = 0
• followed by a non-parametric evaluation
– parametric evaluation depends on arbitrary score 0
• non-parametric Gould type analysis (see later)
• adjustment by all relevant prognostic factors (in survivors)
• still not following the ITT principle
• mimic effect in those who survive under A and B using
• causal inference with instrumental variables
– difficult, relying on full identification of instrumental variables
• replacing missing data by missing data imputation
• may not target the relevant question
  – “your QoL would have been such-and-such had you not died”
Feb. 2016 | Page 21/51
Type-1 error control: Example
• Clinical trial comparing treatments A and B
• primary endpoint: walking distance in 6 minutes (difference to baseline)
(6-minute-walk-test)
• statistical test: two sample t-test on
• null hypothesis H0: mean (A) = mean (B)
• obtain p-value
  – probability of obtaining the observed difference (or a more extreme one) if the null hypothesis is in fact true
Feb. 2016 | Page 22/51
Type-1 error control: Example
• Clinical trial comparing treatments A and B
• statistical test: two sample t-test
• result: p (two-sided) = 0.0027 < 0.05 (5%)
• small probability to obtain such a result by chance
• difference declared to be significant
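A minimal sketch of such a primary analysis (simulated data; the means and SDs are assumptions, not the trial's results):

```python
# Two-sample t-test on the change in 6-minute walking distance (simulated).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
change_A = rng.normal(30, 40, size=100)   # change in 6MWT distance, arm A
change_B = rng.normal(10, 40, size=100)   # change in 6MWT distance, arm B

t, p = stats.ttest_ind(change_A, change_B)   # two-sample t-test, two-sided p
print(f"t = {t:.2f}, two-sided p = {p:.4f}, significant at 5%: {p < 0.05}")
```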
Feb. 2016 | Page 23/51
Type-1 error control: Example
• Clinical trial comparing treatments A and B
• two comparisons for 6MWT
• After 3 months: p-value = 0.0027
• After 6 months: p-value = 0.0918
Question: Study successful ?
Feb. 2016 | Page 24/51
Type-1 error control: Example
• Multiplicity and pre-specification
• post-hoc definition of endpoint of interest (primary endpoint)
(6MWT after 3 months or 6MWT after 6 months)
• increases the probability of a false significance
• is invalid
• pre-specification needed
Feb. 2016 | Page 25/51
Multiplicity
Multiplicity
• Multiple ways to win
• Multiple chances to obtain a significant result due to chance
Example: Study success defined by a significant difference in primary
endpoints, e.g. oncology trial
• progression free survival (PFS)
• overall survival (OS)
• No adjustment means:
• probability of a significant difference in PFS or OS > α
if no real difference in PFS or OS
• increased chances to declare an ineffective treatment to be effective
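This inflation is easy to verify by simulation. A minimal sketch, assuming two independent endpoints for simplicity (PFS and OS are correlated in practice, which reduces but does not remove the inflation):

```python
# Familywise type-1 error with two unadjusted 5% tests under no true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim, n, alpha = 20_000, 100, 0.05
wins = 0
for _ in range(n_sim):
    p1 = stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
    p2 = stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
    wins += (p1 < alpha) or (p2 < alpha)   # "win" if either endpoint significant
print(wins / n_sim)   # ~0.0975 = 1 - 0.95**2, well above 0.05
```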
Feb. 2016 | Page 26/51
Multiplicity
Example: Study success defined by a significant difference in primary endpoints,
• progression free survival (PFS)
• overall survival (OS)
Different options to keep the type-1 error:
1. PFS and OS co-primary
   • both must be significant
2. Hierarchical testing:
   • test PFS first, test OS only if PFS is significant (or vice versa)
3. Adjust α: test PFS and OS at 0.025 each (instead of 0.05) (Bonferroni)
   • or use a different split (e.g. 0.01 for PFS, 0.04 for OS)
4. Adjust with more complex methods
   • “Bonferroni-Holm”, “Hochberg”, “Dunnett”, etc.

→ Lecture on multiple testing (A Zapf), May 19th
Feb. 2016 | Page 27/51
Multiplicity adjustment:
Co-primary endpoints
Example: Study success defined by a significant difference in primary
endpoints,
• progression free survival (PFS)
• overall survival (OS)
PFS and OS co-primary
• to be pre-specified in the protocol
• both must be significant
• no valid confirmatory conclusion if only one endpoint is significant
• e.g. PFS: p = 0.0000001, OS: p = 0.073
– “sorry, you lost” – no way
Feb. 2016 | Page 28/51
EMA Points to Consider on Multiplicity (document cover shown)
Feb. 2016 | Page 29/51
Sources of multiplicity
• multiple endpoints
• multiple interim looks
• multiple group comparisons
  – dose groups
• multiple (sub-)populations
• multiple analysis methods (tests)
  – each may be valid, but post-hoc selection is not
→ Course on multiplicity (date, presenter)
Feb. 2016 | Page 30/51
Confirmatory Phase III studies
Summary of type-1 error control
• Control the probability of a false positive decision
  – type-1 error control of the statistical test
• Type-1 error control refers to those comparisons that define study success
  – usually defined by the primary analyses of the primary endpoints
  – precise definition of study success needed
  – study success may be defined by several comparisons joined by “and” or “or”
  – confirmatory proof of primary comparisons; descriptive use of others
  – multiple comparisons require an appropriate multiplicity procedure
• Pre-specification needed
  – study protocol / statistical analysis plan (SAP)
Feb. 2016 | Page 31/51
Superiority, non-inferiority, equivalence
• superiority study
– comparison to placebo
• new drug to be better than placebo
• non-inferiority study
– comparison to an active comparator
• suggests: new drug at least as good as the comparator
• proves: new drug not considerably inferior to the comparator
• equivalence study
– bioequivalence in generic applications
(=copies of “chemical drugs”)
– therapeutic equivalence in biosimilars
(=copies of “biological drugs”)
• difference between drugs within a given range
Feb. 2016 | Page 32/51
Superiority, non-inferiority, equivalence
• compare a parameter ϑ between two treatments
  – e.g. ϑ = mean change from baseline in HbA1c
  – ϑA, ϑB = mean change for treatment A (new) and treatment B (comparator or placebo)
• superiority comparison
  – show: ϑA > ϑB
  – reject null hypothesis H0: ϑA ≤ ϑB
• non-inferiority comparison
  – show: ϑA > ϑB − δ
  – reject null hypothesis H0: ϑA ≤ ϑB − δ
  – δ = non-inferiority margin
• equivalence comparison
  – show: −δ < ϑA − ϑB < δ
  – reject null hypothesis H0: ϑA ≤ ϑB − δ or ϑA ≥ ϑB + δ (two one-sided tests)
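A minimal sketch of the three decisions from one estimate of ϑA − ϑB, using a normal approximation; the estimate, standard error and margin δ below are hypothetical:

```python
# Superiority, non-inferiority and equivalence tests from summary statistics.
from scipy import stats

diff_hat, se, delta = 1.2, 0.9, 2.0

p_sup = 1 - stats.norm.cdf(diff_hat / se)            # H0: diff <= 0
p_ni = 1 - stats.norm.cdf((diff_hat + delta) / se)   # H0: diff <= -delta
# equivalence: two one-sided tests, reject H0: diff <= -delta AND H0: diff >= delta
p_equ = max(p_ni, stats.norm.cdf((diff_hat - delta) / se))
print(f"superiority p = {p_sup:.3f}, NI p = {p_ni:.3f}, equivalence p = {p_equ:.3f}")
# here: non-inferiority shown, superiority and equivalence not
```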
Feb. 2016 | Page 33/51
Superiority, non-inferiority, equivalence
[Figure: null hypothesis regions on the ϑA − ϑB axis, marked at −δ, 0 and δ: superiority (H0: ϑA − ϑB ≤ 0), non-inferiority (H0: ϑA − ϑB ≤ −δ), equivalence (H0: ϑA − ϑB ≤ −δ or ϑA − ϑB ≥ δ)]
Feb. 2016 | Page 34/51
Confirmatory superiority trial
• show
  – new drug better than placebo
  → use statistical test on null hypothesis H0: ϑA ≤ ϑB
  → significance = reject null hypothesis – conclude superiority
• validity
  – type-1 error control:
    • Prob(conclude superiority | no superiority) ≤ 2.5%
    • false conclusion of superiority ≤ 2.5%
  – effect estimate relative to placebo ϑ̂A − ϑ̂B
    • unbiased (correct on average) or
    • conservative (no overestimation on average)
Feb. 2016 | Page 35/51
Confirmatory superiority trial
• ensure
– probability of a false positive decision on superiority should be small (≤
2.5 %)
• type-1 error control of the statistical test
• control for multiple comparisons
– conservativeness
• avoid overestimation
– correct statistical estimation procedure
– proper missing data imputation
– proper randomisation
– etc.
Feb. 2016 | Page 36/51
Confirmatory non-inferiority trial
• show
  – new drug better than comparator − δ
  → use statistical test on null hypothesis H0: ϑA ≤ ϑB − δ
  → conclusion of non-inferiority (NI) = rejection of the null hypothesis
• validity
  – type-1 error control:
    • Prob(conclude NI | new drug inferior by more than δ) ≤ 2.5%
    • false conclusion of NI ≤ 2.5%
  – effect estimate relative to active comparator ϑ̂A − ϑ̂B
    • unbiased (correct on average)
    • no underestimation of a possibly negative effect
Feb. 2016 | Page 37/51
EMA Points to Consider on Switching between Superiority and Non-inferiority (document cover shown)
Feb. 2016 | Page 38/51
EMA Guideline on the Choice of the Non-inferiority Margin (document cover shown)
Feb. 2016 | Page 39/51
Issues in non-inferiority trials
• non-inferiority margin δ
– clinical justification
• defined through clinical relevance
– “statistical” justification
• defined through comparator benefit compared to placebo
• sensitivity
– NI studies to be designed to detect differences
• constancy assumption
– assumed comparator effect maintained in the actual study
• relevant population to be tested
• comparator effect maintained over time
– control e.g. “biocreep” in antimicrobials
Feb. 2016 | Page 40/51
Non-inferiority margin
• clinical justification
– clinical relevance
• involving anticipated risk benefit
• “statistical” justification
– related to putative placebo comparison
• indirect comparison to placebo using historical data
– based on estimated difference comparator (C) to placebo (P) C – P
– use historical placebo controlled studies on the comparator
• evaluating C – P in a meta-analysis
• quantifying uncertainty in historical data by using a meta-analysis based 95%
confidence interval of C – P
• define NI margin relative to the lower limit of the confidence interval, e.g. by
a given fraction
Feb. 2016 | Page 41/51
Non-inferiority margin: Statistical justification
[Figure: forest plot of historical comparator-vs-placebo studies with effect estimates and 95% confidence intervals, together with the meta-analysis pooled estimate of C − P; δ is marked between 0 and the pooled lower confidence limit]
e.g. δ = ½ of the lower limit of the 95% confidence interval of C − P
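A minimal sketch of this derivation with a fixed-effect (inverse-variance) meta-analysis; the historical C − P estimates and standard errors below are made up:

```python
# Fixed-effect meta-analysis of historical comparator-vs-placebo effects.
import numpy as np
from scipy import stats

effects = np.array([0.20, 0.15, 0.25, 0.18])   # C - P from historical studies
ses = np.array([0.06, 0.05, 0.08, 0.07])

w = 1 / ses**2                                 # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)
se_pooled = np.sqrt(1 / np.sum(w))
lower = pooled - stats.norm.ppf(0.975) * se_pooled   # lower 95% limit of C - P

delta = 0.5 * lower                            # e.g. preserve half of the effect
print(f"pooled C-P = {pooled:.3f}, lower 95% limit = {lower:.3f}, delta = {delta:.3f}")
```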
Feb. 2016 | Page 42/51
Sensitivity of a NI trial
• lack of sensitivity in a superiority trial
  – sponsor’s risk
  – may lead to an unsuccessful trial
• lack of sensitivity in a non-inferiority trial
(lack of assay sensitivity)
– relevant for approval
– risk of an overlooked inferiority
  • assume a “true” relevant effect ϑA < ϑB − δ
  • insensitive new study
    – e.g. wrong measurement time in treatment of pain
    – estimated effect difference ϑ̂A − ϑ̂B ≈ 0
  • study would be a (wrong) success
Feb. 2016 | Page 43/51
Insensitive non-inferiority trial
[Figure: ϑA − ϑB axis with the non-inferiority null region H0 below −δ; the true effect of the new treatment vs comparator lies inside H0, but the observed effect is shifted towards 0]
reduced effect due to
• “sloppy” measurement
• wrong time point
• insensitive analysis, etc.
Feb. 2016 | Page 44/51
Sensitivity of a NI trial: Toy example
• historical data from comparator trials
– conducted in relevant population of severe cases
– response rates: comparator 60%, placebo 40%
– difference to placebo: 20%
• NI margin chosen for the new study: δ = 8%
• actual NI study (new vs comparator)
  – conducted in a mild and severe population: 70% mild, 30% severe
  – assume (e.g.): 100% response expected in mild cases irrespective of treatment
  – expected (putative) response in this population
    • comparator: 0.3 · 60% + 0.7 · 100% = 88%
    • placebo: 0.3 · 40% + 0.7 · 100% = 82%
    • expected (putative) difference to placebo: 6%
  – an 8% difference (new vs comparator) would mean the new drug is inferior to placebo
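The slide's arithmetic in code, including the resulting inferiority check (all numbers from the slide):

```python
# Effect dilution in a milder population and the putative placebo comparison.
p_severe, p_mild = 0.3, 0.7
comparator = p_severe * 0.60 + p_mild * 1.00   # 0.88 expected comparator response
placebo = p_severe * 0.40 + p_mild * 1.00      # 0.82 expected putative placebo response

print(comparator - placebo)            # 0.06: only a 6% putative placebo difference
new_at_margin = comparator - 0.08      # new drug just meeting the 8% NI margin
print(new_at_margin < placebo)         # True: "non-inferior" drug worse than placebo
```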
Feb. 2016 | Page 45/51
Sensitivity of a NI trial
• potential sources of lack of sensitivity in a NI trial
– wrong measurement time (too early, too late)
– wrong or “diluted” population
• e.g. study conducted in patients with a mild form of the disease
– but difference expected in more severe cases
– lots of missing data + insensitive imputation
• e.g. “missing = failure” may be too insensitive
– insensitive endpoint
  • e.g. dichotomized response less sensitive than continuous outcome
• insensitive measurement (large measurement error)
– rescue medication
• e.g. pain
– primary endpoint VAS pain
– more rescue medication used for new drug
Feb. 2016 | Page 46/51
Equivalence trial
• bioequivalence
– primary endpoint AUC of blood concentrations
(AUC = area under the curve of the concentration time curve)
– show: 0.8 ≤ mean(AUCgeneric)/mean(AUCoriginator) ≤ 1.25
• confirmatory proof given by
– 90% confidence interval ⊂ [0.8, 1.25]
– equivalent to two one-sided 5% tests to show
• mean(AUCgeneric)/mean(AUCoriginator) ≤ 1.25
• mean(AUCgeneric)/mean(AUCoriginator) ≥ 0.8
– increased type-1 error in bioequivalence as compared to other trials !
• 5% one-sided instead of 2.5% one-sided
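A minimal sketch of the decision rule on the log scale (simulated parallel-group data for simplicity; real bioequivalence trials are usually crossover designs):

```python
# 90% CI for the geometric mean ratio of AUC, computed on the log scale.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 24
log_auc_test = rng.normal(4.00, 0.25, size=n)   # log(AUC), test product
log_auc_ref = rng.normal(4.02, 0.25, size=n)    # log(AUC), reference product

diff = log_auc_test.mean() - log_auc_ref.mean()
se = np.sqrt(log_auc_test.var(ddof=1) / n + log_auc_ref.var(ddof=1) / n)
t_crit = stats.t.ppf(0.95, 2 * n - 2)           # 90% CI <=> two one-sided 5% tests
ci = np.exp([diff - t_crit * se, diff + t_crit * se])   # back to the ratio scale
print("90% CI for ratio:", ci, "-> BE:", 0.8 <= ci[0] and ci[1] <= 1.25)
```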
Feb. 2016 | Page 47/51
Equivalence trial
• therapeutic or PD equivalence
– mainly used in biosimilarity
– demonstration of equivalence by
• e.g. using 95% confidence interval ⊂ equivalence range
• equivalent to two one-sided 2.5% tests
– usually symmetric equivalence range
• depending on scale
– e.g. (0.8, 1.25) is symmetric on log-scale (multiplicative scale)
• e.g. biosimilarity:
– If A is biosimilar to B, B should also be biosimilar to A
– lack of sensitivity issues as in NI trials
Feb. 2016 | Page 48/51
Endpoints and effect measures
• endpoint
– variable to be investigated, e.g.
  • pain measurement after x days of treatment
    – possible individual outcomes: 4.2, 6.3, etc.
  • response
    – possible individual outcomes: yes, no
  • time to event (death, stroke, progression or death, etc.)
    – possible individual outcomes: event at 8 months; censored at 10 months
Feb. 2016 | Page 49/51
Endpoints: Example
treatment of neuropathic pain
EMA Draft Guideline (2012):
primary endpoint: reduction in pain measured by
• VAS: visual analogue scale, 0–100%
• or 11-point numerical rating scale (NRS)
• or another scale, e.g. NPS: Neuropathic Pain Scale (10 different questions on pain)
Feb. 2016 | Page 50/51
NPS Scale, Questions 1–5 (scale items shown)
Feb. 2016 | Page 51/51
EMA Draft Guideline on the treatment of pain (document cover shown)
Feb. 2016 | Page 52/51
Endpoints and effect measures
• effect measure
– population parameter that describes a treatment effect, e.g.
• mean difference in VAS score between treatments A and B
• difference in response rates
• hazard ratio in overall survival
– study result estimates the effect measure
• observed mean difference
• difference in observed response rates
• estimated hazard ratio (e.g. using Cox regression)
– note:
• disentangle
– population effect measure to be estimated
– observed effect measure as an estimate of this
Feb. 2016 | Page 53/51
Choosing the right estimand
• estimand = that which is being estimated
  – Latin gerundive aestimandus = to be estimated
  – simply speaking: the precise parameter to be estimated
  – ceterum censeo parametrum esse aestimandum (“furthermore, I am of the opinion that the parameter must be estimated”)
  – however:
    • the parameter may not always be given easily
    • may be a (complex) function of other parameters
• treatment effect estimate may target
  – effect under perfect adherence: “de-jure”
  – effect under real adherence: “de-facto”
  – several options possible
Feb. 2016 | Page 54/51
ICH Concept Paper on Estimands and Sensitivity Analyses (document cover shown)
Feb. 2016 | Page 55/51
Estima*s
• estimation function (estimator)
– statistical procedure that maps the study data to a single value
(that is intended to estimate the parameter of interest)
• estimate
– value obtained in a given study
• estimand
– parameter to be estimated
• or a function of estimated parameters to be estimated
Feb. 2016 | Page 56/51
Example: Event rate in two treatments
• event rates
– A: 60 %
– B: 50 %
• how to measure the treatment difference? – several options:
  – rate difference: 10%
  – rate ratio: 60/50 = 1.2
  – odds ratio: (60/40)/(50/50) = 1.5
  – hazard ratio resulting from a time-to-event analysis
• estimand relates to the effect measure
– but not only to this !
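The slide's numbers in code, showing how three different effect measures arise from the same pair of rates:

```python
# Three effect measures from the same two event rates.
p_a, p_b = 0.60, 0.50
print("rate difference:", p_a - p_b)                          # 0.10
print("rate ratio:", p_a / p_b)                               # 1.2
print("odds ratio:", (p_a / (1 - p_a)) / (p_b / (1 - p_b)))   # 1.5
```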
Feb. 2016 | Page 57/51
Estimand
• estimand = the precise parameter to be estimated
• related to
– endpoint
– effect measure
• mean difference, difference between medians, risk ratio, hazard ratio,
etc.
– population
– time point of measurement, duration of observational period, etc.
– adherence
• effect under perfect adherence:
“de-jure”
• effect under real adherence:
“de-facto”
Feb. 2016 | Page 58/51
De-facto and de-jure estimands
[Figure: outcome trajectories over time for placebo and active treatment, with treatment dropout and subsequently “retrieved” data until the end of the trial; the de-facto estimand is the difference in all randomized patients, the de-jure estimand the difference if all patients had adhered]
Feb. 2016 | Page 59/51
Estimand issues: Examples
• rescue medication
– e.g. pain, diabetes
• efficacy if no subject took rescue med (de-jure)
• efficacy under rescue med (de-facto)
• efficacy and effectiveness under relevant non-adherence
– e.g. depression
• effect if all subjects were adherent
or
• effect under actual adherence
Feb. 2016 | Page 60/51
Estimands under rescue medication
• primary endpoint: Change in VAS pain
• de-facto estimand
– change in pain irrespective of rescue medication use
– all data used
– longitudinal model or ANCOVA to estimate
• de-jure estimand
– change in pain without rescue medication
– only data until start of rescue medication used
– longitudinal model on “clean” data
• severe pain
  – intake of rescue medication in most patients
  – de-jure estimand not evaluable
  – de-facto estimand insensitive
  – alternative endpoints to be considered
    • amount of rescue medication
    • time to first use of rescue medication
Feb. 2016 | Page 61/51
Example: Simulation of a depression trial
with retrieved data
• Longitudinal data (Hamilton Depression Score)
• Some data were collected after treatment discontinuation
• Different drop-out mechanisms
– treatment dropout (TD)
– analysis dropout (AD)
• AD time ≥ TD time
• “retrieved data” from TD to AD
• Data generation
– according to a two-piece linear mixed model
– selection model for TD, exponential time to AD
Leuchs AK, Zinserling J, Schlosser-Weber G, Berres M, Neuhäuser M, Benda N. Estimation of the treatment
effect in the presence of non-compliance and missing data. Statistics in Medicine. 2014; 32:193–208.
Feb. 2016 | Page 62/51
Example of a de-facto estimand
• Joint model (JM) for the outcome Y and the treatment dropout time D, where D has density f(d)
• Y | D following (e.g.) a two-piece linear mixed model assuming different slopes while adhering and after non-adherence
• De-facto estimand given by
  E(Y) = ∫ E(Y | D = d) f(d) dd
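A minimal Monte Carlo sketch of this integral, under an assumed exponential dropout-time distribution and hypothetical on- and off-treatment slopes:

```python
# Monte Carlo evaluation of E(Y) = integral of E(Y|D=d) f(d) dd.
import numpy as np

rng = np.random.default_rng(3)
t_end = 12.0                                    # end of trial (e.g. months)
slope_on, slope_off = -0.8, -0.2                # score change per month

d = np.minimum(rng.exponential(scale=20.0, size=100_000), t_end)  # dropout time
y_end = slope_on * d + slope_off * (t_end - d)  # E(Y | D = d) at end of trial
print("de-facto E(Y) =", y_end.mean())          # averages over the density f(d)
```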
Feb. 2016 | Page 63/51
Example: Simulation of a depression trial
with retrieved data
• 150 patients per group (treatment vs placebo)
• Drop-out mechanisms
– treatment dropout (TD) at random
– analysis dropout (AD) at random
• Treatment dropout rate
– 30% each
– 25% and 35%
• Retrieval rates
– 100%, 70% and 40%
– corresponding to 0%, 30%, 60% AD
Feb. 2016 | Page 64/51
Example: Simulation of a depression trial
with retrieved data - Analysis
Analysis Options
1. Placebo multiple imputation
2. Joint modelling
– according to data generating mechanism
3. mixed effect model for repeated measurements using all data
– usual linear mixed effect model
– ignoring non-compliance
4. mixed effect model for repeated measurements using data until TD
  – usual linear mixed effect model
  – using only data during adherence
Feb. 2016 | Page 65/51
Type-1 error and bias: equal dropouts, de-jure = 0 & de-facto = 0

Rejection rates by strategy and retrieval rate:

de-jure (TD 30%):
| Strategy             | Retrieved 100% | 70%   | 40%   |
|----------------------|----------------|-------|-------|
| 1: pMI               | 0.023          | 0.023 | 0.023 |
| 2: JM varying slopes | 0.031          | 0.032 | 0.035 |
| 3: MM all data       | 0.047          | 0.047 | 0.044 |
| 4: MM only on treat  | 0.052          | 0.052 | 0.052 |

de-facto (TD 30%):
| Strategy             | Retrieved 100% | 70%   | 40%   |
|----------------------|----------------|-------|-------|
| 1: pMI               | 0.023          | 0.023 | 0.023 |
| 2: JM varying slopes | 0.049          | 0.050 | 0.058 |
| 3: MM all data       | 0.047          | 0.047 | 0.044 |
| 4: MM only on treat  | 0.052          | 0.052 | 0.052 |
Feb. 2016 | Page 66/51
Type-1 error and bias: equal dropouts, de-jure = 2 & de-facto = 0

Rejection rates by strategy and retrieval rate:

de-jure (TD 30%):
| Strategy             | Retrieved 100% | 70%  | 40%  |
|----------------------|----------------|------|------|
| 1: pMI               | 0.64           | 0.64 | 0.64 |
| 2: JM varying slopes | 0.37           | 0.35 | 0.32 |
| 3: MM all data       | 0.08           | 0.16 | 0.36 |
| 4: MM only on treat  | 0.82           | 0.82 | 0.82 |

de-facto (TD 30%):
| Strategy             | Retrieved 100% | 70%   | 40%   |
|----------------------|----------------|-------|-------|
| 1: pMI               | 0.636          | 0.636 | 0.636 |
| 2: JM varying slopes | 0.055          | 0.051 | 0.059 |
| 3: MM all data       | 0.078          | 0.159 | 0.358 |
| 4: MM only on treat  | 0.820          | 0.820 | 0.820 |
Feb. 2016 | Page 67/51
Type-1 error and bias: unequal dropouts, de-jure = 2 & de-facto = 0

Rejection rates by strategy and retrieval rate:

de-jure (TD 30%):
| Strategy             | Retrieved 100% | 70%  | 40%  |
|----------------------|----------------|------|------|
| 1: pMI               | 0.63           | 0.63 | 0.63 |
| 2: JM varying slopes | 0.43           | 0.42 | 0.42 |
| 3: MM all data       | 0.07           | 0.21 | 0.43 |
| 4: MM only on treat  | 0.81           | 0.81 | 0.81 |

de-facto (TD 30%):
| Strategy             | Retrieved 100% | 70%   | 40%   |
|----------------------|----------------|-------|-------|
| 1: pMI               | 0.629          | 0.629 | 0.629 |
| 2: JM varying slopes | 0.051          | 0.051 | 0.062 |
| 3: MM all data       | 0.075          | 0.208 | 0.431 |
| 4: MM only on treat  | 0.807          | 0.807 | 0.807 |
Feb. 2016 | Page 68/51
Simulation study: Summary
Conclusions:
• De-jure estimand
– data retrieval not needed
– MMRM using on-treatment data valid under MAR
• De-facto estimand
– (some) data retrieval needed
– joint modelling needed, especially in case of partial retrieval
– MMRM using all data in case of (almost) complete retrieval possible
• further investigations on robustness needed
Feb. 2016 | Page 69/51
Estimands: Summary
• specification of a relevant estimand first
– clarifies study objective
– needed to define relevant estimation and missing data method
– impacts study design
• estimand includes
– assumption on adherence
– distributional parameter
– population
• estimand
– defines the primary analysis
– different estimands may be used in additional analyses
Feb. 2016 | Page 70/51
EMA Guideline on Missing Data in Confirmatory Clinical Trials (document cover shown)
Feb. 2016 | Page 71/51
Missing data in confirmatory trials
• The ITT principle requires handling of missing data
• Drop-out in clinical trials causes missing data
• Ignoring patients with missing data may bias the treatment effect estimate if
  – drop-out is caused by the treatment
  – the treatment effect is different in patients who drop out
• Omitting patients who drop out
  – results in a selection, which may result in bias
• see simplistic ITT examples
• Estimands are related to missing data
Feb. 2016 | Page 72/51
Current discussion on missing data and
estimands in confirmatory studies
• Regulatory documents
– CHMP Guideline on Missing Data in Confirmatory Clinical Trials (2010)
– ICH Draft Concept Paper on Choosing Appropriate Estimands and
Defining Sensitivity Analyses in Clinical Trials (Addendum to E9) (2014)
• Questions
– What to do with missing data?
– Why impute missing data?
– Measure efficacy conditional on perfect adherence or real adherence?
  → estimand
– Which precise parameter is to be estimated (and imputed for)?
  → estimand
Feb. 2016 | Page 73/51
ITT principle and missing data
• Requires inclusion of all patients
– including drop-outs
– including those with missing data
• How can we consider patients who discontinued ?
– (a) change in endpoint definition
• time to event
→ time to event or drop-out
• treatment failure → treatment failure or drop-out
(“drop-out = failure”)
– (b) replace missing data “reasonably”:
• conservative in superiority studies
– excluding overestimation of treatment effect
• as unbiased as possible in non-inferiority studies
Feb. 2016 | Page 74/51
Drop-out scenario 1:
Assume:
• patients drop out equally in both treatments
– with probability p
• no treatment difference in drop-outs
Analysis:
• Complete case analysis
– ignores drop-outs
Result:
?
Feb. 2016 | Page 75/51
Drop-out scenario 1:
Assume:
• patients drop out equally in both treatments
– with probability p
• no treatment difference in drop-outs
Analysis:
• Complete case analysis
– ignores drop-outs
Result:
• treatment difference (in all randomized patients) overestimated, since
  effect in all = (1 − p) · effect in completers + p · effect in drop-outs
               = (1 − p) · effect in completers + 0
               < effect in completers
Feb. 2016 | Page 76/51
Drop-out scenario 2:
Assume:
• drop-out independent of efficacy and treatment
• increasing effect over time
Analysis:
• LOCF: last observation carried forward
– last values observed before drop-out used as final value
Result:
?
Feb. 2016 | Page 77/51
Drop-out scenario 2:
Assume:
• drop-out independent of efficacy and treatment
• increasing effect size (treatment difference) over time
Analysis:
• LOCF: last observation carried forward
– last values observed before drop-out used as final value
Result:
• underestimation of the treatment difference
– superiority study: conservative effect estimate
– non-inferiority study: lack of sensitivity
• potential difference to comparator may be concealed
Feb. 2016 | Page 78/51
Drop-out scenario 3:
Assume:
• progressive disease
– treatment aims at slowing disease course
• no effect active vs placebo (null hypothesis)
• earlier drop-out under active
– e.g. due to side effects
Analysis:
• LOCF: last observation carried forward
Result:
?
Feb. 2016 | Page 79/51
Drop-out scenario 3:
Assume:
• progressive disease
– treatment aims at slowing disease course
• no effect active vs placebo (null hypothesis)
• earlier drop-out under active
– e.g. due to side effects
Analysis:
• LOCF: last observation carried forward
Result:
• early drop-out: better efficacy values
– better results for active treatment
– increased number of (wrongly) significant results
– type-1 error inflation: invalid analysis
Feb. 2016 | Page 80/51
Models for missing data:
Missing completely at random (MCAR):
• missingness independent of efficacy
– e.g. accident, move, data loss etc.
Missing at random (MAR):
• missingness depends on observed values only
– e.g. due to lack of measured efficacy
– initiation of rescue medication according to well-defined rules
  • e.g. HbA1c > threshold in diabetes trials
Missing not at random (MNAR):
• missingness depends on unobserved (missing) values
– e.g. related to an early sensitive surrogate marker
Feb. 2016 | Page 81/51
Models for missing data:
Missing completely at random (MCAR):
• missingness independent of all values
– complete case analysis valid
Missing at random (MAR):
• missingness independent of unobserved values
– longitudinal analysis of repeated measurements valid for “de-jure effect”
• implicit imputation using a prediction based on observed data
Missing not at random (MNAR):
• missingness depends on observed and unobserved values
– valid analysis depends on unverifiable assumptions
Feb. 2016 | Page 82/51
Methods for analysing trials with missing data:
• complete case analysis
– only patients with complete data sets are considered
• LOCF
– last observation carried forward
• BOCF
– baseline observations carried forward
• MMRM (Mixed Model Repeated Measurement)
– model based longitudinal analysis
• multiple imputation
– repeated simulation of missing data
• non-parametric rank based analysis
• worst-case-analysis
Feb. 2016 | Page 83/51
Complete case analysis
Approach
• only patients with complete data sets are considered
– similar to per-protocol analysis
Properties
• does not comply with the ITT principle
• only valid under MCAR
Application
• only as an additional sensitivity analysis
Feb. 2016 | Page 84/51
LOCF: Last Observation Carried Forward
Approach
• last observed measurement of discontinuing patients used as final
observation
Properties
• reduces the estimated effect size under MCAR if the effect size increases over time
• may be biased under all models
• invalid and often anti-conservative in progressive diseases
Application
• may be used as primary analysis only in specific settings
– low drop-out rate
– increasing effect size
– superiority
• to be used as an additional sensitivity analysis
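A minimal sketch of LOCF (and, for contrast, BOCF as discussed on a later slide) on a small wide-format data set; the values and visit names are hypothetical, with NaN marking visits after drop-out:

```python
# LOCF and BOCF single imputation on a wide visit-by-visit layout.
import numpy as np
import pandas as pd

df = pd.DataFrame({"baseline": [10.0, 12.0, 11.0],
                   "week4": [12.0, 13.0, np.nan],
                   "week8": [14.0, np.nan, np.nan]})

locf = df.ffill(axis=1)                              # carry last observation forward
bocf = df.copy()
for col in ["week4", "week8"]:
    bocf[col] = bocf[col].fillna(bocf["baseline"])   # carry baseline forward
print(locf, bocf, sep="\n\n")
```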
Feb. 2016 | Page 85/51
Model-based longitudinal analysis (MMRM)
(mixed model repeated measures)
Approach
• model fit using all observed data to predict unobserved data
• use of model assumptions
– estimation according to the maximum-likelihood principle
Properties
• implicit prediction
• valid under MCAR and MAR
– if selected model is correct
• estimates treatment effect assuming perfect adherence
Application
• primary or sensitivity analysis
– depending on the requested estimand
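A minimal MMRM-style sketch with statsmodels, fitted to the observed data only; for simplicity this uses a random-intercept model rather than a full MMRM with unstructured covariance across visits, and the data are simulated with visits deleted completely at random:

```python
# Random-intercept linear mixed model on incomplete longitudinal data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
subj_icpt = rng.normal(scale=1.0, size=100)     # subject random intercepts
rows = [dict(subject=i, treat=i % 2, visit=v,
             y=0.5 * (i % 2) * v + subj_icpt[i] + rng.normal(scale=0.5))
        for i in range(100) for v in (1, 2, 3)]
df = pd.DataFrame(rows).sample(frac=0.85, random_state=1)  # delete visits at random

fit = smf.mixedlm("y ~ treat * visit", df, groups=df["subject"]).fit()
print(fit.params["treat:visit"])                # true treatment-by-time slope: 0.5
```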
Feb. 2016 | Page 86/51
Multiple imputation (1)
Approach
• repeated simulation of missing data to be imputed
– averaging of the estimates resulting from the different simulations
– uses the variance between simulations
• observed data used to derive the distribution from which
simulations are generated
– many options to use available data, e.g.
• identification of discontinuing patients with completing patients having
similar characteristics
• identification with placebo patients (placebo MI)
• may target different estimands
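A minimal sketch of the mechanics, with a deliberately simple normal imputation model estimated from the observed data and pooling by Rubin's rules (one arm, MCAR missingness, illustration only):

```python
# Multiple imputation with a simple normal imputation model and Rubin's rules.
import numpy as np

rng = np.random.default_rng(11)
y = rng.normal(1.0, 2.0, size=200)
y[rng.random(200) < 0.3] = np.nan               # 30% missing
obs = y[~np.isnan(y)]

m, n = 20, len(y)
est, var = [], []
for _ in range(m):
    y_imp = y.copy()
    y_imp[np.isnan(y)] = rng.normal(obs.mean(), obs.std(ddof=1),
                                    size=int(np.isnan(y).sum()))
    est.append(y_imp.mean())                    # estimate from this imputed set
    var.append(y_imp.var(ddof=1) / n)           # its within-imputation variance

q = np.mean(est)                                # pooled estimate
total_var = np.mean(var) + (1 + 1 / m) * np.var(est, ddof=1)   # Rubin's rules
print(q, np.sqrt(total_var))
```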
Feb. 2016 | Page 87/51
Multiple imputation (2)
Properties
• avoids underestimation of the variance
• may yield valid results under MCAR and MAR
• validity under MNAR depends on the link to observed data
– link unverifiable
Application
• should (always) be used as sensitivity analysis
• different identification rules with observed values should be explored
  – e.g. use of different reasons for discontinuation possible
Feb. 2016 | Page 88/51
BOCF: Baseline Observation Carried Forward
Approach
• replaces missing data by baseline values
• assumes absence of a treatment effect in drop-outs
Properties
• avoids the overestimation possible under LOCF
• may be highly insensitive
– conservative for superiority
Application
• may be primary in settings with high uncertainty
• not to be used for non-inferiority or equivalence
– lack of assay sensitivity
• usually a sensitivity analysis
Feb. 2016 | Page 89/51
Rank based approach (Gould) (1)
Approach
• ranking of completing and discontinuing patients
– discontinuing patients ranked according to drop-out times
– completing patients ranked according to observed data
– higher ranks for completing patients
• non-parametric analysis of obtained ranks
Properties
• valid analysis of a new endpoint
• avoids issues with unverifiable assumptions of missing data
mechanism
• interpretation difficult
– clinical relevance difficult to assess
Application
• could be a sensitivity analysis / targets a specific estimand
Feb. 2016 | Page 90/51
Rank based approach (Gould) (2)
Example: Compare treatments A and B
• completing patients
– measurements at the end of the study
• A: 12, 23, 34
• B: 11, 44
• discontinuing patients
– time to drop-out
• A: 12 days, 34 days
• B: 17 days, 19 days, 32 days
• ranks used for the analysis
– A: 1, 5, 7, 8, 9
– B: 2, 3, 4, 6, 10
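The slide's ranking reproduced in code (a sketch of the scoring step only; the subsequent non-parametric test is omitted):

```python
# Gould-type scores: drop-outs ranked first by drop-out time, completers
# above them by outcome value.
dropouts = [("A", 12), ("A", 34), ("B", 17), ("B", 19), ("B", 32)]    # days
completers = [("A", 12), ("A", 23), ("A", 34), ("B", 11), ("B", 44)]  # values

ordered = sorted(dropouts, key=lambda x: x[1]) + sorted(completers, key=lambda x: x[1])
ranks = {"A": [], "B": []}
for rank, (arm, _) in enumerate(ordered, start=1):
    ranks[arm].append(rank)
print(ranks)   # {'A': [1, 5, 7, 8, 9], 'B': [2, 3, 4, 6, 10]}
```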
Feb. 2016 | Page 91/51
Rank based approach (Gould) (3)
Rank based analysis
• evaluates the probability to obtain a higher rank in A than in B
– probability that randomly selected patients treated with A and B behave as
follows
• patient treated with A obtains better values than patient treated with B
or
• patient treated with A discontinues later than patient treated with B
or
• patient treated with A completes and patient treated with B discontinues
• no individual interpretation possible
Feb. 2016 | Page 92/51
Worst-case analysis (superiority vs placebo)
Approach
• replaces missing data by
– best values for placebo
– worst values for active treatment
• may be used for categorical or binary data
– best: (complete) response – worst: failure
Properties
• very conservative unless drop-out rate is very low
• worst-case imputation applied identically regardless of treatment is not a worst-case analysis
  – “drop-out = failure” for all patients is not worst case
Application
• sensitivity analysis only
• usually unrealistic with medium to high drop-out rates
Feb. 2016 | Page 93/51
Missing data: Summary
• Drop-out in clinical trials causes missing data
• Ignoring patients with missing data may lead to bias
• Many methods for missing data are available
• Validity of the method depends on the missing data model
  – the missingness mechanism is usually not verifiable
  – use of drop-out reasons may help
  – several sensitivity analyses needed
• Missing data imputation should target the estimand of interest
  – de-facto (real adherence) or de-jure (ideal adherence)
  – de-facto estimands require follow-up of patients who discontinued treatment
Feb. 2016 | Page 94/51
Summary: Important principles in drug approval
• choosing and pre-specifying the relevant endpoint
  – clinical relevance
  – proper and unambiguous definition
• control of false positive decisions
  – pre-specification of the analysis and of study success
  – proper statistical test
  – adjust for multiple options of study success
• comparison to
  – placebo – superiority: conservative estimation of the treatment effect
  – active control – non-inferiority: choose the right margin + be sensitive
• unbiasedness and causality
  – follow the ITT principle: analyse as randomized
  – appropriate treatment of missing data
• external validity
  – ensure that the effect estimate can be extrapolated to clinical reality
Feb. 2016 | Page 95/51