Course: Clinical Trials
Regulatory issues in confirmatory clinical trials
Norbert Benda
Feb. 2016

Contents
1. Basic principles in clinical development and drug approval
2. Statistical issues in drug approval
   • ITT and missing data
   • type-1 error control and multiplicity
3. Superiority, non-inferiority and equivalence
   • basic principles and differences
   • derivation of a non-inferiority margin
   • issues in non-inferiority trials
4. Endpoints, effect measures and estimands
   • definitions
   • issues and examples
5. Missing data
   • issues and solutions

Clinical development of a new medicine
• preclinical development
• Phase I
  – few healthy volunteers
  – clinical pharmacology, toxicity, safety
• Phase II
  – medium number of patients
  – proof of concept, dose response
• Phase III
  – large trials in patients
  – randomized controlled trials (RCT)
  – demonstration of efficacy and safety
  – followed by the MAA: marketing authorization application

Exploration and confirmation
• exploration (phases I and II)
  – estimation of a maximum tolerated dose (MTD)
  – proof of concept using pharmacodynamic parameters
  – exploration and selection of indication and population
  – selection of a useful dose (range)
• confirmation (phase III)
  – confirmatory proof of efficacy compared to placebo or an active comparator

Drug approval bodies
• EU: European Medicines Agency (EMA) – European Commission
  – scientific committee: Committee for Medicinal Products for Human Use (CHMP)
• Drug approval bodies worldwide:
  – Canada: Health Canada
  – Russia: FSKN
  – EU: EMA
  – Japan: PMDA
  – China: China FDA
  – USA: FDA
  – Brazil: ANVISA
  – Switzerland: Swissmedic
  – Australia: TGA

Drug approval requirements
• new chemical entity:
  – confirmatory proof of efficacy
    • superiority to placebo or non-inferiority compared to a reference product
  – sufficient safety database
  – positive benefit-risk assessment (benefit outweighs risk)
• generic drug (same active ingredients as a reference product):
  – confirmatory proof of bioequivalence

Clinical trials
• "A clinical trial is a planned experiment on human beings which is designed to evaluate the effectiveness of one or more forms of treatment"
  – Douglas Altman (1991): Practical Statistics for Medical Research
• Evaluation of efficacy / effectiveness usually requires
  – comparison / control: vs placebo or active control
  – internal validity: ensuring a causal relation to treatment
  – external validity: extrapolation to other settings

Exploration and confirmation
• Exploration: early phases
  – no control of false positive decisions
  – generation of hypotheses to be tested later
  – insufficient for licensing
• Confirmation: late phases
  – pivotal Phase III study
  – control of false positive decisions:
    • probability of a positive study in case of an ineffective treatment is limited (usually 2.5%)
  – unbiased estimate of the treatment effect
    • at least not biased in favour of the new treatment
  – usually at least two successful pivotal trials required for licensing
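A quick numerical illustration of the last two bullets (a minimal sketch; treating the two pivotal trials as independent is an assumption made here for illustration): with one-sided type-1 error control at 2.5% per trial, an ineffective drug passes two independent pivotal trials with probability 0.025² ≈ 1/1600.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.025         # one-sided significance level per pivotal trial
n_programs = 200_000  # simulated development programs of an ineffective drug

# Under the null hypothesis, each trial's one-sided p-value is uniform on [0, 1].
p1 = rng.uniform(size=n_programs)
p2 = rng.uniform(size=n_programs)

both_positive = np.mean((p1 < alpha) & (p2 < alpha))
print(f"analytic : {alpha**2:.6f} (about 1/{round(1 / alpha**2)})")
print(f"simulated: {both_positive:.6f}")
```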
Key elements of a confirmatory clinical trial
• Control
  – new treatment to be better than placebo (superiority)
  – new treatment equal (similar) to a relevant comparator
    • indirect proof of superiority to placebo (non-inferiority)
• Randomization
  – proper randomization ensures a causal relationship between effect and treatment
  – excludes confounding with other factors
• ITT principle
  – evaluation of all study participants according to randomization
• Blinding
  – equal care, equal follow-up, equal assessments, equal measurements
• Type-1 error control
  – low probability of falsely declaring efficacy

Selected statistical issues in drug approval
• Type-1 error control
  – proper control of false positive decisions
• Multiplicity, sequential and adaptive designs
  – study-wise control of any false positive decision in all confirmatory comparisons
  – considering design adaptations and interim decisions
  – including decisions made for subpopulations
• Surrogate endpoints
  – validating and using endpoints intended to replace others
  – e.g. tumour response or progression-free survival in oncology instead of overall survival
• Proper survival analyses
  – e.g. considering "informative censoring"
• Missing data
  – considering patients who discontinue treatment
• Non-inferiority studies
  – derivation of a non-inferiority margin
  – ensuring sensitivity of a non-inferiority comparison

Randomization 1
• Randomization to avoid confounding and to ensure causality
• Example, no randomization:
  – patients with more advanced disease receive treatment A
  – patients with less advanced disease receive treatment B
  – efficacy is related to disease status
  – apparent superiority of A vs B is due to disease status / population, not due to treatment
• similar with other prognostic factors such as age, gender, etc., including unknown prognostic factors

Randomization 2
• Randomization to avoid confounding and ensure causality
  – no systematic difference w.r.t. prognostic factors
  – remaining chance differences are covered by the statistical test procedure
    • included in the type-1 error control: the probability of obtaining positive efficacy because of group differences is controlled
    • small differences for large sample sizes
• Randomization facilitates interpretation for individual patients:
  – mean(effect in A – effect in B) = mean(treatment group A) – mean(treatment group B), because each patient has the same chance to receive a specific treatment

Randomization and Causality: Example
• Population of interest P with two subpopulations P1 and P2
• Two treatments A and B; endpoint: response (binary)
• No randomization:
  – all patients in P1 receive A: 50% response
  – all patients in P2 receive B: 30% response
  – A better than B?
• Imagine instead:
  – all patients in P1 received B: 50% response
  – all patients in P2 received A: 30% response
• the better response is due to the population only

Randomization and Causality: Example
• Randomization:
  – each patient in P1 and P2 receives A with (e.g.) 50% chance (1:1 randomization)
  – differences in the proportions of P1 patients in A and B are due to chance only
    • ignorable for large sample sizes
    • the probability that a response difference arises from chance population differences is controlled (< α) if no real difference is present
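A minimal simulation sketch of the example above, using only numbers from the slides (response 50% in P1 and 30% in P2, no true treatment effect); sample size and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000  # patients per subpopulation (large, so chance imbalance is negligible)

# Response depends on the subpopulation only, never on treatment (null case).
resp_p1 = rng.random(n) < 0.50  # P1: 50% responders
resp_p2 = rng.random(n) < 0.30  # P2: 30% responders

# No randomization: all of P1 receives A, all of P2 receives B -> confounding.
print("no randomization: A", round(resp_p1.mean(), 3), " B", round(resp_p2.mean(), 3))

# 1:1 randomization within each subpopulation.
to_a_p1 = rng.random(n) < 0.5
to_a_p2 = rng.random(n) < 0.5
rate_a = np.concatenate([resp_p1[to_a_p1], resp_p2[to_a_p2]]).mean()
rate_b = np.concatenate([resp_p1[~to_a_p1], resp_p2[~to_a_p2]]).mean()
print("randomized 1:1:   A", round(rate_a, 3), " B", round(rate_b, 3))
```

Without randomization the output shows an apparent difference (0.50 vs 0.30) although treatment does nothing; with randomization both arms show about 0.40.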
ITT principle
• ITT = "intention to treat"
  – maintains randomization
  – measures the treatment effect caused by the treatment only
  – avoids treatment-related selection
• evaluation according to randomization
  – irrespective of study discontinuation or treatment switch
• ensures that group differences are due to treatment, not due to populations
  – remaining population differences occur by chance
  – the type-1 error accounts for residual population differences

ITT principle: Simplistic example (1)
Compare treatments A and B, randomized 1:1
• two (latent) subpopulations S1 and S2, each 50% of the patients
• Case 1: No drop-out
  – A: subgroup S1 responds, subgroup S2 does not respond
  – B: subgroup S1 responds, subgroup S2 does not respond
• Response rates: A: 50% vs B: 50%
• No treatment difference

ITT principle: Simplistic example (1)
• Case 2: All S2 patients drop out under A
  – A: subgroup S1: response; subgroup S2: not observed
  – B: subgroup S1: response; subgroup S2: no response
• Response rates in completers: A: 100% vs B: 50%
• Large treatment difference in completers

ITT principle: Simplistic example (1)
Case 2, continued:
• drop-out of the S2 patients may be caused by treatment A
• all S2 patients may have experienced treatment failure under A
• S2 patients may not be identifiable as a specific subgroup
• Two options to follow the ITT principle
  – consider drop-out as failure
  – replace the missing data

ITT principle: Simplistic example (2)
• Study comparing treatments A and B
• Primary or secondary endpoint: quality of life (QoL) score
• Some patients die before the end of treatment
• How to measure QoL in all patients?
  – QoL in survivors = QoL in a selected population – ITT principle not followed
  – QoL in survivors may be biased

ITT principle: Simplistic example (2)
(latent) subpopulations P1, P2, P3 with prevalence 1/3 each; compare treatments A and B:
  – P1 (1/3): A – all die, QoL not given; B – all die, QoL not given (similar survival times as in A) → A equal to B
  – P2 (1/3): A – all survive, mean QoL = 30; B – all die, QoL not given → A better than B
  – P3 (1/3): A – all survive, mean QoL = 60; B – all survive, mean QoL = 50 → A better than B
• mean QoL in survivors: A: (30 + 60)/2 = 45, B: 50
• A is better than or equal to B in all subgroups, but B looks better than A w.r.t. mean QoL in survivors

ITT principle: Simplistic example (2)
Quality of life (QoL) analysis in studies with high mortality – possible analyses discussed:
• set the QoL score of patients who die to 0
  – followed by a non-parametric evaluation (a parametric evaluation would depend on the arbitrary score 0)
  – non-parametric Gould-type analysis (see later)
• adjustment for all relevant prognostic factors (in survivors)
  – still does not follow the ITT principle
• mimic the effect in those who would survive under both A and B using causal inference with instrumental variables
  – difficult; relies on full identification of the instrumental variables
• replace missing data by missing data imputation
  – may not target the relevant question: "your QoL would have been X had you not died"
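Simplistic example (1) can be checked in code. A minimal, deterministic sketch using only the slides' assumptions (S1 always responds, S2 never does, and in case 2 all S2 patients drop out under A):

```python
import numpy as np

n = 10_000  # patients per arm; half belong to latent subgroup S1, half to S2

for arm in ("A", "B"):
    # S1 patients all respond, S2 patients do not respond (as on the slides).
    response = np.concatenate([np.ones(n // 2, bool), np.zeros(n // 2, bool)])
    observed = np.ones(n, bool)
    if arm == "A":
        observed[n // 2:] = False  # Case 2: all S2 patients drop out under A

    completers = response[observed].mean()  # completers-only analysis
    itt = (response & observed).mean()      # ITT: drop-out counted as failure
    print(f"{arm}: completers {completers:.2f}, ITT (drop-out = failure) {itt:.2f}")
```

The completers analysis shows the spurious 100% vs 50% difference, while the ITT analysis with drop-out = failure restores 50% vs 50%.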
Type-1 error control: Example
• Clinical trial comparing treatments A and B
• primary endpoint: walking distance in 6 minutes, difference to baseline (6-minute walk test, 6MWT)
• statistical test: two-sample t-test of the null hypothesis H0: mean(A) = mean(B)
• obtain the p-value
  – the probability of obtaining the observed difference (or a greater one) if in fact the null hypothesis is true

Type-1 error control: Example
• Clinical trial comparing treatments A and B
• statistical test: two-sample t-test
• result: p (two-sided) = 0.0027 < 0.05 (5%)
• small probability of obtaining such a result by chance
• difference declared to be significant

Type-1 error control: Example
• Clinical trial comparing treatments A and B
• two comparisons for the 6MWT
  – after 3 months: p-value = 0.0027
  – after 6 months: p-value = 0.0918
• Question: study successful?

Type-1 error control: Example
• Multiplicity and pre-specification
  – post-hoc definition of the endpoint of interest (primary endpoint: 6MWT after 3 months or 6MWT after 6 months)
    • increases the probability of a false significance
    • is invalid
  – pre-specification needed

Multiplicity
• Multiple ways to win
  – multiple chances to obtain a significant result due to chance
• Example: study success defined by a significant difference in either of the primary endpoints, e.g. in an oncology trial
  – progression-free survival (PFS)
  – overall survival (OS)
• No adjustment means:
  – probability of a significant difference in PFS or OS > α if there is no real difference in PFS or OS
  – increased chance of declaring an ineffective treatment effective

Multiplicity
Example: study success defined by a significant difference in the primary endpoints PFS and OS.
Different options to keep the type-1 error:
1. PFS and OS co-primary: both must be significant
2. Hierarchical testing: test PFS first, test OS only if PFS is significant (or vice versa)
3. Adjust α: test PFS and OS with 0.025 each (instead of 0.05) (Bonferroni), or use a different split (e.g. 0.01 for PFS, 0.04 for OS)
4. Adjust with more complex methods: "Bonferroni–Holm", "Hochberg", "Dunnett", etc.
Lecture on multiple testing (A Zapf), May 19th

Multiplicity adjustment: Co-primary endpoints
Example: study success defined by a significant difference in the primary endpoints PFS and OS.
• PFS and OS co-primary
  – to be pre-specified in the protocol
  – both must be significant
  – no valid confirmatory conclusion if only one endpoint is significant
    • e.g. PFS: p = 0.0000001, OS: p = 0.073 – "sorry, you lost" – no way

EMA Points to consider on Multiplicity

Sources of multiplicity
• multiple endpoints
• multiple interim looks
• multiple group comparisons (e.g. dose groups)
• multiple (sub-)populations
• multiple analysis methods (tests)
• all may be valid, but post-hoc selection is not
Course on multiplicity (date, presenter)
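To quantify the "multiple ways to win" inflation, a minimal simulation sketch. Assuming two independent null endpoints is a simplification for illustration (PFS and OS would be positively correlated, which reduces but does not remove the inflation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, sims, alpha = 100, 10_000, 0.05
wins_unadjusted = wins_bonferroni = 0

for _ in range(sims):
    # Two null endpoints: no true difference between the arms in either one.
    p_pfs = stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
    p_os = stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
    wins_unadjusted += (p_pfs < alpha) or (p_os < alpha)          # "win if either"
    wins_bonferroni += (p_pfs < alpha / 2) or (p_os < alpha / 2)  # Bonferroni split

print("unadjusted type-1 error:", wins_unadjusted / sims)  # about 0.0975 > 0.05
print("Bonferroni type-1 error:", wins_bonferroni / sims)  # about 0.05
```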
Confirmatory Phase III studies: Summary of type-1 error control
• Control the probability of a false positive decision
  – type-1 error control of the statistical test
• Type-1 error control refers to those comparisons that define study success
  – usually defined by the primary analyses of the primary endpoints
  – a precise definition of study success is needed
  – study success may be defined by several comparisons joined by "and" or "or"
  – confirmatory proof for the primary comparisons – descriptive use of the others
  – multiple comparisons require an appropriate multiplicity procedure
• Pre-specification needed
  – study protocol / statistical analysis plan (SAP)

Superiority, non-inferiority, equivalence
• superiority study – comparison to placebo
  – new drug to be better than placebo
• non-inferiority study – comparison to an active comparator
  – suggests: new drug at least as good as the comparator
  – proves: new drug not considerably inferior to the comparator
• equivalence study
  – bioequivalence in generic applications (= copies of "chemical drugs")
  – therapeutic equivalence in biosimilars (= copies of "biological drugs")
  – difference between the drugs within a given range

Superiority, non-inferiority, equivalence
• compare a parameter ϑ between two treatments
  – e.g. ϑ = mean change from baseline in HbA1c
  – ϑA, ϑB = mean change for treatment A (new) and treatment B (comparator or placebo)
• superiority comparison
  – show: ϑA > ϑB
  – reject the null hypothesis H0: ϑA ≤ ϑB
• non-inferiority comparison
  – show: ϑA > ϑB − δ
  – reject the null hypothesis H0: ϑA ≤ ϑB − δ
  – δ = non-inferiority margin
• equivalence comparison
  – show: −δ < ϑA − ϑB < δ
  – reject both one-sided null hypotheses H01: ϑA − ϑB ≤ −δ and H02: ϑA − ϑB ≥ δ

[Figure: null hypothesis regions for superiority, non-inferiority and equivalence on the ϑA − ϑB axis, marked relative to −δ, 0 and δ]

Confirmatory superiority trial
• show: new drug better than placebo
  – statistical test of the null hypothesis H0: ϑA ≤ ϑB
  – significance = rejection of the null hypothesis – conclude superiority
• validity
  – type-1 error control: Prob(conclude superiority | no superiority) ≤ 2.5%
    • false conclusion of superiority ≤ 2.5%
  – effect estimate of ϑA − ϑB relative to placebo
    • unbiased (correct on average), or
    • conservative (no overestimation on average)

Confirmatory superiority trial
• ensure
  – the probability of a false positive decision on superiority is small (≤ 2.5%)
    • type-1 error control of the statistical test
    • control for multiple comparisons
    • conservativeness
  – no overestimation of the effect
    • correct statistical estimation procedure
    • proper missing data imputation
    • proper randomisation
    • etc.

Confirmatory non-inferiority trial
• show: new drug better than comparator − δ
  – statistical test of the null hypothesis H0: ϑA ≤ ϑB − δ
  – conclusion of non-inferiority (NI) = rejection of the null hypothesis
• validity
  – type-1 error control: Prob(conclude NI | ϑA ≤ ϑB − δ) ≤ 2.5%
    • false conclusion of NI ≤ 2.5%
  – effect estimate of ϑA − ϑB relative to the active comparator
    • unbiased (correct on average)
    • no underestimation of a possibly negative effect

EMA Points to consider on switching between superiority and non-inferiority

EMA Guideline on the choice of the non-inferiority margin
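The non-inferiority comparison defined above (reject H0: ϑA ≤ ϑB − δ) can be run as an ordinary one-sided two-sample test after shifting the comparator by δ. A minimal sketch; the data, the margin and the normality assumption are all invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
delta = 2.0                          # pre-specified NI margin (invented)
a = rng.normal(10.0, 4.0, size=200)  # new treatment (higher = better)
b = rng.normal(10.5, 4.0, size=200)  # active comparator

# Test H0: mean(A) <= mean(B) - delta vs H1: mean(A) > mean(B) - delta.
# Shifting the comparator down by delta turns this into a standard one-sided t-test.
res = stats.ttest_ind(a, b - delta, alternative="greater")
print(f"one-sided p = {res.pvalue:.4f} ->",
      "non-inferiority shown" if res.pvalue < 0.025 else "non-inferiority not shown")
```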
Issues in non-inferiority trials
• non-inferiority margin δ
  – clinical justification: defined through clinical relevance
  – "statistical" justification: defined through the comparator's benefit compared to placebo
• sensitivity
  – NI studies must be designed to be able to detect differences
• constancy assumption
  – the assumed comparator effect is maintained in the actual study
  – the relevant population is tested
  – the comparator effect is maintained over time
    • control e.g. "biocreep" in antimicrobials

Non-inferiority margin
• clinical justification
  – clinical relevance, involving the anticipated benefit-risk
• "statistical" justification (a numerical sketch is given at the end of this non-inferiority section)
  – related to a putative placebo comparison: indirect comparison to placebo using historical data
  – based on the estimated difference C − P of comparator (C) to placebo (P)
  – use historical placebo-controlled studies of the comparator
    • evaluate C − P in a meta-analysis
    • quantify the uncertainty in the historical data by a meta-analysis-based 95% confidence interval of C − P
    • define the NI margin relative to the lower limit of the confidence interval, e.g. as a given fraction of it

Non-inferiority margin: Statistical justification
[Figure: effect estimates and 95% confidence intervals of comparator vs placebo from the historical studies and from their meta-analysis, plotted on the C − P axis relative to 0 and δ]
• e.g. δ = ½ of the lower limit of the 95% confidence interval of C − P

Sensitivity of a NI trial
• lack of sensitivity in a superiority trial
  – sponsor's risk: may lead to an unsuccessful trial
• lack of sensitivity in a non-inferiority trial (lack of assay sensitivity)
  – relevant for approval: risk of an overlooked inferiority
  – assume a "true" relevant effect ϑA < ϑB − δ
  – insensitive new study, e.g. wrong measurement time in the treatment of pain
    • estimated effect difference ≈ 0
    • the study would be a (wrong) success

Insensitive non-inferiority trial
[Figure: true effect of the new treatment vs comparator on the ϑA − ϑB axis; a reduced effect due to "sloppy" measurement, a wrong time point, an insensitive analysis, etc. shifts the estimate towards 0, out of the H0 region and into the non-inferiority region relative to −δ]

Sensitivity of a NI trial: Toy example
• historical data from comparator trials
  – conducted in the relevant population of severe cases
  – response rates: comparator 60%, placebo 40%; difference to placebo: 20%
• NI margin chosen for the new study: δ = 8%
• actual NI study (new vs comparator)
  – conducted in a mild and severe population: 70% mild, 30% severe
  – assume (e.g.): 100% response expected in mild cases irrespective of treatment
  – expected (putative) response in this population
    • comparator: 0.3 ∙ 60% + 0.7 ∙ 100% = 88%
    • placebo: 0.3 ∙ 40% + 0.7 ∙ 100% = 82%
  – expected (putative) difference to placebo: 6%
  – a difference of 8% (new vs comparator) would thus mean that the new drug is inferior to placebo (88% − 8% = 80% < 82%)

Sensitivity of a NI trial
• potential sources of a lack of sensitivity in a NI trial
  – wrong measurement time (too early, too late)
  – wrong or "diluted" population
    • e.g. study conducted in patients with a mild form of the disease, while the difference is expected in more severe cases
  – lots of missing data + insensitive imputation
    • e.g. "missing = failure" may be too insensitive
  – insensitive endpoint
    • e.g. a dichotomized response is less sensitive than the continuous outcome
    • insensitive measurement (large measurement error)
  – rescue medication
    • e.g. pain: primary endpoint VAS pain, more rescue medication used for the new drug
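The numerical sketch of the "statistical" margin justification announced above: a fixed-effect (inverse-variance) meta-analysis of hypothetical historical comparator-vs-placebo estimates, with δ set to half the lower 95% confidence limit of C − P as on the slide. All study values are invented:

```python
import numpy as np

# Invented historical studies: estimated difference C - P and its standard error.
est = np.array([0.22, 0.18, 0.25, 0.15])
se = np.array([0.05, 0.06, 0.07, 0.05])

# Fixed-effect (inverse-variance) meta-analysis of C - P.
w = 1.0 / se**2
pooled = np.sum(w * est) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
lower = pooled - 1.96 * pooled_se  # lower limit of the 95% CI of C - P

delta = 0.5 * lower  # e.g. margin = half of the lower confidence limit
print(f"pooled C - P = {pooled:.3f}, 95% CI lower limit = {lower:.3f}")
print(f"non-inferiority margin delta = {delta:.3f}")
```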
Equivalence trial
• bioequivalence
  – primary endpoint: AUC (area under the plasma concentration–time curve)
  – show: 0.8 ≤ mean(AUC_generic)/mean(AUC_originator) ≤ 1.25
• confirmatory proof given by
  – 90% confidence interval ⊂ [0.8, 1.25]
  – equivalent to two one-sided 5% tests to show
    • mean(AUC_generic)/mean(AUC_originator) ≤ 1.25 and
    • mean(AUC_generic)/mean(AUC_originator) ≥ 0.8
  – increased type-1 error in bioequivalence compared to other trials!
    • 5% one-sided instead of 2.5% one-sided

Equivalence trial
• therapeutic or PD (pharmacodynamic) equivalence
  – mainly used in biosimilarity
  – demonstration of equivalence by
    • e.g. a 95% confidence interval ⊂ equivalence range
    • equivalent to two one-sided 2.5% tests
  – usually a symmetric equivalence range, depending on the scale
    • e.g. (0.8, 1.25) is symmetric on the log scale (multiplicative scale)
    • e.g. biosimilarity: if A is biosimilar to B, B should also be biosimilar to A
  – lack-of-sensitivity issues as in NI trials

Endpoints and effect measures
• endpoint: the variable to be investigated, e.g.
  – pain measurement after x days of treatment (possible individual outcomes: 4.2, 6.3, etc.)
  – response (possible individual outcomes: yes, no)
  – time to event (death, stroke, progression or death, etc.) (possible individual outcomes: event at 8 months, censored at 10 months)

Endpoints: Example treatment of neuropathic pain
EMA Draft Guideline (2012)
• primary endpoint: reduction in pain, measured by
  – VAS: visual analogue scale, 0–100%
  – or an 11-point scale (NRS: Numerical Rating Scale)
  – or another scale, e.g. NPS: Neuropathic Pain Scale (10 different questions on pain)

[Figure: NPS scale, questions 1–5]

EMA Draft Guideline on treatment of pain

Endpoints and effect measures
• effect measure: a population parameter that describes a treatment effect, e.g.
  – mean difference in VAS score between treatments A and B
  – difference in response rates
  – hazard ratio for overall survival
• the study result estimates the effect measure
  – observed mean difference
  – difference in observed response rates
  – estimated hazard ratio (e.g. using Cox regression)
• note: disentangle
  – the population effect measure to be estimated
  – the observed effect measure as an estimate of it

Choosing the right estimand
• estimand = that which is being estimated
  – Latin gerundive aestimandus = to be estimated
  – simply speaking: the precise parameter to be estimated
  – ceterum censeo parametrum esse aestimandum (roughly: "furthermore, I hold that the parameter must be estimated")
  – however: the parameter may not always be given easily; it may be a (complex) function of other parameters
• the treatment effect estimate may target
  – the effect under perfect adherence: "de-jure"
  – the effect under real adherence: "de-facto"
• several options are possible

ICH Concept Paper on Estimands and Sensitivity Analyses

Estima*s
• estimation function (estimator)
  – the statistical procedure that maps the study data to a single value (intended to estimate the parameter of interest)
• estimate
  – the value obtained in a given study
• estimand
  – the parameter to be estimated (or a function of other parameters to be estimated)

Example: Event rate in two treatments
• event rates
  – A: 60%
  – B: 50%
• how to measure the treatment difference?
• several options:
  – rate difference: 10%
  – rate ratio: 60/50 = 1.2
  – odds ratio: (60/40)/(50/50) = 1.5
  – hazard ratio resulting from a time-to-event analysis
• the estimand relates to the effect measure – but not only to this!

Estimand
• estimand = the precise parameter to be estimated
• related to
  – the endpoint
  – the effect measure: mean difference, difference between medians, risk ratio, hazard ratio, etc.
  – the population
  – the time point of measurement, the duration of the observational period, etc.
  – adherence
    • effect under perfect adherence: "de-jure"
    • effect under real adherence: "de-facto"

De-facto and de-jure estimands
[Figure: mean outcome over time until the end of the trial for placebo and active treatment, with treatment dropout and "retrieved" data collected after dropout; the de-facto estimand is the difference in all randomized patients, the de-jure estimand the difference if all patients had adhered]

Estimand issues: Examples
• rescue medication (e.g. pain, diabetes)
  – efficacy if no subject took rescue medication (de-jure)
  – efficacy under rescue medication (de-facto)
• efficacy and effectiveness under relevant non-adherence (e.g. depression)
  – effect if all subjects were adherent, or
  – effect under actual adherence

Estimands under rescue medication
• primary endpoint: change in VAS pain
• de-facto estimand
  – change in pain irrespective of rescue medication use
  – all data used
  – longitudinal model or ANCOVA to estimate
• de-jure estimand
  – change in pain without rescue medication
  – only data until the start of rescue medication used
  – longitudinal model on the "clean" data
• severe pain
  – intake of rescue medication in most patients
  – de-jure estimand not evaluable, de-facto estimand insensitive
  – alternative endpoints to be considered
    • amount of rescue medication
    • time to first use of rescue medication

Example: Simulation of a depression trial with retrieved data
• longitudinal data (Hamilton Depression Score)
• some data were collected after treatment discontinuation
• different drop-out mechanisms
  – treatment dropout (TD)
  – analysis dropout (AD), with AD time ≥ TD time
  – "retrieved data" from TD to AD
• data generation
  – according to a two-piece linear mixed model
  – selection model for TD, exponential time to AD
Leuchs AK, Zinserling J, Schlosser-Weber G, Berres M, Neuhäuser M, Benda N. Estimation of the treatment effect in the presence of non-compliance and missing data. Statistics in Medicine. 2014; 32:193–208.

Example of a de-facto estimand
• joint model (JM) for Y and D (treatment dropout) with a joint density
• Y | D following (e.g.) a two-piece linear mixed model, assuming different slopes while adhering and while non-adhering
• the de-facto estimand is given by
  E(Y) = ∫ E(Y | D = d) f(d) dd

Example: Simulation of a depression trial with retrieved data
• 150 patients per group (treatment vs placebo)
• drop-out mechanisms
  – treatment dropout (TD) at random
  – analysis dropout (AD) at random
• treatment dropout rate
  – 30% each, or
  – 25% and 35%
• retrieval rates
  – 100%, 70% and 40%
  – corresponding to 0%, 30%, 60% AD

Example: Simulation of a depression trial with retrieved data – Analysis
Analysis options:
1. placebo multiple imputation (pMI)
2. joint modelling (JM) – according to the data-generating mechanism
3. mixed-effect model for repeated measurements using all data
   – usual linear mixed-effect model, ignoring non-compliance
4. mixed-effect model for repeated measurements using data until TD
   – usual linear mixed-effect model, using only the data collected during adherence
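A much-simplified sketch of analysis options 3 and 4 (ordinary linear mixed models fitted to all data vs to on-treatment data only). This is not the generating model of Leuchs et al.: here the trajectory is a crude two-piece line, treatment dropout happens at a single visit, all post-dropout data are retrieved, and all names and numbers are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for subj in range(300):
    treat = float(subj % 2)              # 1 = active, 0 = placebo
    y = rng.normal(20, 3)                # random baseline level
    td = 3 if rng.random() < 0.3 else 6  # 30% treatment dropout after visit 3
    for visit in range(1, 7):
        # Two-piece line: the extra active slope applies only while on treatment.
        y += -1.5 - (0.8 * treat if visit <= td else 0.0)
        rows.append(dict(subject=subj, treat=treat, visit=visit,
                         y=y + rng.normal(0, 2), on_treatment=visit <= td))
df = pd.DataFrame(rows)

# Option 4: mixed model on on-treatment data only (aims at the de-jure effect).
on_trt = df[df["on_treatment"]]
m4 = smf.mixedlm("y ~ visit * treat", on_trt, groups=on_trt["subject"]).fit()

# Option 3: the same model on all data, ignoring non-compliance.
m3 = smf.mixedlm("y ~ visit * treat", df, groups=df["subject"]).fit()

print("slope difference, on-treatment data:", round(m4.params["visit:treat"], 2))
print("slope difference, all data:         ", round(m3.params["visit:treat"], 2))
```

The on-treatment fit recovers roughly the adherent slope difference of −0.8 per visit, while the all-data fit is diluted towards the de-facto effect.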
Type-1 error and bias: equal dropouts, de-jure effect = 0 & de-facto effect = 0 (TD 30%)
Rejection rates by retrieval rate (100% / 70% / 40%):

de-jure estimand:
  Strategy                  100%    70%    40%
  1: pMI                    0.023   0.023  0.023
  2: JM varying slopes      0.031   0.032  0.035
  3: MM all data            0.047   0.047  0.044
  4: MM only on treatment   0.052   0.052  0.052

de-facto estimand:
  Strategy                  100%    70%    40%
  1: pMI                    0.023   0.023  0.023
  2: JM varying slopes      0.049   0.050  0.058
  3: MM all data            0.047   0.047  0.044
  4: MM only on treatment   0.052   0.052  0.052

Type-1 error and bias: equal dropouts, de-jure effect = 2 & de-facto effect = 0 (TD 30%)

de-jure estimand:
  Strategy                  100%   70%    40%
  1: pMI                    0.64   0.64   0.64
  2: JM varying slopes      0.37   0.35   0.32
  3: MM all data            0.08   0.16   0.36
  4: MM only on treatment   0.82   0.82   0.82

de-facto estimand:
  Strategy                  100%    70%    40%
  1: pMI                    0.636   0.636  0.636
  2: JM varying slopes      0.055   0.051  0.059
  3: MM all data            0.078   0.159  0.358
  4: MM only on treatment   0.820   0.820  0.820

Type-1 error and bias: unequal dropouts, de-jure effect = 2 & de-facto effect = 0 (TD 30%)

de-jure estimand:
  Strategy                  100%   70%    40%
  1: pMI                    0.63   0.63   0.63
  2: JM varying slopes      0.43   0.42   0.42
  3: MM all data            0.07   0.21   0.43
  4: MM only on treatment   0.81   0.81   0.81

de-facto estimand:
  Strategy                  100%    70%    40%
  1: pMI                    0.629   0.629  0.629
  2: JM varying slopes      0.051   0.051  0.062
  3: MM all data            0.075   0.208  0.431
  4: MM only on treatment   0.807   0.807  0.807

Simulation study: Summary
Conclusions:
• De-jure estimand
  – data retrieval not needed
  – MMRM using on-treatment data is valid under MAR
• De-facto estimand
  – (some) data retrieval needed
  – joint modelling, especially in case of partial retrieval
  – MMRM using all data is possible in case of (almost) complete retrieval
• further investigations on robustness needed

Estimands: Summary
• specify the relevant estimand first
  – clarifies the study objective
  – needed to define the relevant estimation and missing data method
  – impacts the study design
• the estimand includes
  – the assumption on adherence
  – the distributional parameter
  – the population
• the estimand defines the primary analysis
  – different estimands may be used in additional analyses

EMA Guideline on Missing Data

Missing data in confirmatory trials
• the ITT principle requires a treatment of missing data
• drop-out in clinical trials causes missing data
• ignoring patients with missing data may bias the treatment effect estimate if
  – the drop-out is caused by the treatment
  – the treatment effect is different in patients who drop out
• omitting patients who drop out results in a selection, which may result in bias
  – see the simplistic ITT examples
• estimands are related to missing data

Current discussion on missing data and estimands in confirmatory studies
• Regulatory documents
  – CHMP Guideline on Missing Data in Confirmatory Clinical Trials (2010)
  – ICH Draft Concept Paper on Choosing Appropriate Estimands and Defining Sensitivity Analyses in Clinical Trials (Addendum to E9) (2014)
• Questions
  – What to do with missing data?
  – Why impute missing data?
  – Measure efficacy conditional on perfect adherence or real adherence? → estimand
  – Precise parameter to be estimated (and to be imputed for)? → estimand

ITT principle and missing data
• requires inclusion of all patients
  – including drop-outs
  – including those with missing data
• How can we consider patients who discontinued?
  – (a) change the endpoint definition
    • time to event → time to event or drop-out
    • treatment failure → treatment failure or drop-out ("drop-out = failure")
  – (b) replace missing data "reasonably":
    • conservatively in superiority studies – excluding overestimation of the treatment effect
    • as unbiased as possible in non-inferiority studies

Drop-out scenario 1:
Assume:
• patients drop out equally in both treatments, with probability p
• no treatment difference in the drop-outs
Analysis:
• complete case analysis – ignores the drop-outs
Result: ?

Drop-out scenario 1:
Result:
• treatment difference overestimated
  – effect in all = (1 − p) ∙ effect in completers + p ∙ effect in drop-outs
    = (1 − p) ∙ effect in completers + 0
    < effect in completers

Drop-out scenario 2:
Assume:
• drop-out independent of efficacy and treatment
• increasing effect size (treatment difference) over time
Analysis:
• LOCF: last observation carried forward
  – the last value observed before drop-out is used as the final value
Result: ?

Drop-out scenario 2:
Result:
• underestimation of the treatment difference
  – superiority study: conservative effect estimate
  – non-inferiority study: lack of sensitivity
    • a potential difference to the comparator may be concealed

Drop-out scenario 3:
Assume:
• progressive disease – the treatment aims at slowing the disease course
• no effect active vs placebo (null hypothesis)
• earlier drop-out under active treatment, e.g. due to side effects
Analysis:
• LOCF: last observation carried forward
Result: ?

Drop-out scenario 3:
Result (a simulation sketch is given at the end of this subsection):
• early drop-out: better efficacy values
  – better results for the active treatment
  – increased number of (wrongly) significant results
  – type-1 error inflation: invalid analysis

Models for missing data:
• Missing completely at random (MCAR):
  – missingness independent of efficacy
    • e.g. accident, move, data loss, etc.
• Missing at random (MAR):
  – missingness depends on observed values only
    • e.g. due to a lack of measured efficacy
    • initiation of rescue medication according to well-defined rules (e.g. HbA1c > threshold in diabetes trials)
• Missing not at random (MNAR):
  – missingness depends on unobserved (missing) values
    • e.g. related to an early, sensitive surrogate marker

Models for missing data:
• MCAR: missingness independent of all values
  – complete case analysis valid
• MAR: missingness independent of unobserved values
  – longitudinal analysis of repeated measurements valid for the "de-jure effect"
    • implicit imputation using a prediction based on the observed data
• MNAR: missingness depends on observed and unobserved values
  – a valid analysis depends on unverifiable assumptions
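The simulation sketch announced under drop-out scenario 3: scores worsen by one point per month, there is no treatment effect, but the active arm drops out earlier, so LOCF freezes its scores at better (earlier) values. All numbers are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sims, n, rejections = 2_000, 100, 0

for _ in range(sims):
    arms = []
    for early_rate in (0.40, 0.10):  # active drops out earlier than placebo
        early = rng.random(n) < early_rate
        t_last = np.where(early, 3.0, 12.0)  # month of the last observation
        # Progressive disease, no treatment effect: score falls 1 point/month,
        # so the LOCF value is simply the score at the last observation.
        arms.append(-1.0 * t_last + rng.normal(0, 3, size=n))
    active, placebo = arms
    # One-sided test in favour of active on the LOCF values.
    p = stats.ttest_ind(active, placebo, alternative="greater").pvalue
    rejections += p < 0.025

print("false positive rate for active:", rejections / sims)  # far above 0.025
```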
Methods for analysing trials with missing data:
• complete case analysis – only patients with complete data sets are considered
• LOCF – last observation carried forward
• BOCF – baseline observation carried forward
• MMRM (mixed model for repeated measurements) – model-based longitudinal analysis
• multiple imputation – repeated simulation of the missing data
• non-parametric rank-based analysis
• worst-case analysis

Complete case analysis
Approach
• only patients with complete data sets are considered
  – similar to a per-protocol analysis
Properties
• does not comply with the ITT principle
• only valid under MCAR
Application
• only as an additional sensitivity analysis

LOCF: Last Observation Carried Forward
Approach
• the last observed measurement of a discontinuing patient is used as the final observation
Properties
• reduces the effect size under MCAR if the effect size increases over time
• may be biased under all models
• invalid and often anti-conservative in progressive diseases
Application
• may be used as the primary analysis only in specific settings
  – low drop-out rate, increasing effect size, superiority
• to be used as an additional sensitivity analysis

Model-based longitudinal analysis (MMRM, mixed model for repeated measures)
Approach
• a model is fitted to all observed data to predict the unobserved data
• use of model assumptions
  – estimation according to the maximum likelihood principle
Properties
• implicit prediction
• valid under MCAR and MAR – if the selected model is correct
• estimates the treatment effect assuming perfect adherence
Application
• primary or sensitivity analysis – depending on the requested estimand

Multiple imputation (1)
Approach (a code sketch follows at the end of this subsection)
• repeated simulation of the missing data to be imputed
  – averaging of the estimates resulting from the different simulations
  – uses the variance between the simulations
• the observed data are used to derive the distribution from which the imputations are generated
  – many options to use the available data, e.g.
    • identification of discontinuing patients with completing patients having similar characteristics
    • identification with placebo patients (placebo MI)
• may target different estimands

Multiple imputation (2)
Properties
• avoids underestimation of the variance
• may yield valid results under MCAR and MAR
• validity under MNAR depends on the link to the observed data
  – this link is unverifiable
Application
• should (always) be used as a sensitivity analysis
• different rules linking imputations to observed values should be used
  – e.g. use of different reasons for discontinuation possible

BOCF: Baseline Observation Carried Forward
Approach
• replaces missing data by the baseline values
• assumes the absence of a treatment effect in drop-outs
Properties
• avoids the overestimation of the effect size that can occur with LOCF
• may be highly insensitive – conservative for superiority
Application
• may be primary in settings with high uncertainty
• not to be used for non-inferiority or equivalence – lack of assay sensitivity
• usually a sensitivity analysis
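The code sketch announced under multiple imputation: a deliberately simplified version with invented data (a proper MI would also redraw the regression parameters from their posterior, e.g. via a Bayesian or bootstrap step; here only residual noise is added), combined by Rubin's rules:

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 200, 20  # patients, number of imputations

baseline = rng.normal(50, 10, n)
final = baseline + rng.normal(-5, 8, n)  # true mean change about -5
observed = rng.random(n) > 0.25          # about 25% missing (at random here)

estimates, within_var = [], []
for _ in range(m):
    # Regress final on baseline in the observed data, then impute from the
    # fitted line plus residual noise (simplified: parameters not redrawn).
    slope, intercept = np.polyfit(baseline[observed], final[observed], 1)
    fitted = intercept + slope * baseline
    resid_sd = np.std(final[observed] - fitted[observed], ddof=2)
    imputed = np.where(observed, final, fitted + rng.normal(0, resid_sd, n))
    change = imputed - baseline
    estimates.append(change.mean())
    within_var.append(change.var(ddof=1) / n)  # variance of the mean

# Rubin's rules: total variance = within + (1 + 1/m) * between.
total_var = np.mean(within_var) + (1 + 1 / m) * np.var(estimates, ddof=1)
print(f"MI estimate of mean change: {np.mean(estimates):.2f}"
      f" (SE {np.sqrt(total_var):.2f})")
```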
Rank-based approach (Gould) (1)
Approach
• ranking of completing and discontinuing patients
  – discontinuing patients are ranked according to their drop-out times
  – completing patients are ranked according to their observed data
  – completing patients receive the higher ranks
• non-parametric analysis of the obtained ranks
Properties
• a valid analysis – of a new endpoint
• avoids issues with unverifiable assumptions about the missing data mechanism
• interpretation difficult
  – clinical relevance difficult to assess
Application
• could be a sensitivity analysis / targets a specific estimand

Rank-based approach (Gould) (2)
Example: compare treatments A and B (a code sketch reproducing these ranks is given at the end of this section)
• completing patients – measurements at the end of the study
  – A: 12, 23, 34
  – B: 11, 44
• discontinuing patients – time to drop-out
  – A: 12 days, 34 days
  – B: 17 days, 19 days, 32 days
• ranks used for the analysis
  – A: 1, 5, 7, 8, 9
  – B: 2, 3, 4, 6, 10

Rank-based approach (Gould) (3)
Rank-based analysis
• evaluates the probability of obtaining a higher rank under A than under B
  – the probability that randomly selected patients treated with A and B behave as follows:
    • the patient treated with A obtains better values than the patient treated with B, or
    • the patient treated with A discontinues later than the patient treated with B, or
    • the patient treated with A completes and the patient treated with B discontinues
• no individual interpretation possible

Worst-case analysis (superiority vs placebo)
Approach
• replaces missing data by
  – the best values for placebo
  – the worst values for the active treatment
• may be used for categorical or binary data
  – best: (complete) response; worst: failure
Properties
• very conservative unless the drop-out rate is very low
• non-differential worst-case imputation regardless of treatment is not a worst-case analysis
  – "drop-out = failure" for all patients is not the worst case
Application
• sensitivity analysis only
• usually unrealistic with medium to high drop-out rates

Missing data: Summary
• drop-out in clinical trials causes missing data
• ignoring patients with missing data may lead to bias
• many methods for missing data are available
• the validity of a method depends on the missing data model
  – the missingness mechanism is usually not verifiable
  – the use of drop-out reasons may help
  – several sensitivity analyses are needed
• missing data imputation should target the estimand of interest
  – de-facto (real adherence) or de-jure (ideal adherence)
  – de-facto estimands require follow-up of patients who discontinued treatment

Summary: Important principles in drug approval
• choosing and pre-specifying the relevant endpoint
  – clinical relevance
  – proper and unambiguous definition
• control of false positive decisions
  – pre-specification of the analysis and of study success
  – proper statistical test
  – adjustment for multiple options of study success
• comparison to
  – placebo – superiority: conservative estimation of the treatment effect
  – active control – non-inferiority: choose the right margin + be sensitive
• unbiasedness and causality
  – follow the ITT principle: analyse as randomized
  – appropriate treatment of missing data
• external validity
  – ensure that the effect estimate can be extrapolated to clinical reality
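Appendix: the code sketch announced for the Gould rank-based example above; it reproduces the slide's ranks and runs a Wilcoxon rank-sum test on them:

```python
from scipy import stats

# Completers: final measurements (higher = better); dropouts: days to drop-out.
completers = {"A": [12, 23, 34], "B": [11, 44]}
dropouts = {"A": [12, 34], "B": [17, 19, 32]}

# Gould-type ranking: all dropouts rank below all completers; dropouts are
# ordered by drop-out time, completers by their observed outcome value.
pool = [(0, t, arm) for arm, ts in dropouts.items() for t in ts] \
     + [(1, y, arm) for arm, ys in completers.items() for y in ys]
pool.sort()  # dropouts (group 0) first by time, then completers (1) by value

ranks = {"A": [], "B": []}
for rank, (_, _, arm) in enumerate(pool, start=1):
    ranks[arm].append(rank)
print(ranks)  # {'A': [1, 5, 7, 8, 9], 'B': [2, 3, 4, 6, 10]}

# Non-parametric comparison of the ranks (Wilcoxon rank-sum / Mann-Whitney U).
print(stats.mannwhitneyu(ranks["A"], ranks["B"], alternative="two-sided"))
```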