UE11 Parcours Spécifique 1 Clinical Research - Cours n°3
28/10/2015
Agnès Dechartres
RT: Clara Timsit, Deborah To-Puzenat
RL: Eva Bisson

Outcome measures and safety

I. Determining whether the treatment is effective or not
   1. How to conclude on the efficacy of the new treatment
   2. How can we express treatment effect?
II. Primary outcome (PO)
   1. A single PO/selective outcome reporting
   2. How can we choose a good PO?
      a) Clinical relevance
      b) Surrogate outcomes
      c) Objectivity
      d) Availability in all patients
      e) Composite outcomes

Outcome, also called endpoint = « critère de jugement »
The outcome makes it possible to assess whether the treatment is effective or not.
We distinguish primary and secondary outcomes (critère de jugement principal/secondaires).

Only one primary outcome: it is used to conclude on efficacy. It should correspond to the primary (main) objective of the trial.

Secondary outcomes (less important):
- other benefits
- mechanism of action
- side effects (= adverse effects)

The conclusion of the trial must be based only on the result of the primary outcome: we cannot conclude that the experimental intervention is effective on the basis of a secondary outcome.

I. Determining whether the treatment is effective or not

1. How to conclude on the efficacy of the new treatment?

You need to perform a statistical test:
- Null hypothesis (H0): Treatment A = Treatment B
- Alternative hypothesis (H1): Treatment A ≠ Treatment B
- Alpha risk (set at 5% in general): probability of rejecting H0 (the null hypothesis) when it is true. In other words, it is the probability of a false-positive result, i.e. of concluding that there is a difference when this difference does not exist.
- P-value: probability, if the null hypothesis is true, of observing a result as extreme as or more extreme than the one actually observed. The p-value is the result of the statistical test.

If p < 0.05: statistical difference between treatment A and treatment B.
If p ≥ 0.05: no statistical difference between the two treatments. This does not mean that the treatments are equally effective. We can't conclude: it can mean that there isn't any difference between the treatments, or that you did not have sufficient power to show a statistical difference.

Power: probability of observing a statistical difference if this difference exists.

Influence of sample size on p-value: the larger the sample size, the greater the power to show a statistical difference. (With a small sample, you need a very large difference between the two groups for it to be significant.)

Warning: a statistical difference does not mean that the difference is important for the patients. We also need to assess whether the difference is clinically relevant.

2. How can we express treatment effect?

Several methods are used to assess whether the difference is large or not. Some of these methods are adequate and some can be misleading.
We can express treatment effect as:
- relative risk (RR)
- relative risk reduction
- absolute risk reduction (ARR)
- Number Needed to Treat (NNT: number of patients to treat to avoid one event)

Example: difference between relative and absolute differences
“Contraceptive pills double the risk of venous thromboembolism” (report of the UK committee on the safety of medicines, 1995).
This example shows that we should not rely on the relative difference alone because it can be misleading.
Negative impact on public opinion (“Kiss of death” on The Sun cover), increasing number of abortions in the UK after 1995.
In reality: contraceptive pills increase the risk of venous thromboembolism from 1 to 2 women out of every 7000 women.
Relative difference -> (2/7000) / (1/7000) = 2 -> Contraceptive pills double the risk (RR = 2).
Absolute difference -> (2/7000) – (1/7000) = 1/7000 -> It is very small.
We have to assess whether this difference is clinically important.
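The contrast between the relative and the absolute difference in the contraceptive pill example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `risk_comparison` is hypothetical, and the last line (the reciprocal of the absolute difference, by analogy with the NNT) is an added illustration, not a figure quoted in the course.

```python
from fractions import Fraction


def risk_comparison(risk_exposed, risk_unexposed):
    """Return (relative risk, absolute risk difference) for two risks."""
    relative_risk = risk_exposed / risk_unexposed
    absolute_difference = risk_exposed - risk_unexposed
    return relative_risk, absolute_difference


# Figures from the contraceptive pill example: 2 vs. 1 women out of 7000.
risk_with_pill = Fraction(2, 7000)
risk_without_pill = Fraction(1, 7000)

relative_risk, absolute_difference = risk_comparison(risk_with_pill, risk_without_pill)

print(f"Relative risk: {relative_risk}")              # 2 (the risk is doubled)
print(f"Absolute difference: {absolute_difference}")  # 1/7000 (very small)
# Added illustration: number of women exposed for one additional event.
print(f"Women exposed per additional event: {1 / absolute_difference}")  # 7000
```

Exact fractions are used so that the results match the hand calculation above without rounding.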
Assessment of treatment effect:

Table 1
Treatments    Deaths: Yes    Deaths: No    Overall
A (exp)       20             180           200
B             40             160           200
Overall                                    400

Absolute risk treatment A = 20/200 = 10%
Absolute risk treatment B = 40/200 = 20%
Absolute risk reduction = 20% - 10% = 10%
Relative risk reduction = (20% - 10%) / 20% = 50%
NNT = 1/ARR = 1/0.1 = 10

Larger sample, less common event:

Table 2
Treatments    Deaths: Yes    Deaths: No    Overall
A (exp)       2              1998          2000
B             4              1996          2000
Overall                                    4000

Absolute risk treatment A = 2/2000 = 0.1%
Absolute risk treatment B = 4/2000 = 0.2%
Absolute risk reduction = 0.2% - 0.1% = 0.1%
Relative risk reduction = (0.2% - 0.1%) / 0.2% = 50%
NNT = 1/ARR = 1/0.001 = 1000

3 parameters: absolute risk in each treatment group, absolute risk reduction and NNT.
In these 2 examples, the relative risk reduction is the same but the NNT is different (the result in Table 1 is better because you have to treat fewer patients in order to avoid one death).
There is a risk of erroneous interpretation with relative risk reduction.

II. Primary outcome (PO)

The primary outcome should be defined in the protocol before beginning the trial. A sample size calculation based on this outcome can then be performed.

Sample size calculation is important:
- To recruit a sufficient number of patients to show a statistically significant difference (power).
- Because it is not ethical to include too many patients (they are exposed to risks during the trial).

What is needed for sample size calculation?
o Alpha risk, usually set at 5%
o Power (at least 80%, sometimes 90%) = probability of showing a difference when this difference really exists
o Prevalence of the outcome: the rarer the outcome, the larger the sample size
o Difference we want to show: the smaller the difference, the larger the sample size

1. A single PO/selective outcome reporting

Why a single PO?
To keep α risk at 5%:
o 1 statistical test -> α risk is at 5%
o α risk increases when performing several statistical tests -> increased probability of false-positive results!
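The increase of the α risk with the number of statistical tests can be made concrete with a small calculation. The sketch below assumes that the tests are independent and that the null hypothesis is true for each of them (an assumption added here for illustration, not stated in the course); the function name `familywise_alpha` is hypothetical.

```python
def familywise_alpha(alpha, n_tests):
    """Probability of at least one false-positive result among n_tests
    independent tests, each performed at level alpha, when every null
    hypothesis is true (independence is an assumption of this sketch)."""
    return 1 - (1 - alpha) ** n_tests


# Overall alpha risk for 1, 2, 5 and 10 tests, each performed at 5%.
for n_tests in (1, 2, 5, 10):
    overall = familywise_alpha(0.05, n_tests)
    print(f"{n_tests:>2} test(s): overall alpha risk = {overall:.1%}")
```

Under these assumptions, a single test keeps the risk at 5%, whereas ten tests bring the probability of at least one false-positive result to about 40%.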
Sample size calculation:
o Difficult when there are several primary outcomes (α risk needs to be adjusted)

A single predefined PO that should not change during the trial:
- Especially after seeing the results
- Because of α risk (to avoid an increase of α risk: the result could then be related to chance)
o Multiplication of statistical tests -> increased alpha risk -> increased probability of false-positive results

Selective outcome reporting (SOR):
o Selection of a subset of the original variables recorded, on the basis of the results, for inclusion in the publication of trials (if the PO isn't significant, investigators could be tempted to choose a secondary outcome or a subgroup for which there is a statistical difference).
o It biases the treatment effect toward more positive results.

How can we assess the risk of SOR?
- Check the protocol: compare the PO defined in the protocol to the PO in the published article (it should be the same).
- The protocol can be difficult to obtain. But since 2005, all trials should be registered, before the recruitment of the first patient, in a free, publicly available online registry (the best-known: clinicaltrials.gov; registration of the main elements of the protocol, including the PO).

Examples of changes of PO:
1) In the register: glycosylated hemoglobin concentration (important parameter to monitor diabetic patients) after 2 years, a continuous outcome. In the article: a binary outcome with a cut-off on the value of glycosylated hemoglobin, and glycaemia was added as a PO.
2) Same primary outcome, but the timeframe of assessment changed from 2 years to 1 year.

Different types of outcomes:
- Clinical events (usually assessed by physicians): mortality, myocardial infarction
- Patient-reported outcomes (PRO): pain, disability/function, quality of life
- Other types of outcomes assessed by physicians: clinical scores, biological tests, radiological tests

2. How can we choose a “good” PO?
a) Clinical relevance/importance for patients

Outcomes considered as important for patients: mortality, morbidity (myocardial infarction, stroke…), pain, function and quality of life.
Few trials focus on outcomes that are important for patients: only 18% of ongoing trials on diabetes and 23% of cardiovascular trials published between 2005 and 2008 had a PO important for patients.

b) Surrogate outcomes = critères de jugement intermédiaires

Surrogate outcomes are measured before a clinical event occurs and are often used in clinical trials as substitutes for final patient-relevant outcomes such as clinical events.

Example:
Disease               Surrogate outcome       Outcome important for patients = clinical event
Hypertension (HTA)    Blood pressure          Death or death by stroke
Osteoporosis          Bone mineral density    Fracture

Why use surrogate outcomes?
- Low number of clinical events (important outcomes)
- Long-term follow-up needed to observe them
- Long time before the patients could benefit from therapeutic innovations

Advantages:
- Smaller sample size
- Shorter follow-up duration (from years to weeks)
Ex: drug reducing the cholesterol level in blood
Outcome = blood cholesterol level: 100 patients, 3 to 12 months
Outcome = mortality: thousands of patients, 4 to 5 years

Problems:
Assessment of efficacy: incomplete, inadequate, sometimes leading to misleading conclusions.
Example: treatments against osteoporosis
- Fluoride: increases bone mineral density but also the risk of fractures => not good for patients
- Bisphosphonate: increases bone mineral density and decreases the risk of fracture
- Raloxifene: no or little effect on bone mineral density, and decreases the risk of fracture
Surrogate outcomes are not always correlated with the occurrence of clinical events.
It is always better to have an outcome which is important for the patients; surrogate outcomes are always subject to discussion.

c) Objectivity of outcomes

- For some outcomes, assessment is objective:
o Mortality
o Biological tests
- For others, the assessment looks objective but is subjective: radiological tests (not an exact science: interpretation, influence of movements and food absorption by the patient)
- For many, assessment is subjective:
o Pain and other patient-reported outcomes (very subjective +++)
o Myocardial infarction
o Cardiovascular mortality (because it can be very difficult to determine the cause of death)

The consequences of subjectivity are:
- Variability between outcome assessors
- Bias if the outcome assessor isn't blinded
For objective outcomes, there is no difference between assessments made in blinded or non-blinded trials, whereas for subjective outcomes there can be a difference in treatment effect estimates between blinded and non-blinded assessors (more positive results in non-blinded trials).

How to limit subjectivity?
- Blinded assessment
- Standardization of outcome measurement
o Standardized form
o Standardization of clinical assessment
o Training of outcome assessors
o Skilled and trained outcome assessors
o Assessment in duplicate (2 people assess the outcome independently, then check whether they disagree; if they do, a 3rd person is needed to reach a conclusion) to limit variability
o Centralization of outcome assessment (central committee to discuss discrepancies)
- Evaluation of reproducibility
o Intraclass correlation coefficient (if continuous outcome)
o Kappa coefficient (if binary outcome)

d) Availability in all patients

Objective: to limit missing data.
Because of the intention-to-treat analysis, we also need to limit the exclusion of patients from the analysis and the number of patients lost to follow-up.
Example: thromboembolic events assessed by venograms, myocardial outcomes assessed with coronary angiography. With such outcomes, there is frequently missing data. They can be considered as surrogate outcomes (not important for patients).
Some POs are less likely to have missing data than others: deaths are available in the national registry (état civil), so the amount of missing data should be very limited.

e) Composite outcomes (CO)

They are commonly used in cardiovascular trials.
Example: death OR myocardial infarction OR stroke (3 components within 1 outcome).

Advantages:
- Increased power for the same sample size: a CO increases the probability of observing the event.
- Can help investigators select the PO.
Example: trial in patients taking aspirin after myocardial infarction (secondary prevention to avoid recurrence), comparing maintenance of aspirin vs. discontinuation of aspirin before a surgical intervention. A CO including both ischemic and bleeding events takes into account the balance between benefits and risks.

Recommendations in case of CO:
- Interpretation as a CO:
o If p < 0.05: statistical reduction of death OR myocardial infarction OR stroke.
o It is not possible to conclude that the new treatment decreased mortality.
- Components of the CO should be defined as secondary outcomes. Ex: secondary outcomes = death, myocardial infarction, stroke.
- Components of the CO:
o Should have the same importance for patients
o Should occur with the same frequency
- Systematic assessment of treatment effect for each component (defined as secondary outcomes): similar treatment effect?

Problems:
Few CO include components of similar importance for patients with a similar treatment effect.
Example: 1715 randomized patients, comparison irbesartan + amlodipine vs. placebo.
The CO is defined as mortality or end-stage renal disease or doubling of serum creatinine concentration. (This last one could be considered as a surrogate outcome: the components of the CO don't have the same importance for the patient.)
Treatment effect expressed as a relative risk reduction: there is a statistically significant reduction for the CO (the 95% confidence interval doesn't include 0).
The choice of the CO is inadequate: there is in fact no statistical difference (the 95% CI includes 0) for end-stage renal disease and mortality, which are the most important outcomes for patients, but because the result for the surrogate outcome (doubling of creatinine levels, which is a very common event) is significant, so is the result for the CO. Results are misleading.

Conclusions:

Reflexes when considering a PO:
- Is the PO important for patients? Deaths, clinical events, pain, disability and quality of life
- Is the PO subjective or objective? If subjective:
o Blinded assessment?
o Standardization to improve reproducibility?
o Assessment in duplicate or centralized assessment to improve reproducibility?
- Is the PO available for all patients?

Interpreting the results:
- Absolute risk reduction or NNT
- Avoid relative risk reduction

Abbreviations:
PO: primary outcome
NNT: number needed to treat
SOR: selective outcome reporting
CO: composite outcome

SUMMARY SHEET (fiche récapitulative):

The primary outcome corresponds to the main objective of the trial; there should be only one (otherwise the alpha risk will increase). The conclusion on the efficacy of the treatment is based on its result (and not on a secondary outcome). It should be defined in the protocol before beginning the trial. A sample size calculation can be based on it; this requires the alpha risk, the power, the prevalence of the outcome and the difference we want to show.
We can conclude on efficacy after a statistical test. If the p-value (the result of the test) is < 0.05 (the threshold corresponding to the alpha risk: the probability of getting a false positive), there is a statistical difference between the two treatments. Otherwise, there is no statistical difference, but we can't conclude that the treatments are equal. The sample size has an effect on the p-value: the larger the sample size, the greater the power to show a statistical difference (but it is not ethical to include too many patients).
The best ways to express treatment effect are the absolute risk reduction (risk with the control treatment – risk with the experimental treatment) and the Number Needed to Treat (1/absolute risk reduction).
Selective outcome reporting biases the treatment effect towards more positive results (e.g. choice of a secondary outcome with a statistical difference). To detect it, we need to check the PO in the protocol.
A PO can be based on clinical relevance or importance for the patient (pain, mortality, morbidity, quality of life, disability) or on a surrogate outcome. A surrogate outcome allows a shorter follow-up duration and a smaller sample size, but the assessment of efficacy can be incomplete, misleading or inadequate.
The assessment of an outcome can be objective (mortality) or subjective (pain); subjectivity can lead to bias if the outcome assessor isn't blinded (detection bias). To improve reproducibility for a subjective outcome, we need blinded assessment, standardization of outcome measurement and evaluation of reproducibility.
Composite outcomes (commonly used in cardiovascular trials) increase the power for the same sample size. If p < 0.05, it is not possible to conclude that the treatment decreases any single component. The components should be defined as secondary outcomes; they should occur with the same frequency, have the same importance for patients, and show a similar treatment effect.
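As a rough illustration of the sample size calculation recalled in this summary, the sketch below combines the four ingredients (alpha risk, power, frequency of the outcome, difference to show) for a binary outcome. It uses the usual normal-approximation formula for comparing two proportions with a two-sided test; this formula is an assumption added here and is not given in the course, and the function name `sample_size_per_group` is hypothetical. The risks used are those of Table 1 and Table 2.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_experimental, alpha=0.05, power=0.80):
    """Approximate number of patients per group needed to compare two
    proportions (two-sided test, normal approximation; assumed formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the alpha risk
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    variance = (p_control * (1 - p_control)
                + p_experimental * (1 - p_experimental))
    difference = p_control - p_experimental
    return ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# Common event, large absolute difference (risks of Table 1: 20% vs. 10%).
n_common = sample_size_per_group(0.20, 0.10)
# Rare event, small absolute difference (risks of Table 2: 0.2% vs. 0.1%).
n_rare = sample_size_per_group(0.002, 0.001)
# Same risks as Table 1, but with 90% power instead of 80%.
n_common_90 = sample_size_per_group(0.20, 0.10, power=0.90)

print(f"20% vs. 10%, power 80%:   {n_common} patients per group")
print(f"0.2% vs. 0.1%, power 80%: {n_rare} patients per group")
print(f"20% vs. 10%, power 90%:   {n_common_90} patients per group")
```

With these assumptions, about 200 patients per group are enough in the first situation, whereas more than 20,000 per group are needed when the event is rare and the difference small; asking for a higher power also increases the sample size.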