TRANSLATIONAL RESEARCH: DATA ANALYSIS PLAN AND EXPLORING THE RESULTS
AA, JC
INTRODUCTION TO TRANSLATIONAL RESEARCH

DATA ANALYSIS PLAN: Start with the end in mind!
The analysis plan starts with data collection: you can’t analyze data that you don’t have.
• Data recording
Good scientific practice calls for accountability and reproducibility. It is therefore common practice to record “everything” about an experiment or a study, and nothing gets deleted.
– Protocols (for experiments and for clinical trials)
– Diaries (records of daily activities)
– Inventory (reagents, supplies, drugs)
• Defining the parameters to measure
– Outcomes of interest (to answer the specific aims): the more the better, but you risk spending your entire life recording rather than doing experiments
– Control measures, used as quality controls: important gatekeeping to guarantee quality-informed decisions
• Defining the time points
The more the better, but again within limits: also take into consideration costs, use of animals, and discomfort or risk for human subjects.
• Defining the assessment strategy to reduce the risk of bias
Bias is an inclination towards something, or a predisposition, partiality, prejudice, preference, or predilection that consciously or unconsciously may affect the way a person or parameter is assessed, judged, or treated.
– Random assignment to reduce the risk of bias: since bias is human, and somewhat inevitable, assigning subjects to groups at random reduces allocation (selection) bias.
– Blinded assessment to reduce the risk of bias: since bias is human, and somewhat inevitable, having the measures of interest assessed by operators who are unaware of group allocation reduces the risk of assessment bias.
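Random assignment and blinded assessment can be prepared before the experiment starts. The following is a minimal sketch, not a prescribed procedure: the function name `randomize_and_blind`, the group labels, the code format, and the fixed seed are all assumptions made for illustration.

```python
import random


def randomize_and_blind(subject_ids, groups=("vehicle", "drug"), seed=2024):
    """Randomly allocate subjects to equally sized groups and hide the labels.

    Returns (blinded_codes, allocation_key):
      blinded_codes   maps subject ID -> opaque code handed to the assessor
      allocation_key  maps opaque code -> group, kept by someone who does
                      not perform the assessment
    """
    rng = random.Random(seed)  # fixed seed (an assumption) makes the list reproducible
    ids = list(subject_ids)
    if len(ids) % len(groups) != 0:
        raise ValueError("subjects must divide evenly among groups")

    # Shuffle, then deal subjects into groups in turn: random and balanced.
    rng.shuffle(ids)
    allocation = {sid: groups[i % len(groups)] for i, sid in enumerate(ids)}

    # Shuffle the codes too, so the code order reveals nothing about the group.
    codes = [f"S{n:03d}" for n in range(1, len(ids) + 1)]
    rng.shuffle(codes)
    blinded_codes = dict(zip(ids, codes))
    allocation_key = {blinded_codes[sid]: grp for sid, grp in allocation.items()}
    return blinded_codes, allocation_key


mice = [f"mouse-{n}" for n in range(1, 13)]
blinded, key = randomize_and_blind(mice)
```

The assessor works only from `blinded`; `key` is opened once all measurements have been recorded.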
REPRODUCIBILITY IN BIOMEDICAL RESEARCH: A BIG DEAL
COGNITIVE BIASES
Confirmation bias - The tendency to search for or interpret information in a way that confirms one's preconceptions. In addition, individuals may discredit information that does not support their views, or may reduce inconsistency by searching for information that re-confirms their views.
Belief bias - When one's evaluation of the logical strength of an argument is biased by one's belief in the truth or falsity of the conclusion.
Framing - Using a too-narrow approach to, and description of, the issue or topic.

• Defining the assessment strategy to reduce the risk of bias (random assignment, blinded assessment)
FEW ADDITIONAL TIPS
– Multiple independent endpoints: these reduce the risk of bias, especially if different individuals perform the different assessments; they also reduce the risk of errors caused by the technical limitations of a specific technique
– Bias-insensitive endpoints: while there is no such thing, some endpoints are less sensitive to bias than others. Higher risk: those that require subjective interpretation. Lower risk: those with a numeric output from a device (e.g. a biomarker)

DATA ANALYSIS PLAN: The methodology!
There are small lies, damned lies, and … statistics !!!
• Statistics: mathematical modeling used to calculate the probability of chance.
The need for statistics in biomedical research derives from the limitations of the research methods:
– Sampling (studying a part [the sample] instead of the whole)
– Qualitative differences (intermediate phenotypes)
– Small differences (sometimes within the sampling/testing error range)

Question: An investigator exposes 2 groups of 6 mice to a lethal dose of LPS to induce septic shock. One group also receives an anti-inflammatory drug, while the other receives vehicle alone. The treatment is assigned randomly and the assessment is blinded. At 24 hours, all 6 mice treated with LPS plus vehicle are dead, while all 6 treated with LPS plus drug are alive. Which statistical analysis is needed now to determine whether these differences are important?

A pre-specified analysis plan reduces the risk of bias.
• Statistics: the probability of error in rejecting the null hypothesis
– Null hypothesis H0: lack of difference, correlation, or other effect
– P value: the probability of obtaining a result at least as extreme as the one observed if H0 were true
– If P<0.05, a result this extreme would be expected in fewer than 5% of experiments if H0 were true; by convention H0 is then rejected at the 5% significance level, and hence a difference/correlation may indeed exist
– Limitations:
• Type I error: rejecting H0 when it should not have been rejected. This is not really an error but a probability assessment: if you accept a significance level of P<0.05, then whenever H0 is true you will make a type I error in 5% of cases. To reduce type I errors you can set the significance level to P<0.01 or lower (this increases type II errors).
• Type II error: accepting H0 when it should have been rejected. This is a problem of power: before accepting H0 as a final statement, one should calculate the probability of type II error as (1 − power) (ideally 1 − 0.95 = 0.05).
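For the all-or-none mouse question above, the result is obvious to the eye, but the P value can still be computed by hand. The sketch below assumes Fisher's exact test (two-sided) is the analysis of choice for such a small 2x2 table; the function name `fisher_exact_two_sided` is made up for this example and uses only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are outcomes (e.g. dead, alive).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Add up every possible table that is at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Vehicle: 6 dead, 0 alive. Drug: 0 dead, 6 alive.
p = fisher_exact_two_sided(6, 0, 0, 6)
print(f"P = {p:.4f}")  # prints P = 0.0022
```

Only 2 of the 924 possible ways of splitting 6 deaths among 12 mice are this extreme, which is where 2/924 ≈ 0.0022 comes from.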
Question: An investigator exposes 2 groups of 6 mice to a lethal dose of LPS to induce septic shock. One group also receives an anti-inflammatory drug, while the other receives vehicle alone. The treatment is assigned randomly and the assessment is blinded. At 24 hours, all 6 mice treated with LPS plus vehicle are dead, while 4 of 6 treated with LPS plus drug are dead (P=0.20). The investigators calculate that the study had 65% power to detect a significant difference between groups.
1) Is the null hypothesis H0 (the 2 treatments are equal) rejected?
• No, it is not rejected, because P>0.05
2) What is the probability or risk of type I error?
• Not applicable: a type I error is a false positive and can occur only when the null hypothesis is rejected
3) What is the probability or risk of type II error?
• It is 0.35 or 35%, because it equals 1 − power

Bigger data and smaller data
Gaussian distribution
• Parametric: distributed according to a Gaussian distribution
– Applies to continuous variables
– Nearly all biological variables
– If adequately sampled
– Without sampling biases
– Allows for prediction
– More powerful testing
Most commonly used tests:
• Student’s t test (2 groups)
– For unpaired or paired data
• Analysis of variance (ANOVA)
– For multiple groups
– For repeated measures
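Returning to the relation "type II error = 1 − power" used in the question above: for a small two-group mortality study, power can be computed exactly by enumerating every possible outcome. This is only a sketch under stated assumptions. It assumes Fisher's exact test at alpha = 0.05, and the true mortality rates passed in (0.95 and 0.30) are hypothetical, so it is not expected to reproduce the 65% quoted in the question.

```python
from math import comb


def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    return sum(
        prob(x)
        for x in range(max(0, col1 - row2), min(row1, col1) + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


def binom_pmf(k, n, p):
    """Probability of exactly k events among n subjects with event rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_power(n_per_group, p_control, p_treated, alpha=0.05):
    """Probability that the test gives P < alpha, given the true event rates."""
    power = 0.0
    for dead_c in range(n_per_group + 1):
        for dead_t in range(n_per_group + 1):
            p_value = fisher_p(
                dead_c, n_per_group - dead_c, dead_t, n_per_group - dead_t
            )
            if p_value < alpha:
                # Weight each "significant" outcome by how likely it is to occur.
                power += binom_pmf(dead_c, n_per_group, p_control) * binom_pmf(
                    dead_t, n_per_group, p_treated
                )
    return power


# Hypothetical true mortality: 95% with vehicle, 30% with drug, 6 mice per group.
power = exact_power(6, p_control=0.95, p_treated=0.30)
type_ii_error = 1 - power
```

Changing `n_per_group` shows how quickly the risk of type II error falls as the groups grow, which is the practical use of a power calculation before the experiment.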
When it cannot be assumed that the distribution is Gaussian, it is best to use a non-parametric test (or to test for deviation from the Gaussian distribution using the Kolmogorov-Smirnov test).
Descriptive statistics for continuous variables:
• Parametric data: mean and standard deviation or standard error
• Non-parametric data: median and interquartile range
Non-parametric tests, also called rank tests (they don’t assume that the variation is equal on both sides of the mean)
Most commonly used tests:
• Mann-Whitney U test for unpaired data (2 groups)
• Wilcoxon test for paired data (2 groups)
• Kruskal-Wallis test for unpaired data (3 or more groups)
• Spearman correlation test
Quick note: Bonferroni’s correction for multiple comparisons. To account for the increased play of chance with multiple attempts, it is recommended to apply one of two equivalent corrections: multiply each P value by N, where N is the number of comparisons, or divide 0.05 by N to obtain a new significance threshold.

For discrete or discontinuous variables: dedicated non-parametric tests
Descriptive statistics for discrete variables:
• Number and percent
• Incidence (over time), prevalence (at a given time)
Non-parametric test: the most common scenario is one of exposure/event, in which both the exposure and the event are discrete variables.

                 Event: yes    Event: no
Exposure: yes
Exposure: no

• Chi-square test: the standard test to explore the asymmetry of the events in 2 or more groups (when the expected count in one of the cells is less than 5, it is recommended to use Fisher’s exact test). The table can be built as 2x2 but also larger, such as 2x3, 2x4, 3x4, or any size. The derived P value is a P for rejecting the null hypothesis H0 of lack of asymmetry. For larger tables, 2x2 tests generally follow to compare subgroups; this, however, requires Bonferroni correction.
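The exposure/event table and the Bonferroni threshold can both be worked through in a few lines. The sketch below assumes a Pearson chi-square test without continuity correction and the standard Bonferroni form (alpha divided by the number of comparisons); the counts in the example table are invented, and the function names are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test for a 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value, smallest_expected_count).
    Assumes every row and column total is greater than zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    smallest_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            smallest_expected = min(smallest_expected, expected)
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value, smallest_expected


def bonferroni_threshold(alpha, n_comparisons):
    """Significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


# Invented counts: rows are exposure yes/no, columns are event yes/no.
stat, p, min_expected = chi_square_2x2([[30, 10], [20, 20]])
if min_expected < 5:
    print("expected count below 5: prefer Fisher's exact test")
significant = p < bonferroni_threshold(0.05, n_comparisons=3)
```

Here the smallest expected count is 15, so the chi-square approximation is reasonable; with three comparisons the threshold drops from 0.05 to about 0.0167.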
Non-parametric test: occasionally you will be trying to determine whether one continuous variable predicts one dichotomous variable (e.g. a biomarker predicting an event such as death).
Logistic regression analysis, also called a logit model, is used to model dichotomous outcome variables. In the logit model, the log odds of the outcome are modeled as a linear combination of the predictor variables.

CHOOSING THE RIGHT TOOLS
VARIABLE OF INTEREST
CONTINUOUS, PARAMETRIC
Unpaired data:
• 2 groups: T test
• >2 groups: ANOVA
Paired data:
• 2 groups: T test
• >2 groups: ANOVA
Correlation: Pearson test
CONTINUOUS, NON-PARAMETRIC
Unpaired data:
• 2 groups: Mann Whitney
• >2 groups: Kruskal Wallis
Paired data:
• 2 groups: Wilcoxon
• >2 groups: Wilcoxon (pairwise; the Friedman test is the usual overall test)
Correlation: Spearman
DISCRETE
All discrete variables:
• Chi-square test
• (Fisher exact test)
Continuous predicting dichotomous outcome:
• Logistic regression

Security/Confidentiality
• Keep identifying data (name, SSN) in a separate table/file, secure and protected.
• Link the rest of the DB to this table via a Subject ID that has no meaning external to the DB
• Restrict access to identifying data
• Password-protect at both the OS and application levels
• Audit entries and updates

DON’T CLOSE THE DOOR TO SERENDIPITY
The basis of translational research
You will not get ideas ‘minding your own business’ or spending all your time at the bench: you need to COMMUNICATE, SHARE, EXPLORE. And if you are trying to impact the outcome of a disease, you need to know about that specific disease.
Be open to Discovery
When you start a path, you learn something with every step you take (student stage).
If you complete the path multiple times, you will know everything about the path (practitioner stage)
... but if you never leave the path ...
You will never discover anything new, and the path will appear smaller and smaller to you.
Challenge the dogmas, explore where others have not, search for the unknowns, and you will find out whether the path you are on is really the best possible path (scientist stage).

ANALYZING THE RESULTS:
“I can always tell who ‘has it’ and who ‘doesn’t’ to be a successful researcher: if you can finish an experiment and not look at the results right away, you don’t have it.”
RESEARCH IS ABOUT CURIOSITY, but let’s be organized
• Start with the quality controls: to be able to draw reliable conclusions, it is essential to determine whether the experiment was performed accurately:
– Did the “positive” control display the expected characteristics?
– Did the “negative” control display the expected characteristics?
– Were there any unexpected problems?
• Are all variables, animals or patients accounted for?
– Is something missing? Could there be a problem with the integrity of the data, in terms of its completeness?
• Are the variables available in an electronic format?
– And how were they obtained? Were they entered by hand? Could there have been an error? Are the source documents available?
• Organize the spreadsheet: you can use any sort of software. Excel ends up being the most commonly used, but it is not ideal for statistical analysis or for creating graphs. The spreadsheet should be:
– Comprehensive: it includes all the different sets of experiments of a model/project, so it allows you to follow trends over time
– Specific: it lists the date and the conditions of each experiment, with the ID of the batch of cells, animals, or individuals, of the operator, and/or of the batch of reagent(s) and/or drugs
• Follow means and medians, and use graphs: a picture is worth a thousand words, and a visual representation of the process allows you to understand the results better. A scatter plot is a very good tool.
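As an illustration of a "specific" record layout and of following means and medians together, here is a small sketch. The column names, dates, batch and operator IDs, and values are all invented; only the Python standard library is used.

```python
from statistics import mean, median, quantiles

# Hypothetical records: one row per measurement, with the date, batch and
# operator kept next to the value so that trends can be traced later.
records = [
    {"date": "2025-01-10", "batch": "B1", "operator": "op1", "group": "vehicle", "value": 1.0},
    {"date": "2025-01-10", "batch": "B1", "operator": "op1", "group": "vehicle", "value": 2.0},
    {"date": "2025-01-17", "batch": "B2", "operator": "op2", "group": "vehicle", "value": 3.0},
    {"date": "2025-01-17", "batch": "B2", "operator": "op2", "group": "vehicle", "value": 4.0},
    {"date": "2025-01-17", "batch": "B2", "operator": "op2", "group": "vehicle", "value": 100.0},
]


def summarize(rows, group):
    """Mean, median and interquartile range for one experimental group."""
    values = sorted(r["value"] for r in rows if r["group"] == group)
    q1, _, q3 = quantiles(values, n=4)  # needs at least two values
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "iqr": (q1, q3),
    }


summary = summarize(records, "vehicle")
print(summary["mean"], summary["median"])  # prints 22.0 3.0
```

A single extreme value (100.0) pulls the mean to 22 while the median stays at 3, which is why the two should be read side by side and checked against a plot of the individual points.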
• Don’t over-interpret differences, but also don’t rely too much on P values: the mean is a virtual number that is affected by all elements, including the extremes; therefore a difference in the mean may be driven by extremes. P values tell you about statistical significance but not about “importance”. A 2-fold increase may not be statistically significant yet be very important and worth exploring further.

ANALYZING THE RESULTS: Step 1, 2, 3
• Hierarchic approach to the analyses: once you have controlled for quality and you are confident about it, it is time to “open the gift boxes and see what Santa brought you”. “Which box should you start with?” Follow the Aims/Questions. The analysis should always be dichotomous (yes/no, superior/inferior, is treatment A>B?), and then you test the next hypothesis.
– If you have multiple treatments, you may have to do a multiple-group test (e.g. ANOVA) to analyze whether there is an asymmetry among the 4 groups A, B, C, D, and then follow up with a T test with Bonferroni correction for “is A>B?”. For instance, if C and D are 2 controls, it may be less interesting to compare C to D.
HOWEVER, for molecular, cellular, and animal studies this level of certainty is often not necessary. Consider that the P value threshold of 0.05 is absolutely arbitrary, and hence to hold a result more or less valuable because it does or does not meet the 0.05 value is by definition arbitrary. The P value should be considered in terms of the probability of type I error: the greater the P, the larger the probability of error. If you attempt multiple experiments, the probability of an error by “chance” increases.
• Reproducibility: if one experiment produces a P value of 0.06, and a purposefully repeated experiment to validate it again provides a P value of 0.06, the probability that two independent experiments would both give such results by chance alone is about 0.06 x 0.06 = 0.0036. So the best gain is simply to REPEAT THE EXPERIMENT AND SEE IF IT IS REPRODUCIBLE.

ANALYZING THE RESULTS: Consistency
• Look for consistency: remember the concerns about multiple comparisons and the increased risk of type I errors, and therefore validate the results by looking for consistency across:
– Different endpoints (e.g. TTC and troponin)
– Different sets of experiments (e.g. done on different days)
– Different experimental groups/models (e.g. KOs and inhibitors)
– Different time points (e.g. 1 day and 7 days)
• Inconsistencies: inconsistencies may reveal errors or inaccuracies in the methodologies, but they may also hint at a biologic system that is more complex than expected. Do not disregard “negative” data; analyze it thoroughly. Imagine if Alexander Fleming had thrown away the plate of bacteria with mold on it and never discovered penicillin!

ANALYZING THE RESULTS: Progress report
• Progress report: it is important to analyze the results after each set of experiments to see whether the hypothesis remains supported; if not, brainstorm.
– Expected results: proceed with experiments
– Unexpected results: stop and discuss
  – Technical problem: address and then proceed
  – Inconsistency with the hypothesis: repeat the experiment? Or develop a new hypothesis/aims and new experiments
Only fools never change their mind. Beware of cognitive biases.
– New data: new hypothesis?
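Two of the probability arguments made in the results slides reduce to one-line formulas: multiple comparisons inflate the chance of at least one type I error, while an independent replication shrinks the chance that a finding is a fluke. The sketch below assumes the tests or experiments are independent; the function names are made up for this example.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive in n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def all_by_chance(p_threshold, n_experiments=2):
    """Chance that n independent experiments all reach P <= threshold when H0 is true."""
    return p_threshold ** n_experiments


print(f"{familywise_error(0.05, 10):.3f}")  # prints 0.401
print(f"{all_by_chance(0.06):.4f}")         # prints 0.0036
```

Ten unplanned comparisons at P<0.05 carry roughly a 40% chance of at least one false positive, whereas two independent experiments both reaching P≤0.06 under H0 would happen about 4 times in 1000.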