Applications of Statistics in Research

Bandit Thinkhamrop, Ph.D. (Statistics)
Department of Biostatistics and Demography
Faculty of Public Health, Khon Kaen University

Begin at the conclusion

Type of the study outcome: the key to selecting appropriate statistical methods

Study outcome
- Dependent variable or response variable
- Focus on the primary study outcome if there is more than one

Type of the study outcome
- Continuous
- Categorical (dichotomous, polytomous, ordinal)
- Numerical (Poisson) count
- Event-free duration

Continuous outcome
Primary target of estimation:
- Mean (SD)
- Median (Min:Max)
- Correlation coefficient: r and ICC
Modeling:
- Linear regression: the model coefficient = mean difference
- Quantile regression: the model coefficient = median difference
Example:
- Outcome = weight, BP, score of ?, level of ?, etc.
- RQ: Factors affecting birth weight

Categorical outcome
Primary target of estimation:
- Proportion or risk
Modeling:
- Logistic regression: the model coefficient = odds ratio (OR)
Example:
- Outcome = disease (y/n), dead (y/n), cured (y/n), etc.
- RQ: Factors affecting low birth weight

Numerical (Poisson) count outcome
Primary target of estimation:
- Incidence rate (e.g., rate per person-time)
Modeling:
- Poisson regression: the model coefficient = incidence rate ratio (IRR)
Example:
- Outcome = total number of falls / total time at risk of falling
- RQ: Factors affecting falls among the elderly

Event-free duration outcome
Primary target of estimation:
- Median survival time
Modeling:
- Cox regression: the model coefficient = hazard ratio (HR)
Example:
- Outcome = overall survival, disease-free survival, progression-free survival, etc.
- RQ: Factors affecting survival

The outcome determines the statistics

Outcome        Statistic                                   Model
Continuous     Mean, Median                                Linear regression
Categorical    Proportion (prevalence or risk)             Logistic regression
Count          Rate per "space"                            Poisson regression
Survival       Median survival; risk of events at T(t)     Cox regression
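The mapping from outcome type to method summarized above can be written down as a small lookup table. This is only a sketch: the dictionary `METHODS`, its field names, and the helper `choose_method` are hypothetical names introduced for illustration, and the entries simply restate the slides.

```python
# Hypothetical lookup table restating the slides; names are illustrative only.
METHODS = {
    "continuous": {
        "estimate": "mean (SD) or median (min:max)",
        "model": "linear regression",
        "coefficient": "mean difference",
    },
    "categorical": {
        "estimate": "proportion or risk",
        "model": "logistic regression",
        "coefficient": "odds ratio (OR)",
    },
    "count": {
        "estimate": "incidence rate",
        "model": "Poisson regression",
        "coefficient": "incidence rate ratio (IRR)",
    },
    "event-free duration": {
        "estimate": "median survival time",
        "model": "Cox regression",
        "coefficient": "hazard ratio (HR)",
    },
}


def choose_method(outcome_type: str) -> dict:
    """Return the estimate, model, and coefficient for a given outcome type."""
    key = outcome_type.strip().lower()
    if key not in METHODS:
        raise ValueError(f"Unknown outcome type: {outcome_type!r}")
    return METHODS[key]


if __name__ == "__main__":
    for outcome in METHODS:
        m = choose_method(outcome)
        print(f"{outcome}: {m['model']} -> {m['coefficient']}")
```

The point of the lookup is the one the slides make: once the type of the primary outcome is fixed, the target of estimation and the model follow from it.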
Statistics quantify errors for judgments
- Parameter estimation [95% CI]
- Hypothesis testing [P-value]

Caution about biases
- Selection bias (SB)
- Information bias (IB)
- Confounding bias (CB)
Research design:
- Prevent them
- Minimize them
If data are available:
- SB and IB can be assessed
- CB can be adjusted for using multivariable analysis

Generate a mock data set
General format of the data layout:

id    y     x1    x2    x3
1
2
3
4
5
…
n

Generate a mock data set: continuous outcome example

id    y     x1    x2    x3
1     2     1     21    22
2     2     0     12    19
3     0     1     4     20
4     2     0     89    21
5     14    1     0     18
…
n     6     0     45    21

Summary statistic for y: Mean (SD)

Common types of statistical goals
- Single measurements (no comparison)
- Difference (compared by subtraction)
- Ratio (compared by division)
- Prediction (diagnostic test or predictive model)
- Correlation (examine a joint distribution)
- Agreement (examine concordance or similarity between pairs of observations)

Back to the conclusion

Outcome        Appropriate statistical methods
Continuous     Mean, Median
Categorical    Proportion (prevalence or risk)
Count          Rate per "space"
Survival       Median survival; risk of events at T(t)

For each of these, report the magnitude of effect, its 95% CI, and the P-value. Answer the research question based on the lower or upper limit of the CI.

Always report the magnitude of effect and its confidence interval
Absolute effects:
- Mean, mean difference
- Proportion or prevalence, rate or risk, rate or risk difference
- Median survival time
Relative effects:
- Relative risk, rate ratio, hazard ratio
- Odds ratio
Other magnitudes of effect:
- Correlation coefficient (r), intra-class correlation (ICC)
- Kappa
- Diagnostic performance
- Etc.
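To make the distinction between absolute and relative effects concrete, here is a minimal sketch that computes a risk difference, a relative risk, and an odds ratio from two groups. The function name `effect_measures` and the counts in the usage example (40 events out of 50 versus 25 out of 50) are assumptions chosen for illustration, not data from a real study.

```python
def effect_measures(events_a: int, n_a: int, events_b: int, n_b: int) -> dict:
    """Absolute and relative effect measures for two groups.

    Assumes every cell of the 2x2 table is non-zero; otherwise the
    ratios below are undefined.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return {
        "risk_a": risk_a,
        "risk_b": risk_b,
        "risk_difference": risk_a - risk_b,  # absolute effect (subtraction)
        "relative_risk": risk_a / risk_b,    # relative effect (division)
        "odds_ratio": odds_a / odds_b,       # relative effect (division)
    }


if __name__ == "__main__":
    # Hypothetical counts: 40/50 events in group A, 25/50 in group B.
    result = effect_measures(40, 50, 25, 50)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

With these hypothetical counts the risk difference is 0.30, the relative risk 1.60, and the odds ratio 4.00: three different summaries of the same comparison, none of which is complete without its confidence interval.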
Touch the variability (uncertainty) to understand statistical inference

id      A (x)    x - mean    (x - mean)^2
1       2        -2          4
2       2        -2          4
3       0        -4          16
4       2        -2          4
5       14       10          100
Sum     20       0           128

- Mean (measure of central tendency): $\bar{X} = (2 + 2 + 0 + 2 + 14) / 5 = 20 / 5 = 4$
- Median = 2 (sorted data: 0, 2, 2, 2, 14)
- Variance: $SD^2 = 128 / (5 - 1) = 32.0$
- Standard deviation (measure of variation): $SD = \sqrt{32.0} = 5.66$

Standard deviation (SD) = roughly the average distance between each data item and the mean:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

where $n - 1$ is the degrees of freedom.

Same mean BUT different variation

id        A       B       C
1         2       0       3
2         2       3       4
3         0       4       4
4         2       5       4
5         14      8       5
Sum       20      20      20
Mean      4       4       4
SD        5.66    2.92    0.71
Median    2       4       4

- A: heterogeneous data, skewed distribution
- B: heterogeneous data, symmetric distribution
- C: homogeneous data, symmetric distribution

Facts about variation
Because of variability, repeated samples will NOT yield the same statistic, such as a mean or a proportion:
- Statistics vary from study to study because of the role of chance
- It is hard to believe that the statistic equals the parameter
- Thus we need statistical inference to estimate the parameter from the statistic obtained in a study
Data that vary widely = heterogeneous data.
Heterogeneous data require a large sample size to reach a conclusive finding.

The Histogram, The Frequency Curve, and Area Under The Frequency Curve

id    A     B
1     2     4
2     2     3
3     0     5
4     2     4
5     14    4

[Figures: the histogram, the frequency curve, and the area under the frequency curve for data sets A and B, each drawn on a horizontal axis from 0 to 14.]

Central Limit Theorem
[Figure: whatever the shape of the raw data (right-skewed X1, symmetric X2, left-skewed X3), the means of repeated samples, $\bar{X}_1, \ldots, \bar{X}_n$, are normally distributed.]
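The central limit theorem claim can be checked with a small simulation. The sketch below is an illustration under stated assumptions: the population is a hypothetical right-skewed (exponential) one, and the sample size of 30, the 2,000 repeats, and the helper name `sample_means` are arbitrary choices that do not come from the slides.

```python
import random
import statistics
from math import sqrt


def sample_means(population, n, repeats, seed=0):
    """Draw `repeats` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]


# Hypothetical right-skewed population (exponential with mean about 1).
_rng = random.Random(42)
population = [_rng.expovariate(1.0) for _ in range(10_000)]

n = 30
means = sample_means(population, n=n, repeats=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

if __name__ == "__main__":
    # Raw data are skewed: the median sits well below the mean.
    print("raw data   mean, median:", round(pop_mean, 3), round(statistics.median(population), 3))
    # Sample means are nearly symmetric: mean and median almost coincide.
    print("means      mean, median:", round(statistics.fmean(means), 3), round(statistics.median(means), 3))
    # The SD of the sample means is close to SD / sqrt(n), the standard error.
    print("SD of means vs SD/sqrt(n):", round(statistics.stdev(means), 3), round(pop_sd / sqrt(n), 3))
```

Even though the raw data are strongly skewed, the sample means cluster symmetrically around the population mean, and their spread is close to the population SD divided by the square root of n.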
Central Limit Theorem (continued)
[Figure: distribution of the raw data (X1, X2, X3) compared with the distribution of the sampling means ($\bar{X}_1, \ldots, \bar{X}_n$). With a large sample, the sampling means follow the (theoretical) normal distribution.]
- Raw data: many X, summarized by $\bar{X}$ and SD
- Sampling means: many $\bar{X}$, summarized by their own mean and the standard error (SE)
- The standard deviation of the sampling mean is the standard error (SE), estimated by $SE = SD / \sqrt{n}$
- Standardized, for whatever n: mean = 0, standard deviation = 1

(Theoretical) Normal Distribution
- Mean ± 1 SD covers 68.27% of the area under the curve (AUC)
- Mean ± 2 SD covers 95.45% of the AUC
- Mean ± 3 SD covers 99.73% of the AUC

Sample: n = 25, $\bar{X} = 52$, SD = 5
Population: inferred from the sample by
- Parameter estimation [95% CI]
- Hypothesis testing [P-value]
Critical values: Z = 1.64 (90%), Z = 1.96 (95%), Z = 2.58 (99%)

$$SE = \frac{SD}{\sqrt{n}} = \frac{5}{\sqrt{25}} = \frac{5}{5} = 1$$

Parameter estimation [95% CI]
Sample: n = 25, $\bar{X} = 52$, SD = 5, SE = 1
95% CI: 52 - 1.96(1) to 52 + 1.96(1) = 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.

Hypothesis testing
Sample: n = 25, $\bar{X} = 52$, SD = 5, SE = 1
$H_0: \mu = 55$
$H_A: \mu \neq 55$

$$Z = \frac{\bar{X} - \mu_0}{SE} = \frac{52 - 55}{1} = -3$$

The sample mean of 52 lies 3 SE below the hypothesized mean of 55.
P-value = 1 - 0.9973 = 0.0027
If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or more extreme is 0.0027.

P-value is the magnitude of chance, NOT the magnitude of effect
P-value < 0.05 = significant findings
- Small chance of being wrong in rejecting the null hypothesis
- If in fact there is no [effect], it is unlikely to get the [effect] = [magnitude of effect] or more extreme
Significance DOES NOT MEAN importance.
Any extra-large study can give a very small P-value even if the [magnitude of effect] is very small.

P-value is the magnitude of chance, NOT the magnitude of effect
P-value > 0.05 = non-significant findings
- High chance of being wrong in rejecting the null hypothesis
- If in fact there is no [effect], the [effect] = [magnitude of effect] or more extreme can occur by chance
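The worked example above (n = 25, sample mean 52, SD 5, null value 55) can be reproduced in a few lines. This is a minimal sketch that follows the slides in using the normal (Z) approximation; the function name `one_sample_z` is a hypothetical helper, and the second call uses made-up numbers to illustrate the point about very large studies.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(mean, sd, n, mu0, level=0.95):
    """Standard error, confidence interval, Z statistic, and two-sided P-value.

    Uses the normal approximation, as on the slides.
    """
    se = sd / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    ci = (mean - z_crit * se, mean + z_crit * se)
    z = (mean - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, ci, z, p


if __name__ == "__main__":
    # Values from the slides.
    se, ci, z, p = one_sample_z(mean=52, sd=5, n=25, mu0=55)
    print(f"SE = {se:.2f}")                         # SE = 1.00
    print(f"95% CI = {ci[0]:.2f} to {ci[1]:.2f}")   # 95% CI = 50.04 to 53.96
    print(f"Z = {z:.2f}, P = {p:.4f}")              # Z = -3.00, P = 0.0027

    # Hypothetical large study: a tiny difference of 0.1 is still "significant".
    _, _, z_big, p_big = one_sample_z(mean=54.9, sd=5, n=10_000, mu0=55)
    print(f"Z = {z_big:.2f}, P = {p_big:.4f}")      # Z = -2.00, P = 0.0455
```

The second call shows why the P-value measures chance rather than effect: with 10,000 observations a difference of only 0.1 units falls below 0.05, even though its magnitude may be of no practical importance.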
Non-significance DOES NOT MEAN no difference, equality, or no association.
Any small study can give a very large P-value even if the [magnitude of effect] is very large.

P-value vs. 95% CI (1)
An example of a study with a dichotomous outcome: a study compared the cure rate between Drug A and Drug B.
Setting:
- Drug A = alternative treatment
- Drug B = conventional treatment
Results:
- Drug A: n1 = 50, Pa = 80%
- Drug B: n2 = 50, Pb = 50%
- Pa - Pb = 30% (95% CI: 26% to 34%; P = 0.001)

P-value vs. 95% CI (2)
[Figure: the estimate Pa - Pb = 30% (95% CI: 26% to 34%; P < 0.05) plotted on an axis whose two sides are labeled "Pa > Pb" and "Pb > Pa".]

P-value vs. 95% CI (3)
[Figure adapted from: Armitage, P. and Berry, G. Statistical Methods in Medical Research. 3rd edition. Blackwell Scientific Publications, Oxford, 1994, page 99.]

P-value vs. 95% CI (4)
[Same figure.] There was a statistically significant difference between the two groups.

P-value vs. 95% CI (5)
[Same figure.] There was no statistically significant difference between the two groups.

P-value vs. 95% CI (6)
Safe tips:
- Always report the 95% CI together with the p-value; do NOT report the p-value alone
- Always interpret results based on the lower or upper limit of the confidence interval; the p-value is optional
- Never interpret a p-value > 0.05 as an indication of no difference or no association; only the CI can provide this message

Q&A

Thank you