Experimental Design: β-error in the Orthopaedic Literature
Dr. M. Ghert

Hypothesis testing
• Does one method of treatment have an improved effect on outcome compared to another?
• Is A better than B?

Hypothesis testing and the criminal justice system
• Criminal defendant is: “Innocent until proven guilty”
• Null hypothesis: “There is no difference between treatment groups until we find one” (A = B)

Goals
• The courtroom: to prove guilt beyond the shadow of a doubt
• Scientific investigation: to prove that A is better than B beyond the shadow of a doubt

Why?
• The law: so we don’t convict someone who is innocent
• Scientific investigation: so we don’t accept a treatment method as being better than another when it is not

What is this type of error?
• Alpha (Type-I) error: we erroneously reject the null hypothesis
• In other words, we decide that A is better than B when it really isn’t (or convict an innocent person)

How do we avoid alpha error?
• We ask ourselves: “How much of a chance am I willing to take?”
• “If I find that A is better than B, how sure do I need to be of these results to change my orthopaedic practice?”

By convention
• I am willing to accept a 5% chance that A really isn’t better than B
• Therefore I am willing to accept a P-value of <0.05
• A P-value of <0.05 is considered statistically significant

The other kind of error
• The courtroom: a guilty person walks
• Scientific investigation: we fail to discover that A is better than B when it really is → we lose out on improving our practice

What is this type of error?
• Beta (Type-II) error
• We erroneously accept the null hypothesis
• We decide that A = B when really A ≠ B (we decide someone is innocent when they really are guilty)

What causes beta error?
• The two groups really are distinct (A > B), but the P-value is >0.05
• Variance is high within each group
(Figure: widely overlapping distributions of A and B)

Avoiding beta error: decrease variance
(Figure: narrower distributions of A and B separate cleanly)

Avoiding beta error: increase sample size
• Reamed vs. unreamed nails for open tibia fractures → infection rate

  Patients | Reamed | Unreamed | P-value
  80       | 6%     | 4%       | 0.24 (NS)
  500      | 6%     | 4%       | 0.03*

Beta error
• In the study with only 80 patients, we committed a beta error
• We failed to achieve a significant P-value because the sample size was too small

In designing an experiment, how do we avoid beta error?
• We find out how large a sample size is needed to achieve a beta error of less than 0.20
• This is a 20% chance of committing a beta error

Back to the courtroom
• We accept a 5% chance of convicting an innocent person (5% alpha error)
• We accept a 20% chance of letting a guilty person walk (20% beta error)
• We are more willing to let a guilty person go than to convict an innocent person

Scientific investigation
• We are more willing to miss out on an improved treatment than to accept one that really isn’t so good
• First do no harm

Calculating sample size
• Set beta = 0.20
• Variance
• Effect size

Variance
• Clinician determined
• Expected variance within a population
• Based on retrospective studies
• Example: Harris Hip Score variance within a THA population is usually ±15%

Effect size
• Based on clinical judgement
• For example, when is a 1% difference important?
  – No: 90% vs. 91% excellent clinical outcome
  – Yes: 1% vs. 2% fatal pulmonary embolism

Effect size
• Small: effect is difficult to detect (1% increase in excellent results)
• Medium: effect is intermediate (10% decreased incidence of infection)
• Large: effect is obvious (50% increase in survival)

(Figure: standard effect size graphs)

Rule of thumb
• For a moderate effect size and beta = 0.20
• Generally need 120 patients/subjects per arm

Clinical example
• Number of patients needed to detect:
  – 1% difference in fatal pulmonary embolism rates → 10,000 patients
  – 50% difference in fatal pulmonary embolism rates → 80 patients
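To make the sample-size calculation concrete, here is a minimal sketch in Python (not part of the original slides), assuming the statsmodels package is available. The within-group standard deviation of about 15 points is loosely taken from the Variance slide; the 10-point clinically important difference is a hypothetical choice for illustration only.

```python
# A priori sample-size sketch for a two-arm study with a continuous outcome.
# Assumptions (hypothetical): Harris Hip Score with SD ~15 points, and a
# clinically important difference of 10 points between treatment arms.
from statsmodels.stats.power import TTestIndPower

alpha = 0.05                        # conventional 5% type-I error
power = 0.80                        # conventional 80% power (beta = 0.20)
sd = 15.0                           # expected within-group spread
clinically_important_diff = 10.0    # smallest effect worth detecting

effect_size = clinically_important_diff / sd   # standardized effect (Cohen's d)
n_per_arm = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power, alternative="two-sided"
)
print(f"Patients needed per arm: {n_per_arm:.1f}")   # roughly 36 under these assumptions
```

Shrinking the difference you want to detect, or increasing the expected spread, drives the required sample size up quickly, which is why small effect sizes demand very large trials.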
Power
• The POWER of a study is the complement of the beta error (power = 1 − β)
• Therefore we strive for a power of 80%

Power analysis
• If the number of subjects is already determined, calculate the power directly
• A priori power analysis vs. post hoc

Evidence-based research
• The gold standard: prospective, randomized, (double-blind) studies
  – with adequate statistical power
  – + appropriate, well-defined endpoints (infection, death, fracture union)
  – + adequate follow-up
• (Double-blinding is generally not possible in surgical studies)

How much of the orthopaedic literature achieves this gold standard?
• Freedman and Bernstein, JBJS 1999: the 1997 volumes of JBJS and CORR
• 86 clinical studies using hypothesis testing
• Sample size determination in only 6%
• Post hoc power analysis in 2%

Freedman and Bernstein, cont.
• 69% had negative results (P > 0.05)
• Of these, 3% had adequate statistical power
• Average sample-size deficiency was 85% of the required number
• “No significant difference” is not accurate for these studies

Tornetta et al., AAOS 2000
• Orthopaedic trauma literature, 32 journals
• 117 studies with negative results
• All underwent power analysis by Tornetta et al.

Tornetta et al., cont.
• Primary outcomes: power average 25% (range 2%-99%)
• Secondary outcomes: power average 19% (range 2%-99%)
• Rate of β-error >90% for both outcomes

Example
• ORIF vs. nonoperative treatment for calcaneal fractures: “no difference in outcome”
• Because of operative risks, the authors conclude that nonoperative treatment is superior
• The power of the study was 3%
• Conclusions may be flawed and differences may exist

Orthopaedics is not alone
• Inadequate power to detect clinically meaningful differences has been reported in:
  – Emergency medicine
  – Cardiovascular research
  – Nursing
  – Internal medicine
  – General practice
  – Rehab
  – Hand surgery

Summary
• Null hypothesis: A = B (innocent until proven guilty)
• Alpha (Type-I) error: erroneously rejecting the null hypothesis
  – We find a difference when there really isn’t one
  – Courtroom analogy: convict an innocent person

Summary
• Beta (Type-II) error: erroneously accepting the null hypothesis
  – We fail to find a difference when there really is one
  – Courtroom analogy: a guilty person walks

Summary
• Sample size is determined by power and effect size
• Acceptable by convention:
  – 5% type-I error
  – 20% type-II error
  – 80% power
  – (moderate effect size)

Conclusion
• Read the literature carefully
• If there is “no significant difference between groups,” was there adequate power to reach this conclusion?
• Experimental design requires careful planning and sample size determination

Rule of thumb
• Moderate effect size
• 80% power
• Need 120 subjects per study arm
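A “negative” study can be checked the same way the reviews above did, with a post hoc power calculation. Here is a minimal sketch (not part of the original slides), again assuming statsmodels, applied to the earlier illustrative reamed vs. unreamed numbers (6% vs. 4% infection, 40 patients per arm):

```python
# Post hoc power sketch for a "negative" two-proportion comparison.
# Illustrative numbers only: 6% vs. 4% infection, 40 patients per arm (80 total).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

h = proportion_effectsize(0.06, 0.04)   # Cohen's h for the two infection rates
achieved_power = NormalIndPower().solve_power(
    effect_size=h, nobs1=40, alpha=0.05, alternative="two-sided"
)
print(f"Achieved power: {achieved_power:.0%}")   # on the order of 6% in this sketch
```

With power this low, “no significant difference” says more about the sample size than about the treatments, which is exactly the β-error described in the Freedman/Bernstein and Tornetta reviews.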
AAOS Basic Science SAE
In the design of an experiment, the choice of an appropriate sample size is dependent on several factors. What is the critical calculation that characterizes the potential of the study design to successfully address the research question?
1. Regression analysis
2. Power analysis
3. Determination of correlation coefficient
4. Determination of mean
5. Determination of t-test
Answer: 2. Power analysis

AAOS Basic Science SAE
Adequate sample sizes are necessary in clinical research design. The minimum number of subjects per group needed in a clinical trial is that which will
1. Minimize type I error
2. Maximize beta, the risk of type II error
3. Provide a p-value below 0.05
4. Provide a p-value below the given alpha threshold
5. Ensure that real and clinically significant differences are statistically significant
Answer: 5. Ensure that real and clinically significant differences are statistically significant

AAOS Basic Science SAE
Which of the following terms best describes the probability of making a decision that a treatment has an effect on an outcome in an experiment when in reality there is no effect?
1. Alpha
2. Type II error
3. Power
4. Beta
5. Effect size
Answer: 1. Alpha

AAOS Basic Science SAE
When evaluating two treatment protocols using statistical methods such as Student’s t-test or analysis of variance, a ‘p value’ of less than 0.05 is best described as
1. A 5% chance that a difference between the populations will be falsely accepted
2. A 5% chance that a true difference between the populations will be missed
3. A 5% difference in the mean between the two populations
4. A 5% difference in the standard deviation of the two populations
5. A 5% chance of telling the difference between the populations with the existing number of patients
Answer: 1. A 5% chance that a difference between the populations will be falsely accepted

Thank-you