One-way ANOVA

• Example
• Analysis of Variance Hypotheses
• Model & Assumptions
• Analysis of Variance
• Multiple Comparisons
• Checking Assumptions

Example: Days Absent by Job Type

[Figure: dot plots of days absent, on a common scale of roughly 2 to 16 days, for each of the four job types A, B, C and D.]

Analysis of Variance

• Analysis of Variance (ANOVA) is a widely used statistical technique that partitions the total variability in our data into components of variability, which are used to test hypotheses about equality of population means.
• In one-way ANOVA, we wish to test the hypothesis
  H0: μ1 = μ2 = … = μk
  against
  Ha: not all population means are the same.

Assumptions

• Each population being sampled is normally distributed and all populations are equally variable.
• Normality can be checked by skewness/kurtosis or normal probability plots. If any of the samples does not look like it comes from a normal population, the assumption is not met (unless the samples that do not look normal have a large sample size, n > 30).
• Equal variability can be checked by comparing standard deviations. If no standard deviation is more than twice as large as another, equal variability can be assumed.

Example: Are the population mean days absent the same for all 4 job types?

ANOVA table

Source           SS    df        MS       F    p-value
Treatment   763.823     3  254.6076   69.48   5.72E-24
Error       351.795    96    3.6645
Total     1,115.618    99

H0: μA = μB = μC = μD
H1: Not all μ's are equal

We can be almost 100% confident that population mean days absent differ in some way between the 4 job types.

Multiple Comparisons

• A significant F-test tells us that at least two of the underlying population means are different, but it does not tell us which ones differ from the others.
• We need extra tests to compare all the means, which we call multiple comparisons.
• We look at the difference between every pair of group population means, as well as the p-value for each difference.
• When we have k groups, there are
  C(k, 2) = k! / (2!(k − 2)!) = k(k − 1)/2
  possible pair-wise comparisons. For example, 4 groups give 4 × 3 / 2 = 6 comparisons.

Multiple Comparisons

• If we estimate each comparison separately with 95% confidence, the overall confidence will be less than 95%.
• So, using ordinary pair-wise comparisons (i.e. lots of individual t-tests), we tend to find too many significant differences between our sample means.
• We need to adjust our p-values so that we identify the true differences with 95% confidence across the entire set of comparisons.
• These methods are known as multiple comparison procedures.

Multiple Comparisons

• We use Tukey simultaneous comparisons.
• Tukey simultaneous comparisons overcome the problem of unadjusted pair-wise comparisons finding too many significant differences (i.e. p-values that are too small).

Tukey Pair-wise Comparisons

Tukey simultaneous comparison t-values (d.f. = 96); group means are shown beside each group label.

               B        D        C        A
            2.73     5.03     6.74    10.11
B    2.73
D    5.03   4.15
C    6.74   7.47     3.05
A   10.11  14.04     9.24     6.36

critical values for experimentwise error rate:
  0.05   2.62
  0.01   3.21

We can be at least 99% confident that Job Type A has the highest population mean days absent. We can also be more than 99% confident that Job Types C and D have a larger population mean days absent than B. We can only be 95% confident that Job Type C has a higher population mean than D.
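As a quick cross-check of the ANOVA table above, here is a minimal sketch (in Python, using SciPy) that recomputes the mean squares, the F statistic and the p-value from the reported sums of squares and degrees of freedom. Only the numbers quoted above are used as inputs.

```python
# Rebuild the one-way ANOVA table from the reported SS and df values.
from scipy import stats

ss_treatment, df_treatment = 763.823, 3   # between-group (job type) variation
ss_error, df_error = 351.795, 96          # within-group variation

ms_treatment = ss_treatment / df_treatment   # MS = SS / df, matches the MS column
ms_error = ss_error / df_error
f_stat = ms_treatment / ms_error             # should come out near 69.48

# Right-tail area of the F(3, 96) distribution; the table reports 5.72E-24.
p_value = stats.f.sf(f_stat, df_treatment, df_error)

print(f"MS treatment = {ms_treatment:.4f}, MS error = {ms_error:.4f}")
print(f"F = {f_stat:.2f}, p-value = {p_value:.2e}")
```

Running this reproduces F ≈ 69.48 and a p-value of roughly 5.7 × 10⁻²⁴, which is why we can be "almost 100% confident" that the population mean days absent differ between the job types.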
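The critical values in the Tukey table (2.62 for a 5% experimentwise error rate, 3.21 for 1%) are the studentized range quantiles for k = 4 groups and 96 error degrees of freedom, divided by √2, and each pair-wise t-value is the difference in group means divided by its standard error based on the pooled MS error (the Tukey-Kramer statistic). The sketch below reproduces the critical values with SciPy; because the individual group sizes are not reported above, the pair-wise statistic is written as a function whose group-size arguments must be supplied by the reader.

```python
# Tukey experimentwise critical values and the pair-wise comparison statistic.
import math
from scipy import stats

k, df_error, ms_error = 4, 96, 3.6645   # groups, error d.f., MS error from the table

# Critical t-value = studentized range quantile / sqrt(2).
for alpha in (0.05, 0.01):
    q = stats.studentized_range.ppf(1 - alpha, k, df_error)
    print(f"experimentwise error rate {alpha}: critical value = {q / math.sqrt(2):.2f}")
# Expected output: roughly 2.62 at 0.05 and 3.21 at 0.01, matching the table.

def tukey_t(mean_i, mean_j, n_i, n_j, mse):
    """Tukey-Kramer simultaneous comparison t-value for one pair of groups."""
    return abs(mean_i - mean_j) / math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))

# Example call with the reported means for Job Types A and B; n_a and n_b are
# the (unreported) group sizes and must be filled in before running this line:
# tukey_t(10.11, 2.73, n_a, n_b, ms_error)
```

A comparison is significant at a given experimentwise error rate when its t-value exceeds the corresponding critical value; for example, C vs D (3.05) exceeds 2.62 but not 3.21, which is why we can only be 95% confident that C has a higher population mean than D.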