presentation slides

One-way ANOVA
• Example
• Analysis of Variance Hypotheses
• Model & Assumptions
• Analysis of Variance
• Multiple Comparisons
• Checking Assumptions
Example:
Days Absent by Job Type
[Figure: dot plots for job types A, B, C, and D; axis ticks from 2 to 16]
Analysis of Variance
• Analysis of Variance is a widely used statistical technique that
partitions the total variability in our data into components of
variability that are used to test hypotheses about equality of
population means.
• In One-way ANOVA, we wish to test the hypothesis:
H0: μ1 = μ2 = … = μk
against:
Ha : Not all population means are the same
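The partition of variability described above can be illustrated with a minimal sketch. The three groups below are made-up numbers (not the days-absent data), and the function name `anova_partition` is hypothetical. The sketch only shows that the total sum of squares splits into a between-group (treatment) part and a within-group (error) part, and that the ratio of their mean squares gives the F statistic.

```python
from statistics import mean

def anova_partition(groups):
    """Return (ss_treatment, ss_error, ss_total, f_stat) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group variability: group means around the grand mean.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: observations around their own group mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total variability: observations around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)
    return ss_treatment, ss_error, ss_total, f_stat

# Hypothetical data, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_tr, ss_err, ss_tot, f_stat = anova_partition(groups)
print(ss_tr, ss_err, ss_tot, f_stat)
```

A large F statistic means the group means vary much more than the within-group noise would explain, which is evidence against H0.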
Assumptions
• Each population being sampled is normally distributed
and all populations are equally variable.
• Normality can be checked with skewness/kurtosis or
normal probability plots. If any sample does not look
like it comes from a normal population, the assumption
is not met, unless the samples that do not look normal
are large (n > 30).
• Equal variability can be checked by comparing sample
standard deviations. If no standard deviation is more than
2 times another, equal variability can be assumed.
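The equal-variability rule of thumb above can be sketched in a few lines. The samples below are made up for illustration (they are not the days-absent data), and the helper name `equal_variability_ok` is hypothetical.

```python
from statistics import stdev

def equal_variability_ok(samples, max_ratio=2.0):
    """Rule of thumb: the largest sample SD is at most max_ratio times the smallest."""
    sds = [stdev(values) for values in samples.values()]
    return max(sds) <= max_ratio * min(sds)

# Hypothetical samples, for illustration only.
samples = {
    "A": [9, 10, 12, 13, 11],
    "B": [2, 3, 1, 4, 3],
    "C": [6, 7, 8, 5, 7],
}
print(equal_variability_ok(samples))
```

Comparing only the largest and smallest standard deviations is enough, because if that pair passes the 2-times rule then every other pair does too.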
Example: Are the population mean days
absent the same for all 4 job types?
ANOVA table
Source       SS           df    MS          F       p-value
Treatment    763.823       3    254.6076    69.48   5.72E-24
Error        351.795      96      3.6645
Total      1,115.618      99
H0: μA = μB = μC = μD
Ha: Not all μ’s are equal
The p-value (5.72E-24) is essentially zero, so we reject H0.
We can be almost 100% confident that the population mean
days absent differ in some way among the 4 job types.
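As a quick check on the ANOVA table, the mean squares and the F statistic can be recomputed from the sums of squares and degrees of freedom. This is a minimal sketch using only the rounded values printed on the slide, so the last digit may differ slightly from the table. The p-value is not recomputed here because it requires the F distribution.

```python
# Values copied from the ANOVA table on the slide.
ss_treatment, df_treatment = 763.823, 3
ss_error, df_error = 351.795, 96

ms_treatment = ss_treatment / df_treatment  # mean square for treatment
ms_error = ss_error / df_error              # mean square for error
f_stat = ms_treatment / ms_error            # F = MS treatment / MS error

print(round(ms_treatment, 4), round(ms_error, 4), round(f_stat, 2))
```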
Multiple Comparisons
• A significant F-test tells us that at least two of the underlying
population means are different, but it does not tell us which
ones differ from the others.
• We need extra tests to compare all the means, which we call
Multiple Comparisons.
• We look at the difference between every pair of group
population means, as well as the p-value for each difference.
• When we have k groups, there are:
\[ \binom{k}{2} = \frac{k!}{2!\,(k-2)!} = \frac{k(k-1)}{2} \]
possible pair-wise comparisons. For example, 4 groups have
4*3/2 = 6 comparisons.
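The count of pair-wise comparisons can be confirmed by listing the pairs directly. This is a minimal sketch using the four job-type labels from the example.

```python
from itertools import combinations
from math import comb

job_types = ["A", "B", "C", "D"]

# Every unordered pair of groups is one pair-wise comparison.
pairs = list(combinations(job_types, 2))

print(comb(len(job_types), 2))  # number of comparisons from the formula
print(pairs)                    # the comparisons themselves
```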
Multiple Comparisons
• If we estimate each comparison separately with
95% confidence, the overall confidence will be
less than 95%.
• So, using ordinary pair-wise comparisons (i.e.
lots of individual t-tests), we tend to find too
many significant differences between our
sample means.
• We need to modify our p-values so that we
determine the true differences with 95%
confidence across the entire set of comparisons.
• These methods are known as:
multiple comparison procedures
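The drop in overall confidence can be illustrated with a rough sketch. The first function assumes the comparisons are independent, which is not exactly true for pair-wise comparisons that share groups, so it is only an approximation. The second gives a Bonferroni-style lower bound that does not need independence. Both function names are hypothetical.

```python
from math import comb

def overall_confidence(per_comparison, n_comparisons):
    """Overall confidence if the comparisons were independent (an assumption)."""
    return per_comparison ** n_comparisons

def bonferroni_lower_bound(per_comparison, n_comparisons):
    """Lower bound on overall confidence that holds without independence."""
    return max(0.0, 1 - n_comparisons * (1 - per_comparison))

n = comb(4, 2)  # 6 pair-wise comparisons for 4 groups
print(round(overall_confidence(0.95, n), 3))
print(round(bonferroni_lower_bound(0.95, n), 3))
```

With 6 comparisons at 95% each, the overall confidence under independence is only about 73.5%, which is why the individual p-values need adjusting.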
Multiple Comparisons
• We use Tukey simultaneous comparisons.
• Tukey simultaneous comparisons overcome
the problems of the unadjusted pair-wise
comparisons finding too many significant
differences
(i.e. p-values that are too small).
Tukey Pair-wise Comparisons
Tukey simultaneous comparison t-values (d.f. = 96)
                     B        D        C        A
          Mean     2.73     5.03     6.74    10.11
B         2.73
D         5.03     4.15
C         6.74     7.47     3.05
A        10.11    14.04     9.24     6.36

critical values for experimentwise error rate:
    0.05    2.62
    0.01    3.21
We can be at least 99% confident that Job Type
A has the highest population mean days absent.
We can also be more than 99% confident that C
and D each have a larger population mean days
absent than B. We can be 95% confident, but not
99% confident, that Job Type C has a higher
population mean than D.
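The conclusions above follow from comparing each Tukey t-value with the two critical values. This is a minimal sketch of that comparison. The pairing of t-values with job types is an assumption based on reading the table as a lower triangle, and the function name `confidence_level` is hypothetical.

```python
# Critical values for the experimentwise error rate, copied from the slide.
CRITICAL = {0.05: 2.62, 0.01: 3.21}

# t-values keyed by (lower-mean group, higher-mean group); this pairing is
# an assumption based on reading the table as a lower triangle.
t_values = {
    ("B", "D"): 4.15,
    ("B", "C"): 7.47,
    ("D", "C"): 3.05,
    ("B", "A"): 14.04,
    ("D", "A"): 9.24,
    ("C", "A"): 6.36,
}

def confidence_level(t):
    """Return the strongest confidence statement the critical values support."""
    if abs(t) > CRITICAL[0.01]:
        return "99%"
    if abs(t) > CRITICAL[0.05]:
        return "95%"
    return "not significant"

for (low, high), t in t_values.items():
    print(f"{high} > {low}: t = {t:5.2f} -> {confidence_level(t)}")
```

Only the C versus D comparison falls between the two critical values, which is why that difference is significant at the 0.05 level but not at 0.01.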