EXAMPLE 1 * Butter Fat Content in Cow Milk

STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
12.1 - Introduction
Suppose we wish to compare k population means ( k  2 ). This situation can arise in
two ways. If the study is observational, we are obtaining independently drawn samples
from k distinct populations and we wish to compare the population means for some
numerical response of interest. If the study is experimental, then we are using a
completely randomized design to obtain our data from k distinct treatment groups. In a
completely randomized design the experimental units are randomly assigned to one of k
treatments and the response value from each unit is obtained. The mean of the
numerical response of interest is then compared across the different treatment groups.
There two main questions of interest:
1) Are there at least two population means that differ?
2) If so, which population means differ and how much do they differ by?
More formally:
H o : 1   2  ...   k i.e. all the population means are equal and have a common mean (  )
H a : i   j for some i  j, i.e. at least two population means differ.
If we reject the null then we use comparative methods to answer question 2 above.
Basic Idea:
The test procedure compares the variation in observations between samples to the
variation within samples. If the variation between samples is large relative to the
variation within samples we are likely to conclude that the population means are not all
equal. The diagrams below illustrate this idea...
Between Group Variation >> Within Group Variation
(Conclude population means differ)
Between Group Variation  Within Group Variation
(Fail to conclude the population means differ)
The name analysis of variance gets its name because we are using variation to decide
whether the population means differ.
194
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Computational Details for Equal Sample Size Case ( n1  n2    nk  n )
Preliminary notation:
Yij  jth obs. from population i, i  1,..., k , j  1,..., n
n
Yi  sample mean for the sample from population i , Yi 
k
Y  grand mean of all obs. 
n
 Y
N
j 1
n
ij
, i  1,...,k
k
ij
i 1 j 1
Y

Y
i 1
i
k
N  n1  n2  ...  nk , i.e. N is the total sample size for the experiment
TWO ESTIMATES OF THE COMMON VARIANCE ( 2 )
Estimate 1: Using Between Group Variation
If the null hypothesis is true each of the Yi  ' s represents a randomly selected observation
from a normal distribution with mean  (common mean) and std. deviation = 
i.e. the standard error of the mean, i.e. from the sampling distribution of 𝑌̅~𝑁(𝜇,
n
𝜎
)
√𝑛
Thus if we find the sample standard deviation of the Yi  ' s we get an estimate of 
,
.
n
,
or in terms of the sample variance we have,
k
Sample variance of the Yi  ' s 
 (Y
i 1
i
 Y ) 2
k 1
is an estimate of
2
n
.
Therefore if multiply this sample variance by n,
k
n (Yi   Y ) 2
i 1
k 1
we obtain an estimate of  2 , if and only if the null hypothesis is true. However if the
alternative hypothesis is true, this estimate of  2 will be too BIG. This formula will
be large when there is substantial between group variation. This measure of between
group variation is called the Mean Square for Treatments and is denoted MS Treat .
195
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Estimate 2: Using Within Group Variation
Another estimate of the common variance ( 2 ) can be found by looking at the variation
of the observations within each of the k treatment groups. By extending the pooledvariance from the two population case we have the following:
Pooled estimate of  2 
(n1  1) s12  (n2  1) s22  ...  (nk  1) sk2
n1  n2  ...  nk  k
which simplifies to the mean of the k sample variances when the sample sizes are all
equal.
Another way to write this is as follows:
k
Pooled estimate of  2 
ni
 (Y
i 1 j 1
ij
 Yi  ) 2
N k

SS Error
 MS Error
N k
This will be an estimate of the common population variance ( 2 ) regardless of whether
the null hypothesis is true or not. This is called the Mean Square for Error ( MS Error ) .
Thus if the null hypothesis is true we have two estimates of the common variance ( 2 ) ,
namely the mean square for treatments (MSTreat ) and the Mean Square Error ( MS Error ) .
If MSTreat  MS Error we reject H o , i.e. the between group variation is large relative to the
within group variation.
If MS Treat  MS Error we fail to reject H o , i.e. the between group variation is NOT large
relative to the within group variation.
Test Statistic
The test statistic compares the mean squares for treatment and error in the form of a
ratio,
F
MS Treat
~ F - distribution with numerator df  k  1 and denominator df  N - k
MS Error
Large values for F give small p-values and lead to rejection of the null hypothesis.
196
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Example 12.1 - Weight Gain in Anorexia Patients
(Data File: Anorexia.JMP)
These data give the pre- and post-weights of patients being treated for anorexia nervosa.
There are actually three different treatment plans being used in this study, and we wish
to compare their performance.
The variables in the data file are:
 Treatment – Family, Standard, Behavioral
 prewt - weight at the beginning of treatment
 postwt - weight at the end of the study period
 Weight Gain - weight gained (or lost) during treatment (postwt-prewt)
We begin our analysis by examining comparative displays for the weight gained across
the three treatment methods. To do this select Fit Y by X from the Analyze menu and
place the grouping variable, group, in the X box and place the response, Weight Gain, in
the Y box and click OK. Here boxplots, mean diamonds, normal quantile plots, and
comparison circles have been added.
Things to consider from this graphical display:

Do there appear to be differences in the mean weight gain?

Are the weight changes normally distributed for each treatment group?

Is the variation in weight gain equal across therapies?
197
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Checking the Equality of Variance Assumption
To test whether it is reasonable to assume the population variances are equal for these
three therapies select UnEqual Variances from the Oneway Analysis pull down-menu.
Equality of Variance Test
H o :  12   22     k2
H a : the population variances are not
all equal
We have no evidence to conclude that the variances/standard deviations of the weight
gains for the different treatment programs differ (p >> .05).
ONE-WAY ANOVA TEST FOR COMPARING THE THERAPY MEANS
To test the null hypothesis that the mean weight gain is the same for each of the therapy
methods we will perform the standard one-way ANOVA test. To do this in JMP select
Means, Anova/t-Test from the Oneway Analysis pull-down menu. The results of the
test are shown in the Analysis of Variance box.
MS Treat  307.32
MS Error  56.68
The mean square for treatments is 5.42 times larger than
the mean square for error! This provides strong evidence
against the null hypothesis.

The p-value contained in the ANOVA table is .0065, thus we reject the null hypothesis at
the .05 level and conclude statistically significant differences in the mean weight gain
experienced by patients in the different therapy groups exist.
198
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
MULTIPLE COMPARISONS
Because we have concluded that the mean weight gains across treatment method are not
all equal it is natural to ask the secondary question:
Which means are significantly different from one another?
We could consider performing a series of two-sample t-Tests and constructing
confidence intervals for independent samples to compare all possible pairs of means,
however if the number of treatment groups is large we will almost certainly find two
treatment means as being significantly different. Why? Consider a situation where we
have k = 7 different treatments that we wish to compare. To compare all possible pairs
of means (1 vs. 2, 1 vs. 3, …, 6 vs. 7) would require performing a total of
k (k  1)
 21
2
two-sample t-Tests. If we used   P(Type I Error)  .05 for each test we expect to make
21(.05)  1 Type I Error, i.e. we expect to find one pair of means as being significantly
different when in fact they are not. This problem only becomes worse as the number of
groups, k, gets larger.
Experiment-wise Error Rate (EER)
Another way to think about this is to consider the probability of making no Type I
Errors when making our pair-wise comparisons. When k = 7 for example, the
probability of making no Type I Errors is (.95) 21  .3406 , i.e. the probability that we
make at least one Type I Error is therefore .6596 or a 65.96% chance. Certainly this
unacceptable! Why would conduct a statistical analysis when you know that you have a
66% of making an error in your conclusions? This probability is called the experimentwise error rate.
Bonferroni Correction
There are several different ways to control the experiment-wise error rate (EER). One of
the easiest ways to control experiment-wise error rate is use the Bonferroni Correction. If
we plan on making m comparisons or conducting m significance tests the Bonferroni
Correction is to simply use 
m
as our significance level rather than
 . This simple
correction guarantees that our experiment-wise error rate will be no larger than
correction implies that our p-values will have to be less than 
m
, rather than
 . This
 , to be
considered statistically significant.
199
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Multiple Comparison Procedures for Pairwise Comparisons of k Pop. Means
When performing pair-wise comparison of population means in ANOVA there are
several different methods that can be employed. These methods depend on the types of
pair-wise comparisons we wish to perform. The different types available in JMP are
summarized briefly below:




Compare each pair using the usual two-sample t-Test for independent
samples. This choice does not provide any experiment-wise error rate
protection! (DON’T USE)
Compare all pairs using Tukey’s Honest Significant Difference (HSD)
approach. This is best choice if you are interested comparing each
possible pair of treatments.
Compare with the means to the “Best” using Hsu’s method. The best
mean can either be the minimum (if smaller is better for the response) or
maximum (if bigger is better for the response).
Compare each mean to a control group using Dunnett’s method.
Compares each treatment mean to a control group only. You must
identify the control group in JMP by clicking on an observation in your
comparative plot corresponding to the control group before selecting this
option.
Multiple Comparison Options in JMP
200
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Example 12.1 - Weight Gain in Anorexia Patients (cont’d)
For these data we are probably interested in comparing each of the treatments to one
another. For this we will use Tukey’s multiple comparison procedure for comparing all
pairs of population means. Select Compare Means from the Oneway Analysis menu
and highlight the All Pairs, Tukey HSD option. Beside the graph you will now notice
there are circles plotted. There is one circle for each group and each circle is centered at
the mean for the corresponding group. The size of the circles are inversely proportional
to the sample size, thus larger circles will drawn for groups with smaller sample sizes.
These circles are called comparison circles and can be used to see which pairs of means
are significantly different from each other. To do this, click on the circle for one of the
treatments. Notice that the treatment group selected will be appear in the plot window
and the circle will become red & bold. The means that are significantly different from
the treatment group selected will have circles that are gray. These color differences will
also be conveyed in the group labels on the horizontal axis. In the plot below we have
selected Standard treatment group.
The results of the pairwise comparisons are also contained the output window
(shown below). The matrix labeled Comparison for all pairs using Tukey-Kramer
HSD identifies pairs of means that are significantly different using positive
entries in this matrix. Here we see only treatments 2 and 3 significantly differ. I
generally turn this option off as the other methods for displaying significant
differences are better.
The next table conveys the same information by using different letters to
represent populations that have significantly different means. Notice treatments
2 and 3 are not connected by the same letter so they are significantly different.
201
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Finally the CI’s in the Ordered Differences section give estimates for the
differences in the population means. Here we see that mean weight gain for
patients in treatment 3 is estimated to be between 2.09 lbs. and 13.34 lbs. larger
than the mean weight gain for patients receiving treatment 2 (see highlighted
section below).
202
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Example 13.2: Study of Diet and Growth of Western Rock Lobsters
(Datafile: Rock Lobster.JMP)
The western rock lobster is recognized as a valued fishery estimated at 300 million
dollars a year. Therefore, recent studies have focused on protecting this commodity and
increasing the production. In one particular study, a large sample of juvenile lobsters
were collected at various locations and held in a tank at the Fisheries Marine Research
Lab for two months as an acclimation period. During this period, the lobsters were fed a
diet of mussels and prawn pellets. Subsequently, juvenile lobsters were fed for nine
weeks on either the fresh mussel diet (D1) or one of four artificial diets: two in semimoist form (D2 and D3) and two in dry-pelleted form (D4 and D5). At the end of the
nine weeks, 10 animals were sampled at random from each diet treatment, and their
Muscle Somatic Index (MSI) was measured. The MSI is the weight of wet tail muscle
expressed as a percentage of the weight of the whole animal, wet.
Question(s) of Interest: Is there evidence that lobsters perform as well on the artificial
diets as they do on the natural diet?
Source: Tsvetneko, E., J. Brown, B.D. Glencross, and L.H. Evans. 1999. Measures of condition in
dietary studies on western rock lobster post-pueruli. Proceedings, International Symposium on
Lobster Health Management, Adelaide, September 1999.
Step 0: Check the assumptions to be sure that the one-way ANOVA is valid.

Make sure that the two groups are independent

Check the normality assumption.
203
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017

Check whether we have any evidence that the group variances differ (the
ANOVA assumes equal variances).
This test also appears when you
check equality of the variances,
it is discussed below.
Step 1: Convert the research question into Ho and Ha.
H0:
Ha:
Step 2: Find the test statistic and p-value from your data.
In JMP, select Means/Anova.
204
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
If you question the assumption of equal variances report p-value from Welch’s ANOVA
If you question the assumption of normality we can use Kruskal-Wallis Test:
Step 3: Write a conclusion in the context of the problem.
205
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Multiple Comparisons for the Western Rock Lobster data:
The researchers regarded the fresh mussel diet (D1) as a control treatment. Therefore,
we should use Dunnett’s test to compare the treatments.
Using Dunnett’s test in JMP: Select Compare Means > With Control, Dunnett’s.
First, you need to identify which group is the control group. Click on D1, the control
diet.
206
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Questions:
1. Where do the significant differences exist?
2. Do the artificial diets appear to promote or retard body condition (based on the
muscle somatic index)?
3. Which, if any, of the artificial diets would you recommend?
207
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Example 13.3 – Butter Fat Content in Cow Milk
(Datafile: Butterfat-cows.JMP)
This data set comes from the butter fat content for five breeds of dairy cows.
The variables in the data file are:
 Breed – Ayshire, Canadian, Guernsey, Holstein-Fresian, Jersey
 Age Group – two different age class of cows (1 = younger, 2 = older)
 Butterfat - % butter fat found in the milk sample
We begin our analysis by examining comparative displays for the butter fat content
across the five breeds. To do this select Fit Y by X from the Analyze menu and place the
grouping variable, Breed, in the X box and place the response, Butterfat, in the Y box
and click OK. The resulting plot simply shows a scatter plot of butter fat content versus
breed. In many cases there will numerous data points on top of each other in such a
display making the plot harder to read. To help alleviate this problem we can stagger or
jitter the points a bit from vertical by selecting Jitter Points from the Display Options
menu. To help visualize breed differences we could add quantile boxplots, mean
diamonds, or mean with standard error bars to the graph. To do any or all of these select
the appropriate options from the Display Options menu. The display below has both
quantile boxplots, sample mean with error bars, and normal quantile plots added.
We can clearly see the butter fat content is largest for Jersey and Guernsey cows and
lowest for Holstein-Fresian. There also appears to be some difference in the variation of
butter fat content as well. The summary statistics on the following page confirm our
findings based on the graph above. To obtain these summary statistics select the
Quantiles and Means, Std Dev, Std Err options from the Oneway Analysis menu at the
top of the window.
208
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Summary Statistics for Butter Fat Content by Breed
To test the null hypothesis that the butter fat content is the same for each of the breeds
we can perform a one-way ANOVA test.
H o :  Ayshire   Canadian  ...   Jersey
H a : at least two breeds have different mean percent butter fat content in their milk
Again the assumptions for one-way ANOVA are as follows:
1.) The samples are drawn independently or come from using a completely
randomized design. Here we can assume the cows sampled from each breed were
independently sampled.
2.) The variable of interest is normally distributed for each population.
3.) The population variances are equal across groups. Here this means:
 2 Ayshire   2 Canadian  ...   2 Jersey
If this assumption is violated we can use Welch’s ANOVA, which allows
for inequality of the population variances or we can transform the
response (see below).
209
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
To check these assumptions in JMP select UnEqual Variances and Normal Quantile
Plot > Actual by Quantile (the 1st option). To conduct the one-way ANOVA test select
Means, Anova/t-Test from the Oneway Analysis pull-down menu.
The graphical results and the results of the test are shown below.
Butter Fat Content by Breed
Normality appears to be satisfied
210
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
The equality of variance test results are shown below.
We have strong evidence against
the equality of the population
variances/standard deviations.
(Use Welch’s ANOVA)
We conclude at least two means differ
using Welch’s ANOVA.
Because we have strong evidence against equality of the population variances we
could use Welch’s ANOVA to test the equality of the means. This test allows for
the population variances/standard deviations to differ when comparing the
population means. For Welch’s ANOVA we see that the p-value < .0001,
therefore we have strong evidence against the equality of the means and we
conclude these dairy breeds differ in the mean butter fat content of their milk.
211
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Variance Stabilizing Transformations
Another approach that is often times used when we encounter situations where there is
evidence against the equality of variance assumption is to transform the response. Often
times we find that the variation appears to increase as the mean increases. For these
data we see that this is the case, the estimated standard deviations increase as the
estimated means increase. To correct this we generally consider taking the log of the
response. Sometimes the square root and reciprocal are used but these transformations
do not allow for any meaningful interpretation in the original scale, whereas the log
transform.
As we can see, as the samples means for the
groups increase so do the sample standard
deviations. To stabilize the variances/standard
deviations we can try a log transformation.
The results of the analysis using the log response are shown below.
Analysis of the log(Percent Butter Fat) Content for the Five Dairy Cow Breeds
Normality is still satisfied, there is no evidence against
the equality of the variances, and we have strong
evidence that at least two means differ (𝑝 < .0001).
212
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
Because we have concluded that the mean/median log butter fat content differs across
breed (p < .0001), it is natural to ask the secondary question: Which breeds have typical
values that are significantly different from one another? To do this we again use
multiple comparison procedures. In JMP select Compare Means from the Oneway
Analysis menu and highlight the All Pairs, Tukey HSD option. The results of Tukey’s
HSD are shown below.
Below is the plot showing the results when clicking on the circle for Guernsey’s.
We can see that the mean log-Butterfat Content for Guernsey’s and Jersey’s do not
significantly differ from each other, but they both differ significantly from the other
three breeds.
Here we see only Guernsey and Jersey do not significantly differ. The table using letters
conveys the information, using different letters to represent populations that have
significantly different means. The CI’s in the Ordered Differences section give estimates
213
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
for the differences in the population means/medians in the log scale. As was the case
with using the log transformation with two populations, we interpret the results in the
original scale in terms of ratios of medians. For example, back transforming the interval
comparing Jersey to Holstein-Fresian, we estimate that the median butter fat content
found in Jersey milk is between 1.33 and 1.55 times larger than the median butter fat
content in Holstein-Fresian milk. We could also state this in terms of percentages as
follows: we estimate that the typical butter fat content found in Jersey milk is between
33% and 55% higher than the butter fat content found in Holstein-Fresian milk.
214
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
ONE-WAY ANOVA MODEL (FYI only)
In statistics we often use models to represent the mechanism that generates the observed
response values. Before we introduce the one-way ANOVA model we first must
introduce some notation.
Let,
ni  sample size from group i , (i  1,..., k )
Yij  the j th response value for i th group , ( j  1,..., ni )
  mean common to all groups assuming the null hypothesis is true
 i  shift in the mean due to the fact the observation came from group i
 ij  random error for the j th observatio n from the i th group .
Note: The test procedure we use requires that the random errors are normally distributed
with mean 0 and variance  2 . Equivalently the test procedure requires that the response
is normally distributed with a common standard deviation  for all k groups.
One-way ANOVA Model:
Yij     i   ij i  1,...,k and j  1,...,ni
The null hypothesis equivalent to saying that the  i are all 0, and the alternative says that
at least one  i  0 , i.e. H o :  i  0 for all i vs. H a :  i  0 for some i . We must decide
on the basis of the data whether we evidence against the null. We display this graphically
as follows:
Estimates of the Model Parameters:
̂  Y ~ the grand mean
ˆi  Yi  Y ~ the estimate ith treatment effect
Yˆ  ˆ  ˆ  Y ~ these are called the fitted values.
ij
i
i
Note:
Yij  ˆ  ˆi  ˆij
Yij  Y  (Yi  Y )  (Yij  Yi )
ˆij  Yij  Yˆij  Yij  Yi ~ these are called the residuals.
215
STAT 305: Chapter 12 – Comparing Several Population Means (One-way ANOVA)
Spring 2017
An Alternative Derivation of the Test Statistic Based on the Model
Starting with
Yij  ˆ  ˆi  ˆij
Yij  Y  (Yi  Y )  Yij  Yi )
we obtain
(Yij  Y )  (Yi  Y )  (Yij  Yi )
After squaring both sides we have,
(Yij  Y ) 2  (Yi  Y ) 2  (Yij  Yi ) 2  2(Yi  Y )(Yij  Yi )
Finally after summing overall of the observations and simplying we obtain.
k
ni
k
k
ni
 (Yij  Y ) 2   ni (Yi  Y ) 2   (Yij  Yi ) 2
i 1 j 1
i 1
i 1 j 1
These measures of variation are called “Sum of Squares” and are denoted SS. The above
expression can be written
SSTotal  SSTreat  SS Error
The degrees of freedom associated with each of these SS’s have the same relationship
and are given below:
df total = df treatment + df error.
(n – 1) = (k – 1)
+ (n – k)
The mean squares (i.e. variance estimates) discussed above are found by taking the sum
of squares and dividing by their associated degrees of freedom, i.e.
k
MS Treat
niˆi2
SSTreat 
2
which is an estimate of the common pop. variance (  ) only if Ho is true

 i 1
k 1
k 1
and
MS Error 
SS Error
= ˆ 2 which is an estimate of the common population  2 regardless of Ho.
nk
Thus when testing
H o :  i  0 for all i.
H a :  i  0 for some i.
we reject Ho when MSTreat >> MSError , i.e. when the estimated treatment effects are
large!
216