CHAPTER 10
ONE-WAY ANALYSIS OF VARIANCE
It would be very unusual for all the research one might conduct to be restricted to
comparisons of only two samples. Respondents and various groups are seldom divided easily
into just two samples. As a result, the researcher often needs to make comparisons between
three, four, five, or even more sample means. A series of t tests could be used to make the
comparisons between combinations of three or more sample means. Four samples could be
compared using six pairs or combinations of t tests. This is a complex and unwieldy process
that would require a great number of calculations. This approach also has important statistical
limitations. The probability of committing alpha (Type I) error increases dramatically when
t-tests are used in this way.
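The growth in both the number of comparisons and the risk of alpha error can be illustrated with a short Python sketch. It assumes, for simplicity, that the pairwise tests are independent of one another, which is not strictly true when the same samples are reused, so the figures are rough illustrations only. The function names are hypothetical.

```python
from math import comb

def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs that can be formed from k sample means."""
    return comb(k, 2)

def familywise_alpha(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection across all pairwise t tests.

    ASSUMPTION: the tests are treated as independent, which only
    approximates the real situation.
    """
    m = pairwise_comparisons(k)
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    print(k, pairwise_comparisons(k), round(familywise_alpha(k), 3))
```

Under this simplifying assumption, four samples tested pairwise at the .05 level carry roughly a one-in-four chance of at least one false rejection.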
This chapter introduces a statistic called the one-way Analysis of Variance (ANOVA)
that is used to test hypotheses involving differences in means when there are three or more
samples to be examined. The requirements for using an Analysis of Variance are the presence
of interval data, the need to compare more than two sample means, and the assumption that the
characteristic under consideration is normally distributed. The core function of an ANOVA is
very similar to that of the t-test. In both cases, the statistic is calculated using a formula that
requires calculation of the amount of variance that exists within each sample as well as the level
of variance that exists between the samples under consideration. In the t-test, the numerator of
the formula for t represents the difference that exists between the two sample means being
compared. The numerator of the formula used to perform an ANOVA likewise represents the
variance that occurs between the three or more groups being studied.
Statistics reflecting the variance that exists within the individual samples are used in the
denominator of the formula for each statistic in much the same way.
Since the terms between group variance and within group variance are new to most
students in a basic statistics course, these terms will be explained as background for
understanding analysis of variance. One type of variance should already be understood based on
the work that has already been done in this class even though the term has not been employed
previously. Measurement of deviations from a sample mean, variance, standard deviation, and
standard error has focused on a type of variance called within group variance. This represents the
degree to which individual values within a group fluctuate or deviate from the mean of that
particular group. The second type of variance is called between group variance. This type of
variance measures the difference that exists between the group means of multiple samples when
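working with t-tests or ANOVAs. Between group variance is measured by the difference between
the two sample means when conducting a t-test and by a statistic called Mean of Squares
Between when working with an ANOVA. Within group variance is measured by the standard error
of the difference in means when conducting a t-test and by a statistic called Mean of Squares
Within when working with an ANOVA.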
Just like the t-test, the ANOVA produces a statistic, labeled F, which is determined by
comparing the variance between groups with the variance that exists within those groups. The
formula used to calculate this statistic is:
$$F = \frac{MS_{between}}{MS_{within}}$$
When an F-ratio is large, it provides a powerful indication that the variation observed
between the group means under consideration is likely to be the result of real differences in the
populations represented by each sample and not the result of simple sampling error.
The actual calculations for obtaining an F ratio are not difficult, but they are more
complex than for the statistics covered so far in the text. This is primarily true because the
student is working with more than two samples, and there is some new symbolism associated
with the formulas. However, some of the basics have already been learned from the study of
earlier chapters in the form of the sum of squares or finding deviation values. It will be recalled
that the sum of squares was the basis for obtaining the variance (s²) for a distribution of values. The
sum of squares is also the basis for obtaining the denominator (MSwithin) of the F ratio. The
Mean Square Within is calculated using the following formula:
$$MS_{within} = \frac{SS_{within}}{df_{within}}$$
Where:
$df_{within}$ = total number of cases included in all samples minus the number of groups being compared
$SS_{within}$ = sum of squared deviations of each value from its corresponding sample mean
The following is a step-by-step analysis for finding the MSwithin portion of the F ratio. The analysis
is based on a hypothetical scenario in which a researcher has data for three sample income
distributions that document the percent of funds spent for entertainment per year as
follows:
Family    X1 - High Income    X2 - Middle Income    X3 - Low Income
1         20%                 50%                   60%
2         10%                 40%                   30%
3         30%                 10%                   20%
4         15%                  8%                   10%
5          5%                  2%                    5%
After stating the null hypothesis, determining that the data are interval and noting that
there are more than two groups, the researcher would then calculate the means, deviation values,
sum of the squares, sum of the squares within (SSwithin), degrees of freedom within groups
(d.f.within), and mean of the squares within groups.
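Group 1 (mean = 16):
X      X - X̄1    (X - X̄1)²
20       4          16
10      -6          36
30      14         196
15      -1           1
5      -11         121
Sum of squares = 370

Group 2 (mean = 22):
X      X - X̄2    (X - X̄2)²
50      28         784
40      18         324
10     -12         144
8      -14         196
2      -20         400
Sum of squares = 1848

Group 3 (mean = 25):
X      X - X̄3    (X - X̄3)²
60      35        1225
30       5          25
20      -5          25
10     -15         225
5      -20         400
Sum of squares = 1900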
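After the means and deviation values have been calculated, the MSwithin can be calculated as
follows:
Step 1: $SS_{within} = 370 + 1848 + 1900 = 4118$
Step 2: $df_{within} = 15 - 3 = 12$ (total number of cases minus the number of groups)
Step 3: $MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{4118}{12} = 343.17$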
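The three steps for the Mean Square Within can be checked with a few lines of Python. This is a minimal sketch using the entertainment-spending example; the helper names and dictionary keys are hypothetical and chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sum_of_squares(values):
    """Sum of squared deviations of each value from its own sample mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

# Percent of funds spent for entertainment, by income group (chapter example).
groups = {
    "high": [20, 10, 30, 15, 5],
    "middle": [50, 40, 10, 8, 2],
    "low": [60, 30, 20, 10, 5],
}

# Step 1: add the sums of squares of the individual groups.
ss_within = sum(sum_of_squares(v) for v in groups.values())
# Step 2: total number of cases minus the number of groups.
df_within = sum(len(v) for v in groups.values()) - len(groups)
# Step 3: divide the sum of squares within by its degrees of freedom.
ms_within = ss_within / df_within

print(ss_within, df_within, round(ms_within, 2))
```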
At this point, the researcher has calculated the denominator of the F ratio formula in three
easy steps. The next stage of the process involves the calculation of the Mean of Squares
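Between. This statistic is determined by the formula:
$$MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{\sum n_k(\bar{X}_k - \bar{X}_{overall})^2}{k - 1}$$
Where:
$n_k$ = n of each sample group
$\bar{X}_k$ = mean of each sample group
$\bar{X}_{overall}$ = overall mean across all groups
$k$ = number of groups being compared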
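Working with the same example as before, the process for obtaining $MS_{between}$ is as follows:
Step 1: Compute the overall mean: $\bar{X}_{overall} = \frac{16 + 22 + 25}{3} = 21$
Step 2: Compute $SS_{between} = 5(16 - 21)^2 + 5(22 - 21)^2 + 5(25 - 21)^2 = 125 + 5 + 80 = 210$
Step 3: Determine $df_{between} = 3 - 1 = 2$ (number of groups minus one)
Step 4: Determine $MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{210}{2} = 105$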
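The between-groups calculation can be sketched in the same way. The variable names below are hypothetical. The overall mean is computed here from all of the raw values, which gives the same result as averaging the three group means only because the groups are of equal size.

```python
# Chapter example: percent of funds spent for entertainment, by income group.
groups = [
    [20, 10, 30, 15, 5],   # high income
    [50, 40, 10, 8, 2],    # middle income
    [60, 30, 20, 10, 5],   # low income
]

# Step 1: overall mean across all cases.
n_total = sum(len(g) for g in groups)
overall_mean = sum(sum(g) for g in groups) / n_total

# Step 2: weight each squared deviation of a group mean by that group's n.
ss_between = sum(
    len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups
)

# Step 3: number of groups minus one.
df_between = len(groups) - 1

# Step 4: mean square between.
ms_between = ss_between / df_between

print(overall_mean, ss_between, df_between, ms_between)
```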
Once the mean of the squares within and mean of the squares between have been
calculated, obtaining the F ratio is a simple division:
$$F = \frac{MS_{between}}{MS_{within}} = \frac{105.00}{343.17} = .31$$
With the F ratio calculated, the researcher can then consult the tables in Appendix E and
F for the critical values of F at .01 and .05 to determine whether the null hypothesis should
be accepted or rejected. With 2 degrees of freedom between groups and 12 within groups, the
tables yield critical values of 6.93 (.01 or 99% level) and 3.88 (.05 or 95% level). Since the
obtained F ratio of .31 is smaller than either of the critical values, the null hypothesis is
accepted at both levels of significance. Statistically, there is no significant difference in the
mean percent of income spent for entertainment among the populations of these three income
groups. These data provide no evidence that income level affects the amount spent for
entertainment.
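The comparison of the obtained F ratio with the tabled critical values can be written out as a small sketch. The critical values are the ones quoted above for 2 and 12 degrees of freedom; the function names are hypothetical, and the wording "fail to reject" corresponds to what this chapter calls accepting the null hypothesis.

```python
def f_ratio(ms_between, ms_within):
    """Obtained F: mean square between divided by mean square within."""
    return ms_between / ms_within

def decision(f_obtained, f_critical):
    """Reject H0 only when the obtained F reaches the critical value."""
    return "reject H0" if f_obtained >= f_critical else "fail to reject H0"

f = f_ratio(105.0, 4118 / 12)

# Critical values as quoted in the text for df = (2, 12).
critical_values = {0.05: 3.88, 0.01: 6.93}

for level, crit in critical_values.items():
    print(level, round(f, 2), crit, decision(f, crit))
```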
When reporting the results of an analysis of variance, one must always construct a table
showing the sources of variation and the F ratio. The table provides a summary of the
calculations and clarifies the findings for the reader of the research report. An example of an F
ratio table, which includes the data from the above illustration, is shown below.
F RATIO TABLE
_______________________________________________________________________
Sources of Variation      df         SS          MS       F Ratio
_______________________________________________________________________
Between Groups             2      210.00      105.00        .31
Within Groups             12     4118.00      343.17
_______________________________________________________________________
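A table of this kind can also be assembled from the sums of squares and degrees of freedom alone. The following is one possible sketch; the function name and column widths are hypothetical choices, not part of any reporting standard.

```python
def anova_table(ss_between, df_between, ss_within, df_within):
    """Return a plain-text sources-of-variation table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_ratio = ms_between / ms_within
    header = (
        f"{'Sources of Variation':<22}{'df':>4}{'SS':>10}{'MS':>10}{'F Ratio':>9}"
    )
    between = (
        f"{'Between Groups':<22}{df_between:>4}{ss_between:>10.2f}"
        f"{ms_between:>10.2f}{f_ratio:>9.2f}"
    )
    # The F ratio is reported only on the between-groups row.
    within = (
        f"{'Within Groups':<22}{df_within:>4}{ss_within:>10.2f}{ms_within:>10.2f}"
    )
    return "\n".join([header, between, within])

table = anova_table(210, 2, 4118, 12)
print(table)
```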
On those occasions when the obtained value for the F-ratio is significant and the null
hypothesis must be rejected, one additional step is needed to determine which sample means are
significantly different from each other. The statistic used for this follow up test is called Tukey’s
HSD. Tukey’s HSD is used to determine the amount of difference that must exist between two
sample means for those means to be considered significantly different in a statistical sense. The
formula for Tukey’s HSD is:
$$HSD = q\sqrt{\frac{MS_{within}}{n}}$$
Where:
$q$ = the value obtained from Appendix G at the appropriate level of confidence
$n$ = the number of cases in each sample
(**Tukey’s HSD can only be used when all samples are of equal size)
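As an illustration of how the HSD value is applied, the sketch below uses the numbers from the entertainment example. The F ratio in that example was not significant, so this follow-up test would not actually be needed there; it is shown only to demonstrate the arithmetic. The value of q is an assumption standing in for the Appendix G lookup, and the function name is hypothetical.

```python
from math import sqrt

def tukey_hsd(q, ms_within, n):
    """Minimum difference between two sample means that counts as significant."""
    return q * sqrt(ms_within / n)

# Values from the chapter's worked example.
ms_within = 4118 / 12   # mean square within
n = 5                   # cases in each (equal-sized) sample
means = {"high": 16, "middle": 22, "low": 25}

# ASSUMPTION: q = 3.77 is a placeholder for the value that would be read
# from Appendix G (3 groups, 12 degrees of freedom within, .05 level).
q = 3.77

hsd = tukey_hsd(q, ms_within, n)

# Compare every pair of means against the HSD threshold.
names = list(means)
comparisons = [
    (a, b, abs(means[a] - means[b]) >= hsd)
    for i, a in enumerate(names)
    for b in names[i + 1:]
]
print(round(hsd, 2), comparisons)
```

With this assumed q, no pair of means differs by as much as the HSD value, which is consistent with the nonsignificant F ratio obtained earlier.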
In conclusion, analysis of variance is a comparison among three or more independent means. In
order to conduct an analysis of variance test, one must have data that are measured at the
interval level. Ordinal and nominal data cannot be used. The test also assumes that the samples
have been drawn at random from their populations and that the characteristic being studied is
normally distributed in the populations. A review of the steps for calculating the F ratio is at the
end of the chapter.
A Major Idea: Remember this concept when applying analysis of variance.
Every time a hypothesis is tested for a research situation, the following MUST ALWAYS be
applied:
(1) State the RESEARCH QUESTION.
(2) State the NULL HYPOTHESIS.
(3) Conduct the STATISTICAL ANALYSIS.
(4) Draw STATISTICAL CONCLUSIONS.
SEQUENTIAL STATISTICAL STEPS
FINDING THE F RATIO
Step 1 (Organize Data Matrices): Organize all of the distributions in matrices.

Step 2 (X̄i): Calculate the means for all of the distributions by adding all of the raw values
in each distribution and dividing by the number of values in each distribution.

Step 3 (Σx²): Find the sums of the deviation values squared:
Σ(X - X̄1)², Σ(X - X̄2)², Σ(X - X̄3)², Σ(X - X̄4)²

Step 4 (SSwithin): Find the sum of the squares within by adding the sums of the deviation
values squared for all of the distributions.

Step 5 (X̄ overall): What is the overall mean for all the distributions? Add all the means and
divide by the number of samples, for example (X̄1 + X̄2 + X̄3 + X̄4) / 4. This shortcut
assumes that all samples are the same size. When they are not, add all of the raw values and
divide by the total number of cases.

Step 6 (SSbetween): Find the sum of the squares between by subtracting the overall mean from
each sample mean, squaring each difference, multiplying it by the number of values in that
distribution, and adding those results.

Step 7 (dfbetween, dfwithin): What are the degrees of freedom?
df between = count the number of distributions and subtract 1.
df within = add the number of values in each distribution and subtract one for each
distribution.

Step 8 (MSwithin): What is the mean of the squares within groups? Divide the sum of the
squares within by the degrees of freedom within.

Step 9 (MSbetween): What is the mean of the squares between groups? Divide the sum of the
squares between by the degrees of freedom between.

Step 10 (MSbetween / MSwithin): What is the F ratio? Divide the mean of the squares between
by the mean of the squares within.

Step 11 (Construct F Ratio Table): Construct a sources of variation table and enter all
F ratio values.

Step 12 (Obtain Critical Values for F): Consult the F table for the critical value of F. The
degrees of freedom between are the numerator and the degrees of freedom within are the
denominator in the table.

Step 13 (Accept or Reject H0): Make the decision to accept or reject the null hypothesis by
comparing the obtained F ratio with the critical value from the tables.

Step 14 (Draw Research Conclusions): Draw research conclusions based on the findings related
to the statistical tests.
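Steps 2 through 10 of the sequence can be gathered into a single function. The following is a minimal sketch rather than a reference implementation; the function name and the dictionary keys are hypothetical. It computes the overall mean from all raw values, so it also works when the samples differ in size.

```python
def one_way_anova(groups):
    """Follow Steps 2 to 10 and return every entry of the F ratio table."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Step 2: mean of each distribution.
    means = [sum(g) / len(g) for g in groups]
    # Steps 3-4: squared deviations within each group, then added together.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, means)
    )
    # Step 5: overall mean, computed from all raw values.
    overall_mean = sum(sum(g) for g in groups) / n_total
    # Step 6: weighted squared deviations of group means from the overall mean.
    ss_between = sum(
        len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, means)
    )
    # Step 7: degrees of freedom.
    df_between = k - 1
    df_within = n_total - k
    # Steps 8-9: mean squares.
    ms_within = ss_within / df_within
    ms_between = ss_between / df_between
    # Step 10: F ratio.
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ms_between": ms_between, "ss_within": ss_within,
        "df_within": df_within, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }

# Chapter example: entertainment spending for three income groups.
result = one_way_anova([
    [20, 10, 30, 15, 5],
    [50, 40, 10, 8, 2],
    [60, 30, 20, 10, 5],
])
print({name: round(value, 2) for name, value in result.items()})
```

Steps 11 through 14 (building the table, looking up the critical value, and drawing conclusions) still require the printed F tables and the researcher's judgment.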
EXERCISES - CHAPTER 10
(1) Following the step-by-step procedures, calculate an F ratio for the following samples.
Show all work. Test the null hypothesis at the .05 and .01 levels. Draw research
conclusions.

X     Factor/Grouping Variable
20    1
10    1
30    1
30    1
50    2
40    2
30    2
40    2
80    3
90    3
70    3
80    3
(2) Four machines in a plant produce so many units (in hundreds) per day. Test a hypothesis
that there is no statistically significant difference in the productivity of the machines.
Draw research conclusions at both 95% and 99% levels of confidence.

Units    Machine #
36       1
34       1
37       1
35       1
33       1
19       2
24       2
24       2
26       2
22       2
31       3
35       3
32       3
33       3
39       3
56       4
48       4
54       4
52       4
50       4
(3) Net receipts (in thousands) for three restaurants in a chain of restaurants for one month
are as follows. Test a hypothesis that there is no statistically significant difference in
receipts of the restaurants. Draw research conclusions.

Receipts    Restaurant #
25          1
29          1
33          1
15          1
28          1
14          1
20          2
24          2
17          2
35          2
22          2
16          2
81          3
74          3
94          3
74          3
54          3
74          3
(4) A test was conducted to compare the fuel economy of three different government
automobiles. The results were expressed in miles per gallon after five tanks of gasoline had been
used in each of the automobiles. Test whether or not there are statistically significant
differences in fuel economy among the automobiles.

Fuel Economy    Type
16              a
18              a
15              a
20              a
19              a
13              b
13              b
15              b
14              b
16              b
17              c
18              c
16              c
19              c
20              c

If MPG is a major consideration, which model automobile should the government buy?
(5) The Department of Commerce used three different methods to rate effectiveness of
trainee programs. Are the three methods equally effective? Draw research conclusions.

Effectiveness Rating    Method
89                      1
100                     1
99                      1
70                      1
50                      1
21                      1
90                      2
80                      2
81                      2
70                      2
75                      2
88                      2
100                     3
130                     3
115                     3
106                     3
121                     3
149                     3