Single-factor Experiments

D.G. Bonett (3/2017)
Module 3
One-factor Experiments
A between-subjects treatment factor is an independent variable with a β‰₯ 2 levels
in which participants are randomized into a groups. It is common, but not
necessary, to have an equal number of participants in each group. Each group
receives one of the a levels of the independent variable with participants being
treated identically in every other respect. The two-group experiment considered
previously is a special case of this type of design.
In a one-factor experiment with a levels of the independent variable (also called a
completely randomized design), the population parameters are πœ‡1 , πœ‡2 , …, πœ‡π‘Ž where
πœ‡π‘— (j = 1 to a) is the mean of the dependent variable if all members of the study
population had received level j of the independent variable. One way to assess the
differences among the a population means is to compute confidence intervals for
all possible pairs of differences. For instance, with a = 3 levels the following
pairwise comparisons in population means could be examined.
πœ‡1 – πœ‡2
πœ‡1 – πœ‡3
πœ‡2 – πœ‡ 3
In a one-factor experiment with a levels there are a(a – 1)/2 pairwise comparisons.
Equation 2.1 of Module 2 can be used to compute a confidence interval for pairs of
population mean differences. Equation 2.2 of Module 2 can be used to compute
confidence intervals for pairs of population standardized mean differences.
For any single 100(1 βˆ’ 𝛼)% confidence interval, we can be 100(1 βˆ’ 𝛼)% confident
that the confidence interval contains the population parameter value if all
assumptions have been satisfied. If v 100(1 βˆ’ 𝛼)% confidence intervals are
computed, it can be shown that we can be at least 100(1 βˆ’ 𝑣𝛼)% confident that all
v confidence intervals have captured their population parameters. For instance, if
six 95% confidence intervals are computed, we can be at least 100(1 βˆ’ 𝑣𝛼)% =
100(1 – .3)% = 70% confident that all six confidence intervals have captured their
population parameter values.
The researcher would like to be at least 100(1 βˆ’ 𝛼)% confident, rather than at least
100(1 βˆ’ 𝑣𝛼)% confident, that all v confidence intervals have captured their
population parameters. One way to achieve this is to use 𝛼* = 𝛼/v rather than 𝛼 in
the critical t-value (in Equation 2.1) or critical z-value (in Equation 2.2) for each
confidence interval. The adjusted alpha level 𝛼/v is called a Bonferroni adjustment.
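The adjustment is simple to compute. The hypothetical Python sketch below (the values of v, df, and Ξ± are invented for illustration) shows how a Bonferroni-adjusted critical t-value compares with the unadjusted value:

```python
# Hypothetical sketch of a Bonferroni adjustment for v = 6 two-sided
# 95% confidence intervals; SciPy supplies the critical t-values.
from scipy.stats import t

alpha, v, df = .05, 6, 58                   # illustrative values
alpha_star = alpha / v                      # Bonferroni-adjusted alpha level
t_crit = t.ppf(1 - alpha / 2, df)           # unadjusted critical t-value
t_crit_adj = t.ppf(1 - alpha_star / 2, df)  # adjusted critical t-value

print(alpha_star)          # 0.00833...
print(t_crit, t_crit_adj)  # the adjusted critical value is larger,
                           # so each of the v intervals is wider
```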
The Tukey-Kramer method yields a slightly narrower confidence interval than the
Bonferroni method if all possible pairs of means are examined. The standard
Tukey-Kramer method assumes equal population variances, but SAS also
implements a version of the Tukey-Kramer method that does not require equal
population variances. SPSS provides an option to compute Games-Howell
confidence intervals for all pair-wise comparisons that are similar to the unequal
variance Tukey-Kramer confidence intervals. The Tukey-Kramer and Games-Howell methods are used only when the researcher is interested in examining
every pairwise difference in population means. The Bonferroni method should be
used if the researcher is interested in a subset of all possible pairwise comparisons.
Multiple confidence intervals based on the Tukey-Kramer, Games-Howell, or
Bonferroni methods are called simultaneous confidence intervals.
Using the three-decision rule in Module 2, simultaneous confidence intervals can
be used to test multiple hypotheses and keep the familywise directional error rate
(FWDER) at or below 𝛼/2. FWDER is the probability of making at least one
directional error when testing multiple null hypotheses. The Holm test is more
powerful than tests based on simultaneous confidence intervals and also keeps the
FWDER at or below 𝛼/2. To perform a Holm test of v null hypotheses, rank order
the v p-values from smallest to largest. If the smallest p-value is less than 𝛼/v, then
reject H0 for that test and examine the next smallest p-value; otherwise, do not
reject H0 for that test or any of the remaining v – 1 null hypotheses. If the second
smallest p-value is less than 𝛼/(v – 1), then reject H0 for that test and examine the
next smallest p-value; otherwise, do not reject H0 for that test or any of the
remaining v – 2 null hypotheses. If the third smallest p-value is less than 𝛼/(v – 2),
then reject H0 for that test and examine the next smallest p-value; otherwise, do
not reject H0 for that test or any of the remaining v – 3 null hypotheses (and so on).
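A minimal Python sketch of this step-down procedure is given below; the p-values and the holm function name are hypothetical:

```python
# A minimal sketch of the Holm step-down procedure described above.
# Returns one reject/do-not-reject decision per null hypothesis.
def holm(pvalues, alpha=.05):
    v = len(pvalues)
    order = sorted(range(v), key=lambda i: pvalues[i])  # indices, smallest p first
    reject = [False] * v
    for step, i in enumerate(order):
        # compare the (step+1)-th smallest p-value to alpha/(v - step)
        if pvalues[i] < alpha / (v - step):
            reject[i] = True
        else:
            break  # stop: retain this and all remaining null hypotheses
    return reject

print(holm([.060, .001, .013]))  # [False, True, True] with alpha = .05
```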
Example 3.1. There is considerable variability in measures of intellectual ability among
college students. One psychologist believes that some of this variability can be explained
by differences in how students expect to perform on these tests. Ninety undergraduates
were randomly selected from a list of about 5,400 undergraduates. The 90 students were
randomly divided into three groups of equal size and all 90 students were given a
nonverbal intelligence test (Raven’s Progressive Matrices) under identical testing
conditions. The raw scores for this test range from 0 to 60. The students in group 1 were
told that they were taking a very difficult intelligence test. The students in group 2 were
told that they were taking an interesting β€œpuzzle”. The students in group 3 were not told
anything. Simultaneous Tukey-Kramer confidence intervals for all pairwise comparisons
of population means are given below.
Comparison     95% Lower Limit     95% Upper Limit
ΞΌ1 – ΞΌ2             -5.4                -3.1
ΞΌ1 – ΞΌ3             -3.2                -1.4
ΞΌ2 – ΞΌ3              1.2                 3.5
The researcher is 95% confident that the mean intelligence score would be 3.1 to 5.4 greater
if all 5,400 undergraduates had been told that the test was a puzzle instead of a difficult IQ
test, 1.4 to 3.2 greater if they all had been told nothing instead of being told that the test is
a difficult IQ test, and 1.2 to 3.5 greater if they all had been told the test was a puzzle instead
of being told nothing. The simultaneous confidence intervals allow the researcher to be
95% confident regarding all three conclusions.
Linear Contrasts
Some research questions can be expressed in terms of a linear contrast of
population means, βˆ‘_{j=1}^{a} c_j ΞΌ_j, where c_j is called a contrast coefficient. For example,
in an experiment that compares two costly treatments (Treatments 1 and 2) with a
new inexpensive treatment (Treatment 3), a confidence interval for (ΞΌ1 + ΞΌ2)/2 –
ΞΌ3 might provide valuable information regarding the relative costs and benefits of
the new treatment. Statistical packages and various statistical formulas require
linear contrasts to be expressed as βˆ‘_{j=1}^{a} c_j ΞΌ_j so that the contrast coefficients must
be specified. For instance, (ΞΌ1 + ΞΌ2)/2 – ΞΌ3 can be expressed as ΞΌ1/2 + ΞΌ2/2 βˆ’ ΞΌ3, which
can then be expressed as (½)ΞΌ1 + (½)ΞΌ2 + (-1)ΞΌ3 so that c1 = .5, c2 = .5, and c3 = -1.
Consider another example where Treatment 1 is delivered to groups 1 and 2 by
experimenters A and B and Treatment 2 is delivered to groups 3 and 4 by
experimenters C and D. In this study we might want to estimate (ΞΌ1 + ΞΌ2)/2 –
(ΞΌ3 + ΞΌ4)/2, which can be expressed as (½)ΞΌ1 + (½)ΞΌ2 + (-½)ΞΌ3 + (-½)ΞΌ4 so that
c1 = .5, c2 = .5, c3 = -.5, and c4 = -.5.
A 100(1 βˆ’ Ξ±)% unequal-variance confidence interval for βˆ‘_{j=1}^{a} c_j ΞΌ_j is

βˆ‘_{j=1}^{a} c_j ΞΌΜ‚_j Β± t_{Ξ±/2;df} √(βˆ‘_{j=1}^{a} c_jΒ² ΟƒΜ‚_jΒ²/n_j)                  (3.1)

where df = (βˆ‘_{j=1}^{a} c_jΒ² ΟƒΜ‚_jΒ²/n_j)Β² / βˆ‘_{j=1}^{a} c_j⁴ ΟƒΜ‚_j⁴/[n_jΒ²(n_j βˆ’ 1)]. When examining v linear contrasts, Ξ± can
be replaced with Ξ±* = Ξ±/v in Equation 3.1 to give a set of Bonferroni simultaneous
confidence intervals.
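A minimal Python sketch of Equation 3.1 is given below; the function name is invented, and the inputs shown are the sample statistics from Example 3.2 (later in this module), so the output can be compared with the interval reported there:

```python
# A sketch of Equation 3.1 using NumPy and SciPy. Set alpha to alpha/v
# for Bonferroni simultaneous intervals.
import numpy as np
from scipy.stats import t

def contrast_ci(c, means, variances, n, alpha=.05):
    c, m, v, n = map(np.asarray, (c, means, variances, n))
    est = np.sum(c * m)                  # estimated linear contrast
    se = np.sqrt(np.sum(c**2 * v / n))   # unequal-variance standard error
    # Satterthwaite degrees of freedom from Equation 3.1
    df = np.sum(c**2 * v / n)**2 / np.sum(c**4 * v**2 / (n**2 * (n - 1)))
    tcrit = t.ppf(1 - alpha / 2, df)
    return est - tcrit * se, est + tcrit * se

# (mu1 + mu2)/2 - mu3 with the Example 3.2 statistics
print(contrast_ci([.5, .5, -1], [24.9, 23.1, 31.6], [27.2, 21.8, 24.8], [30, 30, 30]))
# approximately (-9.8, -5.4), agreeing with Example 3.2 up to rounding
```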
If the sample sizes are approximately equal and there is convincing evidence from
previous research that the population variances are similar, then the unequal-variance
standard error in Equation 3.1 could be replaced with an equal-variance
standard error √(ΟƒΜ‚_pΒ² βˆ‘_{j=1}^{a} c_jΒ²/n_j) where ΟƒΜ‚_pΒ² = [βˆ‘_{j=1}^{a} (n_j βˆ’ 1)ΟƒΜ‚_jΒ²]/df and df = (βˆ‘_{j=1}^{a} n_j) βˆ’ a.
Contrary to the recommendations of most statisticians, many researchers have
been taught to always use the equal-variance method.
Standardized Linear Contrasts
In applications where the intended audience might be unfamiliar with the metric
of the dependent variable, it could be helpful to report a confidence interval for a
standardized linear contrast of population means which is defined as
πœ‘ = βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— /√(βˆ‘π‘Žπ‘—=1 πœŽπ‘—2 )/π‘Ž
and is generalization of the standardized mean difference defined previously. The
denominator of πœ‘ is called the standardizer. Some alternative standardizers have
been proposed for linear contrasts. One alternative standardizer averages
variances across only those groups that have a non-zero contrast coefficient.
Another standardizer uses only the variance from a control group. Although not
recommended for routine use, the most popular standardizer is the square root of
πœŽΜ‚π‘2 defined above, which can be justified only when the population variances are
approximately equal.
An equal-variance 100(1 βˆ’ Ξ±)% confidence interval for Ο† is

φ̂ Β± z_{Ξ±/2} SE_φ̂                  (3.2)

where φ̂ = βˆ‘_{j=1}^{a} c_j ΞΌΜ‚_j / √((βˆ‘_{j=1}^{a} ΟƒΜ‚_jΒ²)/a) and SE_φ̂ = √[(Ο†Μ‚Β²/(2aΒ²)) βˆ‘_{j=1}^{a} 1/(n_j βˆ’ 1) + βˆ‘_{j=1}^{a} c_jΒ²/n_j].
An unequal-variance confidence interval for πœ‘ is available, but its standard error
formula is more complicated than Equation 3.2. When examining v linear
contrasts, 𝛼 can be replaced with 𝛼* = 𝛼/v in Equation 3.2 to give a set of
Bonferroni simultaneous confidence intervals.
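The following Python sketch implements Equation 3.2 under the same equal-variance assumption; applied to the Example 3.2 statistics given below, it reproduces the reported interval up to rounding:

```python
# A sketch of Equation 3.2; inputs are sample means, variances, and
# group sizes (here, the Example 3.2 statistics).
import numpy as np
from scipy.stats import norm

def std_contrast_ci(c, means, variances, n, alpha=.05):
    c, m, v, n = map(np.asarray, (c, means, variances, n))
    a = len(m)
    phi = np.sum(c * m) / np.sqrt(np.sum(v) / a)  # standardized contrast estimate
    se = np.sqrt(phi**2 / (2 * a**2) * np.sum(1 / (n - 1)) + np.sum(c**2 / n))
    z = norm.ppf(1 - alpha / 2)
    return phi - z * se, phi + z * se

print(std_contrast_ci([.5, .5, -1], [24.9, 23.1, 31.6], [27.2, 21.8, 24.8], [30, 30, 30]))
# approximately (-2.03, -1.04), matching Example 3.2 up to rounding
```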
Example 3.2. Ninety students were randomly selected from a research participant pool
and randomized into three groups. All three groups were given the same set of boring tasks
for 20 minutes. Then all students listened to an audio recording that listed the names of
40 people who will be attending a party and the names of 20 people who will not be
attending the party in random order. The participants were told to simply write down the
names of the people who will attend the party as they hear them. In group 1, the
participants were asked to draw copies of complex geometric figures while they were
listening to the audio recording and writing. In group 2, the participants were not told to
draw anything while listening and writing. In group 3, the participants were told to draw
squares while listening and writing. The number of correctly recorded attendees was
obtained from each participant. The sample means and variances are given below.
Complex Drawing:   ΞΌΜ‚1 = 24.9, ΟƒΜ‚1Β² = 27.2, n1 = 30
No Drawing:        ΞΌΜ‚2 = 23.1, ΟƒΜ‚2Β² = 21.8, n2 = 30
Simple Drawing:    ΞΌΜ‚3 = 31.6, ΟƒΜ‚3Β² = 24.8, n3 = 30
The 95% confidence interval for (πœ‡1 + πœ‡2 )/2 – πœ‡3 is [-9.82, -5.38]. The researcher is 95%
confident that the population mean number of correctly recorded attendees averaged
across the no drawing and complex drawing conditions is 5.38 to 9.82 lower than the
population mean number of correctly recorded attendees under the simple drawing condition. The
95% confidence interval for πœ‘ is [-2.03, -1.03]. The researcher is 95% confident that the
population mean number of correctly recorded attendee names, averaged across the no
drawing and complex drawing conditions, is 1.03 to 2.03 standard deviations below the
population mean number of correctly recorded attendee names under the simple drawing
condition.
Hypothesis Tests for Linear Contrasts
The three-decision rule can be used to assess the following null and alternative
hypotheses regarding the value of βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— .
H0: βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— = 0
H1: βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— > 0
H2: βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— < 0
A confidence interval for βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— can be used to test the above hypotheses. If the
lower limit for βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— is greater than 0, then reject H0 and accept H1. If the upper
limit for βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— is less than 0, then reject H0 and accept H2. The results are
inconclusive if the confidence interval includes 0. Note that it is not necessary to
develop special hypothesis testing rules for πœ‘ because βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— = 0 implies πœ‘ = 0,
βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— > 0 implies πœ‘ > 0, and βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— < 0 implies πœ‘ < 0.
In an equivalence test, the goal is to decide if βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— is between -𝑏 and 𝑏 or if
βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— is outside this range, where 𝑏 is a number that represents a small or
unimportant value of βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— . An equivalence test involves selecting one of the
following two hypotheses.
H0: | βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— | ≀ 𝑏
H1: |βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— | > 𝑏
In applications where it is difficult to specify a small or unimportant value of
βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— , it might be easier to specify 𝑏 for a standardized linear contrast of means
and choose between the following two hypotheses.
H0: |πœ‘| ≀ 𝑏
H1: |πœ‘| > 𝑏
One-way Analysis of Variance
The total variability in the dependent variable scores in a one-factor design can be
decomposed into two sources of variability – the variance of scores within
treatments (also called error variance) and the variance due to mean differences
across treatments (also called between-group variance). The decomposition of
variability in a one-factor study can be summarized in a one-way analysis of
variance (one-way ANOVA) table, as shown below, where SS stands for sum of
squares, MS stands for mean square, and n is the total sample size (n = 𝑛1 + 𝑛2 +
… + π‘›π‘Ž ). The between-group factor (i.e., the independent variable) will be referred
to as Factor A. The components of the ANOVA table for a one-factor design are
shown below.
Source    SS     df            MS               F
__________________________________________________________
A         SSA    dfA = a – 1   MSA = SSA/dfA    MSA/MSE
ERROR     SSE    dfE = n – a   MSE = SSE/dfE
TOTAL     SST    dfT = n – 1
__________________________________________________________
The sum of squares (SS) formulas are

SSA = βˆ‘_{j=1}^{a} n_j (ΞΌΜ‚_j βˆ’ ΞΌΜ‚_+)Β²  where ΞΌΜ‚_+ = βˆ‘_{j=1}^{a} βˆ‘_{i=1}^{n_j} y_ij / βˆ‘_{j=1}^{a} n_j

SSE = βˆ‘_{j=1}^{a} βˆ‘_{i=1}^{n_j} (y_ij βˆ’ ΞΌΜ‚_j)Β² = βˆ‘_{j=1}^{a} (n_j βˆ’ 1)ΟƒΜ‚_jΒ²

SST = βˆ‘_{j=1}^{a} βˆ‘_{i=1}^{n_j} (y_ij βˆ’ ΞΌΜ‚_+)Β² = SSA + SSE.
SSA will equal zero if all sample means are equal and will be large if the sample
means are highly unequal. MSE = SSE/dfE is called the mean squared error and is
equal to the pooled within-group variance (πœŽΜ‚π‘2 ) that was defined previously for the
equal-variance confidence interval. SST/dfT is the variance for the total set of n
scores ignoring group membership.
The SS values in the ANOVA table can be used to estimate a standardized measure
of effect size called eta-squared, which can be defined as Ξ·Β² = 1 – Οƒ_EΒ²/Οƒ_TΒ². In a
nonexperimental design, Οƒ_TΒ² is the variance of the dependent variable for everyone in
the study population and Οƒ_EΒ² is the variance of the dependent variable within each
subpopulation of the study population (and Οƒ_EΒ² is assumed to be equal across all
subpopulations). In an experimental design, Οƒ_EΒ² is the variance of the dependent
variable for everyone in the study population assuming they all received a
particular treatment, and Οƒ_TΒ² = Οƒ_ΞΌΒ² + Οƒ_EΒ² where Οƒ_ΞΌΒ² is the variance of the population
means under the a treatment conditions. An estimate of Ξ·Β² is

Ξ·Μ‚Β² = SSA/SST

or equivalently (because SSA = SST – SSE)

Ξ·Μ‚Β² = 1 – SSE/SST.
The value of πœ‚2 can range from 0 to 1 (because SSE has a possible range of 0 to SST)
and describes the proportion of the dependent variable variance in the population
that is predictable from the between-group factor. In designs with many groups,
πœ‚2 is a useful alternative to an examination of all possible pairwise comparisons.
The estimate of πœ‚2 contains sampling error of unknown magnitude and direction
and therefore a confidence interval for πœ‚2 should be reported along with πœ‚Μ‚ 2 . In
applications where the goal of the study is to show that all a population means have
similar values, a small upper confidence interval limit for πœ‚2 would provide the
necessary evidence to make such a claim. The confidence interval for πœ‚2 is
complicated but can be obtained in SAS or R.
Example 3.3. Sixty undergraduates were randomly selected from a study population of
4,350 college students and then classified into three groups according to their political
affiliation (Democrat, Republican, Independent). A stereotyping questionnaire was given
to all 60 participants. A one-way ANOVA detected differences in the three population
means (F(2, 57) = 5.02, p = .010, πœ‚Μ‚ 2 = .15, 95% CI = [.01, .30]). The researcher can be 95%
confident that 1% to 30% of the variance in the stereotyping scores of the 4,350 students
can be predicted from knowledge of their political affiliation.
The F statistic from the ANOVA table is used to test the null hypothesis H0: πœ‡1 = πœ‡2
= … = πœ‡π‘Ž against an alternative hypothesis that at least one pair of population
means is not equal. The null and alternative hypotheses also can be expressed as
H0: πœ‚2 = 0 and H1: πœ‚2 > 0. Statistical packages will compute the p-value for the F
statistic which can be used to decide if H0 can be rejected.
It is common practice to declare the ANOVA result to be β€œsignificant” when the
p-value is less than .05, but it is important to remember that a significant result
simply indicates a rejection of H0. The rejection of H0: Ξ·Β² = 0 is not a scientifically
important finding because H0: Ξ·Β² = 0 is known to be false in almost every study.
Furthermore, a "nonsignificant" result should not be interpreted as evidence that
H0: Ξ·Β² = 0 is true. Some researchers will conduct a preliminary test of H0: Ξ·Β² = 0
and only if the results are "significant" will they proceed with tests or confidence
intervals for pairwise comparisons or linear contrasts. However, this preliminary test
approach is not required or recommended when using simultaneous confidence
intervals or tests that control the FWDER.
The three-decision rule and the equivalence test do not have the same weakness as
the test of H0: πœ‚2 = 0 because the three-decision rule and equivalence test provide
useful information about the direction or magnitude of an effect. In comparison,
rejecting H0: πœ‚2 = 0 in a one-factor design does not reveal anything about how the
population means are ordered or the magnitudes of the population mean
differences. In studies where the test of H0: Ξ·Β² = 0 is "significant", a common
mistake is to interpret the order and magnitudes of the population mean
differences on the basis of the order and magnitudes of the sample mean
differences.
Two-Factor Experiments
Human behavior is complex and is influenced in many different ways. In a
one-factor experiment, the researcher is able to assess the causal effect of only one
independent variable on the dependent variable. The effect of two independent
variables on the dependent variable can be assessed in a two-factor experiment.
The two factors will be referred to as Factor A and Factor B. The simplest type of
two-factor experiment has two levels of Factor A and two levels of Factor B. We call
this a 2 × 2 factorial experiment. If Factor A had 4 levels and Factor B had 3 levels,
it would be called a 4 × 3 factorial experiment. In general, an a × b factorial
experiment has a levels of Factor A and b levels of Factor B.
There are two types of two-factor between-subjects experiments. In one case, both
factors are between-subjects treatment factors and participants are randomly
assigned to the combinations of treatment conditions. In the other case, one factor
is a treatment factor and the other is a classification factor. A classification factor
is a factor with levels to which participants are classified according to some existing
characteristic such as sex, ethnicity, or political affiliation. In a two-factor
experiment with one treatment factor, participants are randomly assigned to the
treatment conditions within each level of the classification factor. A study could
have two classification factors, but then it would be a nonexperimental design.
Example 3.4. An experiment with two treatment factors takes randomly sampled Coast
Guard personnel and randomizes them to one of four treatment conditions: 24 hours of
sleep deprivation and 15 hours without food; 36 hours of sleep deprivation and 15 hours
without food; 24 hours of sleep deprivation and 30 hours without food; and 36 hours of
sleep deprivation and 30 hours without food. One treatment factor is hours of sleep
deprivation (24 or 36 hours) and the other treatment factor is hours of food deprivation
(15 or 30 hours). The dependent variable is the score on a complex problem-solving task.
Example 3.5. An experiment with one classification factor and one treatment factor uses
a random sample of men and a random sample of women from a volunteer list of students
taking introductory chemistry. The men and women samples are each randomized into
two groups with one group receiving 4 hours of chemistry review and the other group
receiving 6 hours of chemistry review. The treatment factor is the amount of review (4 or
6 hours) and the classification factor is gender. The dependent variable is the score on the
final comprehensive exam.
One advantage of a two-factor experiment is that the effects of both Factor A and
Factor B can be assessed in a single study. Questions about the effects of Factor A
and Factor B could be answered using two separate one-factor experiments.
However, two one-factor experiments would require at least twice the total number
of participants to obtain confidence intervals with the same precision or hypothesis
tests with the same power that could be obtained from a single two-factor
experiment. Thus, a single two-factor experiment is more economical than two
one-factor experiments.
A two-factor experiment also can provide information that cannot be obtained
from two one-factor experiments. Specifically, a two-factor experiment can
provide unique information about the interaction effect between Factor A and
Factor B. An interaction effect occurs when the effect of Factor A is not the same
across the levels of Factor B (which is equivalent to saying that the effect of Factor
B is not the same across the levels of Factor A).
The inclusion of a second factor can improve the external validity of an experiment.
For instance, if there is a concern that participants might perform a particular task
differently in the morning than in the afternoon, then time of day (e.g., morning
vs. afternoon) could serve as a second 2-level factor in the experiment. If the
interaction effect between the Factor A and the time-of-day factor (Factor B) is
small, then the effect of Factor A would generalize to both morning and afternoon
testing conditions, thus increasing the external validity of the results for Factor A.
The external validity of an experiment also can be improved by including a
classification factor. In stratified random sampling, random samples are taken
from two or more different study populations that differ geographically or in other
demographic characteristics. If the interaction between the classification factor
and the treatment factor is small, then the effect of the treatment factor can be
generalized to the multiple study populations, thereby increasing the external
validity of the results for the treatment factor.
The inclusion of a classification factor also can reduce error variance (MSE), which
will in turn increase the power of statistical tests and reduce the widths of
confidence intervals. For instance, in a one-factor experiment with male and
female subjects, if women tend to score higher than men, then this will increase
the error variance (the variance of scores within treatments). If gender is added as
a classification factor, the error variance will then be determined by the variability
of scores within each treatment and within each gender, which will result in a
smaller MSE.
Consider the special case of a 2 × 2 design. The population means for a 2 × 2 design
are shown below.

                     Factor B
                   b1      b2
Factor A     a1    ΞΌ11     ΞΌ12
             a2    ΞΌ21     ΞΌ22
The main effects of Factor A and Factor B and the AB interaction effect are given
below.

A:   (ΞΌ11 + ΞΌ12)/2 – (ΞΌ21 + ΞΌ22)/2
B:   (ΞΌ11 + ΞΌ21)/2 – (ΞΌ12 + ΞΌ22)/2
AB:  (ΞΌ11 βˆ’ ΞΌ12) – (ΞΌ21 βˆ’ ΞΌ22) = (ΞΌ11 βˆ’ ΞΌ21) – (ΞΌ12 βˆ’ ΞΌ22) = ΞΌ11 βˆ’ ΞΌ21 – ΞΌ12 + ΞΌ22
The simple main effects of A and B are given below.

A at b1: ΞΌ11 βˆ’ ΞΌ21          B at a1: ΞΌ11 βˆ’ ΞΌ12
A at b2: ΞΌ12 βˆ’ ΞΌ22          B at a2: ΞΌ21 βˆ’ ΞΌ22
The interaction effect can be expressed as a difference in simple main effects,
specifically (πœ‡11 βˆ’ πœ‡12 ) – (πœ‡21 βˆ’ πœ‡22 ) = (B at a1) – (B at a2), or equivalently,
(πœ‡11 βˆ’ πœ‡21 ) – (πœ‡12 βˆ’ πœ‡22 ) = (A at b1) – (A at b2). The main effects can be expressed
as averages of simple main effects. The main effect of A is (A at b1 + A at b2)/2 =
(πœ‡11 βˆ’ πœ‡21 + πœ‡12 βˆ’ πœ‡22 )/2 = (πœ‡11 + πœ‡12 )/2 – (πœ‡21 + πœ‡22 )/2. The main effect of B is (B
at a1 + B at a2)/2 = (πœ‡11 βˆ’ πœ‡12 + πœ‡21 βˆ’ πœ‡22 )/2 = (πœ‡11 + πœ‡21 )/2 – (πœ‡12 + πœ‡22 )/2. All of
the above effects are special cases of a linear contrast of means, and confidence
intervals for these effects can be obtained using Equation 3.1.
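For computation, each of these effects can be written as a vector of contrast coefficients and passed to an Equation 3.1 routine (such as the contrast_ci function sketched earlier). A hypothetical Python sketch, with the four means ordered (ΞΌ11, ΞΌ12, ΞΌ21, ΞΌ22):

```python
# Contrast coefficient vectors for the 2 x 2 effects defined above,
# with the cell means ordered (mu11, mu12, mu21, mu22).
effects = {
    "A main":  [ .5,  .5, -.5, -.5],  # (mu11 + mu12)/2 - (mu21 + mu22)/2
    "B main":  [ .5, -.5,  .5, -.5],  # (mu11 + mu21)/2 - (mu12 + mu22)/2
    "AB":      [  1,  -1,  -1,   1],  # mu11 - mu12 - mu21 + mu22
    "A at b1": [  1,   0,  -1,   0],  # mu11 - mu21
    "A at b2": [  0,   1,   0,  -1],  # mu12 - mu22
    "B at a1": [  1,  -1,   0,   0],  # mu11 - mu12
    "B at a2": [  0,   0,   1,  -1],  # mu21 - mu22
}
```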
The main effect of A (which is the average of A at b1 and A at b2) could be misleading
because A at b1 and A at b2 will be highly dissimilar if the AB interaction is large.
Likewise, the main effect of B (which is the average of B at a1 and B at a2) could be
misleading if the AB interaction is large because B at a1 and B at a2 will be highly
dissimilar. If the AB interaction effect is large, then an analysis of simple main
effects will be more meaningful than an analysis of main effects. If the AB
interaction is small, then an analysis of the main effects of Factor A and Factor B
will not be misleading and an analysis of simple main effects will be unnecessary.
Pairwise Comparisons in Two-factor Designs
In experiments where Factor A or Factor B has more than two levels, various
pairwise comparisons can be made. Consider a 2 × 3 design where the main effects
of Factor B are of interest. The population means are given below.
                       Factor B
                   b1      b2      b3
Factor A     a1    ΞΌ11     ΞΌ12     ΞΌ13
             a2    ΞΌ21     ΞΌ22     ΞΌ23
The following three pairwise main effects can be defined for Factor B

B12:  (ΞΌ11 + ΞΌ21)/2 – (ΞΌ12 + ΞΌ22)/2
B13:  (ΞΌ11 + ΞΌ21)/2 – (ΞΌ13 + ΞΌ23)/2
B23:  (ΞΌ12 + ΞΌ22)/2 – (ΞΌ13 + ΞΌ23)/2
where the subscripts of B represent the levels of the factor being compared.
If one or both factors have more than two levels, then more than one interaction
effect can be examined. An interaction effect can be defined for any two levels of
Factor A and any two levels of Factor B. For instance, in the 2 × 3 design described
above, the following three pairwise interaction effects can be defined
A12B12: πœ‡11 βˆ’ πœ‡12 βˆ’ πœ‡21 + πœ‡22
A12B13: πœ‡11 βˆ’ πœ‡13 βˆ’ πœ‡21 + πœ‡23
A12B23: πœ‡12 βˆ’ πœ‡13 βˆ’ πœ‡22 + πœ‡23
where the subscripts of AB represent the levels of Factor A and Factor B being
compared. The number of pairwise interaction effects can be overwhelming in
larger designs. For instance, in a 4 × 3 design, there are six pairs of Factor A levels
and three pairs of Factor B levels from which 6 × 3 = 18 pairwise interaction effects
could be examined. Pairwise interaction effects are typically examined in designs
where the number of factor levels of each factor is small.
If an AB interaction has been detected, then the simple main effects of Factor A or
the simple main effects of Factor B provide useful information. Suppose the simple
main effects of Factor A are to be examined and Factor A has more than two levels.
In this situation, pairwise simple main effects can be examined. In the 2 × 3 design
described above, Factor B has three levels and the simple pairwise main effects of
Factor B are
B12 at π‘Ž1 : πœ‡11 βˆ’ πœ‡12
B12 at π‘Ž2 : πœ‡21 βˆ’ πœ‡22
B13 at π‘Ž1 : πœ‡11 βˆ’ πœ‡13
B13 at π‘Ž2 : πœ‡21 βˆ’ πœ‡23
B23 at π‘Ž1 : πœ‡12 βˆ’ πœ‡13
B23 at π‘Ž2 : πœ‡22 βˆ’ πœ‡23
Note that all of the above pairwise comparisons are linear contrasts of the
population means and can be expressed as βˆ‘_{j=1}^{ab} c_j ΞΌ_j where ab is the total number
of groups. For instance, the contrast coefficients that define the B12 pairwise main
effect of Factor B (assuming the means in the 2 × 3 table are ordered left to right
and then top to bottom) are 1/2, -1/2, 0, 1/2, -1/2, 0; the contrast coefficients that
define the A12B12 pairwise interaction effect are 1, -1, 0, -1, 1, 0; and the contrast
coefficients that define the pairwise simple main effect for B12 at a1 are 1, -1, 0, 0,
0, 0.
Two-Way Analysis of Variance
Now consider a general a × b factorial design. The total variability of the
quantitative dependent variable scores in a two-factor design can be decomposed
into four sources of variability: the variance due to differences in means across the
levels of Factor A, the variance due to differences in means of across the levels of
Factor B, the variance due to differences in simple main effects of one factor across
the levels of the other factor (the AB interaction), and the variance of scores within
treatments (the error variance). The decomposition of the total variance in a two-factor
design can be summarized in the following two-way analysis of variance
(two-way ANOVA) table where n is the total sample size.
Source    SS      df                      MS                  F
_________________________________________________________________________
A         SSA     dfA = a – 1             MSA = SSA/dfA       MSA/MSE
B         SSB     dfB = b – 1             MSB = SSB/dfB       MSB/MSE
AB        SSAB    dfAB = (a – 1)(b – 1)   MSAB = SSAB/dfAB    MSAB/MSE
ERROR     SSE     dfE = n – ab            MSE = SSE/dfE
TOTAL     SST     dfT = n – 1
_________________________________________________________________________
The TOTAL and ERROR sum of squares (SS) formulas in a two-way ANOVA shown
below are conceptually similar to the one-way ANOVA formulas

SST = βˆ‘_{k=1}^{b} βˆ‘_{j=1}^{a} βˆ‘_{i=1}^{n_jk} (y_ijk βˆ’ ΞΌΜ‚_++)Β²

SSE = βˆ‘_{k=1}^{b} βˆ‘_{j=1}^{a} βˆ‘_{i=1}^{n_jk} (y_ijk βˆ’ ΞΌΜ‚_jk)Β²

where ΞΌΜ‚_++ = βˆ‘_{k=1}^{b} βˆ‘_{j=1}^{a} βˆ‘_{i=1}^{n_jk} y_ijk / βˆ‘_{k=1}^{b} βˆ‘_{j=1}^{a} n_jk. The formulas for SSA, SSB, and SSAB
are complicated unless the sample sizes are equal. If all sample sizes are equal to
n0, the formulas for SSA, SSB, and SSAB are

SSA = bn0 βˆ‘_{j=1}^{a} (ΞΌΜ‚_j+ βˆ’ ΞΌΜ‚_++)Β²  where ΞΌΜ‚_j+ = βˆ‘_{k=1}^{b} βˆ‘_{i=1}^{n0} y_ijk / bn0

SSB = an0 βˆ‘_{k=1}^{b} (ΞΌΜ‚_+k βˆ’ ΞΌΜ‚_++)Β²  where ΞΌΜ‚_+k = βˆ‘_{j=1}^{a} βˆ‘_{i=1}^{n0} y_ijk / an0

SSAB = SST – SSE – SSA – SSB.
Partial eta-squared estimates are computed from the sum of squares estimates, as
shown below.

Ξ·Μ‚_AΒ² = SSA/(SST – SSB – SSAB) = SSA/(SSA + SSE)
Ξ·Μ‚_BΒ² = SSB/(SST – SSA – SSAB) = SSB/(SSB + SSE)
Ξ·Μ‚_ABΒ² = SSAB/(SST – SSB – SSA) = SSAB/(SSAB + SSE)

These measures are called β€œpartial” effect sizes because variability in the dependent
variable due to the effects of other factors is removed. For example, SSB and SSAB
are subtracted from SST to obtain Ξ·Μ‚_AΒ². The method of computing a confidence
interval for a population partial eta-squared parameter is complicated, but the
interval can be obtained in SAS or R. In designs where a factor has many levels, a
partial eta-squared estimate is a simple alternative to reporting all possible pairwise
comparisons among the factor levels.
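Because the partial eta-squared estimates are simple functions of the SS values, they are easy to compute directly; the SS numbers below are invented:

```python
# Partial eta-squared estimates from hypothetical two-way ANOVA SS values.
ssa, ssb, ssab, sse = 120.0, 80.0, 40.0, 400.0

eta_a  = ssa  / (ssa  + sse)   # partial eta-squared for Factor A
eta_b  = ssb  / (ssb  + sse)   # partial eta-squared for Factor B
eta_ab = ssab / (ssab + sse)   # partial eta-squared for the AB interaction
print(eta_a, eta_b, eta_ab)    # 0.2308..., 0.1667..., 0.0909...
```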
The F statistics for the main effect of Factor A, the main effect of Factor B, and the
AB interaction effect test the null hypotheses H0: Ξ·_AΒ² = 0, H0: Ξ·_BΒ² = 0, and H0: Ξ·_ABΒ² =
0, respectively. Tests of these null hypotheses suffer from the same problem as the
test of the null hypothesis in a one-way ANOVA in that a β€œsignificant” result does not
imply a scientifically important result, and a β€œnonsignificant” result does not imply
that the effect is zero. The new APA guidelines suggest that the F statistics and
p-values for each effect be supplemented with confidence intervals for population
eta-squared parameters, linear contrasts of population means, or standardized
linear contrasts of population means.
Although a β€œnonsignificant” (i.e., inconclusive) test for the AB interaction effect
does not imply that the population interaction effect is zero, it is customary to
examine main effects rather than simple main effects if the AB interaction test is
inconclusive. If the test for the AB interaction effect is β€œsignificant”, it is customary
to only analyze simple main effects or pairwise simple main effects. However, a
main effect could be interesting, even if the AB interaction effect is β€œsignificant”, if
the partial eta-squared estimate for the main effect is substantially larger than Ξ·Μ‚_ABΒ².
Three-factor Experiments
The effects of three independent variables on the dependent variable can be
assessed in a three-factor design. The three factors will be referred to as Factor A,
Factor B, and Factor C. Like a two-factor design, a three-factor design provides
information about main effects and two-way interaction effects. Specifically, the
main effects of Factors A, B, and C can be estimated as well as the AB, AC, and BC
two-way interactions. These main effects and two-way interaction effects could be
estimated from three separate two-factor studies. A three-factor study has the
advantage of providing all this information in a single study and also provides
information about a three-way interaction (ABC) that could not be obtained from
separate two-factor studies. The factors in a three-factor design can be treatment
factors or classification factors. If all factors are classification factors, then the
study would be a nonexperimental design.
The simplest type of three-factor study has two levels of each factor and is called a
2 × 2 × 2 factorial design. In general, a × b × c factorial designs have a levels of
Factor A, b levels of Factor B, and c levels of Factor C.
A table of population means is shown below for the 2 × 2 × 2 factorial design.

                           Factor C
                    c1                    c2
                Factor B              Factor B
               b1       b2           b1       b2
Factor A  a1   ΞΌ111     ΞΌ121         ΞΌ112     ΞΌ122
          a2   ΞΌ211     ΞΌ221         ΞΌ212     ΞΌ222
The main effects of Factors A, B, and C are defined as

A:  (ΞΌ111 + ΞΌ121 + ΞΌ112 + ΞΌ122)/4 – (ΞΌ211 + ΞΌ221 + ΞΌ212 + ΞΌ222)/4
B:  (ΞΌ111 + ΞΌ211 + ΞΌ112 + ΞΌ212)/4 – (ΞΌ121 + ΞΌ221 + ΞΌ122 + ΞΌ222)/4
C:  (ΞΌ111 + ΞΌ211 + ΞΌ121 + ΞΌ221)/4 – (ΞΌ112 + ΞΌ212 + ΞΌ122 + ΞΌ222)/4,

the three two-way interaction effects are defined as

AB:  (ΞΌ111 + ΞΌ112)/2 – (ΞΌ121 + ΞΌ122)/2 – (ΞΌ211 + ΞΌ212)/2 + (ΞΌ221 + ΞΌ222)/2
AC:  (ΞΌ111 + ΞΌ121)/2 – (ΞΌ112 + ΞΌ122)/2 – (ΞΌ211 + ΞΌ221)/2 + (ΞΌ212 + ΞΌ222)/2
BC:  (ΞΌ111 + ΞΌ211)/2 – (ΞΌ112 + ΞΌ212)/2 – (ΞΌ121 + ΞΌ221)/2 + (ΞΌ122 + ΞΌ222)/2,

and the three-way interaction effect is defined as

ABC:  ΞΌ111 βˆ’ ΞΌ121 βˆ’ ΞΌ211 + ΞΌ221 βˆ’ ΞΌ112 + ΞΌ122 + ΞΌ212 βˆ’ ΞΌ222.
The simple main effects of Factors A, B, and C are given below.

A at b1:  (ΞΌ111 + ΞΌ112)/2 – (ΞΌ211 + ΞΌ212)/2
A at b2:  (ΞΌ121 + ΞΌ122)/2 – (ΞΌ221 + ΞΌ222)/2
A at c1:  (ΞΌ111 + ΞΌ121)/2 – (ΞΌ211 + ΞΌ221)/2
A at c2:  (ΞΌ112 + ΞΌ122)/2 – (ΞΌ212 + ΞΌ222)/2
B at a1:  (ΞΌ111 + ΞΌ112)/2 – (ΞΌ121 + ΞΌ122)/2
B at a2:  (ΞΌ211 + ΞΌ212)/2 – (ΞΌ221 + ΞΌ222)/2
B at c1:  (ΞΌ111 + ΞΌ211)/2 – (ΞΌ121 + ΞΌ221)/2
B at c2:  (ΞΌ112 + ΞΌ212)/2 – (ΞΌ122 + ΞΌ222)/2
C at a1:  (ΞΌ111 + ΞΌ121)/2 – (ΞΌ112 + ΞΌ122)/2
C at a2:  (ΞΌ211 + ΞΌ221)/2 – (ΞΌ212 + ΞΌ222)/2
C at b1:  (ΞΌ111 + ΞΌ211)/2 – (ΞΌ112 + ΞΌ212)/2
C at b2:  (ΞΌ121 + ΞΌ221)/2 – (ΞΌ122 + ΞΌ222)/2
The simple-simple main effects of Factors A, B, and C are defined as

A at b1c1: ΞΌ111 βˆ’ ΞΌ211     B at a1c1: ΞΌ111 βˆ’ ΞΌ121     C at a1b1: ΞΌ111 βˆ’ ΞΌ112
A at b1c2: ΞΌ112 βˆ’ ΞΌ212     B at a1c2: ΞΌ112 βˆ’ ΞΌ122     C at a1b2: ΞΌ121 βˆ’ ΞΌ122
A at b2c1: ΞΌ121 βˆ’ ΞΌ221     B at a2c1: ΞΌ211 βˆ’ ΞΌ221     C at a2b1: ΞΌ211 βˆ’ ΞΌ212
A at b2c2: ΞΌ122 βˆ’ ΞΌ222     B at a2c2: ΞΌ212 βˆ’ ΞΌ222     C at a2b2: ΞΌ221 βˆ’ ΞΌ222,
and the simple two-way interaction effects are defined as
AB at c1: πœ‡111 βˆ’ πœ‡121 βˆ’ πœ‡211 + πœ‡221
AB at c2: πœ‡112 βˆ’ πœ‡122 βˆ’ πœ‡212 + πœ‡222
AC at b1: πœ‡111 βˆ’ πœ‡211 βˆ’ πœ‡112 + πœ‡212
AC at b2: πœ‡121 βˆ’ πœ‡221 βˆ’ πœ‡122 + πœ‡222
BC at a1: πœ‡111 βˆ’ πœ‡121 βˆ’ πœ‡112 + πœ‡122
BC at a2: πœ‡211 βˆ’ πœ‡221 βˆ’ πœ‡212 + πœ‡222 .
The ABC interaction in a 2 × 2 × 2 design can be conceptualized as a difference in
simple two-way interaction effects. Specifically, the ABC interaction is the
difference between AB at 𝑐1 and AB at 𝑐2 , the difference between AC at b1 and AC
at b2, or the difference between BC at a1 and BC at a2. Although the meaning of a
three-way interaction is not easy to grasp, its meaning becomes clearer when it is
viewed as the difference in simple two-way interaction effects with each simple
two-way interaction viewed as a difference in simple-simple main effects. (Note
that 𝑐1 and 𝑐2 are used in this section to represent levels of Factor C and should not
be confused with the previous use of 𝑐𝑗 to represent contrast coefficients).
The two-way interaction effects in a three-factor design are conceptually the same
as in a two-factor design. Two-way interactions in a three-factor design are defined
by collapsing the three-dimensional table of population means to create a
two-dimensional table of means with cell means that have been averaged over the
collapsed dimension. For instance, a table of averaged population means after
collapsing Factor C gives the following 2 × 2 table from which the AB interaction
can be defined in terms of the averaged population means.

                          Factor B
                   b1                  b2
Factor A   a1   (ΞΌ111 + ΞΌ112)/2    (ΞΌ121 + ΞΌ122)/2
           a2   (ΞΌ211 + ΞΌ212)/2    (ΞΌ221 + ΞΌ222)/2
Three-Way Analysis of Variance
The total variance of the dependent variable scores in a three-factor design can be
decomposed into eight sources of variability – three main effects, three two-way
interactions, one three-way interaction, and the within-group error variance. The
decomposition of the total variance in a three-factor design can be summarized in
the following three-way analysis of variance (three-way ANOVA) table where n is
the total sample size.
Source    SS       df                   MS                     F
________________________________________________________________________
A         SSA      dfA = a – 1          MSA = SSA/dfA          MSA/MSE
B         SSB      dfB = b – 1          MSB = SSB/dfB          MSB/MSE
C         SSC      dfC = c – 1          MSC = SSC/dfC          MSC/MSE
AB        SSAB     dfAB = dfAdfB        MSAB = SSAB/dfAB       MSAB/MSE
AC        SSAC     dfAC = dfAdfC        MSAC = SSAC/dfAC       MSAC/MSE
BC        SSBC     dfBC = dfBdfC        MSBC = SSBC/dfBC       MSBC/MSE
ABC       SSABC    dfABC = dfAdfBdfC    MSABC = SSABC/dfABC    MSABC/MSE
ERROR     SSE      dfE = n – abc        MSE = SSE/dfE
TOTAL     SST      dfT = n – 1
________________________________________________________________________
The SS formulas for a three-way ANOVA are conceptually similar to those for the
two-way ANOVA and will not be presented. Partial eta-squared estimates are
computed from the SS estimates in a three-way ANOVA in the same way they are
computed in a two-way ANOVA. For example, Ξ·Μ‚_AΒ² = SSA/(SSA + SSE) and Ξ·Μ‚_ABCΒ² =
SSABC/(SSABC + SSE).
The hypothesis tests in the three-way ANOVA suffer from the same problem as the
hypothesis tests in the one-way and two-way ANOVA. These tests should be
supplemented with confidence intervals for population eta-squared values, linear
contrasts of population means, or standardized linear contrasts of population
means to provide information regarding the magnitude of each effect.
If an ABC interaction has been detected in a three-way ANOVA, simple two-way
interactions or simple-simple main effects should be examined. A two-way
interaction could be examined even if an ABC interaction is β€œsignificant” if the
partial eta-squared estimate for a two-way interaction is substantially larger than
the partial eta-squared estimate for the ABC interaction.
If the test for an ABC interaction is inconclusive, the AB, AC, and BC interactions
should be examined. Using Factor A as an example, if AB and AC interactions are
detected, then simple-simple main effects of A should be examined because Factor
A interacts with both Factor B and Factor C. If an AB interaction is detected, but
the test for the AC interaction is inconclusive, then the simple main effects of A
should be examined at each level of Factor B. Similarly, if an AC interaction is
detected, but the test of an AB interaction is inconclusive, then the simple main
effects of A should be examined at each level of Factor C. Even if AB and AC
interactions have been detected, the main effect of A could be examined if the
partial eta-squared estimate for the main effect of A is substantially larger than the
partial eta-squared estimates for the AB and AC interactions.
If the tests for the ABC, AB, AC, and BC interactions are all inconclusive, then all
three main effects should be examined. An analysis of main effects can be
justified even if interactions are β€œsignificant” if the partial eta-squared estimates
for the main effects are substantially larger than the partial eta-squared estimates
for the interaction effects.
Assumptions
In addition to the random sampling and independence assumptions, the ANOVA
tests, the equal-variance Tukey-Kramer confidence intervals for pairwise
comparisons, the equal-variance confidence interval for βˆ‘_{j=1}^{a} c_j ΞΌ_j, the
equal-variance confidence interval for Ξ΄, and the confidence interval for Ξ·Β² all assume
equality of population variances across treatment conditions and normality of the
dependent variable in the study population under any given treatment condition.
The effects of violating these assumptions are identical to those for the
equal-variance confidence interval for ΞΌ1 βˆ’ ΞΌ2 described in Module 2.

The Games-Howell and unequal-variance Tukey-Kramer methods for pairwise
comparisons, and the unequal-variance confidence intervals for βˆ‘_{j=1}^{a} c_j ΞΌ_j and Ο†,
relax the equal population variance assumption and are preferred to the
equal-variance methods unless the sample sizes are approximately equal and there is
compelling prior information to suggest that the population variances are similar
across treatment conditions. The Welch test is an alternative to the traditional
one-way ANOVA test that relaxes the equal variance assumption and can be obtained
in SAS, SPSS, and R.
The adverse effects of violating the normality assumption on the ANOVA tests and
the confidence intervals for βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡π‘— are usually not serious unless the dependent
variable is highly skewed and the sample size per group is small (𝑛𝑗 < 20). However,
leptokurtosis of the dependent variable is detrimental to the performance of the
confidence interval for πœ‚2 and πœ‘. Furthermore, the adverse effects of leptokurtosis
on these confidence intervals are not diminished in large sample sizes. Data
transformations are sometimes helpful in reducing leptokurtosis in distributions
that are also skewed.
To assess the degree of non-normality in a design with a β‰₯ 2 groups, subtract ΞΌΜ‚_j
from all of the group j scores and then estimate the skewness and kurtosis coefficients
from these n1 + n2 + β‹― + na deviation scores. If the deviation scores are skewed, it
might be possible to reduce the skewness by transforming (e.g., log, square-root,
reciprocal) the dependent variable scores.
Distribution-free Methods
If the response variable is skewed, a confidence interval for a linear contrast of
population medians might be more appropriate and meaningful than a confidence
interval for a linear contrast of population means. An approximate 100(1 βˆ’ Ξ±)%
confidence interval for βˆ‘_{j=1}^{a} c_j Ο„_j is

βˆ‘_{j=1}^{a} c_j Ο„Μ‚_j Β± z_{Ξ±/2} √(βˆ‘_{j=1}^{a} c_jΒ² SE_{Ο„Μ‚_j}Β²)                  (3.7)

where SE_{Ο„Μ‚_j}Β² was defined in Equation 1.8. This confidence interval only assumes
random sampling and independence among participants. Equation 3.7 can be used
to test H0: βˆ‘_{j=1}^{a} c_j Ο„_j = 0 and decide if βˆ‘_{j=1}^{a} c_j Ο„_j > 0 or βˆ‘_{j=1}^{a} c_j Ο„_j < 0. Equation 3.7
also can be used to test H0: |βˆ‘_{j=1}^{a} c_j Ο„_j| ≀ b against H1: |βˆ‘_{j=1}^{a} c_j Ο„_j| > b.
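A minimal Python sketch of Equation 3.7 is shown below. The median standard errors are assumed to have been computed from Equation 1.8 of Module 1 and are passed in directly; all input values and the function name are hypothetical:

```python
# A sketch of Equation 3.7 for a linear contrast of medians.
import numpy as np
from scipy.stats import norm

def median_contrast_ci(c, medians, se_medians, alpha=.05):
    c, med, se = map(np.asarray, (c, medians, se_medians))
    est = np.sum(c * med)                       # estimated contrast of medians
    se_contrast = np.sqrt(np.sum(c**2 * se**2)) # standard error of the contrast
    z = norm.ppf(1 - alpha / 2)
    return est - z * se_contrast, est + z * se_contrast

# hypothetical sample medians and their Equation 1.8 standard errors
print(median_contrast_ci([.5, .5, -1], [31.0, 29.5, 35.0], [1.2, 1.4, 1.1]))
```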
The Kruskal-Wallis test is a distribution-free test of the null hypothesis that the
dependent variable distribution is identical (same location, variance, and shape)
in all a treatment conditions (or all a subpopulations in a nonexperimental design).
A rejection of the null hypothesis implies differences in the location, variance, or
shape of the dependent variable distribution in at least two of the treatment
conditions or subpopulations. The Kruskal-Wallis test is used as a distribution-free
alternative to the one-way ANOVA and suffers from the same problem as the one-way ANOVA because the null hypothesis is known to be false in virtually every
study. In designs with more than two groups, useful information can be obtained
by performing multiple Mann-Whitney tests for some or all pairwise comparisons
using the Holm procedure. Simultaneous confidence intervals for pairwise
differences or ratios of medians, the Mann-Whitney parameter for pairwise
comparisons, or linear contrasts of medians are informative alternatives to the
Kruskal-Wallis test. Some researchers use the Kruskal-Wallis test as a screening
test to determine if multiple Mann-Whitney tests or simultaneous confidence
intervals are necessary.
Sample Size Requirements for Desired Precision
The sample size requirement per group to estimate a linear contrast of population
means with desired confidence and precision is approximately

n_j = 4σ̃²(βˆ‘_{j=1}^{a} c_jΒ²)(z_{Ξ±/2}/w)Β² + z_{Ξ±/2}Β²/2m                  (3.8)
where πœŽΜƒ 2 is the average within-group variance, and m is the number of non-zero 𝑐𝑗
values. Note that Equation 3.8 reduces to Equation 2.5 for the special case of
comparing two means. Equation 3.8 also can be used for factorial designs where a
is the total number of treatment combinations. The MSE from previous research is
often used as a planning value for the average within-group variance.
Example 3.7. A researcher wants to estimate (πœ‡11 + πœ‡12 )/2 – (πœ‡21 + πœ‡22 )/2 in a 2 × 2
factorial experiment with 95% confidence, a desired confidence interval width of 3.0, and
a planning value of 8.0 for the average within-group error variance. The contrast
coefficients are 1/2, 1/2, -1/2, and -1/2. The sample size requirement per group is
approximately 𝑛𝑗 = 4(8.0)(1/4 + 1/4 + 1/4 + 1/4)(1.96/3.0)2 + 0.48 = 14.2 β‰ˆ 15.
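A small Python sketch of Equation 3.8 (the function name is invented), checked against Example 3.7:

```python
# Equation 3.8: sample size per group for a contrast CI of desired width.
import math
from scipy.stats import norm

def n_per_group_contrast(var_avg, c, width, alpha=.05):
    z = norm.ppf(1 - alpha / 2)
    m = sum(1 for cj in c if cj != 0)  # number of non-zero coefficients
    n = 4 * var_avg * sum(cj**2 for cj in c) * (z / width)**2 + z**2 / (2 * m)
    return math.ceil(n)

print(n_per_group_contrast(8.0, [.5, .5, -.5, -.5], 3.0))  # 15, as in Example 3.7
```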
The sample size requirement per group to estimate a standardized linear contrast
of population means (Ο†) with desired confidence and precision is approximately

n_j = [2φ̃²/a + 4(βˆ‘_{j=1}^{a} c_jΒ²)](z_{Ξ±/2}/w)Β²                  (3.9)
where πœ‘Μƒ is a planning value of πœ‘. Note that this sample size formula reduces to
Equation 2.6 for the special case of a standardized mean difference.
Example 3.8. A researcher wants to estimate πœ‘ in a one-factor experiment (a = 3) with
95% confidence, a desired confidence interval width of 0.6, and πœ‘Μƒ = 0.8. The contrast
coefficients are 1/2, 1/2, and -1. The sample size requirement per group is approximately
𝑛𝑗 = [2(0.64)/3 + 4(1/4 + 1/4 + 1)](1.96/0.6)2 = 68.6 β‰ˆ 69.
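An analogous sketch of Equation 3.9, checked against Example 3.8:

```python
# Equation 3.9: sample size per group for a standardized contrast CI.
import math
from scipy.stats import norm

def n_per_group_std_contrast(phi_plan, c, width, alpha=.05):
    z = norm.ppf(1 - alpha / 2)
    a = len(c)
    n = (2 * phi_plan**2 / a + 4 * sum(cj**2 for cj in c)) * (z / width)**2
    return math.ceil(n)

print(n_per_group_std_contrast(0.8, [.5, .5, -1], 0.6))  # 69, as in Example 3.8
```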
A simple formula for approximating the sample size needed to obtain a confidence
interval for πœ‚2 having a desired width is currently not available. However, if sample
data can be obtained in two stages, then the confidence interval width for πœ‚2
obtained in the first-stage sample can be used in Equation 1.12 to approximate the
additional number of participants needed in the second- stage sample to achieve
the desired confidence interval width.
Example 3.9. A first-stage sample size of 12 participants per group in a one-factor
experiment gave a 95% confidence interval for πœ‚ 2 with a width of 0.51. The researcher
would like to obtain a 95% confidence interval for Ξ·Β² that has a width of 0.30. To achieve
this goal, [(0.51/0.30)2 – 1]12 = 22.7 β‰ˆ 23 additional participants per group are needed.
Sample Size Requirements for Desired Power
The sample size requirement per group to test H0: βˆ‘_{j=1}^{a} c_j ΞΌ_j = 0 for a specified value
of Ξ± and with desired power is approximately

n_j = σ̃²(βˆ‘_{j=1}^{a} c_jΒ²)(z_{Ξ±/2} + z_Ξ²)Β²/(βˆ‘_{j=1}^{a} c_j ΞΌΜƒ_j)Β² + z_{Ξ±/2}Β²/2m                  (3.10)
where πœŽΜƒ 2 is the average within-group variance, βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡Μƒπ‘— is the anticipated effect
size value, and m is the number of non-zero 𝑐𝑗 values. This sample size formula
reduces to Equation 2.7 when the contrast involves the comparison of two means.
In applications where βˆ‘π‘Žπ‘—=1 𝑐𝑗 πœ‡Μƒπ‘— or πœŽΜƒ 2 is difficult for the researcher to specify,
Equation 3.10 can be expressed in terms of a planning value for πœ‘, as shown below
2
𝑛𝑗 = (βˆ‘π‘Žπ‘—=1 𝑐𝑗2 )(𝑧𝛼/2 + 𝑧𝛽 )2/πœ‘Μƒ 2 + 𝑧𝛼/2
/2m
(3.11)
which simplifies to Equation 2.8 in Module 2 when the contrast involves the
comparison of two means.
Example 3.10. A researcher wants to test H0: (ΞΌ1 + ΞΌ2 + ΞΌ3 + ΞΌ4)/4 βˆ’ ΞΌ5 = 0 in a one-factor
experiment with power of .90, Ξ± = .05, and an anticipated standardized linear contrast
value of 0.5. The contrast coefficients are 1/4, 1/4, 1/4, 1/4, and -1. The sample size
requirement per group is approximately 𝑛𝑗 = 1.25(1.96 + 1.28)2 /0.52 + 0.38 = 52.9 β‰ˆ 53.
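A sketch of Equation 3.11 (the standardized form of Equation 3.10), checked against Example 3.10:

```python
# Equation 3.11: sample size per group for a test of a standardized contrast.
import math
from scipy.stats import norm

def n_per_group_power_std(phi_plan, c, alpha=.05, power=.90):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    m = sum(1 for cj in c if cj != 0)  # number of non-zero coefficients
    n = sum(cj**2 for cj in c) * (z_a + z_b)**2 / phi_plan**2 + z_a**2 / (2 * m)
    return math.ceil(n)

print(n_per_group_power_std(0.5, [.25, .25, .25, .25, -1]))  # 53, as in Example 3.10
```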
The sample size requirements for v simultaneous confidence intervals or tests are
obtained by replacing 𝛼 in Equations 3.8 - 3.11 with 𝛼 βˆ— = 𝛼/𝑣.
Using Prior Information
Suppose a population mean difference for a particular response variable has been
estimated in a previous study and also in a new study. The previous study used a
random sample to estimate πœ‡1 – πœ‡2 from one study population, and the new study
used a random sample to estimate πœ‡3 – πœ‡4 from another study population. This is
a 2 × 2 factorial design with a classification factor where Study 1 and Study 2 are
the levels of the classification factor. The two study populations are assumed to be
conceptually similar. If a confidence interval for (πœ‡1 – πœ‡2 ) βˆ’ (πœ‡3 – πœ‡4 ) suggests that
πœ‡1 – πœ‡2 and πœ‡3 – πœ‡4 are not too dissimilar, then the researcher might want to
compute a confidence interval for (πœ‡1 + πœ‡3 )/2 – (πœ‡2 + πœ‡4 )/2. A confidence interval
for (πœ‡1 + πœ‡3 )/2 – (πœ‡2 + πœ‡4 )/2 will have greater external validity and could be
substantially narrower than the confidence interval for πœ‡1 – πœ‡2 or πœ‡3 – πœ‡4 .
A 100(1 βˆ’ 𝛼)% confidence interval for (πœ‡1 + πœ‡3 )/2 – (πœ‡2 + πœ‡4 )/2 is obtained from
Equation 3.1, and if medians have been computed in each study an approximate
100(1 βˆ’ 𝛼)% confidence interval for (𝜏1 + 𝜏3 )/2 – (𝜏2 + 𝜏4 )/2 is obtained from
Equation 3.4 using 𝑐1 = .5, 𝑐2 = βˆ’.5, 𝑐3 = .5, and 𝑐4 = βˆ’.5.
If a standardized mean difference has been estimated in each study and a
confidence interval for Ξ΄1 βˆ’ Ξ΄2 suggests that these two parameter values are not too
dissimilar, the researcher might want to compute the following approximate
100(1 βˆ’ Ξ±)% confidence interval for (Ξ΄1 + Ξ΄2)/2

(δ̂1 + δ̂2)/2 Β± z_{Ξ±/2} √((SE_{δ̂1}Β² + SE_{δ̂2}Β²)/4)                  (3.12)

where SE_{δ̂_j}Β² was defined in Equation 2.2 of Module 2.
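A small Python sketch of Equation 3.12; the two estimates and their squared standard errors (computed from Equation 2.2 of Module 2) are passed in directly, and the numbers shown are hypothetical:

```python
# Equation 3.12: CI for the average of two standardized mean differences.
import math
from scipy.stats import norm

def average_smd_ci(d1, d2, se1_sq, se2_sq, alpha=.05):
    z = norm.ppf(1 - alpha / 2)
    est = (d1 + d2) / 2
    se = math.sqrt((se1_sq + se2_sq) / 4)
    return est - z * se, est + z * se

# hypothetical estimates and squared standard errors from two studies
print(average_smd_ci(0.55, 0.62, 0.041, 0.028))
```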
Example 3.12. An eye-witness identification study with 20 participants per group at
Kansas State University assessed participants’ certainty in their selection of a suspect
individual from a photo lineup after viewing a short video of a crime scene. Two treatment
conditions were assessed in each study. In the first treatment condition the participants
were told that the target individual β€œwill be” in a 5-person photo lineup, and in the second
treatment condition participants were told that the target individual β€œmight be” in a 5-person photo lineup. The suspect was included in the lineup in both instruction
conditions. The estimated means were 7.4 and 6.3 and the estimated standard deviations
were 1.7 and 2.3 in the β€œwill be” and β€œmight be” conditions, respectively. This study was
replicated at UCLA using 40 participants per group. In the UCLA study, the estimated
means were 6.9 and 5.7, and the estimated standard deviations were 1.5 and 2.0 in the β€œwill
be” and β€œmight be” conditions, respectively. A 95% confidence interval for (πœ‡1 – πœ‡2 ) βˆ’ (πœ‡3
– πœ‡4 ) indicated that πœ‡1 – πœ‡2 and πœ‡3 – πœ‡4 do not appear to be substantially dissimilar. The
95% confidence interval for (πœ‡1 + πœ‡3 )/2 – (πœ‡2 + πœ‡4 )/2, which describes the Kansas State
and UCLA study populations, was [0.43, 1.87].
Graphing Results
Results of a two-factor design can be illustrated using a clustered bar chart where
the means for the levels of one factor are represented by a cluster of contiguous
bars (with different colors, shades, or patterns) and the levels of the second factor
are represented by different clusters. An example of a clustered bar chart for a
2 × 2 design is shown below, where the levels of Factor A define the bars within each cluster.
[Figure: clustered bar chart of the four sample means in a 2 × 2 design, with the two Factor A bars grouped within each level of Factor B.]
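A minimal matplotlib sketch of such a chart, with four invented cell means:

```python
# Clustered bar chart for a 2 x 2 design: one cluster per level of
# Factor B, with the two Factor A bars inside each cluster.
import numpy as np
import matplotlib.pyplot as plt

means = {"a1": [18.2, 24.5], "a2": [22.9, 28.1]}  # hypothetical cell means (b1, b2)
x = np.arange(2)                                  # cluster positions (b1, b2)
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, means["a1"], width, label="a1")
ax.bar(x + width / 2, means["a2"], width, label="a2")
ax.set_xticks(x)
ax.set_xticklabels(["b1", "b2"])
ax.set_ylabel("Mean of dependent variable")
ax.legend(title="Factor A")
plt.show()
```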
If one factor is more interesting than the other factor, the factor levels within each
cluster should represent the more interesting factor because it is easier to visually
compare means within a cluster than across clusters. In the above graph, it is easy
to see that the mean for level 2 of Factor A is greater than the mean for level 1 of
Factor A within each level of Factor B.
Data Transformations and Interaction Effects
Data transformations were described in Module 1 as a way to reduce non-normality.
In factorial designs, an interaction effect might be an artifact of the measurement
process, and the magnitude of an interaction effect can sometimes be reduced
considerably by a data transformation. Consider the following example of a 2 × 2
design with three participant scores per group.
                     Factor B
                   b1              b2
Factor A     a1    49, 64, 81      100, 121, 144
             a2    1, 4, 9         16, 25, 36
The simple main effect of A at b1 is 64.67 – 4.67 = 60 and the simple main effect of A
at b2 is 121.67 – 25.67 = 96, which indicates a nonzero interaction effect in this
sample. After taking a square root transformation of the data, the simple main
effect of A at b1 is 8 – 2 = 6 and the simple main effect of A at b2 is 11 – 5 = 6, which
indicates a zero interaction effect. In this example, the estimated interaction effect
was reduced to zero by simply transforming the data.
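The computation is easy to verify; the sketch below reproduces the simple main effects before and after the square-root transformation:

```python
# Simple main effects of A in the 2 x 2 data above, before and after
# a square-root transformation of the scores.
import numpy as np

cells = {("a1", "b1"): np.array([49., 64, 81]),
         ("a1", "b2"): np.array([100., 121, 144]),
         ("a2", "b1"): np.array([1., 4, 9]),
         ("a2", "b2"): np.array([16., 25, 36])}

for label, f in [("raw", lambda y: y), ("sqrt", np.sqrt)]:
    m = {k: f(y).mean() for k, y in cells.items()}       # cell means
    a_at_b1 = m[("a1", "b1")] - m[("a2", "b1")]          # simple main effect at b1
    a_at_b2 = m[("a1", "b2")] - m[("a2", "b2")]          # simple main effect at b2
    print(label, round(a_at_b1, 2), round(a_at_b2, 2),
          "difference:", round(a_at_b1 - a_at_b2, 2))
# raw: 60.0 and 96.0 (difference -36.0); sqrt: 6.0 and 6.0 (difference 0.0)
```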
Interaction effects can be classified as removable or nonremovable. A removable
interaction effect (also called an ordinal interaction effect) can be reduced to zero
by some transformation of the data. A nonremovable interaction effect (also called
a disordinal interaction effect) cannot be reduced to zero by a data transformation.
In a 2 × 2 design, if the simple main effects of A have different signs, or the simple
main effects of B have different signs, then the interaction effect is nonremovable.
Otherwise, the interaction effect is removable by some data transformation. In
studies where the existence of an interaction effect has an important theoretical
implication, a more compelling theoretical argument can be made if it can be
shown, based on confidence intervals for the simple main effects, that the
population interaction effect is nonremovable.