Department of Biostatistics, SDU
Analysis of Repeated Measurements
Torben Martinussen
April 2010
Factors and factor structure diagrams
1 Introduction
In this note we define the notion of a factor and introduce so-called
factor structure diagrams. These diagrams are useful for getting an overview
of the experiment at hand, both when planning and when analysing the
experiment.
2 Factors
An experiment consists of a number of experimental units, which we number
$1, \dots, N$. We say that we have the index set $I = \{1, \dots, N\}$. On each
experimental unit we measure a response Y (the response variable); that
is, we measure the responses $Y_1, \dots, Y_N$, where $Y_i$ is the response for the ith
experimental unit, $i = 1, \dots, N$.
A factor partitions the experimental units into a number of groups.
These groups are given names referred to as the levels of the factor. A simple
example could be the factor sex, which partitions the experimental units
(subjects) into two groups corresponding to the factor levels male and female.
The level of the factor for the ith unit is denoted by $\mathrm{factor}_i$; in the
example it is male or female depending on whether the ith subject is a male
or a female. Suppose that the experiment consists of 10 subjects, where the
first 5 are males and the last 5 are females. This can be summarized as
$\mathrm{sex}_1 = \text{male}, \dots, \mathrm{sex}_5 = \text{male}$,
$\mathrm{sex}_6 = \text{female}, \dots, \mathrm{sex}_{10} = \text{female}$.
We write |F| for the number of levels of F; thus |sex| = 2. The number
of experimental units on level j of the factor F is denoted $n_F(j)$. If there
is the same number of units in each of the groups defined by the factor F, that is,
$n_F(j_1) = n_F(j_2)$ for all levels $j_1, j_2$ of F, then F is said to be balanced and we
write $n_F$ for the common value $n_F(j)$. In this case, $n_F \cdot |F| = N$. In the example
above, the factor sex has 5 males and 5 females, and hence it is balanced
with $n_{\mathrm{sex}} = 5$.
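The counting notation can be checked on the 10-subject example. The following is only a minimal sketch: representing a factor as a Python list with one level per unit, and the helper names level_counts and is_balanced, are assumptions made for illustration and are not part of the note.

```python
from collections import Counter

# Assumed encoding: a factor is a list giving the level of each unit in turn.
sex = ["male"] * 5 + ["female"] * 5   # the 10 subjects from the text


def level_counts(factor):
    """Return n_F(j): the number of units on each level j of the factor."""
    return Counter(factor)


def is_balanced(factor):
    """A factor is balanced when every level holds the same number of units."""
    return len(set(level_counts(factor).values())) == 1


counts = level_counts(sex)
n_levels = len(counts)   # |sex|
N = len(sex)             # number of experimental units

print("number of levels:", n_levels)
print("units per level:", dict(counts))
print("balanced:", is_balanced(sex))
print("n_F * |F| == N:", counts["male"] * n_levels == N)
```

For a balanced factor the last line checks the identity n_F · |F| = N stated above.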
Example 2.1
Consider an experiment concerning growth of rats exposed to a growth hormone.
Suppose that five male and five female rats from each of three different strains
are available; thus, in total we have 30 rats in the experiment.
We thus have two factors with 2 and 3 levels: factor sex with levels
male and female, and factor strain with levels 1, 2, 3. The two factors are
balanced, with $n_{\mathrm{sex}} = 15$ and $n_{\mathrm{strain}} = 10$.
□
As mentioned earlier, factors play a role when designing and analysing
experiments. The ANOVA model for the simple example with males and
females may be written as
$$Y_i = \alpha(\mathrm{sex}_i) + \varepsilon_i,$$
where $Y_i$ denotes the response value for the ith subject and $\varepsilon_1, \dots, \varepsilon_N$
are independent and identically $N(0, \sigma^2)$-distributed random variables. This
model corresponds to the one-way analysis of variance model with two groups
with means α(male) and α(female), respectively.
3 Designs with more than one factor
One of the benefits of introducing the somewhat complicated notion of a
factor is the factor structure diagram, which we return to in a moment. The
factor structure diagram is useful for getting an overview of one's experiment.
This is of course only important if there is more than one factor present in
the experiment, but quite often there will be several factors in play. Before
being able to define and draw the factor structure diagram we need to discuss
orderings of factors.
3.1 Nested factors
Let two factors F and G be given, and consider the two partitions of the
experimental units that they define. If the groups defined by G can be
obtained by collapsing some of the groups defined by F, we say that F is
finer than G. One also says that F is nested within
G (G defines the nests and F the eggs in the nests). If F is finer than G,
then G is said to be coarser than F, and we write G ≤ F. For F to be finer
than G it is necessary that |F| ≥ |G|, but this is not in general sufficient. In
Example 2.1 we have |strain| > |sex|, but is strain finer than sex?
Two factors play a special role in the ordering between factors:
the trivial factor 0, corresponding to a partitioning of the units into
a single group, so that |0| = 1; and the units factor I, which places each
unit in a group of its own, so that |I| = N. The
trivial factor 0 is coarser than all other factors, while the units factor I is
finer than all other factors.
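The definition of "finer than" can be turned into a small check: F is finer than G exactly when knowing the level of F for a unit determines its level of G. The sketch below applies this to the layout of Example 2.1. The list encoding of factors and the helper name is_finer are assumptions made for illustration, and the order in which the 30 rats are listed is arbitrary.

```python
def is_finer(F, G):
    """True if F is finer than G: each level of F lies inside one level of G."""
    seen = {}
    for f, g in zip(F, G):
        # setdefault records the first G-level seen for this F-level;
        # a later, different G-level means F does not determine G.
        if seen.setdefault(f, g) != g:
            return False
    return True


# Example 2.1: five rats for each combination of sex and strain (30 in total).
sex, strain = [], []
for s in (1, 2, 3):
    for x in ("male", "female"):
        sex += [x] * 5
        strain += [s] * 5

units = list(range(len(sex)))   # units factor I: one level per unit
trivial = [0] * len(sex)        # trivial factor 0: a single level

print("strain finer than sex:", is_finer(strain, sex))
print("sex finer than strain:", is_finer(sex, strain))
print("I finer than sex:", is_finer(units, sex))
print("sex finer than 0:", is_finer(sex, trivial))
```

Under this encoding neither sex nor strain is finer than the other, even though |strain| > |sex|, which illustrates that the condition on the number of levels is necessary but not sufficient.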
Example 3.1
A study considered the growth of European beech. After germinating outdoors,
seedlings were transferred in their pots to eight 15 m² closed-top
chambers. Four different light levels were used in each chamber (depending
on the placement of the seedlings within each chamber). Each chamber was
set to maintain a certain temperature. Four temperatures were applied. We
hence have the factors (besides 0 and I) with their levels:
chamber       (1, 2, ..., 8),
temperature   (1, 2, 3, 4),
light         (1, 2, 3, 4).
In this experiment the factor chamber is nested within temperature since
temperature is kept constant in each chamber and there are more chambers
than levels of temperature. Another way of concluding the same is by
noticing that knowing the chamber tells us the temperature but not the
other way around. Is chamber nested within light?
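The two nesting questions in this example can be checked mechanically. The sketch below is hypothetical in its details: the note does not say how many seedlings there are per chamber and light level, nor which chambers received which temperature, so one unit per (chamber, light) combination and two consecutive chambers per temperature are assumed purely for illustration.

```python
def is_finer(F, G):
    """True if F is finer than G: each level of F lies inside one level of G."""
    seen = {}
    for f, g in zip(F, G):
        if seen.setdefault(f, g) != g:
            return False
    return True


# Assumed layout (not given in the note): one unit per (chamber, light)
# combination; chambers 1-2 at temperature 1, 3-4 at temperature 2, etc.
chamber, temperature, light = [], [], []
for c in range(1, 9):
    for l in range(1, 5):
        chamber.append(c)
        temperature.append((c + 1) // 2)
        light.append(l)

print("chamber nested within temperature:", is_finer(chamber, temperature))
print("temperature nested within chamber:", is_finer(temperature, chamber))
print("chamber nested within light:", is_finer(chamber, light))
```

Because every chamber contains all four light levels, knowing the chamber does not tell us the light level, so under these assumptions chamber is not nested within light.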
□
3.2 Cross-classifications
For any two factors F and G we can form the cross-classification by the two
factors. This is called the product of F and G and is denoted by F × G. It
is a factor with level $(F_i, G_i)$ for the ith unit, $i = 1, \dots, N$, and, in terms of
ordering, it is the coarsest factor finer than both F and G:
(i) F ≤ F × G and G ≤ F × G;
(ii) any factor H finer than both F and G is also finer than F × G.
Example 3.2 (Continuation of Example 2.1.)
Consider the situation described in Example 2.1. The product sex × strain
of the factors sex and strain has six levels:
(male, 1), (male, 2), (male, 3),
(female, 1), (female, 2), (female, 3).
This is of course only true if all six combinations are present in the experiment,
which they are in the present example. The factor sex × strain is in
this case balanced, with $n_{\mathrm{sex} \times \mathrm{strain}} = 5$. Notice that if we know, for a
unit i, to which of the six combinations of the two factors sex and strain
it belongs, then we also know $\mathrm{sex}_i$ and $\mathrm{strain}_i$, and hence sex × strain
is finer than both sex and strain.
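The product factor and the counts quoted in this example can be reproduced with a few lines. This is a minimal sketch: factors are encoded as lists with one level per unit, and the helper name product is an assumption made for illustration.

```python
from collections import Counter

# Example 2.1: five rats for each combination of sex and strain (30 in total).
sex, strain = [], []
for s in (1, 2, 3):
    for x in ("male", "female"):
        sex += [x] * 5
        strain += [s] * 5


def product(F, G):
    """Product factor F x G: the level of unit i is the pair (F_i, G_i)."""
    return list(zip(F, G))


sex_x_strain = product(sex, strain)
cell_counts = Counter(sex_x_strain)

print("levels of sex x strain:", len(cell_counts))
print("units per level:", sorted(set(cell_counts.values())))
print("n_sex:", Counter(sex)["male"], " n_strain:", Counter(strain)[1])
```

Since each level of the product is a pair, both components can be read off directly, which is the reason the product is finer than each of its two factors.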
□
The product of two factors is closely related to the notion of interaction.
In the example with the two factors sex and strain we may formulate the
model with main effects of the two factors and with a combined effect of
these:
$$Y_i = \alpha(\mathrm{sex}_i) + \beta(\mathrm{strain}_i) + \gamma(\mathrm{sex}_i \times \mathrm{strain}_i) + \varepsilon_i,$$
where $\varepsilon_1, \dots, \varepsilon_{30}$ are independent and identically $N(0, \sigma^2)$-distributed
random variables. This model is nothing but the usual two-way analysis of
variance model. A test for interaction is obtained by testing whether or not
we can reduce the model so that it only contains the main effects, that is,
whether or not the $\gamma(\mathrm{sex}_i \times \mathrm{strain}_i)$-term can be excluded.
4 Factor structure diagrams
To get an overview of the structure in a given experiment it is useful to draw
a factor structure diagram, which we will illustrate in the following using the
growth hormone data. Recall that, in this experiment, we have the following
three factors
sex,
strain,
sex × strain,
and that 5 units were allocated to each of the six levels of sex × strain. A
possible conclusion of an analysis could be that the response variable, growth,
depends on neither sex nor strain. The mean value of the response
variable is then constant, which, in the model, is written as a constant term,
µ, say. This value is common to all units. Which factor corresponds to this
term? It has to be a factor with only one level because it should result in one
value, µ. The answer is the trivial factor 0, which only has one level. The
factor I, which partitions the units into their own single groups, is the other
extreme in the ordering of the factors. The term corresponding to I in the
model is the error term $\varepsilon_i$, which may have a separate value for each of the
units in the experiment. All in all we have 5 factors in the experiment:
0, sex, strain, sex × strain, I.
These factors correspond to the model
$$Y_i = \mu + \alpha(\mathrm{sex}_i) + \beta(\mathrm{strain}_i) + \gamma(\mathrm{sex}_i \times \mathrm{strain}_i) + \varepsilon_i, \quad i = 1, \dots, 30, \tag{1}$$
where we may write µ(0) in place of µ, and where we note in passing that the
term $\varepsilon_i$ differs from the other terms in that it is a stochastic variable, while
the other terms are (unknown) parameters, which we may want to estimate.
The ordering of the factors is as follows. The factor sex is not finer than
strain and vice versa but they are both coarser than their product, and all
factors are finer than 0 and coarser than I. This is conveniently summarized
in the factor structure diagram as shown in Figure 1.
[Diagram, finer factors to the left: [I] → sex × strain; sex × strain → sex; sex × strain → strain; sex → 0; strain → 0.]
Figure 1: Factor diagram for the growth experiment.
We see that the factor structure diagram consists of all the factors in the
model and of arrows between some of the factors.
The arrows show the ordering between the factors. If one factor is finer
than another factor, then there is an arrow from the finer factor to
the coarser factor; e.g., there is an arrow from sex × strain to strain. All
superfluous arrows are, however, deleted. Hence we do not draw an arrow
from I to strain, because we already have the arrows from I to sex × strain
and from sex × strain to strain telling us that strain is coarser than I.
We draw the diagram with the coarser factors to the right and the finer
factors to the left.
The square brackets around the factor I mean that this factor is
represented in the model as a stochastic variable; it is a factor with a random
effect.
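The arrows of the diagram can be derived from the ordering alone: list every pair where one factor is finer than another, then delete the superfluous arrows, i.e. those implied by a chain through a third factor. The sketch below does this for the growth experiment. The list encoding of factors, the helper is_finer and the label "sex x strain" are assumptions made for illustration.

```python
from itertools import permutations


def is_finer(F, G):
    """True if F is finer than G: each level of F lies inside one level of G."""
    seen = {}
    for f, g in zip(F, G):
        if seen.setdefault(f, g) != g:
            return False
    return True


# Growth experiment: five rats for each combination of sex and strain.
sex, strain = [], []
for s in (1, 2, 3):
    for x in ("male", "female"):
        sex += [x] * 5
        strain += [s] * 5

factors = {
    "0": [0] * 30,
    "sex": sex,
    "strain": strain,
    "sex x strain": list(zip(sex, strain)),
    "I": list(range(30)),
}

# All ordered pairs (a, b) of distinct factors with a finer than b.
finer = {(a, b) for a, b in permutations(factors, 2)
         if is_finer(factors[a], factors[b])}

# Keep only arrows that are not implied by a chain a -> c -> b.
arrows = sorted((a, b) for (a, b) in finer
                if not any((a, c) in finer and (c, b) in finer for c in factors))

for a, b in arrows:
    print(f"{a} -> {b}")
```

The surviving arrows are the ones described in the text: for instance, the arrow from I to strain is dropped because it follows from the arrows via sex × strain.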
Example 4.1 (Continuation of Example 3.1.)
The factor structure diagram for this experiment is shown in Figure 2.
[Diagram, finer factors to the left: [I] → cham × light; cham × light → temp × light; cham × light → cham; temp × light → temp; temp × light → light; cham → temp; temp → 0; light → 0.]
Figure 2: Factor structure diagram for the fungus experiment.
We see, as noted earlier, that cham is nested within temp, and hence also that
cham × light is nested within temp × light.
□
5 Factor structure diagrams and random effects
In ANOVA all tests are F-tests, and they are carried out as
$$F = MS_{\mathrm{factor}} / MS_{\mathrm{residual}},$$
see the ANOVA table, e.g. Kirkwood and Sterne, p. 84. However, in applications
with more than one random factor, this will no longer be the case.
The factor structure diagram may be used to find out which residual a given
factor should be tested against. In practice, one applies a computer program
(for example Stata), which performs the tests correctly provided that
one includes all the needed random effects. Otherwise, wrong results may be
obtained.
Using the factor structure diagram, the rule is that the effect of a
given factor F should be tested against the coarsest random factor finer than
F. One should hence consider all random factors finer than F and choose
the coarsest of these. If we end up with two candidates that are not ordered,
then there is no simple F-test to judge the effect of F, and approximate
methods have to be applied (the computer program takes care of that). The
following example illustrates the use of this rule.
Example 5.1
In a classical longitudinal study such as the one reported in Ch. 8 of Rabe-Hesketh
and Everitt concerning postnatal depression in women giving
birth, we have the following factors with their respective levels:
treatm   (placebo, estrogen),
time     (1, 2, ..., 6),
subj     (1, 2, ..., 61),
I        (1, 2, ..., 366),
assuming no missing observations. We are primarily interested in the effect
of treatment, and a possible interaction with time. We therefore also add
the product of these two factors, treatm × time, which has 12 levels. The
61 women represent a population of women and are therefore specified as a
random effects factor. The factor structure diagram is depicted in Figure 3.
[Diagram, finer factors to the left: [I] → treatm × time; [I] → [subj]; treatm × time → time; treatm × time → treatm; [subj] → treatm; time → 0; treatm → 0.]
Figure 3: Factor structure diagram for the postnatal depression data.
By use of the above-mentioned rule about testing, it is seen that the effect of
treatm will be tested against subj, while time and treatm × time are tested
against the residual (I). With regard to treatm we see that both I and subj
are random effects factors finer than treatm, but since subj is coarser than
I, it is subj that treatm is tested against.
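The testing rule can be written as a short search over the random factors. The sketch below applies it to the factors of this example. It is illustrative only: the helper names is_finer and error_stratum are assumptions, and the split of the 61 women between placebo and estrogen is hypothetical (the note does not state it) and does not affect the ordering of the factors.

```python
def is_finer(F, G):
    """True if F is finer than G: each level of F lies inside one level of G."""
    seen = {}
    for f, g in zip(F, G):
        if seen.setdefault(f, g) != g:
            return False
    return True


def error_stratum(name, factors, random_names):
    """Coarsest random factor(s) finer than the named factor."""
    candidates = [r for r in random_names
                  if r != name and is_finer(factors[r], factors[name])]
    # Drop every candidate that is finer than some other candidate.
    return [r for r in candidates
            if not any(c != r and is_finer(factors[r], factors[c])
                       for c in candidates)]


# 61 women measured at 6 time points, no missing observations (366 units).
subj = [s for s in range(1, 62) for _ in range(6)]
time = [t for _ in range(61) for t in range(1, 7)]
# Hypothetical treatment allocation: the first 30 women receive placebo.
treatm = ["placebo" if s <= 30 else "estrogen" for s in subj]

factors = {
    "treatm": treatm,
    "time": time,
    "treatm x time": list(zip(treatm, time)),
    "subj": subj,
    "I": list(range(366)),
}
random_names = ["subj", "I"]

for name in ("treatm", "time", "treatm x time"):
    print(name, "is tested against", error_stratum(name, factors, random_names))
```

If error_stratum returns more than one factor, the candidates are not ordered, which corresponds to the case where no simple F-test exists.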
□
Another example, where there is more than one random factor, is the
following.
Example 5.2
Data are from Lars G. Hvid and Jonas B. Thorlund, Institute of Sports Science
and Clinical Biomechanics, and concern rapid force capacity in middle-aged
meniscectomized patients.
The above figure shows mean time-torque curves for operated leg (full line),
non-operated leg (hatched line) and mean of both legs for controls (dotted
line) in the initial phase of contraction, representing rapid force capacity.
The material consists of data from 29 cases (with measurements from 2 legs)
and 31 (age and gender matched) controls (with measurements from 1 leg,
actually average). This gives rise to the following factors
Subjects   (1, 2, ..., 60),
Group      (1: operated, 2: non-operated, 3: control),
Time       (1, 2, ..., 4),
Leg        (1, 2, ..., 89),
I          (1, 2, ..., 356).
The factor diagram, where we also allow for group and time interaction,
reads
[Diagram with S = Subjects, L = Leg, G = Group, T = Time, finer factors to the left: [I] → [L]; [I] → G × T; [L] → [S]; [L] → G; G × T → G; G × T → T; [S] → 0; G → 0; T → 0.]
We see from the factor diagram that the group effect should be tested against
leg. This is done automatically if we remember to include that factor in the
xtmixed analysis, as shown below (where we have dropped the interaction
term):
xi: xtmixed sqrt_y i.group i.rfdpoint ||id: || leg:, mle
i.group           _Igroup_1-3         (naturally coded; _Igroup_1 omitted)
i.rfdpoint        _Irfdpoint_30-200   (naturally coded; _Irfdpoint_30 omitted)
------------------------------------------------------------------------------
      sqrt_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Igroup_2 |   .2907674   .2393653     1.21   0.224      -.17838    .7599147
   _Igroup_3 |  -.2316406    .414737    -0.56   0.576     -1.04451    .5812289
_Irfdpoin~50 |   2.262496   .0979184    23.11   0.000      2.07058    2.454413
_Irfdpoi~100 |   5.126932   .0979184    52.36   0.000     4.935015    5.318848
_Irfdpoi~200 |   7.041157   .0979184    71.91   0.000      6.84924    7.233074
       _cons |   5.207261   .3040817    17.12   0.000     4.611272     5.80325
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   1.321532   .1695376      1.027728    1.699328
-----------------------------+------------------------------------------------
leg: Identity                |
                   sd(_cons) |   .8509534   .1251244      .6378886    1.135185
-----------------------------+------------------------------------------------
                sd(Residual) |   .6531973   .0282666      .6000802    .7110162
------------------------------------------------------------------------------
. estimates store model2
The response is square-root transformed (sqrt_y) and the time factor is called
rfdpoint in the data. We see from the output that the response from the
non-operated and the control is not significantly different from the operated
leg. There is a clear time effect, which is also pretty obvious judging from
the figure. If one forgets to include the leg factor then the group effect will
wrongly be tested against the residual; in Stata:
xi: xtmixed sqrt_y i.group i.rfdpoint ||id:,mle
i.group           _Igroup_1-3         (naturally coded; _Igroup_1 omitted)
i.rfdpoint        _Irfdpoint_30-200   (naturally coded; _Irfdpoint_30 omitted)
------------------------------------------------------------------------------
      sqrt_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Igroup_2 |   .2907674   .1111371     2.62   0.009     .0729426    .5085921
   _Igroup_3 |  -.2316406   .3985803    -0.58   0.561    -1.012844    .5495625
_Irfdpoin~50 |   2.262496     .12688    17.83   0.000     2.013816    2.511176
_Irfdpoi~100 |   5.126932     .12688    40.41   0.000     4.878252    5.375612
_Irfdpoi~200 |   7.041157     .12688    55.49   0.000     6.792477    7.289837
       _cons |   5.207261   .2968465    17.54   0.000     4.625453     5.78907
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   1.483661   .1436679      1.227185     1.79374
-----------------------------+------------------------------------------------
                sd(Residual) |    .846395   .0347807       .780899    .9173843
which leads to the wrong conclusion that there is a significant difference between
the operated and non-operated leg.
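The two analyses give the same estimate for the non-operated versus operated comparison (_Igroup_2) and differ only in its standard error. As a rough check, the z statistics and two-sided P-values can be recomputed from the coefficient and the standard errors printed in the two outputs. The sketch below assumes the usual Wald test based on the standard normal distribution; the function name wald_test is introduced for illustration.

```python
import math


def wald_test(coef, se):
    """Return the Wald z statistic and its two-sided normal P-value."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p


coef = 0.2907674            # _Igroup_2, identical in both outputs
se_with_leg = 0.2393653     # random effects for id and leg
se_without_leg = 0.1111371  # random effect for id only

z_with_leg, p_with_leg = wald_test(coef, se_with_leg)
z_without_leg, p_without_leg = wald_test(coef, se_without_leg)

print(f"with leg:    z = {z_with_leg:.2f}, P = {p_with_leg:.3f}")
print(f"without leg: z = {z_without_leg:.2f}, P = {p_without_leg:.3f}")
```

The values should agree with the z and P>|z| columns of the two outputs up to rounding. Leaving out the leg factor roughly halves the standard error, and that alone moves the comparison across the 5% level.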
□
References
• A handbook of Statistical Analyses using Stata, Rabe-Hesketh and
Everitt (2007).
• Essential Medical Statistics, Kirkwood and Sterne (2003).