Math 143: Introduction to Biostatistics
R Pruim
Spring 2012
Last Modified: April 23, 2012
Math 143 : Spring 2011 : Pruim
Contents

13 Handling Violations of Assumptions
   13.1 Diagnosis
   13.2 Prescription
   13.3 More Examples

14 Designing Experiments
   14.1 What percentage of Earth is Covered with Water?
   14.2 Using Functions in R
   14.3 Power Calculations using power.t.test() and power.prop.test()

9 Chi-squared for Two-Way Tables

15 Comparing More Than Two Means Using ANOVA
   15.1 The Basic ANOVA situation
   15.2 Confidence Intervals for One Mean At a Time
   15.3 Pairwise Comparison
   15.4 Additional Examples

16 Correlation

17 Regression
   17.1 The Simple Linear Regression Model
   17.2 The Least Squares Method
   17.3 Making Predictions
   17.4 How Good Are Our Predictions?
   17.5 Checking Assumptions

18 Variations on the Theme: Using Linear Models
13 Handling Violations of Assumptions
The t-procedures (1-sample t, 2-sample t, and paired t) all rely on an assumption that the distribution
of the variable(s) is normal in the population.
13.1 Diagnosis
13.1.1 Histograms
Histograms of our data can reveal potential problems in the population distribution, but
1. It is sometimes hard to judge the shape of a histogram by eye.
2. When data sets are small, we don’t have very much information about the population.
The xhistogram() function can add an overlay of a normal distribution to help with issue 1:
xhistogram(~ biomass.ratio, MarineReserve, density = TRUE, n = 8)
[Figure: density histogram of biomass.ratio (values roughly 1 to 4) with an overlaid normal curve]
But normal quantile plots work even better.
13.1.2 Normal Quantile Plots
Normal quantile plots address the first problem of histograms. (We can’t really do anything about
the second problem.)
A normal quantile plot is a scatter plot comparing the values in our data set to the values we would
expect to see in a standard normal distribution based on quantiles. For example, the median of our
data will be paired with 0, the median of the standard normal distribution. The 80th percentile in
our data will be paired with the 80th percentile of a standard normal distribution, namely:
qnorm(0.8)
[1] 0.8416
and so on. If the points of the normal quantile plot fall roughly along a straight line, then the data
are behaving as we would expect for a normal distribution. If they deviate (especially systematically)
from a line, that is an indication of problems. We can make a normal quantile plot using qqmath()
or xqqmath().
xqqmath(~ biomass.ratio, MarineReserve)

[Figure: normal quantile plot of biomass.ratio against standard normal (qnorm) quantiles; the largest values bend upward away from the line]
This plot indicates that the largest biomass ratios are larger than we would expect for a normal distribution. That is, our data appear to be skewed to the right.
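To see what a normal quantile plot is doing under the hood, we can build one by hand in base R. This is just a sketch; MarineReserve is not a base R dataset, so a small made-up right-skewed sample stands in for biomass.ratio:

```r
# a hypothetical right-skewed sample stands in for biomass.ratio
x <- c(0.8, 1.1, 1.2, 1.3, 1.5, 1.6, 1.9, 2.4, 3.1, 4.2)
n <- length(x)
p <- ppoints(n)            # evenly spaced probabilities for n points
theoretical <- qnorm(p)    # standard normal quantiles at those probabilities
plot(theoretical, sort(x), xlab = "qnorm", ylab = "sorted data")
# qqnorm(x) does the same thing in one step
```

Pairing the sorted data with qnorm(ppoints(n)) is exactly the quantile-matching idea described above; qqmath() and xqqmath() just add lattice formatting.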
13.2 Prescription
When the data do not appear to be normally distributed, we have three choices:
1. Ignore the problem.
2. Transform the data.
3. Use a different analysis method.
13.2.1 Solution 1: Ignore the Problem
Some violations of assumptions don't matter much. (When this happens, we will say that a statistical method is robust.) The t-procedures are generally more robust when the sample size is larger.
13.2.2 Solution 2: Transform the Data
Sometimes a simple transformation of the data will make things more normal.
xqqmath(~ log(biomass.ratio), MarineReserve)

[Figure: normal quantile plot of log(biomass.ratio) against standard normal (qnorm) quantiles; the points fall much closer to a line]
While not perfect, this is certainly an improvement over the untransformed data.
An analysis of the log transformed data is pretty simple, but we must keep the transformation in mind
as we do it.
Suppose our original null hypothesis is that the mean biomass ratio is 1.
• H0 : µ = 1
Then our new null hypothesis is that the mean of the log transformed biomass ratios is 0 (because
log(1) = 0).
• H0 : µ′ = 0 (where µ′ is the mean of the log-transformed biomass ratios)
t.test(log(MarineReserve$biomass.ratio))
One Sample t-test
data: log(MarineReserve$biomass.ratio)
t = 7.397, df = 31, p-value = 2.494e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.3470 0.6112
sample estimates:
mean of x
0.4791
The confidence interval given is for the log of biomass ratio. If we want a confidence interval for the
biomass ratio, we need to “back-transform”:
exp(interval(t.test(log(MarineReserve$biomass.ratio))))
mean of x     lower     upper 
    1.615     1.415     1.843 
So the 95% confidence interval for the biomass ratio is (1.415, 1.843).
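The same back-transformation works without mosaic's interval() helper, because t.test() returns the interval in its conf.int component. A sketch (a made-up vector stands in for MarineReserve$biomass.ratio, so the numbers will differ from those above):

```r
# hypothetical stand-in for MarineReserve$biomass.ratio
ratio <- c(0.9, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.3, 2.9, 3.8)
tt <- t.test(log(ratio))   # inference on the log scale
exp(tt$estimate)           # back-transformed point estimate (a geometric mean)
exp(tt$conf.int)           # back-transformed 95% confidence interval
```

Because exp() is increasing, the back-transformed endpoints stay in the same order as the endpoints on the log scale.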
See pages 330–331 and page 345 for other commonly used transformations.
Although the book doesn't do it, a reciprocal transformation would also be a good choice in this example. The normal quantile plot below shows that the transformed data fit the normality assumption better, and the reciprocal has a natural interpretation: the reciprocal of a biomass ratio is the biomass ratio computed with the other group as the base.
xqqmath(~ 1/biomass.ratio, MarineReserve)

[Figure: normal quantile plot of 1/biomass.ratio against standard normal (qnorm) quantiles]
t.test(1/MarineReserve$biomass.ratio, mu = 1)
One Sample t-test
data: 1/MarineReserve$biomass.ratio
t = -9.119, df = 31, p-value = 2.763e-10
alternative hypothesis: true mean is not equal to 1
95 percent confidence interval:
0.5802 0.7336
sample estimates:
mean of x
0.6569
# if we only want the confidence interval, it can be extracted.
interval(t.test(1/MarineReserve$biomass.ratio, mu = 1))
mean of x     lower     upper 
   0.6569    0.5802    0.7336 
1/interval(t.test(1/MarineReserve$biomass.ratio, mu = 1))
mean of x     lower     upper 
    1.522     1.724     1.363 

(Notice that taking reciprocals reverses the order of the endpoints, so on this scale the 95% confidence interval is (1.363, 1.724).)

13.2.3 Solution 3: Use another Analysis Method
Sign Test as an Alternative to 1-sample t or Paired t
A sign test works by converting a quantitative variable to a categorical variable with two levels
(over and under, or + and −, whence the name sign test). The sign test uses a 1-proportion test
(binom.test() or prop.test()) to see if 50% of the data fall on either side of a hypothesized median.
It is very often used in place of paired t-tests, in which case the categorical variable indicates which
of the two values is larger.
These tests are easy to do in R because R can work with logical values (TRUE and FALSE):
tally(MarineReserve$biomass.ratio > 1)
 TRUE FALSE Total 
   31     1    32 
binom.test(31, 32)
Exact binomial test
data: x and n
number of successes = 31, number of trials = 32, p-value = 1.537e-08
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.8378 0.9992
sample estimates:
probability of success
0.9688
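The whole sign test can also be run without mosaic's tally(). A sketch; since MarineReserve is not a base R dataset, the counts from the output above are used directly:

```r
# 31 of the 32 biomass ratios exceed the hypothesized median of 1
res <- binom.test(31, 32, p = 0.5)
res
# with the raw data in hand, the count would be sum(biomass.ratio > 1)
# and the number of trials would be length(biomass.ratio)
```

The p = 0.5 argument (the default) encodes the sign-test null hypothesis that half the values fall on each side of the hypothesized median.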
13.3 More Examples
13.3.1 Another Sign Test
This is Example 13.4 in the text.
head(SexualSelection)
  polyandrous.species monandrous.species difference taxon.pair
1                  53                 10         43          A
2                  73                120        -47          B
3                 228                 74        154          C
4                 353                289         64          D
5                 157                 30        127          E
6                 300                  4        296          F
The observational units here are insect taxa (groups). For each taxon we count how many species are polyandrous and how many are monandrous to see which there are more of.
SexualSelection$polyandrous.species > SexualSelection$monandrous.species
 [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE ...
[output abbreviated: 25 logical values in all, 18 TRUE and 7 FALSE]
tally(SexualSelection$polyandrous.species > SexualSelection$monandrous.species)
 TRUE FALSE Total 
   18     7    25 
Since the distributions are not normal, and no obvious transformation rectifies the problem, the Sign
Test is a good alternative to the paired t test.
binom.test(SexualSelection$polyandrous.species > SexualSelection$monandrous.species)
Exact binomial test
data: x
number of successes = 18, number of trials = 25, p-value = 0.04329
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5061 0.8793
sample estimates:
probability of success
0.72
Since the data frame already includes a variable for the difference, we could also do this with a bit
less typing using that variable directly:
tally(SexualSelection$difference > 0)
 TRUE FALSE Total 
   18     7    25 
binom.test(SexualSelection$difference > 0)
Exact binomial test
data: x
number of successes = 18, number of trials = 25, p-value = 0.04329
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5061 0.8793
sample estimates:
probability of success
0.72
14 Designing Experiments
This chapter is quite nicely laid out. Here I’ll just highlight a few extras from class:
1. We used the Polio Vaccine Trials as a case study for the design of an experiment. I’ve posted a
link to a web article with details of the study for your reference.
2. We used the goal of estimating the proportion of the earth covered by water as an example of
determining sample size.
3. The R functions power.t.test() and power.prop.test() can be used to determine sample sizes (or other study design features) for studies comparing two proportions or two means.
14.1 What percentage of Earth is Covered with Water?
You have probably heard that the Earth is 2/3 water, or 70% water, or some other proportion water.
How do people “know” this sort of stuff? The answer is they don’t “know” it, but someone has
estimated it. Using the power of GoogleMaps, we can make our own estimate and see if it is compatible with this folk knowledge.
We can estimate the proportion of the world covered with water by randomly sampling points on the
globe and inspecting them using GoogleMaps.
1. First, let’s do a sample size computation. If everyone in the class inspects 10 locations, what
will our margin of error be?
2. Suppose we want to estimate (at the usual 95% confidence level) this proportion with ±5%. How
large would our sample need to be?
3. Now fill in the following table.
margin of error sample size
±5%
±4%
±2%
±1%
4. Generate enough random locations so that when we combine our data, we will get the precision
we desire. To do this, replace the number 3 with the appropriate value.
positions <- rgeo(3)
positions
     lat   lon
1 -9.487 66.90
2  6.926 38.18
3 -8.657 32.59
5. Open a GoogleMap centered at each position.
Turn off pop-up blocking for your default browser, and then enter
googleMap(pos = positions, mark = TRUE)
6. For each map, record whether the center is located in water or on land. mark = TRUE places a marker at the center of the map, which is helpful for locations that are close to the coast. You can zoom in or out to get a better look.
7. Record your data in a GoogleForm at
http://www.calvin.edu/~rpruim/courses/m143/S12/fromClass/googleWater.shtml
8. Now it is simple to sum the counts across the class. Here’s how it turned out in a different class:
sum(Water$Water)
[1] 215
sum(Water$Land)
[1] 85
9. Using our new data and your favorite analysis method (perhaps binom.test()) give a 95%
confidence interval for the proportion of the world covered with water.
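One conservative way to fill in the sample size table in step 3 uses the fact that the margin of error z* · sqrt(p(1−p)/n) is largest when p = 0.5. A sketch (this worst-case formula is an assumption on my part; the exact binomial interval used elsewhere in these notes gives slightly different answers):

```r
# conservative sample size for margin of error m at 95% confidence:
# m = 1.96 * sqrt(p*(1-p)/n) is largest when p = 0.5, so plug in that worst case
sampleSize <- function(m) ceiling((qnorm(0.975) * 0.5 / m)^2)
sampleSize(0.05)   # 385
sampleSize(0.04)   # 601
sampleSize(0.02)   # 2401
sampleSize(0.01)   # 9604
```

Notice that halving the margin of error quadruples the required sample size.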
14.2 Using Functions in R
Functions can speed up the guess-and-check process. Here is a function that computes the margin of error in the previous example (assuming the proportion of water will be greater than 0.6):
me <- makeFun( diff(binom.test(.6 *n, n)$conf.int) / 2 ~ n )
me(100)
[1] 0.09975
me(1000)
[1] 0.03083
me(10000)
[1] 0.009651
Note that binom.test() requires positive integer data, so our me() function will break if we’re not
careful:
me(789)
Error: 'x' must be nonnegative and integer
We could improve things a bit by making our function more complicated:
me <- makeFun( diff(binom.test(round(.6 *n), round(n))$conf.int) / 2 ~ n )
me(789)
[1] 0.03477
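Rather than guessing and checking by hand, we could let R search for the sample size that achieves a desired margin of error. A sketch using base R's uniroot() and a plain function in place of makeFun() (because of the rounding, me() is a step function, so the root is only approximate):

```r
# margin of error of the exact binomial interval, assuming about 60% water
me <- function(n) diff(binom.test(round(0.6 * n), round(n))$conf.int) / 2

# find n where the margin of error drops to 2%
n.needed <- uniroot(function(n) me(n) - 0.02, c(100, 10000))$root
ceiling(n.needed)
```

uniroot() needs the function to change sign on the interval, which it does here since me(100) is above 0.02 and me(10000) is below it.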
14.3 Power Calculations using power.t.test() and power.prop.test()
Each of these functions allows you to specify all but one of a number of design elements and solves for
the missing one.
14.3.1 Polio Vaccine Trials
For the case of the polio vaccine trials, if we hoped to detect a difference in polio rate of 50 in 100,000
for unvaccinated children and 25 in 100,000 for vaccinated children, we can determine the sample size
required to achieve 90% power using power.prop.test().
power.prop.test( p1=50/100000, p2=25/100000, power=.90 )
     Two-sample comparison of proportions power calculation 

              n = 126040
             p1 = 5e-04
             p2 = 0.00025
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group
(As you can see, the default significance level is α = 0.05, but that can be adjusted using the sig.level argument.)
This explains why such a large sample was required. The researchers wanted to be very confident of reaching a clear conclusion, but the rate of polio was relatively low even without vaccination. Allowing for the possibility of a lower than average rate of polio in the year of the study (polio rates varied substantially from year to year) would require even larger samples.
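What power.prop.test() computes analytically can be checked by simulation. A rough sketch: simulate the polio counts in each group many times and see how often the difference comes out significant (prop.test() applies a continuity correction, so the estimate typically lands slightly below the nominal value):

```r
set.seed(123)
n <- 126040; reps <- 1000
cases1 <- rbinom(reps, n, 50/100000)   # unvaccinated group
cases2 <- rbinom(reps, n, 25/100000)   # vaccinated group
pvals <- mapply(function(a, b) prop.test(c(a, b), c(n, n))$p.value,
                cases1, cases2)
mean(pvals < 0.05)   # estimated power; near the nominal 0.90
```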
power.prop.test( p1=30/100000, p2=15/100000, power=.90 )
     Two-sample comparison of proportions power calculation 

              n = 210099
             p1 = 3e-04
             p2 = 0.00015
      sig.level = 0.05
          power = 0.9
    alternative = two.sided
NOTE: n is number in *each* group
Given these calculations, a study with 200,000 vaccinated and 200,000 unvaccinated children doesn’t
seem at all unreasonable. And if we wanted higher power or a smaller α, we would need an even larger
sample.
Let's try a study with 1/2 million subjects (250,000 in each group). What are the chances that such a study will detect a difference between vaccinated and unvaccinated children if we have a typical summer and the vaccine prevents half of the polio cases?
power.prop.test( p1=50/100000, p2=25/100000, n=250000)
     Two-sample comparison of proportions power calculation 

              n = 250000
             p1 = 5e-04
             p2 = 0.00025
      sig.level = 0.05
          power = 0.9954
    alternative = two.sided
NOTE: n is number in *each* group
And what if we have a summer with relatively few polio cases?
power.prop.test( p1=30/100000, p2=15/100000, n=250000)
     Two-sample comparison of proportions power calculation 

              n = 250000
             p1 = 3e-04
             p2 = 0.00015
      sig.level = 0.05
          power = 0.9425
    alternative = two.sided
NOTE: n is number in *each* group
Now we see why the polio trials had such large sample sizes. In order to be quite sure that there
would be strong evidence for the effectiveness of the vaccine, even if the polio rate was low, a very
large sample was required.
14.3.2 Power for a two-sample t test
Suppose we want to estimate the difference in two population means (using 2-sample t). The precision
of this estimate depends on several ingredients:
1. Our desired level of confidence for the interval.
2. The sizes of our two samples. (n1 and n2 )
3. The standard deviations of the two populations (σ1 and σ2, which will in turn affect our sample standard deviations s1 and s2).
4. The difference between the two means (δ = µ1 − µ2).
power.t.test() makes the simplifying assumptions that the two sample sizes are equal (n1 = n2 = n)
and the two population standard deviations are equal (σ1 = σ2 = σ).
Power calculations always require that we make some assumptions about things we don’t know in
advance. We can determine the power of a test with samples of size n = 20 assuming that σ = .4 and
δ = .1 as follows:
power.t.test(n = 20, delta = 0.1, sd = 0.4, sig.level = 0.05)
     Two-sample t test power calculation 

              n = 20
          delta = 0.1
             sd = 0.4
      sig.level = 0.05
          power = 0.1172
    alternative = two.sided
NOTE: n is number in *each* group
This shows that we would have just over a 10% chance of detecting a difference in population means of δ = 0.1 using a confidence interval at the 95% level or a hypothesis test with α = 0.05. If we want higher power, we can let R estimate the sample size for our desired power:
power.t.test(delta = 0.1, sd = 0.4, sig.level = 0.05, power = 0.8)
Two-sample t test power calculation
n = 252.1
delta = 0.1
sd = 0.4
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
14.3.3 Power for a one-sample t test
We can do power calculations for a 1-sample t test as well. For example, suppose you want to estimate
the mean body temperature of lemurs to within 0.1 degrees C. You estimate the standard deviation
of lemur body temperature to be about 0.2 degrees C. How many lemurs must you measure?
power.t.test(delta = 0.1, sd = 0.2, power = 0.8, type = "one.sample")
     One-sample t test power calculation 

              n = 33.37
          delta = 0.1
             sd = 0.2
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
Better hope your estimate of σ isn’t too small. Suppose it is really σ = 0.4. Then you will need a
larger sample:
power.t.test(delta = 0.1, sd = 0.4, power = 0.8, type = "one.sample")
     One-sample t test power calculation 

              n = 127.5
          delta = 0.1
             sd = 0.4
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
9 Chi-squared for Two-Way Tables
Example 9.3: Trematodes
tally(~ eaten & infection.status, Trematodes)
        infection.status
eaten   high light uninfected Total
  no       9    35         49    93
  yes     37    10          1    48
  Total   46    45         50   141
tally(~ eaten | infection.status, Trematodes)   # conditional probabilities

       infection.status
eaten     high  light uninfected
  no    0.1957 0.7778     0.9800
  yes   0.8043 0.2222     0.0200
  Total 1.0000 1.0000     1.0000
To use the chisq.test() shortcut, we must remove the marginal totals from our table:
fish <- tally(~ eaten & infection.status, Trematodes, margins = FALSE)
fish
     infection.status
eaten high light uninfected
  no     9    35         49
  yes   37    10          1
chisq.test(fish)
Pearson's Chi-squared test
data: fish
X-squared = 69.76, df = 2, p-value = 7.124e-16
Some details
• H0 : The proportion of fish that are eaten is the same for each treatment group (high infection,
low infection, or no infection).
• Ha : The proportion of fish differs among the treatment groups.
• expected count = (row total × column total) / (grand total)
observed <- c(9, 35, 49, 37, 10, 1)
9 + 35 + 49   # row 1 total

[1] 93

37 + 10 + 1   # row 2 total

[1] 48

9 + 37   # col 1 total

[1] 46

35 + 10   # col 2 total

[1] 45

49 + 1   # col 3 total

[1] 50

9 + 35 + 49 + 37 + 10 + 1   # grand total

[1] 141
expected <- c(93 * 46/141, 93 * 45/141, 93 * 50/141, 48 * 46/141, 48 *
45/141, 48 * 50/141)
expected
[1] 30.34 29.68 32.98 15.66 15.32 17.02
X.squared <- sum((observed - expected)^2/expected)
X.squared
[1] 69.76
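The arithmetic above can be compressed using base R's outer(), which builds the whole expected-count table from the row and column totals at once. A sketch using the same fish counts:

```r
fish <- rbind(no = c(9, 35, 49), yes = c(37, 10, 1))
colnames(fish) <- c("high", "light", "uninfected")

# expected[i, j] = rowSums(fish)[i] * colSums(fish)[j] / grand total
expected <- outer(rowSums(fish), colSums(fish)) / sum(fish)
expected

X.squared <- sum((fish - expected)^2 / expected)
X.squared   # 69.76, matching chisq.test(fish)
```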
• expected proportion = (row total / grand total) · (column total / grand total) = (row proportion) · (column proportion)
◦ So H0 can be stated in terms of independent rows and columns.
Example: Physicians Health Study
phs <- cbind(c(104, 189), c(10933, 10845))
phs
     [,1]  [,2]
[1,]  104 10933
[2,]  189 10845
rownames(phs) <- c("aspirin", "placebo") # add row names
colnames(phs) <- c("heart attack", "no heart attack") # add column names
phs
        heart attack no heart attack
aspirin          104           10933
placebo          189           10845
chisq.test(phs)
Pearson's Chi-squared test with Yates' continuity correction
data: phs
X-squared = 24.43, df = 1, p-value = 7.71e-07
Example: Fisher Twin Study
Here is an example that was treated by R. A. Fisher in an article [Fis62] and again later in a book [Fis70]. The data come from a study of same-sex twins where one twin had had a criminal conviction. Two pieces of information were recorded: whether the sibling had also had a criminal conviction and whether the twins were monozygotic (identical) or dizygotic (nonidentical). Dizygotic twins are no more genetically related than any other siblings, but monozygotic twins have essentially identical DNA. Twin studies like this have often been used to investigate the effects of “nature vs. nurture”.
Here are the data from the study Fisher analyzed:
             Convicted   Not convicted
Dizygotic            2              15
Monozygotic         10               3
The question is whether this gives evidence for a genetic influence on criminal convictions. If genetics
had nothing to do with convictions, we would expect conviction rates to be the same for monozygotic
and dizygotic twins since other factors (parents, schooling, living conditions, etc.) should be very
similar for twins of either sort. So our null hypothesis is that the conviction rates are the same for
monozygotic and dizygotic twins. That is,
• H0 : pM = pD .
• Ha : pM 6= pD .
twins <- rbind(c(2, 15), c(10, 3))
twins
     [,1] [,2]
[1,]    2   15
[2,]   10    3
rownames(twins) <- c("dizygotic", "monozygotic")
colnames(twins) <- c("convicted", "not convicted")
twins
            convicted not convicted
dizygotic           2            15
monozygotic        10             3
chisq.test(twins)
Pearson's Chi-squared test with Yates' continuity correction
data: twins
X-squared = 10.46, df = 1, p-value = 0.001221
We can get R to compute the expected counts for us, too.
chisq.test(twins)$expected
            convicted not convicted
dizygotic         6.8          10.2
monozygotic       5.2           7.8
Alternatively, we can use xchisq.test(), which provides more output than chisq.test():
xchisq.test(twins)
Pearson's Chi-squared test with Yates' continuity correction
data: twins
X-squared = 10.46, df = 1, p-value = 0.001221
  2.00      15.00  
( 6.80)    (10.20) 
[3.39]     [2.26]  
<-1.84>    < 1.50> 

 10.00      3.00   
( 5.20)    ( 7.80) 
[4.43]     [2.95]  
< 2.10>    <-1.72> 

key:
observed
(expected)
[contribution to X-squared]
<residual>
This is an important check because the chi-squared approximation is not good when expected counts
are too small. Recall our general rule of thumb:
• All expected counts should be at least 1.
• 80% of expected counts should be at least 5.
For a 2 × 2 table, this means all four expected counts should be at least 5.
In our example, we are just on the good side of our rule of thumb. There is an exact test that works
only in the case of a 2 × 2 table (much like the binomial test can be used instead of a goodness of fit
test if there are only two categories). The test is called Fisher’s Exact Test.
fisher.test(twins)
Fisher's Exact Test for Count Data
data: twins
p-value = 0.0005367
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.003326 0.363182
sample estimates:
odds ratio
0.04694
In this case the approximate p-value from the Chi-squared Test (0.0012) and the exact p-value from Fisher's Exact Test (0.00054) are of similar magnitude and lead to the same conclusion. It appears that the conviction rate is higher for monozygotic twins than for dizygotic twins.
Dealing with Low Expected Counts
What do we do if we have low expected counts?
1. If we have a 2 × 2 table, we can use Fisher’s Exact Test.
2. For goodness of fit testing with only two categories, we can use the binomial test.
3. For goodness of fit testing and larger 2-way tables, we can sometimes combine some of the cells
to get larger expected counts.
4. Low expected counts often arise when analysing small data sets. Sometimes the only good
solution is to design a larger study so there is more data to analyse.
Example 9.4: Cows and Bats
               cows in estrous   cows not in estrous   row totals
bitten                      15                     7           22
not bitten                   6                   322          328
column totals               21                   329          350
cows <- cbind(c(15, 6), c(7, 322))
cows
     [,1] [,2]
[1,]   15    7
[2,]    6  322
rownames(cows) <- c("bitten", "not bitten") # add row names
colnames(cows) <- c("in estrous", "not in estrous") # add column names
cows
           in estrous not in estrous
bitten             15              7
not bitten          6            322
chisq.test(cows)
Warning message: Chi-squared approximation may be incorrect
Pearson's Chi-squared test with Yates' continuity correction
data: cows
X-squared = 149.4, df = 1, p-value < 2.2e-16
This time one of our expected counts is too low for our rule of thumb (notice the warning above):
chisq.test(cows)$expected
Warning message: Chi-squared approximation may be incorrect
           in estrous not in estrous
bitten           1.32          20.68
not bitten      19.68         308.32
So let’s use Fisher’s Exact Test instead:
fisher.test(cows)
Fisher's Exact Test for Count Data
data: cows
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
29.95 457.27
sample estimates:
odds ratio
108.4
In this example the conclusion is the same: extremely strong evidence against the null hypothesis.
Odds and Odds Ratio

odds = O = p / (1 − p) = pn / ((1 − p) n) = (number of successes) / (number of failures)

odds ratio = OR = odds1 / odds2 = (p1 / (1 − p1)) / (p2 / (1 − p2)) = (p1 / (1 − p1)) · ((1 − p2) / p2)
We can estimate these quantities using data. When we do, we put hats on everything:

estimated odds = Ô = p̂ / (1 − p̂)

estimated odds ratio = ÔR = Ô1 / Ô2 = (p̂1 / (1 − p̂1)) / (p̂2 / (1 − p̂2)) = (p̂1 / (1 − p̂1)) · ((1 − p̂2) / p̂2)
The null hypothesis of Fisher’s Exact Test is that the odds ratio is 1, that is, that the two proportions
are equal. fisher.test() uses a somewhat more complicated method to estimate odds ratio, so the
formulas above won’t exactly match the odds ratio displayed in the output of fisher.test(), but the
value will be quite close.
(2/15)/(10/3)   # simple estimate of odds ratio

[1] 0.04

(10/3)/(2/15)   # simple estimate of odds ratio with roles reversed

[1] 25
Notice that when p1 and p2 are small, then 1 − p1 and 1 − p2 are both close to 1, so

odds ratio ≈ p1 / p2 = relative risk
Relative risk is easier to understand, but odds ratio has nicer statistical and mathematical properties.
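A quick numerical check of this approximation, using rates on the scale of the polio example (a sketch; the particular rates are just for illustration):

```r
p1 <- 50/100000   # risk in group 1
p2 <- 25/100000   # risk in group 2

relative.risk <- p1 / p2                          # exactly 2
odds.ratio <- (p1/(1 - p1)) / (p2/(1 - p2))       # 2.0005, nearly the same
c(relative.risk, odds.ratio)
```

With rare events the two measures are practically interchangeable; the gap widens as the proportions grow.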
15 Comparing More Than Two Means Using ANOVA
15.1 The Basic ANOVA situation
• Two variables: categorical explanatory and quantitative response
◦ Can be used in either experimental or observational designs.
• Main Question: Does the population mean response depend on the (treatment) group?
◦ H0 : the population group means are all equal (µ1 = µ2 = · · · = µk)
◦ Ha : the population group means are not all equal
• If categorical variable has only 2 values, we already have a method: 2-sample t-test
◦ ANOVA allows for 3 or more groups (sub-populations)
• F statistic compares within group variation (how different are individuals in the same group?)
to between group variation (how different are the different group means?)
• ANOVA assumes that each group is normally distributed with the same (population) standard
deviation.
◦ Check normality with normal quantile plots (of residuals)
◦ Check equal standard deviation using 2:1 ratio rule (largest standard deviation at most
twice the smallest standard deviation).
15.1.1 An Example: Jet Lag
This example is described on page 394.
favstats(shift ~ treatment, JetLagKnees)
          min    Q1 median     Q3   max    mean     sd n missing
control -1.27 -0.65 -0.485  0.240  0.53 -0.3088 0.6176 8       0
eyes    -2.83 -1.78 -1.480 -1.105 -0.78 -1.5514 0.7063 7       0
knee    -1.61 -0.76 -0.290  0.170  0.73 -0.3357 0.7908 7       0
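The 2:1 standard deviation rule from Section 15.1 can be checked directly from the favstats() output. A sketch, typing in the sd values reported above:

```r
# standard deviations by group, copied from the favstats() output
sds <- c(control = 0.6176, eyes = 0.7063, knee = 0.7908)
max(sds) / min(sds)   # about 1.28, comfortably under 2
```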
xyplot(shift ~ treatment, JetLagKnees, type = c("p", "a"))
bwplot(shift ~ treatment, JetLagKnees)
[Figure: shift by treatment (control, eyes, knee) shown two ways: a dot plot with group means connected, and a boxplot]
Question: Are these differences significant? Or would we expect sample differences this large by
random chance even if (in the population) the mean amount of shift is equal for all three groups?
Whether differences between the groups are significant depends on three things:
1. the difference in the means
2. the amount of variation within each group
3. the sample sizes
anova(lm(shift ~ treatment, JetLagKnees))
Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)   
treatment  2   7.22    3.61    7.29 0.0045 **
Residuals 19   9.42    0.50                  
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value listed in this output is the p-value for our null hypothesis that the mean population
response is the same in each treatment group. In this case we would reject the null hypothesis at
either the α = 0.05 or α = 0.01 levels.
In the next section we’ll look at this test in more detail, but notice that if you know the assumptions
of a test, the null hypothesis being tested, and the p-value, you can generally interpret the results even
if you don’t know all the details of how the test statistic is computed.
15.1.2 The ANOVA test statistic
The ingredients
The ANOVA test statistic (called F ) is based on three ingredients:
1. how different the group means are (between group differences)
2. the amount of variability within each group (within group differences)
3. sample size
Each of these will be involved in the calculation of F .
The F statistic is a bit complicated to compute. We'll generally let the computer handle that for us. But it is useful to work through one small example to see how the ingredients are baked into a test statistic.
A Small Dataset
Suppose we have three groups with the following response values.
• Group A: 5.4, 6.1, 6.8   [ȳ1 = 6.1]
• Group B: 5.4, 6.1, 6.3, 5.4   [ȳ2 = 5.8]
• Group C: 7.5, 7.2, 7.8   [ȳ3 = 7.5]
Computing F
First, let’s represent our small data set in the more usual way:
response <- c(5.4, 6.1, 6.8, 5.4, 6.1, 6.3, 5.4, 7.5, 7.2, 7.8)
group <- c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C")
small <- data.frame(response = response, group = group)
small
   response group
1       5.4     A
2       6.1     A
3       6.8     A
4       5.4     B
5       6.1     B
6       6.3     B
7       5.4     B
8       7.5     C
9       7.2     C
10      7.8     C
Now let’s compute the grand mean and group means.
mean(response, data = small)  # grand mean
[1] 6.4
mean(response ~ group, data = small)  # group means
  A   B   C
6.1 5.8 7.5
And add those to our data frame
small$groupMean <- c(6.1, 6.1, 6.1, 5.8, 5.8, 5.8, 5.8, 7.5, 7.5, 7.5)
small$grandMean <- rep(6.4, 10)
small

   response group groupMean grandMean
1       5.4     A       6.1       6.4
2       6.1     A       6.1       6.4
3       6.8     A       6.1       6.4
4       5.4     B       5.8       6.4
5       6.1     B       5.8       6.4
6       6.3     B       5.8       6.4
7       5.4     B       5.8       6.4
8       7.5     C       7.5       6.4
9       7.2     C       7.5       6.4
10      7.8     C       7.5       6.4
Now we are ready to put the ingredients together:
• Between group variability: G = groupMean − grandMean
This measures how different a group is from the overall average.
• Within group variability: E = response − groupMean
This measures how different an individual is from its group average. E stands for "error", but
just as in "standard error" it is not a "mistake". It simply measures how different an individual
response is from the model prediction (in this case, the group mean).
The individual values of E are called residuals.
small$G <- small$groupMean - small$grandMean
small$E <- small$response - small$groupMean
small

   response group groupMean grandMean    G    E
1       5.4     A       6.1       6.4 -0.3 -0.7
2       6.1     A       6.1       6.4 -0.3  0.0
3       6.8     A       6.1       6.4 -0.3  0.7
4       5.4     B       5.8       6.4 -0.6 -0.4
5       6.1     B       5.8       6.4 -0.6  0.3
6       6.3     B       5.8       6.4 -0.6  0.5
7       5.4     B       5.8       6.4 -0.6 -0.4
8       7.5     C       7.5       6.4  1.1  0.0
9       7.2     C       7.5       6.4  1.1 -0.3
10      7.8     C       7.5       6.4  1.1  0.3
As we did with variance, we will square these differences:
small$G2 <- (small$groupMean - small$grandMean)^2
small$E2 <- (small$response - small$groupMean)^2
small

   response group groupMean grandMean    G    E   G2   E2
1       5.4     A       6.1       6.4 -0.3 -0.7 0.09 0.49
2       6.1     A       6.1       6.4 -0.3  0.0 0.09 0.00
3       6.8     A       6.1       6.4 -0.3  0.7 0.09 0.49
4       5.4     B       5.8       6.4 -0.6 -0.4 0.36 0.16
5       6.1     B       5.8       6.4 -0.6  0.3 0.36 0.09
6       6.3     B       5.8       6.4 -0.6  0.5 0.36 0.25
7       5.4     B       5.8       6.4 -0.6 -0.4 0.36 0.16
8       7.5     C       7.5       6.4  1.1  0.0 1.21 0.00
9       7.2     C       7.5       6.4  1.1 -0.3 1.21 0.09
10      7.8     C       7.5       6.4  1.1  0.3 1.21 0.09
And then add them up (SS stands for “sum of squares”)
SSG <- sum( small$G^2 ); SSG
[1] 5.34
0.09 * 3 + 0.36 * 4 + 1.21 * 3  # alternate way to calculate
[1] 5.34
SSE <- sum( small$E^2 ); SSE
[1] 1.82
Now we adjust for sample size using
• MSG = SSG/DFG = SSG/(number of groups − 1)
• MSE = SSE/DFE = SSE/(n − number of groups)
MS stands for "mean square".
MSG <- SSG / ( 3-1); MSG
[1] 2.67
MSE <- SSE / (10-3); MSE
[1] 0.26
Finally, our test statistic is

F = MSG / MSE
F <- MSG / MSE; F
[1] 10.27
When the null hypothesis (that all the groups have the same mean) is true, then F has an "F-distribution".
• F distributions have two parameters – the degrees of freedom for the numerator and for the
denominator.
In our example, this is 2 for the numerator and 7 for the denominator.
• When H0 is true, the numerator and denominator both estimate the same quantity, so F will tend to be
close to 1.
• When H0 is false, there is more difference between the groups, so the numerator tends to be
larger.
This means we will reject the null hypothesis when F gets large enough.
• The p-value is computed using pf().
1 - pf(F, 2, 7)
[1] 0.00828
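The whole by-hand calculation above can be condensed into a few lines. Here is a sketch that recreates the small data set using only base R; tapply() replaces the manual group-mean bookkeeping:

```r
# Condensed version of the by-hand F calculation for the small data set.
response <- c(5.4, 6.1, 6.8, 5.4, 6.1, 6.3, 5.4, 7.5, 7.2, 7.8)
group <- rep(c("A", "B", "C"), times = c(3, 4, 3))
groupMeans <- tapply(response, group, mean)         # per-group means, named by group
SSG <- sum((groupMeans[group] - mean(response))^2)  # between-group sum of squares
SSE <- sum((response - groupMeans[group])^2)        # within-group sum of squares
F <- (SSG / 2) / (SSE / 7)                          # MSG / MSE with DFG = 2, DFE = 7
p <- 1 - pf(F, 2, 7)
```

This reproduces SSG = 5.34, SSE = 1.82, and the F and p-value shown above.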
15.1.3 Getting R to do the work
Of course, R can do all of this work for us. We saw this earlier. Here it is again in a slightly different
way:
small.model <- lm(response ~ group, small)
anova(small.model)
Analysis of Variance Table

Response: response
          Df Sum Sq Mean Sq F value Pr(>F)
group      2   5.34    2.67    10.3 0.0083 **
Residuals  7   1.82    0.26
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lm() stands for “linear model” and can be used to fit a wide variety of situations. It knows to do
1-way ANOVA by looking at the types of variables involved.
The anova() function prints the ANOVA table. Notice how DFG, SSG, MSG, DFE, SSE, and MSE show up
in this table, as well as F and the p-value.
Proportion of Variation Explained
The summary() function can be used to provide a different summary of the ANOVA model:
summary(small.model)

Call:
lm(formula = response ~ group, data = small)

Residuals:
   Min     1Q Median     3Q    Max
-0.700 -0.375  0.000  0.300  0.700

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    6.100      0.294   20.72  1.5e-07 ***
groupB        -0.300      0.389   -0.77    0.466
groupC         1.400      0.416    3.36    0.012 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.51 on 7 degrees of freedom
Multiple R-squared: 0.746, Adjusted R-squared: 0.673
F-statistic: 10.3 on 2 and 7 DF,  p-value: 0.00828
The ratio

R² = SSG / (SSG + SSE) = SSG / SST
measures the proportion of the total variation that is explained by the grouping variable (treatment).
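Plugging in the sums of squares computed earlier for the small example gives a quick check:

```r
# R-squared for the small example, from the sums of squares found earlier.
SSG <- 5.34              # between-group sum of squares
SSE <- 1.82              # within-group sum of squares
R2 <- SSG / (SSG + SSE)  # proportion of variation explained
```

This matches the "Multiple R-squared: 0.746" line in the summary output.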
Diagnostic Plots
We can use plot() to create some diagnostic plots. The first two are the most important for our
purposes. They provide a residual plot and a normal-quantile plot of residuals.
plot(small.model, w = 1:2)
[Figure: Residuals vs Fitted and Normal Q-Q plots for lm(response ~ group)]
The residual plot shows the residuals broken down by groups (actually by group means). Ideally we
should see similar patterns of variation in each cluster. The second plot is a normal-quantile plot of
the residuals. Since the differences from the group means should be normally distributed with the
same standard deviation, we can combine all the residuals into a single normal-quantile plot.
15.1.4 Back to Jet Lag
Here is all the code needed to analyze the Jet Lag experiment:
jetlag.model <- lm(shift ~ treatment, JetLagKnees)
anova(jetlag.model)
Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  2   7.22    3.61    7.29 0.0045 **
Residuals 19   9.42    0.50
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(jetlag.model)

Call:
lm(formula = shift ~ treatment, data = JetLagKnees)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2786 -0.3613  0.0386  0.6115  1.0657

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.309      0.249   -1.24   0.2299
treatmenteyes   -1.243      0.364   -3.41   0.0029 **
treatmentknee   -0.027      0.364   -0.07   0.9418
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.704 on 19 degrees of freedom
Multiple R-squared: 0.434, Adjusted R-squared: 0.375
F-statistic: 7.29 on 2 and 19 DF,  p-value: 0.00447
plot(jetlag.model, w = 1:2)

[Figure: Residuals vs Fitted and Normal Q-Q plots for lm(shift ~ treatment)]

15.2 Confidence Intervals for One Mean At a Time
We can construct a confidence interval for any of the means by just taking a subset of the data and
using t.test(), but there are some problems with this approach. Most importantly, we were primarily
interested in comparing the means across the groups. Often people will display confidence intervals
for each group and look for "overlapping" intervals. But this is not the best way to look for
differences.
Nevertheless, you will sometimes see graphs showing multiple confidence intervals and labeling them
to indicate which means appear to be different from which. (See the solution to problem 15.3 for an
example.)
15.3 Pairwise Comparison
We really want to compare groups in pairs, and we have a method for this: the 2-sample t. But we need
to make a couple of adjustments to the two-sample t.
1. We will use a new formula for standard error that makes use of all the data (even from groups
not involved in the pair).
2. We also need to adjust the critical value to take into account the fact that we are (usually)
making multiple comparisons.
15.3.1 The Standard Error

SE = √( MSE · (1/nᵢ + 1/nⱼ) ) = √MSE · √( 1/nᵢ + 1/nⱼ )

where nᵢ and nⱼ are the sample sizes for the two groups being compared. Basically, √MSE is taking
the place of s in our usual formula. The degrees of freedom for this estimate is

DFE = total sample size − number of groups .
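As a sketch, here is this standard error for comparing two of the jet lag groups, using MSE = 0.50 and DFE = 19 from the ANOVA table above; the group sizes ni = 8 and nj = 7 are assumptions for illustration, not values from the text:

```r
# Pairwise standard error using the pooled MSE (hypothetical group sizes).
MSE <- 0.50
ni <- 8; nj <- 7                  # assumed group sizes for illustration
SE <- sqrt(MSE * (1/ni + 1/nj))   # sqrt(MSE) takes the place of s
tstar <- qt(0.975, df = 19)       # critical value, ignoring multiple comparisons
```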
Ignoring the multiple comparisons issue, we can now do confidence intervals or hypothesis tests just
as before.
• confidence interval:
ȳᵢ − ȳⱼ ± t∗ SE
• test statistic (for H0 : µᵢ − µⱼ = 0):
t = (ȳᵢ − ȳⱼ) / SE

15.3.2 The Multiple Comparisons Problem
Suppose we have 5 groups in our study and we want to make comparisons between each pair of groups.
That's 4 + 3 + 2 + 1 = 10 pairs. If we made 10 independent 95% confidence intervals, the probability
that all of them cover the appropriate parameter is only 0.599:
0.95^10
[1] 0.5987
So we have a family-wise error rate of nearly 40%.
We can correct for this by adjusting our critical value. Let’s take a simple example: just two 95%
confidence intervals. The probability that both cover (assuming independence) is
0.95^2
[1] 0.9025
Now suppose we want both intervals to cover 95% instead of 90.2% of the time. We could get this by
forming two 97.5% confidence intervals.
sqrt(0.95)
[1] 0.9747
0.975^2
[1] 0.9506
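The same idea extends to the ten intervals from our five-group example: each individual interval needs confidence level 0.95^(1/10) for the whole family to cover 95% of the time (still assuming independence):

```r
# Individual confidence level needed so that ten independent intervals
# all cover with probability 0.95.
level <- 0.95^(1/10)   # about 0.9949
level^10               # the family-wise coverage, back to 0.95
```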
This means we need a larger value for t∗ for each interval.
The ANOVA situation is a little bit more complicated because
• There are more than two comparisons.
• The different comparisons are not independent (because they all come from the same data set).
We will briefly describe two ways to make an adjustment for multiple comparisons.
15.3.3
Bonferroni Corrections – An Easy Over-adjustment
Bonferroni’s idea is simple: Simple divide the desired family-wise error rate by the number of tests
or intervals. This is an over-correction, but it is easy to do, and is used in many situations where a
better method is not known or a quick estimate is desired.
Here is a table showing a few Bonferroni corrections for looking at all pairwise comparisons.
number of   number of         family-wise   individual   confidence level
groups      pairs of groups   error rate    error rate   for determining t∗
3           3                 .05           0.017        0.983
4           6                 .05           0.008        0.992
5           10                .05           0.005        0.995
Similar adjustments could be made for looking at only a special subset of the pairwise comparisons.
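The first row of the table can be reproduced directly. Here is a sketch, borrowing DFE = 19 from the jet lag example to show the adjusted critical value:

```r
# Bonferroni correction for all pairwise comparisons among 3 groups.
k <- 3                              # number of groups
m <- choose(k, 2)                   # number of pairs: 3
alpha <- 0.05 / m                   # individual error rate, about 0.017
tstar <- qt(1 - alpha/2, df = 19)   # adjusted critical value (DFE = 19 assumed)
```

The adjusted t∗ is larger than the unadjusted qt(0.975, 19), so each interval gets wider.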
15.3.4 Tukey's Honest Significant Differences
Tukey’s Honest Significant Differences is a better adjustment method specifically designed for making
all pairwise comparisons in an ANOVA situation. (It takes into account the fact that the tests are not
independent.) R can compute Tukey’s Honest Significant Differences (see Section 15.4 of the text for
more information).
TukeyHSD(aov(shift ~ treatment, JetLagKnees))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = shift ~ treatment, data = JetLagKnees)

$treatment
                 diff     lwr     upr  p adj
eyes-control -1.24268 -2.1682 -0.3171 0.0079
knee-control -0.02696 -0.9525  0.8986 0.9970
knee-eyes     1.21571  0.2598  2.1716 0.0117
plot(TukeyHSD(aov(shift ~ treatment, JetLagKnees)))

[Figure: 95% family-wise confidence intervals for the differences in mean levels of treatment]
Tukey’s method adjusts the confidence intervals, making them a bit wider, to give them the desired
family-wide error rate. Tukey’s method also adjusts p-values (making them larger), so that when the
means are all the same, there is only a 5% chance that a sample will produce any p-values below 0.05.
In this example we see that the eye group differs significantly from control group and also from the
knee group, but that the knee and control groups are not significantly different. (We can tell this by
seeing which confidence intervals contain 0 or by checking which adjusted p-values are less than 0.05.)
15.3.5 Other Adjustments
There are similar methods for testing other sets of multiple comparisons. Testing "one against all the
others" goes by the name of Dunnett's method, for example.
15.4 Additional Examples
Example 15.4
Below is R code for some other examples that appear in the text. See the text for a description of the
studies and for the questions being asked.
The data for the carbon transfer example on page 407 are not available, so I made fake data that
exhibit the same behavior (see Table 15.1).
We can use xyplot() to inspect the data:
xyplot(carbon.transfer ~ shade, carbon, cex = 1.5, alpha = 0.5)

[Figure: carbon.transfer plotted against shade group (deep, none, part)]
And then compute the F -statistic and p-value for our test that the mean amount of carbon exchange is the
same for each shade group.
anova(lm(carbon.transfer ~ shade, carbon))
Analysis of Variance Table

Response: carbon.transfer
          Df Sum Sq Mean Sq F value Pr(>F)
shade      2    471   235.4    8.78 0.0045 **
Residuals 12    322    26.8
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Table 15.1: Data from a carbon transfer experiment

   shade carbon.transfer
1   deep           18.34
2   deep           29.58
3   deep           11.67
4   deep           18.61
5   deep           13.44
6   part            6.55
7   part            9.87
8   part            0.85
9   part           11.39
10  part           12.79
11  none            8.66
12  none            7.43
13  none            1.05
14  none            5.05
15  none            3.85
Since the p-value is small, we follow up with Tukey's pairwise comparisons.
TukeyHSD(aov(carbon.transfer ~ shade, carbon))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = carbon.transfer ~ shade, data = carbon)

$shade
            diff     lwr    upr  p adj
none-deep -13.12 -21.854 -4.386 0.0046
part-deep -10.04 -18.774 -1.306 0.0246
part-none   3.08  -5.654 11.814 0.6261
We have significant differences between the deep shade and the other two groups, but we cannot conclude that
the partial shade and no shade groups differ based on this sample.
Example 15.6: Walking Sticks
This example illustrates a couple of additional points about ANOVA using data on the femur length of
walking sticks.
head(WalkingStickFemurs)

  specimen femur.length
1        1         0.26
2        1         0.26
3        2         0.23
4        2         0.19
5        3         0.25
6        3         0.23
Numerically coded categorical variables
The following seems like the right thing to do, but it doesn’t work right.
anova(lm(femur.length ~ specimen, data = WalkingStickFemurs))
Analysis of Variance Table

Response: femur.length
          Df Sum Sq Mean Sq F value Pr(>F)
specimen   1 0.0102  0.0102    8.47 0.0055 **
Residuals 48 0.0578  0.0012
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that the degrees of freedom are wrong. There should be 24 degrees of freedom for the 25 groups
(the specimens), and 25 (50 − 25) degrees of freedom for the residuals. The problem is that our
categorical variable (specimen) is coded with numbers (1–25), so R is treating it like a quantitative
variable.
The solution is to tell R that the variable is really categorical:
anova(lm(femur.length ~ factor(specimen), data = WalkingStickFemurs))
Analysis of Variance Table

Response: femur.length
                 Df Sum Sq  Mean Sq F value  Pr(>F)
factor(specimen) 24 0.0591 0.002464    6.92 4.1e-06 ***
Residuals        25 0.0089 0.000356
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now the degrees of freedom are what we expect.
Random Effects
The other wrinkle in this example is that our 25 walking stick specimens are not the point of the study. They
are merely a random sample from a larger population of walking sticks. A design like this leads to a random
effects model. This is in contrast to the fixed effects model of our previous example.
The interpretation of a random effects model is slightly different, but in this simple situation, the F -statistic
and p-value are the same. (In more complicated situations, the random effects and fixed effects situations need
to be analyzed differently.)
Coagulation Example
require(faraway) # package that accompanies a different text book
head(coagulation) # data from an experiment on coagulation time (sec) in animals
  coag diet
1   62    A
2   60    A
3   63    A
4   59    A
5   63    B
6   67    B
xyplot(coag ~ diet, data = coagulation, cex = 1.5, alpha = 0.5)

[Figure: coag plotted against diet (A, B, C, D)]
We can use ANOVA to test whether the mean coagulation times are the same for the animals on the four diets.
anova(lm(coag ~ diet, data = coagulation))
Analysis of Variance Table

Response: coag
          Df Sum Sq Mean Sq F value  Pr(>F)
diet       3    228    76.0    13.6 4.7e-05 ***
Residuals 20    112     5.6
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value is small, we can follow up with Tukey's Honest Significant Differences to see how the
diets differ.
TukeyHSD(aov(coag ~ diet, data = coagulation))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = coag ~ diet, data = coagulation)

$diet
    diff      lwr    upr  p adj
B-A    5   0.7246  9.275 0.0183
C-A    7   2.7246 11.275 0.0010
D-A    0  -4.0560  4.056 1.0000
C-B    2  -1.8241  5.824 0.4766
D-B   -5  -8.5771 -1.423 0.0044
D-C   -7 -10.5771 -3.423 0.0001
plot(TukeyHSD(aov(coag ~ diet, data = coagulation)))

[Figure: 95% family-wise confidence intervals for the differences in mean levels of diet]
At the 95% confidence level (α = 0.05), we have evidence that four of the six pairs differ.
16 Correlation
We only covered the first section of this chapter where the correlation coefficient is defined. Highlights of this
section:
1. The correlation coefficient measures the strength and direction of a linear relationship.
2. The correlation coefficient is on a scale from −1 to 1.
Values near 1 indicate a strong positive association. Values near −1 indicate a strong negative association.
Values near 0 indicate a weak or non-existent association.
3. The correlation coefficient does not check that there is a linear relationship.
If the relationship follows some other pattern, then the correlation coefficient may be completely confused
about it. Use a scatter plot to see if there is indeed a linear relationship.
4. The correlation coefficient can be computed in R using cor(x,y) where x and y are the variables.
cor(CPS$wage, CPS$exper)
# wage and experience from Current Population Survey 1985
[1] 0.08706
5. The formula for the correlation coefficient is based on adding products of z-scores (and adjusting for
sample size):

   r = (1/(n−1)) · Σᵢ [ (xᵢ − x̄)/sₓ ] · [ (yᵢ − ȳ)/s_y ]
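The formula can be checked against cor() directly; a minimal sketch with made-up x and y vectors:

```r
# Correlation from z-scores, matching the formula above (hypothetical data).
x <- c(1, 2, 3, 4, 5)
y <- c(2.0, 2.4, 3.1, 3.9, 5.2)
n <- length(x)
r <- sum(scale(x) * scale(y)) / (n - 1)  # scale() converts to z-scores
```

This r agrees with cor(x, y).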
17 Regression

17.1 The Simple Linear Regression Model
Y = α + βx + ε, where ε ∼ Norm(0, σ).
In other words:
• The mean response for a given predictor value x is given by a linear formula
mean response = α + βx
• The distribution of all responses for a given predictor value x is normal.
• The standard deviation of the responses is the same for each predictor value.
17.2 The Least Squares Method
We want to determine the best fitting line to the data. The usual method is the method of least squares which
chooses the line that has the smallest possible sum of squares of residuals, where residuals are defined by
residual = observed − predicted
For a line with equation y = a + bx, this would be

eᵢ = yᵢ − (a + bxᵢ)
Simple calculus (that you don’t need to know) allows us to compute the best a and b possible. These best values
define the least squares regression line. We always compute these values using software, but it is good to note
that the least squares line satisfies two very nice properties.
1. The point (x̄, ȳ) is on the line.
This means that ȳ = a + bx̄ (and a = ȳ − bx̄).
2. The slope of the line is b = r · (s_y / sₓ).
Since we have a point and the slope, it is easy to compute the equation for the line if we know x̄, sₓ, ȳ,
s_y, and r.
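These two properties are enough to recover the least squares line from summary statistics; a sketch with hypothetical data:

```r
# Recovering the least squares line from x-bar, y-bar, sx, sy, and r.
x <- c(1, 2, 3, 4, 5)
y <- c(2.0, 2.4, 3.1, 3.9, 5.2)
b <- cor(x, y) * sd(y) / sd(x)  # slope: b = r * sy / sx
a <- mean(y) - b * mean(x)      # intercept: line passes through (x-bar, y-bar)
```

These values match coef(lm(y ~ x)).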
Fortunately, statistical software packages do all this work for us. For example, consider the lion noses example
from the book.
lm(age ~ proportion.black, data = LionNoses)

Call:
lm(formula = age ~ proportion.black, data = LionNoses)

Coefficients:
     (Intercept)  proportion.black
           0.879            10.647
This means that the equation of the least squares regression line is
ŷ = 0.879 + 10.647x
We use ŷ to indicate that this is not an observed value of the response variable but an estimated value (based
on the linear equation given).
R can add a regression line to our scatter plot if we ask it to.
xyplot(age ~ proportion.black, data = LionNoses, type = c("p", "r"))

[Figure: age vs proportion.black with the least squares regression line]
We see that the line does run roughly “through the middle” of the data but that there is also a good deal of
variability above and below the line.
Explanatory and Response Variables Matter
It is important that the explanatory variable be the “x” variable and the response variable be the “y” variable
when doing regression.
17.3 Making Predictions
We can use our least squares regression line to make predictions. For example, if a lion has a nose that
is 30% black, we estimate the age to be

ŷ = 0.879 + 10.647 · 0.30 = 4.0731
As it turns out, the 17th lion in our data set has a nose that was 30% black.
LionNoses[17, ]

   age proportion.black
17 4.3              0.3
That lion had an age of 4.3 years, so the residual for this lion is
observed − predicted = 4.3 − 4.07 = 0.23
The observed age is 0.23 years larger than the regression line predicts.
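The same arithmetic in R, using the coefficients from the fitted line:

```r
# Prediction and residual for the lion with a 30% black nose.
predicted <- 0.879 + 10.647 * 0.30  # about 4.07
residual <- 4.3 - predicted         # observed minus predicted, about 0.23
```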
17.4 How Good Are Our Predictions?
How well have we estimated the slope and intercept? How accurately can we predict a lion's age from the
blackness of its nose? We would like more than just point estimates for these values; we would like
confidence intervals and p-values. Since the regression model is based on normal distributions, these
will look very much like they do for a 1-sample t. The only things that change are the standard error
formulas and the degrees of freedom.
The degrees of freedom is n − 2 for simple linear regression because we estimate two parameters (α and β).
The standard error formulas are a bit messier, but computer software does the work of computing them for us.
17.4.1 Estimates for Slope and Intercept
lion.model <- lm(age ~ proportion.black, data = LionNoses)
summary(lion.model)

Call:
lm(formula = age ~ proportion.black, data = LionNoses)

Residuals:
   Min     1Q Median     3Q    Max
-2.545 -1.112 -0.528  0.963  4.342

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.879      0.569    1.55     0.13
proportion.black   10.647      1.510    7.05  7.7e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.67 on 30 degrees of freedom
Multiple R-squared: 0.624, Adjusted R-squared: 0.611
F-statistic: 49.8 on 1 and 30 DF,  p-value: 7.68e-08
Notice the t statistics, standard errors and p-values in the table. We can use these to compute our confidence
intervals and p-values.
• A 95% confidence interval for the slope (β) is given by
b ± t∗ SE
10.647 ± 2.042 · 1.5095
10.647 ± 3.083
using
qt(0.975, df = 30)  # there are 32 lions, so df = 32 - 2 = 30
[1] 2.042
to get the value of t∗.
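The interval arithmetic as a sketch (with the data loaded, confint(lion.model) returns the same interval directly):

```r
# 95% CI for the slope, using the estimate and SE from the summary output.
b <- 10.647
SE <- 1.5095
tstar <- qt(0.975, df = 30)
ci <- c(b - tstar * SE, b + tstar * SE)  # about (7.56, 13.73)
```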
• We can test the hypothesis that β = 0 even more easily since that is the hypothesis test for which the t
statistic and p-value are listed in the summary output.

t = (b − 0) / SE = 10.647 / 1.5095 = 7.053
And the p-value is
2 * (1 - pt(7.053, df = 30))
[1] 7.685e-08
just as the summary output reports.
• To test H0 : β = 1 (or any number other than 0) we need to do the work manually, but since the degrees
of freedom and standard error are provided, it is straightforward to do.
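For example, a sketch of testing H0 : β = 1 using the estimate and standard error reported above:

```r
# Manual test of H0: beta = 1 for the lion noses slope.
t <- (10.647 - 1) / 1.5095          # (estimate - hypothesized value) / SE
p <- 2 * (1 - pt(abs(t), df = 30))  # two-sided p-value
```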
• Tests and intervals for the intercept work the same way but are usually much less interesting.
17.4.2 What Does the Slope Tell Us?
The slope is usually more interesting because it tells us useful things about the relationship between the
explanatory and response variables.
1. If the slope is 0, then our model becomes

Y = α + ε,  ε ∼ Norm(0, σ)

That is, Y ∼ Norm(α, σ), and knowing the explanatory variable does not help us predict the response.
The test of whether β = 0 is therefore called the model utility test: it tells us whether the model is useful.
2. If the slope is not 0, then the slope tells us how much the average response changes for each unit increase
in the predictor.
The intercept, on the other hand, tells us what happens when the predictor is 0, which may be very
uninteresting, especially if the predictor can't be 0. (Suppose your predictor is your pulse, for example.)
17.4.3 Predictions for the Mean and Individual Response
R can compute point estimates and two kinds of intervals for the response at a given value of the explanatory variable.
1. The predict() function can plug new values into the linear regression equation for us. Here is our point
estimate again for a lion with a nose that is 30% black.
predict(lion.model, new = data.frame(proportion.black = 0.3))

    1
4.073
2. A confidence interval for the mean response for a given explanatory value can be computed by adding
interval='confidence'.
predict(lion.model, new = data.frame(proportion.black = 0.3), interval = "confidence")

    fit   lwr   upr
1 4.073 3.467 4.679
3. An interval for an individual response (called a prediction interval to avoid confusion with the confidence
interval above) can be computed by adding interval='prediction' instead.
predict(lion.model, new = data.frame(proportion.black = 0.3), interval = "prediction")

    fit    lwr   upr
1 4.073 0.6116 7.535
Prediction intervals are
(a) much wider than confidence intervals
(b) much less robust against violations of the normality assumption
(c) a little bit wider than
ŷ ± 2SE
where SE is the “residual standard error” reported in the summary output.
4.07 - 1.669 * 2
[1] 0.732
4.07 + 1.669 * 2
[1] 7.408
The prediction interval is a little wider because it takes into account the uncertainty in our estimated
slope and intercept as well as the variability of responses around the true regression line.
The figure below shows the confidence (dotted) and prediction (dashed) intervals as bands around the regression
line.
require(fastR)
xyplot(age ~ proportion.black, data = LionNoses, panel = panel.lmbands,
cex = 0.4, alpha = 0.5)
[Figure: age vs proportion.black with the regression line, confidence bands (dotted), and prediction bands (dashed)]
As the graph illustrates, the intervals are narrow near the center of the data and wider near the edges of the
data. It is not safe to extrapolate beyond the data (without additional information), since there is no data to
let us know whether the pattern of the data extends.
17.5 Checking Assumptions

17.5.1 Don't Fit a Line If a Line Doesn't Fit
When doing regression you should always look at the data to see if a line is a good fit. If it is not, a
suitable transformation of one or both of the variables may improve things.
Anscombe’s Data
Anscombe illustrated the importance of looking at the data by concocting an interesting data set.
Notice how similar the numerical summaries are for these four pairs of variables.
summary(lm(y1 ~ x1, anscombe))

Call:
lm(formula = y1 ~ x1, data = anscombe)

Residuals:
    Min      1Q  Median      3Q     Max
-1.9213 -0.4558 -0.0414  0.7094  1.8388

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.000      1.125    2.67   0.0257 *
x1             0.500      0.118    4.24   0.0022 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.24 on 9 degrees of freedom
Multiple R-squared: 0.667, Adjusted R-squared: 0.629
F-statistic: 18 on 1 and 9 DF,  p-value: 0.00217
summary(lm(y2 ~ x2, anscombe))

Call:
lm(formula = y2 ~ x2, data = anscombe)

Residuals:
   Min     1Q Median     3Q    Max
-1.901 -0.761  0.129  0.949  1.269

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.001      1.125    2.67   0.0258 *
x2             0.500      0.118    4.24   0.0022 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.24 on 9 degrees of freedom
Multiple R-squared: 0.666, Adjusted R-squared: 0.629
F-statistic: 18 on 1 and 9 DF,  p-value: 0.00218
summary(lm(y3 ~ x3, anscombe))

Call:
lm(formula = y3 ~ x3, data = anscombe)

Residuals:
   Min     1Q Median     3Q    Max
-1.159 -0.615 -0.230  0.154  3.241

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.002      1.124    2.67   0.0256 *
x3             0.500      0.118    4.24   0.0022 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.24 on 9 degrees of freedom
Multiple R-squared: 0.666, Adjusted R-squared: 0.629
F-statistic: 18 on 1 and 9 DF,  p-value: 0.00218
summary(lm(y4 ~ x4, anscombe))

Call:
lm(formula = y4 ~ x4, data = anscombe)

Residuals:
   Min     1Q Median     3Q    Max
-1.751 -0.831  0.000  0.809  1.839

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.002      1.124    2.67   0.0256 *
x4             0.500      0.118    4.24   0.0022 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.24 on 9 degrees of freedom
Multiple R-squared: 0.667, Adjusted R-squared: 0.63
F-statistic: 18 on 1 and 9 DF,  p-value: 0.00216
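All four fits can also be produced in one pass; a sketch using the built-in anscombe data frame:

```r
# Fit all four Anscombe regressions and collect the coefficients.
models <- lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  lm(y ~ x)
})
sapply(models, coef)  # each column: intercept near 3, slope near 0.5
```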
But the plots reveal that very different things are going on.
[Figure: scatter plots of the four Anscombe data sets (panels 1-4) with their fitted regression lines]
17.5.2 Outliers in Regression
Outliers can be very influential in regression, especially in small data sets, and especially if they occur for
extreme values of the explanatory variable. Outliers cannot be removed just because we don’t like them, but
they should be explored to see what is going on (data entry error? special case? etc.)
Some researchers will do “leave-one-out” analysis, or “leave some out” analysis where they refit the regression
with each data point left out once. If the regression summary changes very little when we do this, this means
that the regression line is summarizing information that is shared among all the points relatively equally. But
if removing one or a small number of values makes a dramatic change, then we know that that point is exerting
a lot of influence over the resulting analysis (a cause for caution). See page 483 for an example of leave-one-out
analysis.
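Here is a sketch of the leave-one-out idea on a small hypothetical data set with one point far from the rest:

```r
# Leave-one-out slopes: refit the regression dropping each point in turn.
x <- c(1, 2, 3, 4, 5, 10)             # the point at x = 10 is far from the rest
y <- c(2.1, 2.9, 4.2, 5.1, 5.8, 14.0)
slopes <- sapply(seq_along(x), function(i) unname(coef(lm(y[-i] ~ x[-i]))[2]))
range(slopes)  # a wide range flags an influential point
```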
17.5.3 Checking the Residuals
Residuals should be checked to see that the distribution looks approximately normal and that the standard
deviation remains consistent across the range of our data (and across time).
xqqmath(~ resid(lion.model))
[Figure: normal-quantile plot of resid(lion.model)]
xyplot(resid(lion.model) ~ proportion.black, data = LionNoses)
[Figure: resid(lion.model) plotted against proportion.black]
Similar plots (and some others as well) can also be made with
plot(lion.model)
In this case things look pretty good, but not as good as they could be. The normal-quantile plot shows a slight curve, and the residuals seem to be a bit larger on the right side. A logarithmic transformation offers some improvement.
17.5.4 Transformations
Transformations of one or both variables can change the shape of the relationship (from non-linear to linear,
we hope) and also the distribution of the residuals. In biological applications, a logarithmic transformation is
often useful.
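A quick sketch of why the logarithm helps, using made-up data (not one of our data sets): when y grows multiplicatively in x, log(y) is a linear function of x.

```r
# Simulated multiplicative growth: y = exp(0.4 x + noise).
set.seed(4)
x <- runif(30, min = 1, max = 10)
y <- exp(0.4 * x + rnorm(30, sd = 0.2))

# The raw relationship is curved; the logged one is nearly linear.
cor(x, y)        # linear association on the raw scale
cor(x, log(y))   # stronger linear association after transforming
```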
Consider the data from Problem 17.2 comparing zoo mortality rates (in %) to the minimal home ranges (in the
wild) of various animal species. 1
xyplot(mortality ~ homerange, ZooMortality, type = c("p", "r"))
[Figure: scatterplot of mortality versus homerange with fitted regression line.]
Clearly a line is not a good fit, and there is one very influential observation. Even if we remove the influential
observation, things don’t look great.
xyplot(mortality ~ homerange, ZooMortality, subset = homerange < 500,
type = c("p", "r"))
[Figure: scatterplot of mortality versus homerange (subset with homerange < 500) with fitted regression line.]
A logarithmic transformation of homerange takes care of most of our troubles.
xyplot(mortality ~ log(homerange), ZooMortality, type = c("p", "r"))
[Figure: scatterplot of mortality versus log(homerange) with fitted regression line.]
1 This example won't work quite right if you try to repeat it, because in Problem 17.2 the transformation was already done for you. I untransformed it so we could go through the whole process.
zoo.model <- lm(mortality ~ log(homerange), ZooMortality)
plot(zoo.model, which = 1)
[Figure: Residuals vs Fitted plot for lm(mortality ~ log(homerange)).]
plot(zoo.model, which = 2)

[Figure: Normal Q-Q plot of standardized residuals for lm(mortality ~ log(homerange)).]
18 Variations on the Theme: Using Linear Models
You may have wondered why ANOVA and simple linear models both use the R function lm(). This is because
they are both examples of the more general method of general linear models.
In the context of general linear models,
A model is a mathematical representation of the relationship between a response variable Y and one
or more explanatory variables.
The linear models have a special form (there are other types of models as well). Let’s start by looking at the
two examples we have already seen:
• Linear Regression:
  Y = α + βx + ε
  ◦ α = intercept = starting point = baseline
  ◦ βx = adjustment depending on x
  ◦ ε = random/individual variation
• 1-way ANOVA (3 groups):
  Y = µ + β1 [[if in group 2]] + β2 [[if in group 3]] + ε
  ◦ µ = mean for group 1 = starting point = baseline
  ◦ β1 = adjustment for those in group 2
  ◦ β2 = adjustment for those in group 3
  ◦ ε = random/individual variation
The general format of these models is
RESPONSE = BASELINE + ADJUSTMENT(S) + RANDOM VARIATION
Since the adjustments are made based on values of explanatory variables, it is handy to name them after those
variables and write our model formula as
RESPONSE = BASELINE + VARIABLE(S) + RANDOM VARIATION
These models can be described in R by specifying the response and the variables used to make the adjustments. (R adds the baseline and random variation parts for you.) So in R it looks like
response ~ variable1 + variable2 + · · · + lastvariable
Actually, R provides a number of shortcuts that make the model description even briefer. For example, when
we use a categorical variable, R creates as many adjustment variables as we need if we just tell it the name of
the categorical variable:
# This is example 18.1
summary(lm(percent.stretching ~ treatment, MouseEmpathy))
Call:
lm(formula = percent.stretching ~ treatment, data = MouseEmpathy)

Residuals:
   Min     1Q Median     3Q    Max
-37.19  -9.34   2.81   9.51  29.03

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)              58.05       5.02   11.56  3.6e-14 ***
treatmentIsolated       -20.86       6.56   -3.18   0.0029 **
treatmentOne Writhing   -22.68       6.97   -3.26   0.0023 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.4 on 39 degrees of freedom
Multiple R-squared: 0.255, Adjusted R-squared: 0.217
F-statistic: 6.67 on 2 and 39 DF,  p-value: 0.00322
Notice the new variables in this output: treatmentIsolated and treatmentOne Writhing. The fitted model becomes

predicted percent.stretching = 58.05 − 20.86 [[if isolated]] − 22.68 [[if one writhing]]

So for an isolated mouse, we predict the amount of stretching to be

58.05 − 20.86 = 37.19
This is just another way of describing the sample mean for that group:
favstats(percent.stretching ~ treatment, MouseEmpathy)
               min   Q1 median    Q3  max  mean    sd  n missing
Both Writhing 36.7 53.3  58.35 64.15 81.1 58.05 11.52 12       0
Isolated       0.0 30.0  41.10 46.70 65.6 37.19 18.35 17       0
One Writhing   2.2 22.2  41.10 51.10 64.4 35.37 20.34 13       0
Of course, we don't expect every mouse in the isolated group to stretch this much. That's why the model also includes ε, the measure of individual variation. In the R output above we see that the estimated standard deviation for the individual variation (relative to one's group mean) is 17.4.
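We can check the equivalence between fitted values and group means directly. The sketch below uses made-up data with three groups (not the MouseEmpathy data):

```r
# Simulated one-factor data: three groups with different means.
set.seed(2)
grp <- factor(rep(c("A", "B", "C"), each = 5))
y <- c(rnorm(5, mean = 60), rnorm(5, mean = 40), rnorm(5, mean = 35))

mod <- lm(y ~ grp)

# The model's predictions for each group ...
predict(mod, newdata = data.frame(grp = factor(c("A", "B", "C"))))
# ... are exactly the group sample means.
tapply(y, grp, mean)
```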
18.0.5 The Null Model

ANOVA (Analysis of Variance) asks the question: Does my model fit better than a model that makes no adjustments at all (the null model), by more than we would expect from chance variation alone?
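This comparison can be carried out explicitly in R by fitting the null model (baseline only) and the full model, then handing both to anova(). The data below are made up for illustration, not one of our data sets:

```r
# Simulated two-group data.
set.seed(3)
grp <- factor(rep(c("ctl", "trt"), each = 10))
y <- c(rnorm(10, mean = 5), rnorm(10, mean = 8))

null.model <- lm(y ~ 1)     # no adjustments: every prediction is the grand mean
full.model <- lm(y ~ grp)   # adjusts the baseline by group

# F test: do the group adjustments improve the fit more than chance would?
anova(null.model, full.model)
```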
18.0.6 Blocking
model <- lm(zooplankton ~ treatment + factor(block), Zooplankton)
summary(model)
Call:
lm(formula = zooplankton ~ treatment + factor(block), data = Zooplankton)

Residuals:
   Min     1Q Median     3Q    Max
 -0.62  -0.20  -0.08   0.22   0.68

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     3.42e+00   3.13e-01   10.94  4.3e-06 ***
treatmenthigh  -1.64e+00   2.89e-01   -5.67  0.00047 ***
treatmentlow   -1.02e+00   2.89e-01   -3.52  0.00781 **
factor(block)2  1.04e-15   3.74e-01    0.00  1.00000
factor(block)3 -7.00e-01   3.74e-01   -1.87  0.09795 .
factor(block)4 -1.00e+00   3.74e-01   -2.68  0.02811 *
factor(block)5 -3.00e-01   3.74e-01   -0.80  0.44532
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.458 on 8 degrees of freedom
Multiple R-squared: 0.846, Adjusted R-squared: 0.73
F-statistic: 7.32 on 6 and 8 DF,  p-value: 0.00651
Math 143 : Spring 2011 : Pruim
Last Modified: April 23, 2012
18.4
Variations on the Theme: Using Linear Models
anova(model)
Analysis of Variance Table

Response: zooplankton
              Df Sum Sq Mean Sq F value Pr(>F)
treatment      2   6.86    3.43   16.37 0.0015 **
factor(block)  4   2.34    0.58    2.79 0.1010
Residuals      8   1.68    0.21
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

18.0.7 Factorial Design