Partitioning variance

This course is about VARIATION: its causes, effects, and history.
For thousands of years, western thought had accepted the Platonic
view that an object’s ultimate reality was its essence or ideal type.
In biology, essentialism gave rise to the assumption that species are
held together by their underlying, unchanging “types” or ideal forms.
On this view, individual variations are departures from the essence of
a species; thus they are imperfections that make individuals less
representative of the true nature of their species.
Darwin destroyed essentialism in biology and
replaced it with a radical new idea: variationism.
Variationism is the view that species are united only by
recent common ancestry. Thus every individual is equally
representative of the species; the average phenotype is
just a statistical abstraction, not the reflection of some
higher, more pure or more ultimate reality.
Anth/Biol 5221, 26 August 2015
What causes variation, and why is some
of it heritable (why do kids resemble their parents)?
Darwin didn’t know. Mendel’s discovery of
genes (1865) went largely unnoticed until it was rediscovered in 1900, and
most of the early geneticists concluded that
Mendel’s genes and Darwin’s theory were incompatible.
R.A. Fisher (1890-1962) invented the analysis of
variance (ANOVA) in 1918 to show that Darwin’s
ideas about the inheritance of variation were
consistent with Mendel’s genetics.
This is one of the deepest, most general and
most transformative ideas in the history of
human thought, and, oddly, one of the most invisible!
Fisher remains an obscure nerd celebrated by
almost no one other than statisticians.
Meanwhile, ANOVA has become the foundation
of statistical thinking and practice in industry,
government and medicine as well as in science.
Heights of 51 students in Anth/Biol 5221, fall 2011 (measured in inches)
They vary. How much? How should we describe the variation?

Name                  Height (in)
Axelrod, Rachel       68
Baca, Amanda          61
Barlow-Hilmo, Kim     66
Blackhorst, Michael   75
Carothers, Jennifer   70
Casagranda, Brooke    64
Cash, Gabriela        65
Chase, Gillian        65
Davis, Samantha       70
Dimond, Ben           69
Do, Michael           60
Fox, Robert           67
Gibbs, Katie          66
Glosenger, Andrew     72
Guten, Maria          61
Hardman, Kennedy      66
Howell, Ryan          74
Humes, Ryan           73
Huynh, Tina           63
Iliff, Anthony        69
Ingram, Elise         63
Jen_Twu, Jonas        69
Johnson, Deborah      71
Jones, Carson         71
Malovich, Michael     77
Marble, Stephanie     70
McCann, Jennifer      66
Munoz, Adriana        65
Nelson, Christopher   69
Newton, Brittany      66
Nguyen, Than          62
Noe, Jordan           67
Park, James           71
Price, Brock          70
Reid, Doug            69
Rudd, Jared           72
Rueckert, Katie       61
Schweitzer, John      68
Silver, Alyssa        64
Simkins, Richard      71
Sorensen, Jeff        70
Spackman, Derek       74
Spencer, Cody         71
Spencer, Zac          73
Staufer, Annmarie     64
Stephens, Jesse       71
Tang, Kim             62
Thomas, Nathan        69
Trakhimets, Alesia    64
Trotter, Ten          64
ZoBey, Scott          70
The same 51 heights sorted from shortest to tallest (range 60 to 77 in):
60, 61, 61, 61, 62, 62, 63, 63, 64, 64, 64, 64, 64, 65, 65, 65, 66, 66, 66, 66, 66, 67, 67, 68, 68, 69,
69, 69, 69, 69, 69, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 71, 71, 72, 72, 73, 73, 74, 74, 75, 77
Heights of 51 students in Anth/Biol 5221, fall 2011
mean = 67.8 in, variance = 16.2 in², standard deviation (sd) = 4.03 in
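As a quick check of these summary figures, the sketch below recomputes them in Python from the 51 listed heights. It assumes the slide's variance divides by n (a "population" variance) rather than by n - 1; that is the convention that reproduces 16.236, whereas dividing by 50 would give about 16.56.

```python
from statistics import fmean, pstdev, pvariance

# Heights (inches) of the 51 students, in the order of the class list above.
heights = [
    68, 61, 66, 75, 70, 64, 65, 65, 70, 69, 60, 67, 66, 72, 61, 66, 74,
    73, 63, 69, 63, 69, 71, 71, 77, 70, 66, 65, 69, 66, 62, 67, 71, 70,
    69, 72, 61, 68, 64, 71, 70, 74, 71, 73, 64, 71, 62, 69, 64, 64, 70,
]

mean = fmean(heights)
# pvariance/pstdev divide by n, not n - 1; this is the convention that
# reproduces the summary figures on the slide.
var = pvariance(heights)
sd = pstdev(heights)

print(f"n = {len(heights)}, mean = {mean:.1f} in, "
      f"variance = {var:.3f} in^2, sd = {sd:.2f} in")
```

Printed to the slide's precision, this gives a mean of 67.8 in, a variance of 16.236 in² and an sd of 4.03 in.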
Heights of 25 women and 26 men in Anth/Biol 5221
Analysis in English units (inches)
The mean (M) is the average or “expected” value.
The variance (V) is the average or “expected” squared deviation from the mean.
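Written out for a sample of n values x_1, …, x_n split into groups i = 1, …, k, where group i has n_i members, proportion p_i = n_i/n, mean M_i and variance V_i, the standard identity behind the slides below reads as follows (a sketch, with variances dividing by n as the numbers on these slides do). For women and men, p_1 = 25/51 and p_2 = 26/51 are the weights 0.490 and 0.510 used below.

```latex
\begin{aligned}
M &= \frac{1}{n}\sum_{j=1}^{n} x_j = \sum_{i=1}^{k} p_i M_i, \qquad
V = \frac{1}{n}\sum_{j=1}^{n} \left(x_j - M\right)^2, \\
V(\text{total}) &= \underbrace{\sum_{i=1}^{k} p_i V_i}_{V(\text{within})}
  + \underbrace{\sum_{i=1}^{k} p_i \left(M_i - M\right)^2}_{V(\text{among})}.
\end{aligned}
```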
26 males:    M = 70.4, V = 10.013
25 females:  M = 65.1, V = 8.154
51 all:      M = 67.8, V = 16.236

The women are 25/51 = 0.490 of the sample, and 26/51 = 0.510 are men.

V(within) = 9.102 = 0.490*8.15 + 0.510*10.01
V(among)  = 7.134 = 0.490*(65.1 - 67.8)^2 + 0.510*(70.4 - 67.8)^2
V(total)  = 16.236
Fraction “explained by sex” = 7.134/16.236 = 0.44
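The same bookkeeping can be sketched in Python from the group summaries alone. `partition_variance` is a hypothetical helper written for this example; it weights each group by its share of the sample, as in the formulas above. One caution: the slide prints the group means rounded to one decimal, and plugging those rounded means in gives V(among) ≈ 7.02 rather than 7.134. The slide's 7.134 was most likely computed from unrounded means, since 9.102 + 7.134 reproduces the total of 16.236 obtained from the raw heights.

```python
def partition_variance(groups):
    """Split total variance into within- and among-group parts.

    groups: list of (n, mean, variance) tuples, each variance dividing by n.
    Returns (v_within, v_among, v_total, grand_mean).
    """
    n_total = sum(n for n, _, _ in groups)
    props = [n / n_total for n, _, _ in groups]  # p_i = n_i / n
    grand_mean = sum(p * m for p, (_, m, _) in zip(props, groups))
    v_within = sum(p * v for p, (_, _, v) in zip(props, groups))
    v_among = sum(p * (m - grand_mean) ** 2 for p, (_, m, _) in zip(props, groups))
    return v_within, v_among, v_within + v_among, grand_mean


# Summary statistics from the inches slide (means rounded to one decimal there).
women = (25, 65.1, 8.154)
men = (26, 70.4, 10.013)

v_within, v_among, v_total, grand_mean = partition_variance([women, men])
print(f"within = {v_within:.3f}, among = {v_among:.3f}, total = {v_total:.3f}")
print(f"fraction explained by sex = {v_among / v_total:.2f}")
```

V(within) comes out at 9.102, matching the slide, and the fraction explained by sex still rounds to 0.44 despite the rounded means.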
Heights of 25 women and 26 men in Anth/Biol 5221
Analysis in metric units (centimeters)
26 males:    M = 178.8, V = 66.485
25 females:  M = 165.4, V = 53.366
51 all:      M = 172.2, V = 104.454

V(within) = 60.054 = 0.490*53.37 + 0.510*66.49
V(among)  = 44.400 = 0.490*(165.4 - 172.2)^2 + 0.510*(178.8 - 172.2)^2
V(total)  = 104.454
Fraction “explained by sex” = 44.400/104.454 = 0.43
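Because the fraction explained is a ratio of two variances, it does not depend on units: converting inches to centimeters multiplies every mean by 2.54 and every variance by 2.54², and the factor cancels. The sketch below (the helper names `to_cm` and `fraction_among` are hypothetical) checks this on the inch summaries. The metric figures on this slide are not an exact rescaling of the inch figures (16.236 × 2.54² ≈ 104.75, not 104.454), so the small difference between 0.44 and 0.43 reflects the input numbers rather than the change of units.

```python
CM_PER_INCH = 2.54


def to_cm(group):
    """Convert an (n, mean, variance) summary from inches to centimeters."""
    n, mean_in, var_in = group
    # Means carry units of length; variances carry squared units.
    return n, mean_in * CM_PER_INCH, var_in * CM_PER_INCH ** 2


def fraction_among(groups):
    """Share of the total variance lying among groups, from (n, mean, variance) summaries."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    v_within = sum(n * v for n, _, v in groups) / n_total
    v_among = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups) / n_total
    return v_among / (v_within + v_among)


inches = [(25, 65.1, 8.154), (26, 70.4, 10.013)]  # women, men (slide values, inches)
cm = [to_cm(g) for g in inches]
print(f"{fraction_among(inches):.6f} {fraction_among(cm):.6f}")
```

The two printed fractions agree; any difference is floating-point noise.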
Heights of 252 women and 223 men in the Utah Genetic Reference Project
252 females:  M = 165.6, V = 46.962
223 males:    M = 180.2, V = 49.782
475 total:    M = 172.5, V = 101.694

The women are 252/475 = 0.531 of the sample, and 223/475 = 0.469 are men.

V(within) = 48.286 = 0.531*46.96 + 0.469*49.78
V(among)  = 53.408 = 0.531*(165.6 - 172.5)^2 + 0.469*(180.2 - 172.5)^2
V(total)  = 101.694
Fraction “explained by sex” = 53.408/101.694 = 0.53
Most of the individuals are siblings in 36
families with 1-12 sons and 1-12 daughters.
Of the variance that remains after removing the effect of sex,
90% is explained by genetic differences among the families
and 10% by effects of the environment
(what remains after the effects of sex and
of genes have been “removed” statistically).
These people grew up in a very healthy and
uniform environment (20th-century Utah).
In other times and places, the split tends
to be 80/20 or even 70/30.
For other traits, in most species, it may be
anywhere from 80/20 to 20/80.
[Figure: the families disaggregated]
Populus tremuloides (quaking aspen), East Canyon site, clone #2

Sheets 1-10: top of East Canyon, three clones, two trees per clone
Sheets 11-20: top of Millcreek, three clones, two trees per clone
4 x 10 = 40 leaves per clone

Leaf-shape ratio: R = Length / Width
[Figure: sampling sites at Upper Millcreek and East Canyon]
Leaf shape within and among six quaking aspen clones
(three clones at each of two sites, Upper Millcreek and East Canyon)

           mean    variance
Clone 1    0.902   0.00351
Clone 2    0.992   0.00237
Clone 3    1.075   0.00271

Clone 1    0.861   0.00552
Clone 2    1.028   0.00200
Clone 3    0.918   0.00947

All        0.963   0.00990
Analysis of variance (ANOVA)
Variance among clones = var(0.902, 0.992, … , 0.918) = 0.00564
Variance within clones = mean(0.00351, … , 0.00947) = 0.00426
Total variance = 0.00564 + 0.00426 = 0.00990
Fraction explained by clones = 0.00564 / 0.00990 = 0.57
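Because every clone contributes the same number of leaves (40), the weights p_i are all equal. V(within) is then simply the average of the clone variances, and V(among) is the variance of the clone means, dividing by the number of clones. A minimal Python sketch under that equal-size assumption, using the values from the six-clone table:

```python
from statistics import fmean, pvariance

# Clone means and within-clone variances of R = length / width (six-clone table).
# Each clone has 40 leaves, so every clone gets equal weight.
clone_means = [0.902, 0.992, 1.075, 0.861, 1.028, 0.918]
clone_vars = [0.00351, 0.00237, 0.00271, 0.00552, 0.00200, 0.00947]

v_among = pvariance(clone_means)  # spread of the clone means (divides by number of clones)
v_within = fmean(clone_vars)      # average spread of leaves inside a clone
v_total = v_among + v_within

print(f"among = {v_among:.5f}, within = {v_within:.5f}, total = {v_total:.5f}")
print(f"fraction explained by clones = {v_among / v_total:.2f}")
```

This gives an among-clone variance of about 0.00563 and a total of about 0.00989; the slide's 0.00564 and 0.00990 differ only in the last digit, presumably because the tabulated means are rounded. The fraction explained by clones still comes out at 0.57.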
Leaf shape within and among nine quaking aspen clones
(three clones at each of three sites: Willow Heights, Upper Millcreek and East Canyon)

           mean    variance
Clone 1    0.898   0.00235
Clone 2    0.994   0.00251
Clone 3    1.067   0.00232

Clone 1    0.831   0.00276
Clone 2    1.023   0.00180
Clone 3    0.890   0.00712

Clone 1    0.938   0.00347
Clone 2    0.890   0.00850
Clone 3    0.896   0.00173

All        0.936   0.00879
Note: the data for East Canyon (EC) and Upper Millcreek (UM)
differ slightly from the earlier analysis. I don’t know why!
Analysis of variance (ANOVA)
Variance among clones = var(0.898, 0.994, … , 0.896) = 0.00517
Variance within clones = mean(0.00235, … , 0.00173) = 0.00362
Total variance = 0.00517 + 0.00362 = 0.00879
Fraction explained by clones = 0.00517 / 0.00879 = 0.59
Leaf length alone
Each cell: clone mean (sd) <variance>, n = 40 leaves per clone

49.175 (5.945) <35.34437>   45.975 (2.650) < 7.02438>   48.725 (3.442) <11.84938>
51.475 (4.266) <18.19938>   54.875 (5.414) <29.30937>   40.525 (3.413) <11.64938>
40.625 (3.022) < 9.13438>   49.675 (4.209) <17.71937>   49.025 (3.752) <14.07438>

All 360 leaves: 47.786 (6.086) <37.04036>

var(among) = 19.89543
var(within) = 17.14493
sum = 37.04036
Explained by clones = 19.895 / 37.040 = 0.54

Leaf width alone
Each cell: clone mean (sd) <variance>, n = 40 leaves per clone

54.925 (7.387) <54.56937>   46.350 (3.380) <11.42750>   45.700 (3.156) < 9.96000>
62.050 (4.780) <22.84750>   53.775 (6.207) <38.52438>   45.775 (4.356) <18.97438>
43.400 (3.426) <11.74000>   56.125 (4.920) <24.20938>   54.775 (3.805) <14.47438>

All 360 leaves: 51.431 (7.642) <58.39518>

var(among) = 35.42552
var(within) = 22.96965
sum = 58.39518
Explained by clones = 35.425 / 58.395 = 0.61
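The same equal-weight recipe, wrapped in a small hypothetical helper `clone_anova`, should reproduce the length and width figures from the nine clone means and variances listed above (each clone again has 40 leaves). The units are whatever the leaves were measured in; the sketch does not depend on them.

```python
from statistics import fmean, pvariance


def clone_anova(means, variances):
    """Equal-sized clones: among = variance of clone means, within = mean of clone variances."""
    among = pvariance(means)
    within = fmean(variances)
    return among, within, among / (among + within)


# Nine clones, 40 leaves each; values copied from the length and width grids.
length_means = [49.175, 45.975, 48.725, 51.475, 54.875, 40.525, 40.625, 49.675, 49.025]
length_vars = [35.34437, 7.02438, 11.84938, 18.19938, 29.30937,
               11.64938, 9.13438, 17.71937, 14.07438]
width_means = [54.925, 46.350, 45.700, 62.050, 53.775, 45.775, 43.400, 56.125, 54.775]
width_vars = [54.56937, 11.42750, 9.96000, 22.84750, 38.52438,
              18.97438, 11.74000, 24.20938, 14.47438]

for trait, m, v in [("length", length_means, length_vars),
                    ("width", width_means, width_vars)]:
    among, within, frac = clone_anova(m, v)
    print(f"{trait}: among = {among:.5f}, within = {within:.5f}, "
          f"explained by clones = {frac:.2f}")
```

Here the clone means are given to three decimals, so the among-clone variances match the slides (19.89543 and 35.42552) to their printed precision, as do the within-clone variances and the fractions 0.54 and 0.61.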
Silver Lake study site (Brighton, elevation ~9,000 ft)
23 August 2015 (note smoke from west-coast wildfires); 2 October 2011
[Figure: sampled clones labeled c8, c9, c12, c2a, c1, c4-6, c5a, c7, c3, c14]
For the seven clones with more than two sampled trees, clone
membership explains 75% of the variation in the trees’ mean W/L ratios.
Within clones, W/L increases going west (i.e., trees with larger
west longitudes tend to have broader leaves, on average).
Summary
Every quantitative phenotype you can think of varies.
Often the distributions are roughly normal.
If some of this variation is heritable, then evolution by natural
selection is inevitable (Darwin’s world-changing insight).
Fisher invented ANOVA to show that Darwinian evolution of
quantitative traits is compatible with Mendelian genetics,
provided that many genetic loci make small, independent contributions,
and so does the environment.
Fisher’s 1918 paper will soon have its 100th anniversary (2018).
Together with Darwin’s work, it changed how we think about variation.
Now the variance can be “partitioned” into contributions
associated with “factors” that “explain” the total.
Often (but not always) we can interpret the factors as causes,
for example “genes” and “environment.”