Mimeo. No. 2
R. E. Comstock:1:49
STATISTICAL CONCEPTS IN GENETICS
II. Statistical formulae
A small group of population parameters (the mean, variance, linear regression coefficient, and correlation coefficient) will be used repeatedly. The student should have at his command the basic formulae involving them. The most important of such formulae are given below together with some attention to derivation. The derivations given should receive the student's attention because they exemplify sorts of algebraic manipulation to be used extensively in the material that follows. Some of the formulae are given in several algebraic forms; the student will find it helpful to have an easy familiarity with all of them.
A. Variance

Consider a population of which the individual members are designated as 1, 2, 3, ..., N and their magnitudes in some measured character are $X_1, X_2, X_3, \ldots, X_N$. The arithmetic mean ($\bar{X}$) is
\[ \bar{X} = \frac{SX}{N} \]
The variance ($\sigma^2$) is, by definition,
\[ \sigma_X^2 = \frac{S(X - \bar{X})^2}{N} = \frac{S(X^2) - \frac{(SX)^2}{N}}{N} \]
Let $x_1 = X_1 - \bar{X}$, $x_2 = X_2 - \bar{X}$, etc. Then,
\[ \sigma^2 = \frac{Sx^2}{N} \tag{1} \]
Small case letters will be used throughout to symbolize deviations of a variable from its mean. Other uses will, of course, be made of the lower case letters but when a variable is symbolized by a specified letter the corresponding lower case letter will be reserved for the deviation of that variable from its mean.
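
The two algebraic forms of the variance can be checked numerically. The following is a minimal sketch, not part of the original notes; the data values and function names are made up for illustration.

```python
def mean(values):
    """Arithmetic mean, SX / N."""
    return sum(values) / len(values)

def population_variance(values):
    """Definitional form: S(X - mean)^2 / N."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def population_variance_shortcut(values):
    """Computational form: [S(X^2) - (SX)^2 / N] / N."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / n

# Hypothetical measurements on a population of N = 8 individuals.
X = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean(X))                          # mean of the population
print(population_variance(X))           # variance from deviations
print(population_variance_shortcut(X))  # same value from raw sums
```

Both functions divide by N, not N - 1, because the notes treat the values as a complete population.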
B. The effects of coding on mean and variance.

1. Let the measured values of the above population be coded by subtraction of a constant value, a, from each. This yields a new population of values, $X_1 - a, X_2 - a, \ldots, X_N - a$. The mean is
\[ \overline{X - a} = \frac{X_1 - a + X_2 - a + \cdots + X_N - a}{N} = \frac{SX - Na}{N} = \bar{X} - a \]
The variance is
\[ \sigma_{(X-a)}^2 = \frac{S(X - a - \bar{X} + a)^2}{N} = \frac{S(X - \bar{X})^2}{N} = \sigma_X^2 \]
Note that the coding changes the mean in the same way that it changes each individual but does not affect the variance. Obviously coding by addition would have similar consequences.

2. Next, consider coding by multiplication. Let each value of the original population be multiplied by a constant, c. We then have the values $cX_1, cX_2, cX_3, \ldots$ Their mean is
\[ \overline{cX} = \frac{cX_1 + cX_2 + \cdots + cX_N}{N} = c\bar{X} \]
Their variance is
\[ \sigma_{cX}^2 = \frac{S(cX - c\bar{X})^2}{N} = \frac{c^2 S(X - \bar{X})^2}{N} = c^2 \sigma_X^2 \tag{6} \]
Again the mean is affected in the same manner as the individuals. Clearly the above covers coding by division since c may take a fractional value such as 1/10 and multiplication by 1/10 is the equivalent of division by 10.
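
The effects of coding can be illustrated with a short numerical sketch. It is not from the original notes; the data and the constants `a` and `c` are arbitrary choices for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance, S(X - mean)^2 / N."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

X = [12.0, 15.0, 11.0, 18.0, 14.0]  # hypothetical measurements
a = 10.0                             # constant subtracted from each value
c = 0.1                              # constant multiplier (division by 10)

shifted = [x - a for x in X]
scaled = [c * x for x in X]

# Subtraction shifts the mean by a and leaves the variance unchanged.
print(mean(shifted), mean(X) - a)
print(variance(shifted), variance(X))

# Multiplication scales the mean by c and the variance by c squared.
print(mean(scaled), c * mean(X))
print(variance(scaled), c ** 2 * variance(X))
```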
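C. Correlation and regression.

Consider paired values $X_1, X_2, X_3, \ldots, X_N$ with mean $\bar{X}$, and $Y_1, Y_2, Y_3, \ldots, Y_N$ with mean $\bar{Y}$. By definition the correlation coefficient is
\[ \rho = \frac{S(X - \bar{X})(Y - \bar{Y})}{\sqrt{S(X - \bar{X})^2 \, S(Y - \bar{Y})^2}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{7} \]
where
\[ \sigma_{xy} = \frac{S(X - \bar{X})(Y - \bar{Y})}{N} = \text{covariance of X and Y.} \]
The regression coefficient of Y on X is
\[ \beta_{y \cdot x} = \frac{S(X - \bar{X})(Y - \bar{Y})}{S(X - \bar{X})^2} = \frac{Sxy}{Sx^2} \tag{8} \]
\[ \beta_{y \cdot x} = \frac{\sigma_{xy}}{\sigma_x^2} \tag{8a} \]
so that
\[ \rho^2 = \frac{\sigma_{xy}^2}{\sigma_x^2 \sigma_y^2} = \beta_{y \cdot x} \, \beta_{x \cdot y} \]
The equation of the regression (Y on X) model is
\[ Y = \bar{Y} + \beta x + e \]
or subtracting $\bar{Y}$ from each side of the equation
\[ y = \beta x + e \tag{9} \]
where e is a random error in Y not correlated with variation in X.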
Rearranging
\[ e = y - \beta x \]
\[ Se^2 = S(y - \beta x)^2 = S(y^2 - 2\beta xy + \beta^2 x^2) = Sy^2 - 2\beta Sxy + \beta^2 Sx^2 \]
\[ Se^2 = Sy^2 - \frac{2\,Sxy \cdot Sxy}{Sx^2} + \frac{Sxy \cdot Sxy}{Sx^2 \cdot Sx^2}\, Sx^2 = Sy^2 - \frac{(Sxy)^2}{Sx^2} \tag{10} \]
$Se^2$ is, of course, the sum of squares of the deviations of Y from regression on X. It should be noted that if any value other than $Sxy/Sx^2$ were assigned to $\beta$, $Se^2$ would be increased, i.e. $\beta$ satisfies the least squares criterion.

The equation for the regression line is
\[ \hat{Y} = \bar{Y} + \beta x \]
where $\hat{Y}$ is the predicted value of Y for any value of x inserted. $\beta x = \hat{Y} - \bar{Y}$ symbolizes the deviations of predicted values from the mean of Y and the sum of squares of such deviations is what is called the regression sum of squares
\[ S(\beta x)^2 = \beta^2 Sx^2 = \frac{(Sxy)^2}{Sx^2} \tag{11} \]
From (10) and (11)
\[ Se^2 + S(\beta x)^2 = Sy^2 - \frac{(Sxy)^2}{Sx^2} + \frac{(Sxy)^2}{Sx^2} = Sy^2 \tag{12} \]
or dividing through by N
\[ \sigma_e^2 + \sigma_{\beta x}^2 = \sigma_y^2 \tag{13} \]
$\sigma_e^2$ is frequently symbolized as $\sigma_{y \cdot x}^2$. From (12) and (13) we see that either the sum of squares or the variance of Y is divisible into two parts (1) that due to regression, and (2) that due to deviation from regression.
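
The partition of the sum of squares in (10) to (12) can be verified on a small set of paired values. This is an illustrative sketch only; the data and the function name `regression_partition` are assumptions, not taken from the notes.

```python
def regression_partition(X, Y):
    """Return beta, Se^2, the regression sum of squares, and Sy^2."""
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    x = [v - xbar for v in X]          # deviations of X from its mean
    y = [v - ybar for v in Y]          # deviations of Y from its mean
    Sxy = sum(a * b for a, b in zip(x, y))
    Sx2 = sum(a * a for a in x)
    Sy2 = sum(b * b for b in y)
    beta = Sxy / Sx2                   # regression coefficient, Sxy / Sx^2
    Se2 = sum((b - beta * a) ** 2 for a, b in zip(x, y))  # deviations from regression
    Sreg = sum((beta * a) ** 2 for a in x)                # regression sum of squares
    return beta, Se2, Sreg, Sy2

def residual_ss(X, Y, slope):
    """Sum of squared deviations from a line with an arbitrary slope."""
    xbar = sum(X) / len(X)
    ybar = sum(Y) / len(Y)
    return sum(((b - ybar) - slope * (a - xbar)) ** 2 for a, b in zip(X, Y))

X = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical paired observations
Y = [2.0, 1.0, 4.0, 3.0, 5.0]

beta, Se2, Sreg, Sy2 = regression_partition(X, Y)
print(beta, Se2, Sreg, Sy2)
# Any other slope gives a larger residual sum of squares (least squares).
print(residual_ss(X, Y, beta + 0.1), residual_ss(X, Y, beta - 0.1))
```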
D. Variance of sums, differences and means.

Consider two populations
$A_1, A_2, A_3, \ldots$ with mean $\bar{A}$, and
$B_1, B_2, B_3, \ldots$ with mean $\bar{B}$.
\[ \sigma_A^2 = \frac{Sa^2}{N}, \qquad \sigma_B^2 = \frac{Sb^2}{N} \]
Now let a third population be formed as follows
\[ C_1 = A_1 + B_1, \quad C_2 = A_2 + B_2, \quad \ldots \]
\[ \bar{C} = \bar{A} + \bar{B} \]
\[ c_1 = C_1 - \bar{C} = A_1 + B_1 - \bar{A} - \bar{B} = a_1 + b_1, \quad c_2 = a_2 + b_2, \quad \text{etc.} \]
\[ \sigma_C^2 = \frac{Sc^2}{N} = \frac{S(a + b)^2}{N} = \frac{Sa^2 + 2Sab + Sb^2}{N} = \sigma_A^2 + \sigma_B^2 + 2\rho_{AB}\sigma_A\sigma_B \tag{14} \]
since from equation (7)
\[ \frac{Sab}{N} = \rho_{AB}\sigma_A\sigma_B \]
The reader can easily verify that if the C's are taken as the differences between A's and B's rather than as the sums,
\[ \sigma_C^2 = \sigma_A^2 + \sigma_B^2 - 2\rho_{AB}\sigma_A\sigma_B \]
In either case if $\rho_{AB} = 0$, i.e. A and B are uncorrelated
\[ \sigma_C^2 = \sigma_A^2 + \sigma_B^2 \]
When the members of two such populations are paired at random $\rho$ will always be zero. There will be many instances important to us in which this will be the case. Equation (13) is true because there is no correlation between X and e.

The above can easily be extended to sums (or differences) involving any number of variables. Thus
\[ \sigma_{(A \pm B \pm C \pm D \ldots)}^2 = \sigma_A^2 + \sigma_B^2 + \sigma_C^2 + \sigma_D^2 + \cdots \tag{15} \]
provided none of the variables are correlated.

If $X_1, X_2, \ldots, X_N$ are all drawn from the same population
\[ \sigma_{X_1}^2 = \sigma_{X_2}^2 = \cdots = \sigma_{X_N}^2 = \sigma^2 \]
and
\[ \sigma_{(X_1 + X_2 + \cdots + X_N)}^2 = N\sigma^2 \]
Now if this sum, $(X_1 + X_2 + X_3 + \cdots + X_N)$, is divided by N to obtain a mean, $\bar{X}$, applying (6) we find
\[ \sigma_{\bar{X}}^2 = \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N} \tag{16} \]
the formula for the variance of a mean.
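
Equations (14) and (16) can be checked exactly on small made-up populations. In this sketch the covariance term `2 * covariance(A, B)` stands for $2\rho_{AB}\sigma_A\sigma_B$, and the variance of a mean is obtained by listing every possible ordered sample drawn with replacement; the data and the sample size `n` are assumptions for illustration.

```python
from itertools import product

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def covariance(a_values, b_values):
    ma, mb = mean(a_values), mean(b_values)
    return sum((a - ma) * (b - mb) for a, b in zip(a_values, b_values)) / len(a_values)

# Two hypothetical paired populations.
A = [3.0, 5.0, 8.0, 4.0, 10.0]
B = [1.0, 4.0, 2.0, 6.0, 7.0]
sums = [a + b for a, b in zip(A, B)]
diffs = [a - b for a, b in zip(A, B)]

var_sum_formula = variance(A) + variance(B) + 2 * covariance(A, B)   # equation (14)
var_diff_formula = variance(A) + variance(B) - 2 * covariance(A, B)  # difference form
print(variance(sums), var_sum_formula)
print(variance(diffs), var_diff_formula)

# Variance of a mean: every ordered sample of size n drawn with replacement.
population = [1.0, 2.0, 3.0, 6.0]
n = 3
sample_means = [mean(s) for s in product(population, repeat=n)]
var_of_mean = variance(sample_means)
print(var_of_mean, variance(population) / n)                          # equation (16)
```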
Note that all of the foregoing has dealt with population parameters rather than sample statistics. This is the reason, for example, why the denominator in the variance formula is N rather than N - 1. It also explains why equations such as (15) involving variances of sums and differences can be given as strict equalities rather than as approximations. The reason for approaching the subject in this way is that the parameters (the population values of the mean, variance, etc.) are the "expected" values of the analogous statistics in sample data. Much of our time will be spent in the derivation of "expected" values and for that purpose the above forms will be appropriate. Obviously, when expectations are to be checked against observations or parameters are to be estimated using sample data the appropriate sample statistics must be used as "estimators" of the parameters. For example, if $\hat{\sigma}^2$ is to be an unbiased estimate of $\sigma^2$ it must be computed from the sample data as
\[ \frac{S(X - \bar{X})^2}{N - 1} \quad \text{rather than as} \quad \frac{S(X - \bar{X})^2}{N}. \]
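
The role of the N - 1 divisor can be seen by averaging both estimators over every possible sample from a small population. This is a sketch under stated assumptions: the population values and the sample size are invented, and samples are drawn with replacement so that the draws are independent.

```python
from itertools import product

def population_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def sum_of_squares(sample):
    """S(X - sample mean)^2 for one sample."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample)

population = [1.0, 2.0, 3.0, 6.0]   # hypothetical parent population
n = 2                                # sample size

samples = list(product(population, repeat=n))   # all ordered samples
avg_unbiased = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
avg_biased = sum(sum_of_squares(s) / n for s in samples) / len(samples)

print(population_variance(population))  # the parameter
print(avg_unbiased)                     # expected value with divisor n - 1
print(avg_biased)                       # expected value with divisor n (too small)
```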
Problems:

1. Give the variance of aX + bY + cZ in terms of the variances of X, Y, and Z assuming that a, b and c are constants and that X, Y, and Z are uncorrelated.

2. Give the variance of (X + Y) assuming the correlation ($\rho_{xy}$) to be .5, $\sigma_x^2$ to be 10, $\sigma_y^2$ to be 20, and N to be 12.

3. If $X = a + b_1 + b_2 + \cdots + b_r + e_1 + e_2 + \cdots + e_n$ and there are no correlations among the summed values, what is the variance of X? Consider the b's as randomly drawn from one population and the e's as randomly drawn from another. What is the variance of X/n?

4. Given $\sigma_C^2 = 64$, $\sigma_D^2 = 100$, and $\sigma_{CD} = 32$. What is the variance of rC + tD where r and t are constants?
Mimeo. No. 3
R. E. Comstock:2:49
STATISTICAL CONCEPTS IN GENETICS
III. The genotypic improvement resulting from selection.
The value of a genotype may be measured either in terms of (1) its effect on the phenotype of the individual which possesses it, or (2) the mean genotype of progeny of that individual. For illustration, consider a pair of genes (A, a) of which the gene A is completely dominant to its allele. Measured in terms of their effects on the phenotype of their possessors, the genotypes AA and Aa have equal value while by the second criterion of measurement (the progeny) their values are obviously not equal. The value of a genotype with respect to the phenotype of its possessor will be referred to henceforward as the genotypic value or the value of the genotype. The deviations of such values from their population mean will be termed the effect of the genotype. One individual will be said to be genotypically superior to another if the value or effect of its genotype is greater (this assumes that the method of measurement is such that high values are more desirable than low ones).

We shall be concerned first with the effect of selection upon the difference in genotypic value (as defined above) between selected individuals (or lines, varieties, hybrids, etc.) and the mean of the population or sample from which selections were made, i.e., the genotypic superiority of selected individuals. The effect of selection upon the mean genotypic value in later generations is a more complex problem, consideration of which must be postponed until further groundwork has been laid. However, as will be shown later, the gain reflected in later generations is proportional to the genotypic superiority of the selected individuals.
Selection is based on the phenotype. The phenotypic expression of any characteristic of an individual, e.g., height at a specified age, is the resultant of the genotype of the individual and the environment in which it develops. The relative importance of heredity and environment as sources of phenotypic variation is known to vary greatly for different characteristics. Traits such as color are in general almost completely under genetic control though there are classical exceptions. The so-called quantitative characters are for the most part much more responsive to environmental variation.

We recognize intuitively that given a certain amount of genetic variation selection will increase in effectiveness as variation in phenotype from environmental sources decreases. It would appear that control of environment should offer a means for making selection more effective and some effort in this direction is productive. However, it is essential to recognize that a completely uniform environment for all individuals of a group is an abstraction, never an actuality. The environments (defined to encompass all non-genetic variables that affect phenotype) of plants vary even though the plants are growing adjacent to each other and the environments of animals vary even though all are handled as nearly alike as is humanly possible. A little consideration of soil variation, competition, the random distribution of parasitic and pathogenic organisms, accidents of various and subtle sorts, etc. will suggest many uncontrollable sources of environmental variation.
As a preliminary exercise to gain familiarity with the meaning of certain terms and notations consider a population of genotypes
\[ G_1, G_2, G_3, \ldots, G_N \]
of any plant or animal, and a population of environments
\[ E_1, E_2, E_3, \ldots, E_M \]
Assume that one individual of each genotype is raised in each of the M environments. Symbolize by P the measure of any characteristic such as height, and assume that the measurement is not subject to error. Let $P_{ij}$ be height of the individual with the ith genotype raised in the jth environment. Thus, for example, $P_{23}$ will be the height of the individual with the genotype $G_2$ raised in the environment $E_3$. Let $\bar{P}$ be the mean height of the NM individuals, and
\[ P_{ij} - \bar{P} = p_{ij} \tag{17} \]
Thus
\[ P_{27} - \bar{P} = p_{27} \]
etc. Then
\[ \frac{p_{11} + p_{12} + \cdots + p_{1M}}{M} = g_1, \]
the effect of the genotype $G_1$ for the population of environments involved. In general
\[ \frac{p_{i1} + p_{i2} + \cdots + p_{iM}}{M} = g_i, \tag{18} \]
the effect of the ith genotype, and
\[ \frac{p_{1j} + p_{2j} + \cdots + p_{Nj}}{N} = e_j, \tag{19} \]
the effect of the jth environment for the population of genotypes involved.

Note that the effect of a genotype is defined in terms of the average, for some specified population of environments, of the associated phenotypes. In any practical selection problem it is important that the population of environments be delineated. For example, in selection among corn hybrids it might be composed of those environments in which corn is raised in the Coastal Plain area of North Carolina. In like manner the effect of an environment is defined in terms of the average phenotype, for a population of genotypes, of individuals raised in that environment.
Now $P_{11} - \bar{P}$ is not necessarily equal to the sum of $g_1$ and $e_1$. The phenotypic response to a given variation in genotype may not be the same in all environments; individuals differing in genotype may respond differently to a specific variation in environment. Is it the response to genotype or to environment which varies? We cannot distinguish and resolve the situation by saying that there are genotype-environment interactions. Let
\[ P_{11} - \bar{P} = g_1 + e_1 + i_{11} \quad \text{or} \quad i_{11} = p_{11} - g_1 - e_1 \]
\[ P_{23} - \bar{P} = g_2 + e_3 + i_{23} \quad \text{or} \quad i_{23} = p_{23} - g_2 - e_3 \]
etc. You will note that the i (the interaction term) is the amount by which the deviation of the phenotype from the general mean fails to be the sum of the average deviations for the genotype and environment involved. In general
\[ p_{ij} = P_{ij} - \bar{P} = g_i + e_j + i_{ij} \tag{20} \]
which is our mathematical model for phenotype. For practical purposes it is not quite complete since it does not recognize errors of measurement which are always involved in practical situations. Such errors are recognized by addition of another term to the model.
\[ p_{ijk} = g_i + e_j + i_{ij} + z_{ijk} \tag{21} \]
The third subscript is necessary because more than one individual of a given genotype might theoretically be raised in any given environment. Thus $p_{ijk}$ specifies the kth individual of the ith genotype raised in the jth environment, and $z_{ijk}$ is a random error in measurement of that individual's phenotype. The phenotypic variance ($\sigma_p^2$) for the population specified is
\[ \sigma_p^2 = \sigma_g^2 + \sigma_e^2 + \sigma_i^2 + \sigma_z^2 \qquad [\text{see (12), II}] \]
since the definitions of g, e, i, and z were such that there can be no correlations among them.
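
The definitions in (17) to (20) can be applied to a small table of phenotypes. The sketch below assumes an invented table of heights for N = 3 genotypes in M = 4 environments with no measurement error, so the variance partition has only the g, e, and i terms; the function name `decompose` is hypothetical.

```python
def decompose(P):
    """Split a genotype-by-environment table into p, g, e and interaction terms."""
    N = len(P)        # number of genotypes (rows)
    M = len(P[0])     # number of environments (columns)
    grand = sum(sum(row) for row in P) / (N * M)
    p = [[P[i][j] - grand for j in range(M)] for i in range(N)]         # equation (17)
    g = [sum(p[i]) / M for i in range(N)]                               # equation (18)
    e = [sum(p[i][j] for i in range(N)) / N for j in range(M)]          # equation (19)
    inter = [[p[i][j] - g[i] - e[j] for j in range(M)] for i in range(N)]  # from (20)
    return p, g, e, inter

def mean_square(values):
    return sum(v * v for v in values) / len(values)

# Hypothetical heights: rows are genotypes, columns are environments.
P = [
    [10.0, 12.0, 17.0, 13.0],
    [14.0, 15.0, 19.0, 16.0],
    [9.0, 16.0, 20.0, 11.0],
]

p, g, e, inter = decompose(P)
var_p = mean_square([v for row in p for v in row])
var_g = mean_square(g)
var_e = mean_square(e)
var_i = mean_square([v for row in inter for v in row])

print(g)
print(e)
print(var_p, var_g + var_e + var_i)   # the two totals agree
```

Because every genotype appears in every environment, the cross-product terms vanish exactly and the phenotypic variance splits into the three components without remainder.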
The reasons for considering this hypothetical population were (1) to establish definitions of genotypic effect, environmental effect, and genotype-environment interaction, and (2) the population described is of the type from which the material with which we deal in practical problems may frequently be considered a sample. In certain instances, however, the parent population cannot be assumed to be of the type considered above. The most important thing to be on guard against is lack of independence in the distribution of genotypes and environments. (The specification that each genotype was to be raised in each environment was one way of describing the situation that would obtain in the population if genotypes and environments were independently distributed.) It is probable that in the case of certain human traits genotypes and environments are not independently distributed. For example, both the IQ of children and the environment in which they are raised are probably correlated to some extent with the IQ of their parents. Again in natural populations which extend over areas which differ in average environment, the distributions are likely to be non-independent since in each area there would be a tendency for the best adapted genotypes to reproduce most rapidly and those best adapted might well differ from area to area. Our primary concern, however, will be with applications to controlled breeding programs with farm animals or agronomic crops with which random distribution can be effected.
For some of our purposes there will be no point in distinguishing environmental effects, genotype-environment interactions, and errors of measurement. In those instances the following simplified model will be used.
\[ p = g + e \]
in which e is the sum of the three non-genetic effects. In other cases, e will be defined to include the measurement error, the effect of environment, and a fraction of the genotype-environment interaction with g being defined to include a specified component of the interaction. However, the complete model should always be kept in mind as a check on the applicability of any variant version.
The estimation of the genotypic superiority to be expected on the average in selected individuals is a regression problem. We can measure phenotype so if we know the regression of genotype on phenotype we can predict the amount by which phenotypically superior individuals are superior in genotype. Let
\[ p = g + e \]
where g is genotypic effect and e is the deviation of p from g due to environmental variation, genotype-environment interaction, and measurement error. Then
\[ \sigma_{pg} = \frac{Spg}{N} = \frac{S\,g(g + e)}{N} = \frac{Sg^2}{N} + \frac{Sge}{N} \tag{23a} \]
Note the absence of a correction term in the equation. It would be zero since the means of p, g, and e are all zero. If genotypes are distributed at random relative to variations in environment and the measurement error is random, there will be no correlation between g and e, i.e. Sge will equal zero. Hence
\[ \sigma_{pg} = \frac{Sg^2}{N} = \sigma_g^2 \tag{23b} \]
The regression of genotype on phenotype will be
\[ \beta_{gp} = \frac{\sigma_{pg}}{\sigma_p^2} = \frac{\sigma_g^2}{\sigma_p^2} \tag{24} \]
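
Equations (23b) and (24) can be confirmed on a constructed population in which g and e are exactly uncorrelated. The sketch pairs every genotypic effect with every non-genetic deviation, which is one simple way to force Sge = 0; the particular values are invented for illustration.

```python
# Hypothetical genotypic effects and non-genetic deviations, each with mean zero.
g_values = [-2.0, 0.0, 2.0]
e_values = [-3.0, -1.0, 1.0, 3.0]

# Every g occurs with every e, so g and e are independently distributed.
pairs = [(g, g + e) for g in g_values for e in e_values]   # (g, p) with p = g + e
N = len(pairs)

cov_pg = sum(g * p for g, p in pairs) / N   # sigma_pg, no correction term needed
var_g = sum(g * g for g, _ in pairs) / N    # sigma_g^2
var_p = sum(p * p for _, p in pairs) / N    # sigma_p^2
beta_gp = cov_pg / var_p                    # regression of genotype on phenotype

print(cov_pg, var_g)            # equal, as in (23b)
print(beta_gp, var_g / var_p)   # equal, as in (24)
```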
We can now set up the prediction equation for genotypic superiority of selected individuals. Let

$p_s$ be the selection differential (the mean difference in phenotype between selected individuals and the group from which they are selected), and

$g_s$ be the mean genotypic superiority of the selected individuals.

Then
\[ g_s \;(=)\; \beta_{gp}\, p_s = \frac{\sigma_g^2}{\sigma_p^2}\, p_s \tag{25} \]
where (=) is used to indicate estimation rather than strict equality in view of sampling error present. If $p_s$ is measured in terms of the phenotypic standard deviation we set
\[ p_s = k\sigma_p \tag{26} \]
where k is the number of standard deviations by which selected individuals are phenotypically superior to the entire group. Then
\[ g_s \;(=)\; k\sigma_p \frac{\sigma_g^2}{\sigma_p^2} = \frac{k\sigma_g^2}{\sigma_p} \tag{25a} \]
This form of the equation has special utility for measuring the advantage gained by reduction of the environmental component of phenotypic variance through control of environmental variation. k is on the average constant when the proportion of total individuals which are selected is constant. Since $\sigma_g^2$ is not a function of environmental variation the only term in the equation which will be affected by change in environmental variance is $\sigma_p$. Let us consider the effect of variation in $\sigma_p$ on $g_s$ assuming k constant, i.e. that the selected proportion of the population remains constant. Let $\Delta\sigma_p$ symbolize the amount by which $\sigma_p$ is changed and $\Delta g_s$ be the corresponding change in genotypic gain resulting from selection. Then
\[ g_s + \Delta g_s \;(=)\; \frac{k\sigma_g^2}{\sigma_p + \Delta\sigma_p} \tag{27} \]
and
\[ \Delta g_s \;(=)\; \frac{k\sigma_g^2}{\sigma_p + \Delta\sigma_p} - \frac{k\sigma_g^2}{\sigma_p} \]
The expected increment in $g_s$ as a fraction of $g_s$ will be
\[ \frac{\Delta g_s}{g_s} = \frac{k\sigma_g^2 / (\sigma_p + \Delta\sigma_p)}{k\sigma_g^2 / \sigma_p} - 1 = \frac{\sigma_p}{\sigma_p + \Delta\sigma_p} - 1 \tag{28} \]
In cases where $\sigma_p$ is reduced (as presumably it would be if environment is made more uniform) $\Delta\sigma_p$ is a negative quantity. Let the change in $\sigma_e^2$ resulting from greater control on environment be symbolized by $(a - 1)\sigma_e^2$ where $0 < a < 1.0$. This is the amount of change in $\sigma_p^2$. Then
\[ \frac{\Delta g_s}{g_s} = \frac{\sigma_p}{\sigma_p + \Delta\sigma_p} - 1 = \sqrt{\frac{\sigma_p^2}{\sigma_p^2 + (a - 1)\sigma_e^2}} - 1 \]
Substituting $\sigma_p^2 - \sigma_g^2$ for $\sigma_e^2$ (since $\sigma_p^2 = \sigma_g^2 + \sigma_e^2$)
\[ \frac{\Delta g_s}{g_s} = \sqrt{\frac{\sigma_p^2}{\sigma_p^2 + (a - 1)(\sigma_p^2 - \sigma_g^2)}} - 1 \tag{29a} \]
If numerator and denominator of the term under the radical are both divided by $\sigma_p^2$, we obtain
\[ \frac{\Delta g_s}{g_s} = \sqrt{\frac{1}{1 + (a - 1)\left(1 - \sigma_g^2/\sigma_p^2\right)}} - 1 \tag{29b} \]
which is the fraction by which genotypic gain from selection is expected to be increased as a consequence of changing environmental variance by $(a - 1)\sigma_e^2$, i.e. from $\sigma_e^2$ to $\sigma_e^2 + (a - 1)\sigma_e^2 = a\sigma_e^2$.
Values of this fraction are listed in Table 1. They indicate that when $\sigma_g^2$ amounts to as much as one-half of $\sigma_p^2$, control on environment is relatively inefficient in increasing the effect of selection. Even when $\sigma_g^2$ is rather low relative to $\sigma_p^2$, environmental variance must be reduced sharply if the effect of selection is to be increased very much. For example when $\sigma_g^2$ is one-tenth of $\sigma_p^2$ reduction of $\sigma_e^2$ to $.7\sigma_e^2$ increases the expected genotypic gain from selection by only 17%.

Table 1. Values of $\Delta g_s$ as a fraction of $g_s$ for varying values of a and $\sigma_g^2/\sigma_p^2$ (from equation 29).

              sigma_g^2 / sigma_p^2
   a       .1      .3      .5      .7      .9
  .9     .048    .037    .026    .015    .005
  .7     .170    .125    .085    .048    .015
  .5     .348    .240    .154    .085    .026
  .3     .644    .400    .240    .125    .037
  .1    1.294    .644    .348    .170    .048
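
The entries of Table 1 can be recomputed from equation (29b). The following sketch is illustrative; the function names are hypothetical, `h2` stands for the ratio $\sigma_g^2/\sigma_p^2$, and the cross-check recomputes the gain directly from $g_s (=) k\sigma_g^2/\sigma_p$ before and after the change in environmental variance, with k and $\sigma_p^2$ set to arbitrary values.

```python
from math import sqrt

def relative_gain(a, h2):
    """Equation (29b): fractional increase in genotypic gain when the
    environmental variance is changed from sigma_e^2 to a * sigma_e^2."""
    return sqrt(1.0 / (1.0 + (a - 1.0) * (1.0 - h2))) - 1.0

def relative_gain_direct(a, h2, k=1.0, var_p=100.0):
    """Same quantity computed from g_s = k * sigma_g^2 / sigma_p before and after."""
    var_g = h2 * var_p
    var_e = var_p - var_g
    before = k * var_g / sqrt(var_g + var_e)
    after = k * var_g / sqrt(var_g + a * var_e)
    return after / before - 1.0

a_values = [0.9, 0.7, 0.5, 0.3, 0.1]
h2_values = [0.1, 0.3, 0.5, 0.7, 0.9]

print("   a " + "".join(f"{h2:8.1f}" for h2 in h2_values))
for a in a_values:
    row = "".join(f"{relative_gain(a, h2):8.3f}" for h2 in h2_values)
    print(f"{a:4.1f} " + row)
```

The direct computation does not depend on the values chosen for k or $\sigma_p^2$, which is why Table 1 needs only a and the variance ratio as arguments.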
The expected value of k.

When the data on which selection is based are available the selection differential, $p_s$, can always be obtained directly. However, in theoretical problems the expected magnitude of k, the selection differential in units of the population phenotypic standard deviation, is often required.

If the sample from which selections are to be made is of size 50 or less, expected k may be obtained from Table XX of Fisher and Yates (1). This table gives, for samples from N = 2 to N = 50, the number of standard deviations by which the 1st, 2nd, ... Nth individuals may on the average be expected to deviate from the mean if the individuals of the sample are placed in rank order. For example, if we are going to take the best 4 of a sample of 40, we find from the table that on the average they will deviate 2.16, 1.75, 1.52, and 1.34 standard deviations from the mean, respectively. The average of these figures, 1.69, is taken as the expected value of k. Assumptions involved are that the sample can be considered as randomly drawn from a parent population of the "normal" form.
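
Where the printed table is not at hand, the expected deviations of ranked individuals can be approximated numerically. The sketch below is not the method used to build Table XX; it integrates the standard density of a normal order statistic with a simple trapezoid rule, and the function name, integration limits, and step count are assumptions chosen for illustration.

```python
from math import comb
from statistics import NormalDist

def expected_order_statistic(rank_from_top, n, lo=-8.0, hi=8.0, steps=8000):
    """Approximate the expected standardized deviation of the individual that
    ranks rank_from_top (1 = best) in a random sample of n from a normal
    population, by trapezoid integration of x times the order-statistic density."""
    nd = NormalDist()
    r = n - rank_from_top + 1            # rank counted from the smallest value
    coeff = n * comb(n - 1, r - 1)       # n! / ((r - 1)! (n - r)!)
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lo + s * h
        F = nd.cdf(x)
        density = coeff * F ** (r - 1) * (1.0 - F) ** (n - r) * nd.pdf(x)
        weight = 0.5 if s in (0, steps) else 1.0
        total += weight * x * density
    return total * h

# Best 4 of a sample of 40, as in the example in the text.
top4 = [expected_order_statistic(rank, 40) for rank in range(1, 5)]
expected_k = sum(top4) / len(top4)
print([round(v, 2) for v in top4])
print(round(expected_k, 2))
```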
If selection is to be from samples of more than 50, the expected value of k can be obtained from the attributes of the normal curve as z/w where w is the proportion selected and z is the height of the ordinate which divides the area under the "normal" curve into portions relative in magnitude to the proportions selected and rejected. The value of z can be obtained using Tables I and II of Fisher and Yates (1) or from the table in the Handbook of Chemistry and Physics which relates to the "normal" curve. If the tables of Fisher and Yates are used, Table I is entered with P equal to either 2w or 2(1 - w) whichever is 1.0 or less. (The factor, 2, is introduced since Table I gives the relative deviate, x, beyond which a given proportion of the population, P, is found when both tails of the curve are considered; we are interested in only the positive tail of the curve.) Table II is then entered with the x obtained from Table I to find z.

Example: 20% of the sample is to be selected, i.e. w = .2, P = 2w = .4,
x for P = .4 is .8416 (from Table I),
z for x = .8416 is .2799 (from Table II),
k = z/w = .2799/.2 = 1.4

If the table from the Handbook of Chemistry and Physics is used, find the value .5 - w (or if w > .5, the value w - .5) in the column headed "Area" and read the value of z from the next column to the right which is headed "Ordinate."

There will be a slight upward bias in k taken as z/w since this assumes the sample from which selection is made to be of size approaching infinity. However, the magnitude of the bias will be unimportant unless the proportion to be selected is very small.
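
The table lookups for z and x can be replaced by direct computation. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `expected_k_large_sample` is hypothetical.

```python
from statistics import NormalDist

def expected_k_large_sample(w):
    """Expected selection differential in standard deviation units, k = z / w,
    when the best fraction w of a very large normal sample is selected."""
    nd = NormalDist()              # standard normal curve
    x = nd.inv_cdf(1.0 - w)        # deviate cutting off the upper fraction w
    z = nd.pdf(x)                  # height of the ordinate at that deviate
    return x, z, z / w

x, z, k = expected_k_large_sample(0.2)
print(round(x, 4), round(z, 4), round(k, 2))   # compare with the worked example
```

For w = .1 the same function gives a value somewhat above the 1.69 found from ranked deviations for the best 4 of 40, which illustrates the upward bias of z/w in finite samples.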
Problems:

5. Obtain the expected values of k for selection of .05 and .3 of samples of size 20, 30, 40, and 50 using Table XX of Fisher and Yates.

6. Obtain the expected values of k for selection of .05 and .3 using Tables I and II of Fisher and Yates or the table in the Handbook of Chemistry and Physics.

7. Assume $\sigma_p^2$ for annual egg production of pullets is 625. What is the expected selection differential for the top 15 from a random sample of 100? for the top 3 from a sample of 20?

8. Given $\sigma_g^2/\sigma_p^2 = .4$ and $\sigma_p^2 = 100$, what genotypic gain will be expected from selection of the best half of samples of size 6, 12, 24, and 48? Obtain these same values assuming $\sigma_e^2$ is reduced to $.8\sigma_e^2$, $.6\sigma_e^2$, and $.4\sigma_e^2$.
References:

1. Fisher, R. A. and F. Yates (1938) Statistical Tables for Biological, Agricultural, and Medical Research. Oliver and Boyd. London and Edinburgh.

2. Handbook of Chemistry and Physics. Chemical Rubber Publishing Co., Cleveland, Ohio.
Mimeo. No. 4
R. E. Comstock:3:49
STATISTICAL CONCEPTS IN GENETICS
IV. The genotypic improvement resulting from selection - special categories and practical considerations.
In the foregoing section, it was shown that the effect of selection on genotype was the product of $\beta_{gp}$ and the selection differential, $k\sigma_p$. It is apparent therefore that four factors bear on the magnitude of genotypic gain from selection. They are

1. The proportion, w, of individuals available that must be selected.
2. The total size of the sample from which selection is made.
3. The variance of the population, $\sigma_p^2$, from which selection is made.
4. The proportion of the phenotypic variance which arises from differences in genotype, $\sigma_g^2/\sigma_p^2$.

The first two factors have importance because they control the size of k. From a practical standpoint the proportion of individuals to be selected is affected greatly by the reproductive rate of the organism. w can be very low in selection with most plants, but must be higher with farm animals and particularly so with those such as cattle or horses that on the average produce less than one offspring per year. Sample size is relatively unimportant unless samples are very small. However, samples of five or less are frequently involved in the practice of selection in animals based on progeny performance.
To bring the material of the foregoing section into clearer perspective with relation to the problems of breeding programs, it will be expanded in this section relative to some of the practical aspects of four categories of selection situations.

A. Selection in cases where each genotype is represented by but one individual in which there is only one phenotypic expression of the character.

Examples are age at sexual maturity in poultry, birth weight of calves, rate of gain from weaning to market weight in swine, and ultimate plant height in open-pollinated crops. The material of the preceding section applies directly to problems in this category. The principal point remaining to be considered is the implications of genotype-environment interactions. If genotypes which are superior in one class of environments are inferior in another, what would appear to be a gain in selection effectiveness resulting from restriction of variation in environments is partially illusory except in terms of those environments involved in the selection experiment. If the interactions are important little will be gained from control of environment which excludes from the selection experiments environments in which the improved material is to be used. Rather selection for performance in more than one class of environments should be practiced. This point will be brought into clearer focus a little later.

It is worth noting that genotypic gain from selection in this category of problems has practical importance only because it is proportional to mean gain in breeding value. Genotypic gain is important in itself only if the selected individuals can themselves be reused for commercial purposes to which the character under selection is important.
B. Cases where each genotype is represented by only one individual but where there is opportunity for more than one phenotypic expression of the character.

Examples are annual milk production in cows, annual wool clip of sheep, and size of litter in swine. Repeated observations of such traits result in more accurate estimation of an individual's genotype. From a practical standpoint it is worth inquiring how much selection is improved by increased observations on the same individual. The situation may be symbolized by the following model.
\[ p_{ij} = g_i + c_i + v_{ij} \tag{30} \]
where $p_{ij}$ is the phenotypic deviation from the population mean for the ith individual and the jth expression of phenotype on that individual, $g_i$ is the effect of the genotype of the ith individual, $c_i$ is an environmental effect which is constant for all expressions of phenotype on the ith individual, and $v_{ij}$ is the sum of a random error in measurement and the portion of the total environmental effect which is random from one expression of phenotype to another. Effects of environment which are constant for repeated expressions of a trait are easily visualized. Suppose a cow is small in size because of poor nutrition during her body development. This would have a limiting effect on milk production in all lactations.

Now let $\bar{p}_i$ be the mean of p for n expressions of phenotype by the ith individual.
\[ p_{i1} = g_i + c_i + v_{i1} \]
\[ p_{i2} = g_i + c_i + v_{i2} \]
\[ \vdots \]
\[ p_{in} = g_i + c_i + v_{in} \]
\[ \bar{p}_i = g_i + c_i + \frac{v_{i1} + v_{i2} + \cdots + v_{in}}{n} \tag{31} \]
This can be written as
\[ \bar{p}_i = g_i + e_i \]
in which
\[ e_i = c_i + \frac{v_{i1} + v_{i2} + \cdots + v_{in}}{n} \]
If there are no correlations among g, c, and v
\[ \sigma_{\bar{p}}^2 = \sigma_g^2 + \sigma_e^2 \]
and
\[ \sigma_e^2 = \sigma_c^2 + \frac{\sigma_v^2}{n} \tag{35} \]
The genotypic superiority of selected individuals will be
\[ g_s \;(=)\; \frac{k\sigma_g^2}{\sigma_{\bar{p}}} \tag{36} \]
This differs from the expression (25a) of the previous section only by substitution of $\sigma_{\bar{p}}$ for $\sigma_p$ which is logical since $\bar{p}$ rather than p is the measure of phenotype being used in selection. The student can readily verify equation (36) by derivation of $\beta_{g\bar{p}}$ and multiplication of the value found by the selection differential, $k\sigma_{\bar{p}}$.
The effect on $g_s$ of increasing n, the number of phenotypic expressions of the genotype on which selection is based, is easily found in terms of its effect on the magnitude of $\sigma_e^2$. The increment in $\sigma_e^2$ when n is increased from 1.0 to a larger number is
\[ \sigma_c^2 + \frac{\sigma_v^2}{n} - \sigma_c^2 - \sigma_v^2 = \frac{\sigma_v^2}{n} - \sigma_v^2 = \frac{\sigma_v^2(1 - n)}{n} \]
Set this equal to $(a - 1)\sigma_e^2$ as in the previous section for an increment in $\sigma_e^2$ resulting from control on environmental variation, $\sigma_e^2$ here being the value for n = 1, i.e. $\sigma_c^2 + \sigma_v^2$. Then
\[ (a - 1)\sigma_e^2 = \frac{\sigma_v^2(1 - n)}{n} \]
\[ a = 1 + \frac{(1 - n)\sigma_v^2}{n\sigma_e^2} = 1 - \frac{n - 1}{n} \cdot \frac{\sigma_v^2}{\sigma_e^2} \tag{37} \]
Note that a depends not only on n but also upon the ratio of $\sigma_v^2$ to $\sigma_e^2$. That this should be the case is obvious from equation (35). Table 2 contains values of a for several values of n and $\sigma_v^2/\sigma_e^2$. This table used in conjunction with Table 1 shows the effect of increasing n upon genotypic gain from selection.

Table 2. Values of a for different magnitudes of n and $\sigma_v^2/\sigma_e^2$.

            sigma_v^2 / sigma_e^2
   n      .2     .4     .6     .8    1.0
   1     1.0    1.0    1.0    1.0    1.0
   2     .90    .80    .70    .60    .50
   4     .85    .70    .55    .40    .25

For example, if $\sigma_v^2/\sigma_e^2 = .8$ (environmental effects which are constant through the organism's life are relatively unimportant) and n = 4, then a = .4. Then from Table 1, if $\sigma_g^2/\sigma_p^2 = .3$, the expected gain in effect of selection is about 30% (it could be computed exactly from equation 29). On the other hand for $\sigma_v^2/\sigma_e^2 = .4$, n = 4, and $\sigma_g^2/\sigma_p^2 = .3$, the expected gain is 12.5%. Two points should be noted.
1. The effect of repeated observations increases as the fraction of environmental variance due to "constant" effects decreases. Added records on the same individuals will obviously be an inefficient means for speeding progress by selection unless $\sigma_c^2$ is small relative to $\sigma_v^2$.

2. The effect of further increase in n falls off rapidly as n increases.
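
Equation (37) and its use with equation (29b) can be sketched in a few lines. The function names are hypothetical; `v_ratio` stands for $\sigma_v^2/\sigma_e^2$ with a single record and `h2` for $\sigma_g^2/\sigma_p^2$ with a single record.

```python
from math import sqrt

def a_from_records(n, v_ratio):
    """Equation (37): a = 1 - ((n - 1) / n) * (sigma_v^2 / sigma_e^2)."""
    return 1.0 - (n - 1) / n * v_ratio

def relative_gain(a, h2):
    """Equation (29b): fractional increase in genotypic gain from selection."""
    return sqrt(1.0 / (1.0 + (a - 1.0) * (1.0 - h2))) - 1.0

# Rows of Table 2.
for n in (1, 2, 4):
    print(n, [round(a_from_records(n, r), 2) for r in (0.2, 0.4, 0.6, 0.8, 1.0)])

# Gain from basing selection on n = 4 records instead of one, for h2 = .3.
print(round(relative_gain(a_from_records(4, 0.8), 0.3), 3))
print(round(relative_gain(a_from_records(4, 0.4), 0.3), 3))

# Diminishing returns as n increases, with v_ratio = .8 and h2 = .3.
print([round(relative_gain(a_from_records(n, 0.8), 0.3), 3) for n in (2, 4, 8, 16)])
```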
Repeatability:
There are practical situations in which our concern is not primarily with the effect of selection on mean genotype, but with its effect on expected future performance. This would be true in the case of a dairyman who did not as a practice produce his own replacement stock but instead planned on purchasing some animals each year to replace others culled from his herd.
The expectation of future performance is (g + c). If we are interested only in performance of the individual and not in its value for reproduction we do not care whether good (or poor) performance is a function of genotype or of a constant environmental effect.
$$\sigma_{(g+c)\bar p} = \frac{S(g+c)\bar p}{N} = \frac{S(g+c)\left(g + c + \dfrac{v_1 + v_2 + \cdots + v_n}{n}\right)}{N} = \sigma_g^2 + \sigma_c^2 \tag{38}$$

assuming correlations among g, c and v are all zero.

$$\beta_{(g+c)\bar p} = \frac{\sigma_{(g+c)\bar p}}{\sigma_{\bar p}^2} = \frac{\sigma_g^2 + \sigma_c^2}{\sigma_g^2 + \sigma_c^2 + \dfrac{\sigma_v^2}{n}} \tag{39}$$

and expected gain in future performance as a consequence of selection is

$$(\overline{g+c})_s = \beta_{(g+c)\bar p}\,\bar p_s = \frac{k(\sigma_g^2 + \sigma_c^2)}{\sigma_{\bar p}} \tag{40}$$
When n = 1, we have

$$\beta_{(g+c)p} = \frac{\sigma_g^2 + \sigma_c^2}{\sigma_g^2 + \sigma_c^2 + \sigma_v^2} \tag{41}$$

Lush (3) [see p. 165-167] defines "repeatability" as the correlation between single observations on the same individual and symbolizes it as r. We will use R to avoid confusion with the use of r as a symbol for correlation in general. If the variance of effects due to temporary (as contrasted to constant) differences in environment is of the same size for all expressions of the trait

$$R = \frac{\sigma_g^2 + \sigma_c^2}{\sigma_g^2 + \sigma_c^2 + \sigma_v^2} = \beta_{(g+c)p} \tag{43}$$

Actually when Lush defines "repeatability" in terms of the correlation he is assuming that $\sigma_v^2$ is constant for the successive expressions of the trait. We will define "repeatability" as the regression of future performance on phenotype as measured in one expression of the trait.
Note now (see equation 39) that

$$\frac{\beta_{(g+c)\bar p}}{\beta_{g\bar p}} = \frac{\sigma_g^2 + \sigma_c^2}{\sigma_g^2 + \sigma_c^2 + \dfrac{\sigma_v^2}{n}} \div \frac{\sigma_g^2}{\sigma_g^2 + \sigma_c^2 + \dfrac{\sigma_v^2}{n}} = \frac{\sigma_g^2 + \sigma_c^2}{\sigma_g^2}$$

It follows that increasing n results in the same relative gain in effect of selection for future performance as in selection for superior genotype. It is also obvious that

$$\beta_{(g+c)\bar p} \ge \beta_{g\bar p} \tag{44}$$

and, hence, that the effect of selection measured in terms of future performance will on the average be equal to or greater than the effect measured in terms of genotypic gain.
The estimation of $\beta_{(g+c)\bar p}$.

R, "Repeatability", is equal to

$$R = \frac{\sigma_g^2 + \sigma_c^2}{\sigma_g^2 + \sigma_c^2 + \sigma_v^2} \tag{45}$$

and

$$\beta_{(g+c)\bar p} = \frac{n(\sigma_g^2 + \sigma_c^2)}{\sigma_g^2 + \sigma_c^2 + \sigma_v^2 + (n - 1)(\sigma_g^2 + \sigma_c^2)}$$

Dividing numerator and denominator by $(\sigma_g^2 + \sigma_c^2 + \sigma_v^2)$ and substituting R according to (45) we have

$$\beta_{(g+c)\bar p} = \frac{nR}{1 + (n - 1)R} \tag{46}$$

Thus having estimated R, $\beta_{(g+c)\bar p}$ is estimated for any value of n using (46).
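Equation (46) is easy to evaluate for several values of n once R is in hand. The sketch below is only illustrative: the function name is hypothetical and R = .30 is an assumed value chosen for the example, not an estimate derived in the text.

```python
def regression_on_mean(R, n):
    """Equation (46): regression of future performance on the mean of n records."""
    return n * R / (1 + (n - 1) * R)


R = 0.30  # assumed repeatability, for illustration only
for n in (1, 2, 3, 4):
    print(n, round(regression_on_mean(R, n), 3))
```

For n = 1 the function returns R itself, which agrees with the definition of repeatability as the regression of future performance on a single record.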
The most obvious estimate of R is the regression of second record on first computed from data involving two records on each of a series of individuals. This is an appropriate method and has the advantage of yielding an unbiased result from data involving only the individuals which were phenotypically superior on the basis of their first performance record. Such data are common for livestock since culling on the basis of the first record of performance is frequent in practice.
The exact procedure in computation of the regression coefficient will vary depending on the details of the selection process about which information is desired. Thus, if the selection procedure involves comparison of records made in different years, the computations should be made in a manner that allows year to year variation to affect the regression coefficient. On the other hand, if in practice selection is always among individuals whose records were made in the same year or years the regression should be a function of only intra-year variation. Let us consider a numerical example of the latter procedure. The data consist of records for the first two years of production on the hatchability of eggs of 56 turkey hens. (They were supplied by Dr. R. S. Dearstyne of the North Carolina State College Poultry Department.) The birds were born in 8 different years. The analysis of covariance is given in Table 3.
Table 3. Analysis of covariance for estimation of "repeatability" of hatchability in turkey hens.

Source of Variance               d.f.     $Sxy$     $Sx^2$     $Sy^2$
Groups of contemporary birds       7     -2,511      7,478      5,584
Birds within groups               48     10,432     22,441     24,022
Total                             55      7,921     29,919     29,606

b = 10,432/22,441 = .46
The regression was put on an intra-year basis by separation of the portions of the sums of squares and products associated with "group" differences since the birds within a group all made their first records in the same year and their second record the following year, again the same for all birds of the same group. Note that if b had been computed from the total variation (not eliminating variance due to year differences) it would have come out .26 instead of .46. That it would be smaller on that basis was of course to be anticipated though it could easily have been larger as a consequence of sampling error since so few "groups" were involved.
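The two regression coefficients quoted for the turkey hen data can be recomputed from the sums in Table 3. This is a small sketch; it assumes that x is the first record and y the second, so that the divisor is the sum of squares of first records, and the dictionary keys are illustrative names only.

```python
# Sums of products and of squares of x, taken from Table 3.
groups = {"Sxy": -2511, "Sxx": 7478}     # groups of contemporary birds
within = {"Sxy": 10432, "Sxx": 22441}    # birds within groups

# The "Total" line is the sum of the two sources.
total = {key: groups[key] + within[key] for key in groups}

b_within = within["Sxy"] / within["Sxx"]   # intra-year regression
b_total = total["Sxy"] / total["Sxx"]      # year differences not removed
print(round(b_within, 2), round(b_total, 2))
```

The negative sum of products between groups is what pulls the total regression below the intra-year value.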
A question will frequently arise concerning the amount of data required for a reasonably precise estimate of "repeatability". The variance of sample regression coefficients is

$$\sigma_b^2 = \frac{S(Y - \hat Y)^2}{N\,Sx^2} = \frac{Sy^2 - (Sxy)^2/Sx^2}{N\,Sx^2} = \frac{Sy^2 - \beta\,Sxy}{N\,Sx^2} = \frac{Sy^2/Sx^2 - \beta^2}{N} \tag{47}$$

(since $\beta = Sxy/Sx^2$; see (10), II).
Now, if there has been no selection of individuals on which second records were obtained the population values of $Sy^2$ and $Sx^2$ are equal. Substituting $Sx^2$ for $Sy^2$ we have

$$\sigma_b^2 = \frac{1 - \beta^2}{N} \tag{47a}$$

Values of $\sigma_b$ in terms of equation (47a) are listed in Table 4 for various values of $\beta$ and N.
The values of $\beta$ used in the table are widely spaced because $\sigma_b$ changes so slowly as $\beta$ changes. (For our purposes we do not need to know $\sigma_b$ for $\beta$ = .3 and N = 200 more precisely than that it lies between .070 and .065.)

Table 4. $\sigma_b$ for various values of $\beta$ and N.

                  $\beta$
    N        .1       .4       .6
    50      .141     .130     .113
   200      .070     .065     .057
   800      .035     .032     .028
  3200      .018     .016     .014
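Entries of this kind can be generated directly from equation (47a). A minimal sketch follows; the function name is an assumption made for the example, and only a few combinations of $\beta$ and N are evaluated.

```python
import math


def sd_of_b(beta, N):
    """Standard deviation of a sample regression coefficient, equation (47a)."""
    return math.sqrt((1 - beta ** 2) / N)


for N in (50, 200, 800):
    print(N, [round(sd_of_b(beta, N), 3) for beta in (0.1, 0.4, 0.6)])
```

Because N enters through its square root, the sample size must be quadrupled to halve $\sigma_b$.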
Suppose we have a sample regression coefficient of .19 from data on 50 individuals. In view of the values of $\sigma_b$ listed in the table, .19 could well be a random deviate from a population $\beta$ of anywhere from .01 to .40. The estimate would be less precise than we would like. Suppose on the other hand that the data had involved 800 individuals. The standard deviation of the value, .19, would then be about .034 and we would be reasonably assured that the population $\beta$ was not smaller than .12 nor larger than .26. This would be much more satisfactory though perhaps still not as precise as we would like. The situation is much more satisfactory when the coefficients are larger since as $\beta$ increases $\sigma_b$ decreases slightly in absolute value, but what is more important it decreases sharply relative to the absolute magnitude of $\beta$. In the case of the turkey hen data the estimate, .46, may conceivably have been a random deviate from a population $\beta$ as small as .25 or as large as .65. It fails to establish as narrow confidence limits as we would like.
In the event that the data involve only those individuals with the best phenotypes in the first record year (i.e. the data are on selected individuals) the expected value of $Sy^2$ will be larger than that for $Sx^2$ and $\sigma_b^2$ will be larger since $Sy^2/Sx^2$ in (47) will be greater than one. Such data yield unbiased estimates but more data are required if equal precision in estimation of $\beta$ is to be obtained. For example, if the selection consisted of obtaining second records on only the individuals that were in the top half with respect to first records the expectation of $Sy^2/Sx^2$ is 2.75 (assuming $Sy^2 = Sx^2$ with no selection). Thus for $\beta$ = .3, the variance of b would be 2.9 times as large as if the data were on the unselected material.
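The factor of 2.9 can be checked from equation (47), since N cancels when the two variances are compared at the same sample size. The sketch below takes the ratio 2.75 from the text as given; the function name is hypothetical.

```python
def var_b_times_N(beta, ratio):
    """N times sigma_b^2 from equation (47); ratio stands for Sy^2 / Sx^2."""
    return ratio - beta ** 2


beta = 0.3
selected = var_b_times_N(beta, 2.75)    # second records on the top half only
unselected = var_b_times_N(beta, 1.0)   # no selection, Sy^2 = Sx^2
print(round(selected / unselected, 1))
```

Put the other way round, roughly 2.9 times as many selected individuals would be needed to match the precision obtained from unselected material.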
When more than two records are available on each individual the simplest procedure for utilization of all the information is to estimate $(\sigma_g^2 + \sigma_c^2)$ and $\sigma_v^2$ from an analysis of variance and combine the values in accordance with equation (45) to yield an estimate of R. The appropriate form of the analysis of variance will vary with the selection practice for which the "repeatability" estimate is to be applied but will usually be in one of the two forms indicated below.

Form 1

Source of variance          d.f.           Mean Square    Expectation of Mean Square
Groups of individuals       G - 1          M1
Individuals in groups       N/n - G        M2             $\sigma_v^2 + n(\sigma_g^2 + \sigma_c^2)$
Within individuals          N(n - 1)/n     M3             $\sigma_v^2$
Total                       N - 1

G = number of groups
N = total number of records
n = number of records on each individual (assumed constant for all individuals)

Form 2

Source of variance               d.f.                Mean Square    Expectation of Mean Square
Groups of individuals            G - 1               M0
Order of record in groups        G(n - 1)            M1
Individuals in groups            N/n - G             M2             $\sigma_v^2 + n(\sigma_g^2 + \sigma_c^2)$
Individual x Order in groups     (n - 1)(N/n - G)    M3             $\sigma_v^2$
Total                            N - 1
Very frequently a mass of data available for estimating R involves sources of variation such as strain effects, location effects, year effects, etc. which do not contribute to the variance of individuals among which selections are made in practice. In that case we are interested in only the variance within groups that are homogeneous with respect to those factors. It is such groups of individuals that are provided for in the analysis forms above. Form 1 assumes that individuals may in practice be compared for selection purposes on the basis of records not made concurrently. Form 2 on the other hand is appropriate if selection is always based on records made more or less at the same time. Assuming that within a group all individuals have made their records in the same seasons the variance due to time (year or season) will be removed in the portion of the variance formally ascribed to "order of record".
In the case of either analysis form $\sigma_v^2$ is estimated by M3 and $(\sigma_g^2 + \sigma_c^2)$ by (M2 - M3)/n. The difference will be that in the case of Form 1, $\sigma_v^2$ will include variance due to year or season effects which are removed in the case of Form 2.
In many instances there will be variation in n, the number of records on each individual. The only important effect of this on the Form 1 analysis is a change in the expectation of mean square M2 from $\sigma_v^2 + n(\sigma_g^2 + \sigma_c^2)$ to $\sigma_v^2 + (\bar n - s_n^2/Sn)(\sigma_g^2 + \sigma_c^2)$. However, Form 2 can then be computed only by the Method of Least Squares if the entire data is to be used. What is usually done in practice as an alternative is to sacrifice some data, analyzing only such portions as are amenable to the usual analysis of variance computational procedure.
A numerical example will be taken from Lush and Molln (4). It involves data (from the Mich. Agric. Expt. Station) on 461 litters from 230 different sows of 6 breeds. The analysis of variance is as follows:

Source of variance     d.f.     M.S.      Expectation of Mean Squares
Groups of sows          57     26.64
Sows in groups         172     11.49      $\sigma_v^2 + k(\sigma_g^2 + \sigma_c^2)$
Within sows            231      8.12      $\sigma_v^2$

The groups were made up of animals of the same breed that were born in the same year. Since Lush and Molln were interested in selection among individuals of the same breed, this grouping, which yields an estimate of $\sigma_g^2 + \sigma_c^2$ on an intra-breed basis, is appropriate. Restricting the composition of groups to animals born in the same year removes most of the effect of time changes in the average genotype of the herds and of time changes in the average management of the herds, neither of which would affect selection in a given year. In their treatment of the data Lush and Molln set k equal to $\bar n$, the average number of litters per sow, rather than to $(\bar n - s_n^2/Sn)$ as would have been strictly appropriate. They point out that this results in a slight bias. However, the two could in this case have differed very little. (Note that k is used above to symbolize something entirely different from its meaning in connection with the selection differential.) The average number of litters per sow was 2.0, hence $\sigma_g^2 + \sigma_c^2$ is estimated as (11.49 - 8.12)/2 = 1.68. The within sow mean square estimates $\sigma_v^2$ as 8.12. The estimate of R is 1.68/(8.12 + 1.68) = .17.
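The arithmetic of this example can be written as a short function that works for either analysis form. This is a sketch only: the function and argument names are illustrative, and k is set to the average number of litters per sow, as Lush and Molln did.

```python
def repeatability(ms_individuals, ms_within, k):
    """Estimate R from the 'individuals in groups' and 'within individuals' mean squares."""
    constant = (ms_individuals - ms_within) / k   # estimates sigma_g^2 + sigma_c^2
    temporary = ms_within                         # estimates sigma_v^2
    return constant / (constant + temporary)


R = repeatability(ms_individuals=11.49, ms_within=8.12, k=2.0)
print(round(R, 2))
```

The same call applies to a Form 2 analysis by passing the "individual x order in groups" mean square as `ms_within`.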
For other and more detailed examples of the estimation of repeatability see Lush and Molln (4), Lush et al (5), Dickerson (6) and Stewart (7).

Problems:

9. R for annual butterfat production of cows is about .30 on the intra-herd basis. Assume the average for a herd is 320 lbs. Compute the expected future performance of
   (a) a cow with a single lactation record of 380 lbs.
   (b) another with a two lactation average of 360 lbs.
   (c) another with a three lactation average record of 360 lbs.

10. Given the following analysis of variance for annual wool clip of Hampshire ewes. (n is the same for all ewes.)

Source of variance                              d.f.      M.S.
Group of same age raised at same location          9     10.00
Order of record in groups                         20     12.00
Ewes in groups                                   500      2.20
Order x ewes in groups                          1000      1.00
Total                                           1529

Estimate (a) repeatability
         (b) increase in future performance expected from culling the low half of 40 ewes (1) on the basis of 1 year's record, and (2) on the basis of the mean of two years' records.
References:

3. Lush, J. L. Animal Breeding Plans. The Iowa State College Press, Ames.
4. Lush, Jay L. and A. E. Molln. (1942) Litter Size and Weight as Permanent Characteristics of Sows. U.S.D.A. Tech. Bul. No. 836.
5. Lush, Jay L., H. W. Norton, III, and Floyd Arnold. (1941) Effects which Selection of Dams may have on Sire Indexes. Jour. Dairy Sci. 24:695-721.
6. Dickerson, G. E. (1941) Estimates of Producing Ability in Dairy Cattle. Jour. Agric. Res. 61:561-585.
7. Stewart, H. A. (1945) The Inheritance of Prolificacy in Swine. Jour. Anim. Sci. 4:359-366.
STATISTICAL CONCEPTS IN GENETICS
Mimeograph No. 5
R. E. Comstock:4:49
IV. (con'd)
C. Cases where each genotype may be represented by more than one individual but where the individual expresses the character only once.

Individuals of the same genotype are possible with pure lines or with material that can be asexually propagated. Thus these sorts of material belong in this category unless the characters are expressed more than once as in perennials. Specific examples of traits in this group are the characters of inbred lines of corn or single cross corn hybrids, and of varieties of such normally self-fertilized crops as oats and tobacco. Sweet and Irish potatoes are examples from among the horticultural crops.
Precision in the comparison of genotypes can be much greater with material of this type. Because large numbers of individuals of each genotype can be produced, the error of comparison can be reduced by replication. However, important questions arise concerning the optimum amount of material to use in genotype comparisons and the optimum distribution of the material relative to the population of environments for which we wish to evaluate the genotypes.
As a concrete example of the practical problem consider the evaluation of strains of oats for use in the Piedmont area of North Carolina. The population of environments involved are those which occur in that area. Specific sources of variation in environment that are recognizable are soil variability within the area and variation in the climate complex from year to year or from one part of the area to another in the same year. It is quite obvious that selection among varieties based on data collected in one year and at one location could be quite ineffective with respect to genotypic value for the population of Piedmont environments since only one of the soils and one of the "climates" would be involved. We recognize that the evaluation of the genotypes should be made on more soils and in more "climates". How can we decide such practical questions as how many locations and years should be involved in a variety trial and how many replications should be used at each location in each year?
The mathematical model is more involved in this case since we wish to subdivide the total effect of environment into portions arising from several specific sources. Let

$$p_{ijklm} = g_i + a_j + b_k + c_{ij} + d_{ik} + f_{jk} + h_{ijk} + u_{jkl} + v_{ijkl} + z_{ijklm} \tag{48}$$

where
$p_{ijklm}$ is the deviation from mean phenotype for the mth individual of a plot of the ith genotype in the lth replication at the jth location in the kth year,
$g_i$ = the effect of the ith genotype
$a_j$ = the effect of the jth location
$b_k$ = the effect of the kth year
$c_{ij}$ = the interaction of the ith genotype and the jth location
$d_{ik}$ = the interaction of the ith genotype and the kth year
$f_{jk}$ = the interaction of the jth location and the kth year
$h_{ijk}$ = the second order interaction between the ith genotype, the jth location and the kth year
$u_{jkl}$ = the effect of the lth replication in the jth location and the kth year
$v_{ijkl}$ = the effect of the plot on which the ith genotype is raised in the jth location, the kth year, and the lth replication
$z_{ijklm}$ = measurement error plus the effect of intra-plot environmental variation for the individual in question.
Of course, some of these effects cancel out in the comparison of genotypes if each genotype is represented by the same number of individuals in each replication, location, and year.
Suppose each genotype is represented by a plot of n individuals in each of r replications at each of s locations in each of t years. Then

$$
\begin{aligned}
Sp_1 ={}& nrst\,g_1 + nrt\sum_{j=1}^{s} a_j + nrs\sum_{k=1}^{t} b_k + nrt\sum_{j=1}^{s} c_{1j} + nrs\sum_{k=1}^{t} d_{1k} \\
&+ nr\sum_{j=1}^{s}\sum_{k=1}^{t} f_{jk} + nr\sum_{j=1}^{s}\sum_{k=1}^{t} h_{1jk} + n\sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r} u_{jkl} \\
&+ n\sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r} v_{1jkl} + \sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r}\sum_{m=1}^{n} z_{1jklm}
\end{aligned}
\tag{49}
$$
The sum for any other genotype will be the same except for substitution of the subscript for that genotype wherever the numeral one appears as a subscript in the above expression. This leaves unchanged the terms involving the year effects, the location effects, the interaction of year and location, and the replication effects; that is, those terms are constants in the sums for the various genotypes. Being constants they will not contribute to the variance of the phenotypic means and will be omitted from the equation for the mean,

$$\bar p_1 = \frac{Sp_1}{nrst} = g_1 + \sum_{j=1}^{s}\frac{c_{1j}}{s} + \sum_{k=1}^{t}\frac{d_{1k}}{t} + \sum_{j=1}^{s}\sum_{k=1}^{t}\frac{h_{1jk}}{st} + \sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r}\frac{v_{1jkl}}{rst} + \sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r}\sum_{m=1}^{n}\frac{z_{1jklm}}{nrst} \tag{49a}$$

The variance of the mean phenotype for a genotype will be (see equations 6 and 15, section II)
$$\sigma_{\bar p}^2 = \sigma_g^2 + \frac{\sigma_c^2}{s} + \frac{\sigma_d^2}{t} + \frac{\sigma_h^2}{st} + \frac{\sigma_v^2}{str} + \frac{\sigma_z^2}{strn} \tag{50}$$

where
$\sigma_g^2$ = genotypic variance
$\sigma_c^2$ = variance due to interaction of genotype and location
$\sigma_d^2$ = variance due to interaction of genotype and year
$\sigma_h^2$ = variance due to interaction of genotype, location, and year
$\sigma_v^2$ = variance due to environmental variation between plots in the same replication, and
$\sigma_z^2$ = intra-plot variance between individuals of the same genotype.
If means are computed on the basis of plots as is common where the number of plants per plot is large, we have $n\bar p$, the variance of which is

$$\sigma_{n\bar p}^2 = n^2\sigma_g^2 + \frac{n^2\sigma_c^2}{s} + \frac{n^2\sigma_d^2}{t} + \frac{n^2\sigma_h^2}{st} + \frac{n^2\sigma_v^2}{str} + \frac{n^2\sigma_z^2}{strn} \tag{50a}$$

What is ordinarily called the experimental error or strictly speaking the within replication plot variance is in our notation $n^2\sigma_v^2 + n\sigma_z^2$.
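If as before we write

$$p = g + e$$

then

$$n\bar p = ng + n\bar e$$

and

$$\sigma_{n\bar p}^2 = n^2\sigma_g^2 + n^2\sigma_e^2$$

where

$$n^2\sigma_e^2 = \frac{n^2\sigma_c^2}{s} + \frac{n^2\sigma_d^2}{t} + \frac{n^2\sigma_h^2}{st} + \frac{n^2\sigma_v^2}{str} + \frac{n^2\sigma_z^2}{strn} \tag{50b}$$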
is the non-genotypic, and $n^2\sigma_g^2$ the genotypic portion of $\sigma_{n\bar p}^2$. Clearly increased replication will not by itself be a very good method for decreasing $n^2\sigma_e^2$ if interactions of genotypes with locations and years are very great. What is needed in that case is to increase the numbers of locations and years at which the genotypes are compared.
As a basis for deciding the optimum ratios between s, t, r, and n, estimates of the several variance components are required. They can be obtained from the data of variety comparisons conducted in several locations in more than one year. Assume as above that each genotype (variety) has been replicated r times at each of s locations in each of t years and that there were n individuals of the variety per plot. The analysis of variance would be as follows.
Source of Variation                   d.f.                     M.S.    Expectation of M.S.
Locations                             s - 1
Years                                 t - 1
Locations x Years                     (s - 1)(t - 1)
Reps in locations and years           st(r - 1)
Varieties                             G - 1                    M1      $n^2(\sigma^2 + r\sigma_h^2 + rt\sigma_c^2 + rs\sigma_d^2 + rts\sigma_g^2)$
Var. x locations                      (s - 1)(G - 1)           M2      $n^2(\sigma^2 + r\sigma_h^2 + rt\sigma_c^2)$
Var. x years                          (t - 1)(G - 1)           M3      $n^2(\sigma^2 + r\sigma_h^2 + rs\sigma_d^2)$
Var. x loc. x years                   (s - 1)(t - 1)(G - 1)    M4      $n^2(\sigma^2 + r\sigma_h^2)$
Var. x reps in locations and years    st(r - 1)(G - 1)         M5      $n^2\sigma^2$
Total                                 strG - 1

As used in the expectations of mean squares $n^2\sigma^2$ is equal to $n^2\sigma_v^2 + n\sigma_z^2$ of (50a).
The mean square expectations are appropriate functions of the population variances derived on the basis of the arithmetic operations involved in computation of the mean squares. Let us first consider the variety mean square. The population variance of variety means is given in (50a). Now remember that the mean square is computed as the sample variance of the variety sums divided by str, the number of plot totals that are summed for each variety. The variance of variety sums is $r^2s^2t^2$ times the variance of variety means and dividing it by rst we have from (50a)

$$n^2\left(rst\,\sigma_g^2 + rt\,\sigma_c^2 + rs\,\sigma_d^2 + r\,\sigma_h^2 + \sigma_v^2 + \frac{\sigma_z^2}{n}\right)$$

Substituting $\sigma^2$ for $\sigma_v^2 + \sigma_z^2/n$ we have the expectation of the variety mean square as given in the analysis of variance.
The variety x location mean square is computed from the variety sums for single locations. As a working procedure for obtaining its expectation, we will (1) write the equation for such sums, (2) subtract all terms involving effects not specific for a given genotype and location, (3) write the expression for the population variance of the remainder, and (4) divide that variance by the number of plots totaled in these sums as is done in computation of the mean squares.
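Step (1)

$$
\begin{aligned}
Sp_{ij} ={}& rtn\,g_i + rtn\,a_j + rtn\,c_{ij} + rn\sum_{k=1}^{t} b_k + rn\sum_{k=1}^{t} d_{ik} + rn\sum_{k=1}^{t} f_{jk} + rn\sum_{k=1}^{t} h_{ijk} \\
&+ n\sum_{k=1}^{t}\sum_{l=1}^{r} u_{jkl} + n\sum_{k=1}^{t}\sum_{l=1}^{r} v_{ijkl} + \sum_{k=1}^{t}\sum_{l=1}^{r}\sum_{m=1}^{n} z_{ijklm}
\end{aligned}
$$

Step (2)

$$S'p_{ij} = rtn\,c_{ij} + rn\sum_{k=1}^{t} h_{ijk} + n\sum_{k=1}^{t}\sum_{l=1}^{r} v_{ijkl} + \sum_{k=1}^{t}\sum_{l=1}^{r}\sum_{m=1}^{n} z_{ijklm}$$

Step (3)

$$\text{Variance} = r^2t^2n^2\sigma_c^2 + r^2tn^2\sigma_h^2 + rtn^2\sigma_v^2 + rtn\,\sigma_z^2$$

Step (4)

$$\frac{\text{Variance}}{rt} = n^2\left(rt\,\sigma_c^2 + r\,\sigma_h^2 + \sigma_v^2 + \frac{\sigma_z^2}{n}\right)$$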
The final expression is the mean square expectation as given in the analysis of variance.
The steps outlined above may now be put in a general form that will be applicable for determining the expectation of the mean square for any source of variance in the analysis of variance table. The first two steps listed above can be combined into one. The steps involved are then as follows: (1) Write the equation for the sums used in computing the mean square, omitting all effects not specific for a single one of these sums. Note that effects specific for single sums will include among their subscripts those necessary to specify the sum. (2) Write the variance of the expression [remembering (6) and (15) of section II]. (3) Divide the variance by the number of plots totaled in each of those sums. For a slightly different rule for writing mean square expectations see Crump (8).
Given the mean square expectations the procedure for estimating the variance components from the mean squares is easily seen.

$$M_5 \rightarrow n^2\sigma^2$$
$$(M_4 - M_5)/r \rightarrow n^2\sigma_h^2$$
$$(M_3 - M_4)/rs \rightarrow n^2\sigma_d^2$$
$$(M_2 - M_4)/rt \rightarrow n^2\sigma_c^2$$
$$(M_1 - M_2 - M_3 + M_4)/rst \rightarrow n^2\sigma_g^2$$

where $\rightarrow$ is read "is an estimate of".
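The estimation rules can be checked by a round trip: build the five mean square expectations from known components, then recover the components from them. The sketch below does this; the component values are made up for illustration, all quantities are on the $n^2$ scale, and the dictionary keys ("e" for $\sigma^2$, "h", "c", "d", "g") are hypothetical names.

```python
def expected_mean_squares(comp, r, s, t):
    """Expectations of M1..M5 built from the variance components."""
    base = comp["e"] + r * comp["h"]
    return {
        "M1": base + r * t * comp["c"] + r * s * comp["d"] + r * s * t * comp["g"],
        "M2": base + r * t * comp["c"],
        "M3": base + r * s * comp["d"],
        "M4": base,
        "M5": comp["e"],
    }


def estimate_components(ms, r, s, t):
    """Apply the estimation rules to a set of mean squares."""
    return {
        "e": ms["M5"],
        "h": (ms["M4"] - ms["M5"]) / r,
        "d": (ms["M3"] - ms["M4"]) / (r * s),
        "c": (ms["M2"] - ms["M4"]) / (r * t),
        "g": (ms["M1"] - ms["M2"] - ms["M3"] + ms["M4"]) / (r * s * t),
    }


true = {"e": 12.0, "h": 3.0, "c": 2.0, "d": 1.0, "g": 5.0}  # made-up values
ms = expected_mean_squares(true, r=4, s=5, t=2)
print(estimate_components(ms, r=4, s=5, t=2))
```

With observed rather than expected mean squares the same differences can come out negative, since each mean square carries its own sampling error.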
As a numerical example an analysis of variance is presented below of data on yield of 10 varieties of soybeans at each of 5 stations in each of two years. The comparison was replicated 4 times in each location and year. We have then G = 10, s = 5, t = 2, r = 4.

Source of Variation               d.f.        M.S.       M.S. Expectation
Years                               1       424,453
Locations                           4     1,044,298
Y x L                               4     1,632,833
Reps. in Y and L                   30
Varieties                           9       422,236      $n^2(\sigma^2 + 4\sigma_h^2 + 8\sigma_c^2 + 20\sigma_d^2 + 40\sigma_g^2)$
V x Y                               9        42,950      $n^2(\sigma^2 + 4\sigma_h^2 + 20\sigma_d^2)$
V x L                              36        46,359      $n^2(\sigma^2 + 4\sigma_h^2 + 8\sigma_c^2)$
V x Y x L                          36        60,744      $n^2(\sigma^2 + 4\sigma_h^2)$
V x Reps in year and location     270        12,716      $n^2\sigma^2$
A somewhat unexpected result is noted. The V x Y and V x L mean squares are smaller than the V x Y x L mean square instead of larger as was to be expected assuming $\sigma_c^2$ and $\sigma_d^2$ greater than zero. Since variances cannot be negative quantities the most logical procedure probably is to conclude that $\sigma_c^2$ and $\sigma_d^2$ are zero (or at least of insignificant magnitude) and to use all three of these interaction mean squares as estimates of $n^2(\sigma^2 + 4\sigma_h^2)$. On this premise our estimates of the variance components are as follows:

Component            Estimate
$n^2\sigma^2$        12,716
$n^2\sigma_h^2$      [(9(42,950) + 36(46,359) + 36(60,744))/81 - 12,716]/4 = 9,914
$n^2\sigma_c^2$      zero
$n^2\sigma_d^2$      zero
$n^2\sigma_g^2$      [422,236 - (9(42,950) + 36(46,359) + 36(60,744))/81]/40 = 9,247

$n^2\sigma_e^2$ for the experiment as conducted is estimated as

9,914/10 + 12,716/40 = 1309

and as an estimate of $\sigma_g^2/\sigma_{\bar p}^2$ we have

9247/(9247 + 1309) = .88
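The pooled estimates for the soybean data can be reproduced in a few lines. This is a sketch under the premise stated in the text, namely that the three interaction mean squares all estimate the same quantity and are pooled with their degrees of freedom as weights; the variable names are illustrative.

```python
# (d.f., mean square) for V x Y, V x L and V x Y x L.
interactions = [(9, 42950), (36, 46359), (36, 60744)]
error_ms = 12716       # V x reps in year and location
variety_ms = 422236
r, s, t = 4, 5, 2

pooled_df = sum(df for df, _ in interactions)
pooled_ms = sum(df * ms for df, ms in interactions) / pooled_df

h = (pooled_ms - error_ms) / r               # n^2 sigma_h^2
g = (variety_ms - pooled_ms) / (r * s * t)   # n^2 sigma_g^2
e = h / (s * t) + error_ms / (r * s * t)     # n^2 sigma_e^2 for the trial as run

print(round(h), round(g), round(e), round(g / (g + e), 2))
```

The divisor 81 in the hand computation is the pooled number of degrees of freedom, 9 + 36 + 36.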
The above estimates are, of course, subject to sampling variance. However, if the data were sufficient to assure that the estimates were substantially correct, they would furnish a basis for deciding the manner in which varieties should be tested in the future. Referring to Table 1, Section III we see that when $\sigma_g^2/\sigma_p^2$ is as high as .7 to .9 there is little to gain from attempting to decrease $\sigma_e^2$, i.e. in the case under discussion there would have been no point in increasing either replication or the number of years and locations at which the comparison was made. If the estimates of the variances were substantially correct, reduction of replication to two in future comparisons would result in reduction of the expected value of $\sigma_g^2/\sigma_{\bar p}^2$ only to

9247/(9247 + 9914/10 + 12716/20) = .85

The data indicate that very little was gained from using 4 replications instead of two.
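The same comparison can be extended to other numbers of replications. The sketch below uses the rounded component estimates from the text (9247, 9914 and 12716), keeps s = 5 and t = 2, and takes the two first order interaction components as zero, as assumed above; the function name is hypothetical.

```python
def genotypic_fraction(g, h, error, r, s, t):
    """Genotypic share of the variance of a variety mean, with sigma_c^2 = sigma_d^2 = 0."""
    e = h / (s * t) + error / (r * s * t)
    return g / (g + e)


for r in (1, 2, 4, 8):
    print(r, round(genotypic_fraction(9247, 9914, 12716, r, s=5, t=2), 2))
```

Once r passes two the error term is already small beside the second order interaction term, which replication does not reduce, so further replication changes the ratio very little.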
A brief summary of implications of the relative magnitudes of the variance components with respect to the design of selection programs is pertinent at this point.

1. Unless $\sigma^2$ is very large relative to the other components of non-genetic variance, more than two replications at a single location in a single year will rarely be worthwhile in terms of $\bar g_s$, the increase in progress expected from selection.
2. When $\sigma_c^2$, variance due to interaction of location and genotype, is very large, selection for average performance over the entire area from which the locations were drawn should be abandoned if a criterion can be found for establishing sub-areas within which $\sigma_c^2$ will be much reduced. The reason is that when this interaction is large there is probably no single genotype that will be superior under the conditions of all locations in the area. By subdivision on some meaningful basis such as soil-type, drainage, etc., it may be possible to find genotypes with superior adaptation for all locations of a given sub-area. (Note that the criterion may not divide areas along geographical lines.) The location-genotype interaction that remains after further area subdivision becomes impractical must be dealt with as error variance in the genotype comparisons and controlled by making variety comparisons over a sufficient number of locations.
3. When $\sigma_d^2$ is very large the only useful action is to increase t, the number of years over which tests are run, unless measures of climate or effect of climate are available before planting time which will serve to divide the year population into sub-populations within which year-variety interaction would be greatly reduced. For example, it is possible that genotypes respond differentially to soil moisture at planting time and that certain varieties might be recommended for use when soil moisture was high and others when it was low. (The intent is not to say that this is or is not a reasonable possibility. The example is advanced only to indicate the sort of thing that would be required to eliminate year-genotype interaction as a source of error in recommendations of superior genotypes.)
4. Variance due to second order interaction of genotype, year, and location. In general, this variance must be looked on as error variance to be controlled by increasing locations, years, or both. However, to the extent that it is a consequence of variation in weather between locations in the same year there is a possibility of dealing with it as suggested in the case of $\sigma_d^2$.
D. Cases where each genotype may be represented by more than one individual and phenotype is expressed more than once by each individual.

The most common examples are the perennial crops capable of asexual propagation. The annually expressed characters of fruit trees, strawberries, blueberries, sugar cane, etc. belong in this group. Traits of perennial forage crops in which seed is produced entirely by apomixis also belong here.
The situation for these traits can be resolved into one very similar to that for category C if we take the point of view that the trait should be measured in terms of its average expression during the lifetime (defined in terms of commercial practice) of an individual. Unless we assume that the expression of the trait is uncorrelated with age (usually an untenable assumption) lifetime performance is expressed but once though, as will be discussed below, sample portions of lifetime performance may be considered as separate expressions of it.
Our model will be as follows:

$$p_{ijkl} = g_i + a_j + b_k + c_{ij} + d_{ik} + f_{jk} + h_{ijk} + u_{jkl} + v_{ijkl} + \sum_{o=1}^{y} v'_{ijklo} + \sum_{m=1}^{n} z_{ijklm} + \sum_{m=1}^{n}\sum_{o=1}^{y} z'_{ijklmo} \tag{51}$$

where $p_{ijkl}$ is the lifetime sum (of annual measures of the trait) for a plot of the ith genotype in the lth replication at the jth location in the kth testing period. (A testing period is taken to mean a set of years over which a single planting of a genotype comparison is observed. o = 1, 2, . . . y is used to identify single years of the testing period.)
$g_i$, $a_j$, and $c_{ij}$ have the same meaning as in equation (48)
$b_k$ is the effect of the kth testing period
$d_{ik}$ is the interaction of the ith genotype and the kth testing period.
$f_{jk}$ is the interaction of the jth location and the kth testing period.
$h_{ijk}$ is the second order interaction of the ith genotype, the jth location and the kth testing period.
$u_{jkl}$ is the effect of the lth replication in the jth location and kth testing period.
$v_{ijkl}$ is the effect of the plot in which the ith genotype is raised in the jth location and kth testing period that is constant for each year of the kth testing period.
$v'_{ijklo}$ is the portion of the plot effect in the oth year that is temporary. (These effects are assumed to be randomly distributed with respect to plots and years.)
$z_{ijklm}$ is the effect of intra-plot environmental variation for the mth individual of the plot that is constant for all years of the kth testing period.
$z'_{ijklmo}$ is the portion of the effect of intra-plot environmental variation that is temporary. (These are assumed to be randomly distributed with respect to plants and years.)
This model differs from that of the preceding section (equation 48) in three ways. (1) The unit of time involved in expression of the phenotype is recognized as a set or series of years instead of only one. (2) Random plot and intra-plot environmental effects are subdivided into a portion that is constant through the testing period and another that varies from year to year. (3) The phenotype is measured on the plot basis instead of the plant basis.
The mean for a genotype (omitting terms that are constant for all genotypes, as in equation 49a) is

$$P_i = g_i + \sum_{j=1}^{s} \frac{c_{ij}}{s} + \sum_{k=1}^{t} \frac{d_{ik}}{t} + \sum_{j=1}^{s}\sum_{k=1}^{t} \frac{h_{ijk}}{st} + \sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r} \frac{v_{ijkl}}{rst} + \sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r}\sum_{o=1}^{y} \frac{v'_{ijklo}}{rst}$$
$$\qquad + \sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r}\sum_{m=1}^{n} \frac{z_{ijklm}}{rst} + \sum_{j=1}^{s}\sum_{k=1}^{t}\sum_{l=1}^{r}\sum_{m=1}^{n}\sum_{o=1}^{y} \frac{z'_{ijklmo}}{rst}$$    (52)

where
s = number of locations,
t = number of testing periods,
r = number of replications at each location in each testing period,
n = number of individuals per plot, and
y = number of years per testing period.
From (52) the variance of a genotype mean is seen to be

$$\sigma_p^2 = \sigma_g^2 + \frac{\sigma_c^2}{s} + \frac{\sigma_d^2}{t} + \frac{\sigma_h^2}{st} + \frac{\sigma^2}{rst}$$    (53)

where

$$\sigma^2 = \sigma_v^2 + y\sigma_{v'}^2 + n\sigma_z^2 + ny\sigma_{z'}^2$$

An analysis of variance using plot sums (for the entire testing period) as the unit variable and data collected in r replications at each of s locations in each of t testing periods would take the following form.

Source of Variation        d.f.                    Expectation of M.S.
Location                   s - 1
Testing period             t - 1
L x Tp                     (s - 1)(t - 1)
Reps in L and Tp           st(r - 1)
Varieties                  G - 1                   σ² + rσ²_h + rsσ²_d + rtσ²_c + rstσ²_g
V x L                      (s - 1)(G - 1)          σ² + rσ²_h + rtσ²_c
V x Tp                     (t - 1)(G - 1)          σ² + rσ²_h + rsσ²_d
V x L x Tp                 (s - 1)(t - 1)(G - 1)   σ² + rσ²_h
V x reps in L and Tp       st(r - 1)(G - 1)        σ²
Total                      strG - 1
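The mean square expectations for the variety lines can be turned into component estimates by successive subtraction. The Python sketch below is illustrative only: the function names, the dictionary keys, and the numerical mean squares are hypothetical, and it assumes a balanced trial with the expectations written in the comments.

```python
def variance_components(ms, r, s, t):
    """Estimate variance components from the variety-line mean squares.

    Assumed expectations (balanced data):
      V          : s2 + r*h + r*s*d + r*t*c + r*s*t*g
      V x L      : s2 + r*h + r*t*c
      V x Tp     : s2 + r*h + r*s*d
      V x L x Tp : s2 + r*h
      error      : s2
    """
    s2 = ms["error"]
    h = (ms["VxLxTp"] - ms["error"]) / r
    c = (ms["VxL"] - ms["VxLxTp"]) / (r * t)
    d = (ms["VxTp"] - ms["VxLxTp"]) / (r * s)
    g = (ms["V"] - ms["VxL"] - ms["VxTp"] + ms["VxLxTp"]) / (r * s * t)
    return {"g": g, "c": c, "d": d, "h": h, "error": s2}


def genotype_mean_variance(comp, r, s, t):
    """Variance of a genotype mean built up from the components."""
    return (comp["g"] + comp["c"] / s + comp["d"] / t
            + comp["h"] / (s * t) + comp["error"] / (r * s * t))


# Hypothetical mean squares for r = 2 replications, s = 3 locations,
# t = 2 testing periods.
mean_squares = {"V": 156.0, "VxL": 48.0, "VxTp": 36.0,
                "VxLxTp": 24.0, "error": 16.0}
components = variance_components(mean_squares, r=2, s=3, t=2)
variance_of_mean = genotype_mean_variance(components, r=2, s=3, t=2)
```

With real data an estimate can come out negative through sampling error; how to treat that case is left open here.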
In addition plot values by years may be used to compute a mean square for "varieties x years in replications in locations and testing periods" which will have as its expected value $\sigma_{v'}^2 + n\sigma_{z'}^2$. If large plants such as fruit trees are involved and therefore a minimum number of plants per plot would be desirable, estimates of $\sigma_z^2$ and $\sigma_{z'}^2$ will be useful. These can be obtained using data for individual plants by years (if available) to compute mean squares for "plants in plots" and "years x plants in plots". The expected value of the former will be $\sigma_{z'}^2 + y\sigma_z^2$ and of the latter will be $\sigma_{z'}^2$.
From the foregoing it will be noted that all variance components on which information is needed as a basis for deciding how selection experiments may best be conducted can be estimated if appropriate data are available. However, data involving more than one testing period are not likely to be available for long-lived plants. Indeed, extension of trials over two non-overlapping testing periods would most likely be impractical for such plants. Alternatives that suggest themselves are (1) spacing locations widely enough so that weather correlation among them is not likely to be high, and (2) overlapping testing periods. The latter would mean that the planting year and perhaps one or two more would be different for each testing period and the initial years may have the most profound effect on lifetime performance.
If the cost of taking data from producing plants is very great the experimenter should consider collecting data in only a portion of years of the testing period in the case of long-lived plants unless $\sigma_{v'}^2$ or $\sigma_{z'}^2$ are so large that so doing would appreciably increase $\sigma_p^2$. The effect can be seen if (53) is rewritten in terms of $\sigma_{p/y}^2$, the variance of a genotype mean on a plot per year basis.

$$\sigma_{p/y}^2 = \frac{\sigma_p^2}{y^2} = \frac{\sigma_g^2}{y^2} + \frac{\sigma_c^2}{sy^2} + \frac{\sigma_d^2}{ty^2} + \frac{\sigma_h^2}{sty^2} + \frac{\sigma_v^2}{rsty^2} + \frac{\sigma_{v'}^2}{rsty} + \frac{n\sigma_z^2}{rsty^2} + \frac{n\sigma_{z'}^2}{rsty}$$    (54)
If the form of the performance curve over the lifetime of the organism has economic importance, it will perhaps be taken into account best if considered a separate attribute to be measured in terms of one or more regression coefficients.
Problems:
11. (a). Assume that a comparison of 12 oats varieties had been conducted at two locations with 6 replications in each of 3 years. Write out the form of the analysis of variance including the mean square expectations required for estimation of $\sigma_g^2$ and the components of $\sigma_p^2$.
(b). Suppose the comparison had been conducted at only one station. Write the analysis with mean square expectations and show that a clean estimate of $\sigma_g^2$ would not be available.
12. Assume that you have made a cross of two varieties of Irish potatoes, that F2's have been tested in a disease nursery, and that adequate vegetative planting material of each of a large number of disease-resistant selections is at hand. Assume further that from previous data you have good variance component estimates which are as follows.

$\sigma_g^2 = 8$,  $\sigma_c^2 = 6$,  $\sigma_d^2 = 2$,  $\sigma_h^2 = 4$,  $\sigma^2 = 16$

Now suppose you can plant a trial involving 400 plots that can all be at one location or can be divided equally among anywhere from 2 to 10 locations. However, if the trial is extended over 2 years only 200 plots can be used per year. How many varieties would you put in the trial at how many locations in how many years and in how many replications per location-year? The object is to make $g_s$ as large as possible.
13. Below is an analysis of variance of data collected by P. H. Harvey in a comparison of 25 corn hybrids.

Source                           d.f.     M.S.
Locations                           4     2772.0
Reps in Loc.                       10       72.3
Genotypes                          24       32.19
G x L                              96        6.78
Reps x genotypes in locations                4.26

Compute estimates of variance components that can be made from these data. Compute $g_s$ for the best hybrid of the 25 on the basis of this trial. Its average yield for the test was 3.8 above the average for the 25.
References:
(8) Crump, S. Lee (1946). The Estimation of Variance Components in Analysis of Variance. Biometrics 2:7-11.
Mimeo. No. 6
R. E. Comstock:5:49
STATISTICAL CONCEPTS IN GENETICS
V. Simultaneous selection for several traits.
When the economic value of an organism is a function of more than one characteristic, it is logical that each trait affecting merit should receive attention in selection directed at genetic improvement. Hazel and Lush (9) showed that it is more efficient to consider each such trait in every generation of selection, provided each trait is given its proper weight relative to the others, than to follow the plan of improving the individual traits one at a time. It should be noted that in certain instances the optimum weight to be given a specific trait will be so high that the optimum basis for selection is essentially selection for that trait alone. In such cases the slight gain from attention to other traits may not compensate for the cost of collecting data on them. This situation might be expected, for example, in the first stages of selection following a cross of two strains one of which was resistant and the other susceptible to an important disease to which resistance was an absolute prerequisite for practical purposes.
Giving a specific weight in selection to each of several traits amounts to basing selection on an index of the form

$$I = b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

where $X_1, X_2, \ldots, X_n$ are the phenotypic values of the traits considered and $b_1, b_2, \ldots, b_n$ are the relative weights accorded the n traits. The term, selection index, will be used specifically in reference to such an index in which the b's are given the best estimates of their optimum values.
If phenotype were unaffected by variation in environment every individual would express its genotypic worth phenotypically and selection would be straightforward and efficient. The weights to be given different traits would then depend only on their contributions to the economic value of the organism. However, not only do environmental variations affect phenotype but their effect on phenotypic expression is greater for some traits than others. It is obvious that proper weighting must take cognizance of these facts. For example, suppose economic worth of an organism were a function of two traits and that phenotypic variation in one of these traits was entirely non-genetic but in the other was largely genetic. Attention to the trait for which all variation was non-genetic would accomplish no genetic improvement. On the other hand, it would mean a lower selection differential and hence less gain from selection in the trait which was varying genetically.
A. Equations for the estimation of optimum weights.
The estimation procedure was first given by Fairfield Smith (10). He demonstrated its application in a consideration of selection among varieties of wheat. Details of the estimation procedure were also given by Hazel (11) who reported on the construction of a selection index for use with swine. Applications to selection in poultry have been considered by Panse (12) and Lerner et al. (13). The following derivations differ in details but not in fundamentals from the presentations by Fairfield Smith and Hazel.
Let W symbolize economic worth and $g_w$ the genotype for economic worth. Genetic improvement implies increase in $g_w$. To be effective in increasing $g_w$ selection must be on a basis which will result in choice of genotypes for which $g_w$ is above its mean for the population. Using an index of the form

$$I = b_1X_1 + b_2X_2 + \cdots + b_nX_n$$    (55)

selection will be most effective when $b_1, b_2, \ldots, b_n$ are given values that make the correlation of I with $g_w$ as large as possible. These are the optimum values of the b's referred to earlier. These optimum values will be symbolized as $\beta_1, \beta_2, \ldots, \beta_n$.
The problem is of the multiple regression type. The X's are the independent variables, and $g_w$ the dependent variable to be predicted or estimated from knowledge of the X's. The regression model for relation of $g_w$ to the X's is

$$\bar{g}_w = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_nX_n$$    (56)

where $\bar{g}_w$ is the average value of $g_w$ associated with a given set of X's. The magnitude of $\beta_0$ does not concern us since it is a constant and will have no effect on the differences between estimates of $g_w$. (Only the differences are important in the selection process.) By reference to Fisher (14) or Snedecor (15) we find that appropriate b's (estimates of the $\beta$'s in multiple regression) are obtained from simultaneous solution of the following set of equations.
$$b_1Sx_1^2 + b_2Sx_1x_2 + \cdots + b_nSx_1x_n = Sx_1g_w$$
$$b_1Sx_1x_2 + b_2Sx_2^2 + \cdots + b_nSx_2x_n = Sx_2g_w$$
$$\vdots$$
$$b_1Sx_1x_n + b_2Sx_2x_n + \cdots + b_nSx_n^2 = Sx_ng_w$$    (56)

Remember that $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$, etc. Now consider the quantity, $Sx_1g_w$. Since $x_1 = g_1 + e_1$,

$$Sx_1g_w = S(g_1 + e_1)g_w = Sg_1g_w + Se_1g_w$$

However $e_1$ is a random environmental effect uncorrelated with genotype, and so the expected value of $Se_1g_w$ is zero. Hence $Sx_1g_w$ is on the average equal to $Sg_1g_w$. In like manner $Sx_2g_w$, $Sx_3g_w$, etc. are on the average equal to $Sg_2g_w$, $Sg_3g_w$, etc. Making the substitutions, $Sg_1g_w$ for $Sx_1g_w$, $Sg_2g_w$ for $Sx_2g_w$, etc. and dividing each of the equations by (N - 1), we have
$$b_1s_{p11} + b_2s_{p12} + \cdots + b_ns_{p1n} = s_{g1w}$$
$$b_1s_{p12} + b_2s_{p22} + \cdots + b_ns_{p2n} = s_{g2w}$$
$$\vdots$$
$$b_1s_{p1n} + b_2s_{p2n} + \cdots + b_ns_{pnn} = s_{gnw}$$    (57)

where $s_{p11}$ = an estimate of the phenotypic variance of $X_1$,
$s_{p12}$ = an estimate of the phenotypic covariance of $X_1$ and $X_2$,
$s_{g1w}$ = an estimate of the genotypic covariance of $X_1$ and W,
etc.
When only two traits are to be considered in selection there will be two equations

$$b_1s_{p11} + b_2s_{p12} = s_{g1w}$$
$$b_1s_{p12} + b_2s_{p22} = s_{g2w}$$

When three traits are to be considered the set will involve three equations

$$b_1s_{p11} + b_2s_{p12} + b_3s_{p13} = s_{g1w}$$
$$b_1s_{p12} + b_2s_{p22} + b_3s_{p23} = s_{g2w}$$
$$b_1s_{p13} + b_2s_{p23} + b_3s_{p33} = s_{g3w}$$

and so forth.
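The index equations are an ordinary set of simultaneous linear equations, so any standard solver will do. The following Python sketch solves the two-trait case by Gaussian elimination; the function name `solve_linear` and all numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def solve_linear(a, b):
    """Solve the system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Hypothetical estimates: phenotypic variances and covariance of X1 and X2,
# and genotypic covariances of X1 and X2 with worth.
phenotypic = [[4.0, 1.0],
              [1.0, 9.0]]
genetic_with_worth = [2.0, 3.0]

weights = solve_linear(phenotypic, genetic_with_worth)  # b1, b2
```

The same function handles three or more traits if the phenotypic matrix and the right-hand side are enlarged accordingly.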
B. The expected effect of selection.
The efficiency of selection can be measured in terms of the expected genotypic superiority of selected individuals, strains, or varieties over the mean of the group from which they were selected. This will be the product of the selection differential (mean difference in selection index between selected group and the entire group) and the regression of $g_w$ on the selection index.
The selection differential in accordance with previous notation is $k\sigma_I$ and the regression of $g_w$ on I is $\sigma_{Ig_w}/\sigma_I^2$. Thus the expected genotypic advance is

$$k\sigma_I \cdot \frac{\sigma_{Ig_w}}{\sigma_I^2} = \frac{k\sigma_{Ig_w}}{\sigma_I}$$    (58)
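Since $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$, etc.,

$$\sigma_{Ig_w} = \frac{\sum (b_1x_1 + b_2x_2 + \cdots + b_nx_n)g_w}{N}$$    (59)

and

$$\sigma_I^2 = \frac{\sum (b_1x_1 + b_2x_2 + \cdots + b_nx_n)^2}{N}$$

Expanding and collecting terms appropriately this becomes

$$\sigma_I^2 = b_1\Big[b_1\sum x_1^2 + b_2\sum x_1x_2 + \cdots + b_n\sum x_1x_n\Big]/N$$
$$\qquad + b_2\Big[b_1\sum x_1x_2 + b_2\sum x_2^2 + \cdots + b_n\sum x_2x_n\Big]/N$$
$$\qquad + \cdots + b_n\Big[b_1\sum x_1x_n + b_2\sum x_2x_n + \cdots + b_n\sum x_n^2\Big]/N$$

Substituting in terms of equations (56) this can be written

$$\sigma_I^2 = \frac{b_1\sum x_1g_w + b_2\sum x_2g_w + \cdots + b_n\sum x_ng_w}{N}$$    (60)

From (59) and (60) we see that $\sigma_{Ig_w} = \sigma_I^2$, or $\sigma_I = \sigma_{Ig_w}/\sigma_I$. Hence (58), the expected advance from selection, can be written

$$k\sigma_I = k\sqrt{\frac{b_1\sum x_1g_w + b_2\sum x_2g_w + \cdots + b_n\sum x_ng_w}{N}}$$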
Finally, remembering that, in arriving at equations (57), it was shown that $\sum x_1g_w$ is on the average equal to $\sum g_1g_w$, $\sum x_2g_w$ to $\sum g_2g_w$, etc., we can write the expected genetic advance from selection based on the index as

$$k\sqrt{b_1\sigma_{g1w} + b_2\sigma_{g2w} + \cdots + b_n\sigma_{gnw}}$$    (61)

where $\sigma_{g1w}$ is the genetic covariance of $X_1$ with $g_w$,
$\sigma_{g2w}$ is the genetic covariance of $X_2$ with $g_w$,
etc.
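A short Python sketch may help in comparing the expected advance under index selection with that under selection on a single trait. It assumes the b's already satisfy the index equations, and it takes the single-trait advance as k times the genotypic covariance of the trait with worth divided by the phenotypic standard deviation of the trait, which is (58) with the trait itself serving as the index. The function names and numbers are hypothetical.

```python
import math


def index_advance(weights, genetic_cov_with_worth, k=1.0):
    """Expected advance k * sqrt(b1*s_g1w + b2*s_g2w + ...).

    Valid only when the weights solve the index equations.
    """
    total = sum(b * g for b, g in zip(weights, genetic_cov_with_worth))
    return k * math.sqrt(total)


def single_trait_advance(phenotypic_variance, genetic_cov_with_worth, k=1.0):
    """Expected advance in worth when selection is on one trait alone."""
    return k * genetic_cov_with_worth / math.sqrt(phenotypic_variance)


# Hypothetical two-trait case: phenotypic variances 4 and 9, phenotypic
# covariance 1, genotypic covariances with worth 2 and 3. The weights
# 3/7 and 2/7 solve the two index equations for these values.
gain_index = index_advance([3.0 / 7.0, 2.0 / 7.0], [2.0, 3.0])
gain_trait_1 = single_trait_advance(4.0, 2.0)
gain_trait_2 = single_trait_advance(9.0, 3.0)
```

Leaving k at 1.0 gives the gains in units of k, which is enough for comparing alternative bases of selection.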
C. Estimation of variances and covariances required for solution of equations (57).
The estimates of phenotypic variances and covariances of the X's ($s_{p11}$, $s_{p12}$, etc.) must be appropriate to the sort of values to be used in the index. For example, if selection among strains is to be based on means for r replications at each of s locations in each of t years then the estimates should be for such values. They can be made directly from data involving such means or can be built up from estimates of the variance or covariance components involved in accordance with equations (50) and (53).
The estimation of covariance components from the mean products of an analysis of covariance is exactly analogous to the estimation of variance components from the mean squares of an analysis of variance. This was pointed out by both Fairfield Smith (10) and Hazel (11) though the procedure has not been used extensively.
An analysis of covariance involving data on two traits (say $X_1$ and $X_2$) from a field comparison of genotypes at different locations and in different years would take the following form.
Source of Variance               d.f.                    Expectation of mean product
Locations                        s - 1
Years                            t - 1
L x Y                            (s - 1)(t - 1)
Reps in years and locations      st(r - 1)
Varieties                        G - 1                   σ_12 + rσ_h12 + srσ_d12 + trσ_c12 + rstσ_g12
V x L                            (s - 1)(G - 1)          σ_12 + rσ_h12 + trσ_c12
V x Y                            (t - 1)(G - 1)          σ_12 + rσ_h12 + srσ_d12
V x Y x L                        (s - 1)(t - 1)(G - 1)   σ_12 + rσ_h12
Var x reps in year and location  st(G - 1)(r - 1)        σ_12
Total                            strG - 1
G, s, t, and r have the same significance as in Section IV, C and D. The numeral subscripts of the σ's indicate the traits involved and the letter subscripts have the same significance relative to the source of the covariance components as they had in Section IV relative to sources of variance components. For example $\sigma_{c12}$ is the covariance between the location-genotype interaction effects for $X_1$ and $X_2$.
The covariance of genotype means for two traits can be written in terms of the covariance components in a form analogous to that for variance. For example the covariance between genotype means of $X_1$ and $X_2$ is

$$\sigma_{p12} = \sigma_{g12} + \frac{\sigma_{c12}}{s} + \frac{\sigma_{d12}}{t} + \frac{\sigma_{h12}}{st} + \frac{\sigma_{12}}{rst}$$    (62)
The genetic covariances of the X's with $g_w$ can be estimated from analyses of covariance (like the above) of the X's with economic worth. An obvious prerequisite is definition of economic worth and all data required to compute the values of it to be used in the analyses. In agronomic or horticultural crops, worth may in some instances be defined simply as yield whereas in others a satisfactory definition will need to involve both yield and some measure of quality. Once W has been defined and tabulated, estimation of its genetic covariances with the X's is analogous to estimation of genetic variances except that, as indicated above, it involves analysis of covariance instead of analysis of variance.
In case selection is to be between individuals (instead of between strains, lines, or varieties) as is often the case with animals, the variances and covariances of equations (57) must be for individuals, not for group means. The estimation of the phenotypic variances and covariances of the X's is then completely straightforward. However, the estimation of genetic covariances will need to be made somewhat indirectly from data involving related animals. This will be taken up later since the basis for the possible procedures has not yet been laid.
D. Indirect estimation of the genetic covariances with economic worth.
In some instances the available data will not include all items necessary for computation of W. When this is true it is sometimes possible to estimate the genetic covariances indirectly.
Let $a_1, a_2, \ldots, a_n$ be the increases in economic worth that result from unit phenotypic increases in traits 1, 2, . . ., n when each of the other traits remains unchanged. Then the genotype for economic worth can be written as a function of the genotype for the n traits,

$$g_w = a_1g_1 + a_2g_2 + \cdots + a_ng_n$$

and the genotypic covariance of W with the ith trait will be

$$\sigma_{giw} = a_1\sigma_{g1i} + a_2\sigma_{g2i} + \cdots + a_n\sigma_{gni}$$
If one of the n traits does not affect economic worth directly, i.e. variation in it causes no change in economic worth when all other traits remain constant, the corresponding a-value is zero and the term involving that a-value drops out of the equation.
Existing data on animals are frequently not complete enough so that direct estimates of the genetic covariances between economic worth and individual traits can be made. On the other hand estimates of the genetic covariances among the individual traits may be possible using different bodies of data for the estimation of different covariances. It is in such cases that equation (62) becomes useful. [For example, see Hazel (11).] In the case of plants the collection of data for the specific purpose of constructing a selection index is feasible and can be accomplished in a reasonable period of time. Thus with plants the worker is not so dependent on existing data collected for other purposes as is the case with animals, especially those which reproduce slowly such as sheep and cattle.
The a-values.
The a's stem directly from definition of economic worth in the organism in question. For example, in work with Sea Island cotton, H. L. Manning (unpublished) defined economic worth in terms of yield. It was not necessary to bring in quality since the entire range of quality in the population with which he was working was within the range suitable for the purposes for which the cotton was to be used. The individual traits on which selection was being based were bolls per plant ($X_1$), seeds per boll ($X_2$), and lint per seed ($X_3$). Since the cultural practice involved a constant number of plants per unit area of land, economic worth (yield) was a function of these traits as follows:

$$W = X_1X_2X_3$$

Obviously, if $X_1$ were increased by unity, and there were no change in $X_2$ or $X_3$, the increase in W would be $X_2X_3$. If $X_1$ were increased by unity for the entire population, the increase in mean W would be $S(X_2X_3)/N$. Thus reasonable a-values in this case were

$a_1 = S(X_2X_3)/N$
$a_2 = S(X_1X_3)/N$
$a_3 = S(X_1X_2)/N$
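As a small illustration of these a-values, the Python sketch below averages the cross products over a set of plants when worth is the product of the three traits. The function name and the three records are hypothetical, not data from the cotton work described above.

```python
def a_values(records):
    """Return (a1, a2, a3) for worth W = X1 * X2 * X3.

    Each a-value is the population mean of the product of the other two
    traits, e.g. a1 = S(X2*X3)/N.
    """
    n = len(records)
    a1 = sum(x2 * x3 for _, x2, x3 in records) / n
    a2 = sum(x1 * x3 for x1, _, x3 in records) / n
    a3 = sum(x1 * x2 for x1, x2, _ in records) / n
    return a1, a2, a3


# Hypothetical (bolls per plant, seeds per boll, lint per seed) records.
plants = [(10.0, 6.0, 0.5), (12.0, 5.0, 0.4), (8.0, 7.0, 0.6)]
a1, a2, a3 = a_values(plants)
```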
An alternative procedure (used by Fairfield Smith) would have been to define economic worth as the logarithm of yield. Then the expression for W would have been

W = log yield = log $X_1$ + log $X_2$ + log $X_3$

Then if the index were to be based on logarithms of the X's, the a's would all be 1.0 since change in W in response to change in the log of any of the X's is exactly the amount of the change in the log in question. This approach has the advantage that the a's are independent of the population mean values of the X's. What may be a disadvantage is that the use of logs results in giving more weight to variations among low values of the X's than among the high values.
As an example from livestock the following is presented as a possible definition of economic worth in dairy cattle.

$$W = cX_1 - \frac{X_2}{X_3} - X_4$$

$X_1$ = annual production of 4% Fat Corrected Milk in lbs.
$X_2$ = cost of rearing minus disposal value
$X_3$ = length of productive lifetime in years
$X_4$ = maintenance cost per year
c = value per lb. of Fat Corrected Milk in excess of cost of production exclusive of maintenance cost for the animal.

$X_3$ would not ordinarily be a useful measure for use in selection because so much time is required to obtain it that its use would seriously retard a breeding program. The alternative and probably more practical procedure would be to treat $X_3$ as a constant equal in value to the population mean for productive lifetime. The a-values for $X_1$, $X_2$, and $X_4$ would then be

$a_1 = c$,  $a_2 = -1/\bar{X}_3$,  $a_4 = -1$
The a-values for any traits which do not enter the function for economic worth must be zero since variation in them will not cause variation in W so long as other traits remain constant. Thus a-values for disease resistance or plant habit in plants or for aspects of conformation in dairy cattle would be zero. This does not mean that they should not receive weight in an index. Indeed their genetic correlations with worth or their phenotypic covariances with other traits might be such that they should be weighted heavily.
E. General remarks.
The foregoing serves only to outline what is involved in the problem of efficient use in selection of data on the various important traits of the organism in question. To have given sufficient detail to cover procedure in the variety of situations in which this problem is confronted is far beyond the scope of this course. The essence of the problem is always the same but in details it differs so much from one situation to another that the construction of an index for any single organism constitutes a research project. Such a project will always involve two phases.
1. Definition of economic worth. Sometimes this will call for considerable effort in itself. For example, in the worth function proposed for dairy cattle the appropriate value for c would require careful consideration of the relationships over time between milk prices, feed prices, labor costs, etc.
2. Estimation of variances and covariances required for estimation of the b's.
It should be noted that a great deal of investigation remains to be done relative to statistical aspects of the selection index problem. Among the most pressing questions is that of the amount of data required to furnish satisfactory approximations to the optimum index. The most common procedure in practice probably is to weight traits in proportion to their economic importance. From a practical point of view it is important that any basis for selection adopted as an alternative be at least as efficient. To insure this there is obviously a minimum amount of data on which such an alternative index must be based. Data being accumulated at this station will serve to throw some light on this question within the next few years.
Problems.
14. Following are estimates of phenotypic variances and covariances and genetic covariances in Sea Island cotton (from unpublished data of H. L. Manning).
~pll ~
9.49 ,
Sglw :: 3.46 ,
==
6.85 ,
~p22
=11.94
Sg2i7 ~
7.45 ,
sgmt
=7
sp12
0
20
(a) Estimate the b's to be used in the selection index, $b_1X_1 + b_2X_2 = I$.
(b) Estimate the progress expected from selection based on
1. the selection index
2. $X_1$, bolls per plant
3. $X_2$, lint per seed
4. W, yield
(It will not be necessary to substitute a numerical value of k, since the gain in units of k will suffice for comparison of the four bases for selection.)
"1.3-
References:
(9) Hazel, L. N. and J. L. Lush (1942). The Efficiency of Three Methods of Selection. Jour. Hered., 33:393-399.
(10) Smith, H. Fairfield (1936). A Discriminant Function for Plant Selection. Annals of Eugenics, 7:240-250.
(11) Hazel, L. N. (1943). Genetic Basis for Selection Indices. Genetics, 28:476-490.
(12) Panse, V. G. (1946). An Application of the Discriminant Function for Selection in Poultry. Jour. Genetics, 47:242-248.
(13) Lerner, I. Michael, V. S. Asmundson, and Dorothy H. Cruden (1947). Improvement of New Hampshire Fryers. Poultry Sci., 26:515-524.
(14) Fisher, R. A. Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh and London.
(15) Snedecor, G. W. Statistical Methods. The Iowa State College Press, Ames, Iowa.
Mimeo. No. 7
R. E. Comstock:6:49
STATISTICAL CONCEPTS IN GENETICS
VI. Gene Frequency and the Distribution of Genotypes.
The frequency of a gene is its number in the population expressed as a fraction of the total number of loci at which it or an allele of it is present. For example, suppose that of 100 diploid organisms, 30 are of the genotype BB, 60 are Bb, and 10 are bb. There are 120 B genes and 200 loci occupied either by B or its allele, b. Hence the frequency of B with respect to this population of 100 individuals is 120/200 = .6.
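The gene count can be written out as a short Python sketch. The function name is hypothetical; the counts are those of the example above.

```python
def gene_frequency(n_BB, n_Bb, n_bb):
    """Frequency of gene B among diploid organisms counted by genotype."""
    total_loci = 2 * (n_BB + n_Bb + n_bb)  # two loci per diploid individual
    b_genes = 2 * n_BB + n_Bb              # BB carries two B genes, Bb one
    return b_genes / total_loci


q = gene_frequency(30, 60, 10)  # 120 B genes among 200 loci
```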
When using letters to symbolize genes, the large case will always be used to designate the most favorable of a set of allelic genes. Thus it will always be understood that B is a more favorable gene than b, C a more favorable gene than an allele c, etc. The letter q will be used to designate the frequency of the most favorable of a set of allelic genes. Then if such a favorable gene has only one allele, the frequency of its allele will be (1-q).
The Distribution of Genotypes in a Random Mating Population.
Segregation of a pair of allelic genes (B and b, for example) yields three genotypes: BB, Bb, and bb. Assuming random mating and equal viability of the different gametes and genotypes, the expected ratio of genotypes homozygous for the favorable gene, heterozygous, and homozygous for the less favorable gene is $q^2 : 2q(1-q) : (1-q)^2$. This can be demonstrated as follows:
The probability of a gamete containing B is obviously q and the probability of 2 specific gametes, i.e. two combining to form a zygote, both containing B is $q^2$. Thus the expected proportion of BB zygotes will be $q^2$.
The probability of a gamete containing b is (1-q) and the probability of 2 specific gametes both containing b is $(1-q)^2$. Thus the expected proportion of bb zygotes will be $(1-q)^2$.
The remainder of the zygotes must be of the Bb type. Since the total of the frequencies of the three types must be 1.0, the expected frequency of heterozygotes is

$$1 - q^2 - (1-q)^2 = 1 - q^2 - 1 + 2q - q^2 = 2q - 2q^2 = 2q(1-q).$$
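The three expected frequencies are easily tabulated for any gene frequency. The following Python sketch (function name hypothetical) assumes random mating and equal viability as stated above.

```python
def random_mating_frequencies(q):
    """Expected genotype frequencies when the frequency of B is q."""
    return {
        "BB": q * q,            # both gametes carry B
        "Bb": 2 * q * (1 - q),  # remainder after the two homozygotes
        "bb": (1 - q) ** 2,     # both gametes carry b
    }


expected = random_mating_frequencies(0.6)
```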
When two gene pairs (say B, b and C, c) segregate independently, the expected frequencies of the various possible genotypes will be as follows:

Genotype    Frequency
BBCC        $q_b^2 q_c^2$
BBCc        $q_b^2 \cdot 2q_c(1-q_c)$
BBcc        $q_b^2 (1-q_c)^2$
BbCC        $2q_b(1-q_b) q_c^2$
BbCc        $2q_b(1-q_b) \cdot 2q_c(1-q_c)$
Bbcc        $2q_b(1-q_b)(1-q_c)^2$
bbCC        $(1-q_b)^2 q_c^2$
bbCc        $(1-q_b)^2 \cdot 2q_c(1-q_c)$
bbcc        $(1-q_b)^2 (1-q_c)^2$

The key to these frequencies is obvious. For example, if the probability of a genotype containing 2 B's is $q_b^2$ and of a genotype containing 2 C's is $q_c^2$, the probability of a genotype containing both 2 B's and 2 C's is $q_b^2 q_c^2$.
In general, if n gene pairs are segregating independently in a random breeding population and viability is equal among gametes and zygotes, the expected frequencies of the various genotypes can be obtained from expansion of the expression

$$[q_1A_1 + (1-q_1)a_1]^2 \, [q_2A_2 + (1-q_2)a_2]^2 \cdots [q_nA_n + (1-q_n)a_n]^2.$$

In each term of the expansion the particular combination of A's and a's specifies the genotype, and the portion involving q's and numerical values gives the frequencies. For example, if n = 4, one term of the expansion will be

$$4q_1^2 q_2(1-q_2) q_3(1-q_3)(1-q_4)^2 \, A_1A_1A_2a_2A_3a_3a_4a_4.$$

This term indicates that the genotype, $A_1A_1A_2a_2A_3a_3a_4a_4$, will be expected in the frequency $4q_1^2 q_2(1-q_2) q_3(1-q_3)(1-q_4)^2$. There will be a term in the expansion for each of the possible genotypes.
Of some special interest is the frequency of the genotype homozygous for all of a set of desired genes. Clearly this will be $q_1^2 q_2^2 \cdots q_n^2$. If $q_1 = q_2 = \cdots = q_n$ as in the F2 of a cross of homozygous lines this becomes $q^{2n}$.
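The expansion amounts to multiplying single-locus frequencies across loci. The Python sketch below does this for any number of independent loci, coding each locus by its number of favorable genes (2, 1, or 0). The function names and the choice q = 0.5 are hypothetical.

```python
from itertools import product


def locus_frequencies(q):
    """Single-locus frequencies keyed by number of favorable genes."""
    return {2: q * q, 1: 2 * q * (1 - q), 0: (1 - q) ** 2}


def multilocus_frequencies(qs):
    """Expected frequency of every genotype at independently segregating loci.

    A genotype is a tuple of favorable-gene counts, one entry per locus.
    """
    per_locus = [locus_frequencies(q) for q in qs]
    result = {}
    for genotype in product((2, 1, 0), repeat=len(qs)):
        freq = 1.0
        for locus, count in zip(per_locus, genotype):
            freq *= locus[count]
        result[genotype] = freq
    return result


# Four loci, each with gene frequency 0.5 as in the F2 of a cross of
# homozygous lines.
table = multilocus_frequencies([0.5, 0.5, 0.5, 0.5])
freq_example = table[(2, 1, 1, 0)]   # genotype A1A1 A2a2 A3a3 a4a4
freq_all_best = table[(2, 2, 2, 2)]  # homozygous for all favorable genes
```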
Linkage between loci prevents independent segregation. In the event of linkage between loci the foregoing relative frequencies of genotypes may be regarded as equilibrium values that will be approached over a span of generations if the actual frequencies in a random breeding population are for any reason out of equilibrium. For example, the joint distribution of genotypes at linked loci may be expected to be out of equilibrium in a population descended from a recent cross of contrasting genotypes. How rapidly equilibrium is approached will depend on the closeness of linkage.
(see
The Distribution of Genotypes in Inbred Populations.
Inbreeding may be defined as the mating of related individuals. It is a form of non-random mating. The closest or most intense sort of inbreeding is self-fertilization. Because related individuals are more likely to possess the same gene at a given locus than are non-related individuals the progeny of matings involving inbreeding will on the average be homozygous at more loci than the progeny of random matings.
All sorts of inbreeding which result in progressive decrease in heterozygosis break a population into non-interbreeding lines. For example, a self-fertilized line can contain no more individuals in any one generation than can be produced from the seed of a single plant, the parent of that generation. As heterozygosis decreases the individuals belonging to such a line must become more and more alike in genotype. On the other hand different lines may become homozygous for different genes at a given locus and so while inbreeding reduces genetic variation within lines, it increases the genetic variation in a population of such lines.
Wright (16) devised a measure of inbreeding commonly called the coefficient of inbreeding. Its calculation is such that its magnitude is the expected decrease in heterozygosis in an inbred population as a percentage of the heterozygosis that would have been expected in the same population if there had been no inbreeding. Inbreeding as measured by Wright's coefficient will be symbolized as f. The distribution of genotypes involving two allelic genes in an inbred population will be as follows.

Genotype    Frequency
BB          $q^2 + fq(1-q)$
Bb          $2q(1-q)(1-f)$
bb          $(1-q)^2 + fq(1-q)$
Comparing this distribution with that expected under random mating it will be noted that the increase in homozygous loci is divided equally between the BB and bb types. As f approaches 1.0 the frequency of the BB type approaches q and that of the bb type approaches (1-q). The gene frequency in the whole population is unaffected by inbreeding if the population is large. On the other hand the gene frequency in any one inbred line of the population shifts, as inbreeding continues, from q to either 1.0 or zero. There is no directional force involved in this shift. It occurs, instead, because when a line is carried from one generation to another by only a few gametes, the opportunity for random shift in gene frequency is greatly increased. This is most easily perceived in the case of self-fertilization where the line is carried from one generation to the next by but two gametes. Obviously if the line is heterozygous for a gene in one generation there is an even chance that it will be homozygous in the next. Thus the expected decrease in heterozygosis in any generation is fifty percent of that present in the preceding one.
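The genotype distribution under inbreeding, and the halving of heterozygosis under self-fertilization, can be sketched in Python as follows. The function names are hypothetical, and `selfing_coefficient` assumes the line starts from a random mating population (f = 0) and is self-fertilized in every generation.

```python
def inbred_frequencies(q, f):
    """Expected genotype frequencies for gene frequency q and inbreeding f."""
    return {
        "BB": q * q + f * q * (1 - q),
        "Bb": 2 * q * (1 - q) * (1 - f),
        "bb": (1 - q) ** 2 + f * q * (1 - q),
    }


def selfing_coefficient(generations):
    """Inbreeding after repeated selfing: heterozygosis halves each time."""
    return 1.0 - 0.5 ** generations


half_inbred = inbred_frequencies(0.6, 0.5)
fully_inbred = inbred_frequencies(0.6, 1.0)
# The gene frequency in the whole population is unchanged by inbreeding.
q_after = half_inbred["BB"] + 0.5 * half_inbred["Bb"]
```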
The derivation of Wright's coefficient and some of the consequences of inbreeding will receive attention in a later section.
References:
(16) Wright, Sewall (1922). Coefficients of Inbreeding and Relationship. Amer. Nat. 56:330-338.
(17)
Mimeo. No.7
R.E.Comstock:6:49
-6VII.
Genotypic Variance Arising from Segregation of a Single Pair of Genes.
If both members of a gene pair (say B,b) are present in a population,
three genotypes (with respect to the B locus) will appear; BB, Bb, and bb.
Assuming random mating and equal viability of the different types of gametes
and zygotes, their expected frequencies will be in the ratio
$q^2 : 2q(1-q) : (1-q)^2$.
Let the average effect of bb on the organism be v, that of BB be (v+2u), and
that of Bb be (v+u+au). Note that the effects of the three genotypes are
not specified as constant but in terms of averages for the population. These
effects would be constant only if there were no interaction of genotype at
the B locus with either the remainder of the genotype or with environment.
The situation is summarized in tabular form below.
Genotype    Frequency      y'          y       x
BB          $q^2$          z+2u        u       2
Bb          $2q(1-q)$      z+u+au      au      1
bb          $(1-q)^2$      z           -u      0

y' is the phenotypic average for the specified genotype.
z, the value of y' for the bb genotype, is the sum of v and the average
   effect of the remainder of the genotype with respect to the population
   of environments involved.
y is obtained by coding y': y = y' - (z+u).
x is the number of B genes in the genotype.
a reflects the dominance involved in the action of the genes. When
   a = 0, the heterozygote is midway between the homozygotes, i.e.
   there is no dominance. If there is dominance of B, a > 0.0; if there
   is dominance of b, a < 0.0.
The total genotypic variance due to segregation of this pair of genes is

$$\sigma_y^2 = S(y^2) - (Sy)^2$$

since N, the total frequency, = 1.0.

$$
\begin{aligned}
Sy &= q^2 u + 2q(1-q)au - (1-q)^2 u \\
   &= (q^2 - 1 + 2q - q^2)u + 2q(1-q)au \\
   &= (2q-1)u + 2q(1-q)au
\end{aligned}
\tag{63}
$$

$$S(y^2) = q^2 u^2 + 2q(1-q)a^2 u^2 + (1-q)^2 u^2$$

and

$$
\begin{aligned}
\sigma_y^2 &= [q^2 + (1-q)^2]u^2 + 2q(1-q)a^2 u^2 - [(2q-1)u + 2q(1-q)au]^2 \\
  &= [q^2 + 1 - 2q + q^2 - 4q^2 + 4q - 1]u^2 - 4q(1-q)(2q-1)au^2 \\
  &\qquad + [2q(1-q) - 4q^2(1-q)^2]a^2 u^2 \\
  &= 2q(1-q)[1 + 2(1-2q)a + (1 - 2q + 2q^2)a^2]u^2
\end{aligned}
\tag{64}
$$
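Formulae (63) and (64) can be checked by computing the mean and variance of y
directly from the table of genotypes. The following is a minimal sketch in
Python, using exact fractions so that the comparison is not blurred by
rounding; the function name `y_moments` and the trial values of q, a, and u
are assumptions chosen only for illustration.

```python
from fractions import Fraction


def y_moments(q, a, u):
    """Mean and variance of the coded value y under random mating (total frequency 1)."""
    table = [
        (q**2, u),                 # BB
        (2 * q * (1 - q), a * u),  # Bb
        ((1 - q) ** 2, -u),        # bb
    ]
    s_y = sum(freq * y for freq, y in table)        # Sy
    s_y2 = sum(freq * y**2 for freq, y in table)    # S(y^2)
    return s_y, s_y2 - s_y**2


q, a, u = Fraction(3, 10), Fraction(1, 2), Fraction(2)   # illustrative values only
mean_y, var_y = y_moments(q, a, u)

# Closed forms (63) and (64)
mean_63 = (2 * q - 1) * u + 2 * q * (1 - q) * a * u
var_64 = 2 * q * (1 - q) * (1 + 2 * (1 - 2 * q) * a + (1 - 2 * q + 2 * q**2) * a**2) * u**2

print(mean_y == mean_63, var_y == var_64)
```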
The total genotypic variance, $\sigma_y^2$, is divisible into portions which are
called (a) additive genetic variance, and (b) variance due to dominance
deviations from the additive scheme [see Wright (18) and Lush (3), page 74].
The additive effect of the gene B may be defined as the regression of
y' (or of y, which will be equivalent) on x, the number of B's in the
genotype. This definition is couched in different words but has the same
meaning as that given by Wright (18). The additive genetic variance (or the
variance due to additive effects of the genes) is the variance in y that is
attributable to linear regression on x. Remembering that the regression
coefficient is computed in a manner that minimizes the variance due to
deviations from regression, it will be observed that the additive effect of B
is defined such that as large a portion of $\sigma_y^2$ as possible will be
explained on the basis of additive gene effects.
It happens that in a population where mating is random, the additive
effect as defined above is the response that would be obtained from substituting
B for b, averaged for all loci in the population at which b is present (and
hence the substitution is theoretically possible). Thus, the concept of
additive effects applied when gene action is not of the simple additive
sort is not so abstract as it may at first appear. Lush (3, page 73) defines
the additive effect of a gene in terms of its substitution value.
The derivation of formulae for the additive effect of B, the additive
genetic variance, and the variance due to dominance deviations is as follows.

$$Sx = 2q^2 + 2q(1-q) = 2q \tag{65}$$

$$
\begin{aligned}
\sigma_x^2 &= S(x^2) - (Sx)^2 \qquad \text{(remembering that } N = 1\text{)} \\
  &= 4q^2 + 2q(1-q) - 4q^2 \\
  &= 2q(1-q)
\end{aligned}
\tag{66}
$$

$$
\begin{aligned}
\sigma_{xy} &= Sxy - (Sx)(Sy) \\
  &= 2q^2 u + 2q(1-q)au - 2q[(2q-1)u + 2q(1-q)au] \qquad \text{[see (63) for } Sy\text{]} \\
  &= [2q^2 - 4q^2 + 2q]u + 2q(1-q)(1-2q)au \\
  &= 2q(1-q)[1 + (1-2q)a]u
\end{aligned}
\tag{67}
$$

The additive effect of B is by our definition

$$b_{yx} = \sigma_{xy}/\sigma_x^2 = [1 + (1-2q)a]u \tag{68}$$
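The regression in (68) can also be computed numerically from the frequency
table, treating the three genotypes as a population weighted by their
frequencies. A minimal sketch in Python follows; the function name
`additive_effect` and the trial values are assumptions made for illustration.

```python
from fractions import Fraction


def additive_effect(q, a, u):
    """Regression of y on x (number of B genes), weighted by genotype frequency."""
    # (frequency, x, y) for BB, Bb, bb
    table = [
        (q**2, 2, u),
        (2 * q * (1 - q), 1, a * u),
        ((1 - q) ** 2, 0, -u),
    ]
    s_x = sum(w * x for w, x, _ in table)
    s_y = sum(w * y for w, _, y in table)
    var_x = sum(w * x * x for w, x, _ in table) - s_x**2      # as in (66)
    cov_xy = sum(w * x * y for w, x, y in table) - s_x * s_y  # as in (67)
    return cov_xy / var_x


q, a, u = Fraction(1, 5), Fraction(3, 4), Fraction(1)   # illustrative values only
b_yx = additive_effect(q, a, u)
print(b_yx)  # 29/20, the same as [1 + (1-2q)a]u
```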
It is comparatively simple to verify that this same value will result from
computation of the average substitution value of B. Substitution of B for b
in the heterozygote increases y' by the amount (u-au). The frequency of
heterozygotes is 2q(1-q), which is therefore the frequency of possible
substitutions having the effect (u-au). Substitution of B for b in the bb
genotype changes y from -u to au, i.e. increases y' by the amount (u+au).
Since the frequency of the bb genotype is $(1-q)^2$ and since there are two loci
per individual of this genotype at which the substitution is possible, the
frequency of possible substitutions having the effect (u+au) is $2(1-q)^2$.
The average effect of all the possible substitutions is

$$
\frac{2q(1-q)(u-au) + 2(1-q)^2(u+au)}{2q(1-q) + 2(1-q)^2}
= \frac{q(u-au) + (1-q)(u+au)}{q + (1-q)}
= [1 + (1-2q)a]u
$$

as obtained for the regression of y on x [see (68)].
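The averaging over possible substitutions can be written out as a short
computation. This is a sketch only, in Python; the function name
`average_substitution_effect` and the trial values of q, a, and u are
assumptions for illustration.

```python
from fractions import Fraction


def average_substitution_effect(q, a, u):
    """Average gain in y' from replacing one b gene by B, over all b genes present."""
    # (frequency of substitutable b genes, effect of the substitution)
    substitutions = [
        (2 * q * (1 - q), u - a * u),   # Bb -> BB: one b gene per heterozygote
        (2 * (1 - q) ** 2, u + a * u),  # bb -> Bb: two b genes per bb individual
    ]
    total_freq = sum(freq for freq, _ in substitutions)
    return sum(freq * effect for freq, effect in substitutions) / total_freq


q, a, u = Fraction(1, 5), Fraction(3, 4), Fraction(1)   # illustrative values only
print(average_substitution_effect(q, a, u))  # 29/20, equal to [1 + (1-2q)a]u
```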
The additive genetic variance, to be symbolized hereafter as $\sigma_g^2$,
was defined as the portion of $\sigma_y^2$ due to regression on x. Hence

$$\sigma_g^2 = (\sigma_{xy})^2/\sigma_x^2 = 2q(1-q)[1 + (1-2q)a]^2 u^2 \tag{69}$$

[from (11), (66), and (67)]

Note now that when a = 0, i.e. when there is no dominance,
$\sigma_g^2 = \sigma_y^2$ [substitution of a = 0 in (64) and (69) leads to
$2q(1-q)u^2$ in both cases]. This is logical since with no dominance, we have
simple additive gene action and all genetic variance should be of the additive
sort. On the other hand, whenever a ≠ 0, $\sigma_g^2 < \sigma_y^2$. The
difference is the variance due to deviations from regression resulting from
dominance and is termed variance due to dominance deviations (from the
additive scheme). It will be symbolized hereafter as $\sigma_d^2$.

$$\sigma_d^2 = \sigma_y^2 - \sigma_g^2 = 4q^2(1-q)^2 a^2 u^2 \tag{70}$$

[from (64) and (69)]
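The division of the genotypic variance into its two portions can be
illustrated by fitting the regression line to the three genotypes and
accumulating separately the variance of the fitted values and the variance of
the deviations from them. The following Python sketch does this with exact
fractions; the function name `variance_components` and the trial values are
assumptions for illustration.

```python
from fractions import Fraction


def variance_components(q, a, u):
    """Split the genotypic variance into additive and dominance parts."""
    b = (1 + (1 - 2 * q) * a) * u          # additive effect, formula (68)
    table = [                              # (frequency, x, y) for BB, Bb, bb
        (q**2, 2, u),
        (2 * q * (1 - q), 1, a * u),
        ((1 - q) ** 2, 0, -u),
    ]
    mean_x = sum(w * x for w, x, _ in table)
    mean_y = sum(w * y for w, _, y in table)
    var_g = var_d = 0
    for w, x, y in table:
        fitted = mean_y + b * (x - mean_x)   # value predicted by linear regression on x
        var_g += w * (fitted - mean_y) ** 2  # variance explained by regression
        var_d += w * (y - fitted) ** 2       # variance of deviations from regression
    return var_g, var_d


q, a, u = Fraction(2, 5), Fraction(1), Fraction(1)   # illustrative values only
var_g, var_d = variance_components(q, a, u)
print(var_g, var_d)  # 432/625 144/625
```

With these trial values the two parts agree with (69) and (70), and their sum
agrees with (64).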
Values of the Additive Effect and the Genetic Variances when a takes Values
of Special Interest

1. When a = 0.0, i.e. there is no dominance.
   (a) $b_{yx} = u$ for all values of q.
   (b) $\sigma_y^2 = \sigma_g^2 = 2q(1-q)u^2$, with maximum value when q = .5.
       Values are listed below for several values of q.

       q          .1    .2    .3    .4    .5    .6    .7    .8    .9
       2q(1-q)    .18   .32   .42   .48   .50   .48   .42   .32   .18
       Note that for q = .2 to .8 the genetic variance is from 64 to 100
       percent of its maximum.
   (c) $\sigma_d^2 = 0$

2. When a = 1.0, i.e. when B is completely dominant to b.
   (a) $b_{yx} = 2(1-q)u$. It approaches 2u as q approaches zero and approaches
       zero as q approaches 1.0. Thus the additive effect of a dominant gene
       is high when its frequency is low, but low when its frequency is high.
   (b) $\sigma_y^2 = 4q(2-q)(1-q)^2 u^2$ with maximum when q = .293. [For
       corresponding formula see Wright (19).]
   (c) $\sigma_g^2 = 8q(1-q)^3 u^2$ with maximum when q = .25.
   (d) $\sigma_d^2 = 4q^2(1-q)^2 u^2$ with maximum when q = .5. The maximum for
       $\sigma_d^2$ is at q = .5 for all values of a. [For formulae
       corresponding to (c) and (d) see Fisher (20).]
   (e) The ratio of $\sigma_g^2$ to $\sigma_y^2$ is
       $\sigma_g^2/\sigma_y^2 = 2(1-q)/(2-q)$
       which approaches one as q approaches zero and approaches zero as q
       approaches 1.0.
       q                         .2    .4    .5    .6    .7    .8    .9
       $\sigma_g^2/\sigma_y^2$   .88   .75   .67   .57   .46   .33   .18
       $\sigma_d^2/\sigma_y^2$   .12   .25   .33   .43   .54   .67   .82
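The maxima quoted for complete dominance can be located by a simple search
over a fine grid of gene frequencies. The sketch below, in Python, is only a
numerical check and not a derivation; the function name
`components_complete_dominance`, the grid spacing of .001, and the choice of
u = 1 are assumptions for illustration.

```python
def components_complete_dominance(q, u=1.0):
    """Genotypic, additive, and dominance variance when a = 1 (B fully dominant)."""
    var_y = 4 * q * (2 - q) * (1 - q) ** 2 * u**2
    var_g = 8 * q * (1 - q) ** 3 * u**2
    var_d = 4 * q**2 * (1 - q) ** 2 * u**2
    return var_y, var_g, var_d


grid = [i / 1000 for i in range(1, 1000)]  # q = .001, .002, ..., .999

# For each component, find the grid value of q at which it is largest.
maxima = {}
for index, name in enumerate(("var_y", "var_g", "var_d")):
    maxima[name] = max(grid, key=lambda q: components_complete_dominance(q)[index])
print(maxima)

# Share of the genotypic variance that is additive, at several gene frequencies.
for q in (0.1, 0.3, 0.5, 0.7, 0.9):
    var_y, var_g, var_d = components_complete_dominance(q)
    print(q, round(var_g / var_y, 3), round(var_d / var_y, 3))
```

On this grid the search returns q = .293, .25, and .5 for the three
components, and the additive share falls steadily as q rises.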
3. When a > 1.0, i.e. when the heterozygote is superior to the superior
   homozygote, and q = (1+a)/2a.
   (a) $b_{yx} = 0$
   (b) $\sigma_g^2 = 0$
   (c) $\sigma_d^2 = \sigma_y^2$
   (d) $\bar{y} = (2q-1)u + 2q(1-q)au$ is at its maximum.
   q is listed below for several values of a.

       a               1.0    1.5    2.0    3.0    4.0
       q = (1+a)/2a    1.0    .833   .75    .667   .625
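That the additive effect vanishes and the population mean is greatest at
q = (1+a)/2a can be checked for particular values of a. The following is a
minimal Python sketch; the function names `additive_effect` and `mean_y`, the
comparison step of .01 in q, and the choice u = 1 are assumptions made for
illustration.

```python
from fractions import Fraction


def additive_effect(q, a, u):
    """Formula (68): regression of y on the number of B genes."""
    return (1 + (1 - 2 * q) * a) * u


def mean_y(q, a, u):
    """Formula (63): population mean of the coded value y."""
    return (2 * q - 1) * u + 2 * q * (1 - q) * a * u


u = Fraction(1)
step = Fraction(1, 100)   # small shift in q used to compare neighbouring means
for a in (Fraction(3, 2), Fraction(2), Fraction(3), Fraction(4)):
    q_star = (1 + a) / (2 * a)          # gene frequency at which b_yx vanishes
    is_peak = (mean_y(q_star, a, u) > mean_y(q_star - step, a, u)
               and mean_y(q_star, a, u) > mean_y(q_star + step, a, u))
    print(float(a), float(q_star), additive_effect(q_star, a, u), is_peak)
```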
Problems:
Plot $b_{yx}/u$, $\sigma_y^2$, $\sigma_g^2$, and $\sigma_d^2$ against q for a = 0, .5,
References:
18. Wright, Sewall (1935). The Analysis of Variance and the Correlations
    between Relatives with Respect to Deviations from an Optimum.
    Jour. Gen. 30:243-256.
19. Wright, Sewall (1931). Evolution in Mendelian Populations.
    Genetics 16:97-159.
20. Fisher, R. A. (1918). The Correlation between Relatives on the
    Supposition of Mendelian Inheritance. Trans. Roy. Soc.
    Edinb. 52, Part 2, 399-433.