ESTIMATORS FOR GENETIC PARAMETERS OF POPULATIONS DERIVED
FROM PARENTS OF A DIALLEL MATING
R. O. Kuehl and J. O. Rawlings
Institute of Statistics
Mimeo Series No. 384
INSTITUTE OF STATISTICS
BOX 5457
STATE COLLEGE STATION
RALEIGH, NORTH CAROLINA
ERRATA SHEET
page iii - line 6: "Hansen" should be "Hanson".
page 21 - 2nd formula: "y - W12" should be "y2 - W12".
page 28 - 10th line of Table: "-u1 + u1 - t12" should be "-u1 + u2 - t12".
page 30 - line 4:
page 31 - line 3:
="
"y
page 29 - line 16:
II(Y
l
-l)U + (y -l)y 11 should be lI(y -l)u
l
l
l
2
2
II ~ "
C.P
page 34 - last line:
(~-1)'
MSI =
II
2P
.
should be " MSI =
II
1:
II·
should be
II
i<j
page
II
jfi
(~~li"
"u211 should be II u 2" •
i
page 35 - first line:
last line:
should . be
jfl
II
= "•
should be 111'
.'
1: " •
irj
I d be 118'"
58 - 1 ine 6 : "8
+2 nJ..L6i " h
s ou
+2 nl-161 •
page 59 - line 7:
II
+
"should be
II
irj
+ .E II •
irj
page 72 - line 3 from bottom, value for a = 0.5, n = 10: "48" should be "68".
page 95 - line 3 in Table, values for a
0.04 0.05" should be "0.95
= 10,
15, 20:
"0.05
0.95",
II (
) I I should be " ( l-2P j# )" '
l-2Pjr
l
II ( n-2 )11 in the deno~nator
.
. ".( n-l..
)11
should be
line 12:
page 112 - line 4:
"(w12-nPlP2 )1 -c.P )11 s h ould b e "(W12- nPI P2 )(1""')"
-c.P1 •
1
li ne:
9
1"\
II[Ui + (1-2Pjri)t12]" should be "[ui - (1-2Pjri)t12]"'
page 113 - line 11:
••
0.94
n
,,(n-2)i" h ldb lI[n-,gl
II
n
u 2 s ou
e.
n
u2 •
page 110 - line 2:
•
5'
page 116 - 1·J.ne,
= 1.0,
2
"r(n-j)
n-l 2
II
2
b
uld
b
"n(n-4)
so
e
2II '
(n-l)
2
page 117 - line 11:
11(,
page 119 - line 10:
"
page 126 - line
5:
~nP1P2
ui
II
page 127 - line 5:
II
II
i<j
page 134 - line 3:
line 8:
shuldb
0
e
II
s h 0 uld b e "2,,
ui '
" >'
2 "
":' crDi'
~
- 2P
n 1'"
should be ""
4 ," ,
:-- crDi
~
should be " .E
11 ,
i<j
one "(n_I)311 in the denominator should be
. "
.II ( n-1 )211 in the denomJ.nator
should be
II (
II
(n_2)211,
n-2 )211 ,
TABLE OF CONTENTS
                                                                   Page
LIST OF TABLES . . . vi
1. INTRODUCTION . . . 1
2. REVIEW OF LITERATURE . . . 4
3. SAMPLING DISTRIBUTIONS . . . 15
4. ANALYSIS OF THE DIALLEL CROSS . . . 25
   4.1 The Analysis . . . 25
   4.2 Genetic Interpretations of Diallel Statistics . . . 27
5. ESTIMATION OF POPULATION PARAMETERS . . . 36
   5.1 General Remarks . . . 36
   5.2 Parent Population Estimators . . . 37
   5.3 Derived Population Estimators . . . 42
       5.3.1 General Remarks . . . 42
       5.3.2 Genetic Model With Two Alleles . . . 44
       5.3.3 Extension to Multiple Alleles . . . 51
   5.4 Discussion . . . 54
6. VARIANCES OF ESTIMATORS . . . 56
   6.1 Introduction . . . 56
   6.2 Exact Variances of Parent Population Estimators . . . 56
   6.3 Exact Variances of Derived Population Estimators . . . 60
   6.4 Numerical Evaluations of Estimator Variances . . . 65
       6.4.1 General Remarks . . . 65
       6.4.2 Relative Efficiencies of Derived to Parent Population Estimators . . . 66
   6.5 Consequences of Random Experimental Error . . . 84
       6.5.1 Parent Population Estimators . . . 84
       6.5.2 Derived Population Estimators . . . 89
   6.6 Normal Approximations of Variances . . . 92
7. SUMMARY AND CONCLUSIONS . . . 97
   7.1 Discussion . . . 97
   7.2 Suggestions for Further Research . . . 104
8. LIST OF REFERENCES . . . 106
9. APPENDIX . . . 108
   9.1 Expectations of Diallel Statistics . . . 108
   9.2 Derivations of the Exact Variances for Parent Population Estimators . . . 118
   9.3 Derivations of Exact Variances for Derived Population Estimators . . . 129
   9.4 Moments and Functional Expectations for the Binomial Distribution . . . 140
LIST OF TABLES
                                                                   Page
2.1 Analysis of variance of the diallel cross . . . 4
2.2 Analysis of variance and expected mean squares of the modified diallel cross . . . 8
2.3 Partitions of the analysis of variance and expectation of mean squares for a diallel experiment excluding selfs . . . 10
4.1 Mean squares and expected mean squares for the analysis of means from a diallel experiment excluding selfs and reciprocals . . . 26
4.2 Diallel matings with resulting number and genetic values of the $F_1$ progeny . . . 28
4.3 Genotypic value and number of inbred parents utilized in the diallel cross . . . 32
6.1 Exact variances and covariance for genetic portions of $MS_{g.c.a}$ and $MS_{s.c.a}$ from the diallel analysis of variance . . . 58
6.2 Variances and covariances of diallel estimators letting $p_i = p_j = p$, $a_i = a_j = a$, and $u_i = u_j = 1$ . . . 67
6.3 Values of $[V(\hat{\sigma}_A^2)/(\sigma_A^2)^2]/[V(\hat{\sigma}_A^{2*})/(E_y\sigma_A^{2*})^2]$ for specified values of n, a, m, and p . . . 71
6.4 Coefficients of variation of $\hat{\sigma}_A^2$ for specified combinations of n, a, m, and p . . . 72
6.5 Coefficients of variation of $\hat{\sigma}_A^{2*}$ for specified combinations of n and m . . . 73
6.6 Additive genetic variance of the parent population for 10 loci and specified values of a and p . . . 73
6.7 Average additive genetic variance of derived population for 10 loci and specified values of n, a, and p . . . 74
6.8 Values of $[V(\hat{\sigma}_D^2)/(\sigma_D^2)^2]/[V(\hat{\sigma}_D^{2*})/(E_y\sigma_D^{2*})^2]$ for specified values of n, m, and p . . . 76
6.9 Coefficients of variation of $\hat{\sigma}_D^2$ for several combinations of n, m, and p . . . 76
1. INTRODUCTION
The diallel cross has been utilized for a number of years to investigate the nature of gene action in plant populations. Numerous analyses and interpretations of the diallel experiment have evolved as a result of this research. Basically, the diallel cross in its current context is constituted by all possible crosses among a set of parents. A discussion of the modifications of the diallel cross and the associated analyses is presented in Section 2.
At least two inference populations are used in the interpretation of the analysis of the diallel cross. One is the random mating parent population from which the crosses are a random sample. The second is the set of parents utilized for the diallel cross. Much controversy exists as to the appropriateness and validity of the two methods.
To infer about a specific group of parents requires the assumption that the genes are distributed among the parents at random; i.e., the covariance of the gene effects is zero. This assumption is not fulfilled in most practical situations. However, if the crosses are a random sample of some original random mating population, the assumption is not necessary. In a discussion by Cockerham (1963) of the problems associated with obtaining estimates of genetic variances from a specific set of parents and their crosses, the desirability of having some base of reference other than the set of genetical material in the sample was pointed out. A reference base suggested by Cockerham (1963) was the random mating population wholly constituted from the set of lines used for a diallel cross.
Certain criteria must be satisfied before a new reference base can be utilized for any genetic mating system and its analysis. It must be possible to define genetic parameters for the reference base, and it must be possible to obtain estimators for the parameters. Further, the error of inference from the estimators to the reference base must be evaluated to determine whether or not the use of a new reference base is worthwhile.
In the present problem, an attempt was made to provide estimators and their variances from a diallel analysis for genetic variances of the random mating population derived from a set of completely inbred parents. The above estimators were compared to the estimators for genetic variances of the random mating population from which the crosses were a random sample.
In Section 3, the homozygous parents to be used for the
diallel mating are considered to be a random sample from a
population of homozygous individuals, constituted by inbreeding a random mating population.
They are described
later in terms of sampling variables and distributions,
which provides a workable base for solving the problem.
The analysis of a diallel cross excluding reciprocal crosses is described in Section 4. The statistics from the analysis are written in terms of genotypic values and the sampling variables of Section 3 for a gene model consisting of additive, dominance, and additive-by-additive epistatic effects.
The results of Sections 3 and 4 are utilized in Sections 5 and 6 to obtain unbiased estimators and their variances for genetic parameters of the two reference populations. The variances of the estimators are given for the gene model in the absence of epistasis and the relative efficiency of the two sets of estimators is evaluated numerically. The usefulness of some biased estimators for derived population parameters also is investigated.
The exact sampling variances of some diallel estimators are compared to their normal approximations for specific cases of the additive and dominance gene model to determine the usefulness of normal approximations to variances of quadratic forms utilized in genetic studies.
The utility of the derived population as a reference base is discussed in Section 7, along with some additional ramifications and proposed extensions of the methods used in the dissertation.
2. REVIEW OF LITERATURE
The analyses of diallel crosses presented by a number of authors differ as to content of the statistical and genetical analyses and the scope of inference associated with the analyses.
An orthogonal analysis of variance of a complete diallel design, including reciprocal $F_1$ crosses and selfs, was given by Yates (1947). Letting $y_{qr}$ be the phenotypic value of an individual following a mating between the qth parent and the rth parent, the analysis of a complete diallel involving p parents is as given in Table 2.1.
Table 2.1. Analysis of variance of the diallel cross

Source                         df             Sums of squares
Lines (general)                p-1            $G = \sum_q (y_{q.}+y_{.q})^2/2p - 2y_{..}^2/p^2$
Reciprocal sums after lines    p(p-1)/2       $S = \sum_{q,r} (y_{qr}+y_{rq})^2/4 - \sum_q (y_{q.}+y_{.q})^2/2p + y_{..}^2/p^2$
  (specific)
Maternal effects               p-1            $M = \sum_q (y_{q.}-y_{.q})^2/2p$
Reciprocal effects             (p-1)(p-2)/2   $R = \sum_{q,r} (y_{qr}-y_{rq})^2/4 - M$

$y_{q.} = \sum_r y_{qr}$,   $y_{..} = \sum_{q,r} y_{qr} = \sum_q \sum_r y_{qr}$
Using Yates' analysis, Hayman (1954a) interpreted the mean squares in terms of a genetical model patterned after Mather's (1949) description of a polygenic system with additive and dominance effects. Hayman further partitioned the Specific Sum of Squares, S, into three parts to obtain further information on dominance effects. The partitions were
$S_1 = (y_{..} - p\sum_q y_{qq})^2/p^2(p-1)$, with 1 df;
and
Kempthorne (1956) introduced genetical interpretations of the mean squares in Yates' analysis, extending the genetic model to include arbitrary alleles and epistasis. The population parameters given genetic interpretation were the mean of the random mating parent population; the mean of the population of possible inbred lines; the genotypic variance in the parent population; the variance of the inbred lines; the covariance of inbred lines and the progeny of the inbred lines; and Cov(P.O.), the covariance of parent and offspring in the original random mating population.
Expected mean squares were expressed in terms of the population parameters, and it was found that only in the absence of epistasis could the diallel give unbiased information about the population parameters.
The main distinction between Hayman's (1954a) and Kempthorne's (1956) analyses is the reference base for which inferences are made from the analysis. Kempthorne interpreted the results in terms of the parent random mating population that has given rise to the homozygous parents by inbreeding, whereas Hayman restricted interpretations to the specific set of parents utilized in the diallel cross.
Griffing (1958) presented an analysis similar to Kempthorne's, but included a component in the model to account for reciprocal effects. The relationships between Kempthorne's and Griffing's population parameters are
2
+ 6 s.c.a
Cov(P.O.) =
~.c.a
where Griffing utilized the concepts of general combining ability (g.c.a) and specific combining ability (s.c.a) defined and applied by Sprague and Tatum (1942).
Various modifications of the analysis of variance of diallel crosses have been presented by Griffing (1956, 1958), Matzinger and Kempthorne (1956), and Cockerham (1963) in which parents and/or reciprocal matings were excluded from the analysis.
Griffing (1956, 1958) gave the analysis in two forms, one omitting inbred parents and the second omitting inbred parents and reciprocal matings. Specific and general combining ability variances were presented in terms of genetic variances where the genetic model included additive, dominance, and all types of epistatic effects for an arbitrary number of loci with arbitrary alleles (Griffing, 1956). The variances $\sigma^2_{g.c.a}$ and $\sigma^2_{s.c.a}$ were given genetical interpretations in terms of $\sigma_A^2$, the additive genetic variance; $\sigma_D^2$, the dominance genetic variance; $\sigma_{AA}^2$, the additive-by-additive epistatic variance; etc.
Matzinger and Kempthorne (1956), omitting selfs and reciprocal matings from the analysis, considered an arbitrary but uniform degree of inbreeding in the parents. The variances of specific and general combining ability were given in terms of covariances of full-sib and half-sib relatives. The genetic model was equivalent to that of Griffing (1956), with the exception that Griffing considered only completely inbred parents. The modified diallel analysis with p parents and k replications is given in Table 2.2.
Table 2.2. Analysis of variance and expected mean squares of the modified diallel cross

Source       df           SS                                                                  E(MS)
Replicates   k-1          R
General      p-1          $G = \frac{1}{k(p-2)}\sum_q y_{q..}^2 - \frac{2(p-1)C}{p-2}$        $\sigma^2 + k\sigma^2_{s.c.a} + k(p-2)\sigma^2_{g.c.a}$
Specific     p(p-3)/2     $S = \sum_{q,r} (y_{qr.}^2/k) - C - G$                              $\sigma^2 + k\sigma^2_{s.c.a}$
Error        (k-1)(n-1)   $E = \sum_{q,r,t} y_{qrt}^2 - S - G - R - C$                        $\sigma^2$

where $n = p(p-1)/2$ and $C = 2y_{...}^2/kp(p-1)$.

The model used for the analysis in Table 2.2 is
$y_{qrt} = \mu + g_q + g_r + s_{qr} + k_t + e_{qrt}$,   q, r = 1, 2, ..., p; q < r; t = 1, 2, ..., k,
where $y_{qrt}$ is the yield resulting from a cross of the qth line with the rth line grown in the tth replicate; $\mu$, the general mean; $g_q$, a measure of general combining ability of the qth line; $s_{qr}$, a measure of the specific combining ability of a cross between the qth and rth lines; $k_t$, a replication effect; and $e_{qrt}$, the experimental error associated with $y_{qrt}$.
Matzinger and Kempthorne (1956) showed that $\sigma^2_{s.c.a} = Cov(FS) - 2\,Cov(HS)$ and $\sigma^2_{g.c.a} = Cov(HS)$. The covariance of full sibs, Cov(FS), and the covariance of half-sibs, Cov(HS), were each given as functions of the components of genetic variance and of F, where F represents the degree of inbreeding in the parents.
With a single diallel experiment, it was shown that additive and dominance genetic variances could be unbiasedly estimated only in the absence of epistasis. Additional components of genetic variance could be estimated if a series of diallel experiments was conducted with different levels of inbreeding. In addition, the analysis was presented for obtaining estimates of the interactions of genotypic components of variance with environments represented by locations and years.
In a general discussion of mating designs, Cockerham (1963) presented the diallel analysis, excluding selfs. The new features of the analysis were the expectations of the mean squares and translation of the components of variance into covariances of relatives involving reciprocals. The analysis included maternal and reciprocal sums of squares as given by Yates (1947) with general and specific sums of squares as given by Matzinger and Kempthorne (1956). A portion of the analysis for k replicates and p parents is given in Table 2.3.
Table 2.3. Partitions of the analysis of variance and expectation of mean squares for a diallel experiment excluding selfs

Source       df               E(MS)
Replicates   k-1
General      p-1              $\sigma^2 + k\sigma_r^2 + 2k\sigma^2_{s.c.a} + k(p-2)\sigma_m^2 + 2k(p-2)\sigma^2_{g.c.a}$
Specific     p(p-3)/2         $\sigma^2 + k\sigma_r^2 + 2k\sigma^2_{s.c.a}$
Maternal     p-1              $\sigma^2 + k\sigma_r^2 + 2k\sigma_m^2$
Reciprocal   (p-1)(p-2)/2     $\sigma^2 + k\sigma_r^2$
Error        (k-1)(p^2-p-1)   $\sigma^2$
In the analysis shown in Table 2.3, $\sigma^2$ is the error variance, $\sigma_m^2 = (C_{ps}+C_{ms})/2 - C_{rs}$, $\sigma_r^2 = C_f - C_{rf} - (C_{ps}+C_{ms}-2C_{rs})$, $\sigma^2_{s.c.a} = C_{rf} - 2C_{rs}$, and $\sigma^2_{g.c.a} = C_{rs}$, where $C_f$ = Cov(full sibs), $C_{rf}$ = Cov(reciprocal full sibs), $C_{ms}$ = Cov(maternal half-sibs), $C_{ps}$ = Cov(paternal half-sibs), and $C_{rs}$ = Cov(reciprocal half-sibs). In the absence of reciprocal effects, $\sigma^2_{g.c.a} = C_{rs} = Cov(HS)$ and $\sigma^2_{s.c.a} = C_{rf} - 2C_{rs} = Cov(FS) - 2\,Cov(HS)$, which agrees with Matzinger and Kempthorne (1956). One is able to test the hypothesis that $\sigma_r^2 = 0$ and that $\sigma_m^2 = 0$, where $\sigma_r^2$ and $\sigma_m^2$ refer to the variances of reciprocal and maternal effects, respectively.
Hayman (1954b) and Jinks (1954) presented an analysis of a complete diallel cross among a set of homozygous parents. The analysis was designed to provide information mainly on the distribution of genes in the parents, on average degree of dominance, and on certain components of genetic variance with inferences from the analysis restricted to the parental lines.
The genetic model was restricted to additive and dominance effects with two alleles each at an arbitrary number of loci. The regression of array covariance, $W_r$, on array variance, $V_r$, was plotted to obtain evidence of nonadditive gene effects, where deviations from unit slope provided evidence that nonallelic interactions were present. $W_r$ is the covariance between the parents and their offspring in the rth array of the diallel table. If the quantities, $W_r - V_r$, were homogeneous, the results of the experiment were considered to conform to the biometrical model and the analysis was performed. If, however, the $W_r - V_r$ values were heterogeneous, the data for interacting lines or crosses causing the disturbances were either removed or adjusted and the usual analysis was performed on the remaining crosses.
Kempthorne (1956) objected to the procedure of removing
crosses from the analysis on the basis that, if the parents
were regarded as a random sample from some larger population, the reduced set of parents could not be so regarded.
Gilbert (1958) felt that the objection lost its force if inferences were directed to the parental lines in the experiment.
Hayman (1957) derived a $\chi^2$ statistic from the diallel analysis of variance (Hayman, 1954a) to test for the presence of epistasis. The test was made possible by the inclusion of $F_2$ families in the experiment, and essentially determined the failure or nonfailure of the $F_2$ family to conform to its expectation from its ancestors under the simple dominance model.
Hayman (1958) extended the diallel analysis to include $F_2$ families to increase the accuracy of measurement of the components of genetic variation. Dickinson and Jinks (1956) extended the analyses of Hayman and Jinks to include arbitrary inbreeding of parental lines in the diallel cross.
Hayman (1960) attempted to relate the main lines of approach to analyzing and interpreting the diallel cross. In so doing, he considered the homozygous parents as a random sample from an inbred but originally random mating population. Population parameters were translated into components of genetic variances defined by Hayman (1954b), and the population parameters were related to those given by Kempthorne (1956) and Griffing (1958). Hayman then provided a set of unbiased estimators and a set of maximum likelihood estimators from the diallel analysis for the population parameters. The variance-covariance matrix of the unbiased estimators for population parameters and genetic components was given, where variances and covariances of the quadratic functions were derived under the assumption of normally distributed effects in the model. From the nature of the variances, it was suggested that at least 10 parents should be used if the diallel cross was to provide useful estimators of the population parameters.
Considering the interpretation of the genetic parameters defined by Hayman (1954b) for a fixed set of lines, one might ask whether or not these parameters more appropriately apply to a random mating population defined by the gene frequencies of the set of lines. If the parameters are appropriate for such a population, there is an error of inference associated with estimators of the parameters. To make inferences to a reference population, an adequate genetic sampling plan is necessary to determine the error of inference.
In the following sections, an attempt is made to provide estimators and their associated errors from a diallel
analysis for genetic variances of the random mating population derived from the set of completely inbred parents used
for the diallel mating.
3. SAMPLING DISTRIBUTIONS
The sampling variables used in the solution of this problem and their probability distributions can be illustrated by initially considering a random mating diploid population in linkage equilibrium consisting of genotypes having m loci, each with two alleles (B and b). The alleles B and b occur at the ith locus with frequencies $p_i$ and $(1-p_i)$, respectively. Inbreeding is imposed on the random mating population, such that $p_i$ does not change, to form a completely inbred population of homozygous genotypes. The frequency of the genotypes, BB and bb, at the ith locus in the inbred population will be $p_i$ and $(1-p_i)$, respectively.
Let $\mathcal{K} = \{1, 2, \ldots, m\}$ be the set of m loci. A homozygous genotype in the population is completely defined by specifying the set of loci, $\alpha$, that has the positive alleles, BB, since the remainder of the loci, $\mathcal{K} - \alpha$, must have the negative alleles, bb. For example, with two loci, $\alpha = \{1,2\}$ specifies the genotype $B_1B_1B_2B_2$, $\alpha = \{2\}$ specifies the genotype $b_1b_1B_2B_2$, etc. Thus all possible homozygous genotypes for the m loci are specified by considering all possible subsets of $\mathcal{K}$, including the empty set, $\alpha = \emptyset$, and the complete set, $\alpha = \mathcal{K}$. The relative frequency of a genotype in the inbred population is given by
$$f(\alpha) = \prod_{i \in \alpha} p_i \prod_{i \in \mathcal{K}-\alpha} (1-p_i). \qquad (3.1)$$
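As an illustration of the genotype frequencies in (3.1), the following is a minimal sketch that enumerates every subset $\alpha$ of the loci and multiplies the appropriate allele frequencies. The function name `genotype_frequencies` and the numerical frequencies are assumptions chosen only for the example; they do not come from the text.

```python
from fractions import Fraction
from itertools import combinations


def genotype_frequencies(p):
    """Return {alpha: f(alpha)} for every subset alpha of loci 1..m.

    p[i-1] is the (assumed) population frequency of allele B at locus i.
    alpha is the set of loci that are BB; every other locus is bb.
    """
    m = len(p)
    loci = range(1, m + 1)
    freqs = {}
    for size in range(m + 1):
        for alpha in combinations(loci, size):
            f = Fraction(1)
            for i in loci:
                f *= p[i - 1] if i in alpha else 1 - p[i - 1]
            freqs[frozenset(alpha)] = f
    return freqs


# Illustrative two-locus frequencies (hypothetical values).
p = [Fraction(3, 4), Fraction(1, 2)]
f = genotype_frequencies(p)
print(f[frozenset({1, 2})])   # f_12 = p1 * p2
print(sum(f.values()))        # the frequencies of all genotypes add to one
```

Exact fractions are used so that the identity "frequencies sum to one" can be checked without rounding error.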
Let $X(\alpha)$ denote the number of lines having genotype $\alpha$ in a particular random sample of inbred lines. The genotypic composition of a particular sample of n lines from the inbred population is specified by the $X(\alpha)$'s for that sample where
$$\sum_{\alpha \subset \mathcal{K}} X(\alpha) = n. \qquad (3.2)$$
If we let $\mathcal{K}_i = \{\alpha \mid i \in \alpha\}$, i.e., $\mathcal{K}_i$ is the set of all $\alpha$ containing the ith locus, the number of lines in the sample that contain the set of positive alleles, BB, at the ith locus is
$$y_i = \sum_{\alpha \in \mathcal{K}_i} X(\alpha). \qquad (3.3)$$
Conversely, the number of lines in the sample that contain the negative alleles, bb, at the ith locus is $n - y_i$. The relative frequency of the B allele at the ith locus in the sample is given by
$$P_i = y_i/n \qquad (3.4)$$
and the frequency of the b allele is $1 - P_i$.
Likewise, letting $\mathcal{K}_{ij} = \{\alpha \mid (i,j) \in \alpha\}$, the number of lines in the sample containing the set of positive alleles, BB, at both the ith and jth loci is
$$W_{ij} = \sum_{\alpha \in \mathcal{K}_{ij}} X(\alpha). \qquad (3.5)$$
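The sampling variables of (3.2) through (3.5) are simple sums over the genotype counts, which the following sketch makes concrete. The helper `sample_variables` and the three-locus sample are hypothetical; genotypes are stored as frozensets of the loci carrying BB.

```python
from fractions import Fraction


def sample_variables(X, m):
    """Compute n, y_i, P_i and W_ij from genotype counts X.

    X maps frozenset(alpha) -> number of sampled lines with genotype alpha.
    """
    n = sum(X.values())                                   # (3.2)
    y = {i: sum(c for a, c in X.items() if i in a)        # (3.3)
         for i in range(1, m + 1)}
    P = {i: Fraction(y[i], n) for i in y}                 # (3.4)
    W = {(i, j): sum(c for a, c in X.items() if i in a and j in a)
         for i in range(1, m + 1) for j in range(i + 1, m + 1)}  # (3.5)
    return n, y, P, W


# Hypothetical three-locus sample of five inbred lines.
X = {
    frozenset({1, 2, 3}): 2,
    frozenset({1}): 1,
    frozenset({2, 3}): 1,
    frozenset(): 1,
}
n, y, P, W = sample_variables(X, m=3)
print(n, y, W)
```

Note that each $W_{ij}$ can never exceed the smaller of $y_i$ and $y_j$, since a line counted in $W_{ij}$ is also counted in both marginal totals.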
The set of random variables, $X(\alpha)$, has a joint multinomial distribution with probability density function
$$f[X(\alpha)] = \frac{n! \prod_{i \in \mathcal{K}} p_i^{y_i}(1-p_i)^{n-y_i}}{\prod_{\alpha \subset \mathcal{K}} X(\alpha)!} \qquad (3.6)$$
subject to (3.2).
The $y_i$ are marginal sample values associated with the distribution of the two alleles, B and b, at the ith locus, and they are binomially distributed with probability density function
$$f(y_i) = \binom{n}{y_i} p_i^{y_i}(1-p_i)^{n-y_i}, \qquad (3.7)$$
where $y_1, y_2, \ldots, y_m$ are mutually independent if the parent population is assumed to be in linkage equilibrium.
To exemplify the random variables and their density functions, consider a random mating parent population of genotypes having two alleles at each of two loci. The completely inbred population derived from the random mating population will have the following distribution of homozygous genotypes:

Genotype            Relative frequency(1)
$B_1B_1B_2B_2$      $f_{12} = p_1p_2$
$B_1B_1b_2b_2$      $f_1 = p_1(1-p_2)$
$b_1b_1B_2B_2$      $f_2 = (1-p_1)p_2$
$b_1b_1b_2b_2$      $f = (1-p_1)(1-p_2)$                              (3.8)

Suppose a random sample of size n is drawn from the inbred population; the sample array will appear as:

Genotype            Number of genotypes(2)    Expected relative frequency
$B_1B_1B_2B_2$      $X_{12}$                  $f_{12}$
$B_1B_1b_2b_2$      $X_1$                     $f_1$
$b_1b_1B_2B_2$      $X_2$                     $f_2$
$b_1b_1b_2b_2$      $X$                       $f$

The marginal totals for the two loci are

Genotype            Number of genotypes           Expected relative frequency
$B_1B_1$ --         $y_1 = X_{12} + X_1$          $p_1 = f_{12} + f_1$
$b_1b_1$ --         $n - y_1 = X_2 + X$           $1 - p_1 = f_2 + f$
-- $B_2B_2$         $y_2 = X_{12} + X_2$          $p_2 = f_{12} + f_2$
-- $b_2b_2$         $n - y_2 = X_1 + X$           $1 - p_2 = f_1 + f$       (3.9)

(1) The subscripts of f and X are simply the $\alpha$ designations of (3.1) and (3.2), respectively, omitting the parentheses. The absence of a subscript refers to the empty set, $\alpha = \emptyset$.
(2) Ibid.
The frequency of the B allele in the sample at loci 1 and 2 is $P_1 = y_1/n$ and $P_2 = y_2/n$, respectively. The number of lines that have BB genotypes at both loci 1 and 2 is $W_{12} = X_{12}$ when only two loci are concerned.
The random variable, $W_{ij}$, is a result of the nature of sampling sets of genotypes of size n from a population. Several samples of n genotypes having different distributions of genotypes within each sample can have identical distributions of gene frequencies. Therefore, samples of genotypes can be aggregated into groups in which all samples within the group have the same distribution of gene frequencies but there may be a different distribution of genotypes for each sample. The random variable, $W_{ij}$, is indicative of the differing distributions of genotypes.
In the framework of the previous example, suppose two four-line samples are drawn from the inbred population with the following distribution of genotypes for each sample:

Genotype            Number of genotypes
                    Sample 1         Sample 2
$B_1B_1B_2B_2$      $X_{12} = 1$     $X_{12} = 2$
$B_1B_1b_2b_2$      $X_1 = 2$        $X_1 = 1$
$b_1b_1B_2B_2$      $X_2 = 1$        $X_2 = 0$
$b_1b_1b_2b_2$      $X = 0$          $X = 1$
                    $n = 4$          $n = 4$
In both samples, the marginal values, $y_1$ and $y_2$, are the same; i.e., $y_1 = X_{12} + X_1 = 3$ and $y_2 = X_{12} + X_2 = 2$. Hence, in both samples, $P_1 = 3/4$ and $P_2 = 1/2$. The distribution of genotypes in the two samples is different, but the distribution of gene frequencies is identical. It remains to determine the relevance of $W_{12} = X_{12}$, which is accomplished by considering the multinomial density function associated with the two-locus example. From (3.6),
$$f[X(\alpha)] = \frac{n!}{X_{12}!\,X_1!\,X_2!\,X!}\, f_{12}^{X_{12}} f_1^{X_1} f_2^{X_2} f^{X}. \qquad (3.10)$$
Making the transformation, $W_{12} = X_{12}$, $y_1 = X_{12} + X_1$, $y_2 = X_{12} + X_2$, and recalling that $n = X_{12} + X_1 + X_2 + X$,
$$f(W_{12}, y_1, y_2) = \frac{n!\; p_1^{y_1}(1-p_1)^{n-y_1}\, p_2^{y_2}(1-p_2)^{n-y_2}}{W_{12}!\,(y_1-W_{12})!\,(y_2-W_{12})!\,(n-y_1-y_2+W_{12})!}$$
is a joint density function of $W_{12}$, $y_1$, and $y_2$. Now if the marginal values, $y_1$ and $y_2$, are fixed, which is equivalent to having constant distribution of gene frequencies, the remaining variable is $W_{12}$. Essentially, the distribution of $W_{12}$ must be determined conditional on $y_1$ and $y_2$, which is $f(W_{12} \mid y_1, y_2) = [f(W_{12}, y_1, y_2)]/[f(y_1)f(y_2)]$, since $y_1$ and $y_2$ are mutually independent. Since $f(y_i)$ is binomial, (3.7), the conditional distribution of $W_{12}$ given $y_1$ and $y_2$ is
$$f(W_{12} \mid y_1, y_2) = \frac{y_1!\,y_2!\,(n-y_1)!\,(n-y_2)!}{n!\,W_{12}!\,(y_1-W_{12})!\,(y_2-W_{12})!\,(n-y_1-y_2+W_{12})!},$$
which is the hypergeometric density function.
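The conditional density can be evaluated directly for the four-line example with $y_1 = 3$ and $y_2 = 2$. The sketch below, with the hypothetical helper `cond_pmf_W`, computes the factorial form and compares it with the familiar binomial-coefficient form of the hypergeometric distribution; it is an illustrative check, not part of the original derivation.

```python
from fractions import Fraction
from math import comb, factorial


def cond_pmf_W(w, n, y1, y2):
    """f(W12 = w | y1, y2) written in the factorial form of the text."""
    cells = (w, y1 - w, y2 - w, n - y1 - y2 + w)   # the four 2x2 cell counts
    if min(cells) < 0:
        return Fraction(0)                          # impossible table
    num = factorial(y1) * factorial(y2) * factorial(n - y1) * factorial(n - y2)
    den = factorial(n)
    for c in cells:
        den *= factorial(c)
    return Fraction(num, den)


n, y1, y2 = 4, 3, 2
pmf = {w: cond_pmf_W(w, n, y1, y2) for w in range(0, min(y1, y2) + 1)}

# Same probabilities from the binomial-coefficient form of the hypergeometric.
alt = {w: Fraction(comb(y1, w) * comb(n - y1, y2 - w), comb(n, y2)) for w in pmf}
print(pmf)
```

With these margins only $W_{12} = 1$ and $W_{12} = 2$ are possible, which correspond to the two samples tabulated above.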
The above situation can be illustrated with a 2x2 table, where the cell totals are the numbers of each genotype in the sample and the marginal totals represent the number of homozygous genotypes for each locus as shown below.

              $B_2B_2$          $b_2b_2$
$B_1B_1$      $W_{12}$          $y_1 - W_{12}$            $y_1$
$b_1b_1$      $y_2 - W_{12}$    $n - y_1 - y_2 + W_{12}$  $n - y_1$
              $y_2$             $n - y_2$                 $n$
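When n, $y_1$, and $y_2$ take specified values in the 2x2 table, it is apparent that there is still one degree of freedom left to determine the cell values. The distribution of values is determined by $f(W_{12} \mid y_1, y_2)$, the hypergeometric function.
Extension to more loci in the model introduces more such variables. In fact, for m loci, the 2x2 table becomes m dimensional and there are $\binom{m}{2}$ such hypergeometric variables; i.e., one hypergeometric variable, $W_{ij}$, for each pair of loci.
The extension to four loci, for example, produces the following set of $W_{ij}$ from (3.5):
$W_{12} = X_{12} + X_{123} + X_{124} + X_{1234}$
$W_{13} = X_{13} + X_{123} + X_{134} + X_{1234}$
$W_{14} = X_{14} + X_{124} + X_{134} + X_{1234}$
$W_{23} = X_{23} + X_{123} + X_{234} + X_{1234}$
$W_{24} = X_{24} + X_{124} + X_{234} + X_{1234}$
$W_{34} = X_{34} + X_{134} + X_{234} + X_{1234}$
Then the conditional distribution of any $W_{ij}$ given the set of $y_i$'s, $\mathbf{y} = (y_1, y_2, \ldots, y_m)$, is
$$f(W_{ij} \mid \mathbf{y}) = \frac{y_i!\,y_j!\,(n-y_i)!\,(n-y_j)!}{n!\,W_{ij}!\,(y_i-W_{ij})!\,(y_j-W_{ij})!\,(n-y_i-y_j+W_{ij})!} \qquad (3.11)$$
by consideration of an m-dimensional table similar to the 2x2 tables in two dimensions but summed over all dimensions except i and j to give a 2x2 table.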
Some conditional expectations of interest are
$$\mu_1' = E(W_{ij} \mid \mathbf{y}) = nP_iP_j$$
$$\mu_2 = E[(W_{ij} - nP_iP_j)^2 \mid \mathbf{y}] = \frac{n^2}{n-1} P_i(1-P_i)P_j(1-P_j)$$
$$\mu_3 = E[(W_{ij} - nP_iP_j)^3 \mid \mathbf{y}] = \frac{n^3}{(n-1)(n-2)} P_i(1-P_i)(1-2P_i)P_j(1-P_j)(1-2P_j)$$
$$\mu_4 = E[(W_{ij} - nP_iP_j)^4 \mid \mathbf{y}] = \frac{n^3}{(n-1)(n-2)(n-3)} P_i(1-P_i)P_j(1-P_j)\,[(n+1) - 6nP_i(1-P_i) - 6nP_j(1-P_j) + 3n(n+6)P_i(1-P_i)P_j(1-P_j)]. \qquad (3.12)$$
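The closed forms in (3.12) can be checked against a direct enumeration of the hypergeometric distribution. The sketch below is one such check, with the helper names `central_moments` and `formula_moments` introduced only for illustration; exact fractions avoid any rounding in the comparison, and n is taken greater than 3 so that every denominator is nonzero.

```python
from fractions import Fraction
from math import comb


def central_moments(n, yi, yj):
    """Exact mean and central moments 2-4 of W_ij given y_i, y_j by enumeration."""
    ws = range(max(0, yi + yj - n), min(yi, yj) + 1)
    pmf = {w: Fraction(comb(yi, w) * comb(n - yi, yj - w), comb(n, yj)) for w in ws}
    mean = sum(w * pr for w, pr in pmf.items())
    mom = {r: sum((w - mean) ** r * pr for w, pr in pmf.items()) for r in (2, 3, 4)}
    return mean, mom


def formula_moments(n, yi, yj):
    """Closed forms of (3.12), writing Q = P(1 - P) for brevity."""
    Pi, Pj = Fraction(yi, n), Fraction(yj, n)
    Qi, Qj = Pi * (1 - Pi), Pj * (1 - Pj)
    n = Fraction(n)
    mean = n * Pi * Pj
    m2 = n**2 * Qi * Qj / (n - 1)
    m3 = n**3 * Qi * (1 - 2 * Pi) * Qj * (1 - 2 * Pj) / ((n - 1) * (n - 2))
    m4 = (n**3 * Qi * Qj / ((n - 1) * (n - 2) * (n - 3))
          * ((n + 1) - 6 * n * Qi - 6 * n * Qj + 3 * n * (n + 6) * Qi * Qj))
    return mean, {2: m2, 3: m3, 4: m4}


# Example: n = 5 lines, y_i = 2 and y_j = 2.
print(central_moments(5, 2, 2))
print(formula_moments(5, 2, 2))
```

The two functions agree for every admissible pair of marginal totals, which is a useful guard against transcription slips in the coefficients of the fourth moment.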
The formulas of (3.12) are conditional moments of $W_{ij}$, given the $y_i$, and are obtained from the moments of the hypergeometric distribution shown by Kendall and Stuart (1958).
To designate the conditional expectation of $W_{ij}$, given the $y_i$, the symbol $E_{W/y}$ is used as opposed to the more conventional notation in (3.12). In turn, expectation over the distribution of $\mathbf{y}$ is indicated by the symbol $E_y$. Then total expectation of any function, $g(W, \mathbf{y})$, is expressed as
$E[g(W, \mathbf{y})] = E_y[E_{W/y}\, g(W, \mathbf{y})]$.
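The two-stage expectation can be illustrated with $g = W_{12}$ in the two-locus case. Taking the conditional mean $nP_1P_2$ from (3.12) and averaging it over independent binomial $y_1$ and $y_2$ should reproduce $n p_1 p_2$, the mean of $X_{12}$ under the multinomial (3.6). The sketch below performs that average exactly; the function names and numerical values are assumptions for the example.

```python
from fractions import Fraction
from math import comb


def binom_pmf(y, n, p):
    """Binomial density of (3.7)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)


def total_expectation_W12(n, p1, p2):
    """E_y[ E_{W/y}(W12) ] with E_{W/y}(W12) = n P1 P2 and P_i = y_i / n."""
    total = Fraction(0)
    for y1 in range(n + 1):
        for y2 in range(n + 1):
            inner = n * Fraction(y1, n) * Fraction(y2, n)   # conditional mean
            total += binom_pmf(y1, n, p1) * binom_pmf(y2, n, p2) * inner
    return total


# Hypothetical sample size and population gene frequencies.
n, p1, p2 = 6, Fraction(3, 4), Fraction(1, 3)
print(total_expectation_W12(n, p1, p2), n * p1 * p2)
```

The agreement relies on the independence of $y_1$ and $y_2$, which in the text follows from linkage equilibrium in the parent population.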
In later derivations, upon extension to m > 2 loci, conditional covariances and higher-order product moments of the set of $W_{ij}$ are required. Symbolically,
(3.13)
must be determined for r, s = 1, 2 and $j \neq t$, where i and k may or may not be equal.
It can be shown that the product moment in (3.13) is equal to zero under the specified conditions. The result is demonstrated for r = s = 1 and i = k, the covariance of $W_{ij}$ and $W_{it}$, which is
(3.14)
However, it can be shown that
Therefore, the average conditional covariance of $W_{ij}$ and $W_{it}$ in (3.14) is zero.
For the product moment in (3.13) to be zero it is only necessary to show that
(3.15)
and for r, s = 1, 2 and $j \neq t$, the equality in (3.15) can be demonstrated.
4. ANALYSIS OF THE DIALLEL CROSS
4.1 The Analysis
The diallel cross considered for this problem includes all possible crosses among a sample of n inbred lines, excluding reciprocal crosses, so that there are the n(n-1)/2 $F_1$'s plus the n inbred parents involved. The model used to analyze the results from the $F_1$ cross means in a replicated experiment is
(4.1)
(4.2)
where $y_{qq}$ is the mean of the qth inbred parent, $\mu_I$ is the mean of the population of inbred lines, $g_q$ is defined as in (4.1), and $\bar{e}_{qq}$ is the mean experimental error associated with the $y_{qq}$ observational mean.
Table 4.1. Mean squares and expected mean squares for the analysis of means from a diallel experiment, excluding selfs and reciprocals

Source     df          Mean square      E(mean square)
General    n-1         $MS_{g.c.a}$     $(\sigma_e^2/k) + \sigma^2_{s.c.a} + (n-2)\sigma^2_{g.c.a}$
Specific   n(n-3)/2    $MS_{s.c.a}$     $(\sigma_e^2/k) + \sigma^2_{s.c.a}$
Error      $f_e$       $MSE^a$          $(\sigma_e^2/k)$

$MS_{g.c.a} = \left[\frac{1}{n-2}\sum_q y_{q.}^2 - \frac{4}{n(n-2)}\, y_{..}^2\right] / (n-1)$
$MS_{s.c.a} = \left[\sum_{q<r} y_{qr}^2 - \frac{1}{n-2}\sum_q y_{q.}^2 + \frac{2}{(n-1)(n-2)}\, y_{..}^2\right] / [n(n-3)/2]$

$^a$ MSE is the usual experimental error mean square divided by the number of replications, k.
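To make the computing formulas of Table 4.1 concrete, the following sketch evaluates the general and specific mean squares from a set of $F_1$ means with selfs and reciprocals excluded. It assumes the standard sums of squares for this design, with $y_{q.}$ the array total and $y_{..}$ the sum over $q < r$; the function `diallel_mean_squares` and the data are hypothetical.

```python
from itertools import combinations


def diallel_mean_squares(y, n):
    """MS_g.c.a and MS_s.c.a from F1 means y[(q, r)], q < r, among n parents."""
    tot = {q: 0.0 for q in range(n)}              # array totals y_q.
    for (q, r), v in y.items():
        tot[q] += v
        tot[r] += v
    grand = sum(y.values())                       # y.. = sum over q < r
    s_tot = sum(t * t for t in tot.values())
    ss_g = s_tot / (n - 2) - 4 * grand**2 / (n * (n - 2))
    ss_s = (sum(v * v for v in y.values()) - s_tot / (n - 2)
            + 2 * grand**2 / ((n - 1) * (n - 2)))
    return ss_g / (n - 1), ss_s / (n * (n - 3) / 2)


# Hypothetical purely additive data, y_qr = g_q + g_r, so no s.c.a. is present.
g = [1.0, -0.5, 2.0, 0.25, -1.75]
y = {(q, r): g[q] + g[r] for q, r in combinations(range(5), 2)}
ms_g, ms_s = diallel_mean_squares(y, n=5)
print(ms_g, ms_s)
```

For additive data of this kind the specific mean square vanishes and the general mean square equals (n-2) times the sample variance of the $g_q$, in line with the expected mean squares of Table 4.1 when error is absent.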
The mean square among inbred parents is
$$MSI = \left[\sum_q y_{qq}^2 - \left(\sum_q y_{qq}\right)^2 / n\right] / (n-1). \qquad (4.3)$$
The error variance for the parents, $\sigma_{e_1}^2$, is estimated from a replicated experiment and the estimator is denoted as $MSE_1$. The expectation of the above statistics is then
$$E(MSI) = (\sigma_{e_1}^2/k) + \sigma_I^2, \qquad E(MSE_1) = \sigma_{e_1}^2/k, \qquad (4.4)$$
where $\sigma_I^2$ is the component of variance among inbred lines.
27
Additional information can be obtained from the mean
product between the inbred parents and their offspring,
which is
loIP(l.O) =
where
[~YqqYq.
Yq.=~Yqr
- ~(tYq~q~/qrJ
!(n-l)(n-2),
(4.5)
from (4.1) and
r
E[MP(I.O)]
= aI.O'
(4.6)
where 6 1 . 0 is the covariance of inbred line$ and their progeny.
The means for the n(n-l)/2 Fl's; Yj and the n parents,
YI , are computed in the usual manner.
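The statistics MSI and MP(I.O) of (4.3) and (4.5) can be computed as in the sketch below. The function `msi_and_mp` and the data are hypothetical; the example uses an additive case in which every $F_1$ equals its mid-parent value, so that the parent-offspring mean product should be one-half the mean square among parents.

```python
from itertools import combinations


def msi_and_mp(parents, f1):
    """MSI of (4.3) and MP(I.O) of (4.5) from parent means and F1 means."""
    n = len(parents)
    sum_p = sum(parents)
    msi = (sum(v * v for v in parents) - sum_p**2 / n) / (n - 1)
    array_tot = [0.0] * n                      # y_q. = sum over r of y_qr
    for (q, r), v in f1.items():
        array_tot[q] += v
        array_tot[r] += v
    cross = sum(parents[q] * array_tot[q] for q in range(n))
    mp = (cross - 2 * sum_p * sum(f1.values()) / n) / ((n - 1) * (n - 2))
    return msi, mp


# Hypothetical additive case: each F1 is the mid-parent value.
parents = [4.0, 1.0, 6.5, 3.0, 2.5]
f1 = {(q, r): (parents[q] + parents[r]) / 2 for q, r in combinations(range(5), 2)}
msi, mp = msi_and_mp(parents, f1)
print(msi, mp)
```

The divisor (n-1)(n-2) reflects that each array total $y_{q.}$ shares the qth parent's contribution with n-2 further terms once the correction for the mean is removed.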
4.2 Genetic Interpretations of Diallel Statistics
The expected values of the diallel statistics must be expressed in terms of the genetic parameters of the reference population of interest in order to find unbiased estimators of the genetic parameters. The first step in accomplishing the translation is to express the diallel statistics in terms of the sampling variables of Section 3 and genotypic values to be introduced. These expressions are derived in the remaining portions of Section 4.
The most general model of gene action utilized includes additive, dominance, and additive-by-additive epistatic gene effects. Two special cases of this model also investigated are (i) additive and dominance gene effects and (ii) additive and additive-by-additive gene effects.
Random experimental error is assumed equal to zero, for the present, in all of the derivations, and attention is focused on the genetic components of interest. Even though random error is not included in the model, the mean squares and products are referred to as diallel statistics in Sections 4 and 5. The consequences of adding random experimental error to the model are taken up in Section 6.
The mean squares and products computed for the diallel analysis, expressed in terms of sampling variables and genotypic values, are illustrated for two loci. All possible crosses among the n inbred parents sampled at random result in the n(n-1)/2 $F_1$'s shown in Table 4.2.
Table 4.2
Diallel matings with resulting number and genetic
values of the F1 progeny

Mating type               F1 genotype   Number of F1's    F1 genotypic value
B1B1B2B2 x B1B1B2B2       B1B1B2B2      X12(X12-1)/2      u1 + u2 + t12
         x B1B1b2b2       B1B1B2b2      X12 X1            u1 + a2u2
         x b1b1B2B2       B1b1B2B2      X12 X2            a1u1 + u2
         x b1b1b2b2       B1b1B2b2      X12 X             a1u1 + a2u2
B1B1b2b2 x B1B1b2b2       B1B1b2b2      X1(X1-1)/2        u1 - u2 - t12
         x b1b1B2B2       B1b1B2b2      X1 X2             a1u1 + a2u2
         x b1b1b2b2       B1b1b2b2      X1 X              a1u1 - u2
b1b1B2B2 x b1b1B2B2       b1b1B2B2      X2(X2-1)/2        -u1 + u2 - t12
         x b1b1b2b2       b1b1B2b2      X2 X              -u1 + a2u2
b1b1b2b2 x b1b1b2b2       b1b1b2b2      X(X-1)/2          -u1 - u2 + t12

Genotypic values are assigned following the model of Comstock and
Robinson (1948), with the addition of additive-by-additive
epistatic values to their additive and dominance model.
The symbols, $u_i$ and $a_i$, are those used by Comstock and Robinson (1948).
The factor, $t_{12}$, in the genotypic value introduces into the genetic model additive-by-additive interaction
between loci 1 and 2.
The quantity, $Y_{..} = \sum_{q<r} Y_{qr}$, for the diallel analysis is
obtained from Table 4.2 by multiplying the genotypic value
times the number of each genotype and summing all such
terms, which is
$$Y_{..} = \tfrac{1}{2}X_{12}(X_{12}-1)(u_1+u_2+t_{12}) + \cdots + \tfrac{1}{2}X(X-1)(-u_1-u_2+t_{12})$$
$$= \frac{n(n-1)}{2}\sum_i (2P_i-1)u_i + n^2\sum_i P_i(1-P_i)a_iu_i + \Big[\frac{n(n-1)}{2}(2P_1-1)(2P_2-1) + (n-2)(W_{12}-nP_1P_2)\Big]t_{12}.$$
Hence, the genotypic mean of the diallel F1's is
$$\bar{Y} = \frac{2Y_{..}}{n(n-1)},$$
where $P_i = Y_i/n$ and $X_{12} = W_{12}$ are defined in Section 3. Summation over the i subscript refers to summation over the two
loci.
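The bookkeeping behind the grand total can be checked numerically. The sketch below codes each inbred parent as a pair of +1/-1 values (an assumed encoding, +1 for the B allele), scores every cross with the Table 4.2 genotypic values, and compares the direct total with a closed form in $P_i$ and $W_{12}$. The closed form used here, in particular the coefficient of $t_{12}$, is a reconstruction obtained by summing Table 4.2 and should be read as such; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def f1_value(par_q, par_r, u, a, t12):
    """Genotypic value of the F1 from two inbred parents (Table 4.2 coding)."""
    value = 0
    for sq, sr, ui, ai in zip(par_q, par_r, u, a):
        # Homozygous locus contributes +u_i or -u_i, heterozygous locus a_i * u_i
        value += sq * ui if sq == sr else ai * ui
    if par_q == par_r:
        # Both loci homozygous: additive-by-additive term with sign s1 * s2
        value += par_q[0] * par_q[1] * t12
    return value


def grand_total_direct(parents, u, a, t12):
    """Sum of F1 genotypic values over all n(n-1)/2 crosses."""
    return sum(f1_value(p, q, u, a, t12) for p, q in combinations(parents, 2))


def grand_total_closed(parents, u, a, t12):
    """Closed form of the grand total in terms of P_i and W_12 (reconstruction)."""
    n = len(parents)
    P = [Fraction(sum(1 for p in parents if p[i] == 1), n) for i in range(2)]
    w12 = sum(1 for p in parents if p == (1, 1))
    additive = Fraction(n * (n - 1), 2) * sum((2 * P[i] - 1) * u[i] for i in range(2))
    dominance = n * n * sum(P[i] * (1 - P[i]) * a[i] * u[i] for i in range(2))
    epistatic = (Fraction(n * (n - 1), 2) * (2 * P[0] - 1) * (2 * P[1] - 1)
                 + (n - 2) * (w12 - n * P[0] * P[1])) * t12
    return additive + dominance + epistatic


# Hypothetical sample: X12 = 3, X1 = 2, X2 = 1, X = 2, so n = 8
parents = [(1, 1)] * 3 + [(1, -1)] * 2 + [(-1, 1)] * 1 + [(-1, -1)] * 2
u = (Fraction(1), Fraction(2))
a = (Fraction(1, 2), Fraction(3, 2))
t12 = Fraction(3)
direct = grand_total_direct(parents, u, a, t12)
closed = grand_total_closed(parents, u, a, t12)
```

Exact rational arithmetic is used so that the two totals can be compared with equality rather than a tolerance.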
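To obtain MS_g.c.a, Table 4.1, the sum of squares of progeny
totals, $\sum_q Y_{q.}^2$, is required. The progeny totals, $Y_{q.} = \sum_r Y_{qr}$,
for the four parental genotypes of Table 4.2 are
(i) $(Y_1-1)u_1 + (Y_2-1)u_2 + (n-Y_1)a_1u_1 + (n-Y_2)a_2u_2 + (X_{12}-1)t_{12}$
(ii) $(Y_1-1)u_1 - (n-Y_2-1)u_2 + (n-Y_1)a_1u_1 + Y_2a_2u_2 - (X_1-1)t_{12}$
(iii) $-(n-Y_1-1)u_1 + (Y_2-1)u_2 + Y_1a_1u_1 + (n-Y_2)a_2u_2 - (X_2-1)t_{12}$
(iv) $-(n-Y_1-1)u_1 - (n-Y_2-1)u_2 + Y_1a_1u_1 + Y_2a_2u_2 + (X-1)t_{12}$.
(4.8)
There are $X_{12}$ terms like (i), $X_1$ terms like (ii), $X_2$ terms
like (iii), and $X$ terms like (iv) for the diallel mating.
Recall that $Y_1 = X_{12}+X_1$, $Y_2 = X_{12}+X_2$, and $n = X_{12}+X_1+X_2+X$.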
The mean square is
$$MS_{g.c.a} = \Big[\sum_q Y_{q.}^2/(n-2) - 4Y_{..}^2/n(n-2)\Big]\Big/(n-1)$$
3
n
L Pi (I-Pi) [(n-2)Ui
+ (1-2P i)ai ui
(n-I)(n-2) i
l n
J
2
(n-2)
2
n(n-4)
, 2
n (1-2P j /l)t12, + (n-l)2 Pl(1-Pl)P2(1-P2)t I 2
2
2n
(rn
P p >[(n-2)
+ (n-I)(n-2} "12- n 1 2'
n uI+ (1- 2P I)aIul
(4.9)
Then,
$$MS_{s.c.a} = \Big[\sum_{q<r} Y_{qr}^2 - \frac{1}{(n-2)}\sum_q Y_{q.}^2 + \frac{2}{(n-1)(n-2)}Y_{..}^2\Big]\Big/\frac{n(n-3)}{2}$$
+
(n_l)(n~2)(n_3)1~(W12-nPiPj)[(n-l) - n2Pi(1-Pi)]aiuit12
+'n(n-l)(n~2)(n-3)t(n2-3n+4)[(W12-nPlP2)2
n2
- (n~1)Pl(l-Pl)P2(1-P2)]
- 2n(n-l)(W12-nPlP2)(l-2Pl)(1-2P2)Jt~2'
(4.10)
The values required to obtain the statistic, MSI, (4.3), namely the number and genotypic value of the inbred parents, are
listed in Table 4.3.

Table 4.3
Genotypic value and number of inbred parents
utilized in the diallel cross

Genotype      Number of genotypes    Genotypic value
B1B1B2B2      X12                    u1 + u2 + t12
B1B1b2b2      X1                     u1 - u2 - t12
b1b1B2B2      X2                     -u1 + u2 - t12
b1b1b2b2      X                      -u1 - u2 + t12
Obtaining totals and sums of squares in the usual manner,
$$MSI = \Big[\sum_q Y_{qq}^2 - \Big(\sum_q Y_{qq}\Big)^2/n\Big]\Big/(n-1)$$
$$= \frac{4n}{(n-1)}\sum_i P_i(1-P_i)\big[u_i - (1-2P_{j\neq i})t_{12}\big]^2 + \frac{16n(n-2)}{(n-1)^2}P_1(1-P_1)P_2(1-P_2)t_{12}^2$$
$$+ \frac{8}{(n-1)}(W_{12}-nP_1P_2)\big[u_1u_2 + (1-2P_1)u_1t_{12} + (1-2P_2)u_2t_{12} - (1-2P_1)(1-2P_2)t_{12}^2\big]$$
$$+ \frac{16}{n(n-1)}\Big[\frac{n^2}{(n-1)}P_1(1-P_1)P_2(1-P_2) - (W_{12}-nP_1P_2)^2\Big]t_{12}^2. \tag{4.11}$$
The covariance of the inbred parents and their F1 progeny,
MP(I.O), is obtained in a manner similar to the mean squares except that cross products are used instead of squares, which is
$$MP(I.O) = \Big[\sum_q Y_{qq}Y_{q.} - \frac{2}{n}\Big(\sum_q Y_{qq}\Big)\Big(\sum_{q<r} Y_{qr}\Big)\Big]\Big/(n-1)(n-2)$$
$$= \frac{2n}{(n-1)}\sum_i P_i(1-P_i)\big[u_i - (1-2P_{j\neq i})t_{12}\big]^2$$
$$+ \frac{2n^2}{(n-1)(n-2)}\sum_i P_i(1-P_i)(1-2P_i)\big[u_i - (1-2P_{j\neq i})t_{12}\big]a_iu_i$$
$$+ \frac{4n(n-4)}{(n-1)^2}P_1(1-P_1)P_2(1-P_2)t_{12}^2 + \frac{4(n-4)}{n(n-1)(n-2)}\Big[\frac{n^2}{(n-1)}P_1(1-P_1)P_2(1-P_2) - (W_{12}-nP_1P_2)^2\Big]t_{12}^2$$
$$- \frac{4}{(n-1)}(W_{12}-nP_1P_2)(1-2P_1)(1-2P_2)t_{12}^2 + \frac{2(n-4)}{(n-1)(n-2)}\sum_i (W_{12}-nP_1P_2)(1-2P_i)u_it_{12}$$
$$+ \frac{2n}{(n-1)(n-2)}\sum_i (W_{12}-nP_1P_2)(1-2P_i)^2a_iu_it_{12} + \frac{4}{(n-1)}(W_{12}-nP_1P_2)u_1u_2$$
$$+ \frac{2n}{(n-1)(n-2)}\sum_{i\neq j}(1-2P_j)(W_{12}-nP_1P_2)u_ia_ju_j. \tag{4.12}$$
Finally, the mean of the inbred parents is
$$\bar{Y}_I = \sum_i (2P_i-1)u_i + \Big[(2P_1-1)(2P_2-1) + \frac{4}{n}(W_{12}-nP_1P_2)\Big]t_{12}. \tag{4.13}$$
The genetic model assuming the absence of dominance is
readily investigated in the framework of this section by setting $a_i = 0$ in all the formulas.
Likewise, in the absence of epistasis, $t_{12} = 0$ in all the formulas.
In the absence of epistasis, the gene model can be extended to include an arbitrary number of loci, m, since the two-locus model is sufficient
to illustrate the most general expressions of the
diallel statistics.
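The diallel statistics become, upon letting $t_{12} = 0$ and
extending to m loci,
$$\bar{Y} = \sum_i (2P_i-1)u_i + \frac{2n}{(n-1)}\sum_i P_i(1-P_i)a_iu_i$$
$$\bar{Y}_I = \sum_i (2P_i-1)u_i$$
$$MS_{g.c.a} = \frac{n^3}{(n-1)(n-2)}\sum_i P_i(1-P_i)\Big[\frac{(n-2)}{n} + (1-2P_i)a_i\Big]^2u_i^2$$
$$+ \frac{2n^2}{(n-1)(n-2)}\sum_{i<j}(W_{ij}-nP_iP_j)\Big[\frac{(n-2)}{n} + (1-2P_i)a_i\Big]\Big[\frac{(n-2)}{n} + (1-2P_j)a_j\Big]u_iu_j$$
$$MS_{s.c.a} = \frac{4n}{(n-1)(n-2)(n-3)}\sum_i P_i(1-P_i)\big[n^2P_i(1-P_i) - (n-1)\big]a_i^2u_i^2$$
$$+ \frac{8}{n(n-3)}\sum_{i<j}\Big[(W_{ij}-nP_iP_j)^2 - \frac{n^2}{(n-1)}P_i(1-P_i)P_j(1-P_j) - \frac{n}{(n-2)}(W_{ij}-nP_iP_j)(1-2P_i)(1-2P_j)\Big]a_iu_ia_ju_j$$
$$MSI = \frac{4n}{(n-1)}\sum_i P_i(1-P_i)u_i^2 + \frac{8}{(n-1)}\sum_{i<j}(W_{ij}-nP_iP_j)u_iu_j$$
$$MP(I.O) = \frac{2n^2}{(n-1)(n-2)}\sum_i P_i(1-P_i)\Big[\frac{(n-2)}{n} + (1-2P_i)a_i\Big]u_i^2 + \frac{2n}{(n-1)(n-2)}\sum_{i\neq j}(W_{ij}-nP_iP_j)(1-2P_j)u_ia_ju_j$$
$$+ \frac{4}{(n-1)}\sum_{i<j}(W_{ij}-nP_iP_j)u_iu_j. \tag{4.14}$$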
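A numerical cross-check of the no-epistasis expressions may be useful. The sketch below computes MS_g.c.a, MSI, and MP(I.O) directly from an explicit set of inbred parents and compares them with closed forms in $P_i$ and $W_{ij}$. The closed forms coded here are a reading of (4.14) and should be treated as assumptions to be checked, not as the authors' program; parents are coded as tuples of +1/-1 and all names are hypothetical. Values of u and a are passed as `Fraction` so that the comparison is exact.

```python
from fractions import Fraction
from itertools import combinations


def direct_statistics(parents, u, a):
    """MS_gca, MSI and MP(I.O) computed from the genotypic values themselves."""
    n, m = len(parents), len(u)

    def value(p, q):
        # Homozygous locus: +/- u_i; heterozygous locus: a_i * u_i
        return sum(p[i] * u[i] if p[i] == q[i] else a[i] * u[i] for i in range(m))

    y = [[value(p, q) for q in parents] for p in parents]  # diagonal = parents
    arrays = [sum(y[q][r] for r in range(n) if r != q) for q in range(n)]
    grand = sum(y[q][r] for q, r in combinations(range(n), 2))
    par = [y[q][q] for q in range(n)]
    ms_gca = (sum(t * t for t in arrays) / (n - 2)
              - 4 * grand ** 2 / (n * (n - 2))) / (n - 1)
    msi = (sum(v * v for v in par) - sum(par) ** 2 / n) / (n - 1)
    mp_io = (sum(v * t for v, t in zip(par, arrays))
             - 2 * sum(par) * grand / n) / ((n - 1) * (n - 2))
    return ms_gca, msi, mp_io


def closed_forms(parents, u, a):
    """The same three statistics from P_i and W_ij (assumed reading of (4.14))."""
    n, m = len(parents), len(u)
    P = [Fraction(sum(1 for p in parents if p[i] == 1), n) for i in range(m)]

    def delta(i, j):
        # W_ij - n P_i P_j, where W_ij counts parents carrying B at both loci
        w = sum(1 for p in parents if p[i] == 1 and p[j] == 1)
        return w - n * P[i] * P[j]

    k = [Fraction(n - 2, n) + (1 - 2 * P[i]) * a[i] for i in range(m)]
    pairs = list(combinations(range(m), 2))
    c = Fraction(1, (n - 1) * (n - 2))
    ms_gca = (n ** 3 * c * sum(P[i] * (1 - P[i]) * k[i] ** 2 * u[i] ** 2 for i in range(m))
              + 2 * n * n * c * sum(delta(i, j) * k[i] * k[j] * u[i] * u[j] for i, j in pairs))
    msi = (Fraction(4 * n, n - 1) * sum(P[i] * (1 - P[i]) * u[i] ** 2 for i in range(m))
           + Fraction(8, n - 1) * sum(delta(i, j) * u[i] * u[j] for i, j in pairs))
    mp_io = (2 * n * n * c * sum(P[i] * (1 - P[i]) * k[i] * u[i] ** 2 for i in range(m))
             + 2 * n * c * sum(delta(i, j) * (k[i] + k[j]) * u[i] * u[j] for i, j in pairs))
    return ms_gca, msi, mp_io


# Hypothetical sample of six inbred parents at two loci
sample = [(1, 1), (1, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
u_vals = (Fraction(1), Fraction(2))
a_vals = (Fraction(1, 2), Fraction(-1, 4))
from_data = direct_statistics(sample, u_vals, a_vals)
from_formula = closed_forms(sample, u_vals, a_vals)
```

Agreement for one sample is not a proof, but a mismatch would flag a transcription error in a coefficient.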
5.
ESTIMATION OF POPULATION PARAMETERS
5.1
General Remarks
For the present investigation, unbiased estimators are
to be obtained from the diallel analysis for genetic parameters of two separate reference populations.
One reference population is the random mating population from which the
diallel crosses are a random sample.
The second is a random
mating population derived wholly from the diallel parents.
The genetic parameters of interest in the populations are
the mean and the partitions of the total genetic variance.
Estimators are derived from the diallel analysis by
equating the diallel statistics to their expectations, which
are given in terms of the genetic parameters.
Solutions of the equations for the genetic parameters in terms of the
diallel statistics are taken as the estimators.
The partitioning of genetic variance for the additive,
dominance, and additive-by-additive genetic model is given
below for a random mating population in equilibrium for two
loci each with two alleles, as outlined by Cockerham (1954).
An analysis of the population produces the following set of
genetic parameters:
(i) population mean
$$\mu = \sum_i (2p_i-1)u_i + 2\sum_i p_i(1-p_i)a_iu_i + (2p_1-1)(2p_2-1)t_{12}$$
(ii) additive genetic variance
(iii) dominance genetic variance
(iv) additive-by-additive genetic variance
(5.1)
The symbols $a_i$, $u_i$, and $t_{12}$ represent the genotypic values
as assigned to the genotypes in Table 4.2.
The $p_i$ represents the frequency of the positive allele, $B_i$, at the ith
locus.
5.2
Parent Population Estimators
The results for estimation of genetic parameters in the
parent random mating population have been given for a general gene model by Matzinger and Kempthorne (1956) for the
modified diallel experiment, and by Kempthorne (1956) for
the complete diallel.
In the context of the present problem, the most general
gene model considered consists of additive, dominance, and
additive-by-additive epistasis at two loci, each with two
alleles.
The mean and genetic variances of the parent population are those shown in (5.1) with the gene frequencies
shown in (5.1); i.e., the frequency of the positive allele,
$B_i$, at the ith locus is $p_i$.
The expectations of the diallel statistics in terms of
the parent population parameters of (5.1) are shown below in
(5.2).
The detailed expectations are given in Section 9.1.
E(y)
E(MS
s.c.a
=
)J
)
E(MSI)
(5.2)
where
(5.3)
The quantity, $\mu_I$, is the mean of the population of completely inbred lines derived from the random mating population.
The quantities, D and F, in (5.3) are related to the parameters, D and F, defined by Hayman (1960) in the absence of
epistasis.
If there is no dominance or if gene frequencies
are one-half, then $F = 0$ and $D = 2\sigma_A^2$.
Also, if there is no dominance, $\mu = \mu_I$.
Examination of the statistics and their expectations reveals that only $\mu$ and $\mu_I$ can be estimated unbiasedly from the analysis.
The estimators are
$$\hat{\mu} = \bar{Y}, \qquad \hat{\mu}_I = \bar{Y}_I. \tag{5.4}$$
The genetic variances, $\sigma_A^2$, $\sigma_D^2$, and $\sigma_{AA}^2$, cannot be estimated
unbiasedly from the F1 analysis since there are two equations and three unknowns if the mean squares are equated to
their expectations.
Addition of MSI and MP(I.O) to the analysis introduces
a like number of additional parameters, D and F.
However, if gene frequencies are one-half, the inclusion of either
MSI or MP(I.O) in the analysis allows unbiased estimation of
$\sigma_A^2$, $\sigma_D^2$, and $\sigma_{AA}^2$ since $F = 0$ and $D = 2\sigma_A^2$ when $p_i = 1/2$.
The results are in agreement with those obtained by
Matzinger and Kempthorne (1956) in that it is not possible
to estimate unbiasedly the genetic components of variance
with a single diallel experiment at one level of inbreeding
in the presence of epistasis.
The results obtained show
that the present gene model is sufficient to indicate that
inclusion of more epistatic effects in the model only increases the difficulties of estimation from the diallel
analysis.
In the absence of epistasis, there is a change of definition for the genetic parameters in (5.1) associated with
the reference population and a change in the expected values
of the statistics associated with the diallel analysis.
The population parameters in the absence of epistasis are obtained from (5.1) by setting $t_{12} = 0$ and extending the model
to include m loci. The expectations of the diallel statistics are obtained in the manner described earlier and demonstrated for the more complete genetic model in Section 9.1.
The resulting expectations are those given in (5.2) with the
$\sigma_{AA}^2$ terms omitted.
The unbiased estimators of $\mu$, $\sigma_A^2$, and $\sigma_D^2$ in the absence of
epistasis are
$$\hat{\mu} = \bar{Y}$$
$$\hat{\sigma}_A^2 = \frac{2}{(n-2)}(MS_{g.c.a} - MS_{s.c.a})$$
$$\hat{\sigma}_D^2 = MS_{s.c.a}. \tag{5.5}$$
These results are well known and have been presented by Matzinger and Kempthorne (1956).
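As a small illustration of (5.5), the estimators in the absence of epistasis amount to the following arithmetic on the two F1 mean squares. The function name is hypothetical and the mean squares are assumed to be free of experimental error, as in this section.

```python
def parent_population_estimates(ms_gca, ms_sca, n):
    """Estimates of additive and dominance variance from (5.5), no epistasis.

    ms_gca, ms_sca: genetic portions of the g.c.a. and s.c.a. mean squares
    n: number of inbred parents in the diallel (n >= 4)
    """
    if n < 4:
        raise ValueError("the diallel analysis needs n >= 4 parents")
    sigma2_a = 2.0 * (ms_gca - ms_sca) / (n - 2)
    sigma2_d = ms_sca
    return sigma2_a, sigma2_d


# Hypothetical mean squares from a 6-parent diallel
est_a, est_d = parent_population_estimates(10.0, 2.0, 6)
```

Because the additive estimate is a difference of mean squares, it can be negative in a particular sample even though it is unbiased.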
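In addition, the statistics computed from the parental
information provide estimators for $\mu_I$, D, and F, which are
$$\hat{\mu}_I = \bar{Y}_I, \qquad \hat{D} = MSI, \qquad \hat{F} = 2\,MSI - 4\,MP(I.O). \tag{5.6}$$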
Equivalent results were presented by Hayman (1960) for the
estimators of D and F.
In the absence of dominance, the genetic parameters of
interest are $\mu$, $\sigma_A^2$, and $\sigma_{AA}^2$.
There will be, of course, no dominance genetic variance.
The genetic parameters of the
random mating population in the absence of dominance can be
obtained from those in (5.1) by setting $a_i = 0$.
The expectations of the diallel statistics are those given in (5.2)
with $\sigma_D^2 = F = 0$ and $D = 2\sigma_A^2$.
The unbiased estimators of $\mu$, $\sigma_A^2$, and $\sigma_{AA}^2$, from the F1
analysis in the absence of dominance, are
$$\hat{\mu} = \bar{Y}$$
$$\hat{\sigma}_A^2 = \frac{2}{(n-2)}(MS_{g.c.a} - MS_{s.c.a}) - MS_{s.c.a}$$
$$\hat{\sigma}_{AA}^2 = 2\,MS_{s.c.a}. \tag{5.7}$$
Since the expectations of MSI and MP(I.O) contain $\sigma_A^2$ and $\sigma_{AA}^2$,
it is possible to include one or both statistics in the
analysis to obtain least squares solutions for estimators of
$\sigma_A^2$ and $\sigma_{AA}^2$.
Also, the mean of the inbred parents is an unbiased estimator of $\mu$
in the absence of dominance.
5.3
Derived Population Estimators
5.3.1
General Remarks.
In this section, the diallel
estimators are obtained for genetic parameters of the random
mating population derived entirely from the completely inbred parents of the diallel cross, referred to as the
derived population.
The gene frequencies of the derived population are identical with the gene frequencies of the set of
inbred lines from which the population was derived in the
absence of forces that change gene frequency.
Ordinarily, the estimators for the derived population
parameters are obtained by equating the diallel statistics
to their expectations in terms of the derived population genetic parameters.
Then solutions for the genetic parameters
in terms of the diallel statistics are taken as the estimators.
However, only the conditional expectations of the
diallel statistics are used, since we are concerned only
with those samples that give rise to the same derived
population.
Such estimators are considered to be conditionally
unbiased.
However, the average values of the derived population
parameters can be expressed as linear functions of the parent population parameters.
Since the expectations of the
statistics are known in terms of the parent population parameters, it is most convenient to make use of the linear
relationships of the two sets of parameters in solving for
unbiased estimators of the derived population parameters.
A proof that these estimators are identical to those obtained by taking conditional expectations follows.
Let S be the vector of diallel statistics.
The conditional expectation of S is
$$E_{W/Y}(S) = M\theta^*, \tag{5.8}$$
where M is a nonsingular square matrix whose elements are
functions of n, and $\theta^*$ is the vector of derived population
parameters defined such that (5.8) is true.
The conditionally unbiased estimator of $\theta^*$ is then
$$\hat{\theta}^* = M^{-1}S. \tag{5.9}$$
The average values of the derived population parameters are
$$E_Y(\theta^*) = N\theta_p, \tag{5.10}$$
where N is a nonsingular square matrix whose elements are
functions of n, and $\theta_p$ is the vector of parent population
parameters.
The unbiased estimator of $E_Y(\theta^*)$ is then defined as
$$\tilde{\theta}^* = N\hat{\theta}_p. \tag{5.11}$$
It must be shown that $\tilde{\theta}^* = \hat{\theta}^*$.
The unconditional expectation of S is
$$E(S) = E_Y[E_{W/Y}(S)] = M E_Y(\theta^*) = MN\theta_p, \tag{5.12}$$
using (5.8) and (5.10).
Then the unbiased estimator of the
vector of parent population parameters $\theta_p$ is
$$\hat{\theta}_p = (MN)^{-1}S. \tag{5.13}$$
From (5.11), the unbiased estimator of $E_Y(\theta^*)$ is
$$\tilde{\theta}^* = N\hat{\theta}_p = N(MN)^{-1}S = NN^{-1}M^{-1}S = M^{-1}S,$$
using (5.13).
Hence $\tilde{\theta}^* = \hat{\theta}^*$, which was to be shown.
The vectors of population parameters, $\theta^*$ and $\theta_p$, can be
modified to the genetic model assumed, but they are restricted to the same number of elements as contained in S.
The matrix N is found from the relationship in (5.10) and
can be used in conjunction with $(MN)^{-1}$ in (5.13) to determine $M^{-1}$.
The matrix $(MN)^{-1}$ is known for very general genetic models from previous results on estimation of parent
population parameters.
However, this method does not allow
determination of the exact variances of the estimators.
To obtain their exact variances, the estimators must be derived
using conditional expectations of the diallel statistics as
shown in (5.8).
Since exact variances are desired, the
estimators are found using conditional expectations in the
following section, which necessarily restricts the gene
model to two alleles.
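The matrix argument of this section can be illustrated with a small exact computation. The 2 x 2 matrices below are arbitrary nonsingular stand-ins, not the M and N of the diallel analysis; the sketch only shows that $N(MN)^{-1}$ reproduces $M^{-1}$, so the two routes to the derived population estimator coincide. The helper names are hypothetical.

```python
from fractions import Fraction as Fr


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Arbitrary nonsingular stand-ins for M (conditional expectations)
# and N (averages of derived parameters); purely illustrative values.
M = [[Fr(5, 4), Fr(1, 3)], [Fr(0), Fr(2)]]
N = [[Fr(4, 5), Fr(0)], [Fr(1, 10), Fr(3, 5)]]
S = [[Fr(7)], [Fr(3)]]  # a column vector of observed statistics

theta_parent = matmul(inverse_2x2(matmul(M, N)), S)   # estimator as in (5.13)
theta_via_parent = matmul(N, theta_parent)            # N times the parent estimator
theta_conditional = matmul(inverse_2x2(M), S)         # M^{-1} S
m_inverse_from_n = matmul(N, inverse_2x2(matmul(M, N)))
```

Using exact fractions avoids any question of rounding when the two estimators are compared.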
5.3.2
Genetic Model with Two Alleles.
For the case of
two alleles, the genetic parameters of interest in the derived population are those shown in (5.1), where the
frequencies of the two alleles at the ith locus are $P_i$ and
$1-P_i$.
Therefore, for two loci, the derived population parameters are
(i) population mean,
(ii) additive genetic variance,
(iii) dominance genetic variance,
(iv) additive-by-additive genetic variance,
(5.14)
where the asterisk in (5.14) distinguishes the derived population parameters from the parent population parameters.
The estimators for the derived population parameters
are obtained by equating the diallel statistics to their expectations, which are given in terms of the derived population genetic parameters.
Solutions for the genetic parameters in terms of the diallel statistics are taken as the
estimators.
However, only the conditional expectations of
the diallel statistics are used since we are concerned only
with those samples that give rise to the same derived population.
The conditional expectations for the diallel statistics
of Section 4.2 are shown in Section 9.1.
The conditional
expectation of the mean of the F1's is
$$E_{W/Y}(\bar{Y}) = \frac{n}{(n-1)}\mu^* - \frac{1}{(n-1)}\mu_I^*, \tag{5.15}$$
where $\mu_I^* = \sum_i (2P_i-1)u_i + (2P_1-1)(2P_2-1)t_{12}$ is the mean of the
population of completely inbred lines obtained from the derived population.
The coefficient of $\mu^*$ illustrates an increase in the amount of heterozygosis in the derived population relative to that of the parent population.
The conditional expectation of the mean of the diallel
parents is
$$E_{W/Y}(\bar{Y}_I) = \mu_I^*. \tag{5.16}$$
The conditional expectations of the mean squares and
product of the diallel analysis are
2
n
*'
- (n-2) n + 2(n-l) (n-2)F*
n
+
=
EW/y[MP(I.O)]
n
n*' _
n
iF*
(n-2) (n-3)
(n-2) (n-3)
n D*
4n(n-2)~*
(n-l)
+ (n-l)2 AA
n n* n(n-4) 6¥*
2 (n-l)
+ (n-l) 2 AA
-
2
n
iF*
4 (n-l) (n-2)
(5.17)
The parameters, $\sigma_A^{2*}$, $\sigma_D^{2*}$, and $\sigma_{AA}^{2*}$, are shown in (5.14). $D^*$
and $F^*$ are the derived population equivalents to D and F
shown in (5.3).
Solving (5.15) and (5.16) for $\mu_I^*$ and $\mu^*$, the following
unbiased estimators are obtained:
$$\hat{\mu}^* = \frac{(n-1)}{n}\bar{Y} + \frac{1}{n}\bar{Y}_I, \qquad \hat{\mu}_I^* = \bar{Y}_I. \tag{5.18}$$
Observation of equations (5.17) reveals that there are
four equations in five unknowns, which precludes obtaining
unbiased estimators of any of the genetic components of
variance except with gene frequencies of one-half.
In that case, $F^* = 0$ and $D^* = 2\sigma_A^{2*}$, and inclusion of either MSI or
MP(I.O) allows unbiased estimation of the genetic components
of variance, a situation analogous to that encountered for
estimation of components of genetic variance in the parent
population.
The additive and dominance gene model in the absence of
epistasis is considered by setting $t_{12} = 0$ in all formulas
and extending to m loci.
In the absence of epistasis, the
conditional expectations of the diallel statistics are those
shown in (5.15) through (5.17) with the $\sigma_{AA}^{2*}$ terms omitted.
Setting the statistics equal to their conditional expectations and solving for the parameters gives the following set of estimators for the derived population parameters.
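$$\hat{\mu}^* = \frac{(n-1)}{n}\bar{Y} + \frac{1}{n}\bar{Y}_I$$
$$\hat{\sigma}_A^{2*} = \frac{2(n-1)(n-2)}{n^3}MS_{g.c.a} + \frac{2(n-1)^2}{n^3}\hat{D} - \frac{(n-1)(n-2)}{n^3}\hat{F}$$
$$\hat{\sigma}_D^{2*} = \frac{(n-1)(n-3)}{n(n-2)}MS_{s.c.a} + \frac{4(n-1)^2}{n^3(n-2)}MS_{g.c.a} - \frac{(n-1)^2}{n^3}\hat{D} + \frac{(n-1)^2}{n^3}\hat{F}$$
$$\hat{D}^* = \frac{(n-1)}{n}\hat{D}$$
$$\hat{F}^* = \frac{(n-1)(n-2)}{n^2}\hat{F}. \tag{5.19}$$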
It is important to realize that the unbiased estimators
for derived population parameters are unbiased over those
diallel samples that lead to the same derived population,
i.e., those diallel samples having the same set of $P_i$, which
is quite different from obtaining unbiased estimators from a
fixed sample for its specific derived population.
If for the additive and dominance model the parental
analysis is ignored, one set of estimators for $\mu^*$, $\sigma_A^{2*}$, and
$\sigma_D^{2*}$ obtained from the analysis of the F1's is
$$(\hat{\sigma}_A^{2*})_b = \frac{2(n-1)(n-2)}{n^3}MS_{g.c.a}$$
$$(\hat{\sigma}_D^{2*})_b = \frac{(n-1)(n-2)(n-3)}{n^3}MS_{s.c.a}. \tag{5.20}$$
However, there is a bias associated with each of the estimators in (5.20).
The average bias for each of the estimators
is
$$\text{Bias}[(\hat{\sigma}_A^{2*})_b] = -\frac{2(n-1)^2}{n^3}D + \frac{(n-1)(n-2)}{n^3}F$$
$$\text{Bias}[(\hat{\sigma}_D^{2*})_b] = -\frac{4(n-1)^2}{n^3}\sum_i p_i(1-p_i)a_i^2u_i^2. \tag{5.21}$$
The set of biased estimators presented in (5.20) is one of
many possible sets of biased estimators available from the
F1 analysis.
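The bias of the F1-only estimator of dominance variance can be seen directly for one locus. The sketch below builds the F1 values for a hypothetical set of inbred parents (coded +1 for BB, -1 for bb), computes the s.c.a. mean square, applies the multiplier $(n-1)(n-2)(n-3)/n^3$, and compares the result with the derived population dominance variance $4P^2(1-P)^2a^2u^2$. The conditional bias $-4(n-1)P(1-P)a^2u^2/n^2$ asserted here is a consequence of the single-locus algebra, stated as an assumption; function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def ms_sca_single_locus(signs, u, a):
    """s.c.a. mean square of a one-locus diallel among inbred parents."""
    n = len(signs)
    pairs = list(combinations(range(n), 2))
    # F1 value: +/- u when the parents agree, a*u for the heterozygote
    y = {(q, r): (signs[q] * u if signs[q] == signs[r] else a * u) for q, r in pairs}
    arrays = [sum(v for (q, r), v in y.items() if k in (q, r)) for k in range(n)]
    grand = sum(y.values())
    ss = (sum(v * v for v in y.values()) - sum(t * t for t in arrays) / (n - 2)
          + 2 * grand ** 2 / ((n - 1) * (n - 2)))
    return ss / Fraction(n * (n - 3), 2)


def biased_dominance_estimate(ms_sca, n):
    """F1-only estimator of the derived population dominance variance."""
    return Fraction((n - 1) * (n - 2) * (n - 3), n ** 3) * ms_sca


signs = [1, 1, 1, -1, -1, -1, -1, -1]      # n = 8 parents, P = 3/8
n = len(signs)
u, a = Fraction(1), Fraction(1, 2)
P = Fraction(signs.count(1), n)
Q = P * (1 - P)
sigma2_d_star = 4 * Q ** 2 * a ** 2 * u ** 2
estimate = biased_dominance_estimate(ms_sca_single_locus(signs, u, a), n)
conditional_bias = estimate - sigma2_d_star
```

Averaging the conditional bias over the binomial distribution of the sample gene frequency replaces $P(1-P)$ by $(n-1)p(1-p)/n$.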
In the absence of dominance, the genetic parameters of
the derived population are found from (5.14) by letting $a_i = 0$
in all of the formulas.
Then for the additive and additive-by-additive epistatic models, the conditional expectations
of the diallel statistics are those shown in (5.15) through
(5.17) with $D^* = 2\sigma_A^{2*}$, $\sigma_D^{2*} = F^* = 0$, and $\mu_I^* = \mu^*$.
Using only the
statistics from the F1 analysis, unbiased estimators for $\mu^*$,
$\sigma_A^{2*}$, and $\sigma_{AA}^{2*}$ are
$$\hat{\mu}^* = \bar{Y}$$
$$\hat{\sigma}_A^{2*} = \frac{2(n-1)}{n(n-2)}MS_{g.c.a} - \frac{(n-1)(n-4)^2}{n^2(n-2)}MS_{s.c.a}$$
$$\hat{\sigma}_{AA}^{2*} = \frac{2(n-1)^2}{n^2}MS_{s.c.a}. \tag{5.22}$$
The parental analysis can be included to aid in the estimation of $\mu^*$, $\sigma_A^{2*}$, and $\sigma_{AA}^{2*}$ by using a least squares estimation
procedure.
For completeness, consider the additive genetic model
in the absence of dominance and epistasis by setting $t_{12} = 0$
and $a_i = 0$ in all the formulas.
The conditional expectations
of the diallel statistics are those given in (5.15) through
(5.17) with $D^* = 2\sigma_A^{2*}$, $\sigma_D^{2*} = \sigma_{AA}^{2*} = F^* = 0$, and $\mu_I^* = \mu^*$. The unbiased
estimators of $\mu^*$ and $\sigma_A^{2*}$ from the F1 analysis are
$$\hat{\mu}^* = \bar{Y}, \qquad \hat{\sigma}_A^{2*} = \frac{2(n-1)}{n(n-2)}MS_{g.c.a}. \tag{5.23}$$
5.3.3
Extension to Multiple Alleles.
In this section,
the results of Section 5.3.1 are used to show that the estimators obtained in Section 5.3.2 with two alleles do not
change with the extension to a multiple allelic system.
The result is illustrated for the additive and dominance genetic
model with an arbitrary number of loci, each with an arbitrary number of alleles.
The estimators for parent population genetic variances
have been presented by Matzinger and Kempthorne (1956) and
Griffing (1956) with the extension to an arbitrary number of
alleles.
The number of alleles did not affect their results
on estimation from the diallel experiment.
The present extension requires a change to the genetic
notation, which is used below.
Kempthorne (1954, 1957) described the random mating
population for one locus and s alleles with genotypic array,
$$\sum_{i,j=1}^{s} p_ip_jB_iB_j.$$
The genotypic value of $B_iB_j$ is denoted by $Z_{ij}$, which is
equal to $Z_{ji}$, the genotypic value of $B_jB_i$.
The effects of
the alleles, $B_1, B_2, \ldots, B_s$, at a locus are $\alpha_1, \alpha_2, \ldots, \alpha_s$, respectively. Now $Z_{ij} = \mu + \alpha_i + \alpha_j + d_{ij}$, where $\alpha_i$ and $\alpha_j$ are the
additive effects of the i and j alleles and $d_{ij}$ is the dominance deviation.
Also $\alpha_i = \sum_j p_jZ_{ij} - \mu$.
The genetic parameters for the population with one locus are given as
(i) mean,
(ii) additive genetic variance,
(iii) dominance genetic variance,
62
D
=
L p. p .z~. - p2.
i,j 1 J 1J
"(5.24)
where summation is over the s alleles.
Similarly, the population of inbred lines derived from
the random mating population by inbreeding without selection
will have the following mean and variance.
(i) mean
(ii) variance
(5.25)
The extension to m loci is accomplished by summing all
parameters for m loci; i.e., the mean for locus m in the
random mating population is $\mu(m) = \sum_{i,j} p_i(m)p_j(m)Z_{ij}(m)$ and the
mean for all loci is
$$\sum_m \mu(m) = \sum_m \Big[\sum_{i,j} p_i(m)p_j(m)Z_{ij}(m)\Big].$$
The derived population parameters are those shown in (5.24) upon
substitution of proper gene frequencies.
The procedure used to obtain the estimators for derived
population variances is outlined in Section 5.3.1.
The variances of the derived population are averaged over all derived
populations; i.e., the expected values of the derived population variances are taken with respect to $P_i$ in order to obtain the elements of the N matrix, (5.10).
The average
values are linear combinations of parent population variances for which estimators from the diallel analysis are obtained.
As before, $P_i = Y_i/n$, where $Y_i$ is now multinomially
rather than binomially distributed.
Expectation of $\sigma_A^{2*}$ and $\sigma_D^{2*}$ yields
$$E_Y(\sigma_A^{2*}) = \frac{2(n-1)(n-2)}{n^3}\Big[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2\Big] + \frac{2(n-1)^2}{n^3}D - \frac{(n-1)(n-2)}{n^3}F$$
and
$$E_Y(\sigma_D^{2*}) = \frac{(n-1)(n-3)}{n(n-2)}\sigma_D^2 + \frac{4(n-1)^2}{n^3(n-2)}\Big[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2\Big] - \frac{(n-1)^2}{n^3}D + \frac{(n-1)^2}{n^3}F.$$
Upon proper substitution of estimators for parent population
parameters from (5.5), the estimators for $\sigma_A^{2*}$ and $\sigma_D^{2*}$ are
identical to the estimators found for the two-allele case in
Section 5.3.2.
Results on the estimation of parent population parameters and the brief presentation in this section lead to the
speculation that the number of alleles does not affect the
form of the estimators for derived population parameters.
5.4
Discussion
Both the similarities and the differences associated
with the estimation of genetic parameters in the two reference bases considered for the diallel experiment are of interest.
The basic similarity is the generality of genetic
model one can assume for purposes of estimation.
In both cases, the genetic parameters can be estimated unbiasedly
only in the absence of epistasis or in the absence of dominance.
In the presence of both dominance and additive-by-additive epistasis, there are no unbiased estimators for the
genetic variances of either reference population unless gene
frequencies are one-half; however, there are unbiased estimators for the means of these populations.
The basic difference lies in the utilization of the parental analysis for
estimation.
In the presence of dominance, statistics from
the parental analysis are required for unbiased estimators
of genetic parameters of the derived population, whereas
they are not required for the parent population estimators.
The results of Section 5.3.1 provide a convenient means
for obtaining unbiased estimators of the derived population
parameters for a general gene model.
The method should prove
useful in extending results to mating designs other than the
diallel in that one can dispense with the formulation of the
statistics of the analysis in terms of the sampling variables
and genotypic values as was done in Section 4.
It is only
necessary to obtain the average value of the derived population parameters as a linear function of the parent population
parameters and take the usual estimators from the analysis of
parent population parameters to obtain an unbiased estimator
of the linear function.
6.
VARIANCES OF ESTIMATORS
6.1
Introduction
The exact variances of the unbiased estimators for the
parent population and derived population parameters are obtained for the genetic model including only additive and
dominance effects with two alleles at each locus.
The variances of the biased estimators of derived population parameters (5.20) are also considered.
The variances of the derived population estimators are
compared to the variances of the parent population estimators as an indication of the relative efficiency of the
derived population estimators.
Initially, only variances of the genetic portion of the
estimators shown in (5.5) and (5.19) are presented.
The consequences of random experimental error and replication
are discussed in Section 6.5.
6.2
Exact Variances of Parent
Population Estimators
The estimators of parent population parameters in the
absence of epistasis are
$$\hat{\mu} = \bar{Y}$$
$$\hat{\sigma}_A^2 = \frac{2}{(n-2)}(MS_{g.c.a} - MS_{s.c.a})$$
$$\hat{\sigma}_D^2 = MS_{s.c.a},$$
as shown in (5.5).
The exact variance of $\hat{\sigma}_A^2$ is
$$V(\hat{\sigma}_A^2) = \frac{4}{(n-2)^2}\big[V(MS_{g.c.a}) + V(MS_{s.c.a}) - 2\,\text{Cov}(MS_{g.c.a}, MS_{s.c.a})\big],$$
and the exact variance of $\hat{\sigma}_D^2$ is $V(\hat{\sigma}_D^2) = V(MS_{s.c.a})$.
$V(MS_{g.c.a})$, $V(MS_{s.c.a})$, and $\text{Cov}(MS_{g.c.a}, MS_{s.c.a})$ are
obtained by use of the mean squares shown in (4.14) with the
variables $P_i$ and $W_{ij}$, which are binomially and hypergeometrically distributed, respectively.
The variances and covariance are found from the expectations,
$$V(MS_{g.c.a}) = E(MS_{g.c.a})^2 - [E(MS_{g.c.a})]^2$$
$$V(MS_{s.c.a}) = E(MS_{s.c.a})^2 - [E(MS_{s.c.a})]^2$$
and
$$\text{Cov}(MS_{g.c.a}, MS_{s.c.a}) = E[(MS_{g.c.a})(MS_{s.c.a})] - E(MS_{g.c.a})E(MS_{s.c.a}).$$
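For a single locus with additive effects only, these expectations can be evaluated by brute-force enumeration, which gives a feel for the computation without the algebra of Section 9.2. The sketch assumes that the number of parents carrying the positive allele is binomial with parameters n and p, and that with $a = 0$ and one locus the g.c.a. mean square reduces to $n(n-2)P(1-P)u^2/(n-1)$; both the reduction and the function names are assumptions made for illustration.

```python
from math import comb


def ms_gca_additive_one_locus(y_count, n, u=1.0):
    """Genetic portion of MS_gca for one additive locus, given Y = y_count."""
    P = y_count / n
    return n * (n - 2) * P * (1 - P) * u * u / (n - 1)


def exact_mean_and_variance(n, p, u=1.0):
    """E(MS_gca) and V(MS_gca) by summing over the binomial distribution of Y."""
    pmf = [comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(n + 1)]
    vals = [ms_gca_additive_one_locus(y, n, u) for y in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, vals))
    second_moment = sum(w * v * v for w, v in zip(pmf, vals))
    # V = E(MS^2) - [E(MS)]^2, as in the expressions above
    return mean, second_moment - mean ** 2


mean_ms, var_ms = exact_mean_and_variance(n=10, p=0.25)
```

Under these assumptions the mean equals $(n-2)p(1-p)u^2$, that is, $(n-2)\sigma_A^2/2$, which agrees with the estimator of additive variance in (5.5) when there is no dominance.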
Due to their complexity, the derivations of the above
variances and covariance are given in Section 9.2.
The final forms of the two variances and covariance are shown
in Table 6.1.
At the present time, the formulas of Table 6.1 do not appear to be factorable.
However, with the simplifying assumption of only additive gene
effects, the variance of MS_g.c.a in Table 6.1 is a function
of the variance of the sampling variance of gene frequencies.
For example, in the absence of dominance, the variance of
the estimator for additive genetic variance for one locus
can be expressed as
e
e
Table 6.1
"
Exact variances and covariance for genetic portions of MS g . c . a and MS s . c . a
from the dia11e1 analysis of variance
V(MSg . C • a )
= n 2 (n-1)1 2 (n-2) 2~[(n-2)4(n2p2'·-2nP3'·+P4'·)
i
1
1
1
+ 4(n-2)3(n3p2i-4n2p3i+snp4i-2PSi)ai + 6(n-2)2(n4p2i-6n3P3i+13n2P4i-12nP5i+4P6i)a~
+ 4(n-2)(nSp2i-8n4p3i+2Sn3p4i-38n2p5i+28nP6i-8P7i)a~
+ ( n6P2'· -10nSp3' . +41n4P4' . -88n3Ps' . +104n2P 6' . - 64nP 7' . + 16P8' . )a~Ju~ + ( :1) L: Ci CJ. 1
1.
1
1
1
1
1
1
1
n
i<j
2:i C~
V(MS s • c • a )
=
16
n2 (n-1)2 (n-2)2(n-3)2
~
-? {1. n 2 (n-1) 2Jl2i
+ [6n 2 (n-1) + (n-1)2 + n4Jp4i
2
- 2n(n-1) (n +n-1)}13i
- [6n(n-1) + 4n3 JpSi
2
2 2
+ (6n +2n-2)P6" 1 - 4nP7'·1 + PS' 1)
.1 a~u~
+ n (~3)
1 1
n. . 1<J
"L 0n. 1.6n J. -
4
2:
.1 6 n 1·
C1l
(Xl
ie
e
~
Table 6.1 (continued)
COV(MS g • c • a ' MS s • c • a )
2
_
4
""'{
2
2
2·
- n 2 (n-l)2(n-2)2(n-3) ~
(n-2) [-n (n-l)P2i + n(n +2n-2)P3i - (3n +n-l)P4i
+ 3nPSi - P6i
2
2
+ 2(n-2)[-n 3 (n-l)P2i + n (n 2 +4n-4)P3i - 5n(n2 +n-l)P4i + (9n +2n-2)PSi
J
- 7nP6i + 2P7i]ai + [-n 4 (n-l)P2i +n3 (n +6n-6)P3i - n 2 (7n 2 +13n-13)P4i
+ n(19n 2 +12n-12)PSi - (25n 2 +4n-4)P6i + 16nP7i - 4P8iJa~}a~u1
+
2
C.O . iFj ~ nJ
[2: c.] [2:
i
~
Ci
=
i
(52.J,
n~
where
62
Di
22222 2
(n-2)Pi(1-Pi)[1 + (1-2Pi)a i ] u i + 4P i (1-Pi) aiu i
=
4p~(1-p.)2a~u~
~
~
~
~
J1~i = E Y(y~)
where the quantity inside the square brackets is $n^2$ times
the variance of the sampling variance of gene frequency.
For completeness, the variance of $\hat{\mu} = \bar{Y}$ is
8(2n-3)~
P~(1-p.)2a~u~
+ 8n~, Pl' (1-Pi)(1-2Pl·)al,u2l'
n(n-3)~ 1
1
1 1
~
1
= 10
n
1
+ nlHl - !F _ 2(2n-3)6 2
n
6.3
n(n-l) D'
Exact Variances of Derived
Population Estimators
The appropriate variances for the estimators of derived
population parameters are the average conditional variances,
i.e., conditional on fixed sample gene frequencies, $P_i$.
The average conditional variances are appropriate because the
parameters of interest are for the equilibrium random mating
population completely specified by the sample gene frequencies.
Hence, the only source of genetic variability in
the estimators must be due to differences among the diallel
estimators arising from samples having the same marginal gene
frequencies but different genotypic distributions.
The unbiased estimators for the additive and dominance
genetic model of Section 5.3.2 are considered.
The estimator for additive genetic variance is from (5.19).
$$\hat{\sigma}_A^{2*} = \frac{2(n-1)(n-2)}{n^3}MS_{g.c.a} + \frac{2(n-1)^2}{n^3}\hat{D} - \frac{(n-1)(n-2)}{n^3}\hat{F}.$$
Its variance is
$$V(\hat{\sigma}_A^{2*}) = \frac{4(n-1)^2(n-2)^2}{n^6}V^*(MS_{g.c.a}) + \frac{4(n-1)^4}{n^6}V^*(\hat{D}) + \frac{(n-1)^2(n-2)^2}{n^6}V^*(\hat{F})$$
$$+ \frac{8(n-1)^3(n-2)}{n^6}\text{Cov}^*(MS_{g.c.a}, \hat{D}) - \frac{4(n-1)^2(n-2)^2}{n^6}\text{Cov}^*(MS_{g.c.a}, \hat{F}) - \frac{4(n-1)^3(n-2)}{n^6}\text{Cov}^*(\hat{D}, \hat{F}). \tag{6.1}$$
The components of $V(\hat{\sigma}_A^{2*})$ are derived in Section 9.3 and
are given below as average conditional variances and covariances.
V*(D)
= (n-l)
4
"" DiD.J
.L.....
l.<J
V*(F)
=
2::
16
D. [(n-2) 6 2
(n-l)(n-2)i~j l. [ 2
Aj
+
V*(MS
g.c.a
~D.(F.-D.) +
4
(n-l) i~j l.
) =
J
J
1
LF.F.
(n-l) i~j l. J
4
~[(n-2)62. + 6 2 .J" f(n-2)6 2 .
(n-l)i~· 2
Al.
D~ L 2
AJ
1\
COV*(MS g • c . a , D)
"-
Cov*(MS g . c . a ' F)
(6.2)
where
Collecting terms for (6.1),
V(~2·)
A
- 4(n-l) (n_2)2" [2C
F
n6
«j
i - i +
2~n-l~D
n-2
i
][2C
j
-
F
j
+
2(n-l)D
(n-2) j
J
(6.3)
where
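The estimator for $\sigma_D^{2*}$ from (5.19) is
$$\hat{\sigma}_D^{2*} = \frac{(n-1)(n-3)}{n(n-2)}MS_{s.c.a} + \frac{4(n-1)^2}{n^3(n-2)}MS_{g.c.a} - \frac{(n-1)^2}{n^3}\hat{D} + \frac{(n-1)^2}{n^3}\hat{F}$$
and its variance is
$$V(\hat{\sigma}_D^{2*}) = \frac{(n-1)^2(n-3)^2}{n^2(n-2)^2}V^*(MS_{s.c.a}) + \frac{16(n-1)^4}{n^6(n-2)^2}V^*(MS_{g.c.a}) + \frac{(n-1)^4}{n^6}V^*(\hat{D}) + \frac{(n-1)^4}{n^6}V^*(\hat{F})$$
$$+ \frac{8(n-1)^3(n-3)}{n^4(n-2)^2}\text{Cov}^*(MS_{s.c.a}, MS_{g.c.a}) - \frac{2(n-1)^3(n-3)}{n^4(n-2)}\text{Cov}^*(MS_{s.c.a}, \hat{D}) + \frac{2(n-1)^3(n-3)}{n^4(n-2)}\text{Cov}^*(MS_{s.c.a}, \hat{F})$$
$$- \frac{8(n-1)^4}{n^6(n-2)}\text{Cov}^*(MS_{g.c.a}, \hat{D}) + \frac{8(n-1)^4}{n^6(n-2)}\text{Cov}^*(MS_{g.c.a}, \hat{F}) - \frac{2(n-1)^4}{n^6}\text{Cov}^*(\hat{D}, \hat{F}). \tag{6.4}$$
The component variances and covariances of (6.4) needed in
addition to those in (6.2) are derived in Section 9.3, and
they are
$$V^*(MS_{s.c.a}) = \frac{8}{n(n-3)}\sum_{i<j}\sigma_{Di}^2\sigma_{Dj}^2$$
$$\text{Cov}^*(MS_{s.c.a}, \hat{F}) = 0$$
$$\text{Cov}^*(MS_{s.c.a}, \hat{D}) = 0$$
$$\text{Cov}^*(MS_{g.c.a}, MS_{s.c.a}) = 0. \tag{6.5}$$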
Upon collecting terms for (6.4),
V(~2*)
D'
= 8(n-1) 2 (n-3)
n3 (n-2)2
+
:6
62 62
i<j Di Dj
4(n-~)3!
L [<n:2)C'i ~
n
~i<j
.
Di + F,i] [cn:2)C j - Dj + FjJ
where
Ci
=[
(n-2) 2
2 ]
2 6Ai + 6Di .
1 (6.6)
F)
The conditional variance of the estimator of $\mu^*$,
(5.19), is zero; hence, the average conditional variance
of $\hat{\mu}^*$ is always less than the variance of $\hat{\mu}$.
The unbiased estimators discussed above require information from the inbred parents, in addition to the F1's in the
diallel experiment.
It is of interest to investigate some
of the properties of certain biased estimators that utilize
information from F1's only. The biased estimators for $\sigma_A^{2*}$
and $\sigma_D^{2*}$ are from (5.20) as follows.
$$(\hat{\sigma}_A^{2*})_b = \frac{2(n-1)(n-2)}{n^3}MS_{g.c.a}$$
with bias,
$$-\frac{2(n-1)^2}{n^3}D + \frac{(n-1)(n-2)}{n^3}F,$$
and
$$(\hat{\sigma}_D^{2*})_b = \frac{(n-1)(n-2)(n-3)}{n^3}MS_{s.c.a},$$
with bias
$$-\frac{4(n-1)^2}{n^3}\sum_i p_i(1-p_i)a_i^2u_i^2.$$
Their variances are
$$V[(\hat{\sigma}_A^{2*})_b] = \frac{4(n-1)^2(n-2)^2}{n^6}V^*(MS_{g.c.a})$$
and
$$V[(\hat{\sigma}_D^{2*})_b] = \frac{(n-1)^2(n-2)^2(n-3)^2}{n^6}V^*(MS_{s.c.a}) = \frac{8(n-1)^2(n-2)^2(n-3)}{n^7}\sum_{i<j}\sigma_{Di}^2\sigma_{Dj}^2. \tag{6.7}$$
6.4
Numerical Evaluations of Estimator Variances
6.4.1
General Remarks.
It is difficult to compare
analytically the relative efficiencies of the estimators
from the parent and the derived population parameters.
How-
ever, some measure of the goodness of the derived population
estimators is necessary to evaluate the proposed procedure
of inference to derived populations.
In this section, the
two sets of variances for several spe6ificcases under the
additive and dominance genetic model will be evaluated and
compared.
For this purpose, let
    $u_i = u_j = 1$
    $a_i = a_j = 0.0, 0.5, 1.0, 1.5$
    $p_i = p_j = 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95$
    $n = 5, 10, 15, 20$
    $m = 2, 10, 100, 1000$,
which give a total of 448 combinations of n, a, m, and p.
The value of 1 is assigned to u because $u^4$ enters as a constant
multiplier for each of the variances.  The restriction that
all $a_i$, all $u_i$, and all $p_i$ are equal is an unrealistic but
necessary restriction in order to reduce the numerical
evaluation to a manageable task.
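The count of 448 combinations follows directly from the four lists of values above (4 values of n, 4 of a, 4 of m, and 7 of p). A short sketch that enumerates the grid is given here; the variable and function names are illustrative only.

```python
from itertools import product

# Parameter values listed in the text (u is fixed at 1).
N_VALUES = (5, 10, 15, 20)
A_VALUES = (0.0, 0.5, 1.0, 1.5)
M_VALUES = (2, 10, 100, 1000)
P_VALUES = (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)


def parameter_grid():
    """Return every (n, a, m, p) combination used in the numerical evaluation."""
    return list(product(N_VALUES, A_VALUES, M_VALUES, P_VALUES))


grid = parameter_grid()
print(len(grid))  # 4 * 4 * 4 * 7 = 448
```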
The variances of the unbiased estimators evaluated were
$$V(\hat\sigma_A^2) = \frac{4}{(n-2)^2}\left[V(MS_{g.c.a}) + V(MS_{s.c.a}) - 2\,\mathrm{Cov}(MS_{g.c.a}, MS_{s.c.a})\right]$$
$$V(\hat\sigma_D^2) = V(MS_{s.c.a})$$
$$V(\hat\sigma_A^{2*}) = V^*\left[\frac{2(n-1)(n-2)}{n^3} MS_{g.c.a} + \frac{2(n-1)^2}{n^3}\hat{D} - \frac{(n-1)(n-2)}{n^3}\hat{F}\right]$$
$$V(\hat\sigma_D^{2*}) = V^*\left[\frac{(n-1)(n-3)}{n(n-2)} MS_{s.c.a} + \frac{4(n-1)^2}{n^3(n-2)} MS_{g.c.a} - \frac{(n-1)^2}{n^3}\hat{D} + \frac{(n-1)^2}{n^3}\hat{F}\right], \qquad (6.8)$$
where $V(MS_{g.c.a})$, $V(MS_{s.c.a})$, and $\mathrm{Cov}(MS_{g.c.a}, MS_{s.c.a})$ are
shown in Table 6.1 and $V(\hat\sigma_A^{2*})$ and $V(\hat\sigma_D^{2*})$ are shown in (6.3)
and (6.6).
Upon letting $p_i=p_j=p$, $a_i=a_j=a$, and $u_i=u_j=1$, the variances in (6.8) took the form shown in Table 6.2.
The formulas of Table 6.2 were evaluated on the IBM 1410 digital
computer located in the School of Textiles at North Carolina
State of the University of North Carolina at Raleigh.
6.4.2  Relative Efficiencies of Derived-to-Parent
Population Estimators.  In order to compare the relative efficiencies
of derived population to parent population estimators of additive and dominance genetic variance, the following ratios were computed.
$$\left[V(\hat\sigma_A^2)/(\sigma_A^2)^2\right] \big/ \left[V(\hat\sigma_A^{2*})/(E_y\sigma_A^{2*})^2\right]$$
$$\left[V(\hat\sigma_D^2)/(\sigma_D^2)^2\right] \big/ \left[V(\hat\sigma_D^{2*})/(E_y\sigma_D^{2*})^2\right]. \qquad (6.9)$$
A ratio >1 indicates that the derived population estimators are relatively more efficient, while a ratio <1 indicates that parent population estimators are relatively
more efficient.  The values, $E_y\sigma_A^{2*}$ and $E_y\sigma_D^{2*}$, used in
(6.9) are the average values over all possible derived
Table 6.2  Variances and covariances of diallel estimators letting $p_i=p_j=p$, $a_i=a_j=a$,
and $u_i=u_j=1$
V(MS
=
-)
g.c.a
m.
[(n-2)4(n2p'-2np'+p') + 4(n-2)3(n 3 p'-4n 2 p'+5np'-2p')a
n 2 (n-1)2(n-2)2
2
3 4
2
3
4
5
2432"
2
+ 6(n-2) (n Pi- 6n P3+ 13n P4-12nPS+4P6)a
+ 4(n-2) (n5p'-8n4p'+25n3p'-38n2p'+28np'-8p')a3
2
3
4
567
+ (n6P2-10n5p3+41n4p4-88n3~5+104n2p6-64nP7+16P8)a4J + 2m(m-~) ~.m(n-1)C2
V(MS s . c
~
•a )
16m
"2
2
2{n 2 (n-l)2 Pi n (n-l) (n-2) (n-3)
2
2n(n-1)(n2+n-l)p~
+ [6n 2 (n-l) + (n-l)2 +
- [6n(n-l) + 4n3 Jp' + (6n 2 +2n-2)p' - 4np' + p'la 4 + 4m(m-l) 7
8)
n(n-3)
5
6
ID
n4]p~
n2
m
~
(e
e
.(e
Table 6.2 (continued)
Cov(MS g . c . a ' MS s . c . a )
= 2 24m
2
n (n-I) (n-2) (n-3)
2
2
2
3
f(n-2)2[-n (n-l)P2 + (n +2n -2n)P3 - (3n +n-l)P4 + 3nP5 - P6 Ja2
l
.
+ 2(n-2)[-n3 (n-l)P2 + (n4 +4n3 -4n 2 )pj - (5n3 +5n2 -5n)P4 + (9n2 +2n-2)PS - 7np6 + 2p7Ja 3
+ [-n4 (n-l)P2 + (n 5 +6n4 -6n 3 )p3 - (7n 4 +l3n 3 -l3n 2 )p4 + (l9n3 +l2n 2 -l2n)P5
- (25n2+4n-4)p~ + l6np~ - 4p~Ja4} - mCD
"
$$V(\hat\sigma_A^{2*}) = \frac{8m(m-1)(n-1)}{n^6}\left[(n-2)C + 4(n-2)p(1-p)(1-2p)a + 4(n-1)p(1-p)\right]^2$$
$$V(\hat\sigma_D^{2*}) = \frac{4m(m-1)(n-1)^2(n-3)}{n^3(n-2)^2} D^2 + \frac{32m(m-1)(n-1)^3}{n^6(n-2)^2}\left[C - 2(n-2)p(1-p)(1-2p)a - (n-2)p(1-p)\right]^2$$
where
$$C = (n-2)p(1-p)[1 + (1-2p)a]^2 + 4p^2(1-p)^2a^2$$
$$D = 4p^2(1-p)^2a^2.$$
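The two derived-population variance formulas at the end of Table 6.2 can be evaluated directly. The sketch below is one reading of those formulas, with C and D as defined in the table; the power n^6 in the denominator of the additive formula is an assumption (the exponent is not legible in the table as printed here), and the function names are hypothetical.

```python
import math


def c_term(n, p, a):
    """C of Table 6.2: (n-2)p(1-p)[1+(1-2p)a]^2 + 4p^2(1-p)^2 a^2."""
    return (n - 2) * p * (1 - p) * (1 + (1 - 2 * p) * a) ** 2 + d_term(p, a)


def d_term(p, a):
    """D of Table 6.2: 4p^2(1-p)^2 a^2."""
    return 4 * p**2 * (1 - p) ** 2 * a**2


def var_sigma2A_star(n, m, p, a):
    """V of the derived additive estimator (n**6 denominator is an assumption)."""
    bracket = ((n - 2) * c_term(n, p, a)
               + 4 * (n - 2) * p * (1 - p) * (1 - 2 * p) * a
               + 4 * (n - 1) * p * (1 - p))
    return 8 * m * (m - 1) * (n - 1) * bracket**2 / n**6


def var_sigma2D_star(n, m, p, a):
    """V of the derived dominance estimator, second formula of the pair."""
    first = (4 * m * (m - 1) * (n - 1) ** 2 * (n - 3) * d_term(p, a) ** 2
             / (n**3 * (n - 2) ** 2))
    bracket = (c_term(n, p, a)
               - 2 * (n - 2) * p * (1 - p) * (1 - 2 * p) * a
               - (n - 2) * p * (1 - p))
    second = 32 * m * (m - 1) * (n - 1) ** 3 * bracket**2 / (n**6 * (n - 2) ** 2)
    return first + second


# n=5 parents, m=2 loci, p=0.50, a=1.0; 0.35 is the derived dominance
# variance listed for this case in Table 6.15.
v = var_sigma2D_star(5, 2, 0.5, 1.0)
print(round(100 * math.sqrt(v) / 0.35))  # 36, the entry of Table 6.10
```

The agreement with the tabulated coefficient of variation gives some confidence in this reading of the dominance formula; no comparable check of the additive formula is available within this section.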
populations such that they are expressed in terms of the
gene frequencies of the parent population.
In addition, the coefficient of variation for each of
the estimators was computed, which is
$$\mathrm{C.V.} = 100(\sigma_{\hat\theta}/\theta), \qquad (6.10)$$
where $\sigma_{\hat\theta}$ is the standard error of $\hat\theta$, the unbiased estimator
of $\theta$.  In the present problem, $\theta$ represents the parameters
$\sigma_A^2$, $\sigma_D^2$, $\sigma_A^{2*}$, or $\sigma_D^{2*}$.
The coefficient of variation is supplementary in that
it provides an indication of the precision of estimation
for each of the populations, in addition to the information
on the relative efficiencies of estimation for one population
relative to the other as given by the ratios (6.9).
The
coefficients of variation presented in the tables do not
include random experimental error and could be considered as
minimum values, since the addition of random error would inflate the values presented.
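The coefficient of variation (6.10) and the efficiency ratio (6.9) are closely linked: each bracket in (6.9) is a squared coefficient of variation divided by 100^2, so the ratio equals the square of the parent-to-derived ratio of coefficients of variation. The following sketch shows both quantities; the function names are illustrative, and the inputs 71 and 36 are the tabulated coefficients of variation for the dominance estimators at p=0.50, n=5, m=2 (Tables 6.9 and 6.10).

```python
import math


def coefficient_of_variation(variance, theta):
    """C.V. = 100 * (standard error of the estimator) / (parameter value)."""
    return 100.0 * math.sqrt(variance) / theta


def relative_efficiency(cv_parent, cv_derived):
    """Ratio (6.9) expressed through the two coefficients of variation."""
    return (cv_parent / cv_derived) ** 2


# Parent C.V. 71 and derived C.V. 36 for p=0.50, n=5, m=2.
print(round(relative_efficiency(71, 36), 1))  # 3.9, as listed in Table 6.8
```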
The coefficient of variation is a quantitative measure,
which requires one to set an arbitrary limit on the value
of a coefficient of variation as a criterion for whether
or not an estimator can be considered good in the sense of
being precise.
Ordinarily, an estimator for a mean is considered
poor if the coefficient of variation is as high
as 50, but variances are ordinarily estimated with less
precision than means.  Allowing for an additional inflation
of the coefficient of variation due to random error, the
estimators obtained will be considered sufficiently precise
if the coefficient of variation is <40.
Results on the evaluation of the ratio for additive
variance estimators, (6.9), with specified values of n, a,
m, and p are given in Table 6.3.  Values for 1000 loci differ
only slightly from those for 100 loci and are eliminated
from the table.  Also, results for n=15 are eliminated since
the trend for increasing n is well illustrated with those
values used in the table.  The coefficients of variation for
$\hat\sigma_A^2$ and $\hat\sigma_A^{2*}$ are shown in Tables 6.4 and 6.5, respectively, for
specified values of n, a, m, and p.  It should be noted that
the coefficient of variation for $\hat\sigma_A^{2*}$ is independent of gene
frequency and degree of dominance.
Generally, the derived population estimator is more efficient than the parent population estimator; and as both m
and n become larger the ratio becomes smaller, as shown in
Table 6.3.
Extremely high values of the ratio for some cases
of p=0.75, 0.95 are accounted for by a divergence of the genetic variances of the two populations at these points.
Observation of additive genetic variance for the two populations
in Tables 6.6 and 6.7 reveals additive genetic variance in the
parent population to be much smaller than that of the derived
population at these crucial points, hence causing the ratios
to be very large.
An increase in the degree of dominance
appears to accentuate the high and low points in the tables.
Table 6.3  Values of $[V(\hat\sigma_A^2)/(\sigma_A^2)^2]/[V(\hat\sigma_A^{2*})/(E_y\sigma_A^{2*})^2]$ for specified values of n, a, m, and p

                    a = 0.0                a = 0.5                  a = 1.0                   a = 1.5
   p      m     n=5   n=10   n=20      n=5   n=10   n=20       n=5    n=10    n=20       n=5    n=10    n=20
  0.05     2    8.0    8.8    9.1      7.9    8.1    8.2       8.1     7.9     7.8       8.4     7.8     7.6
          10    1.8    1.9    1.9      1.8    1.8    1.7       1.8     1.8     1.8       1.9     1.8     1.7
         100    1.1    1.1    1.1      1.1    1.1    1.1       1.1     1.1     1.1       1.1     1.1     1.1

  0.25     2    1.7    1.7    1.7      1.9    1.4    1.3       2.6     1.6     1.3       3.4     1.8     1.4
          10    1.1    1.1    1.1      1.2    1.1    1.0       1.4     1.1     1.1       1.7     1.2     1.1
         100    1.0    1.0    1.0      1.1    1.0    1.0       1.3     1.1     1.0       1.4     1.2     1.1

  0.50     2    1.2    1.1    1.0      1.9    1.6    1.5       4.7     3.2     3.0      11.2     6.3     5.7
          10    1.0    1.0    1.0      1.3    1.1    1.1       2.2     1.5     1.3       4.4     2.2     1.8
         100    1.0    1.0    1.0      1.2    1.1    1.0       1.9     1.3     1.1       3.6     1.7     1.3

  0.75     2    1.7    1.7    1.7      3.6    3.6    3.6      24.2    15.3    13.3     890.9   258.7   148.0
          10    1.1    1.1    1.1      1.5    1.4    1.3       7.0     3.4     2.7     245.5    48.4    22.2
         100    1.0    1.0    1.0      1.3    1.1    1.1       5.0     2.1     1.5     172.1    24.4     7.9

  0.95     2    8.0    8.8    9.1     11.6   12.5   12.8    1852.4   580.3   268.6     109.8    32.1     3.3
          10    1.8    1.9    1.9      2.3    2.3    2.3     282.9    73.6    33.3      17.3     5.3     2.6
         100    1.1    1.1    1.1      1.2    1.2    1.1     104.6    18.3     7.2       6.9     2.4     1.6
Table 6.4  Coefficient of variation of $\hat\sigma_A^2$ for specified combinations of n, a, m, and p

                       a = 0.0                   a = 0.5                   a = 1.0                   a = 1.5
   p       m     n=5  n=10  n=15  n=20     n=5  n=10  n=15  n=20     n=5  n=10  n=15  n=20     n=5  n=10  n=15  n=20
  0.05      2    141    99    80    69     140    95    76    66     142    94    75    64     145    93    74    63
           10     90    61    49    48      90    60    48    31      91    60    38    41      92    60    47    41
          100     73    49    39    34      73    49    39    33      74    49    39    33      75    49    39    33
         1000     71    47    38    33      71    47    38    33      72    48    38    33      73    48    38    33

  0.25      2     66    43    35    30      68    40    31    26      81    42    31    26      93    45    33    27
           10     70    46    37    32      73    46    37    31      79    48    37    33      86    49    38    32
          100     71    47    38    32      73    47    37    33      79    49    39    33      85    50    39    33
         1000     71    45    38    32      74    48    38    33      79    49    39    33      85    50    99    33

  0.50      2     55    35    28    23      69    42    33    28     108    60    47    40     168    84    65    55
           10     64    45    36    31      75    47    37    32      99    55    42    36     140    66    49    41
          100     70    47    38    32      77    48    38    33      97    53    41    34     133    61    45    37
         1000     71    47    38    32      77    49    38    33      97    53    41    34     133    61    45    37

  0.75      2     66    43    35    30      95    63    51    43     246   130   100    84    1487   599   367   278
           10     70    46    37    32      82    52    41    35     177    83    61    51    1049   311   193   145
          100     71    47    38    32      79    49    39    33     157    67    48    39     922   232   130    91
         1000     71    45    38    32      79    49    39    33     155    66    47    38     911   222   121    84

  0.95      2    141    99    80    69     173   118    95    82    2152   795   503   377     524   187   113    42
           10     90    61    49    43     102    68    55    47    1127   386   241   179     279   104    66    50
          100     73    49    39    34      78    50    40    34     716   201   118    86     184    73    50    40
         1000     71    47    38    33      75    48    38    33     662   172    98    70     171    69    48    39
Table 6.6  Additive genetic variance of the parent population for 10 loci and specified values of a and p

    p      a=0.00   a=0.50   a=1.00   a=1.50
   0.05     0.95     1.99     3.43     5.25
   0.10     1.80     3.53     5.83     8.71
   0.25     3.75     5.86     8.44    11.48
   0.50     5.00     5.00     5.00     5.00
   0.75     3.75     2.11     0.94     0.23
   0.90     1.80     0.65     0.07     0.07
   0.95     0.95     0.29     0.01     0.12
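The entries of Table 6.6 are consistent with a per-locus additive variance of 2p(1-p)u^2[1+(1-2p)a]^2, summed over m identical loci. That expression is not stated in this section; it is inferred from the definition of C in Table 6.2 together with $C_i = [(n-2)/2]\sigma_{Ai}^2 + \sigma_{Di}^2$, so the sketch below should be read under that assumption. The function name is hypothetical.

```python
def parent_additive_variance(m, p, a, u=1.0):
    """Additive genetic variance of the parent population for m identical loci.

    Assumes the per-locus form 2p(1-p)u^2[1 + (1-2p)a]^2 (inferred, see lead-in).
    """
    return m * 2 * p * (1 - p) * u**2 * (1 + (1 - 2 * p) * a) ** 2


# Compare with Table 6.6 (10 loci).
for p, a in [(0.25, 0.5), (0.50, 1.5), (0.75, 1.5), (0.10, 1.0)]:
    print(p, a, round(parent_additive_variance(10, p, a), 2))
```

At p=0.50 the factor (1-2p) vanishes, which is why the p=0.50 row of the table is constant across degrees of dominance.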
Table 6.7  Average additive genetic variance of derived population for 10 loci and
specified values of n, a, and p

                  a = 0.0                 a = 0.5                 a = 1.0                 a = 1.5
    p       n=5   n=10   n=20       n=5   n=10   n=20       n=5   n=10   n=20       n=5   n=10   n=20
   0.05    0.76   0.86   0.90      1.23   1.58   1.78      1.82   2.54   2.96      2.53   3.73   4.45
   0.10    1.44   1.62   1.71      2.23   2.83   3.17      3.22   4.40   5.09      4.40   6.33   7.46
   0.25    3.00   3.38   3.56      4.04   4.91   5.38      5.34   6.82   7.61      6.92   9.10  10.27
   0.50    4.00   4.50   4.75      4.12   4.59   4.80      4.48   4.86   4.96      5.08   5.31   5.23
   0.75    3.00   3.38   3.56      2.24   2.21   2.17      1.74   1.42   1.20      1.52   1.00   0.65
   0.90    1.44   1.62   1.71      0.85   0.76   0.71      0.45   0.26   0.16      0.25   0.11   0.07
   0.95    0.76   0.86   0.90      0.41   0.35   0.32      0.18   0.08   0.04      0.07   0.03   0.06
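The averages in Table 6.7 can be reproduced by taking expectations of the unbiased derived-population estimator, $[2(n-1)(n-2)MS_{g.c.a} + 2(n-1)^2\hat{D} - (n-1)(n-2)\hat{F}]/n^3$. The sketch below does so under several assumptions that are not stated in this section and are labeled in the code: E(MS g.c.a) = [(n-2)/2]σ²_A + σ²_D, and per-locus expectations 4p(1-p)u² for D-hat and -8p(1-p)(1-2p)a·u² for F-hat (the customary diallel forms). The function name is hypothetical.

```python
def expected_derived_additive_variance(n, m, p, a, u=1.0):
    """Average additive variance of the derived population for m identical loci."""
    q = 1.0 - p
    sigma2_a = 2 * p * q * u**2 * (1 + (1 - 2 * p) * a) ** 2   # assumed per-locus form
    sigma2_d = 4 * p**2 * q**2 * a**2 * u**2                   # D of Table 6.2
    e_ms_gca = (n - 2) / 2 * sigma2_a + sigma2_d               # assumed E(MS g.c.a)
    e_d_hat = 4 * p * q * u**2                                 # assumed E(D-hat) per locus
    e_f_hat = -8 * p * q * (1 - 2 * p) * a * u**2              # assumed E(F-hat) per locus
    per_locus = (2 * (n - 1) * (n - 2) * e_ms_gca
                 + 2 * (n - 1) ** 2 * e_d_hat
                 - (n - 1) * (n - 2) * e_f_hat) / n**3
    return m * per_locus


# Compare with Table 6.7 (10 loci).
for n, p, a in [(5, 0.50, 1.0), (5, 0.25, 1.0), (10, 0.50, 0.5), (20, 0.75, 1.5)]:
    print(n, p, a, round(expected_derived_additive_variance(n, 10, p, a), 2))
```

With a=0 the expression reduces to (n-1)/n times the parent additive variance, which matches the first three columns of the table.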
For increasing degrees of dominance, the absolute magnitude
of difference between high and low values in the table increases when p<0.95.
The effects of m and n must be considered jointly in
order to come to any sensible conclusions.
Apparently, as
m and n both become large, the ratio tends to a limiting
value near unity, indicating that both estimators become
equally efficient for all gene
frequencies and degrees of
for precise estimates of additive variance in the parent
population.  There are cases where 20 or more parents
may be necessary, e.g., with $p \ge 0.50$ and $a \ge 1.0$.  Values
of the coefficient of variation for $\hat\sigma_A^{2*}$ in Table 6.5 indicate that n=10 parents are sufficient for precise estimates
of additive variance in the derived population.
Values of the ratio for dominance variance estimators,
(6.9), are shown in Table 6.8 for specified values of n,
m, and p.
The ratio is independent of the degree of dominance.
The coefficients of variation for $\hat\sigma_D^2$ and $\hat\sigma_D^{2*}$ are shown in
Tables 6.9 and 6.10, respectively. The dominance genetic
variances of the two populations are shown in Tables 6.11
and 6.12.
Table 6.8  Values of $[V(\hat\sigma_D^2)/(\sigma_D^2)^2]/[V(\hat\sigma_D^{2*})/(E_y\sigma_D^{2*})^2]$ for
specified values of n, m, and p

              p = 0.05, 0.95           p = 0.25, 0.75              p = 0.50
    m       n=5    n=10    n=20      n=5    n=10    n=20      n=5    n=10    n=20
     2    126.4   125.8   179.0     10.0    14.1    26.8      3.9     2.9     2.9
    10     15.1    14.4    20.3      2.7     2.7     4.0      1.8     1.3     1.2
   100      2.4     1.8     2.3      1.9     1.4     1.4      1.6     1.1     1.0
  1000      1.3     0.7     0.6      1.8     1.3     1.1      1.5     1.1     1.0
Table 6.9  Coefficients of variation for $\hat\sigma_D^2$ for several combinations of n, m, and p (a)

               p = 0.05, 0.95              p = 0.25, 0.75                 p = 0.50
    m      n=5  n=10  n=15  n=20      n=5  n=10  n=15  n=20      n=5  n=10  n=15  n=20
     2     468   250   185   153      106    57    44    38       71    27    18    13
    10     217   114    84    69       74    33    24    19       65    25    15    11
   100      91    43    30    24       64    25    16    12       63    24    15    11
  1000      67    26    17    13       63    24    15    11       63    24    15    11

(a) Coefficients of variation for $\hat\sigma_D^2$ are independent of a.
Table 6.10  Coefficients of variation of $\hat\sigma_D^{2*}$ for specified
values of n, m, and p (a)

               p = 0.05, 0.95              p = 0.25, 0.75                 p = 0.50
    m      n=5  n=10  n=15  n=20      n=5  n=10  n=15  n=20      n=5  n=10  n=15  n=20
     2      42    22    15    11       34    15    10     7       36    16    10     8
    10      56    30    20    15       45    20    13    10       48    22    14    10
   100      59    31    21    16       47    21    14    10       51    23    15    11
  1000      59    32    22    16       47    21    14    10       51    23    15    11

(a) Coefficients of variation for $\hat\sigma_D^{2*}$ are independent of a.
Table 6.11  Dominance genetic variance of the parent population for 10 loci and specified values of
a and p

       p          a=0.5    a=1.0    a=1.5
  0.05, 0.95      0.02     0.09     0.20
  0.10, 0.90      0.08     0.32     0.73
  0.25, 0.75      0.35     1.41     3.16
  0.50            0.63     2.50     5.63
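The entries of Table 6.11 follow from the quantity D of Table 6.2, 4p²(1-p)²a² per locus with u=1, summed over the 10 loci. A minimal sketch is given below; the function name and the explicit u argument are illustrative additions.

```python
def parent_dominance_variance(m, p, a, u=1.0):
    """Dominance genetic variance of the parent population for m identical loci."""
    return m * 4 * p**2 * (1 - p) ** 2 * a**2 * u**2


# Compare with Table 6.11 (10 loci); p and 1-p give the same value.
for p, a in [(0.05, 1.0), (0.10, 1.5), (0.25, 1.0), (0.50, 1.0)]:
    print(p, a, round(parent_dominance_variance(10, p, a), 2))
```

Because p enters only through p(1-p), the table needs one row for each pair of complementary gene frequencies, and the variance is zero for a=0, which is why that column is absent.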
The estimator of dominance variance in the derived population is generally more efficient than that of the parent
population, with the exception of p=0.95, 0.05
and m=1000.  There also appears to be a tendency for the
ratio to approach a limiting value, possibly near unity, as
both m and n increase in magnitude.
The coefficients of variation for $\hat\sigma_D^2$ and $\hat\sigma_D^{2*}$ do not
change with a change in the degree of dominance and are symmetrical around p=0.5.
It appears that a diallel sample
size of n=10 is sufficient for good estimation of $\sigma_D^{2*}$.  Good
estimation of $\sigma_D^2$ requires a diallel sample size of at least
n=15 except for p=0.5, where n=10 appears to be sufficient.
For both sets of estimators considered, sampling appears
to play an important role in determining the relative efficiency
of the estimators, as one would suspect.  The parameters
to be estimated in the derived population become more
like their counterparts in the parent population as the sample
size of the diallel increases; hence, the variances of their
estimators would be expected to become more like those for
the parent population.  Also, the number of loci seems to have
an important role, and its effect on the relative efficiencies
must be considered jointly with the sample size, as indicated
earlier.
The situations considered may be somewhat artificial due
to the physical limitations encountered in the numerical
evaluations.  Also, it is highly unlikely that all genes
controlling any character have the same frequency in a population or that all loci exhibit the same degree of dominance.
An attempt was made to study the possible effects of this
restriction by considering 10 subsets of 10 loci with varying combinations of the other parameters, n, a, and p.  In
all cases considered, the averages over the 10 subsets fell
in the expected region assuming 100 loci with parameters
equal to the average of the 10 subset parameters.  Thus, the
assumption of equal parameters in the numerical evaluations
did not appear to be too misleading.
The mean square error of the biased estimators for the
derived population parameters is shown in Table 6.13, letting $u_i=u_j=1$, $p_i=p_j=p$, and $a_i=a_j=a$.  The mean square error
of an estimator is its variance plus the square of its bias.
The formulas of Table 6.13 were obtained from the variance
and the bias of each estimator shown in (6.7).
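The decomposition of mean square error into variance plus squared bias can be sketched as follows. The bias of -0.65 and the parameter value 1.23 are the Table 6.14 entries for p=0.05, a=0.5, m=10, n=5; the variance of 0.5 is a hypothetical figure used only to exercise the function, and the function names are illustrative.

```python
def mean_square_error(variance, bias):
    """MSE of an estimator: its variance plus the square of its bias."""
    return variance + bias**2


def relative_bias_percent(bias, parameter):
    """Magnitude of the bias as a percentage of the parameter being estimated."""
    return 100.0 * abs(bias) / parameter


# Bias and derived additive variance from Table 6.14; variance is hypothetical.
print(mean_square_error(variance=0.5, bias=-0.65))
print(round(relative_bias_percent(-0.65, 1.23), 1))
```

A large relative bias can make the mean square error exceed the variance of an unbiased competitor even when the biased estimator has the smaller variance, which is the comparison drawn from Tables 6.14 and 6.15.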
Table 6.13  Mean square error of biased estimators for $\sigma_A^{2*}$ and $\sigma_D^{2*}$,
where $u_i=u_j=1$, $p_i=p_j=p$, and $a_i=a_j=a$
MSE(6A2*)b = 8m(m-l)(n-l)(n-2)2 (n_2)p(1_P)[1 + (1-2p)a]2
n6
A numerical evaluation of the mean square error of the
biased estimators, the genetic variances of the derived population, and the bias of the estimators was made using particular values of n, a, m, and p.
The ratios of the variances
for parent population estimators to the mean square error for
biased derived population estimators were computed.
Table 6.14 illustrates the general results of the investigation for additive variance estimators.
The gene frequencies
and degrees of dominance are some of those used in
Table 6.3 for purposes of comparison.
The same general patterns of behavior occurred with the
biased estimators as with the unbiased estimators for the derived population parameters.
However, the biased estimators
were less efficient than the unbiased estimators, if the
mean square error was considered as the measure of efficiency
for the biased estimators.
The efficiency of the biased estimators
was also computed using their variance rather than the
mean square error.
When the variance was used as a measure of efficiency,
the biased estimators had a higher relative efficiency than did the unbiased estimators.
But the magnitude of the bias associated with the estimator must also be considered.
The bias associated with the
biased estimator ranges from roughly 60 percent of the additive variance for n=5 to roughly 25 percent of the additive
variance for n=20.
Table 6.14  Additive genetic variance of derived population, $\sigma_A^{2*}$; bias of $(\hat\sigma_A^{2*})_b$; and the ratio,
$V(\hat\sigma_A^2)/MSE(\hat\sigma_A^{2*})_b$, for specific values of n, a, m, and p

p=0.05, a=0.5
     m                            n=5       n=10       n=15       n=20
     2    $\sigma_A^{2*}$        0.25       0.32       0.34       0.36
          Bias                  -0.13      -0.09      -0.06      -0.05
          Ratio                 15.55      10.80       9.81       9.32
    10    $\sigma_A^{2*}$        1.23       1.58       1.72       1.78
          Bias                  -0.65      -0.43      -0.31      -0.24
          Ratio                  5.56       3.17       2.63       2.24
   100    $\sigma_A^{2*}$       12.30      15.85      17.16      17.84
          Bias                  -6.51      -4.31      -3.13      -2.45
          Ratio                  3.63       1.98       2.63       2.24
  1000    $\sigma_A^{2*}$      123.01     158.47     171.56     178.35
          Bias                 -65.06     -43.09     -31.29     -24.46
          Ratio                  3.45       1.86       1.52       1.37

p=0.75, a=0.5
     m                            n=5       n=10       n=15       n=20
     2    $\sigma_A^{2*}$        0.45       0.44       0.44       0.43
          Bias                  -0.31      -0.19      -0.13      -0.10
          Ratio                  1.57       1.64       2.03       1.83
    10    $\sigma_A^{2*}$        2.24       2.21       2.19       2.17
          Bias                  -1.56      -0.95      -0.67      -0.52
          Ratio                  1.15       1.00       1.03       1.06
   100    $\sigma_A^{2*}$       22.35      22.11      21.86      21.70
          Bias                 -15.60      -9.45      -6.69      -5.17
          Ratio                  1.05       0.87       0.88       0.89
  1000    $\sigma_A^{2*}$      223.50     221.06     218.56     216.98
          Bias                -156.00     -94.50     -66.89     -51.66
          Ratio                  1.04       0.86       0.86       0.88

Table 6.14 (continued)

p=0.75, a=1.5
     m                            n=5       n=10       n=15       n=20
     2    $\sigma_A^{2*}$        0.30       0.20       0.15       0.13
          Bias                  -0.17      -0.08      -0.05      -0.04
          Ratio                 14.94       7.32       8.57      10.26
    10    $\sigma_A^{2*}$        1.52       1.00       0.77       0.65
          Bias                  -0.84      -0.41      -0.26      -0.20
          Ratio                  6.65       2.27       1.99       1.99
   100    $\sigma_A^{2*}$       15.15       9.96       7.70       6.47
          Bias                  -8.40      -4.05      -2.64      -1.96
          Ratio                  5.03       1.23       0.87       0.76
  1000    $\sigma_A^{2*}$      151.50      99.56      77.00      64.68
          Bias                 -84.00     -40.50     -26.44     -19.59
          Ratio                  4.87       1.13       0.76       0.64
The biased dominance variance estimators for the derived population parameters produced results similar to
the biased additive variance estimators (Table 6.15).  In
all cases, the biased estimators were less efficient than
the unbiased estimators with the mean square error comparison.
If the variance of the biased estimator was used
as an efficiency measure, the biased estimators were more
efficient than the unbiased estimators.  However, the bias
of the dominance variance estimator ranged from roughly 80 percent
of $\sigma_D^{2*}$ for n=5 to roughly 30 percent of $\sigma_D^{2*}$ for n=20.
The unbiased estimators for derived population parameters would be more desirable for use than the biased estimators,
since they are more efficient.  Also, the bias
associated with the biased estimators is quite large, and
it seems to be relatively easy to correct for the bias by
the use of information from the parental analysis.
6.5  Consequences of Random Experimental Error
6.5.1  Parent Population Estimators.  The estimators
and their variances shown in Section 6.1 pertain to the
genetic portion of the model and ignore any random contribution due to experimental error.
In this section, the change
in estimators from the diallel and their variances upon inclusion of random experimental error for the strictly additive and dominance genetic model in the absence of epistasis
are indicated.
Table 6.15  Dominance genetic variance of derived population, $\sigma_D^{2*}$; bias of $(\hat\sigma_D^{2*})_b$; and the ratio,
$V(\hat\sigma_D^2)/MSE(\hat\sigma_D^{2*})_b$, for specified values of n, a, m, and p

p=0.25, 0.75; a=0.5
     m                            n=5       n=10       n=15       n=20
     2    $\sigma_D^{2*}$        0.06       0.07       0.07       0.07
          Bias                  -0.05      -0.03      -0.02      -0.02
          Ratio                  2.35       1.71       1.87       2.25
    10    $\sigma_D^{2*}$        0.31       0.33       0.34       0.34
          Bias                  -0.24      -0.15      -0.11      -0.09
          Ratio                  1.14       0.55       0.55       0.60
   100    $\sigma_D^{2*}$        3.08       3.29       3.36       3.40
          Bias                  -2.40      -1.52      -1.09      -0.85
          Ratio                  0.86       0.31       0.24       0.22
  1000    $\sigma_D^{2*}$       30.75      32.91      33.64      34.01
          Bias                 -24.00     -15.19     -10.89      -8.46
          Ratio                  0.84       0.29       0.21       0.19

p=0.50; a=1.0
     m                            n=5       n=10       n=15       n=20
     2    $\sigma_D^{2*}$        0.35       0.41       0.44       0.45
          Bias                  -0.26      -0.16      -0.12      -0.09
          Ratio                  1.86       0.67       0.52       0.46
    10    $\sigma_D^{2*}$        1.76       2.07       2.20       2.27
          Bias                  -1.28      -0.81      -0.58      -0.45
          Ratio                  1.52       0.52       0.38       0.33
   100    $\sigma_D^{2*}$       17.60      20.70      21.99      22.68
          Bias                 -12.80      -8.10      -5.81      -4.51
          Ratio                  1.45       0.48       0.35       0.31
  1000    $\sigma_D^{2*}$      176.00     207.00     219.85     226.81
          Bias                -128.00     -81.00     -58.07     -45.13
          Ratio                  1.45       0.48       0.35       0.30

Table 6.15 (continued)

p=0.50; a=0.5
     m                            n=5       n=10       n=15       n=20
     2    $\sigma_D^{2*}$        0.09       0.10       0.11       0.11
          Bias                  -0.06      -0.04      -0.03      -0.02
          Ratio                  1.85       0.70       0.55       0.47
    10    $\sigma_D^{2*}$        0.44       0.52       0.55       0.57
          Bias                  -0.32      -0.20      -0.15      -0.11
          Ratio                  1.53       0.51       0.38       0.33
   100    $\sigma_D^{2*}$        4.40       5.18       5.50       5.67
          Bias                  -3.20      -2.03      -1.45      -1.13
          Ratio                  1.45       0.45       0.35       0.31
  1000    $\sigma_D^{2*}$       44.00      51.75      54.96      56.70
          Bias                 -32.00     -20.25     -14.52     -11.28
          Ratio                  1.45       0.48       0.35       0.30
The error contribution to an observation, $Y_{qr}$, in
(4.1) is $e_{qr}$, where the $e_{qr}$ are assumed to be normal and
independently distributed variables with mean zero and
variance $\sigma_e^2/k$, and are independent of the genetic effects.
For the purpose of estimation, it is assumed that a replicated
experiment has been conducted that yielded an error
variance with $f_e$ degrees of freedom (Table 4.1).  In the
absence of epistasis, the estimators for the genetic variances in the parent population are
(6.11)
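The variances of the estimators in (6.11) differ from
those shown in Section 6.2 due to the addition of random
error to the model.  The variances are
$$V(\hat\sigma_D^2)_e = V(MS_{s.c.a})_e + V(MSE) - 2\,\mathrm{Cov}(MS_{s.c.a}, MSE)_e,$$
where
$$V(MS_{g.c.a})_e = V(MS_{g.c.a}) + \frac{4}{k(n-1)}\left[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2\right]\sigma_e^2 + \frac{2}{k^2(n-1)}\sigma_e^4$$
$$\mathrm{Cov}(MS_{s.c.a}, MSE)_e = 0$$
$$\mathrm{Cov}(MS_{g.c.a}, MS_{s.c.a})_e = \mathrm{Cov}(MS_{g.c.a}, MS_{s.c.a}). \qquad (6.12)$$
Hence
$$V(\hat\sigma_A^2)_e = V(\hat\sigma_A^2) + \frac{8}{k(n-1)(n-2)^2}\left[(n-2)\sigma_A^2 + \frac{2(n^2-n-2)}{n(n-3)}\sigma_D^2\right]\sigma_e^2 + \frac{8(n^2-n-2)}{k^2 n(n-1)(n-2)^2(n-3)}\sigma_e^4. \qquad (6.13)$$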
The subscript e on the expressions in (6.12) and (6.13) distinguishes the variances under the model including random
error from the variances where random error was ignored.
Where there is no subscript, the values of $V(MS_{g.c.a})$,
$V(MS_{s.c.a})$, and $\mathrm{Cov}(MS_{g.c.a}, MS_{s.c.a})$ are the variances
and covariance shown in Table 6.1.  Similarly, values for
$V(\hat\sigma_A^2)$ and $V(\hat\sigma_D^2)$ are the variances shown in Section 6.2,
ignoring random error.
Observation of the variances in (6.13) reveals the
genetic portion of the variance is unaffected by increased
replication, whereas the portions of the variances containing random experimental error become smaller as the replication is increased.
Hence, replications in the environmental
sense cannot improve the initial genetic sample.
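The effect of replication can be illustrated with the error-dependent terms of (6.13) for the additive estimator, read here as 8[(n-2)σ²_A + 2(n²-n-2)σ²_D/(n(n-3))]σ²_e / [k(n-1)(n-2)²] plus 8(n²-n-2)σ⁴_e / [k²n(n-1)(n-2)²(n-3)]. The sketch below evaluates only those terms for increasing k; the function name and the variance components used are hypothetical.

```python
def error_part_of_var_sigma2A(n, k, sigma2_a, sigma2_d, sigma2_e):
    """Error-dependent terms added to V(sigma2_A-hat) when random error is included."""
    linear = (8.0 / (k * (n - 1) * (n - 2) ** 2)
              * ((n - 2) * sigma2_a + 2.0 * (n * n - n - 2) / (n * (n - 3)) * sigma2_d)
              * sigma2_e)
    quadratic = (8.0 * (n * n - n - 2) * sigma2_e**2
                 / (k**2 * n * (n - 1) * (n - 2) ** 2 * (n - 3)))
    return linear + quadratic


# Hypothetical components: sigma2_A = 5, sigma2_D = 2.5, sigma2_e = 1, n = 10 parents.
for k in (1, 2, 4, 8):
    print(k, error_part_of_var_sigma2A(10, k, 5.0, 2.5, 1.0))
```

The printed values shrink toward zero as k grows, while the genetic term V(σ̂²_A), which does not involve k, is left unchanged.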
6.5.2  Derived Population Estimators.  The variances of
the estimators for parameters of the derived population also
change if random experimental error is included in the
model.  In addition to the random error component, $e_{qr}$, in
(4.1) the random component of error must be considered for
observations on the parental values, (4.2), defined as $\delta_{qq}$,
where the $\delta_{qq}$ are assumed to be normal and independently
distributed variables with mean zero and variance $\sigma_\delta^2/k$.  It
is assumed that an experiment has been conducted to yield an
estimate of error variance for the parents, $MSE_I$, with $f_I$
degrees of freedom.
Upon addition of random error to the model, the estimators for the derived population variances in (5.19) become
$$\hat\sigma_D^{2*} = \frac{(n-1)(n-3)}{n(n-2)}(MS_{s.c.a} - MSE) + \frac{4(n-1)^2}{n^3(n-2)}(MS_{g.c.a} - MSE) - \frac{(n-1)^2}{n^3}\hat{D} + \frac{(n-1)^2}{n^3}\hat{F},$$
where
$$\hat{D} = MSI - MSE_I$$
$$\hat{F} = 2\,MSI - 4\,MP(I.O) - 2\,MSE_I. \qquad (6.14)$$
The variances and covariances of the diallel statistics
used to estimate $\sigma_A^{2*}$ and $\sigma_D^{2*}$ are shown in Table 6.16.  From
Section 6.3, the variances for derived population estimators
are average conditional variances.
(e
e
Table 6.16
.(e
Average conditional variances and covariances of ·diallel statistics used to
estimate derived population parameters
Statistics
Variance
/\
4 "'V
2
2
2
2
4
-. ~ ~oDiDj + k( -1) [2D + (6e5 /k) ]6() + -2- 6:)
~<J
n
.
k f I
A
16
~
[(n-2) 2
21
4""
1
""V
(n-l)(n-2)~oDi[ 2
6 AJo + 6 DJo + (n_l)~oDi(Fi-Dj) + (n_l)~.FiFj
~FJ
.
~FJ
~FJ
D
F
16
[(n-2) 2
2J 2
16
2
2
+ k(n-l)(n-2)[ 2
6 A + 6 D 6 6 + k(n-l) (n_2)[D + (6:)/k)]6 e
+
MS g . c . a
;
+
MS s • c • a
8
k(n-l)
4
F6 2 +
6
"'"' L(n-2)
~ < &j [ . 2
8
k2 (n-l)
64 +
0
64
8
k 2f
I
0
J
2
2 rJn-2) 2
2 ]
6 Ai + 6Di [ 2
6 Aj + 6 Dj
4
[(n-2) 6 2 + 62J62 +
2
64
k(n-l) L 2
A
D e
k 2 (n-l) e
8
"'"' 62 62
n(n-3)~. Di Dj
~<J
+
8
6262 +
4
64
kn(n-3) D e
k2n(n-3) e
co
o
(oe
'e
ore
Table 6.16 (continued)
Statistics
Variance
MSE
(2/k 2 f e )64e
MSE 1
(2/k 2 f
"
A
(D,
F)
'"
2
2 /k)]6&
2 + (4/k 2 f )6 4
. 2 -- o~.DiFj
+ k(n_l)[2D
+ F + 2(6b
I b
lr=J
I
)6 4
b
(D, MS g . c . a )
~ (
)(
)
.•(n-2)
~,.~2Di-Fi 2D j -F j
l<J
" MS g . c . a )
(F,
(n-2) "" (2D. -F.) (2D .-F.) 2
~ (2D· -F·) [(n-2) 6 2 . + 6 2
2(n-l)i~
1
1
J J
(n-l)i~j
1
1 [2
AJ
DJ
1\
.J
-.
(F, MS s • c • a )
o
(D,
o
MS s . c . a )
(MS g . c . a , MS s . c . a )
,
2
~ , (2D-F) 6~
o
(D
....
6.6  Normal Approximations of Variances
The sampling variances in (6.12) are the exact sampling
variances for the particular problem considered herein.
Ordinarily, the sampling variances used for mean squares
are derived under the assumption that all effects in the
model are normal and independently distributed, and the mean
squares are functions of a chi square variable.  Under
normality assumptions, the variance of a mean square, MS,
is $V(MS) = (2/\mathrm{d.f.})[E(MS)]^2$.
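The rule V(MS) = (2/d.f.)[E(MS)]² can be sketched in a few lines. The example applies it to the g.c.a. mean square with n-1 degrees of freedom, taking E(MS g.c.a) = [(n-2)/2]σ²_A + σ²_D + σ²_e/k as in (6.15), and confirms numerically that the squared form equals its three-term expansion. Function names and the variance components are hypothetical.

```python
def normal_var_of_mean_square(expected_ms, df):
    """Variance of a mean square under normality: (2 / d.f.) * [E(MS)]^2."""
    return 2.0 / df * expected_ms**2


def expected_ms_gca(n, k, sigma2_a, sigma2_d, sigma2_e):
    """E(MS g.c.a) with random error, as used in (6.15)."""
    return (n - 2) / 2 * sigma2_a + sigma2_d + sigma2_e / k


def expanded_var_ms_gca(n, k, sigma2_a, sigma2_d, sigma2_e):
    """Three-term expansion of (6.15): genetic, cross, and pure-error parts."""
    genetic = (n - 2) / 2 * sigma2_a + sigma2_d
    return (2.0 / (n - 1) * genetic**2
            + 4.0 / (k * (n - 1)) * genetic * sigma2_e
            + 2.0 / (k**2 * (n - 1)) * sigma2_e**2)


# Hypothetical components for a 10-parent diallel with k = 2 replications.
n, k, s2a, s2d, s2e = 10, 2, 5.0, 2.5, 1.0
compact = normal_var_of_mean_square(expected_ms_gca(n, k, s2a, s2d, s2e), n - 1)
print(compact, expanded_var_ms_gca(n, k, s2a, s2d, s2e))
```

Only the first term of the expansion differs from the exact variance; the cross and pure-error terms are the parts that carry over unchanged.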
For the problem under discussion, the genetic portion
of the model has as its basis the multinomial distribution
and only the random errors are considered to come from
normal populations.  Therefore, the normality assumption
leads to approximations of the exact variances of the mean
squares.  The point is illustrated with $MS_{g.c.a}$.  Under
the normality assumption for all effects, its variance is
$$V(MS_{g.c.a}) = \frac{2}{(n-1)}\left[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2 + \frac{\sigma_e^2}{k}\right]^2$$
$$= \frac{2}{(n-1)}\left[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2\right]^2 + \frac{4}{k(n-1)}\left[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2\right]\sigma_e^2 + \frac{2}{k^2(n-1)}\sigma_e^4. \qquad (6.15)$$
Comparing the approximate variance in (6.15) with the true
sampling variance in (6.12), it can be seen that the last
two terms are identical.  The closeness of the approximation
depends on how well $V(MS_{g.c.a})$ is approximated by
$$\frac{2}{(n-1)}\left[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2\right]^2.$$
The exact variance of $MS_{g.c.a}$ is shown in detail in Table
6.1. At the risk of being repetitious, it is stressed that
the difference comes about as a result of the basic distribution assumed for the genetic effects.
Likewise, the variance of $MS_{s.c.a}$ changes from the
form in (6.12) to
$$V(MS_{s.c.a}) = \frac{4}{n(n-3)}\left(\sigma_D^2 + \frac{\sigma_e^2}{k}\right)^2. \qquad (6.16)$$
Again, the degree of approximation depends upon how well
$[4/n(n-3)]\sigma_D^4$ approximates $V(MS_{s.c.a})$ shown in Table 6.1.
The covariance of $MS_{g.c.a}$ and $MS_{s.c.a}$ becomes zero under the
normal approximation.
Considering only the changes in (6.15) and (6.16) that
result from the change in the underlying assumption regarding the distribution of the genetic effects, the normal approximations of $V(\hat\sigma_A^2)$ and $V(\hat\sigma_D^2)$ can be compared to the true
variances evaluated in Section 6.3.  Contributions from
random error were ignored for the evaluations since they
occur with equal value in the two sets of variances.  Hence,
the quantities evaluated and compared with $V(\hat\sigma_A^2)$ and $V(\hat\sigma_D^2)$
of Section 6.4 are
$$V(\hat\sigma_A^2)_N = \frac{4}{(n-2)^2}\left\{\frac{2}{(n-1)}\left[\frac{(n-2)}{2}\sigma_A^2 + \sigma_D^2\right]^2 + \frac{4}{n(n-3)}\sigma_D^4\right\}$$
and
$$V(\hat\sigma_D^2)_N = \frac{4}{n(n-3)}\sigma_D^4. \qquad (6.17)$$
The subscript N in (6.17) distinguishes the normal approximations to the variances from the true variances as evaluated in Section 6.4.
The variances in (6.17) were evaluated in the same manner as outlined in Section 6.4.
The following ratios were
then computed.
$$V(\hat\sigma_A^2)_N / V(\hat\sigma_A^2) \quad \text{and} \quad V(\hat\sigma_D^2)_N / V(\hat\sigma_D^2), \qquad (6.18)$$
where $V(\hat\sigma_A^2)_N$ and $V(\hat\sigma_D^2)_N$ are shown in (6.17) and $V(\hat\sigma_A^2)$ and
$V(\hat\sigma_D^2)$ were evaluated in Section 6.4. The normal approximations
of the variances are too low if the ratios in (6.18)
are <1 and too high if the ratios are >1.
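Under the normal approximation the coefficients of variation take simple closed forms, which can be compared with the large-m rows of the earlier tables. The sketch below assumes the normal-approximation variances 4σ⁴_D/[n(n-3)] for the dominance estimator and {4/(n-2)²}{2[((n-2)/2)σ²_A + σ²_D]²/(n-1) + 4σ⁴_D/[n(n-3)]} for the additive estimator; the function names are hypothetical.

```python
import math


def var_sigma2D_normal(n, sigma2_d):
    """Normal approximation to V(sigma2_D-hat), random error ignored."""
    return 4.0 / (n * (n - 3)) * sigma2_d**2


def var_sigma2A_normal(n, sigma2_a, sigma2_d):
    """Normal approximation to V(sigma2_A-hat), random error ignored."""
    gca_part = 2.0 / (n - 1) * ((n - 2) / 2 * sigma2_a + sigma2_d) ** 2
    sca_part = 4.0 / (n * (n - 3)) * sigma2_d**2
    return 4.0 / (n - 2) ** 2 * (gca_part + sca_part)


def cv(variance, theta):
    """Coefficient of variation, 100 * standard error / parameter."""
    return 100.0 * math.sqrt(variance) / theta


parents = (5, 10, 15, 20)
# Dominance: the C.V. does not depend on the value of sigma2_D.
print([round(cv(var_sigma2D_normal(n, 1.0), 1.0)) for n in parents])     # [63, 24, 15, 11]
# Additive with no dominance (a = 0): the C.V. is 100 * sqrt(2 / (n - 1)).
print([round(cv(var_sigma2A_normal(n, 1.0, 0.0), 1.0)) for n in parents])  # [71, 47, 38, 32]
```

These values agree with the m=1000 row of Table 6.9 and with the m=1000, a=0.0, p=0.50 entries of Table 6.4, consistent with the statement that the normal approximation is very good when the number of loci is large.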
Some of the results are presented in Table 6.17 to illustrate the general pattern of behavior of the ratio of
the two variances for the estimator of additive genetic variance.
The most obvious development is that the normal approximation
appears to be very good with a large number of
loci, $m \ge 100$, for all gene frequencies and degrees of dominance.
When the value of m is low, several trends are evident.
First, the normal variance underapproximates the
true variance for low and high gene frequencies, while it
overapproximates at intermediate gene frequencies and low
degree of dominance.
An increase of the degree of dominance
results in a decrease of the normal approximation relative
to the true variance, in most cases.
The results for a=0.5
are similar to those for a=0.0 and the results for a=1.5
are similar to those for a=1.0.
Therefore results for a=0.5
and a=1.5 were omitted from Table 6.17.
Table 6.17  $V(\hat\sigma_A^2)_N / V(\hat\sigma_A^2)$ for specified values of n, a, m, and p

                          a = 0.0                         a = 1.0
    p       m      n=5    n=10    n=15    n=20      n=5    n=10    n=15    n=20
   0.05      2    0.25    0.23    0.22    0.22     0.25    0.26    0.26    0.26
            10    0.62    0.60    0.59    0.58     0.63    0.63    0.63    0.63
           100    0.94    0.94    0.93    0.93     0.94    0.95    0.94    0.95
          1000    0.99    0.99    0.99    0.99     0.99    0.99    0.99    0.99

   0.50      2    1.67    1.82    1.88    1.90     0.80    0.78    0.75    0.73
            10    1.09    1.10    1.10    1.10     0.95    0.95    0.94    0.93
           100    1.01    1.01    1.01    1.01     1.00    0.99    0.99    0.99
          1000    1.00    1.00    1.00    1.00     1.00    1.00    1.00    1.00

   0.75      2    1.15    1.18    1.18    1.19     0.40    0.25    0.22    0.20
            10    1.03    1.03    1.03    1.03     0.77    0.63    0.58    0.56
           100    1.00    1.00    1.00    1.00     0.97    0.94    0.93    0.93
          1000    1.00    1.00    1.00    1.00     1.00    0.99    0.99    0.99

   0.95      2    0.25    0.23    0.22    0.22     0.09    0.04    0.04    0.03
            10    0.62    0.60    0.59    0.58     0.34    0.20    0.16    0.14
           100    0.94    0.94    0.93    0.93     0.84    0.70    0.65    0.63
          1000    0.99    0.99    0.99    0.99     0.98    0.96    0.95    0.94
The results shown in Table 6.18 illustrate the behavior of the ratio,
$V(\hat\sigma_D^2)_N/V(\hat\sigma_D^2)$.  The results are similar
to those shown in Table 6.17 for the additive variance.
However, degree of dominance has no effect on the ratio and
the results are symmetrical around p=0.5; e.g., the results
for p=0.25 are equal to the results for p=0.75.  Again, the
normal approximation seems to be very good with large number
of loci at all gene frequencies, especially for intermediate
gene frequencies.
that as the number of loci becomes large the distribution
approaches the normal; hence the variances of estimators from
a sample of the population would tend to variances obtained
under assumptions of normality.
Hayman (1960) presented a complete variance matrix of
diallel estimators where the variances and covariances were
obtained under the assumption of normality.
7.  SUMMARY AND CONCLUSIONS
7.1  Discussion
Most variance estimation procedures in quantitative
genetics are based on the assumption that the experimental
material is a random sample from the population of inference.
Sometimes, however, it is desirable to make inferences to a
population that is wholly derived in some prescribed way
from the parents of the experimental material.  The use of a
reference base population derived wholly from the parents of
a diallel cross has been studied in this investigation for
the estimators of genetic variance that may be obtained from
the diallel experiment.
Certain assumptions were imposed on the developments
contained herein.  In all cases, regular diploid Mendelian
inheritance was assumed, and both the parent population and
derived population were assumed to be in linkage equilibrium.
The effects of finite sampling from the parent population
were taken into consideration in the development.  However,
the development of the random mating derived population was
assumed to proceed in such a manner that random drift was
unimportant.  Initially, the genetic model included additive,
dominance, and additive-by-additive epistatic gene effects
with two alleles at each of two loci.  The genetic model was
extended to include an arbitrary number of loci and alleles
in the absence of epistasis.  The genetic model must be
restricted if genetic variances are to be estimated from
the diallel experiment.  The usual genetic model assumed for
the diallel analysis is the additive and dominance model in
the absence of epistasis.  In some populations, however,
dominance variance may have no significance, and in these
cases the additive and additive-by-additive epistatic genetic
model can be considered.
Unbiased estimators of genetic variances of the parent
population are available from the diallel analysis consisting
of only the F1 crosses either in the absence of dominance or
in the absence of epistasis. Analysis of the inbred parents
is not required for estimation of genetic variances in the
parent population, nor does it aid in the estimation of these
variances, a well-known result.
Unbiased estimators of the genetic parameters in the
derived population under the additive and additive-by-additive
epistatic model were obtained that required no information
on the parents.
For the additive and dominance genetic model
in the absence of epistasis, however, it was necessary to use
information from the analysis of the inbred parents to obtain
unbiased estimators of the derived population parameters.
The estimators for the genetic variances in the derived
populations were obtained in such a manner that they were
unbiased with respect to the particular subset of diallel
samples that give rise to the same derived population.
That
is, the conditional expectations of the estimators are equal
to the genetic variances of the particular derived population that will be formed from that set of diallel parents,
which implies that the estimators are also unbiased with respect to their unconditional expectations.
Also, in the
absence of epistasis, the mean for a specific derived population can be estimated without genetic error from the diallel
experiment.
The mean is the only parameter that can be estimated without genetic error.
The complexity of the exact variances of the estimators
of the derived and parent population parameters ruled out
any analytical comparison of their relative efficiencies.
Consequently, numerical evaluations of the variances and
coefficients of variation of the estimators were made for
the additive and dominance genetic model in order to obtain
information on the usefulness of the derived population
estimators relative to the parent population estimators.
With few exceptions, the estimators for derived population parameters were relatively more efficient than estimators for parent population parameters for the cases investigated in the numerical analysis.
There was an indica-
tion that the estimators became equally efficient or nearly
so for large numbers of parents and large numbers of loci
considered jointly, especially at intermediate gene frequencies.
For populations with intermediate gene frequencies, the
method of estimating parameters of the derived population is
recommended.
If gene frequencies tend to the extremes, there
are cases where use of the derived population base may not
be appropriate.
However, this is also true for the parent
population base, depending on conditions of the other parameters,
i.e.,
number of loci and degree of dominance.
Consideration of the results on the coefficients of
variation indicates the need for at least fifteen
parents for the diallel to obtain good estimators for the
parent population reference base.
The use of ten parents is
sufficient to obtain the same degree of precision in estimating derived population parameters.
It would be desirable in some cases to be able to provide estimates of the derived population parameters from
the information contained only in the analysis of the F1
crosses.
For the additive and dominance genetic model such
estimators must necessarily be biased.
However, if the
bias is not too large, the advantage of not having to use
parental information might outweigh the slight bias involved.
For this reason, biased estimators for the derived
population parameters were provided for the additive and
dominance genetic model, which included information from
the analysis of the F1 crosses only. Using mean square
error as a measure of their efficiencies, it was found that
the biased estimators were less efficient than the unbiased
estimators.
In addition, the bias of the estimators for
genetic variances was very large, varying around 25 percent
for the additive variance and around 30 percent for the
dominance variance, when n=20.
The bias became larger when
n, the number of parental lines, was reduced.
In view of
the large bias and the large mean square error, the estimation for this model without parental information was considered useless.
The exact sampling variances of estimators for additive and dominance variance in the parent population were
compared to the normal approximations ordinarily used.
A numerical analysis of the two sets of variances showed
that the variances used under normality assumptions provided very good approximations to the exact variances and
warranted their use in the analysis.
The approximations
were especially good for large numbers of loci and intermediate gene frequencies.
The approximations were poor
when 2 or 10 loci were involved and especially so when
gene frequencies were near 0.05 or 0.95.
Comstock and Robinson (1951) obtained experimental
results on the estimates of components of variance from a
series of quantitative genetical experiments on corn.
They
compared the dispersion of observed mean squares with that
expected from normal theory and concluded that the assumption of normality was realistic for the material with which
they were working.
There is also a considerable saving in time and effort
if the normal approximations are used as opposed to the exact
variances.
The derivation of the exact variances is a complicated and cumbersome operation as compared to the derivation of the normal approximations.
If one accepts the normal approximations to estimator
variances, the difficulties encountered in determining good
estimators for any reference base are reduced considerably.
For the present, consider the derived population reference
base.
Following the procedures set forth in Section 5.3.1,
the average values of derived population parameters may be
expressed as linear functions of the parent population
parameters.
Estimators for the functions are obtained from
the diallel analysis.
The estimators obtained for the average
values of the derived population parameters are unbiased
over those diallel samples that give rise to the same derived
population.
The variances of the estimators can be obtained
from the analysis of the diallel cross under the assumption
of normality in the linear model.
Under that assumption the variance of a
mean square, MS, is $V(MS) = 2[E(MS)]^2/\mathrm{d.f.}$
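The normal-theory variance of a mean square quoted above can be evaluated directly. The following is a minimal sketch, assuming only that formula; the function names and the numerical values (an expected mean square of 4.0 on 9 degrees of freedom) are hypothetical and are not taken from the text.

```python
def var_mean_square(expected_ms: float, df: int) -> float:
    """Normal-theory variance of a mean square: 2 * [E(MS)]^2 / d.f."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return 2.0 * expected_ms ** 2 / df


def cv_mean_square(expected_ms: float, df: int) -> float:
    """Coefficient of variation of the mean square; reduces to sqrt(2 / d.f.)."""
    return var_mean_square(expected_ms, df) ** 0.5 / expected_ms


if __name__ == "__main__":
    # Hypothetical illustration: E(MS) = 4.0 on 9 degrees of freedom.
    print(var_mean_square(4.0, 9))
    print(cv_mean_square(4.0, 9))
```

Under this approximation the coefficient of variation of a mean square depends only on its degrees of freedom, which is why no expression of the mean squares in terms of sampling variables is needed.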
With the
assumption of normality, there is no need to obtain the
formula expressing mean squares in terms of sampling variables and genetic values (Section 4) that were used primarily
to derive the true variances.
These results hold for the
additive and dominance genetic model for any number of loci,
each with an arbitrary number of alleles.
The problem considered herein is not the same as the
case where a plant breeder takes a set of available lines and
uses them in a diallel experiment to estimate derived population parameters.
The above case calls for exact estimators
in the sense that there is no genetic error of estimation.
In the absence of the ability to develop exact estimators,
the distribution of the diallel samples giving rise to
identical derived populations must be considered in order to
determine expectations of the estimators and variances of the
estimators.
The distribution of the diallel samples in this
problem is based on the idea of sampling the parents from
some parent population.
Selection or other forces in the
development of a specific set of lines may upset the
assumption of independence of gene distributions necessary
in the development of the results contained herein.
Questions arise as to whether or not the equilibrium
derived population is a realistic reference base or if the
equilibrium population can be attained in practice.
Theoret-
ically, of course, the answer to both questions is yes.
In
practice, linkage and random drift due to finite samples can
cause some difficulty in approaching an equilibrium population for which genetic variances can be defined.
Linkages
slow the approach to equilibrium, and finite populations
introduce the problem of gene fixation and inbreeding.
These
forces were discussed earlier in regard to the present problem.
The composition of synthetic populations as reservoirs
of genetic variability offers an example of derived populations, provided the original parents of the population are
chosen such that the assumptions listed above are not violated.
Although synthetics are best suited for naturally
cross-pollinating species, they can be maintained with certain self-pollinating species such as tobacco where artificial cross-pollinating is easily accomplished.
The selection potential of a population could be evaluated prior to its synthesis.
Ordinarily, the genetic po-
tential of the experimental material would be evaluated
relative to some parent population.
However, with the tech-
niques of estimating genetic variances for a derived population, the genetic variability could be predicted for the
specific population derived from the experimental material.
The technique of estimating genetic variances for derived populations is highly recommended for use in obtaining
knowledge of the gene action in plant populations, as derived populations are more representative of the real situation in breeding programs.
The potential of this method of
estimation should be investigated further both theoretically
and with actual experimentation.
Some suggested extensions
on the problem are presented in Section 7.2.
7.2
Suggestions for Further Research
A number of extensions of the genetic model should be
considered for the use of the derived population as a
reference
base for the analysis of mating designs.
Certainly
efforts should be made to refine the estimation techniques in
order to include
more general
epistatic effects with an
arbitrary number of loci and alleles.
Perhaps an approach
similar to that used in Section 5.3.1 would provide such an
extension.
Further elucidations on the effects of linkage
disequilibrium in the derived population and the effects of
random drift in forming the derived population are in order.
The consideration of arbitrary inbreeding of the parents
would broaden the choice of mating designs one could utilize
for inferences to the derived population reference base.
It would also be desirable to have the technique fit
into the framework of covariances of relatives, such that the
covariances obtained from the mating design could be translated directly into genetic variances appropriate for the
derived population.
Such a transition would simplify exten-
sions to other mating designs.
It appears that many of the extensions to generalize the
genetic model and use of different mating designs might be
accomplished through techniques introduced in Section 5.3.1.
Efforts in the direction of generalizing the method contained
therein may prove to be fruitful.
8.
LIST OF REFERENCES
Cockerham, C. C. 1954. An extension of the concept of partitioning hereditary variance for analysis of covariances
among relatives when epistasis is present. Genetics 39:
859-882.
Cockerham, C. C. 1963. Estimation of genetic variances,
pp. 53-94. In W. D. Hanson and H. F. Robinson (ed.),
Statistical Genetics and Plant Breeding. National
Research Council Publication 982, National Academy of
Science, Washington, D.C.
Comstock, R. E., and Robinson, H. F. 1948. The components of
genetic variance in populations of biparental progenies
and their use in estimating the average degree of
dominance. Biometrics 4:254-266.
Comstock, R. E., and Robinson, H. F. 1951. Consistency of
estimates of variance components. Biometrics 7:75-82.
Dickinson, A. G., and Jinks, J. L. 1956. A generalized
analysis of diallel crosses. Genetics 41:65-78.
Gilbert, N. E. G. 1958. Diallel cross in plant breeding.
Heredity 12:477-492.
Griffing, B. 1956. A generalized treatment of the use of
diallel crosses in quantitative inheritance. Heredity
10:31-50.
Griffing, B. 1958. Application of sampling variables in the
identification of methods which yield unbiased estimates
of genotypic variance components. Australian J. Biol.
Sci. 11:219-245.
Hayman, B. I. 1954a. The analysis of variance of diallel
tables. Biometrics 10:235-244.
Hayman, B. I. 1954b. The theory and analysis of diallel
crosses. Genetics 39:789-809.
Hayman, B. I. 1957. Interaction, heterosis and diallel
crosses. Genetics 42:336-355.
Hayman, B. I. 1958. The theory and analysis of diallel
crosses II. Genetics 43:63-85.
Hayman, B. I. 1960. The theory and analysis of diallel
crosses III. Genetics 45:155-172.
Jinks, J. L. 1954. The analysis of continuous variation in
a diallel cross of Nicotiana rustica varieties.
Genetics 39:767-788.
Kempthorne, O. 1954. The correlation between relatives in
a random mating population. Proc. Roy. Soc. (London)
B143 (910): 103-113.
Kempthorne, O. 1956. The theory of the diallel cross.
Genetics 41:451-459.
Kempthorne, O. 1957. An Introduction to Genetic Statistics.
John Wiley and Sons, Inc., New York.
Kendall, M. G., and Stuart, A. 1958. The Advanced Theory
of Statistics. Charles Griffin and Company Limited,
London.
Mather, K. 1949. Biometrical Genetics. Methuen and Company,
London.
Matzinger, D. F., and Kempthorne, O. 1956. The modified
diallel table with partial inbreeding and interactions
with environment. Genetics 41:822-833.
Sprague, G. F., and Tatum, T. A. 1942. General vs.
specific combining ability in single crosses of corn.
J. Am. Soc. Agron. 34:923-932.
Yates, F. 1947. Analysis of data from all possible
reciprocal crosses between a set of parental lines.
Heredity 1:287-301.
9.
APPENDIX
9.1
Expectations of Diallel Statistics
The expectations of the diallel statistics presented in
Section 4.2 are derived in this section.
The expectations
of the statistics are taken in two steps.
The total expectation is obtained by first taking the conditional expectation with respect to $w_{12}$ given the $y_i$, and secondly taking
the expectation of the conditional expectation with respect to
$y$.
Hence, total expectation is given by $E = E_y E_{w/y}$, where $E_{w/y}$
denotes the conditional expectation of $w_{12}$ given $y$ and $E_y$
denotes expectation with respect to $y$.
Expectations of
functions of the $P_i$ required in this section are given in
Section 9.4.
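The inner step of the two-step expectation can be illustrated numerically. The sketch below assumes, for illustration only, that given $y_1$ and $y_2$ the count $w_{12}$ follows a hypergeometric distribution (the $y_2$ lines carrying the second allele being a random subset of the $n$ lines, $y_1$ of which carry the first). Under that assumption the conditional mean is $nP_1P_2$ and the conditional variance is $[n^2/(n-1)]P_1(1-P_1)P_2(1-P_2)$, the two conditional moments used throughout this appendix. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(w, n, y1, y2):
    """P(w12 = w | y1, y2) under the assumed hypergeometric model."""
    return Fraction(comb(y1, w) * comb(n - y1, y2 - w), comb(n, y2))


def conditional_moments(n, y1, y2):
    """Exact conditional mean and variance of w12 given y1 and y2."""
    lo, hi = max(0, y1 + y2 - n), min(y1, y2)
    support = range(lo, hi + 1)
    mean = sum(w * hypergeom_pmf(w, n, y1, y2) for w in support)
    var = sum((w - mean) ** 2 * hypergeom_pmf(w, n, y1, y2) for w in support)
    return mean, var


def closed_form_moments(n, y1, y2):
    """n*P1*P2 and [n^2/(n-1)]*P1(1-P1)*P2(1-P2), with P_i = y_i/n."""
    p1, p2 = Fraction(y1, n), Fraction(y2, n)
    mean = n * p1 * p2
    var = Fraction(n * n, n - 1) * p1 * (1 - p1) * p2 * (1 - p2)
    return mean, var


if __name__ == "__main__":
    # Hypothetical example: n = 10 lines, y1 = 4, y2 = 6.
    print(conditional_moments(10, 4, 6))
    print(closed_form_moments(10, 4, 6))
```

Exact rational arithmetic is used so that the agreement between the enumerated and closed-form moments is not blurred by rounding.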
The conditional expectation of the mean of the FIls,
(4.7), is
Since Ew/y(W12)=nPIP2
Ew/y(Y) =
~ (2P i -1)ui
+
(~~1)~ P i (l-P i )ai u i
n
(n-l)P
*
I
*
- (n-1)ft I '
The expectation of Ew/y(Y) with respect to Yi is
Now from (9.37) in Section 9.4, expectations of the functions
of Pi are
where the last expectation holds due to the independence of
Yi and Yj'
Substituting proper expectations,
(9.2)
The conditional
expectation
of $MS_{g.c.a}$, (4.9), is
$E_{w/y}(MS_{g.c.a})$
- :(1- 2P2)t 12]
[(n~2)iU2
+ (1-2P2)a2u2 -
*(1-2P1)ti~J
[n
2
(n-4)2
+ n(n-1)(n-2) (n-I)P1(1-P 1 )P 2 (1-P2 )
From (3.12), the conditional expectations of interest are
and
Substitution of the proper conditional expectations yields
EW/y(MSg.c.a)
3
n
"
[(n-2)
='(n-l)(n-2)~Pl(1-Pi)
'n u i + (1-2P i )a i u i
i
..
(n-2)
J2
n(n-4)2
2
+ (n_l)2P1(1-P1)P2(1-P2)t12
n (1-2Pj~1)t12
=
n3
2* + n(n-4)2 62 *
2(n-l)(n-2)6A
4(n-l)2 AA
n
- (n-2)D
*
Expanding
respect to y,
e,
n
2
+ 2(n-l)(n-2)F*.
~/y(MSg.c.a)
(9.3)
and taking expectation with
E(MS g . c . a )
= EyEw/y(MSg.c.a)
3
n
= (n-l)(n-2)
~r.(n-2)2
2
2 2 2
~L n2 EyPi(l-Pi)u i + EyP i (1-P i )(1-2P i ) aiu i
+
(n-2)2
2 2
n 2 EyPi(1-Pi)Ey(1-2PjFi) t 12
-
2(n-2)2
n 2 EyPi(1-Pi)Ey(1-2PjFi)uit12
Substituting proper expectations from (9.37) and collecting
terms,
\[
E(MS_{g.c.a}) = (n-2)\left[\tfrac{1}{2}\sigma_A^2 + \tfrac{1}{4}\sigma_{AA}^2\right] + \sigma_D^2 + \tfrac{1}{2}\sigma_{AA}^2 . \tag{9.4}
\]
The conditional expectation of $MS_{s.c.a}$, (4.10), is
e·
-",,~,/
+ Cn-i) (n~2) (n-3) i~ Ew/y(w12-nPlP2 )
X[(n-l) -
n2~i(1-Pi}Jaiuit12
XC Ew(y(W12-nPIP2) 2
2
+ n(n-l)(:-2)(n-3) l(n -3n+4)
n2
- (n_l)Pl(1-Pl)P2(1-P2)]
- 2n(n-l)Ew/Y(W12-nPlP2)l-2Pl)(1-2P2)Jt~2'
Recalling the conditional expectations in (3.12),
4n
""
= (n-l)(n-2)(n-3)~
1.
2
2 2
Pi(l-Pi)[n Pi(l-P i ) - (n-l)]aiu i
2n 2
2
+ (n_l)2 P1 (1-Pl)P2(1-P2)t12 .
=
n(n-2)
2*
2n
6 2*
n2
2*
(n-l)(n-3)6n - (n-2)(n-3) A + 2(n_l)26 AA
n
+ (n-2)(n-3)D
*
n
- (n-2)(n-3)F
*
Taking the expectation of $E_{w/y}(MS_{s.c.a})$ with respect
to $y$,
\[
\begin{aligned}
E(MS_{s.c.a}) &= E_y E_{w/y}(MS_{s.c.a}) \\
&= \frac{4n}{(n-1)(n-2)(n-3)} \sum_i \left[n^2 E_y P_i^2(1-P_i)^2 - (n-1)E_y P_i(1-P_i)\right] a_i^2 u_i^2 \\
&\quad + \frac{2n^2}{(n-1)^2}\, E_y P_1(1-P_1)\, E_y P_2(1-P_2)\, t_{12}^2 \\
&= 4\sum_i p_i^2(1-p_i)^2 a_i^2 u_i^2 + 2p_1(1-p_1)p_2(1-p_2)t_{12}^2 .
\end{aligned}
\]
The conditional expectation of MSI, (4.11), is
+
16n(n-2)
2
8
2 PI (I-PI) P 2 (1-P 2 ) t 12 + ( 1)Ew/y(W12-nPIP2)
(n-l)
.
n-
X[u l u 2 + (1-2P l )u l t 12 + (1-2P 2 )u 2 t 12 -
(1-2Pl)(1-2P2)t~2]
Recalling conditional expectations from (3.12),
+ 16n(n-2)p (l-P )P (l-P )t 2
(n_l)2
==
n
(n-i)
1
1
2
2
12
n* + 4n(n-2) 62*
(n-1)2 AA'
The expectation of the expansion of Ew/y(MSI) with respect
to y is
= E(MSI)
which yields, upon substitution of expectations from (9.37),
\[
\begin{aligned}
E(MSI) &= 4\sum_i p_i(1-p_i)\left[u_i - (1-2p_{j\neq i})t_{12}\right]^2 + 16p_1(1-p_1)p_2(1-p_2)t_{12}^2 \\
&= D + 4\sigma_{AA}^2 .
\end{aligned} \tag{9.8}
\]
The quantity D in (9.8) is related to the parameter D
defined by Hayman (1960), where in the absence of epistasis,
\[
D = 4\sum_i p_i(1-p_i)u_i^2 . \tag{9.9}
\]
The conditional expectation of MP(I.O), (4.12),is
+ 4n(n-4) P (l-P )P (l-P )t 2
(n_l)2
+
1
1
2
2
12
(n-~)~n-2) ~ Pi (1-P i)(1- 2P i) CUi
(1-2Pj~i)t12Jaiui
L n2
4(n-4)
+ n(n-l) (n-2) [(n-l) P 1 (1-P 1 )P 2 (1-P 2 )
-
Ew/y(W12-nPIP2)2Jt~2
4
2
- (n-l) Ew/y(W12-nPIP2)(1-2Pl)(1-2P2)t12
2(n-4»)1
(
(
Ew/ y W12- nP i P j) 1~2Pi)uit12
ttj
+ (n-l) (n-2)
+ (
/ (W12-np.p.)(1-2p.)2a.u.t12
n- 1~( n- 2)LE
i~j w y
1 J
1
1 1
and upon substitution for the conditional expectations,
EW/y(W12-nPIPZ)
Ew/y(W1Z-nPIPZ)
we have
Z
=0
2
n
= (n-l)
P1 (1-P 1 )P Z (1-P 2 ),
2n "
. - (1-2P /: )t ] 2
= (n_l)~Pi(l-Pi)[Ui
j i
12
i
=
+
4n(n-4)
(n_l)2Pl(1-Pl)P2(1-P2)t12
+
2n 2
~
. ~Pi(1-Pi)(1-2Pi)[ui - (1-2PJ'1i)t12Jaiui
(n-l)(n-2) i
F
2
n
*
n(n-4)2 6 2*
n
*
2(n-l)D + (n-l)2 AA - 4(n-l) (n_2)F .
The expectation of the expansion of Ew/y(MP(I.O)] with respect to y gives
=
E[MP( I. 0)]
= (:~l)~[EyPi(l-Pi)U~
1
which upon proper substitutions for the expectations from
(9.37) yields
\[
\begin{aligned}
E[MP(I.O)] &= 2\sum_i p_i(1-p_i)\left[u_i - (1-2p_{j\neq i})t_{12}\right]^2 \\
&\quad + 2\sum_i p_i(1-p_i)(1-2p_i)\left[u_i - (1-2p_{j\neq i})t_{12}\right]a_i u_i \\
&\quad + 4p_1(1-p_1)p_2(1-p_2)t_{12}^2 \\
&= \tfrac{1}{2}D - \tfrac{1}{4}F + \sigma_{AA}^2 .
\end{aligned} \tag{9.11}
\]
The quantity F in (9.11) is related to the parameter F
defined by Hayman (1960), where in the absence of epistasis,
\[
F = -8\sum_i p_i(1-p_i)(1-2p_i)a_i u_i^2 . \tag{9.12}
\]
Finally, the conditional expectation of $\bar{Y}_I$, (4.13), is
\[
E_{w/y}(\bar{Y}_I) = \sum_i (2P_i - 1)u_i + \frac{1}{n}(4nP_1P_2 - 2nP_1 - 2nP_2 + n)t_{12} = \mu_I^* . \tag{9.13}
\]
The total expectation of $\bar{Y}_I$ is then
\[
E(\bar{Y}_I) = \sum_i (2p_i - 1)u_i + (2p_1 - 1)(2p_2 - 1)t_{12} = \mu_I , \tag{9.14}
\]
where $\mu_I$ is the mean of the population of completely inbred
lines derived from the random mating parent population.
9.2
Derivations of the Exact Variances
for Parent Population Estimators
In this section, $V(MS_{g.c.a})$, $V(MS_{s.c.a})$, and $Cov(MS_{g.c.a}, MS_{s.c.a})$, as shown in Table 6.1, are derived.
The two variances and covariance of the mean squares of the diallel
analysis are used to obtain the variances of the estimators
for $\sigma_A^2$ and $\sigma_D^2$ of the parent population.
The derivations are
given for the genetic portion of the mean squares shown in
(4.14) for the additive and dominance gene model with an
arbitrary number of loci.
First,
\[
V(MS_{g.c.a}) = E[MS_{g.c.a} - E(MS_{g.c.a})]^2 = E[MS_{g.c.a}]^2 - [E(MS_{g.c.a})]^2 . \tag{9.15}
\]
L:. c.1.
(9.16)
From (5.6),
E(MS g . c . a )
and
=
1.
for simplicity.
Substituting for MS g . c . a and E(MS g . c . a ) in (9.15) from
(4.14) and (9.16), respectively,
V(MS g • c
.a)
· E! (n-l~~n-2) ~ Pi (I-Pi) [(n;2 l
2n 2
~
+ (n-l)(n-2)i~j(Wij-nPiPj)
+ (1-2Pilai]
[(n-2)
n
+ (1-2PilaJl (n;2 l + (1-2P j }a j )u i u j }2 -[
=E
6
n
(n_1)2(n_2)2
2u~
{L:
p. (l-P·) [(n-2)
Ii 1
1
n
f:CiY
+ (1:"2P.)a.J 2 u2i l. 2
1
1
)
4
{~
. (w·· -nP . P . ) [(n-2)
(n-1)2(n-2)2 . <j .1 J
1 J.
n
+ E .4n
+ (1-2P i l aJ
~(n;2l
+ (1-2P j la j ]ui u j
f
5
4n
2: (w .. -np.p.)r (n-2)
{n-1)2(n-2)2 i <j 1J
1J L n
+ E
+ (1"-2P i
)uJ [(n~2)
... (1-2P i )a i ] 2Ui -
+ (1-2P j )U j ] UiU j
[~cJ 2.
~
1
Pi (l-P i )[ (n~2)
(9.17)
Considering the conditional expectation of (9.17) term by
term, it is seen that the third term goes to zero, since
$E_{w/y}(W_{ij} - nP_iP_j) = 0$.
Expansion of the second term of (9.17)
yields
E
. 4n
4
..
.(
~
(W
P P ) 2 f.(n-2)
(1-2P.)a.J 2[(n-2)
l n +
1
1
n
(n-l)2(n-2)2~. ij-n i j
+ (1-2P.)a
J
j
'12u?u~
1
J
2~ ~
+
i
j<k
(WiJ·-nPiP .) (Wik-nPiP k ) [(n-2)
J
n
~i
+ (1-2P )a J2
i
+
i
t(n~2)
(1-2Pk)ak}.i~UjUk
+ (1-2P )a ]
j
+
j
t(n~2)
i~ ~.(, (Wij-nPiPj)(Wkt-nPkP.(,)t(n~2)
~i,j
.
'+ (1-2P )aJ [(n-2) + (1-2P.)alL(n-2)
i
i
n
J
~[ n
The conditional expectation of the first term of the expansion is
+ (1-2P.) a .] 2 u 2. P . (l-P .) [(n-2) + (1-2P.) a.U2 u 2..
1
1
1 J
J
n
J J
J
The expectations of the last two terms of the expansion go
to zero on conditional expectation; hence, upon collecting terms after
conditional expectation,
V(MS g • c • a )
=E
n
6
fL:p (l-P )[(n-2)
n
Y(n-1)2(n~2)2l i i i
6
+ E
4n
LP.(1-p.)[(n-2)
Y(n-l)3(n-2)2 i <j 1
1
n
1
2
2
[(n-2) + (1-2P.)a.
+ (1-2P.)a. 2 u.P.(l-P.)
1
J
1
1
J.
J.
n
.
J
J
u.J2
(9.18)
Now, since $P_i$ and $P_j$ are independent, the expectation of the
second term of (9.18) with respect to $y$ can be obtained from
the results of Section 9.1, equations (9.3) and (9.4),
letting $t_{12} = 0$ in those equations.
The expectation of the
second term in (9.18) is then
(n~l)?=. {(n-2)Pi(1-Pi)[l
1<J
+
+
(1-2Pi)aiJ2u~
4P~(1-Pi)2a~u~Jt(n-2)Pj(1-Pj)[1
+
(1-2Pj)aj]2u~
2
2 2 21
+ 4pj (l-Pj) ajuj 5
Expansion of the first· term in (9.18) yields
(9.19)
From the results above, the expectation of the second term
in (9.19) yields
")'1l(n-2)Pi(1-Pi)[l + (1- 2P i)ai) 22
2t<j
ui
+
4P~(1-Pi)2a~u~1{(n-2)Pj(1-Pj)[l
+
(1-2Pj)ajJ2u~
Collection of terms for (9.18) then gives
6
n
V. (MS g . c . a ) = E
~
Y(n_l)2(n_2)2 i
P~(1_p.)2[(n-2)
1
1
n
+
(1-2P.)a.]4u~
.
1
1
1
(9.20)
Now to obtain the expectation of the first term in (9.20),
we expand the term and take expectation of individual terms
with respect to $y$.
Since $P_i = y_i/n$, $E_y(P_i^r) = E_y(y_i^r/n^r) = \mu'_{ri}/n^r$,
where the $\mu'_{ri}$ are the $\mu'_r$ shown in (9.36) for the $i$th locus.
Substitution of the expectations for (9.20) yields the final
result in Table 6.1 for $V(MS_{g.c.a})$.
The exact variance of the mean square for specific combining ability is
\[
V(MS_{s.c.a}) = E[MS_{s.c.a} - E(MS_{s.c.a})]^2 . \tag{9.21}
\]
From (5.6),
letting
(9.22)
for simplicity.
Substituting for $MS_{s.c.a}$ and $E(MS_{s.c.a})$ in (9.21) from
(5.6) and (9.22), respectively,
= Ehn-l) (:~2) (n-3) ~
2
Pi (1-P i )[n P i (I-Pi) - (n-l)
8 2
+ n(n-3) i~ [(Wij-nPiP j )
e
n2
- (n-l)P i (I-Pi)P j (l-P j )
- (n~2)(Wij-nPiPj)(1-2Pi)(l-2Pj)]aiUiajujl2
=E
2
l6n
.2
2
(n-l) (n-2) (n-3)
2
I2: '
i
n
(n-2)
2 2
- (n-l)]aiui
-
1 J
1{"'" .
[~6~iJ2
2
n
(n_l)Pi(l-Pi)PJ.(I-P
J.)
(w· .-np·P.)(1-2p.)(1-2P.)]a.u.a.u.
1J
-
· ) - (n-l)]aiui
2 2}2
Pi{l-Pi)[n2
Pi(l-P
i
[(
"W ' ·-np·p.) 2 + E 2 64
L.,[
n (n-3)2 i<j
1J
1 J
-
]a~u~
1
J i~[~ij-nPiPj) 2 -
J
1
1
J J
1
2
2
n
(n_I)Pi(l-Pi)~j(l-Pj)
(n~2)(Wij-nPiPj)(1-2Pi)(l-2Pj)]aiUiajUjJ
(9.23)
Upon taking conditional expectation, the third term of
(9.24)
The last two terms of (9.24) go to zero on conditional
expectation since it was shown in (3.15) that
for r,
s~,
and
j~~,
where $i$ may or may not be equal to $k$.
The foregoing statement implies that the conditional expectations of the products of functions for $W_{ij}$ and $W_{k\ell}$,
$j \neq \ell$, can be taken as the product of the conditional expectations.
As a result, the last two terms of (9.24) go
to zero, since the conditional expectations of each member
of the products within the square brackets are zero because
$E_{w/y}(W_{ij} - nP_iP_j)^2 = [n^2/(n-1)]P_i(1-P_i)P_j(1-P_j)$ and
$E_{w/y}(W_{ij} - nP_iP_j) = 0$.
The expectation of the first term of
(9.24) requires P2' P3' and P4' shown in (3.12), after which
expectation with respect to y yields
Upon collecting terms for (9.23),
V(MS s . c .• a )
_[L: 62.J 2
.
1.
=
E
2
2 l6n 2
D1.
tt
22: P~(l-Pi)2[n2Pi(I-Pi) - (n-l) ] 2a u
(n-l) (n-2) (n-3)i
32n2
~
2
2 2 ~ Pi (l-Pi)[n Pi (I-Pi)
(n-l) (n-2) (n-3) i<j
+ E.
2
22·
2
2 2
- (n-I)]a i u i P j (I-P j )[n Pj(l-P j ) - (n-l)]aju j
8,",22
",2 2~4
+ n ( n -3) i<j
~6D"6D"
-2~6D1."OD"
- ~i 6 D1."·
1. J
i<j
J
(9.25)
But, expectation of the second term in (9.25) yields
$2\sum_{i<j}\sigma_{Di}^2\sigma_{Dj}^2$ from (9.37), and
V(MS s . c . a )
(9.26)
Now to obtain the expectation of the first term in (9.26),
the term is expanded and expectation is taken with respect
to $y$.
Since $P_i = y_i/n$, $E_y(P_i^r) = E_y(y_i^r/n^r) = \mu'_{ri}/n^r$,
where the $\mu'_{ri}$
are the $\mu'_r$ shown in (9.36) for the $i$th locus.
Substitution of the expectations for (9.26) yields the final result
shown in Table 6.1 for $V(MS_{s.c.a})$.
The covariance of $MS_{g.c.a}$ and $MS_{s.c.a}$ is
\[
\begin{aligned}
Cov(MS_{g.c.a}, MS_{s.c.a}) &= E[MS_{g.c.a} - E(MS_{g.c.a})][MS_{s.c.a} - E(MS_{s.c.a})] \\
&= E[(MS_{g.c.a})(MS_{s.c.a})] - E(MS_{g.c.a})E(MS_{s.c.a}) .
\end{aligned}
\]
Substituting from (9.16) and (9.22),
Cov(MSg,c.a' MS s . c . a )
= E[(MSg.c,a)(MSs.c,a)]
-[zr i] [~ 6~i]
C
Substituting for MS g . c . a and MS s . c . a yields
(9.27)
2
2n
.'
[(n-2)
] [(n-2)
' .""
+ (n-l)(n-2)~(Wij-nPiPj)n
+ (1- 2P i)ai
n
1<J
.
+ (1-2P j )a j ]u i u j }{
=
(n-l)(:~2)(n-3)~
Pi(1-Pi)[ n2P i(1-Pi)
-
(n-l)]a~U~
-
(n~2) (Wij-nPiP j ) (1-2P i ) (1-2Pj)]aiUiajUj
2
+8
[(w -nP P )2 n p .(1_P.)P.(1_P.)
ij
i j
n(n-3) i<j
(n-l) 1
1
J
J
-
[2i c~ [~6~iJ
E[ (n-l)n 3(n-2) ~
~ P (l-P ) [(n-2)
i
i. n
•
+ E
-
l 6 n { L : ( W ._np.p.)[(n-2)
(n-l)(n- 2) (n-3) i<j iJ
1 J
n
(l-2Pi)aiJ[(n~2)
L:
+ (1-2P .)aJ uiu.]l
[(w ·-nP i P.)2
iJ
J .J
J li<j
J
2
n
- (n-I)Pi(l-Pi)Pj(l-P j ) .
-
,
(n~2) (Wij-nPiPj ) (1-2P i ) (1-2Pj ) jaiUiajUj 1
(9.28)
The conditional expectation of the second term in (9.28)
is zero, using the
expectations
in (3.12) and the result of
the relationship shown in (3.15).
Details of the expectation
of the second term in (9.28) are not shown here, but it can
easily be shown to give the indicated result.
Upon collecting terms for (9.28),
Cov(MSg,c,a' MS s • c . a )
2u~J {(n-l) (:~2) (n-3) ~ Pi (1-P i )[n 2P i (I-Pi)
+ (1-2P i )a i ]
(n-l)
= E{
la~un
- [~c~ [Lt /)~J
4
4n
L p 2 (1_P )2[(n-2)
(n-l)2(n-2)2(n-3), i i i
n
2
+ (1-2Pi)ai] 2[n P i (I-Pi) - (n-l)
+
4n
4
]a~u1
L;p. (l-P.) [(n-2)
(n-l)2(n-2)2(n-3)i~j 1
1
n
(9.29)
Since the $P_i$ are independent, the expectation of the second
term of (9.29) is, from the results of Section 9.1,
\[
\sum_{i \neq j} (C_i)(\sigma_{Dj}^2),
\]
so upon collection of terms,
!
4
2
Cov(MS g . c • a ' MS s . c . a ) = E"
2 4n 2
~ P (1_P i )2i(n-2)
(n-l) (n-2) (n-3)"i i
n
+ (1-2Pi)ai]2[n2Pi(1-Pi) -
(n-l)Ja~ut}
Upon expansion of the first term in (9.30) and taking expectation, where $E(P_i^r) = E(y_i^r/n^r) = \mu'_{ri}/n^r$, one obtains the final
result shown for $Cov(MS_{g.c.a}, MS_{s.c.a})$ in Table 6.1.
9.3
Derivations of Exact Variances for
Derived Population Estimators
In this section, the average conditional variances and
covariances are derived for the genetic portion of the
diallel statistics used to estimate genetic variances of the
derived population.
The derivations are shown for the addi-
tive and dominance genetic model using the formulas in equations (4.14) to give the final results shown in Section 6.3.
The conditional variance of a statistic, $\hat\theta$, used to
estimate a derived population parameter is
\[
V^*(\hat\theta)_w = E_{w/y}[\hat\theta - E_{w/y}(\hat\theta)]^2 . \tag{9.31}
\]
From Section 9.1, upon conditional expectation of a diallel
statistic, all functions of the $W_{ij}$ became zero, leaving
only functions of the $P_i$, which were constant with respect
to conditional expectation.
Hence, upon obtaining $\hat\theta - E_{w/y}(\hat\theta)$
in (9.31), only a function of the $W_{ij}$ remains. This result
simplifies the derivation of the variances and also the covariances, since only conditional expectations of the
squares and cross products of the functions of the $W_{ij}$ are
needed. The point is illustrated with the conditional
variance of $\hat D$.
Now, $\hat D = MSI$, where MSI is shown in (4.14).
The conditional variance of $\hat D$ is $V^*(\hat D)_w = E_{w/y}[\hat D - E_{w/y}(\hat D)]^2$.
Now
Then using the above result,
(9.32)
which is only a function of the term
1\
appears in (D).
invo~ving
the Wij that
This result can easily be verified for all
diallel statistics in (4.14).
Evaluating (9.32) gives
V*(~)w
= Ew/ y t(n~l){;j (Wij-nPiPj)UiUj] 2
=E
/
[
64
L:
w Y~n-l)2i<j
(W .. -nPi P . )2u~u~
J
1J
J
(9.33)
From Section 3,
= Ew/ y (W 1J
.. -nP.P.)E
/ (w.k-nP.P
)
1 J W Y 1
1 k
:;: o.
Also
=0.
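Then the only remaining term to be evaluated in (9.33) is
\[
V^*(\hat D)_w = E_{w/y}\left[\frac{64}{(n-1)^2}\sum_{i<j}(W_{ij} - nP_iP_j)^2 u_i^2 u_j^2\right],
\]
which yields from (3.12)
\[
V^*(\hat D)_w = \frac{64n^2}{(n-1)^3}\sum_{i<j}P_i(1-P_i)P_j(1-P_j)u_i^2 u_j^2 , \tag{9.34}
\]
which is the conditional variance of $\hat D$.
The average conditional variance of $\hat D$ is
\[
V^*(\hat D) = E_y[V^*(\hat D)_w],
\]
and since the $P_i$ are independent and from (9.37)
the average conditional variance of $\hat D$ is
\[
V^*(\hat D) = \frac{4}{(n-1)}\sum_{i<j}D_iD_j ,
\]
as shown in (6.2), where $D_i = 4p_i(1-p_i)u_i^2$.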
1\
The conditional variance of F=2MSI-4MP(I.O), (5.7), is
upon substitution from (4.14),
since all cross products of Wij functions are zero on expectation from Section 3. Now using (3.12),
1\
V*(F)w
=
64n 2
2
2 2 2
2
2 "V
LJE / (Wi,-nP.P.) [U-2P.)a. + (1-2p J.)a J,] uiu J'
(n-l) (n-2) i<j w Y J
1 J
1
1
which is the conditional variance of $\hat F$.
The average conditional
variance of $\hat F$ is $V^*(\hat F) = E_y[V^*(\hat F)_w]$, whereupon
substituting expectations from (9.37),
V*(F) =
(n_l~tn_2)~jPi(1-Pi)U~[(n-2)Pj(1-Pj)(1-2Pj)2
+ 4p~(1-p.)2Ja~u~
J
J
J J
+ (128 L:P1(1-P.i.)(l-2Pi)aiU~Pj(1-Pj)(1-2Pj)
n-l)i<j
=
16
2:;D. [(n-2) ~ + 6 2 ]
1
2
Aj
Dj
(n-l)(n-2)i~j
which is shown in (6.2), where $\sigma_{Aj}^2 = 2p_j(1-p_j)[1 + (1-2p_j)a_j]^2u_j^2$,
$\sigma_{Dj}^2 = 4p_j^2(1-p_j)^2a_j^2u_j^2$, and $F_j = -8p_j(1-p_j)(1-2p_j)a_ju_j^2$.
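The per-locus quantities entering these variance formulas are simple functions of the gene frequency $p_j$, the effect $u_j$, and the degree of dominance $a_j$. A minimal sketch of their evaluation follows, using the definitions $D_j = 4p_j(1-p_j)u_j^2$, $F_j = -8p_j(1-p_j)(1-2p_j)a_ju_j^2$, $\sigma_{Aj}^2 = 2p_j(1-p_j)[1+(1-2p_j)a_j]^2u_j^2$, and $\sigma_{Dj}^2 = 4p_j^2(1-p_j)^2a_j^2u_j^2$ in the absence of epistasis. The function names and the example values are hypothetical.

```python
def locus_components(p, u, a):
    """Per-locus D, F, additive and dominance variance (no epistasis)."""
    pq = p * (1 - p)
    return {
        "D": 4 * pq * u ** 2,
        "F": -8 * pq * (1 - 2 * p) * a * u ** 2,
        "var_A": 2 * pq * (1 + (1 - 2 * p) * a) ** 2 * u ** 2,
        "var_D": 4 * pq ** 2 * a ** 2 * u ** 2,
    }


def total_components(loci):
    """Sum the per-locus components over an iterable of (p, u, a) triples."""
    totals = {"D": 0.0, "F": 0.0, "var_A": 0.0, "var_D": 0.0}
    for p, u, a in loci:
        for key, value in locus_components(p, u, a).items():
            totals[key] += value
    return totals


if __name__ == "__main__":
    # Hypothetical loci given as (gene frequency, effect, degree of dominance).
    print(total_components([(0.25, 2.0, 0.5), (0.5, 1.0, 1.0)]))
```

At $p_j = 0.5$ the term $F_j$ vanishes, which is consistent with the summary remark that behaviour at intermediate gene frequencies differs from that at the extremes.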
The conditional variance of MS g . c . a , (4.14), is
*
V (MS g.c.a ) -- Ew/y [MS g.c.a - Ew/y (MS g.c.a ) J2
2 ) J u · uJ2
.
+-(1-2P.)a.
J J
J ·.
J[(nn
+ (1-2P.)a.
1
1
1
(9.35)
The expectation of (9.35) was derived in Section 9.2 from
equation (9.17) and was shown to be
.
. ~ 2 u·P·(l-P.)
2
[(n-2) + (1-2P·)a. J2 u·,
2
+ (1-2P.)a.
1
1
1 J
J
n
J J. J
which is then the conditional variance of $MS_{g.c.a}$.
The average conditional variance of $MS_{g.c.a}$ is then
which was shown to be, from (9.18) in Section 9.2,
V*(MS g.c.a)
=
which is shown in (6.2).
The conditional variance of $MS_{s.c.a}$, (4.14), is
V*CMSs.c.a)w
= Ew/y[MSs.c.a - Ew/yCMSs.c.a)]2
8
.',
2
n2
= Ew/ y [ n(n-3)i~[CWij-nPiPj)
- (n-l) PiCl-Pi)PjCl-Pj)
-
(n~2)CWij-nPiPj)Cl-2Pi)(1-2Pj)]aiUiajUjJ2,
which upon expansion is the term in (9.24) of Section 9.2,
and the conditional variance is, using (3.12),
V* (MSs.c.a)w
II
2 l28n 2
3 ""
L..[Cn-l) - n2PiCI-Pi)][Cn-l)
(n-l) Cn-2) (n-3) i<j
2
2 2 2 2
- n PjCI-Pj)]PiCI-Pi)PjCI-Pj)aiUiajuj.
The average conditional variance using C9.37) is
which is shown in C6.5).
1\
The conditional covariance of D and MS g . c . a is
1\
1\
= Ew/y[D - Ew/y(D)][MSg.c.a - Ew/y(MSg.c.a)]
=
EW/Y[cn~l)i~ CWij-nPiPj)UiU~ {cn_~~~n_2)i~(Wij-nPiPj)
Since the expectation of cross products of $W_{ij}$ functions is
zero, the conditional covariance is
Cov* (MS g • c • a ' A
D)w
~
4
l6n
Pi (i-Pi) r(n-2)
(n-l)3(n-2)i<j
L n
L
+ a i (1-2P i )]
U~Pj(1_Pj)[(n~2)
using (3.12).
+ a, (1-2P,)1 u~
J
J J J'
The average conditional covariance using
expectations from (9.37) is
Cov* (MS g • c . a ' 1\D)
4
=
E >Pi(1-Pi)[(n-2)
.16n
(n-l)3(n-2) y~
_
•
n
+ a i (1-2P i )]
= 16(n-2)
(n-l)
-
U~Pj (l-Pj ) [(n~2)
+ a j (1-2P j )] uj
~[p,(l-p,)u~
i~
<J
1.
1.
1.
+ p"1. (l-p"1. ) (1-2p"1. )a,1. u?]
Cp.1. (l-p,J )u?1. + PJ' (l-PJ") (1-2PJ' )a . uJ~]
1.
J
which is shown in (6.2).
1\
The conditional covariance of F and MS g . c • a is
Since the expectation of cross products of $W_{ij}$ functions is
zero, the conditional covariance is
=
E
-16n3
~ (W .. _np.p.)2[(n-2)
(n-l)2(n-2)2 W/Y ipj 1J
1 J
n
+ a. (1-2P.)J [(n-2) + a. (1-2P.) 1 (1-2P. )u~a . u~
1
1
using (3.12).
n
J
J
J
J
1 J J
The average conditional covariance using
expectations from (9.37) is
*
COV (MS g • c • a
' A
F)
J
2 ( I-P. ) ( 1-2P.) [(n-2) + (1-2P.)] a.u.2
+ (1-2P.)a i uiP.
1
J
J
J . n
J
J J
-16 ~ [Pi(l- P i)u 2
= (n-l)
i
i j
+ Pi(1-Pi)(1-2Pi)aiu~J[(n-2)p.(1-P.)(1-2P.)a.u~
J
1·
J
J
JJ
6 2 _ (n-2) (!-D. - sl FJ.\l
~ (.l D. _ I F .\ [(n-2) 6 2
(n-l)~.~4 1
8 Y
2
Aj + Dj
~4 J
VJ
= -16
1FJ
as shown in (6.2).
A
A
The conditional covariance of D and F is
*
h
A
~
~
A
A
Cov (D, F)w = Ew/y[D - Ew/y(D)][F - Ew/y(F)]
= EW/Y[(n~l)i~ (Wij-nPiPj)UiUj]
Since the cross products are zero on expectation, the conditional covariance using (3.12) becomes
The average conditional covariance is, using (9.37),
1\
COV *'"
(D, F)
~
Ey[Cov *"
(D, "-F)w J
•
(~~~)i~Pi(1-Pi)U~Pj(1-Pj)(1-2Pj)ajU~
= (n~l) i~DiF j '
as shown in (6.2).
1\
The conditional covariance of MS s . c . a and F is
- (n-2)
n (W 1J
.. -nP.P.)(1-2P.)(1-2P.)a.u.a,u'J
1 J
1
J
1 1 J J
Since the expectations of cross products are zero, the
conditional covariance is
-64n
'C""l
r.
n3
= n(n-1)(n-2)(n-3)i~jL(n-l)(n-2)
-
.
Pi(l-Pi)(l-2Pi)Pj(l-Pj)(l-2Pj)
(n-l)~n-2) Pi(l-Pi)(l-2Pi)Pj(l-Pj)(1-2Pj)2Ja~u~ajuj
= 0,
as shown in (6.5).
Likewise, $Cov^*(MS_{s.c.a}, \hat D)_w$ and
$Cov^*(MS_{s.c.a}, MS_{g.c.a})_w$ can be shown to be zero.
9.4
Moments and Functional Expectations
for the Binomial Distribution
Moments and functional expectations generated from the
binomial sampling distribution in (3.7) are given in this
section.
The moments of the binomial distribution are obtained
from the moment generating function,
t=o
2
The first eight moments about zero are, letting $n_i = n - i$,
\[
\begin{aligned}
\mu'_1 &= n_0p \\
\mu'_2 &= n_0p + n_0n_1p^2 \\
\mu'_3 &= n_0p + 3n_0n_1p^2 + n_0n_1n_2p^3 \\
\mu'_4 &= n_0p + 7n_0n_1p^2 + 6n_0n_1n_2p^3 + n_0n_1n_2n_3p^4 \\
\mu'_5 &= n_0p + 15n_0n_1p^2 + 25n_0n_1n_2p^3 + 10n_0n_1n_2n_3p^4 + n_0n_1n_2n_3n_4p^5 \\
\mu'_6 &= n_0p + 31n_0n_1p^2 + 90n_0n_1n_2p^3 + 65n_0n_1n_2n_3p^4 \\
&\quad + 15n_0n_1n_2n_3n_4p^5 + n_0n_1n_2n_3n_4n_5p^6 \\
\mu'_7 &= n_0p + 63n_0n_1p^2 + 301n_0n_1n_2p^3 + 350n_0n_1n_2n_3p^4 \\
&\quad + 140n_0n_1n_2n_3n_4p^5 + 21n_0n_1n_2n_3n_4n_5p^6 + n_0n_1n_2n_3n_4n_5n_6p^7 \\
\mu'_8 &= n_0p + 127n_0n_1p^2 + 966n_0n_1n_2p^3 + 1701n_0n_1n_2n_3p^4 \\
&\quad + 1050n_0n_1n_2n_3n_4p^5 + 266n_0n_1n_2n_3n_4n_5p^6 \\
&\quad + 28n_0n_1n_2n_3n_4n_5n_6p^7 + n_0n_1n_2n_3n_4n_5n_6n_7p^8 .
\end{aligned} \tag{9.36}
\]
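The coefficients in the binomial moments about zero are the Stirling numbers of the second kind, each multiplying a falling factorial of $n$ and a power of $p$. The sketch below, with hypothetical function names, rebuilds the $r$th moment from that pattern and compares it with direct summation over the binomial distribution, using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def stirling2(r, k):
    """Stirling number of the second kind, by the standard recurrence."""
    if r == k:
        return 1
    if k == 0 or k > r:
        return 0
    return k * stirling2(r - 1, k) + stirling2(r - 1, k - 1)


def falling(n, k):
    """Falling factorial n(n-1)...(n-k+1), i.e. n_0 n_1 ... n_{k-1}."""
    out = 1
    for i in range(k):
        out *= n - i
    return out


def raw_moment_formula(r, n, p):
    """r-th moment about zero as sum_k S(r,k) * n_0...n_{k-1} * p^k."""
    return sum(stirling2(r, k) * falling(n, k) * p ** k for k in range(1, r + 1))


def raw_moment_direct(r, n, p):
    """r-th moment about zero by summing y^r over the binomial distribution."""
    return sum(y ** r * comb(n, y) * p ** y * (1 - p) ** (n - y)
               for y in range(n + 1))


if __name__ == "__main__":
    # Hypothetical check with n = 10 and p = 3/10.
    n, p = 10, Fraction(3, 10)
    print([stirling2(8, k) for k in range(1, 9)])
    print(all(raw_moment_formula(r, n, p) == raw_moment_direct(r, n, p)
              for r in range(1, 9)))
```

Dividing the $r$th moment by $n^r$ gives the corresponding expectation of $P_i^r = (y_i/n)^r$.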
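Following are expectations of some functions of $P_i$
which are required, where $P_i$ is defined in (3.4).
\[
\begin{aligned}
E(P_i) &= E(y_i/n) = p_i \\
E[P_i(1-P_i)] &= \frac{1}{n^2}E(ny_i - y_i^2) = \frac{(n-1)}{n}p_i(1-p_i) \\
E[P_i(1-P_i)(1-2P_i)] &= \frac{1}{n^3}E[y_i(n-y_i)(n-2y_i)] = \frac{(n-1)(n-2)}{n^2}p_i(1-p_i)(1-2p_i) \\
E[P_i(1-P_i)(1-2P_i)^2] &= \frac{(n-1)(n-2)}{n^3}p_i(1-p_i)\left[(n-2)(1-2p_i)^2 + 4p_i(1-p_i)\right] \\
E[P_i^2(1-P_i)^2] &= \frac{1}{n^4}E[y_i^2(n-y_i)^2] = \frac{(n-1)}{n^3}p_i(1-p_i)\left[(n-1) + (n-2)(n-3)p_i(1-p_i)\right] \\
E(1-2P_i)^2 &= \frac{1}{n^2}E(n-2y_i)^2 = 1 - \frac{4(n-1)}{n}p_i(1-p_i)
\end{aligned} \tag{9.37}
\]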
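The expectations of functions of $P_i$ can be checked exactly for any small $n$ by summing over the binomial distribution of $y_i$. The sketch below assumes $y_i$ is binomial with index $n$ and probability $p_i$, with $P_i = y_i/n$; the closed forms coded in `closed_forms` are the ones being checked, and all function and label names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expect(f, n, p):
    """Exact E[f(P)] for P = y/n with y ~ Binomial(n, p)."""
    return sum(f(Fraction(y, n)) * comb(n, y) * p ** y * (1 - p) ** (n - y)
               for y in range(n + 1))


# Functions of the sample frequency P whose expectations are required.
FUNCTIONS = {
    "P": lambda P: P,
    "PQ": lambda P: P * (1 - P),
    "PQ(1-2P)": lambda P: P * (1 - P) * (1 - 2 * P),
    "PQ(1-2P)^2": lambda P: P * (1 - P) * (1 - 2 * P) ** 2,
    "P^2Q^2": lambda P: P ** 2 * (1 - P) ** 2,
    "(1-2P)^2": lambda P: (1 - 2 * P) ** 2,
}


def closed_forms(n, p):
    """Closed-form expectations in terms of the population frequency p."""
    pq = p * (1 - p)
    return {
        "P": p,
        "PQ": Fraction(n - 1, n) * pq,
        "PQ(1-2P)": Fraction((n - 1) * (n - 2), n ** 2) * pq * (1 - 2 * p),
        "PQ(1-2P)^2": Fraction((n - 1) * (n - 2), n ** 3) * pq
                      * ((n - 2) * (1 - 2 * p) ** 2 + 4 * pq),
        "P^2Q^2": Fraction(n - 1, n ** 3) * pq
                  * ((n - 1) + (n - 2) * (n - 3) * pq),
        "(1-2P)^2": 1 - Fraction(4 * (n - 1), n) * pq,
    }


def check(n, p):
    """True when every closed form equals the exact enumerated expectation."""
    forms = closed_forms(n, p)
    return all(expect(FUNCTIONS[key], n, p) == forms[key] for key in forms)


if __name__ == "__main__":
    # Hypothetical check with n = 10 parents and p = 3/10.
    print(check(10, Fraction(3, 10)))
```

Because the arithmetic is exact, a single mismatch in a coefficient such as $(n-1)$ or $(n-2)(n-3)$ would make `check` return `False`.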