Full Text - the American Society of Animal Science

Linkage analysis using direct and indirect counting and
relative efficiencies for codominant and dominant loci¹
Y. Da², J. Garbe, N. London, and J. Xu
Department of Animal Science, University of Minnesota, Saint Paul 55108
ABSTRACT: A method based on direct and indirect
counting is developed for rapid and accurate linkage
analysis for codominant and dominant loci. Methods for
estimating gender-specific recombination frequencies
are available for cases where at least one of the two loci
is multiallelic and for biallelic loci with mixed parental
linkage phases where at least one locus is codominant.
Most of the estimates of gender-average and gender-specific recombination frequencies require iterative
solutions. The new method makes use of the full data
set, yields exact estimates of the recombination frequencies when the observed and expected genotypic
frequencies are equal, and is computationally efficient. Relative efficiency of various data types is affected by the inheritance mode and by parental linkage
phases of biallelic loci, but unaffected by the locus polymorphism when using the full data set for linkage analysis. The ability to determine parental linkage phases
is affected by the locus polymorphism as well as inheritance mode. Intercross (or F-2 design) is more efficient
for mapping codominant loci, whereas backcross is more
efficient if dominance is involved. Mixed parental linkage phases of biallelic loci are less efficient than coupling or repulsion linkage phases. Ignoring noninformative offspring results in biased estimates of recombination frequency for biallelic loci only and reduced LOD
scores for all cases.
Key Words: Codominance, Dominance, Linkage Analysis, Loci
© 2002 American Society of Animal Science. All rights reserved.
Introduction
While computationally efficient methods are available for large-scale linkage analysis of codominant loci (Green et al., 1990), rapid methods are unavailable for mapping dominant loci and for the map integration of dominant and codominant loci. Most computer programs that provide linkage analysis for dominant loci, such as LINKAGE (Lathrop et al., 1984), implement computationally intensive likelihood analysis and generally limit the number of loci that can be analyzed jointly. A computationally efficient method for linkage analysis with codominant and dominant inheritance would be a valuable tool for mapping dominant genes and for the map integration of codominant and dominant loci, because the dominant inheritance mode is typical of many disease genes and many dominant markers exist (Ajmone-Marsan et al., 1997; Cushwa and Medrano, 1996; Knorr et al., 1999). Knapp et al. (1995) derived an analytical formula for maximum likelihood estimation of recombination frequency between two dominant loci in repulsion linkage phase. Such an analytical formula is mathematically simple and computationally efficient for large-scale linkage analysis. However, many other cases of linkage analysis do not have a simple analytical formula for estimating recombination frequencies based on likelihood functions. Understanding the relative efficiencies of various types of genotypic data is useful for planning mapping experiments. Most results on relative efficiencies of genotypic data (Allard, 1956; Green, 1981) were based on the approximate variances and covariances of estimated recombination frequencies, but the accuracy of such an approximation is unclear. The purpose of this article is to develop simple solutions for linkage analysis to facilitate large-scale joint linkage analysis with codominant and dominant loci, and to evaluate the relative efficiencies of various types of genotypic data to provide insights for designing mapping experiments.
¹This research is supported in part by the Agricultural Experiment Station (project MN-16-043) and grant-in-aid of the University of Minnesota, and by funding from Cargill and NRICGP/USDA (grant #03275). The authors wish to thank two anonymous reviewers for helpful comments.
²Correspondence: 265D Haecker Hall (phone: (612) 625-7780; fax: (612) 625-1283; E-mail: [email protected]).
Received November 11, 2001.
Accepted May 28, 2002.
J. Anim. Sci. 2002. 80:2528–2539
Material and Methods
General Strategy. Families with linkage information
will be divided into two categories: families that can be
analyzed using the direct counting method (Ott, 1999)
for all offspring (Category I), and families that cannot
be analyzed using the direct counting method for all
offspring (Category II). Recombination frequencies will
be estimated using the direct counting method (Ott,
1999) for Category I, and using “direct and indirect
counting” for Category II. Then, the two estimates will
be combined to obtain the overall estimate of recombination frequency. The focus of this article is on the new
method of direct and indirect counting for Category II.
The Method of Indirect Counting. The purpose of using
indirect counting is to develop a method for linkage
analysis that uses the full data set including noninformative offspring with minimal mathematical complexity and computational difficulty to facilitate large-scale
applications. Noninformative offspring do not have information to determine parental allele transmission unequivocally (Da and Lewin, 1995) and cannot be used
for linkage analysis using the direct counting method.
However, noninformative offspring are expected to contain a proportion of unobservable recombinants, and these can be estimated using the method of indirect counting described below. Therefore, offspring that are “noninformative” for direct
counting are in fact at least “partially” informative for
indirect counting. “Underlying genotype” is used to refer to a phase-known genotype, whereas “genotype” or
“observed genotype” is used to refer to a genotype with
known allele contents only. For example, AaBb is an
observed genotype with two possible underlying genotypes: AB/ab and Ab/aB. “Phenotype” is used for a locus
whose allele content cannot be fully observed, that is,
to describe observations of dominant loci.
Based on the gene counting method of Smith (1957),
the indirect counting method calculates the expected
number of recombinants contained in the noninformative offspring using the following formula
$$k_{ei} = \sum_{j=1}^{c} m_{ij}\, v_{ij}\, k_i \qquad [1]$$
where kei = expected number of recombinants contained
in genotype (or phenotype) i, c = number of underlying genotypes of genotype (or phenotype) i that contain recombinants, mij = number of recombinants in the underlying genotype j of genotype (or phenotype) i, vij = conditional probability that a noninformative offspring
with the given two-locus genotype (for codominant loci) or phenotype (for dominant loci) i
has the underlying genotype j,
and ki = the total number of noninformative offspring
with the given genotype (or phenotype). The general
formula for calculating vij is
$$v_{ij} = p_j / q_i \qquad [2]$$
where pj = probability of underlying genotype j with
recombinant(s), and qi = probability of the observed
genotype or phenotype. Note that pj and qi can be equal
in some cases. The observed number of recombinants
in the same category of families is obtained by direct
counting from informative offspring for which parental
allele transmission can be determined unequivocally.
Adding the numbers of expected and observed recombinants yields the estimated total number of recombinants. Dividing this estimated number of recombinants
by the total number of meioses yields the estimate of
recombination frequency from families where noninformative offspring exist. If gender-average (sex-average)
recombination frequency is assumed,
$$\theta = n_r / T \qquad [3]$$
where θ = gender-average recombination frequency, nr
= total number of expected and observed recombinants,
T = total number of meioses. Since nr is a function of
θ, Eq. [3] generally can be rearranged into a polynomial equation in θ. In this
article, an analytical solution for θ is provided if this
polynomial is of degree 3 or less, and an
iterative solution is used if it is of higher
degree. As shown in Da and Lewin (1995),
a cross between heterozygous genotypes, referred to as
an “intercross,” is the only situation where noninformative offspring may exist if the genotypes of both parents
are known. Therefore, the method of indirect counting
will consider various situations of intercross, including
multiallelic, biallelic, codominant, and dominant loci.
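To make Eqs. [1] through [3] concrete, here is a minimal Python sketch (not the authors' program) for one assumed case: an AB/ab × AB/ab intercross of two biallelic codominant loci, in which the double heterozygote AaBb is the only noninformative class. Its underlying genotypes are AB/ab (probability ¹⁄₂(1 − θ)², no recombinants) and Ab/aB (probability ¹⁄₂θ², two recombinants). The function names, starting value, and tolerance are illustrative assumptions.

```python
def expected_recombinants(k_i, underlying):
    """Eqs. [1]-[2]: expected recombinants among k_i noninformative offspring.

    `underlying` holds (m_ij, p_j) pairs, one per underlying genotype that is
    consistent with observed genotype i; genotypes with m_ij = 0 add nothing to
    the sum but still contribute to q_i.
    """
    q_i = sum(p for _, p in underlying)  # probability of the observed genotype
    return sum(m * (p / q_i) * k_i for m, p in underlying)  # sum of m_ij * v_ij * k_i


def theta_by_indirect_counting(observed_recombinants, n_double_het, n_offspring,
                               theta=0.25, tol=1e-12, max_iter=10_000):
    """Eq. [3]: iterate theta = n_r / T, with T = 2 * n_offspring meioses.

    Hypothetical helper for the AB/ab x AB/ab intercross only.
    """
    total_meioses = 2 * n_offspring
    for _ in range(max_iter):
        aabb = [(0, 0.5 * (1 - theta) ** 2),  # AB/ab: no recombinant
                (2, 0.5 * theta ** 2)]        # Ab/aB: two recombinants
        n_r = observed_recombinants + expected_recombinants(n_double_het, aabb)
        new_theta = n_r / total_meioses
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# 200 offspring whose counts equal their expectations at theta = 0.20:
# 72 recombinants are counted directly and 68 offspring are AaBb.
print(theta_by_indirect_counting(72, 68, 200))
```

The inputs correspond to the Table 2 frequencies at θ = 0.20 multiplied by 200. At that value the 68 AaBb offspring are expected to carry 8 recombinants, so nr = 80 and θ = 80/400, which is why the iteration settles at 0.20.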
Gender-Average and Gender-Specific Recombination
Frequencies. Gender-average (sex-average) recombination frequency refers to the recombination frequency
estimated from meioses of both genders, and gender-specific (sex-specific) recombination frequencies refer
to two recombination frequencies estimated from male
and female meioses separately. Gender-average recombination frequency is always estimable as long as linkage information exists. However, gender-specific recombination frequencies are not always estimable. When
the two loci are biallelic and the heterozygous parents
have the same linkage phase (coupling or repulsion),
gender-specific recombination frequencies are nonestimable regardless of whether the loci are codominant or
dominant, because two independent equations cannot
be established to estimate two separate recombination
frequencies. For two dominant loci, the case with mixed
parental linkage phases (one parent is in coupling
phase and the other in repulsion phase) is the only
situation where two equations could be established to
estimate gender-specific recombination frequencies.
However, neither our method nor the maximum likelihood method would yield reliable estimates. Therefore,
estimating gender-specific recombination frequencies
using dominant loci is deemed impractical and will not
be considered in this article. Methods to estimate gender-specific recombination frequencies will be developed for cases where at least one locus has multiple
alleles or the parents have mixed linkage phases with
at least one codominant locus. In analogy to Eq. [3],
gender-specific (sex-specific) recombination frequencies
can be estimated using the following equations simultaneously:
$$x = n_x / T_x \qquad [4]$$
$$y = n_y / T_y \qquad [5]$$
where x = female recombination frequency, nx = total
number of expected and observed female recombinants,
Tx = total number of female meioses, y = male recombination frequency, ny = total number of expected and
observed male recombinants, and Ty = total number of
male meioses. In all cases covered by this article in which noninformative offspring exist, Eqs.
[4] and [5] will be solved by iterative methods.
Pooling of Estimates. For families using direct and
indirect counting, estimates of a recombination frequency from all S families can be pooled to obtain the
overall estimate from all families using the following
formula:
$$\theta_1 = \sum_{i=1}^{S} n_{ri} \Big/ \sum_{i=1}^{S} T_i \qquad [6]$$
where θ1 = the overall estimate of the recombination
frequency from families where noninformative offspring exist, nri = total number of expected and observed recombinants in
family i, and Ti = number of gametes (meioses) in family i. Equation [6] can be used to obtain the pooled estimates of x
and y except that θ is replaced with x or y, and nri and
Ti are replaced with the corresponding gender-specific
numbers defined in Eqs. [4] and [5]. When gender-specific recombination frequencies are available, the gender-average recombination frequency will be obtained
as:
$$\theta = a_1 x + a_2 y \qquad [7]$$
where a1 = Tx/(Tx + Ty) and a2 = Ty/(Tx + Ty). As usual,
the LOD score for a gender-average recombination frequency is defined as
$$Z_\theta = \log_{10}\left[L(\theta) / L(\theta = 1/2)\right] \qquad [8]$$
where L(θ) = likelihood function under the hypothesis
of linkage, and L(θ = ¹⁄₂) = likelihood function under the
hypothesis of no linkage. The LOD score used in the literature (e.g., Ott, 1999) for testing
gender-specific recombination frequencies is
$$Z_{xy} = \log_{10}\left[L(x, y) / L(x = y = \theta)\right] \qquad [9]$$
The LOD score given by Eq. [9] is an indication of how
much the gender-specific model is favored over the gender-average model, but is not a test for the significance
of each gender-specific recombination frequency. The
following LOD scores could be defined to test the significance for gender-specific recombination frequencies:
$$Z_x = \log_{10}\left[L(x; y) / L(x = 1/2; y)\right] \qquad [10]$$
$$Z_y = \log_{10}\left[L(y; x) / L(y = 1/2; x)\right] \qquad [11]$$
As will be shown in this article, a family with noninformative offspring is not as informative as a family without noninformative offspring, even when noninformative offspring are used in the linkage analysis. To account for this type of unequal information, estimates
of a recombination frequency from the two categories
of families should be weighted differently. Since the
LOD score is a summary statistic of the number of
observations and informativeness of a family type, the
LOD of each family type is a logical choice as
the weight. Let θ2 = estimate of recombination frequency from families without noninformative offspring,
and Z1 and Z2 be the LOD scores for θ1 and θ2, respectively. Then, the overall estimate from all families can
be obtained as θ = c1θ1 + c2θ2, where c1 = Z1/(Z1 + Z2),
and c2 = Z2/(Z1 + Z2), with c1 + c2 = 1. A gender-specific
recombination frequency (x or y) over families can be
obtained similarly using the LOD scores defined by Eqs.
[10] and [11]. However, this article does not include
families that do not have separate estimates of x and
y in the calculation of x or y across families. We have
observed that “forced” estimates of x and y from those
families tend to yield the same x and y values, so that
including such families without separate x and y estimates would tend to diminish the difference between
x and y.
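The pooling rules in Eqs. [6] and [7] and the LOD-weighted combination θ = c1θ1 + c2θ2 reduce to a few weighted averages. The Python sketch below is illustrative only; it assumes the per-family totals nri and Ti and the LOD scores Z1 and Z2 (taken to be positive) have already been computed, and the function names are hypothetical.

```python
def pooled_estimate(recombinants, meioses):
    """Eq. [6]: theta_1 = sum(n_ri) / sum(T_i) over the S families."""
    if len(recombinants) != len(meioses):
        raise ValueError("need one recombinant total per meiosis total")
    return sum(recombinants) / sum(meioses)


def gender_average(x, y, t_x, t_y):
    """Eq. [7]: theta = a1*x + a2*y with a1 = Tx/(Tx+Ty), a2 = Ty/(Tx+Ty)."""
    a1 = t_x / (t_x + t_y)
    a2 = t_y / (t_x + t_y)
    return a1 * x + a2 * y


def lod_weighted(theta1, z1, theta2, z2):
    """theta = c1*theta1 + c2*theta2 with c_j = Z_j / (Z1 + Z2); assumes Z1, Z2 > 0."""
    c1 = z1 / (z1 + z2)
    c2 = z2 / (z1 + z2)
    return c1 * theta1 + c2 * theta2


print(pooled_estimate([8.0, 12.0], [40, 60]))   # 20 recombinants / 100 meioses
print(gender_average(0.10, 0.20, 200, 200))     # equal meioses: simple mean
print(lod_weighted(0.20, 3.0, 0.30, 1.0))       # pulled toward the higher-LOD estimate
```

Pooling the raw totals, rather than averaging per-family estimates, automatically gives larger families more weight, which is the behavior Eq. [6] describes.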
Relative Efficiency of Different Genotypic Data. Relative efficiency of different genotypic data, including
multiallelic, biallelic, codominant and dominant genotypes, will be compared using the unit LOD score and
the likelihood ratio for testing parental linkage phases.
The unit LOD score (u) will be defined as the expected
LOD score per offspring assuming gender-average recombination frequency, i.e.,
$$u = Z_\theta / N \qquad [12]$$
where N is the number of offspring, and Zθ is defined
by Eq. [8]. The definition of the unit LOD score is the
same as the ELOD in Lander and Botstein (1989). Here,
“unit LOD” rather than “ELOD” is used to avoid potential confusion with the ELOD defined differently in Ott
(1999). The type of genotypic data with higher unit LOD
score is considered more efficient for linkage analysis.
An advantage of the unit LOD score over the overall
LOD score (Zθ) is that the unit LOD score can be expressed in terms of the recombination frequencies so
that the numbers of observations are no longer involved. This is convenient for studying the relative efficiencies without having to assume a specific set of numbers of observations. It can be shown that the unit LOD
score can be obtained by replacing the numbers of genotypes in Zθ by the corresponding genotypic probabilities.
Unit LOD scores for specific cases are defined in Appendix 1. The backcross design (backcross to the recessive
line is assumed if dominance is involved) is included
for comparing relative efficiencies with the intercross
or F-2 design. The information available for testing
parental linkage phases is a measure of data efficiency.
When parental linkage phases are unknown, such as
when the grandparents have missing genotypes and
the allele transmission from the grandparents to the
parent cannot be determined, the likelihood ratio test
based on the offspring genotypic distribution can be
used to determine the parental linkage phases. The
type of genotypic data that yields more statistical confidence for determining parental linkage phases is more
efficient for linkage analysis. Given two loci, four combinations of parental linkage phases are possible. The
likelihood ratio between the two highest likelihoods
will be used to compare efficiency in inferring parental linkage phases. Likelihood functions for testing parental linkage phases are given in Appendix 2.
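Because the unit LOD of Eq. [12] is obtained by replacing genotype counts in Zθ with genotypic probabilities, it can be computed directly as u = Σ qi(θ) log10[qi(θ)/qi(¹⁄₂)]. The Python sketch below assumes three illustrative designs: the BB intercross of Table 2, a backcross of a doubly heterozygous parent to a double homozygote (codominant loci), and fully informative offspring from two doubly heterozygous parents (the MM setting), modeled here as four distinguishable gametes per parent. These probability functions are simplified constructions for illustration, not the appendix formulas.

```python
import math


def unit_lod(probs, theta):
    """Eq. [12] with counts replaced by probabilities: sum q log10(q / q_null)."""
    q_alt = probs(theta)
    q_null = probs(0.5)
    return sum(q * math.log10(q / q0) for q, q0 in zip(q_alt, q_null) if q > 0)


def bb_intercross(theta):
    """Genotypic frequencies q1..q9 of Table 2 (AB/ab x AB/ab)."""
    q1 = 0.25 * (1 - theta) ** 2
    q2 = 0.5 * theta * (1 - theta)
    q3 = 0.25 * theta ** 2
    q5 = 2 * (q1 + q3)
    return [q1, q2, q3, q2, q5, q2, q3, q2, q1]


def gametes(theta):
    """Four distinguishable gametes of one doubly heterozygous parent."""
    return [theta / 2, theta / 2, (1 - theta) / 2, (1 - theta) / 2]


def backcross(theta):
    """Only one parent segregates, so each offspring carries one meiosis."""
    return gametes(theta)


def fully_informative_intercross(theta):
    """MM-like case: both meioses are observable in every offspring."""
    return [p * q for p in gametes(theta) for q in gametes(theta)]


for name, design in [("backcross", backcross), ("BB intercross", bb_intercross),
                     ("MM", fully_informative_intercross)]:
    print(name, round(unit_lod(design, 0.10), 3))
```

At θ = 0.10 this gives roughly 0.16 (backcross), 0.21 (BB intercross), and 0.32 (MM), consistent with the statements that an intercross is more efficient than a backcross for codominant loci and that fully informative data are the most efficient.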
Bias and Reduction in LOD Score Due to Ignoring Noninformative Offspring. To quantify the benefit of including noninformative offspring in linkage analysis, bias in estimates of recombination frequency due to ignoring noninformative offspring and the reduction in unit LOD score are evaluated under the assumption that the observed offspring distribution equals the expected. Bias is defined as the difference between the estimate using informative offspring only and the estimate using the full data. The reduction in LOD score is defined as the difference in the unit LOD scores between using the full data and using informative offspring only. As will be shown in this article, the full-data methods developed here yield estimates that are exactly the same as the true parameters under the assumption that the observed offspring distribution equals the expected. Therefore, the bias in recombination frequency due to ignoring noninformative offspring can be expressed as θd − θ, where θd = estimate of recombination frequency using direct counting, and θ = the true recombination frequency.
Results and Discussion
Results of the new method for estimating recombination frequencies will be presented for eight cases in order from the most informative loci (both loci have multiple codominant alleles) to the least informative loci (both loci are dominant with mixed parental linkage phases).
Both Loci Are Multiallelic and Codominant (MM Data Type). “Multiallelic” in this article refers to three or more alleles per locus for the two heterozygous parents. This definition is used because three alleles per locus for the parents result in 100% informative offspring for the locus. The direct counting method is used for this type of data. The purpose of describing this type of data is not to develop a new method, but to provide a comparison with less informative types of data where noninformative offspring exist. By the direct counting method, the gender-specific recombination frequencies are estimated by Eqs. [4] and [5]. When x and y are available, gender-average recombination frequency can be obtained by Eq. [7]. Likelihood functions and LOD scores are given in Appendices 1 and 2.
Multiallelic and Biallelic Codominant Loci (MB Data Type). From Table 1, gender-specific recombination frequencies can be obtained by the following iterative solutions:
$$x^{(i+1)} = a + \frac{b\,x^{(i)}(1 - y^{(i)})}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [13]$$
$$y^{(i+1)} = d + \frac{b\,(1 - x^{(i)})y^{(i)}}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [14]$$
where x = female recombination frequency, y = male recombination frequency, superscript i = iteration number, a = (k2 + k3 + k6 + k7)/n, b = (k9 + k12)/n, c = (k10 + k11)/n, and d = (k2 + k4 + k5 + k7)/n, and where k1 through k12 are defined in Table 1. Then the gender-average recombination frequency can be estimated as θ = (x + y)/2, noting that the male and female parents have the same number of meioses. This method of estimating gender-average recombination frequency will also be used for other cases where gender-specific recombination frequencies are available.
Biallelic Codominant Loci with Coupling or Repulsion Parental Linkage Phases (BB Data Type). For this case, gender-specific recombination frequencies are unavailable and gender-average recombination frequency can be estimated based on Table 2. Substituting nr from Table 2 into Eq. [3], Eq. [3] can be written as a third-degree polynomial function of θ, and the solution for θ is
$$\theta = \left[-s + \sqrt{s^2 + t^3}\right]^{1/3} - \left[s + \sqrt{s^2 + t^3}\right]^{1/3} + \frac{a_1}{3} \qquad [15]$$
where s = ¹⁄₂[a1a2/3 − (2/27)a1³ − c], t = ¹⁄₃(a2 − a1²/3), a1 = (T + c1 + n4)/T, a2 = 0.5 + c1/T, and where T = 2n, c1 = 2n3 + n2, c = c1/(2T), n1 = k1 + k9, n2 = k2 + k4 + k6 + k8, n3 = k3 + k7, and n4 = k5. Note that Eq. [15] is derived under the assumption of coupling parental linkage phases but is applicable to the repulsion linkage phases by reversing the allele definitions for one of the two loci.
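The MB and BB estimators, together with the direct-counting bias θd − θ, can be checked numerically. The Python sketch below is a hedged illustration rather than the authors' code: it iterates Eqs. [13] and [14] using the Table 1 row order, evaluates the closed form of Eq. [15] using the Table 2 row order, and computes the direct-counting estimate that simply drops the AaBb class. Feeding in counts equal to their expectations (200 offspring; x = 0.10 and y = 0.20 for MB, θ = 0.20 for BB) should return those values. The helper names, starting values, tolerance, and the error raised when s² + t³ < 0 (where the real-root form of Eq. [15] does not apply) are assumptions.

```python
import math


def mb_iterate(k, x=0.25, y=0.25, tol=1e-12, max_iter=10_000):
    """Eqs. [13]-[14]; k = [k1, ..., k12] in the row order of Table 1."""
    n = sum(k)
    a = (k[1] + k[2] + k[5] + k[6]) / n   # (k2 + k3 + k6 + k7)/n
    b = (k[8] + k[11]) / n                # (k9 + k12)/n
    c = (k[9] + k[10]) / n                # (k10 + k11)/n
    d = (k[1] + k[3] + k[4] + k[6]) / n   # (k2 + k4 + k5 + k7)/n
    for _ in range(max_iter):
        one_rec = x * (1 - y) + (1 - x) * y      # A1A3Bb, A2A4Bb: exactly one recombinant
        zero_or_two = (1 - x) * (1 - y) + x * y  # A1A4Bb, A2A3Bb: zero or two
        x_new = a + b * x * (1 - y) / one_rec + c * x * y / zero_or_two
        y_new = d + b * (1 - x) * y / one_rec + c * x * y / zero_or_two
        if max(abs(x_new - x), abs(y_new - y)) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y


def real_cbrt(v):
    """Real cube root that keeps the sign of v."""
    return math.copysign(abs(v) ** (1 / 3), v)


def bb_closed_form(k):
    """Eq. [15]; k = [k1, ..., k9] in the row order of Table 2 (coupling)."""
    n = sum(k)
    T = 2 * n
    n2 = k[1] + k[3] + k[5] + k[7]
    n3 = k[2] + k[6]
    n4 = k[4]
    c1 = 2 * n3 + n2
    a1 = (T + c1 + n4) / T
    a2 = 0.5 + c1 / T
    c = c1 / (2 * T)
    s = 0.5 * (a1 * a2 / 3 - (2 / 27) * a1 ** 3 - c)
    t = (a2 - a1 ** 2 / 3) / 3
    disc = s ** 2 + t ** 3
    if disc < 0:
        raise ValueError("s^2 + t^3 < 0: the real-root form of Eq. [15] does not apply")
    r = math.sqrt(disc)
    return real_cbrt(-s + r) - real_cbrt(s + r) + a1 / 3


def bb_direct_counting(k):
    """theta_d: drop the noninformative AaBb class (k5) and count recombinants directly."""
    informative = sum(k) - k[4]
    recombinants = (k[1] + k[3] + k[5] + k[7]) + 2 * (k[2] + k[6])
    return recombinants / (2 * informative)


def table1(x, y):
    q1, q2 = 0.25 * (1 - x) * (1 - y), 0.25 * x * y
    q3, q4 = 0.25 * x * (1 - y), 0.25 * (1 - x) * y
    return [q1, q2, q3, q4, q4, q3, q2, q1, q3 + q4, q1 + q2, q1 + q2, q3 + q4]


def table2(theta):
    q1, q2, q3 = 0.25 * (1 - theta) ** 2, 0.5 * theta * (1 - theta), 0.25 * theta ** 2
    return [q1, q2, q3, q2, 2 * (q1 + q3), q2, q3, q2, q1]


mb_counts = [200 * q for q in table1(0.10, 0.20)]
bb_counts = [200 * q for q in table2(0.20)]
x_hat, y_hat = mb_iterate(mb_counts)
theta_hat = bb_closed_form(bb_counts)
print(x_hat, y_hat, (x_hat + y_hat) / 2)
print(theta_hat, bb_direct_counting(bb_counts) - theta_hat)  # bias theta_d - theta
```

With these BB counts, direct counting uses 72 recombinants in 264 informative meioses, so θd = 72/264 ≈ 0.273, an upward bias of about 0.073 relative to θ = 0.20.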
Biallelic Codominant Loci with Mixed Parental Linkage
Phases (BB-CR Data Type). From Table 3, gender-specific recombination frequencies can be obtained by the following
iterative solutions:
$$x^{(i+1)} = a + \frac{b\,x^{(i)}(1 - y^{(i)})}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [16]$$
$$y^{(i+1)} = d + \frac{b\,(1 - x^{(i)})y^{(i)}}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [17]$$
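A minimal Python sketch of Eqs. [16] and [17], assuming the Table 3 row order and the coefficient definitions that accompany these equations in the text: a = (k1 + k9)/n, b = k5/n, c = (k2 + k4 + k6 + k8)/n, and d = (k3 + k7)/n. The update has the same shape as the MB iteration; only the coefficients change. Because the Table 3 frequencies depend on x and y only through x(1 − y) and (1 − x)y, the points (x, y) and (1 − y, 1 − x) fit equally well; the sketch starts at 0.25 on the assumption that this keeps the iteration on the solution with x, y < 0.5. Starting values and tolerance are illustrative.

```python
def bbcr_iterate(k, x=0.25, y=0.25, tol=1e-12, max_iter=10_000):
    """Eqs. [16]-[17]; k = [k1, ..., k9] in the row order of Table 3."""
    n = sum(k)
    a = (k[0] + k[8]) / n                # (k1 + k9)/n: female recombinants seen directly
    b = k[4] / n                         # k5/n: AaBb, one recombinant of unknown origin
    c = (k[1] + k[3] + k[5] + k[7]) / n  # (k2 + k4 + k6 + k8)/n: zero or two recombinants
    d = (k[2] + k[6]) / n                # (k3 + k7)/n: male recombinants seen directly
    for _ in range(max_iter):
        one_rec = x * (1 - y) + (1 - x) * y
        zero_or_two = (1 - x) * (1 - y) + x * y
        x_new = a + b * x * (1 - y) / one_rec + c * x * y / zero_or_two
        y_new = d + b * (1 - x) * y / one_rec + c * x * y / zero_or_two
        if max(abs(x_new - x), abs(y_new - y)) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y


def table3(x, y):
    """Expected genotypic frequencies q1..q9 of Table 3."""
    q1 = 0.25 * x * (1 - y)
    q2 = 0.25 * ((1 - x) * (1 - y) + x * y)
    q3 = 0.25 * (1 - x) * y
    q5 = 0.5 * (x * (1 - y) + (1 - x) * y)
    return [q1, q2, q3, q2, q5, q2, q3, q2, q1]


x_hat, y_hat = bbcr_iterate([200 * q for q in table3(0.10, 0.20)])
print(x_hat, y_hat)
```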
Table 1. Genotypic frequency, number of observations, and the number of recombinants in the offspring from the intercross of A1B/A2b (male) × A3B/A4b (female)
| Genotype | Genotypic frequencyᵃ | Number of observations | Female recombinantsᵇ | Male recombinantsᵇ |
|---|---|---|---|---|
| A1A3BB | q1 = ¹⁄₄(1 − x)(1 − y) | k1 | 0 | 0 |
| A1A3bb | q2 = ¹⁄₄xy | k2 | k2 | k2 |
| A1A4BB | q3 = ¹⁄₄x(1 − y) | k3 | k3 | 0 |
| A1A4bb | q4 = ¹⁄₄(1 − x)y | k4 | 0 | k4 |
| A2A3BB | q5 = q4 | k5 | 0 | k5 |
| A2A3bb | q6 = q3 | k6 | k6 | 0 |
| A2A4BB | q7 = q2 | k7 | k7 | k7 |
| A2A4bb | q8 = q1 | k8 | 0 | 0 |
| A1A3Bb | q9 = q3 + q4 | k9 | v1k9 | v2k9 |
| A1A4Bb | q10 = q1 + q2 | k10 | v3k10 | v3k10 |
| A2A3Bb | q11 = q1 + q2 | k11 | v3k11 | v3k11 |
| A2A4Bb | q12 = q3 + q4 | k12 | v1k12 | v2k12 |
| Total | 1 | n | nx | ny |
ᵃx = female recombination frequency, y = male recombination frequency.
ᵇv1 = x(1 − y)/[x(1 − y) + (1 − x)y], v2 = (1 − x)y/[x(1 − y) + (1 − x)y], v3 = xy/[xy + (1 − x)(1 − y)].
Table 2. Genotypic frequency, number of observations, and the number of recombinants in the offspring from the intercross of AB/ab × AB/ab
| Genotype | Genotypic frequencyᵃ | Number of observations | Number of recombinants |
|---|---|---|---|
| AABB | q1 = ¹⁄₄(1 − θ)² | k1 | 0 |
| AABb | q2 = ¹⁄₂θ(1 − θ) | k2 | k2 |
| AAbb | q3 = ¹⁄₄θ² | k3 | 2k3 |
| AaBB | q4 = q2 | k4 | k4 |
| AaBb | q5 = 2(q1 + q3) | k5 | 2k5θ²/[(1 − θ)² + θ²] |
| Aabb | q6 = q2 | k6 | k6 |
| aaBB | q7 = q3 | k7 | 2k7 |
| aaBb | q8 = q2 | k8 | k8 |
| aabb | q9 = q1 | k9 | 0 |
| Total | 1 | n | nr |
ᵃθ = gender-average recombination frequency.
Table 3. Offspring phenotypes and recombinants from the mating of AB/ab (male) × Ab/aB (female)
| Genotype | Genotypic frequency | Number of observations | Female recombinantsᵃ | Male recombinantsᵃ |
|---|---|---|---|---|
| AABB | q1 = ¹⁄₄x(1 − y) | k1 | k1 | 0 |
| AABb | q2 = ¹⁄₄[(1 − x)(1 − y) + xy] | k2 | v1k2 | v1k2 |
| AAbb | q3 = ¹⁄₄(1 − x)y | k3 | 0 | k3 |
| AaBB | q4 = q2 | k4 | v1k4 | v1k4 |
| AaBb | q5 = ¹⁄₂[x(1 − y) + (1 − x)y] | k5 | v3k5 | v2k5 |
| Aabb | q6 = q2 | k6 | v1k6 | v1k6 |
| aaBB | q7 = q3 | k7 | 0 | k7 |
| aaBb | q8 = q2 | k8 | v1k8 | v1k8 |
| aabb | q9 = q1 | k9 | k9 | 0 |
| Total | 1 | n | nx | ny |
ᵃv1 = xy/[(1 − x)(1 − y) + xy], v2 = (1 − x)y/[x(1 − y) + (1 − x)y], v3 = x(1 − y)/[x(1 − y) + (1 − x)y].
Table 4. Genotypic frequency, number of observations, and the number of recombinants in the offspring from the intercross of A1B/A2b (male) × A3B/A4b (female) with B being dominant over b
| Genotype | Genotypic frequency | Number of observations | Female recombinants | Male recombinants |
|---|---|---|---|---|
| A1A3bb | q1 = ¹⁄₄xy | k1 | k1 | k1 |
| A1A4bb | q2 = ¹⁄₄(1 − x)y | k2 | 0 | k2 |
| A2A3bb | q3 = ¹⁄₄x(1 − y) | k3 | k3 | 0 |
| A2A4bb | q4 = ¹⁄₄(1 − x)(1 − y) | k4 | 0 | 0 |
| A1A3B_ | q5 = ¹⁄₄(1 − xy) | k5 | k5x(1 − y)/(1 − xy) | k5(1 − x)y/(1 − xy) |
| A1A4B_ | q6 = ¹⁄₄[1 − (1 − x)y] | k6 | k6x/[1 − (1 − x)y] | k6xy/[1 − (1 − x)y] |
| A2A3B_ | q7 = ¹⁄₄[1 − x(1 − y)] | k7 | k7xy/[1 − x(1 − y)] | k7y/[1 − x(1 − y)] |
| A2A4B_ | q8 = ¹⁄₄(x + y − xy) | k8 | k8x/(x + y − xy) | k8y/(x + y − xy) |
| Total | 1 | n | nx | ny |
where, in Eqs. [16] and [17], x = female recombination frequency, y = male
recombination frequency, a = (k1 + k9)/n, b = k5/n, c =
(k2 + k4 + k6 + k8)/n, and d = (k3 + k7)/n, with k1 through k9 defined in Table 3.
Multiallelic Codominant Locus and Dominant Locus (MD Data Type). From Table 4, gender-specific recombination frequencies can be obtained by the following iterative solutions:
$$x^{(i+1)} = \frac{a\,x^{(i)}(1 - y^{(i)})}{1 - x^{(i)}y^{(i)}} + \frac{b\,x^{(i)}}{1 - (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{1 - x^{(i)}(1 - y^{(i)})} + \frac{d\,x^{(i)}}{x^{(i)} + y^{(i)} - x^{(i)}y^{(i)}} + f \qquad [18]$$
$$y^{(i+1)} = \frac{a(1 - x^{(i)})y^{(i)}}{1 - x^{(i)}y^{(i)}} + \frac{b\,x^{(i)}y^{(i)}}{1 - (1 - x^{(i)})y^{(i)}} + \frac{c\,y^{(i)}}{1 - x^{(i)}(1 - y^{(i)})} + \frac{d\,y^{(i)}}{x^{(i)} + y^{(i)} - x^{(i)}y^{(i)}} + e \qquad [19]$$
where a = k5/n, b = k6/n, c = k7/n, d = k8/n, e = (k1 + k2)/n, and f = (k1 + k3)/n.
Biallelic Codominant Locus and Dominant Locus with Coupling or Repulsion Parental Linkage Phases (BD Data Type). Gender-specific recombination frequencies are nonestimable for this case. From Table 5, the gender-average recombination frequency can be obtained using the following iterative solution:
$$\theta^{(i+1)} = a + \frac{b\,\theta^{(i)}}{1 + \theta^{(i)}} + \frac{c\,\theta^{(i)}(1 + \theta^{(i)})}{1 - \theta^{(i)}(1 - \theta^{(i)})} + \frac{d}{2 - \theta^{(i)}} \qquad [20]$$
where a = (2k2 + k4)/(2n), b = k1/n, c = k3/(2n), and d = k5/n.
Biallelic Codominant Locus and Dominant Locus with Mixed Parental Linkage Phases (BD-CR Data Type). From Table 6, gender-specific recombination frequencies can be obtained by the following iterative solutions:
$$x^{(i+1)} = a v_1^{(i)} + c v_3^{(i)} + d v_5^{(i)} + e v_7^{(i)} + f \qquad [21]$$
$$y^{(i+1)} = a v_2^{(i)} + b + c v_4^{(i)} + d v_6^{(i)} + e v_8^{(i)} \qquad [22]$$
where a = k1/n, b = k2/n, c = k3/n, d = k4/n, e = k5/n, f =
k6/n, v1 = [x(1 − y) + xy]/(1 − y + xy), v2 = xy/(1 − y + xy),
v3 = 2[x(1 − y) + xy]/(1 + x + y − 2xy), v4 = 2[(1 − x)y +
xy]/(1 + x + y − 2xy), v5 = xy/[(1 − x)(1 − y) + xy], v6 =
[x + (1 − x)y]/[(1 − x)(1 − y) + xy], v7 = xy/(1 − x + xy),
v8 = [(1 − x)y + xy]/(1 − x + xy).
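Eqs. [18] to [20] can be sketched the same way. The Python below is a hedged illustration, not the authors' program: it assumes the Table 4 row order for MD (adding f = (k1 + k3)/n to the female update and e = (k1 + k2)/n to the male update, as the recombinant columns of Table 4 imply) and the Table 5 row order for BD. Starting values, tolerances, and function names are assumptions.

```python
def md_iterate(k, x=0.25, y=0.25, tol=1e-12, max_iter=10_000):
    """Eqs. [18]-[19]; k = [k1, ..., k8] in the row order of Table 4."""
    n = sum(k)
    a, b, c, d = k[4] / n, k[5] / n, k[6] / n, k[7] / n
    e = (k[0] + k[1]) / n   # male recombinants counted directly (A1A3bb, A1A4bb)
    f = (k[0] + k[2]) / n   # female recombinants counted directly (A1A3bb, A2A3bb)
    for _ in range(max_iter):
        d5 = 1 - x * y          # A1A3B_
        d6 = 1 - (1 - x) * y    # A1A4B_
        d7 = 1 - x * (1 - y)    # A2A3B_
        d8 = x + y - x * y      # A2A4B_
        x_new = a * x * (1 - y) / d5 + b * x / d6 + c * x * y / d7 + d * x / d8 + f
        y_new = a * (1 - x) * y / d5 + b * x * y / d6 + c * y / d7 + d * y / d8 + e
        if max(abs(x_new - x), abs(y_new - y)) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y


def bd_iterate(k, theta=0.25, tol=1e-12, max_iter=10_000):
    """Eq. [20]; k = [k1, ..., k6] in the row order of Table 5."""
    n = sum(k)
    a = (2 * k[1] + k[3]) / (2 * n)
    b = k[0] / n
    c = k[2] / (2 * n)
    d = k[4] / n
    for _ in range(max_iter):
        new = (a + b * theta / (1 + theta)
               + c * theta * (1 + theta) / (1 - theta * (1 - theta))
               + d / (2 - theta))
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta


def table4(x, y):
    return [0.25 * x * y, 0.25 * (1 - x) * y, 0.25 * x * (1 - y),
            0.25 * (1 - x) * (1 - y), 0.25 * (1 - x * y), 0.25 * (1 - (1 - x) * y),
            0.25 * (1 - x * (1 - y)), 0.25 * (x + y - x * y)]


def table5(theta):
    return [0.25 * (1 - theta) * (1 + theta), 0.25 * theta ** 2,
            0.5 * (1 - theta * (1 - theta)), 0.5 * theta * (1 - theta),
            0.25 * theta * (2 - theta), 0.25 * (1 - theta) ** 2]


md_x, md_y = md_iterate([200 * q for q in table4(0.10, 0.20)])
bd_theta = bd_iterate([200 * q for q in table5(0.20)])
print(md_x, md_y, bd_theta)
```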
Table 5. Genotypic frequency, number of observations, and the number of recombinants in the offspring from the intercross of AB/ab × AB/ab with B being dominant over b
| Genotype | Genotypic frequency | Number of observations | Number of recombinants |
|---|---|---|---|
| AAB_ | q1 = ¹⁄₄(1 − θ)(1 + θ) | k1 | 2k1θ/(1 + θ) |
| AAbb | q2 = ¹⁄₄θ² | k2 | 2k2 |
| AaB_ | q3 = ¹⁄₂[1 − θ(1 − θ)] | k3 | k3θ(1 + θ)/[1 − θ(1 − θ)] |
| Aabb | q4 = ¹⁄₂θ(1 − θ) | k4 | k4 |
| aaB_ | q5 = ¹⁄₄θ(2 − θ) | k5 | 2k5/(2 − θ) |
| aabb | q6 = ¹⁄₄(1 − θ)² | k6 | 0 |
| Total | 1 | n | nr |
Table 6. Offspring phenotypes and recombinants from the mating of AB/ab (male) × Ab/aB (female)
| Genotype | Genotypic frequency | Number of observations | Female recombinantsᵃ | Male recombinantsᵃ |
|---|---|---|---|---|
| AAB_ | q1 = ¹⁄₄(1 − y + xy) | k1 | k1v1 | k1v2 |
| AAbb | q2 = ¹⁄₄(1 − x)y | k2 | 0 | k2 |
| AaB_ | q3 = ¹⁄₄(1 + x + y − 2xy) | k3 | k3v3 | k3v4 |
| Aabb | q4 = ¹⁄₄[(1 − x)(1 − y) + xy] | k4 | k4v5 | k4v6 |
| aaB_ | q5 = ¹⁄₄(1 − x + xy) | k5 | k5v7 | k5v8 |
| aabb | q6 = ¹⁄₄x(1 − y) | k6 | k6 | 0 |
| Total | 1 | n | nx | ny |
ᵃv1 = [x(1 − y) + xy]/(1 − y + xy), v2 = xy/(1 − y + xy), v3 = 2[x(1 − y) + xy]/(1 + x + y − 2xy), v4 = 2[(1 − x)y + xy]/(1 + x + y − 2xy), v5 = xy/[(1 − x)(1 − y) + xy], v6 = [x + (1 − x)y]/[(1 − x)(1 − y) + xy], v7 = xy/(1 − x + xy), v8 = [(1 − x)y + xy]/(1 − x + xy).
Two Dominant Loci with Coupling Linkage Phases (DD-CC Data Type). In this case, both parents are assumed to have coupling linkage phase (Table 7). The gender-average recombination frequency can be obtained from the following iterative solution:
$$\theta^{(i+1)} = \frac{4a\,\theta^{(i)}(1 + \theta^{(i)})}{2 + (1 - \theta^{(i)})^2} + \frac{2b}{2 - \theta^{(i)}} \qquad [23]$$
where a = k1/(2n), and b = (k2 + k3)/(2n).
Two Dominant Loci with Mixed Linkage Phases (DD-CR Data Type). In this case, one parent is assumed
to have coupling phase and the other repulsion phase
(Table 8). The gender-average recombination frequency
can be obtained from the following iterative solution:
$$\theta^{(i+1)} = \frac{a\,\theta^{(i)}(5 - \theta^{(i)})}{2 + \theta^{(i)}(1 - \theta^{(i)})} + \frac{b\,\theta^{(i)}(1 + \theta^{(i)})}{1 - \theta^{(i)}(1 - \theta^{(i)})} + c \qquad [24]$$
where a = k1/(2n), b = (k2 + k3)/(2n), and c = k4/(2n).
For the case when the two loci are dominant and both
parents have repulsion linkage phase (DD-RR data
type), the analytical formula for maximum likelihood
estimation of recombination frequency is available from
Knapp et al. (1995).
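A minimal Python sketch of Eq. [24], assuming the Table 8 row order (A_B_, A_bb, aaB_, aabb). It also returns the number of iterations, because this fixed-point update contracts slowly near the solution; the tolerance, iteration cap, and starting value are illustrative assumptions.

```python
def dd_cr_iterate(k, theta=0.25, tol=1e-12, max_iter=100_000):
    """Eq. [24]; k = [k1, k2, k3, k4] in the row order of Table 8."""
    n = sum(k)
    a = k[0] / (2 * n)
    b = (k[1] + k[2]) / (2 * n)
    c = k[3] / (2 * n)
    for iteration in range(1, max_iter + 1):
        new = (a * theta * (5 - theta) / (2 + theta * (1 - theta))
               + b * theta * (1 + theta) / (1 - theta * (1 - theta))
               + c)
        if abs(new - theta) < tol:
            return new, iteration
        theta = new
    return theta, max_iter


def table8(theta):
    """Expected genotypic frequencies q1..q4 of Table 8."""
    h = theta * (1 - theta)
    return [0.25 * (2 + h), 0.25 * (1 - h), 0.25 * (1 - h), 0.25 * h]


theta_hat, n_iter = dd_cr_iterate([200 * q for q in table8(0.20)])
print(theta_hat, n_iter)
```

Expect many more iterations here than for the other data types, which mirrors the slow DD-CR convergence the authors report.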
Numerical Results for Estimating Recombination Frequencies. Equations [13] through [24] were validated and
tested using 200 offspring genotypes generated with the
requirement that the observed genotypic frequencies
equal the expected. The true parameters used to generate
the offspring genotypes were θ = 0.20, x = 0.10, and y =
0.20. Two sets of extreme starting values, θ0 = 0.01, x0 =
0.01, and y0 = 0.01, and θ0 = 0.45, x0 = 0.45, and y0 = 0.45,
were used to test the robustness of the iterative solutions
to starting values. Equations [13] through [24] all yielded
estimates of recombination frequencies that are exactly
the same as the assumed true parameters. The iterative
solutions required fewer than 55 iterations to converge
with a tolerance level of 10⁻⁹, except for the case of dominance with mixed linkage phases (DD-CR), which required 235 to 284 iterations to converge. In terms of CPU
time, all the iterative solutions required less than 1 s
to converge on an 800-MHz laptop computer. The two
different sets of extreme starting values did not have a
significant effect on the number of iterations or computing time. The case of dominance with mixed parental
linkage phases not only required more iterations, but
also was the least efficient data type, as discussed below.
For all the cases, direct and indirect counting yielded
exactly the same results as maximum likelihood analysis.
The method of direct and indirect counting should be a
useful addition or alternative to current methods available for linkage analysis including complex maximum
likelihood analysis due to its mathematical simplicity
and computational efficiency. When combined with the
strategy of two-point analysis for linkage detection, the
method of direct and indirect counting should allow rapid
large-scale joint linkage analysis of codominant and dominant loci, which is useful to facilitate mapping dominant
Table 7. Genotypic frequency, number of observations, and the number of
recombinants in the offspring from the intercross of AB/ab × AB/ab with
allele A being dominant over a and B being dominant over b
Genotypic
frequency
Number of
observations
Number of
recombinants
A-BA-bb
aaBaabb
q1 = ¹⁄₄[2+(1 − θ)2]
q2 = ¹⁄₄θ(2 − θ)
q3 = ¹⁄₄θ(2 − θ)
q4 = ¹⁄₄(1 − θ)2
k1
k2
k3
k4
4k1θ(1 + θ)/[2 + (1 − θ)2]
2k2/(2 − θ)
2k3/(2 − θ)
0
Total
1
n
nr
Genotype
2535
Direct and indirect counting
Table 8. Genotypic frequency, number of observations, and the number of
recombinants in the offspring from the intercross of AB/ab × Ab/aB with
A being dominant over a and B being dominant over b
Genotype
A-BA-bb
aaBaabb
Total
Genotypic frequency
q1 = ¹⁄₄[2 + θ(1
q2 = ¹⁄₄[1 − θ(1
q3 = ¹⁄₄[1 − θ(1
q4 = ¹⁄₄θ(1 −
− θ)]
− θ)]
− θ)]
θ)
1
loci using codominant markers and the map integration
of codominant and dominant loci. The estimates of recombination frequencies from direct and indirect counting
are the expected fraction of recombinants whether the
estimates are within or outside the parameter space.
This is helpful in interpreting the estimates in situations
where the meanings of the estimates are not easily interpretable. For example, if a maximum likelihood using
numerical maximization yielded an estimate outside the
parameter space, the estimate itself could not tell
whether the problem was due to the algorithm of numerical maximization or due to a wrong model or sampling.
As shown in London et al. (2002) and Xu et al. (2002), a
wrong inheritance model could result in a serious bias
in estimating recombination frequencies (including estimates out of the parameter space) and such a bias could
be evaluated conveniently using the method of direct and
indirect counting.
Relative Efficiencies. Figure 1 shows that the unit LOD
scores are affected by the inheritance mode of each locus
and the parental linkage phases but unaffected by the
polymorphism of the locus for all cases where noninformative offspring exist. Genotypic data with 100% informative offspring (both loci are multiallelic and codominant; MM in Figure 1) is the most efficient data type for
linkage analysis even though offspring noninformative
for direct counting in other types of data are used by
indirect counting. This implies that an offspring noninformative for direct counting is only partially informative
for indirect counting and is never as good as an informative offspring. Mixed linkage phases (BB-CR, BD-CR,
DD-CR in Figure 1) are less efficient than coupling and
repulsion phases. For dominant loci, coupling linkage
phases (DD-CC in Figure 1) are strikingly more efficient
than the mixed and repulsion phases (DD-CR and DDRR in Figure 1). For example, assuming θ = 0.05, the
unit LOD for the repulsion phases is only 22% of that
for the coupling phases, whereas the unit LOD for the
mixed phases is a mere 12% of that for the coupling
phases. The backcross design is better than the intercross
or F-2 design for mapping dominant loci but is worse for
mapping codominant loci.

Number of observations    Number of recombinants
k1                        k1θ(5 − θ)/[2 + θ(1 − θ)]
k2                        k2θ(1 + θ)/[1 − θ(1 − θ)]
k3                        k3θ(1 + θ)/[1 − θ(1 − θ)]
k4                        k4
n                         nr

Compared with results on relative efficiencies in the literature, the results in this article provide new information on the effect of marker polymorphism on the unit LOD scores and on the ability to determine parental linkage phases, and they reach essentially the same conclusion regarding the effect of inheritance mode; that is, dominance carries less linkage information than codominance, and backcross is more efficient for dominant loci (Allard, 1956; Green, 1981; Knapp et al., 1995). However, the result for dominant loci with mixed linkage phases (DD-CR) differs somewhat from that of Green (1981), who found DD-CR to be more efficient than repulsion linkage phases for small recombination frequencies. This difference could be attributable to the different methods used to evaluate relative efficiencies: this article used the unit LOD, whereas Green (1981) used the information matrix,
Figure 1. Unit LOD scores for various types of data for linkage analysis. MM: two multiallelic codominant loci; MB: one multiallelic codominant locus and one biallelic codominant locus; BB: both loci are biallelic and codominant with coupling or repulsion linkage phases; BB-CR: both loci are biallelic and codominant with mixed linkage phases; MD: one multiallelic codominant locus and one dominant locus; BD: one biallelic codominant locus and one dominant locus with coupling or repulsion linkage phases; BD-CR: one biallelic codominant locus and one dominant locus with mixed parental linkage phases; DD-CC: two dominant loci in coupling linkage phase; DD-CR: two dominant loci with mixed parental linkage phases; DD-RR: two dominant loci in repulsion linkage phase.
2536
Da et al.
Figure 2. Likelihood ratios for identifying parental linkage phases. The likelihood ratio, on a log10 scale, is the ratio of the two largest likelihood functions for each case and is calculated from 200 F-2 offspring. MM: two multiallelic codominant loci; MB: one multiallelic codominant locus and one biallelic codominant locus; BB: both loci are biallelic and codominant; MD: one multiallelic codominant locus and one dominant locus; BD: one biallelic codominant locus and one dominant locus; DD: two dominant loci.
which is an approximation of the second moments of
parameter estimates based on the second derivatives of
the log-likelihood function. It is worth noting that the
same author recommended avoiding the experimental
design using mixed linkage phases (DD-CR in Figure 1)
for linkage analysis (p. 85, Green, 1981). For inference
about parental linkage phases, both the locus polymorphism and inheritance mode affect the statistical power;
that is, the power for detecting parental linkage phases
decreases as locus polymorphism decreases and as the
number of dominant loci increases (Figure 2). Note that
the BB-CR data type (Table 3) does not have the ability
to distinguish between the two possible cases of mixed
linkage phases, Phase II and Phase III in Table 9, using
the likelihood ratio test in Appendix 2. Therefore, knowing the parental linkage phases is a necessary condition for estimating gender-specific recombination frequencies using Eqs. [16] and [17]. If mixed parental linkage phases are identified as the most likely parental phases but cannot be resolved into Phase II or Phase III, Eqs. [16] and [17] can still be used, but they cannot identify which estimate is the male and which is the female recombination frequency.
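The unit LOD expressions in Appendix 1 have the form Σ p log10(p/p0), the expected LOD contribution of one offspring, where p and p0 are the genotypic class probabilities at θ and at θ = 0.5. As a check on the 22% and 12% figures quoted for dominant loci, the sketch below evaluates the DD-CC, DD-CR, and DD-RR unit LOD scores at θ = 0.05. The pooled class probabilities (A-B-; A-bb plus aaB-; aabb) are our reading of the Appendix 1 expressions, and the helper names are hypothetical.

```python
import math


def unit_lod(probs, null_probs):
    """Expected LOD per offspring: sum of p * log10(p / p0) over classes."""
    return sum(p * math.log10(p / p0)
               for p, p0 in zip(probs, null_probs) if p > 0)


def dd_probs(theta, phase):
    """Pooled class probabilities (A-B-, A-bb + aaB-, aabb), two dominant loci.

    a / 4 is P(aabb): a = (1 - theta)^2 for coupling x coupling (CC),
    theta(1 - theta) for coupling x repulsion (CR), theta^2 for repulsion (RR).
    """
    a = {"CC": (1 - theta) ** 2,
         "CR": theta * (1 - theta),
         "RR": theta ** 2}[phase]
    return [(2 + a) / 4, (1 - a) / 2, a / 4]


theta = 0.05
null = dd_probs(0.5, "CC")  # 9/16, 6/16, 1/16, the same for every phase
u = {ph: unit_lod(dd_probs(theta, ph), null) for ph in ("CC", "CR", "RR")}
print(f"RR/CC = {u['RR'] / u['CC']:.2f}")  # RR/CC = 0.22
print(f"CR/CC = {u['CR'] / u['CC']:.2f}")  # CR/CC = 0.12
```

The general form P(A-B-) = (2 + a)/4 follows from P(aa) = P(bb) = 1/4 and P(aabb) = a/4, so only the aabb probability changes with the parental phases.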
Bias and Reduction in LOD Score Due to Ignoring Noninformative Offspring. The direct counting method does not apply when both loci are dominant, because only one genotype (aabb in Tables 7 and 8) is informative, which is not enough to estimate the recombination frequency. Therefore, this section applies only to cases where at least one locus is codominant. For biallelic loci with coupling or repulsion linkage phases where at least one locus is codominant (BB, BD), gender-specific recombination frequencies are unavailable, and the bias in the gender-average recombination frequency is

$$d_1 = \theta_d - \theta = -\frac{\theta(1-\theta)(1-2\theta)}{(1-\theta)^2 + \theta^2},$$

where θd = the gender-average recombination frequency estimated by the direct counting method. For biallelic loci with mixed parental linkage phases (BB-CR, BD-CR), θd = 0.5 irrespective of the true parameter value, and the bias is d2 = θd − θ = 0.5 − θ. For the same mixed-phase data types, the bias in the female recombination frequency is

$$d_3 = x_d - x = \frac{x(1-x)(1-2y)}{x + y - xy},$$

where xd = the female recombination frequency estimated by the direct counting method. The bias in the male recombination frequency is obtained from the same formula with x and y switched. The formulas for d1, d2, and d3 show that ignoring noninformative offspring yields underestimates of recombination frequencies for coupling or repulsion parental linkage phases and overestimates for mixed parental linkage phases. As shown in Figure 3, the absolute bias reaches its maximum at θ = 0.257 for the coupling or repulsion phases, where the bias is 0.151, whereas for the mixed phases the bias is an increasing function of θ and a decreasing function of x or y. These results indicate that ignoring noninformative offspring could result in a
serious bias for biallelic loci. Bias due to ignoring noninformative offspring was also reported for half-sib designs with biallelic codominant loci (Gomez-Raya, 2001). Our analytical and numerical results show that ignoring noninformative offspring does not bias the estimate of the recombination frequency when at least one locus is multiallelic. As expected, ignoring noninformative offspring reduces the LOD score in all cases (Figure 4).

Figure 3. Bias in estimates of recombination frequency due to ignoring noninformative offspring. BB: both loci are biallelic and codominant with coupling or repulsion linkage phases; BB-CR: both loci are biallelic and codominant with mixed parental linkage phases; BD: one biallelic codominant locus and one dominant locus with coupling or repulsion linkage phases; BD-CR: one biallelic codominant locus and one dominant locus with mixed parental linkage phases.

2537
Direct and indirect counting

Table 9. Association between genotypic probabilities and numbers of observations for testing parental linkage phases for an intercross of A1A2B1B2 × A3A4B3B4

Parental linkage phases: Phase I, A1B1/A2B2 × A3B3/A4B4; Phase II, A1B2/A2B1 × A3B3/A4B4; Phase III, A1B1/A2B2 × A3B4/A4B3; Phase IV, A1B2/A2B1 × A3B4/A4B3.

                                          Genotypic frequency
Genotype                                  Phase I               Phase II   Phase III   Phase IV   Number of observations
A1A3B1B3, A1A4B1B4, A2A3B2B3, A2A4B2B4    q1 = (1 − x)(1 − y)   q2         q3          q4         k1
A1A3B1B4, A1A4B1B3, A2A3B2B4, A2A4B2B3    q2 = x(1 − y)         q1         q4          q3         k2
A1A3B2B3, A1A4B2B4, A2A3B1B3, A2A4B1B4    q3 = (1 − x)y         q4         q1          q2         k3
A1A3B2B4, A1A4B2B3, A2A3B1B4, A2A4B1B3    q4 = xy               q3         q2          q1         k4
Total                                     1                     1          1           1          n
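The bias formulas d1, d2, and d3 given in the text are simple enough to evaluate directly. This minimal sketch codes them as written (the function names are ours) and locates the largest absolute d1 by a grid search over 0 < θ < 0.5; the grid and its spacing are arbitrary choices.

```python
def d1(theta):
    """Bias of the direct-counting estimate, coupling or repulsion (BB, BD)."""
    return -theta * (1 - theta) * (1 - 2 * theta) / ((1 - theta) ** 2 + theta ** 2)


def d2(theta):
    """Bias of the gender-average estimate, mixed phases (BB-CR, BD-CR)."""
    return 0.5 - theta


def d3(x, y):
    """Bias of the female estimate, mixed phases; call d3(y, x) for males."""
    return x * (1 - x) * (1 - 2 * y) / (x + y - x * y)


# Grid over 0 < theta < 0.5 with spacing 1e-4.
grid = [i / 10_000 for i in range(1, 5_000)]
theta_max = max(grid, key=lambda t: abs(d1(t)))
print(round(theta_max, 3), round(abs(d1(theta_max)), 2))  # 0.257 0.15
```

The negative sign of d1 and the positive signs of d2 and d3 (for y < 0.5) are the underestimation and overestimation described in the text.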
Figure 4. Reduction in LOD scores due to ignoring noninformative offspring. The reduction in LOD score for a data type is defined as the difference between the unit LOD in Figure 1 and the unit LOD score when noninformative offspring for direct counting are ignored.

Implications

Results from this study indicate that the method of direct and indirect counting can be an effective method for large-scale joint linkage analysis of codominant and dominant loci, which is useful for mapping dominant genes using codominant markers or for integrating linkage maps of codominant and dominant loci.

Literature Cited
Allard, R. W. 1956. Formulas and tables to facilitate the calculation of
recombination values in heredity. Hilgardia 24:235–278.
Ajmone-Marsan, P., A. Valentini, M. Cassandro, G. Vecchiotti-Antaldi,
G. Bertoni, and M. Kuiper. 1997. AFLP markers for DNA fingerprinting in cattle. Anim. Genet. 28:418–426.
Cushwa, W. T., and J. F. Medrano. 1996. Applications of the random
amplified polymorphic DNA (RAPD) assay for genetic analysis of
livestock species. Anim. Biotechnol. 7:11–31.
Da, Y., and H. A. Lewin. 1995. Linkage information content and efficiency of full-sib and half-sib designs for gene mapping. Theor.
Appl. Genet. 90:699–706.
Gomez-Raya, L. 2001. Biased estimation of the recombination fraction
using half-sib families and informative offspring. Genetics
157:1357–1367.
Green, E. L. 1981. Genetics and Probability in Animal Breeding Experiments. Oxford University Press, New York.
Green, P., K. Falls, and S. Crooks. 1990. Documentation for CRI-MAP, version 2.4. Washington University School of Medicine,
St. Louis.
Knapp, S. J., J. L. Holloway, W. C. Bridges, and B-H. Liu. 1995. Mapping
dominant markers using F2 matings. Theor. Appl. Genet.
91:74–81.
Knorr, C., H. H. Cheng, and J. B. Dodgson. 1999. Application of AFLP
markers to genome mapping in poultry. Anim. Genet. 30:28–36.
Lander, E. S., and D. Botstein. 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics
121:185–199.
Lathrop, G. M., J. M. Lalouel, C. Julier, and J. Ott. 1984. Strategies
for multilocus linkage analysis in humans. Proc. Natl. Acad. Sci.
USA 81:3443–3446.
London, N., J. Xu, J. Garbe, and Y. Da. 2002. Linkage analysis for the
hypothesized interaction between the polled and scurred traits
in cattle. In: Proc. 7th World Cong. Genet. Appl. Livest. Prod.,
Montpellier, France 29:485–488.
Ott, J. 1999. Analysis of Human Genetic Linkage. 3rd ed. The Johns
Hopkins University Press, Baltimore and London.
Smith, C.A.B. 1957. Counting methods in genetical statistics. Ann.
Hum. Genet. 21:254–276.
Xu, J., N. London, J. Garbe, and Y. Da. 2002. Bias in linkage analysis due to ignoring epistasis effects. In: Proc. 7th World
Cong. Genet. Appl. Livest. Prod., Montpellier, France
32:633–636.
Appendix 1. LOD Scores
MM data type: both loci are codominant and multiallelic (Table 9).
$$Z_x = n\log_{10}2 + (k_2+k_4)\log_{10}x + (k_1+k_3)\log_{10}(1-x)$$

$$Z_y = n\log_{10}2 + (k_3+k_4)\log_{10}y + (k_1+k_2)\log_{10}(1-y)$$

$$Z_\theta = 2n\log_{10}2 + (k_2+k_3+2k_4)\log_{10}\theta + (2k_1+k_2+k_3)\log_{10}(1-\theta)$$

$$u = 2\left[\log_{10}2 + \theta\log_{10}\theta + (1-\theta)\log_{10}(1-\theta)\right]$$
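For the MM data type every offspring is informative, so maximizing Zx, Zy, and Zθ reduces to counting: setting their derivatives to zero gives x̂ = (k2 + k4)/n, ŷ = (k3 + k4)/n, and θ̂ = (k2 + k3 + 2k4)/(2n). The sketch below (function names hypothetical) evaluates the LOD scores at these estimates and checks that, when the counts equal their expectations under Table 9 Phase I with x = y = θ, Zθ equals n·u.

```python
import math

LOG2 = math.log10(2)


def _term(count, p):
    """count * log10(p), taking 0 * log10(0) as 0."""
    return count * math.log10(p) if count else 0.0


def mm_lods(k1, k2, k3, k4):
    """Return (estimate, LOD) pairs for x, y, and theta (MM data type)."""
    n = k1 + k2 + k3 + k4
    x = (k2 + k4) / n
    y = (k3 + k4) / n
    theta = (k2 + k3 + 2 * k4) / (2 * n)
    zx = n * LOG2 + _term(k2 + k4, x) + _term(k1 + k3, 1 - x)
    zy = n * LOG2 + _term(k3 + k4, y) + _term(k1 + k2, 1 - y)
    zt = (2 * n * LOG2 + _term(k2 + k3 + 2 * k4, theta)
          + _term(2 * k1 + k2 + k3, 1 - theta))
    return (x, zx), (y, zy), (theta, zt)


def mm_unit_lod(theta):
    """Unit LOD u for the MM data type (expected LOD per offspring)."""
    return 2 * (LOG2 + theta * math.log10(theta)
                + (1 - theta) * math.log10(1 - theta))


# n = 200 offspring with counts equal to expectations at x = y = 0.1:
# q1 = 0.81, q2 = q3 = 0.09, q4 = 0.01.
(x, zx), (y, zy), (theta, zt) = mm_lods(162, 18, 18, 2)
print(theta, abs(zt - 200 * mm_unit_lod(theta)) < 1e-9)  # 0.1 True
```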
MB data type: both loci are codominant, but one locus is multiallelic and one locus is biallelic (Table 1).

$$Z_x = N_1\log_{10}[2(1-x)] + N_2\log_{10}(2x) + N_3\log_{10}[2x(1-y) + 2(1-x)y] + N_4\log_{10}[2xy + 2(1-x)(1-y)]$$

$$Z_y = N_5\log_{10}[2(1-y)] + N_6\log_{10}(2y) + N_3\log_{10}[2x(1-y) + 2(1-x)y] + N_4\log_{10}[2xy + 2(1-x)(1-y)]$$

$$Z_\theta = N_7\log_{10}[4(1-\theta)^2] + N_8\log_{10}(4\theta^2) + N_9\log_{10}[4\theta(1-\theta)] + N_{10}\log_{10}\{2[(1-\theta)^2+\theta^2]\}$$

$$u = \tfrac12(1-\theta)^2\log_{10}[4(1-\theta)^2] + \tfrac12\theta^2\log_{10}(4\theta^2) + 2\theta(1-\theta)\log_{10}[4\theta(1-\theta)] + \tfrac12[(1-\theta)^2+\theta^2]\log_{10}\{2[(1-\theta)^2+\theta^2]\}$$

where N1 = k1 + k4 + k5 + k8, N2 = k2 + k3 + k6 + k7, N3 = k9 + k12, N4 = k10 + k11, N5 = k1 + k3 + k6 + k8, N6 = k2 + k4 + k5 + k7, N7 = k1 + k8, N8 = k2 + k7, N9 = k3 + k4 + k5 + k6 + k9 + k12, and N10 = k10 + k11.
BB data type: both loci are codominant and biallelic with coupling or repulsion parental linkage phases (Table 2).

$$Z_\theta = N_1\log_{10}[4(1-\theta)^2] + N_2\log_{10}[4\theta(1-\theta)] + N_3\log_{10}(4\theta^2) + N_4\log_{10}\{2[(1-\theta)^2+\theta^2]\}$$

where N1 = k1 + k9, N2 = k2 + k4 + k6 + k8, N3 = k3 + k7, and N4 = k5. The unit LOD score is the same as that for the MB data type.
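Unlike the MM case, Zθ for the BB data type has no simple counting solution, because the class counted by N4, with probability [(1 − θ)² + θ²]/2, pools offspring carrying zero and two recombinant gametes. A minimal way to locate the maximum, sketched below under our own choices (a grid on 0 < θ ≤ 0.5 with spacing 10⁻⁴; hypothetical function names), is a direct grid search. The expected class proportions used in the example, (1 − θ)²/2, 2θ(1 − θ), θ²/2, and [(1 − θ)² + θ²]/2 for N1 to N4, are read off the unit LOD expression.

```python
import math


def z_theta_bb(n1, n2, n3, n4, theta):
    """Z_theta for the BB data type (N1..N4 as defined in the text)."""
    t = theta
    pairs = [
        (n1, 4 * (1 - t) ** 2),
        (n2, 4 * t * (1 - t)),
        (n3, 4 * t ** 2),
        (n4, 2 * ((1 - t) ** 2 + t ** 2)),
    ]
    # Skip empty classes so that 0 * log10(0) never has to be evaluated.
    return sum(c * math.log10(r) for c, r in pairs if c)


def grid_mle_bb(n1, n2, n3, n4, step=1e-4):
    """Grid-search maximizer of Z_theta on (0, 0.5]; returns (theta, LOD)."""
    grid = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    best = max(grid, key=lambda t: z_theta_bb(n1, n2, n3, n4, t))
    return best, z_theta_bb(n1, n2, n3, n4, best)


# Counts equal to expectations at theta = 0.2 for n = 400 offspring.
theta_hat, lod = grid_mle_bb(128, 128, 8, 136)
print(round(theta_hat, 4))  # 0.2
```

Because the MB expression for Zθ has the same terms, the same function applies to MB data when called with (N7, N9, N8, N10) in place of (N1, N2, N3, N4).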
BB-CR data type: both loci are codominant and biallelic with mixed parental linkage phases (Table 3).

$$Z_x = (k_1+k_9)\log_{10}(2x) + (k_2+k_4+k_6+k_8)\log_{10}\{2[(1-x)(1-y)+xy]\} + (k_3+k_7)\log_{10}[2(1-x)] + k_5\log_{10}\{2[x(1-y)+y(1-x)]\}$$

$$Z_y = (k_1+k_9)\log_{10}[2(1-y)] + (k_2+k_4+k_6+k_8)\log_{10}\{2[(1-x)(1-y)+xy]\} + (k_3+k_7)\log_{10}(2y) + k_5\log_{10}\{2[x(1-y)+y(1-x)]\}$$

$$Z_\theta = (k_1+k_3+k_5+k_7+k_9)\log_{10}[4\theta(1-\theta)] + (k_2+k_4+k_6+k_8)\log_{10}\{2[(1-\theta)^2+\theta^2]\}$$

$$u = 2\theta(1-\theta)\log_{10}[4\theta(1-\theta)] + (1-2\theta+2\theta^2)\log_{10}[2(1-2\theta+2\theta^2)]$$
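For BB-CR, Zθ depends on θ only through s = θ(1 − θ), because (1 − θ)² + θ² = 1 − 2s. Setting the derivative with respect to s to zero gives s = A/[2(A + B)], with A = k1 + k3 + k5 + k7 + k9 and B = k2 + k4 + k6 + k8, and hence θ̂ = [1 − √(1 − 4s)]/2. This closed form is our own derivation from the Zθ expression above, offered as a sketch with hypothetical function names. Since θ and 1 − θ give the same s, the data cannot distinguish them, so the estimate is restricted to θ ≤ 0.5 by assumption.

```python
import math


def _ab(k):
    """Pool the nine BB-CR genotype counts k1..k9 (k[0]..k[8])."""
    a = k[0] + k[2] + k[4] + k[6] + k[8]  # k1 + k3 + k5 + k7 + k9
    b = k[1] + k[3] + k[5] + k[7]         # k2 + k4 + k6 + k8
    return a, b


def bbcr_theta(k):
    """Maximizer of Z_theta for BB-CR, restricted to theta <= 0.5."""
    a, b = _ab(k)
    # Stationary point in s = theta(1 - theta); s cannot exceed 1/4,
    # and when a > b the maximum sits at the boundary theta = 0.5.
    s = min(a / (2 * (a + b)), 0.25)
    return (1 - math.sqrt(1 - 4 * s)) / 2


def bbcr_lod(k, theta):
    """Z_theta for BB-CR evaluated at theta."""
    a, b = _ab(k)
    s = theta * (1 - theta)
    first = a * math.log10(4 * s) if a else 0.0
    return first + b * math.log10(2 * (1 - 2 * s))


k = [60, 200, 60, 150, 15, 200, 60, 195, 60]  # A = 255, B = 745
t = bbcr_theta(k)
print(round(t, 6))  # 0.15
```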
MD data type: one locus is codominant and multiallelic and one locus is dominant (Table 4).

$$Z_x = (k_1+k_3)\log_{10}(2x) + (k_2+k_4)\log_{10}[2(1-x)] + k_5\log_{10}\frac{2(1-xy)}{2-y} + k_6\log_{10}\frac{2[1-(1-x)y]}{2-y} + k_7\log_{10}\frac{2[1-x(1-y)]}{1+y} + k_8\log_{10}\frac{2(x+y-xy)}{1+y}$$

$$Z_y = (k_1+k_2)\log_{10}(2y) + (k_3+k_4)\log_{10}[2(1-y)] + k_5\log_{10}\frac{2(1-xy)}{2-x} + k_6\log_{10}\frac{2[1-x(1-y)]}{1+x} + k_7\log_{10}\frac{2[1-(1-x)y]}{2-y} + k_8\log_{10}\frac{2(x+y-xy)}{1+x}$$

$$Z_\theta = k_1\log_{10}(4\theta^2) + (k_2+k_3)\log_{10}[4\theta(1-\theta)] + k_4\log_{10}[4(1-\theta)^2] + k_5\log_{10}[\tfrac43(1-\theta^2)] + (k_6+k_7)\log_{10}\{\tfrac43[1-\theta(1-\theta)]\} + k_8\log_{10}[\tfrac43\theta(2-\theta)]$$

$$u = \tfrac14\theta^2\log_{10}(4\theta^2) + \tfrac12\theta(1-\theta)\log_{10}[4\theta(1-\theta)] + \tfrac14(1-\theta)^2\log_{10}[4(1-\theta)^2] + \tfrac14(1-\theta^2)\log_{10}[\tfrac43(1-\theta^2)] + \tfrac12[1-\theta(1-\theta)]\log_{10}\{\tfrac43[1-\theta(1-\theta)]\} + \tfrac14\theta(2-\theta)\log_{10}[\tfrac43\theta(2-\theta)]$$
BD data type: one locus is codominant and biallelic and one locus is dominant with coupling or repulsion parental linkage phases (Table 5).

$$Z_\theta = k_1\log_{10}[\tfrac43(1-\theta^2)] + k_2\log_{10}(4\theta^2) + k_3\log_{10}\{\tfrac43[1-\theta(1-\theta)]\} + k_4\log_{10}[4\theta(1-\theta)] + k_5\log_{10}[\tfrac43\theta(2-\theta)] + k_6\log_{10}[4(1-\theta)^2]$$

The unit LOD score is the same as for the MD data type.
BD-CR data type: one locus is codominant and biallelic and one locus is dominant with mixed parental linkage phases (Table 6).

$$Z_x = k_1\log_{10}\frac{2(1-y+xy)}{2-y} + k_2\log_{10}[2(1-x)] + k_3\log_{10}[\tfrac23(1+x+y-2xy)] + k_4\log_{10}\{2[(1-x)(1-y)+xy]\} + k_5\log_{10}\frac{2(1-x+xy)}{1+y} + k_6\log_{10}(2x)$$

$$Z_y = k_1\log_{10}\frac{2(1-y+xy)}{1+x} + k_2\log_{10}(2y) + k_3\log_{10}[\tfrac23(1+x+y-2xy)] + k_4\log_{10}\{2[(1-x)(1-y)+xy]\} + k_5\log_{10}\frac{2(1-x+xy)}{2-x} + k_6\log_{10}[2(1-y)]$$

$$Z_\theta = (k_1+k_5)\log_{10}[\tfrac43(1-\theta+\theta^2)] + (k_2+k_6)\log_{10}[4\theta(1-\theta)] + k_3\log_{10}[\tfrac23(1+2\theta-2\theta^2)] + k_4\log_{10}\{2[(1-\theta)^2+\theta^2]\}$$

$$u = \tfrac12(1-\theta+\theta^2)\log_{10}[\tfrac43(1-\theta+\theta^2)] + \tfrac12\theta(1-\theta)\log_{10}[4\theta(1-\theta)] + \tfrac14(1+2\theta-2\theta^2)\log_{10}[\tfrac23(1+2\theta-2\theta^2)] + \tfrac14[(1-\theta)^2+\theta^2]\log_{10}\{2[(1-\theta)^2+\theta^2]\}$$
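Because BD-CR data carry separate information on x and y, the two gender-specific recombination frequencies can be estimated jointly. The sketch below maximizes the log-likelihood over a grid of (x, y) values in (0, 0.5], which also maximizes Zx and Zy since each differs from the log10-likelihood only by a term that does not involve the parameter being tested. The six class probabilities are our reconstruction from the Zx, Zy, and u expressions above (they sum to 1 for any x and y); they are not quoted from Table 6, and the function names and grid spacing are hypothetical choices.

```python
import math


def bdcr_probs(x, y):
    """Class probabilities for k1..k6 (reconstructed, see lead-in)."""
    return [
        (1 - y + x * y) / 4,
        (1 - x) * y / 4,
        (1 + x + y - 2 * x * y) / 4,
        ((1 - x) * (1 - y) + x * y) / 4,
        (1 - x + x * y) / 4,
        x * (1 - y) / 4,
    ]


def bdcr_loglik(k, x, y):
    """log10-likelihood of counts k1..k6 at (x, y), up to a constant."""
    return sum(c * math.log10(p) for c, p in zip(k, bdcr_probs(x, y)) if c)


def bdcr_grid_mle(k, step=0.005):
    """Joint grid search for (x, y) on (0, 0.5] x (0, 0.5]."""
    grid = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    return max(((x, y) for x in grid for y in grid),
               key=lambda xy: bdcr_loglik(k, *xy))


# Expected counts for 10,000 offspring at x = 0.1 and y = 0.3.
k = [10_000 * p for p in bdcr_probs(0.1, 0.3)]
x_hat, y_hat = bdcr_grid_mle(k)
print(round(x_hat, 3), round(y_hat, 3))  # 0.1 0.3
```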
DD-CC data type: both loci are dominant with coupling linkage phase (Table 7).

$$Z_\theta = k_1\log_{10}\{\tfrac89[1+\tfrac12(1-\theta)^2]\} + (k_2+k_3)\log_{10}[\tfrac43\theta(2-\theta)] + k_4\log_{10}[4(1-\theta)^2]$$

$$u = q_1\log_{10}\{\tfrac89[1+\tfrac12(1-\theta)^2]\} + (q_2+q_3)\log_{10}[\tfrac43\theta(2-\theta)] + q_4\log_{10}[4(1-\theta)^2]$$
DD-CR data type: both loci are dominant with mixed parental linkage phases (Table 8).

$$Z_\theta = k_1\log_{10}\{\tfrac89[1+\tfrac12\theta(1-\theta)]\} + (k_2+k_3)\log_{10}\{\tfrac43[1-\theta(1-\theta)]\} + k_4\log_{10}[4\theta(1-\theta)]$$

$$u = q_1\log_{10}\{\tfrac89[1+\tfrac12\theta(1-\theta)]\} + (q_2+q_3)\log_{10}\{\tfrac43[1-\theta(1-\theta)]\} + q_4\log_{10}[4\theta(1-\theta)]$$
DD-RR data type: both loci are dominant in repulsion linkage phase (Knapp et al., 1995).

$$Z_\theta = k_1\log_{10}[\tfrac49(2+\theta^2)] + (k_2+k_3)\log_{10}[\tfrac43(1-\theta^2)] + k_4\log_{10}(4\theta^2)$$

$$u = \tfrac14(2+\theta^2)\log_{10}[\tfrac49(2+\theta^2)] + \tfrac12(1-\theta^2)\log_{10}[\tfrac43(1-\theta^2)] + \tfrac14\theta^2\log_{10}(4\theta^2)$$

where k1, k2, k3, and k4 are numbers of observations for A-B-, A-bb, aaB-, and aabb genotypes, respectively.
Appendix 2. Likelihood Functions for Testing Parental Linkage Phases
When the linkage phase of each parent is unknown,
the most likely linkage phase of each parent can be
identified using likelihood ratios. Table 9 shows the
association between the genotypic probabilities and the
numbers of genotypic observations. As the assumption
about the parental linkage phases changes, the association between the underlying probabilities and the observations changes. For an intercross with two loci, four
combinations of parental linkage phases are possible.
For example, for the mating of A1A2B1B2 × A3A4B3B4 in
Table 9, the following four combinations of parental
linkage phases are possible: 1) A1B1/A2B2 × A3B3/A4B4,
2) A1B2/A2B1 × A3B3/A4B4, 3) A1B1/A2B2 × A3B4/A4B3,
and 4) A1B2/A2B1 × A3B4/A4B3. Then, the corresponding log-likelihood functions (except for a common constant) are:
$$L_1 = k_1\log q_1 + k_2\log q_2 + k_3\log q_3 + k_4\log q_4$$

$$L_2 = k_1\log q_2 + k_2\log q_1 + k_3\log q_4 + k_4\log q_3$$

$$L_3 = k_1\log q_3 + k_2\log q_4 + k_3\log q_1 + k_4\log q_2$$

$$L_4 = k_1\log q_4 + k_2\log q_3 + k_3\log q_2 + k_4\log q_1$$
The log-likelihood functions for the other cases can be derived in a similar manner. The resulting formulae are similar, except that ki and qi are defined differently and the number of terms on the right-hand side generally differs as well. The combination of parental phases with the largest likelihood is then identified as the most likely linkage phase.
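The phase test can be sketched for the MM case of Table 9. Under each phase the log-likelihood separates into a binomial part in x and one in y, so each Li can be maximized in closed form. We assume that x and y are restricted to [0, 0.5]; without that restriction all four phases reach the same maximum, because L2(x, y) = L1(1 − x, y), L3(x, y) = L1(x, 1 − y), and L4(x, y) = L1(1 − x, 1 − y). The function names are hypothetical, and the returned value is the log10 ratio of the two largest maximized likelihoods, in the spirit of Figure 2.

```python
import math

# Which q_i each count k1..k4 is paired with under Phases I-IV (Table 9).
PAIRING = {1: (1, 2, 3, 4), 2: (2, 1, 4, 3), 3: (3, 4, 1, 2), 4: (4, 3, 2, 1)}


def _xlog(count, p):
    """count * ln(p), with 0 * ln(0) taken as 0."""
    return count * math.log(p) if count else 0.0


def phase_logliks(k):
    """Maximized natural-log likelihoods L1..L4 for counts k = (k1..k4)."""
    n = sum(k)
    out = {}
    for phase, qs in PAIRING.items():
        c = dict(zip(qs, k))  # c[i] = count attached to q_i
        # q1=(1-x)(1-y), q2=x(1-y), q3=(1-x)y, q4=xy: binomial in x and in y,
        # so the constrained maximizer is the count ratio capped at 0.5.
        x = min((c[2] + c[4]) / n, 0.5)
        y = min((c[3] + c[4]) / n, 0.5)
        out[phase] = (_xlog(c[2] + c[4], x) + _xlog(c[1] + c[3], 1 - x)
                      + _xlog(c[3] + c[4], y) + _xlog(c[1] + c[2], 1 - y))
    return out


def most_likely_phase(k):
    """Best phase and the log10 ratio of the two largest likelihoods."""
    ll = phase_logliks(k)
    first, second = sorted(ll, key=ll.get, reverse=True)[:2]
    return first, (ll[first] - ll[second]) / math.log(10)


print(most_likely_phase((80, 10, 8, 2)))  # Phase I is most likely
```

Capping at 0.5 is valid because each binomial log-likelihood is concave, so its maximum on [0, 0.5] is at the unconstrained estimate or at the boundary.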