Linkage analysis using direct and indirect counting and relative efficiencies for codominant and dominant loci¹

Y. Da², J. Garbe, N. London, and J. Xu

Department of Animal Science, University of Minnesota, Saint Paul 55108

ABSTRACT: A method based on direct and indirect counting is developed for rapid and accurate linkage analysis of codominant and dominant loci. Methods for estimating gender-specific recombination frequencies are available in two situations:
- at least one of the two loci is multiallelic;
- both loci are biallelic with mixed parental linkage phases and at least one locus is codominant.

Most estimates of gender-average and gender-specific recombination frequencies require iterative solutions. The new method uses the full data set and is computationally efficient. When the observed and expected genotypic frequencies are equal, it yields exact estimates of the recombination frequencies. The relative efficiency of the various data types depends on the inheritance mode and, for biallelic loci, on the parental linkage phases. It does not depend on locus polymorphism when the full data set is used for linkage analysis. The ability to determine parental linkage phases depends on locus polymorphism as well as on inheritance mode. The intercross (or F-2 design) is more efficient for mapping codominant loci, whereas the backcross is more efficient if dominance is involved. Mixed parental linkage phases of biallelic loci are less efficient than coupling or repulsion linkage phases. Ignoring noninformative offspring biases the estimated recombination frequency for biallelic loci only, and it reduces LOD scores in all cases.

Key Words: Codominance, Dominance, Linkage Analysis, Loci

© 2002 American Society of Animal Science. All rights reserved.
¹ This research is supported in part by the Agricultural Experiment Station (project MN-16-043) and grant-in-aid of the University of Minnesota, and by funding from Cargill and NRICGP/USDA (grant #03275). The authors wish to thank two anonymous reviewers for helpful comments.
² Correspondence: 265D Haecker Hall (phone: (612) 625-7780; fax: (612) 625-1283; E-mail: [email protected]).
Received November 11, 2001. Accepted May 28, 2002.
J. Anim. Sci. 2002. 80:2528–2539

Introduction

Computationally efficient methods exist for large-scale linkage analysis of codominant loci (Green et al., 1990). No comparably rapid methods exist for mapping dominant loci or for integrating the maps of dominant and codominant loci. Most computer programs that handle dominant loci, such as LINKAGE (Lathrop et al., 1984), rely on computationally intensive likelihood analysis, and they generally limit the number of loci that can be analyzed jointly. A computationally efficient method for linkage analysis under both codominant and dominant inheritance would therefore be valuable. It would help map dominant genes and integrate the maps of codominant and dominant loci. Dominant inheritance is typical of many disease genes, and many dominant markers exist (Ajmone-Marsan et al., 1997; Cushwa and Medrano, 1996; Knorr et al., 1999).

Knapp et al. (1995) derived an analytical formula for the maximum likelihood estimate of the recombination frequency between two dominant loci in repulsion linkage phase. Because such a formula is mathematically simple, it is computationally efficient for large-scale linkage analysis. Many other cases of linkage analysis, however, have no simple analytical formula for estimating recombination frequencies from likelihood functions. Understanding the relative efficiencies of the various types of genotypic data is useful for planning mapping experiments.
Most results on the relative efficiencies of genotypic data (Allard, 1956; Green, 1981) were based on approximate variances and covariances of estimated recombination frequencies. The accuracy of such approximations is unclear. This article has two purposes. The first is to develop simple solutions that facilitate large-scale joint linkage analysis of codominant and dominant loci. The second is to evaluate the relative efficiencies of the various types of genotypic data, as a guide for designing mapping experiments.

Materials and Methods

General Strategy. Families with linkage information will be divided into two categories:
- Category I: families whose offspring can all be analyzed by the direct counting method (Ott, 1999);
- Category II: families whose offspring cannot all be analyzed by direct counting.

For Category I, recombination frequencies will be estimated by direct counting (Ott, 1999). For Category II, they will be estimated by "direct and indirect counting." The two estimates will then be combined to obtain the overall estimate of recombination frequency. This article focuses on the new method of direct and indirect counting for Category II.

The Method of Indirect Counting. Indirect counting is intended to use the full data set, including noninformative offspring, with minimal mathematical complexity and computational cost, so that it can be applied on a large scale. Noninformative offspring do not carry enough information to determine parental allele transmission unequivocally (Da and Lewin, 1995), so they cannot be used in direct counting. However, noninformative offspring are expected to include some unobservable recombinants. The indirect counting method described below estimates how many.
Offspring that are "noninformative" for direct counting are therefore at least partially informative for indirect counting. The following terms are used throughout:
- "Underlying genotype" refers to a phase-known genotype.
- "Genotype" or "observed genotype" refers to a genotype whose allele content is known but whose phase is not. For example, AaBb is an observed genotype with two possible underlying genotypes, AB/ab and Ab/aB.
- "Phenotype" of a locus means that the allele content of the locus is unknown. It is used to describe observations of dominant loci.

Following the gene counting method of Smith (1957), the indirect counting method calculates the expected number of recombinants among the noninformative offspring as

$$k_i^e = k_i \sum_{j=1}^{c} m_{ij}\, v_{ij} \qquad [1]$$

where:
- $k_i^e$ is the expected number of recombinants in genotype (or phenotype) i;
- c is the number of underlying genotypes of genotype (or phenotype) i;
- $m_{ij}$ is the number of recombinants in underlying genotype j of genotype (or phenotype) i;
- $v_{ij}$ is the conditional probability of underlying genotype j given the observed two-locus genotype (codominant loci) or phenotype (dominant loci) i;
- $k_i$ is the total number of noninformative offspring with genotype (or phenotype) i.

The general formula for $v_{ij}$ is

$$v_{ij} = p_j / q_i \qquad [2]$$

where $p_j$ is the probability of underlying genotype j with recombinant(s), and $q_i$ is the probability of the observed genotype or phenotype i. Note that $p_j$ and $q_i$ can be equal in some cases.

The observed number of recombinants in the same category of families comes from direct counting of the informative offspring, whose parental allele transmission can be determined unequivocally. The expected and observed numbers of recombinants are added to give the estimated total number of recombinants. Dividing this total by the total number of meioses gives the estimated recombination frequency for families in which noninformative offspring exist.
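The Python sketch below illustrates Eqs. [1] and [2]. It computes the expected number of recombinants in one noninformative class: AaBb offspring from the intercross AB/ab × AB/ab, which is the BB case of Table 2. This class has two underlying genotypes, AB/ab with no recombinant gametes and Ab/aB with two. The helper names (`UnderlyingGenotype`, `expected_recombinants`, `double_het_classes`) are illustrative assumptions, not part of the article. So is the count of 68 offspring, which is the expected AaBb count among 200 offspring at θ = 0.20. With these inputs the result should be 8, up to floating-point rounding. This matches the term 2k5θ²/[(1 − θ)² + θ²] in Table 2.

```python
from dataclasses import dataclass


@dataclass
class UnderlyingGenotype:
    """A phase-known genotype compatible with an observed class (illustrative helper)."""
    recombinants: int   # m_ij: recombinant gametes carried by this underlying genotype
    probability: float  # p_j: unconditional probability of this underlying genotype


def expected_recombinants(k_i, underlying):
    """Eq. [1]: k_i^e = k_i * sum_j m_ij * v_ij, with v_ij = p_j / q_i from Eq. [2]."""
    q_i = sum(g.probability for g in underlying)  # probability of the observed class
    return k_i * sum(g.recombinants * g.probability / q_i for g in underlying)


def double_het_classes(theta):
    """Underlying genotypes of AaBb offspring from AB/ab x AB/ab (coupling in both parents)."""
    return [
        UnderlyingGenotype(0, 0.5 * (1 - theta) ** 2),  # AB/ab: AB and ab gametes
        UnderlyingGenotype(2, 0.5 * theta ** 2),        # Ab/aB: Ab and aB gametes
    ]


# 68 AaBb offspring is the count expected among 200 offspring when theta = 0.2
print(expected_recombinants(68, double_het_classes(0.2)))
```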
If a gender-average (sex-average) recombination frequency is assumed, then

$$\theta = n_r / T \qquad [3]$$

where θ is the gender-average recombination frequency, $n_r$ is the total number of expected and observed recombinants, and T is the total number of meioses. Because $n_r$ is itself a function of θ, Eq. [3] can generally be rearranged into a polynomial equation in θ. This article gives an analytical solution for θ when that polynomial has degree 3 or less, and uses an iterative solution for higher degrees.

As shown by Da and Lewin (1995), when the genotypes of both parents are known, noninformative offspring can arise only from a cross between heterozygous genotypes, referred to as an "intercross." The method of indirect counting therefore considers several intercross situations, covering multiallelic, biallelic, codominant, and dominant loci.

Gender-Average and Gender-Specific Recombination Frequencies. The gender-average (sex-average) recombination frequency is estimated from the meioses of both genders together. Gender-specific (sex-specific) recombination frequencies are two separate values, estimated from male and female meioses respectively. The gender-average recombination frequency can always be estimated as long as linkage information exists. Gender-specific recombination frequencies cannot always be estimated. Suppose both loci are biallelic and both heterozygous parents have the same linkage phase (coupling or repulsion). Then gender-specific recombination frequencies are nonestimable, whether the loci are codominant or dominant, because two independent equations for the two separate frequencies cannot be set up. For two dominant loci, two such equations can be set up only with mixed parental linkage phases, that is, one parent in coupling phase and the other in repulsion phase.
Even in that case, however, neither our method nor the maximum likelihood method yields reliable estimates. Estimating gender-specific recombination frequencies from two dominant loci is therefore impractical and is not considered further. Methods for estimating gender-specific recombination frequencies will be developed for two situations:
- at least one locus has multiple alleles;
- the parents have mixed linkage phases and at least one locus is codominant.

By analogy with Eq. [3], gender-specific (sex-specific) recombination frequencies are estimated by solving the following equations simultaneously:

$$x = n_x / T_x \qquad [4]$$

$$y = n_y / T_y \qquad [5]$$

where:
- x is the female recombination frequency;
- $n_x$ is the total number of expected and observed female recombinants;
- $T_x$ is the total number of female meioses;
- y is the male recombination frequency;
- $n_y$ is the total number of expected and observed male recombinants;
- $T_y$ is the total number of male meioses.

In all cases covered by this article, Eqs. [4] and [5] are solved iteratively.

Pooling of Estimates. For families analyzed by direct and indirect counting, the estimates from all s families are pooled into an overall estimate as

$$\theta_1 = \sum_{i=1}^{s} n_{ri} \Big/ \sum_{i=1}^{s} T_i \qquad [6]$$

where:
- θ1 is the overall estimate of the recombination frequency from families with noninformative offspring;
- $n_{ri}$ is the total number of expected and observed recombinants in family i;
- $T_i$ is the number of gametes (meioses) in family i.

Eq. [6] also gives the pooled estimates of x and y. In that case θ is replaced by x or y, and $n_{ri}$ and $T_i$ are replaced by the corresponding gender-specific numbers defined in Eqs. [4] and [5]. When gender-specific recombination frequencies are available, the gender-average recombination frequency is obtained as

$$\theta = a_1 x + a_2 y \qquad [7]$$

where $a_1 = T_x/(T_x + T_y)$ and $a_2 = T_y/(T_x + T_y)$.
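The Python sketch below covers Eqs. [3], [6], and [7] for the BB case of Table 2, the intercross AB/ab × AB/ab in which both loci are biallelic and codominant. It solves Eq. [3] by fixed-point iteration, recomputing the indirect-count term at the current θ on each pass, which is the gene-counting idea the method builds on. For this case the article's own solution is the closed form of Eq. [15]. The iteration is shown only to make the structure of Eq. [3] concrete. The counts are a hypothetical set of 200 offspring, constructed so that the observed frequencies equal the expected ones at θ = 0.20. This mirrors the validation design described later in the article. The per-family $n_{ri}$ and $T_i$ values in the pooling example are also made up, and all function names are illustrative. Both starting values should converge to about 0.20.

```python
# Offspring counts in Table 2 order (AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb):
# 200 hypothetical offspring whose frequencies equal the expected ones for theta = 0.2.
BB_COUNTS = (32, 16, 2, 16, 68, 16, 2, 16, 32)


def bb_recombinants(theta, k):
    """n_r(theta) for AB/ab x AB/ab codominant offspring, following Table 2."""
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = k
    observed = k2 + 2 * k3 + k4 + k6 + 2 * k7 + k8                  # direct counting
    expected = 2 * k5 * theta ** 2 / ((1 - theta) ** 2 + theta ** 2)  # indirect counting (AaBb)
    return observed + expected


def solve_theta(k, theta0, tol=1e-9, max_iter=10_000):
    """Eq. [3] by fixed-point iteration: theta <- n_r(theta) / T."""
    T = 2 * sum(k)  # two meioses per intercross offspring
    theta = theta0
    for iteration in range(1, max_iter + 1):
        updated = bb_recombinants(theta, k) / T
        if abs(updated - theta) < tol:
            return updated, iteration
        theta = updated
    raise RuntimeError("did not converge")


def pooled_estimate(recombinants, meioses):
    """Eq. [6]: total recombinants divided by total meioses over families."""
    return sum(recombinants) / sum(meioses)


def gender_average(x, y, t_x, t_y):
    """Eq. [7]: theta = a1*x + a2*y with a1 = Tx/(Tx+Ty), a2 = Ty/(Tx+Ty)."""
    return (t_x * x + t_y * y) / (t_x + t_y)


for start in (0.01, 0.45):
    print("Eq. [3] from", start, "->", solve_theta(BB_COUNTS, start))
# Hypothetical per-family n_ri (expected + observed recombinants) and T_i (meioses)
print("Eq. [6]:", pooled_estimate([4, 36, 20], [40, 150, 110]))
print("Eq. [7]:", gender_average(0.1, 0.2, 200, 200))
```

Eq. [6] divides total recombinants by total meioses (60/300 = 0.20 here), so each family is weighted by its number of meioses. The unweighted mean of the three family rates would be about 0.174 instead.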
As usual, the LOD score for a gender-average recombination frequency is defined as

$$Z_\theta = \log_{10}\left[L(\theta) / L(\theta = \tfrac{1}{2})\right] \qquad [8]$$

where L(θ) is the likelihood function under the hypothesis of linkage, and L(θ = ½) is the likelihood function under the hypothesis of no linkage. The LOD score used in the literature (e.g., Ott, 1999) for testing gender-specific recombination frequencies is

$$Z_{xy} = \log_{10}\left[L(x, y) / L(x = y = \theta)\right] \qquad [9]$$

The LOD score of Eq. [9] indicates how strongly the gender-specific model is favored over the gender-average model. It does not test the significance of each gender-specific recombination frequency. The following LOD scores can be defined for that purpose:

$$Z_x = \log_{10}\left[L(x; y) / L(x = \tfrac{1}{2}; y)\right] \qquad [10]$$

$$Z_y = \log_{10}\left[L(y; x) / L(y = \tfrac{1}{2}; x)\right] \qquad [11]$$

As will be shown, a family with noninformative offspring is less informative than a family without them, even when its noninformative offspring are used in the analysis. To account for this unequal information, estimates from the two categories of families should be weighted differently. A LOD score summarizes both the number of observations and the informativeness of a family type, so the LOD score of each family type is a logical choice of weight. Let θ2 be the estimate of recombination frequency from families without noninformative offspring, and let Z1 and Z2 be the LOD scores for θ1 and θ2, respectively. The overall estimate from all families is then

θ = c1θ1 + c2θ2,

where c1 = Z1/(Z1 + Z2) and c2 = Z2/(Z1 + Z2), so that c1 + c2 = 1.

A gender-specific recombination frequency (x or y) across families is obtained in the same way, using the LOD scores defined by Eqs. [10] and [11]. However, families that lack separate estimates of x and y are excluded when x or y is calculated across families.
We have observed that "forced" estimates of x and y from such families tend to be equal. Including those families would therefore tend to shrink the difference between x and y.

Relative Efficiency of Different Genotypic Data. Different genotypic data, including multiallelic, biallelic, codominant and dominant genotypes, will be compared by two criteria: the unit LOD score, and the likelihood ratio for testing parental linkage phases. The unit LOD score (u) is defined as the expected LOD score per offspring under a gender-average recombination frequency:

$$u = Z_\theta / N \qquad [12]$$

where N is the number of offspring and Zθ is defined by Eq. [8].

This definition is the same as that of the ELOD in Lander and Botstein (1989). The term "unit LOD" is used instead of "ELOD" to avoid confusion with the differently defined ELOD of Ott (1999). A type of genotypic data with a higher unit LOD score is considered more efficient for linkage analysis.

An advantage of the unit LOD score over the overall LOD score (Zθ) is that it can be written in terms of the recombination frequency alone, without the numbers of observations. Relative efficiencies can therefore be compared without assuming any particular set of numbers of observations. It can be shown that the unit LOD score is obtained by replacing the numbers of genotypes in Zθ with the corresponding genotypic probabilities. Unit LOD scores for specific cases are defined in Appendix 1.

The backcross design is included for comparison with the intercross or F-2 design. When dominance is involved, the backcross is assumed to be to the recessive line. The information available for testing parental linkage phases is also a measure of data efficiency.
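The LOD calculations described above can be sketched in Python under some simplifying assumptions:
- Category I families are taken to be phase-known and fully informative, so their likelihood is binomial in the number of recombinants r among n meioses.
- The unit LOD follows the recipe stated above, with the counts in Zθ replaced by genotypic probabilities.
- The fully informative intercross (MM) is grouped by the number of recombinant gametes per offspring. Grouping leaves the unit LOD unchanged because the likelihood ratio is constant within each group.
- A codominant backcross, AB/ab × ab/ab, is added for comparison.

The function names and example numbers are hypothetical. The guard against nonpositive LOD scores is also an assumption, because the text does not say how such weights are handled.

```python
import math


def log10_binomial_lik(r, n, theta):
    """log10[theta^r (1 - theta)^(n - r)], with 0 * log10(0) taken as 0."""
    total = 0.0
    if r > 0:
        total += r * math.log10(theta)
    if n - r > 0:
        total += (n - r) * math.log10(1 - theta)
    return total


def lod_fully_informative(r, n):
    """Direct-counting estimate r/n and its LOD, Eq. [8], for n phase-known meioses."""
    theta = r / n
    return theta, log10_binomial_lik(r, n, theta) - log10_binomial_lik(r, n, 0.5)


def lod_weighted(theta1, z1, theta2, z2):
    """Overall estimate c1*theta1 + c2*theta2 with c1 = Z1/(Z1+Z2), c2 = Z2/(Z1+Z2)."""
    if z1 <= 0 or z2 <= 0:
        raise ValueError("this sketch assumes positive LOD scores for both family types")
    c1 = z1 / (z1 + z2)
    return c1 * theta1 + (1 - c1) * theta2


def unit_lod(probs, null_probs):
    """Unit LOD with counts replaced by probabilities: sum q_i log10[q_i(theta)/q_i(1/2)]."""
    return sum(p * math.log10(p / p0) for p, p0 in zip(probs, null_probs) if p > 0)


def bb_probs(t):
    """Table 2 frequencies: AB/ab x AB/ab, both loci biallelic and codominant."""
    q1, q2, q3 = 0.25 * (1 - t) ** 2, 0.5 * t * (1 - t), 0.25 * t ** 2
    return [q1, q2, q3, q2, 2 * (q1 + q3), q2, q3, q2, q1]


def mm_probs(t):
    """Fully informative intercross grouped by 0, 1 or 2 recombinant gametes."""
    return [(1 - t) ** 2, 2 * t * (1 - t), t ** 2]


def bc_probs(t):
    """Codominant backcross AB/ab x ab/ab grouped into nonrecombinant/recombinant."""
    return [1 - t, t]


print(lod_fully_informative(20, 100))
print(lod_weighted(0.18, 3.0, 0.22, 9.0))
for name, probs in (("MM", mm_probs), ("BB", bb_probs), ("backcross", bc_probs)):
    print(name, unit_lod(probs(0.2), probs(0.5)))
```

For 20 recombinants in 100 meioses the LOD should be about 8.37. At θ = 0.2 the unit LODs come out at roughly 0.167 (MM), 0.098 (BB), and 0.084 (backcross). This ordering agrees with two statements in the text: MM data are the most efficient, and the intercross is more efficient than the backcross for codominant loci.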
Parental linkage phases may be unknown, for example when the grandparents have missing genotypes and allele transmission from the grandparents to the parent cannot be determined. In that case, a likelihood ratio test based on the offspring genotypic distribution can be used to determine the parental linkage phases. The type of genotypic data that gives more statistical confidence in the inferred parental linkage phases is more efficient for linkage analysis. With two loci, four combinations of parental linkage phases are possible. Efficiency for inferring parental linkage phases will be compared using the likelihood ratio between the two highest likelihood functions. Likelihood functions for testing parental linkage phases are given in Appendix 2.

Bias and Reduction in LOD Score Due to Ignoring Noninformative Offspring. To quantify the benefit of including noninformative offspring in linkage analysis, two quantities are evaluated, assuming the observed offspring distribution equals the expected:
- Bias: the estimate using informative offspring only, minus the estimate using the full data.
- Reduction in LOD score: the unit LOD score using the full data, minus the unit LOD score using informative offspring only.

As will be shown, under this assumption the full-data methods developed here yield estimates exactly equal to the true parameters. The bias from ignoring noninformative offspring can therefore be written as θd − θ, where θd is the recombination frequency estimated by direct counting and θ is the true recombination frequency.

Results and Discussion

Results of the new method for estimating recombination frequencies are presented for eight cases. They are ordered from the most informative loci (both loci have multiple codominant alleles) to the least informative loci (both loci dominant, with mixed parental linkage phases).

Both Loci Are Multiallelic and Codominant (MM Data Type). In this article, "multiallelic" means three or more alleles per locus across the two heterozygous parents. This definition is used because three alleles per locus across the parents make every offspring informative for that locus. The direct counting method is used for this type of data. The MM case does not call for a new method. It is described only as a benchmark for the less informative data types, in which noninformative offspring exist. With direct counting, the gender-specific recombination frequencies are estimated by Eqs. [4] and [5]. When x and y are available, the gender-average recombination frequency is obtained by Eq. [7]. Likelihood functions and LOD scores are given in Appendices 1 and 2.

Multiallelic and Biallelic Codominant Loci (MB Data Type). From Table 1, gender-specific recombination frequencies are obtained by the following iterative solutions:

$$x^{(i+1)} = a + \frac{b\,x^{(i)}(1 - y^{(i)})}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [13]$$

$$y^{(i+1)} = d + \frac{b\,(1 - x^{(i)})y^{(i)}}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [14]$$

where:
- x is the female recombination frequency and y is the male recombination frequency;
- superscript i is the iteration number;
- a = (k2 + k3 + k6 + k7)/n;
- b = (k9 + k12)/n;
- c = (k10 + k11)/n;
- d = (k2 + k4 + k5 + k7)/n;
- k1 through k12 are defined in Table 1.

The gender-average recombination frequency is then estimated as θ = (x + y)/2, because the male and female parents contribute the same number of meioses. The gender-average recombination frequency is obtained the same way in the other cases where gender-specific recombination frequencies are available.

Biallelic Codominant Loci with Coupling or Repulsion Parental Linkage Phases (BB Data Type). In this case gender-specific recombination frequencies are unavailable, and the gender-average recombination frequency is estimated from Table 2. Substituting $n_r$ from Table 2 into Eq. [3] gives the third-degree polynomial equation θ³ − a1θ² + a2θ − c = 0, whose solution is

$$\theta = \left[-s + \sqrt{s^2 + t^3}\right]^{1/3} - \left[s + \sqrt{s^2 + t^3}\right]^{1/3} + \frac{a_1}{3} \qquad [15]$$

where:
- s = ½[a1a2/3 − (2/27)a1³ − c];
- t = ⅓(a2 − a1²/3);
- a1 = (T + c1 + n4)/T;
- a2 = 0.5 + c1/T;
- T = 2n;
- c1 = 2n3 + n2;
- c = c1/(2T);
- n1 = k1 + k9, n2 = k2 + k4 + k6 + k8, n3 = k3 + k7, and n4 = k5.

Eq. [15] is derived assuming coupling parental linkage phases. It also applies to repulsion linkage phases if the allele definitions at one of the two loci are reversed.

Biallelic Codominant Loci with Mixed Parental Linkage Phases (BB-CR Data Type). From Table 3, gender-specific recombination frequencies are obtained by the following iterative solutions:

$$x^{(i+1)} = a + \frac{b\,x^{(i)}(1 - y^{(i)})}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [16]$$

$$y^{(i+1)} = d + \frac{b\,(1 - x^{(i)})y^{(i)}}{x^{(i)}(1 - y^{(i)}) + (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{(1 - x^{(i)})(1 - y^{(i)}) + x^{(i)}y^{(i)}} \qquad [17]$$

where x is the female recombination frequency, y is the male recombination frequency, a = (k1 + k9)/n, b = k5/n, c = (k2 + k4 + k6 + k8)/n, and d = (k3 + k7)/n.

Multiallelic Codominant Locus and Dominant Locus (MD Data Type). From Table 4, gender-specific recombination frequencies are obtained by the following iterative solutions:

$$x^{(i+1)} = \frac{a\,x^{(i)}(1 - y^{(i)})}{1 - x^{(i)}y^{(i)}} + \frac{b\,x^{(i)}}{1 - (1 - x^{(i)})y^{(i)}} + \frac{c\,x^{(i)}y^{(i)}}{1 - x^{(i)}(1 - y^{(i)})} + \frac{d\,x^{(i)}}{x^{(i)} + y^{(i)} - x^{(i)}y^{(i)}} + f \qquad [18]$$

$$y^{(i+1)} = \frac{a\,(1 - x^{(i)})y^{(i)}}{1 - x^{(i)}y^{(i)}} + \frac{b\,x^{(i)}y^{(i)}}{1 - (1 - x^{(i)})y^{(i)}} + \frac{c\,y^{(i)}}{1 - x^{(i)}(1 - y^{(i)})} + \frac{d\,y^{(i)}}{x^{(i)} + y^{(i)} - x^{(i)}y^{(i)}} + e \qquad [19]$$

where a = k5/n, b = k6/n, c = k7/n, d = k8/n, e = (k1 + k2)/n, and f = (k1 + k3)/n. As Table 4 shows, k1 + k3 are the directly counted female recombinants and k1 + k2 the directly counted male recombinants.

Biallelic Codominant Locus and Dominant Locus with Coupling or Repulsion Parental Linkage Phases (BD Data Type). Gender-specific recombination frequencies are nonestimable in this case. From Table 5, the gender-average recombination frequency is obtained by the following iterative solution:

$$\theta^{(i+1)} = a + \frac{b\,\theta^{(i)}}{1 + \theta^{(i)}} + \frac{c\,\theta^{(i)}(1 + \theta^{(i)})}{1 - \theta^{(i)}(1 - \theta^{(i)})} + \frac{d}{2 - \theta^{(i)}} \qquad [20]$$

where a = (2k2 + k4)/(2n), b = k1/n, c = k3/(2n), and d = k5/n.

Biallelic Codominant Locus and Dominant Locus with Mixed Parental Linkage Phases (BD-CR Data Type). From Table 6, gender-specific recombination frequencies are obtained by the following iterative solutions:

$$x^{(i+1)} = a\,v_1^{(i)} + c\,v_3^{(i)} + d\,v_5^{(i)} + e\,v_7^{(i)} + f \qquad [21]$$

$$y^{(i+1)} = a\,v_2^{(i)} + b + c\,v_4^{(i)} + d\,v_6^{(i)} + e\,v_8^{(i)} \qquad [22]$$

where a = k1/n, b = k2/n, c = k3/n, d = k4/n, e = k5/n, and f = k6/n. The terms v1 through v8 are the conditional probabilities defined in Table 6, evaluated at x^(i) and y^(i).

Table 1. Genotypic frequency, number of observations, and number of recombinants in the offspring from the intercross A1B/A2b (male) × A3B/A4b (female)
Genotype | Genotypic frequency^a | Number of observations | Recombinants, female^b | Recombinants, male^b
A1A3BB | q1 = ¹⁄₄(1 − x)(1 − y) | k1 | 0 | 0
A1A3bb | q2 = ¹⁄₄xy | k2 | k2 | k2
A1A4BB | q3 = ¹⁄₄x(1 − y) | k3 | k3 | 0
A1A4bb | q4 = ¹⁄₄(1 − x)y | k4 | 0 | k4
A2A3BB | q5 = q4 | k5 | 0 | k5
A2A3bb | q6 = q3 | k6 | k6 | 0
A2A4BB | q7 = q2 | k7 | k7 | k7
A2A4bb | q8 = q1 | k8 | 0 | 0
A1A3Bb | q9 = q3 + q4 | k9 | v1k9 | v2k9
A1A4Bb | q10 = q1 + q2 | k10 | v3k10 | v3k10
A2A3Bb | q11 = q1 + q2 | k11 | v3k11 | v3k11
A2A4Bb | q12 = q3 + q4 | k12 | v1k12 | v2k12
Total | 1 | n | nx | ny
^a x = female recombination frequency, y = male recombination frequency.
^b v1 = x(1 − y)/[x(1 − y) + (1 − x)y], v2 = (1 − x)y/[x(1 − y) + (1 − x)y], v3 = xy/[xy + (1 − x)(1 − y)].

Table 2. Genotypic frequency, number of observations, and number of recombinants in the offspring from the intercross AB/ab × AB/ab
Genotype | Genotypic frequency^a | Number of observations | Number of recombinants
AABB | q1 = ¹⁄₄(1 − θ)² | k1 | 0
AABb | q2 = ¹⁄₂θ(1 − θ) | k2 | k2
AAbb | q3 = ¹⁄₄θ² | k3 | 2k3
AaBB | q4 = q2 | k4 | k4
AaBb | q5 = 2(q1 + q3) | k5 | 2k5θ²/[(1 − θ)² + θ²]
Aabb | q6 = q2 | k6 | k6
aaBB | q7 = q3 | k7 | 2k7
aaBb | q8 = q2 | k8 | k8
aabb | q9 = q1 | k9 | 0
Total | 1 | n | nr
^a θ = gender-average recombination frequency.

Table 3. Offspring phenotypes and recombinants from the mating AB/ab (male) × Ab/aB (female)
Genotype | Genotypic frequency | Number of observations | Recombinants, female^a | Recombinants, male^a
AABB | q1 = ¹⁄₄x(1 − y) | k1 | k1 | 0
AABb | q2 = ¹⁄₄[(1 − x)(1 − y) + xy] | k2 | v1k2 | v1k2
AAbb | q3 = ¹⁄₄(1 − x)y | k3 | 0 | k3
AaBB | q4 = q2 | k4 | v1k4 | v1k4
AaBb | q5 = ¹⁄₂[x(1 − y) + (1 − x)y] | k5 | v3k5 | v2k5
Aabb | q6 = q2 | k6 | v1k6 | v1k6
aaBB | q7 = q3 | k7 | 0 | k7
aaBb | q8 = q2 | k8 | v1k8 | v1k8
aabb | q9 = q1 | k9 | k9 | 0
Total | 1 | n | nx | ny
^a v1 = xy/[(1 − x)(1 − y) + xy], v2 = (1 − x)y/[x(1 − y) + (1 − x)y], v3 = x(1 − y)/[x(1 − y) + (1 − x)y].

Table 4. Genotypic frequency, number of observations, and number of recombinants in the offspring from the intercross A1B/A2b (male) × A3B/A4b (female), with B dominant over b
Genotype | Genotypic frequency | Number of observations | Recombinants, female | Recombinants, male
A1A3bb | q1 = ¹⁄₄xy | k1 | k1 | k1
A1A4bb | q2 = ¹⁄₄(1 − x)y | k2 | 0 | k2
A2A3bb | q3 = ¹⁄₄x(1 − y) | k3 | k3 | 0
A2A4bb | q4 = ¹⁄₄(1 − x)(1 − y) | k4 | 0 | 0
A1A3B- | q5 = ¹⁄₄(1 − xy) | k5 | k5x(1 − y)/(1 − xy) | k5(1 − x)y/(1 − xy)
A1A4B- | q6 = ¹⁄₄[1 − (1 − x)y] | k6 | k6x/[1 − (1 − x)y] | k6xy/[1 − (1 − x)y]
A2A3B- | q7 = ¹⁄₄[1 − x(1 − y)] | k7 | k7xy/[1 − x(1 − y)] | k7y/[1 − x(1 − y)]
A2A4B- | q8 = ¹⁄₄(x + y − xy) | k8 | k8x/(x + y − xy) | k8y/(x + y − xy)
Total | 1 | n | nx | ny

Table 5. Genotypic frequency, number of observations, and number of recombinants in the offspring from the intercross AB/ab × AB/ab, with B dominant over b
Genotype | Genotypic frequency | Number of observations | Number of recombinants
AAB- | q1 = ¹⁄₄(1 − θ)(1 + θ) | k1 | 2k1θ/(1 + θ)
AAbb | q2 = ¹⁄₄θ² | k2 | 2k2
AaB- | q3 = ¹⁄₂[1 − θ(1 − θ)] | k3 | k3θ(1 + θ)/[1 − θ(1 − θ)]
Aabb | q4 = ¹⁄₂θ(1 − θ) | k4 | k4
aaB- | q5 = ¹⁄₄θ(2 − θ) | k5 | 2k5/(2 − θ)
aabb | q6 = ¹⁄₄(1 − θ)² | k6 | 0
Total | 1 | n | nr

Table 6. Offspring phenotypes and recombinants from the mating AB/ab (male) × Ab/aB (female)
Genotype | Genotypic frequency | Number of observations | Recombinants, female^a | Recombinants, male^a
AAB_ | q1 = ¹⁄₄(1 − y + xy) | k1 | k1v1 | k1v2
AAbb | q2 = ¹⁄₄(1 − x)y | k2 | 0 | k2
AaB_ | q3 = ¹⁄₄(1 + x + y − 2xy) | k3 | k3v3 | k3v4
Aabb | q4 = ¹⁄₄[(1 − x)(1 − y) + xy] | k4 | k4v5 | k4v6
aaB_ | q5 = ¹⁄₄(1 − x + xy) | k5 | k5v7 | k5v8
aabb | q6 = ¹⁄₄x(1 − y) | k6 | k6 | 0
Total | 1 | n | nx | ny
^a v1 = [x(1 − y) + xy]/(1 − y + xy), v2 = xy/(1 − y + xy), v3 = [2x(1 − y) + xy]/(1 + x + y − 2xy), v4 = [2(1 − x)y + xy]/(1 + x + y − 2xy), v5 = v6 = xy/[(1 − x)(1 − y) + xy], v7 = xy/(1 − x + xy), v8 = [(1 − x)y + xy]/(1 − x + xy).

Two Dominant Loci with Coupling Linkage Phases (DD-CC Data Type). In this case, both parents are assumed to have coupling linkage phase (Table 7).
The gender-average recombination frequency is obtained by the following iterative solution:

$$\theta^{(i+1)} = \frac{4a\,\theta^{(i)}}{2 + (1 - \theta^{(i)})^2} + \frac{2b}{2 - \theta^{(i)}} \qquad [23]$$

where a = k1/(2n) and b = (k2 + k3)/(2n).

Two Dominant Loci with Mixed Linkage Phases (DD-CR Data Type). In this case, one parent is assumed to have coupling phase and the other repulsion phase (Table 8). The gender-average recombination frequency is obtained by the following iterative solution:

$$\theta^{(i+1)} = \frac{a\,\theta^{(i)}(5 - \theta^{(i)})}{2 + \theta^{(i)}(1 - \theta^{(i)})} + \frac{b\,\theta^{(i)}(1 + \theta^{(i)})}{1 - \theta^{(i)}(1 - \theta^{(i)})} + c \qquad [24]$$

where a = k1/(2n), b = (k2 + k3)/(2n), and c = k4/(2n). When both loci are dominant and both parents have repulsion linkage phase (DD-RR data type), an analytical formula for the maximum likelihood estimate of recombination frequency is available from Knapp et al. (1995).

Numerical Results for Estimating Recombination Frequencies. Equations [13] through [24] were validated using 200 offspring genotypes generated so that the observed genotypic frequencies equal the expected. The true parameters used to generate the offspring genotypes were θ = 0.20, x = 0.10, and y = 0.20. Two sets of extreme starting values were used to test the robustness of the iterative solutions:
- θ0 = 0.01, x0 = 0.01, and y0 = 0.01;
- θ0 = 0.45, x0 = 0.45, and y0 = 0.45.

Equations [13] through [24] all returned the assumed true parameters exactly. At a tolerance of 10⁻⁹, the iterative solutions converged in fewer than 55 iterations. The exception was dominance with mixed linkage phases (DD-CR), which required 235 to 284 iterations. Every iterative solution converged in less than 1 s of CPU time on an 800-MHz laptop computer. The choice between the two sets of extreme starting values had no significant effect on the number of iterations or on computing time.
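The numerical check described above can be sketched in Python for Eqs. [13] and [14] (MB), Eq. [15] (BB), Eq. [23] (DD-CC), and Eq. [24] (DD-CR). The assumptions are:
- The offspring counts are hypothetical sets of 200 offspring, constructed so that the observed genotypic frequencies equal the expected ones at x = 0.10 and y = 0.20, or at θ = 0.20. This follows the design described in the text, but they are not the authors' data.
- The helper names are illustrative.
- For the A-B- class of Table 7, the sketch uses 4k1θ/[2 + (1 − θ)²] expected recombinants. This is what Table 7's genotypic frequencies imply: unconditionally, A-B- offspring carry θ recombinants per offspring, that is, 2θ in total minus θ/2 in each of A-bb and aaB-.
- The real-valued Cardano form of Eq. [15] holds only when s² + t³ ≥ 0 (a single real root), so the sketch raises an error otherwise.

Under these assumptions, each solver should return the generating values from both starting points. DD-CR should also need many more iterations than DD-CC.

```python
import math

# Hypothetical offspring counts (n = 200) whose observed frequencies equal the
# expected ones for x = 0.10, y = 0.20 (Table 1) or theta = 0.20 (Tables 2, 7, 8).
MB_COUNTS = (36, 1, 4, 9, 9, 4, 1, 36, 13, 37, 37, 13)  # k1..k12, Table 1 order
BB_COUNTS = (32, 16, 2, 16, 68, 16, 2, 16, 32)          # k1..k9, Table 2 order
DD_CC_COUNTS = (132, 18, 18, 32)                         # A-B-, A-bb, aaB-, aabb (Table 7)
DD_CR_COUNTS = (108, 42, 42, 8)                          # A-B-, A-bb, aaB-, aabb (Table 8)


def fixed_point(update, start, tol=1e-9, max_iter=100_000):
    """Iterate a tuple-valued update until every component changes by less than tol."""
    current = start
    for iteration in range(1, max_iter + 1):
        nxt = update(current)
        if max(abs(u - v) for u, v in zip(nxt, current)) < tol:
            return nxt, iteration
        current = nxt
    raise RuntimeError("did not converge")


def mb_update(k):
    """Eqs. [13]-[14] for the MB data type."""
    n = sum(k)
    a = (k[1] + k[2] + k[5] + k[6]) / n   # (k2 + k3 + k6 + k7)/n
    b = (k[8] + k[11]) / n                # (k9 + k12)/n
    c = (k[9] + k[10]) / n                # (k10 + k11)/n
    d = (k[1] + k[3] + k[4] + k[6]) / n   # (k2 + k4 + k5 + k7)/n

    def update(state):
        x, y = state
        het = x * (1 - y) + (1 - x) * y   # denominator of v1 and v2
        cis = (1 - x) * (1 - y) + x * y   # denominator of v3
        return (a + b * x * (1 - y) / het + c * x * y / cis,
                d + b * (1 - x) * y / het + c * x * y / cis)
    return update


def dd_cc_update(k):
    """Eq. [23] for DD-CC, with 4*k1*theta/[2 + (1 - theta)^2] recombinants in A-B-."""
    n = sum(k)
    a, b = k[0] / (2 * n), (k[1] + k[2]) / (2 * n)
    return lambda state: (4 * a * state[0] / (2 + (1 - state[0]) ** 2)
                          + 2 * b / (2 - state[0]),)


def dd_cr_update(k):
    """Eq. [24] for DD-CR."""
    n = sum(k)
    a, b, c = k[0] / (2 * n), (k[1] + k[2]) / (2 * n), k[3] / (2 * n)

    def update(state):
        t = state[0]
        return (a * t * (5 - t) / (2 + t * (1 - t))
                + b * t * (1 + t) / (1 - t * (1 - t)) + c,)
    return update


def real_cbrt(v):
    """Sign-preserving real cube root."""
    return math.copysign(abs(v) ** (1 / 3), v)


def bb_closed_form(k):
    """Eq. [15]: real root of theta^3 - a1*theta^2 + a2*theta - c = 0 (Cardano)."""
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = k
    T = 2 * sum(k)
    n2, n3, n4 = k2 + k4 + k6 + k8, k3 + k7, k5
    c1 = 2 * n3 + n2
    c = c1 / (2 * T)
    a1, a2 = (T + c1 + n4) / T, 0.5 + c1 / T
    s = 0.5 * (a1 * a2 / 3 - (2 / 27) * a1 ** 3 - c)
    t = (a2 - a1 ** 2 / 3) / 3
    disc = s ** 2 + t ** 3
    if disc < 0:  # three real roots: this real-arithmetic form does not apply
        raise ValueError("negative discriminant")
    r = math.sqrt(disc)
    return real_cbrt(-s + r) - real_cbrt(s + r) + a1 / 3


for start in (0.01, 0.45):
    print("MB", fixed_point(mb_update(MB_COUNTS), (start, start)))
    print("DD-CC", fixed_point(dd_cc_update(DD_CC_COUNTS), (start,)))
    print("DD-CR", fixed_point(dd_cr_update(DD_CR_COUNTS), (start,)))
print("BB", bb_closed_form(BB_COUNTS))
```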
The case of dominance with mixed parental linkage phases not only required more iterations but was also the least efficient data type, as discussed below. In all cases, direct and indirect counting yielded exactly the same results as maximum likelihood analysis. Because of its mathematical simplicity and computational efficiency, the method of direct and indirect counting should be a useful addition or alternative to current methods of linkage analysis, including complex maximum likelihood analysis. When combined with the strategy of two-point analysis for linkage detection, the method should allow rapid, large-scale joint linkage analysis of codominant and dominant loci, which would facilitate mapping dominant loci using codominant markers and integrating the maps of codominant and dominant loci. The estimates of recombination frequencies from direct and indirect counting are the expected fraction of recombinants whether the estimates lie within or outside the parameter space. This property helps in interpreting estimates whose meaning would otherwise be unclear. For example, if maximum likelihood estimation by numerical maximization yielded an estimate outside the parameter space, the estimate alone could not reveal whether the problem was due to the maximization algorithm, a wrong model, or sampling. As shown in London et al. (2002) and Xu et al. (2002), a wrong inheritance model can seriously bias estimates of recombination frequencies (including producing estimates outside the parameter space), and such a bias can be evaluated conveniently using the method of direct and indirect counting.

Table 7. Genotypic frequency, number of observations, and the number of recombinants in the offspring from the intercross of AB/ab × AB/ab with allele A being dominant over a and B being dominant over b

Genotype | Genotypic frequency | Number of observations | Number of recombinants
A-B- | q1 = ¼[2 + (1 − θ)²] | k1 | 4k1θ/[2 + (1 − θ)²]
A-bb | q2 = ¼θ(2 − θ) | k2 | 2k2/(2 − θ)
aaB- | q3 = ¼θ(2 − θ) | k3 | 2k3/(2 − θ)
aabb | q4 = ¼(1 − θ)² | k4 | 0
Total | 1 | n | nr

Table 8. Genotypic frequency, number of observations, and the number of recombinants in the offspring from the intercross of AB/ab × Ab/aB with A being dominant over a and B being dominant over b

Genotype | Genotypic frequency | Number of observations | Number of recombinants
A-B- | q1 = ¼[2 + θ(1 − θ)] | k1 | k1θ(5 − θ)/[2 + θ(1 − θ)]
A-bb | q2 = ¼[1 − θ(1 − θ)] | k2 | k2θ(1 + θ)/[1 − θ(1 − θ)]
aaB- | q3 = ¼[1 − θ(1 − θ)] | k3 | k3θ(1 + θ)/[1 − θ(1 − θ)]
aabb | q4 = ¼θ(1 − θ) | k4 | k4
Total | 1 | n | nr

Relative Efficiencies. Figure 1 shows that unit LOD scores are affected by the inheritance mode of each locus and by the parental linkage phases, but are unaffected by locus polymorphism in all cases where noninformative offspring exist. Genotypic data with 100% informative offspring (both loci multiallelic and codominant; MM in Figure 1) are the most efficient data type for linkage analysis, even though the offspring that are noninformative for direct counting in other data types are still used by indirect counting. This implies that an offspring noninformative for direct counting is only partially informative for indirect counting and is never as useful as an informative offspring. Mixed linkage phases (BB-CR, BD-CR, and DD-CR in Figure 1) are less efficient than coupling and repulsion phases. For dominant loci, the coupling linkage phase (DD-CC in Figure 1) is strikingly more efficient than the mixed and repulsion phases (DD-CR and DD-RR in Figure 1). For example, at θ = 0.05, the unit LOD for the repulsion phase is only 22% of that for the coupling phase, and the unit LOD for the mixed phases is only 12% of that for the coupling phase. The backcross design is better than the intercross (F-2) design for mapping dominant loci but is worse for mapping codominant loci.

Compared with previous studies of relative efficiency, this article provides new information on the effect of marker polymorphism on unit LOD scores and on the ability to determine parental linkage phases, and reaches essentially the same conclusion regarding the effect of inheritance mode: dominance carries less linkage information than codominance, and the backcross is more efficient for dominant loci (Allard, 1956; Green, 1981; Knapp et al., 1995). However, the result for dominant loci with mixed linkage phases (DD-CR) differs somewhat from that of Green (1981), who found DD-CR to be more efficient than the repulsion linkage phase for small recombination frequencies. This difference could be attributable to the methods used to evaluate relative efficiency: this article used the unit LOD, whereas Green (1981) used the information matrix, which approximates the second moments of parameter estimates from the second derivatives of the log-likelihood function. It is worth noting that Green (1981, p. 85) recommended avoiding experimental designs with mixed linkage phases (DD-CR in Figure 1) for linkage analysis.

Figure 1. Unit LOD scores for various types of data for linkage analysis. MM: two multiallelic codominant loci; MB: one multiallelic codominant locus and one biallelic codominant locus; BB: both loci are biallelic and codominant with coupling or repulsion linkage phases; BB-CR: both loci are biallelic and codominant with mixed linkage phases; MD: one multiallelic codominant locus and one dominant locus; BD: one biallelic codominant locus and one dominant locus with coupling or repulsion linkage phases; BD-CR: one biallelic codominant locus and one dominant locus with mixed parental linkage phases; DD-CC: two dominant loci in coupling linkage phase; DD-CR: two dominant loci with mixed parental linkage phases; DD-RR: two dominant loci in repulsion linkage phase.

For inference about parental linkage phases, both locus polymorphism and inheritance mode affect statistical power: the power to detect parental linkage phases decreases as locus polymorphism decreases and as the number of dominant loci increases (Figure 2). Note that the BB-CR data type (Table 3) cannot distinguish between the two possible cases of mixed linkage phases, Phase II and Phase III in Table 9, using the likelihood ratio test in Appendix 2. Consequently, the parental linkage phases must be known in order to estimate gender-specific recombination frequencies using Eqs. [16] and [17]. If mixed parental linkage phases are identified as the most likely phases but cannot be resolved into Phase II or Phase III, Eqs. [16] and [17] can still be used, but they cannot identify which estimate is the male and which is the female recombination frequency.

Figure 2. Likelihood ratios for identifying parental linkage phase. The likelihood ratio is based on the two largest likelihood functions for each case and is on the log10 scale, calculated from 200 F-2 offspring. MM: two multiallelic codominant loci; MB: one multiallelic codominant locus and one biallelic codominant locus; BB: both loci are biallelic and codominant; MD: one multiallelic codominant locus and one dominant locus; BD: one biallelic codominant locus and one dominant locus; DD: two dominant loci.

Bias and Reduction in LOD Score Due to Ignoring Noninformative Offspring. The direct counting method does not apply when both loci are dominant, because only one genotype, aabb in Tables 7 and 8, is informative, and direct counting cannot estimate the recombination frequency from that genotype alone.
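The counting iteration for the two dominant-locus designs, and the unit LOD comparison behind the 22% and 12% figures, can be sketched in Python as below. This is an illustrative sketch rather than the authors' program: the function names are hypothetical, counts are assumed to be ordered as in Tables 7 and 8 (A-B-, A-bb, aaB-, aabb), and the update θ ← nr/(2n), with nr the expected number of recombinant gametes evaluated at the current θ, is our assumed organization of direct and indirect counting. The A-B- term for Table 7 is written as k1θ/q1, so that the expected total number of recombinants is 2nθ when the counts equal their expectations. The unit LOD is taken as the expected LOD per offspring, Σ p·log10(p/p0), with no-linkage frequencies 9/16, 3/16, 3/16, and 1/16.

```python
import math

# Illustrative sketch (not the authors' program; names are hypothetical).
# Counts k are ordered as in Tables 7 and 8: (A-B-, A-bb, aaB-, aabb).


def recombinants_dd_cc(k, theta):
    """Expected recombinant gametes for the coupling intercross AB/ab x AB/ab (Table 7)."""
    k1, k2, k3, _k4 = k  # aabb offspring carry two nonrecombinant ab gametes
    q1 = 0.25 * (2 + (1 - theta) ** 2)
    return k1 * theta / q1 + 2 * (k2 + k3) / (2 - theta)


def recombinants_dd_cr(k, theta):
    """Expected recombinant gametes for the mixed-phase intercross AB/ab x Ab/aB (Table 8)."""
    k1, k2, k3, k4 = k
    t = theta * (1 - theta)
    return (k1 * theta * (5 - theta) / (2 + t)
            + (k2 + k3) * theta * (1 + theta) / (1 - t)
            + k4)  # each aabb offspring carries exactly one recombinant gamete


def count_theta(k, recombinants, theta=0.25, tol=1e-12, max_iter=10_000):
    """Iterate theta <- nr(theta) / (2n); returns (estimate, iterations used)."""
    n = sum(k)
    for i in range(1, max_iter + 1):
        new = recombinants(k, theta) / (2 * n)
        if abs(new - theta) < tol:
            return new, i
        theta = new
    return theta, max_iter


# Genotypic frequencies under no linkage for A-B-, A-bb, aaB-, aabb.
NULL = (9 / 16, 3 / 16, 3 / 16, 1 / 16)


def dd_probs(theta, phase):
    """Frequencies for two dominant loci; all four follow from P(aabb)."""
    aabb = {"CC": 0.25 * (1 - theta) ** 2,     # coupling x coupling (Table 7)
            "CR": 0.25 * theta * (1 - theta),  # coupling x repulsion (Table 8)
            "RR": 0.25 * theta ** 2}[phase]    # repulsion x repulsion
    return (0.5 + aabb, 0.25 - aabb, 0.25 - aabb, aabb)


def unit_lod(theta, phase):
    """Expected LOD per offspring: sum over genotypes of p * log10(p / p0)."""
    return sum(p * math.log10(p / p0)
               for p, p0 in zip(dd_probs(theta, phase), NULL))


if __name__ == "__main__":
    n = 200
    for phase, rec in (("CC", recombinants_dd_cc), ("CR", recombinants_dd_cr)):
        k = [n * p for p in dd_probs(0.2, phase)]  # counts equal to expectations
        print(phase, count_theta(k, rec))
    cc = unit_lod(0.05, "CC")
    # expected: 0.22 0.12
    print(round(unit_lod(0.05, "RR") / cc, 2), round(unit_lod(0.05, "CR") / cc, 2))
```

With counts set equal to their expectations at θ = 0.2, both iterations return 0.2, and the mixed-phase (DD-CR) iteration needs more steps than the coupling (DD-CC) iteration, consistent with the remark that this case required more iterations.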
Therefore, this section applies only to cases where at least one locus is codominant. For biallelic loci with coupling or repulsion linkage phases where at least one locus is codominant (BB, BD), gender-specific recombination frequencies are unavailable, and the bias in the gender-average recombination frequency is $d_1 = \theta_d - \theta = -\theta(1 - \theta)(1 - 2\theta)/[(1 - \theta)^2 + \theta^2]$, where $\theta_d$ is the gender-average recombination frequency estimated by the direct counting method. For biallelic loci with mixed parental linkage phases (BB-CR, BD-CR), $\theta_d = 0.5$ irrespective of the true parameter value, and the bias is $d_2 = \theta_d - \theta = 0.5 - \theta$. For the same mixed-phase data types, the bias in the female recombination frequency is $d_3 = x_d - x = x(1 - x)(1 - 2y)/(x + y - xy)$, where $x_d$ is the female recombination frequency estimated by the direct counting method. The bias in the male recombination frequency is obtained from the same formula with x and y interchanged. The formulas for d1, d2, and d3 show that ignoring noninformative offspring underestimates recombination frequencies for coupling or repulsion parental linkage phases and overestimates them for mixed parental linkage phases. As shown in Figure 3, for the coupling or repulsion phases the absolute bias reaches its maximum of 0.150 at θ = 0.257, whereas for the mixed phases the bias is a decreasing function of θ and of x or y. These results indicate that ignoring noninformative offspring can seriously bias estimates for biallelic loci. Bias due to ignoring noninformative offspring was also reported for half-sib designs with biallelic codominant loci (Gomez-Raya, 2001). Our analytical and numerical results show that ignoring noninformative offspring does not bias the estimate of the recombination frequency when at least one locus is multiallelic. As expected, ignoring noninformative offspring reduces the LOD score in all cases (Figure 4).

Figure 3. Bias in estimates of recombination frequency due to ignoring noninformative offspring. BB: both loci are biallelic and codominant; BB-CR: two codominant loci with mixed parental linkage phases; BD: one biallelic codominant locus and one dominant locus with coupling or repulsion linkage phases; BD-CR: one biallelic codominant locus and one dominant locus with mixed parental linkage phases.

Figure 4. Reduction in LOD scores due to ignoring noninformative offspring. The reduction in LOD score for a data type is defined as the difference between the unit LOD in Figure 1 and the unit LOD when noninformative offspring for direct counting are ignored.

Table 9. Association between genotypic probabilities and numbers of observations for testing parental linkage phases for an intercross of A1A2B1B2 × A3A4B3B4

Genotype | Genotypic frequency, Phase I: A1B1/A2B2 × A3B3/A4B4 | Phase II: A1B2/A2B1 × A3B3/A4B4 | Phase III: A1B1/A2B2 × A3B4/A4B3 | Phase IV: A1B2/A2B1 × A3B4/A4B3 | Number of observations
A1A3B1B3, A1A4B1B4, A2A3B2B3, A2A4B2B4 | q1 = (1 − x)(1 − y) | q2 | q3 | q4 | k1
A1A3B1B4, A1A4B1B3, A2A3B2B4, A2A4B2B3 | q2 = x(1 − y) | q1 | q4 | q3 | k2
A1A3B2B3, A1A4B2B4, A2A3B1B3, A2A4B1B4 | q3 = (1 − x)y | q4 | q1 | q2 | k3
A1A3B2B4, A1A4B2B3, A2A3B1B4, A2A4B1B3 | q4 = xy | q3 | q2 | q1 | k4
Total | 1 | 1 | 1 | 1 | n

Implications

Results from this study indicate that the method of direct and indirect counting can be an effective method for large-scale joint linkage analysis of codominant and dominant loci, which is useful for mapping dominant genes using codominant markers and for integrating linkage maps of codominant and dominant loci.

Literature Cited

Ajmone-Marsan, P., A. Valentini, M. Cassandro, G. Vecchiotti-Antaldi, G. Bertoni, and M. Kuiper. 1997. AFLP markers for DNA fingerprinting in cattle. Anim. Genet. 28:418–426.
Allard, R. W. 1956. Formulas and tables to facilitate the calculation of recombination values in heredity. Hilgardia 24:235–278.
Cushwa, W. T., and J. F. Medrano. 1996. Applications of the random amplified polymorphic DNA (RAPD) assay for genetic analysis of livestock species. Anim. Biotechnol. 7:11–31.
Da, Y., and H. A. Lewin. 1995. Linkage information content and efficiency of full-sib and half-sib designs for gene mapping. Theor. Appl. Genet. 90:699–706.
Gomez-Raya, L. 2001. Biased estimation of the recombination fraction using half-sib families and informative offspring. Genetics 157:1357–1367.
Green, E. L. 1981. Genetics and Probability in Animal Breeding Experiments. Oxford University Press, New York.
Green, P., K. Falls, and S. Crooks. 1990. Documentation for CRIMAP, version 2.4. Washington University School of Medicine, St. Louis.
Knapp, S. J., J. L. Holloway, W. C. Bridges, and B-H. Liu. 1995. Mapping dominant markers using F2 matings. Theor. Appl. Genet. 91:74–81.
Knorr, C., H. H. Cheng, and J. B. Dodgson. 1999. Application of AFLP markers to genome mapping in poultry. Anim. Genet. 30:28–36.
Lander, E. S., and D. Botstein. 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–199.
Lathrop, G. M., J. M. Lalouel, C. Julier, and J. Ott. 1984. Strategies for multilocus linkage analysis in humans. Proc. Natl. Acad. Sci. USA 81:3443–3446.
London, N., J. Xu, J. Garbe, and Y. Da. 2002. Linkage analysis for the hypothesized interaction between the polled and scurred traits in cattle. In: Proc. 7th World Cong. Genet. Appl. Livest. Prod., Montpellier, France 29:485–488.
Ott, J. 1999. Analysis of Human Genetic Linkage. 3rd ed. The Johns Hopkins University Press, Baltimore and London.
Smith, C. A. B. 1957. Counting methods in genetical statistics. Ann. Hum. Genet. 21:254–276.
Xu, J., N. London, J. Garbe, and Y. Da. 2002. Bias in linkage analysis due to ignoring epistasis effects. In: Proc. 7th World Cong. Genet. Appl. Livest. Prod., Montpellier, France 32:633–636.

Appendix 1.
LOD Scores

MM data type: both loci are codominant and multiallelic (Table 9).

$$Z_x = n\log_{10}2 + (k_2 + k_4)\log_{10}x + (k_1 + k_3)\log_{10}(1 - x)$$
$$Z_y = n\log_{10}2 + (k_3 + k_4)\log_{10}y + (k_1 + k_2)\log_{10}(1 - y)$$
$$Z_\theta = 2n\log_{10}2 + (k_2 + k_3 + 2k_4)\log_{10}\theta + (2k_1 + k_2 + k_3)\log_{10}(1 - \theta)$$
$$u = 2[\log_{10}2 + \theta\log_{10}\theta + (1 - \theta)\log_{10}(1 - \theta)]$$

MB data type: both loci are codominant, but one locus is multiallelic and the other is biallelic (Table 1).

$$Z_x = N_1\log_{10}[2(1 - x)] + N_2\log_{10}(2x) + N_3\log_{10}[2x(1 - y) + 2(1 - x)y] + N_4\log_{10}[2xy + 2(1 - x)(1 - y)]$$
$$Z_y = N_5\log_{10}[2(1 - y)] + N_6\log_{10}(2y) + N_3\log_{10}[2x(1 - y) + 2(1 - x)y] + N_4\log_{10}[2xy + 2(1 - x)(1 - y)]$$
$$Z_\theta = N_7\log_{10}[4(1 - \theta)^2] + N_8\log_{10}(4\theta^2) + N_9\log_{10}[4\theta(1 - \theta)] + N_{10}\log_{10}\{2[(1 - \theta)^2 + \theta^2]\}$$
$$u = \tfrac{1}{2}(1 - \theta)^2\log_{10}[4(1 - \theta)^2] + \tfrac{1}{2}\theta^2\log_{10}(4\theta^2) + 2\theta(1 - \theta)\log_{10}[4\theta(1 - \theta)] + \tfrac{1}{2}[(1 - \theta)^2 + \theta^2]\log_{10}\{2[(1 - \theta)^2 + \theta^2]\}$$

where $N_1 = k_1 + k_4 + k_5 + k_8$, $N_2 = k_2 + k_3 + k_6 + k_7$, $N_3 = k_9 + k_{12}$, $N_4 = k_{10} + k_{11}$, $N_5 = k_1 + k_3 + k_6 + k_8$, $N_6 = k_2 + k_4 + k_5 + k_7$, $N_7 = k_1 + k_8$, $N_8 = k_2 + k_7$, $N_9 = k_3 + k_4 + k_5 + k_6 + k_9 + k_{12}$, and $N_{10} = k_{10} + k_{11}$.

BB data type: both loci are codominant and biallelic with coupling or repulsion parental linkage phases (Table 2).

$$Z_\theta = N_1\log_{10}[4(1 - \theta)^2] + N_2\log_{10}[4\theta(1 - \theta)] + N_3\log_{10}(4\theta^2) + N_4\log_{10}\{2[(1 - \theta)^2 + \theta^2]\}$$

where $N_1 = k_1 + k_9$, $N_2 = k_2 + k_4 + k_6 + k_8$, $N_3 = k_3 + k_7$, and $N_4 = k_5$. The unit LOD score is the same as that for the MB data type.

BB-CR data type: both loci are codominant and biallelic with mixed parental linkage phases (Table 3).
$$Z_x = (k_1 + k_9)\log_{10}(2x) + (k_2 + k_4 + k_6 + k_8)\log_{10}\{2[(1 - x)(1 - y) + xy]\} + (k_3 + k_7)\log_{10}[2(1 - x)] + k_5\log_{10}\{2[x(1 - y) + y(1 - x)]\}$$
$$Z_y = (k_1 + k_9)\log_{10}[2(1 - y)] + (k_2 + k_4 + k_6 + k_8)\log_{10}\{2[(1 - x)(1 - y) + xy]\} + (k_3 + k_7)\log_{10}(2y) + k_5\log_{10}\{2[x(1 - y) + y(1 - x)]\}$$
$$Z_\theta = (k_1 + k_3 + k_5 + k_7 + k_9)\log_{10}[4\theta(1 - \theta)] + (k_2 + k_4 + k_6 + k_8)\log_{10}\{2[(1 - \theta)^2 + \theta^2]\}$$
$$u = 2\theta(1 - \theta)\log_{10}[4\theta(1 - \theta)] + (1 - 2\theta + 2\theta^2)\log_{10}[2(1 - 2\theta + 2\theta^2)]$$

MD data type: one locus is codominant and multiallelic and the other is dominant (Table 4).

$$Z_x = (k_1 + k_3)\log_{10}(2x) + (k_2 + k_4)\log_{10}[2(1 - x)] + k_5\log_{10}[2(1 - xy)/(2 - y)] + k_6\log_{10}\{2[1 - (1 - x)y]/(2 - y)\} + k_7\log_{10}\{2[1 - x(1 - y)]/(1 + y)\} + k_8\log_{10}[2(x + y - xy)/(1 + y)]$$
$$Z_y = (k_1 + k_2)\log_{10}(2y) + (k_3 + k_4)\log_{10}[2(1 - y)] + k_5\log_{10}[2(1 - xy)/(2 - x)] + k_6\log_{10}\{2[1 - x(1 - y)]/(1 + x)\} + k_7\log_{10}\{2[1 - (1 - x)y]/(2 - y)\} + k_8\log_{10}[2(x + y - xy)/(1 + x)]$$
$$Z_\theta = k_1\log_{10}(4\theta^2) + (k_2 + k_3)\log_{10}[4\theta(1 - \theta)] + k_4\log_{10}[4(1 - \theta)^2] + k_5\log_{10}[\tfrac{4}{3}(1 - \theta^2)] + (k_6 + k_7)\log_{10}\{\tfrac{4}{3}[1 - \theta(1 - \theta)]\} + k_8\log_{10}[\tfrac{4}{3}\theta(2 - \theta)]$$
$$u = \tfrac{1}{4}\theta^2\log_{10}(4\theta^2) + \tfrac{1}{2}\theta(1 - \theta)\log_{10}[4\theta(1 - \theta)] + \tfrac{1}{4}(1 - \theta)^2\log_{10}[4(1 - \theta)^2] + \tfrac{1}{4}(1 - \theta^2)\log_{10}[\tfrac{4}{3}(1 - \theta^2)] + \tfrac{1}{2}[1 - \theta(1 - \theta)]\log_{10}\{\tfrac{4}{3}[1 - \theta(1 - \theta)]\} + \tfrac{1}{4}\theta(2 - \theta)\log_{10}[\tfrac{4}{3}\theta(2 - \theta)]$$

BD data type: one locus is codominant and biallelic and the other is dominant, with coupling or repulsion parental linkage phases (Table 5).

$$Z_\theta = k_1\log_{10}[\tfrac{4}{3}(1 - \theta^2)] + k_2\log_{10}(4\theta^2) + k_3\log_{10}\{\tfrac{4}{3}[1 - \theta(1 - \theta)]\} + k_4\log_{10}[4\theta(1 - \theta)] + k_5\log_{10}[\tfrac{4}{3}\theta(2 - \theta)] + k_6\log_{10}[4(1 - \theta)^2]$$

The unit LOD score is the same as for the MD data type.

BD-CR data type: one locus is codominant and biallelic and the other is dominant, with mixed parental linkage phases (Table 6).
$$Z_x = k_1\log_{10}[2(1 - y + xy)/(2 - y)] + k_2\log_{10}[2(1 - x)] + k_3\log_{10}[\tfrac{2}{3}(1 + x + y - 2xy)] + k_4\log_{10}\{2[(1 - x)(1 - y) + xy]\} + k_5\log_{10}[2(1 - x + xy)/(1 + y)] + k_6\log_{10}(2x)$$
$$Z_y = k_1\log_{10}[2(1 - y + xy)/(1 + x)] + k_2\log_{10}(2y) + k_3\log_{10}[\tfrac{2}{3}(1 + x + y - 2xy)] + k_4\log_{10}\{2[(1 - x)(1 - y) + xy]\} + k_5\log_{10}[2(1 - x + xy)/(2 - x)] + k_6\log_{10}[2(1 - y)]$$
$$Z_\theta = (k_1 + k_5)\log_{10}[\tfrac{4}{3}(1 - \theta + \theta^2)] + (k_2 + k_6)\log_{10}[4\theta(1 - \theta)] + k_3\log_{10}[\tfrac{2}{3}(1 + 2\theta - 2\theta^2)] + k_4\log_{10}\{2[(1 - \theta)^2 + \theta^2]\}$$
$$u = \tfrac{1}{2}(1 - \theta + \theta^2)\log_{10}[\tfrac{4}{3}(1 - \theta + \theta^2)] + \tfrac{1}{2}\theta(1 - \theta)\log_{10}[4\theta(1 - \theta)] + \tfrac{1}{4}(1 + 2\theta - 2\theta^2)\log_{10}[\tfrac{2}{3}(1 + 2\theta - 2\theta^2)] + \tfrac{1}{4}[(1 - \theta)^2 + \theta^2]\log_{10}\{2[(1 - \theta)^2 + \theta^2]\}$$

DD-CC data type: both loci are dominant with coupling linkage phase (Table 7).

$$Z_\theta = k_1\log_{10}\{\tfrac{8}{9}[1 + \tfrac{1}{2}(1 - \theta)^2]\} + (k_2 + k_3)\log_{10}[\tfrac{4}{3}\theta(2 - \theta)] + k_4\log_{10}[4(1 - \theta)^2]$$
$$u = q_1\log_{10}\{\tfrac{8}{9}[1 + \tfrac{1}{2}(1 - \theta)^2]\} + (q_2 + q_3)\log_{10}[\tfrac{4}{3}\theta(2 - \theta)] + q_4\log_{10}[4(1 - \theta)^2]$$

DD-CR data type: both loci are dominant with mixed parental linkage phases (Table 8).

$$Z_\theta = k_1\log_{10}\{\tfrac{8}{9}[1 + \tfrac{1}{2}\theta(1 - \theta)]\} + (k_2 + k_3)\log_{10}\{\tfrac{4}{3}[1 - \theta(1 - \theta)]\} + k_4\log_{10}[4\theta(1 - \theta)]$$
$$u = q_1\log_{10}\{\tfrac{8}{9}[1 + \tfrac{1}{2}\theta(1 - \theta)]\} + (q_2 + q_3)\log_{10}\{\tfrac{4}{3}[1 - \theta(1 - \theta)]\} + q_4\log_{10}[4\theta(1 - \theta)]$$

DD-RR data type: both loci are dominant with repulsion linkage phase (Knapp et al., 1995).

$$Z_\theta = k_1\log_{10}[\tfrac{4}{9}(2 + \theta^2)] + (k_2 + k_3)\log_{10}[\tfrac{4}{3}(1 - \theta^2)] + k_4\log_{10}(4\theta^2)$$
$$u = \tfrac{1}{4}(2 + \theta^2)\log_{10}[\tfrac{4}{9}(2 + \theta^2)] + \tfrac{1}{2}(1 - \theta^2)\log_{10}[\tfrac{4}{3}(1 - \theta^2)] + \tfrac{1}{4}\theta^2\log_{10}(4\theta^2)$$

where $k_1$, $k_2$, $k_3$, and $k_4$ are the numbers of observations for the A-B-, A-bb, aaB-, and aabb genotypes, respectively.

Appendix 2: Likelihood Functions for Testing Parental Linkage Phases

When the linkage phase of each parent is unknown, the most likely linkage phase of each parent can be identified using likelihood ratios. Table 9 shows the association between the genotypic probabilities and the numbers of genotypic observations.
As the assumption about the parental linkage phases changes, so does the association between the underlying probabilities and the observations. For an intercross involving two loci, four combinations of parental linkage phases are possible. For example, for the mating A1A2B1B2 × A3A4B3B4 in Table 9, the four possible combinations are 1) A1B1/A2B2 × A3B3/A4B4, 2) A1B2/A2B1 × A3B3/A4B4, 3) A1B1/A2B2 × A3B4/A4B3, and 4) A1B2/A2B1 × A3B4/A4B3. The corresponding log-likelihood functions (up to a common constant) are

$$L_1 = k_1\log q_1 + k_2\log q_2 + k_3\log q_3 + k_4\log q_4$$
$$L_2 = k_1\log q_2 + k_2\log q_1 + k_3\log q_4 + k_4\log q_3$$
$$L_3 = k_1\log q_3 + k_2\log q_4 + k_3\log q_1 + k_4\log q_2$$
$$L_4 = k_1\log q_4 + k_2\log q_3 + k_3\log q_2 + k_4\log q_1$$

The log-likelihood functions for the other data types can be derived in a similar manner. The resulting formulas are similar, except that the definitions of $k_i$ and $q_i$ differ and the number of terms on the right-hand side is generally different as well. The most likely linkage phase is the one with the largest likelihood.
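The phase comparison for the MM data type can be sketched in Python as below. This is an illustrative sketch with hypothetical names, not the authors' implementation. It assumes that, under each phase, x and y are replaced by their direct-counting estimates; these maximize that phase's likelihood because each observation contributes one factor x or 1 − x and one factor y or 1 − y, so the likelihood splits into two binomial parts. It also caps both estimates at 0.5. The cap is our assumption, added because without it Phase IV reproduces Phase I (and Phase III reproduces Phase II) with x and y replaced by 1 − x and 1 − y, so the likelihoods could not separate those phases. Likelihoods are on the log10 scale, and the returned value is the ratio of the two largest, the quantity summarized in Figure 2.

```python
import math

# Illustrative sketch (hypothetical names). Each phase maps row i of Table 9
# (counts k1..k4) to the 0-based index of its genotypic probability q1..q4.
PHASES = {
    "I": (0, 1, 2, 3),
    "II": (1, 0, 3, 2),
    "III": (2, 3, 0, 1),
    "IV": (3, 2, 1, 0),
}


def q_values(x, y):
    """Genotypic probabilities q1..q4 from Table 9."""
    return ((1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y)


def phase_loglik(k, perm):
    """log10 likelihood under one phase, with x and y estimated by counting.

    x appears as a factor in q2 and q4, y in q3 and q4; both estimates are
    capped at 0.5 (an assumption of this sketch, not of the original text).
    """
    n = sum(k)
    x = min(sum(ki for ki, j in zip(k, perm) if j in (1, 3)) / n, 0.5)
    y = min(sum(ki for ki, j in zip(k, perm) if j in (2, 3)) / n, 0.5)
    q = q_values(x, y)
    # Rows with zero counts are skipped so that log10(0) is never evaluated.
    return sum(ki * math.log10(q[j]) for ki, j in zip(k, perm) if ki > 0)


def most_likely_phase(k):
    """Return (best phase, log10 ratio of the two largest likelihoods)."""
    scores = {name: phase_loglik(k, perm) for name, perm in PHASES.items()}
    first, second = sorted(scores, key=scores.get, reverse=True)[:2]
    return first, scores[first] - scores[second]


if __name__ == "__main__":
    # Counts equal to expectations under Phase I with x = 0.1, y = 0.2, n = 1000.
    print(most_likely_phase((720, 80, 180, 20)))
```

For counts generated under Phase I with x = 0.1 and y = 0.2, the sketch selects Phase I; rearranging the same counts as they would appear under Phase II makes it select Phase II.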