Copyright © 1992 by the Genetics Society of America

The Theoretical Distribution of Lengths of Intact Chromosome Segments Around a Locus Held Heterozygous With Backcrossing in a Diploid Species

H. Naveira¹ and A. Barbadilla

Departamento de Genética y Microbiología, Universidad Autónoma de Barcelona, Bellaterra (Barcelona), Spain

Manuscript received October 29, 1990
Accepted for publication September 19, 1991

ABSTRACT

When two different isogenic lines of a diploid species (or two different species) are crossed, the resulting F1 individuals should be heterozygous at all the loci fixed for different alleles in the two strains (in the limit, at all the loci of the genome). If one of these loci is then held heterozygous for several generations of repeated backcrossing to the same strain, the average length of intact chromosome segments (with reference to the original parental chromosome) on both sides of the selected locus, or, equivalently, the average length of segments surrounding that locus which are still heterozygous (with reference to the fully heterozygous F1 chromosome), may diminish, but cannot increase. Several authors have derived equations to predict this average. We show that the most widely used criterion, developed by R. A. Fisher, leads to serious overestimations of the true parametric values when applied to early generation analyses, with the corresponding errors in the interpretation of experimental results. We then derive the exact equations both for the average and standard deviation of the lengths of intact chromosome segments surrounding a locus held heterozygous after any number of generations of backcrossing. Our results are in close agreement with those found by a former author, although involving a rather different approach.

WHEN a recognizable dominant or semidominant gene is introduced into an established isogenic line by repeated backcrossing, other genes are introduced together with the desired gene.
On average, 50% of those that are on other chromosomes are eliminated in each backcross generation by chromosome segregation. But those that are linked can be eliminated only by crossing over, and therefore a considerable portion of the chromosome on each side of the selected gene may be expected to remain intact (with reference to the original parental chromosome) during the early backcross generations. The question of how much the length of intact chromosome segments is reduced, on average, after a series of backcrosses or other selected breeding procedure is important not only for the plant or animal breeder, interested in the development of stocks with desired genetic traits, but also for persons working in many other fields of research. For example, this issue is relevant to those trying to separate the selective effect of an allele from the effect of its genetic background (genetic hitchhiking, a problem particularly relevant for genetic perturbation experiments), those wanting to study the maintenance of coadapted gene complexes in spite of recombination with other gene complexes (recombinational load), or, finally, those trying to find out the number of different genes that determine a given character by chromosome substitution experiments (for example, the number of species differences that bring about hybrid sterility). In all these examples it is implicitly assumed that the parental chromosomes are heterozygous not only for the locus of interest, but also for at least several other loci with effects on the same character.

¹ Present address: Departamento de Biología Fundamental, Unidad de Genética, Facultad de Biología, Universidad de Santiago de Compostela, Santiago de Compostela, Spain.

Genetics 130: 205-209 (January, 1992)
Therefore, the question of how much the length of intact chromosome segments is reduced following selected breeding procedures is indeed equivalent to the question of how much the length of the segments which are still heterozygous is reduced (with reference to the original, wholly heterozygous, F1 chromosomes). The answer to this question has been investigated by several authors (BARTLETT and HALDANE 1935; FISHER 1949; HANSON 1959a,b), but their conclusions, for one reason or another, show some problems for early generation analyses. In this paper we derive the exact equation both for the average and the standard deviation of lengths of intact chromosome segments surrounding a locus held heterozygous for any number of generations of backcrossing.

Earlier derivations: To understand our derivation below, it is helpful to first review the arguments leading both to BARTLETT and HALDANE'S and to FISHER'S classical formulas. BARTLETT and HALDANE (1935) showed that, assuming a uniform distribution of c (the recombination frequency) along the chromosome, the average intact length on each side of the selected locus after n generations would be given by:

$$L\ (\text{in map units} \div 100) = \int_0^{1/2} (1 - c)^{n-1}\,dc = (1/n)\left[1 - (1/2)^n\right] = 1/n \quad (\text{if } n \text{ large}) \tag{1}$$

or 100(2/n) map units on both sides of the selected locus, where map units correspond to centimorgans (cM), 1 cM being equal to a recombination frequency of 1%. This approximation has one fundamental problem. No matter how far apart two loci are on a chromosome, we never observe a c value greater than 0.5 (50 cM). Therefore, all large segments are assigned the same value in the integral, although some of them may be much longer than others. But as the number of backcross generations increases, the relative frequency of large intact chromosome segments decreases very quickly, and this argument loses importance. That is why BARTLETT and HALDANE gave only approximate results for large n in the various cases that they considered (L = 1/n, for n large).
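The gap between the closed form in Equation 1 and its large-n limit can be checked numerically. The following is a minimal sketch, assuming that Equation 1 arises from integrating (1 - c)^(n-1) over recombination frequencies c from 0 to 1/2; the function names are illustrative and are not part of the original derivation.

```python
def bh_closed_form(n: int) -> float:
    """(1/n) * [1 - (1/2)^n], in map units / 100, one side of the locus."""
    return (1.0 - 0.5 ** n) / n


def bh_limit(n: int) -> float:
    """Large-n approximation 1/n, in map units / 100."""
    return 1.0 / n


def bh_numeric(n: int, steps: int = 100_000) -> float:
    """Midpoint-rule integral of (1 - c)^(n - 1) for c in [0, 1/2].

    Assumption: this integrand is our reading of Equation 1.
    """
    h = 0.5 / steps
    return h * sum((1.0 - (i + 0.5) * h) ** (n - 1) for i in range(steps))


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        exact = 100 * bh_closed_form(n)
        limit = 100 * bh_limit(n)
        print(f"n={n:2d}  closed form={exact:7.3f}  1/n limit={limit:7.3f} map units")
```

For n = 1 the closed form gives 50 map units on each side against 100 for the limit; by n = 10 the two differ by less than 0.1%.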
One of the best measures we can have of true genetic distance is m, the actual average number of crossovers per meiosis in a chromosome region (excluding crossovers between sister chromatids, which are thought to be very rare anyway). The proportion of meioses with at least one crossover is one minus the fraction with zero crossovers, and, assuming no interference, it may be derived from a Poisson distribution of parameter m. Only one-half of the products of those meioses will be recombinants in the region of interest. Hence $c = (1 - e^{-m})/2$. It can be seen that as m gets larger, c approaches 0.5, but the function (usually called the mapping function; HALDANE 1919) is approximately linear, namely c = m/2, for a certain range corresponding to very small m values (genetic distances). Only in this range, which corresponds to m values less than 0.2 (c values less than 0.1), does the map unit defined as 1% recombinant frequency (1 map unit = 1 cM) have real meaning. Accordingly, genetic distances may be obtained either through the summation of small genetic intervals in which c has a linear relationship with map distance, or, alternatively, by transforming centimorgans into real map units by means of the mapping function. In this last case, the recombination fraction between two loci must be used first to obtain an estimate of the average number of crossovers per meiosis in that chromosome region (from $c = (1 - e^{-m})/2$, it follows that $m = -\ln(1 - 2c)$); then, using the linear relationship found for small genetic distances, c = m/2, real map units can be obtained (1 real map unit = 0.02 crossovers per meiosis per chromosome, that is, real map units = 50m; SUZUKI et al. 1986, p. 124). This relationship works both ways. For example, for a chromosome of 50 real map units (not centimorgans!), which could represent an average map length of a chromosome arm in Drosophila, m would be 1.
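The conversion between recombination fraction and real map units described above can be sketched as follows. This is an illustrative example under the no-interference (Poisson) assumption stated in the text; the function names are our own and do not come from the original paper.

```python
import math


def recombination_fraction(m: float) -> float:
    """Mapping function c = (1 - e^-m) / 2 for m crossovers per meiosis."""
    return (1.0 - math.exp(-m)) / 2.0


def crossovers_per_meiosis(c: float) -> float:
    """Inverse mapping function m = -ln(1 - 2c), defined for 0 <= c < 0.5."""
    if not 0.0 <= c < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= c < 0.5")
    return -math.log(1.0 - 2.0 * c)


def real_map_units(c: float) -> float:
    """Real map units = 50 m (1 real map unit = 0.02 crossovers per meiosis)."""
    return 50.0 * crossovers_per_meiosis(c)


if __name__ == "__main__":
    for c in (0.01, 0.05, 0.10, 0.20, 0.40):
        print(f"c={c:.2f}  centimorgans={100 * c:5.1f}  "
              f"real map units={real_map_units(c):6.2f}")
```

At c = 0.01 the two scales nearly coincide, whereas at c = 0.40 the real map distance is roughly twice the 40 cM suggested by the recombination fraction alone.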

If we consider crossovers per chromatid instead of per chromosome (two sister chromatids joined by a single centromere), then 1 real map unit = 0.01 crossovers per meiosis per chromatid, that is, real map units = 100m′, where m′ = m/2 (HALDANE 1919; FISHER 1948). This difference is quite simple to understand. If m is the average number of crossovers per chromosome (m = map units ÷ 50), m′ = map units ÷ 100 will be the average per chromatid. That is, the distance between two loci in terms of number of crossovers per meiosis will be different on a "per chromosome" or a "per chromatid" basis (m and m/2, respectively), but it will be the same in terms of real map units (50m = 100m′).

FISHER (1949) actually used a measure of genetic distance closely related to these to derive a second approximation to the problem of how large a linked region will remain after a certain number of generations of backcrossing. Consider meiosis in an F1 between isogenic lines of a diploid species, or in an F1 interspecific hybrid, where each chromosome is present as two homologs, each one from a different isogenic line (or species), and each one consisting of two sister chromatids. Let the length of the chromosome be ∞. Let x be the average number of crossovers per chromatid (genetic distance) between any locus on the chromosome and the fixed dominant (selected locus). Then, the probability of no crossover on each chromatid in this segment for n generations of backcrossing would be $e^{-nx}$. The probability of a crossover in any one generation within an infinitely small chromatid interval dx adjacent to this segment should be precisely dx. And the probability of crossover in the interval dx sometime during n generations should be n dx. Therefore, the probability of having had a crossover in the interval dx but not in the adjacent segment x after n generations should be $e^{-nx} n\,dx$.
Then, the mean value of the intact interval on one side of the selected locus, assuming crossing over has occurred sometime during n generations, would be given (after some operation and dropping terms with $e^{-\infty}$) by:

$$E[X] = \int_0^{\infty} x e^{-nx} n\,dx = 1/n \ \text{crossovers per chromatid}, \tag{2}$$

that is, 100(1/n) real map units, or 100(2/n) on both sides of the fixed dominant. This is exactly the same result reached by BARTLETT and HALDANE (1935) that we presented above (1). But in this case the derivation is apparently faultless, because the genetic map scale used by FISHER is linear. It does not have the limitations imposed by c on the derivation of BARTLETT and HALDANE. Therefore it should be possible in principle to apply this criterion to any number of backcross generations. But, as expected, when applied to early backcross generations, it leads to overtly incongruent conclusions. For example, after one generation of backcrossing the average length of heterozygous segments surrounding a locus on the X chromosome of Drosophila melanogaster would be 100 × 2 = 200 map units, although the X chromosome is only 70 map units long!

Hanson's derivation: The unsuitability for early generations of the relationships developed by BARTLETT and HALDANE (1935) and, particularly, FISHER (1949) was first noted by HANSON (1959b). He showed that these relationships are effectively limiting functions, which are not very accurate in the early generations of backcrossing. His work apparently passed unnoticed by most scientists, though. FISHER'S criterion continues to be used at present in early generation analyses (ORR and COYNE 1989), and it is cited (as such or as the less perfect derivation of BARTLETT and HALDANE) in widely used textbooks without warning about its failure in early generations (CROW and KIMURA 1970; HEDRICK 1985; WRIGHT 1969). Maybe this fact is due to the particular rationale used by HANSON in his derivation.
Instead of trying to find out how FISHER'S derivation could be improved, he preferred to introduce a radically new approach, considering crossover breakpoints as loci which define a set of chromatid segments, each identified by subsets with a specific probability of containing the selected locus and a specific frequency distribution of lengths. His derivation of the function for the cumulative distribution of lengths of heterozygous segments on one side of the selected locus (in the chromatids generated by a meiotic division) is essentially correct, but rather difficult to follow, mainly because of the confusing notation he applies. Besides, his final results, the formulas for the expected half length and variance of heterozygous chromosome segments (actually the object of his whole paper), are half hidden as a footnote of his Table 1. All this may explain the low diffusion of this work among the scientific audience. Anyway, according to his results, the mean value of the intact interval associated on one side with a locus placed in the middle of a chromosome and held heterozygous in backcrossing is:

$$E[X] = 100(1/n)\left[1 - e^{-ns/2}\right] \ \text{real map units}, \tag{3}$$

or twice this amount if both sides of the selected locus are considered, where n is the number of backcross generations and s is the length of the heterozygous chromosome in n = 0 (the F1 generation), expressed as "the expected number of breaks per . . . chromosome(s) resulting from a meiotic division" (HANSON 1959b, p. 833). After eight to ten generations this quantity is indistinguishable from FISHER'S criterion, but in the earliest generations the differences may be very important (see Table 1 in HANSON'S paper). The first problem, as we said, is that HANSON did not show how the derivation of FISHER could be improved. The second one is that his measure of chromosome length may be confusing when considering that a chromosome in meiosis consists of two sister chromatids joined by a single centromere. To show how these problems can be circumvented is precisely the object of the following paragraphs.

The exact derivation: Experiments which involve early generation tests need an exact criterion for the average length of heterozygous segments in the early generations of backcrossing. We are going to derive such a criterion following essentially FISHER'S rationale, with relatively small, but ultimately very important, corrections to his original derivation, which allow the calculation of the average length of heterozygous segments about a locus held heterozygous for any number of generations of backcrossing. FISHER integrated from 0 to ∞, but that is a limiting condition. The upper limit is actually fixed by the length of the heterozygous chromosome segment in the F1 (n = 0), which is far from being ∞. Let the length of the chromosome be m in the scale of crossovers per meiosis (50m map units in the real map scale), that is to say, an average of m crossovers take place in that chromosome per meiosis. Therefore m/2 (or m′) would be the average number of crossovers in each sister chromatid, assuming equal probability of exchanges for both of them. If there is no interference, the probability of no crossover in each chromatid of this chromosome in one generation is given by the Poisson distribution, namely $e^{-m/2}$. Then the probability of no crossover in n generations of backcrossing would be $e^{-nm/2}$. If the selected locus is placed just in the middle of the chromosome (a simplifying assumption that we will drop afterward), the probability of no crossover on one of its sides in each chromatid will be $e^{-nm/4}$. Finally, following FISHER'S rationale, we had that the probability of having had a crossover in the interval dx but not in the adjacent segment x after n generations of repeated backcrossing was $e^{-nx} n\,dx$. Then, the mean value of the intact interval on one side of the selected locus, assuming crossing over has occurred sometime during n generations, would be given by:

$$E[X_c] = \int_0^{m/4} x e^{-nx} n\,dx \Big/ \left(1 - e^{-nm/4}\right)$$

or simply, by taking D = m/4,

$$E[X_c] = \int_0^{D} x e^{-nx} n\,dx \Big/ \left(1 - e^{-nD}\right)$$

where $X_c$ means segments resulting from crossovers on one side of the selected locus in the original chromatids of length m/2. Insofar as the selected locus is assumed to be in the middle of the chromosome, the maximum length of the intact segment on each of its sides must be m/4. Anyway, this is the same formula obtained by FISHER (2), after making D = ∞. In D. melanogaster, for example, the X chromosome has a length of only 70 map units, which corresponds to an m value of 1.4 crossovers per meiosis for the whole chromosome. The integral, then, should actually be from 0 to 0.35 in this case, not to ∞! By solving the integral by parts, we obtain:

$$E[X_c] = \left[(1/n) - e^{-nD}\left[D + (1/n)\right]\right] \Big/ \left(1 - e^{-nD}\right).$$

But we are interested not in this amount, which represents the average length of the intact heterozygous chromatid segments on one side of the selected locus produced by crossing over in the originally wholly heterozygous chromosome, but in the absolute average length, which includes the contribution both of crossover and non-crossover chromatids. That is,

$$E[X] = D e^{-nD} + (1/n) - e^{-nD}\left[D + (1/n)\right]$$

where the first part of the sum is the contribution to the average of those chromatids that have had no crossovers for n generations. Operating,

$$E[X] = (1/n)\left(1 - e^{-nD}\right) \ \text{crossovers per chromatid}, \tag{4}$$

or $100(1/n)(1 - e^{-nD})$ map units. Again, if we are interested in the average length on both sides of the selected locus we must double that amount. By comparing this formula with HANSON'S (3) it becomes clear that our D equals s/2 in HANSON'S notation.
That is, his definition of chromosome length (s) is exactly what we are calling chromatid length (m/2, or m′). This difference results from HANSON always meaning "chromosomes resulting from meiosis," that is, the former chromatids, whereas for us, in strict adherence to commonly used terminology in meiosis, the term chromosome is reserved for each duplicated chromosome homolog, consisting of two daughter strands (sister chromatids) joined by a single centromere, in the stage of four chromatids when crossing over takes place. We believe that in this way serious misunderstandings are avoided. A full statement of HANSON'S definition for the length (s) of a chromosome region, in the line of FISHER (1948, 1949), would be rather difficult to understand: average number of crossover breaks per chromosome resulting from meiosis in the chromosome region considered. Our definition, in the line of SUZUKI et al. (1986), is simpler: the length (m) of a chromosome region is the average number of crossovers (or exchanges) per meiosis in the chromosome region considered. But, both for HANSON and for us, the length of the chromosome region would be the same when expressed in real map units: 100s, or 100m/2 (100m′) in our notation. The problem with HANSON'S criterion, therefore, is not properly a mathematical one, but rather of a semantic kind.

FIGURE 1. The expected combined length (E[X], in map units) of heterozygous chromosome segments on both sides (right and left) of a locus held heterozygous for several generations of backcrossing, when the locus in question is just in the middle of a chromosome segment whose initial length in heterozygosis, d0 (the length in the F1 generation, n = 0), is either 25, 50, 75, or 100 map units (after Equation 6, where R = L = d0/2). FISHER'S limiting function is also represented (d0 = ∞). [Plot not reproduced; horizontal axis: n (generations).]
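The difference between Equation 4 and FISHER'S limiting criterion in early generations can be illustrated numerically. The sketch below evaluates both for the X chromosome example given in the text (70 map units, locus in the middle, D = 0.35) and adds a small Monte Carlo check. The simulation is our own illustration, not the authors' procedure: it assumes no interference, so that in each generation the nearest crossover to the locus lies at an exponentially distributed distance with unit rate on the crossovers-per-chromatid scale. All function names are hypothetical.

```python
import math
import random


def fisher_one_side(n: int) -> float:
    """FISHER's limiting criterion, 100/n map units on one side."""
    return 100.0 / n


def exact_one_side(n: int, d: float) -> float:
    """Equation 4: 100 (1/n)(1 - e^(-nD)) map units, D in map units / 100."""
    return 100.0 * (1.0 - math.exp(-n * d)) / n


def simulated_one_side(n: int, d: float, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean intact length on one side, in map units.

    Assumption: the intact segment is the smallest of n Exponential(1)
    crossover distances (one per generation), capped at the initial length D.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        length = d
        for _ in range(n):
            length = min(length, rng.expovariate(1.0))
        total += length
    return 100.0 * total / trials


if __name__ == "__main__":
    d = 0.35  # X chromosome of 70 map units, selected locus in the middle
    for n in (1, 2, 5, 10, 20):
        print(f"n={n:2d}  Fisher={2 * fisher_one_side(n):7.2f}  "
              f"exact={2 * exact_one_side(n, d):6.2f}  "
              f"simulated={2 * simulated_one_side(n, d):6.2f} map units (both sides)")
```

For n = 1 the exact criterion gives about 59 map units on both sides, below the 70 map units of the whole chromosome, whereas FISHER'S criterion gives 200.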
The variance of the lengths of the heterozygous segments linked on one side to the selected locus can be calculated as $\mathrm{Var}[X] = E[X^2] - (E[X])^2$, which gives:

$$\mathrm{Var}[X] = (1/n^2)\left[1 - e^{-nD}\left(2nD + e^{-nD}\right)\right], \tag{5}$$

or twice this amount for the lengths surrounding the selected locus. The limiting function (when m → ∞) would be $\mathrm{Var}[X] = 1/n^2$.

In Figure 1 we compare the performance of our exact criterion for the expected combined length on both sides of the selected locus with FISHER'S limiting function, when different initial lengths of heterozygous segments surrounding the selected locus in the F1 generation (n = 0) are considered. As shown in the figure, FISHER'S criterion is adequate only after eight or more generations when the size of the heterozygous chromosome is 100 map units. If the size of the heterozygous segment that we consider in the F1 is smaller (75 to 50 map units), 10-12 generations may be necessary to get a good approximation to the true value. And when the size of the heterozygous segment is very small (20 map units, for example), up to 20 generations are necessary to get a really good agreement between the exact criterion and FISHER'S. Figure 2 shows the standard deviations for the lengths on both sides of the selected locus. Except for the earliest backcross generations, they are quite similar for all the different initial lengths of heterozygous chromosome segments that we have considered. Their values increase at first to reach a maximum that depends on the initial size of the heterozygous segment and then drop to the values of the limiting function as n gets larger.

FIGURE 2. The standard deviation (in map units) of the combined length (X) of heterozygous chromosome segments on both sides (right and left) of a locus held heterozygous for several generations of backcrossing, in the same situations as in Figure 1 (after Equation 7, where R = L = d0/2). [Plot not reproduced; curves for d0 = 25, 50, 75, 100 and ∞; horizontal axis: n (generations).]

Equations 4 and 5 may be expressed in a more general form for cases where the selected locus is asymmetrically located on the chromosome. Let us assume that the selected locus is linked in the F1 to a heterozygous chromosome segment whose right bound (nearest to the centromere) lies at a distance R (in map units ÷ 100) from the locus of interest, and whose left bound (nearest to the telomere) lies at a distance L (also in map units ÷ 100) from the same locus. Then, the average length of the heterozygous segment on both sides of the locus after n generations of backcrossing would be given by

$$E[X] = (1/n)\left[\left(1 - e^{-nR}\right) + \left(1 - e^{-nL}\right)\right] \tag{6}$$

and the variance,

$$\mathrm{Var}[X] = (1/n^2)\left[2 - e^{-nR}\left(2nR + e^{-nR}\right) - e^{-nL}\left(2nL + e^{-nL}\right)\right]. \tag{7}$$

Equations 6 and 7 give rise to Equations 4 and 5, respectively, when R = L = D (that is, when the locus of interest is just in the middle of a chromosome which is wholly heterozygous in the F1) and the length on only one side of the locus is considered. These general equations can be used in any conceivable situation to provide estimates of the average length of heterozygous chromosome segments linked to a selected locus and its associated error, which will depend, of course, on the size of the sample, N (Var[E[X]] = (1/N)Var[X]).

We hope that with this paper all the interested scientific audience will finally be aware of the limitations of FISHER'S criterion. Understanding them will help in appreciating many of the consequences of genetic recombination, which are sometimes difficult to anticipate without a well developed theory.

LITERATURE CITED

BARTLETT, M. S., and J. B. S. HALDANE, 1935 The theory of inbreeding with forced heterozygosis. J. Genet. 31: 327-340.
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics, pp. 94-95. Harper & Row, New York.
FISHER, R. A., 1948 A quantitative theory of genetic recombination and chiasma formation. Biometrics 4: 1-13.
FISHER, R. A., 1949 The Theory of Inbreeding, pp. 49-50. Hafner, New York.
HALDANE, J. B. S., 1919 The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genet. 8: 299-309.
HANSON, W. D., 1959a The theoretical distribution of lengths of parental gene blocks in the gametes of an F1 individual. Genetics 44: 197-209.
HANSON, W. D., 1959b Early generation analysis of lengths of heterozygous chromosome segments around a locus held heterozygous with backcrossing or selfing. Genetics 44: 833-837.
HEDRICK, P. W., 1985 Genetics of Populations, pp. 372-373. Jones and Bartlett, Boston.
ORR, H. A., and J. A. COYNE, 1989 The genetics of postzygotic isolation in the Drosophila virilis group. Genetics 121: 527-537.
SUZUKI, D. T., A. J. F. GRIFFITHS, J. H. MILLER and R. C. LEWONTIN, 1986 An Introduction to Genetic Analysis, pp. 103-105. Freeman, New York.
WRIGHT, S., 1969 Evolution and the Genetics of Populations, Vol. II, pp. 264-265. The University of Chicago Press, Chicago.

Communicating editor: B. S. WEIR