Copyright © 1992 by the Genetics Society of America
The Theoretical Distribution of Lengths of Intact Chromosome Segments
Around a Locus Held Heterozygous With Backcrossing in a Diploid Species
H. Naveira¹ and A. Barbadilla
Departamento de Genética y Microbiología, Universidad Autónoma de Barcelona, Bellaterra (Barcelona), Spain
Manuscript received October 29, 1990
Accepted for publication September 19, 1991
ABSTRACT
When two different isogenic lines of a diploid species (or two different species) are crossed, the resulting F1 individuals should be heterozygous at all the loci fixed for different alleles in the two strains (in the limit, at all the loci of the genome). If one of these loci is then held heterozygous for several generations of repeated backcrossing to the same strain, the average length of intact chromosome segments (with reference to the original parental chromosome) on both sides of the selected locus, or, equivalently, the average length of segments surrounding that locus which are still heterozygous (with reference to the fully heterozygous F1 chromosome), may diminish, but cannot increase. Several authors have derived equations to predict this average. We show that the most widely used criterion, developed by R. A. Fisher, leads to serious overestimations of the true parametric values, when applied to early generation analyses, with the corresponding errors in the interpretation of experimental results. We then derive the exact equations both for the average and standard deviation of the lengths of intact chromosome segments surrounding a locus held heterozygous after any number of generations of backcrossing. Our results are in close agreement with those found by a former author, although involving a rather different approach.
¹ Present address: Departamento de Biología Fundamental, Unidad de Genética, Facultad de Biología, Universidad de Santiago de Compostela, Santiago de Compostela, Spain.
Genetics 130: 205-209 (January, 1992)

WHEN a recognizable dominant or semidominant gene is introduced into an established isogenic line by repeated backcrossing, other genes are introduced together with the desired gene. On average, 50% of those that are in other chromosomes are eliminated in each backcross generation by chromosome segregation. But those that are linked can be eliminated only by crossing over, and therefore a considerable portion of the chromosome on each side of the selected gene may be expected to remain intact (with reference to the original parental chromosome) during the early backcross generations.
The question of how much the length of intact chromosome segments is reduced, on average, after a series of backcrosses, or other selected breeding procedure, is important not only for the plant or animal breeder, interested in the development of stocks with desired genetic traits, but also for persons working in many other fields of research. For example, this issue is relevant to those trying to separate the selective effect of an allele from the effect of its genetic background (genetic hitchhiking, a problem particularly relevant for genetic perturbation experiments), those wanting to study the maintenance of coadapted gene complexes in spite of recombination with other gene complexes (recombinational load), or, finally, those trying to find out the number of different genes that determine a given character by chromosome substitution experiments (for example, the number of species differences that bring about hybrid sterility). In all these examples it is implicitly assumed that the parental chromosomes are heterozygous not only for the locus of interest, but also for at least several other loci with effects on the same character. Therefore, the question of how much the length of intact chromosome segments is reduced following selected breeding procedures is indeed equivalent to the question of how much the length of the segments which are still heterozygous is reduced (with reference to the original, wholly heterozygous, F1 chromosomes).
The answer to this question has been investigated by several authors (BARTLETT and HALDANE 1935; FISHER 1949; HANSON 1959a,b), but their conclusions, for one or another reason, show some problems for early generation analyses. In this paper we derive the exact equation both for the average and the standard deviation of lengths of intact chromosome segments surrounding a locus held heterozygous for any number of generations of backcrossing.
Earlier derivations: To understand our derivation below, it is helpful to first review the arguments leading both to BARTLETT and HALDANE'S and to FISHER'S classical formulas. BARTLETT and HALDANE (1935) showed that, assuming a uniform distribution of c (the recombination frequency) along the chromosome, the average intact length on each side of the selected locus after n generations would be given by:
$$L\ (\text{in map units} \div 100) = \int_0^{1/2} (1 - c)^{n-1}\,dc = \frac{1}{n}\left[1 - (1/2)^n\right] \approx \frac{1}{n} \qquad (1)$$
(if n large), or 100(2/n) map units on both sides of the selected locus, where map units correspond to centimorgans (cM), 1 cM being equal to a recombination frequency of 1%. This approximation has one fundamental problem. No matter how far apart two loci are on a chromosome, we never observe a c value greater than 0.5 (50 cM). Therefore, all large segments are assigned the same value in the integral, although some of them may be much longer than others. But as the number of backcross generations increases, the relative frequency of large intact chromosome segments decreases very quickly, and this argument loses importance. That is why BARTLETT and HALDANE gave only approximate results for large n in the various cases that they considered (L = 1/n, for n large).
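As an illustrative check, not part of the original argument, the closed form in Equation 1 can be compared with a direct numerical evaluation of the integral. The sketch below assumes that the integral runs over c from 0 to 1/2, which is what the closed form (1/n)[1 - (1/2)^n] implies; the function names are hypothetical.

```python
def bartlett_haldane_closed_form(n: int) -> float:
    """Closed form of Equation 1, (1/n) * [1 - (1/2)**n], in map units / 100."""
    return (1.0 / n) * (1.0 - 0.5 ** n)


def bartlett_haldane_numeric(n: int, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of (1 - c)**(n - 1) for c in [0, 1/2].

    The limits 0 and 1/2 are an assumption inferred from the closed form.
    """
    width = 0.5 / steps
    total = sum((1.0 - (i + 0.5) * width) ** (n - 1) for i in range(steps))
    return total * width


# Columns: n, closed form, numerical integral, large-n approximation 1/n
for n in (1, 2, 5, 10, 20):
    print(n,
          round(bartlett_haldane_closed_form(n), 6),
          round(bartlett_haldane_numeric(n), 6),
          round(1.0 / n, 6))
```

For small n the term (1/2)^n is not negligible, so the approximation L = 1/n is visibly larger than the closed form; the two converge quickly as n grows.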
One of the best measures we can have of true genetic distance is m, the actual average number of crossovers per meiosis in a chromosome region (excluding crossovers between sister chromatids, which are thought to be very rare anyway). The proportion of meioses with at least one crossover is one minus the fraction with zero crossovers, and, assuming no interference, it may be derived from a Poisson distribution of parameter m. Only one-half of the products of those meioses will be recombinants in the region of interest. Hence $c = (1 - e^{-m})/2$. It can be seen that as m gets larger, c approaches 0.5, but the function (usually denoted the mapping function; HALDANE 1919) is approximately linear, namely c = m/2, for a certain range corresponding to very small m values (genetic distances). Only in this range, which corresponds to m values less than 0.2 (c values less than 0.1), does the map unit defined as 1% recombinant frequency (1 map unit = 1 cM) have real meaning. Accordingly, genetic distances may be obtained either through the summation of small genetic intervals in which c has a linear relationship with map distance, or, alternatively, by transforming centimorgans into real map units by means of the mapping function. In this last case, the recombination fraction between two loci must be used first to obtain an estimate of the average number of crossovers per meiosis in that chromosome region (from $c = (1 - e^{-m})/2$, it follows that $m = -\ln(1 - 2c)$); then, using the linear relationship found for small genetic distances, c = m/2, real map units can be obtained (1 real map unit = 0.02 crossovers per meiosis per chromosome, that is, real map units = 50m; SUZUKI et al. 1986, p. 124). This relationship works both ways. For example, for a chromosome of 50 real map units (not centimorgans!), which could represent an average map length of a chromosome arm in Drosophila, m would be 1. If we consider crossovers per chromatid instead of per chromosome (two sister chromatids joined by a single centromere), then 1 real map unit = 0.01 crossovers per meiosis per chromatid, that is, real map units = 100m', where m' = m/2 (HALDANE 1919; FISHER 1948). This difference is quite simple to understand. If m is the average number of crossovers per chromosome (m = map units ÷ 50), m' = map units ÷ 100 will be the average per chromatid. That is, the distance between two loci in terms of number of crossovers per meiosis will be different on a "per chromosome" or a "per chromatid" basis (m and m/2, respectively), but it will be the same in terms of real map units (50m or 100m' in both cases). FISHER (1949) actually used a measure of genetic distance closely related to these to derive a second approximation to the problem of how large a linked region will remain after a certain number of generations of backcrossing.
Consider meiosis in an F1 between isogenic lines of a diploid species, or in an F1 interspecific hybrid, where each chromosome is present as two homologs, each one from a different isogenic line (or species), and each one consisting of two sister chromatids. Let the length of the chromosome be ∞. Let x be the average number of crossovers per chromatid (genetic distance) between any locus on the chromosome and the fixed dominant (selected locus). Then, the probability of no crossover on each chromatid in this segment for n generations of backcrossing would be $e^{-nx}$. The probability of a crossover in any one generation within an infinitely small chromatid interval dx adjacent to this segment should be precisely dx. And the probability of crossover in the interval dx sometime during n generations should be n dx. Therefore, the probability of having had a crossover in the interval dx but not in the adjacent segment x after n generations should be $e^{-nx} n\,dx$. Then, the mean value of the intact interval on one side of the selected locus, assuming crossing over has occurred sometime during n generations, would be given (after some operation and dropping terms with $e^{-\infty}$) by:
$$E[X] = \int_0^{\infty} x e^{-nx} n\,dx = 1/n \text{ crossovers per chromatid,} \qquad (2)$$
that is, 100(1/n) real map units, or 100(2/n) on both sides of the fixed dominant. This is exactly the same result reached by BARTLETT and HALDANE (1935) that we presented above (1). But in this case the derivation is apparently faultless, because the genetic map scale used by FISHER is linear. It does not have the limitations imposed by c on the derivation of BARTLETT and HALDANE. Therefore it should be possible in principle to apply this criterion to any number of backcross generations. But, as expected, when applied to early backcross generations, it leads to overtly incongruent conclusions. For example, after one generation of backcrossing the average length of heterozygous segments surrounding a locus on the X chromosome of Drosophila melanogaster would be 100 × 2 = 200 map units, although the X chromosome is only 70 map units long!
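The incongruence can be made explicit with a small tabulation. The sketch below simply evaluates FISHER'S limiting value, 100(2/n) map units on both sides, against the 70 map units quoted for the X chromosome; the function and constant names are hypothetical.

```python
def fisher_both_sides_map_units(n: int) -> float:
    """FISHER'S limiting value: 100 * (2 / n) real map units on both sides."""
    return 100.0 * 2.0 / n


X_CHROMOSOME_MAP_UNITS = 70.0  # length quoted in the text for D. melanogaster

for n in range(1, 9):
    predicted = fisher_both_sides_map_units(n)
    verdict = "exceeds chromosome" if predicted > X_CHROMOSOME_MAP_UNITS else "fits"
    print(f"n = {n}: {predicted:6.1f} map units ({verdict})")
```

Under these numbers the predicted length is longer than the whole chromosome for the first two backcross generations, so the limiting value cannot be a valid average there.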
Hanson's derivation: The unsuitability for early generations of the relationships developed by BARTLETT and HALDANE (1935) and, particularly, FISHER (1949) was first noted by HANSON (1959b). He showed that these relationships are effectively limiting functions, which are not very accurate in the early generations of backcrossing. His work apparently passed unnoticed by most scientists, though. FISHER'S criterion continues to be used at present in early generation analyses (ORR and COYNE 1989), and it is cited (as such or as the less perfect derivation of BARTLETT and HALDANE) in widely used textbooks without warning about its failure in early generations (CROW and KIMURA 1970; HEDRICK 1985; WRIGHT 1969). Maybe this fact is due to the particular rationale used by HANSON in his derivation. Instead of trying to find out how FISHER'S derivation could be improved, he preferred to introduce a radically new approach, considering crossover breakpoints as loci which define a set of chromatid segments, each identified by subsets with a specific probability of containing the selected locus and a specific frequency distribution of lengths. His derivation of the function for the cumulative distribution of lengths of heterozygous segments on one side of the selected locus (in the chromatids generated by a meiotic division) is essentially correct, but rather difficult to follow, mainly because of the confusing notation he applies. Besides, his final results, the formulas for the expected half length and variance of heterozygous chromosome segments (actually the object of his whole paper), are half-hidden as a footnote of his Table 1. All this may explain the low diffusion of this work among the scientific audience. Anyway, according to his results, the mean value of the intact interval associated on one side with a locus placed in the middle of a chromosome and held heterozygous in backcrossing is:
$$E[X] = 100(1/n)\left[1 - e^{-ns/2}\right] \text{ real map units,} \qquad (3)$$
or twice this amount if both sides of the selected locus are considered, where n is the number of backcross generations and s is the length of the heterozygous chromosome in n = 0 (the F1 generation), expressed as "the expected number of breaks per . . . chromosome(s) resulting from a meiotic division" (HANSON 1959b, p. 833). After eight to ten generations this quantity is indistinguishable from FISHER'S criterion, but in the earliest generations the differences may be very important (see Table 1 in HANSON'S paper).
The first problem, as we said, is that HANSON did not show how the derivation of FISHER could be improved. The second one is that his measure of chromosome length may be confusing when considering that a chromosome in meiosis consists of two sister chromatids joined by a single centromere. To show how these problems can be circumvented is precisely the object of the following paragraphs.
The exact derivation: Experiments which involve early generation tests need an exact criterion for the average length of heterozygous segments in the early generations of backcrossing. We are going to derive such a criterion following essentially FISHER'S rationale, with relatively small, but ultimately very important corrections in his original derivation, that allow the calculation of the average length of heterozygous segments about a locus held heterozygous for any number of generations of backcrossing.
FISHER integrated from 0 to ∞, but that is a limiting condition. The upper limit is actually fixed by the length of the heterozygous chromosome segment in the F1 (n = 0), which is far from being ∞. Let the length of the chromosome be m in the scale of crossovers per meiosis (50m map units in the real map scale), that is to say, an average of m crossovers take place in that chromosome per meiosis. Therefore m/2 (or m') would be the average number of crossovers in each sister chromatid, assuming equal probability of exchanges for both of them. If there is no interference, the probability of no crossover in each chromatid of this chromosome in one generation is given by the Poisson distribution, namely $e^{-m/2}$. Then the probability of no crossover in n generations of backcrossing would be $e^{-nm/2}$. If the selected locus is placed just in the middle of the chromosome (a simplifying assumption that we will drop afterward), the probability of no crossover on one of its sides in each chromatid will be $e^{-nm/4}$. Finally, following FISHER'S rationale, we had that the probability of having had a crossover in the interval dx but not in the adjacent segment x after n generations of repeated backcrossing was $e^{-nx} n\,dx$. Then, the mean value of the intact interval on one side of the selected locus, assuming crossing over has occurred sometime during n generations, would be given by:
$$E[X_c] = \int_0^{m/4} x e^{-nx} n\,dx \Big/ \left(1 - e^{-nm/4}\right)$$
or simply, by taking D = m/4,
$$E[X_c] = \int_0^{D} x e^{-nx} n\,dx \Big/ \left(1 - e^{-nD}\right)$$
where $X_c$ means segments resulting from crossovers on one side of the selected locus in the original chromatids of length m/2. In so far as the selected locus is assumed to be in the middle of the chromosome, the maximum length of the intact segment on each of its sides must be m/4. Anyway, this is the same formula obtained by FISHER (2), after making D = ∞. In D. melanogaster, for example, the X chromosome has a length of only 70 map units, which correspond to an m value of 1.4 crossovers per meiosis for the whole chromosome. The integral, then, should actually be from 0 to 0.35 in this case, not to ∞!
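A minimal numerical sketch of the effect of the upper limit, assuming a centrally placed locus as in the text: the map length is converted to D = m/4 and the truncated integral for the conditional mean is evaluated with a simple midpoint rule. The function names and the step count are hypothetical choices made for illustration.

```python
import math


def one_side_limit(map_units: float) -> float:
    """Upper integration limit D = m / 4 for a locus in the middle of the chromosome."""
    m = map_units / 50.0  # crossovers per meiosis for the whole chromosome
    return m / 4.0


def conditional_mean(n: int, D: float, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of x*n*exp(-n*x) over [0, D],
    divided by (1 - exp(-n*D)), in crossovers per chromatid."""
    width = D / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += x * n * math.exp(-n * x)
    return total * width / (1.0 - math.exp(-n * D))


D_x = one_side_limit(70.0)  # X chromosome length quoted in the text
print(f"D = {D_x:.2f} crossovers per chromatid")
print(f"n = 1, truncated integral: {conditional_mean(1, D_x):.4f}")
print("n = 1, limit integrated to infinity: 1.0000")
```

With the finite limit the mean for crossover chromatids necessarily stays below D, whereas integrating to infinity gives 1/n regardless of the chromosome length.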
By solving the integral by parts, we obtain:
$$E[X_c] = \left[(1/n) - e^{-nD}\left[D + (1/n)\right]\right] \Big/ \left(1 - e^{-nD}\right).$$
But we are interested not in this amount, which represents the average length of the intact heterozygous chromatid segments on one side of the selected locus produced by crossing over in the originally wholly heterozygous chromosome, but in the absolute average length, which includes the contribution both of crossover and non-crossover chromatids. That is,
$$E[X] = De^{-nD} + (1/n) - e^{-nD}\left[D + (1/n)\right]$$
where the first part of the sum is the contribution to the average of those chromatids that have had no crossovers for n generations. Operating,
$$E[X] = (1/n)\left(1 - e^{-nD}\right) \text{ crossovers per chromatid,} \qquad (4)$$
or $100(1/n)(1 - e^{-nD})$ map units. Again, if we are interested in the average length on both sides of the selected locus we must double that amount. By comparing this formula with HANSON'S
(3) it becomes clear that our D equals s/2 in HANSON'S notation. That is, his definition of chromosome length (s) is exactly what we are calling chromatid length (m/2, or m'). This difference results from HANSON always meaning "chromosomes resulting from meiosis," that is, the former chromatids, whereas for us, in strict adherence to commonly used terminology in meiosis, the term chromosome is reserved for each duplicated chromosome homolog, consisting of two daughter strands (sister chromatids) joined by a single centromere, in the stage of four chromatids when crossing over takes place. We believe that in this way serious misunderstandings are avoided. A full statement of HANSON'S definition for the length (s) of a chromosome region, in the line of FISHER (1948, 1949), would be rather difficult to understand: average number of crossover breaks per chromosome resulting from meiosis in the chromosome region considered. Our definition, in the line of SUZUKI et al. (1986), is simpler: the length (m) of a chromosome region is the average number of crossovers (or exchanges) per meiosis in the chromosome region considered. But, both for HANSON and for us, the length of the chromosome region would be the same when expressed in real map units: 100s, or 100m/2 (100m') in our notation. The problem with HANSON'S criterion, therefore, is not properly a mathematical one, but rather of a semantic kind.
FIGURE 1.-The expected combined length (E[X], in map units) of heterozygous chromosome segments on both sides (right and left) of a locus held heterozygous for several generations of backcrossing, when the locus in question is just in the middle of a chromosome segment whose initial length in heterozygosis, d0 (the length in the F1 generation, n = 0), is either 25, 50, 75, or 100 map units (after Equation 6, where R = L = d0/2). FISHER'S limiting function is also represented (d0 = ∞). Horizontal axis: n (generations); curves for d0 = 25, 50, 75, 100 and ∞.
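As a numerical illustration of Equation 4, the sketch below tabulates the exact combined length on both sides of a central locus for several initial lengths d0, next to FISHER'S limiting value 100(2/n), and checks that HANSON'S expression gives the same number when s = 2D. It assumes R = L = d0/2 expressed in map units ÷ 100; the function names and the chosen generations are hypothetical.

```python
import math


def exact_mean_one_side(n: int, D: float) -> float:
    """Equation 4: E[X] = (1/n) * (1 - exp(-n*D)), in crossovers per chromatid."""
    return (1.0 / n) * (1.0 - math.exp(-n * D))


def hanson_mean_one_side_map_units(n: int, s: float) -> float:
    """Equation 3 (HANSON): 100 * (1/n) * (1 - exp(-n*s/2)) real map units."""
    return 100.0 * (1.0 / n) * (1.0 - math.exp(-n * s / 2.0))


def exact_combined_map_units(n: int, d0: float) -> float:
    """Both sides of a central locus; d0 is the initial length in map units."""
    D = (d0 / 2.0) / 100.0  # one-side length in map units / 100
    return 2.0 * 100.0 * exact_mean_one_side(n, D)


def fisher_combined_map_units(n: int) -> float:
    """FISHER'S limiting function, 100 * (2 / n) map units on both sides."""
    return 100.0 * 2.0 / n


INITIAL_LENGTHS = (25.0, 50.0, 75.0, 100.0)
print("n  " + "  ".join(f"d0={d0:>5.0f}" for d0 in INITIAL_LENGTHS) + "    Fisher")
for n in (1, 2, 4, 8, 12, 20):
    exact = "  ".join(f"{exact_combined_map_units(n, d0):8.2f}" for d0 in INITIAL_LENGTHS)
    print(f"{n:<3d}{exact}  {fisher_combined_map_units(n):8.2f}")

# Same value from both notations when s = 2 * D
print(100.0 * exact_mean_one_side(3, 0.4), hanson_mean_one_side_map_units(3, 0.8))
```

The exact values never exceed d0, while the limiting function starts at 200 map units; the gap closes more slowly the smaller d0 is.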
The variance of the lengths of the heterozygous segments linked on one side to the selected locus can be calculated as
$$\mathrm{Var}[X] = E[X^2] - (E[X])^2,$$
which gives:
$$\mathrm{Var}[X] = (1/n^2)\left[1 - e^{-nD}\left(2nD + e^{-nD}\right)\right], \qquad (5)$$
or twice this amount for the lengths surrounding the selected locus. The limiting function (when m → ∞) would be $\mathrm{Var}[X] = 1/n^2$.
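Equations 4 and 5 can be checked by simulation. The sketch below assumes the model implied by the derivation, namely that the distance from the selected locus to the nearest crossover accumulated over n generations is exponentially distributed with rate n and that the intact length is this distance truncated at D. The function names, the seed and the number of trials are hypothetical choices.

```python
import math
import random


def exact_mean(n: int, D: float) -> float:
    """Equation 4: (1/n) * (1 - exp(-n*D))."""
    return (1.0 / n) * (1.0 - math.exp(-n * D))


def exact_variance(n: int, D: float) -> float:
    """Equation 5: (1/n**2) * [1 - exp(-n*D) * (2*n*D + exp(-n*D))]."""
    decay = math.exp(-n * D)
    return (1.0 / n ** 2) * (1.0 - decay * (2.0 * n * D + decay))


def simulate(n: int, D: float, trials: int = 200_000, seed: int = 1):
    """Sample intact lengths as min(T, D), with T exponential of rate n (assumed model)."""
    rng = random.Random(seed)
    lengths = [min(rng.expovariate(n), D) for _ in range(trials)]
    mean = sum(lengths) / trials
    variance = sum((x - mean) ** 2 for x in lengths) / (trials - 1)
    return mean, variance


sim_mean, sim_var = simulate(3, 0.5)
print("mean:     simulated", round(sim_mean, 4), "exact", round(exact_mean(3, 0.5), 4))
print("variance: simulated", round(sim_var, 4), "exact", round(exact_variance(3, 0.5), 4))
print("limit for very large D:", exact_variance(2, 50.0), "vs 1/n^2 =", 1.0 / 2 ** 2)
```

Agreement between the simulated and exact values supports the algebra only under the stated model; it says nothing about interference or other departures from it.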
In Figure 1 we compare the performance of our exact criterion for the expected combined length on both sides of the selected locus with FISHER'S limiting function, when different initial lengths of heterozygous segments surrounding the selected locus in the F1 generation (n = 0) are considered. As shown in the figure, FISHER'S criterion is adequate only after eight or more generations when the size of the heterozygous chromosome is 100 map units. If the size of the heterozygous segment that we consider in the F1 is smaller (75 to 50 map units), 10-12 generations may be necessary to get a good approximation to the true value. And when the size of the heterozygous segment is very small (20 map units, for example), up to 20 generations are necessary to get a really good agreement between the exact criterion and FISHER'S. Figure 2 shows the standard deviations for the lengths on both sides of the selected locus. Except for the earliest backcross generations, they are quite similar for all the different initial lengths of heterozygous chromosome segments that we have considered. Their values increase at first to reach a maximum that depends on the initial size of the heterozygous segment and then drop to the values of the limiting function as n gets larger.
FIGURE 2.-The standard deviation (in map units) of the combined length (X) of heterozygous chromosome segments on both sides (right and left) of a locus held heterozygous for several generations of backcrossing, in the same situations as in Figure 1 (after Equation 7, where R = L = d0/2). Horizontal axis: n (generations); curves for d0 = 25, 50, 75, 100 and ∞.
Equations 4 and 5 may be expressed in a more general form for cases where the selected locus is asymmetrically located on the chromosome. Let us assume that the selected locus is linked in the F1 to a heterozygous chromosome segment whose right bound (nearest to the centromere) lies at a distance R (in map units ÷ 100) from the locus of interest, and whose left bound (nearest to the telomere) lies at a distance L (also in map units ÷ 100) from the same locus. Then, the average length of the heterozygous segment on both sides of the locus after n generations of backcrossing would be given by
$$E[X] = (1/n)\left[\left(1 - e^{-nR}\right) + \left(1 - e^{-nL}\right)\right] \qquad (6)$$
and the variance,
$$\mathrm{Var}[X] = (1/n^2)\left[2 - e^{-nR}\left(2nR + e^{-nR}\right) - e^{-nL}\left(2nL + e^{-nL}\right)\right]. \qquad (7)$$
Equations 6 and 7 give rise to Equations 4 and 5, respectively, when R = L = D (that is, when the locus of interest is just in the middle of a chromosome which is wholly heterozygous in the F1), and the length on only one side of the locus is considered. These general equations can be used in any conceivable situation to provide estimates of the average length of heterozygous chromosome segments linked to a selected locus and its associated error, which will depend, of course, on the size of the sample, N (Var[E[X]] = (1/N)Var[X]).
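A short sketch of how the general equations might be used in practice. It evaluates Equations 6 and 7 for an asymmetrically placed locus and derives the standard error of the mean from Var[E[X]] = (1/N)Var[X]. The function names are hypothetical, and the values of R, L and N in the example are invented purely for illustration.

```python
import math


def mean_both_sides(n: int, R: float, L: float) -> float:
    """Equation 6, with R and L in map units / 100."""
    return (1.0 / n) * ((1.0 - math.exp(-n * R)) + (1.0 - math.exp(-n * L)))


def variance_both_sides(n: int, R: float, L: float) -> float:
    """Equation 7, with R and L in map units / 100."""
    right = math.exp(-n * R) * (2.0 * n * R + math.exp(-n * R))
    left = math.exp(-n * L) * (2.0 * n * L + math.exp(-n * L))
    return (1.0 / n ** 2) * (2.0 - right - left)


def standard_error_of_mean(n: int, R: float, L: float, sample_size: int) -> float:
    """Square root of Var[E[X]] = Var[X] / N."""
    return math.sqrt(variance_both_sides(n, R, L) / sample_size)


# Hypothetical example: bounds at 50 and 20 map units from the locus, N = 30
R, L, N = 0.50, 0.20, 30
for n in (1, 2, 5, 10):
    mean_mu = 100.0 * mean_both_sides(n, R, L)
    se_mu = 100.0 * standard_error_of_mean(n, R, L, N)
    print(f"n = {n:2d}: E[X] = {mean_mu:6.2f} map units, standard error = {se_mu:5.2f}")
```

Setting R = L = D in these functions returns twice the one-sided values of Equations 4 and 5, which is a convenient consistency check before applying them to an asymmetric case.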
We hope that with this paper all the interested scientific audience will finally be aware of the limitations of FISHER'S criterion. To understand this will help to realize many of the consequences of genetic recombination, which are sometimes more difficult to anticipate without a well developed theory.
LITERATURE CITED
BARTLETT, M. S., and J. B. S. HALDANE, 1935 The theory of inbreeding with forced heterozygosis. J. Genet. 31: 327-340.
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics, pp. 94-95. Harper & Row, New York.
FISHER, R. A., 1948 A quantitative theory of genetic recombination and chiasma formation. Biometrics 4: 1-13.
FISHER, R. A., 1949 The Theory of Inbreeding, pp. 49-50. Hafner, New York.
HALDANE, J. B. S., 1919 The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genet. 8: 299-309.
HANSON, W. D., 1959a The theoretical distribution of lengths of parental gene blocks in the gametes of an F1 individual. Genetics 44: 197-209.
HANSON, W. D., 1959b Early generation analysis of lengths of heterozygous chromosome segments around a locus held heterozygous with backcrossing or selfing. Genetics 44: 833-837.
HEDRICK, P. W., 1985 Genetics of Populations, pp. 372-373. Jones and Bartlett, Boston.
ORR, H. A., and J. A. COYNE, 1989 The genetics of postzygotic isolation in the Drosophila virilis group. Genetics 121: 527-537.
SUZUKI, D. T., A. J. F. GRIFFITHS, J. H. MILLER and R. C. LEWONTIN, 1986 An Introduction to Genetic Analysis, pp. 103-105. Freeman, New York.
WRIGHT, S., 1969 Evolution and the Genetics of Populations, Vol. II, pp. 264-265. The University of Chicago Press, Chicago.
Communicating editor: B. S. WEIR