Linkage and Crossing Over Between Genes

Linkage
The term linkage refers to the fact that certain genes tend to be inherited together,
because they are on the same chromosome. Thus parental combinations of characters are
found more frequently in offspring than non-parental. Linkage is measured by the percentage
recombination, unlinked genes showing 50% recombination.
Genetic linkage is a term which describes the tendency of certain loci or alleles to be
inherited together. Genetic loci on the same chromosome are physically close to one another
and tend to stay together during meiosis, and are thus genetically linked.
As a consequence, such genes fail to show independent segregation.
Bateson and Punnett (1906), working with sweet pea, reported that a test cross involving
flower colour and pollen shape gave a 7 : 1 : 1 : 7 ratio instead of the 1 : 1 : 1 : 1 ratio
expected under the Mendelian law of independent segregation. Morgan, working on Drosophila,
proposed that the tendency of two genes to stay together during inheritance be known as
linkage, and that coupling and repulsion phases are two aspects of linkage.
Coupling Phase : The condition of having the dominant
alleles for both genes on the same parental chromosome,
with both recessives on the other parental chromosome, is
called “coupling”: the C and S genes are “in coupling
phase”.
Repulsion Phase : The opposite condition, having one
dominant and one recessive on each parental chromosome,
is called “repulsion”. Thus, if the original parents were Cs
x cS, their offspring would have the genes in repulsion
phase: Cs / cS.
The figure below depicts the gamete composition for linked genes from coupling and
repulsion crosses.
Types of linkage
Linkage is classified as complete or incomplete depending upon the absence (complete linkage)
or presence (incomplete linkage) of recombinant phenotypes in test cross progeny.
Complete linkage
When only combinations of parental characteristics are recovered in test cross
progeny, it is called complete linkage.
Incomplete linkage
When recombinant types are also recovered in addition to the two parental types in the
test cross progeny, it is called incomplete linkage.
Linkage is also classified as (a) coupling and (b) repulsion phase linkage depending
upon whether all dominant or some dominant and some recessive genes are linked together.
Detection of linkage
I. Inspection method for linkage detection
A and B each segregate 3 : 1.

Recombination value             Phenotypes in F2
                                AB (a)    Ab (b)    aB (c)    ab (d)
0%, Repulsion (R)                 2         1         1         0
0%, Coupling (C)                  3         0         0         1
50%, C or R (independence)        9         3         3         1

Quick test: the ratio ad/bc
  = 1    independence (no linkage)
  < 1    repulsion linkage
  > 1    coupling linkage
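The quick test can be sketched in a few lines of Python. This is only an illustration: the function name `quick_test` is not from the source, and exact equality to 1 will rarely occur with real counts, so the ratio is a rough screen rather than a significance test. The second call uses the sweet pea F2 counts given in the practical exercises of these notes.

```python
def quick_test(a, b, c, d):
    """Return (ratio, verdict) for F2 counts of the AB, Ab, aB and ab classes."""
    if b * c == 0:
        # ad/bc is undefined when either single-dominant class is empty
        raise ValueError("ratio undefined when the Ab or aB class is empty")
    ratio = (a * d) / (b * c)
    if ratio > 1:
        verdict = "coupling linkage"
    elif ratio < 1:
        verdict = "repulsion linkage"
    else:
        verdict = "independence"
    return ratio, verdict


print(quick_test(9, 3, 3, 1))        # ideal 9:3:3:1 counts -> (1.0, 'independence')
print(quick_test(284, 21, 21, 55))   # sweet pea counts from the exercises
```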
II. Precise Method
The chi-square test of goodness of fit is carried out to test whether the observed
frequencies are in accordance with the expected frequencies. This is a precise method of
detection of linkage.
The general form of the chi-square is given as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O and E are the observed and expected frequencies of each class.
If the chi-square is significant, we say that segregation is not independent and that there is
linkage between the two genes.
A) Detection by test cross generation
1. A chi-square test is applied to verify whether the frequencies of the four phenotypic
classes are in the ratio 1:1:1:1, as expected on the basis of the law of independent
segregation of two genes. (Test of deviation from 1:1:1:1)

Phenotypes  :   AaBb    Aabb    aaBb    aabb
Obs. Freq.  :   O1      O2      O3      O4
Exp. Freq.  :   E1      E2      E3      E4     (in the ratio 1:1:1:1)

Calculate the χ²-value and compare it with the table value (at 3 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the segregation
deviates from the 1:1:1:1 ratio; otherwise the data agree with independent segregation.
2. If there is a significant deviation from the ratio 1:1:1:1, the frequencies of the
phenotypic classes Aa (i.e. AaBb+Aabb) and aa (i.e. aaBb+aabb) are determined. A
chi-square test is carried out to see if these frequencies (Aa and aa) are in the ratio 1:1.
(Test of deviation of Aa and aa from 1:1)

Phenotypes  :   Aa                  aa
                (= AaBb+Aabb)       (= aaBb+aabb)
Obs. Freq.  :   O1                  O2
Exp. Freq.  :   E1                  E2     (in the ratio 1:1)

Calculate the χ²-value and compare it with the table value (at 1 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the segregation of
A/a deviates from the 1:1 ratio; otherwise it is normal.
3. Similarly, the frequencies of the phenotypic classes Bb (i.e. AaBb+aaBb) and bb (i.e.
Aabb+aabb) are determined. A chi-square test is carried out to see if these frequencies
(Bb and bb) are in the ratio 1:1. (Test of deviation of Bb and bb from 1:1)

Phenotypes  :   Bb                  bb
                (= AaBb+aaBb)       (= Aabb+aabb)
Obs. Freq.  :   O1                  O2
Exp. Freq.  :   E1                  E2     (in the ratio 1:1)

Calculate the χ²-value and compare it with the table value (at 1 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the segregation of
B/b deviates from the 1:1 ratio; otherwise it is normal.
4. The independence of segregation of genes A/a and B/b is tested. To do so, the
frequencies of parental and recombinant types are determined. In coupling phase linkage, the
parental types will be AaBb and aabb, and the recombinant types will be Aabb and aaBb. In
repulsion phase linkage, the reverse will be the case. If the two genes are segregating
independently, and if the classes Aa and aa as well as Bb and bb are present in 1:1 ratio
(items 2 and 3), the parental and recombinant types will be in 1:1 ratio. A χ²-test is now
applied to determine if their observed frequencies agree with the 1 : 1 ratio. (Test for
independence of segregation of A/a and B/b, coupling phase)

Phenotypes  :   Parental types      Recombinants
                (= AaBb+aabb)       (= Aabb+aaBb)
Obs. Freq.  :   O1                  O2
Exp. Freq.  :   E1                  E2     (in the ratio 1:1)

Calculate the χ²-value and compare it with the table value (at 1 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the parental and
recombinant types deviate from the 1:1 ratio; otherwise segregation is independent.
If the phenotypic classes Aa and aa as well as Bb and bb are in 1 : 1 ratio, and the
frequencies of parental and recombinant types deviate significantly from 1 : 1 ratio, genes A/a
and B/b are not segregating independently, i.e., they are linked.
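The four test-cross tests above can be sketched in Python as follows. The counts are hypothetical and chosen only for illustration, the function names are not from the source, and coupling phase is assumed when forming the parental and recombinant classes. The 1% table values used (6.635 for 1 d.f., 11.345 for 3 d.f.) are the standard chi-square critical values.

```python
def chi_square(observed, ratio):
    """Goodness-of-fit chi-square of observed counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def analyse_test_cross(n_AaBb, n_Aabb, n_aaBb, n_aabb):
    """Chi-square values for steps 1 to 4 (coupling phase assumed)."""
    return {
        "joint": chi_square([n_AaBb, n_Aabb, n_aaBb, n_aabb], [1, 1, 1, 1]),
        "A": chi_square([n_AaBb + n_Aabb, n_aaBb + n_aabb], [1, 1]),
        "B": chi_square([n_AaBb + n_aaBb, n_Aabb + n_aabb], [1, 1]),
        "linkage": chi_square([n_AaBb + n_aabb, n_Aabb + n_aaBb], [1, 1]),
    }


# 1% critical values: 3 d.f. for the joint test, 1 d.f. for the others
CRITICAL_1PCT = {"joint": 11.345, "A": 6.635, "B": 6.635, "linkage": 6.635}

results = analyse_test_cross(430, 70, 80, 420)  # hypothetical counts
for name, value in results.items():
    verdict = "significant" if value > CRITICAL_1PCT[name] else "not significant"
    print(f"{name:8s} chi-square = {value:7.2f}  ({verdict} at 1%)")
```

With these made-up counts the A/a and B/b segregations fit 1:1 while the joint and parental-versus-recombinant tests are highly significant, which is the pattern the procedure interprets as linkage.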
B) Detection in F2 Generation
The procedure for the detection of linkage in an F2 generation is similar to that in a
test-cross generation.
1. A chi-square test is applied to see if the four phenotypic classes, viz., A-B-, A-bb, aaB-
and aabb, are in 9 : 3 : 3 : 1 ratio, the ratio expected in F2 in the case of independent
assortment. (Test of deviation from 9:3:3:1 ratio)

Phenotypic class :   A-B-    A-bb    aaB-    aabb
Obs. Freq.       :   O1      O2      O3      O4
Exp. Freq.       :   E1      E2      E3      E4     (in the ratio 9:3:3:1)

Calculate the χ²-value and compare it with the table value (at 3 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the segregation
deviates from the 9:3:3:1 ratio; otherwise the data agree with independent assortment.
2. If there is a significant deviation from the 9:3:3:1 ratio, the frequencies of classes A-
(A-B- + A-bb) and aa (aaB- + aabb) are determined. A χ²-test is now done to assess if these
classes are in 3 : 1 ratio. (Test of deviation of the phenotypic classes A- and aa from 3 : 1
ratio)

Phenotypic class :   A-                   aa
                     (= A-B- + A-bb)      (= aaB- + aabb)
Obs. Freq.       :   O1                   O2
Exp. Freq.       :   E1                   E2     (in the ratio 3:1)

Calculate the χ²-value and compare it with the table value (at 1 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the segregation of
A/a deviates from the 3:1 ratio; otherwise it is normal.
3. Similarly, the frequencies of phenotypic classes B- (A-B- + aaB-) and bb (A-bb + aabb)
are computed, and a χ²-test is applied to determine if they are in 3 : 1 ratio. (Test of
deviation of the phenotypic classes B- and bb from 3 : 1 ratio)

Phenotypic class :   B-                   bb
                     (= A-B- + aaB-)      (= A-bb + aabb)
Obs. Freq.       :   O1                   O2
Exp. Freq.       :   E1                   E2     (in the ratio 3:1)

Calculate the χ²-value and compare it with the table value (at 1 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the segregation of
B/b deviates from the 3:1 ratio; otherwise it is normal.
These two tests are done to see if the segregations for A/a and for B/b are normal,
yielding the expected 3 : 1 ratio, and to check that the significant deviation from the
9:3:3:1 ratio is not due to a departure of one or both of these from the 3 : 1 ratio.
4. Finally, the independence of segregation for genes A/a and B/b is tested by computing the
χ² for independence. This is done by rearranging the frequencies of the four phenotypic
classes in a 2 x 2 table and estimating the χ² value. (Test for the independence of
segregation of genes A/a and B/b)

Phenotypic class     B-               bb               Total
A-                   (A-B-) = a       (A-bb) = b       a+b
aa                   (aaB-) = c       (aabb) = d       c+d
Total                a+c              b+d              GT

$$\chi^2 = \frac{\left(|ad - bc| - \tfrac{1}{2}GT\right)^2 \times GT}{(a+b)(c+d)(a+c)(b+d)}$$

Calculate the χ²-value and compare it with the table value (at 1 d.f. and 1% level of
significance). If the calculated value is greater than the table value, the two genes are
not segregating independently; otherwise segregation is independent.
If the segregations for genes A/a and B/b separately yield the ratio 3 : 1 and the χ² for
independence is significant, the genes A/a and B/b are not segregating independently, i.e.,
they are linked.
PRACTICAL EXERCISES
TEST CROSS
Morgan performed a testcross by crossing prpr vgvg flies to the F1. The tester parent
contributes only pr vg gametes, so the testcross progeny represent the distribution of the
gametes produced by the F1. Remember that a testcross to an F1 derived from a dihybrid
cross is expected to give a 1:1:1:1 ratio. But this is not what Morgan observed. The
following table shows the result of this test cross.

F1 Gamete     Testcross Distribution     Gamete Type
pr+ vg+              1339                Parental
pr+ vg                151                Recombinant
pr  vg+               154                Recombinant
pr  vg               1195                Parental
F2 Generation
Bateson and Punnett worked with sweet peas. They performed a typical dihybrid cross
between one pure line with purple flowers and long pollen grains and a second pure
line with red flowers and round pollen grains. Because they knew that purple flowers
and long pollen grains were both dominant, they expected a typical 9:3:3:1 ratio when
the F1 plants were crossed. The table below shows the numbers that they observed.
Specifically, the two parental classes, purple, long and red, round, were
over-represented in the progeny.

Phenotype                  Observed     Expected
Purple, long (P_L_)          284          215
Purple, round (P_ll)          21           71
Red, long (ppL_)              21           71
Red, round (ppll)             55           24
Total                        381          381
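The F2 detection procedure described earlier can be applied to these sweet pea counts. The following is a minimal Python sketch, not a definitive implementation: the function names are illustrative, and the 1% table values quoted in the comments (11.345 for 3 d.f., 6.635 for 1 d.f.) are the standard chi-square critical values.

```python
def chi_square(observed, ratio):
    """Goodness-of-fit chi-square of observed counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def independence_chi_square(a, b, c, d):
    """2 x 2 chi-square for independence with the continuity correction GT/2."""
    gt = a + b + c + d
    numerator = (abs(a * d - b * c) - gt / 2) ** 2 * gt
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Observed F2 counts: purple long, purple round, red long, red round
a, b, c, d = 284, 21, 21, 55

joint = chi_square([a, b, c, d], [9, 3, 3, 1])     # step 1, 3 d.f. (table value 11.345)
colour = chi_square([a + b, c + d], [3, 1])        # step 2, 1 d.f. (table value 6.635)
pollen = chi_square([a + c, b + d], [3, 1])        # step 3, 1 d.f. (table value 6.635)
indep = independence_chi_square(a, b, c, d)        # step 4, 1 d.f. (table value 6.635)

for label, value in [("9:3:3:1", joint), ("colour 3:1", colour),
                     ("pollen 3:1", pollen), ("independence", indep)]:
    print(f"{label:13s} chi-square = {value:.2f}")
```

With these counts the joint and independence values (about 134.7 and 159.3) far exceed the table values, while each single-gene value (about 5.19) stays below 6.635 at the 1% level, which is the pattern the procedure reads as linkage.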
Crossing over
Crossing over is the exchange of corresponding segments between homologous chromosomes, by
which the alleles they carry change places. It is responsible for recombination between two
genes.
$$\text{Freq. of crossing over (\%)} = \frac{\text{Number of recombinant progeny from a test cross}}{\text{Total number of progeny}} \times 100$$
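As a worked illustration, the formula can be applied to the counts in Morgan's testcross table from the practical exercises above. The function name below is an assumption made for this sketch.

```python
def recombination_frequency(parental, recombinant):
    """Percentage of recombinant progeny among all test cross progeny."""
    total = sum(parental) + sum(recombinant)
    return 100.0 * sum(recombinant) / total


# pr+ vg+ and pr vg are parental; pr+ vg and pr vg+ are recombinant
rf = recombination_frequency(parental=[1339, 1195], recombinant=[151, 154])
print(f"{rf:.2f}%")  # 10.74%
```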
Estimation of recombination frequency
Basically, the estimation of recombination frequency from observed genotypes is to
assess the frequencies of recombinant gametes and non-recombinant gametes that have
resulted in the observed genotypes (or phenotypes). There is a universally applicable method to
obtain an estimate of recombination frequency from observed phenotype frequencies among
the offspring. This method is known as the maximum likelihood method. (In fact, counting
recombinant and non-recombinant gametes is a special case of maximum likelihood
estimation.)
Let a sample of n gametes contain k recombinant and n-k non-recombinant gametes.
The recombination fraction p is given as
$$p = \frac{k}{n}$$
If p = 1/2  : independent assortment (unlinked loci)
If p < 1/2  : linked loci
If p ≈ 0    : tight linkage
LOD score (logarithm of odds score)
The larger the sample size (= population size) the more reliable the estimate will be.
The precision of a recombination estimate is reflected by its variance or, alternatively, its
standard error. As an example consider the backcross design with two classes of genotypes,
i.e. recombinants and non-recombinants. Suppose we observe k recombinants and n-k
non-recombinants in a sample of size n. The estimate of the recombination fraction p then
equals
$$\hat{p} = \frac{k}{n}$$
and its variance equals
$$\mathrm{Var}(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{n}$$
Thus the sampling variance of the estimate is inversely proportional to the sample size n. In
other words, the larger n is, the more confidence we have in the estimate.
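The effect of sample size on precision can be seen in a short Python sketch. The counts are illustrative (20% recombinants at two sample sizes) and the function name is an assumption, not part of the source.

```python
import math


def recombination_estimate(k, n):
    """Return the estimate k/n, its variance and its standard error."""
    p_hat = k / n
    variance = p_hat * (1 - p_hat) / n
    return p_hat, variance, math.sqrt(variance)


for n in (100, 400):
    p_hat, variance, se = recombination_estimate(k=n // 5, n=n)
    print(f"n = {n}: p = {p_hat:.2f}, variance = {variance:.4f}, SE = {se:.3f}")
```

Quadrupling the sample size divides the variance by four and halves the standard error.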

In linkage analysis an alternative 'measure of confidence' is used. This measure, the
LOD score, indicates how much confidence we can have in supposed linkage of two loci,
based on the observed frequencies of genotypes or phenotypes. The acronym LOD stands for
'logarithm of the odds'. Symbolically the definition of LOD reads
$$\mathrm{LOD} = \log_{10}\left[\frac{\hat{p}^{\,k}\,(1-\hat{p})^{\,n-k}}{0.5^{\,n}}\right] = k \log \hat{p} + (n-k)\log(1-\hat{p}) - n \log(0.5)$$
As k = n p̂ and n - k = n(1 - p̂), this is rewritten as
$$\mathrm{LOD} = n\hat{p}\log(\hat{p}) + n(1-\hat{p})\log(1-\hat{p}) + n\log(2) = n\left[\hat{p}\log(\hat{p}) + (1-\hat{p})\log(1-\hat{p}) + \log(2)\right]$$
where all logarithms are to base 10.
In linkage analysis a LOD score of 3 or larger is generally taken as evidence of
linkage, whereas a LOD score smaller than 3 is not considered proof of linkage. Lower
scores indicate a lower likelihood of genetic linkage, even though they can still be useful
in a process of elimination. A LOD score of +3 corresponds to odds of 1000 to 1 that the
linkage being observed did not occur by chance.
Example. Let k = 20, n = 100 (20 recombinants in a population of size 100).
Then
p̂ = 20/100 = 0.2,
LOD = 100[0.2 log(0.2) + 0.8 log(0.8) + log(2)] = 8.37
Now 10^8.37 is a very large number, and so we are very confident that the loci are linked.
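The example can be checked with a few lines of Python. This is a sketch of the LOD formula as written above, evaluated at the estimate k/n; the function name is an assumption, and the terms for k = 0 or k = n are skipped because x log x tends to 0 there.

```python
import math


def lod_score(k, n):
    """LOD = log10[ p^k (1-p)^(n-k) / 0.5^n ] evaluated at p = k/n."""
    p_hat = k / n
    lod = n * math.log10(2)          # the -n*log10(0.5) term
    if k > 0:
        lod += k * math.log10(p_hat)
    if k < n:
        lod += (n - k) * math.log10(1 - p_hat)
    return lod


print(round(lod_score(20, 100), 2))  # 8.37
```

With k = 50 and n = 100 the same function gives a LOD of 0, as expected when the estimate equals 0.5.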
The LOD score (logarithm (base 10) of odds), is a statistical test often used for
linkage analysis in human, animal, and plant populations. The LOD score compares the
probability / likelihood of obtaining the test data if the two loci (specific location of a gene in
a chromosome) are indeed linked, to the likelihood of observing the same data purely by
chance. If the LOD score is high (Positive LOD), it means that the traits are closely linked,
and therefore usually inherited together. Low scores (Negative LOD), on the other hand,
indicate a low linkage. Having a good knowledge of LOD scores is essential to geneticists for
a number of reasons, ranging from understanding particular genetic conditions to a desire to
figure out where a gene is located, and using information about known genes. Computerized
LOD score analysis is a simple way to analyze complex family pedigrees in order to
determine the linkage between traits.
ESTIMATION OF LINKAGE
The phenomenon of linkage can only be observed in families segregating for each of
the two gene pairs corresponding to the two observed characters. This means that at least one
of the two parents involved in raising the family should be doubly heterozygous. Thus the
following types of crosses can provide information about linkage:
Type of cross              Cross              Expected ratio with no linkage
                                              (classes in the order AB : Ab : aB : ab)
(i) Double backcross       AaBb x aabb        1:1:1:1
(ii) Single backcross      AaBb x aaBb        3:1:3:1
                        or AaBb x Aabb        3:3:1:1
(iii) F2 family            AaBb x AaBb        9:3:3:1
The relative proportions of recombinant and parental (non-recombinant) types are
p and (1 - p) respectively. With no linkage, p = 0.5 and non-recombinants are as
likely to occur as recombinants, as in the case of independent assortment. The gametic output
of the double heterozygote is then in the expected ratio of 1:1:1:1. But when the linkage is
complete, there is no crossing over, so p = 0 and we get only non-recombinant types.
The gametic output of the double heterozygote depends on whether both the dominant genes
are located on the same chromosome (AB/ab), that is, in the coupling phase, or one dominant
is on the first member and the other dominant on the second member (Ab/aB), that is, in the
repulsion phase. In the two cases it would be as given below:

Gamete                     AB           Ab           aB           ab
Observed frequencies       a1           a2           a3           a4
Expected frequencies       m1           m2           m3           m4
Coupling phase             (1-p)/2      p/2          p/2          (1-p)/2
Repulsion phase            p/2          (1-p)/2      (1-p)/2      p/2
Various methods for estimation of linkage are :
1. Method of Moments
The simplest method of estimation is to develop equations by equating the observed
sample moments to their expected values, which depend on the parameters. The number
of these equations has to be as many as the number of parameters. Here there is
only one parameter, p or θ {where θ = (1-p)² in coupling phase and θ = p² in repulsion
phase}, while the observed sample consists of several distinguishable classes, so a single
equation in a suitably chosen function of the class frequencies suffices. Solving this
equation gives the required estimate of the specified parameter. There are, however, several
ways of doing it. We consider two of them:
(A) Emerson's Method
In the case of an F2 family, the expected proportions in the four classes are (2+θ)/4,
(1-θ)/4, (1-θ)/4 and θ/4, so that as θ is increased, those in the first and fourth classes
increase while those in the second and third (which are equal) decrease. We may, therefore,
take the linear function (e - m), where e = a1 + a4 and m = a2 + a3. Its expectation is then:
$$E(e - m) = E(a_1 - a_2 - a_3 + a_4) = n\theta$$
The estimating equation gives a consistent estimator
$$\hat{\theta} = (a_1 - a_2 - a_3 + a_4)/n$$
The sampling variance is easily found from the formula:
$$\mathrm{Var}(\hat{\theta}) = (1 - \theta^2)/n$$
Variance in terms of the common parameter p is given as
$$\mathrm{Var}(\hat{p}) = (1 - \theta^2)/(4n\theta)$$
But when one parent shows coupling phase and the other shows repulsion, so that θ = p(1-p),
we get:
$$\mathrm{Var}(\hat{p}) = \mathrm{Var}(\hat{\theta})/(1 - 4\theta)$$
(B) Linear function used in detecting linkage
The linear function of frequencies used for detecting linkage in F2 data can also be used
for estimation. We use the function
$$x = a_1 - 3a_2 - 3a_3 + 9a_4$$
whose expectation is n(4θ - 1), so the consistent estimator of θ is given as
$$\hat{\theta} = (a_1 - a_2 - a_3 + 5a_4)/2n$$
The sampling variance is given as
$$\mathrm{Var}(\hat{\theta}) = (1 + 6\theta - 4\theta^2)/4n$$
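The two moment estimators for an F2 family can be sketched in Python as below. The function names are illustrative, coupling phase in both parents is assumed (so p = 1 - √θ), the variances are evaluated at the estimate, and the counts are the sweet pea F2 data from the practical exercises.

```python
import math


def emerson_theta(a1, a2, a3, a4):
    """Estimate from (a1 - a2 - a3 + a4)/n with variance (1 - theta^2)/n."""
    n = a1 + a2 + a3 + a4
    theta = (a1 - a2 - a3 + a4) / n
    return theta, (1 - theta ** 2) / n


def linear_theta(a1, a2, a3, a4):
    """Estimate from x = a1 - 3a2 - 3a3 + 9a4, i.e. (a1 - a2 - a3 + 5a4)/2n."""
    n = a1 + a2 + a3 + a4
    theta = (a1 - a2 - a3 + 5 * a4) / (2 * n)
    return theta, (1 + 6 * theta - 4 * theta ** 2) / (4 * n)


def p_from_theta(theta, phase="coupling"):
    """Convert theta to p: theta = (1-p)^2 in coupling, p^2 in repulsion."""
    if theta < 0:
        raise ValueError("negative theta estimate has no real square root")
    root = math.sqrt(theta)
    return 1 - root if phase == "coupling" else root


counts = (284, 21, 21, 55)
theta_e, var_e = emerson_theta(*counts)
theta_l, var_l = linear_theta(*counts)
print(f"Emerson: theta = {theta_e:.4f}, p = {p_from_theta(theta_e):.4f}")
print(f"Linear : theta = {theta_l:.4f}, p = {p_from_theta(theta_l):.4f}")
```

The two estimators weight the classes differently, so they need not agree closely on real data.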
2. Maximum Likelihood Method
The method of maximum likelihood provides estimates which satisfy all the
three criteria of consistency, efficiency and sufficiency. In this method, the likelihood of
obtaining a family of size n distributed according to the multinomial law in the four
phenotypic classes expected with two segregating loci, each with two alleles and with
dominance at each locus, is first worked out and then maximised with respect to the
parameter. Thus, expressing the likelihood as L, with m_i the expected proportion in class i,
we get:
$$L = \frac{n!}{\prod_{i=1}^{4} a_i!} \prod_{i=1}^{4} m_i^{a_i}$$
After maximizing log L, we get the estimating equation as
$$\sum_{i=1}^{4} a_i \frac{d \log m_i}{d\theta} = 0$$
where one of the admissible solutions is taken as the required estimate.
The variance of the estimate is then obtained by the formula:
$$\mathrm{Var}(\hat{\theta}) = -1 \Big/ \sum_{i=1}^{4} n\, m_i \frac{d^2 \log m_i}{d\theta^2}$$
(A) Double backcross
With coupling phase, the estimate of p is given as
$$\hat{p} = (a_2 + a_3)/n$$
With repulsion phase, the estimate of p is given as
$$\hat{p} = (a_1 + a_4)/n$$
The sampling variance for both phases is given as Var(p) = p(1-p)/n
(B) Single backcross
With coupling phase (for the cross AaBb x aaBb, with the classes in the order AB, Ab, aB, ab),
the estimate of p can be obtained by solving the cubic equation
$$(a_1 + a_2 + a_3 + a_4)p^3 - (2a_2 + 3a_3 + a_4)p^2 - (a_1 + a_2 - 2a_3 + 2a_4)p + 2a_2 = 0$$
and the positive root lying between 0 and 1 is taken as the estimate. The sampling variance
is given by
$$V(\hat{p}) = \frac{2p(1-p)(1+p)(2-p)}{n(1 + 2p - 2p^2)}$$
(C) F2 family
With the coupling phase in both males and females, and θ = (1-p)², the likelihood quadratic
equation is
$$n\theta^2 - (a_1 - 2a_2 - 2a_3 - a_4)\theta - 2a_4 = 0$$
The positive root of this equation is taken as the required estimate.
The sampling variance of the parameter θ is given as
$$\mathrm{Var}(\hat{\theta}) = \frac{2\theta(1-\theta)(2+\theta)}{n(1+2\theta)}$$
In case of repulsion phase in both the sexes, or coupling in one sex and repulsion in the
other sex, the results are similar, provided that the proportion of male recombinants (pm)
and the proportion of female recombinants (pf) are equal, i.e. pm = pf = p.
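For the F2 family the maximum likelihood estimate reduces to solving a quadratic, which can be sketched in Python as follows. The function name is an assumption, coupling phase in both parents is assumed (p = 1 - √θ), the variance is evaluated at the estimate, and the counts are the sweet pea F2 data from the practical exercises.

```python
import math


def ml_theta_f2(a1, a2, a3, a4):
    """Positive root of n*theta^2 - (a1 - 2a2 - 2a3 - a4)*theta - 2*a4 = 0."""
    n = a1 + a2 + a3 + a4
    b = a1 - 2 * a2 - 2 * a3 - a4
    # Quadratic formula; the discriminant is b^2 + 4 * n * (2 * a4)
    theta = (b + math.sqrt(b * b + 8 * n * a4)) / (2 * n)
    variance = 2 * theta * (1 - theta) * (2 + theta) / (n * (1 + 2 * theta))
    return theta, variance


theta_ml, var_ml = ml_theta_f2(284, 21, 21, 55)
p_ml = 1 - math.sqrt(theta_ml)   # coupling phase: theta = (1 - p)^2
print(f"theta = {theta_ml:.4f}, SE(theta) = {math.sqrt(var_ml):.4f}, p = {p_ml:.4f}")
```

For these counts the estimate of p comes out at roughly 0.13, i.e. about 13% recombination.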
3. Minimum Chi-square Method
The chi-square statistic has been used for judging the agreement of observed with expected
frequencies in the four phenotypic classes on the assumption of the absence of linkage. If we
take, instead, the expected frequencies based on the presence of linkage, the chi-square
statistic becomes a function of the linkage parameter. We can minimize this function with
respect to the linkage parameter and obtain the required equation for estimation.
For the double backcross, p is estimated from the quadratic equation
$$(1 - \lambda)p^2 - 2p + 1 = 0$$
where λ = (a1² + a4²)/(a2² + a3²) in coupling phase and λ = (a2² + a3²)/(a1² + a4²) in
repulsion phase; the equation follows from setting the derivative of χ² with respect to p
equal to zero. The positive root lying between 0 and 1 gives the required solution.
For the F2 family, θ is estimated as the admissible solution of the following equation, which
is of the fourth degree in θ once the denominators are cleared:
$$\frac{4}{n}\left[\frac{a_1^2}{(2+\theta)^2} + \frac{a_4^2}{\theta^2} - \frac{a_2^2 + a_3^2}{(1-\theta)^2}\right] = 0$$
where p = 1 - √θ for coupling phase and p = √θ for repulsion phase.
The sampling variance is given as
$$\mathrm{Var}(\hat{\theta}) = \frac{2\theta(1-\theta)(2+\theta)}{n(1+2\theta)}$$
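Instead of solving the fourth-degree equation algebraically, the chi-square can be minimised numerically. The Python sketch below assumes the usual F2 expected proportions (2+θ)/4, (1-θ)/4, (1-θ)/4 and θ/4, and uses a ternary search, which is a choice made for this illustration rather than the method of the source. It relies on the chi-square being a convex function of θ on (0, 1). The counts are the sweet pea F2 data from the practical exercises.

```python
def chi_square_f2(theta, counts):
    """Chi-square of F2 counts against expectations that depend on theta."""
    n = sum(counts)
    proportions = [(2 + theta) / 4, (1 - theta) / 4, (1 - theta) / 4, theta / 4]
    return sum((a - n * m) ** 2 / (n * m) for a, m in zip(counts, proportions))


def min_chi_square_theta(counts, lo=1e-6, hi=1 - 1e-6, iterations=200):
    """Ternary search for the theta in (0, 1) that minimises the chi-square."""
    for _ in range(iterations):
        left = lo + (hi - lo) / 3
        right = hi - (hi - lo) / 3
        if chi_square_f2(left, counts) < chi_square_f2(right, counts):
            hi = right   # the minimum lies to the left of `right`
        else:
            lo = left    # the minimum lies to the right of `left`
    return (lo + hi) / 2


counts = (284, 21, 21, 55)
theta_mcs = min_chi_square_theta(counts)
print(f"theta = {theta_mcs:.4f}, chi-square = {chi_square_f2(theta_mcs, counts):.3f}")
```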
4. Product Ratio Method
In this method, instead of considering the sum of the extreme frequencies a1 and a4,
we consider their product. The ratio of the product a1a4 to the product a2a3 clearly
increases with increase in θ. The ratio, denoted here by ρ, is given as:
$$\rho = \frac{a_1 a_4}{a_2 a_3}$$
Equating ρ to its expected value θ(2+θ)/(1-θ)², the quadratic equation for estimating θ is
given by
$$(1 - \rho)\theta^2 + 2(1 + \rho)\theta - \rho = 0$$
The root of this quadratic equation lying between 0 and 1 is taken as the estimate. This
method was devised by Fisher and Balmukund (1928) and Immer (1930). Immer also devised tables
in which, for any value of ρ, we can read the value of θ or p.
The sampling variance is given as
$$\mathrm{Var}(\hat{\theta}) = \frac{2\theta(1-\theta)(2+\theta)}{n(1+2\theta)}$$
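In place of Immer's tables, the quadratic can be solved directly. The Python sketch below uses the closed-form admissible root θ = (1 + ρ - √(1 + 3ρ))/(ρ - 1), obtained from the quadratic formula. The function name and the symbol ρ for the product ratio are assumptions made for this illustration, and the counts are the sweet pea F2 data from the practical exercises.

```python
import math


def product_ratio_theta(a1, a2, a3, a4):
    """Solve (1 - rho)*theta^2 + 2*(1 + rho)*theta - rho = 0 for theta in [0, 1]."""
    if a2 * a3 == 0:
        raise ValueError("product ratio undefined when a2 or a3 is zero")
    rho = (a1 * a4) / (a2 * a3)
    if rho == 1:
        return 0.25  # the equation becomes linear: 4*theta - 1 = 0
    return (1 + rho - math.sqrt(1 + 3 * rho)) / (rho - 1)


theta_pr = product_ratio_theta(284, 21, 21, 55)
p_pr = 1 - math.sqrt(theta_pr)   # coupling phase: theta = (1 - p)^2
print(f"theta = {theta_pr:.4f}, p = {p_pr:.4f}")
```

Counts in an exact 9:3:3:1 ratio give ρ = 1 and θ = 0.25, which corresponds to p = 0.5, i.e. no linkage.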
Coupling Phase
The dominant alleles of linked genes are located on the same chromosome. The
dominant genes appear to have a strong affinity for each other.
Example.
Test cross between CcSs and ccss

Progeny :    CS      Cs      cS      cs
             47%     3%      2%      48%

Repulsion Phase
When long-winged, black-bodied flies (LLgg) are crossed with short-winged, grey-bodied flies
(llGG), the F1 are long-winged and grey-bodied (LlGg).
But when the F1 are test crossed with the double recessive strain (llgg), the parental types,
long-winged black-bodied and short-winged grey-bodied, are more frequent than the
recombinant types.
It appears as if, in this cross, the dominant genes did not like one another and hence got
separated.
This situation is referred to as the repulsion phase or trans configuration.
Coupling and repulsion phases are thus two aspects of linkage. The only point to
consider is whether the two dominant genes are present together on one
chromosome or not.
If the parental combination continues to exist in F1, F2 and in the test cross, i.e., in
every generation, then such linkage is called complete linkage.
On the other hand, if the parental combination continues to exist only in every alternate
generation, i.e., F1, F3, F5, ..., F(n-1), then such linkage is called incomplete
linkage.
Linkage between two genes produces a significant deviation from the typical dihybrid
ratios of 1 : 1 : 1 : 1 (test cross) and 9 : 3 : 3 : 1 (F2 generation).
This deviation is most easily detected in a test cross. In the case of linkage, the two
parental types are the most frequent.
When two genes are linked, the percentage of parental types will be more than 50% and the
percentage of recombinant types will be less than 50%.
The two parental types have comparable frequencies.
Similarly, the frequencies of the two recombinant types are also comparable. This type of
test cross data is a sure indication of linkage.
In fact, this relationship can be used as a safe guide to identify the two types in test
cross data whenever the identity of the parental types is not known.
All the genes present on a single chromosome are grouped as one linkage group.
Thus, the number of linkage groups present in a species is equal to the haploid number of
chromosomes in that species.
For example, the number of linkage groups in humans is 23 and that of maize is 10. Genes
present in one linkage group can be represented on a single straight line in the same order
in which they are present in the chromosome, and the distance between two linked genes is
proportional to the frequency of recombination between them, which can be depicted in a
diagram.
Such a diagram, with the linear order and recombination frequencies depicted, is called a
linkage map, genetic map or chromosome map.
Before preparing a chromosome map, the sequence of genes in the chromosome and the
frequencies of recombination between linked genes must be known.
An appropriate test cross will determine the recombination frequencies between linked genes.
One per cent recombination is taken as one map unit when preparing the linkage map.
Linkage Map
A linkage map is a genetic map of a species or experimental population that shows the
position of its known genes or genetic markers relative to each other in terms of
recombination frequency, rather than as specific physical distance along each chromosome.
Linkage mapping is critical for identifying the location of genes that cause genetic diseases.
A genetic map is a map based on the frequencies of recombination between markers during
crossover of homologous chromosomes. The greater the frequency of recombination
(segregation) between two genetic markers, the farther apart they are assumed to be.
Conversely, the lower the frequency of recombination between the markers, the smaller the
physical distance between them. Historically, the markers originally used were detectable
phenotypes (enzyme production, eye color) derived from coding DNA sequences; eventually,
confirmed or assumed noncoding DNA sequences such as microsatellites or those generating
restriction fragment length polymorphisms (RFLPs) have been used.
Genetic maps help researchers to locate other markers, such as other genes by testing for
genetic linkage of the already known markers.