PB HLTH C240F/STAT C245F

Spring 2017
Assignment #2
Due: Thursday, February 23rd
This assignment concerns the statistical analysis of meiosis.
Question 1. Map functions. Consider two loci A and B. Let X denote
the random variable for the number of chiasmata between A and B on the
four-strand chromatid bundle and let π_n = Pr(X = n), n = 0, 1, ..., denote
the chiasma count distribution. Consider the following three models for chromatid strand involvement, where {1, 2} and {3, 4} denote the two pairs of sister
chromatids.
1. No-chromatid-interference (NCI). Each of the four pairs of non-sister chromatids is equally likely to be involved in a chiasma, independently of strand
involvement in other chiasmata.
2. Complete-negative-chromatid-interference (NegCI). The same randomly chosen pair of non-sister chromatids is involved in all chiasmata.
3. Complete-positive-chromatid-interference (PosCI). The pairs of non-sister
chromatids involved in different chiasmata strictly alternate between the
first randomly chosen pair and the pair not including the chromatids from
the first pair. For instance, if the first chiasma involves pair {1, 3}, then
subsequent chiasmata alternately involve pairs {2, 4} and {1, 3}.
Let d and θ denote, respectively, the genetic map distance and recombination
fraction between loci A and B.
For each of the above three strand involvement models, perform the following
analyses.
• Express the recombination fraction θ in terms of the chiasma count distribution {π_n}.
• For a Poisson chiasma process, derive a simple expression for the recombination fraction θ.
• For a Poisson chiasma process, determine the range of the map function
M(d) relating the recombination fraction θ to the genetic map distance d.
Comment on the impact of the strand involvement model on the range of
the recombination fraction.
Sandrine Dudoit
Assignment #1
PH C240F/STAT C245F
• For a Poisson chiasma process, plot the map function M(d) vs. the genetic
map distance d, for d ∈ [0, 3]. (The map functions for the three strand
involvement models can be plotted in the same figure using the R matplot
function.)
Hint. Let R denote an indicator for the occurrence of a recombination event
between A and B on a random chromatid. The recombination fraction θ may
be derived by conditioning on the number of chiasmata,
\[ \theta = \Pr(R = 1) = \sum_{n=1}^{\infty} \Pr(R = 1 \mid X = n) \, \Pr(X = n), \tag{1} \]
and by noting that a particular chromatid is recombinant if it is involved in an
odd number of chiasmata.
For a Poisson chiasma process, recall properties of trigonometric (sin, cos) and
hyperbolic (sinh, cosh) functions and their Taylor series expansions.
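A closed-form expression derived for θ can be checked numerically by simulating the three strand involvement models directly. The following is a minimal Python sketch, not part of the assignment. It assumes the standard convention that a Poisson chiasma process on the four-strand bundle has mean 2d for an interval of map length d Morgans (each chiasma involves two of the four chromatids). The names `sample_poisson`, `chiasma_pairs`, and `simulate_theta` are hypothetical helpers introduced for illustration. By symmetry, tracking chromatid 1 is equivalent to tracking a randomly chosen chromatid.

```python
import math
import random

# Sister pairs are {1, 2} and {3, 4}; these are the four non-sister pairs.
NON_SISTER_PAIRS = [(1, 3), (1, 4), (2, 3), (2, 4)]
# For each pair, the pair made of the two chromatids it does not include.
COMPLEMENT = {(1, 3): (2, 4), (2, 4): (1, 3), (1, 4): (2, 3), (2, 3): (1, 4)}


def sample_poisson(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def chiasma_pairs(n, model, rng):
    """Return the non-sister pairs involved in n successive chiasmata."""
    if n == 0:
        return []
    if model == "NCI":
        # Independent, uniformly chosen pair for every chiasma.
        return [rng.choice(NON_SISTER_PAIRS) for _ in range(n)]
    first = rng.choice(NON_SISTER_PAIRS)
    if model == "NegCI":
        # The same randomly chosen pair is involved in all chiasmata.
        return [first] * n
    if model == "PosCI":
        # Strict alternation between the first pair and its complement.
        return [first if k % 2 == 0 else COMPLEMENT[first] for k in range(n)]
    raise ValueError(f"unknown model: {model}")


def simulate_theta(d, model, n_meioses=100_000, seed=0):
    """Monte Carlo estimate of theta, assuming X ~ Poisson(2d) (assumption)."""
    rng = random.Random(seed)
    recombinant = 0
    for _ in range(n_meioses):
        n = sample_poisson(2.0 * d, rng)
        # Chromatid 1 is recombinant if it is involved in an odd number of chiasmata.
        hits = sum(1 in pair for pair in chiasma_pairs(n, model, rng))
        recombinant += hits % 2
    return recombinant / n_meioses


if __name__ == "__main__":
    for model in ("NCI", "NegCI", "PosCI"):
        print(model, simulate_theta(0.5, model))
```

Evaluating `simulate_theta` on a grid of d values gives empirical curves that a derived map function M(d) should match up to Monte Carlo error.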
Question 2. Coincidence coefficients. A common measure of crossover interference between two intervals is the coincidence coefficient. Specifically, consider three consecutive loci, A, B, and C, defining two adjacent intervals, I_1 and
I_2, of genetic map length d and δ Morgans (M), respectively. Let π_{i_1,i_2} denote
the joint probability of i_1 recombination events in the first interval (I_1, between
A and B) and i_2 recombination events in the second interval (I_2, between B and
C), i_1, i_2 ∈ {0, 1}. Then, the coincidence coefficient C is defined as the joint
probability of a recombination in both the first and second intervals, divided by
the product of the marginal recombination probabilities,
\[ C = C(I_1, I_2) \equiv \frac{\pi_{11}}{(\pi_{10} + \pi_{11})(\pi_{01} + \pi_{11})}. \tag{2} \]
The case C = 1 corresponds to no crossover interference, while C < 1 and C > 1
correspond, respectively, to positive and negative crossover interference. Let
θ = M(d) denote the map function relating recombination fractions and genetic
map distances, with M(0) = 0 and M′(0) = 1. Note that we are assuming that
map functions only depend on the genetic map length of an interval, not its
location.
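The definition of the coincidence coefficient in Equation (2) can be illustrated with a small Python sketch, not part of the assignment. The function `coincidence` and the helper `independent_joint` are hypothetical names introduced here. The joint probabilities are stored in a dictionary keyed by (i_1, i_2), and the numerical values are made up for illustration only.

```python
import math


def coincidence(pi):
    """Coincidence coefficient C from joint probabilities pi[(i1, i2)], i1, i2 in {0, 1}."""
    theta1 = pi[(1, 0)] + pi[(1, 1)]  # marginal recombination probability in I_1
    theta2 = pi[(0, 1)] + pi[(1, 1)]  # marginal recombination probability in I_2
    return pi[(1, 1)] / (theta1 * theta2)


def independent_joint(theta1, theta2):
    """Joint distribution when recombination in I_1 and I_2 is independent (illustrative helper)."""
    return {
        (i1, i2): (theta1 if i1 else 1 - theta1) * (theta2 if i2 else 1 - theta2)
        for i1 in (0, 1)
        for i2 in (0, 1)
    }


# Independence gives C = 1 up to floating-point rounding (no crossover interference).
pi_indep = independent_joint(0.1, 0.2)
print(coincidence(pi_indep))

# Same marginals (0.1 and 0.2), but fewer double recombinants: C < 1 (positive interference).
pi_positive = {(0, 0): 0.71, (1, 0): 0.09, (0, 1): 0.19, (1, 1): 0.01}
print(coincidence(pi_positive))
```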
a) Prove that the coincidence coefficient for consecutive intervals of genetic map
length d and δ can be expressed as
\[ C = C(d, \delta) = \frac{M(d) + M(\delta) - M(d + \delta)}{2 M(d) M(\delta)}. \tag{3} \]
b) Derive the following differential equation
\[ M'(d) = 1 - 2 C_3(d) M(d), \tag{4} \]
where C_3(d) ≡ lim_{δ→0} C(d, δ) is called the semi-infinitesimal 3-point coincidence
function.
Version: 07/02/2017, 15:15
c) Derive the map function corresponding to the semi-infinitesimal 3-point coincidence function C_3(d) = 1.
d) Derive the semi-infinitesimal 3-point coincidence function C_3(d) for the Kosambi
(1944) map function
\[ M_K(d) \equiv \frac{1}{2} \tanh(2d) \]
and the Carter and Falconer (1951) map function
\[ M_{CF}^{-1}(\theta) \equiv \frac{1}{4} \left( \tanh^{-1}(2\theta) + \tan^{-1}(2\theta) \right). \]
Hint. Recall that tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}) and work with the derivative
and inverse of tanh.
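A closed form derived for C_3(d) can be sanity-checked numerically. The Python sketch below, not part of the assignment, evaluates Equation (3) at a small δ as an approximation to the limit defining C_3(d), and compares it with the value implied by Equation (4) using a central-difference estimate of M′(d). The Kosambi map function is used as the example. The function names and the step sizes 1e-6 are assumptions chosen for illustration.

```python
import math


def kosambi(d):
    """Kosambi (1944) map function, M_K(d) = tanh(2d) / 2."""
    return 0.5 * math.tanh(2.0 * d)


def coincidence_from_map(M, d, delta):
    """Coincidence coefficient C(d, delta) as given by Equation (3)."""
    return (M(d) + M(delta) - M(d + delta)) / (2.0 * M(d) * M(delta))


def c3_from_limit(M, d, delta=1e-6):
    """Approximate C_3(d) by evaluating C(d, delta) at a small delta."""
    return coincidence_from_map(M, d, delta)


def c3_from_ode(M, d, h=1e-6):
    """C_3(d) rearranged from Equation (4): (1 - M'(d)) / (2 M(d)), with M' by central difference."""
    slope = (M(d + h) - M(d - h)) / (2.0 * h)
    return (1.0 - slope) / (2.0 * M(d))


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0):
        print(d, c3_from_limit(kosambi, d), c3_from_ode(kosambi, d))
```

The two columns should agree to several decimal places for d > 0, and a derived expression for C_3(d) can be compared against either one. Both functions divide by M(d), so d = 0 must be excluded.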
e) Plot the Haldane (1919), Kosambi (1944), and Carter and Falconer (1951)
map functions for d ∈ [0, 3] on the same figure (e.g., using the R matplot
function).
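The Carter and Falconer map function is specified only through its inverse, so plotting it against d requires numerical inversion. The sketch below, not part of the assignment, inverts M_CF^{-1} by bisection, relying on the fact that it is increasing in θ on [0, 1/2). It is written in Python rather than R, the function names and tolerance are assumptions for illustration, and the resulting values could be passed to any plotting routine.

```python
import math


def carter_falconer_inverse(theta):
    """d = M_CF^{-1}(theta) = (atanh(2 theta) + atan(2 theta)) / 4, for 0 <= theta < 1/2."""
    return 0.25 * (math.atanh(2.0 * theta) + math.atan(2.0 * theta))


def carter_falconer(d, tol=1e-12):
    """theta = M_CF(d), found by bisection on the increasing inverse map."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)  # always strictly below 1/2, so atanh stays finite
        if carter_falconer_inverse(mid) < d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Tabulate M_CF(d) on a grid over [0, 3].
    for k in range(7):
        d = 0.5 * k
        print(d, carter_falconer(d))
```

Bisection is used here because it needs only monotonicity of the inverse. A root finder based on derivatives would also work but would need care near θ = 1/2, where the inverse diverges.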
References
T. C. Carter and D. S. Falconer. Stocks for detecting linkage in the mouse and
the theory of their design. Journal of Genetics, 50:307–323, 1951.
J. B. S. Haldane. The combination of linkage values, and the calculation of
distances between the loci of linked factors. Journal of Genetics, 8:299–309,
1919.
D. D. Kosambi. The estimation of the map distance from recombination values.
Annals of Eugenics, 12:172–175, 1944.