The Evolution of Haplotype Distributions in Random
Mating Populations:
A Random Set Approach
AMIR RAJI-KERMANY and CHARALAMBOS D. CHARALAMBOUS
School of Information Technology and Engineering
University of Ottawa
800 King Edward Ave, Ottawa, K1N 6N5
and
DONAL HICKEY
Department of Biology
University of Ottawa
150 Louis Pasteur, Ottawa K1N 6N5
CANADA
Abstract: - Several mathematical models have been developed to describe the genetic structure of populations. Most of these models focus on only one or a few genetic loci. In this paper, we develop a model that describes a large number of loci simultaneously. Such a model is appropriate for recent large-scale studies of single nucleotide polymorphisms (SNPs). Building on the earlier work of Geiringer (1944) and the more recent mathematical formalization proposed by Dawson (2000), we suggest modeling the system using random sets [1, 3]. We can then use this model to find the recursion law for the change of haplotype frequencies between successive generations.
Key-Words: - recombination, polymorphism, allele, locus, genetic structure, random sets.
1 The Haplotype Structure of Populations

A chromosome can be considered as a linear array of genes. Each gene has a defined position, or locus, along the chromosome, but different allelic forms of a particular gene may occur at a given locus. The combination of allelic forms over all of the loci on a given chromosome is defined as the haplotype [6]. Haplotypes are replicated exactly during chromosome division, assuming that no mutation or recombination occurs. In sexual, random-mating populations (such as mammals, including humans), homologous recombination is very frequent and continually generates new haplotypes from pairs of existing haplotypes [5]. Other evolutionary factors, such as mutation and natural selection, also change haplotype frequencies, but we do not consider them in the current model. We also assume that the population under study is large and that the generations are non-overlapping. For simplicity, we consider only di-allelic genes, but the method can be extended to the multi-allele case [1].

2 Random Sets as a Useful Tool

In this section, we provide a brief overview of random sets.

Random sets are generalizations of the familiar concept of random variables/vectors in probability theory [4]. As a simple example, let $U$ be a finite set. Suppose elements of the power set $\mathcal{P}(U)$ of $U$ are selected at random according to some specified probability law. Then we obtain a finite random set. We can define a set function which plays the role of a probability density function:

$$
f : \mathcal{P}(U) \to [0, 1]; \qquad \sum_{A \in \mathcal{P}(U)} f(A) = 1 \tag{1}
$$

Hence the probability that a subset $A$ of $U$ is selected is $f(A)$. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\Gamma : \Omega \to \mathcal{P}(U)$ be a random element. The σ-field on $\mathcal{P}(U)$ is $\mathcal{P}(\mathcal{P}(U))$. The density of the random set $\Gamma$ is:

$$
f(A) = \Pr\{\omega : \Gamma(\omega) = A\} \tag{2}
$$

Also, when $U$ is finite, we can define a similar concept of distribution function for random sets as [4, 8]:

$$
F : \mathcal{P}(U) \to [0, 1], \qquad F(A) = \Pr\{\omega : \Gamma(\omega) \subset A\} = \sum_{B \subset A} f(B) \tag{3}
$$

As a geometric illustration of a random set, suppose we drop a disc of radius $r$ on a square surface $S = [0, a] \times [0, a]$ with $r \ll a$. At each trial the disc covers a circular region which is a subset of $S$. Therefore the outcome of the experiment, dropping the disc on the surface, is a set $\Gamma(\omega) \subset S$, where:

$$
\Gamma(\omega) = \{(x, y) : (x, y) \in S,\ (x - \omega_x)^2 + (y - \omega_y)^2 \le r^2\} \tag{4}
$$

in which $\omega = (\omega_x, \omega_y)$ is the location of the center of the disc on the $x$-$y$ plane [?]. Hence, if $\omega$ denotes an outcome of the random experiment and $\Omega$ the sample space, the random subset is $U = \Gamma(\omega)$. In this example we can make probabilistic judgments about certain events. For example, the probability that a disc hits the square $T = [0, a/2] \times [0, a/2]$ is:

$$
\Pr\{\omega \in \Omega : \Gamma(\omega) \cap T \neq \emptyset\} \tag{5}
$$

In the theory of random sets, the events of interest are of the form (5), or the event that a random set covers a set $T$:

$$
\Pr\{\omega \in \Omega : T \subset \Gamma(\omega)\} \tag{6}
$$

As we will see in the next section, the event in (6) plays an important role in our model [8, 2].

3 The Model

Haplotypes can be represented as an array of genes. This array may be confined to a single chromosome or it may extend to the entire genome. Denoting the total number of loci of a gamete by $L$, we have $L = L_1 + \cdots + L_n$, where $L_i$ is the length of the $i$'th chromosome and $n$ is the number of chromosomes of the gamete. If we represent an allele of a gene at locus $i$ by $x(i)$, where $x(i) \in \Lambda = \{0, 1, 2, \ldots, m\}$, we can uniquely represent a gamete in the population by the array $X = \{x(1), \ldots, x(L)\}$, where the elements of $\Lambda$ label the possible alleles at a locus.

Any subset of the set $X$ can be considered as a haplotype. For example, in the simplest case, studied by Mendel, where the locus is di-allelic and only one genetic factor is considered, haplotypes are singletons of $X$, i.e. $\{x(i)\}$ [7]. A population of gametes can be partitioned according to the haplotype under consideration, where individuals belonging to the same partition share the same arrangement of genes with respect to the specified region. The number of individuals in each partition, divided by the population size, is the frequency of the corresponding haplotype.

In the di-allelic case, genes can appear only in two alternative forms, denoted by 0 and 1. Therefore we can represent a gamete as an array of 0's and 1's. Any haplotype, in this notation, can be considered as a subset of $X$, which can also be represented as an array of 0's and 1's. For example, $X = (x(1), x(2), x(3)) \equiv 101$ means that the gene on the first locus appears with its second allele, the second gene with its first allele, and the third with its second allele. In this case, if we consider $n$ loci, there are $2^n$ possible haplotypes. For instance, in the above example we have $2^3 = 8$ possible haplotypes, denoted as: 000, 001, 011, 010, 100, 101, 111, 110.

Let $\mathcal{X}$ be the space of all possible haplotypes with respect to the set of loci under consideration. As there are only two possible alleles, we can identify haplotypes by the loci which carry the 1 allele. For example, $000 \Leftrightarrow \emptyset$; $001 \Leftrightarrow \{3\}$; $011 \Leftrightarrow \{2, 3\}$; and so on. In this way we can set up a one-to-one correspondence between the haplotypes and the $2^n$ subsets of $S = \{1, \ldots, n\}$. Let us call this mapping $\Gamma$; it is a random set corresponding to each point $x \in \mathcal{X}$. Therefore $\Gamma x = U \subset S$ is the set of loci on which genes appear with their 1 allele, and the loci belonging to $S - \Gamma x$ contain 0 alleles. Therefore the probability of having a haplotype in which all the loci are occupied by the 1 allele is the
probability that this haplotype is mapped to the set $S$, and $\emptyset$ corresponds to the haplotype in which all the genes appear with the 0 allele. We define the probability that $x$ is mapped to $\Gamma x = U \subset S$ as:

$$
\Pr\{x \in \mathcal{X} : \Gamma x = U \subset S\} \triangleq g(U|S) \tag{7}
$$

which is the probability that all the loci belonging to $U$ carry the 1 allele and the rest ($S - U$) carry 0 alleles. Therefore the probability that, for an arbitrary gamete, a subset $U$ of loci is occupied by 1 alleles is:

$$
\begin{aligned}
\Pr\{x \in \mathcal{X} : x(i) = 1,\ i \in U \subset S\} &\triangleq P(U) \\
&= \Pr\{x \in \mathcal{X} : U \subset \Gamma x \subset S\} \\
&= \sum_{A : U \subset A \subset S} \Pr\{x \in \mathcal{X} : \Gamma x = A\} \\
&= \sum_{A : U \subset A \subset S} g(A|S)
\end{aligned}
\tag{8}
$$

Hence:

$$
P(U) = \sum_{A : U \subset A \subset S} g(A|S) \tag{9}
$$

By Möbius inversion of (9) we get:

$$
g(U|S) = \sum_{A : U \subset A \subset S} (-1)^{|A - U|} P(A) \tag{10}
$$

where $|A - U|$ is the cardinality of the set $A - U$. If we let $U = \emptyset$ in (9) then:

$$
P(\emptyset) = \Pr\{x \in \mathcal{X} : \emptyset \subset \Gamma x\} = \Pr\{x \in \mathcal{X}\} = 1 \tag{11}
$$

Hence

$$
P(\emptyset) = \sum_{A : A \subset S} g(A|S) = 1 \tag{12}
$$

3.1 Recombination Rates

The haplotype structure in each generation depends on the haplotypes that existed in the previous generation, in addition to the recombination rate between loci. In order to describe the change of haplotype distributions in a population from one generation to the next, we need a mechanism to represent the transmission probabilities for each set of loci. These probabilities depend only on the distances between the loci and not on the particular alleles that occupy those loci (in other words, not on the label assigned to each locus). For this purpose, we define another probabilistic mapping $\tilde{\Gamma} : \mathcal{X} \to 2^S$, where $\tilde{\Gamma} x$ denotes the set of loci on haplotype $x$ whose alleles have come from parent 1, the rest having come from parent 2. Let us denote the probability that a gamete $x$ is mapped to $\tilde{\Gamma} x = R \subset S$ by $r(R|S)$, so that:

$$
r(R|S) \triangleq \Pr\{x \in \mathcal{X} : \tilde{\Gamma} x = R \subset S\} \tag{13}
$$

If one parent of an individual transmits $R$, the rest of the genes on the child gamete have come from the other parent, so the probability that the first parent transmits $R$ is equal to the probability that the second parent transmits $S - R$. Because the labeling of the parents is arbitrary, we have a symmetry whereby the probability that one parent transmits $S - R$ is equal to the probability that the other parent transmits $S - R$. Therefore:

$$
r(R|S) = r(S - R|S) \tag{14}
$$

Now let us define another parameter $\rho : 2^S \to \mathbb{R}$, which is the probability that all the loci belonging to $U \subset S$ have come from the same parental gamete. In other words, $\rho(U)$ is the probability of non-recombination among the set of loci $U$. Therefore:

$$
\rho(U) = \Pr\big(\{x \in \mathcal{X} : U \subset \tilde{\Gamma} x \subset S\} \cup \{x \in \mathcal{X} : \tilde{\Gamma} x \subset S - U\}\big) \tag{15}
$$

If we exclude the cases $U = \emptyset$ and $U = S$, we have:

$$
\{x \in \mathcal{X} : U \subset \tilde{\Gamma} x \subset S\} \cap \{x \in \mathcal{X} : \tilde{\Gamma} x \subset S - U\} = \emptyset \tag{16}
$$

Hence, in this case, from (15) we get:

$$
\rho(U) = \Pr\{x \in \mathcal{X} : U \subset \tilde{\Gamma} x \subset S\} + \Pr\{x \in \mathcal{X} : \tilde{\Gamma} x \subset S - U\} \tag{17}
$$

On the other hand, we have:

$$
\Pr\{x \in \mathcal{X} : U \subset \tilde{\Gamma} x \subset S\} = \sum_{A : U \subset A \subset S} r(A|S) \tag{18}
$$
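and

$$
\begin{aligned}
\Pr\{x \in \mathcal{X} : \tilde{\Gamma} x \subset S - U\} &= \sum_{A : A \subset S - U} r(A|S) \\
&= \sum_{A : U \subset A \subset S} r(S - A|S) \\
&= \sum_{A : U \subset A \subset S} r(A|S)
\end{aligned}
\tag{19}
$$

where the last equality follows from (14). Therefore, by (14) and (17), we have:

$$
\rho(U) = \sum_{A : U \subset A \subset S} 2 r(A|S) \tag{20}
$$

By Möbius inversion of (20) we have:

$$
2 r(U|S) = \sum_{A : U \subset A \subset S} (-1)^{|A - U|} \rho(A) \tag{21}
$$

If we put $U = \emptyset$ in (15), we have:

$$
\{x \in \mathcal{X} : \emptyset \subset \tilde{\Gamma} x \subset S\} \cup \{x \in \mathcal{X} : \tilde{\Gamma} x \subset S\} = \{x \in \mathcal{X} : \tilde{\Gamma} x \subset S\} \tag{22}
$$

Hence:

$$
\rho(\emptyset) = \Pr\{x \in \mathcal{X} : \tilde{\Gamma} x \subset S\} = \sum_{A : A \subset S} r(A|S) = 1 \tag{23}
$$

With a change of variables in (15) we can write it as follows:

$$
\rho(U) = \Pr\big[\{x \in \mathcal{X} : U \subset \tilde{\Gamma} x \subset S\} \cup \{x \in \mathcal{X} : U \subset S - \tilde{\Gamma} x \subset S\}\big] \tag{24}
$$

Now if we let $S = U$ in (24) we get:

$$
\rho(U) = \Pr\big[\{x \in \mathcal{X} : U \subset \tilde{\Gamma} x \subset U\} \cup \{x \in \mathcal{X} : U \subset U - \tilde{\Gamma} x \subset U\}\big] \tag{25}
$$

Hence

$$
\begin{aligned}
\rho(U) &= \Pr\big[\{x \in \mathcal{X} : U = \tilde{\Gamma} x\} \cup \{x \in \mathcal{X} : U - \tilde{\Gamma} x = U\}\big] \\
&= \Pr\{x \in \mathcal{X} : U = \tilde{\Gamma} x\} + \Pr\{x \in \mathcal{X} : \tilde{\Gamma} x = \emptyset\} \\
&= r(U|U) + r(\emptyset|U) = 2 r(U|U) = 2 r(\emptyset|U)
\end{aligned}
\tag{26}
$$

If we put $U = \emptyset$ in (21) we get:

$$
2 r(\emptyset|S) = \sum_{A : A \subset S} (-1)^{|A|} \rho(A) = \rho(S) \tag{27}
$$

4 Change of Haplotype Probabilities

As a result of recombination and the random union of gametes, the frequency of haplotypes in a population will be different in the next generation, unless the population has reached its equilibrium state.

The subject of this section is to find a relation for the haplotype frequencies in the $(t+1)$'th generation, knowing the state in the $t$'th generation. Therefore we want to determine $g_{t+1}(U|S)$ as a function of $g_t(U|S)$.

The structure of a haplotype in the $(t+1)$'th generation is, in general, such that a part of it ($R \subset S$) has come from parent 1 and the rest ($S - R \subset S$) from parent 2. This is because of the effect of recombination in meiosis.

Figure 1: A haplotype in which section $U$ is occupied by the 1 allele.

Suppose we want to find the probability of a haplotype like the one in Figure 1. The probability of this event in the space $\mathcal{X}$ is:

(Probability that part $R$ has come from parent 1) × (Probability that parent 1 has allele 1 on segment $R \cap U \subset U$) × (Probability that parent 2 has allele 1 on segment $(S - R) \cap U \subset U$)

Therefore:

$$
g_{t+1}(U|S) = \sum_{R : R \subset S} \Big( r(R|S) \times \Pr\{x \in \mathcal{X} : U \cap R \subset \Gamma x \subset U\} \times \Pr\{x \in \mathcal{X} : (S - R) \cap U \subset \Gamma x \subset U\} \Big) \tag{28}
$$

Hence:

$$
g_{t+1}(U|S) = \sum_{R : R \subset S} r(R|S) \times \sum_{A : U \cap R \subset A \subset U} g_t(A|S) \times \sum_{B : (S - R) \cap U \subset B \subset U} g_t(B|S) \tag{29}
$$

Writing the conditions on the two parental gametes as agreement with $U$ on the transmitted segments gives:

$$
g_{t+1}(U|S) = \sum_{R : R \subset S} r(R|S) \times \sum_{A : A \subset S;\ U \cap R = A \cap R} g_t(A|S) \times \sum_{B : B \subset S;\ (S - R) \cap B = (S - R) \cap U} g_t(B|S) \tag{30}
$$

Using this result we can derive a recursion for $P_t(U)$. From (9) we have:

$$
P_t(U) = \sum_{A : U \subset A \subset S} g_t(A|S) \;\Rightarrow\; P_t(S) = g_t(S|S) \tag{31}
$$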
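As an illustrative sketch (not part of the original derivation), the relations (9), (10) and (30) can be checked numerically for a small number of loci. The Python code below encodes every subset of $S$ as a bitmask, with bit 0 standing for locus 1. The function names, the bitmask encoding, and the example values of g and r are assumptions chosen for illustration; r is chosen to satisfy the symmetry (14) and to sum to one.

```python
def subsets(mask):
    """Yield every submask of `mask` (all subsets of a locus set stored as bits)."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            return
        sub = (sub - 1) & mask


def P_from_g(g, full):
    """Eq. (9): P(U) is the sum of g(A|S) over all A with U inside A inside S."""
    return {U: sum(g[A] for A in subsets(full) if A & U == U)
            for U in subsets(full)}


def g_from_P(P, full):
    """Eq. (10): Moebius inversion, g(U|S) = sum of (-1)^|A-U| P(A)."""
    return {U: sum((-1) ** bin(A & ~U).count("1") * P[A]
                   for A in subsets(full) if A & U == U)
            for U in subsets(full)}


def next_g(g, r, full):
    """Eq. (30): haplotype probabilities after one generation."""
    new = {}
    for U in subsets(full):
        total = 0.0
        for R in subsets(full):
            comp = full & ~R  # loci transmitted by parent 2, i.e. S - R
            # parent-1 gametes that agree with U on R
            from_p1 = sum(g[A] for A in subsets(full) if A & R == U & R)
            # parent-2 gametes that agree with U on S - R
            from_p2 = sum(g[B] for B in subsets(full) if B & comp == U & comp)
            total += r[R] * from_p1 * from_p2
        new[U] = total
    return new


# Hypothetical two-locus example: S = {1, 2} is the bitmask 0b11.
FULL = 0b11
g0 = {0b00: 0.4, 0b01: 0.1, 0b10: 0.1, 0b11: 0.4}
r = {0b00: 0.4, 0b11: 0.4, 0b01: 0.1, 0b10: 0.1}  # r(R|S) = r(S-R|S), sum = 1
g1 = next_g(g0, r, FULL)
```

In this example the single-locus frequencies computed with `P_from_g` are the same before and after the step, while the probability of the haplotype carrying the 1 allele at both loci moves toward the product of those frequencies.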
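Now if we put $U = S$ in (30) we get [1]:

$$
g_{t+1}(S|S) = \sum_{R : R \subset S} r(R|S) \times \sum_{A : R \subset A \subset S} g_t(A|S) \times \sum_{B : (S - R) \subset B \subset S} g_t(B|S) \tag{32}
$$

Here we used the simple set-theoretic fact that:

$$
S \cap R = A \cap R \;\Rightarrow\; R = A \cap R \;\Rightarrow\; R \subset A \subset S \tag{33}
$$

Hence, using (9), we have:

$$
g_{t+1}(S|S) = \sum_{R \subset S} r(R|S) P_t(R) P_t(S - R) \tag{34}
$$

From (31) and (34) we have:

$$
P_{t+1}(S) = \sum_{R \subset S} r(R|S) P_t(R) P_t(S - R) \tag{35}
$$

Equation (35) is valid for every set of loci, hence we have:

$$
P_{t+1}(U) = \sum_{A \subset U} r(A|U) P_t(A) P_t(U - A); \qquad \forall U \subset S \tag{36}
$$

5 Recursion For One and Two Loci

In order to validate our model, we applied it to the special cases of one and two loci.

5.1 The Case of One Locus

In the case of only one factor, we have $S = \{1\}$, $|S| = 1$. Hence we can partition the set of all gametes into two subsets $\mathcal{X}_0, \mathcal{X}_1$ such that $\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1$, $\mathcal{X}_0 \cap \mathcal{X}_1 = \emptyset$, and:

$$
\mathcal{X}_1 = \{x \in \mathcal{X} : \Gamma x = \{1\}\} \tag{37}
$$

and

$$
\mathcal{X}_0 = \{x \in \mathcal{X} : \Gamma x = \emptyset\} \tag{38}
$$

Hence all the gametes belonging to the set $\mathcal{X}_1$ carry the 1 allele at the specified locus, and $\mathcal{X}_0$ represents the set of all gametes with the 0 allele on that locus. In this case equation (36) gives the following:

$$
P_{t+1}(\{1\}) = r(\emptyset|\{1\}) P_t(\{1\}) P_t(\emptyset) + r(\{1\}|\{1\}) P_t(\{1\}) P_t(\emptyset) \tag{39}
$$

But by (14), $r(\emptyset|\{1\}) = r(\{1\}|\{1\})$. Hence:

$$
P_{t+1}(\{1\}) = 2 r(\emptyset|\{1\}) P_t(\{1\}) P_t(\emptyset) \tag{40}
$$

By (27), $2 r(\emptyset|\{1\}) = \rho(\{1\}) = 1$, and by (11), $P_t(\emptyset) = 1$. Hence:

$$
P_{t+1}(\{1\}) = \rho(\{1\}) P_t(\{1\}) = P_t(\{1\}) \tag{41}
$$

Equation (41) implies that when $|S| = 1$, the case of one factor, the population reaches its equilibrium state after one generation and the haplotype frequencies remain constant. This is essentially a re-statement of the classic Hardy-Weinberg equilibrium [5].

5.2 The Case of Two Loci

Now let $S = \{1, 2\}$. In this case $\mathcal{X} = \mathcal{X}_{11} \cup \mathcal{X}_{10} \cup \mathcal{X}_{01} \cup \mathcal{X}_{00}$, where:

$$
\begin{aligned}
\mathcal{X}_{11} &= \{x \in \mathcal{X} : \Gamma x = \{1, 2\}\} \\
\mathcal{X}_{10} &= \{x \in \mathcal{X} : \Gamma x = \{1\}\} \\
\mathcal{X}_{01} &= \{x \in \mathcal{X} : \Gamma x = \{2\}\} \\
\mathcal{X}_{00} &= \{x \in \mathcal{X} : \Gamma x = \emptyset\}
\end{aligned}
$$

In this case from (36) we have:

$$
\begin{aligned}
P_{t+1}(\{1, 2\}) ={}& r(\emptyset|\{1, 2\}) P_t(\emptyset) P_t(\{1, 2\}) \\
&+ r(\{1\}|\{1, 2\}) P_t(\{1\}) P_t(\{2\}) \\
&+ r(\{2\}|\{1, 2\}) P_t(\{1\}) P_t(\{2\}) \\
&+ r(\{1, 2\}|\{1, 2\}) P_t(\{1, 2\}) P_t(\emptyset)
\end{aligned}
\tag{42}
$$

Using (14) and $P_t(\emptyset) = 1$, this becomes:

$$
P_{t+1}(\{1, 2\}) = 2 r(\emptyset|\{1, 2\}) P_t(\{1, 2\}) + 2 r(\{1\}|\{1, 2\}) P_t(\{1\}) P_t(\{2\}) \tag{43}
$$

In order to economize on symbols, let us adopt the following notation:

$$
P_t(\{1, 2\}) \triangleq P_{12}^{(t)}, \qquad P_t(\{i\}) \triangleq P_i^{(t)}, \qquad r(\{1\}|\{1, 2\}) \triangleq r_1, \qquad r(\{1, 2\}|\{1, 2\}) \triangleq r_{12},
$$

$$
\rho(\{1, 2\}) = \rho_{12}; \qquad \rho(\{1\}) = \rho_1
$$

Now we can rewrite (43) as:

$$
P_{12}^{(t+1)} = 2 r_{12} P_{12}^{(t)} + 2 r_1 P_1^{(t)} P_2^{(t)} \tag{44}
$$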
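The two-locus recursion (44) is easy to iterate numerically. The following Python sketch is an illustration only: the function names and the starting values are hypothetical, the single-locus frequencies are held constant as in (41), and $r_1$ is obtained from $2 r_1 + 2 r_{12} = 1$, which follows from (14) and (23).

```python
def two_locus_step(p12, p1, p2, r12, r1):
    """Eq. (44): P12(t+1) = 2*r12*P12(t) + 2*r1*P1*P2."""
    return 2.0 * r12 * p12 + 2.0 * r1 * p1 * p2


def trajectory(p12, p1, p2, r12, generations):
    """Iterate eq. (44) with constant single-locus frequencies p1 and p2."""
    r1 = 0.5 - r12  # assumption taken from (14) and (23): 2*r1 + 2*r12 = 1
    path = [p12]
    for _ in range(generations):
        path.append(two_locus_step(path[-1], p1, p2, r12, r1))
    return path


# Hypothetical starting point: P12 = 0.4, P1 = P2 = 0.5, r12 = 0.4.
path = trajectory(p12=0.4, p1=0.5, p2=0.5, r12=0.4, generations=50)
excess = [p - 0.5 * 0.5 for p in path]  # P12 - P1*P2 in each generation
```

With these values the excess of $P_{12}$ over $P_1 P_2$ shrinks by the factor $2 r_{12}$ in each generation, so $P_{12}^{(t)}$ approaches $P_1 P_2$.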
On the other hand, by (21) we have:

$$
2 r_{12} = \rho_{12} \quad \text{and} \quad 2 r_1 = \rho_1 - \rho_{12} \tag{45}
$$

But from (23) we know that $\sum_{U : U \subset S} r(U|S) = 1$, therefore:

$$
2 r_1 + 2 r_{12} = 1 \tag{46}
$$

Therefore by (45) we have:

$$
\rho_1 - \rho_{12} + \rho_{12} = \rho_1 = 1 \tag{47}
$$

Hence we can write equation (44) as:

$$
P_{12}^{(t+1)} = \rho_{12} P_{12}^{(t)} + (\rho_1 - \rho_{12}) P_1^{(t)} P_2^{(t)} \tag{48}
$$

Subtracting $P_1^{(t)} P_2^{(t)} = P_1 P_2$ from both sides of (48) and using $\rho_1 = 1$, we get (see footnote 1):

$$
P_{12}^{(t+1)} - P_1 P_2 = \rho_{12} \big(P_{12}^{(t)} - P_1 P_2\big) \tag{49}
$$

Footnote 1: As we saw in Section 5.1, the allele frequencies reach the steady state after the first generation and do not change with time.

Now with the change of variables $b_{12}^{(t)} = P_{12}^{(t)} - P_1 P_2$ we have:

$$
b_{12}^{(t+1)} = \rho_{12}\, b_{12}^{(t)} \;\Rightarrow\; b_{12}^{(t)} = \rho_{12}^{\,t}\, b_{12}^{(t_0)} \tag{50}
$$

where $t_0$ denotes the initial generation (taken as $t_0 = 0$). Therefore the magnitude of $b_{12}^{(t)}$ decays geometrically in $t$, and the rate of decay is $\rho_{12} = \rho(\{1, 2\})$, which is the probability of non-recombination along the set of loci $\{1, 2\}$.

As we see from (50), when $t$ becomes larger, $b_{12}^{(t)}$ approaches zero, so $b_{12}$ can be considered as a measure of linkage disequilibrium, in the sense that when $b_{12}$ differs from zero, the population is in linkage disequilibrium.

Remark 5.1. Note that the definition of recombination rate here is somewhat different from the usual definition in the case of two factors. Usually the recombination rate $r$ is the probability of recombination between two loci. By definition, in the case of two loci we have $\rho_{12} = 1 - r$, so if we denote $b_{12}^{(t+1)}$ by $D_{t+1}$ and $b_{12}^{(t)}$ by $D_t$, from (50) we get:

$$
D_{t+1} = (1 - r) D_t
$$

which is the classic result for the decay of two-locus linkage disequilibrium [5]. Note that $r = 0.5 \Rightarrow \rho_{12} = 0.5$ corresponds to the case in which the genes are far apart, and $r = 0 \Rightarrow \rho_{12} = 1$ corresponds to the case in which the genes are very close in position.

6 Conclusion

In this study, we showed that the use of random sets enables us to obtain the recursion relation that describes the change of haplotype distributions from one generation to the next. Such a relationship was previously described by Dawson [1], using an alternative approach. The utility of our approach, using random sets, lies in the fact that it reflects very well the process by which the haplotype distribution changes: namely, there is a geometrical change in the haplotype structure of the population.

In the future, we can extend this model by applying data fusion methods for identifying disease genes, especially in the cases where a large number of loci are considered simultaneously. There is an urgent need for such models given the rapid increase in the volume of molecular surveys that provide data on genetic variation at hundreds, or even thousands, of loci throughout the genome.

References

[1] K. Dawson, The Decay of Linkage Disequilibrium under Random Union of Gametes: How to Calculate Bennett's Principal Components, Theoretical Population Biology, Vol. 58, 2000, pp. 1-20.
[2] A.P. Dempster, Upper and Lower Probabilities Induced by a Multivalued Mapping, Ann. Math. Stat., Vol. 38, 1967, pp. 325-339.
[3] H. Geiringer, On the Probability Theory of Linkage in Mendelian Heredity, Ann. Math. Stat., Vol. 15, 1944, pp. 25-57.
[4] I.R. Goodman, R. Mahler, Mathematics of Data Fusion, Kluwer Academic Publishers, 1997.
[5] D.L. Hartl, A Primer of Population Genetics, Sinauer Associates Inc., 1988.
[6] Y.U. Liyubich, Basic Concepts and Theorems of the Evolutionary Genetics of Free Populations, Russ. Math. Survey, Vol. 26, 1971, pp. 51-123.
[7] Y.U. Liyubich, Mathematical Structures in Population Genetics, Biomathematics, Vol. 22, Springer-Verlag, 1992.
[8] G. Matheron, Random Sets and Integral Geometry, John Wiley, 1975.
[9] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, 1976.
[10] D. Stoyan, W. Kendall, Stochastic Geometry and Its Applications, John Wiley, 1995.