Downloaded from genome.cshlp.org on March 16, 2011 - Published by Cold Spring Harbor Laboratory Press
Karyotype distributions in a stochastic model of reciprocal
translocation.
D Sankoff and V Ferretti
Genome Res. 1996 6: 1-9
Access the most recent version at doi:10.1101/gr.6.1.1
Copyright © Cold Spring Harbor Laboratory Press
RESEARCH
Karyotype Distributions in a Stochastic
Model of Reciprocal Translocation
David Sankoff¹ and Vincent Ferretti
Centre de Recherches Mathématiques, Université de Montreal, Quebec H3C 3J7, Canada
A random process of reciprocal translocation for a fixed number $k$ of chromosomes (or arms) will have an equilibrium distribution of chromosome lengths. In this paper we calculate this distribution, by analytical means for $k = 2$ and partially for $k = 3$, and simulate the means of the marginal distributions for higher $k$. We compare this with a random (i.e., ahistorical) distribution of genomic DNA among $k$ chromosomes and with a selection of karyotypes of real organisms. The results motivate a revised model where translocations giving rise to undersize chromosomes are disadvantaged.
The number, size, and centromeric position of a species' chromosomes are the most evident properties of its karyotype. Because overall genomic DNA content is rather variable and does not have systematic phylogenetic pertinence, the distribution of chromosome, or chromosome arm, length (measured cytogenetically, genetically, or as DNA content), normalized by total length, is a meaningful characteristic of a given organism for comparative purposes. Over the course of evolution, the gross characteristics of a karyotype are altered by processes such as genome fusion, chromosome fusion and fission, reciprocal translocation, paracentric inversions, duplication, deletion, and insertion of genomic material. It is a tenet of mammalian genomics that the distribution of conserved chromosomal segments evident in the comparison of two relatively divergent species can be accounted for by repeated reciprocal translocations, each involving two breakpoints occurring more or less at random along the arms of two chromosomes (Nadeau and Taylor 1984), though of course noncoding regions and heterochromatin, centromeric, and telomeric regions have all been cited as particularly susceptible to the breaking process.
From an evolutionary point of view, a reciprocal translocation occurs when arms of two chromosomes break simultaneously and each broken segment is rejoined to the "wrong" chromosome (for detailed descriptions, see Schulz-Schaeffer 1980; Swanson et al. 1981). A random process of reciprocal translocation for a fixed number $k$ of chromosomes (or arms) will have an equilibrium distribution of chromosome lengths. In this paper we calculate this distribution by analytical means for $k = 2$ and partially for $k = 3$, and simulate the means of its marginal distributions for higher $k$. We compare this with a random (i.e., ahistorical) distribution of genomic DNA among $k$ chromosomes and with a selection of karyotypes of real organisms. The results motivate a revised model where translocations giving rise to undersize chromosomes are disadvantaged.

¹Corresponding author.
E-MAIL [email protected]; FAX (514) 343-2254.
Random Reciprocal Translocations
We define a stochastic model for $k \ge 2$ chromosomes without taking into account the fact that the chromosomal segments exchanged by translocations do not contain centromeres. This same model can be used, and is perhaps more properly used, when $k$ represents the number of arms. Let $l_1, \ldots, l_k$ be the lengths of the $k$ chromosomes of a karyotype at time $t$, where $l_1 \ge \cdots \ge l_k$ and $\sum_i l_i = 1$. Choose two different chromosomes, for example, the $i$th and the $j$th, according to some probability distribution $P(i,j)$, which is either uniform ($= 1/\binom{k}{2}$) or depends on the lengths $l_i$. Pick a breakpoint at random on each of the two chromosomes, breaking them into segments of length $Ul_i$, $(1-U)l_i$, $Vl_j$, $(1-V)l_j$, respectively, where $U$ and $V$ are independent and uniformly distributed on $[0, 1]$. Then we reform a karyotype at time $t + 1$ containing chromosomes of length $l_1, \ldots, Ul_i + Vl_j, \ldots, (1-U)l_i + (1-V)l_j, \ldots, l_k$, which then must be reindexed so that the lengths of the chromosomes are in monotone nonincreasing order.

This process is repeated indefinitely. As the number of iterations approaches infinity, the probability that the length of the $i$th longest chromosome lies in a given interval converges. Let $q(l_1, \ldots, l_k)$ be the joint equilibrium probability density of the lengths of the longest, second longest, ..., shortest chromosome, respectively. The following sections are devoted to the calculation of this density.
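The update rule above can be sketched in Python as a function acting on a list of lengths. This is a minimal illustration under stated assumptions, not the authors' simulation code: the name `translocation_step` and its `weighting` argument are hypothetical, `"proportional"` weights each pair $(i, j)$ by $l_i + l_j$, and `"uniform"` gives every pair probability $1/\binom{k}{2}$.

```python
import random
from itertools import combinations


def translocation_step(lengths, rng, weighting="proportional"):
    """Apply one random reciprocal translocation (hypothetical helper).

    lengths   -- chromosome lengths summing to 1
    rng       -- a random.Random instance
    weighting -- "proportional": P(i, j) proportional to l_i + l_j
                 "uniform":      P(i, j) = 1 / C(k, 2) for every pair
    Returns a new list sorted in nonincreasing order.
    """
    pairs = list(combinations(range(len(lengths)), 2))
    if weighting == "proportional":
        weights = [lengths[i] + lengths[j] for i, j in pairs]
    elif weighting == "uniform":
        weights = [1.0] * len(pairs)
    else:
        raise ValueError(f"unknown weighting: {weighting!r}")
    i, j = rng.choices(pairs, weights=weights, k=1)[0]

    u, v = rng.random(), rng.random()         # independent uniform breakpoints
    li, lj = lengths[i], lengths[j]
    new = list(lengths)
    new[i] = u * li + v * lj                  # U*l_i joined to V*l_j
    new[j] = (1 - u) * li + (1 - v) * lj      # the two complementary pieces
    return sorted(new, reverse=True)          # reindex so l_1 >= ... >= l_k


if __name__ == "__main__":
    rng = random.Random(1)
    karyotype = [0.5, 0.3, 0.2]
    for _ in range(3):
        karyotype = translocation_step(karyotype, rng)
        print([round(x, 3) for x in karyotype])
```

Sorting after each step implements the reindexing, so index 0 always holds $l_1$ and total length is conserved.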
Figure 2 Probability density for length of longer chromosome.
The Two-chromosome Case
To simplify the notation, let $x = l_1$ and $1 - x = l_2$ be the lengths of the two initial chromosomes (so $1/2 \le x \le 1$), and let $U$ and $V$ be two independent random numbers uniformly distributed between 0 and 1. Then the two new chromosomes have lengths $A = Ux + V(1-x)$ and $1 - A = (1-U)x + (1-V)(1-x)$, respectively. Let $Y = \max[A, 1-A]$ be the length of the longer of the two, and let $F_x(y) = \Pr[Y \le y \mid x]$.
Consider the unit square $[0,1] \times [0,1]$ that is the domain of $(U, V)$. When $A \ge 1 - A$, then $Y \le y$ if $(U, V)$ lies between the lines $Ux + V(1-x) = 1/2$ and $Ux + V(1-x) = y$, as indicated in Figure 1. This region has area

$$\begin{cases} \dfrac{2y-1}{2x}, & \text{if } 1/2 \le y \le x,\\[4pt] \dfrac{1}{2} - \dfrac{(y-1)^2}{2x(1-x)}, & \text{if } x \le y \le 1. \end{cases}$$

When $A \le 1 - A$, by symmetry an equal area is contributed to the probability that $Y \le y$. Then

$$F_x(y) = \begin{cases} \dfrac{2y-1}{x}, & \text{if } 1/2 \le y \le x,\\[4pt] 1 - \dfrac{(y-1)^2}{x(1-x)}, & \text{if } x \le y \le 1. \end{cases}$$

The density of this probability is

$$p(y \mid x) = \begin{cases} \dfrac{2}{x}, & \text{if } 1/2 \le y \le x,\\[4pt] \dfrac{2(1-y)}{x(1-x)}, & \text{if } x \le y \le 1, \end{cases}$$

as depicted in Figure 2.

Now that we know the density $p(y \mid x)$ for each $x$, we can look for the equilibrium density $q(y)$ of the longer chromosome; in our original notation, $y = l_1$ and $l_2 = 1 - y$. The equilibrium $q(y)$ must satisfy

$$q(y) = \int_{1/2}^{1} q(x)\, p(y \mid x)\, dx = 2(1-y) \int_{1/2}^{y} \frac{q(x)}{x(1-x)}\, dx + 2 \int_{y}^{1} \frac{q(x)}{x}\, dx.$$

Differentiating once gives $q'(y) = -2 \int_{1/2}^{y} \frac{q(x)}{x(1-x)}\, dx$ (so that $q'(1/2) = 0$); differentiating again, we obtain the differential equation

$$y(1-y)\, q''(y) + 2q(y) = 0,$$

whose solution, normalized to integrate to 1, is

$$q(y) = 12y(1-y)$$

on the interval $[1/2, 1]$. The mean of the density $q$ is 11/16.

Figure 1 Areas corresponding to length distribution delimited by the line $Ux + V(1-x) = 1/2$ joining points a and b and the line $Ux + V(1-x) = y$ joining points c and d.
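The closed form can be checked numerically. The sketch below (all helper names are illustrative) evaluates the right-hand side of the integral equation by the midpoint rule, splitting at $x = y$ where $p(y \mid x)$ changes form, and also confirms that $q$ integrates to 1 on $[1/2, 1]$ with mean 11/16.

```python
def p_given(y, x):
    """p(y | x): density of the longer new length y, given the longer old
    length x, with 1/2 <= x < 1 (illustrative helper)."""
    if 0.5 <= y <= x:
        return 2.0 / x
    if x < y <= 1.0:
        return 2.0 * (1.0 - y) / (x * (1.0 - x))
    return 0.0


def q(y):
    """Candidate equilibrium density of l_1 on [1/2, 1]."""
    return 12.0 * y * (1.0 - y)


def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def rhs(y):
    """Integral of q(x) p(y | x) over [1/2, 1], split at x = y."""
    def integrand(x):
        return q(x) * p_given(y, x)
    return midpoint(integrand, 0.5, y) + midpoint(integrand, y, 1.0)


total = midpoint(q, 0.5, 1.0)
mean = midpoint(lambda y: y * q(y), 0.5, 1.0)
worst = max(abs(rhs(y) - q(y)) for y in (0.55, 0.6, 0.7, 0.8, 0.9, 0.95))
print(f"integral of q = {total:.6f}, mean = {mean:.6f}, 11/16 = {11 / 16}")
print(f"largest |rhs(y) - q(y)| at the test points: {worst:.1e}")
```

On $[1/2, y]$ the integrand $q(x)p(y \mid x)$ reduces to the constant $24(1-y)$, and on $[y, 1]$ to $24(1-x)$, which is why the midpoint rule reproduces $q$ almost to machine precision.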
How do these results compare with other random processes for dividing the interval $[0, 1]$ into two segments? The simplest such process would cut the interval at a point chosen uniformly at random and then take the larger piece as $l_1$ and the other as $l_2$. In this case the mean of $l_1$ would be 3/4, which is larger than 11/16.
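The random-lengths alternative extends naturally to $k$ pieces by cutting $[0, 1]$ at $k - 1$ independent uniform points. A standard result on uniform spacings, assumed here rather than taken from the text, gives the expected length of the $i$th longest piece as $\frac{1}{k}\sum_{j=i}^{k} \frac{1}{j}$; for $k = 2$ this is 3/4 for the longer piece. The sketch checks the formula against a Monte Carlo run; the function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_ordered_spacings(k):
    """Exact E[length of i-th longest piece], i = 1..k, when [0, 1] is cut
    at k - 1 independent uniform points: (1/k) * sum_{j=i}^{k} 1/j."""
    return [Fraction(1, k) * sum(Fraction(1, j) for j in range(i, k + 1))
            for i in range(1, k + 1)]


def simulated_ordered_spacings(k, trials, rng):
    """Monte Carlo estimate of the same expectations."""
    totals = [0.0] * k
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(k - 1))
        pieces = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
        for rank, piece in enumerate(sorted(pieces, reverse=True)):
            totals[rank] += piece
    return [t / trials for t in totals]


print(expected_ordered_spacings(2))   # [Fraction(3, 4), Fraction(1, 4)]
print([round(float(e), 3) for e in expected_ordered_spacings(3)])
print([round(s, 3) for s in simulated_ordered_spacings(3, 50_000, random.Random(0))])
```

For $k = 3$ the exact values 11/18, 5/18, and 1/9 (about 0.611, 0.278, 0.111) are close to the random fragmentation (M3) column of Table 1.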
Is there biological evidence that might decide between the translocation model and the random lengths model? Unfortunately, there are not many species with only two chromosomes. One well-known example is the plant Haplopappus gracilis (Jackson 1957), where the sizes of the larger and smaller chromosomes are in the ratio of 5:3 (or 62.5:37.5). Thus, the translocation model (69:31) fits better than the random lengths model (75:25), though we cannot place too much weight on this single case.

Figure 3 Joint probability densities for longest and shortest chromosomes. [Four panels, one for each region delimited by the boundaries $l = 1/2$ and $m = 1/3$.]
Three Chromosomes
Because each translocation involves just two
chromosomes, the analysis for three or more
chromosomes reduces in some aspects to the case
k = 2. Complications arise, however, because the
two new chromosomes resulting from a translo-
cation involving the ith and the jth largest chromosome may change the rank of the lengths of
several or all of the chromosomes unaffected by
the translocation itself.
To model the translocation process, we need
to specify how pairs of chromosomes are chosen
for each event. The most natural postulate is that
the probability $P(i,j)$ of choosing the $i$th and the $j$th largest chromosomes is proportional to the sum of their lengths:
$$P(i,j) = \frac{l_i + l_j}{\sum_{i'<j'} (l_{i'} + l_{j'})} = \frac{l_i + l_j}{k-1},$$

where $l_i$ and $l_j$ are the lengths of the two chromosomes; for $k = 3$ this is $(l_i + l_j)/2$. In Simulations (below) we also discuss the model where this probability is $1/\binom{k}{2}$, independent of the lengths of the chromosomes.

Table 1. Simulated Mean Chromosome Lengths l_i for Karyotypes of Varying Numbers of Chromosomes k, Based on the Proportional Model (M1), the Uniform Model (M2), and Random Fragmentation (M3). Rows run from the longest (l_1) to the shortest (l_k) chromosome.

k = 2     M1     M2     M3
l_1     0.687  0.688  0.750
l_2     0.313  0.312  0.250

k = 3     M1     M2     M3
l_1     0.580  0.536  0.611
l_2     0.298  0.304  0.277
l_3     0.122  0.160  0.111

k = 4     M1     M2     M3
l_1     0.501  0.443  0.520
l_2     0.279  0.275  0.271
l_3     0.153  0.180  0.146
l_4     0.067  0.101  0.062

k = 5     M1     M2     M3
l_1     0.446  0.381  0.457
l_2     0.260  0.250  0.257
l_3     0.161  0.177  0.156
l_4     0.092  0.121  0.090
l_5     0.040  0.070  0.040

k = 10    M1     M2     M3
l_1     0.290  0.231  0.293
l_2     0.193  0.169  0.193
l_3     0.143  0.136  0.143
l_4     0.110  0.113  0.109
l_5     0.085  0.095  0.084
l_6     0.065  0.079  0.064
l_7     0.048  0.065  0.048
l_8     0.034  0.051  0.033
l_9     0.021  0.038  0.021
l_10    0.010  0.023  0.010

k = 20    M1     M2     M3
l_1     0.180  0.136  0.180
l_2     0.130  0.106  0.130
l_3     0.105  0.090  0.105
l_4     0.088  0.080  0.088
l_5     0.076  0.071  0.076
l_6     0.066  0.065  0.066
l_7     0.057  0.059  0.057
l_8     0.050  0.053  0.050
l_9     0.044  0.049  0.044
l_10    0.038  0.044  0.038
l_11    0.033  0.040  0.033
l_12    0.029  0.037  0.029
l_13    0.025  0.033  0.025
l_14    0.021  0.030  0.021
l_15    0.017  0.026  0.017
l_16    0.013  0.023  0.013
l_17    0.011  0.019  0.011
l_18    0.008  0.016  0.008
l_19    0.005  0.012  0.005
l_20    0.002  0.008  0.002

In the case $k = 3$, given initial chromosome lengths $l \ge m \ge n$, the joint probability distribution of the length $X$ of the longest and $Z$ of the shortest of the three new chromosomes after a single translocation event¹ is

$$F_{l,n}(x,z) = \sum_{1 \le i < j \le 3} P(i,j)\, F^{(i,j)}(x,z),$$

where $F^{(i,j)}(x,z)$ is the distribution of these lengths given that the $i$th and the $j$th largest chromosomes are involved in the translocation. The quantity $F^{(i,j)}(x,z)$ is calculated in much the same way as $F_x(y)$ in The Two-chromosome Case (above), except that keeping track of the ranks of the lengths is more complicated. Consider for example the case $(i,j) = (2,3)$, where the second and third largest chromosomes, of lengths $m$ and $n$, respectively, are involved in the translocation. Then $X = \max[Um + Vn, (1-U)m + (1-V)n, l]$, $Z = \min[Um + Vn, (1-U)m + (1-V)n, l]$, and two subcases are to be considered:

(1) $l \ge 1/2$. Here $X = l$, so

$$F^{(2,3)}(x,z) = 0, \quad \text{for } x < l,$$

and

$$F^{(2,3)}(x,z) = \Pr[Z \le z] = \begin{cases} \dfrac{z^2}{mn}, & 0 \le z \le n,\\[4pt] \dfrac{2z - n}{m}, & n \le z \le \dfrac{m+n}{2},\\[4pt] 1, & \dfrac{m+n}{2} \le z \le \dfrac{1}{3}, \end{cases} \quad \text{for } x \ge l,$$

as can be calculated in much the same way as in The Two-chromosome Case.

(2) $l < 1/2$. Here $l \le X \le m + n$, so

$$F^{(2,3)}(x,z) = 0, \quad \text{for } x < l,$$

and

$$F^{(2,3)}(x,z) = \Pr[Z \le z], \quad \text{for } x > m + n,$$

where $\Pr[Z \le z]$ is given in case (1) above. For $l \le x \le m + n$, $F^{(2,3)}(x,z)$ corresponds to the area of the set of points $(U,V) \in [0,1] \times [0,1]$ for which $X \le x$ and $Z \le z$.

¹Given that the lengths of the chromosomes sum to 1, the length $Y$ of the second largest new chromosome is determined by $X$ and $Z$.
$$F^{(2,3)}(x,z) = \begin{cases} 0, & 0 \le z \le m+n-x,\\[4pt] \dfrac{z^2 - x^2 + 2x(m+n) - (m+n)^2}{mn}, & m+n-x \le z \le n,\\[4pt] \dfrac{2nz - x^2 + 2x(m+n) - n^2 - (m+n)^2}{mn}, & n \le z \le \dfrac{m+n}{2},\\[4pt] \dfrac{2x(m+n) - x^2 + mn - (m+n)^2}{mn}, & \dfrac{m+n}{2} \le z \le \dfrac{1}{3}. \end{cases}$$

Figure 4 Comparison of simulated mean chromosome lengths, based on the proportional and uniform models, with karyotypes from six species. [Panels: Muntiacus muntjak (k = 3), pea (k = 7), Zea mays (k = 10), Oryza sativa (k = 12), wheat (k = 21), and human (k = 22); each plots the data, the proportional model, and the uniform model against chromosome.] The corresponding NSS values for the proportional and uniform model are, respectively, Muntiacus muntjak, 0.052, 0.028; pea, 0.061, 0.031; Zea mays, 0.033, 0.014; Oryza sativa, 0.011, 0.004; wheat, 0.021, 0.009; human, 0.010, 0.003.
Figure 5 Comparison of simulated mean chromosome lengths, based on the truncated proportional model, with karyotypes from six species. [Panels: Muntiacus muntjak (k = 3), pea (k = 7), Zea mays (k = 10), Oryza sativa (k = 12), wheat (k = 21), and human (k = 22); each plots the data and the truncated model against chromosome.]
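In case (2), with $m = 1 - l - n$, the corresponding conditional density is concentrated on the lines $x = l$ and $x = m + n - z$:

$$p^{(2,3)}(x, z \mid l, n) = 0, \quad \text{if } x \ne l \text{ and } x \ne m + n - z,$$

$$p^{(2,3)}(m + n - z, z \mid l, n) = \frac{2z}{mn}, \quad 0 \le z \le m + n - l,$$

$$p^{(2,3)}(l, z \mid l, n) = \begin{cases} \dfrac{2z}{mn}, & m + n - l \le z \le n,\\[4pt] \dfrac{2}{m}, & n \le z \le \dfrac{m+n}{2},\\[4pt] 0, & \dfrac{m+n}{2} \le z \le \dfrac{1}{3}. \end{cases}$$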
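The piecewise expressions for $F^{(2,3)}(x,z)$ in cases (1) and (2) can be spot-checked by sampling the breakpoints directly. The function `F23` below is a hypothetical transcription of those formulas, assuming $l \ge m \ge n$, $l + m + n = 1$, and $0 \le z \le 1/3$; `F23_monte_carlo` estimates the same probability from the maximum $X$ and minimum $Z$ of the three new lengths $Um + Vn$, $(1-U)m + (1-V)n$, and $l$.

```python
import random


def F23(x, z, l, n):
    """Hypothetical transcription of F^(2,3)(x, z) = Pr[X <= x, Z <= z] for a
    translocation between the 2nd and 3rd longest chromosomes (lengths m, n),
    with l >= m >= n, l + m + n = 1 and 0 <= z <= 1/3."""
    m = 1.0 - l - n
    s = m + n

    def pr_z(t):                       # Pr[Z <= t], case (1)
        if t <= n:
            return t * t / (m * n)
        if t <= s / 2:
            return (2 * t - n) / m
        return 1.0

    if x < l:
        return 0.0
    if l >= 0.5 or x > s:              # case (1), or case (2) with x > m + n
        return pr_z(z)
    if z <= s - x:                     # case (2), l <= x <= m + n
        return 0.0
    if z <= n:
        return (z * z - x * x + 2 * x * s - s * s) / (m * n)
    if z <= s / 2:
        return (2 * n * z - x * x + 2 * x * s - n * n - s * s) / (m * n)
    return (2 * x * s - x * x + m * n - s * s) / (m * n)


def F23_monte_carlo(x, z, l, n, trials, rng):
    """Estimate the same probability by sampling the breakpoints U, V."""
    m = 1.0 - l - n
    hits = 0
    for _ in range(trials):
        u, v = rng.random(), rng.random()
        a = u * m + v * n              # one new chromosome
        b = (1 - u) * m + (1 - v) * n  # the other
        if max(a, b, l) <= x and min(a, b, l) <= z:
            hits += 1
    return hits / trials


rng = random.Random(0)
for args in [(0.5, 0.1, 0.45, 0.2), (0.5, 0.25, 0.45, 0.2), (0.6, 0.18, 0.6, 0.15)]:
    print(args, round(F23(*args), 4), round(F23_monte_carlo(*args, 50_000, rng), 4))
```

The first two test points fall in case (2) ($l = 0.45 < 1/2$) and the third in case (1).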
Similar analyses yield $p^{(1,2)}$ and $p^{(1,3)}$. Each of Figures 3a-d depicts the three conditional densities for one of the four regions created by the two boundaries $l = 1/2$ and $m = 1/3$. Weighting these three densities by $P(i,j)$ and summing them yields $p(x, z \mid l, n)$. Because the three conditional densities are concentrated on one-dimensional subspaces of the $(x,z)$ space, which are disjoint except for one point at which all three intersect, $p(x, z \mid l, n)$ has essentially the composite form of $p^{(1,2)}$, $p^{(2,3)}$, and $p^{(1,3)}$.

Setting

$$p(x, z \mid l, n) = \sum_{1 \le i < j \le 3} P(i,j)\, p^{(i,j)}(x, z \mid l, n),$$

the equilibrium density $q$ should satisfy the integral equation

$$q(x,z) = \int_{1/2}^{1} \int_{0}^{(1-l)/2} p(x, z \mid l, n)\, q(l,n)\, dn\, dl + \int_{1/3}^{1/2} \int_{1-2l}^{(1-l)/2} p(x, z \mid l, n)\, q(l,n)\, dn\, dl.$$

The solution to this equation requires investigating separately the dozens of regions within which each of the $p^{(i,j)}$ does not change form, and it is not known whether there is a simple expression for the solution analogous to the case $k = 2$.
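Because no closed form is available for $k = 3$, the equilibrium means can be estimated by running the chain itself and averaging the sorted lengths over every step of a long run. The following sketch does this under illustrative assumptions: the equal-length starting karyotype, the absence of burn-in, the seed, and the function name `simulate_mean_lengths` are all choices made here, not details taken from the text. For $k = 2$ the estimate of $E_q(l_1)$ should approach 11/16.

```python
import random


def simulate_mean_lengths(k, steps=100_000, weighting="proportional", seed=0):
    """Estimate E_q(l_1), ..., E_q(l_k) for the translocation chain.

    Runs `steps` translocations starting from equal lengths (an arbitrary
    choice) and averages the sorted lengths over all steps, without burn-in.
    weighting: "proportional" (P(i, j) proportional to l_i + l_j) or
    "uniform" (P(i, j) = 1 / C(k, 2)).
    """
    if weighting not in ("proportional", "uniform"):
        raise ValueError(f"unknown weighting: {weighting!r}")
    rng = random.Random(seed)
    lengths = [1.0 / k] * k
    totals = [0.0] * k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    for _ in range(steps):
        weights = ([lengths[i] + lengths[j] for i, j in pairs]
                   if weighting == "proportional" else None)
        i, j = rng.choices(pairs, weights=weights)[0]   # None -> equal weights
        u, v = rng.random(), rng.random()
        li, lj = lengths[i], lengths[j]
        lengths[i] = u * li + v * lj
        lengths[j] = (1 - u) * li + (1 - v) * lj
        lengths.sort(reverse=True)                       # reindex by rank
        for rank, x in enumerate(lengths):
            totals[rank] += x
    return [t / steps for t in totals]


if __name__ == "__main__":
    for model in ("proportional", "uniform"):
        means = simulate_mean_lengths(3, weighting=model)
        print(model, [round(m, 3) for m in means])
```

Both weightings coincide for $k = 2$, since there is only one pair to choose.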
Simulations

The difficulties already encountered for $k = 3$ oblige us to undertake computer simulations to estimate the expected length of the longest, second longest, ..., $k$th longest chromosome, for $k \ge 3$. If $q(l_1, \ldots, l_k)$ is the equilibrium joint density function on the domain $l_1 \ge \cdots \ge l_k$, our task was to estimate $E_q(l_i)$, for $i = 1, \ldots, k$. Our approach was simply to carry out the experiment described in Random Reciprocal Translocations (above) for 100,000 steps and to average the lengths $l_1, \ldots, l_k$ over all the steps.

The experiments were carried out with two choices of weight function $P(i,j)$. First, we postulated that $P(i,j)$ is proportional to the sum of the lengths $l_i$ and $l_j$:

$$P(i,j) = \frac{l_i + l_j}{\sum_{i'<j'} (l_{i'} + l_{j'})} = \frac{l_i + l_j}{k-1}.$$

A second set of runs assumed this probability to be $1/\binom{k}{2}$, independent of the lengths of the chromosomes, and we will call this the uniform model.

In addition, the results of the translocation experiments were compared with the outcome of simply fragmenting the unit interval into $k$ segments, using $k - 1$ random breakpoints selected according to the uniform distribution.

Table 1 shows that, aside from small values of $k$, the proportional translocation model is very close to the random fragmentation model. We also see in Table 1 that the length-independent (uniform) translocation model results in a more uniform distribution of expected lengths, whereas the proportional model predicts a wider range of lengths.

Comparisons with Some Known Karyotypes and a Truncated Model

In The Two-chromosome Case (above), we showed how the proportional translocation model fits the H. gracilis data better than the random lengths model. Similarly, we compared karyotypes (chosen for illustrative purposes from among those depicted in King 1975; Lima-de-Faria 1980; Swanson et al. 1981) from species with a range of values of $k$ (Fig. 4) with the results of the simulations described in Simulations (above). As measured by a normalized sum of squares

$$\mathrm{NSS} = \frac{1}{k} \sum_{i=1}^{k} \frac{(l_i - L_i)^2}{L_i},$$
where $L_i$ denotes the empirical lengths, the
uniform model fits somewhat more closely
than either the proportional model or the
random fragmentation model. It can be seen,
however, that the predictions of all translocation models are systematically biased toward
too large a range of chromosome lengths and
that this bias is more important than the
differences between the models.
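For concreteness, the normalized sum of squares can be computed as below, taking it in the form $\frac{1}{k}\sum_i (l_i - L_i)^2 / L_i$. The function name is hypothetical, and the usage example applies it, purely as an illustration, to the two-chromosome H. gracilis ratio 5:3 against the translocation model (11/16 : 5/16) and the random lengths model (3/4 : 1/4).

```python
from fractions import Fraction


def nss(model, observed):
    """Normalized sum of squares between model mean lengths l_i and
    observed lengths L_i: (1/k) * sum_i (l_i - L_i)**2 / L_i.
    Both sequences must be in the same (nonincreasing) order."""
    if len(model) != len(observed) or not observed:
        raise ValueError("model and observed must be nonempty and equal length")
    k = len(observed)
    return sum((l - L) ** 2 / L for l, L in zip(model, observed)) / k


gracilis = [Fraction(5, 8), Fraction(3, 8)]          # ratio 5:3
translocation = [Fraction(11, 16), Fraction(5, 16)]  # k = 2 equilibrium means
random_lengths = [Fraction(3, 4), Fraction(1, 4)]    # random cut
print(nss(translocation, gracilis))   # 1/120
print(nss(random_lengths, gracilis))  # 1/30
```

Exact fractions keep the comparison free of rounding; with floating-point inputs the same function returns a float.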
Physicochemical considerations of rates of chromosome transport during mitosis and meiosis suggest that genomes combining very large and very small chromosomes might be at a disadvantage. From the point of view of modeling, this could be handled by prohibiting any translocation resulting in a chromosome of length below a certain threshold. This "truncation" approach is also justified at the cytogenetic level, where a viable and functional chromosome must minimally contain a centromere and two telomeres (and at least one gene whose function is not duplicated elsewhere in the genome). This imposes a lower bound on the size of a chromosome on a purely structural basis. Finally, from the genetic viewpoint, there is reason to believe that for meiosis to be completed successfully, each chromosome must be of length sufficient for at least one crossover to be expected among the four aligned strands before they segregate into two pairs.
We redid the simulations of the proportional model corresponding to each empirical data set, fixing a threshold equal to the smallest observed chromosome size. As seen in Figure 5, this results in a great improvement in the fit of the models, greater than might have been expected simply by virtue of adding an additional parameter to the model.

Except for the very largest chromosomes in most of the species, the fit is much improved. Given the rather preliminary nature of this exercise, including the choice of karyotypes based only on their fortuitous availability to the authors, no attempt was made to optimize the truncation threshold. We did, however, compare a model with truncation of awkwardly large chromosomes instead of excessively reduced ones. Though the fit with the real data was of course better for the longest chromosomes, it was much worse than the lower bound truncation when it came to the smallest chromosomes, and the overall fit tended to be worse, as measured by the same normalized sum of squares used in Figure 4. Similarly, a comparison with a truncated uniform model showed no improvement over the results in Figure 5.
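The truncation rule can be grafted onto the proportional simulation. The text does not say what happens to a prohibited translocation; this sketch assumes it is simply discarded, so the karyotype is left unchanged for that step (a rejection rule), and it also reports the fraction of discarded proposals. The function name, the equal-length starting karyotype, and the requirement that the threshold not exceed $1/k$ are illustrative assumptions.

```python
import random


def simulate_truncated(k, threshold, steps=100_000, seed=0):
    """Proportional translocation model with a lower-bound truncation rule.

    A proposed translocation is discarded (the karyotype is kept unchanged
    for that step) if either new chromosome would be shorter than
    `threshold`. Returns (mean sorted lengths over all steps,
    fraction of proposals discarded).
    """
    if not 0.0 <= threshold <= 1.0 / k:
        raise ValueError("threshold must lie in [0, 1/k]")
    rng = random.Random(seed)
    lengths = [1.0 / k] * k            # equal lengths satisfy any allowed threshold
    totals = [0.0] * k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    discarded = 0
    for _ in range(steps):
        weights = [lengths[i] + lengths[j] for i, j in pairs]
        i, j = rng.choices(pairs, weights=weights)[0]
        u, v = rng.random(), rng.random()
        li, lj = lengths[i], lengths[j]
        a = u * li + v * lj
        b = (1 - u) * li + (1 - v) * lj
        if min(a, b) < threshold:
            discarded += 1             # prohibited translocation
        else:
            lengths[i], lengths[j] = a, b
            lengths.sort(reverse=True)
        for rank, x in enumerate(lengths):
            totals[rank] += x
    return [t / steps for t in totals], discarded / steps


if __name__ == "__main__":
    for thr in (0.0, 0.1):
        means, frac = simulate_truncated(4, thr)
        print(thr, [round(m, 3) for m in means], round(frac, 3))
```

Setting `threshold` to the smallest observed chromosome length mimics the choice described above; an equally hypothetical alternative would redraw the pair and breakpoints until an admissible translocation is found.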
Discussion

Recently, there has been much work on genomic distances (Sankoff et al. 1992; Sankoff 1992, 1993a,b) inferred through the number of inversions (Kececioglu and Sankoff 1994, 1995; Hannenhalli 1995; Hannenhalli and Pevzner 1995), transpositions (Bafna and Pevzner 1995), and/or translocations (Hannenhalli and Pevzner 1995; Kececioglu and Ravi 1995) necessary to transform one observed genome into another. Little work has been done, however, on quantifying the incidence and chromosomal scope of these processes, especially on a comparative basis. For example, the algorithmic inference literature implicitly assumes that all rearrangement events of a given type are equally likely, independent of how large a segment they affect. Further modeling should compare the results of this type of assumption with other empirically motivated weighting schemes, so that inference problems can be formulated and solved in a biologically more meaningful way. Thus, our demonstration of the plausibility of the truncation model should have consequences for the problems studied in Hannenhalli and Pevzner (1995) and Kececioglu and Ravi (1995).

It must be acknowledged that no truncation model can be universally satisfactory, for a number of reasons. First, some genomes, for example, in Aves, contain large numbers of very small "dot" chromosomes, so that no threshold mechanism seems operative, at least in these cases. Second, and more importantly, translocations resulting in very small chromosomes, especially with any remaining genes duplicated elsewhere, seem just as likely to appear as chromosome fusions, reducing $k$, and it seems essential to incorporate this possibility into the model.

We have mentioned the necessity of eventually applying our models to chromosome arms, rather than entire chromosomes. This task will be complicated by the process of centromere movement in the course of evolution, often in a systematic way across all chromosomes, as in the mouse genome.

Another direction for research involves the incorporation of heterogeneity of breaking susceptibility of chromosomes along their lengths, from the telomeric to centromeric zones and from heterochromatic to euchromatic regions.

ACKNOWLEDGMENTS

We thank Gopalakrishnan Sundaram for his help in setting up the simulation experiments. Thanks are also due to Erica Jen for encouragement and suggestions for the mathematical analysis, to William F. Grant for pointers on the cytogenetics literature and for the references to H. gracilis and M. muntjak, and to David Baillie, Bronya Keats, and Joseph H. Nadeau for discussions of the truncation model. Research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Genome Analysis and Technology Program. D.S. is a Fellow of the Canadian Institute for Advanced Research.
The publication costs of this article were defrayed in
part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance
with 18 USC section 1734 solely to indicate this fact.
REFERENCES
Bafna, V. and P.A. Pevzner. 1995. Sorting by transpositions. Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 614-623.
Hannenhalli, S. 1995. Polynomial algorithm for computing translocation distance between genomes. Proceedings of the 6th Symposium on Combinatorial Pattern Matching, Springer-Verlag Lecture Notes Comput. Sci.: 162-176.
Hannenhalli, S. and P.A. Pevzner. 1995. Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In Proceedings of the 27th Annual ACM-SIAM Symposium on the Theory of Computing, pp. 178-189. ACM, New York, NY.
Jackson, R.C. 1957. New low chromosome number for plants. Science 126: 1115-1116.
Kececioglu, J. and R. Ravi. 1995. Of mice and men. Evolutionary distances between genomes under translocation. Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 604-613.
Kececioglu, J. and D. Sankoff. 1994. Efficient bounds for oriented chromosome inversion distance. Proceedings of the Fifth Symposium on Combinatorial Pattern Matching (Springer-Verlag Lecture Notes in Computer Science) 807: 307-325.
Kececioglu, J. and D. Sankoff. 1995. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica 13: 180-210.
King, R.C. 1975. Handbook of genetics. Plenum Press, New York, NY.
Lima-de-Faria, A. 1980. How to produce a human with 3 chromosomes and 1000 primary genes. Hereditas 93: 47-73.
Nadeau, J.H. and B.A. Taylor. 1984. Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Nat. Acad. Sci. 81: 814.
Sankoff, D. 1992. Edit distance for genome comparison based on non-local operations. Proceedings of the Third Symposium on Combinatorial Pattern Matching (Springer-Verlag Lecture Notes in Computer Science) 644: 121-135.
Sankoff, D. 1993a. Analytical approaches to genomic evolution. Biochimie 75: 409-413.
Sankoff, D. 1993b. Models and analyses of genomic evolution. In Second International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis.
Sankoff, D., G. Leduc, N. Antoine, B. Paquin, B.F. Lang, and R. Cedergren. 1992. Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome. Proc. Nat. Acad. Sci. 89: 6575-6579.
Schulz-Schaeffer, J. 1980. Cytogenetics. Springer-Verlag, New York, NY.
Swanson, C.P., T. Merz, and W.J. Young. 1981. Cytogenetics, 2nd ed. Prentice Hall, Englewood Cliffs, NJ.

Received May 11, 1995; accepted in revised form December 14, 1995.