Downloaded from genome.cshlp.org on March 16, 2011. Published by Cold Spring Harbor Laboratory Press.

Karyotype distributions in a stochastic model of reciprocal translocation. D Sankoff and V Ferretti. Genome Res. 1996 6: 1-9. Access the most recent version at doi:10.1101/gr.6.1.1

References: This article cites 6 articles, 3 of which can be accessed free at: http://genome.cshlp.org/content/6/1/1.refs.html
Article cited in: http://genome.cshlp.org/content/6/1/1#related-urls
To subscribe to Genome Research go to: http://genome.cshlp.org/subscriptions
Copyright © Cold Spring Harbor Laboratory Press

RESEARCH

Karyotype Distributions in a Stochastic Model of Reciprocal Translocation

David Sankoff¹ and Vincent Ferretti
Centre de Recherches Mathématiques, Université de Montréal, Quebec H3C 3J7, Canada

A random process of reciprocal translocation for a fixed number $k$ of chromosomes (or arms) will have an equilibrium distribution of chromosome lengths. In this paper we calculate this distribution, by analytical means for $k = 2$ and partially for $k = 3$, and simulate the means of the marginal distributions for higher $k$. We compare this with a random (i.e., ahistorical) distribution of genomic DNA among $k$ chromosomes and with a selection of karyotypes of real organisms. The results motivate a revised model where translocations giving rise to undersize chromosomes are disadvantaged.

The number, size, and centromeric position of its chromosomes are the most evident properties of the karyotype of a species.
Because overall genomic DNA content is rather variable and does not have systematic phylogenetic pertinence, the distribution of chromosome, or chromosome arm, length (measured cytogenetically, genetically, or as DNA content), normalized by total length, is a meaningful characteristic of a given organism for comparative purposes.

Over the course of evolution, the gross characteristics of a karyotype are altered by processes such as genome fusion, chromosome fusion and fission, reciprocal translocation, paracentric inversions, duplication, deletion, and insertion of genomic material. It is a tenet of mammalian genomics that the distribution of conserved chromosomal segments evident in the comparison of two relatively divergent species can be accounted for by repeated reciprocal translocations, each involving two breakpoints occurring more or less at random along the arms of two chromosomes (Nadeau and Taylor 1984), though of course noncoding regions and heterochromatin, centromeric, and telomeric regions have all been cited as particularly susceptible to the breaking process. From an evolutionary point of view, a reciprocal translocation occurs when arms of two chromosomes break simultaneously and are each rejoined to the "wrong" chromosome (for detailed descriptions, see Schulz-Schaeffer 1980; Swanson et al. 1981).

¹Corresponding author. E-MAIL [email protected]; FAX (514) 343-2254.

A random process of reciprocal translocation for a fixed number $k$ of chromosomes (or arms) will have an equilibrium distribution of chromosome lengths. In this paper we calculate this distribution, by analytical means for $k = 2$ and partially for $k = 3$, and simulate the density for higher $k$. We compare this with a random (i.e., ahistorical) distribution of genomic DNA among $k$ chromosomes and with a selection of karyotypes of real organisms.
The results motivate a revised model where translocations giving rise to undersize chromosomes are disadvantaged.

Random Reciprocal Translocations

We define a stochastic model for $k \ge 2$ chromosomes without taking into account the fact that the chromosomal segments exchanged by translocations do not contain centromeres. This same model can be used, and is perhaps more properly used, when $k$ represents the number of arms.

Let $l_1, \ldots, l_k$ be the lengths of the $k$ chromosomes of a karyotype at time $t$, where $l_1 \ge \cdots \ge l_k$ and where $\sum_i l_i = 1$. Choose two different chromosomes, for example, the $i$th and the $j$th, according to some probability distribution $P(i,j)$, which is either uniform ($= 1/\binom{k}{2}$) or depends on the lengths $l_i$. Pick a breakpoint at random on each of the two chromosomes, breaking them into segments of length $U l_i$, $(1-U) l_i$, $V l_j$, $(1-V) l_j$, respectively. Then we re-form a karyotype at time $t+1$ containing chromosomes of length $l_1, \ldots, U l_i + V l_j, \ldots, (1-U) l_i + (1-V) l_j, \ldots, l_k$, which then must be reindexed so that the lengths of the chromosomes are in monotone nonincreasing order. This process is repeated indefinitely. As the number of iterations approaches infinity, the probability that the length of the $i$th longest chromosome is in a certain interval will converge. Let $q(l_1, \ldots, l_k)$ be the joint equilibrium probability density of the lengths of the longest, second longest, ..., shortest chromosome, respectively. The following sections are devoted to the calculation of this density.

6:1-9 ©1996 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/96 $5.00

Figure 2. Probability density for length of longer chromosome.
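The translocation step defined in Random Reciprocal Translocations can be sketched in a few lines of Python. This is a minimal sketch and not the authors' program: the function names `translocate` and `mean_ranked_lengths`, the uniform choice of the chromosome pair, the equal-length starting karyotype, and the seeding are all assumptions made for illustration.

```python
import random


def translocate(lengths, rng):
    """Apply one reciprocal translocation to a list of lengths summing to 1.

    Assumption: the pair (i, j) is drawn uniformly among all pairs.
    """
    i, j = rng.sample(range(len(lengths)), 2)
    u, v = rng.random(), rng.random()          # breakpoints U and V, uniform on [0, 1]
    li, lj = lengths[i], lengths[j]
    updated = list(lengths)
    updated[i] = u * li + v * lj               # first recombined chromosome
    updated[j] = (1 - u) * li + (1 - v) * lj   # second recombined chromosome
    updated.sort(reverse=True)                 # reindex: longest first
    return updated


def mean_ranked_lengths(k, steps, seed=0):
    """Average the ranked lengths over `steps` successive translocations."""
    rng = random.Random(seed)
    lengths = [1.0 / k] * k                    # assumed starting karyotype
    totals = [0.0] * k
    for _ in range(steps):
        lengths = translocate(lengths, rng)
        for rank, value in enumerate(lengths):
            totals[rank] += value
    return [total / steps for total in totals]


if __name__ == "__main__":
    print(mean_ranked_lengths(k=2, steps=100_000, seed=1))
```

With `k=2` there is only one possible pair, so the choice of $P(i,j)$ does not matter, and the long-run mean of the longer chromosome should settle near the value 11/16 obtained analytically in The Two-chromosome Case.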
The Two-chromosome Case

To simplify the notation, let $x = l_1$ and $1 - x = l_2$ be the lengths of the two initial chromosomes, and let $U$ and $V$ be two independent random numbers between 0 and 1. Then the two new chromosomes have lengths $A = Ux + V(1-x)$ and $1 - A = (1-U)x + (1-V)(1-x)$, respectively. Let $Y = \max[A, 1-A]$ be the length of the longer of the two, and let $F_x(y) = \mathrm{Prob}[Y \le y \mid x]$.

Consider the two-dimensional square $[0,1] \times [0,1]$ that is the domain of $(U,V)$. When $A \ge 1 - A$, then $Y \le y$ if $(U,V)$ lies between the lines $Ux + V(1-x) = 1/2$ and $Ux + V(1-x) = y$, as indicated in Figure 1. This region has area

$$\frac{2y-1}{2x} \ \text{ if } \tfrac12 \le y \le x, \qquad \text{or} \qquad \frac12 - \frac{(1-y)^2}{2x(1-x)} \ \text{ if } x \le y \le 1.$$

When $A \le 1 - A$, by symmetry an equal area is contributed to the probability that $Y \le y$. Then

$$F_x(y) = \begin{cases} \dfrac{2y-1}{x}, & \tfrac12 \le y \le x \\[2mm] 1 - \dfrac{(1-y)^2}{x(1-x)}, & x \le y \le 1. \end{cases}$$

The density of this probability is

$$p(y \mid x) = \begin{cases} \dfrac{2}{x}, & \tfrac12 \le y \le x \\[2mm] \dfrac{2(1-y)}{x(1-x)}, & x \le y \le 1, \end{cases}$$

as depicted in Figure 2.

Figure 1. Areas corresponding to length distribution delimited by the line $Ux + V(1-x) = 1/2$ joining points a and b and the line $Ux + V(1-x) = y$ joining points c and d.

Now that we know the density $p(y \mid x)$ for each $x$, we can look for the equilibrium density $q(y)$; in our original notation, $q(y) = q(l_1, l_2)$ with $l_1 = y$ and $l_2 = 1 - y$. The equilibrium $q(y)$ must satisfy

$$q(y) = \int_{1/2}^{1} q(x)\, p(y \mid x)\, dx = 2(1-y) \int_{1/2}^{y} \frac{q(x)}{x(1-x)}\, dx + 2 \int_{y}^{1} \frac{q(x)}{x}\, dx.$$

Differentiating twice, we obtain the differential equation

$$y(1-y)\, q''(y) + 2 q(y) = 0,$$

whose solution is

$$q(y) = 12\, y(1-y)$$

on the interval $[1/2, 1]$. The mean of the density $q$ is $11/16$.

How do these results compare with other random processes for dividing the interval $[0,1]$ into two segments?
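The closed form $q(y) = 12y(1-y)$ and its mean 11/16 can be checked numerically against the integral equation above. The sketch below is an illustration only: the helper names `kernel`, `midpoint`, and `equilibrium_rhs` and the midpoint quadrature are assumptions of this example, not part of the paper.

```python
def q(y):
    """Candidate equilibrium density on [1/2, 1]."""
    return 12.0 * y * (1.0 - y)


def kernel(y, x):
    """Transition density p(y | x) for the longer chromosome, 1/2 <= x, y <= 1."""
    if y <= x:
        return 2.0 / x
    return 2.0 * (1.0 - y) / (x * (1.0 - x))


def midpoint(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def equilibrium_rhs(y, n=2000):
    """Right-hand side of q(y) = integral of q(x) p(y | x) dx, split at x = y."""
    below = midpoint(lambda x: q(x) * kernel(y, x), 0.5, y, n)
    above = midpoint(lambda x: q(x) * kernel(y, x), y, 1.0, n)
    return below + above


if __name__ == "__main__":
    for y in (0.5, 0.6, 0.75, 0.9):
        print(y, q(y), equilibrium_rhs(y))                     # the two columns should agree
    print("total mass:", midpoint(q, 0.5, 1.0))                # should be close to 1
    print("mean:", midpoint(lambda y: y * q(y), 0.5, 1.0))     # should be close to 11/16
```

The integral is split at $x = y$ because that is where the kernel changes form, which keeps each piece smooth for the quadrature.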
The simplest such process would cut the interval at a point randomly chosen in the interval and then take the larger piece as $l_1$ and the other as $l_2$. In this case the mean of the equilibrium density would be $3/4$, which is larger than $11/16$.

Is there biological evidence that might decide between the translocation model and the random lengths model? Unfortunately, there are not many species with only two chromosomes. One well-known example is the grass Haplopappus gracilis (Jackson 1957), where the sizes of the larger and smaller chromosomes are in the ratio of 5:3 (or 62.5:37.5). Thus, the translocation model (69:31) fits better than the random lengths model (75:25), though we cannot place too much weight on this single case.

Figure 3. Joint probability densities for longest and shortest chromosomes. (Four panels: $l \le 1/2$, $m \le 1/3$; $l \ge 1/2$, $m \le 1/3$; $l \le 1/2$, $m > 1/3$; $l \ge 1/2$, $m > 1/3$.)

Three Chromosomes

Because each translocation involves just two chromosomes, the analysis for three or more chromosomes reduces in some aspects to the case $k = 2$. Complications arise, however, because the two new chromosomes resulting from a translocation involving the $i$th and the $j$th largest chromosome may change the rank of the lengths of several or all of the chromosomes unaffected by the translocation itself.

To model the translocation process, we need to specify how pairs of chromosomes are chosen for each event.
The most natural postulate is that the probability $P(i,j)$ of choosing the $i$th and the $j$th largest chromosome is proportional to their lengths:

P(i,j) = [expression illegible in this copy],

where $l_i$ and $l_j$ are the lengths of the two chromosomes. In Simulations (below) we also discuss the model where this probability is $1/\binom{k}{2}$, independent of the lengths of the chromosomes.

Table 1. Simulated Mean Chromosome Lengths $l_i$ for Karyotypes of Varying Numbers of Chromosomes $k$, Based on the Proportional Model (M1), the Uniform Model (M2), and Random Fragmentation (M3). Within each block the values are listed in increasing order.

k = 2
l_i     M1      M2      M3
l_1     0.313   0.312   0.250
l_2     0.687   0.688   0.750

k = 3
l_i     M1      M2      M3
l_1     0.122   0.160   0.111
l_2     0.298   0.304   0.277
l_3     0.580   0.536   0.611

k = 4
l_i     M1      M2      M3
l_1     0.067   0.101   0.062
l_2     0.153   0.180   0.146
l_3     0.279   0.275   0.271
l_4     0.501   0.443   0.520

k = 5
l_i     M1      M2      M3
l_1     0.040   0.070   0.040
l_2     0.092   0.121   0.090
l_3     0.161   0.177   0.156
l_4     0.260   0.250   0.257
l_5     0.446   0.381   0.457

k = 10
l_i     M1      M2      M3
l_1     0.010   0.023   0.010
l_2     0.021   0.038   0.021
l_3     0.034   0.051   0.033
l_4     0.048   0.065   0.048
l_5     0.065   0.079   0.064
l_6     0.085   0.095   0.084
l_7     0.110   0.113   0.109
l_8     0.143   0.136   0.143
l_9     0.193   0.169   0.193
l_10    0.290   0.231   0.293

k = 20
l_i     M1      M2      M3
l_1     0.002   0.008   0.002
l_2     0.005   0.012   0.005
l_3     0.008   0.016   0.008
l_4     0.011   0.019   0.011
l_5     0.013   0.023   0.013
l_6     0.017   0.026   0.017
l_7     0.021   0.030   0.021
l_8     0.025   0.033   0.025
l_9     0.029   0.037   0.029
l_10    0.033   0.040   0.033
l_11    0.038   0.044   0.038
l_12    0.044   0.049   0.044
l_13    0.050   0.053   0.050
l_14    0.057   0.059   0.057
l_15    0.066   0.065   0.066
l_16    0.076   0.071   0.076
l_17    0.088   0.080   0.088
l_18    0.105   0.090   0.105
l_19    0.130   0.106   0.130
l_20    0.180   0.136   0.180

In the case $k = 3$, given initial chromosome lengths $l \ge m \ge n$, the joint probability distribution of the length $X$ of the longest and $Z$ of the shortest of the three new chromosomes after a single translocation event¹ is

$$F_{l,n}(x,z) = \sum_{1 \le i < j \le 3} P(i,j)\, F^{(i,j)}_{l,n}(x,z),$$

where $F^{(i,j)}_{l,n}(x,z)$ is the distribution of these lengths given that the $i$th and the $j$th largest chromosomes are involved in the translocation. The quantity $F^{(i,j)}_{l,n}(x,z)$ is calculated in much the same way as $F_x(y)$ in The Two-chromosome Case (above), except that keeping track of the ranks of the lengths is more complicated.

¹Given that the lengths of the chromosomes sum to 1, the length $Y$ of the second largest new chromosome is determined by $X$ and $Z$.

Consider for example the case $(i,j) = (2,3)$, where the second and third largest chromosomes, of length $m$ and $n$, respectively, are involved in the translocation. Then

$$X = \max[\,Um + Vn,\ (1-U)m + (1-V)n,\ l\,], \qquad Z = \min[\,Um + Vn,\ (1-U)m + (1-V)n,\ l\,],$$

and two subcases are to be considered:

(1) $l \ge 1/2$. Here, $X = l$, so $F^{(2,3)}_{l,n}(x,z) = 0$ for $x < l$, and for $x \ge l$,

$$F^{(2,3)}_{l,n}(x,z) = \mathrm{Prob}[Z \le z] = \begin{cases} \dfrac{z^2}{mn}, & 0 \le z \le n \\[2mm] \dfrac{2z-n}{m}, & n \le z \le \dfrac{m+n}{2} \\[2mm] 1, & \dfrac{m+n}{2} \le z \le \dfrac13, \end{cases}$$

as can be calculated in much the same way as in The Two-chromosome Case.

(2) $l < 1/2$. Here $l \le X \le m+n$, so $F^{(2,3)}_{l,n}(x,z) = 0$ for $x < l$ and $F^{(2,3)}_{l,n}(x,z) = P[Z \le z]$ for $x > m+n$, where $P[Z \le z]$ is given in case 1 above. For $l \le x \le m+n$, $F^{(2,3)}_{l,n}(x,z)$ corresponds to the area of the set of points $(U,V) \in [0,1] \times [0,1]$ for which $X \le x$ and $Z \le z$.
Figure 4. Comparison of simulated mean chromosome lengths, based on the proportional and uniform models, with karyotypes from six species. (Six panels, each plotting length against chromosome rank for the data, the proportional model, and the uniform model: Muntiacus muntjak, k = 3; pea, k = 7; Zea mays, k = 10; Oryza sativa, k = 12; wheat, k = 21; human, k = 22.) The corresponding NSS values for the proportional and uniform model are, respectively, Muntiacus muntjak, 0.052, 0.028; pea, 0.061, 0.031; Zea mays, 0.033, 0.014; Oryza sativa, 0.011, 0.004; wheat, 0.021, 0.009; human, 0.010, 0.003.
Explicitly, for $l \le x \le m+n$,

$$F^{(2,3)}_{l,n}(x,z) = \begin{cases}
0, & 0 \le z \le m+n-x \\[2mm]
\dfrac{z^2 - x^2 + 2x(m+n) - (m+n)^2}{mn}, & m+n-x \le z \le n \\[2mm]
\dfrac{2nz - x^2 + 2x(m+n) - n^2 - (m+n)^2}{mn}, & n \le z \le \dfrac{m+n}{2} \\[2mm]
\dfrac{2x(m+n) - x^2 + mn - (m+n)^2}{mn}, & \dfrac{m+n}{2} \le z \le \dfrac13.
\end{cases}$$

The density $p^{(2,3)}(x,z \mid l,n)$ of this probability distribution vanishes everywhere in the domain $[1/3, 1] \times [0, 1/3]$ except on the lines $x = m+n-z$ and $x = l$. More precisely, in case 1, $p^{(2,3)}(x,z \mid l,n) = 0$ if $x \ne l$, and

$$p^{(2,3)}(l,z \mid l,n) = \begin{cases} \dfrac{2z}{mn}, & 0 \le z \le n \\[2mm] \dfrac{2}{m}, & n \le z \le \dfrac{m+n}{2} \\[2mm] 0, & \dfrac{m+n}{2} \le z \le \dfrac13. \end{cases}$$

In case 2, $p^{(2,3)}(x,z \mid l,n) = 0$ if $x \ne l$ and $x \ne m+n-z$,

$$p^{(2,3)}(m+n-z,\, z \mid l,n) = \frac{2z}{mn}, \qquad 0 \le z \le m+n-l,$$

and

$$p^{(2,3)}(l,z \mid l,n) = \begin{cases} \dfrac{2z}{mn}, & m+n-l \le z \le n \\[2mm] \dfrac{2}{m}, & n \le z \le \dfrac{m+n}{2} \\[2mm] 0, & \dfrac{m+n}{2} \le z \le \dfrac13. \end{cases}$$

Similar analyses yield $p^{(1,2)}$ and $p^{(3,1)}$. Each of Figure 3a-d depicts the three conditional densities for one of the four regions created by the two boundaries $l = 1/2$, $m = 1/3$. Weighting these three densities by $P(i,j)$ and summing them yields $p(x,z \mid l,n)$. Because the three conditional densities are concentrated on one-dimensional subspaces of the $(x,z)$ space, which are disjoint except for one point at which all three intersect, $p(x,z \mid l,n)$ has essentially the composite form of $p^{(1,2)}$, $p^{(2,3)}$, and $p^{(3,1)}$. Setting

$$p(x,z \mid l,n) = \sum_{1 \le i < j \le 3} P(i,j)\, p^{(i,j)}(x,z \mid l,n),$$

the equilibrium density $q$ should satisfy the integral equation

$$q(x,z) = \int_{1/3}^{1/2} \int_{1-2l}^{(1-l)/2} p(x,z \mid l,n)\, q(l,n)\, dn\, dl + \int_{1/2}^{1} \int_{0}^{(1-l)/2} p(x,z \mid l,n)\, q(l,n)\, dn\, dl.$$

The solution to this equation requires investigating separately the dozens of regions within which each of the $p^{(i,j)}$ does not change form, and it is not known whether there is a simple expression for the solution analogous to the case $k = 2$.

Figure 5. Comparison of simulated mean chromosome lengths, based on the truncated proportional model, with karyotypes from six species. (Six panels, each plotting length against chromosome rank for the data and the truncated model.) The corresponding NSS values for this model are Muntiacus muntjak, 0.007; pea, 0.003; Zea mays, 0.007; Oryza sativa, 0.007; wheat, [illegible]; human, [illegible].

Simulations

The difficulties already encountered for $k = 3$ oblige us to undertake computer simulations to estimate the expected length of the longest, second longest, ..., $k$th longest chromosome, for $k \ge 3$. If $q(l_1, \ldots, l_k)$ is the equilibrium joint density function on the domain $l_1 \ge \cdots \ge l_k$, our task was to estimate $E_q(l_i)$, for $i = 1, \ldots, k$. Our approach was simply to carry out the experiment described in Random Reciprocal Translocations (above) for 100,000 steps and to average the lengths $l_1, \ldots, l_k$ over all the steps.

The experiments were carried out with two choices of weight function $P(i,j)$. First, we postulated that $P(i,j)$ is proportional to the lengths $l_i$ and $l_j$:

P(i,j) = [expression illegible in this copy].

A second set of runs assumed this probability to be $1/\binom{k}{2}$, independent of the lengths of the chromosomes, and we will call this the uniform model. In addition, the results of the translocation experiments were compared with the outcome of simply fragmenting the unit interval into $k$ segments, using $k - 1$ random breakpoints selected according to the uniform distribution.

Table 1 shows that aside from small values of $k$ the proportional translocation model is very close to the random fragmentation model. We also see in Table 1 that the length-independent translocation model results in a more uniform distribution of expected lengths, whereas the proportional model predicts a wider range of lengths.

Comparisons with Some Known Karyotypes and a Truncated Model

In The Two-chromosome Case (above), we showed how the proportional translocation model fits the H. gracilis data better than the random lengths model. Similarly, we compared karyotypes (chosen for illustrative purposes from among those depicted in King 1975; Lima-de-Faria 1980; Swanson et al. 1981) from species with a range of values of $k$ (Fig. 4) with the simulations in Simulations (above). As measured by a normalized sum of squares

$$\mathrm{NSS} = \frac{1}{k} \sum_{i=1}^{k} \left( \frac{l_i - L_i}{L_i} \right)^2,$$

where $L$ measures the empirical lengths, the uniform model fits somewhat more closely than either the proportional model or the random fragmentation model.
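The random fragmentation column (M3) of Table 1 can be cross-checked without simulation. The sketch below relies on a standard result for uniform spacings that is not stated in the paper: when the unit interval is cut at $k - 1$ independent uniform points, the expected length of the $i$th shortest piece is $\frac{1}{k}\sum_{j=0}^{i-1}\frac{1}{k-j}$. The function names are hypothetical, and the Monte Carlo routine is included only as an independent check on the formula.

```python
import random
from fractions import Fraction


def expected_fragment_lengths(k):
    """Exact expected piece lengths, shortest first, for k - 1 uniform cuts of [0, 1]."""
    running = Fraction(0)
    result = []
    for j in range(k):
        running += Fraction(1, k * (k - j))   # adds (1/k) * 1/(k - j)
        result.append(running)
    return result


def simulated_fragment_lengths(k, trials, seed=0):
    """Monte Carlo estimate of the same expectations (shortest first)."""
    rng = random.Random(seed)
    sums = [0.0] * k
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(k - 1))
        edges = [0.0] + cuts + [1.0]
        pieces = sorted(right - left for left, right in zip(edges, edges[1:]))
        for rank, piece in enumerate(pieces):
            sums[rank] += piece
    return [s / trials for s in sums]


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, [round(float(v), 3) for v in expected_fragment_lengths(k)])
```

For $k = 3$ the exact values are 1/9, 5/18, and 11/18, which agree with the M3 entries of Table 1 to within about 0.001.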
It can be seen, however, that the predictions of all translocation models are systematically biased toward too large a range of chromosome lengths and that this bias is more important than the differences between the models.

Physical chemical considerations of rates of chromosome transport during mitosis and meiosis suggest that genomes combining very large and very small chromosomes might be at a disadvantage. From the point of view of modeling, this could be handled by prohibiting any translocation resulting in a chromosome of length below a certain threshold. This "truncation" approach is also justified at the cytogenetic level, where a viable and functional chromosome must minimally contain a centromere and two telomeres (and at least one gene whose function is not duplicated elsewhere in the genome). This imposes a lower bound on the size of a chromosome, on a purely structural basis. Finally, from the genetic viewpoint, there is reason to believe that for meiosis to be completed successfully, each chromosome must be of length sufficient for at least one crossover to be expected among the four aligned strands before they segregate into two pairs.

We redid the simulations of the proportional model corresponding to each empirical data set, fixing a threshold equal to the smallest observed chromosome size. As seen in Figure 5, this results in a great improvement in the fit of the models, greater than might have been expected simply by virtue of adding an additional parameter to the model. It can be seen that except for the very largest chromosomes in most of the species, the fit is much improved.
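The truncation rule described above, prohibiting any translocation that would produce a chromosome shorter than a threshold, can be sketched as a rejection step. This is an illustrative sketch under stated assumptions rather than the authors' procedure: the pair is chosen uniformly here (the paper's truncated runs used the proportional model), a prohibited event simply leaves the karyotype unchanged and still counts as a step in the average, and the function names are hypothetical.

```python
import random


def truncated_translocation(lengths, threshold, rng):
    """Attempt one translocation; reject it if either product is below threshold.

    Returns (lengths, accepted). Pair choice is uniform (assumption).
    """
    i, j = rng.sample(range(len(lengths)), 2)
    u, v = rng.random(), rng.random()
    first = u * lengths[i] + v * lengths[j]
    second = (1 - u) * lengths[i] + (1 - v) * lengths[j]
    if min(first, second) < threshold:
        return lengths, False                  # prohibited event: karyotype unchanged
    updated = list(lengths)
    updated[i], updated[j] = first, second
    updated.sort(reverse=True)                 # reindex: longest first
    return updated, True


def truncated_means(k, threshold, steps, seed=0):
    """Average ranked lengths under the truncated process (rejected steps included)."""
    rng = random.Random(seed)
    lengths = [1.0 / k] * k                    # assumed start; requires threshold <= 1/k
    totals = [0.0] * k
    for _ in range(steps):
        lengths, _accepted = truncated_translocation(lengths, threshold, rng)
        for rank, value in enumerate(lengths):
            totals[rank] += value
    return [total / steps for total in totals]


if __name__ == "__main__":
    print(truncated_means(k=4, threshold=0.1, steps=50_000, seed=1))
```

Because the starting karyotype already respects the threshold and every accepted event does too, no chromosome ever falls below it. Whether a rejected event should be retried or skipped is a modeling choice the text does not settle.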
Given the rather preliminary nature of this exercise, including the choice of karyotypes based only on their fortuitous availability to the authors, no attempt was made to optimize the truncation threshold. We did, however, compare a model with truncation of awkwardly large chromosomes instead of excessively reduced ones. Though the fit with the real data was of course better for the longest chromosomes, it was much worse than the lower bound truncation when it came to the smallest chromosomes, and the overall fit tended to be worse, as measured by the same normalized sum of squares used in Figure 4. Similarly, a comparison with a truncated uniform model was no improvement over the results in Figure 5.

Discussion

Recently, there has been much work on genomic distances (Sankoff et al. 1992; Sankoff 1992, 1993a,b) inferred through the number of inversions (Kececioglu and Sankoff 1994, 1995; Hannenhalli 1995; Hannenhalli and Pevzner 1995), transpositions (Bafna and Pevzner 1995), and/or translocations (Hannenhalli and Pevzner 1995; Kececioglu and Ravi 1995) necessary to transform one observed genome into another. Little work has been done, however, on quantifying the incidence and chromosomal scope of these processes, especially on a comparative basis. For example, the algorithmic inference literature implicitly assumes that all rearrangement events of a given type are equally likely, independent of how large a segment they affect. Further modeling should compare the results of this type of assumption, versus other empirically motivated weighting schemes, so that inference problems can be formulated and solved in a biologically more meaningful way. Thus, our demonstration of the plausibility of the truncation model should have consequences for the problems studied in Hannenhalli and Pevzner (1995) and Kececioglu and Ravi (1995).

It must be acknowledged that no truncation model can be universally satisfactory, for a number of reasons. First, some genomes, for example, in Aves, contain large numbers of very small "dot" chromosomes, so that no threshold mechanism seems operative, at least in these cases. Second, and more importantly, translocations resulting in very small chromosomes, especially with any remaining genes duplicated elsewhere, seem just as likely to appear as chromosome fusions, reducing $k$, and it seems essential to incorporate this possibility into the model.

We have mentioned the necessity of eventually applying our models to chromosome arms, rather than entire chromosomes. This task will be complicated by the process of centromere movement in the course of evolution, often in a systematic way across all chromosomes, as in the mouse genome.

Another direction for research involves the incorporation of heterogeneity of breaking susceptibility of chromosomes along their lengths from the telomeric to centromeric zones and from heterochromatic to euchromatic regions.

ACKNOWLEDGMENTS

We thank Gopalakrishnan Sundaram for his help in setting up the simulation experiments. Thanks are also due to Erica Jen for encouragement and suggestions for the mathematical analysis, to William F. Grant for pointers on the cytogenetics literature and for the references to H. gracilis and M. muntjak, and to David Baillie, Bronya Keats, and Joseph H. Nadeau for discussions of the truncation model.
Research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Genome Analysis and Technology Program. D.S. is a Fellow of the Canadian Institute for Advanced Research.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

REFERENCES

Bafna, V. and P.A. Pevzner. 1995. Sorting by transpositions. Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 614-623.

Hannenhalli, S. 1995. Polynomial algorithm for computing translocation distance between genomes. Proceedings of the 6th Symposium on Combinatorial Pattern Matching, Springer-Verlag Lecture Notes Comput. Sci.: 162-176.

Hannenhalli, S. and P.A. Pevzner. 1995. Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In Proceedings of the 27th Annual ACM-SIAM Symposium on the Theory of Computing, pp. 178-189. ACM, New York, NY.

Jackson, R.C. 1957. New low chromosome number for plants. Science 126: 1115-1116.

Kececioglu, J. and R. Ravi. 1995. Of mice and men. Evolutionary distances between genomes under translocation. Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 604-613.

Kececioglu, J. and D. Sankoff. 1994. Efficient bounds for oriented chromosome inversion distance. Proceedings of the Fifth Symposium on Combinatorial Pattern Matching (Springer Verlag Lecture Notes in Computer Science) 807: 307-325.

Kececioglu, J. and D. Sankoff. 1995. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica 13: 180-210.

King, R.C. 1975. Handbook of genetics. Plenum Press, New York, NY.

Lima-de-Faria, A. 1980. How to produce a human with 3 chromosomes and 1000 primary genes. Hereditas 93: 47-73.

Nadeau, J.H. and B.A. Taylor. 1984. Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Nat. Acad. Sci. 81: 814.

Sankoff, D. 1992. Edit distance for genome comparison based on non-local operations. Proceedings of the Third Symposium on Combinatorial Pattern Matching (Springer Verlag Lecture Notes in Computer Science) 644: 121-135.

Sankoff, D. 1993a. Analytical approaches to genomic evolution. Biochimie 75: 409-413.

Sankoff, D. 1993b. Models and analyses of genomic evolution. In Second International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis.

Sankoff, D., G. Leduc, N. Antoine, B. Paquin, B.F. Lang, and R. Cedergren. 1992. Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome. Proc. Nat. Acad. Sci. 89: 6575-6579.

Schulz-Schaeffer, J. 1980. Cytogenetics. Springer-Verlag, New York, NY.

Swanson, C.P., T. Merz, and W.J. Young. 1981. Cytogenetics, 2nd ed. Prentice Hall, Englewood Cliffs, NJ.

Received May 11, 1995; accepted in revised form December 14, 1995.