Pseudo-chromosome assembly of large and complex genomes using multiple references 1 2 2 3,4 5 5,6 2 1 1 Mikhail Kolmogorov , Brian Raney , Joel Armstrong , Duncan Odom , Paul Flicek , David Thybert , Benedict Paten and Son Pham 1 University of California San Diego, USA 2University of California Santa Cruz, USA, 3University of Cambridge, UK, 4 Wellcome Trust Sanger Institute, UK, 5European Bioinformatic Institute, UK, 6The Genome Analysis Center, UK Abstract Incomplete Breakpoint Graph Analysis • Assembly of mammalian-scale genomes into complete chromosomes is challenging Ref. 1 a d c • To address this, we developed Ragout, a referenceassisted assembly tool for large and complex genomes Ref. 2 a b c Target a • Taking as input an NGS assembly and multiple related references, Ragout infers their evolutionary relationship and builds the final assembly of the target genome using a genome rearrangement approach • Using Ragout we assembled two mice genomes (M. Caroli and M. Pahari) with comlicated chromosome-scale rearrangements into sets of high-quality pseudochromosomes • Chromosome coloring comfirms most the rearrangements that Ragout has detected d at b b d c ct dh ch bh bt Algorithm Overview (a) A A C GT G C TA G G C G AT G G a d c Ref. 2 A A C G G AT G G G G C T G C TA a b c C G ATAT G Target d G C TA b b (b) Nucleotide sequences are decomposed into the permutations of synteny blocks d a c C G AT (c) (d) at dh Ref. 2 Ref. 1 Target ct (e) ah (d) Incomplete breakpoint graph reflects adjacencies between synteny blocks ch bh at (c) Phylogenetic tree of the input genomes is reconstructed based on the breakpoints data ah dt bt (e) Missing adjacencies are reconstructed by analysing rearrangements (f) dt ct C G AT dh C G ATAT G G C TA ch bh • Ragout recovers the missing adjacencies so as to minimize the weighted number of rearrangements between the genomes • These adjacencies are then used to merge the target fragments into scaffolds (a) Raw references and target genome sequences (b) Ref. 1 • If all genomes were complete, the edges of each color will define a perfect matching on the graph • As the target genome is fragmented, some adjacencies of red color are missing ah dt • Breakpoint graphs reflect adjacencies between synteny blocks in different genomes bt Results • We assembled two genomes from Murinae family: Mus. Caroli and Mus. Pahari using Mus. Musculus and Rattus Norvegicus as references. The assemblies contains 20 and 23 pseudo-chromosomes, respectively with at most 2% of unlocalized sequence. • M. Caroli shows 5% sequence diversity from Mus. Musculus and has the same karyotype (which was confirmed by Ragout). However, we have detected a large inversion in chr17. • M. Pahari has 10% sequence diversity and contain many interchromosomal rearrangements. Ragout has detected four of them, which were also confirmed by chromosome coloring. • Some of the rearrangements remain undetected, as the corresponding breakpoints are missing from the NGS data. However, they could be recovered using the aid of chromosomal maps. (f) Target fragments are joined into scaffolds with respect to the inferred adjacencies Synteny Blocks (a) G1 (b) a1 a2 a5 a3 a2 a1 G2 G3 a1 a2 a1 a3 a5 a4 a5 a4 a3 a3 a5 a4 (c) (d) a123 a4 a123 a5 a5 a4 a4 (e) (f) a1235 G1 a1235 a1 a2 a3 a5 Synteny blocks help to separate small sequence variations from largescale rearrangements. (a) An alignment between three genomes with complicated sub-structure. (b) A-Bruijn graph representation of the alignment. Small sequence variations correspond to bubbles, while rearrangements form more complicated structures. (c) A bubble is removed during the graph simplification, forming a larger synteny block a123(d) Masking smaller block a4 allows to make the graph structure even simplier. (e) After removing another bubble, the whole alignment is represented as a large synteny block. (d) A hierarchical representation of a synteny block. Availability & Contacts • Ragout is an easy to use package, written in Python/C+. It is freely available at http://fenderglass.github.io/Ragout • Email: [email protected]
© Copyright 2025 Paperzz