Lecture: Topic 4: Structural variation detection

Review of Structural Variations
•  Structural variations are variations in the structure of an organism’s chromosomes.
•  Usually refers to microscopic or submicroscopic types, such as:
   –  deletions, duplications, copy-number variations, insertions, inversions and translocations
•  Typically, an SV affects about 1 kb to 3 Mb, which is larger than a SNP and smaller than a chromosomal abnormality.

Fluorescence in situ hybridization (FISH)
•  (Cancer genomes show extensive structural variation.)
•  Historically, larger structural variations (those easily observed under a microscope) were commonly studied, mostly in the context of diseases…

An insertion (also called an insertion mutation) is the addition of one or more nucleotides. Insertions can range in size from a single base pair incorrectly inserted into a DNA sequence to a section of one chromosome inserted into another.

Insertions and Tay-Sachs
Tay-Sachs disease is an autosomal recessive genetic disorder.
•  Deterioration of mental and physical abilities.
•  Common in some specific populations.
•  A four base pair insertion results in an altered reading frame for the HEXA gene on chr 15.
•  Heterozygous carriers show abnormal enzyme activity, but manifest no disease symptoms.
The HEXA gene is located on the long (q) arm of human chromosome 15, between positions 23 and 24.

Deletions
•  A deletion is a mutation in which part of a chromosome or a sequence of DNA is missing.
•  Small deletions may not be fatal.
•  Large deletions are usually fatal.
•  Medium-sized deletions often lead to recognizable phenotypes (and in humans, disorders).

Effects of Deletions
•  Responsible for an array of genetic disorders, including infertility, muscular dystrophy, and rare genetic disorders (e.g. spinal muscular atrophy, cri du chat).
•  Recent work suggests that some deletions of highly conserved sequences (CONDELs) may be responsible for the evolutionary differences present among closely related species (e.g. humans, chimpanzees and other mammals).
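The altered reading frame mentioned for the four base pair HEXA insertion can be illustrated with a toy example. The sketch below is only an illustration: the 12-base sequence and the inserted bases are arbitrary placeholders (not HEXA sequence), and `translate` and `insert_bases` are hypothetical helper names. It shows that an insertion whose length is not a multiple of three changes every downstream codon, while a three-base insertion adds one amino acid and leaves the rest intact.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def insert_bases(dna, pos, bases):
    """Return dna with `bases` inserted before index `pos`."""
    return dna[:pos] + bases + dna[pos:]


# Toy coding sequence (assumption, not a real gene): Met-Ala-Lys-stop.
reference = "ATGGCCAAGTGA"
frameshifted = insert_bases(reference, 3, "TATC")  # 4 bases: frame shifts
in_frame = insert_bases(reference, 3, "TAT")       # 3 bases: frame kept

print(translate(reference))     # MAK*
print(translate(frameshifted))  # MYRQV  (downstream codons all changed, stop lost)
print(translate(in_frame))      # MYAK*  (one extra residue, rest unchanged)
```
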
Copy Number Variants
•  Alterations of the DNA of a genome that result in the cell having an abnormal number of copies of one or more sections of the DNA.
•  Elevating the copy number of a particular gene can up-regulate that gene.
•  CNVs have been associated with susceptibility or resistance to disease.
•  They are associated with autism, schizophrenia, learning disabilities, autoimmune disorders, and cancer.
•  Higher copy number of CCL3L1 has been associated with lower susceptibility to HIV infection.
•  Most common CNVs have little or no role in causing disease.

Detection of SVs Using Paired-End Short Read Data

Paired-End Data
•  The paired ends of a clone help identify deformities/structural variation in the donor genome.
•  Some SVs are copy neutral (inversions), while others are copy number variant (deletions/duplications).
•  Besides raw detection, there are a number of problems that we might want to solve computationally.

Paired-end short read data (insert distribution known due to fragment size selection)

Methods for Detecting SVs
•  Mate-pair or paired-end mapping abnormalities
•  Read-depth signals
•  Split read alignments
Reads are aligned using BWA or some other alignment tool.
[Figures: examples of these signals, including insertion and deletion cases]

Pindel (Ye et al.): detection of large deletions and medium-sized insertions.
•  Some reads may not be mapped because
they are just across the break points of
deletion events.
•  If we can find a proper position to break the
read into two fragments and map them
separately, we will be able to compute the
exact break points and the fragment deleted
compared to the reference.

Detecting Deletion Events
© The Author(s) 2009. Published by Oxford University Press.
Ye K et al. Bioinformatics 2009;25:2865-2871
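The split-read idea described above (break an unmapped read into two fragments, map them separately, and read off the breakpoints) can be sketched on plain strings. This is a deliberately simplified toy, not Pindel's actual algorithm: it uses exact substring search on a short made-up reference, takes the first match of each fragment, and does not use the mapped mate to restrict the search region. The function name `find_deletion` and the `min_anchor` parameter are assumptions made for this illustration.

```python
def find_deletion(ref, read, min_anchor=4):
    """Try to explain `read` as two exact fragments of `ref` separated by a gap.

    Returns (start, end) so that ref[start:end] is the deleted segment,
    or None if no split with both fragments >= min_anchor bases is found.
    """
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        p = ref.find(left)
        if p == -1:
            break  # a longer prefix cannot match if this one does not
        # The right fragment must map downstream, skipping at least one base.
        q = ref.find(right, p + k + 1)
        if q != -1:
            return p + k, q
    return None


# Toy reference (assumption): left flank + deleted segment + right flank.
ref = "GATTACAGGA" + "TTTTCCCC" + "AGCTAGGATC"
# A donor read that spans the deletion: end of left flank + start of right flank.
read = "ACAGGA" + "AGCTAG"

breakpoints = find_deletion(ref, read)
print(breakpoints)                             # (10, 18)
print(ref[breakpoints[0]:breakpoints[1]])      # TTTTCCCC
```

When the bases flanking a deletion repeat the bases at its edge, several split points explain the read equally well; this sketch simply reports the leftmost one.
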
Existing Programs
•  Pindel: detects breakpoints of large deletions and medium-sized insertions.
•  VariationHunter/BreakDancer: detects all SVs.
•  CommonLaw/GenomeSTRiP: detects SVs by sequencing on a population scale.
•  BreakFusion/deFuse/SOAPfusion/Gene deFuser: detects gene fusion events.
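The paired-end mapping signal listed under Methods for Detecting SVs relies on the insert size distribution being known from fragment size selection. A minimal sketch of how a single read pair might be labeled is shown below. It assumes a forward/reverse (FR) paired-end library, uses a simple mean ± n·SD cutoff, and makes up all names (`ReadPair`, `classify_pair`) and thresholds for illustration; real callers such as those listed above cluster many supporting pairs and apply more careful statistics before reporting an SV.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost reference coordinate of read 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # leftmost reference coordinate of read 2
    strand2: str


def classify_pair(pair, read_len, mean_insert, sd_insert, n_sd=3.0):
    """Label one read pair by its mapping abnormality (FR library assumed)."""
    if pair.chrom1 != pair.chrom2:
        return "translocation candidate"
    if pair.strand1 == pair.strand2:
        return "inversion candidate"
    # Strand of whichever read maps further left; '+' is expected for FR pairs.
    left_strand = pair.strand1 if pair.pos1 <= pair.pos2 else pair.strand2
    if left_strand == "-":
        return "tandem duplication candidate"  # reads point away from each other
    span = abs(pair.pos2 - pair.pos1) + read_len  # outer distance on the reference
    if span > mean_insert + n_sd * sd_insert:
        return "deletion candidate"    # donor fragment lacks reference sequence
    if span < mean_insert - n_sd * sd_insert:
        return "insertion candidate"   # donor fragment has extra sequence
    return "concordant"


# Hypothetical library: 100 bp reads, insert size 300 +/- 20.
params = dict(read_len=100, mean_insert=300, sd_insert=20)
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1200, "-"), **params))
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 3000, "-"), **params))
```

With these made-up parameters the first pair spans 300 bases and is concordant, while the second spans 2100 bases and is flagged as a deletion candidate.
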