Sequence

Stuff to Do
Midterm I
questions due 1/31
• Email me your question (with answers):
– if you can, email the complete question, figures and all;
– if not, write out the question with instructions, e.g., “in Figure
2 of x paper, blah, blah, blah.”
• Friday afternoon, I’ll post the questions on the WEB page;
on Monday, you’ll have time in class to work on them together.
Question scale: 0 pts = memory only; 5 pts = memory + analysis (integration of information).
Cycle Sequencing
Chain Termination
…a DNA polymerase application.
Reaction: Template + Primer + Taq DNA Polymerase w/ Buffer + a mixture of dNTPs and ddNTPs, run through repeated Cycles.
Polymerization continues until Taq incorporates a ddNTP.
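The chain-termination logic can be sketched in Python. This is a minimal, idealized model, not a lab protocol: the function names are hypothetical, the template is assumed to be written 3'->5' starting just past the primer, and every position is assumed to receive a ddNTP in some molecule once enough cycles have run.

```python
# Idealized model of chain-termination (cycle) sequencing.
# Assumptions: error-free Taq, and enough cycles that a ddNTP is
# eventually incorporated at every position of the new strand.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """New strand built from the primer, written 5'->3'.

    `template` is written 3'->5', starting just past the primer, so each
    new base is the complement of the template base opposite it.
    """
    return "".join(COMPLEMENT[base] for base in template)


def termination_fragments(template: str) -> dict[str, list[int]]:
    """Lengths (bases past the primer) of products, grouped by terminal ddNTP."""
    fragments: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, base in enumerate(synthesized_strand(template)):
        # Polymerization stops here whenever the ddNTP for `base` is added.
        fragments[base].append(i + 1)
    return fragments


def read_sequence(fragments: dict[str, list[int]]) -> str:
    """Order every product by length (shortest first) and read its terminal base."""
    ordered = sorted((length, base)
                     for base, lengths in fragments.items()
                     for length in lengths)
    return "".join(base for _, base in ordered)


if __name__ == "__main__":
    template = "TACGGAT"  # hypothetical template, 3'->5'
    fragments = termination_fragments(template)
    print(fragments)
    print(read_sequence(fragments))  # ATGCCTA
```

Sorting products by length stands in for separating them by size; reading the terminal ddNTP of each product in order recovers the new strand.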
Cycle Sequence Tutor (linked on the course WEB page)
…and an animation:
http://www.dnalc.org/shockwave/cycseq.html
Disclaimer: this review is heavily biased toward the public sequencing consortium.
Hierarchical Clone-by-Clone
Map First: then sequence
Whole Genome Assembly
Sequence First: then map
Genome Sequencing Strategy
#1
Clone-by-Clone Approach
– Order clones along the genome, then sequence,
• not dependent on acceleration of sequencing capacity,
• not dependent on advanced computer analysis,
• not dependent on as-yet-undeveloped sequencing technologies,
• “repeats” not as big a problem?
• heavy up-front demand for human labor.
Clone-by-Clone
Ordered Approach
Online Primer: mapping
Genomic Libraries
…how many clones
to cover a genome?
Vectors
(carry insert DNA)
Vector                                  Host      Inserts
Plasmid                                 E. coli   up to 15 kb
Phage                                   E. coli   up to 25 kb
Cosmid (plasmid/phage hybrid)           E. coli   up to 45 kb
BAC (Bacterial Artificial Chromosome)   E. coli   100-500 kb
YAC (Yeast Artificial Chromosome)       Yeast     250-1000 kb
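Genomic Sequences and Coverage
$$N = \frac{\ln(1 - p)}{\ln\left(1 - \frac{v}{G}\right)} = \frac{\ln(1 - 0.9999)}{\ln\left(1 - \frac{v}{2{,}900{,}000{,}000}\right)}$$
N = number of clones needed; p = probability of finding a clone carrying any given sequence (here 0.9999); v = average vector insert size; G = genome size (2,900,000,000 bp).
plasmid (5 kb) = 5.3 x 10^6 clones
phage (20 kb) = 1.3 x 10^6 clones
BAC (125 kb) = 2.1 x 10^5 clones
YAC (500 kb) = 5.3 x 10^4 (~53,000) clones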
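The coverage formula can be checked numerically with a short Python sketch. `clones_needed` is a hypothetical helper; p = 0.9999 and G = 2,900,000,000 bp are the values used on this slide. It uses `math.log1p`, which computes ln(1 + x) accurately when x is tiny, as v/G is here.

```python
import math

GENOME_SIZE = 2_900_000_000  # genome size used on the slide (bp)


def clones_needed(insert_size: int, p: float = 0.9999,
                  genome_size: int = GENOME_SIZE) -> float:
    """N = ln(1 - p) / ln(1 - v/G).

    Number of clones so that any given sequence is present in the
    library with probability p, for average insert size v.
    """
    # log1p(x) = ln(1 + x); passing negative x gives ln(1 - |x|).
    return math.log1p(-p) / math.log1p(-insert_size / genome_size)


if __name__ == "__main__":
    for name, v in [("plasmid", 5_000), ("phage", 20_000),
                    ("BAC", 125_000), ("YAC", 500_000)]:
        print(f"{name:8s} {v:>8,d} bp  N = {clones_needed(v):,.0f}")
```

The printed N for each insert size can be compared against the values listed above; larger inserts cut the library size roughly in proportion to v.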
Clone-by-Clone
Ordered Approach
Contigs
(Contiguous Sequences)
Find overlapping ends of Clone 1 and Clone 2 by…
…Sequence, or
…Restriction Fragment Length Polymorphisms (RFLPs).
Figure: Sequence Contig
RFLP
Restriction enzymes cut DNA at specific sequences…
…and the resulting fragment lengths
provide clone identification data.
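A restriction fingerprint can be sketched in Python: digest each clone in silico and compare fragment lengths. Assumptions: linear, error-free sequences; EcoRI (G^AATTC) as the enzyme; hypothetical helper names and a toy 44 bp "genome". Shared internal fragment lengths hint that two clones overlap.

```python
import re
from collections import Counter

ECORI = "GAATTC"  # EcoRI recognition site; cuts G^AATTC


def fragment_lengths(seq: str, site: str = ECORI, cut_offset: int = 1) -> list[int]:
    """Fragment lengths, in order, from digesting a linear clone at `site`.

    A lookahead regex finds every (possibly overlapping) occurrence;
    each cut falls `cut_offset` bases into the site.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def shared_fragments(clone_a: str, clone_b: str) -> list[int]:
    """Fragment lengths common to both clones (multiset intersection).

    Shared internal fragments suggest the clones overlap; clone-end
    fragments usually differ because inserts end at arbitrary points.
    """
    common = Counter(fragment_lengths(clone_a)) & Counter(fragment_lengths(clone_b))
    return sorted(common.elements())


if __name__ == "__main__":
    # Hypothetical 44 bp "genome" with three EcoRI sites.
    genome = "A" * 5 + ECORI + "C" * 7 + ECORI + "T" * 10 + ECORI + "G" * 4
    clone_1, clone_2 = genome[:38], genome[2:]
    print(fragment_lengths(clone_1))  # [6, 13, 19]
    print(fragment_lengths(clone_2))  # [4, 13, 16, 9]
    print(shared_fragments(clone_1, clone_2))  # [13]
```

Only the 13 bp fragment lies wholly inside the region both clones share, so it is the one length they have in common.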
Contigs
(Contiguous Sequences)
Find overlapping ends…
Merge good pairs of reads into
longer contigs…
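Merging overlapping reads into contigs can be illustrated with a greedy overlap assembler in Python. This is a minimal sketch assuming error-free reads; `overlap` and `greedy_merge` are hypothetical names, and greedy merging is only one strategy. It can wrongly collapse repeats, which real assemblers treat far more carefully.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`;
    0 if no such overlap is at least `min_len` long."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if n > 0 and a.endswith(b[:n]):
            return n
    return 0


def greedy_merge(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedy assembly: repeatedly merge the pair with the longest overlap."""
    # Drop reads contained in another read (they add no new sequence).
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no overlaps left: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]  # hypothetical error-free reads
    print(greedy_merge(reads))  # ['ATGCGTACGTTAGC']
```

Reads that share no overlap of at least `min_len` bases stay as separate contigs.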
Find the minimal Tiling Path:
- the minimum set of
overlapping clones that
covers the genome.
Minimal Tiling Path
Fig. 2
Identify minimal
overlapping clones.
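Once clone positions are known from the contig map, choosing a minimal tiling path is an interval-cover problem. The Python sketch below uses the standard greedy interval-cover algorithm as a stand-in (not necessarily the method used by the sequencing centers); clone names and coordinates are hypothetical, and real tiling also weighs how confident each fingerprint overlap is.

```python
def minimal_tiling_path(clones: dict[str, tuple[int, int]],
                        region_start: int, region_end: int) -> list[str]:
    """Fewest clones covering [region_start, region_end] (greedy interval cover).

    `clones` maps a clone name to its (start, end) map position, inclusive.
    Each chosen clone must overlap the previous one by at least one base.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path: list[str] = []
    # First pick must cover region_start; later picks must overlap `reach`.
    reach = region_start
    i = 0
    while reach < region_end:
        best_name, best_end = None, reach
        # Consider every clone starting at or before `reach`; keep the one
        # extending farthest to the right.
        while i < len(ordered) and ordered[i][1][0] <= reach:
            name, (_, end) = ordered[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError(f"gap in clone coverage after position {reach}")
        path.append(best_name)
        reach = best_end
    return path


if __name__ == "__main__":
    clones = {"A": (0, 120), "B": (30, 150), "C": (100, 260),
              "D": (140, 300), "E": (250, 400), "F": (290, 380)}  # hypothetical, kb
    print(minimal_tiling_path(clones, 0, 400))  # ['A', 'C', 'E']
```

Always taking the clone that reaches farthest right is what keeps the path minimal; B, D, and F are redundant because A, C, and E already overlap end to end.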
Shotgun Sequence Each Clone
Bacterial Artificial Chromosomes
BACs
• Universal Priming Sites,
– on the vector, flanking the genomic insert.
Shotgun
(self-quiz)
~ 8x - 10x coverage:
To shotgun sequence 10,000 bp,
you’d need 80k - 100k bp of
sequence, or ~160 - ~200
sequencing reactions (at 500 bp per reaction).
But 10,000 bp, at 500 bp per
sequencing reaction, could be
done in as few as 20 sequencing
reactions.
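The self-quiz arithmetic can be written as a small Python sketch. `reactions_needed` and `expected_uncovered_fraction` are hypothetical helpers; the second uses the standard Poisson (Lander-Waterman) idealization, which is not on the slide, to suggest why 1x coverage (20 reactions) is not enough: random reads at 1x leave roughly e^-1, about 37%, of bases unread, while 8x leaves about 0.03%.

```python
import math


def reactions_needed(target_bp: int, coverage: float, read_len: int = 500) -> int:
    """Sequencing reactions for `coverage`-fold redundancy over `target_bp`."""
    return math.ceil(target_bp * coverage / read_len)


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases no read hits.

    Assumes reads start at uniformly random positions.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    for c in (1, 8, 10):
        print(f"{c:>2}x: {reactions_needed(10_000, c)} reactions, "
              f"~{expected_uncovered_fraction(c):.1e} of bases unsequenced")
```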
Why Shotgun?
Contigs
QC
Structural Genomic Strategies
#2
Whole Genome Assembly Approach:
– Sequence first, then order,
• dependent on advances in computer analysis and
sequencing technologies,
• dependent on automated labor.
WGA
Read Pairs = Mate End Pairs
• Paired End Sequencing,
– sequence both ends of the vector insert, using
vector-derived primers,
• Maintain mate pair data.
Figure: insert within the vector; strands labeled 5’-3’ and 3’-5’.
Example Sequence Output
(example: 5 kb insert)
5’ read(543 bp)atatgtatattgaattacatacatattattaatgcacatttttatccggagttgtggaccatagaaagacatattgactcctca
aagtaaattctgcatgttacattgaaatcataggctaaatttgagatgcactatttttagaaagtgtagagaaaaggacaggaa
gaaataagcgaaagctttggtaagccaccaaacctgattactggaagaaaagaaaaaagttccgagaatagagttagatcgctg
gtgagggttttaaatggaacacaacaatggttgttttagagtgtgttattcttttgtatttataccttctcataggtttcttgt
aatacacgcttcttcctctctctccctctctcttatggcctcgtcttgaaagcgtcttgcatgctaagagaaggctttagagca
aggagagaagggagaagttgatttatacgtccatcggatatatcttctttttatatctgtctctcttttaaggaagaaaaatgg
cgactgaattctcgtgggatgaaatcaagaaagaaaatg...
- rest of insert (unsequenced, ~3.9 kb) ...ggcttgaaatatttggggcaaacaagcttgaagagaaatcagagaacaagtttttgaaattcttggggttcatgtggaatc
ctctctcatgggttatggagtctgctgcaatcatggctattgttttagctaatggaggaggaaaggcgccggattggcaagatt
ttatcggtattatggtgttgcttatcatcaactccaccataagtttcatcgaggagaacaatgctggcaatgccgctgctgctc
tcatggcaaatcttgcaccaaagactaaggtatgcaaatttctcaatacatatatataggtatgtattttctaaaaaggagagt
tatataacctatgtgtgaatgtaggtgttgagagatggtaaatggggggagcaagaggcttcaatcttggttccgggtgatttg
ataagcatcaaattgggtgacattgttcctgctgatgctcgtctcctcgaaggagatcctttaaaaattgaccaatctgctctt
actggtgaatcccttccaaccaccaaacacccaggagat - 3’ read(540 bp)
…plus trace data files associated with these sequence runs.
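The mate-pair bookkeeping can be sketched in Python. Assumptions: error-free reads, hypothetical function names, and the read lengths (543 and 540 bp) and 5 kb insert taken from the example above. The 3’ read is primed from the opposite side of the insert, so it reads the other strand and comes out as the reverse complement of the insert’s far end.

```python
# Toy model of paired-end (mate-pair) reads from one vector insert.
# Assumptions: error-free reads; insert written 5'->3' on the top strand.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the other strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mate_pair(insert: str, len_5p: int, len_3p: int) -> tuple[str, str]:
    """Reads primed from the vector on each side of the insert.

    The 5' read copies the top strand from the left end; the 3' read
    copies the bottom strand from the right end, so it is the reverse
    complement of the insert's last `len_3p` bases.
    """
    read_5p = insert[:len_5p]
    read_3p = reverse_complement(insert[len(insert) - len_3p:])
    return read_5p, read_3p


def unsequenced_gap(insert_len: int, len_5p: int, len_3p: int) -> int:
    """Bases between the two reads: spanned by the mate pair, but unread."""
    return insert_len - len_5p - len_3p


if __name__ == "__main__":
    insert = "ATGCCGTTAGCA"  # hypothetical 12 bp insert
    print(mate_pair(insert, 4, 4))          # ('ATGC', 'TGCT')
    print(unsequenced_gap(5000, 543, 540))  # 3917
```

For the example run, 5,000 - 543 - 540 = 3,917 bp, the ~3.9 kb that the mate pair links without sequencing it; keeping that pairing and the insert size is what lets WGA order and orient contigs.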
WGA
Structural Genomic Strategies
#3 (Hybrid)
Project Comparisons
(NYT: 10/3/2002)
Decoding the genome of Plasmodium falciparum, the most dangerous of the
four single-cell parasites that cause malaria, took six years and cost about $20
million, paid for by the Wellcome Trust of London, the National Institutes of
Health in Bethesda, Md., and other sources. Dr. Malcolm J. Gardner of the
Institute for Genomic Research in Rockville, Md., led a large team of scientists
there and at the Sanger Centre near Cambridge in England. Completion of the
falciparum genome was first announced at a conference in Las Vegas in
February.
Hybrid
The genome of Anopheles gambiae, the primary carrier of the parasite, was
begun more recently and took a mere 15 months even though its genome is far
larger — some 278 million units of DNA encoding 14,000 genes compared
with the parasite's 23 million units of DNA and 5,268 genes. The mosquito
team was led by Dr. Robert A. Holt of Celera Genomics in Rockville. The $14
million cost was borne by the National Institutes of Health, by Genoscope in
France and other sources.
WGA
Wednesday
• WGA,
• Shotgun Sequencing,
• Hybrid Approach.
Compartmentalized
Shotgun
Approach
• Please read…
Science 291: 1304-1315