Single DNA Sequence Analysis Tools
BME 110: CompBio Tools
Todd Lowe
May 1, 2012

Today’s topics
• Getting sets of sequences: The UCSC Genome Table Browser
• General Toolbox: EMBOSS tool suite
• ORF prediction
  – gORF at NCBI (single sequences)
  – GeneMark (full-genome)
• Specialized sites:
  – Restriction enzyme searches
  – PCR primer design

Basic ORF Finding
1. Look for “long” open reading frames
2. Scan the sequence starting at nucleotide #1 (Frame #1); begin the ORF at the first start codon: ATG, GTG, or TTG
3. Continue scanning to the first in-frame stop codon: TAA, TGA, or TAG
4. That is your ORF!
5. Repeat, starting at nuc #2 (Frame #2)
6. Then again, starting at nuc #3 (Frame #3)
7. Take the reverse complement of the sequence (e.g. 5’-CGAAC -> 5’-GTTCG)
8. Scan the reverse complement starting at nuc #1 (Frame –1), nuc #2 (Frame –2), and nuc #3 (Frame –3)

On-line ORF Finding @ NCBI
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
• Shows all ORFs with a minimum length of 50, 100, or 300 nucleotides
• Also shows all start (green) & stop (red) codons
• Allows “alternate” start codons (TTG, GTG, CTG)
• When you’ve selected an ORF you like, click the “Accept” button; you can then display the nucleotide or protein sequence for further analyses
• Example: “Missing gene region” in Pyrobaculum calidifontis: chr:1487337-1491109

Historical Note: Annotating the Yeast Genome
• Yeast is a eukaryote, but does not have many introns
• A strict cutoff for ORF length was used: an ORF of at least 300 nucleotides was required to be considered a gene in the original genome annotation (1996)
• Since then, many smaller ORFs have been found experimentally

EMBOSS Tools
• Large package of on-line tools
• Go to http://mobyle.pasteur.fr/
Transform sequence
  – Convert file format types
  – Reverse complement (revseq)
  – Extract a portion of sequence (extractseq)
  – Search & replace subsequences (biosed)
  – Translate (transeq, prettyseq) or back-translate (backtranseq/backtranambig)
  – Randomize sequence (shuffleseq)
Analyze sequences
  – G/C content (geecee)
  – Codon usage (cusp), codon adaptation (cai), codon bias (chips)
  – Word composition (wordcount)
  – Needleman-Wunsch global alignment (needle)
  – Smith-Waterman local alignment (water)
  – Many others…

Demo
• Get the DNA sequence for PAE1265 in Pyrobaculum aerophilum
• Using EMBOSS at the Mobyle site, calculate:
  – G/C percentage (DNA)
  – What is the most common 4-letter word?
  – Take the reverse complement
  – Extract nucleotides 100-150; what is the G/C content?
  – Translate the DNA sequence in frame 2
  – What species have the 4 most similar gene sequences? (Use NCBI BLAST)

More Sophisticated ORF Prediction
• Can analyze an entire genome at once, not just one gene, and uses codon frequencies
• GeneMark (http://opal.biology.gatech.edu/GeneMark)
  – Two modes:
    • GeneMark.hmm (based on previous genome information)
    • GeneMarkS (uses only your sequence info, when no similar genome models are available)

Cutting DNA: Restriction enzymes
• Enzymes isolated from prokaryotes that cut DNA at sequence-specific positions
• In nature, act as host defense against viruses
• Examples (^ marks the cut position):
  Nla III:  5’ ... CATG^ ... 3’
  BamH I:   5’ ... G^GATCC ... 3’
  Dra III:  5’ ... CACNNN^GTG ... 3’
• > 3,000 RE’s found to date with > 200 specificities!

Many RE’s Create “Sticky” Ends
Before cutting:
  5’-ATTGATGG^AATTCTTATGGATAG-3’
  3’-TAACTACCTTAA^GAATACCTATC-5’
After cutting, “sticky ends”:
  5’-ATTGATGG          AATTCTTATGGATAG-3’
  3’-TAACTACCTTAA  +       GAATACCTATC-5’
• Useful to increase efficiency & specificity of rejoining ends

New England BioLabs Tools
• http://tools.neb.com/
• NEBcutter
  – Displays which restriction enzymes cut in your sequence, and where
• REBsites
  – Displays a “virtual” digest of your DNA, showing how it would look on an agarose gel

Primer3
http://frodo.wi.mit.edu/
Key Input:
1. Sequence
2. Targets (region to be amplified)
3. Product Size Ranges
4. Primer size (usually 18-22 bp)
5. Primer Tm (melting temperature, which guides the choice of annealing temperature)
(Rest are usually OK as defaults)
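
Sketch: Basic ORF Finding in code
The scanning steps listed under “Basic ORF Finding” can be written out in a few lines of Python. This is only a minimal sketch of that procedure, not how gORF or GeneMark work internally. The function names (reverse_complement, orfs_in_frame, find_orfs) and the min_length parameter are hypothetical and chosen for illustration. The start and stop codon sets are the ones given on the slide, and the sequence is assumed to contain only A, C, G, and T.

```python
# Minimal sketch of six-frame ORF scanning (hypothetical helper names).
START_CODONS = {"ATG", "GTG", "TTG"}   # start codons from the slide
STOP_CODONS = {"TAA", "TGA", "TAG"}    # stop codons from the slide
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, e.g. CGAAC -> GTTCG."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset):
    """Yield (start, end) index pairs of ORFs in a single reading frame.

    start is the index of the first base of the start codon;
    end is one past the last base of the stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon in START_CODONS:
            start = i                  # begin ORF at first start codon
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3         # first in-frame stop ends the ORF
            start = None


def find_orfs(seq, min_length=0):
    """Return (frame, orf_sequence) tuples for frames 1..3 and -1..-3."""
    seq = seq.upper()
    results = []
    # Forward strand first, then the reverse complement (negative frames).
    for strand, strand_seq in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(strand_seq, offset):
                if end - start >= min_length:
                    frame = strand * (offset + 1)
                    results.append((frame, strand_seq[start:end]))
    return results
```

For example, find_orfs("CCATGAAATAGCC") returns [(3, "ATGAAATAG")]: the only ORF starts at nucleotide #3 on the forward strand. Passing min_length=300 would mimic the strict cutoff used in the original yeast annotation.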