EMBOSS
"The European Molecular Biology Open Software Suite"

EMBOSS
• Open source software
• Over 150 individual programs
– Sequence alignment
– Rapid database searching
– Protein motif identification
– Nucleotide sequence pattern analysis
– Codon usage analysis
– Identification of sequence patterns
– And much more…
• EMBOSS was initiated as a European project when GCG (an American analysis package) became commercial.
• Both provide roughly the same services: http://helix.nih.gov/apps/bioinfo/embossgcg.html

Advantages
• It is free
• It runs on practically every UNIX-based system (Linux and Mac OS X; at the CSC net site you can also use a Windows version)
• Free of arbitrary size limits
• Can be used from most programming environments
• Programs in the EMBOSS package can be combined and piped together in countless ways
• Extremely stable
• Most useful in a UNIX command prompt environment, but GUIs are available
http://emboss.sourceforge.net/docs/emboss_tutorial/emboss_tutorial.html

Programs are grouped
• Alignment
• Display
• Edit
• Enzyme kinetics
• Feature tables
• Information
• Nucleic
• Phylogeny
• Protein
• Utils
• The EMBOSS website has a comprehensive list of programs
• Another list of EMBOSS programs can be found at http://www.csc.fi/english/research/sciences/bioscience/programs/emboss/index_html

EMBOSS command syntax
• Follows normal UNIX syntax
• Uniform Sequence Addresses
– (=> USA syntax…nothing to do with the USA ;)
• Sequence format
– Multiple formats supported
• Alignment formats
• Feature formats
• Report formats

USA syntax
• "format::file"
• "format::file:entry"
• "dbname:entry"
• "@listfile" (a file of file names)

Sequence Formats I
• There are at least a couple of dozen different formats
• "Nearly every collection of sequences that calls itself a database has stored its data in its own format"
• IDs and accessions
– Most databases have both
– The ID was originally intended to be human-readable…this does not work, since there are far too many sequences to be named by humans
– Accession numbers are unique identifiers intended more for computer (= automated) use

Sequence Formats II
• Annotation and features
– Every format has some line or field for holding annotation about the sequence in question
• The sequence
– Sequences are usually held in the IUPAC standard one-letter codes
• Sequence database formats
– EMBL
– GenBank
– SwissProt
– PIR
• Formats supported by EMBOSS are listed at http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
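
The four USA forms listed on the "USA syntax" slide can be told apart mechanically. The Python sketch below is only an illustration of that idea, not EMBOSS's own parser: the `parse_usa` helper, the `USA` tuple and all file, database and entry names are hypothetical, and it assumes only the four forms from the slide (file names containing ":" are not handled).

```python
from typing import NamedTuple, Optional


class USA(NamedTuple):
    """Parts of a USA string (hypothetical container for this sketch)."""
    kind: str                    # "file", "database" or "listfile"
    source: str                  # file name, database name or list-file name
    fmt: Optional[str] = None    # sequence format, if one was given
    entry: Optional[str] = None  # entry name, if one was given


def parse_usa(usa: str) -> USA:
    """Split a USA string; covers only the four forms from the slide."""
    if usa.startswith("@"):
        # "@listfile": a file that contains further file names
        return USA("listfile", usa[1:])
    if "::" in usa:
        # "format::file" or "format::file:entry"
        fmt, rest = usa.split("::", 1)
        file_name, _, entry = rest.partition(":")
        return USA("file", file_name, fmt, entry or None)
    if ":" in usa:
        # "dbname:entry"
        dbname, entry = usa.split(":", 1)
        return USA("database", dbname, None, entry)
    raise ValueError(f"not one of the four forms handled here: {usa!r}")


if __name__ == "__main__":
    # All names below are made up for illustration.
    for text in ("fasta::myseq.fa", "embl::seqs.dat:X65923",
                 "embl:X65923", "@mylist"):
        print(text, "->", parse_usa(text))
```

Checking for "::" before ":" matters here, because a "format::file:entry" address contains both separators.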