Custom preformatted databases LotuS

LotuS custom sequence databases (2)
• LotuS preformatted databases consist of two files:
• FASTA file (DNA, can be translated RNA): a unique, short header for each reference sequence
• Taxonomy file: contains the taxonomy information for each FASTA header
• The FASTA file is straightforward to create: both strands will be searched, so sequence orientation is not important
• Each line of the taxonomy file contains the FASTA identifier (without “>”), a tab, and a “;”-delimited list of taxonomic levels:
• HP451749	k__Eukaryota; p__Basidiomycota; c__Pucciniomycetes; o__Pucciniales; f__Pucciniaceae; g__Puccinia; s__Puccinia triticina
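The taxonomy line format described above can be illustrated with a small parser. This is only a sketch: the function name `parse_tax_line` is hypothetical and not part of LotuS, and it assumes one entry per line with a single tab between the identifier and the “;”-delimited taxonomy string.

```python
def parse_tax_line(line):
    """Split one taxonomy line into (identifier, {level prefix: name}).

    Assumption: identifier and taxonomy string are separated by one tab,
    and each level is written as <prefix>__<name>.
    """
    seq_id, tax_string = line.rstrip("\n").split("\t", 1)
    levels = {}
    for field in tax_string.split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate a trailing ";"
        prefix, _, name = field.partition("__")
        levels[prefix] = name
    return seq_id, levels


example = (
    "HP451749\tk__Eukaryota; p__Basidiomycota; c__Pucciniomycetes; "
    "o__Pucciniales; f__Pucciniaceae; g__Puccinia; s__Puccinia triticina"
)
seq_id, levels = parse_tax_line(example)
print(seq_id, levels["g"], levels["s"])  # HP451749 Puccinia Puccinia triticina
```

Names containing spaces, such as the species name, survive intact because fields are split only on “;”.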
LotuS custom sequence databases
• Additionally, the level is indicated with a single letter in the tax string:
• k = kingdom
• p = phylum
• c = class
• o = order
• f = family
• g = genus
• s = species
• Each unknown level is marked with its level letter and “?”
• E.g. HP451XXX	k__Eukaryota; p__Basidiomycota; c__Pucciniomycetes; o__Pucciniales; f__Pucciniaceae; g__?; s__?
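As a sketch of how such lines could be generated, the snippet below builds a taxonomy line from a partial classification and marks every missing level with “?”. The helper `format_tax_line` is hypothetical, not a LotuS function, and the “; ” separator (with a space) is an assumption taken from the examples shown here.

```python
# Level prefixes in the order used by the tax string.
LEVELS = ["k", "p", "c", "o", "f", "g", "s"]


def format_tax_line(seq_id, known):
    """Build '<id><TAB><tax string>', writing '?' for each unknown level."""
    fields = [f"{lvl}__{known.get(lvl) or '?'}" for lvl in LEVELS]
    return seq_id + "\t" + "; ".join(fields)


line = format_tax_line(
    "HP451XXX",
    {
        "k": "Eukaryota",
        "p": "Basidiomycota",
        "c": "Pucciniomycetes",
        "o": "Pucciniales",
        "f": "Pucciniaceae",
    },
)
print(line)
```

Here genus and species are absent from the input dictionary, so the resulting line ends in `g__?; s__?`.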
Example: Silva
lotusdir/DB/SLV_123_SSU.fasta
lotusdir/DB/SLV_123_SSU.tax
>HP451749
CCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAACAACTATACAGTG
AAACTGCGAATGGCTCATTAAATCAGTTATAGTTTATTTGATGATACCTTACTACATGGATAACTGTGGTAATTCTAGAG
CTAATACATGCTGAAAAGCCCCAACCTTTGGAAGGGGTGTATTTATTAGATAAAAAACCAATGGCTTTCGGGTCTCTTTG
GTGATTCATAATAACTTCTCGAATCGCATGGCCTTGTGCCGGTGATGCTTCATTCAAATATCTGCCCTATCAACTTTCGA
TGGTAGGATAGAGGCCTACCATGGTGATGACGGGTAACGGGGAATAAGGGTTCGATTCCGGAGAGAGGGCCTGAGAAACG
GCCCTCAAATCTAAGGATTGCAGCAGGCGCGCAAATTACCCAATCCTGACACAGGGAGGTAGTGACAATAAATAACAATG
TATGGCTCTTTTGGGTCTTACAATTGGAATGAGTACAATTTAAATCTCTTAACGAGGATCAATTGGAGGGCAAGTCTGGT
GCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAATTGTTGACGTTAAAAAGCTCGTAGTCGAACTTCGGC
CTCTGGCAGTTGGTCCGCCTTTTGGTGTGTACTGATTTGTTGGAGGCTTACCTCTTGGTGAACTTCAATGCACTTTACTG
GGTGTTGAAGGGAACCAGGACTTTTACTTTGAAAAAATTAGAGTGTTCAAAGCAGGCTTATGCCTGAATACATTAGCATG
GAATAATAAAATAGGACGTGTGATTCTATTTTGTTGGTTTCTAGGATTACCGTAATGATGAATAGGGTCAGTTGGGGGCA
TTTGTATTACATCGTCAGAGGTGAAATTCTTGGATTGATGTAAGACAAACTACTGCGAAAGCATCTGCCAAGGATGACTT
CATTGATCAAGAACGAAGGTTAAGGGTTCAAAAACGATCAGATACCGTTGTAGTCTTAACAGTAAACTATGCCGACTGGG
GATCAGACAAGGATTTATAATGACTTGCTTGGCACCCAAAGGGAAACCTTAAGTTAAGGGGGGGGAGTGAATGAGGATAA
GAGTGATGGGATTTGAAAATAGCGAGGCGTAGGGCACCACCAGGTGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGG
AAACTCACCAGGTCCAGACACAGTAAGGATTGACAGATTGATAGCTTTTTCTTGATTTTGTGGTTGGTGGTGCATGGCCG
TTCTTAGTTGGGTGGAGTGATTTGTCTGGGTAATTCCGATAACAAACAAAACCTTCTCCTGCTAAATAGTCCAGCTGGCT
ACGGCTGGCTGCAGACTTCTTAAAGGGACTATCAGACGTTTAGTTGATGGAAGTTGGAGGCAATAACAGGTCTGTGATGC
CCTTAGATGTTCTGGGCCACACGCGCTCTACACTGACCAAGCCAACGAGTATATCACCTTATCTGAAAAGCGGGGGGTAA
TCTTGTGAAACTTGGTCGTGATGGGGATAGAGCATTGCAATTATTGCTCTTCAACGAGGAATACCTAGTAAGCGTATGTC
ATCAGCATGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGGCAGGCTTTTTGAGACGT
TCATTCCGGATAATGCGTTGGCGGCAGACCATCAGCGTTGGTTAGCGAATGCGGTCGACCACCTGTCAGACGAAGACGCC
CAATCTTGTGCTATTGGGTTGGGTTTCTA
>AB002583
CAAAAGAAGCGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGTATGCTTAACACATGCAAGTCGAACGCGCGTAAGTG
GCGTGGCGAACGGGTGAGTAACACGTGAGAATCTGCCCCTAGGAGTTGGATAAGGCTTGGAAACGAGCGCTAAACCAACA
TATAAGGAAAGGAGAGATCGCCTAGGGAAGAGCTCGCGGCTGATTAGCTAGTTGGTAGGGTAAAGGCCTACCAAGGCGAT
GATCAGTAGCTGGTCTGAGAGGATGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGG
GGAATTTTCTGCAATGGGCGAAAGCCTGACAGAGCAATACCGCGTGAGGGATGAAGGCCTTAGGGTTGTCAACCTCTTTT
…..
head SLV_123_SSU.tax
HP451749	k__Eukaryota; p__Basidiomycota; c__Pucciniomycetes; o__Pucciniales; f__Pucciniaceae; g__Puccinia; s__Puccinia triticina
AB002583	k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Cyanidioschyzon merolae; f__; g__; s__
FJ904637	k__Eukaryota; p__Apicomplexa; c__Conoidasida; o__Coccidia; f__Eimeriorina; g__Calyptosporidae; s__Calyptospora spinosa
FJ906773	k__Eukaryota; p__Arthropoda; c__Maxillopoda; o__Pedunculata; f__?; g__?; s__Lepas anatifera
FJ911809	k__Eukaryota; p__Arthropoda; c__Arachnida; o__?; f__?; g__?; s__Gamasellodes adrianae
FJ911827	k__Eukaryota; p__Arthropoda; c__Arachnida; o__?; f__?; g__?; s__Dendrolaelaps sp. 2 APGD-2010
AB012846	k__Eukaryota; p__; c__Chlorophyceae; o__Sphaeropleales; f__; g__Coelastrella; s__Coelastrella striolata var. multistriata
AB013182	k__Eukaryota; p__Bangiales; c__?; o__?; f__?; g__?; s__Porphyra sp. Shimonoseki
FJ911854	k__Eukaryota; p__Arthropoda; c__Arachnida; o__?; f__?; g__?; s__Ornithonyssus bursa
FJ911859	k__Eukaryota; p__Arthropoda; c__Arachnida; o__?; f__?; g__?; s__Spelaeorhynchus praecursor
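Since every FASTA header needs a taxonomy entry, a quick consistency check before using a custom database can be useful. The following is a minimal sketch, not part of LotuS: `check_database` and its helpers are hypothetical names, and it assumes the whole header text after “>” is the identifier and that the taxonomy file has one tab-separated entry per line.

```python
def fasta_ids(fasta_lines):
    """Return the identifiers of all FASTA headers, in file order."""
    return [ln[1:].strip() for ln in fasta_lines if ln.startswith(">")]


def tax_ids(tax_lines):
    """Return the identifier (text before the first tab) of each taxonomy line."""
    return [ln.split("\t", 1)[0].strip() for ln in tax_lines if ln.strip()]


def check_database(fasta_lines, tax_lines):
    """Report duplicate headers and identifiers present in only one file."""
    seen, duplicates = set(), []
    for seq_id in fasta_ids(fasta_lines):
        if seq_id in seen:
            duplicates.append(seq_id)
        seen.add(seq_id)
    in_tax = set(tax_ids(tax_lines))
    return {
        "duplicate_headers": duplicates,
        "missing_taxonomy": sorted(seen - in_tax),
        "missing_sequence": sorted(in_tax - seen),
    }


# Tiny in-memory example; real files would be read with open(...).
fasta = [">HP451749", "CCTGGTTGATCC", ">AB002583", "CAAAAGAAGCGA"]
tax = [
    "HP451749\tk__Eukaryota; p__Basidiomycota; c__?; o__?; f__?; g__?; s__?",
    "FJ904637\tk__Eukaryota; p__Apicomplexa; c__?; o__?; f__?; g__?; s__?",
]
report = check_database(fasta, tax)
print(report)
```

In this toy input, AB002583 has a sequence but no taxonomy line, and FJ904637 has a taxonomy line but no sequence, so both are reported.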