LeBIBI PPF – step by step
RUN #1
Let’s start with an SSU rDNA sequence in FASTA format:
>unkown_sequence
GTCTTCGGACTTAGCGGCGGACGGGTGAGTAACGCGTGGGAACGTGCCCTTTGCTTCGGAATAGCCCCGG
GAAACTGGGAGTAATACCGAATGTGCCCTTTGGGGGAAAGATTTATCGGCAAAGGATCGGCCCGCGTTGG
ATTAGGTAGTTGGTGGGGTAATGGCCTACCAAGCCGACGATCCATAGCTGGTTTGAGAGGATGATCAGCC
ACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATCTTAGACAATGGGCGCA
AGCCTGATCTAGCCATGCCGCGTGATCGATGAAGGCCTTAGGGTTGTAAAGATCTTTCAGGTGGGAAGAT
AATGACGGTACCACCAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCTA
GCGTTATTCGGAATTACTGGGCGTAAAGCGCACGTAGGCGGATCGGAAAGTCAGAGGTGAAATCCCAGGG
CTCAACCCTGGAACTGCCTTTGAAACTCCCGATCTTGAGGTCGAGAGAGGTGAGTGGAATTCCGAGTGTA
GAGGTGAAATTCGTAGATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCT
GAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAATGCC
AGTCGTCGGGCAGCATGCTGTTCGGTGACACACCTAACGGATTAAGCATTCCGCCTGGGGAGTACGGCCG
CAAGGTTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCA
ACGCGCAGAACCTTACCAACCCTTGACATGGCGATCGCGGTTCCAGAGATGGTTCCTTCAGTTCGGCTGG
ATCGCACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTCGGTTAAGTCCGGCAACGAG
CGCAACCCACGTCCTTAGTTGCCAGCATTCAGTTGGGCACTCTAGGGAAACTGCCGGTGATAAGCCGGAG
GAAGGTGTGGATGACGTCAAGTCCTCATGGCCCTTACGGGTTGGGCTACACACGTGCTACAATGGCAGTG
ACAATGGGTTAATCCCAAAAAGCTGTCTCAGTTCGGATTGGGGTCTGCAACTCGACCCCATGAAGTCGGA
ATCGCTAGTAATCGCGTAACAGCATGACGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCAC
ACCATGGGAATTGGTTCTACCCGAAGGCGGTGCGCCAACCTCGCAAGAGGAGGCAGCCGACCACGGTAGG
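Before pasting a sequence into the query window, it can be useful to check that the FASTA record is well formed. The following is a minimal sketch of such a check, not part of LeBIBI PPF: the function names are hypothetical, and the accepted alphabet (IUPAC nucleotide codes) is an assumption about what the server tolerates. Only the first two lines of the example sequence are used, for brevity.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_characters(sequence, alphabet="ACGTUNRYSWKMBDHV"):
    """Return the sorted characters that are not IUPAC nucleotide codes (assumed alphabet)."""
    return sorted(set(sequence) - set(alphabet))


# First two lines of the example record above.
example = """>unkown_sequence
GTCTTCGGACTTAGCGGCGGACGGGTGAGTAACGCGTGGGAACGTGCCCTTTGCTTCGGAATAGCCCCGG
GAAACTGGGAGTAATACCGAATGTGCCCTTTGGGGGAAAGATTTATCGGCAAAGGATCGGCCCGCGTTGG
"""

for name, seq in parse_fasta(example):
    # Print the record name, its length and any unexpected characters.
    print(name, len(seq), invalid_characters(seq))
```

An empty list of invalid characters suggests the record can be pasted as is.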
The first step of the analysis is to get some idea of its identity.
1. Query building
Paste the sequence in the query window and set the maximum number of hits to 10 for
a BLAST search of the PPFDB_SSU_rDNA-16S_superstringent database (a database
containing one SSU_rDNA-16S sequence per type species).
Increasing the maximum number of BLAST hits to be retained may be necessary when
the input sequences are highly divergent.
2. Settings and analysis parameters
You may give a specific name to the analysis. In the ‘Settings’ part of the window,
replace ‘Query’ with a suitable name in the ‘User-given Id.’ field.
The length of the sequence names can be set to any value from 10 to 200 characters
(default = 30).
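The effect of the name-length setting can be illustrated with a short sketch. How LeBIBI PPF actually shortens names is not described here; plain truncation is an assumption, and `shorten_name` is a hypothetical helper. Only the allowed range (10 to 200) and the default (30) come from the text above.

```python
def shorten_name(name, max_length=30):
    """Truncate a sequence name to max_length characters (assumed behaviour)."""
    # The allowed range and the default come from the PPF settings described above.
    if not 10 <= max_length <= 200:
        raise ValueError("max_length must be between 10 and 200")
    return name[:max_length]


label = "Haematobacter_massiliensis~v~TT~AF452106"
print(shorten_name(label))        # default length of 30 characters
print(shorten_name(label, 200))   # long enough to keep the full label
```

With long labels such as the one above, the default of 30 characters would cut off the accession number under this assumption, which is one reason to raise the value.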
Keep the default settings of the post-BLAST processing: you are analyzing a single
sequence for which you want to get a global phylogenetic tree of all the sequences
(unknown + 10 closest BLAST hits).
When inputting several sequences with BLAST retrievals for all of them, you can
build as many trees as there are input sequences, each tree corresponding to a
single sequence.
The default settings of the sequence alignment algorithm, the sequence trimming
algorithm and the phylogeny program may be modified. For the latter, if you do not
wish to use the default substitution model, un-click the option in the ‘Phylogeny’
part of the window.
Once the settings are correct, click on the ‘Proceed to the next PPF step’ red
button.
3. Verification of the run settings
A summary of the analysis settings and parameters appears in another window that
opens automatically.
Alternatively, if you opted to do so at the previous step, you can select the
substitution model for the phylogeny reconstruction from a list at this stage
(lowest part of the window).
When everything is correct, click on the ‘Run’ red button to launch the sequence
analysis.
4. Results window
A separate result window opens automatically. The top of the window lists the run
settings and, below, the results are presented in a double frame.
The left frame displays the log of the run, with the most recent information
appearing at the top.
The run is complete when three clickable buttons appear in this frame and the right
frame shows the resulting phylogenetic tree.
You may view the run log in a separate window by clicking on the link ‘See the
actual page here’ above the frame.
The result of the BLAST run is accessible by clicking on the ‘BLAST results’ link
in the log.
All generated files are directly accessible by clicking on the link ‘All files’
above the log frame.
[to add :description of files when list is final]
5. Tree visualization
The tree (SVG format) can be viewed in a separate window by clicking on the
leftmost yellow button, ‘View the svg tree’. The ‘unknown sequence’ is highlighted
in red.
6. Visualization/Download of the tree in PDF format
Clicking on the rightmost yellow button, ‘View the pdf tree’, gives you access to
the tree in PDF format.
[Figure: phylogenetic tree of the query and its 10 closest BLAST hits; scale bar
0.01. Leaf labels:]
Haematobacter_massiliensis~v~TT~AF452106
Haematobacter_massiliensis~v~TT~DQ342309
Haematobacter_missouriensis~v~TT~DQ342315
Frigidibacter_albus~v~TT~KF944301
Rhodobacter_sphaeroides~v~TT~D16425
Rhodobacter_sphaeroides~v~TT~X53853
QRY_QRY_unkown_sequence
Rhodobacter_sphaeroides~v~TT~CP000143
Rhodobacter_johrii~v~TT~AM398152
Rhodobacter_megalophilus~v~TT~AM421024
Rhodobacter_azotoformans~v~TT~AB607332
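The leaf labels of the tree can be split into their parts to summarize which genera the closest hits belong to. The sketch below assumes that labels are built as `Genus_species~flag~flag~accession` and that query names start with `QRY_`, which is inferred from the labels of this example only. The meaning of the `v` and `TT` fields is not stated here, so they are kept as raw flags. `parse_label` is a hypothetical helper, not a PPF function.

```python
from collections import Counter


def parse_label(label):
    """Split a tree leaf label into its parts (format inferred from the example)."""
    if label.startswith("QRY_"):
        return {"query": True, "name": label}
    parts = label.split("~")
    genus, _, species = parts[0].partition("_")
    return {
        "query": False,
        "genus": genus,
        "species": species,
        "flags": parts[1:-1],   # e.g. ['v', 'TT']; meaning not documented here
        "accession": parts[-1],
    }


labels = [
    "Haematobacter_massiliensis~v~TT~AF452106",
    "Haematobacter_massiliensis~v~TT~DQ342309",
    "Haematobacter_missouriensis~v~TT~DQ342315",
    "Frigidibacter_albus~v~TT~KF944301",
    "Rhodobacter_sphaeroides~v~TT~D16425",
    "Rhodobacter_sphaeroides~v~TT~X53853",
    "QRY_QRY_unkown_sequence",
    "Rhodobacter_sphaeroides~v~TT~CP000143",
    "Rhodobacter_johrii~v~TT~AM398152",
    "Rhodobacter_megalophilus~v~TT~AM421024",
    "Rhodobacter_azotoformans~v~TT~AB607332",
]

parsed = [parse_label(label) for label in labels]
# Count the hits per genus, leaving the query out.
genus_counts = Counter(p["genus"] for p in parsed if not p["query"])
print(genus_counts.most_common())
```

Such a tally only gives a first impression; the position of the query in the tree remains the relevant evidence.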
7. Tree editing
Clicking on the middle red button, ‘Edit the svg tree’, opens a tree editing
interface. The tree appears in a dedicated frame with, on the right, a control
panel showing the available modification options.
To modify the appearance of the tree, select the option(s) in the control sections
‘Shape’, ‘Outgroups’, ‘Branch support’, ‘Highlighting’, ‘Scale’ and click on the
‘modify’ red button at the top of the control panel.
Scale: The tree width and length can be rescaled as well as the font size.
Tree shape: A squared representation showing the branch supports as branch width is
used by default. Keep in mind that the tree is always **unrooted** until a root is
selected. The circular representation does not show the branch supports.
Outgroups: Up to two outgroup sequences can be selected from the complete list of
sequences.
Branch support: Branch width is used by default. Only SH branch support values above
a selected threshold (0.7 to 0.95) are displayed. The branch width is calculated
relative to a selected maximum support value (0.7 to 1.0). The branch support
display can be changed to numerical support values or be removed.
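The relation between support value and branch width can be pictured with a small sketch. The exact formula used by the tree editor is not given here: the linear scaling, the pixel widths and the function name `branch_width` are all assumptions made for illustration. Only the threshold range (0.7 to 0.95) and the maximum support range (0.7 to 1.0) come from the text.

```python
def branch_width(support, threshold=0.7, max_support=1.0,
                 min_width=1.0, max_width=5.0):
    """Map an SH support value to a line width (hypothetical linear scaling)."""
    if not 0.7 <= threshold <= 0.95:
        raise ValueError("threshold must be between 0.7 and 0.95")
    if not 0.7 <= max_support <= 1.0:
        raise ValueError("max_support must be between 0.7 and 1.0")
    # Support values at or below the threshold are not displayed:
    # the branch keeps the thinnest width.
    if support <= threshold:
        return min_width
    # Values above the selected maximum are capped, so they all get the full width.
    capped = min(support, max_support)
    return min_width + (max_width - min_width) * capped / max_support


for value in (0.5, 0.8, 0.95, 1.0):
    print(value, round(branch_width(value), 2))
```

Under these assumptions, lowering the maximum support value makes more branches reach the full width, while raising the threshold hides the support of more branches.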
Highlighting: By default, the queried sequences are singled out by red-coloring
whilst all other non-highlighted sequences appear in black. The sequences can be
color-highlighted according to their taxonomy at the species, genus, family, order
or class level. This may be used to identify the lineage at any rank level to which
the queried sequences belong.
- Species: The example Query sequence belongs to the Rhodobacter sphaeroides species.
- Genus: The example Query sequence belongs to the Rhodobacter genus.
- Family: The example Query sequence belongs to the Rhodobacteraceae family.
- Free choice: This option allows you to highlight any sequence whose name contains
a string of characters of your choosing.
- Clear: Ticking the ‘Clear’ option and clicking on ‘Modify’ will get you back to a
tree devoid of any tag.
The modified tree can be viewed in a separate window in SVG format for all
modification options and in PDF in the case of the coloring options.
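The ‘Free choice’ option amounts to a substring match on the sequence names. The sketch below illustrates the idea only: the function name, the colors and the case-sensitive matching are assumptions, not a description of how the tree editor is implemented.

```python
def highlight(names, pattern, color="red", default="black"):
    """Assign a color to every name containing `pattern` (case-sensitive, assumed)."""
    return {name: (color if pattern in name else default) for name in names}


names = [
    "Rhodobacter_sphaeroides~v~TT~D16425",
    "Rhodobacter_johrii~v~TT~AM398152",
    "Frigidibacter_albus~v~TT~KF944301",
]

# Highlight every sequence whose name contains the chosen string.
for name, color in highlight(names, "sphaeroides").items():
    print(color, name)
```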
RUN #2
The unknown sequence is likely a member of the Rhodobacter sphaeroides species
within the Rhodobacteraceae (Rhodobacterales, Alphaproteobacteria).
The next step is to ascertain its position among all the sequences
available for this species and within Rhodobacterales. Including the
Rhodobacterales will root the Rhodobacter sphaeroides species lineage.
In the Supplemental Query frame, write the following commands:
#Rhodobacter_sphaeroides @stringent %notag
#Rhodobacterales @genuslevel %notag
The first command will select all Rhodobacter sphaeroides SSU rDNA-16S
sequences that are currently available in the database. The second command
will select representative sequences for all Rhodobacterales genera.
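The structure of these command lines can be made explicit with a small parser. This is a sketch based only on the two example lines: it assumes that `#` introduces the taxon, `@` the selection level and `%` an option, and the field names `taxon`, `level` and `option` are hypothetical. It does not reproduce the parser used by LeBIBI PPF, and other prefixes or values may exist.

```python
def parse_supplemental_query(line):
    """Split one Supplemental Query line into its prefixed fields (assumed syntax)."""
    command = {"taxon": None, "level": None, "option": None}
    prefixes = {"#": "taxon", "@": "level", "%": "option"}
    for token in line.split():
        field = prefixes.get(token[0])
        if field is None:
            raise ValueError(f"unexpected token: {token!r}")
        command[field] = token[1:]
    if command["taxon"] is None:
        raise ValueError("a command needs a '#taxon' field")
    return command


commands = [
    "#Rhodobacter_sphaeroides @stringent %notag",
    "#Rhodobacterales @genuslevel %notag",
]

for line in commands:
    print(parse_supplemental_query(line))
```

Reading the commands this way makes it easier to see that the two lines differ in both the taxon and the selection level.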
[Screenshots: Query; Run summary; Run log.]
[Figure: Circular tree. Labels: unknown sequence; Rhodobacter genus representative
sequence.]
The unknown sequence (red) groups with the Rhodobacter genus representative
sequence (yellow) and the majority of Rhodobacter sphaeroides species (pale
yellow) among which are found sequences for the type strain.
Note that several sequences for Rhodobacter sphaeroides (black arrows) do
not group with the Rhodobacter genus representative sequence. They were
likely misnamed.
RUN #3
Alternatively, the position of the unknown sequence can be explored within
the Rhodobacter sphaeroides species as previously (first line of command in
the Supplemental Query frame), and among Rhodobacterales species (second
line of command in the Supplemental Query frame) instead of Rhodobacterales
genus:
#Rhodobacter_sphaeroides @stringent %notag
#Rhodobacterales @superstringent %notag
[Screenshots: Query; Run summary; Run log.]
A larger number of sequences is included in the analysis: 660
Rhodobacterales type species in comparison to the 260 Rhodobacterales genus
representatives in the previous analysis. The run will thus last a little
longer (about 5 min. instead of 1 for the previous run).
[Figure: Family-colored tree. Labels: Rhodobacteraceae; unknown sequence;
Hyphomonadaceae.]
The Rhodobacterales are separated into the Rhodobacteraceae and the
Hyphomonadaceae. This latter family can thus be used as outgroup for the
rooting of the Rhodobacteraceae to which the unknown sequence belongs.
[Figure: Species-colored tree. Labels: unknown sequence; Hyphomonadaceae.]
Zooming in on the part of the tree with the query sequence, the affiliation to the
Rhodobacter sphaeroides species is confirmed by the co-occurrence of sequences for
the type strain of this species.
Note however that representative sequences for the type strains of two other
Rhodobacter species (R. megalophilus and R. johrii) branch among the Rhodobacter
sphaeroides strain sequences.
Conversely, some Rhodobacter sphaeroides strain sequences group with the Rhodobacter
azotoformans type strain (top of the figure below).
Finally, the presence of a long branch may deserve some more digging, starting with
a check of the sequence itself.
[Figure: zoom on the species-colored tree around the query. Labels: Rhodobacter
azotoformans T; unknown sequence; Rhodobacter sphaeroides T (five labels);
Rhodobacter megalophilus T; Rhodobacter johrii T; long branch: possible problem
with the sequence.]