Visualising Phylogenetic Trees

Wan Nazmee Wan Zainon & Paul Calder
School of Informatics and Engineering
Flinders University of South Australia
PO Box 2100, Adelaide 5001, South Australia
Abstract
This paper describes techniques for visualising pairs of
similar trees. Our aim is to develop ways of presenting
the information so as to highlight both the common
structure of the trees and their points of difference. The
impetus for the work comes from the field of
bioinformatics, where geneticists construct complex
phylogenetic trees to represent the evolution of species or
genes. But the techniques can also be used for other
tree-structured data such as file systems, parse trees, decision
trees, and organisational hierarchies.
To investigate our techniques, we have built a prototype
application that reads and displays phylogenetic trees in
the popular Nexus format. The application incorporates a
variety of interactive and automated visualisation
techniques, and is implemented in Java. We are working
with biologists to see how well the techniques work for
real-world data.
Keywords: Interactive visualisation, phylogenetic trees,
bioinformatics.
1 Introduction
Tree-structured data occurs in many domains: file
systems, parse trees, organisational hierarchies, and
classification schemes of many kinds. The impetus for
the work described in this paper is the domain of
phylogenetic classification, which is used by geneticists
to describe possible evolutionary relationships between
species or individuals. Although we have developed our
techniques specifically for that domain, many of our
techniques could also be applied to other domains that
use similar trees.
This paper presents techniques for visualising pairs of
phylogenetic trees in order to emphasise the similarity of
the trees while also highlighting how they differ. We
have implemented these techniques in the context of a
prototype tool for interactively visualising phylogenetic
trees, and are in the process of evaluating the
effectiveness of the tool for real phylogenetic data.
Copyright © 2006, Australian Computer Society, Inc. This
paper appeared at the Seventh Australasian User Interface
Conference (AUIC2006), Hobart, Australia. Conferences in
Research and Practice in Information Technology (CRPIT),
Vol. 50. Wayne Piekarski, Ed. Reproduction for academic,
not-for-profit purposes permitted provided this text is included.
The remainder of this paper is organised as follows.
Section 2 provides an introduction to the bioinformatics
basis of phylogenetic trees and outlines other work that
has investigated the visualisation and comparison of such
data. Section 3 presents our approach to the problem and
details several of the algorithms we use to compute
visualisations. Section 4 briefly describes the implementation
of the prototype visualisation tool, shows
examples of its interface, and discusses its use.
2 Related Work
2.1 Bioinformatics Context
Biologists and geneticists use phylogenetic trees to
represent the evolutionary interrelationships between
collections of related species or genes. The discovery and
analysis of those relationships may help in many practical
applications such as drug discovery, forensics, disease
control, and ecological modelling.
Biologists construct phylogenetic trees by examining the
phenotypes or genotypes of a collection of organisms and
attempting to infer the evolutionary process by which the
organisms came to be. For example, a geneticist might
obtain DNA sequence data from a range of species or
from individuals within a population.
Then, by
comparing the sequences, she could infer how the
sampled organisms might have evolved via a series of
mutations, each caused by one change in the DNA
sequence. This hypothesised evolutionary history is then
represented as a “tree of life” showing how possible
ancestors could have led to the current organisms.
Bioinformaticists have devised a range of algorithms,
based on strategies such as Maximum Likelihood
(Felsenstein et al. 1982) and Maximum Parsimony (Farris
1983), for computing such phylogenetic trees. However,
there is no “gold standard”; current practice dictates that
several different methods be applied to the sequence data
(Thorup and Farach 1994). When this happens, biologists often need
to compare several similar trees in order to get a more
complete picture of the relationships involved. A similar
situation arises when several species have evolved in
close association (co-evolution); the biologist might be
interested in understanding how the phylogenetic tree for
one species compares with that for the co-evolved
species.
In its simplest form, a phylogenetic tree is drawn as a
rooted binary tree. Each leaf node represents an actual
species or organism; each internal node represents a
hypothetical ancestor at which mutation is assumed to
have occurred (and which therefore has exactly two
branches). For example, Figure 1 shows two (clearly
fictitious) trees that suggest two possible ways in which 4
present-day species might be related.
[Figure 1: Fictitious phylogenetic trees (Tree 1 and Tree 2), each with leaves Ant, Bat, Cow, and Dog]
Tree 1 implies that Bat and Cow diverged recently from a
common ancestor, that the Bat/Cow ancestor and Dog
share a more distant common ancestry, and finally that
the whole Bat/Cow/Dog tree split from the Ant branch
even further in the past. Tree 2, on the other hand,
suggests that a common ancestor split into two branches,
one ultimately leading to Bat and Cow and the other to
Ant and Dog.
Real phylogenetic trees will of course be much larger and
thus more complex. Understanding such trees requires
visual inspection, structural comparison, and interactive
manipulation and exploration, and thus presents a number
of visualisation challenges (Carrizo 2004). Biologists
faced with inadequate visualisation tools for comparing
trees have had to rely instead on paper, tape, and
highlighter pens (Munzner et al. 2003).
2.2 Tree Comparison Techniques
Bioinformaticists use a variety of techniques to compare
phylogenetic trees. Section 3.1 describes how we apply
and extend some of these techniques in visualising trees.
Consensus trees are widely used to summarise the
agreement between a set of trees. A consensus tree
represents a lowest common denominator of two or more
trees; it depicts those aspects that the individual trees all
agree on.
Bryant (1997) reports on a variety of methods for creating
a consensus tree, including the strict, majority rule,
semi-strict, and Nelson and Adams techniques. For example,
the strict consensus tree of the trees in Figure 1 is as
follows:
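[Figure: strict consensus tree with leaves Bat, Cow, Dog, and Ant; Bat and Cow share a parent, which joins Dog and Ant at the root]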
The consensus tree indicates that both trees agree that Bat
and Cow had a recent common ancestor, but disagree
about how Ant and Dog fit into the picture. The best that
can be said is that Ant and Dog both shared a common
ancestor with the Bat/Cow ancestor at some time in the
past.
Note that a consensus tree includes all of the original leaf
nodes, but is normally not fully resolved; areas of
disagreement generally result in interior nodes with more
than two branches.
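As a rough illustration of the strict method, the sketch below collects the set of leaves under every internal node of each tree and keeps only the sets that occur in both trees. The nested-tuple encoding of the Figure 1 trees and the function names are assumptions made for this example; they are not taken from any of the cited tools.

```python
def clusters(tree):
    """Return the leaf sets (clusters) found under each internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):              # a leaf is represented by its label
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def strict_consensus_clusters(tree_a, tree_b):
    """Clusters on which both trees agree; these define the strict consensus tree."""
    return clusters(tree_a) & clusters(tree_b)


# Assumed encoding of the two trees of Figure 1 (branch order is irrelevant here).
TREE_1 = ("Ant", (("Bat", "Cow"), "Dog"))
TREE_2 = (("Bat", "Cow"), ("Ant", "Dog"))

shared = strict_consensus_clusters(TREE_1, TREE_2)
```

For these two trees only the Bat/Cow cluster and the root cluster survive, which corresponds to a consensus tree in which Dog and Ant attach directly to the root.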
An agreement subtree is a subtree that is common to two or
more trees. Conceptually, a subtree can be obtained by pruning
leaf nodes (and collapsing the parent internal nodes) from the
original tree. An agreement subtree is a subtree that can be
extracted in such a manner from all of the trees. A greatest
agreement subtree (GAS) is an agreement subtree with the
greatest number of leaf nodes. For example, the trees in Figure
1 have two greatest agreement subtrees:
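[Figure: two three-leaf subtrees, one with leaves Bat, Cow, Dog and one with leaves Bat, Cow, Ant; Bat and Cow are siblings in both]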
Note that a greatest agreement subtree does not normally
include all of the leaf nodes (unless, of course, the trees are
identical).
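The pruning definition can be sketched directly as an exhaustive search: try leaf subsets from largest to smallest, prune both trees to each subset, and keep the subsets on which the pruned trees coincide. This is only an illustration under assumed names and an assumed nested-tuple tree encoding; its running time grows exponentially with the number of leaves, so it is not a practical method for real data.

```python
from itertools import combinations


def leaf_names(tree):
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def prune(tree, keep):
    """Restrict a tree to the leaves in `keep`, collapsing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def shape(tree):
    """Order-independent form of a tree, so branch order is ignored when comparing."""
    return tree if isinstance(tree, str) else frozenset(shape(child) for child in tree)


def greatest_agreement_subtrees(tree_a, tree_b):
    """Return the largest leaf subsets on which the two pruned trees are identical."""
    names = sorted(leaf_names(tree_a))
    for size in range(len(names), 0, -1):
        hits = [set(keep) for keep in combinations(names, size)
                if shape(prune(tree_a, keep)) == shape(prune(tree_b, keep))]
        if hits:
            return hits
    return []


# Assumed encoding of the two trees of Figure 1.
TREE_1 = ("Ant", (("Bat", "Cow"), "Dog"))
TREE_2 = (("Bat", "Cow"), ("Ant", "Dog"))

gas = greatest_agreement_subtrees(TREE_1, TREE_2)
```

For the Figure 1 trees the search stops at size 3 and reports the leaf sets {Ant, Bat, Cow} and {Bat, Cow, Dog}.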
A triplet is a 3-node subtree and represents the smallest
informative subtree of a rooted tree. The structure of a tree is
fully characterised by enumerating the structure of its triplets.
For example, Tree 1 has the following triplets:
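[Figure: the four triplets of Tree 1, each given as the leaf that branches off first followed by the sibling pair: Ant, (Bat, Cow); Ant, (Bat, Dog); Ant, (Cow, Dog); Dog, (Bat, Cow)]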
Triplets can be used as a basis for quantifying the difference
between rooted trees. Using this approach, the structural
difference between two trees is the number of triplets whose
structure is different in the two trees.
For example, Tree 2 has the following triplets. Since 2 triplets
(the second and third) are different from the corresponding
triplets in Tree 1, the structural triplet difference between the
two trees is 2.
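[Figure: the four triplets of Tree 2, each given as the leaf that branches off first followed by the sibling pair: Ant, (Bat, Cow); Bat, (Ant, Dog); Cow, (Ant, Dog); Dog, (Bat, Cow)]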
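The structural triplet difference can be sketched as follows: for every set of three leaves, find which leaf branches off first in each tree, and count the sets for which the answer differs. The nested-tuple encoding and the function names are assumptions for this example, and the sketch handles only fully resolved (binary) trees.

```python
from itertools import combinations


def leaf_set(tree):
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def odd_one_out(tree, triple):
    """Return the leaf of `triple` that branches off first in a binary tree."""
    node = tree
    while True:
        upper, lower = node
        in_upper = [name for name in triple if name in leaf_set(upper)]
        if len(in_upper) == 3:        # all three leaves sit in the upper branch
            node = upper
        elif len(in_upper) == 0:      # all three leaves sit in the lower branch
            node = lower
        elif len(in_upper) == 1:      # the lone leaf is the one that split off
            return in_upper[0]
        else:
            return next(name for name in triple if name not in in_upper)


def triplet_difference(tree_a, tree_b):
    """Count the triplets whose structure differs between the two trees."""
    names = sorted(leaf_set(tree_a))
    return sum(odd_one_out(tree_a, t) != odd_one_out(tree_b, t)
               for t in combinations(names, 3))


# Assumed encoding of the two trees of Figure 1.
TREE_1 = ("Ant", (("Bat", "Cow"), "Dog"))
TREE_2 = (("Bat", "Cow"), ("Ant", "Dog"))

difference = triplet_difference(TREE_1, TREE_2)   # 2 for these trees
```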
The nearest neighbour interchange (NNI) technique
(Robinson 1971) is also used to quantify the difference
between trees. A nearest neighbour interchange is an
interchange of two “nearest neighbour” branches. The
NNI difference between two trees is the minimum
number of such interchanges needed to convert one tree
into the other.
NNI is usually applied to unrooted trees, but can be
adapted for rooted trees. For rooted trees, the “nearest
neighbour” of a branch is one of the sub-branches (if they
exist) of its sibling. For example, in Tree 1 the nearest
neighbours of the Dog branch are the Bat and Cow
branches, and the nearest neighbours of the Ant branch
are the Dog branch and the (unlabelled) common
Bat/Cow ancestor branch.
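The rooted nearest-neighbour definition can be sketched as a generator that, at every internal node, exchanges one child with each sub-branch of its sibling. The nested-tuple encoding and the names `canon` and `nni_neighbours` are assumptions for this illustration; computing the full NNI difference between arbitrary trees would require a search over repeated steps, which is not shown.

```python
def canon(tree):
    """Order-independent form: leaves stay strings, internal nodes become frozensets."""
    if isinstance(tree, str):
        return tree
    return frozenset(canon(child) for child in tree)


def nni_neighbours(tree):
    """Yield every tree reachable from a binary nested-tuple tree by one rooted NNI step."""
    if isinstance(tree, str):
        return
    a, b = tree
    # Exchange a child with one of the two sub-branches of its sibling.
    for branch, sibling in ((a, b), (b, a)):
        if not isinstance(sibling, str):
            s1, s2 = sibling
            yield (s1, (branch, s2))
            yield (s2, (s1, branch))
    # Apply the same move inside each child.
    for changed in nni_neighbours(a):
        yield (changed, b)
    for changed in nni_neighbours(b):
        yield (a, changed)


# Assumed encoding of the two trees of Figure 1.
TREE_1 = ("Ant", (("Bat", "Cow"), "Dog"))
TREE_2 = (("Bat", "Cow"), ("Ant", "Dog"))

one_step = {canon(t) for t in nni_neighbours(TREE_1)}
```

Comparing `canon` forms rather than the tuples themselves ignores branch order, so two drawings of the same tree are treated as equal.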
Using this definition, the following trees can all be
obtained by one NNI step from Tree 1. Since one of
these (the bottom right) is structurally identical with Tree
2, the NNI difference between the two trees is 1.
[Figure: trees obtainable from Tree 1 by one NNI step]
2.3 Tools for Phylogenetic Tree Analysis and Visualisation
Biologists use many applications to analyse and
understand phylogenetic data. This section briefly
describes four of the most popular tools that are freely
available over the Internet. A comprehensive list of other
tools is provided on the PHYLIP web site
(Felsenstein 2005).
Gibas and Jambeck (2001) report that the most widely
used phylogenetic analysis package is PHYLIP
(Felsenstein 2005), which contains more than 30
programs that implement different phylogenetic
algorithms. It has programs for tree plotting, heuristic tree
search, interactive tree manipulation, and other
phylogenetic analysis methods. Table 1 shows a list of
PHYLIP programs that users are most likely to use to
analyse protein and DNA sequence data.
Program    Use
PROTPARS   Infers phylogenies from protein sequences using
           the parsimony method
NEIGHBOR   Infers phylogenies from distance matrix data using
           either pairwise clustering or neighbour joining methods
DRAWGRAM   Draws a rooted tree based on output from one of
           the phylogeny inference programs
CONSENSE   Computes a consensus tree from a group of phylogenies
RETREE     Allows interactive manipulation of a tree
Table 1: Selected PHYLIP programs
The COMPONENT application (Roderic 1993) can both
display and analyse phylogenetic trees. Its emphasis is
on computing comparative metrics between trees,
although it includes simple interactive editing operations
such as rearranging tree branches, deleting nodes, and
rerooting trees.
Mesquite (Maddison and Maddison 2005) is a system that
its developers describe as “a modular system for
evolutionary analysis”. Available modules include
components for construction and comparison of
phylogenetic trees. The TreeSet Visualisation module
(Klinger and Amenta 2002) produces point-set
visualisations that suggest clustering within large sets of
trees.
TreeJuxtaposer (Munzner et al. 2003) supports
structural comparison of trees. The tool can highlight
parts of several trees that are structurally similar,
although its emphasis is on efficiently handling very large
trees (up to several hundred thousand nodes) rather than
on identifying the specific differences between the trees.
3 Visualising Tree Differences
Our approach to visualising tree similarities and
differences makes use of the fact that a tree with
unordered branches can be drawn in many arrangements.
In a phylogenetic tree, the order in which branches appear
is usually less important than the structural relationships
between nodes. In such cases, we can take advantage of
this flexibility to draw a pair of trees to highlight both
their similarities and differences.
Our technique is to draw the pair of trees “face-to-face”,
with the arrangements of each tree chosen to best
emphasise the similarities and highlight the differences.
For example, the trees in Figure 1 could be drawn as
follows:
[Figure: Tree 1 and Tree 2 drawn face-to-face]
This arrangement shows the greatest agreement subtree
(Ant, Cow, Bat) and also how the differing node (Dog)
connects in the two trees. In essence, it suggests that in
one case Dog diverged from the Bat/Cow line, whereas in
the other it diverged from the Ant line.
Typical phylogenetic trees can often have 50 or more
nodes, and since the number of possible arrangements of
a fully resolved tree with n leaf nodes is $2^{n-1}$, it is usually
impractical to manually determine the best arrangement.
To help in the process we have considered several
strategies for automatically arranging the trees.
♦ The minimum triplet difference (MTD) algorithm
computes arrangements of two trees for which the
difference, as measured by triplet arrangement
pattern, is minimised.
♦ The maximum branch similarity (MBS) algorithm
arranges one tree so that its branches have as many
leaf nodes as possible in common with the
corresponding branch in the other tree.
♦ The all-but-n (ABn) algorithm attempts to arrange
the common structures of the two trees so that the
nodes that differ can be drawn in alignment.
3.1 Minimum Triplet Difference
[Figure 2: Labelled triplet arrangement patterns (twelve patterns, labelled A to L, each showing a triplet whose leaves are marked –, 0, and +)]
Triplet            Tree 1 pattern   Tree 2 pattern
(Ant, Bat, Cow)    A                J
(Ant, Bat, Dog)    A                D
(Ant, Cow, Dog)    A                D
(Bat, Cow, Dog)    G                G
Table 2: Triplet patterns for Tree 1 and Tree 2
Nodes in a triplet can be labelled in 3 distinct ways, and
there are 4 distinct arrangements for each labelling,
making a total of 12 possible labelled triplet patterns, as
shown in Figure 2. The nodes in the figure are labelled to
suggest how the triplet pattern is assigned to a particular
labelled tree. The label ‘–’ is assigned to the tree node
with the lowest ordinal number (in some domain-specific
ordering) of the three triplet nodes. Similarly, the label
‘+’ is assigned to the tree node with the highest ordinal
number, and the label ‘0’ is assigned to the tree node with
the intermediate ordinal number.
For example, using an alphabetic ordering for the labels
in the trees of Figure 1 and considering the triplet (Ant,
Bat, Cow), label ‘–’ would map to ‘Ant’ (the label with
the lowest ordering), ‘0’ to ‘Bat’, and ‘+’ to ‘Cow’ (the
highest ordering). Thus this triplet in Tree 1 would match
pattern A, and the same triplet in Tree 2 would match
pattern J.
The triplet difference between two trees is computed by
considering all triplets and counting the number of triplets
for which the pattern in the two trees is different. For
example, Table 2 lists the triplet patterns for all four of
the triplets in the trees of Figure 1. Since three of the
triplets have different patterns in the two trees, the triplet
difference for these tree arrangements is 3.
                     Tree 1 arrangement
Tree 2 arrangement   3 4 4 4 2 4 3 4
                     3 4 4 4 2 4 3 4
                     4 3 4 4 4 2 4 3
                     4 3 4 4 4 2 4 3
                     3 4 2 4 4 4 3 4
                     3 4 2 4 4 4 3 4
                     4 3 4 2 4 4 4 3
                     4 3 4 2 4 4 4 3
Table 3: Triplet difference matrix for all possible
arrangements of Tree 1 and Tree 2
The minimum triplet difference (MTD) algorithm finds
an arrangement for each tree that minimises the triplet
difference. In principle, the algorithm considers each
possible arrangement of each of the two trees, then chooses
the pair of arrangements for which the triplet difference is
smallest. In general, there may be many pairs of
arrangements with the same minimum triplet difference;
MTD does not specify which such pair should be chosen.
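The exhaustive form of MTD described above can be sketched as follows. The sketch does not use the pattern letters of Figure 2; instead it stands in for a pattern with the pair (leaf that branches off first, top-to-bottom order of the three leaves), which distinguishes the same twelve cases. That encoding, the nested-tuple trees, and all function names are assumptions for this illustration, and the search is only feasible for small trees.

```python
from itertools import combinations, product


def arrangements(tree):
    """Yield every drawing order of a binary nested-tuple tree (upper branch first)."""
    if isinstance(tree, str):
        yield tree
        return
    upper, lower = tree
    for u, l in product(arrangements(upper), arrangements(lower)):
        yield (u, l)
        yield (l, u)


def leaf_order(tree):
    if isinstance(tree, str):
        return [tree]
    return leaf_order(tree[0]) + leaf_order(tree[1])


def pattern(tree, triple):
    """Stand-in for a labelled triplet pattern: (first leaf to branch off, drawn order)."""
    order = tuple(name for name in leaf_order(tree) if name in triple)
    node = tree
    while True:
        in_upper = [name for name in triple if name in leaf_order(node[0])]
        if len(in_upper) == 3:
            node = node[0]
        elif len(in_upper) == 0:
            node = node[1]
        else:
            break
    if len(in_upper) == 1:
        odd = in_upper[0]
    else:
        odd = next(name for name in triple if name not in in_upper)
    return odd, order


def triplet_difference(arr_a, arr_b):
    """Count triplets whose pattern differs between two specific arrangements."""
    names = sorted(leaf_order(arr_a))
    return sum(pattern(arr_a, t) != pattern(arr_b, t) for t in combinations(names, 3))


def minimum_triplet_difference(tree_a, tree_b):
    """Return (smallest difference, every pair of arrangements that reaches it)."""
    scored = [(triplet_difference(a, b), a, b)
              for a in arrangements(tree_a) for b in arrangements(tree_b)]
    best = min(score for score, _, _ in scored)
    return best, [(a, b) for score, a, b in scored if score == best]


# Assumed encoding of the original arrangements of the Figure 1 trees.
TREE_1 = ("Ant", (("Bat", "Cow"), "Dog"))
TREE_2 = (("Bat", "Cow"), ("Ant", "Dog"))

best, pairs = minimum_triplet_difference(TREE_1, TREE_2)
```

Because several pairs usually tie for the minimum, the sketch returns all of them and leaves the choice to the caller.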
For example, Table 3 lists the triplet difference for each
of the 8 possible arrangements of Tree 1 and Tree 2. In
this case the minimum difference is 2, which is achieved
by 8 pairs, of which one is as follows:
[Figure: Tree 1 drawn with leaves Ant, Dog, Bat, Cow (top to bottom) facing Tree 2 drawn with leaves Dog, Ant, Bat, Cow]
3.2 Maximum Branch Similarity
The maximum branch similarity (MBS) algorithm
arranges one tree so that the branches of each internal
node have the largest number of leaf nodes in common
with the corresponding branches of the equivalent node in
the other tree.
For example, consider the original arrangements of the
trees in Figure 1. The set of leaf nodes comprised by the
upper branch of the root node of Tree 1 is {Ant}, and the
set comprised by the lower branch is {Bat, Cow, Dog}.
Similarly, the Tree 2 root node upper branch comprises
{Bat, Cow} and lower branch {Ant, Dog}. Thus for this
arrangement there are no nodes common to the upper
branches, and only one (Dog) common to the lower
branches, for a total common node count of 1. However,
if the branches of the Tree 2 root node were exchanged,
then the upper branches would have 1 common node
(Ant) and the lower branches would have 2 common
nodes (Bat and Cow), for a total common node count of
3. Thus MBS indicates that the root node of Tree 2
should be flipped (its branches swapped), giving the
following arrangement:
[Figure: Tree 1 drawn with leaves Ant, Bat, Cow, Dog (top to bottom) facing Tree 2 drawn with leaves Ant, Dog, Bat, Cow]
The algorithm then recursively considers the upper and
lower children of the original nodes, ultimately
terminating at the leaf nodes. In this simple example, no
further swaps occur since the upper branch of Tree 1 is
already a leaf, and since flipping the lower branch of Tree
2 would not result in an increase in the number of
common nodes (both alternatives have only 1 node in
common).
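The recursive procedure can be sketched as below. The reference tree is left untouched and the other tree is rearranged; a node is flipped only when the swap strictly increases the common leaf count, which matches the example where a tie leaves the branches unchanged. The nested-tuple encoding (upper branch first) and the function names are assumptions for this illustration.

```python
def leaf_set(tree):
    if isinstance(tree, str):
        return {tree}
    return leaf_set(tree[0]) | leaf_set(tree[1])


def mbs(reference, tree):
    """Rearrange `tree` so that its branches best match those of `reference`."""
    if isinstance(reference, str) or isinstance(tree, str):
        return tree                      # nothing left to compare or flip
    (ref_upper, ref_lower), (upper, lower) = reference, tree
    keep = (len(leaf_set(ref_upper) & leaf_set(upper))
            + len(leaf_set(ref_lower) & leaf_set(lower)))
    flip = (len(leaf_set(ref_upper) & leaf_set(lower))
            + len(leaf_set(ref_lower) & leaf_set(upper)))
    if flip > keep:                      # swap only on a strict improvement
        upper, lower = lower, upper
    return (mbs(ref_upper, upper), mbs(ref_lower, lower))


# Assumed encoding of the original arrangements of the Figure 1 trees.
TREE_1 = ("Ant", (("Bat", "Cow"), "Dog"))
TREE_2 = (("Bat", "Cow"), ("Ant", "Dog"))

arranged = mbs(TREE_1, TREE_2)   # (("Ant", "Dog"), ("Bat", "Cow"))
```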
3.3 All-But-n
We have explored a class of algorithms, which we call
All-But-n (ABn), that can arrange trees to maximise leaf
node alignment in a face-to-face display where the GAS
of the two trees is almost as large as the trees themselves
(in other words, where the trees differ with respect to just
a few nodes).
The simplest situation (AB1) occurs for trees for which
the GAS includes all but one node. In this case, the aim of
the algorithm is to choose an arrangement for the GAS so
that, when the differing node is re-inserted into the tree
(which will be in a different position in the two trees), the
differing nodes will be aligned. For example, the trees of
Figure 1, which have a GAS that excludes the single node
Dog, could be drawn as follows.
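[Figure: Tree 1 and Tree 2 drawn face-to-face, both with leaves Ant, Dog, Bat, Cow (top to bottom), so that the Dog nodes are aligned]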
AB1 partitions the GAS into three components at the
nearest common ancestor (NCA) of the points in the two
original trees at which the differing node is attached to the
GAS. The component above the NCA (the “outer” tree)
plays no further part in the algorithm. The algorithm
proceeds by arranging the upper and lower “inner”
branches of the NCA so that the missing node attachment for
one tree is on the lower boundary of the upper “inner”
branch, while for the other tree it is on the upper
boundary of the lower “inner” branch. Then, when the
two trees are constructed around face-to-face copies of
the GAS, the missing node insertion points will coincide.
Since it is always possible to arrange a tree so that any
one particular node is on the tree boundary, it is always
possible to achieve this arrangement when the GAS is
only one node short of the full trees. When more than
one node must be pruned (and subsequently reinserted),
the situation is more complex; sometimes full alignment
can be achieved, but sometimes only partial alignment is
possible. A full explanation of the ABn algorithm is
beyond the scope of this paper.
4 A Visual Tree Comparison Tool
We have implemented a prototype application for
visualising pairs of phylogenetic trees and used it as a
vehicle for developing and evaluating our ideas. The
application is implemented in Java using the Swing
components.
Figure 3 shows the prototype tool displaying two 50-node
trees. The program can read standard Nexus-format tree
files (David et al. 1997) and display a selected pair of
trees. It provides controls for specifying basic parameters
of the tree display, including the separation between
branches and the depth of each node. The information
display area at the bottom of the window provides basic
information about the trees and is used largely for
debugging.
Figure 3 shows the trees displayed in the raw
arrangement specified in the Nexus file; in this example,
that arrangement does not make it easy to compare the
trees. However, the node connection display (between
the two trees), which visually connects common leaf
nodes in the two trees, provides some indication of
similarities in the trees.
Horizontal connection lines (coloured green in the
application) indicate nodes whose vertical position is the
same in the two trees. Clearly, if the two trees (or parts
of the trees) are identical, then they can be drawn so that
all nodes are aligned, in which case the connection
display would consist entirely of parallel horizontal lines.
In Figure 3, few nodes are aligned (the exception is a
group of 3 towards the top of the display).
Slanted connection lines (coloured red or yellow
depending on whether the position of the node in the left
tree is higher or lower than that in the right tree) indicate
nodes that are not aligned. However, parallel slanted
lines indicate groups of nodes whose relative positions
are the same in the two trees, suggesting a similar
structure for those groups in the two trees. Figure 3
shows several such groups.
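The colouring rule for connection lines can be sketched as a comparison of each leaf's row in the two drawings. The sketch assumes evenly spaced rows numbered from the top, and it assumes that red marks a leaf drawn higher in the left tree and yellow a leaf drawn lower; the text does not state which colour corresponds to which case, so both are left as parameters. The function name is hypothetical.

```python
def connection_colours(left_order, right_order, higher="red", lower="yellow"):
    """Map each shared leaf to a line colour based on its row in the two drawings."""
    right_row = {name: row for row, name in enumerate(right_order)}
    colours = {}
    for row, name in enumerate(left_order):
        if name not in right_row:
            continue                      # leaf missing from the right tree: no line
        if row == right_row[name]:
            colours[name] = "green"       # aligned: horizontal connection line
        elif row < right_row[name]:
            colours[name] = higher        # smaller row index means higher on the left
        else:
            colours[name] = lower
    return colours


lines = connection_colours(["Ant", "Bat", "Cow", "Dog"],
                           ["Ant", "Dog", "Bat", "Cow"])
```

In this small example Bat and Cow are both one row lower on the right, so their connection lines are parallel, reflecting their shared relative position in the two trees.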
Figure 3: Visualisation tool interface
4.1 Using The Application
To rearrange the trees (in order to better compare them),
the user can use a combination of manual interaction and
automatic rearrangement.
The palette on the left of Figure 3 includes tools for
interactively modifying the tree appearance, including
selecting tree nodes, collapsing selected branches,
controlling the spacing between branches of a node,
swapping the upper and lower branches of a node, and
manually setting branch colours and line thicknesses.
The collapse tool is used to temporarily hide various parts
of the tree, as shown in Figure 4. Collapsing nodes
enhances visibility, especially for larger trees, because it
enables the user to focus on specific parts of the tree
while ignoring other parts. Collapsed nodes can
subsequently be expanded (and themselves arranged)
once their containing structure has been dealt with.
Figure 4: Collapsing interior nodes
The insert gap and decrease gap tools are used to adjust the
space between branches in order to arrange a
group of nodes so that they are located at the same level
in both of the trees.
The flip tool is used to swap the positions of the branches
of a given interior node, which allows manual
manipulation of the tree arrangement and may provide a
simpler view of the tree structures.
The visualisation tool currently implements the MTD and
MBS automatic rearrangement algorithms, but not the
ABn algorithm. To apply the algorithms, the user selects
a branch (or perhaps the entire tree) in both left and right
trees, then invokes the desired algorithm. The application
computes the new arrangements, then redraws the trees
with the selected nodes rearranged.
4.2 Evaluation
Informal evaluation of our prototype visualisation tool
has shown that a combination of automatic rearrangement
and manual rearrangement is often effective in rapidly
generating an arrangement that facilitates tree
comparison, even for quite large trees.
For example, Figure 5 shows an arrangement of the trees
in Figure 3 for which most nodes are aligned. The
arrangement was achieved by a combination of MBS
(applied to the whole trees to align high-level structure),
MTD (to sort out the “tangles” indicated by groups of
nearly parallel connecting lines), manual node flipping (to
fine-tune a few branches), and manual gap insertion (to
move relatively aligned groups into absolute alignment).
Figure 5: An arrangement with greater alignment
Note, however, that alignment of nodes does not
necessarily indicate commonality of structure, although it
does make it much easier to see such commonality.
Figure 6 shows the same arrangement as does Figure 5,
but with common leaf-level branches highlighted in
colour. The colouring algorithm finds nodes that have the
same siblings in both trees, then recursively examines
their parents. Note that not all aligned nodes have
common structure (although most do) and that not all
nodes with common structure are aligned (although most
are).
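One way to read the colouring rule is that a branch is highlighted when the whole subtree below it has the same structure in both trees: two leaves with the same sibling form a common pair, and a parent is common when both of its branches are. The sketch below follows that reading by comparing order-independent subtree shapes; it is an interpretation under assumed names and an assumed nested-tuple encoding, not the prototype's code.

```python
def subtree_shapes(tree):
    """Return (shape of the tree, shapes of all subtrees rooted at its internal nodes)."""
    if isinstance(tree, str):
        return tree, set()
    child_shapes, inner = [], set()
    for child in tree:
        shape, below = subtree_shapes(child)
        child_shapes.append(shape)
        inner |= below
    shape = frozenset(child_shapes)      # branch order does not matter
    inner.add(shape)
    return shape, inner


def common_structure(tree_a, tree_b):
    """Subtrees that occur with identical structure in both trees."""
    return subtree_shapes(tree_a)[1] & subtree_shapes(tree_b)[1]


# Assumed encoding of the two trees of Figure 1.
TREE_1 = ("Ant", (("Bat", "Cow"), "Dog"))
TREE_2 = (("Bat", "Cow"), ("Ant", "Dog"))

highlighted = common_structure(TREE_1, TREE_2)   # only the Bat/Cow pair
```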
Our current investigations suggest that the combination of
alignment (to simplify the display) and colouring (to
identify common structures) appears promising as a way
to understand the two trees. We are working with our
bioinformaticist colleagues to validate and further
develop our ideas and to determine if interactive
visualisation is a viable technique for data of this kind.
5 Conclusion
Information visualisation can play a major role in the
analysis of phylogenetic data by allowing geneticists to
visually compare and therefore better understand their
data. We have developed and are in the process of
evaluating a prototype tool that domain specialists who
deal with phylogenies can use to help understand the data
that they confront.
Although we have not yet done so, we believe that our
ideas will also be of value in other domains where
similarly structured data is used, and where comparisons
are key in understanding the implications of that data.
Acknowledgements
We gratefully acknowledge the contribution of Rejmond
Sejic, who built an early version of the prototype tool and
implemented the MTD algorithm as part of his Honours
project (Sejic 2004). Thank you also to our School of
Biological Sciences colleagues Dr Cathy Abbott and
Assoc. Prof. Mike Schwarz for their valuable insights and
bioinformatics expertise.
References
Carrizo, S. F. (2004): Phylogenetic Trees: An Information
Visualisation Perspective. In Proc. 2nd Asia-Pacific
Bioinformatics Conference (APBC2004), Dunedin,
New Zealand, Australian Computer Society, Inc.
Klinger, J. and Amenta, N. (2002): Case Study:
Visualizing Sets of Evolutionary Trees. In Proc. IEEE
Symposium on Information Visualization, Boston,
Massachusetts, USA.
Roderic, D. M. (1993): Component 2.0 – User Guide,
http://taxonomy.zoology.gla.ac.uk/rod/cplite/Manual.html
(last accessed 08/08/2005).
Bryant, D. (1997): Building Trees, Hunting for Trees and
Comparing Trees. Ph.D. Thesis, University of
Canterbury.
Sejic, R. (2004): Visual Comparison of Phylogenetic
Trees. Honours Thesis, Flinders University of South
Australia.
Gibas, C. and Jambeck, P. (2001): Bioinformatics
Computer Skills. O’Reilly, USA.
David, R. M., David, L. S., and Wayne, P. M. (1997):
NEXUS: An Extensible File Format for Systematic
Information. Systematic Biology, 46(4):590-621.
Munzner, T., Guimbretiere, F., Tasiran, S., Zhang, L. and
Zhou, Y. (2003): TreeJuxtaposer: Scalable Tree
Comparison Using Focus+Context with Guaranteed
Visibility. In Proc. SIGGRAPH 2003.
Thorup, M. and Farach, M. (1994): Fast Comparison of
Evolutionary Trees. In Proc. 5th Annual ACM-SIAM
Symposium on Discrete Algorithms.
Maddison, W. P. and Maddison, D. R. (2005): Mesquite:
a modular system for evolutionary analysis. Version
1.06. http://mesquiteproject.org
Felsenstein, J., Sawyer, S., and Kochin, R. (1982): An
efficient method for matching nucleic acid sequences.
Nucleic Acids Research 10(1):133-139.
Farris, J. S. (1983): The logical basis of phylogenetic
analysis. In Advances in Cladistics, Platnick N.I. &
Funk V.A., eds, pp. 1-36. Columbia Uni. Press, New
York.
Robinson, D. F. (1971): Comparison of labeled trees with
valency three. J. Combin. Theory 11:105-119.
Felsenstein, J. (2005): PHYLIP web site.
http://evolution.genetics.washington.edu/phylip.html
(last accessed 27/10/2005).
Figure 6: The final presentation