Carnival of Evolution, Number 46: The Tree (structures) of Life
Bradly Alicea, April 1, 2012
Originally posted at: http://syntheticdaisies.blogspot.com/2012/04/carnival-of-evolution-number46-tree.html.
Introduction
Welcome to Carnival of Evolution, Number 46. I am your host, Bradly Alicea. This month's
theme is: the Tree (structures) of Life. Since this blog covers a mix of both biological and
computational content, it is fitting that we explore this month's submissions in the context of
trees (the computational kind) and biological classification (the biological kind). I will indulge in
a historical and technical overview of trees used in evolutionary analysis, then present this month's posts.
What is a tree structure? In computational science, trees are a type of data structure often used to organize information hierarchically. In graph theory, a rooted tree is a special kind of directed acyclic graph (DAG), one in which every node except the root has exactly one parent. There are decision trees, factor trees, classification trees, and even trees resulting from fractal growth (Figure A). Factorization of the number 46 can be used to illustrate the way tree structures are built: it can be factored directly into its primes in a single bifurcation (Figure B).
Figure A. LEFT: An example of a decision tree, strict hierarchy. RIGHT: a tree that embodies
fractal growth (built using a recursion process).
Figure B. A factor tree for the number "46".
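To make the factor-tree idea concrete, here is a minimal Python sketch (an illustration, not a reconstruction of Figure B) that builds a factor tree recursively. It assumes a simple rule: split each composite number into its smallest prime factor and the remaining cofactor. The helpers `smallest_factor`, `factor_tree`, and `leaves` are hypothetical names introduced here.

```python
def smallest_factor(n: int) -> int:
    """Return the smallest prime factor of n (assumes n >= 2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found: n itself is prime


def factor_tree(n: int):
    """Build a factor tree as nested tuples (node, left, right).

    Primes are leaves (plain ints); every composite node bifurcates
    into its smallest prime factor and the remaining cofactor.
    """
    p = smallest_factor(n)
    if p == n:
        return n  # leaf
    return (n, factor_tree(p), factor_tree(n // p))


def leaves(tree):
    """Collect the leaves (prime factors) from left to right."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


print(factor_tree(46))          # (46, 2, 23)
print(factor_tree(60))          # (60, 2, (30, 2, (15, 3, 5)))
print(leaves(factor_tree(60)))  # [2, 2, 3, 5]
```

For 46 the result is a single bifurcation, as in Figure B; for a number like 60 the same recursion produces a deeper, nested hierarchy whose leaves are the prime factors.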
In Darwin's notebooks, common descent was conceptualized as a directed acyclic graph (Figure C). Indeed, one of the primary signatures of common descent (shared, derived traits) is well suited to analysis using directed acyclic graphs, although that relationship was not appreciated by Darwin. This connection would not become clear until the rise of phylogenetic theory many years later.
Figure C. One of the first "trees of life", from Darwin's Notebooks.
Figure D. A later version of an evolutionary tree, by Ernst Haeckel (late 19th century).
The "tree of life" is often thought of as a "branching bush" (Figure D), meaning that taxa (e.g. species) do not arise from one another in linear fashion. The concept of common ancestry is central. Common ancestors, where one ancestral form gives rise to many descendants, are key enablers of the proliferation of biodiversity, represented by bifurcations (binary splits) in the tree (Figure E). In a biological context, then, "trees" are reconstructed from available data to infer a set of evolutionary relationships.
Figure E. Why do we use trees? To look inside the "black box" of evolutionary relatedness that contains the common ancestor, using a variety of character types (traits). In many cases, the more characters you have, the more likely you are to find the "correct" tree, but this comes at a computational cost. Inference can be done using many different types of data.
Modern phylogenetics (or, as some people prefer, cladistics) provides a specialized language and protocol for understanding evolution from common descent. Tree structures are also used to reveal clades (i.e. sets of taxa descended from a single common ancestor) and nested relationships. This is the idea behind monophyly: given the data, a monophyletic group consists of one and only one internal node (a hypothetical common ancestor) together with all of the species descended from it (Figure E).
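A monophyly test can be sketched in a few lines of Python. This is a toy illustration, assuming a rooted tree written as nested tuples of taxon names; the simplified great-ape tree and the helper names `clades` and `is_monophyletic` are hypothetical. A set of taxa counts as monophyletic here exactly when it equals the full set of descendants of some single node.

```python
from typing import FrozenSet, List, Union

# A tree is either a leaf (taxon name) or a tuple of child subtrees.
Tree = Union[str, tuple]


def clades(tree: Tree, found: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set below `tree`, recording every node's leaf set in `found`."""
    if isinstance(tree, str):
        leaf_set = frozenset([tree])
    else:
        leaf_set = frozenset().union(*(clades(child, found) for child in tree))
    found.append(leaf_set)
    return leaf_set


def is_monophyletic(tree: Tree, taxa: set) -> bool:
    """True if `taxa` is exactly the set of descendants of one node."""
    found: List[FrozenSet[str]] = []
    clades(tree, found)
    return frozenset(taxa) in found


# Hypothetical, simplified rooted tree for four great apes.
primates = ((("human", "chimp"), "gorilla"), "orangutan")

print(is_monophyletic(primates, {"human", "chimp"}))                 # True
print(is_monophyletic(primates, {"human", "chimp", "gorilla"}))      # True
print(is_monophyletic(primates, {"chimp", "gorilla", "orangutan"}))  # False
```

The last grouping shares a common ancestor with humans but leaves them out, so on this tree it is paraphyletic rather than a clade.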
In general, a polyphyletic grouping (e.g. one based on traits produced by parallel or convergent evolution) is considered an incorrect evolutionary hypothesis. However, in select cases, polyphyletic relationships may capture true evolutionary relationships [1]. Yet the work of Carl Woese [2] demonstrates that all three known domains of life (eukarya, bacteria, and archaea) can be classified as a series of monophyletic groups.
Figure F. A modern "consensus" (based on rRNA data) phylogeny demonstrating the three
domains of life. COURTESY: Wikipedia.
Some people argue that in certain cases (hybridization or horizontal gene transfer) evolution is also reticulate: in the parlance of graph theory, lineages that merge create cycles in the graph (when edge direction is ignored), so the structure becomes a network rather than a tree. Indeed, depending on natural history, some groups of species or particular traits can exhibit a cycle [3]. Even in the case of the universal tree (Figure F), reticulations in the form of horizontal gene transfer can violate the strict hierarchy of this tree topology.
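The difference between a tree and a reticulate network can be checked mechanically. The Python sketch below is an illustration under assumed conventions: a phylogeny is a list of hypothetical (parent, child) edges, and a horizontal gene transfer event is modelled, very crudely, as one extra edge from a donor lineage into a recipient. That extra edge gives the recipient two parents, so the graph is no longer a tree; ignoring edge direction, it closes a cycle, even though no directed cycle appears (the network is still a DAG).

```python
from collections import Counter, defaultdict, deque


def _nodes(edges):
    return {node for edge in edges for node in edge}


def is_rooted_tree(edges):
    """True if directed (parent, child) edges form one rooted tree."""
    nodes = _nodes(edges)
    indegree = Counter(child for _, child in edges)
    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) != 1 or any(indegree[n] > 1 for n in nodes):
        return False  # no unique root, or some node has two parents
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen, queue = {roots[0]}, deque([roots[0]])
    while queue:  # every node must be reachable from the root
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == nodes


def has_undirected_cycle(edges):
    """Union-find: an edge joining two already-connected nodes closes a cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True
        parent[root_a] = root_b
    return False


def is_directed_acyclic(edges):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    nodes = _nodes(edges)
    indegree = Counter(child for _, child in edges)
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(nodes)


# Hypothetical tree ((A,B),C), and the same tree plus a transfer from C into B.
tree = [("root", "anc"), ("root", "C"), ("anc", "A"), ("anc", "B")]
network = tree + [("C", "B")]

for name, edges in (("tree", tree), ("network", network)):
    print(name, is_rooted_tree(edges), has_undirected_cycle(edges), is_directed_acyclic(edges))
# tree True False True
# network False True True
```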
Finally, phylogenetic analyses range from resolving relationships among 2-3 species to complex intraspecific relationships and the three domains of life. From a computational point of view, this is not a trivial issue. In general, the greater the number of taxa (e.g. species) analyzed, the far greater the number of possible evolutionary hypotheses (i.e. tree structures) there are to evaluate. This scaling is given by the equation in Figure G.
Figure G. Equation to find the total number of possible phylogenies (tree topologies) given a
specific number of related taxa.
In Figure G, the number of possible trees (T) increases faster than exponentially with the number of taxa (N) added to the analysis. Finding the optimal tree under criteria such as maximum parsimony is an NP-hard problem (for the biologists, this means that no known algorithm can guarantee an exact solution in reasonable time once many taxa are added). Fortunately, we can use search heuristics to approximate the true tree. This approximation is of course subject to the type and amount of data added to the analysis.
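The equation itself is not reproduced in the text, but a commonly used version of this count is the double factorial for fully bifurcating trees: (2N-5)!! unrooted trees, or (2N-3)!! rooted trees, for N labeled taxa. Assuming Figure G expresses that standard result, a short Python sketch shows how quickly the numbers grow; the function names `unrooted_trees` and `rooted_trees` are illustrative.

```python
import math


def unrooted_trees(n: int) -> int:
    """Unrooted, fully bifurcating trees on n labeled taxa: (2n-5)!! for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    # (2n-5)!! written with ordinary factorials: (2n-5)! / (2^(n-3) * (n-3)!)
    return math.factorial(2 * n - 5) // (2 ** (n - 3) * math.factorial(n - 3))


def rooted_trees(n: int) -> int:
    """Rooted trees on n taxa: (2n-3)!!, the same as unrooted trees on n + 1 taxa."""
    return unrooted_trees(n + 1)


for n in (4, 5, 10, 20):
    print(f"{n:>2} taxa: {unrooted_trees(n):,} unrooted trees")
```

Four taxa give only 3 unrooted trees, but ten already give 2,027,025 and twenty give more than 10^20, which is why exhaustive search quickly gives way to heuristics.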
If you guessed from the equation that this is a combinatorics problem, you are correct, but you
still have to complete the evolution crossword puzzle to claim your prize.
Now, on to the posts.........
For this version of Carnival of Evolution, I will be incorporating the submitted and other featured
posts into a series of phylogenetic trees. Each tree will demonstrate a typical tree topology that
one might encounter in the scientific literature. My criteria for homology, character coding, etc. were conceptual rather than systematic. In addition, the sampling was non-uniform over the
course of March. Nevertheless, this should still be a fun (and potentially educational) experience.
Tree #1: two clades with an outgroup.
In tree #1, we have two clades (taxa that share some set of derived characteristics), as well as an outgroup (a distantly related taxon that helps to determine the polarity, or ancestral state, of the traits that make up the tree).
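How an outgroup fixes polarity can be sketched as simple outgroup comparison. This Python illustration is not the method used to build the trees in this post: it assumes a hypothetical character matrix coded as integers and treats any ingroup state that matches the outgroup as ancestral (plesiomorphic) and any other state as derived (apomorphic). The function name `polarize` and the taxon names are hypothetical.

```python
def polarize(matrix, outgroup):
    """Label each ingroup character state as 'ancestral' or 'derived'.

    matrix maps taxon name -> list of character states (one per character).
    A state shared with the outgroup is read as ancestral; any other as derived.
    """
    reference = matrix[outgroup]
    return {
        taxon: ["ancestral" if state == ref else "derived"
                for state, ref in zip(states, reference)]
        for taxon, states in matrix.items()
        if taxon != outgroup
    }


# Hypothetical matrix: three characters scored for an outgroup and three taxa.
matrix = {
    "outgroup": [0, 0, 0],
    "taxon_1": [1, 0, 0],
    "taxon_2": [1, 1, 0],
    "taxon_3": [1, 1, 1],
}

for taxon, labels in polarize(matrix, "outgroup").items():
    print(taxon, labels)
```

Character 1 is derived in all three ingroup taxa, so under this reading it is a shared derived trait uniting them; character 2 additionally unites taxon_2 and taxon_3 as a nested clade.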
* the outgroup for this tree topology is a post from PZ Myers at Pharyngula, and is a link to an
educational video. PZ thinks this is a good way to teach 11-year-olds (or the uninitiated) about
evolution. PZ's post also serves to root two clades of two posts each.
* the first clade features posts on taxonomy by Larry Moran at Sandwalk and Jerry Coyne at
Why Evolution is True. Larry's post is a critique of a recent paper and press release on the
taxonomic status of Pikaia (a chordate from the Burgess Shale). Jerry's post is a review of a recent PNAS paper on the inactivation of taste receptor genes (which provide humans with the tastes for sweet, bitter, umami, salty, and sour) in certain carnivore species. In particular, the inactivation of sweet-taste genes was found to involve multiple types of changes to the genes. For a much more in-depth take on this topic (the subspecies exemplar in our clade), please see Bjorn Ostman at Pleiotropy. In a post called "Carnivores have bad taste", he will help you understand the molecular evolution behind pseudogenes that are coupled to function.
* the second clade features two posts from John Hawks at the John Hawks weblog. If you've never been to this blog, visiting for his artwork on human evolution and diversity alone is worth the time. The first post is one of many this month on sequence data and its evolutionary implications involving the Gorilla genome [4]. In it, John is interested in genes that evolve with respect to hearing (particularly LOXHD1) and how those genes have diverged between gorillas and humans. The second post concerns the taxonomic status of humans. Commenting on a recent article about Richard Dawkins in the Washington Post, John argues that humans should be considered hominoids rather than apes, because Hominoidea represents a valid taxonomic (i.e. monophyletic) group.
Tree #2: the four taxon case.
In tree #2, we have a four-taxon case, which Huelsenbeck and Hillis [5] used to test how well phylogenetic methods (e.g. maximum parsimony) recover the true evolutionary relationships among four taxa. In this unrooted tree, we have two clades.
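With only four taxa there are just three possible unrooted trees, so each can be scored exhaustively. The Python sketch below scores them with Fitch's parsimony algorithm, a standard way to count the minimum number of changes a tree requires; it is offered as an illustration only, and the toy alignment, the `QUARTETS` table, and the helper names are hypothetical. Each unrooted quartet is rooted on its central edge, which does not change the Fitch score.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one site on a rooted binary tree.

    tree: a leaf name or a (left, right) tuple; states: leaf name -> state.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def parsimony_score(tree, alignment):
    """Total changes over all sites; alignment maps taxon -> sequence string."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


# The three unrooted four-taxon topologies, each rooted on its central edge.
QUARTETS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

alignment = {  # hypothetical toy sequences
    "A": "ACGTTGCA",
    "B": "ACGTTGCC",
    "C": "ATGATGCA",
    "D": "ATGATGCC",
}

scores = {name: parsimony_score(tree, alignment) for name, tree in QUARTETS.items()}
best = min(scores, key=scores.get)
print(scores)  # {'AB|CD': 4, 'AC|BD': 5, 'AD|BC': 6}
print(best)    # AB|CD
```

Two sites group A with B (and C with D) and one site groups A with C, so the most parsimonious tree is AB|CD.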
* the clade on the left involves two posts on the Watchmaker analogy (originally set forth by William Paley and featured in a book by Richard Dawkins [6]). Greg Laden's Blog features a personal anecdote that leads into the second post by Brian Lynchehaun at The Crommunist Manifesto. Brian's post argues that the Watchmaker analogy is not a valid method of reasoning, for a number of reasons explicated in the post.
* the clade on the right involves posts on human evolution. The first post is from This Week in
Evolution, and features insight on cumulative culture in primate species. The post focuses on two
recent articles on solving puzzles and game playing in a cross-species context. What
distinguishes humans from other primates: problem-solving ability, cumulative culture, or a bit
of both? The second post, at 10,000 Birds, is a free-association-style essay on the
paleoanthropology of human scavenging and its relationship to our modern energy
behaviors/needs and the species around us.
Tree #3: variable branch-length tree topology
In tree #3, we have a tree with variable branch lengths. Trees of this style are often used when time-of-divergence information (e.g. derived from mutation rates) is available. Speaking of which, there was a post by Larry Moran from Sandwalk discussing a reconsideration of how mutation rates are calculated. This is based on a recent paper that suggests there is variation in genome-wide mutation rates within and between human families.
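As a rough illustration of how a rate turns branch lengths into time, here is a strict-molecular-clock sketch in Python. It assumes a constant substitution rate r on both lineages, so that a pairwise distance d accumulated since divergence satisfies d = 2rT; the distance and rates below are hypothetical, and real analyses must also correct distances for multiple hits and allow rates to vary. The function name `divergence_time` is illustrative.

```python
def divergence_time(distance, rate):
    """Strict molecular clock: both lineages accumulate changes, so d = 2 * r * T.

    distance: substitutions per site between two sequences (assumed already corrected).
    rate: substitutions per site per year along each lineage.
    Returns the estimated time since divergence, in years.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


d = 0.012  # hypothetical pairwise distance
for r in (0.5e-9, 1.0e-9, 1.5e-9):  # hypothetical rates per site per year
    print(f"rate {r:.1e}: {divergence_time(d, r) / 1e6:.1f} million years")
```

Because T is inversely proportional to r, halving the assumed rate doubles the estimated divergence time, which is why the way mutation rates are calculated matters so much for dating.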
Tree #3 (based on no particular mutation rate) features two clades: one on human taxonomy and the other on human sociocultural evolution.
* the first clade features two different takes on human taxonomy. The first post from Stephanie
Zvan at Almost Diamonds is a discussion about human subspecies (always a contentious topic)
and their relationship to human variation. She asks Greg Laden (a fellow blogger) about this
issue, and gets a very thoughtful response. The second post is from John Wilkins at Evolving
Thoughts, with his comments on John Hawks' post featured in tree #1.
* the second clade features a post on a social attribute of our species and one of our cultural
products, both from an evolutionary perspective. Anne Buchanan at Mermaid's Tale gives her
thoughts on human altruism in the context of Hamilton's rule and what it means to be
cooperative. She proposes that we must look beyond kinship to understand the true nature of
altruism. The second post, from This Week in Evolution, involves recent findings related to diet soda and how they might be explained by a model of evolutionary tradeoffs between fertility and longevity.
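Hamilton's rule itself is compact enough to state in code: an altruistic act is favoured by selection when rB > C, where r is the coefficient of relatedness between actor and recipient, B the benefit to the recipient, and C the cost to the actor. The Python sketch below is a textbook illustration, not a model from the posts above; the benefit and cost values are hypothetical, while the relatedness values are the standard ones for relatives in an outbred diploid population.

```python
from fractions import Fraction

# Standard coefficients of relatedness for relatives in an outbred diploid population.
RELATEDNESS = {
    "full sibling": Fraction(1, 2),
    "half sibling": Fraction(1, 4),
    "first cousin": Fraction(1, 8),
}


def altruism_favored(r, benefit, cost):
    """Hamilton's rule: selection favours the altruistic act when r * B > C."""
    return r * benefit > cost


def break_even_benefit(r, cost):
    """Benefit at which the act exactly pays for itself: B = C / r."""
    return cost / r


benefit, cost = 3, 1  # hypothetical fitness units
for relative, r in RELATEDNESS.items():
    print(f"{relative}: favoured={altruism_favored(r, benefit, cost)}, "
          f"needs B > {break_even_benefit(r, cost)}")
```

The break-even values (2 for full siblings, 8 for first cousins) are the arithmetic behind the familiar quip about laying down one's life for two brothers or eight cousins.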
For a more computational view on altruism, check out Masoud Mirmomeni's work at Michigan
State. He was featured in a recent Beacon Center Blog researcher profile (Mathematical
Modeling of Evolution). Masoud's work focuses on understanding the Price equation (which is
used here as an alternative to Hamilton's rule) using Avidians (digital organisms).
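For readers curious about the Price equation, here is a small numerical sketch in Python (not Masoud's model; the population values and the helper name `price_terms` are hypothetical). It uses the standard form wbar · Δzbar = Cov(w, z) + E(w · Δz), which splits the change in a mean trait into a selection term and a transmission term, and checks the identity on a toy population.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def price_terms(z, w, z_offspring):
    """Return (observed change in mean trait, selection term, transmission term).

    z: parental trait values; w: parental fitness (offspring counts);
    z_offspring: mean trait value among each parent's offspring.
    Uses population (divide-by-n) covariance and expectation, so that
    change = Cov(w, z) / wbar + E(w * (z' - z)) / wbar holds exactly.
    """
    wbar, zbar = _mean(w), _mean(z)
    cov_wz = _mean(wi * zi for wi, zi in zip(w, z)) - wbar * zbar
    transmission = _mean(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_offspring))
    zbar_next = sum(wi * zo for wi, zo in zip(w, z_offspring)) / sum(w)
    return zbar_next - zbar, cov_wz / wbar, transmission / wbar


# Hypothetical population of four parents: higher trait values leave more offspring,
# and the last parent's offspring fall slightly below the parental value.
z = [1.0, 2.0, 3.0, 4.0]
w = [1, 1, 2, 4]
z_offspring = [1.0, 2.0, 3.0, 3.5]

change, selection, transmission = price_terms(z, w, z_offspring)
print(change, selection, transmission, selection + transmission)
# 0.375 0.625 -0.25 0.375
```

Selection alone would raise the mean trait by 0.625, imperfect transmission pulls it back by 0.25, and the two terms sum to the observed change of 0.375.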
Tree #4: a reticulate tree topology (e.g. a tangled web, a tangled bank).
In tree #4, we have an example of reticulating branches (or evolutionary graph cycles, if you will). Our reticulations are due to related intellectual content or institutional affiliations rather than hybridization or horizontal gene transfer (HGT) events, but hopefully this still conveys the concept of evolution as a tangled bank [7]. Instead of calling out blog posts by clade, I will go clockwise from the upper-left portion of the graph.
* the first post is from yours truly at Synthetic Daisies (Use, reuse, and use again....), and
discusses an instance of exaptation called neural recycling (which is one way the brain can
acquire new functional architectures without growing accordingly). There is also discussion
of homology in a neural context, and the techniques researchers use to map cross-species
relationships. The second post is from the Beacon Center Blog, and features current work
by Daniel Couvertier on simulating biased group selection and digital evolution. The third post is
another post on the Gorilla Genome, this one from David Winter at The Atavism. In this post, the
taxonomic relationship between Gorilla, Chimp, and Human is considered, as he shows why not
every gene gives the same phylogenetic signal in a three-taxon relationship.
* Ken Weiss at the Mermaid's Tale writes (actually two posts) on the role of "slop" in describing how life works. By "slop", he means the role of stochastic processes, non-normally distributed phenomena, and chance events. In another post from the Beacon Center Blog, Eric Bruger gives his own take on the evolution of cooperative behaviors, this time in bacteria. And in another post from Mermaid's Tale (this one by Anne Buchanan), the role of random events in ordered biological systems is pondered. The subject of the post is a recent paper on stochastic gene expression, which Anne then relates to the role populations play in averaging out and otherwise resampling random events in biological systems and evolution. The last post on this tree is from EvoAnth on reconsidering the evolution of monogamy among primate species by using digit ratios (2D:4D) as an assay.
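Why different genes can disagree in a three-taxon case can be illustrated with a standard coalescent result, offered here as background rather than as the analysis in David's post. If ancestral lineages fail to coalesce during the short interval between the two speciation events (incomplete lineage sorting), a gene tree can differ from the species tree. For a species tree ((human, chimp), gorilla) whose internal branch lasts t coalescent units (generations divided by 2N for a diploid population of size N), the matching gene tree is expected with probability 1 - (2/3)e^(-t) and each of the two mismatching trees with probability (1/3)e^(-t). The value of t below and the function name are hypothetical.

```python
import math


def gene_tree_probabilities(t):
    """Expected gene-tree frequencies under the coalescent for species tree ((human,chimp),gorilla).

    t: internal branch length in coalescent units (t >= 0).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3  # each of the two mismatching topologies
    return {
        "((human,chimp),gorilla)": 1 - 2 * discordant,
        "((human,gorilla),chimp)": discordant,
        "((chimp,gorilla),human)": discordant,
    }


for topology, p in gene_tree_probabilities(0.5).items():
    print(f"{topology}: {p:.3f}")
```

With t = 0.5, roughly 60% of genes are expected to match the species tree and about 20% to support each alternative. A shorter internal branch pushes all three probabilities toward 1/3.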
Tree #5: long-branch attraction.
Tree #5 demonstrates a phenomenon called long-branch attraction, which occurs when many changes accumulate along two branches that are not each other's closest relatives, so that methods such as parsimony mistake the similarity for shared ancestry and place them (graphically speaking) in the same clade. For our long-branch attraction example, we have two blog posts. The left branch features a post from Mousetrap on Hamiltonian parasites (in this case, Hamiltonian refers to W.D. Hamilton, not the physics function). This post is an open inquiry into how parasites fit into the Red Queen hypothesis. The post on the right branch is from Safari Ecology on the toxicity of snake venom.
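Long-branch attraction can be reproduced in a small simulation. The Python sketch below is a toy version of the classic four-taxon demonstration, under stated assumptions: DNA evolves under the Jukes-Cantor model on a true tree ((A,B),(C,D)), the branches leading to A and C are long, and all branch lengths and helper names are hypothetical. For four taxa, only sites where two taxa share one state and the other two share another state change the parsimony score, so parsimony simply picks the split supported by the most such sites.

```python
import math
import random

STATES = "ACGT"


def evolve(state, branch_length, rng):
    """Jukes-Cantor change along one branch (length in expected substitutions per site)."""
    p_change = 0.75 * (1 - math.exp(-4 * branch_length / 3))
    if rng.random() < p_change:
        return rng.choice([s for s in STATES if s != state])
    return state


def simulate_site(lengths, rng):
    """One site on the true tree ((A,B),(C,D)); lengths maps branch name -> length."""
    u = rng.choice(STATES)                   # internal node joining A and B
    v = evolve(u, lengths["internal"], rng)  # internal node joining C and D
    return {
        "A": evolve(u, lengths["A"], rng),
        "B": evolve(u, lengths["B"], rng),
        "C": evolve(v, lengths["C"], rng),
        "D": evolve(v, lengths["D"], rng),
    }


def supported_split(s):
    """Return the split favoured by an xxyy site, or None if the site is uninformative."""
    a, b, c, d = s["A"], s["B"], s["C"], s["D"]
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None  # same parsimony score on all three trees


def parsimony_tree(lengths, n_sites=5000, seed=1):
    """Simulate n_sites and return (most parsimonious split, site counts per split)."""
    rng = random.Random(seed)
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        split = supported_split(simulate_site(lengths, rng))
        if split:
            counts[split] += 1
    return max(counts, key=counts.get), counts


# Long branches lead to A and C, which are NOT sister taxa in the true tree.
lengths = {"A": 1.0, "C": 1.0, "B": 0.05, "D": 0.05, "internal": 0.05}
best, counts = parsimony_tree(lengths)
print(best, counts)
```

With these settings parsimony is expected to favour AC|BD, wrongly grouping the two long branches; with short, equal terminal branches and a longer internal branch, the same code recovers the true AB|CD split.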
Alternative Evolutionary Hypotheses (evo-eco-devo posts).
Another post I could not incorporate into a tree is from Felipe Pérez Jvostov at Eco-Evolutionary
Dynamics called Parasites, guppies and predation, which is on........a recent paper published by
their group on parasites, guppies, and predation. Holly Dunsworth at The Mermaid's Tale wrote a
comprehensive post on babyism (Forget Bipedalism, what about Babyism?) which nicely recaps
some of her dissertation work on the role of early development in motherhood in evolution.
Outside of the blogosphere, there was an interesting evo-devo related talk this month in Second
Life given by Marta Linde-Medina of New York Medical College. The talk was hosted by
the Embryo Physics course, and was a developmental physics perspective on the origins of
curved bird beaks.
Two final announcements before CoE concludes for this month:
1) this year's Artificial Life conference (Alife XIII) will be held at Michigan State University in
East Lansing from July 18-22. The conference is being hosted by the BEACON Center, and the
theme is experimental evolution. The program will cover cutting-edge work being done in
biological theory, artificial life (the simulation of evolution), and the evolution of intelligence.
This will be a very interesting and intellectually stimulating conference, so be sure to attend if
you can.
2) here is a bonus for those of you inclined to puzzles and games. I have created an
"evolutionary" crossword puzzle for you to ponder over the next month. It is fairly light, but also
requires a fair amount of knowledge about evolutionary theory and biology. If you can answer all
of the clues correctly by April 20, e-mail me proof of completion and I will post your name to
Synthetic Daisies saying that you are a CoE 46 puzzle solver. Good luck!
References:
[1] Conant, G.C. and Wagner, A. (2003). Convergent evolution of gene circuits. Nature
Genetics, 34(3), 264-266. AND Eisthen, H.L. and Nishikawa, K.C. (2002). Convergence:
obstacle or opportunity? Brain, Behavior, and Evolution, 59, 235-239.
[2] Woese, C.R. (2000). Interpreting the universal phylogenetic tree. PNAS, 97(15), 8392-8396.
[3] Doolittle, W.F. (1999). Phylogenetic classification and the universal tree. Science, 284, 2124-2128. AND Williams, D. et al. (2011). A rooted net of life. Biology Direct, 6, 45.
[4] Scally, A. et al. (2012). Insights into hominid evolution from the gorilla genome sequence. Nature, 483, 169.
COURTESY: Figures 3 and 4, reference [3].
[5] Huelsenbeck, J.P. and Hillis, D.M. (1993). Success of Phylogenetic Methods in the Four-Taxon Case. Systematic Biology, 42(3), 247-264.
[6] Dawkins, R. The Blind Watchmaker. W.W. Norton, New York.
[7] Zimmer, C. (2009). The Tangled Bank: an introduction to evolution. Roberts and Company,
New York.