Math 239: Discrete Mathematics for the Life Sciences
Spring 2008
Lecture 14 — March 11
Lecturer: Lior Pachter
Scribe/Editor: Maria Angelica Cueto / C.E. Csar

14.1 Introduction
The goal of today’s lecture is to prove the remaining theorem of last lecture characterizing
tree additive dissimilarity maps. Namely,
Theorem 14.1. A dissimilarity map δ is tree additive (with respect to a given tree T ) iff δ
satisfies the weak four point condition.
As we discussed last time, rather than proving this theorem we will change our framework
to dissimilarity maps with values in a given group G and provide a more general result in
this new setting. In our previous framework, the group G is (R, +), with identity 1_G = 0,
and our original Theorem 14.1 will follow immediately from the general result.
For general literature including the material discussed today, we refer the reader to the
book by Semple and Steel.
14.2 General setting
In this section we provide analogous definitions for all concepts developed in Lecture 13.
Definition 14.2. Given a group G, a G-dissimilarity map is a map δ : X × X → G such
that δ(x, x) = 1_G for all x ∈ X.
Note that in this definition we avoid the symmetry condition required for dissimilarity maps.
Why have we decided to do so? Two reasons justify our choice:
• G in general may not be an abelian group, and
• the general framework for dissimilarity maps realized by trees will allow directed trees
with directed edge weights, so that we may have δ(i, j) ≠ δ(j, i) for adjacent nodes
i, j ∈ V (T ), where δ(i, j) denotes the weight of the directed edge i → j.
Definition 14.3. δ is a tree dissimilarity map if there exists a tree T (i.e. a phylogenetic
X-tree) and a weight function w : E(T ) → G such that

    δ(x, y) = ∏_{e ∈ path from ϕ(x) to ϕ(y)} w(e),

where ϕ : X → V (T ) is the corresponding labeling function and the product is taken with
the operation of the group G.
Since (G, ∗_G) may not be abelian, the product defining δ(x, y) must be taken in the
order given by the path from x to y, that is, if the path is x = v_0 → v_1 → · · · → v_r → y,
then δ(x, y) = w(e_{v_0 v_1}) ∗_G w(e_{v_1 v_2}) ∗_G · · · ∗_G w(e_{v_r y}). As we discussed before, we may
assign weights to each direction of the edges of T .
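To make the path-ordered product concrete, here is a minimal sketch (mine, not from the notes) for a non-abelian G, taking G to be invertible 2×2 real matrices under multiplication; the path, the edge-weight dictionary and all names are hypothetical.

    import numpy as np

    def path_product(path, weight):
        """Multiply the edge weights along a directed path, in path order."""
        result = np.eye(2)                    # the identity element 1_G
        for u, v in zip(path, path[1:]):
            result = result @ weight[(u, v)]  # order matters when G is non-abelian
        return result

    # toy example: the path x -> u -> y with matrix-valued edge weights
    w = {("x", "u"): np.array([[1.0, 1.0], [0.0, 1.0]]),
         ("u", "y"): np.array([[2.0, 0.0], [0.0, 1.0]])}
    delta_xy = path_product(["x", "u", "y"], w)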
14.3 Main Theorem
We are now in a position to state the general result. For simplicity of notation we will
drop the subscript G from the group operation · of the group G, but the reader should keep
this in mind.
Theorem 14.4 (Main Theorem). Let G be a group and δ a G-dissimilarity map on X.
Consider the set

    H_δ = { δ_{ik} · δ_{jk}^{-1} · δ_{jl} · δ_{il}^{-1} : i, j, k, l ∈ X } ⊂ G.

If δ is a tree dissimilarity map then:

1. ∀ i, j, k ∈ X : δ_{ij} · δ_{kj}^{-1} · δ_{ki} = δ_{ik} · δ_{jk}^{-1} · δ_{ji} (“three point condition”);

2. ∀ i, j, k, l ∈ X pairwise distinct, there exists some ordering of these points (i.e. a
relabeling of them) such that δ_{ik} · δ_{jk}^{-1} = δ_{il} · δ_{jl}^{-1} (“four point condition”).

Moreover, if δ satisfies the previous conditions and H_δ has no element of order 2 in G, then
δ is a tree dissimilarity map.
Remark: As we discussed previously, Theorem 14.1 will be a consequence of the Main
Theorem, since (R, +) has no elements of order 2.
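For the case G = (R, +) mentioned in the remark, the two conditions become statements about sums and differences of real numbers. The following rough sketch (mine, not from the lecture) spells them out; delta is a hypothetical dictionary mapping ordered pairs of points to reals.

    from itertools import combinations, permutations

    def three_point(delta, i, j, k, tol=1e-9):
        # delta_ij * delta_kj^{-1} * delta_ki = delta_ik * delta_jk^{-1} * delta_ji,
        # written additively for G = (R, +)
        lhs = delta[(i, j)] - delta[(k, j)] + delta[(k, i)]
        rhs = delta[(i, k)] - delta[(j, k)] + delta[(j, i)]
        return abs(lhs - rhs) < tol

    def four_point(delta, i, j, k, l, tol=1e-9):
        # some relabeling (a, b, c, d) satisfies delta_ac * delta_bc^{-1} = delta_ad * delta_bd^{-1}
        return any(abs((delta[(a, c)] - delta[(b, c)]) - (delta[(a, d)] - delta[(b, d)])) < tol
                   for a, b, c, d in permutations((i, j, k, l)))

    def satisfies_conditions(delta, X):
        cond1 = all(three_point(delta, i, j, k) for i, j, k in permutations(X, 3))
        cond2 = all(four_point(delta, *quad) for quad in combinations(X, 4))
        return cond1 and cond2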
The proof of the sufficiency part of the Main Theorem will mimic the arguments
provided for Theorem 0 of last lecture. For this we will need to define the notion of an
ultrametric G-dissimilarity map as well as an ultrametric tree representation. We will show
that these two definitions are equivalent. The Main Theorem will be proven by induction
on |X|. Given our G-dissimilarity map δ on X satisfying conditions (1) and (2), we will
construct a suitable ultrametric on X′ = X \ {a}. This will give us an ultrametric tree
representation T′, and we will need to attach a node a and modify the weights of the edges
of the tree in order to obtain our result.
The proof of the necessary conditions will be immediate. We will illustrate the desired
conditions with an example, in which we denote the inverse element δ_{ij}^{-1} by a squiggly
arrow i ⇝ j.
Example. Assume δ is a tree dissimilarity map.
• Each side of condition (1) is given by the following weighted directed arrows. The
(LHS) corresponds to
[Figure: the three leaves i, j, k joined to a middle node, with solid arrows for the factors δ_{ij} and δ_{ki} and a squiggly arrow for δ_{kj}^{-1}.]
whereas the (RHS) corresponds to
[Figure: the same three leaves and middle node, with arrows for the factors δ_{ik}, δ_{jk}^{-1} and δ_{ji}.]
If we call u the middle node and we compute the products on the (LHS) and (RHS)
of the equation, we get δ_{iu} δ_{ui} on both sides due to several cancellations. Namely,

    (LHS) = (δ_{iu} δ_{uj})(δ_{ku} δ_{uj})^{-1}(δ_{ku} δ_{ui}) = δ_{iu} δ_{ui} = (δ_{iu} δ_{uk})(δ_{ju} δ_{uk})^{-1}(δ_{ju} δ_{ui}) = (RHS).
• For condition (2) we have (LHS) equal to
[Figure: a quartet tree on the leaves i, j, k, l, with i, j on one side of the internal edge and k, l on the other; the arrows represent the factors δ_{ik} and δ_{jk}^{-1} of the (LHS).]
whereas the (RHS) is given by
[Figure: the same quartet tree, with arrows representing the factors δ_{il} and δ_{jl}^{-1} of the (RHS).]
In this case, we proceed as in condition (1). Call u the node connecting the leaves i and
j. We get that both sides of the equation for condition (2) give the same expression
δ_{iu} δ_{ju}^{-1}, so condition (2) also holds.
Several cancellations will provide the equality of each side in conditions (1) and (2).
By a similar method we will be able to show that conditions (1) and (2) are necessary
for δ to be a tree G-dissimilarity map. So we only need to prove the converse, provided that
H_δ has no elements of order 2. As we anticipated earlier, the main idea will be to build an
ultrametric from δ using the Gromov product:

    δ_x(i, j) = δ_{xi} · δ_{ji}^{-1} · δ_{jx},    for all i, j ≠ x.
Note that this function δ_x may not be a dissimilarity map, since δ_x(i, i) need not be 1_G.
However, the important fact is that δ_x will be an ultrametric in a more general setting that
we explain later.
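As a quick illustration, for G = (R, +) the product above reads δ_x(i, j) = δ(x, i) − δ(j, i) + δ(j, x). A minimal sketch of this construction (hypothetical inputs: a base point x and a dictionary delta defined on all ordered pairs, with delta[(p, p)] = 0):

    def gromov_product(delta, x):
        """Return the map delta_x(i, j) as a dict, for all i, j different from x."""
        points = {p for pair in delta for p in pair if p != x}
        # note: delta_x(i, i) = delta(x, i) + delta(i, x), which need not be 0
        return {(i, j): delta[(x, i)] - delta[(j, i)] + delta[(j, x)]
                for i in points for j in points}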
14.4 Ultrametric conditions and ultrametric tree representation

In this section we define the generalized notions of ultrametric conditions and ultrametric
tree representations in the context of G-valued functions.
Definition 14.5. We say that δ : X × X → G satisfies the ultrametric conditions if

1. δ(i, j) = δ(j, i) (i.e., δ is symmetric);

2. |{δ(i, j), δ(i, k), δ(j, k)}| ≤ 2, i.e. at least two of these three elements of G are equal
(“weak three point condition”);

3. (Technical condition for H_δ) there do not exist four pairwise distinct points i, j, k, l ∈ X
with

    δ_{ij} = δ_{jk} = δ_{kl} ≠ δ_{jl} = δ_{li} = δ_{ik}.

In words, this says that things have to fit together nicely.
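A rough sketch (my own, for concreteness) of what these three conditions amount to computationally; delta is a hypothetical dictionary on ordered pairs whose values are hashable elements of G.

    from itertools import combinations, permutations

    def is_ultrametric(delta, X):
        # (1) symmetry
        if any(delta[(i, j)] != delta[(j, i)] for i, j in combinations(X, 2)):
            return False
        # (2) weak three point condition: at most two distinct values on every triple
        if any(len({delta[(i, j)], delta[(i, k)], delta[(j, k)]}) > 2
               for i, j, k in combinations(X, 3)):
            return False
        # (3) technical condition: forbid delta_ij = delta_jk = delta_kl != delta_jl = delta_li = delta_ik
        for i, j, k, l in permutations(X, 4):
            first = {delta[(i, j)], delta[(j, k)], delta[(k, l)]}
            second = {delta[(j, l)], delta[(l, i)], delta[(i, k)]}
            if len(first) == 1 and len(second) == 1 and first != second:
                return False
        return True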
Before stating the next definition and the key result relating both notions, let us motivate
this definition through an example.
Example. Assume G = (R, +). Suppose we are given a rooted X-tree T with weights
assigned to its edges, which correspond to a tree metric d. Assume that the distance from
the root ρ to each leaf x is the same number d(ρ, x).
[Figure: a rooted tree on the leaves a, b, c, d, e with weighted edges (both edges at the root have weight 3 and the cherry {b, c} has edge weights −2, −2), chosen so that every leaf is at the same distance from ρ.]
We claim that the edge weighting of the tree T will be equivalent to giving a weight to each
internal node of T in the following way. For each internal node v we assign w(v) = 2d(x, v)
for any leaf x below v. Likewise we assign the weight w(ρ) = 2d(x, ρ) to the root ρ. Since
the distance from ρ to each leaf x is the same, these numbers w(v) do not depend on the
choice of the leaf x. In our example:
[Figure: the same tree with each internal node v labeled by w(v) = 2d(x, v); the root ρ is labeled 4 and the remaining internal nodes carry their corresponding labels.]
A first question one might ask is why we included the factor of 2 when defining
w(v). A reason for this is that if v is the internal node corresponding to the cherry of the
leaves x, y, then the weights satisfy d(x, v) = w(e_{xv}) = ½ d(x, y) = w(e_{vy}) = d(v, y).
In our example we have d(b, c) = −4 = 2(−2). Moreover, in general we have the following
identity:

    d(x, y) = label (weight) of the least common ancestor of x and y.
So given T and the distance function d on V (T ) provided by the weights on E(T ), we
can construct weights for the internal nodes of T . Conversely, assume that we have defined
these weights on the internal nodes and we want to construct the distance function d. This
will be given by assigning a weight to each edge as we ascend from the leaves of T towards
the root ρ, bearing in mind that w(v) = 2d(x, v).
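One way to make the converse direction concrete, under the equidistant assumption above: every leaf x below a node v satisfies d(x, v) = w(v)/2, so the edge from a child c up to its parent p receives the weight w(p)/2 − w(c)/2, where w(c) is read as 0 when c is a leaf. The following sketch (with hypothetical inputs: parent pointers, node labels and the set of leaves) is mine, not from the notes.

    def node_labels_to_edge_weights(parent, label, leaves):
        """Recover edge weights from internal-node labels w(v) = 2 d(x, v)."""
        weights = {}
        for child, par in parent.items():                 # each edge child -> parent
            below = 0.0 if child in leaves else label[child]
            weights[(child, par)] = (label[par] - below) / 2.0
        return weights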
Since these two weighting representations of a tree T are equivalent, we will define an
ultrametric tree representation by simply labeling the internal vertices of a rooted tree via
a function t, that is δ(x, y) = t(l.c.a.(x, y)). Note that the weights on the internal nodes
are free from any a priori restriction, so this notion can be generalized to take values in an
arbitrary set, not necessarily in a fixed group.
Definition 14.6. An ultrametric tree representation is a rooted phylogenetic X-tree, together with a labeling of the (internal) vertices of T by elements of G given by a function
t : V (T ) → G .
We now state the key result for our Main Theorem, without proof because it will be the
same as the one provided for the analogous result from Lecture 13.
Theorem 14.7. Given an ultrametric tree representation t, the map δ defined by δ(x, y) =
t(l.c.a.(x, y)) is an ultrametric. Conversely, given an ultrametric δ we can construct an
ultrametric tree representation that realizes δ.
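For the first direction, reading off δ(x, y) = t(l.c.a.(x, y)) is straightforward once the rooted tree is stored by parent pointers. A small sketch (hypothetical inputs: a parent dictionary in which the root has no entry, the labeling t, and two distinct leaves x, y):

    def lca(parent, x, y):
        """Least common ancestor of x and y in a rooted tree given by parent pointers."""
        ancestors = {x}
        while x in parent:
            x = parent[x]
            ancestors.add(x)
        while y not in ancestors:
            y = parent[y]
        return y

    def delta_from_representation(parent, t, x, y):
        return t[lca(parent, x, y)]   # delta(x, y) = t(l.c.a.(x, y)) for distinct leaves x, y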
Proof of Main Theorem. As we said before, the argument will proceed as follows. We need
to show that the three ultrametric conditions are satisfied. By induction we will construct
a tree and then we will need to transform it in order to get our tree dissimilarity map. We
omit the details since the argument is very similar to that of Theorem 0 of last lecture.

Remark: Note that, as for Theorem 0, we have a constructive proof, hence we have an
algorithm for building the tree dissimilarity map.
As a consequence we obtain:

Corollary 14.8. A G-dissimilarity map is tree additive iff it satisfies the weak four point
condition.

Proof. The four point condition with two equal nodes provides condition (1). On the other
hand, condition (2) is just the four point condition.

For further details on the previous proof we refer to the book by Semple and Steel.
14.5 Why is this theorem relevant?
In this section we aim to discuss the importance of this theorem from a historical perspective.
In 1967, a paper by Cavalli-Sforza and Edwards appeared. It was the first paper to
discuss statistical approaches to phylogenetics. The idea suggested in that work was the
following. Starting with fixed DNA data, build a dissimilarity map in some way (today we
would rather use the Jukes-Cantor correction, which was unknown at that time). By the
evolutionary theorem we have that δ comes from a tree metric (recall that it came from real
DNA data). The goal was to find the corresponding representing tree T .
To accomplish this they proposed the following approach. Let T be a phylogenetic tree. Given
δ : X × X → R_{>0} (which takes positive real values since it corresponds to distances between
vertices of T ), the idea was to find δ̂ that minimizes the following expression:

    Σ_{i,j} (δ_{ij} − δ̂_{ij})²,    (*)

where δ̂ is a tree metric for T .
However, this approach has two main problems:

1. What happens in the case where T is unknown? One possible solution would be to
construct a tree metric δ̂_T for every tree T and find the one that is closest to δ in the
sense of (*).

2. The other difficulty we may encounter is how to find an explicit δ̂ minimizing (*).
For the second task, if we weaken our restriction on (*) by allowing δ̂ to be tree additive
rather than a tree metric, then we have a formula computing δ̂:

    δ̂ = S_T · l̂,

where S_T denotes the incidence matrix and l̂ is the vector of optimal weights of the edges
of T . In this case, the least squares formula gives

    l̂ = (S_T^t S_T)^{-1} S_T^t δ,

where δ is the given dissimilarity map. Note that in this case we obtain l̂ ∈ R^{|E|} (where
E = E(T )) and it may not have positive entries.
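Numerically, the unconstrained fit is just an ordinary least squares problem. A brief sketch using numpy (the path-edge incidence matrix S, with one row per pair of leaves, and the vector d of observed values δ_{ij} in the same row order are hypothetical inputs):

    import numpy as np

    def ols_edge_weights(S, d):
        """Solve min ||S l - d||^2, i.e. l_hat = (S^t S)^{-1} S^t d."""
        l_hat, *_ = np.linalg.lstsq(S, d, rcond=None)
        return l_hat                       # entries may well be negative

    def fitted_dissimilarity(S, d):
        return S @ ols_edge_weights(S, d)  # delta_hat = S_T * l_hat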
By contrast, if we require l̂ ∈ (R_{>0})^{|E|}, we have a constrained least squares problem,
so the optimization task is harder in this case. In fact, we will need to use an iterative
approach to solve it.
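As a hedged illustration only (a standard off-the-shelf route, not necessarily the method meant in the lecture), one can enforce non-negativity of the edge weights with SciPy's iterative non-negative least squares solver:

    from scipy.optimize import nnls

    def nonnegative_edge_weights(S, d):
        l_hat, residual_norm = nnls(S, d)  # iterative solver for min ||S l - d||, l >= 0
        return l_hat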
Moreover, in the tree additive setting, l̂ has a very simple formula. Given any edge
e ∈ E(T ) we have

    2 l̂_e = (n_A n_D + n_B n_C)/((n_A + n_B)(n_C + n_D)) · (D_{AC} + D_{BD})
          + (n_A n_C + n_B n_D)/((n_A + n_B)(n_C + n_D)) · (D_{AD} + D_{BC})
          − D_{AB} − D_{CD},    (**)
where

• n_A = #{labeled nodes in the cluster A}, and similarly for B, C and D;

• D_{AB} = (1/(n_A n_B)) Σ_{a∈A, b∈B} δ_{ab} is the average dissimilarity between the
clusters A and B, and similarly for D_{AC}, D_{BC}, D_{AD}, D_{BD} and D_{CD}.
[Figure: the edge e, with weight l̂_e, separating the clusters A and B on one side from the clusters C and D on the other.]
Note that in this case A, B, C and D correspond to groups of nodes rather than single nodes.
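A direct transcription of formula (**) as reconstructed above (my sketch; delta and the four clusters are hypothetical inputs, and D is computed as the average dissimilarity between clusters):

    def average_dissimilarity(delta, P, Q):
        return sum(delta[(p, q)] for p in P for q in Q) / (len(P) * len(Q))

    def ols_internal_edge(delta, A, B, C, D):
        """Evaluate formula (**) for the edge separating A, B from C, D."""
        nA, nB, nC, nD = len(A), len(B), len(C), len(D)
        denom = (nA + nB) * (nC + nD)
        D_ = lambda P, Q: average_dissimilarity(delta, P, Q)
        two_l = ((nA * nD + nB * nC) / denom * (D_(A, C) + D_(B, D))
                 + (nA * nC + nB * nD) / denom * (D_(A, D) + D_(B, C))
                 - D_(A, B) - D_(C, D))
        return two_l / 2.0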
There are two important remarks to make concerning formula (**):

Observation:

1. l̂_e depends only on the values δ_{xy} for which the path from x to y touches the edge e.
This is called the group property, since we have groups of nodes. We say that the path
touches rather than contains the edge e since the paths in D_{AB} only touch e at its left
node.
Moreover, the formula (**) doesn't involve distances between nodes in the same cluster
of nodes: we always need to pick one node from each group A, B, C or D.
2. Although less obvious, we have an important complexity result: (**) gives an O(n²)
algorithm to find δ̂, where n = |X|. So it has the optimal possible complexity. This
result is due to Vach (1989).
These two facts give a strong argument in favor of considering tree additive maps instead
of tree metrics. If we are lucky enough, our algorithms will give tree metrics, but a priori
we should expect tree additive maps instead. An example of this general behaviour is the
Neighbor-Joining algorithm.
14.6 Homework
Exercise (optional): Give a simple direct proof of Theorem 0 for the case (R, +), i.e.
try to avoid passing through the ultrametric construction. (For references to this approach,
see a paper by Hakimi and Patrinos from the early 1970s.)