The Cophylogeny Reconstruction Problem is NP

The Cophylogeny Reconstruction Problem is NP-Complete∗
Y. Ovadia and D. Fielder and C. Conow and R. Libeskind-Hadas
September 7, 2009
1
Introduction
The cophylogeny reconstruction problem arises in the study of host-parasite relationships. Specifically, we are given a host tree H, a parasite tree P , and a function ϕ mapping the leaves (extant
taxa) of P to the leaves of H. Four biologically plausible operations are considered: cospeciation,
duplication, host switching, and loss (Figure 1). A host switch is permitted in conjunction with a
duplication event but not with a cospeciation event [1].
A feasible solution is an extension of ϕ that maps each internal node of the parasite tree to a
vertex or edge of the host tree and can be constructed using the four types of events. The objective
of the cophylogeny reconstruction problem is to find one or more “optimal” solutions that reconcile
the parasite tree and the host tree with respect to these operations. One notion of optimality
simply assigns a cost to each of the four types of events and then seeks to minimize the total cost.
Since it is often difficult to estimate appropriate costs for each of the four types of operations,
an alternative approach is to find a Pareto optimal set of solutions [1]. In this case, a vector is
associated with each solution where the entries indicate the number of cospeciation, duplication,
loss, and host switch events, respectively. Let v = (c, d, `, h) and v 0 = (c0 , d0 , `0 , h0 ) be the cost
vectors of two solutions. We say that v is strictly less than v 0 if c ≤ c0 , d ≤ d0 , ` ≤ `0 , and h ≤ h0 ,
and at least one of these relationships is a strict inequality. A solution is said to be in the Pareto
optimal set if its cost vector is some v such that there is no solution with cost vector strictly less
than v. The optimization version of the cophylogeny reconstruction problem is to find the maximal
Pareto optimal set. The corresponding decision version of this problem, henceforth denoted CRDP
(Cophylogeny Reconstruction Decision Problem) asks: For a given cost vector v, is there a solution
whose cost vector is strictly less than v?
While practitioners have generally assumed for over a decade that this problem is computationally intractable, no rigorous proof of this conjecture has been found. Recently, Libeskind-Hadas
and Charleston have made progress in this direction by showing that a slightly more general version
of this problem is NP-complete [2]. Their proof assumes that the host phylogeny may be reticulate,
meaning that it can contain cycles (corresponding to biological hybridization events) rather than
a true tree. In addition, the proof assumes that ranges can be stipulated for the relative times of
events in the two trees. These two assumptions are necessary in the proof in order to construct
appropriate gadgetry. The problem of whether the classical cophylogeny reconstruction problem is
NP-complete was, therefore, left open.
∗ This
work was supported by the U.S. National Science Foundation under grant 0753306.
1
a
e
b
d
c
a
a
loss
loss
duplication
e
e
loss
loss
c
cospeciation
d
duplication
with host switch
c
d
cospeciation
b
b
Figure 1: A simple tanglegram with host tree in black at left and parasite tree in gray on right.
The associations ϕ between tips is shown in dotted lines. Below are two possible reconstructions
that explain the relationship between H and P with events labeled.
In this paper, we prove that the decision version of the cophylogeny reconstruction problem is indeed NP-complete. The proof relies on some ideas introduced in Libeskind-Hadas and Charleston’s
proof for the reticulate timed problem, but is substantially more involved due to the fact that we
can no longer exploit reticulation or timing.
2
Terms and Definitions
The following terms and definitions will be used throughout the remainder of the paper.
• V (T ) denotes the vertices in rooted tree T ,
• L(T ) denotes the leaves or tips of rooted tree T ,
• A host tree is a rooted regular binary tree with an additional edge (d, r) where d is called the
“dummy root” and r is the original root of the regular binary tree. This is required in order
to account for events in the parasite phylogeny that may predate the most recent common
ancestor in the host phylogeny.
We now formally state the decision problem.
Definition 1 An instance of the Cophylogeny Reconstruction Decision Problem (CRDP) is a 4-tuple
(H, P, ϕ, B) where
• H and P are the rooted host and parasite trees,
2
• ϕ : L(P ) → L(H) maps the tips of P to the tips of H,
• B is a 4-tuple (BC , BD , BS , BL ) of upper bounds on the number of cospeciation, duplication,
loss, and host switch events respectively.
The decision question is: Does there exist a mapping Φ that extends ϕ and whose cost is strictly
less than B?
A natural generalization of this problem allows an extant parasite to be mapped to some arbitrary number of tips in the host tree. This problem, called the Generalized Cophylogeny Reconstruction Decision Problem (GCRDP), is stated as follows:
Definition 2 An instance of the Generalized Cophylogeny Reconstruction Decision Problem (GCRDP)
is a 4-tuple (H, P, ϕ, B) where
• H and P are the rooted host and parasite trees,
• ψ : L(P ) → 2L(H) maps the tips of P to sets of tips of H,
• B is a 4-tuple (BC , BD , BS , BL ) of upper bounds on the number of cospeciation, duplication,
loss, and host switch events respectively.
The decision question is: Does there exist a mapping Ψ that extends ψ and whose cost is strictly
less than B?
We first prove that GCRDP is NP-complete via a reduction from 3SAT. We then show an instance of GCRDP constructed in the reduction from 3SAT can be transformed into a corresponding
instance of CRDP with the same answer, thus implying that CRDP is also NP-complete.
3
Polynomial Time Reduction of 3SAT to GCRDP
Given an instance of 3SAT with n variables x1 , . . . , xn and m clauses C1 , . . . , Cm where n is a power
of 2 and each clause contains at most a single instance of each variable1 , we construct an instance
of GCRDP for which a solution exists if and only if a solution exists to the 3SAT instance.
A basic gadget in the reduction is the k-thorn gadget, illustrated in Figure 2, consisting of a path
of length k where each vertex on the path, except the last, has one tip child. The last vertex on
the path may have one tip child and a second child in another gadget or may have two tip children.
The value of k will be determined later.
If y and z are two k-thorn gadgets where y is in the parasite tree and z is in the host tree, then
we say that y associates with z if ϕ maps each tip of y to the corresponding tip of z or, in the case
of GCRDP, if φ maps each tip of y to a set which contains the corresponding tip of z. Similarly,
we say that a solution mapping Ψ maps thorn y to thorn z to mean the solution maps each vertex
of y to the corresponding vertex in z.
In our reduction, variables in the 3SAT instance will be represented in the host tree while clauses
will be represented in the parasite tree. The tip mapping function will be used to encode the literals
that appear in each clause.
1 It is easily seen that any instance of 3SAT can be expressed as an equivalent instance with these two properties
using a simple padding of variables and clauses.
3
...
k
Figure 2: The k-thorn gadget is drawn to the left with filled-in tip vertices. Henceforth, we represent
this symbolically as shown to the right.
α0
α1
α2
α3
α4
α0
α0
α1
0
α2 α
α31
α
α42
α
α3
α4
α1 0
α
α21
α
α32
α
α43
α
α4
Figure 3: The assignment gadget is set to true if α1 occurs prior to α3 and is set to false otherwise.
4
λ0
λ1
λ2
λ3
λ4
Figure 4: A generic literal gadget.
The primary gadget in the host tree is called an assignment gadget. There will be four such
gadgets in the host tree for each variable in the 3SAT instance (only one of which determines the
variable’s assignment). This gadget, shown in Figure 3, begins with a k-thorn gadget α0 . This
thorn’s non-tip child has two descendant edges; one to thorn gadget α1 followed by thorn gadget
α2 , and the other which leads to α3 followed by α4 . The final thorns of α2 and α4 have two tip
children. This host tree gadget will represent the value true if α1 occurs before α3 and false
otherwise. These relative times will be induced by the mapping of corresponding gadgets in the
parasite tree, which we describe below.
The host tree, depicted
in Figure 5, begins with a dummy root whose single child is a k-thorn
P
0
τ 0 where k = BD + l∈L(P ) (|ψ(l)| − 1) + 1 (this value will be defined as BD
+ 1 later). This
k-thorn leads to a vertex which initiates a sequence of bifurcations forming a balanced binary tree
culminating with n vertices, one for each variable in the 3SAT instance. Each of these n vertices
is the root of a k-thorn gadget τi1 of size k = BL + 1. Each such thorn’s non-tip child leads to a
balanced binary tree with four children which serve as roots for assignment gadgets αi , βi , γi , and
δi .
The primary gadget in the parasite tree is called a literal gadget and is depicted in Figure 4
Each clause in a 3SAT instance contains exactly three literals and thus there will be three literal
gadgets per clause in the parasite tree. Let λ denote a literal gadget. λ begins with k-thorn gadget
λ0 which shares its root, followed by an additional thorn λ1 . The non-tip child of λ1 is incident to
one edge which leads to a k-thorn λ2 followed by a tip, and another edge which leads to k-thorns
λ3 and λ4 in series followed by another tip vertex.
The literal gadgets in the parasite tree will be mapped to assignment gadgets in the host tree
using the tip mapping function ψ. In particular, this mapping will depend on whether a given
literal, represented by a literal gadget in the parasite tree, appears unnegated or negated in a given
clause. In other words, the tip mapping function will encode whether a given literal satisfies its
clause by being true or false.
Henceforth, we will refer to a literal gadget from clause Cj that is the negated or unnegated
form of variable xi as λi,j .
5
αi0
αi1
α0i0
ααi 0 α2
i
i
...
...
...
...
α1
ααi1i1
αi3
i
2
α
βi
ααi2i2
αi4
i
αi
0
3
αi α3i
γi
βi0
αi0 ααi i3
δi
1
4
α
α
i
τi1
ααi4i4
βi1
α1
i
τ 0 0 i2
0
α α
β
ααi0i0 i2
ββi0i0
βi2
i
i αi
3
1
1
αi
α
β
ααi1i1
ββi1i1
βi3
αi3
i
i
2
αi4
α2i2
β
ααi 2
ββi2i2
βi4
αi4
i
i
0
3
0
3
β
α
αi
β
3i
i
ββi3i3
αi0 ααi i3
γi0
βi0
i
1
4
1
4
βi
α
αi
β
ααi4i4
ββi4i4
αi1
γi1
βi1
i
i
0
2
0
2
0
αi α0 i
βi
β
γ
ββi0i0
γγi0i0
αi0 ααi i2
γi2
βi2
i
i
1
αi1 1 αi3
β1i1 βi βi3
γ
ββi 1
γγi1i1
αi1 αi αi3
γi3
βi3
αii
i
2
4
2
4
2
αi 2 αi
βi
γβ
γ
ββi2ii2
γγi2i2
αi2 αi αi4
γi4
βi4
δii
i
0
13
0
0 α3
3
γi
τβ
αi i 3 βi
γ
ββii3i3
γγi3i3
δi0
αi0 αi3 αi βi0
γi0
i
i
4
αi4 4 βi1
γi1
β4i4
αi1
γ
ββi 4
γγi4i4
αi4 αi βi1
δi1
αi1
γi1
i
i
0
2
0
2
0
0 α2
β
γ
γ
δ
α
i
i
0 βi
γγi0i0
δδi0i0
ααi0i0 αi2
βi0 βi βi2
δi2
γi2
i
i
i
i
1
3
1
3
3
1
1
βi 1 βi
γi
γ
αi
δ
α
γγi1i1
δδi1i1
ααi1i1
βi1 βi βi3
δi3
αi3
γi3
i
i
i
2
4
2
4
4
2
βi 2 βi
γi
γ
αi
δi4
δ2i
α2
γγi2i2
2
δ
ααi2i2
4
2 βi
4
4
i
δ
β
β
α
γi
i
i
i
i
i
i
3
βi3 3 γi0
δi0
γ3i3
βi0
δ
α3i3
3
i
γγi 3
δδi 3
ααi3
βi3 βi γi0
βi0
δi0
i
i
i
4
1
4
1
1
4
βi 4 γi
δi
γ4i
βi
α4i
γγi 4
ααi4
βi4 βi γi1
βi1
δi1
i
i
1
11
0
0
2
2
0
γi 0 γi
δi2
δ0i
βi
β
δ
ββi0i0
2
0
γ
γi i γi
δi i0
βi2
δi2
i
1
3
1
3
3
1
γ
δ
δ
γ
β
1i
i
i
i
1i
i
1
Figure 5: A sample hostβββtree
derived
from
The balloon shows the group of four
1 γi a 33SATδiinstance.
1
3
i1
γi
γi
δi
βi
δi3
i
1
assignment gadgets for variable
x
2
2
4
4 i.
2
γ
δ2i
γi
βi
β
i
2
δ
ββi2i2
4
2 γi
2
4
i
γi
γi
δi
βi
1
i
γi3 3 δi0
δ3i3
1
γi0
β3i3
δδi 3
ββi3
γi3 γi δi0
γi0
i
i
γi4 4 δi1
γi1
β4
ββi4i4
1
γi4 γi δi1
γ
i
1
i
11
0
2
0
δ
δi2
γi
γ0i
i
0
γγi0
δi0 δi δi2
γi2
i
3
1
δi1 1 δi3
γi
γ
γγi1i1
δi1 δi δi3
γi3
i
4
δi2 2
γi
γ2
6
γγi2i2
4
δi2 δi
γ
i
1
i
3
1
δi 3
δi0
γ3i3
γγi3
δi3 δi
δi0
i
δi1
γ4
γγi4i4
δi1
1
i
0
1 1
δi2
δ0i
δδi0
2
δ
i
1
i
δi3
δ1i1
δδi1
δi3
i
2
δ
δδi2i2
i
1
1
σj0
σj1
σj00
σj
σj11
σj
p1
p1
pj
pj
...
pm
...
pm
pm+1
pm+1
...
...
...
0
0
σm+2
σm+1
1
1
σm+2
σm+1
0
0 0
σm+1
ρ1 1 σm+1
π10
1
π11 σm+1 ρ11 σm+2
0
0
0
0 ρ2 π1 σ0m+2
π12 π1 σm+1
0 1 11 σm+2
0
0 ρ0 πσ11m+2
σ
π111 σσm+1
π20 σ
m+2
2 σ20
1
m+2
m+1
m+2
1
1
πσσ120m+2
1
ρ1 πσρ100m+2
π21 π
10m+20 2 ρ1
0
0
πρ210111 σ
m+1
πππ2010 σ
σm+2
0 ρ2
π22 π
m+2
2
σ
1
ρ
1
1
0
0
12
m+1
σ111 σm+1
π
1
0 π
ρ
σ21211m+2
1 σm+2
ρ04nπρσ
π4n
m+2
πσπ1221m+2
0
2
0
1
1
2
0
π2m+2
ρ
σ
σ210102m+2
1 π
02
ρ14nπρρ
π4n
12
ππ22011m+2
1
1
0
0
0
1
21
0
π10210
2 π
ρ4n
ρ24nπρρ
11
π4n
22
1
ππ24n
1
1
12
12
2
1
1
22
πρ
2121
1
πππ4n
ρ4n
0
π
0
0
ρρ22212020220 σ0m+2
2212
σ
π
π
202 m+1
01
0
2
1
ππ4n
4n
ρ
4nσm+2
0
1
2
π
σ
0
12
ρρ14n
1
m+1
0
0
σ
1 σm+2
20
0
σ
m+2
1
π4n
σ
m+2
4n
12
10 m+1 σρm+2
2
4n
1
4n
πσπ4n
12
24n
0
1
2
m+2
1
ρρ00114n
σ
12
π2124n
2
σ
m+2
0
2
1
ρρ101224n
m+2
4n
2
ππ4n
0
24n
2
0
ρ114n
2
0
π1124n
2
ρρ
ρ4n
02
π14n
2
04n
ρρ12121104n
1
π14n
2
4n
ρ
ρ
12
02
4n
π 20
1
π14n
2
20
ρρρ
ρ0211204n
2
4n
π14n
1
2
2
0
4n
π
1
ρρρ122124n
π24n
ρ
22
π221
ρ220
π20
4n
π4n
2
ρρ04n
20
1
π4n
0
1
0
π4n
4n
1
ρρ14n
2
π4n
1
2
1
ρ
24n
π4n
2
ρ
4n
π4n
2
2
ρ
π4n
4n
1
1
1
...
σj00
σj
λ0i,j
0
λ
λ0i,j
σj11i,j
σ
p1
λj1i,j
λ1i,j
p1
λ1i,j
λ0i,j
0
λ
2
i,j
pλλji,j
2
i,j
λ2i,j
λ1i,j
1
pj
λ
3
λi,j
σj00
i,j
3
λ
p
0
σ
3
2
i,j
0
σj
λi,jj
λσmi,j
2j
λ
4
i,j
1
pm
1
λ
σj1
σj λi,j
4
0
σj λσ400 0 pλ3i,j σj
σj1
3
i,j
σi,jjj λi,j λσm+1
0j1
0
1 λi,j
! ,j
λ
i
λi,j λ0!
i,j
pm+10
0
4ii,j,j
1i,j 0
2
λi,j
σλ
σjj11 λi! ,jλi,j λ
λλ4i,j
11
i,j
! ,j
λi,j
λ3i,j λλi1i,j
2!
1
1
λ
1
0ii,j
,j
0
i,jλ !
4
λi,j
λλi,j
0
i ,jλi,j λ
i03!! ,j
λ
i,j
i2i,j
λ
2
λ
0
λ2i,j λ ! i24! ,j,j
i
,j
λ
1i! ,j
1i,jλ2
λ2i,j
λλ
!
1 i! ,jλ1! λi1i,j
0,j
λi,j
i ,jλ
3
λλi3!!,j
λi,j
i,j
λ2i! ,jλi3i1!!,j,j
0
3
σ
3j
λ
3
λ
0
2
i
2i,jλ
!,j
,j
σj λi,j
λλi,j
2 i! ,jλ3! λi2i! ,j
i ,jλ
4
λi4i2i!!!,j,j,j
σj0 λi,j
4 λ
i,j
43
1
λ
!
4
i ,jλ
σσj1 λ40
λ
4
1
3ii!!,j
λ
3
i,j
,j
σj
λλi,j
3 λi! ,jλ0 λi3! ,j
j σi,j
λλi04!!,j
σj0
i!! ,j
0
λ00i,j j
λi,j
i0i!!,j
1 λ
i! ,j
λ
0
!! ,j
0
λ
i
!!
λ
1
0
4i! ,j
λ4i !!,j
1 λ0
λ0i,j σλi,j
λλ
4 λi!! ,j 2 λ4
i! ,j
λi!! ,jλii1i1!! ,j,j,j
λi,j
σjj1i,j
1
λλ !!!!
λ2i,j
λi,j
i1! ,j
λ3i!! ,jλi01ii2!!,j,j,j
λλ1i,j
1
3
λ
1
1
!
0
λi,j λ00i,jλi! ,j λλi0i! ,j,jλi!! ,jλ4 λλi0i!!!!,j,j
λ
!
λi,j
i!! ,j
4
λλi2i23i!!!!!!,j,j,j
λi22i! ,j,j
λi,j
λ 4!!
λ2i,j
0 λ2
1i! ,j
λ2i!! ,j λλ1i1ii!!!!,j,j,j
λ2i,j λλ1i,j
λλ
i! ,j
1i! ,j
i1!!,j
λ
λ
λi,j
i
,j
1
λ3ii3!!!! ,j,j
λ33i! ,j
λi,j
i! ,j
λ !!
λ32i,j
3
λ
3
3
λi,j λλ22i! ,jλi! ,j λλ2i2i!!,j,jλi!! ,j λ2i2i!! ,j,j
λ
!
λi,j
3
λi4i4!!!! ,j,j
λi4i4! ,j,j
λi,j
4i! ,j
λ !!
λ 4 λ4
3i! ,j 4
λ4i,j λλ3i,j
λ3i3i!! ,j,j
λλ
3i! ,j i! ,j
λi3i0!!,j,jλi!! ,j λ
λi,j
i!! ,j
i,j
λ0i!! ,j
λ00i!! ,j
λ
4
4i!! ,j
λ0i! ,j λλ4i41i!!!,jλ,j0i!! ,j λλ
λ
4
!
λii1!,j,j
λ4ii!!!! ,j,j
λi,j
i,j
λi1!! ,j
λ2
λ1i0i3!!!,jλ,j1
λ
1
!!
0
λi! ,j λλ0i!!! ,ji!! ,j λλi0i!! ,j,j
!!
λii4! ,j,j
λi22i!! ,j,j
λλ2i!! ,j
1i!! ,j
λ2i! ,j λ1i1!! ,jλ2i!! ,j λλ
λi1i3!!!!,j,j
λii! ,j,j
λ3i!! ,j
λ3
2i!! ,j
λ3i! ,j λ22i!! ,jλ3i!! ,j λλ
λi2i4!!!!,j,j
λii! ,j,j
λi4!! ,j
λ4i3! ,jλ4
λ
4
λi! ,j λ3! i!! ,j λλ3i3i!!!!,j,j
λii! ,j,j
i!! ,j
λ0
4
λ0i!! ,j λ4i4!!! ,j
λ
λi4i!!!!,j,j
λii! ,j,j
λ10i!! ,j
1
λi!! ,j λ0!!
λii!! ,j,j
λ2
λ2i!! ,j λ1i1!!!! ,j
λii!! ,j,j
λ3
λ3i!! ,j λ22i!!!! ,j
λii!! ,j,j
λ4
λ4i!! ,j λi33!!!! ,j
λii!! ,j,j L
D
S
1
1
Figure 6: A sample parasite tree derived from a 3SAT instance. On the left is the gadgetry
corresponding to clause Cj containing literals of variables xi , xi0 , and xi00 .
3.0.1
Budget Tuple
1
1
1
1
The budget tuple B = (BC , B , B , B ) is defined as follows
11
1
4
λ
λi4i!!!! ,j,j
BC = ∞
BD = 4m + 8n − 1
BS = 3m + 8n − 2
1
1
1
11 + 8n + 4
BL = (m + 2) log(n) + 4m
3.0.2
Parasite Tree
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
11
1
1
1
1
1
1
1
The parasite tree, illustrated in Figure 6, begins with a path (p1 , . . . pm+1 ) where p1 is the root.
Henceforth, for 1 ≤ j ≤ m, the jth parasite subtree refers to the subtree containing descendants
of pj ’s non-path edge. We arbitrarily label the subtrees containing descendants of pm+1 ’s edges as
the (m + 1)th and (m + 2)th parasite subtrees.
Each of the first m parasite subtrees correspond to one of the m clauses. Consider an arbitrary
clause Cj and its corresponding subtree. The jth subtree begins with a thorn gadget of size
0
k = BD
+ 1 labeled σj0 followed by an additional k-thorn labeled σj1 of size k = BL + 1. A path of
7
length 2 follows the non-tip child of σj1 which is followed by three edges followed by literal gadgets
labeled λi,j , λi0 ,j , and λi00 ,j each representing a literal contained in Cj .
The last two parasite subtrees do not correspond to clauses. Instead they serve to force an
order between assignment gadgets as we will describe later. The branches lead to the two thorns
0
1
0
1
0
σm+1
, σm+1
and σm+2
, σm+2
respectively where the 0th thorns are of size k = BD
+ 1, and the 1st
is of size k = BL + 1. These branches are then followed by n k-thorn triples (k = BL + 1) in series
πi0 , πi1 , πi2 and ρ0i , ρ1i , ρ2i respectively for 1 ≤ i ≤ n, and each triple except the last is followed by a
vertex with one tip child.
3.0.3
Tip Mappings
We begin by providing definitions of how literal gadgets are mapped to assignment gadgets in
negated and unnegated forms. This, in turn, will be used to describe the mapping function ψ.
If we wish to associate λ with α in unnegated form, then the tips are mapped as follows where
each parasite thorn, host thorn pair indicates that the parasite thorn is associated with the host
thorn.
(λ0 , α0 ), (λ1 , α1 ), (λ2 , α2 ), (λ3 , α3 ), (λ4 , α4 )
If, instead, we wish to associate λ with an assignment gadget α in negated form, then we use
the following set of associations.
(λ0 , α0 ), (λ1 , α3 ), (λ2 , α4 ), (λ3 , α1 ), (λ4 , α2 )
Figure 7, which depicts the intended solution mapping of a literal gadget onto an assignment
gadget, also serves to depict the tip mapping between these gadgets.
The parasite tip mappings are defined by the following:
• σj0 is associated with τ 0 for 1 ≤ j ≤ m.
• σj1 is associated with τi1 for all i, j where xi ∈ Cj or xi ∈ Cj .
• The (m + 1)th branch maps its sequence of thorns to the 0th , 1st, and 2nd thorn of each
assignment gadget as described by the following parasite thorn, host thorn associations (for
1 ≤ i ≤ n):
0
1
2
(π4i−3
, αi0 ), (π4i−3
, αi1 ), (π4i−3
, αi2 )
0
1
2
(π4i−2
, βi0 ), (π4i−2
, βi1 ), (π4i−2
, βi2 )
0
1
2
(π4i−1
, γi0 ), (π4i−1
, γi1 ), (π4i−1
, γi2 )
0
1
2
(π4i
, δi0 ), (π4i
, δi1 ), (π4i
, δi2 )
• Similarly, the (m + 2)th branch’s thorn tips map to the 0th , 1st , and 2nd thorn of each
assignment gadget (for 1 ≤ i ≤ n):
(ρ04i−3 , αi0 ), (ρ14i−3 , αi3 ), (ρ24i−3 , αi4 )
(ρ04i−2 , βi0 ), (ρ14i−2 , βi3 ), (ρ24i−2 , βi4 )
(ρ04i−1 , γi0 ), (ρ14i−1 , γi3 ), (ρ24i−1 , γi4 )
(ρ04i , δi0 ), (ρ14i , δi3 ), (ρ24i , δi4 )
8
0
1
2
• Recall that every k-thorn triple (except the last) of form π4i−3
, π4i−3
, π4i−3
(1 ≤ i ≤ n) is
followed by a vertex with exactly one tip child. This tip maps to a singleton set containing
2
the tip child of αi2 which is not already mapped to by the last thorn in π4i−3
.
• Similarly, the tips following thorn triples of form π4i−2 , π4i−1 , and π4i (1 ≤ i ≤ n) map to
the tip children of β, δ, and γ which are not already mapped to by the last thorn of the triple
respectively.
• The prior two mappings apply similarly for the (m + 2)th parasite subtree except the assignment gadget thorns use superscripts 0, 3, 4 in place of 0, 1, 2.
• For all 1 ≤ i ≤ n and 1 ≤ j ≤ m, where xi ∈ Cj we associate λi,j ’s tips with αi ’s tips in its
unnegated form as defined earlier.
• For all 1 ≤ i ≤ n and 1 ≤ j ≤ m, where xi ∈ Cj we associate λi,j with αi in its negated form
as defined earlier.
• For 1 ≤ i ≤ n and 1 ≤ j ≤ m, if λi,j ’s parent is the first vertex of the length 2 path following
σj1 , then λi0 ,j ’s and λi00 ,j ’s tips map to γi ’s and δi ’s tips respectively, where λi,j , λi0 ,j , and
λi00 ,j are distinct literal gadgets of the jth parasite subtree, and i0 < i00 .
• For 1 ≤ i ≤ n and 1 ≤ j ≤ m, if λi,j ’s parent is not the first vertex of the path following σj1 ,
then its sibling literal gadget λi0 ,j ’s tips map to βi , and the tips of λi00 ,j map to γi .
3.1
3.1.1
Proof of Correctness
Forward Direction
Assume that the given 3SAT instance is satisfiable. Let C1 , . . . , Cm denote the m clauses, and let
x1 , . . . , xn denote the n variables. We show the existence of a solution to the constructed GCRDP
instance by describing a solution Ψ which incurs a cost tuple less than B. For convenience, define
s(j) ∈ {1, . . . , n} (1 ≤ j ≤ m + 2) to be the index of a variable which satisfies clause Cj . If multiple
variables satisfy Cj , let s(j) be the least such index. We arbitrarily assign s(m + 1) = s(m + 2) = 1.
First we describe the mapping of a literal gadget onto an assignment gadget as shown in Figure
7. When we write that a literal gadget λ associates with an assignment gadget α, and ψ has been
defined such that λ’s tips are mapped to α’s tips in unnegated form, the following is intended:
• For each of the following parasite thorn, host thorn pairs, we map the parasite gadget to the
host gadget each at cost of BL + 1 cospeciations.
(λ0 , α0 ), (λ1 , α1 ), (λ2 , α2 ), (λ3 , α3 ), (λ4 , α4 )
• The edge between λ0 and λ1 incurs a loss as it passes over the non-tip child of α0 .
• The parent vertex of thorns λ2 and λ3 is mapped to the edge between α1 and α2 and incurs
a duplication followed by a host switch onto the edge preceding α3 .
If ψ maps λ’s tips to α’s tips in negated form, then we apply the following modifications to the
mapping above.
9
0
λ0
0
λ
α1
1
α
λ1
1
λ
α2
2
α
λ2
23
λ
α
3
α
λ3
3
λ
α4
4
α
λ4
λ4
α
λ0
1
α
α0
λ10 λ0 α0
0
α
α02 λ
α
α11
λ2 α
λ11
α3 λ
α2
3 α2
λ λ2
2
α4 λ
α33
λ4 α
λ33
λ4
α4
α4
λ4
λ
α0
λ0
α1
λ1
α2
λ2
α3
α0
λ3
4 λ0
α
0
α01 α
λ4 λ
λ11
α
α12
λ
λ22
α
α23
λ
λ33
α
α34
λ
λ4
α4
λ4
0
α0
λ0 0
α01 α
λ
λ11
α
α12
λ
λ22
λ0
α
α23
λα01
λ
3
λ3
αλ11
α
α34
λα12
λ
λ44
αλ22
α
λα23
4
λ
αλ33
λα34
αλ44
λ4
α
λ0
1
α
λ1
α2
λ2
α0
α3
0
λ3 λ0 α
0
α1
α4 λ
1
λ1
λ4 α
12
λ
α
α22
α00 λ2
α λ
α3
3
α
λ3
3
λ
α4
4
α
λ4
λ4
α0
λ0
α1
λ1
α2 α0
λ2 λ0 0
α
α3α01
λ
λ3 λ11
α
α4α12
λ
4 2
λ λ2
α3
α
λ23
λ
α34
α
λ34
λ
α4
λ4
α0
λ0 0
α10 α
λ1
λ
α21
α
λ21
λ
α32
α
λ32
λ
α43
α
λ43
λ
α4
λ4
Figure 7: On the left, we show a literal gadget λ mapping onto an assignment gadget α in unnegated
form. The mapping for negated form is shown to the right.
• The gadgets are mapped to each other according to the pairs
(λ0 , α0 )(λ1 , α3 )(λ2 , α4 )(λ3 , α1 )(λ4 , α2 )
• The parasite tree incurs a loss by passing over the other child of the non-tip child of α0 .
• The duplication and host switch events occur on the edge between α3 and α4 , and the switch
lands on the edge prior to thorn α1 .
1
Regardless of form, this mapping costs 5(BL + 1) cospeciations, 11 duplication, 1 loss, and
1 host
1
1
1
switch.
0
1
First, let all vertices of the path (p1 , . . . , pm+2 ) map to the host edge which precedes τ incurring
1
1
1
a cost of m + 1 duplications. For the jth parasite subtree
(1 ≤ j ≤ m + 2), we map the k-thorn σj0
1
1
0
1
1
to τ , assign the child edge to the path through the binary subtree between τ 0 and τs(j)
, and
map
1
1
1
0
1
m
+
1
duplication,
and
σj to τs(j) . This accumulates (m + 2) × (BD + 1 + BL + 1) cospeciation,
1
1
1
(m + 2) × (log2 (n)) loss events.
1
For 1 ≤ j ≤ m, the length two path following σj1 codiverges at vertices of the balanced binary
1
subtree following τs(j)
as depicted in Figure 8, incurring two cospeciations and one loss per parasite
1
1 assignment gadget
subtree associated with a 3SAT clause. The literal gadget λs(j),j then maps to
αs(j) . Since the given 3SAT instance is known to be satisfiable, no two values j and j 0 exist such
that i = s(j) = s(j 0 ) and xi ∈ Cj and xi ∈ Cj 0 . Thus we may apply the mapping defined earlier
between literal and assignment gadgets and incur a cost tuple of (5(BL + 1), 1, 1, 1) for each of the
m clauses (recall that this is the tuple we derived after describing how to map a literal gadget to
an assignment gadget). The other two literal gadgets of the jth parasite subtree each map to one
of βs(j) , γs(j) , or δs(j) in unnegated form accumulating an additional cost of (10(BL + 1), 2, 2, 2) for
each clause.
10
22
00
11
The tip
tip
child
of the
the vertex
vertex which
which follows
follows aa thorn
thorn triple
triple ππ4i−3
•• The
child
of
4i−3,,ππii
4i−3,,ππ4i−3
00
11
π
,π
••The
ππ4i−3
, πi2 i2 of
, π4i−3
Thetip
tipchild
childofofthe
thevertex
vertexwhich
whichfollows
followsaathorn
thorntriple
triple
4i−3
4i−3
which isis not
not mapped
mapped to
to by
by the
the last
last thorn
thorn in
in
maps
to
the
tip,child
child
of ααi2i2 which
maps
to
the
tip
22
maps
whichisisnot
notmapped
mappedtotoby
the
lastthorn
thorninin
mapstotothe
thetip
tipchild
childofofααi i which
22 the
0
1
ππby
.. last
4i−3
, πi2
, π4i−3
• The tip child of the vertex which
follows
a thorn triple π4i−3
4i−3
22
2
0
1
ππ4i−3
• The tip child
of. . the vertex which follows
a thorn
4i−3
4i−3 , πis
i not mapped to by the last thorn in
4i−3
maps to
the tiptriple
childπof
αi2, πwhich
• Similarly,
Similarly,
we map
map the
the tips
tips following
following thorn
thorn triples
triples of
of form
form ππ4i−2
•
we
4i−2,, ππ4i−1
4i−1,,
2
mapped
to
by
the
last
thorn
in
maps to the tip child of αi2 which is πnot
.
4i−3
••Similarly,
following
thorn
form
, ,πtip
,,
Similarly,we
wemap
mapthe
thetips
tips
following
thorntriples
triplesofof
form
π4i−1
4i−2
2
4i−2
4i−1
and
to
the
tip
children
ofβ,
β,δ,δ,and
andγγ which
whichare
arenot
notalready
alreadymapped
mappedto
to
and
ππ4i4iππto
the
children
of
π4i−3
.
and
ofofβ,β,δ,δ,and
γγwhich
already
mapped
andππ4i4itotothe
thetip
tipchildren
children
andmap
which
arenot
not
already
mapped
to
• Similarly,
we
the are
tips
following
triplesto
ofthe
form
π4i−2
, π4i−1 ,
by
thethorn
last
thorn
of
the
triple
respectively.
by
the
last
thorn
of
triple
respectively.
• Similarly,by
we
map
thethorn
tips following
thorn
triples of form π4i−2
, γ which are not already mapped to
ofofthe
bythe
thelast
last
thorn
thetriple
triple
respectively.
4i−1
and
πrespectively.
of ,β,πδ,
and
4i to the tip children
and π4i to the tip children of β, δ, andbyγ the
which
not of
already
mapped
Thetoprior
prior two
two mappings
mappings apply
apply similarly
similarly for
for the
the m
m++22 lineage
lineage except
except the
the
•• respectively.
The
lastare
thorn
the triple
prior
two
mappings
apply
the
•The
The
prior
two
mappings
applysimilarly
similarlyfor
forthe
themm++2assignment
2lineage
lineageexcept
except
the
by the •last
thorn
of the
triple
respectively.
assignment
gadget
thorns use
use indices
indices 0,0,3,3,44 in
in place
place of
of 0,0,1,1,2.2.
gadget
thorns
assignment
use
indices
0,0,1,1,2.2. for the m + 2 lineage except the
• The
two0,mappings
applyofof
similarly
assignmentgadget
gadgetthorns
thorns
useprior
indices
0,3,3,44ininplace
place
• The prior two mappings apply similarly
for the m
+ 2 lineage
assignment
gadget
thorns except
use
in
place
0, 1, 2. λλi,j
Cjj4,, then
then
weofassociate
associate
’s tips
tips with
with ααii’s’s tips
tips in
in its
its unnegated
unnegated
•• indices
IfIf the
xxii ∈∈0,C3,
we
i,j’s
assignment
usewe
indices
0, 3, 4λin
place
of with
0,with
1, 2.αα
••IfIfgadget
xxi i∈∈Cthorns
associate
’s’stips
ininas
its
unnegated
Cj ,j ,then
then
we
associate
λi,ji,j
tips
tips
unnegated
i ’s
i ’stips
form
asits
defined
earlier.
form
defined
earlier.
form
formasasdefined
definedearlier.
earlier.• If xi ∈ Cj , then we associate λi,j ’s tips with αi ’s tips in its unnegated
• If xi ∈ Cj , then we associate λi,j ’s form
tips with
αi ’s tips
in its unnegated
as defined
earlier.
then we
we associate
associate λλi,j
with ααii in
in its
its negated
negated form
form as
as defined
defined
•• IfIf xxii ∈∈ CCjj,, then
i,j with
form as••
defined
IfIfxxi i∈earlier.
form
∈CCj ,j ,then
thenwe
weassociate
associateλλi,ji,jwith
withααi iininits
itsnegated
negated
formasasdefined
defined
earlier.
• If xi ∈ Cj , then we associate λearlier.
negated form as defined 0
i,j with αi in
2
1
0
1
βi its
The
tip child
the
vertex
which
follows
a thorn triple π4i−3a, πthorn
, πi2
, π4i−3
The
tip
π4i−3
• If xi ∈ Cjearlier.
,earlier.
then we associate λi,j with
αi in• its
negated
form of
as •
defined
4i−3 , πtriple
i
2 vertex which follows
0 child
1 of the
2
0
1
earlier.
πthe
tipofchild
the vertex
whichmaps
follows
a triple
thorn
11 then
,,ππ tipisis,child
,i,j
ππ’s
• The •tipThe
child
the of
vertex
which follows
a thorn
π•triple
2αparent
βIf
2vertex
i first
i λλ
4i−3
’s
the
first
ofisthe
the
length
path
following
i i4i−3
If
22 path
following
which
not
mapped
to of
by
the
last
thorn
to the
tip •child
of
α4i−3
i,j4i−3
notlength
mapped
toinby
the lastσσthorn
in
maps
to
the
of αvertex
earlier.
jj,, then
i parent
i which
11
2
2 first
•
If
λ
’s
parent
is
the
vertex
of
the
length
2
path
following
σ
,
then
which
is
not
mapped
to
by
the
last
thorn
in
maps
to
the
tip
child
of
α
•
If
λ
’s
parent
is
the
first
vertex
of
the
length
2
path
following
σ
,
then
α
not
tofirst
by the
last
thorn
in
maps to the
βπimapped
2
11
i,j
γlength
i,jtip child of αi which
i λ
iλis
j,j’s
!of
! ,j,,
i λλ
λ
’s
and
’s
tipstriple
map
and
tipsrespectively,
respectively,where
whereλλi,j
map
to
δ,δπ
2’stips
0 γγii’s
•• IfThe
’s
parent
is
vertex
the
2 tips
path
following
σπand
, then
. the
,j’s
i!!j!!,j
i,j,,λλii! ,j
πii!2,j
.and
ia
ii’s
i,jtip
j4i−3
4i−3of
,’s
child
therespectively,
vertex
which
follows
thorn
πto
2! ! ’s and λ !! !! ’s tips map
2
1 04i−3
i
4i−3
!i,j
! ,j
γ
λ
to
γ
’s
and
δ
’s
tips
where
,
2, λ
1δi λλ
"
""
α
π
.
λ
’s
and
λ
’s
tips
map
to
γ
’s
and
δ
’s
tips
respectively,
where
,
λ
,
"
""
• If π
λi,j
’s
parent
is
the
first
vertex
of
the
length
2
path
following
σ
,
then
.
i
i
,j
i
,j
i
i
i,j
i
i
β
i
,j
i
,j
i
i
i,j
i
4i−3
!!
!
!!
!
j
, πrespectively,
,δiπi!!inot
follows
thorn
πand
4i−3• The tip child of the vertexλmaps
λi ,jtip
’satips
map
andλ
’s
where
,
are
distinct,
and
ithorn
and
are
and
iilast
<<λii,j
.,. λi ,jin
iwhich
,j ’s and
i ’s
,j,jtips
4i−3
idistinct,
4i−3
isλ
mapped
to by
the
child
oftriple
αtoi2δ" γwhich
"" ""the
"<γito
2δi ’s
!!itips
! ,jtips
λi! ,j ’s andand
λand
map
to
γi ’sof
and
where
,the
, thorn
!!
distinct,
and
i2"itips
. are
λito
are
distinct,
irespectively,
.Similarly,
i•
αi
we
following
triples
of form
π4i−2
, π4i−1of, form π4i−2 , π4i−1 ,
•<
Similarly,
wein map thorn
the tips
following
thorn
triples
i!! ,jλ’s
,jthe
,jare
is
notdistinct,
mapped
tomap
by
the
maps
tip
child
αand
λ<
and
iλii,j
iλ""i.last
i.!! ,j
iand
πwhich
Similarly,
map
following
thorn
form
•and
Similarly,
we mapwe
the
of
πIf
4i−2
4i−1
2 distinct,
4i−2
4i−1
δthorn
λi!!•
are
and
i"tips
<the
i""following
. tips4i−3
’sπof
parent
not
the first
first
vertex
of the
the
path following
following
then its
its
i
γπi4i to triples
If
λλi,j
parent
isis not
the
vertex
of
path
σσ1j1,, then
,j π
andtriples
theform
tip••of
children
β, ,δ,πand
γ, which
are not
already
mapped
to
i,j
and
π,’s
to
to
4i
4i−3 .
1 1 the tip children of 1β, δ, and γ which are not already jmapped
•
If
λ
’s
parent
is
not
the
first
vertex
of
the
path
following
σ
,
then
its
π
toparent
the
tipischildren
of
β,’sγδ,vertex
and
which
are
not
already
to
Iftoλi,j
not
of
the
path
following
σjmapped
,respectively.
then
its
and π•4iand
the
children
of β,
δ,
and
which
are
not
already
mapped
to
••the
IfSimilarly,
λfirst
parent
is
not
the
first
vertex
of
path
following
then
its
i,j4i’stip
i,j
jthe
δγ
! ,j
j ,respectively.
sibling
literal
gadget
λiform
tips
map
toβ,βii,,and
andthe
thetips
tipsof
ofλλii!!!!,j,j map
mapto
toγγii..
sibling
literal
gadget
’s’stips
map
to
wethe
map
the
tips
thorn
triples
ofλ
πσ4i−2
, π4i−1
i last
by
thorn
of
the
triple
i! ,j
by
the
last
thorn
of
the
triple
1 following
• If by
λi,jthe
’s•parent
isliteral
not
the
first
of
the
path
following
σthe
then
its
!!i!!
! ,j
!! ,j map to γi .
j ,tips
! ,j
sibling
gadget
λtips
’s
tips
map
to
β
,
and
the
tips
of
λ
map
to
γ
.
Similarly,
we
map
the
following
thorn
triples
of
form
π
,
π
,
sibling
literal
gadget
λ
’s
map
to
β
,
and
the
tips
of
λ
by
the
last
of vertex
the
triple
respectively.
sibling
literal
gadget
λi!i,jrespectively.
’s
tips
map
to
β
,
and
tips
of
λ
map
to
γ
.
last
thorn
ofthorn
the
triple
4i−2
4i−1
i
i
,j
i
i
i
i
i
,j
i
and π4i to the tip children of β, δ, and γ which are not already mapped to
! ,j ’s tips map to βi , and the tips of λi!! ,j map to γi .
sibling literal
gadget
λ
i
and π4i to the tip children ofbyβ,the
δ, and
γthorn
which
not
already
to
•last
The
prior
two
mappings
applytwo
similarly
for the
m+
2 lineage
• respectively.
Themapped
prior
mappings
apply
similarly
forexcept
the m the
+ 2 lineage except the
ofare
the
triple
3.5
Cost
Tuple
Cost
Tuple
The
prior
two mappings
apply
similarly
the
m + 2thorns
lineage
except
the
• The •prior
two
mappings
similarly
for
the for
m+
23.5
lineage
except
the
by
the
last
thorn
of apply
the
respectively.
assignment
gadget
use
indices
0,thorns
3, 4 inuse
place
of 0, 1,
assignment
gadget
indices
0, 2.
3, 4 in place of 0, 1, 2.
3.5triple
Cost
Tuple
3.5
Cost
Tuple
3.5Tuple
Cost
Tuple
assignment
gadget
use prior
indices
3, 4 inofplace
of similarly
0, 1, 2. for the m + 2 lineage except the
assignment
gadget
thorns thorns
use indices
0, 3, two
4 in0,mappings
place
0,
1, 2.
• The
apply
3.5 Cost
The
cost
tuple
B
=
(B
,
B
,
B
,
B
)
is
defined
as
follows
The
cost
tuple
B
=
(B
,
B
,
B
,
B
)
is
defined
as
follows
C D
D LL SS
• The prior two mappings
apply
similarly
m
+
except
B
=
,C
B
,then
BL2use
, lineage
B•we
)Ifis
as
follows
gadget
thorns
3,j 4,the
in
place
0, 1, 2.αi ’sλtips
Sindices
Ifdefined
xfor
∈Cthe
,follows
associate
λthen
’sCtips
with
unnegated
xdefined
weofassociate
tipsitswith
αi ’s tips in its unnegated
i(B
jD
i,j
The
tuple
BB==(B
,B
B
,B
) •)isis
as
i ∈0,C
i,j ’s in
Thecost
cost
tuple
(BCthorns
,The
BDD, defined
,cost
Bassignment
,tuple
BSSas
defined
as
follows
C
The cost
,then
Bassociate
follows
assignment
gadget
use
indices
0,
3,
4α
in ’s
place
0,tips
1, 2.unnegated
IfC=xj ,i(B
∈C ,CBj D
,we
we
associate
λi,j
’s
tips
αof
in as
itsdefined
unnegated
L, B
S ) is
• Iftuple
xi •∈B
then
λLi,jL’s
tips
with
tips
in
its
i ’s
form
as iwith
defined
earlier.
form
earlier.
B
17mBLL++
2m++4B
4BLL++24nB
24nBLL
B
==17mB
2m
C
• If xi ∈ Cj , then
associate
α
’s
unnegated
BCwe
= 17mB
2m
+ tips
4BL with
+ 24nB
i,j ’s
iC
as defined
L +λ
L tips in its
form asform
defined
earlier.Bearlier.
17mB
++
++4B
B
=
17mB
+2m
2m
4BLL+α
+24nB
24nB
CC
Lλ
LLin its unnegated
• If xi ∈BCCj=, 17mB
then we
associate
tips
with
tips
L
i ’s
+=
2m
+form
4B
24nB
L
Li,j
as’s
•defined
If xLi ∈earlier.
Cj , then we
form
as its
defined
• Ifassociate
xi ∈ Cj ,λthen
we α
associate
λ4m
with
αi1in
negated form as defined
i,j with
i in its negated
i,j +
as
earlier.
8n−−
BBDD ==4m
+ 8n
1
= 4m as
+form
8n
−as
1 defined
IfCjx,i then
∈ defined
Cjwe
, then
we associate
λi,jαearlier.
αinegated
in itsBD
negated
• If xi •∈form
associate
λi,j with
in its
form
defined
iwith
4m
11 associate λearlier.
B
4m
8n−−we
DD−
BD = 4m +•B8n
1∈
If
x=i=
Cj+,+8n
then
its negated form as defined
i,j with αi in
earlier.
earlier.
• If xi ∈ Cj , then we associate
λi,j with αi in
form as
BL4L ==(m
(m++2)
2)log(n)
log(n)++4m
4m++8n
8n1++44
BLits
= negated
(m + 2) log(n)
+ defined
4m +βi8n B
+
earlier.
βi first
αi
==(m
++4m
2)2)log(n)
4m
44•the
+
•8nIf++
λ4+
’s +
parent
ofisthe
2 pathoffollowing
σj ,2 then
BLLlog(n)
(m+
log(n)
4m
+8n
8n++is
If
λi,j1’svertex
parent
thelength
first vertex
the length
path following σj1 , then
i,j
earlier.BL = (m +B2)
λi,j ’s parent
is the
first vertex
of
length
2following
σ!!j1and
, then
• If λi,j•’sIfparent
is the first
vertex
ofλ the
length
2 the
path
σ8nand
,−
then
B
=
+’s
2 ’s
αtips
=
3m
+8n
8n
1+
Spath
BBrespectively,
−−tips
22 λrespectively,
β
γγlength
i 3m
S=
Sto
i iλ
! ,j
λii!the
and
λfirst
map
to
δpath
λifollowing
map
γi3m
’s
δiwhere
’s
where λi,j , λi! ,j ,
• 8n
If −
’s parent
of jthe
following
σand
,j ’s is
i!! ,j ’svertex
i ’s tips
i,j , λi! ,j ,
i ,j ’s2 tips
i,j
j , then
=
3m
2δ3m
1λ
BB
++
8n
2β2i2respectively,
S the
3m
−
! ,jλ
S
!!and
!λ
λiλ
’si’s
λi!! B
’s
tips
map
to
γ=
δ!!8n
’s−
tips
where
S=
λi! ,j ’s• and
’s tips
map
tofirst
γ+
and
’siand
tips
where
λ
,
,
γγ
"δiithen
"", λi! ,j ,
If
of
the
length
path
following
σ
,
"
"" , λ ! ,
,jis
i ’s
irespectively,
i,j
,jparent
i ’svertex
iα
i,j
,j
i ’s
i,j
j
!!
!
λ
’s
and
λ
’s
tips
map
to
and
δ
’s
tips
respectively,
where
λ
!!
and
λ
are
distinct,
and
i
<
i
.
i ,j
i ,j i ,j
i and λi i ,j are distinct, and i < ii,j. i ,j
δ" i<where
’sλand
tips
to
γ.ii"’s<
and
λi,j , λi! ,j ,
distinct,
and
""
and λi!!λand
and map
i" <
i""and
γ!!ii"",j.δare
αi respectively,
i! ,jare
i!! ,j ’s
i ’s tips
i!!distinct,
,j λare
,j
λ
distinct,
and
i
i
.
i
4
Proof
of
Correctness
"
""
1
Proof
ofvertex
Correctness
and
λ
44 is•not
Proof
of
Correctness
γi’s parent
of the
following
i!! ,j are distinct, and i < i .δ•
If 1λthe
parent
is not
the path
first vertex
of σ
the
path its
following σj1 , then its
i If λi,j
i,j ’sfirst
j , then
4 Proof
of
Correctness
1
Proof
Proof
of
Correctness
If
λi,j
’s parent
is not
of
the
path
following
σ
then
itsfollowing
• If44
λi,j• ’s
parent
isof
notCorrectness
the
firstthe
ofparent
the path
following
σvertex
then
• vertex
Iffirst
λi,j ’svertex
not
the
first
ofj ,its
the
path
σj1 ,tips
thenofto
its
δiisliteral
sibling
gadget
λji,! ,j
’s literal
tips
map
to
βλ
the
λβ
mapthe
to tips
γi . of λi!! ,j map to γi .
sibling
gadget
map
, and
i ,i!and
i!!i,j
,j ’s tips
Reduction
• If
λi,j ’sgadget
parent gadget
the
vertex
oftogadget
the
path
following
σj1λ,ito
its
!!then
sibling
literal
λi! ,jfirst
’s
tips
βfrom
tips
ofmap
γitips
. of
sibling
literal
λisi! ,jnot
’s4.1
tips
map
tomap
βi , and
the
tips
of
λ
to
. tothe
!the
sibling
literal
λiDCRP
’s tips
map
βimap
,γiand
λi!! ,j map to γi .
i , and
,j
i!! ,jReduction
,j
4.1
Reduction
from
DCRP
4.1
from
DCRP
4.1 depicts
Reduction
from
DCRP
Figure 8: The upper figure
the
used
satisfying
literal mapping
gadget
λ ’s
tips map
to β ,if
andthe
the tips
of λ
map to γ .literal for clause Cj is a child
4.1
Reduction
from
DCRP
4.1 sibling
Reduction
from
DCRP
We will
now show that an instance of DCRP constructed from a 3SAT problem
3.5above
Cost
Tuple
3.5problem
Tuple
We
willCost
now
show
thatan
anof
instance
ofDCRP
DCRPaconstructed
fromaa3SAT
3SATproblem
problem
We
will
now
show
that
instance
of
from
will now show that an instance ofas
DCRP
constructed
frombe
a 3SAT
described
can
transformed
into
an
instance
CRP for
which
3.5
Cost
Tuple
3.5
Cost
Tuple
3.5
Cost
Tuple
of the first vertex in the We
length
two
path.
The
lower
figure
illustrates
the
ifconstructed
satisfying
We
will
now
show
that
an
instance
of
DCRP
constructed
from
a
problem
We
will
now
show
that
an
instance
of
DCRP
constructed
from
a
3SAT
problem
as
described
above
can be
bemapping
transformed into
into
anthe
instance
of CRP
CRP for
for which
which aa
as described
can Tuple
be transformed
into exists
an instance
of CRP
forDCRP
whichisa3SAT
as
described
above
can
transformed
an
instance
of
3.5above
Cost
solution
if andtuple
only
if the
solvable.
The cost
B
= The
(B of
, of
BCRP
,tuple
B
, for
BBwhich
)=
is(B
defined
cost
,as
B follows
, B ) is defined as follows
asasdescribed
above
can
be
transformed
an
instance
for
a,aB
described
above
can
be
transformed
into
an,as
CRP
which
The
cost
tuple
Binto
= (B
Binstance
,solution
B
, BP,
)ψ,
is
defined
as
follows
solution
exists
if and
only
if ,the
DCRP
is
solvable.
solution
exists
if and
and
only
if the
the
DCRP
is solvable.
solvable.
exists
if
only
if
DCRP
is
The
cost
tuple
B
=
(B
,
B
,
B
,
B
)
is
defined
follows
The
cost
tuple
B
=
(B
B
,
B
,
B
)
is
defined
as
follows
Given
an
instance
of
DCRP
(H,
B)
corresponding
to
a
3SAT
problem,
literal is a child of the second
vertex.
Note
that
in
both
cases,
we
incur
two
cospeciations
and
one
The
cost
tuple
B
=
(B
,
B
,
B
,
B
)
is
defined
as
follows
solution
exists
if
and
only
if
the
DCRP
is
solvable.
solution
exists
if
and
only
if
the
DCRP
is
solvable.
Given an instance of DCRP (H, P,
B) corresponding
to a, P3SAT
problem,
Given
an instance
instance
of DCRP
DCRP
(H,P,
P,ψ,
ψ,B)
B)
corresponding
to aa 3SAT
3SAT problem,
problem,
an
of
(H,
corresponding
weψ,construct
a CRP (H
, φ, B
) as
follows.
The
transformation
leaves
the4B
BGiven
2m
+
4B
+ 24nB
B
= 17mB
+
2m
+
+ 24nB to
B +
=24nB
17mB
+=
+
4B +
+
24nB
instance
(H,
P,
ψ,ψ,transformation
B)
corresponding
to
a17mB
problem,
an
instance
DCRP
(H,
P,
B)4B
corresponding
to2m
a3SAT
3SAT
problem,
we constructGiven
aGiven
CRPan
(H
, P ,=
φ,17mB
Bofof
) DCRP
as
follows.
The
leaves
the
=
17mB
+
2m
+
B
+tree
2m
+
4B
24nB
we
construct
CRP
(H
φ,such
as follows.
follows.
The transformation
transformation leaves
leaves the
the
so
H =
H.
The
parasite
tree
is(H
modified
each
we
construct
aa CRP
,,PP ,,φ,
BB )) that
as
The
BB =host
17mB
+unchanged
2m
+ 4B
+ 24nB
loss.
,tip
P
,l φ,
The
transformation
leaves
the
host tree we
unchanged
so aHaCRP
= H.(H
The
parasite
isfollows.
modified
such
thattree
eachunchanged
weconstruct
construct
CRP
(H
,P
,∈φ,B
B) )tree
The
transformation
leavesso
the
L(P
)asas
is follows.
replaced
with
ahost
disjunctive
tip gadget.
This
at a
H gadget
= H. rooted
The parasite
tree is modified such that each
i! ,j
D C L DS
C
"
"
C
"
C
D
i!! ,j
i
L
L
"
C C" " L" "
C
S
S
"
L "L"
L
L
C "
L L
D
L
"
C
S
i
D
L
"
C
L
L
S
LL
L
C
D
C"L"
L"
"
L
S
"
"L
L
L
L
"
host tree unchanged
so 8n
H "−
=1 H.
parasite
B = 4m +
B The
= 4m
+ 8n −tree
1 is modified such that each
...
D−1
B
+
8n
D
""
D = 4m
" The
tip l ∈ L(P
) istree
replaced
with a disjunctive
tip
gadget.
rooted
at
asuch
host
sosoBHH
H.
parasite
tree
istip
modified
that
host
treeunchanged
unchanged
=4m
The
parasite
istip
modified
thateach
BH.
=
4m
+
8n1the
−gadget
1tree
=B
8n
1 This
vertex
l=
which
takes
place
of
lllis∈∈aL(P
path
of
length
keach
1 from
l" to l1" where
D+
4m
+−
8n
−
D
l−
L(P
) is
is replaced
replaced
with
disjunctive
tip gadget.
gadget. This
This gadget
gadget rooted
rooted at
at aa
D
)such
with
aa disjunctive
tip
"
"
vertex l" tip
which
of l iswith
al path
of
length
kl tip
−tip
1B
from
lin
to
Bl2)
=
+ rooted
2)
log(n)
+
4m
+of
8n
+
4aathe
= (m
+
log(n)
+
4m
+tip
8nthe
+
B
=except
(m
+l2)
+of
4m
+ 8nk+
4−11 from
) )isisplace
replaced
gadget.
This
gadget
at
a4place
" (mhas
tipl l∈takes
∈L(P
L(Pthe
replaced
adisjunctive
disjunctive
gadget.
This
rooted
at
aplace
1Lwhere
L
L
kwith
=a|ψ(l)|
and every
vertex
the
path
one
child
last
vertex
which
takes
the
of
lfor
path
of
length
kll−
from ll" " to
to ll1" 1" where
where
ll"gadget
which
takes
isislog(n)
path
length
B
=
(m
+
2)
log(n)
+
4m
+
4
B(m
=
(m
+has
2)
log(n)
++
4m
++
8n+
+8n
4vertex
B
=
+
2)
log(n)
+
4m
8n
4
""
""
""
LL
L
0
kl = |ψ(l)|
and
every
vertex
in
the
path
one
tip
child
except
for
the
last
vertex
−−
from
land
l1l1where
vertexl l which
whichtakes
takesthe
theplace
place
pathofoflength
lengthkkkBkll l=
1|ψ(l)|
from
l to
toevery
where
γi ofofl lisisaapath
every
vertex
in
the
path
has
one
tip
child
except
for
the
last
and
vertex
in
the
path
has
one
tip
child
except
for
the
last
=1|ψ(l)|
3m
+B
8n
−
2
=
3m
+
8n
−
2
l S=
B
=
3m
+
8n
−
2
S
S
3m
+
−
2−
kkl l==|ψ(l)|
the
path
one
|ψ(l)|and
andevery
everyvertex
vertex
in=
the
path
has
one
tipchild
childexcept
exceptfor
forthe
thelast
last
=8n
3m
+
8n
2 tip
SB
B
3m
+
−8n
2has
S
S =Bin
4
4
1
Proof
of Correctness
00γi 4
4 44Proof
of 4Correctness
Proof of Correctness44
γCorrectness
4 Proof
of
ii
4
Proof
ofCorrectness
4 Proof
of Correctness
...
4.1
Reduction from DCRP
4.1 Reduction4.1
from
DCRP from DCRP
Reduction
4.1 Reduction
from
γi2 DCRP
4.1
Reduction
Reduction
from
DCRP
γi1i1 from
We DCRP
will now show that an instance of DCRP constructed from a 3SAT problem
...
4.1
1
...
will
now
that
instance
of instance
DCRP
problem
αi show that an instance
We an
will
now
show
that anconstructed
instance
of1 from
DCRP
constructed
from a 3SAT problem
We will now
ofWe
DCRP
constructed
from
a 3SAT
problem
described
above
canshow
befrom
transformed
an
of
CRP for
whichaa3SAT
will
nowthat
show
an as
instance
DCRP
from
ainto
3SAT
problem
We willWe
show
anthat
instance
of
DCRP
constructed
adescribed
3SAT
problem
asnow
described
above
can
transformed
into
an constructed
instance
of
CRP
for
which
asof described
above
can
be transformed
intobeantransformed
instance of into
CRPanforinstance
which aof CRP for which a
as
above
α
1 a can
exists
if into
and
only
if the
DCRP
is solvable.
itransformed
γbei3besolution
22be can
as
described
above
transformed
an ifinstance
offor
CRP
for
a if the DCRP is solvable.
as described
above
into
an
instance
of solution
CRP
which
awhich
solution
existscan
ifγand
DCRP
solvable.
solution
exists
and
only
the
isonly
solvable.
exists
if and
anisinstance
of DCRP
(H,
P,ifψ,
B) DCRP
corresponding
to
a
3SAT
problem,
ii only if theGiven
1
if and
if the
iscorresponding
solvable.
β
solutionsolution
exists
and
only
ifofonly
the
DCRP
is
Given
an
instance
DCRP
(H,DCRP
P,solvable.
ψ,Given
B)
i ifexists
"
"a DCRP
an(H
instance
(H,
P,
ψ, transformation
B)
corresponding
a 3SAT
problem,to a 3SAT problem,
anproblem,
instance
of DCRP
(H, P,
ψ,toB)
corresponding
we
construct
a CRP
, P " , φ,to
BofGiven
) 3SAT
as follows.
The
leaves
the
"
"(H,
Given
an ainstance
DCRP
P,
ψ, B) corresponding
problem,
Given
instance
ofi DCRP
(H,
ψ,
B)
corresponding
"to a
" problem,
"
"
we an
construct
(Hof" , 4P
, φ,P,
Btree
) as
follows.
The
transformation
leaves
the
" to a(H
βCRP
0
we
construct
CRP
, Pparasite
, 3SAT
φ,
Ba" )CRP
as follows.
leaves
the
unchanged
so aH0
= we
H. 3SAT
The
tree
is(H
modified
each
construct
, P " ,The
φ,such
Btransformation
) that
as follows.
The
transformation
leaves the
33" " γi"" host
" " , φ, B " ) as follows. γ
i
γCRP
we
a(H
(H
,
P
The
transformation
leaves
the
hostconstruct
tree
unchanged
so
H
=
H.
The
parasite
tree
is
modified
such
that
each
we construct
a CRP
,
P
,
φ,
B
)
as
follows.
The
transformation
leaves
the
"
i
" gadget
ii
tip
l
∈
L(P
)
is
replaced
with
a
disjunctive
tip
gadget.
This
rooted
at
a
host
tree
unchanged
so
H
=
H.
The
parasite
tree
is
modified
such
that
each
host
tree
unchanged
so
H
=
H.
The
parasite
tree
is modified
such that each
δl i∈tree
" a disjunctive
tipunchanged
L(Punchanged
) is so
replaced
tip
gadget.
This
gadget
rooted
at
a each
" The parasite
"
"
host
sowith
Hvertex
= H.parasite
tree
is
modified
such
that
host tree
H" =
H.
The
tree
is
modified
such
that
each
l
which
takes
the
place
of
l
is
a
path
of
length
k
−
1
from
l
to
l
where
tip
with
gadget.
This
rooted This
at a gadget rooted at a
tip l ∈
L(Pa)disjunctive
is replacedl tip
with
a disjunctive
tip gadget.
1 gadget
0 l ∈ L(P ) is replaced
0with
δreplaced
vertex
l"L(P
which
theπ
ofa|ψ(l)|
ldisjunctive
isγai path
of
length
kl −gadget
1This
frompath
l"" to has
l" where
1 vertex
) istakes
gadget.
gadget
at
aof
i 44with
tip l ∈ tip
L(Pl )∈is
replaced
aplace
disjunctive
tip
This
aa tip
" tip
1
kl =
andgadget.
in
the
one
child
except
lastofl"length
vertex
levery
which
takes
the
place
of1" rooted
lat
is
path
length
1path
from
to l1" where
4i−1
vertex
lrooted
which
takes
the
place
of l kfor
isl −
athe
kl − 1 from l" to l1" where
γof
γevery
"
" from
" lthe
ii k
k" l which
= |ψ(l)|
andthe
vertex
one
tip
childkl except
ii place
l" takes
which
takes
theofplace
lpath
aofhas
path
length
1l|ψ(l)|
tolast
lpath
vertex lvertex
l in
is the
aofpath
length
tofor
lin
where
l − 1 from
1 where
1 and
αi+1
kis
vertex
the
has
oneintip
for tip
thechild
last except for the last
kl −
=
every
vertex
thechild
pathexcept
has one
l = |ψ(l)| and every
kl = and
|ψ(l)|
and vertex
every vertex
in1 thehas
path
tipexcept
child except
kl = |ψ(l)|
every
in1 the path
onehas
tipone
child
for the for
lastthe last
αi+1
π4i−1 γγii00
00
π4i−1
γi
4i−1
4
24
2 1 γ
π4i−1
i
11
γ
π4i−1
γii1
4i−1
22
π4i−1
4i−1
4
2
γii2
4
4
4
3
γii3
4
3 γi
γ
γii3
0
πii0
0
4 πi
γ
γii4
πii11
1
πi00 πi
π
i
2
πii2
1
1
βi
βi
δi
1
α1 i
αi
...
γii44
...
3
γi22 γi
γ
i
δi
αi+1
αi+1
2
1 πi
π
πii1
...
πi22
π
i
Figure 9: Here we depict the mapping of the (m + 1)th parasite subtree along the k-thorns with
superscripts 0, 1, and 2 of every assignment gadget. The expanded assignment gadget shows the
mapping details onto γi . This is similar to the mapping for the (m + 2)th parasite subtree which
codiverges along k-thorns of superscripts 0, 3, and 4.
11
1
1
1
1
1
1
For the (m + 1)th and (m + 2)th parasite subtrees, we map the edge following σj1 to the path
between τ11 and α10 at a cost of 2 loss events for each subtree. The first k-thorn triple of the (m+1)th
parasite subtree maps to the 0, 1, and 2 superscripted thorns of α1 , and the vertex which follows this
triple host switches to the edge preceding assignment gadget β1 . This continues through assignment
gadgets β1 , γ1 , δ1 , α2 , . . . , δn as depicted in Figure 9. Since each of the two parasite subtrees has 4n
k-thorn triples, and 4n − 1 vertices to following the triples in order to perform the host switches,
an additional cost of 24n(BL + 1) cospeciation, 8n − 2 duplication, 8n loss, and 8n − 2 host switch
events is incurred.
The total cost of this mapping is
0
(m + 2)(BD
+ 1) + (m + 2)(BL + 1) + 15m(BL + 1) + 2m + 24nk cospeciations
4m + 8n − 1 duplications
(m + 2) log n + 4m + 8n + 4 losses
3m + 8n − 2 host switches
Since this cost is within the budget tuple B, it follows that any instance of GCRDP constructed
from a satisfiable 3SAT instance has a valid solution.
3.1.2
Reverse Direction
Now we will show that the existence of a valid solution to an instance of GCRDP describing an
instance of 3SAT implies that the 3SAT instance is satisfiable.
First note that every parasite tree k-thorn must codiverge at least once since every such thorn
is made up of at least BL + 1 > BD + 1 vertices. It follows that for every 1 ≤ j ≤ m + 2, Ψ must
map at least one vertex of σj0 to its corresponding vertex on τ 0 yielding a cospeciation event, and
since the vertices, p1 , . . . , pm+1 are ancestors of σ10 , they must incur m + 1 duplications.
Consider the (m + 1)th and (m + 2)th parasite subtrees of the parasite tree. In order for a
solution to map any non-k-thorn vertex of these branches as a cospeciation event, we must map it
to a common ancestor of two assignment gadgets. However, such a mapping would exceed the loss
budget as the edge from this vertex to its tip child must work through one side of an assignment
gadget at a cost of 3 × (BL + 1) losses. Thus each of these 2 × (4n − 1) vertices must duplicate and
host switch. The purpose of this mapping is to force an ordering between all assignment gadgets
such that given any pair of assignment gadgets in the host tree, the k-thorns of one must occur
before all k-thorns of the other.
Note that we have now accounted for all but 3m duplications, and all but 3m host switches.
We claim that any solution must incur at least one duplication and host switch as it maps each
literal gadget to the host tree. Any mapping of a literal gadget λi,j which incurs no duplication
or host switch events must map the k-thorn λ0i,j as a cospeciation event to αi00 , βi00 , γi00 , or δi00 for
some i0 possibly equal to i. Without loss of generality, we will assume that the assignment gadget
is labeled αi00 . In order to avoid host switches we must map all vertices of λi,j to vertices of αi0 .
Now consider the parent of λ2i,j and λ3i,j which we will refer to as w. A solution which maps this
vertex as a cospeciation must map it to the parent of αi10 and αi20 , but such a solution requires that
w’s child lineage which contains λ2i,j incur at least BL + 1 loss events. Thus any valid solution
must duplicate and host switch at least one vertex of each λi,j , and since we have used all but 3m
duplications and host switches, each literal gadget duplicates and host switches exactly once.
12
!!
!!1
!!2
!!ki −1
!!ki
...
!
!
! !!
!!
!!1
!!1
!!2
!!2
!!ki −1
!!ki −1 !!ki
!!ki
!
!!
!!1
!!2
!
!!ki −1!!
!
!
!ki !1
!!2
!!ki −1
!!ki
!
!!
!
!1
!!2
!!ki −1
!!ki
Figure 10: Here we show a sample transformation of a parasite tip vertex to a disjunctive tip gadget.
We have now accounted for all host switches and duplications, so Φ must map all remaining
parasite vertices as cospeciation events. This implies that each vertex of a σj1 k-thorn gadget must
map to its corresponding vertex on some τi1 as a cospeciation, and that the length 2 path which
follows σj1 must codiverge with vertices of the balanced tree following τi1 .
Consider some λi,j which, without loss of generality, maps to an assignment gadget αi0 . Recall
that any attempt to codiverge the parent vertex of λ2i,j and λ3i,j will exceed the loss budget, so this
vertex must duplicate and host switch while all other vertices of the literal gadget codiverge with
vertices of αi .
Furthermore, we claim that this host switch cannot land outside of the assignment gadget αi0 .
Recall that the solution mapping of the (m + 1)th and (m + 2)th parasite subtrees imply that for
any pair of assignment gadgets in the host tree, all non-tip vertices of one gadget must occur before
all non-tip vertices of the other. Thus a host switch which lands outside of αi will place a switched
lineage with 2 × (BL + 1) internal vertices onto an edge that either precedes an assignment gadget
where any mapping will incur at least BL + 1 loss events or onto an edge which precedes a tip
vertex resulting in an excess of duplication events. Since the switched parasite lineage will contain
2 k-thorns of size k = BL + 1, the host switch must land on the edge preceding either αi10 or αi30
depending on ψ’s definition.
Note that this host switch requires that either all vertices of αi10 occur prior to all vertices of αi30
or that the reverse ordering occurs. If this condition cannot be met, it follows that no1 satisfactory
mapping exists to solve the GCRDP instance. Thus for any
1 solvable instance, a solution exists for
1 to
the associated 3SAT instance which we can obtain by assigning xi to true if αi1 occurs prior
1
αi3 and to false if we find the reverse ordering. To prove that this solution will
satisfy
the
3SAT
1
clauses, consider a clause Cj . Note that ψ is defined such that the property1 noted earlier where the
literal gadgets of a single clause must map to the assignment gadgets of the same variable implies
that some gadget λi,j exists which maps to αi . Since the solution mapping is valid, an ordering is
imposed between αi1 and αi3 which assigns the variable xi appropriately such that Cj is satisfied.
This is the case for all Cj and therefore a solution to the GCRDP instance implies the existence of
an assignment which satisfies the 3SAT clauses.
4
Polynomial Time Reduction of GCRDP to CRDP
We now show that an instance of GCRDP constructed from a 3SAT instance, as described above,
can be transformed into an instance of CRDP for which a solution exists if and only if the DCRP
instance is solvable.
13
Given an instance of GCRDP (H, P, ψ, B) corresponding to a 3SAT instance, we construct a
CRDP instance (H 0 , P 0 , ϕ, B 0 ) as follows. The transformation leaves the host tree unchanged so
H 0 = H. The parasite tree is modified, as illustrated in Figure 10 such that each tip ` ∈ L(P ) is
replaced with a disjunctive tip gadget. This gadget is similar in form to a k-thorn of size |ψ(`)| − 1
all of whose children are tips labeled `01 , `02 , . . . , `0|ψ(`)| in order of decreasing depth, and whose root
we label `0 . The tips are mapped by ϕ to a unique element in the set ψ(`), and we stipulate that
ϕ(`01 ) and ϕ(`02 ) are the most distant pair of host vertices in ψ(`) with respect to distance in the
host tree.
Finally, we define
X
X
B 0 = (BC , BD +
(|ψ(`)| − 1), BS +
(|ψ(`)| − 1), BL ).
`∈L(P )
4.1
`∈L(P )
Proof of Correctness
Given a solution, Ψ, for a GCRDP instance (H, P, ψ, B), we construct a solution to the CRDP
instance (H 0 , P 0 , ϕ, B 0 ) as follows. Let Φ(p) = Ψ(p) for all p ∈ P ∩ P 0 . For every disjunctive tip
gadget constructed to replace a tip ` ∈ L(P ), we map the vertices of the unique path from `0 to `0i
to the edge preceding hi where hi = Ψ(`), ϕ(li0 ) = hi , and every internal vertex v of the disjunctive
tip gadget host switches to the edge preceding some tip hi0 , where some descendant of v maps to
hi0 . The mapping of verticesPp ∈ P ∩ P 0 accumulates a cost of at most (BC , BD , BS , BL ) while the
disjunctive tip gadgets add `∈L(P ) (|ψ(`)| − 1) duplications and the same number of host switches.
The total cost is upper bounded by B 0 , and thus there exists a solution to the CRDP instance.
Now we demonstrate that the transformation above yields a solvable instance of CRDP only if
the transformed GCRDP instance is solvable. First we show that every internal vertex of a disjunctive tip gadget, which takes the place of a tip other than a thorn tip of some σj1 , must be mapped
as a duplication and host switch event. It will then be demonstrated that the vertices p1 , . . . , pm+1
must be mapped to the edge child of the host dummy root at a cost of m + 1 duplications, and
finally, that the solution must map each σj1 to some τi1 as cospeciation events. With these observations, we then show that defining Ψ(p) = Φ(p) for p ∈ P ∩ P 0 and Ψ(`) = Φ(`0 ) for ` ∈ L(P ), will
yield a valid solution to the GCRDP instance.
Consider an internal vertex v of a disjunctive tip gadget, which replaces a tip that is not part
of some σj1 . Note that in our definition of ψ, no two vertices exist together in some ψ(`) which are
both descendants of the same τi1 k-thorn gadget, so in order to map v such that only a cospeciation
is incurred, it must be mapped to an ancestor of two gadgets of form τi1 which would incur BL + 1
losses yielding an invalid solution. Thus it follows that these vertices, of which there are
X
(|ψ(`)| − 1)
l∈L(P )−{thorn tips of some σj1 }
must each incur a duplication and host switch. Furthermore, for any such mapping, it is always
desirable to perform these host switches on edges which precede tips in order to avoid loss events,
which yields a mapping similar to that described earlier.
We then note that in order to map a vertex of the initial parasite tree path, pj , to a descendant
of τ 0 , σj0 must duplicate and host switch all of its thorns in order to ensure that its tips properly
0
map with the tips of τ 0 . Since we specified that k = BD
+ 1 for the k-thorns, τ 0 and all σj0
14
(1 ≤ j ≤ m + 2), host switching all thorns of any σj0 will incur too many duplications for even the
CRDP instance, and therefore the path (p1 , . . . , pm+1 ) must accumulate m + 1 duplications.
Subtracting the host switch and duplication events due to the disjunctive tip gadgets below
thorns of form σj1 and the length m + 1 path, the remaining cost budget for both host switches and
duplications is
X
(|ψ(`)| − 1).
3m + 8n − 2 +
`∈{thorn tips of some σj1 }
Since every host switch requires an accompanying duplication, any additional duplication which
does not incur a host switch is no less expensive than a duplication followed by a host switch.
If we map each σj1 to some τi1 (1 ≤ i ≤ n and 1 ≤ j ≤ m + 2) and host switch the disjunctive
tip gadgets as described earlier, then each thorn contributes a cost of two duplications and host
switches. Now if we consider any mapping of a thorn vertex in σj1 which avoids this host switch,
we will show that such a mapping incurs an additional duplication and some number of loss events.
Let u and v be the two internal vertices of the disjunctive tip gadget associated with a thorn tip
of σj1 where v is a descendant of u, let `01 , `02 , and `03 be the disjunctive gadget’s tips in order of
increasing depth, and label ϕ(`0k ) = hk (k = 1, 2, 3). In order to avoid a single host switch at u
or v, Φ must map this vertex to a common ancestor vertex or edge of h1 and h2 . Mapping to the
latest common ancestor would yield a cospeciation, but the parent of u (an internal vertex of the
k-thorn gadget) would have to incur a duplication (else exceed BL losses) while an earlier common
ancestor would yield an extra duplication. In addition to this, the edges (v, `01 ) and (v, `02 ) would
each incur at least one loss at the parents of h1 and h2 . To avoid a host switch at u, the vertex
must map to a common ancestor of h1 , h2 , and h3 , but recall that `01 and `02 are defined such that
h1 and h2 are pairwise the most distant vertices in the image of this disjunctive tip gadget’s tips.
It follows that the common ancestor of h1 , h2 , and h3 is the same as the common ancestor of h1
and h2 , so if we wish to save host switches at u and v, we cannot codiverge at u, and must incur
an extra duplication in addition to those accumulated by avoiding the host switch at v.
Thus, the proposed solution, Ψ, to the GCRDP instance solution, where Ψ(p) = Φ(p) for
p ∈ P ∩ P 0 and Ψ(`) = Φ(`0 ) for ` ∈ L(P ) will incur a cost within the budget tuple B. Furthermore,
a valid CRDP instance solution requires that Φ(p) = h implies that there exists a tip descendant
p0 of p and a tip descendant h0 of h such that ϕ(p0 ) = h0 , so it follows that Ψ is valid in that it
properly extends the tip mapping function ψ.
References
[1] Michael Charleston. Jungles: A new solution to the hostparasite phylogeny reconciliation problem. Mathematical Biosciences, 149:191–223, 1998.
[2] Ran Libeskind-Hadas and Michael Charleston. On the computational complexity of the reticulate cophylogeny reconstruction problem. Journal of Computational Biology, 16(1):105–117,
2009.
15