Article Timing and Order of Transmission Events

Timing and Order of Transmission Events Is Not Directly
Reflected in a Pathogen Phylogeny
Ethan Romero-Severson,1 Helena Skar,y,1 Ingo Bulla,y,1 Jan Albert,2,3 and Thomas Leitner*,1
1
Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM
Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
3
Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
y
These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate Editor: Jeffrey Thorne
2
Abstract
Pathogen phylogenies are often used to infer spread among hosts. There is, however, not an exact match between the
pathogen phylogeny and the host transmission history. Here, we examine in detail the limitations of this relationship.
First, all splits in a pathogen phylogeny of more than 1 host occur within hosts, not at the moment of transmission,
predating the transmission events as described by the pretransmission interval. Second, the order in which nodes in a
phylogeny occur may be reflective of the within-host dynamics rather than epidemiologic relationships. To investigate
these phenomena, motivated by within-host diversity patterns, we developed a two-phase coalescent model that includes
a transmission bottleneck followed by linear outgrowth to a maximum population size followed by either stabilization or
decline of the population. The model predicts that the pretransmission interval shrinks compared with predictions based
on constant population size or a simple transmission bottleneck. Because lineages coalesce faster in a small population,
the probability of a pathogen phylogeny to resemble the transmission history depends on when after infection a donor
transmits to a new host. We also show that the probability of inferring the incorrect order of multiple transmissions from
the same host is high. Finally, we compare time of HIV-1 infection informed by genetic distances in phylogenies to
independent biomarker data, and show that, indeed, the pretransmission interval biases phylogeny-based estimates of
when transmissions occurred. We describe situations where caution is needed not to misinterpret which parts of a
phylogeny that may indicate outbreaks and tight transmission clusters.
Key words: HIV, within-host dynamics, molecular epidemiology, phylodynamics, transmission reconstruction, coalescent.
Introduction
Article
Investigating DNA sequence data from pathogens is a powerful and increasingly popular method to get information
about epidemics that otherwise is difficult to access.
Typically, pathogen phylogenies are used to infer parameters
and events relating to host behavior such as who infected
whom or when transmissions occurred in the past. In chronic
diseases such as HIV infection, it is often not known how long
patients have been infected, and even less who infected
whom, and thus phylogenies that are able to reconstruct
the past evolutionary history of the pathogen promises to
unravel such unknowns. Because HIV (and other measurably
evolving pathogens) typically accumulates mutations faster
than between-host transmissions occur, transmission histories become recorded in the virus phylogeny (Leitner 2002).
There is, however, a lack of deeper understanding of the connection between transmission histories and viral phylogenies.
The pretransmission interval describes the difference in
time between when a transmitted lineage arose in a donor
and when transmission actually took place (Leitner and
Albert 1999; Leitner and Fitch 1999). Because transmitted
lineages must already exist at the time of transmission,
using the time-point when the transmitted lineage arose
will always bias the estimated time of transmission backwards
in time. However, the magnitude of this bias has not been
carefully investigated. Therefore, the impact this may have on
epidemiological reconstructions is unknown. In addition to
timing bias, there may be additional problems arising from
the fact that viral lineages can persist for long time periods in
the host. Although the issue of incomplete lineage sorting has
been investigated quite extensively in the more general context of speciation (Pamilo and Nei 1988; Degnan and
Rosenberg 2006; Liu and Pearl 2007; Rosenberg and Tao
2008), and has been discussed in the context of pathogen
transmission (Leitner and Fitch 1999; Leitner and Albert 2000;
Pybus and Rambaut 2009), it is often not appreciated in molecular epidemiology studies. For HIV infections, it has been
described that multiple HIV lineages may exist for seven or
more years in a host (Rodrigo et al. 1999), which possibly
could affect transmission reconstructions. However, these
estimates were based on simple coalescent models assuming
constant population size, which appear unrealistic when
transmission is considered, because transmissions are
known to involve a strong bottleneck (McNearney et al.
1992; Wolfs et al. 1992; Zhang et al. 1993; Salazar-Gonzalez
et al. 2009).
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2014. This work is written by US Government employees and is in
the public domain in the US.
2472
Mol. Biol. Evol. 31(9):2472–2482 doi:10.1093/molbev/msu179 Advance Access publication May 29, 2014
Timing and Order of Transmission Events . doi:10.1093/molbev/msu179
The objective of this study was to explore the limitations of
using phylogenetic branch lengths as a proxy for time between infections and time since infection. We show that
the pretransmission interval becomes shorter when a more
realistic coalescent model is applied, which may improve
transmission reconstructions. However, we also show that
phylogenies alone may not distinguish between serial transmissions between different individuals and one person
spreading to several individuals as in super-spreader cases,
which may confuse the inference of spread dynamics.
Results
Theoretical Limitations in Reconstructing Time of
Transmission
Transmission of a pathogen can occur serially from one
person to another so that each person only infects
one other person, forming a single chain of transmissions.
However, one person may also transmit to more than one
recipient, the extreme being a super-spreader. In real epidemics, a combination of these two modes of transmission
usually occurs. When no other information is available than a
virus phylogeny from the persons involved in the transmission chain, limitations arise on what we can and cannot infer.
Figure 1A shows the case when serial transmissions have
occurred. We will consider the case of HIV transmission, but
note that the theoretical limitations may be valid for many
other measurably evolving pathogens, such as hepatitis C and
influenza. At transmission, the donor has an established infection with some genetic variation in the HIV population,
this level of variation depends on a number of factors, such as
if the infection is acute or chronic. From this population a
limited number of viruses are transmitted, incurring a strong
bottleneck (McNearney et al. 1992; Wolfs et al. 1992; Zhang
et al. 1993; Salazar-Gonzalez et al. 2009). Importantly, the virus
that was transmitted already existed in the donor at time of
transmission. Thus, the phylogenetic split that corresponds to
the most recent common ancestor of what grows out in the
recipient and what remains in the donor must be further back
in time than the time of transmission, known as the pretransmission interval (Leitner and Albert 1999). In the case of serial
transmissions, each split relates to one transmission event,
and thus the phylogeny will qualitatively resemble the serial
transmissions in its ladder-like shape, but these events will all
be inferred prior to when they actually occurred. Each donor
occupies three branch-segments of the resulting phylogeny,
where the splits relate to within-host events. In each transmission, the length of the pretransmission interval depends
mainly on the diversity in the donor at time of transmission.
In the case when the donor infects more than one new
host, the diversity at each time of transmission causes additional problems for the reconstruction. If transmissions occur
over long intervals (fig. 1B), each split again relates to a transmission, and superficially the phylogeny reconstructs the
transmission events. However, in this case all splits occurred
within the donor (A), and thus the phylogeny is mostly a
description of within-host evolution, sampled through the
transmissions to B, C, and D, with only some fraction of the
MBE
tips describing evolution occurring in the recipients. As in the
serial case, each transmission will phylogenetically be inferred
prior to when it actually occurred if one was to interpret a
split as a transmission event. Finally, when a donor infects two
or more recipients within a short interval, it is possible that
the order of transmissions along with infection times become
impossible to accurately reconstruct (fig. 1C). In this case too,
all splits are within the donor, describing within-host evolution in the donor. Overall, the pretransmission interval associated with each and every transmission is a random draw
from the possible coalescence times in the donor’s viral
population.
A Two-Phase Linear Coalescent Model of Within-Host
HIV Dynamics
According to standard coalescent theory (Nordborg 2001;
Wakeley 2009), the time for two randomly selected lineages
(the transmitted lineage and a random lineage in the donor)
to coalesce in a simplified theoretical population is an exponential random variable with a rate parameter proportional
to the inverse of the population size.
The empirical dynamics of a real HIV population can
be quite different from the idealized population leading to
multiple-order-of-magnitude discrepancies in the empirical
census population size and the equivalent theoretical population size (Leigh Brown 1997; Pennings et al. 2014). For this
reason, modeling the characteristic growth in the viral census
population can lead to erroneous inferences. Given the assumptions of the Wright–Fisher-like population, the diversity
of the viral population is a closer proxy to the theoretical
population size than the number of infected cells or viral load.
Shankarappa et al. (1999) measured the viral diversity of
seven patients beginning at seroconversion for up to 12 years.
They showed that env diversity increases linearly at an approximately common rate up to some point after which the
diversity appears to either decline or stabilize. The time (from
seroconversion) at peak diversity varied between 2 and
8 years and the rate of decline in diversity after its peak also
seemed to vary between patients. To model this trend, we use
the following piecewise-linear model with uniform random
effects on both the time at maximum diversity and the rate of
decline in diversity following the maxima:
1 + 1 t, t tx
,
NðtÞ ¼
1 + 1 tx + 2 ðt tx Þ, t > tx
where tx ~ U ðta ,tb Þ and 2 ~ U ð,0Þ account for between
patient variance in the maximum diversity and the rate of
decline in diversity after the maximum. 1 is the population
size at the moment of infection, 1 is the rate of population size increase in the first phase, ta and tb are the respective
minimum and maximum times at maximum population size.
The quantity is defined as ðNMin 1 1 ta Þ=ðtMax ta Þ
where tMax is the maximum sampling time, and NMin is the
minimum population size. The latter two parameters are
used to define the minimum slope in the second phase
such that at time tMax the population will not be less than
NMin. Figure 2 summarizes the parameters in our two-phase
2473
Romero-Severson et al. . doi:10.1093/molbev/msu179
MBE
FIG. 1. Fundamental limitations of phylogenetic transmission reconstruction. Panels show the case of serial transmissions (A), and multiple transmissions from one donor with long transmission intervals (B) and short transmission intervals (C). The left column shows the transmission history and the
right column the reconstructed phylogeny. Time of transmission T is indicated for each transmission, A!B, B!C, and C!D in panel (A) and A!B,
A!C, and A!D in panels (B) and (C).
coalescent model. In what follows, we always refer to the
phase (0, tx) as the first phase and (tx, tMax) as the second
phase, independent of whether we are in the framework of
measuring time forwards or backwards.
A Bottleneck-Constant Size Model of Within-Host
HIV Dynamics
To further consider the relative influence of the bottleneck at
transmission and the population dynamics on the pretransmission interval, we also consider the model
0, t ¼ 0
NðtÞ ¼
,
c, t > 0
where the population size is zero at the moment of transmission, forcing all lineages to coalesce at or before this
time (on the reverse time axis), and constant everywhere
2474
else. The point of this model is not to express any biological
reality but rather to be used as a tool to understand the
impact of the strong transmission bottleneck alone, and
how it influences the magnitude of the pretransmission
interval.
The Pretransmission Interval Impact on Time of
Transmission Inference
To derive the magnitude of the pretransmission interval, we
need to first consider density of times to the next coalescent
event in units of coalescent time fA ¼ ea where 1 unit of
coalescent time at time t is defined as N(t) generations in the
past; in the case of HIV-1 the generation time is about 1 day
(Perelson et al. 1996), so we will omit this scalar factor
and simply refer to calendar time. Our strategy for deriving
the magnitude of the pretransmission interval is to first
MBE
Timing and Order of Transmission Events . doi:10.1093/molbev/msu179
h
for z 2 0,t1 +
1
1
i
and
fZ2 j t1 ,tx ,2 ðzÞ ¼ fA ðg2 ðz,t1 ,tx ,2 ÞÞg20 ðz,t1 ,tx ,2 Þ
1
¼ ð1 + 1 tx + 2 ðt1 tx ÞÞ2
1
ð1 + 1 tx + 2 ððt1 tx Þ zÞÞ2 1
for z 2 ½0,1. To account for the joint at time tx we will only
use the density of Z2 conditional on the time to coalescence
being less than t1 tx , which is indicated as fZ2 . To get that
density we need the probability of this event occurring, which
is given by
Z
FIG. 2. Parameters in our two-phase linear coalescent model. The HIV
population initially grows from N ¼ 1 at rate 1 to a time tx in
U ðta ,tb Þ when the population growth either stabilizes or starts to decline at rate 2 in U ð,0Þ to end at N ¼ NMin at time tMax .
derive the relevant densities of coalescence in the twophase model, assemble the pieces into a single expression,
and finally integrate to obtain the random effects and
expectations.
The changing rate of coalescence given our population
model can be obtained by the relation that an infinitesimal
unit of calendar time, du, at time u is equivalent to a change of
du
NðuÞ in units of coalescent time. Therefore, denoting the time
of sampling by t1,
Zs
du
g1 ðs,t1 Þ ¼
0 1 + 1 ðt1 uÞ
1
¼ ½logð1 + 1 t1 Þ logð1 + 1 ðt1 sÞÞ
1
Zs
du
g2 ðs,t1 ,tx ,2 Þ ¼
+
t
+
1
1 x
2 ððt1 tx Þ uÞ
0
1
¼
logð1 + 1 tx + 2 ðt1 tx ÞÞ
2
logð1 + 1 tx + 2 ððt1 tx Þ sÞÞ
gives the relationship between change in coalescent and calendar time in the first and second phase respectively. Because
the coalescent is defined in reverse time the derivation of the
model equations is defined on a reverse time axis such that
the integral defined over 0 to s indicates moving backwards in
time s units, starting at t1.
Next, we derive the density of the time to coalescence in
units of generation time, which we denote by Z. To this end,
we calculate the densities of the time to coalescence (in units
of generation time) either assuming that t1 tx (denoted
fZ1 ), or assuming that t1 > t and NðtÞ ¼ 1 + 1 tx + 2 t for
all t1 tx (denoted fZ2 ). To get these densities, we need to
make the following transformations:
fZ1 j t1 ðzÞ ¼ fA ðg1 ðz,t1 ÞÞg10 ðz,t1 Þ
1
1
¼ ð1 + 1 t1 Þ1 ð1 + 1 ðt1 zÞÞ1 1
t1 tx
ðt1 ,tx ,2 Þ ¼
fZ2 j t1 ,tx ,2 ðzÞdz
0
1
1
¼ ð1 + 1 tx Þ2 ð1 + 1 tx + 2 ðt1 tx ÞÞ 2 + 1:
f
ðzÞ
Then we have the definition fZ2 j t1 ,tx ,2 ðzÞ ¼ Z2ðjtt11,t,txx,,22 Þ on
½0,t1 tx and zero everywhere else.
To obtain the final expectation, we need to integrate out
the random time at peak population size and the random
slope in the second phase. If we start by assuming that all
parameters are fixed, we have the two cases with respect to
the time that at maximum population size, tx
EðZ j t1 ,tx ,2 Þ
8
R t1 + 11
>
>
fZ1 j t1 ðzÞz dz,
0
>
>
Z t1 tx
>
>
>
>
> ðt1 ,tx ,2 Þ
fZ2 j t1 ,tx ,2 ðzÞzdz
>
>
>
0
>
<
¼
+ ½1 ðt1 ,tx ,2 Þ
>
>
>
2 t + 1
3
>
>
x
>
Z 1
>
>
>
6
7
>
>
fZ1 j t1 ðzÞzdz + ðt1 tx Þ5,
4
>
>
:
t1 tx
t1 > tx
0
Finally, to obtain the two-phase model we integrate over the
random effects
Z Z
1 1 0 t1
EðZ j t1 ,x,yÞdx
EðZ j t1 Þ ¼
tb ta ta
Z tb
EðZ j t1 ,x,yÞdx dy:
+
t1
The expectation for the bottleneck-constant size model is
more straightforward. By definition of the model, all coalescences must occur at or before the donor was infected (in
reverse time). Therefore, to obtain the expectation under this
model we need to simply take the sum of the expectation
from 0 to t1 for the constant model and the remaining density
of uncoalesced lineages at time of infection (forcing them to
coalesce) as
2
3
Zt1
Zt1
1
t
1
t
exp t dt + 41 exp dt5t1
EðZ j t1 ,cÞ ¼
c
c
c
c
0
0
2475
Romero-Severson et al. . doi:10.1093/molbev/msu179
MBE
is dominating the impact on the pretransmission interval and
the random effects of 2 are relatively small (dark gray envelope, fig. 3).
In addition to large among patient variability, the pretransmission interval possible in each patient has large range of
possible realizations upon transmission. For a typical patient
that reaches a peak population size of 3,000 in 5 years and has
a moderate decline in population (2 ¼ 1N=day), the
upper 95% confidence band is close to the identity line and
the lower 95% band close to zero. Thus, the confidence interval of the pretransmission interval is large, as both young
and old lineages from the donor’s population potentially
could be transmitted.
The Probability of Incongruity between the
Transmission Tree and the Coalescent Genealogy
FIG. 3. The expected pretransmission interval. The red horizontal line
shows the expected pretransmission interval under the standard constant size coalescent at N = 3,000. The dashed line shows the expectation under a bottleneck-constant size model with N = 3,000. The gray
envelope shows the extremes of the random realizations of the pretransmission interval expectation under our two-phase linear coalescent
model where the average patient reaches N(tx) & 3,000 in 2–8 years.
The dark gray envelope shows the subset of patients with tx = 5 years.
In the bottleneck-constant and two-phase linear models, the mean time
for two random lineages to coalesce depends on the time a donor has
been infected.
Figure 3 summarizes the effects of the pretransmission
interval that one could expect in a phylogenetic reconstruction of a transmission time. Typically, previous coalescent
estimates of within-host HIV-1 N based on the constantsize model have ranged 1,000–10,000 (Leigh Brown 1997;
Nijhuis et al. 1998; Rodrigo et al. 1999). With N = 3,000 and
a generation time of 1 day, the pretransmission interval would
be unchanged through transmission from donor to recipient
at 8.2 years. Clearly, this is a very unrealistic model as it assumes that the entire diversity in the donor is transmitted to
the recipient. Adding the well-known bottleneck at transmission, we note that the bottleneck-constant model drastically
reduces the expected pretransmission interval. N is now dependent on the time of transmission relative to the donor’s
infection time. Our two-phase model adds further realism
and the expected pretransmission interval is generally reduced even more because lineages coalesce faster in a small
population, early in the infection, and slower once the pathogen has reached its full diversity level. In a comparable set of
possible patients, which on average reach N(tx) & 3,000 in 2–
8 years (N(tx) range 1,200–4,900), implying 1 = 1.64 N/day,
the two-phase model also introduces substantial variation
among patients, as observed in Shankarappa et al. (1999).
Some patients may induce very restricted pretransmission
intervals and others much larger (gray envelope in fig. 3).
To further illustrate the effects of our two-phase model, we
highlight patients where tx is fixed at 5 years; noting that N(tx)
2476
If there is negligible within-host diversity, the coalescent genealogy will have the same labeled topology as the transmission tree with high probability. However, for pathogens that
evolve within the host, the coalescent genealogy can be heavily influenced by the within-host dynamics, which could mislead investigators who assume negligible within-host diversity
(fig. 1C). In the theoretical case where a single individual
transmits to k individuals in a very short period of time, the
coalescent genealogy reflects more of the within-host dynamics of the donor than the timing of transmissions. In that case,
each of the many potential trees with labeled tips for k taxa
is nearly equally probable. However as the time between
transmissions increases, the epidemiologic dynamics exert a
stronger influence, making the coalescent genealogy more
probable to be congruous with the transmission tree.
In the situation where a donor transmits to two other
individuals (at times t1 and t2 ), we have a tree with three
taxa corresponding to the two transmission events and a
reference lineage in the donor. There are three possible coalescent genealogies with labeled tips in this situation. In one
coalescent genealogy, the first coalescence is between the
reference lineage and the lineage transmitted at t1 followed
by coalescence with the lineage transmitted at time t2 ; this is
incongruous with the transmission tree as it implies that t2
occurred before t1 . In the second possible coalescent genealogy, the lineages transmitted at times t1 and t2 coalesce with
one another before coalescing with the reference lineage; this
tree is also incongruous with the transmission tree as it
implies that one of the individuals infected at times t1 and
t2 infected the other one. In the final coalescent genealogy,
the reference lineage coalesces first with the lineage transmitted at time t2 and then with the lineage transmitted at t1
which agrees with the transmission tree.
Figure 4 shows the probability of obtaining the incorrect
coalescent genealogy given the two-phase model for a three
taxa tree with one donor transmitting to two individuals at
times t1 and t1 + t2 for a wide range of times. If t2 ¼ 0 then
the probability of obtaining the correct coalescent genealogy
is the reciprocal of the number of possible coalescent genealogies, which, for n = 3, is 1/3. In the case where the second
transmission occurs about 1 year or less after the first, the
Timing and Order of Transmission Events . doi:10.1093/molbev/msu179
FIG. 4. Probability that two transmissions will phylogenetically appear in
the opposite order in which they occurred. The x axis shows how long
the donor has been infected at time of the first transmission, and the y
axis shows the amount of time that has passed from the first to the
second transmission. The color indicates the probability of obtaining a
coalescent genealogy inconsistent with the transmission tree.
probability of obtaining a coalescent genealogy that is incongruous with the epidemiology is about 50%. This means that if
chronically infected persons are responsible for a large fraction
of new infections, a phylogeny could seriously misrepresent
when and in what order transmissions occurred.
The probability of recovering the correct coalescent genealogy is worse for trees involving more recipients. If a donor
transmits to k individuals, then to recover the coalescent
genealogy that is consistent with the transmission tree,
each k 1 within-host coalescent event must occur in the
correct order; even given a modest level of diversity, this is an
improbable outcome.
Tip Lengths Compared with BED Estimates of Time
of Transmission
Recently, we developed a BED (Dobbs et al. 2004) biomarkerbased time-continuous model that can provide information
on how long ago a patient was infected (Skar et al. 2013).
Here, we use that estimate as a reference to what the tip
lengths tell us about time since infection/transmission.
According to the above theoretical limits and pretransmission
interval expectations, we would predict that tip lengths in
general should correspond to a longer time than time since
infection/transmission. Trivially, if we have not sampled all
individuals in a transmission chain, then tip lengths would be
much too long as they now contain unobserved transmissions, spanning more than one patient (fig. 5). Therefore, we
identified several transmission chains that were likely to have
been extensively, if not fully, sampled in the greater Swedish
HIV-1 epidemic.
MBE
FIG. 5. The pretransmission interval and missing taxa. When all hosts in
a transmission chain are sampled (100% sampling), there will be a
pretransmission interval associated with each transmission, with an
expected length according to Fig. 3. When taxa are missing (<100%
sampling), some branches will be affected by more than one pretransmission interval, and such branches reflect evolution in at least three
hosts. Note also that when less than 100% sampling occurred, the
sampled taxa may not have infected each other (Leitner and Albert
2000).
As our HIV-1 trees were based on pol, which is targeted by
antiviral drugs, we first investigated whether known drug resistance mutations (DRMs) affected the reconstructions. First,
there was no significant difference between the homoplasy
distributions of potential DRM sites and non-DRM sites
(P = 0.09, Wilcoxon rank sum test with continuity correction).
However, the tip lengths related to the sequences that contained DRMs (w/DR) were significantly more affected by the
exclusion of DRMs in the alignments than sequences without
DRMs (w/o DR), (mean(w/DR) = 0.001672, mean(w/o
DR) = 0.000065; P < 0.0002, Wilcoxon rank sum test with
continuity correction). Another interesting effect was that
taxa w/DR and whose nearest phylogenetic neighbor had
changed when DRMs were removed had significantly
longer branches than the taxa without DR whose neighbor
remained the same (P = 3.103e-07, Wilcoxon rank sum test
with continuity correction). This is explained by that the taxa
with DR that stayed with the same nearest neighbor when
DRMs were excluded were more likely to have been involved
in local transmission chains. As there were significant effects
on tip lengths depending on inclusion or exclusion of DRMs
in the alignments, we removed all potential DRM sites in our
alignments before further analyses.
Figure 6 shows five well-sampled transmission chains of
recently diagnosed persons involving injecting drug user
(IDU), heterosexual (HET), and homosexual (MSM) spread
of HIV-1 subtypes A1, B, and CRF01. As expected, we see
that for many taxa the tips are longer than the corresponding
BED distance. Both tip lengths and BED estimates are associated with uncertainties, so we compared the 95% confidence
and credible intervals of each estimate to assess when the
differences were significant. Although the trend was clearly
2477
Romero-Severson et al. . doi:10.1093/molbev/msu179
MBE
FIG. 6. Examples of transmission chains from the greater Swedish HIV-1 epidemic. Chain 1 involved a CRF01 outbreak among IDU, imported from
Finland (Skar et al. 2011); Chains 4 and 79 involved smaller MSM outbreaks of subtype B; Chain 80 was another IDU chain spreading subtype B, the
more common subtype among IDU in Sweden; and finally Chain 85, a mixed MSM-HET chain spreading subtype A1. In each chain, the tree shows the
phylogenetic reconstruction of the HIV-1 pol sequences from each subject (1 direct population sequence/subject; Leitner et al. 1993). The trees are
scaled in units of substitutions/site. Superimposed on each tip is the BED-derived genetic distance in bold blue lines (see Materials and Methods for
details). Significant differences are indicated (P < 0.05). Each chain shown is a subtree of much larger trees (one per subtype) describing the entire
sampled Swedish epidemic; they are thus rooted according to their linkage to the large subtype trees.
skewed toward tips being longer, most differences were not
significant. pol-based branch lengths typically are uncertain
because pol evolves slowly, and BED measurements have large
uncertainty because of patient variation (Parekh et al. 2011;
Skar et al. 2013). Nevertheless, taking these uncertainties into
account, all transmission chains showed some significant differences between tip lengths and BED distances, suggesting
that there indeed are real instances where the pretransmission interval has a large impact.
There are two types of patterns in the transmission chains
in figure 6: 1) Tight phylogenetic clusters with short BED
distances. This pattern is visible in all chains except Chain
85. This pattern is consistent with sampling shortly after infection and transmission(s) from one or several donors in the
acute phase of infection. In Chain 1, we know from previous
work that the tight part corresponds to an outbreak in a
socially closely linked IDU network (Skar et al. 2011). 2)
Phylogenetic clusters with long branches but short BED distances. This pattern describes Chain 85 entirely, and is also
part of the other chains, for example, Chains 4 and 79. This
type of pattern is consistent with sampling shortly after infection and transmission from a donor in the chronic disease
stage, years into their infection when the donor HIV-1 population has great diversity (fig. 3). As we believe that these
clusters were densely sampled, they most likely involved
donors that transmitted to multiple recipients as in
figures 1C and 4. The third possible pattern, long phylogenetic
distances with long BED distances, was not observed here
because, obviously, most long-term infected donors were
not in the sample of recently diagnosed persons.
Discussion
The pretransmission interval was described in 1999 (Leitner
and Albert 1999; Leitner and Fitch 1999), and although it has
been discussed since (Leitner and Albert 2000; Pybus and
Rambaut 2009), it has largely been ignored in phylodynamic
investigations. Here, we show that this may have bigger consequences than previously understood by showing that there
are both timing and topological issues involved in the
2478
pretransmission interval. We show that splits in a pathogen
tree are not related directly to transmission events but also
reflect samplings of lineages in the donor’s population at
times of transmissions. The effect may be that the order of
transmission events becomes jumbled, and certainly the inferred times of transmissions are pushed to the past. Hence,
one could expect that dynamics such as number of transmissions over time could become unreliable without properly
modeling pretransmission interval effects.
Identification of clusters in phylogenetic trees of HIV-1 and
other pathogen sequences has been a popular method to
investigate the epidemic spread of such agents in human
populations. The clusters have been defined in various
ways, but generally they have been based on genetic distance
(branch lengths) and/or phylogenetic robustness (typically
nonparametric bootstrap support values). This may be problematic because the branch lengths have a pretransmission
interval associated with them biasing the inferred time of
transmission backwards in time, and moreover, the topology
in the cluster may mostly reflect the random draw of transmitted lineages in a donor. As we show here, the pretransmission interval may involve time periods of months up to
several years, depending on spread dynamics. Thus, real epidemiological clusters, stemming from rapid outbreaks or tight
transmissions may not form clusters with short branch
lengths, as in Chain 85 in figure 6. For instance, consider a
donor that infects ten people within a few months; should we
expect a tight phylogenetic cluster or not? If the donor had
been infected for, say, 6 years at this event, we should expect
tips leading to the recipients on average 4 years long, some
more, some less (fig. 3). Thus, this very rapid spread will not
turn out as a tight cluster in a phylogenetic tree. Clusters with
short distances do, however, exist (fig. 6); so, what do they
mean in epidemiological terms?
Clearly, short tip lengths mean that the infected person
was diagnosed/sampled shortly after infection. Short internal
branches mean that either there was transmission from a
donor shortly after the donor was infected because only closely related lineages existed in the donor, or it may indicate
Timing and Order of Transmission Events . doi:10.1093/molbev/msu179
that a younger lineage in the donor happened to be transmitted even though older existed. When many short
branches exist in a cluster it must mean that either there
were many serial transmissions occurring in early stage infection, or an early stage donor infected multiple recipients
(super-spreader). Thus, although clusters with short distances
relate to fast spread, fast spread may also occur in clusters
with longer distances (fig. 6). Consequently, focusing on clusters with short distances biases the results to one type of rapid
spread, and misses other types of rapid spread.
There are additional reasons for why branch lengths may
differ from times between transmissions: 1) Missing taxa obviously will elongate branch lengths, as several patients and
transmissions now occur along them (fig. 5). This problem is
always present when not all patients in an epidemic have
been sampled, as is usually the case. Note also that typically
17–60% of HIV cases in a human population have not been
diagnosed (Hamers et al. 2006; Volz et al. 2013; Supervie et al.
2014); 2) The rate of genetic evolution is different in different
patients, depending among other things on the strength of
their immune system and on antiviral treatment (Wolinsky
et al. 1996; Halapi et al. 1997; Lee et al. 2008; Cohen and Gay
2010; Skar et al. 2010); 3) The rate of genetic evolution within
one patient is not constant over time; it dynamically follows
the increase and decrease of the immune response over the
pathogenesis (Lee et al. 2008). In addition, there exists some
confusion about which network we can reconstruct, the
transmission network or the phylogeny. The transmission
network is a subset of the social network, where the transmission network will be bifurcating (unless there is a high rate
of superinfection), whereas the social network likely contains
many alternative (cyclic and multifurcating) paths of potential transmissions. The latter cannot be reconstructed by phylogenetics as only transmissions that actually occurred will be
recorded, and the former will be subject to the effects of the
pretransmission interval as described in this article. Similarly,
the within-host phylogeny may contain recombination, causing complicated cyclic structures. It is possible that modeling
all these levels of networks may improve future transmission
inference.
To operate correctly, our model needs additional information about how long the donor was infected at time of transmission. As we show here, and previously (Skar et al. 2013),
time since infection can be estimated using BED data, or for
instance by estimating the number of multistate characters in
direct population sequence data (Kouyos et al. 2011).
Alternatively, if this information is not available, in large
epidemiological studies we can treat it as a random variable.
It is possible that adding additional features to our model
would further reduce the inferred pretransmission interval
effects. Such features include within-host selective sweeps
(Messer and Neher 2012; Pennings et al. 2014), which could
further reduce N in a donor. Accurately modeling these
sweeps would require additional information, for example,
longitudinal and detailed clonal data from donors. In this
context it is also worth pointing out that evolution across
the HIV genome is highly variable, for example, env evolves
much faster than pol. Thus, although our model qualitatively
MBE
resembles the diversity development of any gene, the growth
rates may differ along the genome. In general, when transmissions happen more often a fast evolving gene would have
greater power to resolve lineages in time (Leitner et al. 1996),
but slower evolving genes may also have sufficient power
(Lemey et al. 2005).
We believe that more realistic phylodynamic models that
include the issues we describe here will significantly improve
future epidemiological investigations. On the level of who
infected whom, relevant to forensic investigations, for example (Leitner and Albert 2000; Scaduto et al. 2010), proper
consideration of the pretransmission interval may aid in
assessments of whether transmission could have taken
place in a time window suggested by other data. If more
than one transmission occurred from a common source,
the potential effects associated with inferred disordering
need to be considered. On the larger epidemiological level,
relating to questions of incidence or when infections occur
relative to host stage (e.g., acute/chronic) or time, at a minimum the expected (mean) pretransmission interval needs to
be included. Note, however, that the pretransmission interval
should inversely correlate to the spread rate (Maljkovic Berry
et al. 2007), and social network structures (Graw et al. 2012)
may cause heterogeneous effects on the mean pretransmission interval.
Materials and Methods
HIV Sequence and BED Biomarker Data
A total of 1,740 partial pol sequences from newly diagnosed
HIV-1 patients were evaluated in the study. These sequences
were generated using a published in-house method that
targets amino acids 1–99 in the protease and 1–253 in the
reverse transcriptase (Murillo et al. 2010; Karlsson et al. 2012).
The sequences were assembled and edited using the
SequencherTM software (Gene Codes Corporation, Ann
Arbor, MI) and aligned using MAFFT (Katoh and Toh
2008). The largest part of the data set (N = 1,539) represents
the Swedish HIV-1 epidemic, that is, patients diagnosed with
HIV-1 in Sweden, representing 45% of the newly diagnosed
HIV-1 patients between August 2002 and August 2010. In
addition, we included 201 previously published Swedish sequences sampled between 1992 and 2002 (Lindstrom et al.
2006). BED testing was done on 819 of the collected blood
samples to estimate time since infection as previously
described (Skar et al. 2013). From all of these patients,
we selected sequences that were part of densely sampled
Swedish transmission chains (fig. 6), involving patients
that had no symptoms of advanced disease (low CD4
counts or other AIDS-related manifestations) to avoid
known problems related to false-recent infection classification of patients with advanced disease and low BED values
(Dobbs et al. 2004).
For similar scientific and ethical reasons as explained in
Alizon et al. (2010) and Kouyos et al. (2010), only approximately 15% of the anonymized sequences are accessible via
GenBank (accession numbers: JQ698667–JQ698874).
2479
MBE
Romero-Severson et al. . doi:10.1093/molbev/msu179
Ethics Statement
Informed consent was obtained from participants or from the
next of kin, caregivers or guardians on the behalf of minors.
The research was conducted according to the Declaration of
Helsinki and was approved by the Regional Medical Ethics
Board in Stockholm, Sweden (Dnr 02-367, 04-797, 2005/1214,
2007/1533, and 2011/1854).
Phylogenetic Inference
Maximum likelihood (ML) trees were generated for each HIV1 subtype using PhyML with substitution model as suggested
by the software jModeltest according to the Akaike information criterion (Guindon et al. 2005; Darriba et al. 2012). In
order to account for systematic tip length biases due to
DRMs, potential effects were investigated on the ML trees
inferred from alignments either with or without potential
DRMs. Potential DRMs were classified according to the
Stanford database; Major HIV-1 Drug Resistance Mutations,
available at: http://hivdb.stanford.edu/pages/download/ (last
accessed January 15, 2013). 1) The homoplasy index was calculated for each individual nucleotide site using MacClade
4.06 (Maddison D and Maddison W 2003). Each site was
classified to be part of a potential DRM codon or not. By
comparing these two site classifications, it was possible to
determine whether DRMs contributed significantly more to
the homoplasy distribution in the tree. 2) To investigate the
actual effect on the parameter of interest (the branch
lengths), we compared the tip length distributions from ML
trees inferred from alignments either with or without DRMs.
Furthermore, the tip length distribution of those taxa with
known transmitted DRMs (TDRs) was compared with taxa
without TDRs.
Translating BED Estimated Time since Infection to
Genetic Distance
The biomarker (BED) estimation of time since infection (Skar
et al. 2013) was transformed into genetic distance by a molecular clock. The genetic region corresponding to the
Swedish pol sequences was extracted and aligned from six
previously studied patients (PIC1362, PIC38417, PIC71101,
PIC83747, and PIC90770) with N = 27–121 sequences from
3 to 11 time points with a follow up of between 181 and
1,245 days from onset of acute HIV-1 infection symptoms (Liu
et al. 2006). Putative recombinants (DQ853454, DQ853445)
were identified and removed using the Phi-test in SplitsTree
(Huson 1998). TDRs were excluded from the alignment.
PhyML was used to construct ML trees for each patient.
The substitution model used (GTR + I + G) was the same
as for the construction of the ML trees for the Swedish pol
data. In order to take patient variation into account, we used
a linear mixed effects model to estimate the within-patient
evolutionary rate in this pol fragment. An uncorrelated
random effects model described the data better than a correlated model (P < 0.0001, AIC), and the fixed effect correlation was also very small (r = 0.042). The rate of evolution
was 6.59 106 substitutions site1 day1, and the intercept 1.46 103 substitutions site1 (supplementary fig. S1,
2480
Supplementary Material online). Modeling was performed
with R package lme4 (Bates et al. 2011). Confirmative
BEAST (Drummond and Rambaut 2007) analyses (uncorrelated relaxed lognormal or strict clock) had very similar results
(data not shown).
Comparing Phylogenetic and BED Distances
The BED-derived genetic distances were compared with the
sequence-based phylogenetic distances by numerically estimating the overlap of the 95% credibility interval of the posterior and 95% confidence interval, respectively. The 95%
credibility interval of the posterior BED estimate was based
on the random effects in the time-continuous BED modeling,
described in Skar et al. (2013), and the 95% confidence interval on branch lengths was estimated using ML optimizations
using PAUP* (Swofford 2002). Significance was determined as
less than 5% overlap.
Simulation of Coalescent Genealogies
In the case where a donor (D) transmits to two other individuals (I1 and I2 at times t1 and t2 , respectively), there are
three possible coalescent genealogies. If the lineage terminating in D coalesces with the lineage terminating in I2 before t1 ,
then the coalescent genealogy must be consistent with the
transmission tree. Therefore, we needed to calculate the probability that a coalescent event occurs in the interval ½t1 ,t2 . To
do this we first divided ½t1 ,t2 into subintervals of length 1 day.
For each day, td , in ½t1 ,t2 we approximated the probability of
a coalescent event by first drawing 104 values of tx from
Uð2 365,8 365Þ and 104 values of 2 from Uð0:16,0Þ
and calculated the population size at td assuming 1 ¼ 1 and
1 ¼ 1:64. We assumed that over each day that the population size was constant such that the probability of a coales cent event occurring in that subinterval was 1 exp N1 .
For each day, we calculated the average probability of a coalescent event by averaging the probability of coalescence for
each of the sampled population sizes. The probability of no
coalescent event in ½t1 ,t2 is then the product of the complement of the probability of a coalescent event occurring for all
days in ½t1 ,t2 . If no coalescent event occurs in ½t1 ,t2 , then
each of the possible coalescent genealogies is equally probable. Then, the probability of obtaining an incongruous coalescent genealogy is ð1 pÞ + p3 where p is the probability of
no coalescence occurring in ½t1 ,t2 : The implementations of
our model were done using R (R Development Core Team
2003), and we plan to include the model in a future comprehensive transmission inference software.
Approximation of the Coalescent Rate for Small
Sample Fractions
The standard n-coalescent is an approximation to the full
coalescent process that assumes that the probability of multiple coalescences per generation is very small. In practice, this
equates to assuming that the population size is large or that
the sample size is small. Our model, however, allows the population to become very small to account for the strong bottleneck at transmission. To understand the effect of violating
Timing and Order of Transmission Events . doi:10.1093/molbev/msu179
this assumption, we need to consider the full coalescent process. The probability that k genes have k unique ancestors in
the previous generation in a population of size N is given by
k1
Y
1
i¼1
i
,
N
which is approximated by
k 1
1
,
2 N
where the terms in the original product that correspond
to multicoalescences are assumed to be ignorable if N is
large with respect to k. In general, we are concerned with
the situation where we have two taxa (a transmitted taxon
and a reference taxon in the host). Supplementary Figure S2,
Supplementary Material online, shows the discrepancy in the
exact and approximate probabilities of observing k lineages in
a population of size N in the previous generation given a
sample of size k, that is, the probability of no coalescent
event. In the most common case where k = 2 the approximation is exact as there is only one possible coalescence that can
occur. In the case where k = 10 such as with a putative superspreader, the approximation underestimates the rate of
coalescence leading to overestimated inferences (longer
pretransmission interval and higher probability of node disordering). In general, most coalescences will occur when the
population size is large enough such that the approximate
rate of coalescence is a reasonable estimate of the true rate.
Even in the pathological case where there are many transmissions right after infection, the magnitude of the bias is limited
by the fact that lineages are constrained by biological reality to
occur in the donor at the time of infection. Therefore, the
effect of the small population size on the coalescence rate is
ignorable for our purposes.
Supplementary Material
Supplementary figures S1 and S2 are available at Molecular
Biology and Evolution online (http://www.mbe.oxfordjournals.
org/).
Acknowledgments
This work was supported by National Institutes of Health
(grant number R01AI087520), Vetenskapsrådet (fellowship 623-2011-1100), Deutsche Forschungsgemeinschaft
(fellowship BU 2685/4-1), Vetenskapsrådet (grant number
K2008-56X-09935-17-3), and EU projects SPREAD (QLK2CT-2001-01344), and CHAIN (FP7/2007–2013).
References
Alizon S, von Wyl V, Stadler T, Kouyos RD, Yerly S, Hirschel B, Boni J,
Shah C, Klimkait T, Furrer H, et al. 2010. Phylogenetic approach
reveals that virus genotype largely determines HIV set-point viral
load. PLoS Pathog. 6:e1001123.
Bates D, Maechler M, Bolker B. 2011. lme4: linear mixed-effects models
using S4 classes. 0.999375-42 ed [Internet]. Available from:
http://cran.r-project.org.
Cohen MS, Gay CL. 2010. Treatment to prevent transmission of HIV-1.
Clin Infect Dis. 50(Suppl. 3), S85–S95.
MBE
Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2:
more models, new heuristics and parallel computing. Nat
Methods. 9:772.
Degnan JH, Rosenberg NA. 2006. Discordance of species trees with their
most likely gene trees. PLoS Genet. 2:e68.
Dobbs T, Kennedy S, Pau CP, McDougal JS, Parekh BS. 2004.
Performance characteristics of the immunoglobulin G-capture
BED-enzyme immunoassay, an assay to detect recent human
immunodeficiency virus type 1 seroconversion. J Clin Microbiol.
42:2623–2628.
Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis
by sampling trees. BMC Evol Biol. 7:214.
Graw F, Leitner T, Ribeiro RM. 2012. Agent-based and phylogenetic
analyses reveal how HIV-1 moves between risk groups: injecting
drug users sustain the heterosexual epidemic in Latvia. Epidemics
4:104–116.
Guindon S, Lethiec F, Duroux P, Gascuel O. 2005. PHYML Online—a
web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33:W557–W559.
Halapi E, Leitner T, Jansson M, Scarlatti G, Orlandi P, Plebani A, Romiti L,
Albert J, Wigzell H, Rossi P. 1997. Correlation between HIV sequence
evolution, specific immune response and clinical outcome in vertically infected infants. AIDS 11:1709–1717.
Hamers FF, Devaux I, Alix J, Nardone A. 2006. HIV/AIDS in Europe:
trends and EU-wide priorities. Euro Surveill. 11:E061123.1.
Huson DH. 1998. SplitsTree: analyzing and visualizing evolutionary data.
Bioinformatics 14:68–73.
Karlsson A, Bjorkman P, Bratt G, Ekvall H, Gisslen M, Sonnerborg A, Mild
M, Albert J. 2012. Low prevalence of transmitted drug resistance
in patients newly diagnosed with HIV-1 infection in Sweden
2003–2010. PLoS One 7:e33484.
Katoh K, Toh H. 2008. Recent developments in the MAFFT multiple
sequence alignment program. Brief Bioinform. 9:286–298.
Kouyos RD, von Wyl V, Yerly S, Boni J, Rieder P, Joos B, Taffe P, Shah C,
Burgisser P, Klimkait T, et al. 2011. Ambiguous nucleotide calls from
population-based sequencing of HIV-1 are a marker for viral diversity and the age of infection. Clin Infect Dis. 52:532–539.
Kouyos RD, von Wyl V, Yerly S, Boni J, Taffe P, Shah C, Burgisser P,
Klimkait T, Weber R, Hirschel B, et al. 2010. Molecular epidemiology
reveals long-term changes in HIV type 1 subtype B transmission in
Switzerland. J Infect Dis. 201:1488–1497.
Lee HY, Perelson AS, Park SC, Leitner T. 2008. Dynamic correlation
between intrahost HIV-1 quasispecies evolution and disease progression. PLoS Comput Biol. 4:e1000240.
Leigh Brown AJ. 1997. Analysis of HIV-1 env gene sequences reveals
evidence for a low effective number in the viral population. Proc
Natl Acad Sci U S A. 94:1862–1865.
Leitner T. 2002. The molecular epidemiology of human viruses. Boston:
Kluwer Academic Publishers.
Leitner T, Albert J. 1999. The molecular clock of HIV-1 unveiled through
analysis of a known transmission history. Proc Natl Acad Sci U S A.
96:10752–10757.
Leitner T, Albert J. 2000. Reconstruction of HIV-1 transmission chains for
forensic purposes. AIDS Rev. 2:241–251.
Leitner T, Escanilla D, Franzen C, Uhlen M, Albert J. 1996. Accurate
reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci U S A. 93:10864–10869.
Leitner T, Fitch WM. 1999. The phylogenetics of known transmission
histories. In: Crandall KA, editor. The evolution of HIV. Baltimore
(MD): Johns Hopkins University Press.
Leitner T, Halapi E, Scarlatti G, Rossi P, Albert J, Fenyö EM, Uhlén M.
1993. Analysis of heterogeneous viral populations by direct DNA
sequencing. BioTechniques 15:120–126.
Lemey P, Derdelinckx I, Rambaut A, Van Laethem K, Dumont S,
Vermeulen S, Van Wijngaerden E, Vandamme AM. 2005.
Molecular footprint of drug-selective pressure in a human immunodeficiency virus transmission chain. J Virol. 79:11981–11989.
Lindstrom A, Ohlis A, Huigen M, Nijhuis M, Berglund T, Bratt G,
Sandstrom E, Albert J. 2006. HIV-1 transmission cluster with M41L
2481
Romero-Severson et al. . doi:10.1093/molbev/msu179
“singleton” mutation and decreased transmission of resistance in
newly diagnosed Swedish homosexual men. Antivir Ther. 11:
1031–1039.
Liu L, Pearl DK. 2007. Species trees from gene trees: reconstructing
Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol. 56:504–514.
Liu Y, McNevin J, Cao J, Zhao H, Genowati I, Wong K, McLaughlin S,
McSweyn MD, Diem K, Stevens CE, et al. 2006. Selection on the
human immunodeficiency virus type 1 proteome following primary
infection. J Virol. 80:9519–9529.
Maddison D, Maddison W. 2003. MacClade 4: analysis of phylogeny and
character evolution. 4.06 ed. Sunderland (MA): Sinauer Associates.
Maljkovic Berry I, Ribeiro R, Kothari M, Athreya G, Daniels M, Lee HY,
Bruno W, Leitner T. 2007. Unequal evolutionary rates in the human
immunodeficiency virus type 1 (HIV-1) pandemic: the evolutionary
rate of HIV-1 slows down when the epidemic rate increases. J Virol.
81:10625–10635.
McNearney T, Hornickova Z, Markham R, Birdwell A, Arens M, Saah A,
Ratner L. 1992. Relationship of human immunodeficiency virus type
1 sequence heterogeneity to stage of disease. Proc Natl Acad Sci U S
A. 89:10247–10251.
Messer PW, Neher RA. 2012. Estimating the strength of selective sweeps
from deep population diversity data. Genetics 191:593–605.
Murillo W, de Rivera IL, Parham L, Jovel E, Palou E, Karlsson AC, Albert J.
2010. Prevalence of drug resistance and importance of viral load
measurements in Honduran HIV-infected patients failing antiretroviral treatment. HIV Med. 11:95–103.
Nijhuis M, Boucher CA, Schipper P, Leitner T, Schuurman R, Albert J.
1998. Stochastic processes strongly influence HIV-1 evolution during
suboptimal protease-inhibitor therapy. Proc Natl Acad Sci U S A. 95:
14441–14446.
Nordborg M. 2001. Coalescent theory. In: Balding DJ, Bishop M,
Cannings C, editors. Handbook of Statistical Genetics. New Jersey:
John Wiley and Sons, Ltd. p. 843–877.
Pamilo P, Nei M. 1988. Relationships between gene trees and species
trees. Mol Biol Evol. 5:568–583.
Parekh BS, Hanson DL, Hargrove J, Branson B, Green T, Dobbs T,
Constantine N, Overbaugh J, McDougal JS. 2011. Determination of
mean recency period for estimation of HIV type 1 incidence with
the BED-capture EIA in persons infected with diverse subtypes. AIDS
Res Hum Retroviruses. 27:265–273.
Pennings PS, Kryazhimskiy S, Wakeley J. 2014. Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet. 10:
e1004000.
Perelson AS, Neuman AU, Markowitz M, Leonard JM, Ho DD. 1996.
HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span,
and viral generation time. Science 271:1582–1586.
Pybus OG, Rambaut A. 2009. Evolutionary analysis of the dynamics of
viral infectious disease. Nat Rev Genet. 10:540–550.
R Development Core Team. 2003. R: a language and environment for
statistical computing. Vienna (Austria): R Foundation for Statistical
Computing.
Rodrigo AG, Shpaer EG, Delwart EL, Iversen AK, Gallo MV, Brojatsch J,
Hirsch MS, Walker BD, Mullins JI. 1999. Coalescent estimates of
2482
MBE
HIV-1 generation time in vivo. Proc Natl Acad Sci U S A. 96:
2187–2191.
Rosenberg NA, Tao R. 2008. Discordance of species trees with their most
likely gene trees: the case of five taxa. Syst Biol. 57:131–140.
Salazar-Gonzalez JF, Salazar MG, Keele BF, Learn GH, Giorgi EE, Li H,
Decker JM, Wang S, Baalwa J, Kraus MH, et al. 2009. Genetic identity,
biological phenotype, and evolutionary pathways of transmitted/
founder viruses in acute and early HIV-1 infection. J Exp Med. 206:
1273–1289.
Scaduto DI, Brown JM, Haaland WC, Zwickl DJ, Hillis DM, Metzker ML.
2010. Source identification in two criminal cases using phylogenetic
analysis of HIV-1 DNA sequences. Proc Natl Acad Sci U S A. 107:
21242–21247.
Shankarappa R, Margolick JB, Gange SJ, Rodrigo AG, Upchurch D,
Farzadegan H, Gupta P, Rinaldo CR, Learn GH, He X, et al. 1999.
Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J Virol. 73:
10489–10502.
Skar H, Albert J, Leitner T. 2013. Towards estimation of HIV-1 date of
infection: a time-continuous IgG-model shows that seroconversion
does not occur at the midpoint between negative and positive tests.
PLoS One 8:e60906.
Skar H, Axelsson M, Berggren I, Thalme A, Gyllensten K, Liitsola K,
Brummer-Korvenkontio H, Kivela P, Spangberg E, Leitner T, et al.
2011. Dynamics of two separate but linked HIV-1 CRF01_AE outbreaks among injection drug users in Stockholm, Sweden, and
Helsinki, Finland. J Virol. 85:510–518.
Skar H, Borrego P, Wallstrom TC, Mild M, Marcelino JM, Barroso H,
Taveira N, Leitner T, Albert J. 2010. HIV-2 genetic evolution in patients with advanced disease is faster than that in matched HIV-1
patients. J Virol. 84:7412–7415.
Supervie V, Ndawinz JD, Lodi S, Costagliola D. 2014. The undiagnosed
HIV epidemic in France and its implications for HIV screening strategies. AIDS. 28(12):1797–1804.
Swofford DL. 2002. PAUP* Phylogenetic Analysis Using Parsimony
(*and Other Methods). 4.0b10 ed. Sunderland (MA): Sinauer
Associates.
Volz EM, Ionides E, Romero-Severson EO, Brandt MG, Mokotoff E,
Koopman JS. 2013. HIV-1 transmission during early infection in
men who have sex with men: a phylodynamic analysis. PLoS Med.
10:e1001568.
Wakeley J. 2009. Coalescent theory: an introduction. Greenwood Village
(CO): Roberts and Company Publishers.
Wolfs TFW, Zwart G, Bakker M, Goudsmit J. 1992. HIV-1 genomic RNA
diversification following sexual parenteral virus transmission.
Virology 189:103–110.
Wolinsky SM, Korber BTM, Neumann AU, Daniels M, Kunstman KJ,
Whetsell AJ, Furtado MR, Cao Y, Ho DD, Safrit JT, et al. 1996.
Adaptive evolution of human immunodeficiency virus type1
during the natural course of infection. Science 272:537–542.
Zhang LQ, MacKenzie P, Cleland A, Holmes EC, Brown AJ, Simmonds P.
1993. Selection for specific sequences in the external envelope
protein of human immunodeficiency virus type 1 upon primary
infection. J Virol. 67:3345–3356.