Timing and Order of Transmission Events Is Not Directly Reflected in a Pathogen Phylogeny Ethan Romero-Severson,1 Helena Skar,y,1 Ingo Bulla,y,1 Jan Albert,2,3 and Thomas Leitner*,1 1 Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden 3 Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden y These authors contributed equally to this work. *Corresponding author: E-mail: [email protected]. Associate Editor: Jeffrey Thorne 2 Abstract Pathogen phylogenies are often used to infer spread among hosts. There is, however, not an exact match between the pathogen phylogeny and the host transmission history. Here, we examine in detail the limitations of this relationship. First, all splits in a pathogen phylogeny of more than 1 host occur within hosts, not at the moment of transmission, predating the transmission events as described by the pretransmission interval. Second, the order in which nodes in a phylogeny occur may be reflective of the within-host dynamics rather than epidemiologic relationships. To investigate these phenomena, motivated by within-host diversity patterns, we developed a two-phase coalescent model that includes a transmission bottleneck followed by linear outgrowth to a maximum population size followed by either stabilization or decline of the population. The model predicts that the pretransmission interval shrinks compared with predictions based on constant population size or a simple transmission bottleneck. Because lineages coalesce faster in a small population, the probability of a pathogen phylogeny to resemble the transmission history depends on when after infection a donor transmits to a new host. We also show that the probability of inferring the incorrect order of multiple transmissions from the same host is high. Finally, we compare time of HIV-1 infection informed by genetic distances in phylogenies to independent biomarker data, and show that, indeed, the pretransmission interval biases phylogeny-based estimates of when transmissions occurred. We describe situations where caution is needed not to misinterpret which parts of a phylogeny that may indicate outbreaks and tight transmission clusters. Key words: HIV, within-host dynamics, molecular epidemiology, phylodynamics, transmission reconstruction, coalescent. Introduction Article Investigating DNA sequence data from pathogens is a powerful and increasingly popular method to get information about epidemics that otherwise is difficult to access. Typically, pathogen phylogenies are used to infer parameters and events relating to host behavior such as who infected whom or when transmissions occurred in the past. In chronic diseases such as HIV infection, it is often not known how long patients have been infected, and even less who infected whom, and thus phylogenies that are able to reconstruct the past evolutionary history of the pathogen promises to unravel such unknowns. Because HIV (and other measurably evolving pathogens) typically accumulates mutations faster than between-host transmissions occur, transmission histories become recorded in the virus phylogeny (Leitner 2002). There is, however, a lack of deeper understanding of the connection between transmission histories and viral phylogenies. The pretransmission interval describes the difference in time between when a transmitted lineage arose in a donor and when transmission actually took place (Leitner and Albert 1999; Leitner and Fitch 1999). Because transmitted lineages must already exist at the time of transmission, using the time-point when the transmitted lineage arose will always bias the estimated time of transmission backwards in time. However, the magnitude of this bias has not been carefully investigated. Therefore, the impact this may have on epidemiological reconstructions is unknown. In addition to timing bias, there may be additional problems arising from the fact that viral lineages can persist for long time periods in the host. Although the issue of incomplete lineage sorting has been investigated quite extensively in the more general context of speciation (Pamilo and Nei 1988; Degnan and Rosenberg 2006; Liu and Pearl 2007; Rosenberg and Tao 2008), and has been discussed in the context of pathogen transmission (Leitner and Fitch 1999; Leitner and Albert 2000; Pybus and Rambaut 2009), it is often not appreciated in molecular epidemiology studies. For HIV infections, it has been described that multiple HIV lineages may exist for seven or more years in a host (Rodrigo et al. 1999), which possibly could affect transmission reconstructions. However, these estimates were based on simple coalescent models assuming constant population size, which appear unrealistic when transmission is considered, because transmissions are known to involve a strong bottleneck (McNearney et al. 1992; Wolfs et al. 1992; Zhang et al. 1993; Salazar-Gonzalez et al. 2009). Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2014. This work is written by US Government employees and is in the public domain in the US. 2472 Mol. Biol. Evol. 31(9):2472–2482 doi:10.1093/molbev/msu179 Advance Access publication May 29, 2014 Timing and Order of Transmission Events . doi:10.1093/molbev/msu179 The objective of this study was to explore the limitations of using phylogenetic branch lengths as a proxy for time between infections and time since infection. We show that the pretransmission interval becomes shorter when a more realistic coalescent model is applied, which may improve transmission reconstructions. However, we also show that phylogenies alone may not distinguish between serial transmissions between different individuals and one person spreading to several individuals as in super-spreader cases, which may confuse the inference of spread dynamics. Results Theoretical Limitations in Reconstructing Time of Transmission Transmission of a pathogen can occur serially from one person to another so that each person only infects one other person, forming a single chain of transmissions. However, one person may also transmit to more than one recipient, the extreme being a super-spreader. In real epidemics, a combination of these two modes of transmission usually occurs. When no other information is available than a virus phylogeny from the persons involved in the transmission chain, limitations arise on what we can and cannot infer. Figure 1A shows the case when serial transmissions have occurred. We will consider the case of HIV transmission, but note that the theoretical limitations may be valid for many other measurably evolving pathogens, such as hepatitis C and influenza. At transmission, the donor has an established infection with some genetic variation in the HIV population, this level of variation depends on a number of factors, such as if the infection is acute or chronic. From this population a limited number of viruses are transmitted, incurring a strong bottleneck (McNearney et al. 1992; Wolfs et al. 1992; Zhang et al. 1993; Salazar-Gonzalez et al. 2009). Importantly, the virus that was transmitted already existed in the donor at time of transmission. Thus, the phylogenetic split that corresponds to the most recent common ancestor of what grows out in the recipient and what remains in the donor must be further back in time than the time of transmission, known as the pretransmission interval (Leitner and Albert 1999). In the case of serial transmissions, each split relates to one transmission event, and thus the phylogeny will qualitatively resemble the serial transmissions in its ladder-like shape, but these events will all be inferred prior to when they actually occurred. Each donor occupies three branch-segments of the resulting phylogeny, where the splits relate to within-host events. In each transmission, the length of the pretransmission interval depends mainly on the diversity in the donor at time of transmission. In the case when the donor infects more than one new host, the diversity at each time of transmission causes additional problems for the reconstruction. If transmissions occur over long intervals (fig. 1B), each split again relates to a transmission, and superficially the phylogeny reconstructs the transmission events. However, in this case all splits occurred within the donor (A), and thus the phylogeny is mostly a description of within-host evolution, sampled through the transmissions to B, C, and D, with only some fraction of the MBE tips describing evolution occurring in the recipients. As in the serial case, each transmission will phylogenetically be inferred prior to when it actually occurred if one was to interpret a split as a transmission event. Finally, when a donor infects two or more recipients within a short interval, it is possible that the order of transmissions along with infection times become impossible to accurately reconstruct (fig. 1C). In this case too, all splits are within the donor, describing within-host evolution in the donor. Overall, the pretransmission interval associated with each and every transmission is a random draw from the possible coalescence times in the donor’s viral population. A Two-Phase Linear Coalescent Model of Within-Host HIV Dynamics According to standard coalescent theory (Nordborg 2001; Wakeley 2009), the time for two randomly selected lineages (the transmitted lineage and a random lineage in the donor) to coalesce in a simplified theoretical population is an exponential random variable with a rate parameter proportional to the inverse of the population size. The empirical dynamics of a real HIV population can be quite different from the idealized population leading to multiple-order-of-magnitude discrepancies in the empirical census population size and the equivalent theoretical population size (Leigh Brown 1997; Pennings et al. 2014). For this reason, modeling the characteristic growth in the viral census population can lead to erroneous inferences. Given the assumptions of the Wright–Fisher-like population, the diversity of the viral population is a closer proxy to the theoretical population size than the number of infected cells or viral load. Shankarappa et al. (1999) measured the viral diversity of seven patients beginning at seroconversion for up to 12 years. They showed that env diversity increases linearly at an approximately common rate up to some point after which the diversity appears to either decline or stabilize. The time (from seroconversion) at peak diversity varied between 2 and 8 years and the rate of decline in diversity after its peak also seemed to vary between patients. To model this trend, we use the following piecewise-linear model with uniform random effects on both the time at maximum diversity and the rate of decline in diversity following the maxima: 1 + 1 t, t tx , NðtÞ ¼ 1 + 1 tx + 2 ðt tx Þ, t > tx where tx ~ U ðta ,tb Þ and 2 ~ U ð,0Þ account for between patient variance in the maximum diversity and the rate of decline in diversity after the maximum. 1 is the population size at the moment of infection, 1 is the rate of population size increase in the first phase, ta and tb are the respective minimum and maximum times at maximum population size. The quantity is defined as ðNMin 1 1 ta Þ=ðtMax ta Þ where tMax is the maximum sampling time, and NMin is the minimum population size. The latter two parameters are used to define the minimum slope in the second phase such that at time tMax the population will not be less than NMin. Figure 2 summarizes the parameters in our two-phase 2473 Romero-Severson et al. . doi:10.1093/molbev/msu179 MBE FIG. 1. Fundamental limitations of phylogenetic transmission reconstruction. Panels show the case of serial transmissions (A), and multiple transmissions from one donor with long transmission intervals (B) and short transmission intervals (C). The left column shows the transmission history and the right column the reconstructed phylogeny. Time of transmission T is indicated for each transmission, A!B, B!C, and C!D in panel (A) and A!B, A!C, and A!D in panels (B) and (C). coalescent model. In what follows, we always refer to the phase (0, tx) as the first phase and (tx, tMax) as the second phase, independent of whether we are in the framework of measuring time forwards or backwards. A Bottleneck-Constant Size Model of Within-Host HIV Dynamics To further consider the relative influence of the bottleneck at transmission and the population dynamics on the pretransmission interval, we also consider the model 0, t ¼ 0 NðtÞ ¼ , c, t > 0 where the population size is zero at the moment of transmission, forcing all lineages to coalesce at or before this time (on the reverse time axis), and constant everywhere 2474 else. The point of this model is not to express any biological reality but rather to be used as a tool to understand the impact of the strong transmission bottleneck alone, and how it influences the magnitude of the pretransmission interval. The Pretransmission Interval Impact on Time of Transmission Inference To derive the magnitude of the pretransmission interval, we need to first consider density of times to the next coalescent event in units of coalescent time fA ¼ ea where 1 unit of coalescent time at time t is defined as N(t) generations in the past; in the case of HIV-1 the generation time is about 1 day (Perelson et al. 1996), so we will omit this scalar factor and simply refer to calendar time. Our strategy for deriving the magnitude of the pretransmission interval is to first MBE Timing and Order of Transmission Events . doi:10.1093/molbev/msu179 h for z 2 0,t1 + 1 1 i and fZ2 j t1 ,tx ,2 ðzÞ ¼ fA ðg2 ðz,t1 ,tx ,2 ÞÞg20 ðz,t1 ,tx ,2 Þ 1 ¼ ð1 + 1 tx + 2 ðt1 tx ÞÞ2 1 ð1 + 1 tx + 2 ððt1 tx Þ zÞÞ2 1 for z 2 ½0,1. To account for the joint at time tx we will only use the density of Z2 conditional on the time to coalescence being less than t1 tx , which is indicated as fZ2 . To get that density we need the probability of this event occurring, which is given by Z FIG. 2. Parameters in our two-phase linear coalescent model. The HIV population initially grows from N ¼ 1 at rate 1 to a time tx in U ðta ,tb Þ when the population growth either stabilizes or starts to decline at rate 2 in U ð,0Þ to end at N ¼ NMin at time tMax . derive the relevant densities of coalescence in the twophase model, assemble the pieces into a single expression, and finally integrate to obtain the random effects and expectations. The changing rate of coalescence given our population model can be obtained by the relation that an infinitesimal unit of calendar time, du, at time u is equivalent to a change of du NðuÞ in units of coalescent time. Therefore, denoting the time of sampling by t1, Zs du g1 ðs,t1 Þ ¼ 0 1 + 1 ðt1 uÞ 1 ¼ ½logð1 + 1 t1 Þ logð1 + 1 ðt1 sÞÞ 1 Zs du g2 ðs,t1 ,tx ,2 Þ ¼ + t + 1 1 x 2 ððt1 tx Þ uÞ 0 1 ¼ logð1 + 1 tx + 2 ðt1 tx ÞÞ 2 logð1 + 1 tx + 2 ððt1 tx Þ sÞÞ gives the relationship between change in coalescent and calendar time in the first and second phase respectively. Because the coalescent is defined in reverse time the derivation of the model equations is defined on a reverse time axis such that the integral defined over 0 to s indicates moving backwards in time s units, starting at t1. Next, we derive the density of the time to coalescence in units of generation time, which we denote by Z. To this end, we calculate the densities of the time to coalescence (in units of generation time) either assuming that t1 tx (denoted fZ1 ), or assuming that t1 > t and NðtÞ ¼ 1 + 1 tx + 2 t for all t1 tx (denoted fZ2 ). To get these densities, we need to make the following transformations: fZ1 j t1 ðzÞ ¼ fA ðg1 ðz,t1 ÞÞg10 ðz,t1 Þ 1 1 ¼ ð1 + 1 t1 Þ1 ð1 + 1 ðt1 zÞÞ1 1 t1 tx ðt1 ,tx ,2 Þ ¼ fZ2 j t1 ,tx ,2 ðzÞdz 0 1 1 ¼ ð1 + 1 tx Þ2 ð1 + 1 tx + 2 ðt1 tx ÞÞ 2 + 1: f ðzÞ Then we have the definition fZ2 j t1 ,tx ,2 ðzÞ ¼ Z2ðjtt11,t,txx,,22 Þ on ½0,t1 tx and zero everywhere else. To obtain the final expectation, we need to integrate out the random time at peak population size and the random slope in the second phase. If we start by assuming that all parameters are fixed, we have the two cases with respect to the time that at maximum population size, tx EðZ j t1 ,tx ,2 Þ 8 R t1 + 11 > > fZ1 j t1 ðzÞz dz, 0 > > Z t1 tx > > > > > ðt1 ,tx ,2 Þ fZ2 j t1 ,tx ,2 ðzÞzdz > > > 0 > < ¼ + ½1 ðt1 ,tx ,2 Þ > > > 2 t + 1 3 > > x > Z 1 > > > 6 7 > > fZ1 j t1 ðzÞzdz + ðt1 tx Þ5, 4 > > : t1 tx t1 > tx 0 Finally, to obtain the two-phase model we integrate over the random effects Z Z 1 1 0 t1 EðZ j t1 ,x,yÞdx EðZ j t1 Þ ¼ tb ta ta Z tb EðZ j t1 ,x,yÞdx dy: + t1 The expectation for the bottleneck-constant size model is more straightforward. By definition of the model, all coalescences must occur at or before the donor was infected (in reverse time). Therefore, to obtain the expectation under this model we need to simply take the sum of the expectation from 0 to t1 for the constant model and the remaining density of uncoalesced lineages at time of infection (forcing them to coalesce) as 2 3 Zt1 Zt1 1 t 1 t exp t dt + 41 exp dt5t1 EðZ j t1 ,cÞ ¼ c c c c 0 0 2475 Romero-Severson et al. . doi:10.1093/molbev/msu179 MBE is dominating the impact on the pretransmission interval and the random effects of 2 are relatively small (dark gray envelope, fig. 3). In addition to large among patient variability, the pretransmission interval possible in each patient has large range of possible realizations upon transmission. For a typical patient that reaches a peak population size of 3,000 in 5 years and has a moderate decline in population (2 ¼ 1N=day), the upper 95% confidence band is close to the identity line and the lower 95% band close to zero. Thus, the confidence interval of the pretransmission interval is large, as both young and old lineages from the donor’s population potentially could be transmitted. The Probability of Incongruity between the Transmission Tree and the Coalescent Genealogy FIG. 3. The expected pretransmission interval. The red horizontal line shows the expected pretransmission interval under the standard constant size coalescent at N = 3,000. The dashed line shows the expectation under a bottleneck-constant size model with N = 3,000. The gray envelope shows the extremes of the random realizations of the pretransmission interval expectation under our two-phase linear coalescent model where the average patient reaches N(tx) & 3,000 in 2–8 years. The dark gray envelope shows the subset of patients with tx = 5 years. In the bottleneck-constant and two-phase linear models, the mean time for two random lineages to coalesce depends on the time a donor has been infected. Figure 3 summarizes the effects of the pretransmission interval that one could expect in a phylogenetic reconstruction of a transmission time. Typically, previous coalescent estimates of within-host HIV-1 N based on the constantsize model have ranged 1,000–10,000 (Leigh Brown 1997; Nijhuis et al. 1998; Rodrigo et al. 1999). With N = 3,000 and a generation time of 1 day, the pretransmission interval would be unchanged through transmission from donor to recipient at 8.2 years. Clearly, this is a very unrealistic model as it assumes that the entire diversity in the donor is transmitted to the recipient. Adding the well-known bottleneck at transmission, we note that the bottleneck-constant model drastically reduces the expected pretransmission interval. N is now dependent on the time of transmission relative to the donor’s infection time. Our two-phase model adds further realism and the expected pretransmission interval is generally reduced even more because lineages coalesce faster in a small population, early in the infection, and slower once the pathogen has reached its full diversity level. In a comparable set of possible patients, which on average reach N(tx) & 3,000 in 2– 8 years (N(tx) range 1,200–4,900), implying 1 = 1.64 N/day, the two-phase model also introduces substantial variation among patients, as observed in Shankarappa et al. (1999). Some patients may induce very restricted pretransmission intervals and others much larger (gray envelope in fig. 3). To further illustrate the effects of our two-phase model, we highlight patients where tx is fixed at 5 years; noting that N(tx) 2476 If there is negligible within-host diversity, the coalescent genealogy will have the same labeled topology as the transmission tree with high probability. However, for pathogens that evolve within the host, the coalescent genealogy can be heavily influenced by the within-host dynamics, which could mislead investigators who assume negligible within-host diversity (fig. 1C). In the theoretical case where a single individual transmits to k individuals in a very short period of time, the coalescent genealogy reflects more of the within-host dynamics of the donor than the timing of transmissions. In that case, each of the many potential trees with labeled tips for k taxa is nearly equally probable. However as the time between transmissions increases, the epidemiologic dynamics exert a stronger influence, making the coalescent genealogy more probable to be congruous with the transmission tree. In the situation where a donor transmits to two other individuals (at times t1 and t2 ), we have a tree with three taxa corresponding to the two transmission events and a reference lineage in the donor. There are three possible coalescent genealogies with labeled tips in this situation. In one coalescent genealogy, the first coalescence is between the reference lineage and the lineage transmitted at t1 followed by coalescence with the lineage transmitted at time t2 ; this is incongruous with the transmission tree as it implies that t2 occurred before t1 . In the second possible coalescent genealogy, the lineages transmitted at times t1 and t2 coalesce with one another before coalescing with the reference lineage; this tree is also incongruous with the transmission tree as it implies that one of the individuals infected at times t1 and t2 infected the other one. In the final coalescent genealogy, the reference lineage coalesces first with the lineage transmitted at time t2 and then with the lineage transmitted at t1 which agrees with the transmission tree. Figure 4 shows the probability of obtaining the incorrect coalescent genealogy given the two-phase model for a three taxa tree with one donor transmitting to two individuals at times t1 and t1 + t2 for a wide range of times. If t2 ¼ 0 then the probability of obtaining the correct coalescent genealogy is the reciprocal of the number of possible coalescent genealogies, which, for n = 3, is 1/3. In the case where the second transmission occurs about 1 year or less after the first, the Timing and Order of Transmission Events . doi:10.1093/molbev/msu179 FIG. 4. Probability that two transmissions will phylogenetically appear in the opposite order in which they occurred. The x axis shows how long the donor has been infected at time of the first transmission, and the y axis shows the amount of time that has passed from the first to the second transmission. The color indicates the probability of obtaining a coalescent genealogy inconsistent with the transmission tree. probability of obtaining a coalescent genealogy that is incongruous with the epidemiology is about 50%. This means that if chronically infected persons are responsible for a large fraction of new infections, a phylogeny could seriously misrepresent when and in what order transmissions occurred. The probability of recovering the correct coalescent genealogy is worse for trees involving more recipients. If a donor transmits to k individuals, then to recover the coalescent genealogy that is consistent with the transmission tree, each k 1 within-host coalescent event must occur in the correct order; even given a modest level of diversity, this is an improbable outcome. Tip Lengths Compared with BED Estimates of Time of Transmission Recently, we developed a BED (Dobbs et al. 2004) biomarkerbased time-continuous model that can provide information on how long ago a patient was infected (Skar et al. 2013). Here, we use that estimate as a reference to what the tip lengths tell us about time since infection/transmission. According to the above theoretical limits and pretransmission interval expectations, we would predict that tip lengths in general should correspond to a longer time than time since infection/transmission. Trivially, if we have not sampled all individuals in a transmission chain, then tip lengths would be much too long as they now contain unobserved transmissions, spanning more than one patient (fig. 5). Therefore, we identified several transmission chains that were likely to have been extensively, if not fully, sampled in the greater Swedish HIV-1 epidemic. MBE FIG. 5. The pretransmission interval and missing taxa. When all hosts in a transmission chain are sampled (100% sampling), there will be a pretransmission interval associated with each transmission, with an expected length according to Fig. 3. When taxa are missing (<100% sampling), some branches will be affected by more than one pretransmission interval, and such branches reflect evolution in at least three hosts. Note also that when less than 100% sampling occurred, the sampled taxa may not have infected each other (Leitner and Albert 2000). As our HIV-1 trees were based on pol, which is targeted by antiviral drugs, we first investigated whether known drug resistance mutations (DRMs) affected the reconstructions. First, there was no significant difference between the homoplasy distributions of potential DRM sites and non-DRM sites (P = 0.09, Wilcoxon rank sum test with continuity correction). However, the tip lengths related to the sequences that contained DRMs (w/DR) were significantly more affected by the exclusion of DRMs in the alignments than sequences without DRMs (w/o DR), (mean(w/DR) = 0.001672, mean(w/o DR) = 0.000065; P < 0.0002, Wilcoxon rank sum test with continuity correction). Another interesting effect was that taxa w/DR and whose nearest phylogenetic neighbor had changed when DRMs were removed had significantly longer branches than the taxa without DR whose neighbor remained the same (P = 3.103e-07, Wilcoxon rank sum test with continuity correction). This is explained by that the taxa with DR that stayed with the same nearest neighbor when DRMs were excluded were more likely to have been involved in local transmission chains. As there were significant effects on tip lengths depending on inclusion or exclusion of DRMs in the alignments, we removed all potential DRM sites in our alignments before further analyses. Figure 6 shows five well-sampled transmission chains of recently diagnosed persons involving injecting drug user (IDU), heterosexual (HET), and homosexual (MSM) spread of HIV-1 subtypes A1, B, and CRF01. As expected, we see that for many taxa the tips are longer than the corresponding BED distance. Both tip lengths and BED estimates are associated with uncertainties, so we compared the 95% confidence and credible intervals of each estimate to assess when the differences were significant. Although the trend was clearly 2477 Romero-Severson et al. . doi:10.1093/molbev/msu179 MBE FIG. 6. Examples of transmission chains from the greater Swedish HIV-1 epidemic. Chain 1 involved a CRF01 outbreak among IDU, imported from Finland (Skar et al. 2011); Chains 4 and 79 involved smaller MSM outbreaks of subtype B; Chain 80 was another IDU chain spreading subtype B, the more common subtype among IDU in Sweden; and finally Chain 85, a mixed MSM-HET chain spreading subtype A1. In each chain, the tree shows the phylogenetic reconstruction of the HIV-1 pol sequences from each subject (1 direct population sequence/subject; Leitner et al. 1993). The trees are scaled in units of substitutions/site. Superimposed on each tip is the BED-derived genetic distance in bold blue lines (see Materials and Methods for details). Significant differences are indicated (P < 0.05). Each chain shown is a subtree of much larger trees (one per subtype) describing the entire sampled Swedish epidemic; they are thus rooted according to their linkage to the large subtype trees. skewed toward tips being longer, most differences were not significant. pol-based branch lengths typically are uncertain because pol evolves slowly, and BED measurements have large uncertainty because of patient variation (Parekh et al. 2011; Skar et al. 2013). Nevertheless, taking these uncertainties into account, all transmission chains showed some significant differences between tip lengths and BED distances, suggesting that there indeed are real instances where the pretransmission interval has a large impact. There are two types of patterns in the transmission chains in figure 6: 1) Tight phylogenetic clusters with short BED distances. This pattern is visible in all chains except Chain 85. This pattern is consistent with sampling shortly after infection and transmission(s) from one or several donors in the acute phase of infection. In Chain 1, we know from previous work that the tight part corresponds to an outbreak in a socially closely linked IDU network (Skar et al. 2011). 2) Phylogenetic clusters with long branches but short BED distances. This pattern describes Chain 85 entirely, and is also part of the other chains, for example, Chains 4 and 79. This type of pattern is consistent with sampling shortly after infection and transmission from a donor in the chronic disease stage, years into their infection when the donor HIV-1 population has great diversity (fig. 3). As we believe that these clusters were densely sampled, they most likely involved donors that transmitted to multiple recipients as in figures 1C and 4. The third possible pattern, long phylogenetic distances with long BED distances, was not observed here because, obviously, most long-term infected donors were not in the sample of recently diagnosed persons. Discussion The pretransmission interval was described in 1999 (Leitner and Albert 1999; Leitner and Fitch 1999), and although it has been discussed since (Leitner and Albert 2000; Pybus and Rambaut 2009), it has largely been ignored in phylodynamic investigations. Here, we show that this may have bigger consequences than previously understood by showing that there are both timing and topological issues involved in the 2478 pretransmission interval. We show that splits in a pathogen tree are not related directly to transmission events but also reflect samplings of lineages in the donor’s population at times of transmissions. The effect may be that the order of transmission events becomes jumbled, and certainly the inferred times of transmissions are pushed to the past. Hence, one could expect that dynamics such as number of transmissions over time could become unreliable without properly modeling pretransmission interval effects. Identification of clusters in phylogenetic trees of HIV-1 and other pathogen sequences has been a popular method to investigate the epidemic spread of such agents in human populations. The clusters have been defined in various ways, but generally they have been based on genetic distance (branch lengths) and/or phylogenetic robustness (typically nonparametric bootstrap support values). This may be problematic because the branch lengths have a pretransmission interval associated with them biasing the inferred time of transmission backwards in time, and moreover, the topology in the cluster may mostly reflect the random draw of transmitted lineages in a donor. As we show here, the pretransmission interval may involve time periods of months up to several years, depending on spread dynamics. Thus, real epidemiological clusters, stemming from rapid outbreaks or tight transmissions may not form clusters with short branch lengths, as in Chain 85 in figure 6. For instance, consider a donor that infects ten people within a few months; should we expect a tight phylogenetic cluster or not? If the donor had been infected for, say, 6 years at this event, we should expect tips leading to the recipients on average 4 years long, some more, some less (fig. 3). Thus, this very rapid spread will not turn out as a tight cluster in a phylogenetic tree. Clusters with short distances do, however, exist (fig. 6); so, what do they mean in epidemiological terms? Clearly, short tip lengths mean that the infected person was diagnosed/sampled shortly after infection. Short internal branches mean that either there was transmission from a donor shortly after the donor was infected because only closely related lineages existed in the donor, or it may indicate Timing and Order of Transmission Events . doi:10.1093/molbev/msu179 that a younger lineage in the donor happened to be transmitted even though older existed. When many short branches exist in a cluster it must mean that either there were many serial transmissions occurring in early stage infection, or an early stage donor infected multiple recipients (super-spreader). Thus, although clusters with short distances relate to fast spread, fast spread may also occur in clusters with longer distances (fig. 6). Consequently, focusing on clusters with short distances biases the results to one type of rapid spread, and misses other types of rapid spread. There are additional reasons for why branch lengths may differ from times between transmissions: 1) Missing taxa obviously will elongate branch lengths, as several patients and transmissions now occur along them (fig. 5). This problem is always present when not all patients in an epidemic have been sampled, as is usually the case. Note also that typically 17–60% of HIV cases in a human population have not been diagnosed (Hamers et al. 2006; Volz et al. 2013; Supervie et al. 2014); 2) The rate of genetic evolution is different in different patients, depending among other things on the strength of their immune system and on antiviral treatment (Wolinsky et al. 1996; Halapi et al. 1997; Lee et al. 2008; Cohen and Gay 2010; Skar et al. 2010); 3) The rate of genetic evolution within one patient is not constant over time; it dynamically follows the increase and decrease of the immune response over the pathogenesis (Lee et al. 2008). In addition, there exists some confusion about which network we can reconstruct, the transmission network or the phylogeny. The transmission network is a subset of the social network, where the transmission network will be bifurcating (unless there is a high rate of superinfection), whereas the social network likely contains many alternative (cyclic and multifurcating) paths of potential transmissions. The latter cannot be reconstructed by phylogenetics as only transmissions that actually occurred will be recorded, and the former will be subject to the effects of the pretransmission interval as described in this article. Similarly, the within-host phylogeny may contain recombination, causing complicated cyclic structures. It is possible that modeling all these levels of networks may improve future transmission inference. To operate correctly, our model needs additional information about how long the donor was infected at time of transmission. As we show here, and previously (Skar et al. 2013), time since infection can be estimated using BED data, or for instance by estimating the number of multistate characters in direct population sequence data (Kouyos et al. 2011). Alternatively, if this information is not available, in large epidemiological studies we can treat it as a random variable. It is possible that adding additional features to our model would further reduce the inferred pretransmission interval effects. Such features include within-host selective sweeps (Messer and Neher 2012; Pennings et al. 2014), which could further reduce N in a donor. Accurately modeling these sweeps would require additional information, for example, longitudinal and detailed clonal data from donors. In this context it is also worth pointing out that evolution across the HIV genome is highly variable, for example, env evolves much faster than pol. Thus, although our model qualitatively MBE resembles the diversity development of any gene, the growth rates may differ along the genome. In general, when transmissions happen more often a fast evolving gene would have greater power to resolve lineages in time (Leitner et al. 1996), but slower evolving genes may also have sufficient power (Lemey et al. 2005). We believe that more realistic phylodynamic models that include the issues we describe here will significantly improve future epidemiological investigations. On the level of who infected whom, relevant to forensic investigations, for example (Leitner and Albert 2000; Scaduto et al. 2010), proper consideration of the pretransmission interval may aid in assessments of whether transmission could have taken place in a time window suggested by other data. If more than one transmission occurred from a common source, the potential effects associated with inferred disordering need to be considered. On the larger epidemiological level, relating to questions of incidence or when infections occur relative to host stage (e.g., acute/chronic) or time, at a minimum the expected (mean) pretransmission interval needs to be included. Note, however, that the pretransmission interval should inversely correlate to the spread rate (Maljkovic Berry et al. 2007), and social network structures (Graw et al. 2012) may cause heterogeneous effects on the mean pretransmission interval. Materials and Methods HIV Sequence and BED Biomarker Data A total of 1,740 partial pol sequences from newly diagnosed HIV-1 patients were evaluated in the study. These sequences were generated using a published in-house method that targets amino acids 1–99 in the protease and 1–253 in the reverse transcriptase (Murillo et al. 2010; Karlsson et al. 2012). The sequences were assembled and edited using the SequencherTM software (Gene Codes Corporation, Ann Arbor, MI) and aligned using MAFFT (Katoh and Toh 2008). The largest part of the data set (N = 1,539) represents the Swedish HIV-1 epidemic, that is, patients diagnosed with HIV-1 in Sweden, representing 45% of the newly diagnosed HIV-1 patients between August 2002 and August 2010. In addition, we included 201 previously published Swedish sequences sampled between 1992 and 2002 (Lindstrom et al. 2006). BED testing was done on 819 of the collected blood samples to estimate time since infection as previously described (Skar et al. 2013). From all of these patients, we selected sequences that were part of densely sampled Swedish transmission chains (fig. 6), involving patients that had no symptoms of advanced disease (low CD4 counts or other AIDS-related manifestations) to avoid known problems related to false-recent infection classification of patients with advanced disease and low BED values (Dobbs et al. 2004). For similar scientific and ethical reasons as explained in Alizon et al. (2010) and Kouyos et al. (2010), only approximately 15% of the anonymized sequences are accessible via GenBank (accession numbers: JQ698667–JQ698874). 2479 MBE Romero-Severson et al. . doi:10.1093/molbev/msu179 Ethics Statement Informed consent was obtained from participants or from the next of kin, caregivers or guardians on the behalf of minors. The research was conducted according to the Declaration of Helsinki and was approved by the Regional Medical Ethics Board in Stockholm, Sweden (Dnr 02-367, 04-797, 2005/1214, 2007/1533, and 2011/1854). Phylogenetic Inference Maximum likelihood (ML) trees were generated for each HIV1 subtype using PhyML with substitution model as suggested by the software jModeltest according to the Akaike information criterion (Guindon et al. 2005; Darriba et al. 2012). In order to account for systematic tip length biases due to DRMs, potential effects were investigated on the ML trees inferred from alignments either with or without potential DRMs. Potential DRMs were classified according to the Stanford database; Major HIV-1 Drug Resistance Mutations, available at: http://hivdb.stanford.edu/pages/download/ (last accessed January 15, 2013). 1) The homoplasy index was calculated for each individual nucleotide site using MacClade 4.06 (Maddison D and Maddison W 2003). Each site was classified to be part of a potential DRM codon or not. By comparing these two site classifications, it was possible to determine whether DRMs contributed significantly more to the homoplasy distribution in the tree. 2) To investigate the actual effect on the parameter of interest (the branch lengths), we compared the tip length distributions from ML trees inferred from alignments either with or without DRMs. Furthermore, the tip length distribution of those taxa with known transmitted DRMs (TDRs) was compared with taxa without TDRs. Translating BED Estimated Time since Infection to Genetic Distance The biomarker (BED) estimation of time since infection (Skar et al. 2013) was transformed into genetic distance by a molecular clock. The genetic region corresponding to the Swedish pol sequences was extracted and aligned from six previously studied patients (PIC1362, PIC38417, PIC71101, PIC83747, and PIC90770) with N = 27–121 sequences from 3 to 11 time points with a follow up of between 181 and 1,245 days from onset of acute HIV-1 infection symptoms (Liu et al. 2006). Putative recombinants (DQ853454, DQ853445) were identified and removed using the Phi-test in SplitsTree (Huson 1998). TDRs were excluded from the alignment. PhyML was used to construct ML trees for each patient. The substitution model used (GTR + I + G) was the same as for the construction of the ML trees for the Swedish pol data. In order to take patient variation into account, we used a linear mixed effects model to estimate the within-patient evolutionary rate in this pol fragment. An uncorrelated random effects model described the data better than a correlated model (P < 0.0001, AIC), and the fixed effect correlation was also very small (r = 0.042). The rate of evolution was 6.59 106 substitutions site1 day1, and the intercept 1.46 103 substitutions site1 (supplementary fig. S1, 2480 Supplementary Material online). Modeling was performed with R package lme4 (Bates et al. 2011). Confirmative BEAST (Drummond and Rambaut 2007) analyses (uncorrelated relaxed lognormal or strict clock) had very similar results (data not shown). Comparing Phylogenetic and BED Distances The BED-derived genetic distances were compared with the sequence-based phylogenetic distances by numerically estimating the overlap of the 95% credibility interval of the posterior and 95% confidence interval, respectively. The 95% credibility interval of the posterior BED estimate was based on the random effects in the time-continuous BED modeling, described in Skar et al. (2013), and the 95% confidence interval on branch lengths was estimated using ML optimizations using PAUP* (Swofford 2002). Significance was determined as less than 5% overlap. Simulation of Coalescent Genealogies In the case where a donor (D) transmits to two other individuals (I1 and I2 at times t1 and t2 , respectively), there are three possible coalescent genealogies. If the lineage terminating in D coalesces with the lineage terminating in I2 before t1 , then the coalescent genealogy must be consistent with the transmission tree. Therefore, we needed to calculate the probability that a coalescent event occurs in the interval ½t1 ,t2 . To do this we first divided ½t1 ,t2 into subintervals of length 1 day. For each day, td , in ½t1 ,t2 we approximated the probability of a coalescent event by first drawing 104 values of tx from Uð2 365,8 365Þ and 104 values of 2 from Uð0:16,0Þ and calculated the population size at td assuming 1 ¼ 1 and 1 ¼ 1:64. We assumed that over each day that the population size was constant such that the probability of a coales cent event occurring in that subinterval was 1 exp N1 . For each day, we calculated the average probability of a coalescent event by averaging the probability of coalescence for each of the sampled population sizes. The probability of no coalescent event in ½t1 ,t2 is then the product of the complement of the probability of a coalescent event occurring for all days in ½t1 ,t2 . If no coalescent event occurs in ½t1 ,t2 , then each of the possible coalescent genealogies is equally probable. Then, the probability of obtaining an incongruous coalescent genealogy is ð1 pÞ + p3 where p is the probability of no coalescence occurring in ½t1 ,t2 : The implementations of our model were done using R (R Development Core Team 2003), and we plan to include the model in a future comprehensive transmission inference software. Approximation of the Coalescent Rate for Small Sample Fractions The standard n-coalescent is an approximation to the full coalescent process that assumes that the probability of multiple coalescences per generation is very small. In practice, this equates to assuming that the population size is large or that the sample size is small. Our model, however, allows the population to become very small to account for the strong bottleneck at transmission. To understand the effect of violating Timing and Order of Transmission Events . doi:10.1093/molbev/msu179 this assumption, we need to consider the full coalescent process. The probability that k genes have k unique ancestors in the previous generation in a population of size N is given by k1 Y 1 i¼1 i , N which is approximated by k 1 1 , 2 N where the terms in the original product that correspond to multicoalescences are assumed to be ignorable if N is large with respect to k. In general, we are concerned with the situation where we have two taxa (a transmitted taxon and a reference taxon in the host). Supplementary Figure S2, Supplementary Material online, shows the discrepancy in the exact and approximate probabilities of observing k lineages in a population of size N in the previous generation given a sample of size k, that is, the probability of no coalescent event. In the most common case where k = 2 the approximation is exact as there is only one possible coalescence that can occur. In the case where k = 10 such as with a putative superspreader, the approximation underestimates the rate of coalescence leading to overestimated inferences (longer pretransmission interval and higher probability of node disordering). In general, most coalescences will occur when the population size is large enough such that the approximate rate of coalescence is a reasonable estimate of the true rate. Even in the pathological case where there are many transmissions right after infection, the magnitude of the bias is limited by the fact that lineages are constrained by biological reality to occur in the donor at the time of infection. Therefore, the effect of the small population size on the coalescence rate is ignorable for our purposes. Supplementary Material Supplementary figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals. org/). Acknowledgments This work was supported by National Institutes of Health (grant number R01AI087520), Vetenskapsrådet (fellowship 623-2011-1100), Deutsche Forschungsgemeinschaft (fellowship BU 2685/4-1), Vetenskapsrådet (grant number K2008-56X-09935-17-3), and EU projects SPREAD (QLK2CT-2001-01344), and CHAIN (FP7/2007–2013). References Alizon S, von Wyl V, Stadler T, Kouyos RD, Yerly S, Hirschel B, Boni J, Shah C, Klimkait T, Furrer H, et al. 2010. Phylogenetic approach reveals that virus genotype largely determines HIV set-point viral load. PLoS Pathog. 6:e1001123. Bates D, Maechler M, Bolker B. 2011. lme4: linear mixed-effects models using S4 classes. 0.999375-42 ed [Internet]. Available from: http://cran.r-project.org. Cohen MS, Gay CL. 2010. Treatment to prevent transmission of HIV-1. Clin Infect Dis. 50(Suppl. 3), S85–S95. MBE Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 9:772. Degnan JH, Rosenberg NA. 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2:e68. Dobbs T, Kennedy S, Pau CP, McDougal JS, Parekh BS. 2004. Performance characteristics of the immunoglobulin G-capture BED-enzyme immunoassay, an assay to detect recent human immunodeficiency virus type 1 seroconversion. J Clin Microbiol. 42:2623–2628. Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 7:214. Graw F, Leitner T, Ribeiro RM. 2012. Agent-based and phylogenetic analyses reveal how HIV-1 moves between risk groups: injecting drug users sustain the heterosexual epidemic in Latvia. Epidemics 4:104–116. Guindon S, Lethiec F, Duroux P, Gascuel O. 2005. PHYML Online—a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33:W557–W559. Halapi E, Leitner T, Jansson M, Scarlatti G, Orlandi P, Plebani A, Romiti L, Albert J, Wigzell H, Rossi P. 1997. Correlation between HIV sequence evolution, specific immune response and clinical outcome in vertically infected infants. AIDS 11:1709–1717. Hamers FF, Devaux I, Alix J, Nardone A. 2006. HIV/AIDS in Europe: trends and EU-wide priorities. Euro Surveill. 11:E061123.1. Huson DH. 1998. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14:68–73. Karlsson A, Bjorkman P, Bratt G, Ekvall H, Gisslen M, Sonnerborg A, Mild M, Albert J. 2012. Low prevalence of transmitted drug resistance in patients newly diagnosed with HIV-1 infection in Sweden 2003–2010. PLoS One 7:e33484. Katoh K, Toh H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 9:286–298. Kouyos RD, von Wyl V, Yerly S, Boni J, Rieder P, Joos B, Taffe P, Shah C, Burgisser P, Klimkait T, et al. 2011. Ambiguous nucleotide calls from population-based sequencing of HIV-1 are a marker for viral diversity and the age of infection. Clin Infect Dis. 52:532–539. Kouyos RD, von Wyl V, Yerly S, Boni J, Taffe P, Shah C, Burgisser P, Klimkait T, Weber R, Hirschel B, et al. 2010. Molecular epidemiology reveals long-term changes in HIV type 1 subtype B transmission in Switzerland. J Infect Dis. 201:1488–1497. Lee HY, Perelson AS, Park SC, Leitner T. 2008. Dynamic correlation between intrahost HIV-1 quasispecies evolution and disease progression. PLoS Comput Biol. 4:e1000240. Leigh Brown AJ. 1997. Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population. Proc Natl Acad Sci U S A. 94:1862–1865. Leitner T. 2002. The molecular epidemiology of human viruses. Boston: Kluwer Academic Publishers. Leitner T, Albert J. 1999. The molecular clock of HIV-1 unveiled through analysis of a known transmission history. Proc Natl Acad Sci U S A. 96:10752–10757. Leitner T, Albert J. 2000. Reconstruction of HIV-1 transmission chains for forensic purposes. AIDS Rev. 2:241–251. Leitner T, Escanilla D, Franzen C, Uhlen M, Albert J. 1996. Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci U S A. 93:10864–10869. Leitner T, Fitch WM. 1999. The phylogenetics of known transmission histories. In: Crandall KA, editor. The evolution of HIV. Baltimore (MD): Johns Hopkins University Press. Leitner T, Halapi E, Scarlatti G, Rossi P, Albert J, Fenyö EM, Uhlén M. 1993. Analysis of heterogeneous viral populations by direct DNA sequencing. BioTechniques 15:120–126. Lemey P, Derdelinckx I, Rambaut A, Van Laethem K, Dumont S, Vermeulen S, Van Wijngaerden E, Vandamme AM. 2005. Molecular footprint of drug-selective pressure in a human immunodeficiency virus transmission chain. J Virol. 79:11981–11989. Lindstrom A, Ohlis A, Huigen M, Nijhuis M, Berglund T, Bratt G, Sandstrom E, Albert J. 2006. HIV-1 transmission cluster with M41L 2481 Romero-Severson et al. . doi:10.1093/molbev/msu179 “singleton” mutation and decreased transmission of resistance in newly diagnosed Swedish homosexual men. Antivir Ther. 11: 1031–1039. Liu L, Pearl DK. 2007. Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol. 56:504–514. Liu Y, McNevin J, Cao J, Zhao H, Genowati I, Wong K, McLaughlin S, McSweyn MD, Diem K, Stevens CE, et al. 2006. Selection on the human immunodeficiency virus type 1 proteome following primary infection. J Virol. 80:9519–9529. Maddison D, Maddison W. 2003. MacClade 4: analysis of phylogeny and character evolution. 4.06 ed. Sunderland (MA): Sinauer Associates. Maljkovic Berry I, Ribeiro R, Kothari M, Athreya G, Daniels M, Lee HY, Bruno W, Leitner T. 2007. Unequal evolutionary rates in the human immunodeficiency virus type 1 (HIV-1) pandemic: the evolutionary rate of HIV-1 slows down when the epidemic rate increases. J Virol. 81:10625–10635. McNearney T, Hornickova Z, Markham R, Birdwell A, Arens M, Saah A, Ratner L. 1992. Relationship of human immunodeficiency virus type 1 sequence heterogeneity to stage of disease. Proc Natl Acad Sci U S A. 89:10247–10251. Messer PW, Neher RA. 2012. Estimating the strength of selective sweeps from deep population diversity data. Genetics 191:593–605. Murillo W, de Rivera IL, Parham L, Jovel E, Palou E, Karlsson AC, Albert J. 2010. Prevalence of drug resistance and importance of viral load measurements in Honduran HIV-infected patients failing antiretroviral treatment. HIV Med. 11:95–103. Nijhuis M, Boucher CA, Schipper P, Leitner T, Schuurman R, Albert J. 1998. Stochastic processes strongly influence HIV-1 evolution during suboptimal protease-inhibitor therapy. Proc Natl Acad Sci U S A. 95: 14441–14446. Nordborg M. 2001. Coalescent theory. In: Balding DJ, Bishop M, Cannings C, editors. Handbook of Statistical Genetics. New Jersey: John Wiley and Sons, Ltd. p. 843–877. Pamilo P, Nei M. 1988. Relationships between gene trees and species trees. Mol Biol Evol. 5:568–583. Parekh BS, Hanson DL, Hargrove J, Branson B, Green T, Dobbs T, Constantine N, Overbaugh J, McDougal JS. 2011. Determination of mean recency period for estimation of HIV type 1 incidence with the BED-capture EIA in persons infected with diverse subtypes. AIDS Res Hum Retroviruses. 27:265–273. Pennings PS, Kryazhimskiy S, Wakeley J. 2014. Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet. 10: e1004000. Perelson AS, Neuman AU, Markowitz M, Leonard JM, Ho DD. 1996. HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271:1582–1586. Pybus OG, Rambaut A. 2009. Evolutionary analysis of the dynamics of viral infectious disease. Nat Rev Genet. 10:540–550. R Development Core Team. 2003. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing. Rodrigo AG, Shpaer EG, Delwart EL, Iversen AK, Gallo MV, Brojatsch J, Hirsch MS, Walker BD, Mullins JI. 1999. Coalescent estimates of 2482 MBE HIV-1 generation time in vivo. Proc Natl Acad Sci U S A. 96: 2187–2191. Rosenberg NA, Tao R. 2008. Discordance of species trees with their most likely gene trees: the case of five taxa. Syst Biol. 57:131–140. Salazar-Gonzalez JF, Salazar MG, Keele BF, Learn GH, Giorgi EE, Li H, Decker JM, Wang S, Baalwa J, Kraus MH, et al. 2009. Genetic identity, biological phenotype, and evolutionary pathways of transmitted/ founder viruses in acute and early HIV-1 infection. J Exp Med. 206: 1273–1289. Scaduto DI, Brown JM, Haaland WC, Zwickl DJ, Hillis DM, Metzker ML. 2010. Source identification in two criminal cases using phylogenetic analysis of HIV-1 DNA sequences. Proc Natl Acad Sci U S A. 107: 21242–21247. Shankarappa R, Margolick JB, Gange SJ, Rodrigo AG, Upchurch D, Farzadegan H, Gupta P, Rinaldo CR, Learn GH, He X, et al. 1999. Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J Virol. 73: 10489–10502. Skar H, Albert J, Leitner T. 2013. Towards estimation of HIV-1 date of infection: a time-continuous IgG-model shows that seroconversion does not occur at the midpoint between negative and positive tests. PLoS One 8:e60906. Skar H, Axelsson M, Berggren I, Thalme A, Gyllensten K, Liitsola K, Brummer-Korvenkontio H, Kivela P, Spangberg E, Leitner T, et al. 2011. Dynamics of two separate but linked HIV-1 CRF01_AE outbreaks among injection drug users in Stockholm, Sweden, and Helsinki, Finland. J Virol. 85:510–518. Skar H, Borrego P, Wallstrom TC, Mild M, Marcelino JM, Barroso H, Taveira N, Leitner T, Albert J. 2010. HIV-2 genetic evolution in patients with advanced disease is faster than that in matched HIV-1 patients. J Virol. 84:7412–7415. Supervie V, Ndawinz JD, Lodi S, Costagliola D. 2014. The undiagnosed HIV epidemic in France and its implications for HIV screening strategies. AIDS. 28(12):1797–1804. Swofford DL. 2002. PAUP* Phylogenetic Analysis Using Parsimony (*and Other Methods). 4.0b10 ed. Sunderland (MA): Sinauer Associates. Volz EM, Ionides E, Romero-Severson EO, Brandt MG, Mokotoff E, Koopman JS. 2013. HIV-1 transmission during early infection in men who have sex with men: a phylodynamic analysis. PLoS Med. 10:e1001568. Wakeley J. 2009. Coalescent theory: an introduction. Greenwood Village (CO): Roberts and Company Publishers. Wolfs TFW, Zwart G, Bakker M, Goudsmit J. 1992. HIV-1 genomic RNA diversification following sexual parenteral virus transmission. Virology 189:103–110. Wolinsky SM, Korber BTM, Neumann AU, Daniels M, Kunstman KJ, Whetsell AJ, Furtado MR, Cao Y, Ho DD, Safrit JT, et al. 1996. Adaptive evolution of human immunodeficiency virus type1 during the natural course of infection. Science 272:537–542. Zhang LQ, MacKenzie P, Cleland A, Holmes EC, Brown AJ, Simmonds P. 1993. Selection for specific sequences in the external envelope protein of human immunodeficiency virus type 1 upon primary infection. J Virol. 67:3345–3356.
© Copyright 2026 Paperzz