Transmission tree reconstruction by augmentation of internal phylogeny nodes Matthew Hall Li Ka Shing Institute for Health Information and Discovery, University of Oxford February 2017 Matthew Hall (Oxford) Transmission tree reconstruction February 2017 1 / 29 The relationship of the phylogeny to the transmission tree Let T be a time-tree (rooted, with branch lengths in units of time). Let V be its node set of size n. Suppose the isolates at the tips of T come from a set of H of hosts. Initial assumptions: Complete sampling of the epidemic since the TMRCA No superinfection or reinfection Transmission is a complete bottleneck Matthew Hall (Oxford) Transmission tree reconstruction February 2017 2 / 29 The relationship of the phylogeny to the transmission tree The transmission tree N (a DAG whose nodes are the members of H, depicting which host infected which other) can be represented by a map d : V → H taking each node to a host (tips to the host they were sampled from). Visualised by collapsing the nodes in the preimage of each h ∈ H under d to a single node. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 3 / 29 The relationship of the phylogeny to the transmission tree The transmission tree N (a DAG whose nodes are the members of H, depicting which host infected which other) can be represented by a map d : V → H taking each node to a host (tips to the host they were sampled from). Visualised by collapsing the nodes in the preimage of each h ∈ H under d to a single node. Matthew Hall (Oxford) Transmission tree reconstruction A H F D B E G C I J February 2017 3 / 29 The simplest version Assume that the phylogeny and transmission tree coincide; internal nodes are transmission events. This implies no within-host diversity and necessitates no more than one tip per host. If n is internal with children nC1 and nC2 , then either d(n) = d(nC1 ) or d(n) = d(nC2 ). Trivially 2n−1 transmission trees for a fixed T . Matthew Hall (Oxford) Transmission tree reconstruction February 2017 4 / 29 Within-host diversity If within-host diversity is assumed then internal nodes are coalescences of two lineages within a host. The subgraph induced by the preimage of d for any host must be connected. An extra set of parameters q represent the infection times. Question: How many transmission trees for a fixed T ? (Depends on the topology.) With one tip per host? With ≥ 1 tip per host? (Sometimes 0.) Matthew Hall (Oxford) Transmission tree reconstruction February 2017 5 / 29 Simultaneous MCMC reconstruction of phylogeny and transmission tree In either case we get an (injective but not surjective) map z from the set of possible ds to the space of transmission trees. Thus an MCMC method that samples from the posterior distribution of phylogenies with internal node augmentation obeying either set of rules simultaneously samples from the posterior distribution of transmission trees. Not only a method for reconstructing N , but a population model (tree prior) for reconstruction of T that is more realistic for an outbreak than the standard unstructured coalescent models. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 6 / 29 Decomposition Let S be the sequence data and φ the various model parameters. Without within-host diversity: p(T , d, φ|S) = p(S|T )p(T , d|φ)p(φ) p(S) p(S|T ) is the standard phylogenetic likelihood and p(T , d|φ) the probability of observing the augmented tree under a transmission model. With within-host diversity: p(T , d, q, φ|S) = p(S|T )p(T |N , q, φ)p(N , q|φ)p(φ) p(S) p(N , q|φ) is the probability of the transmission tree and its timings as above; p(T |N , q, φ) is the probability of the within-host mini-phylogenies under a coalescent process. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 7 / 29 MCMC implementation Hall et al., 2015 implemented simultaneous reconstruction of both trees in BEAST, with MCMC proposals that respect the rules of node augmentation. Several other approaches (e.g. Didelot et al., 2014, Morelli et al., 2012; Ypma et al., 2013; Klinkenberg et al., 2017) with recent work on the incomplete sampling problem (Didelot et al., 2016; Lau et al., 2016). Matthew Hall (Oxford) i) i) ii) iii) ii) iv) iii) Exchange Subtree slide Wilson-Balding Transmission tree reconstruction Exchange Subtree slide 50% 50% Wilson-Balding 50% 50% February 2017 8 / 29 43617 tips Matthew Hall (Oxford) Transmission tree reconstruction February 2017 9 / 29 The BEEHIVE study NGS short-read sequence data acquired from samples taken from European (and one African) HIV cohort studies. Some cohorts go back to the early epidemic in the 1980s Current data from 3138 individuals Epidemiology: age, gender, date of first positive test, countries of origin and infection, risk group, ART dates, etc. Sequences from one time point only (with a few exceptions) Rather than making a consensus sequence from each host’s reads, we want to use everything. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 10 / 29 Phyloscanner: phylogenetic analysis of NGS pathogen data sequencing mapping to references Idea: align all short reads from all hosts to a reference genome and slide a window across the genome, building a phylogeny for the reads overlapping each window. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 11 / 29 Phyloscanner: phylogenetic analysis of NGS pathogen data Identical reads from a single host are merged but the duplicate counts kept as tip traits We use RAxML for reconstruction Tips are not associated with each other across different windows, but hosts are. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 12 / 29 The topological signal of transmission ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Once we have many tips from each host, transmission has a topological signal. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Direct transmission is suggested when the clade from the infectee is not monophyletic (Romero-Severson et al., 2016) but in general we only see the direction of transmission from the topology. ● Matthew Hall (Oxford) ● Starts to look like a parsimony problem. Transmission tree reconstruction February 2017 13 / 29 The topological signal of transmission ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Once we have many tips from each host, transmission has a topological signal. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Direct transmission is suggested when the clade from the infectee is not monophyletic (Romero-Severson et al., 2016) but in general we only see the direction of transmission from the topology. ● Matthew Hall (Oxford) ● Starts to look like a parsimony problem. Transmission tree reconstruction February 2017 13 / 29 The topological signal of transmission ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Once we have many tips from each host, transmission has a topological signal. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Direct transmission is suggested when the clade from the infectee is not monophyletic (Romero-Severson et al., 2016) but in general we only see the direction of transmission from the topology. ● Matthew Hall (Oxford) ● Starts to look like a parsimony problem. Transmission tree reconstruction February 2017 13 / 29 Challenges reconstructing transmission from this data Datasets: Enormous size Contamination present Coverage is uneven Epidemiology: Sampling is incomplete Multiple infections present Bottleneck at transmission may be wide (IDUs) 43617 tips Matthew Hall (Oxford) Transmission tree reconstruction February 2017 14 / 29 Transmission tree reconstruction using parsimony For a fixed tree, we aim to: Reconstruct hosts from those represented in the tips to internal nodes in the tree But also allow reconstruction to “a host outside the dataset’ as required by incomplete sampling Minimise the number of infection events amongst hosts in the dataset. . . . . . except, penalise reconstructions which suggest an unreasonable amount of genetic diversity stemming from a single infection event. Identify multiple infections and contaminations Matthew Hall (Oxford) Transmission tree reconstruction February 2017 15 / 29 Transmission tree reconstruction using parsimony Suppose the T has nodes V and we are reconstructing characters from the set of states S. The cost function c(p, q; i, j) determines the cost of transitioning from state i to state j along the branch from p to q (in the direction away from the root). We take a node- and edge- dependent c on a known tree; costs for transitions vary depending on the states involved and the branch on which they occur. The lowest cost reconstruction is found with the Sankhoff algorithm. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 16 / 29 Transmission tree reconstruction using parsimony If reads are taken from n hosts h1 , . . . , hn making up the study population, we use the hi as states along with an “unsampled” state u. Assume that the root of the tree was in the unsampled state (using an outgroup if required). Tips from outside the study population (the outgroup, other reference sequences, contaminants) are assigned u as a state. We are interested in minimising the cost of infections of members of the study population, but not hosts outside that population, so c(p, q; h, u) = 0 for all p, q, h. Reconstruction starts by putting a u on the root. (Otherwise it will always be cheap to reconstruct some hi to the root, plausible or no.) Matthew Hall (Oxford) Transmission tree reconstruction February 2017 17 / 29 Transmission tree reconstruction using parsimony Let l(p, i) be the sum of the branch lengths in the subtree rooted at p pruned so that only tips from i remain, or ∞ if there are no such tips. Then we take c(p, q; i, j) = 1 + kl(p, j) with k ∈ R+ a tunable parameter. This penalises reconstructions with unreasonable amounts of within-host diversity (right). Matthew Hall (Oxford) l1 ● ● ● q p ● ● l2 r ● o ● s p ● ● h u q r ● s o Left: C = 1 + k(l1 + l2 ) Right: C = 2 Transmission tree reconstruction February 2017 18 / 29 Transmission tree reconstruction using parsimony ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● c(p, q; h, u) = c(p, q, h, u) for all p, q, h. Above trees have equal C . Choose to break ties always towards h (for dense sampling), always towards u (conservative), or based on branch lengths. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 19 / 29 Parsimony for detection of contaminants Small numbers of contaminant reads are frequent, where a virus from the wrong host has been sequenced The parsimony reconstruction does double duty to detect these Find “multiple introductions” where the tips in one split have very low read counts, and ignore those tips Matthew Hall (Oxford) Transmission tree reconstruction February 2017 20 / 29 Augmented phylogeny "Transmission tree" ● ● ● ● Without the assumptions of a single infection event per host and complete sampling, the “transmission tree” has multiple nodes per host and unsampled nodes. B1 ● ● ● A ● ● ● ● B2 ● ● ● ● ● ● ● ● ● ● A ● ● ● ● ● U ● ● ● ● ● ● B ● ● Matthew Hall (Oxford) Transmission tree reconstruction February 2017 21 / 29 Example Known HIV transmission chain (Lemey et al., 2005, Vrancken et al., 2014). ? B A Reconstruction on RAxML trees for env and pol genes Host A B C D E F G H I K L True B? A? B C C A F B B E C Matthew Hall (Oxford) env U U B D C U U B B E C pol U A B D C A F B B E C E C H L D I F G K Transmission tree reconstruction February 2017 22 / 29 Full phyloscanner results Full output from phyloscanner is a separate phylogeny for the reads in each genome window We would like to use these in a manner similar to bootstrapping, to indicate support for topological relationships The phylogenies are not trivially comparable across windows as the tips are not the same The hosts are the same, so the trasmission trees are more comparable, but: Some hosts are absent from some windows due to sequencing problems One or more nodes for unsampled regions Potentially multiple nodes per host Question: Can we nonetheless give a summary or median transmission tree? Matthew Hall (Oxford) Transmission tree reconstruction February 2017 23 / 29 Classification of relationships For the time being, concentrate on classifying pairwise relationships between hosts on each window Contiguity: is the subgraph induced by the nodes from the pair, with perhaps some unsampled nodes, connected? Descent: Are all the nodes from one patient ancestral to those from the other? Mean patristic distance in the phylogeny Then count relationships across windows Matthew Hall (Oxford) Transmission tree reconstruction February 2017 24 / 29 Improved HIV cluster detection By setting a threshold on patristic distance we can refine the procedures for identifying HIV clusters with likely directionality. BEE0221-1 13 BEE0239-1 7 12 5 3 5 1 3 2 11 1 BEE0260-1 BEE0220-1 Matthew Hall (Oxford) Transmission tree reconstruction February 2017 25 / 29 Improved HIV cluster detection 5 9 BEE0247-1 1 BEE0018-1 BEE0117-1 BEE0529-1 8 3 BEE0259-1 BEE0042-1 2 2 13 8 BEE0406-1 6 4 BEE0062-1 1 5 5 13 1 4 4 7 1 BEE0096-1 10 BEE0735-1 3 11 9 3 5 BEE0426-1 BEE0809-1 2 2 BEE0109-1 3 4 1 1 13 9 BEE0287-1 1 6 BEE0212-1 1 BEE0843-1 BEE0738-1 2 12 1 6 6 BEE0095-1 9 1 6 4 1 2 5 3 9 5 7 BEE0726-1 6 7 6 6 2 BEE0197-1 14 BEE0728-1 3 1 32 BEE0198-1 BEE0098-1 1 BEE0527-1 4 16 BEE0840-1 1 3 9 7 3 2 3 2 5 11 BEE0442-1 BEE0331-1 3 BEE0240-1 1 BEE0409-1 1 19 BEE0826-1 1 1 3 BEE0036-1 6 325 8 1 14 BEE0321-1 1 11 11 6 6 2 2 6 8 BEE0201-1 1 BEE0386-1 BEE0563-1 1 2 BEE0037-1 7 8 2 BEE0023-1 8 BEE0427-1 9 7 BEE0112-1 9 8 BEE0885-1 7 3 3 2 4 BEE0351-1 2 7 7 BEE0564-1 3 10 6 BEE0854-1 12 BEE0865-1 1 2 BEE0205-1 6 3 BEE0886-1 BEE0381-1 BEE0241-1 11 BEE0234-1 3 6 7 4 483 6 BEE0102-1 4 3 3 4 2 5 3 BEE0243-1 2 BEE0100-1 12 BEE0043-1 BEE0174-1 BEE0347-1 7 5 5 15 6 3 2 BEE0255-1 7 BEE0115-16BEE0146-1 1 0 2 7 10 BEE0085-1 1 5BEE0428-1 2 5 2 3 5 2 18 4 1 3 BEE0193-1 6 BEE0086-1 4 4 1 1 BEE0200-1 9 2 BEE0233-1 1 1 7 BEE0110-1 2 BEE0153-1 11 BEE0142-1 BEE0227-1 7 BEE0265-1 1 20 6 BEE0248-1 13 1 1 10 8 BEE0175-1 BEE0196-1 BEE0741-1 6 14 3 BEE0327-1 BEE0215-1 7 BEE0804-1 1 5 BEE0394-1 BEE0093-1 2 1 24 BEE0254-1 1 6 BEE0795-1 13 4 3 6 BEE0577-1 1 6 1 3 BEE0815-1 BEE0323-1 7 BEE0811-1 2 1 2 BEE0821-1 5 BEE0437-1 4 BEE0015-1 14 6 BEE0009-1 1 BEE0464-1 BEE0474-1 1 3 BEE0767-1 4 BEE0108-1 BEE0057-1 2 2 6 7 1 BEE0014-1 14 10 16 BEE0032-1 9 9 8 3 1 BEE0803-1 4 BEE0141-1 9 BEE0122-1 9 1 2 6 1 BEE0114-1 6 16 BEE0567-1 10 BEE0542-1 BEE0107-1 BEE0341-1 BEE0541-1 2 BEE0019-1 1 4 20 1 BEE0161-1 5 3 BEE0154-1 5 1 6 BEE0736-1 BEE0290-1 BEE0547-1 3 BEE0301-1 2 BEE0353-1 BEE0696-1 BEE0031-1 6 BEE0237-1 BEE0758-1 1 6 7 BEE0181-1 BEE0662-1 2 2 1 BEE0794-1 4 BEE0555-1 4 2 BEE0592-1 18 14 BEE0789-1 BEE0808-1 2 BEE0305-1 5 2 BEE0709-1 6 BEE0273-1 5 7 2 BEE0053-1 BEE0791-1 1 2 BEE0723-1 1 10 9 BEE0260-1 7 BEE0139-1 16 6 BEE0213-1 BEE0322-1 BEE0148-1 BEE0017-1 BEE0012-1 BEE0799-1 13 BEE0470-1 6 BEE0532-1 7 BEE0206-1 7 6 11 BEE0573-1 12 1 2 2 7 1 14 9 BEE0130-1 14 6 2 1 2 BEE0223-1 BEE0395-1 BEE0164-1 BEE0168-1 8 BEE0368-1 BEE0807-1 8 13 1 1 1 10 7 6 BEE0160-1 27 BEE0345-1 1 BEE0454-1 3 3 BEE0131-1 5 21 BEE0534-1 BEE0092-1 BEE0097-1 3 BEE0545-1 4 BEE0222-1 BEE0231-1 BEE0385-1 2 5 BEE0657-1 1 5 12 BEE0230-1 BEE0710-1 11 7 BEE0851-1 BEE0219-1 1 BEE0065-1 3 7 5 18 BEE0572-1 3 2 BEE0225-1 BEE0473-1 BEE0877-1 7 BEE0797-1 3 BEE0162-1 4 BEE0539-1 BEE0398-1 2 4 BEE0330-1 BEE0798-1 4 BEE0120-1 3 5 BEE0780-1 BEE0121-1 3 BEE0557-1 3 5 6 BEE0538-1 BEE0584-1 1 12 BEE0182-1 12 BEE0543-1 BEE0284-1 BEE0686-1 BEE0560-1 1 4 BEE0867-1 4 BEE0049-1 BEE0669-1 BEE0335-1 1 2 BEE0253-1 4 7 10 24 2 BEE0599-1 15 BEE0147-1 BEE0034-1 Matthew Hall (Oxford) 12 10 BEE0552-1 BEE0178-1 4 11 10 BEE0220-1 BEE0849-1 8 8 7 5 5 7 BEE0258-1 1 4 BEE0733-1 4 6 11 BEE0465-1 BEE0221-1 3 2 BEE0734-1 BEE0725-1 BEE0214-1 2 15 4 BEE0040-1 2 12 BEE0059-1 2 BEE0050-1 1 3 7 BEE0016-1 4 BEE0285-1 BEE0144-1 14 6 BEE0574-1 5 Transmission tree reconstruction 7 BEE0263-1 13 BEE0318-1 February 2017 26 / 29 Conclusions The transmission tree can be viewed as an augmentation of the internal nodes of a phylogeny with host information, subject to various sets of rules. Considerable recent work in reconstructing it for smaller datasets, usually using MCMC. Phyloscanner allows reconstruction with big, NGS datasets. Work remains to be done for rigorous treatment of the output. Matthew Hall (Oxford) Transmission tree reconstruction February 2017 27 / 29 Matthew Hall (Oxford) Transmission tree reconstruction February 2017 28 / 29 Acknowledgements Oxford Christophe Fraser Chris Wymant Imperial Oliver Ratmann Edinburgh Andrew Rambaut Mark Woolhouse Matthew Hall (Oxford) Transmission tree reconstruction February 2017 29 / 29
© Copyright 2026 Paperzz