Phylogenetic analysis
• Selecting sequences
• Outgroup sequences
• Alignment
• Choice of method
• Example using one method

Three most important choices
• Which sequences to include
• Outgroup sequences
• Alignment

[Figure: trees of ingroup taxa T1 to T6 and outgroup taxa O1 to O4]

“Outgroup” sequences should be included
The best outgroup sequences are sequences clearly outside the group being studied, but not too far out. Multiple outgroup sequences should be chosen. The outgroup sequences are included in the data matrix just like the other sequences. They will be used to root the tree.

Methods of phylogenetic analysis
• Parsimony (Cladistics)
• Maximum likelihood
• Bayesian
• Genetic distance (Neighbor-joining, etc.)

Parsimony (Cladistics)
Willi Hennig. 1950. Grundzüge einer Theorie der phylogenetischen Systematik.
Willi Hennig. 1966. Phylogenetic systematics.
Evidence comes from characters.
Goal: build the most parsimonious tree.

Finding the most parsimonious tree
Goal: fewest evolutionary steps (optimality criterion)
• Fewest a.a. changes
• Fewest base changes
Many tree topologies are tested, and the best one is chosen.
The trees are unrooted; rooting the tree comes later.

Rooting the tree
The outgroup taxa are included in the data matrix just like the other taxa. Once the best tree is found, it is “rooted” along the branch connecting the outgroup and ingroup taxa.

[Figure: trees of taxa T1 to T6 and O1 to O4, and their strict consensus]

What to do in case of a tie: consensus
A “strict” consensus tree is one in which the branches not present on all trees are collapsed, resulting in polytomies.
A “50% majority rule” consensus tree is one in which the branches present on 50% or fewer of the trees are collapsed, resulting in polytomies.
Trees with many polytomies are said to be less resolved than trees with few or no polytomies.
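To make the parsimony optimality criterion concrete, here is a minimal Python sketch of Fitch's algorithm, one standard way to count the fewest base changes a given tree requires at each alignment column; the topology with the lowest total is the most parsimonious. The four-taxon alignment and the nested-tuple tree encoding are made up for illustration. Real programs test far more topologies, usually with heuristic searches rather than a hand-written list.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name (str) or a pair (left, right) of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one alignment column.

    Returns the candidate state set at this node and the minimum number of
    changes needed in the subtree below it."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_steps = fitch(tree[0], column)
    right_states, right_steps = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        # The two children can agree: no extra change is needed here.
        return shared, left_steps + right_steps
    # The children disagree: one change on one of the two branches.
    return left_states | right_states, left_steps + right_steps + 1

def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total number of base changes the tree requires, summed over all columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

# Invented four-taxon alignment.
alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}

# The three possible unrooted topologies for four taxa, each written with an arbitrary root.
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {label: parsimony_score(t, alignment) for label, t in trees.items()}
best = min(scores, key=scores.get)
print(scores)  # {'((A,B),(C,D))': 4, '((A,C),(B,D))': 6, '((A,D),(B,C))': 5}
print("Most parsimonious:", best)
```

Rerooting the winning tree, for example as ("A", ("B", ("C", "D"))), gives the same score of 4. The parsimony score does not depend on where the root is placed, which is why the search can work with unrooted trees and leave rooting on the outgroup for later.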
Why are Maximum Likelihood and Bayesian methods considered an improvement over parsimony?
+ They allow a model of molecular evolution to be specified.
• Not all changes from one base to another (or from one a.a. to another) are equally likely.
• Not all positions have the same probability of change.
- They require that the correct model be specified.

What is Maximum Likelihood (ML)?
Just like parsimony, ML examines lots of trees and picks the best one. However, the optimality criteria differ.
• Parsimony -- fewest changes.
• ML -- maximizes the probability of observing the data (aligned sequences), given a model of molecular evolution.

Models of molecular evolution
Substitution matrix
• For proteins, this is the (observed) probability of one amino acid changing to another.
• For DNA, it is the probability of one base changing to another.
Site-to-site variation in rate of change
• Some sites don’t vary.
• Among the sites that do vary, the rates of change differ.

Why is using a correct model of molecular evolution better than using parsimony?
Under some conditions, parsimony chooses the wrong tree (long-branch attraction).
Methods using a model are generally more precise and result in fewer exact ties.
• For example, a change between two chemically similar a.a.’s can be scored as “similarity”; under parsimony, all differences are simply “different”.
• Models usually choose a single best tree, whereas parsimony usually chooses a large set of equally parsimonious trees.
Branch length estimates are more accurate with a model.

What is Bayesian phylogenetic analysis?
Just like ML, it searches for the best trees that are consistent with both the model and the data.
Optimality criterion:
• Bayesian -- maximizes the probability of the tree, given the data (aligned sequences) and the model of molecular evolution.
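As a minimal sketch of the ML optimality criterion, assume the simplest DNA model, Jukes-Cantor (JC69), in which every base change is equally likely and all sites evolve at the same rate (exactly the assumptions that richer models relax). For two aligned sequences joined by a single branch of length t (expected substitutions per site), the code computes the log-probability of the data given t and searches for the t that maximizes it. The sequences are invented; real programs optimize many branch lengths and the topology together.

```python
import math

def jc69_transition_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of seeing the same base (same=True), or one
    particular different base (same=False), after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """log P(aligned pair | t, JC69), treating sites as independent."""
    total = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 = equilibrium frequency of the base at one end of the branch.
        total += math.log(0.25 * jc69_transition_prob(a == b, t))
    return total

def ml_branch_length(seq1: str, seq2: str, lo: float = 1e-8,
                     hi: float = 5.0, iters: int = 100) -> float:
    """Golden-section search for the t that maximizes the log-likelihood.

    Assumes the maximum lies inside [lo, hi]; for one pair of sequences the
    JC69 log-likelihood has a single peak in t."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if log_likelihood(seq1, seq2, c) > log_likelihood(seq1, seq2, d):
            b = d  # the peak lies to the left of d
        else:
            a = c  # the peak lies to the right of c
    return (a + b) / 2.0

def jc69_distance(seq1: str, seq2: str) -> float:
    """Closed-form JC69 estimate: d = -3/4 ln(1 - 4p/3), p = proportion of differing sites."""
    p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

seq1 = "ACGTACGTAC"  # invented example sequences
seq2 = "ACGTTCGTAA"  # differs from seq1 at 2 of 10 sites (p = 0.2)
t_ml = ml_branch_length(seq1, seq2)
t_jc = jc69_distance(seq1, seq2)
print(f"ML branch length: {t_ml:.4f}, closed form: {t_jc:.4f}")
```

For JC69 the maximum happens to have a closed form, so the numerical search should land on the same value (about 0.233 here, versus the raw proportion of differences p = 0.2). The estimate is larger than p because the model accounts for multiple substitutions at the same site, which a simple count of differences misses.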
Bayesian analysis is the only one of these methods that automatically provides confidence estimates for each node (posterior probabilities, similar to bootstrap values).

Example: Bayesian analysis of signal transduction proteins
• Using ProtTest to find out how the sequences are evolving
• Informing MrBayes of the model of molecular evolution
• Using MrBayes to get the phylogeny
• Making a figure

MrBayes doesn’t know when it has run long enough; you decide.
Target: average standard deviation of split frequencies < 0.01.

[Figure: two trees of taxa A to E]

What is Neighbor-joining (NJ)?
NJ is an algorithm for building a tree. There is no optimality criterion.
First, a matrix of distances between all pairs of sequences is computed.
• A substitution matrix is needed to do this.
Then, from among all possible pairs, the pair whose joining best minimizes the length of the tree is combined into a single node. This step is repeated on the reduced matrix until the tree is complete.

Neighbor-joining
NJ is very fast.
There is no optimality criterion.
• This means there is no way to assess its success.
• There is also no way to say whether a “best” tree is significantly better than a set of “next best” trees. (mt Eve)
The tree it chooses is not always the shortest. Distances are estimated from noisy data, and early mistakes in NJ can’t be revisited.

Large data sets
If you have over 50 sequences, or very long sequences (hundreds of proteins), ProtTest and MrBayes may take more than a couple of days to finish.
Parsimony is much faster.
• It allows node support (bootstrap values) to be calculated.
• It doesn’t require a model of molecular evolution.
• PAUP* can read nexus files.
NJ is faster still. Sometimes it is the only method that is fast enough.
• A default model of molecular evolution must be used.

DNA sequences should be used when sequences are highly similar
Use a very similar procedure, but use MrModelTest instead of ProtTest.

Summary
Three most important choices
• Which sequences to include
• Outgroup sequences
• Alignment
Choice of method - Bayesian
Example - look on Ned’s Computational Corner for more details.
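The neighbor-joining steps described above can be sketched in Python as a minimal version of Saitou and Nei's algorithm, assuming a symmetric distance matrix with at least three taxa; the internal node labels (u1, u2, ...) are hypothetical. At each step, with r(i) the row sum of distances for node i, the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j) is joined; this is the greedy shortest-tree choice, and a join is never revisited. The five-taxon matrix is a toy example chosen to be additive (it fits a tree exactly), so the branch lengths NJ recovers should reproduce every input distance.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (node, node, branch length)

def neighbor_joining(names: List[str], dist: List[List[float]]) -> List[Edge]:
    """Return the edges of an unrooted NJ tree built from a symmetric distance matrix."""
    nodes = list(names)
    d = [list(row) for row in dist]
    edges: List[Edge] = []
    count = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Join the pair with the smallest Q (ties: first pair in index order).
        _, i, j = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for a in range(n) for b in range(a + 1, n)
        )
        count += 1
        u = f"u{count}"
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((nodes[i], u, li))
        edges.append((nodes[j], u, d[i][j] - li))
        # Distances from the new node u to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [u]
    # Three nodes left: connect them to one central node.
    count += 1
    c = f"u{count}"
    edges.append((nodes[0], c, 0.5 * (d[0][1] + d[0][2] - d[1][2])))
    edges.append((nodes[1], c, 0.5 * (d[0][1] + d[1][2] - d[0][2])))
    edges.append((nodes[2], c, 0.5 * (d[0][2] + d[1][2] - d[0][1])))
    return edges

def path_length(edges: List[Edge], start: str, end: str) -> float:
    """Sum of branch lengths along the tree path between two nodes."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for x, y, w in edges:
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    stack: List[Tuple[str, Optional[str], float]] = [(start, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == end:
            return total
        for nxt, w in adj[node]:
            if nxt != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((nxt, node, total + w))
    raise ValueError("nodes are not connected")

# Toy additive distance matrix (hypothetical data).
names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
edges = neighbor_joining(names, D)
for x, y, w in edges:
    print(f"{x} - {y}: {w:g}")
```

With noisy, non-additive distances (the usual case with real sequences), the same greedy joins can produce a tree that is not the shortest, and because NJ has no optimality criterion there is no score with which to compare it against "next best" trees.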