Phylogenetic basis of systematics
• Linnaeus: Ordering principle is God.
• Darwin: Ordering principle is shared descent from common ancestors.
• Today, systematics is explicitly based on phylogeny.

Goals of Phylogenetic Analysis
• Given a multiple sequence alignment, determine the ancestral relationships among the species.
• We assume that residues in a column are homologous, and that all columns have the same history.
[Figure: tree with leaves Hu, Ch, Go, Gi drawn against a time axis]

Types of Phylogenetic Trees:
• 1. Cladogram:
  • shows the relationships between different organisms
  • branch lengths are arbitrary
• 2. Phylogram:
  • branch lengths represent evolutionary time and amount of change

Data
• Biomolecular sequences: DNA, RNA, amino acid, in a multiple alignment
• Molecular markers (e.g., SNPs)
• Morphology
• Gene order and content
These are “character data”: each character is a function mapping the set of taxa to distinct states (equivalence classes), with evolution modelled as a process that changes the state of a character.

DNA Sequence Evolution
[Figure: a tree of DNA sequences changing along its branches, with time axis marks at -3 mil yrs, -2 mil yrs, -1 mil yrs and today; sequence labels: AAGACTT, AAGGCCT, AGGGCAT, AGGGCAT, TAGCCCT, TAGCCCA, TGGACTT, TAGACTT, AGCACTT, AGCACAA, AGCGCTT]

Phylogenetic Analyses
• Step 1: Gather sequence data, and estimate the multiple alignment of the sequences.
• Step 2: Reconstruct trees on the data. (This can result in many trees.)
• Step 3: Apply consensus methods to the set of trees to figure out what is reliable.

Phylogeny Problem
[Figure: input sequences U = AGGGCAT, V = TAGCCCA, W = TAGACTT, X = TGCACAA, Y = TGCGCTT; output is a tree with leaves X, U, Y, V, W]

Types of Phylogenetic Methods
• Character-based
  • Parsimony
  • Likelihood
  Involve optimizing a criterion based on fit of the residues to the tree.
• Distance-based
  • Neighbor joining (NJ)
  • UPGMA
  Involve optimizing a criterion based on fit of a matrix of pairwise distances to the tree.

Parsimony: Select the tree that explains the data with the fewest substitutions.
http://study.com/academy/lesson/maximum-parsimony-likelihood-methods-in-phylogeny.html

Likelihood: Select the tree that has the highest probability of producing the observed data.

Distance: Select the tree that best recreates the observed pairwise distances.
https://www.youtube.com/watch?v=NRRErwFsIcw

Phylogenetic Tree Building
Two basic types:
• Gene/protein tree: represents the evolutionary history of genes/proteins
• Species tree: represents the evolutionary history of species based on characters (like protein sequences)
* Can root a tree using an outgroup: a known distant relative
[Figure: a rooted, binary tree and an unrooted, binary tree; leaf labels: ORFP MG01127.1, NC U01640.1, ORFP YDL020C, Scastellii, Skluyeri, orf6.4920.prot, AN0709.2, H.]

[Figure: the rooted tree with its parts labelled]
• Root (ancestral species)
• Nodes (common ancestor)
• Leaves (modern observations)
• Edges
• Branch lengths (“distance”) ~ time

Why is the structure of the tree important?
• Branching represents speciation into two new species.
This tree can also be denoted in text format:
[Figure: the same tree with its leaves numbered 8, 7, 6, 5, 4, 3, 2, 1]
( ( ( (3,4) , (5,6) ), 7 ), (1,2) ), 8

Building phylogenetic trees
1. Distance based methods
   a. Calculate evolutionary distances between sequences
   b. Build a tree based on those distances
2. Maximum Parsimony (character based method)
   a. Find the simplest tree, the one that explains the data with the fewest substitutions
3. Maximum Likelihood (probabilistic method based on explicit model)
   a. Find the tree under which the observed data are most probable, given an evolutionary model

Maximum Parsimony (character based method): worked example
Search all possible trees and find the one requiring the fewest substitutions.
Sequences: a = AAG, b = GGA, c = AAA, d = AGA
[Figure: candidate trees over the four leaves, first in the order a, b, c, d, then in the order a, c, b, d]
For the tree with leaf order a, c, b, d:
• What are the ancestral sequences at each node? The ancestor of a and c is AAA, the ancestor of b and d is AGA, and the root is AAA or AGA.
• How many base changes are required for this tree? 3 changes are required.
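The count in the worked parsimony example can be checked with a short script. What follows is a minimal Python sketch, not the method used on the slides: it assumes Fitch's counting rule for a single character on a rooted binary tree (the slides do not name an algorithm), encodes trees as nested tuples, and uses hypothetical names (`fitch`, `parsimony_score`, `TREES`).

```python
# Sketch only: Fitch's counting rule is an assumption; the slides do not name an algorithm.
SEQS = {"a": "AAG", "b": "GGA", "c": "AAA", "d": "AGA"}

# The three ways to pair four leaves, written as nested tuples (rooted binary trees).
TREES = {
    "((a,b),(c,d))": (("a", "b"), ("c", "d")),
    "((a,c),(b,d))": (("a", "c"), ("b", "d")),
    "((a,d),(b,c))": (("a", "d"), ("b", "c")),
}


def fitch(tree, states):
    """Return (candidate ancestral states, changes) for one alignment column."""
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree on a state: no extra change
        return shared, left_cost + right_cost
    # children disagree: keep both options and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the per-column change counts over the whole alignment."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )


SCORES = {label: parsimony_score(tree, SEQS) for label, tree in TREES.items()}
BEST = min(SCORES, key=SCORES.get)

if __name__ == "__main__":
    for label, score in SCORES.items():
        print(label, score)
    print("most parsimonious:", BEST)
```

Under these assumptions the tree pairing a with c and b with d scores 3, matching the slide, while the other two pairings each score 4.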
Maximum Parsimony (character based method)
• The score of a tree is the number of character changes it requires.
• MP aims to minimize the score of the tree.

How can you tell if your tree is significant?

Bootstrapping: how dependent is the tree on the dataset?
1. Randomly choose n objects from your dataset of n, with replacement
2. Rebuild the tree based on the resampled data
3. Repeat 1,000 to 10,000 times
4. How often are the same children joined?
If a given node is represented in <x trials, collapse the node for a ‘consensus’ tree.

Jackknifing: how dependent is the tree on the dataset?
1. Randomly choose k objects from your dataset of n, without replacement
2. Rebuild the tree based on the subset of the data
3. Repeat 1,000 to 10,000 times
4. How often are the same children joined?

[Figure: Maximum Likelihood tree showing Bayesian Inference/Maximum Parsimony/Maximum Likelihood support value at each node; leaf labels: ORFP MG01127.1, NC U01640.1, ORFP YDL020C, Scastellii, Skluyeri, orf6.4920.prot, AN0709.2, H.; node support values shown: 70, 100, 80, 95, 100]
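The bootstrapping steps above can be sketched in Python. This is an illustration under several assumptions, not the procedure behind the figure: the "objects" being resampled are taken to be alignment columns, UPGMA on Hamming distances is swapped in as a simple tree builder, the five sequences U to Y are read from the Phylogeny Problem slide (the label-to-sequence pairing is an assumption), and the names `upgma_clades`, `bootstrap_support` and `consensus_clades`, as well as the default threshold, are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

# Assumed pairing of labels and sequences from the Phylogeny Problem slide.
SEQS = {
    "U": "AGGGCAT",
    "V": "TAGCCCA",
    "W": "TAGACTT",
    "X": "TGCACAA",
    "Y": "TGCGCTT",
}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def upgma_clades(seqs):
    """Build a UPGMA tree and return its non-trivial clades as frozensets."""
    clusters = [frozenset([name]) for name in sorted(seqs)]

    def dist(c1, c2):
        # UPGMA distance: mean pairwise distance between the two clusters
        total = sum(hamming(seqs[a], seqs[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clades = set()
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        if len(clusters) > 1:  # skip the clade that contains every taxon
            clades.add(merged)
    return clades


def bootstrap_support(seqs, n_reps=1000, seed=0):
    """Fraction of bootstrap replicates in which each clade is recovered."""
    rng = random.Random(seed)
    n_cols = len(next(iter(seqs.values())))
    counts = Counter()
    for _ in range(n_reps):
        # Step 1: draw n columns from n, with replacement
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = {k: "".join(s[i] for i in cols) for k, s in seqs.items()}
        # Step 2: rebuild the tree and record which groups were joined
        counts.update(upgma_clades(resampled))
    return {clade: n / n_reps for clade, n in counts.items()}


def consensus_clades(support, threshold=0.5):
    """Keep clades seen in at least `threshold` of replicates (x is hypothetical)."""
    return {clade for clade, value in support.items() if value >= threshold}


if __name__ == "__main__":
    support = bootstrap_support(SEQS)
    for clade, value in sorted(support.items(), key=lambda kv: -kv[1]):
        print(sorted(clade), round(value, 3))
```

A jackknife variant would differ only in step 1: draw k distinct columns without replacement, for example with `rng.sample(range(n_cols), k)`.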