Question 1.
Below are four gene sequences, taken from four animals that are believed to share recent ancestry. The sequences come from a pseudogene: the evolutionary remnant of a gene that is now nonfunctional in a given species or group of related species. In this case, the gene is GULO (L-gulonolactone oxidase), which codes for the enzyme that catalyzes a key step in the synthesis of ascorbic acid (vitamin C). Along the way, some animals have lost the function of this gene (by random mutation) and must consume vitamin C in their diet.

#1 GGAGCTGAAGGCCATGCTGGAGGCCCACCCCGAGGTGGTGTCCCACTACCTGGTGGGGCTACGCTTCACCTGGAGG
#2 GGAGCTGAAGGCCATGCTGGAGGCCCACCCTGAGGTGGTGTCCCACTACCCGGTGGGGGTGCGCTTCACCCAGAGG
#3 GGAGCTGAAGGCCGTGCTGGAGGCCCACCCTGAGGTGGTGTCCCACTACCTGGTGGGGGTACGCTTCACCTGGAGG
#4 GGAGATGAAGGCCATGCTGGAGGCCCACCCTGAGGTGGTGTCCCACTAACCGGTGGGGGTGCGCTTCACCCAAGGG

1. Examine the four gene sequences and mark any differences among the sequences that you can find.
2. Discuss the following questions: Do you notice any specific pattern? What could this pattern mean regarding the ancestry/relatedness of the four species?
3. Make a hypothesis about the ancestry of these four species in the form of a phylogenetic tree.
4. Draw this tree with the relevant synapomorphies and explain why you drew it this way.
5. Why are pseudogenes and other noncoding DNA sequences commonly used by evolutionary biologists in constructing phylogenies?

Question 2
While noncoding DNA sequences are extremely useful in analyzing the shared ancestry of different species, protein-coding DNA sequences are also useful. However, the mutation and evolution of protein-coding sequences of DNA is more “constrained.” Why might this be?

Below is the amino acid sequence for a protein called SCML1, an enzyme necessary for male embryonic development and male fertility in mammals. It is encoded by a gene on the X chromosome. The amino acid sequence differs only slightly among five mammals, as shown below (“…/…” represents a stretch of identical amino acids, which is therefore omitted):

#1 MSNS…/…VIKT…/…DDNTI…/…EQLKTVDD…/…DALQN…/…RFHARSLWTNHKRYG…/…KKHSYRLVL…/…YETF
#2 MSDS…/…VVKT…/…DDNTI…/…EQLRTVND…/…DALQN…/…RFYARSLWTNRKRSG…/…KKHSYRPVL…/…YETF…
#3 MSNS…/…VVKT…/…DDDTI…/…EQLKTVND…/…DAMQN…/…RFHARFLWANRKRYG…/…KKHSYRLVL…/…YETF…
#4 MSNS…/…VVKT…/…DDDTI…/…EQLKTVND…/…DAMQN…/…RFHARSLWTNRKRYG…/…KKYSYRLVA…/…YESF…
#5 MSSS…/…VVKT…/…DDDTI…/…EQQKTVND…/…DAMQN…/…RFRARSLWTNRKRYG…/…KKYSYRLVA…/…YESF…

1. Examine the five amino acid sequences above and mark any differences among the sequences that you can find.
2. Use the differences in amino acid sequence to retrace the ancestry of these five mammals. Make a hypothesis in the form of a phylogenetic tree. Draw this tree on a separate piece of paper, along with your notes explaining it.

Question 3 – Hypothesis testing

Distances
Evolutionary distances are fundamental for the study of molecular evolution and are useful for phylogenetic reconstruction and estimation of divergence times. Distances can be estimated using DNA sequence differences between individuals, or allele frequency differences between populations. In this lab, you will examine DNA sequences and estimate distance measures that are derived from nucleotide polymorphisms between individuals. MEGA, the program we will use to draw phylogenetic trees, implements a number of distances based on mutation type, each of which has its own assumptions. In this lab, we will use the Kimura 2-parameter distance as a measure of divergence between sequences. Kimura’s two-parameter model (1980) allows transitions and transversions to occur at different rates. The model also assumes that the four nucleotide frequencies are the same and that rates of substitution do not vary among sites.
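The Kimura 2-parameter distance described above can be sketched in a few lines of Python. This is an illustration only, not MEGA's implementation, and MEGA may treat gaps and ambiguous bases differently. It uses the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of compared sites that differ by a transition and by a transversion. The function name k2p_distance and the two short example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions;
        # every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 10 bp sequences differing by a single transition (A <-> G).
print(k2p_distance("ACGTACGTAC", "ACGTGCGTAC"))
```

Because the correction grows faster than the raw proportion of differences, the estimated distance is always at least as large as the uncorrected proportion of differing sites.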
Tree-drawing
Neighbor-joining (NJ) uses distance measures that correct for multiple substitutions at the same sites, and the topology showing the smallest value of the sum of all branches (S) is chosen as an estimate of the correct tree. The number of possible topologies (unrooted trees) increases rapidly with the number of taxa, so it quickly becomes very difficult to examine all topologies. The NJ method produces an unrooted tree because it does not assume a constant rate of evolution. To root the tree, you therefore need to define an outgroup taxon. In the absence of outgroup taxa, the root is sometimes placed at the midpoint of the longest route connecting two taxa in the tree. This is referred to as mid-point rooting.

Measuring support for the branches
Note that in addition to the distance estimates, MEGA also computes the standard errors of the estimates using the analytical formulas for the distances and the bootstrap method. Bootstrapping is a statistical procedure that allows an assessment of the confidence in your tree, i.e. your hypothesis. In phylogenetic trees, bootstrap values do not correspond to p-values in statistical tests; that is, there is no particular significance in a bootstrap value of 95%. However, bootstrap values give a good indication of how well a particular group is supported by the sequence data. The principle of bootstrapping is simple: an initial tree is constructed based on the original data. Subsequently, the data are ‘resampled’; that is, the computer randomly draws base positions from all sequences and puts them into a new data set. It does so with replacement: after drawing a base position it puts it back into the data set, and may therefore draw it several times. This procedure is carried out until a randomized data set has been created that has the same length as your original sequences. After creating a specified number of such randomized data sets (usually 1000 sets), a tree is constructed for each (resulting in 1000 trees).
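The resampling step can be sketched in Python as follows. This is a minimal illustration rather than what MEGA does internally: it only builds the resampled data sets and does not build trees. The function name bootstrap_alignment and the seed value are hypothetical, and the three 10 bp sequences are those of the Sea Lion / Hippo / Orca example in this handout.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment maps each taxon name to its aligned sequence. Whole columns
    (base positions) are drawn with replacement, so every taxon keeps the
    same set of sampled positions.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")

    # Draw as many positions as the original alignment has, with replacement.
    positions = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][i] for i in positions)
        for name in names
    }


alignment = {
    "Sea Lion": "ATGCGCCACT",
    "Hippo": "ACGCGCCATT",
    "Orca": "ACGCACCATT",
}

rng = random.Random(42)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]

# Show the first replicate; some positions appear several times, others not at all.
for name, seq in replicates[0].items():
    print(name, seq)
```

Drawing whole columns rather than individual bases keeps the bases of all taxa at a given position together, which is what allows a tree to be built from each replicate.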
Now the computer checks how often each cluster in your original tree occurs in the trees built from these resampled data sets. For example, if the cluster of human and gorilla occurs in 560 trees of the resampled data, the bootstrap value would be 56 (for 56% of all resampled trees). If a cluster depends on only a few nucleotides, the bootstrap value will be low, because these few nucleotides will, by chance, not have been sampled in quite a few of the resampled data sets. If a cluster is well supported by a large proportion of the sequence, resampling will have little effect and the bootstrap value will be high.

For example, a 10 bp sequence (position 0 is the tenth position):

Position   1 2 3 4 5 6 7 8 9 0
Sea Lion   A T G C G C C A C T
Hippo      A C G C G C C A T T
Orca       A C G C A C C A T T

Resample 1
Position   1 1 4 0 6 3 8 9 5 5
Sea Lion   A A C T C G A C G G
Hippo      A A C T C G A T G G
Orca       A A C T C G A T A A

Resample 2
Position   6 7 7 4 3 2 1 0 9 3
Sea Lion   C C C C G T A T C G
Hippo      C C C C G C A T T G
Orca       C C C C G C A T T G

Resample 3
Position   4 3 6 8 8 9 0 1 9 3
Sea Lion   C G C A A C T A C G
Hippo      C G C A A T T A T G
Orca       C G C A A T T A T G

Objective: Reconstruct a phylogeny, examine the effects of different assumptions on tree topology, and use the tree to examine an alternate hypothesis on phenotypic trait evolution.

Methodology:
1. Align the sequences using GENEIOUS and trim the sequences to keep only the overlapping area.
2. Estimate distances and build trees.
3. Examine likely evolutionary events that explain the evolution of virulence in Vibrio.

Downloading sequences
1. Download the file “GyrB.fasta” from the class website and save it on the desktop.

Aligning sequences
2. Open GENEIOUS.
3. Create a folder in the Local directory: select Local, right click and select New Folder.
4. Open the folder you just created. In the File tab, select Import – From File; select the file with the sequences you just saved. Select the “Keep sequences separate” option.
5. Select all the sequences. In the Tools tab, click on Alignment, then click OK.
6. Open the newly created alignment. You will notice that the sequences are not the same length for all your species. Scroll through your alignment until you get complete coverage (i.e. all sequences are aligned). Select the part of the alignment where all sequences overlap. Right click on it and select Extract Region. Select Extract Region as an alignment. The dashes in the middle of the sequences are gaps representing insertions or deletions and are evolutionarily meaningful.
7. Select the newly extracted alignment. In the File tab select Export, then Selected Documents. Save the alignment as a Mega (*.meg) file on the desktop.

Creating phylogenetic trees
8. Open the program MEGA.
9. Under File, select Open a File/Session. Select your file.
10. Open the Sequence file by clicking on the TA icon.
11. Under the Data menu select Explore Active Data. Do the following to gain more knowledge about the sequences.
a. How many nucleotides are there in these sequences? (Look at the bottom of the screen.)
b. What is the transition/transversion ratio? Go to Statistics=>Nucleotide Pair Frequencies=>Directional. The domains are the codon positions.
12. Using MEGA, construct a Neighbor-Joining tree of the data and overlay the virulence phenotype. Explore the effects of different distances and tree-visualizing methods.
13. Using the full tree, explain the evolution of virulence in this system. Include images of relevant trees (you can copy and paste the tree from MEGA).

Figure 1. Temporal variation in the amount of Vibrio bacteria in oyster hemolymph (HL). Quantification of total Vibrio spp. (open circles and triangles) isolated from oyster hemolymph stemming from the four transplant groups (origin_site), i.e. DB_DB, DB_OW, OW_DB, OW_OW. Blue lines represent oysters assayed in site OW and black lines represent oysters assayed in site DB. Solid lines show oyster origin DB, while dashed lines show oyster origin OW.
Water temperature is shown on the secondary y-axis (full circles, dotted line). The area shaded in grey marks the spawning period.