Crop Species Interrelatedness: Become a detective for a day

The amount of sequence data, and the rate at which it is being made available, means that we need ever more powerful computational tools in order to analyse it. But do you know what actually goes on behind the scenes to create a phylogenetic tree? Here is an exercise that demonstrates how sequences are compared and analysed for genetic differences, and how a family history can be inferred to draw up a phylogenetic tree, using five common crop plants.

Created by Dr Emily Angiolini, based on ‘Bioinformatics with pen and paper’ by Cleopatra Kozlowski

1 hour lesson

A bit of background

Technological advances in recent years are such that it is now relatively quick (Figure 1), easy and cheap to determine DNA, RNA or protein sequences [1,2,3]. Think back to the Human Genome Project: this was an international effort with many labs contributing to the project over the course of more than a decade. The technology available today means that it is now possible for a single lab to produce the same amount of data within a week [2]!

[Figure 1: Sequencing output in kilobases per day per machine, plotted against year (1980 to 2010 and the future), for manual slab gel, automated slab gel, first-generation capillary, second-generation capillary, microwell pyrosequencers, short-read sequencers and single molecule sequencers. Adapted from Stratton M, Campbell PJ and Futreal PA (2009).]

It is all very well being able to churn out the sequences, but what do they actually mean? Does a particular DNA sequence code for a protein? What does that protein do within a cell? What effect does a small change in DNA sequence have on the protein’s structure and therefore its function? How can we determine the evolutionary history of a number of species, or how related they are to one another? This is where bioinformatics comes into play: we are able, for instance, to compare newly sequenced stretches of DNA to those that have been sequenced previously and for which we already know the function. If the sequences contain similar patterns or ‘motifs’ then perhaps the proteins encoded work in a similar way. Of course, to make life easier (and faster) this sort of work is usually done with the help of a super-powerful computer. However, in allowing the computer to do all the hard work we may begin to lose understanding of how the comparisons are done.

This activity is designed to help you understand how bioinformatics can analyse data, using just a pen and this paper - all the tables and diagrams you need are right here!

Evolution of recipes

How a DNA sequence evolves over generations through the accumulation of mutations can be considered analogous to a recipe being passed from one generation to the next. Each time a new technology is invented, or when the recipe is passed on through word of mouth or print, some element of the recipe is changed. The most modern version of the recipe may look similar to a relatively recent version, although the flavour may have subtly changed. However, when this most modern recipe is compared to the original recipe, the end result may vary so drastically that it does not even have the same approximate shape.

A good example of this is the pudding [4]. Puddings started out as meat-based foods, either encased in intestines much like sausages or cooked in a pot like a broth or porridge. The introduction of more cereals allowed for sweet puddings, and the household oven together with pudding cloths led to the baked or steamed puddings which closely resemble the modern Christmas pudding or Spotted Dick!

[Figure 2: Putative evolution of the British ‘pudding’, on a timeline from the 5th century to the 21st century. Labels: Pease pottage (cooked in a pot); Black pudding (meat sausage); White pudding (mainly cereal sausage); introduction of pudding cloth; Sweet (flour, nuts, sugar); Savoury (meat-based); Pease pudding (more solid); Beef steak and mutton pudding; Beef steak and kidney pudding; introduction of basins and steaming (e.g. plum/Christmas pudding); household baking oven; Sponge puddings; Cake-style puddings.]

Evolution of sequences

Each time a sequence is copied, for example from one generation to the next, mutations can occur in that sequence. Provided these mutations are not harmful to the individual, they are perpetuated through subsequent generations. The accumulation of mutations over time can be used to estimate the relationship between different species.

Classically, organisms would be compared by their physical appearance to determine their relationship. Problems can arise with the accuracy of suggested relationships, however, when two organisms evolve a similar appearance through different routes; for example, birds and insects both developed wings.

Studies comparing DNA sequences have told us that mutations occur very infrequently and at random locations, and are passed from parents to offspring. By assuming that all organisms derived from a common ancestor you can look at comparable sequences, for instance those which make the same protein, and determine how long ago they diverged from one another by aligning them and counting the number of mutations - the longer ago that they separated, the greater the number of mutations will be.

It is important to understand that different parts of DNA evolve at different rates. Regions of DNA which make proteins (coding regions) accumulate fewer mutations, because a mutation there could produce a defective protein that is detrimental to the organism, which is therefore less likely to survive long enough to reproduce and perpetuate the mutation.
Sequence Comparison

Table 2: Alignment of a 90 bp sequence of the atpB gene from five crop species

Crop          Sequence
Barley        TGCCGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACGGACGG
Wheat         TGACGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACGGACGG
Oat           TTCCGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGACGG
Rice          TGACGGTAAGCAAATTAATGTAACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGATGG
Oilseed Rape  TGACGTCAAGCAAATTAATGTGACTTGTGAAGTACAGCAATTATTAGGAAACAACCGAGTTAGAGCTGTAGCTATGAGCGCGACCGAGGG

Table 2 above shows the alignment of a partial atpB gene from five different crop plant species. atpB encodes the beta subunit of ATP synthase, the enzyme responsible for generating ATP (the energy source for cells), and as such is a highly conserved gene across many species [3].

Pairwise Comparison

The first step in determining the ancestry of these crop plants is to make comparisons between all possible paired combinations of species. Table 3 shows pairwise comparisons between Barley and the four other species, with differences or mutations highlighted in red.
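The pairwise comparison described above can also be sketched in a few lines of Python, as a way of checking a hand count. This is a minimal sketch, not part of the original activity: the sequences are transcribed from Table 2 with the spacing removed, and the names `SEQUENCES` and `count_differences` are illustrative choices.

```python
# Sequences transcribed from Table 2 (spacing removed); each is 90 bp long.
SEQUENCES = {
    "Barley": ("TGCCGATAAGCAAATTAATGTGACTTGTGA"
               "GGTACAACAATTATTAGGAAATAATCGAGT"
               "TAGAGCTGTAGCTATGAGTGCTACGGACGG"),
    "Wheat": ("TGACGATAAGCAAATTAATGTGACTTGTGA"
              "GGTACAACAATTATTAGGAAATAATCGAGT"
              "TAGAGCTGTAGCTATGAGTGCTACGGACGG"),
    "Oat": ("TTCCGATAAGCAAATTAATGTGACTTGTGA"
            "GGTACAACAATTATTAGGAAATAATCGAGT"
            "TAGAGCTGTAGCTATGAGTGCTACAGACGG"),
    "Rice": ("TGACGGTAAGCAAATTAATGTAACTTGTGA"
             "GGTACAACAATTATTAGGAAATAATCGAGT"
             "TAGAGCTGTAGCTATGAGTGCTACAGATGG"),
    "Oilseed Rape": ("TGACGTCAAGCAAATTAATGTGACTTGTGA"
                     "AGTACAGCAATTATTAGGAAACAACCGAGT"
                     "TAGAGCTGTAGCTATGAGCGCGACCGAGGG"),
}


def count_differences(seq_a, seq_b):
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Compare Barley (the top-most sequence in Table 3) with each other species.
barley = SEQUENCES["Barley"]
barley_row = {
    name: count_differences(barley, seq)
    for name, seq in SEQUENCES.items()
    if name != "Barley"
}
print(barley_row)
# {'Wheat': 1, 'Oat': 2, 'Rice': 5, 'Oilseed Rape': 11}
```

The same function can be called with any other pair of species to check the remaining comparisons once they have been done by hand.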
Continue to complete all pairwise comparisons using Tables 4 to 6 by highlighting with a coloured pen, or encircling, the nucleotides which are different from the top-most sequence in the table, i.e. for Table 4 compare Oat to Wheat, Rice to Wheat and Oilseed Rape to Wheat.

Table 3: Pairwise comparison of the Barley atpB sequence

Crop          Sequence
Barley        TGCCGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACGGACGG
Wheat         TGACGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACGGACGG
Oat           TTCCGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGACGG
Rice          TGACGGTAAGCAAATTAATGTAACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGATGG
Oilseed Rape  TGACGTCAAGCAAATTAATGTGACTTGTGAAGTACAGCAATTATTAGGAAACAACCGAGTTAGAGCTGTAGCTATGAGCGCGACCGAGGG

Table 4: Pairwise comparison of the Wheat atpB sequence

Crop          Sequence
Wheat         TGACGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACGGACGG
Oat           TTCCGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGACGG
Rice          TGACGGTAAGCAAATTAATGTAACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGATGG
Oilseed Rape  TGACGTCAAGCAAATTAATGTGACTTGTGAAGTACAGCAATTATTAGGAAACAACCGAGTTAGAGCTGTAGCTATGAGCGCGACCGAGGG

Table 5: Pairwise comparison of the Oat atpB sequence

Crop          Sequence
Oat           TTCCGATAAGCAAATTAATGTGACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGACGG
Rice          TGACGGTAAGCAAATTAATGTAACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGATGG
Oilseed Rape  TGACGTCAAGCAAATTAATGTGACTTGTGAAGTACAGCAATTATTAGGAAACAACCGAGTTAGAGCTGTAGCTATGAGCGCGACCGAGGG

Table 6: Pairwise comparison of the Rice atpB sequence

Crop          Sequence
Rice          TGACGGTAAGCAAATTAATGTAACTTGTGAGGTACAACAATTATTAGGAAATAATCGAGTTAGAGCTGTAGCTATGAGTGCTACAGATGG
Oilseed Rape  TGACGTCAAGCAAATTAATGTGACTTGTGAAGTACAGCAATTATTAGGAAACAACCGAGTTAGAGCTGTAGCTATGAGCGCGACCGAGGG

Complete Table 7 with the number of mutations between each crop species pair. For example, there is one (1) nucleotide which differs between Barley and Wheat, whereas a comparison between Barley and Oilseed Rape reveals 11 mutations. This indicates that Barley and Wheat are the most closely related of the five species.

Table 7: Number of mutations between crop species

              Barley  Wheat  Oat  Rice  Oilseed Rape
Barley        0       1      2    5     11
Wheat         1       0      __   __    __
Oat           2       __     0    __    __
Rice          5       __     __   0     __
Oilseed Rape  11      __     __   __    0

Proportional differences

You can now begin to populate Table 8 with the proportional difference in row 1. For example, for Barley and Wheat divide the number of different nucleotides (1) by the length of the sequence which you have compared (90), i.e. 1/90 = 0.0111 (to 4 decimal places). This gives an indication of the proportional distance between the two species.

Table 8: Proportional distances between crop species

Row  Comparison                              No. differences  Proportional difference
1    Barley and Wheat                        1                1/90 = 0.0111
2    Barley/Wheat and Oat                    __               __
3    Barley/Wheat/Oat and Rice               __               __
4    Barley/Wheat/Oat/Rice and Oilseed Rape  __               __

Barley and Wheat, the closest pair, are now treated as a single group descended from a common Barley/Wheat ancestor. You then need to determine the number of mutations between this Barley/Wheat ancestor and the remaining 3 species. The ancestral sequence is presumed to be the ‘average sequence’ of the two species, and whilst it is not physically determined here, it is possible to determine the proportional distance between the theoretical ancestor and each of the other crop species in turn.
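The proportional difference calculation described above is a single division, and can be sketched as follows. This is an illustrative sketch only: the function name `proportional_difference` and the constant `SEQUENCE_LENGTH` are assumptions made for the example, while the 90-nucleotide length and the rounding to 4 decimal places come from the worksheet.

```python
SEQUENCE_LENGTH = 90  # number of aligned nucleotides compared in Table 2


def proportional_difference(num_differences, length=SEQUENCE_LENGTH):
    """Divide the number of differing nucleotides by the compared length."""
    if length <= 0:
        raise ValueError("sequence length must be positive")
    return round(num_differences / length, 4)  # 4 decimal places, as in Table 8


# Row 1 of Table 8: Barley and Wheat differ at 1 of 90 positions.
print(proportional_difference(1))  # 0.0111
```

The count passed in does not have to be a whole number, so the same function also works for the averaged counts used for the theoretical ancestors.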
First you must calculate the number of mutations between the ancestral sequence and each of the other crop species, by taking an average of the number of mutations for the two species comprising the theoretical ancestor. For example, a comparison of Oat with Barley reveals 2 mutations, and with Wheat reveals 3 mutations. Therefore between Oat and the Barley/Wheat ancestor there are (2+3)/2 = 2.5 mutations.

Complete Table 9 in this way. Then complete Table 10 for the Barley/Wheat/Oat ancestor, not forgetting that your ancestor now consists of 3 species: to get the ancestral sequence differences you need to add up the mutations from each of the individual contributing sequences and divide by 3. Continue with Table 11 in the same way.

Table 9: Number of mutations between Barley/Wheat ancestor and other species

              Barley/Wheat  Oat            Rice  Oilseed Rape
Barley/Wheat  0             (2+3)/2 = 2.5  __    __

Table 10: Number of mutations between Barley/Wheat/Oat ancestor and other species

                  Barley/Wheat/Oat  Rice  Oilseed Rape
Barley/Wheat/Oat  0                 __    __

Table 11: Number of mutations between Barley/Wheat/Oat/Rice ancestor and Oilseed Rape

                       Barley/Wheat/Oat/Rice  Oilseed Rape
Barley/Wheat/Oat/Rice  0                      __

Now you may convert your values to proportional distances in the same way as before to complete Table 8.

Building the phylogenetic tree

Using the proportional distances that you have calculated in Table 8, you can now begin to construct the phylogenetic or evolutionary tree. First of all you need to connect Barley and Wheat with a trunk line whose length depends on the time it has taken for the two species to diverge from their common ancestor, as indicated by the proportional distance calculated in row 1 of Table 8. For the purpose of this exercise we will assume that it would take 1000 million years for all of the nucleotides in the sequence analysed to mutate.
So, for our Barley/Wheat ancestor’s sequence to diverge into the two separate species we know today, it would have taken:

0.0111 * 1000 million years = 11.1 million years

That is, Barley and Wheat diverged 11.1 million years ago (mya). Draw a line back which represents 11.1 million years on Figure 3. It may help later on if you include the proportional distances by writing them beside the trunk line when drawing your tree.

The next step is to work out how long ago Oat, Barley and Wheat diverged from a common ancestor. The way to calculate this is to add the proportional distance between Barley and Wheat (row 1 of Table 8) to the proportional distance between the Barley/Wheat ancestor and Oat (row 2 of Table 8), like this:

(0.0111 + 0.0278) * 1000 million years = 0.0389 * 1000 million years = 38.9 mya

Again, mark this with a trunk line on Figure 3 along with the proportional distance. Continue to draw up the phylogenetic tree in this way until you have estimated the divergence of all five species and their common ancestors.

Questions

There are a few questions that you might like to think about and which can help you to better understand the process of drawing a phylogenetic tree. Answers can be downloaded as a separate file from: (http://www.tgac.bbsrc.ac.uk).

1. Are your estimates of time since divergence from ancestors likely to be close to those published or to the actual times?
2. What could cause your estimates of time since divergence to be wildly different (hint: think about the length of sequences that you have compared today and the assumed rate of mutation)?
3. How would phylogenetic trees compare to one another if they were built using calculations from different DNA sequences (think about different lengths of sequence used for comparison or different regions such as within or outside of genes)?
4. What would you do if you had gaps in your aligned sequences for comparison due to insertions or deletions of nucleotides (as opposed to substitutions)?

References and More reading

1. Stratton M, Campbell PJ and Futreal PA (2009) Nature 458 (7239), 719-724
2. Linnarsson S (2010) Exp Cell Res 316, 1339-1343
3. Pedersen PL and Amzel LM (1993) J Biol Chem 268 (14), 9937-9940
4. ‘The Food Timeline’ Ed. Lynne Olver, accessed on 04/05/11 at: http://www.foodtimeline.org/foodpuddings.html

For an introduction to phylogenetics see: http://www.ncbi.nlm.nih.gov/About/primer/phylo.html and http://tinyurl.com/2wqp7nq

To find out more about a group of scientists who have attempted to pen down a more accurate tree of life visit: http://www.embl.de/aboutus/communication_outreach/publications/annual_report/AnnualReport05-06.pdf (page 166)

[Figure 3: A hypothetical phylogenetic tree showing interrelatedness between 5 common crop species. Drawing area with a time axis in million years ago, marked from 200 to 0 in steps of 20, and the species Oilseed Rape, Rice, Oat, Wheat and Barley listed at the 0 end.]
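To place the branch points on the time axis of Figure 3, the averaging and conversion steps from ‘Building the phylogenetic tree’ can be chained together. The sketch below is one possible way to do this, using only the values worked through in the text (1 mutation for Barley and Wheat, and an average of 2 and 3 for Oat); the names `average`, `divergence_times` and `MILLION_YEARS_FOR_FULL_CHANGE` are hypothetical, and the 1000 million year figure is the exercise's stated assumption, not a measured rate.

```python
SEQUENCE_LENGTH = 90                  # aligned nucleotides compared (Table 2)
MILLION_YEARS_FOR_FULL_CHANGE = 1000  # worksheet assumption for this exercise


def average(values):
    """Mean mutation count between one species and each ancestor member."""
    return sum(values) / len(values)


def divergence_times(step_differences):
    """Convert per-row mutation counts (Table 8 order) into cumulative mya."""
    times = []
    cumulative_distance = 0.0
    for differences in step_differences:
        # Proportional distance for this row, to 4 decimal places.
        cumulative_distance += round(differences / SEQUENCE_LENGTH, 4)
        times.append(round(cumulative_distance * MILLION_YEARS_FOR_FULL_CHANGE, 1))
    return times


steps = [
    1,                # row 1: Barley and Wheat
    average([2, 3]),  # row 2: Barley/Wheat ancestor and Oat
]
print(divergence_times(steps))  # [11.1, 38.9]
```

Rows 3 and 4 of Table 8 can be appended to `steps` in the same way once Tables 10 and 11 have been completed by hand.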