CHAPTER 13 PLANT REPRODUCTIVE BIOLOGY

EXERCISES
1. Examine specimens of two species of plants plus any putative hybrids between them. (a) Study both vegetative and floral characters, from original observations or using a manual of the area, and note which diagnostic features distinguish the two species. (b) Decide upon which characters to measure in the specimens available. (c) Record 10 to 25 measurements of each of the parameters chosen. Compare these by preparing graphs in order to recognize discontinuities (or lack thereof) among the three taxa.
2. Locate a population of a composite (Asteraceae) species that has both disk and ray flowers. Observe insect visitors (potential pollinators) in each of two subsets of plants (or inflorescences): one undisturbed and another with all ray flowers removed. Count the number and type of visitors over a time period (e.g., 10 to 30 minutes) and record.
3. If material is available, observe ultraviolet light-sensitive regions in the perianth by placing a flower into a jar saturated with ammonium vapors. Bees can detect these UV-reflective regions of the flower, enabling them to find flowers and orient to pollen or nectar more efficiently.
4. Fix the styles of a species of flowering plant in 70% alcohol. Remove a style and place it in drops of aniline blue on a microscope slide, covered by a cover slip. If the style is small enough, it may be “squashed” by applying firm pressure on the cover slip (using, e.g., a cork). Observe under fluorescence microscopy. Pollen tubes regularly deposit callose, which differentially picks up the aniline blue stain. This method allows for detection of pollen tube growth and can be used to test whether self-incompatibility is occurring.
5. If time permits, select a plant species and perform the crossing and caging experiments described in the text. These techniques are used to test the potential and degree of self-pollination versus cross-pollination.
6. Peruse journal articles on plant systematics, e.g., American Journal of Botany, Systematic Botany, or International Journal of Plant Sciences, and note those that describe aspects of reproductive biology in relation to systematic studies. Identify the techniques used and the problems addressed.

UNIT III SYSTEMATIC EVIDENCE AND DESCRIPTIVE TERMINOLOGY

14 PLANT MOLECULAR SYSTEMATICS

ACQUISITION OF MOLECULAR DATA
DNA SEQUENCE DATA
    Polymerase Chain Reaction
    DNA Sequencing Reaction
    Types of DNA Sequence Data
    Analysis of DNA Sequence Data
RESTRICTION SITE ANALYSIS (RFLPs)
ALLOZYMES
MICROSATELLITE DNA
RANDOM AMPLIFIED POLYMORPHIC DNA (RAPDs)
AMPLIFIED FRAGMENT LENGTH POLYMORPHISM (AFLPs)
REVIEW QUESTIONS
EXERCISES
REFERENCES FOR FURTHER STUDY

© 2010 Elsevier Inc. All rights reserved. doi: 10.1016/B978-0-12-374380-0.00014-3

Molecular systematics encompasses a series of approaches in which phylogenetic relationships are inferred using information from macromolecules of the organisms under study. Specifically, the types of molecular data acquired include that from DNA sequences, DNA restriction sites, allozymes, microsatellites, RAPDs, and AFLPs. (The use of data from other, generally smaller molecules, such as secondary compounds in plants, is usually relegated to the field of “chemosystematics” and will not be reviewed here.)

A revolution in inferring the phylogenetic relationships of life is occurring with the use of molecular data. The following is a review of the types of data, methods of acquisition, and methods of analysis of molecular systematics.

ACQUISITION OF MOLECULAR DATA
Plant samples from which DNA is to be isolated may be acquired by various means. It is vital to always collect a proper voucher specimen, properly mounted and accessioned in an accredited herbarium, to serve as documentation for any molecular systematic study (see Chapter 17). Live samples may be collected and immediately subjected to chemical processing, e.g., for allozyme analysis (see later discussion). For many DNA methods, pieces of leaves (from which chloroplast, mitochondrial, and nuclear DNA can be isolated) are removed from the live plant and immediately dried, typically in a container of silica gel. Alternatively, plant samples may be frozen or placed in concentrated extraction buffer. With any of these procedures, DNA is usually preserved intact. Usable DNA is often successfully isolated from dried herbarium sheets, attesting to the “toughness” of the molecule.

DNA SEQUENCE DATA
Perhaps the most important method for inferring phylogenetic relationships of life is that of acquiring DNA sequences. DNA sequence data basically refers to the sequence of nucleotides (adenine = A, cytosine = C, guanine = G, or thymine = T; Figure 14.1) in a particular region of the DNA of a given taxon. Comparisons of homologous regions of DNA among the taxa under study yield the characters and character states that are used to infer relationships in phylogenetic analyses.

The first step of acquiring DNA sequence data is to identify a particular region of DNA to be compared between species. Much prior research goes into identifying these regions and determining their efficacy in phylogenetic analysis.

FIGURE 14.1 Molecular structure of the four DNA nucleotides. Adenine and guanine are chemically similar purines; cytosine and thymine are chemically similar pyrimidines.

POLYMERASE CHAIN REACTION
After a gene sequence of interest is identified, the DNA from a given plant sample is first isolated and purified by various chemical procedures. Following this, the DNA sequences of interest are amplified using the polymerase chain reaction (or PCR). The invention of this technology was crucial to modern DNA sequencing, as it permitted rapid and efficient DNA amplification, the replication of thousands of copies of DNA.

FIGURE 14.2 Polymerase chain reaction, using cycle sequencing to produce multiple copies of a stretch of DNA. [Panel labels: sample DNA; solution heated, DNA denatures; temperature lowered, DNA renatures; primers anneal to conserved regions; free nucleotides (catalyzed by DNA polymerase) bind to primers; DNA strands replicated; repeat cycle.]

The polymerase chain reaction works as follows (see Figure 14.2). Prior research establishes the occurrence of relatively short regions of DNA that flank (occur at each end of) the gene or DNA sequence of interest and that are both unique (not occurring elsewhere in the genome) and conserved (i.e., invariable) in all taxa to be investigated.
These short, conserved, flanking regions are used as a template for the synthesis of multiple, complementary copies, known as primers. Primers ideally are constructed such that they do not bind with one another. In the polymerase chain reaction, a solution is prepared, made up of the isolated and purified DNA of a sample; multiple copies of primers; free nucleotides; DNA polymerase molecules (typically Taq polymerase, which can tolerate heat); and buffer and salts. This solution is heated to a point at which the sample DNA denatures, whereby the two strands of DNA separate from one another. Once the sample DNA denatures, the primers in solution may bind with the corresponding, complementary DNA of the sample (Figure 14.2). Following binding of the primer to the sample DNA, individual nucleotides in solution attach to the 3’ end of the primer, with the sample DNA acting as a template; DNA polymerase catalyzes this reaction. A second primer, at the opposite end of the DNA sequence of interest, is used for the complementary, denatured DNA strand. Thus, the two denatured strands of DNA are replicated. After replication, the solution is cooled to allow for annealing of the replicated DNA with the complementary DNA single strands. This is followed by heating to the point of DNA denaturation, and repeating the process. A typical PCR reaction can produce more than a million copies of DNA in a matter of hours.

DNA SEQUENCING REACTION
After DNA is replicated, it is sequenced. The most common sequencing technology involves a machine that reads fluorescent dyes with a laser detector. The production of dye-labeled DNA is very similar to DNA replication using the PCR. The replicated DNA is placed into solution with DNA polymerase, primers, free nucleotides, and a small concentration of synthesized compounds called dideoxynucleotides (discussed later) that are each attached to a different type of fluorescent dye.
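Because the sequencing reaction reuses the same heating and priming cycle, the amplification step of the PCR is worth making concrete first. The following is a minimal Python sketch under stated assumptions, not a laboratory protocol: the toy template and primer sequences are invented for illustration, the function names (`reverse_complement`, `amplified_region`, `copies_after`) are hypothetical, and the `efficiency` parameter is an assumption standing in for the fact that real reactions do not double perfectly in every cycle.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def amplified_region(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the stretch of `template` bounded by the two primer sites.

    `template` is one strand written 5' to 3'. The forward primer has the
    same sequence as part of this strand (it anneals to the complementary
    strand). The reverse primer anneals to this strand, so its binding site
    here reads as the reverse complement of the primer.
    """
    start = template.find(forward)
    if start == -1:
        return None  # forward primer site absent: no amplification
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None  # reverse primer site absent downstream
    return template[start:end + len(site)]


def copies_after(cycles: int, starting_copies: int = 1,
                 efficiency: float = 1.0) -> float:
    """Idealized copy number: each cycle multiplies copies by (1 + efficiency)."""
    return starting_copies * (1.0 + efficiency) ** cycles


# Invented toy data: flank + forward site + target + reverse site + flank.
template = "GGCCATCGGTTACGTACGTACAGCGATTCCGG"
forward_primer = "ATCGGTT"
reverse_primer = "AATCGCT"

product = amplified_region(template, forward_primer, reverse_primer)
print(product)
print(copies_after(20))
```

Under the assumption of perfect doubling, 20 cycles starting from a single template already exceed one million copies, which is in line with the figure quoted above; with lower efficiency more cycles would be needed.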
As in the polymerase chain reaction, the sample DNA is heated until the double helix unwinds and the two complementary DNA chains separate (Figure 14.3). At this point, a primer attaches to a conserved region of one of the strands of DNA, and free nucleotides in solution join to the 3’ end of the primer, using the sample DNA as a template and catalyzed by DNA polymerase (Figure 14.3). Thus, a replicated copy of the DNA strand begins to form. However, at some point a dideoxynucleotide joins to the new strand instead of a nucleotide doing so. The dideoxynucleotides (dideoxyadenine, dideoxycytosine, dideoxyguanine, and dideoxythymine) resemble the four nucleotides, except that they lack the hydroxyl group at the 3’ position. Once a dideoxynucleotide is joined to the chain, absence of this hydroxyl group prevents DNA polymerase from adding any further nucleotide. Thus, with the addition of a dideoxynucleotide, synthesis of the new DNA strand terminates (Figure 14.3).

The ratio of dideoxynucleotides to nucleotides in the reaction mixture is carefully set and is such that the concentration of dideoxynucleotides is always much smaller than that of normal nucleotides. Thus, the dideoxynucleotides may terminate the new DNA strand at any point along the gene being replicated. For example, some of the new DNA strands will be the length of the primer plus one additional base (in this case the dideoxynucleotide); some will be the primer length plus two bases (a nucleotide plus the terminal dideoxynucleotide); some will be the primer length plus three bases (two nucleotides plus the terminal dideoxynucleotide); etc. There are many thousands, if not millions, of copies of the sample DNA. Thus, there will be an equivalent number of newly replicated DNA strands, of all different lengths.

The final step of DNA sequencing entails subjecting the DNA strands to electrophoresis, in which the DNA is loaded onto a flat gel plate or into a thin capillary subjected to an electric current.
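The ladder of terminated strands that electrophoresis has to sort can be sketched in a few lines of Python. This is a toy simulation under stated assumptions rather than a model of any real instrument: the names `terminated_fragments` and `read_by_length` are hypothetical, the primer and extension are loosely modeled on the strand drawn in Figure 14.3, the per-base termination probability `p_dideoxy` is set far higher than a real reaction mixture would use so that a small run covers every length, and sorting by length stands in for separation by electrophoresis.

```python
import random
from typing import Dict, List


def terminated_fragments(primer: str, extension: str, n_strands: int,
                         p_dideoxy: float, rng: random.Random) -> List[str]:
    """Simulate chain termination for `n_strands` copies of one template.

    At each added base there is a chance `p_dideoxy` that the base is a
    dye-labeled dideoxynucleotide, which ends that strand. Strands that
    never incorporate a dideoxynucleotide carry no dye and are not recorded.
    """
    fragments = []
    for _ in range(n_strands):
        for length in range(1, len(extension) + 1):
            if rng.random() < p_dideoxy:
                fragments.append(primer + extension[:length])
                break
    return fragments


def read_by_length(fragments: List[str]) -> str:
    """Order fragments from shortest to longest and read each terminal base.

    All fragments of one length end in the same base, so one terminal base
    (one dye color) is recorded per length.
    """
    terminal_base: Dict[int, str] = {}
    for fragment in fragments:
        terminal_base[len(fragment)] = fragment[-1]
    return "".join(terminal_base[n] for n in sorted(terminal_base))


rng = random.Random(0)        # fixed seed so the run is repeatable
primer = "ATCGGTT"
extension = "AGCGT"           # bases added after the primer
ladder = terminated_fragments(primer, extension, 2000, 0.2, rng)

print(sorted(set(ladder), key=len))
print(read_by_length(ladder))
```

The design point is that no single strand reveals the sequence; it is the population of strands of every length, each tagged at its last base, that allows the order of nucleotides to be read off.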
Because the phosphate components of nucleic acids give DNA a net negative charge, the molecules are attracted to the positive pole. The DNA strands migrate through the medium over time, the amount of migration inversely proportional to the molecular weight of the strand (i.e., lighter strands migrate further). Each strand is terminated with a dideoxynucleotide to which a fluorescent dye is attached; each of the four dideoxynucleotides has a different type of fluorescent dye, which (upon excitation) emits light of a different wavelength. Thus, as the multiple copies of DNA of one particular length migrate along the gel or capillary, the wavelength of emitted light is detected and recorded as a peak, which measures the light intensity. Because a given emitted wavelength (“color”) is determined by one of the four dideoxynucleotides, the corresponding nucleotide can be inferred and its position identified by the timing of migration of the DNA strands. In this way, the sequence of nucleotides of the DNA strand can be inferred (Figure 14.3).

TYPES OF DNA SEQUENCE DATA
For plants, the three basic types of DNA sequence data stem from the three major sources of DNA: nuclear (nDNA), chloroplast (cpDNA), and mitochondrial (mtDNA). Nuclear DNA is, of course, transmitted from parent(s) to offspring by nuclear division (meiosis or mitosis) via sexual or asexual (somatic) reproduction. Chloroplasts and mitochondria, however, replicate and divide independently of the nucleus and may be transmitted to offspring in a different fashion.
For example, in angiosperms these organelles are usually (with some exceptions) sexually transmitted only maternally, being retained in the egg but excluded in sperm cells. (In conifers, interestingly, chloroplast DNA is transmitted paternally, not maternally.)

The use of sequence data from the DNA of chloroplasts has proven to be very useful in elucidating both lower and higher level relationships. The basic structure of chloroplast DNA for a flowering plant, with coding genes indicated, is shown in Figure 14.4. Like most organelle and prokaryotic DNA, chloroplast DNA is circular. Curiously, most angiosperms have a region of chloroplast DNA known as the inverted repeat, which is the mirror image of a corresponding region elsewhere on the molecule (Figure 14.4). Some of the more commonly sequenced chloroplast DNA genes are listed in Table 14.1, although many more have been utilized.

FIGURE 14.3 DNA sequencing reactions. A* = dideoxyadenine; C* = dideoxycytosine; G* = dideoxyguanine; T* = dideoxythymine. [Panel labels: sample DNA (many copies); add primer molecules, nucleotides, DNA polymerase, dideoxynucleotides; solution heated, DNA denatures; a single primer anneals to a conserved region of one strand of sample DNA; first nucleotide (catalyzed by DNA polymerase) binds to primer strand; second nucleotide binds to primer strand; at random, dideoxynucleotide (C* in this case) binds to primer strand, terminating reaction; new DNA strands denatured from sample DNA; after numerous reactions new DNA strands separated by electrophoresis; ELECTROPHORESIS: electric current applied, DNA strands migrate to (+) pole (inversely to molecular weight); DNA strands scanned during migration, peaks of wavelengths correspond to fluorescent dyes attached to specific dideoxynucleotides.]

FIGURE 14.4 Molecular structure of the chloroplast DNA of tobacco (Nicotiana tabacum). Note large single-copy region (LSC), small single-copy region (SSC), and the two inverted repeats (IRA and IRB). Also note location of atpB, rbcL, matK, and ndhF genes (see Table 14.1). (Redrawn from Wakasugi, T., M. Sugita, T. Tsudzuki, and M. Sugiura. 1998. Updated gene map of tobacco chloroplast DNA. Plant Molecular Biology Reporter 16: 231-241, by permission.)

In addition to coding genes of chloroplast DNA, the sequences between genes, known as intergenic spacers, may be used in phylogenetic analyses. Intergenic spacer regions often show a higher degree of variability than the coding genes, making the former more useful for analyses at a lower taxonomic level, such as species or infraspecies. A list of some commonly used chloroplast intergenic spacers is given in Table 14.2.

Nuclear DNA sequencing has been used to a lesser degree in plant systematics. Some nuclear genes such as alcohol dehydrogenase (Adh), which has traditionally been used in allozyme studies, are becoming more frequently used.
One of the more useful types of nuclear DNA sequences has been the internal transcribed spacer (ITS) region, which is present in multiple copies (as opposed to the single copies found in most protein-coding genes). The ITS region lies between the 18S and 26S nuclear ribosomal DNA (nrDNA); the ITS region is divided into two subregions, ITS1 and ITS2, separated by the 5.8S nrDNA (Figure 14.5A). ITS sequence data has been most valuable for inferring phylogenetic relationships at a lower level, e.g., between closely related species. However, it has also been used in elucidating higher level relationships. (See Baldwin et al. 1995.)

A related DNA sequence region is the external transcribed spacer (ETS) region. The ETS region lies between 26S and 18S nrDNA, adjacent to the latter (Figure 14.5B). (The entire region, including both the ETS and the nontranscribed spacer region (NTS), is known as the intergenic spacer region, or IGS; see Figure 14.5B.) The ETS region contains even more sequence variation than ITS and is useful in analyses at lower taxonomic levels. (See Baldwin and Markos 1998.)

FIGURE 14.5 A. Internal transcribed spacers (ITSs) of nuclear ribosomal DNA, illustrating the ITS region and flanking subunits, and showing the orientations and locations of primer sites. After Baldwin et al. (1995). B. External transcribed spacer (ETS) of the intergenic spacer (IGS) region, also showing orientations and locations of primer sites. After Baldwin and Markos (1998). [Labels in A: 18S nrDNA, ITS1, 5.8S nrDNA, ITS2, 26S nrDNA; primer sites ITS2, ITS3, ITS4, ITS5. Labels in B: 26S nrDNA, NTS, ETS, 18S nrDNA; primer sites 26S-IGS, ETS-Hel-1, ETS-Hel-2, 18S-E, 18S-IGS, 18S-ETS.]

TABLE 14.1 Some chloroplast genes that have been used in plant molecular systematics, after Soltis et al. 1998.
CHLOROPLAST GENES
GENE    LOCATION                                   FUNCTION
atpB    Large single-copy region of chloroplast    Beta subunit of ATP synthase, which functions in the synthesis of ATP via proton translocation
rbcL    Large single-copy region of chloroplast    Large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO), which functions in the initial fixation of carbon dioxide in the dark reactions
matK    Large single-copy region of chloroplast    Maturase, which functions in splicing type II introns from RNA transcripts
ndhF    Small single-copy region of chloroplast    Subunit of chloroplast NADH dehydrogenase, which functions in converting NADH to NAD + H+, driving various reactions of respiration

TABLE 14.2 Some chloroplast intergenic spacer regions that have been used in plant molecular systematics, after Shaw et al. 2005, 2007.
CHLOROPLAST INTERGENIC SPACER REGIONS
3’rps16-5’trnK           3’trnK-matK intron    3’trnV-ndhC
5’rps12-rpl20            atpI-atpH             matK-5’trnK intron
ndhA intron              ndhF-rpl32            ndhJ-trnF
petL-psbE                psaI-accD             psbA-3’trnK
psbB-psbH                psbD-trnT             psbJ-petA
psbM-trnD                rpl14-rps8-infA-rpl36 rpl16 intron
rpl32-trnL               rpoB-trnC             rps16 intron
rps4-trnT                trnC-ycf6             trnD-trnT
trnG intron              trnH-psbA             trnL intron
trnL-trnF                trnQ-5’rps16          trnS-rps4
trnS-trnfM               trnS-trnG             trnT-trnL
ycf6-psbM

ANALYSIS OF DNA SEQUENCE DATA
DNA sequence data is converted to characters and character states to be used in phylogenetic analyses. First, the sequences of a given length of DNA are aligned, such that homologous nucleotide positions (e.g., corresponding to the same codon position of a given gene) are arranged in corresponding columns (Figure 14.6). For some genes that are relatively conserved, alignment is straightforward, as all taxa have the same number of nucleotides per gene. For other genes or DNA segments, some taxa may have one or more additions, deletions, inversions, or translocations relative to other taxa. The occurrence of these mutations, and/or the occurrence of considerable homoplasy among taxa, can make alignment of DNA sequences difficult. In addition, multiple copies of a gene can make homology assessment difficult. Various computer algorithms can be used to automatically align sequences of the taxa being studied, but these have assumptions that must be carefully assessed.

Generally, in using DNA sequence data in a phylogenetic analysis, a character is equivalent to the nucleotide position, and a character state of that character is the specific nucleotide at that position (there being four possible character states, corresponding to the four nucleotides; see Figure 14.6). A large number (often the great majority) of nucleotide positions are generally invariable among taxa, and some of the variable ones are often uninformative by being autapomorphic for a given taxon; thus, relatively few sites are informative and therefore useful in phylogenetic reconstruction (Figure 14.6).

However, a major addition, deletion, inversion, or translocation can in itself be identified as an evolutionary novelty (apomorphy), used in grouping lineages together. For example, members of the Faboideae (of the Fabaceae) lack, by deletion, one of the inverted repeats found in the chloroplasts of most angiosperms (see Figure 14.4). Chromosomal mutations such as these may be coded separately from single base differences (e.g., as in the example of Figure 14.6) and may be given relatively greater weight in inferring relationships.

Several types of weighting schemes may be applied to molecular data. For protein-encoding genes, the codon position may be differentially weighted. For example, because of redundancy of the genetic code, the third codon position is generally more labile (a change more likely to have occurred randomly) than the second, and the second may be more labile than the first.
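The treatment of alignment columns as characters, and the codon-position lability just described, can be illustrated with a short Python sketch. It rests on several assumptions that are not in the text: the four-taxon alignment is invented, the function names are hypothetical, the alignment is assumed to be gap-free and to begin at a first codon position, and "informative" is taken to mean that at least two character states are each shared by at least two taxa (a common parsimony convention, assumed here).

```python
from collections import Counter
from typing import Dict, List

# Invented toy alignment: one row per taxon, one column per nucleotide position.
alignment: Dict[str, str] = {
    "taxon_A": "ATGGCTAAA",
    "taxon_B": "ATGGCCAAA",
    "taxon_C": "ATGGCCAAG",
    "taxon_D": "ATGACTAAG",
}


def classify_site(column: str) -> str:
    """Classify one alignment column (one character) by its character states."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariable"
    # Count the states that occur in at least two taxa.
    states_shared = sum(1 for n in counts.values() if n >= 2)
    return "informative" if states_shared >= 2 else "uninformative"


def site_classes(aln: Dict[str, str]) -> List[str]:
    """Classify every column of the alignment, left to right."""
    columns = zip(*aln.values())
    return [classify_site("".join(col)) for col in columns]


def variable_sites_by_codon_position(aln: Dict[str, str]) -> Dict[int, int]:
    """Tally variable columns by codon position (1, 2, or 3).

    Assumes column 0 is a first codon position and there are no gaps.
    """
    tally = {1: 0, 2: 0, 3: 0}
    for index, label in enumerate(site_classes(aln)):
        if label != "invariable":
            tally[index % 3 + 1] += 1
    return tally


print(site_classes(alignment))
print(variable_sites_by_codon_position(alignment))
```

In this toy data set most columns are invariable, one variable column is unique to a single taxon (autapomorphic, hence uninformative), and the two informative columns both fall at third codon positions. A tally of this kind, made on real data, is one way to see how unevenly change is distributed across codon positions.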
Thus, the first and second codon posi tions may be given relatively greater weight, respectively (such as a weight of 10 for the first codon position, 5 for the second position, and 1 for the third position). The logic here is that a change in codon position 1 or 2 is less likely to have occurred at random within a taxon and more likely represents evolutionary novelties that are shared among taxa. Weighting by codon position may be based on empirical data. For a given data set, the number of changes occurring for codon positions 1, 2, and 3 may be used (inversely) to establish the relative weights. Another weighting parameter that may be used with DNA sequence data concerns transitions versus transversions. Transitions are evolutionary changes from one purine to another purine (A —, G or G —* A) or from one pyrimidine to another pyrimidine (C —* T or T —> C); see Figure 14.1. Transversions are evolutionary changes from a purine to a pyrimidine (A — C, A —, T, G — C, or G — T) or from a pyrimidine to a purine (C —* A, C — G, T —> A, or T — G). Weighting using transitions versus transversions may be based on empirical data. For a given data set, the relative frequency _____________ 590 TABLE CHAPTER 14 14.1 Large single-copy region of chloroplast rbcL Large single-copy region of chloroplast matK Large single-copy region of chloroplast Small single-copy region of chloroplast LEU1 position of a given gene) are arranged in corresponding columns (Fiure 14.6). For some genes that are relatively conserved, alignment is straightforward, as all taxa have the same number of nucleotides per gene. For other genes or DNA segments, some taxa may have one or more additions, deletions, inversions, or translocations relative to other taxa. The occurrence of these mutations, and/or the occurrence of considerable homoplasy among taxa, can make alignment of DNA sequences difficult. In addition, multiple copies of a gene can make homology assessment difficult. 
Various computer algorithms can be used to automatically align sequences of the taxa being studied, but these have assump tions that must be carefully assessed. Generally, in using DNA sequence data in a phylogenetic analysis, a character is equivalent to the nucleotide position, and a character state of that character is the specific nucleotide at that position (there being four possible character states, cor responding to the four nucleotides; see Figure 14.6). A large number (often the great majority) of nucleotide positions are generally invariable among taxa, and some of the variable ones are often uninformative by being autapomorphic for a given taxon; thus, relatively few sites are informative and therefore useful in phylogenetic reconstruction (Figure 14.6). Some chloroplast intergenic spacer regions that have been used in plant molecular systematics, after Shaw et al. 2005, 2007. CHLOROPLAST INTERGENIC SPACER REGIONS 3 ‘rpsl6-5 ‘trnK 3 ‘trnK-matK intron 3’trnV-ndhC 5 ‘rpSl2-rpL2O atpl-atpH matK-5 ‘trnK intron ndhA intron ndhF-rp132 ndhJ-trnF petL-psbE psal-accD psbA-3’trnK psbB-psbH psbD-trnT psbJ-petA psbM-trnD rpll4-rps8-infA-rp136 rpl]6 intron rp132-trnL rpoB-trnC rpsl6 intron rps4-trnT trnC-ycf6 trnD-trnT trnG intron trnH-psbA trnL intron trnL-trnF trnQ-S’rpsl6 trnS-rps4 trnS-trnfM trnS-trnG trnT-trnL ycf6-psbM ITS3 ITS5 rr L 5.8S nrDNA 1 TS2 18S nrDNA 26S ]_ITS1 A ANALYSIS OF DNA SEQJENCE DATA DNA sequence data is converted to characters and character states to be used in phylogenetic analyses. 
First, the sequences of a given length of DNA are aligned, in which homologous nucleotide positions (e.g., corresponding to the same codon I I Beta subunit of ATP synthethase, which functions in the synthesis of ATP via proton translocation Large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO), which functions in the initial fixation of carbon dioxide in the dark reactions Maturase, which functions in splicing type II introns from RNA transcripts Subunit of chloroplast NADH dehydrogenase, which functions in converting NADH to NAD + H*, driving various reactions of respiration which contains multiple DNA copies (as opposed to single copies found in most protein-coding genes). The ITS region lies between the 18S and 26S. nuclear ribosomal DNA (nrDNA); the ITS region is divided into two subregions, ITS 1 and ITS2, separated by the 5.8S nrDNA (Figure 14.5A). ITS sequence data has been most valuable for inferring phyloge netic relationships at a lower level, e.g., between closely related species. However, it has also been used in elucidating higher level relationships. (See Baldwin et al. 1995.) A related DNA sequence region is the external transcribed spacer (ETS) region. The ETS region lies between 26S and 18S nrDNA, adjacent to the latter (Figure 14.5B). (The entire region, including both the ETS and the nontranscribed spacer region (NTS) is known as the intergenic spacer region, or IGS; see Figure 1 4.5B.) The ETS region contains even more sequence variation than ITS and is useful in analyses at lower taxonomic levels. (See Baldwin and Markos 1998.) 591 ITS Region FUNCTION atpB TABLE 14.2 SYSTEMATIC EVIDENCE AND DESCRIPTIVE TERMINOLOGY Some chloroplast genes that have been used in plant molecular systematics, after Soltis et al. 1998. CHLOROPLAST GENES GENE LOCATION nd/iF UNIT III PLANT MOLECULAR SYSTEMATICS ITS2 ITS4 C ETS-HeI-1 r ETS NTS ETS-HeI-2 26S-IGS 18S nrDNA 1BS-E 18S-IGS 18S-ETS B IGS Region A. 
Internal transcribed spacers (ITSs) of nuclear ribosomal DNA, illustrating the ITS region and flanking subunits, and show spacer (ETS) of the intergenic spacer ing the orientations and locations of primer sites. After Baldwin et al. (1995). B. External transcribed (IGS) region, also showing orientations and locations of primer sites. After Baldwin and Markos (1998). FIGURE 14.5 However, a major addition, deletion, inversion, or translo cation can in itself be identified as an evolutionary novelty (apomorphy), used in grouping lineages together. For exam ple, members of the Faboideae (of the Fabaceae) lack, by deletion, one of the inverted repeats found in the chloroplasts of most angiosperms (see Figure 14.4). Chromosomal muta tions such as these may be coded separately from single base differences (e.g., as in the example of Figure 14.6) and may be given relatively greater weight in inferring relationships. Several types of weighting schemes may be done with molecular data. For protein encoding genes, the codon posi tion may be differentially weighted. For example, because of redundancy of the genetic code, the third codon position is generally more labile (a change more likely to have occurred randomly) than the second, and the second may be more labile than the first. Thus, the first and second codon posi tions may be given relatively greater weight, respectively (such as a weight of 10 for the first codon position, 5 for the second position, and 1 for the third position). The logic here is that a change in codon position 1 or 2 is less likely to have occurred at random within a taxon and more likely represents evolutionary novelties that are shared among taxa. Weighting by codon position may be based on empirical data. For a given data set, the number of changes occurring for codon positions 1, 2, and 3 may be used (inversely) to establish the relative weights. 
Another weighting parameter that may be used with DNA sequence data concerns transitions versus transversions. Transitions are evolutionary changes from one purine to another purine (A —, G or G —* A) or from one pyrimidine to another pyrimidine (C —* T or T —> C); see Figure 14.1. Transversions are evolutionary changes from a purine to a pyrimidine (A — C, A —, T, G — C, or G — T) or from a pyrimidine to a purine (C —* A, C — G, T —> A, or T — G). Weighting using transitions versus transversions may be based on empirical data. For a given data set, the relative frequency 592 CHAPTER 14 PLANT MOLECULAR SYSTEMATICS 1 2 3 4 5 6 7 8 E 00000000000 0000000 01111111111111111111111 8 888 8 8 8 8 899999999990000 0 00000111111111122 12345678901234567890123456789012345678901 GCCTAGCC AAGCTCTTCCAAGGTGACTCTCAGTTCAAGCT GCT GCCTAGCCAAGCTCTTCCAAGCTGACTCTCA GCCTAGCC TAAGCTCAACCAAGGTGTCTCTCAGTTCAAGC T GCCTAGCC TAAGCTCTTCCAAGGTGTCTCTCAGTTCAAGCT GCCTAGCCAAAGCTCTTCCAAGCTGACTCTCA GCT CCCTAGC C AAAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT CCC TAGCCAAAGCTCTTC CAAGCTGACTCTCAGTTCAAGCT GCCTAGCC AAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT 123456 203204 203105 230234 233234 203105 103104 103104 233104 FIGURE 14.6 Example of alignment of DNA sequences of 41 nucleotide sites (positions 81—121) from eight taxa. Variable nucleotide sites are in bold. Note deletion of six bases in taxon 2 and taxon 5. Possible character coding of variable sites is seen at right. Coding of nucleotides is as follows: A state 0; C state 1; G state 2; T = state 3. In this example, the deletion is coded as a single binary character (character 6), coded differently from nucleotides, as state 4 = deletion absent and state 5 deletion present. of transitions versus transversions may be used (inversely) to establish the relative weights. 
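The transition/transversion distinction and the inverse weighting just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy sequences are hypothetical, and differences are tallied by simple pairwise comparison of aligned sequences, a rough stand-in for counting changes on a tree.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_change(a, b):
    """Return 'transition', 'transversion', or None (identical or non-ACGT bases)."""
    bases = PURINES | PYRIMIDINES
    if a == b or a not in bases or b not in bases:
        return None
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_changes(aligned):
    """Tally differences column by column over every pair of aligned sequences.

    Assumption: pairwise differences approximate the number of changes.
    """
    counts = {"transition": 0, "transversion": 0}
    for s, t in combinations(aligned, 2):
        for a, b in zip(s, t):
            kind = classify_change(a, b)
            if kind is not None:
                counts[kind] += 1
    return counts


def inverse_weights(counts):
    """Weight the rarer class of change more heavily, in inverse proportion."""
    ts, tv = counts["transition"], counts["transversion"]
    if ts == 0 or tv == 0:
        raise ValueError("need at least one transition and one transversion")
    return {"transition": 1.0, "transversion": ts / tv}


# Hypothetical aligned sequences, four sites from four taxa.
toy_alignment = ["AACC", "GACC", "AATC", "AACA"]
toy_counts = count_changes(toy_alignment)
toy_weights = inverse_weights(toy_counts)
print(toy_counts, toy_weights)
```

With the weights in hand, each kind of change can be charged its own cost when tree length is computed.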
For example, for a given group under study, if transitions occur five times more frequently than transversions, the latter may be given a weight of 5 and the former a weight of 1, as illustrated in the step matrix of Figure 14.7. These weighting schemes may be viewed as a simplified component of a process that may be quite complex, taking into account, e.g., rate of base substitution, base frequency, and branch length in determining an evolutionary model. Evolutionary models are commonly used in maximum likelihood and Bayesian analyses. (See Chapter 2.)

DNA sequence data can also be used to evaluate the secondary structure of a molecule. Thus, nucleotide differences that result in major changes in the conformation of the product (whether ribosomal RNA or protein) may have a much greater physiological effect than those that do not and might receive a higher weight. Computer algorithms can evaluate this to some degree.

     A  C  G  T
A    0  5  1  5
C    5  0  5  1
G    1  5  0  5
T    5  1  5  0

FIGURE 14.7 Step matrix of nucleotide changes, showing weighting scheme in which transversions are given a weight 5 times greater than that of transitions.

Parsimony, maximum likelihood, and Bayesian methods are commonly used to infer phylogenetic relationships using DNA sequence data (Chapter 2). The most robust hypotheses of relationship are generally those using a large taxon sampling and sequence data from multiple (e.g., anywhere from 3 to 20+) genes and/or sequence regions.

RESTRICTION SITE ANALYSIS (RFLPs)

A restriction site is a sequence of approximately 6-8 base pairs of DNA that is recognized and bound by a given restriction enzyme. These restriction enzymes, of which there are many, have been isolated from bacteria. Their natural function is to inactivate invading viruses by cleaving the viral DNA.
Restriction enzymes known as type II recognize restriction sites and cleave the DNA at particular locations within or near the restriction site. An example is the restriction enzyme EcoRI (named after E. coli, from which it was first isolated), which recognizes the DNA sequence seen in Figure 14.8 and cleaves the DNA at the sites indicated by the arrows in this figure.

FIGURE 14.8 A DNA restriction site, cleaved (at arrows) by the restriction enzyme EcoRI.

Restriction fragment length polymorphism, or RFLP, refers to differences between taxa in restriction sites, and therefore in the lengths of the DNA fragments produced by cleavage with restriction enzymes. For example, Figure 14.9 shows, for two hypothetical species, amplified DNA segments 10,000 base pairs long that are subjected to ("digested with") the restriction enzyme EcoRI. Note that, after the reaction with EcoRI, the DNA of species A is cleaved into three fragments, corresponding to two EcoRI restriction sites, whereas that of species B is cleaved into four fragments, corresponding to three EcoRI restriction sites. The relative locations of these restriction sites on the DNA can be mapped; one possibility is seen at the bottom of Figure 14.9. (Note that there are other possibilities for this map; precise mapping requires additional work.)

Additional restriction enzymes can be used. Figure 14.10 illustrates how each of the DNA fragments from the EcoRI digests can be digested with the BamHI restriction enzyme, yielding different fragments for the two species. These data can be added to the original in preparing a map (one possible map is shown in the lower part of Figure 14.10). Restriction site fragment data can be coded as characters and character states in a phylogenetic analysis.
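The relationship between restriction sites and fragment lengths in the EcoRI digest described above can be sketched as follows. This is a minimal illustration, assuming linear DNA and scanning one strand only. It uses the known EcoRI recognition sequence GAATTC, cut after the G. The function names, toy sequences, and site positions are hypothetical and are not taken from Figure 14.9.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts between G and AATTC


def find_sites(seq, site=ECORI_SITE):
    """Return the 0-based start position of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(total_length, cut_positions):
    """Lengths of the pieces left after cutting a linear molecule at each position."""
    cuts = [0] + sorted(cut_positions) + [total_length]
    return [right - left for left, right in zip(cuts, cuts[1:])]


def digest(seq, site=ECORI_SITE, offset=ECORI_CUT_OFFSET):
    """Fragment lengths produced by cutting seq at every recognition site."""
    cut_positions = [p + offset for p in find_sites(seq, site)]
    return fragment_lengths(len(seq), cut_positions)


# Hypothetical short sequences: two sites in species A, three in species B.
species_a = "ATGAATTCGGCCGAATTCTA"
species_b = "AAGAATTCCGAATTCTTGAATTCA"
print(digest(species_a))  # two sites give three fragments
print(digest(species_b))  # three sites give four fragments

# The same arithmetic on a 10,000 bp segment with assumed site positions.
print(fragment_lengths(10_000, [2_000, 7_000]))
```

Because n cut sites on a linear molecule always give n + 1 fragments, a difference in fragment number between two species points directly to a gained or lost site.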
For example, given that the restriction site maps of Figure 14.10 are correct, the presence or absence of these sites can be coded as characters, as seen in Figure 14.11. Restriction site analysis yields far less data than complete DNA sequencing, accounting only for the presence or absence of sites 6-8 base pairs long. It has the advantage of surveying considerably larger segments of DNA. However, with improved and less expensive sequencing techniques, it is less valuable and less often used than in the past.

ALLOZYMES

Allozymes are different molecular forms of an enzyme that correspond to different alleles of a common gene (locus). (These are not to be confused with isozymes, which are forms of an enzyme that are derived from separate genes or loci.) Allozymes are traditionally detected using electrophoresis, in which the enzymes are extracted and placed on a medium (e.g., starch) through which an electric current runs (similar to gel electrophoresis in DNA sequencing). A given enzyme will migrate toward one pole or the other depending on its charge. Similarly, different allozymes of an enzyme will migrate differentially because they differ slightly in amino acid composition and therefore have somewhat different electrical charges. Allozymes subjected to electrophoresis are identified with a stain specific to that enzyme, and the bands are marked by their relative position on the electrophoresis medium.

Allozymes have traditionally been used to assess genetic variation within a population or species, but they can also be used as data in phylogenetic analyses of closely related species, e.g., species within a monophyletic genus. Figure 14.12A illustrates an example of electrophoretic allozyme banding data for five species and an outgroup. There are several ways to code polymorphic allozyme data. One way is to code each allele as a character and the presence or absence of that allele as a character state.
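The allele-as-character coding just described can be sketched as a presence/absence matrix. This is a minimal illustration: the taxon names, allele labels, and function name are hypothetical and do not reproduce the data of Figure 14.12.

```python
def presence_absence_matrix(alleles_by_taxon):
    """Code each allele as a binary character: 1 = present, 0 = absent.

    Returns the sorted allele list (one character per allele) and a
    dictionary mapping each taxon to its row of character states.
    """
    alleles = sorted(set().union(*alleles_by_taxon.values()))
    matrix = {
        taxon: [1 if allele in present else 0 for allele in alleles]
        for taxon, present in alleles_by_taxon.items()
    }
    return alleles, matrix


# Hypothetical banding data for one enzyme locus: the alleles seen in each taxon.
bands = {
    "Outgroup": {"a"},
    "Species 1": {"a", "b"},
    "Species 2": {"b"},
    "Species 3": {"b", "c"},
}

alleles, matrix = presence_absence_matrix(bands)
print("characters:", alleles)
for taxon, row in matrix.items():
    print(taxon, row)
```

A taxon that is polymorphic at the locus simply scores 1 for more than one allele character.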
A second way to code allozyme data is to treat the locus (corresponding to the gene coding for the enzyme) as the character and all unique combinations of alleles as character states (as in Figure 14.12B). The number of state changes between these unique allelic combinations can be a default of one. However, another method of coding is to treat the loss of each allele as one state change and the gain of an allele as a separate state change. Thus, the number of state changes between different allelic combinations can vary, as seen in Figure 14.12C. Step matrices (see Chapter 2) are used to code these in a cladistic analysis. Yet another way to code allozyme data is to take into account the frequency of alleles present in a given taxon. For example, by this method, species A, which has allele X present with a frequency of 95% and allele Y with a frequency of 5%, would receive a different coding from species B, which has the same alleles but in frequencies of 55% and 45%, respectively.

MICROSATELLITE DNA

Microsatellites are regions of DNA that contain short repeats of nucleotides (the repeated unit usually being 2-5 base pairs long), an example being TGTGTG, in which a unit of two base pairs repeats. The regions are termed tandem repeats; if they vary within a population or species, they are called variable-number tandem repeats (VNTR). (Other designations and acronyms are used, depending on the particular field of study.) These tandem repeats can be located all across the genome; at a given location (locus), the repeat will tend to be of a certain length. However, individuals within or between populations may vary in the number of tandem repeats at a given locus (or even show allelic variation) because of irregularities in crossing-over and replication. Thus, variable-number tandem repeats can be used as a genetic marker. Microsatellites are identified by constructing primers that flank the tandem repeats and then using PCR technology.
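Once the region between the flanking primers has been sequenced, the number of tandem repeats at a locus can be counted directly. The sketch below is one simple way to do this and rests on assumptions: the flanking sequences, motif, individual labels, and function name are hypothetical, and real genotyping usually infers repeat number from amplified fragment length rather than from sequence.

```python
import re


def longest_tandem_repeat(seq, motif):
    """Return the largest number of back-to-back copies of motif found in seq."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(seq)]
    return max(run_lengths, default=0) // len(motif)


# Hypothetical flanking regions (where primers would anneal) around a TG repeat.
LEFT_FLANK, RIGHT_FLANK, MOTIF = "ACCG", "AAC", "TG"

# Hypothetical amplified sequences from three individuals at the same locus.
individuals = {
    "individual 1": LEFT_FLANK + MOTIF * 4 + RIGHT_FLANK,
    "individual 2": LEFT_FLANK + MOTIF * 6 + RIGHT_FLANK,
    "individual 3": LEFT_FLANK + MOTIF * 4 + RIGHT_FLANK,
}

for name, seq in individuals.items():
    print(name, longest_tandem_repeat(seq, MOTIF))
```

Individuals with different counts carry different alleles at the locus, which is what makes the repeat number usable as a genetic marker.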