plant molecular systematics

CHAPTER 13
PLANT REPRODUCTIVE BIOLOGY
EXERCISES
1. Examine specimens of two species of plants plus any putative hybrids between them. (a) Study both vegetative and floral
characters, from original observations or using a manual of the area, and note which diagnostic features distinguish the two
species. (b) Decide upon which characters to measure in the specimens available. (c) Record 10-25 measurements of each
of the parameters chosen. Compare these by preparing graphs in order to recognize discontinuities (or lack thereof) of the
three taxa.
2. Locate a population of a composite (Asteraceae) species that has both disk and ray flowers. Observe insect visitors
(potential pollinators) in each of two subsets of plants (or inflorescences): one undisturbed and another with all ray flowers
removed. Count the number and type of visitors over a time period (e.g., 10-30 minutes) and record.
3. If material is available, observe ultraviolet light-sensitive regions in the perianth by placing a flower into a jar saturated with
ammonium vapors. Bees can detect these UV-reflective regions of the flower, enabling them to find flowers and orient to
pollen or nectar more efficiently.
4. Fix the styles of a species of flowering plant in 70% alcohol. Remove the style and place in drops of aniline blue on a microscope
slide, covered by a cover slip. If this style is small enough, it may be “squashed” by applying firm pressure on the
cover slip (using, e.g., a cork). Observe under fluorescence microscopy. Pollen tubes regularly deposit callose, which differentially
picks up the aniline blue stain. This method allows for detection of pollen tube growth and can be used to test
whether self-incompatibility is occurring.
5. If time permits, select a plant species and perform the crossing and caging experiments described in the text. These
techniques are used to test the potential and degree of self-pollination versus cross-pollination.
6. Peruse journal articles on plant systematics, e.g., American Journal of Botany, Systematic Botany, or International Journal
of Plant Sciences, and note those that describe aspects of reproductive biology in relation to systematic studies. Identify the
techniques used and the problems addressed.
PLANT MOLECULAR SYSTEMATICS
ACQUISITION OF MOLECULAR DATA
DNA SEQUENCE DATA
Polymerase Chain Reaction
DNA Sequencing Reaction
Types of DNA Sequence Data
Analysis of DNA Sequence Data
RESTRICTION SITE ANALYSIS (RFLPs)
ALLOZYMES
MICROSATELLITE DNA
RANDOM AMPLIFIED POLYMORPHIC DNA (RAPDs)
AMPLIFIED FRAGMENT LENGTH POLYMORPHISM (AFLPs)
REVIEW QUESTIONS
EXERCISES
REFERENCES FOR FURTHER STUDY
© 2010 Elsevier Inc. All rights reserved.
doi: 10.1016/B978-0-12-374380-0.00014-3
Molecular systematics encompasses a series of approaches in which phylogenetic relationships are
inferred using information from macromolecules of the organisms under study. Specifically, the types
of molecular data acquired include those from DNA sequences, DNA restriction sites, allozymes,
microsatellites, RAPDs, and AFLPs. (The use of data from other, generally smaller molecules, such as
secondary compounds in plants, is usually relegated to the field of “chemosystematics” and will not
be reviewed here.)
A revolution in inferring the phylogenetic relationships of life is occurring with the use of
molecular data. The following is a review of the types of data, methods of acquisition, and methods
of analysis of molecular systematics.
ACQUISITION OF MOLECULAR DATA
Plant samples from which DNA is to be isolated may be acquired by various means. It is vital to
always collect a proper voucher specimen, properly mounted and accessioned in an accredited
herbarium, to serve as documentation for any molecular systematic study (see Chapter 17). Live
samples may be collected and immediately subjected to chemical processing, e.g., for allozyme
analysis (see later discussion). For many DNA methods, pieces of leaves (from which chloroplast,
mitochondrial, and nuclear DNA can be isolated) are removed from the live plant and immediately
dried, typically in a container of silica gel. Alternatively, plant samples may be frozen or placed
in concentrated extraction buffer. With any of these procedures, DNA is usually preserved intact.
Usable DNA is often successfully isolated from dried herbarium sheets, attesting to the “toughness”
of the molecule.
DNA SEQUENCE DATA
Perhaps the most important method for inferring phylogenetic relationships of life is that of
acquiring DNA sequences. DNA sequence data basically refers to the sequence of nucleotides
(adenine = A, cytosine = C, guanine = G, or thymine = T; Figure 14.1) in a particular region of the
DNA of a given taxon. Comparisons of homologous regions of DNA among the taxa under study yield the
characters and character states that are used to infer relationships in phylogenetic analyses.
The first step of acquiring DNA sequence data is to identify a particular region of DNA to be
compared between species. Much prior research goes into identifying these regions and determining
their efficacy in phylogenetic analysis.
FIGURE 14.1 Molecular structure of the four DNA nucleotides. Adenine and guanine are chemically similar purines; cytosine and thymine
are chemically similar pyrimidines.
POLYMERASE CHAIN REACTION
After a gene sequence of interest is identified, the DNA from a given plant sample is first isolated
and purified by various chemical procedures. Following this, the DNA sequences of interest are
amplified using the polymerase chain reaction (or PCR). The invention of this technology was crucial
to modern DNA sequencing, as it permitted rapid and efficient DNA amplification, the replication of
thousands of copies of DNA.
The polymerase chain reaction works as follows (see Figure 14.2). Prior research establishes the
occurrence of relatively short regions of DNA that flank (occur at each end of) the gene or DNA
sequence of interest and that are both unique (not occurring elsewhere in the genome) and
conserved (i.e., invariable) in all taxa to be investigated.
FIGURE 14.2 Polymerase chain reaction, using cycle sequencing to produce multiple copies of a stretch of DNA.
(Steps shown: sample DNA; solution heated, DNA denatures; temperature lowered, primers anneal to conserved
regions; free nucleotides bind to primers, catalyzed by DNA polymerase; DNA strands replicated; repeat cycle.)
These short, conserved, flanking regions are used as a template for the synthesis of multiple,
complementary copies, known as primers. Primers ideally are constructed such that they do not bind
with one another.
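The requirement that primers be complementary to the flanking regions, yet not bind to one another, can be illustrated with a small sketch. The function names, the example sequences, and the four-base threshold are all assumptions made for illustration; real primer-design software also weighs melting temperature, hairpins, and other factors not modeled here.

```python
# Illustrative sketch only: helper names and the 4-base threshold are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def three_prime_overlap(primer_a: str, primer_b: str, length: int = 4) -> bool:
    """Crude test: can the last `length` bases (3' end) of primer_a pair with primer_b?

    The 3' tail of primer_a can pair with primer_b wherever primer_b
    contains the reverse complement of that tail.
    """
    tail = primer_a.upper()[-length:]
    return reverse_complement(tail) in primer_b.upper()


def primers_may_dimerize(forward: str, reverse: str, length: int = 4) -> bool:
    """Flag a primer pair if either 3' end could anneal to the other primer."""
    return (three_prime_overlap(forward, reverse, length)
            or three_prime_overlap(reverse, forward, length))


if __name__ == "__main__":
    print(reverse_complement("ATCGGTTAGCGA"))
    print(primers_may_dimerize("ATCGGTT", "TAACCTC"))
    print(primers_may_dimerize("ATCGGTT", "GAGGTGA"))
```

A pair flagged by such a check would normally be redesigned, because primers that anneal to each other are consumed without amplifying the target region.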
In the polymerase chain reaction, a solution is prepared, made up of the isolated and purified DNA
of a sample; multiple copies of primers; free nucleotides; DNA polymerase molecules (typically Taq
polymerase, which can tolerate heat); and buffer and salts. This solution is heated to a point at
which the sample DNA denatures, whereby the two strands of DNA separate from one another. Once the
sample DNA denatures, the temperature is lowered so that the primers in solution can bind (anneal)
to the corresponding, complementary DNA of the sample (Figure 14.2). Following binding of the primer
to the sample DNA, individual nucleotides in solution attach to the 3' end of the primer, with the
sample DNA acting as a template; DNA polymerase catalyzes this reaction. A second primer, at the
opposite end of the DNA sequence of interest, is used for the complementary, denatured DNA strand.
Thus, the two denatured strands of DNA are replicated. After replication, the solution is again
heated to the point of DNA denaturation, and the process is repeated, with each cycle roughly
doubling the number of copies. A typical PCR run can produce more than a million copies of DNA in a
matter of hours.
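The figure of "more than a million copies" follows from repeated doubling. The sketch below assumes an idealized model in which every cycle multiplies the copy number by (1 + efficiency); the function names and the efficiency parameter are illustrative assumptions, not part of any laboratory protocol.

```python
import math


def copies_after(cycles: int, starting_copies: float = 1, efficiency: float = 1.0) -> float:
    """Idealized copy number after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling each cycle; lower values model
    incomplete replication (an assumption made for illustration).
    """
    return starting_copies * (1.0 + efficiency) ** cycles


def cycles_needed(target_copies: float, starting_copies: float = 1,
                  efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches target_copies."""
    ratio = target_copies / starting_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    print(copies_after(20))                           # perfect doubling for 20 cycles
    print(cycles_needed(1_000_000))                   # cycles to pass one million copies
    print(cycles_needed(1_000_000, efficiency=0.8))   # with a less efficient reaction
```

Under perfect doubling, 20 cycles from a single template already exceed one million copies, and a less efficient reaction needs only a few more cycles to reach the same number.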
DNA SEQUENCING REACTION
After DNA is replicated, it is sequenced. The most common sequencing technology involves a machine
that reads fluorescent dyes with a laser detector. The production of dye-labeled DNA is very similar
to DNA replication using the PCR. The replicated DNA is placed into solution with DNA polymerase,
primers, free nucleotides, and a small concentration of synthesized compounds called
dideoxynucleotides (discussed below) that are each attached to a different type of fluorescent dye.
As in the polymerase chain reaction, the sample DNA is heated until the double helix unwinds and the
two complementary DNA chains separate (Figure 14.3). At this point, a primer attaches to a conserved
region of one of the strands of DNA, and free nucleotides in solution join to the 3' end of the
primer, using the sample DNA as a template and catalyzed by DNA polymerase (Figure 14.3). Thus, a
replicated copy of the DNA strand begins to form. However, at some point a dideoxynucleotide joins
to the new strand instead of a nucleotide doing so. The dideoxynucleotides (dideoxyadenine,
dideoxycytosine, dideoxyguanine, and dideoxythymine) resemble the four nucleotides, except that they
lack the 3' hydroxyl group. Once a dideoxynucleotide is joined to the chain, absence of the hydroxyl
group prevents the DNA polymerase from joining it to anything else. Thus, with the addition of
a dideoxynucleotide, synthesis of the new DNA strand terminates (Figure 14.3).
The ratio of dideoxynucleotides to nucleotides in the reaction mixture is carefully set and is such
that the concentration of dideoxynucleotides is always much smaller than that of normal nucleotides.
Thus, the dideoxynucleotides may terminate the new DNA strand at any point along the gene being
replicated. For example, some of the new DNA strands will be the length of the primer plus one
additional base (in this case the dideoxynucleotide); some will be the primer length plus two bases
(a nucleotide plus the terminal dideoxynucleotide); some will be the primer length plus three bases
(two nucleotides plus the terminal dideoxynucleotide); etc. There are many thousands, if not
millions, of copies of the sample DNA. Thus, there will be an equivalent number of newly
replicated DNA strands, of all different lengths.
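Why a low dideoxynucleotide concentration yields strands of every possible length can be seen in a small simulation. This is a sketch under simplifying assumptions: each added base has the same fixed chance of being a dideoxynucleotide (the hypothetical parameter ddntp_fraction), and strands that reach the end of the template without terminating are simply ignored.

```python
import random


def terminated_fragments(extension: str, n_copies: int = 10_000,
                         ddntp_fraction: float = 0.05, seed: int = 0):
    """Simulate chain termination on many copies of one template.

    extension: the bases that would be added after the primer, 5'->3'.
    Returns a list of (bases_added_beyond_primer, terminal_base) pairs,
    one per strand that was terminated by a dideoxynucleotide.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_copies):
        for position, base in enumerate(extension, start=1):
            # With a small probability the incoming base is a dideoxynucleotide,
            # which ends synthesis of this strand at this position.
            if rng.random() < ddntp_fraction:
                fragments.append((position, base))
                break
    return fragments


if __name__ == "__main__":
    frags = terminated_fragments("AGCGT", n_copies=5000, ddntp_fraction=0.2, seed=1)
    lengths = sorted({length for length, _ in frags})
    print(lengths)  # every length from 1 to 5 is represented
```

Because termination is rare at any single position but the number of template copies is very large, every position along the region ends up represented by some terminated strand.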
The final step of DNA sequencing entails subjecting the DNA strands to electrophoresis, in which the
DNA is loaded onto a flat gel plate or in a thin capillary subjected to an electric current. Because
the phosphate components of nucleic acids give DNA a net negative charge, the molecules are
attracted to the positive pole. The DNA strands migrate through the medium over time, the amount of
migration inversely proportional to the molecular weight of the strand (i.e., lighter strands
migrate farther). Each strand is terminated with a dideoxynucleotide to which a fluorescent dye
is attached; each of the four dideoxynucleotides has a different type of fluorescent dye, which
(upon excitation) emits light of a different wavelength. Thus, as the multiple copies of DNA of one
particular length migrate along the gel or capillary, the wavelength of emitted light is detected
and recorded as a peak, which measures the light intensity. Because a given emitted wavelength
(“color”) is determined by one of the four dideoxynucleotides, the corresponding nucleotide can be
inferred and its position identified by the timing of migration of the DNA strands. In this way, the
sequence of nucleotides of the DNA strand can be inferred (Figure 14.3).
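The inference of a sequence from migration order and dye color can be sketched as follows. The dye-to-base assignments in DYE_TO_BASE are invented for illustration (actual instruments and dye chemistries differ), and the input is assumed to be an already cleaned list of (fragment length, dye color) detections.

```python
# Hypothetical dye assignments, chosen only for this example.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_sequence(detections):
    """Infer the base order from (fragment_length, dye_color) detections.

    Shorter fragments migrate faster and are detected first, so sorting by
    length reproduces the order of bases along the new strand.
    """
    base_at = {}
    for length, dye in detections:
        # Many strands share each length; one call per length is enough here.
        base_at.setdefault(length, DYE_TO_BASE[dye])
    return "".join(base_at[length] for length in sorted(base_at))


if __name__ == "__main__":
    peaks = [(3, "blue"), (1, "green"), (2, "yellow"), (1, "green"), (4, "red")]
    print(read_sequence(peaks))
```

Each detected length contributes exactly one base, which is why the mixture must contain terminated strands of every length for the read to be complete.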
TYPES OF DNA SEQUENCE DATA
For plants, the three basic types of DNA sequence data stem from the three major sources of DNA:
nuclear (nDNA), chloroplast (cpDNA), and mitochondrial (mtDNA). Nuclear DNA is, of course,
transmitted from parent(s) to offspring by nuclear division (meiosis or mitosis) via sexual or
asexual (somatic) reproduction. Chloroplasts and mitochondria, however, replicate and divide
independently of the nucleus and may be transmitted to offspring in a different fashion. For
example, in angiosperms these organelles are usually (with some exceptions) sexually transmitted
only maternally, being
retained in the egg but excluded in sperm cells. (In conifers, interestingly, chloroplast DNA is
transmitted paternally, not maternally.)
The use of sequence data from the DNA of chloroplasts has proven to be very useful in elucidating
both lower and higher level relationships. The basic structure of chloroplast DNA for a flowering
plant, with coding genes indicated, is shown in Figure 14.4. Like most organelle and prokaryotic
DNA, chloroplast DNA is circular. Curiously, most angiosperms have a region of chloroplast DNA known
as the inverted repeat, which is the mirror image of the corresponding region (Figure 14.4). Some of
the more commonly sequenced chloroplast DNA genes are listed in Table 14.1, although many more have
been utilized.
FIGURE 14.3 DNA sequencing reactions. A* = dideoxyadenine; C* = dideoxycytosine; G* = dideoxyguanine; T* = dideoxythymine.
(Final steps shown: new DNA strands denatured from sample DNA; after numerous reactions, new DNA strands separated by
electrophoresis; electric current applied, DNA strands migrate to (+) pole, inversely to molecular weight; DNA strands
scanned during migration, with peaks of wavelengths corresponding to fluorescent dyes attached to specific dideoxynucleotides.)
FIGURE 14.4 Molecular structure of the chloroplast DNA of tobacco (Nicotiana tabacum). Note large single-copy region (LSC),
small single-copy region (SSC), and the two inverted repeats (IRA and IRB). Also note location of atpB, rbcL, matK, and ndhF genes (see
Table 14.1). (Redrawn from Wakasugi, T., M. Sugita, T. Tsudzuki, and M. Sugiura. 1998. Updated gene map of tobacco chloroplast DNA.
Plant Molecular Biology Reporter 16: 231-241, by permission.)
In addition to coding genes of chloroplast DNA, the sequences between genes, known as intergenic
spacers, may be used in phylogenetic analyses. Intergenic spacer regions often show a higher degree
of variability than the coding genes, making the former more useful for analyses at a lower
taxonomic level, such as species or infraspecies. A list of some commonly used chloroplast
intergenic spacers is seen in Table 14.2.
Nuclear DNA sequencing has been used to a lesser degree in plant systematics. Some nuclear genes
such as alcohol dehydrogenase (Adh), which has traditionally been used in allozyme studies, are
becoming more frequently used.
One of the more useful types of nuclear DNA sequences has been the internal transcribed spacer (ITS) region,
which contains multiple DNA copies (as opposed to single copies found in most protein-coding genes).
The ITS region lies between the 18S and 26S nuclear ribosomal DNA (nrDNA); the ITS region is divided
into two subregions, ITS1 and ITS2, separated by the 5.8S nrDNA (Figure 14.5A). ITS sequence data
has been most valuable for inferring phylogenetic relationships at a lower level, e.g., between
closely related species. However, it has also been used in elucidating higher level relationships.
(See Baldwin et al. 1995.)
A related DNA sequence region is the external transcribed spacer (ETS) region. The ETS region lies
between 26S and 18S nrDNA, adjacent to the latter (Figure 14.5B). (The entire region, including both
the ETS and the nontranscribed spacer region (NTS), is known as the intergenic spacer region, or IGS;
see Figure 14.5B.) The ETS region contains even more sequence variation than ITS and is useful in
analyses at lower taxonomic levels. (See Baldwin and Markos 1998.)
TABLE 14.1 Some chloroplast genes that have been used in plant molecular systematics, after Soltis et al. 1998.
CHLOROPLAST GENES
GENE   LOCATION                                  FUNCTION
atpB   Large single-copy region of chloroplast   Beta subunit of ATP synthase, which functions in the synthesis of ATP via proton translocation
rbcL   Large single-copy region of chloroplast   Large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO), which functions in the initial fixation of carbon dioxide in the dark reactions
matK   Large single-copy region of chloroplast   Maturase, which functions in splicing type II introns from RNA transcripts
ndhF   Small single-copy region of chloroplast   Subunit of chloroplast NADH dehydrogenase, which functions in converting NADH to NAD + H+, driving various reactions of respiration
TABLE 14.2 Some chloroplast intergenic spacer regions that have been used in plant molecular systematics, after Shaw et al. 2005,
2007.
CHLOROPLAST INTERGENIC SPACER REGIONS
3'rps16-5'trnK
3'trnK-matK intron
3'trnV-ndhC
5'rpS12-rpL20
atpI-atpH
matK-5'trnK intron
ndhA intron
ndhF-rpl32
ndhJ-trnF
petL-psbE
psaI-accD
psbA-3'trnK
psbB-psbH
psbD-trnT
psbJ-petA
psbM-trnD
rpl14-rps8-infA-rpl36
rpl16 intron
rpl32-trnL
rpoB-trnC
rps16 intron
rps4-trnT
trnC-ycf6
trnD-trnT
trnG intron
trnH-psbA
trnL intron
trnL-trnF
trnQ-5'rps16
trnS-rps4
trnS-trnfM
trnS-trnG
trnT-trnL
ycf6-psbM
FIGURE 14.5 A. Internal transcribed spacers (ITSs) of nuclear ribosomal DNA, illustrating the ITS region and flanking subunits, and
showing the orientations and locations of primer sites. After Baldwin et al. (1995). B. External transcribed spacer (ETS) of the
intergenic spacer (IGS) region, also showing orientations and locations of primer sites. After Baldwin and Markos (1998).
ANALYSIS OF DNA SEQUENCE DATA
DNA sequence data is converted to characters and character states to be used in phylogenetic
analyses. First, the sequences of a given length of DNA are aligned, in which homologous nucleotide
positions (e.g., corresponding to the same codon position of a given gene) are arranged in
corresponding columns (Figure 14.6). For some genes that are relatively conserved, alignment is
straightforward, as all taxa have the same number of nucleotides per gene. For other genes or
DNA segments, some taxa may have one or more additions, deletions, inversions, or translocations
relative to other taxa. The occurrence of these mutations, and/or the occurrence of considerable
homoplasy among taxa, can make alignment of DNA sequences difficult. In addition, multiple copies of
a gene can make homology assessment difficult. Various computer algorithms can be used to
automatically align sequences of the taxa being studied, but these have assumptions that must be
carefully assessed.
Generally, in using DNA sequence data in a phylogenetic analysis, a character is equivalent to the
nucleotide position, and a character state of that character is the specific nucleotide at that
position (there being four possible character states, corresponding to the four nucleotides; see
Figure 14.6). A large number (often the great majority) of nucleotide positions are generally
invariable among taxa, and some of the variable ones are often uninformative by being autapomorphic
for a given taxon; thus, relatively few sites are informative and therefore useful in phylogenetic
reconstruction (Figure 14.6).
However, a major addition, deletion, inversion, or translo
cation can in itself be identified as an evolutionary novelty
(apomorphy), used in grouping lineages together. For exam
ple, members of the Faboideae (of the Fabaceae) lack, by
deletion, one of the inverted repeats found in the chloroplasts
of most angiosperms (see Figure 14.4). Chromosomal muta
tions such as these may be coded separately from single base
differences (e.g., as in the example of Figure 14.6) and may
be given relatively greater weight in inferring relationships.
Several types of weighting schemes may be done with
molecular data. For protein encoding genes, the codon posi
tion may be differentially weighted. For example, because
of redundancy of the genetic code, the third codon position
is generally more labile (a change more likely to have occurred
randomly) than the second, and the second may be more
labile than the first. Thus, the first and second codon posi
tions may be given relatively greater weight, respectively
(such as a weight of 10 for the first codon position, 5 for the
second position, and 1 for the third position). The logic here
is that a change in codon position 1 or 2 is less likely to
have occurred at random within a taxon and more likely
represents evolutionary novelties that are shared among taxa.
Weighting by codon position may be based on empirical data.
For a given data set, the number of changes occurring
for codon positions 1, 2, and 3 may be used (inversely) to
establish the relative weights.
Another weighting parameter that may be used with DNA
sequence data concerns transitions versus transversions.
Transitions are evolutionary changes from one purine to
another purine (A —, G or G —* A) or from one pyrimidine
to another pyrimidine (C —* T or T —> C); see Figure 14.1.
Transversions are evolutionary changes from a purine to
a pyrimidine (A — C, A —, T, G — C, or G — T) or from
a pyrimidine to a purine (C —* A, C — G, T —> A, or T — G).
Weighting using transitions versus transversions may be based
on empirical data. For a given data set, the relative frequency
_____________
TABLE 14.1
GENE    LOCATION
atpB    Large single-copy region of chloroplast
rbcL    Large single-copy region of chloroplast
matK    Large single-copy region of chloroplast
ndhF    Small single-copy region of chloroplast
position of a given gene) are arranged in corresponding
columns (Figure 14.6). For some genes that are relatively
conserved, alignment is straightforward, as all taxa have the
same number of nucleotides per gene. For other genes or
DNA segments, some taxa may have one or more additions,
deletions, inversions, or translocations relative to other taxa.
The occurrence of these mutations, and/or the occurrence of
considerable homoplasy among taxa, can make alignment
of DNA sequences difficult. In addition, multiple copies of
a gene can make homology assessment difficult. Various
computer algorithms can be used to automatically align
sequences of the taxa being studied, but these have assumptions
that must be carefully assessed.
Generally, in using DNA sequence data in a phylogenetic
analysis, a character is equivalent to the nucleotide position,
and a character state of that character is the specific nucleotide
at that position (there being four possible character states,
corresponding to the four nucleotides; see Figure 14.6). A large
number (often the great majority) of nucleotide positions are
generally invariable among taxa, and some of the variable
ones are often uninformative by being autapomorphic for a
given taxon; thus, relatively few sites are informative and
therefore useful in phylogenetic reconstruction (Figure 14.6).
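The distinction between invariable, variable but uninformative, and informative sites can be sketched in a few lines of Python. This is only an illustration: the function name, the labels, and the four toy sequences are hypothetical, and the criterion assumed here is the usual parsimony one (a site is informative when at least two character states each occur in at least two taxa).

```python
from collections import Counter

def classify_sites(alignment):
    """Label each column of an alignment as constant, uninformative, or informative."""
    labels = []
    # zip(*...) walks the alignment column by column (one nucleotide position at a time)
    for column in zip(*alignment.values()):
        counts = Counter(base for base in column if base in "ACGT")
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(counts) < 2:
            labels.append("constant")
        elif len(shared_states) >= 2:
            labels.append("informative")
        else:
            labels.append("uninformative")  # variable, e.g. autapomorphic for one taxon
    return labels

# Hypothetical, already aligned sequences (not taken from Figure 14.6)
toy_alignment = {
    "taxon1": "GCAAT",
    "taxon2": "GCAAT",
    "taxon3": "CCTGT",
    "taxon4": "CCAGA",
}
site_labels = classify_sites(toy_alignment)
print(site_labels)
```

In this toy data set only the first and fourth positions would be retained as informative characters; the second is invariable and the third and fifth each differ in a single taxon.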
TABLE 14.2 Some chloroplast intergenic spacer regions that have been used in plant molecular systematics, after Shaw et al. 2005, 2007.
CHLOROPLAST INTERGENIC SPACER REGIONS
3'rps16-5'trnK
3'trnK-matK intron
3'trnV-ndhC
5'rpS12-rpL20
atpI-atpH
matK-5'trnK intron
ndhA intron
ndhF-rpl32
ndhJ-trnF
petL-psbE
psaI-accD
psbA-3'trnK
psbB-psbH
psbD-trnT
psbJ-petA
psbM-trnD
rpl14-rps8-infA-rpl36
rpl16 intron
rpl32-trnL
rpoB-trnC
rps16 intron
rps4-trnT
trnC-ycf6
trnD-trnT
trnG intron
trnH-psbA
trnL intron
trnL-trnF
trnQ-5'rps16
trnS-rps4
trnS-trnfM
trnS-trnG
trnT-trnL
ycf6-psbM
[Figure 14.5A, ITS region: 18S nrDNA, ITS1, 5.8S nrDNA, ITS2, and 26S nrDNA, with primer sites including ITS5 and ITS3.]
ANALYSIS OF DNA SEQUENCE DATA
DNA sequence data is converted to characters and character
states to be used in phylogenetic analyses. First, the sequences
of a given length of DNA are aligned, in which homologous
nucleotide positions (e.g., corresponding to the same codon
GENE    FUNCTION
atpB    Beta subunit of ATP synthase, which functions in the synthesis of ATP via proton translocation
rbcL    Large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO), which functions in the initial fixation of carbon dioxide in the dark reactions
matK    Maturase, which functions in splicing type II introns from RNA transcripts
ndhF    Subunit of chloroplast NADH dehydrogenase, which functions in converting NADH to NAD+ + H+, driving various reactions of respiration
which contains multiple DNA copies (as opposed to single
copies found in most protein-coding genes). The ITS region
lies between the 18S and 26S nuclear ribosomal DNA
(nrDNA); the ITS region is divided into two subregions, ITS1
and ITS2, separated by the 5.8S nrDNA (Figure 14.5A). ITS
sequence data has been most valuable for inferring phylogenetic
relationships at a lower level, e.g., between closely
related species. However, it has also been used in elucidating
higher level relationships. (See Baldwin et al. 1995.)
A related DNA sequence region is the external transcribed
spacer (ETS) region. The ETS region lies between 26S and 18S
nrDNA, adjacent to the latter (Figure 14.5B). (The entire region,
including both the ETS and the nontranscribed spacer (NTS)
region, is known as the intergenic spacer region, or IGS; see
Figure 14.5B.) The ETS region contains even more sequence
variation than ITS and is useful in analyses at lower taxonomic
levels. (See Baldwin and Markos 1998.)
TABLE 14.1 Some chloroplast genes (atpB, rbcL, matK, ndhF) that have been used in plant molecular systematics, after Soltis et al. 1998.
[Figure 14.5B, IGS region: 26S nrDNA, NTS, ETS, and 18S nrDNA, with primer sites ETS-Hel-1, ETS-Hel-2, 26S-IGS, 18S-E, 18S-IGS, and 18S-ETS; primer sites ITS2 and ITS4 belong to panel A.]
FIGURE 14.5 A. Internal transcribed spacers (ITSs) of nuclear ribosomal DNA, illustrating the ITS region and flanking subunits, and showing the orientations and locations of primer sites. After Baldwin et al. (1995). B. External transcribed spacer (ETS) of the intergenic spacer (IGS) region, also showing orientations and locations of primer sites. After Baldwin and Markos (1998).
However, a major addition, deletion, inversion, or translocation
can in itself be identified as an evolutionary novelty
(apomorphy), used in grouping lineages together. For example,
many members of the Faboideae (of the Fabaceae) lack, by
deletion, one of the inverted repeats found in the chloroplasts
of most angiosperms (see Figure 14.4). Chromosomal mutations
such as these may be coded separately from single base
differences (e.g., as in the example of Figure 14.6) and may
be given relatively greater weight in inferring relationships.
Several types of weighting schemes may be applied to
molecular data. For protein-coding genes, the codon positions
may be differentially weighted. For example, because
of redundancy of the genetic code, the third codon position
is generally more labile (a change more likely to have occurred
randomly) than the second, and the second may be more
labile than the first. Thus, the first and second codon positions
may be given relatively greater weight than the third
(such as a weight of 10 for the first codon position, 5 for the
second position, and 1 for the third position). The logic here
is that a change in codon position 1 or 2 is less likely to
have occurred at random within a taxon and more likely
represents an evolutionary novelty that is shared among taxa.
Weighting by codon position may be based on empirical data.
For a given data set, the number of changes occurring
for codon positions 1, 2, and 3 may be used (inversely) to
establish the relative weights.
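As a rough illustration of empirical, inverse weighting by codon position, the sketch below scales each weight to the most frequently changing position. The function name and the change counts are hypothetical (the counts were chosen so that the result matches the 10 : 5 : 1 example given above), and every position is assumed to show at least one change.

```python
def inverse_weights(change_counts):
    """Return integer weights inversely proportional to the observed number of changes."""
    # Assumes every count is greater than zero; the most labile position gets weight 1
    most_changes = max(change_counts.values())
    return {position: round(most_changes / n) for position, n in change_counts.items()}

# Hypothetical numbers of inferred changes at codon positions 1, 2, and 3
observed_changes = {1: 10, 2: 20, 3: 100}
codon_weights = inverse_weights(observed_changes)
print(codon_weights)
```

Other scalings (for example, keeping fractional weights rather than rounding) would serve equally well; only the relative sizes of the weights matter.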
Another weighting parameter that may be used with DNA
sequence data concerns transitions versus transversions.
Transitions are evolutionary changes from one purine to
another purine (A → G or G → A) or from one pyrimidine
to another pyrimidine (C → T or T → C); see Figure 14.1.
Transversions are evolutionary changes from a purine to
a pyrimidine (A → C, A → T, G → C, or G → T) or from
a pyrimidine to a purine (C → A, C → G, T → A, or T → G).
Weighting using transitions versus transversions may be based
on empirical data. For a given data set, the relative frequency
[Figure 14.6, DNA Alignment panel: aligned sequences of Taxon 1 through Taxon 8 for positions 81-121.]
Character Coding
           1  2  3  4  5  6
Taxon 1    2  0  3  2  0  4
Taxon 2    2  0  3  1  0  5
Taxon 3    2  3  0  2  3  4
Taxon 4    2  3  3  2  3  4
Taxon 5    2  0  3  1  0  5
Taxon 6    1  0  3  1  0  4
Taxon 7    1  0  3  1  0  4
Taxon 8    2  3  3  1  0  4
FIGURE 14.6 Example of alignment of DNA sequences of 41 nucleotide sites (positions 81-121) from eight taxa. Variable nucleotide
sites are in bold. Note deletion of six bases in taxon 2 and taxon 5. Possible character coding of variable sites is seen at right. Coding of
nucleotides is as follows: A = state 0; C = state 1; G = state 2; T = state 3. In this example, the deletion is coded as a single binary character
(character 6), coded differently from nucleotides, as state 4 = deletion absent and state 5 = deletion present.
of transitions versus transversions may be used (inversely) to
establish the relative weights. For example, for a given group
under study, if transitions occur 5x more frequently than
transversions, the latter may be given a weight of 5 and the
former a weight of 1, as illustrated in the step matrix of
Figure 14.7.
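The classification of changes and the resulting step matrix can be sketched as follows. The function names and default costs are illustrative assumptions; the costs of 1 and 5 simply follow the example in the text, and the base order A, G, C, T is chosen only to group purines and pyrimidines.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(a, b):
    """Classify a change between two nucleotides."""
    if a == b:
        return "none"
    # A change within the purines or within the pyrimidines is a transition
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def build_step_matrix(transition_cost=1, transversion_cost=5):
    """Return a nested dict giving the cost of changing from one base to another."""
    costs = {"none": 0, "transition": transition_cost, "transversion": transversion_cost}
    bases = "AGCT"
    return {a: {b: costs[substitution_type(a, b)] for b in bases} for a in bases}

ts_tv_matrix = build_step_matrix()
for base, row in ts_tv_matrix.items():
    print(base, [row[b] for b in "AGCT"])
```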
These weighting schemes may be viewed as a simplified
component of a process that may be quite complex, taking
into account, e.g., rate of base substitution, base frequency,
and branch length in determining an evolutionary model.
Evolutionary models are commonly used in maximum likelihood
and Bayesian analyses. (See Chapter 2.)
DNA sequence data can also be used to evaluate the
secondary structure of a molecule. Thus, nucleotide differences
that result in major changes in the conformation of the
product (whether ribosomal RNA or protein) may have
a much greater physiological effect than those that do not
and might receive a higher weight. Computer algorithms can
evaluate this to some degree.
        A   G   C   T
    A   0   1   5   5
    G   1   0   5   5
    C   5   5   0   1
    T   5   5   1   0
FIGURE 14.7 Step matrix of nucleotide changes, showing weighting
scheme in which transversions are given a weight 5 times greater
than that of transitions.
Parsimony, maximum likelihood, and Bayesian methods
are commonly used to infer phylogenetic relationships using
DNA sequence data (Chapter 2). The most robust hypotheses of
relationship are generally those using a large taxon sampling and
sequence data from multiple (e.g., anywhere from 3 to 20+)
genes and/or sequence regions.
RESTRICTION SITE ANALYSIS (RFLPs)
A restriction site is a sequence of approximately 6-8 base
pairs of DNA that is recognized and bound by a given restriction enzyme.
These restriction enzymes, of which there are many, have
been isolated from bacteria. Their natural function is to inactivate
invading viruses by cleaving the viral DNA. Restriction
enzymes known as type II recognize restriction sites and
cleave the DNA at particular locations within or near the
restriction site. An example is the restriction enzyme EcoRI
(named after E. coli, from which it was first isolated), which
recognizes the DNA sequence seen in Figure 14.8 and cleaves
the DNA at the sites indicated by the arrows in this figure.
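A minimal sketch of a restriction digest follows. It relies on the well-known EcoRI recognition sequence GAATTC, cut between the G and the first A on the strand shown; that sequence is standard knowledge rather than something recovered from the text above. The function name and the two toy sequences are hypothetical, and only one strand of a linear molecule is considered.

```python
ECORI_SITE = "GAATTC"    # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1     # the cut falls after the G on the strand shown

def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return the fragment lengths produced by cutting a linear sequence at every site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]

# Hypothetical sequences: species A carries two EcoRI sites, species B carries three
species_a = "ATGC" * 3 + ECORI_SITE + "ATGC" * 5 + ECORI_SITE + "ATGC" * 2
species_b = ("ATGC" * 3 + ECORI_SITE + "ATGC" * 2 + ECORI_SITE
             + "ATGC" * 3 + ECORI_SITE + "ATGC" * 2)

fragments_a = digest(species_a)
fragments_b = digest(species_b)
print(fragments_a, fragments_b)
```

Two sites yield three fragments and three sites yield four, so the two toy species would show different banding patterns after digestion.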
Restriction fragment length polymorphism, or RFLP,
refers to differences between taxa in restriction sites, and
therefore the lengths of fragments of DNA following cleavage
with restriction enzymes. For example, Figure 14.9 shows,
for two hypothetical species, amplified DNA lengths of
10,000 base pairs that are subjected to (“digested with”) the
restriction enzyme EcoRI. Note, after a reaction with the
EcoRI enzyme, that the DNA of species A is cleaved into
three fragments, corresponding to two EcoRI restriction sites,
whereas that of species B is cleaved into four fragments,
corresponding to three EcoRI restriction sites. The relative
FIGURE 14.8 A DNA restriction site, cleaved (at arrows) by the
restriction enzyme EcoRI.
locations of these restriction sites on the DNA can be mapped;
one possibility is seen at the bottom of Figure 14.9. (Note that
there are other possibilities for this map; precise mapping
requires additional work.) Additional restriction enzymes
can be used. Figure 14.10 illustrates how each of the DNA
fragments from the EcoRI digests can be digested with the
BamHI restriction enzyme, yielding different fragments for
the two species. These data can be added to the original in
preparing a map (one possible map is shown in the lower part of
Figure 14.10).
Restriction site fragment data can be coded as characters
and character states in a phylogenetic analysis. For example,
given that the restriction site maps of Figure 14.10 are
correct, the presence or absence of these sites can be coded as
characters, as seen in Figure 14.11. Restriction site analysis
provides far less data than complete DNA sequencing,
accounting only for the presence or absence of sites 6-8 base
pairs long. It has the advantage, however, of surveying considerably
larger segments of DNA. With improved
and less expensive sequencing techniques, it is now less valuable
and less often used than in the past.
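Presence/absence coding of mapped restriction sites might be sketched as below. The site positions are invented for illustration and are not taken from Figures 14.9 to 14.11; the function name is likewise hypothetical.

```python
def code_restriction_sites(site_maps):
    """Code each mapped restriction site as a binary character (1 = present, 0 = absent)."""
    all_sites = sorted(set().union(*site_maps.values()))
    matrix = {
        taxon: [1 if site in sites else 0 for site in all_sites]
        for taxon, sites in site_maps.items()
    }
    return all_sites, matrix

# Hypothetical mapped site positions, in base pairs from one end of the DNA segment
site_maps = {
    "species_A": {2000, 7000},
    "species_B": {2000, 5000, 7000},
}
site_positions, site_matrix = code_restriction_sites(site_maps)
print(site_positions)
print(site_matrix)
```

Each column of the resulting matrix is one character; here only the site at position 5000 differs between the two species.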
ALLOZYMES
Allozymes are different molecular forms of an enzyme that
correspond to different alleles of a common gene (locus).
(This is not to be confused with isozymes, which are forms of
an enzyme that are derived from separate genes or loci.)
Allozymes are traditionally detected using electrophoresis, in
which the enzymes are extracted and placed on a medium
(e.g., starch) through which an electric current runs (similar
to gel electrophoresis in DNA sequencing). A given enzyme
will migrate toward one pole or the other depending on
its charge. Similarly, different allozymes of an enzyme will
migrate differentially because they differ slightly in amino
acid composition and therefore have somewhat different
electrical charges. Allozymes subjected to electrophoresis
are identified with a stain specific to that enzyme and the
bands marked by their relative position on the electrophoresis
medium.
Allozymes have traditionally been used to assess genetic
variation within a population or species, but they can also be
used as data in phylogenetic analyses of closely related species,
e.g., species within a monophyletic genus. Figure 14.12A
illustrates an example of electrophoretic allozyme banding
data for five species and an outgroup.
There are several ways to code polymorphic allozyme data.
One way is to code each allele as a character and the presence
or absence of that allele as a character state. A second way
to code allozyme data is to treat the locus (corresponding
to the gene coding for the enzyme) as the character and
all unique combinations of alleles as character states (as in
Figure 14.12B). The number of state changes between these
unique allelic combinations can be a default of one. However,
another method of coding is to treat the loss of each allele as
one state change and the gain of an allele as a separate state
change. Thus, the number of state changes between different
allelic combinations can vary, as seen in Figure 14.12C. Step
matrices (see Chapter 2) are used to code these in a cladistic
analysis.
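One way to read the gain/loss coding scheme is that the cost of changing between two allelic combinations equals the number of alleles gained plus the number lost. The sketch below builds a step matrix on that assumption; the allele names and combinations are hypothetical and are not those of Figure 14.12.

```python
def allele_step_cost(combo_a, combo_b):
    """Count alleles gained plus alleles lost between two allelic combinations."""
    # The symmetric difference holds every allele present in exactly one combination
    return len(set(combo_a) ^ set(combo_b))

# Hypothetical character states: each state is a unique combination of alleles at one locus
allelic_states = {0: {"a"}, 1: {"a", "b"}, 2: {"b", "c"}}

allozyme_step_matrix = {
    i: {j: allele_step_cost(x, y) for j, y in allelic_states.items()}
    for i, x in allelic_states.items()
}
for state, row in allozyme_step_matrix.items():
    print(state, row)
```

Under this reading, going from state 0 to state 1 costs one step (gain of allele b), whereas going from state 0 to state 2 costs three (loss of a, gain of b and c).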
Yet another way to code allozyme data is to take into account
the frequency of alleles present in a given taxon. For example, by
this method, species A, which has allele X present with a frequency
of 95% and allele Y with a frequency of 5%, would
receive a different coding from species B, which has the same
alleles, but in frequencies of 55% and 45%, respectively.
MICROSATELLITE DNA
Microsatellites are regions of DNA that contain short
repeated units (usually 2-5 nucleotides long), an example being
TGTGTG, in which a unit of two base pairs (TG) repeats. The regions are
termed tandem repeats; if they vary within a population
or species, they are called variable-number tandem
repeats (VNTR). (Other designations and acronyms are
used, depending on the particular field of study.) These
tandem repeats can be located all across the genome; at a
given location (locus), the repeat will tend to be of a certain
length. However, individuals within or between populations
may vary in the number of tandem repeats at a given
locus (or even show allelic variation) because of irregularities
in crossing-over and replication. Thus, variable-number
tandem repeats can be used as a genetic marker.
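Variation in repeat number can be illustrated by counting consecutive copies of a repeat unit in sequences from different individuals. This is only a sketch: the flanking sequences, individual names, and function name are hypothetical, and in practice repeat number is usually inferred from the length of the amplified fragment rather than read directly from sequence.

```python
import re

def longest_tandem_repeat(sequence, unit):
    """Return the largest number of consecutive copies of `unit` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical alleles at one locus: same flanking regions, different numbers of TG repeats
individuals = {
    "plant_1": "ACCG" + "TG" * 4 + "AATC",
    "plant_2": "ACCG" + "TG" * 6 + "AATC",
}
repeat_counts = {name: longest_tandem_repeat(seq, "TG") for name, seq in individuals.items()}
print(repeat_counts)
```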
Microsatellites are identified by constructing primers that
flank the tandem repeats and then using PCR technology.