Molecular Methods in Anthropology Module Leslie A. Knapp

Molecular Methods in Anthropology Module
Leslie A. Knapp
Department of Biological Anthropology
University of Cambridge
Table of Contents
Introduction and Aims
Molecular Anthropology and the Human Genome
    Nuclear DNA
    Mitochondrial DNA
Sources of DNA and Biological Sample Collection
    Invasive versus non-invasive sampling
    Ancient/archival DNA
    DNA Extraction
    In The Lab 1: DNA Extraction
Methods and Application of Molecular Hybridization
    In The Lab 2: DNA-DNA hybridization
    Restriction Fragment Length Polymorphisms (RFLPs) and DNA Fingerprinting
    In The Lab 3: Southern Hybridization
Principles and Applications of the Polymerase Chain Reaction (PCR)
    In The Lab 4: PCR
    Allele-specific PCR
    Advantages and disadvantages of PCR
Gel electrophoresis
    In The Lab 5: Gel Electrophoresis
DNA sequencing
    In The Lab 6: DNA Sequencing
Repetitive DNA Sequences
    Dispersed repeats
    Clustered repeats
    Identifying individuals with microsatellites
DNA-based Trees and Evolution
    Species trees versus gene trees
    How are we related to the Neandertals?
Protein Structure and Function
    Protein structure
    The functional diversity of proteins
    mRNA studies
    BOX 1: MHC genes, immune response and evolution
Recombinant DNA Technology and Human Evolution
    In The Lab 7: DNA cloning
Acknowledgements
Suggested Discussion Questions
Bibliography and Suggested Readings
Glossary
Preface
This supplement is intended to accompany your Wadsworth Anthropology
textbook. The enclosed printed copy is provided as a courtesy with the
purchase of your book. To best utilize this supplement, please go to the online version at the following web address:
http://www.wadsworth.com/anthropology_d/special_features/ext/molecular_methods/
user name: nucleus
password: cytoplasm
The online version includes color photographs, hot-linked chapter topics,
and live weblinks. Plus, you can answer the questions for thought at the end
of the chapter online and email your responses to your instructor.
Introduction and Aims
Genetic methods have always played an important part in physical anthropology.
Historical figures such as Galton in the 1800s and Landsteiner in the early 1900s
represent some of the earliest pioneers of anthropological genetics. Laboratory methods
for identifying human blood types such as ABO have been available since 1901, when
Landsteiner described this well-known blood group system. Not long after this,
laboratory methods for identifying disease-causing genes became available to medical
and biological scientists. In Norway, biochemical techniques were developed in the
1930s to determine if newborns had an inherited disease called phenylketonuria (PKU),
which could seriously damage the developing nervous system. Eventually, however,
biochemical approaches were replaced by techniques that could be used to examine DNA
directly, at the molecular level.
The same molecular biology techniques used in medicine can be used to study
human variation and, as a consequence, anthropologists now use molecular biology
techniques to study humans and to explore evolutionary relationships in humans and their
primate relatives. In some cases, molecular methods have been used to supplement fossil
evidence. In other cases, molecular biology techniques have changed the way in which
anthropologists study modern humans and nonhuman primates.
This module explores how molecular anthropologists use genetic methods and
applications to study genetic variation and evolution in humans and nonhuman primates.
Accordingly you will learn about some of the common laboratory methods being used to
explore these topics in ways that would have been impossible even 10 years ago.
Detailed laboratory protocols can be found in many manuals and papers, including
references cited at the end of this module. Specific examples will be drawn from up-to-date research on human evolutionary origins and comparative primate genomics[1] to
demonstrate that scientific research is an ongoing process with theories frequently being
questioned and re-evaluated.
Molecular Anthropology and the Human Genome
Molecular and biochemical studies of human variation began in the early 1900s,
but the practical application of biochemistry and genetics to the field of anthropology did
not begin until the 1960s. “Molecular anthropology,” as it was originally named by
Emile Zuckerkandl in 1962, described the use of biochemistry to understand human
evolution. Since that time, molecular studies of human diversity and evolution have
expanded to include molecular genetic investigations of human variation and
evolutionary relationships within and between humans and nonhuman primates, as well
as the application of molecular genetics to the study of human and nonhuman primate
behavior.
[1] See glossary at end of module for definitions of all terms in bold face.
In the 1980s, the Human Genome Project set out to obtain a complete
description of the human genome by determining the precise DNA sequence of all 46
human chromosomes. The rationale was that fundamental information concerning our
genetic make-up would help us understand the role of genes in health and disease, while
furthering our scientific knowledge of human genetics in general. Although the Human
Genome Project began through the support of the U.S. Department of Health and
Human Services, it has now become a major international project and complementary
research programs have been established in the United Kingdom (England, Scotland and
Wales), France, Japan and many other nations. Coordination of these international
efforts has been undertaken by the Human Genome Organization (HUGO), which
facilitates discussions of the ethical, legal and social issues of human genome research.
Another of the major goals of the Human Genome Project, and later HUGO, is to
study the genomes of nonhuman organisms to determine similarities that may help in
understanding health, disease and even evolution. Not surprisingly, most of the effort on
nonhumans has been focused on model organisms, such as mice, fruit flies and yeast.
However, some researchers have argued that detailed studies of our closest relatives, the
primates, are also needed. Currently, efforts are being directed toward particular
nonhuman primates such as chimpanzees and rhesus macaques. There is still a great deal
to learn about our own genome, as well as that of our close relatives.
Nuclear DNA
Although the precise order of nucleotides and the relative importance of some
regions of DNA in humans have yet to be determined, a great deal is known about what
makes up the human genome. Based on the hard work of many scientists throughout the
world, we now have a blueprint of the human genome. Generally, we know that the
human genome is made up of two basic components (Figure 1). The nuclear genome
(the genetic material contained in chromosomes) contains DNA inherited from both
parents. Nuclear DNA is found only in the cell nucleus and, as a rule, each cell contains
just one copy of the nuclear genome, which is made up of approximately 3 billion
nucleotide or base pairs (bp), organized into chromosomes. Surprisingly, only about
20% of the nuclear genome consists of genes and gene-related sequences. We know that
genes contain protein-coding segments called exons. But they also contain non-coding
regulatory regions and introns. As a consequence, more than 90% of gene and gene-related
sequence is considered non-coding DNA. Non-coding sequences also include
pseudogenes, which are genes with deletions, insertions or missense mutations that
interfere with the gene’s function. Gene fragments may also arise from unequal crossing-over during meiosis.
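The proportions in this paragraph can be combined into a rough, back-of-the-envelope estimate. The sketch below uses only the approximate figures quoted in the text (3 billion bp, about 20% genes and gene-related sequences, less than 10% of that coding). The variable and function names are illustrative, and the result is an upper bound rather than a measured value.

```python
# Rough arithmetic using only the approximate figures quoted in the text.
GENOME_BP = 3_000_000_000        # ~3 billion base pairs in the nuclear genome
GENE_RELATED_FRACTION = 0.20     # ~20% genes and gene-related sequences
CODING_FRACTION_OF_GENES = 0.10  # at most ~10% of that is protein coding


def coding_upper_bound(genome_bp, gene_fraction, coding_fraction):
    """Return (gene-related bp, upper bound on coding bp) as integers."""
    gene_related_bp = round(genome_bp * gene_fraction)
    coding_bp = round(gene_related_bp * coding_fraction)
    return gene_related_bp, coding_bp


gene_bp, coding_bp = coding_upper_bound(
    GENOME_BP, GENE_RELATED_FRACTION, CODING_FRACTION_OF_GENES
)
print(f"Genes and gene-related sequences: about {gene_bp:,} bp")
print(f"Protein-coding upper bound: about {coding_bp:,} bp "
      f"({coding_bp / GENOME_BP:.0%} of the nuclear genome)")
```

On these figures, protein-coding sequence accounts for no more than a few percent of the nuclear genome.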
An even larger part of the nuclear genome consists of what is known as
extragenic DNA, repetitive or unique sequences that do not, at present, seem to contain
protein-coding information. These sequences are composed of repeated strings of
nucleotides, some of which are dispersed throughout the genome while others are
clustered together. (The repetitive sequences are especially numerous and alone make up
more than 40% of the nuclear genome; we will discuss them in greater detail later.)
Some of these repetitive sequences have been used to study evolution and to determine
paternity and relatedness in humans and other primates.
Figure 1. The Human Genome is composed of the mitochondrial and nuclear genomes.
    mtDNA: 37 genes
    Nuclear DNA: 3 billion bp
        Genes and gene-related sequences (20-30%)
            Coding (<10%)
            Non-coding (>90%)
        Extragenic DNA (70-80%)
            Repetitive (20%)
            Unique sequences (80%)

Mitochondrial DNA (mtDNA)
The second basic component of the human genome is found in the mitochondria,
the small energy-producing organelles in the cytoplasm of a cell. Mitochondria contain
their own DNA (also called the mitochondrial genome) and they are very similar in
humans and nonhuman primates. Mitochondrial DNA (mtDNA) is double-stranded,
like nuclear DNA. However, it differs from the nuclear genome in that it forms a closed
ring instead of being organized into chromosomes. Almost 90% of this circular
molecule, comprising approximately 17,000 base pairs, is made up of protein coding
sequences. The mitochondrial genome includes 24 genes that code for RNA molecules
(22 transfer RNAs and 2 ribosomal RNAs) and 13 that code for proteins required for
energy production. There are no introns, few repetitive sequences and just one
non-coding sequence, which initiates transcription and replication of the mitochondrial
genome. Three other critical differences separate the nuclear and mitochondrial genomes.
First, mtDNA is inherited only from the mother, in the egg’s cytoplasm. Second,
recombination does not occur in mtDNA because it is inherited as an identical copy of
the mother’s. Third, multiple copies of mtDNA are present in each cell, since a cell
contains many mitochondria.
The differences in the structure and composition of genes, patterns of inheritance
and number of copies of the nuclear and mitochondrial genome in each cell make it
possible for scientists to study genetic relationships between distantly, or closely, related
organisms. For related species, such as Old World monkeys, apes and humans,
molecular geneticists often study regions of the genome that evolve relatively slowly.
These regions are usually protein coding DNA sequences that are shared by different
species due to common ancestry. Genes like these are considered homologous, or shared
due to inheritance from a common ancestor.
Most species, if they are not too closely related, can also be studied using parts of
the genome that accumulate mutations at a fairly constant “clock-like” rate. Pseudogenes
and non-coding DNA sequences are said to accumulate neutral mutations and, as a
consequence, more closely related species should differ by fewer mutations than more
distantly related species. (In other words, pseudogenes and non-coding sequences should
be more similar in closely related species than they are in more distantly related ones.)
However, detailed studies of homologous DNA sequences in different species have made it
clear that rates of mutation differ between lineages and at different times in an
organism’s history. The rates of change can be influenced by natural selection operating
on the genes themselves or on nearby regions of the genome. Rates of change are also
affected by generation length, rates of DNA repair and even the nucleotide sequence
itself.
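The clock-like reasoning above can be made concrete with a short sketch. It counts the proportion of differing sites between two aligned sequences (the p-distance) and converts it to a divergence time with the standard strict-clock relation T = d / (2r), where r is the substitution rate per site per year. The sequences and the rate below are hypothetical, chosen only for illustration, and the calculation ignores multiple substitutions at the same site.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def divergence_time(distance, rate_per_site_per_year):
    """Strict-clock estimate: both lineages accumulate changes, hence 2 * rate."""
    return distance / (2 * rate_per_site_per_year)


# Hypothetical 20-bp alignment from two species (not real data).
species_1 = "ATGCCGTAAGCTTAGGCTAA"
species_2 = "ATGCCATAAGCTTAGGTTAA"

d = p_distance(species_1, species_2)  # 2 differences over 20 sites
assumed_rate = 1e-8                   # hypothetical substitutions/site/year
print(f"p-distance: {d:.2f}")
print(f"Estimated divergence: {divergence_time(d, assumed_rate):,.0f} years")
# For the same distance, a lineage evolving 10 times faster gives a
# 10-fold younger estimate, which is why rate variation matters.
print(f"With a 10x faster rate: {divergence_time(d, 10 * assumed_rate):,.0f} years")
```

The estimate is only as good as the assumed rate, which is the weakness the paragraph above describes.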
Although “molecular clock” studies of modern humans suggest that
“mitochondrial Eve,” our common female ancestor, lived approximately 200,000 years
ago in Africa, recent studies in other species indicate that molecular clocks do not work
perfectly. Based on comparative studies in different species, it is now clear that
molecular clocks tick at different rates in different species and at different times. For
example, one of the genes involved in hemoglobin production called alpha globin has
evolved 10 times faster in baboons than in rhesus macaques (Shaw et al., 1989). Another
example comes from langurs: compared to other Old World monkeys, they show
a 2.5-fold increase in the rate of nucleotide substitutions in the gene for a digestive
enzyme (Messier and Stewart, 1997).
When organisms are quite closely related, it may be difficult to identify
significant evolutionary differences between individuals. Therefore, scientists often
choose to study regions of the genome that accumulate mutations rapidly. For example,
if we want to investigate evolutionary relationships between different human populations,
rapidly mutating regions of the mitochondrial genome can be studied. Rates of
divergence are 1.5 to 5 times greater in these segments than they are in protein coding
genes. The mitochondrial genome is also useful for evolutionary studies because we
have a complete picture of the DNA sequence and order of genes in human mitochondria.
Furthermore, there is very little difference between the mitochondrial genomes of humans
and most nonhuman primates. Thus, comparisons can also be made between
humans and other primate species using mitochondrial genes. However, the examples of langur
digestive enzymes and baboon alpha globin demonstrate that molecular clocks do not
have the same rate in all species. As a consequence, divergence dates derived using
molecular clocks should be accepted with caution (see Graur and Martin, 2004).
Sources of DNA and Biological Sample Collection
One of the most hotly debated issues in anthropology has been the origin of
anatomically modern humans. Since the fossil record for this time period is not complete
enough to support any one scenario, two contrasting models were proposed. The multiregional model argues that ancestral populations of Homo erectus gave rise to all modern
Homo sapiens in the Old World. According to this model, modern humans would be
genetically very similar to one another since there would have been extensive gene flow
for a very long time. Alternatively, the single origin model (also known as the Out of
Africa model) argues that Homo sapiens
originated as a single population in Africa and, as a consequence, only African
populations would exhibit extensive genetic variation.
While fossil evidence supporting one of these two models may be discovered
eventually, the study of DNA in modern human populations can provide new insight into
recent evolutionary events in human history. Mitochondrial DNA from Africans, Asians
and Europeans lends support to the Out of Africa model in two ways. First, modern
human genetic variation is generally small. Second, Africans display the greatest degree
of genetic variation.
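One common way to put a number on "degree of genetic variation" is the average number of differences between every pair of sequences in a sample. The sketch below computes it for two hypothetical sets of short, aligned sequences. The sequences and population labels are invented for illustration and are not taken from the studies described above.

```python
from itertools import combinations


def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = sum(
        sum(1 for a, b in zip(s1, s2) if a != b) for s1, s2 in pairs
    )
    return total / len(pairs)


# Hypothetical aligned mtDNA fragments (toy data, not real sequences).
population_a = ["ACGTACGTAC", "ACGTTCGTAC", "ACATACGTAA", "TCGTACGGAC"]
population_b = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]

diversity_a = mean_pairwise_differences(population_a)
diversity_b = mean_pairwise_differences(population_b)
print(f"Population A: {diversity_a:.2f} differences per pair")
print(f"Population B: {diversity_b:.2f} differences per pair")
```

In this toy comparison, population A is the more variable sample. Real studies apply the same idea to much longer sequences and many more individuals.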
Studies of DNA can also be used to examine more ancient evolutionary events,
such as the divergence of apes and humans. Using DNA from modern humans and apes
such as chimpanzees, gorillas and orangutans, anthropologists have demonstrated that
humans and chimpanzees are more closely related to one another than either is to gorillas
or orangutans. The DNA data can also be used to construct a molecular clock that
estimates the human/chimpanzee divergence at a little more than 5 million years ago.
These two examples demonstrate how molecular studies provide important insight
into human and primate evolutionary history. As you will see, studies of DNA are also
useful for identifying individuals and for determining how closely related individuals are
within modern human and nonhuman primate populations. However, the success or
failure of molecular genetic analyses depends on collecting the most suitable biological
sample and storing it in such a way as to minimize damage or degradation. In general, a
biological sample is simply a specimen that contains nucleated cells and, therefore,
DNA. All cells with a nucleus contain a copy of an individual’s genome, but not all samples
are equally useful for molecular genetic studies.
Invasive versus non-invasive sampling
For most studies, an ideal sample would be fresh blood or tissue since they
contain relatively large numbers of nucleated cells and yield high concentrations of good
quality DNA. Whole blood samples contain red and white blood cells and platelets
suspended in a watery fluid called plasma. Plasma, which makes up more than 50
percent of total human blood volume, contains proteins required for clotting and
immunity, but these substances do not contain DNA. Neither do red blood cells or
platelets (essential for blood clotting). But the white blood cells, well known for their
role in immune defense, do contain nuclei with DNA and they are excellent sources of
DNA.
Whenever blood samples are collected for genetic studies, the blood must be
mixed with anti-coagulants to prevent clotting because when blood samples clot, it is
difficult to separate out the white cells. However, the problems associated with clotting
can be avoided when tissue samples are available. Nearly all tissue (such as liver, muscle
or skin) cells contain a nucleus, with a complete copy of an individual’s genome, and
therefore, they can provide abundant sources of DNA. Unfortunately, blood and tissue
samples (derived through invasive techniques) are often difficult to obtain except when
researchers are based in close proximity to clinical settings, where blood and tissue
samples can be obtained safely and without discomfort to study subjects. When studying
nonhuman primates in the field, researchers must sedate the animal, collect the blood or
tissue sample and then ensure that the animal will recover in a safe location, away from
predators or other dangers. Blood and tissue samples also require proper storage in
refrigerators or freezers to prevent degradation of DNA. In field studies of human
populations, blood samples can be obtained by pricking a subject’s finger with a needle
and collecting the blood sample on sterile filter paper. This approach eliminates the need
for immediate refrigeration, but typically yields low concentrations of DNA and often
causes minor discomfort to study subjects.
Non-invasively collected samples generally do not contain large numbers of
nucleated cells, but they are usually much easier to collect. In the field it is often possible
to obtain cells scraped from the inside of a subject’s cheek. Often, cheek cells yield poor
quality DNA due to the presence of salivary enzymes that break down cells in the mouth.
Particular types of foods, drinks and activities (such as gum chewing or smoking) also
have a negative effect on DNA yields. Interestingly, cheek cells can also be collected
from nonhuman primates by rinsing the surface of wads of vegetation (called wadges)
that have been chewed and spit out. These samples have been particularly useful for
studies in chimpanzees.
Hair follicles also contain nucleated cells and these provide another way to collect
DNA-containing samples without much discomfort for study subjects. Ideally, hairs are
plucked to ensure that cells are numerous and fresh. Hairs that have been shed are a poor
source of DNA since only a small number of cells are attached to the hair shaft and these
cells are in the process of degradation. There are also problems with contamination since
there can be more DNA from the individual collecting the sample than from the
individual that shed the hair. Wearing gloves during sample collection may reduce the
possibility of contamination, but molecular genetic studies of shed hairs are notoriously
difficult. Thus, even though this approach avoids disruption of the study subjects, it
requires a great deal of effort to obtain accurate results. Consequently, new techniques
for using shed hairs for molecular genetic research are still in development.
For studying many nonhuman primates, it is not possible to obtain food wadges or
hairs. Instead, researchers can only collect waste products from animals. Urine and feces
may seem useless, but actually they contain significant numbers of nucleated cells from
the individual. Also, urine and feces are plentiful and they are ideal for those who do not
want to disturb their study subjects or cannot get close enough to collect any other type of
sample. Urine samples can even be caught in containers in mid-air from animals
overhead! (Give this a try some time if you are feeling adventurous.) And, although it is
not nearly as exciting, fecal samples can be collected directly off the ground. Scientists
who use feces suggest that the surface of the sample provides the most cells. However,
when the animal is a carnivore, there may be contamination from the cells of digested
prey animals.
Recently there has been an increase in the number of studies relying on urine and
fecal samples, even though many researchers emphasize the difficulties in obtaining
DNA and repeatable results from these sources. Carefully controlled DNA-based studies
of chimpanzee feces have shown that the low DNA content of fecal samples can lead to
incorrect results (see Morin et al., 2001). Additionally, the presence of microbial DNA in
wild gorilla feces can lead to parent-offspring mismatches and even incorrect paternity
determinations (see Bradley and Vigilant, 2002).
Ancient/archival DNA
Archival samples, such as skins and teeth from museum collections, may provide
DNA for molecular genetic studies, but they present problems that are similar to those
described for non-invasively collected samples. First, many museum skins have been
treated with preservatives that destroy cells and degrade DNA. Second, even when skins
have not been treated, much of the DNA has been degraded by high storage temperatures
and normal aging over time. Researchers may also find DNA-containing cells within the
pulp or dentin of teeth, but this material degrades rapidly and few cells will contain
significant amounts of nuclear DNA.
More ancient samples, such as Neandertal or even modern human bone, pose
similar problems since very few cells may be found in this material and most of the DNA
will be highly degraded. Moreover, the problem of contamination from the researchers
themselves is even more serious than with non-invasively collected samples like
shed hair. To avoid contamination, studies of ancient DNA must be conducted in
isolated, specially sealed rooms that do not allow the introduction of modern DNA since
only minute amounts of ancient DNA can be obtained from these samples. Few
laboratories have the space, or resources, to undertake studies of ancient samples, and
those that do frequently bemoan the costs and difficulties associated with these studies.
In addition to the scarcity of DNA, archival and ancient samples are problematic
because the sample is destroyed during the DNA extraction process. Consequently, the
museums entrusted with the protection of these valuable and unique specimens must
impose very strict guidelines and restrictions. To deal with the problems and limitations
involved in the use of these samples, new techniques are currently being developed in a
number of laboratories throughout the United States and in other countries.
Whatever type of biological sample is collected, permits for collection and
transport will be required. For example, because of ethical concerns, permission must be
obtained to collect human samples. This may involve a comprehensive risk assessment, a
review of the research project’s aims and a plan for obtaining informed consent from all
study subjects. In the United States, scientists studying the genetic history of human
populations must obtain meaningful informed consent from people who donate DNA
samples and there can be no record of medical or personally identifying information
about the donors. In the United Kingdom, the recently developed Human Tissue Act
also requires that researchers obtain permission for every specific use of a biological
sample. When blood, tissue or hair samples are collected from primates, export and
import permits must be obtained both from local government agencies and through the
Convention on International Trade in Endangered Species of Wild Fauna and
Flora (CITES). Strict regulations by CITES aim to prevent smuggling of animal parts and, as a
consequence, scientists involved in research on animals, particularly endangered species,
must justify their research and the need for biological samples such as blood, hair or
feathers. Importation of fecal and urine samples is not so strictly regulated and, as a
rule, researchers need only obtain permits from agricultural officials.
DNA extraction
As you have already learned, although DNA can be extracted from a variety of
sources, white blood cells are generally the best sources of DNA. But today, using
various chemical and physical methods, scientists can obtain DNA from almost any
biological sample (see In The Lab 1).
In The Lab 1: DNA Extraction
If whole blood is treated to prevent clotting and then permitted to stand in a container,
the red blood cells, which weigh the most, will settle to the bottom and the plasma will remain
at the top. The white blood cells and platelets will remain suspended between the plasma and
the red blood cells. A centrifuge (see Figure 2), a device that spins the tubes at extremely high
rates of speed, may be used to hasten this separation process. White cells can be centrifuged
out, then mixed in a buffered, soapy, saline solution that breaks down the fatty cell membrane,
and splits the cells open to release their DNA-containing nuclei. The nuclei, in turn, can then
be broken open, with more soapy solution to dissolve the nuclear membrane. When a high salt
concentration solution is then added to the mixture, the DNA dissolves. Eventually, the DNA
is precipitated out of solution by adding cold ethanol and, as a consequence, it condenses,
becomes visible, and looks like pieces of whitish thread (see Figure 3). The precipitated DNA
is then collected using a sterile glass hook, dried, and placed in sterile water to dissolve into
solution. This procedure can yield strands of nuclear and mitochondrial DNA and many
researchers use this technique to extract both types at the same time.
Figure 2a-b: A centrifuge separates substances with different densities. The centrifuge in
Figure 2a (close-up in 2b), used to spin small volumes of solutions (<2 millilitres)
contained within small tubes, is known as an ultracentrifuge. Other centrifuges can be
used to spin large volumes, up to 50 millilitres.

Figure 3: Genomic DNA is precipitated in ethanol. While individual strands of DNA are not
visible by eye, large quantities will condense during extraction and will be visible as a
large white mass (see arrow left). [Figure label: DNA is visible as white mass]
Methods and Application of Molecular Hybridization
A technique called DNA hybridization has been used for at least 20 years to
estimate genetic distance between humans and nonhuman primate species. DNA-DNA
hybridization relies on the double-stranded (i.e., duplex) nature of DNA and the fact
that nucleotides on the two complementary strands are held together by hydrogen bonds.
Specifically, adenine pairs with thymine using two hydrogen bonds and guanine pairs
with cytosine using three hydrogen bonds. When DNA is heated to temperatures greater
than 94°C, the hydrogen bonds, the weakest links in DNA, are broken and the two DNA
strands separate. Thus, this process (called denaturation) will yield two intact, but
separate, single strands. When the single strands are cooled, complementary nucleotides
will rejoin or anneal to form into double-stranded duplexes. Duplexes formed from a
single sample will have perfectly matched complementary strands, called homoduplexes.
However, when duplexes are formed from the DNA of two different species, incorrect
nucleotide pairing will occur, resulting in the formation of heteroduplexes.
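The pairing rules described above can be sketched in a few lines. The example builds the complementary strand of a short sequence, counts the hydrogen bonds holding the duplex together (two per A-T pair, three per G-C pair), and counts mismatched positions when strands from two different sources are paired. The sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement_strand(strand):
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def hydrogen_bonds(strand):
    """Hydrogen bonds in a perfectly matched duplex built on this strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand)


def mismatches(strand, partner):
    """Positions where the partner is not the expected complementary base."""
    return sum(1 for a, b in zip(strand, partner) if COMPLEMENT[a] != b)


strand = "GAATTCGGCA"                  # hypothetical sequence
perfect_partner = complement_strand(strand)
other_species_partner = "CTTAAGCTGT"   # hypothetical, differs at one base

print(perfect_partner)
print(hydrogen_bonds(strand), "hydrogen bonds in the homoduplex")
print(mismatches(strand, perfect_partner), "mismatches in the homoduplex")
print(mismatches(strand, other_species_partner), "mismatch in the heteroduplex")
```

Each mismatched position leaves a pair of bases without its hydrogen bonds, which is why heteroduplexes are less stable than homoduplexes.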
In a classic study, Sibley and Ahlquist (1984) used DNA-DNA hybridization to
assess the evolutionary relationships between humans, chimpanzees and gorillas. They
reported that hybridization experiments demonstrated that humans and chimpanzees are
more closely related to one another than either is to gorillas. Although DNA-DNA
hybridization has improved our understanding of genome structure in many species,
including primates, the approach has been criticized due to the fact that the technique
does not provide specific detail about nucleotide mismatches and genome size. But, at
the same time, recent studies have supported the work of Sibley and Ahlquist (see Li et
al., 1987 and Wildman et al., 2003).
In The Lab 2: DNA-DNA Hybridization
The human and non-human primate genomes contain large segments of repetitive DNA
and these sequences reassociate most rapidly after denaturation. For DNA-DNA hybridization
studies, repetitive DNA is removed through chemical means and the remaining sequences are
mixed under conditions that favor duplex formation. When mixtures from different species are
created, it is possible to determine the degree of genetic divergence between the species since
heteroduplexes containing the largest number of nucleotide mismatches will dissociate most
rapidly. The temperature at which heteroduplexes denature, subtracted from the
temperature at which homoduplexes denature, gives the delta-T, an overall estimate of the
genetic difference between two species. Closely related species, with very similar DNA, will have fewer
mismatches while more distantly related species, with different DNA, will have more.
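The delta-T calculation described in this box is simple subtraction. The sketch below applies it to hypothetical melting temperatures. The numbers and species labels are invented for illustration, and real experiments report values measured under controlled conditions.

```python
def delta_t(homoduplex_tm, heteroduplex_tm):
    """Melting temperature of the homoduplex minus that of the heteroduplex."""
    return homoduplex_tm - heteroduplex_tm


# Hypothetical melting temperatures in degrees Celsius (not measured data).
homoduplex_tm = 86.0
heteroduplex_tm = {
    "species A x species B": 84.5,
    "species A x species C": 82.0,
}

results = {
    pair: delta_t(homoduplex_tm, tm) for pair, tm in heteroduplex_tm.items()
}
# A larger delta-T means more mismatches and therefore greater divergence.
for pair, value in sorted(results.items(), key=lambda item: item[1]):
    print(f"{pair}: delta-T = {value:.1f} degrees C")
```

With these made-up values, species B would be judged the closer relative of species A because its heteroduplex melts at a temperature nearer to that of the homoduplex.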
Restriction Fragment Length Polymorphisms (RFLPs) and DNA Fingerprinting
The discovery of restriction endonucleases, which can cleave duplex DNA
composed of particular nucleotide sequences, revolutionized molecular genetics. One
well-known example of a restriction endonuclease is EcoRI (named after the bacterium
Escherichia coli, from which it was first isolated). EcoRI cuts double-stranded DNA
wherever the nucleotide sequence GAATTC occurs (Table 1). Several hundred
restriction endonucleases that recognize different nucleotide sequences have been
isolated from bacteria. Cutting (or restricting) DNA with endonucleases enables scientists to detect
differences in nucleotide sequence (polymorphisms) between individuals that result
from substitutions, insertions, deletions or rearrangements of DNA. The resulting restriction
fragment length polymorphisms (RFLPs) can then be separated and visualized using
gel electrophoresis (see below). After the initial RFLP assay, it is possible to use
hybridization to detect a single gene, a specific stretch of nucleotides or even a repetitive
sequence. This technique is known as “Southern hybridization” (see In The Lab 3).
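A restriction digest can be imitated with string operations. The sketch below cuts a sequence wherever GAATTC occurs, placing the cut after the first G as shown for EcoRI in Table 1, and reports the fragment lengths. Only one strand is modeled, so the staggered ends are not represented. The two sequences are hypothetical. They differ by a single substitution that removes one recognition site, which is the kind of difference an RFLP assay detects.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut one strand at every occurrence of `site`, cut_offset bases into it."""
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical sequences from two individuals (toy data).
individual_1 = "TTAGGAATTCCGATAGCTAGGAATTCATTGC"
individual_2 = "TTAGGAATTCCGATAGCTAGGAGTTCATTGC"  # one substitution: A -> G

lengths_1 = [len(f) for f in digest(individual_1)]
lengths_2 = [len(f) for f in digest(individual_2)]
print("Individual 1 fragment lengths:", lengths_1)
print("Individual 2 fragment lengths:", lengths_2)
```

The first individual yields three fragments and the second only two, so the two would show different band patterns after gel electrophoresis.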
In The Lab 3: Southern Hybridization
Following gel electrophoresis (see Gel electrophoresis, below), the duplex DNA fragments are
chemically denatured and fixed onto a nylon membrane where they are exposed to a
single-stranded “probe” under conditions that allow complementary nucleotide sequences to
form duplexes. Usually, the conditions allow duplex formation only between the “probe”
and its complementary sequences. Detection of hybridization involves the use of
radioactivity or fluorescence. If the probe represents a single, unique nucleotide
sequence, only one or two fragments will be identified. In contrast, probes representing
repetitive sequences may yield very complex fragment patterns that are also known as
“DNA fingerprints” because most individuals will possess complex restriction
endonuclease/probe patterns.
Restriction endonuclease      Recognition sequence      Double-stranded DNA after
                              (cut site = ⇓)            restriction endonuclease cutting
CltI (Caryophanon latum)      GG⇓CC                     GG     CC
                                                        CC     GG
EcoRI (Escherichia coli)      G⇓AATTC                   G          AATTC
                                                        CTTAA          G
Table 1: Restriction endonucleases cut specific
double-stranded DNA sequences. These two
examples show how the cuts can be straight (CltI) or
staggered (EcoRI), depending on the DNA sequence.
Figure 4: Most
individuals will have
different DNA
fingerprints. Seven
related pigtailed
macaques (lanes 1-7,
left) have similar, but
not identical, DNA
fingerprints when
“probed” with a
repetitive sequence.
The term “DNA fingerprinting” is usually associated with a technique introduced
by Alec Jeffreys, of the University of Leicester in the United Kingdom, in which unique
restriction endonuclease/probe patterns distinguish most or all individuals (except
monozygotic twins). Originally applied to the study of humans, Jeffreys’ probes can also
be used to identify unique DNA patterns in species that range from apes and monkeys to
birds and fish. The complex patterns characteristic of DNA fingerprints provide a simple
means of establishing genetic identity in forensic and criminal investigations, as well as
assessing parentage since DNA fingerprint patterns are the product of maternal and
paternal contributions.
Unfortunately, the complex patterns of DNA fingerprints also make them difficult
to interpret and the assignment of particular patterns to specific genetic loci is not
currently possible. Also, DNA-based forensic analyses are not foolproof and many
weaknesses have been challenged in court. One famous example comes from the 1994
O.J. Simpson trial. When DNA tests, conducted by the California Department of Justice,
were used to compare droplets of blood found at the crime scene and Simpson’s own
blood, it was reported that there was so much similarity that only 1 person in 57 billion
could have produced an equivalent match. Similar analyses at other crime labs confirmed
the results, but Simpson’s lawyers were able to raise doubts about the blood storage and
handling and they ultimately persuaded the jury that Simpson was “Not Guilty.” During
the last 10 years, DNA testing has become more sophisticated and accurate and, as a
consequence, DNA evidence has been used to convict criminals and exonerate
incarcerated individuals, even defendants on “Death Row” (see below).
Principles and Applications of the Polymerase Chain Reaction (PCR)
The PCR technique involves three steps that essentially mimic the natural process
of DNA replication in vivo (see In The Lab 4). When all of the components of the PCR
reaction mixture are together in one tube, the original template is melted and an enzyme
called polymerase makes two new strands, doubling the amount of DNA present. This
process is repeated 20 to 40 times, each cycle providing two new templates for the next
cycle. Thus, the amount of amplification is 2 (the number of templates produced from
each template in a cycle) raised to the power n, where n represents the number of cycles
that are performed. The polymerase chain reaction
provides an extremely sensitive means of amplifying small quantities of DNA. The
development of this technique resulted in an explosion of new techniques in molecular
biology (and a Nobel Prize for Kary Mullis in 1993) as more and more applications of the
method have been published.
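The doubling arithmetic can be checked with a short Python sketch. It assumes idealized conditions in which every template is copied in every cycle, which real reactions only approximate; the function name pcr_copies is illustrative.

```python
def pcr_copies(starting_copies, cycles):
    """Idealized copy number after a given number of PCR cycles.

    Assumes perfect doubling in every cycle, so the result is
    starting_copies * 2**cycles.
    """
    return starting_copies * 2 ** cycles


for n in (1, 2, 10, 20, 30, 40):
    print(f"after {n:2d} cycles: {pcr_copies(1, n):,} copies")
```

Starting from a single template, 30 cycles give 2 to the power 30, or 1,073,741,824 copies, which is the "more than one billion" figure quoted for PCR.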
In The Lab 4: PCR
First, double-stranded DNA is
denatured into single strands by
heating. Second, short stretches of
single-stranded DNA sequences
(about 18-25 nucleotides long), called
primers, are annealed to sites that
flank the DNA sequence of interest.
Third, using heat-resistant DNA
polymerase, new complementary
single strands of DNA are synthesized
between the flanking primers using
the original single strands as a
template. Following just one round of
these three steps, the targeted region
of interest has been duplicated. Each
repetition of the three steps doubles
the number of copies of template and
after 30 cycles up to a billion copies
can be produced.
Cycle number      Number of DNA copies after cycle
1                 2
2                 4
↓                 ↓
30                1,073,741,824
Figure 5: PCR is used to make many copies of
a particular DNA sequence of interest. When
a PCR experiment begins with one copy of a
DNA sequence, there will be two copies after the
first round of the PCR cycle, four copies after the
second round and more than one billion copies
after 30 rounds of PCR cycles. The illustration
shows how copy number increases
geometrically.
In The Lab 4: PCR (continued)
Figure 6a-c: PCR is a common technique in most genetic labs. a) PCR tubes are small,
because reaction volumes can be minimal; b) to avoid contamination, PCR reactions are set up in
isolated areas like this safety hood; c) each of these three thermal cycling machines uses
computer-based temperature controls to raise and lower temperature during PCR.
Allele-specific PCR
As a rule, PCR is used to identify particular DNA sequences and allele-specific
PCR is possible when the primers are designed to anneal to just one DNA sequence at a
given genetic locus. This approach is used to identify alleles that differ by one or more
nucleotides within the primer annealing site. For example, individuals with normal
hemoglobin have a different DNA sequence from those with the sickle cell mutation and
these differences can be identified using allele-specific PCR.
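The logic of allele-specific PCR can be illustrated with a toy Python sketch. Everything here is hypothetical: the two allele sequences and both primers are invented, annealing is reduced to an exact substring match, and the names anneals and detect_alleles are assumptions. Each primer ends on the single nucleotide that distinguishes the two alleles, so it matches one allele only.

```python
def anneals(primer, template):
    """Toy annealing rule: the primer must match the template exactly.

    Both sequences are written 5'->3' for the same strand, so a perfect
    match is a simple substring test.
    """
    return primer in template


# Hypothetical alleles differing at one position (A versus T).
allele_1 = "GGCATTCAGGTACCTGAA"
allele_2 = "GGCATTCAGGTTCCTGAA"

# One primer per allele; the last base of each sits on the variable site.
ALLELE_PRIMERS = {
    "allele 1": "CATTCAGGTA",
    "allele 2": "CATTCAGGTT",
}


def detect_alleles(templates):
    """Return the alleles whose specific primer anneals to any template."""
    found = set()
    for name, primer in ALLELE_PRIMERS.items():
        if any(anneals(primer, template) for template in templates):
            found.add(name)
    return sorted(found)


print(detect_alleles([allele_1, allele_1]))  # homozygous individual
print(detect_alleles([allele_1, allele_2]))  # heterozygous individual
```

In a real assay the result is read from the presence or absence of an amplified product for each primer rather than from a string comparison, but the decision rule is the same: a product appears only when the primer matches the allele present.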
Advantages and Disadvantages of PCR
While a very powerful technique, PCR can also be very tricky. Primer design is
extremely important for effective amplification. That is, the primers for the reaction must
be very specific for the template to be amplified. Cross-reactivity, with non-target DNA
sequences, results in non-specific amplification of DNA. Also, the primers must not be
capable of annealing to themselves or each other, as this results in the very efficient
amplification of short nonsense DNAs. The reaction is also limited in the size of the
DNA that can be amplified (i.e., the distance between the forward and reverse primers).
The most efficient amplification is in the 300-1,000 base pair (bp) range; however,
amplification of products up to 4,000 bp (4 kb) has been reported.
The most important consideration in PCR, though, is contamination. If the sample
being tested has even the smallest amount of contamination from another source of DNA,
the reaction could amplify the contaminating DNA and produce a false-positive
identification. For example, technicians in a crime lab compare blood samples from
suspects to samples taken from a crime scene. If there is any contamination from one
sample to the other, the result could be the unfortunate and mistaken conviction of the
suspect. Contamination can also occur when a few blood cells stick to the plastic surface
of the pipette, and then get ejected into the test sample. For this reason, and many others,
modern labs devote tremendous effort to avoiding this problem.
Gel Electrophoresis
Gel electrophoresis is a common laboratory method used to quickly separate and
visualize DNA fragments in molecular genetics laboratories. The usual electrophoretic
media are agarose or acrylamide gels. These gels are dense matrices through which DNA
fragments migrate when exposed to electric current. The technique of electrophoresis is
based upon the fact that DNA is negatively charged at neutral pH due to its phosphate
backbone. When an electrical potential is applied across the gel, the DNA slowly moves
toward the positive pole.
In The Lab 5: Gel Electrophoresis
An agarose gel is a flat slab of jelly-like material, that forms a porous lattice, or matrix,
through which DNA fragments must migrate in order to move toward the positive pole during
electrophoresis. Larger molecules move more slowly than smaller ones, since the smaller molecules
meet less resistance in the gel. As a result, a mixture of large and small fragments of DNA can be
separated by size.
As discussed previously, agarose gel electrophoresis is used to detect DNA fragment sizes
following restriction endonuclease digestion. It is also used to separate digested DNA fragments
prior to Southern hybridization using single-stranded DNA probes and as part of the process when
generating DNA fingerprints.
Like agarose gel electrophoresis, acrylamide gel electrophoresis separates DNA fragments
according to size. Acrylamide gels are somewhat more difficult to work with than agarose gels,
primarily because they are usually very thin (0.25-0.5 mm thick), and are often electrophoresed in a
vertical orientation. But, because these gels can be used to visualize differences as small as a single
nucleotide in length, acrylamide gel electrophoresis is significantly more sensitive for detecting size
differences in DNA fragments.
In The Lab 5: Gel Electrophoresis (continued)
Figure 7a-c: Agarose gel electrophoresis is used to visualize DNA. a) Agarose gels are
formed using a horizontal mold; b) once the gel has polymerised, DNA samples are loaded into
pre-formed wells in the gel; c) after electrophoresis and staining, DNA fragments are
visualized over UV using a transilluminator.
Figure 7d: Experimental results from agarose gel electrophoresis are recorded as a
photograph. In this photo, DNA has been amplified using PCR and the amplified fragments
show up as bright bands (see arrow). From left, the order of samples is: negative
control with no DNA (-), positive control with DNA known to amplify using the
present experimental conditions (+) and two different experimental samples
(chimpanzee (C) and gorilla (G)). Sizes of fragments are estimated by comparing
results with standardised fragments of known size on the far right of the gel (S).
For this standard, the uppermost band is 1100 base pairs (bp) long and the
brightest band is 500 bp.
DNA Sequencing
DNA sequencing is the process of determining the exact order of nucleotides that
make up a DNA segment. As discussed in the beginning of this module, determining the
precise order of the 3 billion nucleotides that make up the DNA of the 24 different human
chromosomes has been one of the major aims of the Human Genome Project. However,
most scientists have more modest aims that involve studying a single gene, or region of a
gene, and this requires the determination of only several hundreds or thousands of
nucleotides. For example, the gene that produces hemoglobin in humans is 1,652
nucleotides long and individuals with a single point mutation (T→A at amino acid
position #6) will produce sickling hemoglobin. Thus, by determining a person’s DNA
sequence for the hemoglobin gene, scientists can tell if the person will produce normal or
sickling hemoglobin.
If anthropologists want to estimate relatedness between species using DNA, then
they need DNA sequences, since the estimates themselves are based on the number of
nucleotide differences between the species being compared. For example, DNA
sequences from a non-coding segment of the hemoglobin genes have been used to
support the close evolutionary relationship between humans and chimpanzees. When Li
et al. (1987) compared 5,300 nucleotides from humans, chimpanzees and gorillas, they
found that humans and chimpanzees differed by 77 nucleotides, humans and gorillas
differed by 79 nucleotides and chimpanzees and gorillas differed by 83 nucleotides.
Therefore, it was concluded, humans and chimpanzees shared a common ancestor more
recently than either did with gorillas.
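The comparison reported by Li et al. (1987) can be restated as simple arithmetic. The short Python sketch below uses only the counts quoted above; the variable names are illustrative.

```python
# Nucleotide differences among 5,300 compared sites (counts quoted in the text).
SITES_COMPARED = 5300
differences = {
    ("human", "chimpanzee"): 77,
    ("human", "gorilla"): 79,
    ("chimpanzee", "gorilla"): 83,
}

# Express each count as a percentage of the sites compared.
percent_difference = {
    pair: 100 * count / SITES_COMPARED for pair, count in differences.items()
}

# The pair with the fewest differences is inferred to share
# the most recent common ancestor.
closest_pair = min(differences, key=differences.get)

for pair, pct in percent_difference.items():
    print(f"{pair[0]} vs {pair[1]}: {pct:.2f}% different")
print("Fewest differences:", " and ".join(closest_pair))
```

The three counts are close to one another, so the inference rests on a small margin; this is worth keeping in mind when reading the conclusion drawn from them.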
In The Lab 6: DNA Sequencing
Several methods can be used to determine the sequence of nucleotides in a purified DNA
segment. The Sanger method, developed by Fred Sanger at the University of Cambridge, is the
technique of choice today. It involves the controlled interruption of in vitro DNA replication, not
unlike the polymerase chain reaction described in an earlier section. It begins with the
denaturation of double-stranded DNA and annealing a short single-stranded segment of DNA
called a primer. As in PCR, the primer must be complementary to a template DNA sequence in
order to anneal and initiate the synthesis of a new single strand of DNA. For the purposes of
DNA sequencing, some of the nucleotides will have fluorescent or radioactive particles attached
to them so the newly synthesized strand of DNA will also consist of fluorescent or radioactively
labeled nucleotides. The newly synthesized sequences are then separated using gel
electrophoresis, visualized by laser (for fluorescent particles) or autoradiography (for radioactive
particles) and the DNA sequence is read directly. Figure 8 displays an image of a computerized
DNA sequence output (also known as an electropherogram).
Figure 8: Electropherograms show DNA sequence results from automated sequencing. This
electropherogram depicts 26 nucleotides (C G G G C G G T G A C A G A G C T G G G G C G G C C)
of a DNA sequence from a human gene called HLA-DRB1 (see page 26 for more on HLA).
Repetitive DNA sequences
You have already learned that a large proportion of the human genome consists of
non-coding DNA. In many other species, including non-human primates, the proportion
of repeated sequences is equally high. There are two basic classes of repetitive DNA
sequences in the human genome: dispersed repeats and clustered repeats. Dispersed
repeats are scattered throughout the genome and are characterized as either short or long.
Clustered repeats are also widely distributed throughout the genome, but they are
generally shorter than dispersed repeats. Clustered repeats fall into three basic
categories: satellites, mini-satellites and microsatellites. They are called “satellites”
because scientists discovered them by noticing that centrifuged DNA sometimes settled
out into two or more layers. The main layer, or band, contained coding sequences, but
the others were repeated sequences that researchers named “satellites” (Moxon and Wills,
1999). The further designations (mini- and micro-) refer to the length of the segments.
Dispersed repeats
Short Interspersed Nucleotide Elements (SINEs). SINEs are highly abundant
in mammalian genomes. In humans, one type of SINE, Alu, is particularly well known.
This repetitive sequence is approximately 300 nucleotides long and about 500,000 copies
have been identified, with a copy occurring approximately every 5,000-10,000 nucleotides. It
is called Alu because scientists initially discovered it using a restriction endonuclease
called AluI. AluI cleavage sites flank the repetitive sequence, which is typically a pair of
repeats (141bp each) with sequence similarities to a human RNA gene.
Not all Alu sequences are exactly the same, because nucleotide substitutions occur
in about 10 percent of the sequence. The differences suggest that Alus have been derived
through repeated copying and moving (transposing) of the original RNA gene and of its
later copies. Alu repeats are frequently used as natural markers for studying genetic
rearrangements that indicate genetic variability and heritable disorders in humans.
Similar Alus have been identified in other primate species and these sequences are useful
for determining evolutionary relationships between species and also regions of the
primate genome. As with DNA-DNA hybridization studies, greater similarity in Alu
sequences suggests closer evolutionary relationship between species. Other SINEs,
which have been identified in the genomes of primates and other mammalian species, are
called MIRs (mammalian-wide interspersed repeats). In primates, an estimated 300,000
copies of MIRs are discernable.
Long Interspersed Nucleotide Elements (LINEs). LINEs can range from
14,000 to 61,000 bp in length and more than 50,000 copies have been identified in the
human genome using restriction enzymes. LINEs are ubiquitous in the mammalian
genome and, thus, are informative for evolutionary studies that involve many different
primate species. Unlike SINEs, however, some members of LINE families may be able
to copy themselves and, thus, they are sometimes called “jumping genes.”
Clustered repeats
Clustered satellite sequences are usually found in specific regions of
chromosomes. For example, some are located near the centromeres and others are found
near the ends of chromosome strands. As a rule, these sequences are just long strings of
the same short nucleotide sequence repeated over and over. Many molecular geneticists argue that these
sequences have structural importance since they do not code for proteins and are found in
regions that suggest structural roles. About 10% of human microsatellites are found
within genes and a number of human diseases result from abnormally large numbers of
triplet repeats within genes. These diseases include Huntington’s disease, myotonic
dystrophy and fragile X syndrome. In each case, disease severity and age of onset
depend on the length of the triplet repeat. The longer the repeat, the earlier the onset and
the more severe the disease will be.
Most human chromosomes contain moderately long, tandemly repeated DNA
sequences that can be detected using restriction enzymes, Southern hybridization and/or
PCR. These variable number tandem repeats (VNTRs), first described in 1985, are also
known as minisatellites. As explained earlier, VNTRs underlie the principle of “DNA
fingerprinting,” since individuals differ in the number of tandem repeats they possess at
each VNTR locus, generally from one to 30 repeats of a 15-70 bp core sequence. Interestingly, many human
minisatellite sequences have been identified in nonhuman primates and the same
techniques used to study humans can be applied to the study of nonhuman primates.
Shorter repeated DNA sequences, called microsatellites, have the same basic
structure as VNTRs, but the tandem repeat sequences are only 2 to 4 nucleotides long.
More than 10,000 microsatellites have been discovered in the human genome and, as a
rule, they are rarely more than 300 bp in length.
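A microsatellite, as described above, is a 2 to 4 nucleotide unit repeated in tandem. The Python sketch below shows one simple way to scan a sequence for such runs using a regular expression. The example sequence is invented, the threshold of five repeats is an arbitrary assumption, and the function name find_microsatellites is illustrative; dedicated tools are used for real genome screening.

```python
import re


def find_microsatellites(sequence, min_repeats=5):
    """Find tandem repeats of a 2-4 nucleotide unit.

    Returns a list of (start position, repeat unit, number of repeats).
    Runs of a single nucleotide (for example AAAAAAAAAA) are skipped.
    """
    # A unit of 2-4 bases followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # homopolymer run, not a 2-4 nucleotide repeat
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


# Hypothetical sequence containing an (AC)6 and a (GATA)5 repeat.
example = "GATT" + "AC" * 6 + "AGGCTT" + "GATA" * 5 + "CC"

for start, unit, repeats in find_microsatellites(example):
    print(f"position {start}: ({unit}) x {repeats}")
```

Different individuals would carry different repeat numbers at the same position, and it is that count, not the unit itself, that serves as the allele in the applications discussed below.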
Most, but not all, microsatellite repeats are found outside the coding regions of
the genome and any increase or decrease in repeat number should hypothetically have no
major effect on the fitness of an individual. Therefore, microsatellite repeats are highly
variable, with as many as 12 to 15 different alleles (repeat numbers) per locus. This extraordinary
variability, combined with the fact that they can be studied using PCR (therefore
requiring little DNA) makes microsatellites very popular genetic markers for disease
association studies, forensic investigations and paternity determination.
Identifying individuals with microsatellites
The accurate genetic identification of individuals has, until recently, been very
difficult to achieve due to problems related to the limited availability of biological
samples in forensic settings. Another problem has been the general lack of adequate
variation, between individuals of the same species, needed to identify the genetic
uniqueness of individuals. Before molecular genetic techniques were developed, people
could be identified according to their particular ABO blood type and a combination of
serum proteins. In the 1980s these genetic markers were replaced, to a certain extent, by
DNA fingerprinting with minisatellites. These markers exhibited more variability and,
therefore, made it more likely that most individuals would have unique genetic profiles.
There were, however, some disadvantages to using minisatellites, but most of these
problems and limitations were overcome once microsatellites were discovered.
Because microsatellites are so abundant in the human genome, scientists can
examine several different, and highly variable, loci to identify a unique combination of
microsatellite alleles for any individual. To illustrate the power of these markers,
consider the scientist who determines the microsatellite alleles for four different loci,
each known to exist on a different chromosome. Since the probability of having a
particular combination of alleles at each locus is independent due to the Mendelian Law
of Independent Assortment, it is highly unlikely that two individuals will share the exact
combination of alleles unless they are related. In fact, the probability of matching 4
loci with allele frequencies of 0.11 to 0.40 will be 1 in 7450 for unrelated individuals and
1 in 1750 for first cousins.
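The reasoning behind such match probabilities is the product rule: if loci are inherited independently, the frequency of a multi-locus profile is the product of the per-locus genotype frequencies. The Python sketch below illustrates this for unrelated individuals only. It assumes Hardy-Weinberg genotype proportions, which the text does not state, and the allele frequencies are hypothetical values chosen from the 0.11 to 0.40 range; it is not intended to reproduce the figures quoted above.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    Homozygote (one allele frequency given): p * p.
    Heterozygote (two allele frequencies given): 2 * p * q.
    """
    return p * p if q is None else 2 * p * q


def profile_frequency(genotypes):
    """Multiply per-locus genotype frequencies (independent loci assumed)."""
    return prod(genotype_frequency(*genotype) for genotype in genotypes)


# Hypothetical four-locus profile: allele frequencies for each genotype.
profile = [
    (0.11, 0.25),  # locus 1, heterozygote
    (0.40,),       # locus 2, homozygote
    (0.20, 0.30),  # locus 3, heterozygote
    (0.15, 0.35),  # locus 4, heterozygote
]

frequency = profile_frequency(profile)
print(f"profile frequency: {frequency:.8f}")
print(f"about 1 in {round(1 / frequency):,} unrelated individuals")
```

With these made-up frequencies the profile is expected in roughly 1 in 9,000 unrelated individuals. Relatives share alleles by descent, so the product rule as written here overstates how rare a match would be between them.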
The same type of calculations can be used to assess paternity and determine
relatedness. These calculations depend on the degree of heterozygosity for each
microsatellite locus and the number of loci that are used for the assessment. As a rule, it
is easier to assess paternity when the microsatellite alleles of the mother are known
because then it is usually clear which allele the offspring inherited from the father. If it
is possible that more than one male may be the father, potential fathers can be excluded if
they do not possess the paternal allele. When a number of loci are studied, it is often
possible to assess paternity with confidence. This approach is currently being used in
paternity studies, not only of humans, but also of nonhuman primates (e.g., chimpanzees
and baboons).
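The exclusion logic described above can be sketched in Python. The sketch assumes that the mother is correctly identified and that there are no mutations or genotyping errors. The locus names, allele sizes and male labels are hypothetical, and alleles are written as fragment lengths in base pairs.

```python
def paternal_alleles(offspring, mother):
    """Alleles the offspring could have inherited from its father at one locus.

    If one offspring allele is found in the mother, the other one must
    have come from the father. If both are found in the mother, either
    could be paternal.
    """
    first, second = offspring
    candidates = set()
    if first in mother:
        candidates.add(second)
    if second in mother:
        candidates.add(first)
    return candidates


def is_excluded(offspring_profile, mother_profile, male_profile):
    """True if the male lacks a possible paternal allele at any locus."""
    for locus, offspring in offspring_profile.items():
        needed = paternal_alleles(offspring, mother_profile[locus])
        if not needed & set(male_profile[locus]):
            return True
    return False


# Hypothetical genotypes at two microsatellite loci.
offspring = {"locus_A": (152, 156), "locus_B": (201, 205)}
mother = {"locus_A": (152, 160), "locus_B": (205, 209)}
males = {
    "male_1": {"locus_A": (156, 158), "locus_B": (201, 203)},
    "male_2": {"locus_A": (150, 156), "locus_B": (203, 207)},
}

for name, profile in males.items():
    verdict = "excluded" if is_excluded(offspring, mother, profile) else "not excluded"
    print(name, verdict)
```

Here male_2 carries the paternal allele at the first locus but not at the second, so he is excluded, while male_1 remains a possible father. Failing to exclude a male is not proof of paternity; confidence grows as more loci are examined.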
Interestingly, many human microsatellite loci exist in the genomes of other
primates. Not surprisingly, a large number of human microsatellites can be used when
studying chimpanzees. Indeed, most of the ground-breaking molecular genetic studies of
chimpanzees in the wild have identified variation between individuals for paternity
assessment using human microsatellites. Rhesus macaques also possess many of the
same microsatellites as humans (see Figure 9), making it possible to study diversity and
relatedness in wild and captive groups of macaques. In more distantly related species,
however, the number of, and variability in, microsatellites shared with humans is limited
and it is often necessary to use complex and labor-intensive genome screening techniques
to identify new species-specific microsatellites.
Figure 9: Rhesus macaque paternity and relatedness can
be determined using human microsatellites. This
acrylamide gel shows the pattern of autosomal inheritance of
microsatellites in rhesus macaques. The pedigree (at left)
shows mothers (●) and fathers (■) at top and offspring below.
Note how offspring inherit one allele from each parent
according to Mendelian Laws. For example, individual #1
passed one allele to her daughter (#5). Individual #5 is a
heterozygote and she passed a different allele to her daughter
(#7).
DNA-based Trees and Evolution
Molecular genetics has had a major impact on the way we view the evolutionary
relationships of primates. You have learned that phylogenetics is the reconstruction of
the evolutionary history of organisms using evidence from the fossil record, embryology,
morphological features and even DNA sequences. The basic assumption, when using
molecular data, is that the DNA sequences of organisms are a record of the DNA
sequences passed down from previous ancestors. As a rule, organisms with more similar
DNA sequences share more recent common ancestry than organisms with more
dissimilar DNA sequences. The DNA sequences may also be used to calibrate the time
since divergence between species, since DNA sequence divergence represents a type of
“molecular clock.” To illustrate, Wildman et al. (2004) analyzed mitochondrial DNA
(mtDNA) sequences to estimate divergence between Arabian and African hamadryas
baboons at approximately 35,000 years ago. Representative DNA sequences from other
types of baboons were significantly different from the hamadryas sequences, including
Papio and Theropithecus whose divergence has been dated at about 4 million years ago
(mya) using fossil evidence.
As mentioned previously, some genes may accumulate mutations more rapidly
than others and this may, or may not, be useful for estimating evolutionary relationships.
In general, the DNA sequences best suited for reconstructing evolutionary relationships
are those that are not dramatically affected by natural selection, in other words
“selectively neutral.” Neutral genes are thought to accumulate mutations at a fairly even
rate and, therefore, it should be possible to estimate time since divergence. You have
learned that some DNA sequences have been used, preferentially, for estimating
evolutionary relationships between modern human and nonhuman primate populations
(e.g., mitochondrial DNA). Other DNA sequences, such as the beta domain of the
hemoglobin gene, are more suitable for comparisons between distantly related organisms.
Whatever types of DNA sequences are compared, the first step is usually to
construct an alignment. A sequence alignment is simply a side-by-side comparison of the
order of nucleotides for each individual included in a study. Usually these alignments are
done in successive rows (see Figure 10a).
The next step is to create a phylogenetic tree based upon calculations of the
evolutionary distance between all pairs of sequences in the alignment. Typically, this is
also achieved using a computer program. One of the simplest ways to calculate distances
is to determine the number of differences per nucleotide of the sequence pair. For
example, when 5,300 nucleotides from a non-coding region of the hemoglobin
genes were compared in humans and chimpanzees, 77 differences were found between the
two sequences. The evolutionary distance between human and chimpanzee would,
therefore, be 77/5,300, or about 1.45 percent (see Li et al., 1987). To help put this in
perspective, the evolutionary distance between human/chimpanzee and gorilla is 1.54 percent
and the distance between human/chimpanzee and orangutan is 2.96 percent.
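A distance of this kind can be computed directly from an alignment. The Python sketch below counts mismatched positions in each pair of aligned sequences and divides by the alignment length. The four 20-nucleotide sequences are invented toy data, not real hemoglobin sequences, and the function name p_distance is illustrative; published analyses usually also correct for multiple substitutions at the same site.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Hypothetical alignment: one row per species, columns are aligned sites.
alignment = {
    "human":      "ACGTTGCAAGCTTAGGCTAA",
    "chimpanzee": "ACGTTGCAAGCTTAGGCTAG",
    "gorilla":    "ACGTTGCTAGCTTAGGCTGG",
    "orangutan":  "ACTTTGCTAGCATAGGATGG",
}

# Distance for every pair of sequences in the alignment.
distances = {
    (a, b): p_distance(alignment[a], alignment[b])
    for a, b in combinations(alignment, 2)
}

for (a, b), distance in distances.items():
    print(f"{a:10s} {b:10s} {100 * distance:5.1f}%")
```

The resulting table of pairwise distances is the input that distance-based tree-building programs work from.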
As a rule, phylogenetic trees are constructed with a minimum number of steps to
get from one sequence to the next. This is called a maximum parsimony tree. When a
large number of DNA sequences are used in an analysis, several models may be equally
parsimonious. Another important consideration is the identification of a common
ancestral sequence for all of the sequences in question. The ancestral sequence is usually
represented as the “root” of the phylogenetic tree. In some cases, however, it is not
possible to identify the root of a tree. An “unrooted” tree depicts the relationship between
groups of sequences, but it does not identify the oldest, or ancestral, sequence.
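The idea of counting steps on a tree can be made concrete with a small Python sketch. It uses Fitch's algorithm, one standard way of counting the minimum number of substitutions a given tree requires; the text does not name a specific algorithm, so this choice is an assumption. The sequences are invented toy data, and trees are written as nested pairs of species names.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one alignment column.

    Returns (possible ancestral states, minimum substitutions) for `tree`,
    which is either a species name or a pair of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_steps = fitch_site(tree[0], states)
    right_states, right_steps = fitch_site(tree[1], states)
    shared = left_states & right_states
    if shared:
        return shared, left_steps + right_steps
    # No state in common: one substitution is needed on this branch pair.
    return left_states | right_states, left_steps + right_steps + 1


def parsimony_score(tree, alignment):
    """Total minimum substitutions over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical aligned sequences (toy data).
alignment = {
    "human":      "ACGTTGCAAGCTTAGGCTAA",
    "chimpanzee": "ACGTTGCAAGCTTAGGCTAG",
    "gorilla":    "ACGTTGCTAGCTTAGGCTGG",
    "orangutan":  "ACTTTGCTAGCATAGGATGG",
}

# Three alternative branching orders, each with orangutan as the outgroup.
trees = {
    "human+chimpanzee first": ((("human", "chimpanzee"), "gorilla"), "orangutan"),
    "human+gorilla first": ((("human", "gorilla"), "chimpanzee"), "orangutan"),
    "chimpanzee+gorilla first": ((("chimpanzee", "gorilla"), "human"), "orangutan"),
}

scores = {name: parsimony_score(tree, alignment) for name, tree in trees.items()}
for name, score in scores.items():
    print(f"{name}: {score} substitutions")
print("most parsimonious:", min(scores, key=scores.get))
```

For this toy alignment the tree that groups human with chimpanzee requires the fewest substitutions and is therefore preferred. With real data two or more trees can tie, which is the situation the text describes as several equally parsimonious models.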
A number of critical assumptions must be considered when molecular trees are
constructed. First, it is assumed that each nucleotide position in an alignment evolves
independently of every other nucleotide. Second, it is assumed that the sequences used in
the alignment are representative of the organisms. Third, and most importantly, it is
assumed that the sequences and the nucleotides compared in the alignment are
homologous (i.e., inherited from a common ancestor). If nucleotide sequences are shared
between species, but not homologous, the similarity may have arisen because of
convergence (i.e., the same nucleotide substitution has occurred at the same position in
different evolutionary lineages). Convergence usually occurs because the two species
have faced the same selection pressures. Similarity may also exist when homologous
genes have been duplicated within one or more species (paralogy) and alignments
contain nucleotide sequences from the duplicated genes. For example, the human adult
beta globin and chimpanzee beta globin genes are homologous because they have been
derived from a common ancestor. However, the two human alpha globin genes are
paralogous, because they have been duplicated since the divergence of humans and
chimpanzees. Importantly, if nucleotide sequences from different species are not
homologous, then the evolutionary inferences drawn from the trees will be inaccurate.
Species trees versus gene trees
Following on from the point about homology and phylogenetic trees is the fact
that some trees will depict evolutionary relationships between organisms and other
trees will depict evolutionary relationships between genes. The use of highly variable,
non-coding regions of the genome can provide a phylogeny that reflects the evolutionary
relationship between the representative species that have contributed DNA sequences for
the analysis. This is based upon the assumption that the rise of new DNA sequences
coincides with speciation events. For some regions of the genome, however, original
DNA sequences may persist in two species that have already diverged, while new DNA
sequences may arise without any speciation events. In these cases, it is not appropriate to
attempt a reconstruction of species relationships using these DNA sequences. Instead,
the only meaningful evolutionary relationships that can be described are those that exist
between the genes. Gene trees can tell you about the evolutionary history of particular
DNA sequences, not the species that possess them (see Figure 11). In some cases, gene
trees will disagree with species trees. Nevertheless, if one uses DNA sequences from
many different genes and/or non-coding regions of the genome, it is possible that the
average branching pattern will better represent the species tree.
How are we related to the Neandertals?
Molecular genetic studies using ancient DNA are extremely difficult, but they
have recently been instrumental in helping scientists understand how we are related to the
Neandertals. As you have learned, there have been great debates on the evolutionary
place of Neandertals relative to Homo sapiens sapiens. Some have suggested that they
are our direct ancestors (and are probably a subspecies of Homo sapiens). Others have
argued that they are a separate species. Using only the fossil evidence, this issue is very
difficult to resolve. However, the use of ancient DNA may offer an answer to this longstanding question.
In 1997, a very minute quantity of degraded DNA was extracted from the original
Neandertal skeleton discovered in the Neander Valley, Germany. Using PCR and
additional molecular genetic techniques, scientists were able to examine the relationship
between Neandertals and modern humans in an entirely new way. The mitochondrial genome
was targeted for PCR amplification because cells contain many more
mitochondria than nuclei. The use of mitochondrial DNA was also advantageous
because, as explained earlier, the DNA sequence of the entire mitochondrial genome is
known. A small segment, just 379 base pairs of the hypervariable control region, was
amplified. For scientists studying modern DNA, it would be very easy to PCR such a
small fragment, but most of the Neandertal PCR fragments were less than 100 bp due to
degradation of the DNA. To produce a complete 379 bp sequence, the scientists used a
procedure called cloning and sequencing (see below) to create a series of overlapping
fragments. When the complete sequence was compared with DNA sequences of the
same region in modern humans, it was discovered that the Neandertal sequence was
significantly different from all modern sequences. While the most divergent modern vs.
modern human population sequences differed at 24 positions, pairs of modern human-Neandertal
sequences differed, on average, at 25.6 positions, with no fewer than 20
differences between the most similar modern-Neandertal pair. Considering the
differences, as well as the type and location of the DNA sequence data, scientists
concluded that Neandertals were not the sole ancestors of any modern human population
(Krings et al., 1997).
Supporting evidence for this conclusion came in 2000, when ancient DNA was
extracted from a Neandertal skeleton discovered in the northern Caucasus (Ovchinnikov
et al., 2000). The DNA from this second and unrelated Neandertal specimen was PCR
amplified to obtain two overlapping fragments of 232 and 256 bp, yielding a total of 345
bp of the same hypervariable region of the mitochondrial genome. The PCR fragments
were directly sequenced using the Sanger sequencing method (see above). When
compared to a modern human reference sequence, this second Neandertal sequence
differed at 22 positions. Comparison with the original Neandertal sequence revealed only
12 differences. Once again, there was evidence to suggest that the Neandertals were not
directly ancestral to modern humans. But, like the 1997 study, the 2000 study still
cannot exclude with certainty the possibility that anatomically modern humans and
Neandertals exchanged some genes. Nuclear DNA could provide a clearer picture of our
relationship to Neandertals, but this is thought to be extremely difficult to obtain since
ancient DNA from the nucleus would be less abundant, and even more degraded, than
mitochondrial DNA. Nevertheless, recent studies have successfully obtained cave bear
nuclear DNA from the Vindija Neandertal site (O’Rourke et al., 2000). These successes
suggest that nuclear DNA studies of Neandertals may be possible in the future.
So, how do we, modern humans, compare to our closest nonhuman primate
relative, the chimpanzee? The scientists studying Neandertal DNA in 1997 had an
interesting answer to this question: modern humans and modern chimpanzees differ at
about twice as many positions as modern humans and Neandertals (Krings et al., 1997).
According to the fossil record, it would have taken approximately 4-5 million years for
the modern human-modern chimpanzee differences to arise. Therefore, Krings and colleagues
concluded, one could create a “molecular clock” to estimate the divergence of
Neandertals and modern humans. Considering the types of changes that might occur in
the hypervariable region of the mitochondrial genome, the scientists concluded that
modern human and Neandertal mitochondria began to diverge approximately 550,000 to
690,000 years ago. This would be about four times older than the last common ancestor
of all modern humans.
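The logic of a molecular clock can be sketched as a simple proportion: a split of known age calibrates the rate of change, and that rate is then applied to the pair of interest. The function name and the two distances below are placeholders chosen for illustration, not values from Krings et al. (1997). Raw counts of differing positions cannot be used directly, because repeated changes at the same site in the hypervariable region hide earlier changes; the sketch therefore assumes that the distances have already been corrected for this.

```python
def divergence_time(pair_distance, calibration_distance, calibration_years):
    """Estimate a divergence time from a calibrated, constant rate of change.

    The distances are assumed to be corrected estimates of substitutions per
    site; both lineages accumulate change, hence the factor of 2.
    """
    if calibration_distance <= 0 or calibration_years <= 0:
        raise ValueError("calibration values must be positive")
    rate_per_lineage = calibration_distance / (2 * calibration_years)
    return pair_distance / (2 * rate_per_lineage)


# Hypothetical corrected distances (illustrative only).
human_chimp_distance = 0.80
human_neandertal_distance = 0.10

# Calibrate with the 4-5 million year human-chimpanzee split from the fossil record.
for split_years in (4.0e6, 5.0e6):
    estimate = divergence_time(human_neandertal_distance, human_chimp_distance, split_years)
    print(f"calibration {split_years:.1e} years -> estimate {estimate:,.0f} years")
```

The estimate is only as good as the calibration date and the assumption of a constant rate, which is why published dates are given as ranges.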
Figure 10a: A mitochondrial DNA (mtDNA) sequence alignment for three species: human
(Hss), Neandertal (N) and chimpanzee (Pt). This small alignment shows a portion of the
mtDNA D-loop, with the human sequence (CCAAGTATTGACTTACCCATCAAC) written out in
full as the reference. It is conventional to indicate agreement of sequence with a dash (-),
to note mismatches with the specific nucleotide (e.g., C) and to indicate missing
nucleotides with an asterisk (*).
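The notation described in the caption can be produced mechanically from two aligned sequences. The following is a minimal sketch; the function name is invented, the input is assumed to mark missing nucleotides with a hyphen, and the second sequence is a made-up example rather than the Neandertal or chimpanzee sequence from the figure.

```python
def to_dash_notation(reference, other, gap="-"):
    """Write `other` relative to `reference`.

    '-' marks agreement, a letter marks a mismatch, and '*' marks a missing
    nucleotide (given as `gap` in the input sequence).
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    symbols = []
    for ref_base, base in zip(reference, other):
        if base == gap:
            symbols.append("*")
        elif base == ref_base:
            symbols.append("-")
        else:
            symbols.append(base)
    return "".join(symbols)


reference = "CCAAGTATTG"   # first ten bases of the human row in Figure 10a
hypothetical = "CCAGGTA-TG"  # invented sequence with one mismatch and one gap

print(reference)
print(to_dash_notation(reference, hypothetical))
```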
Figure 10b: Phylogenetic tree constructed from human, Neandertal and chimpanzee
mtDNA sequences. The branches are labeled chimpanzees, Neandertal and modern humans.
The study of ancient DNA is fraught with controversy because of the serious risk of
contamination by DNA from the modern humans who originally collected the ancient bones
or from the laboratory researchers who carry out the molecular genetics
experiments. Nevertheless, the studies of ancient Neandertal DNA have been
exemplary. Laboratory controls were used to detect contamination from modern humans
at many stages of these studies. Indeed, the 1997 study reports that some contaminating
data, identified as modern human, were excluded from the analyses. As techniques
become even more sophisticated, and scientists become more aware of the hazards of
working with minute quantities of degraded DNA, it is likely that molecular genetics will
provide a revolutionary perspective on the relationships between modern humans and
their ancestors.
Protein Structure and Function
You have already learned that only about 2% of the human genome consists of
exons, or coding DNA sequences. As you also know, the DNA in exons serves as the set
of instructions for building proteins. One of the two strands of the DNA double helix is used as a
template for the transcription of mRNA in the nucleus. With the help of tRNA and
rRNA, the message is ultimately translated into a long string of amino acids to form a
protein or part of a protein. The function of any protein depends on its shape and
structure.
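The flow of information from DNA to mRNA to protein can be sketched in a few lines of Python. The function names and the 12-base template are invented for illustration; the codon table is the standard genetic code, written with one-letter amino acid abbreviations and `*` for a stop codon. The sketch ignores introns, tRNA and rRNA.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the mRNA (5'->3') complementary to a template strand read 3'->5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_3_to_5)


def translate(mrna):
    """Translate an mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


template = "TACGGTCTAATT"  # hypothetical template strand, read 3'->5'
mrna = transcribe(template)
print(mrna, translate(mrna))
```

Here the mRNA reads AUG-CCA-GAU-UAA, which corresponds to methionine, proline and aspartic acid followed by a stop signal.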
Protein structure
The sequence of amino acids is formed as a long chain held together by peptide
bonds. The primary structure of a protein is the amino acid sequence, which determines
the higher levels of structure of the protein and its biological function. The secondary
structure describes how the chain folds locally, for example into helices and sheets, and the
tertiary structure is the overall three-dimensional shape of the folded chain. Moreover, some proteins combine with other proteins as subunits of
a larger, more complex protein. The quaternary structure is the arrangement of the
protein subunits that form the larger functional protein. For example, the protein that
carries oxygen through the blood, hemoglobin, is composed of two alpha globins and two
beta globins.
The functional diversity of proteins
Proteins are extremely diverse and complex. They include receptors for
recognizing other proteins or chemicals, enzymes for DNA and RNA synthesis and
hormones for triggering biological responses. Antibodies, too, are proteins. As you have
learned, an antibody combines with an antigen as part of the immune response to
form a complex that can stimulate an immune reaction. Antibodies perform a number of
other tasks as well: some attack microbes and bacterial toxins,
others are involved in allergic reactions, and still others are responsible for initiating the
destruction of infected cells. Antibodies exhibit a great deal of variability because
the genes that code for them vary tremendously.
For many immune response genes, natural selection has favored nucleotide
diversity and some of the resulting amino acid changes have had advantageous
consequences for individuals (see Box 1). In contrast, some proteins have strong
functional constraints since most amino acid changes have negative, or deleterious,
consequences and natural selection does not generally favor nucleotide substitutions.
Often such proteins form important structures for activities such as chromosome
formation. For example, histones are DNA-binding proteins that mediate the coiling of
DNA during chromosome condensation prior to cell division. If you compare the histone
genes of many different primate species, you will discover that there are almost no
differences in the nucleotide sequence. In fact, if you compare one of the histone (H4)
genes of humans and wheat, only 2 out of 104 amino acids differ. This degree of
similarity indicates that the histone genes have been highly conserved throughout the
course of evolution because of the importance of histones in chromosome structure and
function in all forms of life.
Although technically more demanding, the study of coding DNA sequences offers
scientists an opportunity to understand gene function and evolution. The information
encoded in exons can be translated into a real functional protein with a defined
three-dimensional shape, and scientists are developing computerized models that predict
protein structure and aid in understanding gene function. For example, identifying
nucleotide sequences that result in flawed proteins requires knowledge of the correct
protein structure, which is essential for determining gene function. Comparisons of
coding sequences within a species can help scientists understand the functional
importance of nucleotide substitutions.
mRNA studies
Many scientists are interested in studying the functional diversity of proteins in
humans and nonhuman primates. This can be done by extracting mRNA from fresh,
nucleated cells and employing almost any one of the molecular techniques previously
described for use with DNA. Studies of mRNA are considerably more time- and labor-intensive,
since mRNA has an extremely short life span and is found only in minute
quantities within nucleated cells. Before any further molecular genetic study can be
undertaken, a complementary strand of DNA (cDNA) must be synthesized on the mRNA
template to produce a double-stranded molecule. Typically, this procedure is followed by
amplification using PCR, as described for DNA. The two steps together are known as
reverse transcriptase-polymerase chain reaction (RT-PCR), since the synthesis of the
complementary strand requires reverse transcriptase, an enzyme that builds DNA from an
RNA template.
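The first step of RT-PCR can be sketched as building the reverse complement of the mRNA in the DNA alphabet. The function names and the short mRNA are invented for illustration, and the sketch leaves out primers, enzymes and reaction conditions.

```python
# Base pairing used when DNA is copied from an RNA template.
DNA_FROM_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_FROM_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the cDNA strand (5'->3') complementary to an mRNA given 5'->3'."""
    return "".join(DNA_FROM_RNA[base] for base in reversed(mrna))


def second_strand(cdna):
    """Return the DNA strand (5'->3') complementary to the first cDNA strand."""
    return "".join(DNA_FROM_DNA[base] for base in reversed(cdna))


mrna = "AUGCCAGAUUAA"  # hypothetical mRNA
cdna = first_strand_cdna(mrna)
print(cdna)                 # DNA copy made by reverse transcriptase
print(second_strand(cdna))  # same sequence as the mRNA, with T in place of U
```

Once both DNA strands exist, the molecule can be amplified by PCR in the same way as any other DNA template.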
BOX 1: MHC genes, immune response and evolution
Humans and nonhuman primates face a huge range of dangerous and rapidly changing
pathogens in their natural habitats. Disease-causing organisms usually have short generation
times and an ability to adapt quickly to their host. When you consider that these organisms can
also cause mortality, they clearly represent powerful agents of evolution. In humans, epidemics
such as the bubonic plague, caused by the insect-borne bacterium Yersinia pestis, killed up to 20
million Europeans in the 1300s. Currently, HIV is lowering life expectancy and reversing gains in
child survival in east and central Africa, and it is spreading rapidly in South Asia as well.
Most vertebrates cope with these challenges through immune response. Some blood
proteins reduce the likelihood of disease, but a much more complex genetic system provides a
key barrier to infection by disease-causing organisms. The major histocompatibility complex
(MHC) is sometimes considered the center of the immune universe since it consists of many
genes directly involved in battling parasitic infections. In humans, the MHC is also known as the
human leukocyte antigen (HLA) complex. HLA genes occupy more than 600,000 bases of the
entire 3,800,000-base HLA complex on chromosome 6. Using molecular genetic methods,
hundreds of different alleles have been identified at some HLA loci. In humans, HLA
polymorphism is so great that it is theoretically possible for every single person to possess a
genetically different combination of HLA alleles. In recognition of medical and
evolutionary implications, many scientists have been using molecular genetic techniques
such as PCR, recombinant DNA cloning and DNA sequencing to study MHC genes
in nonhuman primates. Not surprisingly, comparative studies of DNA sequences from
chimpanzee and gorilla MHC genes have revealed a remarkable degree of similarity
with human DNA sequences. In some cases, MHC alleles may be more similar between two
species than within each species. For example, comparisons of DNA sequences from the
MHC-DRB1 locus in humans (HLA) and chimpanzees (Patr) show that the human alleles
(HLA-DRB1*0302 and HLA-DRB1*0701) have more sequence differences (31 nucleotides) than either
has with its chimpanzee counterpart (HLA-DRB1*0302/Patr-DRB1*0305: 13 nucleotides and
HLA-DRB1*0701/Patr-DRB1*0702: 2 nucleotides) (see Figure 11). These similarities between
humans and chimpanzees indicate that the alleles were inherited by both species from their shared
common ancestor (Klein et al., 1993).
Figure 11: Some human and chimp genes are very similar. A gene tree of human (HLA) and
Pan troglodytes (Patr) MHC alleles shows that some alleles are more similar between species
than within species. The alleles shown are Patr-DRB1*0305, HLA-DRB1*0302, Patr-DRB1*0702
and HLA-DRB1*0701. (After Klein, Takahata and Ayala, 1993)
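The DRB1 comparison can be checked with a small lookup: for each human allele, find the allele that differs from it at the fewest nucleotides. The sketch below uses only the three comparisons reported in this box; the function name and the data layout are illustrative choices.

```python
# Nucleotide differences reported in the text for pairs of DRB1 alleles.
differences = {
    frozenset({"HLA-DRB1*0302", "HLA-DRB1*0701"}): 31,
    frozenset({"HLA-DRB1*0302", "Patr-DRB1*0305"}): 13,
    frozenset({"HLA-DRB1*0701", "Patr-DRB1*0702"}): 2,
}


def closest_relative(allele):
    """Return the allele with the fewest reported differences from `allele`."""
    candidates = {
        next(iter(pair - {allele})): count
        for pair, count in differences.items()
        if allele in pair
    }
    return min(candidates, key=candidates.get)


for human_allele in ("HLA-DRB1*0302", "HLA-DRB1*0701"):
    print(human_allele, "is closest to", closest_relative(human_allele))
```

In both cases the closest match is a chimpanzee allele rather than the other human allele, which is the pattern drawn in the gene tree of Figure 11.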
MHC diversity could be maintained by natural selection favoring heterozygotes.
Molecular genetic studies of HLA nucleotide sequences demonstrate a significantly higher rate of
nonsynonymous substitutions in the regions directly involved in immune response.
Additionally, there is evidence that the inheritance of particular HLA molecules provides
resistance to certain pathogens. In West Africa, for example, certain HLA alleles (HLA-B53 and
HLA-DRB1*1302) are found in individuals who are resistant to Plasmodium falciparum malaria
(Hill et al., 1992). Similar HLA associations have been described for HIV progression and
hepatitis B virus resistance.
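How selection favoring heterozygotes can maintain diversity is easiest to see in the standard one-locus, two-allele model from population genetics, which is not specific to the MHC. In the sketch below, the fitness costs `s` and `t` of the two homozygotes are arbitrary illustrative values, and the function names are invented. Under this model the frequency of the first allele is expected to settle at t/(s+t), so neither allele is lost.

```python
def next_frequency(p, s, t):
    """One generation of selection with heterozygote advantage.

    Relative fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t; p is the frequency of A.
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def simulate(p0, s, t, generations):
    """Follow the frequency of allele A over a number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, t)
    return p


s, t = 0.1, 0.2  # hypothetical fitness costs of the two homozygotes
for start in (0.01, 0.99):
    print(start, "->", round(simulate(start, s, t, 500), 4))
print("expected equilibrium:", round(t / (s + t), 4))
```

Whether the allele starts out rare or common, its frequency moves toward the same intermediate value, so both alleles persist in the population.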
Since resistance to infectious disease is so important, it should not be surprising that
individuals also have behavioral and biological mechanisms for maintaining MHC
heterozygosity. Studies of humans and mice have revealed that MHC-based mating preferences
for partners with different MHC types preferentially produce MHC heterozygous progeny with
higher Darwinian fitness (see Apanius et al., 1997). In mice, MHC genes actually affect an
individual’s odor, and it has been suggested that mate choice, and even kin recognition, is based
upon these odor cues. The ability to discriminate MHC-based odors has also been observed in
humans, with some humans capable of recognizing mates and relatives on the basis of olfactory
cues. Some scientists have also discovered that women exhibit preferences for odors from males
with particular HLA alleles, usually different from their own. In 1995, Claus Wedekind, a
zoologist at Bern University in Switzerland, tested women’s responses to sweaty T-shirts. He
found that women preferred the scent of T-shirts from men who had the most dissimilar HLA
types to their own. Wedekind argued that these results indicate that body odor plays a role in
female mate choice.
Currently, studies of how MHC genes influence odor, mate choice and disease resistance
are underway in several laboratories. Although complex, the genes of the MHC have the
potential to provide us with important insight into evolution, behavior and reproduction in
humans and nonhuman primates.
MHC genes are also of great importance in the field of medicine for several reasons.
First, they play a major role in mediating tissue transplantation. Successful organ or bone
marrow transplantation requires matching of as many HLA alleles as possible, since any
difference between donor and recipient can prompt vigorous, and sometimes fatal, immune-mediated rejection of the transplant. To increase the likelihood of HLA matching, relatives are
often encouraged to donate bone marrow for transplants. Another reason that MHC genes are
important in medicine is the fact that many autoimmune diseases, like diabetes and rheumatoid
arthritis, occur more frequently in individuals with certain HLA alleles. For example,
insulin-dependent diabetes mellitus occurs more often than expected in individuals with the
HLA-DQB1*0302 allele, and about 90% of people with an inflammatory disease of the hips and spine,
known as ankylosing spondylitis, have the HLA-B27 allele (see Hill, 2001). Given the
relationship between MHC and immune response, it should not be surprising that HLA alleles
have a role in predisposition to autoimmune diseases. Nevertheless, it might seem puzzling that
some alleles that predispose individuals to autoimmune diseases are common in contemporary
populations. Apanius et al. (1997) suggest that these alleles may be maintained because they
confer some benefit, such as resistance to infectious diseases, that outweighs the deleterious
effects from autoimmunity.
Recombinant DNA Technology and Human Evolution
Almost every day we hear about new breakthroughs in biotechnology and
molecular genetics. Cloning is one of those relatively new advances that attracts media
attention. We hear about claims that humans are being cloned, but the idea of an exact
human replica is preposterous. Human clones, if they were ever successfully created,
would be no more replicas of each other than identical twins are. Even identical twins,
with the same genetic make-up, still exhibit physical and behavioral differences due to
numerous environmental factors (e.g., experiences, education, nutrition). Furthermore,
attempts to clone complex organisms such as mice, sheep and even monkeys have had
mixed success and it is unlikely that similar experiments would be attempted with
humans in the near future. The recent experiments by Korean scientists (see Tamkins,
2004), creating human embryo clones for the production of therapeutic stem cells,
apparently do not represent an attempt to produce living children.
Cloning also has an important place in molecular genetic research. In this setting,
cloning is also known as recombinant DNA technology (see In The Lab 7).
In The Lab 7: DNA Cloning
Recombinant DNA clones are produced by combining short segments of DNA from one
organism (like a human) with DNA from another organism (such as a bacterium). Typically, the
short segment of DNA is generated using PCR and inserted, through physical and chemical
means, into a carrier molecule called a plasmid. Plasmids are small circular DNA molecules, not
unlike the mitochondrial genome, and they cannot replicate without a host. They are a bit like parasites, except
they generally do not cause problems for their hosts. Bacterial cells often serve as the host for
plasmids in molecular genetics experiments. When the bacterium replicates itself, the foreign
DNA and plasmid will also be replicated, or cloned.
In molecular genetics laboratories,
cloning with plasmids and bacteria provides
researchers with the ability to study short
segments of DNA in detail. You can see what
bacterial clones look like in Figure 12.
Nucleotide sequences can be determined and
the function of DNA sequences can be
experimentally studied. Recombinant DNA
technology has been used to artificially
synthesize proteins as well. Human insulin is
produced in this way and diabetics benefit
from this modern and efficient application of
molecular genetics.
Figure 12: Cloning and sequencing is used to identify new genes. This graduate student at
the University of Cambridge identifies genetically modified (i.e., recombinant DNA) clones.
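The insertion step described in this box can be pictured as splicing one string into another. The sketch below makes several assumptions that are not stated in the box: the plasmid is opened at a single EcoRI recognition site (GAATTC, cut after the first G), the fragment already carries matching ends, and the circular plasmid is written as a linear string. The sequences and the function name are invented for illustration.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts after the first G


def insert_fragment(plasmid, fragment, site=ECORI_SITE, cut_offset=1):
    """Splice `fragment` into `plasmid` at its single cut site.

    Simplified model: the circular plasmid is written as a linear string and
    the ends of the fragment are assumed to be compatible with the cut.
    """
    if plasmid.count(site) != 1:
        raise ValueError("expected exactly one cut site in the plasmid")
    cut = plasmid.index(site) + cut_offset
    return plasmid[:cut] + fragment + plasmid[cut:]


plasmid = "TTGACAGAATTCGGCTAA"  # hypothetical plasmid sequence
fragment = "AATTCATGCCAG"       # hypothetical PCR product with EcoRI-style ends

recombinant = insert_fragment(plasmid, fragment)
print(recombinant)
```

In this model the recombinant molecule carries the foreign fragment flanked by two restored cut sites, so the same enzyme could later be used to release the insert again.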
Molecular genetic research has also contributed to advances in disease diagnosis
and medical treatment. In some cases, the advances relate to our ability to identify
individuals predisposed to a disease before symptoms even develop. In other cases, drug
therapies have been revolutionized by the advances arising from genome sequence data.
Scientists in the field of “structural genomics” use DNA sequence data to understand and
create new proteins as discussed above. Many discoveries in this field of research are
relevant for drug design and improvement of human health (see Pistoi, 2002).
However, amidst the hope and hype surrounding molecular genetic technology,
there are also difficult and, so far, unanswered questions about how this knowledge will
be best used. For example, gene therapy is intended to replace damaged genes with
healthy ones, often using a disarmed virus to deliver a package of "good" DNA into the
patient's cells. When doctors tried to use gene therapy to treat the rare metabolic disease
of a 19-year-old male in Philadelphia in 1999, however, the therapy turned out to be fatal.
Gene therapy may still be too new and unpredictable to use widely on humans. In
February 2001, the scientific journals Nature and Science devoted special sections to
studies of the Human Genome and its relevance for human health.
Although progress in the fields of cloning, recombinant DNA technology and
structural genomics may be slow, some people are concerned about the ways in which
these advances will affect our species and its evolution. The human genome, as we have
seen, is subject to mutations and is constantly evolving. Evolution of any kind involves
genetic change, and so long as recombinant DNA technology and molecular genetic
medical treatments advance knowledge and reduce disability and disease, there is good
reason to encourage further responsible research.
Acknowledgements
My thanks to Robert Jurmain and Lynn Kilgore for offering me the opportunity to
write this module and for their helpful editorial suggestions. Thanks also to Kristin
Abbott, Lynn Kilgore, Robert Jurmain, Simon Middleton, Julie Robson and Jean
Wickings for their contributions to the photos in the module and to Emma Wainwright
for the cover illustration. Finally, thanks to DKK for helpful suggestions throughout the
preparation of this module.
Suggested Discussion Questions
1. What are the key differences between the nuclear and mitochondrial genomes and
how can these differences be used to study human variation?
2. How and why is mitochondrial DNA so useful for understanding human
evolution?
3. What are the potential problems associated with molecular genetic studies of
ancient DNA?
4. What are the potential problems associated with molecular studies of noninvasively collected DNA?
5. How has the development of the polymerase chain reaction (PCR) contributed to
the study of human evolution?
6. Does non-coding DNA provide any useful information for identifying
individuals? Explain.
7. What are restriction fragment length polymorphisms (RFLPs) and how are they
used to identify individuals?
8. What are microsatellites and why are they useful for studying human variation?
9. Why would studies of Y chromosomes and mitochondrial DNA give different
results in studies of modern humans?
10. How can studies of the chimpanzee genome contribute to our understanding of
human evolution?
Bibliography
Apanius V., D. Penn, P.R. Slev, L.R. Ruff and W.K. Potts (1997) The nature of selection
on the major histocompatibility complex. Critical Reviews in Immunology, 17(2):179-224.
Bradley, B. and L. Vigilant (2002) False alleles derived from microbial DNA pose a
potential source of error in microsatellite genotyping of DNA from faeces. Molecular
Ecology Notes, 2:602-605.
Graur, D. and W. Martin (2004) Reading the entrails of chickens: molecular timescales of
evolution and the illusion of precision. Trends in Genetics, 20(2):80-86.
Hill, A.V. (2001) Immunogenetics and genomics. Lancet, 357(9273):2037-2041.
Hill, A.V., J. Elvin, A.C. Willis, M. Aidoo, C.E. Allsopp, F.M. Gotch, X.M. Gao et
al. (1992) Molecular analysis of the association of HLA-B53 and resistance to severe
malaria. Nature, 360: 434-439.
Klein, J, N. Takahata and F.J. Ayala (1993) MHC polymorphisms and human origins.
Scientific American, Dec. 1993:78-83.
Krings M, A. Stone, R.W. Schmitz, H. Krainitzki, M. Stoneking and S. Paabo (1997)
Neandertal DNA sequences and the origin of modern humans. Cell, 90(1):19-30.
Li, H.W., K.H. Wolfe, J. Sourdis and P.M. Sharp (1987) Reconstruction of phylogenetic
trees and estimation of divergence times under constant rates of evolution. Cold Spring
Harbor Symposium in Quantitative Biology, 52:847-856.
Messier, W. and C.B. Stewart (1997) Episodic adaptive evolution of primate lysozymes.
Nature, 385(6612):151-154.
Morin P.A., K.E. Chambers, C. Boesch and L. Vigilant (2001) Quantitative polymerase
chain reaction analysis of DNA from noninvasive samples for accurate microsatellite
genotyping of wild chimpanzees (Pan troglodytes verus). Molecular Ecology,
10(7):1835-44.
Moxon, E.R. and C. Wills (1998). DNA microsatellites: agents of evolution? Scientific
American, 280(1):94-99.
O’Rourke, D.H., M.G. Hayes and S.W. Carlyle (2000) Ancient DNA studies in physical
anthropology. Annual Review of Anthropology, 29:217-242.
Ovchinnikov I.V., A. Gotherstrom, G.P. Romanova, V.M. Kharitonov, K. Liden and W.
Goodwin (2000) Molecular analysis of Neanderthal DNA from the northern Caucasus.
Nature, 404(6777):490-493.
Pistoi, S (2002) Facing your genetic destiny. see www.sciam.com/article.cfm?articleid=
00016A09-BE5F-1CDAB4A8809EC5888EEDF
Shaw J.P., J. Marks, C.C. Shen and C.K. Shen (1989) Anomalous and selective DNA
mutations of the Old World monkey alpha-globin genes. Proceedings of the National
Academy of Sciences, U S A, 86(4):1312-1316.
Sibley, C. G. and J.E. Ahlquist (1984) The phylogeny of hominoid primates, as indicated
by DNA-DNA hybridization. Journal of Molecular Evolution, 20: 2-15
Tamkins, T. (2004) South Koreans create human stem cell line using nuclear transfer.
Lancet, 363(9409):623.
Ward, R. and C. Stringer (1997) A molecular handle on the Neanderthals. Nature,
388:225-226.
Wedekind C., T. Seebeck, F. Bettens and A.J. Paepke (1995) MHC-dependent mate
preferences in humans. Proceedings of the Royal Society of London, Biological Sciences,
260(1359):245-249.
Wildman D.E., M. Uddin, G. Liu, L.I. Grossman and M. Goodman (2003) Implications
of natural selection in shaping 99.4% nonsynonymous DNA identity between humans
and chimpanzees: enlarging genus Homo. Proceedings of the National Academy of
Sciences, U S A, 100(12):7181-7188.
Wildman D.E., T.J. Bergman, A. Al-Aghbari, K.N. Sterner, T.K. Newman, J.E. Phillips-Conroy,
C.J. Jolly and T.R. Disotell (2004) Mitochondrial evidence for the origin of
hamadryas baboons. Molecular Phylogenetics and Evolution, 32(1):287-296.
Suggested Readings and Internet Sites
Molecular Anthropology and the Human Genome
Collins, F.S., M. Morgan and A. Patrinos (2003) The Human Genome Project: Lessons
from Large-Scale Biology. Science, 300: 286-290
Sources of DNA and Biological Sample Collection
Hofreiter, M., D. Serre, H.N. Poinar, M. Kuch and S. Paabo (2001) Ancient DNA.
Nature Reviews Genetics, 2:353-359.
DNA Extraction
Cooper, A. and H.N. Poinar (2000) Ancient DNA: Do It Right or Not at All. Science,
289(5482): 1139.
Principles and Applications of the Polymerase Chain Reaction (PCR)
Mullis, K. B. (1990) The Unusual Origin of the Polymerase Chain Reaction. Scientific
American, April, pp.36-39.
Repetitive DNA Sequences
Goodwin, W., A. Linacre, and P. Vanezis (1999). The use of mitochondrial DNA and
short tandem repeat typing in the identification of air crash victims. Electrophoresis, 20,
1701-1711.
DNA-based Trees and Evolution
Build a Molecular Clock: The Origin of HIV
www.smccd.net/accounts/case/CPS/400.html
Generation of Phylogenetic Tree based upon DNA sequence analysis.
www.bioweb.uwlax.edu/GenWeb/Evol_Pop/Phylogenetics/Exercise/exercise.htm
Protein Structure and Function
Genetic Science Learning Center, University of Utah
www.gslc.genetics.utah.edu/units/basics/
MHC Genes, Immune Response and Evolution
Knapp, L.A. (2002) Evolution and Immunology. Evolutionary Anthropology,
11(S1):140-144.
Knapp, L.A. (in press) The ABCs of MHC. Evolutionary Anthropology.
Recombinant DNA Technology and Human Evolution
Facing Your Genetic Destiny
www.sciam.com/article.cfm?articleid=00016A09-BE5F-1CDAB4A8809EC5888EEDF
Glossary
alleles Alternate forms of a gene. Alleles occur at the same locus on homologous
chromosomes and thus govern the same trait. However, because they are different, their
action may result in different expressions of that trait. The term is sometimes used
synonymously with gene.
amino acids Small molecules that are the components of proteins.
anneal To join together. In molecular genetics, two single-strands of DNA can anneal to
form one double-stranded molecule.
antigen Large molecule found on the surface of cells. Several different loci govern
various antigens on red and white blood cells. (Foreign antigens provoke an immune
response.)
antibody Proteins that are produced by some types of immune cells and that serve as
major components of the immune system. Antibodies recognize and attach to foreign
antigens on bacteria, viruses, and other pathogens. Then other immune cells destroy the
invading organism.
base-pairs (bp) Pairs of nucleotides held together by hydrogen bonds. The base
adenine (A) pairs with thymine (T) and guanine (G) pairs with cytosine (C).
chromosomes Discrete structures composed of DNA and protein found only in the nuclei
of cells. Chromosomes are only visible under magnification during certain phases of cell
division.
clone An organism that is genetically identical to another organism. The term may also
be used to refer to genetically identical DNA segments, molecules, and cells.
complementary Referring to the fact that DNA bases form base pairs in a precise
manner. For example, adenine can bond only to thymine. These two bases are said to be
complementary because one requires the other to form a complete DNA base pair.
cytoplasm The portion of the cell contained within the cell membrane, excluding the
nucleus. The cytoplasm consists of a semifluid material and contains numerous structures
involved with cell function.
data (sing., datum) Facts from which conclusions can be drawn; scientific information.
deletions/deletion mutations A change in DNA sequence due to the loss of one or more
nucleotides.
denaturation The physical separation of a molecule, usually through heat or chemical means.
When the hydrogen bonds of double-stranded DNA are broken and two single-strands are
formed, the DNA is denatured.
deoxyribonucleic acid (DNA) The double-stranded molecule that contains the genetic
code. DNA is a main component of chromosomes.
derived (modified) Referring to characters that are modified from the ancestral condition
and thus are diagnostic of particular evolutionary lineages.
domain Region of a protein with distinct structure and characteristic function that is
determined by the protein’s tertiary structure. The tertiary structure is the way in which
the strings of amino acids of the protein fold with respect to each other.
double-stranded The usual and most stable structure for DNA. Two single-strands of
DNA are usually paired together to form one double-stranded molecule. A double-strand
forms when nucleotides are paired through hydrogen bonding, with G pairing with C and
A pairing with T. Double-stranded DNA is also known as a duplex.
duplex Double-stranded DNA molecules.
enzymes Specialized proteins that initiate and direct chemical reactions in the body.
evolution A change in the genetic structure of a population. The term is also frequently
used to refer to the appearance of a new species. The modern genetic definition is a
change in the frequency of alleles from one generation to the next.
exon Regions of a gene that consist of nucleotides that will be translated into proteins
during protein synthesis.
extra-genic DNA DNA sequences that do not code for proteins (i.e., they are not genes).
gel electrophoresis A technique for separating DNA sequences that differ by length or
nucleotide sequence.
gene A sequence of DNA bases that specifies the order of amino acids in an entire
protein, a portion of a protein, or any functional product. A gene may be made up of
hundreds or thousands of DNA bases organized into coding and noncoding segments.
genetics The study of gene structure and action and the patterns of inheritance of traits
from parent to offspring. Genetic mechanisms are the underlying foundation for
evolutionary change.
genome The entire genetic makeup of an individual or species. In humans, it is estimated
that each individual possesses approximately 3 billion DNA nucleotides.
hemoglobin A protein molecule that occurs in red blood cells and binds to oxygen
molecules.
heterozygous/heterozygote Having different alleles at the same locus on members of a
chromosome pair. Can be contrasted with homozygous/homozygote: having the same
allele at the same locus on both members of a chromosome pair.
HLA The human major histocompatibility complex (MHC). There are hundreds of HLA
genes, many of which are involved in immune response and disease resistance. HLA
genes also provide an immunological marker for genetic self-identity.
homology/homologous Similarity between organisms based on descent from a common
ancestor.
Human Genome Project An international effort aimed at sequencing and mapping the
entire human genome.
hybridization The creation of double-stranded DNA sequences by allowing single-stranded DNA sequences to anneal.
insertions/insertion mutations A change in DNA sequence due to the addition of one or
more nucleotides.
intron Regions of a gene that consist of nucleotides that will not be translated into
proteins during protein synthesis. Introns are removed from the protein-coding sequence
of a gene during editing of the mRNA.
in vitro Chemical or biological reactions that take place in a test tube.
locus (pl., loci) The position on a chromosome where a given gene occurs. The term
is sometimes used interchangeably with gene, but this usage is technically incorrect.
long interspersed nucleotide element (LINE) A type of repetitive DNA sequence that
is about 14,000 to 61,000 bp in length. More than 50,000 copies of LINEs have been
identified in the human genome. LINEs are found in most mammalian genomes and
generally do not code for proteins.
meiosis Cell division in specialized cells in ovaries and testes. Meiosis involves two
divisions and results in four daughter cells, each containing only half the original number
of chromosomes. These cells can develop into gametes.
messenger RNA (mRNA) A form of RNA that is assembled on a sequence of DNA
bases. It carries the DNA code to the ribosome during protein synthesis.
Major Histocompatibility Complex (MHC) A genetic complex of vertebrate genes that
provide an immunological marker for genetic self-identity. (Also known as the HLA
complex in humans.)
mis-sense mutation A change in DNA sequence that results in an incorrect amino acid or a
stop codon.
mitochondria (sing., mitochondrion) Structures contained within the cytoplasm of
eukaryotic cells that convert energy, derived from nutrients, into a form that is used by
the cell.
mitochondrial DNA (mtDNA) DNA found in the mitochondria; mtDNA is inherited
only from the mother.
molecules Structures made up of two or more atoms. Molecules can combine with other
molecules to form more complex structures.
mutation A change in DNA. Mutation refers to changes in DNA nucleotides
(specifically called point mutations) and also to changes in chromosome number and/or
structure.
natural selection The mechanism of evolutionary change first articulated by Charles
Darwin; refers to genetic change or changes in the frequencies of certain traits in
populations due to differential reproductive success between individuals.
neutral mutation A change in DNA sequence that does not change the amino acid sequence
of the protein encoded by a gene.
non-coding DNA DNA sequences that do not code for proteins.
nucleated cells Somatic cells that contain a nucleus and, therefore, a copy of the
organism’s nuclear genome. Most cells in the body are nucleated. Some cells, like red
blood cells, do not contain a nucleus.
nucleotides Basic units of the DNA molecule, composed of a sugar, a phosphate, and one
of four DNA bases.
nucleus A structure (organelle) found in all eukaryotic cells. The nucleus contains
chromosomes (nuclear DNA).
paralogy/paralogous Homologous due to a recent or past duplication in the
same species.
pathogens Substances or microorganisms, such as bacteria, fungi, or viruses, that cause
disease.
peptide bonds The chemical bonds that hold individual amino acids together to produce
a protein.
phylogenetic tree A chart showing evolutionary relationships as determined by
phylogenetic systematics. It contains a time component and implies ancestor-descendant
relationships.
point mutation A chemical change in a single base of a DNA sequence.
polymerase An enzyme that is directly involved in the synthesis of new strands of DNA
or RNA.
polymerase chain reaction (PCR) A method of producing thousands of copies of a
DNA segment using the enzyme DNA polymerase.
polymorphisms Loci with more than one allele. Polymorphisms can be expressed in the
phenotype as the result of gene action (as in ABO), or they can exist solely at the DNA
level within noncoding regions.
primer A short sequence of single-stranded DNA that is responsible for initiating the
synthesis of new strands of DNA or RNA.
probe A single-stranded sequence of DNA that is used to identify complementary
sequences of single-stranded DNA. Probes hybridize to their complementary sequence.
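Complementary base pairing is what lets a probe find its target. The sketch below is a simplified illustration that assumes perfect matching only; real hybridization tolerates some mismatches depending on conditions. The sequences and the function names reverse_complement and probe_binding_sites are hypothetical.

```python
# Watson-Crick pairing rules for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, read in the 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def probe_binding_sites(probe, target):
    """Return 0-based positions in target where the probe pairs perfectly."""
    # The probe anneals wherever the target contains its reverse complement.
    site = reverse_complement(probe)
    last_start = len(target) - len(site)
    return [i for i in range(last_start + 1) if target[i:i + len(site)] == site]

# Hypothetical single-stranded target and probe, both written 5' to 3'.
target = "TTGACCGTAAGGCT"
probe = "CTTACGG"

print(reverse_complement(probe))           # the stretch the probe pairs with
print(probe_binding_sites(probe, target))  # where that stretch occurs
```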
protein Three-dimensional molecules that serve a wide variety of functions through an
ability to bind to other molecules.
pseudogene A gene that has acquired a nonsense mutation and is no longer transcribed.
recombinant DNA When genes, or parts of genes, from one species are transferred to
somatic cells or gametes of another species.
replicate To duplicate. The DNA molecule is able to make copies of itself.
restriction endonuclease An enzyme that can be used to cut (or restrict) particular DNA
sequences. For example, the restriction endonuclease EcoRI will cut the DNA sequence
GAATTC. Restriction endonuclease and restriction enzyme are used interchangeably.
restriction fragment length polymorphism (RFLP) Genetic polymorphism that is
revealed by the different sizes of fragments generated with a particular restriction
endonuclease (such as EcoRI).
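How a single base change produces an RFLP can be sketched in a few lines. This is an illustrative toy, assuming linear DNA and looking at one strand only. The two allele sequences and the function name digest are invented for the example. The recognition sequence GAATTC comes from the EcoRI entry, and the cut position (between G and AATTC) is the known EcoRI cleavage point.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts after the first base of its site: G^AATTC

def digest(sequence, site=ECORI_SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment lengths are the distances between neighbouring cut positions.
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]

# Hypothetical alleles: allele_2 differs by one base (A -> G) in the second
# EcoRI site, which destroys that site.
allele_1 = "ACGT" * 3 + "GAATTC" + "TTAGC" * 4 + "GAATTC" + "CCGTA" * 2
allele_2 = "ACGT" * 3 + "GAATTC" + "TTAGC" * 4 + "GAGTTC" + "CCGTA" * 2

fragments_1 = digest(allele_1)
fragments_2 = digest(allele_2)
print(fragments_1)
print(fragments_2)
```

The first allele is cut twice and yields three fragments. The second is cut once and yields two, and this difference in fragment sizes is what separates the alleles on a gel.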
reverse-transcriptase An enzyme that is used to synthesize new strands of DNA from an
mRNA sequence.
ribonucleic acid (RNA) A single-stranded molecule, similar in structure to DNA. Three
forms of RNA are essential to protein synthesis. They are messenger RNA (mRNA),
transfer RNA (tRNA), and ribosomal RNA (rRNA).
sequence A string of nucleotides. For genes, the precise order of the nucleotides will
determine the amino acids that make up a protein.
short interspersed nuclear element (SINE) A type of repetitive DNA sequence that is
about 300 bp in length. More than 500,000 copies of the Alu SINE have been identified
in the human genome. SINEs are found in most mammalian genomes, but the Alu SINE
is unique to primates.
sickle-cell anemia A severe inherited hemoglobin disorder that results from inheriting
two copies of a mutant allele. This allele results from a single base substitution in the
DNA.
single-stranded An unstable, and usually temporary, structure for DNA. Single strands
of DNA usually pair together to form one double-stranded molecule.
tandem Literally, side-by-side. Tandem repeats are DNA sequences that are repeated
side-by-side.
transcription The first step in protein synthesis. Transcription is the transfer of genetic
information from the DNA template to RNA. It is followed by translation of the RNA
into amino acids and then proteins.