Molecular Marker Guide

A molecular marker is a stretch of DNA (this can range from one base to millions of bases) that enables
us to deduce information about the identity, demography, history and evolution of an individual,
population or species. There are many different marker types and some are best suited to particular
applications. This guide is not exhaustive as new markers and technologies are emerging all the time.
Instead it aims to provide an overview of the main marker types, their uses and their limitations. To find
people who can provide you with information about the right marker type(s) for your study, search for
labs/researchers in the Community Database.
Links for more information
Provides details on marker types and their applicability to different ecological questions:
http://www.nature.com/scitable/knowledge/library/molecular-genetic-techniques-and-markers-forecological-15785936
Overview of different molecular markers used in conservation:
http://labs.russell.wisc.edu/peery/files/2011/12/Molecular-Markers.pdf
Information and links to resources about plant molecular markers (though note some of the information
is applicable to animals too):
http://www.cgn.wur.nl/UK/CGN+Plant+Genetic+Resources/Research/Molecular+markers/
A practical guide to using microsatellites for ecologists:
http://web.ecologia.unam.mx/laboratorios/evolucionmolecular/images/a_homes/pdfs/SelkoeToonen_2006_MicrosForEcologists.pdf
Review of AFLPs and their applications: http://www.softgenetics.com/TrendsPlantScience2007.pdf
A guide to using expressed sequence tags for ecologists: http://www.mol-palaeolit.de/pdf/bouck/2007/5517_Bouck+Vision2007.pdf
Reviews of new genomic applications to conservation: http://iuss.unife.it/dipartimento/biologiaevoluzione/ricerca/evoluzione-e-genetica/chioggia2011/allendorf-et-al-2010-nature-reviews-genetics.pdf
http://129.125.2.51/fmns-research/theobio/_pdf/ou_eatig10.pdf
A guide to next generation sequencing technologies:
ftp://84.237.21.152/pub_archive/lin/yu/NGS/2010_Field_guide_to_next_generation_DNAsequencers.pdf
Key words:
Polymerase chain reaction (PCR): A method that allows you to exponentially amplify a specific
segment of DNA, even when the starting whole cell DNA concentration is very low.
Locus: A locus is a segment of DNA at a particular position on a chromosome (the plural is
‘loci’).
Allele: An allele is a variant at a given locus. For species that have two sets of chromosomes
(diploids, like us and most mammals), one individual in a population may have alleles ‘A’ and ‘a’
at locus 1, while another has alleles ‘A’ and ‘A’.
Homozygous: A homozygous individual has a ‘genotype’ comprised of two identical genetic
variants (alleles) at the same locus on a pair of chromosomes.
Heterozygous: A heterozygous individual has a genotype comprised of two different alleles at the
same locus on a pair of chromosomes.
Polymorphic: A locus is polymorphic when different alleles occur among individuals in a
population, in other words when there is genetic variation at that locus.
Dominant: A dominant marker can only be scored as present or absent. It is not possible to
distinguish heterozygotes from homozygotes that carry the ‘present’ allele.
Codominant: It is possible to distinguish between heterozygotes and homozygotes using
codominant markers.
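The key words above can be made concrete with a short Python sketch. It is illustrative only: the function names and the four sample genotypes are hypothetical, and it assumes a diploid organism scored at a single locus with a codominant marker.

```python
from collections import Counter

def zygosity(genotype):
    """Classify a diploid genotype (a pair of alleles) at one locus."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"

def allele_frequencies(genotypes):
    """Return the frequency of each allele across a list of diploid genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_polymorphic(genotypes):
    """A locus is polymorphic if more than one allele occurs in the sample."""
    return len(allele_frequencies(genotypes)) > 1

# Hypothetical sample: four individuals genotyped at locus 1.
locus_1 = [("A", "a"), ("A", "A"), ("a", "a"), ("A", "a")]
calls = [zygosity(g) for g in locus_1]
freqs = allele_frequencies(locus_1)
```

Here two individuals are heterozygous and two are homozygous, and because both ‘A’ and ‘a’ occur in the sample the locus is polymorphic.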
Genomic regions

Nuclear autosomal DNA markers:
Genomic location: The autosomes (i.e. the chromosomes which are not sex-chromosomes and
not from organelle – mitochondrial and chloroplast – DNA).
Inheritance: Inherited from both parents in sexually reproducing organisms.
Applications: Individual identification, Inbreeding, Pedigrees, Parentage, Demography,
Population structure, Selection, Speciation, Phylogenetic relationships, Phylogeography.
Limitations: It can be difficult to reconstruct the inheritance of genetic lineages due to
recombination of DNA and to determine which allele was inherited from which parent.
Marker types: Allozymes, anonymous markers (e.g. AFLPs, RAD tags), microsatellites, SNPs,
ESTs (see below), DNA sequence data, whole genomes.

Sex-chromosome markers:
Location: Markers located on the chromosomes that determine the sex of an individual. For
example, mammals have X and Y chromosomes (females are XX and males are XY), while birds
have Z and W chromosomes (females are ZW and males are ZZ).
Inheritance: Marker specific. Sex-chromosome markers are inherited asymmetrically by the sexes,
but the mode varies. For example in mammals Y-chromosomes are inherited from father to son
and X-chromosomes are inherited from the mother in sons, but from both parents in daughters.
Applications: Useful for investigating sex-biased processes: Paternity, Maternity, Sex-typing,
Sex-biased dispersal/population structure, Selection.
Limitations: Markers from sex-chromosomes can be difficult to develop due to their DNA
complexities. Many sex-chromosome markers have low variability meaning they have low
resolution. Cannot be used to investigate processes equally affecting both sexes.
Marker types: microsatellites, SNPs, DNA sequence data.

Mitochondrial DNA markers:
Location: Mitochondrial organelles.
Inheritance: From mother to offspring in vertebrates (i.e. down matrilines).
Applications: Useful for looking at female-biased dispersal due to maternal inheritance,
Demography, Phylogeography, Selection. Mitochondrial genomes are typically highly variable in
vertebrates (which can provide high resolution), though they can be highly conserved in plants.
Limitations: Can be difficult to disentangle whether patterns are due to selection or demography.
High stochasticity associated with mtDNA markers means patterns may not be congruent with
true population history. Mitochondrial data should usually be combined with autosomal data
when looking at phylogeographic processes.
Marker types: SNPs, DNA sequence data, whole mitochondrial genome.

Chloroplast DNA markers:
Location: Plastid organelles.
Inheritance: Depends on the species – can be uniparental or biparental, and the size of these
genomes varies among species.
Applications: Demography, Phylogeography, Selection.
Limitations: Can be difficult to disentangle whether patterns are due to selection or demography.
Chloroplast DNA data should usually be combined with autosomal data when looking at
phylogeographic processes. Some plastid genomes are highly conserved (low variation and so
low resolution).
Marker types: SNPs, DNA sequence data, whole plastid genome.
Marker Types
Allozymes: Genes code for proteins, some of which form enzymes. Genetic variation at the locus (or
loci) coding for an enzyme may change the charge properties of the enzyme, allowing the variants to be
discriminated as ‘allozymes’. These markers are co-dominant and may be useful in non-model organisms
as they do not require prior sequence information. However, not all variation at the locus will be detected
by this method, so the resolution is relatively low. In addition, the enzyme has a specific function and so
could be under directional or purifying selection (which could confound assumptions made about the
neutrality of genetic markers when populations are compared).
Anonymous markers: Some of these techniques use restriction enzymes (enzymes that cut DNA) and
PCR, to amplify thousands or millions of fragments of DNA randomly from an organism’s genome.
They have the advantage that they do not require prior information about the DNA sequence of the
species, and so are good for non-model organisms. Markers employing this method include AFLPs
(amplified fragment length polymorphisms) and RAD tags (restriction site associated DNA markers).
AFLP markers are dominant, meaning that interpretation is restricted to assessing the presence or
absence of variable sites without knowing what locus they are associated with (so no information on
genotype). RAD tags provide information on heterozygosity and homozygosity by providing redundant
sequence for the same segments of DNA, and therefore information on genotype. Note that some other
approaches to anonymous DNA amplification (e.g. RAPD and ISSR) provide data that are difficult to
interpret accurately, and are therefore not recommended.
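The difference between dominant and codominant scoring can be sketched in a few lines of Python. This is a simplified illustration, not a description of any scoring software. It assumes a hypothetical locus where allele ‘A’ produces a band and allele ‘a’ produces none.

```python
def score_codominant(genotype):
    """Codominant scoring keeps both alleles, so all three genotypes differ."""
    return "".join(sorted(genotype))

def score_dominant(genotype, band_allele="A"):
    """Dominant scoring records only band presence (1) or absence (0)."""
    return 1 if band_allele in genotype else 0

# The three possible diploid genotypes at the hypothetical locus.
genotypes = [("A", "A"), ("A", "a"), ("a", "a")]
codominant_scores = [score_codominant(g) for g in genotypes]
dominant_scores = [score_dominant(g) for g in genotypes]
```

Under dominant scoring the ‘AA’ homozygote and the ‘Aa’ heterozygote both score as band present, which is why genotype information is lost.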
DNA sequence data: Determining the sequence of nucleotide bases (A, T, G & C) in a segment of
DNA provides detailed information on genetic variation, and is fast, easy and inexpensive when
combined with PCR and based on mtDNA. This approach has been very popular for studies of
vertebrate populations where mtDNA is highly variable (and so provides good resolution) and both
haploid and matrilineal (avoiding complications presented by the PCR amplification and sequencing of
nuclear loci, which require further steps to determine genotype prior to sequencing). New DNA
sequencing methodologies, and especially next generation sequencing, will provide much more sequence
data quickly, and sequence paired chromosomes in diploids through extensive re-sequencing of each
position. Eventually this may provide whole genome sequences for population-level analyses, but in 2012
this would still be quite expensive. RAD tag approaches (see above) provide a subset of the full genome
sequence, and this approach is more affordable, but only necessary when either high resolution is
required, or when the objective is to look for loci that may be under selection. Sequence data for short
segments of DNA can also provide important information on selection, when the locus sequence is well
described and known to be under selection. The most common example of this is the sequencing of
immune system genes (e.g. ‘MHC’ genes) for comparison at the population level. Another potential
resource for investigating genes under selection is ‘Expressed Sequence Tags’ (ESTs). These are sequences
associated with the transcripts of functional genes, and are especially useful when combined with array
technologies (see below) that can screen many such loci at once.
Microsatellite DNA: These are relatively short (~20-200 bases) simple sequence repeats of DNA, such
as ACACACAC or GTTGTTGTT. Microsatellites are typically highly variable in length among
individuals (evolving by DNA copying errors that alter the length of the repeat array) and so provide high
resolution for comparing populations or individuals (e.g. to assess levels of kinship). These loci are
amplified by PCR and the genotypes resolved by standard methods that allow the process to be fast and
inexpensive. Multiple loci are combined (typically 10-20, but more will provide better accuracy) in a
single screening process. Care needs to be taken to ensure that the genotypes are read accurately, and to
control for variation among labs when multiple labs are involved.
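To illustrate what a simple sequence repeat looks like in sequence data, the following Python sketch scans a DNA string for tandem repeats of short motifs. It is a minimal example under stated assumptions: the function name, the thresholds (motifs of 2 to 6 bases, at least 4 copies) and the test sequence are all hypothetical. In practice microsatellite genotypes are usually read from PCR fragment lengths rather than from sequence.

```python
import re

def find_microsatellites(sequence, min_repeats=4, min_motif=2, max_motif=6):
    """Return (motif, repeat_count, start) for each tandem repeat found.

    The motif is matched lazily so the shortest repeating unit is reported,
    and the back-reference \\1 requires further exact copies of that motif.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((motif, repeat_count, match.start()))
    return hits

# Hypothetical sequence with an (AC)6 repeat and a (GTT)4 repeat.
example = "TTGCT" + "AC" * 6 + "GGATC" + "GTT" * 4 + "CA"
repeats = find_microsatellites(example)
```

Two individuals carrying different repeat counts at the same locus (for example 6 versus 8 copies of AC) would yield PCR products of different lengths, which is the variation that microsatellite genotyping scores.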
Single nucleotide polymorphisms (SNPs): DNA sequences vary among individuals at specific
nucleotide positions, and these can be identified as ‘single nucleotide polymorphisms’. Methods can then
be devised to look for variation only at these sites that are known to be variable, rather than collecting
sequence data for the full stretch of DNA in which they occur. There are various ways to screen for
these polymorphisms, but one common way is to use a ‘micro-array’. This is a means to set up a template
including the various SNP loci that have been identified, and then to screen DNA samples for a match to
sequences found on the template (known as a ‘chip’). Once the preliminary work has been done, this
allows a large number of loci to be assessed quickly for a large number of samples. However, the
preliminary work can be expensive and time-consuming, and it is important to control for potential biases
by basing the initial setup on a diverse range of populations.
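The two-step logic described above (first discover which sites vary, then score samples only at those sites) can be sketched in Python as follows. This is an illustration only: it assumes the sequences are already aligned and of equal length, and the function names and three short sequences are hypothetical.

```python
def find_snps(aligned_sequences):
    """Return {position: set of bases} for every variable alignment column."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for position, column in enumerate(zip(*aligned_sequences)):
        bases = set(column)
        if len(bases) > 1:  # more than one base observed: a variable site
            snps[position] = bases
    return snps

def score_at_snps(sequence, snp_positions):
    """Read a sample only at the known variable positions."""
    return {position: sequence[position] for position in snp_positions}

# Hypothetical discovery panel of three aligned sequences.
panel = ["ATGCCGTA", "ATGTCGTA", "ATGCCGAA"]
snps = find_snps(panel)
new_sample = score_at_snps("ATGTCGAA", snps)
```

If the discovery panel is drawn from only one population, sites that vary elsewhere never enter `snps`. This is the bias that the text recommends controlling for by basing the initial setup on a diverse range of populations.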
Sex chromosome markers: Once a locus that is unique to the ‘heterogametic’ chromosome (e.g. the Y
chromosome in mammals) has been identified, PCR can simultaneously amplify a locus from both sex
chromosomes, and thereby identify the sex of the sample. This method is fast, easy and inexpensive.
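A sex-typing result of this kind can be interpreted with a very small rule, sketched below in Python. The sketch assumes that the two sex chromosomes give PCR products of different sizes, so the heterogametic sex shows two bands and the homogametic sex shows one. The function name and the band sizes are hypothetical.

```python
def infer_sex(band_sizes, heterogametic_sex="male"):
    """Infer sex from the PCR band sizes (in base pairs) seen for one sample.

    Assumes the two sex chromosomes give products of different sizes, so the
    heterogametic sex shows two bands and the homogametic sex shows one.
    """
    homogametic_sex = "female" if heterogametic_sex == "male" else "male"
    distinct_bands = set(band_sizes)
    if len(distinct_bands) == 2:
        return heterogametic_sex
    if len(distinct_bands) == 1:
        return homogametic_sex
    return "unresolved"  # no bands (failed PCR) or unexpected extra bands

# Mammals: males (XY) are heterogametic. Birds: females (ZW) are.
mammal_sample = infer_sex([300, 450])
bird_sample = infer_sex([300, 450], heterogametic_sex="female")
single_band = infer_sex([300])
failed_sample = infer_sex([])
```

Returning "unresolved" for a sample with no bands keeps a failed PCR from being misread as the homogametic sex.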
Marker Type Properties and Uses
| Marker | Genome | Cost | Development time | Resolution (power) | Neutral/Coding | Prior sequence data required | Applications |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Allozymes | Nuclear | Low | Low | Low | Coding | No | Genetic diversity, Population structure, Adaptation |
| AFLPs | Nuclear, sex chromosomes | Low | Low | Medium | Both | No | Demography, Population structure |
| RAD tags | Nuclear, sex chromosomes | Medium | Medium | Medium-high | Both | No | Demography, Population structure |
| Microsatellites | Nuclear, sex chromosomes, plastids | Low | Medium | Medium | Neutral (unless linked to functional locus) | Yes | Inbreeding, Individual ID, Parentage/Kinship, Demography, Population structure |
| SNPs | All | Medium-high | Medium-high | Medium-high | Both | Yes | Inbreeding, Individual ID, Parentage/Kinship, Demography, Population structure, Adaptation |
| DNA sequence data | All (can range from short segment to whole genomes) | Medium-high | Low-medium | Medium-high | Both | Yes | Species ID, Phylogenetics, Phylogeography, Demography, Population structure, Adaptation |
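As a rough aid to using the table, its ‘Applications’ and ‘Prior sequence data required’ columns can be encoded as a small lookup. The Python sketch below is one possible encoding. The dictionary layout and function name are hypothetical, and the cost, development time and resolution columns are left out.

```python
# Applications and prior-sequence requirement for each marker in the table.
MARKERS = {
    "Allozymes": (False, {"Genetic diversity", "Population structure",
                          "Adaptation"}),
    "AFLPs": (False, {"Demography", "Population structure"}),
    "RAD tags": (False, {"Demography", "Population structure"}),
    "Microsatellites": (True, {"Inbreeding", "Individual ID",
                               "Parentage/Kinship", "Demography",
                               "Population structure"}),
    "SNPs": (True, {"Inbreeding", "Individual ID", "Parentage/Kinship",
                    "Demography", "Population structure", "Adaptation"}),
    "DNA sequence data": (True, {"Species ID", "Phylogenetics",
                                 "Phylogeography", "Demography",
                                 "Population structure", "Adaptation"}),
}

def markers_for(application, prior_sequence_available=True):
    """List markers suited to an application, in table order.

    If no prior sequence data exist for the study species, markers that
    require such data are excluded.
    """
    return [
        name
        for name, (needs_sequence, applications) in MARKERS.items()
        if application in applications
        and (prior_sequence_available or not needs_sequence)
    ]

kinship = markers_for("Parentage/Kinship")
non_model = markers_for("Population structure", prior_sequence_available=False)
```

A lookup like this only narrows the options. The final choice still depends on cost, development time and the resolution the study needs.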