Genetic isolation by distance in Arabidopsis thaliana: biogeography

MEC1122.fm Page 2109 Saturday, November 11, 2000 2:57 PM
Molecular Ecology (2000) 9, 2109 – 2118
Genetic isolation by distance in Arabidopsis thaliana:
biogeography and postglacial colonization of Europe
Blackwell Science, Ltd
Timothy F. Sharbel, Bernhard Haubold and Thomas Mitchell-Olds
Department of Genetics and Evolution, Max Planck Institut für Chemische Ökologie, Carl Zeiss Promenade 10, 07745 Jena,
Germany
Abstract
Arabidopsis thaliana provides a useful model system for functional, evolutionary and
ecological studies in plant biology. We have analysed natural genetic variation in A. thaliana
in order to infer its biogeographical and historical distribution across Eurasia. We analysed
79 amplified fragment length polymorphism (AFLP) markers in 142 accessions from the
species’ native range, and found highly significant genetic isolation by distance among A.
thaliana accessions from Eurasia and southern Europe. These spatial patterns of genetic
variation suggest that A. thaliana colonized central and northern Europe from Asia and
from Mediterranean Pleistocene refugia, a trend which has been identified in other species.
Statistically significant levels of multilocus linkage disequilibrium suggest intermediate
levels of disequilibrium among subsets of loci, and analysis of genetic relationships
among accessions reveals a star- or bush-like dendrogram with low bootstrap support.
Taken together, these results suggest that there has been sufficient historical recombination in the
A. thaliana genome such that accessions do not conform to a tree-like, bifurcating pattern
of evolution – there is no ‘ecotype phylogeny.’ Nonetheless, significant isolation by distance
provides a framework upon which studies of natural variation in A. thaliana may be
designed and interpreted.
Keywords: AFLP, Arabidopsis thaliana, biogeography, linkage disequilibrium, postglacial colonization
Received 20 May 2000; revision received 22 July 2000; accepted 24 July 2000
Introduction
Arabidopsis thaliana provides a useful model system
for functional, evolutionary and ecological studies in
plant biology and genetics. Although positional cloning
or insertional mutagenesis have been used to elucidate
genetic influences on development and physiology, studies
of natural genetic variation are also becoming increasingly
common (Alonso-Blanco & Koornneef 2000). Besides
contributing to functional genomics, analyses of naturally
occurring quantitative genetic variation can elucidate
the evolutionary causes and consequences of molecular
variation in quantitative traits (Mitchell-Olds 1995). It is,
therefore, important to understand historical and ecological
influences on genetic variation in A. thaliana.
A. thaliana reproduces almost exclusively through selfing
(Redei 1975; Abbott & Gomes 1989). Observed patterns of
genetic variation are consistent with this life style: most
individuals are highly inbred, little heritable variation exists
within populations, and most genetic variation is found
among populations (Hanfstingl et al. 1994; Todokoro et al.
1995; Bergelson et al. 1998; Breyne et al. 1999; Miyashita
et al. 1999).
Correspondence: Thomas Mitchell-Olds. Fax: +49–3641–643668
© 2000 Blackwell Science Ltd
A. thaliana is native to Eurasia and North Africa (Price
et al. 1994; O’Kane & Al-Shehbaz 1997), and has become
widely naturalized in the Western Hemisphere following
European colonization. A. thaliana is a poor competitor
which occupies disturbed environments early in succession. It is commonly found in agricultural fields and other
disturbed sites associated with human activity (e.g. Bergelson
et al. 1998; Mauricio 1998). Previous studies of molecular
markers have identified no association between genetic
polymorphisms and geographical location (King et al.
1993; Todokoro et al. 1995; Bergelson et al. 1998; Miyashita
et al. 1999), suggesting that the original biogeographic
patterns in A. thaliana have been obscured by human disturbance. However, it is possible that phylogeographic
structure could be detectable within the original species
range in Eurasia.
Pleistocene changes in climate and vegetation have
influenced the geographical range and genetic variation
of many European species during the past 135 000 years
(Comes & Kadereit 1998; Hewitt 1996). A. thaliana likely
occupied Europe during glacial and interglacial periods;
thus, its present distribution may have been influenced by
repeated episodes of glacial advance and retreat. Briefly,
ice sheets covered parts of Britain and northern Europe as
well as the major mountain ranges during the Pleistocene,
while the plains of central Europe were tundra-like and
characterized by permafrost (Hewitt 1996; Willis 1996;
Comes & Kadereit 1998). Consequently, much of Europe’s
flora and fauna were forced southward into three main
glacial refugia: the Iberian Peninsula, Southern Italy, and
the Balkan region (Konnert & Bergmann 1995; Hewitt
1996; Comes & Kadereit 1998). It has also been hypothesized that parts of Scandinavia may have been a glacial
refugium, as there are indications that the Norwegian
coast was ice-free (Forsström & Punkari 1997), although
there has been little biogeographical evidence to support
this. Other regions east and south (e.g. south-west Asia)
were warmer during this time, but they suffered from
reduced precipitation levels (Willis 1996) and as a result
may have lacked suitable habitats for some species. Many
plants recolonized Europe from these refugia during the
present interglacial period, with species-specific differences in recolonization rates and patterns (Hewitt 1996;
Comes & Kadereit 1998). A. thaliana may have undergone
similar Pleistocene migrations, and this may be detectable
using molecular markers.
A number of characteristics of A. thaliana are consistent
with such a colonization pattern. First, the influx of genetically divergent populations from different glacial refugia
should lead to relatively higher interpopulation genetic
variability in Europe compared to any single glacial
refugium (Cooper et al. 1995; Leonardi & Menozzi 1995;
Schmidtling & Hipkins 1998). In support of this, elevated
interpopulation genetic variability has repeatedly emerged
from interecotype (i.e. accession) analyses of A. thaliana
(Hanfstingl et al. 1994; Todokoro et al. 1995; Bergelson
et al. 1998; Breyne et al. 1999; Miyashita et al. 1999). While
this high level of genetic variation among populations
has for the most part been attributed to its selfing nature,
a portion of this differentiation may have resulted from
isolation and divergence in disjunct refugia. Second,
populations which have undergone independent evolution in different glacial refugia should be characterized
by molecular markers unique to each region (see Comes
& Kadereit 1998; Purugganan & Suddith 1999). This
phenomenon may be reflected in the preponderance
of low frequency polymorphisms in A. thaliana (see Fig. 2
in Miyashita et al. 1999a), although other phenomena
(e.g. population bottlenecks) cannot be discounted. Third,
one would predict that populations in areas which have
acted as refugia over repeated glaciations should together
encompass most of the genetic variability in present-day
Europe (Comes & Kadereit 1998). In support of this view,
genetic analyses of a few accessions have been shown to account for most of the variability contained in
larger samples (albeit from the analysis of a single locus;
Hanfstingl et al. 1994).
Therefore, we have undertaken a study of genetic
variation across a large sample of A. thaliana accessions
using amplified fragment length polymorphism (AFLP)
markers (Vos et al. 1995). Our aim was to genotype a large
number of Eurasian accessions using markers scattered
throughout the genome in order to detect biogeographic trends in Europe and Asia. Glacial refugia have clearly
influenced genetic variation in many species (Willis 1996;
Newton et al. 1999), and our results show that geography and history are likewise important determinants of
molecular and quantitative genetic variation in A. thaliana.
Materials and methods
Samples
We sampled one genotype per accession because previous studies have found most genetic variation among
populations, and little polymorphism within populations
(Todokoro et al. 1995; Bergelson et al. 1998). Individuals
of 142 accessions (Table 1) were grown under identical
conditions (light/dark cycle) from seeds obtained from
the Arabidopsis Biological Resource Center (The Ohio State
University), the Nottingham stock centre and independent
collectors. After 6 weeks, leaves from three to five individuals
per accession were pooled, flash frozen in liquid nitrogen,
and DNA was isolated using a Nucleon (Amersham Pharmacia
Biotech Europe GmbH) extraction kit. DNA quality and
concentration were assessed by restriction digestion and
visualization of 5 µL of the product on 0.7% TAE-agarose
gels.
AFLP analysis
Between 0.5 and 1.0 µg of genomic DNA per individual
was digested with MseI (1 unit) and EcoRI (5 units; New
England Biolabs) and ligated to polymerase chain reaction
(PCR) adapters following the Ligation and Preselective
Amplification Module for Small Plant Genomes (P/N 402004)
procedure from Applied Biosystems. All restriction-ligation
reactions were incubated at 17 °C overnight, and PCRs
were run on a GeneAmp PCR System 9600 thermal cycler.
An initial screening using 64 selective primer combinations was performed on 10 accessions using the Selective
Amplification Start-up Module for Small Plant Genomes (P/N
402006, Applied Biosystems), following which the quality
and number of polymorphic bands were assessed and
three primer combinations were chosen for further
analysis (EcoRI TG × MseI CTA; EcoRI TG × MseI CAT;
EcoRI TG × MseI CTT). All samples were processed in
random order, and independent AFLP reactions were
performed on duplicate samples for internal control.
Table 1 Arabidopsis thaliana accessions from which AFLP genotypes were generated, grouped by geographical region
Geographic region: Accessions
Africa: Cvi-0, Ita-0, Jl-3, Mt-0
Asia: Condara, cs1074, cs22482, cs22484, cs22485, cs22486, cs22488, cs22491, cs22492, cs22493, cs22495, cs6179, cs6180, cs931, Hodja, Kas-1, Perm-1, Ms-0, Rsch-0, Stw-0, Ws-0, Ws-3, En-T
British Isles: Lc-0, Bur-0, Cnt-1, Edi-0, Lan-0, Su-0, Kil-0, Cal-0, Ty-0
Central Europe: Aa-0, Ag-0, Ak-1, An-2, Bch-1, Blh-1, Br-0, Bs-1, Bsch-0, Bu-0, Ca-0, Cha-0, Cha-1, Cit-0, Da(1)-12, Db-1, Di-0, Di-1, Di-g, Do-0, Dr-0, Ei-2, Eil-0, El-0, En-0, En-1, Ep-0, Estland, Fe-1, Fi-0, Ga-0, Gd-1, Ge-0, Gie-0, Goe-0, Gü-0, Gy-0, Ha-0, Hh-0, Hl-0, Hn-0, Je54, Jm-0, Ka-0, Kae-0, Kb-0, Kl-0, Kl-5, Ko-2, Kr-0, Kro-0, Kz, Le-0, Ler, Li-0, Li-5, Lip-0, Lm-2, Lö-1, Lö-2, Lz-0, Ma-0, Me-0, Mh-0, Nd-0, No-0, Mrk-0, Mz-0, Nd-1, Nok-0, Np-0, Nw-0, Ob-0, Ove-0, Pi-0, Po-0, Pr-0, Rak-2, RLD1, Ru-0, Sap-0, Sav-0, Ste-0, Ta-0, Uk-1, Wei-0, Wl-0, Wt-1
Iberian Peninsula: Bla-1, Bla-10, Co-1, Ll-0, Pla-0, Sah-0, Sf-1
Scandinavia: Fl-1, Lu-1, Oy-0, Oy-1, Te-0
Southern Italy: Bl-1, Ct-1, Mr-0, Pa-1, Pa-3, Tu-1
A sequencer loading mix was made by combining
1.2 µL deionized formamide, 0.46 µL GeneScan-500 (ROX)
internal size standard, 0.34 µL blue loading dye and 2 µL
of the selective amplification product. This mixture was
denatured at 95 °C for 4 min, and snap-cooled in an
ice-water mixture before being run on an ABI Prism 377
Genetic Analyser. The GS Run 36D-2400 module was run
using the following collection parameters: 4 h (run time);
2500 V (run voltage); 50 mA (current); 200 W (power); and
60 °C (gel temperature). Raw data were collected using the
ABI Prism GeneScan Analysis Software (Applied Biosystems), and sample files were aligned using the internal
size standard (ROX 500).
Aligned data were subsequently imported into Genographer (version 1.1.0, © Montana State University,
1998; http://hordeum.oscs.montana.edu/genographer/)
for band calling. Each AFLP locus was assessed and
scored using the ‘thumbnail’ option of Genographer,
which enables fluorescence signal strength distributions
per locus to be compared across all accessions together,
and presence was assigned if an accession had a band
≥100 fluorescence units. From this a presence/absence
matrix was constructed and imported into SPSS for
Windows (release 9.0.0; © SPSS Inc.) for data manipulation and analysis.
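As a hypothetical illustration of the scoring rule above, the sketch below converts per-locus peak heights into the presence/absence matrix using the ≥100 fluorescence-unit threshold. It assumes the peak heights have already been exported per accession and locus; the function name `call_bands` and the example values are invented, and the sketch does not reproduce Genographer's interactive thumbnail comparison.

```python
# Minimal sketch: convert per-locus fluorescence peak heights into a 0/1
# presence/absence matrix using the >= 100 fluorescence-unit rule.
# The function name and example values are hypothetical, not data from this study.
from typing import Dict, List

THRESHOLD = 100.0  # fluorescence units; a band is scored present at or above this


def call_bands(peaks: Dict[str, List[float]], threshold: float = THRESHOLD) -> Dict[str, List[int]]:
    """Return {accession: [0/1 per locus]} from {accession: [peak height per locus]}."""
    return {
        accession: [1 if height >= threshold else 0 for height in heights]
        for accession, heights in peaks.items()
    }


if __name__ == "__main__":
    peaks = {
        "accession_A": [350.0, 12.0, 100.0, 0.0],
        "accession_B": [99.9, 480.0, 150.0, 20.0],
    }
    for accession, row in call_bands(peaks).items():
        print(accession, row)
```

The resulting rows correspond to the presence/absence matrix that was passed on for analysis.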
Analyses
Accessions were separated into the following seven
geographical regions: Scandinavia; British Isles; central
Europe (North Coast to the Alps); Iberian Peninsula;
Southern Italy (South of the Po Valley); Asia; and Africa
(Table 1). The Cape Verde Islands were included in Africa because they
were probably colonized from Africa or the Canary
Islands (Böhle et al. 1996). These regions were chosen
because they are separated by physical barriers to gene
flow, or are known glacial refugia (see Hewitt 1996).
We performed analyses of linkage disequilibrium to
test for correlations or statistical nonindependence
between different loci using the program LIAN (Haubold
& Hudson 2000). Two-locus linkage disequilibrium between
allele 1 at locus A and allele 0 at locus B is often expressed
as $D = g_{A_1B_0} - p_{A_1}p_{B_0}$, where $g_{A_1B_0}$ is the frequency of the
two-locus haplotype and $p_{A_1}p_{B_0}$ is the product of the
corresponding single-locus allele frequencies. If D = 0,
the loci are said to be in linkage equilibrium; if D ≠ 0, they
are in linkage disequilibrium. However, pairwise
tests for linkage disequilibrium become cumbersome with
many loci. To avoid spurious rejection of the null hypothesis,
Bonferroni correction imposes a more stringent significance
threshold, but may be too conservative (Rice 1989; Weir
1996). Alternatively, one may ask whether more significant
results are obtained than by chance alone (Miyashita et al.
1999). However, these significance tests are not fully independent, so this approach may also be conservative.
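A minimal Python sketch of the two-locus statistic D for band presence/absence data, together with the count of pairwise tests that makes locus-by-locus testing cumbersome. The helper names are hypothetical and the example haplotypes are illustrative; no significance test for D is included.

```python
# Minimal sketch (hypothetical helper names): two-locus linkage disequilibrium
# D = g_{A1B0} - p_{A1} * p_{B0} for 0/1 band data, and the Bonferroni
# threshold implied by testing every pair of loci.
from typing import Sequence


def two_locus_d(haplotypes: Sequence[Sequence[int]], a: int, b: int) -> float:
    """D between allele 1 at locus a and allele 0 at locus b."""
    n = len(haplotypes)
    g_a1_b0 = sum(1 for h in haplotypes if h[a] == 1 and h[b] == 0) / n  # two-locus haplotype frequency
    p_a1 = sum(1 for h in haplotypes if h[a] == 1) / n  # frequency of allele 1 at locus a
    p_b0 = sum(1 for h in haplotypes if h[b] == 0) / n  # frequency of allele 0 at locus b
    return g_a1_b0 - p_a1 * p_b0


def n_pairwise_tests(r: int) -> int:
    """Number of distinct locus pairs among r loci: r(r - 1)/2."""
    return r * (r - 1) // 2


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level keeping the family-wise error rate at alpha over m tests."""
    return alpha / m


if __name__ == "__main__":
    # Allele 1 at locus 0 always occurs with allele 0 at locus 1.
    linked = [[1, 0], [1, 0], [0, 1], [0, 1]]
    # All four two-locus haplotypes equally frequent: linkage equilibrium.
    unlinked = [[1, 0], [1, 1], [0, 0], [0, 1]]
    print(two_locus_d(linked, 0, 1))    # 0.25
    print(two_locus_d(unlinked, 0, 1))  # 0.0
    m = n_pairwise_tests(79)
    print(m, bonferroni_threshold(0.05, m))
```

For the 79 loci scored here there are 3081 locus pairs, so a Bonferroni-corrected level at α = 0.05 is roughly 1.6 × 10⁻⁵ per test.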
Alternatively, genome-wide multilocus analysis of linkage disequilibrium can be applied to population data on
multilocus haplotypes (Brown et al. 1980). A mismatch
distribution is obtained by comparing all possible pairs of
haplotypes (individuals) in a data set. Each pairwise comparison counts the number of loci that differ between two
haplotypes. For example, the five-locus haplotypes 10011
and 10000 are mismatched at two loci. Under linkage
equilibrium, mismatches occur independently, one locus
at a time. In contrast, with linkage disequilibrium,
mismatches involve correlated sets of loci. Consequently,
the mismatch distribution is influenced by presence or
absence of linkage disequilibrium, thus enabling a test for
multilocus linkage disequilibrium.
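The mismatch distribution described above can be computed directly from 0/1 haplotype strings, as in the minimal Python sketch below. The function names are hypothetical, and plotting and the variance test are left out.

```python
# Minimal sketch: the mismatch distribution counts, for every one of the
# n(n - 1)/2 pairs of haplotypes, how many loci differ within the pair.
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence


def mismatches(h1: str, h2: str) -> int:
    """Number of loci at which two equal-length 0/1 haplotype strings differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of loci")
    return sum(1 for x, y in zip(h1, h2) if x != y)


def mismatch_distribution(haplotypes: Sequence[str]) -> Dict[int, int]:
    """Map each mismatch count to the number of haplotype pairs showing it."""
    counts = Counter(mismatches(a, b) for a, b in combinations(haplotypes, 2))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(mismatches("10011", "10000"))  # 2, the example given in the text
    print(mismatch_distribution(["10011", "10000", "00011"]))  # {1: 1, 2: 1, 3: 1}
```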
The observed variance of the number of mismatches
from all n(n − 1)/2 possible pairs of haplotypes in a
sample, $V_D$, can be compared to the mismatch variance
expected under linkage equilibrium, $V_e$,
$$V_e = \sum_{i=1}^{r} h_i (1 - h_i),$$
where $h_i = 1 - \sum_j p_{ij}^2$ is the genetic diversity at the i-th locus,
$p_{ij}$ is the frequency of the j-th allele at that locus,
and r is the number of loci. The test of multilocus linkage
equilibrium then amounts to testing the null hypothesis
$H_0: V_D = V_e$. This can be achieved either by imitating the
effect of recombination through computer simulation
(Souza et al. 1992) or by using a parametric test (Haubold
et al. 1998). The approach based on pairwise mismatches
also yields a standardized index of association as a
measure of haplotype-wide linkage disequilibrium,
$$I_A^S = \frac{V_D / V_e - 1}{r - 1},$$
which is zero for linkage equilibrium (Maynard Smith
et al. 1993; Hudson 1994).
Genetic similarity among accessions was examined
using the phylogenetic tree-building package TREECON
(Van de Peer & De Wachter 1993, 1994, 1997). The
genetic distance algorithm of Link et al. (1995) was used
because only shared band presence is considered to be
informative, whereas band absence can arise convergently
(e.g. through mutation of any nucleotide within the restriction sites
or selective nucleotides). Neighbour-joining trees were
constructed using 100 bootstrap replicates.
Principal components analysis is a method of data
reduction (Manly 1994). If the data are highly correlated,
the first few principal components will account for a large portion of the total variance,
and a plot of the taxa against these components will effectively summarize the structure
contained in the full data set. We applied principal
components analysis to our AFLP data using SPSS for
Windows (release 9.0.0; © SPSS Inc.).
We used the Mantel permutation procedure to test for
genetic isolation by distance (Mantel 1967). Sample
coordinates of most accessions were taken from the
Nottingham database (http://nasc.nott.ac.uk/) or found
using Encarta (Microsoft Inc.). Latitude and longitude
data were unavailable for some accessions, and thus only
a subset (n = 113) of the complete data could be analysed.
We first tested for genetic isolation by distance for the
entire data set and for samples contained in each geographical region separately. Following this, geographical
contrasts were tested by grouping accessions from pairs
of regions. The latter analyses were performed only for
those comparisons containing n ≥ 10 accessions. Finally,
we divided the central European accessions into two groups,
west (west of 5° longitude) and east (east of 13° longitude), and Mantel
permutation tests were then run on comparisons of east
and west Europe combined with populations from Asia
and the Iberian Peninsula separately.
Results
We scored 79 polymorphic AFLP loci across 142 Arabidopsis
thaliana accessions. The frequency distribution across all
accessions (n = 142) for the 79 loci is similar to that found
by Miyashita et al. (1999) for 472 AFLP bands and 38
accessions, with relatively high occurrences at the low-
and high-frequency ends of the distribution (Fig. 1). This
distribution is partly explained by random mutation,
which generates unique bands and so weights
the frequency distribution towards its lower end (Fig. 1;
Miyashita et al. 1999). Intermediate-frequency polymorphisms were also found, although less often.
Linkage disequilibrium and relationships among
accessions
On average, 62 (78%) of the AFLP bands were shared by
any two accessions. Significant linkage disequilibrium
was detected for the complete sample, central Europe
and Asia (Table 2). Mismatch distributions for the entire
data set, as well as for Europe and for combined Europe and
Asia, were bell-shaped, whereas the Asian
sample was irregular (Fig. 2). Miyashita et al. (1999) found
that removing the Fl-3 accession from their analysis
decreased linkage disequilibrium in the
remaining data set, owing to a disproportionately high
number of unique bands in the Fl-3 genotype. However,
Fl-3 is not A. thaliana (L. Dorn and T. Mitchell-Olds, unpublished). The t-tests for differences in genetic diversity were
not significant for any comparison between geographical
regions (not shown).
Fig. 1 Locus frequency class distribution of 79 AFLP markers
sampled from 142 Arabidopsis thaliana accessions.
Table 2 Linkage disequilibrium analyses for geographical regions
with at least 10 accessions (79 loci; n, number of accessions). Monte Carlo simulations
were performed with 10 000 iterations. Significance levels after a sequential
Bonferroni correction for multiple tests are indicated by
*P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001
Geographic region   n     V_D    V_e    H (± SE)          I_A^S    P
All samples         142   16.1   11.7   0.2318 (0.0195)   0.0048   0.0001**
Central Europe      88    14.2   11.4   0.2278 (0.0200)   0.0031   0.0009**
Asia                23    23.4   11.2   0.2317 (0.0215)   0.0140   0.0001**
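The statistics $V_D$, $V_e$ and $I_A^S$ defined in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LIAN implementation: $h_i$ is computed as $1 - \sum_j p_{ij}^2$ without any small-sample correction LIAN may apply, $V_D$ is taken as the population variance of mismatch counts over all pairs, and no Monte Carlo test is included. As an arithmetic check only, applying the $I_A^S$ formula to the rounded $V_D$ and $V_e$ values in Table 2 reproduces the tabulated indices to four decimals.

```python
# Minimal sketch of the multilocus association statistics described in the Methods.
# Assumptions (not necessarily identical to the LIAN program):
#   - haplotypes are equal-length strings of '0'/'1' band scores;
#   - h_i = 1 - sum_j p_ij^2, with no small-sample correction;
#   - V_D is the population variance of mismatch counts over all n(n - 1)/2 pairs.
from itertools import combinations
from statistics import pvariance
from typing import Sequence


def observed_variance(haplotypes: Sequence[str]) -> float:
    """V_D: variance of the number of mismatched loci across all haplotype pairs."""
    counts = [sum(x != y for x, y in zip(h1, h2)) for h1, h2 in combinations(haplotypes, 2)]
    return float(pvariance(counts))


def expected_variance(haplotypes: Sequence[str]) -> float:
    """V_e = sum over loci of h_i (1 - h_i), the variance expected under linkage equilibrium."""
    n = len(haplotypes)
    v_e = 0.0
    for i in range(len(haplotypes[0])):
        column = [h[i] for h in haplotypes]
        h_i = 1.0 - sum((column.count(allele) / n) ** 2 for allele in set(column))
        v_e += h_i * (1.0 - h_i)
    return v_e


def standardized_ia(v_d: float, v_e: float, r: int) -> float:
    """I_A^S = (V_D / V_e - 1) / (r - 1); zero under linkage equilibrium."""
    return (v_d / v_e - 1.0) / (r - 1)


if __name__ == "__main__":
    # Rounded V_D and V_e values from Table 2, with r = 79 loci.
    for region, v_d, v_e in [("All samples", 16.1, 11.7),
                             ("Central Europe", 14.2, 11.4),
                             ("Asia", 23.4, 11.2)]:
        print(region, round(standardized_ia(v_d, v_e, 79), 4))
```

With perfectly associated loci (for example haplotypes 00, 00, 11, 11), `observed_variance` exceeds `expected_variance`, which is the signature of multilocus linkage disequilibrium.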
Given the significant linkage disequilibrium in the
data set, it might contain appreciable information about
evolutionary relationships. We searched for robust
subclusters of accessions using neighbour-joining with
bootstrap. This analysis returned no node with bootstrap
support >60%. The dendrogram resembled a bush or
star phylogeny (not shown); consequently, there is
no evidence for phylogenetic substructuring among
A. thaliana accessions.
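The bootstrap procedure behind the neighbour-joining analysis resamples loci with replacement and recomputes the distance matrix for each replicate. The sketch below illustrates that step only. As an assumption, it uses a Dice-type band-sharing distance, 1 − 2a/(2a + b + c), as a stand-in for the Link et al. (1995) distance computed in TREECON; like the distance described in the Methods, it ignores loci where neither accession has a band. Tree building and the consensus of bootstrap trees are not shown, and all names are hypothetical.

```python
# Minimal sketch of bootstrap resampling over AFLP loci.
# Assumption: a Dice-type distance, 1 - 2a / (2a + b + c), stands in for the
# Link et al. (1995) distance; shared band absences do not contribute.
import random
from typing import List, Sequence

Profile = Sequence[int]  # 0/1 band scores for one accession


def band_sharing_distance(x: Profile, y: Profile) -> float:
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # a: band in both
    only_x = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # b: band only in x
    only_y = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # c: band only in y
    denominator = 2 * shared + only_x + only_y
    # Two profiles with no bands at all are treated as identical (distance 0).
    return 0.0 if denominator == 0 else 1.0 - 2.0 * shared / denominator


def distance_matrix(profiles: Sequence[Profile]) -> List[List[float]]:
    return [[band_sharing_distance(p, q) for q in profiles] for p in profiles]


def bootstrap_matrices(profiles: Sequence[Profile], replicates: int = 100,
                       seed: int = 1) -> List[List[List[float]]]:
    """One distance matrix per bootstrap replicate, resampling loci with replacement."""
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    matrices = []
    for _ in range(replicates):
        columns = [rng.randrange(n_loci) for _ in range(n_loci)]
        resampled = [[p[c] for c in columns] for p in profiles]
        matrices.append(distance_matrix(resampled))
    return matrices
```

Each of the 100 matrices would then be passed to a neighbour-joining routine, and the proportion of replicate trees containing a given cluster gives its bootstrap support.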
Isolation by distance
In order to further investigate the genetic structure of
A. thaliana, we subjected our data set to principal components analysis (Manly 1994). The first three principal
components accounted for only 16.3% of the total variance. Nevertheless, some Asian accessions clustered separately from
the central European plants, and two Iberian
accessions (Sah-0 and Ll-0) and eight central European
accessions were also separated from the main cluster of
A. thaliana (Fig. 3).
Fig. 2 Mismatch distributions for (a) the entire sample, (b) Asia,
and (c) Europe. All possible pairs of accessions are compared,
showing the frequency of mismatched (nonidentical) AFLP loci
for each pairwise comparison.
Fig. 3 Three-dimensional plot of the first three principal components, which describe 16.3% of the AFLP variation in Arabidopsis thaliana (Sah-0 and Ll-0 are two outlier accessions from the Iberian Peninsula).
We formally tested the geographical patterning suggested by the principal components analysis using Mantel
tests. Genetic distance between accessions increases
significantly with geographical distance (Fig. 4). Even
with a conservative correction for multiple statistical tests
(Rice 1989), nine of 20 Mantel test comparisons were
statistically significant. All significant tests showed a
positive correlation between geographical and genetic
distance, ranging from 0.16 (central Europe and Iberian
Peninsula) to 0.65 (Asia and Scandinavia; Table 3). Furthermore, genetic isolation by distance (significant after Bonferroni
correction) was found between samples from east Europe
and Asia (n = 20, r = 0.46, P < 0.01), west Europe and Asia
(n = 19, r = 0.50, P < 0.01), Asia and the Iberian Peninsula
(n = 16, r = 0.54, P < 0.0001), and east Europe and the
Iberian Peninsula (n = 18, r = 0.21, P < 0.02).
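The Mantel permutation test can be sketched as follows: the correlation between the off-diagonal entries of a genetic and a geographical distance matrix is compared with correlations obtained after randomly relabelling the accessions in one matrix. This is a hypothetical illustration, not the software used in the study. The paper does not state how geographical distances were computed from latitude and longitude; great-circle (haversine) distance on a spherical Earth is an assumption here, as is the one-tailed P-value convention (hits + 1)/(permutations + 1).

```python
# Minimal sketch of a one-tailed Mantel permutation test for isolation by distance.
# Assumptions: geographical distance is great-circle (haversine) distance in km;
# matrices are symmetric with zero diagonals; all names are hypothetical.
import math
import random
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
Matrix = Sequence[Sequence[float]]


def great_circle_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine distance between two (latitude, longitude) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Off-diagonal entries of m after relabelling rows and columns by `order`."""
    n = len(m)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a distance matrix has no variation")
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic: Matrix, geographic: Matrix, permutations: int = 9999,
           seed: int = 1) -> Tuple[float, float]:
    """Return (r, one-tailed P) for a positive association between the two matrices."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel accessions in the genetic matrix only
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Given a list `coords` of (latitude, longitude) pairs, the geographical matrix could be built as `[[great_circle_km(a, b) for b in coords] for a in coords]`; the number of permutations is a parameter (the study reports 10 000 iterations).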
Two major trends are apparent from the significant
comparisons between geographical regions. First, a
significant positive correlation exists for all comparisons
with Asia (Table 3), and of these, the strongest correlations
exist for comparisons involving the edges of the European
distribution. Second, a Mantel test of the central European
samples alone shows no significant genetic isolation by
distance, but this becomes significant if accessions from
the Iberian Peninsula are included (Table 3). The Iberian
Peninsula sample exhibits no significant genetic isolation
by distance when compared to any other geographical
region except central Europe and Asia.
Discussion
Glacial refugia and the postglacial colonization of Europe
We find significant isolation by distance among Arabidopsis
thaliana accessions from Eurasia and southern Europe
(Table 3; Figs 3 and 4). These spatial patterns of genetic
variation suggest that A. thaliana colonized central and
northern Europe from Asia, with some indications of
an additional Mediterranean Pleistocene refugium (the
Iberian Peninsula). Previously, the lack of phylogeographic
pattern has been ascribed to recent human-induced
migrations (King et al. 1993; Todokoro et al. 1995; Bergelson
et al. 1998; Miyashita et al. 1999). However, although
human disturbance clearly influences the biogeography
of A. thaliana, our large sample of Eurasian accessions
provides strong evidence for historical migrations and
isolation by distance. This result has important implications
for design and interpretation of functional and evolutionary
studies of natural variation in Arabidopsis.
Two possible scenarios could explain the significant
Mantel test results for comparisons involving Asia, the
Iberian peninsula and central Europe (Table 3; Fig. 4).
First, because significant genetic isolation by distance exists for
all comparisons involving Asia, Asia might be the source
region from which other areas were colonized since the
Pleistocene. The colonizing populations emigrating from
Asia would probably be characterized by reduced genetic
variability, as migrating populations are affected by
stochastic events that tend to decrease diversity (Cooper
et al. 1995; Leonardi & Menozzi 1995; Koch et al. 1998;
Schmidtling & Hipkins 1998). However, one would not
expect genetic distance between the colonizing and Asian
source populations to increase significantly with geographical distance because: (i) only a short time period
(≈17 000 years) would have passed during the colonization event; and (ii) population bottlenecks along the
colonization route would decrease
genetic diversity but not increase genetic distance.
Thus, a second and more likely scenario explaining the
observed pattern of isolation by distance is that both Asia
and the Iberian Peninsula provided Pleistocene glacial
refugia for A. thaliana.
Fig. 4 Positive correlation between geographical and genetic
distance shows isolation by distance in Arabidopsis thaliana. (a) All
accessions; (b) Asia.
Table 3 Mantel test results, r (P-value), for the correlation between
geographical and genetic distance (10 000 iterations; 79
loci; n, number of accessions). Significance levels after sequential Bonferroni
correction for multiple tests are indicated by *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001
Comparison                            n     r (P)
All samples together                  113   0.24 (0.0001)**
Central Europe – Asia                 86    0.31 (0.0001)**
Central Europe – Iberian Peninsula    84    0.16 (0.0001)**
Central Europe – Southern Italy       83    −0.06 (0.05)
Central Europe – British Isles        83    −0.03 (0.47)
Central Europe – Scandinavia          81    −0.006 (0.84)
Central Europe – Africa               80    0.06 (0.17)
Central Europe                        77    0.07 (0.04)
Asia – Iberian Peninsula              16    0.53 (0.0001)**
Asia – Southern Italy                 15    0.60 (0.0001)**
Asia – British Isles                  15    0.56 (0.0001)**
Asia – Scandinavia                    13    0.65 (0.0001)**
Iberian Peninsula – Southern Italy    13    −0.06 (0.57)
British Isles – Iberian Peninsula     13    0.11 (0.46)
British Isles – Southern Italy        13    0.12 (0.45)
Asia – Africa                         12    0.53 (0.0008)**
Iberian Peninsula – Scandinavia       11    −0.007 (0.65)
British Isles – Scandinavia           10    0.05 (0.76)
Southern Italy – Scandinavia          10    0.03 (0.86)
Asia                                  9     0.63 (0.0002)**
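The sequential Bonferroni correction of Rice (1989), the step-down procedure also known as Holm's method, can be sketched as below. Applied at α = 0.05 to the 20 P-values as printed in Table 3 (listed in table order), it retains nine significant comparisons, consistent with the count reported in the Results. The function name is hypothetical, and because the printed P-values are rounded this is a consistency check rather than a reanalysis.

```python
# Minimal sketch of the sequential Bonferroni (Holm step-down) procedure
# described by Rice (1989), applied to P-values as printed in Table 3.
from typing import List, Sequence


def sequential_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return True for each test that remains significant after correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, index in enumerate(order):
        # The k-th smallest P-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[index] <= alpha / (m - rank):
            significant[index] = True
        else:
            break  # once one test fails, all larger P-values fail as well
    return significant


if __name__ == "__main__":
    table3_p = [0.0001, 0.0001, 0.0001, 0.05, 0.47, 0.84, 0.17, 0.04, 0.0001, 0.0001,
                0.0001, 0.0001, 0.57, 0.46, 0.45, 0.0008, 0.65, 0.76, 0.86, 0.0002]
    print(sum(sequential_bonferroni(table3_p)))  # 9
```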
The Asian accessions are clearly distinct based both on
the Mantel and PCA procedures (Table 3; Figs 3 and 4).
Accessions from the Iberian Peninsula exhibit no significant isolation by distance with any geographical region
other than central Europe and Asia (Table 3). Overall, it
seems likely that Europe was recolonized by
accessions emerging from the Iberian Peninsula and Asia.
In support of this hypothesis, the most divergent pairs
of accessions occur in comparisons of western Europe vs.
Asia (see Results). The principal components analysis
further implies that this trend with the Iberian Peninsula may have been influenced by a subset of the Iberian
accessions (Sah-0 and Ll-0), although this conclusion is
weakened by the fact that only 16.3% of the data variance
is accounted for by the first three principal components
(Fig. 3). The Balkan region also remains a possible glacial
refugium for A. thaliana (Hewitt 1996), although this
hypothesis cannot be tested due to lack of collections
from this region. Similar post-Pleistocene migration
patterns are exhibited by a number of species (Hewitt
1996; Taberlet et al. 1998).
Fig. 5 Scenario for Arabidopsis postglacial
colonization of Europe from the Iberian
Peninsula (black) and Asia (white), to a
central European contact zone (checkered).
Smaller arrows out of the Iberian Peninsula
indicate colonization of Scandinavia and
Italy.
Significant genetic isolation by distance between
central European vs. Asian accessions, and between
Iberian vs. central European accessions suggests that
central Europe may be a suture zone (Taberlet et al. 1998)
between the two refugia (Fig. 5). We hypothesize that the
genetic diversity characteristic of this zone may represent
the combined diversities of the colonizing populations
from both glacial refugia, each of which was composed
of genetically distinct selfing populations. The t-tests
show no increase in genetic diversity in central Europe
relative to any single glacial refugium (not shown), and
this result is consistent with a scenario whereby the genetic
diversity of migrant populations derived from each
refugium has been decreased by drift (Cooper et al. 1995;
Leonardi & Menozzi 1995; Schmidtling & Hipkins 1998).
If the above hypothesis is correct, then central European
accessions may show an east-to-west clinal distribution in
genetic variation. Two colonization ‘waves’ into Europe,
one from Asia and one from Iberia, would be reflected in
western European accessions being more closely related
to those from Iberia, and eastern European accessions
being more closely related to those from Asia (Fig. 5).
Mantel tests of the subdivided central European sample
indirectly support this hypothesis, with the most significant genetic isolation by distance found in the western
Europe–Asia and eastern Europe–Asia comparisons, and
less significance for the eastern Europe–Iberia comparison
(see Results). This trend suggests that the hypothesized
suture zone between European and Asian populations lies
further east than our central Europe sample, as the comparisons involving both east and west Europe with Asia show
similar levels of genetic isolation by distance, while the east
Europe and Iberia comparison is less significant.
Linkage disequilibrium and relationships among
accessions
We found statistically significant levels of multilocus linkage
disequilibrium among A. thaliana accessions. Previous
analyses have not found strong evidence for linkage
disequilibrium, but they were based on fewer accessions and
on two-locus estimators averaged across many comparisons
(Bergelson et al. 1998; Miyashita et al. 1999). In addition
to possible differences in statistical power (see Materials
and methods), these estimators ask somewhat
different biological questions. V_D, a genome-wide multilocus
estimator, may reject the null hypothesis of linkage
equilibrium even if only a small subset of loci is in disequilibrium.
In contrast, large numbers of pairwise tests (D) reject the
null hypothesis only if high levels of disequilibrium
exist among many loci. Consequently, this analysis and
previously published studies are all compatible with a
model of intermediate levels of disequilibrium among
subsets of loci. More data will be needed to understand
the extent and possible utility of linkage disequilibrium
in A. thaliana (e.g. Collins et al. 1999; Kruglyak 1999).
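The multilocus statistic V_D can be illustrated with a short Python sketch. Following the general approach of Brown et al. (1980) and Haubold et al. (1998), V_D is taken here as the variance, over all pairs of accessions, of the number of loci D at which two accessions differ. It is compared with the variance V_e expected under linkage equilibrium, summarized as a standardized index of association, and tested by shuffling alleles independently at each locus. This is a sketch under stated assumptions rather than the LIAN implementation (Haubold & Hudson 2000): the exact variance and gene-diversity estimators used there may differ in detail, and all function names and the toy data set are hypothetical.

```python
import random


def pairwise_differences(profiles):
    """D for every pair of accessions: the number of loci at which they differ."""
    n = len(profiles)
    return [sum(a != b for a, b in zip(profiles[i], profiles[j]))
            for i in range(n) for j in range(i + 1, n)]


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def expected_variance(profiles):
    """V_e = sum over loci of h_j * (1 - h_j), the variance of D expected under
    linkage equilibrium, with h_j the unbiased gene diversity at locus j."""
    n = len(profiles)
    v_e = 0.0
    for locus in zip(*profiles):
        freqs = [locus.count(allele) / n for allele in set(locus)]
        h = n / (n - 1) * (1 - sum(p * p for p in freqs))
        v_e += h * (1 - h)
    return v_e


def standardized_index_of_association(profiles):
    """I_A^S = (V_D / V_e - 1) / (l - 1); close to 0 under linkage equilibrium."""
    n_loci = len(profiles[0])
    v_d = sample_variance(pairwise_differences(profiles))
    return (v_d / expected_variance(profiles) - 1) / (n_loci - 1)


def v_d_monte_carlo(profiles, replicates=1000, seed=1):
    """Observed V_D and a one-sided Monte Carlo p-value.

    Each replicate shuffles alleles independently among accessions at every
    locus, which breaks associations between loci while keeping allele
    frequencies fixed (a simulated linkage-equilibrium null).
    """
    rng = random.Random(seed)
    observed = sample_variance(pairwise_differences(profiles))
    columns = [list(locus) for locus in zip(*profiles)]
    hits = 1  # count the observed data as one replicate
    for _ in range(replicates):
        for column in columns:
            rng.shuffle(column)
        shuffled = [list(row) for row in zip(*columns)]
        if sample_variance(pairwise_differences(shuffled)) >= observed:
            hits += 1
    return observed, hits / (replicates + 1)


if __name__ == "__main__":
    # Hypothetical, completely linked data: two multilocus types, five copies each.
    data = [[0, 0, 0, 0]] * 5 + [[1, 1, 1, 1]] * 5
    v_d, p = v_d_monte_carlo(data)
    print(f"V_D = {v_d:.3f}, p = {p:.4f}, "
          f"I_A^S = {standardized_index_of_association(data):.3f}")
```

Because V_D pools information from all loci at once, a strong association confined to a few loci can inflate it enough to reject equilibrium, whereas averaging many two-locus D values dilutes such a signal. This is the contrast drawn in the paragraph above.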
Patterns of genetic relationship among accessions can be summarized by
a dendrogram. Our resulting tree indicated a bush- or star-like
topology (not shown), with low levels of bootstrap support. Similar results were found in previous studies with
fewer accessions but more polymorphic loci (Breyne et al.
1999; Miyashita et al. 1999). This pattern agrees with our
finding of partial linkage disequilibrium. Evidently, there
has been sufficient recombination in the A. thaliana
genome to partially reshuffle genetic variation. This
conclusion is also supported by surveys of nucleotide
polymorphism in several A. thaliana genes, which typically
show evidence of intragenic recombination within a
few kilobases (Innan et al. 1996; Purugganan & Suddith
1998; Kawabe & Miyashita 1999; Purugganan & Suddith
1999). Although A. thaliana is highly selfing under laboratory conditions, historical recombination has been
sufficiently important that A. thaliana accessions do not
conform to a tree-like, bifurcating pattern of evolution:
there is no ‘ecotype phylogeny’.
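One simple way to see why recombination erodes a tree-like pattern is the four-gamete test. For two biallelic loci evolving on a single bifurcating tree without recurrent mutation, at most three of the four possible combinations of states can occur among accessions, so observing all four implies recombination or homoplasy. The Python sketch below is not an analysis reported in this study. It computes the fraction of locus pairs that fail the test for binary band scores, assuming complete data with no missing scores. With dominant markers such as AFLPs, band absence can arise through different mutations, so failures are only an upper bound on the evidence for recombination. The function names and example profiles are hypothetical.

```python
from itertools import combinations


def four_gametes(locus_a, locus_b):
    """True if a pair of 0/1 loci shows all four combinations 00, 01, 10, 11.

    Assumes complete binary scores with no missing data.
    """
    return len(set(zip(locus_a, locus_b))) == 4


def incompatible_fraction(profiles):
    """Fraction of locus pairs that fail the four-gamete test.

    profiles: one list of 0/1 band scores per accession.
    """
    loci = list(zip(*profiles))
    pairs = list(combinations(range(len(loci)), 2))
    failing = sum(four_gametes(loci[i], loci[j]) for i, j in pairs)
    return failing / len(pairs)


if __name__ == "__main__":
    # Hypothetical band profiles for four accessions at three loci that fit
    # a single bifurcating tree (nested groups of shared bands).
    tree_like = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
    # Adding a recombinant-looking accession makes the first two loci incompatible.
    reticulate = tree_like + [[0, 1, 0]]
    print(incompatible_fraction(tree_like))   # 0.0
    print(incompatible_fraction(reticulate))  # 0.3333333333333333
```

When many locus pairs are mutually incompatible, no single tree can accommodate all of them, which is consistent with a bush-like dendrogram with low bootstrap support.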
Nucleotide polymorphisms have a historical context,
as the frequencies of specific alleles may have been influenced by gene flow before, during and after isolation
in Pleistocene refugia (Hewitt 1996). Populations of
A. thaliana from different refugia should differ in their
low-frequency alleles, which may have resulted from
mutation, genetic drift and weak selection. Intermediate-frequency
alleles may represent older polymorphisms
distributed over a broad geographical range, or polymorphisms that have reached high frequency through some form
of selection (Konnert & Bergmann 1995; Akashi 1999). Our
data suggest that Eurasian A. thaliana accessions have undergone genetic differentiation in separate Pleistocene glacial
refugia. Northward recolonization following glacial retreat
may have generated a suture zone somewhere in eastern
Europe. This geographical structure provides a framework
within which accessions of differing genetic backgrounds
may be chosen for studies of natural genetic variation
(Alonso-Blanco & Koornneef 2000). Choosing accessions
from different glacial refugia may increase levels of genetic
variation for functional and evolutionary studies.
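The suggestion that sampling across refugia increases genetic variation can be made operational in many ways. The Python sketch below shows one hypothetical heuristic, not a procedure used in this study: accessions are chosen greedily so that each addition maximizes the number of marker loci that are polymorphic within the panel. Greedy selection is not guaranteed to find the best panel, ties (including the first pick) are broken alphabetically, and in practice geographic or refugial origin could be added as a constraint. The function names and accession profiles are hypothetical.

```python
def polymorphic_loci(panel):
    """Number of loci at which the accessions in a panel are not all identical."""
    return sum(len(set(locus)) > 1 for locus in zip(*panel))


def greedy_panel(profiles, k):
    """Greedily choose k accessions that maximize the number of polymorphic loci.

    profiles: dict mapping accession name -> list of 0/1 band scores.
    Ties are broken alphabetically, so the first pick is arbitrary.
    """
    chosen = []
    while len(chosen) < k and len(chosen) < len(profiles):
        candidates = sorted(name for name in profiles if name not in chosen)
        best = max(candidates,
                   key=lambda name: polymorphic_loci(
                       [profiles[c] for c in chosen] + [profiles[name]]))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical accessions from an eastern and a western region.
    accessions = {
        "east1": [0, 0, 1, 1],
        "east2": [0, 1, 1, 1],
        "west1": [1, 1, 0, 0],
        "west2": [1, 1, 0, 1],
    }
    print(greedy_panel(accessions, 2))  # ['east1', 'west1']
```

In this toy example, pairing accessions from the two regions yields four polymorphic loci, whereas pairing the two eastern accessions yields only one.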
Acknowledgements
We thank Domenica Schnabelrauch and Antje Figuth for help with
the AFLP analysis, and J. Bergelson, D. Charlesworth, M. Clauss,
M. Koch, J. McKay, B. Stranger, and two anonymous reviewers
for comments on the manuscript. Randy Scholl of the Arabidopsis Biological Resource Center (ABRC) provided seeds. This work was
supported by the Max-Planck-Gesellschaft, and by grants to
TMO from the US National Science Foundation (DEB-9527725),
the European Union, and the Bundesministerium für Bildung und
Forschung (BMBF).
References
Abbott RJ, Gomes MF (1989) Population genetic structure and
outcrossing rate of Arabidopsis thaliana (L.) Heynh. Heredity, 62,
411–418.
Akashi H (1999) Within- and between-species DNA sequence
variation and the ‘footprint’ of natural selection. Gene, 238,
39–51.
Alonso-Blanco C, Koornneef M (2000) Naturally occurring
variation in Arabidopsis: an underexploited resource for plant
genetics. Trends in Plant Science, 5, 22–29.
Bergelson J, Stahl E, Dudek S, Kreitman M (1998) Genetic variation
within and among populations of Arabidopsis thaliana. Genetics,
148, 1311–1323.
Böhle U-R, Hilger HH, Martin WF (1996) Island colonization and
evolution of the insular woody habit in Echium L. (Boraginaceae).
Proceedings of the National Academy of Sciences of the USA, 93,
11740–11745.
Breyne P, Rombaut D, Van Gysel A, Van Montagu M, Gerats T
(1999) AFLP analysis of genetic diversity within and between
Arabidopsis thaliana ecotypes. Molecular and General Genetics,
261, 627–634.
Brown AHD, Feldman MW, Nevo E (1980) Multilocus structure
of natural populations of Hordeum spontaneum. Genetics, 96,
523–536.
Collins A, Lonjou C, Morton N (1999) Genetic epidemiology of
single-nucleotide polymorphisms. Proceedings of the National
Academy of Sciences of the USA, 96, 15173 –15177.
Comes HP, Kadereit JW (1998) The effects of Quaternary climatic
changes on plant distribution and evolution. Trends in Plant
Science, 3, 432–438.
Cooper SJB, Ibrahim KM, Hewitt GM (1995) Postglacial expansion and genome subdivision in the European grasshopper
Chorthippus parallelus. Molecular Ecology, 4, 49–60.
Forsström L, Punkari M (1997) Initiation of the last glaciation in
northern Europe. Quaternary Science Reviews, 16, 1197–1215.
Hanfstingl U, Berry A, Kellogg EA et al. (1994) Haplotypic divergence coupled with lack of diversity at the Arabidopsis thaliana
alcohol dehydrogenase locus: roles for both balancing and
directional selection? Genetics, 138, 811–828.
Haubold B, Hudson RR (2000) LIAN, version 3.0: detecting
linkage disequilibrium in multilocus data. Bioinformatics, in
press.
Haubold B, Travisano M, Rainey P, Hudson R (1998) Detecting
linkage disequilibrium in bacterial populations. Genetics, 150,
1341–1348.
Hewitt GM (1996) Some genetic consequences of ice ages, and
their role in divergence and speciation. Biological Journal of the
Linnean Society, 58, 247–276.
Hudson RR (1994) Analytical results concerning linkage disequilibrium in models with genetic transformation and
conjugation. Journal of Evolutionary Biology, 7, 535–548.
Innan H, Tajima F, Terauchi R, Miyashita NT (1996) Intragenic
recombination in the Adh locus of the wild plant Arabidopsis
thaliana. Genetics, 143, 1761–1770.
Kawabe A, Miyashita N (1999) DNA variation in the basic chitinase locus (ChiB) region of the wild plant Arabidopsis thaliana.
Genetics, 153, 1445–1453.
King G, Nienhuis J, Hussey C (1993) Genetic similarity among
ecotypes of Arabidopsis thaliana estimated by analysis of restriction fragment length polymorphisms. Theoretical and Applied
Genetics, 86, 1028–1032.
Koch M, Hurka H, Mummenhoff K (1998) Molecular phylogenetics of Cochlearia L. and allied genera based on nuclear
ribosomal ITS DNA sequence analysis contradict traditional
concepts of their evolutionary relationship. Plant Systematics
and Evolution, 216 (3–4), 207–230.
Konnert M, Bergmann F (1995) The geographical distribution of
genetic variation of silver fir (Abies alba, Pinaceae) in relation
to its migration history. Plant Systematics and Evolution, 196,
19–30.
Kruglyak L (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genetics,
22, 139–144.
Leonardi S, Menozzi P (1995) Genetic variability of Fagus sylvatica L. in Italy: the role of postglacial recolonization. Heredity,
75, 35–44.
Link W, Dixens C, Singh M, Schwall M, Melchinger AE (1995)
Genetic diversity in European and Mediterranean faba bean
germ plasm revealed by RAPD markers. Theoretical and Applied
Genetics, 90, 27–32.
Manly BFJ (1994) Multivariate Statistical Methods: a Primer, 2nd edn.
Chapman & Hall, London.
Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209–220.
Mauricio R (1998) Costs of resistance to natural enemies in field
populations of the annual plant Arabidopsis thaliana. American
Naturalist, 151, 20–28.
Maynard Smith J, Smith NH, Dowson CG, Spratt BG (1993) How
clonal are bacteria? Proceedings of the National Academy of
Sciences of the USA, 90, 4384–4388.
Mitchell-Olds T (1995) The molecular basis of quantitative
genetic variation in natural populations. Trends in Ecology and
Evolution, 10, 324–328.
Miyashita NT, Kawabe A, Innan H (1999) DNA variation in the
wild plant Arabidopsis thaliana revealed by amplified fragment
length polymorphism analysis. Genetics, 152, 1723–1731.
Newton AC, Allnutt TR, Gillies ACM, Lowe AJ, Ennos RA (1999)
Molecular phylogeography, intraspecific variation and the
conservation of tree species. Trends in Ecology and Evolution, 14,
140–145.
O’Kane SL, Al-Shehbaz IA (1997) A synopsis of Arabidopsis
(Brassicaceae). Novon, 7, 323–327.
Price RA, Al-Shehbaz IA, Palmer JD (1994) Systematic relationships of Arabidopsis: a molecular and morphological approach.
In: Arabidopsis (eds Meyerowitz E, Somerville C), pp. 7–19.
Cold Spring Harbor Press, Cold Spring Harbor, NY.
Purugganan MD, Suddith JI (1998) Molecular population genetics
of the Arabidopsis CAULIFLOWER regulatory gene: nonneutral
evolution and naturally occurring variation in floral homeotic function. Proceedings of the National Academy of Sciences of
the USA, 95, 8130–8134.
Purugganan MD, Suddith JI (1999) Molecular population genetics
of floral homeotic loci: departures from the equilibrium-neutral
model at the APETALA3 and PISTILLATA genes of
Arabidopsis thaliana. Genetics, 151, 839–848.
Redei GP (1975) Arabidopsis as a genetic tool. Annual Review of
Genetics, 9, 111–127.
Rice W (1989) Analyzing tables of statistical tests. Evolution, 43,
223–225.
Schmidtling RC, Hipkins V (1998) Genetic diversity in longleaf
pine (Pinus palustris): influence of historical and prehistorical
events. Canadian Journal of Forest Research, 28, 1135–1145.
Souza V, Nguyen TT, Hudson RR, Piñero D, Lenski RE (1992)
Hierarchical analysis of linkage disequilibrium in Rhizobium
populations: evidence for sex? Proceedings of the National
Academy of Sciences of the USA, 89, 8389–8393.
Taberlet P, Fumagalli L, Wust-Saucy AG, Cosson JF (1998) Comparative phylogeography and postglacial colonization routes
in Europe. Molecular Ecology, 7, 453–464.
Todokoro S, Terauchi R, Kawano S (1995) Microsatellite polymorphisms in natural populations of Arabidopsis thaliana in
Japan. Japanese Journal of Genetics, 70, 543–554.
Van de Peer Y, De Wachter R (1993) TREECON: a software package for the construction and drawing of evolutionary trees.
Computer Applications in the Biosciences, 9, 177–182.
Van de Peer Y, De Wachter R (1994) TREECON for Windows:
a software package for the construction and drawing of
evolutionary trees for the Microsoft Windows environment.
Computer Applications in the Biosciences, 10, 569–570.
Van de Peer Y, De Wachter R (1997) Construction of evolutionary
distance trees with TREECON for Windows: accounting for
variation in nucleotide substitution rate among sites. Computer
Applications in the Biosciences, 13, 227–230.
Vos P, Hogers R, Bleeker M et al. (1995) AFLP: a new technique
for DNA fingerprinting. Nucleic Acids Research, 23, 4407–4414.
Weir BS (1996) Genetic Data Analysis II. Sinauer, Sunderland, MA.
Willis KJ (1996) Where did all the flowers go? The fate of temperate European flora during glacial periods. Endeavour, 20,
110–114.
Tim Sharbel is a postdoctoral researcher working on gametophytic
apomixis in plants and animals, for which he uses genomic,
chromosomal and population genetic approaches. Bernhard
Haubold is a bioinformatician with research interests in population
genetics and molecular evolution. Thomas Mitchell-Olds is Director
of the Max Planck Institute of Chemical Ecology. He studies the
functional basis of evolutionary forces influencing ecologically
important genetic variation.