Sheep Mitochondrial DNA Variation in European, Caucasian,
and Central Asian Areas
Miika Tapio,* Nurbiy Marzanov,† Mikhail Ozerov,* Mirjana Ćinkulov,‡ Galina Gonzarenko,§
Tatyana Kiselyova,‖ Maciej Murawski,¶ Haldja Viinalass,# and Juha Kantanen*
*Biotechnology and Food Research, MTT Agrifood Research Finland, Jokioinen, Finland; †All-Russian Research Institute of Animal
Husbandry, Russian Academy of Agricultural Sciences, Dubrovitsy, Russia; ‡Animal Science Department, University of Novi Sad,
Novi Sad, Serbia and Montenegro; §Siberian Branch of Russian Academy of Agricultural Science, Krasnoobsk, Russia; ‖All-Russian
Research Institute of Animal Genetics and Breeding, Russian Academy of Agricultural Sciences, St Petersburg-Pushkin, Russia;
¶Department of Sheep and Goat Breeding, Agricultural University of Cracow, Cracow, Poland; and #Institute of Veterinary Medicine
and Animal Sciences of the Estonian University of Life Sciences, Tartu, Estonia
Three distinct mitochondrial maternal lineages (haplotype Groups A, B, and C) have been found in the domestic sheep.
Group B has been observed primarily in European domestic sheep. The European mouflon carries this haplotype group.
This could suggest that the European mouflon was independently domesticated in Europe, although archaeological evidence
supports sheep domestication in the central part of the Fertile Crescent. To investigate this question, we sequenced a highly
variable segment of mitochondrial DNA (mtDNA) in 406 unrelated animals from 48 breeds or local varieties. They originated from a wide area spanning northern Europe and the Balkans to the Altay Mountains in south Siberia. The sample
included a representative cross-section of sheep breeds from areas close to the postulated Near Eastern domestication center
and breeds from more distant northern areas. Four (A, B, C, and D) highly diverged sheep lineages were observed in the
Caucasus, 3 (A, B, and C) in Central Asia, and 2 (A and B) in the eastern fringe of Europe, which included the area north
and west of the Black Sea and the Ural Mountains. Only one example of Group D was detected. The other haplotype groups
demonstrated signs of population expansion. Sequence variation within the lineages implied that Group A expanded
first. This group was the most frequent type only in Caucasian and Central Asian breeds. Expansion of Group C appeared
most recently. The expansion of Group B involving Caucasian sheep took place at nearly the same time as the expansion of
Group A. Group B expansion in the eastern European area started approximately 3,000 years after the earliest inferred
expansion. An independent European domestication of sheep is unlikely. The distribution of Group A variation as well as
other results are compatible with the Near East being the domestication site. Groups C and D may have been introgressed
later into a domestic stock, but larger samples are needed to infer their geographical origin. The results suggest that some
mitochondrial lineages arrived in northern Europe from the Near East across Russia.
Introduction
In several species, mitochondrial DNA (mtDNA) has
been used to study domestication history. The first surveys
of mtDNA variation in the domestic sheep (Ovis aries) revealed 2 distinct lineages (A and B; Wood and Phua 1996;
Hiendleder, Mainz et al. 1998; Hiendleder et al. 2002), and
recently, a third distinct haplotype group was reported (C;
Guo et al. 2005; Pedrosa et al. 2005). The number of highly
diverged lineages in other domestic ruminants is 4 for goat
(Sultana et al. 2003), 2 for cattle (Loftus et al. 1994), and 2
for water buffalo (Tanaka et al. 1996), and separate domestication regions have been inferred. Similarly, the presence
of several distinct lineages has been interpreted as evidence of multiple
sheep domestications (e.g., Pedrosa et al. 2005). The number of culturally and biologically independent domestication events may be lower than the number of distinct
lineages because the original wild population may have
been polymorphic, or new maternal lineages may have been
introgressed from different wild populations into the domesticated population (Zeder et al. 2006).
Key words: Ovis aries, sheep, domestication, mtDNA.
Mol. Biol. Evol. 23(9):1776–1783. 2006
doi:10.1093/molbev/msl043
Advance Access publication June 16, 2006
© The Author 2006. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
The previous studies on sheep mtDNA sequence diversity (Wood and Phua 1996; Hiendleder, Mainz et al.
1998; Hiendleder et al. 2002; Guo et al. 2005) have been
mainly based on European or Asian sheep distant from the
postulated Near East domestication center (Smith 1998). A
single haplotype group, Group B, predominates in European
sheep populations, and it is the only group that has
been observed in the European mouflon. This haplotype
group is less common in the native eastern Eurasian breeds
(Hiendleder, Mainz et al. 1998; Hiendleder et al. 2002; Guo
et al. 2005; Meadows et al. 2005), with the notable exception
of the Javanese Thin-Tailed sheep (Meadows et al. 2005), which
may result from crossbreeding with breeds originating from
Europe (reviewed by Davis et al. 2002). The predominance
of Group B in Europe appears to support an independent European
domestication of sheep, which has been suggested earlier
(e.g., Ryder 1983, pp. 23–24). However, the most reliable archaeological data suggest that sheep domestication occurred in the central part of the Fertile Crescent in the
Near East approximately 9,000 years ago (Smith 1998),
and the European mouflon can represent a primitive feral
sheep rather than a truly wild sheep (Poplin 1979). A recent
study in Turkish sheep showed 3 distinct maternal lineages,
and this was regarded as a support for the high importance
of Turkey in sheep domestication (Pedrosa et al. 2005).
However, the hypothesis of Near Eastern sheep domestication has not been conclusively explored with comparisons
of genetic variation between geographical areas. The aim of
the present study was to assess the support for the European
and Near Eastern domestication hypotheses for sheep. This was based
on extensive sampling in the Caucasus area, which is located
very close to the hypothesized Near East domestication
center, and in a wide northern area spanning the North
European countries and the Balkans to the Altay Mountains
in south Siberia, which represents a geographical area
clearly outside the postulated Near Eastern domestication sites (e.g., Pedrosa et al. 2005). The rich sheep diversity
in the study area comprises indigenous fat-tailed, fat-rumped,
and thin-tailed fleece sheep (Ryder 1984), including both
short- and long-tailed breeds.
FIG. 1.—The distribution of the 4 distinct haplotype groups in the study regions. The slices in the pie diagrams represent Group A (white), B (black),
C (gray), and D (region 2, hatched and extended for visibility). The 4 Caucasian regions are: south Caucasus (1), north Caucasus (2), Stavropol (3), and the
Caspian Depression (4). The 2 Central Asian regions are located in areas northeast of the Caucasus: the region east of the Caspian Sea (5) and the Altay region (6).
The 10 regional groups of the eastern fringe of Europe are: the Middle Volga region (7), the Volga-Kama region (8), west Russia (9), Russian Karelia (10),
Ukraina (11), east of the Baltic Sea (12), Poland (13), southeast Europe (Tsigai breeds; 14), Norway (15), and Britain (16). Previously reported haplotype
group frequencies for 4 regions have been included: (i) Austria (Meadows et al. 2005), (ii) Turkey (Pedrosa et al. 2005), (iii) northwest China (Kazakh
Fat-Rumped and Tibetan), and (iv) northeast China (Han, Hu, Mongolian, and Tong; Guo et al. 2005).
Materials and Methods
Sampling and DNA Extraction
Samples of 406 unrelated animals from 48 breeds or
local varieties were studied. Sampling was done in large
collective or institute farm flocks of recognized breeds
or in several smallholder flocks of local varieties. Sheep were
grouped into 16 regional groups (fig. 1; Supplementary
Table 1, Supplementary Material online) representing 3
wider areas (Caucasian area, Central Asian area, and the
remaining eastern fringe of Europe). Four of these regional
groups are located in the Caucasian area: south Caucasus
(Azerbaijan Mountain Merino, Bozakh, Gala, Karabakh,
Mazekh, Tushin), north Caucasus (Andi, Dagestan local,
Dagestan Mountain Merino, Karachai, Lezgian), Stavropol
(Caucasian, North Caucasus Mutton-Wool, Stavropol), and
the Caspian Depression (Aksaraisk sheep type, Grozny,
Soviet Merino, Volgograd). Two of the groups are located
in Central Asia: east of the Caspian Sea (Edilbai, Karakul) and
Altay (Gorno-Altay local, Kulunda). The remaining 10
regional groups are located in the area west of the Ural
Mountains and cover the eastern fringe of Europe (fig. 1):
the Middle Volga region (Kuibyshev, Mordovian local),
the Volga-Kama region at the intersection of the Volga
and Kama Rivers (Komi local, Mari local, Oparin, Udmurtian local), west Russia (Kuchugur, Romanov, Russian
Romney Marsh), Russian Karelia (Vepsia sheep, Viena
sheep), Ukraina (Carpathian Mountain, Sokolsk), east of the
Baltic Sea (Finnsheep, Finnish Grey Landrace, Estonian
Blackhead, Estonian Whitehead, Saaremaa local, and
Ruhnu local), Poland (Olkuska, Swiniarka, Wrzosowka),
southeast Europe (the original place of Tsigai breeds:
Serbian Tsigai, Russian Tsigai), Norway (Spael Sheep,
Old Spael Sheep, Norwegian Feral Sheep), and Britain
(Oxford Down). DNA samples were extracted from blood
as described previously (Tapio et al. 2003) or using the
PickPen/QuickPick gDNA method (Bio-Nobile, Finland)
according to the manufacturer's instructions.
Sequence Data
The DNA region analyzed was a highly variable 721-nt
long segment of the mtDNA control region, running from
base 15,541 to base 16,261 in relation to the full mitochondrial sequence (accession number NC001941; Hiendleder,
Lewalski et al. 1998). Based on this complete genome sequence, 4 primers were designed. The primers were named
to indicate the locations and whether they hybridized
with the heavy (H) or the light (L) strand. OarCR15389-15410L and OarCR29-48H were used to produce sequencing template using polymerase chain reaction (PCR), and
OarCR15412-15436L and OarCR16368-16391H were
used to sequence both complementary strands. In PCR,
0.05 μg of total DNA was used in a 50-μl volume of standard
DyNAZyme II (Finnzymes, Finland) PCR reaction mix.
The template production conditions were as follows: 2 min
94°C; 10 times 1 min 94°C, 1 min 56°C, 2 min 70°C; 10 times
45 s 90°C, 1 min 54°C, 2 min 70°C; 20 times 45 s 88°C, 1 min
52°C, 2 min 70°C; 5 min 70°C. PCR products were purified using ExoSAP-IT (Amersham Biosciences, United
Kingdom). Sequencing reactions were performed with
DYEnamic ET Terminator Kit (Amersham Biosciences)
using 10 μl of purified template. The sequencing reaction
had 29 cycles of the following temperatures: 20 s 95°C,
15 s 50°C, and 1 min 60°C. The sequencing products were
purified using Amersham Biosciences Autoseq 96 plates
and analyzed using MegaBACE 500 (Amersham Biosciences). Fluorogram analysis was performed using Cimarron
3.12 base-caller implemented in MegaBACE Sequence
Analyser (Amersham Biosciences). The complementary sequence reads were combined using Phred/Phrap software
(Ewing et al. 1998). This combining utilized Phred-estimated
base-calling confidences. The combination of complementary reads was done also with fixed, equal confidence for
each base-call. If this implied that there were major differences between the sequence reads, the reads were considered
unreliable and were excluded in an early phase of the analysis. The ends of the sequences were trimmed to exclude
problematic segments after a manual check.
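The primer names above encode reference coordinates and strand, so they can be decoded mechanically. The following Python sketch is purely illustrative and was not part of the study's workflow; `Primer` and `parse_primer` are hypothetical names, and the regular expression assumes every name has the form OarCR<start>-<end> followed by H or L (the first PCR primer is written here in that hyphenated form, OarCR15389-15410L).

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed naming pattern: "OarCR", start coordinate, "-", end coordinate, strand letter.
PRIMER_NAME = re.compile(r"^OarCR(\d+)-(\d+)([HL])$")


@dataclass(frozen=True)
class Primer:
    name: str
    start: int   # first reference coordinate (NC001941 numbering)
    end: int     # last reference coordinate, inclusive
    strand: str  # "H" = heavy strand, "L" = light strand

    @property
    def length(self) -> int:
        # Both coordinates are inclusive.
        return self.end - self.start + 1


def parse_primer(name: str) -> Primer:
    """Decode a primer name of the assumed form OarCR<start>-<end><H|L>."""
    match = PRIMER_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized primer name: {name!r}")
    start, end, strand = int(match.group(1)), int(match.group(2)), match.group(3)
    if end < start:
        raise ValueError(f"end precedes start in {name!r}")
    return Primer(name, start, end, strand)


if __name__ == "__main__":
    for name in ["OarCR15389-15410L", "OarCR29-48H", "OarCR15412-15436L", "OarCR16368-16391H"]:
        p = parse_primer(name)
        strand = "heavy" if p.strand == "H" else "light"
        print(f"{p.name}: {p.length} nt, {strand} strand")
    # The sequenced segment spans reference bases 15,541-16,261 inclusive.
    print(16_261 - 15_541 + 1)  # 721
```

The same inclusive-coordinate arithmetic gives the 721-nt length quoted for the sequenced segment.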
The following previously published wild and domestic
sheep data were used for comparison: Ovis aries musimon
(AY091487; Hiendleder et al. 2002), Ovis ammon collium
(AY091492; Hiendleder et al. 2002), Ovis ammon nigrimontana (AY091494; Hiendleder et al. 2002), Ovis vignei
bochariensis (AF039580, AY091491, and AY091490;
Hiendleder, Mainz et al. 1998; Hiendleder et al. 2002), Ovis
vignei arkal (AY091489; Hiendleder et al. 2002), and O.
aries (Mongolian) (AY829402; Guo et al. 2005). Novel domestic sheep sequences were submitted to GenBank (accession numbers DQ242050–DQ242455). The sequences
were aligned using MAP2 software (Ye and Huang 2005)
with the default parameters, except that alignment gaps
longer than 75 nt were penalized no more than a 75-nt gap.
Data Analysis
MEGA 3.1 (Kumar et al. 2004) was used to construct
a Neighbor-Joining tree and to measure differences within
and between the observed distinct haplotype groups. Network 4.1.0.9 (Bandelt et al. 1999; available from http://www.fluxus-engineering.com/)
was used to construct median-joining networks separately for the sequences within
each haplotype group in order to estimate short-scale evolutionary relationships. If the network indicated that the
maximum number of mutations at a site exceeded 4, the site
was excluded and the network was rebuilt. Groups A and B
had 5 common excluded sites (15,956, 15,957, 16,008,
16,048, and 16,133). In Group A, the sites 15,939 and
15,971 were also excluded. In Group B, 28 other sites were
not considered (15,566, 15,592, 15,601, 15,621, 15,639,
15,645_15,646insT, 15,933, 15,934, 15,943, 15,948,
15,955, 15,958, 15,959, 15,961, 15,963, 15,977, 15,982,
15,993, 16,003, 16,019, 16,036, 16,042, 16,044, 16,096,
16,097, 16,101, 16,218, and 16,245). Site numbering is
given in relation to the full mitochondrial sequence (accession number NC001941).
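The exclusion rule above (drop any site at which the network implies more than 4 mutations, then rebuild the network) is easy to express in code. This Python sketch is hypothetical: it assumes the per-site mutation counts have already been read off a network and that a list mapping each alignment column to its reference coordinate is available; it does not reproduce the median-joining algorithm of Network.

```python
from __future__ import annotations


def sites_to_exclude(mutations_per_site: dict[int, int], max_mutations: int = 4) -> list[int]:
    """Reference coordinates whose inferred number of mutations exceeds the cap."""
    return sorted(site for site, count in mutations_per_site.items() if count > max_mutations)


def drop_sites(alignment: list[str], coordinates: list[int], excluded: set[int]) -> list[str]:
    """Remove alignment columns whose reference coordinate is in `excluded`.

    `coordinates[i]` is the reference position of column i (assumed bookkeeping).
    """
    if any(len(seq) != len(coordinates) for seq in alignment):
        raise ValueError("every sequence must have one base per listed coordinate")
    keep = [i for i, pos in enumerate(coordinates) if pos not in excluded]
    return ["".join(seq[i] for i in keep) for seq in alignment]


if __name__ == "__main__":
    # Hypothetical counts: only the first two sites exceed 4 mutations.
    counts = {15956: 6, 15957: 5, 16008: 2}
    print(sites_to_exclude(counts))  # [15956, 15957]
```

After the columns are dropped, the network would be rebuilt from the reduced alignment, as described in the text.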
The distribution of the sequence types represented by
nodes in the median-joining network was tested using a simplified phylogeographic test. This test was a permutational
contingency test, where the geographic sampling area (Caucasus, Central Asia, and the remaining eastern fringe of
Europe) was treated as a categorical variable (Templeton
et al. 1995). In each network for the 3 haplotype groups,
the sequence types were grouped into 3 classes: 1) rare
types occurring once or twice, 2) the phylogenetically central common root type, and 3) other types. The null distribution (the sampling area is unrelated to the observation,
e.g., rare sequence types are equally likely to be found
in any area) was created by 10⁶ permutations using Geodis
2.2 (Posada et al. 2000).
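The logic of a permutational contingency test can be sketched as follows: compute a statistic for the observed area-by-class table, then recompute it after repeatedly shuffling the area labels, which enforces the null hypothesis that area and sequence class are unrelated. This Python sketch is an illustration under stated assumptions, not Geodis code: it uses the Pearson chi-square statistic, and the function names, seed, and default number of permutations are hypothetical choices.

```python
import random
from collections import Counter


def chi_square(rows, cols):
    """Pearson chi-square statistic for the contingency table implied by paired labels."""
    n = len(rows)
    row_totals, col_totals = Counter(rows), Counter(cols)
    cells = Counter(zip(rows, cols))
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_contingency_test(areas, classes, n_perm=10_000, seed=0):
    """Shuffle area labels to build the null distribution of the statistic.

    Returns (observed statistic, permutation P value).
    """
    if len(areas) != len(classes):
        raise ValueError("areas and classes must be the same length")
    rng = random.Random(seed)
    observed = chi_square(areas, classes)
    shuffled = list(areas)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that ties with the observed value are counted.
        if chi_square(shuffled, classes) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Counting the observed arrangement as one permutation keeps P strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

In the setting described above, `areas` would hold the sampling area of each sequence and `classes` its class (rare, root, or other); the permutation count is a free parameter.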
Signs of population expansion were explored using
Arlequin 2.001 (Schneider et al. 1999) with the following
steps. First, Fu’s Fs test of selective neutrality (Fu 1997),
which compares the observed number of haplotypes with the number expected given the mean number of pairwise differences, was used to detect population expansion. Second, based on
the observed distribution of pairwise differences between
sequences (i.e., mismatch distribution), the model parameters for the sudden population expansion model (Rogers
1995) were estimated, and the fit of the data to the inferred
model was tested (Schneider and Excoffier 1999). Deletion
polymorphisms were ignored in the analysis.
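Fu's Fs is defined as ln(S′/(1 − S′)), where S′ is the probability of observing at least as many haplotypes as were found, given θ estimated from the mean number of pairwise differences. The Python sketch below computes the statistic from the Ewens sampling distribution of the number of haplotypes; it is an illustration, not Arlequin's implementation, the function names are hypothetical, and the P values reported in the study (obtained by coalescent simulation in Arlequin) are not reproduced here.

```python
import math
from itertools import combinations


def mean_pairwise_differences(seqs):
    """theta_pi: mean number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def unsigned_stirling_first_kind(n):
    """Row [c(n, 0), ..., c(n, n)] of unsigned Stirling numbers of the first kind."""
    row = [1]  # c(0, 0) = 1
    for m in range(n):
        # c(m + 1, k) = m * c(m, k) + c(m, k - 1)
        new_row = [0] * (m + 2)
        for k in range(m + 2):
            if k <= m:
                new_row[k] += m * row[k]
            if k >= 1:
                new_row[k] += row[k - 1]
        row = new_row
    return row


def _log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fu_fs(n, k_obs, theta_pi):
    """Fs = ln(S' / (1 - S')), with S' = P(K >= k_obs | theta = theta_pi).

    Under the Ewens (infinite-alleles) model, the number of haplotypes K in a
    sample of n has P(K = k) = c(n, k) * theta**k / (theta (theta + 1) ... (theta + n - 1)).
    """
    if theta_pi <= 0:
        raise ValueError("theta_pi must be positive")
    if not 2 <= k_obs <= n:
        raise ValueError("Fs is only defined here for 2 <= k_obs <= n")
    c = unsigned_stirling_first_kind(n)
    log_denominator = sum(math.log(theta_pi + i) for i in range(n))
    # Log probabilities for k = 1..n; math.log accepts arbitrarily large integers.
    log_p = {k: math.log(c[k]) + k * math.log(theta_pi) - log_denominator for k in range(1, n + 1)}
    log_s_prime = _log_sum_exp([log_p[k] for k in range(k_obs, n + 1)])
    log_complement = _log_sum_exp([log_p[k] for k in range(1, k_obs)])  # ln(1 - S')
    return log_s_prime - log_complement
```

Strongly negative values indicate an excess of haplotypes relative to θπ, the pattern expected after population growth; working in log space keeps the calculation stable for samples of several hundred sequences.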
R8s 1.70 software (Sanderson 2003) was used to estimate the time to the most recent common ancestor for each
distinct domestic sheep lineage (i.e., time in which all the
within-group variation emerged). This analysis required
a phylogenetic tree with branch lengths. To construct this,
the appropriate mutation model was determined first. This
was done using a hierarchical likelihood ratio test in Modeltest 3.06 (Posada and Crandall 1998). Second, similarity
of substitution rate in the evolutionary paths from an ancestral sequence type into pairs of present-day domestic sheep
mtDNA haplotypes was studied using a model-based relative rate test implemented in HY-PHY (Pond et al. 2005).
This test compares the evolutionary distance from an outgroup to 2 different ‘‘ingroup’’ haplotypes. The hypothesis
testing is based on the difference between the likelihood of
data with and without the assumption of equivalent substitution rate. In this relative rate test, Ovis ammon nigrimontana (AY091494; Hiendleder et al. 2002) was the outgroup.
Haplotypes were excluded from the phylogenetic dating
analysis based on a testwise P value threshold of 0.05. Third, the remaining unique domestic sheep haplotypes were used
to construct a Bayesian phylogeny using MrBayes 3.1.2
(Ronquist and Huelsenbeck 2003). In construction of the
phylogeny, the default priors of MrBayes were utilized,
and simple uniform molecular clock was assumed.
MrBayes was run for 6 million iterations. This was performed using 4 parallel chains with temperature setting
0.03 in order to make the analysis explore possible trees
and parameters more efficiently. The first million iterations
were excluded as ‘‘burn-in.’’ From the remaining 5 million
iterations, 12,500 trees were sampled to construct a consensus tree that included all groupings that occurred in
the majority of the trees. Fourth, this Bayesian tree was
given as input to r8s software. One substitution rate across
the entire tree was assumed, and a so-called truncated
Newton algorithm was used to estimate times (see r8s
documentation for details).
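The relative rate test described above compares the likelihood of the data with and without the equal-rate constraint. Assuming the constraint removes one free parameter, the P value follows from a likelihood ratio test with 1 degree of freedom; the Python sketch below shows that step only. It does not compute the likelihoods themselves, the function names are hypothetical, and the text does not say which member of a rejected pair was excluded, so the sketch merely flags pairs.

```python
import math


def lrt_p_value(lnl_null: float, lnl_alt: float) -> float:
    """P value for 2 * (lnL_alt - lnL_null) against a chi-square with 1 df.

    The chi-square(1) survival function equals erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def flag_rate_heterogeneous_pairs(tests, alpha=0.05):
    """Haplotype pairs whose relative rate test gives P < alpha.

    `tests` maps (haplotype_1, haplotype_2) -> (lnL with equal rates, lnL with free rates).
    """
    return sorted(pair for pair, (lnl0, lnl1) in tests.items() if lrt_p_value(lnl0, lnl1) < alpha)
```

With a statistic of 3.84 the P value is about 0.05, which corresponds to the testwise threshold used for excluding haplotypes from the dating analysis.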
Results
General Observations
The sequenced segment spanned 721 nt in relation to
the full sheep mitochondrial sequence (NC001941). The
analyses were based on the region that corresponds to
the sites from 15,541 to 15,643 and from 15,933 to
16,261. A central region, which consisted mainly of long
tandem repeats, was excluded. The exact boundaries of
the discarded part were set by the MAP2 software, which
aligns only significantly similar blocks (Ye and Huang
2005). The alignment was 432 nt long.
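The two retained reference blocks account for the alignment length: 15,541–15,643 contributes 103 nt and 15,933–16,261 contributes 329 nt, 432 nt in total. The Python sketch below checks this and maps an alignment column back to its reference coordinate, which is how site numbers such as those in the exclusion lists can be related to columns. It assumes, hypothetically, that every alignment column corresponds to exactly one reference base (no insertion columns); `column_to_reference` is an illustrative name.

```python
# Retained reference blocks (inclusive), as given in the text.
RETAINED_BLOCKS = [(15_541, 15_643), (15_933, 16_261)]


def block_lengths(blocks):
    """Length of each inclusive block."""
    return [end - start + 1 for start, end in blocks]


def column_to_reference(column, blocks=RETAINED_BLOCKS):
    """Map a 0-based column of the concatenated blocks to its reference coordinate."""
    if column < 0:
        raise IndexError("column must be non-negative")
    for start, end in blocks:
        length = end - start + 1
        if column < length:
            return start + column
        column -= length
    raise IndexError("column lies beyond the retained blocks")


if __name__ == "__main__":
    print(block_lengths(RETAINED_BLOCKS), sum(block_lengths(RETAINED_BLOCKS)))  # [103, 329] 432
    print(column_to_reference(103))  # 15933, first base after the excluded repeat region
```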
In the obtained 406 domestic mtDNA sequences, 210
haplotypes were identified. They were defined by 124 polymorphic sites, including 3 sites where the polymorphism
was only a presence/absence of an alignment gap. Among
the 210 haplotypes in the 48 populations, 159 were observed once, whereas the most common haplotype occurred
61 times (Supplementary Table 2, Supplementary Material
online).
FIG. 2.—Neighbor-joining tree showing differences within and between the 4 domestic sheep haplotype groups (Groups A, B, C, and D) and
their divergence from wild sheep species. Within Groups B and D, the
filled circle indicates the European mouflon haplotype and the filled square
indicates the domestic Mongolian haplotype (accession number AY829402),
respectively. The tree is based on the proportion of different nucleotides between haplotypes, and the numbers indicate bootstrap support for the main
branches as percentages among 10,000 resamplings.
FIG. 3.—Median-joining networks constructed separately for the distinct haplotype Groups A, B, and C. Cross-lines on the branches indicate
the number of mutations where several mutations were inferred for
a branch. Black, gray, and white proportions correspond to samples from
the Caucasus, Central Asia, and the remaining eastern fringe of Europe, respectively. For Group A, the haplotypes found in the Middle Volga
region (v), in the Nordic countries in the northern Baltic (n), and in Estonia
and Poland in the southern Baltic (s) are marked on the network.
The haplotypes divided into 4 distinct haplotype
groups (fig. 2): Groups A and B (Wood and Phua 1996), C
(Pedrosa et al. 2005), and a new Group D. This division
was confirmed by split decomposition analysis (Bandelt
and Dress 1992; unpublished data). The average proportion
of different nucleotides between unique haplotypes was
2.8%. Within Groups A, B, and C, this proportion was
0.9%, 0.7%, and 0.4%, respectively. The average proportion of dissimilar nucleotides in pairs of haplotypes from
different haplotype groups was 5.5%. Between pairs of haplotype groups, this proportion varied from 5.0% (between
Groups B and D) to 6.1% (between Groups A and C).
Groups A and B were the most common and occurred in
22% and 71% of the studied sheep, respectively. Frequencies of the 2 other groups were low; the frequency of Group
C was 7%, and Group D was detected in only one Karachai
sheep from north Caucasus.
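The distances above are averages of the proportion of differing nucleotides (p-distance) over pairs of unique haplotypes, either within one group or across two groups. A minimal Python sketch, assuming that columns where either sequence has an alignment gap are skipped pairwise (the exact gap option used in MEGA is not stated in the text); the function names are hypothetical.

```python
from itertools import combinations, product

GAP = "-"


def p_distance(a: str, b: str) -> float:
    """Proportion of differing nucleotides, skipping columns where either sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def mean_within(group):
    """Average p-distance over all pairs of haplotypes within one group."""
    pairs = list(combinations(group, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group1, group2):
    """Average p-distance over all pairs with one haplotype from each group."""
    pairs = list(product(group1, group2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

Multiplying these proportions by 100 gives percentages comparable to those quoted, for example within-group averages below 1% against between-group averages of about 5–6%.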
Analysis of Haplotype Group Expansions
The median-joining networks showed a star-shaped
pattern (fig. 3). This phylogenetic pattern is commonly understood to be indicative of a population expansion. Two
statistical approaches supported this inference. First, the
Fs (Fu 1997) statistic, which is particularly sensitive to population growth, showed a significant (P < 0.001) departure
from neutrality in all 3 haplotype groups with multiple haplotypes (A: −26.0; B: −26.1; and C: −7.1). Second, the
observed mismatch distributions were fitted to the sudden
expansion model (Rogers 1995), and the analysis supported
population expansion in each haplotype group. The observed mismatch distributions did not deviate from the expectations of the fitted models (P > 0.36; table 1) according
to the sum of squared deviation statistic (Schneider and Excoffier 1999). The estimated initial and postexpansion
scaled effective population sizes (i.e., 2Mu, twice the mitochondrial effective population size multiplied by the mutation rate) were as follows: Group A: 0.0 and 230.2; Group
B: 0.174 and 137.1; and Group C: 0.0 and 2.6 (table 1). This
result indicated that the mtDNA variation within the 3 haplotype groups originated from a very small ewe population.
The results of the separate mismatch analysis for each studied area are presented in table 1.
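For readers who want to see what is being fitted, the Python sketch below computes an observed mismatch distribution, the expected distribution under a sudden expansion, and the sum of squared deviations between them. The expected values use the expression commonly given for the sudden expansion model, F_i(τ) = F_i(θ1) + exp(−τ(θ1 + 1)/θ1) Σ_{j=0..i} (τ^j / j!) [F_{i−j}(θ0) − F_{i−j}(θ1)], with F_i(θ) = θ^i / (θ + 1)^(i+1); here τ corresponds to 2tu and θ0, θ1 to the initial and final 2Mu values. Parameter estimation and the bootstrap test performed in Arlequin are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def observed_mismatch(seqs, max_i):
    """Fraction of sequence pairs differing at exactly i sites, for i = 0..max_i."""
    counts = Counter(sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2))
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(max_i + 1)]


def f_constant(i, theta):
    """F_i(theta) = theta**i / (theta + 1)**(i + 1), the constant-size expectation."""
    return (theta / (theta + 1.0)) ** i / (theta + 1.0)


def expected_mismatch(tau, theta0, theta1, max_i):
    """Expected mismatch distribution tau units after a jump from theta0 to theta1."""
    if theta1 <= 0:
        raise ValueError("theta1 must be positive")
    decay = math.exp(-tau * (theta1 + 1.0) / theta1)
    expected = []
    for i in range(max_i + 1):
        weight = 1.0  # tau**j / j!, updated incrementally to avoid huge factorials
        acc = 0.0
        for j in range(i + 1):
            if j > 0:
                weight *= tau / j
            acc += weight * (f_constant(i - j, theta0) - f_constant(i - j, theta1))
        expected.append(f_constant(i, theta1) + decay * acc)
    return expected


def sum_of_squared_deviations(observed, expected):
    return sum((o - e) ** 2 for o, e in zip(observed, expected))
```

Setting τ = 0 returns the pre-expansion distribution F_i(θ0), and large τ approaches F_i(θ1), which is a quick sanity check on the formula.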
The analysis described above also included estimation
of the scaled numbers of generations after the expansion
(i.e., 2tu, twice the number of generations after the expansion multiplied by the mutation rate; table 1). These were
transformed to time estimates by setting the time estimate of
the earliest expansion (Group A: 2tu = 4.01; table 1) to
equal 9,000 years before present. The time estimates for
Group B and Group C corresponded to expansions 6,400
and 7,000 years ago, respectively. The separate analyses
for the 3 geographical areas resulted in wider confidence
intervals for the estimate of 2tu than for the combined sample. The only exception was Group B. For this haplotype
group, the estimate of 2tu for Caucasus differed considerably from that for the eastern fringe of Europe, and the
confidence intervals remained relatively narrow. The expansion of Group B involving Caucasian sheep was
estimated to have begun approximately at the same time as
the expansion of Group A. The estimate based on the whole data set
was affected by the data from eastern European sheep, for
which a later demographic expansion commencing 4,000–6,400 years ago was inferred.
Table 1
Sudden Expansion Model Parameters Estimated from the Distribution of Pairwise Differences between Sequences within Haplotype Groups
Group  Area          2Mu0ᵃ   2Mu1ᵇ    2tuᶜ   2tu 90% CIᵈ    Pᵉ     YAᶠ      YA 90% CIᵈ
A      All           0.000   230.2    4.01   (2.83, 4.62)   0.37   9,000    (6,400, 10,400)
A      Caucasus      0.003   90.7     3.72   (2.35, 4.60)   0.49   8,300    (5,300, 10,300)
A      East Europe   0.000   14.5     4.91   (2.55, 7.31)   0.10   11,000   (4,200, 11,200)
A      Central Asia  0.002   6657.5   3.84   (1.89, 4.99)   0.79   8,600    (5,700, 16,400)
B      All           0.174   137.1    2.87   (2.19, 3.68)   0.82   6,400    (4,900, 8,300)
B      Caucasus      0.067   7461.3   3.66   (2.55, 4.22)   0.76   8,200    (5,700, 9,500)
B      East Europe   0.000   6655.0   2.60   (1.80, 2.87)   0.97   5,800    (4,000, 6,400)
B      Central Asia  2.305   16.0     2.36   (0.82, 9.49)   0.49   5,300    (1,800, 21,300)
C      All           0.000   2.6      3.13   (0.79, 6.25)   0.86   7,000    (1,800, 14,000)
C      Caucasus      0.002   1.8      3.43   (0.87, 6.64)   0.52   7,700    (2,000, 14,900)
C      Central Asia  0.005   22.9     2.56   (0.94, 4.45)   0.96   5,700    (2,100, 10,000)
ᵃ 2Mu0, twice the initial effective population size multiplied by the mutation rate.
ᵇ 2Mu1, twice the final effective population size multiplied by the mutation rate.
ᶜ 2tu, twice the number of generations since the expansion multiplied by the mutation rate.
ᵈ 90% confidence interval (CI) obtained by parametric bootstrap with 10,000 resamplings.
ᵉ P values of the sum of squared deviations test evaluating the fit of the data to the sudden expansion model.
ᶠ YA, years ago: estimated time since the population expansion in years before present.
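The conversion from 2tu to years is a single linear rescaling: the Group A estimate for all areas (2tu = 4.01) is set equal to 9,000 years, and every other 2tu value is multiplied by the same factor. The Python sketch below illustrates this under the implicit assumption of a constant mutation rate and generation time; the constant and function names are hypothetical.

```python
CALIBRATION_TAU = 4.01     # 2tu for Group A, all areas (table 1)
CALIBRATION_YEARS = 9_000  # years before present assigned to that expansion


def tau_to_years(tau, calibration_tau=CALIBRATION_TAU, calibration_years=CALIBRATION_YEARS):
    """Scale 2tu linearly to years before present."""
    return tau * calibration_years / calibration_tau


def round_to_hundred(years):
    """Round to the nearest 100 years, as in the YA column of table 1."""
    return int(round(years / 100.0)) * 100


if __name__ == "__main__":
    for group, tau in [("A", 4.01), ("B", 2.87), ("C", 3.13)]:
        print(group, round_to_hundred(tau_to_years(tau)))  # 9000, 6400, 7000
```

Applying this factor to the other 2tu estimates in table 1 reproduces the YA point estimates to the nearest 100 years.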
To complement the above time estimates, a phylogenetic molecular dating method was used to estimate the time
to the most recent common ancestor of each polymorphic
haplotype group. The Hasegawa et al. (1985) model with
invariable sites and gamma-distributed rate variation among sites was selected for the data. The model-based relative rate test did not indicate variation in mutation
rates between the evolutionary paths to present-day haplotypes. Only 2.3% of the tests between the pairs had P values
below 0.05. As a precaution, this testwise P value
of 0.05 was used as a criterion to exclude haplotypes from
dating of the most recent common ancestors. In total, 39 haplotypes were removed. The phylogenetic tree in figure 2
presents this data with 171 domestic sheep haplotypes. Using the constructed Bayesian phylogeny with branch lengths
(unpublished data), the times to the most recent common
ancestors were 9,000, 8,850, and 5,900 years for Groups
A, B, and C, respectively.
Differences between Geographical Areas
The haplotype group distribution had 2 distinctive
geographical patterns (fig. 1). First, Group C was present
in the Caucasian and Central Asian areas but absent in
the eastern fringe of Europe (north and west of the Black
Sea and the Ural Mountains). In the regions east of the
Black Sea, the frequency of Group C varied from 8.1%
(the Caspian Depression) to 22% (east of the Caspian
Sea; fig. 1). A second recorded pattern was the absence
of Group A in the 4 studied populations from southeastern
Europe (the Tsigai breeds and the Ukrainian breeds; fig. 1;
Supplementary Table 1, Supplementary Material online).
The region dominated by Group B reached north of the
Black Sea and included Russian Karelia and the Volga-Kama region (fig. 1), whereas Group A was found as a minor
group in the area. Further east, the frequency of Group
B decreases while Group A becomes more important. The
transition is gradual, and only 7.8% of the mitochondrial
variation was due to differentiation between the 3 wide
areas (unpublished data).
Identification of the ancestral sequence type in the median-joining networks (fig. 3) was unambiguous in Groups
B and C but more complicated in Group A. The phylogenetically central and most numerous type can be assumed to
be the most ancient (root) node in the network, whereas the
haplotypes at the periphery of the network are the most recently arisen variants (Crandall and Templeton 1993). Unlike Groups B and C, Group A had a few common types that
were relatively central in the network (i.e., linked to several
other nodes). However, the most frequent central type was
the probable ancient type for Group A as well. This was
supported by 2 additional observations. First, the most common type was observed in 18 breeds, whereas the second
most common sequence type was observed in only 5 breeds
(unpublished data). Another supporting observation was
based on the argument by Templeton et al. (1995) that during slow gradual range expansion, new haplotypes emerging in the periphery of population range may subsequently
spread over a larger area, increase in frequency, and even
replace the ancient types in these new areas. In the present data, all
the Central Asian Group A sequence types were directly
connected either to the most common type (not occurring
in Central Asia) or to another Central Asian type. Thus,
a single direction of gene flow (out of the Near East)
was sufficient to explain the Group A types in Central Asia,
if the root was set at the most common type. Other places
for the root would require a more complicated hypothesis.
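The rooting argument above has two parts: choose the most frequent (and most central) node as the root, then check whether every Central Asian type can be reached from that root through Central Asian types alone, so that a single outward direction of gene flow suffices. The Python sketch below expresses both checks on a toy network; it is illustrative only, assumes the network is stored as an undirected adjacency dictionary, and all node labels and function names are hypothetical.

```python
from collections import deque


def choose_root(frequencies, adjacency):
    """Most frequent node, ties broken by degree (number of network neighbours)."""
    return max(frequencies, key=lambda node: (frequencies[node], len(adjacency.get(node, ()))))


def reachable_from_root(area_types, root, adjacency):
    """True if every area type is reachable from the root via the root and area types only."""
    area_types = set(area_types)
    allowed = area_types | {root}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in allowed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return area_types <= seen


if __name__ == "__main__":
    # Toy network: R is central and common; a-c form one branch, x-y another.
    adjacency = {"R": {"a", "b", "x"}, "a": {"R", "c"}, "b": {"R"}, "c": {"a"},
                 "x": {"R", "y"}, "y": {"x"}}
    frequencies = {"R": 10, "a": 3, "b": 1, "c": 1, "x": 2, "y": 1}
    root = choose_root(frequencies, adjacency)
    print(root, reachable_from_root({"a", "c"}, root, adjacency))  # R True
```

Placing the root elsewhere would generally make the reachability condition fail for some area, which mirrors the statement that other root positions require a more complicated hypothesis.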
The geographical distribution of Group A diversity
suggested that this group existed in the Caucasus area earlier than in the 2 other areas. First, there was a significant
difference in ancestral type frequency between Central Asia
and the other 2 areas (2-tailed contingency test P < 0.025);
the ancestral type was observed in Caucasus and the European area but not observed in Central Asia at all. As
described above, the diversity pattern suggested slow
gradual spread of domesticated sheep to Central Asia involving haplotype frequency changes caused by a chain
of founding events. Second, Group A was rare in eastern
Europe (fig. 1), and the distribution of sequence types suggested gradual expansion over this region as well; the region had fewer rare nonroot sequence types than the
other areas (2-tailed contingency test P < 0.026). These
rare types were replaced by nonroot types reaching moderately high frequencies (fig. 3). More precisely, there were 3
nonroot Group A sequence types that were detected in the 2
breeds of the Middle Volga region. They were also detected
in the old native breeds in Nordic Countries and in Russian
Karelia. In addition, the most common Group A sequence
type in Finland seems to be a descendant of one of these haplotypes (fig. 3). This suggests a previously unrecognized
migration of sheep to northern Europe through Russia. This
pattern caused by spatial expansion came close to deviating
significantly (P = 0.10; table 1) from the pattern expected
from the fitted model assuming only demographic expansion.
Therefore, the estimated model parameters for Group A in
eastern Europe might reflect the history inaccurately.
A separate European sheep domestication appeared
unlikely based on Group B variation, even though the high
frequency of Group B haplotypes in Europe appears to support this hypothesis. There was a significant difference between
the frequency of the Group B root type in Europe (0.66) and in the Caucasus and Central Asia (0.50 and 0.44, respectively; 2-tailed
contingency test P < 0.03). However, the mismatch distribution analysis (table 1) suggested that the expansion of
Group B lineages began later in eastern Europe than in Caucasus. The observation can be explained by a strong maternal bottleneck at the time when the European population
was founded from an earlier Near Eastern domesticated
stock.
Discussion
Four highly diverged sheep lineages (Groups A, B, C,
and D) were observed in Caucasus, 3 (A, B, and C) in Central Asia, and 2 (A and B) in the eastern fringe of Europe,
which included the area north and west of the Black Sea
and the Ural Mountains. Only one sheep had a Group D
haplotype. The other haplotype groups demonstrated signs
of population expansion. Sequence variation within the lineages implied that the earliest demographic expansion for
the common groups (A and B) initiated approximately at
the same time, even though the data from the eastern fringe
of Europe suggest that Group B expansion was more recent.
Expansion of Group C seems to be the most recent, but the wide confidence interval prevents firm dating.
Domestic sheep are likely to have ancestry from at
least 4 different geographical populations of wild mouflon.
The differentiation between the 4 mtDNA haplotype groups
was similar to that observed between distinct lineages in the
goat (unpublished data based on publicly available goat
sequences) and equals that between recognized argali
sheep subspecies (fig. 2, Hiendleder et al. 2002). The presence of such divergence in a single population of constant
size has been estimated to be highly unlikely (Luikart et al.
2001). However, both European mouflons (King and
Brooks 2003) and primitive feral Soay sheep (Coltman
et al. 2003) exhibit a fine-scale genetic structure. The ewes
stay in the population and in the geographical region they
were born in, whereas the males are more prone to migrate.
The distinct maternal lineages may have been maintained
within a single ancestral mouflon subspecies, where the female population was subdivided. Maintaining very distinct
lineages within a single narrow geographical region with
a single female population is less likely.
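Comparisons of differentiation between haplotype groups, such as those summarized in fig. 2, are typically based on mean pairwise distances. The sketch below is a hedged illustration, not the authors' procedure: it computes the uncorrected p-distance, the mean distance within and between groups, and the net divergence d_A = d_XY - (d_X + d_Y)/2, which removes the within-group diversity from the between-group distance. Function names and sequences are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(group):
    """Mean p-distance among members of one haplotype group.
    A group with a single sequence (like Group D here) has no internal pairs,
    so this sketch returns 0.0 for it."""
    if len(group) < 2:
        return 0.0
    return mean(p_distance(a, b) for a, b in combinations(group, 2))


def mean_between(g1, g2):
    """Mean p-distance over all pairs with one sequence from each group."""
    return mean(p_distance(a, b) for a, b in product(g1, g2))


def net_divergence(g1, g2):
    """Between-group distance corrected for within-group diversity:
    d_A = d_XY - (d_X + d_Y) / 2."""
    return mean_between(g1, g2) - (mean_within(g1) + mean_within(g2)) / 2


# Hypothetical toy groups, not data from this study.
group_x = ["AAAA", "AAAT"]
group_y = ["TTTT", "TTTA"]
print(mean_between(group_x, group_y), net_divergence(group_x, group_y))
```

The same p-distance arithmetic links a count of differences to a percentage: 2 differences out of roughly 400 compared sites gives 0.005, i.e., the 0.5% quoted for the M1516 comparison.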
All the presented results are compatible with the hypothesis of a Near Eastern domestication center for sheep (Smith 1998). First, the Caucasus area, located closest to the hypothesized Near Eastern domestication center, displays high mtDNA diversity and has all 4 haplotype groups. Second, the estimated times to the initiation of expansion of each haplotype group are similar between the areas except for Group B; for this haplotype group, the Caucasus shows evidence for the earliest expansion. Third, for Group A, evidence of range expansion into all areas except the Caucasus suggests that the other regions received this haplotype group more recently. In conclusion, our data suggest that both main haplotype groups were derived from wild populations at approximately the same time in the Near East. This makes their fully independent derivation from wild sheep unlikely.
Group C may have been derived from wild sheep later than Groups A and B. Group C occurs mainly in the semidesert and steppe regions around the Caspian Sea, Central Asia, and China (fig. 1). This distribution overlaps that of fat-tailed sheep, whereas Group C is absent from the region spanning western Europe to the Ural Mountains, where only thin-tailed fleece sheep exist as indigenous breeds (Ryder 1984). The limited distribution supports the hypothesis of a more recent emergence of Group C in domestic sheep, because gene flow has not had time to make the haplotype group even modestly frequent in Europe. The time estimates suggest that Group C was derived from wild sheep 2,000 (table 1) or 3,100 (a phylogenetic estimate) years after the domestication of sheep. The confidence interval for this time (table 1) is much wider than for the other haplotype groups and is compatible with the initiation of expansion between 1,800 and 14,000 years ago. The practice of mating domestic ewes with wild rams in the regions close to the Caspian Sea was documented as late as the 20th century (Carruthers 1949). This is not directly applicable to maternally inherited variation, but it demonstrates a long-lasting interest in introducing material from wild sheep into domesticated stock. This may have included wild ewes or lambs, which would have brought the new type into the domestic stock. Further sampling in Asia is required to reliably infer the original distribution of Group C.
Group D was observed in a single Caucasian sheep. Three different kinds of evidence confirm that this sequence represents a real fourth mitochondrial haplotype group rather than a sequencing artefact originating from a nuclear pseudogene. First, we confirmed the sequence data for this individual using a different primer (the sequencing primer) to produce the template for the sequencing reaction. If our initial observation had represented a nuclear pseudogene, this should have yielded a sequence different from the one originally obtained; it did not. Second, Guo et al. (2005)
detected an individual carrying this haplotype group, although they distinguished the haplotype only as a highly differentiated variant within their C lineage, which mainly corresponds to our Group C. Considering the mtDNA sites used in the present statistical analyses, the sheep “M1516” (accession number AY829402) of Guo et al. (2005) differed from our Group D haplotype by only 2 nucleotides (0.5%). This is well within the range of internal variation of the other 3 haplotype groups, and the mean difference of Group D from the other 3 groups is similar to the average difference between the more common haplotype groups (fig. 2). M1516 is the only previously reported “lineage C” individual (Guo et al. 2005; Pedrosa et al. 2005) that clusters with Group D (unpublished data). Finally, the test of relative substitution rate indicated that the rate on the evolutionary branch leading to the Group D haplotype did not differ from that on the branches leading to other haplotypes. If Group D represented an artefact originating from an ancient nuclear pseudogene, a lower mutation rate would be expected (e.g., Perna and Kocher 1996).
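The text does not specify which relative-rate test was applied (HyPhy, among the cited tools, supports such comparisons). To show the logic with a simple, well-known alternative, the sketch below implements Tajima's (1993) 1D relative-rate test, swapped in here for illustration: with an outgroup, count sites where only sequence 1 has changed (m1) and where only sequence 2 has changed (m2); under equal rates, (m1 - m2)^2 / (m1 + m2) is approximately chi-square with 1 degree of freedom. The sequences and function name are hypothetical.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's 1D relative-rate test on three aligned sequences.

    m1: sites where seq1 differs while seq2 matches the outgroup
        (a change unique to the seq1 lineage).
    m2: sites where seq2 differs while seq1 matches the outgroup.
    Returns (m1, m2, chi2, p) with chi2 = (m1 - m2)^2 / (m1 + m2), df = 1.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites: no evidence of rate difference
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Hypothetical sequences: one unique change on each lineage.
print(tajima_relative_rate("CAAAAAAAAA", "ATAAAAAAAA", "AAAAAAAAAA"))
```

A pseudogene copy evolving at the slower nuclear rate would tend to show significantly fewer unique changes (small m1 relative to m2), which is the pattern the test is designed to detect.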
All the presented molecular dates should be interpreted cautiously. They should not be strongly affected by the recently observed nonlinearity of apparent mutation rates, whereby rates appear much higher over recent than over ancient phylogenetic history (Ho and Larson 2006), because the calibration point is very close to the estimated times. However, we set the first expansion (Group A) equal to the archaeologically estimated time of sheep domestication (9,000 years ago), and throughout the study times were estimated relative to this date. It should also be noted that the phylogenetic time estimate does not necessarily reflect the domestication date, because it can be affected by the extent of diversity sampled during domestication. Timing domestication from the onset of population expansion is a more reliable approach, because it incorporates this sampling into the model as the initial population size. Here, the 2 methods suggested the same order for the 3 haplotype group expansions.
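Because all times were estimated relative to the Group A expansion set at 9,000 years ago, converting an expansion parameter into years is a simple rescaling. Under the sudden-expansion model tau = 2ut, so, assuming the same rate u for every group and area, t_X = 9,000 × tau_X / tau_A. The sketch below shows this scaling and the rate it implies; the tau values are invented for illustration and are not the estimates in table 1, and the function names are hypothetical.

```python
CALIBRATION_YEARS = 9000.0  # Group A expansion anchored to domestication date


def tau_to_years(tau, tau_calibration, calibration_years=CALIBRATION_YEARS):
    """Convert an expansion parameter tau to years before present by scaling
    against the calibration group: with tau = 2*u*t and a shared rate u,
    t_X = t_cal * tau_X / tau_cal."""
    if tau_calibration <= 0:
        raise ValueError("calibration tau must be positive")
    return calibration_years * tau / tau_calibration


def implied_rate(tau_calibration, calibration_years=CALIBRATION_YEARS):
    """Rate u (per sequence per year) implied by the calibration: u = tau / (2 t)."""
    return tau_calibration / (2 * calibration_years)


# Hypothetical tau values for illustration only; see table 1 for the real estimates.
tau_a = 6.0            # Group A (calibration group)
tau_b_caucasus = 6.0   # Group B, Caucasus
tau_b_europe = 4.0     # Group B, eastern Europe
lag = tau_to_years(tau_b_caucasus, tau_a) - tau_to_years(tau_b_europe, tau_a)
print(lag, implied_rate(tau_a))
```

The same subtraction is how a lag between the Group B expansions in the Caucasus and in Europe translates into years; any error in the 9,000-year anchor scales every derived date proportionally.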
Group B has been observed in the European mouflon, and it predominates in European breeds while being a minority type in eastern Asia (Hiendleder, Mainz et al. 1998; Hiendleder et al. 2002; Guo et al. 2005). This pattern could have been caused by an independent domestication in Europe followed by gene flow to Asia. However, the expansion of Group B in Europe appears to have begun later than in the Caucasus. The frequency pattern can be explained by a genetic bottleneck when the European stock was founded from an earlier domestic stock, so there is no need to invoke an independent European domestication of sheep. The present study suggests an interval of approximately 3,000 years between the domestication of sheep in the Near East and their arrival in temperate Europe. Although the confidence intervals around the point estimates are wide, this interval is similar to the 2,000–3,000 year period that archaeological evidence suggests separated the initial sheep domestication event in the Near East from the later spread of agriculture to temperate Europe (Smith 1998).
Our data suggest that Groups A and B were derived from wild sheep at approximately the same time in the Near East, although sheep mtDNA variation in the uncharacterized regions of the Near East and neighboring areas should be studied to strengthen this conclusion. The results imply an additional, previously unrecognized route by which sheep reached northern Europe from the Near East, directly through Russia. Analysis of mtDNA variation in western European sheep will show whether this Russian route has influenced European sheep diversity outside northern Europe.
Supplementary Material
Supplementary Tables 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We thank Anneli Virta for technical assistance and
Professor Asko Mäki-Tanila and anonymous referees for
helpful comments on the manuscript. We acknowledge
the following persons for assistance in sample collection
in the Middle Volga and the Volga-Kama regions: Paula
Kokkonen, Galina Mišarina, Esa-Jussi Salminen, Konstantin Zamjatin, Vasili Petrov, Lidija Matrosova, Jouni Kortesharju, and Natalia Devjatkina. Dr Alexandr Ivanovich
Kostenko from the Ukrainian Academy of Agrarian Sciences,
Kiev, aided in sampling of Ukrainian sheep. We thank the
Nordic Gene Bank for Farm Animals (NGH) and Dr Ingrid
Olsaker for the Norwegian sheep samples. This work was
financially supported by the Academy of Finland and the
Finnish Ministry of Agriculture and Forestry (SUNARE-program and Russia In Flux-program). M.T. was supported
by the Department of Education in Finland (Finnish Graduate School in Population Genetics coordinated by Oulu
University).
Literature Cited
Bandelt H.-J, Dress AWM. 1992. Split decomposition: a new and
useful approach to phylogenetic analysis of distance data. Mol
Phylogenet Evol 1:242–52.
Bandelt H.-J, Forster P, Röhl A. 1999. Median-joining networks
for inferring intraspecific phylogenies. Mol Biol Evol 16:37–48.
Carruthers D. 1949. Beyond the Caspian: a naturalist in Central
Asia. Edinburgh, United Kingdom: Oliver and Boyd.
Coltman DW, Pilkington JG, Pemberton JM. 2003. Fine-scale genetic structure in a free-living ungulate population. Mol Ecol
12:733–42.
Crandall KA, Templeton AR. 1993. Empirical tests of some predictions from coalescent theory with applications to intraspecific phylogeny reconstruction. Genetics 134:959–69.
Davis GH, Galloway SM, Ross IK, et al. (19 co-authors). 2002.
DNA tests in prolific sheep from eight countries provide new
evidence on origin of the Booroola (FecB) mutation. Biol
Reprod 66:1869–74.
Ewing B, Hillier L, Wendl MC, Green P. 1998. Base-calling of
automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8:175–85.
Fu Y.-X. 1997. Statistical tests of neutrality of mutations against
population growth, hitchhiking and background selection. Genetics 147:915–25.
Guo J, Du L.-X, Ma Y.-H, Guan W.-J, Li H.-B, Zhao Q.-J, Li X,
Rao S.-Q. 2005. A novel maternal lineage revealed in sheep
(Ovis aries). Anim Genet 36:331–6.
Hasegawa M, Kishino H, Yano T. 1985. Dating the human-ape
split by a molecular clock of mitochondrial DNA. J Mol Evol
22:160–74.
Hiendleder S, Kaupe B, Wassmuth R, Janke A. 2002. Molecular
analysis of wild and domestic sheep questions current nomenclature and provides evidence for domestication from two different subspecies. Proc R Soc Lond B Biol Sci 269:893–904.
Hiendleder S, Lewalski H, Wassmuth R, Janke A. 1998. The complete mitochondrial DNA sequence of the domestic sheep
(Ovis aries) and comparison with the other major ovine haplotype. J Mol Evol 47:441–8.
Hiendleder S, Mainz K, Plante Y, Lewalski H. 1998. Analysis of
mitochondrial DNA indicates that domestic sheep are derived
from two different ancestral maternal sources: no evidence for
contributions from urial and argali sheep. J Hered 89:113–20.
Ho SYW, Larson G. 2006. Molecular clocks: when times are
a-changin’. Trends Genet 22:79–83.
King R, Brooks SP. 2003. Survival and spatial fidelity of mouflons: the effect of location, age and sex. J Agric Biol Environ
Stat 8:486–513.
Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software
for molecular evolutionary genetics analysis and sequence
alignment. Brief Bioinform 5:150–63.
Loftus RT, MacHugh DE, Bradley DG, Sharp PM, Cunningham
P. 1994. Evidence for two independent domestications of
cattle. Proc Natl Acad Sci USA 91:2757–61.
Luikart G, Gielly L, Excoffier L, Vigne J-D, Bouvet J, Taberlet P.
2001. Multiple maternal origins and weak phylogeographic
structure in domestic goats. Proc Natl Acad Sci USA
98:5927–32.
Meadows JRS, Li K, Kantanen J, et al. (11 co-authors). 2005. Mitochondrial sequence reveals high levels of gene flow between
breeds of domestic sheep from Asia and Europe. J Hered
96:494–501.
Pedrosa S, Uzun M, Arranz JJ, Gutiérrez-Gil B, San Primitivo F,
Bayón Y. 2005. Evidence of three maternal lineages in Near
Eastern sheep supporting multiple domestication events. Proc
R Soc Lond B Biol Sci 272:2211–7.
Perna NT, Kocher TD. 1996. Mitochondrial DNA: molecular fossils in the nucleus. Curr Biol 6:128–9.
Pond SL, Frost SD, Muse SV. 2005. HyPhy: hypothesis testing
using phylogenies. Bioinformatics 21:676–9.
Poplin F. 1979. Origine du mouflon de Corse dans une nouvelle
perspective paleontologique, par marronnage. Ann Génét Sél
Anim 11:133–43.
Posada D, Crandall KA. 1998. Modeltest: testing the model of
DNA substitution. Bioinformatics 14:817–8.
Posada D, Crandall KA, Templeton AR. 2000. GeoDis: a program
for the cladistic nested analysis of the geographical distribution
of genetic haplotypes. Mol Ecol 9:487–8.
Rogers AR. 1995. Genetic evidence for a Pleistocene population
explosion. Evolution 49:608–15.
Ronquist F, Huelsenbeck JP. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics
19:1572–4.
Ryder ML. 1983. Sheep and man. London: Duckworth.
Ryder ML. 1984. Sheep. In: Mason IL, editor. Evolution of domesticated animals. London: Longman. p 63–85.
Sanderson MJ. 2003. r8s: inferring absolute rates of molecular
evolution and divergence times in the absence of a molecular
clock. Bioinformatics 19:301–2.
Schneider S, Excoffier L. 1999. Estimation of past demographic parameters from the distribution of pairwise differences when the mutation rates vary among sites: application to human mitochondrial DNA. Genetics 152:1079–89.
Schneider S, Roessli D, Excoffier L. 1999. Arlequin version 2.0: a software for population genetic data analysis. University of Geneva, Switzerland: Genetics and Biometry Lab, Department of Anthropology. Available from: http://anthro.unige.ch/arlequin/. Accessed on 2001 Jan 2.
Smith BD. 1998. The emergence of agriculture. New York:
Scientific American Library.
Sultana S, Mannen H, Tsuji S. 2003. Mitochondrial DNA diversity of Pakistani goats. Anim Genet 34:417–21.
Tanaka K, Solis CD, Masangkay JS, Maeda K, Kawamoto Y,
Namikawa T. 1996. Phylogenetic relationship among all living
species of the genus Bubalus based on DNA sequences of the
cytochrome b gene. Biochem Genet 34:443–52.
Tapio M, Miceikiené I, Vilkki J, Kantanen J. 2003. Comparison of
microsatellite and blood protein diversity in sheep: inconsistencies in fragmented breeds. Mol Ecol 12:2045–56.
Templeton AR, Routman E, Phillips CA. 1995. Separating population structure from population history: a cladistic analysis of
geographical distribution of mitochondrial DNA haplotypes
in the tiger salamander, Ambystoma tigrinum. Genetics
140:767–82.
Wood NJ, Phua SH. 1996. Variation in the control region sequence of the sheep mitochondrial genome. Anim Genet
27:25–33.
Ye L, Huang X. 2005. MAP2: multiple alignment of syntenic genomic sequences. Nucleic Acids Res 33:162–70.
Zeder MA, Emshwiller E, Smith BD, Bradley DG. 2006. Documenting domestication: the intersection of genetics and archaeology. Trends Genet 22:139–55.
Anne Stone, Associate Editor
Accepted June 13, 2006