Quercus suber L. - Repositório da Universidade de Lisboa

UNIVERSIDADE DE LISBOA
FACULDADE DE CIÊNCIAS
DEPARTAMENTO DE BIOLOGIA ANIMAL
Differentiation and genetic variability in cork oak populations
(Quercus suber L.)
Joana Seabra Pulido Neves da Costa
MESTRADO EM BIOLOGIA HUMANA E AMBIENTE
Lisboa
2011
Dissertation supervised by:
Prof. Dr. Octávio Fernando de Sousa Salgueiro Godinho Paulo
Dr. Dora Cristina Vicente Batista Lyon de Castro
Preliminary note
This master's thesis is written in English, since English is considered the universal
language of science. For that reason, knowledge of and practice in written English are of
considerable importance for anyone who intends to pursue a career in scientific research in
Biology. Writing the thesis in English is also intended to speed up the preparation of
manuscripts and subsequent scientific publications.
The bibliographic references follow the guidelines of the international scientific journal
“Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is
one of the most relevant journals in the field in which this thesis was developed, and its
citation system is convenient for reading scientific review texts. Given also its high impact
factor in the scientific community, this journal seemed an appropriate choice as the reference
for presenting the bibliography.
The study presented in this thesis was carried out within the scope of project
PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos genéticos e genómicos do sobreiro: bases para
uma gestão prospectiva”, funded by the Fundação para a Ciência e Tecnologia (FCT).
Foreword
This master's thesis is written in English, which is considered the universal scientific
language; practising its writing and grammar is therefore of the utmost importance for those
who intend to pursue a career in Biology and scientific research. Writing the thesis in
English also speeds up the submission of manuscripts for publication.
The bibliographic references follow the guidelines of the international scientific journal
“Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is
one of the most relevant journals in the field in which this thesis was developed, with a
high impact factor in the scientific community. It also has a citation system that is
comfortable for reading long texts.
This study is part of the project PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos
genéticos e genómicos do sobreiro: bases para uma gestão prospectiva”, funded by Fundação
para a Ciência e Tecnologia (FCT).
Acknowledgements
As this thesis comes to an end, I feel the need to thank all those who in some way made it
possible.
My first thanks are due to my supervisors, Octávio Paulo and Dora Batista. To Professor
Octávio for the encouragement and the vote of confidence he placed in me from the start. To
Dora for proposing the master's topic and for awakening my interest in plants.
To Professor Deodália Dias for the opportunities she gave me and for her unconditional
support when problems are beyond our control and do not depend on us.
I particularly thank Professor Helena Almeida of the Instituto Superior de Agronomia for
access to Herdade Monta da Fava, the source of some of the cork oak populations most
important to this work. To everyone who, directly or indirectly, was important for the
collection of the samples.
To CoBiG2 for the group that came together and for the good times. To Francisco Pina-Martins
and Vera Nunes for raising me in the early days of my laboratory life. To Sofia Seabra for
her natural calm and friendship. To Eduardo Marabuto for his good humour, much needed in
difficult times. To Sara Ema: incredible as it may seem to you, I think you stress more than
I do, and that is an enormous help, as is your being with me until the day of submission…
even on crutches. To Diogo Silva for sharing some difficult moments with supervisors and the
like. To Catarina Dourado, Ana Sofia, Patrícia Brás, Renata Martins, Inês Modesto and Bruno
Vieira, who kept helping me with the usual difficulties of a thesis. To the remaining members
of CoBiG2, as well as former members and the most recent additions, thank you very much!
To Rita Oliveira and Raquel Vaz: you are not part of CoBiG2, but you are part of the family
and deserve due recognition and thanks for what you “put up with” from me.
Special thanks go to four people who must have suffered a great deal with me. It would take
pages of acknowledgements, but since I cannot do that, the intention stands. To Catarina
Dourado, a particular thank you. It was a long road, and I thank you for the affection,
support and friendship. To Eduardo Marabuto for the valuable comments, above all on the
Introduction. You are right about many things, but compromises have to be made. To Sofia
Seabra for the extremely important corrections; it would have been much harder without you.
To Bruno Vieira for the endless hours he listened to me complain about life in general and
the thesis in particular. Always with much love and affection! Thank you to all four!
To Diana Martins. You are not always with me, but you are always thinking of me, and your
timing is impeccable when I need you most.
And to my Father, Mother and Grandparents. Although they come at the end of this list, they
were probably the people who contributed most to the completion of this thesis. I clearly
would not be here without the precious support of my family.
Summary
The year 2011 was designated “The International Year of Forests” by the United Nations
General Assembly, in an attempt to raise public awareness and promote sustainable forest
management and conservation for the benefit of future generations. Estimates by the FAO
(Food and Agriculture Organization) for 2010 showed that 31% of the Earth's land surface is
still covered by forests and that trees account for 90% of terrestrial biomass, comprising a
total of 60,000 to 100,000 taxa. However, certain human-induced changes, mainly deforestation
and climate change, have raised the proportion of species threatened with extinction to 10%.
In recent years forest species have been widely used in population and evolutionary genetics
studies, as well as in genomic studies. The main reasons are the particular characteristics
of these non-classical models: they result from millions of years of divergence and
diversification and therefore show impressive levels of morphological diversity, evolutionary
divergence and ecological diversity. Although the impact that global changes will have on
these species depends largely on their capacity to respond and on that of their ecosystems,
genetic studies allow us, to some extent, to predict the evolutionary consequences of those
changes, since they increase our knowledge of the biodiversity and evolution of these species.
The concept of “Phylogeography” was introduced by Avise et al. in 1987, and over the last
25 years it has had a major impact on research, particularly in animals. In plants the
results have not been as clear, mainly owing to a lack of genetic variability applicable to
phylogeographic analysis. It has been considerably difficult to find a genetic marker in
plants with a resolving power similar to that of animal mitochondrial DNA. Nevertheless,
plant phylogeography has developed considerably, especially in recent years, with the growing
use of nuclear molecular markers and the collection of data from larger fragments of the
chloroplastidial genome.
The genus Quercus (oaks) (Fagaceae) is one of the most important groups of woody angiosperms
in the northern hemisphere, namely in terms of species diversity, ecological dominance and
economic value. The genus is quite old, considering that the oldest fossil found dates from
the Oligocene (34-23 million years ago). Oaks are the dominant members of a wide variety of
habitats, and 500-600 species are thought to exist on Earth.
Cork oak (Quercus suber L.) is one of the most important tree species in the Western
Mediterranean region, both economically and ecologically, where it defines open woodlands
(created and maintained by man) known in Portugal as “montados”. The distribution range of
cork oak, although discontinuous, extends from the Atlantic coast of North Africa and the
Iberian Peninsula to the south-western regions of Italy, including the Mediterranean islands
of Sicily and Sardinia, as well as the Mediterranean coastal areas of Algeria and Tunisia.
Cork oak forests cover a total area of about 2.2 million hectares, from which 340,000 tonnes
of cork are extracted per year. The largest extent is in Portugal, with about 700,000
hectares, corresponding to 21% of the Portuguese forest area and 30% of the world cork
production area. Cork oak has been used since Antiquity for cork production, and this natural
product has great economic value. The greatest threats, however, are faced by natural and
marginal populations, which are often small, scattered and confined to restricted habitats.
Many of these populations may be at risk of disappearing, mainly owing to a lack of
regeneration.
Because of its economic value, and also because cork oak woodlands are reservoirs of
biodiversity and shelter a wide variety of species threatened with extinction, these
populations are important material for genetic studies that can underpin the design of
conservation programmes. It is therefore necessary to strengthen and extend knowledge of the
spatial organization of the species' genetic variation, so that conscious and informed
decisions can be made about the conservation of genetic resources.
Phylogeographic studies of Quercus suber have lacked depth, and some have even been
inconclusive. As a result, the evolutionary history of the species is not well understood,
most probably because of the limited number of areas sampled or the low information content
of the markers used. For example, in the studies that included Portuguese populations,
inferences were based on deficient sampling of Portugal; since this is one of the most
relevant regions in the present and past history of cork oak, broader coverage of the
distribution range is needed, including some areas reported as potential glacial refugia for
other species. On the other hand, since most phylogeographic studies are supported by data
derived from chloroplastidial DNA (cpDNA) (PCR-RFLPs and SSRs), it should be considered
whether other molecular approaches or genetic markers that evolve faster than cpDNA would
indicate a different evolutionary scenario. This master's thesis proposes an approach
different from that of previous studies, combining data obtained from chloroplastidial and
nuclear DNA. This approach has never been applied to cork oak and is expected to add relevant
phylogeographic information. More specifically, the objectives of this work were: 1. To infer
the evolutionary history and demographic patterns of Quercus suber; 2. To explore the patterns
of hybridization and introgression between cork oak and other Quercus species; 3. To assess
the levels of diversity and differentiation among and within some key cork oak populations.
Sequencing several fragments made it possible to infer some details of the evolutionary
history of the species. The traditional cpDNA was selected for sequencing of 3 intergenic
regions (TrnL-F, TrnS-PsbC and TrnH-PsbA) in a total of 148 samples from 26 populations.
However, because phylogeographic inferences based on a single type of non-recombining marker
can give misleading information about the evolutionary history of the species, the nuclear
genome (nuDNA) was also explored by sequencing a candidate gene potentially involved in
osmotic stress (EST 2T13) in 104 samples from the same 26 populations. In both data sets two
lineages were detected in cork oak. One lineage, the “pure lineage”, appears to be almost
exclusive to cork oak and splits into three sub-lineages possibly resulting from three refuge
areas, one predominant in the Western Mediterranean and the other two in the Eastern
Mediterranean. The other lineage is associated with Quercus ilex (holm oak) and Quercus
coccifera (kermes oak) and was named the “introgressed lineage”. This lineage appears to
result from several episodes of hybridization and introgression with Quercus ilex. The
combined analysis of the cpDNA and nuDNA sequences suggests that this introgression occurred
in both directions between the two species, and that these events were frequent and
consecutive over a period of time.
Finally, nuclear microsatellites, both derived from ESTs (Expressed Sequence Tags) (EST-SSRs)
and anonymous (nuSSRs), provided a perspective on the patterns of genetic diversity and
population structure of cork oak. In a first stage it was possible to establish EST-SSRs as
valid markers in cork oak, contradicting the idea that EST-SSRs tend to show little
polymorphism. Subsequently, a combined analysis of these two types of marker (5 EST-SSRs and
3 nuSSRs) in 379 individuals from 13 populations detected a relatively low but highly
significant genetic diversity. Although no population structure was detected among the
Portuguese populations, which appear together in a single cluster, there is a tendency for
Catalonia (Spain) to stand out as one of the most differentiated populations.
Overall, the objectives of the work were met, clarifying some aspects of the phylogeography
and evolutionary history of cork oak. The introduction of new molecular markers was clearly
informative, revealing new and unexpected aspects of the genetic patterns of the species and
thus generating entirely new explanatory hypotheses for cork oak.
Keywords
Quercus suber, geographical structure, microsatellites, ESTs, introgression
Abstract
Cork oak (Quercus suber L.) is one of the most important tree species, both economically and
ecologically, in the Western Mediterranean region. Consequently, there is enormous interest
in understanding its evolutionary history and current population structure. Although some
details of the genetic divergence of cork oak populations have been uncovered, a different
and complementary analysis of chloroplastidial and nuclear DNA markers (cpDNA and nuDNA) is
likely to bring additional phylogeographically relevant information. So far, no one has
attempted the molecular approach proposed in the present study for cork oak, which combines
cpDNA and nuDNA sequence variation with polymorphism data from anonymous nuclear
microsatellites (nuSSRs) and EST-derived (Expressed Sequence Tag) microsatellites (EST-SSRs)
to infer phylogeographical patterns and history, possible glacial refuges, diversity levels
and geographic structure.
A genetic survey was conducted by sampling populations throughout the entire distribution
range of the species. Genetic diversity was monitored at 8 nuclear microsatellite loci
(3 EST-SSRs and 5 nuSSRs) in 379 individuals derived from 13 populations, and at 4 DNA
sequences (3 cpDNA intergenic spacer regions and 1 osmotic-stress related candidate gene)
in 148 samples from 26 populations.
DNA sequences of both cpDNA and nuDNA confirmed two main lineages of cork oak haplotypes:
the first, named the pure lineage, is mostly exclusive to cork oak but also shared with
Q. cerris, and the second, named the introgressed lineage, is shared with Q. ilex and
Q. coccifera. However, the cpDNA sequences show the complexity of the introgressed lineage,
apparently indicating that these events of hybridization and introgression may have happened
frequently and consecutively over a period of time. The theory of cork oak refugia during the
last glaciations was also revisited (using the pure lineage of the cpDNA haplotypes), and
three major haplotypes were detected, reflecting three possible refuge areas. Finally, with
the microsatellite data, population differentiation was low but rather significant, and the
geographic subdivisions that could be defined isolated the Portuguese populations in one
cluster, further characterizing the Catalonia (Spain) population as possibly the most
differentiated population.
Key words
Quercus suber, geographical structure, microsatellites, ESTs, introgressive hybridization
List of abbreviations
AFLP – Amplified Fragment Length Polymorphisms
BA – Bayesian analysis
BLAST – Basic Local Alignment Search Tool
bp – base pairs
BP – Before Present
CBOL – The Consortium for the Barcode of Life
COI or Cox1 – cytochrome c oxidase I
cpDNA – Chloroplastidial DNA
ESTs – Expressed Sequence Tags
EST-SSRs – EST-derived SSRs
FAO – Food and Agriculture Organization (of the United Nations)
ITS – Internal Transcribed Spacer
kb – Kilobases
MCMC – Markov Chain Monte Carlo
MP – Maximum Parsimony
mtDNA – Mitochondrial DNA
nuDNA – Nuclear DNA
nuSSRs – nuclear SSRs
PCR – Polymerase Chain Reaction
RAPDs – Random Amplified Polymorphic DNA
rDNA – Ribosomal DNA
RFLP – Restriction Fragment Length Polymorphism
sncDNA – Single-nuclear copy DNA
SNPs – Single Nucleotide Polymorphisms
SSR – Simple sequence repeats; microsatellites
Table of Contents
1. Introduction
Thesis Main Goals
1.1 An emblematic tree: Quercus suber L.
1.1.1 General aspects on cork oak and the “montado”
1.1.2 Taxonomic classification and phylogenetic studies
1.1.2.1 Barcoding in oak phylogenetics
1.1.3 Geographical distribution
1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization
1.1.5 Genetic diversity studies
1.1.6 Hybridization and cytoplasmatic introgression
1.2 Molecular markers in phylogeography
1.2.1 Mitochondrial DNA (mtDNA)
1.2.2 Chloroplastidial DNA (cpDNA)
1.2.3 Nuclear DNA (nuDNA)
1.2.4 Simple Sequence Repeats (SSRs)
1.2.5 Expressed Sequence Tags (ESTs)
2. Materials and Methods
2.1 Sampling and DNA extraction
2.2 DNA sequencing
2.3 Microsatellite genotyping
2.4 Phylogenetic and phylogeographic analysis
2.5 Selective neutrality tests and demographic history
2.6 Genetic diversity and population differentiation
2.7 Genetic structure of populations
3. Results
3.1 Sequencing of chloroplast and nuclear DNA fragments
3.1.1 cpDNA and nuDNA diversity levels
3.1.2 Differentiation patterns
3.1.3 Mismatch distribution and neutrality tests
3.2 Microsatellite analysis
3.2.1 Genetic diversity values
3.2.2 Genetic differentiation among populations
3.2.3 Population structure
4. Discussion
4.1 Differentiation and demographic patterns
4.2 Hybridization and introgression
4.3 Genetic diversity and population structure
5. Final Remarks
6. Bibliographic References
Supporting Information
Supporting Information 1
Supporting Information 2
Supporting Information 3
Supporting Information 4
Supporting Information 5
Supporting Information 6
Supporting Information 7
Supporting Information 8
Supporting Information 9
Materials and Methods
1. Introduction
The year 2011 was designated “The International Year of Forests” by the United Nations
General Assembly, in an attempt to raise awareness and strengthen the sustainable management
and conservation of all types of forests for the benefit of current and future generations.
Estimates by the Food and Agriculture Organization of the United Nations (FAO) for 2010
showed that 31% of the Earth's land surface is still covered by forests, and that trees
correspond to 90% of Earth's biomass [1]. Some estimates of global tree species richness
state that there are 60,000 to 100,000 taxa, and that forests harbour the majority of the
world's terrestrial biodiversity [2]. However, ongoing deforestation and other human-induced
global changes (such as changes in climate and land use) have brought the proportion of the
world's tree species threatened with extinction close to 10% [3,4]. Although the overall rate
of deforestation remains alarmingly high (estimated at 9.4 million hectares per year in the
late 1990s), this rate is, surprisingly, slowing down [5].
In recent years forest trees have been gaining much attention as a non-classical model for
several types of studies. They are particularly interesting for population and evolutionary
genetic and genomic studies, since forest trees result from millions of years of lineage
divergence and diversification and present remarkable levels of diversity in morphology,
adaptation, and ecology [6,7]. Although, in the end, the impact of global changes on forest
trees will depend to a great extent on the reaction of these trees and their ecosystems,
genetic studies open the possibility of predicting the evolutionary consequences of future
global changes by increasing knowledge of tree biodiversity and evolution [8]. For that
purpose, phylogeographic genetic studies seem to be an important step in understanding these
processes.
Avise et al. introduced the concept of “Phylogeography” in 1987 [9], and during the past 25
years or so phylogeography has had a major impact on research, particularly in animal
species. In plants, however, the results have not been as clear-cut. One of the major
problems has been a lack of useful genetic variation applicable to phylogeographic analysis.
It is quite difficult to find genetic markers in plants with a resolving power comparable to
animal mitochondrial DNA (mtDNA) [7,10]. Nonetheless, plant phylogeography has come a long
way over the last few years with the availability of nuclear markers and with the collection
of data from larger sections of the chloroplastidial genome [11].
The genus Quercus (oaks) of the Fagaceae family is one of the most important groups of
woody angiosperms in the northern hemisphere in terms of species diversity, ecological
dominance, and economic value. The genus is quite old, since the oldest unequivocal oak
fossils belong to the Oligocene, which ranges from 34 to 23 million years before present.
Oaks are dominant members of a wide variety of habitats, and some 500-600 species
exist on Earth [12,13].
Quercus suber L. (commonly known as cork oak) is among the most important tree species
(economically and ecologically) in the Western Mediterranean region, to which it is endemic,
defining unique open woods (created and maintained by man) known in Portugal as “montados”
and in Spain as “dehesas”. Quercus suber has been used mostly to produce cork, a natural
product of great economic value. The biggest threats, however, are faced by the marginal
natural populations, often growing in small and scattered stands and in restricted habitats
that are at risk of disappearing, mainly due to a lack of regeneration [14].
Because of the species' economic value, and also because cork oak woodlands are renowned
reservoirs of biodiversity, home to a variety of threatened and endangered species, and
crucial for preventing soil erosion, Q. suber populations represent valuable material for
genetic studies as well as gene conservation programs. To that end, greater knowledge of the
spatial organization of genetic variation within the species is necessary to inform decisions
about tree breeding and the conservation of genetic resources.
Thesis Main Goals
Although previous studies have already addressed population genetics in Quercus suber,
there is still a gap in the current understanding of the evolutionary history of the species,
mostly due to the limited number of geographical areas sampled or to the low information
content of the markers used. In particular, Portuguese populations have been poorly
represented in those studies. Since Portugal is one of the most relevant regions in the
recent and past history of cork oak, a more complete range of cork oak distribution and
differentiation should be covered, including some areas referred to as potential glacial
refuges for other species. On the other hand, since the majority of previous
phylogeographical inferences are supported by data from chloroplastidial markers
[Restriction Fragment Length Polymorphism (RFLP) and microsatellites (SSRs)], the question
arises whether other molecular approaches, or genetic markers evolving at faster rates than
chloroplastidial DNA (cpDNA), would provide a different microevolutionary scenario.
Therefore, the main objectives of this work were to:
1. Infer the evolutionary history and demographic patterns of Quercus suber;
2. Assess the patterns of hybridization and introgression with other Quercus species;
3. Evaluate the diversity and differentiation levels among and within some key cork
oak populations.
To achieve these goals, populations from the entire Mediterranean distribution of the species
were analysed using different approaches with several molecular markers. A multi-locus
sequencing approach was applied to infer the evolutionary history of the species, its
relationships with other Quercus species, and the introgression patterns between them. The traditional
cpDNA was selected for sequencing of several fragments. Additionally, and because
phylogeographic inferences based on a single non-recombining marker can be misleading, the
nuclear genome was also explored with the sequencing of one candidate gene. Finally,
Expressed sequence tag (EST) derived SSRs (EST-SSRs) and anonymous nuclear SSRs
(nuSSRs) were also used with the intention of providing a perspective of patterns of genetic
diversity and population structure.
1.1 An emblematic tree: Quercus suber L.
1.1.1 General aspects on cork oak and the “montado”
Cork oak (Quercus suber Linné, 1753) is an emblematic Mediterranean evergreen
sclerophyllous tree. It is a slow-growing, extremely long-lived tree, reaching about 20 metres
in height, with massive branches forming a round crown (Fig. 1.1). It is a diploid (2n=24),
monoecious (both male and female reproductive organs in one individual) species with a
protandrous system (anthers mature before carpels) to ensure cross-pollination. Plant
propagation in natural populations occurs by seed (acorn) dispersal and subsequent
germination (sexual reproduction), which is called natural regeneration. Natural regeneration
of cork oak is mostly assured by wind and animals, as in most oak species [15-17].
Cork oak, along with holm oak (Quercus ilex L.,
1753), are the two main evergreen oak species in
the western part of the Mediterranean Basin [17].
These two species, particularly in the Iberian
Peninsula, are mostly present as semi-natural stands
known as “montados”, which are open woods with
a delicate and particular ecosystem, created and
maintained by man.
The montado semi-natural landscape is valued because it represents a viable land use still
preserving a rich biodiversity at all levels, from insects and flora to top predators such as the
Iberian Imperial Eagle (Aquila adalberti) or the Iberian Lynx (Lynx pardinus), the world’s most
endangered cat, and their mutual prey species, the rabbit (Oryctolagus cuniculus).
Figure 1.1: Quercus suber L. – Cork oak’s natural population in Serra da Estrela, Portugal
They also represent an important economic resource, but with the exception of central
Spain, holm oak forests can be regarded as rare cases of woodlands that have undergone very
little or no silvicultural management. Cork oak management is, however, at a different level,
since its high economic importance is associated not only with harvesting of acorns, but
also of cork. The thick and soft bark of cork oak is used to produce the familiar cork, which is
the main product responsible for the important economic role of this partly domesticated
species. Trees are first stripped of cork from the lower portion of the trunk at about 14 years
of age and subsequently every 9-12 years, and can live through this process for 100 to 500
years without any apparent effect on tree physiology. Acorns are eaten by birds and they are
highly valued as fattening fodder for domestic Iberian pigs. Since ancient times, cork oak has
been favoured, and sometimes widely spread, by preferably using acorns from trees
producing good quality cork [15,18-20]. Therefore, Q. suber is widely cultivated within its
natural range, but according to Carrión et al. [21], without human activities, cork oak would
never develop pure stands in the Iberian Peninsula, and would form mixed forests with other
sclerophyllous and deciduous oaks, and with Pinus pinaster.
Cork oak forests cover 2.2 million hectares worldwide, from which 340,000 tons/year of cork
are extracted (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). The largest stands,
covering about 700,000 ha, are located in Portugal and correspond to 21% of the forest area in
Portugal and to 30% of the world’s cork producing area. Currently, the cork industry represents
3% of all Portuguese exports. Cork stoppers for wine bottles are the most representative
product of this industry, responsible for 70% of the exports (see:
http://www.amorim.com/cor_glob_cortica.php). In spite of the economic importance of this
renewable material, there is still much to discover about both the biological and the genetic
mechanisms involved in its formation. Human intervention through extensive plantations and
systematic clear-cutting in forests with the objective of empirically selecting varieties with
higher quality levels of cork is supposed to have strongly contributed to the genetic
homogenization of Q. suber populations in the Iberian Peninsula [22].
Cork oak plantations are very important for the economy and play an important social and
environmental role that has to be taken into consideration as the unparalleled decline
occurring in the Iberian Peninsula and in Morocco is threatening the entire ecosystem [22].
Although the marginal and natural populations of cork oak are possibly the most endangered,
Iberian cork oak montados are also currently threatened and in decline due to multiple
factors. The main factor contributing to this decline is the occurrence of very severe drought
periods over several consecutive years [18]. The lack of natural regeneration (mainly due to
overgrazing and insolation, particularly in North Africa) is one of the most important factors
and so stand sustainability cannot rely exclusively on the decreasing resprouting ability of
aged and decaying adult trees [23,24]. In Portugal and Spain, another factor contributing to
this decline is the occurrence of ink disease, a root disease caused by the soil-borne pathogen
Phytophthora cinnamomi. Moreover, the increasing use of synthetic stoppers in wine bottles,
replacing the traditional cork, is an additional factor that, in conjunction with those stated
above, threatens this ecosystem in the medium term [18,22].
Holm oak montados are also endangered for some of these and a number of other reasons.
Thus, the admired sustainability of montados is jeopardized, and these formations may
become ‘fossil forests’ [23]. In this sense, the outlook is more favourable for Q. ilex which is
a more euryecious species than Q. suber (whose presence is limited by cold, drought and soil
type). In recent years, there has been increasing recognition of the important contribution
made by these species to the preservation of seminatural habitats and landscapes in Europe
[18,23,24]. Several studies on the regeneration of Mediterranean forest have been published,
and some of them are centred on Q. ilex and Q. suber [24,25]. However, these works have
focused mainly on ecological aspects of regeneration, silviculture and land use [23], without
addressing the genetic bases of montado regeneration or the populations’ diversity with
consequences on adaptation, which is now an immediate priority to allow informed decisions
for conservation of genetic resources.
1.1.2 Taxonomic classification and phylogenetic studies
Several proposals for Quercus taxonomy based on morphology have been presented [12,26];
however, these classifications have always been surrounded by controversy, mainly due to
widespread intraspecific morphological variation that may be produced by hybridization and
adaptation to ecological changes in the environment [27], especially abundant in oaks. As a
result, classifications have been anything but straightforward and, especially at the subgenus
level, remain uncertain. The taxonomic scheme proposed by Schwarz in 1964 [26] is possibly the
most accepted for the classification of cork oak, and appears to be the most suitable in
describing the systematics of European oaks [19,28,29]. According to the Flora Europaea
[17], and that same taxonomic scheme, the genus Quercus is divided in four subgenera (or
subsections), as follows:
Order     Fagales
Family    Fagaceae
Genus     Quercus
Subgenus  Cerris
          Quercus
          Sclerophyllodrys
          Erythrobalanus
Quercus suber belongs to the family Fagaceae, genus Quercus and subgenus (or subsection)
Cerris (Spach) Oersted.
Quercus comprises 500-600 species, of which 350–500 species are distributed throughout the
Northern Hemisphere [12,13,30]. They are conspicuous members of the temperate deciduous
forests of North America, Europe, Asia, as well as the evergreen Mediterranean maquis. A
smaller number of oak species (30–35) are evergreen and grow mainly in south-western Asia,
western North America and around the Mediterranean Basin. In the Mediterranean area, only
four evergreen oak species have been identified. These include Quercus alnifolia Poech.
(golden oak) endemic to Cyprus and Quercus suber L. (cork oak) distributed exclusively in
the western part of the Mediterranean Basin. The third species is the holly oak which is a
complex including Quercus coccifera L. and Quercus calliprinos Webb. Allozyme studies
suggest that holly oak should perhaps be considered as a single species (Q. coccifera L.) with
subspecies coccifera and calliprinos [19]. The fourth Mediterranean oak species, Quercus ilex
L. (holm oak), shows two morphological types, rotundifolia and ilex type [18,19,26], which
are sometimes regarded as distinct species.
According to Schwarz [26], Q. ilex and Q. coccifera (including subsp. calliprinos) belong to
subgenus Sclerophyllodrys (O. Schwarz) whereas Q. suber and Q. alnifolia belong to
subgenus Cerris (Spach). This classification was also supported by RFLP analysis of the
nuclear ribosomal DNA (rDNA) 18S and 25S and spacer regions [28] and chloroplastidial
DNA [30] and by nuclear DNA (nuDNA) Internal Transcribed Spacer (ITS) sequences
[27,30]; however, Q. alnifolia was not included in these studies. Moreover, from the study of
Manos et al. [30], evidence was obtained that the two groups of Mediterranean oaks (subg.
Sclerophyllodrys and subg. Cerris sensu Schwarz; “Ilex group” and “Cerris group” sensu
Nixon) are monophyletic, as reported previously by Nixon [29]. More recently, they have been
found to constitute a larger group (the Eurasian Cerris group), which includes all the European
and Asiatic evergreen oak species analysed [27,30]. Within subg. Cerris, several
systematic studies indicate that Quercus cerris and Quercus crenata are the species most closely
related to Q. suber [27,29-31].
1.1.2.1 Barcoding in oak phylogenetics
Tree species share several attributes, such as longevity, complex reproductive strategies, great
potential for local adaptation, and slow mutation and speciation rates [2], that make
barcoding of forest trees a captivating issue from both speculative and practical points of
view. “DNA Barcoding” is a molecular approach to identify the species to which any living
organism belongs by the use of a standardised region of the genome (or several loci
used together as a complementary unit). Ideally, the barcode system would be a universal
and valuable resource that would allow fast and unequivocal species identification and taxon
characterization at any life stage of the specimen and from minimal tissue samples
(http://www.barcoding.si.edu) [29,32,33]. Besides taxonomy, a widespread application of
barcoding would be a powerful research complement for molecular ecology, phylogenetics,
and population genetics [34].
The success of a DNA sequence as a species identification tool - the barcode - depends on the
existence of unique substitutions that distinguish closely related
species, and on ease of application across a broad range of taxa. A portion of the mitochondrial
cytochrome c oxidase I (COI or cox1) gene sequence is currently being used as a universal
barcode in certain groups of animals, fungi, diatoms, and red algae. However, COI has
proved to be unsuitable in land plants, mainly because of the low nucleotide substitution rates
of the plant mitochondrial genome [7,35,36]. The nuclear and plastid plant genomes therefore
offer the best expectation of yielding a suitable sequence (or pool of sequences) for DNA
barcoding, i.e., a sequence(s) that will be variable enough to differentiate species, but at the
same time still stable enough at a lower taxonomic level as to have low infraspecific
variability [33,35]. The difficulty in finding a single locus for barcoding in plants prompted a
multilocus approach, focusing on the chloroplast genome, as the most promising strategy for
barcoding plant species. Therefore, a pool of loci has recently been considered, with the
greatest interest turned to seven candidates: rpoB, rpoC1 and rbcL as three easy-to-align
coding regions, a section of matK as a rapidly evolving coding region, and trnH-psbA,
atpF-atpH, and psbK-psbI as three rapidly evolving intergenic spacers [36,37]. Based on the
relative ease of amplification, sequencing, multialignment, and on the amount of variation
displayed, many research groups have proposed different combinations of these loci
[32,36-39]. However, in 2009, the CBOL (The Consortium for the Barcode of Life) Plant Working
Group identified the combination of rbcL and matK as the most convenient in terms of
universality, sequence quality and discrimination power. Nevertheless, it is still argued that
regardless of the regions adopted for barcoding, some species will always be better resolved
with the use of other regions [29,36,40]. The oaks are one such example, and represent an
obstacle to the idea of barcoding in plants.
A recent attempt at barcoding the Italian wild dendroflora, with the use of four plastid
regions (trnH-psbA, rbcL, rpoC1, matK), revealed that the genus Quercus is refractory to
barcoding (0% discrimination success) [29], a probable consequence of factors like low
variation rate at the plastid genome level and hybridization. It appears that the
main obstacle to barcoding success in difficult genera, such as Quercus, cannot simply be
overcome by adding further plastid DNA data. Nuclear DNA may offer some advantages
due to higher mutation rates and modes of inheritance. Discrimination of the same set of oak
species was already obtained by means of sequence variation in the internal transcribed spacer
region of ribosomal DNA (ITS) [27], which even supports the recognition of the subgenera
Sclerophyllodrys, Cerris, and Quercus, as proposed by Schwarz [26]. The rapidly evolving
ITS may thus represent a useful supplementary barcode in difficult genera, although extant
problems have yet to be overcome, namely paralogy and other factors
associated with the complex concerted evolution of this highly repeated part of the nuclear
genome, which still require further refinement of current protocols [7,35].
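As a hedged illustration of what a discrimination success of 0% means, the sketch below scores a set of species under one simple, assumed criterion: a species counts as discriminated only if none of its barcode sequences is shared with another species. The function name, the criterion and the toy sequences are assumptions made for illustration; they are not taken from the study cited above, which may have relied on distance-based or tree-based criteria.

```python
from collections import defaultdict


def discrimination_success(barcodes):
    """Return the fraction of species whose barcode sequences are all private.

    `barcodes` maps a species name to a list of barcode sequences (strings).
    A species is counted as discriminated only if every one of its sequences
    occurs in no other species (an assumed, deliberately strict criterion).
    """
    if not barcodes:
        return 0.0

    # Record which species carry each distinct sequence.
    owners = defaultdict(set)
    for species, sequences in barcodes.items():
        for seq in sequences:
            owners[seq].add(species)

    resolved = [
        species
        for species, sequences in barcodes.items()
        if sequences and all(owners[seq] == {species} for seq in sequences)
    ]
    return len(resolved) / len(barcodes)


# Hypothetical toy data: three oaks sharing one plastid sequence.
shared_plastid = {
    "Q. suber": ["ACGTTA", "ACGTTA"],
    "Q. ilex": ["ACGTTA"],
    "Q. cerris": ["ACGTTA", "ACGTCA"],
}
# Hypothetical toy data: two species with private sequences.
private_plastid = {"species A": ["AAGT"], "species B": ["AAGC"]}

print(discrimination_success(shared_plastid))   # 0.0
print(discrimination_success(private_plastid))  # 1.0
```

In the first toy set a single shared sequence is enough to leave every species unresolved, which is the kind of outcome that shared plastid haplotypes among hybridizing oaks would be expected to produce.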
1.1.3 Geographical distribution
The Mediterranean evergreen Quercus species are a group with overlapping habitats. In the
Western Mediterranean Basin, holm oak, cork oak and holly-oak are the dominant
broadleaved species. These three species are sympatric in many areas, but some differences
in their ecological requirements produce distinct responses to environmental conditions and
hence different evolutionary histories as interestingly confirmed by several studies showing
differences in their genetic variation patterns at both nuclear and cytoplasmic levels
[18,19,30,41,42].
Figure 1.2: Geographical distribution of cork oak, Quercus suber, represented in dark grey. Based
on Magri et al. [16]
Q. suber has quite a narrow geographical range when compared to the other main evergreen
Mediterranean oak species, mainly due to its ecological restrictions. The modern distribution
of cork oak, rather discontinuous, ranges from the Atlantic coasts of North Africa and Iberian
Peninsula to the southeastern regions of Italy, and includes the main western Mediterranean
islands of Sicily and Sardinia as well as the coastal belts of Algeria and Tunisia, Provence
(France) and Catalonia (Spain) [16,43] (Fig. 1.2).
As opposed to holm oak which shows a great ecological amplitude, cork oak is restricted to
hot (>4ºC – 5ºC mean temperature for the coldest month) variants of the humid and subhumid Mediterranean areas with at least 450 mm mean annual rainfall [18,20].
In Europe, low winter temperatures theoretically appear to set the geographic
distribution limits, and most cork oak stands are located in areas below 800 meters in altitude,
since cork oak leaves are less tolerant to frost and to drought than those of the more
widespread holm oak. In addition, whereas holm oak is indifferent to soil type, cork oak
usually grows in acidic soils on granite, schist, or sandy substrates and it avoids limestone
and other carbonated substrates. Cork oak distribution is therefore more shifted to the west
and more patchy than that of holm oak (sensu lato), which constitutes a continuum from
Turkey to Portugal, including all the larger Mediterranean islands [17,18,20,43]. In spite of
this, within its geographical range, cork oak shows high levels of morphological and
phenological variability, although most of this diversity is considered to be the result of past
introgressive hybridization with other sympatric species [15,44,45]. Nowadays, in their
common distribution area, cork and holm oaks often grow together and the local occurrence
of morphologically intermediate trees has been reported [18,27].
1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization
Several hypotheses have been advanced concerning the evolutionary history of cork oak as
well as the geographical location of its centre of origin; however the details of its
differentiation processes are still largely unknown.
It was originally suggested that Q. suber may have originated in the Iberian Peninsula where
the species has its current main range (Fig. 1.2). This hypothesis was based on geobotanical
studies and on allozyme variation from the whole cork oak range, which revealed a
substantially higher genetic diversity in the Iberian populations as compared with those from
North Africa, Italy and France [18,27]. Paleoecological data indicate that both cork and holm
oak species have been present in southern Europe since the end of the Tertiary period. Also, two
fossil records of cork oak of Miocene age were found in Portugal and two belonging to the
Pliocene were recovered in Tunisia and Galicia (Spain). Therefore, an early
Cenozoic origin for Q. suber in Iberia seems plausible, followed, at the end of the Miocene, by
the colonization of North Africa across the Strait of Gibraltar [16,18,43].
Alternatively, according to fossil records of other oak species of subgenus Cerris, dating to
the Tertiary and found in the Balkan Peninsula, it has also been considered that Q. suber
might have appeared first in more eastern countries (either in the Balkan Peninsula or,
alternatively, in the Middle Eastern-Peri-Caucasian area), in common with the whole Cerris
group. It has been suggested that the species expanded westward during the late Miocene and
was widespread throughout the Mediterranean Basin during the Pliocene, where it survived
thanks to the lack of climatic constraints, but went extinct in the eastern part of its
distribution area [27,43]. Data from PCR–RFLPs over cpDNA fragments seem to constitute
additional evidence to support an eastern origin for cork oak [43].
Glacial and periglacial environments have had a significant effect on the modern vegetation
of Europe. It is widely accepted that the climatic oscillations that occurred during the
Quaternary (i.e., over the past 1.8 million years) are one of the most crucial determinants of
the current distribution of biota in temperate latitudes. The spatial patterns of several tree
species throughout the European continent are the long-term result of late glacial and postglacial migration from refugial populations that were able to withstand the severe climatic
conditions of Pleistocene stadials [46-49]. With few exceptions [8,50,51], during the coldest
periods of the last full glacial epoch (37,000 – 16,000 years BP – before present) the locations
postulated for glacial refugia of most European woody angiosperms have been south of the
parallel 40º N, which runs from central Portugal to Sardinia, Calabria and northern Greece.
This is considered to be the boundary between polar aridity and warmer climates during part
of the Quaternary. The theory that southern Europe (particularly the three southern peninsulas
- Balkan, Italian and Iberian) and the Near East provided appropriate conditions for refugia of
temperate tree taxa is based on a number of assumptions relating to the full-glacial
environments of those regions and their ability to supply the necessary conditions for growth
[52,53].
The original refugial model implied that ‘forests’ could have survived in these southern
locations during the cold stages of the Quaternary. However, extensive populations of trees
have never been detected. Instead, the traditional palaeogeographical models (although
inferred from scarce palynological evidence) suggest a small number of refugia - the “few
southern refugia” hypothesis [52,53]. Temperate tree taxa possibly survived in small pockets
of microenvironmentally favourable locations where usually only a few tree taxa are detected
and in low concentrations. Aridity was probably a significant limiting factor for tree growth.
Assuming the hypothesis of “few southern refugia”, common patterns of post-glacial
colonization for temperate European tree species are defined and expected, with high
diversity levels in southern Europe and decreasing northwards [54-57]. Some of the main
European tree species have been analysed using molecular markers, including for example
several Quercus species and Abies alba (European silver fir), and the resulting patterns of
diversity correspond to the expected ones [58,59].
However, more complete palaeobotanical data sets [50,53], palaeoclimatic modeling [60] and
genetic research [52] are starting to question the paradigm of “few southern refugia” in
southern Europe (and in particular in the three southern peninsulas) during full-glaciations.
Increasing evidence indicates that during the last full-glacial period populations of coniferous
and some deciduous trees grew much further north and east than previously assumed [53]. In
addition, new palaeoclimatic simulations suggest that full-glacial conditions in central and
eastern Europe were not nearly as severe as previously anticipated [60,61]. While some
refugia for Mediterranean trees were previously identified in the Iberian Peninsula, the results
of López de Heredia et al. (2007), based on cpDNA PCR-RFLPs and a review of paleobotanical
data, support the presence of multiple refugia for the evergreen oaks within the Iberian
Peninsula (e.g. Cantabrian mountain ranges, south-eastern Spain or even central Spain) during,
at least, the last glacial period [52]. Under the “multiple refugia” hypothesis, tree species that
nowadays are present in the north and central Europe would have recolonized these areas
from populations located in the north of the Iberian Peninsula. Moreover, these populations
would have been barriers preventing expansion from southern refugia. If that was the case,
cpDNA data should show complex patterns of spatial distribution that would have resulted
from the generation of multiple secondary contact zones [8].
For the last glacial and postglacial periods, results from palynological data indicate the
occurrence of cork oak in south-western Iberia since the Late Glacial period (17,000-12,000
years BP) and in North Africa since the early Postglacial (approximately 8,500 years BP)
[18,27]. It is accepted that during the Quaternary glaciations, cork oak may have survived in
scattered refugia which possessed favourable microclimate conditions, and from which
postglacial colonization occurred over recent millennia. Palynological [21] and molecular
data [43,52] indicate a glacial refugium in south-western Iberia, from which the species
expanded northwards in the absence of mountain barriers, favoured by the existence of siliceous
substrates. The extensive introgression of Q. suber with Q. ilex may also indicate
several potential refugia in eastern Iberia [52]. RFLP analysis of the whole cpDNA shows a
phylogeographical pattern of three groups corresponding to potential glacial refuges in Italy,
North Africa and Iberian Peninsula [43], from which, after the last glaciations, Q. suber may
have begun migrating northward to the southern part of France. However, no fossil record
supports the molecular data for the Italian and North African refuges.
Reliable scientific evidence is lacking to confirm the presence of Q. suber in more northern
and eastern European countries. The Tertiary and Quaternary remains (megafossils and
pollen) found in several European countries did not allow taxonomic identification at the
species level and could be attributable to any Mediterranean oak species of the Cerris group
[43,62]. In fact, Q. suber is more thermophilous and has stricter soil requirements than many
other Quercus species, thus a bigger reduction of this species’ range during glacial times is
expected to have happened. However, the uncertainty of palynological discrimination and the
lower cpDNA variation itself could bias the identification of glacial refugia for Q. suber [52].
1.1.5 Genetic diversity studies
As cork oak is predominantly allogamous, i.e. favouring cross-fertilization, with a life span
of up to 500 years or more and having a low replacement rate, it can be expected that at least
in some places, and mostly for selectively neutral characters, selection over time may have
resulted in reduced genetic differentiation both among trees of the same population and
between populations. However, a differentiation among populations has been detected around
the Mediterranean Basin by investigating both chloroplast and mitochondrial DNA (which
are maternally inherited in oaks, as demonstrated by Dumolin et al. [63]), as well as allozyme
variation [16,18,19,67]. Isozyme variation in the genus Quercus also shows that genetic
variability is high and similar to that found in conifers [15,65]. One of the main causes for the
high polymorphism found in cork-oak, as well as in holm-oak, may be attributable to the
physiological plasticity of the species, which allows them to adapt to variable and
unpredictable climatic conditions, characteristic of the Mediterranean climate. High levels of
diversity within populations are observed; conversely, low inter-population variability
indicates that most of the total genetic diversity in the species is found within rather than
among populations [15,18,19,23]. According to Elena-Rosselló & Cabrera [15], more than
83% of the total diversity in this species is found within populations, and the decline of
kinship estimates with distance suggests that isolation by distance has led to this structure.
The results obtained for Q. suber contrast with those found for most temperate forest species,
for which a generally weak and narrower within-population structure is the trend [23]. In cork
oak, gene flow between populations was estimated as more than one migrant per generation
(F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) and is theoretically enough to prevent
genetic drift from causing local genetic differentiation and therefore population divergence,
under Wright’s island model [66].
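The one-migrant-per-generation argument can be made concrete with the standard equilibrium approximation of Wright's island model for neutral nuclear loci in a diploid species, F_ST ≈ 1 / (4Nm + 1), where Nm is the effective number of migrants per generation. The sketch below simply evaluates this approximation in both directions; the function names are illustrative, and the relation rests on the model's idealised assumptions (many equal-sized populations, symmetric migration, equilibrium), which real cork oak stands need not satisfy.

```python
def fst_from_nm(nm):
    """Expected F_ST under the island-model approximation F_ST = 1 / (4Nm + 1)."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst):
    """Invert the same approximation: Nm = (1 / F_ST - 1) / 4."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


# One migrant per generation corresponds to an expected F_ST of 0.2.
print(fst_from_nm(1.0))  # 0.2
```

Under this approximation, any estimate of more than one migrant per generation implies an expected F_ST below 0.2, which is the sense in which such gene flow is said to counteract divergence by drift.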
PCR-RFLPs over specific cpDNA fragments illustrate a complex pattern of variation in the
evergreen oaks [19,41]. Jiménez et al. [41] detected three very distinct lineages of cpDNA
haplotypes, two of them being present in cork oak. One of the lineages, the “suber” lineage,
is specific to cork oak populations and may be considered as the original and most widely
distributed lineage in this species. The partial geographical distribution of this lineage was
reported by López de Heredia et al. [64], from peninsular Italy, Sardinia, Sicily, Corsica,
northern Africa and the island of Minorca. Cork oak populations from the Spanish mainland
and from the island of Majorca were characterized by another maternal lineage also shared
with Q. ilex and Q. coccifera, the “ilex-coccifera I” lineage [41,64]. This fact was interpreted
as the result of multiple and mainly unidirectional cytoplasmic introgression of Q. suber by
Q. ilex.
RFLP analysis over the whole chloroplastidial DNA was used by Lumaret et al. [43] for the
first time in Q. suber to analyse the phylogeographical variation over the whole species range
(Fig. 1.3). The chlorotypes showed a clear phylogeographical pattern of three groups
corresponding to potential glacial refuges in Italy, North Africa and Iberian Peninsula. The
most ancestral and recent groups were observed in populations located in the eastern and
western parts of the species range, respectively. Unrelated chlorotypes of an “ilex” cpDNA
lineage were also identified in specific western populations [43]. From the cpDNA variants
of ‘ilex’ lineage recovered through interspecific introgression, additional successive cpDNA
changes may have occurred in Q. suber, and so two distinct cpDNA lineages in cork oak
were predicted. A particular chlorotype S1, observed predominantly in continental Italy and
in Sicily, was identified by Lumaret et al. [43] in a few populations from Sardinia, and from
Corsica which also shared a rare chlorotype S7 with Tunisia. This situation possibly reflects
the occurrence of rare natural events of long-distance dispersal from several geographical
sources located in the closest areas to those islands. Moreover, the possibility of an
intentional acorn transport by people for economic purposes cannot be ruled out and its
impact on the geographical patterns of cork oak genetic variation should not be
underestimated [43]. López de Heredia et al. [64] also proposed the possibility of long-distance
dispersal events to explain the sharing of a rare chlorotype by cork oak populations
located in Minorca and in Sardinia.
Figure 1.3: Geographical distribution of the eight and six chlorotypes of the ‘suber’ and ‘ilex’ lineages,
respectively, identified in Q. suber populations by Lumaret et al. [43]. Chlorotypes were scored by RFLP
variation over the whole cpDNA molecule. The identity of sampled populations and cpDNA chlorotypes assayed
through RFLP as well as affiliation to the ‘suber’ or ‘ilex’ cpDNA lineages are indicated in the Figure. Source:
Lumaret et al. [43].
Using cpDNA microsatellites, Magri et al. [16] analysed cork oak populations throughout the
species distribution range and found a strong geographical structure characterized by five
distinct haplotypes (Fig. 1.4). It was assumed that H3 (north Africa-Sardinia-Corsica-Provence)
and H4 (Portugal-western Spain-southwest France-northern Morocco) were the
ancestral Q. suber haplotypes, with H1, H2 (Italy) and H5 (scattered populations) originating
through ancient or recent introgression with Q. cerris (H1 and H2) and Q. ilex (H5). Also, the
cpDNA SSR data combined with paleobotanical and geodynamic models suggested that
cork oak populations experienced genetic drift geographically consistent with
the Oligocene and Miocene break-up events of the European–Iberian continental margins and
persisted in some of the separate microplates that are currently found in Tunisia, Sardinia,
Corsica, and Provence [16] (Fig. 1.5). All these events seemed to have occurred without
detectable cpDNA modifications for a time span of over 15 million years.
Figure 1.4: Distribution of cpDNA haplotypes found by Magri et al. [16] with cpDNA SSRs and
phylogenetic reconstruction of the relationships between haplotypes. The black circle in the network
indicates a hypothesized mutation, which is required to connect existing haplotypes within the network
with maximum parsimony. The grey area corresponds to the current distribution of Q. suber. Source:
Magri et al. [16].
The modern history of Quercus suber is closely related to human activity over the use of its
cork. For this reason, humans have been considered responsible for a reduction in genetic
variation in some stands of cork oak, as well as for hybridization with congeners [67]. Other
cultivated tree species in the Mediterranean area display a similar low geographical structure
in genetic variation, arguing for a multidirectional diffusion because of human activity. For
example, in Castanea sativa, the low geographical structure of the chloroplast genetic
diversity may be explained by the effect of a strong human impact [67,68]. However, the
geographical distribution of the cork oak haplotypes found by Magri et al. [16] does not
appear to be related to cultivation. In fact, fossil pollen and wood records suggest that cork
oak was distributed in approximately the same areas as today even before the Neolithic.
Another possible hypothesis to explain these results is postglacial population expansion from
the potential glacial refuges in Italy, North Africa and Iberian Peninsula [43].
Figure 1.5: Reconstructions of the Western Mediterranean palaeogeography and possible
location of Quercus suber haplotypes found by Magri et al. [16] (colours as in Fig. 1.4).
Continental microterranes rifted off the European-Iberian continental margin: Rif (R), Betic
range (B), Balearics (Ba), Kabylies (Ka), Corsica (Co), Sardinia (Sa), Calabria (Ca). Source:
Magri et al. [16].
Some studies have also assessed the genetic variability of cork oak populations in Portugal.
Coelho et al. [22] used AFLP markers and reported low levels of differentiation among cork oak
populations. The reasons pointed out are owed to the outcrossing characteristic of the
species, long distance anemophilous pollination and eventual secondary acorn dispersal by
animals, leading to extensive gene flow and an increased homogeneity of allele frequencies
between populations [22,45]. The values of population differentiation reported by Coelho et
al. [22] (FST = 0.0172) are below the average of 0.07–0.09 expected for long-lived,
wind-pollinated woody species. These results are similar to those found by Simões de Matos
(F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) with nuclear SSRs (FST = 0.02),
confirming the absence of population structure. This pattern of genetic differentiation
within Portuguese cork oak stands, some located over a distance of 700 km, may be explained
by anthropogenic pressure in addition to a constant gene flow. This study shows that 90% of
the polymorphic markers identified in cork oak genotypes are uniformly distributed through
the populations of Algarve, Alentejo and Trás-os-Montes regions.
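For orientation, the FST values quoted above can be read against Wright's definition, FST = (HT - HS) / HT, where HS is the mean expected heterozygosity within populations and HT the expected heterozygosity of the pooled population. The following is a minimal sketch for a single biallelic locus, assuming equally sized populations. The allele frequencies are invented for illustration and are not data from the cited studies, and the function names are hypothetical.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs):
    """Wright's F_ST = (H_T - H_S) / H_T, assuming equally sized populations."""
    h_s = sum(expected_het(p) for p in freqs) / len(freqs)  # mean within-population value
    p_bar = sum(freqs) / len(freqs)                         # pooled allele frequency
    h_t = expected_het(p_bar)                               # total expected heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele frequencies in three populations (illustration only).
example_freqs = [0.45, 0.50, 0.55]
print(round(fst_biallelic(example_freqs), 4))
```

Nearly identical allele frequencies, as in this toy example, give an FST close to zero, which is the situation described for Portuguese cork oak stands.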
1.1.6 Hybridization and cytoplasmatic introgression
Capture of unexpected chloroplast haplotypes by hybridization and introgression has been
proposed as the most likely explanation for the sharing of cytoplasmic genes both in
deciduous and evergreen oaks [69] as well as in others [41,70]. Q. suber has been reported to
hybridize with several species of the evergreen oak group, particularly with holm oak [17,45],
and this is regarded as one factor contributing to the increase of genetic diversity in cork
oak [22]. Q. suber and Q. ilex have overlapping geographical distributions [17], and
hybridization occurs in nature, although it is not a frequent event [45]. Nevertheless, these
species are not very closely related, as shown by both cytoplasmic and nuclear genetic
analyses [19,27,69], and they belong to subgenera Cerris and Sclerophyllodrys, respectively
[26], although the more recent classification includes both species within the same Eurasian
Cerris group [30].
The two most easily recognizable oak hybrids are Quercus x crenata (Q. cerris x Q. suber)
and Quercus x morisii (Q. suber x Q. ilex) [27]. It must be noted that these relatively rare
hybrids (0.3%) are found only when both parental species co-occur. Mature hybrid
individuals are easily recognized due to intermediate morphological traits between the two
parental oaks [27], but seedlings and even juvenile trees show very similar morphological
traits so that, in mixed stands, species identification is usually very difficult or even
impossible until the adult stage [71]. Asymmetric hybridization was confirmed by Boavida et
al. [45], who described post-pollination barriers in Q. suber to interspecific crosses with
Q. ilex, Q. coccifera, Q. faginea and Q. robur. The cross between Q. ilex and Q. suber shows
unidirectional compatibility: success rates are higher when Q. suber acts as pollen donor
rather than as female parent, owing to differential pollen tube growth in the two species
[45]. Also, since both species are protandrous and Q. ilex flowers earlier, early cork oak
male flowers can pollinate late holm oak female flowers, whereas the reverse does not usually
occur.
By analysing polymorphism at allozyme loci and DNA markers for which alleles are distinct
in the two species growing in separate areas (diagnostic markers), evidence was obtained for
the occurrence of hybrids and genetic introgression (backcrosses between hybrids and
parental species) between sympatric holm oak (female) and cork oak (male) in several
locations [18,43,71]. Further evidence was advanced that, in initial hybridization and in
backcrosses, Q. ilex is predominantly, but not exclusively, the maternal species. This
interpretation is supported by the discovery of “ilex-coccifera I” haplotypes (chlorotype
shared by Q. ilex and Q. coccifera) in Q. suber individuals, and by the absence of the
opposite situation, that is, no Q. suber haplotypes within the Q. ilex pool [41].
Hybridization and introgression can result in the total replacement of the Q. suber
chlorotype by the “ilex-coccifera I” lineage [41,64]. This situation is common in eastern
Spain, where siliceous soils are scarce and the effective population size is lower than in
the continuous forests of western Iberia. It has been suggested, on the basis of the
differences between Q. ilex and Q. suber chlorotypes found in sympatric populations, that
hybridization and introgression in these populations may be ancient [43]. Therefore, as
reported by López de Heredia et al. [52], it cannot be ruled out that in the eastern range of
the species some populations withstood the glacial conditions by hybridizing with Q. ilex.
For instance, a particular chlorotype found by these authors (named “c66”) is predominant in
all Q. suber populations from Catalonia (north-eastern Spain), while being very rare in Q.
ilex.
The absence of cork oak populations possessing “ilex” chlorotypes in the eastern
Mediterranean range of the species was reported both by López-de-Heredia et al. [64] and by
Lumaret et al. [43]. However, in a Corsican population, one of the 50 cork oak individuals
scored for cpDNA RFLPs was shown to possess an “ilex” chlorotype [43], suggesting that
cytoplasmic introgression of Q. suber by Q. ilex does occur in the eastern range, although
apparently much less commonly. A substantial number of trees showing intermediate morphology
between both species have been observed in south-eastern continental Italy [27] and in
Sardinia and Provence [42]; these trees predominantly possess an “ilex” chlorotype, and for
many of them a hybrid origin was confirmed on the basis of nuclear interspecific diagnostic
markers. So, interspecific hybridization is likely to have happened quite frequently in the
eastern part of the range of Q. suber as well [43].
1.2 Molecular markers in phylogeography
The use of molecular markers has revolutionized research fields such as conservation
biology, population biology, and ecology. Markers provide a means of observing otherwise
hidden aspects of natural history, whether this involves population-level interactions on
ecological timescales, or the evolutionary relationships of genes, populations, and taxa [10].
As stated before, phylogeographic studies are scarcer in plants than in animals. One of the
major problems is finding genetic variation suited to this type of analysis, and it has been
quite difficult to find genetic markers with a resolving power comparable to that of animal
mitochondrial DNA [7,10]. To address this problem, and to justify the choice of the molecular
markers used in this study, the literature on plant genomes and the available molecular
markers is reviewed below.
Plants are characterized by three types of genomes within the cell: the nuclear genome and
two cytoplasmic genomes, mitochondrial and chloroplastidial DNA. The latter two are of
endosymbiotic origin and have lost various genes to the nucleus over time (and, sometimes,
vice versa). These organelle genomes, because of their supposed shared prokaryotic origin,
are similar to animal mtDNA in overall structure (closed-circle chromosomes), replication
mode (with large populations of molecules per cell), and non-Mendelian inheritance.
However, they also differ from animal mtDNA, and from one another, in some important
molecular and evolutionary aspects [10,72].
1.2.1 Mitochondrial DNA (mtDNA)
Although phylogeographic studies in animals rely heavily on the mitochondrial genome, in
plants several characteristics make it poorly suited for these studies [7,11].
Plant mtDNA is highly variable in size across species, ranging from about 20 kilobases (kb)
to 2500 kb. Inheritance is often maternal, but not always. Surprisingly, plant mtDNA evolves
rapidly with respect to gene order, with frequent rearrangements, but rather slowly with
respect to primary nucleotide sequence. The resulting low rates of sequence evolution (about
100 times slower than in animals) mean that specific loci do not contain adequate variation
for generating an intraspecific phylogeographic signal. So, in these regards, the evolutionary
dynamics of plant mtDNA and animal mtDNA differ greatly, and one must look to alternative
genomes for informative variation [7,10].
1.2.2 Chloroplastidial DNA (cpDNA)
Although similar to mtDNA, the plant cpDNA plays by different evolutionary rules. It varies
moderately in size among species (from about 120 to 217 kb), with much of the size variation
attributable to the extent of sequence repetition in a large inverted repeat region. The
molecule contains about 120 genes that code for ribosomal and transfer RNAs, and several
polypeptides involved in protein synthesis and photosynthesis. The chloroplastidial genome
is transmitted maternally in most species, biparentally in some, and paternally in others
(notably, most gymnosperms), and tends to evolve somewhat slowly with regard to gene
rearrangements and also in terms of primary nucleotide sequences (about 3 to 4 times faster
than plant mtDNA, but still much slower than animal mtDNA). For this latter reason, cpDNA
sequences have proven especially useful for estimating phylogenetic relationships in plants
[7,10].
Intraspecific variation has been reported in a growing number of species, so that almost all
published plant phylogeographic studies have relied on the chloroplast genome as their only
source of genetic variation [7,10]. Most of this variation has been revealed by restriction
enzyme digestion of cpDNA (RFLP technique), in which genetic variants reflect the gain or
loss of restriction sites or length variation [73]. A more recent restriction enzyme-based
approach involves the digestion of PCR-amplified chloroplast loci to reveal restriction
fragment length polymorphisms within the amplified fragment (PCR-RFLP) [52,59]. Using these
readily accessible laboratory techniques, large portions of the chloroplast genome may be
evaluated in numerous individuals. Furthermore, it is believed that at least 50% of all cpDNA
variation may be attributable to small insertion/deletion mutations. However, concerns about
the homology of length variants associated with simple-sequence repeat (SSR) polymorphisms
need to be addressed before this technique can be widely applied to construct useful gene
trees [7]. Ultimately, direct knowledge of the sequences underlying cpDNA variation would be
most desirable for gene tree construction. Unlike restriction enzyme analyses, direct
sequencing of cpDNA loci has so far not retrieved comparably useful levels of variation for
phylogeographic
analysis. In the search for cpDNA loci with useful levels of sequence variation, it is
necessary to consider that the mutation rate of cpDNA varies among regions of the genome and
that non-coding regions are more prone to mutation. Therefore, several small regions of the
chloroplast genome (such as some intergenic spacers) show potential for phylogeographic
analysis [59,74,75].
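The restriction-based approaches described above detect variants through the gain or loss of restriction sites. A minimal in-silico sketch of that idea follows. The two short amplicon sequences are invented, the enzyme site (EcoRI, G^AATTC) is chosen only as a familiar example and is not one reported for this study, and the function name `digest` is hypothetical.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position inside the recognition site where the enzyme
    cuts (for example 1 for G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Two hypothetical amplicons differing by one substitution that removes the site.
haplotype_a = "ATGCGTACGAATTCTTAGGCATCG"
haplotype_b = "ATGCGTACGAGTTCTTAGGCATCG"

print(digest(haplotype_a, "GAATTC", 1))  # site present: two fragments
print(digest(haplotype_b, "GAATTC", 1))  # site lost: one uncut fragment
```

A single substitution inside the recognition site changes the fragment pattern, which is what is scored on a gel in RFLP and PCR-RFLP analyses.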
Several attempts indicate that single cpDNA loci are only occasionally useful at the
intraspecific level. However, as technology progresses and the sequencing of larger DNA
fragments becomes easily achievable, diligent sequencing efforts are likely to uncover
sufficient genetic variation, so that studies will exploit more of the variation potentially
available in the chloroplast and ultimately obtain finer phylogeographic resolution [7,10].
Indeed, a few studies have already shown intergenic spacers to be useful regions for direct
sequencing, for example trnT-L-F in Ficus carica [76][74] and trnH-psbA in Eucalyptus
perriniana [77][78], as well as the psbC-trnS intergenic spacer region [79] in several
Quercus species [75].
1.2.3 Nuclear DNA (nuDNA)
The remaining alternative is the nuclear genome that is still largely unexplored but offers a
potentially inexhaustible source of informative genetic variation, and lately many
investigators are developing techniques and strategies for locating and efficiently sampling
appropriate variation in nuclear DNA.
The internal transcribed spacer (ITS) region of nuclear ribosomal DNA, useful for plant
systematics, is however generally not very helpful for
phylogeographic studies. First, for most species examined, intraspecific variation has not
always been detected in this region. Furthermore, as part of a multicopy gene family, the ITS
region is subjected to poorly understood processes of concerted evolution, which may lead to
problems with the interpretation of sequence polymorphism at the intraspecific level. Also,
when a locus is part of a multicopy gene or multigene family, PCR amplification with
conserved primers may produce multiple fragments, including duplicated gene copies,
pseudogenes, and even recombinant PCR artifacts. Care is thus necessary to avoid comparing
paralogous loci, which may be especially difficult to detect in cases where there has been
differential homogenization of gene copies among populations [7,80].
In principle, single-copy nuclear (scn) genes should also provide sufficient sequence data for
phylogeographic assessments at the intraspecific level, but three technical and biological
obstacles need to be considered: the considerably slow rate of sequence evolution at many
nuclear loci; the difficulty, in diploid organisms, of isolating alleles one at a time; and
intragenic recombination. Nonetheless, scnDNA has been employed successfully in some
phylogeographic assessments, with some of the most informative results coming from intron
sequences at protein-coding genes [10,81]. However, so far, no single locus appears to be
universally useful in all species of plants.
Additional features of the nuclear genome also need to be taken into account in
phylogeographic analysis: complications involving interallelic recombination and
heterozygosity; recombinant alleles arising from crossing-over among alleles of a locus,
which result in chimeric haplotypes; and the need to verify the homology of the loci in use.
Some (and probably many) “single-copy” nuclear genes exist as part of small gene families
consisting of two to ten expressed loci and possibly additional pseudogenes [7,10,80,81].
Despite all of these potential problems the nuclear genome is still, perhaps, the most dynamic
and useful marker for studying plant phylogeography because it is much larger than the
others and includes most of the information behind the shaping and adaptation of the
individual to the environment.
1.2.4 Simple Sequence Repeats (SSRs)
Microsatellites (or simple sequence repeats - SSRs) are short repetitive sequences of
nucleotides of typically 1-5 base pairs (bp) motifs, that are repeated in tandem up to a usual
maximum of 60 or so, and are widespread in both eukaryotic and prokaryotic genomes
[82,83]. The lower accuracy of traditional molecular markers in estimating genetic
differences between taxa, and their insufficient statistical power, led researchers to look
for better alternatives such as microsatellites. They present a set of characteristics that
make them markers of choice for several types of study: 1- PCR-based, 2- co-dominant, 3-
usually multiallelic and highly variable, 4- randomly dispersed throughout
the genome, and 5- easily scorable by different methods [82,84,85]. Neutral nuclear SSRs
(nuSSRs) are the choice for diversity analysis, genetic mapping and association studies [86].
The use of microsatellites as polymorphic DNA markers has considerably increased over the
years, and although they were originally designed for research in humans, they have been
extensively used for genetic analysis in all classes of organisms, including plants [82,85].
With the development of other genetic markers like single nucleotide polymorphisms (SNPs)
and AFLPs, it was thought that the use of microsatellites would decline. However, recent
research has improved its application so much that microsatellites will probably still be used
in the near future as important genetic markers in various biological disciplines [11,82]. The
initial cost associated with microsatellites may be high due to the requirement of sequence
information, but once developed they can be easily maintained and shared between
laboratories. The ease of use, high reproducibility, low cost and abundance of SSR loci in
living organisms make them ideal markers for genetic analysis. They are also multi-allelic
and generally have high heterozygosity and mutation rates (ranging from 10⁻⁶ to 10⁻² events
per locus per generation), which can make them more informative than other markers, such as
Random Amplified Polymorphic DNA (RAPDs) and AFLPs [82,85].
Particularly in Quercus suber, Simões de Matos (F. Simões de Matos, PhD thesis, INETI
Lisbon, 2007) developed the only specific nuSSRs for cork oak. As a rule, SSRs are
species-specific markers which must be developed de novo for each species, mainly because
they usually occur in non-coding regions of the genome which are not highly conserved.
However, some studies [87,88] have shown the transferability of nuSSRs from other oak species
to cork oak, which potentially reduces the need to develop species-specific nuSSRs for this
species.
1.2.5 Expressed Sequence Tags (ESTs)
Expressed Sequence Tags (ESTs) can serve as a source of molecular markers as gene
sequences, SSRs or SNPs, and are an easy way to access fragments of the transcriptome.
They are short (200-800 bases), randomly selected sequences derived from cDNA libraries.
Even if ESTs are not available from the organism under study, EST collections can serve as a
bridge between the genomic resources of model organisms and diverse species of interest,
usually nonmodel organisms. ESTs provide information of the transcribed mRNA
populations within a given set of tissues, developmental stages, environmental conditions and
genotypes [89,90]. For instance, the direct sequencing of EST fragments and subsequent
detection of SNPs would be the most useful way of studying geographical distribution of
genetic variation within species. As most ESTs are directly involved in the genetic control of
an adaptive trait and have a known function, ESTs are the genetic markers that offer real
potential for detecting adaptive genetic diversity [90].
As an alternative to the conventional strategy for detecting anonymous SSRs, large numbers
of novel SSRs can be isolated with comparatively minor effort simply by in silico mining of
the ESTs databases [91,92]. This approach has become a routine for some species, and there
are many characteristics that EST-SSRs (EST-derived SSRs) present and that make them
valuable as genetic markers. These include their presence in large numbers, high levels of
polymorphism compared with many other types of genetic markers, co-dominant inheritance,
repeatability and clarity of scoring, and enhanced transferability across related species
[91,93]. Perhaps the greatest concern about the utility of EST-SSRs in population genetic
analysis is that selection on these loci might influence the estimation of population
parameters. Indeed, divergent selection will increase differentiation among and reduce
variability within populations, whereas the opposite effect is expected under balancing
selection. However, studies of large-scale comparative analyses suggest that only a very
small percentage of all genes are experiencing positive selection [91,93]. Inevitably some
fraction of all EST-SSRs will be subjected to selection.
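The in silico mining of SSRs from EST collections mentioned above can be sketched with a regular expression that looks for perfect tandem repeats. This is only an illustration under stated assumptions: the thresholds (motifs of 1 to 5 bp, at least six repeats) and the function name `find_ssrs` are chosen here for the example and are not the criteria of the cited studies, and the sequence is invented.

```python
import re


def find_ssrs(seq, min_repeats=6, max_motif=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    # Lazy motif match so that the shortest repeating unit is reported.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical EST fragment containing a (GA)7 dinucleotide repeat.
est_fragment = "TTGC" + "GA" * 7 + "CCTG"
print(find_ssrs(est_fragment))
```

Dedicated SSR-mining tools also handle imperfect and compound repeats, which this sketch deliberately ignores.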
Recently, a significant number of ESTs have been generated in oaks, and particularly in cork
oak. Since ESTs are conserved gene sequences, primers designed for one species are likely to
work well in related ones.
Ultimately, the potential of phylogeography may be fully accomplished when multiple loci
are considered. The combined analysis of different marker types should allow a
reconstruction of past population events in great detail, and also help understand their spatial
structure and the dynamics of genetic diversity.
2. Materials and Methods
2.1 Sampling and DNA extraction
Sampling of 26 natural populations was performed from the entire Mediterranean distribution
(Fig. 1.2 and Table 2.1). In Portugal sampling was performed surveying the following
locations: Gerês, Serra da Estrela, Serra de São Mamede, Serra da Arrábida, Serra de
Monchique, Serra do Buçaco, Azeitão and Serra de Sintra. Stands were considered natural
populations when they consisted of irregularly spaced trees over 50 years old. The
remaining populations from Portugal (São Brás de Alportel), Spain (Cataluña, Montes de
Toledo, Haza del Lino, Sierra de Aracena, Sierra Morena, Sierra de Guadarrama), Italy
(Puglia, Lazio and Sicily), France (Var, Landes and Corsica), Algeria (Forêt des Guerbès),
Tunisia (Mekna and Fermana) and Morocco (Taza and Kenitra) were obtained from an
international cork oak provenance trial located at Herdade Monte da Fava (Ermidas do Sado),
established in 1998 within the framework of the EUFORGEN Q. suber network and covering the
complete distribution range of the species. Access to these
populations was kindly provided by Helena Almeida from Instituto Superior de Agronomia.
From each population 3-5 trees were sampled for the cpDNA and nuDNA fragment analysis.
Young leaves were collected from Spring 2009 to Summer 2010 on a total of 119 adult trees
distributed among the 26 sampled populations (Table 2.1).
Of the 26 populations chosen, 13 were selected for a wider sampling for the SSR study. The
selected locations are representative of the entire Mediterranean distribution, and are the
following: Portugal (Gerês, Serra da Estrela, Serra da Arrábida, Serra de Monchique, Serra
do Buçaco and Serra de Sintra), Spain (Cataluña and Haza del Lino), Italy (Puglia), Algeria
(Forêt des Guerbès), Tunisia (Mekna) and Morocco (Taza and Kenitra). For each population
22-32 trees were obtained. Young leaves were also collected from Spring 2009 until Summer
2010, on a total of 379 adult trees distributed among the 13 sampled populations (Table 2.1).
Several other Quercus species (namely Q. robur, Q. pyrenaica, Q. faginea, Q. rubra, Q.
lusitanica, Q. canariensis, Q. cerris, Q. ilex (subsp rotundifolia and subsp ilex) and Q.
coccifera) were also sampled from natural populations and used to help determine the Q.
suber lineages, and also to more accurately establish the phylogenetic relationships of these
lineages. According to the taxonomic classification of Schwartz [26] Q. cerris is part of the
subgenus Cerris, together with Q. suber. As for Q. petraea, Q. robur, Q. pyrenaica, Q. faginea
and Q. lusitanica, they belong to the subgenus Quercus. The species Q. coccifera and Q. ilex
belong to the subgenus Sclerophyllodrys. Finally, Q. rubra is part of the subgenus
Erythrobalanus. Castanea crenata was used as an outgroup (Table 2.1). Species identification
of each tree was checked based on the leaf morphology, and presence of bark in Q. suber,
assessed during the growing season on fully elongated leaves.
The leaves were ground thoroughly in liquid nitrogen with a mortar and pestle, and genomic
DNA was then extracted according to Qiagen's protocol for the DNeasy Plant Mini Kit
(Qiagen). DNA integrity was checked by electrophoresis on 1% w/v agarose gels stained with
Red Safe 20,000x (iNtRON Biotechnology).
2.2 DNA sequencing
Polymerase chain reaction (PCR) amplifications were performed for 148 Quercus samples
(Table 2.1) for fragments of three different chloroplastidial DNA regions: the intergenic
spacers TrnL-F [74], TrnS-PsbC [79] and TrnH-PsbA [78]. Considering preliminary results of
the cpDNA fragment analysis, 104 of the 148 individuals were selected for amplification of
one nuclear DNA fragment, the Expressed Sequence Tag (EST) 2T13 [94], an osmotic
stress-related gene (Table 2.1). The primers used to amplify each fragment were those
described by the respective authors (Supporting Information 1 – Table S1.1 and Table S1.2).
To confirm Quercus species and assess the usefulness of barcodes as phylogeographical
markers the official cpDNA barcode fragments (matK and rbcL) were amplified with the
primers described by Cuénoud et al. [95] and Kress & Erickson [32], respectively
(Supporting Information 1 – Table S1.1). Three individuals of each cork oak lineage,
identified in the previous analysis of cpDNA regions, and one individual of each other
Quercus species were selected for the analysis.
PCRs were performed in a final volume of 25 μL, with 1 μL of DNA (50–100 ng), 1x PCR
buffer (Promega), 1U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4
µM of each primer. PCR amplification conditions were as follows: an initial denaturation
step at 94 °C for 5 min followed by 30 cycles consisting of denaturation at 94 °C for 20 s,
annealing at 65 °C for 30 s for intergenic cpDNA fragments and 55 °C for the nuclear
candidate gene and barcode fragments, extension at 72 °C for 40 s, and a final extension step
at 72 °C for 7 min. PCR and amplification conditions were the same for all oak species.
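As a rough sketch of how the final concentrations given above translate into pipetting volumes (C1 x V1 = C2 x V2), the snippet below works through a 25 μL reaction. The stock concentrations (10x buffer, 25 mM MgCl2, 10 mM dNTPs, 10 µM primers, 5 U/µL Taq) are assumptions chosen for illustration and are not stated in the text; the sketch also assumes a magnesium-free buffer. The function name `mix_volumes` is hypothetical.

```python
FINAL_VOLUME_UL = 25.0

# name: (stock concentration, final concentration), in matching units.
# Final values are taken from the text; stock values are ASSUMED for illustration.
REAGENTS = {
    "PCR buffer (x)": (10.0, 1.0),
    "MgCl2 (mM)": (25.0, 2.0),
    "dNTPs (mM)": (10.0, 0.12),
    "forward primer (uM)": (10.0, 0.4),
    "reverse primer (uM)": (10.0, 0.4),
}


def mix_volumes(reagents, final_volume, taq_units=1.0, taq_stock_u_per_ul=5.0, dna_ul=1.0):
    """Return the volume (uL) of each component; water makes up the remainder."""
    volumes = {
        name: final * final_volume / stock
        for name, (stock, final) in reagents.items()
    }
    volumes["Taq polymerase"] = taq_units / taq_stock_u_per_ul  # assumed 5 U/uL stock
    volumes["template DNA"] = dna_ul
    volumes["water"] = final_volume - sum(volumes.values())
    return volumes


for component, volume in mix_volumes(REAGENTS, FINAL_VOLUME_UL).items():
    print(f"{component}: {volume:.2f} uL")
```

With other stock concentrations only the `REAGENTS` entries need to change; the water volume is always computed last.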
Amplification of PCR products was verified on 1% w/v agarose gels stained with Red Safe
20,000x (iNtRON Biotechnology), run alongside the molecular weight marker HyperLadder™ IV
(Bioline). Amplicons were purified using SureClean (Bioline).
The nuclear EST fragments Phyt B (Phytochrome B, involved in flower phenology) [96] and
Cons 58 (Auxin repressed protein) [97] were also tested, with the primers described by the
respective authors (Supporting Information 1 – Table S1.2); however, after several
optimization attempts, no amplification product was obtained.
Sequencing reactions were carried out using the BigDye v3.1 chemistry (Applied Biosystems,
ABI). Amplicons were sequenced in both directions with an initial denaturation at 96 °C for
1 min, followed by 25 cycles of 96 °C for 10 s, annealing at 50 °C for 5 s, and a final
extension step at 60 °C for 4 min.
The sequencing reaction products were purified through ethanol precipitation, as follows.
The total reaction volume was transferred to a 1.5 mL tube containing 1 μL of 3 M sodium
acetate and 25 μL of absolute ethanol. This mixture was incubated on ice for 30 min and then
centrifuged at 10,000 g for 25 min. The supernatant was discarded and 300 μL of 70% ethanol
were added to each tube, which was centrifuged for 15 min at the same speed; this washing
step was performed a second time. Finally, the supernatant was completely discarded and the
samples were air-dried in the dark until further processing. The products were sequenced in
an ABI PRISM® 310 Genetic Analyzer (Applied Biosystems, USA) available in the laboratory.
Chromatograms were manually checked for errors in SEQUENCHER v4.0.5 (Gene Codes
Co.). For the nuclear fragment, nucleotide ambiguities of similar peak size in chromatograms
were considered as evidence of potential heterozygous sites. The IUPAC ambiguity code was
used for subsequent analyses.
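The coding of heterozygous sites described above can be illustrated with the standard IUPAC two-base ambiguity codes (R = A/G, Y = C/T, S = C/G, W = A/T, K = G/T, M = A/C). The sketch below is not part of the workflow used in SEQUENCHER; the function name and the example sequences are hypothetical.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def genotype_to_iupac(allele_1, allele_2):
    """Merge two aligned allele sequences into one IUPAC-coded sequence."""
    if len(allele_1) != len(allele_2):
        raise ValueError("alleles must be aligned to the same length")
    merged = []
    for base_1, base_2 in zip(allele_1.upper(), allele_2.upper()):
        if base_1 == base_2:
            merged.append(base_1)  # homozygous site
        else:
            merged.append(IUPAC_TWO_BASE[frozenset((base_1, base_2))])
    return "".join(merged)


# Hypothetical alleles differing at the third position (G versus A).
print(genotype_to_iupac("ACGT", "ACAT"))
```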
BLAST (Basic Local Alignment Search Tool) against the NCBI database
(http://blast.ncbi.nlm.nih.gov) was always performed to confirm the fragments' identity. The
matK sequence for Quercus crenata was retrieved from GenBank (accession number FN675334,
[29]).
Table 2.1: Description of the sampled populations for the several species, and sample size for each marker
(cpDNA and nuDNA sequences, and SSRs).
                                                                   GPS coordinates      Sample size
Species           Country   Site                             Code  Lat       Long       cpDNA  nuDNA  SSRs
Q. suber          Portugal  Azeitão                          AZT   38°30'N   9°02'W     5      3      -
Q. suber          Portugal  Gerês                            GER   41°40'N   8°10'W     5      3      29
Q. suber          Portugal  Serra de Monchique               MON   37°19'N   8°34'W     5      3      29
Q. suber          Portugal  Serra da Arrábida                ARR   38°50'N   9°03'W     5      4      30
Q. suber          Portugal  Serra São Mamede                 SSM   39°23'N   7°22'W     5      4      -
Q. suber          Portugal  Serra de Sintra                  SIN   38°45'N   9°25'W     5      3      30
Q. suber          Portugal  Serra do Buçaco                  BUC   40°22'N   8°21'W     5      3      30
Q. suber          Portugal  São Brás de Alportel             SBA   37°20'N   7°56'W     5      3      -
Q. suber          Portugal  Serra da Estrela                 EST   40°32'N   7°51'W     5      4      32
Q. suber          Tunisia   Mekna, Tabarka                   MEK   36°57'N   8°51'E     5      3      28
Q. suber          Tunisia   Fermana                          FER   36°35'N   8°32'E     3      3      -
Q. suber          Algeria   Forêt des Guerbès                ALG   36°54'N   7°15'E     5      3      30
Q. suber          Italy     Puglia, Brindisi                 PUG   40°34'N   17°40'E    5      3      22
Q. suber          Italy     Lazio, Tuscany                   LAZ   42°25'N   11°57'E    5      3      -
Q. suber          Italy     Sicily, Catania                  SIC   37°07'N   14°30'E    3      3      -
Q. suber          France    Landes, Soustons                 LAN   43°45'N   1°20'W     5      2      -
Q. suber          France    Var, Bormes-les-Mimosas          VAR   43°08'N   6°15'E     3      3      -
Q. suber          France    Corsica, Sartene                 COR   41°37'N   8°58'E     3      3      -
Q. suber          Morocco   Kenitra                          KEN   34°05'N   6°35'W     5      3      30
Q. suber          Morocco   Rif, Taza                        TAZ   34°12'N   4°15'W     5      3      30
Q. suber          Spain     Sierra de Guadarrama             GUA   40°31'N   3°45'W     5      3      -
Q. suber          Spain     Montes de Toledo, Cañamero       TOL   39°22'N   5°21'W     5      3      -
Q. suber          Spain     Haza del Lino                    HAZ   36°50'N   3°18'W     5      5      29
Q. suber          Spain     Sierra Morena, Fuencaliente      MOR   38°24'N   4°16'W     4      3      -
Q. suber          Spain     Cataluña, Sta Coloma de Farnes   CAT   41°51'N   2°32'W     5      3      30
Q. suber          Spain     Sierra de Aracena, Jabugo        ARC   37°54'N   6°44'W     3      3      -
Q. rotundifolia   Portugal  Ermidas do Sado                        38°00'N   8°07'W     2      2      -
Q. rotundifolia   Portugal  Serra da Arrábida                      38°50'N   9°03'W     1      1      -
Q. rotundifolia   Portugal  Serra da Estrela                       40°32'N   7°51'W     1      1      -
Q. rotundifolia   Portugal  Serra de São Mamede                    39°23'N   7°22'W     9      6      -
Q. rotundifolia   Portugal  Fátima                                 39°37'N   8°40'W     1      1      -
Q. ilex           France                                           43°09'N   3°03'E     2      1      -
Q. coccifera      Portugal  Cascais, Aldeia de Juzo                38°72'N   9°09'W     5      3      -
Q. faginea        Portugal  Serra da Arrábida                      38°50'N   9°03'W     1      1      -
Q. pyrenaica      Portugal  Serra da Estrela                       40°32'N   7°51'W     1      1      -
Q. robur          Portugal  Serra da Estrela                       40°32'N   7°51'W     1      1      -
Q. canariensis    Portugal  Lisbon                                 38°45'N   9°09'W     1      -      -
Q. lusitanica     Portugal  Negrais                                38°52'N   9°17'W     1      1      -
Q. rubra          Portugal  Lisbon                                 38°45'N   9°09'W     1      1      -
Q. cerris         Italy     Greve in Chianti                       43°35'N   11°18'E    1      1      -
Castanea crenata  Portugal  Vila Real                                                   1      1      -
Total                                                                                   148    104    379
2.3 Microsatellite genotyping
A total of 9 dinucleotide nuclear anonymous microsatellite (nuSSR) markers previously
developed in other oak species were used in this study; one of them, MSQ13, was first
described in Q. macrocarpa Michx. [98], five in Q. petraea (Matt) Liebl. (QpZAG9,
QpZAG15, QpZAG36, QpZAG46 and QpZAG110) [99], and three in Q. robur L.
(QrZAG11, QrZAG7 and QrZAG20) [100]. Transferability of these SSRs to cork oak had
been previously reported [87,88]. These microsatellites are considered as unlinked and
anonymous markers [101]. Amplifications were performed with the primers designed by the
previously mentioned authors and the conditions were as follows: an initial denaturation step
at 94 °C for 5 min followed by 30 cycles consisting of denaturation at 94 °C for 60 s,
annealing at 50 °C for 30 s (specific annealing temperatures in Table S2.1 – Supporting
Information 2), extension at 72 °C for 60 s, and a final extension step at 72 °C for 10 min.
PCRs were performed in a final volume of 15 μL, with 0.5 μL of DNA (50–100 ng), 1x PCR
buffer (Promega), 1U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4
µM of each primer. However, even when following the original authors' guidelines for PCR,
and after several attempts, the loci QpZAG36 and QpZAG46 yielded no amplification product
for most of the samples, or very unreliable scoring, and were therefore abandoned.
Two nuSSRs developed by Simões de Matos (F. Simões de Matos, PhD thesis, INETI
Lisbon, 2007) specifically for cork oak (QsA11 and QsD8) were also tested. However, in
spite of the optimization attempts a clear scoring was never possible and the loci were also
discarded.
At the onset of this work there were no EST-derived microsatellites (EST-SSRs) developed
specifically for cork oak, but since ESTs are conserved gene sequences, primers designed for
one species are likely to work in related ones. Therefore, six polymorphic EST-SSRs were
selected from Quercus mongolica (QmOST1, QmD12, QmAJ1, QmDN1, QmDN2, QmDN3) [92].
The locus names were chosen for this work and correspond, respectively, to the following
NCBI dbEST (http://www.ncbi.nlm.nih.gov/dbEST) accession numbers: DN949770, CR627959,
AJ577265, DN950717, DN949776, and DN950726. The specific primers used for each SSR can
be found in Ueno & Tsumura [92] (Supporting Information 2 – Table S2.2). PCR amplification
conditions were as follows: an initial denaturation step at 94 °C for 5 min followed by 30
cycles consisting of denaturation at 94 °C for 30 s, annealing at 57 °C for 30 s (specific
annealing temperatures in Table S2.2 – Supporting Information 2), extension at 72 °C for
30 s, and a final extension step at 72 °C for 10 min. PCRs were performed in a final volume
of 15 μL, with 0.5 μL of DNA (50–100 ng), 1x PCR buffer (Promega), 1 U Taq polymerase
(Promega), 1.5 mM MgCl2, 0.12 mM dNTPs and 0.3 µM of each primer. After several
amplification attempts, the locus QmDN2 yielded no PCR products.
PCR product electrophoresis was performed with an ABI PRISM 310 automated sequencer
and the genotypes were scored and visually checked using the GENEMAPPER software
v3.7 (Applied Biosystems, Inc.). To identify and correct possible genotyping errors, the
software MICRO-CHECKER v2.2.3 [102] was used.
2.4 Phylogenetic and phylogeographic analysis
Datasets for each sequenced fragment were aligned in CLUSTAL X v2.0.12 [103,104],
followed by manual refinement in BIOEDIT v7.0.9 [105]. To create the cpDNA concatenated
matrix from the individual datasets of TrnL-F, TrnS-PsbC and TrnH-PsbA fragments, the
CONCATENATOR v1.1.0 software was used [106].
Phylogenetic analysis was performed using PAUP* v4.0.b4a [107]. Maximum parsimony
(MP) analyses were carried out on all data sets. The optimal tree was found by a heuristic
search with tree-bisection–reconnection as the branch-swapping algorithm. Initial trees were
obtained via stepwise addition with 1000 replicates of random addition sequence.
Bootstrapping with 1000 replicates was performed to evaluate the robustness of the nodes of
the phylogenetic trees.
Bayesian analyses (BA) were undertaken using MRBAYES v3.1.2 [108] with the optimal
model selected under the Akaike Information Criterion (AIC), as implemented in
MrMODELTEST v2.3 [109]. For analysis of the combined data, model selection was carried
out separately for each cpDNA data set with MrMODELTEST and then implemented
according to the author's recommendations. Additionally, indels were included and scored as
binary characters (absent/present). The posterior probabilities of the phylogenetic trees were
estimated by a Metropolis-coupled Markov chain Monte Carlo sampling algorithm
(MCMCMC), sampling every 1000th generation. For the individual and combined cpDNA
datasets, Bayesian posterior probabilities were generated from 6×10⁶ and 5×10⁸ generations,
respectively. For the nuclear fragment dataset, 3×10⁶ generations were used to calculate the
Bayesian phylogeny and respective posterior probability values. The analysis was run three
times with one cold and three incrementally heated Metropolis-coupled Markov chain Monte
Carlo chains, starting from random trees. Ten percent of the generations were discarded as
burn-in. Trees were then combined and summarized in a 50% majority-rule consensus tree.
The cpDNA fragments, when aligned for all oak species, presented several indels. Therefore,
only the MP and BA analyses were performed, because only these allow indels to be treated
as informative data.
The program NETWORK v4.6 [110] was used to construct a median-joining network of
haplotypes showing the number of mutational steps between them.
2.5 Selective neutrality tests and demographic history
Selective neutrality of each microsatellite locus was examined based on the sampling
distribution of neutral alleles under the infinite-alleles model. The Ewens–Watterson
homozygosity test [111] and the Ewens–Watterson–Slatkin exact test [112,113] were
performed using the absolute allele frequency distribution, as implemented in ARLEQUIN
v3.5 software [114]. In these tests, the expected null distribution of the homozygosity statistic
(Fexp) is generated by simulating random neutral samples, which is then compared with the
homozygosity observed in the original sample (Fobs). If the null hypothesis of selective
neutrality is rejected (p<0.05), an Fobs/Fexp ratio less than 1 implies balancing selection in
favour of heterozygotes and a ratio greater than 1 implies directional selection in favour of
advantageous alleles.
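The logic of the homozygosity test described above can be illustrated with a minimal Python sketch. It is not ARLEQUIN's implementation: the allele counts are hypothetical, the function names are invented for this illustration, and neutral samples are drawn with a simple Hoppe-urn rejection scheme (an assumption made here for brevity). Because the number of alleles k is a sufficient statistic under the infinite-alleles model, samples conditioned on k do not depend on the value of theta used to generate them.

```python
import random

def homozygosity(allele_counts):
    """Homozygosity F = sum of squared allele frequencies."""
    n = sum(allele_counts)
    return sum((c / n) ** 2 for c in allele_counts)

def simulate_neutral_counts(n, k, theta, rng):
    """Draw one neutral sample of n gene copies with exactly k alleles.

    Uses a Hoppe urn (infinite-alleles model) and rejects samples whose
    number of alleles differs from k.
    """
    while True:
        counts = []
        for i in range(n):
            # A new allele appears with probability theta / (theta + i).
            if rng.random() < theta / (theta + i):
                counts.append(1)
            else:
                # Otherwise copy an existing allele in proportion to its count.
                pick = rng.randrange(i)
                for j, c in enumerate(counts):
                    if pick < c:
                        counts[j] += 1
                        break
                    pick -= c
        if len(counts) == k:
            return counts

def ewens_watterson(allele_counts, theta=1.0, replicates=1000, seed=1):
    """Return (Fobs, Fexp, proportion of simulated F <= Fobs)."""
    rng = random.Random(seed)
    n, k = sum(allele_counts), len(allele_counts)
    f_obs = homozygosity(allele_counts)
    sims = [homozygosity(simulate_neutral_counts(n, k, theta, rng))
            for _ in range(replicates)]
    f_exp = sum(sims) / replicates
    p_low = sum(f <= f_obs for f in sims) / replicates
    return f_obs, f_exp, p_low

# Hypothetical locus: 20 gene copies distributed over 4 alleles.
f_obs, f_exp, p_low = ewens_watterson([12, 5, 2, 1], replicates=200)
print(f_obs, f_exp, f_obs / f_exp, p_low)
```

A ratio Fobs/Fexp below 1 with a small lower-tail proportion would point towards balancing selection, and a ratio above 1 with a small upper-tail proportion towards directional selection, as described in the text.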
The mismatch distribution (1000 replicates) was used to infer the demographic history of the
cork oak lineages present in each of the cpDNA and nuDNA datasets. Pairwise distances
between haplotypes, time since population expansion (τ), and relative population size before
(θ0) and after (θ1) expansion were calculated in ARLEQUIN. Harpending's (1994) raggedness
index (r) and the sum of squared deviations (SSD) were used to assess the fit of the observed
distribution to the rapid expansion model, and their statistical significance was tested with
1000 bootstrap replicates in ARLEQUIN.
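As an illustration of the quantities involved, the following Python sketch computes an observed mismatch distribution and the raggedness index, taken here as r = Σ (x_i − x_(i−1))² summed over i = 1 to d+1, where x_i is the relative frequency of pairs differing at i sites and d is the largest observed number of differences. The sequences are hypothetical, gaps are not handled, and the bootstrap significance test performed by ARLEQUIN is not reproduced.

```python
from itertools import combinations

def mismatch_distribution(seqs):
    """Relative frequencies of pairwise differences (classes 0..d)."""
    dists = [sum(a != b for a, b in zip(s, t))
             for s, t in combinations(seqs, 2)]
    d_max = max(dists)
    return [dists.count(i) / len(dists) for i in range(d_max + 1)]

def raggedness(freqs):
    """Raggedness index: sum of squared steps between adjacent classes."""
    x = list(freqs) + [0.0]  # class d+1 is empty by definition
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))

# Hypothetical aligned haplotypes of equal length, without gaps.
seqs = ["AAAA", "TAAA", "ATAA", "AATA"]
freqs = mismatch_distribution(seqs)
print(freqs, raggedness(freqs))
```

Smooth, unimodal distributions give low r values, whereas ragged, multimodal distributions (expected for populations of stable size) give higher ones.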
Both Tajima's D [116] and Fu's Fs [117] tests were implemented to test deviations from
neutrality. Fu's Fs uses information from the haplotype distribution and is particularly
sensitive to population demographic expansion, where low Fs values indicate an excess of
single substitutions usually due to expansion [117,118]. Tajima's D uses the average number
of pairwise differences and the number of segregating sites in the intraspecific DNA sequence
to test for departure from neutral expectations, generally assuming negative values in
populations that have experienced size changes, or for sequences that have undergone
selection [116,118]. Fu's Fs and Tajima's D were calculated in ARLEQUIN.
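For reference, Tajima's D can be computed from an alignment as sketched below in Python, using the standard constants of Tajima's 1989 formulation. This is only an illustrative sketch with hypothetical sequences: it assumes equal-length sequences without gaps or missing data and does not reproduce ARLEQUIN's significance testing.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Return (pi, S, D) for a list of aligned, gap-free sequences."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    # pi: average number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    if s == 0:
        return pi, s, None  # D is undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, s, d

# Hypothetical sample in which every polymorphism is a singleton.
pi, s, d = tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"])
print(pi, s, d)
```

An excess of low-frequency variants, as in this toy sample, pushes D below zero, which is the pattern expected after a population expansion or a selective sweep.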
2.6 Genetic diversity and population differentiation
Linkage disequilibrium (LD) between all pairs of polymorphic SSR loci was calculated using
the probability test implemented in GENEPOP v4.0 software [119]. Using the complete
sampling, the nucleotide diversity (π) and its standard deviation, Haplotype diversity (Hd)
and Indel Haplotype diversity (IndelHd) were estimated for each selected sequenced
fragment in DnaSP v10.01 [120].
Gene diversity statistics (gene diversity He [121] and allelic richness A) were estimated for
microsatellites using the program FSTAT v2.9.3.2 [122,123]. Allelic richness (A) was
corrected using the rarefaction method based on a minimum sample size of 21 diploid
individuals, which corresponded to the smallest number of individuals successfully
genotyped for a given locus in a population. The private alleles were calculated in GenAlEx
v6.3 [124]. The inbreeding coefficient Fis [125] was calculated using ARLEQUIN and its
deviation from zero tested by 10,000 allele permutations. Population differentiation was
calculated by FST [125] and RST [126] in ARLEQUIN.
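The rarefaction correction of allelic richness can be sketched in Python as follows. The formula is the usual expectation of the number of distinct alleles in a random subsample of n gene copies; the allele counts are hypothetical, and the value of 42 gene copies simply corresponds to the 21 diploid individuals mentioned above.

```python
from math import comb

def rarefied_allelic_richness(allele_counts, n):
    """Expected number of alleles in a random subsample of n gene copies."""
    total = sum(allele_counts)
    if n > total:
        raise ValueError("subsample larger than the sample")
    # For each allele: 1 minus the probability that it is absent
    # from a subsample of n copies drawn without replacement.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in allele_counts)

# Hypothetical locus scored in 30 diploid individuals (60 gene copies),
# rarefied to 21 individuals (42 gene copies).
counts = [40, 12, 5, 2, 1]
print(rarefied_allelic_richness(counts, 42))
```

Rare alleles contribute less than one allele each to the corrected value, which is what makes richness comparable among populations with different sample sizes.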
SMOGD software v1.2.5 [127] was used to measure the actual differentiation among
populations (Dest) according to Jost [128], the G'ST standardized measure of genetic
differentiation [129] and the GST nearly unbiased estimator of relative differentiation [130].
Pairwise genetic differentiation between populations was estimated with FST, RST and Dest in
FSTAT, ARLEQUIN and SMOGD, respectively. Standard Bonferroni corrections were
applied to account for multiple testing.
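The difference between GST and Jost's D can be made concrete with a short Python sketch. It uses the uncorrected parameter forms, GST = (HT − HS)/HT and D = [(HT − HS)/(1 − HS)] · [n/(n − 1)], rather than the bias-corrected estimators computed by SMOGD, and the allele frequencies are hypothetical.

```python
def heterozygosities(pops):
    """Return (HS, HT) for one locus.

    pops: list of dicts mapping allele -> frequency within a population.
    """
    n = len(pops)
    alleles = set().union(*pops)
    # HS: mean within-population expected heterozygosity.
    hs = sum(1 - sum(p ** 2 for p in pop.values()) for pop in pops) / n
    # HT: expected heterozygosity of the pooled (averaged) frequencies.
    ht = 1 - sum((sum(pop.get(a, 0.0) for pop in pops) / n) ** 2
                 for a in alleles)
    return hs, ht

def gst(pops):
    hs, ht = heterozygosities(pops)
    return (ht - hs) / ht

def jost_d(pops):
    hs, ht = heterozygosities(pops)
    n = len(pops)
    return ((ht - hs) / (1 - hs)) * (n / (n - 1))

# Two hypothetical populations sharing no alleles, each with ten
# equally frequent private alleles.
pop1 = {f"a{i}": 0.1 for i in range(10)}
pop2 = {f"b{i}": 0.1 for i in range(10)}
print(gst([pop1, pop2]), jost_d([pop1, pop2]))
```

In this example the populations are completely differentiated, yet GST stays close to zero because within-population diversity is high, while D reaches 1. This is the motivation for reporting Dest alongside the fixation indices.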
Geographic patterns of genetic differentiation were tested by regressing the genetic
differentiation (FST) against geographic distance between pairs of samples, following Rousset
[131] [FST/(1-FST) and logarithm of geographic distances between populations]. The reduced
major axis regression was used to estimate the regression, using the IBDWS v3.03 software
[132]. Mantel tests were used to test the null hypothesis of no relationship between the
genetic and geographic matrices.
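The isolation-by-distance analysis can be outlined with the Python sketch below: pairwise FST is transformed to FST/(1 − FST), geographic distance is log-transformed, a reduced major axis (RMA) slope is computed, and a Mantel permutation test shuffles population labels. It is a simplified stand-in for IBDWS, with hypothetical function names, and expects symmetric matrices supplied by the user.

```python
import math
import random

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ibd_vectors(fst, dist, order=None):
    """Return (log distance, FST/(1-FST)) over all population pairs."""
    n = len(fst)
    order = list(range(n)) if order is None else order
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [math.log(dist[i][j]) for i, j in pairs]
    y = [fst[order[i]][order[j]] / (1 - fst[order[i]][order[j]])
         for i, j in pairs]
    return x, y

def rma_slope(x, y):
    """Reduced major axis slope: sign(r) * sd(y) / sd(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return math.copysign(math.sqrt(syy / sxx), pearson(x, y))

def mantel(fst, dist, permutations=999, seed=1):
    """One-tailed Mantel test; returns (r, p)."""
    rng = random.Random(seed)
    x, y = ibd_vectors(fst, dist)
    r_obs = pearson(x, y)
    order = list(range(len(fst)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute the population labels of one matrix
        if pearson(x, ibd_vectors(fst, dist, order)[1]) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting the labels of whole populations, rather than individual matrix cells, preserves the dependence among pairwise values that share a population, which is why the Mantel procedure is used instead of an ordinary regression test.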
2.7 Genetic structure of populations
The Bayesian clustering method implemented in STRUCTURE v2.3.3 [133] was used to
determine the genetic structure of the sampled populations for the microsatellite loci. Because
preliminary analyses showed that overall differentiation was low, the new clustering method
was used, which is based not only on the individual multilocus genotypes but also takes into
account the sampling locations [134]. The LocPrior model allows the prior distribution of
cluster assignments to vary among populations. This approach is recommended by the
authors when the genetic data alone are not informative enough to detect population
structure. A parameter r indicates the extent to which the sampling locations are informative
(small values, <1, indicate that locations are informative). Twenty independent runs were
done, following a Markov Chain Monte Carlo (MCMC) scheme, for each value of K (the
number of putative clusters) ranging from 1 to 13 (the number of populations sampled). The
admixture model with sampling locations as prior information [134] was selected and
correlated allele frequencies among populations were assumed [135]. Each run consisted of
an MCMC length of 1,000,000 and a burn-in of 50,000. The posterior probability of the data
for a given K, LnP(D), was used to identify the most probable number of clusters using both
the ΔK (DeltaK) ad hoc statistic [136] and the guidelines of the software documentation
[133]. Once the most likely K value was determined, the run with the highest posterior
probability and lowest variance was chosen for interpreting the results. Final results
from STRUCTURE were visualized using the software DISTRUCT v1.1 [137].
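The ΔK statistic can be computed from the LnP(D) values of replicate runs as in the following Python sketch. It follows one common convention, the absolute second-order difference of the mean LnP(D) divided by the standard deviation of LnP(D) across runs at that K; the input values are hypothetical, and ΔK is undefined for the smallest and largest K tested.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp: dict mapping K -> list of LnP(D) values from replicate runs."""
    ks = sorted(lnp)
    m = {k: mean(lnp[k]) for k in ks}
    out = {}
    for k in ks:
        if k - 1 in m and k + 1 in m:
            # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)| on the run means.
            second = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            out[k] = second / stdev(lnp[k])
    return out

# Hypothetical LnP(D) values, two runs per K for brevity.
runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -794.0],
    4: [-785.0, -795.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this toy example the likelihood rises sharply from K=1 to K=2 and then plateaus, so ΔK peaks at K=2. Identical LnP(D) values across all runs at a given K would make the standard deviation zero and would need separate handling.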
The degree of population subdivision was also explored with the R package
GENELAND v3.2.4 [138]. This latter approach determines the number of groups (K) using a
Bayesian clustering model executed in an MCMC scheme to detect the location of genetic
discontinuities using individual geo-referenced multilocus genotypes [139]. GENELAND
uses geographical locations of individuals as prior information. This model treats the number
of clusters as a parameter processed by the MCMC scheme without any approximation and
may provide a better estimation of the number of clusters than other proposed procedures that
do not take the geographical locations into account [139,140]. Twenty independent MCMC
runs were performed, allowing K to vary from 1 to 13 (the number of populations sampled),
with the following parameters: 1,000,000 iterations, of which every hundredth one was saved
(after 10% burn-in), treating the number of genetic clusters as unknown and using the
Dirichlet model for allelic frequencies (assumed as correlated).
Results obtained following the GENELAND and STRUCTURE approaches were further
tested with an Analysis of Molecular Variance (AMOVA) approach [141].
Results
3. Results
3.1 Sequencing of chloroplast and nuclear DNA fragments
3.1.1 cpDNA and nuDNA diversity levels
Initially, 148 samples were sequenced for the cpDNA fragments studied (Table 2.1). The
cpDNA fragments, when aligned for all oak species, presented several indels. When
ambiguous alignments were produced, several slightly different alignments, including the
removal of the ambiguous positions or indels, were tested, without producing any major
differences in the results. The nucleotide diversity found in each dataset was 0.00400
(+/- 0.0009), 0.00925 (+/- 0.0007) and 0.00549 (+/- 0.0004) for the fragments TrnS/PsbC,
TrnH/PsbA and TrnL-F, respectively (Table 3.1). For the cork oak samples (119 out of the
148 individuals), a total of 8 TrnS/PsbC haplotypes, 7 TrnH/PsbA haplotypes, and 5 TrnL-F
haplotypes were obtained. For the cpDNA concatenated dataset, 17 cork oak haplotypes were
detected with a nucleotide diversity of 0.00658 (+/- 0.005) (Table 3.1). After a preliminary
analysis of the cpDNA sequences, 104 samples (out of the 148) from the main groups were
selected and sequenced for candidate gene EST 2T13 (Table 2.1). The alignment was
straightforward, showing no potential heterozygous sites for the cork oak samples. The
nucleotide diversity estimated for the EST 2T13 fragment is 0.02387 (+/- 0.0126) (Table 3.1),
with 8 cork oak haplotypes.
Table 3.1: The length (bp), number of parsimony informative sites (PI) and
estimated nucleotide diversity (π) and its standard deviation for each dataset,
using the complete sampling.
                     Length   Variable   Total
                     (bp)     sites      characters   Indels   PI   π
Individual cpDNA
  TrnS/PsbC          250      20         238          12       15   0.00400 +/- 0.0009
  TrnH/PsbA          478      54         448          30       34   0.00925 +/- 0.0007
  TrnL-F             381      18         374          7        14   0.00549 +/- 0.0004
Concatenated
  TrnS/TrnH/TrnL     1109     92         1060         49       63   0.00658 +/- 0.0005
Individual nuDNA
  EST 2T13           249      48         240          9        20   0.02387 +/- 0.0126
3.1.2 Differentiation patterns
Maximum parsimony (MP) trees for the cpDNA fragments TrnS/PsbC, TrnH/PsbA and
TrnL-F are presented in Fig. 3.1a, Fig. 3.2 and Fig. 3.3, respectively. The trees derived from
the Bayesian analyses (BA) showed very similar results to those of the MP analysis; therefore
it was decided to present the MP tree of each fragment with the respective bootstrap and
clade credibility values. The concatenated tree supported the results of the individual trees
(Supporting Information 3). In all the cpDNA phylogenetic trees (Fig. 3.1a, Fig. 3.2 and Fig.
3.3), four major groups were distinguished and named Group A, B, C and D. Group
A (highlighted in the figures in yellow) is composed exclusively of samples of the subgenus
Cerris, namely cork oak samples from several populations and Q. cerris. Group B appears as
a more complex group since it is composed of samples of several Quercus species, namely Q.
suber (highlighted in the trees in orange – subg. Cerris), and Q. coccifera (green), Q. ilex ilex
(pink) and Q. ilex rotundifolia (red) of the subgenus Sclerophyllodrys. Considering the
presence of cork oak samples in these two groups, and that the samples present in each group
were always the same for all the cpDNA fragments, the haplotypes belonging to Group A
were considered as a pure lineage of cork oak, while the samples belonging to Group B were
considered as an introgressed lineage of cork oak. Group C (highlighted in blue) is composed
of several Quercus species, specifically Q. faginea, Q. robur, Q. pyrenaica, Q. lusitanica
and Q. canariensis from the subgenus Quercus. Finally, Group D is constituted by Q. rubra
of the subgenus Erythrobalanus.
In particular, Group A is composed of 92 cork oak samples (out of the 119) and the sample of
Q. cerris, and was characterized by low levels of variation and a low number of haplotypes
(Table 3.2). This was particularly evident for the TrnL-F fragment, for which only one
haplotype was found (Fig. 3.3), and for the TrnH/PsbA fragment, where again only one major
haplotype is present, although two derived low-frequency haplotypes are present in Puglia
(Italy) (Fig. 3.2 and Table 3.2). In both these fragments Q. cerris shares the same haplotype as
Q. suber. Higher variation was found for the TrnS/PsbC fragment in cork oak's pure lineage
(Table 3.2), allowing the distinction of three sublineages, which were named A1, A2 and A3
(Fig. 3.1 and Table 3.2). In Fig. 3.1b a reconstruction of the phylogenetic tree for the pure
lineage of the TrnS/PsbC fragment shows the major haplotypes of each sublineage and the
mutational events that occurred during the formation of those haplotypes. The sublineage A1
(sl A1) is exclusive to the island of Sicily, the sublineage A2 (sl A2) is present in West
Mediterranean populations, and the sublineage A3 (sl A3) is present in East Mediterranean
populations (Fig. 3.4). For this fragment Q. cerris shows a haplotype derived from the
sublineage A3 (Fig. 3.1a).
Group B is composed of all the Q. coccifera and Q. ilex (subspecies ilex and rotundifolia)
samples that were analysed, as well as 27 of the 119 cork oak samples. Most of the cork oak
haplotypes that belong to this group are shared with, or seem derived from, the haplotypes
present in the other species.
The cork oak samples in Group B, belonging to the introgressed lineage, generally presented
more variability than those in Group A, the pure lineage (Table 3.2).
All three cpDNA fragments seem to be able to distinguish groups C and D, even if with a
low resolution, most evident in the TrnL-F fragment. These groups are also quite inconsistent
regarding their position on the trees, as well as the phylogenetic relationships between them
and the other groups (Fig. 3.1a, Fig. 3.2 and Fig. 3.3).
Figure 3.1: a) Maximum parsimony tree of the cpDNA TrnS/PsbC intergenic spacer region. Four groups are
represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage and Q. cerris (Bright
Yellow – Sublineage A2 (Sl A2); Brownish-Yellow – Sublineage A3 (Sl A3); Light Yellow – Sublineage A1
(Sl A1)); Group B (orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink –
Q. ilex); Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur,
Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and is constituted by Q. rubra.
Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and
the Bayesian credibility value; b) Detailed phylogenetic reconstruction of the sublineages from Group A.
Bootstrap support and Bayesian credibility value are provided above each branch. The site combinations below
each branch represent the mutational events that occurred along the evolution of the three sublineages.
Figure 3.2: Maximum parsimony tree of the cpDNA TrnH/PsbA intergenic spacer region. Four groups are
represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: cork oak's
introgressed lineage – orange; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted
in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and
Q. lusitanica; Group D is highlighted in light blue and is constituted by Q. rubra. Numbers at the nodes are the
bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value.
Figure 3.3: Maximum parsimony tree of the cpDNA TrnL-F intergenic spacer region. Four groups are
represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: cork oak's
introgressed lineage – orange; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted
in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and
Q. lusitanica; Group D is highlighted in light blue and is constituted by Q. rubra. Numbers at the nodes are the
bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value.
Table 3.2: Haplotype diversity (Hd), Indel Haplotype diversity (Indel Hd) and number of haplotypes (nr H) for
each cpDNA and nuclear fragment according to pure and introgressed cork oak lineages.
                               Hd      Indel Hd   nr H
TrnS/PsbC
  Pure Lineage (A)             0.000   0.471      4
  Introgressed lineage (B)     0.000   0.649      4
TrnH/PsbA
  Pure Lineage (A)             0.024   0.024      3
  Introgressed lineage (B)     0.442   0.695      5
TrnL-F
  Pure Lineage (A)             0.000   0.000      1
  Introgressed lineage (B)     0.613   0.413      4
EST2T13
  Pure Lineage (α)             0.494   0.028      6
  Introgressed lineage (β)     0.000   0.182      2
The analysis of the barcode matK fragment provided roughly the same nucleotide diversity
(π=0.0067 +/- 0.00095) as the remaining cpDNA fragments analysed, whereas the rbcL
fragment presented no variation for any of the species. The analysis of the matK fragment and
the resulting phylogenetic tree (Fig. 3.5) corroborated the presence of two cork oak lineages.
Quercus cerris and Quercus crenata are classified, both by classic taxonomy [17] and recent
DNA barcode analysis [29], as the species most closely related to Q. suber (subgenus Cerris),
and these species appear with the same haplotype as the cork oak samples characterized as
pure lineage (subgenus Cerris). Cork oak samples characterized as the introgressed lineage
appear closely related to Quercus ilex ilex, Quercus ilex rotundifolia and Quercus coccifera
(subgenus Sclerophyllodrys).
Figure 3.4: Geographical distribution of cork oak cpDNA haplotype lineages according to the TrnS/PsbC
fragment. Pie charts represent the haplotype frequencies in the analysed populations. Pie chart sizes reflect
the number of samples per population (3-5). Colour codes reflect those in the TrnS/PsbC tree (Fig. 3.1);
Yellow: cork oak's pure lineage (Bright Yellow – Sublineage A2; Brownish-Yellow – Sublineage A3;
Light Yellow – Sublineage A1); Orange: cork oak's introgressed lineage. The present distribution of the
species is represented in grey.
Figure 3.5: Maximum parsimony tree of the cpDNA fragment matK. Four groups are represented and
classified according to classic taxonomy (Tutin et al. 1993), following the four subgenera identified by
Schwartz 1964 (Sclerophyllodrys, Cerris, Quercus and Erythrobalanus). Numbers at the nodes are the
bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility
value.
For the nuclear fragment EST 2T13, the best model of sequence evolution for the BA was
calculated, and the resultant tree presented a very similar topology to that of the MP. The MP
tree is shown in Fig. 3.6a with the respective bootstrap values of MP and clade credibility
values for the BA. The nuclear tree reflects the same pattern as the cpDNA trees. The same
four major groups are present, here named Group α, β, γ and δ.
Group α is constituted by Q. suber [40 samples out of the 52 used (Table 2.1)] and Q. cerris,
similarly to the cpDNA results. The pattern of three sublineages is also present (α1, α2 and
α3), although in the nuclear fragment Q. cerris appears to share the same haplotype as the
cork oak samples from sublineage α3. In Fig. 3.6b a reconstruction of the phylogenetic tree
for the nuclear pure lineage of the EST 2T13 fragment shows the major haplotypes of each
sublineage and the six mutational events that occurred during the formation of those
haplotypes. Group β, as in the cpDNA trees, is constituted by cork oak, Q. coccifera and Q.
ilex samples. Also as in the cpDNA trees, Group γ is composed of the Quercus species
belonging to the subg. Quercus and Group δ is constituted by Q. rubra from the subgenus
Erythrobalanus. The phylogenetic relationships between the four groups most closely
resemble those of the cpDNA fragment TrnH/PsbA tree. The major difference found
between the nuDNA and the cpDNA datasets is that the cork oak samples that compose
Group α (which one could call the nuclear pure lineage) and its sublineages, and Group β (the
nuclear introgressed lineage), are not always the same when comparing the fragments
from both genomes. In particular, sublineage α1 in the nuclear DNA is not exclusively
composed of cork oak samples from the island of Sicily as in the cpDNA fragments, showing
samples that in the cpDNA belonged to sublineage A3; sublineage α2 is not completely West
Mediterranean in the nuclear DNA, presenting samples from the cpDNA sublineage A3 as
well as from the introgressed lineage; the sublineage α3 also loses its exclusiveness to East
Mediterranean populations, being constituted by samples that in the cpDNA trees belong to
the sublineage A2 and the introgressed lineage (Fig. 3.7). Another difference between the
cpDNA and the nuclear DNA was the arrangement of the cork oak samples that compose
Group β. These samples do not share the same haplotypes with Q. ilex and Q. coccifera as
they did in the cpDNA fragments. Instead they present a major haplotype derived from a
common ancestor, shared with Q. ilex and Q. coccifera. The diversity levels in the nuclear
pure lineage are not as low as in the cpDNA (Table 3.2), showing 6 haplotypes.
However, the diversity levels of the cork oak samples that belong to the nuclear introgressed
lineage in Group β are lower when compared to the diversity of the other species in this
group (Table 3.2 and Fig. 3.6). In Group γ, however, the diversity is higher than that of the
cpDNA since each species is characterized by its own haplotype (Fig. 3.6).
Figure 3.6: a) Maximum parsimony tree of the candidate gene EST 2T13. Four groups are represented and color
coded. Group α is highlighted in yellow: cork oak's pure lineage and Q. cerris (Bright Yellow – Sublineage α2;
Brownish-Yellow – Sublineage α3; Light Yellow – Sublineage α1); Group β (Orange – cork oak's introgressed
lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group γ is highlighted in dark blue and is
composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica;
Group δ is highlighted in light blue and is constituted by Q. rubra. Numbers at the nodes are the bootstrap
support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value. b) Detailed
phylogenetic reconstruction of the sublineages from Group α. Bootstrap support and Bayesian credibility value
are provided above each branch. The site combinations below each branch represent the 6 mutational events
that occurred along the evolution of the three sublineages.
Median-joining analysis of the cpDNA fragments resulted in haplotype networks (Supporting
Information 4) reflecting the four major groups in the trees and the shared haplotypes for Q.
suber, in clade B, with Quercus coccifera, Q. rotundifolia and Q. ilex.
Figure 3.7: Geographical distribution of cork oak nuDNA EST 2T13 haplotype lineages. Pie charts
represent the haplotype frequencies in the analysed populations. Pie chart sizes reflect the number of
samples per population (3-5). Colour codes reflect those in the EST 2T13 tree (Fig. 3.6); Yellow: cork oak's
nuclear pure lineage (Bright Yellow – Sublineage A2'; Brownish-Yellow – Sublineage A3'; Light Yellow –
Sublineage A1'); Orange: cork oak's nuclear introgressed lineage. The present distribution of the species
is represented in grey.
3.1.3 Mismatch distribution and neutrality tests
Demographic histories of both cork oak lineages were evaluated with mismatch distributions
and tests of the standard neutral model for a demographically stable population (Tajima's D
[116] and Fu's Fs [117]) (Table 3.3). The TrnS/PsbC fragment sequence analysis provided
slightly contradictory results. The null hypothesis of population demographic expansion was
not rejected based on the mismatch distribution for either of the lineages (pure lineage –
SSD=0.098, p=0.061; r=0.436, p=0.164 / introgressed lineage – SSD=0.026, p=0.086;
r=0.186, p=0.055), but the p values are somewhat marginal and these statistics are
conservative and use little of the information in the data. Detecting changes in population
size can be difficult with small numbers of samples or haplotypes, or when the population
has experienced a very recent expansion. Fu's Fs has been shown to be more powerful than
mismatch distributions in detecting both very recent and older population expansions
[117,118], and this statistic (like Tajima's D) did not support population expansion for either
of the lineages (Table 3.3). The TrnH/PsbA fragment analysis for the pure lineage showed
strong evidence of recent population expansion, given the non-significant sum of squared
deviations (SSD=0.000, p=0.183) and Harpending's raggedness index (r=0.822, p=0.837),
and the significant (p<0.001) negative value of Fu's Fs. Tajima's D, although not significant
(p=0.119), also presented a negative tendency (D=-1.047). The TrnH/PsbA introgressed
lineage presented no evidence of population expansion: SSD and r values rejected the null
hypothesis of expansion, a rejection supported by the values of D and Fs. It was not possible
to calculate the mismatch distribution and Fu's Fs for the pure lineage of the TrnL-F fragment
because only one haplotype is present. The TrnL-F introgressed lineage presented a mismatch
distribution that departed (although marginally) from the stepwise growth model (SSD=0.132,
p=0.076), but fit the Harpending's raggedness index of the stepwise population expansion
model (r=0.491, p=0.019). Fu's Fs and Tajima's D values were positive and not significant,
rejecting population expansion (Table 3.3).
The nuDNA fragment EST 2T13 was also evaluated for its demographic history and
neutrality. For the nuclear pure lineage, the null hypothesis of demographic expansion based
on the Harpending's raggedness index of the mismatch distribution was not rejected
(r=0.145, p=1.00). However, the SSD value rejected the null hypothesis at a highly significant
level (SSD=0.328, p=0.00), supported by the non-significant values of Fu's Fs and Tajima's
D (although both values are negative) (Table 3.3). For the nuclear introgressed lineage, the
mismatch analysis indicates demographic expansion, although this is not supported by Fu's
Fs and Tajima's D tests (Table 3.3).
Table 3.3: Estimates of mismatch distribution parameters and neutrality tests. τ (tau) = time since population
expansion; θ = relative population size before (θ0) and after (θ1) expansion; SSD = sum of squared deviations; r
= Harpending's raggedness index; D = Tajima's D; Fs = Fu's Fs; ns = not significant; * Significant at p<0.05;
*** Significant at p<0.001
                                          Mismatch                                 Tajima's D   Fu's Fs
                             τ        θ0      θ1          SSD         r           D            Fs
TrnS-PsbC
  Pure lineage (A)           2.648    0.000   1.115       0.098 ns    0.436 ns    0.000 ns     0.947 ns
  Introgressed lineage (B)   0.959    0.000   99999.000   0.026 ns    0.186 ns    0.000 ns     -0.271 ns
TrnH-PsbA
  Pure lineage (A)           3.000    0.000   0.050       0.000 ns    0.882 ns    -1.047 ns    -3.773 ***
  Introgressed lineage (B)   10.273   0.000   8.887       0.236 **    0.461 ***   0.047 ns     5.891 ns
TrnL-F
  Pure lineage (A)           -        -       -           -           -           0.000 ns     -
  Introgressed lineage (B)   2.484    0.002   3.000       0.132 ns    0.491 *     0.771 ns     1.290 ns
EST 2T13
  Pure lineage (α)           0.000    0.000   3413.950    0.328 ***   0.145 ns    -1.323 ns    -0.741 ns
  Introgressed lineage (β)   2.965    0.450   0.450       0.021 ns    0.457 ns    0.000 ns     -0.176 ns
3.2 Microsatellite analysis
3.2.1 Genetic diversity values
For the EST-SSRs markers, QmDN1 locus was apparently monomorphic and was discarded
from any subsequent analysis. Global evaluation of the microsatellite data set using MicroChecker [102] revealed no evidence of genotyping errors due to stuttering or large allele
dropout, but identified possible null alleles at two markers: QmOST1 and QmDN3. For
QmOST1 locus, although marginally, there is the possibility of null alleles for the
populations HAZ and MEK (Supporting Information 5 – Fig. 5.1). As for the QmDN3 locus
revealed indices of null alleles in all populations and, therefore, was eliminated from all
subsequent analyses (Supporting Information 5 – Fig. 5.2). For the three remaining ESTSSRs no linkage disequilibrium between the loci was detected (Supporting Information 6).
The number of total alleles (NA) in each population ranged from nine to fourteen and the
allelic richness (A) from 3.000 to 4.109, being the SIN population that clearly presented the
higher number of alleles and consequent the highest allelic richness. Gene diversity (expected
heterozygosity over loci) ranged from 0.400 in SIN to 0.598 in CAT (Table 3.4). Only the
population of SIN departed significantly from Hardy-Weinberg equilibrium at 0.01
64
Results
significance level (Table 3.4). The inbreeding coefficient for the SIN population was positive
(Fis=0.1691). Because the species, although monoecious, is protandrous, which promotes
cross-pollination, this significant deviation from zero should reflect biparental inbreeding
or population substructure (Table 3.4). In total, 18 alleles were identified at the three loci, and
7 alleles were exclusive to a single population (private alleles). Of these, 5 were exclusive to
SIN and the other two to MEK and CAT (Table 3.4). The private alleles were at the extremes of
the allele size distribution and occurred at very low frequencies.
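As a rough illustration of how the inbreeding coefficient relates to the heterozygosities
reported in Table 3.4, the sketch below uses the simple ratio Fis = 1 - Ho/He. This ratio is an
assumption made only for illustration: the Fis values in Table 3.4 were produced by
population-genetics software and need not coincide exactly with it. The function name
`fis_from_heterozygosity` is hypothetical.

```python
def fis_from_heterozygosity(ho: float, he: float) -> float:
    """Simple ratio estimate of the within-population inbreeding coefficient.

    ho: observed heterozygosity, he: expected heterozygosity (gene diversity).
    Illustrative only; not the estimator used to build Table 3.4.
    """
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


# Ho and He reported for the SIN population with the EST-SSRs (Table 3.4)
fis_sin = fis_from_heterozygosity(0.333, 0.400)
print(f"Fis (ratio estimate) for SIN: {fis_sin:.4f}")
```

For SIN the ratio gives a value of about 0.17, in line with the Fis of 0.1691 reported in
Table 3.4; a positive value indicates a deficit of heterozygotes relative to Hardy-Weinberg
expectations.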
Table 3.4: Populations of Quercus suber sampled for the molecular genetic work with SSRs, including country
and population abbreviations, number of total alleles (NA), allelic richness (A), number of private alleles (PA),
expected (He) and observed (Ho) heterozygosities, and within-population inbreeding coefficients (Fis).
Marker set       EST-SSRs (3 loci)                          nuSSRs (5 loci)
Country   Code+  NA  A      PA  Fis         Ho     He       NA  A      PA  Fis         Ho     He
Portugal  ARR    10  3.315  -   0.0162 ns   0.522  0.531    34  6.165  3   0.1064 ns   0.538  0.601
Portugal  BUC    10  3.025  -   -0.0005 ns  0.422  0.422    34  5.997  1   0.0399 ns   0.553  0.576
Portugal  EST    10  3.167  -   0.2190 ns   0.375  0.479    32  5.703  -   0.1449 ns   0.506  0.591
Portugal  GER    10  3.228  -   -0.0840 ns  0.506  0.467    30  5.635  -   0.0586 ns   0.557  0.591
Portugal  MON    11  3.479  -   0.1769 ns   0.393  0.476    35  6.494  1   0.1579 *    0.521  0.617
Portugal  SIN    14  4.109  5   0.1691 **   0.333  0.400    33  6.158  1   -0.0665 ns  0.621  0.583
Algeria   ALG    10  3.202  -   -0.0055 ns  0.522  0.519    38  6.759  1   0.0920 ns   0.510  0.548
Spain     CAT    11  3.465  1   -0.0218 ns  0.611  0.598    29  5.354  -   0.0628 ns   0.533  0.569
Spain     HAZ    10  3.230  -   0.1188 ns   0.512  0.580    32  5.796  1   0.1181 ns   0.469  0.531
Morocco   TAZ    10  3.276  -   0.1083 ns   0.478  0.535    34  6.138  1   0.0261 ns   0.538  0.552
Morocco   KEN    11  3.365  -   -0.2336 ns  0.700  0.570    26  4.938  -   0.0433 ns   0.630  0.658
Tunisia   MEK    11  3.480  1   0.1689 ns   0.441  0.528    29  5.433  -   0.0348 ns   0.508  0.526
Italy     PUG    9   3.000  -   0.0403 ns   0.444  0.463    28  5.600  -   0.0642 ns   0.591  0.630
+See Fig. 3.4 for visual location on a map of Europe.
Significance levels after Bonferroni corrections: Ns – Not significant; ** Significant at p<0.01; * Significant at
p<0.05
For the nuSSRs, as previously shown by Burgarella et al. [142], the locus MSQ13 appears to
be particularly informative for detecting F1 hybrids between Q. suber and Q. rotundifolia
because the allele sizes of the two species do not overlap [88,142]. The locus was tested in
some individuals from each population (including all the individuals that were detected as
belonging to the introgressed lineages) and proved to be monomorphic at the expected allele
size for Q. suber. Thus, the locus was not used in the following analyses. Global evaluation
of the microsatellite data set
using Micro-Checker revealed no evidence of genotyping errors due to stuttering or large
allele dropout, but identified possible null alleles in a few populations for markers
QpZAG110, QrZAG20, QrZAG11 and QpZAG15 (Supporting Information 5 – Fig. 5.3, Fig.
5.4, Fig. 5.5). In addition, the QpZAG15 locus departed from Hardy-Weinberg
equilibrium (HWE) in 9 populations (data not shown). For both reasons, this locus was
removed from subsequent analyses. No linkage disequilibrium between the remaining loci
was detected (Supporting Information 6). The total number of alleles (NA) in each population
ranged from 26 (KEN) to 36 (ALG) and the allelic richness (A) from 4.938 to 6.759. The
gene diversity (expected heterozygosity over loci) ranged from 0.526 in MEK (Mekna) to
0.658 in KEN (Kenitra) (Table 3.4). These values are slightly higher than those obtained for
the EST-SSRs in every population, with the exception of the Spanish and Tunisian populations.
Only the population of MON departed significantly from HWE at the 0.05 significance level,
after Bonferroni correction (Table 3.4). Fis for the MON population was positive and
could reflect biparental inbreeding or population substructure (Table 3.4). However, because
Micro-Checker marginally detected null alleles at the QrZAG20 locus in this population, an
effect of null alleles cannot be ruled out. In total, 56 alleles were identified at the
five loci, and nine of them were private alleles. Unlike those of the EST-SSRs, the private
alleles were not concentrated in any particular population, apart from a slight excess in
ARR, which holds 3 of the nine alleles (Table 3.4). The private alleles were
mostly at the extremes of the allele size distribution and occurred at low frequencies.
No microsatellite (either nuSSR or EST-SSR) showed evidence of non-neutrality in the
Ewens–Watterson and Ewens–Watterson–Slatkin tests (data not shown).
3.2.2 Genetic differentiation among populations
Different coefficients of genetic differentiation among populations were estimated for both
types of SSR markers (Table 3.5). With the exception of Dest, all the coefficients displayed
higher values for the EST-SSRs than for the nuSSRs, and for both marker types G'ST and D were
consistently higher than FST, GST and RST (Table 3.5). GST and FST showed that differentiation
among populations was more than double in the case of EST-SSRs (GST=0.066 for EST-SSRs vs.
0.031 for nuSSRs; FST=0.071 for EST-SSRs vs. 0.032 for nuSSRs). Nevertheless, for the remaining
coefficients (RST, G'ST and D) the differences between the markers are not significant and,
interestingly, the value of actual differentiation among populations calculated according to
Jost's D was the same for both types of SSRs (Dest=0.077) (Table 3.5).
Table 3.5: Genetic statistics for EST-SSRs and nuSSRs. Number of alleles (NA),
allelic richness (A), observed (Ho) and expected (He) heterozygosities, FST
differentiation among populations according to Weir and Cockerham [125]; RST
differentiation among populations according to Slatkin [126], GST proportion of among-
population differentiation according to Nei & Chesser [130], G'ST standardized measure
of genetic differentiation according to Hedrick [129], and Dest estimator of actual
differentiation according to Jost [128]
Markers   Locus     NA  A       Ho     He     FST    RST     GST    G'ST   Dest
EST-SSRs  QrOST1    9   4.320   0.540  0.610  0.064  0.038   0.060  0.148  0.093
EST-SSRs  QpD12     3   2.999   0.403  0.474  0.139  0.142   0.130  0.229  0.114
EST-SSRs  QmAJ1     6   3.182   0.501  0.542  0.017  0.019   0.020  0.045  0.025
EST-SSRs  All       18  1.750   0.481  0.542  0.071  0.066   0.066  0.141  0.077
nuSSRs    QpZAG110  23  13.149  0.817  0.872  0.022  -0.004  0.023  0.169  0.149
nuSSRs    QpZAG9    7   3.124   0.138  0.142  0.014  0.015   0.014  0.016  0.003
nuSSRs    QrZAG20   5   3.894   0.449  0.557  0.035  0.042   0.036  0.081  0.047
nuSSRs    QrZAG7    10  6.625   0.689  0.756  0.057  0.138   0.055  0.204  0.158
nuSSRs    QrZAG11   11  5.755   0.580  0.628  0.010  0.032   0.015  0.042  0.027
nuSSRs    All       56  6.509   0.535  0.591  0.032  0.045   0.031  0.102  0.077
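To make the relation between GST, G'ST and D more concrete, the following sketch computes
the three coefficients for a single locus from per-population allele frequencies. It is
only an illustration under stated assumptions: it uses the basic parametric definitions
(GST after Nei, the standardization of Hedrick and the D of Jost) without the sample-size
corrections of the estimators reported in Table 3.5, and the function name
`differentiation` and the toy frequencies are hypothetical.

```python
def differentiation(freqs):
    """Return (GST, G'ST, D) for one locus.

    freqs: list of per-population allele-frequency lists, all in the same
    allele order and each summing to 1. Populations are weighted equally.
    Uncorrected, parametric formulas; illustrative only.
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    # HS: mean expected heterozygosity within populations
    hs = sum(1.0 - sum(p * p for p in pop) for pop in freqs) / k
    # HT: expected heterozygosity of the pooled (mean) allele frequencies
    mean_freq = [sum(pop[i] for pop in freqs) / k for i in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in mean_freq)
    gst = (ht - hs) / ht
    # Hedrick: GST divided by its maximum possible value given HS and k
    g_prime_st = gst * (k - 1 + hs) / ((k - 1) * (1.0 - hs))
    # Jost: actual differentiation
    d = ((ht - hs) / (1.0 - hs)) * (k / (k - 1))
    return gst, g_prime_st, d


# Toy example: two populations sharing no alleles, each with two equally
# frequent alleles (made-up frequencies, not thesis data)
toy = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
gst_toy, g_prime_toy, d_toy = differentiation(toy)
print(gst_toy, g_prime_toy, d_toy)
```

In this toy case the populations are completely differentiated, yet GST is only about one
third because within-population diversity is high, whereas G'ST and D both reach their
maximum. This is the reason why G'ST and D tend to exceed GST and FST for highly
polymorphic markers such as microsatellites.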
Tests of pairwise FST and RST were performed for the thirteen populations, for both
EST-SSRs and nuSSRs. There was a tendency towards higher values in the EST-SSR data
matrix, but not consistently so. Therefore, both SSR matrices were analysed together. The overall
genetic differentiation at the microsatellite loci was low (pairwise FST from 0.000 to 0.123),
though highly significant (p<0.001) after Bonferroni correction in 51 out of 78 pairs (Table
3.6). The RST matrix values closely resembled those of the FST matrix. The highest values
were obtained for the populations CAT and KEN, followed by PUG. The Dest values,
although similar to the FST and RST values, tended to be lower (Supporting Information 7).
Isolation by distance was tested using a Mantel test, but no correlation was found between
genetic differentiation and geographic distance among populations (r=0.1082, p=0.26).
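The logic of a Mantel test can be sketched as follows: the correlation between the
off-diagonal elements of the genetic and geographic distance matrices is compared with the
correlations obtained after randomly permuting the population labels of one matrix. This is
a minimal, one-tailed sketch written for illustration, not the software implementation used
for the analysis above; the function names, the number of permutations and the toy matrices
are assumptions.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(gen, geo, permutations=999, seed=0):
    """One-tailed Mantel test on two full symmetric distance matrices.

    Returns (r, p): the observed matrix correlation and the proportion of
    label permutations giving a correlation at least as large.
    """
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of gen together
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Toy data (not thesis data): four sites on a line, with genetic distance
# exactly proportional to geographic distance
coords = [0.0, 1.0, 3.0, 7.0]
geo_toy = [[abs(a - b) for b in coords] for a in coords]
gen_toy = [[2.0 * d for d in row] for row in geo_toy]
r_toy, p_toy = mantel(gen_toy, geo_toy, permutations=99, seed=1)
print(r_toy, p_toy)
```

Whole rows and columns are permuted together, rather than individual matrix cells, because
the pairwise distances that involve the same population are not independent of each other.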
Table 3.6: Pairwise FST (below the diagonal, with significance level) and RST (above the diagonal) values between every pair of populations.
Pop  ALG       ARR       BUC       CAT       HAZ       EST       GER       PUG       KEN       TAZ       MON       SIN       MEK
ALG  -         0.023     0.044     0.055     0.013     0.063     0.021     0.075     0.073     0.022     0.059     0.041     0.000
ARR  0.023***  -         0.000     0.052     0.020     0.004     0.000     0.055     0.065     0.034     0.001     0.008     0.046
BUC  0.043***  0.000 ns  -         0.094     0.056     0.007     0.004     0.060     0.108     0.043     0.013     0.009     0.067
CAT  0.052***  0.050***  0.086***  -         0.051     0.094     0.084     0.141     0.083     0.067     0.096     0.092     0.086
HAZ  0.013 ns  0.020 ns  0.053***  0.049***  -         0.041     0.025     0.087     0.043     0.021     0.050     0.065     0.031
EST  0.035***  0.003 ns  0.007 ns  0.086***  0.039***  -         0.000     0.056     0.092     0.034     0.006     0.035     0.044
GER  0.021***  0.000 ns  0.004 ns  0.077***  0.025***  0.000 ns  -         0.039     0.073     0.028     0.004     0.023     0.037
PUG  0.070***  0.052***  0.057***  0.123***  0.080***  0.053***  0.038***  -         0.101     0.055     0.072     0.100     0.084
KEN  0.068***  0.061***  0.097***  0.076***  0.042***  0.084***  0.068***  0.092***  -         0.052     0.120     0.132     0.093
TAZ  0.021 ns  0.033***  0.041***  0.063***  0.020 ns  0.033 ns  0.027***  0.052***  0.049***  -         0.068     0.082     0.036
MON  0.056***  0.001 ns  0.013 ns  0.087***  0.048***  0.006 ns  0.004 ns  0.067***  0.107***  0.063***  -         0.035     0.076
SIN  0.039***  0.008 ns  0.009 ns  0.084***  0.061***  0.034***  0.022***  0.091***  0.117***  0.076***  0.034***  -         0.069
MEK  0.000 ns  0.044***  0.063***  0.079***  0.030***  0.042***  0.036***  0.077***  0.085***  0.035***  0.070***  0.065***  -
ns=Not significant; *p<0.05; ** p<0.01, *** p<0.001
3.2.3 Population structure
The EST-SSRs and nuSSRs datasets were analysed separately and then merged to
determine the genetic structure of the populations (Fig. 3.8). For the EST-SSRs, in the software
STRUCTURE, the logarithm of the probability of the data [LnP(D)] as a function of K reached
a peak at K=3 (mean values: LnP(D)=-2055.3; var[LnP(D)]=131.2), which was confirmed
using Evanno's criterion [136] (Supporting Information 8 – Fig. S8.1). For the nuSSRs
dataset the LnP(D) reached a peak at K=4 (mean values: LnP(D)=-4749.3;
var[LnP(D)]=238.9) and then decreased, but the ΔK value was higher for K=3 than for
K=4 using Evanno's criterion [136] (Supporting Information 8 – Fig. S8.2). For the
combined dataset, the LnP(D) reached a peak at K=4 (mean values: LnP(D)=-6704.3;
var[LnP(D)]=303.8), but when ΔK was used to infer the number of clusters, K=2 presented
the highest value, although there was a second peak at K=4 (Supporting Information 8 – Fig.
S8.3).
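The ΔK criterion referred to above can be illustrated with a short sketch. It assumes one
common reading of the criterion, in which ΔK is the absolute second-order difference of the
mean LnP(D) across replicate runs, divided by the standard deviation of LnP(D) at that K. The
function name `delta_k` and the LnP(D) values below are made up for illustration and are not
the values obtained in this work.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute delta K for every K that has results for K-1 and K+1.

    lnp: dict mapping K to a list of LnP(D) values from replicate runs
    (at least two replicates per K are needed for the standard deviation).
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            # Absolute second-order rate of change of the mean LnP(D)
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp[k])
    return result


# Made-up replicate LnP(D) values for K = 1..4 (illustration only)
example_runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -80.5],
    3: [-79.0, -81.0],
    4: [-85.0, -90.0],
}
dk = delta_k(example_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs results for both K-1 and K+1, it cannot be computed for the smallest and
largest K tested. It can also select a different K from the one at which LnP(D) peaks, as
observed above for the nuSSRs and combined datasets.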
For the most likely run for each K, the r value was always low and below 1, indicating that
the sample locations were informative and helped greatly to find the population structure.
The results from the EST-SSRs and nuSSRs datasets differ slightly, which is not
unexpected considering the different types of SSRs (Fig.
3.8a and Fig. 3.8b). However, for both datasets each population can be almost completely
assigned to one of the clusters detected. At K=2, for the EST-SSRs, the populations CAT
and KEN can be assigned to one cluster (pink cluster), and ALG, ARR, BUC, EST, GER,
PUG, MON and SIN to the other cluster (blue cluster). The populations HAZ, TAZ and MEK
appear as a mixture of both clusters (Fig. 3.8a). For the nuSSRs dataset the groups are
different: MEK appears differentiated from the remaining populations in the blue cluster,
and ALG and HAZ appear as mixed populations, although slightly more similar to MEK (Fig. 3.8b).
Despite the validation of K=3 for the EST-SSRs, most of the populations appear as a
mixture of clusters. The population of CAT appears differentiated, alone in one of the clusters
(pink cluster), in the same way as SIN appears in another cluster (blue cluster), although some
individuals show a higher probability of belonging to the pink cluster, along with CAT (Fig.
3.8a). For the nuSSRs, at K=4, CAT also appears differentiated, alone in one cluster (blue
cluster). The Italian population, PUG, can also be placed alone in another cluster (green
cluster). The Portuguese populations (ARR, BUC, EST, GER, MON and SIN) can all also be
placed, to some extent, in a third cluster (pink cluster), and HAZ and TAZ appear as mixed
populations (Fig. 3.8b).
For a more robust analysis both matrices were merged together (Fig. 3.8c). At K=2 the
populations of ALG, CAT and MEK appear as part of the same pink cluster (79%, 88% and
91% of assignment probabilities, respectively), and the Portuguese populations and PUG as
part of the blue cluster (94% on average for the Portuguese populations and 85% for PUG).
HAZ, KEN and TAZ appear as mixed populations, with a slight tendency for the pink cluster
(Fig. 3.8c). At K=4 CAT differentiates from the other populations (75%) in a green cluster.
PUG and KEN appear as part of the same yellow cluster (79% and 74%, respectively). The
MEK population differentiates in another cluster (85% for the pink cluster) and HAZ and
ALG appear as mixed populations although more closely related to the MEK cluster. The
Portuguese populations appear all together in the blue cluster (83% on average), with
GER as the most mixed population in the group. TAZ is a mixed population, with
contributions from several clusters (Fig. 3.8c). The geographic distribution of the clusters
obtained by STRUCTURE for the combined SSRs dataset is presented in Fig. 3.9a for K=2
and in Fig. 3.9b for K=4.
To complement the analyses run in STRUCTURE, a GENELAND analysis was performed on
the merged dataset. The geographical distribution of the six clusters detected is shown in Fig.
3.9c. The first cluster (purple) was composed of the Portuguese populations (EST, GER,
BUC, MON, SIN and ARR); the second (orange) was composed only of KEN; the third
cluster (green) grouped the populations HAZ and TAZ; the fourth cluster (grey) included a
single population, CAT; the fifth cluster (blue) comprised the populations of ALG and MEK;
and the sixth cluster (red) contained only PUG.
AMOVA considering the clusters formed in the GENELAND and STRUCTURE analyses (Fig.
3.8 and Fig. 3.9) was always significant at the 0.001 level, but also
showed that the great majority of genetic variation was found within populations (94%).
The grouping into 6 clusters (the structure obtained with the software
GENELAND) gave the highest value of genetic differentiation
between groups (FCT=4.99) (Supporting Information 9).
Figure 3.8: Structure clustering results obtained for the a) EST-SSRs dataset (K=2 and 3); b) nuSSRs dataset
(K=2, 3 and 4); and c) combined dataset (K=2, 3 and 4). Populations are separated by black bars and identified
at the bottom. In all analyses, each distinct cluster is represented by a unique colour. Each individual is
represented by a thin bar and the colours on each vertical bar represent the probability of the individual
belonging to each cluster.
Figure 3.9: Geographic distribution of the clusters obtained by STRUCTURE and GENELAND:
a) combined dataset with Structure for K=2; b) combined dataset for Structure with K=4; and c)
combined dataset for GENELAND with K=6. Pie charts represent the assignment probabilities to
each cluster, and each cluster is colour coded. Pie charts sizes reflect the number of samples per
population (22-32). For a) and b) the colour codes are those used in Fig. 3.8c to code each
cluster.
4. Discussion
4.1 Differentiation and demographic patterns
Maternally inherited cpDNA markers yield valuable information about the genetic variability
associated with local populations or provenances [143]. The geographic patterns of
cpDNA haplotypes in many widespread European forest trees are therefore sometimes interpreted
on the assumption of survival in glacial refugia in southern and eastern Europe (outside
the limits of the Weichselian ice sheet) followed by postglacial migration. Some species appear
to have spread northwards and westwards from a single refuge, while others spread from
multiple refugia [48,54,57,70,144].
Analysis of the sequencing data from the cpDNA regions clearly shows (with the exception of the
rbcL fragment) the presence of two well-established cork oak lineages, the pure lineage and
the introgressed lineage (supported as well by the sequencing of the nuclear candidate gene).
The cpDNA pure lineage described here seems to be related to the “suber” lineage
previously described by Jiménez et al. [41], which is almost specific to cork oak populations
and may be considered the original and most widely distributed lineage in this species
[41,64]. The TrnS/PsbC fragment presented the highest resolution power for this
lineage, and three main haplotypes (A1, A2 and A3) are evident (Fig. 3.1 and Fig. 3.4). These
three sublineages have well-delimited geographic areas and possibly reflect refuge areas from
which expansion events putatively occurred after the last glaciation, which is somewhat
supported by the values from the mismatch distribution and neutrality tests (Table 3.3). The
previous works of López de Heredia et al. [52] and Lumaret et al. [43] indicated the
southern Iberian Peninsula as a possible refuge area, supported by palynological data.
Although the results found in this work are not conclusive enough to support this idea, the
sublineage A2 appears to have spread from a western Mediterranean area, consistent with a
refuge area in the Iberian Peninsula. Lumaret et al. [43], based on RFLP analysis of the
whole cpDNA, indicated two more possible refuge areas for cork oak, namely the
southern Italian Peninsula and North Africa, although this is not supported by the fossil record
[52]. It is difficult to determine the origin of sl A3 because this haplotype is distributed
throughout most of Peninsular Italy and North Africa (Algeria and Tunisia). However, any of
these geographic areas could have been a refuge for this lineage in cork oak, in agreement
with the results presented by Lumaret et al. [43]. Nevertheless, the presence of a haplotype (sl
A2) restricted to Sicily was unexpected. Although no previous work suggests
Sicily as a refuge area, the geographic restriction of this lineage and the fact that it is more or
less contemporaneous with the other two sublineages suggest that Sicily might indeed have
been a refuge area for cork oak.
It is also possible that the extensive introgression of Q. suber by Q. ilex indicates several
potential refuge areas. In fact, López de Heredia et al. [52] present north-eastern Spain
(Catalonia) as a potential refuge area resulting from extensive hybridization with Q. ilex.
The authors argue that the populations from this area present a predominant “ilex chlorotype”
that is very rare in holm oak. Therefore, the hypothesis that some populations withstood the
glacial conditions in this area (or any other area) by hybridizing cannot be discarded.
Although it is not possible to fully corroborate this, the CAT population shows
indications of a total replacement of the cpDNA pure lineage, which
suggests that the introgression events might be ancient and indeed reflect a glacial
refuge area. The same complete replacement of the cpDNA pure lineage appears to have
happened in MEK, and almost completely in HAZ.
However, more detailed inferences about the geographic origins of the haplotypes and their
migration scenarios will require additional sampling of populations and, most likely, other
genomic regions, because the low cpDNA variation could itself bias the identification of
glacial refugia for Quercus suber.
There are no previous works using sequences from the nuclear genome in this species.
However, in comparison with the results from the cpDNA sequences, the nuclear DNA
fragment seems to be more informative than the cpDNA. Its nucleotide diversity is
higher than that of the cpDNA fragments, as is the number of haplotypes found for the
pure lineage (Table 3.1 and Table 3.2). The analysis also shows a more complex geographic
distribution history for cork oak. As for the cpDNA, the results showed a
pure lineage composed of three sublineages, but the distribution of the sublineages is not as
geographically structured as it was for the cpDNA dataset (Fig. 3.6 and Fig. 3.7). The
nuDNA sublineage α3, whose cpDNA counterpart was restricted to Sicily,
extends to Lazio (Italy) and Tunisia. The sublineage α1, equivalent to the cpDNA
sublineage A1, was still the most frequent sublineage, but in the nuDNA it is not restricted to
the western part of the Mediterranean as it was in the cpDNA, extending,
although at lower frequency, to the eastern part of the Mediterranean. The same
was detected for the sublineage α2, which does not seem to be restricted to the eastern part of
the species distribution. These differences between cpDNA and nuDNA sequence data can be
explained by long-distance pollen dispersal and/or high levels of polymorphism. However,
the levels of polymorphism in the candidate gene (Table 3.1 and
Table 3.2) do not appear to be high enough to justify these differences, and long-distance
pollen dispersal, combined with the more limited acorn dispersal, seems to be a better
explanation. This is consistent with indirect methods based on measures of genetic
differentiation for nuclear versus cpDNA markers in oaks, which suggest that pollen flow is
much higher (by two orders of magnitude) than seed flow [145-147].
The pattern of three sublineages obtained in this work clearly contrasts with the one
previously found by Magri et al. [16]. Using cpDNA microsatellites, the authors analysed
cork oak populations throughout the species distribution range and found a strong geographical
structure characterized by five distinct haplotypes (Fig. 1.4). The cpDNA SSR data, combined
with palaeobotanical and geodynamic models, led the authors to suggest an early Cenozoic
origin for cork oak in the Iberian Peninsula and a subsequent genetic drift geographically
consistent with the Oligocene and Miocene break-up events [16] (Fig. 1.5). All these events
seem to have occurred without detectable cpDNA modifications over a time span of at least
15-25 million years. This is also somewhat inconsistent with the results found in this work.
Although most of the cpDNA fragments sequenced here showed no resolution, and therefore no
haplotype variation that could detect the three sublineages, the TrnS/PsbC fragment
shows that the sublineages are formed by a single mutational event (Fig. 3.1), which is
unlikely to date to the early Cenozoic.
4.2 Hybridization and introgression
Several proposals for Quercus taxonomy based on morphology have been presented [12,26].
Classifications have not been straightforward and, especially at the subgenus level, remain
uncertain. The taxonomic scheme proposed by Schwarz [26] is possibly the most accepted for
the classification of cork oak, and appears to be the most suitable in describing the
systematics of European oaks [19,31,32].
Sequencing of the cpDNA fragments for the eleven Quercus species used in this study showed
that, with the exception of the rbcL fragment, which presented no sequence variation among the
11 species, the remaining 4 cpDNA fragments (matK, TrnS/PsbC, TrnL-F and
TrnH/PsbA) were in general able to distinguish the 4 subgenera (or subsections) proposed by
Schwarz [26] (Quercus, Erythrobalanus, Sclerophyllodrys and Cerris) (Fig. 3.1a,
Fig. 3.2, Fig. 3.3 and Fig. 3.5). However, the phylogenetic relationships between the subgenera
are inconsistent among fragments, and it is not possible to make accurate inferences about those
relationships. Also, in accordance with the latest work of Piredda et al. [29], the
genus Quercus appears poorly suited to barcoding with the most common cpDNA
sequences, since most of the species analysed within the same subgenus share the same
cpDNA haplotype. The low rate of cpDNA variation and hybridization events are
likely to be the cause [29].
The nuclear DNA, however, has much more discrimination power than the cpDNA. In fact, the
EST 2T13 fragment supports the recognition of the subgenera Sclerophyllodrys, Cerris,
Erythrobalanus and Quercus, in agreement with the works of Bellarosa et al., which
also used fragments of the nuclear genome [27,28]. Also, the EST
2T13 fragment distinguishes all the species analysed and, although this issue requires further
study, it supports the idea that nuclear DNA might be a useful supplementary barcoding tool
in difficult genera such as Quercus.
The complex evolutionary history of the Mediterranean evergreen oaks has already been
addressed by other authors, who showed that Q. suber, Q. ilex and Q. coccifera share
haplotypes as a result of successful hybridization and introgression of Q. suber by Q. ilex
[41,43,52]. However, those results were based on RFLP analysis of the cpDNA only,
with no insight into the nuclear genome. The sequencing of the cpDNA fragments immediately
reveals the introgression events in Q. suber. Since the subgenera Sclerophyllodrys and
Cerris are clearly distinguishable in the phylogenetic trees constructed (Fig. 3.1, Fig. 3.2, Fig.
3.3 and Fig. 3.5), the presence of cork oak samples in both subgenera clearly points to
introgression of Q. suber, allowing the identification of a pure lineage of cork oak haplotypes
in the subg. Cerris, and an introgressed lineage in the subg. Sclerophyllodrys.
The distribution of the cpDNA introgressed lineage appears restricted to the western area of
the species distribution and peripheral to the distribution of the pure lineage
(specifically the sublineage A3). Although it is not possible to date the introgression
events precisely, some may in fact reflect glacial refugia in this area of the distribution
[possibly in north-eastern Spain (Catalonia) and/or Morocco], where cork oak populations
survived while undergoing introgression with Q. rotundifolia. During postglacial range
expansion, the rapid expansion of cork oak from the pure lineage refuge may have limited the
expansion of the introgressed lineage, forming the mixed populations that present both haplotype
lineages (Fig. 3.4). On the other hand, the analysis of the phylogenetic trees does not allow
the hypothesis of more recent or current introgression events to be ruled out (Fig. 3.1, Fig. 3.2,
Fig. 3.3). Hybridization is still happening, most frequently in central and eastern
Iberia, with the first-generation hybrids between Q. suber and Q. ilex being easily identified
in the field [52].
The same introgressed lineage seems to be present in the nuclear DNA, although there is no
previous reference to it. However, the cork oak samples belonging to the introgressed lineage
are not always the same in both genomes. That is, some of the samples of the cpDNA
introgressed lineage present a nuclear genome of the pure lineage while others present evidence
of a nuclear introgressed lineage, and some samples with cpDNA belonging to the
pure lineage present a nuclear genome from the introgressed lineage.
The flowering phenology and present-day ecology of the two species suggest that pollen flow
might be expected to be predominantly from Q. suber into Q. ilex. Quercus suber performs
better than Q. ilex as a pollen parent in interspecific crosses [45]. Molecular evidence provides
support for this expectation [18,43,71]. This would explain the cork oak samples
that present an introgressed cpDNA but whose nuclear fragment belongs to the pure
lineage (see, for example, samples TAZ 1 or HAZ 5). However, the reverse also seems to
happen, because samples were found that present a cpDNA from the pure lineage and a
nuclear DNA belonging to the introgressed lineage (see TOL 3 or LAZ 2). Interestingly, some
samples (see GER 5 or TAZ 2) present both cpDNA and nuDNA fragments of the
introgressed lineage at the same time.
The fact that, in the subg. Sclerophyllodrys, the species Q. ilex and Q. coccifera present the
same haplotypes was previously suggested by some authors to be a result of introgression
between these species or of incomplete lineage sorting [41,64]. The same happens in the subg.
Cerris, between Q. cerris and Q. suber. The lack of resolution of the cpDNA might argue for
incomplete lineage sorting, but previous authors suggested introgression between these
closely related species [16]. Although Quercus suber and Quercus cerris belong to the same
taxonomic group, subgenus Cerris [17,30], they are morphologically quite distinct and have
different geographical and ecological ranges. The natural distribution range of Q. cerris
extends from central and southern Europe to Asia Minor. However, in peninsular Italy and in
Sicily the ranges of Q. cerris and Q. suber overlap. In fact, Q. crenata is hypothesized to be a
hybrid between Q. suber and Q. cerris, although other authors have instead considered it a
fixed species.
The analyses of the cpDNA datasets show that Quercus cerris and Quercus suber share the
same haplotype for most of the fragments, which could point to incomplete lineage
sorting. However, the higher resolution power of the TrnS/PsbC fragment (Fig. 3.1) places
the Q. cerris haplotype as highly derived from the sublineage A3, in the eastern Mediterranean
area. Although this cpDNA fragment differentiates the species, it does not exclude possible,
and perhaps somewhat ancient, hybridization events between Q. suber sl A3 and Q. cerris.
The nuclear fragment shows that Q. cerris shares the same haplotype as Q. suber samples
from sublineage α1, one of the lineages from the eastern Mediterranean area. Considering
both types of markers, although the cpDNA does not immediately suggest introgression
events between these species, the nuclear candidate gene does not discriminate between this
hypothesis and incomplete lineage sorting. Retention of ancestral polymorphism
therefore also needs to be considered, given that introgression between
these species could not be confirmed. These two hypotheses might be confounded with each
other, particularly when contemporary introgression cannot be discarded, owing to the presence
of both species in some areas.
4.3 Genetic diversity and population structure
The populations for the SSR analyses were selected on the basis of the sequencing
results and from throughout the entire range, in order to maximize the chances of surveying a
large part of the species' genetic diversity.
Recent work on genetic diversity and population structure in several species has
used a combined analysis of EST and genomic SSRs. Although a small amount of work has
been done in cork oak with nuSSRs, there were no previous studies with EST-SSRs. Tests for
neutrality indicate that selection did not differentially affect the performance of EST-SSRs and
nuSSRs in characterizing cork oak populations. Even though EST-SSRs are potentially exposed to
selection, only a small percentage show evidence of positive selection [91,93]. It is
nevertheless important to conduct selective neutrality tests on EST-SSRs before using them in
population genetics analyses, precisely because that small percentage may be under selection
[91,93]. Also, the results show that the genetic diversity measured with EST-SSRs is similar
to that measured with nuSSRs, and the loci retained showed no strong evidence of null alleles
or other genotyping errors. The evidence therefore suggests that EST-SSRs are appropriate
markers for population genetics studies in cork oak.
The population differentiation found, although low, was significant and is, at least for the
EST-SSRs (FST=0.071; RST=0.066; Dest=0.077), close to the lower limit of the range of average
values (0.07-0.09) expected for long-lived, wind-pollinated woody species (Table 3.5 and
Table 3.6) [22]. The overall values of population differentiation found here (RST=0.066 for
EST-SSRs vs. 0.045 for nuSSRs; FST=0.071 for EST-SSRs vs. 0.032 for nuSSRs) (Table 3.5) were
considerably higher than those reported by Coelho et al. [22] and Simões de Matos (F. Simões
de Matos, PhD thesis, INETI Lisbon, 2007) (FST=0.0172 and FST=0.02/RST=0.013, respectively),
but those studies considered only Portuguese populations. Consistently, pairwise FST and RST
values between Portuguese populations tend to be lower and non-significant (Table 3.6),
denoting the small differentiation between these populations that was also found by Coelho et
al. [22] and Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). In
agreement with these results, we also found that most of the species' diversity (94%) lies
within rather than among populations.
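To illustrate how differentiation statistics of this kind relate to allele frequencies, the
following minimal Python sketch computes the mean within-population heterozygosity (HS), the
total heterozygosity (HT), Nei's GST and Jost's D for a single locus. It is only an
illustration under simplifying assumptions: the allele frequencies are hypothetical,
populations are weighted equally and no sample-size correction is applied, so its output is
not directly comparable to the FST, RST and Dest estimates reported in this work.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def differentiation(pops):
    """Return (HS, HT, GST, D) for one locus.

    pops is a list of allele-frequency lists, one per population, with the
    alleles in the same order. Populations are weighted equally and no
    sample-size correction is applied (simplifying assumptions).
    """
    n = len(pops)
    if n < 2:
        raise ValueError("at least two populations are required")
    # Mean within-population heterozygosity.
    hs = sum(heterozygosity(p) for p in pops) / n
    # Heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(column) / n for column in zip(*pops)]
    ht = heterozygosity(mean_freqs)
    if ht == 0.0:
        raise ValueError("locus is monomorphic in the total sample")
    gst = (ht - hs) / ht
    d = ((ht - hs) / (1.0 - hs)) * (n / (n - 1))
    return hs, ht, gst, d


# Hypothetical allele frequencies at one biallelic locus in two populations.
example = [[0.6, 0.4], [0.4, 0.6]]
hs, ht, gst, d = differentiation(example)
print(f"HS={hs:.3f} HT={ht:.3f} GST={gst:.3f} D={d:.3f}")
```

In this made-up example GST is 0.04, meaning that 96% of the gene diversity lies within
populations, a partition of the same kind as the within-population share discussed above.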
The locus MsQ13 was previously suggested to be particularly informative for detecting F1
hybrids between Q. suber and Q. rotundifolia because allele sizes do not overlap [88,142].
The locus was tested here in individuals of every population, including all the individuals
detected as belonging to the introgressed lineages (either cpDNA or nuDNA), yet it was
monomorphic at the allele size expected for Q. suber. Nevertheless, the work of Burgarella et
al. [142] clearly demonstrates the difficulty of detecting introgressed hybrids in these
species, even though the microsatellite loci chosen for their work were highly differentiated
between species and had good diagnostic power. In addition, although an initial attempt was
made to recreate the SSR battery used in that work, some loci were discarded because there
was no amplification product, the scoring was extremely doubtful or there was a high
deviation from HWE. In the future, a more targeted choice of easily reproducible markers may
be required, as well as investment in sampling some key holm oak populations for comparative
purposes in detecting hybrids.
Isolation by distance was tested, but no correlation was found between genetic
differentiation and geographic distance among populations throughout the Mediterranean.
However, in a previous study of Spanish cork oak populations, Ramírez-Valiente et al. [148],
using the same nuSSR battery as in this work (with the exception of QpZAG46, which had no
clear scoring here), found that FST values for the neutral markers were correlated with
geographic distance. In the same study the authors also found an association between leaf
size and the microsatellite QpZAG46, which suggests a possible linkage between QpZAG46 and
genes affecting leaf size [148].
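Isolation by distance is commonly assessed with a Mantel test, which correlates a matrix of
pairwise genetic differentiation with a matrix of pairwise geographic distances and evaluates
significance by permuting population labels. The sketch below shows the general idea in
Python under stated assumptions: the function names, the four populations and their distance
values are hypothetical, and a simple one-sided test on the Pearson correlation is used. It
is not the procedure or software used in this work.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the values above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, p) for a positive correlation."""
    rng = random.Random(seed)
    n = len(gen)
    geo_vec = upper_triangle(geo)
    observed = pearson(upper_triangle(gen), geo_vec)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the genetic matrix together.
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geo_vec) >= observed - 1e-12:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical example: four populations placed along a line (positions in km).
positions = [0.0, 100.0, 300.0, 700.0]
geo = [[abs(a - b) for b in positions] for a in positions]
# Made-up genetic distances that increase linearly with geographic distance.
gen = [[0.0001 * value for value in row] for row in geo]
r, p = mantel(gen, geo, permutations=199, seed=1)
print(f"r={r:.3f} p={p:.3f}")
```

In practice the genetic matrix is often transformed, for example to FST/(1-FST) as proposed
by Rousset [131], and geographic distance may be log-transformed. With only a handful of
populations the number of distinct permutations is small, which limits the power of the test.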
The population structure results from the EST-SSR and nuSSR datasets differ slightly, which is
not completely unexpected considering the different types of SSRs (Fig. 8a and Fig. 8b).
However, when the datasets are merged, which is expected to yield the most consistent
information, the results from STRUCTURE and GENELAND, although not in complete agreement,
show the same emerging pattern: 1) The Portuguese populations grouped together in one
cluster. There was no differentiation between the Portuguese populations, in agreement with
the results of Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). This
might be explained by the geographic proximity of these populations, with gene flow expected
to homogenize their allele frequencies. It also agrees with the low and mostly
non-significant pairwise FST and RST values found between these populations; 2) Catalonia is
clearly the most differentiated population. The results always placed this population alone
in its own cluster, and it showed the highest pairwise FST and RST values.
Overall, the GENELAND results provided a more plausible scenario regarding the distribution of
the clades. In the STRUCTURE results, the population of Puglia (PUG) appeared in clusters
that are difficult to explain: at K=2 it grouped with the Portuguese populations, whereas at
K=4 it grouped with Kenitra (KEN), even though KEN and PUG fall in opposite clusters at K=2.
While STRUCTURE groups KEN and PUG in one cluster, GENELAND places each of these populations
in a cluster of its own (Figs. 8 and 9). The small number of SSRs and the low levels of
differentiation might explain the incoherent distribution of some clades in the STRUCTURE
analysis. However, the finding that GENELAND identified a greater number of clusters than
STRUCTURE (six versus two/four), and that independent GENELAND runs identified the same
clusters with similar posterior probabilities, could indicate that the algorithm employed in
GENELAND is more sensitive in detecting weak spatial clusters when differentiation is low. A
similar finding was recently reported by Wellenreuther et al. [149] in a study of Ischnura
elegans, the blue-tailed damselfly.
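When STRUCTURE is run for several values of K, one widely used heuristic for choosing K is
the ΔK statistic of Evanno et al. [136], which looks for the largest second-order change in
the log probability of the data, ln P(D), scaled by its standard deviation across replicate
runs. The sketch below illustrates the calculation in Python. The ln P(D) values are
hypothetical placeholders rather than results from this work, and the version shown works on
the means of replicate runs, which is one common reading of the statistic.

```python
import statistics


def delta_k(lnp_runs):
    """Return {K: delta K} for every K whose neighbours K-1 and K+1 were also run.

    lnp_runs maps each K to a list of ln P(D) values from replicate runs
    (at least two replicates per K, with non-zero spread).
    """
    means = {k: statistics.mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order difference of the mean ln P(D).
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / statistics.stdev(lnp_runs[k])
    return result


# Hypothetical ln P(D) values for three replicate runs at K = 1..4.
lnp_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4800.0, -4803.0, -4797.0],
    3: [-4780.0, -4790.0, -4770.0],
    4: [-4775.0, -4795.0, -4755.0],
}
scores = delta_k(lnp_runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With these made-up values ΔK peaks at K=2. The statistic cannot be computed for the smallest
and largest K tested, and a weak or multimodal ΔK profile, as might be expected under low
differentiation, should be interpreted with caution.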
5. Final Remarks
Extending over a surface of about 2.2 million ha in seven Mediterranean countries (Portugal,
Spain, Algeria, Morocco, Italy, Tunisia and France), cork oak forest landscapes represent one
of the best examples of the multi-functional role of forests, maintained over thousands of
years while promoting high biodiversity levels. Well-managed cork oak forests provide valuable
ecological functions such as soil conservation, buffering against climate change and
desertification, water table recharge and run-off control, and they contribute to the
survival of many species. Cork oak trees are extremely important in ensuring that these
ecosystems maintain their ecological balance and that the forest is not harmed. These
semi-natural woodlands also provide a valuable income to local populations, both directly
through the harvesting of cork and indirectly by providing other economically valuable
resources such as grazing grounds for animals and, above all, the maintenance of an
ecological balance.
Mediterranean regions have been facing a growing number of extreme weather events due to
rapid climate change. Assessing the impacts of climate extremes on cork oak trees can help in
planning better forest management practices for coping with future climate change, and in
achieving the sustainable development of ecosystems and societies within the Mediterranean
area.
Studying the consequences of past climate shifts on biodiversity is among the best tools to
validate models of the ecological and evolutionary consequences of future changes. Advances
in DNA analysis are allowing the reconstruction of the evolutionary history of forest trees.
This work presents a first molecular approach assessing the potential of a combined analysis
of chloroplast and nuclear DNA markers, using both sequence data and microsatellites. The
importance of such synergistic analyses is highlighted when addressing questions such as the
evolutionary history and geographic patterns of population diversity.
Overall, the three major objectives of this work were achieved. It was possible to gather
valuable information on the evolutionary history of Quercus suber. Sequence data allowed the
detection of two major haplotype lineages, consistent in both the nuclear and chloroplast
genomes. Within the pure lineage, three sublineages and some signs of recent population
expansion were unveiled. It is hypothesised that during the coldest periods cork oak survived
only in areas with a more benign climate (possibly three refugia), from where it might have
colonized its current distribution area after the warming at the end of the last glacial
period.
It was also possible to explore the phylogenetic relationships between cork oak and other
Quercus species from all four recognized subgenera. This also helped in detecting the
introgressed lineage in cork oak, which results from several hybridization events with Q.
ilex. Although some of the hybridization events appear to be old, current hybridization
cannot be ruled out. Moreover, although hybridization with and DNA introgression from Q. ilex
have already been reported by other authors, this work made evident that the introgression
events are also detectable in the nuclear genome.
Finally, microsatellites allowed the identification of some differentiation and structuring in
some key cork oak populations. Although the differentiation and the clusters found might be
somewhat weak, adding microsatellites and populations will possibly strengthen the results
found here.
6. Bibliographic References
1
Food and Agriculture Organization of the United Nations (FAO) (2011) State of the
World's Forests. FAO World Forests
2
Petit, R. J. and Hampe, A. (2006) Some Evolutionary Consequences of Being a Tree.
Annual Review of Ecology, Evolution, and Systematics. 37, 187-214
3
Oldfield, S. et al. (1998) The World List of Threatened Trees, Cambridge, World
Conservation Press
4
Hansen, A. J. et al. (2001) Global Change in Forests: Responses of Species,
Communities, and Biomes. BioScience. 51, 765-779
5
Food and Agriculture Organization of the United Nations (FAO) (2010) Global Forest
Resources Assessment 2010. Main report
6
González-Martínez, S. C. et al. (2006) Forest-tree population genomics and adaptive
evolution. The New phytologist. 170, 227-38
7
Schaal, B. A. et al. (1998) Phylogeographic studies in plants: problems and prospects.
Molecular Ecology. 7, 465-474
8
Petit, R. J. et al. (2005) Climate changes and tree phylogeography in the
Mediterranean. Taxon. 54, 877-885
9
Avise, J. C. et al. (1987) Intraspecific Phylogeography: The Mitochondrial DNA
Bridge Between Population Genetics and Systematics. Annual Review of Ecology and
Systematics. 18, 489-522
10
Avise, J. C. (2009) Phylogeography: retrospect and prospect. Journal of
Biogeography. 36, 3-15
11
Beheregaray, L. B. (2008) Twenty years of phylogeography: the state of the field
and the challenges for the Southern Hemisphere. Molecular ecology. 17, 3754-74
12
Nixon, K. C. (1993) Infrageneric classification of Quercus (Fagaceae) and typification
of sectional names. Annales Des Sciences Forestières. 50, 25s-34s
13
Nixon, K. C. (2006) Global and Neotropical Distribution and Diversity of Oak (genus
Quercus) and Oak Forests. In Ecology and conservation of neotropical montane oak
forests 185 (Kappelle, M., ed), pp. 3-13, Springer-Verlag
14
Pausas, J. G. et al. (2006) Regeneration of a marginal Quercus suber forest in the
eastern Iberian Peninsula. Journal of Vegetation Science. 17, 729
15
Elena-Rosselló, J. A. and Cabrera, E. (1996) Isozyme Variation in Natural Populations
of Cork-Oak (Quercus suber L.). Population Structure, Diversity, Differentiation and
Gene Flow. Silvae Genetica. 45, 229-235
16
Magri, D. et al. (2007) The distribution of Quercus suber chloroplast haplotypes
matches the palaeogeographical history of the western Mediterranean. Molecular
Ecology. 16, 5259-5266
17
Tutin, T. G. et al. (1993) Flora Europaea, Volume 1, (2nd edn) Cambridge University
Press
18
Toumi, L. and Lumaret, R. (1998) Allozyme variation in cork oak (Quercus suber L.):
the role of phylogeography and genetic introgression by other Mediterranean oak
species and human activities. Theoretical and Applied Genetics (TAG). 97, 647-656
19
Toumi, L. and Lumaret, R. (2001) Allozyme characterisation of four Mediterranean
evergreen oak species. Biochemical systematics and ecology. 29, 799-817
20
Pausas, J. G. et al. (2009) The tree. In Cork Oak Woodlands on the Edge. Ecology,
Adaptive Management, and Restoration (1st edn) (Aronson, J. et al., eds), pp. 11-21,
Island Press
21
Carrión, J. S. et al. (2000) Past distribution and ecology of the cork oak (Quercus
suber) in the Iberian Peninsula: a pollen-analytical approach. Diversity and
Distributions. 6, 29-44
22
Coelho, A. C. et al. (2006) Genetic Diversity of Two Evergreen Oaks [Quercus suber
(L.) and Quercus ilex subsp. rotundifolia (Lam.)] in Portugal using AFLP Markers.
Silvae Genetica. 55, 146-152
23
Soto, A. et al. (2007) Differences in fine-scale genetic structure and dispersal in
Quercus ilex L. and Q. suber L.: consequences for regeneration of mediterranean open
woods. Heredity. 99, 601-7
24
Pulido, F. J. et al. (2001) Size structure and regeneration of Spanish holm oak Quercus
ilex forests and dehesas: effects of agroforestry use on their long-term sustainability.
Forest Ecology and Management. 146, 1-13
25
Pons, J. and Pausas, J. G. (2006) Oak regeneration in heterogeneous landscapes: The
case of fragmented Quercus suber forests in the eastern Iberian Peninsula. Forest
Ecology and Management. 231, 196-204
26
Schwarz, O. (1964) Quercus L. In Flora Europaea, Volume 1 (2nd edn) (Tutin, T. G.
et al., eds), pp. 71-76, Cambridge University Press
27
Bellarosa, R. et al. (2005) Utility of ITS sequence data for phylogenetic reconstruction
of Italian Quercus spp. Molecular Phylogenetics and Evolution. 34, 355-370
28
Bellarosa, R. et al. (1990) Ribosomal RNA genes in Quercus spp. (Fagaceae). Plant
Systematics and Evolution. 172, 127-139
29
Piredda, R. et al. (2011) Prospects of barcoding the Italian wild dendroflora: oaks
reveal severe limitations to tracking species identity. Molecular ecology resources. 11,
72-83
30
Manos, P. S. et al. (1999) Phylogeny, Biogeography, and Processes of Molecular
Differentiation in Quercus subgenus Quercus (Fagaceae). Molecular Phylogenetics and
Evolution. 12, 333-349
31
Manos, P. S. et al. (2001) Systematics of Fagaceae: Phylogenetic test of reproductive
trait evolution. International journal of plant sciences. 162, 1361-1379
32
Kress, W. J. and Erickson, D. L. (2007) A two-locus global DNA barcode for land
plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.
PloS one. 2, e508
33
Cowan, R. S. et al. (2006) 300,000 Species to Identify: Problems, Progress, and
Prospects in DNA Barcoding of Land Plants. Taxon. 55, 611
34
Hajibabaei, M. et al. (2007) DNA barcoding: how it complements taxonomy,
molecular phylogenetics and population genetics. Trends in genetics. 23, 167-72
35
Chase, M. W. et al. (2005) Land plants and DNA barcodes: short-term and long-term
goals. Philosophical transactions of the Royal Society of London. Series B, Biological
sciences. 360, 1889-95
36
Fazekas, A. J. et al. (2008) Multiple multilocus DNA barcodes from the plastid
genome discriminate plant species equally well. PloS one. 3, e2802
37
Chase, M. W. et al. (2007) A proposal for a standardised protocol to barcode all land
plants. Taxon. 56, 295-299
38
Lahaye, R. et al. (2008) DNA barcoding the floras of biodiversity hotspots. PNAS.
105, 2923-8
39
CBOL Plant Working Group (2009) A DNA barcode for land plants. PNAS. 106, 12794-7
40
Neubig, K. M. et al. (2008) Phylogenetic utility of ycf1 in orchids: a plastid gene more
variable than matK. Plant Systematics and Evolution. 277, 75-84
41
Jiménez, P. et al. (2004) High variability of chloroplast DNA in three Mediterranean
evergreen oaks indicates complex evolutionary history. Heredity. 93, 510-5
42
Lumaret, R. et al. (2002) Phylogeographical variation of chloroplast DNA in holm oak
(Quercus ilex L.). Molecular ecology. 11, 2327-36
43
Lumaret, R. et al. (2005) Phylogeographical Variation of Chloroplast DNA in Cork
Oak (Quercus suber). Annals of Botany. 96, 853-861
44
Rushton, B. S. (1993) Natural hybridization within the genus Quercus L. Annals of
forest science. 50, 73-90
45
Boavida, L. C. et al. (2001) Sexual reproduction in the cork oak (Quercus suber L). II.
Crossing intra- and interspecific barriers. Sexual Plant Reproduction. 14, 143-152
46
Bennett, K. D. (1997) Evolution and Ecology: the pace of life, Cambridge University
Press.
47
French, H. M. (2007) The Periglacial Environment, (3rd edn) Longman.
48
Comes, H. P. and Kadereit, J. W. (1998) The effect of Quaternary climatic changes on
plant distribution and evolution. Trends in Plant Science. 3, 432-438
49
Hewitt, G. M. (1999) Post-glacial re-colonization of European biota. Biological
Journal of the Linnean Society. 68, 87-112
50
Willis, K. J. et al. (2000) The Full-Glacial Forests of Central and Southeastern Europe.
Quaternary Research. 53, 203-213
51
Palmé, A. E. et al. (2003) Postglacial recolonization and cpDNA variation of silver
birch, Betula pendula. Molecular ecology. 12, 201-12
52
López de Heredia, U. et al. (2007) Molecular and palaeoecological evidence for
multiple glacial refugia for evergreen oaks on the Iberian Peninsula. Journal of
Biogeography. 34, 1505-1517
53
Willis, K. J. and Van Andel, T. H. (2004) Trees or no trees? The environments of
central and eastern Europe during the Last Glaciation. Quaternary Science Reviews.
23, 2369-2387
54
Hewitt, G. M. (1996) Some genetic consequences of ice ages, and their role in
divergence and speciation. Biological Journal of the Linnean Society.
55
Hewitt, G. M. (2000) The genetic legacy of the Quaternary ice ages. Nature. 405, 907-913
56
Petit, R. J. et al. (1997) Chloroplast DNA footprints of postglacial recolonization by
oaks. PNAS. 94, 9996-10001
57
Taberlet, P. et al. (1998) Comparative phylogeography and postglacial colonization
routes in Europe. Molecular ecology. 7, 453-64
58
Konnert, M. and Bergmann, F. (1995) The geographical distribution of genetic
variation of silver fir (Abies alba, Pinaceae) in relation to its migration history. Plant
Systematics and Evolution. 196, 19-30
59
Dumolin-Lapègue, S. et al. (1997) Phylogeographic structure of white oaks throughout
the European continent. Genetics. 146, 1475-87
60
Pollard, D. and Barron, E. J. (2003) Causes of model-data discrepancies in European
climate during Oxygen Isotope Stage 3 with insights from the last glacial maximum.
Quaternary Research. 59, 108-113
61
Barron, E. and Pollard, D. (2002) High-Resolution Climate Simulations of Oxygen
Isotope Stage 3 in Europe. Quaternary Research. 58, 296-309
62
Kvacek, Z. and Walther, H. (1989) Paleobotanical studies in Fagaceae of the European
Tertiary. Plant systematics and Evolution. 162, 213-229
63
Dumolin, S. et al. (1995) Inheritance of chloroplast and mitochondrial genomes in
pedunculate oak investigated with an efficient PCR method. Theoretical and Applied
Genetics. 91, 1253-1256
64
López de Heredia, U. et al. (2005) The Balearic Islands: a reservoir of cpDNA genetic
variation for evergreen oaks. Journal of Biogeography. 32, 939-949
65
Kremer, A. and Petit, R. J. (1993) Gene diversity in natural populations of oak species.
Annals of forest science. 50, 186-202
66
Wright, S. (1931) Evolution in Mendelian Populations. Genetics. 16, 97-159
67
Thompson, J. D. (2005) Plant Evolution in the Mediterranean, Oxford University
Press.
68
Fineschi, S. et al. (2000) Chloroplast DNA polymorphism reveals little geographical
structure in Castanea sativa Mill. (Fagaceae) throughout southern European countries.
Molecular Ecology. 9, 1495-1503
69
Petit, R. J. et al. (2002) Chloroplast DNA variation in European white oaks:
Phylogeography and patterns of diversity based on data from over 2600 populations.
Forest Ecology and Management. 156, 5-26
70
Palmé, A. E. and Vendramin, G. G. (2002) Chloroplast DNA variation, postglacial
recolonization and hybridization in hazel, Corylus avellana. Molecular ecology. 11,
1769-79
71
Elena-Rosselló, J. A. et al. (1992) Evidence for hybridization between sympatric
holm-oak and cork-oak in Spain based on diagnostic enzyme markers. Vegetation. 99,
115-118
72
Hamza, N. B. (2010) Cytoplasmic and nuclear DNA markers as powerful tools in
populations' studies and in setting conservation strategies. African Journal of
Biotechnology. 9, 4510-4515
73
Levy, F. et al. (1996) A population genetic analysis of chloroplast DNA in Phacelia.
Heredity. 76, 143-55
74
Taberlet, P. et al. (1991) Universal primers for amplification of three non-coding
regions of chloroplast DNA. Plant Molecular Biology. 17, 1105-1109
75
Aoki, K. et al. (2003) Intraspecific sequence variation of chloroplast DNA among the
component species of evergreen broad-leaved forests in Japan. Journal of plant
research. 116, 337-44
76
Baraket, G. et al. (2008) Chloroplast DNA analysis in Tunisian fig cultivars (Ficus
carica L.): Sequence variations of the trnL-trnF intergenic spacer. Biochemical
Systematics and Ecology. 36, 828-835
77
Rathbone, D. A. et al. (2007) Microsatellite and cpDNA variation in island and
mainland populations of a regionally rare eucalypt, Eucalyptus perriniana
(Myrtaceae). Australian journal of botany. 55, 513-520
78
Kress, W. J. et al. (2005) Use of DNA barcodes to identify flowering plants. PNAS.
102, 8369-8374
79
Nishizawa, T. and Watano, Y. (2000) Primer pairs suitable for PCR-SSCP analysis of
chloroplast DNA in angiosperms. Journal of Phytogeography and Taxonomy. 48, 63-66
80
Calonje, M. et al. (2008) Non-coding nuclear DNA markers in phylogenetic
reconstruction. Plant Systematics and Evolution. 282, 257-280
81
Hare, M. P. (2001) Prospects for nuclear gene phylogeography. Trends in Ecology &
Evolution. 16, 700-706
82
Bhargava, A. and Fuentes, F. F. (2010) Mutational dynamics of microsatellites.
Molecular biotechnology. 44, 250-66
83
Goldstein, D. B. and Pollock, D. D. (1997) Launching Microsatellites: A Review of
Mutation Processes and Methods of Phylogenetic Inference. Journal of Heredity. 88,
335-342
84
Qureshi, S. N. et al. (2004) EST-SSR: A New Class of Genetic Markers in Cotton. The
Journal of Cotton Science. 8, 112-123
85
Oliveira, E. J. et al. (2006) Origin, evolution and genome distribution of
microsatellites. Genetics and Molecular Biology. 29, 294-307
86
Lazrek, F. et al. (2009) The use of neutral and non-neutral SSRs to analyse the genetic
structure of a Tunisian collection of Medicago truncatula lines and to reveal
associations with eco-environmental variables. Genetica. 135, 391-402
87
Hornero, J. et al. (2001) Testing the Conservation of Quercus spp. Microsatellites in
the Cork Oak, Q. suber L. Silvae Genetica. 50, 3-4
88
Soto, A. et al. (2003) Nuclear Microsatellite Markers for the Identification of Quercus
ilex L. and Q. suber L. hybrids. Silvae Genetica. 52, 63-66
89
Nagaraj, S. H. et al. (2007) A hitchhiker's guide to expressed sequence tag (EST)
analysis. Briefings in bioinformatics. 8, 6-21
90
Bouck, A. and Vision, T. (2007) The molecular ecologist's guide to expressed
sequence tags. Molecular ecology. 16, 907-24
91
Kim, K. S. et al. (2008) Utility of EST-derived SSRs as population genetics markers in
a beetle. The Journal of heredity. 99, 112-24
92
Ueno, S. and Tsumura, Y. (2007) Development of ten microsatellite markers for
Quercus mongolica var. crispula by database mining. Conservation Genetics. 9, 1083-1085
93
Ellis, J. R. and Burke, J. M. (2007) EST-SSRs as a resource for population genetic
analyses. Heredity. 99, 125-32
94
Porth, I. et al. (2005) Linkage mapping of osmotic stress induced genes of oak. Tree
Genetics & Genomes. 1, 31-40
95
Cuénoud, P. et al. (2002) Molecular phylogenetics of Caryophyllales based on nuclear
18S rDNA and plastid rbcL, atpB and matK DNA sequences. American Journal of
Botany. 89, 132-144
96
Jeffrey, J. A. and Lexer, C. (2008) A set of novel DNA polymorphisms within
candidate genes potentially involved in ecological divergence between Populus alba
and P. tremula, two hybridizing European forest trees. Molecular Ecology Resources.
8, 188-192
97
Casasoli, M. et al. (2006) Comparison of Quantitative Trait Loci for Adaptive Traits
Between Oak and Chestnut Based on an Expressed Sequence Tag Consensus Map.
Genetics. 172, 533-546
98
Dow, B. D. et al. (1995) Characterization of highly variable (GA/CT)n microsatellites
in the bur oak, Quercus macrocarpa. Theoretical and Applied Genetics. 91, 137-141
99
Steinkellner, H. et al. (1997) Identification and characterization of (GA/CT)n-microsatellite loci from
Quercus petraea. Plant molecular biology. 33, 1093-6
100
Kampfer, S. et al. (1998) Characterization of (GA)n Microsatellite Loci from Quercus
Robur. Hereditas. 129, 183-186
101
Alberto, F. et al. (2010) Population differentiation of sessile oak at the altitudinal front
of migration in the French Pyrenees. Molecular ecology. 19, 2626-39
102
Van Oosterhout, C. et al. (2004) Micro-Checker: Software for Identifying and
Correcting Genotyping Errors in Microsatellite Data. Molecular Ecology Notes. 4,
535-538
103
Thompson, J. D. et al. (1997) The CLUSTAL_X windows interface: flexible strategies
for multiple sequence alignment aided by quality analysis tools. Nucleic acids
research. 25, 4876-82
104
Larkin, M. A. et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics
(Oxford, England). 23, 2947-8
105
Hall, T. A. (1999) BioEdit: A biological user-friendly sequence alignment editor and
analysis program. Nucleic Acids Symposium. 41, 95-98
106
Pina-Martins, F. and Paulo, O. S. (2008) Concatenator: Sequence Data Matrices
Handling Made Easy. Molecular ecology resources. 8, 1254-5
107
Swofford, D. L. (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other
Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts
108
Ronquist, F. and Huelsenbeck, J. P. (2003) MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics. 19, 1572-1574
109
Nylander, J. (2004) MrModeltest V2. Evolutionary Biology Centre.
110
Bandelt, H. J. et al. (1999) Median-joining networks for inferring intraspecific
phylogenies. Molecular biology and evolution. 16, 37-48
111
Watterson, G. A. (1978) The homozygosity test of neutrality. Genetics. 88, 405-17
112
Slatkin, M. (1994) An exact test for neutrality based on the Ewens sampling
distribution. Genetical Research. 64, 71-74
113
Slatkin, M. (1996) A correction to the exact test based on the Ewens sampling
distribution. Genetical Research. 68, 259-260
114
Excoffier, L. and Lischer, H. E. L. (2010) Arlequin suite ver 3.5: a new series of
programs to perform population genetics analyses under Linux and Windows.
Molecular Ecology Resources. 10, 564-567
115
Harpending, H. C. (1994) Signature of ancient population growth in a low-resolution
mitochondrial DNA mismatch distribution. Human biology an international record of
research. 66, 591-600
116
Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis by
DNA polymorphism. Genetics. 123, 585-95
117
Fu, Y.-X. (1997) Statistical Tests of Neutrality of Mutations Against Population
Growth, Hitchhiking and Background Selection. Genetics. 147,
915-925
118
Ramos-Onsins, S. E. and Rozas, J. (2002) Statistical properties of new neutrality tests
against population growth. Molecular biology and evolution. 19, 2092-100
119
Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop
software for Windows and Linux. Molecular Ecology Resources. 8, 103-106
120
Librado, P. and Rozas, J. (2009) DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics (Oxford, England). 25, 1451-2
121
Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New
York, USA. 512 pp
122
Goudet, J. (1995) FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics.
Journal of Heredity. 86, 485-486
123
Goudet, J. (2001) FSTAT, a program to estimate and test gene diversities and fixation
indices (version 2.9.3). Available ,
124
Peakall, R. and Smouse, P. E. (2006) genalex 6: genetic analysis in Excel. Population
genetic software for teaching and research. Molecular Ecology Notes. 6, 288-295
125
Weir, B. S. and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of
population structure. Evolution. 38, 1358-1370
126
Slatkin, M. (1995) A measure of population subdivision based on microsatellite allele
frequencies. Genetics. 139, 457-62
127
Crawford, N. G. (2010) Smogd: Software for the Measurement of Genetic Diversity.
Molecular ecology resources. 10, 556-7
128
Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular
Ecology. 17, 4015-4026
129
Hedrick, P. W. (2005) A Standardized genetic differentiation measure. Evolution. 59,
1633-1638
130
Nei, M. and Chesser, R. K. (1983) Estimation of fixation indices and gene diversities.
Annals of Human Genetics. 47, 253-259
131
Rousset, F. (1997) Genetic differentiation and estimation of gene flow from F-statistics
under isolation by distance. Genetics. 145, 1219-1228
132
Jensen, J. L. et al. (2005) Isolation by distance, web service. BMC genetics. 6, 13
133
Pritchard, J. K. et al. (2000) Inference of population structure using multilocus
genotype data. Genetics. 155, 945-59
134
Hubisz, M. J. et al. (2009) Inferring weak population structure with the assistance of
sample group information. Molecular ecology resources. 9, 1322-32
135
Falush, D. et al. (2003) Inference of population structure using multilocus genotype
data: linked loci and correlated allele frequencies. Genetics. 164, 1567-87
136
Evanno, G. et al. (2005) Detecting the number of clusters of individuals using the
software STRUCTURE: a simulation study. Molecular ecology. 14, 2611-20
137
Rosenberg, N. A. (2004) Distruct: a program for the graphical display of population
structure. Molecular Ecology Notes. 4, 137-138
138
Guillot, G. et al. (2005) Geneland: a computer package for landscape genetics.
Molecular Ecology Notes. 5, 712-715
139
Guillot, G. et al. (2005) A spatial statistical model for landscape genetics. Genetics.
170, 1261-80
140
François, O. et al. (2006) Bayesian clustering using hidden Markov random fields in
spatial population genetics. Genetics. 174, 805-16
141
Excoffier, L. et al. (1992) Analysis of molecular variance inferred from metric
distances among DNA haplotypes: application to human mitochondrial DNA
restriction data. Genetics. 131, 479-91
142
Burgarella, C. et al. (2009) Detection of hybrids in nature: application to oaks
(Quercus suber and Q. ilex). Heredity. 102, 442-52
143
Lexer, C. et al. (2004) Hybrid zones as a tool for identifying adaptive genetic variation
in outbreeding forest trees: lessons from wild annual sunflowers (Helianthus spp.).
Forest ecology and management. 197, 49-64
144
Petit, R. J. et al. (2002) Identification of refugia and post-glacial colonisation routes of
European white oaks based on chloroplast DNA and fossil pollen evidence. Forest
Ecology and Management. 156, 49-74
145
Dow, B. D. and Ashley, M. V. (1996) Microsatellite analysis of seed dispersal and
parentage of saplings in bur oak, Quercus macrocarpa. Molecular ecology. 5, 615-627
146
Hu, X. S. and Ennos, R. A. (1999) Impacts of seed and pollen flow on population
genetic structure for plant genomes with three contrasting modes of inheritance.
Genetics. 152, 441-50
147
Streiff, R. et al. (1999) Pollen dispersal inferred from paternity analysis in a mixed oak
stand of Quercus robur L. and Q. petraea (Matt.) Liebl. Molecular Ecology. 8, 831-841
148
Ramírez-Valiente, J. A. et al. (2009) Elucidating the role of genetic drift and natural
selection in cork oak differentiation regarding drought tolerance. Molecular ecology.
18, 3803-15
149
Wellenreuther, M. et al. (2011) Environmental and climatic determinants of molecular
diversity and genetic population structure in a coenagrionid damselfly. PloS one. 6,
e20440
150
Lewontin, R. C. (1964) The Interaction of Selection and Linkage. I. General
Considerations; Heterotic Models. Genetics. 49, 49-67
151
Meirmans, P. G. and Hedrick, P. W. (2011) Assessing population structure: F(ST) and
related measures. Molecular ecology resources. 11, 5-18
Supporting Information
Supporting Information 1
Information regarding the primers used for the amplification of each cpDNA fragment is
summarized in Table S1.1, together with the annealing temperatures for PCR amplification and
the fragment sizes.
Table S1.1: Description of the cpDNA fragments used concerning primer sequences, annealing temperature (Ta
in ºC) and fragment size (in base pairs).
Locus       Forward primer (5'-3')              Reverse primer (5'-3')         Ta   Size  Reference
TrnL-F      GGT TCA AGT CCC TCT ATC CC          ATT TGA ACT GGT GAC ACG AG     65   381   Taberlet et al., 1991
TrnS-PsbC   TGA ACC TGT TCT TTC CAT GA          GAA CTA TCG AGG GTT CGA AT     65   250   Nishizawa & Watano, 2000
TrnH-PsbA   CGC GCA TGG TGG ATT CAC AAT CC      GTT ATG CAT GAA CGT AAT GCT C  65   478   Kress et al., 2005
matK        CGA TCT ATT CAT TCA ATA TTT C       TCT AGC ACA CGA AAG TCG AAG T  65   740   Cuénoud et al., 2002
rbcla       ATG TCA CCA CAA ACA GAG ACT AAA GC  GTA AAA TCA AGT CCA CCR CG     65   552   Kress & Erickson, 2007
The three nuclear candidate genes tested in this study are described in Table S1.2, together
with the primers and annealing temperatures used for PCR amplification and the fragment
sizes.
Table S1.2: Primer sequences and bibliographic references, annealing temperature (in ºC), fragment size (in
base pairs) and locus information for the nuDNA fragments.
| Locus | Forward primer | Reverse primer | Description | Ta (ºC) | Size (bp) | Reference |
|---|---|---|---|---|---|---|
| EST 2T13 | 5' CAT GCA CTG CCA ATC TCA GAG A 3' | 5' ATA ATT TGC CTC ATC ACT ACA TAA GA 3' | Osmotic stress related gene | 55 | 249 | Porth et al., 2005 |
| Cons 58 | 5' CCA ATT CTC TTA GTG GCA AGG 3' | 5' GCT TTG GGA TGA TGT TTT GG 3' | Auxin repressed protein | * | * | Casasoli et al., 2006 |
| Phyt B | 5' ATA TGG CGA ATA TGG GGT CA 3' | 5' GGC ATC CAT TTC TGC ATT CT 3' | Phytochrome B, involved in flower phenology | * | * | Jeffrey & Lexer, 2008 |
* Amplification product was never obtained for cork oak.
Supporting Information 2
Information regarding the 11 dinucleotide nuclear microsatellite (nuSSR) markers is
summarized in Table S2.1. The six EST-SSRs tested in this study are described in
Table S2.2.
Table S2.1: Description of the nuSSRs used concerning primer sequences, annealing temperatures (Ta in ºC),
repeat motif and size ranges (in base pairs).
| Locus | Forward primer | Reverse primer | Ta (ºC) | Repeat motif | Expected size range (bp) | Found size range (bp) | Reference |
|---|---|---|---|---|---|---|---|
| MsQ13 | 5' TGG CTG CAC CTA TGG CTC TTA G 3' | 5' ACA CTC AGA CCC ACC ATT TTT CC 3' | 55 | (AG)n | 222-246 | 218 | Dow et al., 1995 |
| QpZAG9 | 5' GCA ATT ACA GGC TAG GCT GG 3' | 5' GTC TGG ACC TAG CCC TCA TG 3' | 50 | (AG)12 | 182-210 | 223-249 | Steinkellner et al., 1997 |
| QpZAG15 | 5' CGA TTT GAT AAT GAC ACT ATG G 3' | 5' CAT CGA CTC ATT GTT AAG CAC 3' | 57 | (AG)23 | 108-152 | 101-135 | Steinkellner et al., 1997 |
| QpZAG36 | 5' GAT CAAA AAT TTG GAA TAT TAA GAG AG 3' | 5' ACT GTG GTG GTG AGT CTA ACA TGT AG 3' | * | (AG)19 | 210-236 | * | Steinkellner et al., 1997 |
| QpZAG46 | 5' CCC CTA TTG AAG TCC TAG CCG 3' | 5' TCT CCC ATG TAA GTA GCT CTG 3' | * | (AG)13 | 190-222 | * | Steinkellner et al., 1997 |
| QpZAG110 | 5' GGA GGC TTC CTT CAA CCT ACT 3' | 5' GAT CTC TTG TGT GCT GTA TTT 3' | 50 | (AG)15 | 206-262 | 208-258 | Steinkellner et al., 1997 |
| QrZAG11 | 5' CCT TGA ACT CGA AGG TGT CCT T 3' | 5' GTA GGT CAA AAC CAT TGG TTG ACT 3' | 50 | (TC)18 | 238-263 | 255-281 | Kampfer et al., 2004 |
| QrZAG7 | 5' CAA CTT GGT GTT CGG ATC AA 3' | 5' GTG CAT TTC TTT TAT AGC ATT CAC 3' | 50 | (TC)17 | 115-153 | 115-133 | Kampfer et al., 2004 |
| QrZAG20 | 5' CCA TTA AAA GAA GCA GTA TTT TGT 3' | 5' GCA ACA CTC AGC CTA TAT CTA GAA 3' | 50 | (TC)22 | 160-200 | 161-171 | Kampfer et al., 2004 |
| QsA11 | 5' GAT CTC TTT GTC AAC CCA GAC 3' | 5' ATG TGT GTG GTG ATG GGT TT 3' | * | (CA)n | 258-276 | * | Simões de Matos 2007 |
| QsD8 | 5' GAT CCT CTG CTT CTC TCT G 3' | 5' CTG CAA CTT TAT CCG CCT CC 3' | * | (CA)n | 140-150 | * | Simões de Matos 2007 |
* Amplification product was never obtained, or the scoring was unreliable.
Table S2.2: Primer sequences [92], annealing temperature (in ºC), repeat motif, size ranges (in base pairs) and
locus information for the EST-SSRs.
| Locus (accession) | Ta (ºC) | Repeat motif | Expected size range (bp) | Found size range (bp) | Description |
|---|---|---|---|---|---|
| QmOST1 (DN949770) | 58 | (AG)19 | 149-171 | 134-152 | EST Noncoding |
| QmD12 (CR627959) | 58 | (GCA)7 | 243-251 | 240-246 | EST Coding; Zinc finger protein |
| QmAJ1 (AJ577265) | 57 | (GAA)6 | 374-380 | 360-375 | EST Coding; Pheromone receptor-like protein |
| QmDN1 (DN950717) | 58 | (GGA)6 | 242-261 | 236 | EST Coding; Salt tolerance protein |
| QmDN2 (DN949776) | * | (AG)9 | 156-168 | * | EST Noncoding; 60S ribosomal protein L21 |
| QmDN3 (DN950726) | 58 | (TC)10 | 361-381 | 361-375 | EST Noncoding; Putative carboxyl-terminal proteinase |

* Amplification product was never obtained, or the scoring was unreliable.

Primers
Forward
5' CAA CCA TCG AGG CCA TTA CGA A 3'
5' GCT CCC TGG TAG TCG GCT AAA GA 3'
Reverse
5' TCA CCG ATC TTG AAG GTC CTC GA 3'
5' CAA TTG GGA CAA CAT GGA AGC AT 3'
5' ATT CAG GCC GCA AAT CAA TAA GG 3'
5' TAG TTT TCC CAG CGA ATC CAA CA 3'
5' GAA ACT GGT CCC CTT CTC TTG GA 3'
5' CTT CTT GAA GGG ACT GAC CCC AT 3'
5' CAA CCA TCG AGG CCA TTA CGA A 3'
5' TCA CCG ATC TTG AAG GTC CTC AG 3'
5' TCA AAC AAT CTC AAG GCT CCC AA 3'
5' GCT TTT GAG AAA CTT TGG CCA CC 3'
Supporting Information 3
The concatenated cpDNA matrix has a length of 1109 bp, of which 92 sites are variable. The
model of sequence evolution for the Bayesian analysis (BA) was calculated separately for each
cpDNA data set. The BA tree was very similar to that of the MP analysis; therefore only the
MP tree for the concatenated dataset is presented (Fig. S3.1). The concatenated tree supports
the results of the individual trees, in which the four major groups are present (Fig. 3.1a,
Fig. 3.2 and Fig. 3.3). Group A, highlighted in yellow, is composed of the cork oak samples
belonging to the pure lineage, distributed across the three sublineages (A1, A2 and A3) in
accordance with the TrnS/PsbC tree (Fig. 3.1). Group B is the most variable one, composed
of several haplotypes of cork oak samples from the introgressed lineage, together with
samples from Quercus ilex (subsp. rotundifolia and ilex) and Quercus coccifera. Group C,
composed of several Quercus species, is closely related to Group A. Group D consists of
Quercus rubra, which is placed as the species most distant from cork oak, as in the
phylogeny of the TrnH/PsbA fragment (Fig. 3.2).
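As an illustration of how the number of variable sites in an aligned matrix can be counted, a minimal Python sketch follows. It is not the procedure used in this study: the function name `count_variable_sites`, the toy sequences and the choice to ignore gaps and ambiguity codes are assumptions made only for this example.

```python
def count_variable_sites(alignment):
    """Count alignment columns that contain more than one distinct nucleotide.

    alignment: list of equal-length aligned sequences (strings).
    Gaps ('-') and the symbols 'N' and '?' are ignored (assumption of this sketch).
    """
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    ignored = {"-", "N", "?"}
    variable = 0
    # zip(*alignment) walks through the alignment column by column.
    for column in zip(*alignment):
        states = {base.upper() for base in column} - ignored
        if len(states) > 1:
            variable += 1
    return variable


# Hypothetical toy alignment, not sequences from this study.
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "AC-TAC"]
n_variable = count_variable_sites(toy_alignment)
print(f"{n_variable} variable sites out of {len(toy_alignment[0])}")
```

Applied to the real concatenated matrix, a count of this kind would correspond to the number of variable sites reported above, provided that gaps are treated in the same way as in the original analysis.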
Figure S3.1: Maximum parsimony tree of the cpDNA concatenated dataset. Four groups are represented and
color coded. Group A is highlighted in yellow: cork oak's pure lineage (Bright Yellow – Sublineage A2 (Sl
A2); Brownish-Yellow – Sublineage A3 (Sl A3); Light Yellow – Sublineage A1 (Sl A1)); Group B (orange –
cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group C is
highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q.
canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at
the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian
credibility values.
Supporting Information 4
Median-joining analysis of the cpDNA fragments resulted in haplotype networks (Fig. S4.1,
Fig. S4.2 and Fig. S4.3) that reflect the four major groups found in the phylogenetic trees.
They also show haplotypes shared between Q. suber, in clade B, and Quercus coccifera, Q. ilex
ilex and Q. ilex rotundifolia. Although the networks do not clearly reflect the phylogenetic
relationships between the groups, they give visual support regarding the distance between
them, since a network is a simple and clear way to represent the mutational steps between
haplotypes, as well as the haplotype frequencies. The median-joining networks of the
haplotypes observed for each Quercus species, for the fragments TrnS/PsbC, TrnH/PsbA and
TrnL-F, are presented in Fig. S4.1, Fig. S4.2 and Fig. S4.3, respectively.
Figure S4.1: A median-joining haplotype network generated from 250 bases of the TrnS/PsbC intergenic spacer
region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading
indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork oak's pure
lineage and Q. cerris (Bright Yellow – Sublineage A2; Brownish-Yellow – Sublineage A3, including Q. cerris;
Light Yellow – Sublineage A1); Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q.
rotundifolia; Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light
Blue: Q. rubra). Each number in the network indicates the number of mutations between the haplotypes. Black
circles indicate the presence of a missing ancestral haplotype.
Figure S4.2: A median-joining haplotype network generated from 478 bases of the TrnH/PsbA intergenic
spacer region. Circle size reflects the relative frequency of each haplotype across all 10 Quercus species.
Shading indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork
oak's pure lineage, with Q. cerris; Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q.
rotundifolia; Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light
Blue: Q. rubra). Each number in the network indicates the number of mutations between the haplotypes. Black
circles indicate the presence of a missing ancestral haplotype.
Figure S4.3: A median-joining haplotype network generated from 381 bases of the TrnL-F intergenic spacer
region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading
indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork oak's pure
lineage, with Q. cerris; Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q. rotundifolia;
Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light Blue: Q. rubra).
Each number in the network indicates the number of mutations between the haplotypes. Black circles indicate
the presence of a missing ancestral haplotype.
Supporting Information 5
Global evaluation of the EST-SSR dataset using MICRO-CHECKER v2.2.3 [102] revealed no
evidence of genotyping errors due to stuttering or large allele dropout, but identified possible
null alleles, indicated by a general excess of homozygotes, at two loci: QmOST1 and QmDN3
(p<0.05). For the QmOST1 locus, null alleles are possible in the populations of
Haza del Lino (HAZ) and Mekna (MEK) (Fig. S5.1). However, for both populations the charts
show that the observed homozygote frequencies are barely outside the range of the expected
values. Therefore this microsatellite was not discarded from the subsequent analyses.
For the QmDN3 locus the observed homozygote frequencies were clearly outside the expected
range. The fact that this was detected in all 13 populations is a strong indication that null
alleles are indeed present at this locus. A representative example is shown in Fig. S5.2. As a
result this locus was discarded from all subsequent analyses.
Regarding the nuSSR dataset, the global evaluation with MICRO-CHECKER again revealed
no evidence of genotyping errors due to stuttering or large allele dropout, but identified
possible null alleles in a few populations for the markers QpZAG110 (Fig. S5.3), QrZAG11
(Fig. S5.4) and QrZAG20 (Fig. S5.5). Specifically, for the QpZAG110 locus null alleles were
detected in the populations of Serra da Arrábida (ARR) and Serra do Buçaco (BUC); for the
QrZAG11 locus in the Serra da Estrela (EST) population; and for the QrZAG20 locus in the
populations of Puglia (PUG) and Serra de Monchique (MON). However, the charts show that the
observed homozygote frequencies are barely outside the range of the expected values, and
null alleles are indicated for only one or two populations out of the 13 analysed. Therefore
these microsatellites were not discarded from the subsequent analyses.
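The reasoning behind a homozygote excess can be sketched in a few lines of Python. The example below compares observed and expected heterozygosity at one locus and derives a rough null allele frequency with Brookfield's first estimator, r = (He - Ho)/(1 + He). This is only an illustration under stated assumptions: it does not reproduce the tests performed by MICRO-CHECKER, the function name `homozygote_excess` is hypothetical, no small-sample correction is applied to He, and the genotypes are invented.

```python
from collections import Counter


def homozygote_excess(genotypes):
    """Summarise heterozygosity at one locus in one population.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Returns (observed heterozygosity, expected heterozygosity, null allele estimate).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    # Expected heterozygosity under Hardy-Weinberg proportions: 1 - sum(p_i^2).
    h_exp = 1.0 - sum(p * p for p in freqs)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Brookfield estimator 1; set to zero when there is no homozygote excess.
    null_freq = max(0.0, (h_exp - h_obs) / (1.0 + h_exp))
    return h_obs, h_exp, null_freq


# Hypothetical allele sizes (bp) for ten individuals, not data from this study.
toy_genotypes = [(150, 150)] * 6 + [(150, 152)] * 2 + [(152, 152)] * 2
h_obs, h_exp, null_freq = homozygote_excess(toy_genotypes)
print(f"Ho={h_obs:.2f} He={h_exp:.2f} null allele frequency={null_freq:.3f}")
```

A locus whose observed heterozygosity falls far below the expected value in every population, as described for QmDN3, would give a consistently positive estimate, whereas a marginal excess in one or two populations would give values close to zero.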
Figure S5.1: MICRO-CHECKER charts for the QmOST1 locus for the populations of HAZ (Figs. a and b) and
MEK (Figs. c and d). The significance level is 0.05; a) Frequency differences in base pairs for the population
HAZ; b) Homozygote frequencies for the population HAZ; c) Frequency differences in base pairs for the
population MEK; d) Homozygote frequencies for the population MEK.
Figure S5.2: MICRO-CHECKER charts of the QmDN3 locus for the population of Serra da Arrábida (ARR),
shown as a representative example of the indication of null alleles found in all 13 populations. The significance
level is 0.05; a) Frequency differences in base pairs for the population ARR; b) Homozygote frequencies for the
population ARR.
Figure S5.3: MICRO-CHECKER charts for the locus QpZAG110 for the populations ARR and BUC. The
significance level is 0.05; a) Frequency differences in base pairs for the population ARR; b) Homozygote
frequencies for the population ARR; c) Frequency differences in base pairs for the population BUC; d)
Homozygote frequencies for the population BUC.
Figure S5.4: MICRO-CHECKER charts for the locus QrZAG11 for the population EST. The significance level
is 0.05; a) Frequency differences in base pairs; b) Homozygote frequencies.
Figure S5.5: MICRO-CHECKER charts for the QrZAG20 locus for the populations of PUG (Figs. a and b) and
MON (Figs. c and d). The significance level is 0.05; a) Frequency differences in base pairs for the population
PUG; b) Homozygote frequencies for the population PUG; c) Frequency differences in base pairs for the
population MON; d) Homozygote frequencies for the population MON.
Supporting Information 6
Linkage disequilibrium is the non-random association of alleles at two or more loci. This is a
statistical association, and the loci need not be physically linked [150].
Genotypic linkage disequilibrium between all pairs of loci was tested by means of a
contingency exact test using GenePop v4 [119] (Table S6.1). No significant departure from
the null hypothesis of linkage equilibrium was detected. Therefore the eight polymorphic
microsatellite markers were considered suitable for this study.
Table S6.1: Test for linkage disequilibrium for all pairs of loci
using Fisher's method, implemented in GenePop software.

| Dataset | Loci combination | p |
|---|---|---|
| EST-SSRs | QmOST1 & QmD12 | 0.37 |
| EST-SSRs | QmOST1 & QmAJ1 | 0.10 |
| EST-SSRs | QmD12 & QmAJ1 | 0.07 |
| nuSSRs | QpZAG110 & QpZAG9 | 0.38 |
| nuSSRs | QpZAG110 & QrZAG20 | 0.90 |
| nuSSRs | QpZAG9 & QrZAG20 | 0.43 |
| nuSSRs | QpZAG110 & QrZAG7 | 0.96 |
| nuSSRs | QpZAG9 & QrZAG7 | 0.51 |
| nuSSRs | QrZAG20 & QrZAG7 | 0.34 |
| nuSSRs | QpZAG110 & QrZAG11 | 0.81 |
| nuSSRs | QpZAG9 & QrZAG11 | 0.38 |
| nuSSRs | QrZAG20 & QrZAG11 | 0.10 |
| nuSSRs | QrZAG7 & QrZAG11 | 0.97 |
| Complete dataset | QmOST1 & QpZAG110 | 0.95 |
| Complete dataset | QmOST1 & QpZAG9 | 0.88 |
| Complete dataset | QmOST1 & QrZAG20 | 0.18 |
| Complete dataset | QmOST1 & QrZAG7 | 0.00 |
| Complete dataset | QmOST1 & QrZAG11 | 0.44 |
| Complete dataset | QmD12 & QpZAG110 | 0.88 |
| Complete dataset | QmD12 & QpZAG9 | 0.95 |
| Complete dataset | QmD12 & QrZAG20 | 0.05 |
| Complete dataset | QmD12 & QrZAG7 | 0.25 |
| Complete dataset | QmD12 & QrZAG11 | 0.86 |
| Complete dataset | QmAJ1 & QpZAG110 | 0.58 |
| Complete dataset | QmAJ1 & QpZAG9 | 0.96 |
| Complete dataset | QmAJ1 & QrZAG20 | 0.59 |
| Complete dataset | QmAJ1 & QrZAG7 | 0.15 |
| Complete dataset | QmAJ1 & QrZAG11 | 0.82 |
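Fisher's method, mentioned in the caption of Table S6.1, combines independent p-values (for example one per population) into a single test for a pair of loci. The Python sketch below shows the general technique only and does not reproduce GenePop's implementation: the function name `fisher_combined_p` and the input p-values are hypothetical. The statistic is X² = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis.

```python
import math


def fisher_combined_p(p_values):
    """Combine k independent p-values with Fisher's method.

    Returns (chi-square statistic, combined p-value). The statistic has 2k
    degrees of freedom; for an even number of degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        # A p-value reported as 0.00 is rounded and needs a lower bound first.
        raise ValueError("p-values must lie in the interval (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, math.exp(-half) * series


# Hypothetical per-population p-values for one pair of loci.
statistic, combined_p = fisher_combined_p([0.40, 0.12, 0.75, 0.33])
print(f"chi-square = {statistic:.3f}, combined p = {combined_p:.3f}")
```

With a single input the combined p-value equals the input, which is a quick sanity check of the closed-form tail.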
Supporting Information 7
Although FST is widely used as a measure of population differentiation and structure, it has
been criticized because of its dependency on within-population diversity. This has led to the
development of replacement statistics such as D, a measure of actual differentiation among
populations proposed by Jost [128]. Nevertheless, Meirmans & Hedrick [151] recommend
continuing to use FST in combination with the new statistics.
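The dependency on within-population diversity can be illustrated with a short Python sketch that computes GST = (HT - HS)/HT and Jost's D = [(HT - HS)/(1 - HS)] x [n/(n - 1)] from allele frequencies at a single locus. This is a simplified illustration, not the estimator (Dest) used in this study: the function name `gst_and_jost_d` is hypothetical, populations are weighted equally, no sample-size correction is applied, and the frequencies are invented.

```python
def gst_and_jost_d(pop_freqs):
    """Return (GST, Jost's D) for one locus.

    pop_freqs: list of dicts mapping allele -> frequency, one dict per population.
    HS is the mean within-population heterozygosity, HT the heterozygosity
    of the pooled (mean) allele frequencies.
    """
    n = len(pop_freqs)
    if n < 2:
        raise ValueError("at least two populations are required")
    alleles = set().union(*pop_freqs)
    h_s = sum(1.0 - sum(p * p for p in freqs.values()) for freqs in pop_freqs) / n
    mean_freqs = [sum(freqs.get(a, 0.0) for freqs in pop_freqs) / n for a in alleles]
    h_t = 1.0 - sum(p * p for p in mean_freqs)
    g_st = (h_t - h_s) / h_t if h_t > 0 else 0.0
    jost_d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return g_st, jost_d


# Two hypothetical populations sharing no alleles, each with ten equally
# frequent alleles: completely differentiated, yet highly diverse.
pop_1 = {f"a{i}": 0.1 for i in range(10)}
pop_2 = {f"b{i}": 0.1 for i in range(10)}
g_st, jost_d = gst_and_jost_d([pop_1, pop_2])
print(f"GST = {g_st:.3f}, D = {jost_d:.3f}")
```

In this toy case the two populations share no alleles, so D reaches its maximum, while GST stays small because the within-population heterozygosity is high.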
Pairwise Dest was estimated for the thirteen populations, with both SSR matrices analysed
together. The overall genetic differentiation at the microsatellite loci was low
(pairwise Dest from 0.000 to 0.097) (Table S7.1). The Dest values closely resembled those of
the FST and RST matrices (Table 3.6), although with a tendency to be lower.
Table S7.1: Pairwise Dest values between all pairs of populations.
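| | ALG | ARR | BUC | CAT | HAZ | EST | GER | PUG | KEN | TAZ | MON | SIN | MEK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALG | -- | 0.010 | 0.021 | 0.031 | 0.005 | 0.012 | 0.007 | 0.056 | 0.039 | 0.006 | 0.033 | 0.012 | 0.000 |
| ARR | | -- | 0.000 | 0.017 | 0.016 | 0.001 | 0.000 | 0.050 | 0.040 | 0.008 | 0.000 | 0.000 | 0.024 |
| BUC | | | -- | 0.031 | 0.060 | 0.002 | 0.002 | 0.041 | 0.065 | 0.009 | 0.005 | 0.003 | 0.028 |
| CAT | | | | -- | 0.035 | 0.050 | 0.029 | 0.097 | 0.051 | 0.045 | 0.043 | 0.037 | 0.070 |
| HAZ | | | | | -- | 0.032 | 0.017 | 0.073 | 0.026 | 0.009 | 0.039 | 0.030 | 0.012 |
| EST | | | | | | -- | 0.000 | 0.037 | 0.057 | 0.013 | 0.001 | 0.016 | 0.015 |
| GER | | | | | | | -- | 0.033 | 0.030 | 0.013 | 0.001 | 0.005 | 0.013 |
| PUG | | | | | | | | -- | 0.042 | 0.034 | 0.055 | 0.072 | 0.066 |
| KEN | | | | | | | | | -- | 0.025 | 0.069 | 0.070 | 0.062 |
| TAZ | | | | | | | | | | -- | 0.037 | 0.016 | 0.010 |
| MON | | | | | | | | | | | -- | 0.029 | 0.046 |
| SIN | | | | | | | | | | | | -- | 0.031 |
| MEK | | | | | | | | | | | | | -- |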
ALG – Forêt des Guerbès (Algeria); ARR – Arrábida (Portugal); BUC – Buçaco (Portugal); CAT – Cataluña
(Spain); HAZ – Haza del Lino (Spain); EST – Estrela (Portugal); GER – Gerês (Portugal); PUG – Puglia
(Italy); KEN – Kenitra (Morocco); TAZ – Taza (Morocco); MON – Monchique (Portugal); SIN – Sintra
(Portugal); MEK – Mekna (Tunisia).
Supporting Information 8
The estimation of the number of populations (K) should be treated with care, and a biological
interpretation of K may not be straightforward. We used the posterior probability of the data
for a given K, LnP(D), to identify the most probable number of clusters, both by applying the
ad hoc statistic DeltaK (ΔK) [136] and by plotting the average values of LnP(D). Because
LnP(D), the ad hoc estimate provided by STRUCTURE, might not always point to the real
number of clusters, DeltaK was also considered: it is an ad hoc quantity related to the
second-order rate of change of the log probability of the data with respect to the number of
clusters, and it tends to be a good predictor of the real number of clusters.
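A minimal Python sketch of the DeltaK calculation is given below for illustration. It assumes the common simplification of taking the absolute second-order difference of the mean LnP(D) across replicate runs and dividing it by the standard deviation of LnP(D) at that K. The function name `evanno_delta_k` and the LnP(D) values are hypothetical and do not come from the STRUCTURE runs of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute DeltaK for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of LnP(D) values from replicate runs
    (at least two runs per K, so that a standard deviation exists).
    DeltaK(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # DeltaK is undefined for the smallest and largest K
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when all runs are identical
        second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_order / sd
    return delta_k


# Hypothetical LnP(D) values for two replicate runs at each K.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-888.0, -896.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(f"DeltaK per K: {delta_k}; highest DeltaK at K = {best_k}")
```

By construction DeltaK cannot be evaluated for the first and last K tested, which is why K = 1 can never be selected by this criterion and the LnP(D) plot has to be inspected as well.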
The EST-SSR and nuSSR datasets were analysed separately and then merged to determine the
genetic structure of the species (Fig. 3.8). The plots of the logarithm of the
probability of the data [LnP(D)] and of Evanno's criterion [136] are shown in Fig. S8.1,
Fig. S8.2 and Fig. S8.3 for the EST-SSR, nuSSR and combined datasets, respectively.
Figure S8.1: Estimated number of populations (K) derived from the STRUCTURE clustering analyses, for the
EST-SSRs dataset. Mean posterior probabilities of the data [LnP(D)] with standard deviation over 20
replicated runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1
to 13).
Figure S8.2: Estimated number of populations (K) derived from the STRUCTURE clustering analyses, for the
nuSSRs dataset. Mean posterior probabilities of the data [LnP(D)] with standard deviation over 20
replicated runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1
to 13).
Figure S8.3: Estimated number of populations (K) derived from the STRUCTURE clustering analyses, for the
combined dataset. Mean posterior probabilities of the data [LnP(D)] with standard deviation over 20
replicated runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1
to 13).
Supporting Information 9
Several AMOVA (hierarchical Analysis of Molecular Variance) [141] analyses (with 1000
permutations) were performed based on the allelic frequencies (FST values) (Table S9.1). The
aim was to assess how the genetic variability is distributed among the different
hierarchical levels: groups (FCT), populations (FSC) and individuals (FST). The
structures considered correspond to the clusters (K) obtained with the software
STRUCTURE [133] (Fig. 3.8) and GENELAND [138] (Fig. 3.9). The best genetic structure is
assumed to be the one in which the groups (FCT) explain the largest part of the variation,
that is, the one that maximizes the differentiation between groups of populations.
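The relationship between the variance components of a three-level AMOVA and the fixation indices can be sketched as follows. The formulas are the standard ones (FCT = σ²a/σ²T, FSC = σ²b/(σ²b + σ²c), FST = (σ²a + σ²b)/σ²T); the function name `amova_fixation_indices` and the variance components are hypothetical and are not the values obtained in this study.

```python
def amova_fixation_indices(var_among_groups, var_among_pops, var_within_pops):
    """Derive FCT, FSC and FST from the three AMOVA variance components.

    var_among_groups: variance among groups (sigma_a squared)
    var_among_pops:   variance among populations within groups (sigma_b squared)
    var_within_pops:  variance within populations (sigma_c squared)
    """
    total = var_among_groups + var_among_pops + var_within_pops
    if total <= 0:
        raise ValueError("total variance must be positive")
    f_ct = var_among_groups / total
    f_sc = var_among_pops / (var_among_pops + var_within_pops)
    f_st = (var_among_groups + var_among_pops) / total
    return {"FCT": f_ct, "FSC": f_sc, "FST": f_st}


# Hypothetical variance components, expressed as percentages of the total.
indices = amova_fixation_indices(5.0, 1.0, 94.0)
for name, value in indices.items():
    print(f"{name} = {value:.5f}")
```

The three indices are linked by (1 - FST) = (1 - FSC)(1 - FCT), so a grouping that raises FCT while lowering FSC places more of the differentiation among groups rather than among populations within groups.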
Table S9.1: Variation percentages over different levels estimated with AMOVA. The analysis
was performed for the SSR loci combined dataset, based on Fst values.
| Structure | Among groups (%) | Fct | Among populations within groups (%) | Fsc | Within populations (%) | Fst |
|---|---|---|---|---|---|---|
| Two clusters (K=2) | 2.81 | 0.02814 *** | 3.16 | 0.03249 *** | 94.03 | 0.02814 *** |
| Four clusters (K=4) | 3.37 | 0.05817 *** | 2.45 | 0.02532 *** | 94.18 | 0.03370 *** |
| Six clusters (K=6) | 4.99 | 0.05844 *** | 0.85 | 0.00897 ** | 94.16 | 0.04992 *** |
% = Percentage of the total molecular variance explained
Significance level: **P<0.01, ***P<0.001