Genetic structure in Finland and Sweden

Genetic structure in
Finland and Sweden:
Aspects of Population History
and Gene Mapping
Elina Salmela
Department of Medical Genetics, Haartman Institute,
Research Programs Unit, Molecular Medicine, and
Institute for Molecular Medicine Finland FIMM
University of Helsinki
Academic dissertation
To be presented, with the permission of the Faculty of Medicine of the University of Helsinki,
for public examination in Auditorium XII, Main Building, Fabianinkatu 33, Helsinki,
on October 5th 2012, at 12 noon
Helsinki 2012
Supervisors
Professor Juha Kere
Department of Medical Genetics
University of Helsinki
Helsinki, Finland
Folkhälsan Institute of Genetics
Helsinki, Finland
Department of Biosciences and Nutrition
Karolinska Institutet
Huddinge, Sweden
Docent Päivi Lahermo
Institute for Molecular Medicine Finland FIMM
University of Helsinki
Helsinki, Finland
Reviewers
Professor Jaakko Ignatius
Department of Clinical Genetics
Turku University Hospital
Turku, Finland
Professor Marjo-Riitta Järvelin
School of Public Health
Imperial College London
London, UK
Opponent
Docent Mattias Jakobsson
Department of Evolutionary Biology
Uppsala University
Uppsala, Sweden
ISBN 978-952-10-8190-3 (paperback)
ISBN 978-952-10-8191-0 (PDF)
http://ethesis.helsinki.fi/
Layout: Jere Kasanen
Cover: Ancestries by Admixture. Modified from Figure 14.
Unigrafia, Helsinki 2012
To my family
Houkat ja viisahat uurteet otsillaan
että mihin ja miksi ja mistä, mikä johti?
Aika ei paljasta arvoituksiaan,
meidän mukanaan niitä vain käytävä on kohti.
Pauli Hanhiniemi: Äärelä (2006)
In Hehkumo: Muistoja tulevaisuudesta
Fools and wise people wonder with furrowed brows:
what, where does it lead to, why, and where from?
Time does not tell us its mysteries –
all we can do is wander with it to meet them.
Translated by Sara Norja
Contents
List of original publications 9
Abbreviations 10
Abstract 11
Tiivistelmä 14
Introduction 17
Review of the literature 19
20
1.Basics of population genetics 1.1.Inheritance 20
1.2. Allele frequencies 20
1.2.1.Mutation 21
21
1.2.2. Genetic drift 22
1.2.3.Selection 22
1.2.4.Migration 1.3.Genotypes 23
24
1.4. Haplotypes, recombination, and linkage disequilibrium 2. Human genome structure 26
2.1. Types and patterns of variation 27
2.1.1. Single nucleotide polymorphisms (SNPs) 27
28
2.1.2.Microsatellites 28
2.1.3. Structural variation 2.2. Haplotype structure 29
3.Gene mapping strategies 30
3.1. Linkage analysis 30
3.1.1. Suitability for different populations and phenotypes 32
3.2. Association analysis 32
3.2.1. Suitability for different populations 34
35
3.2.2. Suitability for different phenotypes 3.3. Complementary methods 36
4.Genetic inference of population history 37
4.1. General principles 37
4.2. Genome-wide SNP datasets in studies of population history 39
4.3. The multidisciplinary inference of human past 41
5.Finland 43
5.1. Population prehistory and history 43
5.2.Language 47
5.3. Genetic structure 48
5.3.1.
5.3.2.
5.3.3.
5.3.4.
5.3.5.
5.3.6.
5.3.7.
Finns compared to other European populations 48
Eastern influence in the Finnish gene pool 49
50
Finns and their Saami neighbors 50
Genetic structure within Finland 51
Phenotypic variation in Finland East-west difference in cardiovascular diseases in Finland 52
53
Age estimates of Finnish genes 6.Sweden 55
6.1. Population prehistory and history 55
6.2.Language 58
6.3. Genetic structure 59
6.3.1. Swedes compared to other European populations 59
60
6.3.2. Genetic structure within Sweden Aims of the study 63
Materials and methods 64
64
1.Samples 1.1. Finns (I, II, III, IV) 65
1.2. Swedes (II, III, IV) 66
1.3. Reference populations (III, IV) 66
1.4. Simulated population data (I) 68
2.Markers, genotyping, and quality control 69
71
3.Analyses 3.1. Model-free clustering of individuals: multidimensional scaling 71
3.2. Model-based clustering of individuals: Structure, Admixture,
and Geneland 71
71
3.3. IBS distributions within and between populations 3.4. SNP-based inspection of eastern admixture 72
3.5. Distances between populations: FST and allele frequency
differences 72
3.6. Hierarchical population subdivision, AMOVA, and
inbreeding 73
3.7. Linkage disequilibrium 73
3.8. Correlation of genetic and geographic distances 73
3.9. Tests of the subpopulation difference scanning (SDS) approach 73
3.10.Enrichment analyses
74
Results 75
75
1.Population structure in Finland and Sweden 1.1. Model-free clustering of individuals: multidimensional
scaling 75
1.2. Model-based clustering of individuals: Structure, Admixture,
and Geneland 77
1.3. IBS distributions between populations 80
1.4. SNP-based inspection of eastern admixture 81
1.5. Distances between populations: FST and allele frequency
differences 82
1.6. Hierarchical population subdivision, AMOVA, and
inbreeding 82
1.7. IBS distributions within populations 83
1.8. Linkage disequilibrium 83
83
1.9. Correlation of genetic and geographic distances 2. Subpopulation difference scanning (SDS) approach 84
2.1. Population simulations 84
85
2.2. East-west differences in the genome-wide SNP dataset Discussion 88
88
1.Technical aspects 1.1. Quality control of genome-wide SNP data for population
genetics 88
1.2. Comparison of model-based methods of individual
clustering 89
2.Population structure in Finland and Sweden 90
2.1. Comparison of Finland and Sweden 90
2.2.Finland 92
2.2.1. Eastern influence 92
93
2.2.2. East-west difference 2.2.3. Genetic vs. archaeological view of Finnish population history 94
2.3.Sweden 95
2.4. Limitations of the results 97
3. Subpopulation difference scanning (SDS) approach 100
3.1. The idea 100
101
3.2. Theoretical testing, advantages, and limitations 3.3. Analyses of genome-wide population data 103
Concluding remarks and future prospects 104
Acknowledgements 106
References 110
List of original publications
This thesis is based on the following original publications. They are referred
to in the text by their Roman numerals. In addition, some unpublished data
are presented.
I
Salmela E, Taskinen O, Seppänen JK, Sistonen P, Daly MJ, Lahermo
P, Savontaus ML, Kere J. Subpopulation difference scanning: a
strategy for exclusion mapping of susceptibility genes. J Med Genet.
2006; 43(7):590-7.
II Hannelius U, Salmela E, Lappalainen T, Guillot G, Lindgren CM,
von Döbeln U, Lahermo P, Kere J. Population substructure in
Finland and Sweden revealed by the use of spatial coordinates and
a small number of unlinked autosomal SNPs. BMC Genet. 2008;
9:54.
III Salmela E*, Lappalainen T*, Fransson I, Andersen PM, DahlmanWright K, Fiebig A, Sistonen P, Savontaus ML, Schreiber S, Kere
J, Lahermo P. Genome-wide analysis of single nucleotide poly­
morphisms uncovers population structure in Northern Europe.
PLoS One. 2008; 3(10):e3519.
IV Salmela E, Lappalainen T, Liu J, Sistonen P, Andersen PM,
Schreiber S, Savontaus ML, Czene K, Lahermo P, Hall P, Kere J.
Swedish population substructure revealed by genome-wide single
nucleotide polymorphism data. PLoS One. 2011; 6(2):e16747.
* Equal contribution.
Publications II and III have earlier appeared as part of the doctoral theses
of Ulf Hannelius (Stockholm 2008) and Tuuli Lappalainen (Helsinki 2009),
respectively.
These publications have been reproduced with permission from their
copyright holders.
The open access publications II-IV are freely available online.
List of original publications
9
Abbreviations
AD
Anno Domini
aDNA
ancient DNA
AIM
ancestry-informative marker
AMI
acute myocardial infarction
AMOVA
analysis of molecular variance
BC
before Christ
BRCA1
breast cancer 1, early onset gene
BRCA2
breast cancer 2, early onset gene
CEU
Utah residents with ancestry from northern and western Europe
CHB
Han Chinese in Beijing, China
CI
confidence interval
cMcentimorgan
CVD
cardiovascular disease
CYP21
steroid 21-hydroxylase gene
DNA
deoxyribonucleic acid
DIP
deletion/insertion polymorphism
FDH
Finnish disease heritage
GWAS
genome-wide association study
HGDP
Human Genome Diversity Project
HLA
human leukocyte antigen
HWE
Hardy-Weinberg equilibrium
IBS
identity by state
JPT
Japanese in Tokyo, Japan
kbkilobase
KEGG
Kyoto Encyclopedia of Genes and Genomes
LD
linkage disequilibrium
logarithm of odds
LOD
MAF
minor allele frequency
Mbmegabase
MCAD
medium-chain acyl-CoA dehydrogenase
MDS
multidimensional scaling
mtDNA
mitochondrial DNA
NHGRI
National Human Genome Research Institute
OR
odds ratio
PAR
pseudoautosomal region
PCA
principal component analysis
QC
quality control
RNA
ribonucleic acid
SDS
subpopulation difference scanning
SNP
single nucleotide polymorphism
STR
short tandem repeat
YRI
Yoruba in Ibadan, Nigeria
10
Abbreviations
Abstract
The genetic structure of populations is of interest not only as a source of
population history information, but also because of its importance to gene
mapping studies. The main aim of this thesis was to study the genetic
structure of the human populations in present-day Finland and Sweden.
Although both populations have been studied with small numbers of genetic
markers, the present analyses were the first to utilize genome-wide data from
thousands of single nucleotide polymorphism (SNP) markers. Furthermore,
this thesis introduced a novel gene mapping approach, subpopulation difference scanning (SDS), and tested its theoretical applicability to the Finnish
population.
The study subjects included 280 Finnish and 1525 Swedish individuals. Of
the Finns, 141 had grandparents born in western and 139 in eastern Finland.
For the Swedes, the geographic information was based on their places of
residence, scattered throughout the country roughly according to population
density. The Finns were genotyped for 238,000 SNPs on the Affymetrix 250K
StyI array and the Swedes for 550,000 SNPs on the Illumina HumanHap550
array. Most analyses were based on subsets of the 29,000 SNPs common to
both platforms, and none used imputed genotypes. Genotypes from Russian,
German, British and other populations served as reference data. The amount
and patterns of genetic variation between populations and individuals were
analyzed by standard population genetic methods using, for example, allele
frequencies, identity-by-state (IBS) similarities, FST distances, and linkage
disequilibrium (LD).
The results revealed that the genetic diversity within Sweden and Finland
was lower than in central European reference populations, and was substantially reduced in eastern Finland. Finns also differed clearly from central Europeans, and a strong population structure existed within Finland:
the genetic distance between eastern and western Finns was greater than
between for instance the British and northern Germans. In fact, western
Finns were genetically equally close to Swedes than to eastern Finns. In
Sweden, the overall population structure seemed clinal and lacked strong
borders. The population in the southern parts of the country was relatively
homogeneous and genetically close to the Germans and British, while the
northern subpopulations differed from the south and also from each other.
Somewhat surprisingly, however, genetic diversity in northern Sweden was
not markedly reduced.
Subjects from the Swedish-speaking region in Finnish Ostrobothnia
were genetically intermediate between Finns and Swedes and demonstrated
abstract
11
clear signs of contacts with their Finnish-speaking neighbors. Notably, the
geographically closest study areas in the north of Sweden (Norrbotten) and
Finland (Northern Ostrobothnia) did not appear to be genetically close.
Instead, the northern Swedes were genetically closer to the southwestern
Finns. Interestingly, the Finns, especially eastern Finns, showed a higher
genetic affinity to East Asian reference populations than did most other
populations. Although a similar higher affinity was observable in Russians,
they were not genetically close to eastern Finns, which suggests that the
eastern influence in the Finnish population may predate the expansion of
the Russian population to its current areas.
The genetic substructure within Finland and Sweden could cause problems
in association studies that use geographically unmatched cases and controls,
especially if the number of genotyped markers limits the possibilities for
stratification correction. On the other hand, the substructure observed within
Finland could also be utilized in mapping genes for diseases that show varying
incidences within the country, for example cardiovascular diseases with their
east-west difference. Because a gene underlying an incidence difference must
itself harbor a frequency difference, the SDS mapping approach proposes
that such genes could be mapped by comparing samples from high- and
low-incidence subpopulations and excluding from further analyses those
genome areas that do not show a (sufficient) difference between the samples.
Simulations mimicking the population history of Finland demonstrated
that the SDS approach may work for the cardiovascular diseases, provided
that a substantial portion of the incidence difference was caused by a single
gene. On the other hand, analyses of the genome-wide SNP data suggested
that the east-west difference in cardiovascular diseases may largely result
from many genes’ combined contributions, each of which may remain too
small for SDS to detect.
Overall, the patterns of population structure observed ­­­– the genetic
distinctness and reduced diversity of the north European populations, the
unique characteristics of northern Sweden, and the east-west difference and
eastern influence in Finland – are congruent with results from earlier studies
with smaller numbers of markers and from later genome-wide studies. The
patterns are also generally explicable by known features of population history:
The combination of mainly European but partly eastern elements in the
Finnish gene pool agrees well with the contacts that the Finnish population
has had to both east and west throughout its history, while the regionally
varying strength of those contacts may serve to explain genetic differences
between the eastern and western parts of the country. Furthermore, these
east-west differences have been accentuated by the history of agricultural
settlement in eastern Finland which has involved strong founder events
and subsequent isolation of small breeding units. Although small size has
undoubtedly characterized the population also in northern Sweden, there
12
abstract
the most extreme drift-induced reductions in genetic diversity may have
been alleviated by admixture with the neighboring populations.
In summary, the results of this thesis emphasize the capacity of genomewide SNP data to detect patterns of population structure – also in populations that have often been assumed homogeneous, such as the Finns and
Swedes. Obviously, knowledge of genome-wide population structure has
immediate relevance also to studies focusing on diseases and other important
phenotypes.
abstract
13
Tiivistelmä
Populaation, esimerkiksi ihmisväestön, geneettinen rakenne kertoo populaation historiasta. Geneettinen rakenne vaikuttaa myös siihen, mitkä väestöt
ovat eri geenikartoitusmenetelmien otollisimpia kohteita. Tämän väitöstyön
päätavoitteena oli tarkastella väestön geneettistä rakennetta Suomessa ja
Ruotsissa. Molempien väestöjen rakennetta on aiemmin selvitetty muutamilla markkereilla kerrallaan, mutta tässä työssä hyödynnettiin ensi kertaa perimänlaajuista aineistoa tuhansista yhden emäksen polymorfioista
(SNP:eistä). Lisäksi tässä työssä esiteltiin uusi geenikartoitusmenetelmä
(subpopulation difference scanning, SDS) ja tutkittiin sen teoreettista soveltuvuutta suomalaisväestössä käytettäväksi.
Tutkimusaineistona oli 280 suomalaista ja 1525 ruotsalaista koehenkilöä. Suomalaisilta tunnettiin isovanhempien syntymäpaikat: koehenkilöistä 141 edusti Länsi-Suomea ja 139 Itä-Suomea. Ruotsalaiskoehenkilöiltä
tiedettiin asuinpaikat, ja ne kattoivat koko maan likimain väestötiheyttä
vastaavasti. Suomalaisista tutkittiin 238.000 SNP-markkeria Affymetrix
250K StyI -siruilla ja ruotsalaisista 550.000 SNP-markkeria Illumina
HumanHap550 -siruilla. Pääosa analyyseista perustui 29.000:een siruille
yhteiseen SNP-markkeriin; laskennallista genotyyppi-imputaatiota ei
tehty. Vertailuaineistona käytettiin mm. venäläisiä, saksalaisia ja brittejä.
Väestöjen ja yksilöiden välistä geneettistä vaihtelua kartoitettiin vakiintunein
populaatiogeneettisin laskentamenetelmin esimerkiksi alleelifrekvenssien,
FST-etäisyyksien ja kytkentäepätasapainon (LD) perusteella.
Tulokset osoittivat että geneettinen monimuotoisuus Suomessa ja
Ruotsissa on pienempi kuin Keski-Euroopassa, ja Itä-Suomessa erityisen
vähäinen. Suomalaiset poikkesivat keskieurooppalaisista vertailuväestöistä
huomattavasti, ja Suomen sisällä havaittiin selvä populaatiorakenne: itä- ja
länsisuomalaisten välinen geneettinen etäisyys oli suurempi kuin esimerkiksi
brittien ja pohjoissaksalaisten välinen, ja länsisuomalaiset olivat itse asiassa
geneettisesti yhtä lähellä ruotsalaisia kuin itäsuomalaisia. Ruotsin sisällä
havaittiin pohjois-etelä-suuntainen rakenne, mutta ei jyrkkiä geneettisiä
rajoja. Eteläruotsalaiset olivat geneettisesti varsin homogeenisia ja muistuttivat läheisesti saksalaisia ja brittejä, kun pohjoisruotsalaiset puolestaan
erosivat selvästi sekä eteläruotsalaisista että toisistaan. Geneettinen monimuotoisuus Pohjois-Ruotsissa ei kuitenkaan ollut etelää alhaisempi, mikä
oli hivenen yllättävää.
Suomalaisaineistoon sisältyi pieni määrä Pohjanmaan rannikon suomenruotsalaisia. Geneettisesti nämä henkilöt sijoittuivat suomalaisten ja
ruotsalaisten välimaastoon, mikä kertoo erikielisten väestöryhmien välisestä
14
Tiivistelmä
kanssakäymisestä. Toisaalta Suomen ja Ruotsin rajaseudulla pohjoisessa
maantieteellisesti lähekkäiset suomalaiset ja ruotsalaiset koehenkilöt eivät
olleet geneettisesti erityisen läheisiä, vaan pohjoisruotsalaiset olivat geneettisesti lähempänä lounaissuomalaisia. Kun tutkittuja väestöjä verrattiin
itäaasialaisiin, suomalaisissa – erityisesti itäsuomalaisissa – nähtiin pieni
mutta selvä itäinen vaikutus. Samanlainen vaikutus nähtiin myös venäläisissä, mutta koska venäläiset ja itäsuomalaiset eivät olleet keskenään läheisiä,
suomalaisissa havaittu itävaikutus lienee peräisin ajalta ennen venäläisten
levittäytymistä nykyisille asuinseuduilleen.
Sekä Suomen että Ruotsin sisällä havaitut geneettiset erot ovat niin suuria
että ne voivat haitata geneettisiä assosiaatiotutkimuksia joissa tapaus- ja
verrokkiotosten maantieteelliset jakaumat poikkeavat toisistaan, varsinkin
jos tutkitaan niin pieniä markkerilukumääriä että perimänlaajuisia korjausmenetelmiä ei voida käyttää. Toisaalta Suomen populaatiorakennetta
voidaan mahdollisesti hyödyntää etsittäessä geenejä jotka voisivat olla tiettyjen tautien Suomen-sisäisten esiintyvyyserojen (esimerkiksi sydäntautien
itä-länsi-eron) taustalla. Perusideana on, että taudin esiintyvyyseron voi
selittää vain sellainen geneettinen tekijä, jonka alueellinen yleisyysjakauma
vastaa taudin esiintyvyysjakaumaa. Niinpä jos verrataan korkean ja matalan
esiintyvyysalueen väestöstä poimittuja otoksia, voidaan jatkotarkasteluiden
ulkopuolelle jättää ne geneettiset tekijät, jotka eivät näillä kahdella alueella
eroa. Tämän SDS-kartoitusmenetelmän toimimisen edellytyksiä testattiin
Suomen väestöhistoriaa mukailevissa tietokonemallinnuksissa ja todettiin, että menetelmä voi toimia sydäntaudeille, mikäli riittävä osa niitten
yleisyyserosta on yhden geenin aikaansaamaa. Perimänlaajuisen analyysit
viittasivat kuitenkin siihen, että sydäntautien esiintyvyyseron taustalla on
lukuisia geenejä, joista jokaisella saattaa olla yksinään niin pieni vaikutus,
että niitä ei SDS-menetelmällä välttämättä kyetä paikallistamaan.
Tässä tutkimuksessa nähdyt populaatiorakenteet – Pohjois-Euroopan
väestöjen vähentynyt monimuotoisuus ja korostunut geneettinen etäisyys
keskieurooppalaisista, Pohjois-Ruotsin omaleimaisuus sekä Suomessa
havaittu itäinen vaikutus ja maansisäinen itä-länsi-ero – vastaavat pääosin
tuloksia, joita on saatu aiemmissa, pienempiin markkerimääriin perustuneissa tutkimuksissa ja myös sittemmin tehdyissä perimänlaajuisissa
tutkimuksissa. Lisäksi ne ovat sopusoinnussa tunnetun väestöhistorian
kanssa: suomalaisten pääasiassa eurooppalaistyyppinen perimä itäisine
piirteineen selittyy yhteyksillä, joita Suomesta on kautta aikojen ollut eri
ilmansuuntiin, ja läntisten yhteyksien suurempi osuus Länsi-Suomessa
voi aiheuttaa osan Suomen-sisäisestä erosta. Eroa on lisäksi kasvattanut
Itä-Suomen epätavallinen väestöhistoria, jossa maanviljelyksen leviämiseen
liittyi voimakkaita perustajanvaikutuksia ja jossa pienet lisääntymisyksiköt
ovat myöhemminkin johtaneet geneettiseen satunnaisajautumiseen. Vaikka
Tiivistelmä
15
myös Pohjois-Ruotsissa väestömäärä on ollut pieni, mittavinta satunnais­
ajautumista lienee siellä hillinnyt naapuriväestöjen välinen sekoittuminen.
Kokonaisuutena tämän väitöstutkimuksen tulokset kertovat perimänlaajuisten SNP-aineistojen käyttökelpoisuudesta väestörakennetutkimuksissa.
Vaikka suppeammatkin aineistot pystyvät valaisemaan väestöhistoriallisia
ilmiöitä, tautien ja muiden näkyvien ominaisuuksien tutkimuksissa nimenomaan perimänlaajuisen rakenteen tuntemus on olennaista. Nämä tulokset
myös muistuttavat, että kulttuuristen erojen vähäisyys ei takaa väestön
geneettistä homogeenisuutta, kuten Suomessa havaitut jyrkät geneettiset
erot hyvin osoittavat.
16
Tiivistelmä
Introduction
Population genetics studies the genetic structure of populations – groups
of (potentially) interbreeding individuals – and the forces affecting that
structure. Insights into a population’s structure can be of interest for several
reasons. Because the structure results from various forces that have affected
the population, it carries information about the population’s past; in gene
mapping efforts, the genetic variation and structure of the study population
determine the potential success of different mapping approaches; in forensics,
information on population structure and variant frequencies are centrally
important in assigning correct probabilities for random and observed genetic
matches; in the conservation of endangered species, the maintenance of
genetic variability requires knowledge of population structure, and so on.
For many decades, the markers available to population structure studies
were limited to blood groups and a small number of other proteins, until the
development of deoxyribonucleic acid (DNA) methods enabled the use of
molecular markers. Since then, the majority of population genetic studies
have focused on molecular variation in the mitochondrial DNA (mtDNA)
and the Y chromosome. Their uniparental mode of inheritance and lack of
recombination make them the markers of choice for studies of branching and
timing as well as for the detection of sex-specific phenomena. On the other
hand, each is only a single locus, subject to random forces such as genetic drift,
and may thus not reflect the full history of a population. Their contribution
to the phenotypic variation of populations is also limited. Fortunately, it
has recently become technically feasible to complement them by analyses of
genome-wide sets of thousands of single nucleotide polymorphisms (SNPs).
This thesis studies the population structure of humans (Homo sapiens
Linnaeus 1758) in Finland and Sweden using autosomal markers, with an
emphasis on the population history implications and gene mapping consequences. These northern European populations are of interest because their
remote geographic location has led to a restricted gene flow into, within,
and between them. Additionally, after a long history of small sizes and low
densities – partly due to the relatively late introduction of agriculture – the
populations have recently expanded. Whereas the genetic structure in
both populations has previously been studied using protein, mtDNA and
Y-chromosomal markers, studies III and IV of this thesis constitute the first
published analyses of population structure within Finland and Sweden that
were based on genome-wide SNP data.
With molecular markers, genetic structure in Finland has been studied
more extensively than that in Sweden. This difference may relate to interest
Introduction
17
awakened by the obvious discrepancy between the eastern origin of the
Finns’ language and their predominantly European gene pool. Furthermore,
it may partly stem from research efforts directed toward diseases of Finns,
the existence of which is intimately related to population history. Such
studies have demonstrated the feasibility of Finns as a target population
for standard gene mapping methods. Meanwhile, it may also be possible
to utilize the typical features of the Finnish population structure by more
specific methods. This is exemplified by Study I, which introduces a novel
approach (subpopulation difference scanning, SDS) potentially useful in
mapping disease genes that have drifted to differing frequencies in otherwise
closely related subpopulations.
In humans, studies of population structure can complement population
history information derived from various other sources including written
historical records, archaeology, and linguistics. In theory, a population’s
genetic structure provides a source of information independent of data and
conclusions from the other disciplines. In practice, however, the genetic
structure observed is usually compatible with several population history
scenarios, and the confidence intervals of any timing estimates typically
remain very wide, rendering it impossible to base sensible conclusions as
to a population’s past on genetic data alone. Thus, the interpretation of
population genetic results in terms of population history will rely heavily
on knowledge and insights from other disciplines. Because of the interdisciplinary nature of the subject, this thesis attempts to review non-genetic
literature on the history of its study populations in a wider fashion than is
usually possible in standard genetic studies. Likewise, the Review of the
Literature section aims at providing potential non-geneticist readers with
some basic knowledge of population genetics essential for interpretation of
this and other genetic studies.
18
Introduction
Review of the literature
The genetic structure of a population results from a combination of various
processes, the basic features of which are briefly described in the first section
of this literature review. Their consequences for the human genome structure,
gene mapping strategies, and population history inference are discussed
in the three subsequent sections. The last two sections review the history,
linguistics, and population structure of Finland and Sweden. Treatment of
these in the wider context of Europe and the Baltic Sea region can be found
for example in Lappalainen (2009).
This thesis refers to many geographical areas by their names in the local
language rather than in English (for example Skåne and Häme instead of
Scania and Tavastia). For simplicity, these names are used also in discussing the past, when the current entities were still nonexistent. Analogously,
the population history review concerns those moving into and residing
in the current area of Finland and Sweden since ancient times, although
the emergence of Finns and Swedes as nations is naturally a much later
development. Additionally, this thesis uses the term ”history” to refer to
past events regardless of the existence of written sources, i.e., referring to
both history and prehistory.
Unless otherwise indicated, information in the first section is based on
Hamilton (2009), Hartl & Clark (2007), and Hedrick (2005), in sections 2
and 3 on Strachan & Read (2011) and Read & Donnai (2011), and in section
4.1 on Jobling et al. (2004).
Review of the literature
19
1.Basics of population genetics
1.1.Inheritance
Diploid organisms have two alleles in each locus. In sexual reproduction, only
one of the alleles is passed to an offspring, and the offspring inherits one
allele from each parent. Which of the two alleles is inherited is determined
randomly and independently for each offspring, as originally established by
Mendel (Mendel’s first law).
In human beings, the exceptions to this pattern are the mitochondrial DNA
(mtDNA) and the sex chromosomes. The mtDNA is passed from mothers to
all offspring; thus, it forms maternal lineages. The Y chromosome, in turn,
is inherited along a paternal lineage from father to sons. Mothers pass an
X chromosome to both daughters and sons, and daughters receive another
X chromosome from their fathers (Figure 1).
paternal
grandfather
paternal
grandmother
maternal
grandfather
father
maternal
grandmother
mother
son
daughter
Figure 1. Inheritance in a three-generation pedigree. Mitochondrial DNA (circles) is
inherited maternally and the Y chromosome (small bars) paternally without recombinations,
whereas the autosomal chromosomes (pairs of long bars) recombine in each meiosis.
X chromosomes not shown.
1.2. Allele frequencies
Four evolutionary factors can change the allele frequencies of a population:
mutation, migration, selection, and genetic drift. In the absence of these
factors, the allele frequencies of a population will remain constant from one
20
Review of the literature
generation to the next. Notably, whereas selection depends on phenotype,
the other three factors are in principle independent of the phenotypic effects
of the locus in question.
1.2.1.Mutation
Mutation is a random, molecular-level process that results in permanent
differences between the ancestral and descendant copies of a DNA sequence.
It is the ultimate source of all genetic variation. Mutations range from single-base DNA changes to large structural alterations, and their phenotypic
or fitness effects range from advantageous through silent or neutral to highly
deleterious. The frequency of mutations depends on the type of mutation
and the organism in question (for humans, see section 2.1). In general,
mutations are rare and will affect the allele frequencies of a population only
in the long term.
1.2.2.Genetic drift
Genetic drift is a random process caused by sampling errors in the proportions at which individuals, gametes, and alleles from the parental generation
contribute to the gene pool of the next generation. It results in random allele
frequency fluctuations which, despite being independent in successive generations, tend to accumulate over time. In the absence of balancing effects
of selection or new variants introduced to the population by mutation or
migration, genetic drift will lead to the fixation of one allele in the population
and loss of the other(s); the probability of an eventual fixation of an allele
is equal to its initial frequency. Thus, genetic drift leads to a reduction in
a population’s genetic diversity. Meanwhile, it causes divergence between
populations, as they can become fixed for different alleles.
Since sampling errors are largest in small samples, the effect of genetic
drift is strongest in small populations (Figure 2). Even short periods of
small population size – population bottlenecks – can lead to substantial
genetic drift and a marked reduction in genetic diversity. Another instance
of considerable genetic drift is founder effect: when a small group of individuals emigrates to form a new population, the allele frequencies in that
founder group – and subsequently in the established population – may
differ strikingly from those of the ancestral population. Genetic drift can
also have a stronger effect on the population than its census size would
suggest, if the effective size of the population is small, for example due to
an unequal breeding sex ratio or a large variation in family size. Notably,
mtDNA and the Y chromosome have lower effective population sizes due
to their uniparental inheritance, which makes them more prone to genetic
drift than autosomal, biparentally inherited loci.
Review of the literature
21
B
C
0.0
0.2
0.4
0.6
allele frequency
0.8
1.0
A
0
20
40
60
generation
80
100 0
20
40
60
generation
80
100 0
20
40
60
generation
80
100
Figure 2. Allele frequency fluctuations caused by genetic drift in populations of size
50 (A), 300 (B), and 1500 (C). Each line denotes allele frequency from one simulation of
100 generations. For example, in small population A, of the initial 20 loci, only 6 remain
polymorphic after 100 generations, whereas in the larger populations, none of the alleles
become fixed, but their final frequencies vary between ca. 0.2 and 0.85 in B and between
0.4 and 0.65 in C. Initial allele frequency in all simulations was 0.5; 20 simulations were
done per population size. Calculated with a population simulator written by E.S. and
described in Lappalainen et al. 2010.
1.2.3.Selection
If individuals with a certain phenotype leave more descendants than others,
the alleles contributing to that phenotype (if any) rise in frequency. This process, called selection, is the mechanism producing evolutionary adaptation.
Selection can act for or against any genotype(s) and affect any life stage of
an individual, for example viability, mating success (in which context it is
often called sexual selection) or fecundity. The effects of selection on a population’s allele frequencies depend on the strength of selection for or against
a given genotype, on genotype frequencies, and on population size; in small
populations, the effects of selection can easily be overridden by genetic drift.
Selection can also affect loci that are themselves neutral: in a phenomenon
called genetic hitch-hiking, alleles can change in frequency due to selection
acting on a nearby locus. The general importance of selection in shaping
and maintaining a population’s genetic variation has been highly debated.
1.2.4.Migration
The migration of individuals between populations (or the movement of
their gametes, as with plant pollen) can lead to gene flow, i.e., the inclusion
of the individuals’ alleles in the new population’s gene pool. Gene flow can
change allele frequencies in either or both populations. These changes can
efficiently oppose the effects of genetic drift: they reduce the divergence
between the populations by harmonizing allele frequencies across them, and
22
Review of the literature
maintain the variation within the populations by replacing alleles that could
otherwise become lost. Within a population, migration patterns can lead to
deviations from random mating, i.e., to population substructure. One such
spatial pattern that is frequently observed is isolation by distance, where
geographically close individuals are most likely to mate, and the probability
of mating will decrease with distance.
1.3.Genotypes
Under a set of simplifying assumptions, the genotype frequencies of a population follow the Hardy-Weinberg equation p2 + 2pq + q2 = 1. In this equation,
p is the frequency of allele A, q = p - 1 is the frequency of allele B, with p2,
2pq, and q2 being the frequencies of the genotypes AA, AB, and BB. This
equation will hold for a biallelic locus in a population of a diploid, sexually
reproducing species with non-overlapping generations. Furthermore, the
population must mate randomly and have equal allele frequencies for males
and females; selection, mutation, migration, and genetic drift need to be
absent; and the union of gametes in fertilization must be random. However,
even when some of these conditions go unmet, the equation will often hold
approximately, for example in finite populations (i.e., in the presence of
genetic drift) with overlapping generations.
A population whose genotype frequencies match those predicted by the
above equation is in Hardy-Weinberg equilibrium (HWE). Deviations from
HWE can signal deviations from the equation’s basic assumptions. For
example, relative to the equilibrium frequencies, positive assortative mating
leads to an excess of homozygotes, but negative assortative mating to an
excess of heterozygotes. Likewise, the presence of diverged subpopulations
within a population will lead to a deficiency of heterozygotes compared to
a situation in which the same population is mating randomly (Figure 3).
Such heterozygote deficiency is measured by the F statistics (for example FST) that are used to detect population structure. Note that nonrandom
mating patterns – unlike the factors listed in section 1.2 – do not change
the allele frequencies of the population, only the distribution of the alleles
into genotypes: if the population becomes randomly mating (and otherwise
fulfills the criteria above), its genotype frequencies will reach HWE in the
following generation.
Review of the literature
23
A
Allele frequency in total population: 1/2
Subpopulation allele frequency: 1/6 Subpopulation allele frequency: 5/6
Heterozygote frequency: 20/72 = 28%
B
Allele frequency in total population: 1/2
Subpopulation allele frequency: 1/2 Subpopulation allele frequency: 1/2
Heterozygote frequency: 36/72 = 50%
Figure 3. Genotype frequencies in a subdivided population (A) and a randomly mating
population (B). Each subpopulation is panmictic and has genotype frequencies determined by the Hardy-Weinberg equilibrium, but the population consisting of two diverged
subpopulations (A) has fewer heterozygotes than the population with the same overall
allele frequency but no subdivision (B). Black and white circles represent two types of
alleles, and ovals demarcate alleles of one individual.
1.4. Haplotypes, recombination, and linkage disequilibrium
According to Mendel’s second law, the alleles at two loci are inherited independently. An exception to this rule are the alleles in adjacent markers on the
same chromosome. These alleles form a haplotype, and are inherited together
unless a recombination occurs between them (Figure 1). Recombinations
result when the homologous chromosomes (one inherited from the mother
and another from the father) change segments in meiosis in a process called
crossing over. Recombinations may create haplotypes different from those
present in the parental chromosomes, thus leading to increased genetic
variation in populations. Like mutation, however, recombination is a slow
process, and its effect on a population’s genetic variation is therefore relatively small.
24
Review of the literature
The further apart two loci are on a chromosome, the more probable that a
recombination will occur between them. When two loci have a 1% probability
of recombination, their genetic distance is defined as 1 centimorgan (cM). In
humans, 1 cM roughly corresponds to a physical distance of 1 megabase (Mb),
but variation in both directions is at least threefold (Strachan & Read 2011).
On an extremely fine scale, recombination rates show even greater variation,
producing recombination hotspots (Myers et al. 2005, Coop et al. 2008).
When alleles on adjacent markers co-occur on a chromosome more often
(or more rarely) than would be expected at random based on their respective
frequencies, they are said to be in gametic disequilibrium or linkage disequilibrium (LD). LD commonly arises because a mutation initially occurs on
a certain haplotype, and recombinations will only gradually produce other
haplotypes carrying the mutation. Additionally, the level of LD can depend
on many other population processes: Selection for or against certain allele
combinations can change haplotype frequencies relative to LD. In positive
assortative mating, the decay of LD will become slower. Small populations
maintain higher levels of LD than do larger populations, and LD decays
faster in expanding than in constant populations (Slatkin 1994). Admixture
can create strong LD, especially when highly diverged populations mix in
equal proportions. Notably, processes like selection or admixture can create LD even between loci situated on different chromosomes or otherwise
completely unlinked.
Genome regions with exceptional recombination patterns include the
mtDNA which does not recombine, the Y chromosome which does not
recombine apart from small pseudoautosomal regions (PARs), and the X
chromosome which, apart from PARs, recombines only in female meiosis.
Review of the literature
25
2. Human genome structure
The human genome contains ca. 3200 Mb of DNA. It consists of the nuclear
genome – 22 pairs of autosomes and one pair of sex chromosomes (XX in
females and XY in males) that vary in length between 48 and 249 Mb – and
the small mitochondrial genome which is 16.6 kilobases (kb) in size. The
total genetic length of the nuclear genome (excluding the Y chromosome)
is 3615 cM (Kong et al. 2002).
While the whole mitochondrial DNA was sequenced relatively early
(Anderson et al. 1981), a detailed insight into the nuclear genome was first
provided by the initial sequences produced by the International Human
Genome Sequencing Consortium (Lander et al. 2001) and a company, Celera
Genomics (Venter et al. 2001). These sequences covered about 90% and
were later refined to cover more than 99% (International Human Genome
Sequencing Consortium 2004) of the euchromatic portion of the nuclear
genome. (The remaining 200 Mb, or 6.5% of the genome, is highly repetitive heterochromatin which is very difficult to sequence.) Since then, the
International HapMap Project (see section 2.2), The 1000 Genomes Project
(1000 Genomes Project Consortium 2010), and many other studies have
complemented and updated these views (Lander 2011).
The majority of the human genome consists of different types of repetitive
sequences; transposable elements alone constitute at least 40% (Strachan
& Read 2011, de Koning et al. 2011). Meanwhile, only 1.1% of the genome
is protein coding, and another 2 to 4% shows evolutionary conservation
across species that points to important regulatory or other functional roles
(Dermitzakis et al. 2005). On the other hand, it appears that almost all of
the human genome is transcribed (ENCODE Project Consortium 2007),
which suggests previously unknown functions and mechanisms for the
noncoding DNA.
The human genome is estimated to harbor ca. 20,500 protein-coding
genes (Clamp et al. 2007). Compared to other animals, this is an unexceptional number: for instance, the flatworm Caenorhabditis elegans has more
than 19,000 genes in its 97-Mb genome (C. elegans Sequencing Consortium
1998, Hillier et al. 2005), and the water flea Daphnia pulex has more than
30,000 (Colbourne et al. 2011). Due to the abundance of alternative splicing
(Pan et al. 2008), however, the human genome codes for a much higher
number of different proteins than the number of protein-coding genes might
suggest. Additionally, the human genome contains at least 6000 genes for
various types of noncoding ribonucleic acids (RNAs) (Amaral et al. 2008,
Read & Donnai 2011).
26
Review of the literature
2.1.Types and patterns of variation
The sequences of any two human beings are on the average about 99.9%
identical (Kidd et al. 2004). Compared to that of many other species, this
degree of variation is small, and has been attributed to a recent population
bottleneck in the human species (Jorde et al. 2001 and references therein).
The global distribution of these differences is also relatively uniform: the
differences between continents account for 3 to 10% of the genetic variation
and the differences between populations within a continent for only 2 to 5%,
while 84 to 95% of the genetic variation appears between individuals within
populations (Barbujani et al. 1997, Rosenberg et al. 2002). Furthermore,
the geographic patterns of variation between populations tend to be mostly
clinal (as in Europe; Novembre et al. 2008).
All genetic variants have originally been produced by mutation (see
section 1.2.1). Other factors that produce variation are recombination and
sexual reproduction that reshuffle these variants into new combinations in
individuals. Variants whose frequency in a population exceeds an arbitrary
threshold, often 1%, are termed polymorphisms. Variants rarer than this
are often referred to as mutations. This may easily bring unsupported connotations of pathogenicity and recent origin, although a rare variant can be
equally old and equally neutral as a common one – indeed, the same variant
can obviously be common in one population and rare in another.
2.1.1.Single nucleotide polymorphisms (SNPs)
In terms of numbers, the most abundant type of variation in the human
genome involves single nucleotide polymorphisms (SNPs): locations of
the genome where individuals may frequently differ by one DNA base
(Figure 4A). Common SNPs with minor allele frequency (MAF) above 0.05
number about 9 to 10 million (International HapMap Consortium et al. 2007),
i.e., on average roughly one per 300 bases, although their density along the
genome varies. The rate of single-base mutations is very low, approximately
1.0 to 2.5 x 10-8 mutations per nucleotide per generation (Nachman & Crowell
2000, 1000 Genomes Project Consortium 2010). Consequently, most SNPs
have likely arisen in a single mutation event in the past, which makes them
attractive markers for association studies (see section 3.2).
Review of the literature
27
A
B
ACGAGCTGCCACG
TGCTCGACGGTGC
ACGAGCCACACAGCCACG
TGCTCGGTGTGTCGGTGC
ACGAGCCGCCACG
TGCTCGGCGGTGC
ACGAGCCACACACACAGCCACG
TGCTCGGTGTGTGTGTCGGTGC
Figure 4. Two possible alleles of a single nucleotide polymorphism (SNP) (A) and a
microsatellite (B). The SNP alleles differ by one basepair (bold) of a DNA stretch. In the
microsatellite, the upper allele contains three dinucleotide (CA/GT) repeats, and the
lower allele contains five.
2.1.2.Microsatellites
Microsatellites, or short tandem repeats (STRs), are runs of short sequence
units (1-6 nucleotides), repeated tandemly in the genome. The number of
repeats can vary between individuals (Figure 4B). The human genome contains about 150,000 polymorphic microsatellites. Microsatellite mutations
are typically additions or deletions of one to two repeat units, caused by polymerase errors during DNA replication. The mutation rate of microsatellites
is much higher than that of SNPs, in the range of 1.5 x 10-3 (Butler 2006).
Due to their high mutation rate and multiple alleles, many microsatellites
are highly polymorphic (Weber & May 1989, Litt & Luty 1989), which has
made them widely useful in many genetic applications since the early 1990s.
Whereas microsatellites are still in use in forensics, technical advances
in high-throughput SNP genotyping in the early 2000s have made SNPs
the markers of choice for gene mapping and much of population genetics;
relative to microsatellites, their higher genomic density and lower mutation
rate amply compensate for their lower diversity.
2.1.3.Structural variation
Structural variants can be defined as genomic alterations that are larger than
1 kb in size. They include copy number variants such as insertions, deletions,
and duplications, and structural alterations that do not change the copy number, such as inversions and translocations. Copy number changes involving
segments less than 1 kb in size are often called indels or deletion/insertion
polymorphisms (DIPs) (Feuk et al. 2006). Structural variants appear to be
much more common in the genome than previously believed (Sebat et al.
2004, Iafrate et al. 2004, Tuzun et al. 2005, Redon et al. 2006), even in the
nonrepetitive sequences of phenotypically normal individuals. Although
28
Review of the literature
structural variants are fewer in number than are SNPs, they comprise the
majority of differing nucleotides between individuals (Levy et al. 2007).
2.2. Haplotype structure
The level of LD varies along the human genome. These patterns of longrange LD are usually similar between populations (De La Vega et al. 2005,
Service et al. 2006, Conrad et al. 2006), although the overall levels of LD
can differ as a result of population history (see section 1.4). At small scale,
the LD pattern is characterized by discrete haplotype blocks of high LD that
exhibit low diversity and are separated by points of recurrent historical
recombinations, while the long-range LD arises from haplotype correlations
between these blocks (Daly et al. 2001).
The block structure, resulting from the effects of recombination
hotspots and human population history, has been studied in detail by the
International HapMap Project (International HapMap Consortium 2003,
2005, International HapMap Consortium et al. 2007, International HapMap
3 Consortium et al. 2010), initially focusing on four populations from three
continents. Depending on the block inference method, the blocks appear 5 to
16 kb in length and carry 3.6 to 5.6 haplotypes on average. The overall block
structure is similar among populations, although in the African HapMap
population, the blocks are slightly shorter and harbor more diversity than
in the European or Asian HapMap populations.
Because of the low haplotype diversity within blocks, knowing block
structure can greatly simplify the pursuit of association mapping (see section 3.2): when most of the variation within a block can be captured with a
few tagging SNPs, in a genome-wide association study it suffices to genotype
a few hundred thousand tagging SNPs, instead of the several million common
SNPs present in the genome.
Review of the literature
29
3.Gene mapping strategies
Like any genetic variants, disease alleles follow the basic mechanisms of
inheritance described in section 1.1: once created by mutation, they are
passed from parents to offspring along with the alleles in their adjacent
markers, unless separated by a recombination; over the generations, the
recombinations narrow down the segments of the original haplotype that the
disease chromosomes share. These mechanisms form the basis of the two
main approaches of gene mapping, linkage analysis and association analysis,
which are detailed in sections 3.1 and 3.2. These approaches monitor the
flow and sharing of chromosome segments in pedigrees or populations by
use of sets of molecular markers. None of these markers needs to be directly
causal of the disease but some of them may show a pattern similar to the
one expected for the disease gene. Such markers are likely to reside close
to the causal locus.
Neither of these approaches leads directly to the identification of a gene
underlying the phenotype but only indicates its probable position in the
genome – the actual gene has to be recognized by other means. In the old
days, those involved the tedious process of cloning, but with the current
availability of the human genome sequence, genes in the relevant area can
just be sought in the databases and prioritized for further analyses like
mutation screening based on (predicted) information on their function. The
new sequencing methods also enable a direct search for mutations causing
Mendelian diseases from whole-genome or whole-exome data. However,
distinguishing the pathogenic mutation(s) from among the thousands of
benign variants discovered requires intensive computational efforts of careful
filtering (Bamshad et al. 2011, Gilissen et al. 2012).
3.1.Linkage analysis
Linkage analysis, in its most basic form, monitors the cosegregation of
a phenotype and the alleles of a set of markers when they are inherited
in a pedigree; the markers whose segregation pattern is compatible with
the pattern of the phenotype are likely to reside close to the causal gene
(Figure 5). Extensions of this basic principle exist, for example for quantitative phenotypes. In humans, the pedigrees available are usually not
maximally informative (as specifically designed test crosses would be), and
linkage mapping has to rely on computational methods. Parametric linkage
analysis, applicable to phenotypes in which the inheritance pattern is known
and Mendelian, produces logarithm of the odds (LOD) scores that quantify
how much more likely is an observed pedigree if the disease gene and the
marker in question are linked than if they are unlinked. Nonparametric
30
Review of the literature
linkage analysis, suitable also for phenotypes with less clear patterns of
inheritance, detects markers in which affected relatives share alleles more
often than expected by chance.
Figure 5. Principle of linkage mapping. In a three-generation family, an autosomal
dominant disease (black circles and squares) segregates with a short haplotype of three
markers (colored bars and horizontal stripes). The disease allele appears to be inherited
along with the red haplotype, and the recombinations in the two individuals on the lower
right suggest that the disease gene resides close to the uppermost marker (arrow).
Naturally, in a single meiosis, many alleles throughout the genome will
show an inheritance pattern compatible with that of the phenotype. Therefore,
linkage analysis requires information from many meioses, and typically it is
necessary to collect several pedigrees. Still, the number of informative meioses and recombinations crucially limits the resolution of linkage analysis. In
some cases, the resolution can improve in subsequent analyses of ancestral
haplotypes or LD or in homozygosity mapping (Lander & Botstein 1987).
The inherently low resolution, on the other hand, makes linkage analyses
with relatively low-density marker sets feasible: the linkage studies of the
late 1990s and early 2000s routinely used sets of 300 to 500 microsatellites.
Although the microsatellites were informative for linkage due to their high
polymorphism, they have by now been largely replaced by SNP sets that
compensate for the lower marker variability with substantially higher density
and allow high-throughput genotyping.
Review of the literature
31
3.1.1.Suitability for different populations and phenotypes
Linkage analysis may benefit from the use of specific populations. In populations characterized by strong founder effects or other forms of extreme
genetic drift, some diseases that are rare elsewhere may have become
enriched. This will facilitate the collection of a sufficient number of families
affected with the disease. Furthermore, genetic drift has likely reduced allelic
heterogeneity and thus simplified haplotype-based fine-mapping efforts,
as well as the locus heterogeneity, so that most of the affected families will
presumably carry a mutation in the same gene.
Linkage analysis has proved effective in mapping genes causal of monogenic diseases, but it is very sensitive to locus heterogeneity and has low
power to detect loci with weak effects. Its successes have therefore been
modest in locating genes for complex diseases, which are thought to arise
from the contributions of several small-effect susceptibility alleles from
multiple genes. Indeed, replicable detection of such genes with linkage
analysis would require extensive if not impossibly large sample sets, whereas
the same power is achievable with considerably smaller sample sets in
association analyses (Risch & Merikangas 1996). The latter have thus won
popularity in complex disease studies, once technological developments
made them feasible.
3.2. Association analysis
Rather than explicitly inferring recombination events during meioses, association analysis relies on the cumulative effect of those recombinations in a
population; individuals sharing a disease mutation through common descent
are also likely to share adjacent short stretches of the haplotype in which
the mutation originally occurred (or in which the mutation entered the
population). This leads to LD between the disease allele and the surrounding
alleles and can be detected as co-occurrence of the surrounding alleles with
the disease more often than expected (Figure 6). In its simplest form, association analysis compares a set of cases with the disease of interest to a set of
control individuals without the disease but otherwise resembling the cases
as closely as possible; any genetic factors that differ between the groups are
assumed to relate to the disease. Extensions of this principle to quantitative
phenotypes exist, but in this section, association analysis is mainly discussed
from the point of view of case-control analysis and disease traits.
32
Review of the literature
cases
x
x
x
x
x
T
T
x
C
C
C
C
controls
x
T
C
x
x
T
C
x
x
T
C
C
C
T
T
T
C
frequency of allele T:
8/20 = 40%
T
C
C
C
C
C
C
C
C
C
x
C
C
C
C
T
C
T
C
C
C
T
C
frequency of allele T:
4/20 = 20%
Figure 6. Principle of association mapping. A genetic variant (marked X) that contributes
to risk for a complex disease is in linkage disequilibrium (LD) with allele T of a nearby
SNP. Despite incomplete LD (risk haplotypes with C and non-risk haplotypes with T
allele), reduced penetrance (healthy individuals with the risk variant) and phenocopies
(cases without the risk variant), the T allele is more common among cases than among
controls, i.e., it associates with the disease.
Because many meioses and recombinations have taken place in a population since the original appearance of a disease mutation, LD usually
extends a much shorter distance on the chromosome than does linkage, and
association analyses typically require a genotyping density in the range of a
few kilobases. This obviously translates to a relatively high resolution. On
the other hand, the prior unavailability of sufficiently dense marker-sets
restricted association analyses mainly to candidate gene studies and to
the fine mapping of linkage analysis signals (in which context association
analysis is often called linkage disequilibrium analysis).
Since the early 2000s, commercial SNP genotyping arrays have enabled
genome-wide association studies (GWASes). These are based on the “common disease - common variant” hypothesis, in which susceptibility to complex diseases results from genetic variants that are common in the population
and also largely shared between populations. By March 2012, the nearly 1200
published GWASes had detected associations (p < 1 x 10‑8) of more than
2000 SNPs with dozens of phenotypes ranging from melanoma, migraine,
and malaria to height and hair color (National Human Genome Research
Institute (NHGRI) Catalog of Published Genome-Wide Association Studies,
http://www.genome.gov/gwastudies, accessed 03/06/2012).
Review of the literature
33
Compared to linkage analysis, association analysis is less sensitive to
reduced penetrance of the causal variant and to locus heterogeneity. On
the other hand, unlike linkage analysis, it can be seriously affected by allelic
heterogeneity or population stratification. Population stratification can
create spurious associations, if the sampling patterns of cases and controls
across subpopulations differ, because an allele whose frequencies vary
between subpopulations may differ between cases and controls even if the
allele is unrelated to the phenotype. Obviously, when genome-wide data
are available, they can serve in the genetic matching of cases and controls
prior to association analysis or in a computational stratification correction,
typically based on principal component analysis (PCA) (Price et al. 2006).
3.2.1.Suitability for different populations
Association analysis may benefit from the use of populations that are characterized by founder effects, small size, or isolation, because their reduced
genetic diversity can lead to lower locus and allelic heterogeneity. However,
the reduction in diversity is likely smaller for common variants than for the
rare ones targeted by linkage analysis. The obvious downside of such study
populations is that some of the variants contributing to the disease in other
populations will be missing or rare and therefore impossible to find, and,
conversely, the enriched variants may elsewhere have lower frequencies and
thus be of less importance (see Martin 2006).
The LD utilized by association analysis can vary in strength between populations. When LD extends more widely on the chromosome, an association
analysis needs a less dense set of markers. It can therefore be advantageous
to conduct association studies in populations that exhibit high LD, be it
from founder effects, small population size (Terwilliger et al. 1998), or other
factors resulting in genetic drift. This includes a trade-off, however; while
stronger LD allows association scanning with smaller sets of markers, it also
leads to lower resolution. This disadvantage may be partly circumvented by
replicating and refining association findings from high-LD populations in
populations that are more outbred.
Increased LD can result from a recent population admixture and, as
stated above, reduce the number of markers needed in an association scan.
Additionally, if a disease differs in frequency between the parental populations, its causal variant is likely to lie in a genome region where cases show
ancestry from the high-frequency parental population either more often
than controls do or more often than elsewhere in their genome (Stephens
et al. 1994, Smith & O’Brien 2005, Zhu et al. 2008); the ancestry inference
can be based on a set of ancestry-informative markers (AIMs) that differ
in frequency between the parental populations. This approach, admixture
mapping (or mapping by admixture linkage disequilibrium), has successfully
34
Review of the literature
located genes for several phenotypes such as hypertension and multiple
sclerosis (Zhu et al. 2005, Reich et al. 2005; see Winkler et al. 2010 for a
review). The usual target population has been African-Americans, in whom
the admixture between Africans and Europeans is sufficiently recent to
enable mapping with a hundredfold fewer markers than for nonadmixed
populations (Smith et al. 2004). Other AIM panels exist, for example for
Latinos (Tian et al. 2007, Price et al. 2007, Mao et al. 2007) and East Asian
Uyghurs (Xu & Lin 2008). A further advantage of admixture mapping is
that it can locate relatively low-risk loci by use of fairly small sets of cases
and controls (Hoggart et al. 2004, Patterson et al. 2004; cf. section 3.2.2).
3.2.2.Suitability for different phenotypes
The commercial SNP genotyping arrays that are used in GWASes feature
common SNPs, typically with MAFs exceeding 5%. Since the LD between
a common and a rare variant will be low (Wray 2005, Eberle et al. 2006),
GWASes are underpowered to find rare disease alleles. Common alleles, in
turn, are unlikely to have large effects on disease risk, due to the purifying
effects of selection – perhaps with the exception of late-onset diseases
and variants that would not have been disadvantageous, for example, for
a pre-industrial life style. Thus, GWASes were predicted to find mainly
low-effect variants, and, indeed, most SNPs indicated by GWASes have small
effect sizes: the 1,234 associated SNPs (with p < 1 x 10-8) reported by the
NHGRI Catalog of Published Genome-Wide Association Studies (accessed
01/01/2012) had a median odds ratio (OR) below 1.28.
Finding small-effect alleles requires large sample sizes, especially in
genome-wide studies where the significances must survive a stringent multiple testing correction. Accordingly, many of the current successful GWASes
are meta-analyses or efforts of large international consortia, and feature
combined datasets with thousands or tens of thousands of individuals. The
need to collect large numbers of cases can obviously limit the usefulness of
GWASes in small populations or for relatively rare diseases. Furthermore,
the power of prediction of the disease risk of individuals based even on
ample sets of such markers tends to be low, often much lower than that
based on traditional (non-genetic) risk factors (see Meigs et al. 2009 for
type 2 diabetes, Aulchenko et al. 2009 for height).
Even when association analyses have revealed numerous variants that
contribute to a complex disease or phenotype, the variants commonly explain
only a modest portion – usually 10% or less – of the genetic variation in
the phenotype (although when non-significantly associated loci are also
considered, the proportion may rise to 50%) (Visscher et al. 2012 and their
references). This phenomenon, known as missing heritability, may result
from many factors: 1) epistatic interactions computationally unfeasible
Review of the literature
35
to test in a hypothesis-free manner; 2) structural variations, which are
currently understudied in GWASes, substantially contributing to disease
susceptibility; 3) complex inheritance, for instance epigenetic mechanisms or parent-of-origin effects; 4) inflated estimates of total heritability;
5) common variants not reaching genome-wide significance in the GWASes
due to their low penetrance but that might be found by further enlarging
sample sizes; 6) actual causal variants in the GWAS-associated regions having
larger effects than the genotyped SNPs; and 7) rare alleles that confer large
effects and that may be found by sequencing (Eichler at al. 2010, Manolio
et al. 2009). These factors, however, have not yet been sufficiently studied
to enable estimation of their relative contributions to genetic variation in
human complex diseases, and the subject remains debated.
3.3.Complementary methods
As described, linkage and association analysis are applicable to a wide range
of diseases and other phenotypes. Indeed, each approach has mapped dozens
of genes successfully. Both approaches nevertheless have their disadvantages
and limitations, which creates a demand for complementary methods that
may be suitable for a specific subset of phenotypes, populations, or variants.
This thesis introduces and theoretically tests one such method, the subpopulation difference scanning (SDS).
36
Review of the literature
4.Genetic inference of population
history
4.1. General principles
As described in section 1, various demographic processes will influence the
genetic structure of a population. Conversely, population structure can be
useful in inferring processes that have affected the population in the past and
thus in estimating earlier population structure. Another, more direct means
of assessing the structure of past populations would be studying ancient
DNA (aDNA), but such studies often suffer from the limited availability of
suitable samples and from technical challenges such as DNA degradation
and contamination (Hofreiter et al. 2001, Mitchell et al. 2005, Willerslev &
Cooper 2005). Consequently, the genetic inference of population histories
mostly relies on studies of extant populations.
To complement direct observations of contemporary population structure
and diversity, population history inference often utilizes population simulations. These can model the processes that may have played a role in the
population’s past. Two main types of simulations exist. Forward simulations
start from a set of initial parameters, proceed forwards in time according to
defined demographic scenarios, and produce simulated datasets that can
then be compared to the real data. Backward or coalescent simulations start
from the observed data and proceed backwards in time to infer what kind of
processes may have produced them. However, relative to the multitude of
factors that affect real population demographies, both forward and backward
simulations are often highly simplified. Additionally, compatibility of the
observed data with (forward) simulations does not prove that the simulated
scenario would be the one that has actually taken place. Equally or more
compatible results could result from other, untested simulation scenarios.
Thus, instead of proving a given scenario, simulations can serve to exclude,
from various explanations for the observed data, obviously improbable
scenarios.
Genetic variation within a population may result from very different
population-genetic processes, typically from a combination of many factors.
For example, factors such as selection, absence of migration, or a small
population size that has caused strong genetic drift can all result in low
diversity of a locus. The presence of selection can often be explicitly tested,
or simply ignored if the proportion of loci under selection is presumably low.
The effects of selection can, however, extend to neighboring loci through
hitch-hiking phenomena (see section 1.2.3). Mutation, in turn, usually has
negligible effects on population diversity when the time intervals of interest
Review of the literature
37
are short, and the mutation rates for the marker type in question are low,
as for example for SNPs.
In addition to overall level of diversity, geographical structuring of
that diversity within a population may vary and thus be of interest for
population-history inference. Indeed, low diversity does not always equal
within-population homogeneity (Figure 7), because factors such as low
population density that reduce general diversity (through genetic drift)
will likely cause increased substructuring (through small breeding units).
The existence of population substructure is typically detected as deviations
in genotype frequencies from HWE (see section 1.3), and is quantified by
FST-like measures, several versions of which exist for different marker types.
Furthermore, information on the past can emerge from the level and patterns
of LD: for example, admixture events create increased LD that in subsequent
generations will gradually decay.
heterogeneous
homogeneous
high diversity
low diversity
Figure 7. Relation between genetic diversity and homogeneity or heterogeneity. Even
the populations (large circles) with relatively low diversity can be either homogeneous
or heterogeneous, depending on the geographic distribution of genetic variants (colored
circles) they contain.
Genetic data from extant populations also enable the timing of population history events. When populations diverge through genetic drift, their
genetic distances can serve in inferring their relative divergence times. This
approach, however, presumes the absence of subsequent migration between
38
Review of the literature
the diverging populations and constant (or at least known) effective population sizes – both criteria that the human population history rarely fulfills.
Other timing methods rely on the gradual accumulation of mutations (for
example STR diversity within Y-chromosomal haplogroups) or the decay
of haplotypes through recombination (Figure 8). These estimates can be
calibrated into years or generations by use of mutation and recombination
rates, but because these rates are usually imprecise, the confidence intervals
of the resulting time estimates typically remain very wide. Importantly,
these approaches measure the molecular, rather than population, history.
For example, the most recent common ancestor of a set of lineages in a
population may greatly antedate the migration that brought those lineages
into the population. The two time-points will be close to each other only in
the cases of extreme founder effects or bottlenecks.
A
B
time
Figure 8. Principle of genetic timing. Mutations (A) or recombinations (B) accumulate
over time on the descendant copies of an original haplotype. The larger the number of
mutational differences (thin stripes in A) or the shorter the shared stretch of the original
haplotype (white in B), the longer the time passing since the split between the two lineages.
4.2. Genome-wide SNP datasets in studies of population history
It is generally preferable to base population history inference on data from
several loci, because the patterns of single loci are subject to different levels
of selection and genetic drift (Jobling et al. 2004). Notably, mtDNA and
the Y chromosome which have been widely used in population genetics
comprise only two loci because of their nonrecombining nature, and their
smaller effective population size makes them specifically prone to genetic
drift. Whereas the latter may even be advantageous when studying closely
related populations, the two loci may not yield a full and representative
picture of a population’s history, and may need to be complemented by
autosomal markers.
In recent years, an important source for such autosomal data has been provided by the genome-wide SNP datasets produced by GWASes. Furthermore,
GWASes have awakened increased interest – also among non-population
Review of the literature
39
geneticists – into population structure because it is a confounding factor
in the association studies. Together, these developments have resulted in
numerous studies of population structure on a global and continental as well
as on more local scales. They have also spawned methodological advances
in population genetics, for example evaluations of the interpretation and
limitations of the traditionally used method of PCA (Patterson et al. 2006,
Novembre & Stephens 2008, McVean 2009, François et al. 2010).
Population genetic analyses of the genome-wide SNP datasets can use
many of the existing analysis methods, provided that those are computationally sufficiently fast to effectively utilize the vast amount of data. Obviously,
the presence of recombinations restricts the range of possible analysis
methods relative to those applicable on mtDNA- and Y-chromosomal data,
and the biallelic nature of the SNPs on commercial genotyping arrays limits
the informativeness of single markers. Use of short haplotypes of adjacent
SNPs may, however, enhance the power of analyses at least in some situations
(Jakobsson et al. 2008, Morin et al. 2009, Gattepaille & Jakobsson 2012).
Additionally, the large number of markers available enables individual-based
analyses, which have the potential to produce results independent of the
researcher’s preconceptions of the underlying population structure. This is
an important capacity, especially when the geographical sampling distribution is relatively even and suggests no natural division into units of analysis.
Several novel analysis methods have also been developed specifically for
genome-wide SNP data (see Moorjani et al. 2011, Johnson et al. 2011, Pickrell
& Pritchard 2012). However, many such methods may be more useful for
study of global or intercontinental phenomena than of the small differences
between closely related (sub)populations.
Some disadvantages may limit the use of genome-wide SNP datasets
from GWASes in population history inference. Although the size and geographic density of sampling in these datasets may be unprecedented, from
a population history perspective, the populations targeted by association
analyses may not be maximally interesting. The geographical and ancestry
information available can also be much less detailed than if the individuals
had been recruited specifically for population genetic studies. Even geographically dense sampling may not always be a blessing, because some of
the existing analysis methods are not well suited to inferring continuous
and clinal population patterns. Furthermore, ascertainment of SNPs on
commercial genotyping arrays is biased: those SNPs are common rather
than rare variants. Moreover, if their selection has been based on data
from one particular population, variation in other populations may not be
represented equally well (Wakeley et al. 2001, Clark et al. 2005, Rosenblum
& Novembre 2007). While correction methods for ascertainment biases do
exist (Nielsen et al. 2004, Guillot & Foll 2009), a more balanced picture of
population genetic variation will eventually emerge from sequencing studies.
40
Review of the literature
4.3. The multidisciplinary inference of human past
The historical inferences genetics provides can be compared to those provided by other disciplines. Archaeology studies the remnants of past human
cultures to infer their customs, contacts and sources of livelihood. Linguistics
can expose the divergence of related languages through time, presumably
reflecting their spatial spread, and discover loanwords signaling contacts
between speakers of different languages. Additionally, the words’ meanings
can provide information on the circumstances in which they were used or
acquired – for example, names for inventions are often loaned along with
the inventions themselves – and toponyms (place names) of loaned origin
can even anchor these language forms to a specific geographical location.
Physical anthropology can use bone findings to draw inferences as to the
anatomy and health of ancient individuals and relationships between peoples.
Other material from the past can undergo study by tools from sciences like
botany, zoology, and ecology to yield information on past environments,
living conditions, and means of human subsistence. History which relies on
written documents can often provide more detailed and direct information
than other disciplines, but is applicable only to periods with written records
available.
Because these disciplines study different aspects of the human past, their
conclusions may also differ – our genes determine neither the language we
speak nor the type of pottery we use. Admittedly, the three often correlate
because of being typically inherited in a similar manner, but the correlation
may be broken in various circumstances. From the archaeological record it
is often difficult to tell whether inventions or cultures have spread into an
area along with major migrational movements or mainly as cultural loans;
the two options could obviously leave very different traces in a population’s
language or gene pool. Conversely, there are major, historically known
linguistic influences invisible in the archaeological record (see Mallory
1989 p. 166). Meanwhile, relatively few immigrants per generation can
significantly shape a population’s gene pool over time, without necessarily
causing noticeable changes in the material culture or leaving more than an
occasional loan word in the linguistic record. On the other hand, factors like
the social dominance of a small group of immigrants can lead to language
replacement in the whole population within a few generations, while the
influence on the gene pool may remain subtle.
In theory, historical inferences based on genetic data are independent
of results from other disciplines. In practice, however, it is fairly difficult to
form an accurate picture of a population’s past based on genetic evidence
alone, since many different phenomena can result in similar genetic patterns,
and the confidence intervals of timings of genetic events are often wide (see
section 4.1). Therefore, insights into a population’s history may be crucial
Review of the literature
41
in the sensible interpretation of genetic data. For instance, agricultural
populations are typically much denser and more sedentary than hunter-gatherer populations, which immediately influences many population genetic
processes. Obviously, an up-to-date view of the timing, direction and extent
of contacts with neighboring populations is equally important.
42
Review of the literature
5.Finland
5.1.Population prehistory and history
The earliest – albeit highly disputed – signs of habitation within the current
area of Finland may date back to the Eemian interglacial more than 40,000
years ago (Pettitt & Niskanen 2005, Schulz 2010). During the last glacial
period, however, the Fennoscandian area was covered by a thick continental
ice sheet that both made the area unsuitable for life and wiped out most
signs of possible earlier inhabitants. While certain species like spruce may
have survived in some ice-free locations of Norway (Parducci et al. 2012),
the distribution of humans and most other species was limited to refugia
located further south, for example in Iberia and the Ukraine.
The first postglacial settlers arrived in the area of present-day Finland
soon after the glacial ice started retreating. These were hunter-gatherers
who probably followed the spread of deer and other prey species into the
newly emerged areas (Huurre 2005). The earliest settlements in Finland
date back to 8000-8500 BC, with those immigrants coming from the south
(Takala 2004). Other early settlers arrived from eastern and southeastern
directions (Siiriäinen 2003a). In northern Fennoscandia, including northern
Finland, the early settlers arrived via a route along the Norwegian coast
(Siiriäinen 2003a). Recently, objects connected to the southern culture
and dating back to 8300-8200 BC have also been found in northernmost
Finnish Lapland (Rankama & Kankaanpää 2008). From the period between
7000 and 6000 BC, multiple settlement sites are known throughout Finland
(Huurre 2005). The population size was probably not more than a few
thousand (Pitkänen 2007).
Pottery came into use around 4200 BC along with the arrival of the Comb
Ceramic culture (Figure 9A) that prevailed in large areas east and southeast
of Finland. Several subtypes of Comb Ceramics can be recognized, for example the Typical Comb Ceramic culture that spread from Karelia in 3400 BC.
Although such subtypes indicate the presence of regional (and temporal)
differences, the era was apparently characterized by frequent contacts both
within the country and with outside areas. These are demonstrated for
instance by archaeological findings that include objects made of Siberian
pine, originating from the Ural mountains, and Baltic amber (Siiriäinen
2003a, 2003b, Huurre 2005).
Review of the literature
43
Figure 9. Geographic distribution of the Comb Ceramic (CC), Funnel Beaker (FB), and
Pitted Ware (PW) cultures in Fennoscandia in ca. 3000 BC (A), and the Corded Ware
culture ca. 2500-2000 BC (B), according to Huurre (2005) and Carpelan (1999). Note
that other cultures existed elsewhere in the area.
The area of Finland first split into western and eastern cultural domains
upon the arrival of the Corded Ware culture, part of a wider movement that
spread over the whole of Denmark, southern Norway and southern Sweden
in 2300 BC, and reached Finland through the eastern Baltic, possibly a few
centuries earlier. In Finland, its influence was limited to the southwestern
and coastal parts of the country (Figure 9B). Contacts between the Corded
Ware area and the rest of Finland seem to have been very rare. In the
eastern areas, the Comb Ceramic culture with its predominantly eastward
connections persisted. Even in the western area, Corded Ware and the local
Comb Ceramic culture seemed to remain mostly separate, until towards the
end of Stone Age they fused to form the Kiukainen culture (Carpelan 1999,
Siiriäinen 2003a, 2003b, Huurre 2005).
With the advent of metal in the beginning of the Bronze Age
(1500/1300 BC - 500 BC), the division between eastern and western Finland
continued: both regions acquired the metal via their main direction of
contact. In the west, the main direction of contact shifted from the Baltics
to Scandinavian areas, as Scandinavian traders extended their trips to the
Finnish coasts to buy fur. In the east, southeastern contact brought textile
pottery from central Russia into southeastern and eastern Finland. From
there, ceramics spread also to the previously aceramic north (Siiriäinen
2003a, 2003b, Huurre 2005).
The introduction of agriculture to Finland has often been connected to
the arrival of the Corded Ware people who were agriculturalists elsewhere,
although direct signs of this have not been found in Finland (Siiriäinen 2003a,
44
Review of the literature
2003b). On the other hand, experiments with agriculture may have been
made even earlier, during the Typical Comb Ceramic period, as in related
cultures in Estonia and in the Lake Onega region in Russia (Mökkönen 2011).
One theory is that agriculture arrived during the Bronze Age both from the
east along with textile pottery and from the west along with Scandinavian
contact (Siiriäinen 2003a). Pollen studies have detected signs of agriculture
from more than 4000 years ago in southwestern Finland (Vuorela 1999).
Regardless of the time of arrival, hunting (especially fishing) long continued to be an important means of subsistence even in the agricultural areas
(Siiriäinen 2003a).
Earlier, archaeological findings dating to the early Iron Age were scarce.
This led archaeologists to postulate a discontinuity in Finnish settlement
and a large-scale migration of the ancestors of the present-day Finns from
the south in the first centuries AD. This theory is no longer valid, however,
since the archaeological record has become more plentiful for this period.
Currently, settlement is thought to have been continuous, although climate
factors may have caused a population decrease in 500 BC to 0 AD which
was compensated by a population increase from 0 to 200 AD. The latter
part of the Iron Age was characterized by extension of agricultural areas,
increase in population density, Baltic and Scandinavian contact and trade
with central Russia (Siiriäinen 2003b, Huurre 2005, Tallavaara et al. 2010).
Based on the archaeological record it is often difficult to say which of
the observed influences have resulted from a mere cultural diffusion and
which may have involved immigration, and the archaeologists are hardly
unanimous in their interpretations. The initial introduction of ceramics may
have been mainly a cultural loan, but a population movement has often been
associated with the spread of the Typical Comb Ceramic culture, although
the increase in settlement finds may also stem from a local economic boom
prompted by the warming climate rather than from a population influx.
The arrival of the Corded Ware culture was likely accompanied by some
degree of immigration, but the actual number of settlers may have been very
small; furthermore, they eventually fused with the former population rather
than replaced it. The Scandinavian contact during the Bronze Age, though
mainly trade-related, may have involved a small amount of colonization.
For the Iron-Age influence from the south, estimates of immigration range
from some to none (Carpelan 1999, Siiriäinen 2003a, 2003b, Huurre 2005,
Tallavaara et al. 2010).
The influence of Christianity began gradually to diffuse into Finland in
the 11th century, although pagan traditions persisted widely for centuries.
Nevertheless, this development attached Finland as a part of the Swedish
kingdom in the second half of the 12th and first half of the 13th century; in
this process, the efforts of the church were in fact more important than the
Review of the literature
45
efforts of the royal authorities (Lindkvist 2003a, 2003b, Mäntylä 2003a,
Sawyer & Sawyer 2003, Siiriäinen 2003b, Vahtola 2003a).
In the early 12th century, agricultural settlement was limited to small
areas of coastal Southwest Finland, Åland, southern Satakunta, Häme,
Southern Savo, and the northern and western shores of Lake Ladoga. The
early medieval centuries witnessed an increasing density of these settlements
as well as the spread of agriculture to new areas in Southwest Finland,
Satakunta, Karelia, Häme, Savo, Southern Ostrobothnia and the Kemi and
Tornio river valleys. Consequently, a population estimated as less than
50,000 in early the 12th century may have at least doubled within a few
centuries. Population growth in Sweden led to Swedish immigration to the
coasts of Ostrobothnia and southern Finland between the mid-13th and
mid-14th centuries, and some Estonian immigrants may also have arrived
in Southwest Finland. Numerous German merchants settled in towns, and
many were later assimilated (Orrman 1999, Dahlbäck 2003, Vahtola 2003a,
2003b, Pitkänen 2007).
Despite these developments, agriculture remained limited south of
62 degrees of latitude except on the west coast, mostly because of climate
and soil. The areas inland were not uninhabited, but consisted of hunting
grounds of the agricultural areas and of the nonagricultural Saami population. From the 16th century onwards, however, agriculture extended into
these areas with the spread of settlers predominantly originating from the
province of Southern Savo (themselves mostly of Karelian ancestry). This
spread was enabled by their farming methods (slash-and-burn) and a new
variety of winter-tolerant rye. The movement started spontaneously, probably
propelled by the earlier population growth, but since the latter half of the
16th century it was also legislationally encouraged by the Swedish King. By
the 17th century, the migrants had gradually spread to Northern Savo, to the
previously uncultivated areas of Häme, Satakunta, Southern and Northern
Ostrobothnia, and to Kainuu, Karelia, the inland areas of central and northern
Sweden, and even to Norway. The Saami inhabitants of the areas probably
in part retreated from and in part assimilated with the newcomers. Notably,
many of these areas were on the Russian side of the 1323 border, but the
newly-settled areas of northern Finland were ceded to Sweden in 1595. Other
consequences of the process were the doubling of the agricultural area of
Finland and growth of the population to 250,000 to 350,000 (Orrman 1999,
Keränen 2003, Vahtola 2003a, 2003b, Pitkänen 2007).
After the population expansion and increase of the 16th century, the
population growth of the following century was very modest, mainly due
to cold summer temperatures leading to numerous years of low crop yields
throughout the century. The famine of the winter of 1696-1697 was especially
severe: in combination with related epidemics, it may have killed one-third
of the population. Consequently, though the population size was about
46
Review of the literature
290,000 in 1635 and 400,000 before the famine, in 1721 it was 306,000.
In subsequent years, population growth resumed, leading to a population
of 405,000 in 1749 and nearly 900,000 in 1800. Despite a famine that
killed 100,000 people as late as in the winter of 1867-1868, the population
reached two million by 1880 (Mäntylä 2003a, 2003b, Nygård & Kallio 2003,
Olkkonen 2003, Pitkänen 2007).
Finland was ceded to Russia in 1809 and gained independence in 1917.
Since the late 19th century, migration within the country has been directed to
the cities and towards the south. For example Helsinki and the surrounding
province, Uusimaa, contained 10% of the Finnish population in 1880 but 25%
in 1990 (Pitkänen 2007). Nevertheless, urbanization occurred relatively late:
more than half of the population lived in rural areas until 1970 (Vihavainen
2003, Pitkänen 2007). Another instance of internal migration took place
after World War Two, when people from the areas ceded to the Soviet
Union – 12% of the total population – were resettled elsewhere in Finland
(Haataja 2003). The 20th century also featured two major emigration waves;
in the first decades of the century, 150,000 Finns left, mainly to America,
and in the 1960s, almost 300,000 went to Sweden (Nygård & Kallio 2003,
Vihavainen 2003, Pitkänen 2007). A smaller population movement occurred
during World War Two, when 70,000 children were sent for safety to Sweden.
While most of them returned after the war, thousands stayed in their foster
families (Laine 2003). At present (July 2012), the Finnish population size
is 5.4 million (Official Statistics of Finland 2012).
In summary, the Finnish population history demonstrates a general
continuity, possibly since the first postglacial settlers. While the overall
number of incoming settlers has undoubtedly been small, immigration has
nevertheless been characterized by relatively frequent contact with various
neighboring populations throughout the ages, rather than a handful of
distinct immigration waves. The population remained nonagricultural for
a long time, especially in eastern Finland, and small in numbers until a few
centuries ago.
5.2.Language
The Finnish language, with its 5 million speakers, is the second largest
language of the Uralic (Finno-Ugric) language family (Nanovfszky 2004). It
is closely related to other Finnic languages, including Estonian and several
small languages spoken around the Baltic Sea, for example Karelian and
Vepsian, and to the Saami languages spoken in the northern parts of Finland,
Sweden, and Norway, and in the Russian Kola Peninsula. Other related
languages include for example Komi, Udmurt, and Mordvin, spoken west of
the Urals in Russia, and the more distant relatives, Hungarian and several
languages of Siberia (Laakso 1991). Within Finnish is a major division into
Review of the literature
47
eastern and western dialects (Leskinen 1999). The first written documents
of Finnish date from the 16th century (Häkkinen 1996).
The Finnish language bears signs, in the form of loanwords, of contacts
with ancient Indo-European language forms, and more recent strata of loanwords from Baltic, Slavic, and various forms of Germanic languages. Extensive
analysis of the typical meanings of these words is available (Häkkinen 1990),
but interesting in the current context is that several Finnish words for female
relatives are Baltic loans, and the word for “mother” is Germanic in origin.
During later times, loanwords have come from Swedish as well as Greek
and Latin (often via Swedish), and recently from English (Häkkinen 1990).
Studies of Finnish place names have recognized Germanic names, names
of Saami origin even in southern Finland, and possibly names originating
from some unknown language (Häkkinen 1996).
The first arrival of the Uralic language in Finland has often been connected
to the Typical Comb Ceramic culture (see Meinander 1984), the spread of
which would coincide with the traditional estimates of splitting of the ProtoUralic language ca. 6000 years ago. Furthermore, the arrival of the Corded
Ware people has been thought to have led to the initial divergence between
Saami and Finnic languages (Sammallahti 1984) and to explain some of the
Baltic loan words found in Finnish (Carpelan 1999). More recently, however,
an apparent archaeological continuity has led to suggestions that it was the
early settlers of Finland who originally introduced the Uralic language into
the area (Wiik 2002), but this view is strongly opposed by most linguists.
Some linguists, in turn, have proposed markedly shorter chronologies for the
divergence of the Uralic language family: an initial split of the Proto-Uralic
language as late as 2000 BC, the arrival of a Uralic language in the Baltic
Sea area ca. 1900 BC, and the split between Finnish and Saami no earlier
than a few centuries BC (Kallio 2006, Häkkinen 2009). On the other hand, a
Bayesian phylogenetic analysis has suggested a Proto-Uralic divergence time
largely in line with the traditional estimates (Honkola et al. unpublished).
In 2011, Finnish was spoken by 4.9 million people in Finland (some
90% of the population). The other official language, Swedish, was spoken
by 270,000 (4.9%), and Saami by 1,900 (0.03%). Other minority languages
in Finland include Russian with 58,000 speakers (1.1%) and Estonian with
33,000 (0.6%) (Statistics Finland).
5.3. Genetic structure
5.3.1.Finns compared to other European populations
The Finns’ genetic structure has been studied extensively, starting with blood
group markers in the first half of the 20th century (see Norio 2003b p. 461
for a review). A study of nuclear genes found Finns as outliers within Europe
48
Review of the literature
(along with for example Saami, Basques, and Icelanders), with Belgians,
Germans, Austrians, and Swedes as their closest genetic neighbors (CavalliSforza et al. 1994). Finns also differ from their Scandinavian neighbors in
their human leukocyte antigen (HLA) frequencies (Siren et al. 1996), and the
frequencies of one blood group reveal a Baltic influence in Finland (Sistonen
et al. 1999). The mtDNA diversity and haplogroup composition of Finns are
similar to those of other Europeans (Sajantila et al. 1995, 1996; Lahermo et al.
1996, Torroni et al. 1996, Kittles et al. 1999, Finnilä et al. 2001, Hedman et
al. 2007), whereas Finns’ Y-chromosomal diversity is substantially reduced
(Sajantila et al. 1996, Kittles et al. 1998, 1999; Lahermo et al. 1999, Zerjal
et al. 2001, Hedman et al. 2004, Lappalainen et al. 2006). Y-chromosomal
haplogroups also reveal Baltic and Scandinavian contact (Lappalainen et
al. 2006, 2008).
Recent genome-wide SNP studies have shown that Finns differ from
other Europeans even more than their geographic location would indicate.
Depending on the combination of populations included, Finns appear to be
genetically closest to Swedes, Estonians, Germans, and Poles, among others
(Seldin et al. 2006, Bauchet et al. 2007, Lao et al. 2008, Novembre et al.
2008, McEvoy et al. 2009, Nelis et al. 2009). In a study looking at genetic
matches between individuals from 23 European locations, for 83% of the
Finnish samples the closest match was another Finn, which illustrates their
genetic distinctness (Lu et al. 2009). Furthermore, genome-wide data have
allowed study of signs of recent positive selection among Finns and other
European populations (Lappalainen et al. 2010). Finns are also likely to be
among the first populations to be extensively studied by the new sequencing
methods, not least because they are included in the 1000 Genomes Project
(1000 Genomes Project Consortium 2010).
5.3.2.Eastern influence in the Finnish gene pool
Although the Finnish gene pool is mainly European, it also harbors distinct
eastern elements. Estimates of the relative contributions of these sources
to the nuclear gene pool have been 75% European and 25% non-European
(Nevanlinna 1984) or 90% European and 10% Uralic (Guglielmino et al.
1990). HLA haplotype frequencies in Finland are consistent with a predominantly European origin of the population but also show traces of eastern
elements (Ikäheimo et al. 1996). Stronger eastern influence is apparent in
the Y chromosome (Zerjal et al. 1997, Lahermo et al. 1999), exemplified by
the most common haplogroup of Finns, N3, which is of eastern origin (Rootsi
et al. 2007) and has a frequency of about 60% (Lappalainen et al. 2006).
Some eastern mtDNA haplogroups have been detected in Finns (Torroni
et al. 1996, Meinilä et al. 2001), and an increased eastern affinity has also
been suggested by some genome-wide SNP studies (Bauchet et al. 2007).
Review of the literature
49
5.3.3.Finns and their Saami neighbors
The Finns show signs of admixture with the Saami population that lives in
the north of Finland (as well as in northern Sweden, Norway, and Russia)
(Sajantila et al. 1995, Lahermo et al. 1999). The admixture is stronger in the
northern and northeastern Finnish provinces than in the central and southwestern ones (Meinilä et al. 2001, Hedman et al. 2007). While the Saami
speak languages related to Finnish, and are relatively similar to Finns with
respect to the Y chromosome (Lahermo et al. 1999 and Raitio et al. 2001),
they differ from the Finns and other Europeans by their mitochondrial and
nuclear DNA (Barbujani & Sokal 1990, Cavalli-Sforza et al. 1994, Sajantila et
al. 1995, Lahermo et al. 1996, Huyghe et al. 2011). The Saami show mtDNA
patterns and increased LD indicative of constant population size (Lahermo et
al. 1996, Laan & Pääbo 1997, Kaessmann et al. 2002, Johansson et al. 2005,
Huyghe et al. 2010). An apparent discrepancy between the variation patterns of genes (Saami vs. other populations) and of languages (Finno-Ugric
vs. Indo-European) in the Baltic Sea area has sometimes been attributed
to language replacements (Sajantila & Pääbo 1995, Laitinen et al. 2002).
Interestingly, the Saami gene pool harbors eastern – Uralic and non-European – contributions of up to 50% (Guglielmino et al. 1990, Meinilä et
al. 2001, Tambets et al. 2004, Ingman & Gyllensten 2007, Johansson et al.
2008, Huyghe et al. 2011).
5.3.4.Genetic structure within Finland
Within Finland, mtDNA distribution is relatively homogeneous (Vilkki et al.
1988, Hedman et al. 2007), although small regional differences have been
detectable (Palo et al. 2009). The Y chromosome, however, shows striking
differences between eastern and western Finland (Kittles et al. 1998, Raitio
et al. 2001, Hedman et al. 2004, Lappalainen et al. 2006, Palo et al. 2007,
2008). These contrasting patterns have been suggested to result from a
male-biased Scandinavian gene flow into the western areas (Palo et al. 2009).
East-west differences also exist in classical blood group markers (see Norio
2003b p. 464 for a review), HLA (Siren et al. 1996), and nuclear markers
(Barbujani & Sokal 1990). These differences are unsurprising in the view of
Finnish population history (see section 5.1): the eastern and western parts
of the country have been subject to different external influences at least
since the Corded Ware culture and Bronze Age; they were separated by the
medieval border between Sweden and Russia; and their demographic histories, for example population densities, have differed up until present times.
Similar east-west differences are evident in many non-genetic phenomena
(see Norio 2003b Figure 2 for maps), e.g., eastern and western dialects of
Finnish (Leskinen 1999), in mammalian fauna (Heikinheimo et al. 2007),
50
Review of the literature
and even in the predominance of major vs. minor key in folk music tunes
(Toiviainen & Eerola 2005).
Between northern and southern Finland genetic differences are even
larger than between east and west; additionally, the northern areas exhibit
a strong substructure (Jakkula et al. 2008). Eastern and northern areas
also show reduced genetic diversity, both in the Y chromosome (Raitio
et al. 2001, Hedman et al. 2004, Lappalainen et al. 2006) and autosomally
in terms of increased LD and extended regions of homozygosity (Varilo
et al. 2003, Uimari et al. 2005, Service et al. 2006, Jakkula et al. 2008).
The mtDNA variation and LD patterns are compatible with a population
expansion (Lahermo et al. 1996, Laan & Pääbo 1997, Kaessmann et al. 2002).
Furthermore, the percentage of eastern elements (see section 5.3.2) is higher
in eastern than in western Finland (Lappalainen et al. 2006), and eastern
Finns are genetically close to Karelians (Lappalainen et al. 2008).
The Swedish-speaking minority of Finland lives mainly in the coastal
regions of Ostrobothnia and southern and southwestern Finland, descends
predominantly from 12th to 14th century immigrants, and is genetically
intermediate between Finns and Swedes. Estimates of the extent of admixture
with the Finnish-speaking population range from 60% (Virtaranta-Knowles
et al. 1991) to 75% (Nevanlinna 1984). The admixture has likely taken place
at or since an early stage of the population’s history because it appears to be
geographically homogeneous (Virtaranta-Knowles et al. 1991).
On an extremely fine geographical scale, the blood-group frequencies
within Finland have differed more between single villages than when averaged over communes or provinces (Nevanlinna 1972). Few subsequent
datasets have allowed analyses with similar resolution. Recently, however,
neighboring communes in northern Finland were shown to differ from
each other conspicuously (Jakkula et al. 2008). Consequently, genetic differences in the area allow prediction of parental birthplaces at a median
accuracy of 23 km (Hoggart et al. 2012). Additionally, a similar fine-scale
population structure may explain part of the striking Y-chromosomal difference between the small commune of Larsmo on the western coast and
many Finnish provinces (Palo et al. 2008). This phenomenon is not unique
to Finland, however, but exists for example in Italy, Scotland, and Croatia
(O’Dushlaine et al. 2010b).
5.3.5.Phenotypic variation in Finland
The genetic structure of the Finnish population also reflects in its phenotypes.
Maybe the best demonstration is the Finnish disease heritage (FDH), a
group of at least 36 monogenic disorders far more common in Finland than
elsewhere in the world (Norio et al. 1973, Peltonen et al. 1999, Norio 2003a,
2003b, 2003c). Revealingly, many of them are concentrated in eastern and
Review of the literature
51
northern Finland, and some show local enrichment (Norio 2003a, 2003c),
demonstrating founder effects and genetic drift. On the other hand, stemming
from the same effects, several diseases are markedly rarer in the Finnish
population than in other European or Caucasian populations. Examples
include cystic fibrosis (Kere et al. 1989), phenylketonuria (Guldberg et al.
1995), medium-chain acyl-CoA dehydrogenase (MCAD) deficiency (Pastinen
et al. 2001), and Friedreich ataxia (Juvonen et al. 2002).
In addition to the FDH diseases, a number of other diseases and mutations
show geographic variation within Finland. An east-west difference for the
cardiovascular diseases (section 5.3.6) is perhaps the most conspicuous. Such
patterns can shed light on gene flow or on other aspects of population history,
as with schizophrenia (Hovatta et al. 1997), multiple sclerosis (Tienari et al.
2004), cystic fibrosis (Kere et al. 1994), and the mutations of CYP21 (Levo
et al. 1999) and of BRCA1 and BRCA2 (Sarantaus et al. 2000). All in all, the
history and genetic structure of the Finnish population have enabled the
mapping of a multitude of disease genes, especially for monogenic disorders
(Peltonen 2000, Kere 2001). Reduced diversity and increased LD, combined
with the good performance of HapMap-based tagging SNPs (Willer et al.
2006, Service et al. 2007, Lundmark et al. 2008) make Finns an attractive
target population also for complex disease gene studies (Peltonen et al. 2000,
Varilo & Peltonen 2004, Kristiansson et al. 2008).
5.3.6.East-west difference in cardiovascular diseases in Finland
Cardiovascular diseases (CVDs) are more common in Finland than in most
other Western countries, particularly in males (Kuulasmaa et al. 2000, Sans et
al. 1997). Furthermore, they show substantial variation within Finland, being
more common in eastern Finland, especially in North Karelia (Tuomilehto
et al. 1992, Jousilahti et al. 1998, Karvonen et al. 2002, Havulinna et al.
2008). For example, age-standardized attack rates (per 100,000 / year)
for acute myocardial infarction (AMI) in 1991 for men aged 35 to 64 years
were 769 in western Finland (Turku and Loimaa), 776 in eastern Finland
(Northern Savo), and 860 in North Karelia (Salomaa et al. 1996), while
corresponding rates were 499 in Estonia (Tallinn) in 1991 (Laks et al. 1999),
367 in Sweden (Gothenburg) in 1990-1994 (Wilhelmsen et al. 1997) and 355
in Denmark (Glostrup) in 1988-1991 (Kirchhoff et al. 1999). In 1988-1992,
the corresponding coronary-event rate was 483 in western and 603 and
697 in eastern Finland (Northern Savo and North Karelia, respectively)
(Kuulasmaa et al. 2000).
Such an internal variation is not unique to Finland but exists for example
in Sweden (Nerbrand et al. 1991, Hammar et al. 1992, Johansson et al. 1992)
and in Britain (Morris et al. 2001, 2003). However, in the other populations
the majority of these gradients may be explained by regional differences
52
Review of the literature
in traditional risk factors (Hammar et al. 1992, 2001; Morris et al. 2001)
like serum cholesterol levels (Rosengren et al. 1999) and blood pressure
(Shaper et al. 1991), or by environmental variation such as drinking-water
constituents (Nerbrand et al. 1992). In Finland, in contrast, only 30 to 40%
of the east-west difference seems to be explained by traditional risk factors
(Pekkanen et al. 1992, Jousilahti et al. 1998).
CVDs have shown a declining trend in western countries (Sans et al.
1997, Tunstall-Pedoe et al. 1999, Kuulasmaa et al. 2000). In Finland, this
decline has been markedly rapid (Salomaa et al. 2003, Pajunen et al. 2004):
for instance, coronary mortality declined 80% between 1972 and 2007
(Vartiainen et al. 2010). This decline stems mostly from favorable development in the levels of classical risk factors (Vartiainen et al. 1994, 2010),
partly initiated by the North Karelia project (Salonen et al. 1979, Puska et
al. 1998), but the population has benefited also from new medical treatment
(Salomaa et al. 1999, Tunstall-Pedoe et al. 2000). Interestingly, the main
reasons for the decline may differ between eastern and western Finland
(Salomaa et al. 1996). Nevertheless, the east-west difference has mostly
persisted for AMI incidence (Karvonen et al. 2002, Havulinna et al. 2008),
unlike that for stroke (Sarti et al. 1994).
Persistence of the east-west difference has evoked the question of its origin,
especially because current differences in risk factors are relatively small
(Similä et al. 2005). Some of the variation may correlate with environmental
factors such as ground-water geochemistry (Kaipio et al. 2004, Kousa et al.
2004, 2006) or cold climate (Gyllerup et al. 1993, Gyllerup 2000, Näyhä
2005, Analitis et al. 2008). Studies of Finnish immigrants in Sweden and
their non-immigrant twins demonstrate effects of life style or environment
or both on CVD risk and subclinical atherosclerosis (Jartti et al. 2002a,
2009; Hedlund et al. 2007). On the other hand, some such factors seem to
be more dependent on place of birth than on place of residence (Jartti et
al. 2002b). For men in the Helsinki area, eastern birthplace contributes to
CVD risk independently of classical risk factors (Tyynelä et al. 2009, 2010).
Studies of children and young adults suggest a role for genetic factors or for
environmental factors early in life or prenatally (Pesonen et al. 1986, 1990;
Juonala et al. 2005). Furthermore, the susceptibility to the classical risk
factors may vary geographically (Jartti 2002b), possibly for genetic reasons.
Unsurprisingly, some genetic risk factors differ in frequency between eastern
and western Finland (Aalto-Setälä et al. 1991, Juonala et al. 2006).
5.3.7.Age estimates of Finnish genes
Relatively few genetic timings exist for the Finnish population. While the
diversity of the mtDNA as a whole is not reduced, the slowly evolving positions in the control region show a decrease in diversity, and served to date
Review of the literature
53
a mitochondrial bottleneck to approximately 3,900 years ago (confidence
interval (CI) not reported; Sajantila et al. 1996). For the Y chromosome,
early analysis determined an age of about 4,900 or 2,300 years for one
haplogroup and about 1,600 or 800 years for another (the latter estimates
for each haplogroup with CIs of 1,000 to 8,100 and 300 to 2,700 years,
respectively; Kittles et al. 1998). In the context of the whole Baltic area, two
mtDNA haplogroups have coalescence ages of 37,000 and 68,000 years,
and the most common Y haplogroups 7,700 to 10,700 years (CIs are in the
range of 10,000 years or more for the mitochondrial and about 3,000 years
for the Y-chromosomal timings; Lappalainen et al. 2008). With a different
estimate of the relevant mutation rate, however, the latter ages would be
lower, in the range of 3,000 to 4,000 years (Lappalainen 2009).
Additionally, age-estimates exist for several disease mutations. These are
often based on the geographic distribution of the disease and the width of
LD detected around the gene, and the compatibility of these with historically
known settlement events (Varilo 1999); more direct estimates also exist.
Mutation ages are typically 40 generations or less (Varilo et al. 1996a, 1996b;
Moisio et al. 1996, Sarantaus et al. 2000, Mykkänen et al. 2004), meaning
that they will shed light mainly on population events during centuries for
which historical information already exists. Furthermore, the patterns of
disease alleles – both rare and deleterious – may be unrepresentative of
events that have affected the population’s gene pool as a whole.
Studies of ancient DNA that could directly illuminate the genetic history
of the Finnish population and vitally assist in the timing of relevant events
have not been published to date. They will probably remain uncommon,
because suitable materials are lacking, due to the poor preservation of
organic samples in the acidic Finnish soil. Meanwhile, elaborate population
simulations have estimated the likely effects of various population scenarios – especially the presence and strength of population bottlenecks – on the
current genetic composition of the Finnish population. One of the insights
gained is that relatively minor, continuous migration – i.e., the type hardly
visible in the archaeological record – can easily generate much stronger
effects than single bottlenecks can (Sundell et al 2010).
54
Review of the literature
6.Sweden
6.1.Population prehistory and history
After the Ice Age, the first inhabitants of the area of present-day Sweden
were hunter-gatherers who followed their prey species, especially reindeer,
towards the north as the glacial ice retreated. The settlers came from two
directions: southern Sweden, then still a part of mainland Europe, received
its first occupants from south by 14,000 years ago, while the north was populated by immigrants arriving north via the Norwegian coast. Interestingly, a
similar bidirectional pattern of postglacial colonization exists in some animal
species such as the bear (Hewitt 2000) and the common frog (Knopp &
Merilä 2009), with hybridization zones across northern Sweden. By 4000 BC,
most of Sweden was inhabited; population size has been estimated as in the
vicinity of 25,000 (Siiriäinen 2003a, Lindqvist 2006).
Hunting and gathering remained the only means of subsistence for millennia, until agriculture spread into the southern and central parts of Sweden
from the south along with the Funnel Beaker culture around 4000 BC
(Figure 9A). Even in the southern areas, the adoption of agriculture was
slow, and it became the main source of livelihood only after several centuries.
Close to the Funnel Beaker culture, in central and southern Sweden, the
Pitted Ware culture relied exclusively on hunting – especially seal hunting in
the coastal areas – and gathering. It may have had contacts with the Comb
Ceramic culture of Finland. Meanwhile, in northern Sweden, influences
from Finland and northern Norway blended with the earlier culture. Some
researchers see this as the beginning of development towards the present-day
Saami (Siiriäinen 2003a, Lindqvist 2006).
In 2300 BC, the Corded Ware culture extended over southern Sweden
(Figure 9B), and despite the worsening climate at the end of the Stone Age,
agriculture spread northwards. The transition from the Stone Age to the
Bronze Age in 1800 BC was characterized by an overall continuity: the influx
of copper and bronze from central and western Europe had started earlier
and continued during the Bronze Age. Imports were the only source of metal;
it was traded for amber, fur, and seal skins. These were acquired through
northward and eastward contact that reached for instance coastal Finland
by 1500 BC (Figure 10A). In the same era, pottery from Finland extended
to the aceramic northern Sweden. Notably, this region of influence roughly
corresponds to the early historical Saami area (Siiriäinen 2003a).
Review of the literature
55
Figure 10. Geographic distribution of the Scandinavian Bronze culture in ca. 1500-500 BC
according to Huurre (2005) (A), and area of the Swedish kingdom in 1659 according to
Melin et al. (2003) and Stadin (1999) (B).
In the beginning of the Iron Age, the climate grew colder and damper.
Earlier, the small number of findings from the early pre-Roman Iron Age
(500 BC - 0 AD) in Sweden was interpreted as settlement discontinuity, but
this view is no longer valid. More likely, the climate-related difficulties in
agriculture led to some degree of population decline and emigration, but
the scarcity of findings may in part stem from Celtic population movements
that interrupted the former trade routes from Sweden to southern Europe.
When these connections were restored in the Roman Iron Age (0 - 400 AD),
many coins and luxury goods appear in the archaeological record. During
the Migration Period (400-550 AD), the main population movements did
not reach Sweden, but the era seems to have entailed some kind of unrest,
judging from the number of fortifications built and gold treasures buried
(Baudou 1999, Melin et al. 2003, Lindqvist 2006).
The increasing fur trade, expanding agriculture, and accumulating wealth
led to profound changes in society in southern Scandinavia, as chieftains and
petty kingdoms emerged. Towards the end of the Iron Age, in the Vendel Era
(550-800 AD), one such center formed in Uppland (the area north of current
Stockholm). It featured flourishing trade, powerful towns (including Birka),
and influence and contacts that extended north- and southward along the
Swedish coast and to Gotland as well as across the Baltic Sea to the coasts
of Finland and Estonia (Melin et al. 2003, Lindqvist 2006).
Starting in the late 8th century, Vikings exerted Scandinavian influence
over large parts of Europe for 300 years. The Vikings in western Europe
originated mainly from Norway and Denmark, whereas the Vikings active in
eastern Europe came from eastern Sweden – the same area that had already
56
Review of the literature
grown in power in the Vendel Era. Viking activities in western Europe consisted of varying periods of trade and raid, but contacts in the east seem to
have been predominantly peaceful, involving cultural blending with Slavs,
Balts, and Finns. In the end of the 9th century, Viking influence extended
all the way to the city of Kiev (in present-day Ukraine), but by the end of
the 10th century, the city had been slavified and the Vikings’ descendants
assimilated. Notably, although the Viking culture influenced large areas of
Europe, it dominated neither in Finland nor in the Saami areas of northern
and central Scandinavia. In addition to bringing Scandinavian influence
to Europe, Vikings transferred European cultural elements to Scandinavia
(Sawyer 2003, Roesdahl & Sørensen 2003, Melin et al. 2003).
Christianity had slowly diffused through trade and other contacts for over
a century by the time a king of Sweden first converted to Christianity around
the year 1000 AD. The transition continued slowly, with Christian and pagan
traditions living side by side throughout the 11th century. Nevertheless,
Christianity played an important role in the unification of the country.
Thus, during the late Viking Age and early Middle Ages, the often unstable
petty kingdoms with their bitter rivalries were gradually replaced by the
Scandinavian kingdoms. Sweden was the last one to emerge: its core regions
Svealand and Götaland were separated from each other by an impenetrable
forest and only become united under a common rule in the 12th century
(Sawyer & Sawyer 2003, Lindkvist 2003a, 2003b).
In subsequent centuries, the processes of expansion and unification
continued. Finland became a part of Sweden in the 12th century, and the
first border between Sweden and Russia was drawn in 1323 across Finland
from northwest to southeast. In 1319-1343, Sweden belonged to a union
with Norway, and in 1397-1521 to the Kalmar Union with Norway and
Denmark. After a series of wars, the Swedish kingdom was at its largest
in 1659 (Figure 10B). The areas outside the current borders of Sweden
were lost by 1721, apart from Finland, which was transferred to Russian
rule in 1809. In 1814, Sweden and Norway joined a union which dissolved
in 1905 (Stadin 1999, Melin et al. 2003, Lindkvist 2003a, Lindqvist 2006).
The population size within the current borders of Sweden was in the range
of 500,000 - 650,000 in the early 14th century, after having roughly tripled
since 750 AD. The 14th century saw slowing population growth and even
population declines in southern Sweden, as in large parts of Scandinavia.
Meanwhile, spread of settlement and population growth continued in northern Sweden. The Swedish colonization of Norrland began in the 14th century,
and as the Finnish settlement of the Tornio valley was also in progress, an
ethnic border formed close to the present-day border between Finland and
Sweden. Additionally, the population in Swedish towns was markedly influenced during the second half of the 13th and first half of the 14th century
Review of the literature
57
by immigrant German merchants who later assimilated into the population
(Benedictow 2003, Vahtola 2003b, Dahlbäck 2003, Lindqvist 2006).
In the 16th and 17th centuries, population growth was strong. In 1570,
the population size was about 600,000, and almost 1.5 million in 1721. In
the first half of the 18th century, growth was limited by wars, epidemics, and
low crop yields, leading to a population of 1.8 million in 1750, 2.0 million
in 1767, and 2.3 million in 1800. Between 1815 and 1860, the population
grew by 60%, from 2.5 million to 4 million. The population surplus was
discharged by mass migration: in 1821-1930, 1.2 million Swedes emigrated,
1.0 million of them permanently. For example in 1910, 20% of all Swedes
lived in the United States. Since the 1930s, immigration has exceeded emigration, together with internal population growth leading to a population
of 9.5 million (May 2012) (Samuelsson 1999, Häger 1999, Melin et al. 2003,
Vahtola 2003b, Norman 2003, Lindqvist 2006, Statistics Sweden 2012).
6.2.Language
Swedish belongs to the world’s largest language family, Indo-European,
more specifically to its North Germanic branch. Other North Germanic languages are Danish, Norwegian, Icelandic, and Faroese; Germanic languages
include for instance German and English (Pettersson 2005). Swedish is
spoken by more than 95% of the Swedish population, i.e., about 8.5 million
people. The largest linguistic minority in Sweden consists of some 250,000
Finnish-speakers. Apart from the population next to the Finnish border in
the north, they are mainly recent immigrants. Additionally, Saami speakers
in the north number 10,000 to 15,000 (Vikor 2002).
An Indo-European language form may have first arrived to the area of
present-day Sweden by 2000 BC or possibly earlier, but such estimations
remain tentative (Barnes 2003). The subsequent gradual divergence of the
North Germanic languages can be studied from runic inscriptions, the oldest
of which date to the 4th century AD (Bergman 2003). They document a
language form consisting of mutually intelligible dialects slowly diverging
from the 7th century AD onwards (Bergman 2003), with the final split
between Swedish and Danish taking place during the early Middle Ages
(Pettersson 2005). Meanwhile, the language expanded geographically; from
its 7th century area covering Denmark and the southern and central parts
of Sweden and Norway, it first spread towards the Saami areas in the north,
then to the British Isles, further to Iceland, Faroes, and Greenland in the 9th
and 10th centuries, and to the coastal areas of Finland and Estonia during
the 11th and 12th century (it later died out in Greenland and the British
Isles) (Vikor 2002).
About one-third of the Swedish vocabulary lacks cognates (words of
common etymological origin) in non-Germanic Indo-European languages,
58
Review of the literature
perhaps as a result of contact with an unknown language previously spoken
in the areas where the proto-Germanic language then extended (Barnes
2003). During historical times, the Swedish language has acquired loanwords
from several directions: from German in the 14th to 16th centuries through
intensive trade interactions, from French in the 17th and 18th centuries
mainly through royal and elite contacts, and recently from English. Other
loan sources include Greek and especially Latin, acquired both directly and
mediated through other languages (Pettersson 2005). Expectedly, place
names of Finnish and Saami origin exist in northern Sweden, and Finnish
names in the mid-Swedish areas that received Finnish immigrants during
the 16th and 17th centuries (Pettersson 2005). In southern Scandinavia,
almost all place names are Germanic, with the majority of the exceptions
seemingly of Saami origin (Barnes 2003).
The Swedish language can be divided into five major dialect groups. The
northernmost is spoken in Norrland, except for areas close to the Finnish
border where Finnish dominates. This dialect carries signs of contacts with
Norwegian in the southwest areas. The region of another dialect group
roughly corresponds to Svealand. Götaland features two dialect groups,
spoken in its northern and southern parts; the latter demonstrates an influence from Danish. The eastern dialect group includes the dialects spoken in
Finland and earlier (until World War Two) also in Estonia. These resemble
dialects from the western coast of Gulf of Bothnia, and, along with the dialect
spoken on Gotland, have retained many archaic features (Bergman 2003).
6.3. Genetic structure
6.3.1.Swedes compared to other European populations
Serologic studies of the Swedish population are numerous (see Einarsdottir
et al. 2007 and the references therein), but molecular population studies
of the Swedes have been less extensive than for the Finns. A classical study
based on a small number of autosomal markers found the Swedes to be
genetically close to Norwegians and relatively similar to other nearby populations such as the Danes, Germans, and Dutch, with the exception of the
geographically neighboring Finns and especially Saami, and the linguistically
related Icelanders (Cavalli-Sforza et al. 1994). The Swedes resemble general
European populations in their overall level of mtDNA diversity (Lappalainen
et al. 2008), their mitochondrial haplogroup frequencies (Tillmar et al. 2010),
and their X-chromosomal STR haplotype frequencies (Tillmar 2012). The
most common mitochondrial haplogroup in Sweden is H1, with a frequency
of 13% (Lappalainen et al. 2008), which originates from the glacial refugium
in Iberia (Torroni et al. 2006 and references therein).
Review of the literature
59
The Swedes resemble most other northern Europeans also in relation
to their Y chromosomes. Their diversity of Y-chromosomal haplogroups
(Lappalainen et al. 2008) and STR haplotypes (Holmlund et al. 2003)
corresponds to European levels. Their haplogroup distribution is similar
to Europeans (Karlsson et al. 2006) and even to western – but not eastern – Finns (Lappalainen et al. 2008), whereas their STR haplotypes are
closely related to Danes’, Norwegians’, and Germans’ but differ significantly
from Finns’ (Holmlund et al. 2006). The most common Y-chromosomal
haplogroup in Sweden, at a 36 to 38% frequency, is I1a (Karlsson et al. 2006,
Lappalainen et al. 2008, 2009). It has its origin in the Iberian refugium
(Rootsi et al. 2004), and may have spread to Sweden by Neolithic migrations
from Germany (Lappalainen et al. 2008). Several other Y-chromosomal haplogroups also seem to have arrived in Sweden from the south (Lappalainen
et al. 2008).
Swedes have also featured in many of the recent studies of European
population structure that have used genome-wide SNP data. There, they
match the general pattern of an overall correspondence of genetic and geographic distances, with for example Norwegians, Danes, Germans, and Poles
appearing as their closest genetic neighbors, whereas the Finns seem more
distant from the Swedes genetically than geographically (Lao et al. 2008,
Novembre et al. 2008, Heath et al. 2008, McEvoy et al. 2009, Nelis et al.
2009, Tian et al. 2009, O’Dushlaine et al. 2010a, Leu et al. 2010). Another
study on a genome-wide dataset of Europeans underlines the Swedes’ close
genetic relationship with neighboring populations by showing that the closest
genetic match for any Swede was actually more likely a Norwegian, Dane,
or even a German than another Swede (Lu et al. 2009).
6.3.2.Genetic structure within Sweden
Within Sweden, variation in mitochondrial and Y-chromosomal haplogroup
frequencies is mostly clinal, lacks strong borders, and shows signs of genetic
drift in various regions throughout the country, especially in the Y chromosome (Lappalainen et al. 2009). Haplogroup frequencies demonstrate
influence from the surrounding populations: from Denmark or central
Europe in the southern part of the country, and from Finland in northern
Sweden, eastern Svealand, and Dalarna (Lappalainen et al. 2009). Influence
from Norway is visible in the western provinces of Halland, Dalarna, and
Värmland (Lappalainen et al. 2009). However, the main genetic features
of Värmland may stem from migrational patterns unrelated to Norwegians
(Karlsson et al. 2006). In the contemporary population, both mitochondrial
and Y-chromosomal haplogroups reveal signs of recent immigration in the
cities of Malmö and Gothenburg (Lappalainen et al. 2009). Additionally, the
frequency of one Y-chromosomal haplogroup differs between the eastern and
60
Review of the literature
western parts of southern Sweden, possibly since prehistoric times (Karlsson
et al. 2006). The estimated ages for the four most common Y-chromosomal
haplogroups range from 3,300 to 9,100 years, with wide confidence intervals
(Karlsson et al. 2006). The distribution of mtDNA variation is compatible
with marked population expansion (Kaessmann et al. 2002, Tillmar et al.
2010).
As expected, allele frequencies within Sweden vary also in some autosomal
markers (Hannelius et al. 2005). Analogously, many monogenic diseases
show enrichment in various parts of Sweden, especially in the northern
areas (cf. below; Gustavson & Holmgren 1991). Furthermore, autosomal
markers show that the island of Gotland differs from the Swedish mainland:
one marker demonstrates an influence from the Finno-Ugric populations
on the eastern side of the Baltic sea (Beckman et al. 1998), and another one
an increased affinity to Baltic populations, possibly mediated by the FinnoUgric contacts (Sistonen et al. 1999). In contrast, however, Gotland does
not markedly differ from mainland Sweden with respect to Y-chromosomal
or mitochondrial variation (Zerjal et al. 2001, Karlsson et al. 2006, Tillmar
et al. 2010).
The population structure in northern Sweden is of special interest, owing
to its historically low population density and the long-standing presence of
three ethnic groups, Swedes, Finns, and Saami (see Einarsdottir et al. 2007
for a review of the region’s population history). Finnish genetic influence
manifests in the elevated frequency of Y-chromosomal haplogroup N3
(Karlsson et al. 2006, Lappalainen et al. 2008, 2009). Autosomal markers
suggest that in the areas along the Finnish border, a majority of the gene pool
may be of Finnish origin (Nylander & Beckman 1991). Whereas the Saami differ markedly from the Swedes (Zerjal et al. 2001), admixture with the Saami is
visible in both mitochondrial (Lappalainen et al. 2009) and Y-chromosomal
(Karlsson et al. 2006) haplogroup frequencies of northern Swedes, and
autosomal markers have yielded estimates of local admixture proportions
up to one-third (Nylander & Beckman 1991). Reciprocally, signs of Swedish
admixture are detectable in the Saami population (Karlsson et al. 2006),
especially the southern Saami (Ingman & Gyllensten 2007, Johansson et al.
2008). Both marital patterns and autosomal allele frequencies in the area in
part follow river valley geography (Einarsdottir et al. 2007). Furthermore,
Y-chromosomal haplogroup frequencies in Västerbotten may reflect the
high degree of central European immigration in the 17th century (Karlsson
et al. 2006). In addition to genetic drift, founder effects and admixture, the
genetic structure in the north may have been shaped by the relatively high
frequency of consanguineous marriages (Bittles & Egerbladh 2005). These
factors have also had an impact on phenotypic variation in the area, visible
in the increased prevalence of several monogenic diseases (Gustavson &
Holmgren 1991) as well as in more complex phenotypes. Consequently, the
Review of the literature
61
northern Swedish population has usefully served in the mapping of disease
genes (Burstedt et al. 1999, Venken et al. 2005).
A study of ancient DNA has shown a discontinuity in mitochondrial
lineages between the contemporary Swedish population and the Pitted
Ware culture hunter-gatherers living in Gotland from 2800 to 2000 BC,
while the latter are more closely related to contemporary Baltic populations (Malmström 2009). The hunter-gatherers have also demonstrated
a substantially lower frequency of the lactase persistence allele than does
the extant Swedish population (Malmström 2010). These patterns may
reflect population replacement related to the introduction of agriculture.
Contrastingly, however, the high frequency of paleolithic relative to neolithic
Y-chromosomal haplogroups in the contemporary Swedish population may
signal continuity of the (male) population through the adoption of agriculture
(Karlsson et al. 2006). A recent aDNA sequencing study showed the Funnel
Beaker farmers and Pitted Ware hunter-gatherers to differ markedly both
from each other and from present-day Swedes. Interestingly, the proportion
of ancient farmer ancestry varies within the contemporary Swedish population, being lowest in the north (Skoglund et al. 2012).
In genome-wide data, the Swedish population substructure has appeared
subtle. A comparison of control datasets with differing geographic distributions within Sweden detected no differences in terms of FST’s (Leu et al.
2010). Recently, a direct study of the population substructure (Humphreys
et al. 2011) detected genetic distinctness of the northernmost areas similar
to that observed earlier in other types of markers (see above) and genomewide (Study IV). Additionally, it revealed unique genetic features in Dalarna,
possibly resulting from Finnish or Norwegian ancestry (cf. results from other
marker types above). Furthermore, genome-wide SNP data have served in
scanning for signals of recent positive selection (Lappalainen et al. 2010)
and for copy number variants and regions of homozygosity (Teo et al. 2011)
in the Swedish population.
62
Review of the literature
Aims of the study
The overall aim of this study was to analyze the genetic structure of the
Finnish and Swedish populations. Previous studies into their structure have
concentrated mainly on the Y chromosome and mitochondrial DNA, which
may, however, fail to yield a comprehensive picture of the genetic variation,
especially the autosomal variation most relevant to diseases and other
phenotypes. This study therefore utilized recent methodological advances
in SNP genotyping to investigate the structure of these populations by use
of large numbers of autosomal markers.
The specific objectives of this study were the following:
1. to study the genetic structure and variation in northern Europe
predominantly from a population historical perspective using
genome-wide autosomal marker sets (III, IV)
2. to study the autosomal substructure of the Finnish population, with
an emphasis on possible differences between eastern and western
Finland (II, III)
3. to study the autosomal substructure of the contemporary Swedish
population (II, IV)
4. to propose and theoretically test a novel gene-mapping approach,
subpopulation difference scanning (SDS), which utilizes the population substructure for localizing genes for diseases whose incidence
differs between closely related subpopulations (I)
Aims of the study
63
Materials and methods
1.Samples
For a summary of the Finnish and Swedish datasets genotyped for this study,
see Table 1, and for the locations of the provinces sampled see Figure 11.
Table 1. Swedish and Finnish genotype datasets in this study: number of genotyped
markers and numbers of individuals per country, region, and province.
Dataset
Dataset 1 Dataset 2
Used in publication(s)
I, II
II
a
Number of SNPs
30 b
34
c
Number of individuals (area and abbreviation)
SwedenSWE
0
1599
0
256
NorrlandNORR
Norrbotten
NBO
0
98
Västerbotten
VBO
0
64
Västerbotten + Norrbotten
VNBO
0
na
Västernorrland
VNO
0
30
Jämtland
JAM
0
29
Gävleborg
GAV
0
35
Svealand
SVEA
0
692
Stockholm
STO
0
407
Uppsala
UPP
0
56
Södermanland
SMA
0
50
Värmland
VAR
0
46
Dalarna
DAL
0
40
Värmland + Dalarna
VARD
0
na
Örebro
ORE
0
55
Västmanland
VMA
0
38
GötalandGOTA
0
651
Östergötland
OGO
0
64
Jönköping
JON
0
57
Kalmar
KAL
0
31
Gotland
GOT
0
6
Kalmar + Gotland
KALG
0
na
Kronoberg
KRO
0
19
Blekinge
BLE
0
21
Skåne
SKA
0
193
Malmö
MAL
0
na
Halland
HAL
0
60
Västra Götaland
VGO
0
200
Göteborg (Gothenburg)
GOB
0
na
Swedes w/o further information SWEmix
0
0
64
Materials and methods
Dataset 3 Dataset 4 Dataset 5
III, IV
IV
IV
201 011
29 339
353 565
113
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
113
755
115
na
na
53
31
0
31
245
104
26
32
na
na
21
42
20
395
42
35
na
na
17
30
34
68
29
38
74
28
0
1525
237
73
37
na
75
18
34
545
214
39
43
95
24
na
77
26
743
82
50
na
na
55
48
50
114
29
61
172
82
0
Table 1 continued
Dataset
FinlandFIN
Western FinlandFIW
Southwest Finland
SWF
Satakunta
SAT
Swedish-speaking Ostrobothnia SSOB
Southern Ostrobothnia
SOB
Häme
HAM
Mixed western Finnish ancestry FIWmix
Eastern FinlandFIE
Northern Ostrobothnia
NOB
Kainuu
KAI
Northern Savo
SAV
NKAR
North Karelia d
SKAR
South Karelia d
Mixed eastern Finnish ancestry FIEmix
Individuals in total
na
a
b
c
d
Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5
440
572
280
0
0
191
322
141
0
0
33
85
71
0
0
55
67
16
0
0
18
21
20
0
0
43
54
13
0
0
41
88
17
0
0
1
7
4
0
0
249
250
139
0
0
107
78
52
0
0
0
0
18
0
0
84
88
40
0
0
17
30
14
0
0
36
48
0
0
0
5
6
15
0
0
440
2171
393
755
1525
= not applicable.
Numbers of markers may differ between analyses.
STRs, not SNPs.
Individuals from Sweden categorized into provinces according to place of birth (Dataset 2) or place of
residence (Datasets 4 and 5), and individuals from Finland according to birthplace of their grandparents.
Includes areas currently belonging to Russia.
Numbers reported in original publications may differ by a few individuals due to different QC thresholds.
1.1.Finns (I, II, III, IV)
The Finnish individuals in datasets 1, 2, and 3 (440, 572, and 280, respectively)
were male blood-donors recruited by the Finnish Red Cross in 1998-1999.
They were 40 to 55 years old at the time and selection was based on their
grandparental birthplaces that were located in the same area of Finland,
mostly within a single province. They gave their informed consent to participate in a population genetic study, and blood collection was approved
by the ethics committee of the Finnish Red Cross. Only the birthplaces of
the subjects’ grandparents were recorded, while the subjects themselves
remained anonymous.
Originally, the collection comprised 971 individuals. Individuals in
Datasets 1 and 2 represented the provinces targeted earlier in a Y-chromosomal
study (Lappalainen et al. 2006). Dataset 3 comprised individuals with
grandparental ancestry in areas of high (n = 139) and low (n = 141) acute
myocardial infarction (AMI) incidence (Havulinna et al. 2008).
Geographic coordinates for the Finnish subjects were determined as
an average of their grandparental birthplace coordinates obtained from
Google Maps. The individuals were classified by provinces based on their
grandparents’ birthplaces mainly following traditional Finnish province
divisions, with small modifications, for instance, to maintain sufficient
sample numbers per subpopulation. These provinces were further grouped
Materials and methods
65
into western or eastern ones based on previous knowledge (Lappalainen et al.
2006). As province coordinates, the average coordinates of the individuals
representing the province served for the Mantel test, and the geographical
center of the province (from Google Earth version 4.2) for other analyses.
1.2.Swedes (II, III, IV)
The Swedish individuals of Dataset 2 (n = 1599) comprised all children
born in Sweden during one week in December 2003, complemented with
89 additional newborns from northern Sweden to ensure sufficient sample
size in the more sparsely populated areas (Hannelius et al. 2007). The
dataset thus includes children of immigrants as well as of native Swedes.
Geographic information available was their birth hospital, and this was used
to divide the population by provinces and regions. Geographic coordinates
for the birth hospitals came from Google Earth version 4.2, and province
coordinates were calculated as averages of all births in the province. The
individuals were kept anonymous, and the study was approved by the ethics
committee of the Karolinska Institute.
The Swedish individuals in Dataset 3 were 113 ethnically Swedish males.
They originated mainly from the eastern part of the country, but with no
detailed ancestry information. They were recruited with informed consent, and the recruitment was approved by the ethics committee of Umeå
University.
Dataset 4 (n = 755) and Dataset 5 (n = 1525) comprise Swedish-born
female subjects aged 50 to 74 recruited in 1993-1995 for a breast cancer
study (Einarsdóttir et al. 2006). They were recruited with informed consent,
and the procedure was approved by the ethics committee of the Karolinska
Institute. All 755 individuals in Dataset 4 were controls, while an additional
770 cases were included in Dataset 5 (after removal of the most widely differing SNPs; see section 2) to provide enhanced geographic resolution for
some analyses. The population was divided by provinces according to place
of residence at the time of recruitment. Some of the smallest provinces were
combined, and the cities Malmö and Gothenburg were separated out from
their surrounding provinces. The provinces were further grouped into three
regions – Norrland, Svealand and Götaland – according to a traditional
division. Coordinates for the Swedish individuals, based on their place of
residence, were obtained from GeoNames (www.geonames.org).
1.3.Reference populations (III, IV)
The German reference group originated from Kiel area in Schleswig-Holstein
in northern Germany. They had been recruited for population genetic studies
as described elsewhere (Krawczak et al. 2006). All had given their informed
66
Materials and methods
NORR
SVEA
GOTA
FIW
FIE
Figure 11. Geographic location of Finnish and Swedish provinces studied. Colors denote
grouping of provinces into regions. Abbreviations as in Table 1.
consent, and the procedure had been approved by the ethics committee of the
Kiel Medical Faculty. A total of 256 male individuals took part in Study III,
and 492, male and female, in IV. All data for analysis remained anonymous.
The British reference group were male controls of the 1958 birth cohort.
Their genotype data were kindly provided by the Wellcome Trust Case Control
Consortium; a full list of the investigators contributing to generation of the
data is available from www.wtccc.org.uk. Details of the sample collection
have been described elsewhere (Wellcome Trust Case Control Consortium
2007). Geographic information was available on the subjects’ region of birth.
A total of 296 individuals with birthplaces distributed throughout the country
were used in Study III, and 740 individuals in IV.
Materials and methods
67
Additionally, we used two publicly available control genotype datasets:
HapMap (International HapMap 3 Consortium et al. 2010) and the Human
Genome Diversity Project (HGDP) (Li et al. 2008). The HapMap contains
relatively high numbers of individuals from a handful of populations around
the world, whereas the HGDP features a wider geographical coverage but a
more limited number of individuals for most of its populations. From these
datasets, especially those individuals of northern (central) European origin or
descent were included in several analyses: CEU individuals (Utah residents
with ancestry from northern and western Europe) from HapMap (58 in
Study III, 101 in IV) and Vologda Russians from HGDP (25 in Study IV). In
this thesis, the German, British, CEU, and Russian population will be jointly
referred to as central Europeans. Additionally, individuals from Yoruba in
Ibadan, Nigeria (YRI), Japanese in Tokyo, Japan (JPT), and Han Chinese in
Beijing, China (CHB) from the HapMap collection were included in selected
analyses for global comparison.
1.4. Simulated population data (I)
We simulated a population history in which an initial population splits into
three populations. The daughter populations grow exponentially from their
initial size of 100, 400, or 800 for either 20 or 40 generations to reach a final
size of 100,000. The range of initial sizes encompasses estimates of breeding
unit size in a rural Finnish population in the 19th century (Nevanlinna 1972),
and the growth times reflect those of the eastern and western Finnish subpopulations. Meanwhile, the end size was chosen based mainly on technical
convenience, as most of the genetic drift will occur in the smallest generations
at the start of the simulation. After the initial split, no subsequent migration
occurred between daughter populations. The simulation had non-overlapping generations and random mating within the subpopulations, apart
from sibling matings, which were excluded. At the end of each simulation,
50, 100, 200, 500, or 1000 non-sibling individuals were randomly sampled
for the analyses from two of the daughter populations. Further details of the
simulation can be found in Appendix A of Study I.
68
Materials and methods
2.Markers, genotyping, and quality
control
The markers of Dataset 1 were dinucleotide repeat microsatellites (i.e., STRs)
from Linkage Mapping Set MD-10 (Applied Biosystems, Foster City, CA,
USA). These microsatellites were known to have rare alleles in the Finnish
population. They were genotyped with the MegaBACE 1000 capillary
electrophoresis instrument (Molecular Dynamics/Amersham Biosciences,
Sunnyvale, CA, USA), and the genotypes were called with Genetic Profiler
software version 1.1. The genotypes were checked manually twice by a person
blind to their province. Quality control (QC) exclusion criteria were success
rates below 80% for both individuals and SNPs.
The SNPs of Dataset 2 were originally selected for zygosity testing as
described in Hannelius et al. 2007. The source material for the samples in
datasets 1 and 3 to 5 and the Finnish samples of Dataset 2 was genomic DNA
extracted by standard procedures, but the DNA of the Swedish samples of
Dataset 2 had been extracted from Guthrie cards (Hannelius et al. 2005)
and whole-genome amplified (see Study II for details) before genotyping.
Dataset 2 was genotyped on the Sequenom MALDI-TOF platform by use of
the iPLEX Gold chemistry (Sequenom, San Diego, CA, USA). Allele calling
was done in the Spectro Typer 3.4 software and checked independently by
two persons. SNPs and individuals with more than 20% missing data were
excluded.
Dataset 3 was genotyped with a commercial genotyping array, Affymetrix
250K StyI (Affymetrix, Santa Clara, CA, USA), which contains 238,304 SNPs.
The genotype calling was done in the Affymetrix GeneChip Genotyping
Analysis Software (GTYPE) version 4.1 with the BRLMM algorithm. QC
exclusion thresholds were a success rate of 97% for samples and 95% for
SNPs, a HWE p of 0.001, and MAF of 0.005. Additionally, smaller marker
sets for computationally intensive analyses were constructed by LD-based
SNP pruning. The QC procedures are described in more detail in Study III.
Datasets 4 and 5 were genotyped on the Illumina HumanHap550 array
(Illumina, San Diego, CA, USA). In QC, individuals with a success rate below
99% and SNPs with a success rate below 95%, HWE p < 1 x 10-7 or MAF < 0.05
were excluded. Dataset 4 was used in comparisons that involved Finns
from Dataset 3 and individuals from reference datasets. These datasets had
been genotyped on different platforms (the Finns on Affymetrix 250K, the
HGDP samples on Illumina HumanHap650K, and the HapMap samples on
Affymetrix 6.0 and Illumina 1M). The analyses employed only those SNPs
available from all platforms, i.e., no genotype imputation was done in order
to avoid possible artefacts from the imputation process. The details of the
Materials and methods
69
merging and QC filtering of the datasets can be found in Study IV. Dataset 5
was used for detailed analyses of the structure within Sweden. Because
it contained both cases and controls, we removed the SNPs whose allele
frequencies differed the most among them (p < 0.1 in a chi-square test);
the remaining differences were small (Figure S10 in Study IV). In addition
to analyses of all the genome-wide SNPs passing QC, some analyses used
smaller SNP sets (mostly because of computational limitations) filtered on
the basis of LD.
70
Materials and methods
3.Analyses
3.1. Model-free clustering of individuals: multidimensional scaling
Identity-by-state (IBS) similarities between pairs of individuals were calculated in R 2.7.2 (R Development Core Team 2008, http://www.R-project.org)
with package GenABEL 1.4–2 (Aulchenko et al. 2007), and visualized in R by
classical multidimensional scaling (MDS) of the IBS distances (1-IBS). The
variance explained by each of the resulting MDS dimensions was calculated
by dividing its eigenvalue by the sum of all eigenvalues (Cox & Cox 2001).
3.2. Model-based clustering of individuals: Structure, Admixture,
and Geneland
Population structure in the genome-wide SNP data was also assessed by
clustering the individuals with Structure software version 2.2 (Falush et al.
2003) using the admixture model with 10,000 burn-ins and iterations and
doing three separate runs for each number of clusters (K). Estimation of
the most likely K was based on the run probabilities, on visual inspection
of the resulting clusters, and on a specific statistic (Evanno et al. 2005).
For Datasets 1 and 2, both admixture and non-admixture models and both
correlated and non-correlated allele frequencies were used, with five runs
for each K ranging from 1 to 5. For comparison, the data were also clustered
with the Admixture software version 1.12 (Alexander et al. 2009; http://
www.genetics.ucla.edu/software/ admixture/). Assessment of the most
probable K was based on the default five-fold cross-validation procedure
and a minimum of ten runs per each K.
Additionally, individuals in the smaller SNP and STR datasets (Datasets 1
and 2) were clustered with the software Geneland (Guillot et al. 2005, 2008;
http://www2.imm.dtu.dk/~gigu/ Geneland/), in which the analysis can also
use geographical information of the samples. An independent Gamma prior
with parameters (1,20) was placed on the drift coefficients, and Geneland
was run 50 times for each dataset using 100,000 iterations and a burn-in of
60,000 iterations. Only the ten runs with the best mean posterior density
were considered in the analysis.
3.3.IBS distributions within and between populations
The amount of genetic variation within populations was inspected from their
IBS distributions. The IBS distribution within a given population consists of
the IBS similarities between those individual pairs in which both individuals
Materials and methods
71
belong to the population in question. Analogously, the IBS distribution
between two populations consists of the IBS similarities for those pairs where
one individual belongs to one of the populations and one to the other. The
difference between the distributions was statistically tested by calculating
the Mann-Whitney U test statistic for each pair of distributions, and comparing this observed value to an empirical distribution of expected values of
the test statistic obtained by permuting the population labels 10,000 times.
The significance levels were Bonferroni-corrected for the number of tests.
3.4. SNP-based inspection of eastern admixture
We examined the possible eastern contribution to the Finnish autosomal
gene pool from a SNP-wise perspective. If our study populations were all
to stem from the same proto-European population (approximated by the
CEU frequencies) and would have diverged from it in the absence of (eastern) admixture, their allele frequencies might have been subject to varying
amounts of genetic drift, but the number of allele frequencies drifting toward
a given direction (for example towards the Asian frequencies) should be
similar across populations. We therefore counted, for each study population
(eastern and western Finland, Sweden, Germany, and the UK), the number
of markers in which allele frequency deviated from the CEU frequency in the
same direction as did the CHB+JPT frequency, and the number of markers in
which the allele frequency deviated in the opposite direction. These numbers
were then compared to the null hypothesis that the proportion of markers
deviating toward the Asian direction is the same in all five populations by
use of a 2 x 5 chi-square test.
3.5.Distances between populations: FST and allele frequency
differences
The FST calculations for the genome-wide data were done in Arlequin version
3.11 (Excoffier et al. 2007), with significance levels based on 5000 permutations. Differences in allele frequency between pairs of populations (sampled
to be of equal size) were calculated in Plink (http://pngu.mgh.harvard.edu/
purcell/plink, Purcell et al. 2007) using a chi-square test with one degree of
freedom. The level of differences observed was then compared to the level
expected by chance by calculating the means of the smallest 90% of the chisquare test statistics both in the observed and expected distributions, and
taking the ratio of these means as in Clayton et al. (2005).
72
Materials and methods
3.6. Hierarchical population subdivision, AMOVA, and inbreeding
In Datasets 1 and 2, F-statistics from the R package Hierfstat (Goudet 2005)
and Arlequin version 3.1 were used to apportion the genetic variation hierarchically into individual, county, region, and country levels. An analysis of
molecular variance (AMOVA) was performed using Arlequin. Calculation of
p values was based on 10,000 replications for the F-statistics and on 20,000
permutations for the AMOVA results.
In Dataset 5, the inbreeding coefficients of Swedish individuals relative
to the total Swedish sample were calculated in Plink. The difference between
the coefficients of individuals from Norrland, Svealand, and Götaland was
tested with a Kruskal-Wallis rank-sum test in R. The significance levels were
corrected for the number of pairwise tests by use of a Bonferroni correction.
3.7.Linkage disequilibrium
The extent of linkage disequilibrium (LD) in each population was calculated
for 67,620 SNP pairs (all SNPs at a less than 500-kb distance from 5083
randomly chosen SNPs) using the expectation-maximization algorithm in
Plink. As LD is dependent on sample size, all populations were first sampled
to equal size. The LD distributions between populations were compared by
use of a Wilcoxon signed rank test (i.e., a paired test) in R. The significance
levels were Bonferroni-corrected for the number of pairwise tests.
3.8.Correlation of genetic and geographic distances
The correlation between genetic and geographic distances was tested with
the Mantel test from the R package ade4 (Dray & Dufour 2007). The geographic distances between pairs of individuals and pairs of counties were
calculated as great-circle distances by using R package fields (Nychka 2007).
As genetic distances, IBS distances between pairs of individuals (see above)
were used for the genome-wide data, and pairwise FST statistics between
counties calculated in Arlequin version 3.1 for Datasets 1 and 2.
3.9.Tests of the subpopulation difference scanning (SDS)
approach
Allele frequency differences between areas of high and low AMI incidence
were calculated in Dataset 1, by comparison of subpopulations in Finland
from central east (111 individuals) and central west (46 individuals) as
well as central east and southwest (38 individuals). The result for the most
differing allele per marker was recorded in each comparison. Similarly, the
differences in the simulated data were recorded as the largest frequency
Materials and methods
73
difference in five initially equifrequent haplotypes between the two sampled
daughter populations. Furthermore, the simulated frequency differences
were used to determine the power of detecting a disease gene with a given
frequency difference when using varying genomic exclusion thresholds (see
Appendix A in Study I for details). The degree of disease incidence difference
that various gene frequency differences would produce was then studied
in a simple model of two subpopulations, each in HWE and differing only
in their frequency of the disease-causing allele, which was assumed to be
dominant with incomplete penetrance.
3.10.Enrichment analyses
The potential enrichment into Kyoto Encyclopedia of Genes and Genomes
(KEGG) pathways among the genome areas that showed a large regional
frequency difference between eastern and western Finland was studied
using WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt/) (Zhang et
al. 2005). The 11,615 SNPs in which either homozygote had a frequency
difference above 0.135 in Dataset 3 served as input. Of these SNPs, 4,116
were unambiguously mapped to 1,810 unique Entrez IDs. All pathways with
at least two of these genes were tested. The p-value cutoff used (after multiple testing correction with the Benjamini and Hochberg method) was 0.05.
A list of SNPs associated with cardiovascular phenotypes (with
p < 1 x 10-5) was retrieved from the NHGRI Catalog of Published GenomeWide Association Studies (http://www.genome.gov/ gwastudies, accessed
01/01/2012). We then estimated how large an incidence difference each
of these 248 SNPs could explain. For simplicity, it was first assumed that
the frequency difference in the GWAS SNP would equal that observed in a
nearby SNP in Dataset 3; the largest of the differences in both homozygote
frequencies and the allele frequency was used. The increased disease risk
caused by the GWAS SNPs was calculated based on the reported OR. As
the risk depends on allele frequencies, the maximal risk increase across all
frequencies was used to obtain an initial estimate. The estimate was then
refined for those SNPs which could explain an incidence difference of at
least 1% (or 0.2% when the SNP itself was present in Dataset 3) in the initial
calculation and which originated from GWASes reporting the risk allele and
conducted mainly on European or European-ancestry populations. When the
same or a nearby SNP was reported by multiple studies, the best reported OR
was used. The risk increase of the allele was calculated based on the allele
frequencies reported in the paper, and if the SNP had not been genotyped in
Dataset 3, its allele frequencies in the high and low incidence regions were
estimated based on HapMap (phase 2, release 24) CEU haplotype frequencies
obtained from Haploview version 4.1 (Barrett et al. 2005).
74
Materials and methods
Results
1.Population structure in Finland and
Sweden
1.1. Model-free clustering of individuals: multidimensional scaling
When a multidimensional scaling (MDS) analysis was performed on the
genome-wide data of north Europeans (Datasets 3 and 4 complemented
with Germans and Russians), the individuals clustered mainly according to
their area of origin (Figure 12A). The first dimension showed an east-west
gradient from eastern Finns through western Finns and Swedes to Germans,
and the second dimension revealed a difference between northern Swedes
and others. Svealand and Götaland individuals behaved very much like
Germans until the third dimension, and the few Russian references were
located between this cluster and the western Finns. Overall, the different
populations did not form clearly separate clusters. Genetic and geographic
distances generally corresponded to each other, except for eastern Finns and
northern Swedes who exhibited longer genetic than geographic distances.
In an MDS analysis of Finns and Swedes only (Figures 12B and 12C),
the two first dimensions formed a pattern very similar to the one that had
the German and Russian reference populations included. A province-based
coloring of the samples showed that the Finns with closest affinity to Swedes
were Swedish-speaking Ostrobothnians, whereas of the Swedes, Götaland
individuals tended to be furthest away from the Finns. Notably, the Swedish
individuals from the northernmost province, Norrbotten, were not particularly close to the geographically closest Finnish province, Northern
Ostrobothnia. Samples from different provinces, especially within southern
Sweden, showed high degrees of overlap.
An MDS analysis of the Swedes (Figure 12D and Figure 2C in Study IV)
showed a north-south gradient in the first dimension, while the second
dimension further differentiated, within Norrland, between Norrbotten and
Västerbotten. The samples without geographical information behaved mainly
like the northern samples. The plot of Norrland individuals alone (Figure 2D
in IV), colored according to their river valley of origin, showed a loose division into the south, middle, and north of Norrland. An analysis of Götaland
and Svealand (Figure S5 in IV) revealed only a very subtle structuring, with
the first dimension loosely differentiating between Svealand and Götaland,
and the second dimension showing an east-west gradient within Götaland.
Results
75
0.00
0.02
0.02
-0.02
0.00
0.00
FIE
FIW
GER
RUS
2nd dimension (0.21%)
0.02
2nd dimension (0.21%)
SWEmix
NORR
SVEA
GOTA
0.04
B
0.04
A
0.04
1st dimension (0.54%)
-0.02
0.00
0.02
0.04
1st dimension (0.74%)
C
GOTA
VGO
GOB
HAL
SKA
MAL
SVEA
OGO
JON
KALG
KRO
BLE
NORR
UPP
STO
SMA
DAL
VMA
ORE
VAR
FIW
FIE
NOB
KAI
NKAR
SAV
FIN
SOB
SSOB
SAT
SWF
HAM
NBO
VBO
VNO
GAV
SWEmix
E
-0.04
-0.04
-0.02
-0.02
0.00
0.00
0.02
0.02
0.04
2nd dimension (0.52%)
2nd dimension (0.23%)
D
-0.02
0.00
0.02
0.04
1st dimension (0.35%)
0.06
-0.04
-0.02
0.00
0.02
1st dimension (0.90%)
0.04
Figure 12. Genetic distances between individuals visualized by multidimensional scaling
(MDS). Identity by state (IBS) distances in northern Europe (A), Sweden and Finland (B),
Sweden (D), and Finland (E). Individuals are colored according to their country (A) or
province (B-E) of origin. Axis labels indicate proportions of variance explained by each
axis. Abbreviations: Germans (GER), Russians (RUS); others as in Table 1. Based on
data from Study IV.
76
Results
In an MDS analysis of the Finns (Figure 12E), the first dimension showed
a division into eastern and western samples, with Häme as an intermediate
province, and the second dimension revealed a spread within the eastern and
western samples. This spread loosely corresponded to province origin, with
a span from north (Northern Ostrobothnia and Kainuu) to south (Savo and
Northern Karelia) in eastern Finland, and a difference between the southwestern provinces (Southwest Finland and Satakunta) and Ostrobothnia
(Southern Ostrobothnia and Swedish-speaking Ostrobothnia) in western
Finland.
Overall, the percentages of variation explained by the first MDS dimensions were very small, as is typical in an individual-based analysis, because
most of the variation in the human species resides between individuals and
is therefore not well captured by MDS or PCA methods. The loose divisions
(as opposed to strictly differing clusters) may reflect both the large individual
differences, a clinal nature of the genetic differences between areas, and the
limited resolution of the MDS, especially with these numbers of individuals.
1.2. Model-based clustering of individuals: Structure, Admixture,
and Geneland
When north European individuals (cf. section 1.1) were analyzed with the
Structure software (Figure 3 in IV), the analysis with K = 2 revealed a division
between eastern Finns and all the other samples, while the analysis with K = 3
added a cluster dominated by the Swedes, K = 4, dominated by northern
Swedes, and K = 5, dominated by Germans. In all these analyses, the western
Finns and Russians tended to show notable proportions of ancestry in more
than one cluster. Analyses with K = 6 and K = 7 did not bring out further
differentiation between populations. While the likelihoods of clusterings
appeared approximately equal across K, a specific statistic (Evanno et al.
2005) suggested the most likely number of clusters to be 2 or 6.
An Admixture analysis of the same dataset (with ca. 25,000 SNPs) produced a similar result (Figure 13), with the exception that K = 3 was the
most likely solution (Table 2). Of the three clusters, one appeared central
European and showed elevated frequencies also in southern Sweden, one was
eastern Finnish, with intermediate frequencies in western Finland, and one
was enriched in northern Sweden and low in eastern Finland, including the
North Ostrobothnians geographically closest to northern Sweden (Figure 14).
A Structure analysis of the Finns alone (not shown) supported the existence of two clusters which largely corresponded to the eastern and western
subpopulations: only 1.8% of the samples showed higher ancestry in the
opposite cluster. In an Admixture analysis, even with an unpruned dataset of 201,011 SNPs, the most likely result was only one cluster (Table 2).
A Structure analysis of the Swedes showed two clusters (an analysis with
Results
77
K=2
K=3
K=4
FIE
RUS
FIW
NORR
SWEmix
SVEA
GOTA
GER
CEU
BRI
K=5
Figure 13. Clustering of north European individuals by Admixture software. Each thin
vertical line represents an individual, and colors denote proportions of ancestry in
each of the K inferred clusters (2 to 5) dominated by eastern Finns (yellow), Swedes
or southern Swedes (red), northern Swedes (green), Germans (light blue), and others
(purple). Abbreviations: British (BRI), Utah residents with ancestry from northern and
western Europe (CEU), Germans (GER), Russians (RUS); other abbreviations as in
Table 1. Unpublished.
78
Results
Table 2. Comparison of clustering results from Admixture and Structure.
Study population(s)
Admixture analysis
SNPs a
number of runs
CV error b
K = 1
K = 2
K = 3
K = 4
Structure analysis
most likely K
SNPs a
Northern Europe
Finland
Sweden
25,000
11
201,000
10
60,000
14 c
0.5955445
0.5949245
0.5948736
0.5949291
0.540306
0.540472
nd
nd
0.62188
0.62185
0.62199
nd
2 or 6
7,200
2
6,400
2
5,700
nd = not determined.
a
Approximate number of SNPs used in the analysis.
b
Cross-validation error for K clusters, averaged over runs. The smallest error is underlined.
c
CV error for K = 3 based on four runs.
A
B
C
70
CEU BRI GER RUS SWE CEU BRI GER RUS SWE CEU BRI GER RUS SWE
mix
mix
mix
n
n
n
1
5
10
1
5
10
55
60
latitude
65
1
5
10
10
15
20
25
longitude
1.0
1.0
1.0
0.5
0.5
0.5
0.0
0.0
0.0
30
10
15
20
25
longitude
30
10
15
20
25
longitude
30
Figure 14. Local ancestries in three clusters (A-C) by Admixture software. Circles colored
according to average proportion of ancestry of individuals originating from the location in
question; circle sizes correspond to number of individuals per location. For comparison,
panels above the maps depict the average ancestry proportions in five populations (circle
size not to scale). Abbreviations as in Table 1 and Figure 13. Unpublished.
Results
79
K = 3 yielded a clearly lower likelihood), with a clinally varying frequency
from north to south and an enrichment in the province of Västerbotten
(Figures S6 and 5A in IV). An Admixture analysis (based on ca. 60,000
SNPs) found two clusters to be a slightly more probable solution than one
cluster (Table 2). The geographic trend in K = 2 was similar to that seen in
the Structure analysis (E.S., unpublished results).
In the smaller dataset with 34 SNPs (Dataset 2), the Geneland software
detected two clusters within Finland (Figure 3A in II). These mostly corresponded to the division between eastern and western Finland, although
the border between the clusters lay somewhat further north and east than
did the main geographical division of the birthplaces in the data. When
Swedish individuals were added to the analysis (Figure 3B in II), Geneland
found three clusters: eastern Finland, western Finland and Sweden, with
the exception that the Swedish-speakers from the Finnish Ostrobothnia
clustered with the Swedes. In contrast, Structure found only one cluster in
each of these datasets. In the STR dataset of Finns, Geneland and Structure
found no substructure.
1.3.IBS distributions between populations
As MDS results may not be representive of the full genetic variation of
populations, the distributions of IBS between populations were examined
using the predefined population divisions. The populations of interest were
compared to the HapMap populations, i.e., the IBS distributions to be
compared consisted of the IBS similarities between pairs of individuals
with one individual from the population of interest and the other from the
HapMap population in question (see section 3.3 in Materials and Methods).
As expected, the general level of similarity was lowest with the African (YRI),
highest with the European (CEU), and intermediate with the East Asian
(CHB and JPT) HapMap populations (Figure 15A). Notably, the similarities
of the study populations with the East Asians varied relatively widely, with
Germany and Götaland showing the lowest similarity and eastern Finland
and Russia the highest, whereas the comparisons with CEU exhibited the
opposite order (Bonferroni-corrected p < 0.015 for all distribution pairs,
except Svealand vs. Germany being nonsignificant in the comparison with
CEU, Svealand vs. Norrland with CHB and JPT, and Russia vs. eastern
Finland and Götaland vs. Germany in both comparisons). Interestingly, a
similar comparison with the Russian reference population revealed a different pattern (Figure 15B), with Germany and especially western Finland
showing the highest similarity, and Norrland and eastern Finland the lowest
similarity with the Russians (Bonferroni-corrected p < 0.031 for western
Finland vs. all populations except Germany, and for Germany vs. eastern
Finland and Norrland).
80
Results
YRI
CHB + JPT
CEU
0.00
0.05
0.10
frequency
0.15
0.20
A
0.66
0.68
0.70
IBS similarity
0.710
0.715
0.720
IBS similarity
0.725
0.15
0.10
frequency
0.05
0.00
0.00
0.05
0.10
frequency
0.15
NORR
SVEA
GOTA
FIW
FIE
GER
RUS
0.20
C
0.20
B
0.72
0.710
0.715
0.720
0.725
IBS similarity
Figure 15. Distributions of pairwise identity-by-state similarities between study populations
and four HapMap populations (A) and Russians (B) and within study populations (C).
Triangles denote the median of the distribution of the corresponding color. Each curve in
A and B consists of similarities of individual pairs with one individual from the population
denoted by curve color and the other from the HapMap or Russian population; analogously, in C, all pairs are depicted in which both individuals originate from the population
in question. Abbreviations: Yoruba from Ibadan, Nigeria (YRI, n = 105); Han Chinese
from Beijing, China (CHB, n = 78); Japanese from Tokyo, Japan (JPT, n = 84); Utah
residents with ancestry from northern and western Europe (CEU, n = 101); Germans
(GER, n = 492); Russians (RUS, n = 25); other abbreviations as in Table 1. Adapted
from Study IV with permission.
1.4. SNP-based inspection of eastern admixture
The varying affinity to eastern Asians was also detectable in a SNP-wise
analysis of Dataset 3: when the allele frequencies in the study populations
were compared to those of CEU and CHB+JPT, the proportion of SNPs whose
allele frequency deviated from the CEU frequency towards the eastern Asian
frequency (instead of away from it) was significantly higher (p < 1 x 10-5)
in Finland, especially in eastern Finland, than in the central European
populations (Table S2 in III).
Results
81
1.5.Distances between populations: FST and allele frequency
differences
In European populations, FST distances and geographical distances mostly
corresponded, with the exception of eastern Finns, Sardinians, and Basques,
who exhibited longer genetic distances than their geographic locations would
imply (Table S2 and Figure S3A in IV). A similar pattern was evident in the
general levels of allele frequency differences between populations (Table 3):
eastern Finland differed the most, while Svealand and Götaland were relatively close to Germany and Great Britain, and western Finland was in fact
closer to Norrland and Svealand than to eastern Finland. Within Sweden,
the allele frequency differences between Svealand and Götaland were small,
but from them to Norrland they were of the same magnitude or larger than
those between Germany and Great Britain.
Table 3. Degree of allele frequency differences between pairs of populations.
λ
FIE
FIW
NORR
SVEA
GOTA
GER
BRI
FIE
1.00
1.71
2.59
2.62
2.91
3.08
3.30
FIW
1.71
1.00
1.56
1.52
1.70
1.82
2.05
NORR
2.59
1.56
1.00
1.12
1.20
1.36
1.46
SVEA
2.62
1.52
1.12
1.00
1.03
1.16
1.28
GOTA
2.91
1.70
1.20
1.03
1.00
1.13
1.21
GER
3.08
1.82
1.36
1.16
1.13
1.00
1.11
BRI
3.30
2.05
1.46
1.28
1.21
1.11
1.00
Measured as λ, the ratio of observed vs. expected chi-square test statistics; λ = 1 indicates no difference.
Abbreviations: Germany (GER), Great Britain (BRI); other abbreviations as in Table 1.
Reproduced from IV with permission.
On a province level, the FST values between most province pairs within
eastern and western Finland were significant (Table S3 and Figure S3B in IV).
In contrast, despite the larger sample sizes for many Swedish provinces,
the FST values between province pairs within Svealand and Götaland were
non-significant. In Norrland, the provinces of Norrbotten and Västerbotten
differed significantly from each other and from the other provinces (Tables S3
and S4 and Figures S3B and S3C in IV).
1.6. Hierarchical population subdivision, AMOVA, and inbreeding
In Dataset 2, a hierarchical analysis of individual, county (i.e., province), region,
and country levels in Finland and Sweden resulted in significant F-statistics except
for Fcountry/total (Table 2 in II). Within Sweden, the genetic variation between counties and regions in Sweden was non-significant (Table 1 in II), while Findividual/country
and Findividual/county were significantly inflated (at 0.027 and 0.025, P < 0.0001; Table
2 in II), which may relate to increased genotyping error rate due to whole-genome
amplification of the template. Eastern and western Finland explained a small
82
Results
but significant portion (0.42%, p < 0.0001; Table 1 in II) of the genetic variation
within Finland, and in the four-level analysis (Table 2 in II) both Fcounty/region and
Fregion/country were significant (p < 0.01). The latter was significant also in the STR
data (Dataset 1; p < 0.05).
Level of inbreeding differed between the three Swedish regions (p < 0.0002),
and was stronger in the north, especially in coastal Västerbotten (Figure 5B in IV).
1.7.IBS distributions within populations
The variation within populations was measured as the distribution of IBS
similarities between all pairs of individuals originating from the same population (Figure 15C). The similarities were highest in eastern Finland and
successively lower in western Finland, Götaland, Germany, Svealand, and
Norrland (Bonferroni-corrected p < 0.034 for eastern Finland vs. all other
populations, Norrland vs. Götaland and western Finland, and western
Finland vs. Svealand).
A similar pattern emerged in the smaller SNP dataset (Dataset 2), where
the mean IBS in Finland was higher than in Sweden (0.650 and 0.641,
respectively), and higher in eastern than in western Finland (0.656 and
0.649, respectively), indicating reduced heterozygosity in Finland, especially
in eastern Finland. Within Sweden, the IBS similarities ranged from 0.640
in Götaland and 0.639 in Svealand to 0.6370 in Norrland.
1.8.Linkage disequilibrium
Degree of linkage disequilibrium (LD) differed significantly between most
of the populations (Bonferroni-corrected p < 0.002 for all pairwise comparisons, except Germany vs. Great Britain and Svealand vs. Götaland as
nonsignificant). It was weakest in Germany and Great Britain, followed by
Svealand and Götaland, Norrland and western Finland, and was clearly
strongest in eastern Finland (Figure S8 in IV).
1.9.Correlation of genetic and geographic distances
In Dataset 3, genetic distances between Finnish individuals correlated significantly with their geographic distances (r = 0.31, p < 10−6). In Sweden (Dataset 5),
the correlation was significant for the whole country (r = 0.066, p < 0.0001)
as well as regionally in Norrland (r = 0.164, p < 0.0001), Svealand (r = 0.011,
p < 0.0001), and Götaland (r = 0.036, p < 0.0001). In the smaller SNP set (Dataset
2), the genetic and geographic distances between counties correlated significantly
in Finland (r = 0.32, p = 0.046), and in Finland and Sweden combined (r = 0.39,
p < 0.0003), whereas in Sweden and in the STR data (Dataset 1), the correlation
was not significant (r = -0.014, p = 0.54 and r = 0.01, p = 0.49, respectively).
Results
83
2. Subpopulation difference scanning
(SDS) approach
2.1.Population simulations
In the STR data (Dataset 1), 90% of the markers had allele frequency differences below 0.13 between eastern and southwestern, and below 0.15 between
eastern and western Finland. Similar distributions of allele frequency differences occurred in the population simulations in which the population
expanded from an initial size of 400 or 800 individuals to 10,000 individuals
in 40 generations (Table 4). Judging from the simulations with expansion
time of 20 generations and initial sizes of 100 and 800 individuals (Figure 3
in I), an initial size of ca. 300 could produce a relatively similar result in
20 generations – as could, obviously, a number of other parameter combinations that remain unsimulated.
Table 4. Frequency differences from various population simulations with initial haplotype
frequency 0.2.
Initial size
100
100
100
100
400
400
800
800
800
800
Initial D’
0.5
0.7
0.5
0.7
0.5
0.7
0.5
0.7
0.5
0.7
Time (gen)
20
20
40
40
40
40
20
20
40
40
Sample size
100
100
100
100
100
100
200
200
200
200
HFD 90%
0.19
0.19
0.25
0.25
0.16
0.16
0.10
0.09
0.12
0.12
GFD 80%
0.27
0.23
0.33
0.29
0.21
0.19
0.13
0.12
0.15
0.14
Initial D’: average LD at 1-cM distances in the initial population.
Time (gen): duration of exponential population growth (generations).
HFD 90%: haplotype frequency difference threshold producing a 90% exclusion of the genome.
GFD 80%: gene frequency difference yielding a 80% detection power of the gene.
Results based on 100 simulations, except for initial sizes of 400 on 1000 simulations.
Adapted from Salmela et al. J Med Genet 43: 590-597 (2006) with permission.
The simulations underwent further study of how different genotyping
densities and sample sizes would affect the power to detect a true gene
frequency difference, when detection is based on the haplotype frequency
difference of nearby markers genotyped in samples from the daughter
populations (Table 4). In the simulations closest resembling the STR data,
a gene with a frequency difference of at least 0.14 to 0.20 would have a
nearby haplotype frequency difference exceeding the 90th percentile of the
haplotype difference distribution, when assuming a 1-cM genotyping density
and a sample size of 100 to 200. Thus, an exclusion of 90% of the genome
84
Results
based on such a genotyping would result in an 80% power of detecting
(i.e., 20% risk of excluding) the correct gene.
How likely is it that a disease-causing gene would have a frequency difference of 0.14 to 0.20? The regional incidence difference caused by a gene
is a product of the gene frequency difference between the regions and the
risk increase caused by the gene. Acute myocardial infarction (AMI) shows
yearly incidences in the range of 860 per 100,000 35- to 84-year-old men
in the high and 710 in the low incidence areas of Finland (Havulinna et al.
2008). Part of this difference may obviously be caused by regional differences
in non-genetic risk factors. Let us assume that the lifetime genetic incidence
difference for AMI caused by a single dominant, incompletely penetrant
gene would be for example 0.04. Then the gene’s frequency difference
would exceed the 0.20 calculated above (meaning that the gene would be
detectable with a 90% exclusion of the genome at least 80% of the time)
if the gene produced at most a 0.12 increase in disease risk (Figure 4 in I).
2.2.East-west differences in the genome-wide SNP dataset
0.3
0.2
0.1
0.0
frequency difference
0.4
In Dataset 3, the eastern Finnish individuals represent the high-incidence
area and the western Finnish individuals the low-incidence area for AMI.
When the genotype frequencies in these areas were compared, 90% of the
SNPs had a frequency difference below 0.12 (E.S., unpublished results). As
expected for differences caused by genetic drift and unrelated to a particular
phenotype, the large population differences were much more scattered across
the genome than were the typical differences in case-control association
studies (Figure 16).
1
3
5
7
9
chromosome
11
13
15
17 19 21 X
Figure 16. Genomic locations of differences between eastern and western Finland, with
absolute genotype frequency differences plotted according to SNP location; chromosomes
are shown in an alternating color. Unpublished.
Results
85
We then analyzed whether genes in the genome areas showing the highest east-west difference (above 13.5%) were enriched into any functionally
relevant categories. Indeed, the SNPs (n = 11,615) that had the highest eastwest frequency differences showed significant enrichment to four KEGG
pathways. Interestingly, three of these, arrhythmogenic right ventricular
cardiomyopathy (p = 0.0004), hypertrophic cardiomyopathy (p = 0.0053)
and dilated cardiomyopathy (p = 0.0198) were among the four pathways
that KEGG categorizes as being related to cardiovascular diseases, and only
the viral myocarditis pathway was absent. The fourth enriched pathway was
neuroactive ligand-receptor interaction (p = 0.0017) (E.S., unpublished
results).
Based on the east-west differences in Dataset 3, we also calculated how
large were the incidence differences a subset of cardiovascular GWAS SNPs
could explain. We used the OR reported by the GWAS and estimated allele
frequencies from those in Dataset 3 and from the haplotype frequencies
in HapMap CEU. The combined effect of variants from eight independent
genome regions could explain a difference of ca. 1.3% in disease incidence
between the high- and low-incidence regions (Table 5). In four of these
regions, the SNPs (rs6455689, rs11556924, rs12618342, and rs17676758)
harbored genotype frequency differences that were among the highest 1.2%
in the dataset, and would thus be among the top candidates to be picked up
by the SDS approach (E.S., unpublished results).
86
Results
From GWA study
SNP
allele OR study
c
b
d
f
e
f
g
b
h
rs599839
rs12618342
rs9818870
rs16893526
rs6455689
rs6455689
rs11556924
rs501120
rs17676758
A
T
T
G
C
C
C
T
G
0
72
0
0
37
37
0
0
49
0.777
0.441
0.082
0.933
0.436
0.436
0.610
0.850
0.137
0.835
0.290
0.137
0.989
0.309
0.309
0.725
0.874
0.241
0.056
0.080
0.043
0.028
0.139
0.041
0.022
0.066
0.047
0.777
0.028
0.082
0.933
0.011
0.129
0.610
0.850
0.795
0.835
0.031
0.137
0.989
0.008
0.151
0.725
0.874
0.811
inc. diff
(%)
0.327
0.029
0.238
0.157
-0.045
0.090
0.247
0.158
0.077
1.278
From population dataset
Calculated
SNP
allele
disfrequency
risk
frequency
explained
tance
(kb)
west east increase west east
Table 5. Estimated incidence differences between high and low AMI incidence regions explained by a subset of
SNPs associated with cardiovascular diseases in GWASes.
chr
1 rs599839
A
1.29
2 rs4665058
A
1.92
3 rs9818870
T
1.15
G
1.13
6 rs16893526
6 haplotype a CCTC 1.82
6 haplotype a CTTG 1.20
C
1.09
7 rs11556924
10 rs501120
T
1.33
16 rs8055236
G
1.91
combined
a
A haplotype formed by SNPs rs2048327, rs3127599, rs7767084 and rs10755578.
b
Samani et al. 2007
c
Arking et al. 2011.
d
Erdmann et al. 2009.
e
Wild et al. 2011.
f
Trégouët et al. 2009.
Schunkert et al. 2011.
Wellcome Trust Case Control Consortium 2007.
g
h
Associated phenotype in all studies is coronary heart disease, except sudden cardiac arrest for SNP rs4665058.
Unpublished.
87
Results
Discussion
1.Technical aspects
1.1. Quality control of genome-wide SNP data for population
genetics
Even though genome-wide association study (GWAS) datasets have proven
useful in population genetics, some of the standard quality control (QC)
procedures developed for GWAS settings may be nonoptimal for population
genetics use. For example, deviations from Hardy-Weinberg equilibrium
(HWE) that can signal errors in genotype calling may also stem from population substructure. Likewise, applying fixed QC thresholds for sample
heterozygosity or identity-by-state (IBS) relatedness will ignore the fact that
these measures can vary because of population history rather than because of
technical errors. In the presence of marked population structure, unnecessarily strict QC procedures can exclude markers and individuals with the largest
genetic differences. This could underestimate the strength of population
structure. In a GWAS, in turn, unnecessarily strict QC would have a more
indirect effect: a loss of power to detect an association. On the other hand,
QC with too lenient thresholds can accept samples or markers of inferior
genotyping quality. In GWASes, the genotyping quality can be re-checked
for the SNPs that show the strongest associations. Such subsequent checks
will rarely be possible in the genome-wide analyses of population structure
which usually do not indicate particular loci.
In this study, the QC procedures tried to avoid some of the above pitfalls. When filtering for sample heterozygosities and IBS relatedness, we
examined the distribution of the measures in each population and used
population-specific thresholds. Similarly, the HWE checks were done for
each population separately, to minimize the number of deviations that could
originate from population structure. Overall, the QC procedures and the
filtering thresholds we chose were stricter for markers than for individuals.
In a population genetic study – contrary to a GWAS – the individuals are of
primary interest, whereas single markers will have little effect on the results
that are genome-wide averages. Moreover, the exact number of markers
surviving QC is irrelevant for many of the analysis methods, because they
are unable to handle more than a few thousand markers.
The GWAS data for different populations often originate from separate datasets genotyped on varying platforms or in several laboratories.
Consequently, technical differences between the datasets may inflate the
88
Discussion
observed population differences. As long as the technical differences affect
relatively few markers, their effects can be minimized by choice of analysis
methods. For example, we clustered individuals with multidimensional
scaling (MDS) instead of principal component analysis (PCA), because the
former is somewhat less sensitive to single deviating markers. Additionally,
in QC we compared genetically close populations from different datasets
and removed clearly outlying markers.
1.2.Comparison of model-based methods of individual clustering
This study clustered individuals with three model-based software programs:
Structure, Admixture, and Geneland. Analyses of the same dataset with
multiple softwares enabled comparisons of Structure vs. Admixture and
Structure vs. Geneland.
The main trends detected by the Structure and Admixture softwares did
not differ. It was obvious, however, that Admixture tended to infer a smaller
number of clusters than did Structure, i.e., it had lower resolution. This was
most visible in the analysis of Finns; even with a more than 30-fold number
of markers, Admixture was able to infer only one cluster, whereas Structure
found two. In the other populations, the reduction in resolution was smaller,
possibly due to the larger number of individuals. Additionally, in the analysis
of north Europeans, the clusters inferred by Structure appeared somewhat
more separated than were those produced by Admixture (cf. Figure 3 in
Study IV and Figure 13). Obviously, therefore, although the faster algorithm
of Admixture allows the use of larger numbers of markers, this does not
immediately translate into better performance.
The comparison of Structure and Geneland used a small dataset
with 34 SNPs. Structure found only one cluster, which is unsurprising
because previous studies have shown for example that Structure needs 60
or more random markers to correctly assign individuals to populations even
on a continental level (Bamshad et al. 2003, Turakulov & Easteal 2003).
Geneland, in turn, inferred two clusters in Finland and a third one in Sweden.
Thus, the spatial prior of Geneland clearly improved the clustering power
in this small dataset.
Discussion
89
2.Population structure in Finland and
Sweden
In our genome-wide analysis of population structure and substructure in
northern Europe, Finns clearly differed from other populations. Eastern
Finns also differed from western Finns – as much as western Finns differed
from Swedes, and markedly more than Germans from British. The Finns also
harbored a small but apparent eastern influence, which was pronounced in
eastern Finland. Furthermore, the genetic diversity of eastern Finland was
substantially reduced. In Sweden, the population of the northern part of
the country differed from that of the rest of Sweden as well as from that of
central Europe and showed an internal substructure, whereas the population
of the southern part was relatively homogeneous and genetically close to
Germans and other central Europeans.
These patterns, discussed in detail below, are generally consistent with
earlier genetic research into Finns and Swedes. They are also compatible
with the history of these populations, which is characterized by low population densities and a relatively limited gene flow. Overall, our data showed a
general correlation between genetic and geographic distances which agrees
with that of earlier studies of Europe (Lao et al. 2008, Novembre et al. 2008).
Interestingly, however, the nature of this correlation varied regionally: in the
north, a given geographic distance corresponded to a much longer genetic
distance than in the south.
2.1.Comparison of Finland and Sweden
In our analyses, genetic substructure appeared stronger in Finland than
in Sweden. The Finnish and Swedish datasets are not directly comparable,
however, because of their different ascertainment. In the Finnish samples,
geographic information consists of the grandparental birthplaces of the
subjects, and the data therefore represent the historical rather than the
contemporary population. For the Swedes, geographic information is the
place of birth in a sample representing the contemporary population as a
whole (i.e., including descendants of recent immigrants) (Study II), or is the
place of residence in a sample of Swedish-born individuals (Study IV). Recent
(post-WW2) trends of immigration, urbanization, and internal migration
have likely reduced and blurred subpopulation differences in both countries.
It is therefore impossible to say to what degree the stronger Finnish substructure in our data represents a temporal difference in population structure
and to what degree current (or past) differences between these countries. In
theory, the discrepancy between the datasets could be alleviated by either
90
Discussion
tracking the ancestry of the Swedish samples or by correcting the Finnish
results for the degree of internal migration during recent generations.
Despite their geographic proximity, the northernmost Finnish and
Swedish provinces studied (Northern Ostrobothnia and Norrbotten, respectively) were not close to each other genetically. Instead, the northern Swedes
showed a closer genetic affinity to western Finns. This clearly contrasts with
the correlation of genetic with geographic distances in Europe (Lao et al.
2008, Novembre et al. 2008). The likely explanation to this seemingly counterintuitive trend lies in the population history of Northern Ostrobothnia:
apart from the very coastal areas, this province became inhabited during
the 16th century population movement originating from eastern Finland
(Keränen 2003, Pitkänen 2007). Our results suggest that the amount of
recent Finnish genetic influence in northernmost Sweden has been rather
restricted geographically or in magnitude, or both, despite contact across
the border. This observation disagrees somewhat with earlier findings of
increased links between the northern areas (Nylander & Beckman 1991,
Kittles et al. 1998, Karlsson et al. 2006, Lappalainen et al. 2008, 2009).
Admittedly, the disagreement may in part stem from the fact that in our
study the Finnish areas directly adjacent to the border remained unsampled.
Meanwhile, the genetic affinity of northern Swedish to western Finnish areas
may reflect the apparently more powerful contacts around and across the Gulf
of Bothnia ever since ancient times (Siiriäinen 2003a; see also sections 5.1
and 6.1 of Review of the Literature).
The Finnish province genetically closest to the Swedes was Swedishspeaking Ostrobothnia. Although the small sample size (20 individuals)
limits conclusions, this result is well in line with historical records of Swedish
immigration to the area in the early Middle Ages. It is also compatible with
results depicting this area as genetically intermediate between Sweden
and the rest of Finland (Virtaranta-Knowles et al. 1991). Obviously, exact
estimates of admixture proportions may vary between studies, depending on
the origin of both the Swedish and especially the Finnish reference samples
(cf. section 2.2.2). In our data, a strong admixture of the Swedish-speakers
with the Finnish-speaking population seemed evident. For example, in the
multidimensional scaling (MDS) analyses, Swedish-speaking Ostrobothnians
differed from their Finnish-speaking neighbors (Southern Ostrobothnians)
only when Swedes were included in the analysis. This underlines both the
close relations of these populations and the importance of including in MDS
relevant reference populations. Conversely, genetic distances to Sweden were
relatively long: Swedish-speaking Ostrobothnia appeared approximately
equally far from Swedish provinces as the two northernmost – and most
differing – provinces of mainland Sweden (Norrbotten and Västerbotten).
Discussion
91
2.2.Finland
2.2.1.Eastern influence
Most of the features of the Finnish genetic structure could be explained
by a scenario in which a gene pool of European origin is subject to genetic
drift, both in founder effects during the gradual inhabitation of Finland
and in subsequent small and isolated breeding units. However, the Finnish
autosomal gene pool also carried signs of a small but evident contribution
from the east, visible as an increased genetic affinity of the Finnish samples
to the East Asian HapMap reference populations (from China and Japan),
and it also emerged in a SNP-based analysis. This influence was stronger
in eastern than in western Finland.
Eastern elements have earlier been detected in the Finns, most conspicuously in the very high frequency of the Y-chromosomal haplogroup
N3 (Guglielmino et al. 1990, Lahermo et al. 1999, Meinilä et al. 2001,
Lappalainen et al. 2006; see also section 5.3.2 of the Literature). In our
data, this effect appeared to be much smaller. The difference between the
Y chromosome and the autosomes may result from random factors (like
stronger genetic drift in the Y chromosome), or it may signal a male bias in
related population-history events. Moreover, it can reflect the Far Eastern
reference population used; comparisons with eastern Finno-Ugric or other
eastern European or northern Asian reference populations may have shown
larger effects. Unfortunately, very few eastern reference populations were
publicly available at the time of the original analysis. Interestingly, signs of
contacts between Volga-Ural populations and East Asians (Japanese) have
been observed in X-chromosomal markers (Laan et al. 2005).
While another eastern reference population, the Russians, also showed
an increased affinity to East Asians, no such increase was apparent between
the Russians and the (eastern) Finns. Provided that the Russian samples
(from Vologda) are a representative reference, this suggests that Finland and
Russia have separate histories of the admixtures occurring after the eastern
influence. The eastern influence in Finland would thus predate the arrival
of ancestors of Russians in their current areas, which occurred in the end
of the first millennium AD. In view of archaeological knowledge of ample
contacts from Finland towards the east, possible timings for the influence
still abound, especially as the varying degree of eastern influences within
Finland may stem from later phenomena. Furthermore, some part of the
influence in Finns could result from contacts with the Saami who are known
to harbor a relatively high frequency of eastern elements (Guglielmino et al.
1990, Johansson et al. 2008, Huyghe et al. 2011).
Earlier studies have provided estimates of the degree of eastern influence
in the Finnish gene pool (Nevanlinna 1984, Guglielmino et al. 1990), but we
92
Discussion
did not attempt to quantify it. It seemed superfluous given the limited relevance of the available reference population: a comparison with East Asians
would likely produce only a minimum estimate of the percentage of eastern
influence (see above), and it would be very difficult to know how seriously
this would underestimate the “true” proportion. Additionally, such estimates
can be highly dependent on the method and software used for the inference
(for an example of Admixture vs. Structure, see Figure 1 in Alexander et al.
2009). Moreover, regardless of how carefully the researchers themselves
state the factors limiting the validity of their estimates, the numbers often
tend to get cited without any such disclaimers.
2.2.2.East-west difference
In terms of the difference between eastern and western Finland, the results
of this study are compatible with the duality of the Y chromosome (Kittles
et al. 1998, Raitio et al. 2001, Hedman et al. 2004, Lappalainen et al. 2006,
Palo et al. 2007, 2008) but discordant with the homogeneous pattern of
mtDNA (Vilkki et al. 1988, Hedman et al. 2007). They also agree with the
results of later genome-wide analyses (Jakkula et al. 2008, McEvoy et al.
2009), and with the distributions of many of the Finnish disease heritage
(FDH) diseases (Norio 2003a).
The differences between the population histories of the two regions
are many. They start from the different direction of predominant contact
already in ancient times, and extend to the much shorter agricultural history
in the east which involves events of extreme genetic drift (see section 5.1 in
the Literature). The drift-related phenomena that have taken place during
Finnish population history are the standard explanation for the existence
and distribution of FDH diseases. However, these diseases are caused by
extremely rare alleles which are prone to drift, whereas drift may not be the
only or main mechanism playing a role in the distribution of more common
variants. Admittedly, the common variants have not been immune to the
drift either, as demonstrated by the reduced genetic diversity detected in
eastern Finland in this and other studies (Raitio et al. 2001, Hedman et al.
2004, Lappalainen et al. 2006, Service et al. 2006, Jakkula et al. 2008).
Nevertheless, the earlier events of population history, especially the higher
degree of European (mainly Baltic and Scandinavian) influence in western
Finland, may also serve to explain a part of the east-west difference observed.
Although the genetic difference between eastern and western Finland
was large, it is impossible to say how sharp a genetic border exists between
the areas, because of the sampling gap separating the eastern and western
samples in our data. Later studies with differing sampling distributions
have not suggested the existence of a strict border (Jakkula et al. 2008,
McEvoy et al. 2009). In fact, it is likely that the difference is clinal: even if
Discussion
93
the original phenomena producing the difference would have been strictly
limited geographically, later developments such as migration and marriage
across the (hypothetical) border would probably have blurred it. Nevertheless,
the cline would be exceptionally steep, because the difference between the
sampled areas is much larger than usually observed across similar distances
for example in central Europe (Heath et al. 2008, Nelis et al. 2009).
Interestingly, while eastern and western Finns differed clearly in the FST
and allele frequency analyses, their IBS similarity was much higher than
the other measures would have led one to expect. This illustrates the fact
that the sensitivity of these measures for various population genetic forces
can differ. For example genetic drift, which has been powerful in eastern
Finland, may be more strongly reflected in the SNP-based results, whereas
the shared history of eastern and western Finland may have led to allele
frequency combinations that do not produce equally marked differences in
the individual-based IBS measures (cf. Figure S4 in Study III).
The genetic substructure observed within Finland can obviously pose a
problem for association studies, unless the substructure is accounted for.
Naturally, methods for stratification correction may in part reduce spurious
associations in datasets where the geographic distributions of cases and
controls differ. Minimizing such differences already during study design
(prior to genotyping) is nevertheless advisable. Whenever possible, efforts
to collect Finnish samples should record information on parental or preferably grandparental origin, and in association studies, cases and controls
should match at least on the level of their east/west ancestry. Moreover,
because the genetic differences within Finland were larger than between
many neighboring countries in central Europe (Heath et al. 2008, Nelis et al.
2009), in a wider context our results also emphasize how the geographical
units relevant to association analysis sampling are deducible neither from
national borders nor from perceived cultural homogeneity.
2.2.3.Genetic vs. archaeological view of Finnish population history
The current archaeology-based view of Finnish population history supports
general continuity since the early settlers and more or less frequent smallscale contact with neighboring populations rather than a few unequivocally
detectable migration waves. This seems to conflict with the views frequently
cited by geneticists that the Finnish population derives from major settlement
events 2000 and 4000 years ago. Direct time estimates based on genetic
data appear scarce, however, and the existing ones, because of their wide
confidence intervals, may be associated with several archaeological events.
Thus, they can hardly be used to argue for specific timepoints unsupported
by non-genetic evidence.
94
Discussion
Meanwhile, some indirect evidence of population age can perhaps be
derived from calculations made in the course of FDH studies. In these
calculations, the Finnish population has typically been assumed to be 2000
years old. Considering that this assumption more resembles the long-abandoned theory of an Iron Age population replacement than it does the current
archaeological view of no major discontinuity, what may appear surprising
is that the calculations have been very successful in predicting the location of disease genes. However, it is worth noting that such calculations
correspond to forward-time population simulations in the sense that a fit
to the observations does not automatically support the correctness of the
parameters employed. Indeed, to my knowledge, no systematic studies exist
on other combinations of population history parameters that could have
yielded equally accurate predictions. Besides, the FDH alleles are rare, and
the diseases are only a handful, and therefore may not form a representative
picture of the forces that have affected the Finnish gene pool as a whole.
Moreover, even when precise and representative genetic timings exist,
they do not necessarily equal the presence of migration waves. Rather, they
can detect a reduction in population size or the start of a population expansion. These could of course reflect a migration or founder event, but could
equally well result from a temporary reduction in the local population (a
bottleneck), compensated for by local population growth rather than gene
flow. In an extreme scenario, a population may go through a strong bottleneck
in the absence of gene flow. As a result, the genes of the post-bottleneck
population may have resided in the area for millennia, and still appear
much younger. Conversely, if a migration involves relatively large numbers
of individuals, the resulting genetic timings may greatly antedate the actual
migration event. Furthermore, genetic timing in itself does not directly reveal
the geographic location of a genetic event.
Such limitations of the genetic timing methods may appear obvious to
a geneticist. They are often worth stating explicitly, however, when dealing
with questions on Finnish population history that stimulate multidisciplinary
interest. Otherwise, the comparisons of results from different disciplines
may exhibit discrepancies that are, in fact, largely illusory.
2.3.Sweden
Although the overall genetic structure detected in Sweden was clinal, a
difference was visible between southern and northern Sweden, and a substructure existed within the north. This agrees with earlier findings (Karlsson
et al. 2006, Einarsdottir et al. 2007, Lappalainen et al. 2009) as well as
with a later genome-wide study (Humphreys et al. 2011). Interestingly, a
pronounced genetic substructure exists in northern Finland (Jakkula et al.
2008), suggesting the involvement of similar population history processes
Discussion
95
in both areas. In southern and central Sweden (Svealand and Götaland), the
population structure was clinal but subtle. The individuals were genetically
relatively close to central Europeans, which is in congruence with previous
results (see section 6.3.1 of the Literature for references). Unfortunately, the
Svealand provinces of Värmland and Dalarna which have shown interesting
features in several other studies (Karlsson et al. 2006, Lappalainen et al.
2009, Humphreys et al. 2011) could not be analyzed in detail because of
small sample size (n = 21).
The difference between northern and southern Sweden is important
from an association analysis perspective. While it is of the magnitude that
should be possible to correct for, the difference might become problematic
if genome-wide information is unavailable, for example in candidate gene or
replication studies. Meanwhile, any sampling done within southern Sweden
requires less rigorous matching. Obviously, obtaining ancestry information
would still be important in order to exclude recent northern migrants. A more
thorough analysis of the effects of the Swedish substructure on association
studies has been reported elsewhere (Humphreys et al. 2011).
Norrland differed from the rest of Sweden in much the same manner as
did eastern Finland from western Finland. Interestingly, Norrland did not
exhibit the reduced genetic diversity which was very apparent in eastern
Finland and which would be characteristic of a small and isolated population
affected by genetic drift. Of course, drift-related phenomena may have had
less extreme an effect in Norrland where the population is older, and has
possibly been larger and more constant in size. Additionally, the presence
of several subpopulations – as detected in this and other studies (Gustavson
& Holmgren 1991, Einarsdottir et al. 2007 and references therein) – may
increase the overall diversity if the subpopulations have independent histories of genetic drift. Furthermore, the combination of normal diversity
with pronounced LD suggests the possible involvement of admixture in
simultaneously increasing both genetic variation and genetic distance from
southern Sweden. Admixture obviously is not a far-fetched idea in the light of
the known population history of Norrland, with its contact and assimilation
with Finns and Saami (see section 6.3.2 in the Literature).
A trace of the increased eastern contribution observed in (eastern) Finland
was also visible in northern and central Sweden (i.e., relative to southern
Sweden). The origin and age of these elements remain open. In line with the
relatively high frequencies of the Y-chromosomal haplogroup N3 in these
areas (Karlsson et al. 2006, Lappalainen et al. 2009), they may reflect relatively recent contacts with the Finns (although in our dataset, these appeared
to be rather limited in the north, cf. section 2.1) or Saami, or they may stem
from ancient contacts, possibly with the same people who originally brought
the influence to Finland. Later analyses of the fine-scale geographical distri-
96
Discussion
bution of these elements (for instance of their enrichment in the northern
or coastal areas) could possibly shed more light on their origin.
In Study III of this thesis, Swedish reference individuals demonstrated an
interesting spread in the second dimension of the MDS plot. We suspected
that this was an indication of population substructure within Sweden, but
lacking detailed geographic information, we could not study this directly.
In an indirect analysis, we examined the original IBS distances between
the Swedes, i.e., the input data of the MDS. Since we found no notable differences between the IBS distribution within the Swedish population and
distributions within most other populations, we originally concluded that
the spread pattern was probably some internal artefact of the MDS calculation. Later, however, in the analyses of Study IV, the Swedish individuals
lacking geographical information behaved very similarly to the individuals
representing northern Sweden. It thus appears that the former individuals
may originate predominantly from the northern parts of Sweden and that
the MDS spread in Study III already reflected the population substructure
which was later detectable in Study IV. This underlines the ability of MDS
to discern the small portion of total genetic variation between individuals
that actually represents a geographic (or other systematic) pattern.
2.4. Limitations of the results
The representativeness of subjects is crucial in population genetics. This
study used geographical (ancestry) information rather than ethnicity, with
the exception that native language served as an additional classification criterion for the Swedish-speaking Ostrobothnians. The datasets had differing
ascertainment criteria, which limits the comparisons between them (see
section 2.1). Especially the Swedish datasets, with no ancestry information,
may contain some recent migrants. Indeed, the analyses revealed several
individuals potentially with an immigrant background. Furthermore, (sub)
populations in the data were defined by geography rather than by interbreeding. However, any major discrepancy between the classifications based on
administrative provinces and the actual genetic structure of the population
should have manifested itself in the individual-based analyses.
Gaps in the geographic distribution of samples restricted the conclusions
in two locations where large genetic differences were seen, namely between
eastern and western Finland and between Sweden and Finland in the north.
However, neither of these differences should be an artefact caused by the
sampling gap alone. In the former case, more-continuous sampling would
yield information on the exact location and clinality of the difference (see
section 2.2.2), but genetic distances between the currently sampled areas
would obviously remain the same. In the latter case, the country border also
coincides with a boundary between two datasets genotyped with different
Discussion
97
arrays in separate laboratories. The difference observed unlikely results
merely from a dataset difference, because the Swedish samples actually
showed smaller genetic distances from other samples of the Finnish dataset
(see section 2.1).
In many of the analyses, small sample size can limit resolution. This was
the case at least to some degree in eastern and western Finland and in northern Sweden: larger sample sizes may have revealed fine-scale population
substructures with higher precision. Within southern Sweden, on the other
hand, the fact that we observed no strong population structure should not
be due to immediate limitations by sample size, except in a few provinces.
Of the reference populations, the Russian sample was very small, which may
have affected its behavior in the Structure and MDS analyses. However, its
small size should not have affected the IBS distributions that underlie the
relative timing of the eastern influence in Finland. Another question is of
course how representative is the Russian sample of the Russian population
as a whole, because it originated from only one area. The same applies to
the German reference data that originated from one small area in northern
Germany and are thus unlikely to reflect the whole variation in the population.
In the presence of a fine-scale population substructure, like the one seen
in Finland (Nevanlinna 1972), the sampling distribution may substantially
affect the results. For example, samples from very limited areas could differ
more than do wider samples from the same regions (Figure 17). In our study,
however, the size of the east-west difference should be largely unaffected
by such phenomena, because our Finnish samples represent relatively wide
geographic areas. In fact, in comparisons of the east-west difference with
the genetic distance between Germans and British, it was perhaps the latter
that was more likely inflated, due to the limited geographic coverage of the
German sample (see above).
regional allele frequency
0.10 0.70 0.20
0.70 0.10 0.20
0.25 0.25 0.25 0.25
0.25 0.25 0.25 0.25
Figure 17. Effect of geographic sampling scale on observed population structure. When
frequencies of alleles (colored circles) vary locally, samples from small areas (orange
rectangles) may differ more from each other than do samples representing larger areas
(violet squares).
98
Discussion
Apart from the LD calculations, the analyses of this study use single SNPs,
and in the case of genome-wide datasets, the results were typically averaged
across the genome. This is a valid approach, but it may mask some interesting
patterns of variation between genome areas. Furthermore, haplotype-based
analyses could detect for instance migrational patterns more clearly than the
single-SNP analyses can. Multimarker methods could also allow the timing of
population events, whereas here the observations have been connected with
population history phenomena only at the level of discussion. Unfortunately,
few if any such methods were available at the time of our original analyses,
and even now, many of the tools available are designed for use in more
diverged populations, for example on a continental scale.
Discussion
99
3. Subpopulation difference scanning
(SDS) approach
3.1.The idea
In gene mapping, population stratification (i.e., population structure) can be
a nuisance because it can produce spurious results in an association analysis.
In contrast, other gene-mapping methods may be indifferent to population
structure or may even benefit from it. This thesis introduces and theoretically
tests a new mapping approach, subpopulation difference scanning (SDS),
which directly utilizes population substructure to localize genes for diseases
that differ in incidence between otherwise closely related subpopulations.
The underlying idea of the SDS approach is that when subpopulations
diverge from each other by genetic drift (including founder effects), their
allele frequencies will fluctuate, and varying amounts of frequency difference between the subpopulations will accumulate at the loci. A frequency
difference gained by a disease-predisposing locus will lead, other things
being equal, to differing incidences of the disease in the subpopulations
(Figure 18). Conversely, a locus with no frequency difference cannot cause
an incidence difference. Thus, loci underlying incidence differences could
possibly be located by studying subpopulation frequency differences and by
excluding from further analyses any genome areas that do not show a large
enough difference between subpopulations.
penetrance
other risks
carriers
non-carriers
carriers
non-carriers
Figure 18. Disease incidence in two subpopulations. The subpopulations (large squares)
have a different frequency of carriers of a risk variant and therefore different proportions
of healthy (white) and diseased (gray) individuals, even though penetrance of the risk
variant and background risk for the disease in the subpopulations are equal. Note that
the difference in disease incidence between the subpopulations is the carrier frequency
difference times the additional risk caused by the variant. Adapted from Salmela et al. J
Med Genet 43: 590-597 (2006) with permission.
100
Discussion
3.2.Theoretical testing, advantages, and limitations
We assessed the theoretical feasibility of this approach in the Finnish population, where differences between the eastern and western parts of the country
exist both in genetic markers and in cardiovascular disease incidences (see
sections 5.3.4 and 5.3.6 of the Literature). The extent of genetic differences
was examined from a small set of autosomal STR markers, and population
simulations resembling the Finnish population history allowed estimation
of the sample sizes and genotyping densities necessary for successful gene
localization. The simulations that produced subpopulation frequency differences similar to those observed suggested that a 90% exclusion of the
genome could be achieved at a 20% risk of excluding the true gene with
use of a 1-cM genotyping density and samples of 100 to 200 individuals
from the subpopulations, provided that the gene would have a frequency
difference of at least 0.20.
In general, the SDS approach will work best for diseases with large
incidence differences – or, to be more precise, for diseases in which a large
absolute incidence difference results from a single gene. Furthermore, the
smaller the effect that this gene has on disease risk, the easier it will be to
locate. This may seem counterintuitive, bearing in mind that for both linkage
and association analysis the trend is the opposite. However, it makes sense
when remembering that the incidence difference is the product of the risk
difference and the gene frequency difference (Figure 18). Thus, for a given
incidence difference, a small risk difference will lead to a frequency difference
that is large and therefore easier to detect. (How likely it is for an incidence
difference to result from an allele with a small rather than a large effect is
of course a different question, the answer to which depends on the extent
of genetic drift.) Conversely, even large incidence differences caused by
single genes can be undetectable by SDS if the incidence difference is due to
a large risk instead of a large frequency difference. It is therefore advisable
to test the subpopulation frequencies of known genetic risk factors before
proceeding to a full-scale SDS effort.
The target populations of SDS are limited to those in which the differences between the subpopulations are moderate. In populations with large
differences, the gene’s frequency difference would not likely be among the
largest, whereas in subtly differing subpopulations, the accurate detection
of differences would require large sample sizes. It seems that the east-west
differences within Finland fall into the intermediate range. However, the
Finnish population does not fully conform to the assumptions of the theoretical analysis in another respect: the simulations assume that the subpopulation differences stem purely from genetic drift, whereas the Finnish
population also harbors a small, geographically variable eastern influence.
We have not tested how this might affect SDS dynamics.
Discussion
101
Above, we have formulated and tested the SDS approach in the context of
two subpopulations, but it could be extended to more than two. That would
be a simple way to enhance the exclusion of genome areas, because the
further analyses could be limited to loci whose frequency difference pattern
is compatible with the incidence pattern in all of the subpopulations. Even
then, many loci that are unrelated to the phenotype of interest will differ
because of genetic drift alone. The number and percentage of false positives
will therefore be considerably larger than in a typical association scan. It
is thus essential to confirm the findings with, for example, case-control
association analyses of selected markers. Alternatively, the confirmation
could rely on existing GWAS data from the same or a closely related population. In fact, combining frequency difference information with GWAS
data would provide a means of detecting a wider range of causal genes (in
terms of their combination of frequency difference and risk) than would
the SDS approach alone.
SDS looks only at subpopulation allele frequencies and not at the distribution of those alleles among healthy and diseased individuals. It is therefore
independent of the decay of LD between the disease gene and its nearby
markers after the frequency differences have formed. Most of the differences
will arise from the heavy genetic drift during the early generations when LD
is still strong, and a relatively sparse marker set will suffice to detect them.
Nevertheless, the estimate of necessary genotyping density in the original
study (1 cM) may be overly optimistic: the simulations did not account for
the fact that not all markers will be equally informative for detecting the
genetic drift. However, this factor should be amply compensated for by the
current marker densities of genome-wide arrays. The immediate flip side of
the strong initial LD is that the SDS resolution will remain limited. Notably,
the essential LD in the present population may be much weaker, and therefore the marker density needed for the follow-up association analyses higher.
The original analyses of SDS performance were conducted before the
GWAS era, and many of the aspects used to advertise SDS in the original
publication have not turned out to be major obstacles in the GWASes. Several
of the advantages of SDS nevertheless remain. Population sampling may
be easier and cheaper than the laborious phenotyping needed to collect
association study cases. Naturally, if GWAS control data with appropriate
geographical information exist, they will be of immediate use. Additionally,
once such a genotyping has been made, the results will be valid for any phenotype that shows the same geographic pattern, in contrast to the collections
of cases for association studies that are strictly phenotype-specific.
102
Discussion
3.3. Analyses of genome-wide population data
Since the original SDS analyses, we have genotyped a genome-wide set of
SNPs in Finns who represent the areas of high and low acute myocardial
infarction (AMI) incidence. These data can be compared to some of the
original analysis assumptions. Firstly, in the SNP dataset, the high- and
low-incidence areas had a slightly smaller allele frequency difference than
in the STR data. With a shift in this direction, the performance of SDS
should improve.
Secondly, the SNP data can shed light on the number of loci that may
contribute to the incidence difference. Our functional enrichment analysis
suggested that relatively many of the SNPs with a large frequency difference
are related to cardiovascular functions. Together, such loci might explain a
substantial portion of the incidence difference. This could violate the basic
requirement of SDS that a large incidence difference should be caused by
a single gene.
On the other hand, several SNPs that have been associated with cardiovascular phenotypes in GWASes resided in genome areas that harbored very
large frequency differences. These areas and GWAS SNPs would therefore
be among those that an SDS analysis would select for further inspection.
Naturally, this evidence of the performance of SDS is indirect, and the feasibility of the approach still needs to be demonstrated by other means, for
example in a systematic case-control association testing of the most widely
differing SNPs.
Discussion
103
Concluding remarks and future
prospects
In the past few years, genome-wide SNP datasets have firmly established
themselves as a suitable and useful material for population genetic analyses.
In this thesis, such data served for the first time in a study of population
structure within Finland and Sweden. Additionally, the thesis introduced
a novel gene mapping approach, subpopulation difference scanning (SDS).
The results paint a picture of two northern European populations that
have long been characterized by small population sizes and relatively limited
contact with neighboring populations. Through genetic drift, these factors
have led to distinct genetic features and reduced diversity. Specifically,
northern Sweden differed genetically from the rest of Sweden, in congruence
with its history of small breeding units on the one hand and an admixture
between Swedes, Finns, and Saami on the other. In Finland, a predominantly European gene pool was dotted with signs of an eastern influence,
possibly of ancient origin. The genetic difference between the eastern and
western parts of Finland was striking, albeit in line with long-standing
regional differences in the extent of contact with neighboring populations,
as well as with more recent features of the history of the eastern Finnish
settlement. These east-west differences could serve in mapping genes for
diseases whose incidences also vary within the country, but the feasibility
of this SDS approach remains to be proven in practice by testing the most
differing loci in an association setting.
It can be argued that in populations studied earlier, analyses of genomewide SNP datasets have rarely revealed population patterns that would have
been previously unseen or unexpected. The analyses have, however, yielded
information on the strength of such patterns on a genome-wide level, i.e.,
in the parts of the genome directly relevant to the majority of phenotypes.
This study, for example, showed that an east-west difference within Finland,
previously known from the Y chromosome but not from mtDNA, exists
also in the autosomal markers. This demonstrates that this difference is
important to acknowledge in association studies. Indeed, genome-wide
datasets can effectively complement the frequently employed mtDNA and
Y-chromosomal markers, which may be prone to overstate the importance
of some population history events, especially of sex-specific or drift-related
phenomena.
In the future, our understanding of the genetic history of human populations will undoubtedly increase, not least because of methodological and
technical advances. The emergence of new datasets will enable more com-
104
Concluding remarks and future prospects
prehensive analyses. In the case of Finland and Sweden, reference data from
various populations in northern Eurasia would be of special interest. New
computational methods specifically developed for genome-wide SNP data
can enhance their potential for inference and timing of demographic events
such as migrations; in our datasets, such methods could obviously refine
many of the current interpretations. The increasing use of sequencing will
provide an unprecedented source of data also for population history studies,
and help to circumvent some of the shortcomings of the SNP arrays, for
example, the biases of SNP selection. Meanwhile, with constantly increasing
sampling densities, the need for methods that detect and quantify clinal
structures within continuous populations instead of measuring differences
between predefined population units will become ever more pressing.
Concluding remarks and future prospects
105
Acknowledgements
This study was carried out in the Finnish Genome Center (currently the
Technology Center of the Institute for Molecular Medicine Finland (FIMM)),
at the Department of Medical Genetics, and in the Molecular Medicine
Research Program of the Research Programs Unit in the University of
Helsinki. I thank the current and previous directors of these institutes for
providing excellent research facilities for this study. My work was funded
by the University of Helsinki, Graduate School in Computational Biology,
Bioinformatics, and Biometry (ComBi), and a grant from the Sigrid Juselius
Foundation through Professor Juha Kere. Other main supporters of the
project have been the Academy of Finland, the Emil Aaltonen foundation, the Finnish Cultural Foundation and the Swedish Research Council.
Detailed information on further sources of funding appears in the respective
publications.
I thank my supervisors, Juha Kere and Päivi Lahermo, for their trust and
support throughout this project. You have formed a perfect combination of
inexhaustible optimism and ideas with down-to-earth realism on how things
don’t always – or ever – go as planned and how everything tends to take
longer than expected (and then some more). If there is anything I would
want to criticize, then it would be your incredible patience. With a bit less
of it, I might have had to finish on somewhat closer to a standard schedule…
I thank my former Professor, Marja-Liisa Savontaus, for introducing me to
my supervisors – and to human genetics, for that matter. Unfortunately the
spatiotemporal overlap of our contributions in the project has been smaller
than I would have wished; I would have had a lot to learn from you.
I am extremely grateful to my coauthor and closest collaborator Tuuli
Lappalainen. We started by doing science on Skype (a practice which approximately doubled my typing speed) when our offices were separated by three
floors, and our collaboration grew tighter than our feet could bear. Later we
actually shared an office and a countless number of variably scientific but
invariably long and fulfilling face-to-face discussions that I still miss. Even
after that glass ceiling in our office proved unable to stop you, the former
means of communication has been sustained, this time across 2000 kilometers, though undoubtedly you would often have had more urgent things
to do. Meanwhile, you have taught me a lot about science and how to do
it, and it is hard for me to even imagine what would have ever become of
this thesis project (let alone when!) without your contribution, enthusiasm,
efficiency, determination, ideas, and the incredible number of working hours
you invested in our papers.
106
Acknowledgements
Of my other coauthors, I thank Ulf Hannelius for adopting “my” microsatellite dataset and giving it a role in a larger context – and letting me use
the result in my thesis. I hope it was a win-win deal. I acknowledge the
late Olli Taskinen for creating the fanciest figure of my first-ever paper; his
untimely death was a great loss. I thank my coauthors who provided the
samples and datasets without which this study would have been impossible,
in particular Pertti Sistonen, whose collection of Finnish subjects played a
role in all papers of this thesis, and Per Hall on whose Swedish dataset the
last paper stands. Naturally, I also thank all the sample-donors in those
datasets – population simulations may be fancy, but they will take one only
so far. I am grateful to all my coauthors for their contributions, comments
and advice, insight, and for providing me views on different ways of doing
successful science.
I also warmly thank the reviewers of this thesis, Professors Jaakko Ignatius
and Marjo-Riitta Järvelin, for their constructive and valuable comments.
My heartfelt thanks go to the IT experts at FIMM/GIU for fixing my
computer countless numbers of times in countless numbers of ways, to
the staff at the Finnish Genome Center, especially Timo Miettinen, Riitta
Koskinen, Anne Leinonen, Jouko Siro, Kari Tuomainen, and Kyösti Sutinen,
for their help on various matters great and small, to laboratory technicians
Sirkku Ekström, Anu Yliperttula, Päivi Hantula, Anitta Hottinen, Ahmed
Mohammed, Eira Leinonen, Auli Saarinen, Riitta Känkänen, and Riitta
Lehtinen for their skillful assistance in the laboratory, to Anu Puomila for her
invaluable help with the Finnish samples, to Ingegerd Fransson and Marika
Rönnholm for their superb contribution in the Affymetrix genotyping, and
to David Brodin at the Bioinformatics and Expression Analysis facility at
Karolinska Institutet for much advice on the wonders of Affymetrix genotype
calling and export.
I am grateful to Carol Norris for teaching me more than half of what I
know about scientific writing (and an unknown portion of what I’ve already
forgotten), and to Sami Kilpinen and Henrik Edgren for teaching me programming in R and Perl which have proved more than useful in this work.
I thank Päivi Onkamo, Outi Vesakoski, and Jaakko Häkkinen for pointing
out several interesting references to me. I am grateful to the Finnish daycare
system and its main incarnations Marja and Leea for providing me the luxury
of thinking more than one scientific thought consecutively.
To those archaeologists and linguists whose results I may have misunderstood or ignorantly overlooked, I offer my sincere apologies. Although
as a kid I wanted to become an archaeologist, and as a teenager a linguist,
studying human genetics has not miraculously turned me into either. (The
same apologies of course apply to any omission of results from my fellow
geneticists, but there I have many fewer excuses.) Special thanks to those of
Acknowledgements
107
you who have, throughout the years, taken the trouble of correcting some
of my misconceptions.
I thank the former and current members of Juha’s research group, including but not limited to Outi, Sini, Dario, Emma, Lena, Kata, Tiina, Nina,
Riitta K., Riitta L., Lotta, Eira, Satu M., Satu W., Mari M., Mari T., Ville,
Anna, Auli, Päivi A., Päivi S., Annika, Lilli, Inkeri, Johanna, and Yan. In the
variable combinations of office mates and lunch company, you have formed
my primary peer group and scientific social circle, supporting me through
the oddities of (academic) life. Our projects rarely overlapped in mission or
methodology, but the ups and downs of doing and publishing science are
largely in common, and you have always been there to listen to me (which, as
we all know, takes a lot of time). The same applies to the folks at the Finnish
Genome Center, especially Anna and Riitta with whom I shared an office for
many years, as well as Anna, Päivi T. and Timo – I think we had quite an
effective and comforting “valituspiiri,” until too many of us got our theses
ready and no longer had anything to complain of… Additionally, I thank Päivi
Saavalainen and her celiac-disease group for letting me sit in their office last
winter; even more vital to my mood and motivation than the seat by a window
to the outside world was the company of Beta, Amarjit, and Dawit. Special
thanks to Beta for sharing your insight into the inner workings of Haploview
etc., and into various other issues related and unrelated to science – and
for occasionally making my day by speaking a word or two of Icelandic. So,
thanks, everyone – science wouldn’t be half as much fun without you guys!
I thank my friends who have lent me their ears (and occasionally also
shoulders) in the smaller and larger storms that life includes, academically
and in general. Mysteriously, some of you are still here despite the non-returned calls, unreplied-to emails, and my general negligence in social obligations during the years and especially this last spring. So thank you Riikka,
Johku, Hannu, Silvie, Yelena, Madhu, Jeremy, HJ, Outi, Inari, Biijá. And a
separate thanks commonly to all my dancing friends for the dizzying polskas,
breathtaking sottiisis and a few very divine hambos of these years – and for
the combination of good company and bad jokes, as the definition has it.
I thank my parents for their love and support, and for the curiosity they
gave me – I don’t know whether its transmission is genetic, epigenetic, or
environmental, but I do know it comes pretty handy in science. I thank my
son Otso for generously letting something as mundane as thesis writing
rob him of so many effective hours that could instead have been devoted
to building Legos and playing Kimble, and for providing such a wonderful
counterbalance to science by exactly such activities. Thesis or no thesis, you
are by far my most important achievement in life, and the amount of joy you
have brought me is immense. Last but not least, I thank my husband Jouni. It
is not a small number of computer problems you have fixed, short programs
you have written, and technical tips you have given for my work throughout
108
Acknowledgements
these years. More importantly, when I stayed at work until 6, 8, or 10 PM (in
February, March, and April, respectively), delving into the secrets of Uralic
linguistics or R plotting, it was you who loaded the dishwasher, filled the
fridge, and watched the kid – for the main reward of a few minutes in the
company of a thoroughly stressed, sleep-deprived, and grumpy me. I know
it takes a very special person to tolerate that – and that’s who you are. Eihän
tästä kaikesta olisi tullut mitään ilman sinua.
In summary: I thank you all! :-)
Helsinki, August 2012
Acknowledgements
109
References
1000 Genomes Project Consortium. A map of human genome variation from population-scale
sequencing. Nature. 2010 Oct 28;467(7319):1061-73.
Aalto-Setälä K, Viikari J, Åkerblom HK, Kuusela V, Kontula K. DNA polymorphisms of the
apolipoprotein B and A-I/C-III genes are associated with variations of serum low density
lipoprotein cholesterol level in childhood. J Lipid Res. 1991 Sep;32(9):1477-87.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated
individuals. Genome Res. 2009 Sep;19(9):1655-64.
Amaral PP, Dinger ME, Mercer TR, Mattick JS. The eukaryotic genome as an RNA machine.
Science. 2008 Mar 28;319(5871):1787-9.
Analitis A, Katsouyanni K, Biggeri A, Baccini M, Forsberg B, Bisanti L, Kirchmayer U, Ballester
F, Cadum E, Goodman PG, Hojs A, Sunyer J, Tiittanen P, Michelozzi P. Effects of cold
weather on mortality: results from 15 European cities within the PHEWE project. Am J
Epidemiol. 2008 Dec 15;168(12):1397-408.
Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, Eperon IC, Nierlich
DP, Roe BA, Sanger F, Schreier PH, Smith AJ, Staden R, Young IG. Sequence and organization of the human mitochondrial genome. Nature. 1981 Apr 9;290(5806):457-65.
Arking DE, Junttila MJ, Goyette P, Huertas-Vazquez A, Eijgelsheim M, Blom MT, Newton-Cheh
C, Reinier K, Teodorescu C, Uy-Evanado A, Carter-Monroe N, Kaikkonen KS, Kortelainen
ML, Boucher G, Lagacé C, Moes A, Zhao X, Kolodgie F, Rivadeneira F, Hofman A, Witteman
JC, Uitterlinden AG, Marsman RF, Pazoki R, Bardai A, Koster RW, Dehghan A, Hwang
SJ, Bhatnagar P, Post W, Hilton G, Prineas RJ, Li M, Köttgen A, Ehret G, Boerwinkle
E, Coresh J, Kao WH, Psaty BM, Tomaselli GF, Sotoodehnia N, Siscovick DS, Burke GL,
Marbán E, Spooner PM, Cupples LA, Jui J, Gunson K, Kesäniemi YA, Wilde AA, Tardif
JC, O’Donnell CJ, Bezzina CR, Virmani R, Stricker BH, Tan HL, Albert CM, Chakravarti A,
Rioux JD, Huikuri HV, Chugh SS. Identification of a sudden cardiac death susceptibility
locus at 2q24.2 through genome-wide association in European ancestry individuals. PLoS
Genet. 2011 Jun;7(6):e1002158.
Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R library for genome-wide
association analysis. Bioinformatics. 2007 May 15;23(10):1294-6.
Aulchenko YS, Struchalin MV, Belonogova NM, Axenovich TI, Weedon MN, Hofman A,
Uitterlinden AG, Kayser M, Oostra BA, van Duijn CM, Janssens AC, Borodin PM. Predicting
human height by Victorian and genomic methods. Eur J Hum Genet. 2009 Aug;17(8):1070-5.
Bamshad MJ, Wooding S, Watkins WS, Ostler CT, Batzer MA, Jorde LB. Human population genetic structure and inference of group membership. Am J Hum Genet. 2003
Mar;72(3):578-89.
Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J. Exome
sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011 Sep
27;12(11):745-55.
Barbujani G, Sokal RR. Zones of sharp genetic change in Europe are also linguistic boundaries.
Proc Natl Acad Sci U S A. 1990 Mar;87(5):1816-9.
110
References
Barbujani G, Magagni A, Minch E, Cavalli-Sforza LL. An apportionment of human DNA
diversity. Proc Natl Acad Sci U S A. 1997 Apr 29;94(9):4516-9.
Barnes M. Languages and ethnic groups. In: Helle K (ed). The Cambridge history of Scandinavia.
Volume I: Prehistory to 1520. Cambridge University Press, Cambridge. 2003; 94-102.
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype
maps. Bioinformatics. 2005 Jan 15;21(2):263-5.
Bauchet M, McEvoy B, Pearson LN, Quillen EE, Sarkisian T, Hovhannesyan K, Deka R, Bradley
DG, Shriver MD. Measuring European population stratification with microarray genotype
data. Am J Hum Genet. 2007 May;80(5):948-56.
Baudou E. Forntiden fram till 1000-talet. In: Larsson HA (ed). Boken om Sveriges historia.
Forum, Stockholm. 1999; 9-46.
Beckman L, Sikström C, Mikelsaar AV, Krumina A, Ambrasiene D, Kucinskas V, Beckman G.
Transferrin variants as markers of migrations and admixture between populations in the
Baltic Sea region. Hum Hered. 1998 Jul-Aug;48(4):185-91.
Benedictow OJ. Demographic conditions. In: Helle K (ed). The Cambridge history of
Scandinavia. Volume I: Prehistory to 1520. Cambridge University Press, Cambridge.
2003; 237-49.
Bergman G. Kortfattad svensk språkhistoria. Norstedt, Stockholm. 2003; 255 p.
Bittles AH, Egerbladh I. The influence of past endogamy and consanguinity on genetic disorders
in northern Sweden. Ann Hum Genet. 2005 Sep;69(Pt 5):549-58.
Burstedt MS, Sandgren O, Holmgren G, Forsman-Semb K. Bothnia dystrophy caused by
mutations in the cellular retinaldehyde-binding protein gene (RLBP1) on chromosome
15q26. Invest Ophthalmol Vis Sci. 1999 Apr;40(5):995-1000.
Butler JM. Genetics and genomics of core short tandem repeat loci used in human identity
testing. J Forensic Sci. 2006 Mar;51(2):253-65.
C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform
for investigating biology. Science. 1998 Dec 11;282(5396):2012-8.
Carpelan C. Käännekohtia Suomen esihistoriassa aikavälillä 5100...1000 eKr. In: Fogelberg P
(ed). Pohjan poluilla: suomalaisten juuret nykytutkimuksen mukaan. Bidrag till kännedom
av Finlands natur och folk 153. Finska Vetenskaps-Societeten, Helsinki. 1999; 249-80.
Cavalli-Sforza LL, Menozzi P, Piazza A. The history and geography of human genes. Princeton
University Press, Princeton. 1994; 518 p.
Clamp M, Fry B, Kamal M, Xie X, Cuff J, Lin MF, Kellis M, Lindblad-Toh K, Lander ES.
Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad
Sci U S A. 2007 Dec 4;104(49):19428-33.
Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R. Ascertainment bias in
studies of human genome-wide polymorphism. Genome Res. 2005 Nov;15(11):1496-502.
Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM, Smink LJ, Lam AC,
Ovington NR, Stevens HE, Nutland S, Howson JM, Faham M, Moorhead M, Jones HB,
Falkowski M, Hardenbol P, Willis TD, Todd JA. Population structure, differential bias
and genomic control in a large-scale, case-control association study. Nat Genet. 2005
Nov;37(11):1243-6.
References
111
Colbourne JK, Pfrender ME, Gilbert D, Thomas WK, Tucker A, Oakley TH, Tokishita S, Aerts
A, Arnold GJ, Basu MK, Bauer DJ, Cáceres CE, Carmel L, Casola C, Choi JH, Detter JC,
Dong Q, Dusheyko S, Eads BD, Fröhlich T, Geiler-Samerotte KA, Gerlach D, Hatcher P,
Jogdeo S, Krijgsveld J, Kriventseva EV, Kültz D, Laforsch C, Lindquist E, Lopez J, Manak
JR, Muller J, Pangilinan J, Patwardhan RP, Pitluck S, Pritham EJ, Rechtsteiner A, Rho
M, Rogozin IB, Sakarya O, Salamov A, Schaack S, Shapiro H, Shiga Y, Skalitzky C, Smith
Z, Souvorov A, Sung W, Tang Z, Tsuchiya D, Tu H, Vos H, Wang M, Wolf YI, Yamagata H,
Yamada T, Ye Y, Shaw JR, Andrews J, Crease TJ, Tang H, Lucas SM, Robertson HM, Bork
P, Koonin EV, Zdobnov EM, Grigoriev IV, Lynch M, Boore JL. The ecoresponsive genome
of Daphnia pulex. Science. 2011 Feb 4;331(6017):555-61.
Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, Rosenberg NA, Pritchard JK. A worldwide
survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet.
2006 Nov;38(11):1251-60.
Coop G, Wen X, Ober C, Pritchard JK, Przeworski M. High-resolution mapping of crossovers
reveals extensive variation in fine-scale recombination patterns among humans. Science.
2008 Mar 7;319(5868):1395-8.
Cox TF, Cox MAA. Multidimensional scaling. Chapman and Hall/CRC, Boca Raton. 2001; 308 p.
Dahlbäck G. The towns. In: Helle K (ed). The Cambridge history of Scandinavia. Volume I:
Prehistory to 1520. Cambridge University Press, Cambridge. 2003; 611-34.
Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES. High-resolution haplotype structure
in the human genome. Nat Genet. 2001 Oct;29(2):229-32.
de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. Repetitive elements may comprise
over two-thirds of the human genome. PLoS Genet. 2011 Dec;7(12):e1002384.
De La Vega FM, Isaac H, Collins A, Scafe CR, Halldórsson BV, Su X, Lippert RA, Wang Y, LaigWebster M, Koehler RT, Ziegle JS, Wogan LT, Stevens JF, Leinen KM, Olson SJ, Guegler
KJ, You X, Xu LH, Hemken HG, Kalush F, Itakura M, Zheng Y, de Thé G, O’Brien SJ, Clark
AG, Istrail S, Hunkapiller MW, Spier EG, Gilbert DA. The linkage disequilibrium maps
of three human chromosomes across four populations reflect their demographic history
and a common underlying recombination pattern. Genome Res. 2005 Apr;15(4):454-62.
Dermitzakis ET, Reymond A, Antonarakis SE. Conserved non-genic sequences - an unexpected
feature of mammalian genomes. Nat Rev Genet. 2005 Feb;6(2):151-7.
Dray S, Dufour AB. The ade4 package: implementing the duality diagram for ecologists.
Journal of Statistical Software. 2007;22:1–20.
Eberle MA, Rieder MJ, Kruglyak L, Nickerson DA. Allele frequency matching between SNPs
reveals an excess of linkage disequilibrium in genic regions of the human genome. PLoS
Genet. 2006 Sep 8;2(9):e142.
Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH. Missing heritability
and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010
Jun;11(6):446-50.
Einarsdóttir K, Humphreys K, Bonnard C, Palmgren J, Iles MM, Sjölander A, Li Y, Chia KS,
Liu ET, Hall P, Liu J, Wedrén S. Linkage disequilibrium mapping of CHEK2: common
variation and breast cancer risk. PLoS Med. 2006 Jun;3(6):e168.
112
References
Einarsdottir E, Egerbladh I, Beckman L, Holmberg D, Escher SA. The genetic population
structure of northern Sweden and its implications for mapping genetic diseases. Hereditas.
2007 Nov;144(5):171-80.
ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras
TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor
CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM,
Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC,
Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James
KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC,
Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox
S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS,
Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish
A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch
HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel
J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS,
Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT,
Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S,
Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano
C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J,
Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang
X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM,
Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu
Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold
EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos
JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin
JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA;
NISC Comparative Sequencing Program; Baylor College of Medicine Human Genome
Sequencing Center; Washington University Genome Sequencing Center; Broad Institute;
Children’s Hospital Oakland Research Institute, Batzoglou S, Goldman N, Hardison RC,
Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC,
Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CW,
Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer
MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC,
Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman
S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna
R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang
X, Xu M, Haidar JN, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B,
Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith
K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PI,
Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras
E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan
X, Hansen NF, Idol JR, Maduro VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young
AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock
GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S,
Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K,
Yoshinaga Y, Zhu B, de Jong PJ. Identification and analysis of functional elements in 1% of
the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.
References
113
Erdmann J, Grosshennig A, Braund PS, König IR, Hengstenberg C, Hall AS, Linsel-Nitschke
P, Kathiresan S, Wright B, Trégouët DA, Cambien F, Bruse P, Aherrahrou Z, Wagner
AK, Stark K, Schwartz SM, Salomaa V, Elosua R, Melander O, Voight BF, O’Donnell CJ,
Peltonen L, Siscovick DS, Altshuler D, Merlini PA, Peyvandi F, Bernardinelli L, Ardissino
D, Schillert A, Blankenberg S, Zeller T, Wild P, Schwarz DF, Tiret L, Perret C, Schreiber S,
El Mokhtari NE, Schäfer A, März W, Renner W, Bugert P, Klüter H, Schrezenmeir J, Rubin
D, Ball SG, Balmforth AJ, Wichmann HE, Meitinger T, Fischer M, Meisinger C, Baumert
J, Peters A, Ouwehand WH; Italian Atherosclerosis, Thrombosis, and Vascular Biology
Working Group; Myocardial Infarction Genetics Consortium; Wellcome Trust Case Control
Consortium; Cardiogenics Consortium, Deloukas P, Thompson JR, Ziegler A, Samani NJ,
Schunkert H. New susceptibility locus for coronary artery disease on chromosome 3q22.3.
Nat Genet. 2009 Mar;41(3):280-2.
Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the
software STRUCTURE: a simulation study. Mol Ecol. 2005 Jul;14(8):2611-20.
Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): an integrated software package for
population genetics data analysis. Evol Bioinform Online. 2007 Feb 23;1:47-50.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003 Aug;164(4):1567-87.
Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet.
2006 Feb;7(2):85-97.
Finnilä S, Lehtonen MS, Majamaa K. Phylogenetic network for European mtDNA. Am J Hum
Genet. 2001 Jun;68(6):1475-84.
François O, Currat M, Ray N, Han E, Excoffier L, Novembre J. Principal component analysis
under population genetic models of range expansion and admixture. Mol Biol Evol. 2010
Jun;27(6):1257-68.
Gattepaille LM, Jakobsson M. Combining markers into haplotypes can improve population
structure inference. Genetics. 2012 Jan;190(1):159-74.
Gilissen C, Hoischen A, Brunner HG, Veltman JA. Disease gene identification strategies for
exome sequencing. Eur J Hum Genet. 2012 May;20(5):490-7.
Goudet J. Hierfstat, a package for R to compute and test hierarchical F-statistics. Mol Ecol
Notes. 2005;5:184-6.
Guglielmino CR, Piazza A, Menozzi P, Cavalli-Sforza LL. Uralic genes in Europe. Am J Phys
Anthropol. 1990 Sep;83(1):57-68.
Guillot G, Estoup A, Mortier F, Cosson JF. A spatial statistical model for landscape genetics.
Genetics. 2005 Jul;170(3):1261-80.
Guillot G, Santos F, Estoup A. Analysing georeferenced population genetics data with
Geneland: a new algorithm to deal with null alleles and a friendly graphical user interface.
Bioinformatics. 2008 Jun 1;24(11):1406-7.
Guillot G, Foll M. Correcting for ascertainment bias in the inference of population structure.
Bioinformatics. 2009 Feb 15;25(4):552-4.
Guldberg P, Henriksen KF, Sipilä I, Güttler F, de la Chapelle A. Phenylketonuria in a low
incidence population: molecular characterisation of mutations in Finland. J Med Genet.
1995 Dec;32(12):976-8.
114
References
Gustavson KH, Holmgren G. Ärftliga sjukdomar som är vanliga i Sverige: “det svenska sjukdomsarvet”. Finska Läkaresällskapets Handlingar. 1991;151:194-8.
Gyllerup S, Lanke J, Lindholm LH, Schersten B. Cold climate is an important factor in
explaining regional differences in coronary mortality even if serum cholesterol and other
established risk factors are taken into account. Scott Med J. 1993 Dec;38(6):169-72.
Gyllerup S. Cold climate and coronary mortality in Sweden. Int J Circumpolar Health. 2000
Oct;59(3-4):160-3.
Haataja L. Jälleenrakentava Suomi. In: Zetterberg S (ed). Suomen historian pikkujättiläinen.
WSOY, Porvoo. 2003; 728-831.
Häger BÅ. Frihetstiden och den gustavianska tiden 1718-1809. In: Larsson HA (ed). Boken
om Sveriges historia. Forum, Stockholm. 1999; 181-218.
Häkkinen K. Mistä sanat tulevat: suomalaista etymologiaa. Tietolipas 117. Suomalaisen
Kirjallisuuden Seura, Helsinki. 1990; 326 p.
Häkkinen K. Suomalaisten esihistoria kielitieteen valossa. Tietolipas 147. Suomalaisen
Kirjallisuuden Seura, Helsinki. 1996; 224 p.
Häkkinen J. Kantauralin ajoitus ja paikannus: perustelut puntarissa. Journal de la Société
Finno-Ougrienne. 2009;92:9-56.
Hamilton MB. Population Genetics. Wiley-Blackwell, Chichester. 2009; 407 p.
Hammar N, Ahlbom A, Theorell T. Geographical differences in myocardial infarction incidence
in eight Swedish counties, 1976-1981. Epidemiology. 1992 Jul;3(4):348-55.
Hammar N, Andersson T, Reuterwall C, Nilsson T, Knutsson A, Hallqvist J, Ahlbom A.
Geographical differences in the incidence of acute myocardial infarction in Sweden.
Analyses of possible causes using two parallel case-control studies. J Intern Med. 2001
Feb;249(2):137-44.
Hannelius U, Lindgren CM, Melén E, Malmberg A, von Dobeln U, Kere J. Phenylketonuria
screening registry as a resource for population genetic studies. J Med Genet. 2005
Oct;42(10):e60.
Hannelius U, Gherman L, Mäkelä VV, Lindstedt A, Zucchelli M, Lagerberg C, Tybring G, Kere
J, Lindgren CM. Large-scale zygosity testing using single nucleotide polymorphisms. Twin
Res Hum Genet. 2007 Aug;10(4):604-25.
Hartl DL, Clark AG. Principles of Population Genetics. 4th ed. Sinauer Associates, Sunderland.
2007; 652 p.
Havulinna AS, Pääkkönen R, Karvonen M, Salomaa V. Geographic patterns of incidence
of ischemic stroke and acute myocardial infarction in Finland during 1991-2003. Ann
Epidemiol. 2008 Mar;18(3):206-13.
Heath SC, Gut IG, Brennan P, McKay JD, Bencko V, Fabianova E, Foretova L, Georges M,
Janout V, Kabesch M, Krokan HE, Elvestad MB, Lissowska J, Mates D, Rudnai P, Skorpen F,
Schreiber S, Soria JM, Syvänen AC, Meneton P, Herçberg S, Galan P, Szeszenia-Dabrowska
N, Zaridze D, Génin E, Cardon LR, Lathrop M. Investigation of the fine structure of
European populations with applications to disease association studies. Eur J Hum Genet.
2008 Dec;16(12):1413-29.
References
115
Hedlund E, Kaprio J, Lange A, Koskenvuo M, Jartti L, Rönnemaa T, Hammar N. Migration
and coronary heart disease: A study of Finnish twins living in Sweden and their co-twins
residing in Finland. Scand J Public Health. 2007;35(5):468-74.
Hedman M, Pimenoff V, Lukka M, Sistonen P, Sajantila A. Analysis of 16 Y STR loci in the
Finnish population reveals a local reduction in the diversity of male lineages. Forensic Sci
Int. 2004 May 28;142(1):37-43.
Hedman M, Brandstätter A, Pimenoff V, Sistonen P, Palo JU, Parson W, Sajantila A. Finnish
mitochondrial DNA HVS-I and HVS-II population data. Forensic Sci Int. 2007 Oct
25;172(2-3):171-8.
Hedrick PW. Genetics of populations. 3rd ed. Jones and Bartlett Publishers, Sudbury. 2005;
737 p.
Heikinheimo H, Fortelius M, Eronen J, Mannila H. Biogeography of European land
mammals shows environmentally distinct and spatially coherent clusters. J Biogeogr.
2007;34(6):1053-64.
Hewitt G. The genetic legacy of the Quaternary ice ages. Nature. 2000 Jun 22;405(6789):907-13.
Hillier LW, Coulson A, Murray JI, Bao Z, Sulston JE, Waterston RH. Genomics in C. elegans:
so many genes, such a little worm. Genome Res. 2005 Dec;15(12):1651-60.
Hofreiter M, Serre D, Poinar HN, Kuch M, Pääbo S. Ancient DNA. Nat Rev Genet. 2001
May;2(5):353-9.
Hoggart CJ, Shriver MD, Kittles RA, Clayton DG, McKeigue PM. Design and analysis of
admixture mapping studies. Am J Hum Genet. 2004 May;74(5):965-78.
Hoggart CJ, O’Reilly PF, Kaakinen M, Zhang W, Chambers JC, Kooner JS, Coin LJ, Jarvelin
MR. Fine-scale estimation of location of birth from genome-wide single-nucleotide polymorphism data. Genetics. 2012 Feb;190(2):669-77.
Holmlund G, Nilsson H, Lango-Warensjo A, Rosén B, Lindblom B. Y-chromosome STR
haplotypes in a Swedish population. In: Brinkmann B, Carracedo A (eds). Progress in
Forensic Genetics 9. International Congress Series 1239. Elsevier, Amsterdam. 2003; 343–8.
Holmlund G, Nilsson H, Karlsson A, Lindblom B. Y-chromosome STR haplotypes in Sweden.
Forensic Sci Int. 2006 Jun 27;160(1):66-79.
Honkola T, Vesakoski O, Korhonen K, Lehtinen J, Syrjänen K, Wahlberg N. Cultural and
climatic changes shape the evolutionary history of the Uralic languages. Unpublished.
Hovatta I, Terwilliger JD, Lichtermann D, Mäkikyrö T, Suvisaari J, Peltonen L, Lönnqvist J.
Schizophrenia in the genetic isolate of Finland. Am J Med Genet. 1997 Jul 25;74(4):353-60.
Humphreys K, Grankvist A, Leu M, Hall P, Liu J, Ripatti S, Rehnström K, Groop L, Klareskog
L, Ding B, Grönberg H, Xu J, Pedersen NL, Lichtenstein P, Mattingsdal M, Andreassen OA,
O’Dushlaine C, Purcell SM, Sklar P, Sullivan PF, Hultman CM, Palmgren J, Magnusson PK.
The genetic structure of the Swedish population. PLoS One. 2011;6(8):e22547.
Huurre M. 9000 vuotta Suomen esihistoriaa. 9th ed. Otava, Helsinki. 2005; 272 p.
Huyghe JR, Fransen E, Hannula S, Van Laer L, Van Eyken E, Mäki-Torkko E, LysholmBernacchi A, Aikio P, Stephan DA, Sorri M, Huentelman MJ, Van Camp G. Genome-wide
SNP analysis reveals no gain in power for association studies of common variants in the
Finnish Saami. Eur J Hum Genet. 2010 May;18(5):569-74.
116
References
Huyghe JR, Fransen E, Hannula S, Van Laer L, Van Eyken E, Mäki-Torkko E, Aikio P, Sorri
M, Huentelman MJ, Van Camp G. A genome-wide analysis of population structure in the
Finnish Saami with implications for genetic association studies. Eur J Hum Genet. 2011
Mar;19(3):347-52.
Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection
of large-scale variation in the human genome. Nat Genet. 2004 Sep;36(9):949-51.
Ikäheimo I, Silvennoinen-Kassinen S, Tiilikainen A. HLA five-locus haplotypes in Finns. Eur
J Immunogenet. 1996 Aug;23(4):321-8.
Ingman M, Gyllensten U. A recent genetic link between Sami and the Volga-Ural region of
Russia. Eur J Hum Genet. 2007 Jan;15(1):115-20.
International HapMap Consortium. The International HapMap Project. Nature. 2003 Dec
18;426(6968):789-96.
International HapMap Consortium. A haplotype map of the human genome. Nature. 2005
Oct 27;437(7063):1299-320.
International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL,
Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA,
Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X,
Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R,
Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen
H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y,
Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y,
Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K,
Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier
JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC,
Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK,
Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A,
Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S,
Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett
J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe’er I, Price A, Purcell S, Richter DJ,
Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L,
Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS,
Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A,
Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens
M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin
JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer
DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah
C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole
IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D,
Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC,
Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers
J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT,
Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L’Archevêque
P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks
LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia
LF, Collins FS, Kennedy K, Jamieson R, Stewart J. A second generation human haplotype
map of over 3.1 million SNPs. Nature. 2007 Oct 18;449(7164):851-61.
References
117
International HapMap 3 Consortium, Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs
RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE,
Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye
M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y,
Wheeler D, Gibbs RA, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson
K, Lee C, McCarrol SA, Nemesh J, Dermitzakis E, Keinan A, Montgomery SB, Pollack S,
Price AL, Soranzo N, Bonnen PE, Gibbs RA, Gonzaga-Jauregui C, Keinan A, Price AL, Yu F,
Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Schaffner SF,
Zhang Q, Ghori MJ, McGinnis R, McLaren W, Pollack S, Price AL, Schaffner SF, Takeuchi F,
Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon
DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi
CN, Royal CD, Sharp RR, Zeng C, Brooks LD, McEwen JE. Integrating common and rare
genetic variation in diverse human populations. Nature. 2010 Sep 2;467(7311):52-8.
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence
of the human genome. Nature. 2004 Oct 21;431(7011):931-45.
Jakkula E, Rehnström K, Varilo T, Pietiläinen OP, Paunio T, Pedersen NL, deFaire U, Järvelin
MR, Saharinen J, Freimer N, Ripatti S, Purcell S, Collins A, Daly MJ, Palotie A, Peltonen
L. The genome-wide patterns of variation expose significant substructure in a founder
population. Am J Hum Genet. 2008 Dec;83(6):787-94.
Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH,
Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor BJ, Simon-Sanchez
J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M, Cann HM, Hardy JA,
Rosenberg NA, Singleton AB. Genotype, haplotype and copy-number variation in worldwide
human populations. Nature. 2008 Feb 21;451(7181):998-1003.
Jartti L, Rönnemaa T, Kaprio J, Järvisalo MJ, Toikka JO, Marniemi J, Hammar N, Alfredsson
L, Saraste M, Hartiala J, Koskenvuo M, Raitakari OT. Population-based twin study of the
effects of migration from Finland to Sweden on endothelial function and intima-media
thickness. Arterioscler Thromb Vasc Biol. 2002a May 1;22(5):832-7.
Jartti L, Raitakari OT, Kaprio J, Järvisalo MJ, Toikka JO, Marniemi J, Hammar N, Luotolahti
M, Koskenvuo M, Rönnemaa T. Increased carotid intima-media thickness in men born in
east Finland: a twin study of the effects of birthplace and migration to Sweden on subclinical
atherosclerosis. Ann Med. 2002b;34(3):162-70.
Jartti L, Rönnemaa T, Raitakari OT, Hedlund E, Hammar N, Lassila R, Marniemi J, Koskenvuo
M, Kaprio J. Migration at early age from a high to a lower coronary heart disease risk
country lowers the risk of subclinical atherosclerosis in middle-aged men. J Intern Med.
2009 Mar;265(3):345-58.
Jobling MA, Hurles M, Tyler-Smith C. Human Evolutionary Genetics: Origins, peoples &
disease. Garland Publishing, Abingdon. 2004; 523 p.
Johansson S, Nordin P, Wilhelmsen L, Tibblin G, Johansson BW, Hansen O, Ahlmark G,
Jacobsson S, Michaeli E, Gillnäs T. Geographical variation and time trends in the attack
rate of coronary heart disease in five Swedish cities. J Intern Med. 1992 May;231(5):511-20.
Johansson A, Vavruch-Nilsson V, Edin-Liljegren A, Sjölander P, Gyllensten U. Linkage disequilibrium between microsatellite markers in the Swedish Sami relative to a worldwide
selection of populations. Hum Genet. 2005 Jan;116(1-2):105-13.
118
References
Johansson A, Ingman M, Mack SJ, Erlich H, Gyllensten U. Genetic origin of the Swedish
Sami inferred from HLA class I and class II allele frequencies. Eur J Hum Genet. 2008
Nov;16(11):1341-9.
Johnson NA, Coram MA, Shriver MD, Romieu I, Barsh GS, London SJ, Tang H. Ancestral components of admixed genomes in a Mexican cohort. PLoS Genet. 2011 Dec;7(12):e1002410.
Jorde LB, Watkins WS, Bamshad MJ. Population genomics: a bridge from evolutionary history
to genetic medicine. Hum Mol Genet. 2001 Oct 1;10(20):2199-207.
Jousilahti P, Vartiainen E, Tuomilehto J, Pekkanen J, Puska P. Role of known risk factors
in explaining the difference in the risk of coronary heart disease between eastern and
southwestern Finland. Ann Med. 1998 Oct;30(5):481-7.
Juonala M, Viikari JS, Kähönen M, Taittonen L, Rönnemaa T, Laitinen T, Mäki-Torkko N,
Mikkilä V, Räsänen L, Akerblom HK, Pesonen E, Raitakari OT. Geographic origin as a
determinant of carotid artery intima-media thickness and brachial artery flow-mediated
dilation: the Cardiovascular Risk in Young Finns study. Arterioscler Thromb Vasc Biol.
2005 Feb;25(2):392-8.
Juonala M, Viikari JS, Raitakari OT. Miksi sepelvaltimotauti on edelleen enemmän itä- kuin
länsisuomalaisten vaiva? Duodecim. 2006;122(1):55-61.
Juvonen V, Kulmala SM, Ignatius J, Penttinen M, Savontaus ML. Dissecting the epidemiology of a trinucleotide repeat disease - example of FRDA in Finland. Hum Genet. 2002
Jan;110(1):36-40.
Kaessmann H, Zöllner S, Gustafsson AC, Wiebe V, Laan M, Lundeberg J, Uhlén M, Pääbo
S. Extensive linkage disequilibrium in small human populations in Eurasia. Am J Hum
Genet. 2002 Mar;70(3):673-85.
Kaipio J, Näyhä S, Valtonen V. Fluoride in the drinking water and the geographical variation
of coronary heart disease in Finland. Eur J Cardiovasc Prev Rehabil. 2004 Feb;11(1):56-62.
Kallio P. Suomen kantakielten absoluuttista kronologiaa. Virittäjä. 2006;110(1):2-25.
Karlsson AO, Wallerström T, Götherström A, Holmlund G. Y-chromosome diversity in
Sweden - a long-time perspective. Eur J Hum Genet. 2006 Aug;14(8):963-70.
Karvonen M, Moltchanova E, Viik-Kajander M, Moltchanov V, Rytkönen M, Kousa A,
Tuomilehto J. Regional inequality in the risk of acute myocardial infarction in Finland: a
case study of 35- to 74-year-old men. Heart Drug 2002;2:51-60.
Keränen J. Kustaa Vaasa ja uskonpuhdistuksen aika. In: Zetterberg S (ed). Suomen historian
pikkujättiläinen. WSOY, Porvoo. 2003; 138-91.
Kere J, Norio R, Savilahti E, Estivill X, de la Chapelle A. Cystic fibrosis in Finland: a molecular
and genealogical study. Hum Genet. 1989 Aug;83(1):20-5.
Kere J, Estivill X, Chillón M, Morral N, Nunes V, Norio R, Savilahti E, de la Chapelle A. Cystic
fibrosis in a low-incidence population: two major mutations in Finland. Hum Genet. 1994
Feb;93(2):162-6.
Kere J. Human population genetics: lessons from Finland. Annu Rev Genomics Hum Genet.
2001;2:103-28.
Kidd KK, Pakstis AJ, Speed WC, Kidd JR. Understanding human DNA sequence variation. J
Hered. 2004 Sep-Oct;95(5):406-20.
References
119
Kirchhoff M, Davidsen M, Brønnum-Hansen H, Hansen B, Schnack H, Eriksen LS, Madsen
M, Schroll M. Incidence of myocardial infarction in the Danish MONICA population 19821991. Int J Epidemiol. 1999 Apr;28(2):211-8.
Kittles RA, Perola M, Peltonen L, Bergen AW, Aragon RA, Virkkunen M, Linnoila M, Goldman
D, Long JC. Dual origins of Finns revealed by Y chromosome haplotype variation. Am J
Hum Genet. 1998 May;62(5):1171-9.
Kittles RA, Bergen AW, Urbanek M, Virkkunen M, Linnoila M, Goldman D, Long JC. Autosomal,
mitochondrial, and Y chromosome DNA variation in Finland: evidence for a male-specific
bottleneck. Am J Phys Anthropol. 1999 Apr;108(4):381-99.
Knopp T, Merilä J. The postglacial recolonization of Northern Europe by Rana arvalis as
revealed by microsatellite and mitochondrial DNA analyses. Heredity (Edinb). 2009
Feb;102(2):174-81.
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B,
Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML,
Thorgeirsson TE, Gulcher JR, Stefansson K. A high-resolution recombination map of the
human genome. Nat Genet. 2002 Jul;31(3):241-7.
Kousa A, Moltchanova E, Viik-Kajander M, Rytkönen M, Tuomilehto J, Tarvainen T, Karvonen
M. Geochemistry of ground water and the incidence of acute myocardial infarction in
Finland. J Epidemiol Community Health. 2004 Feb;58(2):136-9.
Kousa A, Havulinna AS, Moltchanova E, Taskinen O, Nikkarinen M, Eriksson J, Karvonen
M. Calcium:magnesium ratio in local groundwater and incidence of acute myocardial
infarction among males in rural Finland. Environ Health Perspect. 2006 May;114(5):730-4.
Krawczak M, Nikolaus S, von Eberstein H, Croucher PJ, El Mokhtari NE, Schreiber S. PopGen:
population-based recruitment of patients and controls for the analysis of complex genotype-phenotype relationships. Community Genet. 2006;9(1):55-61.
Kristiansson K, Naukkarinen J, Peltonen L. Isolated populations and complex disease gene
identification. Genome Biol. 2008;9(8):109.
Kuulasmaa K, Tunstall-Pedoe H, Dobson A, Fortmann S, Sans S, Tolonen H, Evans A, Ferrario
M, Tuomilehto J. Estimation of contribution of changes in classic risk factors to trends
in coronary-event rates across the WHO MONICA Project populations. Lancet. 2000 Feb
26;355(9205):675-87.
Laakso J (ed). Uralilaiset kansat: tietoa suomen sukukielistä ja niiden puhujista. WSOY,
Porvoo. 1991; 329 p.
Laan M, Pääbo S. Demographic history and linkage disequilibrium in human populations.
Nat Genet. 1997 Dec;17(4):435-8.
Laan M, Wiebe V, Khusnutdinova E, Remm M, Pääbo S. X-chromosome as a marker for
population history: linkage disequilibrium and haplotype study in Eurasian populations.
Eur J Hum Genet. 2005 Apr;13(4):452-62.
Lahermo P, Sajantila A, Sistonen P, Lukka M, Aula P, Peltonen L, Savontaus ML. The genetic
relationship between the Finns and the Finnish Saami (Lapps): analysis of nuclear DNA
and mtDNA. Am J Hum Genet. 1996 Jun;58(6):1309-22.
120
References
Lahermo P, Savontaus ML, Sistonen P, Béres J, de Knijff P, Aula P, Sajantila A. Y chromosomal
polymorphisms reveal founding lineages in the Finns and the Saami. Eur J Hum Genet.
1999 May-Jun;7(4):447-58.
Laine A. Suomi sodassa. In: Zetterberg S (ed). Suomen historian pikkujättiläinen. WSOY,
Porvoo. 2003; 690-727.
Laitinen V, Lahermo P, Sistonen P, Savontaus ML. Y-chromosomal diversity suggests that Baltic
males share common Finno-Ugric-speaking forefathers. Hum Hered. 2002;53(2):68-78.
Laks T, Tuomilehto J, Jõeste E, Mäeots E, Salomaa V, Palomäki P, Pullisaar O, Karu K, Torppa
J. Alarmingly high occurrence and case fatality of acute coronary heart disease events in
Estonia: results from the Tallinn AMI register 1991-94. J Intern Med. 1999 Jul;246(1):53-60.
Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with
the DNA of inbred children. Science. 1987 Jun 19;236(4808):1567-70.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle
M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky
J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W,
Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N,
Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley
D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham
I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones
M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb
R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD,
Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl
MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson
DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S,
Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs
RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH,
Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama
A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T,
Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C,
Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J,
Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G,
Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ,
Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C,
Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen
F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia
N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A,
Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp
M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C,
Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif
S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe
TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G,
Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J,
Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins
F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong
P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ; International Human Genome
Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature.
2001 Feb 15;409(6822):860-921.
References
121
Lander ES. Initial impact of the sequencing of the human genome. Nature. 2011 Feb
10;470(7333):187-97.
Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J,
Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Parson W, Palo J, Ploski
R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden
AG, Gieger C, Wichmann HE, Rüther A, Schreiber S, Becker C, Nürnberg P, Nelson MR,
Krawczak M, Kayser M. Correlation between genetic and geographic structure in Europe.
Curr Biol. 2008 Aug 26;18(16):1241-8.
Lappalainen T, Koivumäki S, Salmela E, Huoponen K, Sistonen P, Savontaus ML, Lahermo
P. Regional differences among the Finns: a Y-chromosomal perspective. Gene. 2006 Jul
19;376(2):207-15.
Lappalainen T, Laitinen V, Salmela E, Andersen P, Huoponen K, Savontaus ML, Lahermo
P. Migration waves to the Baltic Sea region. Ann Hum Genet. 2008 May;72(Pt 3):337-48.
Lappalainen T. Human genetic variation in the Baltic Sea region: features of population
history and natural selection. PhD thesis. Helsinki University Print, Helsinki. 2009; 83 p.
Lappalainen T, Hannelius U, Salmela E, von Döbeln U, Lindgren CM, Huoponen K, Savontaus
ML, Kere J, Lahermo P. Population structure in contemporary Sweden--a Y-chromosomal
and mitochondrial DNA analysis. Ann Hum Genet. 2009 Jan;73(1):61-73.
Lappalainen T, Salmela E, Andersen PM, Dahlman-Wright K, Sistonen P, Savontaus ML,
Schreiber S, Lahermo P, Kere J. Genomic landscape of positive natural selection in Northern
European populations. Eur J Hum Genet. 2010 Apr;18(4):471-8.
Leskinen H. Suomen murteiden synty. In: Fogelberg P (ed). Pohjan poluilla: suomalaisten
juuret nykytutkimuksen mukaan. Bidrag till kännedom av Finlands natur och folk 153.
Finska Vetenskaps-Societeten, Helsinki. 1999; 358-71.
Leu M, Humphreys K, Surakka I, Rehnberg E, Muilu J, Rosenström P, Almgren P, Jääskeläinen
J, Lifton RP, Kyvik KO, Kaprio J, Pedersen NL, Palotie A, Hall P, Grönberg H, Groop L,
Peltonen L, Palmgren J, Ripatti S. NordicDB: a Nordic pool and portal for genome-wide
control data. Eur J Hum Genet. 2010 Dec;18(12):1322-6.
Levo A, Jääskeläinen J, Sistonen P, Sirén MK, Voutilainen R, Partanen J. Tracing past population migrations: genealogy of steroid 21-hydroxylase (CYP21) gene mutations in Finland.
Eur J Hum Genet. 1999 Feb-Mar;7(2):188-96.
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF,
Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V,
Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill
J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC. The diploid
genome sequence of an individual human. PLoS Biol. 2007 Sep 4;5(10):e254.
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS,
Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from
genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4.
Lindkvist T. Early political organisation. Introductory survey. In: Helle K (ed). The Cambridge
history of Scandinavia. Volume I: Prehistory to 1520. Cambridge University Press,
Cambridge. 2003a; 160-7.
122
References
Lindkvist T. Kings and provinces in Sweden. In: Helle K (ed). The Cambridge history of
Scandinavia. Volume I: Prehistory to 1520. Cambridge University Press, Cambridge.
2003b; 221-34.
Lindqvist H. A history of Sweden: from Ice Age to our age. Norstedts, Stockholm. 2006; 727 p.
Litt M, Luty JA. A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am J Hum Genet. 1989 Mar;44(3):397-401.
Lu TT, Lao O, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit
J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Nielsen F, Parson W,
Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman
A, Uitterlinden AG, Gieger C, Wichmann HE, Ruether A, Schreiber S, Becker C, Nürnberg
P, Nelson MR, Kayser M, Krawczak M. An evaluation of the genetic-matched pair study
design using genome-wide SNP data from the European population. Eur J Hum Genet.
2009 Jul;17(7):967-75.
Lundmark PE, Liljedahl U, Boomsma DI, Mannila H, Martin NG, Palotie A, Peltonen L, Perola
M, Spector TD, Syvänen AC. Evaluation of HapMap data in six populations of European
descent. Eur J Hum Genet. 2008 Sep;16(9):1142-50.
Mallory JP. In search of the Indo-Europeans: language, archaeology and myth. Thames &
Hudson, London. 1989; 288 p.
Malmström H, Gilbert MT, Thomas MG, Brandström M, Storå J, Molnar P, Andersen PK,
Bendixen C, Holmlund G, Götherström A, Willerslev E. Ancient DNA reveals lack of
continuity between neolithic hunter-gatherers and contemporary Scandinavians. Curr
Biol. 2009 Nov 3;19(20):1758-62.
Malmström H, Linderholm A, Lidén K, Storå J, Molnar P, Holmlund G, Jakobsson M,
Götherström A. High frequency of lactose intolerance in a prehistoric hunter-gatherer
population in northern Europe. BMC Evol Biol. 2010 Mar 30;10:89.
Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos
EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E,
Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson
G, Haines JL, Mackay TF, McCarroll SA, Visscher PM. Finding the missing heritability of
complex diseases. Nature. 2009 Oct 8;461(7265):747-53.
Mäntylä I. Suurvaltakausi. In: Zetterberg S (ed). Suomen historian pikkujättiläinen. WSOY,
Porvoo. 2003a; 192-275.
Mäntylä I. Kustavilainen aika. In: Zetterberg S (ed). Suomen historian pikkujättiläinen. WSOY,
Porvoo. 2003b; 314-57.
Mao X, Bigham AW, Mei R, Gutierrez G, Weiss KM, Brutsaert TD, Leon-Velarde F, Moore LG,
Vargas E, McKeigue PM, Shriver MD, Parra EJ. A genomewide admixture mapping panel
for Hispanic/Latino populations. Am J Hum Genet. 2007 Jun;80(6):1171-8.
Martin ER. Linkage disequilibrium and association analysis. In: Haines JL, Pericak-Vance MA
(eds). Genetic analysis of complex disease. John Wiley & Sons, Hoboken. 2006; 329-353.
McEvoy BP, Montgomery GW, McRae AF, Ripatti S, Perola M, Spector TD, Cherkas L, Ahmadi
KR, Boomsma D, Willemsen G, Hottenga JJ, Pedersen NL, Magnusson PK, Kyvik KO,
Christensen K, Kaprio J, Heikkilä K, Palotie A, Widen E, Muilu J, Syvänen AC, Liljedahl
References
123
U, Hardiman O, Cronin S, Peltonen L, Martin NG, Visscher PM. Geographical structure
and differential natural selection among North European populations. Genome Res. 2009
May;19(5):804-14.
McVean G. A genealogical interpretation of principal components analysis. PLoS Genet. 2009
Oct;5(10):e1000686.
Meigs JB, Shrader P, Sullivan LM, McAteer JB, Fox CS, Dupuis J, Manning AK, Florez JC,
Wilson PW, D’Agostino RB Sr, Cupples LA. Genotype score in addition to common risk
factors for prediction of type 2 diabetes. N Engl J Med. 2008 Nov 20;359(21):2208-19.
Meinander CF. Kivikautemme väestöhistoria. In: Gallén J (ed). Suomen väestön esihistorialliset
juuret. Bidrag till kännedom av Finlands natur och folk 131. Finska Vetenskaps-Societeten,
Helsinki. 1984; 21-48.
Meinilä M, Finnilä S, Majamaa K. Evidence for mtDNA admixture between the Finns and the
Saami. Hum Hered. 2001;52(3):160-70.
Melin J, Johansson AW, Hedenborg S. Sveriges historia: koncentrerad uppslagsbok - fakta,
årtal, kartor, tabeller. 3rd ed. Prisma, Stockholm. 2003; 508 p.
Mitchell D, Willerslev E, Hansen A. Damage and repair of ancient DNA. Mutat Res. 2005
Apr 1;571(1-2):265-76.
Moisio AL, Sistonen P, Weissenbach J, de la Chapelle A, Peltomäki P. Age and origin of two
common MLH1 mutations predisposing to hereditary colon cancer. Am J Hum Genet.
1996 Dec;59(6):1243-51.
Mökkönen, Teemu. Studies on Stone Age Housepits in Fennoscandia (4000 - 2000 cal BC):
Changes in ground plan, site location and degree of sedentism. PhD thesis. Helsinki
University Print, Helsinki. 2011; 86 p.
Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H,
Price AL, Reich D. The history of African gene flow into Southern Europeans, Levantines,
and Jews. PLoS Genet. 2011 Apr;7(4):e1001373.
Morin PA, Martien KK, Taylor BL. Assessing statistical power of SNPs for population structure
and conservation studies. Mol Ecol Resour. 2009 Jan;9(1):66-73.
Morris RW, Whincup PH, Lampe FC, Walker M, Wannamethee SG, Shaper AG. Geographic
variation in incidence of coronary heart disease in Britain: the contribution of established
risk factors. Heart. 2001 Sep;86(3):277-83.
Morris RW, Whincup PH, Emberson JR, Lampe FC, Walker M, Shaper AG. North-south
gradients in Britain for stroke and CHD: are they explained by the same factors? Stroke.
2003 Nov;34(11):2604-9.
Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. A fine-scale map of recombination
rates and hotspots across the human genome. Science. 2005 Oct 14;310(5746):321-4.
Mykkänen K, Savontaus ML, Juvonen V, Sistonen P, Tuisku S, Tuominen S, Penttinen M,
Lundkvist J, Viitanen M, Kalimo H, Pöyhönen M. Detection of the founder effect in Finnish
CADASIL families. Eur J Hum Genet. 2004 Oct;12(10):813-9.
Nachman MW, Crowell SL. Estimate of the mutation rate per nucleotide in humans. Genetics.
2000 Sep;156(1):297-304.
Nanovfszky G (ed). The Finno-Ugric world. Teleki László Foundation, Budapest. 2004; 548 p.
124
References
Näyhä S. Environmental temperature and mortality. Int J Circumpolar Health. 2005
Dec;64(5):451-8.
Nelis M, Esko T, Mägi R, Zimprich F, Zimprich A, Toncheva D, Karachanak S, Piskácková T,
Balascák I, Peltonen L, Jakkula E, Rehnström K, Lathrop M, Heath S, Galan P, Schreiber
S, Meitinger T, Pfeufer A, Wichmann HE, Melegh B, Polgár N, Toniolo D, Gasparini P,
D’Adamo P, Klovins J, Nikitina-Zake L, Kucinskas V, Kasnauskiene J, Lubinski J, Debniak
T, Limborska S, Khrunin A, Estivill X, Rabionet R, Marsal S, Julià A, Antonarakis SE,
Deutsch S, Borel C, Attar H, Gagnebin M, Macek M, Krawczak M, Remm M, Metspalu A.
Genetic structure of Europeans: a view from the North-East. PLoS One. 2009;4(5):e5472.
Nerbrand C, Svärdsudd K, Hörte LG, Tibblin G. Geographical variation of mortality from
cardiovascular diseases. The Project ’Myocardial Infarction in mid-Sweden’. Eur Heart
J. 1991 Jan;12(1):4-9.
Nerbrand C, Svärdsudd K, Ek J, Tibblin G. Cardiovascular mortality and morbidity in seven
counties in Sweden in relation to water hardness and geological settings. The project:
myocardial infarction in mid-Sweden. Eur Heart J. 1992 Jun;13(6):721-7.
Nevanlinna HR. The Finnish population structure. A genetic and genealogical study. Hereditas.
1972;71(2):195-236.
Nevanlinna HR. Suomalaisten juuret geneettisen merkkiominaisuustutkimuksen valossa. In:
Gallén J (ed). Suomen väestön esihistorialliset juuret. Bidrag till kännedom av Finlands
natur och folk 131. Finska Vetenskaps-Societeten, Helsinki. 1984; 157–74.
Nielsen R, Hubisz MJ, Clark AG. Reconstituting the frequency spectrum of ascertained
single-nucleotide polymorphism data. Genetics. 2004 Dec;168(4):2373-82.
Norio R, Nevanlinna HR, Perheentupa J. Hereditary diseases in Finland; rare flora in rare
soil. Ann Clin Res. 1973 Jun;5(3):109-41.
Norio R. Finnish Disease Heritage I: characteristics, causes, background. Hum Genet. 2003a
May;112(5-6):441-56.
Norio R. Finnish Disease Heritage II: population prehistory and genetic roots of Finns. Hum
Genet. 2003b May;112(5-6):457-69.
Norio R. The Finnish Disease Heritage III: the individual diseases. Hum Genet. 2003c
May;112(5-6):470-526.
Norman H. Emigrationen. In: Melin J, Johansson AW, Hedenborg S. Sveriges historia: koncentrerad uppslagsbok - fakta, årtal, kartor, tabeller. 3rd ed. Prisma, Stockholm. 2003; 277.
Novembre J, Stephens M. Interpreting principal component analyses of spatial population
genetic variation. Nat Genet. 2008 May;40(5):646-9.
Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S,
Nelson MR, Stephens M, Bustamante CD. Genes mirror geography within Europe. Nature.
2008 Nov 6;456(7218):98-101.
Nychka D. fields: Tools for spatial data. R package version 4.1. 2007.
Nygård T, Kallio V. Rajamaa. In: Zetterberg S (ed). Suomen historian pikkujättiläinen. WSOY,
Porvoo. 2003; 534-93.
Nylander PO, Beckman L. Population studies in northern Sweden. XVII. Estimates of Finnish
and Saamish influence. Hum Hered. 1991;41(3):157-67.
References
125
O’Dushlaine CT, Morris D, Moskvina V, Kirov G, Consortium IS, Gill M, Corvin A, Wilson JF,
Cavalleri GL. Population structure and genome-wide patterns of variation in Ireland and
Britain. Eur J Hum Genet. 2010a Nov;18(11):1248-54.
O’Dushlaine C, McQuillan R, Weale ME, Crouch DJ, Johansson A, Aulchenko Y, Franklin CS,
Polašek O, Fuchsberger C, Corvin A, Hicks AA, Vitart V, Hayward C, Wild SH, Meitinger T,
van Duijn CM, Gyllensten U, Wright AF, Campbell H, Pramstaller PP, Rudan I, Wilson JF.
Genes predict village of origin in rural Europe. Eur J Hum Genet. 2010b Nov;18(11):1269-70.
Official Statistics of Finland (OSF): Preliminary population statistics [e-publication].
ISSN=2243-3627. Helsinki: Statistics Finland. Accessed in August 2012.
Olkkonen T. Modernisoituva sääty-yhteiskunta. In: Zetterberg S (ed). Suomen historian
pikkujättiläinen. WSOY, Porvoo. 2003; 462-533.
Orrman E. Suomen väestön ja asutuksen kehitys keskiajalla. In: Fogelberg P (ed). Pohjan
poluilla: suomalaisten juuret nykytutkimuksen mukaan. Bidrag till kännedom av Finlands
natur och folk 153. Finska Vetenskaps-Societeten, Helsinki. 1999; 375-84.
Pajunen P, Pääkkönen R, Juolevi A, Hämäläinen H, Keskimäki I, Laatikainen T, Moltchanov
V, Niemi M, Rintanen H, Salomaa V. Trends in fatal and non-fatal coronary heart disease
events in Finland during 1991-2001. Scand Cardiovasc J. 2004 Dec;38(6):340-4.
Palo JU, Hedman M, Ulmanen I, Lukka M, Sajantila A. High degree of Y-chromosomal
divergence within Finland--forensic aspects. Forensic Sci Int Genet. 2007 Jun;1(2):120-4.
Palo JU, Pirttimaa M, Bengs A, Johnsson V, Ulmanen I, Lukka M, Udd B, Sajantila A. The
effect of number of loci on geographical structuring and forensic applicability of Y-STR
data in Finland. Int J Legal Med. 2008 Nov;122(6):449-56.
Palo JU, Ulmanen I, Lukka M, Ellonen P, Sajantila A. Genetic markers and population history:
Finland revisited. Eur J Hum Genet. 2009 Oct;17(10):1336-46.
Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008
Dec;40(12):1413-5.
Parducci L, Jørgensen T, Tollefsrud MM, Elverland E, Alm T, Fontana SL, Bennett KD, Haile J,
Matetovici I, Suyama Y, Edwards ME, Andersen K, Rasmussen M, Boessenkool S, Coissac
E, Brochmann C, Taberlet P, Houmark-Nielsen M, Larsen NK, Orlando L, Gilbert MT,
Kjær KH, Alsos IG, Willerslev E. Glacial survival of boreal trees in northern Scandinavia.
Science. 2012 Mar 2;335(6072):1083-6.
Pastinen T, Perola M, Ignatius J, Sabatti C, Tainola P, Levander M, Syvänen AC, Peltonen L.
Dissecting a population genome for targeted screening of disease mutations. Hum Mol
Genet. 2001 Dec 15;10(26):2961-72.
Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, Oksenberg JR, Hauser SL,
Smith MW, O’Brien SJ, Altshuler D, Daly MJ, Reich D. Methods for high-density admixture
mapping of disease genes. Am J Hum Genet. 2004 May;74(5):979-1000.
Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006
Dec;2(12):e190.
Pekkanen J, Manton KG, Stallard E, Nissinen A, Karvonen MJ. Risk factor dynamics, mortality
and life expectancy differences between eastern and western Finland: the Finnish Cohorts
of the Seven Countries Study. Int J Epidemiol. 1992 Apr;21(2):406-19.
126
References
Peltonen L, Jalanko A, Varilo T. Molecular genetics of the Finnish disease heritage. Hum Mol
Genet. 1999;8(10):1913-23.
Peltonen L. Positional cloning of disease genes: advantages of genetic isolates. Hum Hered.
2000 Jan-Feb;50(1):66-75.
Peltonen L, Palotie A, Lange K. Use of population isolates for mapping complex traits. Nat
Rev Genet. 2000 Dec;1(3):182-90.
Pesonen E, Viikari J, Akerblom HK, Räsänen L, Louhivuori K, Sarna S. Geographic origin
of the family as a determinant of serum levels of lipids in Finnish children. Circulation.
1986 Jun;73(6):1119-26.
Pesonen E, Norio R, Hirvonen J, Karkola K, Kuusela V, Laaksonen H, Möttönen M, Nikkari
T, Raekallio J, Viikari J, et al. Intimal thickening in the coronary arteries of infants and
children as an indicator of risk factors for coronary heart disease. Eur Heart J. 1990 Aug;11
Suppl E:53-60.
Pettersson G. 2005. Svenska språket under sjuhundra år: en historia om svenskan och dess
utforskande. 2nd ed. Studentlitteratur, Lund. 2005; 290 p.
Pettitt P, Niskanen M. Neanderthals in Susiluola cave, Finland, during the last interglacial
period? Fennoscandia archaeologica 2005;22:79–87.
Pickrell J, Pritchard J. Inference of population splits and mixtures from genome-wide allele
frequency data. Available from Nature Precedings 2012 http://hdl.handle.net/10101/
npre.2012.6956.1.
Pitkänen K. Suomen väestön historialliset kehityslinjat. In: Koskinen S, Martelin T, Notkola IL,
Notkola V, Pitkänen K, Jalovaara M, Mäenpää E, Ruokolainen A, Ryynänen M, Söderling
I (eds). Suomen väestö. Gaudeamus, Helsinki. 2007; 41-75.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components
analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006
Aug;38(8):904-9.
Price AL, Patterson N, Yu F, Cox DR, Waliszewska A, McDonald GJ, Tandon A, Schirmer C,
Neubauer J, Bedoya G, Duque C, Villegas A, Bortolini MC, Salzano FM, Gallo C, Mazzotti
G, Tello-Ruiz M, Riba L, Aguilar-Salinas CA, Canizales-Quinteros S, Menjivar M, Klitz W,
Henderson B, Haiman CA, Winkler C, Tusie-Luna T, Ruiz-Linares A, Reich D. A genomewide admixture map for Latino populations. Am J Hum Genet. 2007 Jun;80(6):1024-36.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de
Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007 Sep;81(3):559-75.
Puska P, Vartiainen E, Tuomilehto J, Salomaa V, Nissinen A. Changes in premature deaths
in Finland: successful long-term prevention of cardiovascular diseases. Bull World Health
Organ. 1998;76(4):419-25.
R Development Core Team. R: A language and environment for statistical computing
2.6.2. Available: http://www.R-project.org via the Internet. R Foundation for Statistical
Computing, Vienna. 2008; 2673 p.
Raitio M, Lindroos K, Laukkanen M, Pastinen T, Sistonen P, Sajantila A, Syvänen AC.
Y-chromosomal SNPs in Finno-Ugric-speaking populations analyzed by minisequencing
on microarrays. Genome Res. 2001 Mar;11(3):471-82.
References
127
Rankama T, Kankaanpää J. Eastern arrivals in post-glacial Lapland: the Sujala Site 10000
cal BP. Antiquity. 2008;88(318):884–99.
Read A, Donnai D. New Clinical Genetics. 2nd ed. Scion Publishing Limited, Banbury. 2011;
442 p.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH,
Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs M, Huang
J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L,
Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang
F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP,
Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME. Global variation in copy number
in the human genome. Nature. 2006 Nov 23;444(7118):444-54.
Reich D, Patterson N, De Jager PL, McDonald GJ, Waliszewska A, Tandon A, Lincoln RR,
DeLoa C, Fruhan SA, Cabre P, Bera O, Semana G, Kelly MA, Francis DA, Ardlie K, Khan
O, Cree BA, Hauser SL, Oksenberg JR, Hafler DA. A whole-genome admixture scan finds
a candidate locus for multiple sclerosis susceptibility. Nat Genet. 2005 Oct;37(10):1113-8.
Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science.
1996 Sep 13;273(5281):1516-7.
Roesdahl E, Sørensen PM. Viking culture. In: Helle K (ed). The Cambridge history of
Scandinavia. Volume I: Prehistory to 1520. Cambridge University Press, Cambridge.
2003; 121-46.
Rootsi S, Magri C, Kivisild T, Benuzzi G, Help H, Bermisheva M, Kutuev I, Barać L, Pericić
M, Balanovsky O, Pshenichnov A, Dion D, Grobei M, Zhivotovsky LA, Battaglia V, Achilli
A, Al-Zahery N, Parik J, King R, Cinnioğlu C, Khusnutdinova E, Rudan P, Balanovska E,
Scheffrahn W, Simonescu M, Brehm A, Goncalves R, Rosa A, Moisan JP, Chaventre A, Ferak
V, Füredi S, Oefner PJ, Shen P, Beckman L, Mikerezi I, Terzić R, Primorac D, CambonThomsen A, Krumina A, Torroni A, Underhill PA, Santachiara-Benerecetti AS, Villems R,
Semino O. Phylogeography of Y-chromosome haplogroup I reveals distinct domains of
prehistoric gene flow in europe. Am J Hum Genet. 2004 Jul;75(1):128-37.
Rootsi S, Zhivotovsky LA, Baldovic M, Kayser M, Kutuev IA, Khusainova R, Bermisheva
MA, Gubina M, Fedorova SA, Ilumäe AM, Khusnutdinova EK, Voevoda MI, Osipova LP,
Stoneking M, Lin AA, Ferak V, Parik J, Kivisild T, Underhill PA, Villems R. A counter-clockwise northern route of the Y-chromosome haplogroup N from Southeast Asia towards
Europe. Eur J Hum Genet. 2007 Feb;15(2):204-11.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW.
Genetic structure of human populations. Science. 2002 Dec 20;298(5602):2381-5.
Rosenblum EB, Novembre J. Ascertainment bias in spatially structured populations: a case
study in the eastern fence lizard. J Hered. 2007 Jul-Aug;98(4):331-6.
Rosengren A, Stegmayr B, Johansson I, Huhtasaari F, Wilhelmsen L. Coronary risk factors,
diet and vitamins as possible explanatory factors of the Swedish north-south gradient
in coronary disease: a comparison between two MONICA centres. J Intern Med. 1999
Dec;246(6):577-86.
Sajantila A, Lahermo P, Anttinen T, Lukka M, Sistonen P, Savontaus ML, Aula P, Beckman L,
Tranebjaerg L, Gedde-Dahl T, Issel-Tarver L, DiRienzo A, Pääbo S. Genes and languages
in Europe: an analysis of mitochondrial lineages. Genome Res. 1995 Aug;5(1):42-52.
128
References
Sajantila A, Pääbo S. Language replacement in Scandinavia. Nat Genet. 1995 Dec;11(4):359-60.
Sajantila A, Salem AH, Savolainen P, Bauer K, Gierig C, Pääbo S. Paternal and maternal DNA
lineages reveal a bottleneck in the founding of the Finnish population. Proc Natl Acad Sci
U S A. 1996 Oct 15;93(21):12035-9.
Salomaa V, Miettinen H, Kuulasmaa K, Niemelä M, Ketonen M, Vuorenmaa T, Lehto S,
Palomäki P, Mähönen M, Immonen-Räihä P, Arstila M, Kaarsalo E, Mustaniemi H, Torppa
J, Tuomilehto J, Puska P, Pyörälä K. Decline of coronary heart disease mortality in Finland
during 1983 to 1992: roles of incidence, recurrence, and case-fatality. The FINMONICA
MI Register Study. Circulation. 1996 Dec 15;94(12):3130-7.
Salomaa V, Rosamond W, Mähönen M. Decreasing mortality from acute myocardial infarction:
effect of incidence and prognosis. J Cardiovasc Risk. 1999 Apr;6(2):69-75.
Salomaa V, Ketonen M, Koukkunen H, Immonen-Räihä P, Jerkkola T, Kärjä-Koskenkari P,
Mähönen M, Niemelä M, Kuulasmaa K, Palomäki P, Mustonen J, Arstila M, Vuorenmaa
T, Lehtonen A, Lehto S, Miettinen H, Torppa J, Tuomilehto J, Kesäniemi YA, Pyörälä K.
Decline in out-of-hospital coronary heart disease deaths has contributed the main part to
the overall decline in coronary heart disease mortality rates among persons 35 to 64 years
of age in Finland: the FINAMI study. Circulation. 2003 Aug 12;108(6):691-6.
Salonen JT, Puska P, Mustaniemi H. Changes in morbidity and mortality during comprehensive
community programme to control cardiovascular diseases during 1972-7 in North Karelia.
Br Med J. 1979 Nov 10;2(6199):1178-83.
Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, Dixon RJ, Meitinger
T, Braund P, Wichmann HE, Barrett JH, König IR, Stevens SE, Szymczak S, Tregouet DA,
Iles MM, Pahlke F, Pollard H, Lieb W, Cambien F, Fischer M, Ouwehand W, Blankenberg
S, Balmforth AJ, Baessler A, Ball SG, Strom TM, Braenne I, Gieger C, Deloukas P, Tobin
MD, Ziegler A, Thompson JR, Schunkert H; WTCCC and the Cardiogenics Consortium.
Genomewide association analysis of coronary artery disease. N Engl J Med. 2007 Aug
2;357(5):443-53.
Sammallahti P. Saamelaisten esihistoriallinen tausta kielitieteen valossa. In: Gallén J (ed).
Suomen väestön esihistorialliset juuret. Bidrag till kännedom av Finlands natur och folk
131. Finska Vetenskaps-Societeten, Helsinki. 1984; 137-56.
Samuelsson J. Vasatiden 1521-1611. In: Larsson HA (ed). Boken om Sveriges historia. Forum,
Stockholm. 1999; 101-32.
Sans S, Kesteloot H, Kromhout D. The burden of cardiovascular diseases mortality in Europe.
Task Force of the European Society of Cardiology on Cardiovascular Mortality and Morbidity
Statistics in Europe. Eur Heart J. 1997 Dec;18(12):1231-48.
Sarantaus L, Huusko P, Eerola H, Launonen V, Vehmanen P, Rapakko K, Gillanders E,
Syrjäkoski K, Kainu T, Vahteristo P, Krahe R, Pääkkönen K, Hartikainen J, Blomqvist
C, Löppönen T, Holli K, Ryynänen M, Bützow R, Borg A, Wasteson Arver B, Holmberg
E, Mannermaa A, Kere J, Kallioniemi OP, Winqvist R, Nevanlinna H. Multiple founder
effects and geographical clustering of BRCA1 and BRCA2 families in Finland. Eur J Hum
Genet. 2000 Oct;8(10):757-63.
Sarti C, Vartiainen E, Torppa J, Tuomilehto J, Puska P. Trends in cerebrovascular mortality
and in its risk factors in Finland during the last 20 years. Health Rep. 1994;6(1):196-206.
References
129
Sawyer B, Sawyer P. Scandinavia enters Christian Europe. In: Helle K (ed). The Cambridge history of Scandinavia. Volume I: Prehistory to 1520. Cambridge University Press, Cambridge.
2003; 147-59.
Sawyer P. The Viking expansion. In: Helle K (ed). The Cambridge history of Scandinavia.
Volume I: Prehistory to 1520. Cambridge University Press, Cambridge. 2003; 105-20.
Schulz HP. The Susiluola Cave site in Western Finland - Evidence of the Northernmost
Middle Palaeolithic Settlement in Europe. In: Burdukiewicz JM, Wisniewski A (eds).
Middle Palaeolithic Human Activity and Palaeoecology: New Discoveries and Ideas. Studia
Archeologiczne. 2010; 41:47–67.
Schunkert H, König IR, Kathiresan S, Reilly MP, Assimes TL, Holm H, Preuss M, Stewart AF,
Barbalic M, Gieger C, Absher D, Aherrahrou Z, Allayee H, Altshuler D, Anand SS, Andersen
K, Anderson JL, Ardissino D, Ball SG, Balmforth AJ, Barnes TA, Becker DM, Becker
LC, Berger K, Bis JC, Boekholdt SM, Boerwinkle E, Braund PS, Brown MJ, Burnett MS,
Buysschaert I; Cardiogenics, Carlquist JF, Chen L, Cichon S, Codd V, Davies RW, Dedoussis
G, Dehghan A, Demissie S, Devaney JM, Diemert P, Do R, Doering A, Eifert S, Mokhtari
NE, Ellis SG, Elosua R, Engert JC, Epstein SE, de Faire U, Fischer M, Folsom AR, Freyer
J, Gigante B, Girelli D, Gretarsdottir S, Gudnason V, Gulcher JR, Halperin E, Hammond
N, Hazen SL, Hofman A, Horne BD, Illig T, Iribarren C, Jones GT, Jukema JW, Kaiser
MA, Kaplan LM, Kastelein JJ, Khaw KT, Knowles JW, Kolovou G, Kong A, Laaksonen
R, Lambrechts D, Leander K, Lettre G, Li M, Lieb W, Loley C, Lotery AJ, Mannucci PM,
Maouche S, Martinelli N, McKeown PP, Meisinger C, Meitinger T, Melander O, Merlini PA,
Mooser V, Morgan T, Mühleisen TW, Muhlestein JB, Münzel T, Musunuru K, Nahrstaedt
J, Nelson CP, Nöthen MM, Olivieri O, Patel RS, Patterson CC, Peters A, Peyvandi F, Qu L,
Quyyumi AA, Rader DJ, Rallidis LS, Rice C, Rosendaal FR, Rubin D, Salomaa V, Sampietro
ML, Sandhu MS, Schadt E, Schäfer A, Schillert A, Schreiber S, Schrezenmeir J, Schwartz
SM, Siscovick DS, Sivananthan M, Sivapalaratnam S, Smith A, Smith TB, Snoep JD,
Soranzo N, Spertus JA, Stark K, Stirrups K, Stoll M, Tang WH, Tennstedt S, Thorgeirsson
G, Thorleifsson G, Tomaszewski M, Uitterlinden AG, van Rij AM, Voight BF, Wareham NJ,
Wells GA, Wichmann HE, Wild PS, Willenborg C, Witteman JC, Wright BJ, Ye S, Zeller T,
Ziegler A, Cambien F, Goodall AH, Cupples LA, Quertermous T, März W, Hengstenberg
C, Blankenberg S, Ouwehand WH, Hall AS, Deloukas P, Thompson JR, Stefansson K,
Roberts R, Thorsteinsdottir U, O’Donnell CJ, McPherson R, Erdmann J; CARDIoGRAM
Consortium, Samani NJ. Large-scale association analysis identifies 13 new susceptibility
loci for coronary artery disease. Nat Genet. 2011 Mar 6;43(4):333-8.
Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Månér S, Massa H, Walker M,
Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson
N, Zetterberg A, Wigler M. Large-scale copy number polymorphism in the human genome.
Science. 2004 Jul 23;305(5683):525-8.
Seldin MF, Shigeta R, Villoslada P, Selmi C, Tuomilehto J, Silva G, Belmont JW, Klareskog
L, Gregersen PK. European population substructure: clustering of northern and southern
populations. PLoS Genet. 2006 Sep 15;2(9):e143.
Service S, DeYoung J, Karayiorgou M, Roos JL, Pretorious H, Bedoya G, Ospina J, RuizLinares A, Macedo A, Palha JA, Heutink P, Aulchenko Y, Oostra B, van Duijn C, Jarvelin
MR, Varilo T, Peddle L, Rahman P, Piras G, Monne M, Murray S, Galver L, Peltonen L,
Sabatti C, Collins A, Freimer N. Magnitude and distribution of linkage disequilibrium
in population isolates and implications for genome-wide association studies. Nat Genet.
2006 May;38(5):556-60.
130
References
Service S; International Collaborative Group on Isolated Populations, Sabatti C, Freimer
N. Tag SNPs chosen from HapMap perform well in several population isolates. Genet
Epidemiol. 2007 Apr;31(3):189-94.
Shaper AG, Elford J. Place of birth and adult cardiovascular disease: the British Regional
Heart Study. Acta Paediatr Scand Suppl. 1991;373:73-81.
Siiriäinen A. The Stone and Bronze Ages. In: Helle K (ed). The Cambridge history of Scandinavia.
Volume I: Prehistory to 1520. Cambridge University Press, Cambridge. 2003a; 43-59.
Siiriäinen A. Esihistoria. In: Zetterberg S (ed). Suomen historian pikkujättiläinen. WSOY,
Porvoo. 2003b; 14-43.
Similä M, Taskinen O, Männistö S, Lahti-Koski M, Karvonen M, Laatikainen T, Valsta L. Terveyttä
edistävä ruokavalio, lihavuus ja seerumin kolesteroli karttoina. Kansanterveyslaitoksen
julkaisuja B20/2005. Kansanterveyslaitos, Helsinki. 2005; 36 p.
Sirén MK, Sareneva H, Lokki ML, Koskimies S. Unique HLA antigen frequencies in the Finnish
population. Tissue Antigens. 1996 Dec;48(6):703-7.
Sistonen P, Virtaranta-Knowles K, Denisova R, Kucinskas V, Ambrasiene D, Beckman L. The
LWb blood group as a marker of prehistoric Baltic migrations and admixture. Hum Hered.
1999 Jun;49(3):154-8.
Skoglund P, Malmström H, Raghavan M, Storå J, Hall P, Willerslev E, Gilbert MT, Götherström
A, Jakobsson M. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in
Europe. Science. 2012 Apr 27;336(6080):466-9.
Slatkin M. Linkage disequilibrium in growing and stable populations. Genetics. 1994
May;137(1):331-6.
Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, Waliszewska A,
Kessing BD, Malasky MJ, Scafe C, Le E, De Jager PL, Mignault AA, Yi Z, De The G, Essex M,
Sankale JL, Moore JH, Poku K, Phair JP, Goedert JJ, Vlahov D, Williams SM, Tishkoff SA,
Winkler CA, De La Vega FM, Woodage T, Sninsky JJ, Hafler DA, Altshuler D, Gilbert DA,
O’Brien SJ, Reich D. A high-density admixture map for disease gene discovery in African
Americans. Am J Hum Genet. 2004 May;74(5):1001-13.
Smith MW, O’Brien SJ. Mapping by admixture linkage disequilibrium: advances, limitations
and guidelines. Nat Rev Genet. 2005 Aug;6(8):623-32.
Stadin K. Stormaktstiden 1611-1718. In: Larsson HA (ed). Boken om Sveriges historia. Forum,
Stockholm. 1999; 133-80.
Statistics Finland. http://www.stat.fi/ Accessed in April 2012.
Statistics Sweden. http://www.scb.se/ Accessed in May 2012.
Stephens JC, Briscoe D, O’Brien SJ. Mapping by admixture linkage disequilibrium in human
populations: limits and guidelines. Am J Hum Genet. 1994 Oct;55(4):809-24.
Strachan T, Read A. Human Molecular Genetics. 4th ed. Garland Science, Taylor & Francis
Group, New York. 2011; 781 p.
Sundell T, Heger M, Kammonen J, Onkamo P. Modelling a neolithic population bottleneck
in Finland: a genetic simulation. Fennoscandia Archaeologica. 2010;27:3-19.
Takala H. The Ristola Site in Lahti and the Earliest Postglacial Settlement of South Finland.
Lahti City Museum, Lahti. 2004; 205 p.
References
131
Tallavaara M, Pesonen P, Oinonen M. Prehistoric population history in eastern Fennoscandia.
J Archaeological Sci. 2010 Feb;37(2):251-60.
Tambets K, Rootsi S, Kivisild T, Help H, Serk P, Loogväli EL, Tolk HV, Reidla M, Metspalu E,
Pliss L, Balanovsky O, Pshenichnov A, Balanovska E, Gubina M, Zhadanov S, Osipova L,
Damba L, Voevoda M, Kutuev I, Bermisheva M, Khusnutdinova E, Gusar V, Grechanina
E, Parik J, Pennarun E, Richard C, Chaventre A, Moisan JP, Barác L, Pericić M, Rudan P,
Terzić R, Mikerezi I, Krumina A, Baumanis V, Koziel S, Rickards O, De Stefano GF, Anagnou
N, Pappa KI, Michalodimitrakis E, Ferák V, Füredi S, Komel R, Beckman L, Villems R. The
western and eastern roots of the Saami--the story of genetic “outliers” told by mitochondrial
DNA and Y chromosomes. Am J Hum Genet. 2004 Apr;74(4):661-82.
Teo SM, Ku CS, Naidoo N, Hall P, Chia KS, Salim A, Pawitan Y. A population-based study of
copy number variants and regions of homozygosity in healthy Swedish individuals. J Hum
Genet. 2011 Jul;56(7):524-33.
Terwilliger JD, Zöllner S, Laan M, Pääbo S. Mapping genes through the use of linkage disequilibrium generated by genetic drift: ‘drift mapping’ in small populations with no demographic
expansion. Hum Hered. 1998 May-Jun;48(3):138-54.
Tian C, Hinds DA, Shigeta R, Adler SG, Lee A, Pahl MV, Silva G, Belmont JW, Hanson RL,
Knowler WC, Gregersen PK, Ballinger DG, Seldin MF. A genomewide single-nucleotide-polymorphism panel for Mexican American admixture mapping. Am J Hum Genet.
2007 Jun;80(6):1014-23.
Tian C, Kosoy R, Nassir R, Lee A, Villoslada P, Klareskog L, Hammarström L, Garchon HJ,
Pulver AE, Ransom M, Gregersen PK, Seldin MF. European population genetic substructure: further definition of ancestry informative markers for distinguishing among diverse
European ethnic groups. Mol Med. 2009 Nov-Dec;15(11-12):371-83.
Tienari PJ, Sumelahti ML, Rantamäki T, Wikström J. Multiple sclerosis in western Finland:
evidence for a founder effect. Clin Neurol Neurosurg. 2004 Jun;106(3):175-9.
Tillmar AO, Coble MD, Wallerström T, Holmlund G. Homogeneity in mitochondrial DNA
control region sequences in Swedish subpopulations. Int J Legal Med. 2010 Mar;124(2):91-8.
Tillmar AO. Population genetic analysis of 12 X-STRs in Swedish population. Forensic Sci
Int Genet. 2012 Mar;6(2):e80-1.
Toiviainen P, Eerola T. Musiikkitietokannat kansanmusiikin tutkimuksessa Musiikin suunta.
2005;2:5-11.
Torroni A, Huoponen K, Francalacci P, Petrozzi M, Morelli L, Scozzari R, Obinu D, Savontaus
ML, Wallace DC. Classification of European mtDNAs from an analysis of three European
populations. Genetics. 1996 Dec;144(4):1835-50.
Torroni A, Achilli A, Macaulay V, Richards M, Bandelt HJ. Harvesting the fruit of the human
mtDNA tree. Trends Genet. 2006 Jun;22(6):339-45.
Trégouët DA, König IR, Erdmann J, Munteanu A, Braund PS, Hall AS, Grosshennig A, LinselNitschke P, Perret C, DeSuremain M, Meitinger T, Wright BJ, Preuss M, Balmforth AJ, Ball
SG, Meisinger C, Germain C, Evans A, Arveiler D, Luc G, Ruidavets JB, Morrison C, van der
Harst P, Schreiber S, Neureuther K, Schäfer A, Bugert P, El Mokhtari NE, Schrezenmeir J,
Stark K, Rubin D, Wichmann HE, Hengstenberg C, Ouwehand W; Wellcome Trust Case
Control Consortium; Cardiogenics Consortium, Ziegler A, Tiret L, Thompson JR, Cambien
132
References
F, Schunkert H, Samani NJ. Genome-wide haplotype association study identifies the
SLC22A3-LPAL2-LPA gene cluster as a risk locus for coronary artery disease. Nat Genet.
2009 Mar;41(3):283-5.
Tunstall-Pedoe H, Kuulasmaa K, Mähönen M, Tolonen H, Ruokokoski E, Amouyel P.
Contribution of trends in survival and coronary-event rates to changes in coronary heart
disease mortality: 10-year results from 37 WHO MONICA project populations. Monitoring
trends and determinants in cardiovascular disease. Lancet. 1999 May 8;353(9164):1547-57.
Tunstall-Pedoe H, Vanuzzo D, Hobbs M, Mähönen M, Cepaitis Z, Kuulasmaa K, Keil U.
Estimation of contribution of changes in coronary care to improving survival, event rates,
and coronary heart disease mortality across the WHO MONICA Project populations. Lancet.
2000 Feb 26;355(9205):688-700.
Tuomilehto J, Arstila M, Kaarsalo E, Kankaanpää J, Ketonen M, Kuulasmaa K, Lehto S,
Miettinen H, Mustaniemi H, Palomäki P, et al. Acute myocardial infarction (AMI) in
Finland--baseline data from the FINMONICA AMI register in 1983-1985. Eur Heart J.
1992 May;13(5):577-87.
Turakulov R, Easteal S. Number of SNPS loci needed to detect population structure. Hum
Hered. 2003;55(1):37-45.
Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson
D, Pinkel D, Olson MV, Eichler EE. Fine-scale structural variation of the human genome.
Nat Genet. 2005 Jul;37(7):727-32.
Tyynelä P, Goebeler S, Ilveskoski E, Mikkelsson J, Perola M, Löytönen M, Karhunen PJ.
Birthplace predicts risk for prehospital sudden cardiac death in middle-aged men who
migrated to metropolitan area: The Helsinki Sudden Death Study. Ann Med. 2009;41(1):57-65.
Tyynelä P, Goebeler S, Ilveskoski E, Mikkelsson J, Perola M, Löytönen M, Karhunen PJ.
Birthplace in area with high coronary heart disease mortality predicts the severity of coronary atherosclerosis among middle-aged Finnish men who had migrated to capital area:
the Helsinki sudden death study. Ann Med. 2010 May 6;42(4):286-95.
Uimari P, Kontkanen O, Visscher PM, Pirskanen M, Fuentes R, Salonen JT. Genome-wide
linkage disequilibrium from 100,000 SNPs in the East Finland founder population. Twin
Res Hum Genet. 2005 Jun;8(3):185-97.
Vahtola J. Keskiaika. In: Zetterberg S (ed). Suomen historian pikkujättiläinen. WSOY, Porvoo.
2003a; 44-137.
Vahtola J. Population and settlement. In: Helle K (ed). The Cambridge history of Scandinavia.
Volume I: Prehistory to 1520. Cambridge University Press, Cambridge. 2003b; 559-80.
Varilo T, Savukoski M, Norio R, Santavuori P, Peltonen L, Järvelä I. The age of human
mutation: genealogical and linkage disequilibrium analysis of the CLN5 mutation in the
Finnish population. Am J Hum Genet. 1996a Mar;58(3):506-12.
Varilo T, Nikali K, Suomalainen A, Lönnqvist T, Peltonen L. Tracing an ancestral mutation:
genealogical and haplotype analysis of the infantile onset spinocerebellar ataxia locus.
Genome Res. 1996b Sep;6(9):870-5.
Varilo T. The age of the mutations in the Finnish disease heritage; a genealogical and linkage
disequilibrium study. PhD thesis. Publications of the National Public Health Institute, KTL
A21/1999. National Public Health Institute, Helsinki. 1999; 98 p.
References
133
Varilo T, Paunio T, Parker A, Perola M, Meyer J, Terwilliger JD, Peltonen L. The interval of
linkage disequilibrium (LD) detected with microsatellite and SNP markers in chromosomes
of Finnish populations with different histories. Hum Mol Genet. 2003 Jan 1;12(1):51-9.
Varilo T, Peltonen L. Isolates and their potential use in complex gene mapping efforts. Curr
Opin Genet Dev. 2004 Jun;14(3):316-23.
Vartiainen E, Puska P, Pekkanen J, Tuomilehto J, Jousilahti P. Changes in risk factors
explain changes in mortality from ischaemic heart disease in Finland. BMJ. 1994 Jul
2;309(6946):23-7.
Vartiainen E, Laatikainen T, Peltonen M, Juolevi A, Männistö S, Sundvall J, Jousilahti P,
Salomaa V, Valsta L, Puska P. Thirty-five-year trends in cardiovascular risk factors in
Finland. Int J Epidemiol. 2010 Apr;39(2):504-18.
Venken T, Claes S, Sluijs S, Paterson AD, van Duijn C, Adolfsson R, Del-Favero J, Van
Broeckhoven C. Genomewide scan for affective disorder susceptibility loci in families of a
northern Swedish isolated population. Am J Hum Genet. 2005 Feb;76(2):237-48.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans
CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q,
Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor
Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ,
Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo
D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert
K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M,
Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck
K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins
ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV,
Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg
S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C,
Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W,
Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F,
An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver
A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K,
Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun
S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S,
Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy
B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers
YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E,
Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S,
Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV,
Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan
A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen
D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne
M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H,
Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil
J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C,
Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman
M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott
134
References
J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu
D, Wu M, Xia A, Zandieh A, Zhu X. The sequence of the human genome. Science. 2001
Feb 16;291(5507):1304-51.
Vihavainen T. Hyvinvointi-Suomi. In: Zetterberg S (ed). Suomen historian pikkujättiläinen.
WSOY, Porvoo. 2003; 832-915.
Vikor LS. The Nordic language area and the languages in the north of Europe. In: Bandle O,
Braunmüller K, Jahr EH, Karker A, Naumann HP, Teleman U, Elmevik L, Widmark G (eds).
The Nordic languages: an international handbook of the history of the North Germanic
languages. Walter de Gruyter, Berlin. 2002; 1-12.
Vilkki J, Savontaus ML, Nikoskelainen EK. Human mitochondrial DNA types in Finland. Hum
Genet. 1988 Dec;80(4):317-21.
Virtaranta-Knowles K, Sistonen P, Nevanlinna HR. A population genetic study in Finland: comparison of the Finnish- and Swedish-speaking populations. Hum Hered. 1991;41(4):248-64.
Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum
Genet. 2012 Jan 13;90(1):7-24.
Vuorela I. Viljelytoiminnan alku Suomessa paleoekologisen tutkimuksen kohteena. In:
Fogelberg P (ed). Pohjan poluilla: suomalaisten juuret nykytutkimuksen mukaan. Bidrag
till kännedom av Finlands natur och folk 153. Finska Vetenskaps-Societeten, Helsinki.
1999; 143-51.
Wakeley J, Nielsen R, Liu-Cordero SN, Ardlie K. The discovery of single-nucleotide polymorphisms--and inferences about human demographic history. Am J Hum Genet. 2001
Dec;69(6):1332-47.
Weber JL, May PE. Abundant class of human DNA polymorphisms which can be typed using
the polymerase chain reaction. Am J Hum Genet. 1989 Mar;44(3):388-96.
Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of
seven common diseases and 3,000 shared controls. Nature. 2007 Jun 7;447(7145):661-78.
Wiik K. Eurooppalaisten juuret. Atena, Jyväskylä. 2002; 503 p.
Wild PS, Zeller T, Schillert A, Szymczak S, Sinning CR, Deiseroth A, Schnabel RB, Lubos
E, Keller T, Eleftheriadis MS, Bickel C, Rupprecht HJ, Wilde S, Rossmann H, Diemert P,
Cupples LA, Perret C, Erdmann J, Stark K, Kleber ME, Epstein SE, Voight BF, Kuulasmaa K,
Li M, Schäfer AS, Klopp N, Braund PS, Sager HB, Demissie S, Proust C, König IR, Wichmann
HE, Reinhard W, Hoffmann MM, Virtamo J, Burnett MS, Siscovick D, Wiklund PG, Qu L,
El Mokthari NE, Thompson JR, Peters A, Smith AV, Yon E, Baumert J, Hengstenberg C,
März W, Amouyel P, Devaney J, Schwartz SM, Saarela O, Mehta NN, Rubin D, Silander
K, Hall AS, Ferrieres J, Harris TB, Melander O, Kee F, Hakonarson H, Schrezenmeir J,
Gudnason V, Elosua R, Arveiler D, Evans A, Rader DJ, Illig T, Schreiber S, Bis JC, Altshuler D,
Kavousi M, Witteman JC, Uitterlinden AG, Hofman A, Folsom AR, Barbalic M, Boerwinkle
E, Kathiresan S, Reilly MP, O’Donnell CJ, Samani NJ, Schunkert H, Cambien F, Lackner
KJ, Tiret L, Salomaa V, Munzel T, Ziegler A, Blankenberg S. A genome-wide association
study identifies LIPA as a susceptibility gene for coronary artery disease. Circ Cardiovasc
Genet. 2011 Aug 1;4(4):403-12.
Wilhelmsen L, Rosengren A, Johansson S, Lappas G. Coronary heart disease attack rate, incidence and mortality 1975-1994 in Göteborg, Sweden. Eur Heart J. 1997 Apr;18(4):572-81.
References
135
Willer CJ, Scott LJ, Bonnycastle LL, Jackson AU, Chines P, Pruim R, Bark CW, Tsai YY, Pugh
EW, Doheny KF, Kinnunen L, Mohlke KL, Valle TT, Bergman RN, Tuomilehto J, Collins FS,
Boehnke M. Tag SNP selection for Finnish individuals based on the CEPH Utah HapMap
database. Genet Epidemiol. 2006 Feb;30(2):180-90.
Willerslev E, Cooper A. Ancient DNA. Proc Biol Sci. 2005 Jan 7;272(1558):3-16.
Winkler CA, Nelson GW, Smith MW. Admixture mapping comes of age. Annu Rev Genomics
Hum Genet. 2010 Sep 22;11:65-89.
Wray NR. Allele frequencies and the r2 measure of linkage disequilibrium: impact on design
and interpretation of association studies. Twin Res Hum Genet. 2005 Apr;8(2):87-94.
Xu S, Jin L. A genome-wide analysis of admixture in Uyghurs and a high-density admixture
map for disease-gene discovery. Am J Hum Genet. 2008 Sep;83(3):322-36.
Zerjal T, Dashnyam B, Pandya A, Kayser M, Roewer L, Santos FR, Schiefenhövel W, Fretwell N,
Jobling MA, Harihara S, Shimizu K, Semjidmaa D, Sajantila A, Salo P, Crawford MH, Ginter
EK, Evgrafov OV, Tyler-Smith C. Genetic relationships of Asians and Northern Europeans,
revealed by Y-chromosomal DNA analysis. Am J Hum Genet. 1997 May;60(5):1174-83.
Zerjal T, Beckman L, Beckman G, Mikelsaar AV, Krumina A, Kucinskas V, Hurles ME,
Tyler-Smith C. Geographical, linguistic, and cultural influences on genetic diversity:
Y-chromosomal distribution in Northern European populations. Mol Biol Evol. 2001
Jun;18(6):1077-87.
Zhang B, Kirov S, Snoddy J. WebGestalt: an integrated system for exploring gene sets in
various biological contexts. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W741-8.
Zhu X, Luke A, Cooper RS, Quertermous T, Hanis C, Mosley T, Gu CC, Tang H, Rao DC, Risch
N, Weder A. Admixture mapping for hypertension loci with genome-scan markers. Nat
Genet. 2005 Feb;37(2):177-81.
Zhu X, Tang H, Risch N. Admixture mapping and the role of population structure for localizing
disease genes. Adv Genet. 2008;60:547-69.
136
References