Genome scan methods against more realistic models

When and how much should we trust them?
Pierre de Villemereuil, Éric Frichot, Éric Bazin, Olivier François & Oscar Gaggiotti
Laboratoire d’écologie alpine
LECA (Grenoble, France)
Software and Statistical Methods for Population Genetics (SSMPG)
Aussois, June 20, 2013
Pierre de Villemereuil (LECA)
Introduction
Some context...
What are genome scan methods?
Using medium- to high-density markers (AFLPs, SNPs...)
Looking for signatures of selection
Using test procedures to control for power and false positive rate
Context: non-model species
Very little information about the genome: coding regions, genetic map...
Little room for functional genetic experiments
Little information about population structure
Why “more realistic” models?
A lot of published studies...
Issues not tackled...
More difficult models:
Highly hierarchical structure
Correlation between demography and environment
Polygenic selection
Two kinds of methods
FST-based models: detecting outlier loci based on the distribution of locus-specific FST
Bayescan (Foll & Gaggiotti, 2008)
Association models: detecting associations between genotypes (or population allele frequencies) and environmental variables
Linear regression
BayEnv (Coop et al., 2010)
LFMM (Frichot et al., 2012)
Comparison of 4 methods
Among the most recent and currently used
Diversity of approaches (underlying model, ...)
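To make the two families concrete, here is a minimal sketch of the raw per-locus quantities they start from, for one biallelic SNP: a locus-specific FST computed from population allele frequencies, and the correlation of those frequencies with an environmental variable, tested by permutation. It is an illustration under simplifying assumptions (unweighted frequencies, no sample-size correction), not the model of any of the four compared methods: Bayescan, BayEnv and LFMM each add an explicit model of neutral population structure, which this sketch ignores. All function names are hypothetical.

```python
import random


def locus_fst(freqs):
    """Simple FST for one biallelic locus from population allele frequencies.

    FST = Var(p) / (p_bar * (1 - p_bar)), with the unweighted variance across
    populations (no sample-size correction, unlike Weir & Cockerham's estimator).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    if p_bar in (0.0, 1.0):
        return 0.0  # monomorphic locus: no differentiation to measure
    var_p = sum((p - p_bar) ** 2 for p in freqs) / n
    return var_p / (p_bar * (1.0 - p_bar))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def association_pvalue(freqs, env, n_perm=999, seed=0):
    """Two-sided permutation p-value for the frequency-environment correlation.

    Shuffling the environment across populations assumes populations are
    exchangeable under the null, i.e. it ignores shared ancestry.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(freqs, env))
    shuffled = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(freqs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The structure-aware methods essentially replace the naive null of this permutation test (exchangeable populations) with one that accounts for how populations are related.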
Models & Statistics
Fission model
[Figure: fission model of the simulated populations, with time points T = 50, 150, 200, 300 and 500]
Strong multi-layer hierarchical structure!
Parameters of the model
Demo-genetic parameters
16 populations of 500 individuals (sampling: 20 individuals)
5,000 SNPs over 10 chromosomes
Migration rate m = 0.00045 ⇒ FST = 0.1
Two kinds of selection: monogenic (1 locus) or polygenic (50 loci)
Environment
Two kinds of environment:
Random and correlated with demography: at each fission, the new environmental value $E_i$ of a daughter population is drawn from a uniform distribution centred on the mother population's value $E_0$:
$E_1 \sim \mathcal{U}(E_0 - 0.5,\ E_0 + 0.5)$
$E_2 \sim \mathcal{U}(E_0 - 0.5,\ E_0 + 0.5)$
Clinal gradient along the 16 populations
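The two environments can be sketched as follows, assuming for simplicity four synchronous rounds of binary fission (2^4 = 16 populations) starting from a root value of 0. The actual timing and topology of the simulated tree may differ, the cline's end points are arbitrary, and the function names and defaults are hypothetical.

```python
import random


def fission_environment(n_rounds=4, e_root=0.0, half_width=0.5, seed=None):
    """Environmental values after successive binary fissions (hypothetical scheme).

    In each round every population splits into two daughters; each daughter's
    value is drawn from U(E0 - half_width, E0 + half_width), where E0 is the
    mother's value. Sister populations share most of their history, so the
    environment ends up correlated with the population tree (i.e. demography).
    """
    rng = random.Random(seed)
    values = [e_root]
    for _ in range(n_rounds):
        values = [rng.uniform(e0 - half_width, e0 + half_width)
                  for e0 in values
                  for _daughter in range(2)]
    return values


def clinal_environment(n_pops=16, low=-1.0, high=1.0):
    """Evenly spaced environmental gradient along the populations (arbitrary range)."""
    step = (high - low) / (n_pops - 1)
    return [low + i * step for i in range(n_pops)]
```

With `n_rounds=4` and `half_width=0.5`, no value can drift more than 2 units from the root, and populations adjacent in the returned list are sisters in the tree.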
Statistical comparison of the methods
False Discovery Rate and q-values
FDR: proportion of false positives among positive results
Linked to power and to the false positive rate
Used to control for multiple testing (q-values)
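The link between FDR, power and false positive rate can be made explicit. Assuming $m_0$ neutral and $m_1$ selected loci, a per-locus false positive rate FPR and a power $P$, the expected numbers of false and true positives are $m_0\,\mathrm{FPR}$ and $m_1 P$; approximating the FDR by the ratio of these expectations gives:

```latex
\mathrm{FDR} \approx \frac{m_0\,\mathrm{FPR}}{m_0\,\mathrm{FPR} + m_1\,P}
```

With illustrative values for the monogenic scenario ($m_1 = 1$ among 5,000 SNPs, so $m_0 = 4{,}999$), perfect power ($P = 1$) and $\mathrm{FPR} = 3/4{,}999 \approx 0.06\%$, i.e. three neutral SNPs called, this gives $\mathrm{FDR} \approx 3/(3 + 1) = 75\%$: when very few loci are under selection, even a tiny false positive rate produces a high FDR.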
A unified framework
Different test statistics across methods (p-values, Bayes factors, posterior probabilities...)
All were transformed into q-values ⇒ unified multiple-testing framework
As for p-values, a q-value threshold should correspond to the nominal FDR
e.g. α_q = 0.1 ⇒ FDR = 10%
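The slides do not specify how each statistic was converted. As a sketch, one can assume Benjamini-Hochberg adjusted p-values for frequentist outputs (equal to Storey's q-value with the proportion of null loci, π0, set to 1) and, for posterior probabilities, the expected FDR of the list obtained by accepting every locus ranked at or above a given one. Bayes factors would first need prior odds to become posterior probabilities (posterior odds = Bayes factor × prior odds). Function names are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, used here as q-values (pi0 = 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def posterior_qvalues(post_probs):
    """q-values from posterior probabilities of being under selection.

    Loci are ranked by decreasing posterior probability; the q-value of a locus
    is the mean posterior probability of NOT being under selection among all
    loci ranked at or above it, i.e. the expected FDR if it were the last
    locus accepted.
    """
    m = len(post_probs)
    order = sorted(range(m), key=lambda i: post_probs[i], reverse=True)
    q = [0.0] * m
    cumulative = 0.0
    for rank, i in enumerate(order, start=1):
        cumulative += 1.0 - post_probs[i]
        q[i] = cumulative / rank
    return q
```

Calling every locus with q ≤ α_q = 0.1 then aims at a nominal FDR of 10%, whichever method produced the underlying statistic.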
Results
Monogenic selection – False Discovery Rate
[Figure: realized False Discovery Rate (0 to 1) against the alpha threshold (0 to 0.25), panels Random Correlated and Cline, for Regression, Bayescan, BayEnv and LFMM]
We want FDR and threshold to be equal (black line)
Very high FDR!
Although: 75% FDR = 4 SNPs
Little difference between methods
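The realized FDR plotted against the nominal threshold can be computed from q-values and the known simulation truth; the "75% FDR = 4 SNPs" remark follows from FDR = FP / (FP + TP). A minimal sketch, assuming one Boolean flag per locus marking whether it was simulated under selection and that the single selected locus is among those called; function names are hypothetical.

```python
def realized_fdr(qvalues, is_selected, alpha):
    """Observed FDR when all loci with q <= alpha are called outliers.

    Returns 0.0 when nothing is called (a common convention).
    """
    called = [sel for q, sel in zip(qvalues, is_selected) if q <= alpha]
    if not called:
        return 0.0
    false_pos = sum(1 for sel in called if not sel)
    return false_pos / len(called)


def n_called_at_fdr(fdr, true_pos):
    """Number of loci called when `true_pos` true positives yield the given FDR.

    From FDR = FP / (FP + TP):  FP + TP = TP / (1 - FDR).
    """
    return true_pos / (1.0 - fdr)


def fdr_curve(qvalues, is_selected, alphas):
    """(threshold, realized FDR) pairs: one curve of the FDR-versus-threshold plot."""
    return [(a, realized_fdr(qvalues, is_selected, a)) for a in alphas]
```

For example, `n_called_at_fdr(0.75, 1)` returns `4.0`: with one truly selected locus, a 75% FDR means 4 SNPs called, 3 of them false positives.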
Monogenic selection – Power
[Figure: power (0 to 1) against realized False Discovery Rate (0 to 1), panels Random Correlated and Cline, for Regression, Bayescan, BayEnv and LFMM]
We want high power for low FDR (highest lines)
Less power for the random correlated environment
Regression behaves poorly
All the other methods are similar
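A power-versus-FDR curve of this kind can be traced by sweeping the threshold over the observed q-values and recording, at each cut-off, the realized FDR and the fraction of selected loci recovered. A minimal sketch under the same assumption as before (one Boolean "simulated under selection" flag per locus); function names are hypothetical.

```python
def power_at(qvalues, is_selected, alpha):
    """Power: fraction of truly selected loci called at threshold alpha."""
    n_true = sum(is_selected)
    tp = sum(1 for q, sel in zip(qvalues, is_selected) if sel and q <= alpha)
    return tp / n_true if n_true else 0.0


def power_vs_fdr(qvalues, is_selected):
    """(realized FDR, power) pairs, one per distinct q-value used as cut-off.

    Cut-offs go from most to least stringent; every cut-off is an observed
    q-value, so at least one locus is always called.
    """
    points = []
    for alpha in sorted(set(qvalues)):
        called = [sel for q, sel in zip(qvalues, is_selected) if q <= alpha]
        tp = sum(called)  # True counts as 1
        fdr = (len(called) - tp) / len(called)
        points.append((fdr, power_at(qvalues, is_selected, alpha)))
    return points
```

A better method reaches a given power at a lower FDR, which is why the highest curves are preferred.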
Polygenic selection – False Discovery Rate
[Figure: realized False Discovery Rate (0 to 1) against the alpha threshold (0 to 0.25), panels Random Correlated and Cline, for Regression, Bayescan, BayEnv and LFMM]
We want FDR and threshold to be equal (black line)
BayEnv and Bayescan are better for the clinal environment
All the methods have a high FDR for the correlated environment
Polygenic selection – Power
[Figure: power (0 to 1) against realized False Discovery Rate (0 to 1), panels Random Correlated and Cline, for Regression, Bayescan, BayEnv and LFMM]
We want high power for low FDR (highest lines)
LFMM is the best-performing method
Bayescan is rather good for the clinal environment
The regression is no longer the worst method
More classical models with polygenic selection – False Discovery Rate
[Figure: realized False Discovery Rate (0 to 1) against the alpha threshold (0 to 0.25), panels Isolation with Migration (IM) and Stepping Stone (SS), for Regression, Bayescan, BayEnv and LFMM]
We want FDR and threshold to be equal (black line)
LFMM and regression behave almost correctly for the IM model
BayEnv performs correctly for the SS model
Overall, the FDRs are still high!
More classical models with polygenic selection – Power
[Figure: power (0 to 1) against realized False Discovery Rate (0 to 1), panels Isolation with Migration (IM) and Stepping Stone (SS), for Regression, Bayescan, BayEnv and LFMM]
We want high power for low FDR (highest lines)
LFMM is still the best-performing method
BayEnv is the second best
Surprisingly, Bayescan performs very poorly for the IM model
Discussion
About the genome scan methods
Overall performance
The False Discovery Rate was much higher than expected
For polygenic selection and moderately stringent thresholds, power can be quite low (few loci detected)
Not that bad if you think in terms of false positive rate (4 SNPs)
⇒ Just keep in mind that you are sometimes not controlling the FDR as much as you want to!
Comparison of methods
LFMM was most of the time the best-performing method
The linear regression was not the worst approach for polygenic selection
Bayescan and BayEnv gave contrasting results, each being useful in some scenarios
About simulation assessments of the methods
Complex scenarios
We need to use more complex scenarios:
“Weird” population structures (hierarchical, unbalanced...)
Importance of polygenic selection!
It can qualitatively change the results
More parameters to play with?
Coefficient of selection
Effective population size
Distribution of the environmental variables (influence of outliers?)
Genetic architecture (number of loci, infinitesimal, L-shaped...)
Further development?
Two main points:
Developing more robust null models, to limit FDR inflation when the data deviate from their assumptions
Developing a statistical testing framework suited to polygenic selection (no locus-by-locus tests)
Acknowledgements
Éric Frichot (TIMC, Grenoble), Éric Bazin (LECA, Grenoble), Olivier
François (TIMC, Grenoble) and Oscar Gaggiotti (LECA, Grenoble /
Saint Andrews, UK)
Thank you for your attention!!