Introduction | Models & Statistics | Results | Discussion

Genome scan methods against more realistic models: when and how much should we trust them?

Pierre de Villemereuil, Éric Frichot, Éric Bazin, Olivier François & Oscar Gaggiotti
Laboratoire d’écologie alpine, LECA (Grenoble, France)
Software and Statistical Methods for Population Genetics (SSMPG), Aussois, June 20 2013
Pierre de Villemereuil (LECA)

Introduction

Some context...
What are genome scan methods?
- Using medium- to high-density markers (AFLPs, SNPs...)
- Looking for signatures of selection
- Using test procedures to control for power and false positive rate
Context: non-model species
- Very little information about the genome: coding regions, genetic map...
- Little room for functional genetic experimentation
- Little information about population structure

Why “more realistic” models?
- A lot of published studies...
- Issues not tackled...
More difficult models:
- Highly hierarchical structure
- Correlation between demography and environment
- Polygenic selection

Two kinds of methods
- FST-based models: detecting outlier loci based on the locus-specific FST distribution.
- Association models: detecting association between genotypes or population frequencies and environmental variables.
- FST-based: Bayescan (Foll & Gaggiotti, 2008)
- Association: linear regression; BayEnv (Coop et al., 2010); LFMM (Frichot et al., 2012)
Comparison between 4 methods
- Most recent and currently used
- Diversity of approaches (underlying model, ...)

Models & Statistics

Fission model
[Figure: successive population fissions, shown at T=50, T=150, T=200, T=300 and T=500]
- Strong multiple-layer hierarchical structure!

Parameters of the model
Demo-genetic parameters
- 16 populations of 500 individuals (sampling: 20 individuals)
- 5,000 SNPs over 10 chromosomes
- Migration rate m = 0.00045 ⇒ FST = 0.1
- Two kinds of selection: monogenic (1 locus) or polygenic (50 loci)
Environment
Two kinds of environment:
- Random and correlated with demography. A new environmental value Ei is drawn from a uniform distribution centred on the mother population's value E0:
  E1 ∼ U(E0 − 0.5, E0 + 0.5)
  E2 ∼ U(E0 − 0.5, E0 + 0.5)
- Clinal gradient along the 16 populations

Statistical comparison of models
False Discovery Rate and q-values
- FDR: proportion of false positives among positive results
- Linked to power and false positive rate
- Used to control for multiple testing (q-values)
A unified framework
- Different test statistics for the methods (p-values, Bayes factors, posterior probabilities...)
- All were transformed into q-values ⇒ unified multiple testing framework
- As for p-values, a q-value should correspond to the nominal FDR, e.g. αq = 0.1 ⇒ FDR = 10%

Results

Monogenic selection – False Discovery Rate
[Figure: False Discovery Rate (0 to 1) against alpha threshold (0 to 0.25); panels: Random Correlated, Cline; curves: Regression, Bayescan, BayEnv, LFMM]
- We want FDR and threshold to be equal (black line)
- Very high FDR! Although: 75% FDR = 4 SNPs
- Little difference between methods

Monogenic selection – Power
[Figure: Power (0 to 1) against False Discovery Rate (0 to 1); panels: Random Correlated, Cline; curves: Regression, Bayescan, BayEnv, LFMM]
- We want high power for low FDR (highest lines)
- Less power for the random correlated environment
- Regression behaves badly
- All the other methods are similar

Polygenic selection – False Discovery Rate
[Figure: False Discovery Rate (0 to 1) against alpha threshold (0 to 0.25); panels: Random Correlated, Cline; curves: Regression, Bayescan, BayEnv, LFMM]
- We want FDR and threshold to be equal (black line)
- BayEnv and Bayescan are better for the clinal environment
- All the methods have a high FDR for the correlated environment

Polygenic selection – Power
[Figure: Power (0 to 1) against False Discovery Rate (0 to 1); panels: Random Correlated, Cline; curves: Regression, Bayescan, BayEnv, LFMM]
- We want high power for low FDR (highest lines)
- LFMM is the best performing method
- Bayescan is rather good for the clinal environment
- The regression is no longer the worst method

More classical models with polygenic selection – False Discovery Rate
[Figure: False Discovery Rate (0 to 1) against alpha threshold (0 to 0.25); panels: Isolation with Migration, Stepping Stone; curves: Regression, Bayescan, BayEnv, LFMM]
- We want FDR and threshold to be equal (black line)
- LFMM and regression behave almost correctly for the IM model
- BayEnv performs correctly for the SS model
- Overall, the FDRs are still high!

More classical models with polygenic selection – Power
[Figure: Power (0 to 1) against False Discovery Rate (0 to 1); panels: Isolation with Migration, Stepping Stone; curves: Regression, Bayescan, BayEnv, LFMM]
- We want high power for low FDR (highest lines)
- LFMM is still the best performing method
- BayEnv is the second best
- Surprisingly, Bayescan is very bad for the IM model

Discussion

About the genome scan methods
Overall performances
- False Discovery Rate was much higher than expected
- For polygenic selection and moderately stringent thresholds, the power can be quite low (few loci detected).
- Not that bad, if you think about False Positive Rate (4 SNPs)
⇒ Just keep in mind that you’re sometimes not controlling the FDR as much as you want to!
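The relation between a 75% FDR and "4 SNPs" can be spelled out with a little arithmetic. The sketch below is only illustrative: it assumes the monogenic scenario (1 selected locus among 5,000 SNPs) and that the selected locus is itself detected. The function name is hypothetical and not taken from any of the compared programs.

```python
def detections_from_fdr(true_positives: int, fdr: float) -> int:
    """Total number of detections implied by an FDR and a count of true positives.

    FDR = false positives / detections, so detections = true positives / (1 - FDR).
    """
    return round(true_positives / (1.0 - fdr))


# Values taken from the slides; the assumption is that the selected locus is detected.
n_snps = 5000        # SNPs in the simulated genome
n_selected = 1       # monogenic selection: a single locus under selection
fdr = 0.75           # FDR read off the monogenic results

detected = detections_from_fdr(n_selected, fdr)
false_positives = detected - n_selected
# False positive rate: false positives among the neutral SNPs only.
false_positive_rate = false_positives / (n_snps - n_selected)

print(detected, false_positives, false_positive_rate)
```

Under these assumptions a 75% FDR corresponds to a handful of detected SNPs, so the false positive rate over the neutral SNPs stays very small even though the FDR is far above its nominal value.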
Comparison of methods
- LFMM was most of the time the best performing method
- The linear regression was not the worst approach for polygenic selection
- Bayescan and BayEnv have contrasting results, being useful for some scenarios

About simulation assessments of the methods
Complex scenarios
We need to use more complex scenarios:
- “Weird” population structures (hierarchical, unbalanced...)
- Importance of polygenic selection! It can qualitatively change the results
More parameters to play with?
- Coefficient of selection
- Effective population size
- Distribution of the environmental variables (influence of outliers?)
- Genetic architecture (number of loci, infinitesimal, L-shaped...)

Further development?
Two main points:
- Developing more robust null models in order to deflate FDR in case of deviations from their assumptions
- Developing a statistical test framework suited for polygenic selection (no locus-by-locus test)

Acknowledgements
Éric Frichot (TIMC, Grenoble), Éric Bazin (LECA, Grenoble), Olivier François (TIMC, Grenoble) and Oscar Gaggiotti (LECA, Grenoble / Saint Andrews, UK)
Thank you for your attention!!
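Backup: q-values, FDR and power
The comparison relies on turning each method's statistic into q-values and then measuring the realised FDR and power at a threshold. The slides do not say how the q-values were computed, so the sketch below swaps in the Benjamini-Hochberg adjustment of p-values as one possible choice. It would not apply as is to Bayes factors or posterior probabilities. The function names and the toy p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, used here as a stand-in for q-values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by increasing p-value
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in p.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


def fdr_and_power(qvalues, is_selected, alpha):
    """Realised FDR and power when calling every locus with q-value <= alpha."""
    detected = [q <= alpha for q in qvalues]
    n_detected = sum(detected)
    true_pos = sum(d and s for d, s in zip(detected, is_selected))
    fdr = (n_detected - true_pos) / n_detected if n_detected else 0.0
    power = true_pos / sum(is_selected)
    return fdr, power


# Toy data (made up for illustration): 5 loci, the first two under selection.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.9]
toy_selected = [True, True, False, False, False]

toy_q = bh_qvalues(toy_pvalues)
toy_fdr, toy_power = fdr_and_power(toy_q, toy_selected, alpha=0.1)
print(toy_q, toy_fdr, toy_power)
```

In a simulation the set of selected loci is known, so the realised FDR can be compared directly with the nominal threshold, which is what the "black line" in the FDR plots stands for.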