Instructions for using the program to determine the

NESSI: a program for Numerical Estimations for Sporophytic
Self-Incompatibility systems.
*************************
Welcome in NESSI:
Numerical Estimations for Sporophytic Self-Incompatibility.
program written by Sylvain Billiard.
see Billiard S, V. Castric and X. Vekemans 2007 Genetics for details.
*************************
User’s Guide v0.2.3
1. What is it?
Evolutionary biologists are still looking for evidence of selection in natural populations. One
of the best-known examples is self-incompatibility (SI) in plants, which prevents
self-fertilization as well as fertilization between genetically related individuals. As a
consequence, it should reduce inbreeding depression. A notable feature of this mating system
is that it is controlled by a single locus: the so-called S-locus.
Two main types of SI are known in plants: gametophytic SI (GSI) and sporophytic SI (SSI).
In GSI, only the S-allele carried by the haploid pollen grain is expressed in pollen and involved
in the recognition of self-pollen. In the stigma, both homologous alleles are expressed: they are
codominant. This case is relatively simple, and many theoretical models and predictions have
been developed for the diversity at the S-locus in natural populations.
In SSI, by contrast, both homologous alleles may be expressed in pollen and in stigma. The
difficulty here is that dominance relationships between alleles can vary, and can differ between
pollen and stigma, which makes modelling and predicting the diversity at the S-locus more
complex. As a consequence, it is difficult to identify the evolutionary processes responsible for
the diversity at the S-locus observed in natural populations.
The main goal of this program is to provide a tool for evolutionary biologists interested in SSI
to obtain estimates of several variables commonly used by population geneticists in SI studies.
The program can handle any kind of dominance relationships between alleles. It can be used
to perform estimations for a given population of a given species with given dominance
relationships.
2. What can it do?
Deterministic equilibrium frequencies: The program can provide the expected allelic and
genotypic frequencies at deterministic equilibrium when only negative frequency-dependent
selection (FDS) is involved.
Allelic richness, allelic frequency and genotypic frequency distributions at drift-mutation-FDS
“equilibrium”: The program can provide an estimate of the distribution of the
number of alleles, allelic frequencies and genotypic frequencies in an isolated panmictic
population at equilibrium, for a given mutation rate (in a K-allele model), a given population
size and a given sample size. Note that the initial genotypic frequencies are set to the
deterministic equilibrium frequencies.
Distribution of the change in allelic and genotypic frequencies over one generation: The
program can provide the distribution of the genotypic and allelic frequencies after one
generation, starting from given initial genotypic frequencies. This is useful to test whether the
change in frequencies observed over one generation is consistent with the FDS hypothesis.
You will choose what to compute after launching the executable NESSI.exe:
*** Would you like to compute:
(1) Deterministic genotypic and allelic equilibrium frequencies?
(2) Distributions in finite populations?
(3) Genotypic and Allelic frequencies change distributions in a generation?
3. Computation options
Dominance relationships: The program can handle any kind of dominance relationships,
in particular specific dominance relationships determined from crossing experiments. It
can also handle the classical kinds of dominance relationships (the so-called dom, domcod and
cod models).
*** Dominance relationships :
(1) Simple: dom or domcod models?
(2) Specific?
Selection regime: The program can handle two selection regimes: selection through the male
function only (Wright’s classical model) or through both the female and male functions (the
so-called “fecundity selection” model).
*** Frequency-dependent selection model through:
(1) Male way only (Wright's model)?
(2) Male and female ways (fecundity selection)?
Mutation model: At present, the only available model is the K-allele model (KAM).
When a mutation occurs, an allele copy chosen at random in the population is changed into
one of the K-1 other possible alleles (with no restriction related to the dominance
class). K is fixed at the start by the user and is equal to the number of alleles given in the
dominance relationship input files (see below for details).
4. Input files: how to write them?
See the options summary below to find out which information is needed, depending on which
computations you want to perform.
All input files MUST be in the same folder as NESSI.exe.
__________________________________________________________________________
File containing the matrix of dominance relationships in pollen or pistil
One file is needed for each matrix. If there are K alleles, each file must contain K² values,
one for each ordered pair of alleles. The diagonal must contain 1 (for all pairs {i,i}). The
element at row i and column j is 1 if allele i is dominant over j, 0 if allele i is recessive
to j, and 0.5 if both are codominant. Note that this matrix is not symmetric: if
element {i,j} is 1, then element {j,i} is 0.
Example 1 (dom model): the files for the pollen and the pistil dominance relationships are the
same:
1 1 1 1 1 1
0 1 1 1 1 1
0 0 1 1 1 1
0 0 0 1 1 1
0 0 0 0 1 1
0 0 0 0 0 1
Example 2 (not a classical model):
1
0
0
0
0
1
1
0.5
0.5
0
1
0.5
1
0.5
0
1
0.5
0.5
1
0.5
1
1
1
0.5
1
1
1
1
0.5
0.5
0
0
0
0.5
0.5
1
Note that partial dominance is not implemented in the program.
Values can be separated by tabs or spaces.
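The rules above can be checked before running NESSI. The following Python sketch is a hypothetical helper, not part of NESSI: it assumes that the values are read row by row (row i, column j) and only checks the constraints stated in this section (K² values, 1 on the diagonal, values in {0, 0.5, 1}, and element {j,i} equal to 1 minus element {i,j}).

```python
import math


def read_dominance_matrix(text):
    """Parse and check a K x K dominance matrix given as whitespace-separated values."""
    values = [float(tok) for tok in text.split()]  # tabs or spaces both work
    k = math.isqrt(len(values))
    if k == 0 or k * k != len(values):
        raise ValueError(f"{len(values)} values found, expected K^2")
    m = [values[r * k:(r + 1) * k] for r in range(k)]  # assumed row by row
    for i in range(k):
        if m[i][i] != 1.0:
            raise ValueError(f"diagonal element {{{i + 1},{i + 1}}} must be 1")
        for j in range(k):
            if m[i][j] not in (0.0, 0.5, 1.0):
                raise ValueError(f"element {{{i + 1},{j + 1}}} must be 0, 0.5 or 1")
            # 1 pairs with 0, and 0.5 pairs with 0.5
            if i != j and m[i][j] + m[j][i] != 1.0:
                raise ValueError(
                    f"elements {{{i + 1},{j + 1}}} and {{{j + 1},{i + 1}}} are inconsistent"
                )
    return m


# Usage: a 3-allele hierarchy (allele 1 dominant over 2 and 3, allele 2 over 3)
matrix = read_dominance_matrix("1 1 1\n0 1 1\n0 0 1")
print(len(matrix), "alleles")
```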
__________________________________________________________________________
File containing simple dominance relationships
This file contains the type of classical dominance relationships as well as the number of
alleles in each dominance class. The information must be given on a single line:
- the first number on the line gives the dominance model
(0 = dom, 1 = domcod, 2 = coddom, 3 = cod);
- the following numbers give the number of alleles per class, from the most dominant to
the most recessive.
Example:
0 5 2 1
With this line in a file, computations will be performed with a dom model for dominance
relationships and 3 dominance classes: 5 alleles in the most dominant class, 2 alleles in
the intermediate class and 1 allele in the most recessive class.
This file is convenient if one wishes to perform computations for several
dominance relationships and allele numbers, one after the other. To do so, write each
parameter set on its own line.
Example:
0 5 2 1
1 5 2 1
0 2 2 1
1 2 2 1
Using this file will perform four computations: two under the dom model and two
under the domcod model. The number of dominance classes is the same for all four
computations, but the total number of alleles differs, as does the number of alleles in the
most dominant class.
Note that this file can be used for deterministic or stochastic computations.
Values can be separated by tabs or spaces.
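For illustration, here is a Python sketch of how such a file could be parsed and checked, one computation per line. The helper name and the requirement that every class holds at least one allele are assumptions, not documented NESSI behaviour; the model codes follow the list above.

```python
MODELS = {0: "dom", 1: "domcod", 2: "coddom", 3: "cod"}


def parse_simple_dominance(text):
    """Return one (model, class_sizes, total_alleles) tuple per non-empty line."""
    runs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # tabs or spaces
        if not fields:
            continue  # skip blank lines
        code, *sizes = (int(f) for f in fields)
        if code not in MODELS:
            raise ValueError(f"line {lineno}: unknown model code {code}")
        # Assumption: at least one class, and every class holds >= 1 allele
        if not sizes or min(sizes) < 1:
            raise ValueError(f"line {lineno}: invalid class sizes {sizes}")
        runs.append((MODELS[code], sizes, sum(sizes)))
    return runs


# Usage: four parameter sets, two dom and two domcod
for model, sizes, k in parse_simple_dominance("0 5 2 1\n1 5 2 1\n0 2 2 1\n1 2 2 1"):
    print(model, sizes, "K =", k)
```

The total number of alleles K of each run is simply the sum of the class sizes, which is the number to keep consistent across input files.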
__________________________________________________________________________
Parameters of the simulations.
user_parameters.txt: This is the most important file: it must be filled in by the user with the
parameters needed for the computations and is ALWAYS required. In this file, the user must give
the parameter values, the names of the files containing the dominance relationships and,
optionally, the name of the file containing the initial genotypic frequencies. When a parameter or
a file is not needed, just leave it blank (but keep a single line between all entries). The name of
this parameter file cannot be changed, but all the information it contains is recalled at the
beginning of each output file.
Explanation of each entry
1, 2 and 3 refer to the three computation options, respectively (1) Deterministic equilibrium,
(2) Distributions in finite populations and (3) Frequency change in one generation. A number
X at the end of each entry means that it is necessary for computation option X. A number
between parentheses (X) means it is optional for computation option X.
//*** Genotypic frequencies equilibrium criteria: 10e-x, x = ? (0<x<9) -- (1)(2)
Gives the criterion for stopping deterministic computations when searching for the deterministic
equilibrium. The computation stops when the change in frequency over one generation is lower
than the given value for all alleles. Note that for stochastic simulations, the default criterion used
to compute the initial genotypic frequencies is 10^-4.
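The stopping rule can be sketched as follows. This is a minimal illustration, not NESSI's implementation: `step` stands for one generation of deterministic FDS dynamics, which is not reproduced here, and the threshold is assumed to be 10^-x (the prompt writes it as 10e-x). The toy `toward_uniform` function in the usage lines simply moves frequencies halfway towards uniformity.

```python
def iterate_to_equilibrium(step, freqs, x, max_generations=1_000_000):
    """Apply `step` until no allele frequency changes by 10**-x or more.

    step: hypothetical function mapping a list of allele frequencies to the
    frequencies one generation later.
    Returns (equilibrium frequencies, number of generations).
    """
    tolerance = 10.0 ** (-x)
    for generation in range(1, max_generations + 1):
        new = step(freqs)
        # Stop only when the change is below the tolerance for ALL alleles
        if max(abs(a - b) for a, b in zip(new, freqs)) < tolerance:
            return new, generation
        freqs = new
    raise RuntimeError("no equilibrium reached within max_generations")


def toward_uniform(p):
    """Toy dynamic (not an SI model): halve the distance to equal frequencies."""
    return [(pi + 1.0 / len(p)) / 2 for pi in p]


print(iterate_to_equilibrium(toward_uniform, [1.0, 0.0, 0.0], 4))
```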
//*** Number of frequency classes for frequencies distribution estimation? (0=no distribution) – 2 3
This is the number of frequency classes used for the distribution estimation. The same number
is used for the genotypic and allelic distributions (denoted C below).
//*** Number of diploid individuals in the populations? – 2 3
Self-explanatory.
//*** Mutation rate (per individual per generation)? – 2 3
Self-explanatory.
//*** Minimal frequency of an allele to be counted? – (2)(3)
This parameter is provided for convenience (it is not required). Alleles whose frequency is
lower than this value are not taken into account in the estimation of the distributions. This can
be useful to ignore rare alleles.
//*** Number of forget generations? – 2
Parameter used only for stochastic simulations and distribution estimations. It is the number of
generations with drift, mutation and FDS assumed to be needed to reach a pseudo-equilibrium
between the loss and the appearance of alleles in the population (a burn-in period).
//*** Number of generations in a single repetition (after forget generations)? – 2
Self-explanatory.
//*** Number of repetitions? – 2 3
Number of independent repetitions to perform.
//*** Sample size? – 2 3
Number of diploid individuals randomly sampled from the population and used for the
distribution estimations.
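A minimal Python sketch of drawing a sample of diploid individuals and computing the sample allele frequencies, including the optional minimal-frequency filter described above. Assumptions not stated in this guide: individuals are sampled without replacement, the population is given as a list of genotypes (i, j), and the helper name is hypothetical.

```python
import random
from collections import Counter


def sample_allele_frequencies(population, n, rng=random, min_freq=0.0):
    """Allele frequencies among n diploid individuals sampled from `population`.

    population: list of genotypes (i, j), one per diploid individual.
    Sampling is without replacement (an assumption). Alleles whose sample
    frequency is below min_freq are dropped, as with the optional
    'Minimal frequency of an allele to be counted' parameter.
    """
    if not 0 < n <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    sample = rng.sample(population, n)
    counts = Counter(allele for genotype in sample for allele in genotype)
    freqs = {allele: c / (2 * n) for allele, c in counts.items()}
    return {a: f for a, f in freqs.items() if f >= min_freq}


# Usage: 60 individuals, sample of 10
pop = [(1, 2)] * 30 + [(1, 3)] * 20 + [(2, 3)] * 10
print(sample_allele_frequencies(pop, 10, random.Random(3)))
```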
//*** Number of generations between sampling? – 2
Self-explanatory.
//*** File name for initial (observed) genotypic frequencies? (with extension) – 3
Name of the file containing the initial genotypic frequencies (used only to estimate the
distribution of the change in allelic and genotypic frequencies over one generation).
//*** File name for dominance relationships in pollen? (with extension) – 1 2 3
Name of the file containing the matrix of pairwise dominance relationships between alleles in
pollen (used in computations with specific dominance relationships, either deterministic or
stochastic).
//*** File name for dominance relationships in pistil? (with extension) – 1 2 3
Name of the file containing the matrix of pairwise dominance relationships between alleles in
pistil (used in computations with specific dominance relationships, either deterministic or
stochastic).
//*** File name for simple dominance relationships? (with extension) – 1 2 3
Name of the file containing the simple dominance relationships (used in computations with
simple dominance relationships, dom, domcod or cod, either deterministic or stochastic).
//*** First seed for the pseudo-random number generator: 1
Gives the seed of the pseudo-random number generator (only used for debugging).
// Last Analysis: Wed Jan 31 11:42:30 2007
Gives the date of the last analysis
//
Computations: Deterministic genotypic and allelic equilibrium frequencies.
//
Type of frequency dependent selection: Male way only (Wright's classical model).
//
Dominance relationships: Specific (dominance matrices)
Summary of the last performed analysis.
__________________________________________________________________________
File containing initial (observed) genotypic frequencies
This file must contain the initial genotypic frequencies used for the distribution of the frequency
change over one generation. It must contain a value for each pair {i, j} of alleles, even if genotype
{i, j} cannot exist (in which case its frequency is obviously 0). The table may either contain
values for the upper-right part only (j ≥ i) or be symmetric.
Example:
0.0
0.1
0
0.2
0
0
0.1
0.1
0.5
Be careful: the frequencies must sum exactly to 1!
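A minimal Python sketch of checking such a file before use. It is hypothetical and relies on an assumption the guide does not spell out: in both accepted layouts (upper-right part only, or symmetric) the frequency of genotype {i, j} is taken from the upper triangle (j ≥ i), and those K(K+1)/2 values must sum to 1. The small tolerance only absorbs floating-point round-off.

```python
import math


def read_initial_genotypes(text, tol=1e-9):
    """Parse a K x K table of initial genotypic frequencies (hypothetical helper).

    Accepts an upper-right table (zeros below the diagonal) or a symmetric one.
    ASSUMPTION: in both cases the frequency of genotype {i, j} is read from the
    upper triangle (j >= i), and these K(K+1)/2 values must sum to 1.
    """
    values = [float(tok) for tok in text.split()]
    k = math.isqrt(len(values))
    if k == 0 or k * k != len(values):
        raise ValueError(f"{len(values)} values found, expected K^2")
    m = [values[r * k:(r + 1) * k] for r in range(k)]
    below = [(i, j) for i in range(k) for j in range(i)]
    upper_only = all(m[i][j] == 0.0 for i, j in below)
    symmetric = all(m[i][j] == m[j][i] for i, j in below)
    if not (upper_only or symmetric):
        raise ValueError("table must be upper-right only or symmetric")
    freqs = {(i + 1, j + 1): m[i][j] for i in range(k) for j in range(i, k)}
    total = sum(freqs.values())
    # The guide asks for an exact sum of 1; tol only absorbs float round-off
    if abs(total - 1.0) > tol:
        raise ValueError(f"frequencies sum to {total}, not 1")
    return freqs


# Usage: 3 alleles, upper-right layout
print(read_initial_genotypes("0.2 0.3 0.1\n0 0.1 0.2\n0 0 0.1"))
```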
5. Output files: how to read them?
There are several output files containing results, depending on what is computed and what is
needed. At the beginning of each result file, there is a summary of the parameters used for the
computations.
__________________________________________________________________________
F_Alleles.txt and F_genotypes.txt
These files contain the frequencies at deterministic equilibrium (equilibrium is defined by the
user through the stopping criterion, or is set to 10^-4 for stochastic simulations).
__________________________________________________________________________
Alleles.txt and Genotypes.txt
These files contain the frequencies at each generation until the deterministic equilibrium is
reached. Note that the genotypic frequencies are presented in a single row, such that the
frequency of genotype {i, j} is written only for j ≥ i, in the order {1,1}, {1,2}, …, {1,K}, {2,2},
{2,3}, …, {K,K}. Hence, the frequency of genotype {i, j} is the element (i-1)(2K-i)/2 + j in the
row, with K the total number of alleles.
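The position of genotype {i, j} in such a row can be computed by counting the genotypes in the earlier rows of the upper triangle (the row of first allele r holds K-r+1 genotypes), which gives (i-1)(2K-i)/2 + j. The Python sketch below (hypothetical function name) computes it and checks it against a direct enumeration.

```python
def genotype_position(i, j, k):
    """1-based position of genotype {i, j} in a row listing
    {1,1}, {1,2}, ..., {1,k}, {2,2}, {2,3}, ..., {k,k} (hypothetical helper)."""
    if not 1 <= i <= j <= k:
        raise ValueError("expected 1 <= i <= j <= K")
    # Rows 1..i-1 hold k + (k-1) + ... + (k-i+2) = (i-1)(2k-i+2)/2 genotypes,
    # and {i, j} is the (j-i+1)th genotype of row i.
    return (i - 1) * (2 * k - i + 2) // 2 + (j - i + 1)


# Check against a direct enumeration for K = 4
K = 4
order = [(i, j) for i in range(1, K + 1) for j in range(i, K + 1)]
for position, (i, j) in enumerate(order, start=1):
    assert genotype_position(i, j, K) == position
print(genotype_position(2, 3, K))  # prints 6
```

The last genotype {K,K} lands at position K(K+1)/2, the total number of genotypes.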
__________________________________________________________________________
Allelic_Frequency_Distribution.txt and Genotypic_Frequency_Distribution.txt
These files contain the estimated distribution of frequencies in the population, with C the
number of frequency classes. Results are given as one row per allele/genotype. The first
element corresponds to a null frequency and the last element to a frequency equal to 1. The C
elements in between correspond to the intervals ]a, b], with b - a = 1/C. Each of these elements
in row r is the number of times allele r has been observed with a frequency in the corresponding
interval. At the end of each line, the 90%, 95% and 99% confidence intervals are given.
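The class boundaries described above can be made concrete with a short Python sketch (hypothetical helper names, not NESSI's output code). Column 0 is the null frequency, columns 1 to C are the intervals ]a, b], and column C+1 is a frequency of 1; the small `eps` guards against floating-point round-off at class limits. Confidence intervals are not computed here.

```python
import math


def frequency_class(f, c, eps=1e-12):
    """Column of frequency f in a row with c + 2 columns (hypothetical helper).

    Column 0: null frequency; column k (1..c): ](k-1)/c, k/c]; column c+1: f == 1.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("a frequency must lie in [0, 1]")
    if f == 0.0:
        return 0
    if f == 1.0:
        return c + 1
    # eps absorbs round-off such as 0.3 * 10 == 3.0000000000000004
    return max(1, math.ceil(f * c - eps))


def tally(frequencies, c):
    """Count how many observed frequencies fall in each of the c + 2 columns."""
    row = [0] * (c + 2)
    for f in frequencies:
        row[frequency_class(f, c)] += 1
    return row


# Usage: one allele observed in 5 samples, C = 10
print(tally([0.0, 0.05, 0.15, 0.15, 1.0], 10))
```

Each row therefore sums to the number of samples, which is a quick sanity check when reading these files.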
Example of Allelic_Frequency_Distribution.txt with 3 dominance classes, 2 alleles per class and
C = 10:
0          ]0, 0.1]   ]0.1, 0.2] ]0.2, 0.3] ]0.3, 0.4] ]0.4, 0.5] ]0.5, 0.6] ]0.6, 0.7] ]0.7, 0.8] ]0.8, 0.9] ]0.9, 1]   1
0          5          12         3          0          0          0          0          0          0          0          0
0          12         5          3          0          0          0          0          0          0          0          0
1          2          13         4          0          0          0          0          0          0          0          0
2          5          11         2          0          0          0          0          0          0          0          0
1          0          0          5          6          7          0          1          0          0          0          0
15         0          1          2          1          1          0          0          0          0          0          0
The results above mean that the first allele has been observed 5 times out of 20 samples with
a frequency between 0 (exclusive) and 0.1 (inclusive). The last row indicates that allele 6 has
been observed with a null frequency in 15 of the 20 samples.
Allelic_Frequency_Distribution.txt also contains the estimated distribution of the number of
alleles in the population. It is simply a summary of the distribution of the allelic frequencies.
Results are given as one row per dominance class. The element n of row r is the total
number of times n-1 alleles have been observed in dominance class r across all samples.
For example, for a model with 3 dominance classes and two possible alleles per class, results
such as
Distribution of allele number per dominance class.
Allele number:  0...  1...  2... observed alleles.
Class 1         0     0     10
Class 2         0     0     10
Class 3         0     0     10
indicate that zero alleles were never observed in any class; a single allele was observed in 5%
of the samples in class 1, 2% in class 2 and 72% in class 3; and finally, 2 alleles were
observed in 95% of the samples in class 1, 98% in class 2 and 28% in class 3.
Note that for computations using specific dominance relationships, each allele is treated
separately in this file. In other words, each row has only two elements: the allele is either
absent from or present in the sample.
The file Genotypic_Frequency_Distribution.txt can be read in the same way, except that
each row corresponds to a genotype instead of an allele.
The first row contains the distribution for genotype {1,1}, the second row for genotype {1,2},
…, the nth row for genotype {1,n}, with n the maximum number of alleles. The (n+1)th row
contains the distribution for genotype {2,2}, the (n+2)th row the distribution for
genotype {2,3}, etc. The last row contains the distribution for genotype {n,n}. Alleles are
numbered in the order given by the user.
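The counting behind the allele-number summary can be sketched in Python. This is a hypothetical helper, not NESSI code; it assumes that each sample has already been reduced to the set of alleles present in it. Python indexes from 0, so element n of a returned row counts samples with n alleles of that class (element n+1 in the wording above). For specific dominance relationships, passing one class per allele (for example [[1], [2], [3]]) gives rows of two elements: absent or present.

```python
def allele_number_distribution(samples, classes):
    """Distribution of the number of alleles observed per dominance class.

    samples: list of sets, the alleles present in each sample.
    classes: list of lists, the alleles of each class (most dominant first).
    Row r, element n (0-based) counts samples with exactly n alleles of
    class r present.
    """
    rows = [[0] * (len(members) + 1) for members in classes]
    for present in samples:
        for r, members in enumerate(classes):
            n_present = sum(1 for allele in members if allele in present)
            rows[r][n_present] += 1
    return rows


# Usage: 3 classes of 2 alleles, 2 samples
classes = [[1, 2], [3, 4], [5, 6]]
print(allele_number_distribution([{1, 2, 3, 5, 6}, {1, 2, 5}], classes))
```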
6. Warnings and hypotheses
The computations rely on the following hypotheses:
- Only diploid individuals are considered: tetraploid species expressing SI cannot be
handled with this program (perhaps in the future?).
- The population is assumed to be isolated and panmictic: every stigma in the population can
potentially be pollinated by pollen from any other individual.
- All plants are assumed to produce an infinite amount of pollen and an infinite number of ovules.
- All plants grow, produce ovules and pollen, are pollinated, produce seeds and die.
- The mutation model is the KAM: no new alleles can appear.
- Drift is simulated by multinomial sampling among genotypes.
- Initial genotypic frequencies for stochastic simulations are those reached at the
deterministic equilibrium.
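The multinomial drift step listed above can be sketched as follows, under assumptions not taken from this guide: the expected genotype frequencies after reproduction are already known (their computation under FDS is not shown), and Python's `random.choices`, which performs n independent weighted draws, stands in for a multinomial sampler. The helper name is hypothetical.

```python
import random
from collections import Counter


def multinomial_drift(genotype_freqs, n, rng=random):
    """Draw n diploid individuals by multinomial sampling among genotypes.

    genotype_freqs: dict mapping genotype (i, j) to its expected frequency in
    the next generation (e.g. after reproduction with FDS); assumed to sum to 1.
    Returns the realized genotype frequencies in the population of size n.
    """
    genotypes = list(genotype_freqs)
    weights = [genotype_freqs[g] for g in genotypes]
    # n independent weighted draws = one multinomial sample of size n
    counts = Counter(rng.choices(genotypes, weights=weights, k=n))
    return {g: counts[g] / n for g in genotypes}


# Usage: population of 100 diploid individuals
expected = {(1, 2): 0.4, (1, 3): 0.3, (2, 3): 0.3}
print(multinomial_drift(expected, 100, random.Random(5)))
```

A genotype with an expected frequency of 0 is never drawn, so drift alone cannot create genotypes that reproduction did not produce.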
Warnings:
- Large population sizes can be handled fairly easily; be careful, however, not to run
stochastic simulations with an excessively large population size.
- The most time-consuming computations concern the reproduction step of the life
cycle. The computation time increases steeply with the number of alleles, because every
genotypic frequency change is computed at each generation and the number of possible
genotypes with n alleles is equal to n(n+1)/2. Be careful not to run computations with too
many alleles. In our experience, deterministic computations remain feasible up to
approximately a hundred alleles, and stochastic computations up to a few dozen alleles.
The fewer alleles, the faster the computations.
- Be consistent about the number of alleles: make sure that the total number of possible
genotypes is the same in all your input files!
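Because the number of genotypes grows as n(n+1)/2, it can be worth checking allele numbers before a long run. The Python sketch below is a hypothetical pre-flight check, not a NESSI feature: it infers K from the number of values in a K x K file and verifies that all files agree. The keyword names (pollen, pistil, genotypes) are illustrative.

```python
import math


def n_genotypes(k):
    """Number of possible diploid genotypes {i, j} with j >= i for k alleles."""
    return k * (k + 1) // 2


def alleles_in_matrix_file(n_values):
    """Number of alleles K implied by a file holding K^2 values."""
    k = math.isqrt(n_values)
    if k * k != n_values:
        raise ValueError(f"{n_values} values is not a perfect square")
    return k


def check_same_allele_number(**k_by_file):
    """Raise if the input files imply different numbers of alleles."""
    if len(set(k_by_file.values())) != 1:
        raise ValueError(f"inconsistent numbers of alleles: {k_by_file}")
    return next(iter(k_by_file.values()))


# Usage: pollen and pistil matrices and a genotype table, 36 values each
k = check_same_allele_number(
    pollen=alleles_in_matrix_file(36),
    pistil=alleles_in_matrix_file(36),
    genotypes=alleles_in_matrix_file(36),
)
print(k, "alleles,", n_genotypes(k), "possible genotypes")
```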
Options summary
(information needed in the parameters file user_parameters.txt)
Columns: (1) Deterministic genotypic and allelic equilibrium frequencies; (2) Distributions in
finite populations; (3) Genotypic and allelic frequency change distributions in one generation.
Entry | (1) | (2) | (3) | Remarks
Genotypic frequencies equilibrium criteria | O | O | _ |
Number of frequency classes for frequencies distribution estimation | _ | × | × |
Number of diploid individuals in the populations | _ | × | × |
Mutation rate (per individual per generation) | _ | × | × | might be 0
Minimal frequency of an allele to be counted | _ | O | O |
Number of forget generations | _ | × | _ |
Number of generations in a single repetition (after forget generations) | _ | × | _ | might be 0
Number of repetitions | _ | × | × | might be 1
Sample size | _ | × | × | might be as large as the total size of the population
Number of generations between sampling | _ | × | _ | might be 0
File name for initial (observed) genotypic frequencies | _ | _ | × |
File name for dominance relationships in pollen AND pistil† | × | × | × |
File name for simple dominance relationships† | × | × | × |
O: optional
_ : not necessary (any value given here is ignored in the computations)
× : necessary
†: For specific dominance relationships, two files must be given: one for pollen and one for pistil. For simple dominance relationships, a single file should be
given (see above). Note that if you work with specific dominance relationships, a file name for simple dominance relationships is ignored, and vice versa.
All input files MUST be in the same folder as NESSI.exe.