
Running the
ForSim
Forward Evolutionary Simulator:
A Basic Guide
------------------- *** --------------------
ForSim Program Version: Final (subject to corrections)
ForSim Manual Version: August 2, 2013
Brian Lambert
Ken Weiss
Penn State University
Joe Terwilliger
Columbia University
Developed with financial support from the National Institutes of Health (grants R01
MH063749 and MH084995), the Penn State Huck Institutes of the Life Sciences, and the
Penn State Evan Pugh Professors’ research fund.
© 2008-2013 ForSim the logo, the program itself, and these notes, are copyright by
Kenneth M Weiss and Brian Lambert. All rights of use or reproduction are reserved
DISCLAIMER AND USE CONDITIONS
This program is offered without warranty of its accuracy. Like any software, ForSim
may contain bugs, though we have done our best to identify and correct them, and the
program seems to run properly after years of use and testing. However, we have not tested
all possible ‘legitimate’ combinations of options, nor do we promise to fix errors, because we
were not funded sufficiently to provide technical service.
Likewise, we believe this Manual to be accurate, but it will be updated as mistakes are
identified or as we see ways to clarify explanations. The file is date-stamped, but
the latest version can be obtained from me ([email protected]). A few features are subject to
some restrictions; these are identified by blue highlighting. Because the grant for
this project expired in Spring 2013, new features are not being added; for more on this, see
Note to Programmers, in the Addendum.
ForSim has been changed in many ways since its inception. New features have been
added, explanations improved, and some input file syntax changes have been made. For
various unavoidable reasons, the syntax for the input file is not entirely backward
compatible. We have not found bugs that would result in computation errors in earlier
versions used within their realm of features. But input files written for earlier versions
may need modification to run with the current version.
Publications resulting from using ForSim should acknowledge the program in the
following way:
Lambert, B, Terwilliger, J, Weiss, K ForSim: A tool for exploring the genetic
architecture of complex traits with controlled truth. Bioinformatics, 24(16): 1821-22,
2008; (doi: 10.1093/bioinformatics/btn317).
The distribution includes the source code. If you modify the code, a condition of use is
that you agree to describe those changes in any resulting publications, so readers know
that you are not using off-the-shelf ForSim and what differs in your application.
ForSim is distributed to registered users who provide an email address. I will do my best
to distribute relevant bug reports, usage suggestions, minor updates etc.
Use of ForSim implies agreement to the open-source license provided in the
README file distributed with the program.
TABLE OF CONTENTS
DISCLAIMER AND USE CONDITIONS
TABLE OF CONTENTS
CHAPTER 1: ForSim BASICS
CHAPTER 2: INSTALLATION
CHAPTER 3: HOW ForSim DOES ITS WORK
CHAPTER 4: CONTENTS OF THE INPUT FILE
CHAPTER 5: USERS’ OUTPUT FILES
ADDENDUM: SOME WAYS TO SIMULATE FEATURES NOT EXPLICITLY BUILT INTO ForSim AND ONE WAY TO CHECK REASONS FOR CRASHES
APPENDIX 1: ForSim LOGICAL FLOW AND TIME CONSUMPTION
APPENDIX 2: GENERIC CUT & PASTE INPUT.SIM FILE TEMPLATE
APPENDIX 3: MORE COMPLEX SIMULATION FLOW AND INPUT FILE EXAMPLE
CHAPTER 1: ForSim BASICS
ForSim is a forward evolutionary simulation system designed to be highly flexible for
application to a wide variety of both applied health and life science questions as well as
issues in theoretical evolutionary biology. It attempts to simulate in the most natural way
the evolutionary process that generates the genetic architecture that underlies present-day
traits, and related phenomena such as mate choice, migration bias, population
substructure, and interactions with the environment. These phenomena are related to the
way natural selection affects underlying genetic variation, molding the trait’s genetic
architecture. Variation over the short evolutionary scale, within species or among closely
related species, is generally built upon a phylogenetically stable underlying causal genetic
architecture upon which mutation, selection, and demographic effects are laid to generate
subsequent variation within and among populations.
In turn, this variation affects the ability of particular study designs or statistical
approaches to correctly infer the basic genetic architecture of traits of interest, or specific
effects that may be of practical importance (e.g., in public health), helping to guide
appropriate sample designs, sample sizes, hypotheses, and analytic methods. ForSim is
an evolving work, but is written to be easily and highly modifiable by the user both
within the current specifications, and by using the current capabilities to design
approximate simulations of additional features, all in an open-ended way.
No simulation is entirely free of the developers’ or users’ assumptions. But ForSim is
based as much as is practicable on biology rather than formalism, and its structure lets the
user specify those assumptions: that is, ForSim itself makes only minimal structural
assumptions. It is a brute-force rather than mathematically sophisticated approach, but it
is this that gives it its nimbleness and minimal dependency on theoretical assumptions.
Since CPU time and memory are rapidly becoming less of a constraint, ForSim will in
principle be able to simulate extensive approximations to genome-wide information in large
geographically and environmentally differentiated species under dynamic selective
conditions, on an ordinary computer, and can be a viable part of an evolutionary and
biostatistical exploratory as well as formal analytic tool-kit.
ForSim is written in C++ with processing and controlling scripts written in Ruby. The
scripts are written in Ruby 1.8, but should be compatible with later versions. The
wrapper calls other software as well, if available (see below). All required or optional
software is public domain. The program is not a commercial product, but is under
continual development and augmentation, so current versions (and manual) should
always be used.
ForSim takes its input conditions from an input file. All user-settable conditions—which
means most conditions—are specified in this file, described below. The installation
directory contains simple demonstration *.sim input files. Any input file name is
accepted, so long as it is a plain text file.
ForSim can be run most simply from the command line in the following manner:
./forsim [inputFileName]
However, in most cases users will find it preferable to call the Ruby wrapper script
(runForsim.rb) which automates running of multiple simulation replicates as well as
basic post-simulation analysis. This wrapper script should be run as follows:
ruby runForsim.rb -i [inputFileName] -r [NumberOfReps] -o [outputFormat] -c [false/true]
If not specified, the default input file is input.sim, the default number of independent
replicates is 1, the default output figure format is png (other formats recognized by R,
including pdf, may be specified if your machine has the ancillary software for them),
and gzip compression of output files is off (false).
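The wrapper call and its defaults can be summarized in a short Python sketch. This is only an illustration of how the four documented switches fit together; the helper name build_wrapper_command and the file name demo.sim are hypothetical and are not part of the ForSim distribution.

```python
def build_wrapper_command(input_file="input.sim", reps=1,
                          fig_format="png", compress=False):
    """Return the argument list for the wrapper call described above.

    The defaults mirror the documented ones: input.sim, 1 replicate,
    png figures, no gzip compression.
    """
    if reps < 1:
        raise ValueError("reps must be at least 1")
    return [
        "ruby", "runForsim.rb",
        "-i", input_file,
        "-r", str(reps),
        "-o", fig_format,
        "-c", "true" if compress else "false",
    ]


# Command line with all defaults spelled out.
print(" ".join(build_wrapper_command()))

# Ten replicates of a hypothetical input file, pdf figures, gzip on.
print(" ".join(build_wrapper_command("demo.sim", reps=10,
                                     fig_format="pdf", compress=True)))
```

The returned list could be handed to subprocess.run from inside the ForSim directory; it is printed here so the sketch runs without ForSim installed.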
The program itself produces numerous text output files as described below, but if run
with the wrapper, these are used to generate many useful graphical files summarizing the
data, and all of the output is then sequestered in a run-specific output folder named
to identify the date, time, and replicate number. The wrapper places these folders in a
forsim/runData directory, which the program creates if it doesn’t already exist.
NOTE: ForSim does this only at the end of a run. If you use the -r switch (and, hence,
iterate the same input file), these output folders will be created sequentially for each run.
But you cannot run the program more than once simultaneously (e.g., to test separate
input file specifications), unless each run uses a separate copy of the program in a
separate directory! Otherwise, the output results from different runs may be inextricably
mixed before being bundled and stored. The wrapper logic is straightforward, and the graphs
are mainly produced with R, so it could be rewritten in another scripting language if desired.
Also, aborted runs may leave various files in the forsim/ directory. For bookkeeping
purposes you should delete such files by hand. However, each subsequent run will use
run start time to identify run-specific files to put in the run’s runData folder, ignoring any
pre-existing files.
The burden’s on you!
ForSim is designed to be flexible, so it is not for canned push-button science. It can do a
lot, but you have to think about what you want to do. The burden is on you as the user to
conceive your questions carefully, especially to avoid building into the program what you
want to get out of it. Since so many things can be changed or used in different ways, you
must pre-plan your use to specify them carefully. Running ForSim is not difficult, but
it’s for science, so your study design—your input file specification—is all-important.
Start simple: use only a few of the possible parameters, use small samples and short runs.
Get a feeling for how it’s working, and then build up to what you really want to simulate.
And…before going farther, read the Manual carefully, or check relevant sections if your
run crashes.
An important conceptual point
ForSim simulates a basic evolutionary architecture, but not the specific variation or
causal details that will be present at the end of the run. As in real nature, evolutionary
processes determine that. ‘Evolutionary architecture’ refers to the demographic
conditions (population size, structure, mating and migration patterns, etc.) and the basic
genetic mechanisms (number of genes that may affect a trait, their interactions, the nature
of natural selection, mutation and recombination rates, etc.). The actual genetic
architecture at the end (how many alleles and haplotypes, and at which genes, affect
variation in the trait in the final population; the frequencies and effect sizes of the existing
SNPs; their linkage disequilibrium (LD) patterns; and so on) is neither foreseeable
nor prespecifiable in ForSim. As in real life, these aspects of genetic architecture are
strictly the result of the individual evolutionary run (simulation). Often, however, a set of
desired conditions (e.g., SNPs with a particular frequency or LD relationship) can be found
among those in the simulated data, again as occurs in life.
Playing around….but not a toy
ForSim can’t do everything, but it can do many different things, as you will see by
browsing this Manual. You have the ability (or burdensome responsibility!) to stipulate
many different conditions and values in what you simulate. For many purposes, a ‘toy’
model will suffice. This is a very simple model, clearly unrealistic, yet satisfactory to
investigate a particular point. It is easy and quick to set up the appropriate instructions
(input file) to test toy models, run them and get results.
But for many problems a more extensive range of models must be tested and evaluated.
Often, if not typically, you’ll just be guessing and will have to assume some values you
think reasonable, or try a range of them.
And while it’s true that ForSim can’t do everything (and it would be natural for you
quickly to spot something you’d like that it doesn’t do), there are many ways to get
conceptually equivalent results. Examples and discussion of this will be seen in several
places in what follows.
CHAPTER 2: INSTALLATION
ForSim has been tested on Linux installations and MacOS X. It should work seamlessly
in Unix. The program is installed from a forsim[###].tar.gz file, where ###, if present, is
a date stamp.
Uncompress this file with the command: tar -xzvf forsim###.tar.gz, which will produce
a ForSim directory containing the program contents. Included are the program source
code, the Makefile, various sample input (‘.sim’) files, the Ruby wrapper script
runForsim.rb, a README file, a copy of the Manual (this file), and a tidy.rb script that
can be used to clean up the forsim directory by removing various types of files
(e.g., miscellaneous results files left by previous runs that failed or did not complete).
On Linux systems, simply running ‘make’ in this ForSim directory will compile and
build the program executable. The program will then run from this ForSim directory.
For compiling on MacOS systems, please refer to the “README” file in the ForSim
directory.
The computer must have an installation of the Ruby scripting language. For graphics, the
R statistical software is needed, and access to X11 must be provided. Graphics can be
generated faster if the R “GDD” package is installed. (Please see “http://cran.r-project.org/src/contrib/Descriptions/GDD.html” for installation details.)
Some of the output content is produced by the wrapper script runForsim.rb rather than
ForSim itself (this is for various pragmatic reasons).
CHAPTER 3: HOW ForSim DOES ITS WORK
ForSim is very flexible and most aspects of its function can be altered by creative use of
the input specifications (examples are suggested in the Addendum). Note also that there are
many ways to achieve similar ends, so that things not explicitly in the language can be
done by creative approaches, many of which are described in this Manual. Appendix 1
provides a diagram of the main execution flow of the program. The relevant parameters
are specified in an input file that the program parses, as described later. Here is a
summary of the major features:
TABLE 1: ForSim MAIN FEATURES
(not all features are listed here)
BASIC FEATURES
  Specifiable duration of simulation (in synchronous generations)
  Single or multiple replicate simulations
  Point mutation and recombination that can be sex-specific; gene conversion, hotspots
  Sex-specific phenotypes
  Gene- and sex-specific mutation rates
  Mating by families formed with or without replacement
  Stochastic family size distribution and logistic population maintains specified size
  Multiple genes and chromosomes, of arbitrary number and length
  Multiple univariate or multivariate phenotypes
  Stochastically determined mutation-specific allelic effects on phenotypes
  Environmental (family and individual) contributions to individual phenotypes
  Gene x environment interactions
  Phenotype-based mate choice and migration (gene flow) between populations
ELABORATIONS USERS CAN SPECIFY OR CHANGE DURING THE RUN
  Flexible mating, phenotype determination, migration, and selection
  Multiple populations with hierarchical (cladistic) splitting, and user-specified split-times, phenotype-based or random mate-choice and gene flow, environmental effects, population size, and natural selection regimes
  Pleiotropic genetic effects
  Complex multilocus phenotype definition including networks and gene interaction
  Flexible natural selection criteria with stochastic fitness
  Ability to restart simulation for replicate subsequent runs, or to generate specified types of output data, including some ability to change run parameters
INPUT/OUTPUT FEATURES
  Population and pedigree data saved
  Data saved at user-specified generation check-points with real-time plotting of conditions and specification of the data to be saved at check-points
  Complete history of every variant SNP can be saved
  Output suitable for standard human genetic analysis and mapping software
  Output in rapid and easily parsed XML format
  Graphical output in browser-readable SVG, as well as other formats
_________________________________________________________________
Following is a brief description of how the program works.
Generations
ForSim uses synchronous generations, which begin with
generation 0 and run for the user-specified number of generations (0-based indexing is a
C++ characteristic). Each generation, families are formed, genes are mutated and
recombination occurs, phenotypes are determined, natural selection is imposed (if
any is specified), and the population or populations are then ready for the next generation.
The processing occurring during a given generation is only manifest by the start of the
next generation. This means that if you want data on generation n (which is the n+1st
generation), you must refer to it in the input instruction file (see below) as generation
n+1.
The program begins with only a single population, but as of generation 1 (the second
generation), and/or later as specified by the user, other populations may be created by
receiving individuals from existing populations. Thus if a new population is created at
the thousandth generation (generation number 999), it doesn’t really exist as an accessible
entity until the next generation. The simulation must run until the generation number is at
least the generation at which the most recent new population was founded plus the specified pedigree depth.
ForSim is a diploid simulator and we refer in this Manual to ‘males’ and ‘females’, but
there are no sex chromosomes. Haploid approximations can be made (see the suggestion
on Haploid evolution in the Addendum), but there are no real XX/XY differences.
Mutation and recombination: These occur each generation, randomly across the genome,
and can differ between males and females. New individuals are randomly assigned male
or female status (p=0.5). With user-specified probabilities, a mutation can have no effect,
or an additive effect that can be either positive (adds to the trait) or negative (subtracts)
(for the syntax, see ‘Mutational (allelic) effects’, below).
You can specify locations at which there are recombination hotspots (higher rate than
elsewhere on the chromosome). To do this, in the input file (see below), add the line or
lines as needed: hotspot start length rate (in megabases per centimorgan),
e.g. hotspot 1000000 1000 1.0, in the appropriate chromosome block, but not within a
gene block even if it involves a gene. The default, if no hotspot line is given, is no hotspots.
Gene conversion is a double recombination within short nucleotide distances. This can be
specified with the syntax specifying the probability that, once a recombination occurs,
another will occur nearby, in basepairs downstream from the first, specified as following
a gamma distribution. The syntax, in the global block, is:
geneConversion Prob gammaShapeParameter gammaScale Parameters
such as geneConversion 0.001 10.0 10.0
Parameters must be floating point. The default, or if the keyword is not specified, is no
gene conversion.
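To get a feel for what the example values imply, the following Python sketch draws the downstream distance from a gamma distribution. It assumes the third value is a scale parameter, so that the mean distance is shape times scale (100 bp for 10.0 and 10.0); how ForSim itself parameterizes and rounds the distance is not stated here, and the function name second_breakpoint is hypothetical.

```python
import random


def second_breakpoint(first_bp, prob, shape, scale, rng):
    """Return the position of a nearby second recombination, or None.

    With probability `prob` a second breakpoint occurs downstream of
    `first_bp`, at a gamma(shape, scale)-distributed distance in basepairs.
    Rounding to a whole basepair is an assumption of this sketch.
    """
    if rng.random() >= prob:
        return None
    offset = rng.gammavariate(shape, scale)
    return first_bp + int(round(offset))


rng = random.Random(1)

# Distances implied by the example line: geneConversion 0.001 10.0 10.0
offsets = [rng.gammavariate(10.0, 10.0) for _ in range(100_000)]
print("mean distance (bp):", sum(offsets) / len(offsets))

# How often a conversion follows a recombination at probability 0.001.
hits = sum(second_breakpoint(1_000_000, 0.001, 10.0, 10.0, rng) is not None
           for _ in range(100_000))
print("conversions per 100000 recombinations:", hits)
```

Under these assumptions conversion tracts are short (on the order of 100 bp), which is the behavior the text describes as a double recombination within short nucleotide distances.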
Mating: An individual is chosen randomly for mating. Mates are chosen optionally with
or without replacement—without replacement is the default. Mate selection may be
specified to be phenotype-dependent so that assortative mating may be simulated.
Offspring sibships are generated from each mating pair, constrained by limits on
maximum family size and per-generation growth rates (see below), and new mating pairs
are chosen until the target offspring size is achieved to within stochastic accuracy. When
there are multiple populations that interact, a potential parent in a given population
chooses a suitable mate from the same, or other populations, with user-specified
probabilities and optionally based on the selectee’s phenotype.
Before each generation all ‘males’, and separately all ‘females’, are put in randomized
order. Males are picked in their order on the mating stack, starting from the top, and they
search for females in their respective stack-order until a suitable mate is found. In mating
without replacement, the male and female are removed from the eligible mate list. In
mating with replacement, they are moved to the bottom of the list of eligible potential
mates.
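The stack-based pairing just described can be sketched in Python as follows. This is a simplified illustration, not ForSim's C++ code: the names form_pairs and is_suitable are hypothetical, the suitability test stands in for any phenotype-based mate choice, and a male who finds no suitable female is simply dropped from the list, which is an assumption of the sketch.

```python
import random


def form_pairs(males, females, n_pairs, with_replacement, rng,
               is_suitable=lambda male, female: True):
    """Pair males and females from two independently shuffled stacks."""
    males, females = list(males), list(females)
    rng.shuffle(males)      # randomized order of all males
    rng.shuffle(females)    # randomized order of all females
    pairs = []
    while len(pairs) < n_pairs and males and females:
        male = males.pop(0)  # next male from the top of the stack
        # Search the females in stack order for the first suitable mate.
        match = next((i for i, f in enumerate(females)
                      if is_suitable(male, f)), None)
        if match is None:
            continue  # sketch assumption: this male does not mate
        female = females.pop(match)
        pairs.append((male, female))
        if with_replacement:
            # Mated individuals go to the bottom of the eligible lists.
            males.append(male)
            females.append(female)
    return pairs


males = ["m1", "m2", "m3"]
females = ["f1", "f2", "f3", "f4", "f5"]
rng = random.Random(42)
print(len(form_pairs(males, females, 10, False, rng)), "pairs without replacement")
print(len(form_pairs(males, females, 10, True, rng)), "pairs with replacement")
```

With three males, mating without replacement cannot supply ten pairs, which illustrates why a population mated without replacement may fall short of its desired parental pool.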
[NOTE: There will always be at least one male and one female in the next generation when it
is formed. But if selection is too strong and one or both are culled, the population can
become extinct, crashing the run. If mating is specified as without replacement, there
may (as in real life) not be enough mates to achieve the desired parental mating pool size,
or not enough individuals of both sexes, and a population may not grow in the normal way or may
tend towards unplanned extinction.]
Family and population size: Mated parents produce offspring, with family size distributed
as a Poisson with user-specified mean. If this is set to 2.0 the population is stationary
unless there is selection. If there is selection, or to reduce the probability of population
decline, a value somewhat higher than 2.0 can be used to approximately accommodate
selective loss. But reproduction is stochastic, so that if a population is below its user-specified
target size, mating will continue (if mates are available) and the population will
shrink or grow with mean family size altered (increased or decreased) to respond to the
excess or deficit. Population expansion and contraction are logistic as described by the
standard Verhulst equation, in which population size is determined by current size, a
growth rate, and a population carrying capacity. The Verhulst equation models the rate
of reproduction as being proportional to both the existing population size and the
availability of resources. Thus, when populations are small and resources abundant,
population growth is rapid, and as populations become large and resources scarce, growth
slows and eventually stops as population size reaches carrying capacity. The per-generation
population change is specified as:
[1]  $$P_1 = \frac{K \, P_0 \, e^{rate}}{K + P_0 \left( e^{rate} - 1.0 \right)}$$
where P0 and P1 are the population sizes in the current and the next generation, K is the
carrying capacity, and rate is the per-generation growth rate. The carrying
capacity, growth rate, and starting population size can be set independently for each
population. This scaling adjustment, however, is constrained by the user-specified maximum
family size and maximum per-generation growth rate. The growth rate is proportional to
the difference between the current and target population size, roughly a logistic growth
model. [NOTE: For fertility and population growth, if there are t years per generation
and you want to model growth of fraction r per year,
then use growthRate rt (the product of r and t). This will generate per-generation growth
by a factor of approximately (1+r)^t, or e^(rt). Thus, at 25 years per generation and 2.1%
growth per year, the value for the growth rate line is 25*0.021, which is 0.525. (Note that
this is just an example: 2.1% annual growth is very rapid for human populations, where zero
is closer to steady state and 1% is substantial.)]
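As a worked illustration of equation [1] and of the growth-rate arithmetic in the note, the Python sketch below iterates the deterministic logistic update. It deliberately ignores the stochastic family sizes and the maximum family size and growth constraints mentioned above; the function names and the starting values (100 individuals, carrying capacity 10000) are hypothetical.

```python
import math


def next_population_size(p0, k, rate):
    """Equation [1]: expected size next generation under logistic growth."""
    growth = math.exp(rate)
    return (k * p0 * growth) / (k + p0 * (growth - 1.0))


def per_generation_rate(annual_rate, years_per_generation):
    """Growth-rate value for the input file: r times t."""
    return annual_rate * years_per_generation


# 2.1% per year at 25 years per generation, as in the note above.
rate = per_generation_rate(0.021, 25)
print("growthRate value:", round(rate, 3))

# Deterministic trajectory from 100 individuals toward K = 10000.
size = 100.0
for generation in range(1, 21):
    size = next_population_size(size, 10000.0, rate)
    print(generation, round(size))
```

Growth is close to exponential (a factor near e^rate per generation) while the population is far below K and levels off as it approaches K; a population already at K stays there.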
Phenotype definition: Phenotypes are affected by genes as well as environments, and
genes may affect phenotype indirectly by affecting each other (epistasis) or (under one
option) by gene-environment interactions. The user specifies additive or more complex
contributions of the genotypes at each gene, with algebraic functions (see Defining
phenotypes, below).
Natural selection: Natural selection in ForSim is based on phenotypic rather than
genotypic criteria. Selection occurs based on user-specified functional criteria, so that
those not satisfying the criteria are culled (removed from the population) before mating
(see Specifying natural selection, below). Selection takes two possible forms: new
individuals can be immediately screened for fitness, a form of mortality selection; or, by
using phenotype-based mate choice, a form of fertility selection can be imposed.
Selection and phenogenetic criteria can be changed during the simulation, within as well
as between populations, at user-specified generations using event lines in the input file.
[NOTE: As in Nature, too-severe selection can lead the population to go extinct!]
Narrow cutoffs represent stringent purifying selection. If a positive-valued phenotype
threshold cutoff is used to classify a person as ‘affected’, then if the floor cutoff is closer
to the mean than the ceiling is, this will favor mutations that contribute positive
effects with respect to the trait. Phenotypes are generated as quantitative traits, but a
threshold-based selection option treats traits as qualitative.
The simplest stipulation of selection is in terms of relative rather than absolute
phenotypes: truncation selection, with fitness specified in units of standard deviations of the
current population phenotype distribution. The user specifies lower and upper SD limits,
within which fitness is 1.0, and zero otherwise. Individuals with phenotypes outside these
relative truncation limits are not part of the next generation’s parental gene pool.
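Relative truncation selection of this kind can be sketched in a few lines of Python. The function name truncation_survivors and the toy phenotype values are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption of the sketch, not a statement about ForSim's internals.

```python
import statistics


def truncation_survivors(phenotypes, lower_sd, upper_sd):
    """Keep phenotypes whose fitness is 1.0 under relative truncation.

    Limits are given in standard deviations from the current population
    mean, e.g. (-2.0, 2.0). Everything outside them has fitness zero.
    """
    mean = statistics.fmean(phenotypes)
    sd = statistics.pstdev(phenotypes)
    if sd == 0.0:
        # No phenotypic variance (e.g. generation 0): nobody is culled.
        return list(phenotypes)
    low = mean + lower_sd * sd
    high = mean + upper_sd * sd
    return [value for value in phenotypes if low <= value <= high]


# Ten ordinary individuals and one extreme outlier.
population = [float(i) for i in range(10)] + [50.0]
survivors = truncation_survivors(population, -2.0, 2.0)
print(len(population) - len(survivors), "individual(s) culled")
```

Because the limits move with the current mean and standard deviation, the same limits cull a roughly constant fraction of the population each generation, whatever the absolute phenotype values are.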
Selective neutrality can be specified in three ways. Most efficiently, variation in a gene
that is defined as not affecting any phenotype evolves neutrally by definition. Secondly,
and realistically for genes that do affect phenotype(s), specifying broad relative selection
cutoffs (e.g., -5.0,5.0, meaning truncate only individuals with phenotypes more than 5 SD
from the current population mean phenotype) will mean that few if any individuals are
selectively culled. In essence, if s < 1/Ne, where s is here the truncated
tail area of a normal distribution and Ne is the effective population size, the trait and
its affecting genes both evolve essentially
neutrally. (NOTE: This s applies to a trait, not to a specific gene or allele, because
ForSim is a phenotype-based simulation, just as Nature works on phenotypes rather than
genotypes.) Thirdly, a phenotype can be explicitly specified as neutral, in which case
genes contribute to the phenotype, but no selective test is imposed regardless of the
phenotype value in an individual.
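The s < 1/Ne rule of thumb can be checked numerically. The Python sketch below computes the truncated tail area of a standard normal distribution for given SD cutoffs and compares it with 1/Ne; it assumes normally distributed phenotypes, and the function names and the example Ne of 10000 are hypothetical.

```python
import math


def truncated_tail_area(lower_sd, upper_sd):
    """Fraction of a standard normal distribution outside the cutoffs."""
    lower_tail = 0.5 * math.erfc(-lower_sd / math.sqrt(2.0))  # P(Z < lower)
    upper_tail = 0.5 * math.erfc(upper_sd / math.sqrt(2.0))   # P(Z > upper)
    return lower_tail + upper_tail


def effectively_neutral(lower_sd, upper_sd, ne):
    """True if the culled fraction s is below 1/Ne."""
    return truncated_tail_area(lower_sd, upper_sd) < 1.0 / ne


ne = 10000  # hypothetical effective population size
for cutoffs in [(-2.0, 2.0), (-3.0, 3.0), (-5.0, 5.0)]:
    s = truncated_tail_area(*cutoffs)
    print(cutoffs, "s =", f"{s:.3g}", "neutral:", effectively_neutral(*cutoffs, ne))
```

With cutoffs of -5.0,5.0 the culled fraction is well under one in a million, so for any population of realistic simulated size the trait behaves as effectively neutral, whereas cutoffs of -2.0,2.0 cull several percent each generation.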
For evolutionary studies, it may be useful to save the entire history of every SNP
generated during the run. This can be done with a usingTrackSNPs true/false line in
the input file (see Chapter 4). A series of files is saved, one each time a SNP buffer fills
and one at the end, and these can be jointly analyzed. But be careful what you wish for! A large
or long run will generate enormous numbers of SNPs, just as Nature does, and that means huge data
storage requirements.
IMPORTANT NOTES: ForSim begins with every member of the starting population
having the same genotype—that is, there is no genetic variance nor phenotypic variance
due to genetic variance. The specified nucleotide spots in the simulated genetic data are
blank. As variants arise by mutation, they are stored as a pair of random, differing
nucleotides (A, C, G, T), one assigned to ‘ancestral’ and the other to ‘novel’ allele status.
In some output files these are recoded to 1,2,3,4 as preferred by some genetic software.
Since everyone is genomically identical at this point, all phenotype variance would be
due to environmental effects, if any are specified. Therefore, for many purposes, a ‘burn-in’
time of your choosing (such as a few hundred or thousand generations, depending, for
example, on population size) is needed before effective mutation, selection and genetic
variation have accumulated to something approaching an equilibrium state. This also means
that selection, especially truncation selection, can destroy the entire population right
away if it is too severe (just as it does in real life). Strong selection imposed at the
beginning can work if there is environmental variance so that some individuals will
survive, but this will greatly reduce effective population size until a burn-in time has been
achieved.
NOTE also that this is written in C++ and arrays are indexed starting at 0. Thus, some
values in output files or on-screen reporting refer to Gene0 or population0 or phenotype0,
or generation 0, referring to the first-named or first-occurring item in a series.
A note on speed
ForSim runs are acceptably fast for a wide variety of evolutionary scenarios, but speed
depends necessarily on the complexity of the input specifications. The more complex the
simulated conditions the more time will be required to complete the simulation. Specify
as simple a run as will be a close enough approximation to what you want to test. One
population is fastest, or more than one but without post-division migration. Truncation
selection is faster than complex selection, as is simple rather than complex phenotype
specification. Specifying selection as ‘neutral’, or mateCutoff internal/external
as ‘none’, will also run faster. (Both are defaults and need not be explicitly specified in
the input file, but it is a good idea to put a default line in the input file as a reminder
to yourself.) Similarly for migration and mate choice: random (which is the default) is
faster than phenotype-based.
For some epidemiological conditions, population splits, admixture, or differential
environmental conditions need only be specified at the last few generations of a run. For
some evolutionary purposes, one may wish to track the entire inheritance history of every
SNP that is generated during the run, whether or not it is present at the end. Setting
usingTrackSNPs true will do this, but with a speed cost. Otherwise, use the default
usingTrackSNPs false. These data may be of no interest, for example, in genetic
epidemiological analysis.
ForSim places no formal limits on complexity, and impracticably long runs can easily be
devised. Contorted conditions are more likely to entail inadvertent bugs (or reveal real
ones). It’s your obligation to conceive tractable problems that will cogently answer the
question you want to answer, and to check the results to see that they seem to do what
you specified. It is not sensible to oversimplify, but many details can be omitted without
substantial loss in information, unless they would proliferate during a run. These things
can be explored in each case with some small-size test runs.
CHAPTER 4 : CONTENTS OF THE INPUT FILE
ForSim takes its running conditions from an input text file (default name input.sim)
whose syntax is designed to be straightforward and intuitive to produce. The input file is
read line by line. NOTE: This file must be in plain text. In particular, it cannot contain
platform-incompatible line-end characters, which cause otherwise inexplicable
hangups. If need be, use the dos2unix Linux utility or the command tr -d '\r' <
inputfileX.sim > inputfile.sim to purge such line-enders. Our experience is that
textEditor and other similar programs work suitably for Mac installations.
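Where neither utility is at hand, the same clean-up can be sketched in a few lines of Python. This is only an illustration and not part of ForSim; the function names strip_carriage_returns and clean_input_file are hypothetical.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Remove every carriage-return byte, as `tr -d '\\r'` does."""
    return data.replace(b"\r", b"")


def clean_input_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with all carriage returns removed."""
    with open(src_path, "rb") as src:
        cleaned = strip_carriage_returns(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
```

Calling clean_input_file("inputfileX.sim", "inputfile.sim") would mirror the tr command above.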
To make developing specification of running conditions as easy as possible, even for
complex situations, the input file has a begin-end block structure, keyworded format,
that ForSim parses. Various keywords followed by begin specify the start of an input
block, which is terminated by the end keyword. Within each block, additional keywords
specify the item whose parameters are then given. The actual files are plain text format,
but in the examples in this Manual, color-coded syntax is used for clarity: block
keywords are given in red font, while parameter keywords are in blue. Comments are
shown in the example in orange font.
# Comment line(s) explaining the file’s objectives
global begin
    Global general running conditions, using global-keyword parameters (output
    options, mutation & recombination rates, fertility parameters, generations to
    simulate, changes to occur at specified points during the run)
end # of the global block
chromosome begin
    Chromosome specifications: a separate block for each chromosome, using
    chromosome-keyword parameters (length)
    gene begin
        Gene specifications: a separate block for each gene, nested in its
        chromosome block, using gene-keyword parameters (location, size, mutation
        rate, allelic effects)
    end # of this gene block
end # of this chromosome block
phenotype begin
    Phenotype definition, separate block for each trait, using phenotype-keyword
    parameters to specify phenogenetic model (genes, environments, interactions)
end # of this phenotype block
population begin
    Population specifications, separate block for each population, using
    population-keyword parameters (size, environment effects, selection & mating
    patterns)
end # of the population block
For readability, the input file parser uses a simple syntax in which each input file line is a
single command unit. Whitespace (blank lines and indentation) is ignored (except
between words or parameter values), but a newline is needed between each command.
There are no preset limits for the number or characteristics of chromosomes, genes, or
populations (except that the simulation must begin with only a single founding
population). For simple situations, the input parameters can simply be typed into an
input text file using the format shown below. For more complex situations, such as many
different genes, a file-making script will probably be most efficient. Comments are
permitted, indicated by ‘#’ on a line, so long as it is the first character in the line or is
preceded by a space. Comments, including an explanatory line or lines at the beginning,
will be helpful for users simulating many different conditions.
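Because not every syntax error is reported by the program, a quick pre-check of the begin/end nesting can save a failed run. The following is a minimal sketch, not ForSim's own parser. It assumes that a block opens on any line whose last word is begin, that a block closes on a line consisting of end alone, and it applies the comment rule stated above. The function names are hypothetical.

```python
def strip_comment(line: str) -> str:
    """Drop a comment that starts with '#' at the line start or after a space."""
    if line.startswith("#"):
        return ""
    cut = line.find(" #")
    return line if cut == -1 else line[:cut]


def check_blocks(text: str) -> list[str]:
    """Return a list of problems found in the begin/end nesting of `text`."""
    problems = []
    open_blocks = []  # stack of (keyword, line number) for unclosed blocks
    for number, raw in enumerate(text.splitlines(), start=1):
        words = strip_comment(raw.strip()).split()
        if not words:
            continue  # blank or comment-only line
        if words[-1] == "begin":
            open_blocks.append((words[0], number))
        elif words == ["end"]:
            if open_blocks:
                open_blocks.pop()
            else:
                problems.append(f"line {number}: 'end' without a matching 'begin'")
    for keyword, number in open_blocks:
        problems.append(f"line {number}: '{keyword} begin' is never closed")
    return problems
```

An empty list means the nesting is balanced; it says nothing about whether the keywords inside the blocks are valid.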
Basically, users specify the phenogenetics of one or more traits, that is, their underlying
genetic architecture, aspects of the simulated genome’s evolution, functional effects of
new mutations, criteria for mate choice and reproduction, and population dynamics over
time. By use of the event keyword parsed by ForSim, many of the simulation
conditions can change at specified points during the simulation. The user specifies
specific aspects of output, such as the generations during a run at which population data
should be saved (if any such saving is desired), and aspects of final data to be generated,
including the number and generational depth of pedigrees to be generated.
The following model input files explain the available specifications. A basic version,
Example 3a, ‘basic.sim’, is in Appendix 2 at the end of this manual; it can be cut and
pasted into a new text file, and its entries deleted, duplicated, or modified. An even
simpler version, smple.sim, is Example 3b, a quick-running way to test and debug
syntax issues and to evaluate quickly the effect of modifying input instructions.
The global parameter block specifies conditions that apply to the entire simulated data,
although some of these can be changed locally during the run. After the global block,
locally specific keyword blocks specify the nature of chromosomes, genes, phenotype
determinations, and populations. Chromosomes are numbered, starting at 0, in the order
in which they are specified. Phenotypes, populations, and genes are given names. The
number of repetitions of entries (phenotypes, chromosomes, genes) is open-ended.
However, we recommend only simulating a single chromosome for practical reasons;
search for Simulating Multiple Chromosomes below.
For some simulations, a discrete affection status (affected/unaffected) is desired. This
applies to the first-specified phenotype only, and is simulated as a threshold such that an
individual is ‘affected’ if its phenotype exceeds the threshold, specified in terms of
number of standard deviations from the current phenotype mean (the cutoff can be positive
or negative, but must be explicitly specified in the input file). Output specifies affection
status in the preMakeped (LINKAGE analysis format) files, a standard format for genetic
epidemiological analysis. Prevalence is specified by the user and is not used explicitly by
the program except at the end; if you wish something to be based on affection, such as
selection, migration, or mate choice, it could work to simply specify the same cutoff
criterion.
At the end of a run, ForSim saves the pedigrees that comprised the entire population that
produced the last n generations in the simulation. The default is 3 generations but
pedigrees of 2 (nuclear families only), or of more than 3 generations can also be specified
if needed (finalPedigreeDepth #). NOTE: The specified number of generations is
the maximum for any given pedigree: if some subset of matings fail to reproduce, the
overall pedigree will still be included (as occurs in real life), but childless couples are
not saved, so that family size distributions are truncated.
The input file can have any name, passed as command-line parameter (see above), with
‘input.sim’ provided as a default in the ForSim installation package, used if no filename
is specified in the runForsim command line.
If ForSim is unable to parse the input file, it will exit, sometimes with an error message
containing the text and number of the line it failed to parse. But be aware that not all
syntax errors are detected. A simple misspelling in the input file will give rise to an
error message like the following:
Error in input file on line 17 of 101 lines.
Line #17 : matingWithReplacement true
We suggest that you write yourself a text paragraph explaining what you intend to
simulate, draw a flow diagram if it is at all complicated, and construct your input file.
Then, save all of these in a descriptive file with a name you’ll understand later, so you
can confirm that you did what you intended, can remember it, and can relate it to the
output results. Put this self-informing file in the output file directory.
Also, liberally comment the input file so its intent will be clear to you later. A useful but
not mandatory practice is to have one or more explanatory comment lines at the
beginning of the file saying what you intended to do. The first line might begin with
something like (this is not mandatory, just a suggestion):
# WHAT: short test of phenotype based migration, two pops, 1000 gens
Using the event keyword to modify conditions during the run
There are several conditions that can be changed using the event keyword. The mate
choice pattern within or between populations, the overall environmental effects
distribution, the selection regime, and the population size can be changed during the run
in this way. Other changeable conditions may be added in future modifications. The syntax for
most commands is
event ## where ## is generation number, plus other parameters:
donateParents SourcePopName ReceivingPopName #males #females
mutation Remaining syntax as in global definition (changes all gene mut. rates)
outputXML true/false
printGeneration (available with some restrictions)
serializeState
setCarryingCapacity Popname MaxPopSize
setEnvironmentNormal PopName PhenotypeName mean variance
setFertility PopName poisson mean (Popname not needed in global block)
setMatingPopMatrix PopName %-mates-chosen-from-each-population
setMaxOffspringNumber PopName # (Popname not needed in global block)
setPhenotypeSelection PopName PhenotypeName [full selection line]*
*This means use a regular selection specification line, with redundancy as follows:
event 100 setPhenotypeSelection Pop1 Phen1 relative -1.0 2.5
See the event examples in the sample input files below. The keyword system can also
be used to specify ‘serialized’ data to be stored at the specified generation, that can be
used to restart the simulation under changed conditions, do replicates from that point
forward, etc. More than one such instruction, each applying to a different generation may
be used. For the format and use, see below.
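For runs with many scheduled changes, event lines can be written by a script rather than by hand. The sketch below builds only the setPhenotypeSelection form shown in the example line above, forcing the two cutoffs into floating point format. The function name selection_event is hypothetical, and the output should be checked against the sample input files before use.

```python
def selection_event(generation: int, population: str, phenotype: str,
                    lower: float, upper: float) -> str:
    """Format one 'event ... setPhenotypeSelection ... relative' line."""
    # float() guarantees a decimal point for ordinary values, e.g. -1 -> -1.0.
    return (f"event {generation} setPhenotypeSelection {population} "
            f"{phenotype} relative {float(lower)} {float(upper)}")


if __name__ == "__main__":
    # Tighten the upper cutoff at generations 100, 200, and 300.
    for gen, upper in [(100, 2.5), (200, 2.0), (300, 1.5)]:
        print(selection_event(gen, "Pop1", "Phen1", -1.0, upper))
```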
NOTES: There must be only one population at the beginning; other populations can be
founded in any subsequent generation. Note also the subtlety that if multiple populations
are being simulated, the run must not end until the last-defined population has existed for
at least as many generations as the specified pedigree depth. Thus for 3-generation
pedigrees and a population founded at generation 1000, the input file must specify
generations 1003 (or greater).
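The arithmetic behind this rule is simple enough to check in a script that builds input files. A minimal sketch follows; the function name minimum_generations is hypothetical, and the default depth of 3 is the finalPedigreeDepth default given in this manual.

```python
def minimum_generations(latest_founding_generation: int,
                        pedigree_depth: int = 3) -> int:
    """Smallest 'generations' value that lets the last-founded population
    exist for at least pedigree_depth generations."""
    return latest_founding_generation + pedigree_depth
```

For the case described above, minimum_generations(1000, 3) gives 1003.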
Not all parameters can be changed by event lines. An example is the gene-specific
allelic effects probabilities. However, some such variables can be changed by means
given in the Addendum that involve stopping the run, modifying the parameters, adding a
load instruction, and restarting.
Input file keywords and their meaning or default values if relevant
Unstarred terms must be explicitly listed in the input file (some, like event, only if
they’re being used). Single-starred terms have default values as given here, and need not
be explicitly specified in the input file (though for clarity and reminders, it may be good
to include them). Double-starred terms are contextual and don’t take specific values per
se, but lines using them may require such values. For explanation, do a Find search in the
main text.
Keyword
Default, etc.
**absolute
Requires some value to be specified
birth
0 (generation pop. starts)
carryingCapacity
Size of population
*death
**definition
**donateParents
**environment
**familyEnvironment
*environmentNormal
**event
*familyEnvironmentNormal
**female
*finalPedigreeDepth
firstCodonProbability
functional
gamma
**gene
geneConversion
generations
growthRate
hotspot
ifFemale
ifMale
initialSize
*intron
length
location
**male
mateCutoff external
mateCutoff internal
*matingWithReplacement
megabases per centiMorgan
mutation rate
**name
neutral
output
*outputSVG
*outputXML
**phenotype
poisson
**population
*prevalence
printGeneration
when population goes extinct;
if not specified, end of run
phenotype definition
must specify source and amount
usable in phenotype definition
usable in phenotype definition
must specify mean, variance
0 0 (mean, variance)
3
1.0
selection type; specify func. equat.
must specify the parameters vals
gene block header
prob gammashape gammascale,
default none
location length rate, default none
sex-specific phenotype specifier
sex-specific phenotype specifier
for new population
false
gene length
gene start position
false
specifying recombination
global or gene-specific
for genes or populations
type of (no) selection
gen. interval for special output
false
true/false (can be set in ‘event’ lines)
phenotype block header
mandatory word in fam. size spec.
population block header
relative, 0.05
Available with some restrictions
probabilityNoEffect
probabilityPositiveEffect
relative
*scaleEnvironmentalNormalVariance
*secondCodonProbability
**selection
*serializeState
setCarryingCapacity
setFamilyEnvironmentFraction
setEnvironmentNormal
**setFertility
setHeritability
setMatingPopMatrix
*setMaxOffspringNumber
setPhenotypeSelection
setSimulatedGenomeFraction
*thirdCodonProbability
*usingSpatial
*usingTrackSNPs
*usingRecurrentMutations
*usingCodons
selection mode, needs cutoff limits
false (else give parameters)
0.9
specify selection regime
final generation or specified by event
default 0.0
family size specification
default 0.4
default 1.0
0.5
false (values or defaults)
false
false
false
The following sections describe a simple single-population simulation, and a more
complex multiple population simulation, respectively. The input files which generate
these simulations are included with the ForSim distribution.
Sample input file for single population simulation
Successful simulation with ForSim depends on careful specification of the run conditions
in the input.sim file. Because the program is flexible, there are many options. They need
not all be used (there are defaults for everything), but to test specific things you must be
careful in designing your input file.
The simplest basic simulation is of evolution in a single population, as shown in this first
example input file. In this description, entries like PhenA, ABC1, and numbers are
run-specific examples; but the keywords, like megabases, phenotype, gene, etc. must
be explicitly included. There can be as many chromosomes, phenotypes, and genes as
the user desires (more means slower, of course!), and genes can affect whatever phenotypes
the user specifies (or none, in which case they accumulate mutations but are not affected by
natural selection). We urge using only one chromosome, with ‘genes’ spaced far apart
where unlinked locations would be desired (search on: Simulating Multiple
Chromosomes, below).
Mutational (allelic) effects
In each generation, a mutation can occur in each individual in a specified ‘gene’ according
to user-specified mutation rate probabilities. Mutation rates can be sex-specific and
gene-specific, at the user’s choice. These are set in the global block if they are to apply to all
genes, or they can be specified using the same syntax, separately for each gene not
following the global rate. However, NOTE that if an event instruction changes the
mutation rate, that is then applied globally. The rate is set in scientific notation: 2.5 E -8,
where E refers to powers of 10.
With user-specified probabilities, new mutations have either (a) no effect, or some
additive effect that can be either (b) positive (adds to the trait) or (c) negative (subtracts).
This is specified by 2 probabilities in the input file: a and b. Partition c, the fraction of
negative mutational effects, is the complement of these to sum to 100% of possibilities.
NOTE: These specifications can be made in the global block and will apply to all genes.
They can also be made in each gene block. The last-specified value (global or
gene-specific) will be what applies to a given gene. That is, a global rate will be overridden if
specified for a specific gene. If you change mutation rates in an event, this will apply
globally.
The effect size of a new mutation is then determined by a random draw from a gamma
distribution whose two parameters, shape (usually denoted by k or α) and scale (θ or
β), are specified by the user, in that order, in the input file. Parameters of the gamma can be
specified independently for positive and negative effects. This shows an example:
The next two figures show gamma parameters for positive effects, that yield only a small
fraction of large effects (left, gamma (1.0,0.05)), and a larger amount (right,
gamma(1.0,3.0)). NOTE: Specify numerical values in floating point format.
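The two-step assignment described above (first choose among no effect, positive, and negative; then draw a magnitude from the relevant gamma distribution) can be sketched as follows. This is an illustration of the logic as described in this section, not ForSim's code. The function and argument names are hypothetical. Python's random.gammavariate takes shape and scale in the same order as the input file.

```python
import random


def draw_allelic_effect(p_none, p_positive, pos_gamma, neg_gamma, rng=random):
    """Return the additive effect assigned to one new mutation.

    p_none and p_positive are partitions a and b; the negative fraction c
    is the remainder. pos_gamma and neg_gamma are (shape, scale) pairs.
    """
    if not 0.0 <= p_none + p_positive <= 1.0:
        raise ValueError("the two probabilities must sum to at most 1.0")
    u = rng.random()
    if u < p_none:
        return 0.0  # partition a: no effect
    if u < p_none + p_positive:
        shape, scale = pos_gamma
        return rng.gammavariate(shape, scale)  # partition b: adds to the trait
    shape, scale = neg_gamma
    return -rng.gammavariate(shape, scale)  # partition c: subtracts


if __name__ == "__main__":
    rng = random.Random(2013)  # seeded only so the demonstration is repeatable
    effects = [draw_allelic_effect(0.9, 0.05, (1.0, 0.05), (1.0, 0.05), rng)
               for _ in range(10000)]
    print(sum(e == 0.0 for e in effects) / len(effects))  # close to 0.9
```

Small trial runs of this kind give a feel for how often a given shape and scale pair produces a large effect before committing to a long simulation.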
Note that these are additive effects attached to each new mutant allele at the time it arises.
The ‘Ancestral’ allele by definition has zero effect; this means that the net effects of AA,
AN, and NN genotypes are 0, a, and 2a, where a is the assigned allelic effect.
A comment on Dominance
Gregor Mendel worked with crosses in inbred strains, and in subsequent experimental
work many investigators do that. This led to the widespread idea of Dominance that can
be characterized as ‘physiological,’ in the sense that the dominant allele always masks the
presence of a recessive allele at the locus (with co-dominance added as a term for when
the effect is only partial). The implication is that this is different from dominance in the
statistical sense as used in quantitative genetics, that refers to the mean phenotype of
individuals with the AN genotype relative to the midpoint between the mean AA and NN
phenotypes. But these are false distinctions: physiological effects are only manifest in
their respective genomic background, and Mendel worked with crosses between inbred
strains. Thus, there is essentially no absolute effect.
ForSim does not currently explicitly specify classical inherent or ‘physiological’
dominance or recessiveness, though the ancestral allele’s zero assigned effect, or using
‘no effect’ for the novel SNP allele, makes them effectively recessive—they contribute
nothing to a phenotype. Large assigned allelic effects could in practice raise the
probability of the carrying individual being ‘affected’, a kind of approximate de facto
dominance or codominance, but this depends on the overall complexity of the model
being simulated, and of the chance aspects of mutational effects.
Dominance can be introduced in an explicit way. ForSim assigns allelic effects (see
discussion of mutational effects) that are additive. Dominance is usually parameterized
in the statistical sense as the deviation, d, of the heterozygotes’ mean phenotype from the
homozygote midpoint, (AA+NN)/2. Note that the ancestral allele A has by definition an
effect of zero, and a novel allele N has an assigned effect, say, n (drawn probabilistically, as
described above). With additive effects the genotype values are 0, n, and 2n, so the
homozygote midpoint is just the Novel allele’s dose, (0+2n)/2 = n, which equals the
heterozygote value: there is no dominance deviation. But if your phenotype definition
(see below) contains nonlinear functions such as (say) Phen=GeneA^2, the phenotypes will
be 0, n^2, and 4n^2, and the homozygote midpoint will be 2n^2, which differs from the
heterozygote value n^2.
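The comparison can be checked numerically. The sketch below takes the dominance deviation as the heterozygote phenotype minus the midpoint of the two homozygotes, and evaluates it for an additive and a squared phenotype definition. The function name and the two example definitions are illustrative only, and background genes and environment are ignored.

```python
def dominance_deviation(phenotype_of_dose, n):
    """Heterozygote phenotype minus the midpoint of the two homozygotes."""
    aa = phenotype_of_dose(0.0)      # two ancestral alleles, summed effect 0
    an = phenotype_of_dose(n)        # one novel allele, summed effect n
    nn = phenotype_of_dose(2.0 * n)  # two novel alleles, summed effect 2n
    return an - (aa + nn) / 2.0


def additive(gene_value):
    """Phen = GeneA"""
    return gene_value


def squared(gene_value):
    """Phen = GeneA^2"""
    return gene_value ** 2.0
```

With n = 0.5 the additive definition gives a deviation of 0.0, while the squared definition gives -0.25: the heterozygote value n^2 lies below the homozygote midpoint 2n^2 by n^2.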
Dominance also arises routinely in the sample or population sense of observed statistical
deviation of heterozygotes from the homozygote midpoint. This is because, as in Nature,
the genomic and environmental backgrounds of the AA, AN, and NN individuals at a
given SNP will vary in finite samples such that the formally additive effects assigned to
the N allele at the time of its mutation are not precisely realized.
Dominance can be approximated in other ways as well. See suggestion 4, on Mendelian
traits, in Addendum for some comments on this.
Defining phenotypes
Phenotypes are defined in terms of simulated genotypes and phenotypes, with many
options.
Genotypic effects
Phenotypes are specified in terms of the genes and environments that affect them (for a
gene, the individual’s summed haplotype effects for the specified gene), and a phenogenetic
model. Note that this allows GxG interactions to be specified. The syntax is algebraic:
PhenA = G1 * 2.0 + G2 + G3 * ( G4 + G5 )
In the usual precedence order, a subset of standard mathematical functions can be used:
+, -, *, /, ^ (exponentiation), and e (exponentiation). For absolute value use
(variableName^2.0)^0.5. Functional expressions are parsed following standard rules of
algebraic precedence. NOTE: Specify numerical values in floating point format.
Phenotype definition can have a sex-specific component. Use of keywords ifMale
and/or ifFemale will substitute a value 1.0 for that keyword if the individual is of the
specified sex, zero otherwise:
PhenA = G1 * ifMale + 2.0 * G1 * G2 * ifFemale
NOTE: It is unrealistic to try to be too fancy here, as even simple nonlinear phenogenetic
or fitness relationships are challenging to confirm even in experimental data. Keep it
simple. Because ForSim must parse a wide variety of possible functions with as little
ambiguity as possible, every number must be floating point, and there must be a space
between every item (except negative numbers, which are written -2.3). See example and
explanatory material in the Addendum.
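To see what a sex-specific definition evaluates to, the example expression can be mirrored outside ForSim. The sketch below hard-codes that one expression, along with the absolute-value idiom given above; it is not a general parser, and the function names are hypothetical.

```python
def phen_a(g1: float, g2: float, is_male: bool) -> float:
    """Evaluate PhenA = G1 * ifMale + 2.0 * G1 * G2 * ifFemale."""
    if_male = 1.0 if is_male else 0.0  # ifMale: 1.0 for males, else 0.0
    if_female = 1.0 - if_male          # ifFemale: 1.0 for females, else 0.0
    return g1 * if_male + 2.0 * g1 * g2 * if_female


def absolute_value(x: float) -> float:
    """The absolute-value idiom (variableName^2.0)^0.5."""
    return (x ** 2.0) ** 0.5
```

For summed gene effects G1 = 0.5 and G2 = 0.25, a male has PhenA = 0.5 and a female has PhenA = 0.25, before any environmental contribution.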
NOTE also that while this specifies the logic of effects and interactions of basic genetic
pathway architecture, the actual quantitative phenogenetic effects in any given run
depend on the mutations that arise during the simulation, and their individual and
haplotypic effects, just as developmental and homeostatic pathways in natural organisms
are phylogenetically conserved but can vary by sequence evolution molded by drift and
selection.
Environmental effects
Random individual environmental effects are phenotype contributions imposed
independently on each new individual each generation, separately for every phenotype.
These effects are normally distributed with user-specified parameters that can differ for
each phenotype and population (default is Nor(0,1)), and can be changed during the
simulation run. Of course, you have to make some kind of guess at these parameter
values. For no environmental effects, use Nor(0,0), but note, as stated below, that because
the program begins with no genetic variation, without environmental effects the mode of
selection must be set to neutral for enough generations that some genetic variation
that affects the selected trait arises by mutation.
Additional family-specific environments may also be specified. A random
Nor(Fam_mean, Variance) variate, default Nor(0,1), is drawn once for each sibship. Each
individual in the sibship is given a value drawn independently from that distribution, used
as specified in the phenotype definition part of the input file. Family-specific
environmental effects are added (unless specified otherwise) to the individual random
environmental effects for a net total environmental effect for each individual. Default is
no family-specific effect (Nor(0,0) distribution).
Environmental effects for each individual are used in determining his/her phenotype as
specified in the phenotype definition block, as for example:
PhenA = G1 + G3 * ( environment - familyEnvironment )
If not specified in algebraic terms in the phenotype definition, random and family
environments are added to the genotypic effects for each individual. Individual
environments can be set for each phenotype and each population, specified in the
population definition blocks. These can be changed during the run by event
instructions; however, while the shared family environmental component can differ
among phenotypes, its distribution and application are the same for all populations. An
example is shown in testExample2.sim, below.
The input file syntax is:
environmentNormal PhenotypeA 0.0 1.0
familyEnvironmentNormal PhenotypeA 0.0 1.0
with such lines in each population block. If not specified, defaults are applied
automatically.
[NOTES: Numerical parameters must be specified in floating point format. By
specifying the variances in the two environmental components you are also essentially
but implicitly specifying the heritability that will result if the simulation approaches
equilibrium. You cannot easily know this value in advance (or at all with complex
interactions). So you must make a decision about the relative impact of G’s and E’s.
You can do some moderately complete test runs to see what the heritability approaches,
and then adjust the environmental variances to have approximately the relative variance
contribution you wish to simulate.
Family environmental effects are assigned to an individual when the individual is created.
These stay with the individual when s/he becomes a parent, meaning that the family
component is only applied to sibs.
A fixed environmental variance may not be realistic under natural selection, because
selection can move a population to a greater ‘fit’ within its environment, so that its
phenotype becomes less environmentally affected, rather than continually driving it in a
given direction. This kind of effect can be
approximated by scaling environmental effects, say, to diminish during a run by
scaleEnvironmentNormalVariance # ##, where # is the multiplicative factor
by which environmental variance decreases per generation, and ## is the final variance,
after which the environmental variance remains constant.
The program changes the environmental variance by an amount # which if positive
decreases the variance each generation, or if negative increases the variance. The initial
variance as specified in the global block must be compatible with the specified changes!
Thus, if initially V=1.0 and the decrement # is 0.001, then the variance decreases to zero
in 1000 generations; if the specified scaling factor is negative, the environmental variance
will increase by that factor each generation, without limit during the simulation. The
scale instruction applies from the first simulated generation on (is specified in the global
block). If you want to modify that during the run, use an event instruction to alter the
phenotypic variance (the scale target specified in the global block will remain).
This usage can be made only with random, not family, environments; it applies to all
traits and must be specified in the global input-file block.]
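The worked example in the note above (V=1.0 with a decrement of 0.001 reaching zero in 1000 generations) suggests a fixed amount subtracted each generation. The sketch below follows that reading, which is an assumption: if ForSim applies # as a multiplicative factor instead, the schedule will differ. The function name and the stopping rule at the final variance are illustrative.

```python
def variance_schedule(initial: float, step: float, final: float,
                      generations: int) -> list[float]:
    """Return the environmental variance in effect at each generation,
    assuming `step` is subtracted once per generation."""
    variance = initial
    values = []
    for _ in range(generations):
        values.append(variance)
        variance = variance - step
        if step > 0.0:
            # A decreasing variance stops at the final (target) value.
            variance = max(variance, final)
    return values
```

A short test run that reports the realized phenotypic variance every few generations is the safest way to confirm which schedule the program actually follows.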
Polygenic background
ForSim does not explicitly differentiate genes with major effect from ‘polygenes’ that
may be numerous but have individually minor effect. One can implement polygenic
effects by defining many scattered genes whose gamma functions rarely would generate a
more than trivial effect. Parameter values like 0.1 0.5 will do this, for example. Other
‘major’ genes can have effect distributions with greater probability of major effect. As
noted above, one can do empirical not-too-long runs with a given environmental
component to see what value the heritability approaches, and adjust the environmental
and polygenic components to generate a desired heritability.
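Once a test run has given an estimate of the genetic variance, a first guess at the environmental variance for a target heritability follows from h2 = Vg / (Vg + Ve). The sketch below assumes a purely additive phenotype with no family environment and no interactions, so it is only a starting point for further test runs. The function name is hypothetical.

```python
def environmental_variance_for(genetic_variance: float,
                               target_heritability: float) -> float:
    """Solve h2 = Vg / (Vg + Ve) for Ve."""
    if not 0.0 < target_heritability <= 1.0:
        raise ValueError("target heritability must be in (0, 1]")
    return genetic_variance * (1.0 - target_heritability) / target_heritability
```

For example, an observed genetic variance of 2.0 and a target heritability of 0.25 suggest an environmental variance of 6.0.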
Specifying natural selection (fitness)
Fitness is specified in various ways as user options specified in the input file. Fitness is
defined in relative-fitness terms, with maximum in a given population of 1.0. Selection
criteria are imposed sequentially and independently for each defined phenotype, so an
individual is saved for reproduction if passing all the screens. Selection can be applied to
a compound of phenotypes by defining a new phenotype, such as PhenC = PhenA*PhenB,
and applying selection only to PhenC.
The simplest fitness function is a dichotomous 0-1 step function, truncation selection in
which fitness=1.0 for all phenotypes within a specified range relative to the mean and
variance of the phenotype distribution in the person’s population, and zero otherwise: the
input file specifies truncation limits expressed in standard deviations relative to the
current population/phenotype mean. Fitness is 1.0 within the limits, 0.0 outside of them
(or 0.0 within and 1.0 outside is also possible, by appropriate sign change). An
individual is removed from the eligible mating pool if its fitness=0.0. The Figure shows
how phenotype acceptability ranges can be specified for neutral (very broad cutoffs),
stabilizing (symmetric cutoffs), and directional selection (asymmetric cutoffs).
Individuals with phenotype beyond the cutoff are selected out, that is, excluded from
mating. The syntax is selection PhenotypeA relative # #, where the #’s refer
to the lower and upper cutoff, in StD units.
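Truncation selection as described here reduces to a comparison in standard-deviation units. A minimal sketch follows, assuming the cutoffs themselves count as inside the accepted range (the manual does not say which way the boundary falls). The function name is hypothetical.

```python
def truncation_fitness(phenotype: float, mean: float, std_dev: float,
                       lower: float, upper: float) -> float:
    """Return 1.0 if the phenotype lies within [lower, upper] standard
    deviations of the population mean, and 0.0 otherwise."""
    z = (phenotype - mean) / std_dev  # distance from the mean in StD units
    return 1.0 if lower <= z <= upper else 0.0
```

With cutoffs of -1.0 and 2.5, a population mean of 10.0, and a standard deviation of 2.0, phenotypes from 8.0 to 15.0 are retained. The division fails when the standard deviation is zero, which is one way to see why a population that starts with no variation needs a neutral burn-in.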
Alternatively, fitness can be specified as a functional probability, f, based on a
user-specified mathematical function relating the individual’s phenotype to the current
phenotype distribution in its population (for example, a probability of exclusion from
reproduction that increases with the square of the phenotype’s distance from the mean in
SD units, or with the phenotype’s distance from some target optimal absolute phenotype
value specified by the user in the input file). This is schematically shown by the curve in
the figure. Each
individual is assigned an f based on the user-specified function, and that value of f is used
in a random draw to determine the individual’s probability of being in the mating pool.
Matings are formed from individuals who pass this screen, and there is no separate
mating-based fertility-based function at present.
For example, in the following line:
selection PhenA functional 1.0 - ( 0.05 * ( ( Phenotype - Mean ) / StdDev ) )
The term “PhenA” in this example specifies which phenotype is being used in the
functional expression. Within the expression itself, however, it must be referred to as
‘Phenotype’, which calls the relevant phenotype value of the individual
whose fitness is being evaluated. The terms “Mean” and “StdDev” here refer to the
mean and standard deviation of the distribution of the PhenA phenotype in that
individual’s population at that generation. Fitness is relative and must be standardized to
a maximum of 1.0 (that is, must be in the interval [0,1]). In this example, 0.05 is a
user-specified numerical value specifying the rate of decrease in fitness per unit difference
from the mean. The fitness function is evaluated for each individual in the population
and determines the probability that the individual will survive to reproduce.
NOTE: Specify numerical values in floating point format.
NOTE: As stated above in regard to phenotype definition, keep fitness functions simple.
Because ForSim must parse a wide variety of possible functions with as little ambiguity
as possible, every number must be floating point, and there must be a space between
every item (except negative numbers, which can be written -2.3). See the discussion and
suggestions in the Addendum.
NOTE: As in many situations, a population may not be able to survive some
conditions unless it is variable or large enough. For selection, it may be important to run
a burn-in number of generations before imposing the more stringent condition. The latter
can then be done via an event instruction. If the change you need is not within the
repertoire of event options, there are two other ways to achieve the same end. One is to
found a new population after the burn-in, have everyone in the old population be donated
to found the new population, and have the first population die at that time (you may want
to save its data for later checking, with an output instruction at that generation). The
second is to stop the simulation and serialize the data. Then modify the input file by
adding a load instruction, change or implement the new selection regime, and run the
reloaded population.
A few notes on selection modeling
There is no one way that selection operates in Nature or in ForSim. At any given time,
theory models individual fitness relative to others in the same population. Simulating
‘relative’ selection drives a population in the specified direction, but without limit unless
selection is changed by an event instruction. This may not be realistic. It may be that
selection drives a population towards some optimum mean phenotype. This can be
simulated by ‘functional’ fitness modeling (fitness highest near some optimal value, T).
But fitness relative to some threshold value requires that other simulation specifications
don’t drive the population so that nobody is fit, leading to extinction (unless extinction
conditions are what you’re simulating). For reasons of this sort, ForSim does not allow
specifying a rigid fitness threshold, above or below which an individual is culled from the
population; but fitness thresholds can be approximated by the functional method, by
specifying a function in the way just discussed. The functional option makes it possible to
model an open-ended range of scenarios. Artificial selection and some other aspects of
selection are discussed in the Addendum.
Selection is based strictly on survival, with no specific provision for fertility-based
selection (i.e., where expected family size depends on parental phenotypes). However,
for functional rather than relative-truncation modes of selection, survival is probabilistic.
This means an individual with fitness f has probability f of being fully reproductive, and
1-f of not being available as a parent. This is approximately the same as having a full
chance to survive but with the expected offspring number scaled by the factor f.
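The approximation in the last sentence can be checked with a one-line expected-value calculation. Here \lambda is a symbol introduced only for this illustration: the mean number of offspring of an individual that does reproduce.

```latex
\mathbb{E}[\text{offspring}] \;=\; f \cdot \lambda \;+\; (1 - f) \cdot 0 \;=\; f\,\lambda
```

So, on average, probabilistic survival with fitness f behaves like certain survival with mean family size f\lambda, although the variance in family size differs between the two models.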
Whether you are specifying natural selection or letting things drift, it may be
interesting to see, generation by generation, what happens to each SNP and
how its effects are manifest. For example, if a SNP has a given assigned effect in a
selectively favored direction, how noticeable is that allelic effect on the SNP’s fitness?
What is its net effect each generation? How well do your theoretical expectations of
how your genetic architecture evolves fit what actually happens? How long does a SNP
last in the population as a function of selection and its effects? By setting
usingTrackSNPs true, you get the data to find out (but it slows things down and saves
huge files, so don’t do it unless you mean it).
Multiple populations
At any point after the start of the run, that is, from generation 1 onward, one can use the
event property to create new populations by donating a number of new males and
females from existing populations to the new population. Then you must set the mating
pattern within and between the existing populations. Each population must have a
population definition block with the appropriate ‘birth’ generation and population
size. See the example of multiple population simulation for details.
Simulating multiple chromosomes
You can simulate multiple chromosomes, but the only thing that achieves is independent
assortment. It does this at the expense of logistic complexity, especially in
the output file formats. As described in the Note on Multiple Chromosomes (Chapter 5),
this generates files that will require post-run melding. For this reason, a single
chromosome is greatly to be preferred, with long-distance spacing between gene
segments that you want to assort effectively independently.
Specifying mate choice
Mate choice can be a form of fertility selection. Mating is usually random within
and between populations. However, nonrandom mating may be specified by using
mateCutoff internal or external in the input file. This is done in terms of the
chooser’s phenotype in Standard Deviation units of the chooser’s population’s phenotype
distribution. In searching for a spouse in its own (internal) or a specified other population
(external), an individual will only consider as potential mates individuals in the specified
range (in StdDev units) of the chooser’s population’s phenotype distribution relative to
the chooser’s own phenotype. Upper and lower SD limits are specified, and need not be
symmetric, so that large-phenotype individuals can prefer large (or larger, if so specified)
mates. Choice continues for all specified phenotypes until a potential mate fails to satisfy
one of the criteria. If ranges for a given phenotype are not specified in the input file, any
value ± 8 SD from the chooser (that is, essentially anyone) will be accepted.
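The acceptance rule just described can be sketched in Python as below. This is an illustration only: the function and variable names are hypothetical, and treating the range limits as inclusive is an assumption.

```python
DEFAULT_CUTOFF_SD = 8.0  # the default of +/- 8 SD stated in the text

def acceptable_range(chooser_value, pop_sd, lower_sd, upper_sd):
    """Absolute phenotype range a chooser accepts, from SD-unit limits."""
    return chooser_value + lower_sd * pop_sd, chooser_value + upper_sd * pop_sd

def is_acceptable_mate(chooser, candidate, pop_sd, cutoffs):
    """Check every phenotype in turn; reject at the first failed criterion.

    chooser, candidate: dicts of phenotype name -> value
    pop_sd: phenotype name -> SD in the chooser's population
    cutoffs: phenotype name -> (lower_sd, upper_sd); missing names use the default
    """
    for name, chooser_value in chooser.items():
        lower_sd, upper_sd = cutoffs.get(name, (-DEFAULT_CUTOFF_SD, DEFAULT_CUTOFF_SD))
        low, high = acceptable_range(chooser_value, pop_sd[name], lower_sd, upper_sd)
        if not low <= candidate[name] <= high:
            return False
    return True

chooser = {"PhenotypeA": 10.0}
pop_sd = {"PhenotypeA": 2.0}
cutoffs = {"PhenotypeA": (0.0, 1.5)}  # asymmetric: prefers equal or larger mates
```

With these illustrative values the chooser accepts mates whose PhenotypeA lies between 10.0 and 13.0; with no cutoff given, the default window is wide enough to accept essentially anyone.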
Creative use of population dynamics
Population history and structure are important both in genetic inference today and in
understanding how genetic variation and causation have evolved. Because ForSim can
create, grow, shrink, or destroy a population, or can have multiple populations do that
under different conditions and with gene flow between them, which can be changed at
any point or as many times as desired, a number of important demographic phenomena
can be simulated, including rapid expansion, bottlenecks, inbreeding due to small
populations, or even the generation and intercrossing of inbred populations such as of
experimental animals (probably not plants—they have very different fertility and
reproductive patterns). NOTE: The program must begin with only a single population.
If you want multiple populations, found them at Generation 1 or later. The exception is
when you are reloading a population from a previous run using serialized files, in which
case there can be more than one population.
The program saves graphs and individual phenotypes for each person in the population at
the final and penultimate generation. This can be used to assess the phenotypic response
to selection at the end of the simulation run.
The program will output some selected data and graphs to help monitor progress, every
user-specified number of generations in the output keyword line (see Chapter 5, Files
generated at each generation interval specified by output # in the input file, below).
Default is not to save several special files at all (which will save time and very much
storage space, if you have no need for intra-run data). With syntax described in Chapter
5, some of these files may be useful at least for the end generation, and/or you can
specify only certain variables to be saved at each. NOTE: if you want basically complete
data use the serializeState feature, described in section ‘Rerunning a ForSim
simulation’ below.
Prevalence refers to the cutoff value for calling someone ‘affected’ for a dichotomous
trait, and is calculated based on the first phenotype defined in the input file. Prevalence
can be specified as a decimal fraction of the population affected (e.g., 0.10 means 10%)
[prevalence relative #, in the input file], which ForSim converts to a corresponding
number of standard deviations from the mean, assuming that phenotype distributions
are roughly Gaussian. Alternatively, prevalence can be specified by an absolute
criterion, such that individuals whose first-defined phenotype exceeds some cutoff are
called ‘affected’ [prevalence absolute #], where # is the cutoff value (a floating point
number). This is potentially dangerous, however, since you cannot determine in advance
whether anyone, or indeed everyone, might have such a phenotype.
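To make the relative option concrete, the following Python sketch converts a prevalence fraction into a cutoff in SD units under the Gaussian assumption mentioned above. It is a sketch, not ForSim's code: the function names are hypothetical, and it assumes that ‘affected’ individuals are those in the upper tail of the distribution.

```python
from statistics import NormalDist

def prevalence_to_sd_cutoff(prevalence):
    """SD units above the mean beyond which a fraction `prevalence` lies."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    # Assumption: the affected class is the upper tail of a Normal distribution.
    return NormalDist().inv_cdf(1.0 - prevalence)

def absolute_cutoff(prevalence, pop_mean, pop_sd):
    """The same cutoff expressed on the phenotype's own scale."""
    return pop_mean + prevalence_to_sd_cutoff(prevalence) * pop_sd

cutoff_sd = prevalence_to_sd_cutoff(0.10)
```

For a prevalence of 0.10 the cutoff is about 1.28 SD above the mean. If the simulated phenotype distribution is far from Gaussian, the realized fraction of affected individuals will differ from the requested one.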
ForSim parses the input file looking for keywords to set its internal parameters, but
the global block must be specified first, including the set of desired event changes (if
any), followed by the definition of chromosomes and their content, then phenotypes, then
populations. The following input.sim example uses a few of the user-accessible variables.
There are many such variables, and the list is complete as of version release time; but
most will be of little value to you and should not be changed unless you clearly have a
reason and know what you are doing (e.g., iid).
The following example is to simulate a single population with 1 chromosome, containing
2 genes that affect 2 phenotypes. NOTE: Values given are illustrative and do not suggest
that they should or must be used; you need to decide that.
Example 1
This is testExample1.sim in the distribution package, and can be edited and run.
NOTE: input files must be plain text with no hidden characters, so don’t just cut-and-paste the examples in this Manual.
# WHAT: single population simulation, with 1 chromosome, 2 genes affecting
# 2 phenotypes, where both genes affect one trait but only one gene
# affects the other.
global begin
# Begin definitions of global variables
generations 1000
# Length of simulation, in generations
# Single population sample parameters
1.0 megabases per centiMorgan male
# Recombination rate
1.0 megabases per centiMorgan female
# instead of separate male & female rates, there can be just one
# line labeled ‘all’ rather than by sex
mutation rate 2.5 E -8.0 female
# Optional: one line labeled ‘all’
mutation rate 2.5 E -8.0 male
usingTrackSNPs false
# Don’t track each SNP’s history (else set to ‘true’)
setFertility poisson 2.5 # poisson distributed fam size, mean 2.5
scaleEnvironmentNormalVariance false
setMaxOffspringNumber 8
# Maximum family size permitted
prevalence absolute 5.0
# Cutoff phenotype for ‘affected’ status
usingSpatial false
outputXML true
# Needed if you want this major data output file
outputSVG true # to get the multifeatured svg figures (see Ch. 5)
# Now specify the events you want to happen in this simulation:
# event   A keyword for special events that you want during the
#         simulation at specified generation times. After the keyword,
#         a generation number when the event occurs must be specified,
#         and then the nature of the event (see multipop file below)
event 500 setPhenotypeSelection PopulationA PhenotypeA relative -1.0 1.0
#make env’t effects stronger at generation 700:
event 700 setEnvironmentNormal PopulationA PhenotypeA 0.0 2.5
event 750 printGeneration
# save population data in preMakeped format
output 500 # Interval in generations between partial data saves
end
# End of global variable definitions
chromosome begin
# Begin chromosome defining block.
# Chromosomes are numbered in order of
# definition, beginning with index 0
# Length of the chromosome, in basepairs
length 2000000
gene begin # Begin gene defining block
name ABC1
location 200000 # In basepairs along this chromosome
length 100000 # Length of gene in basepairs
gamma 1 0.05 1.0 0.5 # Gamma parameters of phenotype effects
# of each new mutation
probabilityNoEffect 0.2
# prob new mutation has no effect
probabilityPositiveEffect 0.5
# probability new mutation has positive effect
end
# End of gene defining block
gene begin
# Specify another gene defining block
name ABC2
location 400000
length 100000
mutation rate 2.5 E -8 all
# Example of gene-specific mutation rate
gamma 1 0.05
probabilityNoEffect 0.2
probabilityPositiveEffect 0.5
end
# End of this gene defining block
end
# End of chromosome defining block
# Phenotype blocks require the naming of a phenotype and then a
# definition of that phenotype in the form of an algebraic formula.
# The algebraic formula must be expressed in terms of gene values
# (specified by gene name), numerical values, and addition,
# subtraction, multiplication, and division operations. Two examples
# follow.
# First, a simple phenotype, consisting of the sum of the SNP
# phenotypes contributed by SNPs within genes ABC1 and ABC2.
phenotype begin # Begin phenotype defining block
name PhenotypeA
definition ABC1 + ABC2 * ifMale + 0.3 * ifFemale # Sex-specific phenotype def.
end # End of phenotype defining block
#
# The second phenotype example is more complex, and is defined as half
# the sum of half of the phenotypic value of ABC2 and three times the
# phenotype value of ABC1.
phenotype begin # Begin another phenotype defining block
name PhenotypeB
definition (3.0 * ABC1) + ((ABC2 / 2.0) / 2.0) # * environment + 2.0 * familyEnvironment
# NOTE: currently printGeneration does not work if environment variables
# are included in the phenotype definition, so it’s commented out here
end
population begin # Begin population defining block
name PopulationA
birth 0 # Generation when population is created
initialSize 500 # Initial Size of the new population
death 40000
carryingCapacity 1000
growthRate 0.525 # at 2.1% per year for gen=25 yrs
selection PhenotypeA relative -4.8 4.8 # Selective regime
environmentNormal PhenotypeA 0.0 1.0 # Random environmental effects
familyEnvironmentNormal PhenotypeA 0.0 1.0 # family-specific env’ts
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
familyEnvironmentNormal PhenotypeB 0.0 1.0
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
------------------------------------------------
Sample input file for simulating multiple populations
ForSim provides a high degree of flexibility via the ability to simulate evolution in
multiple, interacting, hierarchical networks of populations. Populations can arise at any
time during the simulation, after the first generation, by the donation of individual males
and females from one or more existing populations (contributions from each of one or
two source populations to the new population are specified in separate lines in the input
file, using more than one line if more than 2 populations will contribute). Existing
populations exchange mates via a matrix of user-specified probabilities of random mate
selection from the chooser’s own, or any of the other existing populations (these
probabilities can be 0.0 if no exchange is desired). Populations can also be subjected to
differing evolutionary conditions. These can be changed at user-specified times by use of
the ‘event’ situation-change keyword. A population can donate mates to another existing
population at any point in the simulation, using the same donateParents syntax.
Most of the input file components that specify the evolution of a single population,
described above, are specified separately for each additional population so that it can
evolve in its own way, with its own parameters, mode of splitting from some other
population, and relationships to other existing populations. Parameters not shown as
population-specific in the following example are shared among all populations.
ForSim uses round-robin mating among populations. This allows different levels of
endogamy and exogamy (gene flow) to be specified. [NOTE: Populations are referenced
in the order they are created as specified in the input file, where the first-defined
population is indexed as population 0]. From population 0 to population n, in turn, mates
are chosen randomly from populations specified by a proportional mating matrix. Mates
are drawn with or without replacement as per the input file. Mates are drawn until each
population reaches its target parental population size. The mating matrix specifies the
probability that a mate in each population is selected from itself, or from any other
population. For no gene flow, simply give 100 for the fraction of mates from own
population, 0 from all others. [NOTES: The user must specify a complete matrix of
dimension equal to the maximum number of populations to be simulated at any point
during the run. In generations before a given population is created, its mate choice
probability is set at 0, then changed to the desired probability after the population exists.
For efficiency and safety, if a population is specified for ‘death’ (that is, to become
extinct in a given generation), subsequent mating matrices should be changed by event
lines to draw 0% mates from that population.]
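The role of one row of the mating matrix can be sketched in Python as follows. This illustrates the idea only and is not ForSim's algorithm: the function names are hypothetical, the requirement that each row sum to 100 is an assumption, and the percentages mirror the generation-600 settings used in Example 2 of this Manual.

```python
import random

# Populations in order of definition (index 0, 1, 2).
population_order = ["PopulationA", "PopulationB", "PopulationC"]

# One row per choosing population; entries are percentages of mates
# drawn from population 0, 1, 2 respectively.
mating_matrix = {
    "PopulationA": [90, 5, 5],
    "PopulationB": [5, 90, 5],
    "PopulationC": [1, 1, 98],
}

def validate_matrix(matrix, n_populations):
    """Check that the matrix is complete (assumes rows must sum to 100)."""
    for name, row in matrix.items():
        if len(row) != n_populations:
            raise ValueError(f"{name}: expected {n_populations} entries")
        if sum(row) != 100:
            raise ValueError(f"{name}: row sums to {sum(row)}, not 100")

def draw_mate_source(chooser_pop, matrix, order, rng):
    """Pick the population a mate comes from, weighted by the chooser's row."""
    return rng.choices(order, weights=matrix[chooser_pop], k=1)[0]

validate_matrix(mating_matrix, len(population_order))
source = draw_mate_source("PopulationA", mating_matrix, population_order, random.Random(42))
```

A row of 100 0 0 for PopulationA would make every draw return PopulationA, which corresponds to the no-gene-flow case described above.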
The price of flexibility is a corresponding increase in the number of parameters that must
be specified, for multiple populations. However, making the specifications is
straightforward, and requires a parameter declaration block for each population.
Following are the components of a multipopulation simulation input file. Within each
population, local conditions regarding population size, duration, environmental effects,
and selection and the like can be specified.
User can specify the extinction of a population, at which generation its final data are
saved. Because each population can be specified with an extinction date, to prevent
extinction just omit the ‘death’ line (or, alternatively, set a death time greater than or
equal to the ‘generations’ value for the simulation). However, note that, as in real life,
stochastic extinction due to drift or selection could also occur, subject to chance and the
selection criteria.
New populations can be given different environmental conditions, by specifying the
distribution of random environmental effects imposed separately on each individual, in
terms of the parameters of a Normal distribution.
The following example input file simulates two chromosomes, one containing 4 genes and
the other a single gene, 2 phenotypes, 3 populations with different founding dates, sizes,
and selection regimes.
Appendix 3 provides an even more complex situation, described with a flow diagram,
and its input.sim file.
Example 2:
This is included as testExample2.sim in the distribution package
# WHAT: Run name or description line here
global begin
setFertility poisson 2.0
setMaxOffspringNumber 8
prevalence relative 0.08 # in S.D. units of curr. phen. dist.
generations 1000
1.0 megabases per centiMorgan male
1.0 megabases per centiMorgan female
mutation rate 2.5 E -8.0 female # Optional: one line labeled ‘all’
mutation rate 2.5 E -8.0 male
outputXML false # set to ‘true’ if you want this major data output file
# Now specify the events you want to happen during this simulation:
# First, found a new population at generation 400
# PopulationA donates individuals to PopulationB; the
# numbers specify how many males and females are donated
event 400 donateParents PopulationA PopulationB 250 250
# Now set probabilities for each population, that a mate comes from
# population numbers 0, 1, or 2
event 400 setMatingPopMatrix PopulationA 95 5 0
# PopnA picks 95% from self,
# 5% from PopnB, and 0 from not-yet-existing popnC
event 400 setMatingPopMatrix PopulationB 5 95 0
event 600 donateParents PopulationB PopulationC 250 250 #make PopnC
event 600 setMatingPopMatrix PopulationA 90 5 5
event 600 setMatingPopMatrix PopulationB 5 90 5
event 600 setMatingPopMatrix PopulationC 1 1 98
end
# end of global block
chromosome begin # Properties of chromosome 0
length 4000000
gene begin
name ABC1
location 200000
length 100000
gamma 1 0.05 # defines a symmetrical gamma distribution with a shape
# parameter equaling 1 and a scale parameter equaling 0.05
probabilityNoEffect 0.1
probabilityPositiveEffect 0.9
end
gene begin
name ABC2
location 500000
length 100000
gamma 1 0.05 1 0.01 # defines a negative effect gamma distb
# with shape parameter equaling 1 and a scale parameter
# equaling 0.05, and a positive effect gamma with shape
# and scale of 1 and 0.01
probabilityNoEffect 0.6
probabilityPositiveEffect 0.3
end
gene begin
name ABC3
location 3300000
length 100000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.1
probabilityPositiveEffect 0.9
end
gene begin
name ABC4
location 3700000
length 100000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.6
probabilityPositiveEffect 0.3
end
end # of chromosome 0 specifications
chromosome begin # Properties of chromosome 1
length 4000000
gene begin
name ABC5
location 200000
length 100000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.1
probabilityPositiveEffect 0.9
end
end # of chromosome 1 specifications
phenotype begin
name PhenotypeA
definition ABC1 + ABC3
end
population begin #Properties of PopulationA (indexed 0)
name PopulationA
birth 0
initialSize 900
carryingCapacity 11000
growthRate 0.525
selection PhenotypeA relative -1.8 4.8 #selects for upper chunk
# Note no env’t specs needed for the phenotype if default values are OK
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
end
#End of PopulationA
population begin #Properties of PopulationB (indexed 1)
name PopulationB
birth 400
initialSize 500
death 40000
carryingCapacity 1000
growthRate 0.525 # at 2.1% per year for gen=25 yrs
selection PhenotypeA relative -4.8 4.8
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
end
population begin #Properties of PopulationC (indexed 2)
name PopulationC
birth 600
initialSize 500
death 40000
carryingCapacity 1100
growthRate 0.525 # at 2.1% per year for gen=25 yrs
selection PhenotypeA relative -4.8 4.8
environmentNormal PhenotypeA 0.0 1.0
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
end
Rerunning a ForSim simulation:
At the end of each run, the final generation is saved to a file named
“serialized_generationNumber.txt” (where generationNumber is an integer). This file
can be reloaded by placing the following instruction in the global block of a simulation
input file:
load /path/to/file/serialized_generationNumber.txt
When the simulation input file is executed, it will reload this generation as the founding
generation for the new simulation. The load feature can be used to generate replicate data
or large data sets. Or, the new run may specify different evolutionary parameters, if they
maintain consistency with the reloaded generation with respect to numbers and lengths of
chromosomes and numbers, lengths and locations of genes. As far as ForSim is
concerned, this is just a brand-new simulation, starting from generation 0. The input file
must specify everything ForSim needs to use the loaded data; for example, with multiple
populations, the mating matrix must be specified as of generation 0.
[NOTE: when load is executed, each item is noted on the terminal screen as it is loaded.
There are many more than 2 x diploid population size items (e.g., chromosomes), but this
is not a bug. It reflects the fact that the end of a run involves pedigrees, while serialized
data really only reflect the final generation; some items were present in the pedigree but
have zero frequency in the final generation, and for programming convenience are saved
anyway in the serialized file. This causes no problems.
NOTE: Because of the internal ways the reloaded data are used, related to pedigree
construction and the like, ‘load’ runs should last at least 2 + pedigree depth
generations; to be safe you should explicitly use short runs to test how few
generations will result in successful runs with your input file conditions. If you wish to
use reload for replicating more population data, you can switch mutation, recombination,
selection, gene flow, etc. rates to zero, so that only drift is occurring.]
An option is to use the event keyword to save results during the run, specifying the
generation (##), as in: event ## serializeState. This writes a file
Serialize_## which contains all the information needed from a single generation to
"reanimate" that generation, that is, to initiate a forward simulation with that generation as
the starting point. This can be used to iterate replicate data sets of a few generations’
depth, for example. The final serialized data are automatically saved whether or not such
an event is specified.
Each line in a serialized file describes a single simulated "object," a SNP, a haplotype
region, and so on. We begin with the lowest level of the tree of objects, the SNPs.
SNP 2336 5006921 A 1 4 2 0 216 -0.0961303
SNP 2867 5019728 C 2 3 2 0 263 0.0690454
SNP 3267 4024683 A 1 4 1 0 298 0
SNP 3922 3014852 A 1 4 0 0 359 0
.
.
.
# ENDSNPS 351
Here, the "SNP" at the beginning of the line simply identifies that the line describes a SNP.
The next field is the SNP ID, then the SNP location, SNP nucleotide, the numerical code
for the nucleotide, the numerical code for the ancestral nucleotide at this location, the
number of the gene in which the SNP resides, the chromosome on which it resides, the
generation the SNP became polymorphic, and the phenotype contribution of the SNP.
Finally, after all the SNPs are printed, we print a line that begins with the pound sign, the
string "ENDSNPS" and the number of SNPs that have been serialized (which acts as an
internal check; if we have not read 351 SNPs in this case, then something has gone badly
wrong).
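The field order just listed, together with the ENDSNPS count check, can be sketched as a small Python reader. This is not part of ForSim: the class, field, and function names are hypothetical labels for the fields described above, and the two sample lines are taken from the sample SNP records shown in this section.

```python
from dataclasses import dataclass

@dataclass
class SnpRecord:
    snp_id: int
    location: int
    nucleotide: str
    nucleotide_code: int
    ancestral_code: int
    gene_index: int
    chromosome_index: int
    origin_generation: int   # generation the SNP became polymorphic
    effect: float            # phenotype contribution

def parse_snp_line(line):
    """Split one 'SNP ...' line into its ten whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "SNP":
        raise ValueError(f"not a SNP line: {line!r}")
    return SnpRecord(int(fields[1]), int(fields[2]), fields[3],
                     int(fields[4]), int(fields[5]), int(fields[6]),
                     int(fields[7]), int(fields[8]), float(fields[9]))

def read_snp_section(lines):
    """Parse SNP lines up to '# ENDSNPS n' and verify the stated count."""
    records = []
    for line in lines:
        fields = line.split()
        if fields[:2] == ["#", "ENDSNPS"]:
            if int(fields[2]) != len(records):
                raise ValueError("SNP count does not match the ENDSNPS tag")
            return records
        if fields and fields[0] == "SNP":
            records.append(parse_snp_line(line))
    raise ValueError("no ENDSNPS tag found")

section = [
    "SNP 2336 5006921 A 1 4 2 0 216 -0.0961303",
    "SNP 2867 5019728 C 2 3 2 0 263 0.0690454",
    "# ENDSNPS 2",
]
records = read_snp_section(section)
```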
Next, we print the "HaploGenes." Recall that a HaploGene in our terminology is a
specific haplotype at a given gene.
HaploGene 259122 1 1 6189 0.357618 3 184903 34920 561
HaploGene 262905 0 0 6282 -2.09839 5 70718 131960 7191 59569 16986
HaploGene 274091 1 1 6547 -0.850774 4 165806 62336 195582 66375
.
.
.
# ENDHAPLOGENES 464
Here, we begin with the string "HaploGene," the unique ID, the sequential order of the
gene as entered in the input file, the sequential number of the gene on the chromosome on
which it resides, the generation this specific haplotype came into existence, the net
phenotype effect of the SNPs on the haplogene, then the count of SNPs within this
haplotype region followed by the unique IDs of the SNPs in linear order. The
"ENDHAPLOGENES" line also includes a count of the HaploGenes.
Now we handle the chromosome haplotypes:
Chromosome 6083381 0 3000000 3 3 1 9761 386022 259122 358416
Chromosome 6090432 0 3000000 3 3 1 9772 368724 373336 408573
Chromosome 6109270 0 3000000 3 3 1 9802 380821 332436 354578
.
.
.
# ENDCHROMOSOMES 18379
Again, we begin with a static string identifying the line type, "Chromosome," followed by
the unique ID of the chromosome, the sequential number of the chromosome in the order
specified in the input file, the length of the chromosome, the count of genes on the
chromosome, the number of genes on all chromosomes, the number of simulated
phenotypes, the generation this specific chromosomal haplotype came into existence, and
then the unique IDs of the HaploGenes contained on the chromosomal haplotype. After
all chromosomes are printed, another end tag provides a count of the unique
chromosome sequences.
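The HaploGene and Chromosome lines described above share a pattern: a fixed set of leading fields followed by a list of IDs. A minimal Python sketch of reading both is given here. It is not ForSim code; the function names and dictionary keys are hypothetical labels for the fields as described, and the sample lines are the first HaploGene and Chromosome records shown in this section.

```python
def parse_haplogene_line(line):
    """Read one 'HaploGene ...' line and check its SNP count against its ID list."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "HaploGene":
        raise ValueError(f"not a HaploGene line: {line!r}")
    snp_ids = [int(x) for x in fields[7:]]
    if len(snp_ids) != int(fields[6]):
        raise ValueError("SNP count does not match the number of SNP IDs")
    return {
        "id": int(fields[1]),
        "gene_order_in_input": int(fields[2]),
        "gene_order_on_chromosome": int(fields[3]),
        "origin_generation": int(fields[4]),
        "net_effect": float(fields[5]),
        "snp_ids": snp_ids,
    }

def parse_chromosome_line(line):
    """Read one 'Chromosome ...' line into named fields."""
    fields = line.split()
    if len(fields) < 8 or fields[0] != "Chromosome":
        raise ValueError(f"not a Chromosome line: {line!r}")
    return {
        "id": int(fields[1]),
        "chromosome_order": int(fields[2]),
        "length": int(fields[3]),
        "genes_on_chromosome": int(fields[4]),
        "genes_total": int(fields[5]),
        "phenotype_count": int(fields[6]),
        "origin_generation": int(fields[7]),
        "haplogene_ids": [int(x) for x in fields[8:]],
    }

haplogene = parse_haplogene_line(
    "HaploGene 259122 1 1 6189 0.357618 3 184903 34920 561")
chromosome = parse_chromosome_line(
    "Chromosome 6083381 0 3000000 3 3 1 9761 386022 259122 358416")
```

In these samples the chromosome's list of HaploGene IDs includes 259122, the ID of the sample HaploGene, which shows how the levels of the object tree refer to one another by ID.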
Populations of individuals are preceded with a line like the following:
Population PopA 10001
This line always begins with the string "Population" followed by the name of the
population as specified in the input file and the population size. This line is followed by
all individuals from this generation who have survived the selection process and have
entered the pool of potential mates. All potentially mating males are listed first, then all
potentially mating females.
Each individual's line begins with the static string "Individual," then the unique individual
ID, a number which defines whether the individual is male (1) or female (0), then all "left"
chromosomal haplotypes in sequential order for each chromosome being simulated, and
then all "right" chromosomal haplotypes in sequential order. Finally come all individual
phenotypes, in the order they are declared in the input file. The end tag line gives the
number of simulated populations.
Individual 100004802 1 6217189 6220312 -0.510328 0
Individual 100003775 1 6212114 6219228 -3.15124 0
Individual 100005452 1 6230772 6170063 -2.86811 0
.
.
.
# ENDPOPDATA 1
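Because the number of haplotype IDs on an Individual line depends on how many chromosomes are simulated, a reader needs that number to split the line. The Python sketch below shows one way to do so. It is an illustration rather than ForSim code; the function name and dictionary keys are hypothetical, and the sample line is the first Individual record shown above, read with one chromosome.

```python
def parse_individual_line(line, n_chromosomes):
    """Split an 'Individual ...' line into ID, sex, haplotype IDs and phenotypes."""
    fields = line.split()
    if len(fields) < 3 + 2 * n_chromosomes or fields[0] != "Individual":
        raise ValueError(f"not an Individual line: {line!r}")
    left_start = 3
    right_start = left_start + n_chromosomes
    phenotype_start = right_start + n_chromosomes
    return {
        "id": int(fields[1]),
        "sex": "male" if fields[2] == "1" else "female",  # 1 = male, 0 = female
        "left_haplotypes": [int(x) for x in fields[left_start:right_start]],
        "right_haplotypes": [int(x) for x in fields[right_start:phenotype_start]],
        "phenotypes": [float(x) for x in fields[phenotype_start:]],
    }

individual = parse_individual_line(
    "Individual 100004802 1 6217189 6220312 -0.510328 0", n_chromosomes=1)
```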
So long as you don’t alter aspects that undermine the characteristics of the saved data,
you can change the initial run instructions in the input file (such as selection, mate
choice, or gene-mutation-effect parameters), and rerun (using a ‘load’ instruction in the
input file); the population will henceforth be subject to the altered conditions.
Specialized features
Some extended or more specialized features are possible with ForSim. But be aware that
needless complications will use more memory and slow the program.
Gene coding structure (approximated)
It is possible to set up genes with codon triplets, starting with the first position in the
gene. A new mutation is first tested against the gene’s overall probability that a mutation
has a phenotypic effect; a second random number is then drawn against the probability
specified for the codon position it hits. Only if both tests are passed do the effect
specifications (the gamma distribution statements) apply to that new mutation. Thus, if
the gene definition specifies that 50% of mutations have an effect, and the mutation hits a
second codon position, then with the settings below there is a 90% probability that (if the
first 50% test is passed) the mutation will have an effect. The specified positive and
negative effect probabilities, with their respective gamma distributions, then apply. An
example of the specification format is:
usingCodons true
firstCodonProbability 1.0
secondCodonProbability 0.9
thirdCodonProbability 0.5
The default is false, so none of these lines need be in the input file if the feature is not
being used. Note that this is not specific to the actual amino acid code, but just a way to
distribute effects in a systematically non-uniform way.
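The two-stage test described above can be sketched in Python as follows. This is only an illustration of the logic as the text describes it, not ForSim's own code; the function names and the 0-based offset convention are assumptions made for the example.

```python
import random

# Per-codon-position probabilities, taken from the example specification above.
CODON_PROBABILITIES = (1.0, 0.9, 0.5)  # first, second, third codon position


def effect_probability(offset_in_gene, gene_effect_probability,
                       codon_probabilities=CODON_PROBABILITIES):
    """Combined probability that a mutation at this offset has an effect.

    offset_in_gene is 0-based, so offset 0 is the first position of the first
    codon (an indexing assumption for this sketch).
    """
    codon_position = offset_in_gene % 3
    return gene_effect_probability * codon_probabilities[codon_position]


def mutation_has_effect(offset_in_gene, gene_effect_probability, rng=random,
                        codon_probabilities=CODON_PROBABILITIES):
    """Draw two random numbers; the mutation has an effect only if both pass."""
    passes_gene_test = rng.random() < gene_effect_probability
    codon_position = offset_in_gene % 3
    passes_codon_test = rng.random() < codon_probabilities[codon_position]
    return passes_gene_test and passes_codon_test
```

With the values in the text, effect_probability(1, 0.5) gives 0.5 x 0.9 = 0.45 for a hit on a second codon position.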
Introns
Likewise, within a gene definition block you can specify that genes have intronic regions,
basically just sections in which mutations have no effect:
intron 5000 2000
This specifies that the intron begins 5kbp from the start of the gene and is 2kbp long. All
mutations which arise in this region will contribute no phenotypic effect. The default is
not to have introns, and need not be specified in the input file.
Spatial genotype distributions
It is possible to generate a single population that is distributed over a 2-dimensional
rectangular spatial grid that expands from the [0,0] corner. This works as follows: each
individual passes the usual phenotype definition and selective screens. When mates are
chosen, a male searches the surrounding grid-points randomly for eligible mates (the
usual range of mate-choice criteria can be used). If there are eligible mates and the offspring
survive the selection screen, they are given a location based on a Normal (0,1) distribution of
grid-point displacement from the father’s X and Y locations. If no eligible mates exist,
that individual will not mate. This simulates gradual expansion or isolation-by-distance
models of population history.
To use this option, specify
usingSpatial true Xmax Ymax DispersionMean DispersionVariance
in the input file, where Xmax is the number of locations in the X direction, Ymax in the
Y direction, and the dispersion is Nor(DispersionMean, DispersionVariance). A
‘standard’ set of values is 1000 1000 0.0 1.0, but there is nothing biologically based about
these values, and the dimensional space need not be a square. At the end, a figure of the
final population distribution is produced (see figures, below).
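As a rough illustration of the dispersal rule, the following Python sketch places an offspring relative to its father using a Normal(DispersionMean, DispersionVariance) displacement in each direction. Rounding to the nearest grid-point and clamping at the grid edges are assumptions made for the example; the manual does not say how ForSim treats displacements that leave the grid.

```python
import math
import random


def offspring_location(father_xy, xmax, ymax, dispersion_mean=0.0,
                       dispersion_variance=1.0, rng=random):
    """Return an (x, y) grid-point for an offspring of a father at father_xy."""
    # random.gauss expects a standard deviation, so convert the variance.
    sigma = math.sqrt(dispersion_variance)
    x = father_xy[0] + round(rng.gauss(dispersion_mean, sigma))
    y = father_xy[1] + round(rng.gauss(dispersion_mean, sigma))
    # Assumption: keep the offspring on the grid by clamping to its edges.
    x = min(max(x, 0), xmax - 1)
    y = min(max(y, 0), ymax - 1)
    return (x, y)
```

For the 'standard' values, a call would look like offspring_location((10, 12), 1000, 1000, 0.0, 1.0).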
At the end of the simulation, ForSim produces the usual data files except that the
pedigree (pre-Makeped) files have x and y location pre-pended for each individual. For
analysis, individuals may be sampled from anywhere in the grid by sampling based on the x-y properties of each individual, or randomly, etc. The results can then be used as input
for population-history or structure programs such as Structure, or other geographically
based analysis.
When usingSpatial, ForSim will generate a locations_population_generation#.txt
text file. And if the outputSVG instruction, which enables the preparation of svg files, is
set to true in the input file, a locations_PopName_Generation#.txt.SVG graphics file is
generated. In addition, the preMakePed.txt files will have X- and Y- coordinate locations
added to the Prepended list of variables, and these values will also appear in the xml files
if the latter option is chosen. These will be listed in the header line as well, to show their
positions.
NOTE: Spatial simulation is not compatible with multiple populations or with
usingTrackSNPs, so don’t try to combine them in the same simulation. Only a single population
may be simulated. However, multiple populations founded at subsequent generations can
be used to simulate serial founder effects, and since they can have whatever mating matrix is
specified they can be treated as being spatially arranged. Or, if donors are solely from the
most recent population, and gene flow only between adjacent (i.e., in founding order)
populations, one can simulate geographic expansion via serial founder effect.
Multi-allelic SNPs
By default, ForSim is a 2-state SNP simulator. That basically means that it uses an
infinite sites evolutionary model. With usingRecurrentMutation false, which is
the default, if a mutation hits a site where a SNP is already
polymorphic, no new mutation is allowed. If at the chosen site a novel allele has arisen
but been lost, a new mutation arises, of any different nucleotide, which could include the
one that’s lost. This could allow a recurrent mutation under those conditions, but not
simultaneously with an existing variant. If a SNP novel allele has been fixed, the site
cannot re-mutate.
If usingRecurrentMutation is set to true in the input file, then there can be multi-allelic sites. In this case, if a novel allele at a location has arisen but been lost, it can be
produced again; if fixed, a new allele different from the original (ancestral) or the fixed
(novel) allele is chosen. But recurrent mutation to an existing polymorphic allele does
not occur. Under these conditions there can be up to 4 SNP alleles at a site, but this is not
the same as a recurrent mutation (true finite sites) model, although such a thing could be
approximated (e.g., by treating adjacent nucleotide positions as being the same).
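One way to read the rules above is as a function that returns which nucleotides a new mutation at a site may produce. The Python sketch below encodes that reading; the function name and arguments are hypothetical, and the handling of a currently polymorphic site under usingRecurrentMutation true (any nucleotide not already present) is an interpretation of the text rather than a statement of ForSim's internals.

```python
NUCLEOTIDES = frozenset("ACGT")


def candidate_alleles(ancestral, present_novel, using_recurrent_mutation=False):
    """Return the set of nucleotides a new mutation at this site may produce.

    ancestral     -- the original nucleotide at the site
    present_novel -- novel alleles currently in the population (empty if none
                     has arisen, or if every novel allele has been lost)
    """
    present_novel = set(present_novel)
    if not using_recurrent_mutation:
        # Polymorphic or fixed site: no new mutation is allowed.
        if present_novel:
            return set()
        # Untouched site, or novel allele lost: any non-ancestral nucleotide,
        # which may include the allele that was lost.
        return set(NUCLEOTIDES - {ancestral})
    # Recurrent mode: never re-create an allele that is currently present.
    return set(NUCLEOTIDES - {ancestral} - present_novel)
```

For example, candidate_alleles("A", ["C"], True) leaves G and T as possible new alleles, while the same call with False returns an empty set.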
Since multi-allelic SNPs are statistically relatively rare in the real world, and since much
analytic software is based on 2-state SNPs, users should be clear why they want to invoke
this option. NOTE: the history of a lost SNP allele can be identified if
usingTrackSNPs is set to true (see below, in Chapter 5).
For ways to achieve things not specifically provided, see Addendum and the sections on
‘serialize’.
Complex “case-control” comparison figures
runForsim can use simulated data to compute esthetic and highly informative multi-feature figures. It currently does so by including output true in the input file. Even
more detailed figures called Hap_GeneName_Gen#.svg will be produced (see figure
description below). These are sorted by first-defined phenotype, from high to low values,
so that you can use a case definition as a rough cutoff to compare affected and unaffected
SNP presences, as you scan down the figure.
Genetic epidemiological uses of the results
The pedigree files generated by ForSim are extensive if the population is large, and will
reflect the family size distribution and so on that were specified in the input file. But if
less than a whole or random sample of family data is desired, scripts can easily be written
to select them from the preMakeped.txt files. As an example, ascertainment sampling
schemes can be achieved by reading each pedigree and deciding if it qualifies.
For example, for epidemiological purposes an output file relativeRisk.txt is saved, that
provides data on risks and relative risks based on the prevalence value (see Chapter 5).
To identify pedigrees by single ascertainment, one could read the first line of a pedigree,
save the pedigree ID, and generation number, g, (part of the prepended data), read and
save lines until reaching the first person in generation g+2 (if 3-generation pedigrees
were specified) and if that person is affected, continue reading and saving until reaching
the next pedigree. If the person is unaffected, discard the saved lines and break the loop,
moving to the next pedigree.
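The single-ascertainment procedure just described can be sketched in Python as below. The sketch assumes the preMakeped lines have already been parsed into dictionaries with the hypothetical keys "pedigree", "generation" and "affected", that the lines of one pedigree are contiguous, and that affection status uses the usual coding (2 = affected).

```python
from itertools import groupby

AFFECTED = 2  # preMakeped affection coding: 1 = unaffected, 2 = affected


def singly_ascertained(records, pedigree_depth=3):
    """Keep pedigrees whose first bottom-generation member is affected.

    records is an iterable of dicts in file order; pedigree_depth is the
    number of generations per pedigree (3 by default, as in ForSim).
    """
    kept = []
    for _, group in groupby(records, key=lambda r: r["pedigree"]):
        members = list(group)
        top_generation = members[0]["generation"]
        bottom_generation = top_generation + pedigree_depth - 1
        # The first person found in generation g+2 (for depth 3) is the proband.
        proband = next(
            (m for m in members if m["generation"] == bottom_generation), None)
        if proband is not None and proband["affected"] == AFFECTED:
            kept.append(members)
    return kept
```

Other ascertainment schemes (for example, requiring two affected sibs) can be written by changing the test applied to the members of each pedigree.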
CHAPTER 5: USERS’ OUTPUT FILES
ForSim produces a variety of output files, some of which are used by the program itself,
for example, to generate graphics, or for program-monitoring purposes, and may not be
otherwise useful to the user. You may have no direct scientific use for some of these text
or graphic files, but they can be very useful for debugging what you think your input file
is attempting to do. As examples, population size by generation is useful to see if the
program is maintaining the specified size; patterns of gene-specific variation, or plot of
individuals culled by natural selection, or the population phenotype distribution by
generation may reveal that how you have specified selection is not what you intended.
Careful scrutiny of results is the best and fastest first debugging test. Note that files not
described below are for our internal working use.
Depending on what you are simulating, ForSim may save very large files. To conserve
on disk space, make the results easier to move or store on your system, or to ftp them to
some other site for storage, use the file-compress option (-c true) in the line that runs
ForSim (see the install section, Chapter 1, above).
NOTE ON MULTIPLE CHROMOSOMES: ForSim can run an arbitrary number of
chromosomes, and when pedigree and marker files are saved (see descriptions below),
there will be a separate set for each chromosome whose filename includes the
chromosome number (indexed from 0). For a single analysis with n chromosomes, one
would have to meld all n files to generate a single complete Marker and single complete
sets of pedigree files. This will have to be scripted by you, and can be complicated. For
practical and logistic reasons, we strongly suggest simulating only a single chromosome,
but putting large spacing between genes intended to be effectively on separate
chromosomes (in terms of recombination); doing this will generate the correct linkage
patterns but only a single set of Marker and pedigree files to work with.
A warning!
ForSim is designed for maximal flexibility and usefulness rather than rigidity. It
generates very large amounts of data. In default mode, most of this is used on the fly and
discarded when no longer needed. In default mode, all of the data at the end of the
simulation are automatically saved in an appropriately labeled directory, as described
below. This includes graphics that display aspects of the entire run (e.g., plots of
population size by generation). For most applications these final data may be the only
data of interest (e.g., to use in inferential software, such as for linkage or association
analysis). But you can specify more information, at intervals of your choice during the
simulation, at which points all the then-current data will be dumped for later post-run
analysis. You could even do this every generation. However, you should be careful what
you ask for because the amount of data can easily amount to many Gigabytes. If you
want intermediate data, choose appropriately spaced intervals (e.g., every 1000
generations).
You can use the input file to take advantage of the values of a number of run-time
variables (see keyword table above). These variables are real-time values that can also be
user-modified with event Generation# set instructions, but it is dangerous to do that,
so be careful!
To monitor what goes on during a run, so as for example to see if it behaves as expected
for the input file you have constructed, you could do a test run with small samples, few
genes or populations, or not too many generations. Tinker with your input specifications
if you need to, then remove or comment out the input-file lines specifying intermediate
data saves (or make them happen at fewer checkpoint generations), and then do your full-size run. This will provide useful intermediate data if you need it, plus the final results,
without becoming unmanageable.
As a general note, for genetic epidemiological uses of ForSim, the program produces
several files. The MarkerInfo and PreMakeped files are relevant for mapping studies.
The input file includes stipulation of a prevalence value based on the phenotype that is
listed first in the input file. When specified in relative terms, the prevalence cutoff is in
SD units from the population phenotype mean in the final generation, that is, we assume
the phenotype is approximately normally distributed. The PreMakeped file contains an
entry for affection status determined by this cutoff value.
This affection status can be used in mapping studies of case-control design. To do this,
the MarkerInfo and PreMakePed files need to be modified by a script, to (1) add a first
column in MarkerInfo.txt with chromosome number (since they are indexed from 0, the
first simulated chromosome is chromosome 0), (2a) use the preMakePed.txt (or
genPed.txt if you used printGeneration) header line to determine the number of prepended columns, then (2b) don’t save the header line (in any of these files) and (2c) read
each individual’s line, remove the pre-pended columns and write the remaining line (now
in standard preMakePed format) to a modified preMakePed file. This can then be used in
mapping software, such as Plink, Haploview, etc., for identifying SNPs passing some
chosen significance test. NOTE: remember not to include the word ‘#prepended:’ in
identifying the number of these variables.
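A minimal Python sketch of steps (1) and (2a)-(2c) follows. It assumes the header line is whitespace-separated and begins with the token '#prepended:' followed by one name per pre-pended column; check this against your own files before relying on it. The function names are hypothetical.

```python
def count_prepended(header_line):
    """Number of pre-pended columns named in the header line."""
    # The '#prepended:' marker itself is not a column, so it is not counted.
    return len([tok for tok in header_line.split() if tok != "#prepended:"])


def strip_prepended(lines):
    """Yield standard preMakePed lines: header dropped, pre-pended columns removed."""
    lines = iter(lines)
    n_prepended = count_prepended(next(lines))
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield " ".join(fields[n_prepended:])


def add_chromosome_column(marker_lines, chromosome):
    """Yield MarkerInfo lines with the chromosome number added as a first column."""
    for line in marker_lines:
        if line.strip():
            yield f"{chromosome} {line.strip()}"
```

Typical use would be to open a preMakePed (or genPed) file, pass the file object to strip_prepended, and write the yielded lines to a new file for the mapping software.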
For more detailed or evolutionary analysis, the other files will be useful to track the
history of SNPs, phenotype distributions, selection, haplotypes, genetic diversity,
multiple populations and the like. These files are necessarily more complex, and users
should experiment with simple runs to learn how to use them effectively.
A word about pedigree files and analysis
Ascertainment of pedigrees for genetic analysis can be a tricky business, so please be
deliberate in selecting the data from a simulation run to analyze in your chosen way!
ForSim generates two kinds of pedigree files. At the end of the run the pedigrees that
comprised the population to produce the user-specified number of generations (default=3)
are saved in preMakeped files (the number of files depends on population size, with 1000
pedigrees per file; this has to do with space and memory considerations).
To form these pedigrees, ForSim ‘remembers’ who was in each generation before the
final generation, back to the appropriate generational depth. Because the whole
population is included, individuals may appear more than once, as they can do in real life
with multiple marriages (the default state, mating without replacement, avoids this, but
then the population size may not be as high as the specified carrying capacity; normally,
the population will grow back, by increased mean family size as discussed elsewhere, but
this won’t affect the end-of-run population size. Under some conditions of family size,
mate choice, or selection, it could be that mating without replacement leads to
insufficient numbers of mates and/or even population extinction—as in real life!—so be
careful).
The last generation of end-of-run pedigrees has not been subject to the selection screen
that takes place normally before mates are chosen. If you want a population that has
already been screened, use event ## serializeState and use that result, a post-selection population.
If you want deeper pedigrees or want to do various kinds of ascertainment, then you
should specify the size of total population you estimate will be needed in order to have
the appropriate number of pedigrees, use an event instruction to have the population
grow to that size, or to change family sizes, and then use post-run scripting to screen the
preMakeped files to select pedigrees according to your ascertainment scheme. If a very
large number is desired, use the serialize and load features to generate multiple sets of
modest-size data, and rerun the same conditions for pedigree-depth number of
generations (typically 3), with n (the number of reps) specified in the runForsim.rb
command line large enough that the resulting runs give you your desired number of
pedigrees as many times as desired. You can set mutation and/or recombination to zero
in this reload input file, stop any natural selection, etc. so the additional pedigrees are
from essentially the same population as your original run. The data saved for reload are
only the final generation of the initial run. The preMakeped files from these iterations
can then be merged to make the total desired pedigree set. If minimal overlap of
individuals between pedigrees is desired, make sure the population is large relative to the
needed number of pedigrees, and randomly sample them from the whole preMakeped
file(s).
The normal end-of-run pedigrees are of the whole population, and so there is no
ascertainment bias.
You may want to analyze just a single generation of individuals. A simple way to do this
is just read the preMakePed files, check the prepended generation number column, and
only save those lines from the final generation, and use MarkerInfo##.txt to see the
SNPs present.
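The filter just described might look like the Python sketch below. The name of the generation column ("Generation") and the '#prepended:' header layout are assumptions for the example; use the header line of your own preMakePed files to find the actual column name and position.

```python
def final_generation_lines(lines, generation_column="Generation"):
    """Return the data lines whose pre-pended generation value is the largest."""
    header, *rows = [line for line in lines if line.strip()]
    # Column names follow the '#prepended:' marker in the header line.
    names = [tok for tok in header.split() if tok != "#prepended:"]
    index = names.index(generation_column)
    parsed = [row.split() for row in rows]
    last_generation = max(int(fields[index]) for fields in parsed)
    return [" ".join(fields) for fields in parsed
            if int(fields[index]) == last_generation]
```

The returned lines still carry their pre-pended columns, so they can be sampled further (for example by phenotype) before being converted to standard format.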
printGeneration utility: An alternative, with some restrictions (see NOTE, below).
If you want to analyze a single generation population during a run, rather than just in the
ending population, you can specify event # printGeneration, to invoke the
printGeneration utility function. This will save the population at generation # in
the usual preMakeped and MarkerInfo format files, including prepended values (but just
for the specified generation). As with preMakePed files, the genPed output files will be
separated into subsets with 1000 individuals each, as in other preMakeped.txt output
files, whose standard names will be appended with the generation number:
genPed##[email protected]#.txt
generation_@_MarkerInfo##.txt
generation_@_MarkerFreqs##.txt,
generation_@_MarkerStats##.txt
where ## is chromosome number and # the file subset.
NOTE: Current restrictions on this utility instruction are that (1) the generation number
must be ≥ generation most recent population created + pedigree depth, and (2) less than
the final pedigree depth at the end of the run; (3) all populations simulated in the entire
run must have been created and still exist in the specified generation; and (4) the
instruction does not work if ‘environment’ or ‘familyEnvironment’ are explicitly
included in the phenotype definition. The reasons for these conditions have to do with
the order in which events are completed at any given generation, related to whether
variables have been assigned values or are accessible, plus the 0-based indexing of
generation numbers.
If you need single generation data within these restrictions, there are two ways to get it:
1. Run that number of generations, and stop; this will save preMakePed files for that
generation. Then, modify the input file to specify the subsequent conditions you
would have run, and use ‘load’ to resume, using the serialized file saved at the end of the
initial run. Then, extract individuals from the preMakePed files for the stopped
generation(s) (this will be slightly faster if you specify pedigree depth 2).
2. Specify event # outputXML, where # is the generation for which you wish the
data (have a separate event line for each such generation you want). Then write a script
to read the XML file and extract the data (for example, into Marker and preMakePed file
format); a script to do this is in development.
File formats and examples
Files are saved within a directory called ‘runData’ and for each run a date-time stamped
subfolder is created (the folder name will end with Run#, so results from multiple runs
can be identified separately).
In the data output files SnpID’s have the form: SNP0XTL1000790ID213, coded as
follows for parsing (as by regular-expression scripts):
SNP
Chromosome Number (0 in this example)
X
Novel Nucleotide (A,C,G,T) (T in this example)
L
Chromosome coordinate (1000790 in this example), the location of the SNP
ID
Sequential SNP number (213 in this example)
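The SnpID layout above lends itself to a regular expression. The following Python sketch shows one way to split an ID into its parts; the names SnpId and parse_snp_id are hypothetical and not part of ForSim.

```python
import re
from typing import NamedTuple


class SnpId(NamedTuple):
    chromosome: int   # chromosome number, indexed from 0
    nucleotide: str   # novel nucleotide (A, C, G or T)
    location: int     # chromosome coordinate of the SNP
    number: int       # sequential SNP number


# SNP <chromosome> X <nucleotide> L <location> ID <number>
SNP_ID_PATTERN = re.compile(r"SNP(\d+)X([ACGT])L(\d+)ID(\d+)")


def parse_snp_id(snp_id):
    """Split a ForSim SnpID string into its four components."""
    match = SNP_ID_PATTERN.fullmatch(snp_id)
    if match is None:
        raise ValueError(f"not a recognizable SnpID: {snp_id!r}")
    chromosome, nucleotide, location, number = match.groups()
    return SnpId(int(chromosome), nucleotide, int(location), int(number))
```

For the example ID, parse_snp_id("SNP0XTL1000790ID213") returns chromosome 0, nucleotide T, location 1000790 and SNP number 213.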
The program produces modified PreMakeped files of the last 3 generations (or 2, or
deeper if so specified in the input file) of the entire data set, plus an accompanying
MarkerID file (marker name, marker location).
The modification to standard PreMakeped format includes (1) a header line specifying
what variables are pre-pended to the standard PreMakeped file, and (2) each line
thereafter is pre-pended with those variables specified in the header line. This includes
the population name and other variables (see example below). The modification of the
PreMakeped format is important because of the way ForSim handles multiple
populations with migration, in which individuals in a pedigree may not all be from the
same population. Therefore, before use in software that requires standard pre-Makeped
format, the header line, and the pre-pended variables on the subsequent lines must be
removed to generate LINKAGE-format input. A script can easily be written to do this.
Note also that there is variation in pre-Makeped formats. Some software assume that
there is only one phenotype, affection status (e.g., disease, normal), while other software
accommodates a user-specified number of quantitative or qualitative phenotypes.
ForSim generates a set of files at the end of every run. Optionally, the user may use the
'output n' keyword to specify a dump of data at every n generations during the run. If
not specified, these files are not generated; if specified as the last generation the files
(SnpStatsGen, HaploPhenGen, phenotypes_PopName_Generation) are saved; if at a
number less than the last generation, these files will be saved every that-many
generations. If output n is specified, the files have names that identify the generation
number, as in examples below.
NOTE: If event instructions are specified that alter the natural selection strategy,
separate files will trace the population for each segment of the run under the respective
strategies. There will still be internalRunData files for the whole run.
The major output file named ‘generation_#_Popname.xml’ is a complete data dump of
the population at the specified generation, and is hierarchically organized, so it is
generated in XML format for easier parsing (e.g., by RegEx scanning for category tags).
This is a basic, but potentially large file, so the input file needs the statement:
“outputXML true” to generate that file, or specify ‘false’ if you won’t need it.
NOTE: The xml data files are comprehensive and very useful, but very large. If you
specify event gen## outputXML this file will be saved at the specified generation, but
you should only do this if you really want all that data. If the option is set, the .xml file
will contain the final generation only (pedigree files also contain the ancestors for the
user-specified parental generations).
In the descriptions below are various terms:
* UID is the unique ID number of the relevant item (gene, HaploGene, individual…)
* Phen means the assigned phenotypic effect of a SNP allele, or the net (summed)
phenotypic effect of all the SNPs in a given HaploGene
* HaploGene is the haplotype of a given copy of a gene
* born means the generation in which the item first appeared
* count the number of copies of the item in the current generation
* location is the starting nucleotide position along the chromosome
* Sequential/Male/FemaleNumber is their sequential order in the output data
* census is the count of the population size at the relevant time, males + females
Note that numbers for multiple categories, like genes, chromosomes, etc. are generally
used in these files, rather than the unit names if they were given. This is done in 0-indexed array order, so a gene defined as Gene1 will be printed out labeled Gene0, a gene
named Gene2 will be listed as Gene1, etc., and similarly for phenotypes.
Files automatically generated at the end of a ForSim run:
NOTE that if a population dies during the run, some population-specific files will not be
generated for it in the usual end-of-run output.
currentInput.txt The input file for the run being reported. The first line contains the
commented name of the input file (e.g., # thisInput.sim). The rest is the contents of that
file.
internalRunData.txt Some summary data of the run: Each line contains Generation,
population count, unique chromosomes, unique Haplogenes, number of SNPs, number of
SNPs lost in this generation, novel mutations in this generation, number of recombination
events, heritability of first phenotype, and HaploGene heterozygosity in percentage form
and the runtime, in seconds, of the reporting generation:
Generation KidsCount Chromosomes HaploGenes SnpSites SnpsLost
MutationEvents RecombinationEvents intraGeneRecombinationEvents
Heritability observedHaplotypeHeterozygosity elapsedSeconds
0 5000 11825 0 0 0 260 2015 105 0 0.209958 0
1 5001 13609 105 260 260 243 1975 105 0.00010376 0.372 0
2 5000 6196 453 243 101 242 1992 91 0.000162504 0.990198 0
3 4999 5935 587 384 143 240 1954 101 0.000218168 1.67867 1
4 4998 5864 724 481 157 258 2003 99 0.00032797 2.29246 1
5 4999 5818 868 582 182 241 1958 107 0.000395046 3.0126 1
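For checking a run (for example, whether the population stays near the specified size), the table can be read into a list of records. The Python sketch below assumes that in the actual file the column names sit on a single header line and that all values are numeric; the function name is hypothetical.

```python
def read_run_data(text):
    """Parse internalRunData-style text into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()
    records = []
    for line in lines[1:]:
        values = line.split()
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got: {line!r}")
        records.append({name: float(v) for name, v in zip(columns, values)})
    return records
```

After reading the file contents into a string, [r["KidsCount"] for r in read_run_data(text)] gives the population count by generation, ready for plotting or a quick range check.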
internalRunDataFull.txt Some summary data of the run: Each line contains Generation,
population count, survivors of selection that generation, number culled by selection,
unique chromosomes (i.e., each different sequence is one), unique Haplogenes, number
of SNPs, number of SNPs lost in this generation, novel mutations in this generation,
number of recombination events, GenPerEffect (an estimate of heritability equal to
genotypic variance/(environmental + genotypic variance), for the first-named phenotype;
this will not be accurate if there is GxE correlation, of course), FastHGHetero (for every
gene on the first chromosome, for the first population, the haplogene (gene-specific
haplotype) heterozygosity), mean, variance, and standard deviation of the first-defined
phenotype:
Generation PopSize Survived Culled Chromosomes HaploGenes SnpSites
SnpsLost MutationEvents RecombinationEvents GenPerEffect FastHGHetero
PhenMean0 PhenVar0 PhenStDev0
1 0 0 0 6128 2 5 5 1 74 0.500907 0.1 0.00460933 1.01534 1.00764
2 0 1 1 2644 4 1 1 4 65 0.500202 0.166611 -0.0133695 1.02679 1.0133
3 0 0 0 2284 8 4 1 1 65 0.499953 0.266667 -0.0331265 1.01107 1.00552
MarkerInfo##.txt For a given chromosome (##, indexed starting at 00), the SNP IDs and
their location (for use with Linkage input). Includes any SNP site where a novel allele
(i.e., not the initial SNP allele) is present in the pedigrees, even if it is fixed and no longer
variable (this is because its phenotypic effects are still present). Location and IDnumber
are also embedded in SNP names that include the nucleotide (right after ‘SNP’):
SNP0XCL100945ID169912 100945
SNP0XGL100981ID169971 100981
SNP0XCL100996ID154771 100996
MarkerPedSnps.txt SNPs with novel allele still present anywhere in the final pedigrees
(even if the novel allele has become fixed), when the allele was created by mutation,
their basic identity and assigned phenotypic effects. SNPs on all chromosomes are listed;
with their unique IDs, their chromosome location can be tracked (if usingTrackSNPs
is set to true, the counts at every generation are also included at the end of each line):
Snp name=SNP0XCL500176ID208; born=45; uid=208;
location=500176; nucleotide=C; phen=1.03721;
SNP0XCL500176ID208 0 500176 45 1.03721
MarkerStats##.txt For chromosome ##, and for all SNP sites with a novel allele present in
any generation of the final pedigrees (even if it is fixed). Each line contains a SNP record:
the SNP ID, the novel nucleotide, the novel and ancestral nucleotides in 1..4 <-ACGT
format, the sequential SNP number, location, generation when created, current count in the
final simulated generation (bottom of the pedigrees), and phenotypic effect:
SNPID Nucleotide Novel Ancestral Number Location Born Count Phenotype
SNP0XCL100945ID169912 C 2 3 169912 100945 1999 1 -0.465247
SNP0XGL100981ID169971 G 3 2 169971 100981 1999 1 -1.42006
MarkerFreqs##.txt Same as MarkerStats but includes frequency in the final generation
instead of a count.
SNPID Nucleotide Novel Ancestral Number Location Born Freq Phenotype
SNP0XCL100945ID169912 C 2 3 169912 100945 1999 0.00049975 -0.465247
SNP0XGL100981ID169971 G 3 2 169971 100981 1999 0.00049975 -1.42006
geneEntropies.txt For each generation, for each gene, the mean entropy measure for all
populations pooled, computed as: For each HaploGene, this equals
∑ - frequency*log10(frequency)
summed over all unique HaploGenes (haplotypes at that gene). This is useful to see the
effect of selection on specific haplotypes.
Gene0 Gene1 Gene2 Gene3
1 0 0 0 0 0 0 1 1
1.58496 1.58496 1.58496
3.58496 3.70044 3.80735
3.70044 4.08746 4.16992
3.90689 3.80735 4.24793
Gene4 Gene5 Gene6 Gene7 Gene8
0 0 0 0 2 2
2 1.58496 2.80736 2 2.32193 2.58496
2.32193 3 3.16992 3.58496 4.32193 3.90689
3.80735 3.90689 3.45943 3.90689 4.16992 4.08746
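The entropy measure can be reproduced from HaploGene counts with a few lines of Python. The text above gives the logarithm as log10; however, sample values such as 1.58496 (which equals log2 of 3) and 2 (log2 of 4) are what a base-2 logarithm gives for equally frequent haplotypes, so this sketch takes the base as a parameter and defaults to 2. That default is an inference from the sample output, not something confirmed from the ForSim source.

```python
import math


def haplogene_entropy(counts, base=2):
    """Entropy of the HaploGene frequency distribution at one gene.

    counts holds the number of copies of each unique HaploGene; the result is
    the sum over HaploGenes of -frequency * log(frequency).
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base)
                for c in counts if c > 0)
```

A single HaploGene gives an entropy of 0, and three equally frequent HaploGenes give about 1.58496 in base 2.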
finalPedigreeHaplogenes.xml For each HaploGene (haplotype in a given gene), the
details of the HaploGene and its constituent SNPs. NOTE: SNPs reported in this file
only include the derived (novel) SNP (which may or may not be fixed at the time); the
ancestral alleles are not specifically listed, mainly to save on the enormous file size that
could otherwise result, since in most simulations most alleles will be ancestral.
uid=unique ID, count is the number of copies in the current populations that were
simulated. The snpcount is the number of novel SNP alleles on that HaploGene.
HaploGenes are listed in their ‘born’ generation order, not their chromosome order; their
relative locations on chromosomes can be inferred by the gene names, as they were
specified in the input file (onChromosome gives the chromosome number, remembering
that this is indexed from 0 in the order specified in the input file, and can also be found in
the SNP IDs after ‘SNP’):
<HaploGene name="ABC10" onChromosome="0" born="4741" uid="16574956"
snpcount="115" count="21" phen="0.378721">
<Snp name="SNP0XAL109002507ID2050188" count="6240"
born="820" uid="2050188" location="109002507" nucleotide="A"
phen="0.0388587"/>
<Snp name="SNP0XCL109004218ID6334505" count="6240"
born="2532" uid="6334505" location="109004218" nucleotide="C" phen="0.0715263"/>
.
.
</HaploGene>
<HaploGene name="ABC11" onChromosome="0" born="4823"
uid="16587992" snpcount="106" count="38" phen="0.22000">
NOTE also that the xml file provides population data, but does not include each
individual’s ‘sex’. This is because generally (unless using ifMale and ifFemale in
phenotype definition), this doesn’t have any meaning. However, if you need it you can
extract sex from each Individual’s line in the corresponding serialized file.
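HaploGene records like the sample above can be read with Python's standard xml.etree.ElementTree module. The sketch below parses one well-formed HaploGene element (with straight quotes, as an XML parser requires) and collects a few attributes; the summary layout and function name are just an illustration. For a whole file, and assuming the HaploGene elements sit inside a single root element, ET.parse(path).getroot().iter("HaploGene") would visit each record in turn.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<HaploGene name="ABC10" onChromosome="0" born="4741" uid="16574956"
           snpcount="115" count="21" phen="0.378721">
  <Snp name="SNP0XAL109002507ID2050188" count="6240" born="820"
       uid="2050188" location="109002507" nucleotide="A" phen="0.0388587"/>
  <Snp name="SNP0XCL109004218ID6334505" count="6240" born="2532"
       uid="6334505" location="109004218" nucleotide="C" phen="0.0715263"/>
</HaploGene>
"""


def summarize_haplogene(element):
    """Collect the name, chromosome, net effect and SNP list of a HaploGene."""
    return {
        "name": element.get("name"),
        "chromosome": int(element.get("onChromosome")),
        "phen": float(element.get("phen")),
        # One (name, location, effect) tuple per novel SNP allele listed.
        "snps": [(snp.get("name"), int(snp.get("location")),
                  float(snp.get("phen")))
                 for snp in element.iter("Snp")],
    }


summary = summarize_haplogene(ET.fromstring(SAMPLE.strip()))
```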
preMakePed##_subset#.txt For chromosome##, and subset#, a modified preMakeped
file. Because files can be very large, a separate file is generated every 1000 pedigrees,
subset files numbered sequentially (the set of files contain the full pedigree data for the
population). It will differ from standard PreMakeped formats by having a header line
and each data line pre-pended with several values. These have been added for post-run
analytic convenience, and the header identifies these values (the example below shows
the current set). The IndividualID is the UID found in other files, and is included because
in generating pedigrees, ForSim creates new pedigree-specific sequential IDs for the
individuals (this is sometimes convenient for Linkage users; pre-pending the UID allows
setting up correspondence with information in other output files). The standard line
includes these space-separated fields:
Pedigree# PersonID fatherId motherId sex phenotype 1 1 2 1 3 3 ….
[the numbers are pairs of the individuals’ alleles at the tested markers (paternal, maternal)
for each SNP site, in chromosome order as specified in the Marker input file].
Since ForSim generates pedigrees from complete data, top-generation individuals in a
pedigree will have father’s and mother’s IDs. Here phenotype means affection status,
coded as 0, 1, or 2 where 0 is unknown, 1 is unaffected, and 2 is affected. But since
ForSim has complete data, 0 (unknown) should never occur.
The pre-pended variables are included as a convenience for identifying individual
properties. These include each individual’s net first-defined phenotype, the ‘genetic’
phenotype, the Environment and the familyEnvironment contribution. At present,
because of a possible future polygenic background feature, there is also a
PolygenicComponent prepend column, whose value is set to 0 for all individuals in the
preMakePed files and can be ignored. The value of the phenotype is worked out for that
individual from the phenotype definition and his genotype and environments. In default,
the two environments (familyEnvironment itself defaults to zero) are added to the genetic
contribution that was specified in the Phenotype definition; that means the net genetic
contribution can be determined by subtraction (phenotype – Envt – familyEnvt). The
‘geneticPhenotype’ column is a redundant repetition of the overall phenotype: it is the
compound of G and/or E effects specified in the phenotype definition. But if
environments are specified in non-additive ways and/or that interact with genotypic
effects in the phenotype definition, this subtraction does not apply! So use (or ignore)
these file fields advisedly.
NOTES: For program legacy reasons, a polygeneComponent column, with value fixed at
0 is prepended.
Before using programs like Linkage that want PreMakeped format, you must delete the
header line and then crop the pre-pended values from each data line. This is necessary
even in one-population simulations. The subfile number refers to the fact that for file
management in large runs, the total data are divided into subfiles that need to be
concatenated for a single analysis. The prepended variables currently are as given below
(except that they also include x and y coordinate locations for each individual, if the
simulation was run specifying usingSpatial), followed by the standard variables
shown in bold above. Values are given for each simulated phenotype, labeled in the
order in which they were specified in the input file (not the name given). NOTE also that
the prepended variables may change not just with the number of phenotypes but if or
when some in-development features like adding a polygenic background component are
added; but their identities will be listed in the #Prepended line.
#Prepended: Population Phenotype0 GeneticPhenotype0
EnvironmentalPhenotype0 FamilyEnvironmentalPhenotype0 Phenotype1
GeneticPhenotype1 EnvironmentalPhenotype1 FamilyEnvironmentalPhenotype1
Generation
PopA 0.336108 -0.0249734 0.361081 0 0.127397 -0.0249734 0.152371 0 98
840 295611 291699 293037 1 2 1 1 3 3 2 2 3 3 3 3 1 1 4 4 3 3 3 3 4 4 2
PopA 0.345266 0 0.345266 0 0.31541 0 0.31541 0 98 840 295635 293095
290988 2 2 1 1 3 3 2 2 3 3 3 3 1 1 4 4 3 3 3 3 4 4 3 3 4 4 3 3 2 2 1 1
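As noted above, the header line and the pre-pended values must be removed before the file is passed to programs that expect PreMakeped format. A minimal Python sketch of that step follows. It assumes that the #Prepended header occupies a single line in the actual file (it is wrapped here only for the page) and that the number of names on it equals the number of pre-pended values on each data line; the function name strip_prepended is hypothetical.

```python
def strip_prepended(lines):
    """Drop the '#Prepended:' header and the pre-pended fields of each data line."""
    n_prepended = None
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Prepended:"):
            # Count the column names that follow the '#Prepended:' label.
            n_prepended = len(line.split()[1:])
            continue
        if n_prepended is None:
            raise ValueError("data line found before the #Prepended header")
        cleaned.append(" ".join(line.split()[n_prepended:]))
    return cleaned


sample = [
    "#Prepended: Population Phenotype0 GeneticPhenotype0 "
    "EnvironmentalPhenotype0 FamilyEnvironmentalPhenotype0 Phenotype1 "
    "GeneticPhenotype1 EnvironmentalPhenotype1 FamilyEnvironmentalPhenotype1 "
    "Generation",
    "PopA 0.336108 -0.0249734 0.361081 0 0.127397 -0.0249734 0.152371 0 98 "
    "840 295611 291699 293037 1 2 1 1 3 3",
]
for row in strip_prepended(sample):
    print(row)
```

The same function can be applied to each subset file in turn and the results written to one output file, which also performs the concatenation needed for a single analysis.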
runtime.txt Miscellaneous run data, including the total run time :
Finished: Thu Jun 21 13:02:52 2007
Run took 73 seconds, output took 11 seconds.
relativeRisk.txt Description of some phenotype characteristics after the run. Pcutoff is
the phenotype level above which one is scored as ‘affected’. The other entries are self
explanatory:
766 of 10124 offspring in final generation are affected
50 first siblings have affected second siblings in final generation
2734 of 3109 first two siblings have concordant affectation status
pcutoff was set to: 15.0216
first siblings == 3816
first siblings affected == 292
sibling pairs == 3109
sibling pairs, both affected == 50
overall risk == 0.0756618
sibling risk == 0.171233
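The two risk values in this sample can be reproduced from the counts in the same file. The short Python sketch below does the arithmetic. The reading that sibling risk is the number of pairs with both siblings affected divided by the number of affected first siblings is inferred from the sample numbers, not from the program source, and the final ratio is an illustrative extra that the file itself does not report.

```python
# Counts copied from the sample relativeRisk.txt shown in this section.
offspring = 10124
affected = 766
first_siblings_affected = 292
pairs_both_affected = 50

overall_risk = affected / offspring
# Assumed definition, inferred from the sample values:
sibling_risk = pairs_both_affected / first_siblings_affected
# Ratio of sibling risk to population risk (not printed by ForSim).
risk_ratio = sibling_risk / overall_risk

print(f"overall risk == {overall_risk:.6g}")
print(f"sibling risk == {sibling_risk:.6g}")
print(f"ratio        == {risk_ratio:.3f}")
```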
snpLifeSpanDataFinal.txt For each SNP present or fixed at the end of the run,
name, chromosomal location, born-on generation, whether fixed or lost, count in current
generation, diploid population size, current frequency, assigned phenotypic effect. Gives
lost/fixed SNPs since last dump, plus current polymorphic SNPs. [NOTE: This file is
like the other snpLifeSpanData files except that they are dumps of fixed or lost SNPs
only, while this also includes those still present at frequencies in (0,1). To tally all SNPs
produced in the run, user must also include the other snpLifeSpanData files.]:
# Name Status Location Born DateFixedOrLost CurrentCount DiploidCount
Freq Phenotype
SNPGL939347ID6588 Lost 939347 97 1890 0 6002 0 0.461625
SNPGL418416ID6628 Lost 418416 98 1890 0 6002 0 -0.900036
geneHaploCounts.txt For each gene, and each generation, the number of unique
haplotypes:
Gene0 Gene1 Gene2 Gene3
0 0 0 0
0 0 0 0
11 8 13 10
17 12 17 23
These files are always produced at the end of the run, but their names depend on the
input file specifications:
PopName_PopRunData.txt Generation, current census count before and after selection
in that generation, number who left the population before mating, and the number culled
by selection:
Generation census postSelectionCensus emigrants culled
1 2003 1993 0 10
2 2001 1990 0 11
3 2002 1987 0 15
generation_###_Population#.xml The entire data set, in easily parsable XML tagged
format. This can be a huge file, so is only saved if so specified by outputXML true in
the input file. This is the last generation simulated and thus the final generation in the
final pedigree file. When a population is specified for death during the simulation, its
final data will be saved in this format. The population can be saved at other points as
well by using an event line as described earlier. The first line gives population name
and generation, male and female sex counts in the population, sequential ID of each
simulated individual, and so on, then the two chromosome haplotypes in terms of its
HaploGenes.
NOTE: SNPs reported in this file only include the derived (novel) SNP (which may or
may not be fixed at the time); the ancestral alleles are not specifically listed, mainly to
save on the enormous file size that could otherwise result, since in most simulations most
alleles will be ancestral. This means that a chromosome listing that does not include one
of the extant SNPs has the ancestral allele. Also, the data are phased because
chromosomes are listed in parental order:
<Population name="PopulationA" generationNumber="400" males="1444"
females="1555">
<Individual uid="1195227" paternal_uid="1194235"
maternal_uid="1192273">
<Phenotype number="0" net="-0.95603" genetic="0" environment="0.95603" />
<Phenotype number="1" net="0.547231" genetic="0.110449"
environment="0.436782" />
<Chromosome uid="35816" number="0" count="5" born="392">
<HaploGene name="ABC1" onChromosome="0" born="0"
uid="1" snpcount="0" count="2840" phen="0">
</HaploGene>
<HaploGene name="ABC3" onChromosome="0" born="370"
uid="15571" snpcount="2" count="60" phen="-0.0247006">
<Snp name="SNP0XTL306509ID3129" count="232"
born="103" uid="3129" location="306509" nucleotide="T" phen="0.0247006"/>
<Snp name="SNP0XCL331557ID11175" count="60"
born="371" uid="11175" location="331557" nucleotide="C" phen="0"/>
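Because this file can be very large, it is usually better to stream it than to load it whole. The Python sketch below uses the standard library's incremental parser to collect, for each Individual, the uids of the derived SNPs it carries; any extant SNP not in that set is ancestral on both chromosomes. The embedded sample is cut down from the listing above, and its closing tags and nesting are assumptions, since the listing here is truncated. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down sample; closing tags are assumed because the Manual's listing is truncated.
SAMPLE = b"""<Population name="PopulationA" generationNumber="400" males="1" females="0">
<Individual uid="1195227" paternal_uid="1194235" maternal_uid="1192273">
<Phenotype number="0" net="-0.95603" genetic="0" environment="0.95603" />
<Chromosome uid="35816" number="0" count="5" born="392">
<HaploGene name="ABC3" onChromosome="0" born="370" uid="15571" snpcount="2" count="60" phen="-0.0247006">
<Snp name="SNP0XTL306509ID3129" count="232" born="103" uid="3129" location="306509" nucleotide="T" phen="0.0247006"/>
<Snp name="SNP0XCL331557ID11175" count="60" born="371" uid="11175" location="331557" nucleotide="C" phen="0"/>
</HaploGene>
</Chromosome>
</Individual>
</Population>"""


def derived_snps_per_individual(source):
    """Map each Individual uid to the set of derived SNP uids it carries."""
    carried = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Individual":
            carried[elem.get("uid")] = {snp.get("uid") for snp in elem.iter("Snp")}
            elem.clear()  # release the subtree; real files can be huge
    return carried


result = derived_snps_per_individual(io.BytesIO(SAMPLE))
print(result)
```

To run it on a real file, pass the file name in place of the io.BytesIO object.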
PopName_SelStratRelative_PhenotypeName.txt Describes for each generation the
phenotype lower selection cutoff threshold, mean phenotype, and phenotypic standard
deviation, upper threshold, those culled because they were below the low or above the
high cutoff. SelStrat names the selection strategy used for this phenotype (e.g., ‘Relative’
to the mean). Alternatives to ‘Relative’ are ‘Neutral’
or ‘functional’ (based on a probabilistic function), as specified in the input file. Separate
file produced for each phenotype and each population [NOTE: The mean and StdDev
columns are always relevant, but the ‘low’ and ‘up’ columns have no user-useful
meaning in ‘functional’ or ‘neutral’ selection. ]:
Gen LowThresh Mean UpThresh StdDev LowCulled UpCulled
1 -2.78966 0.0168957 4.82814 1.00234 4 0
2 -2.90604 0.0035682 4.99146 1.03914 6 0
3 -2.95189 -0.0389495 4.95467 1.04034 8 0
Files generated at each generation interval specified by output # in the input file
NOTE: the large xml whole-population file is not saved unless outputXML # is specified
for the desired generation (this is to save space and time):
Hap_GeneName_OutputGeneration_##.svg
(this is a figure, documented below; it requires the global line outputSVG true)
Phenotypes_PopName_Generation_##.txt The phenotypes, given as one line for each
individual. The file is saved for generation 0 to see the starting conditions (basically,
environmental variation), and the file is also saved for the penultimate generation so that
response to selection can be assessed:
Phen0 Phen1
9.4427 -13.3644
9.13841 -13.5032
8.49651 -16.5196
HaploPhenGen###.txt For each unique HaploGene, its phenotypic effect and number of
copies, at the specified generation number:
-1.74914 294
6.42087 255
-1.74914 138
-1.14608 668
SnpStatsGen##.txt For each SNP in generation #, gives the generation, the chromosome, the
sequential SNP number (ID), chromosomal location, and generation when born:
Generation Chromosome ID Location Born
999 0 6588 939347 97
999 0 6628 418416 98
999 0 11339 904214 168
999 0 17500 911487 259
snpSelCoeffs_#.txt For each SNP that exists in generation #, its SNP ID, chromosome,
chromosomal location, generation when born, followed by the selective coefficient of
that SNP back to the generation in which it first became polymorphic; these are computed
as the number of copies of the SNP divided by the total number of transmissions in that
generation (2 x population size).
PopulationCount 0 0 0 0 1004(2) 1003(1558) [...]
SNP0XTL554524ID35703 0 554524 1002 0 1004(0.00160462)
1003(0.000322789)
SNP0XTL587068ID35708 0 587068 1002 0 1004(0.000962773)
1003(0.000322789)
SNP0XAL3370468ID35715 0 3370468 1002 0 1004(0.00513479)
1003(0.00225952)
These files are produced when an internal storage buffer has been filled:
(NOTE: Of these, the snpLifeSpanData, snpOwnerPhens, snpCounts, snpSelCoeffs files
are saved during the run if usingTrackSNPs true is included in the input file; these
files will be saved as their buffer of 100,000 records is filled (so short runs or small
populations may generate only one file). These can be huge files, so only use this option
if you really want the complete history of every SNP!)
snpLifeSpanData_#.txt At file-save time, generation #, the generation by generation
history of each SNP that was born since, or was polymorphic in, the previous save generation, and for every generation until lost (if lost) the status of that SNP. At each
generation, provides the name, chromosomal location, birth generation, whether fixed,
lost, or still polymorphic, allele count (=0 if lost), current diploid count (twice population
size), current frequency, assigned phenotypic effect. At the end, a
snpLifeSpanDataFinal.txt file is generated, that summarizes what was present at the end
of the run. [NOTE: For a full tally of the history of all SNPs in the run, user must
retrieve all of these files, as SNPs present at the end will not count those that existed but
were fixed or lost during the run, if those exceeded the buffer size and were saved as files
during the run. The history of those that were polymorphic in previous files will be
continued in subsequent files until they are lost (if they are lost before the end)]:
# Name Status Location Born DateFixedOrLost CurrentCount DiploidCount Freq Phenotype
SNP0XGL939347ID6588 Lost 939347 97 699 0 4010 0 0.461625
SNP0XGL418416ID6628 Lost 418416 98 699 0 4010 0 -0.900036
SNP0XGL904214ID11339 Lost 904214 168 699 0 4010 0 -1.10356
SNP0XAL911487ID17500 Lost 911487 259 699 0 4010 0 0.139189
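Since a full tally needs every snpLifeSpanData file from the run, one way to combine them is to key the records by SNP name and let later files overwrite earlier ones. The Python sketch below reads such records from a list of lines. It assumes that each data record is a single line of nine whitespace-separated fields in the column order of the header; the function name read_lifespan is hypothetical.

```python
FIELDS = ["Name", "Status", "Location", "Born", "DateFixedOrLost",
          "CurrentCount", "DiploidCount", "Freq", "Phenotype"]


def read_lifespan(lines, records=None):
    """Add one record per SNP name to 'records'; later lines replace earlier ones."""
    if records is None:
        records = {}
    for line in lines:
        parts = line.split()
        # Skip blanks, the '#' header, and anything without exactly nine fields.
        if not parts or parts[0].startswith("#") or len(parts) != len(FIELDS):
            continue
        records[parts[0]] = dict(zip(FIELDS, parts))
    return records


sample = [
    "# Name Status Location Born DateFixedOrLost CurrentCount DiploidCount Freq Phenotype",
    "SNP0XGL939347ID6588 Lost 939347 97 699 0 4010 0 0.461625",
    "SNP0XGL418416ID6628 Lost 418416 98 699 0 4010 0 -0.900036",
]
records = read_lifespan(sample)
for name, rec in records.items():
    print(name, rec["Status"], rec["Born"], rec["DateFixedOrLost"])
```

Calling read_lifespan again with the lines of the next file and the same records dictionary accumulates the history across files.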
snpOwnerPhens_##.txt For each SNP existing in generation ##, the first line records
each generation number and population size that generation in the form generation(popcount);
the four 0’s are space-holders only. Each SNP has its own line that includes ID,
chromosome number, chromosomal location, born-on date, assigned phenotypic effect to
any phenotype it affects, then for each generation during its life, the generation number
and the net phenotypic effect, that is the average phenotype among all individuals who
‘own’ (have at least one copy of) the SNP in the population. These net effect values
reflect the changing linkage and genotype context that the SNP has been in during its life.
Values are given for each generation from its birth to generation ##, or since the last
‘output’ generation. This file will only be produced in generations that are a multiple of
the generation specified by the “output” directive.
PopulationCount 0 0 0 0 7993(4999) 7992(4999) ...
SNPTL1041915ID645 0 1041915 10 0 7993(-2.153) 7992(-2.627) ...
SNPTL1036347ID7221 0 1036347 96 -0.012469 7993(-2.364) 7992(-3.124)
SNPCL5011063ID8827 0 5011063 117 -0.0933972 7993(-2.408) 7992(-.199)
SNPCL5064714ID9595 0 5064714 127 0 7993(-2.358) 7992(-2.204) ...
snpCounts_##.txt For each SNP existing in generation ##, the count of the novel allele in
the full population (that is, in all simulated populations), at each generation over the
entire history of the SNP since the last ‘output’ generation. Like the previous file, except
reporting SNP counts rather than their net effects. The first line records a population
count per numbered generation. This file will only be produced in generations that are a
multiple of the generation specified by the “output” directive.
PopulationCount 0 0 0 0 7993(4999) 7992(4999) ...
SNP0XTL1041915ID645 0 1041915 10 0 7993(6616) 7992(6626) ...
SNP0XTL1036347ID7221 0 1036347 96 -0.012469 7993(3317) 7992(3318) ...
SNP0XCL5011063ID8827 0 5011063 117 -0.0933972 7993(2551) 7992(2574)
SNP0XCL5064714ID9595 0 5064714 127 0 7993(2363) 7992(2364) ...
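The snpSelCoeffs, snpOwnerPhens and snpCounts files share the generation(value) token layout shown in these samples. The Python sketch below splits one such line into its leading fields and a per-generation dictionary, then turns counts into frequencies by dividing by twice the population count. That division follows the description of 2 x population size given for snpSelCoeffs; applying it to snpCounts is an assumption, as is the function name parse_history.

```python
import re

# A history token looks like 7993(6616): generation, then a value in parentheses.
TOKEN = re.compile(r"^(\d+)\(([^()]+)\)$")


def parse_history(line):
    """Return (leading fields, {generation: value}) for one line of these files."""
    leading, history = [], {}
    for field in line.split():
        m = TOKEN.match(field)
        if m:
            history[int(m.group(1))] = float(m.group(2))
        elif field not in ("...", "[...]"):  # elision marks used in this Manual
            leading.append(field)
    return leading, history


_, pop = parse_history("PopulationCount 0 0 0 0 7993(4999) 7992(4999) ...")
lead, counts = parse_history(
    "SNP0XTL1041915ID645 0 1041915 10 0 7993(6616) 7992(6626) ...")

# Assumed conversion: copies of the novel allele over 2 x population count.
freqs = {gen: counts[gen] / (2 * pop[gen]) for gen in counts}
print(lead[0], freqs)
```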
Also, see MarkerPedSNPs.txt, above.
Output graphic files
ForSim also produces output graphics of various aspects of the data, at the user-specified
generational markpoints (‘output #’ described above, in the input file), and at the end.
Many are generated by calls to R, or are produced in browser-plottable SVG format.
These have self-explanatory file names and identifying legends. A few examples are given below. The graphics report various aspects of
the data that are useful directly as well as serving as indicators of whether the simulation
is doing what you think it should (and in that sense a source of bug-detection either of
ForSim or of the input file specifications). A strange pattern could be an interesting
result, but could indicate a bug or that you specified something other than what you
thought, so don’t just take them at face value!
The graphics can be viewed directly if produced in png format, or downloaded to be
viewed, if produced in pdf format. The format is specified as a command-line parameter
(see above).
Contents of output graphics
Here is a list of the output figure names (command line specifies whether pdf or png
format):
Hap_GeneName_OutputGenerationNumber.svg Figure saved for each gene if
outputSVG true is set
PopulationName_PhenotypeNameMean.png: value for each generation during the run
PopulationName_PhenotypeNameStdDev.png: value for each generation during the run
PopulationName_PhenotypeNameLowCulled.png: number culled because they were
below the low selection threshold, for each generation during the run (there is no such
threshold for neutral or probabilistic ‘functional’ selection, so these and the next three
graphics are not useful under those conditions, except as checks).
PopulationName_PhenotypeNameUpCulled.png: number each generation during the run
PopulationName_PhenotypeNameLowThresh.png: number each generation in the run
PopulationName_PhenotypeNameUpThresh.png: number each generation during the run
PopulationName_culled.png: total selected out, for each generation during the run
PopulationName_postSelectionCensus.png: number of selection-survivors in each
generation during the run
PopulationName_census.png: count for each generation during the run
SnpStatsGen#_Born.png: X-axis is generation, Y-axis present-day SNPs that were born
at that generation
SnpStatsGen#_Location.png: At interval triggered output generations, X-axis is
chromosome coordinate, Y-axis location of existing SNPs, in 10-basepair bins (shown
only for chromosome 0, in all populations pooled, intended as a diagnostic for various
parameters like mutation and selection)
SnpStatsGen#_ID.png: At interval triggered output generations, gives frequency of each
extant SNP for frequency distribution diagnostic checking; given for all SNPS on all
chromosomes in all populations.
phenotypes_PopulationName_Generation_##_Phen#.png: phenotype #’s distribution,
including a red line showing the normal distribution with the same µ and σ as the data.
The file is also saved for the penultimate generation so that response to selection can be
viewed.
Hap_GeneName_Gen#.svg: For each ‘output’ interval generation, a multifeature plot of
HaploGenes in the population at this generation. This shows, for each individual gene
(for all populations pooled, on every chromosome, in a separate figure), one line for
each unique HaploGene (haplotype for the gene), for each SNP along the gene whether
the HaploGene’s allele is the ancestral (blank) or novel allele (blue if a positive, red if a
negative, green if a neutral assigned effect on phenotypes). On the left is the frequency
of the HaploGene in bar-form, and its net assigned phenotypic effect (red, blue, the sum
of its SNP effects). Sorted by HaploGene frequency. The background is lightly colored
by age of that line’s HaploGene, since the last recombination or mutation that
generated it (pink=older, blue=younger). To get these, the input file global block must
contain outputSVG true. If output ## is also specified, these svg files will be
generated every ## generations; to produce these files only for the final generation of
the run, set ## equal to the final generation number.
Heritability.png: By generation, the narrow sense heritability (ratio of genotypic to total
phenotypic variance) for the first-defined phenotype. This is useful for debugging, to
check the behavior of the specified parameters, and to show the varying effects of
genes, or changes in their effects after event-specified occurrences, and so on.
locations_PopName_Generation#.SVG. If usingSpatial is set, this figure represents every
individual in the population at Generation #, as a circle, the color is determined by the
individual’s (first-defined) phenotype in a rainbow scale (that changes each generation,
so is relative only). The frequency with which such files are generated is specified in the
output line in the input file.
The following two types of figures are not routinely generated. They can take up
enormous space if you’re simulating many different genes or they’re generating many
different gene haplotypes. The generating code has been commented out in the running
script runForsim.rb. If you want these, just search on geneEntropies or geneHaploCounts
and remove the comments. Then run runForsim.rb as usual.
Gene#_geneHaploCounts.png: For gene #, the count of unique haplotypes for that gene
at each generation during the run, one figure for each gene.
GeneName_geneEntropies.png: entropy value (computed as above, in output text files
description) for this gene at each generation in the run
Here are samples of the above files:
Histogram of individuals culled by selection, by generation
Distribution of phenotype values in a single generation, with normal distribution having the same mean
and variance (red line)
Illustration 3: Plot of mean value of Phenotype A in Population A over 1000 generations
Illustration 4: Plot of spatial distribution of individuals generated by usingSpatial
If outputSVG true is used, the following integrated figure comparing haplotypes of cases and
controls will be generated (see above for description, and how to specify the desired generation(s) for these
figures):
Illustration 5: “Case-control” comparison
ADDENDUM: SOME WAYS TO SIMULATE FEATURES NOT EXPLICITLY BUILT INTO ForSim, AND ONE WAY TO CHECK REASONS FOR CRASHES
While ForSim is very flexible, it cannot explicitly do everything. Nonetheless, many
conditions that are not automatically provided for can be simulated by creative use of the
existing features. Here are some suggestive examples.
First, and most flexibly, many things can be done indirectly by using the serialize/load
feature. The saved reload data reflect the previous simulation conditions, but the data
after the reload run completes will reflect new properties that may be specified in a
revised input file. Many variables and running conditions can be changed in this way,
that are not amenable to direct event keyworded change, and there is nothing that
prevents several stop-starts of this kind, except that they must be done ‘by hand’ (or shell
script).
1. Microsatellites
Microsats are not explicitly included in ForSim, which simulates single nucleotide
mutation. But to simulate the hierarchically-ordered high mutation rate behavior of
microsatellites, a gene of appropriate length and/or mutation rate can be simulated that
will accumulate enough mutations to approximate the higher haplotype heterozygosity of
microsatellites. Or a very high mutation rate can be specified as the global rate, but the
non ‘microsat’ genes can be given a proportionately lower mutation rate. Doing this will
not, however, generate recurrent mutation.
2. Complex prevalence.
The ‘prevalence’ variable in ForSim input files is based only on the first-defined
phenotype in the input file. Multivariate prevalence, such as defined by two phenotypes,
can be handled in at least two ways. At the end of the run, PreMakeped files are
generated for all individuals in the population (and the specified pedigrees). These files
list all phenotypes for each individual. Post-run analysis could then determine multi-trait
affections status, and this could be altered accordingly in the preMakeped ‘affection
status’ column for each individual. Alternatively, Phenotype A can be defined (as the
first-specified trait) in terms of the other phenotypes, so that affection status or prevalence
will be defined in terms of Phenotype A, for example PhenA=(PhenB+PhenC)/2.
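The first approach (post-run recoding of the affection status column) might look like the Python sketch below. It assumes the preMakePed layout described in the output files section: a #Prepended header naming the pre-pended columns, followed on each data line by the standard fields Pedigree#, PersonID, fatherId, motherId, sex and phenotype. The rule used here (affected only if every listed phenotype exceeds its cutoff), the cutoff values, and the function name are all hypothetical choices for illustration.

```python
def recode_affection(header, row, cutoffs):
    """Return the row with its affection status recoded (2 = affected, 1 = unaffected)."""
    names = header.split()[1:]             # column names after '#Prepended:'
    fields = row.split()
    n = len(names)
    values = dict(zip(names, fields[:n]))
    # Hypothetical multi-trait rule: every listed phenotype must exceed its cutoff.
    affected = all(float(values[name]) > cut for name, cut in cutoffs.items())
    # Standard fields follow the pre-pended ones:
    # Pedigree# PersonID fatherId motherId sex phenotype ...
    fields[n + 5] = "2" if affected else "1"
    return " ".join(fields)


header = ("#Prepended: Population Phenotype0 GeneticPhenotype0 "
          "EnvironmentalPhenotype0 FamilyEnvironmentalPhenotype0 Phenotype1 "
          "GeneticPhenotype1 EnvironmentalPhenotype1 "
          "FamilyEnvironmentalPhenotype1 Generation")
row = ("PopA 0.336108 -0.0249734 0.361081 0 0.127397 -0.0249734 0.152371 0 98 "
       "840 295611 291699 293037 1 2 1 1 3 3")

print(recode_affection(header, row, {"Phenotype0": 0.3, "Phenotype1": 0.2}))
```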
3. Mendelian traits.
ForSim does not explicitly specify dichotomous causation. But a gene that one wants to
be able to have Dominant effects could be assigned a high large-effect probability in the
allelic effects (gamma function) parameters. If an absolute rather than relative
(phenotype distribution tail size, in SD units) cutoff for affected status is chosen, then
mutations with large effects will by themselves cause affection status. Environmental
variance could be reduced to make single mutations alone more likely to generate
affected status. The program already effectively deals with recessiveness in that many
mutations may cause affected status only when paired with a second large-enough allele.
Also, if natural selection is not involved, then after the run one can write a script to parse
the preMakeped or population files and check the genotype at any SNP site you would
like to have a dominance or recessiveness property. Adjust the phenotype (affection status) column in the individual’s preMakeped file appropriately.
4. Artificial selection as in agricultural situations.
To specify that only individuals whose phenotype is above some cutoff (in SD relative to
the mean) reproduce, specify (say) ‘selection PhenA relative +2.0 +10.0’. Here, any individual
below +2SD will be truncated (not reproduce), and anyone with a phenotype >+10SD
will not reproduce (this essentially cuts off no one because of high phenotype). The
inverse specification could work for thresholds on the small end of the phenotype
distribution.
Natural selection is likely not so rigid. Fitness may rise rapidly after some threshold, T,
approaching 1.0 above the threshold. This can be specified with a logistic function,
where f(ø) is the fitness of an individual with phenotype ø:
$$f(ø) = \frac{a}{1 + b\,c^{-kø}}$$
Here b is the Y-intercept, which may typically be zero (zero fitness for a zero phenotype)
and c is the base of the exponentiation, typically e could be used. Users can set values
that seem to make sense. A steep rise in fitness past, say, the inflection point of the
logistic function (at position a/2), essentially makes that the threshold T.
Note that in ForSim these functions generate probabilistic f values so that with
probability f(ø) the individual has normal reproduction, and probability 1-f(ø) of having
none. In addition, note that the equation above uses superscript typography, which the input
file cannot take; in the input file it would be
selection PhenA functional a / ( 1.0 + b * c ^ ( - k * Phenotype ) )
NOTE: because ForSim must parse a wide variety of possible functions with as little
ambiguity as possible, every number must be floating point, and there must be a space
between every item (except negative numbers, that can be written -2.3).
Normally for increasing fitness with increasing phenotype one would set c>1. By testing
various parameter values you can set the threshold based on, say, the inflection point
(a/(1+b)), and fitness can decline from a negative threshold as the phenotype increases by
setting 0<c<1. As noted earlier, the phenotype in question is mentioned first (here,
PhenA), but in the functional expression references to that phenotype are made by
‘Phenotype’, regardless of which phenotype is being tested.
Fitness can lead a population to some mean phenotype value for example if fitness is 1.0
at that value and less than that away from it. There are many ways this could be
modeled, but a useful one that has been used in many papers and thus is given here is
normal distribution of fitness, with mean equal to a target optimal phenotype, P, with
standard deviation S:
$$f(\phi) = \frac{1}{\sqrt{2\pi S^{2}}}\, e^{-\frac{(P-\phi)^{2}}{2S^{2}}}$$
which will generate maximum fitness at the mean value, P. To make this maximal
fitness equal to one, just omit the scaling term:
$$f(\phi) = e^{-\frac{(P-\phi)^{2}}{2S^{2}}}$$
which in input file typography is
selection PhenA functional e ^ ( 0.0 - ( P - Phenotype ) ^ 2.0 / ( 2.0 * S ^ 2.0 ) )
or, to do this during the run
event 900 setPhenotypeSelection PopA PhenA selection PhenA functional
e ^ ( 0.0 - ( ( P - Phenotype ) ^ 2.0 / ( 2.0 * S ^ 2.0 ) ) )
where you put in your chosen values for the target optimal phenotype, and the strength of
selection as represented by S. Keep everything on one line (no linebreaks) even if it
wraps on your screen (the above typography is for readability). The ‘0.0’ term is needed
to ensure proper parsing of the relational operator (here, the first ‘-’ in the exponential
term), because the parser looks for binary relationships. Omitting the scaling term from
the Normal distribution, allowing fitness at the optimal point to be 1.0 is not necessary,
since fitnesses are always relative, but if not done in this way no genotype has a 100%
chance of reproducing, and having a max fitness of 1.0 can reduce the number of
individuals culled by selection, making less of an impact on population size, growth
capability and so on.
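Before committing values of P and S to an input file, it can help to evaluate the unscaled normal fitness function outside ForSim. The Python sketch below does so and also mimics the probabilistic decision described above, in which an individual reproduces with probability f. The comparison of a uniform random number against f is an assumed reading of that description, and the function names and parameter values are hypothetical.

```python
import math
import random


def gaussian_fitness(phenotype, optimum, sd):
    """Unscaled normal fitness: 1.0 at the optimum, falling off on either side."""
    return math.exp(-((optimum - phenotype) ** 2) / (2.0 * sd ** 2))


def survives(phenotype, optimum, sd, rng):
    """Assumed decision rule: survive when a uniform [0,1) draw is below fitness."""
    return rng.random() < gaussian_fitness(phenotype, optimum, sd)


P, S = 5.0, 2.0                 # hypothetical target phenotype and selection width
rng = random.Random(2013)       # fixed seed so the sketch is repeatable
for phen in (1.0, 3.0, 5.0, 7.0, 9.0):
    f = gaussian_fitness(phen, P, S)
    kept = sum(survives(phen, P, S, rng) for _ in range(10000))
    print(f"phenotype {phen:4.1f}  fitness {f:.4f}  survivors per 10000: {kept}")
```

A smaller S narrows the curve and so strengthens selection toward P.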
Life is complex but simulations should be simple and interpretable approximations. So
don’t try any functional relationships that are too fancy. ForSim’s ability to parse equations is
limited (hence the 0.0 and extreme spacing in the above example). Polynomials or other
similar approximations are most likely to be best as a rule (but make sure they are
between 0 and 1).
To test any such functions, first look at the output screen. At the generation when a
functional fitness expression is first implemented (at the very beginning or any such
‘event’ generations) the output screen lines describe the way ForSim is interpreting the
equation, term by term. Additionally, you can check the function explicitly: Edit
MathStack.cpp in forsim/src. Find the output line section (search on SURVIVED or
CULLED), remove the // commenting symbol from these lines, and recompile (just do
‘make’ in the forsim directory). Then run your intended input file. The fitness of every
individual in every generation will be printed to the screen, along with the individual’s
phenotype and the fitness decision (survive or culled) based on a random [0,1] number
compared to fitness. Stop the program quickly (^C), and verify the fitness computation
and decision. Then, re-comment out these lines, and recompile.
NOTE: this following section corrects and improves the corresponding discussion in
earlier versions of this Manual.
5. Weak selection.
To simulate weak selection, in which an individual has only slightly reduced
fitness below some cutoff value (for example, high phenotype values confer
only a 1% advantage), use a logistic function again with T as the inflection point, but (for
example) with a small value for k. That is, fitness drops off below T only very slowly.
Try out various parameter values for the function in a spreadsheet program like Excel.
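As an alternative to a spreadsheet, the Python sketch below tabulates the logistic fitness function for a steep and a shallow value of k. It also computes the phenotype at which fitness reaches a/2, which follows from setting b c^(-kø) = 1, giving ø = ln(b) / (k ln(c)). The function names and all parameter values are hypothetical and chosen only to show the contrast.

```python
import math


def logistic_fitness(phenotype, a, b, c, k):
    """f = a / (1 + b * c**(-k * phenotype)), as in the input-file expression."""
    return a / (1.0 + b * c ** (-k * phenotype))


def half_max_phenotype(b, c, k):
    """Phenotype at which fitness equals a/2: solve b * c**(-k*x) = 1 for x."""
    return math.log(b) / (k * math.log(c))


a, b, c = 1.0, 20.0, math.e          # hypothetical parameter values
for k in (2.0, 0.1):                 # steep rise versus weak selection
    t = half_max_phenotype(b, c, k)
    print(f"k = {k}: fitness reaches a/2 at phenotype {t:.3f}")
    for phen in (0.0, 1.0, 2.0, 5.0, 10.0, 30.0):
        print(f"  phenotype {phen:5.1f}  fitness {logistic_fitness(phen, a, b, c, k):.4f}")
```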
6. Haploid evolution.
ForSim is a diploid simulator. But if you are not concerned with selection (that is, for drift-only
simulations for population history inference), you can approximate haploid evolution. Set recombination to
zero, do the normal run, then search the SNP output files (MarkerFreqs##.txt) for a fixed
SNP (‘frequency’) =1.0 and origin (‘born’) generation as early as you can find. Then
examine the data only on haplogenes carrying that SNP allele. NOTE that this will not
work if there is natural selection, since that is based on diploid phenotypes.
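The search for an early fixed SNP can be automated. The sketch below is hypothetical: it assumes a whitespace-delimited table whose first line is a header containing columns named ‘frequency’ and ‘born’, which may not match the actual layout of the MarkerFreqs files, so adjust the column names and parsing to what your files contain. The sample data are made up for illustration.

```python
def earliest_fixed_snps(text, freq_col="frequency", born_col="born"):
    """Return rows with frequency == 1.0, earliest origin generation first.

    Assumes (hypothetically) a whitespace-delimited table with a header line.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = [dict(zip(header, ln.split())) for ln in lines[1:]]
    fixed = [r for r in rows if float(r[freq_col]) == 1.0]
    return sorted(fixed, key=lambda r: float(r[born_col]))

if __name__ == "__main__":
    # Made-up sample in the assumed layout; replace with the contents of a real file.
    sample = """position born frequency
    1200 15 1.0
    4500 3 1.0
    9000 40 0.25
    """
    for row in earliest_fixed_snps(sample):
        print(row)
```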
7. Changing parameters mid-run that cannot be changed with event line options.
Not all parameters can be changed with event line options. But the same effect is easy to
achieve: Have an initial run stop at the generation where the change(s) are to occur.
Then, using the resulting ‘serialized’ data file and a different input file that contains a
‘load’ instruction and the altered parameter values, start again and run for however many
more generations are desired. If many such changes are desired, write a shell script file
with the series of runForsim lines, each referring to the appropriate input and serialized
files (e.g., before each run, move the most recent ‘serialized’ file to the location specified
in the subsequent input file). Here is an example of such a script:
# example to show how to change parameters in the middle of a run
# when the changes aren't among the 'event' instruction options
# do first run:
ruby runForsim.rb -i FirstInputFile.sim
# go to this run’s output folder:
cd runData/For*
# next line needed to extract the serialized file if data were compressed:
tar xzvf *.gz serialized*
# move this serialized file to the main runData folder:
mv -f serialized* ../serialized.txt
# go back to the forsim directory:
cd ~/. . . ./forsim
# run ForSim with new input file:
ruby runForsim.rb -i SecondInputFile.sim
# NOTE: the new input file must 'load runData/serialized.txt'
# NOTE: this instruction set can be repeated as many times as desired.
#
# The last runData folder will contain the final results
A word on ‘crashes’ and why they happen
As with any complex program, results depend upon the conditions specified. One first
line of defense to check the reasonableness of what you specified is to browse the output
graphic and text files to see if things you expected based on what you (thought you)
specified in the input (.sim) file are what you got. Many problems can be spotted this
way: indeed, it is often a very good way to realize the complex nature of evolution and
genetic architecture in the real world!
ForSim runs can crash or freeze before saving results. As in any computer program, and
given ForSim’s flexibility, there are undoubtedly bugs in the C++ code, and while we are
not funded to provide a programming service we may be able to fix them if we’re notified
about them.
Most often, however, crashes occur either for purely stochastic (bad luck) reasons, or
because of parameter settings in the input file. For example, a population can die out if
there are not enough mates, or selection is too severe, or a major change such as in a
selection or mate-choice regime is too sudden. To explore whether this is the problem,
try re-running the same input file. If it only crashes some of the time, then it is this kind
of issue. Real-world populations crash, too (most eventually go extinct!) so this may be a
lesson in life rather than a ‘mistake’ in what you specified! Major changes can be
implemented in steps by a series of moderate ‘event’ instructions.
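Repeating a run by hand is tedious, so the re-run test can be scripted. The following is a minimal sketch, not part of ForSim: it assumes that a crashed run returns a nonzero exit status to the shell, which may not hold for every kind of failure (a frozen run, for instance, never returns at all). The function name is illustrative, and the input file name is a placeholder.

```python
import subprocess
import sys

def count_failures(cmd, runs=5):
    """Run cmd `runs` times; return how many runs ended with a nonzero exit status."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures

if __name__ == "__main__":
    # Default command as used elsewhere in this Manual; the file name is a placeholder.
    cmd = sys.argv[1:] or ["ruby", "runForsim.rb", "-i", "FirstInputFile.sim"]
    runs = 5
    failed = count_failures(cmd, runs)
    print(f"{failed} of {runs} runs failed")
```

If only some of the runs fail, the cause is more likely stochastic than a program bug.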
If re-running doesn’t help, try the same input file with some of the parameters adjusted
(pop size, number of generations run, selection intensity, mating with rather than
without replacement, &c). If this makes a difference, again this is not a program bug.
Crashes can also occur when some invalid memory call is made, and because this can
occur in a multitude of ways, it was not practicable to identify those before they happen
and report them with an orderly exit. So if you are still unclear, or want to know just
where the problem arises, try the following. ForSim is written in C++ and here is a way
to see at least where the crash occurred (NOTE: this is done running forsim alone, rather
than within its runForsim.rb Ruby wrapper, so the post-run output files and so on will not
be produced):
To debug a forsim run with gdb (for code compiled with g++) [‘rtn’ means hit Enter]:
In the makefile CXXFLAGS line, insert -g3
make clean [rtn]
make [rtn]
Then
gdb --args ./forsim args [rtn]
then
r [rtn] to run
after the run, or if it crashes,
bt [rtn] for backtrace, which shows the chain of function calls in the source code that led to the crash
q [rtn] to quit
(--args means there will be arguments, here the input file name)
Of course, to understand the problem, you may or may not be able to decipher the issue
from this, and/or may have to explore the source code to see what it’s up to.
Note that in the CXXFLAGS = -g3 -O3 [...] line in the make file the '-g3' portion
instructs the compiler to include debugging capacity into the executable it is building.
Note to Programmers
As noted at the outset of this Manual, the financial support for this project ended in
spring of 2013. At that time, a few legacy quirks remained, and are noted as such in the
text. These are things that do not affect program accuracy as far as we know, but that
were not spotted before the project ended. An example is the PolygenicComponent
pre-pend column in the preMakePed output files. This has a value of 0 because a
polygenic background component was a feature in development that did not work
completely properly when the programmer left the project.
ForSim C++ code and Makefile compiled and ran on MacOS X Lion and Linux/Unix.
The Ruby wrapper scripts ran under Ruby 1.8/1.9 as of Spring 2013.
If you wish to modify the program or explore the C++ code, you will find some legacy
code that is no longer functional. An example is the set of routines related to the
production of ‘extra pedigrees’. These are sections with features that were in
development or needed some debugging at the end of the project. Rather than take the
chance that removing the code would unhook functioning features, the unused code was
not removed.
APPENDIX 1: ForSim LOGICAL FLOW AND TIME CONSUMPTION
These figures show the general flow of the program among its components. First is a
more comprehensive view of the calls (not counting final output function calls), and
below that is a simplified version with just major calls included. The numbers give
percent of CPU time consumed by the respective functions.
The logic is basically correct, but the percents vary as the program has been modified.
The figures were generated with KCachegrind (http://kcachegrind.sourceforge.net/) using
profiling data supplied by Valgrind (http://valgrind.org/). Each box is a major component;
the small fill-bars in each box show the relative time consumption in a typical run. These
values can be viewed by zooming in on the images.
APPENDIX 2: GENERIC INPUT.SIM FILE TEMPLATE
Example 3a:
Edit this basic file (“basic.sim”) to suit your own needs, or make your own from scratch
global begin
setFertility poisson 2.5
setMaxOffspringNumber 4
prevalence relative 0.08
generations 1000
1.0 megabases per centiMorgan male
1.0 megabases per centiMorgan female
mutation rate 2.5 E -8.0 female
mutation rate 2.5 E -8.0 male
# Optional: one line labeled ‘all’
event 400 donateParents PopulationA PopulationB 100 100
event 400 setMatingPopMatrix PopulationA 95 5 0
event 400 setMatingPopMatrix PopulationB 5 95 0
event 600 donateParents PopulationB PopulationC 100 100
event 600 setMatingPopMatrix PopulationA 90 5 5
event 600 setMatingPopMatrix PopulationB 5 90 5
event 600 setMatingPopMatrix PopulationC 1 1 98
output 900
outputXML true
matingWithReplacement true
# The following is usually set to false unless you really want to constrict
# environmental variance
scaleEnvironmentNormalVariance true 0.001 0.1
finalPedigreeDepth 3
end
chromosome begin
length 4000000
gene begin
name ABC1
location 200000
length 100000
gamma 1 0.05
probabilityNoEffect 0.1
probabilityPositiveEffect 0.5
end
gene begin
name ABC2
location 500000
length 100000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.1
probabilityPositiveEffect 0.5
end
gene begin
name ABC3
location 3300000
length 100000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.1
probabilityPositiveEffect 0.5
end
gene begin
name ABC4
location 3700000
length 100000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.1
probabilityPositiveEffect 0.5
end
end
phenotype begin
name PhenotypeA
definition ABC1 + ABC3
end
phenotype begin
name PhenotypeB
definition ABC2 + ABC4
end
population begin
name PopulationA
birth 0
initialSize 900
carryingCapacity 3000
growthRate 0.525 # at 2.1% per year for gen=25 yrs
death 2000
selection PhenotypeA relative -4.8 0.4
environmentNormal PhenotypeA 0.0 1.0
familyEnvironmentNormal PhenotypeA 0.0 0.5
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
familyEnvironmentNormal PhenotypeB 0.0 0.5
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
population begin
name PopulationB
birth 400
initialSize 200
carryingCapacity 3000
growthRate 0.525 # at 2.1% per year for gen=25 yrs
death 2001
selection PhenotypeA relative -4.8 4.8
environmentNormal PhenotypeA 0.0 1.0
familyEnvironmentNormal PhenotypeA 0.0 0.5
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
familyEnvironmentNormal PhenotypeB 0.0 0.5
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
population begin
name PopulationC
birth 600
initialSize 200
carryingCapacity 3000
growthRate 0.525 # at 2.1% per year for gen=25 yrs
death 2001
selection PhenotypeA relative -4.8 3.6
familyEnvironmentNormal PhenotypeA 0.0 0.5
environmentNormal PhenotypeA 0.0 1.0
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
familyEnvironmentNormal PhenotypeB 0.0 0.5
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
Example 3b:
Edit this even simpler file (“simple.sim”) to suit your own needs, to test the effects of
changing instructions, or to help debug input-file syntax
global begin
setFertility poisson 2.5
setMaxOffspringNumber 4
prevalence relative 0.08
generations 1000
1.0 megabases per centiMorgan male
1.0 megabases per centiMorgan female
mutation rate 2.5 E -8.0 female
mutation rate 2.5 E -8.0 male
# Optional: one line labeled ‘all’
output 500
outputXML true
matingWithReplacement true
# The following is usually set to false unless you really want to constrict
# environmental variance
scaleEnvironmentNormalVariance true 0.001 0.1
finalPedigreeDepth 3
# usingTrackSNPs true
# outputXML true
# outputSVG true
# usingSpatial true 1000 1000 0.1 1.0
event 530 outputXML
event 500 serializeState
# event 800 setPhenotypeSelection PopulationA PhenotypeA -8.8 3.8
# event 910 setPhenotypeSelection PopulationA PhenotypeB -3.8 8.8
end
chromosome begin
length 4000000
gene begin
name ABC1
location 3000000
length 25000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.4
probabilityPositiveEffect 0.3
end
gene begin
name ABC2
location 4000000
length 25000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.4
probabilityPositiveEffect 0.3
end
gene begin
name ABC3
location 5000000
length 25000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.4
probabilityPositiveEffect 0.3
end
end
phenotype begin
name PhenotypeA
definition ABC1 + ABC3
end
phenotype begin
name PhenotypeB
definition ABC2
end
population begin
name PopulationA
birth 0
initialSize 1000
carryingCapacity 3000
growthRate 0.525 # at 2.1% per year for gen=25 yrs
death 20000
selection PhenotypeA relative -4.8 0.4
environmentNormal PhenotypeA 0.0 1.0
familyEnvironmentNormal PhenotypeA 0.0 0.5
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
familyEnvironmentNormal PhenotypeB 0.0 0.5
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
APPENDIX 3: MORE COMPLEX SIMULATION FLOW AND INPUT FILE EXAMPLE
In this simulation, a single population of 10,000 runs for 7,500 generations, then splits
into two populations of 5,000 that grow to 10,000. Then, 2,490 generations later, a third
‘admixed’ population is formed with 70% from PopulationA and 30% from PopulationB.
The simulation then continues for 10 more generations. Pedigrees at the end will
reflect ‘pure’ PopulationA, PopulationB, and admixed PopulationC individuals.
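The 70%/30% admixture proportions follow from the parent counts on the two donateParents lines that found PopulationC in Example 4. The arithmetic can be checked with the sketch below. It assumes only that the two trailing numbers on each donateParents line are counts of donated parents that add together; the function name is illustrative and not part of ForSim.

```python
def admixture_fractions(donations):
    """Map each source population to its share of all donated parents."""
    totals = {src: sum(counts) for src, counts in donations.items()}
    grand_total = sum(totals.values())
    return {src: n / grand_total for src, n in totals.items()}

if __name__ == "__main__":
    # Counts copied from the donateParents lines that found PopulationC in Example 4.
    donations = {"PopulationA": (3500, 3500), "PopulationB": (1500, 1500)}
    for src, frac in admixture_fractions(donations).items():
        print(f"{src}: {frac:.0%}")
```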
Example 4:
The following is included as “complexTest.sim” in the distribution:
#WHAT: Simulation of an admixed population
global begin
setFertility poisson 2.0
setMaxOffspringNumber 8
output 10000
prevalence relative 0.09
generations 10000
outputXML false
1.0 megabases per centiMorgan all
mutation rate 2.5 E -8.0 all
# Optionally, users can include and alter the following four lines to
# specify probabilities defining the likelihood that a novel mutation
# in a given codon position will have no effect.
usingCodons true
firstCodonProbability 1.0
secondCodonProbability 0.9
thirdCodonProbability 0.5
matingWithReplacement true
scaleEnvironmentNormalVariance false
finalPedigreeDepth 3
event 7500 donateParents PopulationA PopulationB 2500 2500
event 7500 setMatingPopMatrix PopulationA 100 0 0
event 7500 setMatingPopMatrix PopulationB 0 100 0
event 9990 donateParents PopulationA PopulationC 3500 3500
event 9990 donateParents PopulationB PopulationC 1500 1500
event 9990 setMatingPopMatrix PopulationA 100 0 0
event 9990 setMatingPopMatrix PopulationB 0 100 0
event 9990 setMatingPopMatrix PopulationC 0 0 100
end
phenotype begin
name PhenotypeA
definition ABC2 + ABC3 + DEF1 + DEF2 + DEF4 + 4.5 * environment
end
phenotype begin
name PhenotypeB
definition ABC4 + ABC5 + DEF5
end
chromosome begin
length 10000000
gene begin
name ABC1
location 1000
length 50000
# Introns can be “inserted” in genes as follows:
intron 5000 2000
# Above, the intron begins 5kbp from the start of the gene “ABC1” and is
# 2kbp long. All mutations which arise in this region will contribute no
# phenotypic effect.
gamma 1 0.01 1 0.01
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name ABC2
location 1000000
length 50000
gamma 1 0.01 1 0.01
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name ABC3
location 2000000
length 50000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name ABC4
location 3000000
length 20000
gamma 1 0.01 1 0.01
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name ABC5
location 4000000
length 50000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name DEF1
location 5000000
length 50000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name DEF2
location 6000000
length 20000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name DEF3
location 7000000
length 20000
gamma 1 0.05 1 0.05
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name DEF4
location 8000000
length 50000
gamma 1 0.01 1 0.01
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
gene begin
name DEF5
location 9000000
length 20000
gamma 1 0.01 1 0.01
probabilityNoEffect 0.9
probabilityPositiveEffect 0.1
end
end
population begin
name PopulationA
birth 0
death 40000
initialSize 1000
carryingCapacity 10000
growthRate 0.525
selection PhenotypeA relative -4.8 4.2
environmentNormal PhenotypeA 0.0 1.0
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
population begin
name PopulationB
birth 7500
death 90000
initialSize 5000
carryingCapacity 5500
growthRate 0.525
selection PhenotypeA relative -3.6 3.6
environmentNormal PhenotypeA 0.0 1.0
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
population begin
name PopulationC
birth 9990
death 90000
initialSize 10000
carryingCapacity 11000
growthRate 0.525
selection PhenotypeA relative -3.6 3.6
environmentNormal PhenotypeA 0.0 1.0
mateCutoff internal PhenotypeA -4.8 4.8
mateCutoff external PhenotypeA -4.8 4.8
selection PhenotypeB relative -4.8 4.8
environmentNormal PhenotypeB 0.0 1.0
mateCutoff internal PhenotypeB -4.8 4.8
mateCutoff external PhenotypeB -4.8 4.8
end
# end of input file