IMa2 (Isolation with Migration)

Reporter: Junning Liu
2017.01.15
INTRODUCTION (https://bio.cst.temple.edu/~hey/software/software.htm#IMa2)
The program implements a method for generating posterior
probabilities for complex demographic population genetic models.
IMa2 works similarly to the older IMa program, with some important
additions. IMa2 can analyze data from, and implement a model for,
multiple populations (from one to ten sampled populations) that have
a known phylogenetic history, not just two populations as with the
original IM and IMa programs.
The program is based on the ‘Isolation with Migration’ model and uses
Bayesian inference with Markov chain Monte Carlo (MCMC) sampling.
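To make the Bayesian MCMC idea concrete, here is a toy Metropolis-Hastings sampler. It is not IMa2's sampler, which also updates genealogies. It only shows how a Markov chain combined with a uniform prior on [0, upper] produces samples from a posterior distribution. The normal-shaped likelihood centred at 2.0, the prior bound of 10, the step size and the burn-in length are all arbitrary assumptions chosen for illustration.

```python
import math
import random

def log_likelihood(theta: float) -> float:
    # Hypothetical toy likelihood: normal-shaped, centred at 2.0 with sd 1.0.
    return -0.5 * (theta - 2.0) ** 2

def metropolis_hastings(n_steps: int, upper: float = 10.0, step: float = 1.0,
                        seed: int = 123) -> list[float]:
    """Sample theta from likelihood x Uniform(0, upper) prior."""
    rng = random.Random(seed)
    theta = upper / 2  # arbitrary starting value inside the prior support
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.uniform(-step, step)  # symmetric proposal
        # The uniform prior is zero outside [0, upper], so such proposals are
        # rejected. Inside it, the prior cancels and only the likelihood ratio
        # matters.
        if 0.0 <= proposal <= upper:
            log_ratio = log_likelihood(proposal) - log_likelihood(theta)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                theta = proposal
        samples.append(theta)  # a rejected move repeats the current state
    return samples

samples = metropolis_hastings(60_000)
kept = samples[10_000:]  # discard the burn-in period
print(f"posterior mean ~ {sum(kept) / len(kept):.2f}")
```

Because the prior truncates the likelihood at 0, the posterior mean lands slightly above 2.0. The burn-in discard mirrors the burn-in period that IMa2 runs also use.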
IM and IMa
Assumptions
1: The major overall assumption is that the history of a sample from
two populations can reasonably be described by an Isolation with
Migration model.
2: Selective Neutrality. The method assumes that the variation within
the data set is neutral (i.e. not affected by directional or balancing
selection).
3: No Recombination Within Loci.
4: Free Recombination Between Loci.
5: Mutation has Followed the Model Applied to the Data.
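Assumption 3 (no recombination within loci) can be screened informally with the four-gamete test (Hudson and Kaplan 1985). Under the infinite sites model without recombination, no pair of segregating sites should show all four combinations of alleles. The minimal Python sketch below assumes aligned haplotypes of equal length and checks only biallelic sites. It is a separate diagnostic and not part of IMa2. The alignment is hypothetical.

```python
from itertools import combinations

def biallelic_sites(haplotypes: list[str]) -> list[int]:
    """Indices of alignment columns carrying exactly two alleles."""
    return [i for i in range(len(haplotypes[0]))
            if len({h[i] for h in haplotypes}) == 2]

def four_gamete_violations(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Return pairs of sites (i, j) at which all four gametes occur.

    Under infinite sites with no recombination, no such pair should exist.
    """
    sites = biallelic_sites(haplotypes)
    violations = []
    for i, j in combinations(sites, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

# Toy alignment (hypothetical data): columns 0, 1 and 3 are biallelic.
locus = ["AAGT",
         "AGGT",
         "CAGA",
         "CGGA"]
print(four_gamete_violations(locus))  # → [(0, 1), (1, 3)]
```

A non-empty result suggests recombination (or recurrent mutation) within the locus. In that case, the locus may need to be trimmed to a non-recombining block before analysis.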
Mutation Models:
1: The Infinite Sites (IS) model (Kimura 1969).
2: The Hasegawa-Kishino-Yano (HKY) model (Hasegawa et al. 1985).
3: The Stepwise Mutation Model (SMM) (Kimura and Ohta 1978).
4: Compound Locus Models.
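As an illustration of model 2, the HKY model's instantaneous rate matrix Q has off-diagonal entries q_ij = κπ_j for transitions (A↔G, C↔T) and q_ij = π_j for transversions, where π_j is the equilibrium frequency of base j. Each diagonal entry makes its row sum to zero. The Python sketch below builds Q, with rows and columns in A, C, G, T order, from user-supplied base frequencies and κ. It rescales Q to one expected substitution per unit time. That scaling convention and the example values are assumptions for illustration, not IMa2's internal code.

```python
BASES = "ACGT"
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def hky_rate_matrix(freqs: dict[str, float], kappa: float) -> list[list[float]]:
    """HKY rate matrix Q with rows/columns in A, C, G, T order.

    Off-diagonal q_ij = kappa * pi_j for transitions and pi_j for
    transversions; diagonals make each row sum to zero. Q is then rescaled
    so that the expected substitution rate, -sum_i pi_i * q_ii, equals 1.
    """
    pi = [freqs[b] for b in BASES]
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                weight = kappa if frozenset((a, b)) in TRANSITIONS else 1.0
                q[i][j] = weight * pi[j]
        q[i][i] = -sum(q[i])  # q[i][i] is still 0.0, so this sums off-diagonals
    mean_rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]

# Example values (hypothetical): base frequencies and a transition bias of 2.
Q = hky_rate_matrix({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}, kappa=2.0)
for base, row in zip(BASES, Q):
    print(base, " ".join(f"{x:7.3f}" for x in row))
```

Setting kappa to 1.0 with equal base frequencies reduces this matrix to the Jukes-Cantor case.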
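• Parameters: the two-population model has six parameters: three
population sizes (the two sampled populations and their common
ancestor), one splitting time, and two migration rates (one in each
direction).
[Figure: Parameters of the Isolation with Migration model]
In IMa2, more parameters exist when there are more than two
populations.
[Figure: An isolation-with-migration model for three sampled populations]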
IMa2
The general k-population model includes the following assumptions:
1:The history of the sampled populations can be represented by a
bifurcating phylogenetic tree.
2: The population phylogeny is rooted, and both the topology of the tree
and the sequence of splitting events in time are known.
3: Each sampled population, as well as each ancestral population, is
constant in size and follows Fisher–Wright population assumptions
(Ewens 1979).
4: Gene flow may have occurred, in either or both directions, between
each pair of populations that coexist over one or more time intervals.
5: No gene flow occurred between unsampled populations and
sampled populations or their ancestors.
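These assumptions determine how many parameters a k-population model has. There are 2k - 1 population sizes (k sampled plus k - 1 ancestral) and k - 1 splitting times. Under assumption 4, there are also two migration rates (one per direction) for every pair of populations that coexist in at least one time interval. The Python sketch below counts them from a list of splits ordered from most recent to oldest. The split-list representation and the numbering of ancestors (k, k + 1, ... in order of splitting) are assumptions for illustration. The sketch also ignores any IMa2 options that disable or merge parameters.

```python
from itertools import combinations

def count_im_parameters(k: int, splits: list[tuple[int, int]]) -> dict[str, int]:
    """Count parameters of a k-population isolation-with-migration model.

    `splits` lists the two populations joined at each splitting event,
    from the most recent event to the oldest. Sampled populations are
    0..k-1 and the ancestor created by split i (0-based) is k + i.
    """
    if len(splits) != k - 1:
        raise ValueError("a bifurcating tree of k populations has k - 1 splits")
    # Time interval i runs from split i - 1 back to split i (interval 0 starts
    # at the present). Each population occupies the half-open range
    # [start, end) of interval indices.
    start = {p: 0 for p in range(k)}
    end = {}
    for i, (a, b) in enumerate(splits):
        end[a] = end[b] = i + 1   # daughters last exist in interval i
        start[k + i] = i + 1      # their ancestor first exists in interval i + 1
    end[2 * k - 2] = k            # the root persists through the last interval
    coexisting = [
        (p, q) for p, q in combinations(range(2 * k - 1), 2)
        if max(start[p], start[q]) < min(end[p], end[q])
    ]
    return {
        "population_sizes": 2 * k - 1,           # k sampled + (k - 1) ancestral
        "splitting_times": k - 1,
        "migration_rates": 2 * len(coexisting),  # one per direction per pair
    }

# Three populations: 0 and 1 split first (ancestor 3), then 3 and 2 (root 4).
print(count_im_parameters(3, [(0, 1), (3, 2)]))
# → {'population_sizes': 5, 'splitting_times': 2, 'migration_rates': 8}
```

With k = 2, the same function returns the six parameters of the original two-population model. The count of migration parameters grows fastest, because every pair of coexisting populations adds two.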
Running IMa2
IMa2 works with several types of files, some of which the user must
prepare if they are needed. The primary input file is the data file,
which every analysis requires.
• Data File Format
• Parameter Prior File Format
• Nested Model File Format
Input data file format:
For two populations, the tree string (line 4 of the data file) is
(0,1):2, where 0 and 1 are the sampled populations and 2 is their
ancestral population.
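The Python sketch below reads such a tree string and lists the splitting events. It assumes that each number after a colon labels the ancestral population formed from the two groups inside the preceding parentheses. It also assumes that ancestral labels increase with the age of the split, so sorting by label gives the splits from most recent to oldest. Both conventions are our reading and should be checked against the IMa2 manual. The function name is hypothetical.

```python
def parse_tree(tree: str) -> list[tuple[int, int, int]]:
    """Return (child_a, child_b, ancestor) triples sorted by ancestor label."""
    tree = "".join(tree.split())  # ignore any whitespace
    pos = 0
    splits = []

    def read_int() -> int:
        nonlocal pos
        begin = pos
        while pos < len(tree) and tree[pos].isdigit():
            pos += 1
        if begin == pos:
            raise ValueError(f"expected a population number at position {begin}")
        return int(tree[begin:pos])

    def expect(ch: str) -> None:
        nonlocal pos
        if pos >= len(tree) or tree[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def read_node() -> int:
        # A node is either a bare population number or "(node,node):label".
        if pos < len(tree) and tree[pos] == "(":
            expect("(")
            left = read_node()
            expect(",")
            right = read_node()
            expect(")")
            expect(":")
            label = read_int()
            splits.append((left, right, label))
            return label
        return read_int()

    read_node()
    if pos != len(tree):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return sorted(splits, key=lambda s: s[2])

print(parse_tree("(0,1):2"))        # → [(0, 1, 2)]
print(parse_tree("((0,1):3,2):4"))  # → [(0, 1, 3), (3, 2, 4)]
```

A recursive-descent reader keeps the grammar explicit and reports the position of a malformed string, such as a missing ":label".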
• Layout of the input data file, line by line:
Line 1 - arbitrary text, usually describing the contents of the file.
After line 1, but before line 2, comments can be included to provide explanatory information.
Each comment line must begin with a ‘#’.
Line 2 - an integer, the number of populations, npops.
Line 3 - the population names in order, separated by one or more spaces.
Line 4 - the population tree string in modified Newick format.
Line 5 - an integer, the number of loci in the data set, nloci.
Line 6 - basic information for locus 1. This line contains, in order and each separated by
spaces: the locus name; the sample size for each population; the length of the locus; the
mutation model; the inheritance scalar; possibly a mutation rate; and possibly a range of
mutation rates.
Line 7 - data for gene copy #1 from population 0.
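The line layout above can be sketched as a small Python function that writes an input file for sequence loci. Several details are assumptions for illustration and should be checked against the IMa2 manual:

- each data line starts with a 10-character gene-copy name followed by the sequence;
- "I" is the code for the infinite sites model;
- an inheritance scalar of 1.0 denotes an autosomal locus;
- each locus line is followed directly by that locus's data lines.

The sketch also omits the optional mutation-rate fields. The locus, population and gene-copy names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    name: str
    model: str          # mutation model code; "I" assumed to mean infinite sites
    inheritance: float  # inheritance scalar; 1.0 assumed for an autosomal locus
    # One list per population (population 0 first) of (gene copy name, sequence).
    samples: list[list[tuple[str, str]]]

def build_input(title: str, pop_names: list[str], tree: str,
                loci: list[Locus], comments: tuple[str, ...] = ()) -> str:
    lines = [title]                               # line 1: free text
    lines += [f"# {c}" for c in comments]         # optional '#' comment lines
    lines.append(str(len(pop_names)))             # line 2: npops
    lines.append(" ".join(pop_names))             # line 3: population names
    lines.append(tree)                            # line 4: tree string
    lines.append(str(len(loci)))                  # line 5: nloci
    for locus in loci:
        lengths = {len(seq) for pop in locus.samples for _, seq in pop}
        if len(lengths) != 1:
            raise ValueError(f"locus {locus.name}: need aligned sequences of one length")
        sizes = " ".join(str(len(pop)) for pop in locus.samples)
        # Locus line: name, sample sizes, length, model, inheritance scalar
        # (the optional mutation-rate fields are left out).
        lines.append(f"{locus.name} {sizes} {lengths.pop()} {locus.model} {locus.inheritance}")
        for pop in locus.samples:
            for copy_name, seq in pop:
                # Assumed layout: name padded to a 10-character field, then sequence.
                lines.append(f"{copy_name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"

locus = Locus("locus1", "I", 1.0,
              [[("a1", "ACGTA"), ("a2", "ACGTT")], [("b1", "ACCTA")]])
print(build_input("Toy data set", ["popA", "popB"], "(0,1):2", [locus]), end="")
```

Generating the file from structured data keeps npops, nloci and the per-population sample sizes consistent with the sequences actually written.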
RUNNING IMa2
• The program is run by entering a command at a command-line
prompt.
• It is usually simplest to keep the program and data files in the same
directory (folder) and to open the command prompt window in that
directory.
./IMa2 -iinputfile -ooutputfile -q2 -m1 -t3 -b10000000 -hn20 -s123
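For replicate runs, it can help to build this command from a script. The minimal Python sketch below assumes the IMa2 executable sits in the current directory as ./IMa2. It reproduces the flags of the example command verbatim and varies only the output file name and the -s random-number seed between replicates. The output naming scheme is hypothetical, and the meaning of each flag should be taken from the IMa2 manual. By default, the script only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path

def ima2_command(infile: str, outfile: str, seed: int) -> list[str]:
    # Flags copied from the example command; as there, each value is attached
    # directly to its flag with no space.
    return ["./IMa2", f"-i{infile}", f"-o{outfile}",
            "-q2", "-m1", "-t3", "-b10000000", "-hn20", f"-s{seed}"]

def run_replicates(infile: str, seeds: list[int],
                   dry_run: bool = True) -> list[list[str]]:
    commands = [ima2_command(infile, f"{Path(infile).stem}_s{seed}.out", seed)
                for seed in seeds]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # blocks until this replicate finishes
    return commands

run_replicates("inputfile", [123, 456])
```

Passing an argument list instead of a single shell string avoids quoting problems with file names.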
Output file
The program generates up to five types of output files:
• the main results file;
• genealogy files (ending in .ti);
• a Markov chain state file (ending in .mcf) for restarting a run;
• migration histogram files (ending in .mpt) for plotting the counts
and times of migration events;
• burntrend files for showing trend lines during the burn-in period.
Main Results File:
• Input and starting information
• Load-genealogies (L) mode information (L mode only)
• MCMC information (M mode only)
• Parameter comparisons (‘greater than’ probabilities)
• Means, variances and correlations
• Marginal peak locations and probabilities
• Joint peak location and posterior probabilities (L mode only)
• Histograms
• ASCII curves of approximate posterior densities
• ASCII plots of parameter trends
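Two of these summaries are easy to illustrate with MCMC output. The Python sketch below assumes paired posterior samples of two parameters (hypothetical numbers here). It estimates a ‘greater than’ probability as the fraction of samples in which one parameter exceeds the other. It locates a marginal peak as the midpoint of the tallest histogram bin. IMa2's own estimates may be computed differently, so this is a conceptual sketch only.

```python
def greater_than_probability(a: list[float], b: list[float]) -> float:
    """Fraction of paired MCMC samples in which parameter a exceeds b."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sample lists of equal length")
    return sum(x > y for x, y in zip(a, b)) / len(a)

def marginal_peak(samples: list[float], lo: float, hi: float, nbins: int) -> float:
    """Midpoint of the most populated histogram bin on [lo, hi]."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for s in samples:
        if lo <= s <= hi:
            # Values equal to hi are placed in the last bin.
            counts[min(int((s - lo) / width), nbins - 1)] += 1
    best = max(range(nbins), key=counts.__getitem__)
    return lo + (best + 0.5) * width

# Hypothetical posterior samples of two population-size parameters.
q0 = [1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.4, 1.2]
q1 = [0.7, 1.0, 0.9, 0.6, 1.1, 0.9, 0.8, 1.0]
print(greater_than_probability(q0, q1))  # → 0.75
print(marginal_peak(q0, 0.0, 2.0, 4))    # → 1.25
```

With real runs, the sample lists would hold many thousands of post-burn-in values, and the bin width then controls how finely the peak is resolved.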