Quantitative ecology seminar

BIOE 293
Marm Kilpatrick
Steve Munch
Spring Quarter 2015
Seminar Goal
For 5-10 quantitative methods:
• Understand how the approach works, including its
assumptions, strengths, weaknesses, and limitations, by
reading a “methods” paper or book chapter on it.
• Read papers more critically, and assess whether an
approach is potentially useful for your own work.
• Try analyzing data using the approach and discuss the
challenges.
• Sadly, you likely won’t be an expert in any of the
topics by the end, but you’ll have a start at becoming
one.
Potential Topics & Voting scheme!
• 0. Generic intro stuff
• Statistical approaches
• Maximum Entropy
• I. Linear models and apps
• Generalized linear models
• Path analysis/SEM
• Correlated data
• Phylogenetic methods
• II. Linear, multivariate
• PCA, MCA, CCA
• III. Multi-layer models
• Hierarchical models
• Multi-state mark recapture
• IV. Nonlinear from linear
• GAMs
• Wavelets
• V. Potpourri
• Meta-analyses
• Isotope mixing models
• Ecological Niche models
• Kriging
• Machine learning
• Occupancy modeling
Statistical approaches
• Frequentist, AIC, Bayesian – what questions are
they answering, and what advantages/disadvantages
does each of them have?
• Frequentist: P-values
• AIC: Best fitting model(s)
• Bayesian: Descriptions of posterior distributions
• Maximum Entropy (a different way of thinking
about the same stuff, with different pros/cons)
Here p(x) is the probability density and m(x) is the “background probability distribution”.
Most statistical methods start with a model for the
probability of data (x) given parameters (q)
P(x|q)
a.k.a. the ‘Likelihood’
It’s what happens next that gets people so worked up:
Data: (x) parameters: (q)
Frequentist
-Think of q as fixed, but unknown
-Find parameters that maximize P(x|q).
-Derive bounds on these estimates that should
provide good coverage in repeated
sampling.
-Hypothesis tests compare fit against some null
distribution.
-Model selection based on goodness of fit.
-Frequently only asymptotically correct
Bayesian
Bayes’ rule P(q|x)=P(x|q)P(q)/P(x)
-Treat all unknowns as ‘random’
-Need to specify P(q) and P(model).
-Use Bayes rule to find P(q|x).
-Intervals based directly on P(q|x).
-Model selection and hypothesis tests usually
based on P(model|data). PPL also used.
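As a rough illustration of the frequentist and Bayesian approaches above, the sketch below uses a binomial likelihood P(x|q) with hypothetical data (7 successes in 20 trials, not from the seminar). The frequentist branch maximizes P(x|q); the Bayesian branch assumes a Beta(1, 1) prior P(q), chosen here only because it gives a closed-form posterior.

```python
import math

def log_likelihood(q, successes, trials):
    """Binomial log P(x|q), dropping the constant binomial coefficient."""
    return successes * math.log(q) + (trials - successes) * math.log(1 - q)

def mle(successes, trials):
    """Frequentist point estimate: the q that maximizes P(x|q)."""
    return successes / trials

def beta_posterior(successes, trials, a=1.0, b=1.0):
    """Bayesian update: a Beta(a, b) prior gives a Beta(a + x, b + n - x) posterior."""
    return a + successes, b + trials - successes

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical data (an assumption for illustration)
x, n = 7, 20
q_hat = mle(x, n)
post_a, post_b = beta_posterior(x, n)
print("MLE of q:", q_hat)
print("Posterior: Beta(%g, %g), mean %.3f" % (post_a, post_b, beta_mean(post_a, post_b)))
```

With a different prior the posterior generally has no closed form and is approximated numerically, which is where most of the practical differences between the two approaches show up.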
Information theoretic
-Choose amongst set of candidate models
based on some ‘Information criterion.’
-All various attempts to choose model that
comes closest to ‘truth’
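One common information criterion is AIC, defined as 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The sketch below compares a constant-mean model with a straight-line model under an assumed Gaussian error model; the data and the helper names are hypothetical.

```python
import math

def gaussian_aic(residuals, n_mean_params):
    """AIC = 2k - 2 ln L for a Gaussian model with sigma^2 at its MLE (RSS / n).

    k counts the mean parameters plus one for sigma^2.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * (n_mean_params + 1) - 2 * log_lik

def fit_constant(y):
    """Residuals from a model with a single mean."""
    mean = sum(y) / len(y)
    return [yi - mean for yi in y]

def fit_line(x, y):
    """Residuals from ordinary least squares with an intercept and a slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data with a clear trend plus noise
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
aic_constant = gaussian_aic(fit_constant(y), n_mean_params=1)
aic_line = gaussian_aic(fit_line(x, y), n_mean_params=2)
print("AIC constant:", aic_constant, "AIC line:", aic_line)
```

The model with the lower AIC is preferred within the candidate set; AIC says nothing about whether any candidate is a good model in absolute terms.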
Maximum Entropy
-Derive probability model that contains smallest
amount of ‘extra’ information.
-Introduced by ET Jaynes as a way to specify
minimally informative priors, later
expanded into its own inferential tool.
-Current applications in ecology range from
purely statistical (e.g. MaxEnt for SDM) to
purely theoretical (Harte’s applications to
size, area, density distributions)
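A small worked case may help make “smallest amount of extra information” concrete. The sketch below finds the maximum-entropy distribution over the faces of a die given only a target mean, assuming a uniform background distribution. Under that assumption the solution has the form p_i ∝ exp(λ·i), and λ is found by bisection. The function names and the target means are hypothetical.

```python
import math

def tilted_probs(values, lam):
    """Probabilities proportional to exp(lam * value), computed stably."""
    shift = max(lam * v for v in values)
    weights = [math.exp(lam * v - shift) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy distribution on `values` subject to a fixed mean.

    The mean increases monotonically with lam, so bisection is sufficient.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        probs = tilted_probs(values, mid)
        mean = sum(v * p for v, p in zip(values, probs))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted_probs(values, 0.5 * (lo + hi))

faces = [1, 2, 3, 4, 5, 6]
fair = maxent_distribution(faces, 3.5)     # no information beyond the fair mean
skewed = maxent_distribution(faces, 4.5)   # mean constraint pushes mass to high faces
print([round(p, 3) for p in fair])
print([round(p, 3) for p in skewed])
```

With a mean of 3.5 the constraint adds nothing and the result is uniform; any other mean tilts the distribution just enough to satisfy the constraint.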
Linear Models and applications
[Figure: time series from 1940 to 2000 for NY, NJ, and CA. Panels: mosquito abundance, # mosq. species, Temp., Precip., DDT, and Pop.]
• Generalized linear models and data
transformations: distributions, links, leverage and
more
• Correlated data – GLS for time series, spatial data
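To make “distributions” and “links” in a generalized linear model concrete, here is a minimal sketch of a Poisson GLM with a log link, log(mu) = b0 + b1·x, fitted by Newton's method (which coincides with iteratively reweighted least squares for this canonical link). The temperature and count values are hypothetical, not the data from the slides, and the function name is illustrative.

```python
import math

def fit_poisson_glm(x, y, n_iter=50):
    """Fit log(mu) = b0 + b1 * x for Poisson counts y by Newton's method."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information, a 2 x 2 matrix [[i00, i01], [i01, i11]]
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: information^{-1} times score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# Hypothetical data: temperature and mosquito counts
temp = [10, 12, 14, 16, 18, 20]
counts = [2, 3, 6, 7, 12, 18]
b0, b1 = fit_poisson_glm(temp, counts)
print("intercept: %.3f, slope: %.3f" % (b0, b1))
print("multiplicative change in expected count per degree: %.3f" % math.exp(b1))
```

Because of the log link, the slope is interpreted on a multiplicative scale, which is one practical difference from log-transforming the counts and fitting an ordinary linear model.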
Phylogenetic methods (for analyses
where species are data points)
Felsenstein 1985 Am Nat
Linear Models and applications
• Path analysis/Structural equation modeling
Hypotheses
The data
Wootton 1994 Ecology
Multivariate correlational
approaches
• Principal components analysis (PCA), MCA (PCA
for categorical data), CCA (for exploring
correlations between 2 sets of predictors
(matrices))
• What people often do after they’ve collected lots
of data but don’t know what to do with it
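For two variables, PCA can be written out by hand, which shows what the method is doing: it finds the eigenvalues and leading eigenvector of the sample covariance matrix. The sketch below uses the closed-form solution for a 2 × 2 symmetric matrix; the function name and data are hypothetical, and real analyses with many variables would use a linear algebra library.

```python
import math

def pca_2d(xs, ys):
    """Return ((lambda1, lambda2), first_axis) for two variables.

    lambda1 >= lambda2 are the variances along the principal components;
    first_axis is a unit vector along the first component.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]]
    half_trace = 0.5 * (sxx + syy)
    spread = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Orientation of the leading eigenvector
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (lam1, lam2), (math.cos(angle), math.sin(angle))

# Hypothetical, strongly correlated measurements
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
(lam1, lam2), axis = pca_2d(xs, ys)
print("share of variance on PC1: %.3f" % (lam1 / (lam1 + lam2)))
print("PC1 direction:", axis)
```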
III. Multi-layer models (usually
linear, but not necessarily)
• Hierarchical models (Mixed effects models, nested
models, random effects models)
• For analyzing data that is influenced by variables that
differ at more than one “level”
• Multi-state mark recapture models
• Survival analyses
• Allow for temporary emigration (temporary movement
to unvisited locations)
• Allow for variable states/traits of individuals to
influence survival
Hierarchical models
• Finite mixture models: introduce a ‘hidden’ or ‘latent’ variable to
account for heterogeneity among individuals; capture nonstandard
distributional shapes
• Hidden Markov models
• Mixed effects models: treat some estimated effects (i.e.
parameters) as ‘random’ (i.e. variable)
• State-space models: separate observation and process models;
allow for imperfect observations of dynamical systems
P(x|q) Likelihood
P(q|r) Prior
P(r) Hyperprior
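The three layers above can be read as a recipe for generating data: draw r from the hyperprior, draw each group's q given r, then draw observations x given q. The sketch below simulates that chain with Normal distributions at every level; the distributions, standard deviations, and function name are all assumptions chosen for illustration.

```python
import random

def simulate_hierarchy(n_groups, n_obs, seed=0):
    """Simulate data from a three-level model: P(r), then P(q|r), then P(x|q)."""
    rng = random.Random(seed)
    # Hyperprior P(r): overall mean shared by all groups
    r = rng.gauss(0.0, 1.0)
    # Prior P(q|r): one parameter per group, scattered around r
    q = [rng.gauss(r, 0.5) for _ in range(n_groups)]
    # Likelihood P(x|q): observations within each group, scattered around its q
    x = [[rng.gauss(qi, 1.0) for _ in range(n_obs)] for qi in q]
    return r, q, x

r, q, x = simulate_hierarchy(n_groups=3, n_obs=5)
print("hyperparameter r:", round(r, 3))
for qi, xi in zip(q, x):
    print("group q = %.3f, sample mean = %.3f" % (qi, sum(xi) / len(xi)))
```

Fitting a hierarchical model runs this logic in reverse: given x, infer q for each group and r for the population at the same time.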
Ecological Niche Models
(Because everyone loves maps)
[Figure: occurrence data and environmental variables are combined to map probability of occurrence]
IV. Nonlinear models (out of linear
ones)
• Generalized additive models (GAMs)
Where each f is represented as a ‘basis expansion’:
$f(x) = \sum_j a_j h_j(x)$
The $h_j(x)$ are fixed ‘basis functions’
and the $a_j$ are coefficients to be estimated.
Has same structure as a linear model
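Because f is linear in the coefficients, they can be estimated by ordinary least squares once the basis functions are evaluated. The sketch below does this for a single smooth term with a hypothetical polynomial basis. It is an unpenalized simplification: GAM software typically uses spline bases and adds a smoothness penalty, neither of which is shown here.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef

def fit_basis_expansion(x, y, basis):
    """Least-squares estimate of the a_j in f(x) = sum_j a_j * h_j(x)."""
    design = [[h(xi) for h in basis] for xi in x]   # one column per basis function
    k = len(basis)
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(x_new, coef, basis):
    """Evaluate the fitted f at a new x."""
    return sum(a * h(x_new) for a, h in zip(coef, basis))

# Hypothetical basis h_0 = 1, h_1 = x, h_2 = x^2 and noise-free data
basis = [lambda v: 1.0, lambda v: v, lambda v: v ** 2]
x = [i / 10 for i in range(11)]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
coef = fit_basis_expansion(x, y, basis)
print([round(a, 6) for a in coef])
```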
Wavelets
Potpourri
(Other topics)
Meta-analyses
A method for combining results from multiple studies
Salkeld et al 2013 Ecol Lett
Assessing bias, modeling heterogeneity
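The simplest way to combine studies is a fixed-effect, inverse-variance weighted mean, with Cochran's Q as a basic check for heterogeneity. The sketch below assumes each study reports an effect size and its sampling variance; the numbers are hypothetical and are not taken from Salkeld et al.

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance pooled effect, its standard error, and Cochran's Q."""
    weights = [1.0 / v for v in variances]          # precise studies count more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    # Q is large when studies disagree more than sampling error would predict
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, se, q

# Hypothetical effect sizes and sampling variances from four studies
effects = [0.30, 0.10, 0.45, 0.25]
variances = [0.02, 0.05, 0.04, 0.01]
pooled, se, q = fixed_effect_meta(effects, variances)
print("pooled effect: %.3f (SE %.3f), Q = %.2f on %d df"
      % (pooled, se, q, len(effects) - 1))
```

When Q is large relative to its degrees of freedom, a random-effects model that estimates between-study variance is the usual next step.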
Isotope mixing models
• Estimate the proportions of different food items in
your diet
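With two food sources and one isotope, the mixing model reduces to a single linear equation: delta_mix = f·delta_A + (1 - f)·delta_B. The sketch below solves it for f. It ignores trophic discrimination and uncertainty in the source values, both of which matter in practice; the function name and isotope values are hypothetical.

```python
def two_source_mixing(delta_mix, delta_a, delta_b):
    """Return the diet proportions (f_a, f_b) for a two-source, one-isotope model."""
    if delta_a == delta_b:
        raise ValueError("sources must differ in isotope value")
    f_a = (delta_mix - delta_b) / (delta_a - delta_b)
    return f_a, 1.0 - f_a

# Hypothetical delta-13C values (per mil) for a consumer and two food sources
f_a, f_b = two_source_mixing(delta_mix=-18.0, delta_a=-12.0, delta_b=-28.0)
print("proportion from source A: %.3f, from source B: %.3f" % (f_a, f_b))
```

With n isotopes the same algebra identifies at most n + 1 sources, which is one reason more sources call for probabilistic mixing models.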
Kriging
Machine learning approaches –
regression trees, random forests
• Regression tree: split data into successive groups
• Random forests: Lots of regression trees to
minimize overfitting
De’ath & Fabricius 2000 Ecology
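The core step of a regression tree is choosing one split. The sketch below searches every threshold on a single predictor and keeps the one that minimizes the summed squared error within the two resulting groups. The function name and data are hypothetical, and real implementations search over many predictors and apply the step recursively.

```python
def best_split(x, y):
    """Return (threshold, sse) for the split of x that minimizes within-group SSE."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data with a step change in the response
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.1, 4.8, 5.0]
print(best_split(x, y))
```

A random forest repeats this tree-building on many bootstrap samples of the data, usually with a random subset of predictors considered at each split, and averages the predictions.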
Occupancy modeling
• Measuring the occupancy and distribution of an
organism when accounting for imperfect detection
• With additional assumptions, can be used to
estimate abundance
• Uses repeated visitation of locations and
presence/absence of species of interest
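The role of repeated visits can be seen in the likelihood of a basic single-season occupancy model, sketched below under the assumption of constant occupancy probability psi and constant per-visit detection probability p. A site with no detections is either occupied but missed on every visit, or truly unoccupied. The detection histories are hypothetical and the grid search is only for illustration.

```python
import math

def site_likelihood(detections, visits, psi, p):
    """Probability of one site's detection history."""
    occupied = psi * p ** detections * (1 - p) ** (visits - detections)
    # An all-zero history can also come from an unoccupied site
    unoccupied = (1 - psi) if detections == 0 else 0.0
    return occupied + unoccupied

def occupancy_log_likelihood(histories, psi, p):
    """Sum of log-likelihoods over sites; each history is a list of 0/1 visits."""
    return sum(math.log(site_likelihood(sum(h), len(h), psi, p)) for h in histories)

def grid_mle(histories, steps=99):
    """Crude maximum-likelihood estimate of (psi, p) on a regular grid."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((psi, p) for psi in grid for p in grid),
               key=lambda pair: occupancy_log_likelihood(histories, *pair))

# Hypothetical detection histories: 6 sites, 3 visits each
histories = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
naive = sum(1 for h in histories if any(h)) / len(histories)
psi_hat, p_hat = grid_mle(histories)
print("naive occupancy:", naive)
print("estimated psi: %.2f, estimated p: %.2f" % (psi_hat, p_hat))
```

The naive proportion of sites with at least one detection underestimates occupancy whenever p is less than 1; the model corrects for this by estimating p from the pattern of repeat detections.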