APPENDIX S2: Running an MRDS analysis in Distance and R: a tutorial

Last updated 16/10/2014
M. L. Burt1, D. L. Borchers1, K. J. Jenkins2 & T. A. Marques1
1 Centre for Research into Environmental and Ecological Modelling, University of St Andrews, UK
2 U.S. Geological Survey Forest and Rangeland Ecosystem Science Centre, USA
Summary
Examples of running a mark-recapture distance sampling analysis in Distance (Thomas et al. 2010) and R (R Core
Team 2013) using the mrds package (Laake et al. 2012) are provided. The data described in Example 1 of Burt et al.
(in prep) are used to illustrate the methods and are in a Distance project which is available as Supporting Information
S3 or can be downloaded from http://distancesampling.org/Distance/example-projects/MEE_Burtetal_Example1.zip.
Implementing MRDS methods
Conventional distance sampling (CDS), multiple covariate distance sampling (MCDS) and mark-recapture distance
sampling (MRDS) analysis methods can be implemented using program Distance (Thomas et al. 2010) or R (R Core
Team 2013). Distance provides a user-friendly environment for model selection and model checking and is
recommended for those unfamiliar with the methods or with R. However, experienced R users may wish to dispense
with the Distance interface and run the models directly in the R environment using the mrds package (Laake et al.
2012) as this offers more flexibility. Here we provide a brief overview of both methods using data collected using an
IO observer configuration (Example 1 in accompanying paper). The data format required for detections collected
during an MRDS survey is slightly different to that for CDS and so this is described below.
Program Distance can be downloaded from http://distancesampling.org/ and R is available from http://cran.r-project.org/. Both programs are free.
A Distance project has been created containing the data described in Example 1 of Burt et al. (in prep) and the zip file
of the project can be downloaded at the link given above. The necessary files required to run an analysis in R can be
obtained from the Distance project (see below).
DESCRIPTION OF THE SURVEY DATA
The data used throughout this tutorial come from a line transect survey of ungulate faecal pellets within a national
park in the USA conducted in 2001 and 2002 (Jenkins and Manly 2008). The study region was divided into eight sub-regions: four in the east zone of the park and four in the west. The sub-regions were divided into sampling units and
data were collected in 102 sampling units. Transects, selected at random within each sampling unit, were categorised
by ease of access (Accessible, Moderately Accessible or Inaccessible) and elevation (high or low). Two observers
(denoted by 1 and 2) worked independently and walked along each selected transect (divided into segments) searching
for pellets. On detecting a pellet, the observers recorded information about the pellets and environmental conditions:
perpendicular distance (cm); species (Deer or Elk); number of pellets; number of pellet clumps; presence (1) or
absence (0) of clumps; pellet decomposition (categorised into five levels from no decomposition (1) to more than 50%
decomposed (5)); maximum diameter of pellet dispersion (cm); prevailing aspect of the slope (0-359°); measurement
of the slope classified into 6 levels from flat (1) to more than 35° (6) and size of the pellet group which categorises the
number of pellets into four group sizes (very small, small, medium and large). After both observers had completed
each transect, they determined duplicate detections.
FORMATTING DETECTIONS
In Distance, information about detections (i.e. perpendicular distance, cluster size) is stored in the ‘Observation’ layer.
Figure S1 illustrates the format required for the ‘Observation’ layer of MRDS data. The object field uniquely
identifies each object detected and two records are required for each of the detections; one record for observer 1 and
one record for observer 2. With a trial configuration, observer 1 is the primary observer and observer 2 is the tracker.
The field called detected indicates whether the observer detected the object (1) or did not detect the object (0). For a
duplicate, detected will equal one for both observer 1 and observer 2.
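The two-records-per-object layout can be sketched in base R as follows. This is a minimal illustration only: the column names follow the fields described above, objects 45 and 46 mirror the cases shown in Figure S1, and the distance values are hypothetical.

```r
# Toy observation records: one row per observer for each detected object.
# Distances are hypothetical values (cm) used only for illustration.
obs <- data.frame(
  object   = c(45, 45, 46, 46),
  observer = c(1, 2, 1, 2),
  detected = c(1, 0, 1, 1),
  distance = c(30, 30, 75, 75)
)

# Every object should appear exactly twice (once per observer).
records_per_object <- table(obs$object)
stopifnot(all(records_per_object == 2))

# A duplicate is an object with detected == 1 for both observers.
n_detected <- tapply(obs$detected, obs$object, sum)
duplicates <- names(n_detected)[n_detected == 2]
print(duplicates)
```

A check of this kind may help catch objects with a missing record before the data are passed to an analysis.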
MRDS analysis in Distance
Although the survey was designed with an IO configuration, analyses with other configurations have been illustrated
in the Distance project, ‘MEE Burtetal Example 1’; three sets illustrate analyses with an IO configuration, trial
configuration and a single observer configuration (conventional distance sampling). Each set contains several model
definitions which illustrate different options for the DS model and MR models, where appropriate. The covariate
(apart from perpendicular distance) used in the model definitions is sizegroup, the variable which categorises the size
of the pellet group. Of primary interest in this analysis was to estimate the probability of detection for each observer
rather than estimating a density or abundance (the quantities required to be estimated are specified in the model
definition) and so only information on the detection functions are shown. Results are available for all analyses (on the
Results tab). Instructions are given below to create a new analysis:
i. Open a new analysis and choose the MRDS analysis engine.
ii. Under the ‘Detection function’ tab, choose the observer configuration and model assumptions on the
‘Method’ tab. In Figure S2 an IO point independence model is chosen. This choice determines whether an
MR model and/or a DS model are required.
iii. For an IO point independence model, both a DS model and an MR model need to be specified. The
left screenshot in Figure S3 illustrates that a hazard rate form has been chosen for the DS model with no
additional covariates included in the scale parameter. The right screenshot in Figure S3 shows an MR
model with perpendicular distance and sizegroup chosen as explanatory variables. Sizegroup was listed as
a factor on the ‘Factors’ tab. Some key fields are renamed during an MRDS analysis: perpendicular
distance is renamed to ‘distance’ and cluster size (the covariate defined as cluster size in the Survey
Properties) is renamed to ‘size’. These new names should be used when specifying covariates in
models; look in the log file after running an analysis for details.
iv. After running the analysis, all results are stored on the Results tab. Figure S4 describes the detection
function summary page for the model described above. In the results, ‘Primary’ refers to observer 1 and
‘Secondary’ refers to observer 2.
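To see how the MR model with distance and sizegroup translates into detection probabilities, the following base R sketch applies the inverse logit to the MR coefficients reported in Figure S4. It assumes that sizegroup is coded 1 to 4 with level 1 as the baseline (consistent with the coefficient names sizegroup2 to sizegroup4); the function name mr_prob is hypothetical.

```r
# MR model coefficients as reported in Figure S4 (logit scale).
beta <- c("(Intercept)" = 0.28098736,
          distance      = -0.00835025,
          sizegroup2    = 0.46927834,
          sizegroup3    = 1.78569572,
          sizegroup4    = 3.19715740)

# Hypothetical helper: conditional detection probability for a given
# distance (cm) and size group (1-4, level 1 assumed to be the baseline).
mr_prob <- function(distance, sizegroup) {
  size_effect <- unname(c(0, beta[c("sizegroup2", "sizegroup3", "sizegroup4")]))[sizegroup]
  plogis(beta[["(Intercept)"]] + beta[["distance"]] * distance + size_effect)
}

p0   <- mr_prob(0, 1:4)    # on the trackline, by size group
p150 <- mr_prob(150, 1:4)  # at the truncation distance, by size group
print(round(rbind(p0, p150), 3))
```

The negative distance coefficient lowers the probability with increasing distance, and the positive sizegroup coefficients raise it for larger pellet groups.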
MRDS analysis in R
Running an analysis directly in R gives the user access to options that are not available via the Distance interface. A
convenient way to switch from Distance to R is to run an analysis in Distance in debug mode. This creates all the
necessary data files (the directory where they are stored is shown in the log tab) and R commands (on the log tab). To
activate debug mode, go to Tools>Preferences>Analysis and tick the ‘Debug mode’ analysis option. The data in
Distance are stored in an Access database (in the ProjectName.dat directory) and so, alternatively, R can connect to
this directly or the ‘RData’ workspace within the Distance project (ProjectName.dat/R) can be interrogated.
Open R and load the mrds library. If you have not already used Distance to run an MRDS analysis (which will have
installed it) you may need to install the mrds package (Laake et al. 2012) first. It is available on the R-project
homepage (R Core Team, 2013).
library(mrds)
Import the data files into R as follows (or otherwise access the data). Specify the directory where the data output from
Distance is stored (note that here the data files have been copied from their original location to the directory given
below):
dirname <- "C:\\mrds\\Example1\\"
Import detections:
ddf.dat <- read.table(file=paste(dirname,"ddf.dat.r",sep=""),header=TRUE,sep='\t',comment.char='')
The following files are required to estimate density and abundance:
Region information (i.e. strata)
region.dat <- read.table(file=paste(dirname,"region.dat.r",sep=""),header=TRUE,sep='\t',comment.char='')
Transect information
sample.dat <- read.table(file=paste(dirname,"sample.dat.r",sep=""),header=TRUE,sep='\t',comment.char='')
Data used to link detections to transects and regions
obs.dat <- read.table(file=paste(dirname,"obs.dat.r",sep=""),header=TRUE,sep='\t',comment.char='')
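Because the same read.table() call is repeated for each file, it could be wrapped in a small helper that also checks the file exists. The sketch below is one possible approach; read_distance_file and data_dir are hypothetical names (not part of mrds or Distance), and the demonstration writes a small tab-delimited file to a temporary directory rather than using the real project files.

```r
# Hypothetical helper: read one tab-delimited data file from a directory.
# data_dir should be given without a trailing path separator.
read_distance_file <- function(data_dir, filename) {
  path <- file.path(data_dir, filename)
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read.table(file = path, header = TRUE, sep = "\t", comment.char = "")
}

# Demonstration with a toy file written to a temporary directory.
demo_dir <- tempdir()
demo <- data.frame(object = c(45, 45), observer = c(1, 2),
                   detected = c(1, 0), distance = c(12, 12))
write.table(demo, file.path(demo_dir, "ddf.dat.r"),
            sep = "\t", row.names = FALSE, quote = FALSE)

ddf.demo <- read_distance_file(demo_dir, "ddf.dat.r")
str(ddf.demo)
```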
Specify variables that are to be treated as factors. The variable sizegroup categorizes the number of faecal pellets in a
group as very small, small, medium and large.
ddf.dat$sizegroup <- as.factor(ddf.dat$sizegroup)
ddf.dat$observer <- as.factor(ddf.dat$observer)
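Converting a variable to a factor changes how it enters a model formula: each non-baseline level gets its own coefficient. The base R sketch below shows this with a toy data frame, assuming sizegroup is coded 1 to 4 (an assumption consistent with the coefficient names sizegroup2, sizegroup3 and sizegroup4 in Figure S4).

```r
# Toy data: sizegroup stored as numbers 1-4 (assumed coding).
toy <- data.frame(sizegroup = c(1, 2, 3, 4, 2, 1))
toy$sizegroup <- as.factor(toy$sizegroup)

# The factor has one level per size group; level 1 is the baseline.
print(levels(toy$sizegroup))

# The design matrix has an intercept plus one indicator column
# for each non-baseline level.
X <- model.matrix(~ sizegroup, data = toy)
print(colnames(X))
```

Had sizegroup been left numeric, the formula would fit a single slope instead of three level-specific effects.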
Specify the truncation distance (in the same units as distance)
trunc <- 150
IO CONFIGURATION
Fit an IO point independence model. This requires both a DS model and an MR model. The form of the detection
function chosen here is a hazard rate form and no covariates have been included in the scale parameter. The MR
model includes distance as the only covariate.
ddf.1 <- ddf(method='io',dsmodel=~cds(key='hr'),mrmodel=~glm(link='logit',formula=~distance),data=ddf.dat,meta.data=list(width=trunc))
Print a summary of model and AIC values to the screen. See Figure S4 for a description of the summary information.
summary(ddf.1)
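The DS part of the summary reports scale and shape coefficients for the hazard-rate detection function. As a rough check of how these relate to the reported average detection probability, the base R sketch below evaluates the hazard-rate form with the DS coefficients shown in Figure S4. It assumes the reported intercepts are on the log scale, so the scale and shape parameters are their exponentials; the function name hazard_rate is hypothetical.

```r
# DS coefficients from Figure S4, assumed to be on the log scale.
sigma <- exp(4.424918)   # scale parameter (cm)
b     <- exp(0.6843539)  # shape parameter
w     <- 150             # truncation distance (cm)

# Hazard-rate detection function: g(x) = 1 - exp(-(x / sigma)^(-b)).
hazard_rate <- function(x) 1 - exp(-(x / sigma)^(-b))

# Average detection probability within the truncation distance,
# assuming certain detection on the trackline.
avg_p <- integrate(hazard_rate, lower = 0, upper = w)$value / w
print(round(avg_p, 4))
```

Under these assumptions the result should be close to the 'Average p' value reported for the ds object in Figure S4.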
Look at the goodness of fit and qq plot:
ddf.gof(ddf.1,lwd=2,lty=1,pch=1,cex=1)
Investigate the detection data: create (and then print to the screen) a series of tables and plots that show the numbers
detected and missed by each observer within distance intervals.
ddf.table.1 <- det.tables(ddf.1)
print(ddf.table.1)
plot(ddf.table.1,which=1:6,lwd=2,lty=1,pch=1,cex=1)
The 'which' argument controls which plots are drawn.
The width of the distance intervals can be specified by the user using the ‘nc’ argument for equal-width intervals or
‘breaks’ for user-defined breaks:
ddf.table.1 <- det.tables(ddf.1,nc=15)
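The difference between equal-width intervals and user-defined breaks can be illustrated in base R with cut() and table(). This sketch does not reproduce det.tables(); it uses simulated toy data with hypothetical column names (detected1, detected2) to show the kind of tally involved.

```r
set.seed(42)
n <- 200

# Simulated toy detections: distances in cm, detection indicators per observer.
toy <- data.frame(
  distance  = runif(n, 0, 150),
  detected1 = rbinom(n, 1, 0.7),
  detected2 = rbinom(n, 1, 0.7)
)
# Objects missed by both observers are never recorded in a real survey.
toy <- toy[toy$detected1 + toy$detected2 > 0, ]

# Equal-width intervals: 15 classes between 0 and the truncation distance.
nc <- 15
equal_breaks <- seq(0, 150, length.out = nc + 1)
toy$interval <- cut(toy$distance, breaks = equal_breaks, include.lowest = TRUE)
obs1_table <- table(toy$interval, toy$detected1,
                    dnn = c("interval", "detected by observer 1"))
print(obs1_table)

# User-defined breaks: narrower classes near the line, wider ones further out.
user_breaks <- c(0, 10, 25, 50, 100, 150)
toy$interval_user <- cut(toy$distance, breaks = user_breaks, include.lowest = TRUE)
print(table(toy$interval_user))
```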
Plot the fitted detection function models. The ‘which’ argument (and the method) control which plots are drawn.
plot(ddf.1,lwd=2,lty=1,pch=1,cex=1)
Estimate density and abundance by region with the fitted model:
dht.1 <- dht(ddf.1,region.dat,sample.dat,obs.dat,se=TRUE,options=list(varflag=2,convert.units=.00000001))
print(dht.1)
Include covariates into the DS model: the DS model takes a half-normal form and sizegroup is included as a covariate
in the scale parameter.
ddf.2 <- ddf(method='io',dsmodel=~mcds(key='hn',formula=~sizegroup),mrmodel=~glm(link='logit',formula=~distance),data=ddf.dat,meta.data=list(width=trunc))
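Including a covariate in the scale parameter means each level of the covariate has its own detection function width. The base R sketch below illustrates this for a half-normal form with a log link on the scale parameter. The coefficient values are hypothetical (they are not estimates from the example data), and half_normal is a hypothetical function name.

```r
# Hypothetical scale coefficients on the log scale:
# intercept (size group 1) and offsets for size groups 2-4.
beta_scale <- c(4.0, 0.2, 0.5, 0.9)
sigma <- exp(beta_scale[1] + c(0, beta_scale[2:4]))  # one sigma per size group
w <- 150                                             # truncation distance (cm)

# Half-normal detection function: g(x) = exp(-x^2 / (2 * sigma^2)).
half_normal <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

# Average detection probability within w for each size group.
avg_p <- sapply(sigma, function(s) {
  integrate(half_normal, lower = 0, upper = w, sigma = s)$value / w
})
names(avg_p) <- paste0("sizegroup", 1:4)
print(round(avg_p, 3))
```

With positive offsets, larger size groups have a wider detection function and so a higher average detection probability.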
The ‘method’ argument allows different observer configurations and independence assumptions to be fitted. Below
an IO Full Independence model is selected. With this method the DS model is not used and in the model below, the
MR model includes distance, sizegroup and observer.
ddf.3 <- ddf(method='io.fi',mrmodel=~glm(link='logit',formula=~distance+sizegroup+observer),data=ddf.dat,meta.data=list(width=trunc))
TRIAL CONFIGURATION
Trial point independence model: the DS model has a half-normal form and no covariates are included. The MR model
includes distance and sizegroup.
ddf.4 <- ddf(method='trial',dsmodel=~cds(key='hn'),mrmodel=~glm(link='logit',formula=~distance+sizegroup),data=ddf.dat,meta.data=list(width=trunc))
Trial point independence model: the DS model has a half-normal form with sizegroup as covariate in scale parameter.
The MR model includes distance and sizegroup.
ddf.6 <- ddf(method='trial',dsmodel=~mcds(key='hn',formula=~sizegroup),mrmodel=~glm(link='logit',formula=~distance+sizegroup),data=ddf.dat,meta.data=list(width=trunc))
Trial Full Independence model: the DS model is not used with full independence. The MR model includes distance
and sizegroup.
ddf.7 <- ddf(method='trial.fi',mrmodel=~glm(link='logit',formula=~distance+sizegroup),data=ddf.dat,meta.data=list(width=trunc))
Lincoln-Petersen estimates of the probability of detection for observer 1 and observer 2 can be obtained by
including only observer as a covariate in the MR model.
ddf.10 <- ddf(method='io',dsmodel=~mcds(key='hn',formula=~sizegroup),mrmodel=~glm(link='logit',formula=~observer),data=ddf.dat,meta.data=list(width=trunc))
To obtain a Lincoln-Petersen estimate of the probability of detection with a trial configuration (so only the
probability of detection for observer 1 is of interest), use the following syntax:
ddf.11 <- ddf(method='trial',dsmodel=~mcds(key='hn',formula=~sizegroup),mrmodel=~glm(link='logit',formula=~1),data=ddf.dat,meta.data=list(width=trunc))
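For orientation, the classical Lincoln-Petersen calculation can be done by hand from the detection counts reported in Figure S4. This base R sketch is an illustration only: it ignores distance and other covariates, so the values need not match the output of the fitted models above.

```r
# Detection counts from Figure S4.
n1  <- 1094  # seen by observer 1 (primary)
n2  <- 1102  # seen by observer 2 (secondary)
n11 <- 816   # seen by both (duplicates)
n   <- n1 + n2 - n11  # distinct objects detected

# Each observer's detections act as trials for the other observer.
p1_hat <- n11 / n2  # observer 1 (the only estimate in a trial configuration)
p2_hat <- n11 / n1  # observer 2 (also available in an IO configuration)

# Probability that at least one observer detects an object.
p_combined <- 1 - (1 - p1_hat) * (1 - p2_hat)

# Lincoln-Petersen estimate of the number of objects present.
N_hat <- n1 * n2 / n11

print(round(c(p1 = p1_hat, p2 = p2_hat, combined = p_combined), 3))
print(round(N_hat, 1))
```

In a trial configuration only p1_hat is estimated, because observer 2's detections serve solely to set up trials for observer 1.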
IMPLEMENTING A CDS AND MCDS ANALYSIS IN R
A conventional distance sampling model can also be fitted (i.e. assuming detection on the trackline is certain and so the
MR model is not used):
ddf.8 <- ddf(method='ds',dsmodel=~cds(key='hr'),data=ddf.dat,meta.data=list(width=trunc))
A multiple covariate distance sampling model can be fitted as follows:
ddf.9 <- ddf(method='ds',dsmodel=~mcds(key='hr',formula=~sizegroup),data=ddf.dat,meta.data=list(width=trunc))
References
Burt, M.L., Borchers, D.L., Jenkins, K.J. and Marques, T.A. (in prep) Using mark-recapture distance sampling
methods on line transect surveys. Methods in Ecology and Evolution.
Laake, J.L., Borchers, D.L., Thomas, L., Miller, D. & Bishop J. (2012) mrds: Mark-Recapture Distance Sampling
(mrds). R package version 2.1.4.
R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. URL http://www.R-project.org.
Thomas, L., Buckland, S.T., Rexstad, E.A., Laake, J.L., Strindberg, S., Hedley, S.L., Bishop, J.R.B., Marques, T.A. &
Burnham, K.P. (2010) Distance software: design and analysis of distance sampling surveys for estimating population
size. Journal of Applied Ecology, 47, 5-14. DOI: 10.1111/j.1365-2664.2009.01737.x.
Figure S1 Example of an observation layer containing data for an MRDS analysis. The object field identifies unique
objects and links records together, one record for observer 1 and one for observer 2. The detected column indicates
whether the observer has detected the object (1) or not detected the object (0). Here, object 45 was detected by
observer 1 but not by observer 2. Object 46 was detected by both observer 1 and observer 2. The other data layers
reflect the hierarchical nature of the sampling strategy.
Figure S2 Choose the observer configuration and model independence options. An IO configuration assuming point
independence is chosen in this example.
Figure S3 Specifying the DS model (left) and the MR model (right). The DS model chosen has a hazard rate form with
no covariates added into the scale parameter. Covariates can be included in the ‘Formula’ box (activated by checking
the ‘Scale parameter is a function of additional covariates’ option). The MR model chosen below includes two
covariates, perpendicular distance and size group (specified as a factor in the ‘Factors’ tab).
Summary for io.fi object                          [Summary of the MR model]
                                                  [Summary of number of objects detected by:]
Number of observations   :  1380                  [Observer 1 + Observer 2 - Duplicates]
Number seen by primary   :  1094                  [Observer 1]
Number seen by secondary :  1102                  [Observer 2]
Number seen by both      :  816                   [Duplicate detections]
AIC                      :  2457.952              [AIC for MR model]

Conditional detection function parameters:        [Estimates of the regression coefficients]
               estimate          se
(Intercept)  0.28098736 0.188557908
distance    -0.00835025 0.001517454
sizegroup2   0.46927834 0.207238010
sizegroup3   1.78569572 0.193560108
sizegroup4   3.19715740 0.440773795

                        Estimate          SE         CV
Average primary p(0)   0.7952424 0.017075401 0.02147194   [Probability of detection on the trackline]
Average secondary p(0) 0.7952424 0.017075401 0.02147194
Average combined p(0)  0.9416874 0.009603436 0.01019812

Summary for ds object                             [DS model fitted assuming certain detection on the trackline]
Number of observations :  1380
Distance range         :  0 - 150
AIC                    :  13612.94                [AIC for DS model]

Detection function:
 Hazard-rate key function                         [Form of DS model]

Detection function parameters                     [DS model coefficients]
Scale Coefficients:
            estimate         se
(Intercept) 4.424918 0.05860113

Shape parameters:
             estimate        se
(Intercept) 0.6843539 0.1246873

           Estimate         SE         CV
Average p 0.6922382 0.02191403 0.03165677         [Probability of detection assuming certain detection on the trackline]

Summary for io object
Total AIC value :  16070.89                       [Total AIC combining DS model and MR model = 2457.95 + 13612.94]

                       Estimate          SE         CV
Average p              0.651872  0.02168053 0.03325887   [Overall probability of detection]
N in covered region 2116.979995 78.08545496 0.03688531   [Estimated number in covered region = Number of obs/Average p]
Figure S4 Summary of the fitted detection functions after choosing an IO configuration and a point independence
assumption; both an MR model and a DS model are fitted. The DS model would not be fitted if full independence had
been assumed. In the output, primary refers to observer 1, secondary refers to observer 2 and the covered region has
an area given by 2wL (where w is the truncation distance and L is the total length of search effort).
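The relationships between the quantities in Figure S4 can be verified with a few lines of base R arithmetic. The sketch below uses only the values reported in the figure. The product of the DS average p and the combined p(0) is the standard way of combining the two components under point independence; the variable names are hypothetical.

```r
# Counts reported in the MR summary.
n_primary   <- 1094
n_secondary <- 1102
n_both      <- 816
n_obs <- n_primary + n_secondary - n_both  # distinct objects detected

# Total AIC is the sum of the MR and DS model AIC values.
aic_total <- 2457.952 + 13612.94

# Under point independence, the overall detection probability is the
# DS average p multiplied by the combined p(0) from the MR model.
p_ds        <- 0.6922382
p0_combined <- 0.9416874
p_overall   <- p_ds * p0_combined

# Estimated number in the covered region: observations / overall p.
n_covered <- n_obs / 0.651872

print(c(n_obs = n_obs, aic_total = aic_total))
print(round(c(p_overall = p_overall, n_covered = n_covered), 4))
```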