Last updated 16/10/2014

APPENDIX S2: Running an MRDS analysis in Distance and R: a tutorial

M. L. Burt¹, D. L. Borchers¹, K. J. Jenkins² & T. A. Marques¹

¹ Centre for Research into Environmental and Ecological Modelling, University of St Andrews, UK
² U.S. Geological Survey Forest and Rangeland Ecosystem Science Centre, USA

Summary

Examples of running a mark-recapture distance sampling analysis in Distance (Thomas et al. 2010) and in R (R Core Team 2013) using the mrds package (Laake et al. 2012) are provided. The data described in Example 1 of Burt et al. (in prep) are used to illustrate the methods. They are stored in a Distance project, which is available as Supporting Information S3 or can be downloaded from http://distancesampling.org/Distance/example-projects/MEE_Burtetal_Example1.zip.

Implementing MRDS methods

Conventional distance sampling (CDS), multiple covariate distance sampling (MCDS) and mark-recapture distance sampling (MRDS) analysis methods can be implemented using program Distance (Thomas et al. 2010) or R (R Core Team 2013). Distance provides a user-friendly environment for model selection and model checking and is recommended for those unfamiliar with the methods or with R. However, experienced R users may wish to dispense with the Distance interface and run the models directly in the R environment using the mrds package (Laake et al. 2012), as this offers more flexibility. Here we provide a brief overview of both approaches using data collected with an independent observer (IO) configuration (Example 1 in the accompanying paper). The data format required for detections collected during an MRDS survey is slightly different to that for CDS and so it is described below.

Program Distance can be downloaded from http://distancesampling.org/ and R is available from http://cran.r-project.org/. Both programs are free. A Distance project has been created containing the data described in Example 1 of Burt et al.
(in prep) and the zip file of the project can be downloaded from the link given above. The files required to run an analysis in R can be obtained from the Distance project (see below).

DESCRIPTION OF THE SURVEY DATA

The data used throughout this tutorial come from a line transect survey of ungulate faecal pellets within a national park in the USA, conducted in 2001 and 2002 (Jenkins and Manly 2008). The study region was divided into eight sub-regions: four in the east zone of the park and four in the west. The sub-regions were divided into sampling units and data were collected in 102 sampling units. Transects, selected at random within each sampling unit, were categorised by ease of access (Accessible, Moderately Accessible or Inaccessible) and elevation (high or low). Two observers (denoted by 1 and 2) worked independently and walked along each selected transect (divided into segments) searching for pellets.

On detecting pellets, the observers recorded information about the pellets and environmental conditions: perpendicular distance (cm); species (Deer or Elk); number of pellets; number of pellet clumps; presence (1) or absence (0) of clumps; pellet decomposition (categorised into five levels from no decomposition (1) to more than 50% decomposed (5)); maximum diameter of pellet dispersion (cm); prevailing aspect of the slope (0-359°); slope, classified into six levels from flat (1) to more than 35° (6); and size of the pellet group, which categorises the number of pellets into four group sizes (very small, small, medium and large). After both observers had completed each transect, they determined which detections were duplicates.

FORMATTING DETECTIONS

In Distance, information about detections (i.e. perpendicular distance, cluster size) is stored in the ‘Observation’ layer. Figure S1 illustrates the format required for the ‘Observation’ layer of MRDS data.
The object field uniquely identifies each object detected, and two records are required for each detected object: one record for observer 1 and one record for observer 2. With a trial configuration, observer 1 is the primary observer and observer 2 is the tracker. The field called detected indicates whether the observer detected the object (1) or did not detect it (0). For a duplicate, detected will equal 1 for both observer 1 and observer 2.

MRDS analysis in Distance

Although the survey was designed with an IO configuration, analyses with other configurations are also illustrated in the Distance project ‘MEE Burtetal Example 1’: three sets illustrate analyses with an IO configuration, a trial configuration and a single observer configuration (conventional distance sampling). Each set contains several model definitions which illustrate different options for the DS model and, where appropriate, the MR model. The covariate (apart from perpendicular distance) used in the model definitions is sizegroup, the variable which categorises the size of the pellet group. The primary interest in this analysis was to estimate the probability of detection for each observer rather than density or abundance (the quantities to be estimated are specified in the model definition), and so only information on the detection functions is shown. Results are available for all analyses (on the Results tab). Instructions for creating a new analysis are given below:

i. Open a new analysis and choose the MRDS analysis engine.

ii. Under the ‘Detection function’ tab, choose the observer configuration and model assumptions on the ‘Method’ tab. In Figure S2 an IO point independence model is chosen. This choice determines whether an MR model and/or a DS model are required.

iii. For an IO point independence model, both a DS model and an MR model need to be specified. The left screenshot in Figure S3 illustrates that a hazard rate form has been chosen for the DS model with no additional covariates included in the scale parameter. The right screenshot in Figure S3 shows an MR model with perpendicular distance and sizegroup chosen as explanatory variables. Sizegroup was listed as a factor on the ‘Factors’ tab. Some key fields are renamed during an MRDS analysis: perpendicular distance is renamed to ‘distance’ and cluster size (the covariate defined as cluster size in the Survey Properties) is renamed to ‘size’. These new names should be used when specifying covariates in models; look in the log file after running an analysis for details.

iv. After running the analysis, all results are stored on the Results tab. Figure S4 describes the detection function summary page for the model described above. In the results, ‘Primary’ refers to observer 1 and ‘Secondary’ refers to observer 2.

MRDS analysis in R

Running an analysis directly in R gives the user access to options that are not available via the Distance interface. A convenient way to switch from Distance to R is to run an analysis in Distance in debug mode. This creates all the necessary data files (the directory where they are stored is shown on the Log tab) and R commands (also on the Log tab). To activate debug mode, go to Tools>Preferences>Analysis and tick the ‘Debug mode’ analysis option. The data in Distance are stored in an Access database (in the ProjectName.dat directory) and so, alternatively, R can connect to this directly, or the ‘RData’ workspace within the Distance project (ProjectName.dat/R) can be interrogated.

Open R and load the mrds library. If you have not already used Distance to run an MRDS analysis (which will have installed it), you may need to install the mrds package (Laake et al. 2012) first. It is available on the R-project homepage (R Core Team, 2013).

    library(mrds)

Import the data files into R as follows (or otherwise access the data).
Specify the directory where the data output from Distance is stored (note that here the data files have been copied from their original location to the directory given below):

    dirname <- "C:\\mrds\\Example1\\"

Import detections:

    ddf.dat <- read.table(file=paste(dirname,"ddf.dat.r",sep=""),
                          header=TRUE,sep='\t',comment.char='')

The following files are required to estimate density and abundance.

Region information (i.e. strata):

    region.dat <- read.table(file=paste(dirname,"region.dat.r",sep=""),
                             header=TRUE,sep='\t',comment.char='')

Transect information:

    sample.dat <- read.table(file=paste(dirname,"sample.dat.r",sep=""),
                             header=TRUE,sep='\t',comment.char='')

Data used to link detections to transects and regions:

    obs.dat <- read.table(file=paste(dirname,"obs.dat.r",sep=""),
                          header=TRUE,sep='\t',comment.char='')

Specify variables that are to be treated as factors. The variable sizegroup categorises the number of faecal pellets in a group as very small, small, medium or large.

    ddf.dat$sizegroup <- as.factor(ddf.dat$sizegroup)
    ddf.dat$observer <- as.factor(ddf.dat$observer)

Specify the truncation distance (in the same units as distance):

    trunc <- 150

IO CONFIGURATION

Fit an IO point independence model. This requires both a DS model and an MR model. The form of the detection function chosen here is a hazard rate form and no covariates have been included in the scale parameter. The MR model includes distance as the only covariate.

    ddf.1 <- ddf(method='io',dsmodel=~cds(key='hr'),
                 mrmodel=~glm(link='logit',formula=~distance),
                 data=ddf.dat,meta.data=list(width=trunc))

Print a summary of the model and AIC values to the screen. See Figure S4 for a description of the summary information.

    summary(ddf.1)

Look at the goodness of fit and Q-Q plot:

    ddf.gof(ddf.1,lwd=2,lty=1,pch=1,cex=1)

Investigate the detection data: create (and then print to the screen) a series of tables and plots that show the numbers detected and missed by each observer within distance intervals.
    ddf.table.1 <- det.tables(ddf.1)
    print(ddf.table.1)
    plot(ddf.table.1,which=1:6,lwd=2,lty=1,pch=1,cex=1)

The ‘which’ argument controls which plots are drawn. The width of the distance intervals can be specified by the user, either with the ‘nc’ argument for equal-width intervals or with ‘breaks’ for user-defined breaks:

    ddf.table.1 <- det.tables(ddf.1,nc=15)

Plot the fitted detection function models. The ‘which’ argument (and the method) control which plots are drawn.

    plot(ddf.1,lwd=2,lty=1,pch=1,cex=1)

Estimate density and abundance by region with the fitted model:

    dht.1 <- dht(ddf.1,region.dat,sample.dat,obs.dat,se=TRUE,
                 options=list(varflag=2,convert.units=.00000001))
    print(dht.1)

Include covariates in the DS model: the DS model takes a half-normal form and sizegroup is included as a covariate in the scale parameter.

    ddf.2 <- ddf(method='io',dsmodel=~mcds(key='hn',formula=~sizegroup),
                 mrmodel=~glm(link='logit',formula=~distance),
                 data=ddf.dat,meta.data=list(width=trunc))

The ‘method’ argument allows different observer configurations and independence assumptions to be fitted. Below, an IO full independence model is selected. With this method the DS model is not used and, in the model below, the MR model includes distance, sizegroup and observer.

    ddf.3 <- ddf(method='io.fi',
                 mrmodel=~glm(link='logit',formula=~distance+sizegroup+observer),
                 data=ddf.dat,meta.data=list(width=trunc))

TRIAL CONFIGURATION

Trial point independence model: the DS model has a half-normal form and no covariates are included. The MR model includes distance and sizegroup.

    ddf.4 <- ddf(method='trial',dsmodel=~cds(key='hn'),
                 mrmodel=~glm(link='logit',formula=~distance+sizegroup),
                 data=ddf.dat,meta.data=list(width=trunc))

Trial point independence model: the DS model has a half-normal form with sizegroup as a covariate in the scale parameter. The MR model includes distance and sizegroup.
    ddf.6 <- ddf(method='trial',dsmodel=~mcds(key='hn',formula=~sizegroup),
                 mrmodel=~glm(link='logit',formula=~distance+sizegroup),
                 data=ddf.dat,meta.data=list(width=trunc))

Trial full independence model: the DS model is not used with full independence. The MR model includes distance and sizegroup.

    ddf.7 <- ddf(method='trial.fi',
                 mrmodel=~glm(link='logit',formula=~distance+sizegroup),
                 data=ddf.dat,meta.data=list(width=trunc))

The Lincoln-Petersen estimates of the probability of detection for observer 1 and observer 2 can be obtained by including only observer as a covariate in the MR model.

    ddf.10 <- ddf(method='io',dsmodel=~mcds(key='hn',formula=~sizegroup),
                  mrmodel=~glm(link='logit',formula=~observer),
                  data=ddf.dat,meta.data=list(width=trunc))

To obtain the Lincoln-Petersen estimate of the probability of detection with a trial configuration (so that only the probability of detection for observer 1 is of interest), use the following syntax:

    ddf.11 <- ddf(method='trial',dsmodel=~mcds(key='hn',formula=~sizegroup),
                  mrmodel=~glm(link='logit',formula=~1),
                  data=ddf.dat,meta.data=list(width=trunc))

IMPLEMENTING A CDS AND MCDS ANALYSIS IN R

A conventional distance sampling model can also be fitted (i.e. assuming that detection on the trackline is certain, so the MR model is not used):

    ddf.8 <- ddf(method='ds',dsmodel=~cds(key='hr'),
                 data=ddf.dat,meta.data=list(width=trunc))

A multiple covariate distance sampling model can be fitted as follows:

    ddf.9 <- ddf(method='ds',dsmodel=~mcds(key='hr',formula=~sizegroup),
                 data=ddf.dat,meta.data=list(width=trunc))

References

Burt, M.L., Borchers, D.L., Jenkins, K.J. & Marques, T.A. (in prep) Using mark-recapture distance sampling methods on line transect surveys. Methods in Ecology and Evolution.

Laake, J.L., Borchers, D.L., Thomas, L., Miller, D. & Bishop, J. (2012) mrds: Mark-Recapture Distance Sampling (mrds). R package version 2.1.4.

R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
URL http://www.R-project.org.

Thomas, L., Buckland, S.T., Rexstad, E.A., Laake, J.L., Strindberg, S., Hedley, S.L., Bishop, J.R.B., Marques, T.A. & Burnham, K.P. (2010) Distance software: design and analysis of distance sampling surveys for estimating population size. Journal of Applied Ecology, 47, 5-14. DOI: 10.1111/j.1365-2664.2009.01737.x.

Figure S1 Example of an observation layer containing data for an MRDS analysis. The object field identifies unique objects and links records together, one record for observer 1 and one for observer 2. The detected column indicates whether the observer detected the object (1) or did not detect it (0). Here, object 45 was detected by observer 1 but not by observer 2. Object 46 was detected by both observer 1 and observer 2. The other data layers reflect the hierarchical nature of the sampling strategy.

Figure S2 Choose the observer configuration and model independence options. An IO configuration assuming point independence is chosen in this example.

Figure S3 Specifying the DS model (left) and the MR model (right). The DS model chosen has a hazard rate form with no covariates added to the scale parameter. Covariates can be included in the ‘Formula’ box (activated by checking the ‘Scale parameter is a function of additional covariates’ option). The MR model chosen includes two covariates, perpendicular distance and size group (specified as a factor in the ‘Factors’ tab).
    Summary for io.fi object                 [Summary of number of objects detected by:]
    Number of observations   :  1380         [Observer 1 + Observer 2 - Duplicates]
    Number seen by primary   :  1094         [Observer 1]
    Number seen by secondary :  1102         [Observer 2]
    Number seen by both      :  816          [Duplicate detections]
    AIC                      :  2457.952     [AIC for MR model]

    Conditional detection function parameters:
                   estimate          se      [Summary of the MR model estimates
    (Intercept)  0.28098736 0.188557908       of the regression coefficients]
    distance    -0.00835025 0.001517454
    sizegroup2   0.46927834 0.207238010
    sizegroup3   1.78569572 0.193560108
    sizegroup4   3.19715740 0.440773795

                            Estimate          SE         CV
    Average primary p(0)   0.7952424 0.017075401 0.02147194   [Probability of detection
    Average secondary p(0) 0.7952424 0.017075401 0.02147194    on the trackline]
    Average combined p(0)  0.9416874 0.009603436 0.01019812

    Summary for ds object                    [DS model fitted assuming certain detection on the trackline]
    Number of observations :  1380
    Distance range         :  0  -  150
    AIC                    :  13612.94       [AIC for DS model]

    Detection function:
     Hazard-rate key function                [Form of DS model]

    Detection function parameters            [DS model coefficients]
    Scale Coefficients:
                estimate         se
    (Intercept) 4.424918 0.05860113

    Shape parameters:
                 estimate        se
    (Intercept) 0.6843539 0.1246873

               Estimate         SE         CV
    Average p 0.6922382 0.02191403 0.03165677   [Probability of detection assuming certain detection on the trackline]

    Summary for io object
    Total AIC value :  16070.89              [Total AIC combining DS model and MR model = 2457.95 + 13612.94]

                           Estimate          SE         CV
    Average p              0.651872  0.02168053 0.03325887   [Overall probability of detection]
    N in covered region 2116.979995 78.08545496 0.03688531   [Estimated number in covered region = Number of obs/Average p]

Figure S4 Summary of the fitted detection functions after choosing an IO configuration and a point independence assumption; both an MR model and a DS model are fitted. Annotations describing the output are shown in square brackets. The DS model would not be fitted if full independence had been assumed.
In the output, primary refers to observer 1 and secondary refers to observer 2; the covered region has an area of 2wL (where w is the truncation distance and L is the total length of search effort).
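The notes attached to the output in Figure S4 imply a few simple arithmetic relationships, which can be checked by hand. The following is a minimal sketch in base R that re-enters the values printed in the figure and reproduces those relationships. The variable names are illustrative only and are not part of the mrds package. The third check, that the overall average p is the product of the combined p(0) from the MR model and the average p from the DS model, is an assumption based on the usual point independence decomposition; it is not stated explicitly in the output, although the printed values agree with it to rounding.

```r
# Values copied from the summary output shown in Figure S4.
# All object names below are hypothetical and chosen for illustration.
n.obs       <- 1380       # number of unique objects detected
n.primary   <- 1094       # objects seen by observer 1
n.secondary <- 1102       # objects seen by observer 2
n.both      <- 816        # duplicate detections
aic.mr      <- 2457.952   # AIC for the MR model
aic.ds      <- 13612.94   # AIC for the DS model
p0.combined <- 0.9416874  # average combined p(0) from the MR model
p.ds        <- 0.6922382  # average p from the DS model (certain detection on trackline)
p.overall   <- 0.651872   # overall average p reported for the io object

# Unique objects = observer 1 + observer 2 - duplicates
n.unique <- n.primary + n.secondary - n.both

# Total AIC = AIC of the MR model + AIC of the DS model
aic.total <- aic.mr + aic.ds

# Assumed point independence decomposition: overall p = combined p(0) * DS average p
p.check <- p0.combined * p.ds

# Estimated number in the covered region = number of observations / overall average p
n.covered <- n.obs / p.overall

print(c(n.unique = n.unique, aic.total = aic.total,
        p.check = p.check, n.covered = n.covered))
```

Small differences from the printed output are expected because the inputs have been rounded to the number of digits shown in the figure.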