Appendix S1: General description of the Multispecies Site Occupancy Model (MSOM) for estimating species richness of a habitat.
We used a Multispecies Site Occupancy Model (MSOM) to estimate the number of distinct species occupying a particular region (Dorazio et al., 2006). The model separates the ecological process (occurrence) from the observation process (detection). It requires repeated sampling at the same locations within a period short enough that the community can be treated as closed, i.e. no species colonise or go locally extinct between replicate samples. It also assumes that species are identified consistently and correctly. The version we used allows occurrence and detection probabilities to vary among species, because species differ in their probability of being captured and are not equally likely to occur at every site.
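The two processes that the model separates can be illustrated with a small simulation. This is a hypothetical sketch: the community size, number of sites, replicate count and the logit-scale means and standard deviations are assumed values chosen for illustration, not estimates from the Beach data. Each species receives its own occurrence probability (`psi`) and per-sample detection probability (`theta`); true presence (`Z.sim`) is drawn first, and detections (`X.sim`) can occur only where a species is present, so the number of species observed can fall short of the number actually present.

```r
# Hypothetical simulation of the two processes an MSOM separates.
# All parameter values below are assumptions chosen for illustration.
set.seed(42)
N.true <- 25   # species in the community (assumed)
J      <- 6    # number of sites
K      <- 5    # replicate samples per site

# species-level effects on the logit scale (heterogeneity among species)
phi   <- rnorm(N.true, mean = qlogis(0.3), sd = 1)  # occurrence
eta   <- rnorm(N.true, mean = qlogis(0.4), sd = 1)  # detection
psi   <- plogis(phi)    # occurrence probability of each species
theta <- plogis(eta)    # per-sample detection probability of each species

# ecological process: true presence/absence Z.sim[i, j]
Z.sim <- matrix(rbinom(N.true * J, size = 1, prob = rep(psi, times = J)),
                nrow = N.true)
# observation process: number of the K samples in which species i is detected
# at site j; the detection probability is zero wherever the species is absent
X.sim <- matrix(rbinom(N.true * J, size = K, prob = rep(theta, times = J) * Z.sim),
                nrow = N.true)

n.present  <- sum(rowSums(Z.sim) > 0)  # species present at one or more sites
n.observed <- sum(rowSums(X.sim) > 0)  # species detected at one or more sites
cat("present:", n.present, " observed:", n.observed, "\n")
```

Lowering the assumed mean detection probability widens the gap between `n.present` and `n.observed`; that unseen part of the community is what the MSOM is designed to estimate.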
Example 1: Representation of the Multispecies Site Occupancy Model (MSOM). Incidence data (k = replicates, n = observed number of species) are recorded for a number of sites, where 1 means that a species was detected in a sample at a site and 0 means that it was not detected. The observations are augmented with an arbitrarily large number of all-zero rows to produce a super-community (S) with a fixed number of species (a). This statistical device simplifies estimation of the number of species that were present but never observed: the estimation problem becomes one of separating the augmented zeros into those that represent species that were present but not detected and those that represent species absent from the community, a problem similar to the one that standard (single-species) occupancy models are designed to solve. The model estimates species detection and inclusion probabilities within a Bayesian framework using Markov chain Monte Carlo (MCMC) algorithms. It is set up as a state-space model in which matrix (b) represents the true presence/absence of each species at each site. Matrix (b) is only partly observed because species can go undetected: the "sampling" zeros shown in grey (a) may reflect either true absence or non-detection (b). The inclusion probability (c) is the probability that a species from the super-community forms part of the regional community (W), and species richness is the sum over the vector W. Because every observed species is necessarily part of W, the estimated species richness is always at least as high as the observed species richness.
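The data-augmentation step can be sketched in a few lines of R, mirroring the `rbind()` call and the `n0` and `N` nodes of the Appendix S2 code. The 3 × 4 count matrix, the 10 augmented rows and the value of `omega` are made up for illustration; in the fitted model `omega` is estimated and the indicators `w` are sampled by MCMC rather than drawn once as here.

```r
# Hypothetical illustration of data augmentation (made-up counts and omega).
Xobs <- matrix(c(2, 0, 1, 0,
                 0, 3, 0, 0,
                 1, 1, 0, 2), nrow = 3, byrow = TRUE)  # 3 observed species x 4 sites
nobs <- nrow(Xobs)  # observed number of species
naug <- 10          # arbitrary number of all-zero "pseudo-species" rows
Xaug <- rbind(Xobs, matrix(0, nrow = naug, ncol = ncol(Xobs)))  # super-community

# One draw of the inclusion indicators W for an assumed omega. Observed species
# always belong to the community; each augmented row is included with prob. omega.
set.seed(1)
omega <- 0.3
w  <- c(rep(1, nobs), rbinom(naug, size = 1, prob = omega))
n0 <- sum(w[(nobs + 1):(nobs + naug)])  # community members that were never observed
N  <- nobs + n0                          # species richness = sum over W
c(observed = nobs, richness = N)
```

Because `N` can never exceed the observed count plus the number of augmented rows, that number must be large enough that the posterior of `N` is not pressed against the upper limit; if it is, more zero rows should be added.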
Appendix S2: An example of the Multispecies Site Occupancy Model code for the Sandy Coast Biotope, referred to as “beach” in the code.
The original code was taken from Dorazio et al. (2006). This example is based on the Sandy Coast Biotope data.
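#BEACH
# read in the data: one row per observed species, one column per site
X = as.matrix(read.csv(file="Beach.csv")) # see Appendix S3
# number of replicate samples taken at each of the six stations
nr <- c(5,5,5,5,5,5)
# the replicates matrix needs one row per row of the augmented data matrix,
# i.e. the observed species plus the nzeroes = 400 zero rows that
# MultiSpeciesSiteOcc() appends, and one column per site
nzeroes <- 400
nr1 <- rep(nr, dim(X)[1]+nzeroes)
nr2 <- matrix(nr1, nrow=dim(X)[1]+nzeroes, ncol=dim(X)[2], byrow=TRUE)
nrepls = nr2
# run the Multispecies Site Occupancy Model
# (MultiSpeciesSiteOcc(), listed below, must be defined before this call)
beach = MultiSpeciesSiteOcc(nrepls, X)
summary(beach$fit$sims.matrix)
# posterior of alpha, the mean logit occurrence probability across species
alpha.post = beach$fit$sims.matrix[,"alpha"]
# posterior of sigma.u, the among-species SD of logit occurrence probability
sigmaU.post = beach$fit$sims.matrix[,"sigma.u"]
# posterior of N, the estimated species richness
N.post = beach$fit$sims.matrix[,"N"]
# histogram of the posterior of N
hist(N.post, breaks = 10, col = "grey", main = "", xlab = "Community size", las = 1, freq = FALSE)
# CumNumSpeciesPresent() comes from the original Dorazio et al. (2006) code and
# is not listed in this appendix; it is called here with nsites = 30
nsites = 30
CumNumSpeciesPresent(nsites, alpha.post, sigmaU.post, N.post)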
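The replicates matrix passed as `nrepls` must have one row for every row of the augmented data (observed species plus the 400 zero rows added inside `MultiSpeciesSiteOcc`) and one column per site. Because all six stations received five replicates, the `rep()`/`matrix()` construction in the script can be replaced by a single `matrix()` call. The sketch below checks that equivalence; `X.demo` is a hypothetical 19 × 6 stand-in for the Beach data so that it runs without `Beach.csv`.

```r
# Stand-in for the Beach data matrix: 19 species x 6 sites (values unused here)
X.demo  <- matrix(0, nrow = 19, ncol = 6)
nzeroes <- 400  # must equal the nzeroes set inside MultiSpeciesSiteOcc()

# construction used in the script
nr  <- c(5, 5, 5, 5, 5, 5)
nr1 <- rep(nr, dim(X.demo)[1] + nzeroes)
nr2 <- matrix(nr1, nrow = dim(X.demo)[1] + nzeroes, ncol = dim(X.demo)[2], byrow = TRUE)

# equivalent one-step construction when every site has 5 replicates
nrepls.const <- matrix(5, nrow = nrow(X.demo) + nzeroes, ncol = ncol(X.demo))
# if replicate numbers differed among sites, the per-site vector nr can be
# recycled row by row instead
nrepls.bysite <- matrix(nr, nrow = nrow(X.demo) + nzeroes, ncol = ncol(X.demo),
                        byrow = TRUE)
dim(nrepls.const)
```

The `byrow = TRUE` form is the more general of the two, since it keeps one replicate count per site column.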
MultiSpeciesSiteOcc
MultiSpeciesSiteOcc = function(nrepls, X) {
  start.time = Sys.time()
  # augment data matrix with an arbitrarily large number of zero row vectors
  nzeroes = 400
  n = dim(X)[1]
  nsites = dim(X)[2]
  Xaug = rbind(X, matrix(0, nrow=nzeroes, ncol=nsites))
  # create arguments for bugs()
  sp.data = list(n=n, nzeroes=nzeroes, J=nsites, K=nrepls, X=Xaug)
  sp.params = c('alpha', 'beta', 'rho', 'sigma.u', 'sigma.v', 'omega', 'N')
  # initial values; in the model file sigma.u and sigma.v are the stochastic
  # nodes (tau.u and tau.v are derived from them), so they are initialised directly
  sp.inits = function() {
    omegaGuess = runif(1, n/(n+nzeroes), 1)
    psi.meanGuess = runif(1, .25, 1)
    theta.meanGuess = runif(1, .25, 1)
    rhoGuess = runif(1, 0, 1)
    sigma.uGuess = 1
    sigma.vGuess = 1
    list(omega=omegaGuess, psi.mean=psi.meanGuess, theta.mean=theta.meanGuess,
         sigma.u=sigma.uGuess, sigma.v=sigma.vGuess, rho=rhoGuess,
         w=c(rep(1, n), rbinom(nzeroes, size=1, prob=omegaGuess)),
         phi=rnorm(n+nzeroes, log(psi.meanGuess/(1.-psi.meanGuess)), sigma.uGuess),
         eta=rnorm(n+nzeroes, log(theta.meanGuess/(1.-theta.meanGuess)), sigma.vGuess),
         Z=matrix(rbinom((n+nzeroes)*nsites, size=1, prob=psi.meanGuess), nrow=(n+nzeroes))
    )
  }
  # fit model to data using WinBUGS code
  library(R2WinBUGS)
  fit = bugs(sp.data, sp.inits, sp.params,
             model.file='MultiSpeciesSiteOccModel.txt',
             debug=TRUE, n.chains=3, n.iter=100000, n.burnin=50000, n.thin=20, DIC=TRUE)
  end.time = Sys.time()
  elapsed.time = difftime(end.time, start.time, units='mins')
  cat(paste(paste('Posterior computed in ', elapsed.time, sep=''), ' minutes\n', sep=''))
  list(fit=fit, data=sp.data, X=X)
}
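The number of posterior draws behind `alpha.post`, `sigmaU.post` and `N.post` follows from the MCMC settings passed to `bugs()`. The arithmetic below assumes that every `n.thin`-th iteration after the burn-in is kept in each chain; it is a check on the settings, not output from the original analysis.

```r
# MCMC settings used in the bugs() call of MultiSpeciesSiteOcc()
n.chains <- 3
n.iter   <- 100000
n.burnin <- 50000
n.thin   <- 20

# assumed: every n.thin-th post-burn-in iteration is kept in each chain
kept.per.chain <- (n.iter - n.burnin) / n.thin
kept.total     <- n.chains * kept.per.chain  # expected rows of fit$sims.matrix
c(per.chain = kept.per.chain, total = kept.total)
```

Under that assumption each chain contributes 2,500 draws and the pooled posterior contains 7,500.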
MultiSpeciesSiteOccModel.txt
model {
  omega ~ dunif(0,1)                           # inclusion probability
  psi.mean ~ dunif(0,1)
  alpha <- log(psi.mean) - log(1-psi.mean)     # mean logit occurrence
  theta.mean ~ dunif(0,1)
  beta <- log(theta.mean) - log(1-theta.mean)  # mean logit detection
  sigma.u ~ dunif(0,10)
  sigma.v ~ dunif(0,10)
  tau.u <- pow(sigma.u,-2) # 1/(sigma.u)^2
  tau.v <- pow(sigma.v,-2)
  rho ~ dunif(-1,1)      # correlation between occurrence and detection effects
  # precision (not variance) of eta[i] given phi[i], since dnorm() takes a precision
  var.eta <- tau.v/(1-pow(rho,2))
  for (i in 1:(n+nzeroes)) {
    w[i] ~ dbin(omega, 1)
    phi[i] ~ dnorm(alpha, tau.u)I(-5,5)
    mu.eta[i] <- beta + (rho*sigma.v/sigma.u)*(phi[i] - alpha)
    eta[i] ~ dnorm(mu.eta[i], var.eta)I(-5,5)
    logit(psi[i]) <- phi[i]
    logit(theta[i]) <- eta[i]
    mu.psi[i] <- psi[i]*w[i]
    for (j in 1:J) {
      Z[i,j] ~ dbin(mu.psi[i], 1)
      mu.theta[i,j] <- theta[i]*Z[i,j]
      X[i,j] ~ dbin(mu.theta[i,j], K[i,j])
    }
  }
  n0 <- sum(w[(n+1):(n+nzeroes)])
  N <- n + n0
}
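The `mu.eta` and `var.eta` lines give each species' occurrence effect `phi` and detection effect `eta` a bivariate normal distribution by drawing `eta` conditionally on `phi`. The R sketch below uses arbitrary assumed values for the means, standard deviations and `rho`, and ignores the `I(-5,5)` truncation. It checks by simulation that the conditional construction gives `eta` a marginal standard deviation of `sigma.v` and a correlation of `rho` with `phi`, and that `var.eta`, because BUGS parameterises `dnorm()` by precision, corresponds to a conditional standard deviation of `sigma.v * sqrt(1 - rho^2)`.

```r
# Check of the conditional construction used for phi and eta in the model,
# with assumed parameter values and without the I(-5,5) truncation.
set.seed(123)
alpha <- qlogis(0.3); beta <- qlogis(0.4)   # assumed mean logits
sigma.u <- 1.2; sigma.v <- 0.8; rho <- 0.6  # assumed SDs and correlation
m <- 1e5                                    # number of simulated species

phi    <- rnorm(m, mean = alpha, sd = sigma.u)
mu.eta <- beta + (rho * sigma.v / sigma.u) * (phi - alpha)
# precision tau.v / (1 - rho^2) = 1 / (sigma.v^2 * (1 - rho^2)), so the
# conditional standard deviation of eta given phi is sigma.v * sqrt(1 - rho^2)
eta <- rnorm(m, mean = mu.eta, sd = sigma.v * sqrt(1 - rho^2))

c(cor = cor(phi, eta), sd.eta = sd(eta))  # should be close to rho and sigma.v
```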
Appendix S3: Collated data for the Sandy Coast Biotope, referred to as Beach.csv in Appendix S2.
Columns are sites and rows are species 1-19. Numbers in the table are the number of replicate samples (out of five per site) in which each species was recorded.
Species  Melkbosstrand  Yzerfontein  Paternoster  Elands Bay  Platbaai  Port Nolloth
         Beach          Beach        Beach        Beach       Beach     Beach
1        1              2            3            4           0         0
2        5              5            5            5           5         5
3        0              0            0            0           0         1
4        2              0            1            0           1         0
5        0              0            0            0           0         5
6        0              0            0            0           1         0
7        0              0            0            2           0         3
8        0              0            0            0           0         4
9        0              0            0            4           5         0
10       0              0            0            0           0         2
11       0              0            0            0           0         4
12       3              2            5            1           0         0
13       0              0            0            0           0         1
14       0              0            0            0           0         5
15       3              2            0            1           0         5
16       4              4            0            3           0         0
17       0              0            1            0           0         0
18       0              0            1            0           0         1
19       0              0            1            0           1         3
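For reference, the Appendix S3 counts (species 1-19 by site) can be entered directly in R and passed through the same `read.csv()` call that Appendix S2 uses for `Beach.csv`. Writing the file without a species-ID column (`row.names = FALSE`) is an assumption about the file's layout, chosen so that `as.matrix(read.csv(...))` returns only the count columns; the temporary file stands in for `Beach.csv`. The checks confirm that each of the 19 species was detected at least once, as the model requires of observed species, and that no count exceeds the five replicates per site.

```r
# Appendix S3 counts: rows = species 1-19, columns = sites (entered by column).
beach.counts <- data.frame(
  Melkbosstrand = c(1,5,0,2,0,0,0,0,0,0,0,3,0,0,3,4,0,0,0),
  Yzerfontein   = c(2,5,0,0,0,0,0,0,0,0,0,2,0,0,2,4,0,0,0),
  Paternoster   = c(3,5,0,1,0,0,0,0,0,0,0,5,0,0,0,0,1,1,1),
  Elands.Bay    = c(4,5,0,0,0,0,2,0,4,0,0,1,0,0,1,3,0,0,0),
  Platbaai      = c(0,5,0,1,0,1,0,0,5,0,0,0,0,0,0,0,0,0,1),
  Port.Nolloth  = c(0,5,1,0,5,0,3,4,0,2,4,0,1,5,5,0,0,1,3)
)
# round trip through a temporary CSV file standing in for Beach.csv,
# written without row names so only the count columns are read back
csv <- tempfile(fileext = ".csv")
write.csv(beach.counts, csv, row.names = FALSE)
X <- as.matrix(read.csv(file = csv))

n <- nrow(X)                     # observed number of species
detected <- rowSums(X) > 0       # was each species detected at least once?
site.richness <- colSums(X > 0)  # observed number of species per site
print(site.richness)
```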
Table S1: Comparison of conservation targets based on the Chao 2, Jackknife 2, Bootstrap and MSOM estimators, complementing Figure 2 in the manuscript.
Habitat
Southern Benguela
Sandy Coast
Muddy River-Influenced Middle Shelf
Muddy Organically-enriched Middle Shelf
Sandy Middle Shelf
Sandy Outer Shelf
Sandy and Muddy Shelf Edge
Chao 2 Target % (95% Confidence Interval)
Jackknife 2 Target %
Bootstrap Target % (95% Confidence Interval)
MSOM Target % (95% Credible Interval)
7.7
(6.5-10.6)
8.5
6.7
(5.5-7.9)
7.8
(5.4-12.6)
0.9
(0.7-1.8)
1.4
0.8
(0.5-1.1)
0.9
(0.6-1.8)
7.2
(6.5-8.1)
7.6
4.9
(4.2-5.6)
7.7
(5.7-10.4)
4.1
(3.4-4.7)
5.5
(5.0-6.1)
6.8
(5.9-7.6)
7.9
(5.9-9.8)
8.5
(7.3-9.8)
12.3
(10.0-14.8)
5.2
(4.9-5.6)
7.2
(6.9-7.5)
9.7
(9.4-10)
6.1
8
10.2
7