advances.sciencemag.org/cgi/content/full/1/8/e1500082/DC1
Supplementary Materials for
The shape of terrestrial abundance distributions
John Alroy
Published 25 September 2015, Sci. Adv. 1, e1500082 (2015)
DOI: 10.1126/sciadv.1500082
This PDF file includes:
Fig. S1. Four theoretical abundance distributions fitted to a data set for birds from
Poland (33), which has the highest species richness of any complete sample for
this group included in this analysis.
Fig. S2. Characteristic rank abundance distributions of six additional taxonomic
groups in tropical and temperate zones based on Fig. 2.
Fig. S3. Predicted and actual dominance in four disparate groups based on the
double geometric distribution.
Fig. S4. Predicted and actual dominance in six additional groups based on the
double geometric distribution.
Fig. S5. Predicted and actual median relative abundances in four disparate groups
based on the double geometric distribution.
Fig. S6. Predicted and actual median relative abundances in six additional groups
based on the double geometric distribution.
Fig. S7. Predicted and actual dominance and median relative abundance based on
the broken stick and Zipf distributions.
Fig. S8. Mean and median abundances observed in relatively well-sampled
distributions.
Table S1. Medians of the fit of the six theoretical abundance distributions to
observed frequencies as measured by K-L divergence statistics.
Table S2. Means of the fit of the six theoretical abundance distributions to
observed frequencies.
Table S3. Results of tests for differences between distributions of K-L divergence
statistics.
Appendix S1. R code used to perform the analyses.
Reference (33)
[Figure S1 image: four rank abundance panels, (A) double geometric, (B) log normal, (C) geometric series, (D) log series. Horizontal axis: Abundance rank. Vertical axis: Abundance.]
Fig. S1. Four theoretical abundance distributions fitted to a data set for birds from Poland (33),
which has the highest species richness of any complete sample for this group included in this
analysis. "Complete" is defined as in Fig. 1. (A) Double geometric distribution. (B) Log normal
distribution. (C) Geometric series distribution. (D) Log series distribution. The line is jagged at
low abundances because predicted values are rounded to the nearest integer.
Fig. S2. Characteristic rank abundance distributions of six additional taxonomic groups in
tropical and temperate zones based on Fig. 2. Thick lines = tropical data; thin lines = temperate
zone data. (A) Small terrestrial mammals. (B) Birds. (C) Lizards. (D) Ants. (E) Butterflies. (F)
Odonates.
Fig. S3. Predicted and actual dominance in four disparate groups based on the double geometric
distribution. (A) Trees. (B) Bats. (C) Frogs. (D) Dung beetles.
Fig. S4. Predicted and actual dominance in six additional groups based on the double geometric
distribution. (A) Small terrestrial mammals. (B) Birds. (C) Lizards. (D) Ants. (E) Butterflies. (F)
Odonates.
Fig. S5. Predicted and actual median relative abundances in four disparate groups based on the
double geometric distribution. (A) Trees. (B) Bats. (C) Frogs. (D) Dung beetles.
Fig. S6. Predicted and actual median relative abundances in six additional groups based on the
double geometric distribution. (A) Small terrestrial mammals. (B) Birds. (C) Lizards. (D) Ants.
(E) Butterflies. (F) Odonates.
Fig. S7. Predicted and actual dominance and median relative abundance based on the broken
stick and Zipf distributions. (A) Dominance predicted by the BS. (B) Dominance predicted by
the Zipf. (C) Median abundance predicted by the BS. (D) Median abundance predicted by the
Zipf.
Fig. S8. Mean and median abundances observed in relatively well-sampled distributions. The
minimum cutoff for inclusion in the figure is a Good's u value of 0.99 or better. The residual
standard error is 0.367 on a log scale. The ZSM model implies that many points should fall
above the line, not below it.
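The coverage cutoff used for Fig. S8 can be illustrated with a short sketch. It assumes the common form of Good's estimator, u = 1 - f1/n, where f1 is the number of species represented by a single individual and n is the total number of individuals. The function name `goodsU` and the example counts are hypothetical, and the analysis itself may compute coverage differently.

```r
# Good's u (sample coverage), assuming the common estimator u = 1 - f1 / n,
# where f1 = number of singleton species and n = total individuals.
goodsU <- function(ab) {
  n  <- sum(ab)
  f1 <- sum(ab == 1)
  1 - f1 / n
}

# hypothetical abundance vector: 5 species, 1 singleton, 200 individuals
ab <- c(120, 50, 20, 9, 1)
u <- goodsU(ab)
print(u)

# a sample would pass the Fig. S8 cutoff when u is 0.99 or better
print(u >= 0.99)
```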
                 DG      LN      GS      LS      BS      Zipf
trees            0.0334  0.0299  0.0616  0.0361  0.1604  0.2509
bats             0.0305  0.0307  0.0489  0.0285  0.1235  0.2262
small mammals    0.0295  0.0271  0.0528  0.0417  0.0935  0.2150
birds            0.0231  0.0212  0.0501  0.0218  0.0842  0.3103
lizards          0.0252  0.0252  0.0515  0.0358  0.0967  0.2254
frogs            0.0265  0.0261  0.0526  0.0448  0.0570  0.2888
ants             0.0321  0.0335  0.0520  0.0337  0.2239  0.2042
dung beetles     0.0280  0.0289  0.0481  0.0259  0.2128  0.1812
butterflies      0.0281  0.0269  0.0641  0.0264  0.1490  0.2839
odonates         0.0242  0.0273  0.0460  0.0353  0.1137  0.2739
all combined     0.0273  0.0270  0.0515  0.0309  0.1282  0.2398
Table S1. Medians of the fit of the six theoretical abundance distributions to observed
frequencies as measured by K-L divergence statistics. Lower values imply closer fits. The
equation and an explanation of the measure are given in the Materials and Methods section. Note
(1) the highly consistent DG values, (2) the comparable support for the LN and LS, and (3) the
poor performance of the other models.
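As a reading aid for the values in this table, the following is a minimal sketch of the K-L divergence in the form used by the fitting functions in Appendix S1, `sum(p * log(p/q))`, where `p` holds observed relative abundances and `q` the fitted ones. The function name `klDivergence`, the counts, and the geometric ratio of 0.6 are hypothetical and chosen only for illustration.

```r
# K-L divergence between observed proportions p and fitted proportions q,
# written as in the Appendix S1 fitting functions: sum(p * log(p / q)).
klDivergence <- function(p, q) {
  stopifnot(length(p) == length(q), all(p > 0), all(q > 0))
  sum(p * log(p / q))
}

ab <- c(40, 25, 15, 10, 6, 4)      # hypothetical ranked counts
p  <- ab / sum(ab)                 # observed relative abundances

# hypothetical geometric-series prediction with ratio 0.6
q  <- 0.6^(1:length(ab))
q  <- q / sum(q)

klSelf  <- klDivergence(p, p)      # identical distributions give 0
klModel <- klDivergence(p, q)      # positive when p and q differ and both sum to 1
print(c(self = klSelf, model = klModel))
```

Lower values indicate a closer fit, which is why the tables treat the smallest K-L statistic as the best model.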
                 DG      LN      GS      LS      BS      Zipf
trees            0.0449  0.0380  0.0884  0.0407  0.1952  0.2511
bats             0.0388  0.0403  0.0658  0.0410  0.1739  0.2412
small mammals    0.0347  0.0369  0.0612  0.5202  0.1301  0.2612
birds            0.0303  0.0284  0.0700  0.0271  0.1472  0.3132
lizards          0.0361  0.0359  0.0643  0.0515  0.1425  0.2802
frogs            0.0354  0.0351  0.0667  0.0548  0.1415  0.3189
ants             0.0480  0.0475  0.0874  0.0529  0.3081  0.2343
dung beetles     0.0393  0.0402  0.0681  0.0342  0.2616  0.1961
butterflies      0.0419  0.0315  0.0952  0.0403  0.2028  0.2836
odonates         0.0334  0.0393  0.0582  0.0500  0.1635  0.2983
all combined     0.0380  0.0371  0.0716  0.0436  0.1829  0.2650
Table S2. Means of the fit of the six theoretical abundance distributions to observed frequencies.
See the caption of Table S1 for further details.
                 DG vs. LN     DG vs. LS
trees            0.963         0.236
bats             0.009 [DG]    0.137
small mammals    0.038 [LN]    ~ 0 [DG]
birds            0.904         0.147
lizards          0.251         0.003 [DG]
frogs            0.341         ~ 0 [DG]
ants             0.203         0.170
dung beetles     0.062         0.965
butterflies      0.279         0.563
odonates         0.023 [DG]    0.001 [DG]
all combined     ~ 0 [LN]      ~ 0 [DG]
Table S3. Results of tests for differences between distributions of K-L divergence statistics.
Comparisons are between the double geometric (DG) and log normal (LN) and between the DG
and the log series (LS). Figures are p-values derived from paired Wilcoxon rank-sum tests.
Models favored by the tests are noted in brackets when the differences are significant at an alpha
level of 0.05. The results show that (1) the LN is favored slightly over the DG in the overall data
set (see also Table S1) but there is no consistent support for it across groups, and (2) the DG is
better than the LS for four groups plus all groups combined.
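A paired comparison of this kind can be sketched with base R's `wilcox.test`. The K-L values below are simulated placeholders, not data from the study, and the exact call used for Table S3 is not given in the appendix, so the arguments shown are an assumption. In base R, `paired = TRUE` runs the signed-rank form of the Wilcoxon test. Picking the favored model by the lower median divergence is likewise an assumption made for this illustration.

```r
# Paired comparison of K-L divergences from two models fitted to the same
# samples. The values are simulated placeholders, not study data.
set.seed(1)
klDG <- runif(30, 0.02, 0.05)                      # hypothetical DG fits
klLS <- klDG + rnorm(30, mean = 0.01, sd = 0.005)  # hypothetical LS fits

# paired = TRUE pairs the i-th DG value with the i-th LS value
test <- wilcox.test(klDG, klLS, paired = TRUE)
print(test$p.value)

# assumed rule for this sketch: the model with the lower median
# divergence is reported as favored when p < 0.05
favored <- if (median(klDG) < median(klLS)) "DG" else "LS"
if (test$p.value < 0.05) print(favored)
```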
Appendix S1
R code used to perform the analyses.
# fits the double geometric distribution
fitDoubleGeometric<-function(ab)
{
  spp = length(ab)
  n = sum(ab)
  p = ab/n
  best = 999999
  k = 0.5
  bestk = 1
  r = spp
  bestr = spp - 1
  while (r < 10 * spp && r == bestr + 1)
  {
    for (z in 1:100) {
      lastk = k
      k = k + rnorm(1,sd=0.01)
      if (k <= 0)
        k = lastk
      dg = array()
      dg[1] = k
      # substituted r for spp 22.4.15
      for (i in 2:r)
      {
        if ( i > r - i + 1 )
        {
          dg[i] = dg[i-1] * k ^ (1 / sqrt( (r - i + 1) / r ))
        } else
        {
          dg[i] = dg[i-1] * k ^ (1 / sqrt( i / r ))
        }
      }
      raw = dg
      dg = dg / sum(dg)
      kl = sum(p * log(p/dg[1:spp]))
      if (kl < best)
      {
        best = kl
        bestk = k
        bestr = r
        bestdg = dg[1:spp]
      }
      if (bestk != k)
      {
        k = lastk
      }
    }
    r = r + 1
  }
  return(list(distribution=bestdg,richness=bestr,k=bestk,fit=best))
}
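The recurrence inside fitDoubleGeometric can be written in vectorized form, which may make the shape of the distribution easier to see. This is a sketch rather than part of the original analysis: the helper name `doubleGeometric` and the example values k = 0.8 and r = 20 are hypothetical. Each step multiplies the previous value by k raised to 1/sqrt(m/r), where m is the smaller of i and r - i + 1.

```r
# Double geometric proportions for a given k and richness r, following the
# recurrence in fitDoubleGeometric: dg[1] = k and, for i >= 2,
# dg[i] = dg[i-1] * k^(1 / sqrt(m / r)) with m = min(i, r - i + 1).
doubleGeometric <- function(k, r) {
  stopifnot(k > 0, r >= 2)
  i <- 2:r
  m <- pmin(i, r - i + 1)
  expo <- c(1, 1 / sqrt(m / r))   # the leading 1 gives dg[1] = k
  dg <- k^cumsum(expo)            # cumulative product expressed as a power
  dg / sum(dg)                    # normalize to relative abundances
}

dg <- doubleGeometric(k = 0.8, r = 20)
print(round(dg, 4))
```

Because the exponent is largest at both ends of the ranking, abundances fall fastest among the most and the least common species.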
# fits the log normal distribution
fitLogNormal<-function(ab)
{
  spp = length(ab)
  n = sum(ab)
  p = ab/n
  best = 999999
  s = 1
  r = spp
  bestr = spp - 1
  while (r < 10 * spp && r == bestr + 1)
  {
    for (z in 1:100) {
      lasts = s
      s = s + rnorm(1,sd=0.1)
      if (s <= 0)
      {
        s = lasts
      }
      # normal quantiles at the midpoints of r equal-probability bins
      np = exp(sort(qnorm(seq(0.5/r,1-0.5/r,1/r),sd=s),decreasing=TRUE))
      np = np/sum(np)
      kl = sum(p * log(p/np[1:spp]))
      if (kl < best)
      {
        best = kl
        bests = s
        bestr = r
        bestln = np[1:spp]
      }
      if (bests != s)
      {
        s = lasts
      }
    }
    r = r + 1
  }
  bestln = sort(bestln / sum(bestln), decreasing=TRUE)
  return(list(distribution=bestln,richness=bestr,sd=bests,fit=best))
}
# fits the geometric series distribution
fitGeometric<-function(ab)
{
  spp = length(ab)
  p = ab/sum(ab)
  k = exp(lm( log(ab) ~ c(1:spp),weights=sqrt(ab) )$coefficients[2])
  best = 999999
  for (i in 1:100) {
    lastk = k
    k = k + rnorm(1,sd=0.01)
    q = array()
    for (j in 1:(10*spp))
    {
      if (j > 1)
        q[j] = q[j-1] * k
      else
        q[j] = k
    }
    q = q/sum(q)
    kl = sum(p * log(p/q[1:spp]))
    if (kl < best)
    {
      best = kl
      bestk = k
      bestgs = q
    }
    if (bestk != k)
    {
      k = lastk
    }
  }
  return(list(distribution=bestgs,k=bestk,fit=best))
}
# fits the log series distribution
fitLogSeries<-function(ab)
{
  spp = length(ab)
  nisp = sum(ab)
  p = ab/sum(ab)
  if (nisp > 10000)
    return()
  z = logSeriesParams(ab)
  alpha = z[1]
  lsx = z[2]
  best = 999999
  for (i in 1:100) {
    lasta = alpha
    # as written, 0.01 is the mean of the step and the sd defaults to 1
    alpha = alpha + rnorm(1,0.01)
    if (alpha <= 0)
      alpha = lasta
    lsx = nisp/(nisp + alpha)
    s = array()
    cd = rep(0,3*nisp)
    highest = 0
    for (j in 1:(3*nisp))
    {
      s[j] = alpha * lsx^j / j
      cd[j+1] = cd[j] + s[j]
      if (cd[j]/nisp > 1 - 0.5/spp)
        break
      highest = j
    }
    z = 0
    q = array()
    for (j in 2:highest)
    {
      if (round(cd[j-1]) < round(cd[j]))
        for (k in round(cd[j-1]):(round(cd[j])-1) )
        {
          z = z + 1
          q[z] = j - 1
        }
    }
    if (length(q) < spp)
    {
      alpha = lasta
      lsx = nisp/(nisp + alpha)
      next
    }
    q = sort(q,decreasing=TRUE)
    q = q/sum(q)
    kl = sum(p * log(p/q[1:spp]))
    if (kl < best)
    {
      best = kl
      besta = alpha
      bestls = q[1:spp]
    }
    if (besta != alpha)
    {
      alpha = lasta
      lsx = nisp/(nisp + alpha)
    }
  }
  return(list(distribution=bestls,alpha=besta,fit=best))
}
# needed by fitLogSeries
logSeriesParams<-function(ab) {
  a = 10
  olda = 0
  n = sum(ab)
  z = 0
  # fixed-point iteration for alpha, stopped at convergence or 1000 steps
  while (abs(a - olda) > 0.0000001 && z < 1000)
  {
    z = z + 1
    olda = a
    a = length(ab) / log(1 + n / a)
  }
  x = n/(n + a)
  return(c(a,x))
}
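The iteration in logSeriesParams solves S = alpha * log(1 + N/alpha) for alpha, where S is the number of species and N the number of individuals. As an independent check, the same equation can be handed to base R's `uniroot`. This is only a sketch: the helper name `fisherAlpha`, the search interval, and the counts are hypothetical, and the interval assumes S < N.

```r
# Solve S = alpha * log(1 + N / alpha) with uniroot() instead of the
# fixed-point iteration used in logSeriesParams.
fisherAlpha <- function(S, N) {
  f <- function(a) a * log(1 + N / a) - S
  # assumed bracket: f is negative near 0 and approaches N - S for large a
  uniroot(f, interval = c(1e-6, 1e6), tol = 1e-10)$root
}

ab <- c(60, 30, 15, 8, 4, 2, 1, 1)   # hypothetical counts
S <- length(ab)
N <- sum(ab)
alpha <- fisherAlpha(S, N)
x <- N / (N + alpha)                 # same definition of x as logSeriesParams
print(c(alpha = alpha, x = x))
```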
# fits the broken stick distribution
fitBS<-function(ab)
{
  best = 999999
  spp = length(ab)
  bestr = spp
  p = ab/sum(ab)
  # try richness values from spp upward and stop once the fit stops improving
  for (i in spp:(length(ab)*3)) {
    q = rep(0,i)
    for (j in 1:i)
    {
      for (k in 0:(i-j))
      {
        q[j] = q[j] + 1 / ( i - k )
      }
      q[j] = q[j] / i
    }
    q = q/sum(q,na.rm=TRUE)
    kl = sum(p * log(p/q[1:spp]))
    if (kl < best)
    {
      best = kl
      bestr = i
      bestbs = q
    }
    if (bestr != i)
    {
      break
    }
  }
  return(list(distribution=bestbs,richness=bestr,fit=best))
}
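The nested loops in fitBS compute the broken stick expectation q[j] = (1/r) * sum of 1/m for m from j to r. A vectorized equivalent is sketched below; the helper name `brokenStick` and the choice of r = 5 are hypothetical and serve only as an illustration.

```r
# Broken stick expected proportions for r species:
# q[j] = (1 / r) * sum_{m = j}^{r} 1 / m, matching the loops in fitBS.
brokenStick <- function(r) {
  stopifnot(r >= 1)
  # reverse cumulative sum of 1/m gives the tail sums from j to r
  rev(cumsum(rev(1 / (1:r)))) / r
}

q <- brokenStick(5)
print(round(q, 4))
print(sum(q))
```

The proportions already sum to one, so the normalization step in fitBS leaves them unchanged apart from rounding error.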
# fits the Zipf distribution
fitZipf<-function(ab)
{
  spp = length(ab)
  p = ab/sum(ab)
  co = lm(log(ab) ~ log(1:spp),weights=sqrt(ab))$coefficients
  z = co[2]
  best = 999999
  for (i in 1:100) {
    lastexp = z
    z = z + rnorm(1,sd=0.01)
    q = exp(log(1:1000) * z)
    q = q/sum(q)
    kl = sum(p * log(p/q[1:spp]))
    if (kl < best)
    {
      best = kl
      bestexp = z
      bestzipf = q[1:spp]
    }
    if (bestexp != z) {
      z = lastexp
    }
  }
  return(list(distribution=bestzipf,exponent=bestexp,fit=best))
}