
IWC/SC/J11/AE3
Comparison between the point independence model and the
hazard probability model using the IDCR/SOWER Antarctic
minke whales data
Hiroshi Okamura¹ and Toshihide Kitakado²
¹ National Research Institute of Far Seas Fisheries, Fisheries Research Agency, Kanagawa 236-8648, Japan
² Tokyo University of Marine Science and Technology, Minato, Tokyo 108-8477, Japan
The corresponding author’s email address: [email protected]ffrc.go.jp
ABSTRACT
We examined how abundance estimates are affected by the assumption about independence of detection between platforms and by the use of the SSE data. The point independence assumption consistently gave lower abundance estimates than the hazard probability model and may be implausible for the IDCR/SOWER data. The use of the SSE data is less influential, but could have serious impacts in combination with a change in how confirmation status is handled.
1. INTRODUCTION
Two approaches, SPLINTR and OK, provided abundance estimates for Antarctic minke whales at the 2009 meeting of the IWC Scientific Committee (SC) (Bravington and Hedley 2009, Okamura and Kitakado 2009). Because their results differed considerably, the developers of the two approaches ran their models on a common dataset, called the reference dataset (Bravington and Hedley 2010, Okamura and Kitakado 2010). The difference persisted even with the reference dataset. The SPLINTR developers also ran their analysis without the spatial components of their model, and this did not change the results substantially. The SC therefore decided to hold an intersessional workshop and encouraged the developers to conduct sensitivity analyses of factors other than the spatial components.
This paper provides sensitivity tests of factors that were not examined at the last SC meeting: Point Independence (PI) vs. Hazard Probability (HP) models, and the inclusion or exclusion of the SSE data. PI assumes that detections by the platforms are independent on the trackline, while HP assumes that the detection of each cue is independent. Buckland et al. (2010) developed the Limiting Independence (LI) model, which assumes independence in the limit as detection probability tends to one, and which includes the PI and Full Independence (FI) models as special cases. We fitted the FI and LI models as well as PI for comparison.
The SSE data come from experiments, conducted during the CPIII surveys, that provide direct estimates of recording errors in unconfirmed school sizes. They record school size before and after the school was approached. While the OK method uses the confirmed/unconfirmed status of the observed school size directly, SPLINTR uses Closing (CL) and IO modes as a surrogate for confirmed/unconfirmed status. This allows SPLINTR, but not OK, to use the SSE data. To use the SSE data directly, one therefore needs to adopt the definition of confirmation/unconfirmation used in SPLINTR.
The objective of this paper is to clarify the effects of the assumed level of independence in detection and of the inclusion or exclusion of the SSE data. The next section outlines the approach.
Finally, the results are discussed with a focus on the causes of the difference between SPLINTR and OK.
2. MATERIALS AND METHODS
2.1. The data
We used the IDCR/SOWER reference dataset. For simplicity, sightings from the Upper Bridge were excluded. School size was the only covariate, and it was used only when necessary. CPII and CPIII were analyzed separately. First, we examined only the sightings with a recorded school size of 1, ignoring the confirmed/unconfirmed status of school size. Next, the data including school size information were examined. To investigate sensitivity to the SSE data, we also used data in which the confirmed/unconfirmed status was replaced with CL/IO. Perpendicular distances and school sizes for the two types of data are summarized in Table 1.
2.2. The model
The estimation equation is the same as in Okamura and Kitakado (2010, 2011a). The following two forms were used as detection functions:
• Perpendicular Distance Model:
A hazard-rate form was used as the detection function:
$$g(x) = g(0)\left[1 - \exp\left\{-(x/\sigma)^{-b}\right\}\right]$$
When the school size information is available, the parameters σ and g(0) are linked to (true)
school size:
log(σ) ∼ Platform + log(s)
logit(g(0)) ∼ Platform + log(s)
The detections are categorized into three patterns: A (Top Barrel) only, B (IO Booth) only, and both A and B. The detection probability for each pattern is:
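for A only,
$$g_{Ab}(x) = g_A(x)\{1 - \delta(x)\, g_B(x)\}$$
for B only,
$$g_{aB}(x) = g_B(x)\{1 - \delta(x)\, g_A(x)\}$$
for both A and B,
$$g_{AB}(x) = \delta(x)\, g_A(x)\, g_B(x)$$
where
$$\delta(x) = \{U(x) - L(x)\}\,\delta_0(x) + L(x)$$
$$\mathrm{logit}\{\delta_0(x)\} = \alpha + \beta x + \log\left\{\frac{1 - L(x)}{U(x) - 1}\right\}$$
$$L(x) = \max\left[0,\ \frac{g_A(x) + g_B(x) - 1}{g_A(x)\, g_B(x)}\right]$$
$$U(x) = \min\left\{\frac{1}{g_A(x)},\ \frac{1}{g_B(x)}\right\}$$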
Buckland et al. (2010) called this model the Limiting Independence (LI) model. It reduces to the Point Independence (PI) model when α = 0 (then δ0(0) = {1 − L(0)}/{U(0) − L(0)}, so δ(0) = 1), and to the Full Independence (FI) model when α = β = 0 (then δ(x) = 1 for all x). We examined all three models: LI, PI, and FI.
• Hazard Probability Model:
The specification of the Hazard Probability model is the same as in Okamura and Kitakado (2010, 2011a), except that the Upper Bridge platform (C) is excluded. The parameters of the detection function were linked to school size.
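The perpendicular distance model can be sketched in Python as below. This is a minimal illustration of the written equations only, not the authors' estimation code: the function names, the reading of "Platform + log(s)" as platform-specific intercepts with a common slope, and all coefficient values in the usage lines are hypothetical. The sketch makes it easy to confirm the special cases stated for the LI model: with α = β = 0 the factor δ(x) equals 1 everywhere (FI), and with α = 0 it equals 1 at x = 0 (PI).

```python
import math


def hazard_rate(x, g0, sigma, b):
    """Hazard-rate detection function g(x) = g(0) * [1 - exp{-(x/sigma)^(-b)}]."""
    if x <= 0.0:
        return g0  # limit as x -> 0+ (for b > 0)
    return g0 * (1.0 - math.exp(-((x / sigma) ** (-b))))


def link_params(s, a_sigma, c_sigma, a_g0, c_g0):
    """Hypothetical school-size link: log(sigma) = a_sigma + c_sigma*log(s),
    logit(g0) = a_g0 + c_g0*log(s); platform effects enter via the intercepts."""
    sigma = math.exp(a_sigma + c_sigma * math.log(s))
    g0 = 1.0 / (1.0 + math.exp(-(a_g0 + c_g0 * math.log(s))))
    return g0, sigma


def delta(x, gA, gB, alpha, beta):
    """Dependence factor delta(x) of the LI model; assumes 0 < gA, gB < 1."""
    L = max(0.0, (gA + gB - 1.0) / (gA * gB))
    U = min(1.0 / gA, 1.0 / gB)
    eta = alpha + beta * x + math.log((1.0 - L) / (U - 1.0))
    delta0 = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
    return (U - L) * delta0 + L


def pattern_probs(x, par_A, par_B, alpha, beta):
    """Return (g_Ab, g_aB, g_AB) at distance x; each par_* is (g0, sigma, b)."""
    gA = hazard_rate(x, *par_A)
    gB = hazard_rate(x, *par_B)
    d = delta(x, gA, gB, alpha, beta)
    return gA * (1.0 - d * gB), gB * (1.0 - d * gA), d * gA * gB


# Usage with made-up coefficients (not estimates from this paper), PI model.
for s in (1.0, 3.0):  # school size
    g0A, sigA = link_params(s, a_sigma=-0.7, c_sigma=0.2, a_g0=-0.9, c_g0=0.3)
    g0B, sigB = link_params(s, a_sigma=-0.9, c_sigma=0.2, a_g0=-1.7, c_g0=0.3)
    for x in (0.0, 0.5):
        probs = pattern_probs(x, (g0A, sigA, 3.0), (g0B, sigB, 3.0), alpha=0.0, beta=1.5)
        print(s, x, [round(p, 4) for p in probs])
```

Because δ(x) is bounded by U(x) = min{1/gA(x), 1/gB(x)}, the products δ gA and δ gB never exceed 1, so all three pattern probabilities stay non-negative.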
The school size distribution was handled following the approach of Okamura and Kitakado (2011a). To examine the effect of using the SSE data, the models were also applied to the data in which the confirmed/unconfirmed status of school size was replaced by Closing mode/IO mode. The density index for each method was calculated as $E(s)/esw_{A\cup B}$ using only the IO data.
For comparison, a conventional model with g(0) = 1 (HR3 in Tables 2 and 3) was also fitted, using data from all three platforms and a hazard-rate detection function. When school size effects were included, the conventional regression approach was used for the expected school size.
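As an arithmetic illustration of the density index, the sketch below recomputes the CPII entries of Table 2, where every school has size 1 so that E(s) = 1. The function name is hypothetical, and the HR3 density index (2.5046) is read from Table 2 rather than recomputed. The results agree with the tabulated DI and Ratio values to about three decimal places; the small differences come from rounding of the reported esw values.

```python
def density_index(mean_school_size, esw_union):
    """Relative density index E(s) / esw_{A∪B}, from the IO-mode fit."""
    return mean_school_size / esw_union


# CPII values from Table 2 (school size = 1, so E(s) = 1).
DI_HR3 = 2.5046  # DI of the g(0) = 1 model (HR3), read from Table 2
esw_union = {"FI": 0.2939, "PI": 0.2340, "HP": 0.1831}

ratios = {}
for model, esw in esw_union.items():
    di = density_index(1.0, esw)
    ratios[model] = di / DI_HR3  # "Ratio": DI relative to that of HR3
    print(f"{model}: DI = {di:.4f}, Ratio = {ratios[model]:.4f}")
```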
2.3. Diagnostics
We compared the observed proportions of the three sighting patterns, Ab (A only), aB (B only) and AB (both A and B), with the proportions predicted by the PI and HP models, using the IO mode data. The predicted proportions were calculated as the relative sizes of $esw_{Ab}$, $esw_{aB}$ and $esw_{AB} = esw_{A\cup B} - esw_{Ab} - esw_{aB}$.
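The predicted proportions can be obtained from the esw values reported in the tables. From the pattern probabilities in Section 2.2, $g_{Ab}(x) = g_{A\cup B}(x) - g_B(x)$ and $g_{aB}(x) = g_{A\cup B}(x) - g_A(x)$, and the same identities hold after integrating over x. The sketch below (hypothetical function name) applies this to the PI estimates for CPII in Table 2; it gives about 0.584, 0.326 and 0.090, which match the PI row for MSS = 1 in Table 6 (0.5837, 0.3258, 0.0904) to within rounding.

```python
def predicted_pattern_proportions(esw_A, esw_B, esw_union):
    """Predicted proportions of A-only (Ab), B-only (aB) and duplicate (AB) sightings."""
    esw_Ab = esw_union - esw_B            # seen by A but not by B
    esw_aB = esw_union - esw_A            # seen by B but not by A
    esw_AB = esw_union - esw_Ab - esw_aB  # seen by both platforms
    return esw_Ab / esw_union, esw_aB / esw_union, esw_AB / esw_union


# PI model, CPII, Table 2 (school size = 1): esw for A, B and A∪B.
p_Ab, p_aB, p_AB = predicted_pattern_proportions(0.1577, 0.0974, 0.2340)
print(f"pAb = {p_Ab:.4f}, paB = {p_aB:.4f}, pAB = {p_AB:.4f}")
```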
3. RESULTS AND DISCUSSION
When the models were applied to the data with school size equal to one, the density indices (DI) relative to HR3 for PI and HP were about 1.7 and 2.0, respectively (Table 2). The differences among models were larger for CPII. LI for CPII gave an extremely large density index, which is attributable to the high correlation between estimated parameters (max.cor = 1.0000 in Table 2; see also Buckland et al. 2010). The estimates of g(0) and esw are probably overestimated here, because the true school size of unconfirmed sightings recorded with school size one tends to be greater than one (confirmation status was ignored in this example).
When school size effects were included, g(0) and esw for school size = 1 became smaller, as expected (Table 3). The relative DIs were somewhat reduced. For CPIII, the relative DI for PI was considerably smaller (1.35) than that for HP. The relative DIs for HP were about 1.8 for both CPII and CPIII.
When conf/unconf was replaced by CL/IO, the DIs increased (Table 4) because esw decreased. The same operation applied to the three-platform data showed a similar trend for the OK method, but the change was much larger for the two-platform data (Okamura and Kitakado 2011b).
When the SSE data were also used, the DIs decreased in general (Table 5). The relative DIs for PI were about 1.7 (1.74 for CPII and 1.72 for CPIII), comparable to the values for HP in Table 3 (1.82 and 1.79). Therefore, if the school size distribution is handled in the same way, SPLINTR and OK should provide abundance estimates at the same level, a little less than twice as large as the estimates with g(0) = 1.
The goodness of fit for the proportions of sighting patterns showed that HP outperformed PI (Table 6), and the PI assumption may be implausible for the IDCR/SOWER data with mean school size greater than one, although the two models performed almost equally for the data with mean school size equal to one. In particular, the proportion of duplicate sightings predicted under the PI assumption increased unreasonably when the conf/unconf = CL/IO assumption and the SSE data were used.
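The paper does not state which goodness-of-fit statistic underlies this comparison. Purely as an illustrative measure (an assumption, not necessarily the authors' statistic), the sketch below ranks the models by the sum of squared differences between the observed and predicted proportions for CPII with MSS > 1 in Table 6. HP and HP.SSE fit far more closely than PI, and for PI.SSE the discrepancy is dominated by the overpredicted duplicate proportion pAB (0.3136 vs. 0.1768 observed).

```python
def sum_sq_diff(observed, predicted):
    """Sum of squared differences between observed and predicted proportions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Table 6, MSS > 1, CPII: proportions (pAb, paB, pAB).
observed = (0.5408, 0.2824, 0.1768)
predicted = {
    "PI": (0.5069, 0.2510, 0.2421),
    "HP": (0.5368, 0.2819, 0.1813),
    "PI.SSE": (0.4833, 0.2032, 0.3136),
    "HP.SSE": (0.5410, 0.2840, 0.1750),
}

scores = {name: sum_sq_diff(observed, p) for name, p in predicted.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:7s} {score:.6f}")  # smallest discrepancy first
```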
These factors might explain the difference between SPLINTR and OK, but this investigation did not allow us to identify the true cause with confidence. At a minimum, the use of PI would be problematic for the IDCR/SOWER data because it lowers abundance estimates unreasonably. The use of the SSE data seems less influential than the PI assumption (although the change from conf/unconf to CL/IO might be influential). However, the SSE procedure does not necessarily match that of the IDCR/SOWER surveys. If possible, estimation based solely on IDCR/SOWER data would be desirable, given that the abundance estimates with g(0) = 1 were based on them (Branch 2006). Overall, the results of this study suggest that what needs to be explained is why SPLINTR provides abundance estimates at about the same level as those with g(0) = 1, rather than why OK provides abundance estimates that are too large.
ACKNOWLEDGMENTS
We thank Mark Bravington for providing the SSE data.
REFERENCES
Branch (2006) IWC/SC58/IA18.
Bravington and Hedley (2009) IWC/SC61/IA14.
Bravington and Hedley (2010) IWC/SC62/IA12.
Buckland, Laake, Borchers (2010) Double-observer line transect methods: levels of independence.
Biometrics 66: 169-177.
Okamura and Kitakado (2009) IWC/SC61/IA6.
Okamura and Kitakado (2010) IWC/SC62/IA3.
Okamura and Kitakado (2011a) Revised abundance estimates of Antarctic minke whales from the
OK method. IWC/SC/J11/AE1.
Okamura and Kitakado (2011b) Sensitivity analyses of Antarctic minke whales abundance estimation
by the OK method. IWC/SC/J11/AE2.
Table 1. Summary statistics for the data. PD = Perpendicular distance, SS = School size

The reference dataset (excluding C)

                        CPII               CPIII
                     IO       CL        IO       CL
PD    unconf       0.504    0.312     0.555    0.448
      conf         0.279    0.323     0.339    0.498
SS    unconf       2.044    1.233     2.192    1.217
      conf         2.432    2.670     1.766    3.675

The reference dataset with the change of conf/unconf to CL/IO (excluding C)

                        CPII               CPIII
                     IO       CL        IO       CL
PD    unconf       0.475      -       0.507      -
      conf           -      0.320       -      0.486
SS    unconf       2.137      -       2.132      -
      conf           -      3.110       -      2.323
Table 2. The results for the data with school size = 1. max.cor = maximum absolute correlation among parameters. Ratio is the relative density index when DI for HR3 is 1.

CPII
HR3: g(0) CL = 1.0000, IO = 1.0000; max.cor = 0.8729
     esw CL = 0.3360, IO = 0.5696; cor.coef = 1.4267; DI = 2.5046; Ratio = 1.0000

                  FI        PI        LI        HP
g(0)     A      0.3803    0.3214    0.0015    0.2553
         B      0.1912    0.1590    0.0007    0.1560
         AUB    0.4987    0.4293    0.0019    0.3453
max.cor         0.7689    0.9107    1.0000    0.9614
esw      A      0.1970    0.1577    0.0007    0.1232
         B      0.1220    0.0974    0.0004    0.0757
         AUB    0.2939    0.2340    0.0010    0.1831
DI              3.4024    4.2741    956.1     5.4617
Ratio           1.3585    1.7065    381.7     2.1807

CPIII
HR3: g(0) CL = 1.0000, IO = 1.0000; max.cor = 0.8595
     esw CL = 0.3916, IO = 0.5973; cor.coef = 1.4224; DI = 2.3813; Ratio = 1.0000

                  FI        PI        LI        HP
g(0)     A      0.3338    0.2664    0.2088    0.2622
         B      0.1935    0.1538    0.1205    0.1630
         AUB    0.4627    0.3792    0.2972    0.3564
max.cor         0.8825    0.8837    0.9996    0.9521
esw      A      0.2217    0.1708    0.1338    0.1479
         B      0.1277    0.0983    0.0770    0.0847
         AUB    0.3218    0.2468    0.1934    0.2141
DI              3.1077    4.0516    5.1716    4.6705
Ratio           1.3050    1.7014    2.1717    1.9613

HR3 = standard line transect method with hazard rate detection function using 3 platform data
FI = full independence model, PI = point independence model, LI = limiting independence model
HP = hazard probability model
Table 3. The results for the data with school size effects. max.cor = maximum absolute correlation among parameters. Ratio is the relative density index when DI for HR3 is 1.

CPII
HR3: g(0) CL = 1.0000, IO = 1.0000; max.cor = 0.8822
     esw CL = 0.3241, IO = 0.6033; cor.coef = 1.3833
     Es = 2.2685; Es.CL = 3.2933
     Es.reg = 1.7860: DI = 4.0948, Ratio = 1.0000
     Es.reg.CL = 2.5698: DI = 5.8919, Ratio = 1.4389

                school size     FI        PI        LI        HP
g(0)     A      1             0.2630    0.2530    0.2874    0.2562
                average       0.3458    0.3371    0.3756    0.2819
         B      1             0.1441    0.1388    0.1571    0.1058
                average       0.2194    0.2149    0.2393    0.1733
         AUB    1             0.3693    0.3567    0.4320    0.2458
                average       0.4587    0.4481    0.5244    0.3318
max.cor                       0.9384    0.9362    0.9528    0.9190
esw      A      1             0.0968    0.0952    0.1077    0.0759
                average       0.1743    0.1699    0.1896    0.1375
         B      1             0.0506    0.0499    0.0561    0.0442
                average       0.1147    0.1117    0.1247    0.0887
         AUB    1             0.1405    0.1306    0.1574    0.1127
                average       0.2442    0.2267    0.2638    0.1915
Es                            1.3920    1.3942    1.4288    1.4275
DI                            5.7009    6.1488    5.4163    7.4549
Ratio                         1.3922    1.5016    1.3227    1.8206

CPIII
HR3: g(0) CL = 1.0000, IO = 1.0000; max.cor = 0.8480
     esw CL = 0.5252, IO = 0.6764; cor.coef = 1.4090
     Es = 2.3092; Es.CL = 2.5584
     Es.reg = 1.6740: DI = 3.4870, Ratio = 1.0000
     Es.reg.CL = 1.9307: DI = 4.0217, Ratio = 1.1533

                school size     FI        PI        LI        HP
g(0)     A      1             0.2694    0.2525    0.4414    0.1788
                average       0.3368    0.3193    0.5117    0.2486
         B      1             0.1630    0.1548    0.2687    0.1121
                average       0.2217    0.2121    0.3391    0.1719
         AUB    1             0.3885    0.3682    0.7528    0.2582
                average       0.4657    0.4461    0.8182    0.3332
max.cor                       0.9732    0.9745    0.9735    0.9724
esw      A      1             0.1704    0.1618    0.2813    0.0983
                average       0.2597    0.2452    0.3945    0.1633
         B      1             0.0889    0.0852    0.1470    0.0527
                average       0.1553    0.1465    0.2359    0.0963
         AUB    1             0.2423    0.2118    0.4606    0.1428
                average       0.3555    0.3117    0.5948    0.2217
Es                            1.4722    1.4705    1.6499    1.3864
DI                            4.1414    4.7170    2.7739    6.2526
Ratio                         1.1877    1.3527    0.7955    1.7931

HR3 = standard line transect method with hazard rate detection function using 3 platform data
FI = full independence model, PI = point independence model, LI = limiting independence model
HP = hazard probability model
Table 4. The results for the data with conf/unconf -> CL/IO. max.cor = maximum absolute correlation among parameters. Ratio is the relative density index when DI for HR3 in Table 3 is 1.

CPII
                school size     FI        PI        LI        HP
g(0)     A      1             0.2053    0.1931    0.1743    0.1672
                average       0.2871    0.2765    0.2522    0.2388
         B      1             0.1080    0.1015    0.0923    0.1006
                average       0.1779    0.1722    0.1571    0.1605
         AUB    1             0.2911    0.2750    0.2323    0.2314
                average       0.3834    0.3698    0.3213    0.3100
max.cor                       0.9636    0.9690    0.9065    0.9220
esw      A      1             0.0748    0.0733    0.0665    0.0728
                average       0.1453    0.1402    0.1278    0.1287
         B      1             0.0378    0.0372    0.0341    0.0425
                average       0.0942    0.0909    0.0828    0.0826
         AUB    1             0.1086    0.0977    0.0839    0.1079
                average       0.2037    0.1832    0.1610    0.1795
Es                            1.4769    1.4943    1.4589    1.4693
DI                            7.2516    8.1558    9.0625    8.1832
Ratio                         1.7709    1.9917    2.2131    1.9984

CPIII
                school size     FI        PI        LI        HP
g(0)     A      1             0.1235    0.1227    0.3268    0.1160
                average       0.1786    0.1780    0.4224    0.1628
         B      1             0.0716    0.0712    0.1847    0.0710
                average       0.1181    0.1177    0.2786    0.1113
         AUB    1             0.1862    0.1852    0.7496    0.1722
                average       0.2518    0.2511    0.8309    0.2228
max.cor                       0.9821    0.9324    0.9632    0.9368
esw      A      1             0.0797    0.0795    0.2074    0.0616
                average       0.1387    0.1381    0.3282    0.1037
         B      1             0.0388    0.0387    0.0974    0.0323
                average       0.0820    0.0817    0.1938    0.0608
         AUB    1             0.1151    0.1128    0.4631    0.0905
                average       0.1897    0.1863    0.6090    0.1405
Es                            1.2916    1.2951    1.6292    1.2292
DI                            6.8098    6.9502    2.6751    8.7460
Ratio                         1.9529    1.9932    0.7671    2.5082

HR3 = standard line transect method with hazard rate detection function using 3 platform data
FI = full independence model, PI = point independence model, LI = limiting independence model
HP = hazard probability model
Table 5. The results for the data with conf/unconf -> CL/IO and SSE. max.cor = maximum absolute correlation among parameters. Ratio is the relative density index when DI for HR3 in Table 3 is 1.

CPII
                school size     FI        PI        LI        HP
g(0)     A      1             0.2886    0.2769    0.1702    0.1892
                average       0.3714    0.3620    0.2360    0.2623
         B      1             0.1558    0.1514    0.0947    0.1145
                average       0.2292    0.2261    0.1469    0.1763
         AUB    1             0.3994    0.3864    0.1833    0.2609
                average       0.4902    0.4806    0.2566    0.3403
max.cor                       0.9232    0.9231    0.9171    0.9365
esw      A      1             0.1082    0.1108    0.0685    0.0835
                average       0.1900    0.1852    0.1200    0.1424
         B      1             0.0572    0.0595    0.0378    0.0490
                average       0.1232    0.1200    0.0776    0.0913
         AUB    1             0.1569    0.1407    0.0704    0.1235
                average       0.2671    0.2324    0.1290    0.1989
Es                            1.6195    1.6573    1.5071    1.5207
DI                            6.0642    7.1312    11.6811   7.6460
Ratio                         1.4809    1.7415    2.8526    1.8672

CPIII
                school size     FI        PI        LI        HP
g(0)     A      1             0.2440    0.2192    0.4966    0.1346
                average       0.3090    0.2828    0.5623    0.1883
         B      1             0.1449    0.1334    0.2975    0.0824
                average       0.2009    0.1874    0.3674    0.1284
         AUB    1             0.3535    0.3233    0.8514    0.1984
                average       0.4284    0.3985    0.9158    0.2565
max.cor                       0.9072    0.8902    0.8995    0.9549
esw      A      1             0.1605    0.1491    0.3323    0.0730
                average       0.2419    0.2206    0.4419    0.1216
         B      1             0.0821    0.0779    0.1718    0.0383
                average       0.1428    0.1304    0.2606    0.0710
         AUB    1             0.2283    0.1839    0.5486    0.1069
                average       0.3314    0.2663    0.6746    0.1651
Es                            1.5906    1.5946    1.9720    1.3104
DI                            4.7999    5.9869    2.9233    7.9387
Ratio                         1.3765    1.7169    0.8384    2.2767

HR3 = standard line transect method with hazard rate detection function using 3 platform data
FI = full independence model, PI = point independence model, LI = limiting independence model
HP = hazard probability model
Table 6. Comparison between observed and predicted proportions of sighting patterns. MSS = mean school size. The suffix .SSE denotes the results for the data with conf/unconf = CL/IO and SSE.

MSS = 1
                  CPII                         CPIII
          pAb      paB      pAB        pAb      paB      pAB
Obs      0.5850   0.3256   0.0895     0.6028   0.3077   0.0895
PI       0.5837   0.3258   0.0904     0.6017   0.3078   0.0905
HP       0.5865   0.3271   0.0864     0.6047   0.3093   0.0860

MSS > 1
                  CPII                         CPIII
          pAb      paB      pAB        pAb      paB      pAB
Obs      0.5408   0.2824   0.1768     0.5685   0.2678   0.1637
PI       0.5069   0.2510   0.2421     0.5301   0.2132   0.2567
HP       0.5368   0.2819   0.1813     0.5657   0.2637   0.1706
PI.SSE   0.4833   0.2032   0.3136     0.5103   0.1715   0.3181
HP.SSE   0.5410   0.2840   0.1750     0.5698   0.2636   0.1667