Predicting fish diet composition using a bagged classification tree approach: a case study using yellowfin tuna (Thunnus albacares)

Petra M. Kuhnert
CSIRO Mathematics, Informatics and Statistics, Private Bag 2, Glen Osmond SA 5064, Australia
E-mail: [email protected]; Phone: +61 8 8303 8775; Fax: +61 8 8303 8763

Leanne M. Duffy
Inter-American Tropical Tuna Commission, 8604 La Jolla Shores Drive, La Jolla CA 92037-1508 USA

Jock W. Young
CSIRO Marine and Atmospheric Research and Wealth from Oceans Flagship, GPO Box 1538, Hobart TAS 7001 Australia

Robert J. Olson
Inter-American Tropical Tuna Commission, 8604 La Jolla Shores Drive, La Jolla CA 92037-1508 USA

Abstract:
We provided a classification tree modeling framework for investigating complex feeding relationships, and
illustrated the method using stomach contents data for yellowfin tuna (Thunnus albacares) collected by longline
fishing gear deployed off eastern Australia between 1992 and 2006. The non-parametric method is both exploratory
and predictive, can be applied to varying size datasets and therefore is not restricted to a minimum sample size. The
method uses a bootstrap approach to provide standard errors of predicted prey proportions, variable importance
measures to highlight important variables, and partial dependence plots to explore the relationships between
explanatory variables and predicted prey composition. Our results supported previous studies of yellowfin tuna
feeding ecology in the region. However, the method provided a number of novel insights. For example, significant
differences were noted in the prey of yellowfin tuna sampled north of 30°S in summer where oligotrophic waters
dominate. The analysis also identified that sea-surface temperature, latitude and yellowfin size were the most
important variables associated with dietary differences. The methodology is appropriate for delineating ecosystem-level trophic dynamics, as it can easily incorporate large datasets comprising multiple predators to explore trophic
interactions among members of a community. Broad-scale relationships among explanatory variables
(environmental, biological, temporal and spatial) and prey composition elucidated by this method then serve to focus
and lend validity to subsequent fine-scale analyses of important parameters using standard diet methods and chemical
tracers such as stable isotopes.

Keywords: Yellowfin tuna diet, classification and regression trees (CART); bootstrapping; Feeding habits; Predator-prey relationships; Spatial, Trophic ecology.

Introduction

Characterising the feeding relationships of top predators in complex marine ecosystems is a key component of
ecosystem modeling (Olson and Watters 2003; Griffiths et al. 2010). Typically, fish diets have been characterised
using stomach contents data. While stable-isotope and fatty-acid analyses continue to provide valuable information
about food webs, diet data from stomach contents are essential for defining the taxonomic components and links in
ecosystem models. Yet, there is no unified approach for analysing stomach contents (hereafter “diet”) data used to
inform these models. Previous approaches have been exploratory, consisting of several types of multivariate analyses
accompanied by non-parametric statistical tests to examine a-priori hypotheses about the data. The relationships
between the type of prey eaten by a predator and their respective environmental niches are also difficult to
synthesize. Given the growing focus on understanding predator-prey relationships under increasing environmental
and fishing pressures (Pikitch et al. 2004; Marasco et al. 2007), it is imperative to develop methodology that is both
exploratory and predictive, and able to inform ecosystem models for exploratory analyses and eventual ecosystem-based management.

A large body of literature focusing on diet studies based on stomach contents has depended on multivariate
analyses (Mardia et al. 1979). Methods including principal component analysis, non-metric multidimensional scaling
and cluster analysis are used to explore an n × p matrix of prey weights, where n represents the number of
predators and p represents the number of prey observed in the stomach contents of predators at the time of sampling
(Chipps and Garvey 2007). Not all prey are represented in the stomach of each predator, hence the matrix usually
consists of many zeros. Examples of recent diet analyses that adopt one or more of these multivariate approaches
include the work of Young et al. (2010), who analysed 10 species of pelagic fishes, Griffiths et al. (2007) who
analysed the feeding dynamics of longtail tuna (Thunnus tonggol), Potier et al. (2007), who investigated prey
composition of three large pelagic fish predators and Young et al. (2006), who analysed the feeding ecology of
broadbill swordfish (Xiphias gladius) from eastern Australian waters. In many of these analyses, the multivariate
methods were accompanied by non-parametric statistical tests, including the Wilcoxon signed rank test and the
Kruskal-Wallis rank sum tests to examine specific differences in diet composition as expressed through the results of
the multivariate methods. Alternative analyses have typically accompanied the multivariate investigations to provide
a thorough evaluation of feeding. These include the calculation of daily consumption rates of the predator and how
this varies with fish size. Studies by Olson and Galván-Magaña (2002), Young et al. (2010), Griffiths et al. (2009)
and Griffiths et al. (2007) provide examples.

Regression-based approaches have included quantile regression (Koenker and Bassett 1978; Scharf et al.
1998), to examine prey-predator length relationships and provide a broad overview and synthesis of what different
sized predators eat (Ménard et al. 2006; Young et al. 2010; Logan et al. 2011); Generalised Linear Models (GLMs)
for analysing the presence or absence of a particular prey in the stomach of a predator (McCullagh and Nelder 1983);
or Generalised Linear Mixed Models (GLMMs) (McCulloch and Searle 2001). The latter can be used to address
pseudo-replication, which is a concern when the stomachs of several predators are collected at the same sampling
event (e.g. in the purse-seine fishery, fish that are collected from the same purse-seine set, i.e. sampling event, are not
considered independent) (M. Hunsicker, personal communication). Classification and Regression Trees (CART)
(Breiman et al. 1984) have been used as an exploratory tool to predict individual prey weights (Olson and Galván-Magaña 2002) and the occurrence of fish in the diet (Baker and Sheaves 2005). They have not been used as a tool to
predict diet composition or to identify the relationships between explanatory variables and the distribution of prey
groupings.

This paper extends the classification tree approach proposed by Breiman et al. (1984) to provide a method
for exploring and predicting diet composition. The method provides variable importance measures for identifying
important predictor variables in the model. It produces bagged predictions with standard errors using bootstrap
techniques similar to Breiman (1996) and Kuhnert et al. (2010), but with a spatial component to account for sampling
design. It also includes partial dependence plots to examine the relationships between the explanatory variables and
the response in the model (Breiman 2001). Finally, it provides a method for visualising the results by mapping the
predictions back to the pruned tree for interpretation and synthesis (Kuhnert and Mengersen 2003). We demonstrate
this approach using a case study from Australia (Young et al. 2010), where the diet composition of yellowfin tuna
(Thunnus albacares) was investigated to determine if relationships existed with environmental, biological, temporal
and spatial covariates.

Materials and methods

Study sites and sample collection

The yellowfin tuna is one of four tuna and billfish species fished commercially off eastern Australia (Young et al.
2011). The fishery for yellowfin tuna is situated largely within the East Australian Current. As this current is predicted
to increase spatially because of ocean warming, understanding its impact on the distribution of yellowfin tuna in the
region is an important research goal (Hartog et al. 2011). Part of this understanding will require detailed dietary
information. As trophic relations are at the base of most marine ecosystem models (Christensen and Walters 2004,
Fulton et al. 2007), diet studies that have the capacity to predict prey concentrations under different environmental
conditions will be particularly valuable to evaluate indirect effects of future management scenarios on exploited
ecosystems. Although previous studies have documented the diet of yellowfin tuna in these waters (Young et al.
2001; Young et al. 2010), none have attempted such a task.

Yellowfin tuna were caught by longline fishing gear deployed off eastern Australia between 1992 and 2006
using the survey approach outlined in Young et al. (2010) and the stomachs were removed for diet analysis. Of the
818 stomachs sampled, only 528 that contained prey remains were used in the analysis. Any prey contributing less
than 1% to the total wet weight across all stomach samples was also excluded from the analysis. This resulted in 19
prey taxa for analysis (Table 1). For each prey group in each stomach, a corresponding prey proportion, based on the
wet weight stomach remains, was calculated.

Explanatory variables consisted of categorical versions of longitude (LN.reg) and latitude (LT.reg) that
broke the survey region into south (south of 30°S), central (latitude between 20°S and 30°S) and northern locations
(north of 20°S) and inshore (longitude <155°E) and offshore regions (longitude between 155°E and 160°E). These
regions were based on where samples were taken in relation to the main features of the East Australian Current (see
Young et al. (2006) for details). A seasonal categorical variable identified samples taken in summer (September to
March) or winter (April to August). Sea-surface temperature (SST) was recorded at time of capture from an
underway thermosalinograph (Young et al. 2006). Missing values were later filled in from SST satellite imagery
using the spatial dynamics ocean data explorer (SDODE) interface (Hobday et al. 2006) after pairwise comparisons
showed a significant relationship between satellite and vessel-derived data (R² = 0.87). These data were represented as
a continuous variable ranging between 14.6 and 29.1°C (mean 21.67°C). Mixed layer depth (MLD), defined as the
depth from the surface of water with equal density, was also assigned to each sample using the SDODE interface.
Values ranged between 12.56 and 161.7 m (mean 37.69 m). The fork length (Length) of yellowfin tuna was
represented by a continuous variable and ranged between 73 and 214 cm (mean 130 cm). Moon phase was recorded
as a continuous variable between 0 and 1 where 0 indicates a new moon, 1 indicates a full moon and values in
between represent the intermediate phases of the moon. There were missing values of fork length (6.7% missing),
sea-surface temperature (2.2% missing) and mixed layer depth (2.2% missing).

Classification tree for diet data

Classification trees

Classification and Regression Trees (CART) is a popular non-parametric modeling approach that is extensively
described in Breiman et al. (1984), available for application using the RPART package (Therneau et al. 2009) in the
R programming language (R Development Core Team 2005) and, more recently, applied with reference to ecology
by Zuur et al. (2007). We briefly describe the methodology here with extensions to diet data. Through a greedy
algorithm, data are partitioned by successive splitting on explanatory variables that seek to minimize an error
criterion. For classification problems, the Gini index, i(t) (Breiman et al. 1984), represents a measure of diversity
amongst categories being predicted and is calculated as a function, φ, of the class probabilities predicted at node t. On
production of a large tree, v-fold cross-validation (typically 10-fold) is implemented to prune the tree to the lowest
cross-validated error rate. Cross-validation is a technique used to test the predictive performance of a model and how
well it will generalize to a new set of data. Cross-validation is implemented by holding back one of v portions of the data
as a test set, constructing the model on the remaining set of observations (regarded as a training set) and then
predicting using the test set data. This is repeated for each of the v portions to provide a cross-validated
prediction error. As Breiman et al. (1984) highlight, the selection of the final tree can be subjective and may result
in a series of trees within one standard error of the tree yielding the minimum, known as the 1SE tree. Breiman et al.
(1984) advocate using the 1SE rule for selecting what is regarded as “the right size tree” and reporting the cross-validated error rate and its accompanying standard error as a measure of the uncertainty. Although different types of
trees (e.g. unpruned, stumps, selecting trees based on the number of splits or varying error rates) can be explored,
using the 1SE rule, as suggested by Breiman et al. (1984), makes the model selection less subjective. We therefore followed
Breiman’s suggestion to use cross-validation and the 1SE rule to identify an optimal tree. Predictions are formed by
partitioning a new observation down the tree until it resides in a terminal node with labels in accordance with the
naming convention set out by Breiman et al. (1984). For classification tree problems, terminal node predictions are
represented as the most dominant prey group that is partitioned to that node. Missing values are easily
accommodated in CART models through surrogates that represent splits closely related to the primary split, which
can be used to partition data in the event of missing values.
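
The Gini index and the 1SE rule described above can be illustrated with a minimal Python sketch. It assumes the common form of the Gini index, i(t) = 1 − Σ_j p(j|t)², and a hypothetical table of cross-validated errors. The function names `gini_index` and `select_1se_tree` and all numbers are illustrative; they are not taken from the RPART package or from the yellowfin analysis.

```python
def gini_index(class_probs):
    """Gini index i(t) = 1 - sum_j p(j|t)^2 for the class probabilities at a node."""
    total = sum(class_probs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    return 1.0 - sum(p * p for p in class_probs)


def select_1se_tree(cv_table):
    """Pick the smallest tree whose CV error is within one SE of the minimum.

    cv_table: list of (n_splits, cv_error, cv_se) tuples, one per candidate subtree.
    """
    best = min(cv_table, key=lambda row: row[1])
    threshold = best[1] + best[2]
    eligible = [row for row in cv_table if row[1] <= threshold]
    return min(eligible, key=lambda row: row[0])


# Hypothetical cross-validation table (not taken from the yellowfin analysis).
cv_table = [
    (0, 1.00, 0.020),
    (2, 0.90, 0.022),
    (5, 0.86, 0.023),
    (9, 0.85, 0.023),
    (14, 0.87, 0.024),
]
chosen = select_1se_tree(cv_table)
node_diversity = gini_index([0.2, 0.1, 0.7])
```

With these made-up values the minimum error occurs at 9 splits, but the 5-split tree lies within one standard error of it and is therefore the one selected.
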

Although classification trees have typically been developed on datasets whereby the response has consisted
of a categorical variable, such as the presence or absence of a species or the classification of two or more categories,
classification trees can also be developed for an $n \times p$ matrix of diet data, where $\pi_{ij}$ represents the proportion of prey,
$j$, consumed by predator, $i$, based on the recorded wet weights and subject to the constraint $\sum_{j=1}^{p} \pi_{ij} = 1$, i.e. the
proportions for each predator sum to 1. This type of model is equivalent to a multinomial model (McCullagh and Nelder
1983), where $\pi_{ij} = \Pr\{Y_i = j\}$ with a probability distribution represented as

$$\Pr\{Y_{i1} = y_{i1}, \ldots, Y_{ip} = y_{ip}\} = \binom{n_i}{y_{i1}, \ldots, y_{ip}} \pi_{i1}^{y_{i1}} \cdots \pi_{ip}^{y_{ip}} \tag{1}$$

and $\sum_{j} y_{ij} = n_i$ represents the number of predators consuming prey, $i$. We explored relationships between the prey
composition and matrix of explanatory variables, $X_i$, in the usual way through a logit link that linearly relates the
response to the explanatory variables through the following expression

$$\eta_{ij} = \log \frac{\pi_{ij}}{\pi_{ip}} = \alpha_j + \beta_j X_i \tag{2}$$

where $\alpha_j$ represents the mean proportion weight (log-scale) for each prey classification and $\beta_j$ represents a vector
of regression coefficients for $j = 1, 2, \ldots, J - 1$, describing the contribution of each explanatory variable in the model.
Put more simply, we modelled the prey proportions such that the probability of residing in the prey class represents a
function of the explanatory variables. Models of this form may be investigated for stomach contents data but they
rely on a-priori knowledge of how each explanatory variable should be represented in the model and what, if any,
interactions might be explored. Furthermore, the investigation of interactions may be difficult, especially for prey
groups that appear in low proportions. The number of prey groups that can be examined using a multinomial
regression approach may also be limited due to the sheer lack of data and missing values that either need to be
removed or imputed.

In the classification tree framework, we can fit a similar model to predict the prey distribution of a predator,
given important explanatory variables. If we consider a transformation of the data and let each row, $l$, of the
transformed dataset represent a unique predator-prey combination, where the proportion of a prey consumed by a
predator is calculated as a case weight, $W_l$, for the classification tree model, then we can transform the $n \times p$ matrix
of prey proportions into an ($L = n \times k$) vector of prey classes, $Y_l$ ($l = 1, \ldots, L$), with case weights, $W_l$, representing
the proportion of prey, $Y_l$, eaten by a predator. In this parameterisation, $k < p$ and represents the prey classes with
weights observed. To illustrate, consider prey eaten by a yellowfin tuna captured along the east coast of Australia.
For a particular yellowfin, we find that it consumes squid of the family Ommastrephidae, crustacea of the order
Decapoda and fishes from the family Carangidae in proportions of 0.2, 0.1 and 0.7 respectively. In the transformed
dataset, this predator would appear three times with prey labels of $Y_1$ = Ommastrephidae, $Y_2$ = Decapoda and
$Y_3$ = Carangidae and corresponding case weights of $W_1 = 0.2$, $W_2 = 0.1$ and $W_3 = 0.7$ respectively.
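
The transformation from one row per predator to one row per predator-prey combination can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `to_case_weighted_rows`, the predator identifier and the wet weights are hypothetical, with the weights chosen so that the resulting case weights match the proportions 0.2, 0.1 and 0.7 used in the example above.

```python
def to_case_weighted_rows(predator_id, prey_weights):
    """Expand one predator's prey wet weights into (predator, prey class, case weight) rows.

    prey_weights: dict mapping prey label -> wet weight (prey with zero weight are dropped).
    The case weight is the prey's share of the total wet weight in that stomach.
    """
    total = sum(prey_weights.values())
    if total <= 0:
        raise ValueError("stomach contains no prey weight")
    return [
        (predator_id, prey, weight / total)
        for prey, weight in prey_weights.items()
        if weight > 0
    ]


# Hypothetical wet weights (grams) chosen to give the proportions 0.2, 0.1 and 0.7.
stomach = {
    "Ommastrephidae": 20.0,
    "Decapoda": 10.0,
    "Carangidae": 70.0,
    "Scombridae": 0.0,  # not observed in this stomach, so no row is produced
}
rows = to_case_weighted_rows("YFT-001", stomach)
```

Each tuple in `rows` would then be passed to a classification tree as one observation, with the prey label as the response and the third element as its case weight.
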

We calculated two types of predictions for each unique yellowfin tuna, $i$, from this type of model. The first
is the predicted prey composition, $\hat{\pi}_{ij}$, which represents the predicted proportion of prey, $j$, for each unique
predator, $i$. The second is the predicted prey group, $\hat{Y}_i$ ($i = 1, \ldots, n$), where $\hat{Y}_i = \max\{\hat{\pi}_{i1}, \ldots, \hat{\pi}_{ip}\}$ and therefore represents
the prey group that yields the maximum predicted prey proportion.

Variable importance

We identified important variables that contribute to the model using a variable importance ranking, similar
to the approach described by Breiman et al. (1984). The ranking, $M(x_m)$, for a variable, $x_m$, is developed from
surrogate splits, $\tilde{s}_m$, appearing at node $t$ of the fitted classification tree model. Recall that surrogate splits represent
alternative splits with high correlation to the primary split and we used these to partition the data if missing data are
present. The reason for using surrogate variables in the calculation of variable importance is to provide recognition to
variables that are important predictors but are either masked or hidden by other variables in the model. That is, if two
variables are closely associated with one another and therefore provide a similar partitioning, although only one will
be selected by the model, the contributions of both are recognized in the variable importance ranking. Using the
notation defined in Breiman et al. (1984), the variable importance is defined for a variable, $x_m$, as the change in the
node impurity, $i$, as $t$ is partitioned by a surrogate split into left and right daughter nodes, which are represented by
$t_L$ and $t_R$, respectively. In other words, the importance of a variable in a tree-based model represents the
contribution that each variable makes as a surrogate to the primary split. We regard node impurity as the error in
predicting the prey group classified at node $t$. The difference in the impurity calculated at a parent node
with the impurities calculated at the left and right daughter nodes therefore provides an estimate of the error based on
the split. The variable importance calculation is mathematically expressed as

$$M(x_m) = \sum_{t \in T} \Delta i(\tilde{s}_m, t), \quad \text{where } \Delta i(\tilde{s}_m, t) = i(t) - i(t_L) - i(t_R) \text{ and } i(t) = \phi\big(p(1|t), \ldots, p(J|t)\big) \quad (J = \text{no. of classes}) \tag{3}$$

For each split of the tree and each variable, the node impurity is calculated at each node and summed across all nodes
of the tree. Thus, the final ranking represents a ranking relative to the variable that yielded the largest variable
importance.
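
As a rough illustration of Equation 3, the Python sketch below sums impurity decreases per variable and rescales them by the largest total. Several things here are assumptions rather than the authors' implementation: the Gini form 1 − Σ p², the function names, the class counts, and in particular the `weighted` option. With `weighted=False` the decrease follows the expression as printed, i(t) − i(t_L) − i(t_R); the default scales each daughter impurity by its share of the parent's cases, a weighting often used in CART software but not stated in the text.

```python
def gini(counts):
    """Gini impurity of a node given (weighted) class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n > 0 else 0.0


def impurity_decrease(parent, left, right, weighted=True):
    """Change in impurity when `parent` is split into `left` and `right`.

    weighted=False follows Equation 3 as printed: i(t) - i(t_L) - i(t_R).
    weighted=True scales each daughter by its share of the parent's cases (an assumption).
    """
    n = sum(parent)
    w_left = sum(left) / n if weighted else 1.0
    w_right = sum(right) / n if weighted else 1.0
    return gini(parent) - w_left * gini(left) - w_right * gini(right)


def relative_importance(surrogate_splits):
    """Sum the decreases per variable over all nodes, then scale by the largest total.

    surrogate_splits: list of (variable, parent_counts, left_counts, right_counts).
    """
    totals = {}
    for variable, parent, left, right in surrogate_splits:
        totals[variable] = totals.get(variable, 0.0) + impurity_decrease(parent, left, right)
    top = max(totals.values())
    return {variable: value / top for variable, value in totals.items()}


# Hypothetical class counts at two nodes (illustrative only, two prey classes).
splits = [
    ("SST", [50, 50], [45, 5], [5, 45]),
    ("Latitude", [50, 50], [35, 15], [15, 35]),
    ("SST", [45, 5], [45, 0], [0, 5]),
]
ranking = relative_importance(splits)
```

In this toy example SST accumulates the largest total decrease and receives a relative importance of 1, with latitude expressed as a fraction of it, mirroring how the ranking is reported relative to the top variable.
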

Bagging

Various bootstrapping techniques have been applied to classification trees to provide more accurate predictions due
to the instability noted with the greedy algorithm that produces the partitions of the tree (Breiman 1996; Freund and
Schapire 1996; Breiman 1998; Breiman 2001). Traditional bootstrap aggregation, referred to as bagging, develops
$B$ unpruned trees using re-samples of the data and forms predictions at each of the $B$ trees for each observation in
the dataset. Although unpruned trees are generally used, pruned versions can be considered, e.g. “stumps”, trees
containing one split. We can then form predictions by averaging across the $B$ trees to yield an aggregated (or
bagged) prediction and, as Kuhnert et al. (2010) demonstrated using Random Forests, we can calculate standard
errors. For visualisation we adopt the approach proposed by Kuhnert and Mengersen (2003), where the bagged
predictions are mapped back to the pruned tree for interpretation.

In terms of predicting diet composition, we extend the bootstrap aggregation approach of Breiman (1996)
by adopting a spatial bootstrap to account for spatial dependence in the data using the methods described by Hall
(1985) and Cressie (1993). The spatial bootstrap ensures that bootstrap samples of the data are stratified according to
a defined spatial resolution, typically one that is small enough such that the correlation between spatial grids is
negligible. We can formally test this assumption by fitting a variogram to the residuals resulting from the bootstrap
predictions. The approach therefore takes $B$ spatial bootstrap samples of the transformed data to form a resampled
vector of prey classes, $Y_l^*(b)$, with corresponding case weights, $W_l^*(b)$, representing the proportion of prey, $Y_l^*(b)$,
eaten by a predator as represented in each bootstrap sample, $b$. Averaged prey proportions are formed by taking the
mean of the bootstrapped proportions for each predator with a corresponding estimate of the variance as follows:

$$\hat{\pi}_{ij}^*(\cdot) = \frac{1}{B} \sum_{b=1}^{B} \hat{\pi}_{ij}^*(b) \quad (i = 1, \ldots, n;\ j = 1, \ldots, p), \qquad \mathrm{var}(\hat{\pi}_{ij}^*) = \frac{1}{B-1} \sum_{b=1}^{B} \left[ \hat{\pi}_{ij}^*(b) - \hat{\pi}_{ij}^*(\cdot) \right]^2 \tag{4}$$

The predicted prey therefore represents the bootstrapped average prey proportion yielding the maximum probability
and is represented as $\hat{Y}_i^*(\cdot)$ ($i = 1, \ldots, n$), where $\hat{Y}_i^*(\cdot) = \max\{\hat{\pi}_{i1}^*(\cdot), \ldots, \hat{\pi}_{ip}^*(\cdot)\}$.
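
The resampling and aggregation steps can be sketched in Python as below. The sketch makes two assumptions that go beyond the text: the spatial bootstrap is implemented as resampling with replacement within each grid cell, which is one simple reading of "stratified according to a defined spatial resolution", and the per-tree predictions are supplied as ready-made dictionaries rather than produced by fitted trees. All names and numbers are hypothetical.

```python
import random
from statistics import mean, variance


def spatial_bootstrap(rows, cell_of, rng):
    """Resample rows with replacement within each spatial grid cell (stratified bootstrap)."""
    by_cell = {}
    for row in rows:
        by_cell.setdefault(cell_of(row), []).append(row)
    sample = []
    for cell_rows in by_cell.values():
        sample.extend(rng.choices(cell_rows, k=len(cell_rows)))
    return sample


def bagged_summary(boot_props):
    """Average bootstrap prey proportions and their variance for one predator (Equation 4).

    boot_props: list of B dicts, each mapping prey class -> predicted proportion
    from one bootstrap tree. Returns (mean, variance, predicted prey class).
    """
    prey_classes = sorted(boot_props[0])
    avg = {j: mean(b[j] for b in boot_props) for j in prey_classes}
    var = {j: variance([b[j] for b in boot_props]) for j in prey_classes}  # divisor B - 1
    predicted = max(avg, key=avg.get)
    return avg, var, predicted


# Hypothetical stomach records tagged with a grid cell label.
rng = random.Random(1)
stomachs = [{"id": 1, "cell": "A"}, {"id": 2, "cell": "A"}, {"id": 3, "cell": "B"}]
resample = spatial_bootstrap(stomachs, lambda row: row["cell"], rng)

# Hypothetical predictions from B = 4 bootstrap trees for a single yellowfin.
boot = [
    {"Decapoda": 0.50, "Carangidae": 0.30, "Scombridae": 0.20},
    {"Decapoda": 0.60, "Carangidae": 0.25, "Scombridae": 0.15},
    {"Decapoda": 0.40, "Carangidae": 0.40, "Scombridae": 0.20},
    {"Decapoda": 0.50, "Carangidae": 0.25, "Scombridae": 0.25},
]
avg, var, predicted = bagged_summary(boot)
```

The square root of each entry in `var` gives a bootstrap standard error for the corresponding prey proportion, and `predicted` is the prey class with the largest averaged proportion.
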

Visualisation of results and partial dependence plots

We can visualize the bagged predictions using the pruned classification tree in a similar approach outlined by
Kuhnert and Mengersen (2003) where they took bootstrap samples of their data, fitted a regression tree to each re-sample, formed bootstrap predictions and mapped the predictions back to the original tree to examine the error in
each terminal node of the tree. Using this approach, we mapped the bagged predictions back to the intermediate and
terminal nodes of a pruned tree to visualise the bagged prey distribution in terms of the proportion of prey eaten and
produce summary statistics accordingly.

Partial dependence plots represent a useful visual aid for exploring the relationship between explanatory
variables and the predicted response. In terms of their calculation, partial dependence is calculated using the
bootstrap predictions, $\hat{\pi}_{ij}^*(b) \mid x_{\cdot j},\ x_{\cdot -j} = \bar{x}_{\cdot -j}$, for each variable $x_{\cdot j}$, conditional on holding all other explanatory
variables constant at their respective means. The approach used here is similar to that described by Breiman (2001).
The plotted results either represent a step function (for continuous variables) showing the relationship between the
explanatory variable and the predicted response, or alternatively, a barplot (for categorical variables) showing the
contribution of each category to the prey composition relative to the most dominant response category.

Results

A classification tree for diet data

The bagged classification tree approach was applied to the yellowfin dataset using the spatial and environmental
covariates described in the methods with the aim of predicting prey composition. The tree was pruned to 1 standard
error, yielding a cross-validated error rate of 0.859 (SE=0.023). The resulting model consists of terminal nodes with
colors reflecting the prey yielding the highest proportion in the prey composition (Figure 1). Prey codes used in the
figure are outlined in Table 1. Important splits are indicated by the length of the splits (Figure 1), with the longer
splits highlighting the importance of seasonal and spatial effects. The first split separates summer and winter
samples, followed by splits on latitudinal region separating the central and southern regions from the northern region
during summer months, and the central and northern from the southern region in winter months. Further splits were
on sea-surface temperature (SST), mixed layer depth and yellowfin length. The variable importance ranking
indicated SST had the highest rank (1.00) followed by latitude (0.74), season (0.51), fork length (0.49), mixed layer
depth (0.36) and low relative importance values contributed by longitude region (0.19) and moon phase (0.14).

A map highlighting prey diversity for the yellowfin tuna was based on the calculated Gini index, $i(t)$
(expression in Equation 3), that was used to partition the data and form splits of the tree (Figure 2). Values of the Gini
index, hereafter termed diversity, ranged between 0 and 1 where low values (purple and blue in Figure 2) indicated a
dominant prey and high values (yellow and orange) indicated highly diversified prey species composition. We
showed the observed diversity for each yellowfin and a smoothed representation using the results from a
generalized additive model (GAM) fitted to latitude and longitude using smoothing splines in Figures 2(a) and 2(b)
respectively. Overall, high diversity is represented amongst the yellowfin sampled, indicating that the yellowfin ate a
varied diet. Of notable interest was the high diversity of prey predicted by the GAM in the diets of yellowfin sampled
in the southern areas of the map compared to central and northern regions.

Predicting diet composition

Bootstrapping was performed to provide a bagged predicted diet composition for each yellowfin tuna with
corresponding standard errors. We investigated the residuals from the fitted classification tree to determine whether
spatial dependence had been adequately captured and for this application, spatial bootstrapping was not required. We
showed the results from bagging in Figure 3 (panels labeled bootstrapped proportions) based on 500 trees developed
on bootstrap samples of the data. This number was based on investigations by Kuhnert and Mengersen (2003) to
ensure sampling error was minimized. Predictions were mapped to the terminal nodes of the pruned tree (Figure 1),
providing a summary of the predictions.

We used Figure 1 to facilitate interpretation of the bagged estimates and examined internal and terminal
nodes of the tree to make sense of the predictions produced. To illustrate the method, we selected a terminal node
from the tree in Figure 1 (node 8) and an internal node (node 6). We showed in Figure 3(a) the results from the
internal node (6) representing the location of samples taken in winter months in the southern region off the east coast
of Australia. The observed proportion of prey consumed by the tuna at that node is presented along with the
corresponding bootstrap predictions. The results indicated a diverse prey composition (Gini diversity index of 0.795),
highlighting yellowfin that eat a range of prey in low proportions but with high precision as demonstrated from the
narrow bootstrap percentile intervals (Figure 3(a)). The most dominant prey appearing in this node were crustacean
decapods, and scomberesocid, carangid, gempylid, scombrid and tetraodontid fishes. The predicted prey
composition for terminal node (8) of the tree (Figure 3(b)) showed lower prey diversity (Gini diversity index of
0.542), highlighting crustacean decapods as the most dominant prey (and prey class prediction), with some
uncertainty surrounding that prediction as demonstrated by the wide bootstrap percentile intervals. All other prey
appeared in lower proportions. Bootstrap predictions for all terminal nodes of the tree are shown in Figure 4.
Although some nodes clearly showed a dominant prey (nodes 8 and 29), most terminal nodes comprised 2 or 3
dominant prey, but predicted in low proportions. Of note was the prevalence of myctophid fishes in northern waters
during summer (node 5), highlighting the method’s usefulness in distinguishing prey differences at smaller spatial
and temporal scales.
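
The bootstrap percentile intervals referred to above can be computed in several ways, and the text does not say which was used. As one possibility, the Python sketch below takes the interval endpoints as quantiles of the bootstrap proportions, with linear interpolation between order statistics. The function name, the 80% level and the proportions are all hypothetical.

```python
def percentile_interval(values, level=0.95):
    """Bootstrap percentile interval via linear interpolation between order statistics."""
    ordered = sorted(values)
    n = len(ordered)

    def quantile(p):
        # Position of the p-th quantile on the 0..n-1 index scale.
        pos = p * (n - 1)
        lower = int(pos)
        upper = min(lower + 1, n - 1)
        return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


# Hypothetical bootstrap proportions of one prey group at one node (B = 11 for brevity).
boot_props = [0.50, 0.42, 0.55, 0.47, 0.61, 0.38, 0.52, 0.49, 0.58, 0.44, 0.53]
low, high = percentile_interval(boot_props, level=0.80)
```

A narrow interval (`high - low` small) corresponds to the high precision described for node 6, and a wide one to the greater uncertainty described for node 8.
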

Investigating relationships between predictor variables and diet composition

Partial dependence plots were constructed to examine the relationship between each of the predictor variables and the
predicted prey composition for a subset of prey. We presented partial dependence plots for the 3 most important
variables, sea-surface temperature, length and latitude. We investigated the interaction between latitude and
longitude to aid with the interpretation. Partial dependence plots for SST and length (Figures 5 and 6 respectively),
are shown for a subset of prey, crustacean decapods, and fishes from the families Carangidae, Gempylidae,
Scombridae, Scomberesocidae and Tetraodontidae, which were identified as important in nodes 6 and 8 of the tree
(Figure 3). Rug plots, represented by black segments at the base of the graph, showed the distribution of samples
collected for each variable. Reduced proportions of scomberesocids (blue) and carangids (green) were noted in
warmer waters compared to cooler waters, whereas greater proportions were noted for scombrids (orange) in warmer
waters up to 25°C (Figure 5). Crustacean decapods (black) appeared in higher proportions as SST increased. In
contrast, gempylid (gold) and tetraodontid (red) fishes appeared in small proportions consistently across the range of
sea-surface temperatures.
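
The calculation behind a partial dependence curve of this kind can be sketched in Python as follows: the variable of interest is moved along a grid while every other covariate is held at its mean, and the model's predicted proportion is recorded at each grid value. The `toy_tree` function is a hypothetical stand-in for a fitted tree, and the covariate values are invented. In a bagged setting the same curve would be computed for each bootstrap tree and then averaged.

```python
from statistics import mean


def partial_dependence(predict, rows, variable, grid):
    """Predicted proportion along `grid` for `variable`, other covariates at their means.

    predict: callable taking a dict of covariates and returning a proportion.
    rows: list of dicts of continuous covariates (the observed data).
    """
    baseline = {name: mean(row[name] for row in rows) for name in rows[0]}
    curve = []
    for value in grid:
        profile = dict(baseline)
        profile[variable] = value
        curve.append((value, predict(profile)))
    return curve


def toy_tree(x):
    """Hypothetical stand-in for a fitted tree: decapod proportion steps up with SST."""
    if x["SST"] < 20.0:
        return 0.10
    return 0.30 if x["Length"] < 140.0 else 0.20


# Hypothetical covariate values (not the survey data).
data = [
    {"SST": 16.0, "Length": 100.0},
    {"SST": 22.0, "Length": 130.0},
    {"SST": 27.0, "Length": 160.0},
]
curve = partial_dependence(toy_tree, data, "SST", [15.0, 18.0, 21.0, 24.0, 27.0])
```

Because a tree predicts a constant within each partition, the resulting curve is a step function of SST, which matches the form of the plots described for continuous variables.
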
Figure 6 shows the contribution of yellowfin fork length to the predicted proportions of prey across a
yellowfin size range of 73 cm to 170 cm, with an outlier at 214 cm. Greater proportions of scomberesocids
(blue) were found in smaller yellowfin compared to larger fish, whereas consistent proportions, between about 0.1 and
0.25, were noted for carangids (green), scombrids (orange) and decapods (black) for fish between 100 cm and 160
cm in length. Tetraodontid (red) and gempylid (gold) fishes were consistently predicted in low proportions across the
range of yellowfin tuna sizes. Although there was a marked decline predicted for decapods as fork length increased
(Figure 6), this was more likely an artifact of the data (or lack of data for large fish) than related to biological factors.
We showed the joint partial dependence on latitude and longitude in Figure 7 for four prey identified in
node 6 of the tree. The predicted range of proportions is between 0 and 0.4 for each prey group. The plots indicated
greater proportions of decapods in the inshore, central and southern regions, with lower proportions noted in offshore
regions and survey sites to the north. Greater proportions were also noted in the southern inshore region for
scombrids. Carangid fishes were more prevalent as prey in offshore regions and regions to the north, whereas
tetraodontid fishes appeared in greater proportions offshore and in regions to the south.

Discussion

The yellowfin analyzed in this study exhibited a diverse diet composition, rarely dominated by any one particular
prey taxon, a similar conclusion to that reached previously for yellowfin tuna off eastern Australia (Young et al.
2010). However, there were a number of novel insights delivered by this analysis. For example, significant
differences were noted in the prey of yellowfin tuna sampled north of 30°S in summer, where oligotrophic
(nutrient-poor) waters dominate (Ridgway 2007). These fish fed mainly on small crustacean megalopae and lanternfish. The
predicted expansion of these waters in future years may result, therefore, in a diet shift to smaller crustaceans and
fish. How this will impact on stocks of yellowfin tuna in the region needs further study, but should be considered in
modelling scenarios (e.g. Griffiths et al., 2010). Other insights included spatial mapping of prey diversity, and the
relationships between individual prey taxa and environmental correlates. For example, carangid fishes (F.Car) were
increasingly dominant prey in areas with decreased SST. As tunas and other large predators are considered effective
samplers of smaller prey species, these analyses could be used to build a better understanding of the environmental
tolerances of many lower-trophic-level species for which there is only limited information. A similar analysis
being conducted for yellowfin tuna in the eastern Pacific Ocean (R. Olson and colleagues, personal communication)
has, in contrast, found less diverse food habits than those of the Australian-caught yellowfin. The diet of eastern
Pacific tuna was dominated by only a few prey taxa, depending on temporal, spatial, environmental and biological
covariates.
Classification trees are highly suited for analysing data that may include non-linearity, outliers, high order
interactions, lack of balance and missing values, which are all relevant to diet data. In this context, we present a
model for predicting diet composition that is both exploratory and predictive. From an exploration perspective, we
produced an easily interpretable classification tree accompanied by partial dependence plots for exploring the
relationship between the explanatory variables and the predicted response. From a prediction perspective, we predicted
the composition of prey consumed by each predator and produced a predicted prey distribution at each terminal node
of the tree, with bootstrap percentile intervals.
This methodology therefore provides a significant improvement over existing methods that are purely exploratory or
that involve multiple methodologies to determine relationships between the explanatory variables and the response, and
it can be used as an effective tool in broader analyses of environmental variables. By providing mapped
distributions of prey composition and diversity, the method presents a baseline for future comparison, particularly
relevant in understanding distribution changes with respect to climate change (Chen et al. 2011). Moreover, this
methodology represents a tool for hypothesis development involving potential changes in prey biodiversity under
model scenarios of climate change (e.g. Polovina et al. 2011).
One potential limitation of this approach is its limited ability to perform fine-scale analyses of any single dietary
predictor. An example is the investigation of predator length, which was highlighted by this method as an
important predictor and shown through partial dependence plots to have distinct relationships with prey species.
Methods such as quantile regression (Koenker and Bassett 1978; Scharf et al. 1998) can be a useful accompaniment
to this broader-scale analysis to investigate the relationships between tuna length and minimum, median and
maximum prey lengths in relation to dietary differences, as highlighted in Young et al. (2010). We therefore
recommend using this tree-based methodology initially in a broader-scale analysis, to identify specific biological
questions relating to dietary differences that can then be explored using other techniques.
A second potential limitation of the tree-based approach presented here is its limited ability to handle dependent
observations. Although not an issue for the data presented in this manuscript, pseudoreplication, where multiple fish
are sampled from the same purse-seine set, can induce a bias in the tree-based model. In a regression analysis, the
bias can be accounted for through the random effects of a generalized linear mixed model. In the case of trees, we
can overcome this issue by implementing a sub-sampling approach within the bootstrap procedure. This approach
randomly samples a subset of observations from each set, fits the tree model and compares the predictions with the
observed proportions using a Hellinger distance to identify whether bias is an issue for the sample size taken and
whether an analysis of the entire set of data is reasonable.
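
Two building blocks of this sub-sampling check can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation used here: the record layout (`set_id`, `prey`), the function names and the toy data are hypothetical, and the tree-fitting step is omitted, so the prey proportions of the sub-sample stand in for model predictions. The Hellinger distance between two discrete distributions p and q is taken in its standard form, the square root of half the sum of squared differences between the square roots of p and q, which lies between 0 and 1.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List


def hellinger(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Hellinger distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(total / 2.0)


def subsample_per_set(
    records: List[dict], per_set: int, rng: random.Random
) -> List[dict]:
    """Draw at most `per_set` records at random from each fishing set."""
    by_set = defaultdict(list)
    for rec in records:
        by_set[rec["set_id"]].append(rec)
    sample: List[dict] = []
    for set_id in sorted(by_set):
        members = by_set[set_id]
        sample.extend(rng.sample(members, min(per_set, len(members))))
    return sample


def prey_proportions(records: List[dict]) -> Dict[str, float]:
    """Observed proportion of each prey group among the records."""
    counts: Dict[str, int] = defaultdict(int)
    for rec in records:
        counts[rec["prey"]] += 1
    return {prey: c / len(records) for prey, c in counts.items()}


# Hypothetical data: set "A" contributes six fish, sets "B" and "C" one each.
records = (
    [{"set_id": "A", "prey": "F.Car"} for _ in range(6)]
    + [{"set_id": "B", "prey": "Cr.Dec"}, {"set_id": "C", "prey": "Cr.Dec"}]
)

rng = random.Random(0)
sub = subsample_per_set(records, per_set=1, rng=rng)
full_props = prey_proportions(records)
sub_props = prey_proportions(sub)
distance = hellinger(full_props, sub_props)
print(full_props, sub_props, round(distance, 3))
```

In this toy case the heavily sampled set dominates the pooled proportions, and the non-zero distance between the pooled and sub-sampled compositions is the kind of signal that would suggest dependence within sets is influencing the estimates.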
Although the classification tree methodology can be applied to quite varied diet data with a large variety of
prey taxa, we recommend selecting a threshold, say 1%, and applying the methodology to stomach samples where the
percent of wet weight is greater than the nominated threshold. We found that applying this threshold gave
more informative analyses, as it omitted many rare prey groups and outliers that tended to make interpretation of
model outputs difficult. Although we illustrated the method using diet data for a single predator, the
methodology can be easily applied to a multi-predator species dataset, in which stomachs from more than one
predator from a region are sampled. The inclusion of multiple predator species simply introduces an additional
explanatory variable into the model indicating the predator species. The classification tree therefore has the capacity
to split on the predator species variable if it identifies a different prey composition for the different species being
analyzed.
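
The thresholding step recommended above can be sketched as a simple filter. This is a hedged illustration only: the data layout (a mapping from a stomach identifier to the wet weight of each prey group), the function name `filter_by_wet_weight` and the example weights are hypothetical, and the sketch assumes the threshold is applied to each prey group's percentage of the total wet weight within a stomach, which is one possible reading of the recommendation.

```python
from typing import Dict

Stomachs = Dict[str, Dict[str, float]]


def filter_by_wet_weight(stomachs: Stomachs, threshold_pct: float = 1.0) -> Stomachs:
    """Keep prey whose share of a stomach's wet weight exceeds the threshold."""
    kept: Stomachs = {}
    for stomach_id, prey_weights in stomachs.items():
        total = sum(prey_weights.values())
        if total <= 0:
            continue  # empty stomach: nothing to analyse
        retained = {
            prey: weight
            for prey, weight in prey_weights.items()
            if 100.0 * weight / total > threshold_pct
        }
        if retained:
            kept[stomach_id] = retained
    return kept


# Hypothetical wet weights (g) per prey group in two stomachs.
stomachs = {
    "s1": {"F.Car": 99.5, "Cr.Dec": 0.5},
    "s2": {"F.Scom": 40.0, "F.Tet": 60.0},
}
filtered = filter_by_wet_weight(stomachs, threshold_pct=1.0)
print(filtered)
```

A multi-predator dataset would need no change to this step; the predator species would simply be carried along as one more explanatory variable for the tree.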
The methodology presented here extends the classification tree methodology of
Breiman et al. (1984) by incorporating spatial bootstrap techniques (where required) within a bagged approach to
provide bootstrap predictions and standard errors that can be mapped back to a pruned tree for visualization or used
to form predictions of prey distributions for each predator in the dataset. The partial dependence plots provide a
valuable addition for visualizing relationships between each explanatory variable and the predicted proportions of
each prey of interest. Although it is tempting to extrapolate beyond the range of the data to provide forecasts of the
prey composition, such predictions are not based on data and must be treated with caution. To ensure predictions do
not fall outside the range of the data, we have restricted predictions to only those based on the data provided. For
example, in Figure 7, we have only provided predictions for the regions that were sampled, as indicated by the black
points shown on each map.
The choice of explanatory variables to include in this analysis requires some preliminary investigations.
Although the inclusion of additional variables in a model may lead to low error rates, they may also introduce
complex and noisy splits that are very difficult to interpret. For example, the use of the spatial variables latitude and
longitude in their continuous form made the tree difficult to interpret. As an alternative, we created categorical
versions of these variables that made more sense ecologically when examining the prey composition at each split of
the tree. A second example is the consideration of a year term in the model. While data were collected over a 14-year
period (1992-2006), splits on year were not interpretable ecologically as samples were not collected homogeneously
across sampling years and locations. As balanced sampling regimes are not always possible, we recommend that
prior to any classification tree modeling, a thorough exploratory analysis be conducted on the explanatory variables
to be incorporated into the model.

Conclusions

The classification tree methodology provides new insights into the complex feeding habits of top predators like
yellowfin tuna. Compared to existing approaches, this methodology offers a robust approach for analysing diet data,
providing exploratory summaries in addition to predicted prey composition that can be useful for examining
differences between predator feeding characteristics in different spatial regions, temporal zones and environmental
regimes. The bootstrap implementation also provides a way of incorporating uncertainty into the model by providing
bootstrap percentile intervals around the estimated diet composition. By providing mapped distributions of prey
composition and diversity, the method presents a baseline for future comparison, particularly relevant in
understanding distribution changes with respect to climate change. This method shows promise as a framework for
clarifying what have heretofore been approximations of the trophic structure underlying ecologically important large
marine ecosystems, which is an essential prerequisite for understanding future effects from environmental and
anthropogenic forces.

Software and code

The methodology presented in this paper was implemented in R (R Development Core Team 2005), making
use of the rpart (Therneau et al. 2009) and maps (Becker et al. 2010) packages from the CRAN website. A diet R
package that implements the methodology presented in this paper is currently under development with a
corresponding publication, details of which can be obtained from the first author.
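
The bootstrap summaries reported in this paper (bagged proportions, standard errors and percentile intervals) can be illustrated with a short stand-alone sketch. It is written in Python rather than R purely for illustration and is not the code used in this study: the function names, the toy prey labels and the simple order-statistic rule for the interval are assumptions, and the tree-fitting step is replaced by the observed prey proportions within a single hypothetical node.

```python
import random
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def bootstrap_proportions(
    prey_labels: Sequence[str], n_boot: int, rng: random.Random
) -> Dict[str, List[float]]:
    """Resample the labels with replacement; record prey proportions each time."""
    n = len(prey_labels)
    classes = sorted(set(prey_labels))
    draws: Dict[str, List[float]] = {c: [] for c in classes}
    for _ in range(n_boot):
        resample = [prey_labels[rng.randrange(n)] for _ in range(n)]
        for c in classes:
            draws[c].append(resample.count(c) / n)
    return draws


def percentile_interval(
    values: Sequence[float], level: float = 0.95
) -> Tuple[float, float]:
    """Bootstrap percentile interval using a simple order-statistic rule."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    lower = ordered[int(alpha * (len(ordered) - 1))]
    upper = ordered[int((1.0 - alpha) * (len(ordered) - 1))]
    return lower, upper


# Hypothetical node containing 100 stomachs labelled by dominant prey group.
labels = ["F.Car"] * 30 + ["Cr.Dec"] * 50 + ["F.Scom"] * 20
draws = bootstrap_proportions(labels, n_boot=500, rng=random.Random(1))

summary = {
    prey: {
        "bagged": mean(vals),           # bagged (averaged) proportion
        "se": stdev(vals),              # bootstrap standard error
        "interval": percentile_interval(vals),
    }
    for prey, vals in draws.items()
}
for prey, stats in summary.items():
    print(prey, stats)
```

In a full analysis each bootstrap sample would be used to refit the tree before predictions are collected, and a spatial bootstrap would resample in a way that respects spatial dependence; neither refinement is attempted in this sketch.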

Acknowledgements

The authors wish to acknowledge Ross Sparks and the three anonymous reviewers who kindly reviewed this
manuscript. We acknowledge Alex Aires-DaSilva for making the R code and resources available for producing the
maps in this paper. We thank Christine Patnode for assistance with the graphics. We thank skippers from the Eastern
Tuna and Billfish fishery for help in collecting the yellowfin samples; Matt Lansdell and S. Riddoch (CSIRO) for
taxonomic support; and Scott Cooper (CSIRO) for structuring the database that housed this information. We thank
Alistair Hobday, Klaus Hartmann, Jason Hartog and Sophie Bestley for developing and making available the SDODE
interface for exploring global data sets. Finally, we acknowledge the funding support from CSIRO through the Julius
Award that was granted to the first author.

References

Baker R, Sheaves M (2005) Redefining the piscivore assemblage of shallow estuarine nursery habitats. Marine
Ecology Progress Series 291: 197-213. doi:10.3354/meps291197
Becker RA, Wilks AR, Brownrigg R, Minka TP (2010) maps: Draw Geographical Maps. R package version 2.1-5,
http://CRAN.R-project.org/package=maps
Breiman L (1996) Bagging Predictors. Machine Learning 24: 123-140. doi: 10.1023/A:1018054314350
Breiman L (1998) Arcing Classifiers (with Discussion). Annals of Statistics 26: 801-824. doi:10.2307/120055
Breiman L (2001) Random Forests. Machine Learning 45: 5-32. doi: 10.1023/A:1010933404324
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees. Wadsworth, Belmont,
California.
Chipps SR, Garvey JE (2007) Assessment of diets and feeding patterns. In: Guy CS, Brown ML (eds) Analysis and
interpretation of Freshwater Fisheries Data. American Fisheries Society, Bethesda, Maryland, USA, pp 473-514
Christensen V, Walters CJ (2004) Ecopath with Ecosim: methods, capabilities and limitations. Ecological
Modelling 172(2-4): 109-139. doi: 10.1016/j.ecolmodel.2003.09.003
Cressie NAC (1993) Statistics for Spatial Data. Wiley, New York
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Kauffman M (ed) Machine Learning:
Proceedings of the Thirteenth International Conference, San Francisco, pp 148-156
Fulton EA, Smith ADM, Smith DC (2007) Alternative Management Strategies for Southeast Australian
Commonwealth Fisheries: Stage 2: Quantitative Management Strategy Evaluation. CSIRO Marine and Atmospheric
Research, Hobart, Tasmania
Griffiths SP, Fry GC, Manson FJ, Pillans RD (2007) Feeding dynamics, consumption rates and daily ration of
longtail tuna (Thunnus tonggol) in Australian waters, with emphasis on the consumption of commercially important
prawns. Marine and Freshwater Research 58: 376-397. doi: 10.1071/MF06197
Griffiths SP, Kuhnert PM, Fry GF, Manson FJ (2009) Temporal and size-related variation in the diet, consumption
rate and daily ration of mackerel tuna (Euthynnus affinis) in neritic waters of eastern Australia. ICES Journal of
Marine Science 66: 720-733. doi: 10.1093/icesjms/fsp065
Griffiths SP, Young JW, Lansdell JW, Campbell RA, Hampton J, Hoyle SD, Langley A, Bromhead D, Hinton MG
(2010) Ecological effects of longline fishing and climate change on the pelagic ecosystem off eastern Australia.
Reviews in Fish Biology and Fisheries 20: 239-272. doi: 10.1007/s11160-009-9157-7
Hall P (1985) Resampling a coverage pattern. Stochastic Processes and their Applications 20: 231-246
Hobday AJ, Hartmann K, Hartog J, Bestley S (2006) SDODE: spatial dynamics ocean data explorer. User
Guide. CSIRO Marine and Atmospheric Research, Hobart, 8 pp
Hartog J, Hobday AJ, Matear R, Feng M (2011) Habitat overlap of southern bluefin tuna and yellowfin tuna in the
east coast longline fishery - implications for present and future spatial management. Deep Sea Research II 58:
746-752. doi: 10.1016/j.dsr2.2010.06.005
Chen I-C, Hill JK, Ohlemüller R, Roy DB, Thomas CD (2011) Rapid range shifts of species
associated with high levels of climate warming. Science 333: 1024-1026. doi: 10.1126/science.1206432
Koenker RW, Bassett GW (1978) Regression quantiles. Econometrica 46: 33-50
Kuhnert PM, Kinsey-Henderson A, Bartley R, Herr A (2010) Incorporating uncertainty in gully erosion calculations
using the random forests modelling approach. Environmetrics 21: 493-509. doi: 10.1002/env.999
Kuhnert PM, Mengersen K (2003) Reliability measures for local nodes assessment in classification trees. Journal of
Computational and Graphical Statistics 12: 398-416. doi: 10.1198/1061860031734
Logan JM, Rodriguez-Marin E, Goñi N, Barreiro S, Arrizabalaga H, Golet W, Lutcavage M (2011) Diet of young
Atlantic bluefin tuna (Thunnus thynnus) in eastern and western Atlantic foraging grounds. Marine Biology 158:
73-85. doi: 10.1007/s00227-010-1543-0
Marasco RJ, Goodman D, Grimes CB, Lawson PW, Punt AE, Quinn TJI (2007) Ecosystem-based fisheries
management: some practical suggestions. Canadian Journal of Fisheries and Aquatic Sciences 64: 928-939.
doi: 10.1139/F07-062
Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic, London
McCullagh P, Nelder JA (1983) Generalized linear models. Chapman and Hall, London
McCulloch CE, Searle SR (2001) Generalized, linear and mixed models. John Wiley and Sons, New York
Ménard F, Labrune C, Shin Y-J, Asine A-S, Bard F-X (2006) Opportunistic predation in tuna: a size-based approach.
Marine Ecology Progress Series 323: 223-231. doi: 10.3354/meps323223
Olson RJ, Galván-Magaña F (2002) Food habits and consumption rates of common dolphinfish (Coryphaena
hippurus) in the eastern Pacific Ocean. U.S. National Marine Fisheries Service, Fisheries Bulletin 100: 279-298
Olson RJ, Watters GM (2003) A model of the pelagic ecosystem in the eastern tropical Pacific Ocean.
Inter-American Tropical Tuna Commission, Bulletin 22: 133-218
Pikitch EK, Santora C, Babcock EA, Bakun A, Bonfil R, Conover DO, Dayton P, Doukakis P, Fluharty D, Heneman
B, Houde ED, Link J, Livingston PA, Mangel M, McAllister MK, Pope J, Sainsbury KJ (2004) Ecosystem-based
fishery management. Science 305: 346-347. doi: 10.1126/science.1098222
Polovina JJ, Dunne JP, Woodworth PA, Howell EA (2011) Projected expansion of the subtropical biome
and contraction of the temperate and equatorial upwelling biomes in the North Pacific under global warming. ICES
Journal of Marine Science. doi: 10.1093/icesjms/fsq198
Potier M, Marsac F, Cherel Y, Lucas V, Sabatie R, Maury O, Ménard F (2007) Forage fauna in the diet of three large
pelagic fishes (lancetfish, swordfish and yellowfin tuna) in the western equatorial Indian Ocean. Fisheries Research
83: 60-72. doi: 10.1016/j.fishres.2006.08.020
R Development Core Team (2005) R: A language and environment for statistical computing, reference index version
2.12.1. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
http://www.R-project.org
Ridgway KR (2007) Long-term trend and decadal variability of the southward penetration of the East Australian
Current. Geophysical research letters 34: 5pp. doi: 10.1029/2007GL030393
Scharf FS, Juanes F, Sutherland M (1998) Inferring ecological relationships from the edges of scatter diagrams:
comparison of regression techniques. Ecology 79: 448-460
Therneau TM, Atkinson B, R port by Brian Ripley (2009) RPART: Recursive Partitioning. R package version
3.1-45, http://CRAN.R-project.org/package=rpart
Young JW, Hobday AJ, Campbell RA, Kloser RJ, Bonham PI, Clementson LA, Lansdell MJ (2011) The biological
oceanography of the East Australian Current and surrounding waters in relation to tuna and billfish catches off
eastern Australia. Deep Sea Research II 58: 720-733. doi: 10.1016/j.dsr2.2010.10.005
Young JW, Lamb TD, Bradford R, Clementson LA, Kloser RJ, Galea H (2001) Yellowfin tuna (Thunnus albacares)
aggregations along the shelfbreak of southeastern Australia: links between inshore and offshore processes. Marine
and Freshwater Research 52. doi: 10.1071/MF99168
Young JW, Lansdell JW, Riddoch S, Revill A (2006) Feeding ecology of broadbill swordfish, Xiphias gladius
(Linnaeus, 1758), off eastern Australia in relation to physical and environmental variables. Bulletin of Marine
Science 79: 793-811
Young JW, Lansdell MJ, Campbell RA, Cooper SP, Juanes F, Guest MA (2010) Feeding ecology and niche
segregation in oceanic top predators off eastern Australia. Marine Biology 157: 2347-2368. doi:
10.1007/s00227-010-1500-y
Zuur AF, Ieno EN, Smith GM (2007) Analysing Ecological Data. Springer, New York

Figure Captions

Figure 1: Pruned classification tree that predicts the diet composition of yellowfin tuna off the east coast of Australia.
The prey groups identified at each terminal node are those with the highest proportion composition among a suite of
prey in the diet. Colors represent broad taxonomic groupings: light blue represents salps, dark blue represents squids,
red represents crustaceans, and green and yellow represent fish. Covariates used to develop the tree were Season
(Summer and Winter), Latitude Region (LT.reg: Central, South and North), Longitude Region (LN.reg: Offshore and
Inshore), sea-surface temperature (SST), mixed layer depth (MLD) and fork length (Length). Refer to Table 1 for
prey group abbreviations. This tree yielded a cross-validated error rate of 0.859 ± 0.023.

Figure 2: Map of diversity values ranging between 0 and 1 showing the diversity of the distribution of prey predicted
for each yellowfin tuna. Diversity is mapped as (a) a set of points corresponding to individual predators and (b) a
spatially interpolated map using generalized additive modelling, overlaid with contours showing the standard errors
of the predicted diversity and points indicating sampling locations.

Figure 3: Observed and bagged prey proportions for (a) node 6, representing an internal node of the classification
tree and (b) node 8, representing a terminal node of the classification tree. The largest predicted prey class is colored
according to the legend, and the locations of samples are portrayed on the map, with the diversity index, D, listed
beneath the map. 95% bootstrap percentile intervals are shown for each figure.

Figure 4: Terminal node bootstrapped predictions and 95% bootstrap percentile intervals are presented for each node
in the pruned classification tree for the 19 prey examined.

Figure 5: Partial dependence plots showing the relationship between SST and the predicted proportion for
Crustaceans (Cr.Dec), and fishes from the families Scomberesocidae (F.Scombs), Carangidae (F.Car), Gempylidae
(F.Gem), Scombridae (F.Scom), and Tetraodontidae (F.Tet). A rug plot is shown beneath the plot to indicate the
distribution of SST measurements taken.

Figure 6: Partial dependence plots showing the relationship between fork length and the predicted proportion for
Crustaceans (Cr.Dec), and fishes from the families Scomberesocidae (F.Scombs), Carangidae (F.Car), Gempylidae
(F.Gem), Scombridae (F.Scom), and Tetraodontidae (F.Tet). A rug plot is shown beneath the plot to indicate the
distribution of length measurements taken.

Figure 7: Partial dependence plots of four dominant prey appearing in node 6 of the classification tree showing the
relationship between the spatial variables and the predicted proportion of (a) Cr.Dec, (b) F.Scombs, (c) F.Car and (d)
F.Tet. Predictions are based on averaging across all other variables in the model.

Tables
Table 1: Prey groupings, broad categorizations and codes used in the analysis of yellowfin tuna diets.
Prey Group         Broad Categorisation   Code
Salpidae           Salps                  S-Sal
Octopodidae        Molluscs               Ce-Oct
Argonautidae       Molluscs               Ce-Arg
Ommastrephidae     Molluscs               Ce-Om
Decapoda           Crustaceans            Cr-Dec
Clupeidae          Fishes                 F-Clu
Emmelichthyidae    Fishes                 F-Emm
Alepisauridae      Fishes                 F-Ale
Myctophidae        Fishes                 F-Myc
Scomberesocidae    Fishes                 F-Scombs
Hemiramphidae      Fishes                 F-Hemi
Carangidae         Fishes                 F-Car
Bramidae           Fishes                 F-Bra
Gempylidae         Fishes                 F-Gem
Scombridae         Fishes                 F-Scom
Nomeidae           Fishes                 F-Nom
Monacanthidae      Fishes                 F-Mon
Ostraciidae        Fishes                 F-Ost
Tetraodontidae     Fishes                 F-Tet