AN EFFICIENCY APPROACH TO CHOICE SET GENERATION

AN EFFICIENCY APPROACH TO CHOICE SET GENERATION
by
David Scrogin, Richard Hofler, Kevin Boyle, and J. Walter Milon
Corresponding author:
David Scrogin
Department of Economics
University of Central Florida
Orlando, FL 32816-1400
(407) 823-4129 voice
(407) 823-3269 fax
[email protected]
October 20, 2004
AN EFFICIENCY APPROACH TO CHOICE SET GENERATION
DAVID SCROGIN, RICHARD HOFLER, KEVIN BOYLE, AND J. WALTER
MILON
Proper formation of individual choice (or consideration) sets is an ongoing concern of
applied researchers. For destination choice analyses, criteria defined by travel time, distance,
or site quality have a behavioral appeal, yet are fundamentally limited by their
unidimensional scope. To remedy their shortcomings in a framework consistent with
individual optimizing behavior, we develop a multi-variate econometric frontiers approach
to individual choice set formation. Estimating stochastic frontier models relating costly trip
inputs to utility generating outputs, choice sets are formed with respect to the deviation of
sites from the efficient frontier. The composition of the choice sets, efficiency of site choice
and probability of selection, and estimates of consumer surplus are contrasted with results
obtained under deterministic choice set criteria.
Key words: Choice sets, destination choice, discrete choice models, stochastic frontier
models.
___________________________
David Scrogin is assistant professor, Department of Economics, University of Central
Florida. Richard Hofler is professor, Department of Economics, University of Central
Florida. Kevin Boyle is Libra Professor of Environmental Economics, Department of
Resource Economics and Policy, University of Maine. J. Walter Milon is Provost’s
Distinguished Research Professor, Department of Economics, University of Central Florida.
An earlier version of the paper was presented at the 2004 meetings of the American
Agricultural Economics Association. We thank Glenn Harrison, Shelby Gerking, Michael
Caputo, and Mark Dickie for helpful comments. The data collection was funded by the
Maine Agricultural and Forest Experiment Station and the Maine Department of Inland
Fisheries and Wildlife. The views in this paper do not necessarily represent those of the
Maine Department of Inland Fisheries and Wildlife or its personnel. All errors and
interpretations are the responsibility of the authors.
AN EFFICIENCY APPROACH TO CHOICE SET GENERATION
Proper formation of individual choice (or consideration) sets is an ongoing concern of
applied researchers. While factorial design algorithms and efficiency statistics assist in
choice set design for stated choice experiments (e.g., conjoint analysis and multi-attribute
contingent valuation), such tools elude the analyst of revealed preference (RP) data. Instead,
the individual decision maker’s true choice set is not directly observed. As a result, particular
care must be exercised in constructing choice sets for RP analyses. The task is further
confounded when the underlying universal choice set–such as recreational sites, houses, or
shopping locations–comprises a large number of feasible alternatives.
Several recent studies of destination choice have proposed criteria defined by distance
or travel time (trip inputs) and destination quality (trip outputs) for generating choice sets in
situations where the universal choice set contains hundreds of sites (Parsons and Hauber;
Whitehead and Haab; Hicks and Strand). Similar to McFadden’s random draws approach,
these deterministic criteria avoid the computational burden and memory requirements
associated with jointly modeling the formation of the individual’s choice set and the discrete
choice (Manski; Swait and Ben-Akiva; Haab and Hicks, 1997; Swait). Furthermore, the rules
seemingly possess a behavioral appeal: sites that are sufficiently distant from the point of trip
origin or deficient in particular aspects may, one might argue, be excluded from the analysis.
The common finding of the studies is that random utility model (RUM) coefficients and nonmarket use-values tend to be robust with deterministic criteria as the estimation results are
similar to those obtained with the universal choice sets. This has led to the perception that
sites failing to satisfy a given criterion may be excluded without cause for concern.
1
Yet a closer examination reveals the shortcomings of deterministic choice set rules.
In particular, because sites that fail to satisfy a rule defined by trip inputs (or site quality) are
discarded without regard to their respective quality (or trip inputs), the rule will fail to
accommodate potential tradeoffs between inputs and attainable site quality. As a result, highquality sites may be discarded if the analyst selects a conservative input rule, while readily
accessible sites may be discarded if a conservative output rule is selected. Furthermore, the
quality of sites attainable with a given amount of inputs may vary sizably across the sites
retained in the choice sets used for estimation. The prudent analyst may be troubled,
however, by assuming individuals consider all such sites rather than those providing the
greatest quality for a given amount of inputs (i.e., time and distance traveled).
In light of these issues, we develop a multi-variate efficient frontiers approach
(Aigner, Lovell, and Schmidt) for generating individual choice sets from large universal
choice sets, such as those commonly confronted in destination choice research. Using an
econometric framework complementing the probabilistic approaches to choice set formation
examined by Swait and Ben-Akiva, Haab and Hicks (1997), and Swait, individual-level
stochastic frontier models relating costly trip inputs to utility generating outputs are estimated
to measure the relative efficiency of sites comprising the universal set. Choice sets are then
formed under an efficiency criterion, which exclude sites that deviate sufficiently from the
efficient frontier. The approach complements the travel cost frontier model of Smith,
Palmquist, and Jakus and the minimum utility threshold model of choice set formation
proposed by Klein and Bither. Coupled with its consistency with individual optimizing
behavior and ease of implementation in the discrete choice framework, the frontiers approach
may serve as a useful tool for choice set generation across a broad range of applications.
2
Additional background is provided in the next section. The site choice frontier
function and efficiency criteria for choice set generation are developed in section three.
Choice set composition, site choice efficiency, and estimated RUMs produced with the
efficiency approach are contrasted with results produced with deterministic criteria in the
case of recreational angler site choice. The implications for benefits estimation are addressed
in section four. Section five concludes.
The Choice Set Problem
To motivate the choice set problem and development of the efficiency approach, we begin
in the conventional way by assuming individuals derive utility from site quality and a
numeraire good but are constrained by income (see e.g., Haab and McConnell). Quality is
obtained by traveling a known distance or length of time to any of n sites in the universal set
J. Furthermore, quality is treated as individual-specific and assumed to vary over sites.
Assuming utility maximization, individual i’s problem may be written:
(1)
where V(.) is the indirect utility function, qj,i denotes the quality of site j attainable by the
individual, yi is income, and pj,i is the implicit price of site access.1 The objective of the
choice analyst is generally to recover estimates of the parameters of V(.) for testing
hypotheses about individual preferences and to measure the non-market values of changes
in the choice set composition (e.g., willingness-to-pay for changes in site quality).
3
To proceed requires specification of the indirect utility function and individual choice
sets. While the former have been defined rather exclusively as linear-in-parameters, debate
is ongoing regarding the proper definition of the choice sets. As a starting point, it is
commonly assumed that an individual’s true choice set on a given choice occasion contains
those alternatives considered by the individual (see Thill; Haab and Hicks, 1997, 2000).
However, given imperfect individual level information–including the subset of sites in J that
individuals are cognizant of, their familiarity with these sites, and constraints limiting the
viability of some of the sites as choices–the true choice set is not perfectly observable.
Further, as 2n - 1 choice sets may be formed from a universal set containing n alternatives,
the complexity of properly specifying the choice set is compounded as n increases. The
choice of choice set is therefore a dilemma to be resolved by the analyst as its misspecification will yield misleading information about individual preferences in the form of
biased estimates of the parameters of the indirect utility function (Swait and Ben-Akiva;
Horowitz; Haab and Hicks, 1997, 2000; Swait).
Given that individual choice sets are unobservable, two ‘second-best’ approaches for
choice set formation have emerged in the literature (see Thill and Haab and Hicks, 2000, for
comprehensive reviews).2 The first, deterministic approach entails excluding sites from the
choice sets used for estimation if they fail to satisfy a specific rule defined by the analyst.
Examples in the destination choice literature include the maximum travel time and distance
criteria examined by Parsons and Hauber, Whitehead and Haab, and Hicks and Strand, and
in early work by Black, whereby sites located beyond a maximum travel time or distance
threshold are discarded.3 The second, behavioral approach follows from Manski’s two-stage
4
model of choice set formation, whereby the probability an individual selects a given site is
conditional upon the site appearing in the choice set. The probability that the site appears in
the choice set is conditioned upon individual characteristics and then modeled jointly with
the probability of site choice. Empirical applications to date include Swait and Ben-Akiva,
Haab and Hicks (1997), and Swait.
Whether to adopt simple deterministic criteria or to more rigorously develop a formal
model of choice set formation will depend primarily upon the dimension of the universal
choice set, the availability of individual information specific to choice set formation, and the
analyst’s technical expertise. The deterministic approach is ad hoc as the analyst must decide
upon both the exclusion criterion and the stringency with which it will be applied. In
contrast, the behavioral approach is consistent with utility maximization and by design
provides insight on the decision making process that is unobtainable with the deterministic
approach (Manski; Thill).4 However, behavioral models of choice set formation become
computationally intractable if the universal choice set contains more than a few alternatives
(Haab and Hicks, 1997; Swait). Furthermore, empirical implementation of the behavioral
approach requires that the instrument used for data collection be designed to elicit the
specific individual information necessary to estimate the choice set component of the model,
yet this may be infeasible in many circumstances. These information requirements also
undermine the use of the behavioral approach with many existing RP datasets. In contrast,
deterministic criteria are convenient to implement and computationally tractable regardless
of the size of the universal choice set. They also require no additional data beyond that
required of a standard model of destination choice (i.e., travel costs and site characteristics).
5
As destination choice analysts frequently confront universal choice sets containing
large numbers of alternative locations (e.g., dozens or hundreds of sites), deterministic
criteria have prevailed for generating choice sets in empirical applications. Several recent
studies have examined the sensitivity of destination choice models to various deterministic
criteria. Parsons and Hauber evaluated a rule defined by maximum travel time in modeling
recreational angler site choice in Maine. Excluding sites in thirty minute increments from the
trip origin, they found RUM coefficient and benefit estimates to be insensitive to the rule.
Whitehead and Haab evaluated criteria defined both by trip inputs and outputs in studying
angler site choice in the southeast United States. The input was defined by maximum trip
distance, and two definitions of minimum attainable site quality were evaluated: the historic
(five-year) daily catch rate of fish and an individual-specific catch rate generated from an
estimated Poisson regression model (see e.g., McConnell, Strand, and Blake-Hedges). With
sites excluded in sixty mile increments, the findings of Parsons and Hauber were supported.
Similarly, for both site quality measures the estimated RUM coefficients rapidly converged
upon those obtained with the universal choice set. Most recently, Hicks and Strand evaluated
the maximum time and distance rules relative to ‘familiar’ choice sets constructed from
individual responses to survey questions in studying Maryland beach visitation. In both
cases, the estimates rapidly converged upon those obtained with the universal set, a result
that may be attributed to a high degree of correlation between distance and travel time.
In selecting deterministic choice set criteria, it is natural to question whether tradeoffs
exist between attainable site quality and trip inputs. In many instances, quality may
reasonably be expected to vary with the time and distance traveled. For example, to the
6
extent that public lands and waters within city boundaries are of lesser quality for any of a
number of recreational activities (e.g., fishing and site seeing) compared to the surrounding
rural countryside, attainable site quality will vary directly with travel time and distance as
one moves outward from the city center. The exceptions occur if quality is homogenous over
sites or the closest sites to the trip origin happen to be the highest quality sites.
If attainable site quality is correlated with the time or distance traveled, then concern
arises about the merits of deterministic rules for choice set generation. Specifically, if a rule
is defined solely by the spatial or temporal proximity of sites to the trip origin, then there will
be a tendency to discard high-quality sites if attainable site quality increases with the time
or distance traveled. Similarly, if a rule is defined solely by site quality, then convenient and
readily accessible sites will tend to be dropped. The implication is that deterministic rules
may be inconsistent with utility maximization if the excluded sites dominate the sites
retained in the choice sets used for estimation. Furthermore, the quality of sites attainable
with a given quantity of trip inputs may vary sizably within the choice sets. However, the
prudent analyst should be troubled by assuming that individuals consider all such sites rather
than those that provide the greatest quality for a given amount of inputs.
Efficient Frontiers and Individual Choice Sets
To remedy the shortcomings of deterministic criteria while maintaining the hypothesis that
trips require costly inputs to yield utility generating outputs, we develop a multi-variate
efficient frontiers approach to choice set formation. Stochastic frontier models (Aigner,
Lovell, and Schmidt; Kumbhakar and Lovell) are estimated in order to identify the efficient
7
site frontier for individual trip-takers. Measuring the distance of sites from the efficient
frontier–referred to here as site efficiency–choice sets are then generated by retaining sites
that satisfy a specific efficiency criterion. The stringency of the criterion is varied to evaluate
its effects on the choice set composition, site choice probabilities, and consumer surplus.
Smith, Palmquist, and Jakus (SPJ) is the only prior study to have employed frontier
models for travel demand analysis. Their approach estimated a deterministic frontier model
(Farrell) relating the costs of accessing sites for recreational angling to the average catch rate
of fish and indicators of water quality. Our approach complements SPJ, however, we
implement the more flexible stochastic specification of the frontier model at the level of the
individual trip-taker in a discrete choice framework. As with applications in productivity
analysis, frontier models are intended as tools for the analyst rather than models of individual
behavior. Furthermore, the models do not identify the causes of ‘inefficient’ behavior.
However, the appeal of the frontier models for choice set generation rests in their
computational tractability relative to probabilistic choice set models (Swait and Ben-Akiva;
Haab and Hicks,1997; Swait) and ease of implementation with large universal choice sets.
Begin by assuming that trip inputs and site quality are continuous and non-negative.
For consistency with the neoclassical production function and the deterministic frontier
function of SPJ, attainable site quality is hypothesized to be positively correlated with travel
time and trip distance. Defining qj,i as the quality of site j attainable by individual i given the
trip input xj,i (time or distance), the individual’s efficient site frontier function is written:
(2)
8
where
is a positive parameter.5 Empirically, (2) will likely fail to hold perfectly in most,
if not all, situations. Although the maximum attainable level of site quality may be expected
to increase as one ventures further from home, the number of lesser quality sites may also
increase. Those sites providing the greatest (least) quality for a given amount of inputs are
the most efficient (inefficient) sites in the universal choice set J. To distinguish between sites
on the basis of efficiency, the frontier function is weighted by a site efficiency term
:
(3)
Site j lies on the frontier of efficient sites if
inefficient within J as
= 1, while the site is considered increasingly
. As, all else constant, utility will increase as the level of quality
increases, it follows that the utility attainable at relatively efficient sites will be greater than
that attainable at a relatively inefficient sites. To also allow for random variation in the
relation between site quality and trip inputs from the perspective of the analyst, an error term
is included in (3). The resulting stochastic site frontier function is written:
(4)
And the logarithmic transformation of (4) is given by:
(5)
where uj,i
. Specifying the functional relation between qj,i and xj,i and the
9
distributions of u and , maximum likelihood may be used to recover
efficiency values (
Typically,
and the site
’s ) indicating the proximity of the sites to the efficient frontier.
, while u may be assigned any of
is assumed to be distributed as
several distributions (e.g., truncated normal, half normal, and exponential).
Individual choice sets may be defined in terms of the efficiency of sites in the
universal choice set J relative to site frontier (2). One possibility is to define a lower bound
by weighting (2) by some value k (where 0 < k < 1,
) and then retain those sites
located between the frontier and the lower bound. However, this ‘uniform’ approach is
problematic as it ignores potential heterogeneity in the frontier function and site efficiency
distribution across trip-takers. We instead define the lower bound at the individual level by
weighting the frontier function by a linear combination of the mean and standard deviation
of the individual’s site efficiency distribution: (
). The parameter
calibrates the
location of the lower bound relative to the frontier and, hence, the composition of the choice
set. The efficiency criterion for choice set generation is formally stated:
(6)
where
For individual
, site
if:
represents the choice set retained for estimation. The criterion states that sites are
excluded from
if their quality, conditional upon the inputs required to access the sites, is
less than the lower bound. In general,
J as
10
, while
converges upon the
efficient frontier as
. Given that the utility attained with a given level of trip inputs
will be greater at relatively efficient sites than at relatively inefficient sites, the lower bound
defining the minimum admissible level of site efficiency is equivalent to a minimum utility
threshold (Klein and Bither). As
increases (decreases), the minimum level of efficiency
required for sites to appear in the choice set increases (decreases), and as a result, the
minimum utility attainable at the sites comprising the choice set increases (decreases).
The composition of the choice sets may be evaluated with efficiency scores, which
are calculated from the efficiency values of the sites contained in the universal choice set. As
site efficiency is independent of the choice set criterion, the composition of the choice sets
may be compared with the composition of choice sets generated from alternative approaches
(e.g., travel time, distance, and site quality criteria). The efficiency score for site j is defined:
(7)
Efficiency Scorej =
The scores of highly efficient (inefficient) sites will be positive (negative) and relatively large
in absolute value, while sites possessing approximately average levels of efficiency will have
efficiency scores of approximately zero. The mean efficiency score of a given choice set may
be used as a gauge of its composition. A mean efficiency score of zero indicates the choice
set contains both efficient and inefficient sites relative to the mean site. This would be
expected, for example, with the universal set of sites. In general, if a choice set is comprised
of relatively efficient (inefficient) sites, then its mean efficiency score will be positive
(negative). In contrast, the mean site efficiency scores of choice sets generated from simple
11
deterministic criteria are expected to be approximately equal to zero due to the tendency of
these criteria to exclude sites independently of their efficiency values.
The characteristics of individual choices over the sites comprising their choice sets
may also be evaluated in terms of the efficiency scores (7) and with site efficiency rankings
(Jensen). The ratio of a choice set’s mean efficiency score to the efficiency score of the
chosen site measures the efficiency of the individual’s site choice relative to the mean site.
Formally, the relative efficiency score of the site chosen by individual i is defined:
Relative Efficiency Score
(8)
where
=
denotes the efficiency value of the chosen site.6 For a given choice set, the site
selected by the individual is relatively efficient (inefficient) the smaller (larger) the value of
(8). In general, if the efficiency score of the chosen site exceeds the mean efficiency score
of the choice set, then (8) will be less than one; otherwise, the relative efficiency of the site
is no greater than the mean level of site efficiency. With regard to the efficiency criterion (6),
as
increases through its range, dropping relatively inefficient sites in the process, the mean
efficiency score of an individual’s choice set will increase, yet the efficiency score of the
chosen site will remain constant. As a result, (8) will increase as
increases.
Efficiency rankings (Jensen) are also used to examine the relative efficiency of
individual choices. These are constructed by placing the
12
sites contained in the choice set
in descending order by their efficiency values and then assigning consecutive integers (1, 2,
3, ...,
) to the ordered sites. The ratio of the rank of the chosen site to the number of sites
in the choice set may be used as a second measure of the relative efficiency of individual
choice. The relative efficiency rank of the site chosen by individual i is defined:
(9)
Relative Efficiency Rank
=
If the individual selected the most efficient site, then (9) is equal to 1/
for all choice sets
generated under the efficiency criterion (6), while a value of one indicates that the least
efficient site was chosen. In general, as relatively inefficient sites are excluded according to
the efficiency criterion (i.e., as
is increased), the denominator of (9) will decrease, and as
a result, the relative efficiency rank will approach one for a chosen site of a given rank. In
contrast, as deterministic choice set rules discard sites without regard to their efficiency
characteristics, the relative efficiency rank of the chosen site is expected to remain
approximately constant as the stringency of the rule is varied.
Model Specification
The modeling of individual choice sets and site choice is performed in two stages. In the
first-stage, the stochastic frontier models relating site quality and trip inputs are estimated
separately for each individual in the sample. Choice sets are then generated with the
efficiency criterion (6) evaluated across a large range of values of the calibrating
13
parameter
. The probability of site choice is then modeled in the second stage.
The deterministic portion of the first-stage frontier model is specified as double-log:
. This specification is the benchmark in the production frontiers
literature and permits convenient calculation of and its interpretation as an efficiency
measure (Aigner, Lovell, and Schmidt; Kumbhakar and Lovell). The random error term,
is assumed to be distributed as
,
, and several distributions were examined for the
log-transformed efficiency term, u . The stochastic frontier model with truncated normal site
efficiency is presented here.7 The log-likelihood function of the model is written:
(10)
where
, and
is the
standard normal cumulative distribution function. Given this specification of the stochastic
site frontier model, the efficiency value of site j may be written for individual i as:
(11)
where
and
are defined:
(12)
14
And, the empirical expression of the efficiency criterion (6) is written:
(13)
For individual
where ln(
)=
site
if:
.
For estimation of the second-stage models of site choice, the utility derived by
individual i from visiting site j is expressed:
(14)
where v(.) is the observable, deterministic portion of utility, and
is the unobservable,
random utility component. The probability of choosing site j from the set
may be written:
(15)
Distributional assumptions about
yield specific RUMs. As with the frontier models,
alternative specifications were investigated.8 McFadden’s conditional logit model, which
results from assuming the random utility component is iid type-I extreme value, was selected
for this application. The conditional logit site choice probability is written:
(16)
15
And the log-likelihood function of the conditional logit is written:
(17)
where zj,i equals one if individual
selects site
and is zero otherwise. To evaluate
the effects of the efficiency criterion on the estimated RUM coefficients, choice sets are
generated across the range of values defining
and the RUM is then estimated at each value.
The Data
The empirical application is angler choices of inland fishing sites in Maine during the 19992000 openwater season. The data were obtained through mail surveys using the method of
Salant and Dillman. The surveys elicited the lakes and ponds visited, the quantity of each of
ten species of fish caught, and socioeconomic characteristics of the anglers. The sample used
here contains 1,081 Maine residents who took 5,556 single-day trips to 483 sites.9
For estimation of the first-stage site choice frontier models, the trip input is defined
as the round-trip travel time between the centroid of the respondent’s home zip-code and the
site access points. One-way travel time averages 1.25 hours and ranges from about zero to
five hours (Table 1). Converted to natural logarithms, the input is referenced by Ln(Time).
Site quality is defined as an individual-specific expected daily catch rate of salmon, brook
trout, lake trout, and brown trout. The expected catch rate is generated by estimating specieslevel negative binomial count data models (see Cameron and Trivedi) relating catch to site
attributes and angler characteristics and then summing the fitted values over the species (see
16
Scrogin et al.).10 Converted to natural logarithms, the catch rate is referenced by Ln(Quality).
For estimation of the site choice models, the deterministic indirect utility function v(.)
is specified as linear-in-parameters and linear-in-variables. Five independent variables
comprise v(.). These include the four species-specific expected catch rate variables (Salmon,
Brook Trout, Lake Trout and Brown Trout) and a travel cost price proxy (Travel Cost)
defined as the sum of explicit round-trip travel costs ($0.33/mile) and the opportunity cost
of travel time, valued at one-third of the estimated hourly wage rate.
Estimation Results
Choice Set Frontier Models
The site choice frontier model was estimated separately for each individual in the sample
over the 483 sites comprising the universal choice set. Results reported in Table 2 indicate
that expected catch increases significantly with travel time on average. As the estimated
coefficients may be interpreted as elasticities, the results indicate that a one hundred percent
increase in one-way travel time (about 1.25 hours) yields about a fifteen percent increase in
expected daily catch on average. More than seventy-six percent of the coefficients are
positive and significant, and only two percent are negative and significant.11 Table 2 also
presents summary statistics on the estimated site efficiency estimates (
average site is relatively efficient (
deviation (
’s). Overall, the
= 0.87), but the relative size of the average standard
= 0.25) and range of values (0.06 – 0.99) indicates the efficiency estimates
vary sizably within the sample.
17
Analysis of the Choice Sets
Having estimated the first-stage site choice frontier models and efficiency statistics, choice
sets are formed with the efficiency criterion (6) for estimation of the second-stage site choice
models. As the site efficiency estimates are obtained independently from the choice sets, the
composition of the choice sets may be compared to the composition of choice sets produced
with alternative approaches. To demonstrate, travel-time dependent choice sets were also
constructed (Parsons and Hauber; Hicks and Strand).
The left half of Table 3 reports results with the efficiency rule, and the right half
reports results with the deterministic rule. Considering the former, the 483 site universal
choice set (J) is obtained for all individuals by reducing the calibrating parameter to
= -3.0
standard deviations below the mean site efficiency. Alternatively, increasing the parameter
to
= 2.0 standard deviations above the mean site efficiency yields the minimum feasible
choice set, containing two sites. Evaluated at the mean, , the average choice set contains
approximately 231 sites (
= 17.1). With the deterministic rule, J is obtained by including
all sites within five hours, and the minimum feasible choice set is obtained by excluding all
sites exceeding eighteen minutes. Evaluated at the mean (1.25 hours), the average choice set
contains about fifty more sites (
= 277) than that obtained with the efficiency approach,
and the spread in the choice set size around the mean is about five times larger (
= 79.0).
The composition of the choice sets is evaluated in terms of the mean of the site
efficiency scores (7). With the site efficiency criterion, the mean efficiency score increases
from approximately zero at the universal set to beyond 2.0 as
18
increases (column 6). In
contrast, the choice set composition remains approximately constant on average as sites are
excluded by the deterministic travel time rule, with the average efficiency score reaching a
maximum value of only 0.13 (column 13). This invariance is due to the deterministic rule
discarding both high-quality, efficient sites and low-quality, inefficient sites as the
permissible level of travel time decreases.
Analysis of Site Choice
The relative efficiency of individual choice and the probability of site selection are evaluated
next. As shown in the first row of Table 4, with the universal choice set the absolute and
relative efficiency ranks of the chosen sites (9) average 156 and 0.32, respectively, and the
relative efficiency scores (8) average approximately zero. Overall, some anglers selected the
most efficient sites in their choice sets (Min = 1), while others selected relatively inefficient
sites (Max
n*).
Comparing the rules, the relative efficiency of site choice differs considerably on
average as sites are excluded. With the site efficiency rule, as
approaches its maximum
(2.0) the relative efficiency rank (column 5) approaches its maximum (1.0), and the relative
efficiency score increases beyond 3.0 (column 6). This indicates that as
increases, the
chosen sites become increasingly inefficient within the choice sets relative to the universal
choice set. In contrast, with the deterministic rule the relative efficiency ranks remain
approximately constant as sites are excluded (column 11), and the relative efficiency score
deviates only slightly above that observed with the universal set on average (column 12).
19
The estimated RUM coefficients are examined graphically in Figures 1 and 2.
Overall, fifty-three of the fifty-five RUM coefficients estimated with the efficiency rule (i.e.,
five coefficients estimated at each of the eleven values of
) were significant (0.01 level),
while all fifty-five of the coefficients estimated with the deterministic rule were significant
(see Appendices D and E for numerical results). Considering the former, the estimated Travel
Cost coefficients are negative as expected but approach zero as
increases (Figure 1),
indicating that the marginal effect of travel cost on the probability of site choice increases
as the relative efficiency of the sites comprising the choice set increases. In contrast, the
Travel Cost coefficients estimated with the deterministic rule remain virtually constant as the
permissible travel time is varied. This is attributed to the invariance of both the composition
of the choice sets, as indicated by the constancy of the mean efficiency scores (Table 3), and
the relative location of the chosen sites within the choice sets, as indicated by the constancy
of the relative efficiency ranks and relative efficiency scores (Table 4).
As shown in Figure 2, the estimated catch rate coefficients trend downward in all
cases as
increases and become negative for at least two of the four species (e.g., Brook
Trout and Brown Trout) at relatively conservative values (i.e.,
as
> 0). This is because
increases and the resulting proportion of efficient sites in the choice sets increases, the
relative efficiency of the chosen sites decreases as shown in Table 4 by the increasing relative
efficiency scores and ranks. The quality of the chosen sites is, therefore, less than the quality
attainable at other sites requiring the same travel costs, and as a result, the probability of site
choice is estimated to decrease as expected catch increases. Alternatively, the catch rate
coefficients estimated with deterministic choice sets are insensitive to the level of travel time.
20
The explanation for this finding is that the composition of the choice sets and the relative
efficiency of the chosen sites remain approximately constant as the permissible level of travel
time is varied as indicated by the invariance of the mean efficiency scores (Table 3) and the
relative efficiency scores and relative efficiency ranks (Table 4).
To summarize, the invariance of the RUM coefficients estimated with deterministic
choice sets is consistent with the findings of Parsons and Hauber, Whitehead and Haab, and
Hicks and Strand. In contrast, the parameter estimates converge at a much slower rate with
choice sets generated with the efficiency approach. However, as the efficiency criterion
evaluated at a specific value of the calibrating parameter (
) is consistent with a minimum
utility threshold, the behavior of the estimated coefficients of the indirect utility function
could be explained in terms of the exogenously determined site efficiency statistics. In
adopting the efficiency criterion, a natural question follows regarding the value that should
be assigned to
. As with other choices made in model implementation, the choice of
ultimately rests with the analyst. Furthermore, an appropriate value of
will likely be
application-specific so that sensitivity analyses similar to those performed here (Tables 3-4)
may be necessary to guide the decision. However, as behavior models of destination choice
typically assume–and empirical observation commonly supports–that, all else constant, site
quality (travel cost) is positively (negatively) related to individual utility, the decision
regarding an appropriate value of
this occurs when
may be guided by economic theory. In the present study,
is assigned a value that is no greater than zero.
21
Implications for Benefits Measurement
A common use of RUMs in destination choice research is to estimate the non-market benefits
associated with exogenous changes in some or all of the available sites. With public
resources, such as the water bodies and adjoining lands considered here, changes in policy
may affect site quality or accessibility. To mitigate environmental degradation or stock
depletion, state and federal agencies often impose regulations upon particular activities in
which the public may engage–directly or indirectly affecting site quality–or instead restrict
public access through site closures. As the welfare effects of policy changes necessarily
depend upon the composition of the sites comprising the choice sets, the choice set criterion
serves a crucial role in calculating non-market values.
In evaluating the effects of the choice set on non-market benefit estimates, we focus
upon a controversial set of five large lakes contained in the ‘China Lakes’ region of Maine.
While the lakes are among the state’s most coveted coldwater-fisheries, agricultural and
forestry production coupled with residential land developments have led to increased levels
of eutrophication and toxic loading in their waters (see Parsons, Plantinga, and Boyle for a
detailed discussion). The policy scenario we consider is the hypothetical closure of the five
lakes (thirteen site access points) to recreational anglers.
The monetary value of the change in welfare experienced by individual i from the
closure of one or more sites is measured by the compensating variation (CVi ). For the
conditional logit model CVi is defined:
(18)
22
where MUy represents the marginal utility of income, and n* and n** denote, respectively, the
number of sites in the choice set before and after the site closures. Estimated values of CVi
are obtained by replacing MUy with the Travel Cost coefficient from an estimated RUM and
the v(.) with their estimated values (see e.g., Haab and McConnell, pp. 226-232).
As with the site efficiency analysis (Tables 3-4), we begin in Table 5 with the
universal choice set (row 1). Here, the thirteen access points appear in one hundred percent
of the individual choice sets. The mean welfare loss for closure of the five lakes is calculated
to be $0.32 on average (columns 4 and 9). With the site efficiency rule, the number of
affected sites in the choice set declines to less than one on average as
increases (column
2), however, the mean welfare loss per trip rises sharply. Given that the percentage of
individuals affected by the site closures falls as
rises (column 3), the results indicate that
those who are affected experience large welfare losses.
In comparison, the number of affected sites contained in the deterministic choice sets
is considerably larger on average, especially at relatively conservative values of travel time
(column 7). The percentage of anglers affected by the site closures is also notably larger than
with the site efficiency approach, with the results again being amplified as the size of J*
decreases (column 8). Given the insensitivity of the RUMs estimated with deterministic
choice sets, the mean welfare loss is invariant across all but the most conservative levels of
travel time (column 9). When considered at the individual level, welfare losses estimated
with the site efficiency approach are larger than those resulting with the deterministic
approach–and in some cases strikingly so. However, considering the standard deviations of
the estimates (columns 5 and 10), the findings indicate there is greater heterogeneity within
the sample when evaluated with choice sets generated from the site efficiency approach.
23
Conclusions
Debate remains ongoing about properly constructing individual choice sets for discrete
choice analysis of revealed preference data. To accommodate universal choice sets
containing hundreds or thousands of sites while maintaining the hypothesis that trips require
costly inputs to produce utility generating outputs, this study developed an efficiency
approach to the choice set problem in the context of destination choice that complements the
travel cost frontier model developed by Smith, Palmquist, and Jakus and the minimum utility
threshold model of Klein and Bither.
The site efficiency approach has several advantages over deterministic criteria and
probabilistic models of choice set formation. First, because the approach is consistent with
individual optimizing behavior it overcomes the ‘dominance’ problems associated with
excluding sites according to simple deterministic rules. Second, in contrast to probabilistic
models of choice set formation, empirical implementation of site efficiency models requires
no additional data beyond that required for a defensible revealed preference model of
destination choice, and estimation remains feasible with large universal choice sets.
On a final note, as the production frontier model employed in this study has an
economic dual–the efficient cost frontier–our approach may readily be extended for cost and
pricing analyses. As the choice set issue remains open in the literature on residential housing
choice (e.g., Banzhaf and Smith) and given that the number and types of homes for sale in
a given market at any particular time may be quite large, hedonic cost frontier models may
be an invaluable tool for choice set generation in future analyses of housing demand.
24
References
Aigner, D.J., C.A.K. Lovell, and P. Schmidt. “Formulation and Estimation of Stochastic Frontier
Production Function Models.” Journal of Econometrics 6(1977):21-37.
Black, W. C. “Choice-set Definition in Patronage Modeling.” Journal of Retailing 60(1984):63-85.
Banzhaf, H.S., and V.K. Smith. “Meta Analysis in Model Implementation: Choice Sets and the
Valuation of Air Quality Improvements.” Discussion Paper 03-61, Resources for the Future,
2003.
Cameron, A.C., and P.K. Trivedi. Regression Analysis of Count Data. Cambridge: Cambridge
University Press, 1998.
Farrell, M.J. “The Measurement of Productive Efficiency.” Journal of the Royal Statistical Society
Series A 120(1957):253-281.
Haab, T.C., and R. Hicks. “Accounting for Choice Set Endogeneity in Random Utility Models for
Recreational Demand.” Journal of Environmental Economics and Management
34(1997):127-147.
Haab, T.C., and R.L Hicks. “Choice Set Considerations in Models of Recreation Demand: History
and Current State of the Art.” Marine Resource Economics 14(2000):271-281.
Haab, T.C., and K.E. McConnell. Valuing Environmental and Natural Resources. Cheltenham, UK:
Edward Elgar Publishing Limited, 2002.
Hicks, R.L., and I.E. Strand. “The Extent of Information: Its Relevance for Random Utility Models.”
Land Economics 76(2000):374-385.
Horowitz, J. L. “Modeling the Choice of Choice Set in Discrete-choice Random-utility Models.”
Environment and Planning 23(1991):1237-1246.
25
Jensen, U. “Is it Efficient to Analyse Efficiency Rankings?” Empirical Economics 25(2000):189208.
Klein, N. M., and S. W. Bither. “An Investigation of Utility-Directed Cutoff Selection.” Journal of
Consumer Research 14(1987):240-256.
Kumbhakar, S.C., and C.A.K. Lovell. Stochastic Frontier Analysis. Cambridge: Cambridge
University Press, 2000.
Manski, C. F. “ The Structure of Random Utility Models.” Theory and Decision 8(1977) 229-254.
McConnell, K.E., I.E. Strand, and L. Blake-Hedges. “Random Utility Models of Recreational
Fishing: Catching Fish Using a Poisson Process.” Marine Resource Economics
10(1995):247-261.
McFadden, D. “Modelling the Choice of Residential Location.” in A. Karlqvist, L. Lundquist, F.
Snickars, and J Weibull (Eds.), Spatial Interaction Theory and Planning Models.
Amsterdam: North-Holland, 1978, pp. 75-96.
Parsons, G., and B. Hauber. “Spatial Boundaries and Choice Set Definition in a Random Utility
Model of Recreation Demand.” Land Economics 74(1998):32-48.
Parsons, G. and M. J. Kealy. “Randomly Drawn Opportunity Sets in a Random Utility Model of
Lake Recreation.” Land Economics 68(1992):93-106.
Parsons, G.R., A.J. Plantinga, K.J. Boyle. “Narrow Choice Sets in a Random Utility Model of
Recreation Demand.” Land Economics 76(2000):86-99.
Paterson, R., D. Scrogin, K. Boyle, D. McNeish. “Maine Open Water Fishing Survey: Summer
1999.” Maine Agricultural and Forest Experiment Station Paper No. 493, University of
Maine, 2001.
26
Rabe-Hasketh, S., A. Skrondal, and A. Pickles. “Estimation of Generalized Linear Models.” The
Stata Journal 2(2002):1-21.
Salant, P., and D. Dillman. How to Conduct Your Own Survey. New York: John Wiley and Sons,
Inc., 1994.
Scrogin, D., K. Boyle, G. Parsons, A.J. Plantinga. “Effects of Regulations on Expected Catch,
Expected Harvest, and Site Choice of Recreational Anglers.” American Journal of
Agricultural Economics 86(2004):963-974.
Smith, V.K., R.B. Palmquist, and P. Jakus. “Combining Farrell Frontier and Hedonic Travel Cost
Models for Valuing Estuarine Quality.” Review of Economics and Statistics 73(1991):694699.
Swait, J. “Choice Set Generation within the Generalized Extreme Value Family of Discrete Choice
Models.” Transportation Research B 35( 2001):643-666.
Swait, J., and M. Ben-Akiva. “Empirical Test of a Constrained Choice Discrete Model: Mode
Choice in Sao Paulo, Brazil.” Transportation Research B 21(1987):103-15.
Thill, J. C. “Choice Set Formation for Destination Choice Modelling.” Progress in Human
Geography 16(1992):361-382.
Train, K. “Mixed Logit Models for Recreation Demand.” in C. Kling, and J. Herriges (Eds.),
Valuing the Environment using Recreation Demand Models. New York: Elgar, 1997, pp.
140-163.
Whitehead, J.C., and T.C. Haab. “Southeast Marine Recreational Fishing Statistical Survey:
Distance and Catch Based Choice Sets.” Marine Resource Economics 14(1999):283-298.
27
Footnotes
1. The access price pj,i is usually defined by explicit and, perhaps, implicit travel costs
(e.g., fuel expenditures and the opportunity cost of time, respectively). The former are a
function of the round-trip distance between the trip departure point and the destination; the
latter are usually expressed as a function of travel time and income.
2. Others have defined individual choice sets by the collection of sites visited by all
individuals. When the size of this set proved overwhelming (i.e., in terms of memory
requirements or computational burden), McFadden’s random draws approach was adopted
to reduce the choice set dimension without jeopardizing the consistency of the estimated
RUM coefficients (see e.g., Parsons and Kealy). Alternatively, the choice set dimension may
be reduced through some aggregation scheme, whereby multiple sites are combined through
a weighting of their attributes to form a single, common site (Parsons, Plantinga, and Boyle).
3. Deterministic rules may be considered directly from (1). With travel time and
distance rules, sites are excluded if they are deemed excessively far from the trip origin.
Since pj,i is a function of round-trip distance and the opportunity cost of travel time, a rule
defined solely by trip inputs is equivalent to a rule defined by travel costs. Hence, if a site is
relatively costly to access, then all else constant, the utility derived from visitation will be
sufficiently low for the site to be excluded from estimation. Similarly, with rules defined by
site quality, and assuming utility increases as quality increases, sites will be dropped if, all
else constant, their quality–and thus the utility derived from visitation–is sufficiently low.
4. Thill (p. 362) emphasizes the potential inconsistencies of deterministic approaches
with utility maximization more emphatically by stating, “The analyst takes a major risk of
28
mis-specifying the choice set, primarily because its specification derives only remotely from
the behavioural processes of choice-set formation.”
5. Although quality is represented here by a single, continuous variable, it may be
considered a multi-dimensional vector of site attributes.
6. The relative efficiency score could instead be written as the reciprocal of (8).
However, the statistic diverges to infinity as the mean efficiency score approaches zero (e.g.,
the universal choice set).
7. Models were also estimated with u distributed as half-normal and exponential and
the deterministic relation specified as semi-log, linear, and quadratic. Results are reported
in Appendix C.
8. Random parameters (or mixed) logit models (see e.g., Train) were estimated by
Gaussian-Hermite and adaptive quadrature methods in Stata version 8.2 (Rabe-Hasketh,
Skrondal, and Pickles). While the adaptive method is computationally more efficient, run
time with the smallest data set used in this study (see Appendix D) and four random
parameters exceeded one hundred hours. Nested logit models were also considered, yet these
require the analyst to place a common and potentially arbitrary structure upon the individual
decision process. Although the mixed and nested models relax the IIA assumption of
conditional logit, their empirical implementation is often problematic. However, the general
conclusions of this study are expected to be upheld by the alternative models.
9. By focusing on single-day trips, additional qualitative trip outputs associated with
multiple-day trips may be eschewed. Details on the survey instruments, sample, and
responses are contained in Paterson, Scrogin, Boyle, and McNeish.
29
10. Because an angler’s catch of individual species of fish–and, hence, aggregate
catch–is unknown a priori, the catch variables appear as expected values (see e.g.,
McConnell, Strand, and Blake-Hedges). The expected catch models were specified as zeroinflated negative binomial as it outperformed Poisson, negative binomial, and zero-inflated
Poisson models in all cases. Variable definitions, summary statistics, and estimation results
are reported in Appendices A and B.
11. The proceeding analysis was also performed by restricting the sample to the
subset of individuals for whom the estimated travel time coefficients were positive. Overall,
the estimated RUM coefficients were not sensitive to the sample. Estimation results are
available in an appendix upon request.
30
Table 1. Variable Definitions and Summary Statistics
Variable Definition
Mean
Standard
Deviation
Ln(Quality)
Natural log of expected catch per day
-1.02
1.04
Ln(Time)
Natural log of round-trip travel time
0.76
0.70
Salmon
Expected catch of salmon per day
0.15
0.25
Brook Trout
Expected catch of brook trout per day
0.30
0.49
Lake Trout
Expected catch of lake trout per day
0.06
0.12
Brown Trout
Expected catch of brown trout per day
0.07
0.12
Travel Cost
0.325 x (Round trip miles) + time cost
51.99
33.40
N = 2,683,548
Stochastic Frontier Models
Random Utility Models
Notes: The total number of observations (N) is equal to the number of sites (483) multiplied
by the sample size (5,556). The aggregate and species-specific expected catch rates are
obtained from estimated zero-inflated negative binomial models relating reported catch to
site and angler characteristics. Variable definitions, summary statistics, and estimation results
are reported in Appendices A and B. The time cost component of Travel Cost is defined as
one-third of the hourly wage; hourly values of the wage are derived from respondent annual
income and a 2,000 hour work year (40 hours per-week x 50 weeks per-year).
31
Table 2. Estimated Site Efficiency Frontier Models Averaged Over Trips
Number of Trips: 5,556
Number of Sites: 483
Truncated Normal
Site Efficiency Frontier Models
Average Coefficient Estimates
Ln(Time)
0.146*
(0.040)
Intercept
-0.904*
(0.208)
Summary of Ln(Time) Coefficients
% Positive and Significant
76.6
% Negative and Significant
2.1
% Insignificant
21.3
Summary of Site Efficiency Estimates
Mean
Standard
Deviation
Estimated Site Efficiency:
0.87
0.25
0.06
0.99
Estimated One-Sided Error:
0.22
0.47
1.02e-07
2.84
Min
Max
Notes: The dependent variable is the natural logarithm of expected daily catch. The number
of estimated models is equal to the number of trips (5,556), and the number of observations
per model is equal to the number of sites contained in the universal choice set (483). The
reported coefficient estimates are averages of the individual coefficient estimates. Standard
errors are reported in parentheses. * denotes significance at the 0.01 level. The reported
percentages are based upon two-tailed t-tests of the null Ho:
= 0 conducted at the 0.05
level. Approximately seventy percent of the intercepts were significant.
32
Table 3. Choice Set Characteristics: Efficiency vs. Deterministic Criteria
Site Efficiency Choice Sets
Mean Standard
Sites Deviation
Min
Max
Travel-Time Dependent Choice Sets
Mean
Efficiency
Scores (7)
-3.0
483
0
483
483
-2.5
482.81
1.09
476
483
0.001
-2.0
481.96
2.24
475
483
-1.5
470.46
11.91
435
-1.0
386.54
15.88
-0.5
297.39
0.0
Travel
Time
N
-0.00001 2,683,548
Mean Standard
Sites Deviation
Min
Max
Mean
Efficiency
Scores (7)
N
5.0
483
0
483
483
-0.00001 2,683,548
2,682,481
4.5
482.82
0.68
475
483
-0.0001
2,682,543
0.005
2,677,775
4.0
479.38
14.05
331
483
-0.002
2,663,410
483
0.05
2,613,897
3.5
472.37
30.08
214
483
-0.01
2,624,501
336
448
0.32
2,147,590
3.0
460.50
47.75
123
483
-0.01
2,558,536
20.25
217
337
0.64
1,652,302
2.5
438.66
69.14
55
483
-0.01
2,437,168
228.92
16.99
175
274
0.91
1,271,870
2.0
398.99
87.65
31
470
-0.02
2,216,797
0.5
167.10
14.66
127
209
1.15
928,380
1.5
321.67
92.35
15
445
-0.02
1,787,184
1.0
94.71
11.23
72
136
1.45
526,191
1.0
209.21
78.38
7
317
-0.02
1,162,368
1.5
37.72
9.42
16
74
1.80
209,584
0.5
69.07
34.67
3
146
0.07
383,732
2.0
13.52
5.97
2
33
2.03
75,131
0.3
35.86
20.49
2
85
0.13
199,241
33
Table 4. Relative Efficiency of Individual Choices: Efficiency vs. Deterministic Criteria
Site Efficiency Choice Sets
Efficiency Ranks
Travel-Time Dependent Choice Sets
Mean
Min
Max
Relative Site
Efficiency
Ranks (9)
Relative
Efficiency
Scores (8)
Travel
Time
Efficiency Ranks
Mean
Min
-3.0
156.42
1
483
0.32
-0.00002
5.0
156.42
1
483
0.32
-0.00002
-2.5
156.42
1
483
0.32
0.001
4.5
156.36
1
483
0.32
-0.0002
-2.0
156.42
1
483
0.32
0.01
4.0
154.89
1
483
0.32
-0.003
-1.5
156.36
1
478
0.33
0.07
3.5
151.95
1
483
0.32
-0.01
-1.0
154.45
1
434
0.40
0.50
3.0
147.47
1
475
0.32
-0.01
-0.5
144.83
1
331
0.49
1.01
2.5
140.34
1
466
0.32
-0.02
0.0
129.43
1
260
0.57
1.44
2.0
127.93
1
463
0.32
-0.02
0.5
106.48
1
205
0.64
1.82
1.5
103.02
1
403
0.32
-0.04
1.0
71.02
1
128
0.75
2.30
1.0
67.33
1
298
0.32
-0.04
1.5
33.27
1
72
0.88
2.85
0.5
24.01
1
123
0.35
0.11
2.0
13.14
1
33
0.97
3.22
0.3
13.21
1
67
0.36
0.21
34
Max
Relative Site
Efficiency
Ranks (9)
Relative
Efficiency
Scores (8)
Table 5. Welfare Losses for Site Closures: Efficiency vs. Deterministic Criteria
Site Efficiency Choice Sets
Travel-Time Dependent Choice Sets
Welfare Losses
Mean
Sites
Closed
Percent
of Anglers
Affected
Mean
per Trip
Standard
Deviation
-3.0
13
100
0.32
-2.5
13
100
-2.0
13
-1.5
Welfare Losses
Travel
Time
Mean
Sites
Closed
Percent
of Anglers
Affected
Mean
per Trip
Standard
Deviation
0.51
5.0
13
100
0.32
0.51
0.32
0.51
4.5
13
100
0.32
0.51
100
0.32
0.51
4.0
13
100
0.32
0.51
12.98
100
0.32
0.51
3.5
12.91
100
0.32
0.51
-1.0
12.28
99.81
0.36
0.56
3.0
12.77
99.07
0.32
0.51
-0.5
7.90
93.99
0.40
0.65
2.5
12.61
98.06
0.32
0.51
0.0
3.26
60.22
0.37
0.69
2.0
12.11
96.21
0.32
0.51
0.5
1.30
28.31
0.31
0.73
1.5
11.79
94.26
0.32
0.51
1.0
0.56
14.62
0.69
3.32
1.0
9.85
86.03
0.34
0.52
1.5
0.28
11.19
6.52
41.18
0.5
3.76
39.96
1.21
1.85
2.0
0.09
6.85
32.19
182.64
0.3
2.16
30.62
-0.73
1.28
35
Efficiency Choice Sets
Horizontal Axis: Calibrating Parameter ( )
Vertical Axis: Travel Cost Coefficient
Travel-Time Dependent Choice Sets
Horizontal Axis: One-Way Travel Time
Vertical Axis: Travel Cost Coefficient
Figure 1. Sensitivity of Estimated Travel Cost Coefficients: Efficiency vs. Deterministic
Criteria
36
Efficiency Choice Sets
Horizontal Axis: Calibrating Parameter ( )
Vertical Axis: Expected Catch Coefficients
Travel-Time Dependent Choice Sets
Horizontal Axis: One-Way Travel Time
Vertical Axis: Expected Catch Coefficients
Figure 2. Sensitivity of Estimated Expected Catch Coefficients: Efficiency vs. Deterministic
Criteria
37
Appendix A. Variable Definitions and Summary Statistics
Standard
Variable Name
Definition
Mean
Deviation
I = 1,081
Angler Characteristics
Age
Angler age
40.58
12.91
Male
1 if angler is male; 0 if angler is female
0.88
0.32
Ln(Days)
Natural logarithm of single day trips
1.09
0.95
Targeted
1 if angler targeted the species;
0 otherwise:
Salmon
0.36
0.48
Brook Trout
0.37
0.48
Lake Trout
0.26
0.44
Brown Trout
0.23
0.42
J = 483
Fishery Characteristics
Species Not Present 1 if species not known to be present
by managing agency; 0 otherwise: Salmon
0.57
0.50
Brook Trout
0.25
0.44
Lake Trout
0.74
0.44
Brown Trout
0.70
0.46
Species Not Abundant 1 if species not known to be abundant
by managing agency; 0 otherwise: Salmon
0.11
0.31
Brook Trout
0.32
0.47
Lake Trout
0.05
0.22
Brown Trout
0.10
0.30
38
Appendix A continued
Stocked Species
1 if lake or pond stocked with species;
0 otherwise:
Salmon
0.26
0.44
Brook Trout
0.32
0.47
Lake Trout
0.08
0.28
Brown Trout
0.20
0.40
Surface area of lake or pond in acres
scaled by 1,000
2.21
6.32
Depth
Depth of lake or pond in feet, scaled by 10
2.18
1.66
Elevation
Elevation of lake or pond above sea level
scaled by 100
5.45
4.52
Acres
Water Type 1
1 if oligotrophic water (low productivity;
deep secchi disk readings); 0 otherwise
0.28
0.45
Water Type 2
1 if eutrophic water (high productivity;
shallow secchi disk readings); 0 otherwise
0.34
0.47
No Live Bait
1 if no-live-bait regulation at lake or pond;
0 otherwise
0.13
0.34
Catch and Release
1 if catch-and-release regulation at lake or
pond; 0 otherwise
0.01
0.12
39
Appendix B. Estimated Zero-Inflated Negative Binomial Models of Expected Catch
Expected Catch
Salmon
Brook Trout
Lake Trout
Brown Trout
Log(Days)
0.80*
(0.07)
0.63*
(0.07)
0.79*
(0.09)
0.81*
(0.09)
Targeted
1.92*
(0.16)
2.31*
(0.16)
2.31*
(0.21)
2.23*
(0.20)
Age
0.02*
(0.00)
-0.02*
(0.01)
0.00
(0.01)
-0.01
(0.01)
Male
-0.38
(0.23)
0.35
(0.23)
-0.54***
(0.32)
0.63***
(0.33)
Acres
0.00
(0.00)
-0.01**
(0.00)
0.02**
(0.01)
0.00
(0.01)
Depth
-0.04
(0.07)
0.00
(0.03)
-0.02
(0.08)
-0.14
(0.11)
Elevation
0.04*
(0.02)
0.05*
(0.02)
0.00
(0.03)
0.05
(0.03)
Water Type 1
0.34
(0.17)
-0.53*
(0.17)
0.59***
(0.32)
0.13
(0.35)
Water Type 2
-0.21
(0.22)
-0.16
(0.17)
0.17
(0.32)
0.15
(0.19)
No Live Bait
-0.39
(0.30)
-0.28
(0.19)
0.04
(0.44)
-1.12***
(0.60)
Species Not Present
-0.20
(0.37)
-0.05
(0.35)
-0.47
(0.40)
-0.27
(0.52)
Species Not Abundant
-0.57*
(0.31)
-0.99*
(0.21)
0.18
(0.55)
0.02
(0.55)
Stocked Species
-0.36*
(0.20)
0.12
(0.15)
-0.19
(0.24)
0.40
(0.46)
Catch and Release
0.04
(0.46)
-1.12**
(0.48)
-0.15
(0.52)
-0.37
(0.76)
Constant
-2.37
(0.44)
-1.20*
(0.38)
-2.60*
(0.59)
-2.85*
(0.63)
Modified Negative Binomial
40
Appendix B Continued
Logit of Positive Catch
Species Not Present
4.51*
(0.80)
3.12*
(0.54)
2.07*
(0.46)
1.73*
(0.56)
Species Not Abundant
2.28*
(0.75)
1.64*
(0.48)
1.80**
(0.74)
2.06*
(0.69)
Age
-0.03*
(0.01)
-0.05*
(0.02)
-0.02
(0.01)
-0.03***
(0.02)
Male
-1.97*
(0.75)
-0.35
(0.46)
-0.64
(0.49)
-0.17
(0.61)
0.36*
(0.12)
0.69*
(0.13)
0.47*
(0.22)
0.52***
(0.27)
Mean
1.48
1.80
0.61
0.61
Standard Deviation
3.09
3.20
1.48
1.51
Minimum
0.00
0.01
0.01
0.00
Maximum
34.89
27.74
19.51
19.58
Sample Size
1,310
1,305
1,227
1,213
-1,366.74
-2,022.56
-1,185.18
-1,141.60
Dispersion Parameter
Log(
)
Prediction Statistics
Log-Likelihood
Notes: The dependent variable is the reported catch per visited site during the open-water
fishing season. *, **, and *** denote significance at the 0.01, 0.05, and 0.10 level,
respectively.
41
Appendix C. Comparison of Estimated Site Efficiency Frontier Models
Truncated Normal Site Efficiency
Number of Trips: 5,556
Number of Sites: 483
DoubleLog
LinearLog
0.147*
(0.041)
Travel Time
Travel Time Squared
Half-Normal Site Efficiency
DoubleLog
LinearLog
---
0.146*
(0.041)
0.043*
(0.012)
---
---
---
-1.103*
(0.217)
0.547*
(0.059)
% Positive and Significant
76.6
% Insignificant
% Negative and Significant
Exponential Site Efficiency
DoubleLog
LinearLog
---
0.147*
(0.041)
0.043*
(0.012)
---
---
---
-1.062*
(0.285)
0.540*
(0.117)
78.5
76.6
18.3
20.9
1.4
0.6
Linear
Quadratic
Linear
Quadratic
0.086*
(0.027)
---
---
---
---
Linear
Quadratic
0.086*
(0.027)
---
0.086*
(0.027)
---
---
---
---
---
---
0.043*
(0.012)
---
0.006*
(0.002)
---
0.006*
(0.002)
---
---
---
0.006*
(0.002)
0.500*
(0.063)
0.552*
(0.113)
0.493*
(0.123)
0.552*
(0.113)
-1.103*
(0.217)
0.547*
(0.059)
0.500*
(0.063)
0.552*
(0.113)
75.5
80.3
75.5
80.3
78.5
76.6
75.5
80.3
78.5
21.3
21.5
21.3
21.5
18.3
20.9
21.3
21.5
18.3
20.9
2.1
3.0
2.1
3.0
1.4
0.6
2.1
3.0
1.4
0.6
Coefficient Estimates
Log(Travel Time)
Intercept
Coefficient Summary†
Notes: The dependent variable in the linear, quadratic, and semi-log models is defined as the sum of the salmon, brook trout, lake catch, and brown trout catch
expectations generated from zero-inflated negative binomial (ZINB) models (see Appendix A). The dependent variable in the double-log models is the natural
logarithm of the sum of the catch expectations. Coefficient estimates and standard errors (in parentheses) are obtained by averaging the coefficients over the 5,556
estimated stochastic frontier models. * denotes significance at the 0.01 level. † Significance based upon two-tailed t-tests ( = 0.05) of the null hypothesis Ho : =
0. While not reported here, approximately 96 percent of the intercepts were significant ( = 0.05) with the levels and semi-log specifications; about 70 percent
were significant with the double-log specifications.
42
Appendix D. Conditional Logit Estimation Results: Efficiency Choice Sets
Pseudo R2
N
-3.0
-0.153*
2.432*
0.616*
3.147*
2.838*
0.326
2,683,548
-2.5
-0.153*
2.432*
0.614*
3.157*
2.829*
0.326
2,682,481
-2.0
-0.153*
2.431*
0.607*
3.146*
2.810*
0.326
2,677,775
-1.5
-0.153*
2.425*
0.603*
3.143*
2.768*
0.323
2,613,897
-1.0
-0.148*
2.277*
0.529*
3.047*
2.385*
0.311
2,147,590
-0.5
-0.142*
1.784*
0.104**
2.704*
1.168*
0.310
1,652,302
0.0
-0.136*
0.860*
-0.676*
2.178*
-0.182
0.324
1,271,870
0.5
-0.129*
-0.466*
-2.379*
1.184*
-0.792*
0.360
928,380
1.0
-0.109*
-4.353*
-5.192*
0.014
-3.746*
0.432
526,191
1.5
-0.063*
-13.663*
-15.809*
-5.639*
-12.754*
0.643
209,584
2.0
-0.031*
-21.760*
-22.457*
-5.504*
-20.147*
0.857
75,131
Notes: The choice sets were generated with the stochastic frontier model with the site efficiency
component assumed to be distributed as truncated normal and the random error assumed to be
normally distributed. * and ** denote significance at the 0.01 and 0.05 level, respectively.
43
Appendix E. Conditional Logit Estimation Results: Deterministic Choice Sets
Travel
Time
Pseudo R2
N
5.0
-0.153*
2.432*
0.616*
3.147*
2.838*
0.326
2,683,548
4.5
-0.153*
2.432*
0.616*
3.147*
2.838*
0.326
2,682,543
4.0
-0.153
2.432*
0.616*
3.147*
2.838*
0.325
2,663,410
3.5
-0.153*
2.432*
0.616*
3.147*
2.838*
0.323
2,624,501
3.0
-0.153*
2.432*
0.616*
3.147*
2.838*
0.319
2,558,536
2.5
-0.153*
2.432*
0.616*
3.148*
2.838*
0.312
2,437,168
2.0
-0.153*
2.432*
0.616*
3.148*
2.839*
0.298
2,216,797
1.5
-0.152*
2.433*
0.617*
3.167*
2.841*
0.267
1,787,184
1.0
-0.144*
2.444*
0.623*
3.309*
2.848*
0.199
1,162,368
0.5
-0.036*
2.099*
0.638*
5.008*
2.992*
0.077
383,732
0.3
0.062*
2.674*
0.756*
4.655*
3.280*
0.114
199,241
Notes: * denotes significance at the 0.01 level.
44