Discrete Choice with Social and Spatial Network

Dugundji & Walker
Discrete Choice with Social and Spatial Network Interdependencies
An Empirical Example Using Mixed GEV Models with Field and “Panel” Effects
Elenna R. Dugundji *
Department of Geography and Planning
University of Amsterdam
Nieuwe Prinsengracht 130
1018 VZ Amsterdam, Netherlands
Tel: +31 20 525 4581
Fax: +31 20 692 5813
Email: [email protected]
Joan L. Walker
Center for Transportation Studies
Boston University
675 Commonwealth Avenue
Boston, MA 02215, USA
Tel: +1 617 353 5743
Fax: +1 617 358 0205
Email: [email protected]
Submission date: August 1, 2004
Revised: November 15, 2004
Final version: March 31, 2005
Word count: 7451 (including abstract, references, 1 table, 1 figure)
*) Corresponding author
ABSTRACT
Discrete choice analysis has become an industry standard in land use and transportation models. Such models are
fundamentally grounded in individual choice, and therefore an outstanding challenge is the treatment of
interdependencies among decision-makers. This paper describes and illustrates through an empirical application to
mode choice how interdependencies can be captured in discrete choice. In this approach, decision-makers are
assumed to be influenced, for example, by those of similar socio-economic status and those in spatial proximity.
Given such social and spatial network relationships, interdependencies are captured in the choice model in two
ways. The first is to include in the systematic utility variables that describe choices of others in the decision-maker's
social and spatial network. The second is by allowing for correlation across the disturbances of decision-makers
within the same social and spatial network. Variations of these approaches (including their combination and the use
of random parameters) were tested with mode choice, and were compared with traditional methods of market
segmentation. The application results indicate that the proposed methods for capturing interdependencies are
significant and are superior to traditional methods. Furthermore, capturing the interdependencies in the systematic
utility was sufficient in that it explained better than the model with just correlation and was not significantly worse
than the model with both the systematic term and correlation. The systematic term also captures a feedback effect,
which, for example, can propel the adoption of a new mode over time. The models were estimated using a
traditional transportation dataset and readily available software.
INTRODUCTION
Pioneered in the domain of travel demand by Ben-Akiva, McFadden and others (1, 2), discrete choice analysis has
become an industry standard in land use and transportation planning models. Recent elaborate operational examples
of the development of this methodology are due to Wegener, Waddell, Martinez, Hensher (3)-(7), to cite just a few.
Meanwhile the field itself has flourished in the past 30 years ultimately extending the basic random utility model to
incorporate cognitive and behavioral processes, flexible error structures and different types of data in so-called
hybrid choice models (8)-(12). However as discrete choice theory is fundamentally grounded in individual choice,
an outstanding challenge remains in the treatment of the interdependence of various decision-makers’ choices (13).
This paper aims to illustrate issues in the empirical estimation of a discrete choice model with
interdependence between decision-makers’ choices using mixed GEV model structures. The paper is organized as
follows. First the particular forms of interdependence to be addressed in this paper are motivated. A brief review of
literature is presented describing what the paper brings to an existing stream of work, and a framework for
interactions among decision-making agents is outlined. Next, five different modeling strategies are proposed to test
central hypotheses of the paper. The model is applied to a case study in the Netherlands to highlight some main
hypothesized social and spatial interaction effects. Estimation results are discussed. Finally, limitations in the
present work are summarized and directions for future research efforts are outlined.
BACKGROUND
Our starting point in considering interdependence of various decision-makers’ choices is a trio of papers by
economists Aoki (14), Brock & Durlauf (15) and Blume & Durlauf (16). They introduce social interactions in binary
discrete choice models by allowing a given agent’s choice for a particular alternative to be dependent on the overall
share of decision-makers that choose that alternative. The specification captures feedback between decision-makers
that can potentially be reinforcing over the course of time. In diverse literature this is referred to as a social
multiplier, a cascade, a bandwagon effect, imitation, contagion, herd behavior, etc. (17). The lack of attention to such
models to date in the field of transportation is surprising. Namely, if there is theoretical or qualitative reason to
believe that a feedback effect exists, it can have very important implications for the prediction of (system-wide)
results over the course of time. For example, in the introduction of a new transportation mode alternative, if there is
a “dominant enough” feedback effect, this can propel the adoption of the new mode over time.
How dominant is “dominant enough” under different conditions? Brock & Durlauf (18) have extended their
results on the behavior of binary logit models to multinomial logit models. Dugundji (19) makes Brock & Durlauf’s
multinomial results precise for trinary multinomial choice and extends the results for the case of nested logit. Also,
while the behavior over time derived in early work assumed each decision-maker to be influenced by all other
decision-makers (so-called global interactions), Dugundji & Gulyas (20, 21) derive more general behavior for the
case where each decision-maker is influenced by only a subset of decision-makers (so-called non-global
interactions).
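As a minimal illustration of what "dominant enough" can mean, the sketch below iterates a binary logit share equation in which the only explanatory variable is the feedback term, in the spirit of the global-interaction models cited above. The parameter name `b`, the utility difference `b * (2p - 1)` and the starting share are assumptions chosen for illustration; they are not the specification estimated in this paper.

```python
import math

def iterate_share(b, p0, steps=200):
    """Iterate p <- logistic(b * (2p - 1)) for a binary choice.

    b  : strength of the feedback (field) effect, hypothetical value
    p0 : initial share choosing alternative 1
    """
    p = p0
    for _ in range(steps):
        # The utility difference depends only on the current share of adopters.
        p = 1.0 / (1.0 + math.exp(-b * (2.0 * p - 1.0)))
    return p

# Weak feedback: the slope of the map at p = 0.5 is b / 2 < 1,
# so an initial lead of 10 percentage points dies out.
weak = iterate_share(b=1.0, p0=0.6)

# Strong feedback: the slope at p = 0.5 exceeds 1, p = 0.5 is unstable,
# and the same initial lead grows toward a high-share equilibrium.
strong = iterate_share(b=4.0, p0=0.6)
```

In this toy setting the threshold sits at b = 2; with other explanatory variables in the utility, as in the models estimated later, the threshold would depend on those variables as well.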
A key to all of the above theoretical results, however, is the assumption that the only explanatory variable
in the model is the feedback effect. While such a specification may be plausible for a fad such as the latest new
children’s toy, it is much less intuitive for transportation mode choice where other explanatory variables would be
assumed to be significant, including both attributes of the alternatives as well as characteristics of the decision-making agents. Dugundji & Gulyas (22) present initial results using simulated data for a binary logit model with
non-global interactions and other explanatory variables included in the utility. In a companion paper to this one,
Dugundji & Gulyas (23, 24) present results for the behavior over time of a nested logit model with non-global
interactions, using the same empirical data and the same social and spatial network interdependencies of which
decision-makers influence each other as to be defined here.
One econometric issue that arises in empirical estimation however is that the same unobserved effects
might be likely to influence the choice made by a given decision-maker as well as the choices made by those in the
decision-maker’s reference group. In terms of transportation mode choice, for example accessibility measures for
residents in the neighborhood could play such a role to the extent that these were unable to be directly captured
through explanatory variables in the utility specification. In this case, the use of transportation mode shares of
neighbors living in the same zone as an explanatory variable will be correlated with the unobserved error of the
given decision-maker, which is a classic case of endogeneity. If endogenous, the results and coefficients of such a
model will be biased. To try to separate out effects, it is therefore critically important to begin with as well-specified a model as possible, making use of relevant available explanatory variables. Consequences of omitting
significant explanatory variables in the utility specification and improperly accounting for availability of alternatives
in nested logit discrete choice models with feedback effects are considered in Dugundji & Gulyas (25). In this paper
we continue this exploration of issues in the empirical estimation of discrete choice models with feedback effects, by
specifically testing for correlation among agents in the error structure in the particular empirical case study of mode
choice to work, through the use of mixed GEV family models. Our intent is to capture potential bias
induced by endogeneity by explicitly accounting for the supposed correlation in the error structure. For
completeness, we also compare feedback effect models to baseline models where limited social and spatial
interdependence can be introduced as alternative-specific dummy variables.
A FRAMEWORK FOR SOCIAL AND SPATIAL NETWORK INTERDEPENDENCIES
In order to model interactions among decision-makers, it is necessary to formulate the nature in which these
interactions take place. It is useful in this context to apply the concept of a network as a formal representation of
such interactions, where nodes represent decision-making entities and links are drawn to represent interdependencies
between these entities. We address now in turn issues of scope and mechanism.
Perhaps most salient is the issue of scope: are we considering only supply-side interactions, only demand-side interactions, or supply-demand interactions? (13) This is important due to the inherently more complex nature
of supply-side and supply-demand interaction and the importance of the specific role and position of decision-makers in an (unbalanced) interaction process. Hensher (26) outlines considerations in this regard and proposes at
least eight different styles of decision-making processes ranging from autocratic to leaderless. A particularly novel
approach to supply-demand interactions in the transportation literature is found in the studies by Brewer & Hensher (27) and
Rose & Hensher (28). Combining ideas from game theory as well as discrete choice theory, they develop what they
term “Interactive Agency Choice Experiments” between employer and employee to study adoption of participation
in distributed work. The focus in this paper however will be only on demand-side interactions.
To make an effort to classify different types of demand-side interaction mechanisms, a distinction is
hypothesized between social versus spatial interactions and between identifiable versus aggregate interactions (20,
21). We speak of interaction between “identifiable” decision-makers when the links in the network are well-known
and explicitly defined on an individual decision-maker by decision-maker basis. We speak of interaction between
“aggregate” decision-makers when interdependence is assumed to take place only at an aggregate level with links
being defined, for example, more generally based on decision-maker characteristics. We speak of “spatial” network
interactions when the interdependence represents a confluence of decision-makers in geographic terms. For
example, decision-makers may be linked based on spatial proximity of residential location, work location or other
geographical point of reference such as school, childcare, shopping, healthcare, leisure/recreation or other relevant
activity location. We speak of “social” network interactions when decision-makers are linked based on social circles.
With social interactions, the decision-makers need not be proximally or tangentially situated in geographical terms
and the interaction is not necessarily centered at a particular geographic point of reference; interaction may take
place at a distance, so to speak.
The research reported here explores interactions between a decision-maker and the aggregate actions of
other decision-makers proximally situated in a spatial network, and interactions between a decision-maker and the
aggregate actions of other decision-makers associated in a socioeconomic network. Technically, however,
interactions between identifiable decision-makers may also be modeled using the approach described in this paper
given the availability of suitable data, and thus methods reported here may prove to be useful in those areas as well.
We will return briefly to the limitations of our approach as applied to identifiable interactions in the discussion at the
end of this paper.
One particular special case of spatial interactions between identifiable agents worthy of note is that of intra-household interactions. There are various reasons why we might expect a priori that intra-household interaction
would be important in travel demand behavior. Vovsha, Bradley & Bowman (29) and Vovsha et al. (30) provide a
useful review of literature, distinguishing between coordination of individual daily activity patterns, joint
participation in activities and travel and intra-household mechanisms for allocation of maintenance activities. Other
interesting examples of the application of intra-household interactions include residential choice behavior (31)-(34)
and in the marketing literature, for example, holiday destination choice (35).
In the case study to be discussed, we do not have available data on which identifiable agents influence other
identifiable agents’ choices. We do however have rich socioeconomic data for each respondent as well as the
geographic location of each respondent’s residence. This allows us to define aggregate interactions by grouping
agents into geographic neighborhoods or into socioeconomic groups indexed (1,...,g,...,G) where the influence is
assumed to be more likely. In the simplest case, these groups are assumed to be mutually exclusive and collectively
exhaustive. That is each agent n belongs to one and only one group g. The agent is influenced by the average choice
behavior of his or her group, and the influence by other groups is assumed to be negligible. Figure 1(a) illustrates the
transportation mode shares for decision-makers in the sample grouped by residential district. At a global level, the
picture is a fragmented or disconnected network of clustered groups. If we are interested in equilibrium behavior, the
consequences of such an assumption are important: there is no transmission of influence across groups, and the
global picture is a weighted average behavior of the separate clusters. As in Dugundji & Gulyas (23)-(25), we also
consider the case with overlapping groups, with agents for example connected by social group as well as by
residential district. This leads to a giant cluster for the empirical example under consideration, with the important
implication that influence can spread throughout the entire population. Such a network is abstractly visualized in
Figure 1(b) using the freely available software package Pajek developed by Batagelj & Mrvar (http://vlado.fmf.uni-lj.si/pub/networks/pajek).
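The contrast between disconnected clusters and a single giant cluster can be sketched with a small connectivity check. The function name `count_components`, the six toy agents and their group labels are hypothetical and serve only to illustrate the idea; they are not the sample data.

```python
def count_components(n_agents, groupings):
    """Count connected components when agents sharing a group are linked.

    groupings: list of lists; groupings[k][n] is agent n's group label
    under the k-th grouping (e.g. district, socioeconomic group).
    """
    parent = list(range(n_agents))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for labels in groupings:
        first_seen = {}
        for agent, label in enumerate(labels):
            if label in first_seen:
                # Merge this agent's cluster with the cluster of its group.
                parent[find(agent)] = find(first_seen[label])
            else:
                first_seen[label] = agent
    return len({find(a) for a in range(n_agents)})

district = ["A", "A", "B", "B", "C", "C"]   # toy spatial grouping
social = ["x", "y", "y", "z", "z", "x"]     # toy socioeconomic grouping

only_district = count_components(6, [district])
overlapping = count_components(6, [district, social])
```

With the district grouping alone the toy network splits into three clusters; adding the second, overlapping grouping joins all six agents into one component, which is the property that lets influence spread through the whole population.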
SOME STRATEGIES FOR INTRODUCING SOCIAL AND SPATIAL NETWORK
INTERDEPENDENCIES IN CHOICE MODELS
Discrete choice theory allows prediction based on computed individual choice probabilities for heterogeneous
agents’ evaluation of alternatives. We present several strategies for introducing social and spatial network
interdependencies in choice models: as a generic feedback effect, as unobserved group heterogeneity, as a set of
alternative-specific dummy variables, and as a random parameter feedback effect. Note that we want to avoid
confounding correlation among alternatives with any potential correlation among agents. Although we choose the
cross-nested logit model (36) as a starting point owing to its fairly general closed form, the strategies could in
principle be applied to any suitable choice model.
Strategy 1: Field Effect for Reference-Group
We introduce a feedback effect among agents by allowing the systematic utility Vin to be a linear-in-parameter β first
order function of the proportion xin of a given decision-maker’s reference entities who have made this choice. The
variable xin is termed a “field variable.” As motivated by Aoki, “Knowledge of a field variable relieves agents (at
least partially) of the need for detailed information on interaction patterns. Any macroeconomic variable that serves
this decentralizing function is called a field variable.”
Our model differs from the Aoki model however, in that we consider non-global interactions. We do
consider an aggregated feedback, but agents see different proportions, depending on who their particular reference
entities are. In fact, even agents within the same group see (slightly) different proportions of reference agents
choosing each alternative because an agent’s own choice is not included in these proportions. If we had data
available to support treatment of network interaction effects with identifiable households, the approach could be
further generalized for example to allow different weights on different ties.
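Under the description above, one way to write the field variable and the resulting systematic utility is sketched below. The symbols N_g (number of sampled agents in the reference group g of agent n), y_im (1 if agent m chose alternative i, 0 otherwise) and \tilde{V}_in (the part of the systematic utility that does not involve the field effect) are notation introduced here for illustration and do not appear in the original text.

```latex
x_{in} = \frac{1}{N_g - 1} \sum_{\substack{m \in g \\ m \neq n}} y_{im},
\qquad
V_{in} = \tilde{V}_{in} + \beta \, x_{in}
```

Excluding m = n from the sum is what makes agents in the same group see slightly different proportions.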
Strategy 2: Unobserved Group Heterogeneity (“Panel” Effect)
Next we consider correlation among agents in the disturbance of the utility. We model the correlation among agents
within a group g as a panel effect. In a typical choice model application with panel data, the panel aspect captures
correlation across multiple responses from the same individual. The panel implementation that we use captures
correlation among different decision-makers based on the connectivity in a social and/or spatial network. Let ξg =
[ξ1g,..., ξig,..., ξJg]T be a vector of unobserved random terms where the ξig are now additively introduced to the
utility function Uin. The probability Pn(i) that agent n chooses alternative i from the composite choice set Cn is:
$$P_n(i) = \int_{\xi_g} P_n(i \mid \xi_g) \, f(\xi_g) \, d\xi_g$$   [1]
where f(ξg) denotes the multivariate probability density of ξg. Our intent is to capture potential bias
induced by endogeneity, by explicitly accounting for the supposed correlation in the error structure. Where these
error terms can be shown to be statistically insignificant, we assume that the hypothesized endogeneity has
negligible effect.
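The integral in [1] generally has no closed form and is approximated by simulation. The following is a minimal sketch under simplifying assumptions that are not the paper's: a plain multinomial logit kernel instead of the cross-nested kernel, independent normal error terms, and hypothetical utilities and standard deviations.

```python
import math
import random

def logit_probs(utilities):
    """Multinomial logit probabilities for a list of utilities."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def simulated_probs(v, sigma, n_draws=2000, seed=0):
    """Approximate P_n(i) = integral of P_n(i | xi_g) f(xi_g) d xi_g.

    v     : systematic utilities V_in, one per alternative
    sigma : standard deviations of the group error terms xi_ig
            (assumed independent normal in this sketch)
    """
    rng = random.Random(seed)
    acc = [0.0] * len(v)
    for _ in range(n_draws):
        xi = [s * rng.gauss(0.0, 1.0) for s in sigma]
        p = logit_probs([vi + x for vi, x in zip(v, xi)])
        acc = [a + pi for a, pi in zip(acc, p)]
    # Average the conditional probabilities over the draws.
    return [a / n_draws for a in acc]

v = [0.2, 0.0, -0.1]                      # hypothetical utilities (pt, bi, ca)
p_mixed = simulated_probs(v, sigma=[0.8, 0.0, 0.5])
p_plain = simulated_probs(v, sigma=[0.0, 0.0, 0.0])
```

For a "panel" effect in estimation, the same draw of ξg would be shared by all agents in group g, so that the simulated likelihood averages the product of the group members' conditional probabilities rather than each probability separately.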
Strategy 3: Alternative-Specific Dummy Variable for Reference-Group
For completeness, we also consider a more typical model specification with a complete set of alternative-specific
dummy variables (ASDV) by reference group. This would be a case of observed group heterogeneity. In general we
can expect a complete set of ASDV to perform well in terms of the loglikelihood value, in being very flexible in
capturing variation across alternatives and across groups. The difficulty with adding a complete set of ASDV is the
proliferation of estimated parameters if there are a large number of groups and/or large number of alternatives. In
such a case we might run a considerable risk of overfitting our model.
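The growth in parameters can be made concrete with a short count. The sketch assumes the usual identification convention, in which one group and one alternative act as reference levels and alternative-specific constants are already in the model; the function name is hypothetical.

```python
def n_asdv_parameters(n_groups, n_alternatives):
    """Number of identifiable group-by-alternative dummy coefficients.

    One group and one alternative serve as reference levels, assuming
    alternative-specific constants are already in the model.
    """
    return (n_groups - 1) * (n_alternatives - 1)

# Illustrative group counts with three alternatives.
counts = {g: n_asdv_parameters(g, 3) for g in (9, 13, 67)}
```

Under this convention the count grows linearly in the number of groups for a fixed choice set, whereas a generic field effect adds a single coefficient regardless of the number of groups.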
Strategies 4 and 5: Random Field Effect and Panel Field Effect
Finally, we allow the field effect described in strategy #1 to have alternative-specific variance σi associated with it.
In strategy #4 we model this variance most generally as unobserved individual heterogeneity (random parameter
field effect). In strategy #5 we model this variance as unobserved group heterogeneity: the variance σi on the field
effect is constrained to be equal for agents in the same reference group (“panel” field effect). The distinction
between this last model versus strategies #1 and #2 estimated directly together is that here the unobserved group
heterogeneity is on the field effect itself, instead of being estimated on its own. Strategies #4 and #5 thus gain
flexibility in capturing variation across alternatives and across individuals/groups as with the complete set of ASDV
in strategy #3, but without the drawback of a potential proliferation of estimated parameters with a large number of
groups and/or alternatives.
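The difference between strategies #4 and #5 lies in how the random draws are indexed. The sketch below shows one realization of the field-effect coefficient for a single alternative; the function name, the normal distribution, the use of `sigma` as a standard deviation and all numerical values are assumptions for illustration.

```python
import random

def field_coefficients(group_of_agent, beta, sigma, per_group, seed=0):
    """Draw one realization of the random field-effect coefficient.

    per_group=False: each agent gets its own draw (strategy #4 style).
    per_group=True : agents in the same reference group share a draw
                     (strategy #5 style).
    """
    rng = random.Random(seed)
    if per_group:
        eta = {g: rng.gauss(0.0, 1.0) for g in sorted(set(group_of_agent))}
        return [beta + sigma * eta[g] for g in group_of_agent]
    return [beta + sigma * rng.gauss(0.0, 1.0) for _ in group_of_agent]

groups = ["g1", "g1", "g2", "g2", "g2"]   # hypothetical group membership
individual = field_coefficients(groups, beta=1.0, sigma=0.5, per_group=False)
shared = field_coefficients(groups, beta=1.0, sigma=0.5, per_group=True)
```

In both cases only the mean and the spread are estimated, which is why the number of parameters does not grow with the number of groups.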
CASE STUDY
The research reported here is a small part of a larger work aimed at assessing multi-modal transportation as a land
use transportation planning policy instrument for reducing road congestion (37)-(40). The contribution of this
particular subproject is the treatment of social and/or spatial interdependencies between households and the attempt
to understand any potential role of generated feedback dynamics in the take-up of mode choice to work (41).
The data used in this subproject originates from travel questionnaires administered by the Municipality of
Amsterdam Agency for Infrastructure, Traffic and Transport (dIVV) during the period 1992-1997 in Amsterdam and
a neighboring suburb to the south of the city, Amstelveen. Due to limitations in the available data, we consider a
trip-based home to work model.
Baseline Model with No Inter-Agent Dependencies
Definition of Variables
The model is a mode choice model to work, with alternatives public transit (pt), bicycle/motorcycle (bi) and car
driver/passenger (ca). Raw variables available for use in the model are defined as follows:
availpt     1 if public transit alternative is available, 0 otherwise
carown      1 if decision-maker owns a car, 0 otherwise
gender      1 if female, 0 if male
age         Respondent’s age category: 12-17 years; 18-29 years; 30-44 years; 45-59 years; 60 years and older
income      Respondent’s income in one of five categories based on Dutch governmental classification
education   Respondent’s education category: elementary education (LO); lower vocational education (LBO); high school education (MO); post high school education (HO); other
ivtpt       In-vehicle-time in minutes with public transit
ovtpt       Out-of-vehicle-time in minutes with public transit (access and egress, waiting, transferring, etc.)
ttbike      Travel time in minutes with bicycle
ttcar       Travel time in minutes with car
parktca     Time in minutes to park car
Various piecewise linear specifications of all travel time related variables (ivtpt, ovtpt, ttbike, ttcar, parktca) as
well as age (defined by the midpoint of the age category) were tested against linear, quadratic and logarithmic forms
of these variables. Considering various a priori hypotheses of behavior in the region and after statistical comparison
of the alternative nonlinear specifications of variables against the linear versions thereof using loglikelihood ratio
tests and non-nested tests (42), the following definitions are ultimately used in the baseline model:
lnage       Natural logarithm of age in years
age4559     Max [0, min (age - 45, 15) ]
ivtsqpt     In-vehicle-time in minutes with public transit, squared
lnttcar     Natural logarithm of travel time in minutes with car
parktsqcar  Time in minutes to park car, squared
aowmin      1 if income category at pension level but below poverty line, 0 otherwise
Additionally, availability of bicycle is defined:
availbi75   1 if travel time by bicycle is less than 75 minutes, 0 otherwise
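The transformed variables defined above can be computed from the raw variables as in the following sketch. The function name and the example values are hypothetical; aowmin is omitted because the coding of the income categories is not given here.

```python
import math

def derived_variables(age, ivtpt, ttcar, parktca, ttbike):
    """Compute the transformed baseline variables for one respondent.

    age is the midpoint of the respondent's age category, in years;
    times are in minutes. age and ttcar must be positive for the logarithms.
    """
    return {
        "lnage": math.log(age),
        # Piecewise linear term: 0 below 45, rising to at most 15 at age 60.
        "age4559": max(0.0, min(age - 45.0, 15.0)),
        "ivtsqpt": ivtpt ** 2,
        "lnttcar": math.log(ttcar),
        "parktsqcar": parktca ** 2,
        "availbi75": 1 if ttbike < 75 else 0,
    }

# Example: a respondent in the 45-59 category (midpoint 52 years).
row = derived_variables(age=52.0, ivtpt=20.0, ttcar=15.0, parktca=3.0, ttbike=40.0)
```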
Choice Kernel: Cross-Nested Logit
Estimation of three successive nested logit models, first with public transit nested with bicycle, then with public transit
nested with car, and finally with bicycle nested with car, shows the first two nesting structures to be significant in terms of
the loglikelihood ratio test and in terms of a t-test on the nest coefficient. The third nesting structure was not indicated.
Thus a cross-nested structure with public transit nested with both bicycle and car is a logical choice. This
cross-nesting structure was tested against a general structure where all 3 alternatives could belong to each of 2 nests,
and showed no significant loss of fit. Furthermore, the nest coefficient on the public transit and car nest and the nest
coefficient on the public transit and bicycle nest are constrained to be equal. Finally, constraining the two parameters
αpt, m defining the degree of “membership” of the public transit alternative in each of the two nests m to be equal also
showed no significant loss of fit, and an improvement in the t-test on αpt, m. The baseline cross-nested model thus adds
two additional parameters, namely a generic nest coefficient and a generic membership in the nests.
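A minimal sketch of choice probabilities under such a cross-nested structure follows. It uses one common parametrization, with a nest coefficient between 0 and 1 acting as a dissimilarity parameter; this may be the inverse of the nest coefficient reported by a given software package. The function name, the utilities and the coefficient values are hypothetical.

```python
import math

def cross_nested_probs(v, alpha, mu):
    """Cross-nested (generalized nested) logit choice probabilities.

    v     : utilities per alternative
    alpha : alpha[m][j] = membership of alternative j in nest m
            (each alternative's memberships sum to 1 across nests)
    mu    : mu[m] = nest coefficient of nest m, with 0 < mu[m] <= 1
    """
    n_alt = len(v)
    terms, nest_sums = [], []
    for a_m, mu_m in zip(alpha, mu):
        t = [(a * math.exp(vj)) ** (1.0 / mu_m) for a, vj in zip(a_m, v)]
        terms.append(t)
        nest_sums.append(sum(t))
    denom = sum(s ** mu_m for s, mu_m in zip(nest_sums, mu))
    probs = [0.0] * n_alt
    for t, s, mu_m in zip(terms, nest_sums, mu):
        p_nest = s ** mu_m / denom              # probability of nest m
        for j in range(n_alt):
            probs[j] += p_nest * t[j] / s       # times P(j | nest m)
    return probs

# Alternatives ordered (pt, bi, ca). Public transit belongs equally to a
# pt+bicycle nest and a pt+car nest; a single generic nest coefficient.
alpha = [[0.5, 1.0, 0.0],
         [0.5, 0.0, 1.0]]
v = [0.1, 0.4, 0.0]                             # hypothetical utilities
p_cnl = cross_nested_probs(v, alpha, mu=[0.6, 0.6])
p_mnl = cross_nested_probs(v, alpha, mu=[1.0, 1.0])
```

Setting both nest coefficients to 1 collapses the model to multinomial logit, which is a convenient sanity check on any implementation.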
Network Interdependence Defined by Residential District
Now we turn to the specification of the network interdependence. We begin with a broad classification by residential
district. See Figure 1(a). There are nine districts represented in the sample. Three new variables are created:
dtrpt(bi)(ca)nsl  Share of respondent’s fellow district residents in the sample choosing public transit (bicycle) (car)
The designation “nsl” refers to “no self-loops.” That is, the respondent’s own choice is not included in the
average behavior in the district perceived by a given respondent. The specification of these variables is generic. There are
two main motivations for choosing a generic specification, namely, simplicity and identification. In order to estimate three
distinct coefficients, we would either need to add an additional constraint or estimate the coefficients as random
parameters.
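The "no self-loops" shares can be computed as leave-one-out group proportions. The sketch below is one way to do so; the function name, the toy districts and the treatment of singleton groups (share set to 0) are assumptions, not a description of how the variables were actually constructed.

```python
from collections import Counter

def shares_no_self_loops(groups, choices, alternatives=("pt", "bi", "ca")):
    """Share of each agent's fellow group members choosing each alternative.

    groups[n]  : group label of agent n (e.g. residential district)
    choices[n] : chosen alternative of agent n
    The agent's own choice is excluded ("no self-loops").
    """
    size = Counter(groups)
    count = Counter(zip(groups, choices))
    result = []
    for g, c in zip(groups, choices):
        others = size[g] - 1
        row = {}
        for alt in alternatives:
            chosen_by_others = count[(g, alt)] - (1 if alt == c else 0)
            # A singleton group has no peers; the share is left at 0 here.
            row[alt] = chosen_by_others / others if others > 0 else 0.0
        result.append(row)
    return result

groups = ["d1", "d1", "d1", "d1", "d2", "d2"]   # hypothetical districts
choices = ["pt", "pt", "bi", "ca", "ca", "ca"]
x = shares_no_self_loops(groups, choices)
```

In the toy data the first and third agents live in the same district yet see different shares, because each leaves out his or her own choice.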
Network Interdependence Defined by Socioeconomic Group
Next using the variables age, income and education, thirteen socioeconomic groups are defined (41). As above, three
new variables are created, with generic specification:
socpt(bi)(ca)nsl  Share of respondent’s socioeconomic peers in the sample choosing public transit (bicycle) (car)
Network Interdependence Defined by Residential Postcode
Finally, to be able to test the effect of spatial scale, by way of comparison with the network interdependence defined
by residential district, we define a smaller neighborhood region of influence on the basis of 4-digit postcode. There
are 67 postcode regions represented in the sample. Three new variables are again created, with generic specification:
pc4pt(bi)(ca)nsl Share of decision-maker’s neighbors choosing public transit (bicycle) (car)
We might hypothesize that the smaller spatial scale network interdependence defined by postcode may be
more homogeneous with regard to choice behavior than that for the variables defined on the basis of district. Thus
we might expect the coefficient on these variables to be relatively stronger.
DISCUSSION
All models are estimated using the freely available optimization toolkit Biogeme developed by Bierlaire
(http://roso.epfl.ch/biogeme). The loglikelihood, number of estimated parameters and adjusted rho-squared are given
in Table 1 for all model strategies and various reference group treatments. The first, second and third columns give
summary results for network interdependence defined by socioeconomic group, residential district and residential
postcode, respectively. The fourth and the fifth columns are treatments where social and spatial interdependence are
considered jointly: agents are assumed not to distinguish between their socioeconomic peers and their fellow
district residents or neighbors when considering their choice behavior. In the remaining sixth through ninth columns,
the specification is extended to allow agents to weigh any influence from their socioeconomic peers differently from
any influence from their fellow district residents and furthermore differently from any influence from their more
immediate neighbors. The interested reader is referred to Dugundji (41) or Dugundji & Walker (43) for tables of
actual parameter estimates and an enumeration of specification tests.
We conclude from loglikelihood ratio tests on the field effect models versus the baseline model, that for
this particular case study and the network definitions under consideration, systematic field effects representing social
and spatial network interactions between an agent and the aggregate behavior of other reference agents do indeed
have explanatory power. Also, on the basis of non-nested tests, the fit is better than models with “panel” effects
representing unobserved heterogeneity between reference groups and no systematic field effects. Finally, on the
basis of loglikelihood ratio tests, there is no statistically significant gain in fit at the 0.05 level by adding the “panel”
effects, when the systematic field effects are already included in the model.
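The two summary statistics used in these comparisons can be sketched as follows. The loglikelihood values and parameter counts are hypothetical and are not taken from Table 1; the helper `chi2_sf` is a small standard-library implementation of the chi-square upper tail probability for integer degrees of freedom.

```python
import math

def chi2_sf(stat, df):
    """Upper tail probability of a chi-square variate (integer df >= 1)."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    # Recurrence for the regularized upper incomplete gamma function.
    while a < df / 2.0 - 1e-9:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(ll_restricted, ll_unrestricted, df):
    """Loglikelihood ratio statistic and its p-value for nested models."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    return stat, chi2_sf(stat, df)

def adjusted_rho_squared(ll_model, ll_null, n_params):
    """Adjusted rho-squared: 1 - (LL(model) - K) / LL(0)."""
    return 1.0 - (ll_model - n_params) / ll_null

# Hypothetical values: baseline versus baseline plus one generic field effect.
stat, p_value = lr_test(ll_restricted=-2500.0, ll_unrestricted=-2490.0, df=1)
rho_bar = adjusted_rho_squared(ll_model=-2490.0, ll_null=-3000.0, n_params=20)
```

The test applies only when the restricted model is nested in the unrestricted one and the statistic is non-negative; comparisons between non-nested specifications require a different test.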
For the case of network interdependence defined by residential district, a loglikelihood ratio test on the
model with a complete set of alternative specific dummy variables (ASDV) for reference group versus the field
effect model shows no significant additional explanatory power of the ASDV. With network interdependence
defined by socioeconomic group, a loglikelihood ratio test shows that the complete set of ASDV does indeed have
explanatory power over the field effect model, but it comes at the expense of 23 additional variables.
Loglikelihood ratio tests on the “random field” effect versus the field effect show that there is indeed
significant unobserved individual heterogeneity at the 0.05 level on the field variables with all network definition
treatments except residential district. For residential district this may not be so surprising given our finding that the
complete set of ASDV also showed no statistically significant improvement over the field effect, but gives now
further information: not only is there no additional explanatory power due to group heterogeneity, there is also no
additional explanatory power due to unobserved individual heterogeneity.
Loglikelihood ratio tests on the “panel field” effect versus the field effect show that there is no significant
unobserved group heterogeneity at the 0.05 level on the field variables. At the 0.1 level however, for the case of
network interdependence defined by socioeconomic group there is significant unobserved group heterogeneity. This
is in line with our finding that the complete set of ASDV for socioeconomic group showed statistically significant
improvement over the field effect model for socioeconomic group. Similarly on the basis of non-nested tests on the
“random field” effect versus the “panel field” effect, only for network interdependence defined by socioeconomic
group can we not safely bound the probability of selecting the wrong model at less than 0.05, under the
hypothesis that the model with higher adjusted rho-squared (random field effect) is the correct one.
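A non-nested comparison based on adjusted rho-squared is often carried out with the bound found in standard discrete choice textbooks, which limits the probability that the wrong model shows an adjusted rho-squared higher by more than z. The sketch below assumes that this is the form of non-nested test meant here; the function name and the numerical values are hypothetical and are not taken from Table 1.

```python
import math

def rho_squared_bound(z, ll_null, k_diff):
    """Upper bound on Pr(adjusted rho-squared difference > z).

    z       : observed difference in adjusted rho-squared (z > 0)
    ll_null : loglikelihood of the null model, L(0) (negative)
    k_diff  : parameter count of the higher-fit model minus that of
              the competing model (assumed non-negative here)
    """
    arg = -2.0 * z * ll_null + k_diff
    # Standard normal CDF evaluated at -sqrt(arg).
    return 0.5 * math.erfc(math.sqrt(arg) / math.sqrt(2.0))

# Hypothetical values: a difference of 0.001 with equal parameter counts.
bound = rho_squared_bound(z=0.001, ll_null=-3000.0, k_diff=0)
```

If the bound falls below the chosen level, say 0.05, the model with the lower adjusted rho-squared can be rejected at that level; otherwise the comparison is inconclusive.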
An interesting finding with respect to this particular case study is that the specifications allowing agents to
weigh influence from both their socioeconomic peers and their spatial network, and
furthermore to weigh these influences differently, all outperform models where each agent
belongs to one and only one group. Also interesting is that the specifications allowing agents to weigh any influence
from their fellow district residents differently from any influence from their more immediate neighbors also perform
better than a uniform spatial network.
Note that the importance of distinguishing social versus spatial influence networks in transportation mode
choice behavior can be especially relevant if the model would in turn be coupled to, for example, a residential choice
behavior model. Transportation mode choice, in itself, does not necessarily affect a decision-maker’s reference
position in a social-spatial network. Whether an agent decides to travel by car versus public transit will not change
in a directly causal sense who his/her neighbors are, nor who his/her fellow district residents are, nor will it have
consequences for who the decision-maker’s age, income and education level peers are. Here, the network is wholly
static with respect to the choice dimension. Residential location choice, on the other hand, will by definition impact
a decision-maker’s reference position in a spatial network. In moving to a new neighborhood, a decision-maker by
definition acquires new neighbors, and potentially also new fellow district residents if the move is out of
the existing district. In this case the spatial network, in particular, is dynamic with respect to the choice dimension. If
transportation mode choice and residential choice were considered together, we would then by definition have to
address the dynamic nature of the spatial network in the empirical application.
Our main aim has been to explore the empirical estimation of discrete choice models with social and/or
spatial feedback effects, including various forms of unobserved (group) heterogeneity. In this paper we specifically
test for correlation among agents in the error structure, as well as in the systematic utility. Yet despite our efforts to
explicitly model unobserved heterogeneity in strategies #2, #1+2, #4 and #5, an open question in the present work
remains the issue of potentially biased estimates due to endogeneity in the field effect. An alternative strategy for
introducing social and spatial network dependencies in choice models, currently under investigation by the authors, is
a two-stage instrumental variables approach.
One important distinction between the social-spatial causality and correlation discussed here and the time
causality and correlation traditionally understood in the panel data literature is, however, that the concept of initial
conditions (44) can be less obvious. Time generally points in only one direction (unless foresight is considered),
whereas inter-agent dependencies can point in both directions (decision-maker A influences decision-maker B, and
decision-maker B influences decision-maker A). Furthermore, inter-agent dependence is clearly not necessarily
one-dimensional. Nonetheless, if it can be assumed that the aggregate field variable as constructed in strategy #1 is
effectively self-contained and does not incur bias in pointing by definition to unobserved decision-makers
outside of the data set, we believe the case to be similar to that of a time series which is observed from the
beginning. In such a case there is no “initial value problem” as these values are well-defined. That said, we can also
easily imagine alternative constructions not discussed here which could much more obviously incur bias. For
example, social-spatial boundary conditions could be critical in the case of modeling social-spatial Markov random
field state dependence instead of group mean field state dependence, and social-spatial auto-regressive correlation
instead of unobserved group heterogeneity (41). This is in fact one driving motivation for the particular strategies
presented in this paper. It is also important to note that the identifiable agent interdependence presented in the
framework at the outset of the paper, but not addressed in the case study, may also by definition more naturally suffer
from boundary conditions.
An important limitation of the present work is the a priori assumption that a feedback effect is indeed
relevant. As explained by Manski (17), access to true temporal data may allow possibilities for better empirically
distinguishing feedback effects from correlated effects and contextual effects.
CONCLUSIONS
We have presented a framework for conceptualizing the interdependence of decision-makers’ choices, making a
distinction between social versus spatial network interdependencies and between identifiable versus aggregate agent
interdependencies. We discuss five strategies for introducing social and spatial network interdependencies into
choice models, focusing on feedback effects and on correlated effects. Due to the nature of data available to us, we
are unable to consider identifiable agent interaction in the case study. Instead we consider aggregate agent
interdependencies and apply the model strategies for nine variations on three network treatments, one of these
defined by socioeconomic group and two defined spatially based on residential location. According to likelihood
ratio and non-nested specification tests, our best performing model strategy for this particular case study is what we
term a “random field effect” model, namely one with unobserved individual heterogeneity on the group mean state
dependence.
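The likelihood ratio comparisons referred to above can be reproduced from the log-likelihoods and parameter counts reported in Table 1. The sketch below is illustrative only: it assumes that the baseline model (no modeled inter-agent dependence) is nested in the field effect models, obtained by restricting the field coefficients to zero. The helper names `chi2_sf` and `likelihood_ratio_test` are our own; the chi-square tail probability is computed with the standard library rather than a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # Q(1/2, x/2), i.e. df = 1
    else:
        q, a = math.exp(-half), 1.0              # Q(1, x/2), i.e. df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(ll_restricted, k_restricted, ll_unrestricted, k_unrestricted):
    """Return (LR statistic, degrees of freedom, p-value) for nested models."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    df = k_unrestricted - k_restricted
    return stat, df, chi2_sf(stat, df)


# Values taken from Table 1 and its note a): (log-likelihood, parameters).
baseline = (-2066.03, 15)
field_socio = (-2052.15, 16)      # strategy #1, socioeconomic group
field_all = (-1994.65, 18)        # strategy #1, socioecon. + district + postcode

stat, df, p = likelihood_ratio_test(*baseline, *field_socio)
print(f"baseline vs. field (socioecon.): LR = {stat:.2f}, df = {df}, p = {p:.2g}")

stat_all, df_all, p_all = likelihood_ratio_test(*baseline, *field_all)
print(f"baseline vs. field (all three):  LR = {stat_all:.2f}, df = {df_all}, p = {p_all:.2g}")
```

Comparisons between models that are not nested, such as alternative reference-group treatments with the same number of parameters, cannot be handled this way and call for the non-nested specification test instead.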
By casting the research question on social and spatial network interdependencies in terms of an analogy
with the treatment of typical panel data, we gain two important benefits. First, it allows us to build on
well-understood properties of typical panel data models. Second, it allows us to apply freely available
software for the estimation of the models with social-spatial interdependence. In so doing, we hope to stimulate
researchers and practitioners to adopt these techniques, since the entry barrier is lower than it would be
if dedicated code had to be written or if expensive software had to be purchased.
We believe our approach can be useful for researchers and practitioners who have a priori reason to believe
there is a feedback effect in their work, for example, on theoretical or qualitative grounds. Even without such a
priori knowledge, the so-called field variable can still be a very powerful and practical means of capturing variation
in a data set, especially in avoiding proliferation of estimated parameters, although strict caution must then be
exercised in the interpretation of predictions over time. Access to temporal data would allow researchers and
practitioners to remove ambiguity and be more definitive about their results. We hope that researchers and
practitioners will start to pay more attention to the importance and consequences of agent interdependence. In
particular we would be enthused to see more temporal data collected with the aim towards understanding the
potential inherent dynamic associated with feedback effects.
ACKNOWLEDGEMENTS
The authors would like to gratefully acknowledge Moshe Ben-Akiva, Frank le Clercq and Loek Kapoen, although the
authors claim full responsibility for any errors. The work forms Project II of the AMADEUS program directed by
Harry Timmermans and coordinated by Theo Arentze, made possible in part through a generous grant by the Dutch
National Science Organization. Special thanks are due to Guus Brohm and Yvon Weening of the Municipality of
Amsterdam Agency for Infrastructure, Traffic and Transport for assistance with data in the empirical application,
and to Len de Klerk, Gert van der Meer, Barbara Lawa, Karin Retèl-de Groot, Tina Zettl, Gimene Spaans, Ernst
Berkhout, Michel Hageman, the ICT Helpdesk, Wim de Lange, Loes van Dort, the FMG Facilitheek and the NVD
meldkamer, for institutional support in running simulations on computers in student halls at the Department of
Geography and Planning in the Faculty of Social and Behavioral Sciences at the University of Amsterdam.
Additional simulations were performed on the Linux Beowulf cluster at the SARA Computing and Networking
Services, Sciencepark Amsterdam, with special thanks to Willem Vermin, Bert van Corler, the HPC team, Jan
Hartmann and Marcel Heemskerk for institutional support. Several anonymous reviewers provided useful comments
on an earlier draft of this paper.
REFERENCES
1. Ben-Akiva, M. Structure of Passenger Travel Demand Models. Ph.D. dissertation. Department of Civil Engineering, MIT, Cambridge, Massachusetts, 1973.
2. Domencich, T., and D. McFadden. Urban Travel Demand. North Holland Press, Amsterdam, 1975.
3. Wegener, M. Reduction of CO2 Emissions of Transport by Reorganisation of Urban Activities. In Y. Hayashi, and J. Roy (eds.), Land Use, Transport and the Environment. Kluwer Academic Publishers, Dordrecht, Netherlands, 1996, pp. 103-124.
4. Moeckel, R., C. Schürmann, B. Schwarze, K. Spiekermann, and M. Wegener. Microsimulation of Urban Land Use, Transport and Environment. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004.
5. Waddell, P. UrbanSim: Modeling Urban Development for Land Use, Transportation and Environmental Planning. Journal of the American Planning Association, Vol. 68, No. 3, 2002, pp. 297-314.
6. Martinez, F., and F. Aguila. An optimal urban planning tool: Subsidies and regulations. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004.
7. Hensher, D.A., and T. Ton. TRESIS: A Transportation, Land Use and Environmental Strategy Impact Simulator for Urban Areas. Proceedings of the 9th World Conference on Transportation Research, Seoul, Korea, July 2001.
8. Ben-Akiva, M., D. McFadden, T. Gärling, D. Gopinath, J. Walker, D. Bolduc, A. Börsch-Supan, P. Delquié, O. Larichev, T. Morikawa, A. Polydoropoulou, and V. Rao. Extended Framework for Modeling Choice Behavior. Marketing Letters, Vol. 10, No. 3, 1999, pp. 187-202.
9. McFadden, D. Economic Choices. American Economic Review, Vol. 91, No. 3, 2001, pp. 351-378.
10. Walker, J. Extended Discrete Choice Models: Integrated Framework, Flexible Error Structures, and Latent Variables. Ph.D. dissertation. Department of Civil Engineering, MIT, Cambridge, Massachusetts, 2001.
11. Walker, J., and M. Ben-Akiva. Generalized Random Utility Model. Mathematical Social Sciences, Vol. 43, No. 3, 2002, pp. 303-343.
12. Ben-Akiva, M., D. McFadden, K. Train, J. Walker, C. Bhat, M. Bierlaire, D. Bolduc, A. Boersch-Supan, D. Brownstone, D.S. Bunch, A. Daly, A. de Palma, D. Gopinath, A. Karlstrom, and M.A. Munizaga. Hybrid Choice Models: Progress and Challenges. Marketing Letters, Vol. 13, No. 3, 2002, pp. 163-175.
13. Dugundji, E.R., L.L. Kapoen, F. le Clercq, T.A. Arentze, H.J.P. Timmermans, and K.J. Veldhuisen. The Long-term Effects of Multi-modal Transportation Networks: The Residential Choice Behavior of Households. Proceedings of the 9th World Conference on Transportation Research, Seoul, Korea, July 2001.
14. Aoki, M. Economic Fluctuations with Interactive Agents: Dynamic and Stochastic Externalities. Japanese Economic Review, Vol. 46, No. 2, 1995, pp. 148-165.
15. Brock, W., and S. Durlauf. Discrete Choice with Social Interactions. Review of Economic Studies, Vol. 68, 2001, pp. 235-260.
16. Blume, L., and S. Durlauf. Equilibrium Concepts for Social Interaction Models. SSRI Working Papers, Social Systems Research Institute, University of Wisconsin, Madison, 2002.
17. Manski, C.F. Identification Problems in the Social Sciences. Harvard University Press, Cambridge, Massachusetts, 1995.
18. Brock, W., and S. Durlauf. A Multinomial Choice Model of Neighborhood Effects. American Economic Review, Vol. 92, No. 2, 2002, pp. 298-303.
19. Dugundji, E.R. Socio-Dynamic Discrete Choice: Analytical Results for the Nested Logit Model. Presented at the International Conference on Complex Systems, Boston, Massachusetts, May 2004. (http://necsi.net/events/iccs/openconf/author/abstractbook.php)
20. Dugundji, E.R., and L. Gulyas. Empirical Estimation and Multi-Agent Based Simulation of a Discrete Choice Model with Network Interaction Effects. In Proceedings of the 8th International Conference on Computers in Urban Planning and Urban Management, Sendai, Japan, May 2003. (http://www.rs.civil.tohoku.ac.jp/~cupum03)
21. Dugundji, E.R., and L. Gulyas. An Exploration of the Role of Global versus Local and Social versus Spatial Networks in Transportation Mode Choice Behavior in The Netherlands. In Proceedings of AGENT 2003: Challenges in Social Simulation, University of Chicago, USA, October 2003. (http://www.agent2004.anl.gov/Agent2003.pdf)
22. Dugundji, E.R., and L. Gulyas. Discrete Choice on Networks: An Agent-Based Approach. Presented at the First Conference of the European Social Simulation Association, Groningen, Netherlands, September 2003. (http://www.uni-koblenz.de/~kgt/ESSA/ESSA1/)
23. Dugundji, E.R., and L. Gulyas. Dynamic Discrete Behavioral Choice on Networks: The Nested Logit Model – An Empirical Application. Presented at XXIV International Sunbelt Social Network Conference, Portorož, Slovenia, May 2004. (http://vlado.fmf.uni-lj.si/info/sunbelt24/default.htm)
24. Dugundji, E.R., and L. Gulyas. Socio-Dynamic Discrete Choice on Networks: Multi-agent Simulation of Social & Spatial Interaction Effects in Transportation Mode Choice Behavior. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004. (http://www.wctr2004.org.tr)
25. Dugundji, E.R., and L. Gulyas. Socio-Dynamic Discrete Choice on Networks: Impacts of Agent Heterogeneity on Emergent Equilibrium Outcomes. In Proceedings of AGENT 2004: Social Dynamics: Interaction, Reflexivity and Emergence, University of Chicago, USA, October 2004. (http://www.agent2004.anl.gov/agenda.html)
26. Hensher, D.A. Models of Organisational and Agency Choices for Passenger and Freight-Related Travel Choices: Notions of Inter-Activity and Influence. ITS-WP-03-05, Faculty of Economics and Business, University of Sydney, NSW, 2003.
27. Brewer, A.M., and D.A. Hensher. Distributed work and travel behaviour: The dynamics of interactive agency choices between employers and employees. Transportation, Vol. 27, 2000, pp. 117-148.
28. Rose, J., and D.A. Hensher. Modelling agent interdependency in group decision making. Transportation Research Part E, Vol. 40, 2004, pp. 63-79.
29. Vovsha, P., M. Bradley, and J.L. Bowman. Activity-Based Travel Forecasting Models in the United States: Progress since 1995 and Prospects for the Future. Presented at the EIRASS Workshop on Progress in Activity-Based Analysis, Maastricht, Netherlands, May 2004.
30. Vovsha, P., J. Gliebe, E. Peterson, and F. Koppelman. Comparative Analysis of Sequential and Simultaneous Choice Structures for Modeling Intra-Household Interactions. Presented at the EIRASS Workshop on Progress in Activity-Based Analysis, Maastricht, Netherlands, May 2004.
31. Timmermans, H.J.P., A.W.J. Borgers, J. van Dijk, and H. Oppewal. Residential Choice Behavior of Dual-Earner Households: A Decompositional Joint Choice Model. Environment and Planning A, Vol. 24, 1992, pp. 517-533.
32. Borgers, A.W.J., and H.J.P. Timmermans. Transport Facilities and Residential Choice Behavior: A Model of Multi-Person Choice Processes. Papers in Regional Science, Vol. 72, 1993, pp. 45-61.
33. Abraham, J.E., and J.D. Hunt. Specification and Estimation of Nested Logit Model of Home, Workplaces, and Commuter Choices by Multiple Worker Households. Transportation Research Record 1606, 1997, pp. 17-24.
34. Molin, E.J.E., H. Oppewal, and H.J.P. Timmermans. Conjoint modeling of residential group preferences: A comparison of the internal validity of hierarchical information integration approaches. Journal of Geographical Systems, Vol. 4, 2002, pp. 343-358.
35. Dellaert, B.G.C., M. Prodigalidad, and J.J. Louviere. Family Members’ Projections of Each Other’s Preference and Influence: A Two-Stage Conjoint Approach. Marketing Letters, Vol. 9, No. 2, 1998, pp. 135-145.
36. Ben-Akiva, M., and M. Bierlaire. Discrete choice methods and their applications to short-term travel decisions. In R. Hall (ed.), Handbook of Transportation Science. International Series in Operations Research and Management Science, Kluwer, Dordrecht, Netherlands, 1999.
37. Timmermans, H.J.P., T.A. Arentze, M. Dijst, E.R. Dugundji, C-H. Joh, L.L. Kapoen, S. Krygsman, K. Maat, and K.J. Veldhuisen. AMADEUS: Framework for Developing a Dynamic Multiagent, Multiperiod Activity-Based Micro-Simulation Model of Travel Demand. Presented at the 81st Annual Meeting of the Transportation Research Board, Washington, D.C., January 2002.
38. Joh, C-H. Measuring and Predicting Adaptation in Multidimensional Activity-Travel Patterns. Ph.D. dissertation. Department of Architecture, TU/e, Eindhoven, Netherlands, 2004.
39. Krygsman, S. Activity and Travel Choice(s) in Multimodal Public Transport Systems. Ph.D. dissertation. Department of Geography, UU, Utrecht, Netherlands, 2004.
40. Maat, K., H.J.P. Timmermans, and E. Molin. A Model of Spatial Structure, Activity Participation and Travel Behavior. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004.
41. Dugundji, E.R. Socio-Dynamic Discrete Choice. Ph.D. manuscript (in progress). Faculty of Social and Behavioral Sciences, Department of Geography & Planning, UvA, Amsterdam, Netherlands, 2005.
42. Ben-Akiva, M., and S.R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge, Massachusetts, 1985.
43. Dugundji, E.R., and J.L. Walker. Discrete Choice with Social and Spatial Network Interdependencies: An Empirical Example Using Mixed GEV Models with Field and “Panel” Effects. Presented at the 84th Annual Meeting of the Transportation Research Board, Washington, D.C., USA, January 2005.
44. Heckman, J.J. The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process. In C. Manski and D. McFadden (eds.), The Structural Analysis of Discrete Data. MIT Press, Cambridge, Massachusetts, 1981, pp. 179-195.
LIST OF FIGURES AND TABLES
FIGURE 1 (a) Commuter mode count for decision-makers in Amsterdam and Amstelveen grouped by
residential district. (b) Abstract visualization of network interdependence defined by residential district plus
socioeconomic group.
TABLE 1 Loglikelihood, Number of Estimated Parameters and Adjusted Rho-Squared for Various
Reference Group Treatments
[Figure 1(a): map of Amsterdam and Amstelveen showing sampled residential locations and commuter mode counts for districts 1, 2, 3, 4, 6, 8, 9, 10 and 16.]
Legend, sampled residential locations: District 1 (363 respondents); District 2 (352 respondents); District 3 (329 respondents); District 4 (223 respondents); District 6 (254 respondents); District 8 (340 respondents); District 9 (348 respondents); District 10 (243 respondents); District 16 (461 respondents).
Legend, commuter mode: bicycle commuters (779 total); transit commuters (690 total); car commuters (1444 total).
[Figure 1(b): network visualization.]
FIGURE 1 (a) Commuter mode count for decision-makers in Amsterdam and Amstelveen grouped by
residential district. (b) Abstract visualization of network interdependence defined by residential district plus
socioeconomic group. Red dots represent fully-connected residential districts as in (a); the darkness and
width of lines gives an indication of the number of links between districts induced by socioeconomic group.
TABLE 1 Loglikelihood, Number of Estimated Parameters and Adjusted Rho-Squared for Various Reference-Group Treatments a)

Model strategy          Socioecon.   District     Postcode     Socioecon.   Socioecon.   Socioecon.   Socioecon.   District     Socioecon.
                        (13 groups)  (9 groups)   (67 groups)  w/ District  w/ Postcode  + District   + Postcode   + Postcode   + District
                                                               (joint)      (joint)                                             + Postcode

#1: Field Effect (Group Mean State Dependence)
  Loglikelihood         -2052.15     -2054.88     -2009.55     -2051.9      -2047.61     -2042.92     -2005.1      -1999.58     -1994.65
  Parameters            16           16           16           16           16           17           17           17           18
  Adj. rho-squared      0.305        0.304        0.320        0.305        0.307        0.308        0.321        0.323        0.324

#2: “Panel” Effect (Unobserved Group Heterogeneity)
  Loglikelihood         -2054.22     -2063.61     -2061.22     -2058.55     -2059.17     -2058.55     -2061.22     -2059.17     -2059.17
  Parameters            18           18           18           18           18           18           18           18           18
  Adj. rho-squared      0.304        0.301        0.302        0.303        0.302        0.303        0.302        0.302        0.302

#1 and #2: Field + Panel
  Loglikelihood         -2048.69     -2054.87     -2009.55     -2048.54     -2043.86     -2041.27     -2005.1      -1999.34     -1994.49
  Parameters            19           19           19           19           19           20           20           20           21
  Adj. rho-squared      0.306        0.303        0.319        0.306        0.307        0.308        0.320        0.322        0.323

#3: Alt-Specific Group Dummies (Observed Group Heterogeneity)
  Loglikelihood         -2034.27     -2048.89     b)           c)           c)           b)           b)           b)           b)
  Parameters            39           31
  Adj. rho-squared      0.304        0.301

#4: Random Field Effect (Group Mean State Dependence w/ Unobs. Individual Heterogeneity)
  Loglikelihood         -2047.91     -2052.01     -2004.79     -2047.08     -2042.51     -2033.31     -2002.55     -1989.9      -1984.37
  Parameters            19           19           19           19           19           23           23           23           27
  Adj. rho-squared      0.306        0.304        0.320        0.306        0.308        0.309        0.320        0.324        0.324

#5: “Panel” Field Effect (Group Mean State Dependence w/ Unobserved Group Heterogeneity)
  Loglikelihood         -2048.75     -2054.87     -2009.55     -2048.06     -2044.42     -2040.95     -2005.1      -1999.1      -1994.02
  Parameters            19           19           19           19           19           23           23           23           27
  Adj. rho-squared      0.305        0.303        0.319        0.306        0.307        0.307        0.319        0.321        0.321

a) The baseline model with cross-nested logit choice kernel and no modeled inter-agent dependence has loglikelihood of -2066.03 with 15 estimated parameters. The adjusted rho-squared for this baseline model is 0.301.
b) Considerable risk of model overfit due to large number of parameters to be estimated.
c) Unclear how to operationalize alternative-specific group dummies with joint overlapping reference groups.
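
The adjusted rho-squared values in Table 1 can be cross-checked against the reported log-likelihoods and parameter counts. The sketch below assumes the conventional definition, adjusted rho-squared = 1 - (L - K) / L(0), where L is the log-likelihood at convergence, K the number of estimated parameters and L(0) the null log-likelihood. L(0) is not reported in the table, so it is backed out here from the rounded baseline figures in note a) and is therefore only approximate. The function names are our own.

```python
def adjusted_rho_squared(loglik, n_params, loglik_null):
    """Adjusted rho-squared under the conventional definition (assumption)."""
    return 1.0 - (loglik - n_params) / loglik_null


def implied_null_loglik(loglik, n_params, adj_rho_sq):
    """Invert the definition to recover L(0) from one reported row."""
    return (loglik - n_params) / (1.0 - adj_rho_sq)


# Backed out from note a): baseline loglikelihood -2066.03, 15 parameters, 0.301.
# The result is approximate because 0.301 is rounded to three decimals.
null_ll = implied_null_loglik(-2066.03, 15, 0.301)

# A few (loglikelihood, parameters, reported adjusted rho-squared) entries from Table 1.
rows = [
    (-2052.15, 16, 0.305),   # #1 field effect, socioecon.
    (-1994.65, 18, 0.324),   # #1 field effect, socioecon. + district + postcode
    (-2063.61, 18, 0.301),   # #2 panel effect, district
    (-2034.27, 39, 0.304),   # #3 group dummies, socioecon.
    (-1984.37, 27, 0.324),   # #4 random field effect, all three networks
]

recomputed = [adjusted_rho_squared(ll, k, null_ll) for ll, k, _ in rows]
for (ll, k, reported), value in zip(rows, recomputed):
    print(f"L = {ll:9.2f}, K = {k:2d}: reported {reported:.3f}, recomputed {value:.3f}")
```

Agreement to within rounding suggests that the flattened table was reassembled with the values in the right cells; it does not establish the true value of L(0).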