Dugundji & Walker

Discrete Choice with Social and Spatial Network Interdependencies
An Empirical Example Using Mixed GEV Models with Field and “Panel” Effects

Elenna R. Dugundji *
Department of Geography and Planning
University of Amsterdam
Nieuwe Prinsengracht 130
1018 VZ Amsterdam, Netherlands
Tel: +31 20 525 4581
Fax: +31 20 692 5813
Email: [email protected]

Joan L. Walker
Center for Transportation Studies
Boston University
675 Commonwealth Avenue
Boston, MA 02215, USA
Tel: +1 617 353 5743
Fax: +1 617 358 0205
Email: [email protected]

Submission date: August 1, 2004
Revised: November 15, 2004
Final version: March 31, 2005
Word count: 7451 (including abstract, references, 1 table, 1 figure)

*) Corresponding author

ABSTRACT

Discrete choice analysis has become an industry standard in land use and transportation models. Such models are fundamentally grounded in individual choice, and therefore an outstanding challenge is the treatment of interdependencies among decision-makers. This paper describes and illustrates through an empirical application to mode choice how interdependencies can be captured in discrete choice. In this approach, decision-makers are assumed to be influenced, for example, by those of similar socio-economic status and those in spatial proximity. Given such social and spatial network relationships, interdependencies are captured in the choice model in two ways. The first is to include in the systematic utility variables that describe choices of others in the decision-maker's social and spatial network. The second is by allowing for correlation across the disturbances of decision-makers within the same social and spatial network. Variations of these approaches (including their combination and the use of random parameters) were tested with mode choice, and were compared with traditional methods of market segmentation.
The application results indicate that the proposed methods for capturing interdependencies are significant and are superior to traditional methods. Furthermore, capturing the interdependencies in the systematic utility was sufficient, in that it fit the data better than the model with just correlation and was not significantly worse than the model with both the systematic term and correlation. The systematic term also captures a feedback effect, which, for example, can propel the adoption of a new mode over time. The models were estimated using a traditional transportation dataset and readily available software.

INTRODUCTION

Pioneered in the domain of travel demand by Ben-Akiva, McFadden and others (1, 2), discrete choice analysis has become an industry standard in land use and transportation planning models. Recent elaborate operational examples of the development of this methodology are due to Wegener, Waddell, Martinez and Hensher (3)-(7), to cite just a few. Meanwhile, the field itself has flourished in the past 30 years, ultimately extending the basic random utility model to incorporate cognitive and behavioral processes, flexible error structures and different types of data in so-called hybrid choice models (8)-(12). However, as discrete choice theory is fundamentally grounded in individual choice, an outstanding challenge remains in the treatment of the interdependence of various decision-makers’ choices (13). This paper aims to illustrate issues in the empirical estimation of a discrete choice model with interdependence between decision-makers’ choices using mixed GEV model structures.

The paper is organized as follows. First, the particular forms of interdependence to be addressed in this paper are motivated. A brief review of literature is presented describing what the paper brings to an existing stream of work, and a framework for interactions among decision-making agents is outlined.
Next, five different modeling strategies are proposed to test central hypotheses of the paper. The model is applied to a case study in the Netherlands to highlight some main hypothesized social and spatial interaction effects. Estimation results are discussed. Finally, limitations in the present work are summarized and directions for future research efforts are outlined.

BACKGROUND

Our starting point in considering interdependence of various decision-makers’ choices is a trio of papers by economists Aoki (14), Brock & Durlauf (15) and Blume & Durlauf (16). They introduce social interactions in binary discrete choice models by allowing a given agent’s choice for a particular alternative to be dependent on the overall share of decision-makers that choose that alternative. The specification captures feedback between decision-makers that can potentially be reinforcing over the course of time. In diverse literature this is referred to as a social multiplier, a cascade, a bandwagon effect, imitation, contagion, herd behavior, etc. (17).

The lack of attention to such models to date in the field of transportation is surprising. If there is theoretical or qualitative reason to believe that a feedback effect exists, it can have very important implications for the prediction of (system-wide) results over the course of time. For example, in the introduction of a new transportation mode alternative, if there is a “dominant enough” feedback effect, this can propel the adoption of the new mode over time. How dominant is “dominant enough” under different conditions? Brock & Durlauf (18) have extended their results on the behavior of binary logit models to multinomial logit models. Dugundji (19) makes Brock & Durlauf’s multinomial results precise for trinary multinomial choice and extends the results for the case of nested logit.
Also, while the behavior over time derived in early work assumed each decision-maker to be influenced by all other decision-makers (so-called global interactions), Dugundji & Gulyas (20, 21) derive more general behavior for the case where each decision-maker is influenced by only a subset of decision-makers (so-called non-global interactions). A key to all of the above theoretical results, however, is the assumption that the only explanatory variable in the model is the feedback effect. While such a specification may be plausible for a fad such as the latest new children’s toy, it is much less intuitive for transportation mode choice, where other explanatory variables would be assumed to be significant, including both attributes of the alternatives and characteristics of the decision-making agents. Dugundji & Gulyas (22) present initial results using simulated data for a binary logit model with non-global interactions and other explanatory variables included in the utility. In a companion paper to this one, Dugundji & Gulyas (23, 24) present results for the behavior over time of a nested logit model with non-global interactions, using the same empirical data and the same social and spatial network interdependencies (that is, the same definition of which decision-makers influence each other) as defined here.

One econometric issue that arises in empirical estimation, however, is that the same unobserved effects might be likely to influence the choice made by a given decision-maker as well as the choices made by those in the decision-maker’s reference group. In terms of transportation mode choice, for example, accessibility measures for residents in the neighborhood could play such a role to the extent that they could not be captured directly through explanatory variables in the utility specification.
In this case, the transportation mode shares of neighbors living in the same zone, when used as an explanatory variable, will be correlated with the unobserved error of the given decision-maker, which is a classic case of endogeneity. If the variable is endogenous, the estimated coefficients and the results of such a model will be biased. To try to separate out effects, it is therefore critically important to begin with as well-specified a model as possible, making use of relevant available explanatory variables. Consequences of omitting significant explanatory variables in the utility specification and improperly accounting for availability of alternatives in nested logit discrete choice models with feedback effects are considered in Dugundji & Gulyas (25).

In this paper we continue this exploration of issues in the empirical estimation of discrete choice models with feedback effects, by specifically testing for correlation among agents in the error structure in the particular empirical case study of mode choice to work, through the use of mixed GEV family models. Our intent is to capture potential bias induced by endogeneity, by explicitly accounting for the supposed correlation in the error structure. For completeness, we also compare feedback effect models to baseline models where limited social and spatial interdependence can be introduced as alternative-specific dummy variables.

A FRAMEWORK FOR SOCIAL AND SPATIAL NETWORK INTERDEPENDENCIES

In order to model interactions among decision-makers, it is necessary to formulate the nature in which these interactions take place. It is useful in this context to apply the concept of a network as a formal representation of such interactions, where nodes represent decision-making entities and links are drawn to represent interdependencies between these entities. We now address in turn issues of scope and mechanism.
Perhaps most salient is the issue of scope: are we considering only supply-side interactions, only demand-side interactions, or supply-demand interactions? (13) This is important due to the inherently more complex nature of supply-side and supply-demand interaction and the importance of the specific role and position of decision-makers in an (unbalanced) interaction process. Hensher (26) outlines considerations in this regard and proposes at least eight different styles of decision-making processes ranging from autocratic to leaderless. A particularly novel approach to supply-demand interactions in transportation literature is found in the studies by Brewer & Hensher (27) and Rose & Hensher (28). Combining ideas from game theory as well as discrete choice theory, they develop what they term “Interactive Agency Choice Experiments” between employer and employee to study adoption of participation in distributed work. The focus in this paper, however, will be only on demand-side interactions.

To classify different types of demand-side interaction mechanisms, a distinction is hypothesized between social versus spatial interactions and between identifiable versus aggregate interactions (20, 21). We speak of interaction between “identifiable” decision-makers when the links in the network are well-known and explicitly defined on an individual decision-maker by decision-maker basis. We speak of interaction between “aggregate” decision-makers when interdependence is assumed to take place only at an aggregate level, with links being defined, for example, more generally based on decision-maker characteristics. We speak of “spatial” network interactions when the interdependence represents a confluence of decision-makers in geographic terms.
For example, decision-makers may be linked based on spatial proximity of residential location, work location or other geographical point of reference such as school, childcare, shopping, healthcare, leisure/recreation or other relevant activity location. We speak of “social” network interactions when decision-makers are linked based on social circles. With social interactions, the decision-makers need not be proximally or tangentially situated in geographical terms and the interaction is not necessarily centered at a particular geographic point of reference; interaction may take place at a distance, so to speak.

The research reported here explores interactions between a decision-maker and the aggregate actions of other decision-makers proximally situated in a spatial network, and interactions between a decision-maker and the aggregate actions of other decision-makers associated in a socioeconomic network. Technically, however, interactions between identifiable decision-makers may also be modeled using the approach described in this paper given the availability of suitable data, and thus methods reported here may prove to be useful in those areas as well. We will return briefly to the limitations of our approach as applied to identifiable interactions in the discussion at the end of this paper.

One special case of spatial interactions between identifiable agents worthy of note is intra-household interaction. There are various reasons why we might expect a priori that intra-household interaction would be important in travel demand behavior. Vovsha, Bradley & Bowman (29) and Vovsha et al. (30) provide a useful review of literature, distinguishing between coordination of individual daily activity patterns, joint participation in activities and travel, and intra-household mechanisms for allocation of maintenance activities.
Other interesting examples of the application of intra-household interactions include residential choice behavior (31)-(34) and, in the marketing literature, for example, holiday destination choice (35).

In the case study to be discussed, we do not have available data on which identifiable agents influence other identifiable agents’ choices. We do, however, have rich socioeconomic data for each respondent as well as the geographic location of each respondent’s residence. This allows us to define aggregate interactions by grouping agents into geographic neighborhoods or into socioeconomic groups indexed (1,...,g,...,G) where the influence is assumed to be more likely. In the simplest case, these groups are assumed to be mutually exclusive and collectively exhaustive. That is, each agent n belongs to one and only one group g. The agent is influenced by the average choice behavior of his or her group, and the influence by other groups is assumed to be negligible. Figure 1(a) illustrates the transportation mode shares for decision-makers in the sample grouped by residential district. At a global level, the picture is a fragmented or disconnected network of clustered groups. If we are interested in equilibrium behavior, the consequences of such an assumption are important: there is no transmission of influence across groups, and the global picture is a weighted average behavior of the separate clusters.

As in Dugundji & Gulyas (23)-(25), we also consider the case with overlapping groups, with agents, for example, connected by social group as well as by residential district. This leads to a giant cluster for the empirical example under consideration, with the important implication that influence can spread throughout the entire population. Such a network is abstractly visualized in Figure 1(b) using the freely available software package Pajek developed by Batagelj & Mrvar (http://vlado.fmf.unilj.si/pub/networks/pajek).
SOME STRATEGIES FOR INTRODUCING SOCIAL AND SPATIAL NETWORK INTERDEPENDENCIES IN CHOICE MODELS

Discrete choice theory allows prediction based on computed individual choice probabilities for heterogeneous agents’ evaluation of alternatives. We present several strategies for introducing social and spatial network interdependencies in choice models: as a generic feedback effect, as unobserved group heterogeneity, as a set of alternative-specific dummy variables, and as a random parameter feedback effect. Note that we want to avoid confounding correlation among alternatives with any potential correlation among agents. Although we choose the cross-nested logit model (36) as a starting point owing to its fairly general closed form, the strategies could in principle be applied to any suitable choice model.

Strategy 1: Field Effect for Reference-Group

We introduce a feedback effect among agents by allowing the systematic utility $V_{in}$ to be a first-order function, linear in the parameter $\beta$, of the proportion $x_{in}$ of a given decision-maker’s reference entities who have made this choice. The variable $x_{in}$ is termed a “field variable.” As motivated by Aoki, “Knowledge of a field variable relieves agents (at least partially) of the need for detailed information on interaction patterns. Any macroeconomic variable that serves this decentralizing function is called a field variable.” Our model differs from the Aoki model, however, in that we consider non-global interactions. We do consider an aggregated feedback, but agents see different proportions, depending on who their particular reference entities are. In fact, even agents within the same group see (slightly) different proportions of reference agents choosing each alternative, because an agent’s own choice is not included in these proportions.
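The construction of such a field variable can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `field_variables`, the toy group labels and the convention of returning 0.0 for an agent with no other group members are all assumptions made for the example. It computes, for each agent, the share of the other members of the agent's group choosing each alternative, leaving the agent's own choice out.

```python
from collections import Counter

def field_variables(groups, choices, alternatives):
    """For each agent, share of the OTHER members of the agent's group
    choosing each alternative (the agent's own choice is excluded)."""
    group_size = Counter(groups)                 # group -> number of agents
    group_alt = Counter(zip(groups, choices))    # (group, alternative) -> count
    shares = []
    for g, c in zip(groups, choices):
        n_others = group_size[g] - 1
        row = {}
        for alt in alternatives:
            own = 1 if c == alt else 0
            # Assumption: an agent alone in a group sees a share of 0.0.
            row[alt] = (group_alt[(g, alt)] - own) / n_others if n_others > 0 else 0.0
        shares.append(row)
    return shares

# Toy data (hypothetical): two groups, three mode alternatives.
groups = ["A", "A", "A", "B", "B"]
choices = ["pt", "ca", "ca", "bi", "bi"]
x = field_variables(groups, choices, ["pt", "bi", "ca"])
```

In this toy example the first two agents belong to the same group yet see different proportions, because each excludes only his or her own choice.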
If we had data available to support treatment of network interaction effects with identifiable households, the approach could be further generalized, for example, to allow different weights on different ties.

Strategy 2: Unobserved Group Heterogeneity (“Panel” Effect)

Next we consider correlation among agents in the disturbance of the utility. We model the correlation among agents within a group $g$ as a panel effect. In a typical choice model application with panel data, the panel aspect captures correlation across multiple responses from the same individual. The panel implementation that we use captures correlation among different decision-makers based on the connectivity in a social and/or spatial network. Let $\xi_g = [\xi_{1g}, \ldots, \xi_{ig}, \ldots, \xi_{Jg}]^T$ be a vector of unobserved random terms, where the $\xi_{ig}$ are now additively introduced to the utility function $U_{in}$. The probability $P_n(i)$ that agent $n$ chooses alternative $i$ from the composite choice set $C_n$ is:

$$P_n(i) = \int_{\xi_g} P_n(i \mid \xi_g) \, f(\xi_g) \, d\xi_g \qquad [1]$$

where $f(\xi_g)$ denotes the multivariate probability density of $\xi_g$. Our intent is to capture potential bias induced by endogeneity, by explicitly accounting for the supposed correlation in the error structure. Where these error terms can be shown to be statistically insignificant, we assume that the hypothesized endogeneity has negligible effect.

Strategy 3: Alternative-Specific Dummy Variable for Reference-Group

For completeness, we also consider a more typical model specification with a complete set of alternative-specific dummy variables (ASDV) by reference group. This would be a case of observed group heterogeneity. In general we can expect a complete set of ASDV to perform well in terms of the loglikelihood value, being very flexible in capturing variation across alternatives and across groups.
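To make concrete what a complete set of ASDV involves, the sketch below enumerates one dummy per combination of non-reference alternative and non-reference group. This is an illustration under stated assumptions rather than the authors' specification: the function name `asdv_columns`, the naming of the dummies, and the choice of car and group 1 as reference levels are hypothetical. The group counts (13 socioeconomic groups, 9 districts, 67 postcode regions) are those reported in the case study.

```python
def asdv_columns(alternatives, groups, base_alt, base_group):
    """Names of alternative-specific group dummies, omitting one reference
    alternative and one reference group for identification (an assumption)."""
    return [
        f"{alt}_group{g}"
        for alt in alternatives if alt != base_alt
        for g in groups if g != base_group
    ]

alternatives = ["pt", "bi", "ca"]
counts = {}
for label, n_groups in [("socioeconomic group", 13),
                        ("residential district", 9),
                        ("residential postcode", 67)]:
    cols = asdv_columns(alternatives, range(1, n_groups + 1), "ca", 1)
    counts[label] = len(cols)   # (J - 1) * (G - 1) extra parameters
    print(label, counts[label])
```

Under these assumptions the number of additional parameters grows as (J - 1)(G - 1) for J alternatives and G groups, whereas a generic field effect adds a single coefficient regardless of G.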
The difficulty with adding a complete set of ASDV is the proliferation of estimated parameters if there are a large number of groups and/or a large number of alternatives. In such a case we might run considerable risk of overfitting our model.

Strategies 4 and 5: Random Field Effect and Panel Field Effect

Finally, we allow the field effect described in strategy #1 to have alternative-specific variance $\sigma_i$ associated with it. In strategy #4 we model this variance most generally as unobserved individual heterogeneity (random parameter field effect). In strategy #5 we model this variance as unobserved group heterogeneity: the variance $\sigma_i$ on the field effect is constrained to be equal for agents in the same reference group (“panel” field effect). The distinction between this last model and strategies #1 and #2 estimated directly together is that here the unobserved group heterogeneity is on the field effect itself, instead of being estimated on its own. Strategies #4 and #5 thus gain flexibility in capturing variation across alternatives and across individuals/groups, as with the complete set of ASDV in strategy #3, but without the drawback of a potential proliferation of estimated parameters with a large number of groups and/or alternatives.

CASE STUDY

The research reported here is a small part of a larger work aimed at assessing multi-modal transportation as a land use transportation planning policy instrument for reducing road congestion (37)-(40). The contribution of this particular subproject is the treatment of social and/or spatial interdependencies between households and the attempt to understand any potential role of generated feedback dynamics in the take-up of mode choice to work (41). The data used in this subproject originate from travel questionnaires administered by the Municipality of Amsterdam Agency for Infrastructure, Traffic and Transport (dIVV) during the period 1992-1997 in Amsterdam and a neighboring suburb to the south of the city, Amstelveen.
Due to limitations in the available data, we consider a trip-based home to work model.

Baseline Model with No Inter-Agent Dependencies

Definition of Variables

The model is a mode choice model to work, with alternatives public transit (pt), bicycle/motorcycle (bi) and car driver/passenger (ca). Raw variables available for use in the model are defined as follows:

availpt      1 if public transit alternative is available, 0 otherwise
carown       1 if decision-maker owns a car, 0 otherwise
gender       1 if female, 0 if male
age          Respondent’s age category: 12-17 years; 18-29 years; 30-44 years; 45-59 years; 60 years and older
income       Respondent’s income in one of five categories based on Dutch governmental classification
education    Respondent’s education category: elementary education (LO); lower vocational education (LBO); high school education (MO); post high school education (HO); other
ivtpt        In-vehicle-time in minutes with public transit
ovtpt        Out-of-vehicle-time in minutes with public transit (access and egress, waiting, transferring, etc.)
ttbike       Travel time in minutes with bicycle
ttcar        Travel time in minutes with car
parktca      Time in minutes to park car

Various piecewise linear specifications of all travel time related variables (ivtpt, ovtpt, ttbike, ttcar, parktca) as well as age (defined by the midpoint of the age category) were tested against linear, quadratic and logarithmic forms of these variables.
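The candidate functional forms can be generated mechanically before estimation. The following sketch is illustrative only: the function names and the breakpoints 45 and 60 are assumptions chosen for the example, not values confirmed by the source for any particular variable. Each piecewise linear segment is built with a nested max/min, so that the segments sum back to the original value.

```python
import math

def piecewise_segment(x, lower, upper):
    """Portion of x that falls inside [lower, upper], measured from lower."""
    return max(0.0, min(x - lower, upper - lower))

def candidate_forms(x, breakpoints):
    """Linear, quadratic, logarithmic and piecewise linear versions of x.
    The breakpoints are hypothetical; x must be positive for the log form."""
    edges = [0.0] + list(breakpoints) + [math.inf]
    return {
        "linear": x,
        "quadratic": x ** 2,
        "log": math.log(x),
        "piecewise": [piecewise_segment(x, lo, hi)
                      for lo, hi in zip(edges[:-1], edges[1:])],
    }

forms = candidate_forms(52.0, breakpoints=[45.0, 60.0])
```

Each form would enter the utility with its own coefficient; the competing specifications can then be compared with loglikelihood ratio tests where the models are nested and with non-nested tests otherwise.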
Considering various a priori hypotheses of behavior in the region and after statistical comparison of the alternative nonlinear specifications of variables against the linear versions thereof using loglikelihood ratio tests and non-nested tests (42), the following definitions are ultimately used in the baseline model:

lnage        Natural logarithm of age in years
age4559      Max [0, min (age - 45, 15)]
ivtsqpt      In-vehicle-time in minutes with public transit, squared
lnttcar      Natural logarithm of travel time in minutes with car
parktsqcar   Time in minutes to park car, squared
aowmin       1 if income category at pension level but below poverty line, 0 otherwise

Additionally, availability of bicycle is defined:

availbi75    1 if travel time by bicycle is less than 75 minutes, 0 otherwise

Choice Kernel: Cross-Nested Logit

Estimation of three successive nested logit models, first with public transit nested with bicycle, then with public transit nested with car, and finally with bicycle nested with car, shows the first two nesting structures to be significant in terms of a loglikelihood ratio test and in terms of a t-test on the nest coefficient. The third nesting structure was not indicated. Thus a cross-nested structure with public transit nested with both bicycle and car is a logical choice. This cross-nesting structure was tested against a general structure where all 3 alternatives could belong to each of 2 nests, and showed no significant loss of fit. Furthermore, the nest coefficient on the public transit and car nest and the nest coefficient on the public transit and bicycle nest are constrained to be equal. Finally, constraining the two parameters $\alpha_{pt,m}$ defining the degree of “membership” of the public transit alternative in each of the two nests $m$ to be equal also showed no significant loss of fit, and an improvement in the t-test on $\alpha_{pt,m}$.
The baseline cross-nested model thus adds two additional parameters, namely a generic nest coefficient and a generic membership in the nests.

Network Interdependence Defined by Residential District

Now we turn to the specification of the network interdependence. We begin with a broad classification by residential district. See Figure 1(a). There are nine districts represented in the sample. Three new variables are created:

dtrpt(bi)(ca)nsl   Share of respondent’s fellow district residents in the sample choosing public transit (bicycle) (car)

The designation “nsl” refers to “no self-loops.” That is, the respondent’s own choice is not included in the average behavior in the district perceived by a given respondent. The specification of these variables is generic. There are two main motivations for choosing a generic specification, namely, simplicity and identification. In order to estimate three distinct coefficients, we would either need to add an additional constraint or estimate the coefficients as random parameters.

Network Interdependence Defined by Socioeconomic Group

Next, using the variables age, income and education, thirteen socioeconomic groups are defined (41). As above, three new variables are created, with generic specification:

socpt(bi)(ca)nsl   Share of respondent’s socioeconomic peers in the sample choosing public transit (bicycle) (car)

Network Interdependence Defined by Residential Postcode

Finally, to be able to test the effect of spatial scale, by way of comparison with the network interdependence defined by residential district, we define a smaller neighborhood region of influence on the basis of 4-digit postcode. There are 67 postcode regions represented in the sample.
Three new variables are again created, with generic specification:

pc4pt(bi)(ca)nsl   Share of decision-maker’s neighbors choosing public transit (bicycle) (car)

We might hypothesize that the smaller spatial scale network interdependence defined by postcode may be more homogeneous with regard to choice behavior than that for the variables defined on the basis of district. Thus we might expect the coefficient on these variables to be relatively stronger.

DISCUSSION

All models are estimated using the freely available optimization toolkit Biogeme developed by Bierlaire (http://roso.epfl.ch/biogeme). The loglikelihood, number of estimated parameters and adjusted rho-squared are given in Table 1 for all model strategies and various reference group treatments. The first, second and third columns give summary results for network interdependence defined by socioeconomic group, residential district and residential postcode, respectively. The fourth and the fifth columns are treatments where social and spatial interdependence are considered jointly: agents are assumed not to distinguish between their socioeconomic peers and their fellow district residents or neighbors when considering their choice behavior. In the remaining sixth through ninth columns, the specification is extended to allow agents to weigh any influence from their socioeconomic peers differently from any influence from their fellow district residents and furthermore differently from any influence from their more immediate neighbors. The interested reader is referred to Dugundji (41) or Dugundji & Walker (43) for tables of actual parameter estimates and an enumeration of specification tests.
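The two summary measures used throughout this discussion can be computed from the reported loglikelihoods and parameter counts. The sketch below uses only the Python standard library and purely hypothetical numbers; none of the values are taken from Table 1. It assumes the common definition of adjusted rho-squared, 1 - (LL - K)/LL0, with LL0 the loglikelihood of a reference (null) model; whether the source uses a zero-coefficient or constants-only reference is not stated. The helper `chi2_sf` evaluates the chi-square upper tail probability with a standard recurrence for the incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # closed form for df = 1
    while a < df / 2.0:                          # Q(a+1, z) = Q(a, z) + z^a e^-z / Gamma(a+1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(ll_restricted, ll_unrestricted, extra_params):
    """Test statistic -2 (LL_r - LL_u) and its chi-square p-value."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    return stat, chi2_sf(stat, extra_params)

def adjusted_rho_squared(ll_model, ll_null, n_params):
    """1 - (LL - K) / LL0, penalizing the fit for the number of parameters."""
    return 1.0 - (ll_model - n_params) / ll_null

# Hypothetical values for illustration only.
stat, p_value = likelihood_ratio_test(-805.0, -800.0, extra_params=1)
rho_bar_sq = adjusted_rho_squared(-800.0, -1000.0, n_params=10)
```

A restricted model (for example, the baseline without field effects) is rejected in favor of the unrestricted one at the 0.05 level when the p-value falls below 0.05. The loglikelihood ratio test applies only to nested models; comparisons such as field effect versus “panel” effect models rely on non-nested tests based on adjusted rho-squared instead.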
We conclude from loglikelihood ratio tests on the field effect models versus the baseline model that, for this particular case study and the network definitions under consideration, systematic field effects representing social and spatial network interactions between an agent and the aggregate behavior of other reference agents do indeed have explanatory power. Also, on the basis of non-nested tests, the fit is better than that of models with “panel” effects representing unobserved heterogeneity between reference groups and no systematic field effects. Finally, on the basis of loglikelihood ratio tests, there is no statistically significant gain in fit at the 0.05 level from adding the “panel” effects when the systematic field effects are already included in the model.

For the case of network interdependence defined by residential district, a loglikelihood ratio test on the model with a complete set of alternative-specific dummy variables (ASDV) for reference group versus the field effect model shows no significant additional explanatory power of the ASDV. With network interdependence defined by socioeconomic group, a loglikelihood ratio test shows that the complete set of ASDV does indeed have explanatory power over the field effect model, but it comes at the expense of 23 additional variables.

Loglikelihood ratio tests on the “random field” effect versus the field effect show that there is indeed significant unobserved individual heterogeneity at the 0.05 level on the field variables with all network definition treatments except residential district. For residential district this may not be so surprising given our finding that the complete set of ASDV also showed no statistically significant improvement over the field effect, but it now gives further information: not only is there no additional explanatory power due to group heterogeneity, there is also no additional explanatory power due to unobserved individual heterogeneity.
Loglikelihood ratio tests on the “panel field” effect versus the field effect show that there is no significant unobserved group heterogeneity at the 0.05 level on the field variables. At the 0.1 level, however, for the case of network interdependence defined by socioeconomic group there is significant unobserved group heterogeneity. This is in line with our finding that the complete set of ASDV for socioeconomic group showed statistically significant improvement over the field effect model for socioeconomic group. Similarly, on the basis of non-nested tests on the “random field” effect versus the “panel field” effect, only for network interdependence defined by socioeconomic group can we not safely bound the probability of selecting the wrong model at less than 0.05, under the hypothesis that the model with higher adjusted rho-squared (random field effect) is the correct one.

An interesting finding with respect to this particular case study is that the specifications allowing agents to weigh both any influence from their socioeconomic peers and any influence from their spatial network, and furthermore allowing the possibility to weigh these influences differently, all outperform models where each agent belongs to one and only one group. Also interesting is that the specifications allowing agents to weigh any influence from their fellow district residents differently from any influence from their more immediate neighbors perform better than a uniform spatial network.

Note that the importance of distinguishing social versus spatial influence networks in transportation mode choice behavior can be especially relevant if the model would in turn be coupled to, for example, a residential choice behavior model. Transportation mode choice, in itself, does not necessarily affect a decision-maker’s reference position in a social-spatial network.
Whether an agent decides to travel by car versus public transit will not change in a directly causal sense who his/her neighbors are, nor who his/her fellow district residents are, nor will it have consequences for who the decision-maker’s age, income and education level peers are. Here, the network is wholly static with respect to the choice dimension. Residential location choice, on the other hand, will by definition impact a decision-maker’s reference position in a spatial network. In moving to a new neighborhood, a decision-maker by definition acquires new neighbors, and potentially also new fellow district residents if the move is out of the existing district. In this case the spatial network, in particular, is dynamic with respect to the choice dimension. If transportation mode choice and residential choice were considered together, we would then by definition have to address the dynamic nature of the spatial network in the empirical application.

Our main aim has been to explore the empirical estimation of discrete choice models with social and/or spatial feedback effects, including various forms of unobserved (group) heterogeneity. In this paper we specifically test for correlation among agents in the error structure, as well as in the systematic utility. Yet despite our efforts to explicitly model unobserved heterogeneity in strategies #2, #1+2, #4 and #5, an open question in the present work may still be the issue of potentially biased estimates due to endogeneity in the field effect. An alternative strategy for introducing social and spatial network dependencies in choice models, currently under investigation by the authors, is a two-stage instrumental variables approach.

One important distinction between the social-spatial causality and correlation discussed here and the time causality and correlation traditionally understood in panel data literature is, however, that the concept of initial conditions (44) can be less obvious.
Time generally points in only one direction (unless foresight is considered), whereas inter-agent dependencies can point in both directions (decision-maker A influences decision-maker B, and decision-maker B influences decision-maker A). Furthermore, inter-agent dependence is clearly not necessarily one-dimensional. Nonetheless, if the aggregate field variable as constructed in strategy #1 can be assumed to be effectively self-contained, that is, not incurring bias by pointing to unobserved decision-makers outside of the data set, we believe the case to be similar to that of a time series observed from the beginning. In such a case there is no “initial value problem”, as these values are well defined. That said, we can easily imagine alternative constructions, not discussed here, which could much more obviously incur bias. For example, social-spatial boundary conditions could be critical in the case of modeling social-spatial Markov random field state dependence instead of group mean field state dependence, and social-spatial auto-regressive correlation instead of unobserved group heterogeneity (41). This is in fact one driving motivation for the particular strategies presented in this paper. It is also important to note that the identifiable agent interdependence presented in the framework at the outset of the paper, but not addressed in the case study, may by its nature suffer more readily from boundary-condition problems.

An important limitation of the present work is the a priori assumption that a feedback effect is indeed relevant. As explained by Manski (17), access to true temporal data may allow feedback effects to be better distinguished empirically from correlated effects and contextual effects.
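To make concrete what a self-contained aggregate field variable can look like, the following is a minimal sketch in Python, not the code used for the estimations reported here. It assumes that the field variable for an alternative is simply the share of decision-makers in the same reference group, within the data set, who chose that alternative. The function name `field_variables`, the `leave_one_out` option and the toy data are hypothetical and introduced only for illustration.

```python
from collections import Counter, defaultdict


def field_variables(choices, groups, alternatives, leave_one_out=False):
    """Return, for each decision-maker, the share of his/her reference group
    choosing each alternative (an assumed form of group mean state dependence).

    choices[i] is the chosen alternative of decision-maker i and groups[i]
    is his/her reference group. With leave_one_out=True the decision-maker's
    own choice is excluded from the group share (an illustrative option).
    """
    counts = defaultdict(Counter)
    for choice, group in zip(choices, groups):
        counts[group][choice] += 1

    shares = []
    for choice, group in zip(choices, groups):
        size = sum(counts[group].values())
        row = {}
        for alt in alternatives:
            num = counts[group][alt]
            den = size
            if leave_one_out:
                num -= 1 if alt == choice else 0
                den -= 1
            row[alt] = num / den if den > 0 else 0.0
        shares.append(row)
    return shares


# Hypothetical toy data: six commuters in two residential districts.
alternatives = ["bicycle", "transit", "car"]
choices = ["car", "car", "bicycle", "transit", "car", "bicycle"]
groups = ["d1", "d1", "d1", "d2", "d2", "d2"]

shares = field_variables(choices, groups, alternatives)
shares_loo = field_variables(choices, groups, alternatives, leave_one_out=True)
print(shares[0])      # group shares seen by the first commuter
print(shares_loo[0])  # same, excluding the first commuter's own choice
```

Because every share is computed only from decision-makers observed in the sample, no value refers to agents outside the data set, which is the sense in which the construction is self-contained. Whether an agent's own choice should be excluded from his/her group share is a modeling decision that this sketch leaves open.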
CONCLUSIONS

We have presented a framework for conceptualizing the interdependence of decision-makers’ choices, making a distinction between social versus spatial network interdependencies and between identifiable versus aggregate agent interdependencies. We discussed five strategies for introducing social and spatial network interdependencies into choice models, focusing on feedback effects and on correlated effects. Due to the nature of the data available to us, we were unable to consider identifiable agent interaction in the case study. Instead we considered aggregate agent interdependencies and applied the model strategies to nine variations on three network treatments, one defined by socioeconomic group and two defined spatially based on residential location. According to likelihood ratio and non-nested specification tests, our best performing model strategy for this particular case study is what we term a “random field effect” model, namely one with unobserved individual heterogeneity on the group mean state dependence.

By casting the research question on social and spatial network interdependencies in terms of an analogy with the treatment of typical panel data, we gain two important benefits. First, it allows us to build on well-understood properties of typical panel data models. Second, and not least, it allows us to apply freely available software for the estimation of models with social-spatial interdependence. In so doing, we hope to stimulate researchers and practitioners to adopt these techniques, since the entry barrier is lower than it would be if dedicated code needed to be written or expensive software purchased. We believe our approach can be useful for researchers and practitioners who have a priori reason to believe there is a feedback effect in their work, for example, on theoretical or qualitative grounds.
Even without such a priori knowledge, the so-called field variable can still be a very powerful and practical means of capturing variation in a data set, especially in avoiding proliferation of estimated parameters, although strict caution must then be exercised in the interpretation of predictions over time. Access to temporal data would allow researchers and practitioners to remove ambiguity and be more definitive about their results. We hope that researchers and practitioners will start to pay more attention to the importance and consequences of agent interdependence. In particular, we would be enthused to see more temporal data collected with the aim of understanding the potential inherent dynamics associated with feedback effects.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge Moshe Ben-Akiva, Frank le Clercq and Loek Kapoen, although the authors claim full responsibility for any errors. The work forms Project II of the AMADEUS program directed by Harry Timmermans and coordinated by Theo Arentze, made possible in part through a generous grant by the Dutch National Science Organization. Special thanks are due to Guus Brohm and Yvon Weening of the Municipality of Amsterdam Agency for Infrastructure, Traffic and Transport for assistance with data in the empirical application, and to Len de Klerk, Gert van der Meer, Barbara Lawa, Karin Retèl-de Groot, Tina Zettl, Gimene Spaans, Ernst Berkhout, Michel Hageman, the ICT Helpdesk, Wim de Lange, Loes van Dort, the FMG Facilitheek and the NVD meldkamer, for institutional support in running simulations on computers in student halls at the Department of Geography and Planning in the Faculty of Social and Behavioral Sciences at the University of Amsterdam.
Additional simulations were performed on the Linux Beowulf cluster at the SARA Computing and Networking Services, Sciencepark Amsterdam, with special thanks to Willem Vermin, Bert van Corler, the HPC team, Jan Hartmann and Marcel Heemskerk for institutional support. Several anonymous reviewers provided useful comments on an earlier draft of this paper.

REFERENCES

1. Ben-Akiva, M. Structure of Passenger Travel Demand Models. Ph.D. dissertation. Department of Civil Engineering, MIT, Cambridge, Massachusetts, 1973.
2. Domencich, T., and D. McFadden. Urban Travel Demand. North Holland Press, Amsterdam, 1975.
3. Wegener, M. Reduction of CO2 Emissions of Transport by Reorganisation of Urban Activities. In Y. Hayashi and J. Roy (eds.), Land Use, Transport and the Environment. Kluwer Academic Publishers, Dordrecht, Netherlands, 1996, pp. 103-124.
4. Moeckel, R., C. Schürmann, B. Schwarze, K. Spiekermann, and M. Wegener. Microsimulation of Urban Land Use, Transport and Environment. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004.
5. Waddell, P. UrbanSim: Modeling Urban Development for Land Use, Transportation and Environmental Planning. Journal of the American Planning Association, Vol. 68, No. 3, 2002, pp. 297-314.
6. Martinez, F., and F. Aguila. An optimal urban planning tool: Subsidies and regulations. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004.
7. Hensher, D.A., and T. Ton. TRESIS: A Transportation, Land Use and Environmental Strategy Impact Simulator for Urban Areas. Proceedings of the 9th World Conference on Transportation Research, Seoul, Korea, July 2001.
8. Ben-Akiva, M., D. McFadden, T. Gärling, D. Gopinath, J. Walker, D. Bolduc, A. Börsch-Supan, P. Delquié, O. Larichev, T. Morikawa, A. Polydoropoulou, and V. Rao. Extended Framework for Modeling Choice Behavior. Marketing Letters, Vol. 10, No. 3, 1999, pp. 187-202.
9. McFadden, D. Economic Choices. American Economic Review, Vol. 91, No. 3, 2001, pp. 351-378.
10. Walker, J. Extended Discrete Choice Models: Integrated Framework, Flexible Error Structures, and Latent Variables. Ph.D. dissertation. Department of Civil Engineering, MIT, Cambridge, Massachusetts, 2001.
11. Walker, J., and M. Ben-Akiva. Generalized Random Utility Model. Mathematical Social Sciences, Vol. 43, No. 3, 2002, pp. 303-343.
12. Ben-Akiva, M., D. McFadden, K. Train, J. Walker, C. Bhat, M. Bierlaire, D. Bolduc, A. Boersch-Supan, D. Brownstone, D.S. Bunch, A. Daly, A. de Palma, D. Gopinath, A. Karlstrom, and M.A. Munizaga. Hybrid Choice Models: Progress and Challenges. Marketing Letters, Vol. 13, No. 3, 2002, pp. 163-175.
13. Dugundji, E.R., L.L. Kapoen, F. le Clercq, T.A. Arentze, H.J.P. Timmermans, and K.J. Veldhuisen. The Long-term Effects of Multi-modal Transportation Networks: The Residential Choice Behavior of Households. Proceedings of the 9th World Conference on Transportation Research, Seoul, Korea, July 2001.
14. Aoki, M. Economic Fluctuations with Interactive Agents: Dynamic and Stochastic Externalities. Japanese Economic Review, Vol. 46, No. 2, 1995, pp. 148-165.
15. Brock, W., and S. Durlauf. Discrete Choice with Social Interactions. Review of Economic Studies, Vol. 68, 2001, pp. 235-260.
16. Blume, L., and S. Durlauf. Equilibrium Concepts for Social Interaction Models. SSRI Working Papers, Social Systems Research Institute, University of Wisconsin, Madison, 2002.
17. Manski, C.F. Identification Problems in the Social Sciences. Harvard University Press, Cambridge, Massachusetts, 1995.
18. Brock, W., and S. Durlauf. A Multinomial Choice Model of Neighborhood Effects. American Economic Review, Vol. 92, No. 2, 2002, pp. 298-303.
19. Dugundji, E.R. Socio-Dynamic Discrete Choice: Analytical Results for the Nested Logit Model. Presented at the International Conference on Complex Systems, Boston, Massachusetts, May 2004. (http://necsi.net/events/iccs/openconf/author/abstractbook.php)
20. Dugundji, E.R., and L. Gulyas. Empirical Estimation and Multi-Agent Based Simulation of a Discrete Choice Model with Network Interaction Effects. In Proceedings of the 8th International Conference on Computers in Urban Planning and Urban Management, Sendai, Japan, May 2003. (http://www.rs.civil.tohoku.ac.jp/~cupum03)
21. Dugundji, E.R., and L. Gulyas. An Exploration of the Role of Global versus Local and Social versus Spatial Networks in Transportation Mode Choice Behavior in The Netherlands. In Proceedings of AGENT 2003: Challenges in Social Simulation, University of Chicago, USA, October 2003. (http://www.agent2004.anl.gov/Agent2003.pdf)
22. Dugundji, E.R., and L. Gulyas. Discrete Choice on Networks: An Agent-Based Approach. Presented at the First Conference of the European Social Simulation Association, Groningen, Netherlands, September 2003. (http://www.uni-koblenz.de/~kgt/ESSA/ESSA1/)
23. Dugundji, E.R., and L. Gulyas. Dynamic Discrete Behavioral Choice on Networks: The Nested Logit Model – An Empirical Application. Presented at XXIV International Sunbelt Social Network Conference, Portorož, Slovenia, May 2004. (http://vlado.fmf.uni-lj.si/info/sunbelt24/default.htm)
24. Dugundji, E.R., and L. Gulyas. Socio-Dynamic Discrete Choice on Networks: Multi-agent Simulation of Social & Spatial Interaction Effects in Transportation Mode Choice Behavior. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004. (http://www.wctr2004.org.tr)
25. Dugundji, E.R., and L. Gulyas. Socio-Dynamic Discrete Choice on Networks: Impacts of Agent Heterogeneity on Emergent Equilibrium Outcomes. In Proceedings of AGENT 2004: Social Dynamics: Interaction, Reflexivity and Emergence, University of Chicago, USA, October 2004. (http://www.agent2004.anl.gov/agenda.html)
26. Hensher, D.A. Models of Organisational and Agency Choices for Passenger and Freight-Related Travel Choices: Notions of Inter-Activity and Influence. ITS-WP-03-05, Faculty of Economics and Business, University of Sydney, NSW, 2003.
27. Brewer, A.M., and D.A. Hensher. Distributed work and travel behaviour: The dynamics of interactive agency choices between employers and employees. Transportation, Vol. 27, 2000, pp. 117-148.
28. Rose, J., and D.A. Hensher. Modelling agent interdependency in group decision making. Transportation Research Part E, Vol. 40, 2004, pp. 63-79.
29. Vovsha, P., M. Bradley, and J.L. Bowman. Activity-Based Travel Forecasting Models in the United States: Progress since 1995 and Prospects for the Future. Presented at the EIRASS Workshop on Progress in Activity-Based Analysis, Maastricht, Netherlands, May 2004.
30. Vovsha, P., J. Gliebe, E. Peterson, and F. Koppelman. Comparative Analysis of Sequential and Simultaneous Choice Structures for Modeling Intra-Household Interactions. Presented at the EIRASS Workshop on Progress in Activity-Based Analysis, Maastricht, Netherlands, May 2004.
31. Timmermans, H.J.P., A.W.J. Borgers, J. van Dijk, and H. Oppewal. Residential Choice Behavior of Dual-Earner Households: A Decompositional Joint Choice Model. Environment and Planning A, Vol. 24, 1992, pp. 517-533.
32. Borgers, A.W.J., and H.J.P. Timmermans. Transport Facilities and Residential Choice Behavior: A Model of Multi-Person Choice Processes. Papers in Regional Science, Vol. 72, 1993, pp. 45-61.
33. Abraham, J.E., and J.D. Hunt. Specification and Estimation of Nested Logit Model of Home, Workplaces, and Commuter Choices by Multiple Worker Households. Transportation Research Record 1606, 1997, pp. 17-24.
34. Molin, E.J.E., H. Oppewal, and H.J.P. Timmermans. Conjoint modeling of residential group preferences: A comparison of the internal validity of hierarchical information integration approaches. Journal of Geographical Systems, Vol. 4, 2002, pp. 343-358.
35. Dellaert, B.G.C., M. Prodigalidad, and J.J. Louviere. Family Members’ Projections of Each Other’s Preference and Influence: A Two-Stage Conjoint Approach. Marketing Letters, Vol. 9, No. 2, 1998, pp. 135-145.
36. Ben-Akiva, M., and M. Bierlaire. Discrete choice methods and their applications to short-term travel decisions. In R. Hall (ed.), Handbook of Transportation Science. International Series in Operations Research and Management Science, Kluwer, Dordrecht, Netherlands, 1999.
37. Timmermans, H.J.P., T.A. Arentze, M. Dijst, E.R. Dugundji, C-H. Joh, L.L. Kapoen, S. Krygsman, K. Maat, and K.J. Veldhuisen. AMADEUS: Framework for Developing a Dynamic Multiagent, Multiperiod Activity-Based Micro-Simulation Model of Travel Demand. Presented at the 81st Annual Meeting of the Transportation Research Board, Washington, D.C., January 2002.
38. Joh, C-H. Measuring and Predicting Adaptation in Multidimensional Activity-Travel Patterns. Ph.D. dissertation. Department of Architecture, TU/e, Eindhoven, Netherlands, 2004.
39. Krygsman, S. Activity and Travel Choice(s) in Multimodal Public Transport Systems. Ph.D. dissertation. Department of Geography, UU, Utrecht, Netherlands, 2004.
40. Maat, K., H.J.P. Timmermans, and E. Molin. A Model of Spatial Structure, Activity Participation and Travel Behavior. Presented at the 10th World Conference on Transportation Research, Istanbul, Turkey, July 2004.
41. Dugundji, E.R. Socio-Dynamic Discrete Choice. Ph.D. manuscript (in progress). Faculty of Social and Behavioral Sciences, Department of Geography & Planning, UvA, Amsterdam, Netherlands, 2005.
42. Ben-Akiva, M., and S.R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge, Massachusetts, 1985.
43. Dugundji, E.R., and J.L. Walker. Discrete Choice with Social and Spatial Network Interdependencies: An Empirical Example Using Mixed GEV Models with Field and “Panel” Effects. Presented at the 84th Annual Meeting of the Transportation Research Board, Washington, D.C., USA, January 2005.
44. Heckman, J. J.
The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process. In C. Manski and D. McFadden (eds.), The Structural Analysis of Discrete Data. MIT Press, Cambridge, Massachusetts, 1981, pp. 179-195.

LIST OF FIGURES AND TABLES

FIGURE 1 (a) Commuter mode count for decision-makers in Amsterdam and Amstelveen grouped by residential district. (b) Abstract visualization of network interdependence defined by residential district plus socioeconomic group.

TABLE 1 Loglikelihood, Number of Estimated Parameters and Adjusted Rho-Squared for Various Reference Group Treatments

[Figure 1, panel (a): map of Amsterdam and Amstelveen showing sampled residential locations and commuter mode counts per district. Legend: District 1 (363 respondents), District 2 (352 respondents), District 3 (329 respondents), District 4 (223 respondents), District 6 (254 respondents), District 8 (340 respondents), District 9 (348 respondents), District 10 (243 respondents), District 16 (461 respondents); bicycle commuters (779 total), transit commuters (690 total), car commuters (1444 total). Panel (b): network diagram.]

FIGURE 1 (a) Commuter mode count for decision-makers in Amsterdam and Amstelveen grouped by residential district. (b) Abstract visualization of network interdependence defined by residential district plus socioeconomic group. Red dots represent fully-connected residential districts as in (a); the darkness and width of lines gives an indication of the number of links between districts induced by socioeconomic group.

TABLE 1 Loglikelihood, Number of Estimated Parameters and Adjusted Rho-Squared for Various Reference-Group Treatments a)

                    Socioecon.    District    Postcode  Socioecon.  Socioecon.  Socioecon.    District  Socioecon.  Socioecon.
                   (13 groups)  (9 groups) (67 groups) w/ District w/ Postcode  + District  + Postcode  + Postcode  + District
                                                           (joint)     (joint)                                      + Postcode

#1: Field Effect (Group Mean State Dependence)
Loglikelihood         -2052.15    -2054.88    -2009.55     -2051.9    -2047.61    -2042.92     -2005.1    -1999.58    -1994.65
Parameters                  16          16          16          16          16          17          17          17          18
Adj. rho-squared         0.305       0.304       0.320       0.305       0.307       0.308       0.321       0.323       0.324

#2: “Panel” Effect (Unobserved Group Heterogeneity)
Loglikelihood         -2054.22    -2063.61    -2061.22    -2058.55    -2059.17    -2058.55    -2061.22    -2059.17    -2059.17
Parameters                  18          18          18          18          18          18          18          18          18
Adj. rho-squared         0.304       0.301       0.302       0.303       0.302       0.303       0.302       0.302       0.302

#1 and #2: Field + Panel
Loglikelihood         -2048.69    -2054.87    -2009.55    -2048.54    -2043.86    -2041.27     -2005.1    -1999.34    -1994.49
Parameters                  19          19          19          19          19          20          20          20          21
Adj. rho-squared         0.306       0.303       0.319       0.306       0.307       0.308       0.320       0.322       0.323

#3: Alt-Specific Group Dummies (Observed Group Heterogeneity)
Loglikelihood         -2034.27    -2048.89          b)          c)          c)          b)          b)          b)          b)
Parameters                  39          31
Adj. rho-squared         0.304       0.301

#4: Random Field Effect (Group Mean State Dependence w/ Unobs. Individual Heterogeneity)
Loglikelihood         -2047.91    -2052.01    -2004.79    -2047.08    -2042.51    -2033.31    -2002.55     -1989.9    -1984.37
Parameters                  19          19          19          19          19          23          23          23          27
Adj. rho-squared         0.306       0.304       0.320       0.306       0.308       0.309       0.320       0.324       0.324

#5: “Panel” Field Effect (Group Mean State Dependence w/ Unobserved Group Heterogeneity)
Loglikelihood         -2048.75    -2054.87    -2009.55    -2048.06    -2044.42    -2040.95     -2005.1     -1999.1    -1994.02
Parameters                  19          19          19          19          19          23          23          23          27
Adj. rho-squared         0.305       0.303       0.319       0.306       0.307       0.307       0.319       0.321       0.321

a) The baseline model with cross-nested logit choice kernel and no modeled inter-agent dependence has loglikelihood of -2066.03 with 15 estimated parameters. The adjusted rho-squared for this baseline model is 0.301.
b) Considerable risk of model overfit due to large number of parameters to be estimated.
c) Unclear how to operationalize alternative-specific group dummies with joint overlapping reference groups.
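
The specification tests discussed in the text can be retraced from the values in Table 1. The following minimal Python sketch is an illustration under stated assumptions, not the procedure implemented in the estimation software. It assumes that the likelihood ratio statistic is referred to a chi-squared distribution with degrees of freedom equal to the difference in the number of estimated parameters, and that the non-nested bound takes the form Phi(-sqrt(-2 z L(0) + (K2 - K1))) as given in Ben-Akiva and Lerman (42), where z is the difference in adjusted rho-squared. Since the adjusted rho-squared is 1 - (L - K)/L(0), the term -2 z L(0) can be rewritten in terms of loglikelihoods and parameter counts only, so L(0) is not needed. The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df > 0."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def lr_test(ll_restricted, k_restricted, ll_unrestricted, k_unrestricted):
    """Likelihood ratio statistic, degrees of freedom and p-value."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    df = k_unrestricted - k_restricted
    return stat, df, chi2_sf(stat, df)


def nonnested_bound(ll_hi, k_hi, ll_lo, k_lo):
    """Assumed bound Phi(-sqrt(-2 z L(0) + (k_hi - k_lo))), where 'hi' is the
    model with the higher adjusted rho-squared. Uses the identity
    -2 z L(0) = 2 * ((ll_hi - k_hi) - (ll_lo - k_lo))."""
    arg = 2.0 * ((ll_hi - k_hi) - (ll_lo - k_lo)) + (k_hi - k_lo)
    root = math.sqrt(max(arg, 0.0))
    return 0.5 * math.erfc(root / math.sqrt(2.0))


# Values from Table 1: field effect (#1) versus "panel" field effect (#5).
lr_socio = lr_test(-2052.15, 16, -2048.75, 19)
lr_district = lr_test(-2054.88, 16, -2054.87, 19)
lr_postcode = lr_test(-2009.55, 16, -2009.55, 19)

# Values from Table 1: random field effect (#4) versus "panel" field effect (#5).
nn_socio = nonnested_bound(-2047.91, 19, -2048.75, 19)
nn_district = nonnested_bound(-2052.01, 19, -2054.87, 19)
nn_postcode = nonnested_bound(-2004.79, 19, -2009.55, 19)

for name, lr, nn in [("socioecon.", lr_socio, nn_socio),
                     ("district", lr_district, nn_district),
                     ("postcode", lr_postcode, nn_postcode)]:
    print(f"{name}: LR={lr[0]:.2f}, df={lr[1]}, p={lr[2]:.3f}, bound={nn:.4f}")
```

Under these assumptions the socioeconomic group case gives a likelihood ratio p-value between 0.05 and 0.1 and a non-nested bound above 0.05, whereas the district and postcode cases give non-nested bounds below 0.05, consistent with the discussion in the text. Because the two models compared in the non-nested test have the same number of parameters here, the result does not depend on which of the two is labeled K1 or K2 in the bound.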