The value of points of interest information in predicting cost-effective charging infrastructure locations

Master Thesis MSc. Business Information Management
Stéphanie Florence Visser

Thirty-sixth International Conference on Information Systems, Fort Worth 2015

Stéphanie Florence Visser
407153
MSc. Micha Kahlen
Dr Rodrigo Belo
14 August 2016
MSc. Business Information Management

The copyright of the master thesis rests with the author. The author is responsible for its contents. RSM is only responsible for the educational coaching and cannot be held liable for the content.

Master Thesis MSc. Business Information Management | Stéphanie Florence Visser

Executive summary

In this study, we assess the value of using information on points of interest in predicting cost-effective electric vehicle charging infrastructure locations. The benefits of battery electric vehicle adoption are profound and necessary, as they revolve around the potential to combat climate change. However, battery electric vehicles (BEVs) currently lack mass adoption, among other reasons because of range anxiety among potential users, which can be diminished by increasing the presence of charging infrastructure. Improved insight into potential charging demand can help overcome the reluctance of infrastructure providers to place charging stations before actual demand is known. As explanatory studies show a relationship between points of interest and charging demand, we investigate the predictive power of this relationship. We therefore study potential charging demand in the case city of San Diego, given the number of electric vehicles expected on the road in 2020.
We investigate both slow charging and fast charging demand based on an extensive dataset of parking behavior of battery electric vehicles over the course of 63 weeks, and combine this with information on points of interest within the city of San Diego. After investigating different methods to ensure the best possible predictive power, we use a Gradient Boosting Model to predict potential charging demand. Our findings show that electric vehicle charging demand can be predicted more accurately by a model supplemented with information on points of interest than by a base model with information on neighborhood characteristics, seasonality and charging infrastructure presence. Consequently, the supplemented model can assist in decision-making regarding cost-effective charging locations: it promises 2 to 45 times more profit than the base model. Still, we find that the supplemented model under-predicts demand at high-demand locations and therefore loses out on potential profit. We show that although slow charging demand can be predicted more accurately than fast charging demand, the supplemented model provides the most gain in fast charging demand prediction. Further, in our models the presence of a food and beverage store, leisure activity, or airport has the most influence on predicted demand. This study therefore shows that information on points of interest is indeed valuable in predicting cost-effective electric vehicle charging locations.

Acknowledgements

I would like to take the opportunity to express my sincere gratitude to the following individuals who significantly contributed to this educational journey.

Micha Kahlen – I honestly could not have wished for a better coach. Thank you for your excellent guidance, insights and clarity throughout the whole process.
I sincerely appreciate your time and patience, especially in the last weeks when I went full-stalk mode.

Dr Rodrigo Belo, who made me genuinely enthusiastic about statistics and R. Thank you for stimulating me to think further, for helping me fight the data, and for your time and patience when adopting me last-minute as a thesis student.

My mum, Béatrice, who is the most awesome mum ever – and who still teaches me dedication and happiness every day. Thank you for your continuous support, and for helping me to bring out the best in myself.

Lastly, my homeboys Dirk and Job – you guys were the best possible people to conquer BIM with, and I could not have done it without you. Thank you for teaching me the value of teamwork, for the encouragement, and for the laughter.

Table of contents

1. Introduction
1.1 Background
1.2 Research question
1.3 Relevance
2. Related work and conceptual model
2.1 The importance of infrastructure availability
2.1.1 Electric vehicle adoption
2.1.2 Range anxiety
2.2 Predictors of charging demand
2.2.1 Points of interest
2.2.2 Other predictors of charging demand
2.3 The business case of charging infrastructure
2.4 Predictive modelling
2.5 Summary of findings and conceptual model
2.6 The rest of this paper
3. Methodology
3.1 Context
3.1.1 Expected charging demand in San Diego
3.1.2 Charging station types
3.2 Study design
3.3 Data
3.3.1 Data on parking behavior
3.3.2 Data on points of interest
3.3.3 Data on neighborhood characteristics
3.4 Data preparation
3.4.1 Response variable
3.4.2 Point of interest predictors
3.4.3 Base predictors
3.5 Model selection
3.5.1 Selecting predictors and method
3.5.2 Gradient Boosting Method
3.5.3 Generalized Linear Models
3.5.4 Data partitioning
3.5.5 Comparison metrics
3.5.6 Overview predictors and methods
4. Assessment of predictive value
4.1 Demand predictions base models
4.1.1 Base model slow charging
4.1.2 Base model fast charging
4.2 Demand predictions points of interest model
4.2.1 Points of interest model slow charging
4.2.2 Points of interest model fast charging
4.2.3 Demand prediction differences
4.3 Location predictions
4.3.1 Costs and revenue
4.3.2 Predictive metrics
4.3.3 Cost-effective charging locations
4.3.4 Charging locations given predetermined investment
4.3.5 Improved decision-making
5. Discussion
5.1 Charging demand prediction using POI information
5.1.1 Difference in demand prediction between models
5.1.2 Large difference between actual and predicted demand
5.1.3 Slow charging more accurately predicted than fast charging demand
5.1.4 F&B store, leisure activity and airport most influential on predictions
5.1.5 Importance of neighborhood characteristics and seasonality
5.1.6 Explanatory power is not necessarily related to predictive power
5.2 Decision-making using POI information
5.2.1 Better prediction of cost-effective locations using POI information
5.2.2 More potential profit can be made using point of interest information
5.2.3 Yet we lose out on profit due to under-prediction
5.2.4 Model applications
5.3 Assessment of methodologies used
5.3.1 Predictive power of Gradient Boosting
6. Conclusion
6.1 The value of points of interest information
6.2 Academic relevance
6.3 Managerial relevance
6.4 Limitations and suggestions for future research
Bibliography
Appendices

List of tables and figures

Table 1. Overview studies finding influence of points of interest
Table 2. Extract raw data parking behavior
Table 3. Extract raw data points of interest
Table 4. Points of interest categories
Table 5. San Diego neighborhoods
Table 6. Table of notation
Table 7. Overview predictors and methods
Table 8. Predictive metrics base model slow charging
Table 9. Predictive metrics base model fast charging
Table 10. Predictive metrics model comparison
Table 11. Predictive metrics supplemented model slow charging
Table 12. Predictive metrics supplemented model fast charging
Table 13. Lifetime charging station costs ($)
Table 14. Predicted cost-effective parking locations metrics
Table 15. Predicted top 100 parking locations metrics
Figure 1. Conceptual model
Figure 2. San Diego parking locations, where red dots indicate parking locations
Figure 3. San Diego points of interest, where red dots indicate points of interest
Figure 4. Base model slow charging 50 trees, where red dots indicate lowest MSE
Figure 5. Base model slow charging 9 trees, where red dots indicate lowest MSE
Figure 6. Predictor importance base model slow charging
Figure 7. Base model fast charging 50 trees, where red dots indicate lowest MSE
Figure 8. Base model fast charging 8 trees, where red dots indicate lowest MSE
Figure 9. Predictor importance base model fast charging
Figure 10. Supplemented model slow charging 100 trees, where red dots indicate lowest MSE
Figure 11. Predictor importance supplemented model slow charging
Figure 12. Supplemented model fast charging 100 trees, where red dots indicate lowest MSE
Figure 13. Predictor importance supplemented model fast charging
Appendix 1. Extract points of interest
Appendix 2. Example non-linearity predictor and response
Appendix 3. Density response variable

1. Introduction

1.1 Background

Although CO2 emissions remained stable in 2014, the Intergovernmental Panel on Climate Change (IPCC) warns that without drastic action, greenhouse gases will irreversibly change our climate (IEA, 2015). Already, the number of extreme weather events such as droughts, heat waves, floods, rising sea levels and typhoons more than quadrupled between 1980 and 2014, from 38 to 174 events (World Energy Council, 2015). The transportation sector significantly contributes to these developments, accounting for almost 20% of greenhouse gas (GHG) emissions in the EU in 2013 (European Environment Agency, 2016).

To combat climate change, governments have set goals for the transportation sector to reduce this type of emission. For example, the EU aims to lower GHG emissions from transport by 20% by 2030 compared with 2008 levels, on the way to a 60% cut in GHG emissions by 2050 compared to 1990 (European Commission, 2011). Along the same lines, the US committed to several new policies for the transportation sector that will assist in reaching the overall US goal of a 17% GHG emission reduction in 2020 compared to 2005 levels (US Department of State, 2014).

As the large contribution of the transportation sector to GHG emissions can mainly be attributed to the intensive use of fossil fuels, policy-makers and other stakeholders are increasingly interested in means of transportation that are less dependent on oil. More specifically, deployment of Electric Vehicles (EVs) offers opportunities to significantly lower emissions. Plug-in Electric Vehicles (PEVs) include Battery Electric Vehicles (BEVs, e.g. Nissan Leaf) and Plug-in Hybrid Electric Vehicles (PHEVs, e.g. Toyota Prius Plug-in). As the latter vehicle type, apart from a rechargeable battery, also deploys an internal combustion engine, BEVs are preferred over PHEVs in terms of environmental impact.
BEVs produce no tailpipe GHG emissions (Mak, Rong, & Shen, 2013; Schneider, Stenger, & Goeke, 2014), can be powered by sustainable energy sources, produce minimal noise (Schneider et al., 2014), and can reduce GHG emissions even when the electricity production at the time of charging is unusually CO2-intensive (San Román, Momber, Abbad, & Sánchez Miralles, 2011). BEVs therefore have the potential to become popular on a large scale, especially in urban areas, where combatting pollution is high on administrators' agendas.

However, the success of these vehicles will depend on advancements in recharging infrastructure. At the moment, Battery Electric Vehicle charging poses more challenges to users than refueling internal combustion engine (ICE) vehicles. Compared to their traditional counterparts, Battery Electric Vehicles have a shorter driving range and hence need more frequent recharging, require special charging stations, and demand a significantly longer charging time. This necessitates special care when planning public charging infrastructure; only when EV charging is perceived as efficient and convenient will BEV adoption be positively influenced (Guo & Zhao, 2015; Ip, Fong, & Liu, 2010; Lin, 2014). Conversely, perceived low or inconvenient availability of charging infrastructure could hinder BEV adoption due to range anxiety – the fear of stranding with a depleted battery (Cai, Jia, Chiu, Hu, & Xu, 2014; Egbue & Long, 2012). Convenient public charging infrastructure is especially vital in urban areas to ensure BEV adoption, due to limited access to home charging (Ip et al., 2010).

However, as placement of EV infrastructure started only a few years ago, planning and deployment of effective charging infrastructure is currently subject to a ‘chicken-and-egg dilemma’ (Mak et al., 2013). On the one hand, customers are willing to purchase BEVs only when plenty of charging facilities are placed at easily accessible locations. On the other hand, distribution system operators are only inclined to place charging facilities once a certain demand is observed, as underutilized stations can lead to a waste of resources (Cai et al., 2014). Indeed, the electric vehicle infrastructure landscape is currently shaped by government funding (McCormack, Sanborn, & Rhett, 2013), whilst the EV market can only be sustainable in the long run when private investors are able to make a profit on their investment (Madina, Zamora, & Zabala, 2016). To address this dilemma, distribution system operators should be able to determine, before actual demand is known, locations where a certain level of charging demand can be expected and which therefore promise a return on investment. Advancement in BEV charging infrastructure is therefore attained when charging stations are placed at locations where they are easily accessible to BEV users for recharging, whilst being economically efficient for the charging station distributor (Brooker & Qin, 2015; Guo & Zhao, 2015; Madina et al., 2016; Shukla, Pekny, & Venkatasubramanian, 2011).

It is for this reason that over the last few years, the question of effective charging station infrastructure has attracted growing interest amongst scholars. As trip or refueling patterns of ICE vehicles may not be directly applicable in a battery electric vehicle context, some studies have focused on destination characteristics; due to their currently long recharging time, BEVs are assumed to charge more frequently at a destination rather than in the middle of a trip.
These studies found a relationship between points of interest, such as museums or restaurants, and parking demand (Brooker & Qin, 2015; Wagner, Brandt, & Neumann, 2014, 2015). It may therefore be that placing charging stations close to specific points of interest ensures the necessary demand to make privately-funded charging stations profitable.

Although the relationship between points of interest and parking demand seems promising, there is still a lot to gain in assessing its practical value. Most studies that report evidence on this relationship have used proxies for BEV charging such as ICE vehicle parking and refueling data, or ICE vehicle-focused survey data (Brooker & Qin, 2015; Cai et al., 2014; Chen, Hall, & Kockelman, 2013; Shahraki, Cai, Turkay, & Xu, 2015; Wagner et al., 2015). Further, in a BEV context no successful attempt has been made to examine the actual predictive power of this relationship. Should we want private parties to be convinced to invest in charging infrastructure before demand is known, it is vital to study whether or not we can predict cost-effective locations based on this relationship.

1.2 Research question

Mass adoption of electric vehicles will only advance when private investors can gain a return on investment from placing electric vehicle recharging stations, which requires providing (potential) users with easily accessible public charging infrastructure. In order to contribute to solving this chicken-and-egg dilemma, this study aims to assess the practical value of the promising relationship between points of interest and demand, by answering the following research question:

RQ. What is the value of using information on points of interest in predicting cost-effective electric vehicle charging infrastructure locations?

To effectively answer this research question, this study focuses on the following two sub-research questions:

SRQ1. How is predicted electric vehicle charging infrastructure demand influenced when using information on points of interest supplementary to information on neighborhood characteristics, seasonality and charging infrastructure presence?

SRQ2. How can a model supplemented with information on points of interest improve decision-making in predicting cost-effective electric vehicle charging locations?

1.3 Relevance

The relevance of this study is two-fold, focusing both on academic relevance and on managerial relevance.

We aim to advance current academic knowledge through two main contributions. Firstly, we use a unique dataset of parking behavior of battery electric vehicles that was recorded over the course of fourteen months. To our knowledge, no study thus far has used such extensive data of parking behavior within a battery electric vehicle parking demand context. Secondly, we aim to contribute to the understanding of the relationship between points of interest and potential charging behavior, by assessing the practical relevance of this relationship through predictive modelling.

From a managerial perspective, this study will assist policy makers and independent charging infrastructure operators in determining cost-efficient charging infrastructure locations before actual demand is known. We thereby hope to contribute to improved decision-making in this field, eventually making the electric vehicle market more attractive for both car users and infrastructure operators. Lastly, this predictive approach may also be used in a non-electric vehicle context; as predicted load is directly linked to parking time, parking facility operators or point of interest operators such as shopping malls may also find our methodology useful in predicting parking demand at specific locations.

2. Related work and conceptual model

In light of the research question of this study, we examine related work and develop a conceptual model accordingly.

2.1 The importance of infrastructure availability

2.1.1 Electric vehicle adoption

The benefits of electric vehicle adoption are profound and necessary, as they revolve around the potential to lower emissions and hence to combat climate change. Lowering emissions can specifically have a notable influence on cities and their inhabitants; partially due to transportation emissions, urban areas suffer from toxic air pollution which severely affects inhabitants’ health (Wagner et al., 2014). Studies on the reduction of emissions from motorized vehicles through the introduction of alternative-fuel vehicles such as electric vehicles forecast high-impact results such as a reduction in premature deaths, a reduction in years of life lost, and a reduction in disability-adjusted life-years (Woodcock et al., 2009).

However, despite the clear benefits, EV adoption is still low. In 2014, electric vehicles comprised only 0.08% of the global number of passenger cars (International Energy Agency, 2015). The percentage of electric vehicles is higher when only taking developed countries into account, yet is still very modest. Electric vehicle sales in the European Union made up 0.7% of total vehicle registrations in 2014, with Norway (13.8%) and the Netherlands (4.0%) as frontrunners due to fiscal incentives promoted by the governments of these countries (International Council on Clean Transportation, 2015). In the same year, EVs in the United States had a 1.5% share of the total vehicle market (International Energy Agency, 2015).

To understand why electric vehicles lack mass adoption, one should be aware of the factors that influence EV purchase intentions amongst consumers, and that may therefore act as drivers for or barriers against acceptance.
It is for this reason that Rezvani, Jansson, & Bodin (2015) published a meta-analysis concerning EV purchase intentions, covering studies that were published in peer-reviewed journals between 2007 and 2014. In this analysis, it becomes clear that multiple studies find evidence for the perceived importance of charging infrastructure in EV adoption. Charging infrastructure is specified as a major concern to EV adoption in Egbue & Long (2012), yet these concerns are not made explicit due to the generic formulation of the study’s survey question regarding this matter (Q: “What do you consider your biggest concern about EVs?” A: “Charging infrastructure” (Egbue & Long, 2012, p. 727)). Fortunately, this concern is made clearer in other studies, where results show that drivers express anxiety regarding availability and safety of public charging points (Graham-Rowe et al., 2012), or that EV demand is significantly influenced by availability of charging locations both at work and in the public space (Jensen, Cherchi, & Mabit, 2013). This appears to be especially true amongst customers who describe themselves as planners who prefer structure; this consumer group would be more likely to purchase a BEV if charging points were available at supermarkets and in town centers. Perceived availability appears to have a strong association with visibility, as EV interest levels are significantly higher amongst potential EV users who have seen recharging stations in their neighborhood (Carley, Krause, Lane, & Graham, 2013). Lastly, charging infrastructure concerns do not seem to be related to the act of plugging in the EV or remembering to do so (Skippon & Garwood, 2011). In conclusion, it is evident that mass adoption of electric vehicles is related to the absence or presence of charging infrastructure.

2.1.2 Range anxiety

The above studies underline the importance of EV charging infrastructure availability. This concept stems from potential customers' concerns regarding the currently inherent nature of electric vehicles, namely that EVs have a shorter travelling distance than internal combustion engine vehicles and hence need more frequent recharging, and that this recharging takes significantly longer than refueling ICE vehicles (Ip et al., 2010; Neubauer & Wood, 2014). Although technical advancements will eventually resolve these issues, currently these EV characteristics cause a “fear of being stranded in a BEV because it has insufficient range to reach its destination” (Egbue & Long, 2012, p. 273) – a concept dubbed range anxiety.

Remarkably, (potential) EV users do not appear to experience range anxiety as such, yet a priori resolve the potential of range anxiety by turning to other means of transportation, such as ICE vehicles (Franke, Neumann, Bühler, Cocron, & Krems, 2012). Therefore, range anxiety indeed is a barrier to mass adoption of electric vehicles (Egbue & Long, 2012; Rauh, Franke, & Krems, 2015). Additionally, this tendency to avoid potential range stress gives rise to a paradoxical effect; although the optimal EV range is the smallest sufficient range in light of EV ecological footprint and cost-efficiency (Franke & Krems, 2013c), EV users prefer a much larger range to feel comfortable (Egbue & Long, 2012) – a safety buffer which is confirmed by the finding that drivers usually have a large surplus of battery left when recharging (Franke & Krems, 2013b; Speidel & Braunl, 2014). Multiple studies have shown that for a large share of drivers, their average daily range needs easily fall into the common 100-mile EV range (Franke & Krems, 2013a).
However, their comfortable range preferences are substantially higher than these average daily needs; in Egbue & Long (2012), 91% of respondents drove less than 50 miles per day, yet only 32% of drivers were interested in an EV with a maximum range of 100 miles, and 45% of respondents indicated that they would only be interested in BEVs with a range greater than 200 miles. This is potentially due to the finding that although range preferences are higher than average needs, they are not substantially higher than maximum daily range needs (Franke & Krems, 2013c). Indeed, in a sample of US drivers, only 9% of drivers never exceed the 100-mile range in a year. Hence for 91% of drivers, a 100-mile range EV that is charged once per day would fail to meet the needs of the driver at least once a year (Pearre, Kempton, Guensler, & Elango, 2011).

Range anxiety therefore is a significant barrier to EV adoption that needs to be diminished in order to ensure EV success. This anxiety is related to multiple factors, which can therefore be deployed to reduce this stress: experience, where more seasoned EV drivers experience less range stress (Franke & Krems, 2013c; Rauh et al., 2015); personality traits of the driver such as ambiguity tolerance (Franke & Krems, 2013a; Franke et al., 2012); and availability of public charging infrastructure (Neubauer & Wood, 2014). The first two factors seem to be challenging to deploy in order to reduce range anxiety, as studies suggest that EV drivers’ learning curves vary due to multiple variables such as domain-specific knowledge (Franke & Krems, 2013a) and daily range practice (Franke et al., 2012), and may require psychological intervention.
Availability of public charging, however, is a less intrusive manner of decreasing range anxiety with promising results; ubiquitous public charging possibilities can strongly increase the miles travelled by high-mileage drivers, whilst ensuring that low-mileage drivers utilize their BEV range to almost 100% (Neubauer & Wood, 2014). Accessibility of public EV charging infrastructure is therefore indeed critical to the success of electric vehicles (Cai et al., 2014; Egbue & Long, 2012).

2.2 Predictors of charging demand

2.2.1 Points of interest

Yet what makes public electric vehicle charging infrastructure optimally accessible? To answer this question, numerous studies have focused on transportation infrastructure network design using highly theoretical approaches, such as Funke, Nusser, & Storandt (2014), who developed a model to place as few charging stations as possible on any quickest path, whilst still ensuring sufficient energy supply to BEV drivers who started with a full battery. Although theoretical approaches as in this study may significantly contribute to the methodological discussion of infrastructure network design, they may be impractical or unfriendly to implement, or may neglect issues in BEV adoption in urbanized areas (Ip et al., 2010; Wagner et al., 2014) – the latter being applicable to the study by Funke et al. (2014). Whilst the authors do address the problem of infrastructure accessibility by ensuring that there are enough charging stations to guarantee sufficient energy supply, they bypass the possibility of underutilization or capacity constraints of these charging stations. It seems plausible that especially in urban areas, multiple charging stations can still operate efficiently or are even necessary to cope with BEV demand, whilst in rural areas a small detour may make more sense from a station utilization perspective.
Therefore, the search for optimal infrastructure accessibility should go hand-in-hand with the challenge of charging utilization prediction.

Expected BEV infrastructure utilization, and hence charging demand, is explored by Cai et al. (2014), who aim to estimate charging demand based on parking patterns from a Beijing taxi fleet and who find that “collective vehicle hotspots are good indicators of charging demand” (Cai et al., 2014, p. 39). This thought is confirmed by Shahraki, Cai, Turkay, & Xu (2015), who use the same Beijing taxi dataset to select current gas station locations for electric vehicle charging, and who find that charging demand for this fleet concentrates in the inner-city. In predicting charging demand, these authors base their findings on ICE taxi parking demand, thereby assuming that BEV parking behavior follows the same patterns as ICE vehicle parking behavior. This might be an overly simplistic assumption, however, as traditional travel patterns may not represent demand for public charging infrastructure (Cai et al., 2014). Indeed, “there remains significant uncertainty regarding the estimation of demand for plug-in electric vehicle charging, due to a lack of available information on PEV driver behavior” (Sathaye & Kelley, 2013, p. 16). As an example, as BEV charging usually takes longer than refueling ICE vehicles, it seems fair to assume that charging of BEVs will more frequently happen at the end of a trip, as opposed to in the middle of a trip, therefore imposing different requirements on recharging stations. However, although based on ICE vehicle data, the findings of these two studies do point towards a connection between points of interest (POIs) and charging demand; it does not take much to reason that locations close to the city center attract more traffic (or, are vehicle “hotspots”), as this is where most points of interest are located.

It is perhaps for these reasons (the likelihood of higher BEV infrastructure utilization at the end of a trip, and the finding that vehicle parking density is more intense at specific locations) that some scholars shifted their focus towards destination characteristics as opposed to trip or refueling patterns. Chen, Hall, & Kockelman (2013) argue that BEV parking duration is influenced by activity type, and hence by the points of interest at the destination, with activities related to work, college, religion, social and recreation showing longer parking durations than, for example, eating out or shopping. This influence of destination type on parking duration is also recognized by Xi, Sioshansi, & Marano (2013), who in developing their charging infrastructure location model decide to model parking locations around workplaces, universities and shopping locations, as “trips to such locations typically entail extended stays” (Xi et al., 2013, p. 64). Although confirming the influence of destination characteristics on parking duration, the data used by Chen et al. (2013) is again focused on ICE vehicles, which may not be directly applicable to a BEV context. Further, the data is collected from a travel survey, which is usually recorded for a limited duration (in this case, two days), and which may therefore not be representative of actual parking demand.

Brooker & Qin (2015) also explore the relationship between points of interest at the destination and charging demand by use of survey results, yet combine this with data on actual charging behavior. The authors determine the likelihood of recharging at nine point-of-interest groups: home, work, school, medical, shopping, social, family, transport, and meals. Of the public destinations, they find that shopping, social and meal destinations have a high probability of recharging, whereas recharging is less likely at school, medical or family-related destinations.
Also using data on charging station usage, Wagner, Brandt, & Neumann (2014) analyze utilization of EV charging points in Amsterdam. Through regression analyses, the authors find significant relationships between EV infrastructure utilization and different types of POIs within approximately a half-hour walk from the parking location, such as banks, hospitals or museums. Despite a relatively low predictive power of the model, this confirms that proximity to points of interest has an impact on urban charging behavior. However, using utilization of existing charging stations as a proxy for actual parking behavior, and hence charging demand, may not uncover the potential of other parking locations which may be more frequently visited. Using actual parking data within an ICE car-sharing context, Wagner, Brandt, & Neumann (2015) find significant relationships in regression analyses between the number of ended car-sharing rentals at a destination, and the points of interest within 1 kilometer of the destination. Although this again indicates that points of interest have an influence on parking activities in car-sharing, the low explanatory power of the regression model shows that there are many more aspects influencing car-sharing parking. However, these findings may be less interesting in an electric vehicle charging station context, as the number of ended car-sharing rentals may not be predictive of the potential utilization level of charging stations; one can imagine that at locations where many users end their rental, the number of started rentals is also high, so that BEVs do not have time to recharge at these locations. It is therefore “more appropriate to focus on EV arrival and departure times from parking lots since this is when (…) charging can be reasonably done” (Xi et al., 2013, p. 60).
An overview of the above-mentioned studies that find a potential influence of points of interest on charging demand can be found as Table 1.

Table 1. Overview studies finding influence of points of interest

Author (year) | Observed variable | Focus | Method | Findings related to POIs
Brooker & Qin (2015) | Charging likelihood | ICE | Travel survey and charging data | Charging needs differ per destination type
Cai et al. (2014) | Parking duration | ICE | Trajectory data from taxi fleet | Collective vehicle hotspots good indication of charging demand
Chen et al. (2013) | Parking duration | ICE | Travel survey | Parking duration influenced by activity type
Shahraki et al. (2015) | Parking duration | ICE | Trajectory data from taxi fleet | Charging demand concentrates in inner-city
Wagner et al. (2014) | Charging station utilization | EV | Charging data | Relationship infrastructure utilization and points of interest
Wagner et al. (2015) | Ended rentals | ICE | Rental data car-sharing | Relationship ended rentals and points of interest

2.2.2 Other predictors of charging demand

Apart from points of interest, there may be other factors influencing charging demand, including neighborhood characteristics or seasonality. Demographic characteristics of neighborhoods may influence the charging demand of a parking location. For example, Wagner et al. (2015) find that neighborhood characteristics such as high population density, low income, or a high share of foreigners increase the number of ended trips for shared vehicles. Chen et al. (2013) find that high population density negatively influences total parking duration, whilst total parking duration is positively influenced by a high employment density and student density.
Further, multiple studies show that seasonality influences driving behavior; daytime rain in winter and in spring significantly reduces traffic volume (Keay & Simmonds, 2005), as do cold and snowfall (Agarwal, Maze, & Souleyrette, 2005; Datla & Sharma, 2008). Further, holidays or other events influence travel volume (Festin, 1996). Lastly, it needs no further elaboration that in a city where charging stations are already present, charging station presence has an influence on charging demand at specific locations.

2.3 The business case of charging infrastructure

Whilst accessible charging stations appear critical for BEV adoption to ensure sufficient charging demand, the question of who finances the roll-out of an effective charging station infrastructure remains. Currently, this question has mainly been answered by “federal, state, and local grant money, with some assistance from automakers” (McCormack et al., 2013, p. 5). Examples include ECOtality Inc., which received $114.8 million government funding for a deployment and evaluation project of BEVs and charging infrastructure, dubbed The EV Project. Further private investments for this project raised its project value to about $230 million (Car Charging Group, 2016). Similarly, charging network start-up ChargePoint recently received funding of $50 million, raising its total received funding to almost $150 million (CrunchBase, 2016). However, to further deploy charging infrastructure, and hence to stimulate BEV adoption, expansion of private sector investment in public charging stations is necessary (Nigro & Frades, 2015). Therefore, the EV market can only be sustainable in the long term when, amongst others, charging service operators can recover their costs and make profit (Madina et al., 2016).
Creating a profitable business case for public EV charging infrastructure, however, is challenging due to high investment costs, uncertain demand for available charging, and competition from home and work charging (Madina et al., 2016; Nigro & Frades, 2015; Sadeghi-Barzani, Rajabi-Ghahnavieh, & Kazemi-Karegar, 2014). Indeed, public charging is still used less than other charging location types, with results varying from 5% to 33% of charging activities performed at public charging locations by users who are also able to charge their EV at home or work (Madina et al., 2016; McCormack et al., 2013; Rauh et al., 2015; Speidel & Braunl, 2014). It may be due to these reasons that studies on cost effectiveness of public charging stations show mixed results. Focusing on fast chargers, Nigro & Frades (2015) find that under current market conditions in Washington, US, public fast charging stations cannot become financially viable without public interventions, assuming a desired investment payback of five years. This is confirmed by Schroeder & Traber (2012), who argue that based on 2011 EV penetration levels, fast charging in Germany is unlikely to become profitable. McCormack et al. (2013), on the other hand, are more positive, arguing that payback over the course of three years is possible at certain utilization levels – however, to reach these utilization levels EV sales should continue to grow. Madina et al. (2016) conclude similarly that public charging infrastructure can become profitable when there is enough demand.

2.4 Predictive modelling

In their paper, Shmueli & Koppius (2011) make a case for including more predictive models and testing in Information Systems research, a field in which most studies focus on explanatory models.
Whilst both types of modelling are complementary to each other, “predictive models and testing play an important role in assessing practical relevance of existing theories, and quantifying the level of predictability of phenomena” (Shmueli & Koppius, 2011, p. 559) – therefore, successful prediction will add credibility to the theories that led to it (Kaplan, 2009). Where explanatory modeling focuses on gaining understanding to why or how phenomena of interest happen, predictive modeling aims to predict what will happen should certain preconditions hold (Gregor, 2006). In predictive modeling, therefore, ‘truthfulness’ of the model is of secondary importance – a true model may even be less predictively accurate than a less-true model (Sober, 2002). Predictive accuracy therefore also transcends issues such as correct model specification, model transparency, and multicollinearity; factors which are predominant in explanatory research (Shmueli & Koppius, 2011). Consequently, methods that do not assume a parametric form of the data are frequently used in predictive modelling, such as machine learning algorithms. Explanatory power does not necessarily imply predictive power, as statistical relationship tests common in explanatory research provide information on generalizability of the relationship, yet do not inform us about how well the model is able to predict new instances; however, this is often confused, leading to studies that state predictive goals yet use inappropriate predictive modeling or testing, or studies that have explanatory goals yet provide predictive claims deduced from explanatory models (Shmueli & Koppius, 2011). Of the studies describing a relationship between points of interest and potential charging demand as presented in Table 1, Wagner et al. (2014) and Wagner et al. (2015) are the only two studies that, after explanatory analyses, aim to extend their findings to predictions. Wagner et al. 
(2014) aim to assess predictive power of the relationship between presence of points of interest and charging station utilization by using their model to predict optimal locations for charging infrastructure in Amsterdam. However, the authors cannot accurately assess the value of their predictions as they merely have data on actual charging points, and not on potentially optimal locations without a charging station. Therefore, predictive accuracy could only be determined if an optimal location would be predicted at the exact same location as an already-existing charging station. Wagner et al. (2015) aim to predict ended ICE car sharing rentals after explanatory analyses by predicting data points that were not included in the training data - as is good practice in predictive research. On a critical note, however, the test data covers one specific area of the case city, and is therefore not a random sample of the entire dataset – whilst a random sample would be preferred to ensure that the test data is similar to the rest of the data (Shmueli & Koppius, 2011).

2.5 Summary of findings and conceptual model

Based on the above, we summarize our findings and construct our conceptual model as Figure 1. Based on the literature, we found that presence of specific points of interest has a relationship with charging demand, which in turn influences whether or not charging station infrastructure can become cost-effective. Cost-effectiveness of infrastructure through charging demand will increase the presence, and hence availability, of charging infrastructure, which decreases range anxiety. Eventually, a decrease in range anxiety increases EV acceptance amongst customers, and hence increases the likelihood of mass adoption of BEVs.
Our study focuses on the first part of this sequence, by determining the value of using information on points of interest in predicting cost-effective vehicle charging infrastructure locations. Given the promising relationship between points of interest and charging demand, we construct the following predictions in light of our research questions, which will be answered using our conceptual model:

P1. Electric vehicle charging infrastructure demand is more accurately predicted when using information on points of interest supplementary to information on neighborhood characteristics, seasonality and charging infrastructure presence.

P2. A model supplemented with information on points of interest can improve decision-making in predicting cost-effective electric vehicle charging locations.

2.6 The rest of this paper

The rest of this paper is constructed as follows. We first describe the data used in assessing the abovementioned predictions, after which we give an elaborate overview of how we prepared the data for analyses. We present different ways of building the predictive model, and choose the model that shows the most promising results in light of our response variable. Consequently, we assess the predictive value of a model without point of interest information, and the predictive value of a model supplemented with point of interest information, to assess prediction 1 and prediction 2. We will discuss our findings, and conclude with limitations of our study and suggestions for future research.

Figure 1. Conceptual model

3. Methodology

To effectively answer the stated research question, we describe the study’s context, the study design and the used data. We explain how we prepared the data for analyses, and how we selected our model.
3.1 Context

3.1.1 Expected charging demand in San Diego

In this study, we aim to predict charging demand in the case city of San Diego, California, US. California is recognized as a frontrunner in building public charging infrastructure with 20% of nation-wide public EV chargers located in the state, whilst San Diego is currently perceived as a projected high EV adoption city (McCormack et al., 2013). This makes San Diego an excellent case to study potential demand for EV charging stations. We base demand prediction on two BEV on-the-road scenarios based on forecasts by Alexander & Gartner (2012), who in turn use potential environmental regulations for their estimations. In the first scenario, regulation changes little to nothing and aims for CO2 emission reduction to 95g CO2/km in 2050. In this scenario, 2% of US vehicle sales are BEV sales – in this study, we make the assumption that this implies that 2% of all vehicles on-the-road in San Diego are BEVs. In the second scenario, new climate goals to reduce CO2 emission to 40g/km in 2050 stimulate BEV sales to 6% of all sold vehicles in 2020. This number of electric vehicles on the road asks for an increase in charging stations – it may indeed be for this reason that San Diego policy makers plan to install 3,500 more public EV charging stations in the city (San Diego Gas & Electric, 2016). To translate these sales percentages to the case city of San Diego, we assume 1.7 cars per household (Governing, 2016) over a total of 493,446 households (Census Reporter, 2016). We therefore estimate 16,777 BEVs on-the-road in San Diego in the 2% scenario, and 50,331 BEVs on-the-road in the 6% scenario. However, since we aim to predict charging demand of privately-owned BEVs, we presume that the main portion of charging is done at home. In line with McCormack et al. (2013), we therefore assume that 30% of charging activities is performed at public charging stations.
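The fleet estimates above follow from a single multiplication, which the short sketch below reproduces as a sanity check. The function and constant names are our own labels for this illustration and do not come from the thesis; only the household count, the cars-per-household figure and the two BEV shares are taken from the text.

```python
# Inputs quoted in the text; the names are illustrative labels only.
HOUSEHOLDS = 493_446        # San Diego households (Census Reporter, 2016)
CARS_PER_HOUSEHOLD = 1.7    # cars per household (Governing, 2016)


def bevs_on_road(bev_share: float) -> int:
    """Estimate BEVs on the road as households x cars per household x BEV share."""
    return round(HOUSEHOLDS * CARS_PER_HOUSEHOLD * bev_share)


for share in (0.02, 0.06):
    print(f"{share:.0%} scenario: {bevs_on_road(share):,} BEVs on the road")
```

Both results match the 16,777 and 50,331 vehicles reported above.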
Please note that sub research question 1, the influence of predicted parking location demand by using data on points of interest, will only be assessed using the 2% scenario, due to limitations imposed by computational efficiency. However, as the 2% scenario and the 6% scenario are linearly related, we expect that predicted demand in these scenarios is influenced in a similar manner. In sub research question 2, where we examine decision-making using the supplemented model, we will make predictions based on both the 2% and the 6% scenario.

3.1.2 Charging station types

As Xi et al. (2013) note, one cannot assume that when a car is parked, the battery is loaded until full – as vehicles may only be partially charged depending on parking time, charging infrastructure studies should take this into account. This study therefore incorporates parking time, and hence potential refueling time, in making predictions on charging demand in kWh. However, the type of charging infrastructure makes a difference in estimating how many kWh can be charged during the potential refueling time, and hence makes a difference in predicting cost-effectiveness of potential charging locations. To account for this difference, we study potential demand for both ‘slow’ chargers and ‘fast’ chargers. Accounting for fast chargers also overcomes the problem of popular locations with high turnover; when a parking location is visited frequently yet for a short time per parking instance, fast charging may be more beneficial from an economic standpoint. There are currently two fully-developed public wired charging levels; alternating-current (AC) level 2, and direct-current (DC) fast charging (SAE J1772 US charging infrastructure standard, U.S. Department of Energy, 2016).
Although exact specifications differ amongst reports and literature, we assume that a basic AC level 2 ‘slow’ charging station operates at an output power of 3.6 kW, with 16 amperes and 240 volts, and is therefore able to fully recharge an electric vehicle in 4 to 8 hours, depending on the EV battery capacity. Further, we assume a DC ‘fast’ charging station to be running at 50 kW, with 125 amperes and 500 volts, which enables a user to fully reload an EV in 20 to 30 minutes (Alexander & Gartner, 2012).

3.2 Study design

Serving the ultimate goal of charging demand prediction, the deployed study design is a cross-sectional observational study, where observations are made without any intervention (Song & Chung, 2011). Where in explanatory studies, experiments are usually preferred over other study designs to establish causality, in predictive studies observational data may be favorable as “they better represent the realistic context of prediction in terms of the uncontrolled factors, the noise, the measured response and other factors” (Shmueli & Koppius, 2011, p. 562). In order to realistically determine predictability of charging demand using point of interest data, a cross-sectional observational research design therefore is most suitable.

3.3 Data

3.3.1 Data on parking behavior

Car2Go

We collected data on battery electric vehicle parking behavior in San Diego from Car2Go, a car sharing service provider that offered at the time of measuring, from April 2014 to June 2015, 300 Smart Fortwo BEVs with 16.5 kWh battery capacity to its users. Car2Go users can rent the cars without limit after a one-time sign-up fee, and pay on a usage basis per minute, hour, or day, as well as an extra fee after 150 miles per trip. In San Diego, users are allowed to park for free on-street within the Car2Go home area, and are able to charge the car using one of the 100 public charging stations in the city.
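As a rough plausibility check, the 16.5 kWh battery of the Car2Go vehicles can be combined with the charger output powers assumed in Section 3.1.2. The sketch below is an idealized calculation under our own simplifying assumptions (constant charging power, no conversion losses, no tapering near full charge); the function name is hypothetical and not part of the study.

```python
BATTERY_KWH = 16.5  # Smart Fortwo battery capacity quoted in the text


def hours_to_full(power_kw: float, start_level_pct: float = 0.0) -> float:
    """Idealized hours needed to reach 100% from start_level_pct at constant power."""
    return (100 - start_level_pct) / 100 * BATTERY_KWH / power_kw


slow_hours = hours_to_full(3.6)         # AC level 2 'slow' charger, 3.6 kW
fast_minutes = hours_to_full(50) * 60   # DC 'fast' charger, 50 kW

print(f"slow charger: {slow_hours:.1f} hours from empty")
print(f"fast charger: {fast_minutes:.1f} minutes from empty")
```

Under these assumptions an empty battery needs about 4.6 hours on the slow charger and about 20 minutes on the fast charger, which sits at the lower end of the 4 to 8 hour and 20 to 30 minute ranges cited above.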
Car2Go San Diego parking data We retrieved parking data of Car2Go vehicles in San Diego via a private application programming interface that was provided by the car sharing provider. We wrote a web scraper to automatically retrieve a list of EVs available to rent from the Car2Go website, and stored this time-stamped information in a database every 15 minutes over the course of 63 weeks. This data includes a unique car ID; timestamp (in intervals of 15 minutes); longitude, latitude and corresponding street, zip code, and city; rating of interior and exterior; fuel level; whether or not it is charging; and type of engine. An extract of the obtained raw data can be found as Table 2. When a car is rented, it is not recorded in the data – only ended rentals hence parking instances are documented. In line with San Diego city policy, we assume that when a car is parked at a charging station (which can be inferred from GPS location), it is charged. From the data presented in Table 2, we can infer that a car was parked yet not charged at 22.45 hrs and at 23.00 hrs. In the following 15 minutes, the car was rented, where after it was parked again between 23.45 and 00.00 hrs. A limitation of the data includes the interval time of 15 minutes, which can lead to a situation where a car is parked and immediately picked up within this 15-minute time period without this showing in the data. However, we believe that this likelihood is small enough to have a negligible impact on measured parking behavior. Table 2. 
Extract raw data parking behavior

CarID | 6RFN700 | 6RFN700 | 6RFN700
Timestamp | 2014-10-24 22:45:01 | 2014-10-24 23:00:01 | 2014-10-25 00:00:01
Longitude | -117.15684 | -117.15684 | -117.13811
Latitude | 32.71047 | 32.71047 | 32.70683
Street | Island Ave 856 | Island Ave 856 | 26th St 140
Zip | 92101 | 92101 | 92101
City | San Diego | San Diego | San Diego
Interior | Good | Good | Good
Exterior | Unacceptable | Unacceptable | Good
Fuel | 54 | 54 | 52
Charging | No | No | No
Engine type | Electric | Electric | Electric

Car2Go San Diego operating area and parking locations

Although Car2Go users in San Diego are free to drive anywhere around the city, they are allowed to end their rental only within the Car2Go San Diego operating area. The Car2Go San Diego operating area consists of multiple closed polygons, where each corner point of each polygon is defined by a longitude coordinate λ and a latitude coordinate ϕ. From the raw data we can extract parking location addresses within the San Diego operating area, consisting of a street name and house number. Due to occasional slight deviance in GPS accuracy, in the raw data multiple longitude and latitude coordinates λ and ϕ were assigned to the same address. To ensure that parking locations are not too spread yet not too granular, in order to still be able to capture variations between parking locations, we modified parking locations so that each would cover an area of at most 100 x 100 meter, corresponding to about λ = 0.0015 and ϕ = 0.00095. This cell size is based on Wagner et al. (2015), who tested several values of granularity and found this threshold to best balance computational complexity and spatial detail. We extracted 90,388 unique parking locations as visualized in Figure 2.
As one can deduce from the figure, contrary to the San Diego operating area rules, a few data points lie outside the operating area – however, as these data points are within 1000 meter from recorded points of interest in the operating area, these data points were not removed.

Figure 2. San Diego parking locations - where red dots indicate parking locations

Data validity

As this observational data was gathered without Car2Go users directly being aware of it, the data is expected to provide a true image of BEV parking behavior in San Diego in the context of shared cars. Although we recognize that there may be differences in parking behavior between BEV shared cars and BEV privately-owned cars, we believe that this difference is marginal within-city, especially due to the ‘free-float’ nature of Car2Go, where users are able to pick up and drop off a vehicle anywhere within a designated range.

3.3.2 Data on points of interest

OpenStreetMap points of interest

Data on points of interest located in the designated Car2Go San Diego operating area was obtained via OpenStreetMap (OSM), a community-driven open data project that provides free geographic information. To extract POI coordinates, we used the osmar open-source R package. We extracted the geographic location of a total of 1,901 POIs across 133 POI types, of which an overview can be found as Appendix 1. An extract of the raw data can be found as Table 3. Hereafter, we merged the POIs into 24 mutually exclusive POI categories closely corresponding to the OSM classification of points of interest (OpenStreetMap, 2016b). An overview of POIs per category can be found as Table 4, whilst point of interest location visualization can be found as Figure 3.

Table 3.
Extract raw data points of interest

POI type | POI name | Longitude | Latitude
Restaurant | Thai Time Bistro | -117.2487 | 32.74396
Library | Point Loma Branch Public Library | -117.2294 | 32.74003
Church | Holy Trinity Episcopal Church | -117.2448 | 32.74783
Supermarket | 7-Eleven | -117.2470 | 32.75198
Supermarket | 7-Eleven | -117.2352 | 32.74380
Station | Fenton Parkway | -117.1271 | 32.77833

Figure 3. San Diego points of interest - where red dots indicate points of interest

Data validity

OpenStreetMap greatly values quality assurance and has several systems in place to detect errors, inaccuracy or sparseness of data (OpenStreetMap, 2016c). However, despite these measures, the service is still based on volunteered geographic information and hence may be prone to limitations in, for example, accuracy and completeness (Haklay, 2010). Although on US road-mapping level, OSM performs remarkably well (with over 7 million roads mapped compared to 4 million roads mapped in the established CIA World Factbook (Maron, 2015)), it may be that points of interest are inaccurately mapped due to small GPS errors, or not mapped at all (OpenStreetMap, 2016a). Whilst the presence or absence of points of interest is very challenging to measure, with companies getting in and out of business every day, positional accuracy is studied in Haklay (2010), where the authors find a 5.8 meter average difference between mapped locations in London, UK in OSM and Ordnance Survey. A potential six-meter difference in our study context seems negligible, as an extra six meters walking from a parking location to a point of interest will reasonably not determine parking location selection.

Table 4.
Points of interest categories

POI category | Count
Accommodation | 69
Airport | 3
Courthouse | 3
Education | 100
Entertainment | 23
F&B | 639
F&B store | 106
Financial | 65
Fire station | 12
Graveyard | 3
Healthcare | 37
Historic | 7
Leisure | 57
Office | 16
Place of worship | 274
Police | 1
Post | 32
Prison | 2
Public transport | 48
Shopping | 232
Sport | 10
Toilets | 29
Tourism | 51
Transportation | 82

3.3.3 Data on neighborhood characteristics

San Diego neighborhoods

Data on neighborhood characteristics is extracted from the San Diego Planning Department (2016). This source provides names and approximate locations of neighborhoods in San Diego – according to this classification, the parking locations in our study are found in 20 different neighborhoods, as Table 5.

Table 5. San Diego neighborhoods

Neighborhood | Count
Airport | 1819
Balboa Park | 1654
City Heights | 432
College Area | 483
Coronado | 1
Downtown | 15830
Encanto | 9
Greater Golden Hill | 3927
Greater North Park | 20097
Hillcrest | 5733
Kensington-Talmadge | 1360
Linda Vista | 2
Mid-City | 6534
Mission Beach | 1056
Mission Valley | 2591
Navajo | 2
Old Town | 6943
Pacific Beach | 9507
Peninsula | 12025
Southeastern SD | 383

Data validity

The San Diego Planning Department does not provide street names or coordinates on neighborhood boundaries; the division of neighborhoods is merely determined by visual inspection of the provided map by this source. It may therefore be that some parking locations are not accurately allocated to a neighborhood.

3.4 Data preparation

3.4.1 Response variable

In order to most effectively answer our research question, we aim to predict the potential charging demand in kWh that can yearly be loaded per parking location in San Diego, given the scenarios as presented in Section 3.1. This requires modification of the raw parking data as follows. A table of notation can be found as Table 6.
A unique parking instance $i$ is defined as follows:

\[ i = (c, l, ts_1, \dots, ts_n, f) \tag{1} \]

Where $c$ is a unique car id, $l$ is a parking location consisting of a street and house number, $ts$ are consecutive timestamps of 15 minutes, and $f$ is the initial fuel level of the car upon parking. Once either the parking location of the car changes, or if two timestamps do not follow each other in 15 consecutive minutes, the car is expected to be rented and a new parking instance is recorded for that car the moment a new timestamp appears in the data. Over the course of the recorded data, a total of 444,463 unique parking instances occurred. To determine the potential charging demand in kWh per parking instance, we firstly determine the number of minutes $m$ a car is parked per parking instance $i$:

\[ m_i = \mathit{freq}(ts_{i1}, \dots, ts_{in}) \cdot 15 + 28 \tag{2} \]

Where $\mathit{freq}(ts_{i1}, \dots, ts_{in})$ indicates the number of consecutive timestamps of 15 minutes for the specific parking instance $i$. Presuming an optimistic scenario, we assume that when a parking instance occurs, the car was parked 14 minutes before the first-recorded timestamp of $i$, and that the car was rented 14 minutes after the last-recorded timestamp of $i$. Assuming that whenever a car is parked, it is being charged, we are presented with two mutually exclusive possibilities; the car is either rented before it reaches full battery, or the car is parked longer than it requires to reach full battery. To accurately determine the potential charging demand per parking instance, the smallest kWh of these two options presents the potential kWh loaded $\mathit{pkWh}$ per parking instance $i$ as follows:

\[ \mathit{pkWh}_i = \begin{cases} \mathit{pkWh\_t}_i & \text{if } \mathit{pkWh\_t}_i < \mathit{pkWh\_f}_i \\ \mathit{pkWh\_f}_i & \text{if } \mathit{pkWh\_t}_i > \mathit{pkWh\_f}_i \end{cases} \tag{3} \]

Where:

\[ \mathit{pkWh\_t}_i = m_i \cdot \frac{p}{60} \tag{4} \]

\[ \mathit{pkWh\_f}_i = (100 - f_i) \cdot \frac{16.5}{100} \tag{5} \]

Where $\mathit{pkWh\_t}$ signifies the potential kWh loaded given the minutes $m$ a car is parked during a parking instance and is loaded by a charging station with $p$ kW output power from the charging level. Similarly, $\mathit{pkWh\_f}$ marks the potential kWh loaded given the initial fuel level $f$ of the car during a parking instance – assuming a 16.5 kWh vehicle battery. To predict the potential kWh charged per parking location $l$ whilst being able to control for seasonal variance, we aggregate $\mathit{pkWh}$ of all parking locations over unique location-week combinations $lw$. Given 90,388 unique parking locations over the course of 63 weeks, this results in a total of 5,694,444 $lw$, where the 444,463 parking instances occurred in 290,241 location-week combinations. Given that the recorded data measured parking behavior of 300 BEVs, we calculate potential charging demand in kWh over the course of one year $\mathit{pkWh\_y}$ per parking location $l$ as follows:

\[ \mathit{pkWh\_y}_l = \frac{\Sigma(\mathit{pkWh}_i, lw) \cdot \frac{52}{63}}{300} \cdot 0.3\,v \tag{6} \]

Where $v$ is the number of expected on-the-road electric vehicles as described in Section 3.1, that charge 30% of their charging activities at a public station.

3.4.2 Point of interest predictors

To determine the optimal predictive model, we apply four different ways of framing the predictors and eventually choose the framing that yields best predictive results. As predictor values, we used several modifications of the distance between the 24 point of interest types in which the 1,901 points of interest were subdivided, and the 90,388 unique parking locations. The distances between points of interest and parking locations were calculated using the Vincenty formulae (Veness, 2015).
This shortest-distance calculation uses an ellipsoidal model of the earth, which makes its calculations more accurate than for example formulae based on a spherical earth model – even though we are calculating short distances, so that the gain in accuracy is minimal.

Not every point of interest influences a parking location; for example, few people would find it attractive to park their car at a location that is a 30-minute walk away from the café where they aim to have coffee. In three out of four ways of framing the predictors, we therefore take willingness-to-walk thresholds into account. These thresholds are set at 100 meter, 200 meter, 500 meter and 1000 meter – the latter maximum threshold loosely based on the maximum distance that people are willing to walk for general purposes (Untermann & Lewicki, 1984). For the fourth way of framing the predictors, we let go of this willingness-to-walk standard and merely look at the closest points of interest, irrespective of distance.

Framing predictors method 1: Presence points of interest willingness-to-walk

The literature presented in Section 2 gives reason to assume that the mere presence of points of interest within a willingness-to-walk range may influence the potential charging demand at a parking location. Presence of point of interest categories within a willingness-to-walk range from a parking location is therefore calculated as a categorical variable, as follows:

\[ \mathit{Presence}_{g,l} = \begin{cases} 1 & \text{if } d_{l,h_g} \le r \\ 0 & \text{if } d_{l,h_g} > r \end{cases} \tag{7} \]

Where the presence of a point of interest category $g$ respective to a parking location $l$ is determined by whether the distance $d$ between the parking location and a point of interest location $h$ belonging to point of interest category $g$ is equal to or smaller than the designated willingness-to-walk range $r$.
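The presence flag of method 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: it swaps the Vincenty formulae for the simpler haversine great-circle distance on a spherical earth, the function names are our own, and the two POIs are invented coordinates placed near the Island Ave parking location from the raw-data extract.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean earth radius; spherical stand-in for the ellipsoid


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def presence(parking, pois, category, r):
    """Return 1 if any POI of `category` lies within r meters of `parking`, else 0."""
    lat, lon = parking
    return int(any(
        haversine_m(lat, lon, poi_lat, poi_lon) <= r
        for cat, poi_lat, poi_lon in pois
        if cat == category
    ))


parking = (32.71047, -117.15684)  # parking location from the raw-data extract
pois = [                          # hypothetical POIs, invented for illustration
    ("F&B", 32.71100, -117.15684),       # about 59 m north of the parking location
    ("Shopping", 32.71447, -117.15684),  # about 445 m north of the parking location
]

for r in (100, 200, 500, 1000):   # willingness-to-walk thresholds in meters
    flags = {cat: presence(parking, pois, cat, r) for cat in ("F&B", "Shopping")}
    print(r, flags)
```

With these made-up points, the F&B flag is set at every threshold, while the Shopping flag only switches on from the 500 meter threshold upwards.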
Framing predictors method 2: Presence frequency points of interest willingness-to-walk

In their study, Shahraki, Cai, Turkay, & Xu (2015) find that charging demand for the Beijing taxi fleet concentrates in the inner city, where most POIs are located. The authors hereby hint at an influence on charging demand not only of the mere presence of points of interest, but also of the number of points of interest that can be found within a certain range. We therefore calculate the number of times that a point of interest category can be found within a willingness-to-walk range from a parking location:

$$\mathit{PresenceFrequency}_{g,a} = \mathit{freq}(d_{a,h_g} \le r) \qquad (8)$$

Where we count the frequency that the distance $d$ from a POI location $h$ belonging to POI category $g$ to a parking location $a$ is equal to or smaller than the given willingness-to-walk range $r$.

Framing predictors method 3: Weighted distance points of interest willingness-to-walk

Similar to the rationale presented above regarding a maximum willingness-to-walk range, one could argue that points of interest also have a varying impact on a parking location within a willingness-to-walk range, depending on their distance from the parking location. As an example, the bar that is located just across the street from a specific parking location is more relevant to that location than a bar that is a ten-minute walk away. This also overcomes a potential drawback of method 2: if all points of interest within a specific radius find themselves on the far edge of that radius (e.g., within a radius of 100 meter, all POIs are located 99 meter away from the parking location), method 2 would weight them as if they were 1 meter away from the parking location. We base the impact value that each POI has on each parking location on the model proposed by Van Der Goot (1982), who linearly models the impact that a parking area has on its destination.
We invert this relationship to linearly model the impact that a POI destination has on a parking location, where a destination-location has an impact of 0 when it exceeds the willingness-to-walk range, and an impact of 1 when the POI and the location are located within one meter from each other. This weighted distance score of a point of interest category respective to a parking location is calculated as follows:

$$\mathit{WeightedScore}_{g,a} = \sum \begin{cases} \dfrac{1}{d_{a,h_g}} & \text{if } d_{a,h_g} \le r \\ 0 & \text{if } d_{a,h_g} > r \end{cases} \qquad (9)$$

Where a relevance score per point of interest, calculated from the distance $d$ between a parking location $a$ and a point of interest location $h$ belonging to the point of interest category $g$, is summed.

Framing predictors method 4: Presence points of interest closeness

In method 1, we stated that based on provided literature it is likely that the presence of a point of interest influences the charging demand of a parking location. In this method, we relax the assumption of a willingness-to-walk boundary and merely determine the point of interest types that are closest to a specific parking location. We therefore determine if a point of interest category is the nth-closest POI category from a parking location, as a categorical variable:

$$\mathit{Presence}^{\,n\text{th closest}}_{g,a} = \begin{cases} 1 & \text{if } d_{a,h_g} = n\_\min(d_{a,H}) \\ 0 & \text{if } d_{a,h_g} \ne n\_\min(d_{a,H}) \end{cases} \qquad (10)$$

Here, we establish if the distance $d$ between a parking location $a$ and a point of interest location $h$ belonging to a point of interest category $g$ is the nth-smallest distance between the parking location $a$ and all points of interest locations $H$. In other words, we determine which point of interest category, of all points of interest, is nth-closest to the parking location.

3.4.3 Base predictors

Neighborhood

We include the neighborhood of a parking location as a categorical predictor of potential charging demand.
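Looking back at the point of interest framings, the count, weighted-distance and nth-closest predictors of equations (8) to (10) can be sketched as follows. This is an illustrative sketch only: it assumes the distances from one parking location to every POI have already been computed, the function names and example values are hypothetical, and clamping distances below one meter to an impact of 1 is our reading of the text rather than a documented detail.

```python
def frequency(distances, r):
    """Equation (8): number of POIs per category within r meters."""
    counts = {}
    for category, d in distances:
        counts[category] = counts.get(category, 0) + int(d <= r)
    return counts

def weighted_score(distances, r, min_distance=1.0):
    """Equation (9): sum of 1/d over POIs within r meters, per category.

    Distances below min_distance are clamped so the impact never exceeds 1
    (assumption of this sketch).
    """
    scores = {}
    for category, d in distances:
        contribution = 1.0 / max(d, min_distance) if d <= r else 0.0
        scores[category] = scores.get(category, 0.0) + contribution
    return scores

def nth_closest(distances, n):
    """Equation (10): 1 for the category owning the nth-closest POI, else 0."""
    ranked = sorted(distances, key=lambda item: item[1])
    target = ranked[n - 1][0]
    return {category: int(category == target) for category, _ in distances}

# Hypothetical (category, distance in meters) pairs for one parking location.
distances = [("bar", 20.0), ("bar", 99.0), ("bar", 150.0),
             ("school", 50.0), ("airport", 900.0)]
print(frequency(distances, r=100))
print(weighted_score(distances, r=100))
print(nth_closest(distances, n=2))
```

In this example the bar at 99 meter contributes far less to the weighted score than the bar at 20 meter, whereas both count equally in the frequency framing.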
Seasonality

To capture the influence of seasonality on potential charging demand, we include the week number in which potential charging occurs as a categorical variable.

Charging station

During the time of measurement, 100 public charging stations were operational in the city of San Diego. To account for a higher parking frequency based on the current availability of a charging station, we include charging station presence as a categorical variable in the analyses.

Table 6. Table of notation

Variable | Description | Unit
a | Parking location | street and house number
c | Unique car ID | tag
d | Distance | meter
f | Initial fuel level of car | %
g | POI category | tag
h | POI location | tag
H | All POI locations |
i | Parking instance | tag
lw | Location-week combination | tag
m | Parking time | minutes
pkWh | Potential kWh charged | kWh
pkWh_b | Potential kWh charged given a car's initial fuel level | kWh
pkWh_t | Potential kWh charged given a car's parked minutes | kWh
pkWh_y | Potential kWh charged in 52 weeks | kWh
r | Willingness-to-walk range | meter
ts | Consecutive timestamp of 15 minutes | index
v | On-the-road vehicles | integer
z | Output charging station | kW

3.5 Model selection

3.5.1 Selecting predictors and method

Although the aim of this paper is to predict cost-effective locations in several scenarios, for reasons of time restrictions and computational efficiency we use a trial scenario to determine which way of framing the predictors and which statistical method yields the most accurate predictions.
We select predictors and method based on the following trial scenario:

• All responses: 90,388 parking locations over 5,694,444 location-week combinations
• All predictors: 1,901 points of interest over 24 point of interest categories
• 3.6 kW output charging stations
• 2% of all on-the-road vehicles are BEV
• Willingness-to-walk range of 200 meter, or 7th-closest points of interest*

*average number of POIs within 200 meter

On this trial scenario, we apply two different methods: Gradient Boosting Method (GBM) and Generalized Linear Models (GLM), both using 10-fold cross validation. Given its computational power, we perform the analyses on the open-source machine learning platform H2O.

3.5.2 Gradient Boosting Method

Gradient Boosting (Click, Lanford, Malohlava, & Parmar, 2015) is a non-parametric, supervised machine-learning ensemble technique that is currently described as one of the most powerful predictive methods (Dell Software, 2015), and which can be used for both classification and regression tasks. GBM is based on boosting, where trees are added to the ensemble sequentially: in each iteration, a new weak model is trained with respect to the error of the ensemble so far (Natekin & Knoll, 2013), so that the loss function of the model is gradually minimized. We apply GBM because of its promise to deliver some of the most accurate predictions. Further, as the relationship between the predictors and the dependent variable appears not to be linear (an example can be found in Appendix 2), a non-parametric approach may yield better results than a parametric approach would.

Gradient Boosting is a highly flexible method, so the chosen parameters may greatly influence results (Natekin & Knoll, 2013). Three parameters that may strongly influence results are as follows (Jain, 2016):

• Maximum number of trees the model may grow. A high number of trees usually creates a more robust model, yet the model may overfit at some point.
• Maximum depth a tree may grow. A high depth may lead to overfitting due to the model learning relations that are specific to the sample.
• Learning rate of the model, ranging from 0 to 1. A lower learning rate makes the model more robust, hence more generalizable.

After experimenting with different values for the above-mentioned parameters (learning rates of 0.05 and 0.1 and maximum tree depths of 4, 5 and 6), we decided on a learning rate of 0.1 and a maximum tree depth of 4. For the sake of comparison and computational efficiency, we keep these two parameters at the same level for all models.

3.5.3 Generalized Linear Models

Generalized Linear Models are a more flexible extension of traditional linear models: where the assumptions of traditional linear models make them too restrictive for certain problems, GLM relaxes these assumptions by, for example, allowing a non-linear relationship between response and predictors (Nykodym, Kraljevic, Hussami, Rao, & Wang, 2016). We chose to test GLM due to its interpretability; in prediction, parametric approaches such as GLM are usually easier to interpret yet less accurate. Although the distribution of the response variable is highly skewed towards zero (Appendix 3), correct model specification is secondary in predictive analyses (Shmueli & Koppius, 2011). We therefore use a Gaussian distribution, as our response is continuous and real valued.

3.5.4 Data partitioning

To ensure model validity, hence model performance on cases that were not used to train the model, we perform 10-fold cross validation on each modelling technique. In cross-validation, the dataset is randomly split up in k sets (here, ten), after which the procedure iterates k times, each time choosing a different set as holdout data and combining the other sets as training data.
This approach is preferred over merely one holdout set, as it better shows model validity, hence predictability over previously unseen cases. We include a random 80% of the total dataset to train and cross-validate the model on; predictions are made on the remaining 20% of the dataset, from here on referred to as the holdout set.

3.5.5 Comparison metrics

We evaluate the predictors and methods based on three different criteria, thereby implying that these performance measures provide the most useful information on predictive power of our model.

1. Absolute difference between actual and predicted response on the holdout set
2. Root mean squared error on the holdout set
3. Adjusted R-squared on the holdout set

Absolute difference

To extract how much, on average, actual potential kWh demand deviates from predicted potential kWh demand per parking location, we derive the absolute difference between each actual response and predicted response on the holdout set and divide this by the number of observations in the holdout set, as follows:

$$\mathit{AbsoluteDifference} = \frac{\sum \left| \mathit{actual\ response}_{\mathit{holdout}} - \mathit{predicted\ response}_{\mathit{holdout}} \right|}{n\ \mathit{observations}_{\mathit{holdout}}} \qquad (11)$$

Root mean squared error

Similar to the absolute difference, the root mean squared error (RMSE) measures the difference between the actual values $y_i$ and the predicted values $\hat{y}_i$ of the response, and penalizes large differences more severely than small differences. The RMSE is calculated on the holdout set, given the following generic formulation:

$$\mathit{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (12)$$

Adjusted R-squared

The adjusted R-squared gives an indication of how much of the variance in the response variable is explained by the predictors, taking the number of predictors into account.
When the R-squared is given, the adjusted R-squared on the holdout set is calculated as follows:

$$\mathit{Adjusted\ R\text{-}squared} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} \qquad (13)$$

Where $R^2$ is the holdout R-squared, $n$ the sample size of the holdout set, and $p$ the number of predictors.

3.5.6 Overview predictors and methods

We made predictions with the different predictors and methods, using an un-optimized GBM model with 50 trees, a 0.1 learning rate and a maximum tree depth of 4, and an un-optimized GLM model with Gaussian distribution. Table 7 gives an overview of the absolute predicted difference, the predicted RMSE and the predicted adjusted R-squared. Here, we find that Gradient Boosting Method outperforms Generalized Linear Models on each metric. Further, the predictor type that yields the best overall results is method 3, the weighted distance score of a point of interest category respective to a parking location within a willingness-to-walk range: with GBM it has the lowest absolute difference and RMSE, while its adjusted R-squared (0.1938) is marginally below that of method 2 (0.1946).

Table 7. Overview predictors and methods

Predictors | Model | Absolute difference | RMSE | Adjusted R-squared
Method 1 | GBM | 84.15 | 44.97 | 0.1866
Method 1 | GLM | 85.21 | 49.12 | 0.0175
Method 2 | GBM | 82.95 | 44.12 | 0.1946
Method 2 | GLM | 84.81 | 48.78 | 0.0173
Method 3 | GBM | 80.57 | 43.38 | 0.1938
Method 3 | GLM | 83.58 | 47.91 | 0.0174
Method 4 | GBM | 81.70 | 43.65 | 0.1916
Method 4 | GLM | 83.97 | 48.50 | 0.0174

4. Assessment of predictive value

In this section, we will evaluate the predictive value of our models. According to Shmueli & Koppius (2011), there are three steps in evaluating predictive models: evaluating overfitting, predictors, and predictive power. Firstly, as overfitting is the biggest concern in predictive modelling, we assess this based on the cross-validation MSE to ensure predictive power of the models on unseen cases. Secondly, we evaluate the predictors and their influence on the response variable.
Thirdly, we evaluate the models' power in predicting charging demand, based on the absolute difference between predicted and actual potential demand, the root mean squared error on the holdout set, and the adjusted R-squared on the holdout set. Lastly, we evaluate the models' power in predicting cost-effective locations.

In Section 3.1 we presented a 2% BEVs on-the-road scenario and a 6% BEVs on-the-road scenario. Given that the 2% and the 6% scenarios are linearly related, predictions on one of these scenarios will yield similar results to predictions on the other scenario. In light of computational efficiency, we therefore assess the value of points of interest on demand predictions using only the 2% scenario, for both slow charging and fast charging. When assessing predictive power on cost-effective locations, however, we use our models to estimate both the 2% and the 6% scenario, again for both slow charging and fast charging.

4.1 Demand predictions base models

In the base models, we model potential kWh charging demand per parking location against the three moderating predictors: neighborhood, seasonality, and charging infrastructure presence. We develop different models for predicting slow charging demand and predicting fast charging demand. Due to its superior performance as shown in Section 3.5, we use GBM with predictor values based on the weighted distance score of a point of interest category respective to a parking location within a willingness-to-walk range. We again hold on to the GBM learning rate of 0.1 and a maximum tree depth of 4, and start modelling with 50 trees; we will, however, assess the number of trees for each model to avoid overfitting.
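The three holdout metrics used throughout this section (equations 11 to 13 in Section 3.5.5) can be computed as in the following minimal sketch. The function names and the example values are hypothetical, and the `r_squared` helper uses the standard definition of one minus the ratio of residual to total sum of squares, which is an assumption since the text takes the R-squared as given.

```python
import math

def mean_absolute_difference(actual, predicted):
    """Equation (11): mean absolute difference on the holdout set."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Equation (12): root mean squared error on the holdout set."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def r_squared(actual, predicted):
    """Standard R-squared, 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Equation (13): R-squared adjusted for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical holdout values in kWh, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(actual, predicted)
print(mean_absolute_difference(actual, predicted))
print(rmse(actual, predicted))
print(adjusted_r_squared(r2, n=len(actual), p=1))
```

Because the RMSE squares each error before averaging, a single badly mispredicted high-demand location raises it more than it raises the absolute difference.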
4.1.1 Base model slow charging

Assessment of overfitting

When initially fitting the base model with 50 trees, a learning rate of 0.1 and a maximum tree depth of 4 on slow charging demand, the model overfits quickly, as can be seen in Figure 4, where the red dots indicate the lowest MSE per cross-validation iteration and the bold line shows the MSE on the training data. After comparing performance given different numbers of trees (4 to 14 trees), we find that a model with 9 trees yields the lowest MSE. As we can deduce from Figure 5, with 9 trees almost none of the cross-validation iterations overfit.

Interestingly, and contrary to what one would expect, a few cross-validation sets perform better in terms of MSE than the training set. As expected, the cross-validation MSE (2,310.96) is still larger than the train MSE (2,226.94). It may therefore be that although the test and training sets are randomly split, the training set is unusually difficult to train, or some test sets are unusually easy to predict.

Figure 4. Base model slow charging, 50 trees (red dots indicate lowest MSE)

Figure 5. Base model slow charging, 9 trees (red dots indicate lowest MSE)

Predictor importance

The predictor importance graph in Figure 6 shows the impact of the predictors on the response variable, where the predictor with the highest importance in the model is given a scaled importance of 1.0, and predictors with no influence on the model are scored 0.0. In this model, all three moderating predictors are deemed important, with neighborhood characteristics being most important in predicting the response. Seasonality appears to have the least influence on predictions in this model.

Figure 6. Predictor importance base model slow charging

Predictive accuracy

After fitting the chosen GBM model with 9 trees, a learning rate of 0.1 and a maximum tree depth of 4 on the holdout set, we find predictive metric results as in Table 8. The average absolute difference between the actual kWh demand and the predicted kWh demand is high at 82.89, compared to a mean kWh demand of 320.4 kWh. The same applies to the RMSE of 47.6. Further, with an adjusted R-squared of 2.97%, the proportion of variance in the response variable that is explained by this base model is low.

Table 8. Predictive metrics base model slow charging

4.1.2 Base model fast charging

Assessment of overfitting

When fitting the base model with a learning rate of 0.1, a maximum tree depth of 4 and 50 trees on fast charging demand, the model overfits quickly, as can be seen in Figure 7. Comparing different numbers of trees (4 to 14 trees), we find that a model built with 8 trees yields the lowest MSE. In Figure 8, we indeed find that almost none of the cross-validation iterations overfit with this model.

Figure 7. Base model fast charging, 50 trees (red dots indicate lowest MSE)

Figure 8. Base model fast charging, 8 trees (red dots indicate lowest MSE)

Predictor importance

Figure 9 shows the impact of predictors on the response variable. In predicting fast charging demand, the area of a parking location is again the main determinant of its potential kWh demand. However, where in the slow charging base model charging station presence was a greater determinant than the week in which the car was parked, in the fast charging base model both are almost equally predictive of kWh demand.

Figure 9. Predictor importance base model fast charging

Predictive accuracy

After fitting the GBM model of 8 trees with a learning rate of 0.1 and a maximum tree depth of 4 on the holdout set, we find accuracy measures as in Table 9. With an absolute difference between predicted and actual potential kWh demand of 111.46 and an RMSE of 58.84, the fast charging base model predicts potential kWh demand worse than the slow charging base model. This is reflected in the adjusted R-squared, where only 2.3% of the variance in the response is explained by our model.

Table 9. Predictive metrics base model fast charging

4.2 Demand predictions points of interest model

In predicting potential kWh demand given slow charging and fast charging with a model supplemented with point of interest information, we again use a GBM model with a 0.1 learning rate and a maximum tree depth of 4. We choose to model with 100 trees, as no overfitting takes place up to that point; although increasing the number of trees may yield more accurate predictions, for reasons of computational efficiency we decide not to increase the number of trees.

To select the best model given the predictors, we use a step-wise approach to predictor selection, as is common in predictive research (Shmueli & Koppius, 2011). To do so, we start with the three moderating variables as in the base model and iteratively select predictors to include in the model, after which we assess the model on absolute difference, RMSE, and adjusted R-squared on the holdout set. Based on this approach, we built four different models: (1) including all POIs within 100 meters, (2) including all POIs within 100 and within 200 meters, (3) including all POIs within 100, 200, and 500 meters, (4) including all POIs within 100, 200, 500 and 1000 meters.
Please note, however, that for reasons of computational efficiency (the 4th model took over 9 hours to compute on a computer with 4 GB memory and a 1.4 GHz processor), we only assessed these four models based on a slow charging 3.6 kW charging scenario. Table 10 presents our findings regarding the four models. The more variables are fitted into the model, the better it predicts new instances; the absolute difference decreases, as does the RMSE. Further, the variation in the response variable explained by the model increases. Based on these outcomes, we decided on a predictive GBM model with a 0.1 learning rate, a tree depth of 4, 100 trees, and predictors based on the weighted distance of points of interest within a 100 meter, 200 meter, 500 meter, and 1000 meter willingness-to-walk range.

Lastly, in an attempt to improve the predictions of this model, we tried reducing the data dimensions, as predictor reduction may lead to higher predictive accuracy (Shmueli & Koppius, 2011). We therefore removed the predictors with a scaled importance below 0.02. However, reducing the number of predictors increased the model's error, hence decreased its predictive ability (with an absolute difference of 79.19, an RMSE of 41.47, and an R2 of 0.26). We therefore decided to include all predictors in the model.

Table 10. Predictive metrics model comparison

4.2.1 Points of interest model slow charging

Overfitting

As mentioned, none of the cross-validation sets overfit at a total of 100 trees, as is confirmed by Figure 10. As with the base model, there are a few cross-validation sets that predict better than the average training set; it may again be that either the cross-validation sets contain instances that are particularly easy to predict, or that the training set contains instances that are particularly hard to predict.

Figure 10. Supplemented model slow charging, 100 trees (red dots indicate lowest MSE)

Predictor importance

Of the 92 predictors in the model, 25 predictors have a scaled importance on the response variable of 0.02 or higher. In Figure 11 we further inspect these variables. The variable that has the largest influence on potential kWh slow charging demand at a parking location is the weighted distance of an F&B store within 1000 meter from the parking location. Interestingly, F&B stores within a walking distance of 100 meter also appear as one of the 25 most influential predictors, yet with a scaled importance of just above 0.02. A leisure activity in the neighborhood also influences the potential kWh slow charging demand: a leisure activity within 100 meter of the parking location is the second-most predictive point of interest, and leisure POIs within 200, 500 and 1000 meters are all among the 25 most predictive POIs. Other predictive points of interest include the presence of an airport within 1000 meter; a touristic activity within 500 or 1000 meter; an educational facility within 500 or 1000 meter; an F&B POI within 200, 500 or 1000 meter; or a religious institution within 500 or 1000 meter. Also adding to predictive power, yet less influential, are the presence of public toilets within 1000 meter; accommodation within 200 meter; a shopping location within 500 meter; access to public transport within 500 meter; an entertainment activity within 200 meter; or a healthcare institution within 1000 meter.

It is further remarkable that even with points of interest input, the base predictors neighborhood characteristics, seasonality and charging station presence are the third-, fourth-, and eighth-most important predictors of potential kWh slow charging demand. It is interesting that whilst seasonality was least predictive of kWh demand in the base slow charging model, it clearly surpasses charging station presence in terms of influence on the response in the supplemented model.

Figure 11. Predictor importance supplemented model slow charging

Predictive accuracy

Table 11 repeats the accuracy measures of our fitted model. The supplemented model predicts potential kWh slow charging demand better than the base model; the absolute difference between predicted and actual kWh demand is almost 6% lower than in the base model, whilst the RMSE is 14.5% lower than in the base model. The supplemented model also provides a notable leap in adjusted R-squared, where it now explains 29% of the variance in the response, compared to 2.9% using the base model.

Table 11. Predictive metrics supplemented model slow charging

4.2.2 Points of interest model fast charging

Overfitting

Fitting a model with a learning rate of 0.1, a tree depth of 4, 100 trees, and all predictors of 100 meter, 200 meter, 500 meter and 1000 meter does not lead to overfitting in predicting potential fast charging kWh demand, as seen in Figure 12. There are again a few cross-validation sets that eventually produce a lower MSE than the MSE of the average training set, which indicates that also in this model there are instances in the cross-validation sets that are easier to predict, or instances in the training set that are harder to predict.

Figure 12. Supplemented model fast charging, 100 trees (red dots indicate lowest MSE)

Predictor importance

Although we included all predictors in the model, we examine the 25 predictors with the highest influence on the response variable in Figure 13.
The first remarkable finding is the importance of the base predictors: neighborhood characteristics of the parking location have the most influence on the response, whilst seasonality is third-most influential, and charging station presence seventh-most important. This is especially interesting given that these three variables alone produced a relatively poorly predicting model. Other than that, similar to the slow charging model, the presence of an F&B store within 1000 meter and of an airport within 1000 meter highly influence the response. Proximity to a leisure activity is also important; however, in the fast charging model a 1000-meter range in leisure activities has the highest predictive power, whilst in the slow charging model a 100-meter range has most influence. Almost all points of interest that are of influence in the slow charging model are also of influence in the fast charging model: a touristic activity within 500 and 1000 meter, a religious institution within 500 and 1000 meters, an F&B point of interest within 500 and 1000 meter, an educational institution within 200 and 1000 meter, accommodation within 200 meter, public transportation within 1000 meter, and an entertainment facility within 200 meter. However, in the fast charging model, the presence of an office within 200 meter, of a fire station within 500 meter, and of transportation within 500 meter also influences the response variable.

Figure 13. Predictor importance supplemented model fast charging

Predictive accuracy

Table 12 provides the predictive measures of the fast charging model supplemented with points of interest information. The supplemented fast charging model provides more accurate predictions than the base model. As with the slow charging model, the absolute difference between predicted and actual potential kWh demand is 6% lower with the supplemented model than with the base model.
Further, we find that the RMSE is 13.5% lower with the supplemented model compared to the base model. Lastly, inserting data on points of interest notably increases explanatory power of the model, from 2.2% in the base model, to 27% in the supplemented model.

Table 12. Predictive metrics supplemented model fast charging

4.2.3 Demand prediction differences

In light of our first sub-research question and first prediction, we find that BEV potential kWh charging demand is more accurately predicted when using points of interest supplementary to information on neighborhood characteristics, seasonality and charging infrastructure presence.

4.3 Location predictions

Based on the above-developed models, we determine how these models can be used to improve decision-making in predicting cost-efficient charging station locations.

4.3.1 Costs and revenue

In estimating cost-efficiency of charging stations, we must make several assumptions regarding costs and revenue involved in charging infrastructure ownership and operation. Main costs involve hardware, installation and maintenance; for the sake of simplicity, we exclude land costs due to their high variability between and within cities, as well as investment interest rates, time value of money, and potential investments. The latter are excluded as we aim to estimate a scenario in which charging stations can operate financially independently. Table 13 gives an overview of costs per charging station type.

Table 13. Lifetime charging station costs ($)

Cost item | DC fast charging | AC level 2 public
Material costs | 14,000 | 2,000
Installation costs | 27,300 | 4,000
Maintenance and repair, yearly | 1,400 | 200
Costs, yearly | 5,530 | 800
Total investment costs, 10 years | 55,300 | 8,000

Slow charging

Charging infrastructure hardware costs significantly depend on the type of charging station and the features it provides.
Smith & Castellano (2015) estimated AC level 2 unit costs at between $400 for a basic wall charger and $6,500 for a charger with advanced features. However, hardware costs have significantly decreased; Alexander & Gartner (2012) found that level 2 charging hardware costs decreased by about 50% between 2011 and 2013. We expect these costs to keep decreasing, and therefore estimate hardware costs of a public pedestal with basic data collection features in 2020 at $2,000, based on lower estimates of similar hardware by Smith & Castellano (2015). Installation costs include labor costs, material costs, permits and taxes (Smith & Castellano, 2015); these costs, however, vary widely per geographic region, mainly due to differences in labor costs. Where average US installation costs per AC level 2 charging station total $3,000, costs in San Diego are higher, with average installation costs of $4,000 per charging station (Smith & Castellano, 2015). Maintenance costs approximate 10% of hardware costs per year (Schroeder & Traber, 2012). Lastly, charging station life span ranges from 10 to 15 years (Schroeder & Traber, 2012); we therefore assume a pessimistic scenario of 10 years. Total costs over the course of 10 years therefore add up to $8,000. Ignoring time value of money and, as mentioned, investment interest rates and land costs, we therefore assume yearly slow charging station costs of $800.

Fast charging

Fast charging hardware costs can be estimated at $10,000 to $40,000 per unit, depending on features and power (Smith & Castellano, 2015). Estimating hardware costs of a public DC fast charger with basic features and a low power output of 50 kW, whilst keeping decreasing hardware costs in mind, we estimate the costs per single fast charging station to be $14,000 in 2020.
Installation costs are also significantly higher for fast chargers than for slow chargers; whereas fast charging installation costs in the US average $21,000 (Smith & Castellano, 2015), we add a margin of about 30% for higher-than-average San Diego labor costs, and estimate installation costs at $27,300. Yearly maintenance costs are again estimated at 10% of hardware costs, and we again assume a life span of 10 years. Excluding time value of money, investment interest rates and land costs, we predict 10-year costs of a fast charging station to be $55,300, hence yearly costs of $5,530.

Revenue
Currently, there are four main payment structures in EV charging: no payment (free), fixed rate (e.g. monthly), pay per charge, and pay per used resources (Madina et al., 2016). In this study, we focus on the last of these, where a customer is charged per kWh loaded. We investigate two price levels, assuming that these price levels per kWh equal income per kWh:
Low price level: $0.40 per kWh slow charging, $0.75 per kWh fast charging
Moderate price level: $0.50 per kWh slow charging, $1.00 per kWh fast charging

4.3.2 Predictive metrics
To assess predictive power of the models in estimating cost-effective locations, we used the following metrics. To account for situations where a charging station is not charging due to, for example, the parking area temporarily being unoccupied, we assume in these calculations that one-sixth of the yearly capacity of a charging station is unused. We therefore presume the yearly capacity of a slow charging station to be 26,280 kWh (3.6 kW × 8,760 hours × 5/6), whilst the yearly capacity of a fast charging station is presumed at 365,000 kWh (50 kW × 8,760 hours × 5/6).

Locations predicted vs actual
The number of predicted cost-effective locations versus the number of predicted locations that, given the actual potential kWh demand at that location over a year, are actually cost-effective. Cost-effectiveness per parking location is attained when:
Slow charging at $0.40: at least 2,000 kWh charged
Slow charging at $0.50: at least 1,600 kWh charged
Fast charging at $0.75: at least 7,373.33 kWh charged
Fast charging at $1.00: at least 5,530 kWh charged

Costs infrastructure
The number of predicted charging stations, which is in our prediction equal to the number of predicted parking locations, times the costs per charging station.

Actual charged kWh
The sum of the actual potential kWh charging demand at the predicted parking locations.

Revenue charging
Actual charged kWh times the income per kWh.

Profit all stations
The revenue from charging minus the costs of infrastructure.

Profit per station
Profit from all charging stations divided by the number of predicted charging stations.

4.3.3 Cost-effective charging locations
Using the above metrics, we predict cost-effective parking locations given the different scenarios, as can be found in Table 14. In these predictions, we examine situations where charging infrastructure suppliers can choose to place a combination of both slow charging stations and fast charging stations. Before comparing the predictions of the base model and the supplemented model, it is first worth noting that the supplemented model places merely one charging station per predicted cost-effective parking location. Upon further inspection, it appears that in the supplemented model the maximum predicted potential kWh demand at a parking location is remarkably lower than the maximum actual potential kWh demand at a parking location; with a maximum of 39,172 kWh compared to a maximum of 290,484 kWh respectively in the 2% scenario with 3.6 kW chargers, the actual potential kWh demand is over seven times bigger than predicted.
It is for this reason, the apparent under-prediction of high-kWh locations, that some parking locations are predicted to be suitable for one charging station whilst actual potential demand could be served with more charging stations; for example, at one parking location, the actual kWh demand could only be met by installing six 3.6 kW charging stations, yet the supplemented model predicts that only enough kWh will be demanded for one station. This difference between maximum predicted and actual potential kWh demand is even more apparent in the base model, which predicts a maximum of 4,639 kWh demand in a 2% scenario with 3.6 kW chargers – 62 times smaller than the maximum actual potential kWh demand.
When comparing the predictions of the base model to those of the supplemented model, several interesting findings emerge. Focusing specifically on prediction of slow charging station locations within a scenario where an infrastructure supplier places both slow and fast charging stations, the supplemented model consistently predicts the number of cost-effective locations better than the base model. Depending on the scenario, of the locations predicted by the base model, 29%, 30%, 43% and 46% are predicted correctly (for a 2% low pricing scenario, a 2% moderate pricing scenario, a 6% low pricing scenario, and a 6% moderate pricing scenario respectively); for the supplemented model, this is 47%, 43%, 50%, and 53% respectively. The biggest gain in performance using the supplemented model regarding this metric is therefore in the 2% low pricing scenario.
This superiority of the supplemented model translates to almost all profit metrics; should one place slow charging stations at locations according to both models, one will gain additional slow charging profit with the supplemented model of $49,139 or 2.2 times more profit (2% low pricing scenario), $28,418 or 1.4 times more profit (2% moderate pricing scenario), and $42,462 or 1.5 times more profit (6% low pricing scenario) compared to the base model. Interestingly, however, this is not the case in the 6% moderate pricing scenario; in this scenario, the base model just outperforms the supplemented model in terms of profit gained by $456.95 or an increase in profit of 0.3%. Upon inspection, we find that the supplemented model in this scenario excludes some high-kWh parking locations which the base model does take into account; it is probable that this explains the difference in actual potential kWh demand hence profit. Other interesting outcomes emerge when examining placement of fast charging stations in a scenario where an infrastructure provider places both slow charging and fast charging stations. As we saw, the base model predicts fast charging remarkably worse than it predicts slow charging – and this is confirmed by our prediction metrics. Given low pricing scenarios, the base model does not predict one single charging location as cost-effective – therefore significantly impacting total profit. This absence of cost-effective fast charging prediction can be accounted for by the, as mentioned before, remarkable under-prediction of high-kWh locations. 
As an example, one of the two parking locations that was predicted by the base model in the moderate pricing scenario promised an actual potential demand of 256,558 kWh; the base model, however, predicts potential kWh demand for this location to be 5,694 kWh – and given that in a low pricing scenario a fast charging station needs at least 7,373.33 kWh demand to be profitable, the base model excludes this potential charging location from cost-effective predictions. On the other hand, the supplemented model performs outstandingly given low pricing scenarios, as 100% of all predicted charging locations are indeed profitable. This results in great differences in profit; should one decide not to place fast charging stations and only place slow charging stations given the predictions in the base model, one misses out on $1.7 million in the 2% scenario, and on $3.7 million in the 6% scenario. Given the moderate pricing scenario, the base model predicts fast charging relatively well with 50% of predicted cost-effective locations actually being cost-effective – although this is slightly deceptive, as the actual cost-effective location is one of the top parking locations with 256,668 actual potential kWh demand, whilst the non-cost-effective location is far from cost-effective with 723 actual potential kWh demand. The supplemented model, however, predicts better than the base model in the moderate pricing scenario, with 86.4% of the predicted profit-making locations actually being profitable. Also, in all moderate pricing scenarios, the supplemented model ensures exceptionally higher profits than the base model; in the 2% moderate pricing scenario, the supplemented model achieves fast charging profits of $2.3 million, which is $2.1 million or 9.6 times more than fast charging profits made by base model predictions of $246,221.
In the 6% moderate pricing scenario, fast charging profits of $5.2 million are achieved with the supplemented model, which is $4.9 million or almost 15 times more than fast charging profits using the base model. As fast charging profits are the main source of profit in these scenarios, total profit of both fast chargers and slow chargers combined given the supplemented model is $2.1 million or 7.6 times more than the base model in the 2% scenario, and $4.9 million or 11 times more in the 6% scenario. The high profits of fast charging stations may be explained by their capacity relative to their costs; whilst yearly fast charging infrastructure costs are almost 7 times those of slow charging infrastructure, the capacity of a fast charger is almost 14 times that of a slow charger. As we saw before, some predicted slow charging parking locations could have served potential kWh demand with more than one slow charger. However, as the supplemented model predicts merely one charger per location due to the underestimation of high-demand locations, we miss out on a significant portion of potential kWh charged at these locations. As fast chargers can cover more demand, so much so that none of the fast chargers reaches full capacity, all demand can be loaded at locations where a fast charging station is placed – hence resulting in greater profits.

4.3.4 Charging locations given predetermined investment
To present different applications of the supplemented model, we also examine a case in which a charging infrastructure operator aims to place 100 slow charging stations, instead of being limited to predicted cost-effective charging locations as in Section 4.3.3. We therefore select the 100 locations with the highest predicted potential kWh demand given 3.6 kW charging stations, and report the metrics regarding profitability in Table 15.
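As an illustration, the cost assumptions of Section 4.3.1, the break-even thresholds of Section 4.3.2 and the top-N selection described above can be sketched in Python as follows. This is a minimal sketch and not the code used in this study; the names StationType, break_even_kwh and top_n_metrics are hypothetical, the demand values in the usage example are made up, and demand is not capped at station capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationType:
    """Cost assumptions for one charging station type (hypothetical helper)."""
    hardware: float                 # one-off hardware cost in $
    installation: float             # one-off installation cost in $
    lifetime_years: int = 10        # pessimistic life span assumed in the text
    maintenance_rate: float = 0.10  # yearly maintenance as a share of hardware cost

    @property
    def yearly_cost(self) -> float:
        # Lifetime cost spread evenly over the life span (no time value of money).
        maintenance = self.lifetime_years * self.maintenance_rate * self.hardware
        return (self.hardware + self.installation + maintenance) / self.lifetime_years


SLOW = StationType(hardware=2_000, installation=4_000)    # AC level 2 public
FAST = StationType(hardware=14_000, installation=27_300)  # DC fast charging


def break_even_kwh(station: StationType, price_per_kwh: float) -> float:
    """Yearly kWh a station must sell to cover its yearly cost."""
    return station.yearly_cost / price_per_kwh


def top_n_metrics(predicted, actual, station, price_per_kwh, n=100):
    """Place one station at each of the n locations with the highest predicted
    demand and evaluate the choice against actual potential demand."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)[:n]
    threshold = break_even_kwh(station, price_per_kwh)
    actual_kwh = sum(actual[i] for i in ranked)
    costs = len(ranked) * station.yearly_cost
    revenue = actual_kwh * price_per_kwh
    profit = revenue - costs
    hits = sum(1 for i in ranked if actual[i] >= threshold)
    return {
        "share_cost_effective": hits / len(ranked),
        "costs": costs,
        "actual_kwh": actual_kwh,
        "revenue": revenue,
        "profit": profit,
        "profit_per_station": profit / len(ranked),
    }


# Made-up yearly kWh demand for four parking locations.
predicted = [5000, 100, 3000, 2500]
actual = [6000, 9000, 1000, 2400]
result = top_n_metrics(predicted, actual, SLOW, price_per_kwh=0.50, n=2)
print(result)
```

In this toy example the two selected locations are those with predicted demand of 5,000 and 3,000 kWh; the location with the highest actual demand is missed because its predicted demand is low, which mirrors the under-prediction issue discussed in Section 4.3.3.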
When comparing predicted cost-effective locations with the actual cost-effective locations within this predicted top 100, the supplemented model consistently outperforms the base model. Where in the base model 12%, 24%, 34% and 39% of the predicted top 100 locations are cost-effective (in a 2% low pricing scenario, a 2% moderate pricing scenario, a 6% low pricing scenario, and a 6% moderate pricing scenario respectively), the supplemented model performs remarkably better with 38%, 43%, 66% and 70% of predicted top 100 locations actually being cost-effective. This also translates to the actual potential kWh demand per scenario; in the supplemented model, the actual potential kWh demand is in each scenario about 2.6 times higher than in the base model. In terms of potential profit made, the supplemented model is therefore also superior to the base model. In the 2% low pricing scenario, using the base model will even result in a loss of $1,819 for 100 charging stations. The supplemented model, however, provides a profit of $126,634. In a 2% moderate pricing scenario, the base model no longer makes a loss, yet $160,567 or 10 times more profit can be made when using the supplemented model. Given the 6% low pricing scenario, the supplemented model promises $362,500 or 3.5 times more profit than the base model. In the 6% moderate pricing scenario, the supplemented model's predictions ensure an additional profit of $453,124 or 3.3 times more than the base model.

4.3.5 Improved decision-making
Reflecting on our second sub-research question and second prediction, we find that a model supplemented with points of interest information improves decision-making in predicting cost-effective electric vehicle charging locations.

Table 14. Predicted cost-effective parking locations metrics

Table 15. Predicted top 100 parking locations metrics

5. Discussion
In this study, we investigated the value of using information on points of interest in predicting cost-effective electric vehicle charging infrastructure locations. Whilst scholars found evidence of a relationship between points of interest and charging demand (Brooker & Qin, 2015; Cai et al., 2014; Chen, Hall, & Kockelman, 2013; Shahraki, Cai, Turkay, & Xu, 2015; Wagner et al., 2015), the predictive power of this relationship was still unclear. We aimed to contribute to this gap in knowledge by investigating how predicted charging demand is influenced by using a model supplemented with points of interest information, and consequently by determining how this supplemented model could be of use in decision-making regarding cost-effective electric vehicle charging locations. In this discussion, we will discuss our findings and the methodology used.

5.1 Charging demand prediction using POI information
5.1.1 Difference in demand prediction between models
We showed that a model supplemented with information on nearby points of interest predicts charging demand more accurately than a base model that merely covers neighborhood characteristics, influence of seasonality and charging station presence. We measured accuracy of predictions using three different metrics: the absolute difference between predicted and actual kWh demand, the RMSE on the holdout set, and the adjusted R-squared on the holdout set. The difference between the base model and the supplemented model is not remarkably large, yet present; the supplemented model decreases the absolute difference in predicting slow charging or fast charging demand by 5% and 6% respectively, and decreases RMSE in predicting slow charging or fast charging demand by 14% and 13% respectively.
A big increase is found in the adjusted R-squared, hence in the proportion of variance in kWh demand that is explained by the predictors – in slow charging prediction, the adjusted R-squared of the supplemented model is 8 times as large as that of the base model; in fast charging prediction, the adjusted R-squared is almost 11 times as large.

5.1.2 Large difference between actual and predicted demand
Although the supplemented model predicts potential kWh demand more accurately than the base model does, the difference between actual and predicted values by the supplemented model is still rather large, with the absolute difference in kWh prediction for both slow charging and fast charging in a 2% scenario being one-fourth of the average actual kWh demand per parking location, and the RMSE being one-seventh of this average. This indicates that for some observations, the predicted kWh demand and the actual kWh demand differ substantially. This shows that the supplemented model does not perform optimally, hence that there are factors other than points of interest, neighborhood characteristics, seasonal variety and charging station availability that influence the potential kWh demand. This is also confirmed by the adjusted R-squared, which informs us that the proportion of variance in kWh demand that is explained by the predictors is 28% in slow charging prediction and 27% in fast charging prediction. That there are other factors influencing charging demand apart from points of interest is also reported by other scholars (Wagner et al., 2014, 2015). Considering city infrastructure, one can think of many factors that may influence potential kWh demand, such as difficult accessibility of the city center for vehicles, ease of parking at a certain location, or zones that are not accessible for vehicles such as parks or pedestrian streets.
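To make the three accuracy metrics of Section 5.1.1 concrete, they can be sketched in Python as below. This is an illustrative sketch only: the function name holdout_metrics and the example values are hypothetical, and the adjusted R-squared uses the standard textbook correction for the number of predictors, which we assume here and which is not necessarily the exact computation behind the reported figures.

```python
import math


def holdout_metrics(actual, predicted, n_predictors):
    """Mean absolute difference, RMSE and adjusted R-squared on a holdout set."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mean_abs_diff = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    # Standard adjustment: penalise R-squared for the number of predictors.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mean_abs_diff": mean_abs_diff, "rmse": rmse, "adj_r2": adj_r2}


# Made-up kWh demand values for five holdout observations.
metrics = holdout_metrics(
    actual=[10, 20, 30, 40, 50],
    predicted=[12, 18, 33, 37, 50],
    n_predictors=1,
)
print(metrics)
```

Because the RMSE squares each error before averaging, a few strongly under-predicted high-demand locations weigh more heavily in it than in the mean absolute difference.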
5.1.3 Slow charging more accurately predicted than fast charging demand Both the base model and the supplemented model predict slow charging demand more accurately than fast charging demand, given the three metrics as described above. Given our operationalization of potential kWh charged, potential kWh charged depends on either how long a car is parked, or on its battery level. As fast charging infrastructure loads a BEV engine quicker, the potential kWh loaded per parking instance at a fast charging station is, on average, more dependent on the battery level of a car than on the time a car is parked; conversely, potential kWh loaded per parking instance at a slow charging station is on average more dependent on the time a car is parked. It seems straightforward that points of interest, neighborhood characteristics, and seasonality have a stronger influence on time parked, and less on the initial battery level of a car; therefore, slow charging seems to be more accurately predictable using our models. This rationale is confirmed by the finding that presence of a charging station has a higher influence on predicted potential kWh demand at a fast charging station than it has on predicted potential kWh demand at a slow charging station. On the other hand, another explanation for this finding may be that we optimized the variables in our model based on a slow charging scenario; a different blend of variables may have been more appropriate for modelling potential fast charging kWh demand. 5.1.4 F&B store, leisure activity and airport most influential on predictions Both in predicting slow charging demand and fast charging demand, the presence of a food and beverage store (such as a supermarket), of a leisure activity (such as a park), or of an airport, is most influential. Importance of a food and beverage store is also found by Wagner et al. 
(2014), who report that the presence of a bakery has an influence on charging station utilization, and by Brooker & Qin (2015), who find that locations related to food (including buying a meal) are amongst the most popular places to recharge. According to Wagner et al. (2015), there is indeed a relationship between parking popularity and presence of an airport. Lastly, positive influence of leisure activities on demand is indeed found by Brooker & Qin (2015), yet Wagner et al. (2014) find that presence of a park does not have a significant influence on charging station utilization.
Although finding the strength or direction of the influence of the predictors is outside of the scope of this study, one can reasonably argue that a supermarket is preferably visited by car due to the weight of groceries – or contrarily, that a food and beverage store is always so close to one's residence or work that there is no need to go by car. Travelling to the airport is arguably also more convenient by car due to the weight of a suitcase. Further, one could also imagine that when a family with small children visits a park as a weekend activity, it is more convenient to go by car.

5.1.5 Importance of neighborhood characteristics and seasonality
Interestingly, both neighborhood characteristics and seasonality have a notable influence on predicted charging demand even when the model is supplemented with points of interest characteristics. Although it is, again, beyond the scope of our study to make claims regarding the strength or direction of this influence, one can imagine potential reasons for this influence; low-income neighborhoods may for example attract less parking activity as low-income households are less likely to be able to afford a car. In contrast to our findings, Wagner et al.
(2014) do not find significant relationships between charging station utilization and neighborhood characteristics such as distance from the center, population density or income per person. It may, however, be that our data does not correctly represent neighborhood characteristics; by using the neighborhood as a predictor, influences of points of interest in that neighborhood may be incorporated and hence inflate the effect that neighborhood characteristics have on the response variable.
Both in slow charging and fast charging, seasonality measured in weeks is within the top four of predictors influencing potential kWh demand. Scholars indeed found evidence that weather, holidays or events influence traditional traffic volume (Agarwal et al., 2005; Datla & Sharma, 2008; Festin, 1996; Keay & Simmonds, 2005); this also appears to extend towards an electric vehicle context.

5.1.6 Explanatory power is not necessarily related to predictive power
Explanatory power is not necessarily related to predictive power (Shmueli & Koppius, 2011), which is confirmed by this study. We find that points of interest that were estimated by other scholars to have a significant influence on parking behavior, such as ATMs, banks and post offices (Wagner et al., 2015), and police stations and gyms (Wagner et al., 2014), have a negligible influence on predicting charging demand. This finding again highlights the importance of predictive models to assess practical relevance of phenomena (Shmueli & Koppius, 2011).

5.2 Decision-making using POI information
5.2.1 Better prediction of cost-effective locations using POI information
We found that when supplementing the base model with points of interest information, the rate of correctly predicted cost-effective locations is higher in every scenario we examined, with 7% to 17% better predictions in our scenarios.
This shows that a model supplemented with POI information can improve decision-making regarding cost-effective locations. Still, using the supplemented model, only about half of the predicted cost-effective locations are actually cost-effective – indicating, as previously mentioned, that kWh demand is influenced by factors that the supplemented model does not take into account. Currently, high-kWh locations are under-predicted, so that the correctly predicted locations attract more actual kWh demand than predicted and hence make up for the losses made on the incorrectly predicted locations. Should a charging infrastructure provider, however, use this model to predict merely one or two ideal locations, chances are higher that the locations are not cost-effective, hence that no profit will be made. To establish this, however, further experimentation with different prediction thresholds is required.

5.2.2 More potential profit can be made using point of interest information
Due to the higher rate of correctly predicted cost-effective charging locations, the supplemented model ensures that more profit can be made than when using the base model – hence the supplemented model can again contribute to decision-making in predicting cost-effective locations. These differences in profit are large in each scenario given predictions of cost-effective charging locations, ranging from 6 to 45 times more profit that can be acquired using the supplemented model over the base model. Large profit differences can also be found when predicting the top 100 locations for slow charging, although less extreme: ranging from 2 to 9 times more profit, and even turning a potential loss with the base model into a large profit with the supplemented model. This large increase in profits by using the supplemented model is mainly due to the model's superior power in predicting cost-effective fast charging locations.
It is interesting to note that, although we established that fast charging demand is more difficult to predict than slow charging demand, our model predicts fast charging locations very accurately, with 86% to 100% of predicted profitable fast charging locations actually being profitable. It may therefore be that the supplemented model finds it easier to predict high-demand parking locations, yet has more difficulties in predicting moderate-demand yet still profitable parking locations.
What is important to note, however, is that the costs incorporated in this calculation merely cover costs for infrastructure hardware, maintenance and installation. These calculations therefore exclude land costs, investment interest rates, time value of money, signage and other visibility measures, transaction costs, vandalism, electrical upgrades, and advertising (US Department of Energy, 2015). Further, the payback period is set at the expected lifetime of a charging station of 10 years, yet a shorter investment payback period may be desired. To get a full grasp of the potential profit made per location, these costs should be taken into account. Still, with a per-station profit of $2,700 to $4,700 for slow chargers and $108,000 to $247,000 for fast chargers, at 2020 expected demand levels in San Diego, both slow chargers and fast chargers have the potential to become cost-effective without additional funding – which is in line with findings by Madina et al. (2016) and McCormack et al. (2013).

5.2.3 Yet we lose out on profit due to under-prediction
Due to the under-prediction of high-kWh locations, the supplemented model advises placing merely one slow charging station per cost-effective charging location – whilst at some locations, up to six slow charging stations could be deployed to cover all actual kWh demand.
Therefore, some slow charging stations will constantly operate at 100% utilization, so that a charging infrastructure provider will lose out on profit. However, there may be more negative effects to consider. Range anxiety is a barrier to EV adoption, which can only be resolved through perceived availability and visibility of charging infrastructure (Carley et al., 2013; Neubauer & Wood, 2014). Although not studied as such, one may reason that when charging infrastructure at a popular location is always occupied, (potential) BEV drivers may lose confidence in their possibilities for recharging – which is harmful for BEV adoption, hence for the long-term prospect of BEVs and charging infrastructure.

5.2.4 Model applications
We showed that a supplemented model can be applied in different ways; the model shows promising results when predicting cost-effective locations when both fast and slow chargers can be placed, and when predicting a top 100 of best locations for slow chargers. On a critical note, due to the structural under-prediction of cost-effective locations, the supplemented model is not suited to solve charging demand in a city as a whole. When predicting cost-effective locations, actual potential kWh demand at predicted locations in a 2% moderate pricing scenario only covers 9.5% of total city demand. Even if the charging stations at these predicted locations were constantly utilized to their full capacity, they would still be able to cover only 30% of total city demand. The model may therefore only be used to incrementally choose locations to place charging stations, whilst keeping track of utilization levels per placed charging station to determine whether or not a charging location should be extended with multiple chargers.

5.3 Assessment of methodologies used
5.3.1 Predictive power of Gradient Boosting
In the context of this study, Gradient Boosting outperforms Generalized Linear Modeling on all three recorded metrics. This suggests that GBM is, as expected, more capable of coping with non-linear relationships between response and predictors, making it able to better predict potential kWh demand in this context. Despite its predictive power, a drawback of GBM is its relative computational inefficiency. Given that our most elaborate model took over 9 hours to compute, this gives little room for experimentation with parameters – and as Gradient Boosting is heavily influenced by parameter tuning (Natekin & Knoll, 2013), this may impose limitations on the eventual fine-tuning, hence on the predictive power of the model.

6. Conclusion
We conclude our study by presenting our stance on the research question and findings, by determining academic and managerial implications of the findings, and by providing limitations of the study and suggestions for future research.

6.1 The value of points of interest information
This study aimed to determine the value of using information on points of interest in predicting cost-effective electric vehicle charging infrastructure locations. Charging infrastructure placement is currently prone to a 'chicken-and-egg' dilemma, where customers are only willing to adopt battery electric vehicles once plenty of public charging stations are available, yet where charging infrastructure providers are reluctant to place infrastructure until demand is known. We believe that improved insight into potential charging demand can help overcome the reluctance of private infrastructure investors, so that charging infrastructure can be placed before actual demand is known – eventually leading to higher adoption rates of electric vehicles.
As the presence of points of interest was found to have an explanatory relationship with charging demand (Brooker & Qin, 2015; Cai et al., 2014; Chen, Hall, & Kockelman, 2013; Shahraki, Cai, Turkay, & Xu, 2015; Wagner et al., 2015), we therefore aimed to assess the practical value of this relationship by determining whether a model supplemented with points of interest information would indeed have value in predicting demand. In this study, we showed that charging demand can be more accurately predicted using a model supplemented with information on points of interest, as opposed to using a model with information on neighborhood characteristics, seasonality, and potential charging infrastructure presence. This, in turn, also ensures better decision-making in predicting cost-effective charging infrastructure locations, where structurally more profit can be made when using the supplemented model as opposed to the base model. Depending on the scenario, we showed that the supplemented model ensures 2 to 45 times more profit than the base model does. Although slow charging demand can be more accurately predicted by the supplemented model than fast charging demand, biggest gain can be found in fast charging prediction due to the underperformance of the base model in predicting fast charging locations. Further, we found that specific points of interest have a large influence on predicting potential charging demand, such as presence of food and beverage stores, leisure activities, or an airport. At the same time, in addition to information on points of interest, neighborhood characteristics and seasonality still have a notable impact on predictions. This also leads us to believe that explanatory power is not necessarily related to predictive power; several points of interest that were predicted by other scholars to have a significant effect on charging demand, have a negligible effect on predictions in our supplemented model. 
We also found that despite the supplemented model's positive influence on predictions and profit compared to the base model, some predictions differ substantially from actual demand – we find that especially high-demand locations are under-predicted in the supplemented model, so that potential profit is not reaped to the fullest. This indicates that despite the superiority of a supplemented model over a base model, there are other factors influencing charging demand that the supplemented model does not take into account. Lastly, given the contradictory findings by other scholars regarding cost-effectiveness of charging stations (Madina et al., 2016; McCormack et al., 2013; Nigro & Frades, 2015; Schroeder & Traber, 2012), we found that in all scenarios using the supplemented model, charging stations can be profitable given our cost and demand assumptions.
Based on our findings, we conclude that a model supplemented with information on points of interest provides value in terms of additional potential profit when predicting cost-effective electric vehicle charging infrastructure locations, as compared to a model with merely information on neighborhoods, seasonality and infrastructure presence.

6.2 Academic relevance
The main contribution of this study to the current state of academic knowledge regarding the influence of points of interest on charging demand is the finding that points of interest information has predictive power in estimating potential charging demand, hence cost-effective locations. We found that points of interest indeed impact charging demand, in line with findings by other scholars (Brooker & Qin, 2015; Cai et al., 2014; Chen, Hall, & Kockelman, 2013; Shahraki, Cai, Turkay, & Xu, 2015; Wagner et al., 2015).
However, some points of interest that were deemed to have a significant impact on charging demand given these explanatory findings did not have a notable influence on predictions. This study therefore also shows the importance of predictive models in assessing the relevance of explanatory findings. Lastly, this study shows the predictive power of non-parametric statistical methods such as Gradient Boosting, and we hope that these methods will be further utilized in future predictive studies.

6.3 Managerial relevance

The findings of this study assist policy-makers and independent charging infrastructure operators in determining cost-effective charging infrastructure locations before actual demand is known. Given the stated assumptions, infrastructure providers can better predict cost-effective locations using the model supplemented with points of interest information than using a model based merely on neighborhood, seasonality and other infrastructure presence. We hope that the potential profit predicted by the supplemented model at 2020 BEV penetration levels stimulates independent parties to invest in charging station infrastructure, so that mass adoption of BEVs becomes reality. Lastly, our approach may also be relevant to decision-making in a non-electric vehicle context. As predicted charging demand is related to parking time, parking facility operators or other parking demand stakeholders may use our approach to predict parking demand within a city.

6.4 Limitations and suggestions for future research

The main limitations of this study and suggestions for future research revolve around the nature of the data used, model and methodology selection, and the actual applicability of the model. Focusing on the data used in this study, a main limitation is that the parking data was gathered from electric vehicles in a car-sharing context.
Parking behavior of privately-owned BEVs may differ from that of shared BEVs, hence our conclusions regarding the predictive value of points of interest may not fully apply to a non-shared vehicle context. Further, neighborhood characteristics in our model proved to be influential in predicting charging demand. To better understand exactly which characteristics influence the response variable, we suggest that future research incorporate neighborhood factors separately, such as income and population density.

Other limitations of our study revolve around model selection. In light of computational efficiency, we tested our models on trial scenarios and with limited variation in parameter selection. It may therefore be that our models are not optimally specified, and hence not optimally predictive. Further, future studies may want to experiment with increasing the number of points of interest in the model, as full inclusion of all parameters (points of interest within a willingness-to-walk range of 100, 200, 500 and 1000 meters) yielded the best results within our context. We also found that there are other factors influencing charging demand which the supplemented model does not take into account. Future research can therefore try to optimize the predictive power of the model by including, for example, factors regarding city infrastructure.

Applicability of the developed model also poses limitations. Firstly, our model is based on several assumptions regarding the number of BEVs on the road, public charging versus home charging activity, infrastructure costs and potential revenue. The model’s findings should therefore always be interpreted in light of these assumptions. In this study, we further ignore factors such as price elasticity or shifts in demand when infrastructure is placed, which are interesting parameters to include in future research.
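To make the willingness-to-walk ranges mentioned above concrete, the following minimal sketch counts points of interest per category within 100, 200, 500 and 1000 meters of a parking location. It is an illustration under stated assumptions, not the feature extraction used in this study: it uses the haversine great-circle distance as a simple stand-in for an ellipsoidal (Vincenty) calculation, and the function names, coordinates and POI records are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def poi_counts(location, pois, radii=(100, 200, 500, 1000)):
    """Count POIs per category within each radius (meters) of a parking location."""
    counts = {}
    for category, lat, lon in pois:
        distance = haversine_m(location[0], location[1], lat, lon)
        for radius in radii:
            if distance <= radius:
                key = f"{category}_{radius}m"
                counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical example: coordinates are illustrative, not taken from the dataset.
parking_spot = (32.7157, -117.1611)
pois = [
    ("restaurant", 32.7162, -117.1611),  # roughly 56 m north
    ("restaurant", 32.7187, -117.1611),  # roughly 334 m north
    ("airport",    32.7357, -117.1611),  # roughly 2.2 km north, outside all ranges
]
counts = poi_counts(parking_spot, pois)
print(counts)
```

Each resulting key (for example `restaurant_500m`) would correspond to one predictor column. Note that the ranges are cumulative: a POI within 100 meters is also counted in the 200, 500 and 1000 meter features.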
Additionally, although it performs better than the base model, the supplemented model under-predicts high-demand parking locations. This limits its applicability to meeting a city’s total charging demand, which can only be addressed by increasing the predictive power of the model, and hence by implementing additional predictors. Future research may also revolve around finding the breakeven point between charging demand and infrastructure costs. Lastly, although this study aims to develop a model that is generalizable to other cities where actual parking activity is unknown, this generalizability has yet to be demonstrated. The predictive power of points of interest information may not be one-to-one applicable to other cities due to, for example, cultural differences and city architecture. We therefore strongly suggest that future studies apply the developed model in other contexts.

Bibliography

Agarwal, M., Maze, T. H., & Souleyrette, R. R. (2005). Impacts of Weather on Urban Freeway Traffic Flow Characteristics and Facility Capacity. Proceedings of the 2005 Mid-Continent Transportation Research Symposium, (August 2005), 14p. Retrieved from http://www.ctre.iastate.edu/pubs/midcon2005/index.htm
Alexander, D., & Gartner, J. (2012). Electric Vehicles in Europe. Amsterdam Roundtables Foundation and McKinsey & Company, 60.
Brooker, R. P., & Qin, N. (2015). Identification of potential locations of electric vehicle supply equipment. Journal of Power Sources, 299, 76–84. http://doi.org/10.1016/j.jpowsour.2015.08.097
Cai, H., Jia, X., Chiu, A. S. F., Hu, X., & Xu, M. (2014). Siting public electric vehicle charging stations in Beijing using big-data informed travel patterns of the taxi fleet. Transportation Research Part D: Transport and Environment, 33, 39–46. http://doi.org/10.1016/j.trd.2014.09.003
Car Charging Group. (2016). Blink Network - Our history.
Retrieved August 9, 2016, from http://www.blinknetwork.com/blink-history.html
Carley, S., Krause, R. M., Lane, B. W., & Graham, J. D. (2013). Intent to purchase a plug-in electric vehicle: A survey of early impressions in large US cites. Transportation Research Part D: Transport and Environment, 18(1), 39–45. http://doi.org/10.1016/j.trd.2012.09.007
Census Reporter. (2016). Census Reporter San Diego, CA. Retrieved August 9, 2016, from https://censusreporter.org/profiles/16000US0666000-san-diego-ca/
Chen, T. D., Hall, C. J., & Kockelman, K. M. (2013). The Electric Vehicle Charging Station Location Problem: A Parking-based Assignment Method for Seattle. Proceedings of the 92nd Annual Meeting of the Transportation Research Board in Washington DC.
Click, C., Lanford, J., Malohlava, M., & Parmar, V. (2015). Gradient Boosted Models with H2O’s R.
CrunchBase. (2016). ChargePoint, Inc. Retrieved August 9, 2016, from https://www.crunchbase.com/organization/chargepoint#/entity
Datla, S., & Sharma, S. (2008). Impact of cold and snow on temporal and spatial variations of highway traffic volumes. Journal of Transport Geography, 16(5), 358–372. http://doi.org/10.1016/j.jtrangeo.2007.12.003
Dell Software. (2015). Boosting Trees Regression Classification.
Egbue, O., & Long, S. (2012). Barriers to widespread adoption of electric vehicles: An analysis of consumer attitudes and perceptions. Energy Policy, 48(2012), 717–729. http://doi.org/10.1016/j.enpol.2012.06.009
European Commission. (2011). Roadmap to a Single European Transport Area–Towards a competitive and resource efficient transport system. Office for Official Publications of the European …. http://doi.org/10.2832/30955
European Environment Agency. (2016). EEA greenhouse gas - data viewer. Retrieved from http://www.eea.europa.eu/data-and-maps/data/data-viewers/greenhouse-gases-viewer
Festin, S. M. (1996).
Summary of national and regional travel trends: 1970 - 1995.
Franke, T., & Krems, J. F. (2013a). Interacting with limited mobility resources: Psychological range levels in electric vehicle use. Transportation Research Part A: Policy and Practice, 48, 109–122. http://doi.org/10.1016/j.tra.2012.10.010
Franke, T., & Krems, J. F. (2013b). Understanding charging behaviour of electric vehicle users. Transportation Research Part F: Traffic Psychology and Behaviour, 21, 75–89. http://doi.org/10.1016/j.trf.2013.09.002
Franke, T., & Krems, J. F. (2013c). What drives range preferences in electric vehicle users? Transport Policy, 30, 56–62. http://doi.org/10.1016/j.tranpol.2013.07.005
Franke, T., Neumann, I., Bühler, F., Cocron, P., & Krems, J. F. (2012). Experiencing Range in an Electric Vehicle: Understanding Psychological Barriers. Applied Psychology, 61(3), 368–391.
Funke, S., Nusser, A., & Storandt, S. (2014). Placement of Loading Stations for Electric Vehicles: No Detours Necessary! Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 417–423.
Governing. (2016). Car Ownership in U.S. Cities Map.
Graham-Rowe, E., Gardner, B., Abraham, C., Skippon, S., Dittmar, H., Hutchins, R., & Stannard, J. (2012). Mainstream consumers driving plug-in battery-electric and plug-in hybrid electric cars: A qualitative analysis of responses and evaluations. Transportation Research Part A: Policy and Practice, 46(1), 140–153. http://doi.org/10.1016/j.tra.2011.09.008
Gregor, S. (2006). The nature of theory in information systems. MIS Quarterly, 30(3), 611–642.
Guo, S., & Zhao, H. (2015). Optimal site selection of electric vehicle charging station by using fuzzy TOPSIS based on sustainability perspective. Applied Energy, 158, 390–402. http://doi.org/10.1016/j.apenergy.2015.08.082
Haklay, M. (2010). How good is volunteered geographical information?
A comparative study of OpenStreetMap and ordnance survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703. http://doi.org/10.1068/b35097
IEA. (2015). Energy and climate change. World Energy Outlook Special Report, 1–200. http://doi.org/10.1038/479267b
International Council on Clean Transportation. (2015). European Vehicle Market Statistics Pocketbook 2015/16.
International Energy Agency. (2015). Global EV Outlook 2015.
Ip, A., Fong, S., & Liu, E. (2010). Optimization for allocating BEV recharging stations in urban areas by using hierarchical clustering. 2010 6th International Conference on Advanced Information Management and Service (IMS), 460–465.
Jain, A. (2016). Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python.
Jensen, A. F., Cherchi, E., & Mabit, S. L. (2013). On the stability of preferences and attitudes before and after experiencing an electric vehicle. Transportation Research Part D: Transport and Environment, 25, 24–32. http://doi.org/10.1016/j.trd.2013.07.006
Kaplan, A. (2009). The conduct of inquiry. New Jersey: Transaction Publishers.
Keay, K., & Simmonds, I. (2005). The association of rainfall and other weather variables with road traffic volume in Melbourne, Australia. Accident Analysis and Prevention, 37(1), 109–124. http://doi.org/10.1016/j.aap.2004.07.005
Lin, Z. (2014). Optimizing and Diversifying Electric Vehicle Driving Range for U.S. Drivers. Transportation Science, 48(4), 635–650. http://doi.org/10.1287/trsc.2013.0516
Madina, C., Zamora, I., & Zabala, E. (2016). Methodology for assessing electric vehicle charging infrastructure business models. Energy Policy, 89, 284–293. http://doi.org/10.1016/j.enpol.2015.12.007
Mak, H.-Y., Rong, Y., & Shen, Z.-J. M. (2013). Infrastructure planning for electric vehicles with battery swapping. Management Science, 59(7), 1557–1575.
Maron, M. (2015). How complete is OpenStreetMap?
McCormack, D. E., Sanborn, S., & Rhett, D. (2013). Plugged In: The Last Mile. Who will build out and pay for electric vehicle public charging infrastructure? Retrieved from http://www.deloitte.com/assets/Dcom-UnitedStates/Local Assets/Documents/Energy_us_er/us_er_PluggedInLastMile_June2013.pdf
Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7(DEC). http://doi.org/10.3389/fnbot.2013.00021
Neubauer, J., & Wood, E. (2014). The impact of range anxiety and home, workplace, and public charging infrastructure on simulated battery electric vehicle lifetime utility. Journal of Power Sources, 257, 12–20.
Nigro, N., & Frades, M. (2015). Business models for financially sustainable EV charging networks. Center for Climate and Energy Solutions.
Nykodym, T., Kraljevic, T., Hussami, N., Rao, A., & Wang, A. (2016). Generalized Linear Modeling with H2O’s R. Mountain View, CA. Retrieved from http://docs.h2o.ai/h2o/latest-stable/h2odocs/booklets/GLMBooklet.pdf
OpenStreetMap. (2016a). Accuracy OpenStreetMap Wiki.
OpenStreetMap. (2016b). Map Features OpenStreetMap Wiki. Retrieved August 11, 2016, from http://wiki.openstreetmap.org/wiki/Map_Features
OpenStreetMap. (2016c). Quality Ensurance OpenStreetMap Wiki.
Pearre, N. S., Kempton, W., Guensler, R. L., & Elango, V. V. (2011). Electric vehicles: How much range is required for a day’s driving? Transportation Research Part C: Emerging Technologies, 19(6), 1171–1184. http://doi.org/10.1016/j.trc.2010.12.010
Rauh, N., Franke, T., & Krems, J. F. (2015). Understanding the Impact of Electric Vehicle Driving Experience on Range Anxiety. Human Factors: The Journal of the Human Factors and Ergonomics Society, 57(1), 177–187. http://doi.org/10.1177/0018720814546372
Rezvani, Z., Jansson, J., & Bodin, J. (2015). Advances in consumer electric vehicle adoption research: A review and research agenda.
Transportation Research Part D: Transport and Environment, 34, 122–136. http://doi.org/10.1016/j.trd.2014.10.010
Sadeghi-Barzani, P., Rajabi-Ghahnavieh, A., & Kazemi-Karegar, H. (2014). Optimal fast charging station placing and sizing. Applied Energy, 125, 289–299.
San Diego Gas & Electric. (2016). SDG&E to Install Thousands of Electric Vehicle Charging Stations.
San Diego Planning Department. (2016). Community Profiles.
San Román, T. G., Momber, I., Abbad, M. R., & Sánchez Miralles, Á. (2011). Regulatory framework and business models for charging plug-in electric vehicles: Infrastructure, agents, and commercial relationships. Energy Policy, 39(10), 6360–6375. http://doi.org/10.1016/j.enpol.2011.07.037
Sathaye, N., & Kelley, S. (2013). An approach for the optimal planning of electric vehicle infrastructure for highway corridors. Transportation Research Part E: Logistics and Transportation Review, 59, 15–33. http://doi.org/10.1016/j.tre.2013.08.003
Schneider, M., Stenger, A., & Goeke, D. (2014). The Electric Vehicle-Routing Problem with Time Windows and Recharging Stations, (December 2015).
Schroeder, A., & Traber, T. (2012). The economics of fast charging infrastructure for electric vehicles. Energy Policy, 43, 136–144. http://doi.org/10.1016/j.enpol.2011.12.041
Shahraki, N., Cai, H., Turkay, M., & Xu, M. (2015). Optimal locations of electric public charging stations using real world vehicle travel patterns. Transportation Research Part D: Transport and Environment, 41, 165–176. http://doi.org/10.1016/j.trd.2015.09.011
Shmueli, G., & Koppius, O. (2011). Predictive analytics in information systems research. MIS Quarterly, 35(3), 553–572. http://doi.org/10.2139/ssrn.1606674
Shukla, A., Pekny, J., & Venkatasubramanian, V. (2011). An optimization framework for cost effective design of refueling station infrastructure for alternative fuel vehicles.
Computers & Chemical Engineering, 35(8), 1431–1438. http://doi.org/10.1016/j.compchemeng.2011.03.018
Skippon, S., & Garwood, M. (2011). Responses to battery electric vehicles: UK consumer attitudes and attributions of symbolic meaning following direct experience to reduce psychological distance. Transportation Research Part D: Transport and Environment, 16(7), 525–531. http://doi.org/10.1016/j.trd.2011.05.005
Smith, M., & Castellano, J. (2015). Costs Associated With Non-Residential Electric Vehicle Supply Equipment.
Sober, E. (2002). Instrumentalism, Parsimony, and the Akaike Framework. Philosophy of Science, 69(S3), S112–S123. http://doi.org/10.1086/341839
Song, J., & Chung, K. (2011). Observational studies: Cohort and case-control studies. National Institute of Health, 126(6), 2234–2242. http://doi.org/10.1097/PRS.0b013e3181f44abc.Observational
Speidel, S., & Braunl, T. (2014). Driving and charging patterns of electric vehicles for energy usage. Renewable and Sustainable Energy Reviews, 40, 97–110. http://doi.org/10.1016/j.rser.2014.07.177
U.S. Department of Energy. (2016). Developing Infrastructure to Charge Plug-In Electric Vehicles. Retrieved July 1, 2016, from http://www.afdc.energy.gov/fuels/electricity_infrastructure.html
Untermann, R., & Lewicki, L. (1984). Accommodating the Pedestrian: Adapting Towns and Neighborhoods for Walking and Biking. Michigan: Van Nostrand Reinhold.
US Department of Energy. (2015). Costs Associated With Non-Residential Electric Vehicle Supply Equipment, (November).
US Department of State. (2014). U.S. Climate Action Report 2014, 310. Retrieved from http://www.state.gov/documents/organization/219038.pdf
Van Der Goot, D. (1982). A model to describe the choice of parking places. Transportation Research Part A: General, 16(2), 109–115. http://doi.org/10.1016/0191-2607(82)90003-6
Veness, C. (2015). Vincenty solutions of geodesics on the ellipsoid.
Wagner, S., Brandt, T., & Neumann, D. (2014). Smart City Planning - Developing an Urban Charging Infrastructure for Electric Vehicles. Proceedings of the European Conference on Information Systems (ECIS) 2014, 1–15.
Wagner, S., Brandt, T., & Neumann, D. (2015). In free float: Developing Business Analytics support for carsharing providers. Omega, 59, 4–14. http://doi.org/10.1016/j.omega.2015.02.011
Woodcock, J., Edwards, P., Tonne, C., Armstrong, B. G., Ashiru, O., Banister, D., … Roberts, I. (2009). Public health benefits of strategies to reduce greenhouse-gas emissions: urban land transport. The Lancet, 374(9705), 1930–1943. http://doi.org/10.1016/S0140-6736(09)61714-1
World Energy Council. (2015). World Energy Perspective: The road to resilience − managing and financing extreme weather risks.
Xi, X., Sioshansi, R., & Marano, V. (2013). Simulation-optimization model for location of a public electric vehicle charging infrastructure. Transportation Research Part D: Transport and Environment, 22, 60–69. http://doi.org/10.1016/j.trd.2013.02.014

Appendices

Appendix 1. Extracted points of interest
POI type                    POI category        Count
Accountant                  Office              2
Airport gate                Airport             3
Alcohol store               F&B store           8
Antiques store              Shopping            2
Arts store                  Shopping            2
Arts center                 Entertainment       2
Artwork                     Tourism             14
ATM                         Financial           34
Attraction                  Tourism             15
Bakery                      F&B store           5
Bank                        Financial           26
Bar                         F&B                 140
Beauty store                Shopping            6
Beverage store              Shopping            2
Bicycle parking             Transportation      11
Bicycle rental              Transportation      1
Bicycle store               Shopping            10
Books store                 Shopping            7
Boutique                    Shopping            1
Bureau de change            Financial           1
Café                        F&B                 79
Car parts store             Shopping            1
Car rental                  Transportation      8
Car repair store            Shopping            12
Car store                   Shopping            1
Car wash                    Transportation      4
Caravan site                Accommodation       1
Carpet store                Shopping            1
Cinema                      Entertainment       3
Clinic                      Healthcare          2
Clothes store               Shopping            40
College                     Education           1
Community center            Entertainment       3
Company                     Office              4
Computer store              Shopping            1
Convenience store           Shopping            48
Copyshop                    Shopping            3
Courthouse                  Courthouse          3
Craft store                 Shopping            1
Department store            Shopping            9
Doctor                      Healthcare          2
DIY store                   Shopping            3
Dry cleaning                Shopping            5
Electronics store           Shopping            6
Estate agent                Office              3
Fashion store               Shopping            4
Fastfood                    F&B                 157
Fire station                Fire station        12
Florist                     Shopping            3
Fountain                    Entertainment       1
Frame store                 Shopping            1
Fuel                        Transportation      30
Furniture store             Shopping            5
Games store                 Shopping            2
Garden                      Leisure             1
Garden center               Shopping            2
Gift store                  Shopping            17
Government                  Office              1
Graveyard                   Graveyard           3
Greengrocer                 F&B store           1
Hairdresser                 Shopping            17
Herbalist store             Shopping            1
Hospital                    Healthcare          17
Hostel                      Accommodation       2
Hotel                       Accommodation       49
Houseware store             Shopping            1
Information                 Tourism             7
Interior decoration store   Shopping            3
Jewelry store               Shopping            7
Kindergarten                Education           1
Kiosk                       Shopping            3
Laundry store               Shopping            5
Lawyer                      Office              3
Library                     Education           19
Mall                        Shopping            1
Marina                      Leisure             1
Massage store               Shopping            1
Memorial                    Historic            4
Mobile phone store          Shopping            12
Money lender                Financial           4
Monument                    Historic            2
Motel                       Accommodation       17
Motorcycle store            Shopping            2
Museum                      Tourism             14
Newsagent                   Shopping            1
Nightclub                   Entertainment       2
Office                      Office              2
Optician                    Shopping            1
Outdoor store               Shopping            1
Park                        Leisure             49
Parking                     Transportation      9
Parking entrance            Transportation      19
Parking space               Transportation      3
Pet store                   Shopping            4
Pharmacy                    Healthcare          13
Photo store                 Shopping            3
Place of worship            Place of worship    274
Platform                    Public transport    6
Playground                  Leisure             6
Police                      Police              1
Post office                 Post                11
Postbox                     Post                21
Prison                      Prison              2
Pub                         F&B                 13
Railway station             Public transport    4
Restaurant                  F&B                 250
Station                     Public transport    77
School                      Education           1
Second hand store           Shopping            1
Ship                        Historic            7
Shoe store                  Shopping            2
Social facility             Healthcare          5
Sports center               Sports              5
Sports store                Shopping            5
Station                     Public transport    29
Stationery store            Shopping            2
Stop position               Public transport    5
Supermarket                 F&B store           40
Swimming pool               Sports              2
Swimming sport              Sports              3
Taxi                        Transportation      2
Tea store                   F&B store           1
Theatre                     Entertainment       12
Ticket store                Shopping            1
Tobacco store               Shopping            1
Toilets                     Toilets             29
Tram stop                   Public transport    1
University                  Education           2
Variety store               Shopping            4
Veterinary                  Healthcare          1
Video game store            Shopping            1
Warehouse                   Shopping            1
Wine store                  F&B store           1
Zoo                         Tourism             1

Appendix 2. Example non-linearity predictor and response

Appendix 3. Density response variable