Modeling the Impacts of Inclement Weather on Freeway Traffic Speed: An Exploratory Study Utilizing Social Media Data

Lei Lin
Graduate Research Assistant
Department of Civil, Structural, and Environmental Engineering
University at Buffalo, the State University of New York, Buffalo, NY 14260
Phone: (716) 645-4347 FAX: (716) 645-3733 E-mail: [email protected]

Ming Ni
Graduate Research Assistant
Department of Industrial & Systems Engineering
University at Buffalo, the State University of New York, Buffalo, NY 14260
Phone: (716) 645-3470 FAX: (716) 645-3302 E-mail: [email protected]

Qing He, Ph.D.
Stephen Still Assistant Professor
Department of Civil, Structural, and Environmental Engineering
Department of Industrial & Systems Engineering
University at Buffalo, the State University of New York, Buffalo, NY 14260
Phone: (716) 645-3470 FAX: (716) 645-3302 E-mail: [email protected]

Jing Gao, Ph.D.
Assistant Professor
Department of Computer Science and Engineering
University at Buffalo, the State University of New York, Buffalo, NY 14260
Phone: (716) 645-1586 E-mail: [email protected]

Adel W. Sadek, Ph.D.***
Professor, Department of Civil, Structural, and Environmental Engineering
Director, Institute for Sustainable Transportation and Logistics
Director, Transportation Informatics Tier I University Transportation Center
University at Buffalo, the State University of New York, Buffalo, NY 14260
Phone: (716) 645-4367 FAX: (716) 645-3733 E-mail: [email protected]

Transportation Research Board 94th Annual Meeting
Washington, D.C.

*** Corresponding Author

Submission Date: August 1, 2014
Word Count: 5,285 text words + 2 Figures + 5 Tables = 7,035 equivalent words

Lin, Ni, He, Gao & Sadek 2

ABSTRACT

Recently, there has been an increased interest in quantifying and modeling the impact of inclement weather on transportation system performance.
One problem that the majority of previous research studies on the topic have faced is that they largely depended on weather data merely from atmospheric weather stations, which lacked information about road surface condition. The emergence of social media platforms, such as Twitter and Facebook, provides a new opportunity to extract more weather related data from such platforms. The current study has two primary objectives: first, to examine if real world weather events can be inferred from social media data, and second, to determine whether including weather variables, extracted from social media data, can improve the predictive accuracy of models developed to quantify the impact of inclement weather on freeway traffic speed. To achieve those objectives, weather data, Twitter data, and traffic information were compiled for the Buffalo-Niagara metropolitan area as a case study. A method called the Twitter Weather Events observation was then applied to the Twitter data, and the sensitivity and false alarm rate for the method were evaluated against real world weather data. Following this, linear regression models for predicting the impact of inclement weather on freeway speed were developed with and without the Twitter-based weather variables incorporated. The results indicate that Twitter data has a relatively high sensitivity for predicting inclement weather (i.e., snow), especially during the daytime and for areas with significant snowfall. They also show that the incorporation of Twitter-based weather variables can help improve the predictive accuracy of the models.

Key Words: Social Media; Twitter; Average Traffic Speed; Inclement Weather; Linear Regression; Weather Event Observation.

INTRODUCTION

It is widely acknowledged that inclement weather could have a significant impact on transportation system safety, traffic flow characteristics, and road infrastructure.
For example, with respect to traffic safety, a recent report by the Federal Highway Administration (FHWA) shows that every year on average 23% of all vehicle crashes (more than 1.3 million) are a result of inclement weather, and that 6,250 people are killed and over 480,000 people are injured in those weather-related crashes (1). Quantifying such an impact will help (1) the general public, in terms of better planning their trips; and (2) transportation operators, in terms of how to best operate the transportation system during inclement weather conditions. Given this, several research studies have recently attempted to quantify and model such an impact.

Among the numerous methods recently proposed in the literature, two groups of approaches have received wide attention. The first group is statistical analysis. For example, Bartlett et al. (2) applied a linear regression model to derive a relationship between hourly traffic volumes and weather factors in Buffalo, NY. Stern et al. (3) explored the impact of inclement weather on travel time for road segments in Washington DC using a two-step linear regression process. Vlahogianni and Karlaftis (4) analyzed and compared the statistical characteristics of per lane speed time series under fine and adverse weather conditions based on recurrence quantification analysis. The second group is traffic simulation. For example, Zhao et al. (5) calibrated a TRANSIMS model to inclement weather conditions, and then used the model to study the network-wide impacts of snow storms. Also, Rakha et al. (6) calibrated the INTEGRATION traffic simulation model to reflect inclement weather and roadway conditions. Based on the simulation model, they showed that both rain and snow result in a reduction in vehicle speeds.
Although these previous studies provided insight into the impact of adverse weather, one problem that the majority of those studies faced stems from the fact that they largely depended on weather data merely from atmospheric weather stations. While atmospheric weather stations typically provide measurements of precipitation, visibility, and wind speed, other critical weather related data, most importantly information about the road surface condition, are missing. This has limited the ability to accurately model adverse weather impact. For example, even after a storm ends, icy conditions may persist on the roads. On the other hand, even during a storm, vehicles’ speeds may not be appreciably impacted if the roads were properly plowed. To secure information about road surface conditions, Omer and Fu (7) developed a support vector machine (SVM) model to classify winter road surface images from low cost cameras mounted on regular vehicles. Jonsson (8) tested the use of cost effective infrared detectors to distinguish road surface conditions. As can be seen, quantifying road surface conditions requires specialized, and often costly, equipment and methodologies. Given this, additional research, based on new data streams, is needed to address some of the limitations of the previous work.

Recently, with the emergence of social media tools, such as Twitter and Facebook, people can freely share their feelings and opinions on almost anything. Discussions about adverse weather are also very common on these social media platforms. Motivated by this, some researchers have recently tried to extract weather information from social media data as a complement to traditional weather observations.
For example, Hyvärinen and Saltikoff (9) compared the position of hail detected in the atmosphere by radar with positions of Flickr photos depicting hail on the ground, and showed that Flickr data are useful for cases when photos are available but meteorological data are not. Cox and Plale (10) tried to improve on automatic weather observations with public Twitter streams. Although their study shows the difficulty of connecting Twitter-inferred weather events to atmospheric weather station observations in some cases (when the Twitter stream provides insufficient location information), the authors emphasize the great potential of social media when location information is available.

Moreover, other researchers have started to extract and utilize useful information from social media data, specifically for transportation applications. For example, Ni et al. (11) tried to predict short-term traffic volumes under special event conditions like game days by taking into account information derived from tweets. Their results show that the performance of the prediction models is significantly improved by including social media information such as tweet rate features and semantic features. Other researchers have evaluated the advantages and disadvantages of this new data source. Grant-Muller et al. (12) illustrated that, in particular contexts, social media data can meet the data needs of transportation applications better than many of the current data sources. The researchers also addressed four challenges associated with the text mining process of social media data.

PURPOSE AND SCOPE

The current study has two primary objectives or tasks. The first objective is to examine if real world weather events can be inferred by extracting some characteristics from social media data; this would allow us to determine whether social media data can be correlated to weather events.
To achieve this objective, a method called “Twitter Weather Events observation” will be applied. The second objective is to determine whether including weather variables, extracted from social media data, can improve the predictive accuracy of models developed to quantify the impact of inclement weather on freeway traffic speed. To answer that question, linear regression models will be developed once with Twitter variables included, and once without, and the accuracy of the two model groups will be compared.

To give context to the study, the research will focus on the Buffalo-Niagara metropolitan area, which is well known for its heavy snow in winter. Two atmospheric weather stations were selected to provide weather data, the first located at the Buffalo Niagara International Airport (BNIA), and the other located at the Niagara Falls International Airport (NFIA). Twitter data in this area, with detailed geographic information, were available from 12/01/2013 to 12/19/2013. Because of the microclimate effect caused by Lake Erie, which results in weather conditions varying significantly over a small geographic area, only traffic data from those links closest to the atmospheric weather stations were considered. For Twitter data, only tweets located within a circular area centered on the weather station, with a radius of 15 miles, were considered.

The paper is organized as follows. The methodology section will introduce the method used to infer real world weather events from social media data. The next section will discuss the data used in the study and how they were preprocessed prior to building the models. The data analysis and modeling results are then presented; finally, the paper concludes by summarizing the major conclusions of the study and future study plans.
METHODOLOGY

As mentioned, one important task in this study is to show that inclement weather conditions in the real world can be inferred from social media. With the extracted social media weather data as a complement, we can then better explain the relationship between inclement weather and traffic behavior. The following section will describe how weather events may be discerned from Twitter stream data. It will also introduce two criteria to compare and verify observations derived from the tweets with real world weather events.

Twitter Weather Events Observation

Twitter provides an interface which allows users to post up to 140 characters. These posts are called “tweets”, and can be read by other Twitter users. Users of Twitter can subscribe to other users, and when they do that, they are known as followers. The "@" sign followed by a username is used for mentioning or replying to other users. Hashtags are words or phrases prefixed with a "#" sign, which are used to group together posts that refer to a certain topic. As of July 2014, an estimated 9,100 tweets are posted every second (13).

As defined by Cox and Plale (10), there are two types of weather events that can be observed in Twitter stream data. The first type is called a “Weather Utterance Event”. Once a particular weather related word, like “snow” for example, is discovered in the text of a tweet, we say that a “Weather Utterance Event” has been observed. However, this simply means that a person in the real world is “tweeting” about the weather; it may or may not be true that a weather event really happened in the real world. The second type is called a “Weather Report Event”. This refers to the case where a hashtag like “#weather” is used to report a weather event in that tweet, and only specific information regarding the weather is contained.
Table 1 below illustrates the difference between a “Weather Utterance Event” and a “Weather Report Event”. As can be seen, the first example of a “Weather Utterance Event” clearly shows that it was snowing at that time and the traffic was impacted, while the second example of a “Weather Utterance Event” does not indicate that it was really snowing. For the “Weather Report Event”, on the other hand, the tweet example used several hashtags to report a specific weather event, and even attached an “http” link which points to a snow picture. We can see that “Weather Report Events” are more precise compared to “Weather Utterance Events”; however, they occur less frequently.

Table 1 Examples of Weather Utterance Event and Weather Report Event

Weather Utterance Event
  Longitude: -78.6706  Latitude: 42.91815  Created At: 2013/12/12 16:44
  Text: The roads are a hot mess out in the burbs all over. Snowing like CRAZY up in here...drive safe everyone.

Weather Utterance Event
  Longitude: -78.8107  Latitude: 42.90562  Created At: 2013/12/12 17:14
  Text: I hate the cold but I love the snow.

Weather Report Event
  Longitude: -78.8258  Latitude: 42.95148  Created At: 2013/12/12 17:59
  Text: #BuffaloNY #Weather #Outside. #Cold #Snowing #Windy. @ Parkside Candy http://t.co/IfyzICtGPW

Because only one type of weather event is not enough to infer a real world inclement weather condition, Cox and Plale (10) proposed what they called a “Twitter Weather Event” by aggregating the uttered and reported weather events within a spatial region $r$ and time interval $t$. The definition is simple, as shown in Equation 1. Once the number of uttered and reported events is over a threshold $\theta$, a Twitter Weather Event is said to have been observed.

$$E_{r,t} = \begin{cases} 1, & \text{if } U_{r,t} + R_{r,t} > \theta \\ 0, & \text{otherwise} \end{cases} \qquad \text{[Equation 1]}$$

Where, $E_{r,t} = 1$ means a Twitter Weather Event is observed, 0 means not; $U_{r,t}$ is the number of Weather Utterance Events; $R_{r,t}$ is the number of Weather Report Events.

The Twitter Weather Event is then used to indicate the occurrence of a real world inclement weather event in the same meteorologically significant region $r$ and time interval $t$.
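The aggregation behind Equation 1 can be illustrated with a minimal Python sketch. It is not the implementation used in the study. It assumes that a tweet containing the keyword "snow" inside a hashtag counts as a Weather Report Event, that any other tweet containing the keyword counts as a Weather Utterance Event, and that the threshold comparison is strict. The function names (`classify_tweet`, `hourly_event_counts`, `twitter_weather_event`) and the threshold of 1 used in the demonstration are hypothetical; the sample tweets are the three rows of Table 1.

```python
import re
from collections import defaultdict
from datetime import datetime

KEYWORD = "snow"  # keyword used for winter data; the matching rule below is an assumption


def classify_tweet(text):
    """Return 'report', 'utterance', or None for a tweet's text.

    Assumption: a hashtag containing the keyword marks a Weather Report Event;
    the keyword anywhere else marks a Weather Utterance Event.
    """
    lowered = text.lower()
    if KEYWORD not in lowered:
        return None
    hashtags = re.findall(r"#(\w+)", lowered)
    if any(KEYWORD in tag for tag in hashtags):
        return "report"
    return "utterance"


def hourly_event_counts(tweets):
    """Count utterance and report events per one-hour interval."""
    counts = defaultdict(lambda: {"utterance": 0, "report": 0})
    for created_at, text in tweets:
        kind = classify_tweet(text)
        if kind is None:
            continue
        # Truncate the timestamp to the start of its hour.
        hour = datetime.strptime(created_at, "%Y/%m/%d %H:%M").replace(minute=0)
        counts[hour][kind] += 1
    return dict(counts)


def twitter_weather_event(n_utterance, n_report, threshold):
    """Indicator of Equation 1: 1 if the event count is over the threshold."""
    return 1 if n_utterance + n_report > threshold else 0


# The three example tweets from Table 1 (timestamp, text).
TABLE_1_TWEETS = [
    ("2013/12/12 16:44", "The roads are a hot mess out in the burbs all over. "
                         "Snowing like CRAZY up in here...drive safe everyone."),
    ("2013/12/12 17:14", "I hate the cold but I love the snow."),
    ("2013/12/12 17:59", "#BuffaloNY #Weather #Outside. #Cold #Snowing #Windy. "
                         "@ Parkside Candy http://t.co/IfyzICtGPW"),
]

if __name__ == "__main__":
    for hour, c in sorted(hourly_event_counts(TABLE_1_TWEETS).items()):
        flag = twitter_weather_event(c["utterance"], c["report"], threshold=1)
        print(hour, c, "event" if flag else "no event")
```

Substring matching on "snow" is deliberately crude: it also matches words such as "snowing", which seems desirable here, but it cannot tell whether the tweet refers to local weather.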
Suppose at a region $r$, the real world weather records in $N$ time intervals are available, and that in the real world, there were $N_1$ time intervals when inclement weather events happened and $N_2$ time intervals when no inclement weather events occurred ($N = N_1 + N_2$). Based on this, we can define the following two criteria, namely the sensitivity and false alarm rate criteria, to evaluate the performance of the Twitter Weather Event method. Those two metrics are calculated as shown in Equation 2 and Equation 3.

$$\text{Sensitivity} = \frac{1}{N_1} \sum_{t \in T_1} I\left(E_{r,t} = 1\right) \qquad \text{[Equation 2]}$$

$$\text{False Alarm Rate} = \frac{1}{N_2} \sum_{t \in T_2} I\left(E_{r,t} = 1\right) \qquad \text{[Equation 3]}$$

Where, $E_{r,t}$ is the Twitter Weather Event indicator for region $r$ and time interval $t$ (1 if a Twitter Weather Event is observed, 0 if not), $T_1$ is the set of the $N_1$ time intervals with real world inclement weather, $T_2$ is the set of the $N_2$ time intervals without it, and

$$I(e) = \begin{cases} 1, & \text{if } e \text{ is true} \\ 0, & \text{if } e \text{ is false} \end{cases}$$

In other words, the “sensitivity” metric identifies the proportion of real world inclement weather events which were correctly inferred as such by the Twitter Weather Event method, and the “false alarm rate” metric captures the proportion of normal weather conditions that were wrongly indicated as inclement weather events by the method. A high sensitivity and a low false alarm rate would demonstrate that “tweets” can serve as a good data source for identifying inclement weather conditions.

MODELING DATASETS

Three different types of data were utilized in this study, namely: (1) weather data recorded by the two atmospheric weather stations previously mentioned; (2) Twitter data; and (3) freeway traffic data from the Buffalo, NY metropolitan area. Because only the Twitter data from 12/01/2013 to 12/19/2013 were available in this study, the weather data and freeway traffic data corresponding to the same time period were extracted. Table 2 provides a summary of the different data items utilized in the study, and the following sections describe those items in more detail.
Table 2 Summary of the Weather Data, Twitter Data and Traffic Data

Weather data
  1. Temperature. Values: 1 (> 32 ºF); 0 (≤ 32 ºF)
  2. Visibility. Values: 0-10 miles
  3. Snow Precipitation. Values: 0+ inch, …
  4. Wind Speed. Values: 0+ mph, …
  5. Weather Condition-Clear (=1 when it is true, otherwise 0). Values: 0, 1
  6. Weather Condition-Cloudy (=1 when it is true, otherwise 0). Values: 0, 1
  7. Weather Condition-Rain (=1 when it is true, otherwise 0). Values: 0, 1
  8. Weather Condition-Snow (=1 when it is true, otherwise 0). Values: 0, 1

Twitter data
  1. Number of Weather Utterance and Report Events per hour (no_of_snow). Values: 0, 1, 2, …
  2. Number of Weather Report Events per hour (no_of_snow_hashtag). Values: 0, 1, 2, …
  3. Number of independent Twitter users per hour publishing Weather Utterance and Report Events (no_of_snow_user). Values: 0, 1, 2, …
  4. Number of Weather Utterance and Report Events per hour mentioned by the other users (no_of_snow_@). Values: 0, 1, 2, …
  5. Number of Weather Utterance and Report Events per hour containing URLs (no_of_snow_http). Values: 0, 1, 2, …
  6. Number of Weather Utterance and Report Events per hour related to “melt” (no_of_snow_melt). Values: 0, 1, 2, …
  7. Number of Weather Utterance and Report Events per hour related to “road” (no_of_snow_road). Values: 0, 1, 2, …

Traffic data
  1. Hourly Traffic Counts (Counts_N0198S_lane1, …). Values: 0, 1, 2, … vph
  2. Hourly Traffic Speed Standard Deviation (Speed_Std_N0198N_lane1, …). Values: 0+ mph, …
  3. Average Traffic Speed (Speed_M4183E_lane1, …). Values: 0+ mph, …

Weather Data

The Buffalo region is an ideal location for studying the impact of inclement weather, particularly snow, on traffic flow. The number of days in Buffalo with appreciable snowfall (more than 0.1 inch) accounts for 17 percent of all days annually (14). Weather data were downloaded from Weather Underground, which is a commercial weather service that provides real-time and historical weather information via the Internet (15). Their data sources include personal weather stations with quality control and the automated airport weather stations.
Considering the need for safe and efficient aviation operations, higher precision standards are required at airports. In this study, weather data from two airport weather stations in the area were utilized: the Buffalo Niagara International Airport (BNIA) and the Niagara Falls International Airport (NFIA). As can be seen in Table 2, the weather data include eight variables: the temperature, visibility, wind speed, snow precipitation, and the weather conditions clear, cloudy, rain and snow. The temperature variable is converted into a binary variable, higher than 32 ºF or not. The visibility (0-10 miles), wind speed (mph), and snow precipitation (inches) are kept as continuous variables. The weather conditions clear, cloudy, rain and snow are four binary variables. The values of the eight variables are recorded for each hour.

Twitter Data

Twitter, with its ability to geo tag the location of a message, is ideal for providing weather information from observers (16). Twitter data was collected through the Twitter Streaming API with a geo-location filter (17). After excluding spam and commercial tweets, a total of 360,112 tweets with longitude and latitude located within the Buffalo area, and within the time interval from 12/01/2013 to 12/19/2013, were assembled. The original Twitter data is in the form of short text messages. As mentioned in the Methodology section, one needs to define one or more keywords to identify the Weather Utterance Events and the Weather Report Events. Considering that the data time period was in the winter, we use the keyword “snow” to identify those tweets that are likely to be relevant to inclement weather. The spatial region is defined as a circular area whose center is located at the weather station and whose radius is equal to 15 miles. The time interval is defined as one hour. One thing that needs to be pointed out is that the two circles drawn based on the two weather stations overlap with each other.
If a Weather Utterance/Report Event is located within the overlapping region, it is counted as a record belonging to the circular region whose center is closer to the event. Table 2 lists the seven variables extracted from Twitter data. The first variable refers to the total number of Weather Utterance and Report Events per hour. However, because Weather Report Events are more accurate than utterances, a second variable capturing only the number of Weather Report Events per hour is defined. The third variable refers to the number of independent users per hour who publish those Weather Utterance and Report Events. Variables 4 through 7 refer to the number of tweets in those Weather Utterance and Report Events per hour which contain “@”, “http”, “melt” and “road”, respectively. The notations in the parentheses will be used to refer to these variables thereafter.

Traffic Data

Traffic data were obtained from the detectors maintained by the Niagara International Transportation Technology Coalition (NITTEC), an organization made up of fourteen transportation-related agencies in the Western New York and Southern Ontario area. The data consist of hourly traffic counts for each lane of several freeway links. They also include the number of vehicles in the different speed bins for each lane, aggregated for each hour.

[Figure 1a Closest Traffic Detectors to BNIA (I-90). Map labels: M4183E, M4183W, Weather Station, 0.73 miles]

[Figure 1b Closest Traffic Detectors to NFIA (I-190). Map labels: N0198S, N0198N, Weather Station, 4.08 miles]

As mentioned previously, because of the microclimate effect for which Lake Erie is responsible, the weather could vary significantly over a small geographic area. Given this, the study only considered traffic data from the detectors closest to the weather stations at BNIA and NFIA. As can be seen from Figure 1a, the two detectors closest to the weather station at BNIA are M4183E and M4183W on interstate I-90.
For NFIA, the two closest detectors are N0198S and N0198N on I-190 (see Figure 1b). Also shown on Figure 1 are the distances between the detectors and the airports’ weather stations. Table 2 also lists the three traffic variables extracted from the traffic dataset. For each lane and each direction, the hourly traffic volume, the average traffic speed and the traffic speed standard deviation for each hour were calculated. The notations in the parentheses will be used to refer to these variables thereafter.

MODELING DEVELOPMENT AND RESULTS

This section will start with a preliminary analysis of the data. Next, the sensitivity and the false alarm rate for the Twitter data in terms of identifying snow events will be quantified. Finally, linear regression models for predicting the average traffic speed as a function of weather conditions will be developed, once with the Twitter weather data included and once without, to discern whether Twitter data can help improve the accuracy of such models.

Preliminary Data Analytics

As an example, Figure 2 shows a part of the hourly traffic volume (top plot) and speed data (second from top) as recorded by detector M4183E, Twitter data (specifically the number of weather utterance and report events; third plot from top) and hourly snow precipitation (bottom plot) in the BNIA dataset.

[Figure 2 Weather Data, Twitter Data and Traffic Data in BNIA Dataset]

A few observations can be made. Firstly, we can see there are some connections among the snow precipitation, the number of Twitter Utterance and Report Events per hour, the hourly traffic volume and the average traffic speed. When there is a peak in the hourly snow precipitation curve, the number of Twitter Utterance and Report Events per hour is also higher and the average traffic speed is much lower than its normal level, as in the yellow marked areas in Figure 2.
On a few occasions, the hourly traffic volume, in addition to the average speed, decreased from its normal level, as in the red marked area from 12/14/2013 to 12/15/2013. This was a period of two days of continuous snow, which led to a decrease not only in the speed but also in the hourly traffic volume.

Secondly, we can see that most tweets are posted during the daytime. As a result, when the inclement weather event occurs at a time period close to midnight, like the dark regions marked in Figure 2, there are no obvious fluctuations in the Twitter data curve. It is therefore reasonable to split Twitter data into two groups, daytime and nighttime, and develop separate linear regression models for predicting the impact of inclement weather on traffic flow. In our previous short term traffic volume prediction studies, we defined the daytime interval as 7:00 AM-9:00 PM (18-19). This paper follows the same definition.

Finally, we find that although the geographic locations of the Weather Utterance and Report Events are in the Buffalo area, the tweets may refer to inclement weather conditions occurring at other places. This was the case for the green marked region in Figure 2; on 12/03/2013, there was no snow in Buffalo, and the hourly traffic volume and average traffic speed were normal, but a number of tweets talking about “snow” were observed. When the researchers read those tweets, it was revealed that there was a snowstorm during the NFL game in Pennsylvania between the Philadelphia Eagles and Detroit Lions, and people in Buffalo were “tweeting” about the inclement weather while they were watching the game.

Twitter Weather Events Observation

As a result of the preliminary data analysis, the data are split into two groups: a daytime group from 7:00 AM to 9:00 PM with 270 records and a nighttime group from 10:00 PM to 06:00 AM with 162 records.
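The daytime/nighttime split can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: each hourly record is assumed to be labeled by its starting hour, so the daytime group covers hour labels 7 through 21 and the nighttime group covers labels 22 through 6. The value of 18 days is also an assumption, chosen because 18 days of 24 hourly records reproduces the reported group sizes; the names `is_daytime` and `split_by_period` are hypothetical.

```python
DAYTIME_FIRST_HOUR = 7   # 7:00 AM
DAYTIME_LAST_HOUR = 21   # 9:00 PM (hour label, assumed inclusive)


def is_daytime(hour):
    """True if an hourly record labeled `hour` (0-23) belongs to the daytime group."""
    return DAYTIME_FIRST_HOUR <= hour <= DAYTIME_LAST_HOUR


def split_by_period(records):
    """Split hourly records (dicts with an 'hour' key) into daytime and nighttime lists."""
    daytime, nighttime = [], []
    for record in records:
        (daytime if is_daytime(record["hour"]) else nighttime).append(record)
    return daytime, nighttime


# Assumed number of full days of hourly records (not stated in the text).
N_DAYS = 18
records = [{"day": d, "hour": h} for d in range(N_DAYS) for h in range(24)]
daytime, nighttime = split_by_period(records)

if __name__ == "__main__":
    print(len(daytime), len(nighttime))  # 270 162
```

Under these assumptions each day contributes 15 daytime and 9 nighttime records, which matches the 270 and 162 records reported for the two groups.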
Before applying the Twitter Weather Events Observation method, the threshold in Equation 1 needed to be defined. For the daytime group it was set to 5. For the nighttime group, considering that few people use Twitter at night, the threshold was set lower, specifically to 3. Then, based on Equation 2 and Equation 3, the sensitivity and false alarm rate were calculated as shown in Table 3.

Table 3 Sensitivity & False Alarm Rate for Twitter Weather Events Observations

Dataset  Group                     Number of Snow Records  Snow Precipitation (inches)  Sensitivity  False Alarm Rate
BNIA     Daytime (270 records)     86                      1.06                         68.60%       11.41%
BNIA     Nighttime (162 records)   45                      0.82                         40%          17.95%
NFIA     Daytime (270 records)     55                      0.44                         32.73%       1%
NFIA     Nighttime (162 records)   25                      0.28                         8%           1.5%

From Table 3, first we can see a confirmation of the microclimate phenomenon. Although the two weather stations are both located in the Buffalo-Niagara metropolitan area (the distance between them is about 20 miles), it is obvious that the BNIA weather station witnessed more snow than the one at NFIA. Specifically, for BNIA, there were 86 snow records in the daytime group and 45 in the nighttime group. For NFIA, the two numbers are only 55 and 25 respectively. The same conclusion can also be reached by comparing the total snow precipitation: for both daytime and nighttime, the total snow precipitation at BNIA is much higher than that at NFIA.

Second, the Twitter Weather Event observation method performs the best for the daytime group of the BNIA dataset. The sensitivity is as high as 68.60%, and the false alarm rate is only 11.41%. This result shows that the Twitter Weather Event observation method is only effective when there are enough people tweeting about the weather event within the same spatial and temporal region.
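As a check on how the two criteria behave, the sketch below computes the sensitivity and false alarm rate from paired hourly indicators. The function name and the example vectors are hypothetical. The counts used in the example (59 detected snow hours and 21 false alarms) are not reported in the paper; they are back-calculated here as the integers consistent with the BNIA daytime row of Table 3 (86 snow records out of 270, 68.60% and 11.41%).

```python
def sensitivity_and_false_alarm_rate(observed, predicted):
    """Compute (sensitivity, false alarm rate) from 0/1 hourly indicators.

    observed[i]  = 1 if real world inclement weather occurred in interval i
    predicted[i] = 1 if a Twitter Weather Event was observed in interval i
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n1 = sum(1 for o in observed if o)   # intervals with inclement weather
    n2 = len(observed) - n1              # intervals without inclement weather
    if n1 == 0 or n2 == 0:
        raise ValueError("both weather classes must be present")
    hits = sum(1 for o, p in zip(observed, predicted) if o and p)
    false_alarms = sum(1 for o, p in zip(observed, predicted) if not o and p)
    return hits / n1, false_alarms / n2


# Illustrative vectors only: 86 snow hours (59 detected) and 184 non-snow
# hours (21 flagged). The counts 59 and 21 are inferred, not reported.
observed = [1] * 86 + [0] * 184
predicted = [1] * 59 + [0] * 27 + [1] * 21 + [0] * 163

sens, far = sensitivity_and_false_alarm_rate(observed, predicted)

if __name__ == "__main__":
    print(f"sensitivity = {sens:.2%}, false alarm rate = {far:.2%}")
```

With these inferred counts the ratios are 59/86 and 21/184, which round to the 68.60% and 11.41% shown for the BNIA daytime group.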
This dependence on tweet volume explains why the method did not perform well at nighttime, when few people were using Twitter; for the nighttime period, the sensitivity was lower and the false alarm rate was higher for both datasets. For example, for the BNIA dataset, the sensitivity for the nighttime group drops from 68.60% to 40%, and the false alarm rate increases from 11.41% to 17.95%. Similarly, the method doesn’t perform well when there are not many real world inclement weather events which can stimulate people to tweet about them. Although this may lead to a much lower false alarm rate (1% for NFIA versus 11.41% for BNIA), which is good, the sensitivity suffers (32.73% for NFIA versus 68.60% for BNIA during the daytime).

Models for Predicting the Impact of Inclement Weather on Freeway Speed

A linear regression model is used in this paper to quantify and predict the impact of inclement weather on freeway traffic speed during a given hour for each lane and each direction of the two traffic detectors. To assess the benefit of including information from Twitter, in addition to data from the atmospheric weather stations, on the model’s predictive accuracy, the regression was performed twice for each group: once with the Twitter variables included and once without. The R-square values for the models with the Twitter variables were then compared to those for the models without the Twitter variables to see whether there is an improvement in the model’s predictive capability.

Table 4 shows the coefficients of the variables and the R-square values of the linear regression models for the BNIA dataset. The column entitled “No Twitter” refers to the case when the Twitter data were not included in the model, and the column entitled “Twitter” means the Twitter variables were used in building that model. Using a confidence level of 95%, all the p-values of the coefficients were smaller than 0.01, indicating that they are statistically significant.
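The with/without comparison can be sketched with ordinary least squares in NumPy. Everything numeric below is synthetic: the generated data, the coefficients used to generate it, and the resulting R-square values are illustrative assumptions and are unrelated to the results reported in Tables 4 and 5. The helper name `fit_ols` is hypothetical, and the variable names simply mirror the notation of Table 2.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X b by least squares; return (coefficients, R-square)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend the constant term
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


# Synthetic hourly records (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 270
visibility = rng.uniform(0.0, 10.0, n)        # miles
snow_precip = rng.exponential(0.01, n)        # inches per hour
no_of_snow = rng.poisson(3, n).astype(float)  # snow-related tweets per hour
speed = (60.0 + 0.5 * visibility - 150.0 * snow_precip
         - 0.3 * no_of_snow + rng.normal(0.0, 1.5, n))

weather_only = np.column_stack([visibility, snow_precip])
with_twitter = np.column_stack([visibility, snow_precip, no_of_snow])

_, r2_no_twitter = fit_ols(weather_only, speed)
_, r2_twitter = fit_ols(with_twitter, speed)

if __name__ == "__main__":
    print(f"R-square without Twitter: {r2_no_twitter:.3f}")
    print(f"R-square with Twitter:    {r2_twitter:.3f}")
```

One point worth keeping in mind when reading such comparisons is that the in-sample R-square of a least-squares model cannot decrease when a regressor is added, so an adjusted R-square or a held-out error measure would give a stricter check of the added variables.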
Table 4 Results of the Linear Regression Models for BNIA Dataset

Groups Coefficients and R Square Daytime Visibility Snow Precipitation Weather Condition-Rain no_of_snow no_of_snow_melt Constant Nighttime Temperature Visibility Snow Precipitation Wind Speed no_of_snow no_of_snow_http Constant
W4183E_speed lane1 lane2 No Twitter Twitter No Twitter Twitter 0.84 0.48 0.84 0.46 -227.41 -154.54 -234.99 -157.89 -0.24 56.62 0.52 2.04 0.83 -180.15 56.08 0.63 60.63 0.65 -0.25 0.82 -178.49 63.25 0.51 2.55 0.82 -227.78 67.49 0.65 2.33 0.74 -231.57 -1.74 57.11 0.64 61.98 0.63 -1.68 63.02 0.65
W4183W_speed lane1 lane2 No Twitter Twitter No Twitter Twitter 0.45 0.49 -258.52 -176.85 -247.92 -182.29 -12.68 -12.70 -11.26 -11.28 -0.28 -0.26 -4.41 -4.26 63.06 68.17 56.60 62.03 0.28 0.41 0.32 0.45 2.62 2.53 1.87 1.75 0.69 0.51 0.65 0.46 -270.22 -266.97 -182.97 -179.62 -0.12 -0.12 -0.31 -0.34 62.57 0.68 63.35 0.71 56.77 0.68 57.68 0.73

As can be seen in Table 4, first, regardless of the group (i.e., daytime or nighttime), when the Twitter data variables were included in the linear regression models, the R-square values improved, indicating that the Twitter data was useful in terms of improving the model’s fit. This was especially the case for the daytime group, which, as was previously mentioned, corresponded to a higher sensitivity and lower false alarm rate of the Twitter Weather Events observation results than the nighttime group.

Secondly, the statistically significant Twitter variables in the linear regression models included the number of Weather Utterance and Report Events per hour (no_of_snow), the number of Weather Utterance and Report Events per hour related to “melt” (no_of_snow_melt), and the number of Weather Utterance and Report Events per hour containing URLs (no_of_snow_http). These coefficients are all negative, which means the higher these Twitter data variables are, the lower the freeway traffic speed will be.
For the variable “no_of_snow_melt”, after checking the contexts of the tweets, we found that people were sometimes complaining and hoping that the snow would melt soon, which may explain the negative coefficient. Thirdly, for the traditional weather data, the significant variables were temperature, visibility, snow precipitation, weather condition-rain, and wind speed. The signs of these coefficients are all negative, which agrees with intuition.

Similarly, Table 5 shows the results of the linear regression models for the NFIA dataset, where it can also be seen that the inclusion of the Twitter weather variables helped improve the R square values. However, the magnitude of the improvement in the R square values was slightly smaller. This is to be expected since, as previously pointed out, the sensitivity of the NFIA Twitter data was lower than that of the BNIA data. Another difference between the NFIA and BNIA datasets is that only two Twitter variables were found statistically significant for each of the daytime and the nighttime models.
Table 5 Results of the Linear Regression Models for NFIA Dataset

Groups / Coefficients and R Square
Daytime: Visibility, Snow Precipitation, Weather Condition-Clear, Weather Condition-Cloudy, Weather Condition-Rain, Weather Condition-Snow, no_of_snow, Constant
Nighttime: Temperature, Visibility, Weather Condition-Clear, Weather Condition-Cloudy, Weather Condition-Rain, no_of_snow, Constant
lane1 No Twitter Twitter -473.41 -408.16 -3.40 -2.35 -0.47 49.36 0.53 - 49.11 0.48 6.52 5.33 7.72 41.71 0.28
N0198S_speed lane2 No Twitter Twitter 0.36 0.23 -735.38 -611.24
lane3 No Twitter Twitter 0.36 0.21 -807.83 -671.08 3.93 2.98 3.94 2.89 55.55 0.58 -0.57 57.18 0.65 62.22 0.58 -0.62 64.02 0.65 0.44 3.97 4.01 3.96 51.11 0.46 0.48 3.48 3.53 3.36 -0.60 51.41 0.48 0.36 5.50 5.36 5.82 56.75 0.44 0.39 5.02 4.89 5.23 -0.58 57.06 0.46
N0198N_speed lane1 lane2 No Twitter Twitter No Twitter Twitter 0.34 0.32 -647.93 -576.08 -678.41 -785.64 3.49 2.96 3.45 3.87 3.08 3.67 8.48 6.59 8.28 4.51 55.53 0.59 2.30 0.55 3.41 2.74 54.42 0.48 -0.87 59.85 0.66 1.97 0.81 -1.31 55.28 0.49 61.87 0.56 2.29 0.51 4.51 3.78 -1.03 68.82 0.61 2.12 0.55 4.07 3.38 58.62 0.50 -0.79 59.02 0.52

CONCLUSIONS AND FUTURE RESEARCH

This exploratory study first examined whether real-world inclement weather events could be inferred from social media data, and evaluated the sensitivity and false alarm rate associated with doing so. Next, the study assessed whether incorporating information from social media could help improve the predictive accuracy of models developed to predict the impact of inclement weather on freeway traffic speed. Among the main conclusions of the study are:

1. The majority of Weather Utterance Events and Weather Report Events in Twitter tweets occur during the daytime. Given this, the analysis and model development should be conducted separately for daytime and nighttime.
2.
The Twitter Weather Events observation method can be used to infer real-world weather conditions, especially during daytime hours and for areas with frequent snow events and significant precipitation. Special events (e.g., snow during a major sporting event) may affect the quality of the Twitter data, since remote users are likely to tweet about the snow while, for example, watching the game.
3. The R square values of the linear regression models developed to predict the impact of inclement weather on freeway speed are improved when the Twitter weather variables are included. Once again, this is especially true where the Twitter data shows high sensitivity for predicting weather events.

For future research, the researchers plan to acquire additional Twitter data, covering longer time periods and several geographic locations, to validate the conclusions made in this study.

ACKNOWLEDGEMENTS

Partial funding of this research has been provided by the Transportation Informatics Tier I University Transportation Center headed at the University at Buffalo. The authors would like to thank the University Transportation Center program for its financial support.