Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 1 Modeling Crash Risk of Highway Work Zones with Relatively Short Durations Hong Yang, Ph.D. (Corresponding author) Post-doctoral Associate, Department of Civil and Urban Engineering; Center for Urban Science and Progress (CUSP); New York University (NYU) One MetroTech Center, 19th Floor, Brooklyn, NY 11201, USA Phone: (646) 997-0548, Fax: (646) 997-0560 E-mail: [email protected] Kaan Ozbay, Ph.D. Professor, Department of Civil and Urban Engineering; Center for Urban Science and Progress (CUSP); New York University (NYU) One MetroTech Center, 19th Floor, Brooklyn, NY 11201, USA Phone: (646) 997-0552, Fax: (646) 997-0560 E-mail: [email protected] Kun Xie, Ph.D. Candidate, Department of Civil and Urban Engineering; Center for Urban Science and Progress (CUSP); New York University (NYU) One MetroTech Center, 19th Floor, Brooklyn, NY 11201, USA Phone: (646) 997-0547, Fax: (646) 997-0560 E-mail: [email protected] Bekir Bartin, Ph.D. Assistant Professor, Department of Civil Engineering Istanbul Kemerburgaz University Mahmutbey Dilmenler Caddesi No: 26 Bagcilar, 34217, Istanbul / Turkey Phone: 00 1 (212) 604 01 00 Ext. 1411 E-mail: [email protected] Word count: 5189 Texts + 3 Table + 3 Figures = 6689 Abstract: 231 Re-submission Date: November 15, 2014 Paper submitted for Publication in the Transportation Research Record, Journal of Transportation Research Board after being presented Transportation Research Board’s 94th Annual Meeting, Washington, D.C., 2015 TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 2 Abstract Highway work zones greatly affect operational and safety performance of traffic. Existing studies have primarily focused on exploring safety issues of long-term work zones whereas safety issues associated with a large number of relatively shorter duration work zones are seldom examined. Thus, this paper aims to present an investigation of traffic safety in these work zones with relatively short duration. Considering low frequency of crashes due to this type of short duration work zones with respect to its crash condition and non-crash condition, a “rare event” logistic regression model is developed to explore the causal relationship between a set of contributing factors and the crash risk. The proposed model accounts for the imbalance issues of events (crashes) vs. non-event (non-crash) conditions in modeling low frequency crash occurrences. In addition, the model uses actual traffic data to reduce the bias of using aggregated exposure data (i.e. averaged traffic volume over a day, month, year, etc.). The modeling results based on a case study show that the work zone length, traffic volume and lane closure are positively associated with the crash risk in those work zones with short duration. The proposed model with specific model corrections and actual input data is found to improve the depiction of the relationship between these factors and the crash risk based on the comparison of the area under the ROC (Receiver Operating Characteristics) curves of different models. TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 3 INTRODUCTION Transportation network is frequently affected by the disruptions of the increasing number of work zones associated with construction, maintenance, and rehabilitation projects. Such issue is particularly notable for critical freeways and arterials. For example, it was estimated that about 20 percent of the highways in the nation are undergoing some kind of construction work during the peak period each year and the work zones accounted for over 20 percent of non-recurring delays and more than 87 thousand of crashes including 576 fatalities during 2010 (1, 2). This number will be more alarming when more work zones have to be deployed as a result of rapidly aging transportation networks. Many transportation agencies continuously make great efforts to reduce the operational and safety impacts of work zones on road users as well as their own on-site workers by using various work zone deployment strategies and advanced traffic control approaches. In order to make more effective decisions, the underlying knowledge of work zone crash occurrence is essential. Therefore, a number of research projects have been dedicated to explore the potential risk factors associated with work zone crash occurrences and their consequence in recent decades (3). Despite the existence of a large body of research literature developed in recent years, work zone safety issues still need more investigation as many of the associated analysis have focused on long-term work zones. Other than the long-term projects, there are thousands of work zones with relatively short durations (i.e., from several hours to several days) that have been less frequently explored. These work zones can cause very different safety issues compared with those long-term work zones. For instance, drivers are less likely to notice work zones with short duration whereas they become familiar with the existence of long-term work zones after commuting along the same routes regularly. Therefore, the main objective of this paper is to develop a novel statistical model for exploring the causal relationship between crash risk and a set of explanatory variables in work zones with relatively short durations. The most important contribution of this paper is that it treats the crashes for short duration work zones as rare events compared to the long-term work zone crashes for the same section of the same highway. Then a rare event regression model is adopted to specifically model this unique problem. Unlike existing studies only exploring work zones crashes based on aggregate data and crash count models (i.e., Poisson and Negative Binomial), this study looks at the safety issue of work zones from a more comprehensive perspective namely work zones with crash as well as work zones without crash experiences. In addition, it emphasizes the importance of using data that reflect actual work zone conditions to obtain reliable modeling results. LITERATURE REVIEW Numerous studies have been conducted in past decades to examine the influence of work zones on roadway safety. A thorough review has been presented in (3). Despite the amount of increases varies from study to study, these studies generally found that the presence of work zones on a roadway increases the crash rate or the likelihood of crash occurrence (3). Potential contributing factors that affect work zone crash occurrence and/or consequence have also been widely explored and discussed (4-25). Despite increasing concerns on work zone safety issues, only a limited number of research specifically modeled the causal relationship between the occurrence of crashes and contributing factors at work zones (24-34). TABLE 1 provides a summary of these studies. As seen in the table, almost all studies were focused on long-term work zones on freeways (26-29, 31, 32). Other than freeway work zones, some work zones on other types of roads were also analyzed in some studies (24, 25, 30, 33, 34). However, no study specifically examined crash risk of work zones with a relative shorter duration. Other than statistical techniques such as Poisson Regression and Linear Regression, most studies examined the underlying causes of work zone crash occurrence based on TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 4 Negative Binomial (NB) regression models. Factors such as length of a work zone, duration, and traffic counts were commonly found to be positively correlated with crash occurrence at work zones. TABLE 1. Summary of studies on modeling work zone crash occurrence. Study Pal and Sinha (26) Khattaket al. (27) Venugopal and Tarko (28) Khattaket al. (29) Qiet al. (30) Roadway (Location) freeway (IN) freeway (CA) rural freeway (IN) freeway (CA) multiple types of road (NY) Lundevaller (31) Srinivasanet al. (32) Yanget al. (24) Ozturket al. (33) Ozturket al. (25) Chen and Tarko (34) freeway & non-freeway (IN) freeway (CA, OH, NC, WA) multiple types of road (NJ) multiple types of road (NJ) multiple types of road (NJ) multiple types of road (IN) WZ Duration long-term long-term long-term long-term long/shortterm/moving long-term long-term long-term long-term long-term long-term Sample Sites 34 36 116 36 Unknown Method NB, PR, NLR NB NB NB TPR, TNB Exposure ADT ADT ADT ADT AADT 72 64 60 60 60 72 RENB NB, EB MENB NB NB RE, RP ADT AADT AADT AADT AADT ADT Note: NB= Negative Binomial Regression; PR=Poisson Regression; NLR=Normal Linear Regression; TPR=Truncated Poisson Regression; TNB= Truncated NB; EB=Empirical Bayesian; RE=Random Effect; MENB= NB with Measurement Error; RP=Random Parameters; ADT=Average Daily Traffic (vehicles per day); AADT=Annual Average Daily Traffic. The findings based on the above studies provide some useful information for work zone safety improvement. However, the findings based on the developed crash occurrence models are still questionable due to several major issues. First, these models were only built on work zones that had many crashes. Many work zones, particular those with shorter duration, that have no crash experience were not included (30). Second, data such as adjusted/estimated average daily traffic (ADT) at non-work zone periods incorporated in the models did not reflect actual traffic conditions when work zones present (27, 28, 35). The traffic condition might significantly change due to the presence of the work sites. Consequently, the deficiencies of these work zone-specific data can lead to uncertainty when quantifying work zone crash risk associated with different contributing factors (35). Finally, almost all the existing studies used the aggregated data in their models (i.e., aggregated crash counts and exposure data). These aggregated data cannot reflect the change of conditions over time in the same work zones. To sum up, most studies in literature only provided insight into crash risk of long-term work zones. There is need to investigate crash risk of work zones/roadwork with a relative shorter duration as they are unique but frequently present on our transportation networks. In addition, data issues have to be addressed and studies based on more accurate work zone-related information are greatly needed. DATA DESCRIPTION To investigate the mechanism of crash occurrence in work zones with a relatively short duration, the work zones on a 25-mile section of the New Jersey Turnpike (NJTPK) between milepost 48 and milepost 73 were used. This section of NJTPK has 3 lanes in both northbound and southbound directions. As one of the busiest toll road corridors in the east coast, work zones present on the NJTPK can possibly have an adverse impact on traffic operations and safety. In order to minimize the impact of work zones, they are usually deployed for short durations during the off-peak periods. In 2011, 562 work zones with relatively short durations were deployed along the studied section (both directions). For instance, some of them were maintenance work zones, pavement rehabilitation, etc. Despite deploying these work zones during off-peak periods, 56 crashes occurred due to these TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 5 work zones. Therefore, it is of great interest to explore the crash risk posed by these and other similar work zones. The studied section also has 23 remote traffic microwave sensors (RTMS) placed approximately at every one mile on the mainline, which provide real-time traffic data for these work zones. The sensors continuously measure travel speed, volume, and occupancy. The raw data were aggregated in 15-minute time intervals to capture the variation of traffic pattern. In addition, the time interval was chose considering potential errors in reported crash time. For each work zone, the traffic data from the nearest sensor was extracted for describing the actual traffic condition in the work zone. Excluding those cases with missing data, 466 work zones with 44 crashes were used in the final analysis. In total, these work zones consist of 16,285 periods (each period is 15 minutes based on the aggregation of sensor data). In other words, 44 out of the 16,285 periods had a crash, which clearly represent a very rare case of crash occurrence (0.27%). The characteristics of the data were summarized in TABLE 2. Notably, the duration of these work zones on average is 11.5 hours and the maximum one is 78.9 hours (less than 4 days). TABLE 2. Descriptive statistics of variables. Variable Type Description Mean SD Min Max Crash Categorical Crash occurrence =1; otherwise = 0 0.0027 0.0519 0 1 WZLength Numerical Work zone length (mile) 2.33 1.52 0.2 21.1 Volume Numerical Traffic volume in each time interval (veh/15min) 284 191 1 1183 Laneclosed Numerical Number of lanes closed in a work zone 1.34 0.49 0 2 Duration Numerical Hours a work zone presents 11.5 11.7 0.1 78.9 Season Categorical Winter=1; otherwise=0 0.43 0.49 0 1 Unlike long-term work zones that have many crashes, it should be noted that work zones with relatively short durations usually have no crash or one crash during the entire active work period. Therefore, traditional models such as those listed in TABLE 1 may be not appropriate as they mainly rely on a higher frequency of crash counts. Therefore, a more reliable and appropriate approach to describe the crash risk in these work zones is needed. METHODOLOGY In this study, instead of assessing crash frequency/rate, we are interested in whether or not the presence of a short term work zone will cause a crash. Hereinafter crash risk is defined as the probability of short-term work zone having a crash. Then work zone-related factors that are hypothesized to affect the crash risk are investigated. Logistic Regression Denote y as the crash occurrence at a work zone. Crash occurrence at work zones can be described as a binary outcome. Let y = 1 represent that a crash occurs in a work zone whereas y=0 indicates no crash occurred in a work zone. A binary logistic regression model can be developed to examine the influence of factors that may affect the crash risk of a short-term work zone. Let us define ( x) as the probability of a work zone having a crash during its presence and 1 ( x) as the probability of a work zone having no crash. The binary logistic regression model identifies the relationship between the log odds of the dichotomous outcome and various risk factors. It can be formulated as following equation (1): ( x) ' logit ( x) = log α Xβ 1 ( x ) TRB 2015 Annual Meeting (1) Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 Based on the above equation, the probability that a work zone having a crash can be described by the logistic distribution shown in following equation (2): P(y 1X) | ( x) exp(α X'β) 1 exp(α X'β) (2) where ( x) is the conditional probability of the form P y 1| X ; X is the vector of explanatory variables (contributing factors) that could be numerical or categorical; is the corresponding vector of the coefficients; and is an intercept parameter. The maximum-likelihood estimation (MLE) technique can be used to determine the regression model’s parameters. The likelihood function is constructed as equation (3). By maximizing the log likelihood expression shown in equation (4), the best estimate of parameters and can be obtained accordingly. n l ( xi ) yi 1 ( xi ) i 1 12 6 (1 yi ) (3) n LL ln l yi ln ( xi ) 1 yi ln 1 ( xi ) (4) i 1 13 14 15 16 17 18 19 where yi denotes the observed outcome at the i th period of a work zone, with the value of either 0 or 1 only; i 1,2,..., n and n is total number of observed periods. The overall goodness-of-fit of the logistic regression model is tested by likelihood ratio test. The significance of individual contributing factors within the model is examined using the Wald z statistic. Moreover, the influence of j th factor on work zone crash risk can be revealed by the odds ratio (OR), which is defined as equation (5): OR exp j (5) 20 21 22 23 where j is the coefficient of the j th factor. OR measures the ratio of the predicted odds for a one-unit increase in a continuous variable x j or the presence of a dichotomous variable x j when other variables in the model are held constant. The 95% confidence interval of OR is exp j 1.96s ,exp j 1.96s , where s is the standard error of the coefficient j . An OR greater 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 than 1 suggests that the j factor increases the likelihood of crash occurrence in the work zone and vice versa. j j j th Rare Events Logistic Regression Although the binary logistic regression models have been frequently used in traffic safety studies, the models can yield biased estimates when sample data exhibits class imbalance (36). In this study, the crash occurrence at work zones with short duration can be an example of class imbalance. As noted in Section 3, there were many periods that the work zones have no crash y 0 whereas only a small number of them had crashes y 1 . In such cases, if the MLE of the binary logistic regression model is used, it can result in biased estimates of coefficients for the factors related to work zone crash occurrence (37, 38). In order to avoid these estimation issues, (King and Zeng (37)) have developed the rare events logistic regression for the cases that include three types of corrections for the ordinary logistic regression. First, a case-control sampling design based on endogenous stratified sampling is recommended. It consists of taking all periods with a crash y 1 and a random selection of periods with no work zone crash y 0 . Empirically, the proportion of periods with a crash to the periods with no crash can be set around 1:10 (39, 40). A sensitivity analysis can be performed by using different sampling proportions. TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 7 The second correction is called prior correction. The purpose of prior correction is to avoid selecting dependent variables that may introduce sampling bias on the logistic coefficients (King and Zeng, 2001). The intercept is corrected using the following equation. 1 ˆ ln y 1 y (6) 5 6 7 8 9 10 11 12 13 14 15 16 17 18 where is corrected intercept, ̂ is the uncorrected intercept, is the actual true faction of 1s (events) in the population, and y is the observed fraction of 1s (events) in the sample. The last correction is to adjust the MLEs of the logistic regression parameters and the predictions of the estimated model. When we substitute the corrected intercept in equation (6), there is an underestimation of predicated probability because of the uncertainty in the estimation of ˆ . Therefore, ˆ has to be adjusted. King and Zeng (2001) described the procedure to obtain the "approximately unbiased" coefficients, . Based on the adjusted intersect and coefficients, the predicted crash occurrence probability for a given work zone is obtained as ˆi based on a correction factor Ci incorporated into the original estimated probability i . When events are rare, the predicted probability of an event using logistic regression is adjusted. In our case, it denotes the predicted work zone crash risk. ˆi i Ci (7) The correction factor Ci is obtained using the following equation: Ci 0.5 pi pi 1 pi XV ( ) X ' (8) 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 where pi is the event probability estimated using the bias-corrected coefficient ; X is a 1 n 1 vector of values for each explanatory variable; and V ( ) is the variance-covariance matrix. To implement the rare event logistic regression model, we have processed all the traffic and work zone data, and estimated models using the open source statistical software package R. The rare event logistic regression model built in the “relogit” function of the R package "Zelig" was used to estimate the work zone crash risk (41). RESULTS AND DISCUSSION As mentioned in the literature review section, existing studies usually use the aggregated traffic volume (AADT) to describe the traffic condition in work zones. Since the traffic condition may greatly change during the presence of a work zone, the use of actual observed traffic data during this time period are preferred. Unlike existing studies, the present study, therefore, uses real-time sensor data as the input to the risk prediction models to accurately describe the traffic condition when a crash occurs in a work zone. FIGURE 1 illustrates the actual traffic conditions of four work zones. FIGURE 1 (a) and FIGURE 1 (b) show work zones without a crash whereas FIGURE 1 (c) and FIGURE 1 (d) show work zones that experienced crashes. It can be seen that the traffic volumes greatly change during the duration of the work zone. For example, the average traffic volume is 173 vehicles per 15 minutes for FIGURE 1 (c), but when a crash occurs then volume is observed to be 234 vehicles per 15 minutes. Larger differences are observed in FIGURE 1 (d). Therefore, the use of averaged/aggregated exposure data (i.e., ADT & AADT) obviously may yield biased results for this type of analysis. Instead of using the aggregate data, this study describes the crash risk in each 15minute interval and uses the corresponding sensor data to capture the variation in traffic. Despite the fact that higher resolution sensor data (i.e., 30-second interval, 1-minute interval, 5-minute interval, etc.) were available, the 15-minute interval was chosen based on two major considerations: (a) the desire to have a reasonable size interval that can capture the traffic variation, and (b) the need to define a time interval that can better capture/approximate actual crashes. TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 8 FIGURE 1. Examples of traffic volumes in the presence of work zones Based on the traffic conditions during the time period where the crash actually occurs and work zone information, the rare event logistic regression model is estimated. The estimation results are presented in TABLE 3. The final rare event logistic regression model only includes variables that are statistically significant at the significance level of 0.1. The results suggest that all the three variables are positively associated with the crash risk in work zones with relatively short durations. More specifically, if the log-transformed work zone length increases one unit, the likelihood of having work zone crash increased, with an odds ratio of 2.415. Similarly, when the log-transformed volume increases one unit, the corresponding crash risk will be increased, with an odds ratio of 1.679. When one more lane is closed, the odds ratio is 1.662. For these work zones with relatively short duration, the seasonal impact on the crash risk was not significant. The underlying reason for this observation should be attributed to the fact that most of these work zones were only deployed during the off-peak periods under good weather condition. Since the work zone durations are short, they can be easily rescheduled to avoid severe weather impact. TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 9 TABLE 3. Results of rare events logistic regression model (Event: Non-Event =1:30). Variable Estimate Std. Error z value Pr(>|z|) Odds Ratio Intercept -10.063 1.435 -7.012 0.000 ---- log(WZLength) 0.882 0.317 2.780 0.005 2.415 log(Volume) 0.518 0.219 2.364 0.018 1.679 Lanes closed 0.508 0.304 1.672 0.095 1.662 The results presented in TABLE 3 are based on a sampling ratio of 1:30 (crash conditions vs. non-crash conditions). To test the sensitivity of the modeling results, different sampling ratios are also analyzed. FIGURE 2 shows the estimated coefficients under different sampling ratios. It can be seen that as long as the ratio of non-event conditions vs. event conditions increases (i.e. sampling ratio ≥ 30), the estimates are relatively stable. This suggests that without losing modeling accuracy, the work of data preparation can be reduced in terms of processing the real-time sensor data to reflect actual work zone conditions required by our model. FIGURE 2. Effect of different sampling ratios on estimates of coefficients A regular logistic regression model is also estimated for the same case study with all the observed crash periods and non-crash periods of work zones. To compare the performance of the rare events logistic regression model with the regular logistic regression model, the areas under the Receiver Operating Characteristics (ROC) curves, or simply the area under the curve (AUC) of two models are examined. This is a frequently used measure for the comparison of learning algorithms (42). According to this reference, a ROC curve is said “to dominate another one if it is always above and to the left of the other one”. If most of the curves overlap each other, the AUC is used as the measure for comparison (42). FIGURE 3 shows the ROC curves of logistic regression model vs. the rare events logistic regression model with different sample ratios. The AUCs of the rare events model TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 10 are larger than the logistic regression model in both examples (The AUCs for rare events models are 0.6807 in FIGURE 3 (a) and 0.6804 in FIGURE 3 (b). Both are larger than the AUC of 0.6748 for the logistic regression model), which indicate that the rare events model should be preferred. FIGURE 3. ROC of rare event model vs. logistic regression model for different sampling ratios CONCLUSIONS AND FUTURE WORK This study examined the statistical relationships between a set of explanatory variables and the crash risk in work zones with relatively short durations. Instead of using traditional logistic regression model, i.e., Harbet al. (19), a rare event logistic regression model was estimated to capture the characteristics of crashes occurring during the of short duration work zones on highways. The proposed rare event regression model contributes to the literature in several aspects: First, it incorporates all work zones with and without crash experience in the model. Existing studies have only focused on those work zones with crash records. Second, it accounts for the imbalance between periods with crash occurrence(s) and other large amount of periods with no crashes. In addition, instead of using aggregated crash and traffic data, this study modeled the crash risk at each time interval with the actual sensor data. The bias introduced by using aggregated data can be reduced. For instance, existing studies generally adopt Poisson and/or Negative Binomial Models to estimate work zone crash frequency. However, these models hardly differentiate the periods with crashes from others without any crash. Instead, they usually use the "duration" as a variable to aggregate these periods together based on the assumption that the crash risk of each period is the same. In contrast, by using discrete data, the work zone crash risk in each short-time interval can be reliably captured under its prevailing traffic conditions. Based on the analysis of the developed rare event logistic regression model, it is found that the work zone length, actual traffic volume, and the number of lanes closed are positively associated with the crash risk in those work zones with short duration. The sensitivity analysis shows that the modeling results are stable even when only a part of the non-event periods are sampled. The use of the rare events model also improved the prediction performance of crash occurrence compared with the traditional logistic regression model. Future work will include extending the rare event regression model to multiple sites with different characteristics. TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 11 ACKNOWLEDGEMENTS The contents of this paper reflect views of the authors who are responsible for the facts and accuracy of the data presented herein. The contents of the paper do not necessarily reflect the official views or policies of any agency mentioned in the paper. The authors thank for the insightful comments and suggestions from all the anonymous reviewers that helped improve the paper. REFERENCES 1. NHTSA, 2013. Traffic Safety Fact 2010, http://www-nrd.nhtsa.dot.gov/Pubs/811659.pdf, Access June 3. 2. FHWA, Work Zone Safety and Mobility Fact Sheet - 10th Annual National Work Zone Awareness Week (2009), http://safety.fhwa.dot.gov/wz/wz_awareness/2009/factsht09.cfm, Access December 12., Access. 3. Yang, H., O. Ozturk, K. Ozbay, and K. Xie, 2014. Work Zone Safety Analysis and Modeling: A State-ofthe-Art Review. Traffic Injury Prevention (Earlier version presented at 93rd TRB Annual Meeting, 2014). 4. Nemeth, Z.A. and D.J. Migletz, 1978. Accident Characteristics Before, During, and After Safety Upgrading projects on Ohio’s Rural Interstate System. Transportation Research Record: Journal of the Transportation Research Board No. 672, pp. 19-24. 5. Rouphail, N.M., Z.S. Yang, and J. Fazio, 1988. Comparative Study of Short- And Long-Term Urban Freeway Work Zones. Transportation Research Record: Journal of the Transportation Research Board No. 1163, pp. 4-14. 6. Hall, J.W. and V.M. Lorenz, 1989. Characteristics of Construction-Zone Accidents. Transportation Research Record: Journal of the Transportation Research Board No. 1230, pp. 20-27. 7. Garber, N.J. and T.H. Woo, 1990. Accident Characteristics at Construction and Maintenance Zones in Urban Areas. Report VTRC 90-R12. Virginia Transportation Research Council, Charlottesville. 8. Sorock, G.S., T.A. Ranney, and M.R. Lehto, 1996. Motor Vehicle Crashes in Roadway Construction Workzones: An Analysis Using Narrative Text from Insurance Claims. Accident Analysis and Prevention 28(1), pp. 131-138. 9. Lin, H., L.J. Pignataro, and Y. He, 1997. Analysis of Truck Accident Reports in Work Zones in New Jersey. Final Report. The National Center for Transportation and Industrial Productivity, Institute for Transportation, New Jersey Institute of Technology, New Jersey. 10. Bryden, J.E., L.B. Andrew, and J.S. Fortuniewicz, 1998. Work Zone Traffic Accidents Involving Traffic Control Devices, Safety Features, and Construction Operations. Transportation Research Record: Journal of the Transportation Research Board No. 1650, pp. 71-81. 11. Raub, R., O. Sawaya, J. Schofer, and A. Ziliaskopoulos, 2001. Enhanced Crash Reporting to Explore Workzone Crash Patterns. Transportation Research Board 80th Annual Meeting, Washington, D.C. 12. Chambless, J., A.M. Chadiali, J.K. Lindly, and J. McFadden, 2002. Multistate Work-Zone Crash Characteristics. ITE Journal, Institute of Transportation Engineers 72(5), pp. 46-50. 13. Garber, N.J. and M. Zhao, 2002. Distribution and Characteristics of Crashes at Different Work Zone Locations in Virginia. Transportation Research Record: Journal of the Transportation Research Board No. 1794, pp. 19-25. 14. Schrock, S.D., G.L. Ullman, A.S. Cothron, E. Kraus, and A.P. Voigt, 2004. An Analysis of Fatal Work Zone Crashes in Texas. Report No.: FHWA/TX-05/0-4028-1. Texas Transportation Institute, Texas. 15. Bushman, R., J. Chan, and C. Berthelot, 2005. Characteristics of Work Zone Crashes and Fatalities in Canada. In: Proceedings of the Canada Multidisciplinary Road Safety Conference XV, Fredericton, NB. 16. Müngen, U. and G.E. Gürcanli, 2005. Fatal Traffic Accidents in the Turkish Construction Industry. Safety Science 43(5-6), pp. 299-322. 17. Salem, O.M., A.M. Genaidy, H. Wei, and D. N., 2006. Spatial Distribution and Characteristics of Accident Crashes at Work Zones of Interstate Freeways in Ohio. In: Proceedings of the 2006 IEEE Intelligent Transportation Systems Conference, Toronto, Canada, pp. 1642-1647. 18. Ullman, G.L., M.D. Finley, J.E. Bryden, R. Srinivasan, and F.M. Council, 2008. Traffic Safety Evaluation of Nighttime and Daytime Work Zones. NCHRP Report 627. Transportation Research Board, Washington, D.C. TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 12 19. Harb, R., E. Radwan, X. Yan, A. Pande, and M. Abdel-Aty, 2008. Freeway Work-Zone Crash Analysis and Risk Identification Using Multiple and Conditional Logistic Regression. Journal of Transportation Engineering 134(5), pp. 203-214. 20. Li, Y. and Y. Bai, 2008. Comparison of Characteristics between Fatal and Injury Accidents in The Highway Construction Zones. Safety Science 46(4), pp. 646-660. 21. Jin, T.G., M. Saito, and D.L. Eggett, 2008. Statistical Comparisons of The Crash Characteristics on Highways between Construction Time and Non-Construction Time. Accident Analysis and Prevention 40(6), pp. 2015-2023. 22. Dissanayake, S. and S.R. Akepati, 2009. Characteristics of Work Zone Crashes in the SWZDI Region: Differences and Similarities. In: Proceedings of the 2009 Mid-Continent Transportation Research Symposium, Ames, Iowa. 23. Akepati, S.R. and S. Dissanayake, 2011. Characteristics and Contributory Factors of Work Zone Crashes. Transportation Research Board 90th Annual Meeting, Washington, D.C. 24. Yang, H., K. Ozbay, O. Ozturk, and M. Yildirimoglu, 2013. Modeling Work Zone Crash Frequency by Quantifying Measurement Errors in Work Zone Length. Accident Analysis & Prevention 55(June), pp. 192-201. 25. Ozturk, O., K. Ozbay, and H. Yang, 2014. Estimating the Impact of Work Zones on Highway Safety. Transportation Research Board 93rd Annual Meeting, Washington, D.C. 26. Pal, R. and K.C. Sinha, 1996. Analysis of Crash Rates at Interstate Work Zones in Indiana. Transportation Research Record: Journal of the Transportation Research Board No. 1529, pp. 43-53. 27. Khattak, A.J., A.J. Khattak, and F.M. Council, 1999. Analysis of Injury and Non-Injury Crashes in California Work Zones. Transportation Research Board 78th Annual Meeting, Washington, D.C. 28. Venugopal, S. and A. Tarko, 2000. Safety Models for Rural Freeway Work Zones. Transportation Research Record: Journal of the Transportation Research Board No. 1715, pp. 1-9. 29. Khattak, A.J., A.J. Khattak, and F.M. Council, 2002. Effects of Work Zone Presence on Injury and Noninjury Crashes. Accident Analysis and Prevention 34(1), pp. 19-29. 30. Qi, Y., R. Srinivasan, H. Teng, and R.F. Baker, 2005. Frequency of Work Zone Accidents on Construction Projects. Report No: 55657-03-15. University Transportation Research Center City College of New York, New York. 31. Lundevaller, E.H., 2006. Measurement Errors in Poisson Regressions: A Simulation Study Based on Travel Frequency Data. Journal of Transportation and Statistics 9(1), pp. 39-47. 32. Srinivasan, R., G.L. Ullman, M.D. Finley, and F.M. Council, 2011. Use of Empirical Bayesian Methods to Estimate Temporal-Based Crash Modification Factors for Construction Zones. Transportation Research Board 90th Annual Meeting, Washington, D.C. 33. Ozturk, O., K. Ozbay, H. Yang, and B. Bartin, 2013. Crash Frequency Modeling for Highway Construction Zones. Transportation Research Board 92nd Annual Meeting, Washington, D.C., January 13-17. 34. Chen, E. and A. Tarko, 2014. Modeling Safety of Highway Work Zones with Random Parameters and Random Effects Models. Analytic Methods in Accident Research 1(January), pp. 86–95. 35. Wang, J., W.E. Hughes, F.M. Council, and J.F. Paniati, 1996. Investigation of Highway Work Zone Crashes: What We Know and What We Don’t Know. Transportation Research Record: Journal of the Transportation Research Board No. 1529, pp. 54-62. 36. Buehler, R., 2012. Determinants of Bicycle Commuting in the Washington, DC Region: The Role of Bicycle Parking, Cyclist Showers, and Free Car Parking at Work. Transportation Research Part D: Transport and Environment 17(7), pp. 525–531. 37. King, G. and L. Zeng, 2001. Logistic Regression in Rare Events Data. Political Analysis 9(2), pp. 137– 163. 38. King, G. and L. Zeng, 2001. Explaining Rare Events in International Relations. International Organization 55(3), pp. 693–715. 39. Ramalho, E.A., 2002. Regression models for choice-based samples with misclassification in the response variable. Journal of Econometrics 106(1), pp. 171–201. 40. Beguería, S., 2006. Changes in land cover and shallow landslide activity: A case study in the Spanish Pyrenees. Geomorphology 74(1-4), pp. 196–206. TRB 2015 Annual Meeting Paper revised from original submittal. Yang, Ozbay, Xie, Bartin 1 2 3 4 5 13 41. Imai, K., G. King, and O. Lau, 2007. relogit: Rare Events Logistic Regression for Dichotomous Dependent Variables. Zelig: Everyone's Statistical Software. http://gking.harvard.edu/zelig. 42. Ling, C.X., J. Huang, and H. Zhang, 2003. AUC: A Better Measure than Accuracy in Comparing Learning Algorithms. Advances in Artificial Intelligence 2671, pp. 329-341. TRB 2015 Annual Meeting Paper revised from original submittal.
© Copyright 2026 Paperzz