Pre-print

Time Gap Modeling using Mixture Distributions under Mixed Traffic Conditions
Subodh Kant Dubey1, Balaji Ponnu2 and Shriniwas S Arkatkar3
ABSTRACT
Time-gap data have been modeled through a non-composite distribution up to a flow level of 1800 vph. It has
been found that these non-composite models are not capable of modeling time gap data at higher flow levels.
Some composite distributions have been proposed to overcome this problem, but the calibration of model
parameters used in composite distributions is tedious, therefore their use may be relatively limited. In this paper,
five mixture models namely Exponential+Extreme-value (EEV), Lognormal+Extreme-value (LEV), Weibull
+Extreme-value (WEV), Weibull+Lognormal (WLN), and Exponential+ Lognormal (ELN) have been used to
model time gap data for flows ranging from 1900 vph to 4100 vph. Two types of goodness-of-fit tests namely
cumulative distribution function (CDF) based and two-sample (Cramer-von Mises test) & K-sample (AndersonDarling test) based tests were performed. Among all five models, Weibull+ Extreme Value was found to be the
best mixture model for modeling time gap data as it performed consistently well in the Cramer-von Mises test
and K-sample Anderson-Darling test.
Key Words: Mixed Traffic; vehicular time gap; AD test; Cramer-Von Mises test
INTRODUCTION
Mixed traffic conditions in developing countries like India, are entirely different as compared to the
homogeneous traffic conditions prevailing in developed countries. They consist of many classes of vehicles
including: Auto-Rickshaws, two-wheelers, small cars (including jeeps and vans), sports utility vehicle, and
heavy vehicles (including buses and trucks of different axles). Additionally, there is no lane discipline i.e.
vehicles can occupy any lateral space available on the road irrespective of lane markings. Some vehicles follow
each other and some do not. Also, vehicles can move parallel to each other within a single lane, leading to the
distance between each vehicle to be almost zero. Considering the fact that there is no lane discipline, it would
be better to consider time headway of homogeneous traffic to be analogous to inter vehicular time gap under
heterogeneous traffic conditions. Headway for homogeneous traffic is defined as the time between two
successive vehicles in a traffic lane or a single file traffic stream as they pass a point on the road way, measured
from bumper to bumper (front to front or rear to rear), in seconds. However, this definition is not
appropriate for heterogeneous traffic in India with imperfect lane discipline. The reasons are: (i) complex
movement patterns are not restricted to a single file stream, (ii) near parallel movements of smaller
vehicles within a single lane, (iii) the possibility of multiple leaders and/or followers, (iv) absence of
lane discipline and lane markings, (v) higher degree of lateral movement and vehicle occupancy of entire
road width without reference to lanes, (vi) varying lateral dimensions of vehicles, and (vii) little or no
enforcement measures. On the other hand, inter vehicular time gap is defined as the time interval between
arrivals of consecutive vehicles, this is more pertinent to heterogeneous traffic. Time gap is independent of
lead lag vehicle types and incorporates various types of non-following interactions that are typical of
heterogeneous traffic conditions (Kanagaraj et al. (2011)) [1]. Also, this definition ensures a unique time
ordering of all vehicles crossing an observation point and is more appropriate for simulation models for
heterogeneous traffic. A snapshot of both homogeneous and heterogeneous traffic flows is shown in Figure 1,
which justifies the application of time gap concept under heterogeneous traffic conditions as explained above.
1
Former Graduate Student, Civil Engineering Department, Birla Institute of Technology & Science Pilani-333031, Rajasthan, India;
E-mail: [email protected]
2
Former Graduate Student, Civil Engineering Department, Birla Institute of Technology & Science Pilani-333031, Rajasthan, India;
E-mail: [email protected]
3
Assistant Professor, Civil Engineering Department, Birla Institute of Technology & Science Pilani-333031, Rajasthan, India; Phone:
+91-01596-245073; Fax: +91-01596-244183; E-mail: [email protected]
1
(a) Homogeneous traffic
(b) Heterogeneous traffic
Figure 1: Traffic flow under different types of traffic condition
Many researchers (Arasan& Koshy [2] and Chandra and Kumar [3]) have used entire road space based arrivals
at a reference line on the road for the modeling of time gap under heterogeneous conditions. Vehicular time gap
incorporates both following and non-following interactions typical of the heterogeneous traffic scenario in
developing countries like India. In spite of this advantage, there are a few challenges in time gap modeling. Due
to the presence of both fast and slow moving, small and large vehicles, time gaps may range from 0 to 25 sec,
with a significant amount (0-20%) of data in the tail regions. Hence modeling of both zero time gaps and the
data in tail regions assumes paramount importance and leads to erroneous results when neglected. In a nut shell,
it becomes imperative to perform statistical modeling of entire road width based time gaps under heterogeneous
traffic conditions coupled with better tail modeling.
A lot of work has been done in the past to model headways and time gaps under homogeneous and
heterogeneous traffic conditions respectively though there are noticeable differences between the two
parameters. As seen earlier, traffic is lane based in homogeneous traffic conditions and hence if one wants to
simulate homogenous traffic condition for a multilane road, the multilane road can be divided into separate
lanes for the purpose of analysis. In this situation, traffic volume is limited to one lane capacity i.e. 2000–2200
passenger cars per hour. Due to this reason, most of the research works on time headway modeling are limited
to a flow range of 1800 vph. On the other hand, under heterogeneous traffic conditions as there is no lane
discipline, the entire roadway has to be considered. In these cases, traffic volume on four-lane divided roads
may reach a high flow level of up to 4000 vph (for two lanes in each direction). In addition to this, heterogeneous
traffic conditions comprise many vehicle categories such as cars, two-wheelers, three-wheelers, light
commercial vehicle, and trucks. Generally, two-wheeler’s concentration is very high, which makes it possible
to observe traffic volume up to 4000 vph on four-lane divided roads. Non-composite probability distributions
such as Weibull, Erlang, exponential, and lognormal distributions, which have been used to model time gaps
up to 1800 vph, are not capable of modeling time gaps at higher flow rates (for eg. volume levels of 2000 to
4000 vph). To address this problem, few researchers have proposed composite distributions to model headways
at higher flow rates. These models are very complex in nature and calibration of the parameters involved in
them is also a challenge. Considering the above and motivated by the fact that there are no works on mixture
distributions for modeling time gaps, this study aims to model time gap data at flow levels ranging from 1900
to 4100 vph using mixture distributions. This work is also motivated by the research works done in the area of
accountancy (Cooray and Ananda (2005) [4] & Scollnik (2007) [5]), where lot of work has been done to model
insurance payment data. The nature of insurance payment data is very similar to time gap data at higher flow
rates. Both are highly positively skewed and distributed with a long tail.
2
LITERATURE REVIEW
As seen above, time gaps under heterogeneous traffic conditions are different from that of time headways under
homogeneous traffic conditions, as they both measure two different quantities. Nevertheless, many researchers
on time gap modeling have adopted distributions such as exponential, gamma, Erlang, and lognormal which
have been used by researchers for headway modeling. Researchers of heterogeneous traffic conditions have
also referred the entire road width based inter-arrival rate as ‘headway’ though it is not the same as the lane
based follower headway that prevails in homogeneous conditions. Extensive literature is available based on
research done for modeling time headway and time gaps. But this work would restrict itself to the review of the
earlier composite models on headways. It is important to note that there are very few works on composite
models for heterogeneous traffic conditions. Buckley (1968) [6] suggested a semi-random headway model. He
found that generalized Poisson, Pearson type III, and semi-random models fit well for high ranges of flow.
Dawson and Chimni (1968) [7] proposed a hyperlang model which is a linear combination of the shifted
exponential and shifted Erlang distributions. In their work, a shifted exponential distribution was used to model
free headways (time headways at low volume level) and the Erlang distribution explained time headways at
high volume levels. Tolle (1971) [8] studied three distributions namely lognormal, composite exponential, and
Pearson type III for flows ranging from 800 to 1800 vph and concluded that lognormal offers the best fit among
the three. Cowan (1975) [9] suggested two components namely the tracking and free component for time
headways. He proposed four headway models: M1, M2, M3, and M4. M1 corresponded to Poisson, M2 to
shifted exponential, M3 to mixed and exponential, and M4 to a generalized M3 distribution. M3 and M4 were
found to be realistic and provided a better fit. Wasielewski (1979) [10] proposed a semi-Poisson model for a
volume range of 900-2000 vph. Chisakhi and Youichi (1984) [11] also developed a headway model based on
distinction between leaders and followers considering the following parameters: headway of leader, headway
of followers, mean and variance of headway. Chandra and Kumar (2001) [3] analyzed headways on urban roads
in India and suggested a hyperlang model for a range of 900-1600 vph, under heterogeneous traffic conditions.
They also developed a graphical relationship between traffic volume and parameters of the hyperlang model.
Hagring (1997) [12] proposed that Cowan’s M3 model describes driver behavior adequately through the
parameters of the distributions. Zhang et al.(2007) [13] studied various headway models like three parameter
shifted lognormal and gamma distributions, Cowan’s M3 and M4, and Doubly Displaced negative exponential
distribution (DDNED) for two flow levels of 400 vph and 1400 vph and found that DDNED fit both levels
better.
A review of literature clearly shows that Cowan’s M4 and DDNED models are the best models available to
model time headway data, but due to their complexity and tedious parameter calibration procedure, they are
limited in use. In addition to that, no work under homogeneous traffic conditions has examined the time
headway at or above 1900 vphpl to the best of the authors’ knowledge. It was noted from the literature review
that no research has been done pertaining to time gap modeling using mixture distributions under
heterogeneous traffic conditions prevailing in India. There also exists a lot of ambiguity for applying the type
of distributions for different flow levels. In addition to this, statistical validation of potential probability
distributions has been done at very basic level. Most of the researchers, under heterogeneous traffic conditions,
have used only Chi-square goodness-of-fit tests for deciding the suitability of a given distribution and hence
lack robust statistical modeling. Therefore, the present study tries to:
1. Make an attempt to develop vehicle generation models/ time gap models using mixture distributions for
medium to high flow levels under heterogeneous traffic conditions in India.
2. Study the discrepancies of Kolmogorov-Smirnov (KS) goodness-of-fit test and to present two other more
robust goodness-of-fit tests, Cramer-Von Mises (CVM) and Anderson-Darling (AD) tests. These tests are
more effective than traditional goodness-of-fit tests like the Chi-Square in determining the statistical
validity of the identified distributions as a potential candidate.
DATA COLLECTION
The time gap data used in this study was collected by considering the entire width of the roadway for one
direction of flow at Anna Salai, a major arterial in the city of Chennai for 3 consecutive days, where the flow
was found to vary between 2125 to 4100 vph during the observation period. The study locations were chosen
3
in such a way that they were far away from any intersection, bus stops, or parking lots hence there was no
hindrance to the traffic flow on the selected road section. The road was a four lane divided roadway with a
width of 7.5m in one direction. A video camera was mounted on the roadside, opposite to the reference point
to capture the moving traffic in one direction continuously across this reference point. Time gap data from the
videos were extracted at a rate of 25 frames per second for achieving a high accuracy.
MODEL DEVELOPMENT
To decide the components (number and type of distribution) of the mixture model, several well-known
distributions were individually tested against the data. All the tested distributions were evaluated using one
sample Cramer-Von Mises and Anderson-Darling test and an assessment was made regarding their performance
in different regions of data distributions (middle and tail regions). Based on the above mentioned statistical
tests, the following distributions were chosen for the formation of different mixture models: Exponential,
Extreme-Value, Lognormal, Weibull and Pareto. Using these five distributions, seven combinations were made
and their numerical convergence and root mean square error (RMSE) were calculated. Five combinations
namely Exponential + Extreme-value (EEV), Lognormal + Extreme-value (LEV), Weibull + Extreme-value
(WEV), Weibull + Lognormal (WLN), and Exponential + Lognormal (ELN) were considered for final analysis
in this research.
MODEL DESCRIPTION
Given a finite set of probability density functions p1(x), …, pn(x), and weights w1, …, wn such that wi ≥ 0 and
∑wi = 1, the mixture distribution can be represented as in Equation 1:
n
F ( x)   wi pi ( x)
(1)
i 1
where, F (x) is the probability density of the mixture of distributions, pi (x) is the individual probability
density function, and wi is the weight of the distribution in the mixture.
Consider the combination of exponential and lognormal probability distributions as an example. The
corresponding mixture probability density function (PDF) can be written as in Equation 2:
F (x) 
e  xλ pλ
x=0

e  xλ pλ 
(    Log [ x ])
2 2
e
x 2 
2
(1  p ) x > 0
(2)
where, p stands for the weight of a distribution.
DISTRIBUTION PARAMETER ESTIMATION
The parameters of the distributions were estimated using the classical Expectation-Maximization (EM)
algorithm discussed by Dempster et al. (1977) [14]. The algorithm is similar to the general uni-variate
Expectation-Maximization algorithm, except for the fact that it requires introduction of a latent variable which
works as an identifier for original data indicating which class it belongs to.
Assume that X=[x1, x2…..xn]T represent the data vector and φ represent the vector of distribution parameters
to be estimated. Hence, a mixture model can be specified as in Equation 3:
f ( X ; )  i 1 wi f i ( X ; i )
N
(3)
where, wi is the mixing proportion or probability and f i ( X ; i ) is the PDF of the component distribution. Let
Y= [yi] T be the identifier, where i equals the number of components (group) and takes a value of xi, if the given
4
xi belongs to the particular group. Let f (X|  ) denote the PDF of the original (unlabeled) data (X) and g(Y| 
) denote the PDF of the labeled data (Y). The maximum likelihood estimation then involves the maximization
of the log-likelihood of the original data N (  ) =log f (X|  ). The estimation in this direct form is very difficult,
as the origin or class of data is missing. Hence, EM makes use of the relationship between f(X|  ) and g(Y| 
) to estimate the log-likelihood. To estimate the parameters it performs two steps known as Expectation and
Maximization. From the observed value of Y compute the ML estimate of  according to Equation (4)
 ML  arg max log g (Y |  )
(4)
‘arg max’ refers to the parameter which maximizes the function.
For the Expectation step, it calculates the expected value of X, given Y and  as in Equation (5).
Q( |  c )  E[log g (Y |  ) | X , c ]
(5)
Where  is the current value of the parameter in the particular iteration. The algorithm begins with an initial
estimate of the parameter (say  0 ).
c
The Maximization step, calculates the parameters again using the expected value of data (X) obtained in the
first step. Substitute the expected value of data in f (X|  ) and equate the derivative of the function to zero by
taking the log of the function. The EM iterates between the expectation and maximization until the algorithm
converges. The convergence criterion can be the number of iterations or a tolerance value, which is defined as
the minimum change in log-likelihood value of the function in consecutive steps.
The problem with the EM algorithm is that it may not always give the global maxima. When the data is multipeaked, it has many hills and as EM is an uphill algorithm, it will give some maximum value, irrespective of
the hill used in the search space. To ensure the condition of global maxima, the EM algorithm is performed
with different initial guesses of parameter values (  0 ) to ensure that we search in all the hills and the one which
gives the maximum likelihood value upon convergence is taken as the final parameter value.
TIME GAP DATA MODELING
For modeling time gap data, five mixture distributions, namely Exponential + Extreme-value (EEV),
Lognormal + Extreme-value (LEV), Weibull + Extreme-value (WEV), Weibull + Lognormal (WLN), and
Exponential + Lognormal (ELN), were used for different flow levels starting from 1950 to 4100 vph. To decide
the underlying distribution of the data, kernel density estimation for different data sets was performed. The
estimated parameter values such as weight, log-likelihood value, and root mean square error (RMSE) are given
in Table 1 for the flow level of 4100 vph as an example.
Table 1: Model parameters estimated for flow level of 4100vph
Mixture
EEV
LEV
WEV
WLN
ELN
Parameters value
1st distribution
2nd distribution
1.35866
-0.93559, 1.39101
0.82770, 0.88750
1.17161, 0.57650
2.26115
4.76675, 1.78214
0.58614, 0.31481
0.46575, 0.25673
0.27292, 0.84034
-0.1629, 0.85113
5
Weight
RMSE
0.98, 0.02
0.62, 0.38
0.66, 0.34
0.76, 0.24
0.48, 0.52
0.08
0.75
0.2
0.08
0.16
Loglikelihood
value
-3361.54
-3321.15
-3350.54
-3322.27
-3335.25
While conducting analysis it was observed that the integral for the Exponential +Lognormal (ELN) distribution
did not converge for the flow level of 2300 vph. The best mixture distribution fit, evaluated based on RMSE
was found to be Weibull + Lognormal (WLN) for all the four flow levels. The distribution fit to the data for the
flow level of 4100 vph is shown in Figure 2 as an example.
(a) Flow level 1959 vph (RMSE = 0.08)
(b) Flow level 4100vph (RMSE = 0.08)
Figure 2. Best fit mixture distribution for different flow level based on RMSE
GOODNESS-OF-FIT TESTS
Two types of goodness-of-fit tests were performed viz. cumulative distribution function (CDF) based and twosample tests. Under the CDF category, two tests, namely Kolmogorov-Smirnov (K-S) and Cramer-von Mises
tests were performed. Under the two-sample test category, two-sample Cramer-von Mises and K-sample
Anderson-Darling tests were performed.
Kolmogorov-Smirnov (K-S) Test
The K-S test is a CDF based test and is known as the “distance test”. The K-S statistic is defined as:
D = Maximum of all D+ and D- ( ≥ 0); for i = 1... n
Where, D+ = Fn - F0 and D- = F0 - Fn-1 for every data point Xi. Fn stands for empirical CDF and F0 stands for
theoretical CDF. The K-S test is based on a one point maximum deviation. Therefore, there is the possibility
that a probability distribution is neglected due to a one-point large deviation regardless of its better overall fit,
resulting in misleading results in some cases. A better solution to this problem can be to conduct a test which
accounts for over all deviation i.e. integrates it and gives a final result, so that better decisions can be made
regarding selection of the fitted distributions.
Cramer-Von Mises Test
This test is also based on the CDF. It is similar to the K-S test except that it makes use of the overall deviation
by integrating the area between the theoretical CDF and Empirical CDF. The test statistic is defined in Equation
6:

   [ Fn ( x)  F * ( x)]2 dF * ( x)
2
(6)

Where,  2 stands for the Cramer-Von Mises statistic value, Fn(x) stands for the Empirical CDF, F*(x) stands
for the theoretical probability distribution CDF. This statistic will be reduced to the following form for one
sample test as given in Equation 7:
n
1
2i  1
(7)
2 
 {
 F ( xi )}2
12n i 1 2n
6
These two tests were performed for all five mixture models at different four flow levels using MATLAB. The
test statistics (K-S and Cramer-Von Mises) are shown in Table 2 for the flow level of 4100 vph as an example.
The critical values at 5% significance level, which were taken from Stephens (1970) [15], are mentioned below
Table 2.
Table 2. CDF based test for flow level of 4100vph
Distribution
EEV
LEV
WEV
K-S statistic
0.07
0.04
0.03
Cramer-Von Mises statistics
3.01
0.60
0.38
WLN
ELN
0.04
0.04
0.56
0.66
K-S critical value: 0.017; Cramer-von Mises critical value: 0.461
In the decision making process regarding the selection of a statistically significant distribution, only Cramervon Mises test statistics were used. As explained above, the K-S test statistics was misleading in some situations.
For example, in the case of a flow level of 1950 vph, the K-S test discarded all three mixture distributions,
namely the Lognormal + Extreme-Value (LEV), Weibull + Lognormal (WLN), and Exponential + Lognormal
(ELN), whereas they qualified as per the more robust Cramer-von Mises test. Hence, based on the Cramer-von
Mises test, it was inferred that Weibull + Extreme-Value (WEV) performed better than other mixture models at
all four flow levels.
Two-Sample Cramer-Von Mises Test
The Cramer-von Mises two-sample test is one of the best-known distribution-free two-sample tests. It is based
on the statistic T2, defined below. Suppose two independent samples x1, x2, . . . , xm and y1, y2, . . . , yn are drawn
independently from two distributions with continuous cumulative distribution functions F(·) and G(·),
respectively. Based on those samples, one wants to test the null hypothesis H0: F = G against the two-sided
alternative H1 : F ≠G. The Cramer-von Mises test statistic T2 is given by Equation 8:
T2 
mn
( m  n) 2
n
 m

  ( Fm ( xi )  Gn ( xi )) 2   ( Fm ( y j )  Gn ( y j )) 2 


j 1
 i 1

(8)
Where, m stands for the number of observations in sample 1 and n stands for the number of observations in
sample 2. A detailed description of two-sample Cramer-von Mises test can be found in Xiao and Gordon
(2007) [16]. The critical value table can be referred in Anderson and Darling (1952) [17].
K-Sample Anderson-Darling Test
The Anderson-Darling k-sample test was introduced by Scholz and Stephens (1986) [18] as a generalization of
the two-sample Anderson-Darling test. It is a nonparametric statistical procedure, i.e., a rank test, and, thus,
requires no assumptions other than that the samples are true independent random samples from their respective
continuous populations (although provisions for tied observations are made). It tests the hypothesis that the
populations from which two or more independent samples of data were drawn are identical. This test can be
used to decide whether data from different sources may be combined as they are judged to come from one
common distribution, i.e., the null hypothesis Ho of the same population distributions cannot be rejected. It is
an omnibus test because of its effectiveness against all alternatives to the null hypothesis H o (all k populations
being equal). The Anderson-Darling k-sample procedure assumes that ith sample has a continuous distribution
function and one is interested in testing the null hypothesis that all sampled populations have the same
distribution without specifying the nature of that common distribution. The complete description about this test
is beyond the scope of this paper. For detailed information one may refer to Scholz and Stephens (1986) [18].
7
The above mentioned two tests were performed for all five mixture models at different four flow levels using
MATLAB. For the purpose of performing this test, 10 different samples were generated using each of the five
mixture models with random seeds ranging from 1 to 10. The values were simulated and the average value for
10 runs was taken as the final value. The estimated test statistics (Cramer-von Mises and K-sample A-D statistic)
are given below in Table 3, for all five mixture models at the flow level of 4100 vph as an example. The critical
value for the Cramer-Von Mises test is 0.03656. The Anderson-Darling test passes a distribution if it has a
probability value greater than the significance level. The significance level of 5% is used here for testing
purposes.
Table 3: Two-sample test statistics for flow level of 4100 vph
Distribution
Cramer-von Mises
statistics
A-D probability value
(Not adjusted for tie)
A-D probability value
(Adjusted for tie)
EEV
LEV
WEV
WLN
ELN
0.00000
0.06870
0.13370
0.06260
0.04400
0.00001
0.00233
0.05734
0.01863
0.04310
0.00000
0.00516
0.08327
0.03391
0.04900
It can be inferred from Table 3 that based on the two-sample Cramer-Von Mises test, Exponential + Extremevalue (EEV) was found to be the most suitable mixture distribution at the flow level of 4100 vph and the same
observation was made for the other three flow levels. Based on the simulation runs at 4100 vph, it was noted
that the average statistic value was more or less consistent even after more runs. Hence, the number of simulation
runs was limited to 10 for the purpose of this study. Based on the K-sample A-D test (Table 3), the best mixture
distribution was found to be Weibull + Extreme-value (WEV) for all flow levels.
CONCLUSIONS AND FUTURE RESEARCH
This paper analyzes the modeling of time gap data through the use of mixture models. This work also
highlights the inherent discrepancies of the Kolmogorov-Smirnov test in some cases. The important findings in
this study are listed below:
1. All of the five mixture distributions: Exponential + Extreme-value, Lognormal + Extreme-value, Weibull
+ Extreme-value, Weibull + Lognormal, and Exponential + Lognormal were able to model time gap data at
flows of 1959, 2131, 2341, and 4100 according to their root mean square(RMSE) values. The RMSE was
the lowest for the Weibull + Lognormal model among the five combinations at all four flow levels
considered in this study.
2. Based on the CDF based tests, the Weibull + Extreme-Value (WEV) mixture model showed the best
performance at all four flow levels except for 2300 vph. At the 2300 vph flow level, the Weibull +
Lognormal (WLN) mixture model offered the best fit. This is due to the fact that the histogram for the 2300
vph flow is not strictly sloping to the right side. There is a spike in the beginning between the first two bins.
3. Since lognormal’s probability distribution first goes upward and then slopes down, its mixture with the
Weibull distribution models the spike better.
4. Based on two-sample Cramer-von Mises test, the Exponential + Extreme-value (EEV) mixture distribution
proves to be better than other mixture models for all four flow levels considered in this study.
5. Based on the K-sample Anderson-Darling test, the Weibull + Extreme value (WEV) mixture model proves
to be better than other mixture models. The WEV model passes the test at all flow levels except for 2300
vph.
8
6. From the study, Weibull + Extreme value (WEV) is adjudged as the best mixture model to model time gaps
at flow levels above 1900 vph. This is due to its consistent better performance in both the Cramer-von Mises
test and K-Sample Anderson-Darling (AD) Test. Also, as the K-Sample AD test is an omnibus test that
assesses the goodness of fit of a candidate distribution at both the middle and tail regions, a distribution that
passes this test can be said to be very robust.
The applications of the models developed in the study are as follows:
 Mixture models that combine two models using appropriate weights and also offer better fit than
non-composite models for headway data with flow ranges above 1900 vph have been suggested in
this study. These models can be used in simulating mixed traffic.
 As these models are road width based rather than lane based, they can be used in simulation software
for mixed traffic such as HETEROSIM [2].
The research scope to be undertaken in the future is as follows:


The applicability of these mixture models to flows higher than 4100 vph that may prevail in cases
of six-lane and eight-lane divided arterials of Indian cities would be tested.
These mixture models would be extended to roadway conditions other than what has been
encountered in this study viz. rural roads and highways with a lower number of two-wheelers and
a higher number of trucks.
REFERENCES
1. Kanagaraj, V., Asaithambi, G., Srinivasan, K., Sivanandan, R., Vehicle Class wise Analysis of Time Gaps
and Headways under Heterogeneous Traffic, Proc. 90th Annual Meeting of Transportation Research
Board, Washington D.C., 2011
2. Arasan, V. T., and Koshy, R. Z., Headway distribution of heterogeneous traffic on urban arterials.” J. Inst.
Eng. (India), Part AG, 84, 2003, 210–215.
3. Chandra, S., and Kumar, R., Headway modelling under mixed traffic on urban roads, Road & Transport
Research, ARRB, Australia, Vol. 10, 2001, pp. 61-71.
4. Cooray, K., and Ananda, M. M. A., Modeling actuarial data with a composite lognormal-Pareto model,
Scandinavian Actuarial Journal, Vol. 5, 2005, pp. 321-334.
5. Scollnik, D. P. M., on composite lognormal-Pareto models, Scandinavian Actuarial Journal, Vol. 1, 2007,
pp. 20-33.
6. Buckley, D. J., A semi-Poisson model of traffic flow, Transportation Science, Vol. 2, 1968, pp. 107–133.
7. Dawson, R. F., and Chimni, L. A., The hyperlang probability distribution - A generalized traffic headway
model. HRR 230, Highway Research Board, National Research Council, Washington D.C., 1968, pp. 1–
14.
8. Tolle, J. E., The lognormal headway distribution model, Traffic Engineering Control, Vol. 13, 1971, pp.
22–24.
9. Cowan, R. J., Useful headway models, Transportation Research, Vol. 9, 1975, pp. 371–375.
10. Wasielewski, P., Car following headways on freeways interrupted by the semi-Poisson headway
distribution model, Transportation Science, Vol. 13, 1979, pp. 36–55.
11. Chishaki, T., and Youichi, T., Headway distribution models based on the distinction between leaders and
followers, Proceeding of 9th International Symposium of Traffic and Transportation Theory, UNU
Science, 1984, The Netherlands, pp 43-63
12. Hagring, O., Calibration of headway distributions, Proceedings of the 9th Mini-Euro conference, 1997,
Montenegro (Yugoslavia)
13. Zhang, G., Wang, Y., Wei, H., and Chen, Y., Examining headway distribution models with urban freeway
loop event data.” Transportation Research Record: Journal of the Transportation Research Board, Issue
No. 1999, 2007, pp. 141-149.
9
14. Dempster, A. P., Laird, N.M., Rubin, D. B., Maximum Likelihood from Incomplete Data via the EM
Algorithm, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1. (1977), pp
1-38.
15. Stephens, M. A., Use of the Kolmogorov-Smirnov, Cramer-Von Mises and Related Statistics Without
Extensive Tables, Journal of the Royal Statistical Society Series B (Methodological), Vol. 32, 1970, pp.
115-122.
16. Xiao, Y., and Gordon, A., A C++ Program for the Cramer-von Mises Two-Sample Test, Journal of
Statistical Software, Vol. 17, 2007, pp. 1-15.
17. Anderson, T. W., and Darling, D. A., Asymptotic Theory of Certain Goodness of Fit Criteria Based on
Stochastic Processes, Annals of Mathematical Statistics, Vol. 23, 1952, pp. 193-212
18. Scholz, F. W., and Stephens, M. A., K-sample Anderson-Darling tests of fit, for continuous and discrete
cases, Technical Report No. 81, Department of statistics, University of Washington, 1986.
19. Dubey, S.K., Ponnu, B. and Arkatkar S.S., Time Gap Modeling under Mixed Traffic Conditions: A
statistical Analysis. Journal of Transportation System Engineering and Information Technology, 12(6),
21−25, 2012.
10