Selecting Input Probability
Distributions
Techniques for Assessing Sample
Independence - 1
• Most methods to specify input distributions assume observed
data X1, X2, ..., Xn are an independent (random) sample from
some underlying distribution
– If not, most methods are invalid
– Need a way to check data empirically for independence
– Heuristic plots vs. formal statistical tests for independence
Techniques for Assessing Sample
Independence - 2
• Correlation plot: If data are observed in a time sequence,
compute sample correlation with lag j and plot it
ρ̂_j = [ Σ_{i=1}^{n−j} (X_i − X̄(n)) (X_{i+j} − X̄(n)) ] / [ (n − j) S²(n) ]
– If data are independent then the correlations should be near
zero for all lags
– Keep in mind that these are just estimates
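• As a rough illustration (not from the book), a minimal sketch of this computation, assuming NumPy; the expo(1) sample stands in for real data:

```python
import numpy as np

def corr_estimate(x, lag):
    """Estimate rho_hat_j for a time-ordered sample x at the given lag j."""
    x = np.asarray(x, dtype=float)
    n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)    # Xbar(n), S^2(n)
    num = np.sum((x[:n - lag] - xbar) * (x[lag:] - xbar))
    return num / ((n - lag) * s2)

rng = np.random.default_rng(42)
data = rng.exponential(scale=1.0, size=500)          # independent by construction
corrs = [corr_estimate(data, j) for j in range(1, 21)]
# For independent data these estimates should all hover near zero.
```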
Techniques for Assessing Sample
Independence - 3
• Scatter diagram: Plot pairs (Xi, Xi+1)
– If data are independent the pairs should be scattered
randomly
– If data are positively (negatively) correlated the pairs will lie
along a positively (negatively) sloping line
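• A quick sketch of the scatter diagram (assuming NumPy and matplotlib; the simulated sample is for illustration only):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0, size=500)   # independent sample for illustration
plt.scatter(x[:-1], x[1:], s=8)            # pairs (X_i, X_{i+1})
plt.xlabel("X_i")
plt.ylabel("X_{i+1}")
plt.show()                                 # no sloping line => no obvious correlation
```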
Techniques for Assessing Sample
Independence
• Independent draws from expo(1) distribution (independent by
construction)
Techniques for Assessing Sample
Independence
• Delays in queue from M/M/1 queue with utilization factor ρ = 0.8
(positively correlated)
• There are also formal statistical testing procedures, like the “runs”
test
Activity I: Hypothesizing Families of
Distributions - 1
• First, need to decide what form or family to use—exponential,
gamma, or what?
• Later, need to estimate parameters and assess goodness of fit
• Sometimes prior knowledge of a certain random variable’s role in
simulation may be useful:
– Requires no data
– Use theoretical knowledge of random variable’s role in
simulation
Activity I: Hypothesizing Families of
Distributions - 2
• Sometimes prior knowledge of a certain random variable’s
role in simulation may be useful:
– Seldom have enough prior knowledge to specify a distribution
completely. Exceptions are:
• Arrivals one-at-a-time, constant mean rate, independent:
exponential interarrival times or Poisson arrival process
• Sum of many independent pieces: normal
• Product of many independent pieces: lognormal
• Percentage/probability: beta (range is [0, 1])
– Often use prior knowledge to rule out distributions on basis of
range
• Interarrival times: not normal (normal range always goes negative)
– Still should be supported by data (e.g., for parameter-value
estimation)
Summary Statistics
• Compare simple sample statistics with theoretical population
versions for some distributions to get a hint
– Bear in mind that we get only estimates subject to uncertainty
• If sample mean X̄(n) and sample median x̂₀.₅(n) are close, this
suggests a symmetric distribution
• Coefficient of variation of a distribution: cv = σ/μ; estimate via
ĉv = S(n)/X̄(n); sometimes useful for discriminating between
continuous distributions
– cv < 1 suggests gamma or Weibull with shape parameter α > 1
– cv = 1 suggests exponential
– cv > 1 suggests gamma or Weibull with shape parameter α < 1
Summary Statistics
• Lexis ratio of a distribution: τ = σ²/μ; estimate via τ̂ = S²(n)/X̄(n);
sometimes useful for discriminating between discrete
distributions:
– τ < 1 suggests binomial
– τ = 1 suggests Poisson
– τ > 1 suggests negative binomial or geometric
• Other summary statistics: range, skewness, kurtosis
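• A short sketch computing these hints (assuming NumPy; ĉv applies to continuous data, τ̂ to discrete data):

```python
import numpy as np

def summary_hints(x):
    x = np.asarray(x, dtype=float)
    mean, median = x.mean(), np.median(x)
    cv_hat = x.std(ddof=1) / mean       # cv_hat = S(n) / Xbar(n), continuous data
    tau_hat = x.var(ddof=1) / mean      # tau_hat = S^2(n) / Xbar(n), discrete data
    return {"mean": mean, "median": median, "cv": cv_hat, "lexis": tau_hat}
```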
Histograms
Continuous Data Set
• h_j: basically an unbiased estimate of Δb·f(x), where f(x) is the true
(unknown) underlying density of the observed data and Δb is a constant
• Break range of data into k intervals of width Δb each
– k and Δb are determined basically by trial and error
– One rule of thumb: Sturges's rule
k = ⌊1 + log₂ n⌋ = ⌊1 + 3.322 log₁₀ n⌋
– Compute proportion h_j of data falling in the jth interval; plot a constant of
height h_j above the jth interval
– Shape of plot should resemble density of underlying distribution;
compare shape of histogram to density shapes.
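• For example, Sturges's rule in code (a first guess only; in practice tune k by trial and error):

```python
import math

def sturges_k(n):
    """Sturges's rule: k = floor(1 + log2 n)."""
    return math.floor(1 + 3.322 * math.log10(n))

# sturges_k(219) == 8, a starting point for the bank interarrival-time data
```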
Histograms
Discrete Data Set
• Basically an unbiased estimate of the (unknown) underlying probability
mass function of the data
• For each possible value xj that can be assumed by the data, let hj be
the proportion of the data that are equal to xj; plot a bar of height hj
above xj
• Shape of plot should resemble mass function of underlying distribution;
compare shape of histogram to mass-function shapes in Sec. 6.2.3
Multimodal Data
• Histogram might have multiple local modes, rather than just one; no
single “standard” distribution adequately represents it
• Reason: data can be separated on some context-dependent basis (e.g.,
observed machine downtimes are classified as minor vs. major)
• Separate data on this basis, fit separately, recombine as a mixture
Quantile Summaries
• Numerical synopsis of sample quantiles useful for detecting whether
underlying density or mass function is symmetric or skewed one way
or the other
• Definition of quantiles: Suppose the CDF F(x) is continuous and
strictly increasing whenever 0 < F(x) < 1, and let q be strictly
between 0 and 1. Then the q-quantile of F(x) is the number xq such
that F(x_q) = q. If F⁻¹ is the inverse of F, then x_q = F⁻¹(q)
– q = 0.5: median
– q = 0.25 or 0.75: quartiles
– q = 0.125, 0.875: octiles
– q = 0, 1: extremes
Quantile Summaries
• Quantile summary: List median, average of quartiles,
average of octiles, and average of extremes
• If distribution is symmetric, then
median ≈ average of quartiles ≈ average of octiles ≈
average of extremes
• If distribution is skewed right, then
median < average of quartiles < average of octiles <
average of extremes
• If distribution is skewed left, then
median > average of quartiles > average of octiles >
average of extremes
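• A minimal sketch of such a quantile summary (assuming NumPy):

```python
import numpy as np

def quantile_summary(x):
    q = np.quantile(x, [0.0, 0.125, 0.25, 0.5, 0.75, 0.875, 1.0])
    return {
        "median": q[3],
        "avg_quartiles": (q[2] + q[4]) / 2,
        "avg_octiles": (q[1] + q[5]) / 2,
        "avg_extremes": (q[0] + q[6]) / 2,
    }
# Values increasing down this list suggest right skewness; decreasing, left.
```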
Box Plots
• Graphical display of quantile summary
• On horizontal axis, plot median, extremes, octiles, and a box
ending at quartiles
• Symmetry or asymmetry of plot indicates symmetry or
skewness of distribution
Hypothesizing a Family of Distributions:
Example with Continuous Data
• A sample of n = 219 interarrival times of cars at a drive-up bank over a
90-minute peak-load period was obtained
– Number of cars arriving in each of the six 15-minute periods was
approximately equal (suggesting stationarity of arrival rate)
– Sample mean = 0.399 (all times in minutes) > median = 0.270,
skewness = +1.458 (suggesting right skewness)
– cv = 0.953, close to 1 (suggesting exponential distribution)
• Histograms (for different choices of interval width Db) suggest the
exponential distribution
Hypothesizing a Family of Distributions:
Example with Discrete Data
• A sample of n = 156 observations on the number of items demanded per
week from an inventory over a three-year period was obtained
– Range 0 through 11
– Sample mean = 1.891 > median = 1.00, skewness = +1.655
(suggesting right skewness)
– Lexis ratio = 5.285/1.891 = 2.795 > 1 (suggesting negative
binomial or geometric (special case of negative binomial)
distribution)
• Histogram suggests the geometric distribution
Activity II: Estimation of Parameters
• Have: Hypothesized parametric distribution
• Need: Numerical estimates of its parameter(s)—this constitutes the “fit”
• Many methods to estimate distribution parameters
– Method of moments
– Unbiased estimates
– Least squares estimation
– Maximum likelihood estimation (MLE)
• In some sense, MLE is the preferred method for our purposes
– Good statistical properties
– Somewhat justifies chi-square goodness-of-fit test
– Intuitive
– Allows estimates of error in the parameters—sensitivity analysis
Activity II: Estimation of Parameters
• Idea for MLEs:
– Have observed sample X1, X2, ..., Xn
– Came from some true (unknown) parameter(s) of the distribution form
– Pick the parameter(s) that make it most likely that you would get what
you did get (or close to what you got in the continuous case)
– An optimization (mathematical-programming) problem, often messy
Desirable properties of MLEs
• Invariance principle: Let θ̂₁, …, θ̂_m be the MLEs of the
parameters θ₁, …, θ_m. Then the MLE of any function
h(θ₁, …, θ_m) of these parameters is the function h(θ̂₁, …, θ̂_m)
of the MLEs.
• Asymptotic behavior of the MLEs: When the sample
size is large:
– The MLE is approximately the Minimum Variance
Unbiased Estimator (MVUE) of θ.
– The MLE is approximately normal
MLEs for Discrete Distributions
• Have hypothesized family with PMF pθ (xj) = Pθ(X = xj)
• Single (for now) unknown parameter θ to be estimated
• For any trial value of θ, the probability of getting the already-observed
sample is
• P{getting X₁, X₂, …, Xₙ} = Pθ(X = X₁) Pθ(X = X₂) ⋯ Pθ(X = Xₙ)
= pθ(X₁) pθ(X₂) ⋯ pθ(Xₙ)
= likelihood function L(θ)
• Task: Find the value of θ that makes L(θ) as big as it can be
• How? Differential calculus, take logarithm (turns products into sums),
nonlinear programming methods, tabled values in the book
• MLEs for continuous distributions are obtained by replacing the PMF
with the PDF
• MLEs for multiparameter distributions are obtained similarly where the
likelihood function is a function of all of the parameters
Example of Continuous MLE
• X̄(219) = 0.399
• {Xᵢ | i = 1, …, 219}: a random sample
• Hypothesized distribution is the exponential distribution, with density
f(x) = (1/β) e^{−x/β} if x ≥ 0, and f(x) = 0 otherwise
• Likelihood function is
L(β) = [(1/β) e^{−X₁/β}] [(1/β) e^{−X₂/β}] ⋯ [(1/β) e^{−Xₙ/β}]
= β^{−n} exp(−(1/β) Σ_{i=1}^{n} Xᵢ)
• Want value of β that maximizes L(β) over all β > 0
• Equivalent (and easier) to maximize the log-likelihood function l(β) =
ln L(β) since ln is a monotonically increasing function
Example of Continuous MLE
l(β) = ln L(β) = −n ln β − (1/β) Σ_{i=1}^{n} Xᵢ

dl(β)/dβ = −n/β + (1/β²) Σ_{i=1}^{n} Xᵢ = 0  ⇒  β̂ = Σ_{i=1}^{n} Xᵢ / n = X̄(n)

d²l(β)/dβ² = n/β² − (2/β³) Σ_{i=1}^{n} Xᵢ, so at β = β̂ = X̄(n):

d²l/dβ² |_{β̂} = n/X̄(n)² − 2nX̄(n)/X̄(n)³ = −n/X̄(n)² < 0,

confirming a maximum. Thus β̂ = X̄(n) = 0.399 from the observed sample of n = 219 points
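• A sketch confirming the closed-form result numerically (assuming SciPy; the simulated sample stands in for the bank data):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(beta, x):
    """-l(beta) for the expo(beta) likelihood above."""
    return len(x) * np.log(beta) + x.sum() / beta

rng = np.random.default_rng(1)
x = rng.exponential(scale=0.4, size=219)   # stand-in for the real data
res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), args=(x,),
                      method="bounded")
print(res.x, x.mean())                     # numeric optimum matches Xbar(n)
```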
Example of Discrete MLE
• Hypothesized distribution is the geometric distribution, with PMF
p_p(x) = p(1 − p)^x (x = 0, 1, 2, …)
• Likelihood function is
L(p) = [p(1 − p)^{X₁}] [p(1 − p)^{X₂}] ⋯ [p(1 − p)^{Xₙ}] = pⁿ (1 − p)^{Σ_{i=1}^{n} Xᵢ}
• Want value of p that maximizes L(p) over all 0 < p < 1
• Equivalent (and easier) to maximize the log-likelihood function l(p) =
ln L(p) since ln is a monotonically increasing function
Example of Discrete MLE
l(p) = n ln p + (Σ_{i=1}^{n} Xᵢ) ln(1 − p)

dl(p)/dp = n/p − (Σ_{i=1}^{n} Xᵢ)/(1 − p) = 0  ⇒  p̂ = 1 / (X̄(n) + 1)

d²l(p)/dp² = −n/p² − (Σ_{i=1}^{n} Xᵢ)/(1 − p)² < 0, so p̂ is the maximizer

MLE is p̂ = 1/(1.891 + 1) = 0.346 from the observed sample of n = 156 points

E[d²l(p)/dp²] = −n/p² − n E[Xᵢ]/(1 − p)² = −n/p² − n[(1 − p)/p]/(1 − p)²
= −n / [p²(1 − p)]

δ²(p) = −n / E[d²l(p)/dp²] = p²(1 − p)

Approximate 100(1 − α)% CI: p̂ ± z_{1−α/2} √(δ²(p̂)/n) = p̂ ± z_{1−α/2} √(p̂²(1 − p̂)/n)

90% CI: 0.346 ± 1.645 √(0.346²(1 − 0.346)/156) = 0.346 ± 0.037 = [0.309, 0.383]
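• The same computation as a plain-Python sketch (numbers from the demand-size data above):

```python
import math

n, xbar = 156, 1.891
p_hat = 1.0 / (xbar + 1.0)                       # MLE: about 0.346
# half-width uses delta^2(p_hat)/n = p_hat^2 (1 - p_hat) / n
half = 1.645 * math.sqrt(p_hat**2 * (1.0 - p_hat) / n)
ci = (p_hat - half, p_hat + half)                # about (0.309, 0.383)
```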
Activity III: Determining How Representative
the Fitted Distributions Are
• Have: Hypothesized family and estimated parameters
• Question: Does the fitted distribution agree with the observed data?
• Approaches
– Heuristic
• Density/Histogram overplots and frequency comparisons
• Distribution function differences plots
• Probability and quantile plots
– P-P plot
– Q-Q plot
– Statistical tests
• Chi-square goodness-of-fit tests
• Kolmogorov-Smirnov tests
• Anderson-Darling tests
• Poisson process tests
Heuristic Procedures
Continuous Data
• Density/Histogram Overplots and Frequency Comparisons
– Density/histogram overplot: Plot Δb·f̂(x) over the histogram
h(x); look for similarities (recall that the area under h(x) is Δb
and f̂ is the density of the fitted distribution)
– Interarrival-time data for drive-up bank and fitted exponential
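• A sketch of such an overplot for a fitted exponential (assuming NumPy and matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

def overplot_expo(x, k):
    x = np.asarray(x, dtype=float)
    beta = x.mean()                                   # fitted exponential mean
    _, edges, _ = plt.hist(x, bins=k,
                           weights=np.full(len(x), 1.0 / len(x)))  # heights h_j
    db = edges[1] - edges[0]                          # interval width delta_b
    xs = np.linspace(0.0, x.max(), 200)
    plt.plot(xs, db * np.exp(-xs / beta) / beta)      # delta_b * f_hat(x)
    plt.show()
```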
Heuristic Procedures
Continuous Data
• Data Set 3: against a Weibull distribution
[Density-histogram plot: fitted Weibull density ("1 - Weibull") overplotted on a
histogram of Data Set 3; 13 intervals of width 1.2; x-axis: interval midpoint;
y-axis: density/proportion]
Heuristic Procedures
• Frequency comparison
– Take histogram intervals [b_{j−1}, b_j] for j = 1, 2, ..., k, each of width Δb
– Let hj be the observed proportion of data in jth interval
– Let
r_j = ∫_{b_{j−1}}^{b_j} f̂(x) dx
be the expected proportion of data in the jth interval if the fitted
distribution is correct
– Plot hj and rj together, look for similarities
Heuristic Procedures
Discrete Data
• Frequency comparison
– Let hj be the observed proportion of data that are equal to the jth
possible value xj
– Let r_j = p̂(x_j)
be the expected proportion of the data equal to x_j if the fitted
probability mass function p̂ is correct
– Plot hj and rj together, look for similarities
– Demand-size data for inventory and fitted geometric distribution
Distribution Function Differences Plots
• Density/histogram overplots are comparisons of individual probabilities
of fitted distribution with observed individual probabilities
• Instead of individual probabilities, one can compare cumulative
probabilities via fitted CDF F̂(x) against a (new) empirical CDF
F_n(x) = (number of Xᵢ's ≤ x) / n = proportion of data that are ≤ x
• Could plot F̂(x) with F_n(x) and look for similarities, but it is harder to
see such similarities for cumulative than for individual probabilities
• Alternatively, plot F̂(x) − F_n(x) against the range of x values and look
for closeness to a flat horizontal line at 0.
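• A sketch of the values behind such a plot (assuming NumPy; fitted_cdf is any vectorized CDF you supply, e.g. for the fitted exponential):

```python
import numpy as np

def cdf_differences(x, fitted_cdf):
    xs = np.sort(x)
    emp = np.arange(1, len(xs) + 1) / len(xs)   # F_n at the sorted data points
    return xs, fitted_cdf(xs) - emp             # plot these; want ~ flat at 0

# Example: xs, d = cdf_differences(x, lambda t: 1.0 - np.exp(-t / x.mean()))
```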
Distribution Function Differences Plots
Interarrival-time data for drive-up bank and fitted exponential distribution
Distribution Function Differences Plots
[Distribution-function-differences plot: Data Set 3 vs. fitted Weibull
("1 - Weibull", mean difference = 0.01706); y-axis: difference (proportion);
x-axis: x over the data range. Use caution if the plot crosses the zero line.]
Distribution Function Differences Plots
Demand-size data for inventory and fitted geometric distribution
Goodness-of-Fit Test
• Formal statistical hypothesis tests for
– H₀: The observed data X₁, X₂, ..., Xₙ are IID random variables
with distribution function F̂
• Caution: Failure to reject H0 does not constitute “proof” that the fit
is good
– Power of some goodness-of-fit tests is low, particularly for
small sample size n
• Also, large n creates high power, so tests will nearly always reject
H0
– Keep in mind that null hypotheses are seldom literally true,
and we are looking for an “adequate” fit of the distribution
Chi-Square Test
• Very old (Karl Pearson, 1900), and general (continuous or discrete
data)
• It is a statistical formalization of frequency comparisons
• Divide range of data into k intervals, not necessarily of equal
width:
– [a₀, a₁), [a₁, a₂), ..., [a_{k−1}, a_k]
– a₀ could be −∞ or a_k could be +∞
• Compare actual amount of observed data in each interval with what
the fitted distribution would predict
– Let Nj = the number of observed data points in the jth interval
– Let pj = the expected proportion of the data in the jth interval if
the fitted distribution were literally true
Chi-Square Test
p_j = ∫_{a_{j−1}}^{a_j} f̂(x) dx   for continuous data
p_j = Σ_{a_{j−1} ≤ x ≤ a_j} p̂(x)   for discrete data
• Thus, n·p_j = expected (under fitted distribution) number of points in
the jth interval
• If fitted distribution is correct, would expect that N_j ≈ n·p_j
• The test statistic is
χ² = Σ_{j=1}^{k} (N_j − n·p_j)² / (n·p_j)
Chi-Square Test
• Under H₀: fitted distribution is correct, χ² has (approximately—see
book for details) a chi-square distribution with k − 1 degrees of
freedom
• Reject H₀ at level α if χ² > the upper 1 − α critical value
• Advantages
– Completely general
– Asymptotically valid (as n → ∞) if MLEs were used
• Drawbacks
– Arbitrary choice of intervals (can affect test conclusion)
– Conventional advice
• Want n·p_j ≥ 5 or so for all but a couple of j's
• Pick intervals such that the pj’s are close to each other
Chi-Square Test
Exponential distribution fitted to interarrival-time data
• Choose k = 20 intervals so that p_j = 1/20 = 0.05 for each
interval (see book for details on how the endpoints were
chosen, which involved inverting the exponential CDF and
taking a₂₀ = +∞)
• Thus, n·p_j = (219)(0.05) = 10.95 for each interval
• Counted observed frequencies Nj, computed test statistic χ2 =
22.188
• Use degrees of freedom = k − 1 = 19; upper 0.10 critical value is
χ²₁₉,₀.₉₀ = 27.204
• Since test statistic does not exceed the critical level, do not
reject H0
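• A sketch of this equal-probability chi-square test for a fitted exponential (assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

def chi_square_expo(x, k=20, alpha=0.10):
    x = np.asarray(x, dtype=float)
    n, beta = len(x), x.mean()                          # beta_hat = Xbar(n)
    q = np.arange(k) / k                                # 0, 1/k, ..., (k-1)/k
    edges = np.append(-beta * np.log(1.0 - q), np.inf)  # invert expo CDF; a_k = +inf
    N, _ = np.histogram(x, bins=edges)                  # observed counts N_j
    chi2 = np.sum((N - n / k) ** 2 / (n / k))           # n*p_j = n/k per interval
    crit = stats.chi2.ppf(1.0 - alpha, df=k - 1)        # k - 1 degrees of freedom
    return chi2, crit, chi2 > crit                      # True => reject H0
```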
Chi-Square Test
Geometric distribution fitted to demand-size data
• Since data are discrete, cannot choose intervals so that the pj’s are
exactly equal to each other
• Choose k = 3 intervals (classes) {0}, {1, 2}, and {3, 4, ...}
• Obtained np1 = 53.960, np2 = 58.382, and np3 = 43.658
• Counted observed frequencies Nj, computed test statistic χ2 = 1.930
• Use degrees of freedom = k − 1 = 2; upper 0.10 critical value is
χ²₂,₀.₉₀ = 4.605
• Since test statistic does not exceed the critical level, do not reject H0
Chi-Square Test
• Data Set 3 against Weibull
Chi-Square Test
• Data Set 3 against Gamma
Kolmogorov-Smirnov Test
• Advantages with respect to chi-square tests
– No arbitrary choices, like intervals
– Exactly valid for any (finite) n
• Disadvantage with respect to chi-square tests
– Not as general
• It is a kind of a statistical formalization of probability plots
– Compare empirical CDF from data against fitted CDF
• Yet another version of empirical distribution function is used
– F_n(x) = proportion of the Xᵢ data that are ≤ x (a piecewise-constant
step function)
– On the other hand, we have the fitted CDF F̂(x)
– In a perfect world, F_n(x) = F̂(x) for all x
Kolmogorov-Smirnov Test
• The worst (vertical) discrepancy is D_n = sup_x | F_n(x) − F̂(x) |
• Computing D_n (must be careful; sometimes stated incorrectly):
D_n⁺ = max_{i=1,2,…,n} [ i/n − F̂(X₍ᵢ₎) ]
D_n⁻ = max_{i=1,2,…,n} [ F̂(X₍ᵢ₎) − (i − 1)/n ]
D_n = max(D_n⁺, D_n⁻)
• Reject H₀: the fitted distribution is correct if D_n is too big
• If all parameters are known, reject H₀ at level α if
(√n + 0.12 + 0.11/√n) D_n > c_{1−α}
where c_{1−α} is the tabled critical value
• There are several different kinds of tables depending on the form
and specification of the hypothesized distribution (see book for
details and example)
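• A sketch of the careful computation of D_n via the order statistics (assuming NumPy; fitted_cdf as before):

```python
import numpy as np

def ks_statistic(x, fitted_cdf):
    n = len(x)
    z = fitted_cdf(np.sort(x))          # F_hat(X_(i)) for i = 1..n
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - z)
    d_minus = np.max(z - (i - 1) / n)
    return max(d_plus, d_minus)

# Note: if parameters were estimated from the data, the all-parameters-known
# critical values are not valid; use the appropriate table (see book).
```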
Anderson-Darling Test
• As in the K-S test, look at vertical discrepancies between F̂(x) and F_n(x)
• Difference:
– K-S weights differences the same for each x
– Sometimes more interested in getting accuracy in (right) tail
• Queueing applications
• The P-K (Pollaczek-Khinchine) formula depends on the variance of service times
– A-D applies increasing weight on differences toward tails
– A-D is more sensitive (powerful) than K-S in tail discrepancies
• Define the weight function ψ(x) = 1 / { F̂(x) [1 − F̂(x)] }
• Note that ψ(x) is smallest (= 4) at the median, where F̂(x) = 1/2,
and largest (→ ∞) in either tail
Anderson-Darling Test
• Test statistic is
A_n² = n ∫_{−∞}^{∞} [F_n(x) − F̂(x)]² ψ(x) f̂(x) dx
= −{ Σ_{i=1}^{n} (2i − 1) [ ln F̂(X₍ᵢ₎) + ln(1 − F̂(X₍ₙ₊₁₋ᵢ₎)) ] } / n − n
• Reject H0: The fitted distribution is correct if An2 is too big
• There are several different kinds of tables depending on the form
and specification of the hypothesized distribution (see book for
details and example)
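• A sketch of the computational form of A_n² (assuming NumPy; requires 0 < F̂(X₍ᵢ₎) < 1 for all i):

```python
import numpy as np

def ad_statistic(x, fitted_cdf):
    z = fitted_cdf(np.sort(x))          # F_hat(X_(i)), increasing in i
    n = len(z)
    i = np.arange(1, n + 1)
    # z[::-1] supplies F_hat(X_(n+1-i)) for each i
    s = np.sum((2 * i - 1) * (np.log(z) + np.log(1.0 - z[::-1])))
    return -s / n - n                   # reject H0 if this is too big
```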
Poisson-Process Test
• Common situation in simulation: modeling an event or arrival
process over time:
– Arrivals of customers or jobs, Breakdowns of machines,
Occurrence of accidents
• A popular (and realistic) model is the Poisson process with some
rate λ
• Equivalent definitions are
1. Number of events in (t1, t2] ~ Poisson with mean λ (t2 – t1)
2. Time between successive events ~ exponential with mean 1/ λ
3. Given the number of events in a fixed period of time, the event
times are distributed uniformly over that period
• Use the second or third definition to develop tests for whether observed
data come from a Poisson process:
• Test for inter-event times being exponential (chi-square, K-S, A-D, ...)
• Test for placement of events over time being uniform
• See book for details and example
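• A sketch of the uniformity version of this test (assuming SciPy; event_times are the observed arrival epochs within [0, T]):

```python
import numpy as np
from scipy import stats

def poisson_uniformity_test(event_times, T):
    u = np.asarray(event_times, dtype=float) / T   # rescale epochs to [0, 1]
    return stats.kstest(u, "uniform")              # K-S test against U(0, 1)
```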
The ExpertFit Software
• Need software assistance to carry out the statistical procedures
• Standard statistical-analysis packages do not suffice
– Often too oriented to normal-theory and related distributions
– Need wider variety of “nonstandard” distributions to achieve an adequate fit
– Difficult calculations like inverting non-closed-form CDFs, computation of critical
values and p-values for tests
• ExpertFit package is tailored to these needs
• Other packages exist, sometimes packaged with simulation-modeling
software
• See book for details on ExpertFit and an extended, in-depth example
• In Arena, distribution fitting is done with the Input Analyzer
Shifted Distributions
• Many standard distributions have range [0, ∞)
– Exponential, gamma, Weibull, lognormal, PT5, PT6, log-logistic
• But in some situations we'd like the range to be [γ, ∞) for some
parameter γ
– A service time cannot physically be arbitrarily close to 0; there
is some absolute positive minimum γ for the service time
• One can shift one of the above distributions to the right by γ
– Replace x in their density definitions by x − γ (including in the
definition of the ranges)
• Introduces a new parameter γ that must be estimated from the data
– Depending on the distribution form, this may be relatively easy
(e.g., exponential) or very challenging (e.g., global MLEs are
ill-defined for gamma, Weibull, lognormal)
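• For the easy exponential case, a minimal sketch of one common heuristic fit (taking γ̂ near the sample minimum; this is an illustrative shortcut, not the book's full treatment):

```python
import numpy as np

def fit_shifted_expo(x):
    x = np.asarray(x, dtype=float)
    gamma_hat = x.min()                  # crude shift estimate (sample minimum)
    beta_hat = x.mean() - gamma_hat      # mean of the shifted data
    return gamma_hat, beta_hat
```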
Truncated Distributions
• Data are well-fitted by a distribution with range [0, ∞) but
physical situation dictates that no value can exceed some finite
constant b
• Need to truncate the distribution above b to make effective
range [0, b]
• This is really a random variate-generation issue (covered in
Chapter 8)