B. Maddah
ENMG 622 Simulation
05/02/07
Selecting Input Probability Distributions 1
(Chapter 6, Law)
Introduction
The objective here is to determine what probability
distributions to use as input to a simulation.
The input distributions are usually determined based on
data collected from the real system (e.g., observations, on
arrival times, service times, demand, time to failure, etc.).
These data can be used in several ways: (i) the data values
themselves can be used directly in the simulation (a trace-driven
simulation); (ii) the data can be used to define an empirical
distribution; or (iii) a theoretical distribution can be fitted
to the data.
The third approach, fitting a theoretical distribution, is the
preferred one (if applicable).
Law lists several popular continuous and discrete
distributions (pp. 287 – 313).
Fitting an input probability distribution is usually done
through three activities:
o Activity I: Hypothesizing families of distributions
o Activity II: Estimation of parameters
o Activity III: Determining how representative the fitted
distributions are.
Activity I: Hypothesizing Families of Distributions
We need to decide what form or family to use: Exponential,
gamma, or what?
Sometimes we can use theoretical knowledge of a random
variable’s role in the simulation to hypothesize a distribution.
E.g.,
o Arrivals one-at-a-time, constant rate, independent:
Exponential interarrival times.
o Sum of many independent pieces: Normal.
o Product of many independent pieces: Lognormal.
o Service times: Cannot be normal (a normal random variable
can take negative values, but service times cannot).
o Proportion defective: Use a bounded distribution on (0,1).
Even in these cases, we should still validate our choice
empirically.
In the absence of theoretical knowledge (or when it is limited),
the following empirical tools can be used to hypothesize a
family of distributions.
Summary Statistics
These include simple descriptive statistics measures such as
mean, variance, median, mode, quartiles, etc.
By comparing the descriptive statistics of the sample with
those of a hypothesized distribution, one may get some hints.
For example, the coefficient of variation,
CV = standard deviation/mean,
is useful in distinguishing continuous distributions.
o CV > 1 suggests gamma or Weibull with shape α < 1.
o CV ≈ 1 suggests exponential.
o CV < 1 suggests gamma or Weibull with shape α > 1.
The Lexis ratio,
τ = variance/mean,
is useful in distinguishing discrete distributions.
o τ > 1 suggests negative binomial or geometric.
o τ ≈ 1 suggests Poisson.
o τ < 1 suggests binomial.
The skewness,
ν = E[(X − μ)³]/σ³,
is a measure of the symmetry of a distribution’s density.
o ν > 0 suggests right skewness (e.g., exponential).
o ν ≈ 0 suggests symmetry (e.g., normal).
o ν < 0 suggests left skewness (e.g., right triangular).
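As a quick sketch, the three summary statistics above (CV, Lexis ratio, skewness) can be computed with the standard library alone; the sample values below are made up purely for illustration.

```python
import statistics

# Hypothetical sample (made-up values, for illustration only).
sample = [0.8, 1.2, 0.3, 2.5, 0.9, 1.7, 0.4, 3.1, 0.6, 1.1]

n = len(sample)
mean = statistics.mean(sample)
std = statistics.stdev(sample)        # sample standard deviation
var = statistics.variance(sample)     # sample variance

cv = std / mean                       # coefficient of variation
lexis = var / mean                    # Lexis ratio (meant for count data)
skew = sum((x - mean) ** 3 for x in sample) / n / std ** 3  # skewness

print(f"cv = {cv:.3f}, lexis ratio = {lexis:.3f}, skewness = {skew:.3f}")
```

A positive skewness value here, together with the other two statistics, would be one hint toward a right-skewed family such as the exponential.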
Histograms
For continuous distributions, a histogram of the data can be
(appropriately) constructed, with a density function fitted.
The shape of the fitted density should resemble the shape of
the hypothesized density.
For a discrete distribution, the histogram frequencies provide
an estimate of the probability mass function of the
hypothesized distribution.
Comparing the estimated pmf and the theoretical pmf
indicates how appropriate the hypothesized distribution is.
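For the discrete case, the relative frequencies described above can be computed directly; the demand observations below are made up for illustration.

```python
from collections import Counter

# Hypothetical weekly demand observations (made up, for illustration).
data = [0, 1, 0, 2, 1, 3, 0, 1, 2, 0, 1, 0, 4, 1, 2, 0]
n = len(data)

# Relative frequencies estimate the pmf of the underlying distribution.
freq = Counter(data)
pmf_hat = {k: freq[k] / n for k in sorted(freq)}

for k, p in pmf_hat.items():
    print(f"P(X = {k}) is estimated as {p:.3f}")
```

Comparing these estimated probabilities against the pmf of a candidate distribution (with fitted parameters) gives a first impression of the fit.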
Quantile Summaries
Sample quantiles are useful for detecting whether the
underlying distribution is symmetric or skewed.
Useful quantiles are
o The median: x0.5 = FX−1(0.5)
o The quartiles: x0.25 = FX−1(0.25) and x0.75 = FX−1(0.75)
o The octiles: x0.125 = FX−1(0.125) and x0.875 = FX−1(0.875)
o The extremes: minimum and maximum values.
A box plot is a graphical representation of the quantiles.
[Box plot showing the octiles x0.125 and x0.875, the quartiles
x0.25 and x0.75, and the median x0.5.]
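The quantile summary can be computed with `statistics.quantiles` (Python 3.8+); the interarrival times below are made up for illustration.

```python
import statistics

# Hypothetical interarrival times in minutes (made up, for illustration).
data = [0.1, 0.4, 0.2, 0.9, 0.3, 0.6, 1.5, 0.2,
        0.5, 2.2, 0.3, 0.7, 0.1, 1.1, 0.4, 0.8]

# n=8 returns the seven octile cut points x0.125, x0.25, ..., x0.875.
octs = statistics.quantiles(data, n=8, method="inclusive")
q1, median, q3 = octs[1], octs[3], octs[5]

# For a right-skewed sample, the upper quantiles sit much farther above
# the median than the lower quantiles sit below it.
print(f"median = {median}, quartiles = ({q1}, {q3})")
print(f"outer octiles = ({octs[0]}, {octs[6]})")
```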
Hypothesizing a Family of Distributions: Example with Continuous Data
Sample of n = 219 interarrival times of cars to a drive-up bank over a 90-minute
peak-load period
Number of cars arriving in each of the six 15-minute periods was
approximately equal, suggesting stationarity of arrival rate
Sample mean = 0.399 (all times in minutes) > median = 0.270, skewness = +1.458,
all suggesting right skewness
cv = 0.953, close to 1, suggesting exponential
Histograms (for different choices of interval width ∆b) suggest
an exponential shape.
The box plot is also consistent with an exponential distribution.
Hypothesizing a Family of Distributions: Example with Discrete Data
Sample of n = 156 observations on number of items demanded per week from an
inventory over a three-year period
Range 0 through 11
Sample mean = 1.891 > median = 1.00, skewness = +1.655, all suggesting right
skewness
Lexis ratio = 5.285/1.891 = 2.795 > 1, suggesting negative binomial or geometric
(special case of negative binomial)
The histogram suggests a geometric distribution.
Activity II: Estimation of Parameters
With hypothesized distribution(s) at hand, we need to
estimate numerical values for the distribution(s) parameters.
There are many methods for estimating parameters.
o Method of moments.
o Least squares.
o Maximum likelihood estimators (MLE).
Here we will focus on MLEs. This is the preferred method
because (i) it has good statistical properties; (ii) it justifies
the use of goodness-of-fit tests; and (iii) it is intuitive.
The MLE method operates on a set of observed values,
X1, X2, …, Xn. The idea of the MLE is to choose the
parameter(s) that maximize the probability that the random
variable of interest takes on the values X1, X2, …, Xn.
For example, for a discrete distribution having a single
parameter θ, the MLE estimator is

θ̂ = arg max L(θ) = pθ(X1) pθ(X2) ⋯ pθ(Xn),

where pθ(Xi) = P{X = Xi | parameter = θ} is the pmf of X.
For a continuous distribution, the density function fθ is used
in place of the pmf:

θ̂ = arg max L(θ) = fθ(X1) fθ(X2) ⋯ fθ(Xn).
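When no closed-form maximizer is available, L(θ) (or, more conveniently, its logarithm) can be maximized numerically. The sketch below grid-searches the log-likelihood of a geometric distribution with pmf p(x) = (1 − p)^x p on x = 0, 1, 2, …; the data values and the grid resolution are illustrative choices.

```python
import math

# Hypothetical count data (made up, for illustration).
data = [0, 2, 1, 0, 3, 1, 0, 5, 2, 1]
n = len(data)

def log_likelihood(p):
    # l(p) = n ln(p) + (sum of the Xi) ln(1 - p) for the geometric pmf.
    return n * math.log(p) + sum(data) * math.log(1 - p)

# Crude grid search over (0, 1); the MLE maximizes the log-likelihood.
p_hat = max((i / 1000 for i in range(1, 1000)), key=log_likelihood)
print(f"p_hat = {p_hat}")
```

For the geometric distribution the maximizer actually has the closed form n/(n + ΣXi), which the grid search recovers up to the grid resolution.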
Example: MLE estimator for the exponential distribution
For the exponential distribution, the density is f(x) = λe^(−λx).
Then, the likelihood function, L(λ), is

L(λ) = λ^n e^(−λ ΣXi),

where ΣXi denotes the sum X1 + X2 + … + Xn.
In order to determine the value of λ that maximizes L(λ), it
is useful to consider the log-likelihood

l(λ) = ln(L(λ)) = n ln(λ) − λ ΣXi.

Differentiating l(λ) gives

l′(λ) = n/λ − ΣXi,
l″(λ) = −n/λ² < 0.

Since l″(λ) < 0, setting the first derivative equal to zero gives
the (maximizing) estimator

λ̂ = n/ΣXi = 1/X̄,

where X̄ is the sample mean.
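The closed-form result λ̂ = 1/X̄ can be sanity-checked by simulation; the rate and sample size below are arbitrary illustrative choices.

```python
import random

random.seed(42)
true_rate = 2.5                      # illustrative choice of rate
sample = [random.expovariate(true_rate) for _ in range(10_000)]

# MLE derived above: the reciprocal of the sample mean.
lam_hat = len(sample) / sum(sample)
print(f"lam_hat = {lam_hat:.3f}")
```

With 10,000 observations the estimate should land close to the true rate of 2.5.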
Example: MLE estimator for the Poisson distribution
For the Poisson distribution, the pmf is p(x) = e^(−λ) λ^x / x!.
Then, the likelihood function, L(λ), is

L(λ) = e^(−nλ) λ^(ΣXi) / (X1! X2! ⋯ Xn!),

where ΣXi denotes the sum X1 + X2 + … + Xn, and

l(λ) = ln(L(λ)) = −nλ + (ΣXi) ln(λ) − Σ ln(Xi!).

Differentiating l(λ) gives

l′(λ) = −n + (1/λ) ΣXi,
l″(λ) = −(ΣXi)/λ² < 0.

Setting the first derivative equal to zero gives the estimator

λ̂ = (ΣXi)/n = X̄.
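Similarly, λ̂ = X̄ can be checked by simulating Poisson counts. The Python standard library has no Poisson sampler, so a small one based on Knuth's multiplicative method is sketched here; the mean λ = 3 and the sample size are illustrative choices.

```python
import math
import random

random.seed(7)

def poisson_sample(lam):
    # Knuth's method: multiply uniforms until the product drops
    # below e^(-lam); the number of factors needed is Poisson(lam).
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

true_lam = 3.0                        # illustrative mean
data = [poisson_sample(true_lam) for _ in range(20_000)]

# MLE derived above: the sample mean.
lam_hat = sum(data) / len(data)
print(f"lam_hat = {lam_hat:.3f}")
```

With 20,000 simulated counts the sample mean should sit very close to the true mean of 3.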