
Quality Technology &
Quantitative Management
Vol. 3, No. 4, pp. 383-400, 2006
QTQM
© ICAQM 2006
On Control Charts Based on
the Generalized Poisson Model
B. He1, M. Xie2, T. N. Goh2 and K. L. Tsui3
1 Philips Electronics (S) Pte. Ltd., Lorong , Singapore
2 National University of Singapore, Kent Ridge Crescent, Singapore
3 Georgia Institute of Technology, Atlanta, GA, USA
(Received March 2005, accepted November 2005)
______________________________________________________________________
Abstract: The Poisson distribution is widely used to fit count data such as the number of
nonconformities in a product unit. However, this distribution fails to fit the defect data in a high-quality
manufacturing environment when over-dispersion occurs. In this paper, the generalized Poisson
distribution is studied as an alternative distribution. The generalized Poisson distribution is flexible and
capable of dealing with over-dispersed data. In particular, the interpretation of parameters is discussed
and statistical monitoring procedures for count data that can be modeled by the generalized Poisson
distribution are studied. Based on the generalized Poisson distribution, two different procedures are
discussed for an effective process monitoring. Sensitivity analyses of the two monitoring procedures are
also presented. To validate the use of the generalized Poisson distribution, three statistical tests for testing
the Poisson distribution against the generalized Poisson alternative are also investigated and compared.
Keywords: Average run length, control chart, generalized Poisson distribution, hypothesis testing,
monitoring procedure.
______________________________________________________________________
1. Introduction
In manufacturing industries, the Poisson distribution has been widely used to model the
number of defects, and control charts based on the Poisson distribution can be
constructed for process control of defect data. In the past decades, many successful
applications of data modeling and process control based on the Poisson distribution have
been reported in industrial practice. The Poisson distribution has become one of the
dominant models for count data.
High-quality manufacturing processes are commonly adopted to achieve customer
satisfaction and retain competitiveness. If the control chart based on the Poisson
distribution is used to control such processes, a large number of false alarms could be
observed. The reason is that the Poisson distribution is no longer a suitable model to fit the
data (Xie and Goh [33]). Consequently, the control chart based on the Poisson distribution
has inappropriate control limits, which directly lead to a large number of false alarms.
For the data from high-quality manufacturing processes, the ratio of the sample
variance to the sample mean is usually larger than 1 (for example, Lambert [20]; Xie and
Goh [33]; Ramirez and Cantell [23]). This is the so-called over-dispersion as the Poisson
distribution has equal mean and variance. In fact, over-dispersed data arise not only
in high-quality manufacturing processes, but also in many other fields (for example,
Shankar et al. [27]; Woodall [31]; Bohning [3]; Freund et al. [12]; Toscas and Faddy [29];
From [13]; Song [28]; Luceno [21]). In many practical situations, the Poisson distribution is
no longer suitable and some alternative distributions have been suggested. One of these
alternatives is the zero-inflated Poisson distribution (Lambert [20]), which has a simple
probability mass function and a reasonable two-population interpretation.
In this paper, the generalized Poisson distribution is studied as the alternative model.
The generalized Poisson distribution is a more direct generalization of the Poisson
distribution. It is highly versatile and has a complexity similar to that of the zero-inflated
Poisson distribution. Both distributions can be used, but model selection and fitting are
separate issues that are not considered in this paper, which focuses on the generalized
Poisson distribution.
The purpose of this paper is to study the generalized Poisson distribution as an
alternative model when the Poisson distribution is not suitable in the situation of
high-quality process control. Control procedures based on the generalized Poisson
distribution are investigated. In what follows, we will first give a brief review of the
generalized Poisson distribution, and then we will present two monitoring procedures
based on this distribution together with their statistical performance. Furthermore, to
validate the use of the generalized Poisson distribution, hypothesis testing of the Poisson
distribution (Fang [11]) against the generalized Poisson alternative is important. Three test
methods, the variance test, the O2 test and the Z2 test, will be studied and compared
through a simulation study.
2. Generalized Poisson Model and Applications
2.1. The Generalized Poisson Distribution
The generalized Poisson distribution (GPD) (Consul and Jain [6]; Tuenter [30]) has
two parameters (θ , λ ) with probability mass function given as:
$$P_X(\theta, \lambda) = \frac{\theta(\theta + x\lambda)^{x-1} e^{-\theta - x\lambda}}{x!}, \quad x = 0, 1, 2, \ldots, \quad \lambda \ge 0. \tag{1}$$
Theoretically, the parameter λ can also take negative values. However, since we are more
interested in the study of over-dispersed data in this paper, which corresponds to the
situation when λ is positive, we assume λ ≥ 0 here.
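As a quick numerical check of Equation (1), the following Python sketch evaluates the probability mass function in log space and compares numerically summed moments with the closed-form values $E(X) = \theta(1-\lambda)^{-1}$ and $\mathrm{Var}(X) = \theta(1-\lambda)^{-3}$ (Consul [7]). The function names, the truncation point `kmax` and the parameter values are illustrative assumptions, not part of the original study; the sketch assumes $0 \le \lambda < 1$.

```python
import math

def gpd_logpmf(x, theta, lam):
    """Log of the GPD probability mass function in Equation (1).

    Assumes theta > 0 and 0 <= lam < 1 (the over-dispersed case).
    """
    return (math.log(theta) + (x - 1) * math.log(theta + x * lam)
            - theta - x * lam - math.lgamma(x + 1))

def gpd_pmf(x, theta, lam):
    return math.exp(gpd_logpmf(x, theta, lam))

def gpd_moments_numeric(theta, lam, kmax=500):
    """Total probability, mean and variance from summing the pmf over 0..kmax."""
    probs = [gpd_pmf(k, theta, lam) for k in range(kmax + 1)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return total, mean, second - mean ** 2

theta, lam = 0.5, 0.3  # illustrative values, not taken from the paper
total, mean, var = gpd_moments_numeric(theta, lam)
print("numeric    :", total, mean, var)
print("closed form:", 1.0, theta / (1 - lam), theta / (1 - lam) ** 3)
```

Working with logarithms avoids overflow in $(\theta + x\lambda)^{x-1}$ and $x!$ for large counts. For $\lambda$ close to 1 the tail decays slowly, so a much larger `kmax` would be needed.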
For the generalized Poisson distribution we have (Consul [7]) $E(X) = \theta(1-\lambda)^{-1}$
and $\mathrm{Var}(X) = \theta(1-\lambda)^{-3}$. It should be pointed out that the GPD model is easy to use
since there are closed-form expressions for both mean and variance, and moment estimates
can be easily calculated. On the other hand, the maximum likelihood estimates can also be
obtained in a straightforward manner. Consider a set of independent observations
$\{X_1, X_2, \ldots, X_n\}$ with sample size n. Then the log-likelihood function is:
$$l(\theta, \lambda) = n\ln\theta + \sum_{i=1}^{n}(x_i - 1)\ln(\theta + x_i\lambda) - n\theta - n\bar{x}\lambda - \sum_{i=1}^{n}\ln x_i!, \tag{2}$$
where $\bar{x} = \sum_{i=1}^{n} x_i / n$. Taking derivatives with respect to the parameters, the maximum
likelihood estimators $(\hat{\theta}, \hat{\lambda})$ can be obtained by solving Equation (3).
Bednarski [1] proposed a robust estimation for the parameters of the generalized
Poisson model, and the estimators are proved to be optimal in the sense of local mini-max
testing.
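$$\begin{cases} \displaystyle\sum_{i=1}^{n} \frac{x_i(x_i - 1)}{\bar{x} + (x_i - \bar{x})\hat{\lambda}} - n\bar{x} = 0 \\[2ex] \hat{\theta} = \bar{x}(1 - \hat{\lambda}) \end{cases} \tag{3}$$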
2.2. An Application Example
An example is used here to illustrate the use of the generalized Poisson model. The
data in Table 1 are records of read-write errors discovered in computer hard disks from a
manufacturing process (Xie and Goh, [33]).
Table 1. A set of defect-count data from a manufacturing process.
0  11   0   0  75   0   0   0   0   0   0   0   0   0   0   0
0   0   1   2   0   0   0   1   0   0   0   0   0   0   0   0
1   0   2   4   0   0   2   0   0   0   0   0   0   0   2   0
0   0   1   1   0   0   0   0   0   0   0  75   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
1   3   0   0   0   0   0   0   1   0   0   3   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   5   0   0   0   0   0   0
0   0   6   0   0   0   0   1   0   0   0   0   0  15   0   0
0   0   0   9   0   2   9   6   0   0   0   0   0   0   0   0
From the data contained in Table 1, it can be seen that most of the hard disks have no
defects. Based on the data, we have that $n = 208$, $\bar{x} = 242/208$, and the maximum
likelihood estimates are $\hat{\theta} = 0.144297$ and $\hat{\lambda} = 0.875977$. The GPD model for the data set
is:
$$f(x) = \frac{0.144297\,(0.144297 + 0.875977x)^{x-1} e^{-0.144297 - 0.875977x}}{x!}, \quad x = 0, 1, 2, \ldots$$
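The estimates quoted above can be reproduced from Equation (3) with a simple root search. The sketch below is one possible approach, not necessarily the authors' procedure: it tallies Table 1 into a frequency table (180 zeros, 11 ones, and so on), solves the first line of Equation (3) for $\hat{\lambda}$ by bisection on $[0, 1)$, and then obtains $\hat{\theta}$ from the second line. The names `counts`, `gpd_mle` and `score` are illustrative, and the bisection assumes the left-hand side changes sign exactly once on the interval, which holds for this data set.

```python
# Frequency table of the 208 counts in Table 1: value -> number of units.
counts = {0: 180, 1: 11, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2, 9: 2, 11: 1, 15: 1, 75: 2}

def gpd_mle(counts, tol=1e-10):
    """Solve Equation (3) for (theta_hat, lambda_hat) by bisection on lambda."""
    n = sum(counts.values())
    xbar = sum(x * f for x, f in counts.items()) / n

    def score(lam):
        # Left-hand side of the first line of Equation (3).
        s = sum(f * x * (x - 1) / (xbar + (x - xbar) * lam)
                for x, f in counts.items())
        return s - n * xbar

    lo, hi = 0.0, 1.0 - 1e-12
    if score(lo) <= 0:
        # No over-dispersion detected: fall back to the Poisson case.
        return xbar, 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam_hat = 0.5 * (lo + hi)
    return xbar * (1.0 - lam_hat), lam_hat

theta_hat, lam_hat = gpd_mle(counts)
print(theta_hat, lam_hat)
```

Bisection is used here only because it needs no derivative and no external library; a Newton iteration or a general-purpose optimizer applied to Equation (2) would serve equally well.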
The data in Table 1 are from a high-quality manufacturing environment, and most of
the product units contain no defects. The zero-inflated Poisson could be used in such a case.
However, the generalized Poisson distribution can be interpreted as a multi-population
mixture and is more suitable to use when the over-dispersion results from the existence of
different types of causes of defects or different types of defects.
2.3. Process Monitoring
When process characteristics can be modeled with the generalized Poisson distribution,
exact probability control limits can be obtained. However, the lower control limit for the
GPD model may not exist. This is because the probability of zero-defect could be larger
than the desired Type I error probability. This phenomenon is common for attribute control
charts. The upper control limit will be studied here and a combined procedure will be
discussed later for process improvement monitoring, which is important in statistical
process control.
The upper control limit for a control chart based on the number of defects can be
obtained as the smallest integer satisfying the following inequality:
$$P(\text{more than } UCL_c \text{ defects in a sample unit}) \le \alpha_L,$$
where $\alpha_L$ is the predetermined false alarm probability for the upper control limit $UCL_c$.
For the data in Table 1, based on the maximum likelihood estimates of parameters, it
can be calculated that the upper control limit is 26 at the false alarm rate of 0.01. This
means that there should not be any alarm for the values less than or equal to 26 when the
underlying distribution model is the generalized Poisson distribution.
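The search for the upper control limit can be sketched in a few lines of Python: accumulate the GPD probabilities until the remaining tail probability drops to the false alarm rate or below. The function names and the `max_k` safeguard are assumptions made for illustration. The value printed for the Table 1 estimates should be compared with the limit of 26 reported above; the sketch does not hard-code it.

```python
import math

def gpd_pmf(x, theta, lam):
    """GPD probability mass function, Equation (1), evaluated in log space."""
    return math.exp(math.log(theta) + (x - 1) * math.log(theta + x * lam)
                    - theta - x * lam - math.lgamma(x + 1))

def gpd_tail(u, theta, lam):
    """P(X > u), computed as 1 minus the cumulative probability up to u."""
    cdf = 0.0
    for k in range(u + 1):
        cdf += gpd_pmf(k, theta, lam)
    return 1.0 - cdf

def gpd_upper_limit(theta, lam, alpha, max_k=100000):
    """Smallest integer u with P(X > u) <= alpha."""
    cdf = 0.0
    for u in range(max_k):
        cdf += gpd_pmf(u, theta, lam)
        if 1.0 - cdf <= alpha:
            return u
    raise ValueError("no control limit found below max_k")

# Maximum likelihood estimates reported for Table 1, false alarm rate 0.01.
ucl = gpd_upper_limit(0.144297, 0.875977, 0.01)
print(ucl)
```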
To study the sensitivity of the monitoring procedure, both the operating-characteristic
function and average run length are examined. The GPD model has two parameters, and
the two parameters could have different impact on the alarm probability. The
operating-characteristic function, expressed by the Type II error probability β , is a
measure of the inability of a control chart to detect process shifts. For the GPD model,
assuming that the parameters shift from ( θ 0 , λ0 ) to ( θ , λ ), the general formula of the Type
II error probability is:
$$\beta = P(X \le UCL(\theta_0, \lambda_0) \mid \theta, \lambda, \alpha_L) = \sum_{k=0}^{UCL} \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!}, \tag{4}$$
where X is the number of defects in a product unit, which is a GPD random variable with
parameters θ and λ , and α L is the Type I error probability. The average run length
(ARL) is the average number of samples to be checked in order to obtain a point outside
the control limit, which in this paper would be a point above the UCL. For the GPD model,
the ARL is given by:
$$ARL = \frac{1}{P(\text{point out of control limit})} = \frac{1}{\displaystyle\sum_{k=UCL+1}^{\infty} \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!}}. \tag{5}$$
Some numerical values of ARL for different $\alpha_L$, $\theta$ and $\lambda$ are presented in Table 2. For
example, with the false alarm probability $\alpha_L = 0.0005$, the ARL is calculated to be 2071
when parameters $\theta$ and $\lambda$ are 0.01 and 0.80, respectively. That is, on average, 2071
samples need to be checked in order to obtain a point plotted outside the control limit. It
should be pointed out that $\alpha_L = 0.0005$ is chosen so that the in-control
AIRL (Average Item Run Length) is close to that of the second monitoring procedure, which will
be discussed later. The AIRL is defined as the average number of items inspected in order
to obtain an out-of-control alarm. For example, the AIRL in the traditional c-chart is
exactly the same as the ARL. Some ARL curves are also shown in Figure 1.
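The ARL calculation can be sketched as follows: fix the upper control limit from the in-control parameters $(\theta_0, \lambda_0)$, then take the reciprocal of the probability of exceeding that limit under the shifted parameters $(\theta, \lambda)$. The function names are illustrative assumptions, and an alarm is taken to be a point strictly above the UCL, in line with Equation (4). The sketch is restricted to $\lambda < 1$.

```python
import math

def gpd_pmf(x, theta, lam):
    """GPD probability mass function, Equation (1), evaluated in log space."""
    return math.exp(math.log(theta) + (x - 1) * math.log(theta + x * lam)
                    - theta - x * lam - math.lgamma(x + 1))

def gpd_tail(u, theta, lam):
    """P(X > u) for a GPD random variable."""
    cdf = 0.0
    for k in range(u + 1):
        cdf += gpd_pmf(k, theta, lam)
    return 1.0 - cdf

def gpd_upper_limit(theta0, lam0, alpha, max_k=100000):
    """Smallest integer u with P(X > u) <= alpha under the in-control model."""
    for u in range(max_k):
        if gpd_tail(u, theta0, lam0) <= alpha:
            return u
    raise ValueError("no control limit found below max_k")

def gpd_arl(theta, lam, ucl):
    """Average run length: reciprocal of the alarm probability P(X > UCL)."""
    return 1.0 / gpd_tail(ucl, theta, lam)

# In-control setting of Table 2.
ucl = gpd_upper_limit(0.01, 0.8, 0.0005)
arl_in_control = gpd_arl(0.01, 0.8, ucl)
arl_lam_shift = gpd_arl(0.01, 0.9, ucl)    # lambda shifted upward
arl_theta_shift = gpd_arl(0.05, 0.8, ucl)  # theta shifted upward
print(ucl, arl_in_control, arl_lam_shift, arl_theta_shift)
```

Because the limit is an integer, the in-control alarm probability is generally below $\alpha_L$, so the in-control ARL is at least $1/\alpha_L = 2000$.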
Table 2. Some numerical values of ARL for θ0 = 0.01, λ0 = 0.8, and αL = 0.0005.
θ       λ = 0.50   λ = 0.75   λ = 0.80   λ = 0.85   λ = 0.90   λ = 1.00   λ = 1.50
0.005     312409       6626       4147       2775       1966       1137        342
0.01      155491       3308       2071       1386        983        568        171
0.03       50891       1097        688        461        327        190         57
0.05       29984        654        411        276        196        114         35
0.1        14329        322        203        137         98         57         18
0.15        9134        212        134         91         65         38         12
0.2         6552        157        100         68         48         29          9
Figure 1. ARL for θ0 = 0.01, λ0 = 0.8, and αL = 0.0005.
3. Monitoring the Parameter Change for GPD Model
3.1. Parameter Interpretation
In practice, products with one or more defects may not necessarily be classified as
defective. However, in this paper, for convenience of discussion, the word
defective or nonconforming refers to products with one or more defects or
nonconformities. That is, any product with one or more nonconformities is considered to
be nonconforming.
For the generalized Poisson model, the probability of a product being nonconforming is
$1 - e^{-\theta}$. If θ increases, the defective rate of the process will increase, and vice versa. Hence,
θ can be used as a measure of the process defective rate. For the parameter λ, when it is
small, there will be some small but frequent non-zero counts. As λ increases,
larger but less frequent non-zero counts will be observed. Therefore, the value of λ
indicates the pattern of non-zero counts. Given this interpretation of the
parameters in the generalized Poisson model, we can easily interpret changes in the
parameters when the GPD model is used to set up control charts.
In the generalized Poisson model, the overall probability of finding x defects in a
product is:
$$P_X(\theta, \lambda) = \frac{\theta(\theta + x\lambda)^{x-1} e^{-\theta - x\lambda}}{x!}, \quad x = 0, 1, 2, \ldots, \quad \lambda \ge 0.$$
The interpretation of this model is that the process outputs come from a multi-population
source. A multi-population source is plausible for the following reason.
Usually, there are a large number of input variables that affect the process outputs. Not all
of these input variables can be adjusted properly at the same time, even if the process is in
control. Moreover, the group of properly adjusted variables also changes over time.
Hence, heterogeneity of the underlying distributions is reasonable and the generalized
Poisson distribution could be used to fit the data. Another explanation of the
multi-population process outputs, mentioned in Jones et al. [17], is the presence of different
types of defects.
3.2. Monitoring Procedure
As shown in the GPD model, two parameters ( θ , λ ) are involved. The process is
stable only when both parameters are not changed. Using a single chart to monitor the
process is insufficient, because it is difficult to tell which parameter should be checked
when there is an out-of-control alarm. The information about which parameter has shifted
is very helpful when assignable causes have to be found. For example, this information can
guide the process engineer to find out the cause by checking on relevant factors relating to
the specific parameter in question. For this reason, it is preferable to use one chart for each
parameter. This idea is quite similar to using the traditional x-bar chart and R chart
together to monitor normally distributed variables.
3.2.1. A Geometric Chart to Monitor Parameter θ
For the GPD model, we have that $p = P(\text{one or more defects in a product}) = 1 - e^{-\theta}$.
Monitoring p is equivalent to monitoring the parameter θ . Hence, the procedure for
monitoring the parameter θ is to count the number of products examined until a
nonconforming product is detected. This procedure is repeated whenever a nonconforming
product is produced. A geometric control chart will then be used to monitor the count of
conforming units. Thus, by using the geometric chart, we could tell whether there is any
change in the parameter θ . Geometric distributions have attracted a lot of attention
recently. For more references on the geometric chart for fraction nonconforming, see
Bourke [4], Kaminsky et al. [18], Xie and Goh [32], Nelson [22], and Woodall [31]. A
recent text is Xie et al. [34].
In this paper, the cumulative count of conforming items until a nonconforming one is
denoted by CCC. Note that CCC should include the last nonconforming item since in
practical applications every round of inspection will stop only when a nonconforming item
is encountered (Goh [15]). Obviously, CCC follows a geometric distribution. The
probability of false alarm is denoted by $\alpha_{ccc}$. The control limits for the geometric chart are
computed by:
$$UCL_{ccc} = \frac{\ln(\alpha_{ccc}/2)}{\ln(1-p)}, \qquad CL_{ccc} = \frac{\ln 0.5}{\ln(1-p)}, \qquad LCL_{ccc} = \frac{\ln(1-\alpha_{ccc}/2)}{\ln(1-p)}. \tag{6}$$
For the data set in Table 1, the estimated control limits are UCLccc = 25.56, CLccc = 4.80
and LCLccc = 1 at the false alarm rate of 0.05. The geometric chart is shown in Figure 2.
Note that a point plotted above the UCL indicates process improvement in geometric charts,
whereas with Shewhart attribute charts it signals process deterioration.
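The limits in Equation (6) are straightforward to evaluate. The short Python sketch below plugs in $p = 1 - e^{-\hat{\theta}}$ with the Table 1 estimate $\hat{\theta} = 0.144297$ and a false alarm rate of 0.05; the function name is an illustrative assumption.

```python
import math

def geometric_chart_limits(p, alpha):
    """Control limits of Equation (6) for nonconforming fraction p."""
    log_q = math.log(1.0 - p)
    ucl = math.log(alpha / 2.0) / log_q
    cl = math.log(0.5) / log_q
    lcl = math.log(1.0 - alpha / 2.0) / log_q
    return lcl, cl, ucl

theta_hat = 0.144297              # estimate reported for Table 1
p = 1.0 - math.exp(-theta_hat)    # probability that a unit is nonconforming
lcl, cl, ucl = geometric_chart_limits(p, 0.05)
print(round(lcl, 2), round(cl, 2), round(ucl, 2))
```

The computed upper limit and center line agree with the values 25.56 and 4.80 quoted above. The raw lower limit is below 1; since CCC is at least 1, this is consistent with the reported $LCL_{ccc} = 1$.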
3.2.2. An Attributive Chart with Exact Probability Limits to Monitor Parameter λ
For the number of nonconformities in a nonconforming product, we use attributive
control charts. Since the data plotted are truncated at zero, the control limits should be
based on the conditional probability. The conditional probability of a product containing k
nonconformities, k = 1, 2, 3,…, given that the product is nonconforming is:
Figure 2. Control chart to monitor the parameter θ for disk error data.
$$P(k \text{ nonconformities in a product} \mid \text{the product is defective}) = P(X = k \mid X > 0) = \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!\,(1 - e^{-\theta})}, \quad k = 1, 2, 3, \ldots \tag{7}$$
This is called the zero-truncated generalized Poisson distribution (Consul, 1989) with
parameters (θ, λ) (or the shifted generalized Poisson distribution). For the zero-truncated
generalized Poisson distribution we have (Famoye [9]):
$$E(X) = \theta(1-\lambda)^{-1}(1-e^{-\theta})^{-1}$$
and
$$\mathrm{Var}(X) = \left[\theta(1-\lambda)^{-3} + \theta^2(1-\lambda)^{-2}\right](1-e^{-\theta})^{-1} - \theta^2(1-\lambda)^{-2}(1-e^{-\theta})^{-2}.$$
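The zero-truncated moments can be checked numerically against Equation (7). The Python sketch below sums the truncated probabilities directly and compares the result with the closed-form expressions; the function names, the truncation point `kmax` and the parameter values are assumptions chosen for illustration.

```python
import math

def gpd_pmf(x, theta, lam):
    """GPD probability mass function, Equation (1), evaluated in log space."""
    return math.exp(math.log(theta) + (x - 1) * math.log(theta + x * lam)
                    - theta - x * lam - math.lgamma(x + 1))

def zt_moments_closed_form(theta, lam):
    """Mean and variance of the zero-truncated GPD from the closed-form expressions."""
    q = 1.0 - math.exp(-theta)          # P(X > 0)
    mu = theta / (1.0 - lam)            # untruncated mean
    sigma2 = theta / (1.0 - lam) ** 3   # untruncated variance
    mean = mu / q
    var = (sigma2 + mu ** 2) / q - mu ** 2 / q ** 2
    return mean, var

def zt_moments_numeric(theta, lam, kmax=2000):
    """Mean and variance by direct summation of Equation (7) over 1..kmax."""
    q = 1.0 - math.exp(-theta)
    m1 = m2 = 0.0
    for k in range(1, kmax + 1):
        pk = gpd_pmf(k, theta, lam) / q
        m1 += k * pk
        m2 += k * k * pk
    return m1, m2 - m1 ** 2

theta, lam = 0.5, 0.3  # illustrative values, not taken from the paper
print(zt_moments_closed_form(theta, lam))
print(zt_moments_numeric(theta, lam))
```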
Famoye [9] also proposed two types of statistical control charts for monitoring data
from this distribution, namely the gc-chart and the gu-chart. However, those control
charts deal with either the sample average number of defects or the total number of defects, so they
are not suitable for our case, where individual observations are of interest. Here, two attribute control
charts could be used for monitoring data from the zero-truncated generalized Poisson distribution. One
is the attribute control chart with 3-sigma control limits, and the other is the attribute
control chart with exact probability control limits, both of which are based on the number
of defects in each nonconforming product. The control limits are determined by the formulas
below.
For the attribute control chart with 3-sigma control limits, we have:
$$UCL_c = E(X) + 3\sqrt{\mathrm{Var}(X)}, \qquad CL_c = E(X), \qquad LCL_c = E(X) - 3\sqrt{\mathrm{Var}(X)}. \tag{8}$$
For the attribute control chart with exact probability control limits, we have:
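$$UCL_c \text{ such that } \sum_{k=UCL_c}^{\infty} \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!\,(1 - e^{-\theta})} > \alpha_c \quad \text{and} \quad \sum_{k=UCL_c+1}^{\infty} \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!\,(1 - e^{-\theta})} < \alpha_c, \tag{9}$$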
where α c is the probability of Type I error. It can be seen that the attribute control chart
with 3-sigma control limits is simple, whereas the control chart with exact probability
control limits has the advantage that the probability of Type I error can be predetermined.
For the data set in Table 1, the estimated 3-sigma control limits are UCLc = 27, and the LCLc
does not exist. The estimated exact probability control limit is UCLc = 37 at the false alarm
rate of 0.05. The attribute control chart with exact probability upper control limit for
nonconforming data is shown in Figure 3.
Figure 3. Control chart to monitor the parameter λ for disk error data.
Some points can be noted here. First, for the attribute control chart with exact
probability control limits, we may only have the upper control limit. Secondly, by solving
the inequality $LCL_c > 0$, the existence condition of the LCL in the attribute control chart with
3-sigma control limits can be obtained as:
$$\frac{10\theta}{1 - e^{-\theta}} - 9\theta > \frac{9}{1 - \lambda}.$$
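The existence condition follows from Equation (8) and the zero-truncated moments. The derivation below is a sketch of the algebra, using the shorthand $q = 1 - e^{-\theta}$, $\mu = \theta(1-\lambda)^{-1}$ and $\sigma^2 = \theta(1-\lambda)^{-3}$, which is introduced here only for brevity and is not notation from the paper. Since $E(X) > 0$, both sides of the first inequality can be squared.

```latex
\begin{align*}
LCL_c > 0
  &\iff E(X) > 3\sqrt{\mathrm{Var}(X)}
   \iff \frac{\mu^2}{q^2} > 9\left[\frac{\sigma^2 + \mu^2}{q} - \frac{\mu^2}{q^2}\right] \\
  &\iff \frac{10\mu^2}{q} - 9\mu^2 > 9\sigma^2
   \iff \frac{10}{q} - 9 > \frac{9\sigma^2}{\mu^2} = \frac{9}{\theta(1-\lambda)} \\
  &\iff \frac{10\theta}{1 - e^{-\theta}} - 9\theta > \frac{9}{1-\lambda}.
\end{align*}
```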
In the case of no lower control limit, we can use control chart with only an upper
control limit. When the traditional 3-sigma control limits are not suitable due to poor
normal approximation (Xie and Goh [33]; Ryan and Schwertman [24]), we should choose
the attribute control chart with exact probability control limits.
It can be seen that when θ is a constant, the number of defects in a nonconforming
product is decided only by the parameter λ . Therefore, when used together with the
geometric chart, by monitoring the number of defects in nonconforming products, we
could tell whether any change in the parameter λ has occurred.
3.2.3. A Joint Monitoring Procedure for Both Parameters
A charting procedure can be used to monitor GPD data by using the geometric chart
and the attribute control chart with exact probability control limits together. An
out-of-control alarm in either chart will be considered as an alarm for the whole monitoring
procedure and investigations should be conducted accordingly together with necessary
corrective actions.
There are three different out-of-control types when the process has deteriorated:
• When only the geometric chart has a point plotted below the LCL, the parameter
  θ has probably increased while the parameter λ remains the same. This is because
  an increase in the parameter θ will increase both the mean and variance of the
  zero-truncated generalized Poisson distribution.
• When only the attribute chart has a point plotted above the UCL, the parameter λ
  has probably increased, and the parameter θ remains the same. In this case, both
  the process mean and variance have increased, and the variance-to-mean ratio has
  also increased.
• When both charts signal an alarm simultaneously, the parameter θ has probably
  increased, while we are not sure about the parameter λ.
Similarly, there are also different out-of-control types when the process has improved:
• When only the geometric chart has a point plotted above the UCL, the parameter
  θ has probably decreased, and we are not sure about the parameter λ. This is
  because a decrease in the parameter θ will decrease both the mean and variance
  of the zero-truncated generalized Poisson distribution. Thus, the attribute chart
  could still be in control even if the parameter λ has increased.
• When only the attribute chart has a point plotted below the LCL, we can conclude
  that the parameter λ has decreased, and the parameter θ remains unchanged. In
  this case, both the process mean and variance have decreased, and the
  variance-to-mean ratio has also decreased.
• When both charts signal an alarm at the same time, the parameter θ has probably
  decreased, and we are not sure about the parameter λ.
These general guidelines, based on the changes of process mean and variance, are
helpful when engineers are trying to find the assignable causes that have driven the
process out of control. This interpretation is in line with the traditional x-bar and R charts for
variables data, for which the normal distribution also contains two parameters.
3.2.4. Statistical Properties
Since we use two control charts to monitor the GPD data in this charting procedure,
joint sensitivity of the two charts should be studied. Under the assumptions that the items
are independent of each other, and the parameters of the GPD model are known or can be
accurately estimated from the historical data, we can see that no matter how many
conforming items are examined between two consecutive defective items, the number of
defects on the second defective item will always follow the same truncated generalized
Poisson distribution. Therefore, the cumulative count of conforming items (including the
last nonconforming one) is independent of the number of defects on that nonconforming
item.
Let $X_c$ and $X_{ccc}$ denote the statistics plotted on the attribute chart and the geometric
chart, respectively. $P(X_c = m, X_{ccc} = n)$ is the probability that after the last nonconforming
item, there are n−1 consecutive conforming ones followed by a nonconforming one with m
defects. Let $P_i$ represent the probability that an item will have i nonconformities. Under the
assumption that the items are independent of each other, we have that
$$P(X_c = m, X_{ccc} = n) = P_0^{\,n-1} P_m,$$
$$P(X_c = m) = \sum_{n=1}^{\infty} P_0^{\,n-1} P_m = P_m \sum_{n=1}^{\infty} P_0^{\,n-1} = \frac{P_m}{1 - P_0},$$
and
$$P(X_{ccc} = n) = \sum_{m=1}^{\infty} P_0^{\,n-1} P_m = P_0^{\,n-1} \sum_{m=1}^{\infty} P_m = P_0^{\,n-1}(1 - P_0).$$
Thus, it follows that P (Xc = m, Xccc = n) = P (Xc = m) P (Xccc = n). From the definition of
independence of two discrete random variables, we have that Xc and Xccc are statistically
independent.
Hence, we can carry out an approximation to study the joint sensitivity of the two
charts separately by:
$$\text{Joint probability of Type I error: } \alpha = 1 - (1 - \alpha_{ccc})(1 - \alpha_c), \tag{10}$$
$$\text{Joint probability of Type II error: } \beta = \beta_{ccc}\,\beta_c, \tag{11}$$
where $\alpha_{ccc}$ and $\alpha_c$ are the individual probabilities of Type I error for the geometric chart
and the attribute chart, respectively, and $\beta_{ccc}$ and $\beta_c$ are the individual probabilities of Type II
error for the geometric chart and the attribute chart, respectively. In what follows, we will
present the joint analytical sensitivity study of the two control charts used in this procedure.
In SPC, the operating characteristic function, expressed by the Type II error probability β,
is a measure of the inability of a control chart to detect a certain shift in the process
parameter. The larger the β, the higher the probability that a control chart fails to detect
the shift, and vice versa. Since the joint monitoring procedure uses two control charts
to monitor GPD data, and their joint sensitivity can be obtained, we would like to study the
operating-characteristic function for the whole control procedure. For the GPD model, the
general formula of the joint Type II error probability β is:
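$$\begin{aligned} \beta = \beta_{ccc}\,\beta_c &= \left[(1-p)^{LCL_{ccc}} - (1-p)^{UCL_{ccc}}\right]\left[\sum_{k=LCL_c}^{UCL_c} \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!\,(1 - e^{-\theta})}\right] \\ &= \left[(e^{-\theta})^{LCL_{ccc}} - (e^{-\theta})^{UCL_{ccc}}\right]\left[\sum_{k=LCL_c}^{UCL_c} \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!\,(1 - e^{-\theta})}\right]. \end{aligned} \tag{12}$$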
Similarly, the joint average run length (JARL) is the average number of points plotted to
obtain an out-of-control alarm. For the GPD model, it is given by:
$$JARL = \frac{1}{P(\text{out-of-control alarm})} = \frac{1}{1 - \beta} = \frac{1}{1 - \left[(e^{-\theta})^{LCL_{ccc}} - (e^{-\theta})^{UCL_{ccc}}\right]\left[\displaystyle\sum_{k=LCL_c}^{UCL_c} \frac{\theta(\theta + k\lambda)^{k-1} e^{-\theta - k\lambda}}{k!\,(1 - e^{-\theta})}\right]}. \tag{13}$$
We can also have the relationship between the JARL and the individual ARLs. That is:
$$JARL = \frac{1}{1 - \left(1 - \dfrac{1}{ARL_c}\right)\left(1 - \dfrac{1}{ARL_{ccc}}\right)}, \tag{14}$$
where ARLccc and ARLc are the average run lengths for the geometric chart and the
attribute chart, respectively. It should be pointed out that this equation is valid for both
in-control and out-of-control situations. Some numerical values of JARL and AIRL for
different α , θ , and λ are presented in Tables 3 & 4, and the plot of AIRL is also shown
in Figure 4.
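Equation (14) can be illustrated with a few lines of Python. The sketch below combines two individual ARLs and checks the in-control case in which both charts are given the same Type I error probability $1 - \sqrt{1 - \alpha}$, so that Equation (10) yields a joint rate of $\alpha$. The function name is an illustrative assumption, and the sketch ignores the discreteness of the actual control limits.

```python
import math

def joint_arl(arl_c, arl_ccc):
    """Joint average run length from the individual ARLs, Equation (14)."""
    return 1.0 / (1.0 - (1.0 - 1.0 / arl_c) * (1.0 - 1.0 / arl_ccc))

alpha = 0.05
alpha_each = 1.0 - math.sqrt(1.0 - alpha)   # equal split between the two charts
jarl_in_control = joint_arl(1.0 / alpha_each, 1.0 / alpha_each)
print(jarl_in_control)   # equals 1 / alpha up to rounding error
```

With $\alpha = 0.05$ the idealized in-control JARL is 20, close to the in-control entry of 20.2 in Table 3; the small difference is plausibly due to the integer-valued control limits.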
Table 3. Some numerical values of JARL for θ0 = 0.01, λ0 = 0.8, and α = 0.05.
θ       λ = 0.50   λ = 0.75   λ = 0.80   λ = 0.85   λ = 0.90   λ = 1.00   λ = 1.50
0.005        8.4        7.7        7.1        6.4        5.6        4.1        1.6
0.01        39.4       26.5       20.2       14.8       10.8        6.1        1.7
0.03        26.6       20.0       16.2       12.6        9.5        5.7        1.7
0.05       16.19       13.5       11.7        9.7        7.8        5.0        1.6
0.1          8.4        7.6        7.0        6.3        5.4        4.0        1.6
0.15         5.7        5.4        5.1        4.7        4.2        3.3        1.5
0.2          4.4        4.2        4.1        3.8        3.5        2.9        1.4
* α is the probability of Type I error for the whole procedure, and we assume equal Type
I error probabilities for the two charts, i.e., $\alpha_{ccc} = \alpha_c = 1 - \sqrt{1 - \alpha}$.
Table 4. Some numerical values of AIRL for θ0 = 0.01, λ0 = 0.8, and α = 0.05.
θ       λ = 0.50   λ = 0.75   λ = 0.80   λ = 0.85   λ = 0.90   λ = 1.00   λ = 1.50
0.005        845        773        715        641        559        411        159
0.01        3962       2662       2033       1490       1084        611        169
0.03        2677       2011       1630       1262        958        570        167
0.05        1627       1357       1174        972        784        507        163
0.1          840        763        703        628        547        401        156
0.15         578        541        511        472        426        335        150
0.2          447        425        407        382        353        290        144
AIRL is the average item run length, which is the average number of items
inspected in order to obtain an out-of-control alarm.
From the results in Tables 3 & 4, it can be observed that the second procedure can
detect a decrease of the parameter θ, which indicates possible improvement of the
process. This is very useful, since opportunities to improve process quality are always
of interest. It can also be seen that when either parameter increases the ARL will decrease.
This is because when the parameter θ increases, there will be more units with defects, and
hence the average number of items to be inspected to obtain an alarm should decrease.
Meanwhile, when the parameter λ increases, the number of defects in a nonconforming
unit will have a larger mean as well as a larger variance, which means that we should be
able to observe an occasional large number of defects in sample units.
Figure 4. AIRL for θ0 = 0.01, λ0 = 0.8, and α = 0.05.
4. Testing Against the Poisson Distribution
The choice of the generalized Poisson distribution is not always obvious. Since
the generalized Poisson distribution is more complicated, it should be used only when
necessary; otherwise the Poisson distribution is preferred. Thus, it is useful to test whether
the generalized Poisson distribution is more appropriate than the Poisson distribution for
the data.
As we can see, the generalized Poisson distribution will reduce to the Poisson when the
parameter λ equals 0. Hence, we define the null and alternative hypotheses as:
$$H_0: \lambda = 0 \quad \text{versus} \quad H_1: \lambda > 0. \tag{15}$$
Test methods introduced in this section are proposed especially for testing this
hypothesis.
4.1. Three Statistical Tests
Three methods of statistical hypothesis test for testing the Poisson distribution against
the generalized Poisson distribution are discussed. Results from an extensive simulation
study of test power are summarized.
The variance test (VT) or the index of dispersion test is very common in testing the
Poisson assumption. The test statistic is given by:
$$VT = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\bar{X}} = (n-1)\frac{S^2}{\bar{X}}. \tag{16}$$
When the null hypothesis is true, VT has an approximate Chi-square distribution with n−1
degrees of freedom. Furthermore, the Chi-square approximation is highly satisfactory even
for small n (Selby [26]). The critical region of the VT test is $\{VT > \chi^2_{\alpha, n-1}\}$, where $\chi^2_{\alpha, n-1}$ is
the upper percentage point.
Note that the null hypothesis can be rewritten as $H_0: \sigma^2 = \mu$. To test this hypothesis,
Potthoff and Whittinghill [25] and Bohning [2] proposed the test statistic:
$$O_2 = \sqrt{\frac{n-1}{2}}\left(\frac{S^2 - \bar{X}}{\bar{X}}\right) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\bar{X}\sqrt{2(n-1)}} - \sqrt{\frac{n-1}{2}}, \tag{17}$$
which will approximately follow the standard normal distribution under null hypothesis.
Potthoff and Whittinghill [25] also showed that among all locally unbiased tests for testing
the Poisson assumption against a mixed Poisson distribution, this test is asymptotically
locally most powerful, especially with respect to a gamma mixing distribution, i.e., against
a negative binomial distribution. The critical region for the O2 test is $\{O_2 > z_\alpha\}$, where $z_\alpha$
is the upper percentage point.
Gart and Pettigrew [14] derived test statistics based on the property that the cumulants
of the Poisson distribution are all equal to the Poisson parameter. Here we use the
second-order cumulant as the test statistic, which is:
$$Z_2 = \frac{S^2 - \bar{X}}{\sqrt{2n\bar{X}(n\bar{X} - 1)}\,\big/\,\left(n\sqrt{n-1}\right)}. \tag{18}$$
Furthermore, the asymptotic distribution of all the test statistics based on the sample
cumulants is the standard normal distribution. The critical region is $\{Z_2 > z_\alpha\}$.
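The three statistics in Equations (16) to (18) can be computed from the sample mean and variance alone. The following Python sketch is a minimal illustration; the function name and the toy sample are assumptions, and the critical values ($\chi^2_{\alpha, n-1}$ and $z_\alpha$) would still have to be taken from tables or a statistics library.

```python
import math

def dispersion_tests(xs):
    """Return (VT, O2, Z2) of Equations (16)-(18) for a sample of counts."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    s2 = ss / (n - 1)                       # sample variance S^2
    vt = ss / xbar
    o2 = math.sqrt((n - 1) / 2.0) * (s2 - xbar) / xbar
    total = n * xbar                        # total count, n * xbar
    z2 = (s2 - xbar) * n * math.sqrt(n - 1) / math.sqrt(2.0 * total * (total - 1.0))
    return vt, o2, z2

sample = [0, 1, 2, 3, 4]   # toy data, not from the paper
vt, o2, z2 = dispersion_tests(sample)
print(vt, o2, z2)
```

Note that $O_2$ is a linear function of VT, namely $O_2 = (VT - (n-1))/\sqrt{2(n-1)}$, so the two tests differ only in the reference distribution used for the critical value.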
In the literature, some other test procedures for the generalized Poisson distribution
have been proposed. Famoye [8] introduced a test method for testing the homogeneity of a
restricted generalized Poisson distribution against a general class of alternatives, Henze and
Klar [16] derived a bootstrap-based goodness of fit test for the generalized Poisson
distribution and Famoye [10] proposed the empirical distribution function (EDF) test for
the generalized Poisson distribution. However, as we want to test the parameter λ
in an unrestricted generalized Poisson distribution, those test methods may not be
suitable or applicable. The general Chi-square goodness-of-fit test is also not
a good alternative because of its inherent problems. For example, the asymptotic approximation
of the test statistic by the Chi-square distribution is sensitive to the sample size and to the
grouping of the data (Karlis and Xekalaki [19]). In addition, considerable test power
is lost when groups are merged to satisfy the usual requirement that the frequency in each
group be greater than 5 (Cochran [5]).
4.2. A Simulation Study
To investigate and compare the power of the proposed tests, a simulation study was
conducted. Each combination of values of θ and λ defines a GPD model with known
parameters. For each GPD model, 1000 samples were generated by simulation for each of
five sample sizes: 500, 200, 100, 50 and 20. For each sample size, the test statistics
were calculated and the test power was estimated. Values of θ were chosen as 0.1, 0.5
and 1.0; the results for these three cases are presented in Tables 5a-5c, respectively.
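The simulation procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the GPD probability function is taken in the Consul and Jain [6] form P(X = x) = θ(θ + λx)^(x−1) e^(−θ−λx)/x!, variates are drawn by inverting the cumulative distribution, only the O2 test in its one-sided form is applied, and a sample consisting entirely of zeros is counted as a non-rejection. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def gpd_pmf(x, theta, lam):
    """Assumed GPD probability P(X = x) for theta > 0 and 0 <= lam < 1."""
    log_p = (math.log(theta) + (x - 1) * math.log(theta + lam * x)
             - theta - lam * x - math.lgamma(x + 1))
    return math.exp(log_p)

def sample_gpd(theta, lam, rng, x_max=10_000):
    """Draw one GPD variate by inversion of the cumulative distribution."""
    u = rng.random()
    x = 0
    cdf = gpd_pmf(0, theta, lam)
    while u > cdf and x < x_max:
        x += 1
        cdf += gpd_pmf(x, theta, lam)
    return x

def o2_statistic(sample):
    """O2 statistic; returns None when the sample mean is zero."""
    n = len(sample)
    mean = sum(sample) / n
    if mean == 0:
        return None
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt((n - 1) / 2) * (s2 - mean) / mean

def estimate_power(theta, lam, n, reps=1000, alpha=0.05, seed=1):
    """Fraction of simulated samples in which the O2 test rejects Poisson."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # upper alpha point of N(0, 1)
    rejections = 0
    for _ in range(reps):
        sample = [sample_gpd(theta, lam, rng) for _ in range(n)]
        stat = o2_statistic(sample)
        if stat is not None and stat > z_crit:
            rejections += 1
    return rejections / reps

# Small illustrative run (fewer replications than the 1000 used in the paper)
power = estimate_power(theta=0.5, lam=0.5, n=50, reps=200)
size = estimate_power(theta=0.5, lam=0.0, n=50, reps=200)  # lam = 0 is Poisson
```

Setting λ = 0 reduces the assumed GPD to the Poisson distribution, so the second call estimates the significance level rather than the power.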
These three values of θ are chosen because they can be considered as representative
values in high-quality processes. In fact, from a study of a number of real data sets from
high-quality processes, we found that most of the estimated values of θ are within the
range of (0.1, 0.5). The results indicate that all three tests perform well in testing the
Poisson distribution against the generalized Poisson alternative, even when λ is small.
Generally, the estimated power of the tests increases with the sample size and with
the value of θ, and decreases as λ decreases. The reason for the
latter is that when λ is smaller, the GPD model is closer to the Poisson model, and
consequently it is more difficult to distinguish the null and alternative hypotheses.
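The effect of λ can be made concrete through the ratio of variance to mean. The worked equation below assumes the standard GPD moments E(X) = θ/(1 − λ) and Var(X) = θ/(1 − λ)³ for the Consul and Jain [6] parameterization, which is taken here to match the one used earlier in the paper.

```latex
\frac{\operatorname{Var}(X)}{E(X)}
  = \frac{\theta/(1-\lambda)^{3}}{\theta/(1-\lambda)}
  = \frac{1}{(1-\lambda)^{2}},
\qquad
\lambda = 0.1:\ \frac{1}{0.81} \approx 1.23,
\qquad
\lambda = 0.5:\ 4,
\qquad
\lambda = 0.9:\ 100 .
```

A ratio of 1 corresponds to the Poisson model, so under these assumptions a value of λ near 0 produces only slight over-dispersion, which small samples can hardly reveal.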
Specifically, the O2 and Z2 tests perform better than the variance test,
especially when the sample size is small. The estimated power of the Z2 test is slightly higher
than that of the O2 test, and the difference increases when the sample size is small or λ
is close to 0.
The estimated significance levels (the rows with λ = 0 in Tables 5a-5c) are close to the
nominal level of 0.05, with no appreciable difference among the tests. Since the significance
level is the probability of a Type I error, it can be regarded as the protection given to the
null hypothesis. Thus, these tests protect the Poisson model appropriately at the
predetermined probability of Type I error.
Table 5a. The powers of the three tests for θ = 0.1 (α = 0.05).

λ    | N = 500           | N = 200           | N = 100           | N = 50            | N = 20
     | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2
0.9  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .997  .997  .997  | .946  .946  .946  | .650  .651  .683
0.8  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .995  .995  .995  | .919  .920  .920  | .588  .588  .628
0.7  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .989  .989  .989  | .910  .912  .912  | .546  .546  .598
0.6  | 1.000 1.000 1.000 | .998  .998  .998  | .972  .974  .977  | .852  .854  .855  | .487  .491  .556
0.5  | 1.000 1.000 1.000 | .997  .998  .999  | .944  .950  .952  | .783  .785  .785  | .446  .451  .527
0.4  | 1.000 1.000 1.000 | .988  .989  .989  | .896  .903  .914  | .703  .705  .710  | .322  .324  .401
0.3  | 1.000 1.000 1.000 | .951  .955  .955  | .793  .800  .811  | .556  .562  .562  | .254  .255  .328
0.2  | .981  .983  .983  | .828  .837  .837  | .585  .600  .621  | .397  .408  .411  | .162  .165  .238
0.1  | .737  .747  .747  | .446  .469  .469  | .305  .319  .354  | .204  .212  .212  | .082  .083  .149
0    | .038  .040  .041  | .048  .054  .054  | .034  .045  .056  | .049  .049  .049  | .044  .044  .044
Table 5b. The powers of the three tests for θ = 0.5 (α = 0.05).

λ    | N = 500           | N = 200           | N = 100           | N = 50            | N = 20
     | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2
0.9  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .987  .987  .987
0.8  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .966  .969  .971
0.7  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .939  .947  .950
0.6  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .883  .905  .908
0.5  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .985  .988  .988  | .806  .832  .838
0.4  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .999  .999  .999  | .934  .947  .948  | .647  .699  .707
0.3  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .968  .974  .974  | .795  .830  .830  | .452  .512  .521
0.2  | 1.000 1.000 1.000 | .969  .975  .976  | .782  .801  .801  | .525  .566  .568  | .267  .335  .349
0.1  | .878  .884  .884  | .517  .543  .548  | .332  .361  .361  | .208  .247  .247  | .140  .174  .182
0    | .053  .051  .051  | .047  .048  .049  | .043  .044  .044  | .040  .038  .042  | .034  .043  .047
Table 5c. The powers of the three tests for θ = 1.0 (α = 0.05).

λ    | N = 500           | N = 200           | N = 100           | N = 50            | N = 20
     | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2  | VT    O2^2  Z2^2
0.9  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
0.8  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .999  .999  .999
0.7  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .992  .995  .995
0.6  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .951  .964  .966
0.5  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .998  .998  .998  | .881  .902  .905
0.4  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .987  .992  .993  | .766  .797  .804
0.3  | 1.000 1.000 1.000 | 1.000 1.000 1.000 | .992  .993  .993  | .880  .900  .901  | .530  .578  .588
0.2  | 1.000 1.000 1.000 | .991  .993  .993  | .838  .869  .869  | .546  .600  .603  | .288  .335  .354
0.1  | .899  .907  .907  | .563  .585  .585  | .317  .359  .361  | .206  .242  .249  | .113  .151  .158
0    | .055  .055  .055  | .050  .044  .045  | .041  .042  .043  | .051  .054  .056  | .040  .038  .045
4.3. An Example
A summary of the test statistics for the hard disk example is given in Table 5. It is clear
that the GPD model should be used instead of the Poisson model, under the assumption
that the data are taken from an in-control situation. All three tests reject the Poisson
distribution assumption in favor of the generalized Poisson.
Table 5. The summary of the test statistics (α = 0.01).

Test methods        | Test statistic | Critical region              | Accept/reject null hypothesis
Variance test (VT)  | 10003.3        | VT > 263.158 or VT < 158.347 | Reject
O2^2 test           | 231805         | O2^2 > 6.6349                | Reject
Z2^2 test           | 232767         | Z2^2 > 6.6349                | Reject
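The critical value 6.6349 in Table 5 appears to be the upper 1% point of the Chi-square distribution with one degree of freedom, which would correspond to applying the O2 and Z2 statistics in squared, two-sided form. Under that reading (an assumption, since this is not stated explicitly here), the value can be checked with the Python standard library, because a Chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
from statistics import NormalDist

alpha = 0.01

# Two-sided standard normal critical value: upper alpha/2 point
z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)

# Upper alpha point of Chi-square(1) equals the squared two-sided normal value
chi2_crit_df1 = z_two_sided ** 2

# Squared statistics reported in Table 5 for the hard disk example
reported = {"O2^2": 231805, "Z2^2": 232767}
decisions = {name: value > chi2_crit_df1 for name, value in reported.items()}
```

Both reported statistics exceed the critical value by several orders of magnitude, which agrees with the "Reject" entries in the table.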
5. Discussion
When analyzing the data on the number of defects containing a large number of zeros,
a generalized Poisson model could be used as an alternative. It is especially effective when
the data show over-dispersion compared with the traditional Poisson model. If we use the
traditional Poisson model to fit the over-dispersed data and monitor the process with a
control chart, one direct disadvantage is that too many false alarms will be obtained, which
leads to unnecessary costs in inspection, and frequent interventions of the manufacturing
processes. A generalized Poisson model is more suitable in this situation, and wider but
more appropriate control limits can be derived.
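The false-alarm argument can be illustrated numerically. The sketch below is hypothetical and rests on several assumptions: the GPD probability function is taken in the Consul and Jain [6] form, the Poisson chart is the conventional c-chart with a 3-sigma upper limit, and the values θ = 0.5 and λ = 0.5 are chosen only for illustration.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability P(X = x) with mean mu > 0."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def gpd_pmf(x, theta, lam):
    """Assumed GPD probability P(X = x) for theta > 0 and 0 <= lam < 1."""
    log_p = (math.log(theta) + (x - 1) * math.log(theta + lam * x)
             - theta - lam * x - math.lgamma(x + 1))
    return math.exp(log_p)

def upper_tail(pmf, limit, *params):
    """P(X > limit) for a count distribution with the given pmf."""
    return 1.0 - sum(pmf(x, *params) for x in range(math.floor(limit) + 1))

theta, lam = 0.5, 0.5                # illustrative in-control GPD parameters
mu = theta / (1 - lam)               # GPD mean, used as the Poisson mean
ucl = mu + 3 * math.sqrt(mu)         # 3-sigma upper limit of a Poisson c-chart

# Probability of a signal above the Poisson limit when the process is in control
false_alarm_if_poisson = upper_tail(poisson_pmf, ucl, mu)
false_alarm_if_gpd = upper_tail(gpd_pmf, ucl, theta, lam)
```

In this example the limit is 4 defects. If the data truly follow the assumed GPD, the in-control probability of exceeding it is more than ten times the probability that the Poisson model predicts, which is the excess of false alarms described above.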
On the other hand, many zero counts may not necessarily lead to the generalized
Poisson model, and a hypothesis test should be conducted before it is used. Based on our
simulation study, the proposed three test methods are all reasonably powerful when testing
the Poisson distribution against the generalized Poisson alternative. Finally, for practical
purposes, a simple test should be applied.
In this paper, we have used the generalized Poisson distribution mainly for high-quality
processes characterized by data with many zero counts. For this type of data, the
zero-inflated Poisson distribution has been used. It would be of interest to compare the two
distributions in this case. Our preliminary investigation indicates that they have very similar
behaviors, and the main difference is the model interpretation. The zero-inflated Poisson
has a shock model interpretation, in which shocks occur at random and each shock leads to
a number of defects. The generalized Poisson has a mixture-of-populations interpretation. On the other
hand, the generalized Poisson model can be used for high-quality manufacturing processes
as well as in general cases of over-dispersed data.
References
1. Bednarski, T. (2004). Robust estimation in the generalized Poisson model. Statistics, 38,
149-159.
2. Bohning, D. (1994). A note on a test for Poisson overdispersion. Biometrika, 81,
418-419.
3. Bohning, D. (1998). Zero-inflated Poisson models and C.A.MAN: a tutorial collection
of evidence. Biometrical Journal, 40, 833-843.
4. Bourke, P. D. (1991). Detecting a shift in fraction nonconforming using run-length
control charts with 100-percent inspection. Journal of Quality Technology, 23, 225-238.
5. Cochran, W. (1954). Some methods of strengthening the χ² goodness of fit test.
Biometrics, 10, 417-451.
6. Consul, P. C. and Jain, G. C. (1973). A generalization of the Poisson distribution.
Technometrics, 15, 791-799.
7. Consul, P. C. (1989). Generalized Poisson Distributions: Properties and Applications. Marcel
Dekker, New York.
8. Famoye, F. (1993). Testing for homogeneity: the generalized Poisson distribution.
Communications in Statistics - Theory and Methods, 22, 705-715.
9. Famoye, F. (1994). Statistical control charts for shifted generalized Poisson distribution.
Journal of the Italian Statistical Society, 3, 339-354.
10. Famoye, F. (1999). EDF tests for the generalized Poisson distribution. Journal of
Statistical Computation and Simulation, 63, 159-168.
11. Fang, Y. (2003). C-charts, X-charts, and the Katz family of distributions. Journal of
Quality Technology, to appear.
12. Freund, D. A., Kniesner, T. J. and LoSasso, A. T. (1999). Dealing with the common
econometric problems of count data with excess zeros, endogenous treatment effects,
and attrition bias. Economics Letters, 62, 7-12.
13. From, S. G. (2004). Approximating the distribution of a renewal process using
generalized Poisson distributions. Journal of Statistical Computation and Simulation, 74,
667-681.
14. Gart, J. and Pettigrew, T. (1970). On the conditional moments of the K-statistics for the
Poisson distribution. Biometrika, 57, 661-664.
15. Goh, T. N. (1987). A control chart for very high yield processes. Quality Assurance, 13,
18-22.
16. Henze, N. and Klar, B. (1995). Bootstrap based goodness of fit tests for the generalized
Poisson model. Communications in Statistics - Theory and Methods, 24, 1875-1896.
17. Jones, L. A., Woodall, W. H. and Conerly, M. D. (1999). Exact properties of demerit
control charts. Journal of Quality Technology, 31, 207-216.
18. Kaminsky, F. C., Benneyan, J. C., Davis, R. D. and Burke, R. J. (1992). Statistical
control charts based on a geometric distribution. Journal of Quality Technology, 24,
63-69.
19. Karlis, D. and Xekalaki, E. (2000). A simulation study of several procedures for testing
the Poisson assumption. Journal of the Royal Statistical Society, D49, 355-382.
20. Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in
manufacturing. Technometrics, 34, 1-14.
21. Luceno, A. (2005). Recursive characterization of a large family of discrete probability
distributions showing extra-Poisson variation. Statistics, 39, 261-267.
22. Nelson, L. S. (1994). A control chart for parts-per-million nonconforming items. Journal
of Quality Technology, 26, 239-240.
23. Ramirez, J. G. and Cantell, B. (1997). An analysis of a semiconductor experiment
using yield and spatial information. Quality and Reliability Engineering International, 13,
35-46.
24. Ryan, T. P. and Schwertman, N. C. (1997). Optimal limits for attribute control charts.
Journal of Quality Technology, 29, 86-98.
25. Potthoff, R. E. and Whittinghill, M. (1966). Testing for homogeneity: II, the Poisson
distribution. Biometrika, 53, 183-190.
26. Selby, B. (1965). The index of dispersion as a test statistic. Biometrika, 52, 627-629.
27. Shankar, V., Milton, J. and Mannering, F. (1997). Modeling accident frequencies as
zero-altered probability processes: an empirical inquiry. Accident Analysis and Prevention,
29, 829-837.
28. Song, J. X. (2005). Zero-inflated Poisson regression to analyze lengths of hospital stays
adjusting for intra-center correlation. Communications in Statistics - Simulation and
Computation, 34, 235-241.
29. Toscas, P. J. and Faddy, M. J. (2003). Likelihood-based analysis of longitudinal count
data using a generalized Poisson model. Statistical Modeling, 3, 99-108.
30. Tuenter, H. J. H. (2000). On the generalized Poisson distribution. Statistica Neerlandica,
54, 374-376.
31. Woodall, W. H. (1997). Control charts based on attribute data: bibliography and review.
Journal of Quality Technology, 29, 172-183.
32. Xie, M. and Goh, T. N. (1992). Some procedures for decision making in controlling
high yield processes. Quality and Reliability Engineering International, 8, 355-360.
33. Xie, M. and Goh, T. N. (1993). SPC of a near zero-defect process subject to random
shock. Quality and Reliability Engineering International, 9, 89-93.
34. Xie, M., Goh, T. N. and Kuralmani, V. (2003). Statistical Models and Control Charts for
High-Quality Processes. Kluwer Academic Publisher, Boston.
Authors’ Biographies:
B. He is a Senior Engineer working for Philips Electronics. He obtained his BS from the
University of Science & Technology of China in 1999, and his PhD from the National
University of Singapore in 2004. He is a member of IEEE.
Min Xie is a Professor of Industrial and Systems Engineering, National University of
Singapore. He received his PhD in Quality Technology from Linkoping University in 1987.
His research area includes quality, reliability and applied statistics. Prof Xie is an author or
co-author of over 100 journal papers and 6 books in this field. He serves as editor or
associate editor of IJRQE, QTQM, IIE Transactions, IEEE Transactions on Reliability and
several other journals. He is a Fellow of IEEE.
Thong Ngee Goh is a Professor of Industrial and Systems Engineering at the National
University of Singapore. He obtained his PhD from the University of Wisconsin-Madison.
Prof Goh has been internationally recognized for his expertise in quality engineering and
management. He is an author of numerous papers and a few books. He is an elected Fellow
of ASQ as well as Academician of IAQ.
Kwok-Leung Tsui is a Professor in the School of Industrial and Systems Engineering at
Georgia Institute of Technology. He has a B.Sc. in Chemistry and an M.Ph. in
Mathematics both from the Chinese University of Hong Kong, and a Ph.D. in Statistics
from the University of Wisconsin at Madison. He had worked in the Quality Assurance
Center of AT&T Bell Laboratories in 1986-90. Dr. Tsui was a recipient of the 1992 NSF
Young Investigator Award. He is a Fellow of the American Statistical Association (ASA),
and was the (elected) President and Vice President of the ASA Atlanta Chapter in
1992-1993. Dr. Tsui was the chair of the INFORMS Section in the Quality, Statistics, and
Reliability (QSR) in 2000, and is the founding chair of the INFORMS Section in Data
Mining (DM). Dr. Tsui is also a US representative in the ISO Technical Committee on
Statistical Methods (TC 69). Dr. Tsui researches, teaches, and consults on statistical
methods for quality, logistics, and data mining. His research interests include classification
trees, support vector machines, the Mahalanobis-Taguchi System, inventory forecasting and
control, statistical process control, experimental design, robust design and Taguchi method,
design and modeling of computer experiments, and coordinate measuring machine
modeling.