Some Sequential Procedures for Ordering Populations
According to Means, Variances and Regression Coefficients
Ishverlal S. Bangdiwala and R. J. Monroe
Institute of Statistics
Mimeo Series No. 202
June, 1958
TABLE OF CONTENTS

Chapter

I.   INTRODUCTION
     1.1 General
     1.2 Statement of the Problem

II.  REVIEW OF LITERATURE
     2.1 Sequential Procedure
     2.2 Multiple Decisions
     2.3 Sequential Procedure for Ranking Populations

III. THEORETICAL DEVELOPMENT
     3.1 Introduction
     3.2 Ordering the Populations According to the Means
     3.3 Ordering k Populations According to their Variances
     3.4 Ordering k Population Regression Coefficients

IV.  APPLICATION
     4.1 General
     4.2 Ordering Means
     4.3 Ordering Variances
     4.4 Ordering Regression Coefficients

V.   SUMMARY
     5.1 Summary
     5.2 Consideration in Applying Sequential Procedures
     5.3 Suggestions for Further Research

BIBLIOGRAPHY
Errata Sheet

SOME SEQUENTIAL PROCEDURES FOR ORDERING POPULATIONS
ACCORDING TO MEANS, VARIANCES AND REGRESSION COEFFICIENTS

I. S. Bangdiwala

Page 36 - Line beginning "Using transformations, ..." should be (3.3.2.13).
Page 38 - Line 6b - Read associated for assocaited.
Page 59 - Line 14 - Read sugar-cane for sugar-cone.
Page 62 - Line 11 - "The M-values for these two cases are to the ..."
          should read "The M-values for these two cases are to be ...".
          Line 5b - Read "to be worst" for "to be wrost".
Page 64 - Line 3 - Omit *.
Page 74 - Bibliography - Duncan should precede Dunnett.
CHAPTER I
INTRODUCTION
1.1 General:
From many of the experimental situations over the past few years it has
become evident that most of the research workers in all scientific fields
intend to accomplish two things from an experiment which are important from
the practical point of view.
First, the experimenter would like to know
at as early a stage as possible, which results from his experiment are
important and conclusive.
Secondly, the experimenter intends to have
the analysis of his data so made as to help him get direct answers to the
questions toward which he has aimed the experiment.
For example, when the
experimenter runs a variety trial knowing that the varieties are different,
he therefore expects to observe significant differences among the varieties.
The conventional tests of significance (e.g. for the hypothesis that the
means are equal) do confirm his expectation.
However, these tests seem to
be unrealistic in many experiments, because the theory of statistical
inference on which they are based permits them to make a decision only with
regard to the rejection or non-rejection of the null hypothesis (equal
means, in this case).
This classical test procedure, viz., analysis of variance, has been
widely used since its introduction by R. A. Fisher and has become one of
the basic tools of research workers as a method of analyzing experimental
data.
One of the chief virtues of this procedure is its practicality for
separating orthogonal sources of total variation giving "information"
about the population.
It also provides a test of homogeneity of means,
the F test.
However, this test and, in general, the analysis of variance
is misused more often than it is used appropriately. The F-test does
provide an important decision-making rule; however, one which is limited
in its utility. These tests are not able to suggest any suitable course
of action for the experimenter to follow. In fact, the experimenter
desires, in some cases, to find the "best" treatment or variety out of
the ones he has tried, or a group of varieties better than the rest, or,
in general, he may like to divide the entire set of varieties in several
groups in some systematic order.
Or, as Cochran and Cox (1957) put it:
"On the whole .... tests of significance are less frequently
useful in experimental work than confidence limits. In
many experiments it seems obvious that the different treatments
must have produced some difference, however small in
effect. Thus the hypothesis that there is no difference is
unrealistic; the real problem is to obtain estimates of the
sizes of the differences."
The method of estimating the sizes of differences often is used as a
way of obtaining some kind of order of different treatments.
Thus, the experimenter would like to make such multiple decisions
from his data and to have, at the same time, decisions as soon as
possible according to the limits of precision and confidence desired
by him :for his conclusions.
He would be at an advantage if he had
planned an experiment such that it would give him the answers
directly.
This experimental procedure could then be formulated to
arrive at such decisions with the guarantee of a certain (specified)
minimum protection level from the risk of not including the appropriate
treatment in the right group.
When the experimenter desires his results from the experiment "as
soon as possible", he must bear in mind certain limitations due to
several factors including an important one, the nature of the
experimental material. For example, as Monroe and Mason (1955) mention:

"The estimates of variability, in case of agricultural
research, associated with perennial crops and the use
to which these estimates are put, depend so strikingly
upon the assumptions we are willing to make. ..... The
criteria which are set up to describe optimum conditions
sometimes lead to an experiment which requires a lifetime
or two to execute."

They go on, later, to suggest two approaches to the problem when
certain assumptions made about the population of a certain measurable
factor are not known to the experimenter. The first of these is to
change the experimental technique to include possible selections such
that "nature provides the necessary randomization". Obviously this
technique will be expensive in most of the cases.
The second approach is to

"use the sample survey technique. A single experiment
could be designed in a much simpler fashion with no
requirements that an estimate of error be supplied by the
experiment itself. The survey would furnish the real
estimates of error with which to judge the relative
variety growth."
They conclude by saying that

"(1) The experimental data can be looked at from more
than one point. Answers by data depend upon the
questions asked and the assumptions made.

(2) Newer techniques show promise to help broaden the
base of experimental inference and to shorten the
time required from such an inference.

(3) Theory .... on one hand, promotes attitudes of the
experimenter towards quantitization of variability,
(and) on the other, deludes us into thinking that
because we have rigor, we have truth."
Thus, in order to suggest some of the newer techniques to broaden the
base of inference, the subject of this dissertation is to present some
sequential procedures for ordering several treatments (populations)
according to their "bestness" as defined in terms of their measurable
characteristics, such that at every suitable stage of the data collection
the experimenter may be able to examine his data and accordingly arrive
at a decision rule to stop or to continue taking more observations.
Such a procedure has come to be known as Sequential Procedure since its
introduction by Wald (1947) in the U.S.A. and Barnard (1946) in England.
However, if one thinks about it, the sequential idea is much older than
this; the statistical method is in itself a sequential idea.
The policy of the research worker is to learn from
available data what steps can be followed, what improvements can be
made in his experiment, or how many more samples should be taken, if any
at all are needed, in order to make his decision with some specified
measure of precision and level of confidence in his decision. These
ideas are applied in all fields of research, e.g. in Agriculture,
Industry, or Social or Economic Surveys, Medicine and others.
The
knowledge of the results obtained in previous sets of data is utilized
in order to follow up the directions suggested therein.
The sequential procedure as it is referred to at present is the
one based on the ideas which led Wald and others to formulate procedures involving observations taken one by one with analysis and test
of hypothesis made at every stage of sampling.
So far most of this theory of sequential procedures has been
developed for estimating and testing the comparison between two groups
(treatments) only.
The technique of sequential sampling and testing has
been widely used in industry but only to some extent in medicine and
other fields of research such as agriculture.
One of the main limitations
is the requirement that the nature of the population be the same from one
stage to the next stage.
In industrial testing this is possible because
the process is largely controlled, but in agriculture, two successive
observational stages may differ greatly during the periodic changes.
Very often there are more than just two treatments or varieties involved
in order to generate samples from different populations under the same
environmental conditions. So far very little work has been done with
this aspect of comparing several treatment means by sequential
procedures.
1.2 Statement of the Problem:

Let there be k populations π_1, π_2, ..., π_k and let Q be the
corresponding parameter whose magnitude defines the "bestness" of
the populations relative to each other. Without loss of generality,
let us assume that the best population is the one having the highest
value of Q. Let the values of Q for the k populations be
Q_1, Q_2, ..., Q_k, respectively.
Let the general goal of the experimenter be "to divide the k
populations into s groups and to determine the order of these groups"
such that we have the k populations identified as follows:

    k_1 populations are the "worst"
    k_2 populations are the "next worst"
    ..........................................               (1.2.1)
    k_s populations are the "best"

We note here that

    s ≤ k                                                    (1.2.2)

and

    k_1 + k_2 + ... + k_s = k .                              (1.2.3)
Let us denote the ranked parameters Q_i (i = 1, 2, ..., k) as

    Q_[1] ≤ Q_[2] ≤ ... ≤ Q_[k] .                            (1.2.4)
It is not known which population is associated with Q_[i]. According to
the general goal stated as above, then, it is desired to determine the
following order of the population parameters Q:

(1) The set of k_1 parameters

    Q_[1], Q_[2], ..., Q_[k_1]                               (1.2.5)

    for which the corresponding populations are the "worst".

(2) The set of k_2 parameters

    Q_[k_1 + 1], Q_[k_1 + 2], ..., Q_[k_1 + k_2]             (1.2.6)

    for which the corresponding populations are "next worst".

    ..........................................................

(s) The set of k_s parameters

    Q_[k - k_s + 1], Q_[k - k_s + 2], ..., Q_[k]             (1.2.7)

    for which the corresponding populations are the "best".
The number of alternate decisions in the above order of k populations is

    D = k! / (k_1! k_2! ... k_s!)                            (1.2.8)
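The count in (1.2.8) is the multinomial coefficient. As a quick numerical check, a short sketch (Python; the function name is ours, not the dissertation's):

```python
from math import factorial

def alternate_decisions(group_sizes):
    """Number of alternate decisions D = k!/(k_1! k_2! ... k_s!) of
    equation (1.2.8), where k is the total number of populations."""
    d = factorial(sum(group_sizes))
    for k_i in group_sizes:
        d //= factorial(k_i)
    return d

# k = 6 populations split into three ordered groups of two:
print(alternate_decisions([2, 2, 2]))  # 6!/(2! 2! 2!) = 90
```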
Corresponding to the s sets mentioned in (1.2.5), (1.2.6) and (1.2.7),
let us define (s-1) constants δ_i > 0, (i = 1, 2, ..., s-1) as some
function of Q_[k_i] and Q_[k_i + 1], as for example the difference between
the two, viz.,

    δ_i = Q_[k_i + 1] - Q_[k_i]                              (1.2.9)

or as a ratio, viz.,

    δ_i = Q_[k_i + 1] / Q_[k_i]                              (1.2.10)

so that, if δ_i* is the minimum value of such a function desired to be
detected, the sequential procedure will guarantee a specified minimum
probability P* (1/D ≤ P* ≤ 1) of correct decision whenever

    δ_i ≥ δ_i* .                                             (1.2.11)
The problem then consists in defining a statistic Q̂ as a function
of the observations to estimate the parameter Q, and calculating the
probability P_m at the m-th stage of a sequential procedure of sampling
for the grouping of the populations under consideration.
The object of this dissertation is to derive sequential procedures
for four different cases of ordering populations according to three
different characteristics measuring the "bestness" of the populations.
The three characteristics considered are

    (a) Q = μ,  the mean,
    (b) Q = σ², the variance, and
    (c) Q = β,  the regression coefficient.

The "bestness" in the above cases is defined as follows:

    (a) When Q = μ, the best population is the one having the
        LARGEST MEAN;
    (b) when Q = σ², the best population is the one having the
        SMALLEST VARIANCE;
and (c) when Q = β, the best population is the one having the
        HIGHEST REGRESSION COEFFICIENT (POSITIVE).
The four cases of ordering k populations considered for each of the
above 3 characteristics are:

Goal (1): To determine the t "best" (unordered) populations:

In this case

    s = 2,  k_1 = k-t  and  k_2 = t .                        (1.2.12)

The number of alternate decisions is

    D_1 = k! / (t!(k-t)!) = C^k_t                            (1.2.13)

Goal (2): To determine the best population:

Here again s = 2, but

    k_1 = k-1  and  k_2 = 1 .                                (1.2.14)
The number of alternate decisions is

    D_2 = k! / (k-1)! = k .                                  (1.2.15)

Goal (3): To determine the t "best" (ordered) populations:

Here

    s = t+1,  k_1 = k-t  and  k_2 = k_3 = ... = k_{t+1} = 1 .   (1.2.16)

The alternate decisions in this case are

    D_3 = k! / (k-t)! = t! C^k_t                             (1.2.17)

Goal (4): To determine the complete ranking of k populations:

In this case

    s = k                                                    (1.2.18)

and the number of alternate decisions is

    D_4 = k! .                                               (1.2.19)
The theoretical development of the above four cases is dealt with
in Chapter III. A sequential procedure for each of the 12 possible cases
(four cases of ordering for each of the three characteristics) is derived
and outlined. In Chapter IV are given some illustrative examples.

Bechhofer and Sobel (1956) have shown that the sequential procedure
for ordering terminates with probability unity and guarantees minimum
acceptable P = P* whenever δ ≥ δ* as discussed above. It has to be
noted that for different goals, this P* has to have different limits,
viz.,

    for Goal (1),        1/C^k_t      ≤ P* ≤ 1               (1.2.20)

    for Goal (2),        1/k          ≤ P* ≤ 1               (1.2.21)

    for Goal (3),        1/(t! C^k_t) ≤ P* ≤ 1               (1.2.22)

and for Goal (4),        1/k!         ≤ P* ≤ 1 .             (1.2.23)
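These lower limits are just 1/D for each goal, i.e. the probability of a correct decision under pure guessing. A small sketch computing them (Python; names are ours):

```python
from math import comb, factorial

def p_star_lower_limit(k, t, goal):
    """Lower limit 1/D of the specifiable P* for each goal, equations
    (1.2.20)-(1.2.23): the correct-decision probability under pure guessing."""
    if goal == 1:
        d = comb(k, t)                 # t best, unordered: D = C(k,t)
    elif goal == 2:
        d = k                          # single best: D = k
    elif goal == 3:
        d = factorial(t) * comb(k, t)  # t best, ordered: D = t! C(k,t)
    elif goal == 4:
        d = factorial(k)               # complete ranking: D = k!
    else:
        raise ValueError("goal must be 1, 2, 3 or 4")
    return 1 / d

for g in (1, 2, 3, 4):
    print(g, p_star_lower_limit(5, 2, g))  # 0.1, 0.2, 0.05, 1/120
```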
It was also shown by Bechhofer that the "Average Sample Size"
required in a sequential procedure for such a goal is usually less than
that required in a "fixed" sample procedure giving the same precision
and protection level. Thus the entire effort of this dissertation is
directed towards formulating some procedures which (i) are economical
with respect to sample size and (ii) have broader utility than the
conventional procedures.
CHAPTER II
REVIEW OF LITERATURE
2.1 Sequential Procedure:
The general theory of sequential procedure was developed by Wald
(1947) in the U.S.A. and independently by Barnard (1946) in England.
Most of their work was done during World War II and published later.
The procedure for sequential testing as suggested by both writers
consisted, in general, of specifying certain conditions of risks of
error in making the decision and of taking observations one at a time
from normally distributed populations until a decision to accept or to
reject the hypothesis was reached.
If neither decision was reached,
more samples were taken, and the process of testing repeated.
Theoretically such a sequential process is shown to terminate with
probability unity. The average sample size needed is also shown to be
less than the fixed sample size required for attaining the same
precision and confidence levels.
The sequential technique has been very widely used in industrial
and engineering problems for testing for defectives and for maintaining
product quality control.
In other fields of research this technique
has not been applied as much, probably because the theory has not been
stressed until recently, and because of impracticality arising when
the population being sampled may be changed during the lapse between
two successive samples. Agriculture is one of these fields. Bross
(1952) and Armitage (1954) have applied the sequential technique in
medical research to compare two treatments given in pairs to two
objects (patients in this case). Rushton (1952) has outlined a
two-tailed sequential test for unpaired samples.
A number of other research
workers have written on this subject and its applications; however, the
theory of this procedure has not been much extended to deal with more
than two populations or for multiple-decision-type problems. Recently
Bechhofer and Sobel have proposed a solution for the highest mean
(1953, 1956) and the smallest variance (1956a).
2.2 Multiple Decisions:
A number of publications have appeared recently on the problem of
making multiple decisions.
This was due, probably, to the realization
by many that the conventional test procedures were based on a theory of
statistical inference which permits the experimenter to make only one
of the two alternate decisions, i.e., to reject or not to reject a
null hypothesis of "no difference".
The experimenter does not get
direct answers from the experiment as to his next best course of action.
In most of the cases, he intends to order or to rank his populations
according to some measurable criteria such as the mean or the variance.
This need of the experimenter, realized in the past few years, led the
theoretical statisticians to consider the problem of multiple decisions.
Wald (1950) formulated this decision theory. One of these problems
was the development of rank order statistics.
Paulson (1948) suggested
and developed the theory of order statistics.
He considered classifications
of a number of populations into a superior and an inferior
group.
He also considered (1952) the problem of (k-l) experimental
categories with a control and gave (1952a) an optimum solution to
k-sample "slippage" problems discussed earlier by Mosteller (1948).
Duncan (1955) then considered the problem of testing the significance
of differences between ranked means of populations using a
multiple-range test.

A single sample procedure was considered by Bechhofer (1954) for
ranking means of normal populations with known common variance. He
indicated its use later in another article (1955). Hall (1954a)
considered an optimum property of Bechhofer's procedure, and showed
that it is the most economical multiple-decision rule as defined by
him earlier in (1954), when the bounds on the "distance" are satisfied.
In a later paper, Bechhofer, Dunnett and Sobel (1954) presented a
similar procedure in case of unknown variance, by using the principle
of Stein's two-stage sampling procedure (1945). Bechhofer and Sobel
(1954) also gave the fixed sample procedure for ranking variances of
normal populations. Gupta (1956) gave a decision rule for ranking means
and later Bose and Gupta (1956) extended the results by working out the
moments of the order statistics.
Some work has been done in ranking populations other than normal;
for example, Huyett and Sobel (1957) considered the smallest success
probability in the case of binomial populations, and Bechhofer, Elmaghraby
and Morse (in preparation) worked out the procedure for selecting the
multinomial population having the highest probability parameter.
2.3 Sequential Procedure for Ranking Populations:
The work described above in ranking populations was principally
concerned with fixed sample except for the two-sample procedure for
the case of unknown variances (1954).
In the latter instance, the
first sample is used to obtain the estimate of the variance and
then an additional sample taken, if needed, to arrive at the decision
meeting the requirement of some specified precision and protection level.
So far no detailed work has been published in sequential procedures for
ranking.
Bechhofer and Sobel proposed in an abstract (1953) a sequential
procedure for determining the highest mean in the case of normal
populations with known common variance, and in another abstract (1956)
for unknown common variance. They also proposed in still another
abstract (1956a) a similar procedure for the smallest variance, and
again in an abstract (1956b) a procedure for a multinomial event with
highest probability.
No other work in detail appears treating more general cases of
ordering populations sequentially. In this dissertation, as mentioned
in Chapter I, four general cases of ordering are considered as special
cases of a more general problem of ordering k populations in s groups,
according to the magnitude of some of their characteristics. Again,
almost all the previous work deals with ordering populations according
to the mean, and in few cases according to the variance. In this
present work, three characteristics, viz., the mean, the variance and
the regression coefficient, are considered. In the latter case, the
best population is defined as the one having the highest positive
regression coefficient instead of defining it as the one for which the
regression coefficient explains the most variation. Hotelling (1940)
and Williams (1958) have used the latter definition in their work.
Should the occasion arise to consider the highest negative regression
coefficient, these same procedures would apply by rescaling the
independent variable.
The idea of the procedures described in this dissertation originated
from some of the suggestions made by Monroe and Mason (1955) and Box
(1956) regarding practical uses of such procedures in research as well
as in improving a given process while studying its behaviour at several
stages.
CHAPTER III
THEORETICAL DEVELOPMENT
3.1 Introduction:

In this chapter, separate sequential multiple-decision procedures
are developed for ordering k populations π_i (i = 1, 2, ..., k) according
to the magnitude of different population characteristics. The
characteristics considered are: (a) the mean, (b) the variance, and
(c) the regression coefficient.

The definition of the "best" population with respect to each of
the above three characteristics is as follows: When ordering the
populations according to the mean, the "best" population is the one
having the LARGEST mean; according to the variance, it is the one
with the SMALLEST variance; and according to the regression coefficient,
it is the one with the HIGHEST positive regression coefficient.
For each of the above characteristics, four different types of
ordering (goals) are considered as mentioned earlier, namely:

(1) To determine the t best populations out of k, without
    considering the order of the t best.

(2) To determine the best population.

(3) To determine the t best populations out of k, considering
    the order of the t best.

(4) To determine the complete ranking of the k populations.

As mentioned earlier, the goal (2) above may be considered a
special case of (1) when t = 1, and case (4), a special case of (3)
when t = k-1.
3.2 Ordering the populations according to the means:

3.2.1 Assumptions and Symbols:

Let X_iα (i = 1, 2, ..., k; α = 1, 2, ..., N_i) be normally and
independently distributed chance variables with unknown population means
μ_i = μ + θ_i and a common unknown variance σ² (i = 1, 2, ..., k), where
μ is the general mean and θ_i is the "treatment effect" in the simplest
case, such that Σ_{i=1}^{k} θ_i = 0. Let the ranked μ's be denoted by

    μ_[1] ≤ μ_[2] ≤ ... ≤ μ_[k]                              (3.2.1.1)

and let

    δ_ij = μ_[i] - μ_[j] = -δ_ji   (i, j = 1, 2, ..., k; i ≠ j)   (3.2.1.2)

denote the difference between the means of the populations ranked i-th
and j-th, (i > j). It is not known which population is associated with
μ_[i]. Let us also define

    X̄_i = ( Σ_{α=1}^{N_i} X_iα ) / N_i                      (3.2.1.3)

as the mean of the sample from the population π_i, and X̄_(i) as the
estimate of the ranked mean μ_[i]; that is,

    E{ X̄_(i) } = μ_[i] ,                                    (3.2.1.4)

and let the ranked X̄'s be

    X̄_(1) ≤ X̄_(2) ≤ ... ≤ X̄_(k) .                         (3.2.1.5)
3.2.2 Goal (1): t "best" populations (unordered):

(i) Probability Distribution Function:

Let us assume that the experimenter desires to decide which are
the t best treatments out of k under consideration. (The "best"
population in this case is defined to be the one having the largest
mean.) That is, it is desired to identify the populations associated
with the means μ_[k-t+1], μ_[k-t+2], ..., μ_[k]. The probability of
correct decision then is

    P = Pr[ max{ X̄_(1), X̄_(2), ..., X̄_(k-t) } < min{ X̄_(k-t+1), ..., X̄_(k) } ]   (3.2.2.1)

where X̄_(i) is the estimate of μ_[i], (i = 1, 2, ..., k).
For simplicity let N_i = N.

Let μ_[i] be the minimum of μ_[k-t+1], ..., μ_[k], and μ_[j] be the
maximum of μ_[1], ..., μ_[k-t], and let

    δ_ij = μ_[i] - μ_[j] .                                   (3.2.2.2)

It is necessary for the experimenter to specify, before experimentation
starts, a constant

    δ* = δ_ij = δ, say,                                      (3.2.2.3)

as the minimum difference between μ_[i] and μ_[j] desired to be detected,
and the probability P* as the minimum value of the probability of
arriving at the correct decision whenever

    δ_ij ≥ δ* .                                              (3.2.2.4)
Let us define z_ij as

    z_ij = ( X̄_(i) - X̄_(j) ) - δ_ij .                      (3.2.2.5)

Then, the conditional joint distribution of the z_ij's is the
multivariate normal distribution with means zero, common variance
σ_d², say, and the correlation matrix

              | 1    1/2  1/2  ....  1/2 |
              | 1/2  1    1/2  ....  1/2 |
    {ρ_ij} =  | 1/2  1/2  1    ....  1/2 |                   (3.2.2.6)
              | ....................     |
              | 1/2  1/2  1/2  ....  1   |

Now, if s_d is the estimated standard error of the difference between
two means with n degrees of freedom, the joint probability distribution
function of the z_ij and n s_d²/σ_d² is given by

    p(z, n s_d²/σ_d²) = (2π)^{-k_1/2} |Λ|^{-1/2} e^{-(1/2) z′Λ^{-1} z}
                        · [ 1/(2^{n/2} Γ(n/2)) ] (n s_d²/σ_d²)^{(n-2)/2} e^{-n s_d²/(2σ_d²)}   (3.2.2.7)

where k_1 = the number of z_ij values, z = {z_ij} is the column vector,
and Λ is the variance-covariance matrix σ_d² {ρ_ij}; because we know
that the probability distribution function of n s²/σ², where s² is the
estimate of σ² based on n degrees of freedom, is a central χ²-distribution
with n degrees of freedom, which can be expressed as

    p(n s²/σ²) = [ 1/(2^{n/2} Γ(n/2)) ] (n s²/σ²)^{(n-2)/2} e^{-n s²/(2σ²)} .   (3.2.2.8)
Substituting

    z_ij = t_ij s_d                                          (3.2.2.9)

and

    s = s_d ,                                                (3.2.2.10)

the joint distribution function of the t_ij and n s_d²/σ_d² is

    p(t, n s_d²/σ_d²) ∝ (n s_d²/σ_d²)^{(n+k_1-2)/2} e^{ -(n s_d²/(2σ_d²)) [1 + (1/n) t′Λ_1 t] }   (3.2.2.11)

where t = {t_ij} is the column vector and Λ_1 is the matrix

           | 2(k-1)/k   -2/k   ....   -2/k     |
           | -2/k   2(k-1)/k   ....   -2/k     |
    Λ_1 =  | ..............................    |             (3.2.2.12)
           | -2/k   -2/k   ....   2(k-1)/k     |

In (3.2.2.11), integrating out n s_d²/σ_d² between 0 and ∞, we get, as
the joint probability distribution function of the t_ij,

    f(t) = C [ 1 + (1/n) t′Λ_1 t ]^{-(n+k_1)/2}              (3.2.2.13)

where C is the normalizing constant. f(t) in (3.2.2.13) represents the
probability distribution function of a k_1-dimensional t-distribution
for this particular Λ. As the number of degrees of freedom n approaches
infinity, (3.2.2.13) approaches the corresponding multivariate normal
density function. This multivariate analogue of Student's
t-distribution has been considered by Dunnett and Sobel (1954).
Now the probability required for our Goal (1) is

    P = Pr[ max{ X̄_(1), ..., X̄_(k-t) } < min{ X̄_(k-t+1), ..., X̄_(k) } ]   (3.2.2.14)

which can be written as

    P = t Pr[ max{ X̄_(1), ..., X̄_(k-t) } < X̄_(k-t+1) < min{ X̄_(k-t+2), ..., X̄_(k) } ] .   (3.2.2.15)

Writing (3.2.2.15) in integral form (3.2.2.16) and substituting

    y = ( X̄_(k-t+1) - μ_[k-t+1] ) / (σ/√N) ,                (3.2.2.17)

we get

    P = t ∫_{-∞}^{∞} [F(y+d)]^{k-t} [1 - F(y)]^{t-1} f(y) dy   (3.2.2.18)

where

    F(y) = ∫_{-∞}^{y} f(γ) dγ ,                              (3.2.2.19)

    f(y) = F′(y) ,                                           (3.2.2.20)

and

    d = δ √N / σ ,                                           (3.2.2.21)

δ being the difference between the two population means.

It is to be noted that if d = 0, then the probability (3.2.2.18)
is equal to

    t!(k-t)!/k! = 1/C^k_t .                                  (3.2.2.22)
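The reduction (3.2.2.18) can be checked numerically. The sketch below (Python; function names are ours) integrates the right-hand side by Simpson's rule with the standard normal f and F, and confirms that at d = 0 it returns 1/C^k_t as in (3.2.2.22):

```python
from math import comb, erf, exp, pi, sqrt

def phi(y):   # standard normal density f(y)
    return exp(-y * y / 2) / sqrt(2 * pi)

def Phi(y):   # standard normal distribution function F(y)
    return 0.5 * (1 + erf(y / sqrt(2)))

def p_correct_goal1(k, t, d, lo=-8.0, hi=8.0, steps=4000):
    """P = t * Integral of F(y+d)^(k-t) (1-F(y))^(t-1) f(y) dy, as in
    (3.2.2.18), evaluated by Simpson's rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        g = Phi(y + d) ** (k - t) * (1 - Phi(y)) ** (t - 1) * phi(y)
        total += g * (1 if i in (0, steps) else (4 if i % 2 else 2))
    return t * h * total / 3

# At d = 0 the probability reduces to 1/C(k,t), equation (3.2.2.22):
print(abs(p_correct_goal1(4, 2, 0.0) - 1 / comb(4, 2)) < 1e-6)  # True
# P increases with the standardized separation d:
print(p_correct_goal1(4, 2, 2.0) > p_correct_goal1(4, 2, 0.5))  # True
```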
(ii) Sequential Procedure for Goal (1):

It is necessary to specify the minimum difference desired between
two means μ_[i] and μ_[j] as defined in (3.2.2.3), viz.,

    δ* = δ_ij ,                                              (3.2.2.23)

where μ_[i] is the minimum of the t largest means and μ_[j] is the
maximum of the k-t smallest means, and also specify the minimum
probability P* ( 1/C^k_t ≤ P* ≤ 1 ) for achieving the Goal (1) whenever

    δ_ij ≥ δ* .                                              (3.2.2.24)

The sequential procedure is then as follows:

(1) Start taking observations from the k populations and at the m-th
    stage (m = 1, 2, ...), compute the means.

(2) Compute for every population π_i

    M_i = [ 1 + (1/n) (t′Λ_1 t)_i ]^{-(n+k-1)/2} ,  (i = 1, 2, ..., k).   (3.2.2.25)

    It is seen from (3.2.2.25) that if the positive quadratic forms
    Q_i = (1/n)(t′Λ_1 t)_i are ranked, then the ranked M_i's, obtained as

    M_[i] = [ 1 + Q_[k-i+1] ]^{-(n+k-1)/2} ,                 (3.2.2.26)

    are such that M_[i] is associated, at the m-th stage, with the
    population having the mean μ_[i].

(3) Using the values of M_[i], calculate then the probability P_m at
    the m-th stage, for the Goal (1), as

    P_m = ( Σ_{i=k-t+1}^{k} M_[i] ) / ( Σ_{i=1}^{k} M_[i] )   (3.2.2.27)

    and compare it with the specified value of P*.

(4) If P_m ≥ P*, stop and select the t populations associated with
    M_[k-t+1], ..., M_[k] as the t "best" populations. If P_m < P*,
    repeat the above procedure with the next stage, and continue
    until the procedure indicates a stop.
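The stop-or-continue loop of steps (1)-(4) can be sketched as follows. Note this is only a schematic illustration: the confidence P_m is estimated here by a crude bootstrap over the current samples, a stand-in for the exact M-ratio of (3.2.2.27), and all names are ours:

```python
import random

def sequential_top_t(populations, t, p_star, max_stages=200, boot=300, seed=1):
    """Schematic sketch of the Goal (1) stopping rule: draw one observation
    per population per stage, estimate the confidence P_m in the current
    top-t set, and stop once P_m >= p_star.  P_m is a crude bootstrap
    estimate, not the exact M-ratio of (3.2.2.27)."""
    rng = random.Random(seed)
    data = [[] for _ in populations]
    stage, top = 0, set()
    while stage < max_stages:
        stage += 1
        for i, draw in enumerate(populations):
            data[i].append(draw(rng))
        means = [sum(d) / len(d) for d in data]
        top = set(sorted(range(len(data)), key=means.__getitem__)[-t:])
        if stage < 2:            # a one-observation bootstrap is degenerate
            continue
        hits = 0
        for _ in range(boot):    # resample each sample with replacement
            bm = [sum(rng.choice(d) for _ in d) / len(d) for d in data]
            if set(sorted(range(len(data)), key=bm.__getitem__)[-t:]) == top:
                hits += 1
        if hits / boot >= p_star:
            break
    return stage, sorted(top)

# Two clearly better populations (mean 4) among four (others mean 0):
pops = [lambda r: r.gauss(0, 1), lambda r: r.gauss(0, 1),
        lambda r: r.gauss(4, 1), lambda r: r.gauss(4, 1)]
stage, best = sequential_top_t(pops, t=2, p_star=0.95)
print(stage, best)  # stops after a few stages; best should be [2, 3]
```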
3.2.3 Goal (2): The "best" population:

(i) Probability Distribution Function:

Sometimes the experimenter is interested only in deciding the "best"
one treatment instead of several. In this case, if the ranked μ's are,
again,

    μ_[1] ≤ μ_[2] ≤ ... ≤ μ_[k] ,                            (3.2.3.1)

then the required probability is

    P = Pr[ max{ X̄_(1), X̄_(2), ..., X̄_(k-1) } < X̄_(k) ] .   (3.2.3.2)
Let us define

    δ_{k,k-1} = μ_[k] - μ_[k-1] .                            (3.2.3.3)

It is necessary for the experimenter to specify

    δ* = δ_{k,k-1}                                           (3.2.3.4)

as the minimum difference desired by him between the largest mean μ_[k]
and the next one, μ_[k-1]. Also he must specify the minimum probability
P* of achieving the goal whenever

    δ_{k,k-1} ≥ δ* .                                         (3.2.3.5)

Defining as in Goal (1),

    z_kj = ( X̄_(k) - X̄_(j) ) - δ_{k,j} ,   (j = 1, 2, ..., k-1),   (3.2.3.6)

which have the multivariate normal distribution, then as before the

    t_kj = z_kj / s_d ,                                      (3.2.3.7)

where s_d² is the estimate of σ_d² with n degrees of freedom, have the
joint (k-1)-dimensional t-distribution given by
    f(t) = C [ 1 + (1/n) t′Λ_2 t ]^{-(n+k-1)/2}              (3.2.3.8)

where the matrix Λ_2 is given by

           | 2(k-1)/k   -2/k   ....   -2/k     |
           | -2/k   2(k-1)/k   ....   -2/k     |
    Λ_2 =  | ..............................    |             (3.2.3.9)
           | -2/k   -2/k   ....   2(k-1)/k     |

The probability P in (3.2.3.2), then, required for our Goal (2) is
given by

    P = ∫_{-∞}^{∞} [ ∫_{-∞}^{y} f(γ) dγ ]^{k-1} f(y) dy      (3.2.3.10)
where

    y = ( X̄_(k) - μ_[k] ) / (σ/√N) .                        (3.2.3.11)

Substituting for y, we get

    P = ∫_{-∞}^{∞} [F(y+d)]^{k-1} f(y) dy                    (3.2.3.12)

where

    F(y) = ∫_{-∞}^{y} f(γ) dγ                                (3.2.3.13)

and

    d = δ √N / σ .                                           (3.2.3.14)

Again, it may be noted that when d = 0,

    P = (k-1)! 1! / k! = 1/k .                               (3.2.3.15)

This Goal (2) then involves the same distribution function as in
Goal (1) when t = 1.
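A Monte Carlo counterpart of (3.2.3.12) makes the role of d visible: the best population contributes a standardized mean Z + d, the k-1 rivals contribute independent standard normals, and P is the chance the best one wins. (Python sketch; names are ours.)

```python
import random

def mc_prob_best_selected(k, d, reps=20000, seed=7):
    """Monte Carlo version of (3.2.3.12): the best population's
    standardized sample mean is Z + d; the k-1 rivals are independent
    standard normals; P is the chance the best one is the largest."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        rival_max = max(rng.gauss(0, 1) for _ in range(k - 1))
        if rng.gauss(0, 1) + d > rival_max:
            hits += 1
    return hits / reps

print(mc_prob_best_selected(5, 0.0))  # near 1/5, equation (3.2.3.15)
print(mc_prob_best_selected(5, 3.0))  # close to 1 once d is large
```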
(ii) Sequential Procedure for Goal (2):

In this case, we need to specify, then,

    δ* = δ_{k,k-1}                                           (3.2.3.16)

as the minimum difference desired between the two means μ_[k] and
μ_[k-1] as defined in (3.2.3.1), and the minimum acceptable probability
P* ( 1/k ≤ P* ≤ 1 ) of deciding the "best" population whenever

    δ_{k,k-1} ≥ δ* .                                         (3.2.3.17)

The sequential procedure, then, is as follows:

(1) Take observations from the k populations up to the m-th stage
    (m = 1, 2, ...) and compute the means.

(2) Compute for each population, at the m-th stage,

    M_[i] = [ 1 + Q_[k-i+1] ]^{-(n+k-1)/2}                   (3.2.3.18)

    where Q_[k-i+1] = the ordered quadratic form (1/n)(t′Λ_2 t).

(3) Using the values of M_[i], compute the probability P_m, at the m-th
    stage, for Goal (2), as

    P_m = M_[k] / Σ_{i=1}^{k} M_[i]                          (3.2.3.19)

    and compare it with the specified value of P*.

(4) If P_m ≥ P*, stop and select the population associated with the
    highest M-value, viz., M_[k], as the best population. If P_m < P*,
    repeat the above procedure with the next sampling stage and
    continue until the procedure calls for a stop.

This procedure is equivalent to the one proposed in abstract
form by Bechhofer and Sobel (1956).
3.2.4 Goal (3): t "best" populations (ordered):

(i) Probability Distribution Function:

Let us suppose that it is desired to determine the order of the t
best populations out of k. Again, if the ranked μ's are represented as

    μ_[1] ≤ μ_[2] ≤ ... ≤ μ_[k] ,                            (3.2.4.1)

then the probability required to be computed is

    P = Pr[ max{ X̄_(1), ..., X̄_(k-t) } < X̄_(k-t+1) < ... < X̄_(k) ] .   (3.2.4.2)

Let us then define

    δ_i = δ_{(i+1),i} = μ_[i+1] - μ_[i] ,   (i = (k-t+1), ..., (k-1))   (3.2.4.3)

and

    δ_{(k-t+1),j} = μ_[k-t+1] - μ_[j] ,                      (3.2.4.4)

where μ_[j] is the maximum of the (k-t) smallest means.
Let us specify P* as the minimum acceptable probability of the
correct decision for the above Goal (3), whenever

    δ_i ≥ δ_i*                                               (3.2.4.5)

and

    δ_{(k-t+1),j} ≥ δ_j* .                                   (3.2.4.6)

Let us assume for simplicity that

    δ_i* = δ_j* = δ* .                                       (3.2.4.7)

As in the case of Goal (1) above, the probability distribution
function in the case of this Goal, viz., Goal (3), is the same; that
is, if

    z_ij = ( X̄_(i) - X̄_(j) ) - δ_ij ,   (i, j = 1, 2, ..., k; i ≠ j),   (3.2.4.8)

then

    t_ij = z_ij / s_d

has the joint t-distribution in k_2 dimensions, where k_2 = the number
of t_ij values, as
    f(t) = C [ 1 + (1/n) t′Λ_3 t ]^{-(n+k_2)/2}              (3.2.4.9)

where

           | 2(k_2-1)/k_2   2(k_2-2)/k_2   2(k_2-3)/k_2  ....  2/k_2 |
           | 2(k_2-2)/k_2   4(k_2-2)/k_2   4(k_2-3)/k_2  ....  4/k_2 |
    Λ_3 =  | 2(k_2-3)/k_2   4(k_2-3)/k_2   6(k_2-3)/k_2  ....  6/k_2 |   (3.2.4.10)
           | ......................................................  |

i.e., the element of the matrix Λ_3 is

    λ_ij = 2 min(i,j) [ 1 - max(i,j)/k_2 ] .                 (3.2.4.11)
As before, we get the probability desired as
p=t!
where
(i=l, 2, ••• , t)
and
=
and d
0
a/IN
It is seen that when d=O! the probability for the Goal is
p = t!(k-tH
1
."tf'
k!
(ii)
= (k-tH
kt
(3.2 ·4·16)
(ii) Sequential procedure for Goal (3):

Let us specify δ_1* and δ_2* for

    δ_1 = δ_(i+1),i,    (i = k-t+1, ..., k-1),    (3.2.4.17)

and

    δ_2 = δ_(k-t+1),j,    (j = 1, 2, ..., k-t),    (3.2.4.18)

as defined above, and also specify P* ((k-t)!/k! ≤ P* ≤ 1), the minimum acceptable probability for the correct decision whenever

    δ_(i+1),i ≥ δ_1*    (3.2.4.19)

and

    δ_(k-t+1),j ≥ δ_2*.    (3.2.4.20)

The sequential procedure is, then, as follows:

(1) Take observations from the k populations up to the m-th stage (m = 1, 2, ...) and compute the means.

(2) Compute for each population, at the m-th stage,

    M_[i] = [1 + Q_[k-i+1]]^(-(n+k-1)/2),    (i = 1, 2, ..., k),    (3.2.4.21)

where Q_[k-i+1] = the ordered quadratic form (1/n)(t' Δ_3 t).

(3) Using the values of M_[i], compute the probability P_m, at the m-th stage, for Goal (3) as

    P_m = ( Σ_{i=k-t+1}^{k} M_[i] ) / ( Σ_{i=1}^{k} M_[i] ),

and compare it with the specified minimum acceptable probability P*.

(4) If P_m ≥ P*, stop and select the t populations associated with M_[k-t+1], ..., M_[k] as the t best populations in that order. If P_m < P*, repeat the procedure with the next stage and continue until a stop is decided as above.
3.2.5 Goal (4): Complete ranking of the k populations according to means:

(i) Probability Distribution Function:

Suppose it is desired to order all k population means according to their magnitudes. If we rank them as

    μ_[1] ≤ μ_[2] ≤ ... ≤ μ_[k],    (3.2.5.1)

the probability required to be computed is

    P = Pr[ X̄_(1) < X̄_(2) < ... < X̄_(k) ].    (3.2.5.2)

If we now define

    δ_i = δ_(i+1),i = μ_[i+1] - μ_[i],    (i = 1, 2, ..., k-1),

the joint distribution function of t = {t_ij} is the generalized t-distribution as before, where the variance-covariance matrix is

    Δ_4 = σ² {ρ_ij},    (3.2.5.5)

where again

    ρ_ij = 1 if i = j;  -1/2 if |i-j| = 1;  0 if |i-j| > 1,    (i, j = 1, 2, ..., k).

The probability P in (3.2.5.2) is then

    P = (k-1)! ∫⋯∫ [F(y_1+d)] [F(y_2)-F(y_1)] ⋯ [F(y_k)-F(y_{k-1})],    (3.2.5.7)

where

    y_i = (X̄_(i+1) - μ_[i+1]) / (σ/√N),    (i = 1, 2, ..., k-1),

    F(y_i) = ∫_{-∞}^{y_i} f(t) dt,    f(y_i) = F'(y_i),

and d = δ*/(σ/√N). Again, if d = 0, the probability P becomes

    P = (k-1)! · 1/((k-1)! k!) = 1/k!.    (3.2.5.11)

(ii) Sequential Procedure for Goal (4):

Let us specify

    δ* = δ_(i+1),i = μ_[i+1] - μ_[i],    (i = 1, 2, ..., k-1),    (3.2.5.12)

and also specify P* (1/k! ≤ P* ≤ 1), the minimum acceptable probability for the correct decision whenever δ_(i+1),i ≥ δ*.

Then, the sequential procedure is proposed as follows:

(1) Take observations from the k populations, up to the m-th stage (m = 1, 2, ...), and compute the means.

(2) Compute for each population, at the m-th stage,

    M_[i] = [1 + Q_[k-i+1]]^(-(n+k-1)/2),    (3.2.5.14)

where Q_[k-i+1] = the ordered quadratic form (1/n)(t' Δ_4 t).

(3) Using the M-values, compute the probability P_m, at the m-th stage, for Goal (4) as

    P_m = ( Σ_{i=2}^{k} M_[i] ) / ( Σ_{i=1}^{k} M_[i] ),    (3.2.5.15)

that is, the Goal (3) expression with t = k-1, and compare it with P*.

(4) If P_m ≥ P*, stop and select the best populations associated with the M_[i] in that order. If P_m < P*, repeat the procedure with the next stage until a stop is decided as above.
3.3 Ordering k populations according to their variances:

3.3.1 Assumptions and Symbols:

Let there be k normally distributed populations π_i (i = 1, 2, ..., k) with unknown means μ_i and unknown variances σ_i². Let X_iα (i = 1, 2, ..., k; α = 1, 2, ..., N_i) be independent observations from π_i. Let us denote the ranked σ²'s as

    σ²_[1] < σ²_[2] < ... < σ²_[k],

and define the "best" population as the one having the smallest variance. In the above, of course, it is not known which population is associated with σ²_[i].

Let s²_(i), with n_i degrees of freedom, be the best estimate of σ²_[i], and let

    Q_ij = σ²_[i] / σ²_[j],

and the ranked s²'s be

    s²_[1] < s²_[2] < ... < s²_[k].    (3.3.1.3)
3.3.2 Goal (1): To determine the t "best" populations (unordered):

(i) Probability Distribution Function:

For this Goal, it is necessary to evaluate the probability

    P = Pr[ max {s²_(1), ..., s²_(t)} < min {s²_(t+1), ..., s²_(k)} ],    (3.3.2.1)

i.e.,

    P = Σ_{j=1}^{t} Pr[ max {s²_(1), ..., s²_(j-1), s²_(j+1), ..., s²_(t)} < s²_(j) < min {s²_(t+1), ..., s²_(k)} ].    (3.3.2.2)

Let us now define

    u_ij = s²_(i) / (Q_ij s²_(j)),    (i, j = 1, 2, ..., k),

so that u_ij = 1 when i = j.    (3.3.2.4)

Now it is known that n_i s²_(i)/σ²_[i] is distributed as a χ² with n_i degrees of freedom, viz.,

    x_i = n_i s²_(i) / σ²_[i],    (i = 1, 2, ..., k).    (3.3.2.6)

Let us consider now the k independent χ²'s, x_1, ..., x_k, and denote Σ_{i=1}^{k} n_i = N. Then their joint distribution function can be written as

    f(x) = Π_{i=1}^{k} [ x_i^(n_i/2 - 1) e^(-x_i/2) / (2^(n_i/2) Γ(n_i/2)) ],    (3.3.2.7)

where x' = (x_1, x_2, ..., x_k). From (3.3.2.6) we have

    s²_(i) = x_i σ²_[i] / n_i,    (3.3.2.8)

and hence

    u_ij = (n_j x_i) / (n_i x_j),    (i ≠ j),

giving

    x_i = (n_i/n_j) x_j u_ij,    (i = 1, 2, ..., k; i ≠ j).    (3.3.2.10)

Also,

    ∂x_i/∂u_ij = (n_i/n_j) x_j.    (3.3.2.11)

Using transformations (3.3.2.10) and (3.3.2.11), and keeping x_j fixed, we get from (3.3.2.7) a function of the u_ij and x_j (i = 1, 2, ..., k; i ≠ j), which reduces to

    Π_{i≠j} (n_i/n_j)^(n_i/2) u_ij^(n_i/2 - 1) · x_j^(N/2 - 1) · e^(-(x_j/2)[1 + Σ_{i≠j} (n_i/n_j) u_ij]) / Π_{i=1}^{k} [2^(n_i/2) Γ(n_i/2)],    (3.3.2.13)

where u denotes all the u_ij (i = 1, 2, ..., k; i ≠ j). Let

    y = x_j [1 + Σ_{i≠j} (n_i/n_j) u_ij].    (3.3.2.14)

Then

    x_j = y / [1 + Σ_{i≠j} (n_i/n_j) u_ij]    (3.3.2.15)

and

    dx_j = dy / [1 + Σ_{i≠j} (n_i/n_j) u_ij].    (3.3.2.16)

Using transformation (3.3.2.14), we then get from (3.3.2.13) a function in the u's and y. Integrating out y as a Gamma function, we obtain the distribution of the (k-1) ratios u_ij (i ≠ j), for fixed j, as

    g(u) = [ Γ(N/2) / Π_{i=1}^{k} Γ(n_i/2) ] Π_{i≠j} (n_i/n_j)^(n_i/2) u_ij^(n_i/2 - 1) · [1 + Σ_{i≠j} (n_i/n_j) u_ij]^(-N/2).    (3.3.2.18)

This represents the generalized F-distribution of the ratios of (k-1) variances s²_(i) to a common variance s²_(j). It can be seen that if all the n_i's are equal, say to n, then

    N = Σ_{i=1}^{k} n_i = kn

and the above distribution reduces to the simplified form

    g(u) = [ Γ(kn/2) / (Γ(n/2))^k ] Π_{i≠j} u_ij^(n/2 - 1) · [1 + Σ_{i≠j} u_ij]^(-kn/2).    (3.3.2.20)

Hence the required probability (3.3.2.2) for our Goal is

    P = Σ_{j=1}^{t} ∫_0^{Q_j1} ⋯ ∫_0^{Q_j,j-1} ∫_0^{Q_j,j+1} ⋯ ∫_0^{Q_j,t} ∫_0^{Q_j,t+1} ⋯ ∫_0^{Q_j,k} g(u) du.    (3.3.2.21)
(ii) Sequential procedure for Goal (1):

Let us specify the smallest value Q* of Q_{t+1,t} which the experimenter desires to detect, and also specify P* (1/C(k,t) ≤ P* ≤ 1), the minimum acceptable probability of correctly arriving at the decision whenever

    Q_{t+1,t} = σ²_[t+1] / σ²_[t] ≥ Q*.

Then the sequential procedure is proposed as follows:

(1) Take observations from each of the k populations π_i and compute s²_(i) for each, at the m-th stage (m = 1, 2, ...).

(2) Compute, at the m-th stage,

    L_j = [ Π_{i≠j} u_ij^(n_i/2 - 1) ] [ 1 + Σ_{i≠j} (n_i/n_j) u_ij ]^(-N/2),    (j = 1, 2, ..., k; i ≠ j),

where N = Σ_{i=1}^{k} n_i. At the m-th stage, if the ranked L_j's are

    L_[1] < L_[2] < ... < L_[k-t] < L_[k-t+1] < ... < L_[k],

then σ²_[i] corresponds with L_[k-i+1].    (3.3.2.24)

(3) Compute then the probability P_m, at the m-th stage, using the L-values, so that the Goal (1) is achieved. This is given by

    P_m = ( Σ_{j=k-t+1}^{k} L_[j] ) / ( Σ_{j=1}^{k} L_[j] ),

and compare P_m with the specified acceptable value P*.

(4) If P_m ≥ P*, stop and select the t populations associated with L_[k-t+1], ..., L_[k] as the t best populations. If P_m < P*, repeat the computations for the next stage and continue until the decision to stop is reached.

As will be noticed, the computations are simplified if all the n_i's are equal.
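The L-computation of step (2) can be sketched for the simplified equal-n_i case. This is an illustration under stated assumptions, not the thesis's routine: the pairing u_ij = s²_(i)/(Q* s²_(j)) follows the u-values tabulated in Chapter IV (Table 4.3.2), and the exponents are taken exactly as in the density form above.

```python
import math

def variance_L_values(s2_ranked, n, q_star):
    """L_j values for ranking variances, with equal degrees of freedom n.

    s2_ranked -- ranked sample variances s2_[1] < ... < s2_[k].
    q_star    -- the specified minimum variance ratio Q*.
    Uses u_ij = s2_[i] / (q_star * s2_[j]), as in Table 4.3.2.
    """
    k = len(s2_ranked)
    L = []
    for j in range(k):
        u = [s2_ranked[i] / (q_star * s2_ranked[j]) for i in range(k) if i != j]
        core = math.prod(ui ** (n / 2.0 - 1.0) for ui in u)
        L.append(core * (1.0 + sum(u)) ** (-(k * n) / 2.0))
    return L

def p_best(L):
    """Decision probability for the single best (smallest) variance:
    per the rule above, the largest L corresponds to the smallest variance,
    and P_m = L_[k] / sum of all L_j."""
    return max(L) / sum(L)
```

With the stage-1 variances of Chapter IV (45.87, ..., 279.55), n = 11 and Q* = 2, the function returns five positive L-values and a decision probability strictly between 0 and 1; the absolute L's are tiny, which is why Chapter IV tabulates them only up to a proportionality constant.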
3.3.3 Goal (2): To determine the best population:

(i) Probability Distribution Function:

The goal here is to determine the best population, that is, the population with the smallest variance. Thus we need to find

    P = Pr[ s²_(1) < min {s²_(2), ..., s²_(k)} ].    (3.3.3.1)

Defining again u_ij (i, j = 1, 2, ..., k) as before in (3.3.2.4), we get the distribution function of the u_ij's as the same generalized F-distribution as before (3.3.3.2), and therefore the required probability is obtained as in (3.3.3.3).

(ii) Sequential Procedure for Goal (2):

In this case we must specify the smallest desired value Q* of Q_{2,1} and the minimum acceptable probability P* (1/k ≤ P* ≤ 1) for a correct decision whenever

    Q_{2,1} = σ²_[2] / σ²_[1] ≥ Q*.

The sequential procedure, then, is outlined as follows:

(1) Take observations from each of the k populations π_i, up to the m-th stage (m = 1, 2, ...), and compute s²_(i) for each population.

(2) Compute, at the m-th stage, for each population,

    L_j = [ Π_{i≠j} u_ij^(n_i/2 - 1) ] [ 1 + Σ_{i≠j} (n_i/n_j) u_ij ]^(-N/2),    (i ≠ j),    (3.3.3.5)

and rank the L_j's as

    L_[1] < L_[2] < ... < L_[k].    (3.3.3.6)

(3) Compute P_m, the probability at the m-th stage, for the smallest variance, as

    P_m = L_[k] / ( Σ_{j=1}^{k} L_[j] ).

(4) If P_m ≥ P*, stop and select the population associated with the value L_[k] as the best one. If P_m < P*, repeat the computations at the next stage and continue until the decision to stop is reached.

This procedure for Goal (2) is similar to the one proposed by Bechhofer and Sobel (1954).
3.3.4 Goal (3): To determine the ORDER of the t best populations:

(i) Probability Distribution Function:

For this goal, the probability required can be expressed as

    P = Pr[ s²_(1) < s²_(2) < ... < s²_(t) < min {s²_(t+1), ..., s²_(k)} ].    (3.3.4.1)

Defining again u_ji (i, j = 1, 2, ..., k; i ≠ j; i > j) as before, the probability distribution function of the u_ji's in this case can again be expressed in the generalized F-form of (3.3.2.18), with the exponent of u_ji now equal to [(k-j)n_j - 2]/2 and with the same normalizing factor [1 + ...]^(-N/2), and hence the required probability is obtained by integrating this density over the region corresponding to (3.3.4.1). As can be seen, the function for this goal reduces to that for Goal (2) when t = 1.

(ii) Sequential Procedure for Goal (3):

Here let us specify Q_1*, the minimum value desired to be detected of

    Q_{t+1,t} = σ²_[t+1] / σ²_[t],

where σ²_[t+1] is the minimum variance of the (k-t) highest variances, and Q_2*, the minimum value desired to be detected of

    Q_{j+1,j} = σ²_[j+1] / σ²_[j],    (j = 1, 2, ..., t).

Also let us specify P* ((k-t)!/k! ≤ P* ≤ 1), the minimum acceptable probability for the correct decision whenever

    Q_{t+1,t} ≥ Q_1*    (3.3.4.5)

and

    Q_{j+1,j} ≥ Q_2*.    (3.3.4.6)

Then the sequential procedure can be as follows:

(1) Take observations from each of the k populations and compute s²_(i) for each at the m-th stage (m = 1, 2, ...).

(2) Compute, at the m-th stage, for each population, the value L_j (j = 1, 2, ..., k) given by (3.3.4.7), that is, the above density evaluated at the observed ratios, with the convention that Π_{β=a}^{b} u_jβ = 1 if a > b. At the m-th stage, rank the L_j's as

    L_[1] < L_[2] < ... < L_[k-t] < L_[k-t+1] < ... < L_[k].    (3.3.4.8)

(3) Compute then the probability P_m at the m-th stage, as

    P_m = ( Σ_{j=k-t+1}^{k} L_[j] ) / ( Σ_{j=1}^{k} L_[j] ),

and compare it with P*, the specified minimum acceptable probability.

(4) If P_m ≥ P*, stop taking observations and select the t best populations as the ones associated with L_[k], ..., L_[k-t+1], in that order. If P_m < P*, repeat the computations for the (m+1)-th stage and continue until the decision to stop is reached.

As before, the computations here are simplified if all the n_i are equal.
3.3.5 Goal (4): To determine the complete ranking of k populations:

(i) Probability Distribution Function:

In this case the required probability is

    P = Pr[ s²_(1) < s²_(2) < ... < s²_(k) ].    (3.3.5.1)

Defining the u_ji's as before, the probability distribution function is in this case again of the generalized F-form, with the exponent of u_ji equal to (k-j)n_j/2 - 1, (i ≠ j), and hence the required probability is obtained by integrating this density over the region corresponding to (3.3.5.1).

(ii) Sequential Procedure for Goal (4):

Let us specify Q*, the minimum value desired to be detected of

    Q_{i+1,i} = σ²_[i+1] / σ²_[i],    (i = 1, 2, ..., (k-1)),

and P* (1/k! ≤ P* ≤ 1) as the minimum acceptable probability of correct ranking whenever

    Q_{i+1,i} ≥ Q*.

The sequential procedure is then as follows:

(1) Take observations from each population and compute s²_(i) at the m-th stage (m = 1, 2, ...).

(2) Compute, at the m-th stage, for each population, the value L_j given by (3.3.5.5), that is, the above density evaluated at the observed ratios u_ji (i ≠ j), and rank the L_j's as

    L_[1] < L_[2] < ... < L_[k].    (3.3.5.6)

(3) Compute P_m, the probability of complete ranking, as the Goal (3) expression with t = k-1,

    P_m = ( Σ_{j=2}^{k} L_[j] ) / ( Σ_{j=1}^{k} L_[j] ),

and compare it with P*.

(4) If P_m ≥ P*, stop the observations and select the best populations as given by the order of L_[k], L_[k-1], ..., L_[1]. If P_m < P*, repeat the computation with the next stage and continue until the decision for a stop is reached.

It will be seen that this Goal (4) is the same as Goal (3) when t = k-1.
3.4 Ordering k population regression coefficients:

3.4.1 Assumptions and Symbols:

Let X_iα be observations of a fixed "independent" variable (i = 1, 2, ..., k; α = 1, 2, ..., N_i) and Y_iα the corresponding observations of the "dependent" variable from the population π_i. Let us assume Y_iα to be normally and independently distributed with unknown mean μ_i and the common unknown variance σ². Let β_i denote the coefficient of linear regression of Y on X for the population π_i, expressed as

    Y_i = β_i X + γ_i,

and let the ranked β's be expressed as

    β_[1] < β_[2] < ... < β_[k].    (3.4.1.2)

It is not known which population is associated with β_[i]. The ordering of the regression coefficients in this section is considered with respect to their magnitudes; that is, the best population is the one having the highest positive regression coefficient.

Let us therefore define the difference between two regression coefficients as

    δ_ij = β_i - β_j,    (i, j = 1, 2, ..., k; i ≠ j).    (3.4.1.3)

We further define b_i, the sample regression coefficient, as the least squares estimate of β_i:

    b_i = ( Σ_{α=1}^{N_i} x_iα y_iα ) / ( Σ_{α=1}^{N_i} x_iα² ),

where x_iα, y_iα are deviations of X_iα and Y_iα from their respective sample means, X̄_i and Ȳ_i. The ranked b's are then denoted by

    b_[1] < b_[2] < ... < b_[k],    (3.4.1.5)

and the estimate of the ranked regression coefficient β_[i] by b_(i); then

    E[b_(i)] = β_[i].    (3.4.1.6)

3.4.2 Goal (1): To determine the t highest regression coefficients:
(i) Probability Distribution Function:

For this particular goal, we need to obtain the probability which can be expressed as

    P = Pr[ max {b_(1), ..., b_(k-t)} < min {b_(k-t+1), ..., b_(k)} ],    (3.4.2.1)

that is,

    P = t Pr[ max {b_(1), ..., b_(k-t)} < b_(k-t+1) < min {b_(k-t+2), ..., b_(k)} ].    (3.4.2.2)

Now in the bivariate normal population considered above in Section 3.4.1, if we define s_1², s_2² and r_i as the sample values estimating the variance of X (σ_1²), the variance of Y (σ_2²), and the coefficient of correlation (ρ_i) for the population π_i, then the joint distribution of s_1, s_2 and r is known. Substituting

    r = b s_1/s_2    and    dr = (s_1/s_2) db    (3.4.2.6)

in the above, we get the joint distribution g(s_1, s_2, b) of s_1, s_2 and b.    (3.4.2.7)

Integrating this out with respect to s_1 and s_2, we get the distribution of b as

    h(b) = (constant) [ (σ_2²/σ_1²)(1 - ρ²) + (b - β)² ]^(-(N_i - 1)/2),    (3.4.2.8)

where β = ρ σ_2/σ_1 is the population regression coefficient. This is a distribution of Pearson Type VII, symmetrical about the point b = β, and tending to normality fairly rapidly.

Now defining

    u = (b - β) s_1 / √(s_2² - b² s_1²)    (3.4.2.10)

and using the joint distribution (3.4.2.7) above of s_1, s_2 and b, we get the distribution of u as

    f(u) = (constant) / (1 + u²)^((N_i - 1)/2),    (3.4.2.11)

which does not contain any other parent parameters. Making the transformation

    t = u √(N_i - 2)    (3.4.2.12)

in (3.4.2.11), we see that the distribution of t is that of the familiar Student's t-distribution. The t-statistic in (3.4.2.12) is simply

    t = (b - β) / s_3,

where s_3 is the standard error of b, given by

    s_3² = (s_2² - b² s_1²) / (s_1² (N_i - 2)).

We also then see that the difference between two regression coefficients b_1 and b_2 can be tested by the t-distribution.
Coming back to the k regression coefficients b_(i) (i = 1, 2, ..., k), we define

    t_ij = (b_(i) - b_(j) - δ*) / s_d,    (i, j = 1, 2, ..., k; i ≠ j),    (3.4.2.15)

where s_d is the estimate of the standard error of the difference between the two b's from the pooled variance estimate s² (assuming a common variance), given by

    s_d² = s² ( 1/Σx_i² + 1/Σx_j² ),

where

    s² = [ Σ_{i=1}^{k} ( Σ_{α=1}^{N_i} y_iα² - b_i Σ_{α=1}^{N_i} x_iα y_iα ) ] / n,    (3.4.2.16)

and n, the number of degrees of freedom, is given by

    n = Σ_{i=1}^{k} N_i - 2k.

As already discussed in Section 3.2.2, the t_ij's have a k_1-dimensional generalized t-distribution, viz. (3.4.2.18), where k_1 is the number of t_ij-values, Δ_1 is the matrix {Δ_ij} defined as in Section 3.2.2, and t is the column vector {t_ij}.

Now the probability required for this Goal (1) is

    P = t Pr[ max {b_(1), ..., b_(k-t)} < b_(k-t+1) < min {b_(k-t+2), ..., b_(k)} ]    (3.4.2.19)

      = t ∫_{-∞}^{∞} [F(y+d)]^(k-t) [1 - F(y)]^(t-1) f(y) dy,    (3.4.2.20)

as before, where

    F(y) = ∫_{-∞}^{y} f(t) dt    (3.4.2.21)

and

    f(y) = F'(y),    (3.4.2.22)

and

    y = (b_(k-t+1) - β_[k-t+1]) / s_b,

s_b being the estimate of the standard error of b. If the number of degrees of freedom, n, is very large, the above distribution can be approximated by the corresponding normal distribution.
(ii) Sequential Procedure for Goal (1):

Here it is required to specify the minimum difference between two regression coefficients desired to be detected as

    δ* = δ_ij = β_[i] - β_[j],    (3.4.2.25)

where β_[i] is the minimum of the t largest regression coefficients and β_[j] is the maximum of the (k-t) smallest ones. Also required to be specified, as before, is the minimum acceptable probability P* (1/C(k,t) ≤ P* ≤ 1) for achieving the Goal (1) whenever

    δ_ij ≥ δ*.    (3.4.2.26)

The sequential procedure, then, can be outlined as follows:

(1) Start taking observations from the k populations and compute the b_(i)'s at the m-th stage (m = 1, 2, ...).

(2) Compute for every population, at the m-th stage,

    M_[i] = [1 + Q_[k-i+1]]^(-(n+k-1)/2),    (i = 1, 2, ..., k),    (3.4.2.27)

where Q_[k-i+1] = the ordered quadratic form (1/n)(t' Δ_1 t). As in the case of ordering means, here also we have

    M_[1] < M_[2] < ... < M_[k],    (3.4.2.28)

and the M_[i] at the m-th stage is associated with the population regression coefficient β_[i].

(3) Compute P_m, the probability for Goal (1), as

    P_m = ( Σ_{i=k-t+1}^{k} M_[i] ) / ( Σ_{i=1}^{k} M_[i] ),

and compare it with P*.

(4) If P_m ≥ P*, stop and select the t populations associated with M_[k-t+1], ..., M_[k] as the t populations having the highest regression coefficients. If P_m < P*, repeat the procedure for the (m+1)-th stage; continue this way until the procedure calls for a stop.
3.4.3 Goal (2): The highest regression coefficient:

(i) Probability Distribution Function:

If the interest of the experimenter is only to detect the population with the highest regression coefficient, the probability can be expressed as

    P = Pr[ max {b_(1), ..., b_(k-1)} < b_(k) ].    (3.4.3.1)

We define here

    δ_i = δ_k,i = β_[k] - β_[i],    (i = 1, 2, ..., k-1),    (3.4.3.2)

and

    t_i = (b_(k) - b_(i) - δ*) / s_d.    (3.4.3.3)

The joint distribution of the t_i (i = 1, 2, ..., k-1) is

    f(t) = Γ((n+k-1)/2) / [ Γ(n/2) (nπ)^((k-1)/2) |Δ_2|^(1/2) ] · [1 + (1/n) t' Δ_2^(-1) t]^(-(n+k-1)/2),    (3.4.3.4)

where Δ_2 is the (k-1) × (k-1) matrix {Δ_ij} with

    Δ_ij = 2(k-1)/k for i = j;  -2/k for i ≠ j.    (3.4.3.5)

Then the above probability P can be expressed as

    P = ∫_{-∞}^{∞} [F(y+d)]^(k-1) f(y) dy,    (3.4.3.6)

where

    y = (b_(k) - β_[k]) / s_b

and

    d = δ_ij / s_d.

(ii) Sequential Procedure for Goal (2):

Let us specify

    δ* = δ_k,k-1    (3.4.3.8)

as the minimum difference between β_[k] and β_[k-1] desired to be detected, with the minimum acceptable probability P* (1/k ≤ P* ≤ 1) for the correct decision whenever

    δ_k,k-1 ≥ δ*.

The sequential procedure, then, is as follows:

(1) Take observations from the k populations and compute b_(i) (i = 1, 2, ..., k) at the m-th stage (m = 1, 2, ...).

(2) Compute for each population, at the m-th stage,

    M_[i] = [1 + Q_[k-i+1]]^(-(n+k-1)/2),    (3.4.3.10)

where Q_[k-i+1] = the ordered quadratic form (1/n)(t' Δ_2 t).

(3) Compute P_m, the probability, at the m-th stage, for Goal (2) as

    P_m = M_[k] / ( Σ_{i=1}^{k} M_[i] ),    (3.4.3.11)

and compare it with the value of P*.

(4) If P_m ≥ P*, stop and select the population associated with M_[k] as the one having the highest regression coefficient. If P_m < P*, repeat the procedure for the (m+1)-th stage, and so on, until the stop is decided as above.

The above case of Goal (2) can be considered as a special case of Goal (1) when t = 1.
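Steps (2)-(3) can be sketched end-to-end. The Δ_2 elements are those of (3.4.3.5) (2(k-1)/k on the diagonal, -2/k off it); treating Q as the plain quadratic form (1/n)(t'Δ_2 t), with the t-vector of (3.4.3.3) formed as if each candidate population in turn were the best, is an assumption of this illustration about how the ordered Q-values arise.

```python
def goal2_regression_pm(b, s_d, delta_star, n, p_star):
    """Sketch of the Goal (2) stage decision for regression coefficients.

    b          -- current sample regression coefficients b_1, ..., b_k.
    s_d        -- estimated standard error of a difference of two b's.
    delta_star -- minimum difference delta* to be detected.
    n          -- pooled degrees of freedom.
    """
    k = len(b)
    # Delta_2 of (3.4.3.5): 2(k-1)/k on the diagonal, -2/k off it.
    diag, off = 2.0 * (k - 1) / k, -2.0 / k
    m_vals = []
    for j in range(k):
        # t-vector treating population j as the candidate best -- cf. (3.4.3.3).
        t = [(b[j] - b[i] - delta_star) / s_d for i in range(k) if i != j]
        q = sum(diag * ti * ti for ti in t)
        q += sum(off * t[a] * t[c]
                 for a in range(k - 1) for c in range(k - 1) if a != c)
        q /= n
        # M = [1 + Q] ** (-(n + k - 1) / 2)  -- cf. (3.4.3.10)
        m_vals.append((1.0 + q) ** (-(n + k - 1) / 2.0))
    p_m = max(m_vals) / sum(m_vals)
    return p_m, p_m >= p_star
```

When one coefficient stands well clear of the rest, its M-value dominates the sum and P_m rises quickly toward 1, which is what lets the procedure stop early.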
3.4.4 Goal (3): Ordering the t highest regression coefficients:

(i) Probability Distribution Function:

In this particular case, the probability required is

    P = Pr[ max {b_(1), b_(2), ..., b_(k-t)} < b_(k-t+1) < b_(k-t+2) < ... < b_(k) ].    (3.4.4.1)

Let us define

    δ_1 = δ_(i+1),i = β_[i+1] - β_[i],    i = (k-t+1), ..., (k-1),    (3.4.4.2)

and

    δ_2 = δ_(k-t+1),j = β_[k-t+1] - β_[j],    j = 1, 2, ..., (k-t),    (3.4.4.3)

and assume, for simplicity,

    δ* = δ_1* = δ_2*,    (3.4.4.4)

where δ* is the minimum difference between two regression coefficients desired to be detected with probability of at least P* for the correct decision whenever

    δ_1 ≥ δ_1*    (3.4.4.5)

and

    δ_2 ≥ δ_2*.    (3.4.4.6)

Here again the statistics t_ij, viz.,

    t_ij = (b_(i) - b_(j) - δ*) / s_d,    (i, j = 1, 2, ..., k),

have a joint generalized t-distribution in k_2 dimensions, viz. (3.4.4.8), where Δ_3 is the matrix {Δ_ij} whose elements are given by

    Δ_ij = 2 min(i, j) [1 - max(i, j)/k_2].

The probability for Goal (3) then is given by (3.4.4.10), where

    y_i = (b_(k-t+i) - β_[k-t+i]) / s_b,    (i = 1, 2, ..., t),    (3.4.4.11)

    F(y_i) = ∫_{-∞}^{y_i} f(t) dt,    (3.4.4.12)

and

    d = δ*/s_d.    (3.4.4.13)

(ii) Sequential Procedure for Goal (3):

As specified above, we have δ*, the minimum difference desired to be detected, and P* ((k-t)!/k! ≤ P* ≤ 1), the minimum probability for a correct decision for this Goal, whenever (3.4.4.5) and (3.4.4.6) hold.

The sequential procedure, then, is:

(1) Compute the b_(i)'s at the m-th stage (m = 1, 2, ...).

(2) Compute, at the m-th stage,

    M_[i] = [1 + Q_[k-i+1]]^(-(n+k-1)/2),    (i = 1, 2, ..., k),

where Q_[k-i+1] = the ordered quadratic form (1/n)(t' Δ_3 t).

(3) Calculate P_m, the probability at the m-th stage, as

    P_m = ( Σ_{i=k-t+1}^{k} M_[i] ) / ( Σ_{i=1}^{k} M_[i] ),

and compare it with P*.

(4) If P_m ≥ P*, stop and select the t populations associated with M_[k-t+1], ..., M_[k] as the ones having the t highest regression coefficients in that order. If P_m < P*, repeat the procedure for the next stage until the stop is called for as above.

It will be seen that the above procedure reduces to that for Goal (2) when t = 1.
3.4.5 Goal (4): Complete ranking of the k regression coefficients:

(i) Probability Distribution Function:

If it is desired to rank all k regression coefficients, the probability can be expressed as

    P = Pr[ b_(1) < b_(2) < ... < b_(k) ].

Then we define

    δ_i = δ_(i+1),i = β_[i+1] - β_[i],    i = 1, 2, ..., (k-1),

and

    t_ij = (b_(i) - b_(j) - δ*) / s_d,

where δ* is the minimum difference between two coefficients desired to be detected; the joint distribution of the t_ij's is as in Goal (3). The probability P, then, is

    P = (k-1)! ∫⋯∫ [F(y_1+d)] [F(y_2)-F(y_1)] ⋯ [F(y_k)-F(y_{k-1})],    (3.4.5.3)

where

    y_i = (b_(i+1) - β_[i+1]) / s_b,    (i = 1, 2, ..., k-1),

and F(y_i) and d are as before.

(ii) Sequential Procedure for Goal (4):

We specify

    δ* = δ_(i+1),i = β_[i+1] - β_[i],    (i = 1, 2, ..., k-1),    (3.4.5.5)

and P* (1/k! ≤ P* ≤ 1) as the minimum probability for the Goal whenever

    δ_(i+1),i ≥ δ*.    (3.4.5.6)

The sequential procedure is outlined as follows:

(1) Compute b_(i) at the m-th stage (m = 1, 2, ...) for each population.

(2) Compute for each population, at the m-th stage,

    M_[i] = [1 + Q_[k-i+1]]^(-(n+k-1)/2),    (3.4.5.7)

where Q_[k-i+1] = the ordered quadratic form (1/n)(t' Δ_3 t).

(3) Calculate the probability P_m, at the m-th stage, for Goal (4), as the Goal (3) expression with t = k-1,

    P_m = ( Σ_{i=2}^{k} M_[i] ) / ( Σ_{i=1}^{k} M_[i] ),    (3.4.5.8)

and compare it with P*.

(4) If P_m ≥ P*, stop and select the ranks of the coefficients corresponding to the ranks of the respective M_[i]-values. If P_m < P*, repeat the procedure for the (m+1)-th stage and continue this way until a stop is called for.

It can again be seen that this Goal is the same as Goal (3) when t = k-1.
CHAPTER IV
APPLICATION
4.1
General:
The importance of sequential procedures for multiple decisions was
mentioned in the earlier chapters.
As discussed there, a sequential
procedure, in general, helps to arrive at a decision more quickly than the
comparable fixed-sample procedure; again, the multiple-decision approach provides
a greater breadth of information concerning the populations under study.
These sequential procedures are quite practical in many fields such
as industry and engineering where the requirement of uniformity of
population from observation to observation is easily met.
They are
limited to some extent in fields such as agriculture and medicine
where the time lapse or change of individual (unless assumed random)
from observation to observation violates the assumption of the
sequential procedure.
This situation can be overcome in many instances
by proper selection of samples and individual subjects, so that
uniformity is attained from observation to observation.
For example,
suppose that in an agricultural experiment, certain fruit varieties
are to be ranked according to their nutrient content. The job
requires the use of a chemical laboratory for determining the nutrient
content, where perhaps only a few analyses can be done during the day.
Fruits of the same maturity can be brought from different varieties
for analysis of the nutrient content and the decision probabilities
computed for every stage of the sampling process.
If the decision
stage is reached, no more fruits need be picked for analysis.
This
process eliminates waiting until the complete harvest is over which would
be required by the fixed sample procedure.
Again, lower costs of analysis
and storage of fruits are realized if there are many fruits picked
which could not be analyzed in the short time period available.
More
work is thereby accomplished in less time.
The computational procedure for the decision probabilities is a little
complicated, but the systematic tabulation of calculations may enable a
statistical clerk to do the routine work. The computations are simplified
to a great extent if the number of observations from each population is
the same.
In this chapter, the procedure to compute the decision probabilities
is briefly outlined for ordering means, variances, and regression coefficients.
The data used in the given illustrations for means and variances
are taken from the experimental work done on sugar-cane moth-borer
infestation by Bangdiwala and Martorell (1953) at the University of
Puerto Rico Agricultural Experiment Station.
The data used for ranking
the regression coefficients are from the work published by Martorell
and Bangdiwala (1954) on the reduction in sucrose content in sugar-cane
due to moth-borer infestation.
The purpose of using these data is
only to illustrate the computational procedure for obtaining the decision
probabilities. In actual application, the probability level of confidence
(P*) would be predetermined, and samples would be taken until the stage
when the decision probability computed is greater than or equal to P*.
In this chapter, however, only a few stages are considered to illustrate
the computation.
4.2 Ordering Means:
Let us suppose that we have 5 varieties (k=5) which are to be ordered
as follows according to the mean infestation:
(i) the worst variety;
(ii) the two worst (unordered) varieties;
(iii) the two worst (ordered), that is, the worst and the next-worst varieties;
(iv) the complete ranking of all 5 varieties.
(The worst population is defined as the one having the highest infestation.)
In table 4.2.1 are given the means of infestation for five sampling
stages (m = 1, 2, 3, 4, 5). At every stage the cumulative total of the number
of observations up to that stage is given, along with the corresponding
new mean for each variety and its rank at that stage. In order to compute
the decision probability for each stage, the t_ij-values are obtained
first. As an illustration, the t_ij-values for m=1 and δ*=1 are given in
table 4.2.2. Similar values can be obtained for the other cases.
From these values, the quadratic forms, that is, the Q-values,
and the corresponding M-values are computed, to obtain the final P_m,
the decision probability at the m-th stage.
For example, the M-values for choosing the worst variety are
computed as given by the formula (3.2.3.18), with the Δ-matrix of
order (4x4) as defined in (3.2.3.9). We note that in this case

    Δ_ii = 8/5    and    Δ_ij = -2/5    (i ≠ j).
Table 4.2.1

Means of five varieties (X̄) and their ranks

  Sampling   Sample              Varieties
  stage m    size N      A          B          C          D          E
     1         12     11.83 (1)  7.58 (5)   7.91 (4)   7.92 (3)  10.24 (2)
     2         24     16.74 (1)  7.16 (5)  12.45 (2)   9.78 (3)   8.80 (4)
     3         36     12.34 (2)  7.13 (5)   8.84 (4)  11.55 (3)  13.56 (1)
     4         48     11.24 (2)  6.13 (5)  10.20 (3)   8.72 (4)  12.39 (1)
     5         60     10.99 (2)  6.85 (5)   9.77 (3)   8.92 (4)  12.01 (1)

Table 4.2.2

t_ij-values for means (m=1, δ*=1)

           j=1        j=2        j=3        j=4        j=5
  i=1               0.6995     0.6285     0.6263     0.1270
  i=2   -1.1299               -0.2862    -0.2884    -0.7877
  i=3   -1.0589    -0.1442               -0.2174    -0.7167
  i=4   -1.0568    -0.1420    -0.2131               -0.7145
  i=5   -0.5574     0.3573     0.2862     0.2843
These M-values for the 5 stages are given in table 4.2.3 and the
corresponding decision probabilities in table 4.2.4.

Similarly, the M-values for selecting the two worst (unordered)
varieties (δ*=1) are computed following the formula (3.2.2.25), where
the Δ-matrix is defined similarly as above in (3.2.2.12). The
decision probabilities for this case are given in Column (2) of
table 4.2.4.
For selecting the two worst (ordered) varieties and for the complete
ranking of all 5 varieties, the Δ-matrix in the computations of the M-values
changes for the different varieties at each stage, as indicated in the
formula (3.2.4.11). The M-values for these two cases are to be
computed as per the formulae (3.2.4.21) and (3.2.5.14) respectively.
The corresponding decision probabilities for these cases then work out
as shown in the last two columns of table 4.2.4, where again, for
illustrative purposes, the value of δ* is taken as δ*=1.
Aside from the necessary considerations for proper interpretation of
the above results, we can draw the following conclusions on the basis
of the decision probabilities in table 4.2.4.

1. If our job is to select only the worst variety of the 5 under
consideration, and we desire to detect it if the true infestation is at
least 1 percent greater than the next worst (δ*=1), then the fifth-stage
sampling detects variety E to be worst with a decision probability
of 0.864, as indicated in column (1) of table 4.2.4.

2. If in delineating the two worst (unordered) varieties from
among the 5, we set a 1 percent difference between the minimum of the
two worst and the maximum of the remaining three (δ*=1), then
Table 4.2.3

M-values for the worst variety (means)

  m      M[1]        M[2]        M[3]        M[4]        M[5]
  1    .42032      .43366      .43484      .54447      .63524
  2    .0035222    .0046389    .0054677    .0085815    .0178600
  3    .001448     .007395     .033846     .041251     .155733
  4    .0021457    .0058580    .0104470    .0136330    .1174580
  5    .0019486    .0049530    .0084157    .0093959    .157470

Table 4.2.4

Decision Probabilities (means)

  m    The worst    Two worst      Two worst    Complete ranking
       variety      (unordered)    (ordered)    of all varieties
                    varieties      varieties
         (1)           (2)            (3)             (4)
  1     .257          .209           .173            .103
  2     .446          .336           .272            .142
  3     .650          .475           .375            .229
  4     .785          .679           .514            .322
  5     .864          .724           .666            .571
Column (2) of table 4.2.4 shows that at the 5th stage of sampling the
varieties A and E are worst, and if the sampling were stopped at this
stage, the probability that this decision is correct is P_m = .724.

3. To detect the two worst varieties, but in proper order (with
δ*=1), the table shows that at the 5th stage the decision probability is
only P_m = .666. This is a more difficult task than the second one above.

4. The job of complete ranking is the most difficult of all, as indicated
in the last column of the table, where 5 stages are necessary to obtain
a probability of only .571.
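The chain of computations in this section (means, then t_ij-values, then Q-values, then M-values, then P_m) can be sketched for "the worst variety". The degrees of freedom n = 55 and the standard error s_d ≈ 4.647 used below are inferred from the tabled t_ij-values rather than stated in the text, so treat them as assumptions of the illustration.

```python
def worst_variety_pm(means, s_d, delta_star, n):
    """Stage decision probability for 'the worst variety' (column (1) of
    Table 4.2.4): for each candidate j, form the t-vector against the other
    varieties, the quadratic form Q with the Delta-matrix of (3.2.3.9)
    (8/5 on the diagonal, -2/5 off it, for k = 5), and
    M = (1 + Q) ** (-(n + k - 1) / 2); then P_m = max M / sum M."""
    k = len(means)
    diag, off = 2.0 * (k - 1) / k, -2.0 / k      # 8/5 and -2/5 for k = 5
    m_vals = []
    for j in range(k):
        t = [(means[j] - means[i] - delta_star) / s_d
             for i in range(k) if i != j]
        ssq = sum(ti * ti for ti in t)
        cross = sum(t) ** 2 - ssq
        q = (diag * ssq + off * cross) / n
        m_vals.append((1.0 + q) ** (-(n + k - 1) / 2.0))
    return max(m_vals) / sum(m_vals)
```

With the stage-1 means of Table 4.2.1 (11.83, 7.58, 7.91, 7.92, 10.24) this gives P_1 ≈ 0.257, the first entry of column (1) of Table 4.2.4.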
4.3 Ordering variances:

Let us define our goals in the same way as for the means, except that here we are interested in detecting the best populations. The best population is defined as the one with the minimum variance.

In table 4.3.1 are given the variances of five different varieties and their ranks at different stages, and in table 4.3.2 are given the u_ij-values (m = 1, θ* = 2, for illustration) as defined in relation (3.3.2.10). From such u_ij-values, the L_j-values are worked out to obtain the decision probabilities. As an example, to find only the best variety of the 5, the L_j-values work out as in table 4.3.3. The values in that table have been multiplied by a constant number for simplicity in presentation. The corresponding decision probabilities are given in column (1) of table 4.3.4. In columns (2), (3), and (4) of that table are given the decision probabilities for detecting the two best (unordered) varieties, the two best (ordered) varieties, and the complete ranking, respectively, for θ* = 2 in all cases.
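The ranking step underlying table 4.3.1 is simply an ordering of the sample variances at each stage. A minimal sketch (Python; the stage-1 values and variety labels follow table 4.3.1 as recovered here, so the labels should be read as illustrative):

```python
# Stage-1 sample variances (table 4.3.1); rank 1 = smallest = best
s2 = {"A": 279.55, "E": 118.62, "D": 263.82, "C": 45.87, "B": 56.88}

ranked = sorted(s2, key=s2.get)               # varieties from best to worst
ranks = {v: r for r, v in enumerate(ranked, start=1)}
print(ranks)  # {'C': 1, 'B': 2, 'E': 3, 'D': 4, 'A': 5}
```

These ranks agree with the parenthetical rank figures printed in the first row of table 4.3.1.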
Table 4.3.1
Variances for five varieties (s²) and their ranks

Sampling  Degrees of              Varieties
stage     freedom
 (m)        (n)        A           E           D           C           B
  1         11      279.55(5)   118.62(3)   263.82(4)    45.87(1)    56.88(2)
  2         23      336.43(5)   109.76(3)   229.33(4)    74.84(2)    48.73(1)
  3         35      263.77(5)   112.77(3)   211.51(4)    67.49(1)   107.60(2)
  4         47      203.73(4)    88.21(2)   216.55(5)    63.15(1)    93.58(3)
Table 4.3.2
u_ij-values for variances (m=1, θ*=2)

         j=1       j=2       j=3       j=4       j=5
i=1       --      .4032     .1934     .0870     .0821
i=2     .6200      --       .2398     .1078     .1018
i=3    1.2930    1.0425      --       .2248     .2122
i=4    2.8755    2.3190    1.1120      --       .4719
i=5    3.0470    2.4575    1.1785     .5300      --
Table 4.3.3
L_j (proportional) values (variances)

m     L(1)         L(2)        L(3)       L(4)      L(5)
1     6.8170       6.6857      5.8443     1.0367    .61410
2     8933.1       8696.4      7407.5     258.55    9.1048
3     1137800.     189480.     141860.    4.8878    1.2413
4     12387000.    1235300.    730320.    290.99    4.2652
Table 4.3.4
Decision Probabilities (variances)

      The best   Two best      Two best    Complete ranking
      variety    (unordered)   (ordered)   of all varieties
m       (1)         (2)          (3)             (4)
1      .337        .283         .232            .175
2      .353        .321         .287            .214
3      .774        .680         .607            .465
4      .864        .754         .716            .578
As in the computations on the means, inspection of table 4.3.4 of decision probabilities shows that more stages are required as the task of ordering becomes more difficult.
4.4 Ordering regression coefficients:

The best population in this case is, in general, defined as the one having the largest regression coefficient. For illustration, the data in table 4.4.1 are the regression coefficients giving the percent reduction in sucrose content in sugar-cane for a unit increase in infestation. The worst population is therefore the one having the largest reduction. As is obvious, the procedure for computing the decision probabilities here is similar to the corresponding cases for the means. For illustration, table 4.4.2 gives the t_ij-values for m = 1, δ* = .02. The M-values for the case of selecting the worst variety are given in table 4.4.3. In table 4.4.4 are given the decision probabilities for the four goals, defined similarly as in the case of the means. Here again, it is observed that more stages are necessary for the more difficult orderings of the populations.
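As with the means, the entries in column (1) of table 4.4.4 agree with normalizing the largest M-value of each row of table 4.4.3 by the row sum. The check below (Python; an editorial illustration using the tabled M-values for stages m = 2 through 5) reproduces the corresponding column (1) entries:

```python
# M-values by stage (table 4.4.3); rows are stages m = 2..5
M_rows = [
    [0.9845, 6.8371, 79.367, 111.94, 164.83],     # m = 2
    [0.61632, 5.7675, 11.664, 106.37, 141.10],    # m = 3
    [0.32134, 13.017, 938.76, 1383.5, 4119.5],    # m = 4
    [0.99476, 118.80, 9944.7, 12603.0, 46853.0],  # m = 5
]

# Largest M divided by the row total gives the decision probability
probs = [round(max(row) / sum(row), 3) for row in M_rows]
print(probs)  # [0.453, 0.531, 0.638, 0.674] -- column (1) of table 4.4.4
```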
Table 4.4.1
Regression coefficients for five varieties and their ranks

Sampling  Sample              Varieties (-b)
stage     size
 (m)       (N)       C            E             B            A            D
  1        12     .08892(4)    .02149(2)     .01254(1)    .03175(3)    .1293(5)
  2        24     .08307(4)    .009989(1)    .02221(2)    .02943(3)    .1320(5)
  3        36     .07184(5)    .03594(3)     .02783(2)    .02717(1)    .05490(4)
  4        48     .08809(5)    .02623(2)     .02486(1)    .02769(3)    .05959(4)
  5        60     .08650(5)    .01717(1)     .02791(2)    .02933(3)    .05669(4)
Table 4.4.2
t_ij-values for regression coefficients (m=1, δ*=.02)

          j=1        j=2        j=3        j=4        j=5
i=1        --      -1.4134    -2.7516    -3.2013    -3.2610
i=2      .4771       --       -1.8064    -2.2561    -2.3158
i=3     1.8247     .8361        --       -0.9178    -0.9778
i=4     2.2650    1.3198    -0.0185        --       -0.5279
i=5     2.3247    1.3794     0.0412    -0.4085        --
Table 4.4.3
M-values for the worst variety (regression coefficients)

m     M[1]       M[2]      M[3]      M[4]      M[5]
1    .062117    .12977    .38710    .5412     .58249
2    .9845      6.8371    79.367    111.94    164.83
3    .61632     5.7675    11.664    106.37    141.10
4    .32134     13.017    938.76    1383.5    4119.5
5    .99476     118.80    9944.7    12603.    46853.
Table 4.4.4
Decision Probabilities (regression coefficients)

      The worst   Two worst     Two worst   Complete ranking
      variety     (unordered)   (ordered)   of all varieties
m       (1)         (2)           (3)             (4)
1      .340        .316          .268            .207
2      .453        .428          .369            .283
3      .531        .498          .425            .341
4      .638        .586          .501            .407
5      .674        .618          .542            .457
CHAPTER V
SUMMARY

mean.
(2) For variances, the one having the smallest variance, and
(3) For regression coefficients, the one having the largest positive coefficient of regression.

Illustrations are given in Chapter IV of the computational procedures for the goals considered. The data are used to illustrate the technique of computing the decision probabilities. The process of taking more observations stops as soon as the desired level of probability is reached.
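The stopping rule just described can be sketched as a scan over the tabled decision probabilities (Python; an editorial illustration — the probabilities are those of column (1) of table 4.2.4, and the requirement P* = .75 is an arbitrary illustrative choice):

```python
def first_stage_reaching(probs, p_star):
    """1-based index of the first stage whose decision probability
    meets the requirement p_star, or None if no stage does."""
    for m, p in enumerate(probs, start=1):
        if p >= p_star:
            return m
    return None

# Column (1) of table 4.2.4: P(correctly naming the worst variety), m = 1..5
worst_variety = [0.257, 0.446, 0.650, 0.785, 0.864]
print(first_stage_reaching(worst_variety, 0.75))  # 4 -- sampling stops after stage 4
```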
5.2 Considerations in applying sequential procedures:

As mentioned earlier, the process should prove very useful when the experimenter desires to get more information than that given by the ordinary analysis of variance test. Again, the sequential procedure is likely to prove more economical and efficient in that the results can be obtained at early stages. The average sample size needed in sequential procedures is generally less than the fixed sample size required for the same precision.

In order to apply the procedures, however, certain assumptions are made about the population. One of these is that of normality. It is known that this assumption is not always satisfied, and hence use of transformations may be made in order to apply the proposed procedures.

It is advisable that the experimenter obtain, if possible, an estimate of the sample size needed in ordinary fixed-sample cases for the level of confidence desired and for the required precision in measuring differences between means. Generally the sample needed for sequential procedures will be less than that number. This provides an additional bit of information upon which a decision to sequentialize, or to revise the requirements for precision, may be based.
5.3 Suggestions for further research:

In this dissertation only a few special cases of ordering populations by sequential procedures are given. There is still need for extending the idea further, for considering other special cases and for working out more general cases of, say, dividing k populations into s groups, with s1 of these groups having treatments ordered within the groups.
The procedures can also be worked out for other characteristics besides the three considered here. Again, the definition of the "best-ness" of the populations can also be changed for certain characteristics. For example, in the case of regression coefficients, the best population can be defined as the one having the regression coefficient giving the largest regression sum of squares.

In cases where the particular characteristic of the population has a very complicated probability distribution, some empirical sampling procedures can be devised with the help of high-speed computers.

A critical study of the validity of the assumptions underlying the proposed procedures is also of importance. No attempt has been made in this dissertation to justify the "goodness" of the proposed procedures. A critical study of the criteria by which such "goodness" should be judged is also a worthy undertaking.
BIBLIOGRAPHY

Armitage, P. 1954. Sequential tests in prophylactic and therapeutic trials. Quart. Journ. Med. 23:255-274.

Bangdiwala, I. S. and Martorell, L. F. 1953. Correlation between stalks and joint infestation. Proc. of International Sugar Congress, Barbados.

Barnard, G. 1946. Sequential tests in industrial statistics. Supp. Journ. Roy. Stat. Soc. 8:1-28.

Bechhofer, R. E. 1954. A single sample multiple decision procedure for ranking means of normal populations with known variances. Ann. Math. Stat. 25:16-39.

Bechhofer, R. E. 1955. Multiple-decision procedures for ranking means. Proc. Ninth Annual Convention of the Amer. Soc. for Quality Control, 513-519.

Bechhofer, R. E., Dunnett, C. W., and Sobel, M. 1954. A two sample multiple decision procedure for ranking means of normal populations with a common unknown variance. Biometrika 41:170-176.

Bechhofer, R. E., Elmaghraby, S. E., and Morse, N. (In preparation.) A single sample multiple decision procedure for selecting the multinomial event with the largest probability.

Bechhofer, R. E. and Sobel, M. 1953. A sequential multiple decision procedure for ranking means of normal populations with known variances. (Preliminary report.) Abstract. Ann. Math. Stat. 24:136-137.

Bechhofer, R. E. and Sobel, M. 1954. A single sample multiple decision procedure for ranking variances of normal populations. Ann. Math. Stat. 25:273-289.

Bechhofer, R. E. and Sobel, M. 1956. A sequential multiple decision procedure for selecting the population with the largest mean from k normal populations with a common unknown variance. (Preliminary report.) Abstract. Ann. Math. Stat. 27:218-219.

Bechhofer, R. E. and Sobel, M. 1956a. A scale invariant sequential multiple decision procedure for selecting the population with the smallest variance from k normal populations. (Preliminary report.) Abstract. Ann. Math. Stat. 27:219.

Bechhofer, R. E. and Sobel, M. 1956b. A sequential multiple decision procedure for selecting the multinomial event with the largest probability. (Preliminary report.) Abstract. Ann. Math. Stat. 27:861.

Bose, R. C. and Gupta, S. S. 1956. Moments of order statistics from a normal population. Institute of Statistics Mimeograph Series No. 154, University of North Carolina, Chapel Hill, N. C.

Box, G. E. P. 1956. Evolutionary operations. Mimeograph distributed in Symposium on Design of Industrial Experiments, Institute of Statistics, UNC, Raleigh, N. C.

Bross, I. D. J. 1952. Sequential medical plans. Biometrics 8:188-205.

Cochran, W. G. and Cox, G. M. 1957. Experimental Designs. John Wiley and Sons, New York, N. Y.

Duncan, D. B. 1955. Multiple range and multiple F tests. Biometrics 11:1-42.

Dunnett, C. W. and Sobel, M. 1954. A bivariate generalization of Student's t distribution, with tables for certain special cases. Biometrika 41:153-169.

Gupta, S. S. 1956. On a decision rule for a problem in ranking means. Institute of Statistics Mimeograph Series No. 150, UNC.

Hall, William Jackson. 1954. Most economical multiple decision rules. Institute of Statistics Mimeograph Series No. 115, UNC.

Hall, William Jackson. 1954a. An optimum property of Bechhofer's single sample multiple decision procedure for ranking means, and some extensions. Institute of Statistics Mimeograph Series No. 118, UNC.

Hotelling, H. 1940. The selection of variates for use in prediction with some comments on the general problem of nuisance parameters. Ann. Math. Stat. 11:271-283.

Huyett, M. J. and Sobel, M. 1957. Selecting the best one of several binomial populations. The Bell System Tech. Journ. 36:537-576.

Martorell, L. F. and Bangdiwala, I. S. 1954. Sucrose content of sugar cane as affected by moth-borer. Journ. Agr. of Univ. of Puerto Rico, 22-37.

Monroe, R. J. and Mason, D. D. 1955. Problems of experimental inference with special reference to multiple location experiments and experiments with perennial crops. Amer. Soc. for Horticultural Science, 410-414.

Mosteller, F. 1946. On some useful inefficient statistics. Ann. Math. Stat. 17:377-408.

Mosteller, F. 1948. A k-sample slippage test for an extreme population. Ann. Math. Stat. 19:58-65.

Paulson, E. 1948. A multiple decision procedure for certain problems in the analysis of variance. Ann. Math. Stat. 20:95-98.

Paulson, E. 1952. On the comparison of several experimental categories with a control. Ann. Math. Stat. 23:239-246.

Paulson, E. 1952a. An optimum solution to the k-sample slippage problem for the normal distribution. Ann. Math. Stat. 23:610-616.

Rushton, S. 1952. On a two-sided sequential test. Biometrika 39:302-308.

Stein, C. 1945. A two-sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Stat. 16:243-258.

Wald, A. 1947. Sequential Analysis. John Wiley and Sons, New York, N. Y.

Wald, A. 1950. Statistical Decision Functions. John Wiley and Sons, New York, N. Y.

Williams, E. J. 1958. The comparison of regression variables. Institute of Statistics Mimeograph Series No. 195, UNC.