UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.
ILLUSTRATION OF A TEST WHICH COMPARES TWO PARALLEL REGRESSION
LINES WHEN THE VARIANCES ARE UNEQUAL
by
Richard F. Potthoff
April 1962
Contract No. AF 49(638)-213
In a situation where two regression lines are known to be
parallel, it may be desired to test the hypothesis that
the two lines are identical without assuming that the
variances of the two sets of error terms are equal. This
paper presents a relatively non-technical discussion of a
test which can be used for this problem. The test statistic
is analogous to the well-known Wilcoxon statistic. Obtaining
the test statistic involves a large number of routine
calculations, and (in general) a computer is needed for this.
This paper is intended for the practitioner rather than for
the theoretician; the more technical aspects of the test are
covered in a separate paper.
This research was partially supported by the Air Force Office of Scientific
Research and partially supported by Educational Testing Service.
Institute of Statistics
Mimeo Series No. 323
1. Introduction. It is sometimes desired to make certain comparisons of two regression lines when it is assumed that the two sets of error terms are normally distributed but with (possibly) unequal variances. An earlier paper [1] presented a method of testing the hypothesis that the two regression lines are parallel under this condition of unequal variances. In the present paper it is assumed that the two regression lines are parallel (with the two variances being unequal), and a method is presented for testing the hypothesis that the two regression lines are identical. The test statistic used in this paper, like the test statistic presented in [1], is analogous to the Wilcoxon statistic.
We suppose that we have $m$ pairs $(Y_1, X_1), (Y_2, X_2), \dots, (Y_m, X_m)$ such that, for each $i$, $Y_i$ satisfies the relation

$$Y_i = \alpha_y + \beta X_i + e_i, \tag{1.1}$$

where $\alpha_y$ and $\beta$ are unknown parameters (regression coefficients), the $X_i$'s are specified constants, and the $e_i$'s are normal and independent with mean 0 and unknown variance $\sigma_e^2$.
We suppose also that we have $n$ pairs $(Z_1, W_1), (Z_2, W_2), \dots, (Z_n, W_n)$ such that, for each $j$, $Z_j$ satisfies the relation

$$Z_j = \alpha_z + \beta W_j + f_j, \tag{1.2}$$

where $\alpha_z$ is an unknown parameter, $\beta$ is the same as in (1.1) (this is where the assumption that the two regression lines are parallel enters in), the $W_j$'s are specified constants, and the $f_j$'s are normal and independent with mean 0 and unknown variance $\sigma_f^2$.
We will present a test of the hypothesis that the two
regression lines associated with (1.1) and (1.2) are identical:
that is, we will
test the hypothesis that $\alpha_y = \alpha_z$. We of course must have a test which will be valid regardless of what the values of $\sigma_e^2$ and $\sigma_f^2$ are.
This paper will give a numerical example, which will illustrate not only how the test of the hypothesis $\alpha_y = \alpha_z$ works, but also how to obtain confidence bounds on $(\alpha_z - \alpha_y)$. Except when $m$ and $n$ are very small, the computations required to obtain our test statistic, although quite routine, will be prohibitively lengthy to perform on a desk calculator; consequently, the numerical example given in this paper is for $m = 5$ and $n = 5$. In general, it appears that it will be necessary to use a high-speed computer in order to calculate the test statistic we are presenting here. Section 5 briefly discusses the problem of performing the lengthy calculations on a computer, and suggests some possible techniques for circumventing some of these calculations.
An experiment comparing two curriculums provides an illustration of a practical situation in which the problem treated by this paper could arise. This situation is discussed in some detail in [1]. We have $m$ classes which receive curriculum $C_1$ and $n$ classes which receive curriculum $C_2$. $Y_i$ and $X_i$ represent respectively the achievement measure (obtained after the course) and the ability measure (obtained before the course) for the $i$-th class receiving curriculum $C_1$. Similarly, $Z_j$ and $W_j$ represent respectively the achievement measure and the ability measure for the $j$-th class receiving curriculum $C_2$. The problem discussed in [1] was that of testing the hypothesis that the regression lines associated with the two curriculums are parallel (when the variances may be unequal). In the present paper, however, it is assumed that the two regression lines are parallel (1.1-1.2), and the problem is to test whether they are identical. (If they are identical, then there is no difference in the effects of the two curriculums.)
In this paper the discussion will be relatively non-technical. A separate paper covers the technical aspects of the topic of the present paper.
2. The test. This section presents the formula for the test statistic. There is a certain similarity to the test statistic of [1], but the present statistic is much more difficult to compute.
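For every quadruple $(i, I, j, J)$ such that

$$X_i < X_I \quad\text{and}\quad W_j < W_J \tag{2.1}$$

and

$$X_i \le W_J \quad\text{and}\quad W_j \le X_I, \tag{2.2}$$

let us define

$$V_{iIjJ} = \frac{(X_I - W_j)(Z_J - Y_i) - (W_J - X_i)(Y_I - Z_j)}{X_I + W_J - X_i - W_j}. \tag{2.3}$$

Each $V_{iIjJ}$ will be normally distributed with expectation $(\alpha_z - \alpha_y)$, and hence will have median $(\alpha_z - \alpha_y)$. [Substituting (1.1) and (1.2), the $\beta$ terms cancel and the numerator of (2.3) has expectation $(\alpha_z - \alpha_y)(X_I + W_J - X_i - W_j)$; the denominator is positive by (2.1).] Thus we would expect (on the average) half of the $V_{iIjJ}$'s to be positive and half negative if and only if the null hypothesis $\alpha_y = \alpha_z$ is true.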
Let $S$ be the number of $V_{iIjJ}$'s which are positive, and let $w$ be the proportion of them which are positive. Then

$$w = \frac{S}{T}, \tag{2.4}$$

where $T$ is the total number of quadruples $(i, I, j, J)$ which satisfy (2.1-2.2).
If the null hypothesis $\alpha_y = \alpha_z$ is true, then $w$ will have expected value $\frac12$ and will be approximately normally distributed. The variance of $w$ when the hypothesis $\alpha_y = \alpha_z$ is true will depend on $\sigma_e^2$ and $\sigma_f^2$, but a number $Q$ can be obtained such that

$$\operatorname{var}(w) \le Q$$

no matter what the values of $\sigma_e^2$ and $\sigma_f^2$ are. This number $Q$, which is the least upper bound of $\operatorname{var}(w)$, is calculated by a formula to be given below.
The test of the hypothesis $\alpha_y = \alpha_z$ is as follows: if we wish (e.g.) to use a two-tailed test at the 5% level, we reject if

$$\frac{|w - \frac12|}{\sqrt{Q}} > 1.96, \tag{2.5}$$

and accept otherwise. This test (2.5) will be conservative in the sense that the probability of rejecting the null hypothesis when it is true will generally be somewhat less than 5% rather than exactly equal to 5%; the reason for this is that the actual value of $\operatorname{var}(w)$ (which is unknown) will generally be less than $Q$.
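The rule (2.5) is easy to state in code. The following is a minimal Python sketch, not taken from the paper: the function names are illustrative, and the critical value is generalised from 1.96 to $z_{1-\alpha/2}$ for an arbitrary two-tailed level $\alpha$ using the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Two-tailed standard-normal critical value z_{1 - alpha/2}."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_identical(w: float, Q: float, alpha: float = 0.05) -> bool:
    """Decision rule (2.5), generalised to a two-tailed level alpha.

    w is the proportion of positive V_iIjJ's and Q the least upper bound
    of var(w). Because var(w) <= Q, the true size is generally below alpha.
    """
    z = (w - 0.5) / sqrt(Q)
    return abs(z) > critical_value(alpha)


# alpha = .05 gives the 1.96 of (2.5); alpha = .10 gives 1.645.
print(round(critical_value(0.05), 2), round(critical_value(0.10), 3))
```

For instance, `reject_identical(0.225, 0.069764, alpha=0.10)` returns `False`, which is the conclusion reached for the example of Section 3.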
Although $w$ (2.4) is relatively simple to calculate, the computation of $Q$ is very lengthy. The formula for $Q$ is

$$Q = \max(Q_1, Q_2) \tag{2.6}$$

(i.e., $Q$ is the larger of the two quantities $Q_1$ and $Q_2$), where

$$Q_1 = \frac{1}{4T} + \frac{1}{\pi T^2} \sum_{iIjJ > i'I'j'J'} \sin^{-1} \rho^{(1)}_{iIjJ,\,i'I'j'J'} \tag{2.7}$$

and

$$Q_2 = \frac{1}{4T} + \frac{1}{\pi T^2} \sum_{iIjJ > i'I'j'J'} \sin^{-1} \rho^{(2)}_{iIjJ,\,i'I'j'J'}, \tag{2.8}$$
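the $\rho$'s being defined by

$$\rho^{(1)}_{iIjJ,\,i'I'j'J'} = \frac{\delta_{ii'}(X_I - W_j)(X_{I'} - W_{j'}) + \delta_{iI'}(X_I - W_j)(W_{J'} - X_{i'}) + \delta_{Ii'}(W_J - X_i)(X_{I'} - W_{j'}) + \delta_{II'}(W_J - X_i)(W_{J'} - X_{i'})}{\sqrt{\left[(X_I - W_j)^2 + (W_J - X_i)^2\right]\left[(X_{I'} - W_{j'})^2 + (W_{J'} - X_{i'})^2\right]}} \tag{2.9}$$

and

$$\rho^{(2)}_{iIjJ,\,i'I'j'J'} = \frac{\delta_{JJ'}(X_I - W_j)(X_{I'} - W_{j'}) + \delta_{Jj'}(X_I - W_j)(W_{J'} - X_{i'}) + \delta_{jJ'}(W_J - X_i)(X_{I'} - W_{j'}) + \delta_{jj'}(W_J - X_i)(W_{J'} - X_{i'})}{\sqrt{\left[(X_I - W_j)^2 + (W_J - X_i)^2\right]\left[(X_{I'} - W_{j'})^2 + (W_{J'} - X_{i'})^2\right]}}. \tag{2.10}$$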
The $\delta$ symbols in (2.9) and (2.10) are Kronecker deltas; for example, $\delta_{ii'}$ is equal to 0 if $i \ne i'$ and is equal to 1 if $i = i'$.
The summations in (2.7) and (2.8) extend over all pairs of quadruples $(i, I, j, J)$ and $(i', I', j', J')$ such that both quadruples satisfy (2.1-2.2) and such that

$$(i, I, j, J) > (i', I', j', J'). \tag{2.11}$$

[We assume that the $T$ quadruples satisfying (2.1-2.2) are arranged arbitrarily in some kind of order; it is in this sense that the relation (2.11) is to be interpreted.]
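To make the bookkeeping of (2.1)-(2.11) concrete, here is a hedged Python sketch that enumerates the quadruples, evaluates $\rho^{(1)}$ and $\rho^{(2)}$ from (2.9) and (2.10), and sums $\sin^{-1}\rho$ over unordered pairs as in (2.7) and (2.8). It is a plain $O(T^2)$ loop written for clarity, not the author's program; subscripts are kept 1-based to match the text.

```python
from itertools import combinations
from math import asin, pi, sqrt


def quadruples(X, W):
    """Quadruples (i, I, j, J), 1-based as in the text, satisfying (2.1) and (2.2)."""
    m, n = len(X), len(W)
    return [
        (i, I, j, J)
        for i in range(1, m + 1)
        for I in range(1, m + 1)
        for j in range(1, n + 1)
        for J in range(1, n + 1)
        if X[i - 1] < X[I - 1] and W[j - 1] < W[J - 1]       # (2.1)
        and X[i - 1] <= W[J - 1] and W[j - 1] <= X[I - 1]    # (2.2)
    ]


def coeffs(q, X, W):
    """a = X_I - W_j and b = W_J - X_i; both are >= 0 under (2.2)."""
    i, I, j, J = q
    return X[I - 1] - W[j - 1], W[J - 1] - X[i - 1]


def rho(q, r, X, W, kind):
    """rho^(1) of (2.9) for kind == 1, rho^(2) of (2.10) for kind == 2."""
    (i, I, j, J), (i2, I2, j2, J2) = q, r
    a, b = coeffs(q, X, W)
    a2, b2 = coeffs(r, X, W)
    if kind == 1:   # Kronecker deltas on the X subscripts
        d = (i == i2, i == I2, I == i2, I == I2)
    else:           # Kronecker deltas on the W subscripts
        d = (J == J2, J == j2, j == J2, j == j2)
    num = d[0] * a * a2 + d[1] * a * b2 + d[2] * b * a2 + d[3] * b * b2
    # min() guards against a rounding overshoot above 1 before asin().
    return min(1.0, num / sqrt((a * a + b * b) * (a2 * a2 + b2 * b2)))


def q_bound(X, W):
    """Return T, Q1, Q2 and Q = max(Q1, Q2) from (2.6)-(2.8)."""
    quads = quadruples(X, W)
    T = len(quads)
    s1 = s2 = 0.0
    for q, r in combinations(quads, 2):   # each unordered pair once, cf. (2.11)
        s1 += asin(rho(q, r, X, W, 1))
        s2 += asin(rho(q, r, X, W, 2))
    Q1 = 1 / (4 * T) + s1 / (pi * T * T)
    Q2 = 1 / (4 * T) + s2 / (pi * T * T)
    return T, Q1, Q2, max(Q1, Q2)
```

Applied to the $X$'s and $W$'s of Section 3, its output can be checked against the values of $Q_1$ and $Q_2$ obtained there. Only the $X$'s and $W$'s are needed, in line with the remark that $Q$ does not depend on the $Y_i$'s or $Z_j$'s.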
Note that the formula for $Q$ does not depend on the $Y_i$'s or the $Z_j$'s. In other words, $Q$ can generally be calculated before the experiment is completed, if desired.
The reason for assuming normality of the $e_i$'s and $f_j$'s in using the test (2.5) is that this normality assumption was used in proving that $Q$ is the least upper bound of $\operatorname{var}(w)$. It is not known how the test (2.5) would be affected by non-normality, but it is possible that there might be certain non-normal symmetrical distributions of the $e_i$'s and $f_j$'s for which the test would still be valid.
Potentially, there are $\frac12 m(m-1) \times \frac12 n(n-1)$ quadruples $(i, I, j, J)$ which will satisfy (2.1). However, fewer quadruples than this will satisfy (2.1) if any of the $X_i$'s are tied or if any of the $W_j$'s are tied. If desired, it would be permissible to assign (by using random numbers) an arbitrary ranking to any set of tied $X_i$'s or tied $W_j$'s and thereby bring the number of quadruples satisfying (2.1) up to its full potential number. [However, this will not be done in the numerical illustration to be given in Section 3 (in which there are tied $X_i$'s and tied $W_j$'s), because we are attempting to hold down the length of this illustration in order for it to be readily understandable.]
The question might be raised as to why the statistic $w$ is not based on all quadruples which satisfy (2.1), rather than just on those which satisfy (2.2) as well as (2.1). One reason is that the proof which establishes the least upper bound for $\operatorname{var}(w)$ would not work if quadruples satisfying (2.1) but not (2.2) were to enter into the determination of $w$ [and so a new technique for proving an upper bound for $\operatorname{var}(w)$ would somehow have to be worked out].
In addition to this, however, a couple of other possible advantages might result from excluding quadruples which do not satisfy (2.2).

(i) One effect of the requirement (2.2) is to exclude all $V_{iIjJ}$'s whose variances exceed a certain value. Since the test statistic is thus based on the $V_{iIjJ}$'s with the smallest variances, it is conjectured that the exclusion of some of the $V_{iIjJ}$'s via (2.2) does not necessarily result in a test with lower power, and that a test based on all $\binom{m}{2}\binom{n}{2}$ potentially possible $V_{iIjJ}$'s (if such a test could be devised) would have worse rather than better power in some circumstances. [On the other hand, the exclusion of some $V_{iIjJ}$'s via (2.2) can unquestionably be a hindrance if such exclusion results in too large a proportion of the $V_{iIjJ}$'s being eliminated.]
(ii) An obvious effect of the exclusions based on (2.2) is to make the computations somewhat less lengthy.

(iii) Another effect of the restriction (2.2) is to exclude those quadruples for which the intervals $(X_i, X_I)$ and $(W_j, W_J)$ have no points in common; however, it is not known that this effect, as such, results in any advantage.
It thus appears that, although a test based on all $V_{iIjJ}$'s satisfying (2.1) (if it could be devised) would be a more desirable test in some situations than the one using the restriction (2.2), it could be less desirable in other situations.
3. Numerical example. In this section we give a numerical example to illustrate the computations for the test (2.5). Because the computations are quite lengthy, the sample sizes in the example will be very small: $m = 5$ and $n = 5$. For purposes of the illustration we will go ahead and use the normal approximation for the distribution of $w$, even though the sample sizes are actually too small, and $\operatorname{var}(w)$ and $Q$ too large, for the normal approximation to be suitable. We will counteract this difficulty to some extent by testing at the 10% rather than the 5% level.
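Suppose that our two samples are

$$\begin{aligned}
&Y_1 = 4.42, && Y_2 = 27.59, && Y_3 = 30.78, && Y_4 = 32.65, && Y_5 = 69.36,\\
&X_1 = 0, && X_2 = 4, && X_3 = 4, && X_4 = 4, && X_5 = 9
\end{aligned}$$

and

$$\begin{aligned}
&Z_1 = 9.04, && Z_2 = 35.97, && Z_3 = 38.42, && Z_4 = 38.81, && Z_5 = 64.42,\\
&W_1 = 1, && W_2 = 5, && W_3 = 5, && W_4 = 5, && W_5 = 9.
\end{aligned}$$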
Merely for the sake of orderliness, the first sample is arranged in order of increasing magnitude of the $X_i$'s, and the second sample in order of increasing magnitude of the $W_j$'s.
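The counting that follows can be reproduced with a short Python sketch. It assumes the sample values above, with $Z_1 = 9.04$ and $Z_5 = 64.42$ (the values used in, or implied by, the worked computations of this section). `v` implements (3.1) and `V` implements (2.3); the helper names are illustrative.

```python
# Section 3 data; list position k holds the value for subscript k + 1.
X = [0, 4, 4, 4, 9]
Y = [4.42, 27.59, 30.78, 32.65, 69.36]
W = [1, 5, 5, 5, 9]
Z = [9.04, 35.97, 38.42, 38.81, 64.42]


def quadruples(X, W):
    """Quadruples (i, I, j, J), 1-based, satisfying (2.1) and (2.2)."""
    m, n = len(X), len(W)
    return [(i, I, j, J)
            for i in range(1, m + 1) for I in range(1, m + 1)
            for j in range(1, n + 1) for J in range(1, n + 1)
            if X[i - 1] < X[I - 1] and W[j - 1] < W[J - 1]
            and X[i - 1] <= W[J - 1] and W[j - 1] <= X[I - 1]]


def v(q):
    """Numerator (3.1) of V_iIjJ; it has the same sign as V_iIjJ."""
    i, I, j, J = q
    return ((X[I - 1] - W[j - 1]) * (Z[J - 1] - Y[i - 1])
            - (W[J - 1] - X[i - 1]) * (Y[I - 1] - Z[j - 1]))


def V(q):
    """V_iIjJ of (2.3): v divided by X_I + W_J - X_i - W_j (> 0 by (2.1))."""
    i, I, j, J = q
    return v(q) / (X[I - 1] + W[J - 1] - X[i - 1] - W[j - 1])


quads = quadruples(X, W)
T = len(quads)
# Count positive v's; an exact zero would count 1/2, as the text notes.
S = sum(1.0 if v(q) > 0 else 0.5 if v(q) == 0 else 0.0 for q in quads)
w = S / T
print(T, S, w)
```

Run as is, it should print `40 9.0 0.225`, i.e. $T = 40$, $S = 9$ and $w = .225$.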
Let us define $v_{iIjJ}$ to be the numerator of the formula for $V_{iIjJ}$ (2.3):

$$v_{iIjJ} = (X_I - W_j)(Z_J - Y_i) - (W_J - X_i)(Y_I - Z_j). \tag{3.1}$$

In all cases $v_{iIjJ}$ will have the same sign as $V_{iIjJ}$, the denominator of (2.3) being positive. Hence we can determine $S$ in (2.4) by counting the number of $v_{iIjJ}$'s which are positive, rather than by calculating all the $V_{iIjJ}$'s and counting the number of them which are positive. Thus, if it is desired only to test the hypothesis $\alpha_y = \alpha_z$, it will not be necessary to calculate the $V_{iIjJ}$'s; it will suffice to calculate only the $v_{iIjJ}$'s. On the other hand, if confidence bounds on $(\alpha_z - \alpha_y)$ are desired, then it will be necessary to have the $V_{iIjJ}$'s. Section 4 will illustrate how to obtain confidence bounds.
For the samples above with $m = 5$ and $n = 5$, there are potentially

$$\tfrac12 m(m-1) \times \tfrac12 n(n-1) = \tfrac12 \cdot 5(5-1) \times \tfrac12 \cdot 5(5-1) = 100$$

different quadruples $(i, I, j, J)$ which satisfy (2.1). Of these 100 quadruples, some satisfy both (2.1) and (2.2) but some do not. For example, consider the quadruple (1, 4, 2, 5). We have

$$X_i = X_1 = 0, \quad X_I = X_4 = 4, \quad W_j = W_2 = 5, \quad W_J = W_5 = 9.$$

Since $W_j \le X_I$ does not hold, this quadruple fails to satisfy (2.2). On the other hand, the quadruple (3, 5, 1, 3) does satisfy (2.1) and (2.2), since

$$X_i = X_3 = 4, \quad X_I = X_5 = 9, \quad W_j = W_1 = 1, \quad W_J = W_3 = 5,$$

and so $X_i \le W_J$ and $W_j \le X_I$ are both satisfied. [Note: As indicated earlier, we are excluding some quadruples because tied $X$'s or tied $W$'s prevent (2.1) from being satisfied. Some quadruples, such as (2, 4, 1, 5), will satisfy (2.2) but will be excluded because a tie prevents (2.1) from being satisfied.]
Altogether it turns out that 40 of the 100 quadruples satisfy both (2.1) and (2.2), while 60 do not. Thus $T = 40$. [In general, one would suspect that the higher $T$ is (for a given $m$ and $n$), the more powerful the test will be; in this example, the power of the test is hampered by the relatively small value of $T$.]
We now indicate which 40 quadruples satisfy both (2.1) and (2.2), and we present the value of $v_{iIjJ}$ for each quadruple (the 40 quadruples are the subscripts on the 40 $v_{iIjJ}$'s):

$$\begin{array}{llll}
v_{1212} = 1.90, & v_{1213} = 9.25, & v_{1214} = 10.42, & v_{1215} = 13.05,\\
v_{1312} = -14.05, & v_{1313} = -6.70, & v_{1314} = -5.53, & v_{1315} = -15.66,\\
v_{1412} = -23.40, & v_{1413} = -16.05, & v_{1414} = -14.88, & v_{1415} = -32.49,\\
v_{1512} = -49.20, & v_{1513} = -29.60, & v_{1514} = -26.48, & v_{1515} = -62.88,\\
v_{1525} = -60.51, & v_{1535} = -38.46, & v_{1545} = -34.95, & \\
v_{2512} = 6.72, & v_{2513} = 26.32, & v_{2514} = 29.44, & v_{2515} = -6.96,\\
v_{2525} = -19.63, & v_{2535} = -7.38, & v_{2545} = -5.43, & \\
v_{3512} = -18.80, & v_{3513} = .80, & v_{3514} = 3.92, & v_{3515} = -32.48,\\
v_{3525} = -32.39, & v_{3535} = -20.14, & v_{3545} = -18.19, & \\
v_{4512} = -33.76, & v_{4513} = -14.16, & v_{4514} = -11.04, & v_{4515} = -47.44,\\
v_{4525} = -39.87, & v_{4535} = -27.62, & v_{4545} = -25.67. &
\end{array}$$
A couple of examples will suffice to demonstrate the calculation of the $v_{iIjJ}$'s:

$$\begin{aligned}
v_{2513} &= (X_5 - W_1)(Z_3 - Y_2) - (W_3 - X_2)(Y_5 - Z_1)\\
&= (9 - 1)(38.42 - 27.59) - (5 - 4)(69.36 - 9.04)\\
&= 8(10.83) - 1(60.32) = 26.32
\end{aligned}$$

and

$$\begin{aligned}
v_{1312} &= (X_3 - W_1)(Z_2 - Y_1) - (W_2 - X_1)(Y_3 - Z_1)\\
&= (4 - 1)(35.97 - 4.42) - (5 - 0)(30.78 - 9.04)\\
&= -14.05.
\end{aligned}$$
We find that, of the 40 $v_{iIjJ}$'s, 9 are positive and 31 negative. Thus $S = 9$, and, by (2.4),

$$w = \frac{S}{T} = \frac{9}{40} = .225. \tag{3.2}$$

(Note: If one or more of the $v_{iIjJ}$'s had been exactly 0, we could have handled such a situation by counting $\frac12$ for each such $v_{iIjJ}$.)
Computing $Q$ is much less simple than computing $w$. In obtaining $Q_1$ and $Q_2$ it is convenient to use the well-known trigonometric formula

$$2\sin^{-1}\rho = \cos^{-1}(1 - 2\rho^2) \tag{3.3}$$

in order to obviate the need for obtaining square roots. [The formula holds for $0 \le \rho \le 1$, which covers every $\rho$ here: under (2.2) all the factors in the numerators of (2.9) and (2.10) are non-negative.] Table I contains the value of $2\sin^{-1}\rho^{(1)}_{iIjJ,\,i'I'j'J'}$ to two decimal places, as computed by the formula (3.3), for every octuple $(i, I, j, J;\ i', I', j', J')$ embraced by the summation on the right-hand side of (2.7). Similarly, Table II contains the value of $2\sin^{-1}\rho^{(2)}_{iIjJ,\,i'I'j'J'}$ for every octuple needed for (2.8).
A few examples will demonstrate how the formulas (2.9), (2.10), and (3.3) are used to obtain the elements of Tables I and II:

$$2\sin^{-1}\rho_{\ldots} = \cos^{-1}\left[1 - \frac{3200}{\ldots}\right] = \cos^{-1}(-.2008) = 1.7730,$$

$$2\sin^{-1}\rho_{\ldots} = 1.5982,$$

$$2\sin^{-1}\rho^{(1)}_{3545,1412} = 2\sin^{-1}0 = 0 \quad\text{since } 3 \ne 1,\ 3 \ne 4,\ 5 \ne 1,\ 5 \ne 4,$$

$$2\sin^{-1}\rho^{(2)}_{1412,3545} = 2\sin^{-1}0 = 0 \quad\text{since } 1 \ne 4,\ 1 \ne 5,\ 2 \ne 4,\ 2 \ne 5,$$

$$2\sin^{-1}\rho^{(2)}_{2514,2535} = 2\sin^{-1}0 = 0 \quad\text{since } 1 \ne 3,\ 1 \ne 5,\ 4 \ne 3,\ 4 \ne 5.$$

(Note: The only reason that Tables I and II are arranged in different orders is for convenience in grouping like elements. We might also mention that the elements of both tables were all computed to four decimal places, but only two decimal places are shown in the tables in order to save space.)
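The following Python sketch checks (3.3) numerically on one pair of quadruples. The pair $(2,5,1,2)$, $(2,5,2,5)$ is our own illustrative choice and is not necessarily the pair behind the non-zero examples above. For this pair, (2.9) has numerator $8 \cdot 4 + 1 \cdot 5 = 37$ and (2.10) has numerator $8 \cdot 5 = 40$, with bracketed denominator factors 65 and 41.

```python
from math import acos, asin, isclose, sqrt


def two_asin(num, d1, d2):
    """2*asin(rho) for rho = num / sqrt(d1*d2) >= 0, via (3.3): no square root needed."""
    return acos(1 - 2 * num * num / (d1 * d2))


# Illustrative pair (our choice): q = (2,5,1,2) and q' = (2,5,2,5) of Section 3.
# For q:  a = X_5 - W_1 = 8, b = W_2 - X_2 = 1; for q': a' = X_5 - W_2 = 4, b' = W_5 - X_2 = 5.
a, b, a2, b2 = 8, 1, 4, 5
d1, d2 = a * a + b * b, a2 * a2 + b2 * b2   # 65 and 41
elem1 = two_asin(a * a2 + b * b2, d1, d2)   # rho^(1): delta_ii' = delta_II' = 1, numerator 37
elem2 = two_asin(a * b2, d1, d2)            # rho^(2): only delta_Jj' = 1, numerator 40

# (3.3) agrees with the direct formula whenever 0 <= rho <= 1.
assert isclose(elem1, 2 * asin(37 / sqrt(d1 * d2)))
print(round(elem1, 4), round(elem2, 4))
```

It prints `1.5982 1.7729`. The first value equals the 1.5982 quoted above. The second agrees with the quoted 1.7730 once the cosine argument $1 - 3200/2665$ is rounded to $-.2008$.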
The sum of all the elements in Table I is

$$\sum_{iIjJ > i'I'j'J'} 2\sin^{-1}\rho^{(1)}_{iIjJ,\,i'I'j'J'} = 628.732. \tag{3.4}$$

The sum of all the elements in Table II is

$$\sum_{iIjJ > i'I'j'J'} 2\sin^{-1}\rho^{(2)}_{iIjJ,\,i'I'j'J'} = 638.516. \tag{3.5}$$

[The original figures, correct to four decimal places, were used in calculating both of these sums (3.4) and (3.5).] We substitute (3.4) into (2.7) to obtain

$$Q_1 = \frac{1}{4T} + \frac{628.732}{2\pi T^2} = \frac{1}{160} + \frac{628.732}{3200\pi} = .068791.$$

Similarly, from (3.5) and (2.8) we get

$$Q_2 = \frac{1}{160} + \frac{638.516}{3200\pi} = .069764.$$

Hence, by (2.6), $Q = .069764$.
Since we already obtained $w$ in (3.2), we have

$$\frac{w - \frac12}{\sqrt{Q}} = \frac{.225 - .5}{\sqrt{.069764}} = \frac{-.275}{.2641} = -1.041.$$

Inasmuch as the absolute value of $-1.041$ does not exceed 1.645, we cannot reject the hypothesis $\alpha_y = \alpha_z$ at the 10% level. [We emphasize again that the sample sizes are not large enough, and $\operatorname{var}(w)$ and $Q$ not small enough, for the normal approximation to be very suitable, but for purposes of the illustration we are going ahead and applying the normal approximation anyhow. Although the normal approximation would appear to be suitable for just slightly larger values of $m$ and $n$, it was felt that an illustration any lengthier than the one given here would consume an excessive amount of space.]
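The arithmetic from the table sums to the test statistic can be checked with a few lines of Python (a sketch; the variable names are ours). Note that (3.4) and (3.5) are sums of $2\sin^{-1}\rho$, so they are halved before being inserted into (2.7) and (2.8).

```python
from math import pi, sqrt

T, S = 40, 9
sum1, sum2 = 628.732, 638.516            # (3.4) and (3.5): sums of 2*asin(rho)

# (2.7)-(2.8) use asin(rho), i.e. half of each tabulated sum.
Q1 = 1 / (4 * T) + (sum1 / 2) / (pi * T ** 2)
Q2 = 1 / (4 * T) + (sum2 / 2) / (pi * T ** 2)
Q = max(Q1, Q2)                          # (2.6)

z = (S / T - 0.5) / sqrt(Q)              # statistic of (2.5), signed
print(f"Q1={Q1:.6f} Q2={Q2:.6f} z={z:.3f} reject at 10%: {abs(z) > 1.645}")
```

This prints `Q1=0.068791 Q2=0.069764 z=-1.041 reject at 10%: False`.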
4. Confidence bounds. If we wish to obtain confidence bounds on $(\alpha_z - \alpha_y)$, we use roughly the same principle that was described in [1] and [2]. We need to find that value of $\delta = \alpha_z - \alpha_y$ which, when subtracted from every $V_{iIjJ}$, will cause (the resulting new) $w$ to be on the threshold of significance. For example, suppose we want a 90% two-sided confidence interval for $(\alpha_z - \alpha_y)$. Then $w$ will be on the threshold of being significantly large if 37 of the 40 $V_{iIjJ}$'s are positive and 1 of them is zero, since

$$\left(1.645 \times .2641 + \tfrac12\right) \times 40 = .934 \times 40 = 37.4,$$

and $w$ will be on the threshold of being significantly small if 2 of the $V_{iIjJ}$'s are positive and 1 is zero, since

$$\left(-1.645 \times .2641 + \tfrac12\right) \times 40 = 2.6.$$

In Section 3 we obtained the $v_{iIjJ}$'s but not the $V_{iIjJ}$'s. We now must use the $v_{iIjJ}$'s to calculate the $V_{iIjJ}$'s by the formula (2.3). For example, we get

$$V_{3514} = \frac{v_{3514}}{X_5 + W_4 - X_3 - W_1} = \frac{3.92}{9 + 5 - 4 - 1} = \frac{3.92}{9} = .44.$$

The values of all 40 $V_{iIjJ}$'s are as follows:
$$\begin{array}{llll}
V_{1212} = .24, & V_{1213} = 1.16, & V_{1214} = 1.30, & V_{1215} = 1.09,\\
V_{1312} = -1.76, & V_{1313} = -.84, & V_{1314} = -.69, & V_{1315} = -1.31,\\
V_{1412} = -2.93, & V_{1413} = -2.01, & V_{1414} = -1.86, & V_{1415} = -2.71,\\
V_{1512} = -3.78, & V_{1513} = -2.28, & V_{1514} = -2.04, & V_{1515} = -3.70,\\
V_{1525} = -4.65, & V_{1535} = -2.96, & V_{1545} = -2.69, & \\
V_{2512} = .75, & V_{2513} = 2.92, & V_{2514} = 3.27, & V_{2515} = -.54,\\
V_{2525} = -2.18, & V_{2535} = -.82, & V_{2545} = -.60, & \\
V_{3512} = -2.09, & V_{3513} = .09, & V_{3514} = .44, & V_{3515} = -2.50,\\
V_{3525} = -3.60, & V_{3535} = -2.24, & V_{3545} = -2.02, & \\
V_{4512} = -3.75, & V_{4513} = -1.57, & V_{4514} = -1.23, & V_{4515} = -3.65,\\
V_{4525} = -4.43, & V_{4535} = -3.07, & V_{4545} = -2.85. &
\end{array}$$

We notice that 37 of the 40 $V_{iIjJ}$'s exceed $-3.78$, while one $V_{iIjJ}$ is equal to $-3.78$. Hence $-3.78$ is the lower end of our confidence interval. Similarly, we find that 2 of the 40 $V_{iIjJ}$'s exceed 1.30 (with one $V_{iIjJ}$ being equal to 1.30), so that 1.30 is the upper end of our confidence interval. Thus we can state that

$$-3.78 < (\alpha_z - \alpha_y) < 1.30$$

with confidence coefficient $\ge 90\%$.
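The search for the two end points can be sketched in Python as follows. The sketch assumes, as holds in this example, that the $V_{iIjJ}$'s are all distinct. `confidence_bounds` is a hypothetical helper name, and the rounded $V$'s listed above are used as data.

```python
from math import ceil, floor, sqrt

# The 40 V_iIjJ's of Section 4, in the order listed there.
V = [0.24, 1.16, 1.30, 1.09,
     -1.76, -0.84, -0.69, -1.31,
     -2.93, -2.01, -1.86, -2.71,
     -3.78, -2.28, -2.04, -3.70,
     -4.65, -2.96, -2.69,
     0.75, 2.92, 3.27, -0.54,
     -2.18, -0.82, -0.60,
     -2.09, 0.09, 0.44, -2.50,
     -3.60, -2.24, -2.02,
     -3.75, -1.57, -1.23, -3.65,
     -4.43, -3.07, -2.85]


def confidence_bounds(V, Q, z=1.645):
    """Invert the test as in Section 4, assuming the V's are distinct.

    Subtracting d from every V leaves #(V > d) positives, plus 1/2 for a V equal to d.
    """
    T = len(V)
    s = sorted(V)
    c_hi = (z * sqrt(Q) + 0.5) * T    # positives needed for w to be significantly large
    c_lo = (-z * sqrt(Q) + 0.5) * T   # positives needed for w to be significantly small
    k_hi = ceil(c_hi - 0.5)           # fewest V's above d with count + 1/2 >= c_hi
    k_lo = floor(c_lo - 0.5)          # most V's above d with count + 1/2 <= c_lo
    return s[T - 1 - k_hi], s[T - 1 - k_lo]


print(confidence_bounds(V, 0.069764))
```

With $Q = .069764$ it returns `(-3.78, 1.3)`, the interval found above.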
In most practical situations the true values of the parameters would never be known. However, the numerical illustration in this paper was constructed artificially, and so all the parameters are known. The values used were $\beta = 7$, $\alpha_y = 4$, $\alpha_z = 2$, $\sigma_e^2 = 25$, and $\sigma_f^2 = 1$. Thus $(\alpha_z - \alpha_y) = -2$.

5. The problem of evaluating $Q_1$ and $Q_2$. Obtaining $Q_1$ (2.7) and $Q_2$ (2.8) is by far the biggest problem associated with the calculation of the test statistic which we have presented. In general, the number of terms included in the summations in formulas (2.7) and (2.8) increases rapidly as $m$ and $n$ increase: each summation has $\frac12 T(T-1)$ terms, and $T$ can be as large as $\frac12 m(m-1) \times \frac12 n(n-1)$. It is therefore appropriate for us to consider possible techniques for reducing the computational burden involved in the evaluation of $Q_1$ and $Q_2$.
It would be helpful if, by theoretical means, simpler but equivalent formulas for $Q_1$ and $Q_2$ could somehow be found. There is no indication, however, that any success in this direction could be achieved. Alternatively, it might be possible mathematically to obtain upper bounds on $Q_1$ and $Q_2$ which would be relatively easy to compute and which would be close to the actual values of $Q_1$ and $Q_2$, but at present no progress has been made in this direction.

Except when $T$ is relatively very small, the direct calculation of $Q_1$ and $Q_2$ by means other than a high-speed computer is out of the question. Unfortunately, however, even a high-speed computer is not fast enough to calculate $Q_1$ and $Q_2$ directly when $m$ and $n$ are of moderate size.
It has been estimated that the calculation of a typical non-zero element of the summation on the right-hand side of (2.7) or (2.8) would require roughly 15 milliseconds on the UNIVAC at Chapel Hill.² Thus, even if there were only a million of these elements to be calculated (as there very well could be if $m$ and $n$ were larger than 10), the calculations would require (roughly) over four hours of computer time.

² The author is indebted to Thomas G. Donnelly for furnishing this approximate estimate.
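The growth described above can be quantified with a small Python sketch. It uses only facts from the text: at most $\frac12 m(m-1) \times \frac12 n(n-1)$ quadruples, $\frac12 T(T-1)$ elements per summation, and roughly 15 ms per element. Because it counts all elements, zero or not, and ignores the exclusions made by (2.2), it gives an upper bound rather than the actual workload.

```python
def max_terms(m: int, n: int) -> int:
    """Upper bound on the number of elements in each summation (2.7)/(2.8).

    With no ties and every quadruple passing (2.2), T = (1/2)m(m-1) x (1/2)n(n-1),
    and each summation has (1/2)T(T-1) elements (zero and non-zero alike).
    """
    T = (m * (m - 1) // 2) * (n * (n - 1) // 2)
    return T * (T - 1) // 2


def hours(elements: int, ms_per_element: float = 15.0) -> float:
    """Computer time at the text's rough figure of 15 ms per element."""
    return elements * ms_per_element / 1000 / 3600


print(max_terms(10, 10), round(hours(1_000_000), 2))
```

It prints `2049300 4.17`. With $m = n = 10$ each summation can have about two million elements, and a million elements at 15 ms each take a little over four hours.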
Hence it is necessary to find some kind of short-cut for determining $Q_1$ and $Q_2$. Perhaps the most obvious approach is to take a sample of the elements which enter into the summations in (2.7) and (2.8), rather than calculating all $\frac12 T(T-1)$ elements in each summation. One possibility would be to draw random samples of sizes $s^{(1)}$ and $s^{(2)}$ (say) from all the $\frac12 T(T-1)$ elements (including both zero and non-zero elements) in the summations in (2.7) and (2.8) respectively, and then use the sample means as estimates of the respective population means [where each population consists of $\frac12 T(T-1)$ elements].
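For example, if we want to estimate $Q_1$, we could draw a sample of (and calculate) $s^{(1)}$ elements, each of the form

$$\sin^{-1}\rho^{(1)}_{iIjJ,\,i'I'j'J'}. \tag{5.1}$$

Let the sum of these $s^{(1)}$ elements (5.1) be $a^{(1)}$ (say). Then we could use the expression

$$Q_1 \doteq \frac{1}{4T} + \frac{1}{\pi T^2} \times \frac{T(T-1)}{2} \times \frac{a^{(1)}}{s^{(1)}} \tag{5.2}$$

to estimate $Q_1$. (We use the symbol $\doteq$ to mean "is estimated by".) Similarly, we could write

$$Q_2 \doteq \frac{1}{4T} + \frac{1}{\pi T^2} \times \frac{T(T-1)}{2} \times \frac{a^{(2)}}{s^{(2)}}. \tag{5.3}$$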
Perhaps a better technique of sampling, however, would be to determine exactly the number of elements in the population which are zero by virtue of satisfying

$$i \ne i', \quad i \ne I', \quad I \ne i', \quad I \ne I' \qquad \text{in the case of elements for (2.7)} \tag{5.4a}$$

or

$$j \ne j', \quad j \ne J', \quad J \ne j', \quad J \ne J' \qquad \text{in the case of elements for (2.8)}, \tag{5.4b}$$

and then to draw the sample from the sub-population which consists only of the elements not satisfying (5.4). For example, suppose we are estimating $Q_1$. Suppose we determine that exactly $t_0^{(1)}$ of the elements (5.1) are zero by virtue of satisfying (5.4a). Then the remaining $\frac12 T(T-1) - t_0^{(1)} = t_p^{(1)}$ (say) elements do not satisfy (5.4a). We may draw a random sample of size $s_p^{(1)}$ (say) from this sub-population of size $t_p^{(1)}$. Suppose we calculate these $s_p^{(1)}$ elements in the sample and find that their sum is $a_p^{(1)}$ (say). Then we can use the expression

$$Q_1 \doteq \frac{1}{4T} + \frac{1}{\pi T^2} \times t_p^{(1)} \times \frac{a_p^{(1)}}{s_p^{(1)}} \tag{5.5}$$

to estimate $Q_1$. Similarly, we can write the expression

$$Q_2 \doteq \frac{1}{4T} + \frac{1}{\pi T^2} \times t_p^{(2)} \times \frac{a_p^{(2)}}{s_p^{(2)}}. \tag{5.6}$$
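A Python sketch of the estimator (5.5) for $Q_1$, applied to the example of Section 3, follows. The helper names are illustrative. The sample is drawn without replacement with `random.Random.sample`. When $s_p^{(1)} = t_p^{(1)}$ the estimate reduces to the exact $Q_1$, which gives a convenient check.

```python
import random
from itertools import combinations
from math import asin, pi, sqrt


def quadruples(X, W):
    """Quadruples (i, I, j, J), 1-based, satisfying (2.1) and (2.2)."""
    m, n = len(X), len(W)
    return [(i, I, j, J)
            for i in range(1, m + 1) for I in range(1, m + 1)
            for j in range(1, n + 1) for J in range(1, n + 1)
            if X[i - 1] < X[I - 1] and W[j - 1] < W[J - 1]
            and X[i - 1] <= W[J - 1] and W[j - 1] <= X[I - 1]]


def rho1(q, r, X, W):
    """rho^(1) of (2.9)."""
    (i, I, j, J), (i2, I2, j2, J2) = q, r
    a, b = X[I - 1] - W[j - 1], W[J - 1] - X[i - 1]
    a2, b2 = X[I2 - 1] - W[j2 - 1], W[J2 - 1] - X[i2 - 1]
    num = ((i == i2) * a * a2 + (i == I2) * a * b2
           + (I == i2) * b * a2 + (I == I2) * b * b2)
    return min(1.0, num / sqrt((a * a + b * b) * (a2 * a2 + b2 * b2)))


def subpopulation(X, W):
    """T and the pairs NOT satisfying (5.4a), i.e. sharing at least one X subscript."""
    quads = quadruples(X, W)
    pool = [(q, r) for q, r in combinations(quads, 2)
            if {q[0], q[1]} & {r[0], r[1]}]
    return len(quads), pool


def estimate_q1(X, W, s_p, rng):
    """Estimator (5.5): s_p of the t_p elements not known to be zero, without replacement."""
    T, pool = subpopulation(X, W)
    t_p = len(pool)
    a_p = sum(asin(rho1(q, r, X, W)) for q, r in rng.sample(pool, s_p))
    return 1 / (4 * T) + t_p * a_p / (s_p * pi * T * T)


X, W = [0, 4, 4, 4, 9], [1, 5, 5, 5, 9]
print(estimate_q1(X, W, s_p=200, rng=random.Random(1962)))
```

For the example, $t_p^{(1)} = 612$ of the 780 elements remain after (5.4a), so only 168 elements are known in advance to be zero.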
Still more sophisticated sampling techniques are available through the use of stratified sampling. In particular (for example), we can divide the elements not satisfying (5.4) into two groups or strata: those elements which satisfy

$$i = i',\ I = I' \quad\text{or}\quad i = I',\ I = i' \qquad \text{in the case of elements for (2.7)} \tag{5.7a}$$

or

$$j = j',\ J = J' \quad\text{or}\quad j = J',\ J = j' \qquad \text{in the case of elements for (2.8)}, \tag{5.7b}$$

and those elements which do not. Samples can then be taken from each stratum. [The reason for stratifying is that those elements which satisfy (5.7) will tend to have a much higher value than those elements which do not.] Suppose, for example, that we are estimating $Q_1$. Of the $t_p^{(1)}$ elements (5.1) not satisfying (5.4a), suppose that $t_1^{(1)}$ satisfy (5.7a) while $t_2^{(1)} = t_p^{(1)} - t_1^{(1)}$ do not satisfy (5.7a). Suppose we take samples of sizes $s_1^{(1)}$ and $s_2^{(1)}$ from the two strata satisfying (5.7a) and not satisfying (5.7a) respectively. Let the sums of the elements in these samples be $a_1^{(1)}$ and $a_2^{(1)}$ respectively. Then we may estimate $Q_1$ by using the expression

$$Q_1 \doteq \frac{1}{4T} + \frac{1}{\pi T^2}\left[t_1^{(1)} \frac{a_1^{(1)}}{s_1^{(1)}} + t_2^{(1)} \frac{a_2^{(1)}}{s_2^{(1)}}\right]. \tag{5.8}$$

Similarly, we can write

$$Q_2 \doteq \frac{1}{4T} + \frac{1}{\pi T^2}\left[t_1^{(2)} \frac{a_1^{(2)}}{s_1^{(2)}} + t_2^{(2)} \frac{a_2^{(2)}}{s_2^{(2)}}\right]. \tag{5.9}$$
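Along the same lines, here is a hedged Python sketch of the stratified estimator (5.8) for $Q_1$. Names are again illustrative, and each stratum must receive at least one sampled element. (5.7a) is implemented as "the two quadruples have the same pair of $X$ subscripts".

```python
import random
from itertools import combinations
from math import asin, pi, sqrt


def quadruples(X, W):
    """Quadruples (i, I, j, J), 1-based, satisfying (2.1) and (2.2)."""
    m, n = len(X), len(W)
    return [(i, I, j, J)
            for i in range(1, m + 1) for I in range(1, m + 1)
            for j in range(1, n + 1) for J in range(1, n + 1)
            if X[i - 1] < X[I - 1] and W[j - 1] < W[J - 1]
            and X[i - 1] <= W[J - 1] and W[j - 1] <= X[I - 1]]


def rho1(q, r, X, W):
    """rho^(1) of (2.9)."""
    (i, I, j, J), (i2, I2, j2, J2) = q, r
    a, b = X[I - 1] - W[j - 1], W[J - 1] - X[i - 1]
    a2, b2 = X[I2 - 1] - W[j2 - 1], W[J2 - 1] - X[i2 - 1]
    num = ((i == i2) * a * a2 + (i == I2) * a * b2
           + (I == i2) * b * a2 + (I == I2) * b * b2)
    return min(1.0, num / sqrt((a * a + b * b) * (a2 * a2 + b2 * b2)))


def strata(X, W):
    """Split the pairs not satisfying (5.4a) by whether they satisfy (5.7a)."""
    quads = quadruples(X, W)
    stratum1, stratum2 = [], []
    for q, r in combinations(quads, 2):
        if not {q[0], q[1]} & {r[0], r[1]}:
            continue                       # zero by (5.4a); never sampled
        if {q[0], q[1]} == {r[0], r[1]}:   # (5.7a): same pair of X subscripts
            stratum1.append((q, r))
        else:
            stratum2.append((q, r))
    return len(quads), stratum1, stratum2


def estimate_q1_stratified(X, W, s1, s2, rng):
    """Estimator (5.8); s1, s2 >= 1 are per-stratum sample sizes (without replacement)."""
    T, st1, st2 = strata(X, W)
    total = 0.0
    for stratum, s in ((st1, s1), (st2, s2)):
        a = sum(asin(rho1(q, r, X, W)) for q, r in rng.sample(stratum, s))
        total += len(stratum) * a / s      # t_k * a_k / s_k
    return 1 / (4 * T) + total / (pi * T * T)


X, W = [0, 4, 4, 4, 9], [1, 5, 5, 5, 9]
print(estimate_q1_stratified(X, W, s1=20, s2=100, rng=random.Random(1962)))
```

In the example the stratum satisfying (5.7a) has 102 elements and the other stratum 510. Taking every element of both strata reproduces the exact $Q_1$.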
In addition to the three possible techniques just presented for estimating $Q_1$ and $Q_2$, certain other techniques (involving stratified sampling) might be devised. The choice of which technique to use will be governed largely by the comparative costs of the different techniques (in terms mainly of the computer time needed to obtain a specified precision for the estimates of $Q_1$ and $Q_2$).
The discussions of the three sampling techniques just presented did not state explicitly whether sampling without replacement or sampling with replacement should be used. Sampling without replacement is, of course, more efficient than sampling with replacement, assuming that the cost (i.e., computer time) per element sampled is the same under both schemes. However, there is no reason why sampling with replacement could not be used if it were justified by cost considerations.
If a sampling scheme is used to estimate $Q_1$ and $Q_2$, the question might be raised as to what the sizes of the samples should be. For example, if we wish to use formulas (5.5) and (5.6), what should $s_p^{(1)}$ and $s_p^{(2)}$ be? It is estimated very roughly that, if we take the sample sizes ($s_p^{(1)}$ and $s_p^{(2)}$) each to be 10,000, then the chances will be about 99 out of 100 that the estimate of $Q$ will be within ±1% of the true value of $Q$; that, if both sample sizes are 40,000, then $Q$ will be estimated to within ±½% of its true value about 99% of the time; and that, if both sample sizes are 2,500, then the chances will be about 99 out of 100 that $Q$ will be estimated to within ±2% of its true value. These statements will be roughly correct regardless of the values of $m$, $n$, and $T$, so long as $t_p^{(1)}$ and $t_p^{(2)}$ are considerably larger than $s_p^{(1)}$ and $s_p^{(2)}$. (For relatively small $t_p^{(1)}$ and $t_p^{(2)}$, the statements will still be roughly correct if sampling with replacement is used, but will be conservative if sampling without replacement is used.)
If the stratified sampling scheme associated with the formulas (5.8) and (5.9) [instead of the sampling scheme associated with (5.5-5.6)] is employed for estimating $Q_1$ and $Q_2$, then we should achieve greater precision with the same total sample size $s_p^{(1)} = s_1^{(1)} + s_2^{(1)}$ or $s_p^{(2)} = s_1^{(2)} + s_2^{(2)}$, and so the statements made in the preceding paragraph would be conservative. On the other hand, if the sampling scheme associated with (5.2-5.3) is used, then for given sample sizes we will have considerably less precision than for the scheme associated with (5.5-5.6); furthermore, this loss in precision will tend to be relatively worse the larger $m$ and $n$ are [since, the larger $m$ and $n$ are, the larger the expected proportion of zero elements sampled under the scheme associated with (5.2-5.3) will tend to be].
REFERENCES

[1] Richard F. Potthoff, "Illustration of a Technique Which Tests Whether Two Regression Lines Are Parallel When the Variances Are Unequal," Institute of Statistics Mimeograph Series No. 320, University of North Carolina, 1962.

[2] Richard F. Potthoff, "Comparing the Medians of Two Symmetrical Populations," Institute of Statistics Mimeograph Series No. 316, University of North Carolina, 1962.
TABLE I. Values of $2\sin^{-1}\rho^{(1)}_{iIjJ,\,i'I'j'J'}$, to two decimal places, for the pairs of quadruples entering the summation in (2.7). Rows and columns are indexed by the quadruples $iIjJ$ and $i'I'j'J'$ (grouped by $(i, I)$: 1212, 1213, 1214, 1215, 1312, ...), each also labelled by its values $X_iX_IW_jW_J$ (for example, 1212 is labelled 0415).
T.fl.BLE II.
2 sin
t
.:' cT'
2525
3525
4525 1525
2535
3535 4535 1535 2545
3545
4545 1545
2515
-1 p(2)
iIjJ,i'I'j'J
3515 4515 1215 1315 1415 1515
2512 3512
4512 1212 1312 1412 1512 2513 3513 4513 1213 1313 1413 1513
2514 3514
4514 1214 1314 1414 1514
X'):-:!',<'cT' 4959 4959 4959 0959 4959 4959 4959 0959 4959 4959 4959 0959 4919 4919 4919 0419 0419 0419 0919 4915 4915 4915 0415 0415 0415 0915 4915 4915 4915 0415 0415 0415 0915 4915 4915 4915 0415 0415 0415 0915
iIjJ
2525
X.. hV:)-: ~
t~?)1·
3525 4959
4525 4959
1525 0959
2535
3535
4535
1535
2545
3545
4545
1545
4959
4959
4959
0959
3.14
3.14 3.14
2.63 2.63 2.63
.80
.80
.80
.52.
.5e
.0C
.50
.80
. be-
.5::'
.51
3.14
.51
3.14
.51
.33
2.63 2.63
.80
.80
.80
.80
.80
.80
.80
.80
.80
.;:
.51
.51
3.14
2.63
4959
4959
4959
.80
·5:
.6:
.80
.80
.80
.:':.
.80
.51
.51
.51
0959
.51
""
'-~
.51
.33
1.12
.70
1.12
1.12
1.12
1.12
1.12
1.12
.70
1.12
1.12
1.12
.40
.~C
:'::'2
.40
.70 1.12 1.12 1.12
.26
.40
.40
.40
.--1-':
2515 4919
3515 4919
4515 4919
1215 0419
.51
.80
.80
.51
.51 3.14
.51 3.14 3.14
.33 2.63 2.63 2.63
.70 1.12 1.12 1.12
.70 1.12 1.12 1.12
.70 1.12 1.12 1.12
.40 .40 .40
.26
.40
.40 .40
.26
.40
.26 .40 .40
.86 .86
.86
.55
0
.70 3.14
.70 3.14 3.14
.26 1.76 1.76 1.76
.26 1.76 1.76 1.76 3.14
.26 1.76 1.76 1.76 3.14 3.14
.55 2.57 2.57 2.57 2.33 2.33 2.33
1315
0419
.40
1.0
.26
.40
.40
.40
:
0.:..1:;;
• ~c
.~:
...
.4:
.26
1515
0919
.56
.~£
.56
'n
.40
.86
.40
.86
.40
.36
"'T"7
:2.Z"{
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.77 1.77 1.77
1.77 1.77 1.77
0 1.77 1.77 1.77
0 .83 .83 .83
.83
.83
.83
0 .83 .83 .83
0 1.45 1.45 1.45
2.27
2.27
2.27
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
.57
.57
1.90
1.90
1.90
.57 1.05
1.77 1.77 1.77
1.77 1.77 1.77
1.77 1.77 1.77
.83
.83
.83
.83
.83
.83
.83
.83
.83
1.45 1.45 1.45
2.27
2.27
2.27
.13
.13
.13
.13
.13
.13
.13
.13
.13
.24
.24
.24
.98
.98
.98
.94
.94
.94
.94
.94
.94
.94
.94
.94
1.90
1.77
.57
.57
:.~:
",
""
__c
-:;;:,,...
-;"'-~
...9:5
l.92.;
~5:'2
1.-9::'~
:,."1'"7
:".8.2
:;.:;.:.;
·:3
1.77 2.27
.Ej
.83
.5/C
.... :;.....c.
0415
Cl4F
·:3
.d;
: ....:.2
1512
89:'5
2513
3513
4513
1213
1313
1413
1513
4915
4915
2 ""
1.77
- ' (I
0
0
4915
0
0415
0
0415
0415
.c.
0,
.83
.83 .83
1.45 1.45 1.45
0915
--
0
0
2514 4915
3514 4915
4514 4915
1214 0415
1314 0415
1414 0415
1514 0915
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
-0
-0
0
0
-0
0
:.::.:.:: i
·9:
.5/C
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
.98
.98
.98
1.77
0
0
0
0
0
0
0
.13
.13
.13
.13
.13
.13
.13
.13
.13
.24
.24
.24
.24
.24
.24
.24
.24
.24
.94
.94
.94
.94
.94
.94
.94
.94
.94
.57
1.90
1.90
1.90
.57 1.05
1.90
1.90
1.90
1.05
1.90
1.90
1.90
1.05
.57
.13
.13
.13
.13
.13
.13
.13
.13
.13
.24
.24
.24
.24
.24
.24
.94
.94
.94
.94
.94
.94
.03
.03
.03
.21
.21
.21
.03
.03
.03
.21
.21
.21
.94
.94
.94
.13
.13
.24
.24
.24
.24
.24
.24
.03
.03
.03
.21
.21
.21
.03
.03
.03
.21
.21
.21
.03
.03
.13
.13
.13
.24
.24
.24
1.90
1.90
.57 1.05
.19
.19
.19
1.39
1.39
1.39
.81
.19
.19
.19
1.90 1.90 1.39
1.90 1.90 1.39
1.90 1.90 1.39
.81
1.05 1.05
.19
.19
.19
1.90 1.90 1.39
1.90 1.90 1.39
1.90 1.90 1.39
.81
1.05 1.05
3.14
3.14
1.33
1.33
1.33
2.27
3.14
1.33 1.33
1.33 1.33
1.33 1.33
2.27 2.27
3.14
3.14 3.14
2.20 2.20 2.20
.21
.21
.21
.03
.21
.21
.21
.03
.21
.21
.21
.03
.21 1.65 1.65 1.65
.21 1.65 1.65 1.65
.21 1.65 1.65 1.65
.13
.94 .94 .94
.21
.21
.03
.21
.21 1.65
.21 1.65
.21 1.65
.94
.21
.21
.21
.21
.21
.21
1.65 1.65
1.65 1.65
1.65 1.65
.94
.94
.13
.13 3.14
.13 3.14 3.14
.94 1.33 1.33 1.33
.94 1.33 1.33 1.33 3.14
.94 1.33 1.33 1.33 3.14 3.14
.57 2.27 2.27 2.27 2.20 2.20 2.20
.94
.94
.94
.03
.03
.03
.21
.21
.21
.03
.03
.03
.21
.21
.21
.21
.21
.03
.21
.21
.21
.03
.21
.21
.21
.03
.21
.21 1.65 1.65 1.65
.21 1.65 1.65 1.65
.21 1.65 1.65 1.65
.57
.13
.13
.13
.13
.13
.13
.94
.94
.94
.13
.13
3.14
.13
3.14 3.14
.94 1.33 1.33 1.33
.94 1.33 1.33 1.33 3.14
.94
.57
1.33 1.33 1.33 3.14 3.14
2.27 2.27 2.27 2.20 2.20 2.20