UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.

ILLUSTRATIONS OF SOME SCHEFFÉ-TYPE
TESTS FOR SOME BEHRENS-FISHER-TYPE REGRESSION PROBLEMS

by

Richard F. Potthoff

October 1963

Contract No. AF AFOSR-62-196
It is sometimes required to compare two simple regression lines
without assuming that the variances of the two sets of error terms
are necessarily equal: one may wish to (A) test whether two parallel regression lines are identical, or (B) test whether two regression lines are parallel. In this paper, a t-test for each of these problems is explained and illustrated numerically. These t-tests bear a certain resemblance to Scheffé's test for the Behrens-Fisher problem, but (unlike the latter) are not randomized tests. This paper is on a practical rather than a theoretical level; the more technical aspects of the tests are covered in a separate paper (Mimeo Series No. 374).
This research was supported in part by Educational Testing Service; it was
supported in part by the Air Force Office of Scientific Research; and it was
supported in part by a grant from the NATO Research Grants Programme under a
joint arrangement between the Institut de Statistique of the University of
Paris, the Istituto di Calcolo delle Probabilità of the University of Rome, and
the Department of Statistics of the University of North Carolina.
Institute of Statistics
Mimeo Series No. 377
1. Introduction. In educational and psychological applications, it is sometimes necessary to make certain comparisons between two regression lines when the two variances are different. Such problems arise particularly in connection with experiments comparing two curriculums or two teaching methods (which often have different variances), as is explained more fully in [1] and [2]. The two principal types of problems are (A) testing whether two parallel regression lines are the same, and (B) testing whether two regression lines are parallel. These will be referred to as Problem A and Problem B respectively.
By generalizing an idea which Scheffé [4] used to devise a test for the Behrens-Fisher problem, another paper [3] gives a theoretical development of Scheffé-type tests for Problems A and B. The purpose of the present paper is to provide numerical illustrations of some of the techniques presented in [3]. The illustrations of this paper are all for the case of simple regression, even though Scheffé-type tests are available in [3] both for simple regression and for the more involved case of multiple regression. Scheffé's test itself is a randomized test, but the tests in [3] include both randomized and non-randomized
tests. The tests illustrated in this paper, however, are all non-randomized (under the provision that the independent variates all have distinct values), for it is suggested in [3] that, for the case of simple regression, the non-randomized tests would ordinarily be used in preference to the randomized tests.

An entirely different test for Problem A is described in [2], and for Problem B in [1]; other possible tests for Problem B which are in the literature are mentioned in [3]. All these tests, however, are inexact (i.e., the actual level of significance is only approximately equal to the stated level of significance) and tend to be less exact the smaller the sample. The Scheffé-type tests, on the other hand, are exact tests based on the t-distribution. Thus these Scheffé-type t-tests are particularly useful for small sample sizes.
The mathematical models for Problems A and B may be expressed respectively as follows:

Problem A. We suppose that we have one group consisting of M pairs (Y₁, X₁), (Y₂, X₂), ..., (Y_M, X_M), and another group consisting of N pairs (Z₁, W₁), (Z₂, W₂), ..., (Z_N, W_N). For each i, the observed value Y_i follows the relation

(1.1a)    Y_i = α_Y + β X_i + e_i ,

and for each j, the observed value Z_j follows the relation

(1.1b)    Z_j = α_Z + β W_j + f_j .

The α's and β are unknown parameters (regression coefficients), the X_i's and W_j's are specified constants, and the e_i's and f_j's are error terms which are independent and normal with mean 0 and unknown variances σ_e² and σ_f² respectively. The hypothesis to be tested is that α_Y = α_Z, i.e., that the two parallel regression lines associated with (1.1a) and (1.1b) are identical. We will also be able to obtain confidence bounds on (α_Z − α_Y). Standard regression methods cannot be used for this problem, of course, since two different variances σ_e² and σ_f² are involved.
Problem B. The model is the same as for Problem A, except that in place of (1.1) we have

(1.2a)    Y_i = α_Y + β_Y X_i + e_i

and

(1.2b)    Z_j = α_Z + β_Z W_j + f_j .

The hypothesis to be tested is that β_Y = β_Z, i.e., that the two regression lines associated with (1.2a) and (1.2b) are parallel. Confidence bounds on (β_Z − β_Y) can also be obtained.
For definiteness, we will always assume (with no loss of generality) that M ≤ N. The reader should refer to [1] and [2] for a further discussion of the relation of Problems A and B to the experimental situation of comparing two curriculums or two teaching methods.
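In modern notation, the two-sample structure of model (1.1) can be sketched in code. The following is an illustrative sketch only (the function name, parameters, and seeding are our own, not from the paper): it draws one data set with a common slope β, separate intercepts, and unequal error variances for the two groups.

```python
import random

def simulate_model_1_1(alpha_y, alpha_z, beta, sigma_e, sigma_f, xs, ws, seed=None):
    """Draw one data set from model (1.1):
    Y_i = alpha_Y + beta*X_i + e_i,   Z_j = alpha_Z + beta*W_j + f_j,
    where the e's and f's are independent normal errors with possibly
    unequal standard deviations sigma_e and sigma_f."""
    rng = random.Random(seed)
    ys = [alpha_y + beta * x + rng.gauss(0.0, sigma_e) for x in xs]
    zs = [alpha_z + beta * w + rng.gauss(0.0, sigma_f) for w in ws]
    return ys, zs
```

Under Problem A the hypothesis α_Y = α_Z would be tested on such data; for Problem B one would replace the common β by separate slopes β_Y and β_Z.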
2. The test for Problem A. This section tells how to calculate the test statistic for Problem A, and then Section 3 will present a numerical example.

Since this test, like that of Scheffé [4] for the Behrens-Fisher problem, is based on pairing of the two samples, it is first necessary to effect this pairing. To start, let the first group be arranged in order of increasing X and the second group in order of increasing W, so that X₁ < X₂ < ... < X_M and W₁ < W₂ < ... < W_N. (In case there are tied values of the X's or of the W's, it will be necessary to break the tie in some way, either by randomizing or else by some other objective but less arbitrary method. An alternative to randomizing, e.g., would be to break the tie on the basis of some second variable similar to X and W which otherwise would not even be utilized at all.)
Paired to each of the M members of the first group will be a member of the second group. Thus M of the W_j's will be paired, but the remaining (N−M) will be unpaired. Let W(i) denote that W_j (of the second group) which is paired to X_i; the Z which accompanies W(i) will be denoted by Z(i). The pairing is to be done according to the formula

(2.1)    W(i) = W_{N+1−i} ,    1 ≤ i ≤ ν ,
         W(i) = W_{M+1−i} ,    ν+1 ≤ i ≤ M ,

where the integer ν is determined in a way which will be described shortly. Note, however, that in the special case M=N, no ν is necessary, and the pairing (2.1) simply reduces to matching the X's and W's in opposite order.
The optimal way to choose ν (for the general case) is to set it equal to the smallest integer ν such that

(2.2)    δ_ν = Σ_{j=1}^{M−ν−1} W_j + Σ_{j=N−ν+1}^{N} W_j − (M−1)W̄ − [√(MN)/(M−1)](X_{ν+1} − X̄)

is < 0, where X̄ and W̄ are the means defined in (2.3) below. [The function δ_ν (2.2) decreases as ν increases, and will turn from positive to negative somewhere between ν = 0 and ν = M−1.] Actually, any objective method of choosing ν may be used and the probability of Type I error will be unaltered; however, the power of the test will be affected.
'V
If the experimenter prefers . not to use the optimal method for choosing
because he feels it may be too lengthy computationally, the following simple
5
but somewhat sub-optimal method is recommended instead:
choose
\) =
M/2 if
M is even, and choose \) = (M-i)/2 [or (M+l)/2] if M is odd.
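In code, the pairing (2.1) and the simple sub-optimal choice of ν can be sketched as follows (the function names are ours; the optimal δ_ν-based choice of (2.2) is omitted from this sketch):

```python
def simple_nu(m):
    """The simple sub-optimal recommendation for nu:
    M/2 for even M; (M-1)/2 (or (M+1)/2) for odd M."""
    return m // 2

def paired_w_indices(m, n, nu):
    """1-based indices j such that W(i) = W_j under pairing (2.1):
    W(i) = W_{N+1-i} for 1 <= i <= nu, and W(i) = W_{M+1-i} for nu < i <= M."""
    return [n + 1 - i if i <= nu else m + 1 - i for i in range(1, m + 1)]
```

For M = N (where no ν is needed, here ν = 0) this reduces to matching the two samples in opposite order, as noted above.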
After completing the pairing, we proceed as follows. Let

(2.3)    X̄ = (1/M) Σ_{i=1}^{M} X_i ,   Ȳ = (1/M) Σ_{i=1}^{M} Y_i ,   W̄ = (1/N) Σ_{j=1}^{N} W_j ,
         Z̄ = (1/N) Σ_{j=1}^{N} Z_j ,   W̄⁰ = (1/M) Σ_{i=1}^{M} W(i) ,   Z̄⁰ = (1/M) Σ_{i=1}^{M} Z(i) .

The test statistic is

(2.4)    t = [(Z̄⁰ − Ȳ) − (U'T_A/U*U)(W̄⁰ − X̄)] / √{S²[1/M + (W̄⁰ − X̄)²/U*U]} ,

where

(2.5)    U'T_A = (Σ X_iY_i − MX̄Ȳ) − √(M/N)(Σ X_iZ(i) − MX̄Z̄⁰)
                 − √(M/N)(Σ Y_iW(i) − MȲW̄⁰) + (M/N)(Σ W(i)Z(i) − MW̄⁰Z̄⁰) ,

(2.6)    U*U = (Σ X_i² − MX̄²) − 2√(M/N)(Σ X_iW(i) − MX̄W̄⁰) + (M/N)(Σ W(i)² − M(W̄⁰)²) ,

and

(2.7)    S² = [1/(M−2)] [(Σ Y_i² − MȲ²) − 2√(M/N)(Σ Y_iZ(i) − MȲZ̄⁰)
                         + (M/N)(Σ Z(i)² − M(Z̄⁰)²) − (U'T_A)²/U*U] .

The statistic t (2.4) is used to test the null hypothesis α_Y = α_Z. It follows the t-distribution with (M−2) degrees of freedom if the null hypothesis is true.
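As a computational sketch of Section 2 (our own code and names, following the formulas (2.3)–(2.7) as given above), the statistic can be assembled from the √(M/N)-scaled differences of the paired observations:

```python
from math import sqrt

def problem_a_t(x, y, w_paired, z_paired, n):
    """t statistic (2.4) for Problem A, with M-2 degrees of freedom.
    x, y: the M first-group observations (x sorted increasing);
    w_paired, z_paired: the second-group values W(i), Z(i) paired
    to x by (2.1); n: the full size N of the second group."""
    m = len(x)
    s = sqrt(m / n)                      # the sqrt(M/N) scaling factor
    # Scheffe-type scaled differences: C_i = X_i - s*W(i), D_i = Y_i - s*Z(i).
    c = [xi - s * wi for xi, wi in zip(x, w_paired)]
    d = [yi - s * zi for yi, zi in zip(y, z_paired)]
    cbar = sum(c) / m
    dbar = sum(d) / m
    # Centered sums of squares/products; these equal U*U of (2.6)
    # and U'T_A of (2.5) when expanded.
    uu = sum((ci - cbar) ** 2 for ci in c)
    uta = sum((ci - cbar) * (di - dbar) for ci, di in zip(c, d))
    b = uta / uu                                              # slope estimate
    s2 = (sum((di - dbar) ** 2 for di in d) - uta ** 2 / uu) / (m - 2)  # (2.7)
    dw = sum(w_paired) / m - sum(x) / m                       # W-bar-0 minus X-bar
    dz = sum(z_paired) / m - sum(y) / m                       # Z-bar-0 minus Y-bar
    return (dz - b * dw) / sqrt(s2 * (1.0 / m + dw ** 2 / uu))
```

When M = N the scaling factor is 1, and t reduces to (minus) the ordinary intercept t-statistic from regressing the differences D_i = Y_i − Z(i) on C_i = X_i − W(i).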
3. Numerical example of the test for Problem A. Suppose there are M=4 observations in the first group and N=6 observations in the second group. Arranging the first group in order of increasing X and the second group in order of increasing W, we have

    X₁ = 0 ,  X₂ = 7 ,  X₃ = 8 ,  X₄ = 9

and

    W₁ = 1 ,  W₂ = 2 ,  W₃ = 3 ,  W₄ = 4 ,  W₅ = 6 ,  W₆ = 8 .
The next step is to find ν. We do not have to calculate δ_ν for all values of ν, but rather we need only find the place where δ_ν goes from positive to negative. Let us start with an intermediate value of ν: ν = 2. Using (2.2), and noting that X̄ = 6 and (M−1)W̄ = 12, we get

    δ₂ = (1) + (6 + 8) − 12 − (2√6/3)(8 − 6) = 3 − (4/3)√6 < 0 .

Hence the first negative δ_ν could occur at ν = 2 or at ν < 2. We next calculate

    δ₁ = (1 + 2) + (8) − 12 − (2√6/3)(7 − 6) = −1 − (2/3)√6 ,

which is < 0. Now, evaluating δ_ν for the next lower value of ν, we find

    δ₀ = (1 + 2 + 3) − 12 − (2√6/3)(0 − 6) = −6 + 4√6 > 0 ,

which establishes that ν = 1. [Note: If we had simply chosen ν = M/2, then ν would be 2, not 1.] Hence we set ν = 1 in (2.1), and (2.1) then gives us

    Z(1) = Z₆ = 17.2     W(1) = W₆ = 8
    Z(2) = Z₃ = 2.5      W(2) = W₃ = 3
    Z(3) = Z₂ = 5.5      W(3) = W₂ = 2
    Z(4) = Z₁ = 6.9      W(4) = W₁ = 1
With respect to the quantities in (2.3), we may write

    MX̄ = 24 ,  MȲ = 67.9 ,  NW̄ = 24 ,  NZ̄ = 54.5 ,  MW̄⁰ = 14 ,  MZ̄⁰ = 32.1 .

Hence Z̄⁰ − Ȳ = −8.95 and W̄⁰ − X̄ = −2.5. In order to get t (2.4), we must obtain (2.5 − 2.7), for which we must first calculate sums of squares and cross-products such as Σ X_iY_i = 504.5. Now we use (2.5 − 2.7) to write

    U'T_A = 97.1 − (−63.524) − (−60.217) + 39.633 = 260.474 ,

    U*U = 131.387 ,

and S², whose final term is (260.474)²/131.387. Finally, (2.4) becomes

    t = −3.93/.277 = −14.19 .

The number of degrees of freedom for t is M−2 = 2. Hence, for a two-sided test at the α = .05 level, the critical region is |t| > 4.30. Since |−14.19| is > 4.30, we reject the null hypothesis α_Y = α_Z.

Confidence bounds on (α_Z − α_Y) are easily obtained. For example, a 95% confidence interval for (α_Z − α_Y) is given by −3.93 ± 4.30(.277), i.e., we state that −5.12 < (α_Z − α_Y) < −2.74 with 95% confidence.
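The pairing and interval arithmetic of this example can be replayed in a few lines (a sketch with our own variable names; the Z-values entered are the paired ones listed in the example):

```python
# Ordered W's of the second group and the nu = 1 pairing of Section 3.
w_sorted = [1, 2, 3, 4, 6, 8]                  # W_1 < ... < W_6
z_by_j = {1: 6.9, 2: 5.5, 3: 2.5, 6: 17.2}     # Z_j for the paired j's
m, n, nu = 4, 6, 1

# Pairing (2.1): W(i) = W_{N+1-i} for i <= nu, W(i) = W_{M+1-i} otherwise.
j_paired = [n + 1 - i if i <= nu else m + 1 - i for i in range(1, m + 1)]
w_paired = [w_sorted[j - 1] for j in j_paired]
z_paired = [z_by_j[j] for j in j_paired]

# 95% confidence bounds for (alpha_Z - alpha_Y): -3.93 +/- 4.30(.277).
lower = -3.93 - 4.30 * 0.277
upper = -3.93 + 4.30 * 0.277
```

This reproduces MW̄⁰ = 14, MZ̄⁰ = 32.1, and the interval −5.12 < (α_Z − α_Y) < −2.74.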
4. The test for Problem B. This section tells how to calculate the test statistic for Problem B, and then Section 5 will present two numerical examples.

As before, the first thing to do is to pair the two samples. We start again by assuming X₁ < X₂ < ... < X_M and W₁ < W₂ < ... < W_N.
Let ν be the smallest integer such that

(4.1)    γ_ν = (1/2)(W_{N−ν} + W_{M−ν}) − [1/(M−1)](Σ_{j=1}^{M−ν−1} W_j + Σ_{j=N−ν+1}^{N} W_j)

is < 0. [This function γ_ν (4.1), like (2.2), decreases as ν increases. As in the case of the previous test, it is permissible to choose ν by an objective method different from the one just recommended. No ν is needed if M=N.]

Now define the two pairings

(4.2a)    W⁺(i) = W_i ,        1 ≤ i ≤ M−ν ;    W⁺(i) = W_{N−M+i} ,   M−ν+1 ≤ i ≤ M ,

and

(4.2b)    W⁻(i) = W_{N+1−i} ,  1 ≤ i ≤ ν ;      W⁻(i) = W_{M+1−i} ,   ν+1 ≤ i ≤ M .

Only one of the two pairings (4.2) will be used. To find out which one it is to be, we compute the correlation coefficients between the X_i's and W⁺(i)'s and between the X_i's and W⁻(i)'s. We select whichever pairing leads to the correlation coefficient with larger absolute value. (Note that the two correlation coefficients have the same denominator.) We use Ŵ(i) to denote the pairing which is chosen (i.e., either W⁺(i) or W⁻(i), whichever it turns out to be), and Z(i) to denote the Z which accompanies Ŵ(i); the correlation coefficient is then

(4.3)    ρ = [Σ_{i=1}^{M} X_iŴ(i) − MX̄W̄⁰] / √(Mσ_X² · Nσ²_{W⁰}) ,

with the σ² quantities as defined in (4.4) below. [In this section, we define all means the same as in (2.3).] Some future formulas will contain (±) and (∓) signs. The upper sign is to be chosen in all places if pairing (4.2a) was selected, and the lower sign if pairing (4.2b) was selected.
We define

(4.4)    Mσ_X² = Σ_{i=1}^{M} X_i² − MX̄² ,   Nσ_W² = Σ_{j=1}^{N} W_j² − NW̄² ,
         Nσ²_{W⁰} = Σ_{i=1}^{M} Ŵ(i)² − M(W̄⁰)² ,

and calculate

(4.5)    R = σ_W/σ_{W⁰} = √(Nσ_W² / Nσ²_{W⁰}) .

At this stage, we specify two cases: Case I if |ρ| ≤ 1/R, and Case II if |ρ| > 1/R.

Case I. |ρ| ≤ 1/R. We calculate

(4.6)    t = (β̂_Z − β̂_Y) / √[λ(R,ρ) V / (M−3)] ,

where
(4.7)    λ(R,ρ) = (1 + R² ∓ 2ρR)/(1 − ρ²) = (R−1)²/[2(1 ∓ ρ)] + (R+1)²/[2(1 ± ρ)] ,

(4.8)    β̂_Y = [(u_Y T_B) ± ρR(u_Z T_B)] / (1 − ρ²) ,

(4.9)    β̂_Z = [± ρR(u_Y T_B) + R²(u_Z T_B)] / (1 − ρ²) ,

(4.10)   V = (1/Mσ_X²)(Σ Y_i² − MȲ²) + (1/Nσ_W²)(Σ Z(i)² − M(Z̄⁰)²)
             ∓ [2/√(Mσ_X² Nσ_W²)](Σ Y_iZ(i) − MȲZ̄⁰) − β̂_Y(u_Y T_B) − β̂_Z(u_Z T_B) ,

(4.11)   u_Y T_B = (1/Mσ_X²)(Σ X_iY_i − MX̄Ȳ) ∓ [1/√(Mσ_X² Nσ_W²)](Σ X_iZ(i) − MX̄Z̄⁰) ,

and

(4.12)   u_Z T_B = (1/Nσ_W²)(Σ Ŵ(i)Z(i) − MW̄⁰Z̄⁰) ∓ [1/√(Mσ_X² Nσ_W²)](Σ Y_iŴ(i) − MȲW̄⁰) .
Case II. |ρ| > 1/R. We calculate

(4.13)   t = (β̂_Z − β̂_Y) / [(1/|ρ|) √(V′/(M−3))] ,

where

(4.14)   β̂_Y = [(u_Y T_B) + (u_Z T_B)] / (1 − ρ²) ,

(4.15)   β̂_Z = [(u_Y T_B) + (1/ρ²)(u_Z T_B)] / (1 − ρ²) ,

(4.16)   V′ = (1/Mσ_X²)(Σ Y_i² − MȲ²) + (ρ²/Nσ²_{W⁰})(Σ Z(i)² − M(Z̄⁰)²)
              ∓ [2ρ/√(Mσ_X² Nσ²_{W⁰})](Σ Y_iZ(i) − MȲZ̄⁰) − β̂_Y(u_Y T_B) − β̂_Z(u_Z T_B) ,

(4.17)   u_Y T_B = (1/Mσ_X²)(Σ X_iY_i − MX̄Ȳ) ∓ [ρ/√(Mσ_X² Nσ²_{W⁰})](Σ X_iZ(i) − MX̄Z̄⁰) ,

and

(4.18)   u_Z T_B = (ρ²/Nσ²_{W⁰})(Σ Ŵ(i)Z(i) − MW̄⁰Z̄⁰) ∓ [ρ/√(Mσ_X² Nσ²_{W⁰})](Σ Y_iŴ(i) − MȲW̄⁰) .

Note also that (4.14) and (4.15) imply the formula β̂_Z − β̂_Y = (u_Z T_B)/ρ².

To test the hypothesis β_Y = β_Z, the statistic t (4.6 or 4.13) is referred to tables of the t-distribution with (M−3) degrees of freedom.
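The whole Section 4 recipe can be sketched as one routine (our own code and names; it implements only the upper-sign, i.e. W⁺, branch of the formulas, which is the branch both examples of Section 5 happen to use). Because several of the printed intermediate values in Section 5 were hand-rounded, exact arithmetic reproduces the printed t only approximately.

```python
from math import sqrt

def nu_for_b(w, m):
    """Smallest nu with gamma_nu < 0, per (4.1); w sorted increasing."""
    n = len(w)
    for nu in range(0, m):
        g = (w[n - nu - 1] + w[m - nu - 1]) / 2.0 \
            - (sum(w[:m - nu - 1]) + sum(w[n - nu:])) / (m - 1.0)
        if g < 0:
            return nu
    return m - 1

def problem_b_t(x, y, w, z):
    """Problem B test statistic of Section 4, with M-3 df.
    x, y: first group (x sorted); w, z: second group (w sorted).
    Assumes the W+ pairing (4.2a) wins the correlation comparison,
    so the upper-sign forms of (4.7)-(4.12) / (4.16)-(4.18) apply."""
    m, n = len(x), len(w)
    nu = nu_for_b(w, m)
    jp = [i if i <= m - nu else n - m + i for i in range(1, m + 1)]      # (4.2a)
    jm = [n + 1 - i if i <= nu else m + 1 - i for i in range(1, m + 1)]  # (4.2b)

    def s(u, v):
        """Centered cross-product: sum (u - ubar)(v - vbar)."""
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        return sum((a - mu) * (b - mv) for a, b in zip(u, v))

    wp_plus = [w[k - 1] for k in jp]
    wp_minus = [w[k - 1] for k in jm]
    assert abs(s(x, wp_plus)) >= abs(s(x, wp_minus)), \
        "W- pairing chosen; the lower-sign formulas would be needed"
    wp = wp_plus
    zp = [z[k - 1] for k in jp]
    msx2 = s(x, x)                        # M*sigma_X^2, per (4.4)
    nsw2 = s(w, w)                        # N*sigma_W^2
    nsw02 = s(wp, wp)                     # N*sigma_{W0}^2
    rho = s(x, wp) / sqrt(msx2 * nsw02)   # (4.3)
    R = sqrt(nsw2 / nsw02)                # (4.5)
    if abs(rho) <= 1.0 / R:               # Case I
        root = sqrt(msx2 * nsw2)
        uy = s(x, y) / msx2 - s(x, zp) / root                   # (4.11)
        uz = s(wp, zp) / nsw2 - s(y, wp) / root                 # (4.12)
        by = (uy + rho * R * uz) / (1 - rho ** 2)               # (4.8)
        bz = (rho * R * uy + R ** 2 * uz) / (1 - rho ** 2)      # (4.9)
        lam = (1 + R ** 2 - 2 * rho * R) / (1 - rho ** 2)       # (4.7)
        v = (s(y, y) / msx2 + s(zp, zp) / nsw2
             - 2 * s(y, zp) / root - by * uy - bz * uz)         # (4.10)
        t = (bz - by) / sqrt(lam * v / (m - 3))                 # (4.6)
    else:                                  # Case II
        root = sqrt(msx2 * nsw02)
        uy = s(x, y) / msx2 - rho * s(x, zp) / root             # (4.17)
        uz = rho ** 2 * s(wp, zp) / nsw02 - rho * s(y, wp) / root   # (4.18)
        by = (uy + uz) / (1 - rho ** 2)                         # (4.14)
        bz = (uy + uz / rho ** 2) / (1 - rho ** 2)              # (4.15)
        v = (s(y, y) / msx2 + rho ** 2 * s(zp, zp) / nsw02
             - 2 * rho * s(y, zp) / root - by * uy - bz * uz)   # (4.16)
        t = abs(rho) * (bz - by) / sqrt(v / (m - 3))            # (4.13)
    return t, m - 3
```

Run on the data of the first example of Section 5 it selects ν = 3, the W⁺ pairing, and Case I, and returns a t near the printed value of about 26.6.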
5. Numerical examples of the test for Problem B. We present two examples, one for Case I and the other for Case II. For the first example, suppose the first group contains the M=6 observations
    Y₁ = 0.7   Y₂ = 2.4   Y₃ = 1.9   Y₄ = 2.4   Y₅ = 4.2   Y₆ = 4.5
    X₁ = 0     X₂ = 2     X₃ = 4     X₄ = 6     X₅ = 13    X₆ = 17

and suppose the second group contains N=7 observations (Z_j, W_j), arranged so that

    W₁ = 0 ,  W₂ = 1 ,  W₃ = 2 ,  W₄ = 3 ,  W₅ = 5 ,  W₆ = 7 ,  W₇ = 9 ;

the Z-values which the calculations use appear in the pairing below. Using (4.1), we find that

    γ₂ = (1/2)(5 + 3) − (1/5)(0 + 1 + 2 + 7 + 9) = 0.2

and

    γ₃ = (1/2)(3 + 2) − (1/5)(0 + 1 + 5 + 7 + 9) = −1.9 ,

so that ν = 3 is the proper ν-value to use in (4.2). Thus we obtain

    W⁺(1) = 0 ,  W⁺(2) = 1 ,  W⁺(3) = 2 ,  W⁺(4) = 5 ,  W⁺(5) = 7 ,  W⁺(6) = 9

and

    W⁻(1) = 9 ,  W⁻(2) = 7 ,  W⁻(3) = 5 ,  W⁻(4) = 2 ,  W⁻(5) = 1 ,  W⁻(6) = 0

from (4.2). As for the means (2.3), we will need the values

    MX̄ = 42 ,  MȲ = 16.1 ,  NW̄ = 27 ,  MW̄⁰ = 24 ,  MZ̄⁰ = 78.5 .

Now we find

    Σ X_iW⁺(i) − MX̄W̄⁰ = 284 − 168 = 116

and

    Σ X_iW⁻(i) − MX̄W̄⁰ = 59 − 168 = −109 .

Since |116| > |−109|, we take Ŵ(i) = W⁺(i), so that
    Z(1) = 3.2   Z(2) = 5.0   Z(3) = 8.5   Z(4) = 15.7   Z(5) = 20.6   Z(6) = 25.5
    Ŵ(1) = 0     Ŵ(2) = 1     Ŵ(3) = 2     Ŵ(4) = 5      Ŵ(5) = 7      Ŵ(6) = 9

The sums of squares and cross-products which we will need are

    Σ X_i² = 514 ,   Σ X_iY_i = 157.9 ,   Σ Y_i² = 53.51 ,   Σ X_iZ(i) = 839.5 ,
    Σ X_iŴ(i) = 284 ,   Σ_{j=1}^{N} W_j² = 169 ,   Σ Ŵ(i)² = 160 ,
    Σ Ŵ(i)Z(i) = 474.2 ,   Σ Y_iŴ(i) = 88.1 ,   Σ Y_iZ(i) = 269.34 .

Thus we obtain

    Mσ_X² = 220 ,   Nσ_W² = 64.857 ,   Nσ²_{W⁰} = 64 ,

    ρ = 116/√(220 × 64) = .977590 ,

and

    R = √(64.857/64) = √1.013393 = 1.006674

from (4.4), (4.3), and (4.5). Since R|ρ| = .984114 < 1, this example falls within Case I.
Wherever there are two signs in (4.7 − 4.12), we choose the upper sign (since the upper signs are for W⁺(i) and the lower signs for W⁻(i)). Thus (4.11 − 4.12) become

    u_Y T_B = (1/220)(45.2) − [1/√(220 × 64.857)](290.0) = −2.222317

and

    u_Z T_B = (1/64.857)(160.2) − [1/√(220 × 64.857)](23.7) = 2.271636 ,

and so (4.8 − 4.10) are

    β̂_Y = [(−2.222317) + .984114(2.271636)] / (1 − .955682) = .29856 ,

    β̂_Z = [.984114(−2.222317) + 1.013393(2.271636)] / (1 − .955682) = 2.5959 ,

and

    V = .021884 ,

in which the cross-product term carries the coefficient 2/√(220 × 64.857). Finally, (4.7) is

    λ = 1.0191 ,

and thus the test statistic (4.6) is

    t = (2.5959 − .29856) / √[(1.0191)(.021884)/(6−3)] = 2.297/.08622 = 26.64 .

This t has M−3 = 3 degrees of freedom. Hence the critical region for a two-sided test at the α = .05 level is |t| > 3.18. Since |26.64| > 3.18, we reject the null hypothesis β_Y = β_Z. A 95% confidence interval on (β_Z − β_Y) is given by

    2.30 ± 3.18(.0862) ,

i.e., 2.03 < (β_Z − β_Y) < 2.57 .
For our second example, we will use exactly the same figures as for the first example, except that we will suppose that X₄ = 9 instead of 6. Since the W_j's are the same as before, ν will also be the same as before, and therefore the W⁺(i)'s and W⁻(i)'s will likewise be as before. The means will stay the same except that MX̄ = 45. We find

    Σ X_iW⁺(i) − MX̄W̄⁰ = 299 − 180 = 119

and

    Σ X_iW⁻(i) − MX̄W̄⁰ = 65 − 180 = −115 .

Thus we take Ŵ(i) = W⁺(i) (since |119| > |−115|); this means that the Ŵ(i)'s and Z(i)'s are exactly as before. The sums of squares and cross-products will be the same as before except for

    Σ X_i² = 559 ,   Σ X_iY_i = 165.1 ,   Σ X_iZ(i) = 886.6 ,   Σ X_iŴ(i) = 299 .

Hence Mσ_X² = 221.5 now, but Nσ_W², Nσ²_{W⁰}, and R are unaltered. We obtain

    ρ = 119/√(221.5 × 64) = .999471 .

This time, R|ρ| = 1.006141 > 1, which means that we are thrown into Case II.
It remains to calculate (4.17 − 4.18), (4.14 − 4.16), and (4.13). We find

    u_Y T_B = (1/221.5)(44.35) − [.999471/√(221.5 × 64)](297.85) = −2.300067 ,

    u_Z T_B = (.998942/64)(160.2) − [.999471/√(221.5 × 64)](23.7) = 2.301528 ,

    β̂_Y = [(−2.300067) + (2.301528)] / (1 − .998942) = 1.3809 ,

    β̂_Z = [(−2.300067) + (1/.998942)(2.301528)] / (1 − .998942) = 3.6853 ,

and

    V′ = .022953 ,

in which the cross-product term carries the coefficient 2 × .999471/√(221.5 × 64). Finally,

    t = (3.6853 − 1.3809) / [(1/.999471)√(.022953/(6−3))] = 2.304/.08752 = 26.33 .

Since |26.33| > 3.18, we reject the null hypothesis β_Y = β_Z. A 95% confidence interval for (β_Z − β_Y) is given by

    2.30 ± 3.18(.0875) ,

i.e., 2.02 < (β_Z − β_Y) < 2.58 .
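The closing identity of Section 4 — that (4.14) and (4.15) force β̂_Z − β̂_Y = (u_Z T_B)/ρ² in Case II — is easy to confirm numerically (a small check in our own notation):

```python
def case2_slope_diff(u_y, u_z, rho2):
    """Return (beta_Z - beta_Y) computed two ways in Case II:
    via the estimates (4.14)-(4.15), and via the identity u_Z/rho^2."""
    beta_y = (u_y + u_z) / (1.0 - rho2)          # (4.14)
    beta_z = (u_y + u_z / rho2) / (1.0 - rho2)   # (4.15)
    return beta_z - beta_y, u_z / rho2
```

With the rounded values of the second example (u_Y T_B = −2.300067, u_Z T_B = 2.301528, ρ² = .998942) both expressions give about 2.304, the numerator of (4.13).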
REFERENCES

[1] Potthoff, Richard F., "Illustration of a Technique Which Tests Whether Two Regression Lines Are Parallel When the Variances Are Unequal", Mimeo Series No. 320, Department of Statistics, University of North Carolina, 1962.

[2] Potthoff, Richard F., "Illustration of a Test Which Compares Two Parallel Regression Lines When the Variances Are Unequal", Mimeo Series No. 323, Department of Statistics, University of North Carolina, 1962.

[3] Potthoff, Richard F., "Some Scheffé-Type Tests for Some Behrens-Fisher-Type Regression Problems", Mimeo Series No. 374, Department of Statistics, University of North Carolina, 1963.

[4] Scheffé, Henry, "On Solutions of the Behrens-Fisher Problem, Based on the t-Distribution", Annals of Mathematical Statistics, Volume 14 (1943), pp. 35-44.