THE APPLICATION OF A CONFIDENTIALITY
TRANSFORMATION TO OBSERVATIONS
FROM A SIMPLE LINEAR REGRESSION MODEL

by

Gipsie Bush Ranney

Institute of Statistics
Mimeograph Series No. 10112
Raleigh - November 1975
TABLE OF CONTENTS

                                                                 Page

LIST OF TABLES  . . . . . . . . . . . . . . . . . . . . . . . . .   v

1.  INTRODUCTION  . . . . . . . . . . . . . . . . . . . . . . . .   1

    1.1  Background . . . . . . . . . . . . . . . . . . . . . . .   1
    1.2  The Problem to be Considered . . . . . . . . . . . . . .   3

2.  THE MODEL . . . . . . . . . . . . . . . . . . . . . . . . . .   5

    2.1  The Model Prior to Transformation  . . . . . . . . . . .   5
    2.2  The Transformation Procedure . . . . . . . . . . . . . .   5
    2.3  Properties of the Permutation Matrices, G_s  . . . . . .   9
    2.4  Properties of the Transformed Observations . . . . . . .  17

3.  LEAST SQUARES ESTIMATION USING THE TRANSFORMED
    OBSERVATIONS  . . . . . . . . . . . . . . . . . . . . . . . .  26

    3.1  Estimation of μ and β  . . . . . . . . . . . . . . . . .  26
    3.2  Estimation of σ² . . . . . . . . . . . . . . . . . . . .  38
    3.3  Some Asymptotic Properties of the Least Squares
         Estimators . . . . . . . . . . . . . . . . . . . . . . .  55

4.  TESTING THE HYPOTHESIS β = 0  . . . . . . . . . . . . . . . .  64

    4.1  The Test Statistic F*  . . . . . . . . . . . . . . . . .  64
    4.2  Small Sample Simulations of the Power of the Test
         Based on F*  . . . . . . . . . . . . . . . . . . . . . .  68
    4.3  The Asymptotic Efficiency of the Test Based on F*
         Relative to the Test Based on F  . . . . . . . . . . . .  76
    4.4  Further Simulation Results . . . . . . . . . . . . . . .  89

5.  CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH . . . . . . .  97

6.  LIST OF REFERENCES  . . . . . . . . . . . . . . . . . . . . . 100
ABSTRACT

RANNEY, GIPSIE BUSH.  The Application of a Confidentiality Transformation to Observations From a Simple Linear Regression Model.  (Under the direction of C. H. PROCTOR and A. H. E. GRANDAGE.)

Information about individuals is routinely collected and maintained in data files by government and private agencies.  The persons involved may object to the dissemination of certain sensitive information in the form of individual records, even though this may be necessary for legitimate purposes of analysis.  The information may, however, be rendered unidentifiable in order to protect the privacy of the individual.  Various methods of doing this have been proposed within the framework of randomized response, such as multiplication of a sensitive item by a random component or the addition of a random component to the item.

Another approach, considered in this thesis, is to randomly permute all or some of the observations on sensitive items prior to the release of the data.  Such a procedure disturbs the relationship between a sensitive item and other items on the individual records.  The degree of disturbance of that relationship is related to the amount of privacy afforded the individual, and both depend on w, the proportion permuted.

This thesis studies the effects of such a transformation procedure on estimation and testing for a simple linear regression model with the sensitive item as the dependent variable.  Least squares estimators of μ and β in the model are discussed.  The usual estimator of σ² is found to be biased, and an unbiased estimator which is a linear function of the regression and error sums of squares is developed.  Relative efficiencies of the estimators of β are found in terms of w, the ratio β²/σ², and the number of observations, and they are computed for some specific examples.  Some asymptotic properties of the estimators are also derived.  A test of the hypothesis β = 0 and its power are discussed, and the results of small sample simulations are shown for two examples.  The asymptotic efficiency of this test based on transformed data relative to the usual test based on the original observations is found for certain asymptotic properties of the observations on the independent variable.  Finally, some further simulation results are presented.
THE APPLICATION OF A CONFIDENTIALITY TRANSFORMATION TO
OBSERVATIONS FROM A SIMPLE LINEAR REGRESSION MODEL

by

GIPSIE BUSH RANNEY

A thesis submitted to the Graduate Faculty of
North Carolina State University at Raleigh
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy

DEPARTMENT OF STATISTICS

RALEIGH

1975

APPROVED BY:

Co-Chairman of Advisory Committee          Co-Chairman of Advisory Committee
BIOGRAPHY

The author was born September 15, 1942, in Kingsport, Tennessee, and is the daughter of Raymond B. and Lola Marcum Bush.  She attended Kingsport public schools and was graduated from Dobyns-Bennett High School in 1960.

After receiving a Bachelor of Science degree with a major in mathematics from Duke University in June, 1963, the author worked as a computer programmer for Tennessee Eastman Company.  She enrolled in the Graduate School of North Carolina State University in July, 1964, and received the Master of Experimental Statistics degree in 1966.  She continued her enrollment in the graduate program of the Department of Statistics while she was employed as a statistician by the Research Triangle Institute.  In 1971, the author moved to Georgia to be with her husband, J. Warren Ranney, who is an employee of the U. S. Forest Service.  In October, 1974, she returned to Raleigh and has been a part-time employee of the Research Triangle Institute while completing her dissertation research for the Doctor of Philosophy degree in statistics.
ACKNOWLEDGEMENTS
The author wishes to express her sincere appreciation to Dr. C. H.
Proctor, co-chairman of her advisory committee, who suggested the topic
for this research and provided valuable assistance throughout its
execution.
The continued guidance and encouragement provided by Dr.
A. H. E. Grandage, co-chairman of her advisory committee, have contributed significantly to the successful completion of the author's
graduate program.
Also, appreciation is extended to the other members
of the advisory committee, Dr. R. J. Monroe, Dr. C. P. Quesenberry,
and Dr. H. L. W. Nuttle, for their helpful suggestions and guidance.
Special gratitude is expressed to Dr. W. K. Poole and Dr. D. G.
Horvitz of the Research Triangle Institute and to Dr. A. L. Finkner
of the U. S. Bureau of the Census for their encouragement and
counsel throughout the author's graduate program.
To her parents, Mr. and Mrs. Raymond Bush, and her husband, Jack,
the author expresses her deepest appreciation for their support and
patience and for the sacrifices they have made in her behalf.
Finally, the author gives special thanks to Ms. Linda
Bielawski who has invested many hours typing both the draft and final
copies of this thesis.
LIST OF TABLES

                                                                 Page

3.1  Relative efficiencies of β̂* for three sets of X
     values, n = 100  . . . . . . . . . . . . . . . . . . . . . .  35

3.2  Asymptotic relative efficiencies of β̂_u, w
     assumed constant . . . . . . . . . . . . . . . . . . . . . .  53

3.3  Relative mean square errors of β̂*, β̂_u and β̂_uw for
     sets of X values, n = 100  . . . . . . . . . . . . . . . . .  62

4.1  Small sample simulated power values for X values
     equally spaced . . . . . . . . . . . . . . . . . . . . . . .  72

4.2  Small sample simulated power values for X values
     expected values of N(0,1) order statistics . . . . . . . . .  74

4.3  Simulated power values and z statistics for F*
     distributed as χ'²(λ*), n = 500  . . . . . . . . . . . . . .  93

4.4  Simulated power values and z statistics for three
     sets of X values and n = 100, 200  . . . . . . . . . . . . .  94

4.5  Simulated power values and z statistics for X values
     expected values of N(0,1) order statistics for
     n = 100  . . . . . . . . . . . . . . . . . . . . . . . . . .  96
1.  INTRODUCTION

1.1  Background
The problem of protection of privacy has received an increasing
amount of attention in recent years, both within and outside the
statistical profession.
This is due, in part, to growing awareness
of the power of the computer as a tool for the creation and maintenance of data banks and for linking items of information about
individuals from many separate sources.
Also, a greater demand has
been made upon individuals to supply information to various government
and private statistical agencies.
Due to the fact that legitimate
needs for information related to research and decision-making do
exist, these agencies are pressed to disseminate the data they have
collected, not only in aggregate form, but also in the form of
individual records.
Since the uses to which such data are to be put
will often be beneficial in some sense, the question of how to provide the needed information without violating the privacy of the
individual arises.
A distinction should perhaps be made between the protection of
privacy and the maintenance of data security.
Recently, techniques
have been developed for use by data gathering agencies to protect
their computer files against undetected or accidental disclosure.
These techniques often involve the application of encrypting devices
which are reversible.
Such techniques are applied for the purpose
of maintaining data security internally and while transmitting data
files from one point or organization to another.
Although data
security and protection of privacy are certainly related, the problem
to be considered here is one of data dissemination, rather than
maintenance of internal data security.
If it can be established that a useful purpose can be served by
releasing a set of data in the form of individual records, the
question arises as to how individual records may be released without
violating privacy.
The removal of direct identification may not
always prevent the individual from being identified if sufficient
other information about the individual is contained in the record.
One approach which has been used is the release of a small sample of
individual records with direct identification removed.
Another approach will be considered here.
Suppose certain data
items which would be considered sensitive in the sense of violating
privacy, such as income or financial assets, could be identified.
Such items could be removed from the individual records.  However,
these items may be considered important for purposes of analysis.
Various methods for rendering such sensitive information
unidentifiable have been proposed within the framework of randomized
response.
These methods include the multiplication of the sensitive
item by a random component, the addition to the item of a random component, or randomizing the order of the records and adding the
sensitive items from successive records.
These types of methods are
discussed by Warner in (13).  Also, Poole (8) has investigated the
estimation of the distribution function of a random variable which
has been multiplied by a random component whose distribution is known.
Warner (13) suggested another approach which entails the random
permutation of the observations.
If the individual observations on a
sensitive item are randomly permuted, the observations on that item
remain intact, but the relation between that item and other items in
the record is disturbed.
The degree of disturbance of that relation
is related to the amount of privacy afforded by such a procedure.
Another possible application of the procedure to be considered
should be mentioned here.
At times, the privacy of individuals may be
violated when data about the individuals is subpoenaed in a legal proceeding.
It might be possible to protect a set of data from subpoena
by randomly permuting some small proportion of the observations on
sensitive items.  In this context, the problem then becomes one of
maintenance of data security in the sense of preventing the data from
being presented as legal evidence.

1.2  The Problem to be Considered
Suppose that a particular item such as income has been identified
as sensitive and that the individual observations on that variable
have been randomly permuted.
Suppose further that in analyzing the
data a simple linear regression model with the permuted variable as
the dependent variable and some other observed variable as the independent variable has been assumed to be appropriate.
It is then of
interest to examine the effect of applying such a transformation on
the usual estimators and tests performed for such a model.
Although
it may be unlikely that a linear regression model with a single
independent variable would be proposed as the appropriate model for
analyzing data from a human population, the investigation of the
effects of applying the technique to observations from such a model
will provide some basis for evaluation of the technique.
A particular type of random permutation of a sensitive item has
been selected for consideration.
Rather than selecting a random
permutation from the set of all possible permutations which could
occur, the set of permutations has been restricted in such a way that
a certain fixed number or percentage of the observations will be interchanged by the permutation selected.
Then it is assumed that the data
would be made available, and the number of observations which had been
interchanged would be reported along with a description of the transformation procedure which had been applied.
The problem to be addressed is the examination of the effects of
the application of such a procedure to a set of observations from a
simple linear regression model when least squares techniques are used
to estimate the parameters of the model.
Comparisons are made on the
basis of loss of information and bias introduced by the use of the
transformation with the original untransformed set of observations
taken as the point of reference.
The underlying model and assumptions
which are assumed to have produced the original observations, the
transformation procedure and properties of the resulting set of
observations are discussed in Chapter 2.
In Chapter 3, least squares
estimation of the parameters of the original model is considered.
Chapter 4 contains a discussion of the problem of testing the
hypothesis that the regression coefficient in the underlying model is
equal to zero.
Finally, general conclusions and suggestions for
future research are presented in Chapter 5.
2.  THE MODEL

2.1  The Model Prior to Transformation
Assume that n observations have been selected at random from a population with model

    Y = Xγ + ε ,                                                 (2.1.1)

where Y is an n-element column vector of observations on some sensitive item and X is an n × 2 matrix of the form

    X = [1, x] ,                                                 (2.1.2)

where x_i = X_i - X̄, 1 is a vector of ones, and the components of the vector x are fixed quantities measured without error.  The 2 × 1 parameter vector is

    γ = [μ, β]' ,                                                (2.1.3)

and the vector ε is such that

    e(ε) = 0   and   V(ε) = σ²I .                                (2.1.4)

When necessary, the components of ε will be assumed to be normally distributed.
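For concreteness, a set of observations from this model can be generated numerically.  A minimal sketch (the sample size, parameter values, and X values are arbitrary illustrative choices, not taken from the thesis):

```python
import random

# Hypothetical values chosen only for illustration.
n, mu, beta, sigma = 10, 2.0, 1.5, 1.0
rng = random.Random(0)

X = [float(i) for i in range(1, n + 1)]   # fixed X values, measured without error
xbar = sum(X) / n
x = [Xi - xbar for Xi in X]               # centered values x_i = X_i - X̄
Y = [mu + beta * xi + rng.gauss(0, sigma) for xi in x]  # Y = Xγ + ε, componentwise

print(abs(sum(x)) < 1e-9)                 # centering gives Σ x_i = 0
```

The property Σ x_i = 0 produced by the centering is used repeatedly in the derivations that follow.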
2.2  The Transformation Procedure

The observations for the model described in 2.1 are, in fact, ordered pairs (Y_i, X_i), or (Y_i, x_i) when the observations on the X values are corrected for their mean.  The transformation to be considered reorders the Y values and produces a new set of ordered pairs, some of which are different from the original set and some of which are the same.

Suppose that a certain percentage, w, of the original pairs are to be altered by the transformation.  Then the transformation procedure to be considered would consist of selecting a simple random sample of r = wn of the n observations and interchanging the Y values in the sample, so that in the new set of observations r of the Y values are no longer paired with the corresponding x-values which produced them.  It will be assumed that wn is integer valued for purposes of analysis.  Also, r must be at least two in order to perform any interchange.

The interchange can be achieved in practice by first selecting a simple random sample of r of the subscripts 1, 2, ..., n which are used to index the observations, and randomly permuting the r integers selected.  The resulting set of r permuted integers is then cycled one step to produce a second set of r integers.  The two sequences of integers are combined to form a set of r ordered pairs, the first member taken from the original permutation and the second member taken from the cycled permutation.  Finally, the resulting set of ordered pairs is used to determine the interchange to take place by using the first member of each pair to denote the original subscript of a Y value and the second member to denote the original subscript of the x value with which it will be paired in the new set of n observations.  The n - r integers not selected in the sample of subscripts denote the observations which will be unaltered.
For example, suppose there are ten observations in the original sample, denoted by (Y_1, x_1), ..., (Y_10, x_10).  Suppose w = .4, so that r = 4, and the subscripts or integers selected are 2, 3, 5 and 8.  Suppose further that random permutation of these integers produces the sequence 5, 3, 2, 8.  Cycling one step then yields the sequence 8, 5, 3, 2.  Combining the two sequences produces the resulting set of ordered pairs:

    (5,8), (3,5), (2,3), (8,2) .

Therefore, the resulting set of ten observations will be

    (Y_1,x_1), (Y_8,x_2), (Y_2,x_3), (Y_4,x_4), (Y_3,x_5),
    (Y_6,x_6), (Y_7,x_7), (Y_5,x_8), (Y_9,x_9), (Y_10,x_10) ,

where the subscripts refer to the original set of observations.
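The interchange procedure can be sketched in code.  The function below is an illustrative implementation of the cycling scheme described above, not code from the thesis; the demonstration fixes the sampled subscripts and their permutation to match the worked example (converted to 0-based indices):

```python
import random

def confidentiality_transform(y, r, rng=None):
    """Re-pair r of the n responses in y by the cycling interchange:
    draw a simple random sample of r subscripts, permute them, cycle
    the permutation one step, and move each selected Y value to the
    x position given by the cycled sequence."""
    rng = rng or random.Random()
    perm = rng.sample(range(len(y)), r)   # SRS of r subscripts, in random order
    cycled = perm[-1:] + perm[:-1]        # cycle one step
    y_star = list(y)                      # the n - r unselected pairs are unaltered
    for y_sub, x_sub in zip(perm, cycled):
        y_star[x_sub] = y[y_sub]          # Y_{y_sub} is now paired with x_{x_sub}
    return y_star

# Worked example from the text, with the permuted subscripts 5, 3, 2, 8
# fixed (written 0-based as 4, 2, 1, 7) instead of drawn at random.
y = ["Y1", "Y2", "Y3", "Y4", "Y5", "Y6", "Y7", "Y8", "Y9", "Y10"]
perm = [4, 2, 1, 7]
cycled = perm[-1:] + perm[:-1]
y_star = list(y)
for y_sub, x_sub in zip(perm, cycled):
    y_star[x_sub] = y[y_sub]
print(y_star)
# ['Y1', 'Y8', 'Y2', 'Y4', 'Y3', 'Y6', 'Y7', 'Y5', 'Y9', 'Y10']
```

Because the selected subscripts form a single cycle of length r, every selected Y value moves, so exactly r of the pairs are altered.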
Through the procedure discussed above, the original set of ordered pairs (Y_i, x_i), i = 1, ..., n, has been transformed to a new set of observations (Y*_i, x_i) such that Y*_i ≠ Y_i for r values of i and Y*_i = Y_i for the remaining n - r values of i.  It is assumed that the only information made available for analysis is the new set of n observations (Y*_i, x_i), the value of r, and a description of the transformation procedure.
The transformation can be viewed as being accomplished by premultiplying the original observational vector Y by one of a set of possible permutation matrices G_s , s = 1, ..., S(r,n), where G_s has the form of an n × n identity matrix with r of the rows interchanged.  The transformation of the original vector Y to the new vector Y* can be written

    Y* = G_s Y   for some   s = 1, ..., S(r,n) .                 (2.2.1)

Given n and r, it can be seen that

    S(r,n) = C(n,r) r! ,                                         (2.2.2)

where C(n,r) denotes the number of combinations of n things taken r at a time.  Several of the S(r,n) transformations G_s do, in fact, yield the same set of transformed observations.  The set of r! possible permutations of r integers can be separated into (r-1)! sets of r permutations which are unique in the sense that at least one of the integers is preceded and followed by a different pair of integers than in any other of the (r-1)! sets.  Each of these (r-1)! sets of permutations contains r members.  If one of the r members of a set is selected, the remaining r - 1 members can be obtained from it by successively cycling the selected member with cycle lengths 1, 2, ..., r-1.  Therefore, there are C(n,r)(r-1)! different sets of n ordered pairs (Y*_i, x_i) which can be obtained from the original set of observations.  When necessary, the distinction between all transformations and all unique transformations will be made.
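Both counts, S(r,n) = C(n,r)·r! transformations and C(n,r)(r-1)! distinct transformed sets, can be confirmed by exhaustive enumeration for a small case.  A sketch assuming the cycling construction described above; the choice n = 6, r = 3 is arbitrary:

```python
from itertools import combinations, permutations
from math import comb, factorial

def transformed_sets(n, r):
    """Enumerate every (sample, permutation) transformation of Section 2.2
    and collect the distinct transformed orderings that result."""
    y = tuple(range(n))
    results = set()
    count = 0
    for sample in combinations(range(n), r):
        for perm in permutations(sample):
            count += 1                       # one transformation G_s
            cycled = perm[-1:] + perm[:-1]   # cycle one step
            y_star = list(y)
            for y_sub, x_sub in zip(perm, cycled):
                y_star[x_sub] = y[y_sub]
            results.add(tuple(y_star))
    return count, results

n, r = 6, 3
count, results = transformed_sets(n, r)
print(count == comb(n, r) * factorial(r))             # S(r,n) = C(n,r) r!
print(len(results) == comb(n, r) * factorial(r - 1))  # unique sets: C(n,r)(r-1)!
```

The redundancy arises because cyclic rotations of a selected permutation produce the same set of ordered pairs, and hence the same transformed observations.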
2.3  Properties of the Permutation Matrices, G_s

As mentioned earlier, the matrices G_s are n × n, having the form of an identity matrix with r of the rows interchanged.  If E_ij denotes a so-called elementary row transformation matrix, then E_ij is, in fact, an identity matrix with rows i and j interchanged.  If another matrix A is premultiplied by E_ij, the resulting matrix will be the matrix A with its i-th and j-th rows interchanged.  Then, any of the permutation matrices G_s can be represented as a product of such matrices E_ij.  One particular set of row transformation matrices which can be used to represent a G_s is the set of E_ij whose subscripts are the set of r integers selected as the simple random sample corresponding to G_s , with the subscripts ordered in a manner corresponding to the permutation which produced G_s .  If the permuted subscripts used for the transformation corresponding to G_s are denoted by t_{1,s}, t_{2,s}, ..., t_{r,s} , then G_s can be represented as

    G_s = E_{t_{r-1,s}, t_{r,s}} E_{t_{r-2,s}, t_{r-1,s}} ··· E_{t_{2,s}, t_{3,s}} E_{t_{1,s}, t_{2,s}} .    (2.3.1)

In the example presented in the previous section, for instance, the permutation matrix corresponding to the transformation discussed can be represented as

    G_s = E_{2,8} E_{3,2} E_{5,3} .
It will be useful later to observe that any G_s is a non-singular transformation, since the rank of a matrix is unaltered by a succession of elementary row transformations, so that G_s has the same rank as I.  Further, any G_s is an orthogonal matrix; i.e., G_s^{-1} = G_s' , and therefore the determinant of any G_s is -1 if r is even or +1 if r is odd.

In order to find the expected value of the permutation matrix G_s = (g_ijs) over s = 1, 2, ..., S(r,n), consider first a diagonal element g_iis .  When the subscript i does not appear in the sequence t_{1,s}, t_{2,s}, ..., t_{r,s} , g_iis = 1 ; otherwise, g_iis = 0 .  Therefore,

    e(g_iis) = P(g_iis = 1) = (n-r)/n   for all   i = 1, ..., n .    (2.3.2)

Next, consider an off-diagonal element of G_s , say g_ijs , i ≠ j .  When the subscripts i and j appear in adjacent positions in the sequence t_{1,s}, t_{2,s}, ..., t_{r,s} in the order ij, then g_ijs = 1 ; otherwise, g_ijs = 0 .  So

    e(g_ijs) = P(g_ijs = 1) = r/(n(n-1))   for all   i ≠ j .    (2.3.3)

Therefore, e(G_s) is a symmetric n × n matrix with diagonal elements (n-r)/n and off-diagonal elements r/(n(n-1)) .  Such a matrix can be written in the form aI + bJ , where I is the identity matrix and J is an n × n matrix of ones.  In this case, then,

    e(G_s) = (n-r-1)/(n-1) I + r/(n(n-1)) J .    (2.3.4)
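The moments (2.3.2) and (2.3.3), and hence (2.3.4), can be verified by averaging G_s over all S(r,n) transformations of a small case.  A sketch, again assuming the cycling construction of Section 2.2 (n = 5, r = 3 is an arbitrary choice):

```python
from itertools import combinations, permutations

def G_matrix(n, perm):
    """Permutation matrix G_s for one cycling transformation: row i picks
    the Y value re-paired with x_i; rows off the cycle stay identity."""
    cycled = perm[-1:] + perm[:-1]
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for y_sub, x_sub in zip(perm, cycled):
        G[x_sub] = [1.0 if j == y_sub else 0.0 for j in range(n)]
    return G

n, r = 5, 3
total = [[0.0] * n for _ in range(n)]
count = 0
for sample in combinations(range(n), r):
    for perm in permutations(sample):
        count += 1
        G = G_matrix(n, perm)
        for i in range(n):
            for j in range(n):
                total[i][j] += G[i][j]

diag = total[0][0] / count                   # average of a diagonal element
off = total[0][1] / count                    # average of an off-diagonal element
print(abs(diag - (n - r) / n) < 1e-12)       # (2.3.2): (n-r)/n
print(abs(off - r / (n * (n - 1))) < 1e-12)  # (2.3.3): r/(n(n-1))
```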
It will be useful to find the expected value of C_s = G_s A G_s' = (c_ijs) , where A = (a_kℓ) is an n × n matrix of constants.  Let B_s = G_s A = (b_iℓs) and G_s' = (g'_ℓjs) .  Thus,

    b_iℓs = Σ_{k=1}^n g_iks a_kℓ

and

    c_ijs = Σ_{ℓ=1}^n b_iℓs g'_ℓjs = Σ_{ℓ=1}^n Σ_{k=1}^n g_iks a_kℓ g_jℓs .

Therefore,

    c_iis = Σ_{ℓ=1}^n Σ_{k=1}^n a_kℓ g_iks g_iℓs
          = Σ_{k=1}^n a_kk g_iks² + Σ_{ℓ=1}^n Σ_{k≠ℓ} a_kℓ g_iks g_iℓs .

Since the rows of G_s are the rows of the identity matrix, g_iks g_iℓs = 0 when k ≠ ℓ for any s, and g_iks² = g_iks , so

    c_iis = Σ_{k=1}^n a_kk g_iks

and

    e(c_iis) = Σ_{k=1}^n a_kk e(g_iks) = a_ii e(g_iis) + Σ_{k≠i} a_kk e(g_iks) .

From (2.3.2) and (2.3.3), it follows that

    e(c_iis) = (n-r)/n a_ii + r/(n(n-1)) Σ_{k≠i} a_kk ,

and since Σ_{k≠i} a_kk = Σ_{k=1}^n a_kk - a_ii ,

    e(c_iis) = (n-r-1)/(n-1) a_ii + r/(n(n-1)) Σ_{k=1}^n a_kk .    (2.3.5)
For i ≠ j ,

    c_ijs = Σ_{ℓ=1}^n Σ_{k=1}^n a_kℓ g_iks g_jℓs
          = Σ_{k=1}^n a_kk g_iks g_jks + a_ji g_ijs g_jis + a_ij g_iis g_jjs
            + Σ_{k≠i,j} a_ik g_iis g_jks + Σ_{k≠i,j} a_kj g_iks g_jjs
            + Σ_{k≠i,j} a_ki g_iks g_jis + Σ_{k≠i,j} a_jk g_ijs g_jks
            + Σ_{ℓ≠i,j} Σ_{k≠i,j,ℓ} a_kℓ g_iks g_jℓs .    (2.3.6)

In order to evaluate e(c_ijs) for i ≠ j , the expected values of the products g_iks g_jℓs will be found for each of the terms in the summation above.  Since the columns of any G_s are those of the identity matrix, g_iks g_jks = 0 for i ≠ j and any k.  In the following, it is assumed that i ≠ j ≠ k ≠ ℓ .

Now, g_ijs g_jis = 1 when the subscripts i and j appear in adjacent positions in the sequence t_{1,s}, t_{2,s}, ..., t_{r,s} in both the orders ij and ji; otherwise, g_ijs g_jis = 0 .  The subscripts i and j can appear in both the orders ij and ji only when r = 2.  Therefore,

    e(g_ijs g_jis) = P(g_ijs g_jis = 1) = 1/C(n,2) = 2/(n(n-1)) ,   r = 2 ,
                   = 0 ,                                            2 < r ≤ n .

When neither of the subscripts i and j appears in the sequence, g_iis g_jjs = 1 ; otherwise, g_iis g_jjs = 0 , so

    e(g_iis g_jjs) = C(n-2,r)/C(n,r) = (n-r)(n-r-1)/(n(n-1)) ,   2 ≤ r ≤ n .

When the subscripts i, j and k appear in adjacent positions in the sequence in the order jik, then g_iks g_jis = 1 ; otherwise, g_iks g_jis = 0 .  Therefore,

    e(g_iks g_jis) = P(g_iks g_jis = 1)
                   = 0 ,                                                     r = 2 ,
                   = [C(n-3,r-3)/C(n,r)] r(r-3)!/r! = r/(n(n-1)(n-2)) ,      2 < r ≤ n .

Similarly,

    e(g_ijs g_jℓs) = 0 ,                                                     r = 2 ,
                   = [C(n-3,r-3)/C(n,r)] r(r-3)!/r! = r/(n(n-1)(n-2)) ,      2 < r ≤ n .

If the subscript j does not appear in the sequence and the subscripts i and k appear in adjacent positions in the order ik, then g_iks g_jjs = 1 ; otherwise, g_iks g_jjs = 0 , so

    e(g_iks g_jjs) = P(g_iks g_jjs = 1)
                   = [C(n-3,r-2)/C(n,r)] r(r-2)!/r! = r(n-r)/(n(n-1)(n-2)) ,   2 ≤ r ≤ n .

Similarly,

    e(g_iis g_jℓs) = r(n-r)/(n(n-1)(n-2)) ,   2 ≤ r ≤ n .

Finally, g_iks g_jℓs = 1 when all of the subscripts i, k, j, ℓ appear in the sequence, the subscripts i and k appear in adjacent positions in the order ik, and the subscripts j and ℓ appear in adjacent positions in the order jℓ; otherwise, g_iks g_jℓs = 0 .  Therefore,

    e(g_iks g_jℓs) = 0 ,                           r ≤ 3 ,
                   = r(r-3)/(n(n-1)(n-2)(n-3)) ,   3 < r ≤ n .
From (2.3.6) and the preceding, with the expected values simplified,

    e(c_ijs) = (n-2)(n-3)/(n(n-1)) a_ij + 2/(n(n-1)) a_ji
               + 2/(n(n-1)) Σ_{k≠i,j} a_ik + 2/(n(n-1)) Σ_{k≠i,j} a_kj ,   r = 2 ;

    e(c_ijs) = (n-r)(n-r-1)/(n(n-1)) a_ij
               + r(n-r)/(n(n-1)(n-2)) Σ_{k≠i,j} a_ik + r(n-r)/(n(n-1)(n-2)) Σ_{k≠i,j} a_kj
               + r/(n(n-1)(n-2)) Σ_{k≠i,j} a_ki + r/(n(n-1)(n-2)) Σ_{k≠i,j} a_jk
               + r(r-3)/(n(n-1)(n-2)(n-3)) Σ_{ℓ≠i,j} Σ_{k≠i,j,ℓ} a_kℓ ,   3 ≤ r ≤ n .    (2.3.7)

(For r = 3 the final term vanishes, since r(r-3) = 0.)
The expected value e(c_ijs) can be represented in this form since, for any s, premultiplication of the matrix A by G_s and postmultiplication by G_s' simply shifts the positions of the elements of A, so that c_ijs = a_kℓ for some k,ℓ combination.  Therefore, the coefficients of the a's shown above should sum to one.  The coefficients of the a's in (2.3.7) do, in fact, sum to one for each value of r.
If the matrix A is symmetric, so that a_kℓ = a_ℓk for all k,ℓ , then

    e(c_ijs) = [(n-2)(n-3)+2]/(n(n-1)) a_ij
               + 2/(n(n-1)) Σ_{k≠i,j} a_ik + 2/(n(n-1)) Σ_{k≠i,j} a_jk ,   r = 2 ;

    e(c_ijs) = (n-r)(n-r-1)/(n(n-1)) a_ij
               + r(n-r+1)/(n(n-1)(n-2)) Σ_{k≠i,j} a_ik
               + r(n-r+1)/(n(n-1)(n-2)) Σ_{k≠i,j} a_kj
               + r(r-3)/(n(n-1)(n-2)(n-3)) Σ_{ℓ≠i,j} Σ_{k≠i,j,ℓ} a_kℓ ,   3 ≤ r ≤ n ,   i ≠ j .    (2.3.8)
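For small n and r, the expectations (2.3.5) and (2.3.7) can be checked against an exhaustive average of G_s A G_s'.  A sketch in pure Python, assuming the cycling construction of Section 2.2; n = 6, r = 4 and the random test matrix A are arbitrary choices:

```python
import random
from itertools import combinations, permutations

def all_G(n, r):
    """Yield every permutation matrix G_s of the cycling scheme."""
    for sample in combinations(range(n), r):
        for perm in permutations(sample):
            cycled = perm[-1:] + perm[:-1]
            G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
            for y_sub, x_sub in zip(perm, cycled):
                G[x_sub] = [1.0 if j == y_sub else 0.0 for j in range(n)]
            yield G

def sandwich(G, A):
    """Compute G A G' for square matrices stored as lists of rows."""
    n = len(A)
    GA = [[sum(G[i][k] * A[k][l] for k in range(n)) for l in range(n)] for i in range(n)]
    return [[sum(GA[i][l] * G[j][l] for l in range(n)) for j in range(n)] for i in range(n)]

rng = random.Random(2)
n, r = 6, 4
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]

Cs = [sandwich(G, A) for G in all_G(n, r)]
S = len(Cs)
EC = [[sum(C[i][j] for C in Cs) / S for j in range(n)] for i in range(n)]

trace = sum(A[k][k] for k in range(n))
i, j = 0, 1
others = [k for k in range(n) if k not in (i, j)]

# Diagonal term against (2.3.5)
d_thy = (n - r - 1) / (n - 1) * A[i][i] + r / (n * (n - 1)) * trace
# Off-diagonal term against (2.3.7), valid for 3 <= r <= n
o_thy = ((n - r) * (n - r - 1) / (n * (n - 1)) * A[i][j]
         + r * (n - r) / (n * (n - 1) * (n - 2))
         * (sum(A[i][k] for k in others) + sum(A[k][j] for k in others))
         + r / (n * (n - 1) * (n - 2))
         * (sum(A[k][i] for k in others) + sum(A[j][k] for k in others))
         + r * (r - 3) / (n * (n - 1) * (n - 2) * (n - 3))
         * sum(A[k][l] for k in others for l in others if k != l))
print(abs(EC[i][i] - d_thy) < 1e-9)
print(abs(EC[i][j] - o_thy) < 1e-9)
```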
2.4  Properties of the Transformed Observations

The random process which produces a particular transformation matrix G_s is taken to be independent of the process which produced the original set of observations, so from (2.2.1),

    e(Y*) = e(G_s Y) = e(G_s) e(Y) .

Therefore, under the assumptions given in 2.1, it follows from (2.3.4) that

    e(Y*) = [(n-r-1)/(n-1) I + r/(n(n-1)) J] Xγ
          = (n-r-1)/(n-1) Xγ + r/(n(n-1)) JXγ .

Since Σ_{i=1}^n x_i = 0 , JXγ = nμ1 , and

    e(Y*) = Xγ* ,   where   γ* = [μ, (n-r-1)/(n-1) β]' .    (2.4.1)

The variance-covariance matrix of Y* is

    V(Y*) = e{[Y* - e(Y*)][Y* - e(Y*)]'} = e(Y* Y*') - e(Y*) e'(Y*) .

From (2.2.1) and (2.4.1), then,

    V(Y*) = e(G_s YY' G_s') - Xγ* γ*' X' .    (2.4.2)

In order to evaluate the first term of (2.4.2), it will be helpful to find expectations in two stages.  First, the original set of observations will be considered fixed and the expected value will be found over the set of transformations G_s , given a particular original set of observations.  Such an expected value will be denoted by e_G .  Then, the overall expected value will be found by evaluating the expected value of e_G over the original random process.  Therefore, the first term of (2.4.2) will be evaluated as e[e_G(G_s YY' G_s')] .

Given Y , A = YY' is a matrix of the form discussed in 2.3.  Therefore, the i-th diagonal term of e_G(G_s YY' G_s') will be, from (2.3.5),

    (e_G(G_s YY' G_s'))_ii = (n-r-1)/(n-1) Y_i² + r/(n(n-1)) Σ_{k=1}^n Y_k²   for all r .    (2.4.3)
Since YY' is a symmetric matrix, it can be seen from (2.3.8) that the off-diagonal terms of e_G(G_s YY' G_s') are of the form

    (e_G(G_s YY' G_s'))_ij = [(n-2)(n-3)+2]/(n(n-1)) Y_i Y_j
                             + 2/(n(n-1)) (Y_i + Y_j) Σ_{k≠i,j} Y_k ,   r = 2 ;

    (e_G(G_s YY' G_s'))_ij = (n-r)(n-r-1)/(n(n-1)) Y_i Y_j
                             + r(n-r+1)/(n(n-1)(n-2)) (Y_i + Y_j) Σ_{k≠i,j} Y_k
                             + r(r-3)/(n(n-1)(n-2)(n-3)) Σ_{ℓ≠i,j} Σ_{k≠i,j,ℓ} Y_k Y_ℓ ,   3 ≤ r ≤ n .    (2.4.4)
From 2.1, it follows that

    e(Y_i²) = (μ + βx_i)² + σ² ,

    e(Σ_{k=1}^n Y_k²) = nμ² + nσ² + β² Σ_{k=1}^n x_k² ,

    e(Y_i Y_j) = μ² + μβ(x_i + x_j) + β² x_i x_j ,   i ≠ j ,

    e(Y_i Σ_{k≠i,j} Y_k) = (n-2)μ² - μβ(x_i + x_j) + (n-2)μβ x_i - β²x_i² - β²x_i x_j ,

    e(Y_j Σ_{k≠i,j} Y_k) = (n-2)μ² - μβ(x_i + x_j) + (n-2)μβ x_j - β²x_j² - β²x_i x_j ,

    e(Σ_{ℓ≠i,j} Σ_{k≠i,j,ℓ} Y_k Y_ℓ) = (n-2)(n-3)μ² - 2(n-3)μβ(x_i + x_j)
                                       + 2β²(x_i² + x_j²) + 2β² x_i x_j - β² Σ_{k=1}^n x_k² .

From the above and (2.4.3), it can be shown that the expected value of a diagonal element of e(G_s YY' G_s') is

    (e(G_s YY' G_s'))_ii = σ² + μ² + r/(n(n-1)) β² Σ_{k=1}^n x_k²
                           + (n-r-1)/(n-1) β²x_i² + 2(n-r-1)/(n-1) μβ x_i .    (2.4.5)
From (2.4.4), it follows that the expected value of an off-diagonal element of e(G_s YY' G_s') is

    (e(G_s YY' G_s'))_ij = μ² + (n-3)/(n-1) μβ(x_i + x_j) + (n-4)/n β² x_i x_j
                           - 2/(n(n-1)) β²(x_i² + x_j²) ,   r = 2 ;

    (e(G_s YY' G_s'))_ij = μ² + [(n-r)(n-r-1)/(n(n-1)) + (n-4) r(n-r+1)/(n(n-1)(n-2))
                           - 2r(r-3)/(n(n-1)(n-2))] μβ(x_i + x_j)
                           + [(n-r)(n-r-1)/(n(n-1)) - 2r(n-r+1)/(n(n-1)(n-2))
                           + 2r(r-3)/(n(n-1)(n-2)(n-3))] β² x_i x_j
                           + [-r(n-r+1)/(n(n-1)(n-2)) + 2r(r-3)/(n(n-1)(n-2)(n-3))] β²(x_i² + x_j²)
                           - r(r-3)/(n(n-1)(n-2)(n-3)) β² Σ_{k=1}^n x_k² ,   3 ≤ r ≤ n .    (2.4.6)
To evaluate V(Y*) , the corresponding diagonal and off-diagonal terms of Xγ* γ*' X' must be subtracted from the quantities shown in (2.4.5) and (2.4.6).  The i-th diagonal term of Xγ* γ*' X' is

    (Xγ* γ*' X')_ii = μ² + [(n-r-1)/(n-1)]² β²x_i² + 2(n-r-1)/(n-1) μβ x_i .    (2.4.7)

For i ≠ j , the terms of Xγ* γ*' X' corresponding to those in (2.4.6) are

    (Xγ* γ*' X')_ij = μ² + (n-3)/(n-1) μβ(x_i + x_j) + [(n-3)/(n-1)]² β² x_i x_j ,   r = 2 ;

    (Xγ* γ*' X')_ij = μ² + (n-r-1)/(n-1) μβ(x_i + x_j) + [(n-r-1)/(n-1)]² β² x_i x_j ,   3 ≤ r ≤ n .    (2.4.8)
From (2.4.5) and (2.4.7), it can be shown that V(Y*) has diagonal terms

    (V(Y*))_ii = σ² + r/(n(n-1)) β² Σ_{k=1}^n x_k² + r(n-r-1)/(n-1)² β²x_i² ,    (2.4.9)

and from (2.4.6) and (2.4.8), it can be shown that V(Y*) has off-diagonal terms

    (V(Y*))_ij = -4/(n(n-1)²) β² x_i x_j - 2/(n(n-1)) β²(x_i² + x_j²) ,   i ≠ j ,   r = 2 ;

    (V(Y*))_ij = [r(n-r-1)/(n(n-1)²) - 2r(n-r+1)/(n(n-1)(n-2))
                  + 2r(r-3)/(n(n-1)(n-2)(n-3))] β² x_i x_j
                  + [-r(n-r+1)/(n(n-1)(n-2)) + 2r(r-3)/(n(n-1)(n-2)(n-3))] β²(x_i² + x_j²)
                  - r(r-3)/(n(n-1)(n-2)(n-3)) β² Σ_{k=1}^n x_k² ,   i ≠ j ,   3 ≤ r ≤ n .    (2.4.10)
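The diagonal formula (2.4.9) can be checked by Monte Carlo simulation: generate Y from the model, apply a random transformation, and compare the empirical variance of one transformed component with the formula.  A sketch (all numerical settings are arbitrary illustrative choices):

```python
import random
import statistics

def transform(y, r, rng):
    """Cycling interchange of Section 2.2 (sketch, 0-based indices)."""
    perm = rng.sample(range(len(y)), r)
    cycled = perm[-1:] + perm[:-1]
    y_star = list(y)
    for y_sub, x_sub in zip(perm, cycled):
        y_star[x_sub] = y[y_sub]
    return y_star

rng = random.Random(3)
n, r, mu, beta, sigma = 12, 5, 1.0, 2.0, 1.0
x = [i - (n - 1) / 2 for i in range(n)]        # centered x values
sum_x2 = sum(xi * xi for xi in x)

i = 0                                          # check one diagonal entry of V(Y*)
draws = []
for _ in range(100000):
    y = [mu + beta * xi + rng.gauss(0, sigma) for xi in x]
    draws.append(transform(y, r, rng)[i])

v_emp = statistics.pvariance(draws)
v_thy = (sigma ** 2
         + r / (n * (n - 1)) * beta ** 2 * sum_x2
         + r * (n - r - 1) / (n - 1) ** 2 * beta ** 2 * x[i] ** 2)
print(abs(v_emp - v_thy) / v_thy < 0.04)       # agreement within sampling error
```

Note that the variance here is taken over both the error process and the random choice of transformation, as in the derivation above.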
The variance-covariance matrix of Y* can be obtained also by writing

    V(Y*) = e_G[V(Y* | G_s)] + V_G[e(Y* | G_s)] ;    (2.4.11)

i.e., the variance-covariance matrix of Y* is equal to the sum of the expected value of the conditional variance-covariance matrix of Y* , given G_s , and the variance of the conditional expectation of Y* , given G_s .  The first term of (2.4.11) is, in fact, equal to σ²I , so

    V(Y*) = σ²I + V_G[e(Y* | G_s)] .    (2.4.12)

Since σ²I is positive definite for σ² > 0 , and the second term of (2.4.12) is a variance-covariance matrix and is therefore positive semi-definite, it follows that V(Y*) is positive definite and non-singular.

If a new model is written for the transformed observations as

    Y* = e(Y*) + ε* ,    (2.4.13)

with e(Y*) = Xγ* as in (2.4.1), then clearly, e(ε*) = 0 , and the variance-covariance matrix for ε* can be written

    V(ε*) = e[(Y* - e(Y*))(Y* - e(Y*))'] = V(Y*) .    (2.4.14)

Therefore, from (2.4.9) and (2.4.10), it can be seen that the components of ε* are correlated and their variances and covariances are functions of β² as well as σ² .
Suppose that the probability density function of the vector Y is denoted by g(y) , -∞ < y_i < ∞ , i = 1, ..., n .  Then, given a particular transformation matrix G_s , Y* = G_s Y .  Since the matrix G_s is orthogonal, as discussed in 2.3, the absolute value of the Jacobian of the transformation from Y to Y* has the value one for every s.  Therefore, the conditional probability density function of Y* , given G_s , can be written

    h(y* | G_s) = g(G_s^{-1} y*) ,   -∞ < y*_i < ∞ ,   i = 1, ..., n .    (2.4.15)

If the probability density function of the transformation matrices, s = 1, 2, ..., S(r,n) , is denoted by P(G_s) , then the marginal probability density function of Y* can be written

    h(y*) = Σ_{s=1}^{S(r,n)} P(G_s) h(y* | G_s) ,

so

    h(y*) = Σ_{s=1}^{S(r,n)} P(G_s) g(G_s^{-1} y*) ,   -∞ < y*_i < ∞ ,   i = 1, ..., n .    (2.4.16)

If, in addition to the assumptions given in 2.1, the components of ε are assumed to be normally distributed, then

    h(y*) = Σ_{s=1}^{S(r,n)} P(G_s) (2π)^{-n/2} σ^{-n}
            exp{ -(1/(2σ²)) (G_s^{-1} y* - Xγ)' (G_s^{-1} y* - Xγ) } ,
            -∞ < y*_i < ∞ ,   i = 1, ..., n .    (2.4.17)

Thus, the marginal probability density function of the transformed observational vector Y* could be described in this case as a mixture of multivariate normal density functions.  The only case which will be considered here is that in which P(G_s) = 1/S(r,n) for s = 1, ..., S(r,n) .
Some of the properties of the transformed observations which have been discussed in the preceding are of interest with respect to the application of the transformation procedure to large data files.  In considering large values of n, three assumptions which can be made about w and r are:

    (1)  r is constant, so that w → 0 as n → ∞ ,

    (2)  r = [n^α] , 0 < α < 1 , so that w → 0 as n → ∞ ,

    (3)  w is constant, so that r increases with n .

The notation [n^α] in assumption (2) is taken to mean the largest integer less than or equal to n^α .  Under assumption (3), the value of r is taken to be

    r = [wn] .    (2.4.18)

Assumptions (2) and (3) may be more likely to be acceptable in practical applications.  Under assumptions (1) and (2), it can be seen from (2.4.1) that

    lim_{n→∞} γ* = γ .    (2.4.19)

However, when w is constant,

    lim_{n→∞} γ* = [μ, (1-w)β]' .    (2.4.20)
Suppose it can be assumed that the centered X values observed for any n are finitely bounded, i.e.,

    0 ≤ |x_i| ≤ M < +∞ ,   i = 1, ..., n ,   n = 1, 2, ... .

This is not an unreasonable assumption in the usual regression framework.  Then under assumptions (1) and (2), it follows from (2.4.9) and (2.4.10) that

    lim_{n→∞} (V(Y*))_ii = σ² ,

    lim_{n→∞} (V(Y*))_ij = 0 ,   i ≠ j ,

so that under these assumptions, the transformed observations tend toward having the same variance structure as the original observations when n becomes large.  However, when w is assumed to be constant, the same statement cannot be made, although the covariances of the transformed observations do approach zero.
3.  LEAST SQUARES ESTIMATION USING THE
    TRANSFORMED OBSERVATIONS

3.1  Estimation of μ and β

Suppose the least squares estimator of γ = [μ, β]' which would
be applicable to the original observations is applied to the transformed observations; i.e.,

    γ̂* = (X'X)⁻¹X'y* .      (3.1.1)

This is, in fact, the least squares estimator of γ which would be
derived from the model shown in (2.4.13), ignoring the fact that the
variance-covariance matrix of e* does not have the form Iσ².  From
(2.4.1),

    e(γ̂*) = γ* .      (3.1.2)
The estimator

    β̂* = Σ_{i=1}^n x_i Y_i* / Σ_{i=1}^n x_i²      (3.1.3)

is an unbiased estimator of ((n−r−1)/(n−1))β rather than β, but
since the value of r is known, an unbiased estimator, β̂_u, of β
is available, i.e.,

    β̂_u = ((n−1)/(n−r−1)) β̂* .      (3.1.4)

If r = n − 1, then β̂* has expected value zero and β̂_u is not
defined.  Therefore, it will be assumed that r ≠ n − 1 in the
following.
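The expectation in (3.1.3) can be checked exactly for r = 2 by averaging
β̂* over all C(n,2) transpositions with the errors set to zero.  Treating
each transposition as one of the equally likely G_s is an assumption of
this sketch (the matrices are constructed earlier in the thesis), but it
reproduces the stated factor (n−r−1)/(n−1) exactly.

```python
import itertools
import numpy as np

# Error-free data from the simple linear model; mu, beta are arbitrary.
n, mu, beta, r = 10, 2.0, 1.5, 2
x = np.arange(n) - np.arange(n).mean()          # centered X values
y = mu + beta * x                               # responses with e_i = 0
sxx = (x ** 2).sum()

vals = []
for i, j in itertools.combinations(range(n), 2):
    y_star = y.copy()
    y_star[i], y_star[j] = y[j], y[i]           # G_s interchanges records i and j
    vals.append((x * y_star).sum() / sxx)       # beta_hat_star under this G_s

expect = np.mean(vals)                          # exact expectation over all G_s
assert abs(expect - (n - r - 1) / (n - 1) * beta) < 1e-9
# the rescaled estimator (3.1.4) is unbiased:
assert abs((n - 1) / (n - r - 1) * expect - beta) < 1e-9
```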
The variance-covariance matrix of γ̂* can be written

    V(γ̂*) = e[(γ̂* − e(γ̂*))(γ̂* − e(γ̂*))'] = (X'X)⁻¹X'V*X(X'X)⁻¹ ,      (3.1.5)

where the elements of V* = V(y*) are denoted by v*_ij.  Since

    (X'X)⁻¹ = [ 1/n        0
                 0         1/Σ_{i=1}^n x_i² ] ,

V(γ̂*) is of the general form

    V(γ̂*) = [ Σ_i Σ_j v*_ij / n²                       Σ_i x_i Σ_j v*_ij / (n Σ_i x_i²)
               Σ_j x_j Σ_i v*_ji / (n Σ_i x_i²)         Σ_i Σ_j x_i x_j v*_ij / (Σ_i x_i²)² ] .      (3.1.6)
Using (2.4.9) and (2.4.10), it can be shown that the elements of any
row or column of V* sum to σ²; i.e.,

    Σ_{i=1}^n v*_ij = Σ_{j=1}^n v*_ij = σ² ,      (3.1.7)

and therefore, the off-diagonal elements of V(γ̂*) are equal to zero
for all r, so the estimators μ̂* and β̂* are uncorrelated.  Also,
μ̂* = Ȳ* = Ȳ has variance σ²/n as in the untransformed case.  Substitution from (2.4.9) and (2.4.10) into (3.1.6) yields

    V(β̂*) = σ²/Σx² + β² (n²r(r−3) − 3r(n−1)(n−4) − 3r(r−1)) / (n(n−1)²(n−2)(n−3))

             + β² (Σx⁴/(Σx²)²) (r(n²−2n+3) − r²(n−1)) / ((n−1)(n−2)(n−3)) ,   3 ≤ r ≤ n .      (3.1.8)

From (3.1.4), it follows that

    V(β̂_u) = ((n−1)/(n−r−1))² V(β̂*) .      (3.1.9)
Assuming μ, β, and σ² were known, a weighted least squares
estimator of γ could be written

    γ̂*_w = (X'V*⁻¹X)⁻¹X'V*⁻¹y* ,      (3.1.10)

and, from (2.4.1),

    e(γ̂*_w) = γ* .      (3.1.11)

Then the variance-covariance matrix of γ̂*_w is

    V(γ̂*_w) = (X'V*⁻¹X)⁻¹ .      (3.1.12)

An analytic expression for V*⁻¹ in terms of r, n, μ, β², and
σ² has not been found.  However, if the elements of V*⁻¹ are
denoted by c_ij, then X'V*⁻¹X has the general form

    X'V*⁻¹X = [ Σ_i Σ_j c_ij              Σ_i x_i Σ_j c_ij
                Σ_j x_j Σ_i c_ji          Σ_i Σ_j x_i x_j c_ij ] .

But from (3.1.7), it can be shown that the elements of any row or
column of V*⁻¹ sum to 1/σ², so

    X'V*⁻¹X = [ n/σ²      0
                 0        Σ_i Σ_j x_i x_j c_ij ]

and therefore,

    V(γ̂*_w) = [ σ²/n      0
                  0        [Σ_i Σ_j x_i x_j c_ij]⁻¹ ] .      (3.1.13)

Also,

    γ̂*_w = [ Ȳ* ,  Σ_i Σ_j x_i c_ij Y_j* / Σ_i Σ_j x_i x_j c_ij ]' .      (3.1.14)

Therefore, the weighted estimator of μ is the same as the unweighted
and untransformed estimators, and the estimators of μ and β are
here again uncorrelated.  In a practical situation, β and σ² are
unknown, so γ̂*_w would not be available as an estimator.  In some
examples to be presented later, the relative efficiencies of a weighted
unbiased estimator of β,

    β̂_uw = ((n−1)/(n−r−1)) β̂*_w ,

and the estimator β̂_u will be compared to give some indication of
the possible loss in precision which may occur when the unweighted
estimator is used.
In order to assess partially the effects of transforming the
observations, the mean square errors of β̂* and β̂_u relative to
that of the estimator

    b = Σ_{i=1}^n x_i Y_i / Σ_{i=1}^n x_i² ,

which would be used for the same set of untransformed observations,
can be examined.  Since β̂* is biased,

    MSE(β̂*) = V(β̂*) + [e(β̂*) − β]² = V(β̂*) + r²β²/(n−1)² ,      (3.1.15)

and since b and β̂_u are unbiased, their mean square errors are
equal to their respective variances.  Let R*(n,r) denote the ratio
of MSE(b) and MSE(β̂*), and R_u(n,r) denote the ratio of MSE(b)
and MSE(β̂_u).  Since MSE(b) = V(b) = σ²/Σx² ,

    R*(n,r) = 1/[1 + (β²Σx²/σ²) r²/(n−1)²
                    + (β²Σx²/σ²)(n²r(r−3) − 3r(n−1)(n−4) − 3r(r−1)) / (n(n−1)²(n−2)(n−3))
                    + (β²/σ²)(Σx⁴/Σx²)(r(n²−2n+3) − r²(n−1)) / ((n−1)(n−2)(n−3))]      (3.1.16)

and

    R_u(n,r) = ((n−r−1)/(n−1))² / [1
                    + (β²Σx²/σ²)(n²r(r−3) − 3r(n−1)(n−4) − 3r(r−1)) / (n(n−1)²(n−2)(n−3))
                    + (β²/σ²)(Σx⁴/Σx²)(r(n²−2n+3) − r²(n−1)) / ((n−1)(n−2)(n−3))] .      (3.1.17)

Let

    f₁(n,r) = 2(n−3)/(n(n−1)²) ,   r = 2 ,

    f₁(n,r) = (n²r(r−3) − 3r(n−1)(n−4) − 3r(r−1)) / (n(n−1)²(n−2)(n−3)) ,   3 ≤ r ≤ n ,      (3.1.18)

and

    f₂(n,r) = 2/(n−1) ,   r = 2 ,

    f₂(n,r) = (r(n²−2n+3) − r²(n−1)) / ((n−1)(n−2)(n−3)) ,   3 ≤ r ≤ n .      (3.1.19)

Then R*(n,r) and R_u(n,r) can be written

    R*(n,r) = 1/[1 + (β²/σ²)(Σx²/n)(r²n/(n−1)² + n f₁(n,r) + (nΣx⁴/(Σx²)²) f₂(n,r))]      (3.1.20)

and

    R_u(n,r) = ((n−r−1)/(n−1))² / [1 + (β²/σ²)(Σx²/n)(n f₁(n,r) + (nΣx⁴/(Σx²)²) f₂(n,r))] .      (3.1.21)
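A small routine evaluating (3.1.20) and (3.1.21) makes the formulas easy
to use; the variance and kurtosis inputs below are the approximate values
quoted in the text for example (3), and the resulting efficiencies agree
with the corresponding Table 3.1 entries to within rounding of those
approximate moments.

```python
def f1(n, r):
    """f1(n,r) of (3.1.18)."""
    if r == 2:
        return 2 * (n - 3) / (n * (n - 1) ** 2)
    return (n**2 * r * (r - 3) - 3 * r * (n - 1) * (n - 4) - 3 * r * (r - 1)) / (
        n * (n - 1) ** 2 * (n - 2) * (n - 3))

def f2(n, r):
    """f2(n,r) of (3.1.19)."""
    if r == 2:
        return 2 / (n - 1)
    return (r * (n**2 - 2 * n + 3) - r**2 * (n - 1)) / (
        (n - 1) * (n - 2) * (n - 3))

def rel_mse(n, r, b2s2, var_x, kurt_x):
    """R*(n,r) and R_u(n,r) of (3.1.20)-(3.1.21).

    b2s2 = beta^2/sigma^2, var_x = sum(x^2)/n, kurt_x = n sum(x^4)/(sum(x^2))^2.
    """
    common = b2s2 * var_x * (n * f1(n, r) + kurt_x * f2(n, r))
    r_star = 1 / (1 + b2s2 * var_x * r**2 * n / (n - 1) ** 2 + common)
    r_u = ((n - r - 1) / (n - 1)) ** 2 / (1 + common)
    return r_star, r_u

# example (3): variance ~ 2.0831, kurtosis ~ 1.7998, n = 100, w = .20
rs, ru = rel_mse(100, 20, 0.05, 2.0831, 1.7998)
```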
General statements concerning the effects of the transformation
procedure based on R* and R_u are difficult to make due to the wide
variety of X values which could be proposed.  Therefore, three
particular examples will be considered in some detail to provide some
comparative information.  They are:

    (1)  The X values are expected values of the order
         statistics for a N(0,1) random sample with n = 100 .

    (2)  The X values are 100 points equally spaced about
         zero with a range set to provide the same value for
         Σx² as in the normal order statistics case.

    (3)  The X values are 100 points equally spaced
         about zero with approximately the same range as the
         expected values of the normal order statistics.      (3.1.22)

For each of the three sets of X values, computations to be discussed
below were performed for values of β²/σ² of .05, .10, .50, 1.00,
and 2.00, and values of w = r/n of .02, .20, .40, .60, .80, and
1.00.  R* and R_u were computed for each combination of β²/σ²
and w for each set of X values.  Also, V* and V*⁻¹ were
computed and the efficiency of β̂* relative to β̂*_w was examined.
The minimum value of this relative efficiency in the examples was
found to be .9065, so that the loss in precision due to the unavailability of β̂*_w as an estimator in comparable situations would be
expected to be fairly small.
Further, the efficiency of an unbiased estimator derived from β̂*_w,
relative to b, was computed to compare with R_u.  The results of
the computations are shown in Table 3.1.  The relative efficiency of
β̂_uw is denoted by R_uw in the table.
Examination of the entries in Table 3.1 reveals that the loss of
information in most of the cases is severe for larger values of w,
and the amount of information available for larger values of w has
effectively been reduced to zero relative to that available in the untransformed observations.  When β²/σ² = .05 , R* is consistently
greater than R_u , but as β²/σ² increases, the effect of bias in
β̂* begins to be large enough to offset the larger variance of β̂_u ,
and the inequality is reversed for some values of w.  The quantities
Σx²/n and nΣx⁴/(Σx²)² shown in (3.1.20) and (3.1.21) can be
described as the variance and kurtosis of the X values, respectively.
The three examples have the following approximate values for variance
and kurtosis of the X values:

    Example      Σx²/n      nΣx⁴/(Σx²)²

      (1)        .9726         2.7789
      (2)        .9726         1.7998
      (3)       2.0831         1.7998

In case (1), the larger kurtosis value reflects a greater concentration
of the X values about their mean than in the other two cases.  For
fixed values of β²/σ² and w, both R* and R_u are reduced considerably more by an increase of variance of the X values than by an
increase of kurtosis, as would be expected by examining (3.1.20) and
(3.1.21).  It should be noted that both f₁(n,r) and f₂(n,r) are
Table 3.1  Relative mean square errors of β̂*, β̂_u, and β̂_uw for
           three sets of X values, n = 100

                    (1)a                   (2)b                   (3)c
 β²/σ²   w     R*    R_u   R_uw      R*    R_u   R_uw      R*    R_u   R_uw

  .05   .02   .994  .956  .956      .995  .957  .957      .990  .954  .954
        .20   .822  .621  .622      .828  .627  .627      .682  .616  .616
        .40   .546  .341  .341      .544  .345  .345      .362  .335  .335
        .60   .353  .148  .148      .354  .149  .149      .204  .143  .143
        .80   .237  .035  .035      .238  .035  .035      .127  .034  .034
       1.00   .167  .000  .000      .167  .000  .000      .085  .000  .000

  .10   .02   .989  .953  .953      .991  .955  .955      .980  .948  .948
        .20   .692  .608  .608      .702  .617  .617      .523  .596  .596
        .40   .376  .328  .329      .378  .336  .336      .221  .316  .316
        .60   .214  .141  .141      .215  .144  .144      .113  .133  .133
        .80   .134  .033  .033      .135  .034  .034      .068  .031  .031
       1.00   .091  .000  .000      .091  .000  .000      .045  .000  .000

  .50   .02   .946  .926  .939      .955  .934  .936      .908  .907  .916
        .20   .311  .514  .521      .319  .549  .550      .179  .474  .479
        .40   .107  .252  .258      .109  .276  .277      .054  .220  .223
        .60   .052  .103  .105      .052  .112  .113      .025  .085  .086
        .80   .030  .024  .024      .030  .025  .025      .014  .019  .019
       1.00   .020  .000  .000      .020  .000  .000      .009  .000  .000

 1.00   .02   .898  .894  .927      .914  .910  .918      .832  .859  .877
        .20   .184  .431  .447      .189  .482  .486      .098  .378  .386
        .40   .057  .196  .207      .057  .225  .228      .028  .159  .163
        .60   .027  .077  .081      .027  .088  .089      .013  .059  .060
        .80   .015  .018  .018      .015  .019  .019      .007  .013  .013
       1.00   .010  .000  .000      .010  .000  .000      .005  .000  .000

 2.00   .02   .815  .836  .905      .841  .865  .883      .712  .778  .816
        .20   .101  .325  .352      .105  .388  .396      .052  .268  .281
        .40   .029  .135  .149      .030  .165  .169      .014  .102  .107
        .60   .014  .052  .056      .014  .061  .062      .006  .036  .037
        .80   .008  .012  .012      .008  .013  .013      .004  .008  .008
       1.00   .005  .000  .000      .005  .000  .000      .002  .000  .000

 a  X's expected values of order statistics N(0,1),
    Σx²/n ≈ .9726 ,  nΣx⁴/(Σx²)² ≈ 2.7789

 b  X's equally spaced about zero with same Σx²/n value as (1),
    nΣx⁴/(Σx²)² ≈ 1.7998

 c  X's equally spaced about zero with approximately same range as (1),
    Σx²/n ≈ 2.0831 ,  nΣx⁴/(Σx²)² ≈ 1.7998
quadratic functions of r for r ≥ 3 , and can become negative for
some values of r and n.  In fact f₂(n,r) is negative for r = n ,
so an increase of kurtosis for a given variance would bring about an
increase in relative efficiency, but the value of the kurtosis term in
R* and R_u at r = n is too small relative to the other terms to
have any noticeable influence in the examples shown.  It is somewhat
surprising that an increase in kurtosis of the X values would reduce
relative efficiency in this situation.  However, in the examples shown,
the decrease of kurtosis from case (1) to case (2) is accompanied by
a reduction in the range of the X values for fixed variance.  The
decrease of kurtosis from case (1) to case (3) is accompanied by an
increase in variance for approximately the same range of X values,
and the effects of reduction of kurtosis are offset by the effects of
increased variance of the X values.  From these examples, then, it
can be seen that general statements made about the effects of kurtosis
of the X values without considering the other properties of the
frequency distribution of the X values could be misleading.
The relative efficiencies of the weighted unbiased estimator,
β̂_uw , shown in Table 3.1 differ very little from the corresponding
relative efficiencies of β̂_u .  The differences between R_u and
R_uw are slightly larger for larger values of β²/σ² .  This is not
surprising, since V(y*) would differ more markedly from the form
Iσ² for larger values of β²/σ² , and the weighted estimator would
be expected to be more efficient in such cases.  However, it would
appear that, in situations similar to the examples, the loss of
information due to using β̂_u rather than β̂_uw to estimate β
would be of little consequence compared to the overall loss due to
use of the transformation procedure.
3.2  Estimation of σ²

Suppose the sums of squares and cross-products which would be
computed in the untransformed case were computed using the transformed
observations; i.e.,

    SST = Σ_{i=1}^n (Y_i* − Ȳ*)² ,      (3.2.1)

    SSR* = (Σ_{i=1}^n x_i Y_i*)² / Σ_{i=1}^n x_i² = β̂*² Σx² ,      (3.2.2)

and

    SSE* = SST − SSR* .      (3.2.3)

Then,

    e(SST) = (n−1)σ² + β²Σx² ,      (3.2.4)

and

    e(SSR*) = Σx² e[(β̂*)²] = Σx² [V(β̂*) + (e(β̂*))²] .

From (3.1.2), it follows that

    (e(β̂*))² = ((n−r−1)/(n−1))² β² ,

and substitution of this quantity and V(β̂*) from (3.1.8) into the
equation above yields
    e(SSR*) = σ² + β²Σx² [ (n²r(r−3) − 3r(n−1)(n−4) − 3r(r−1)) / (n(n−1)²(n−2)(n−3))
                           + ((n−r−1)/(n−1))² ]
              + β² (Σx⁴/Σx²) (r(n²−2n+3) − r²(n−1)) / ((n−1)(n−2)(n−3)) ,
                                                      3 ≤ r ≤ n .      (3.2.5)

Here again the case r = 2 will not be considered.  Denote the coefficient of β² in e(SSR*) by g({x},n,r).  Then e(SSR*) can be
written

    e(SSR*) = σ² + β² g({x},n,r) ,      (3.2.6)

and therefore,

    e(SSE*) = e(SST) − e(SSR*) = (n−2)σ² + β² (Σx² − g({x},n,r)) ,      (3.2.7)

so MSE* = SSE*/(n−2) is not an unbiased estimator of σ² in this
case.  It is possible, however, to write a linear combination of
SSE* and SSR* which is unbiased; i.e.,

    s*² = [g({x},n,r) SSE* − (Σx² − g({x},n,r)) SSR*] / [(n−1)g({x},n,r) − Σx²] .      (3.2.8)

It is of interest to find the variance of s*² and its efficiency
relative to the estimator which would have been available in the
untransformed case.  For convenience, let the coefficients of SSE*
and SSR* in s*² be denoted by c₁ and c₂, respectively.  Then

    V(s*²) = c₁² V(SSE*) + c₂² V(SSR*) − 2c₁c₂ Cov(SSE*, SSR*) .      (3.2.9)
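That (3.2.8) is unbiased uses only the expectations (3.2.4) through
(3.2.7) and linearity, which can be checked numerically; the parameter
values below are arbitrary illustrative choices, not values from the
examples.

```python
# Arbitrary illustrative values of n, g({x},n,r), sum(x^2), beta^2, sigma^2.
n, g, sxx, beta2, sigma2 = 50.0, 40.0, 47.0, 1.3, 2.1

e_sst = (n - 1) * sigma2 + beta2 * sxx      # (3.2.4)
e_ssr = sigma2 + beta2 * g                  # (3.2.6)
e_sse = e_sst - e_ssr                       # (3.2.7): (n-2)sigma^2 + beta^2(sxx - g)

# expectation of s*^2 from (3.2.8), by linearity of expectation
e_s2 = (g * e_sse - (sxx - g) * e_ssr) / ((n - 1) * g - sxx)
assert abs(e_s2 - sigma2) < 1e-9            # unbiased: equals sigma^2
```

The beta² contributions cancel in the numerator by construction, which is
exactly how the linear combination was chosen.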
In order to find the value of V(s*²), it is helpful to express
SSR* and SSE* as quadratic forms in Y*; i.e.,

    SSR* = Y*'AY* ,
                                  (3.2.10)
    SSE* = Y*'BY* ,

where

    A = x(x'x)⁻¹x' ,   B = I − (1/n)J − A ,

x is the vector of X values corrected for their mean, I is the
identity matrix, and J is an n × n matrix of ones.  It can be
easily shown that A and B are idempotent matrices and AB = BA = 0,
0 being an n × n matrix of zeros.  Since A and B are idempotent,
they each have rank equal to their trace, so A has rank one and B
has rank n − 2 .

An analytic expression for V(s*²) which expresses (3.2.9) in
terms of the parameters of the model will be found only under the
assumption of normality of the e's given in 2.1.  From (2.4.15), it
follows that under that assumption, the conditional probability density
function of Y*, given G_s, can be written
    h(y*|G_s) = (2πσ²)^{−n/2} exp[−(1/2σ²)(G_s⁻¹y* − Xγ)'(G_s⁻¹y* − Xγ)] ,
                               −∞ < y_i* < ∞ ,  i = 1, ..., n ,      (3.2.11)

but, since every G_s is an orthogonal matrix,

    (G_s⁻¹y* − Xγ)'(G_s⁻¹y* − Xγ) = (y* − G_sXγ)'(y* − G_sXγ) .      (3.2.12)

Substitution of (3.2.12) into (3.2.11) yields

    h(y*|G_s) = (2πσ²)^{−n/2} exp[−(1/2σ²)(y* − G_sXγ)'(y* − G_sXγ)] ,
                               −∞ < y_i* < ∞ ,  i = 1, ..., n ,      (3.2.13)

and therefore, the conditional distribution of Y*, given G_s, is
seen to be that of a set of n independent normal random variables
with mean G_sXγ and variance σ²I.  A necessary and sufficient
condition that, for z distributed N(μ_z, σ²I), a quadratic form
z'Cz/σ² be distributed as a non-central chi-square random variable
with k degrees of freedom and non-centrality parameter
λ = μ_z'Cμ_z/2σ² is that C be idempotent of rank k.  Therefore,
given G_s, SSR*/σ² = Y*'AY*/σ² is distributed as a non-central
chi-square variable with one degree of freedom and non-centrality
parameter λ*_s , and SSE*/σ² = Y*'BY*/σ² is distributed as a
non-central chi-square variable with (n−2) degrees of freedom and
non-centrality parameter λ*_se .
Since e(Y*|G_s) = G_sXγ , it follows that

    λ*_s = γ'X'G_s'AG_sXγ / 2σ² ,      (3.2.14)

and

    λ*_se = γ'X'G_s'BG_sXγ / 2σ² .

Then, using the form of B,

    λ*_se = (γ'X'Xγ − (1/n)γ'X'JXγ − γ'X'G_s'AG_sXγ) / 2σ²

          = β²Σx²/2σ² − λ*_s ,   s = 1, 2, ..., S(r,n) .      (3.2.15)

Let λ = β²Σx²/2σ² , so that λ*_se = λ − λ*_s .  Briefly, the
conditional distributions of SSR*/σ² and SSE*/σ², given G_s, are
χ'²(1, λ*_s) and χ'²(n−2, λ−λ*_s), respectively.  Since A and
B are idempotent and not of full rank, they are positive semi-definite
matrices.  Also, the trace of AB is equal to zero, and therefore,
AB = 0, so SSR*/σ² and SSE*/σ² are conditionally independent.
Here again, it is convenient to find the variances of SSR*/σ²
and SSE*/σ² by using

    V(SSR*/σ²) = e_G(v(SSR*/σ²|G_s)) + v_G(e(SSR*/σ²|G_s))      (3.2.16)

and

    V(SSE*/σ²) = e_G(v(SSE*/σ²|G_s)) + v_G(e(SSE*/σ²|G_s)) ,      (3.2.17)

and the covariance of SSR*/σ² and SSE*/σ² by using

    Cov(SSR*/σ², SSE*/σ²) = e_G(Cov(SSR*/σ², SSE*/σ²|G_s))
                             + Cov_G(e(SSR*/σ²|G_s), e(SSE*/σ²|G_s)) .      (3.2.18)

The notation used above is analogous to that used in (2.4.11).  It has
been shown that, for a particular G_s, SSR*/σ² and SSE*/σ² are
conditionally independent.  Therefore,

    e(SSR*/σ²|G_s) = 1 + 2λ*_s ,      (3.2.19)

    e(SSE*/σ²|G_s) = n − 2 + 2(λ − λ*_s) ,      (3.2.20)

    v(SSR*/σ²|G_s) = 2 + 8λ*_s ,      (3.2.21)

    v(SSE*/σ²|G_s) = 2(n−2) + 8(λ − λ*_s) ,      (3.2.22)

and

    Cov(SSR*/σ², SSE*/σ²|G_s) = 0 .      (3.2.23)
Substitution of (3.2.19) through (3.2.23) into (3.2.16), (3.2.17), and
(3.2.18), and taking expectations and variances over the transformations G_s, results in the following expressions for the covariance
and variances of the regression and error sums of squares:

    V(SSR*/σ²) = 2 + 8 e(λ*_s) + 4 v(λ*_s) ,      (3.2.24)

    V(SSE*/σ²) = 2(n−2) + 8λ − 8 e(λ*_s) + 4 v(λ*_s) ,      (3.2.25)

    Cov(SSR*/σ², SSE*/σ²) = −4 v(λ*_s) .      (3.2.26)

A simple check of these results can be made by observing that the
total corrected sum of squares, SST, divided by σ², is distributed
as a non-central chi-square random variable with n−1 degrees of
freedom and non-centrality parameter λ.  Therefore,

    V(SST/σ²) = 2(n−1) + 8λ .      (3.2.27)

Alternatively,

    V(SST/σ²) = V(SSR*/σ²) + V(SSE*/σ²) + 2 Cov(SSR*/σ², SSE*/σ²) .      (3.2.28)

Substitution of (3.2.24) through (3.2.26) into (3.2.28) and comparison with (3.2.27) verifies the results.
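The check can be carried out numerically for arbitrary values of λ,
e(λ*_s), and v(λ*_s); the identity holds regardless of those values,
which is what makes it a useful consistency test of (3.2.24) through
(3.2.26).

```python
# Arbitrary illustrative values; the identity must hold for any of them.
n, lam, e_ls, v_ls = 30.0, 5.0, 3.2, 0.7

v_ssr = 2 + 8 * e_ls + 4 * v_ls                      # (3.2.24)
v_sse = 2 * (n - 2) + 8 * lam - 8 * e_ls + 4 * v_ls  # (3.2.25)
cov = -4 * v_ls                                      # (3.2.26)

# reassembles to V(SST/sigma^2) = 2(n-1) + 8*lambda of (3.2.27)-(3.2.28)
assert abs((v_ssr + v_sse + 2 * cov) - (2 * (n - 1) + 8 * lam)) < 1e-9
```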
Finally, the variance of the unbiased estimator of σ² can be shown
to be

    V(s*²) = σ⁴ [2(n−2)c₁² + 8λc₁² + 2c₂² + 8 e(λ*_s)(c₂² − c₁²)
                 + 4 v(λ*_s)(c₁² + c₂² + 2c₁c₂)] ,      (3.2.29)

where

    c₁ = g({x},n,r) / [(n−1)g({x},n,r) − Σx²]      (3.2.30)

and

    c₂ = [Σx² − g({x},n,r)] / [(n−1)g({x},n,r) − Σx²] .      (3.2.31)
From (3.2.19), it follows that

    e(λ*_s) = [e(SSR*/σ²) − 1]/2 ,      (3.2.32)

so, from (3.2.5), an expression for e(λ*_s) can be written in terms
of the parameters of the model; i.e.,

    e(λ*_s) = (β²Σx²/2σ²) [ (n²r(r−3) − 3r(n−1)(n−4) − 3r(r−1)) / (n(n−1)²(n−2)(n−3))
                            + (n−r−1)²/(n−1)² ]
              + (β²/2σ²)(Σx⁴/Σx²) (r(n²−2n+3) − r²(n−1)) / ((n−1)(n−2)(n−3)) .      (3.2.33)

e(λ*_s) may also be found directly by writing

    e(λ*_s) = (1/2σ²) γ'X' e(G_s'AG_s) Xγ ,
and substituting the value of e(G_s'AG_s) obtained from (2.3.5) and
(2.3.8).  Using this method, a slightly different form for e(λ*_s)
which will be used subsequently can be written; i.e.,

    e(λ*_s) = (β²Σx²/2σ²) [ ((n−r)(n−r−1) + r)/(n(n−1)) − 2r(n−r+1)/(n(n−1)(n−2))
                            + 3r(r−3)/(n(n−1)(n−2)(n−3)) ]
              + (β²/2σ²)(Σx⁴/Σx²) [ r(n−r−1)/(n(n−1)) + 4r(n−r+1)/(n(n−1)(n−2))
                            − 6r(r−3)/(n(n−1)(n−2)(n−3)) ] .      (3.2.34)
A different approach can be used to evaluate v(λ*_s).  Let

    G_s x = [x*_1s, x*_2s, ..., x*_ns]' ,   s = 1, ..., S(r,n) .      (3.2.35)

Then, it can be shown that

    λ*_s = (β²/2σ²Σx²) (Σ_{i=1}^n x_i x*_is)² ,   s = 1, ..., S(r,n) .      (3.2.36)

Also,

    v(λ*_s) = e(λ*_s²) − e²(λ*_s) ,      (3.2.37)

and

    e(λ*_s²) = (β⁴/4σ⁴(Σx²)²) e[(Σ_{i=1}^n x_i x*_is)⁴] .      (3.2.38)
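Equation (3.2.36) can be verified against the defining quadratic form
(3.2.14) for any permutation matrix G_s; the X values, the permutation,
and the parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n, mu, beta, sigma2 = 12, 1.0, 2.0, 1.5
x = rng.normal(size=n)
x -= x.mean()                               # centered X values
sxx = (x ** 2).sum()

perm = rng.permutation(n)
G = np.eye(n)[perm]                         # one permutation matrix G_s
x_star = G @ x                              # G_s x, elements x*_{is}
A = np.outer(x, x) / sxx                    # A = x(x'x)^{-1} x'

Xg = mu + beta * x                          # X gamma for the simple model
lam_quad = (G @ Xg) @ A @ (G @ Xg) / (2 * sigma2)            # (3.2.14)
lam_3236 = beta**2 / (2 * sigma2 * sxx) * (x @ x_star) ** 2  # (3.2.36)
assert abs(lam_quad - lam_3236) < 1e-9
```

The mu term drops out because the centered X vector is orthogonal to the
vector of ones, and permutation leaves that orthogonality intact.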
When (Σ_{i=1}^n x_i x*_is)⁴ is expanded and the expected value is
taken, the following expression results:

    e[(Σ_{i=1}^n x_i x*_is)⁴] = e(Σ_{i=1}^n x_i⁴x*_is⁴)
        + 4 e(Σ_{i=1}^n Σ_{j≠i} x_i³x*_is³ x_j x*_js)
        + 3 e(Σ_{i=1}^n Σ_{j≠i} x_i²x*_is² x_j²x*_js²)
        + 6 e(Σ_{i=1}^n Σ_{j≠i} Σ_{k≠i,j} x_i²x*_is² x_j x*_js x_k x*_ks)
        + e(Σ_{i=1}^n Σ_{j≠i} Σ_{k≠i,j} Σ_{ℓ≠i,j,k} x_i x*_is x_j x*_js x_k x*_ks x_ℓ x*_ℓs) .      (3.2.39)

The terms in (3.2.39) can be evaluated in a manner similar to that
used to arrive at (2.3.7).  The structure of the resulting expected
value is such that a different expression will be written for
r = 2, 3, 4, and r ≥ 5.  The details of finding the expectations
shown in (3.2.39) will not be given.  The results follow.
    e[(Σ_{i=1}^n x_i x*_is)⁴] = [1/(n(n−1))] [(n²−9n+36)(Σx²)⁴
        + (12n−120)(Σx²)²Σx⁴ − (8n−56)Σx²Σx⁶ + 80Σx²(Σx³)²
        − 112Σx³Σx⁵ + 70(Σx⁴)² + 2nΣx⁸] ,   r = 2 .      (3.2.40)
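For r = 2, (3.2.40) can be confirmed by brute force: average the fourth
power over all C(n,2) transpositions of an arbitrary centered set of X
values (again assuming the r = 2 transformations are the transpositions)
and compare with the closed form.

```python
import itertools
import numpy as np

n = 8
x = (np.arange(n) ** 2).astype(float)       # deliberately asymmetric X values
x -= x.mean()                               # centered
p = {k: (x ** k).sum() for k in (2, 3, 4, 5, 6, 8)}  # power sums

acc = 0.0
for i, j in itertools.combinations(range(n), 2):
    xs = x.copy()
    xs[i], xs[j] = x[j], x[i]               # a transposition G_s
    acc += (x @ xs) ** 4
mean4 = acc / (n * (n - 1) / 2)             # exact average over all G_s

closed = ((n**2 - 9 * n + 36) * p[2] ** 4
          + (12 * n - 120) * p[2] ** 2 * p[4]
          - (8 * n - 56) * p[2] * p[6]
          + 80 * p[2] * p[3] ** 2
          - 112 * p[3] * p[5]
          + 70 * p[4] ** 2
          + 2 * n * p[8]) / (n * (n - 1))
assert abs(mean4 - closed) < 1e-9 * abs(closed)
```

An asymmetric set of X values is used so that the Σx³ and Σx⁵ terms,
which vanish for symmetric designs, are also exercised.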
    e[(Σ_{i=1}^n x_i x*_is)⁴] = [1/(n(n−1)(n−2))] [(n−4)(n²−11n+36)(Σx²)⁴
        + 18(n−4)(n−6)(Σx²)²Σx⁴ + 12(7n−24)Σx²(Σx³)²
        − 12(n²−7n+14)Σx²Σx⁶ − 48(2n−7)Σx³Σx⁵
        + 3(19n−70)(Σx⁴)² + 3n(n−2)Σx⁸] ,   r = 3 .      (3.2.41)
e[(
n
~
i=1
414
3
2
J=
[n - 22n + 187n - 726n
1. 1.S·
24(~)
x.x~)
2
232
+ 16(7n -67n+146)~x (~x )
3
- 16(n -14n
- 32(4n
+ 4(22n
+ 4n(n
2
2
2
3
~2n+91)~x ~x
l
Ex
6
5
242
-196n+455)(~ )
8
-12n+13)~x
                                              r = 4 .      (3.2.42)

Let

    a₁  = r(n−r+1)[(n−r)(n−r−1) + 6] ,
    a₂  = r[(r+2)(r−5) + (n−r)[(r−3)(n−r−1) + 4(r−4)]] ,
    a₃  = r(r−5)[(r−4)(n−r) + 3(r−6)] ,
    a₄  = r(r−5)(r−6)(r−7) ,
    a₅  = r(n−r+1) ,
    a₆  = (n−r) ,
    a₇  = (n−r−1) + (n−r)(n−r−1)(n−r−2) ,      (3.2.43)
    a₈  = (n−r−2) ,
    a₉  = (n−r−3) ,
    a₁₀ = r(n−r+1) ,
    a₁₁ = r(r−3) ,
    a₁₂ = r(r−4) .

Then,
=
3a
ll
+ a a (6r+a a g )
S
6 7
[----:;;:.;;....-~--..;;.....;:;-
4 1(~)
4(3a 6a n +3 ~.12al)
5 1(~)
+
+ [
6a 5
-
105a4
S f (~)
2 4
l(L:X )
6(3all+6ra6+4r+a6a7(Sr+aSag»
---:;;~-~-~..:.-._--=-~-
41(~)
31(;)
2Sa
ll
+ 132ra
6
+ 10Sr + a a (S4r+Sa a g )
S
6 7
+_....::.:=---_--=._--_--=.~----=:;-..:;-.
41 (~)
1920a
--=-3 +
7 1(~)
3360a 4
S 1(~)
2 6
l L:x DC
48(4~6al1+10a10+a1)
5! (~)
384a
2
+-.....;;;;.
6! (~)
720a 3
7! (~)
+
1260a4
8! (~)
4 2
l(r;x )
6(7a11+36a10+a6a7(18r+a8a9))
4! (~)
720a
2880a
3
2
-.....;;;;.+-_.::.
6! (~)
7! (~)
5040a
4
--";']L:x
8! (~)
                                              r ≥ 5 .      (3.2.44)

The expression shown for e[(Σ_{i=1}^n x_i x*_is)⁴] for the appropriate
value of r may, of course, be substituted into (3.2.38), and the
result substituted into (3.2.37), along with the value of e²(λ*_s),
to finally obtain v(λ*_s).  Since the expressions involved are so
complex that little or no insight can be gained from them, an
expression for v(λ*_s) with the substitutions made will not be
shown here.
The objective of the preceding has been to find the variance of
s*², and thus to be able to examine the efficiency of s*² relative
to the estimator, s², which would have been used had the observations
not been transformed.  The estimator s² has variance

    V(s²) = 2σ⁴/(n−2) ,      (3.2.45)

so, from (3.2.29), the efficiency of s*² relative to s² can be
written

    R_s*²(n,r) = [2/(n−2)] / [2(n−2)c₁² + 8λc₁² + 2c₂²
                  + 8 e(λ*_s)(c₂² − c₁²) + 4 v(λ*_s)(c₁² + c₂² + 2c₁c₂)] .      (3.2.46)
In order to determine partially the effects of the transformation
procedure on the unbiased estimation of σ², V(s*²) and
R_s*²(n,r) were computed for the three examples discussed in Section
3.1.  Since all three sets of X values are symmetric, the terms
involving Σx³ and Σx⁵ in v(λ*_s) are equal to zero in these
cases, and the expression simplifies slightly.  The results of the
computations are shown in Table 3.2.
Table 3.2  Relative efficiencies of s*², R_s*², for three sets of
           X values, n = 100

 β²/σ²    w       (1)a      (2)b      (3)c

  .05    .02      .995      .995      .988
         .20      .931      .931      .872
         .40      .794      .794      .668
         .60      .503      .499      .365
         .80      .089      .083      .061
        1.00      .000      .000      .000

  .10    .02      .989      .989      .974
         .20      .877      .879      .768
         .40      .679      .682      .501
         .60      .380      .378      .233
         .80      .067      .063      .040
        1.00      .000      .000      .000

  .50    .02      .908      .920      .770
         .20      .519      .549      .300
         .40      .253      .272      .116
         .60      .100      .105      .040
         .80      .019      .018      .007
        1.00      .000      .000      .000

 1.00    .02      .757      .790      .496
         .20      .282      .321      .127
         .40      .111      .127      .043
         .60      .040      .044      .014
         .80      .008      .008      .003
        1.00      .000      .000      .000

 2.00    .02      .472      .527      .212
         .20      .112      .140      .042
         .40      .039      .048      .013
         .60      .014      .016      .004
         .80      .003      .003      .001
        1.00      .000      .000      .000

 a  X's expected values of order statistics N(0,1),
    Σx²/n ≈ .9726 ,  nΣx⁴/(Σx²)² ≈ 2.7789

 b  X's equally spaced about zero with same Σx²/n value as (1),
    nΣx⁴/(Σx²)² ≈ 1.7998

 c  X's equally spaced about zero with approximately same range as (1),
    Σx²/n ≈ 2.0831 ,  nΣx⁴/(Σx²)² ≈ 1.7998
Examination of the relative efficiencies shown in Table 3.2 reveals that in this case, as in the case of estimation of β, the
losses in information resulting from use of the transformation procedure are severe in most cases.  Comparison of Table 3.2 with Table
3.1 shows that the general comparisons of R_u values made between
the three sets of X values also apply in the case of R_s*².
Further, it can be seen that the loss of information due to the
transformation is less severe when estimating σ² than when
estimating β for small values of β²/σ², but is more severe for
larger values of β²/σ².  Some additional remarks about the
estimation of σ² will be made in Section 3.3.
3.3  Some Asymptotic Properties of the Least Squares Estimators

The large sample behavior of the estimators of β and σ²
discussed in the previous sections is dependent upon the assumptions
which are made about the X values in the model and r or w.  For
convenience, the assumptions about r and w shown in (2.4.18) will
be repeated:

    (1)  r is constant,
    (2)  r = [nα] ,   0 < α < 1 ,      (3.3.1)
    (3)  w is constant.

Each of the properties to be discussed will be considered for each of
the assumptions shown above.  In practical applications it appears
that assumption (3) would be more likely to be acceptable.  However,
assumptions (1) and (2) are included as alternatives for the sake of
completeness.

First, it can be shown that

    lim_{n→∞} e(β̂*) = β         under assumptions (1) and (2) ,
                                                                     (3.3.2)
    lim_{n→∞} e(β̂*) = (1−w)β    under assumption (3) .

So, β̂* is biased in the limit when w is constant, and the bias is
a function of the proportion of observations which are transformed.
Suppose it can be assumed that

    lim_{n→∞} Σx²/n = θ > 0 ,      (3.3.3)

and

    lim_{n→∞} Σxʲ/nˡ = 0   for  j ≥ 2  and  ℓ > 1 .      (3.3.4)
These do not appear to be unreasonable assumptions within the framework of regression.  From (3.3.3) and (3.3.4), it follows that

    lim_{n→∞} Σx⁴/(Σx²)² = 0 ,

and under all of the assumptions of (3.3.1), it can be shown that

    lim_{n→∞} V(β̂*) = 0      (3.3.5)

and

    lim_{n→∞} V(β̂_u) = 0 ,   w < 1 .      (3.3.6)

Therefore, the following results regarding consistency of the
estimators of β can be shown:

    β̂*   is consistent for  β  under assumptions (1) and (2) ,
    β̂_u  is consistent for  β  under assumptions (1) and (2) ,
    β̂_u  is consistent for  β  under assumption (3) if  w < 1 .

Note that under assumption (3), neither of the estimators is consistent for β if w = 1 .
Suppose that, in addition to (3.3.3) and (3.3.4), it can also be
assumed that

    lim_{n→∞} nΣx⁴/(Σx²)² = K > 0 .      (3.3.7)

Then under the assumption that r remains constant, it follows that

    lim_{n→∞} R*(n,r) = 1      (3.3.8)

and

    lim_{n→∞} R_u(n,r) = 1 ;      (3.3.9)

i.e., the asymptotic mean square errors of β̂* and β̂_u relative to
the estimator which would have been available in the untransformed
case are equal to one.  Under assumption (2) of (3.3.1),

    lim_{n→∞} R_u(n,r) = 1 ,   0 < α < 1 ,      (3.3.10)

but

    lim_{n→∞} R*(n,r) = 1 ,   0 < α < 1/2 ,      (3.3.11)

    lim_{n→∞} R*(n,r) = 1/(1 + (β²/σ²)θ) ,   α = 1/2 ,      (3.3.12)

and

    lim_{n→∞} R*(n,r) = 0 ,   1/2 < α < 1 .      (3.3.13)

Thus, the asymptotic relative mean square error of β̂* depends in
this case on the choice of α.  This is due to the presence of the
bias term in the mean square error of β̂*.  Under assumption (3) of
(3.3.1), it can be shown that

    lim_{n→∞} R*(n,r) = 0      (3.3.14)

and

    lim_{n→∞} R_u(n,r) = (1−w)² / [1 + (β²/σ²) θ (w² + Kw(1−w))] .      (3.3.15)

Here again, R*(n,r) tends to zero due to the presence of the bias
term in the mean square error of β̂* .
Suppose the assumptions about the X values shown in (3.3.3),
(3.3.4), and (3.3.7) all hold, and the normality assumption given in
Section 2.1 holds.  Then it can be shown that under assumptions (1)
and (2) of (3.3.1),

    lim_{n→∞} V(s*²) = 0 ,      (3.3.16)

and, under assumption (3) of (3.3.1),

    lim_{n→∞} V(s*²) = 0 ,   w < 1 ,      (3.3.17)

    lim_{n→∞} V(s*²) = +∞ ,   w = 1 .      (3.3.18)
Also, under assumptions (1) and (2) of (3.3.1), the following results
can be shown to hold:

    lim_{n→∞} 2(n−2)²c₁² = 2 ,

    lim_{n→∞} 8λ(n−2)c₁² = 4(β²/σ²)θ ,

    lim_{n→∞} 2(n−2)c₂² = 0 ,      (3.3.19)

    lim_{n→∞} 8(n−2)e(λ*_s)(c₂² − c₁²) = −4(β²/σ²)θ ,

    lim_{n→∞} 4(n−2)v(λ*_s)(c₁² + c₂² + 2c₁c₂) = 0 ,

where c₁ and c₂ are defined in (3.2.30) and (3.2.31).  Therefore,
it follows that if r is constant or r = [nα] ,

    lim_{n→∞} R_s*²(n,r) = 1 ;      (3.3.20)
i.e., the asymptotic efficiency of s*² relative to the estimator,
s², which would have been used in the untransformed case is equal to
one under either of these assumptions.  Under assumption (3) of
(3.3.1), it can be shown that if w < 1 ,

    lim_{n→∞} 2(n−2)²c₁² = 2 ,

    lim_{n→∞} 8λ(n−2)c₁² = 4(β²/σ²)θ ,

    lim_{n→∞} 2(n−2)c₂² = 0 ,      (3.3.21)

    lim_{n→∞} 8(n−2)e(λ*_s)(c₂² − c₁²) = 4(β²/σ²)θ [1/(1−w)² − 2] ,

    lim_{n→∞} 4(n−2)v(λ*_s)(c₁² + c₂² + 2c₁c₂) = 4(β⁴/σ⁴)θ² (w² + Kw(1−w))/(1−w)² ,

so with these results it follows that

    lim_{n→∞} R_s*²(n,r) = 1/[1 + 2(β²/σ²)θ(2w−w²)/(1−w)²
                               + 2(β⁴/σ⁴)θ²(w² + Kw(1−w))/(1−w)²] .      (3.3.22)

However, if w = 1 , it follows from (3.3.18) that

    lim_{n→∞} R_s*²(n,r) = 0 .      (3.3.23)
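The asymptotic efficiencies (3.3.15) and (3.3.22) are simple closed
forms; evaluating them at the parameter values used for Table 3.3
reproduces its entries.

```python
def are_bu(b2s2, w, theta, K):
    """Asymptotic relative efficiency of beta_hat_u, (3.3.15)."""
    return (1 - w) ** 2 / (1 + b2s2 * theta * (w**2 + K * w * (1 - w)))

def are_s2(b2s2, w, theta, K):
    """Asymptotic relative efficiency of s*^2, (3.3.22)."""
    q = (1 - w) ** 2
    return 1 / (1 + 2 * b2s2 * theta * (2 * w - w**2) / q
                  + 2 * b2s2**2 * theta**2 * (w**2 + K * w * (1 - w)) / q)

# first column of Table 3.3 (theta = .9726, K = 2.7789, beta^2/sigma^2 = .05)
vals = [round(are_bu(0.05, w, 0.9726, 2.7789), 3) for w in (0.02, 0.20, 0.40)]
```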
The values of the asymptotic relative efficiencies of β̂_u and
s*² for w assumed constant were computed for the same values of
β²/σ² and w, and for values of θ and K taken to be the same
as the corresponding values of Σx²/n and nΣx⁴/(Σx²)² used in the
examples for n = 100 discussed in Sections 3.1 and 3.2.  The results
of these computations are shown in Table 3.3.  The values of R_u
shown in Table 3.1 do not differ much from the corresponding asymptotic
relative efficiencies of β̂_u shown in Table 3.3.  There are greater
differences between the values of R_s*²(n,r) shown in Table 3.2 and
the corresponding asymptotic relative efficiencies of s*².  This is
due in part to the differences between 2(n−2)²c₁² and 2(n−2)c₂²
for n = 100 and the corresponding limiting values.  It would appear,
then, that for sets of observations comparable to the examples shown,
the asymptotic results might be applicable for relatively small sample
sizes.
Several observations can be made regarding the apparent effects
of changes in the various parameters on the asymptotic relative
efficiencies shown in Table 3.3.  Most of these observations can also
be made by examining the formulae for the asymptotic relative
efficiencies of β̂_u and s*² for w assumed constant.  First,
comparison of corresponding efficiency values for β̂_u and s*² reveals that the relative efficiency of s*² is affected to a far
greater extent by the size of β²/σ² than is the relative efficiency
of β̂_u .  When β²/σ² = .05 , the asymptotic relative efficiency of
β̂_u is fairly closely approximated by (1−w)² for the two cases in
which θ = .9726 , but the approximation is less close for the larger
Table 3.3  Asymptotic relative efficiencies of β̂_u and s*², w
           assumed constant

                θ=.9726, K=2.7789    θ=.9726, K=1.7998    θ=2.0831, K=1.7998
 β²/σ²    w    ARE(β̂_u)  ARE(s*²)   ARE(β̂_u)  ARE(s*²)   ARE(β̂_u)  ARE(s*²)

  .05    .02     .958      .996       .959      .996       .957      .991
         .20     .625      .945       .630      .946       .619      .886
         .40     .346      .845       .350      .847       .339      .711
         .60     .152      .649       .154      .652       .148      .454
         .80     .038      .289       .038      .290       .036      .154
        1.00     .000      .000       .000      .000       .000      .000

  .10    .02     .955      .991       .957      .991       .953      .980
         .20     .611      .890       .620      .894       .599      .782
         .40     .333      .720       .340      .726       .320      .531
         .60     .145      .467       .149      .473       .137      .276
         .80     .036      .162       .037      .164       .034      .077
        1.00     .000      .000       .000      .000       .000      .000

  .50    .02     .935      .937       .944      .945       .926      .857
         .20     .518      .525       .552      .559       .477      .305
         .40     .257      .262       .280      .285       .223      .121
         .60     .107      .109       .116      .118       .088      .044
         .80     .026      .027       .028      .028       .020      .010
        1.00     .000      .000       .000      .000       .000      .000

 1.00    .02     .912      .842       .928      .869       .894      .669
         .20     .435      .284       .485      .326       .380      .128
         .40     .200      .114       .228      .132       .161      .044
         .60     .080      .043       .090      .049       .060      .015
         .80     .019      .010       .021      .011       .014      .003
        1.00     .000      .000       .000      .000       .000      .000

 2.00    .02     .868      .628       .898      .694       .836      .380
         .20     .329      .112       .391      .142       .270      .043
         .40     .138      .040       .167      .049       .104      .014
         .60     .053      .014       .063      .017       .037      .005
         .80     .013      .003       .014      .004       .008      .001
        1.00     .000      .000       .000      .000       .000      .000

    θ = lim_{n→∞} Σx²/n ,   K = lim_{n→∞} nΣx⁴/(Σx²)²
value of θ.  The comparisons of the relative efficiencies of the
estimators for different values of the variance and kurtosis of the
X values which were made in the examples for n = 100 appear to
apply in the asymptotic case as well.  Note that the reduction in
relative efficiency which accompanies an increase in θ is more
marked for s*² than for β̂_u for fixed β²/σ² and w.  It is
difficult to arrive at an intuitive explanation for the effects of
changes in the frequency distribution of the X values and the
parameter values on the relative efficiencies of the estimators due
to the complexity of the transformation process.

The preceding results would appear to indicate that the losses in
the area of estimation incurred by applying the transformation procedure would be too severe to make the use of such a procedure
desirable under any circumstances.  However, in the context in which
the procedure would be applied, the preceding results must be considered in conjunction with the observation that if some such procedure were not used, the entire set of observations might not be
available for analysis.  Further, if the number of observations is
large and the proportion of observations interchanged is less than
one, the effective sample sizes available in terms of relative
efficiency for unbiased estimation of β and σ² might still be
considered sufficient.
64
4. TESTING THE HYPOTHESIS β = 0

4.1 The Test Statistic F*
Suppose the regression and error sums of squares of the transformed observations expressed as quadratic forms in (3.2.10) were used to form a statistic F*, i.e.,

    F* = Y*'AY* / [Y*'BY*/(n-2)] .     (4.1.1)
Suppose a test of the hypothesis β = 0 is made based on F*. The test to be considered is

    Reject H_0: β = 0  if  F* > F(1,(n-2),α) ,
                                                     (4.1.2)
    Accept H_0: β = 0  if  F* ≤ F(1,(n-2),α) ,

where F(1,(n-2),α) is the critical F value for significance level α. This is, of course, the usual regression test which would have been applied had the observations not been transformed.
In order to assess the effects of the transformation procedure upon the test, the distribution of the statistic F* under the null hypothesis and the alternative hypothesis β ≠ 0 will be examined. Throughout the following, the assumption of normality of the errors will be made.
It was shown in Section 3.2 that the conditional distribution of Y*, given G_s, is that of a set of n independent normal random variables with mean G_s Xγ and variance σ²I. Under the null hypothesis, β = 0, so G_s Xγ = μ1 for every s. Therefore, the conditional probability density function of Y*, given G_s, can be written

    f(Y*|G_s) = (2πσ²)^(-n/2) exp[-Σ(y*_i - μ)²/(2σ²)] ,  -∞ < y*_i < ∞ ,  i = 1, ..., n ,     (4.1.3)

for every s. Thus, the marginal probability density function of Y* is that of a set of n independent, normally distributed random variables with mean μ1 and variance σ²I. It was stated in Section 3.2 that the matrices A and B are idempotent of rank one and n - 2, respectively, and that AB = BA = 0. From the above, then, SSR*/σ² is distributed as a non-central chi-square random variable with one degree of freedom and non-centrality parameter λ_r, and SSE*/σ² is distributed as a non-central chi-square random variable with (n-2) degrees of freedom and non-centrality parameter λ_e. However, since Σ_{i=1}^n x_i = 0, it follows that 1'A1 = 0 and 1'B1 = 0, so λ_r = λ_e = 0. Further, since the trace of AB is zero, it follows that SSR*/σ² and SSE*/σ² are independent. Therefore, under the null hypothesis, the statistic F* is the ratio of two independent central chi-square random variables, each divided by its degrees of freedom, so F* is distributed as F with one and (n-2) degrees of freedom. Under the null hypothesis, then, the transformation has no effect on the distribution of the test statistic. Therefore the size of the test, α, is not affected by the transformation.
Suppose β ≠ 0. In order to obtain the marginal distribution of F* under this assumption, the conditional distribution of F*, given G_s, will be considered.
It was shown in Section 3.2 that, given G_s, SSR*/σ² and SSE*/σ² are conditionally independent, with non-centrality parameters λ*_s and λ - λ*_s, where λ*_s and λ are defined in (3.2.14) and (3.2.15). Therefore, given G_s, F* is distributed as a doubly non-central F random variable with one and n - 2 degrees of freedom and non-centrality parameters λ*_s and λ - λ*_s. Denote the probability density function of a doubly non-central F random variable with the degrees of freedom and non-centrality parameters above by h(F*; 1, n-2, λ*_s, λ-λ*_s), 0 < F* < ∞. Then the marginal probability density function of F* may be written

    f(F*) = Σ_{s=1}^{S(r,n)} h(F*; 1, n-2, λ*_s, λ-λ*_s) / S(r,n) ,   0 < F* < ∞ .     (4.1.4)

Thus, for β ≠ 0, the marginal probability density function of F* is seen to be an average of S(r,n) doubly non-central F density functions.

Of course, the objective of finding the distribution of F* under the alternative hypothesis is to determine the power of the test using the statistic F*. For a given β ≠ 0 and a given significance level α, the power of the test is

    P = Σ_{s=1}^{S(r,n)} [1/S(r,n)] ∫_{F(1,(n-2),α)}^{∞} h(F*; 1, n-2, λ*_s, λ-λ*_s) dF* .     (4.1.5)
For a given β ≠ 0 and a given significance level, the power of the test may be found by evaluating the upper tail probabilities of S(r,n) doubly non-central F distributions.

There are difficulties which are immediately apparent when evaluating the function in (4.1.5) is considered. First, evaluation of a single term of the sum in (4.1.5) requires the integration of either a doubly infinite series representation of the exact distribution of the doubly non-central F or the use of approximations of the F* such as those given by Schotz in (10). Second, the number of terms in the sum in (4.1.5) becomes rather large with even small n and small r. For example, if n = 10 and r = 2, there would be forty-five terms in the sum in (4.1.5). It appears that, even with the aid of a computer, finding the exact power or an approximation to it would be an unmanageable problem for all but very small numbers of observations and small r. In order to provide some insight into the effect of the transformation on the power of the test for smaller sample sizes, some simulations were performed as examples. The procedures used and the results of the simulations will be presented in the next section.
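The count of forty-five terms quoted above is easy to reproduce. Assuming (from the context of Section 2.3) that S(r,n) counts the permutation matrices which displace exactly r of the n observations, S(r,n) = C(n,r)·D_r, where D_r is the number of derangements of r items; a minimal sketch:

```python
from math import comb

def derangements(r):
    # D_0 = 1, D_1 = 0, D_k = (k - 1)(D_{k-1} + D_{k-2})
    d = [1, 0]
    for k in range(2, r + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d[r]

def num_permutations(r, n):
    # choose which r observations move, times the ways to derange all r of them
    return comb(n, r) * derangements(r)

print(num_permutations(2, 10))  # 45, the number of terms quoted for n = 10, r = 2
```

For r = 2 there is only one way to derange a chosen pair (a transposition), so the count reduces to C(10,2) = 45.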
4.2 Small Sample Simulations of the Power of the Test Based on F*
The purpose of performing the computer simulations which are discussed in this section was to obtain some information concerning the effect of applying a transformation to the observations on the power of the test. Two general types of X values were selected for simulation. In one case, the X values were taken to be the expected values of the order statistics for a sample of size 10 from a normal distribution with mean zero and variance one, as shown in (5), for n = 10. For n = 20, 50 and 100, the ten points were replicated to provide a full set of n X values. In the second case, the X values were taken to be the integers 1, 2, ..., n for n = 10 and n = 20. For each set of X values and sample size n, simulations were run for β²/σ² assuming the values .05, .10, .50, 1.00 and 2.00. In each case, σ² was set equal to one and μ in the model was taken to be one. Further, the value of β was taken to be the positive root of β²/σ², given σ². For a particular value of n and set of X values, a set of n observations was generated using the model of Section 2.1. The e_i were generated as N(0,1) random variables. Once the set of observations was generated, the usual F statistic was computed and the test was made with α = .05 based on the F critical value. Then a simple random sample of r of the n Y values within that sample was selected and the Y values interchanged using a random permutation. The F* statistic was computed for the set of transformed observations, and the test was made using the same critical value as that used for the test based on the F statistic. Several such samples were generated in each case, and the number of rejections of the null hypothesis was computed for both tests. Then the estimated power of the test based on F* was computed as the proportion of rejections among the total number of simulated observations for the particular value of r being considered. The estimated power of the test based on F was computed as the proportion of rejections among the simulated observations for all values of r for a given n. Values of r were taken to be r = wn for w = .10, .20, ..., 1.00, except that w = .10 was omitted when n = 10, since r cannot take the value one.
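The simulation loop described above can be sketched as follows. This is a reconstruction, not the original program: numpy and scipy are assumed, and the interchange step applies a random cyclic shift to r randomly chosen Y values (one of the permutations displacing exactly r observations).

```python
import numpy as np
from scipy import stats

def f_stat(x, y):
    # regression and error sums of squares for simple linear regression
    n = len(y)
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    ssr = sxy ** 2 / sxx
    sse = np.sum((y - y.mean()) ** 2) - ssr
    return ssr / (sse / (n - 2))

def simulate_power(x, beta, r, n_sims=400, mu=1.0, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    n = len(x)
    crit = stats.f.ppf(1 - alpha, 1, n - 2)  # same critical value for F and F*
    rej_f = rej_fstar = 0
    for _ in range(n_sims):
        y = mu + beta * x + rng.standard_normal(n)  # sigma^2 = 1
        rej_f += f_stat(x, y) > crit
        ystar = y.copy()
        idx = rng.choice(n, size=r, replace=False)  # sample r of the n Y values
        ystar[idx] = y[np.roll(idx, 1)]             # interchange them cyclically
        rej_fstar += f_stat(x, ystar) > crit
    return rej_f / n_sims, rej_fstar / n_sims
```

With x the integers 1, ..., 20 and β²/σ² = 1, the estimated powers behave as in Table 4.1: near one for small r, and falling as r grows.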
If the power of the test for a particular value of w is denoted by P_w, then the estimated power value, P̂_w, resulting from a set of simulated observations could be considered to be a binomial proportion with parameter P_w, based on M observations. The variance of such a proportion is

    σ²_{P̂_w} = P_w(1 - P_w)/M .     (4.2.1)

This variance would take on its maximum value if P_w = .50. Taking σ_{P̂_w} = .025, the number of observations to be simulated was set at 400 for all of the cases considered. Using the normal approximation to the binomial,

    P(P_w - 1.96 σ_{P̂_w} ≤ P̂_w ≤ P_w + 1.96 σ_{P̂_w}) ≈ .95 ,     (4.2.2)

so that with probability approximately .95, the estimated power would lie within a band of width .098 around the true value of P_w. When P_w > .50 or P_w < .50, the probability of P̂_w lying within that interval would be greater than .95 using the normal approximation. However, use of the normal approximation for extremely large or small values of P_w leads to overestimating the probability of the interval shown in (4.2.2). Perhaps the overestimation is offset by the decrease in σ_{P̂_w} with decreasing P_w or (1 - P_w) for fixed sample size. In any case, the simulation sample size was fixed at 400 simulated samples for all cases.
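The precision argument above is simple arithmetic and can be checked directly (P_w = .50 is the worst case for the variance of a proportion):

```python
M = 400
p = 0.50                          # worst case: p(1 - p) is maximized at .50
sigma = (p * (1 - p) / M) ** 0.5  # standard error of the estimated power
width = 2 * 1.96 * sigma          # width of the approximate 95 per cent band
print(round(sigma, 3), round(width, 3))  # 0.025 0.098
```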
The simulated power values from the simulations discussed above are shown in Tables 4.1 and 4.2. The behavior of the estimated power for both sets of X values appears to be somewhat erratic for n = 10 and n = 20 and for w ≥ .80 for all values of n; i.e., general trends are not clearly visible in those cases. In general, however, it appears that for fixed β²/σ² and n, the estimated power is a non-increasing function of w, and the decrease in estimated power does not appear to be linear in w. Also, the estimated power appears to be a non-decreasing function of n and of β²/σ² with fixed values of the other parameters.

Since the power of the test based on F* for given values of w and β²/σ² appears to increase with n, at least for w less than one, it is of interest to arrive at a comparison of the test based on F* and the test based on F which demonstrates the comparative abilities of the tests to discriminate between the null hypothesis and alternatives which are very close to the null when n is large. Such a comparison is given by the concept of asymptotic relative efficiency of tests introduced by Pitman. In the context of the present problem, the Pitman efficiency provides a measure of the effect of using the transformation procedure on the power of the test for a zero regression coefficient. In the next section, the Pitman efficiency of the test based on F* relative to the test based on F is developed.
Table 4.1  Small sample simulated power values for X values equally spaced^a

                                β²/σ²
  n    w=r/n     .05     .10     .50    1.00    2.00

 10     .00     .427    .707   1.000   1.000   1.000
        .20     .258    .400    .733    .775    .810
        .30     .185    .290    .508    .568    .588
        .40     .175    .173    .353    .330    .428
        .50     .095    .125    .245    .245    .250
        .60     .100    .090    .123    .108    .118
        .70     .063    .068    .090    .065    .053
        .80     .033    .063    .040    .035    .028
        .90     .035    .045    .045    .015    .018
       1.00     .043    .028    .048    .028    .045

 20     .00     .773    .966   1.000   1.000   1.000
        .10     .650    .888    .990    .988    .995
        .20     .498    .720    .968    .973    .985
        .30     .383    .590    .863    .865    .903
        .40     .250    .398    .688    .753    .770
        .50     .195    .288    .488    .538    .555
        .60     .145    .148    .325    .353    .353
        .70     .103    .108    .185    .203    .173
        .80     .060    .078    .065    .098    .088
        .90     .068    .055    .040    .043    .055
       1.00     .048    .035    .053    .055    .043

 50     .00     .993   1.000   1.000   1.000   1.000
        .10     .970   1.000   1.000   1.000   1.000
        .20     .898    .988    .998   1.000   1.000
        .30     .795    .950    .998   1.000   1.000
        .40     .653    .835    .983    .980    .995
        .50     .515    .665    .893    .940    .950
        .60     .270    .465    .735    .760    .783
        .70     .178    .248    .475    .483    .510
        .80     .110    .130    .200    .240    .250
        .90     .035    .068    .078    .068    .083
       1.00     .043    .053    .073    .043    .050

100     .00    1.000   1.000   1.000   1.000   1.000
        .10    1.000   1.000   1.000   1.000   1.000
        .20    1.000   1.000   1.000   1.000   1.000
        .30     .993   1.000   1.000   1.000   1.000
        .40     .920    .985   1.000   1.000   1.000
        .50     .815    .915    .995    .998   1.000
Table 4.1  (Continued)

                                β²/σ²
  n    w=r/n     .05     .10     .50    1.00    2.00

100     .60     .573    .775    .963    .985    .983
        .70     .358    .455    .770    .810    .850
        .80     .180    .275    .430    .415    .450
        .90     .085    .085    .155    .118    .138
       1.00     .060    .060    .058    .030    .043

^a X values are the integers 1, 2, ..., 10 for n = 10, and replicates of these values for n > 10. For n = 10 and w = 0, 3600 samples were observed; for n > 10 and w = 0, 4000 samples were observed; for w > 0, 400 samples were observed.
Table 4.2  Small sample simulated power values for X values expected values of N(0,1) order statistics^a

                                β²/σ²
  n    w=r/n     .05     .10     .50    1.00    2.00

 10     .00     .087    .129    .420    .699    .931
        .20     .080    .113    .308    .463    .583
        .30     .063    .070    .200    .338    .450
        .40     .045    .075    .148    .213    .268
        .50     .060    .073    .073    .158    .200
        .60     .068    .093    .063    .125    .115
        .70     .055    .053    .060    .043    .045
        .80     .058    .033    .048    .045    .050
        .90     .040    .035    .033    .033    .023
       1.00     .058    .045    .043    .030    .060

 20     .00     .137    .241    .799    .977   1.000
        .10     .113    .190    .688    .878    .935
        .20     .108    .163    .558    .738    .898
        .30     .075    .128    .430    .583    .778
        .40     .080    .093    .293    .458    .595
        .50     .075    .110    .230    .340    .438
        .60     .088    .068    .153    .215    .293
        .70     .068    .060    .093    .078    .173
        .80     .045    .065    .048    .055    .098
        .90     .050    .055    .060    .048    .038
       1.00     .030    .045    .038    .035    .048

 50     .00     .327    .567    .998   1.000   1.000
        .10     .280    .443    .988    .998   1.000
        .20     .235    .408    .945    .995    .995
        .30     .185    .323    .863    .970    .988
        .40     .140    .255    .673    .858    .938
        .50     .105    .140    .558    .733    .845
        .60     .093    .120    .353    .500    .598
        .70     .060    .090    .245    .248    .405
        .80     .035    .053    .090    .175    .165
        .90     .055    .040    .053    .058    .078
       1.00     .040    .050    .035    .035    .058

100     .00     .578    .876   1.000   1.000   1.000
        .10     .488    .768   1.000   1.000   1.000
        .20     .405    .643    .998   1.000   1.000
        .30     .330    .528    .990   1.000   1.000
        .40     .245    .433    .950    .985   1.000
        .50     .193    .300    .783    .925    .985
Table 4.2  (Continued)

                                β²/σ²
  n    w=r/n     .05     .10     .50    1.00    2.00

100     .60     .143    .188    .633    .783    .910
        .70     .068    .140    .363    .570    .670
        .80     .070    .090    .183    .265    .358
        .90     .063    .045    .093    .083    .115
       1.00     .055    .043    .030    .065    .058

^a For n = 10 and w = 0, 3600 samples were observed; for n > 10 and w = 0, 4000 samples were observed; for w > 0, 400 samples were observed.
4.3 The Asymptotic Efficiency of the Test Based on F* Relative to the Test Based on F
The test based on the F statistic which would have been computed using the original set of observations may be compared with the test based on F* asymptotically in the Pitman sense; i.e., the sample sizes required for each level α test to achieve the same power with respect to the same sequence of alternatives may be compared. To compare the tests of H_0: β = 0 versus the set of alternatives H_a: β ≠ 0 or, equivalently, H_0: β² = 0 versus H_a: β² > 0, a particular sequence of alternatives which change with n so that lim_{n→∞} β²_n = β²_0 = 0 will be considered for each test. Therefore, let

    β²_n = k/n ,   n = 1, 2, ... .     (4.3.1)

Throughout the following, the assumptions made in (3.3.3) and (3.3.4) will be assumed to hold, i.e.,

    lim_{n→∞} Σ x_i²/n = θ > 0 ,     (4.3.2)

and

    lim_{n→∞} Σ |x_i|^j / n^t = 0   for j ≥ 2 and t > 1 .     (4.3.3)

Also, it will be assumed that w = r/n is constant.
In order to find the asymptotic relative efficiency under consideration, it is necessary to find the asymptotic distribution of the statistics F and F* under the assumptions given above. First consider the F statistic

    F_n = SSR_n / [SSE_n/(n-2)] ,     (4.3.4)

where SSR_n and SSE_n are the usual regression and error sums of squares computed using n untransformed observations. Under the assumption of normality of the e's, SSE_n/σ² is distributed as a (central) chi-square random variable with n - 2 degrees of freedom, and SSR_n/σ² is distributed as a non-central chi-square random variable with one degree of freedom and non-centrality parameter

    λ_n = β²_n Σ_{i=1}^n x_i² / 2σ² = k Σ_{i=1}^n x_i² / 2σ²n ,     (4.3.5)

where β²_n = k/n. Therefore,

    E(SSE_n/(n-2)σ²) = 1     (4.3.6)

and

    V(SSE_n/(n-2)σ²) = 2/(n-2) .     (4.3.7)

Since

    lim_{n→∞} V(SSE_n/(n-2)σ²) = 0 ,

it follows by Tchebycheff's inequality that

    SSE_n/(n-2)σ² →P 1 ;     (4.3.8)

i.e., SSE_n/(n-2)σ² converges in probability to the constant 1. Since SSR_n/σ² is distributed as a non-central chi-square random variable, its characteristic function, φ_n(t), can be written

    φ_n(t) = (1 - 2it)^{-1/2} e^{2λ_n it/(1-2it)} ,     (4.3.9)

so

    ln φ_n(t) = -(1/2) ln(1 - 2it) + 2λ_n it/(1 - 2it)

for all n. From (4.3.2), it follows that

    lim_{n→∞} λ_n = kθ/2σ² = λ ,     (4.3.10)

so

    lim_{n→∞} ln φ_n(t) = -(1/2) ln(1 - 2it) + 2λit/(1 - 2it) = ln φ(t) ,

where φ(t) is the characteristic function of a non-central chi-square random variable with one degree of freedom and non-centrality parameter λ. Therefore, by the Levy-Cramer convergence theorem,

    SSR_n/σ² →d χ²(1,λ) ;     (4.3.11)

i.e., SSR_n/σ² converges in distribution to a non-central chi-square random variable with one degree of freedom and non-centrality parameter λ
when β²_n = k/n. A theorem proved by Cramer in (2) states in part that if a sequence of random variables, X_1, X_2, ..., with distribution functions F_1(x), F_2(x), ..., converges in distribution to a random variable X with distribution function F(x), and if another sequence of random variables Y_1, Y_2, ... converges in probability to a constant c > 0, then the sequence of random variables X_n/Y_n converges in distribution to a random variable with distribution function F(cx). Cramer emphasizes that no requirements are placed on the random variables with regard to independence. In the case currently being considered, the random variables are independent. From Cramer's theorem, then, it follows from (4.3.8) and (4.3.11) that

    F_n →d χ²(1,λ) .     (4.3.12)

It was shown in Section 3.2 that the conditional distributions of the sums of squares used to form the statistic F*, given G_s(n), involve the quantity λ*_s(n). From (3.2.36), λ*_s(n) can be written in this case as

    λ*_s(n) = k (Σ_{i=1}^n x_i x*_is(n))² / (2σ²n Σ_{i=1}^n x_i²) ,   s(n) = 1, 2, ..., S(r,n) ,     (4.3.13)
where x*_1s(n), ..., x*_ns(n) are the elements of the vector of x values rearranged by G_s(n). Therefore, Σ_{i=1}^n x_i x*_is(n) is the inner product of the vector of original x values and the vector of x values rearranged to reflect the transformation G_s(n) which has been applied to the data. By the Cauchy-Schwartz inequality,

    (Σ_{i=1}^n x_i x*_is(n))² ≤ (Σ_{i=1}^n x_i²)(Σ_{i=1}^n x*_is(n)²) ,

but

    Σ_{i=1}^n x*_is(n)² = Σ_{i=1}^n x_i² ,

so

    (Σ_{i=1}^n x_i x*_is(n))² ≤ (Σ_{i=1}^n x_i²)² .     (4.3.14)

From (4.3.14), it follows that

    0 ≤ λ*_s(n) ≤ λ_n     (4.3.15)

and

    0 ≤ λ_n - λ*_s(n) ≤ λ_n     (4.3.16)

for s(n) = 1, 2, ..., S(r,n) and every n.
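The inequality chain above is easy to confirm numerically for any rearrangement G_s: the rearranged vector has the same sum of squares as the original, so Cauchy-Schwarz gives the bound. A quick check with a random x vector:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
x_star = rng.permutation(x)     # the x values rearranged by some permutation

lhs = np.dot(x, x_star) ** 2    # (sum of x_i x*_i)^2
rhs = np.sum(x ** 2) ** 2       # (sum of x_i^2)^2, since sum x*_i^2 = sum x_i^2
assert lhs <= rhs
```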
From (3.2.34) it can be shown that, under the assumptions (4.3.2) and (4.3.3),

    lim_{n→∞} E(λ*_s(n)) = λ(1 - w)²     (4.3.17)

when β²_n = k/n. It can also be shown that

    lim_{n→∞} Var(λ*_s(n)) = 0     (4.3.18)

from (3.2.34), (3.2.37), (3.2.38), and (3.2.44) and the assumptions (4.3.2) and (4.3.3). From (3.2.20) it follows that

    E(SSE*/(n-2)σ²) = 1 + 2λ_n/(n-2) - 2E(λ*_s(n))/(n-2)     (4.3.19)

and therefore,

    lim_{n→∞} E(SSE*/(n-2)σ²) = 1 .     (4.3.20)

From the expression for V(SSE*/(n-2)σ²) given in (3.2.25),

    lim_{n→∞} V(SSE*/(n-2)σ²) = 0 .     (4.3.22)

Consider E[(SSE*_n/(n-2)σ² - 1)²]. From (4.3.20) and (4.3.22), it follows that

    lim_{n→∞} E[(SSE*_n/(n-2)σ² - 1)²] = 0 ,

so by Tchebycheff's inequality,

    SSE*_n/(n-2)σ² →P 1 .     (4.3.23)

Since the conditional distribution of SSR*_n/σ², given G_s(n), was shown in Section 3.2 to be that of a non-central chi-square random variable with one degree of freedom and non-centrality parameter λ*_s(n), the characteristic function of SSR*_n/σ² may be written

    φ*_n(t) = E[E(e^{it SSR*_n/σ²} | G_s(n))]
            = Σ_{s(n)=1}^{S(r,n)} E(e^{it SSR*_n/σ²} | G_s(n)) / S(r,n)
            = (1-2it)^{-1/2} Σ_{s(n)=1}^{S(r,n)} [e^{2it/(1-2it)}]^{λ*_s(n)} / S(r,n)
            = (1-2it)^{-1/2} Σ_{s(n)=1}^{S(r,n)} H(λ*_s(n)) / S(r,n) .     (4.3.24)

Now, H(λ*_s(n)) = [e^{2it/(1-2it)}]^{λ*_s(n)} has a continuous derivative. Let

    λ* = λ(1 - w)² ,     (4.3.25)
where λ is defined in (4.3.10). Then by the mean value theorem, H(λ*_s(n)) can be expanded about λ* as

    H(λ*_s(n)) = H(λ*) + (λ*_s(n) - λ*) H'(λ0_s(n)) ,     (4.3.26)

where λ0_s(n) is in a neighborhood of λ* of radius |λ*_s(n) - λ*|. Therefore,

    Σ_{s(n)=1}^{S(r,n)} H(λ*_s(n))/S(r,n) = H(λ*) + Σ_{s(n)=1}^{S(r,n)} (λ*_s(n) - λ*) H'(λ0_s(n))/S(r,n) ,

so, if it exists,

    lim_{n→∞} Σ_{s(n)=1}^{S(r,n)} H(λ*_s(n))/S(r,n) = H(λ*) + lim_{n→∞} Σ_{s(n)=1}^{S(r,n)} (λ*_s(n) - λ*) H'(λ0_s(n))/S(r,n) .     (4.3.27)

By the inequalities (4.3.15) and (4.3.16), and since λ_n ≥ 0, λ ≥ 0 and 0 ≤ w ≤ 1, λ0_s(n) must satisfy one of the following inequalities:

    0 ≤ λ*_s(n) ≤ λ0_s(n) ≤ λ*   or   λ* ≤ λ0_s(n) ≤ λ*_s(n) ≤ λ_n

for all n and s(n) = 1, 2, ..., S(r,n); i.e.,

    0 ≤ λ0_s(n) ≤ λ* + λ_n .     (4.3.28)

Let

    c = 4t²/(1 + 4t²)   and   a = 2it/(1 - 2it) .

Then

    |H'(λ0_s(n))| = |a| |[e^{2it/(1-2it)}]^{λ0_s(n)}| = |a| e^{-4t²λ0_s(n)/(1+4t²)} = |a| e^{-cλ0_s(n)} ≤ |a| e^{cλ0_s(n)} .

From (4.3.28),

    e^{cλ0_s(n)} ≤ e^{c(λ* + λ_n)} ,   s(n) = 1, 2, ..., S(r,n) .     (4.3.29)
85
S(r,n) (A* -A*)
I ~
s(n2
s(n)=l S(r,n)
~ lale
H,o.o)1
sen)
, CA n S(r,n) IA*sen) -'*1
~
e
~
- s(n)=l S(r,n)
C~
(4.3.30)
Now,
so
+
S(r,n) le(A:(n»-A*1
~
s(n)=l
S(r,n)
+ le(A*sen) ) - A*I .
(4.3.31)
By: Liapounoff' s inequality,
Therefore, from (4.3.31) and (4.3.30), it follows that
S(r,n) (A*( )-A*)
0
s n( )
S
H, (As(n»
~
Is(~)=l
r,n
1/2
I ~ Ia Ie C~' e CA n [Var(A:(n»)
[
]
.
+ \e(A*'
s(n»- k*IJ.
(4.3.32)
86
From (4.3. 17) ,
lim e(~~(n»
= ~(1 - w)
2
= ~* ,
n-+ 00
so
lim le(~~(n»
n-too
- ~*I
=0
,
and from (4.3.18),
lim
Var(~:(n»
"" 0 ,
n-+ 00
so
Also,
lim ~ n
n-too
=~ ,
so
c~
lim e n ... e c~ •
n-too
Therefore, from (4.3.32) and the preceding,
lim
n-t 00
S(r,n) (~~(n)-A*)
0
2
r
S(r n)
H'(~s(n»1 ~ \ale cA .0 "" 0 ,
s (n)"'l
'
I
so
S(r,n) (~~(n)-~*)
0
lim
r
S(r)
H'(~s(n.»'" 0 •
n-too s(n)=l
,n
(4.3.33)
87
Applying this result in (4.3.27) yields

    lim_{n→∞} Σ_{s(n)=1}^{S(r,n)} H(λ*_s(n))/S(r,n) = H(λ*) ,

so, from (4.3.24),

    lim_{n→∞} φ*_n(t) = (1-2it)^{-1/2} lim_{n→∞} Σ_{s(n)=1}^{S(r,n)} H(λ*_s(n))/S(r,n)
                      = (1-2it)^{-1/2} e^{2λ*it/(1-2it)} = φ*(t) .     (4.3.34)

Now, φ*(t) is the characteristic function of a non-central chi-square random variable with one degree of freedom and non-centrality parameter λ*. Therefore, by the Levy-Cramer convergence theorem,

    SSR*_n/σ² →d χ²(1,λ*) .     (4.3.35)

From Cramer's theorem, discussed earlier, it follows from (4.3.23) and (4.3.35) that

    F*_n →d χ²(1,λ*) .     (4.3.36)

It has now been established that for the sequence of alternatives β²_n = k/n, under the assumptions made about the x values, the statistic F_n is asymptotically distributed as χ²(1,λ) and the statistic F*_n is asymptotically distributed as χ²(1,λ*). By Pitman's definition of asymptotic relative efficiency, the two level α tests based on F_n and F*_n must have the same asymptotic power with respect to the same sequence of alternatives. If the sequences of alternatives are k/n and k*/n*, then they will be the same if

    k/n = k*/n* .     (4.3.37)

The two tests will have the same asymptotic power if the two non-centrality parameters are asymptotically equal, i.e.,

    lim_{n→∞} k Σ_{i=1}^n x_i² / 2σ²n = lim_{n*→∞} k* (1 - w)² Σ_{i=1}^{n*} x_i² / 2σ²n* ,

or

    kc = k*c* ,     (4.3.38)

where

    c = θ/2σ²   and   c* = θ(1 - w)²/2σ² .

From (4.3.37) and (4.3.38), then,

    n/n* = k/k* = c*/c = (1 - w)² .     (4.3.39)
Thus, the asymptotic efficiency of the test based on F* relative to the test based on F which would have been used if the data had not been transformed is (1 - w)². This implies that for an original set of n observations, the interchange of wn of these observations effectively reduces the sample size to (1 - w)²n in terms of the power of the test. In particular, if w = 1, i.e., if all of the observations are interchanged, the sample size has been effectively reduced to zero. It should be pointed out that if either assumption (1) or assumption (2) about r and w shown in (2.4.18) can be made reasonably in the context in which the procedure is being used, then the asymptotic relative efficiency of the test based on F* becomes one.
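The effective-sample-size interpretation of (4.3.39) can be tabulated directly (plain arithmetic, no assumptions beyond the result itself):

```python
n = 100
for w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    n_eff = (1 - w) ** 2 * n
    print(f"w = {w:.1f}: interchanging {int(round(w * n)):3d} observations "
          f"leaves an effective sample size of {n_eff:5.1f}")
```

Interchanging even 20 of 100 observations already costs the equivalent of 36 observations, which is consistent with the steep power losses seen in Tables 4.1 and 4.2.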
4.4 Further Simulation Results
A large sample simulation study was planned to attempt to determine how rapidly the distribution of the statistic F* approaches that of a non-central chi-square random variable with one degree of freedom under the assumptions made in Section 4.3. In the following, let λ_0 denote the non-centrality parameter of the distribution of the regression sum of squares computed when w is equal to zero. Let

    λ*_w = λ_0(1 - w)² ,   w > 0 .     (4.4.1)

The approach to be taken in the study was that given large enough n and an appropriate set of X values, β²/σ² and w, the statistic F* is hypothesized to be distributed approximately as χ²(1,λ*_w). Then the power of the level α test of β = 0 based on F* could be computed as the upper tail of the χ²(1,λ*_w) distribution with the critical value set equal to the appropriate critical value for an F test with one and n - 2 degrees of freedom. Then the values of power calculated in this way could be compared with those observed from corresponding simulations.
The first set of simulation runs was made with n = 500 and the X values taken to be five replicates of the expected values of the order statistics from a normal distribution with mean zero and variance one for n = 100; i.e., the X values in this case were replicates of those used for the small sample normal case with n = 100 described in Section 4.1. The mechanics of generating samples, transforming the observations, and making the tests based on F and F* were exactly the same in this case as in the simulation discussed in Section 4.1. The value of β²/σ² used in this case was obtained by setting λ_0 = 9.00, so that the power at α = .05 for the non-central chi-square distribution would be slightly less than one. This was done so that the power values corresponding to λ*_w would take on values less than one and some comparisons of simulated results with the hypothesized power values could be made. For the given set of X values and β²/σ², simulations were performed for w = .20, .40, .60, .80 and 1.00. Here again, 400 sets of 500 observations were generated in each case. The full set of w values used in the small sample simulations described earlier was not used here due to the extremely large amount of computer time required for the simulation of a single case.

A computer program was written to calculate the power values of the non-central chi-square distribution which were needed for comparison.
The method used was based on a representation of the distribution function of a non-central chi-square random variable as an infinite sum of weighted central chi-square distribution functions; i.e.,

    P(χ²(1,λ) ≤ z) = Σ_{k=0}^{∞} [e^{-λ} λ^k / k!] P(χ²(1+2k) ≤ z) .

In all cases, the summation above was found to converge, within limits of machine accuracy, with no more than twenty terms. A partial verification of the procedure was made by reproducing the values for one and ∞ degrees of freedom and α = .05 found in Tang's tables of the power function of the non-central beta.
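The series used by the program can be reproduced with any central chi-square routine. Note that λ here follows the convention of Section 4.3 (λ = β²Σx²/2σ², i.e., half the non-centrality of the more common parameterization). A sketch using scipy:

```python
from math import exp, factorial
from scipy import stats

def ncx2_cdf(z, lam, terms=20):
    # P(chi-square(1, lam) <= z) as a Poisson-weighted sum of central chi-square cdfs
    return sum(exp(-lam) * lam ** k / factorial(k) * stats.chi2.cdf(z, 1 + 2 * k)
               for k in range(terms))

def power(lam, n, alpha=0.05):
    # upper tail beyond the F(1, n-2, alpha) critical value, as in the text
    crit = stats.f.ppf(1 - alpha, 1, n - 2)
    return 1.0 - ncx2_cdf(crit, lam, terms=60)

print(round(power(9.00, 500), 3))  # approximately .989, the w = 0 entry of Table 4.3
```

With scipy's parameterization the same quantity is `stats.ncx2.cdf(z, 1, 2*lam)`, which provides a convenient cross-check on the series.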
If the underlying distribution of F* which produces the simulated power value for a given value of w, P̂_w, is assumed to be χ²(1,λ*_w), then the simulated power value can be considered to be a binomial proportion with parameter P_w, where P_w is the power of the appropriate level α test based on a statistic distributed as χ²(1,λ*_w). Using the normal approximation to the binomial, P̂_w could be assumed to be approximately distributed as a normal random variable with mean P_w and variance P_w(1-P_w)/M, where P̂_w is computed from M observations. Therefore, for a given value of w, a test of the hypothesis that the true value of the parameter estimated by P̂_w is P_w can be made. If the hypothesis is rejected, then effectively the hypothesis that the true power values come from the underlying χ²(1,λ*_w) distribution can be rejected. The test statistic,

    z = (P̂_w - P_w) / [P_w(1-P_w)/M]^{1/2} ,

is assumed to be approximately distributed as N(0,1). The test based on z was made at α = .05 for each value of w for the simulations discussed above. The results are shown in Table 4.3. Note that none of the z values is large enough in absolute value to reject the hypothesis that P_w is the true power value. This result tends to lend some credibility to the conjecture that, in this case, the statistic F* may be distributed as χ²(1,λ*_w). However, it should be pointed out that the use of the test based on the normal approximation when P_w > .90 or P_w < .10 and M is less than 600 may be unreliable (see (1)).
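The z statistic above is simple to reproduce; the following checks the w = .60 and w = .80 rows of Table 4.3 (M = 400 for w > 0):

```python
def z_stat(p_hat, p, m):
    # normal-approximation test that a simulated proportion p_hat has mean p
    return (p_hat - p) / (p * (1 - p) / m) ** 0.5

print(round(z_stat(0.373, 0.396, 400), 3))  # -0.941
print(round(z_stat(0.120, 0.136, 400), 3))  # -0.934
```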
Since none of the z statistics were found to be significant for n = 500, further simulations were performed for n = 100 and n = 200 with the X values taken to be one and two replicates of the expected values of the N(0,1) order statistics, respectively. Also, simulations were performed for the other two sets of X values described in (3.1.22) for n = 100; each X value used for n = 100 was replicated to form the set of X values used for n = 200. The value of β²/σ² was adjusted to obtain a value of λ_0 of approximately 9.00 for all of the simulations. Also, 400 sets of n observations were simulated for each value of w greater than zero. The estimated power values from the simulations and the corresponding values of P_w and z_w are shown in Table 4.4. For n = 200, none of the z statistics exceeded the critical value, so that in all three cases, it appears that F* may be distributed as χ²(1,λ*_w). However, when n = 100, one of the z_w values is significant at α = .05 for two of the sets of X values.
Table 4.3  Simulated power values and z statistics for F* distributed as χ²(1,λ*_w), n = 500^a

    w       P̂_w^b            P_w            z_w
          (Simulated     (Power for
            Power)     F* ~ χ²(1,λ*_w))

   .00       .986           .989         -1.286
   .20       .930           .924           .453
   .40       .720           .721          -.045
   .60       .373           .396          -.941
   .80       .120           .136          -.934
  1.00       .055           .050           .459

^a X values are 5 replicates of the expected values of N(0,1) order statistics for n = 100.
^b P̂_w is calculated from 2000 simulated observations for w = 0 and 400 simulated observations for w > 0.
Table 4.4  Simulated power values and z statistics for three sets of X values and n = 100, 200^a

                        n = 100                        n = 200
    w         P̂_w      P_w      z_w          P̂_w      P_w      z_w

Case (1)^c
   .00       .987     .988    - .184        .987     .989    - .858
   .20       .908     .920    - .885        .930     .924      .453
   .40       .660     .712   -2.297*^b      .690     .721    -1.382
   .60       .358     .386   -1.150         .420     .396      .981
   .80       .115     .130    - .892        .138     .136      .117
  1.00       .040     .047    - .662        .040     .050    - .918

Case (2)
   .00       .987     .988    - .184        .990     .989      .429
   .20       .910     .920    - .737        .925     .924      .075
   .40       .685     .712   -1.192         .693     .721    -1.249
   .60       .358     .386   -1.150         .353     .396    -1.758
   .80       .105     .130   -1.487         .128     .136    - .468
  1.00       .053     .047      .567        .055     .050      .459

Case (3)
   .00       .982     .988   -1.102         .986     .989    -1.286
   .20       .885     .920   -2.580*        .915     .924    - .679
   .40       .670     .712   -1.855         .680     .721    -1.828
   .60       .355     .386   -1.274         .398     .396      .082
   .80       .145     .130      .892        .118     .136    -1.050
  1.00       .058     .047    1.040         .040     .050    - .918

^a P̂_w is calculated from 2000 simulated observations for w = 0 and from 400 observations for w > 0.
^b *Significant at α = .05.
^c Cases (1), (2), and (3) are replicates of the X values denoted by the same numbers in (3.1.22).
Therefore, it appears that the statistic F* may become approximately distributed as a non-central chi-square random variable when n exceeds 100 for the types of X values and the value of λ_0 used in the examples shown. Note that the adjustment made to β²/σ² to maintain a constant value of λ_0 is equivalent to assuming that a sequence of alternative hypotheses of the type assumed in Section 4.3 exists. The asymptotic distribution of the statistic F* was found in Section 4.3 assuming a sequence of alternative hypotheses which approach zero as n tends to infinity. Therefore, the statement that the statistic F* converges in distribution to a non-central chi-square cannot be made in general. However, it may not be unreasonable to use the non-central chi-square distribution to approximate the power of the test based on F*, at least for cases in which the non-centrality parameter λ_0 is relatively small.

Values of P_w obtained from the appropriate χ²(1,λ*_w) distributions and the corresponding z values were computed for the simulated power values for n = 100 shown in Table 4.2. Some of the results of the computations are shown in Table 4.5. From the results shown, it appears that the departure of the distribution of F* from the non-central chi-square is a function of the size of β²/σ², and thus, of λ_0. Therefore, the use of the non-central chi-square to predict the power of the test based on F* may, in general, be very unreliable for larger values of the non-centrality parameter λ_0.
5. CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
The preceding analysis seems to indicate that the application of the particular type of transformation procedure considered would result in rather severe losses in terms of increased variance of the estimates of model parameters and possibly decreased power of the test for a zero regression coefficient. However, if the problem is considered in the primary context in which it was proposed, i.e., if the application of such a transformation to a sensitive item in a large set of observations is presented as an alternative to removal of the item prior to dissemination of the data, then the transformation procedure may be considered to be acceptable. Further, if it is determined that the proportion of observations which are actually interchanged by a transformation does not have to be too large in order to protect the privacy of the individual, use of the procedure may not result in such severe losses of information. This would be true in the application of a transformation to protect the observations from court subpoena.
If this procedure is determined to be worthy of further investigation, then there are several areas to be explored. First, it may be possible to derive other estimators of the model parameters and other test statistics which would result in less loss of information than those considered here. It seems likely that models proposed for analysis of data to which this type of transformation would be applied would include several independent variables. Therefore, extensions to cases in which the sensitive item is the dependent variable or one of several independent variables would be in order. The general topic of modelling in the presence of a transformed item has not been addressed here and should be considered.
Further, other transformation procedures of the same general type could be investigated. For example, a procedure which includes all possible permutation matrices which allow for interchange of up to r of the observations, without requiring that exactly r of the observations be interchanged, should be considered. It appears that the application of such a procedure would yield least squares estimators of the model parameters with higher relative efficiencies than those obtained using the restricted transformation discussed here.
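As a concrete, hypothetical sketch of the "up to r" idea, the following function swaps randomly chosen disjoint pairs so that at most r observations are moved. This simplification does not reproduce the thesis's actual procedure of drawing a permutation matrix G uniformly from a specified class; it only illustrates the masking effect.

```python
import random

def interchange_up_to_r(y, r, seed=0):
    """Swap randomly chosen disjoint pairs of entries of y so that at most
    r observations are interchanged (possibly fewer, including none).
    Illustrative sketch only; not a uniform draw over permutation
    matrices G as in the thesis."""
    rng = random.Random(seed)
    y = list(y)
    n_pairs = rng.randint(0, r // 2)          # each swap moves two observations
    idx = rng.sample(range(len(y)), 2 * n_pairs)
    for a, b in zip(idx[0::2], idx[1::2]):
        y[a], y[b] = y[b], y[a]
    return y

original = [10, 20, 30, 40, 50, 60]
masked = interchange_up_to_r(original, r=4)
# The multiset of values is unchanged: records are re-paired, not altered.
assert sorted(masked) == original
assert sum(a != b for a, b in zip(masked, original)) <= 4
```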
Another approach would involve ordering the observations on the sensitive item and stratifying the observations into two or more strata within which all of the observations on the sensitive item would be interchanged. It would be necessary to report the location of the stratum boundaries when the data was released in order for any analysis to be performed. Therefore, the formation of more than two strata might result in at least a partial disclosure of information on the sensitive item for an individual. If such a procedure were used and k strata were formed, β in the simple linear regression model could be estimated by

β̂ = (Ȳ(k) − Ȳ(1)) / (X̄(k) − X̄(1)) ,

where Ȳ(k) denotes the mean of the Y values in stratum k, X̄(k) denotes the mean of the X values corrected for the grand mean, and Ȳ(1) and X̄(1) are similarly defined for stratum one. Such an estimator was proposed by Wald (12) for estimating β when the X values in a linear model are subject to error.
This estimator would have the advantage in the present context of not being a function of the cross-products of the observations on the dependent and independent variables in the model. It would be expected to be more efficient for estimating β for larger values of r or w. However, the extension of such a procedure to models in which there are more than two variables might prove to be difficult.
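A two-group version of the Wald (12) estimator described above can be sketched as follows. Splitting at the median into lower and upper halves, rather than using the first and k-th of k reported strata, is a simplifying assumption made for this illustration.

```python
def wald_slope(x, y):
    """Wald's (1940) two-group slope estimator: order the pairs by x,
    split into a lower and an upper half, and take
    (ybar_2 - ybar_1) / (xbar_2 - xbar_1)."""
    pairs = sorted(zip(x, y))
    h = len(pairs) // 2
    lo, hi = pairs[:h], pairs[len(pairs) - h:]
    xbar1 = sum(p[0] for p in lo) / h
    xbar2 = sum(p[0] for p in hi) / h
    ybar1 = sum(p[1] for p in lo) / h
    ybar2 = sum(p[1] for p in hi) / h
    return (ybar2 - ybar1) / (xbar2 - xbar1)

# Exact for noiseless data: y = 2x + 1.
x = [1, 2, 3, 4, 5, 6]
y = [2 * v + 1 for v in x]
assert abs(wald_slope(x, y) - 2.0) < 1e-12
```

Because the estimator depends on the stratum means alone, interchanging observations within a stratum leaves it unchanged, which is precisely the attraction noted in the text.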
One advantage of using a transformation procedure such as the
one considered is that the individual observations on the sensitive
item are not altered.
Techniques which consist of multiplying the
sensitive item by a random component or adding a random component to
the item do not have this property.
However, it appears that such
techniques would be easier to analyze and develop than the techniques
which involve permutations of the observations.
The use of such
techniques should be investigated within the framework of regression
and model building.
Particular attention would be paid in these
cases to the selection of the distribution for the random components
for particular types of models for the original observations.
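For comparison, the additive-noise masking mentioned above is easy to prototype. In the sketch below, all parameter values are arbitrary choices for illustration: independent mean-zero noise added to the sensitive dependent variable leaves the least squares slope consistent, at the cost of extra variance, whereas adding noise to X instead would attenuate the slope toward zero.

```python
import random

def ols_slope(x, y):
    """Ordinary least squares slope for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)
x = [i / 10 for i in range(200)]
y = [3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]   # true slope 3

# Mask the sensitive item y by adding independent mean-zero noise.
y_masked = [yi + rng.gauss(0.0, 1.0) for yi in y]

assert abs(ols_slope(x, y) - 3.0) < 0.2
assert abs(ols_slope(x, y_masked) - 3.0) < 0.2
```

The choice of noise distribution would, as the text observes, need to be matched to the model assumed for the original observations.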
6.
LIST OF REFERENCES

1. Cochran, William G. 1963. Sampling Techniques. John Wiley and Sons, Inc., New York City, New York.

2. Cramer, Harald. 1945. Mathematical Methods of Statistics. Princeton University Press, Princeton, New Jersey.

3. Fellegi, I. P. 1972. On the question of statistical confidentiality. Journal of the American Statistical Association, 67(337):7-18 (March).

4. Graybill, F. A. 1961. An Introduction to Linear Statistical Models, Vol. 1. McGraw-Hill Publishing Co., New York City, New York.

5. Harter, H. L. 1960. Expected values of normal order statistics. ARL Technical Report 60-292 (June).

6. Mitrinovic, D. S. 1965. Elementary Matrices. P. Noordhoff, Ltd., Groningen, The Netherlands.

7. Noether, G. E. 1955. On a theorem of Pitman. Annals of Mathematical Statistics, 26:64-68.

8. Poole, W. K. 1974. Estimation of a distribution function of a continuous type random variable through randomized response. Journal of the American Statistical Association, 69(348):1002-1005 (December).

9. Searle, S. R. 1971. Linear Models. John Wiley and Sons, Inc., New York City, New York.

10. Schotz, W. E. 1964. The effects of the occurrence of a Type III Error on the power function of the F test for a fixed effects analysis of variance model. M.A. Thesis (Unpublished). Department of Statistics, State University of New York at Buffalo, Buffalo, New York.

11. Tang, P. C. 1938. The power function of the analysis of variance test with tables and illustrations of their use. Statistical Research Memoirs, Vol. II.

12. Wald, A. 1940. The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics, 11:284-300.

13. Warner, S. L. 1971. The linear randomized response model. Journal of the American Statistical Association, 66(336):884-888 (December).