Sethuraman, J. (1963). Fixed interval analysis and fractile analysis.

INSTITUTE OF STATISTICS
BOX 5457
STATE COLLEGE STATION
RALEIGH, NORTH CAROLINA
UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.
FIXED INTERVAL ANALYSIS AND FRACTILE ANALYSIS
by
J. Sethuraman
Indian Statistical Institute and University of North Carolina
January 1963
Grant No.
AF-AFOSR-62-169
The historical background and the methods of fixed interval analysis and fractile analysis are presented in this exposition.
This research was partially supported by the Mathematics Division of the Air Force Office of Scientific Research.
Institute of Statistics
Mimeo Series No. 350
FIXED INTERVAL ANALYSIS AND FRACTILE ANALYSIS^1
by
J. Sethuraman
University of North Carolina and Indian Statistical Institute
Summary.
Two methods of comparing certain aspects of two bivariate distributions rather than their regression functions are described in section 1. It often happens that this is very meaningful in certain practical situations. The large sample theory of some statistics that enter this discussion is presented in sections 3, 4, 5. Some historical notes about these methods are related in section 6. A few tables that can be used for testing purposes when the underlying distributions are bivariate normal are given in section 7.

1. The methods of fixed interval analysis and fractile analysis.
One of the important problems of Statistics is the study of the relationship of one variable $Y$ with another variable $X$ and of the comparison of such relationships among different populations. These are usually made through the study of the regression function of $Y$ on $X$, that is, through the study of the function $E(Y \mid X = x)$. A common problem met with in practice is the comparison of the regression functions in two populations $(Y, X)$ and $(Y', X')$. We can give illustrations for this problem from practically every applied field of Statistics. We will however content ourselves with one example which we will refer to, for purposes of illustration, in the sequel. Data on a pair of random variables $(Y, X)$ and $(Y', X')$ are available. $Y$ and $X$ refer to the consumption of milk (volume) and total expenditure per month, respectively, of an individual in a population $P$. $Y'$ and $X'$ refer to the same variates in a different population $P'$. The problem is to compare the patterns of relationship of the consumption of milk with the total expenditure in the two populations.

^1 This research was supported in part by the Mathematics Division of the Air Force Office of Scientific Research.
The problem just stated can be expressed in symbols as follows: We wish to test whether or not
$$\lambda(x) = \lambda'(x) \quad \text{for all } x \qquad (1)$$
where
$$\lambda(x) = E(Y \mid X = x), \qquad (2)$$
$$\lambda'(x) = E(Y' \mid X' = x). \qquad (3)$$
This is the sort of regression problem we shall be concerned with in this chapter.
The classical method of approaching this problem consists in assuming that the regression functions $\lambda(x)$ and $\lambda'(x)$ are of a certain algebraic form, say, a polynomial, a trigonometric series or the like, completely determined except for a finite number of 'parameters'. The problem of testing the equality of the regression functions then reduces to the problem of testing the equality of these parameters. The difficulty in these methods is that the assumption of an algebraic model for the two regression functions is too great an oversimplification of the actual situations. Further, it is very difficult to test whether any particular model fits in a given practical set up. Another difficulty arises from the fact that $X$ and $X'$ could be discrete and $\lambda(x)$ and $\lambda'(x)$ be uniquely defined on different sets of the $x$'s. Again, if $X$ and $X'$ were ages of individuals and $Y$ and $Y'$, their weights, then an observation on the ages like 12 would actually mean that the age lies between 11.5 and 12.5 and the comparison through pointwise regression functions would be too farfetched. In such situations the appropriate hypothesis would be to equate the regression of the $Y$'s on suitably chosen intervals of the $X$'s. The methods of fixed interval analysis and fractile analysis, to be described here, consist in comparing the regressions interval-wise rather than point-wise, and it is the choice of the intervals that distinguishes the two methods.
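In fixed interval analysis we fix to start with $g$ fixed intervals
$$(a_0, a_1],\ (a_1, a_2],\ \ldots,\ (a_{g-1}, a_g] \qquad (4)$$
by means of the $(g+1)$ constants
$$a_0, a_1, \ldots, a_g \qquad (5)$$
satisfying
$$-\infty = a_0 < a_1 < \cdots < a_{g-1} < a_g = +\infty. \qquad (6)$$
We then define the regression $\bar{\nu}(a, b)$ of $Y$ on $X$ in an interval $(a, b]$ by the relation
$$\bar{\nu}(a, b) = E(Y \mid a < X \le b). \qquad (7)$$
Let
$$\bar{\nu}_i = \bar{\nu}(a_{i-1}, a_i), \quad i = 1, \ldots, g. \qquad (8)$$
The quantities $\bar{\nu}'_i$ are defined in the same way for the random variable $(Y', X')$. In fixed interval analysis we test the hypothesis that
$$\bar{\nu} = \bar{\nu}' \qquad (9)$$
where
$$\bar{\nu} = (\bar{\nu}_1, \ldots, \bar{\nu}_g), \quad \bar{\nu}' = (\bar{\nu}'_1, \ldots, \bar{\nu}'_g). \qquad (10)$$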
It is obvious that such a hypothesis becomes meaningful only when the intervals $(a_0, a_1], \ldots, (a_{g-1}, a_g]$ of (4) are chosen carefully. For instance, in our illustrative example, let the populations $P$ and $P'$ correspond to two constituent states of the Indian Union at the same point in time. Several comparable social strata can be defined in the two populations by a suitable choice of the intervals of (4). Such strata might correspond to expenditures classified as 'lower class', 'lower middle class', etc. In this case the hypothesis (9) is meaningful and fixed interval analysis is appropriate to the situation.
It should be noted that the use of a large number of strata would mean that the hypothesis $\bar{\nu} = \bar{\nu}'$ is practically equivalent to the hypothesis $\lambda(x) = \lambda'(x)$.
The practical details of the method are as follows. We estimate the quantities $\bar{\nu}$ and $\bar{\nu}'$ suitably and test for their difference. Let
$$S:\ (y_1, x_1), \ldots, (y_n, x_n) \qquad (11)$$
and
$$S':\ (y'_1, x'_1), \ldots, (y'_{n'}, x'_{n'}) \qquad (12)$$
be two independent samples from the populations $P$ and $P'$ respectively. Let $n_i$ be the number of observations of the sample $S$ with $x$-components in the interval $(a_{i-1}, a_i]$, $i = 1, \ldots, g$. Let
$$\bar{v}_i = \sum y_r / n_i, \quad i = 1, \ldots, g \qquad (13)$$
where the summation is made over all $y_r$'s for which the $x_r$'s lie in $(a_{i-1}, a_i]$. We calculate $\bar{v}'_i$, $i = 1, \ldots, g$ in a similar way from the sample $S'$. The vectors $\bar{v}$ and $\bar{v}'$ called the fixed interval means are estimates of $\bar{\nu}$ and $\bar{\nu}'$.
We define several measures of divergence between the sample estimates for the two regressions:
$$\bar{D} = \sum_{i=1}^{g} |\bar{v}_i - \bar{v}'_i| \qquad (14)$$
$$\bar{\Delta} = \sum_{i=1}^{g} (\bar{v}_i - \bar{v}'_i)^2 \qquad (15)$$
$$\bar{\Gamma} = (\bar{v} - \bar{v}')\, B\, (\bar{v} - \bar{v}')' \qquad (16)$$
where $B$ is some positive definite matrix. Large values of these statistics will form the corresponding rejection region for testing the hypothesis in (9). It is an important problem to determine the distributions of these statistics, at least in large samples, and set up significance points. Sections 3 and 4 deal with this problem.
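The fixed interval means of (13) and the divergence measures (14) to (16) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy data and the default choice $B = I$ are hypothetical and not part of the original paper, and observations are stored as `(y, x)` pairs.

```python
from bisect import bisect_left

def fixed_interval_means(sample, cuts):
    """Mean of y in each interval (a_{i-1}, a_i].

    `sample` is a list of (y, x) pairs; `cuts` holds the finite constants
    a_1 < ... < a_{g-1} (a_0 = -inf and a_g = +inf are implied).
    """
    g = len(cuts) + 1
    sums = [0.0] * g
    counts = [0] * g
    for y, x in sample:
        i = bisect_left(cuts, x)  # 0-based index of the right-closed interval containing x
        sums[i] += y
        counts[i] += 1
    if 0 in counts:
        raise ValueError("every interval needs at least one observation")
    return [s / c for s, c in zip(sums, counts)]

def divergences(v, w, B=None):
    """Return (D, Delta, Gamma) of (14)-(16); B defaults to the identity (an assumption)."""
    g = len(v)
    d = [a - b for a, b in zip(v, w)]
    if B is None:
        B = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    D = sum(abs(t) for t in d)
    Delta = sum(t * t for t in d)
    Gamma = sum(d[i] * B[i][j] * d[j] for i in range(g) for j in range(g))
    return D, Delta, Gamma

# Toy data: (milk consumption, total expenditure), both hypothetical.
S = [(1.0, 5), (3.0, 8), (4.0, 12), (6.0, 15), (9.0, 25)]
S_prime = [(2.0, 4), (5.0, 11), (7.0, 18), (8.0, 30)]
cuts = [10, 20]
v_bar = fixed_interval_means(S, cuts)              # [2.0, 5.0, 9.0]
v_bar_prime = fixed_interval_means(S_prime, cuts)  # [2.0, 6.0, 8.0]
print(divergences(v_bar, v_bar_prime))             # (2.0, 2.0, 2.0)
```

With $B = I$ the quadratic form $\bar{\Gamma}$ coincides with $\bar{\Delta}$, which is why the last two values agree in the toy example.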
We now proceed to remark that there are many problems where fixed interval analysis becomes conceptually meaningless or partially so. As an example let us suppose that in our practical example, the populations $P$ and $P'$ correspond to two different states having different currencies. After setting up interval limits to the total expenditure reflecting different social groups in one population, we may find it very difficult to demarcate comparable limits in the currency of the second population. The official exchange rate cannot be used here since it does not reflect the actual purchasing power of the two currencies. The real exchange rate that does this is not easily available. We have thus presented a typical case of the general situation where $X$ and $X'$ are not comparable and where fixed interval analysis is not useful.
In the above example we can safely assume that the total expenditure is a monotonic function of the socio-economic level of an individual. Thus, though the total expenditures in the two countries are not directly comparable they are monotonically related. Groupings based on the ranks of $X$ and $X'$ will be comparable in a meaningful way. Professor P. C. Mahalanobis made use of this fact in proposing a new method called fractile analysis for such situations. We describe this method after developing some notation.
Let $Q_1, Q_2, \ldots, Q_{g-1}$ be the $\frac{1}{g}$-th, $\frac{2}{g}$-th, ..., $\frac{g-1}{g}$-th quantiles (or fractiles) of the distribution of $X$. Let $Q_0 = -\infty$, $Q_g = +\infty$. Let $Q'_0, Q'_1, \ldots, Q'_{g-1}, Q'_g$ be the corresponding quantiles for the distribution of $X'$. Let
$$\nu_i = E(Y \mid Q_{i-1} < X \le Q_i), \quad i = 1, \ldots, g \qquad (17)$$
$$\nu'_i = E(Y' \mid Q'_{i-1} < X' \le Q'_i), \quad i = 1, \ldots, g. \qquad (18)$$
We see that the intervals $(Q_0, Q_1), (Q_1, Q_2), \ldots, (Q_{g-1}, Q_g)$ of $X$ represent the lowest $100/g$ percent section, the second lowest $100/g$ percent section, ..., the highest $100/g$ percent section, respectively, of population $P$. A similar interpretation can be given for the intervals $(Q'_0, Q'_1), (Q'_1, Q'_2), \ldots, (Q'_{g-1}, Q'_g)$ of $X'$ in the second population $P'$. Thus these intervals are comparable in a very important sense, although the $X$ values are different. The method of fractile analysis consists in testing the hypothesis
$$\nu = \nu' \qquad (19)$$
where
$$\nu = (\nu_1, \ldots, \nu_g), \quad \nu' = (\nu'_1, \ldots, \nu'_g). \qquad (20)$$
The practical method adopted is as follows. Let $S$ and $S'$ be the samples from the populations $P$ and $P'$ respectively, as in (11), (12). Rearrange the observations in the sample $S$ so that the $x$'s are in the increasing order of magnitude thus
$$(y_{(1)}, x_{(1)}),\ (y_{(2)}, x_{(2)}),\ \ldots,\ (y_{(n)}, x_{(n)}) \qquad (21)$$
with
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}. \qquad (22)$$
Let $n = m \cdot g$ where $m$ and $g$ are integers. Let
$$v_i = \sum_{(i-1)m < r \le im} y_{(r)}/m, \quad i = 1, \ldots, g. \qquad (23)$$
$v'_i$, $i = 1, \ldots, g$ are defined in a similar way from $S'$. The vectors $v$ and $v'$ called the fractile means, are estimates of $\nu$ and $\nu'$. Suitable measures of divergence are then defined.
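A minimal Python sketch of the fractile means in (23) follows. The function name and the toy data are hypothetical; observations are assumed to be stored as `(y, x)` pairs, and ties in $x$ are broken by the original order of the observations, which is a choice of this sketch rather than something the text specifies.

```python
def fractile_means(sample, g):
    """Fractile means v_1, ..., v_g of (23) for a list of (y, x) pairs."""
    n = len(sample)
    if n % g != 0:
        raise ValueError("the sample size n must equal m * g for an integer m")
    m = n // g
    ranked = sorted(sample, key=lambda obs: obs[1])  # increasing order of x
    return [sum(y for y, _ in ranked[i * m:(i + 1) * m]) / m for i in range(g)]

# Hypothetical data: y = milk consumption, x = total expenditure.
S = [(5.0, 30), (1.0, 10), (4.0, 25), (2.0, 15), (3.0, 20), (9.0, 40)]
print(fractile_means(S, 3))  # [1.5, 3.5, 7.0]

# Expressing x in another currency (any increasing transformation) leaves
# the ranks, and therefore the fractile means, unchanged.
S_other_currency = [(y, 83.0 * x) for y, x in S]
print(fractile_means(S_other_currency, 3))  # [1.5, 3.5, 7.0]
```

The second call illustrates why the grouping by ranks remains meaningful when $X$ and $X'$ are only monotonically related.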
Professor P. C. Mahalanobis defined one such measure called separation, as follows. Plot the ordinates $v_1, v_2, \ldots, v_g$ corresponding to the equidistant points $1, 2, \ldots, g$. Join the successive points by straight lines; the curve $G$ thus obtained is called the fractile graph of $S$. The fractile graph $G'$ of $S'$ is obtained in a similar way and drawn on the same paper and to the same scale. The area $A$ between these two graphs, and the ordinates at $1$ and $g$ called the separation, is a measure of divergence between the two sample regressions. The algebraic expression for $A$ is
$$A = \sum_{i=1}^{g-1} \left[ \tfrac{1}{2}\big(|v_i - v'_i| + |v_{i+1} - v'_{i+1}|\big) - \delta(v_i - v'_i,\ v_{i+1} - v'_{i+1})\, \frac{|(v_i - v'_i)(v_{i+1} - v'_{i+1})|}{|v_i - v'_i| + |v_{i+1} - v'_{i+1}|} \right] \qquad (24)$$
where
$$\delta(a, b) = \begin{cases} 0 & \text{if } ab > 0 \\ 1 & \text{if } ab < 0. \end{cases} \qquad (25)$$
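The separation can also be computed directly from its geometric definition, which gives a check on (24). The sketch below is an illustration only; the function name is hypothetical. Between two consecutive abscissae the graphs either do not cross, giving a trapezium, or cross once, giving two triangles.

```python
def separation(v, w):
    """Area between the two fractile graphs over the abscissae 1, ..., g."""
    d = [a - b for a, b in zip(v, w)]
    area = 0.0
    for a, b in zip(d, d[1:]):
        if a * b >= 0:
            # Graphs do not cross on this unit interval: trapezium.
            area += (abs(a) + abs(b)) / 2
        else:
            # Graphs cross once: two triangles with heights |a| and |b|.
            area += (a * a + b * b) / (2 * (abs(a) + abs(b)))
    return area

v = [1.0, 3.0, 7.0]
v_prime = [2.0, 2.0, 5.0]
print(separation(v, v_prime))  # 2.0
```

In the example the differences are $-1, 1, 2$. The first segment has a crossing and contributes $\tfrac12(1+1) - \tfrac{1 \cdot 1}{1+1} = 0.5$, the second contributes $\tfrac12(1+2) = 1.5$, in agreement with the term-by-term form of (24).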
Some other measures of divergence which have been considered are
$$D = \sum_{i=1}^{g} |v_i - v'_i|, \qquad (26)$$
$$\Delta = \sum_{i=1}^{g} (v_i - v'_i)^2, \qquad (27)$$
$$\Gamma = (v - v')\, B\, (v - v')', \qquad (28)$$
where $B$ is some positive definite matrix. Large values of these statistics form the corresponding critical regions for testing the hypothesis in (19). We repeat the remark made earlier, that at least the large sample distributions of these statistics should be determined before using these measures for testing purposes. Till now only descriptive methods are available for the methods of fixed interval analysis and fractile analysis; for examples see Mahalanobis (1958), (1960). We find the requisite limiting distributions in section 3 and these can now be used in practice.
Let us add a few points bringing out the special character of the method of fractile analysis in comparison with the method of fixed interval analysis. For applying the method of fixed interval analysis the random variables $X$ and $X'$ must be directly comparable and some intervals resembling strata should be formed. We then compare the regression in these intervals. The method of fractile analysis is applicable to the more difficult situation where $X$ and $X'$ are not directly comparable but are monotonically related to a common character that $X$ and $X'$ are supposed to measure. This, then, is the general set up where fractile analysis can be used. Such situations have been encountered in Econometrics, Psychometry, Demography etc., and fractile analysis has been applied, though, as yet only in a descriptive way. See Mahalanobis (1960), Das (1960), Das and Sharma (1960), Som (1960). Fractile analysis is now being utilised in the National Sample Survey of India on a large scale.
Several modifications of the method of fractile analysis can be made. After ranking the individuals we can take groups with varying proportions instead of equal proportions. Thus a group already formed may be split up into several sub-groups for a more detailed analysis of the regression in that group. This has been employed in Mahalanobis (1960). For calculating the sample estimate of $\nu$, the median, mode or some other suitable characteristic can be taken instead of the mean. These may be easier to compute in practice. This has been stated in Mahalanobis (1958a). There is a lot of flexibility and many similar modifications can be made.
In practice, it frequently happens that we take several independent sub-samples $S_1, \ldots, S_k$ from a population $P$ instead of just one sample. These are called interpenetrating sub-samples and their usefulness in a large number of situations has been recognised. See Mahalanobis (1946). If interpenetrating sub-samples are taken from both the populations $P$ and $P'$ we can get measures, $D$, $D_1$ and $D_2$, of divergence, from the samples measuring the divergence between the combined samples of $P$ and $P'$, within the sub-samples of $P$ and within the sub-samples of $P'$, respectively. Thus $D$ would be the between divergence whereas $D_1$ and $D_2$ would be the within divergences. These terminologies are analogous to those in analysis of variance problems and are self-explanatory. Suggestive as they are they permit us to develop some tests of at least a descriptive nature. See Mahalanobis (1958a), (1960).
Finally we add that the quantities calculated in the methods of fixed interval analysis and fractile analysis can be suitably modified to estimate relative concentration curves and relative concentration ratios. The utility of these is well known and they have been used by Mahalanobis (1960), Roy, Chakravarti and Laha (1959), Iyengar (1960) etc.
We now give a brief description of the method of obtaining the concentration curve etc. The following symbols will be used:
$$\bar{V}_i = n_1 \bar{v}_1 + n_2 \bar{v}_2 + \cdots + n_i \bar{v}_i, \quad i = 1, \ldots, g \qquad (29)$$
$$\bar{V}_0 = 0, \quad \bar{V}_g = \bar{V} \qquad (30)$$
$$\bar{p}^*_i = (n_1 + \cdots + n_i)/n, \quad i = 0, 1, \ldots, g \qquad (31)$$
$$\ldots$$
When the method of fixed interval analysis is used an estimate of the relative concentration curve of $Y$ on $X$ is obtained by plotting the points
$$\ldots$$
and joining successive points by straight lines. The relative concentration ratio is estimated by
$$\bar{C} = \frac{1}{\bar{V}} \sum_{i} n_i \bar{v}_i\, (\bar{p}^*_{i-1} + \bar{p}^*_i) - 1. \qquad (39)$$
When the method of fractile analysis is used an estimate of the relative concentration curve of $Y$ on $X$ is obtained by plotting the points
$$\ldots \qquad (40)$$
and joining successive points by straight lines. The relative concentration ratio is estimated by
$$C = \frac{2}{gV} \sum_{i} i\, v_i - \frac{g+1}{g}. \qquad (41)$$
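The two estimates (39) and (41) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, $V$ in (41) is taken to be $v_1 + \cdots + v_g$, and $\bar{p}^*_i$ in (39) is taken to be the cumulative proportion $(n_1 + \cdots + n_i)/n$.

```python
def fixed_interval_concentration_ratio(v_bar, counts):
    """Estimate (39) from interval means v_bar and interval counts n_i."""
    n = sum(counts)
    total = sum(n_i * v for n_i, v in zip(counts, v_bar))  # V-bar
    ratio = 0.0
    cum_prev = 0.0
    cum_count = 0
    for n_i, v in zip(counts, v_bar):
        cum_count += n_i
        cum = cum_count / n  # cumulative proportion p*_i (assumed form)
        ratio += n_i * v * (cum_prev + cum)
        cum_prev = cum
    return ratio / total - 1

def fractile_concentration_ratio(v):
    """Estimate (41) from fractile means v_1, ..., v_g (assumes V = sum of the v_i)."""
    g = len(v)
    total = sum(v)
    weighted = sum(i * v_i for i, v_i in enumerate(v, start=1))
    return 2 * weighted / (g * total) - (g + 1) / g

print(fractile_concentration_ratio([2.0, 2.0, 2.0]))  # 0.0, no concentration
print(fractile_concentration_ratio([1.0, 2.0, 3.0]))  # 2/9, about 0.2222
```

When all the $n_i$ are equal the two functions agree, which is a convenient consistency check between (39) and (41).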
2. Notation, definitions etc.
In this section we develop the notations to be used in sections 3, 4 and 5. Since we are developing the theories of fixed interval analysis and fractile analysis side by side and since they are similar to one another in many respects, the notations employed will also be similar. Whenever possible we will distinguish the quantities involved in fixed interval analysis by a bar.
Let $(Y, X)$ be a random variable on the Euclidean plane with distribution function $G(y, x)$. The distribution function of $X$ will be denoted by $F(x)$. Let
$$(y_1, x_1),\ (y_2, x_2),\ \ldots,\ (y_n, x_n) \qquad (42)$$
be a sample $S$ of $n$ independent observations on $(Y, X)$. Let us arrange the observations according to the increasing order of magnitude of the $x$'s, thus
$$(y_{(1)}, x_{(1)}),\ \ldots,\ (y_{(n)}, x_{(n)}), \quad x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}. \qquad (43)$$
The fixed interval method of analysis involves stratifying the population into $g$ strata. Let these strata be formed by the predetermined constants $a_0, a_1, \ldots, a_g$ satisfying
$$-\infty = a_0 < a_1 < \cdots < a_{g-1} < a_g = +\infty. \qquad (44)$$
These constants introduce $g$ strata in the domain of $(Y, X)$ as follows: the $r$-th stratum ($r = 1, \ldots, g$) contains all $(y, x)$ with $a_{r-1} < x \le a_r$. We define
$$\pi_r = F(a_r) - F(a_{r-1}); \quad r = 1, \ldots, g. \qquad (45)$$
Throughout the discussion on fixed interval analysis we shall assume that
(i) the variances of $Y$ and $X$ are finite,
(ii) $\pi_r > 0$, $r = 1, \ldots, g$. (46)
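Let us define the means, variances and covariances of $Y$ and $X$ in these $g$ strata:
$$\bar{\mu}_r = E(X \mid a_{r-1} < X \le a_r); \quad \bar{\nu}_r = E(Y \mid a_{r-1} < X \le a_r);$$
$$\bar{\sigma}_r^2 = V(X \mid a_{r-1} < X \le a_r); \quad \bar{\tau}_r^2 = V(Y \mid a_{r-1} < X \le a_r); \qquad (47)$$
$$\bar{\rho}_r \bar{\sigma}_r \bar{\tau}_r = \mathrm{cov}(Y, X \mid a_{r-1} < X \le a_r); \quad r = 1, \ldots, g.$$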
Let $n_1, n_2, \ldots, n_g$ be the number of observations in the sample $S$ in the 1-st, 2-nd, ..., $g$-th stratum, respectively, introduced by the constants in (44). We shall denote the proportion of observations in these strata by $p_1, p_2, \ldots, p_g$, i.e.
$$p_i = n_i/n, \quad i = 1, \ldots, g. \qquad (48)$$
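The means, variances and covariances of the sample $S$ in these strata are defined as follows (each sum runs over the $r$ with $a_{i-1} < x_r \le a_i$):
$$\bar{u}_i = \sum x_r/n_i; \quad \bar{v}_i = \sum y_r/n_i;$$
$$\bar{s}_i^2 = \sum (x_r - \bar{u}_i)^2/n_i; \quad \bar{t}_i^2 = \sum (y_r - \bar{v}_i)^2/n_i; \qquad (49)$$
$$\bar{r}_i \bar{s}_i \bar{t}_i = \sum_{a_{i-1} < x_r \le a_i} (x_r - \bar{u}_i)(y_r - \bar{v}_i)/n_i; \quad i = 1, \ldots, g.$$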
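In the theorems of the next section we will be interested in the limiting distribution of the following statistics.
$$\bar{\xi}_i(n) = \sqrt{n}\,(\bar{u}_i - \bar{\mu}_i); \quad \bar{\eta}_i(n) = \sqrt{n}\,(\bar{v}_i - \bar{\nu}_i); \quad \bar{\zeta}_i(n) = \sqrt{n}\,(p_i - \pi_i); \quad i = 1, \ldots, g \qquad (50)$$
$$\bar{\xi}(n) = (\bar{\xi}_1(n), \ldots, \bar{\xi}_g(n)); \quad \bar{\eta}(n) = (\bar{\eta}_1(n), \ldots, \bar{\eta}_g(n)); \quad \bar{\zeta}(n) = (\bar{\zeta}_1(n), \ldots, \bar{\zeta}_g(n)). \qquad (51)$$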
In problems connected with fractile analysis we shall assume the following: The distribution function $G(y, x)$ admits of a density function $g(y, x)$ which is continuous and which does not vanish. The density function of $F(x)$ will be denoted by $f(x)$. Further the variances of $Y$ and $X$ are finite. (52)
In the fractile method of analysis the population is to be divided into a previously assigned number, $g$, of strata thus. Let $Q_i$ be defined by
$$F(Q_i) = i/g, \quad i = 1, \ldots, g-1 \qquad (53)$$
$$Q_0 = -\infty, \quad Q_g = +\infty. \qquad (54)$$
Then the $r$-th stratum is defined as the region of all points $(y, x)$ with $Q_{r-1} < x \le Q_r$, $r = 1, \ldots, g$.
The means, variances and covariances in these strata are defined by
$$\mu_r = E(X \mid Q_{r-1} < X \le Q_r); \quad \nu_r = E(Y \mid Q_{r-1} < X \le Q_r);$$
$$\sigma_r^2 = V(X \mid Q_{r-1} < X \le Q_r); \quad \tau_r^2 = V(Y \mid Q_{r-1} < X \le Q_r); \qquad (55)$$
$$\rho_r \sigma_r \tau_r = \mathrm{cov}(Y, X \mid Q_{r-1} < X \le Q_r); \quad r = 1, \ldots, g.$$
We also require the regression function
$$E(Y \mid X = x) = \lambda(x) \qquad (56)$$
$$\lambda(Q_i) = \lambda_i, \quad i = 1, \ldots, g-1 \qquad (57)$$
$$\lambda_0 = 0, \quad \lambda_g = 0. \qquad (58)$$
The corresponding quantities are defined from the sample $S$. It is assumed that $n = m \cdot g$ where $m$ is an integer.
$$u_i = \sum_{(i-1)m < r \le im} x_{(r)}/m; \quad v_i = \sum_{(i-1)m < r \le im} y_{(r)}/m; \quad i = 1, \ldots, g. \qquad (59)$$
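In the next section we will be interested in the limiting distribution of the following statistics.
$$\eta_i(n) = \sqrt{m}\,(v_i - \nu_i); \quad i = 1, \ldots, g \qquad (60)$$
$$\zeta_i(n) = \sqrt{n}\,(x_{(im+1)} - Q_i); \quad i = 1, \ldots, g-1 \qquad (61)$$
$$\eta(n) = (\eta_1(n), \ldots, \eta_g(n)); \quad \zeta(n) = (\zeta_1(n), \ldots, \zeta_{g-1}(n)). \qquad (62)$$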
In the following sections we shall have occasion to consider another random variable $(Y', X')$ constituting a population $P'$. The constants for this population will be obtained by adding a ' mark to those of $P$. In the same way the statistics from a sample with some suffix will be obtained by adding that suffix to the statistics of the sample $S$.

3. Limit distributions.
In this section we state without proof some results concerning the limiting distributions of the statistics entering (51) and (62). The proofs of all these theorems can be found in Sethuraman (1961a). The results concerning the statistics entering fixed interval analysis and fractile analysis can again be found in Sethuraman (1963) and (1961).
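Theorem 1. Let condition (46) hold. As $n \to \infty$ the sequence of random variables $(\bar{\xi}(n), \bar{\eta}(n), \bar{\zeta}(n))$ converges to a random variable $(\bar{\xi}, \bar{\eta}, \bar{\zeta})$ with a multivariate normal distribution with mean vector $0$ and variance covariance matrix
$$\begin{pmatrix} \bar{\Sigma}\,\Pi^{-1} & \bar{E}\,\Pi^{-1} & 0 \\ \bar{E}\,\Pi^{-1} & \bar{T}\,\Pi^{-1} & 0 \\ 0 & 0 & K \end{pmatrix} \qquad (63)$$
where
$$\bar{\Sigma} = \mathrm{diag.}^{*}(\bar{\sigma}_1^2, \ldots, \bar{\sigma}_g^2) \qquad (64)$$
$$\bar{T} = \mathrm{diag.}(\bar{\tau}_1^2, \ldots, \bar{\tau}_g^2) \qquad (65)$$
$$\bar{E} = \mathrm{diag.}(\bar{\rho}_1 \bar{\sigma}_1 \bar{\tau}_1, \ldots, \bar{\rho}_g \bar{\sigma}_g \bar{\tau}_g) \qquad (66)$$
$$\Pi = \mathrm{diag.}(\pi_1, \ldots, \pi_g) \qquad (67)$$
$$K = \begin{pmatrix} \pi_1(1 - \pi_1) & -\pi_1 \pi_2 & \cdots & -\pi_1 \pi_g \\ -\pi_1 \pi_2 & \pi_2(1 - \pi_2) & \cdots & -\pi_2 \pi_g \\ \vdots & \vdots & & \vdots \\ -\pi_1 \pi_g & -\pi_2 \pi_g & \cdots & \pi_g(1 - \pi_g) \end{pmatrix}. \qquad (68)$$
*diag.$(b_1, \ldots, b_k)$ stands for the diagonal matrix
$$\begin{pmatrix} b_1 & 0 & \cdots & 0 \\ 0 & b_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & b_k \end{pmatrix}.$$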
Corollary 2. The distribution of $\bar{\eta}$ is multivariate normal with mean vector $0$ and variance covariance matrix $\bar{\Lambda}$ where $\bar{\Lambda} = \bar{T}\,\Pi^{-1}$.
Theorem 3.
Let condition (52) hold.
sequence of random variables ( ~ (n),
~,
random variable ( f.,
Let
Let
= (g -
~)
-- --
'"
MOl
n-
M~
=.±M
M
= (ii2
g1
g
j
2
= (il
= (i'2
g
+
O
S (n»
(i. e. n ->
CD
0
MiMi +
Mg
= - (0g- 1
(
- j.Lg )
)
°i - j.Li (0 1_ 1 - j.Li);
= j =1
D
i
O
1 = j
~MJ.
MM
g g
)
the
converges weakly to a
( 71)
i>j
i
1
(n),
CD
j>i
J
g
+ 1:
g
+
As m ->
with a multivariate normal distribution.
1)(01
- j.Ll)'
.
=.±M
g i
~
l
=g
1 = j;
i /: 1, g
17
f\ij
1
N~
j
>
i
Nj NO
1
i
>
j
= g N1
=-1g
J
i
1o
= T2
+ g Nl Nl
l
i
= j = 1
2
1
NO
g + -g Ng g
i
= j = g
=
E*
ij
T
1
= - M. NO
g J. j
j
>
i
O
=.!N M
g i j
j
<
i
= j;
i
i
Corollary 4.
(£,
.,-
~
'""
i
=j =1
1
= j =g
~ 1, g (75)
= j;
i l l , g(76)
) has a multivariate normal distribution with mean
vector 0 and variance covariance matrix
Corollary 5. $\eta$ has a multivariate normal distribution with mean vector $0$ and variance covariance matrix $\Lambda$.
Theorem 6.
size
Let 8(1)' ... , S(k)
n, on (Y, X).
(~(k)(n), ~(k)(n»
be k
independent samples, each of
Let S be the pooled sample.
and
(r (n), ~ (n»
Let
(~(l)(n), ~(l)(n», •••
be statistics computed from these samples.
Let
1
=-
(78)
•••
Let condition (52) hold.
Then (g (n), ~ (n»
~
.~
- (so(n), ~o(n»
~
~
_>
0
in probability.
4. Theoretical applications.
In this section we show that the limiting distributions of section 3 in effect reduce a large sample $S$ from a population $P$ to just one observation ($\bar{v}$ or $v$) from a multivariate normal distribution. We then derive the limiting distribution of the measures of divergence $\bar{D}$, $\bar{\Delta}$, $\bar{\Gamma}$, $A$, $D$, $\Delta$ and $\Gamma$ (see section 1) used in fixed interval analysis and in fractile analysis. We indicate the tests that can be used when several (interpenetrating) samples are available from each population. The asymptotic distributions of the concentration ratios (39) and (41) are shown to be normal.
Let samples $S$ and $S'$ be drawn from the populations $P$ and $P'$ respectively, and the method of fixed interval analysis be employed. Corollary 2 states that $\bar{v}$ is asymptotically normal with mean $\bar{\nu}$ and variance covariance matrix $\bar{\Lambda}/n$. We write this in symbols as follows
$$\bar{v} \sim MN(\bar{\nu},\ \bar{\Lambda}/n). \qquad (79)$$
Similarly
$$\bar{v}' \sim MN(\bar{\nu}',\ \bar{\Lambda}'/n'). \qquad (80)$$
(79) and (80) show that the samples $S$ and $S'$ are now reduced to the vectors $\bar{v}$ and $\bar{v}'$ respectively, with asymptotic multivariate normal distributions.
Let $n, n' \to \infty$ in such a way that $n/n' \to c$, $0 < c < \infty$. Then
$$\bar{v} - \bar{v}' \sim MN\big(\bar{\nu} - \bar{\nu}',\ (\bar{\Lambda} + c\,\bar{\Lambda}')/n\big). \qquad (81)$$
From (81) we can easily obtain the asymptotic distributions of $\bar{D}$, $\bar{\Delta}$ and $\bar{\Gamma}$ under the null hypothesis (9). $n\bar{\Delta}$ has a limiting distribution which is the distribution of $\sum_r \bar{\beta}_r \chi_r^2$ where $\chi_1^2, \ldots, \chi_g^2$ are independent $\chi^2$ with 1 d.f. and
$$\bar{\beta}_1 = \Big(\frac{\bar{\tau}_1^2}{\pi_1} + c\,\frac{\bar{\tau}_1'^2}{\pi_1'}\Big),\ \ldots,\ \bar{\beta}_g = \Big(\frac{\bar{\tau}_g^2}{\pi_g} + c\,\frac{\bar{\tau}_g'^2}{\pi_g'}\Big).$$
For a particular choice of $B$, $n\bar{\Gamma}$ is equal to
$$n \sum_i (\bar{v}_i - \bar{v}_i')^2 \Big/ \Big(\frac{\bar{\tau}_i^2}{\pi_i} + c\,\frac{\bar{\tau}_i'^2}{\pi_i'}\Big)$$
and has a limiting distribution that is a $\chi^2$ with $g$ degrees of freedom. The limiting distribution of $\sqrt{n}\,\bar{D}$ exists but does not have a simple algebraic form.
Let $S_{(1)}, \ldots, S_{(k)}$ be $k$ independent and equally valid (interpenetrating) sub-samples of the same size $n$ from the population $P$. Let $S'_{(1)}, \ldots, S'_{(k')}$ be $k'$ interpenetrating subsamples of size $n'$ each from $P'$. If $n$ and $n'$ are large these two sets of samples can be reduced to two samples $\bar{v}_{(1)}, \ldots, \bar{v}_{(k)}$ and $\bar{v}'_{(1)}, \ldots, \bar{v}'_{(k')}$ from multivariate normal distributions with parameters* $(\bar{\nu}, \bar{\Lambda}/n)$ and $(\bar{\nu}', \bar{\Lambda}'/n')$ respectively.
The problem of fixed interval analysis is to test the hypothesis $\bar{\nu} = \bar{\nu}'$. Since $\bar{\Lambda}$ and $\bar{\Lambda}'$ are diagonal the problem can be viewed as the problem of simultaneous independent Fisher-Behrens tests.
To pose the problem as one in classical multivariate analysis we should strengthen hypothesis (9) to the hypothesis
$$\bar{\nu} = \bar{\nu}', \quad \bar{\Lambda} = \bar{\Lambda}'. \qquad (82)$$
*The parameters are the mean and the variance covariance matrix, respectively.
Multivariate analysis now yields us two solutions to this problem. We can use the likelihood ratio criterion
$$\prod_{r=1}^{g} \left\{ \frac{kc\,\bar{s}_r^2 + k'\bar{s}_r'^2 + kk'c\,(\bar{v}_r - \bar{v}_r')^2/(k' + kc)}{kc\,\bar{s}_r^2 + k'\bar{s}_r'^2} \right\} \qquad (83)$$
where
$$\bar{v}_r = \sum_{j=1}^{k} \bar{v}_{r(j)}/k, \quad \bar{s}_r^2 = \sum_{j=1}^{k} (\bar{v}_{r(j)} - \bar{v}_r)^2/k. \qquad (84)$$
Large values of this criterion will form the region of rejection. The distribution of this criterion has been evaluated by Box (1949). Another method is to use the criterion
$$\sup_{1 \le r \le g} |\bar{v}_r - \bar{v}_r'| \left[ \frac{ckk'(k + k' - 2)}{(ck\,\bar{s}_r^2 + k'\bar{s}_r'^2)(k' + kc)} \right]^{1/2}. \qquad (85)$$
The distribution of this criterion is the distribution of the absolute maximum of $g$ independent t-distributions each with $k + k' - 2$ degrees of freedom. Significance points of this distribution can easily be obtained from the tables of the t-distribution.
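As an illustration, criterion (85) can be computed from the sub-sample mean vectors as sketched below. The function name and the toy numbers are hypothetical; `c` stands for the limiting ratio $n/n'$, and the variances are divided by $k$ and $k'$ as in (84).

```python
from math import sqrt

def max_t_criterion(sub_means, sub_means_prime, c):
    """Criterion (85): the largest of g two-sample t-type ratios.

    `sub_means` holds k vectors of interval means (one per sub-sample of P),
    `sub_means_prime` holds k' such vectors for P'.
    """
    k, k2 = len(sub_means), len(sub_means_prime)
    g = len(sub_means[0])
    ratios = []
    for r in range(g):
        col = [vec[r] for vec in sub_means]
        col2 = [vec[r] for vec in sub_means_prime]
        mean, mean2 = sum(col) / k, sum(col2) / k2
        s2 = sum((x - mean) ** 2 for x in col) / k         # as in (84)
        s2_prime = sum((x - mean2) ** 2 for x in col2) / k2
        num = abs(mean - mean2) * sqrt(c * k * k2 * (k + k2 - 2))
        den = sqrt((c * k * s2 + k2 * s2_prime) * (k2 + k * c))
        ratios.append(num / den)
    return max(ratios)

P_means = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]]
P_prime_means = [[2.0, 10.0], [3.0, 11.0], [4.0, 12.0]]
print(max_t_criterion(P_means, P_prime_means, c=1.0))  # sqrt(1.5), about 1.2247
```

With $c = 1$ each ratio reduces to the usual pooled two-sample $t$ statistic, which is how the toy value $\sqrt{1.5}$ can be checked by hand.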
Let samples $S$ and $S'$ be drawn from the populations $P$ and $P'$ respectively, and the method of fractile analysis be used. Corollary 5 states that
$$v \sim MN(\nu,\ \Lambda/m) \qquad (86)$$
$$v' \sim MN(\nu',\ \Lambda'/m') \qquad (87)$$
which show that the samples $S$ and $S'$ are now reduced to the vectors $v$ and $v'$ respectively, with asymptotic multivariate normal distributions.
Let $n, n' \to \infty$ in such a way that $n/n' \to c$, $0 < c < \infty$. Then
$$v - v' \sim MN\big(\nu - \nu',\ (\Lambda + c\,\Lambda')/m\big). \qquad (88)$$
From (88) we can obtain the asymptotic distribution of $A$, $D$, $\Delta$ and $\Gamma$ under the null hypothesis (19). $m\Delta$ has a limiting distribution which is the distribution of $\sum_r \beta_r \chi_r^2$ where $\chi_1^2, \ldots, \chi_g^2$ are independent $\chi^2$ with 1 d.f. and $\beta_1, \ldots, \beta_g$ are the latent roots of $(\Lambda + c\,\Lambda')$. $m\Gamma$ has a similar limiting distribution. By a particular choice of $B$,
$$m\Gamma = m\,(v - v')(\Lambda + c\,\Lambda')^{-1}(v - v')' \qquad (89)$$
and has a limiting distribution that is a $\chi^2$ with $g$ d.f. The limiting distributions of $\sqrt{m}\,A$ and $\sqrt{m}\,D$ exist but do not have simple algebraic expressions. Crude approximations to the limiting distribution of $\sqrt{m}\,A$ can be made by the use of the following easily proved inequality
$$0 \le A \le \sqrt{g}\,\sqrt{\Delta}. \qquad (90)$$
Let $S_{(1)}, \ldots, S_{(k)}$ ($S'_{(1)}, \ldots, S'_{(k')}$) be $k$ ($k'$) interpenetrating subsamples from $P$ ($P'$). If $n$ ($n'$) is large then $v_{(1)}, \ldots, v_{(k)}$ ($v'_{(1)}, \ldots, v'_{(k')}$) is a sample from a multivariate normal distribution with parameters $(\nu, \Lambda/m)$ ($(\nu', \Lambda'/m')$).
The problem of fractile analysis is to test the hypothesis $\nu = \nu'$. To tackle this problem as one in multivariate analysis we use a restricted hypothesis
$$\nu = \nu', \quad \Lambda = \Lambda'. \qquad (91)$$
We now have the familiar problem of testing the equality of the mean when the variance covariance matrix of one multivariate normal population is a constant multiple of that of the other. The Mahalanobis $D^2$-statistic will be used.
Let
$$v_i^0 = \sum_{\alpha=1}^{k} v_{i(\alpha)}/k,$$
$$S_{ij} = \sum_{\alpha=1}^{k} (v_{i(\alpha)} - v_i^0)(v_{j(\alpha)} - v_j^0)/k,$$
$$S = (S_{ij}), \quad \bar{S} = [kcS + k'S']/c.$$
Then
$$\frac{k + k' - g - 1}{g} \cdot \frac{kk'}{ck + k'}\ (v^0 - v^{0\prime})\, \bar{S}^{-1}\, (v^0 - v^{0\prime})' \qquad (94)$$
is our test criterion. Its distribution is an $F$ distribution with $g$ and $k + k' - g - 1$ d.f.
Let $v$ ($v'$) be derived from $S$ ($S'$), the sample obtained by pooling $S_{(1)}, \ldots, S_{(k)}$ ($S'_{(1)}, \ldots, S'_{(k')}$). As an application of Theorem 6 we can substitute $v$ and $v'$ for $v^0$ and $v^{0\prime}$ in (94) without changing the limiting distribution.
In the preceding discussion we assumed that the interpenetrating samples from each population are of the same size. We can remove this restriction by simple modifications. The tests mentioned above require several samples from each population, and make use of them only through the $\bar{v}$'s or $v$'s. This involves considerable labour and waste of information. When only one sample is available from each population we cannot make use of the measures of divergence $\Delta$, $D$, etc., for testing since their limiting distributions involve unknown constants. In section 5 we suggest methods of overcoming this difficulty.
In section 1 we described how concentration curves can be drawn with the help of the data of fixed interval analysis and fractile analysis. We now give the limiting distributions of the concentration ratio, as a direct application of the theorems of section 3 and a theorem of Cramér (1946), p. 366.
Let
$$\bar{\gamma} = \frac{\sum_i \pi_i \bar{\nu}_i\, (\pi^*_{i-1} + \pi^*_i)}{\sum_r \pi_r \bar{\nu}_r} - 1 \qquad (95)$$
$$\pi^*_i = \pi_1 + \cdots + \pi_i, \quad i = 1, \ldots, g \qquad (96)$$
$$\ldots \qquad (97)$$
Then $\sqrt{n}\,(\bar{C} - \bar{\gamma})$ has a limiting distribution that is normal with mean $0$ and variance
$$\ldots \qquad (98)$$
Let
$$\gamma = \frac{2 \sum_i i\,\nu_i}{g(\nu_1 + \cdots + \nu_g)} - \frac{g+1}{g} \qquad (99)$$
$$\ldots \qquad (100)$$
$$\ldots \qquad (101)$$
Then $\sqrt{m}\,(C - \gamma)$ has a limiting distribution that is normal with mean $0$ and variance
$$d\,\Lambda\,d'. \qquad (102)$$
Though we have not explicitly mentioned it, it should be noted that (46) is assumed when fixed interval analysis is employed and (52) is assumed when fractile analysis is employed, in this section.
5. Methods of testing with just one sample.
Among the statistics $\bar{D}$, $\bar{\Delta}$ and $\bar{\Gamma}$ that can be used to test the null hypothesis (9) of fixed interval analysis, $\bar{\Gamma}$ is most suited for practical use since its limiting distribution is most simple. Further $\bar{\Gamma}$ corresponds to the statistic that is used for testing simultaneously that several means are zero when all the variances are known. When the variables concerned are independently normally distributed this test corresponds to the Hotelling's $T$. We shall therefore use $\bar{\Gamma}$ in this situation.
$$\bar{\Gamma} = \sum_i (\bar{v}_i - \bar{v}_i')^2 \Big/ \Big(\frac{\bar{\tau}_i^2}{\pi_i n} + \frac{\bar{\tau}_i'^2}{\pi_i' n'}\Big). \qquad (103)$$
We note that $\bar{t}_i^2/n_i$ and $\bar{t}_i'^2/n_i'$ are consistent for $\bar{\tau}_i^2/(\pi_i n)$ and $\bar{\tau}_i'^2/(\pi_i' n')$, $i = 1, \ldots, g$, the unknown constants that enter in $\bar{\Gamma}$. Let
$$\bar{\Gamma}^* = \sum_i (\bar{v}_i - \bar{v}_i')^2 \Big/ \Big(\frac{\bar{t}_i^2}{n_i} + \frac{\bar{t}_i'^2}{n_i'}\Big). \qquad (104)$$
We note that $\bar{\Gamma}^*$ can be calculated from $S$ and $S'$ and that its limiting distribution is a $\chi^2$ with $g$ degrees of freedom. The critical region for testing the null hypothesis of fixed interval analysis will be the region of large values of $\bar{\Gamma}^*$.
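The statistic (104) needs only the interval counts, means and variances of the two samples. The sketch below is an illustration with hypothetical function names and toy data; observations are `(y, x)` pairs, intervals are right-closed as in (44), and variances are divided by $n_i$ as in (49).

```python
from bisect import bisect_left

def interval_stats(sample, cuts):
    """Return (n_i, mean of y, variance of y) for each interval (a_{i-1}, a_i]."""
    groups = [[] for _ in range(len(cuts) + 1)]
    for y, x in sample:
        groups[bisect_left(cuts, x)].append(y)
    stats = []
    for ys in groups:
        n_i = len(ys)
        if n_i == 0:
            raise ValueError("every interval needs at least one observation")
        mean = sum(ys) / n_i
        var = sum((y - mean) ** 2 for y in ys) / n_i  # divisor n_i, as in (49)
        stats.append((n_i, mean, var))
    return stats

def gamma_bar_star(sample, sample_prime, cuts):
    """Statistic (104); large values speak against hypothesis (9)."""
    total = 0.0
    for (n, v, t2), (n2, v2, t2p) in zip(interval_stats(sample, cuts),
                                         interval_stats(sample_prime, cuts)):
        total += (v - v2) ** 2 / (t2 / n + t2p / n2)
    return total

S = [(1.0, 5), (3.0, 8), (4.0, 12), (8.0, 15)]
S_prime = [(2.0, 4), (6.0, 9), (5.0, 11), (7.0, 20)]
print(gamma_bar_star(S, S_prime, cuts=[10]))  # about 1.6
```

In use, the value would be referred to a $\chi^2$ distribution with $g$ degrees of freedom (here $g = 2$); the toy samples are far too small for the large sample theory to apply and only show the arithmetic.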
Our applications so far of Theorem 1 depend in essence on the fact that in large samples we can treat $\bar{v}_1, \ldots, \bar{v}_g$ as independently normally distributed variables. This can be used in many other ways. For instance, if we have two samples $S_{(1)}$ and $S_{(2)}$ (pooled, they form $S$) from one population $P$ and only one, $S'$, from the second population $P'$, our test criterion would be
$$\bar{\Gamma}^* = \ldots \qquad (105)$$
The limiting distribution of this statistic is again $\chi^2$ with $g$ d.f.
We have seen in section 4 (e.g. in (86)) that
$$v \sim MN(\nu,\ \Lambda/m). \qquad (106)$$
Comparing this relation with (79) we note an important difference between fixed interval analysis and fractile analysis. These relations show that $\bar{v}$ and $v$ are asymptotically normal with variance covariance matrices $\bar{\Lambda}/n$ and $\Lambda/m$, respectively, and $\bar{\Lambda}$ is always diagonal, but $\Lambda$ is not diagonal in general. $\Lambda$ is diagonal only when $\lambda_i = \nu_i = \lambda_{i-1}$ for all $i$. This does not hold in general, for instance when $\lambda(x)$ is strictly monotone. Thus whereas $\bar{v}_1, \ldots, \bar{v}_g$ can be considered to be independent in large samples, $v_1, \ldots, v_g$ are dependent even in the limit.
Among the several measures of divergence introduced in section 1, $\Delta$ and $\Gamma$ can be used since their limiting distributions under the null hypothesis are of a simple form.
$$m\Delta = \sum_i m\,(v_i - v_i')^2$$
has a limiting distribution which is the distribution of $\sum_i \beta_i \chi_i^2$ where $\chi_1^2, \ldots, \chi_g^2$ are independent chi-squares with 1 d.f. and $\beta_1, \ldots, \beta_g$ are the latent roots of $\Lambda + c\,\Lambda'$. This distribution can be approximated by* $dZ$ where $Z$ has a $\chi^2$-distribution with $a$ d.f.; $d$ and $a$ are given by the relations
$$\ldots \qquad (107)$$
$$\ldots \qquad (108)$$
$$m\Gamma = m\,(v - v')(\Lambda + c\,\Lambda')^{-1}(v - v')' \qquad (109)$$
has a limiting distribution that is a $\chi^2$ with $g$ d.f.
*See Satterthwaite (1946). For other approximations see Robbins and Pitman (1949).
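A common way of choosing $d$ and $a$ for an approximation of the form $dZ$ is to match the first two moments of $\sum_i \beta_i \chi_i^2$, which gives $d = \sum \beta_i^2 / \sum \beta_i$ and $a = (\sum \beta_i)^2 / \sum \beta_i^2$. The sketch below assumes this two-moment matching, usually attributed to Satterthwaite (1946); whether it coincides exactly with the relations intended in the text is an assumption, and the function name is hypothetical.

```python
def two_moment_chi_square_fit(betas):
    """Return (d, a) so that d * chi2(a) matches the mean and variance of sum(beta_i * chi2_1).

    Mean:     d * a       = sum(beta_i)
    Variance: 2 * d^2 * a = 2 * sum(beta_i^2)
    """
    s1 = sum(betas)
    s2 = sum(b * b for b in betas)
    d = s2 / s1
    a = s1 * s1 / s2
    return d, a

print(two_moment_chi_square_fit([1.0, 1.0, 1.0]))  # (1.0, 3.0): exactly a chi-square with 3 d.f.
print(two_moment_chi_square_fit([2.0, 1.0]))       # d = 5/3, a = 9/5
```

When all the $\beta_i$ are equal the fit is exact, and in general $a$ need not be an integer, so tables of the $\chi^2$ distribution have to be interpolated.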
These two statistics cannot be used in practice unless $\Lambda$ and $\Lambda'$ are known. In trying to estimate $\Lambda$ from $S$ it is natural to use $t_i^2$ as an estimate of $\Lambda_{ii}$. It can be shown, for instance from the results of Hoeffding (1953), that $t_i^2$ is not even consistent for $\Lambda_{ii}$ in general. This unfortunate fact that no simple consistent estimate of $\Lambda$ is available from $S$, is a great set-back to the construction of nonparametric tests in the method of fractile analysis.
We therefore proceed to construct convenient test procedures when $(Y, X)$ is known to follow some special distribution. Let us assume that the distribution of $(Y, X)$ is bivariate normal with parameters $\nu, \mu; \tau^2, \sigma^2, \rho$. We now evaluate the matrix $\Lambda$ for this distribution. Using (76) we find that $\Lambda$ can be written down in the form

$\Lambda = \tau^2 I + \rho^2 \tau^2 Q$  (110)

where $I$ is the identity matrix and $Q$ is a $g \times g$ matrix depending only on $g$. We note that $\Lambda$ does not depend on $\nu$, $\mu$ and $\sigma^2$. It can be easily demonstrated that the matrix $Q$ is doubly symmetric, that is

$Q_{ij} = Q_{ji} = Q_{g+1-i,\,g+1-j}$.  (111)
The matrix $Q$ has certain interesting properties (not basic to our main work) which are easily derived from the fact that $\frac{1}{g}\sum_i v_i = \bar{y}$, the mean of y's. Thus, $Q$ is negative semi-definite, $\sum_j Q_{ij} = 0$ for each i, one of the latent roots of $Q$ is zero and so on.
In table I, at the end of this chapter, we have given these matrices $Q$ for g = 2(1)10. Since the matrix $Q$ is doubly symmetric, we give only $Q_{ij}$ for $j \le i \le g-j+1$; $1 \le j \le [\frac{g+1}{2}]$ for each g. We have also tabulated the latent roots $(q_1, \ldots, q_g)$ of $Q$ in table II for g = 2(1)10. Actually $Q$, $q_i$, $\mu_i$ etc., all depend on g, so that $Q_g$, $q_{g,i}$, $\mu_{g,i}$ etc., would be the more appropriate notations for them. We however drop the suffix g whenever we feel that it will not cause confusion.
Let $\hat\tau^2$ and $\hat\rho^2\hat\tau^2$ be consistent estimates of $\tau^2$ and $\rho^2\tau^2$ respectively. $\Lambda$ is now consistently estimated by

$\hat\Lambda = \hat\tau^2 I + \hat\rho^2\hat\tau^2 Q$.

($Q$ can be got from Table I for some values of g.) Then

$\Gamma^{*} = m\,(v - v')\,(\hat\Lambda + c\hat\Lambda')^{-1}(v - v')'$  (112)

will be distributed as a $\chi^2$ with g d.f. The latent roots of $\hat\Lambda$ are given by $\hat\tau^2 + \hat\rho^2\hat\tau^2 q_1, \ldots, \hat\tau^2 + \hat\rho^2\hat\tau^2 q_g$ and these are consistent estimates of the latent roots of $\Lambda$. Let us denote the consistent estimates of the latent roots of $\Lambda + c\Lambda'$ (which are got in the same way) by $\beta_1, \ldots, \beta_g$. Then the limiting distribution of $m\Delta$ can be approximated by the distribution of $\sum_i \beta_i \chi_i^2$ where $\chi_1^2, \ldots, \chi_g^2$ are independent chi-squares with 1 d.f. We should note that large values of the statistics $m\Delta$ and $\Gamma^{*}$ form the critical region for testing the null hypothesis (19) of fractile analysis.
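The remark that the roots of the combined matrix are got in the same way can be spelled out as follows. This is a sketch under the assumption that the second population is also bivariate normal, with parameters written here as $\tau'^2$ and $\rho'$ (notation assumed for illustration), so that its matrix $\Lambda'$ has the form (110) with the same $Q$.

```latex
\Lambda + c\,\Lambda'
  = (\tau^2 + c\,\tau'^2)\, I + (\rho^2\tau^2 + c\,\rho'^2\tau'^2)\, Q ,
\qquad\text{with latent roots}\qquad
(\tau^2 + c\,\tau'^2) + (\rho^2\tau^2 + c\,\rho'^2\tau'^2)\, q_i ,
\quad i = 1, \ldots, g .
```

Replacing the parameters by consistent estimates gives estimates of these roots, with the $q_i$ read from Table II.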
We now suggest two methods of estimating $\tau^2$ and $\rho^2\tau^2$ from the sample $S$. Let $t^2$ and $r$ be the sample variance of y and the sample correlation coefficient between y and x. Then $t^2$ and $t^2 r^2$ are consistent estimates of $\tau^2$ and $\rho^2\tau^2$ respectively. These can now be used as suggested in the preceding paragraph. Since the sample size of $S$ will be large it will be time consuming to compute $t^2$ and $r$. We now suggest another method that will be computationally easier. This method makes use of the sample $S$ only through the $u_i$'s and $v_i$'s.
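Consider

$\tilde t^2 = \frac{1}{g}\sum v_i^2 - \left(\frac{1}{g}\sum v_i\right)^2$  (113)

$\tilde s^2 = \frac{1}{g}\sum u_i^2 - \left(\frac{1}{g}\sum u_i\right)^2$  (114)

$\tilde r\,\tilde s\,\tilde t = \frac{1}{g}\sum u_i v_i - \left(\frac{1}{g}\sum u_i\right)\left(\frac{1}{g}\sum v_i\right)$.  (115)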
We can easily demonstrate (by showing that the expectations converge and variances tend to zero) that $m\tilde t^2$, $m\tilde s^2$ and $m\tilde r\tilde s\tilde t$ are consistent for

$\frac{g-1}{g}\tau^2 + \frac{1}{g}\rho^2\tau^2 L, \;\ldots$  (116)

respectively, where

$L = \operatorname{Tr} Q + \mu_{g,1}^2 + \cdots + \mu_{g,g}^2$,  (117)

$\operatorname{Tr} Q = q_{g,1} + \cdots + q_{g,g}$,  (118)

where $\mu_{g,i} = \mu_i$ as defined in (47). In tables III and IV we have given $\operatorname{Tr} Q$ and $\sum_{i=1}^{g} \mu_{g,i}^2$ for g = 2(1)10. Making use of these tables and (116) we can obtain consistent estimates for $\tau^2$ and $\rho^2\tau^2$ based on $\tilde t$, $\tilde s$ and $\tilde r$.
6. Some historical notes and comments.
The method of fractile analysis was first propounded by Professor P. C. Mahalanobis in a series of lectures at the Indian Statistical Institute in April 1958 and at Berkeley, Chicago and East Lansing during May and June 1958. Some further observations and conjectures were made in his lectures in Tokyo and Kyushu in November and December 1958. A preliminary report with the results of some model sampling experiments appeared in Mahalanobis (1958). A more detailed article on fractile analysis with applications appeared as Mahalanobis (1960) in Econometrica, which was reprinted in Sankhyā, 1961, Series A. Takeuchi (1961), Kawada (1961), Kitagawa (1960) and Mitrafanova (1961) have investigated the limiting behaviour of the expectation of the error area or separation. Takeuchi and Kitagawa (1960) have established results similar to that of corollary 5 under different conditions. Mitrafanova (1961) has proved certain other conjectures of Mahalanobis. The author's articles Sethuraman (1961) and (1963) contain all the results of section 3.
The method of fixed interval analysis has been known for long and has in fact been used in the National Sample Survey of India over a period of years. A practical illustration of the methods developed in section 5 can be found in Parthasarathy and Sethuraman (1961).
A few interesting and difficult problems remain open. It seems likely that the restrictions of condition (52) can be relaxed while showing corollary 5. The case where $g \to \infty$ presents a tough problem. To approximately calculate the fractile means of the pooled sample using only the fractile means of the interpenetrating samples is a big problem being encountered by the National Sample Survey of India. It is very necessary to build up a satisfactory distribution theory for the fractile means when the sampling has been done by a multistage design.
7. Tables

Table I

The following are the Q matrices $Q_g$ for g = 2(1)10. $(Q_g)_{ij}$ has been given only for $j \le i \le g-j+1$, $1 \le j \le [\frac{g+1}{2}]$, since $Q$ is doubly symmetric. (See (111).)

$10^5 \times (Q_g)_{ij}$

(g = 2)
  i      j = 1
  1     -31831
  2      31831

(g = 3)
  i      j = 1     j = 2
  1     -42954
  2      28431    -56862
  3      14523

(g = 4)
  i      j = 1     j = 2
  1     -49140
  2      25339    -65199
  3      14903     24958
  4       8899

(g = 5)
  i      j = 1     j = 2     j = 3
  1     -53210
  2      23160    -69826
  3      14141     21961    -72205
  4       9677     15028
  5       6231

(g = 6)
  i      j = 1     j = 2     j = 3
  1     -56149
  2      21549    -72858
  3      13389     19854    -76069
  4       9513     14107     19207
  5       6987     10361
  6       4712

(g = 7)
  i      j = 1     j = 2     j = 3     j = 4
  1     -58399
  2      20301    -75034
  3      12735     18296    -78601
  4       9216     13240     17285    -79482
  5       7023     10089     13172
  6       5379      7728
  7       3744

(g = 8)
  i      j = 1     j = 2     j = 3     j = 4
  1     -60194
  2      19302    -76689
  3      12179     17091    -80419
  4       8907     12500     15866    -81711
  5       6913      9701     12314     15509
  6       5489      7703      9777
  7       4324      6068
  8       3081

(g = 9)
  i      j = 1     j = 2     j = 3     j = 4     j = 5
  1     -61670
  2      18477    -78001
  3      11702     16127    -81804
  4       8617     11875     14769    -83312
  5       6761      9317     11588     14202    -83736
  6       5466      7532      9368     11482
  7       4460      6146      7644
  8       3586      4941
  9       2602
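The double symmetry (111) is what allows only part of each matrix to be tabulated. The sketch below, with a hypothetical helper `full_q`, rebuilds the full matrix for g = 3 from the tabulated entries and checks two properties mentioned in the text: each row sums to zero, and the trace agrees with Table III. It assumes that double symmetry means symmetry about both diagonals.

```python
def full_q(wedge, g):
    """Expand tabulated entries into the full g x g matrix.

    wedge maps 1-based (i, j) with j <= i <= g - j + 1 to 10^5 * Q_ij."""
    q = [[None] * g for _ in range(g)]
    for (i, j), val in wedge.items():
        # The four positions that share this value under double symmetry.
        images = ((i, j), (j, i), (g + 1 - i, g + 1 - j), (g + 1 - j, g + 1 - i))
        for a, b in images:
            q[a - 1][b - 1] = val
    return q


# Tabulated entries for g = 3, in units of 10^-5.
wedge3 = {(1, 1): -42954, (2, 1): 28431, (3, 1): 14523, (2, 2): -56862}
Q3 = full_q(wedge3, 3)

row_sums = [sum(row) for row in Q3]
trace = sum(Q3[k][k] for k in range(3))
print(row_sums, trace)
```

The same helper can be applied to the entries for the other values of g; small departures of the row sums from zero then reflect rounding in the fifth decimal place.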
Table II

Latent roots $q_g = (q_{g,1}, \ldots, q_{g,g})$ of the Q matrices: g = 2(1)10.

$10^5 \times q_g$

g = 2    (00000, -63662)
g = 3    (00000, -85294, -57478)
g = 4    (00000, -93249, -80483, -54946)
g = 5    (00000, -96323, -77658, -90674, -53621)
g = 6    (00000, -97716, -52822, -94976, -88817, -75820)
g = 7    (00000, -96960, -52293, -98448, -93896, -87424, -74527)
g = 8    (00000, -51920, -98876, -97989, -86339, -96315, -93020, -73569)
g = 9    (00000, -51644, -99148, -85466, -98579, -97583, -95767, -92295, -72828)
g = 10   (00000, -51433, -98945, -84746, -99332, -98309, -97226, -95297, -91684, -72239)
Table III

Trace, Tr Q, of the Q matrices; g = 2(1)10.

  g      - Tr Q_g
  2      0.63662
  3      1.42770
  4      2.28678
  5      3.18277
  6      4.10152
  7      5.03550
  8      5.98026
  9      6.93310
  10     7.89212
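Because the trace of a matrix equals the sum of its latent roots, Tables I, II and III can be checked against one another. A small sketch for g = 6, using the tabulated values (the variable names are illustrative):

```python
# All tabulated entries are in units of 10^-5.
roots_g6 = [0, -97716, -52822, -94976, -88817, -75820]  # Table II, g = 6
diag_g6 = [-56149, -72858, -76069]   # Table I, (Q_6)_jj for j = 1, 2, 3
minus_trace_g6 = 4.10152             # Table III, g = 6

trace_from_roots = sum(roots_g6) / 1e5
# For even g each tabulated diagonal entry occurs twice by double symmetry.
trace_from_diag = 2 * sum(diag_g6) / 1e5

print(trace_from_roots, trace_from_diag, -minus_trace_g6)
```

The three figures agree up to rounding in the fifth decimal place.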
Table IV

The following gives $\sum_{i=1}^{g} \mu_{g,i}^2$ for g = 2(1)10.

  g      $\sum_{i=1}^{g} \mu_{g,i}^2$
  2      0.636620
  3      2.379689
  4      3.442239
  5      4.484782
  6      5.516166
  7      6.540582
  8      7.560287
  9      8.576615
  10     9.590464
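The entries of Table IV can be recomputed under one assumption, which is not confirmed by the text available here: that the quantities squared and summed are the means of a standard normal variable within each of the g fractile groups. The mean over the group between the fractile points $a$ and $b$ is then $g\,(\varphi(a) - \varphi(b))$, where $\varphi$ is the standard normal density. The function name is hypothetical.

```python
from statistics import NormalDist


def sum_sq_fractile_means(g):
    """Sum of squared means of N(0, 1) over g equal-probability groups."""
    nd = NormalDist()
    # Density at the fractile points; it is zero at minus and plus infinity.
    dens = [0.0] + [nd.pdf(nd.inv_cdf(k / g)) for k in range(1, g)] + [0.0]
    # Mean of a standard normal restricted to the i-th fractile group.
    means = [g * (dens[i - 1] - dens[i]) for i in range(1, g + 1)]
    return sum(mu * mu for mu in means)


for g in range(2, 11):
    print(g, round(sum_sq_fractile_means(g), 6))
```

Under this assumption the sketch agrees with the tabulated entries for g = 3, 4 and 5 to at least four decimal places. For g = 2 it gives $4/\pi \approx 1.273240$, which is twice the tabulated 0.636620.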
References

1. Box, G. E. P. (1949). A general distribution theory for a class of likelihood criteria, Biometrika, 36, 317-346.
2. Cramer, H. (1946). Mathematical Methods of Statistics, Princeton University Press.
3. Das, Rhea S. (1960). Applications of fractile graphical analysis to psychometry: I. Item analysis, Psychological Studies, 5, 11-18.
4. Das, Rhea S. and Sharma, K. N. (1960). Applications of fractile graphical analysis to psychometry: II. Reliability, Psychological Studies, 5, 71-77.
5. Hoeffding, W. (1953). On the distribution of expected values of the order statistics, Ann. Math. Stat., 24, 93-100.
6. Iyengar, N. S. (1960). On a method of computing Engel elasticities from concentration curves, Econometrica, 28, 882-891.
7. Kawada, Y. (1961). Some remarks concerning the expectation of the error area in fractile analysis, Sankhyā, Series A, 23, 155-160.
8. Kitagawa, T. (1960). Sampling distributions of statistics associated with a fractile graphical method, Bulletin of Mathematical Statistics, 9, 10-42.
9. Mahalanobis, P. C. (1946). Recent experiments at the Indian Statistical Institute, Jour. Royal Stat. Soc., 109, 325-378.
10. -------- (1958). A method of fractile graphical analysis with some surmises and results, Trans. of the Bose Institute, 22, 223-230.
11. -------- (1958a). Lectures in Japan - draft proof.
12. -------- (1960). A method of fractile graphical analysis, Econometrica, 28, 325-351, reprinted in (1961), Sankhyā, Series A, 23, 41-64.
13. Mitrafanova, N. M. (1961). On some problems of fractile graphical analysis, Sankhyā, Series A, 23, 145-154.
14. Parthasarathy, G. and Sethuraman, J. (1961). Fixed interval analysis: mimeographed as a working paper, no. PD. NSS.RTS. WP /96 (230), 3 July 1961, in the Studies Relating to Planning and National Development, Indian Statistical Institute, Calcutta.
15. Robbins, H. and Pitman, E. J. G. (1949). Applications of a method of mixtures to quadratic forms in normal variables, Ann. Math. Stat., 20, 552-560.
16. Roy, J., Chakravorty, I. M. and Laha, R. G. (1959). A study of concentration curves as description of consumer pattern, Studies Relating to Planning and National Development, 2, 73-82, Indian Statistical Institute, Calcutta.
17. Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components, Biometrics, 2, 110-114.
18. Sethuraman, J. (1961). Some results concerning asymptotic distributions and their applications, Unpublished Ph.D. thesis, submitted to the Indian Statistical Institute, Calcutta.
19. -------- (1961a). Some limit theorems connected with fractile graphical analysis, Sankhyā, Series A, 23, 79-90.
20. -------- (1961b). Some limit theorems for joint distributions, Sankhyā, Series A, 23, 379-385.
21. -------- (1963). Some limit theorems connected with fixed interval analysis, submitted to Sankhyā.
22. Som, R. K. (1960). A statistical study of the population of India, Unpublished Ph.D. thesis, submitted to the University of Calcutta.
23. Takeuchi, K. (1961). On some properties of the error area in fractile graphical analysis, Sankhyā, Series A, 23, 65-78.