SOME NONPARAMETRIC PROCEDURES
FOR GENERAL RIGHT CENSORED DATA
by
Igusti Ngurah Agung
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1347
A Dissertation submitted to the faculty of the
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for the
degree of Doctor of Philosophy in the Department of Biostatistics
Chapel Hill
1981
© 1981
Igusti Ngurah Agung
ALL RIGHTS RESERVED
TABLE OF CONTENTS

ACKNOWLEDGEMENTS

Chapter

I. INTRODUCTION AND LITERATURE REVIEW
   1.1 Introduction
   1.2 Types of Censoring
   1.3 Plotting Methods
   1.4 Measure of Association
   1.5 Conditional Statistics for Censored Data
   1.6 Outline of the Present Work

II. THE PAIR CHART FOR GENERAL SINGLY CENSORED TWO SAMPLE PROBLEMS
   2.1 Introduction
   2.2 Censored Pair Chart of Type I
       2.2.1 Discussion
       2.2.2 Alternative Expression for W
       2.2.3 The Triplets Statistic
   2.3 Censored Pair Chart of Type-II
       2.3.1 Discussion
       2.3.2 The Conditional Mann-Whitney U-Statistic
   2.4 The Pair Chart for Categorical Data
   2.5 Computer Plotting of the Pair Chart
   2.6 The Maximum Distance, D, Statistic
       2.6.1 Discussion
       2.6.2 The Distribution of the D-Statistic
             2.6.2.1 For Equal Sample Sizes
             2.6.2.2 For Unequal Sample Sizes
   2.7 Vector Representation for D-Statistic
   2.8 Large Approximation for Equal Sample Sizes
   2.9 Large Unequal Sample Sizes

III. MEASURES OF ASSOCIATION FOR GENERAL RIGHT-CENSORED BIVARIATE SAMPLES
   3.1 Introduction
   3.2 The Generalized Kendall's Tau
       3.2.1 Definitions
             3.2.1.1 Unconditional Generalized Kendall's Tau
             3.2.1.2 Conditional Generalized Kendall's Tau
   3.3 Characteristics of the Generalized Kendall's Tau
       3.3.1 General Properties
       3.3.2 Some Special Cases
   3.4 Alternative Expression of the Generalized Kendall's Tau
   3.5 Asymptotic Distribution of Tau-c,1
   3.6 Some Characteristics of the Conditional Generalized Kendall's Tau
       3.6.1 Alternative Expression of Tau-c,2
       3.6.2 Conditional Distribution of Tau-c,2
   3.7 An Index Alpha as a Modification of the Generalized Kendall's Tau
   3.8 Test of Independence Using $t_{c,1}$-Statistic
       3.8.1 For Uncensored Data
       3.8.2 For Censored Data

IV. KENDALL'S TAU AND SECTOR SYMMETRY FOR RIGHT CENSORED MULTIVARIATE DATA
   4.1 Introduction
   4.2 GKT in Multivariate Problems
   4.3 Alternative Presentation of Simon's Statistic
   4.4 Dependence Functions and Tests of Independence
       4.4.1 Pairwise Independence
       4.4.2 Non-null Distribution of the Generalized Simon's Statistic
   4.5 A Type of Symmetry
   4.6 Tests for Sector Symmetry
       4.6.1 Complete Data Sets
             4.6.1.1 Chi-Squared Tests
             4.6.1.2 Coefficient of Sector Symmetry
       4.6.2 Censored Data Sets
   4.7 Other Statistical Tests

V. CHARTS FOR K-INDEPENDENT SAMPLES
   5.1 Introduction
   5.2 Equality Tests for K-Samples
   5.3 Triplet Chart for Uncensored Data
   5.4 Statistical Tests Based on the Triplet Chart
   5.5 Orthogonal Projections of the Triplet Chart
   5.6 The Maximum Distance, D, Statistic
       5.6.1 Values of D Statistic
       5.6.2 Recursive Formulas for D Statistic
   5.7 Triplet Chart for Censored Data
   5.8 Alternative Presentation of the Triplet Chart

VI. APPLICATIONS TO DEMOGRAPHIC DATA
   6.1 Applications of the Censored Pair Chart
   6.2 Applications of the Generalized Kendall's Tau

VII. CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
   7.1 Charts
   7.2 Generalized Kendall's Tau
   7.3 Sector Symmetry
   7.4 Applications

APPENDIX

BIBLIOGRAPHY
LIST OF TABLES

Table

2.1 The Data of Freireich et al. (1963)
2.2 Grouped Data of Freireich et al. (1963)
3.1 Values of the Function $U_{ij}$ by ...
3.2 The Distribution of $t_{c,2}$ for Data in Example 1
3.3 The Distribution of $t_{c,2}$ for Data in Example 2
4.1 Value-Case Diagram for m=3
5.1 Illustrative Data
5.2 The Values of D for n=2, 3, and 4
5.3 Paths Containing the Boundary Points or Segments for n=3
6.1 First Birth Interval for Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
6.2 Time to Separation From First Marriage for Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
6.3 Statistical Analysis of the Illustrative Bivariate Data
6.4 Correlation Coefficients and Prob. > |r| under $H_0: \rho = 0$ for the Illustrative Data
6.5 Correlation Coefficients and Prob. > |r| under $H_0: \rho = 0$ for Some Selected Age Groups at First Birth of Mothers in Sri Lanka
LIST OF FIGURES

Figure

2.1 The Format of the Censored Pair Charts
2.2 The CPC-I for the Data of Freireich et al. (1963)
2.3 The CPC-II for the Data of Freireich et al. (1963)
2.4 The CPC-I for Grouped Data of Freireich et al. (1963)
2.5 A Copy of the Bar Charts as an Alternative Presentation of the CPC-I
2.6 Illustrative CPC-II for (m,n) = (8,6) and (8,5)
3.1 Illustrative Graphs for (A) $t_{c,1} = -1$ and (B) $t_{c,1} = +1$
5.1 The Perspective of the Triplet Chart
5.2 Triplet Charts for Data in Table 5.1
5.3 The Projection of a 5x4x3 Unit Cubes
5.4 One-Sixth of the Hexagonal Projection for n = 4
5.5 Tree Diagrams for (a) D = 12, n = 2; and (b) D = 12, n = 3
5.6 Tree Diagrams for Computing $P_3(D_3,i)$ From $P_2(D_2,j)$
5.7 The Paths Between Two Samples of Sizes n and (n+1)
5.8 Types of Path at Each Point
5.9 The Paths for Sample Size (n+k) in the Hexagonal of Size n
6.1 The CPC-I of the First Birth Interval Between Black and White Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
6.2 The CPC-I of the Time to Separation from First Marriage Between Black and White Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
6.3 The Pair Chart of the Censoring Variables or Entering Times Between Black and White Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
ACKNOWLEDGEMENTS
The author wishes to thank the members of the doctoral committee, Dr. Pranab Kumar Sen, Dr. Dana Quade, Dr. Chirayath M. Suchindran, Dr. James E. Grizzle, and Dr. Moye W. Freyman for their support during this research, as well as their review of the manuscript.
The author is especially grateful to Dr. Sen, the author's dissertation advisor, for his continuing encouragement, assistance and helpful suggestions, and also to Dr. Quade for his detailed review, as well as his suggestions for the improvement of the manuscript.
Dr. Suchindran, the author's course advisor, is also due special
thanks for his time, patience and encouragement during the author's
doctoral program.
A special recognition is due to the author's wife, Dra. Alit
M. Agung, for her patience, encouragement and many sacrifices during
the long course of study, and also the author's children, Ningsih,
Ratna and Dharma, for their patience and understanding.
The author thanks Mrs. Beryl Glover for her cooperation and
excellent typing of the manuscript.
Finally, the author is particularly indebted to the Department of Education and Culture, the Republic of Indonesia; the National
Family Planning Coordination Board (BKKBN), Jakarta; the United
States Agency for International Development; and the Ford Foundation
for their financial support during the author's training program in
the United States.
July 10, 1981
Igusti N. Agung
ABSTRACT
IGUSTI NGURAH AGUNG. Some Nonparametric Procedures for General
Right Censored Data (Under the Direction of DR. PRANAB KUMAR SEN).
This study covers three main topics. First, the pair chart of Quade (1973) is extended to singly censored two-sample problems and uncensored three-sample problems. These extensions are considered as descriptive statistics for the censored K-sample problem in testing the null hypothesis that each sample has the same distribution function. Furthermore, based on these charts, we develop a maximum distance statistic for a statistical test.

An extension of Kendall's (1938) tau to censored bivariate and multivariate data is the second topic of this study. Considering general right censored bivariate data, we propose an unconditional and a conditional generalized Kendall's tau (GKT) as measures of association between the two components or variables. For right censored multivariate data, we consider a vector statistic as a generalization of Simon's (1977) Kendall's tau. All these extensions can be represented as U-statistics of Hoeffding (1948). Hence, they have limiting normal distributions. Moreover, we also study their null and nonnull distributions under a sequence of alternative hypotheses.

Finally, for m-variates, we introduce a type of symmetry, which is called sector symmetry. A chi-squared statistic and an index of symmetry are developed for testing the null hypothesis that the m-variates are symmetric dependent.

Applications of the pair chart and the GKT to demographic data are also given.
CHAPTER I
INTRODUCTION AND LITERATURE REVIEW
1.1 Introduction

The present work deals with the development of some nonparametric statistical procedures associated with K-sample problems and m-variate variables, which are possibly censored on the right. This chapter presents a review of some statistical procedures and the outline of this work.

The literature review is given in the following four sections. Section 1.2 describes various types of censoring. In Section 1.3, we review some plotting methods, such as the pair chart of Quade (1973) for 2-sample problems and the hazard plot of Nelson (1972) for multiply censored life data. In Section 1.4, a number of measures of association such as the product moment correlation, Kendall's (1938) tau, Spearman's rho, and the Goodman-Kruskal (1954) index gamma are briefly presented. In Section 1.5, some conditional statistics for censored data are given. This section covers the conditional W statistic of Gehan (1965), the k-th order statistic proposed by Chatterjee and Sen (1973), and other statistics which are related to the Mann-Whitney U statistic. Finally, the outline of this work is presented in Section 1.6.
1.2 Types of Censoring

Data are said to be right (or left) censored if the values of the observations may not go beyond (or below) some point(s). If all values of the observations may not go beyond (or below) a fixed point, then we have singly right (or left) censored data. Otherwise, we have multiply right (or left) censored data, in which the values of the censored (incomplete) observations and those of the uncensored (complete) ones are intermixed. Life test data are frequently right censored; that is, the failure times of unfailed units are known only to be beyond their current running times. Data are said to be doubly censored if there are both singly right and singly left censored observations. Data are said to be censored if they are right or left censored. Finally, data are said to be general (randomly) right censored if they are right censored, and each unit in the sample can be considered as having both a random censoring time and a failure time which are statistically independent.

Now, considering the random variable involved in the censoring procedure, we can differentiate between Type I and Type II censoring. Type I censoring occurs in the right censoring case if the censoring point(s) is pre-chosen. Hence, the number of uncensored observations is a random variable. In Type II censoring, the number of uncensored observations is pre-chosen, and the censoring time is random. Type I and Type II censoring are the two basic types of planned censoring, in which the researcher has planned to stop the observations at certain point(s). Unplanned censoring may occur in clinical trials, in which the researcher has to stop the study because of unexpected results.

The preceding paragraphs describe the characteristics of censored, or incomplete, data. If the data are not censored, then they are said to be complete.
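The distinction between Type I, Type II and general (random) right censoring can be illustrated with a small simulation. The following is only a sketch: the exponential lifetimes, the cutoff of 1.0, the choice of six failures, and all function names are assumptions made for illustration and do not come from the text.

```python
import random

def type_one_censor(lifetimes, cutoff):
    """Type I: stop at a pre-chosen time; the number of failures seen is random."""
    return [(min(t, cutoff), t <= cutoff) for t in lifetimes]

def type_two_censor(lifetimes, n_failures):
    """Type II: stop at the n_failures-th failure; the stopping time is random."""
    cutoff = sorted(lifetimes)[n_failures - 1]
    return [(min(t, cutoff), t <= cutoff) for t in lifetimes]

def random_censor(lifetimes, censor_times):
    """General right censoring: record min(failure, censoring) and a failure flag."""
    return [(min(t, c), t <= c) for t, c in zip(lifetimes, censor_times)]

rng = random.Random(0)                                   # hypothetical seed
lifetimes = [rng.expovariate(1.0) for _ in range(10)]    # hypothetical failure times
censor_times = [rng.expovariate(0.5) for _ in range(10)] # independent censoring times

type1 = type_one_censor(lifetimes, cutoff=1.0)
type2 = type_two_censor(lifetimes, n_failures=6)
general = random_censor(lifetimes, censor_times)
```

Each record is a pair (observed time, flag), where the flag is True for a complete observation and False for a censored one. Under Type II censoring exactly six flags are True, whereas under Type I censoring the number of True flags varies from sample to sample.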
1.3 Plotting Methods

Several types of plots have been used for presenting the data and for statistical testing. As a descriptive statistic, a plot, such as the residual plot or the pair chart, can show the relations between the corresponding variables or samples. However, the results of each analysis tend to be very subjective, because one analyst may see a specific deviation that others do not detect.

Since we are proposing an extension of the pair chart of Quade (1973) for right-censored data, we first consider Quade's paper.

Let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ be a random sample of size $n_i$ on a variable $X_i$ with unknown distribution $F_i$, $i=1,2$. A pair chart can be used as a descriptive statistic for testing the null hypothesis $H_0: F_1 = F_2$. The pair chart can also be used as an aid in computing and interpreting various nonparametric procedures for the two-sample problem, such as the Kolmogorov-Smirnov test, the Wilcoxon-Mann-Whitney test, and Mood's squared-rank test.
The construction of the pair chart is as follows. Draw a rectangle of width $n_1$ units and height $n_2$ units. If the smallest observation in the combined sample is an $X_1$, draw a line from the lower left corner of this rectangle one unit to the right; if it is an $X_2$, draw the line one unit up instead. From the end of the first line draw a second line, one unit to the right if the second smallest observation is an $X_1$, and one unit up if it is an $X_2$. Continue in the same manner for all $(n_1+n_2)$ observations. Then the $(n_1+n_2)$ line segments form a path from the lower left corner of the rectangle to the upper right corner.
If ties occur between $X_1$ and $X_2$, then we have boxes, corresponding to the ties, along the path of the pair chart.

This path divides the $n_1 \times n_2$ rectangle into two polygons with areas $U(X_1)$ and $U(X_2)$, below and above the path respectively. The boxes corresponding to the ties are equally divided between $U(X_1)$ and $U(X_2)$. These areas represent the familiar Mann-Whitney (1947) U-statistic. As a descriptive statistic, a large difference between $U(X_1)$ and $U(X_2)$ would indicate that we may reject the null hypothesis $H_0: F_1 = F_2$.

It is clear that the quantities $U(X_i)$, $i=1,2$, are easily calculated with the aid of the pair chart.
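The construction of the path and the two areas can be sketched in a few lines. This is a minimal illustration that assumes no ties between the samples; the function names and the two small samples are hypothetical.

```python
def pair_chart_path(x1, x2):
    """Return the lattice points of the pair chart path (no ties assumed)."""
    labelled = sorted([(v, 1) for v in x1] + [(v, 2) for v in x2])
    path = [(0, 0)]
    for _, sample in labelled:
        col, row = path[-1]
        # an X1 moves the path one unit right, an X2 moves it one unit up
        path.append((col + 1, row) if sample == 1 else (col, row + 1))
    return path

def areas(path):
    """Areas below and above the path inside the n1 x n2 rectangle."""
    n1, n2 = path[-1]
    below = 0
    for (c0, r0), (c1, r1) in zip(path, path[1:]):
        if c1 == c0 + 1:          # horizontal step: a column of height r0 lies below
            below += r0
    return below, n1 * n2 - below

x1 = [1.2, 3.4, 5.6, 7.8]         # hypothetical sample 1
x2 = [2.3, 4.5, 6.7]              # hypothetical sample 2
path = pair_chart_path(x1, x2)
u_x1, u_x2 = areas(path)
```

With this orientation the area below the path equals the number of pairs in which the $X_1$ observation exceeds the $X_2$ observation, and the two areas always add up to $n_1 n_2$.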
Another type of test statistic, the triplets $N(X_1,X_2,X_1)$ and $N(X_2,X_1,X_2)$, will be reviewed for later extension. If there are no ties, Quade noted that $N(X_i,X_j,X_i)$, $i \neq j = 1,2$, is the number of ways in which it is possible to choose from the data 2 $X_i$'s and 1 $X_j$ such that the $X_j$ lies between the $X_i$'s. With the aid of the pair chart, the quantities $N(X_i,X_j,X_i)$, $i \neq j = 1,2$, are calculated as follows.

In the r-th row of the $n_1 \times n_2$ rectangle, let $L_r$ be the number of squares to the left of the path, and $R_r$ the number to the right of it; if some squares, say $Q_r$ of them, lie within a box, count these as equally divided between $L_r$ and $R_r$. Then

$$N(X_1,X_2,X_1) = \sum_{r=1}^{n_2} L_r R_r - \sum_{r=1}^{n_2} Q_r(Q_r+2)/12. \tag{1.3.1}$$

Similarly,

$$N(X_2,X_1,X_2) = \sum_{c=1}^{n_1} A_c B_c - \sum_{c=1}^{n_1} H_c(H_c+2)/12, \tag{1.3.2}$$

where $A_c$, $B_c$, and $H_c$ are the numbers of squares above the path, below it, and boxed respectively, within the c-th column.
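For data without ties the correction terms vanish, and (1.3.1) reduces to a sum of products over the rows of the chart. The sketch below compares that row-wise computation with a direct count of triplets; the function names and data are hypothetical, and ties are assumed absent.

```python
from itertools import combinations

def triplets_from_rows(x1, x2):
    """Sum of L_r * R_r over the rows of the pair chart (no ties, so Q_r = 0)."""
    total = 0
    for b in x2:                        # one row per X2 observation
        left = sum(a < b for a in x1)   # squares to the left of the path
        right = len(x1) - left          # squares to the right of the path
        total += left * right
    return total

def triplets_brute_force(x1, x2):
    """Count choices of two X1's and one X2 with the X2 strictly between them."""
    return sum(lo < b < hi
               for lo, hi in combinations(sorted(x1), 2)
               for b in x2)

x1 = [1.2, 3.4, 5.6, 7.8]               # hypothetical sample 1
x2 = [2.3, 4.5, 6.7]                    # hypothetical sample 2
n_rows = triplets_from_rows(x1, x2)
n_direct = triplets_brute_force(x1, x2)
```

Swapping the roles of the two samples gives $N(X_2,X_1,X_2)$ in the same way, working column by column instead of row by row.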
A limitation of this pair chart is that it cannot be used as such for censored data, in particular for right-censored data. Because a censored observation and an uncensored observation are not always comparable, and any pair of censored observations are not comparable either, it is necessary to extend this pair chart to censored data, which we shall discuss later on.
For comparing two one-dimensional samples, Wilk and Gnanadesikan (1968) introduced two kinds of probability plots, namely quantile versus quantile plots (Q-Q plots) and percent versus percent plots (P-P plots). If the two variables are both uniform on (0,1), then the two plots are identical.

If $X_i$, $i=1,2$, are identically distributed, then the Q-Q plot and the P-P plot of the $X_i$'s are straight lines with slope 1 through the origin. If $X_1$ is a linear function of $X_2$, then the corresponding Q-Q plot will still be linear, with possibly changed location and slope, but the P-P plot will not remain linear.

Wilk and Gnanadesikan noted the use of the probability plot, as an internal comparison, in regression analysis. For example, the Q-Q plot of the ordered residuals against a normal distribution could show whether the sample came from a normal population. Like the pair chart, the probability plots are limited to uncensored data.
Finally, for multiply censored life data, Nelson (1972) introduced the hazard plot for graphical analysis. As a descriptive statistic, the hazard plot shows whether the assumed distribution of the times to failure of the units or subjects adequately fits the data. On hazard plotting paper for a theoretical distribution, the data and the cumulative hazard scales are chosen so that any such cumulative hazard function is a straight line on the paper. Papers have been developed for the exponential, normal, lognormal, extreme value and Weibull distributions.

As usual, the hazard density function $h(x)$ for a distribution of time $x$ to failure is defined as

$$h(x) = f(x)/\{1 - F(x)\} \tag{1.3.3}$$

where $F(x)$ is the cumulative distribution function (c.d.f.), which is assumed to be differentiable with $F'(x) = f(x)$. The cumulative hazard function $H(x)$ satisfies the equation

$$F(x) = 1 - \exp\{-H(x)\}.$$

For the hazard plotting method, it is assumed that the censoring is random, that is, if the unfailed units were to run to failure, their failure times would be statistically independent of their censoring times. (The life distribution of the units censored at a particular age must be the same as the conditional life distribution of the units that run beyond that age.)
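As a rough illustration of how cumulative hazard values are obtained from multiply censored data, the sketch below follows the usual description of hazard plotting: order all running times, give each failure a hazard increment equal to the reciprocal of the number of units still running, and accumulate. The procedure as coded, the function name and the data are assumptions for illustration; they are not taken from the text above.

```python
import math

def cumulative_hazard(times, failed):
    """Cumulative hazard value at each failure time.

    times  : observed running times (failure or censoring)
    failed : True where the time is a failure, False where it is censored
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    points, cum = [], 0.0
    for position, i in enumerate(order):
        reverse_rank = n - position          # units still running at this time
        if failed[i]:
            cum += 1.0 / reverse_rank        # hazard increment for a failure
            points.append((times[i], cum))
    return points

times = [2.0, 3.5, 4.1, 5.0, 6.3, 7.7]              # hypothetical running times
failed = [True, False, True, True, False, True]     # hypothetical failure flags
points = cumulative_hazard(times, failed)

# F(x) = 1 - exp(-H(x)) converts each cumulative hazard value into a c.d.f. value
cdf_values = [(t, 1.0 - math.exp(-h)) for t, h in points]
```

For an exponential distribution with constant hazard $h(x) = \lambda$, the cumulative hazard is $H(x) = \lambda x$, so points plotted against time on linear scales should fall near a straight line through the origin.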
1.4 Measure of Association

A classic measure of association between two variables X and Y is the parametric coefficient of correlation, r. Suppose we have a bivariate random sample, $(X_i, Y_i)$, $i=1,\ldots,n$, of size n from a two-dimensional variable (X,Y); then it is defined by

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum_{i=1}^{n}(X_i-\bar X)^2 \, \sum_{i=1}^{n}(Y_i-\bar Y)^2}} \tag{1.4.1}$$

where $\bar X$ and $\bar Y$ are the averages of the $X_i$'s and $Y_i$'s, respectively. This coefficient is known as the product-moment correlation.

Based on this correlation, Hotelling and Pabst (1936) studied the rank correlation $r_s$, which was invented by C. Spearman. By taking $X_i$ as the rank of the i-th individual with respect to the first component, and $Y_i$ as his rank with respect to the other component, Spearman's $r_s$ can be written as

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n^3 - n} \tag{1.4.2}$$

with variance

$$\sigma^2 = 1/(n-1) \tag{1.4.3}$$

where $d_i$ is the difference between the two ranks of the i-th individual (= observation).
Another index of rank correlation was introduced by Kendall (1938), which is known as Kendall's tau, t. The index was defined by

$$t = \frac{2(C - D)}{n(n-1)} \tag{1.4.4}$$

where C denotes the total number of concordant pairs of observations $(X_i, Y_i)$, $i=1,\ldots,n$, and D denotes the total number of discordant pairs. An alternative formula given by Kendall (1970) is

$$t = 1 - \frac{4s}{n(n-1)} \tag{1.4.5}$$

where s is the minimum number of interchanges between neighbors required to transform one ranking to the other. Here, it was assumed that X and Y have continuous marginal distribution functions, so there are no ties among the $X_i$'s or the $Y_i$'s.
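The agreement between (1.4.4) and (1.4.5) can be checked numerically. In the sketch below the number of neighbor interchanges is obtained by counting bubble-sort swaps, which is one way (chosen here for illustration) of finding the minimum; the function names and the data are hypothetical, and no ties are assumed.

```python
from itertools import combinations

def concordant_discordant(x, y):
    """Count concordant (C) and discordant (D) pairs; ties are not expected."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return c, d

def adjacent_interchanges(x, y):
    """Minimum number of neighbour swaps turning the Y order into the X order."""
    seq = [yi for _, yi in sorted(zip(x, y))]   # Y values arranged by increasing X
    swaps = 0
    for end in range(len(seq) - 1, 0, -1):      # plain bubble sort, counting swaps
        for k in range(end):
            if seq[k] > seq[k + 1]:
                seq[k], seq[k + 1] = seq[k + 1], seq[k]
                swaps += 1
    return swaps

x = [1, 2, 3, 4, 5]                             # hypothetical ranks on X
y = [2, 1, 4, 3, 5]                             # hypothetical ranks on Y
n = len(x)
c, d = concordant_discordant(x, y)
tau_pairs = 2 * (c - d) / (n * (n - 1))                          # formula (1.4.4)
tau_swaps = 1 - 4 * adjacent_interchanges(x, y) / (n * (n - 1))  # formula (1.4.5)
```

The two values coincide because, without ties, $C + D = n(n-1)/2$ and the minimum number of neighbor interchanges equals the number of discordant pairs.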
Furthermore, if ties occur, Kendall (1970) proposed an index tau-B,

$$t_B = \frac{C - D}{\sqrt{\{\tfrac{1}{2}n(n-1) - T\}\{\tfrac{1}{2}n(n-1) - U\}}} \tag{1.4.6}$$

where T denotes the total number of tied pairs in one ranking, and U the number of tied pairs in the other, as a modification of the previous tau, which is referred to as tau-A.

Considering the occurrence of ties, Goodman and Kruskal (1954) proposed an index $\gamma$ (gamma) for the rank correlation coefficient. The index was defined by

$$\gamma = \frac{C - D}{C + D} = \frac{C - D}{\tfrac{1}{2}n(n-1) - T_{12}} \tag{1.4.7}$$

where $T_{12}$ denotes the total number of tied pairs among $(X_i, Y_i)$, $i=1,\ldots,n$. In this case, $(X_i, Y_i)$ and $(X_j, Y_j)$ are considered as tied if $X_i = X_j$ or $Y_i = Y_j$ or both.
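The pair counts entering tau-B and gamma can be computed directly from the definitions. The following sketch uses hypothetical data containing ties in both components; the function name is also hypothetical.

```python
from itertools import combinations
from math import sqrt

def pair_counts(x, y):
    """Return C, D, ties in X, ties in Y, and pairs tied in X or Y or both."""
    c = d = tx = ty = t12 = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        tx += dx == 0                  # pair tied in the first ranking
        ty += dy == 0                  # pair tied in the second ranking
        if dx == 0 or dy == 0:
            t12 += 1                   # tied in at least one component
        elif dx * dy > 0:
            c += 1                     # concordant pair
        else:
            d += 1                     # discordant pair
    return c, d, tx, ty, t12

x = [1, 1, 2, 3, 3, 4]                 # hypothetical data with ties
y = [1, 2, 2, 3, 1, 4]
n = len(x)
c, d, tx, ty, t12 = pair_counts(x, y)
total = n * (n - 1) / 2
tau_b = (c - d) / sqrt((total - tx) * (total - ty))   # formula (1.4.6)
gamma = (c - d) / (c + d)                             # formula (1.4.7)
```

For these data $C = 9$, $D = 2$, and four pairs are tied in at least one component, so $C + D = 15 - 4 = 11$, which matches the second form of the denominator of gamma.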
A more general measure of the correlation between two sets of observations, ranked or otherwise, was introduced by Daniels (1948). He introduced the quantity

$$r = \frac{\sum a_{ij} b_{ij}}{\sqrt{\sum a_{ij}^2 \, \sum b_{ij}^2}} \tag{1.4.8}$$

as a correlation coefficient, where $a_{ij}$, $b_{ij}$ are scores assigned to corresponding pairs i,j of the two variables X, Y; and $a_{ij} = -a_{ji}$, $b_{ij} = -b_{ji}$. Special cases of r are Kendall's tau, Spearman's rho and the product-moment correlation.
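The way the three classical coefficients arise as special cases of (1.4.8) can be sketched as follows. The particular score choices (signs of differences for Kendall's tau, raw differences for the product-moment correlation, rank differences for Spearman's rho) are the usual ones but are stated here as assumptions; the function names and data are hypothetical and ties are assumed absent.

```python
from math import sqrt

def general_correlation(x, y, score):
    """sum(a_ij * b_ij) / sqrt(sum(a_ij^2) * sum(b_ij^2)) over all ordered pairs."""
    n = len(x)
    num = sa = sb = 0.0
    for i in range(n):
        for j in range(n):
            a, b = score(x[i], x[j]), score(y[i], y[j])
            num += a * b
            sa += a * a
            sb += b * b
    return num / sqrt(sa * sb)

def sign_score(u, v):          # a_ij = sign(x_j - x_i), giving Kendall's tau
    return (v > u) - (v < u)

def difference_score(u, v):    # a_ij = x_j - x_i, giving the product-moment r
    return v - u

def ranks(values):             # ranks 1..n, assuming no ties
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

x = [1.0, 2.5, 3.1, 4.8, 6.0]  # hypothetical data
y = [2.1, 1.9, 4.2, 3.7, 5.5]
tau = general_correlation(x, y, sign_score)
pearson = general_correlation(x, y, difference_score)
spearman = general_correlation(ranks(x), ranks(y), difference_score)
```

Both score functions are antisymmetric, as required by $a_{ij} = -a_{ji}$. Applying the difference score to the ranks instead of the raw values turns the product-moment correlation into Spearman's rho.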
Daniels and Kendall (1947) have shown that the sample Kendall's tau, t, is an unbiased estimator of the population value $\tau$, i.e.,

$$E(t) = \tau. \tag{1.4.9}$$

Using a U-statistic, Hoeffding (1948) showed that t is an unbiased estimator of $\tau$, a regular functional of F of degree 2.

However, the sample Spearman's rho is not an unbiased estimator of the population value $\rho$. Hoeffding showed that

$$E(r_s) = \{(n-2)\rho + 3\tau\}/(n+1). \tag{1.4.10}$$

Moran (1948), and Durbin and Stuart (1951) presented different formulas showing this bias. The correlation between Kendall's tau and Spearman's rho over all permutations of the sample values was found by Daniels (1944). He obtained

$$\frac{2(n+1)}{\sqrt{2n(2n+5)}}. \tag{1.4.11}$$
All indexes considered above are limited to complete (= uncensored) data. Hence, it is necessary to propose measures of association which are applicable to censored data. Of course, those measures should be applicable to complete data as well.
1.5 Conditional Statistics for Censored Data

We assume that we have two samples of sizes $n_1$ and $n_2$ from populations having (discrete or continuous) cumulative distribution functions F and G respectively. Let $r_1$, $r_2$ be the number of censored elements in the two samples. We denote the samples as follows:

Sample 1:  $X_{11}, X_{12}, \ldots, X_{1,n_1-r_1}$   (uncensored)
           $X^*_{11}, X^*_{12}, \ldots, X^*_{1,r_1}$   (censored)
Sample 2:  $X_{21}, X_{22}, \ldots, X_{2,n_2-r_2}$   (uncensored)
           $X^*_{21}, X^*_{22}, \ldots, X^*_{2,r_2}$   (censored)

Halperin (1960) proposed a nonparametric two sample test, $U_c$, which is an extension of the Wilcoxon-Mann-Whitney test to two samples censored at the same fixed point T. The null hypothesis to be tested was taken to be $F(x) = G(x)$, $-\infty < x < T$, against the alternative $F(x) > G(x)$. He defined the critical value of $U_c$ as the value $u$ for which $P_r\{U_c \le u\} = \alpha$, where the probability was computed over the conditional universe for $r_1 + r_2 = r$ (the observed total number of censored elements). To compute $U_c$, for any value of $r_1$, we use

$$U_c = U(n_1-r_1, n_2-r_2) + r_1(n_2-r_2) \tag{1.5.1}$$

where $U(n_1-r_1, n_2-r_2)$ is the Mann-Whitney U statistic for the uncensored elements of the two samples. Here, we should note that $U_c$ is a conditional statistic, even under the null hypothesis.
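Formula (1.5.1) is straightforward to evaluate. The sketch below assumes that censored observations are recorded at the common point T and that U counts the pairs in which the Sample 1 observation exceeds the Sample 2 observation; this orientation, the function names and the data are assumptions for illustration.

```python
def mann_whitney_u(a, b):
    """Number of (a_i, b_j) pairs with a_i > b_j, counting ties as 1/2."""
    return sum((ai > bj) + 0.5 * (ai == bj) for ai in a for bj in b)

def halperin_uc(sample1, sample2, T):
    """U_c = U(n1 - r1, n2 - r2) + r1 * (n2 - r2) for censoring at the fixed point T."""
    unc1 = [v for v in sample1 if v < T]    # uncensored elements of sample 1
    unc2 = [v for v in sample2 if v < T]    # uncensored elements of sample 2
    r1 = len(sample1) - len(unc1)           # censored elements in sample 1
    return mann_whitney_u(unc1, unc2) + r1 * len(unc2)

T = 10                                      # hypothetical censoring point
sample1 = [2, 5, 7, 10, 10]                 # two elements censored at T
sample2 = [1, 3, 6, 8, 10]                  # one element censored at T
u_c = halperin_uc(sample1, sample2, T)
```

The second term reflects that each of the $r_1$ censored Sample 1 elements is known to exceed every uncensored Sample 2 element, while comparisons between two censored elements carry no information.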
A more general conditional statistic was proposed by Gehan (1965). He gave a distribution-free two-sample test, an extension of the Wilcoxon test to samples with arbitrary censoring on the right. The null hypothesis was $H_0: F(t) = G(t)$, $(t \le T)$, against either $H_1: F(t) < G(t)$, $(t \le T)$, or $H_2: F(t) < G(t)$ or $F(t) > G(t)$, $(t \le T)$. He defined

$$U_{ij} = \begin{cases} +1 & \cdots \\ -1 & \cdots \\ 0 & \text{otherwise} \end{cases} \tag{1.5.2}$$

and calculated the statistic $W = \sum_{i,j} U_{ij}$, where the sum is over all $n_1 n_2$ comparisons. Gehan considered the conditional mean and variance of W under $H_0$, denoted by $E(W \mid P, H_0)$ and $\mathrm{Var}(W \mid P, H_0)$, where P was the pattern of observations. To test $H_0$ against either $H_1$ or $H_2$, a value of

$$Z = \frac{W}{\sqrt{\mathrm{Var}(W \mid P, H_0)}} \tag{1.5.3}$$

was taken as asymptotically normal with zero mean and unit variance.
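A small sketch of the computation of W and Z is given below. It assumes the scoring convention usually attributed to Gehan (1965): $U_{ij} = +1$ when the Sample 1 observation is known to be larger than the Sample 2 observation, $-1$ when it is known to be smaller, and 0 when the pair is tied or not comparable. That sign convention, the permutation form of the variance used here, the function names and the data are all assumptions for illustration.

```python
from math import sqrt

def gehan_score(x, x_cens, y, y_cens):
    """+1 if x is known to exceed y, -1 if known to be smaller, 0 otherwise."""
    if (x > y and not y_cens) or (x == y and x_cens and not y_cens):
        return 1
    if (x < y and not x_cens) or (x == y and y_cens and not x_cens):
        return -1
    return 0

def gehan_w(sample1, sample2):
    """W = sum of U_ij over all n1 * n2 comparisons; items are (time, censored)."""
    return sum(gehan_score(x, xc, y, yc)
               for x, xc in sample1 for y, yc in sample2)

def permutation_variance(sample1, sample2):
    """Variance of W when every split of the pooled data is equally likely."""
    pooled = sample1 + sample2
    n1, n2, N = len(sample1), len(sample2), len(pooled)
    # total score of each observation against the pooled sample (self-score is 0)
    totals = [sum(gehan_score(x, xc, y, yc) for y, yc in pooled)
              for x, xc in pooled]
    return n1 * n2 * sum(u * u for u in totals) / (N * (N - 1))

# hypothetical data: (time, True if censored)
sample1 = [(6, False), (7, True), (10, False), (13, True)]
sample2 = [(1, False), (4, False), (8, False), (11, True)]
w = gehan_w(sample1, sample2)
z = w / sqrt(permutation_variance(sample1, sample2))
```

With no censoring and no ties this permutation variance reduces to $n_1 n_2 (N+1)/3$ with $N = n_1 + n_2$, which is four times the variance of the Mann-Whitney U statistic, as expected since W then equals $2U - n_1 n_2$.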
The conditional variance of W was given in Gehan's paper, including several particular cases. One of them is

$$\frac{N(N-n_1)(N-r)\cdots}{N(N-1)\cdots}$$

with $N = n_1 + n_2$ and $r = r_1 + r_2$, if there are no ties and all censored observations occur after the (N-r)-th uncensored observation, which was considered by Halperin. In this special case, the censoring is at the k (= N-r)-th order statistic of the combined sample of size $N = n_1 + n_2$. This case is considered also by Sobel (1966) and Basu (1967).
Sobel (1966) proposed a statistic

$$V_k^{(N)} = \cdots \tag{1.5.5}$$

where $n_{ij}$ is the number of uncensored observations on $X_i$ among the first j ordered observations of the combined sample, so that

$$n_{1j} + n_{2j} = j, \quad j = 1, 2, \ldots, k. \tag{1.5.6}$$

Another statistic, $T_k^{(N)}$, was introduced by Basu (1967). He defined

$$T_k^{(N)} = \sum_{j=1}^{N} e_j z_j \tag{1.5.7}$$

where

$$z_j = \begin{cases} 1 & \text{if the } j\text{-th ordered observation is an } X_1 \\ 0 & \text{otherwise} \end{cases}$$

$$e_j = \begin{cases} (j-k-1)/N + (k+1)^2/2N^2 & \text{if } 1 \le j \le k \\ (k+1)^2/2N^2 & \text{if } k < j \le N \end{cases}$$

Basu showed that the statistics $V_k^{(N)}$ and $T_k^{(N)}$ are equivalent for testing $H_0: F = G$ against the one-sided (or two-sided) alternative $H_1: F \ne G$. By putting $k = N$, he obtained the relationship of $T_N^{(N)}$ with the Wilcoxon statistic and the Mann-Whitney statistic, U, that is

$$\cdots \tag{1.5.8}$$
Censoring at the k-th order statistic was considered also by Chatterjee and Sen (1973). For a fixed-plan truncation scheme, they proposed a conditional test for $H_0$. Assuming continuity of F and G, under $H_0$, let

$$Z_{N(1)} < Z_{N(2)} < \cdots < Z_{N(N)}$$

be the ordered values of the combined sample and $R_i$ be the rank of $Z_{N(i)}$ among $Z_{N(1)}, \ldots, Z_{N(N)}$; then $(R_1, R_2, \ldots, R_N) = (1, 2, \ldots, N)$. If the experimentation stops at a pre-fixed time point $z$, then the number of (completed) observations is

$$k(z) = \sum_{i=1}^{N} u(z - Z_i) \tag{1.5.9}$$

where $u(t)$ is 1 or 0 according to $t \ge 0$ or $t < 0$. Under $H_0$,

$$P\{k(z) = k\} = \binom{N}{k}\{F(z)\}^k\{1 - F(z)\}^{N-k}, \quad k = 0, 1, \ldots, N \tag{1.5.10}$$

and

$$N^{-1} k(z) \to F(z) \text{ a.s., as } N \to \infty.$$
Furthermore, Chatterjee and Sen proposed the conditional statistic $T_{N k(z)}$ $(= T_N(z))$ below:

$$T_N(z) = \begin{cases} 0, & k(z) = 0 \\ \sum_{i=1}^{k(z)} \cdots, & \cdots \\ T_N, & k(z) = N-1, N \end{cases} \tag{1.5.11}$$

with $\cdots$ and

$$a_N^*(k) = (N-k)^{-1} \sum_{j=k+1}^{N} a_N(j)$$

where $c_1, \ldots, c_N$ are given numbers and $\{a_N(1), \ldots, a_N(N)\}$ represents a set of scores to be suitably chosen, such that

$$\sum_{i=1}^{N} a_N(i) = 0 \quad \text{and} \quad \cdots \tag{1.5.12}$$

For the two sample problem, $c_i$ is 1 or 0 according as $z_i$ comes from the F or G distribution. They also derived, for every $N (\ge 1)$ and $k(z) \ge 0$,

$$E(T_N(z) \mid H_0) = 0 \tag{1.5.13}$$

and

$$\mathrm{Var}(T_N(z) \mid H_0) = V_N(z) \tag{1.5.14}$$

with

$$V_N(z) = \begin{cases} 0, & k(z) = 0 \\ 1 - (N-1)^{-1}\Big[\sum_{i=k(z)+1}^{N} \Big\{a_N(i) - (N-k(z))^{-1} \sum_{j=k(z)+1}^{N} a_N(j)\Big\}^2\Big], & 1 \le k(z) \le N-2 \\ 1, & k(z) = N-1, N \end{cases} \tag{1.5.15}$$

and as $N \to \infty$

$$\{V_{N n^*}\}^{-1/2}\, T_N(z) \xrightarrow{L_0} N(0,1) \tag{1.5.16}$$

where $n^* = [N F(z)]$, $0 < F(z) \le 1$; and $L_0$ stands for the convergence in law under $H_0$. A special case of this general formulation was presented by Davis (1978). Taking $a_N(i) = i$, Davis proposed a two sample Wilcoxon type statistic for analyzing the data for which the $pN$ $(0 < p \le 1)$ smallest observations are to be observed sequentially.
1.6 Outline of the Present Work

Considering a two-sample life testing problem, pair charts for right censored data are developed as an extension of the pair chart of Quade (1973). This extension will be called the Censored Pair Chart (CPC). Two types of CPC are proposed, referred to as CPC-I and CPC-II. The construction of the CPC-I is based on the original data having complete and incomplete observations. The CPC-II, however, is constructed from transformed data, which are treated as complete data. The main purpose of the CPC-I is to present Gehan's (1965) W statistic as a chart, based on either grouped or ungrouped data. Some other statistics, such as the triplet statistics and the maximum distance, D, statistic, can be calculated based on a CPC. This topic is discussed in Chapter II. Moreover, this chapter also presents the distribution of, and a large sample approximation to, the D statistic. However, the large sample approximation is limited to the case of equal sample sizes. The derivation of the distribution of the D statistic can be considered as an application of lattice path counting as treated in Mohanty (1979). We also consider using Hodges' (1958) method in calculating the level of significance for the D statistic. This method is associated with the well known Pascal triangle, and is important especially for unequal sample sizes.
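The idea of obtaining a null distribution by lattice path counting with a Pascal-triangle recursion can be sketched as follows. The precise definition of the D statistic belongs to Chapter II and is not reproduced here, so this sketch assumes, purely for illustration, a Smirnov-type distance $\max |i/n_1 - j/n_2|$ over the points $(i, j)$ of the path; the function names are hypothetical as well.

```python
from fractions import Fraction
from math import comb

def paths_within(n1, n2, bound):
    """Count paths whose every point (i, j) satisfies |i/n1 - j/n2| <= bound."""
    count = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if abs(Fraction(i, n1) - Fraction(j, n2)) > bound:
                continue                      # point lies outside the band
            if i == 0 and j == 0:
                count[i][j] = 1               # the path starts at the origin
            else:
                # Pascal-triangle step: arrive from the left or from below
                left = count[i - 1][j] if i > 0 else 0
                below = count[i][j - 1] if j > 0 else 0
                count[i][j] = left + below
    return count[n1][n2]

def prob_distance_at_most(n1, n2, bound):
    """P(distance <= bound) when all C(n1 + n2, n1) paths are equally likely."""
    return Fraction(paths_within(n1, n2, bound), comb(n1 + n2, n1))

p_equal = prob_distance_at_most(2, 2, Fraction(1, 2))     # equal sample sizes
p_unequal = prob_distance_at_most(3, 2, Fraction(1, 2))   # unequal sample sizes
```

Exact fractions are used so that the boundary comparison is not affected by rounding, which matters when the sample sizes are unequal and the attainable distances are not multiples of a common unit.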
Chapter III presents an extension of Kendall's (1938) tau to a right censored bivariate variable. This extension will be called a generalized Kendall's tau (GKT), which can be represented as a U-statistic of Hoeffding (1948). Hence, the distribution of the GKT can be approximated by a normal distribution function. We propose two types of GKT, the unconditional GKT (UGKT) and the conditional GKT (CGKT).

By extending the definition of concordant and discordant pairs of observations to right censored bivariate data, the GKT's can be written as functions of concordant and discordant pairs.

The distribution of the UGKT is discussed in detail under a null hypothesis of independence and a sequence of alternative hypotheses. Since censored data are associated with two variables, a true variable X and a censoring variable Y, we define a variable Z = Min(X,Y) and an indicator variable, presented in Section 3.8. Then we study the distribution function of Z related to that of X based on a dependence function, introduced by Sibuya (1960). Examples are given under the assumption that X has a bivariate normal distribution function.

Now, for the CGKT, its conditional distribution is studied for given general patterns of observations on both components of bivariate data. This general pattern was introduced by Gehan (1965). The distribution function of the CGKT is considered under the assumption that all possible orderings of observations which lead to the fixed given patterns are equally likely. A detailed discussion is given for a particular case, in which all observations on one component are uncensored, and they have a natural ordering. Examples are given to illustrate the variabilities in calculating the d.f. of the CGKT. Even in this special case, it seems impossible to obtain an explicit expression for the d.f. of the CGKT, even when the sample size is not very large. The complexity of deriving this d.f. is outlined.
In Chapter IV, we propose two statistical procedures. The first procedure is an extension of the GKT's to right censored multivariate data, and the second procedure is for sector symmetry or interchangeability of an m-variate variable.
The extension of the GKT's to the censored m-variate variable will be presented as a vector statistic, which can be considered as an extension of the vector statistic of Simon (1977). The components of this vector statistic are the GKT's of all possible pairs of components and Simon's indexes for groups of more than two components. The latter will be called the Generalized Simon's Statistic (GSS).
Based on a value-case diagram, which is an extension of the sign-case diagram of Simon (1977), a matrix equation is developed for the vector statistic. Considering its components, in particular the GSS, we propose concordant and discordant properties for m-variate pairs. Then, the GSS can be represented as a function of concordant and discordant pairs.
As an application of the UGKT of the bivariate variable, in
this chapter we consider pairwise independence among the m variates.
Finally, based on the m-variate dependence function, proposed by
Sen (1967), we study the non-null distribution of the GSS.
Now, for the second procedure, we introduce a necessary and sufficient condition for the sector symmetry. First, we develop a chi-squared statistic and an index of symmetry for testing the null hypothesis that the m-variate variable is sector symmetric, based on complete data. Then, this is extended to incomplete (censored) data sets. As in Chapter II, that is the CPC, the analysis of censored data can be done based on the transformed data, in which the data are treated as a complete data set, or the original data.
The first analysis is appropriate for testing that the true variable is symmetric, by introducing additional restrictions. For the second kind of analysis, we propose a conditional chi-squared statistic. And for a particular case, in which only one component may be censored, we propose other statistical tests, in particular a chi-squared test based on a weighted linear combination method. The distribution of this chi-square is studied under a null hypothesis and a sequence of alternative hypotheses, including its large sample approximation.
Chapter V presents the use of the CPC for K-sample problems in testing the null hypothesis that the K samples have the same d.f. Here, we consider two possible approaches: (i) using CPC's in pairwise comparisons, and (ii) using CPC's to present Breslow's (1970) statistic.
For K = 3, we develop a three dimensional chart as a descriptive statistic for three-sample problems. This chart is called a triplet chart. The limitation of this triplet chart is that it should be constructed based on a complete data set or a censored data set which could be treated as a complete data set. Examples are given for small sample sizes to illustrate the construction of the triplet chart and the use of the triplet chart as a descriptive statistic.
In studying the properties of a triplet chart, we consider the two kinds of orthogonal projections of its path: (i) its projections on the three coordinate planes, and (ii) its projection on the plane $X_1 + X_2 + X_3 = 0$, where $X_i$ is the i-th coordinate axis.
For equal sample sizes, we introduce a maximum distance, D, statistic for three sample problems. The distribution of this D statistic is studied using the second kind of projection of the corresponding triplet charts. Here, the d.f. of the D statistic will be represented by the number of paths which lead to a certain value of D. Moreover, we also develop some recursive formulas.
In Chapter VI, we have the applications of the CPC and the
UGKT to censored data in demography.
Finally, in Chapter VII, we have concluding remarks and
suggestions for further research.
CHAPTER II
THE PAIR CHART FOR GENERAL SINGLY
CENSORED TWO SAMPLE PROBLEMS
2.1
Introduction
This chapter deals with an extension of the pair chart, as presented by Quade (1973), to right censored data. The extension will be called a censored pair chart (CPC). For the case of left censoring, the data can be transformed into the right censoring problem, if one multiplies the observed values by -1. Hence, the discussion is limited to the general right censored two sample problems.
Let $X_{ij}$, $j=1,\ldots,n_i-r_i$; $X^*_{ik}$, $k=1,\ldots,r_i$; be a random sample of size $n_i$ on variable $X_i$ with unknown distribution function (d.f.) $F_i$, $i=1,2$. So, each sample has two types of observations: (i) the complete (uncensored) observation $X_{ij}$, that is the true value for the j-th individual or subject in the i-th sample; and (ii) the incomplete (censored) observation $X^*_{ik}$, that is the value or point at which the observation is terminated or censored for the k-th individual in the i-th sample.
Two types of CPC's will be introduced. The first is a CPC which is directly applicable for computing the W statistic of Gehan (1965). This W provides a conditional test for
$$H_0: F_1(x) = F_2(x) \quad (x \le T) \quad \text{against} \quad H_1: F_1(x) \ne F_2(x) \quad (x \le T) \tag{2.1.1}$$
where T is the upper bound of the observed values. Gehan defined
$$U_{ij} = \begin{cases} -1 & \text{if } X_{1i} < X_{2j} \text{ or } X_{1i} \le X^*_{2j} \\ \;\;\,0 & \text{if } X_{1i} = X_{2j} \text{ or } (X^*_{1i}, X^*_{2j}) \text{ or } X^*_{1i} < X_{2j} \text{ or } X^*_{2j} < X_{1i} \\ +1 & \text{if } X_{1i} > X_{2j} \text{ or } X^*_{1i} \ge X_{2j} \end{cases} \tag{2.1.2}$$
and calculated the generalized Wilcoxon statistic $W = \sum\sum U_{ij}$, where the sum is over the $n_1 n_2$ comparisons.
To construct the CPC, without loss of generality, we may assume that $X_{ij}$, $j=1,\ldots,n_i-r_i$, is increasing for each $i=1,2$; but $X^*_{ik}$, $k=1,\ldots,r_i$, is decreasing for each i. Different orderings between the $X_{ij}$'s and the $X^*_{ik}$'s within each sample are needed in order to obtain two polygons on the $X_i$-side, $i=1,2$, with areas $A(X_i)$, which are the statistics considered later.
Furthermore, as in Quade (1973), we draw a rectangle of width $n_1$ units and height $n_2$ units. So we have a rectangle of size $n_1 n_2$. This rectangle is divided into four sub-rectangles of sizes $(n_1-r_1)(n_2-r_2)$, $(n_1-r_1)r_2$, $r_1(n_2-r_2)$, and $r_1 r_2$, as shown in Figure 2.1.

Figure 2.1. The Format of the Censored Pair Charts
[Layout of the $n_1 \times n_2$ rectangle: the $(n_1-r_1)(n_2-r_2)$ sub-rectangle at the lower left, $r_1(n_2-r_2)$ at the lower right, $(n_1-r_1)r_2$ at the upper left, and $r_1 r_2$ at the upper right.]
Each sub-rectangle is further subdivided into unit squares. Hence we have $n_1 n_2$ squares in total. Only the first three sub-rectangles are needed for the construction of the censored pair chart, which will be called the CPC of the first type (CPC-I). In each sub-rectangle, we can construct a pair chart as the pair chart of Quade, as follows.
In the $(n_1-r_1)(n_2-r_2)$ sub-rectangle, if the smallest observation in the combined uncensored samples is an $X_1$, draw a line from the lower left corner of this rectangle one unit to the right; if it is an $X_2$, draw the line one unit up instead. From the end of the first line draw a second line, one unit to the right if the smallest remaining observation is an $X_1$, one unit up if it is an $X_2$. Continue in the same manner for all $(n_1-r_1)+(n_2-r_2)$ uncensored observations.
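The construction in the uncensored sub-rectangle can be sketched in a few lines of Python. This is only an illustrative sketch under the assumption that there are no ties between the two samples; the function names `pair_chart_path` and `squares_right_of_path` and the small data set are hypothetical and do not come from the text.

```python
def pair_chart_path(x1, x2):
    """Lattice points of the pair chart path for two tie-free uncensored samples."""
    # Tag every observation with its sample, then sort the combined sample.
    combined = sorted([(v, 1) for v in x1] + [(v, 2) for v in x2])
    a, b = 0, 0                      # current position: a units right, b units up
    path = [(a, b)]
    for _, sample in combined:
        if sample == 1:
            a += 1                   # smallest remaining value is an X1: go right
        else:
            b += 1                   # smallest remaining value is an X2: go up
        path.append((a, b))
    return path


def squares_right_of_path(path, n1):
    """Unit squares to the right of the path, summed over all rows."""
    total = 0
    for (a0, b0), (a1, b1) in zip(path, path[1:]):
        if b1 == b0 + 1:             # an up-step opens a new row at column a0
            total += n1 - a0         # squares of that row lying right of the path
    return total


# Hypothetical tie-free data, for illustration only.
x1 = [3, 7, 9]
x2 = [1, 5, 8, 10]
path = pair_chart_path(x1, x2)
area_right = squares_right_of_path(path, len(x1))
pairs_x1_larger = sum(1 for u in x1 for v in x2 if u > v)
```

In this sketch the area to the right of the path equals the number of pairs in which the $X_1$ value exceeds the $X_2$ value, which is the counting interpretation the chart is meant to provide.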
Next, in the $(n_1-r_1)r_2$ sub-rectangle, if the smallest observation in the combined subsamples $X_{1j}$, $j=1,\ldots,n_1-r_1$, and $X^*_{2k}$, $k=1,\ldots,r_2$, is an $X_1$, draw a line one unit to the right from the upper left corner; otherwise draw a line one unit down. From the end of the first line draw a second line, one unit to the right if the smallest remaining observation is an $X_1$, and one unit down otherwise. Continue in the same manner for all $(n_1-r_1)+r_2$ observations.
Finally, in the $r_1(n_2-r_2)$ sub-rectangle, if the smallest observation in the combined sub-samples $X^*_{1k}$, $k=1,\ldots,r_1$, and $X_{2j}$, $j=1,\ldots,n_2-r_2$, is an $X_2$, draw a line one unit up from the lower right corner; otherwise, draw the line one unit to the left. From the end of the first line draw a second line, one unit up if the smallest remaining observation is an $X_2$, and one unit to the left otherwise. Continue in the same manner for all $r_1+(n_2-r_2)$ observations.
The paths of the three pair charts and the left and the lower sides of the $n_1 n_2$ rectangle form two polygons with areas $A(X_1)$ and $A(X_2)$ unit squares. As a descriptive statistic, a large difference between $A(X_1)$ and $A(X_2)$ would indicate that $X_1$ and $X_2$ do not have the same distribution functions. Thus, we reject the null hypothesis.
With the aid of the CPC, we can calculate the W statistic of Gehan as $A(X_1) - A(X_2)$. This will be discussed in section 2.2 in more detail.
The other type of CPC, which will be called CPC-II, is constructed as follows. Let $X_{ij}$ be the true observation for the j-th individual in the i-th sample ($j=1,\ldots,n_i$; $i=1,2$) from the random variable $X_i$ with c.d.f. $F_i$. Since this observation may be censored by a variable $T_{ij}$, it cannot always be observed. But we observe
$$X'_{ij} = \min(X_{ij}, T_{ij}) \tag{2.1.3}$$
along with the indicator variable
$$\delta_{ij} = \begin{cases} 0 & \text{if } X'_{ij} = X_{ij} \text{ (uncensored)} \\ 1 & \text{otherwise (censored)} \end{cases} \tag{2.1.4}$$
which shows whether or not $X'_{ij}$ is in fact uncensored.
Assuming $X_i$ and $T_i$ are independent random variables, we would have
$$P(X'_i > x) = P(\min(X_i, T_i) > x) \tag{2.1.5}$$
Hence
$$1 - F'_i(x) = [1 - F_i(x)]\,[1 - G_i(x)] \tag{2.1.6}$$
where $F_i$ and $G_i$ are the c.d.f.'s of $X_i$ and $T_i$ respectively, and $F'_i$ is the c.d.f. of $X'_i$. This shows that, in general, $F'_1(x) \ne F'_2(x)$ under the null hypothesis $H_0$ in (2.1.1). However, if we accept the assumption that $G_1(x) = G_2(x)$, then $F'_1(x) = F'_2(x)$ under $H_0$. In this special case, we would test the null hypothesis
$$H_0: F'_1(x) = F'_2(x) \tag{2.1.7}$$
using the usual statistics, such as the Kolmogorov-Smirnov statistic and the pair chart. Hence, based on $X'_{ij}$, $j=1,\ldots,n_i$, $i=1,2$, we can construct the second type of the CPC (CPC-II), which is the same as that of Quade. However, the indicator variables $\delta_{ij}$ may be considered in addition to this pair chart. This and the comparison between CPC-I and CPC-II will be discussed in section 2.3.
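The role of the equal-censoring assumption can be illustrated numerically. The sketch below assumes, purely for illustration, that the true and censoring variables are independent exponentials, so that the survival function of the observed minimum is the product of the two survival functions; the function name `observed_survival` and all rates are hypothetical.

```python
import math


def observed_survival(x, hazard_x, hazard_t):
    """P(min(X, T) > x) for independent exponential X and T (an assumed model)."""
    surv_x = math.exp(-hazard_x * x)    # 1 - F(x)
    surv_t = math.exp(-hazard_t * x)    # 1 - G(x)
    return surv_x * surv_t              # product form under independence


x = 2.0
true_rate = 0.5                          # same true distribution in both samples (H0)

# Different censoring rates: the observed distributions differ although F1 = F2.
s1_unequal = observed_survival(x, true_rate, 0.1)
s2_unequal = observed_survival(x, true_rate, 0.4)

# Equal censoring rates: the observed distributions coincide under H0.
s1_equal = observed_survival(x, true_rate, 0.2)
s2_equal = observed_survival(x, true_rate, 0.2)
```

Under this assumed model, a test applied to the observed values alone reflects differences in censoring as well as in the true distributions, which is why the equal-censoring assumption is needed before the usual two-sample statistics are used.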
2.2. Censored Pair Chart of Type-I
2.2.1
Discussion
For discussion, the data of Freireich et al. (1963) considered by Gehan (1965) and Breslow (1970) will be re-analyzed using a censored pair chart. The data are given in Table 2.1.

Table 2.1. The data of Freireich et al. (1963)
Sample 1: 6, 6, 6, 7, 10, 13, 16, 22, 23, 35*, 34*, 32*, 32*, 25*, 20*, 19*, 17*, 11*, 10*, 9*, 6*.
Sample 2: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23.
The first sample consists of lengths of remission in weeks for 21 patients with acute leukemia, maintained on the drug 6-MP, in which 12 are censored on the right. The second sample consists of the lengths of unmaintained remission for 21 patients having complete observations. The CPC-I of these data is given in Figure 2.2. This figure shows a large difference between $A(X_1)$ and $A(X_2)$, which indicates that the treatment group (maintained patients) and the control group (unmaintained patients) do not have the same distribution functions.

Figure 2.2. The CPC-I for the data of Freireich et al. (1963)
[Chart not reproduced. Horizontal axis (Sample 1): 6, 6, 6, 7, 10, 13, 16, 22, 23, 35*, 34*, 32*, 32*, 25*, 20*, 19*, 17*, 11*, 10*, 9*, 6*. Vertical axis (Sample 2), bottom to top: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23.]
From Figure 2.2, we can calculate $A(X_1)=336$ and $A(X_2)=65$. Hence, W = 336 - 65 = 271, which is in fact the same as the result of Gehan. However, Gehan calculated the statistic as W = 335 - 64 = 271. The difference in calculation is the effect of the ties between uncensored observations, i.e., 22 and 23, in the two samples. As defined by Quade (1973), in a pair chart the square(s) corresponding to the ties are equally divided between $A(X_1)$ and $A(X_2)$. If there are no ties, $A(X_1)$ (or $A(X_2)$) is the total number of times the observed values on $X_1$ (or $X_2$) are larger than those on $X_2$ (or $X_1$).
Here, we should note that ties cannot occur between a censored observation and an uncensored one.
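The two ways of arriving at W = 271 can be checked with a direct pairwise count. The following is a minimal sketch that applies Gehan's scores of (2.1.2) to the data of Table 2.1; the names `gehan_score` and `gehan_summary` and the (value, censored) encoding are illustrative choices, not part of the original text.

```python
# (value, censored) pairs; censored=True marks a right censored observation.
SAMPLE1 = ([(v, False) for v in (6, 6, 6, 7, 10, 13, 16, 22, 23)] +
           [(v, True) for v in (35, 34, 32, 32, 25, 20, 19, 17, 11, 10, 9, 6)])
SAMPLE2 = [(v, False) for v in (1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
                                11, 11, 12, 12, 15, 17, 22, 23)]


def gehan_score(x1, cens1, x2, cens2):
    """Gehan's score for one (sample 1, sample 2) pair, following (2.1.2)."""
    if not cens1 and not cens2:
        return (x1 > x2) - (x1 < x2)      # +1, 0 or -1 for two uncensored values
    if cens1 and not cens2:
        return 1 if x1 >= x2 else 0       # censored X1 at or beyond an uncensored X2
    if cens2 and not cens1:
        return -1 if x1 <= x2 else 0      # uncensored X1 at or before a censored X2
    return 0                              # both censored: not comparable


def gehan_summary(sample1, sample2):
    """Counts of +1 scores, -1 scores, and ties between uncensored values."""
    plus = minus = ties = 0
    for x1, c1 in sample1:
        for x2, c2 in sample2:
            score = gehan_score(x1, c1, x2, c2)
            plus += score == 1
            minus += score == -1
            ties += (not c1) and (not c2) and x1 == x2
    return plus, minus, ties


plus, minus, ties = gehan_summary(SAMPLE1, SAMPLE2)
w = plus - minus                  # Gehan's form of the calculation
area_x1 = plus + ties / 2         # pair chart areas: tied squares split equally
area_x2 = minus + ties / 2
```

The raw counts correspond to Gehan's 335 and 64, while splitting the two tied squares equally gives the pair chart areas 336 and 65; both differences equal 271.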
2.2.2
Alternative expression for W
Observing the CPC-I, $A(X_1)$ and $A(X_2)$ can be written as
$$A(X_i) = U(X_i) + C(X_i), \quad i=1,2 \tag{2.2.1}$$
where $U(X_1)$ and $U(X_2)$ are the Mann-Whitney U-statistics corresponding to the uncensored observations, with $U(X_1) + U(X_2) = (n_1-r_1)(n_2-r_2)$, and $C(X_1)$ (or $C(X_2)$) is the total number of times the $X^*_{1i}$'s (or $X^*_{2j}$'s) are larger than the $X_{2j}$'s (or $X_{1i}$'s).
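The decomposition of each area into an uncensored part U and a censored part C can be sketched as follows for the data of Table 2.1. The helper name `u_and_c` is hypothetical; ties between uncensored values are split equally, and a censored value is counted as larger than an uncensored value it equals or exceeds, in line with (2.1.2).

```python
UNC1 = [6, 6, 6, 7, 10, 13, 16, 22, 23]                      # sample 1, uncensored
CEN1 = [35, 34, 32, 32, 25, 20, 19, 17, 11, 10, 9, 6]        # sample 1, censored
UNC2 = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
        11, 11, 12, 12, 15, 17, 22, 23]                      # sample 2, uncensored
CEN2 = []                                                    # sample 2 has no censoring


def u_and_c(unc_a, cen_a, unc_b):
    """U: uncensored a versus uncensored b (ties count 1/2); C: censored a >= uncensored b."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in unc_a for b in unc_b)
    c = sum(1 for a in cen_a for b in unc_b if a >= b)
    return u, c


u1, c1 = u_and_c(UNC1, CEN1, UNC2)
u2, c2 = u_and_c(UNC2, CEN2, UNC1)
a1, a2 = u1 + c1, u2 + c2
w = a1 - a2
# Equivalent form using U(X1) + U(X2) = (n1 - r1)(n2 - r2).
w_alt = 2 * u1 + (c1 - c2) - len(UNC1) * len(UNC2)
```

Both expressions for W give the same value, since the two U-statistics sum to the number of squares in the uncensored sub-rectangle.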
In Figure 2.2, we have $U(X_1)=124$, $U(X_2)=65$, $C(X_1)=212$, and $C(X_2)=0$.
Thence the W-statistic of Gehan can be written as
$$W = (U(X_1) - U(X_2)) + (C(X_1) - C(X_2)) = 2U(X_1) + (C(X_1) - C(X_2)) - (n_1-r_1)(n_2-r_2) \tag{2.2.2}$$
2.2.3.
The triplets statistic
Let $N(X_1,X_2,X_1)$ (or $N(X_2,X_1,X_2)$) be the number of ways in which it is possible to choose from the data 2 of the $X_1$'s and 1 of the $X_2$'s (or 2 of the $X_2$'s and 1 of the $X_1$'s) such that the $X_2$ lies between the $X_1$'s (or the $X_1$ lies between the $X_2$'s). Using Quade's formulas, with the aid of a CPC-I, we can calculate the values of the triplets $N(X_1,X_2,X_1)$ and $N(X_2,X_1,X_2)$. Note that the two observed polygons in a CPC-I have a common boundary, that is the path of the pair chart corresponding to the complete (uncensored) observations. In the j-th row, $j=1,\ldots,n_2-r_2$, let $L_j$ be the number of unit squares to the left of this path, and $R^*_j$ the number to the right of it which belong to $A(X_1)$; if some unit squares, say $Q_j$ of them, lie within a box corresponding to the ties between the $X_1$'s and $X_2$'s, count these as equally divided between $L_j$ and $R^*_j$. Then
$$N(X_1,X_2,X_1) = \sum_{j=1}^{k} L_j R^*_j - \sum_{j=1}^{k} Q_j(Q_j+2)/12, \quad k = n_2 - r_2 \tag{2.2.3}$$
Similarly,
$$N(X_2,X_1,X_2) = \sum_{i=1}^{m} B_i A^*_i - \sum_{i=1}^{m} H_i(H_i+2)/12, \quad m = n_1 - r_1 \tag{2.2.4}$$
where $B_i$, $A^*_i$, and $H_i$ are the numbers of unit squares below the path, above it belonging to $A(X_2)$, and boxed, respectively, within the i-th column.
Extending the statistic of Crouse and Steffens (1969), as noted by Quade, let
$$V = N(X_1,X_2,X_1) - N(X_2,X_1,X_2)$$
then V is a conditional statistic for testing $H_0: F_1(x)=F_2(x)$ against suspected differences in scale, assuming that $X_1$ and $X_2$ do not differ in location.
Using Figure 2.2, for the data of Freireich et al., we obtain:
$N(X_1,X_2,X_1)$ = 9(0)(21) + 4(4)(10) + 2(5)(13) + 2(5)(12) + (6)(11) + (7)(10) + (7.5)(6.5) + (8.5)(5.5) - 2(1(1+2)/12) = 641
$N(X_2,X_1,X_2)$ = 4(9)(12) + (13)(8) + (17)(4) + (18)(3) + (19.5)(1.5) + (20.5)(0.5) - 2(1(1+2)/12) = 694
Hence, V = 641 - 694 = -53
2.3. Censored Pair Chart of Type-II
2.3.1.
Discussion
As noted in section 2.1, a CPC-II is in fact a pair chart corresponding to the observed values on the random variables $X'_i = \min(X_i,T_i)$, $i=1,2$. However, the interpretation of the chart may depend on the indicator variables $\delta_i$ (or $\delta_{ij}$, $j=1,\ldots,n_i$, $i=1,2$). The construction of the chart is the same as that of Quade, since we consider all observations on $X'_i$ as uncensored observations. However, in addition to these observed values, on the $X'_i$-axis we would have the corresponding values of $\delta_i$. Figure 2.3 shows the CPC-II of the Freireich et al. data.

Figure 2.3. The CPC-II for the data of Freireich et al. (1963)
[Chart not reproduced. Each axis lists the ordered observed values together with the corresponding values of $\delta$. Horizontal axis ($X'_1$): 6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35. Vertical axis ($X'_2$), bottom to top: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23, all with $\delta = 0$.]
On a CPC-II, we observe the Mann-Whitney type U-statistic $U(X'_1)$ (or $U(X'_2)$) with $U(X'_1) + U(X'_2) = n_1 n_2$, where $n_i$ is the sample size on $X'_i$, $i=1,2$. If there are no ties between $X'_{1j}$ and $X'_{2k}$, then $U(X'_1)$ (or $U(X'_2)$) is the total number of times the $X'_{1j}$'s (or $X'_{2k}$'s) are larger than the $X'_{2k}$'s (or $X'_{1j}$'s). This U-statistic is the usual Mann-Whitney statistic for testing the null hypothesis $H_0: F'_1(x) = F'_2(x)$, under the assumption that $G_1(x) = G_2(x)$, i.e., the two samples have the same censoring function. In computing this statistic, we ignore completely the indicator variables $\delta_i$, $i=1,2$.
Considering the data of Freireich et al., Figure 2.3 shows a 'large' value of $(U(X'_1) - U(X'_2))$. As a descriptive statistic, this figure suggests the rejection of $H_0$. The assumption of equal censoring for these data was noted by Breslow (1970). With the aid of Figure 2.3, we can calculate $U(X'_1) = 334.5$, and then $U(X'_2) = 441 - 334.5 = 106.5$. Note that the boxes corresponding to the ties between $X'_{1j}$ and $X'_{2k}$ at the observed values 11, 17, 22 and 23 are equally divided between $U(X'_1)$ and $U(X'_2)$.
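The value read from the CPC-II can be reproduced by treating every observed value as uncensored. A minimal sketch, assuming tied pairs contribute one half to each statistic; the function name `mann_whitney_u` is an illustrative choice.

```python
# Observed values X' = min(X, T); the censoring indicators are deliberately ignored.
OBS1 = [6, 6, 6, 7, 10, 13, 16, 22, 23,
        35, 34, 32, 32, 25, 20, 19, 17, 11, 10, 9, 6]
OBS2 = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
        11, 11, 12, 12, 15, 17, 22, 23]


def mann_whitney_u(a, b):
    """Number of pairs with a-value larger than b-value, tied pairs counted as 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


u1 = mann_whitney_u(OBS1, OBS2)
u2 = mann_whitney_u(OBS2, OBS1)
tied_pairs = sum(1 for x in OBS1 for y in OBS2 if x == y)
```

The five tied pairs occur at the values 11 (two pairs), 17, 22 and 23, and the two statistics sum to $n_1 n_2 = 441$.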
Thence, based on this value of $U(X'_1) = 334.5$, we may use the usual Mann-Whitney U-statistic to test $H_0: F'_1(x) = F'_2(x)$ against the alternative $F'_1(x) \ne F'_2(x)$.
2.3.2.
The conditional Mann-Whitney U-statistic
In section 2.3.1, we proposed the (unconditional) statistic $U(X'_1)$ (or $U(X'_2)$), which is independent of the indicator variables $\delta_i$, $i=1,2$, for testing the hypothesis $H_0: F_1(x) = F_2(x)$ under the assumption $G_1(x) = G_2(x)$, i.e., the two samples have the same censoring function. Without this assumption the statistic $U(X'_i)$ would not be appropriate for testing $H_0$. However, the CPC-II can be used as an aid for computing the conditional Mann-Whitney U-statistics:
(i). $U(X'_i \mid \delta_1, \delta_2)$
(ii). $U(X'_i \mid \delta_i) = \sum_{\delta_{i'}} U(X'_i \mid \delta_i, \delta_{i'})$, $i \ne i' = 1,2$
(iii). $U(X'_i \mid \delta_{i'}) = \sum_{\delta_i} U(X'_i \mid \delta_i, \delta_{i'})$, $i \ne i' = 1,2$
(2.3.1)
This is accomplished by considering the ordered pairs $(X'_{ij}, \delta_{ij})$, $j=1,\ldots,n_i$, $i=1,2$, as the $n_1, n_2$ observations with $\delta_{ij} = 0$ or 1 according to whether $X'_{ij}$ is uncensored or censored. In this case, for the ties between $X'_1$ and $X'_2$, we define
(2.3.2)
Thence, if ties occur, the box(es) corresponding to the ties in a
CPC-II should be adjusted accordingly.
If there are no ties between
Now, considering the statistics $U(X_i)$, $C(X_i)$, and $A(X_i)$, $i=1,2$, as proposed in Section 2.2, it is easy to verify that
$$U(X'_i \mid \delta_1 = \delta_2 = 0) = U(X_i)$$
$$U(X'_i \mid \delta_i = 1, \delta_{i'} = 0) = C(X_i), \quad i \ne i'$$
$$U(X'_i \mid \delta_{i'} = 0) = A(X_i), \quad i \ne i' \tag{2.3.4}$$
Hence the W-statistic of Gehan can be written as
$$W = U(X'_1 \mid \delta_2 = 0) - U(X'_2 \mid \delta_1 = 0) \tag{2.3.5}$$
This shows that W is in fact a conditional statistic, that is a difference between two conditional U-statistics corresponding to the
2.4
The Pair Chart for Categorical Data
Censored categorical data in two-sample life testing problems would have the following format

            Sample 1              Sample 2
 Class   un-cen.    cen.       un-cen.    cen.
   j      f_1j      c_1j        f_2j      c_2j

where
f_ij = the number of uncensored observations in the i-th sample belonging to the j-th class (time interval)
c_ij = the number of censored observations in the i-th sample belonging to the j-th class
i = 1, 2; j = 1, ..., k.
In order to construct a censored pair chart for these data, we consider all uncensored observations in the j-th class as having the same value for each j. Without loss of generality, we may define j as the value of the uncensored observations in the j-th time interval, provided that the time intervals are increasing with j. For the censored observations in the j-th time interval, we define c* as their value. The value of c* will be determined by the method of counting chosen for the occurrence of the censored observations. Here, we will consider the following four possible methods:
(1) Gehan's method. He considered the censored observations as tied in the j-th class, and counted them as occurring after class (j-1) but before j. Using our notation, (j-1)* is defined to be the value of the censored observations in the j-th class.
(2) The censored observations in the j-th class are considered as tied with observed values j*. The reason for this assumption is that a right-censored observation in the j-th class is considered as occurring after the uncensored observed values in the j-th class.
(3) We define the time intervals corresponding to all uncensored observations. The censored observations are divided as follows. If $X^*_i$ is a censored observation, and $m_j$ is the mid-point of the j-th time interval such that $m_j < X^*_i < m_{j+1}$, then j* is defined to be its value, or $X^*_i$ is counted as occurring after the j-th interval but before the (j+1)-th.
(4) The censored observations in the j-th interval are considered as tied, having the same values as those of the uncensored observations, that is j, but the number of censored observations, $c_{ij}$, is reduced to $c_{ij}/2$ if $c_{ij}$ is even, and $(c_{ij}+1)/2$ if it is odd. In this case, the censored data are in fact transformed into uncensored data.
Considering fixed time-intervals, or the same time-intervals for the four methods, let
$$W_k = (U_k(X_1) - U_k(X_2)) + (C_k(X_1) - C_k(X_2)) \tag{2.4.1}$$
be the value of the W statistic computed using the k-th method of counting. Since in the fourth method the censored observations are transformed into uncensored observations, $C_4(X_1) - C_4(X_2)$ will be defined as the value contributed by those observations in the transformed data. It is clear that $U_k(X_1) - U_k(X_2)$ would have the same value for all k. But the $C_k(X_1) - C_k(X_2)$ vary. It is easy to verify that the $C_k(X_i)$, $i=1,2$; $k=1,2,3,4$, satisfy the following inequalities
(i) $C_1(X_i) \le C_3(X_i) \le C_2(X_i)$
(ii) $C_4(X_i) < C_3(X_i)$
(2.4.2)
In (i), $C_1(X_i) = C_3(X_i)$ (or $C_3(X_i) = C_2(X_i)$) is attainable if the censored observations occur within the first (or second) half of the j-th time interval, for all j. However, $C_1(X_i)$ is strictly less than $C_2(X_i)$. So, we may consider the third method as a 'compromise' between the first two methods. Now, how about the values of $W_k$? Based on these inequalities we could not derive any general relation between the $W_k$'s, except for the following particular case.
If all observations on $X_2$ are uncensored, hence $C_k(X_2)=0$, using the same time-intervals, then $W_k$ is a function of $C_k(X_1)$ for each k. Thence
(1) $W_1 \le W_3 \le W_2$
(2) $W_4 < W_3$
(2.4.3)
As an illustration, we will consider again the data of Freireich et al. in Table 2.1, which were analyzed as grouped data by Gehan (1965). Following Gehan's procedure, we would have the grouped data as in Table 2.2.

Table 2.2. Grouped data of Freireich et al. (1963)

                      Sample 1 (6-MP)                Sample 2 (Placebo)
 Class  Interval     un-cen.       cen.             un-cen.       cen.
   j    (weeks)     f_1j  v_1j   c_1j  v*_1j       f_2j  v_2j   c_2j  v*_2j
   1     0 - 4        0     -      0     -           7     1      0     -
   2     5 - 9        4     2      2     1*          6     2      0     -
   3    10 - 14       2     3      2     2*          4     3      0     -
   4    15 - 19       1     4      2     3*          2     4      0     -
   5    20 - 24       2     5      1     4*          2     5      0     -
   6    25+           0     -      5     5*          0     -      0     -

where
v_1j = v_2j is the value defined for uncensored observations in the j-th interval.
v*_1j = v*_2j is the value defined for censored observations in the j-th interval.
The CPC-I of these data is given in Figure 2.4. As a descriptive statistic, Figure 2.4 shows a large difference between $A(X_1)$ and $A(X_2)$, hence we may reject $H_0$. With the aid of this CPC, we calculate $A(X_1) = 326$ and $A(X_2) = 61$. Thence $W_1 = 326 - 61 = 265$, as computed by Gehan.
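The grouped calculation under the first (Gehan's) method of counting can be sketched by expanding the class frequencies into coded values. The frequencies below are the class counts of the grouped Freireich data; the helper names `expand` and `area`, and the coding of the censored value (j-1)* as the number j-1 with a "censored" role, are illustrative assumptions.

```python
# Class frequencies for classes j = 1..6 (0-4, 5-9, 10-14, 15-19, 20-24, 25+ weeks).
F1 = [0, 4, 2, 1, 2, 0]    # sample 1, uncensored
C1 = [0, 2, 2, 2, 1, 5]    # sample 1, censored
F2 = [7, 6, 4, 2, 2, 0]    # sample 2, uncensored
C2 = [0, 0, 0, 0, 0, 0]    # sample 2, censored


def expand(freq, value_of_class):
    """One coded value per observation, given the class frequencies."""
    return [value_of_class(j) for j, n in enumerate(freq, start=1) for _ in range(n)]


def area(unc_a, cen_a, unc_b):
    """A = U + C: uncensored ties split equally; a censored a counts when a >= b."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in unc_a for b in unc_b)
    c = sum(1 for a in cen_a for b in unc_b if a >= b)
    return u + c


unc1 = expand(F1, lambda j: j)          # uncensored value in class j is j
unc2 = expand(F2, lambda j: j)
cen1 = expand(C1, lambda j: j - 1)      # method 1: censored value in class j is (j-1)*
cen2 = expand(C2, lambda j: j - 1)

a1 = area(unc1, cen1, unc2)
a2 = area(unc2, cen2, unc1)
w1 = a1 - a2
```

Changing the `lambda` used for the censored observations is one way to experiment with the other methods of counting, although the exact treatment of each method should be checked against its definition.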
The construction of the CPC for grouped data based on the other methods is straightforward. We would obtain $W_2 = 289$, $W_3 = 273$, and $W_4 = 264$. Note that the values of $W_k$ satisfy the inequalities (2.4.3). Comparing with W = 271 for ungrouped data, the third method gives a value which is very close to W, and the fourth method gives the smallest value of 264, which is smaller than that of Gehan.

Figure 2.4. The CPC-I for grouped data of Freireich et al. (1963)
[Chart not reproduced. Horizontal axis (Sample 1): 2, 2, 2, 2, 3, 3, 4, 5, 5, 5*, 5*, 5*, 5*, 5*, 4*, 3*, 3*, 2*, 2*, 1*, 1*. Vertical axis (Sample 2), bottom to top: 1 (seven times), 2 (six times), 3 (four times), 4 (twice), 5 (twice).]
Remark.
Discrete data can be considered as a special case of categorical data with all intervals of length zero. In this case, we assume that the i-th population, $i=1,2$, has discrete distribution function $F_i$, and the $F_i$'s are purely discontinuous with the same finite set of discontinuity points. Hence, the construction of the CPC for discrete data is the same as that for categorical data.
2.5
Computer plotting of the pair chart
With the help of a SAS program, the CPC can be presented as a bar chart, either vertical or horizontal. Here, we should note that the chart procedure in SAS could not give the printout of the observations with the special ordering on the $X_i$-axis as defined in Section 2.1. Thence, in the bar chart, the specially ordered $n_i$ observations on $X_i$ will be represented by the integers $1,2,\ldots,n_i$ along the $X_i$-axis. This computer plotting is necessary, especially for large sample sizes.
For an illustration, we will consider the CPC-I for the data of Freireich et al. Using the SAS program, given as an appendix, we obtain the vertical and horizontal bar charts in Figure 2.5. Note that the $X_1$-axis is horizontal in the vertical bar chart, and it is vertical in the horizontal bar chart.
Furthermore, the bars in each chart are divided in general into five kinds of sub-bars (= sub-regions) having symbols (= STAT_ID) 1, 2, 3, 4, or 5. The sub-regions having symbols 1, 2, and 3 correspond to uncensored pairs of observations $(X_1, X_2)$ such that $X_1 > X_2$, $X_1 = X_2$, $X_1 < X_2$, respectively. The area of the sub-region with STAT_ID = 4 (or 5) denotes the statistic $C(X_2)$ (or $C(X_1)$). For the data of Freireich et al., $C(X_2) = 0$.

Figure 2.5. A Copy of the Bar Charts as an Alternative Presentation of the CPC-I
[Line-printer bar charts not reproduced.]
If there are no ties, then the sub-regions having symbols 1 and 3 will correspond to $U(X_1)$ and $U(X_2)$, the Mann-Whitney U-statistics for the uncensored observations. Thence the statistic $A(X_1)$ (or $A(X_2)$) is presented by the total area of the sub-regions having symbols 1 and 5 (or, 3 and 4).
The program would print also the corresponding values of the statistics $U(X_i)$, $C(X_i)$, $A(X_i)$, $i=1,2$; and W.
2.6
The Maximum Distance, D, Statistic
2.6.1
Discussion
Quade (1973) introduced the use of the pair chart for com-
puting the Kolmogorov-Smirnov statistic.
He considered the lattice
points of the path which are the farthest below or above the diagonal of
the n xn -rectangle.
l 2
Using these points he computed the maximum ver-
tical distance between the corresponding two empirical distribution
functions.
Here, we will consider the point on the path which is the
farthest from the diagonal of the rectangle.
This leads to the maximum
distance, D, statistic.
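This D statistic is defined corresponding to the CPC-II or the pair chart of Quade (1973). Considering the CPC-II of the two samples of sizes $n_1$ and $n_2$, let $D(X_1', X_2')$ be the distance from the lattice point $(X_1', X_2')$ on the path of a CPC-II to the line $n_1 X_2' = n_2 X_1'$. Then we define
$$D = \max D(X_1', X_2'), \tag{2.6.1}$$
the maximum being taken over the lattice points on the path. For $X_i' = 1, \ldots, n_i$; $i=1,2$, it is easy to verify that
$$\min_i \left\{ n_i / \sqrt{n_1^2 + n_2^2} \right\} \le D \le n_1 n_2 / \sqrt{n_1^2 + n_2^2} \tag{2.6.2}$$
and
$$D(X_1', X_2') = \frac{n_1 n_2}{\sqrt{n_1^2 + n_2^2}} \left| \frac{X_1'}{n_1} - \frac{X_2'}{n_2} \right|. \tag{2.6.3}$$
Thence
$$D = \frac{n_1 n_2}{\sqrt{n_1^2 + n_2^2}} \max \left| \frac{X_1'}{n_1} - \frac{X_2'}{n_2} \right| \tag{2.6.4}$$
for fixed sample sizes. Note that the maximum value of $|X_1'/n_1 - X_2'/n_2|$ for lattice points $(X_1', X_2')$ on the path determines the Kolmogorov-Smirnov statistic for testing $H_0: F_1'(x) = F_2'(x)$, as noted by Quade (1973).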
So, under the assumption that the two samples have the same censoring function (see Section 2.3), this D-statistic is appropriate for testing the null hypothesis $H_0: F_1(X) = F_2(X)$. Large values of D suggest the rejection of $H_0$.
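As an illustration of the definition, the following is a minimal sketch (not the author's program) that computes D from a path. It assumes the path is encoded as a string of steps, where '1' moves one unit along the $X_1'$-axis and '2' moves one unit along the $X_2'$-axis; the function name `max_distance_d` and this encoding are hypothetical.

```python
from math import hypot

def max_distance_d(steps, n1, n2):
    """Largest distance from any lattice point of the path to the
    diagonal n2*x1 - n1*x2 = 0 of the n1-by-n2 rectangle."""
    x1 = x2 = 0
    worst = 0  # running maximum of |n2*x1 - n1*x2|
    for step in steps:
        if step == "1":
            x1 += 1
        else:
            x2 += 1
        worst = max(worst, abs(n2 * x1 - n1 * x2))
    # Dividing by sqrt(n1^2 + n2^2) turns the linear form into a distance.
    return worst / hypot(n1, n2)

# Two samples of size 2 in which both X1 observations come first.
d_value = max_distance_d("1122", 2, 2)
```

Dividing the same running maximum by $n_1 n_2$ instead would give the Kolmogorov-Smirnov distance $\max |X_1'/n_1 - X_2'/n_2|$.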
2.6.2 The Distribution of the D-Statistic
2.6.2.1 For Equal Sample Sizes
If $n_1 = n_2 = n$, then
$$D_{(i)} = i/\sqrt{2} \tag{2.6.5}$$
where $D_{(i)}$ is the i-th ordered possible value of the D-statistic, for $i=1,\ldots,n$. Using Theorem 2, Chapter 1 of Mohanty (1979), under the null hypothesis $H_0$, we obtain:
$$P(D < i/\sqrt{2}) = \begin{cases} 0, & i \le 1 \\ L(n;i) / \binom{2n}{n}, & 1 < i \le n \\ 1, & i > n \end{cases} \tag{2.6.6}$$
with
(2.6.7)
where
$$\binom{y}{z}_+ = \begin{cases} \binom{y}{z} & \text{when } y \ge z \\ 0 & \text{when } y < 0 \text{ or } y < z \\ 1 & \text{when } z = 0 \end{cases} \tag{2.6.8}$$
and the summation is over all integer values of k: positive, negative or zero. Note that $L(n;i)$, which is the $|L(n,n;i,i)|$ of Mohanty (1979), denotes the number of paths from the origin to $(n,n)$ that do not touch the lines $X_2' = X_1' + i$ and $X_2' = X_1' - i$. Thence,
$$P(D = i/\sqrt{2}) = P(D < (i+1)/\sqrt{2}) - P(D < i/\sqrt{2}) \tag{2.6.9}$$
for $i=1,2,\ldots,n$.
For $i > n/2$, we may consider applying formula (4.9) of Mohanty. We obtain that the number of paths from $(0,0)$ to $(n,n)$ that touch the line $X_2' = X_1' + i$ exactly $r$ $(>0)$ times is given by
$$\frac{2i+r-1}{2n-r+1} \binom{2n-r+1}{n-i-r+1}. \tag{2.6.10}$$
Note that, in this case, $i > n/2$, a path which touches the line $X_2' = X_1' + i$ will not touch or cross the line $X_2' = X_1' - i$, for $r = 1, 2, \ldots, (n+1-i)$. Now, let
$$L(i) = \sum_{r=1}^{n+1-i} \frac{2i+r-1}{2n-r+1} \binom{2n-r+1}{n-i-r+1} \tag{2.6.11.a}$$
which can be simplified as
$$L(i) = \begin{cases} 1 + \sum_{k=0}^{n-i-1} \left\{ \binom{2n-k-1}{n+i-1} - \binom{2n-k-1}{n+i} \right\}, & n/2 < i \le n-1 \\ 1, & i = n \end{cases} \tag{2.6.11.b}$$
Then
$$P(D = i/\sqrt{2};\ n/2 < i \le n) = 2L(i) / \binom{2n}{n}. \tag{2.6.12}$$
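The two expressions for $L(i)$ can be checked numerically for small $n$. The sketch below is only an illustration: it assumes the reading of (2.6.11.a) as a sum over $r = 1, \ldots, n+1-i$ and of (2.6.11.b) as one plus a sum of differences of binomial coefficients, and it compares both with a brute-force enumeration of all paths. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def l_sum_over_touches(n, i):
    """L(i) as the sum over r of the touch-count formula (2.6.10)."""
    total = Fraction(0)
    for r in range(1, n + 2 - i):
        total += Fraction(2 * i + r - 1, 2 * n - r + 1) * comb(2 * n - r + 1, n - i - r + 1)
    return total

def l_simplified(n, i):
    """L(i) in the simplified binomial-difference form."""
    if i == n:
        return 1
    return 1 + sum(comb(2 * n - k - 1, n + i - 1) - comb(2 * n - k - 1, n + i)
                   for k in range(n - i))

def count_paths_with_max_gap(n, i):
    """Brute force: number of paths from (0,0) to (n,n) whose largest
    value of |X2' - X1'| over all lattice points equals i."""
    count = 0
    for up_positions in combinations(range(2 * n), n):
        ups = set(up_positions)
        gap = worst = 0
        for step in range(2 * n):
            gap += 1 if step in ups else -1
            worst = max(worst, abs(gap))
        count += worst == i
    return count
```

For $i > n/2$ the enumeration should return $2L(i)$, the numerator of (2.6.12), because a path cannot reach both boundary lines.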
Furthermore, let
$$L(j \ge i) = \sum_{j=i}^{n} L(j). \tag{2.6.13.a}$$
Then we obtain
n-i-l
L(j~i) =
r
(2.6.l3.b)
k=O
Hence
(2.6.14)
2.6.2.2 For Unequal Sample Sizes
In this case, we would consider in general the level of significance for the D-test, instead of the probability function of the D-statistic. Using Hodges' (1958) method, presented by Quade (1973), we can calculate the level of significance for either the one-sided or the two-sided D test. Let $L(X_1', X_2')$ be the number of possible paths or routes from the origin to the point $(X_1', X_2')$; then we have the recursion formula
$$L(X_1', X_2') = L(X_1' - 1, X_2') + L(X_1', X_2' - 1) \tag{2.6.15}$$
with initial condition $L(0,0) = 1$. This formula is presented also by Mohanty (1979).
A particular path of the CPC-II gives rise to a value D >
line $n_2 X_1' - n_1 X_2' = -c$. Under $H_0$, the $(n_1+n_2)!/(n_1!\,n_2!)$ possible paths are equally likely; thence the significance level associated with
(2.6.16)
the value of $L(n_1,n_2)$ can be computed using the recursion formula (2.6.15), with boundary conditions that
(i) $L(X_1', X_2') = 0$ for $X_i' < 0$; $X_i' > n_i$, or $|n_2 X_1' - n_1 X_2'| \ge c$ if two-sided D test; that is to test $H_0: F_1' = F_2'$ against $H_1: F_1' \ne F_2'$;
(ii) $L(X_1', X_2') = 0$ for $X_i' < 0$; $X_i' > n_i$, or $n_2 X_1' - n_1 X_2' \le -c$ to test $H_0$ against $H_1: F_1' > F_2'$; that is the one-sided D test.
Since $L(n_1 - X_1', n_2 - X_2')$ denotes also the paths from $(X_1', X_2')$ to $(n_1, n_2)$, the recursion formula need not be carried out beyond the line $X_1' + X_2' = (n_1+n_2+1)/2$.
Considering all possible paths from $(0,0)$ to $(X_1', X_2')$ on the line $X_1' + X_2' = c$, for $c > 0$, we would have the well known Pascal triangle. It is generated by the recursion formula (2.6.15). However, in computing the level of significance for the D test corresponding to any observed values we would have an 'incomplete' Pascal triangle, which depends on the sample sizes and the boundary conditions (i) or (ii).
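The incomplete Pascal triangle can be sketched in a few lines. The code below is an illustration rather than the author's procedure: it fills the array $L$ with the recursion (2.6.15) under boundary condition (i) or (ii), and it assumes that the significance level is one minus the proportion of paths that never reach a boundary line. The function name `d_test_level` and its arguments are hypothetical, and $c > 0$ is assumed.

```python
from fractions import Fraction
from math import comb

def d_test_level(n1, n2, c, two_sided=True):
    """Proportion of equally likely paths from (0,0) to (n1,n2) that
    reach |n2*x1 - n1*x2| >= c (two-sided) or n2*x1 - n1*x2 <= -c
    (one-sided). Assumes c > 0."""
    def on_boundary(x1, x2):
        value = n2 * x1 - n1 * x2
        return abs(value) >= c if two_sided else value <= -c

    # L[x1][x2] counts the paths to (x1, x2) that avoid the boundary.
    L = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    L[0][0] = 1
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if (x1, x2) == (0, 0) or on_boundary(x1, x2):
                continue  # boundary points keep the value 0
            from_left = L[x1 - 1][x2] if x1 > 0 else 0
            from_below = L[x1][x2 - 1] if x2 > 0 else 0
            L[x1][x2] = from_left + from_below
    return 1 - Fraction(L[n1][n2], comb(n1 + n2, n1))

level = d_test_level(2, 2, 4)  # two-sided level for D = 4 / sqrt(8)
```

Exact fractions are used so that small-sample levels can be compared with hand-built triangles without rounding.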
By observing all possible lines $n_2 X_1' - n_1 X_2' = c$ for $0 < c < n_1 n_2$ containing some lattice point(s), we may construct the corresponding incomplete Pascal triangle. Using this triangle, we can compute the probability function of the D statistic. The points on the boundary lines are associated with a value of the D statistic. Let $c_k$ be the value of $c$ associated with the k-th ordered value of D, that is $D_{(k)}$, for $k = 1,2,\ldots,N(n_1,n_2)$; then $D_{(N)}$ is the largest value of D. Let $L_k(n_1,n_2)$ be the number of paths from $(0,0)$ to $(n_1,n_2)$ between the lines $|n_2 X_1' - n_1 X_2'| = c_k$, including the paths giving the value of $D = D_{(k)}$; then
(2.6.18)
and
= 2. (2.6.19)
where $L_N(n_1,n_2) = L(n_1,n_2)$ is the number of all possible paths from $(0,0)$ to $(n_1,n_2)$, and $L(n_1,0)$ is the number of paths from $(0,0)$ to $(n_1,0)$ or from $(0,n_2)$ to $(n_1,n_2)$. Similarly, $L(0,n_2)$ is the number of paths from $(0,0)$ to $(0,n_2)$ or from $(n_1,0)$ to $(n_1,n_2)$. The values of $L_k(n_1,n_2)$, for $k=1,\ldots,N-2$, could be computed using the incomplete Pascal triangles.
In fact, the boundary lines corresponding to each incomplete Pascal triangle can be written as
$$D_{(k)} = | X_1' \sin\alpha - X_2' \cos\alpha |. \tag{2.6.20}$$
These equations are called the normal equations of parallel lines; and $D_{(k)}$ denotes the distances of the lines from the origin. And, it is clear that each boundary line should contain at least one lattice point. If $n_1 = n_2 = n$, then $N = \max(k) = n$ and each boundary line has $(n-k+1)$ lattice points, for $k=1,\ldots,n$.
Another method for computing the probability function of the
D-statistic is based on the vector representation of Mohanty (1979).
2.7 Vector Representation for D-Statistic
Following Mohanty's notation, the sample sizes will be written as $n_1 = m$ and $n_2 = n$. Then a path in a CPC-II can be represented as a vector $(a_1, a_2, \ldots, a_n) = \mathbf{a}$, where $a_i$ $(i=1,\ldots,n)$ is the minimal distance, measured parallel to the $X_1'$-axis, of the points $(m, n-i)$ from the path, such that
(i) $a_i$ is an integer, $i=1,\ldots,n$
(ii) $0 \le a_1 \le \cdots \le a_n$
provided there are no tied observations. In this case, $a_1 + a_2 + \cdots + a_n = U(X_1)$, the Mann-Whitney statistic.
For further discussion, the following definition and theorem of Mohanty (1979) are needed.
Definition 2.7.1. A path $(x_1, \ldots, x_n)$ dominates the path $(y_1, \ldots, y_n)$ if and only if $y_i \le x_i$ for all $i$.
Theorem 2.7.1
l(a)1
c
Det(d
nxn
ij
)
(2.7.1)
where
a+n
(n
icj=l
,
n
a
+n-i
n-l
(n-i
)+
a -a _j+l+j-l
(n
n
j
)
+
an_l-an_j+l+j-i+l)
(
j-i
i,&l,j,&l
(2.7.2)
with $\binom{y}{z}_+$ defined by (2.6.8).
In this theorem $|(\mathbf{a})|$ denotes the number of paths dominated by the vector $\mathbf{a}$. Associated with the boundary line $X_1' \sin\alpha - X_2' \cos\alpha + D_{(k)} = 0$, as given in Section 2.6, let $\mathbf{a}_{(k)}$ be the path that dominates all paths from $(0,0)$ to $(m,n)$ lying on the positive side of this boundary line; then
$|(\mathbf{a}_{(k)})|$ (2.7.3)
denotes the number of paths crossing this boundary line. These paths may or may not cross the other boundary line $X_1' \sin\alpha - X_2' \cos\alpha - D_{(k)} = 0$. Hence, we obtain
(2.7.4)
Since associated with $\tfrac{1}{2} < D_{(k)} \sqrt{m^2+n^2} / mn \le 1$ the paths $C(\mathbf{a})$ will not cross the last boundary line. For other values of D, that is $0 < D_{(k)} < mn / (2\sqrt{m^2+n^2})$, we should consider another result of Mohanty:
$$|(\mathbf{b}, \mathbf{a})| = |(\mathbf{a})| + |(a_n - b_n, a_n - b_{n-1}, \ldots, a_n - b_1)| - \binom{a_n + n}{n} \tag{2.7.5}$$
representing the number of paths which dominate $\mathbf{b}$ and are dominated by $\mathbf{a}$, when no path can cross both boundaries. Now, let $\mathbf{b}_{(k)}$ be a path on the negative side of the line $X_1' \sin\alpha - X_2' \cos\alpha - D_{(k)} = 0$, which is dominated by all other paths on this side; then
$$C(\mathbf{b}_k, \mathbf{a}_k) = \binom{m+n}{n} - |(\mathbf{b}_{(k)}, \mathbf{a}_{(k)})| \tag{2.7.6}$$
denotes the number of possible paths which cross the boundary lines
(2.7.7)
In fact, this formula holds for all values of D. However, formula (2.7.4) is simpler to use for large values of $D_{(k)}$. So, we have
$$P(D \ge D_{(k)}) = \left\{ \binom{m+n}{n} - |(\mathbf{b}_{(k)}, \mathbf{a}_{(k)})| \right\} \frac{m!\,n!}{(m+n)!} \tag{2.7.8}$$
and
(2.7.9)
2.8 Large Approximation for Equal Sample Sizes
For large n, so that i is also large, using Stirling's formula, (2.6.13) can be written as
L(j~i)::
n-i-1
-k-1
t
(1- k+1 ) -n-i-k-~(n+i_1)-k-1 ....,.e(_-:-k-O
n+i+k
k+1) I
(2.8.1)
Since
k+1
-n-i-k-~
k+1
~~~(1- n+i+k)
- e
(2.8.2)
i-+~
For each k, then (2.8.1) can be approximated as
n-i-1
t
k-O
(n+i_1)-k-1
(k+1)!
(2.8.3)
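Furthermore, for $n/2 < i < n$, we could have
$$\lim_{n \to \infty} i/n = p; \quad \tfrac{1}{2} < p < 1 \tag{2.8.4}$$
with
(2.8.5)
Thence, for (2.6.14), we have two possible approximations for large n, that is
$$P(D \ge i/\sqrt{2};\ n/2 < i < n) \approx \left\{ \left( 1 + \frac{1}{n+i-1} \right)^{n-i} - 1 \right\} 2^{-2n+1} \sqrt{\pi n} \tag{2.8.6}$$
if $(n-i)$ is small, and
$$P(D \ge i/\sqrt{2};\ n/2 < i < n) \approx \left\{ \exp\left( \frac{1-p}{1+p} \right) - 1 \right\} 2^{-2n+1} \sqrt{\pi n} \tag{2.8.7}$$
otherwise.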
2.9 Large Unequal Sample Sizes
In contrast with the previous section, here we will not propose an approximation formula for the level of significance for the D test, that is (2.7.8). But we will illustrate some facts which make the approximation so complex to derive.
First, let $D(m,n;x,y)$ be the value of D corresponding to a fixed point $(x,y)$ in the path of the pair chart of the samples of sizes $m \ne n$. Then in general
$$D(m_1,n_1;x,y) \ne D(m_2,n_2;x,y) \tag{2.9.1}$$
except if $m_1 n_2 = m_2 n_1$. Figure 2.6 illustrates the cases for $m_1 = m_2 = 8$; $n_1 = 6$, $n_2 = 5$, and $(x,y) = A(2,4)$. This figure shows
$$D(8,6;2,4) = AB < AC = D(8,5;2,4). \tag{2.9.2}$$
On the other side
$$D(8,6;4,2) > D(8,5;4,2). \tag{2.9.3}$$
Using (2.6.20), we obtain $AB = 2$ and $AC = 22/\sqrt{89}$.
Figure 2.6 Illustrative CPC-II for (m,n) = (8,6) and (8,5)
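The values quoted for AB and AC can be reproduced directly from the point-to-line distance. This is a minimal sketch in which the helper name `point_distance` is hypothetical; it assumes, as in Section 2.6, that the diagonal of the $m \times n$ rectangle is the line $n x - m y = 0$.

```python
from math import hypot, sqrt

def point_distance(m, n, x, y):
    """Distance from the lattice point (x, y) to the diagonal
    n*x - m*y = 0 of the m-by-n rectangle."""
    return abs(n * x - m * y) / hypot(m, n)

ab = point_distance(8, 6, 2, 4)      # D(8,6;2,4)
ac = point_distance(8, 5, 2, 4)      # D(8,5;2,4)
ab_low = point_distance(8, 6, 4, 2)  # D(8,6;4,2)
ac_low = point_distance(8, 5, 4, 2)  # D(8,5;4,2)
```

The same point therefore moves closer to or further from the diagonal when only one sample size changes, which is the difficulty described above.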
This affects the corresponding boundary lines (2.6.20) at each point $(x,y)$. Furthermore, we could note that
(2.9.4)
is a function of $(x,y)$ besides $(m_i, n_i)$, $i=1,2$; and it is not a constant, even for fixed values of $(m_1,n_1)$ and $(m_2,n_2)$. Figure 2.7 shows that (2.9.4) increases if $x^2 + y^2$, or the distance of $(x,y)$ to $(0,0)$, increases.
Finally, associated with formula (2.7.9) we should consider the paths represented by the vectors $\mathbf{a}_k$ and $\mathbf{b}_k$. Figure 2.7 shows the paths
$$\mathbf{a}_k(8,6) = (3,4,6,7,8,8), \quad \mathbf{b}_k(8,6) = (0,0,1,2,4,5) \tag{2.9.5}$$
corresponding to the value of $D_k(8,6) = 2$, where (8,6) on the left hand sides indicates the sample sizes m, n. It is clear that these vectors are affected by the boundary lines (2.6.20), and by (2.9.4) as well. This leads to the complexity of computing the values of $|(\mathbf{a}_k, \mathbf{b}_k)|$ for different large values of m, n. So, this problem is still open.
CHAPTER III
MEASURES OF ASSOCIATION FOR
GENERAL RIGHT-CENSORED BIVARIATE SAMPLES
3.1 Introduction
A nonparametric measure of association between the components of a bivariate random variable $(X_1, X_2) = \mathbf{X}$ is proposed for general right-censored samples, as an extension of Kendall's (1938) tau. Such a measure is called a 'correlation,' written $t_c(X_1, X_2) = t_c$, where (t) indicates Kendall's tau, and the subscript (c) indicates 'censored' samples.
Considering a censored bivariate sample of size n; $(X_{1i}, X_{2i})$, $i=1,2,\ldots,n$; we define an indicator variable
$$\delta_{ri} = \begin{cases} 0 & \text{if } X_{ri} \text{ uncensored} \\ 1 & \text{if } X_{ri} \text{ censored} \end{cases} \tag{3.1.1}$$
for $r = 1$ or 2. Then, the i-th bivariate sample observation will be written as
(3.1.2)
Using this notation, we are defining that an observation on $X_r$ is an ordered pair $(X_{ri}, \delta_{ri})$. Hence, a censored observation $(X_{ri}, 1)$ cannot be smaller than an uncensored observation $(X_{rj}, 0)$, since we are considering only right-censored samples. Note that, if $X_{ri}$ is
censored, then $X_{ri}$ is the point (= value) at which it was censored.
For simplicity, whenever there is no confusion, the i-th observation will be written as $\mathbf{X}_i = (X_{1i}, X_{2i})$. Without loss of generality, we may assume the n observations have the following ordering.
$\mathbf{X}_i(0,0)$, $i=1,\ldots,k$.
$\mathbf{X}_i(0,1)$, $i=k+1,\ldots,l$.
$\mathbf{X}_i(1,0)$, $i=l+1,\ldots,m$.
$\mathbf{X}_i(1,1)$, $i=m+1,\ldots,n$. (3.1.3)
In this paper, the general right-censored sample will be called sample, in short. And the sample has an uncensored (bivariate) sub-sample having observations $\mathbf{X}_i = \mathbf{X}_i(0,0)$, $i=1,\ldots,k$, and a censored (bivariate) sub-sample having observations $\mathbf{X}_i$, $i=k+1,\ldots,n$.
If $m=n$, then the censoring occurs only on either one component of the bivariate observation $\mathbf{X}_i$, that is if $X_{ai}$ is censored, then $X_{bi}$, $a \ne b = 1,2$, is uncensored. This kind of sample will be called an A-sample. If $m=k$, so we only have observations $\mathbf{X}_i(0,0)$, $i=1,\ldots,k$, and $\mathbf{X}_i(1,1)$, $i=k+1,\ldots,n$, then the sample will be called a B-sample; both components are either censored or uncensored.
Furthermore, the censoring may occur after the $k_r$-th order uncensored (complete) observations on the component $X_r$, $r=1,2$. In this case, we would have $(X_{ri}, 0) < (X_{rj}, 1)$, for all $i=1,\ldots,k_r$; $j=k_r+1,\ldots,n$.
Finally, a modification of the generalized Kendall's tau, $t_c$, will be proposed in Section 3.6. This modified index of correlation will be called the index Alpha, $\alpha$. It is expected that Alpha could reach the values -1 and +1 for any general right-censored data.
3.2 The Generalized Kendall's Tau
3.2.1 Definitions
Considering a censored bivariate sample of size n on variable $\mathbf{X} = (X_1, X_2)$ given in Section 3.1, we may consider the following measures of association between $X_1$ and $X_2$, depending on the treatment of the censored observations and the proposed assumptions. The corresponding nonparametric correlation coefficients will be written as $t_{c,1}$ and $t_{c,2}$, which are called the generalized Kendall's tau.
3.2.1.1 Unconditional Generalized Kendall's Tau
Using the notations in Section 3.1, the nonparametric correlation $t_{c,1} = t_{c,1}(X_1, X_2)$ between the components of $\mathbf{X}$ is defined as
$$t_{c,1} = \binom{n}{2}^{-1} \sum\nolimits'_n U(\mathbf{X}_i, \mathbf{X}_j) = \binom{n}{2}^{-1} \sum\nolimits'_n U_{ij} \tag{3.2.1}$$
where
(3.2.2)
with
$$u(a,b) = \begin{cases} 1 & \text{if } a>0,\ b \ge 0 \text{ or } a=0,\ b>0 \\ -1 & \text{if } a<0,\ b \le 0 \text{ or } a=0,\ b<0 \\ 0 & \text{otherwise} \end{cases} \tag{3.2.3}$$
as a function of ordered pair $(a,b)$. The $\sum'_n$ stands for summation over all possible pairs $(\mathbf{X}_i, \mathbf{X}_j)$, $i<j$; $i,j=1,\ldots,n$.
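A small sketch may help fix the definition. It is not the author's program. It assumes that the pair score is the product $U_{ij} = u_{1ij} u_{2ij}$ with $u_{rij} = u(X_{ri} - X_{rj}, \delta_{ri} - \delta_{rj})$, and that $u(a,b)$ is +1 when $a>0, b \ge 0$ or $a=0, b>0$, -1 in the mirrored cases, and 0 otherwise. The function names and the list-based inputs are hypothetical.

```python
from itertools import combinations

def u(a, b):
    """Score for one component: a is the difference of the values,
    b is the difference of the censoring indicators."""
    if (a > 0 and b >= 0) or (a == 0 and b > 0):
        return 1
    if (a < 0 and b <= 0) or (a == 0 and b < 0):
        return -1
    return 0  # the two observations are not comparable, or are tied

def tau_c1(x1, d1, x2, d2):
    """Unconditional generalized Kendall tau: the sum of U_ij over all
    pairs i < j divided by n(n-1)/2. Requires n >= 2."""
    n = len(x1)
    total = 0
    for i, j in combinations(range(n), 2):
        u1 = u(x1[i] - x1[j], d1[i] - d1[j])
        u2 = u(x2[i] - x2[j], d2[i] - d2[j])
        total += u1 * u2
    return total / (n * (n - 1) / 2)

# With no censoring the index reduces to the ordinary Kendall tau.
uncensored = tau_c1([1, 2, 3], [0, 0, 0], [1, 3, 2], [0, 0, 0])
# An observation censored at 1 cannot be ordered against an uncensored 2.
noncomparable = tau_c1([1, 2], [1, 0], [1, 2], [0, 0])
```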
Considering the functions $u_{rij}$; $r=1,2$, we could note that $u_{rij} = -u_{rji}$. Hence,
$$\sum\nolimits''_n u_{rij} = 0 \tag{3.2.4}$$
where $\sum''_n$ stands for the summation over all permutations $(i,j)$ such that $i \ne j$; $i,j=1,\ldots,n$.
Furthermore, it is easy to verify that
$$\max\left( \sum\nolimits'_n u_{rij} \right) = \binom{n}{2}, \quad r=1,2. \tag{3.2.5}$$
If $u_{rij} \ne 0$ for all i, j and r, then
$$\sum\nolimits'_n u_{rij}^2 = n(n-1)/2 \tag{3.2.6}$$
and tau-c,1 can be written as
(3.2.7)
This shows that tau-c,1 is a type of the general correlation coefficient, introduced by Daniels (1948) and Kendall (1970), provided that $u_{rij} \ne 0$ for all r, i, j.
Following Hoeffding's (1948) notation, tau-c,1 can be presented as
$$t_{c,1} = \frac{1}{n(n-1)} \sum\nolimits''_n U_{ij} \tag{3.2.8}$$
where $\sum''_n$ stands for summation over all permutations $(i,j)$, $i \ne j$. This formula shows that tau-c,1 is symmetric in $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$. Thence, tau-c,1 is in fact a U-statistic. This will be discussed further in Section 3.3 and Section 3.4.
It is well known that Kendall's tau for uncensored data is presented as a difference between the number of concordant pairs, C, and discordant pairs, D, that is $t = (C - D)/N$, where N is the number of all possible pairs. This idea of concordance and discordance is extended for censored data by defining concordant and discordant censored pairs.
A pair of observations $\{(X_{1i}, \delta_{1i}), (X_{2i}, \delta_{2i})\}$, $i=1,2$, is called noncomparable if $X_{ri} < X_{rj}$ and $\delta_{ri} > \delta_{rj}$, $i \ne j = 1,2$, for at least one value of $r=1,2$. Otherwise the pair is called comparable. Furthermore, a comparable pair can be classified into 3 possible categories, that is concordant, discordant, or tied.
The pair is called concordant if
$(X_{r1}, \delta_{r1}) > (X_{r2}, \delta_{r2})$ or $(X_{r1}, \delta_{r1}) < (X_{r2}, \delta_{r2})$
for $r=1,2$; and the pair is discordant if
$(X_{11}, \delta_{11}) > (X_{12}, \delta_{12})$ and $(X_{21}, \delta_{21}) < (X_{22}, \delta_{22})$
or
$(X_{11}, \delta_{11}) < (X_{12}, \delta_{12})$ and $(X_{21}, \delta_{21}) > (X_{22}, \delta_{22})$.
Here, we should note that $(a,b) > (c,d)$ if and only if $a>c$, $b \ge d$ or $a=c$, $b>d$. Finally, the pair is called tied if it is comparable and $(X_{r1}, \delta_{r1}) = (X_{r2}, \delta_{r2})$ for at least one value of r.
Let C, D, T and I be the number of concordant, discordant, tied and incomparable pairs out of $N = n(n-1)/2$ possible pairs; then the unconditional generalized Kendall's tau, $t_{c,1}$, can be written as
$$t_{c,1} = \frac{C - D}{C + D + T + I} = \frac{C - D}{N}. \tag{3.2.9}$$
Considering a random pair of observations, let p, q, r and s $(= 1-p-q-r)$ be the probability of being a concordant, discordant, tied and incomparable pair respectively; then we have the estimators:
$$\hat{p} = C/N, \quad \hat{q} = D/N, \quad \hat{r} = T/N, \quad \hat{s} = I/N. \tag{3.2.10}$$
Hence
$$\tau_{c,1} = E(t_{c,1}) = p - q. \tag{3.2.11}$$
The variance of
~
- F <5 (X), and F
for i-l,2.
c,
1 will be computed as follows.
Let F(X;r.)
- ....J.
(X ), r-1.2 are its marginal distribution functions
r ;.21
-i
t
r
Then, we have at most four distribution functions, asso-
ciated with the value <5 - (0,0).(0,1). (1.0) and (1,1).
's are continuous, the t
Fl~i)
assumption that
1
function of Hoeffding
(1948) can be generalized for the right censored data.
l - 2F1 <5
-2F2<5
'-2
r
'-2
+4F<5
-2
- F
, 1 - 2F
+ 3F
1 ,i
6
2.12
-2
2
1._<52
- 2F
2 '-2
C
+ 3F
6
-2
i
1,12
- F
2,12
+ 2F~
~2
a
•
il - (0,0)
(0,1)
; _<51 - (0.0)
12 1 - F
Let
;i1 - i2
12 1 - F
Under the
(l,O)
; £.1 - (0,0)
£.2 - (l.l)
- F
+ 2F
2'£'2
6
-2
;i
l • (0.1).12 - (0.0)
F
2.i2 - F£.2
iiI •
(1.0) •
.2.2
• (0.0)
.
6
,• -1
=
(1,1), i 2 • (0,0)
6
,• -1
=
(1,1), i 2 • (0,1)
;i
.. (1,1),
1
i
2
=
(1,0)
(3.2.12)
and
(3.2.13)
Considering the possible values of
P(~=(a,b»
.2
1-a
= P1(1-P1)
~,
we may write
b
. 1-b
P2(1-P2)
(3.2.14)
where
(3.2.15)
for r=1,2.
Thence we may write
c, 1" E{~l(X;il)}
T
• 1:
~l(X;.§.)
P('§'l='§') ! !
dF(X;.§.)
(3.2.16)
.§.
The variance of t
c,
1 is, by Hoeffding (1948),
(3.2.17)
where
(3.2.18)
2
• E(U12 )
•
(p
-
'r
2
c ,l
2
+ q) - T c, 1
(3.2.19)
In this case, we are, in fact, considering non-random causes for censoring. In other words, we are not taking into consideration the censoring variable. So, another situation, in which the causes of censoring are random, needs to be proposed. In this situation, we should consider the minimal functions between the studied variables and the censoring variable. Then, the data having new variables corresponding to the minimal functions are uncensored data. Hence, Hoeffding's results are directly applicable. This will be discussed in Section 3.8.
3.2.1.2 Conditional Generalized Kendall's Tau
Corresponding to the function $U_{ij}$ above, we define a function $V_{ij} = v_{1ij} v_{2ij}$ such that
$$v_{rij} = \begin{cases} u_{rij} & \text{if } \delta_{ri} = \delta_{rj} = 0 \text{ or } \delta_{ri} \ne \delta_{rj} \\ 0 & \text{otherwise.} \end{cases} \tag{3.2.20}$$
The difference between the functions $U_{ij}$ and $V_{ij}$ is in transforming the censored observations, that is if $\delta_{ri} = \delta_{rj} = 1$, then $v_{rij} = 0$ but $u_{rij}$ may take a non-zero value -1 or +1, if $X_{ri} \ne X_{rj}$.
Thence, another nonparametric correlation, Kendall's tau-c,2, will be defined as
$$t_{c,2} = \left\{ \binom{n}{2} - N^* \right\}^{-1} \sum\nolimits'_n V_{ij} \tag{3.2.21}$$
where $N^*$ is the total number of all possible pairs $(\mathbf{X}_i, \mathbf{X}_j)$ in which $\delta_{ri} = \delta_{rj} = 1$ for at least one value of $r=1,2$.
Using the general ordering of observations given in Section 3.1, we have
$$N^* = \binom{n-k}{2} - (l-k)(m-l). \tag{3.2.22}$$
Thence, if $l = k$ or $m = l$, we have
$$t_{c,2} = \frac{2}{2k(n-k) + k(k-1)} \sum\nolimits'_n V_{ij}. \tag{3.2.23}$$
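The conditional index and the count $N^*$ can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the component score `u` uses the form $u(a,b) = 1$ when $a>0, b \ge 0$ or $a=0, b>0$, with the mirrored case giving -1; $V_{ij}$ is set to zero whenever both observations are censored on the same component; and the names `tau_c2` and `n_star` are hypothetical. The example data follow the ordering of Section 3.1 with $k=2$, $l=3$, $m=4$, $n=5$.

```python
from itertools import combinations
from math import comb

def u(a, b):
    """Component score on value difference a and indicator difference b."""
    if (a > 0 and b >= 0) or (a == 0 and b > 0):
        return 1
    if (a < 0 and b <= 0) or (a == 0 and b < 0):
        return -1
    return 0

def tau_c2(x1, d1, x2, d2):
    """Return (t_c2, N*). Pairs censored on the same component are
    counted in N* and removed from the denominator; the denominator
    is assumed to be positive."""
    n = len(x1)
    total = 0
    n_star = 0
    for i, j in combinations(range(n), 2):
        if (d1[i] == 1 and d1[j] == 1) or (d2[i] == 1 and d2[j] == 1):
            n_star += 1  # V_ij = 0 for this pair
            continue
        total += u(x1[i] - x1[j], d1[i] - d1[j]) * u(x2[i] - x2[j], d2[i] - d2[j])
    return total / (comb(n, 2) - n_star), n_star

# Censoring patterns in order: (0,0), (0,0), (0,1), (1,0), (1,1).
value, n_star = tau_c2([1, 2, 3, 4, 5], [0, 0, 0, 1, 1],
                       [1, 2, 3, 4, 5], [0, 0, 1, 0, 1])
```

Counting $N^*$ pair by pair avoids relying on the sorted ordering, so the same function can be used on unsorted data.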
If $C^*$ and $D^*$ are the number of concordant and discordant pairs out of the $N^*$ observed pairs, then $t_{c,2}$ can be written as
$$t_{c,2} = \frac{(C - C^*) - (D - D^*)}{N - N^*}. \tag{3.2.24}$$
As the index $t_{c,1}$, $t_{c,2}$ is also symmetric in $\mathbf{X}_1, \ldots, \mathbf{X}_n$, and hence it is a U statistic of Hoeffding. This $t_{c,2}$ will be considered as a conditional generalized Kendall tau. For a given pattern of observations, as proposed by Gehan (1965), we will observe the distribution of $t_{c,2}$ under the permutation of the observations. This will be discussed in Section 3.6.
3.3 Characteristics of the Generalized Kendall Tau
In this section, the nonparametric correlation coefficients $t_{c,1}$ and $t_{c,2}$ will be written as one formula:
$$t_{c,s} = N_s^{-1} \sum\nolimits'_n A_{s,ij} = N_s^{-1} \sum\nolimits'_n a_{s,1ij}\, a_{s,2ij} \tag{3.3.1}$$
with
$$N_s = \begin{cases} n(n-1)/2, & s = 1 \\ n(n-1)/2 - N^*, & s = 2 \end{cases} \tag{3.3.2}$$
and
$$a_{s,rij} = \begin{cases} u_{rij}, & s = 1 \\ v_{rij}, & s = 2 \end{cases} \tag{3.3.3}$$
because both correlations have some common characteristics.
3.3.1 General Properties
Since $A_{s,ij} = A_{s,ji}$, then $t_{c,s}$ is 'symmetric under interchange' of its variables, so that
$$t_{c,s}(X_1, X_2) = t_{c,s}(X_2, X_1). \tag{3.3.4}$$
And it is 'positively invariant,' that is invariant under monotonic increasing transformations. So, if $g_i(\cdot)$, $i=1,2$, are monotone increasing functions, then
$$t_{c,s}(g_1(X_1), g_2(X_2)) = t_{c,s}(X_1, X_2). \tag{3.3.5}$$
In particular, for $g_i(x) = a_i + b_i x$, $b_i > 0$, we have a linear transformation with positive slope.
It is easy to verify that
$$-1 \le t_{c,s} \le +1. \tag{3.3.6}$$
In general, for general right-censored data, the limits -1 or +1 cannot be reached, even though there are no ties, because $a_{s,rij}$ may take a value zero even if $x_{ri} \ne x_{rj}$. This deficiency suggests some modifications of $t_{c,s}$, which will be considered later.
3.3.2 Some Special Cases
If all n observations are uncensored, then $N_1 = N_2$. Consequently, $t_{c,1} = t_{c,2}$, that is
(3.3.7)
which is the same as that of Kendall (1970).
Another special case is a censored sample, in which the uncensored and censored observations on $X_2$ have the following orderings.
(X
2l
,0)
<
(X22 ,0) ------
<
(X
2m
,0); and
(3.3.8)
(X2 ,m+l,1) < ----------- < (X2n ,1)
where X , i > m are the values at which they were censored.
2i
In this
case, we have
m-1
m
n-1
n
L as,lij + (2-s) L
L
t
= N- { L
c,S
s i-I j=i+1
i-m+l j=i+l
l
m
+
L
i-I
n
r a li. a 2ij}; s=1,2.
j=m+l s, J s,
since a 2 ,2ij = v 2ij =
0.3.9)
° for all i=m+1, ••• n; j=i+l.
If the censoring
on X2 occurs after the m-th order completed observations, such that
(3.3.10)
then
t
n-l
n
n-1
n
l
- N- { L
r a Ii' - (s-l)
L
r a 1"}
c,S
S i=l j=i+l s, J
i-m+1 j=i+1 s, 1J
0.3.11)
that is a function of the observations on Xl only.
In the previous cases, if ties occur on X2 , then the number of
terms on the right hand side of each formula will be reduced.
For
example, if the censoring on X occurs at 'one point' after the m-th
2
order completed observations, such that
then
-1
t
c,s
- N
s
m-l
{r
m
L
i-I j-i+l
as,lij +
m
n
L·
r
i-I j-m+l
as,lij}
(3.3.13)
3.4 Alternative Expression of the Generalized Kendall's Tau
Considering the ordering of the n observations assumed in Section 3.1, we would have the uncensored sub-sample $\mathbf{X}_i$, $i=1,\ldots,k$; and the censored sub-sample $\mathbf{X}_i$, $i=k+1,\ldots,n$. Thence, the Kendall's tau-c,s can be presented as
t
C3.4.l)
c,S
where
(3.4.2)
t Ol - (k (n-k»
t II,s -
* -1
s
N
-1
k
n
1:
1:
A
i-I j-k+l
n-l
n
1:
1:
i-k+l j-i+l
s,ij
(3.4.3)
A
(3.4.4)
s ,ij
This shows that $t_{c,s}$ is a weighted average of $t_{00}$, $t_{01}$ and $t_{11,s}$, since
It is clear that $t_{00}$ is the Kendall tau for the uncensored sub-sample. Hence, $t_{00}$ could have a value from -1 to +1. However, the limits -1 and +1 cannot be reached if ties occur, as noted also by Quade (1974). The value of $t_{01}$ depends on
i .. 1,2, ••• ,k; j-k+1, ••• ,no
(3.4.5)
in which at least one of Q , 02j is equal to +1, because Xj's are
1j
censored. If 0rj .. 1, then a
-lor 0 according to whether
s,rij
X .. X or not. If 0rj .. 0 for at most one r, then a
-1,
ri
rj
s, r i."
J
..
o or
+1 according to whether X
ri
<
Xrj ' X
ri
Hence, AS,ij may take a value -1, 0, or +1.
= Xrj
or X > Xrj •
ri
This shows that tOl
could have a value from -1 to +1.
Figure 3.1
Illustrative Graphs for (A). t
01
.. -1 and (B).
tOl .. +1.
* *
*
~(O,l)
*
*
o
*
o
o
o
X.
-1.
o
* *
*
i
*
o
o
I
(B)
(A)
In Figure 3.1, the (O)'s indicate uncensored observations and
the (*)'8 indicate censored observations.
In Figure 3.1(A), uncensored
observations,!.r, satisfy (a ,b ) <!i < (a ,b ); and the censored
1 l
2 2
observations, Xj (l,O) and
~(O,I),
should satisfy
x 1j
> a ;X
X
1k
<
2
2j
<
b
1
> b
a ;X
1 2k
2
This implies Uij .. U • -1 for all i,j and k. Thence t Ol .. -1.
ik
< X
and X > X > X •
In this case, we have X <
lj
2j
2k
lk
~i
2i
In Figure 3.1{B), uncensored observations,
~.
satisfy
~
<
(c ,d1 ); and the censored observations, X.{O.); 0, .. (O.l), (1.0)
l
-J -J
or (l,l), should satisfy !j > (c .d ). This implies Uij .. +1 for
1 1
all i and j.
Hence t Ol " +1.
(o.),
with O.
-J
-J
~
In fact, here we have X.{O,O)
-~
<
X.
-J
(O.O) for all i,j.
This index, t 01 ' can be considered as a measure of association between the censored and uncensored sub-samples.
Finally, t 11 ,s is the Kendall tau
sample.
for
the censored sub-
The most general form of this sub-sample is Xi{O,l). i ..
k+l, ••• ,l; !i{l.O), i=i+1, ••• ,m and Xi{l.l). i=m+l ••••• n.
dex would show the difference between tau-c,l and tau-c,2.
sidering the possible values of
~
This inCon-
for each pair of observations
(3.4.6)
for all
~,o
-~
...
-J
(O,l), (1,0) and (1,1), so there are six possible
functions for each (i,j); and
~i
.. (0.1).
~j
.. (1.0)
(3.4.7)
Note that Vij (O.l;l,O) • Uij{O.l;l,O).
Hence tIl,s for s • 2 can
be simplified as
*
t ll ,2 - N2
l
m
E
E A2 ij
i-k+l j ...t+l
'
(3.4.8)
or
1
t
ll ,2"
(l-k)(~l)
.t
m
E
E
(3.4.9)
Ui,(O,l;l,O)
i=k+l j=.t+l
J
Thence, in general, tIl,s can be written as
t
II,s
..
+
2(2-s)
(n-k) (n-k-l)
n-l
n
E
E
i=k+l jll::i+l
m
.t
(s-l)
(!-k) (~.e)
Uij
r
E
i=k+l j=!+l
(3.4.10)
U (0,1;1,0)
ij
that is a function of Uij •
Now we will observe the possible values of A i"
s, J
or Vir
that is U ,
iJ
Since Vij(O,l;l,O) = Uij(O,l;l,O), we may ·consider only the
values of Uij(il,ij ).
The table below, Table 3.1, shows all possible
values of Uij by ii' by
ij
•
Table 3.1 Values of the Function Uij by
~i
and
~j
~j
Uij
(0,1)
(1,0)
(1,1)
(0,1)
-1,0,1
-1,0
-1,0,1
(1,0)
-1,0
-1,0,1
-1,0,1
(1,1)
-1,0,1
-1,0,1
-1,0,1
Based on this table, we conclude that tll,l could reach the
limits -1 and +1, but tIl , 2 could not.
In fact,
(3.4.11)
·Note that in computing t
relations between
ll
and
~,
,2' we are not concerned with the
if both Xri and Xrj are censored for at
Figure 3.l(A) shows the case, where t ll ,2 • -1.
least one r.
value t
~
ll
The
,2 - 0 is attainable if we have censored observations
!j (!j) , 15j - (0,1), (1,1); or,!j - (1,0), (1,1).
t ll ,2 - 0 for a B-sample, but in general t 1l ,2
If t ,2· 0 , t h en N* • (n- k).
2
ll
t
c ,2 •
«~) +
k(n-k)r
It is clear that
~
0 for an A-sample.
And hence
l
(3.4.12)
that is a weighted average of too and tOle
It is clear that, in
this case, t c, 2 could reach the, limits -1 and +1.
3.5 Asymptotic Distribution of Tau-c,1
In Section 3.2, we have shown that the statistic $t_{c,1}$ is the unbiased estimator of $\tau_{c,1}$, the Kendall's tau-c,1 of the parent population. Here, we would consider its asymptotic distribution function. Theorem 8.1 of Hoeffding (1948) can be modified as
Theorem 3.1
Let
~ <5
i-l, ••• ,n be n independent bivariate random
'-i
vectors,
~ <5
'-=-1
having the distribution function F15 (X); ii • (0,0),
(0,1), (1,0) or (1,1).
~
Let U(~
function in its vector argument
<5
'!2
'-1
~
<5
'-2
<5
- ... '-=-1
)
•
U12 be a symmetric
which does not involve n.
If there exist two positive numbers A and B such that
(3.5.1)
for ii·(O,O),(O,l),(l.O).(l.l)j i·l.2 ••••• n; and
(3.5.2)
with t1(tc,l) given by (3.2.18). then. as
(t c. 1 -
n~ ••
the d.f. of
~
c, l)/o(t c, 1) tends to the normal d.f. with mean 0 and
variance 1.
Now, we need to show that the statistic t
conditions of this theorem.
c.
1 satisfies the
From (3.2.2) and (3.2.3). we have
(3.5.3)
So there exists a positive number A such that (3.5.1) holds.
For
(3.5.2). we should consider that
(3.5.4)
from '3.2.16) and (3.2.18).
In general. this is a positive quan-
tity. except if t 1 is a constant function of X and
Thence. if t
is not a constant, as n_
l
tends to a normal d.f.
3.6
CID •
fl'
the d.f. of t • l
c
Some Characteristics of the Conditional Generalized Kendall's Tau
3.6.1
Alternative Expression of Tau-c,2
Following Gehan's (1965) notation. let
m
1j
• number of uncensored observations on the i-th component
of rank j, in the rank ordering of uncensored observations
on the i-th component.
iij = number of right-censored observations on the i-th component with values greater than observations at rank j
but less than observations at rank (j+l).
with
Lj(mij + iij) = n t
be the general pattern t Pit of the i-th component of the bivariate
n observations on X = (Xlt X ).
2
This may be illustrated as shown
below:
r/
iil
ii2
~--
m
i2
mil
.L
ii'
.J.
/'
_-/
i
is.
1
m
is.
m
ij
1
Note that. mij > 0 for all i and all j=lt2t ••• tSit but lij can be
zero for some j.
Hence t the i-th component has at most 2s
i
distinct
values.
We will consider the general pattern for Xlt PIt as fixed and
the pattern for X2t P2t as coming from (n!) possible allocations of
X2 •
Corresponding to each mljt we have uncensored observations
Xljktk=lt ••• tml on
~
with R(Xljk ) = j for all k. And for each ilj t
* k =It.··ti ij on Xl· We define
we have censored observations Xljkt
R(X* ) = j * with j * strictly larger than j.
1jk
Using the notations as
in Section 3.2 t an observation on Xl will be written as (jtO) if it
is uncensored of rank jt and (jt1) if it is censored.
Corresponding
to (jtO) or (jtl), observations on X2 may take values (kjtO) and
(kjt1)t for some kjtkj=lt ••• tS2.
Hence the n bivariate observations
would have the following general forme
where (i.O). i.l ••••••
1
indicates an uncensored observation on
and (k i • 62i ) is the associated observation on
Xl.
~
with 6 • 0 or 1
2i
corresponding to whether it is uncensored or censored. Here. we have
51 distinct ranks (or values) for the observation on
rank i.
So. k
i
~i
~i
with
may take more than one value of 1.2 ••••• s •
2
for each i, k i has
2.
Xl.
of
In fact.
possible values.
{(j.l). (k j ,6 )}.
2j
where (j,l) indicates a censored observation on
XI.
and
(kj~2j)
the same meaning as that of the previous type of observation.
has
For
the censored observation (j,l) on Xl' j may take only some distinct
value(s) of 1 •..• ,sl corresponding to whetherllj's is not zero.
Hence, we would have at most sl distinct values for the censored
observations on
XI.
For each j with llj P O.k j has llj possible
values.
Considering the observations on
XI,
we have that (i,O) and
(j,l) are increasing on i and j respectively.
And the v lij function
would satisfy
1 (i > j, 51j • 0) or
v lij • v{(i,6 li ) - (j.5 lj )} •
(i • j. 51i • 1.61j • 0)
otherwise
(3.6.1)
The index
of X:
t
c.
2 can be presented as a function of the second component
2
t
c,2
= -k~(2~n=-k~-~1-=-)
where
L means summation over (all possible) a such that (a,ola)
=
a
(a,O), that is uncensored on Xl'
~
* means
summation over all possible a such that (a,ola)
=
(a,l), that is censored on Xl'
,
kL means summation over all possible k
a
a
for each a.
Note that, if m > 1, k may take either multiple values of 1,2, ••• ,
la
a
or s2' or several distinct values, or a combination of both for each
a.
If llj =
° for j=1,2, ••• ,sl-1, then the censored observations
on Xl have one value, that is (S1,1), hence
rS~-l
2
t C,
2 = "'-k"""('="'2n""';-;;"'k--""='1"7)
k
If llj •
~=l
r'
v«k
k
i' s
sl
r
j=i+l
sl
'02
sl
) ... (ki ,02i»]
(3.6.3)
1
° for all j, that is all observations on Xl are un-
censored, then we have
t
c,2
•
v«k j ,02j) - (ki ,02i»
(3.6.4)
If mlj .. 1 and llj .. 0 for all j, then
t
c,2
..
2
-:--::~=---:-
k(2n-k-l)
n-l
n
L
L
(3.6.5)
i=l j=1+1
In this case ka , for each a, can take only one value out of 1,2, ••• ,s2.
Now, considering the observations on X ' if l2j .. 0 for j .. 1,2,
2
••• ,s2- l , then the censored observations on X are equal to (s2,1) with
2
a fixed value of s2.
And, in general, an observation on X2
can be
written as
where a is the rank of the corresponding observation on the first
component of X.
Associated with a type of censoring on Xl' a formula
for t c, 2 can be obtained from previous formulas by replacing (k a ,02 a )
a=i,j.
Of course, the formula obtained can be simplified easily,
according to censoring types on Xl.
3.6.2 Conditional Distribution of Tau-c,2
Considering $P_1$ as a fixed observation pattern on $X_1$, and assuming the observation pattern on $X_2$, $P_2$, comes from $(n!)$ equally likely allocations of $X_2$, then we have $(n!)$ possible values of the index $t_{c,2}$ for given $P_1$ and $P_2$. Hence, under this assumption, the conditional probability of $t_{c,2}$ given $P_1$ and $P_2$ is
$$\Pr(t_{c,2} = r_i \mid P_1, P_2) = 1/(n!) \tag{3.6.6}$$
for $i=1,2,\ldots,(n!)$. The corresponding values of $t_{c,2} = r_i$ can be computed easily for small sample sizes. For large sample sizes, however, the use of a computer should be more practical.
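Furthermore, having all values of the $r_i$'s we can compute the conditional $k$-th moment of $t_{c,2}$, $E(t_{c,2}^k \mid P_1,P_2)$, and the conditional variance of $t_{c,2}$, $\mathrm{VAR}(t_{c,2} \mid P_1,P_2)$, as follows.
$$E(t_{c,2}^k \mid P_1,P_2) = (n!)^{-1} \sum_{i=1}^{n!} r_i^k \tag{3.6.7}$$
$$\mathrm{VAR}(t_{c,2} \mid P_1,P_2) = E(t_{c,2}^2 \mid P_1,P_2) - E^2(t_{c,2} \mid P_1,P_2) \tag{3.6.8}$$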
For further discussion, we will consider a special case, in which all observations on $X_1$ are uncensored such that $m_{1j} = 1$ for all $j = 1,2,\ldots,n$.
In this case, we are in fact assuming that $X_1$ has a continuous (marginal) distribution function.
In this case, the general pattern, $P_2$, of the observations on $X_2$ will be written as
$$m_j,\ l_j,\quad j = 1,2,\ldots,k \quad\text{with}\quad \sum_{j=1}^{k} (l_j + m_j) = n .$$
So, these observations will be represented by
$$\underbrace{1,\ldots,1}_{m_1};\ \underbrace{1^*,\ldots,1^*}_{l_1};\ \ldots;\ \underbrace{j,\ldots,j}_{m_j};\ \underbrace{j^*,\ldots,j^*}_{l_j};\ \ldots;\ \underbrace{k,\ldots,k}_{m_k};\ \underbrace{k^*,\ldots,k^*}_{l_k} \tag{3.6.9}$$
where $j,\ldots,j$ are the ranks of the $m_j$ uncensored observations on $X_2$; and $j^*,\ldots,j^*$ are the ranks or defined values of the $l_j$ censored observations on $X_2$.
This ordering will be considered as the natural ordering for the general right-censored sample.
Now, considering the $n!$ permutations of these ordered observations, let $S_i$ be the set of permutations having $i$ total number of inversions each.
In this case, we define that an inversion will occur in a permutation if a $j^*$ lies in front of an integer $j' \le j$, or if a $j$ lies in front of an integer $j' < j$.
Note that $j^*$, $j = 1,\ldots,k$ are not considered as integers.
Hence, we would have the sets
$$S_i,\quad i = 0,1,\ldots,I \tag{3.6.10}$$
with
(3.6.11)
where
k
I
m
i-j+l
j
(3.6.12)
The value of $i = I$ is attainable if the permutation is the inverse of the natural ordering, that is
$$k^*,\ldots,k^*,\ k,\ldots,k;\ \ldots;\ 1^*,\ldots,1^*,\ 1,\ldots,1 .$$
Let $n(S_i)$ be the number of elements or permutations in $S_i$, then we have
$$n(S_i) = n(S_{I-i}) \tag{3.6.13}$$
since each permutation in $S_i$ is an inverse ordering of a permutation in $S_{I-i}$.
The method for computing $n(S_i)$ will be proposed using the illustrative examples below.
Let $t_{c,2;i}$ be the conditional Kendall's tau between $X_1$ and the elements of $S_i$, then $t_{c,2;i}$ can be easily derived from $t_{c,2;i-1}$ by considering that an inversion contributes a value of $-1$ to the numerator of formula 3.6.2 in computing its value, in place of the $+1$ contributed by the same pair when it is not inverted.
This implies
$$N_2\,(t_{c,2;i} - t_{c,2;i-1}) = -2 \tag{3.6.14}$$
Thence,
$$t_{c,2;i} = (I - 2i)/N_2,\quad i = 0,1,\ldots,I \tag{3.6.15}$$
where $N_2 = n(n-1)/2 - L_1(L_1-1)/2$, and $L_1$ is the number of the censored observations.
Assuming that the $n!$ permutations are equally likely, we obtain
$$\Pr(t_{c,2} = t_{c,2;i}) = n(S_i)/(n!) \tag{3.6.16}$$
Consequently,
$$E(t_{c,2}^{2r+1}) = E(t_{c,2}^{2r+1} \mid P_1,P_2) = 0;\quad r = 0,1,\ldots \tag{3.6.17}$$
and
$$\sigma^2(t_{c,2}) = (n!)^{-1} \sum_{i=0}^{I} n(S_i)\, t_{c,2;i}^2 \tag{3.6.18}$$
This symmetric distribution suggests that the distribution of $t_{c,2}$ can be approximated by a normal distribution with mean zero and variance $\sigma^2(t_{c,2})$, provided the sample size is large.
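As a small numerical check of (3.6.16) to (3.6.18), the following sketch computes the conditional mean and variance from a tabulated distribution. The function name `conditional_moments` is only illustrative; the counts are the values of $n(S_i)$ reported in Table 3.2 of Section 3.6.3 (with $n = 5$, $I = 9$, $N_2 = 9$), completed by the symmetry of the counts in $i$ and $I - i$.

```python
from fractions import Fraction
from math import factorial

def conditional_moments(counts, t_values, n):
    """Return (mean, variance) of t_{c,2} when Pr(t = t_i) = n(S_i) / n!."""
    total = factorial(n)
    assert sum(counts) == total, "counts must cover all n! permutations"
    mean = sum(Fraction(c, total) * t for c, t in zip(counts, t_values))
    second = sum(Fraction(c, total) * t * t for c, t in zip(counts, t_values))
    return mean, second - mean * mean

# Example 1 of Section 3.6.3: n = 5, I = 9, N_2 = 9.
half = [2, 6, 12, 18, 22]            # n(S_0), ..., n(S_4) as listed in Table 3.2
counts = half + half[::-1]           # symmetry: n(S_i) = n(S_{I-i})
t_values = [Fraction(9 - 2 * i, 9) for i in range(10)]

mean, var = conditional_moments(counts, t_values, n=5)
print(mean, var)
```

Exact fractions are used so that the vanishing odd moment is visible without rounding error.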
3.6.3  Illustrative Examples
(1) Let
    $X_1$: 1, 2, 3, 4, 5; and
    $X_2$: 1, 2, 3, 4*, 5*
be the observations on $X_1$ and $X_2$ respectively, then we have
    $S_0$ = (1,2,3,4*,5*); (1,2,3,5*,4*)
    $S_1$ = (1,2,4*,3,5*); (1,2,5*,3,4*)
            (1,3,2,4*,5*); (1,3,2,5*,4*)
            (2,1,3,4*,5*); (2,1,3,5*,4*)
Note that in the second element of $S_0$ the ordering 5*,4* does not count as an inversion based on the previous definition.
The reason for this is that the censored observations are considered as not comparable in computing the conditional Kendall tau.
Hence, $n(S_0) = 2$.
The elements (or permutations) of $S_1$ are constructed from the elements of $S_0$ by permuting two adjacent components which could give an inversion.
Three elements on the left side are constructed from (1,2,3,4*,5*) by permuting 3,4*; 2,3 and 1,2 respectively.
And the other three are constructed from (1,2,3,5*,4*) by permuting 3,5*; 2,3 and 1,2, respectively.
So, each permutation in $S_1$ has only one inversion.
Here, we are in fact observing all possible adjacent components in each permutation of $S_0$ which could increase the number of inversions by +1.
Furthermore, using the elements of $S_1$, we can obtain the permutations having two inversions each, which become the elements of $S_2$.
For example, from (1,2,4*,3,5*) $\in S_1$ we obtain the elements
    (1,2,4*,5*,3), (1,4*,2,3,5*), and (2,1,4*,3,5*).
From (2,1,3,4*,5*), we obtain two permutations (2,1,4*,3,5*) and (2,3,1,4*,5*).
However, only the second can be counted as a new element of $S_2$, because the first permutation is the same as one of the previous three permutations.
Hence, in general, the elements of $S_{i+1}$ can be constructed from those of $S_i$, $i = 0,\ldots,I-1$.
Since $n(S_i) = n(S_{I-i})$, we only need to compute $S_i$, $i = 0,\ldots,k$, with $k \le I/2$.
As shown in the previous paragraph, for $i \ge 1$, we should note the possibility of getting the same permutations for $S_{i+1}$ from two elements of $S_i$.
For this example, finally we obtain the following distribution of $t_{c,2}$.

Table 3.2
The Distribution of $t_{c,2}$ for Data in Example 1.

    i     n(S_i)     t_{c,2;i}     Prob.
    0        2         1           2/120
    1        6         7/9         6/120
    2       12         5/9        12/120
    3       18         3/9        18/120
    4       22         1/9        22/120

The table shows only the values for $i = 0,1,\ldots,4$, since the others are symmetric except for $t_{c,2;i}$, which have negative values for $i = 5,6,\ldots,9$.
Note that the pattern $P_2$ can be written as 1, 2, 3, 3*, 3*.
(2) Let
    $X_1$: 1, 2, 3, 4, 5; and
    $X_2$: 1, 2, 3*, 4, 5*
be the observations on $X_1$ and $X_2$ respectively, then we have
    $S_0$ = (1,2,3*,4,5*); (1,2,4,3*,5*); (1,2,4,5*,3*).
Comparing with the set $S_0$ in the first example, here we have two adjacent components (4,3*), which are not counted as an inversion, because we cannot say that the uncensored observation, 4, is larger than the censored observation, 3*.
Hence, here we have $n(S_0) = 3$.
Using the same steps as in the previous example, we obtain

Table 3.3
The Distribution of $t_{c,2}$ for Data in Example 2

    i     n(S_i)     t_{c,2;i}     Prob.
    0        3         8/9         3/120
    1        9         6/9         9/120
    2       15         4/9        15/120
    3       21         2/9        21/120
    4       24         0          24/120
    5       21        -2/9        21/120
    6       15        -4/9        15/120
    7        9        -6/9         9/120
    8        3        -8/9         3/120
Note that here we have $I = 8$, and the values of $t_{c,2}$ range from $-8/9$ to $+8/9$.
This situation suggests a modification for the generalized Kendall's tau, which will be considered later.
The pattern for $X_2$ also can be written as (1,2,2*,3,3*), because the uncensored observations have only three ranks.
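Tables 3.2 and 3.3 can be checked by direct enumeration of all $5!$ orderings. The sketch below is one way to do this, not the author's program: each observation is coded as a hypothetical pair `(rank, censored)`, the inversion rule is the one stated in Section 3.6.2 (a starred value in front of an uncensored rank $j' \le j$, or an uncensored $j$ in front of an uncensored $j' < j$), and the function names are illustrative.

```python
from collections import Counter
from itertools import permutations

def is_inversion(front, back):
    """front and back are (rank, censored) pairs; front precedes back."""
    j, front_censored = front
    j_prime, back_censored = back
    if back_censored:            # starred values are not treated as integers
        return False
    return j_prime <= j if front_censored else j_prime < j

def inversion_distribution(pattern):
    """Map i -> n(S_i) over all n! orderings of the observations in `pattern`."""
    counts = Counter()
    n = len(pattern)
    for perm in permutations(range(n)):      # index labels keep equal items distinct
        obs = [pattern[p] for p in perm]
        inversions = sum(is_inversion(obs[a], obs[b])
                         for a in range(n) for b in range(a + 1, n))
        counts[inversions] += 1
    return dict(sorted(counts.items()))

# Example 1: pattern 1, 2, 3, 3*, 3*   Example 2: pattern 1, 2, 2*, 3, 3*
example1 = [(1, False), (2, False), (3, False), (3, True), (3, True)]
example2 = [(1, False), (2, False), (2, True), (3, False), (3, True)]
dist1 = inversion_distribution(example1)
dist2 = inversion_distribution(example2)
print(dist1)
print(dist2)
```

Dividing each count by $5! = 120$ gives the probability column of the tables; the approach costs $n!$ evaluations, so it is only practical for small samples.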
Previous examples show all the cases which can be found in more general patterns.
Considering the most general pattern, $P_2$, given in Section 3.6, it seems impossible to find a general formula for $n(S_i)$.
For an illustration, let us consider $S_0$.
The natural ordering of censored observations is a permutation having zero inversions, so it belongs to $S_0$.
However, there are many other permutations having zero inversions, because the $(l_{j-1} + m_j)$ observations (or components):
$$\underbrace{(j-1)^*,\ldots,(j-1)^*}_{l_{j-1}};\ \underbrace{j,\ldots,j}_{m_j};\quad j = 1,\ldots,k+1;\quad l_0 = m_{k+1} = 0$$
can be permuted in as many as $(l_{j-1} + m_j)!$ ways without affecting the number of inversions.
Note that the orderings $(j-1)^*, j$ and $j, (j-1)^*$ are not considered as inversions.
So far, we may conclude that
$$n(S_0) \ge K$$
with
$$K = \prod_{j=1}^{k+1} (l_{j-1} + m_j)!$$
where $l_0 = m_{k+1} = 0$ are defined.
In Example 1, we have $m_1 = m_2 = m_3 = 1$ and $l_1 = l_2 = 0$; $l_3 = 2$; since the pattern $P_2$ is 1, 2, 3, 3*, 3*, hence $K = 2$.
Here, we obtain $n(S_0) = K = 2$.
In Example 2, the pattern is 1, 2, 2*, 3, 3*, so $m_1 = m_2 = m_3 = 1$; $l_1 = 0$, $l_2 = l_3 = 1$.
Hence we have again $K = 2$, but $n(S_0) = 3$.
Now, considering the group of these $K$ permutations, we have a sub-group $G_j$ in which each permutation has ordered components
$$\underbrace{(j-1)^*,\ldots,(j-1)^*}_{c_{j-1}};\ \underbrace{j^*,\ldots,j^*}_{c_j}$$
for $0 \le c_j \le l_j$; $j = 2,\ldots,k$; with $c_{j-1} \ne 0$ and $c_j \ne 0$ for at least one $j$.
These $(c_{j-1} + c_j)$ components also can be permuted without affecting the number of inversions, because all censored observations are considered as uncomparable.
So, there are $\binom{l_{j-1}}{c_{j-1}}\binom{l_j}{c_j}$ combinations of $(c_{j-1} + c_j)$, and for each combination we would have $(c_{j-1} + c_j)!$ permutations.
However, $(c_{j-1}!)(c_j!)$ of these permutations are counted in the group of $K$ permutations.
Thence, for each permutation having combinations $c_{j-1} + c_j$ with $c_{j-1} \ne 0$ and $c_j \ne 0$, there are $C_{j-1,j}$ additional permutations from the $c_{j-1} + c_j$ components which should be counted in finding the elements of $S_0$.
In Example 2, we have $c_2 = c_3 = 1$, since the pattern $P_2$ is 1, 2, 2*, 3, 3*; hence $C_{2,3} = 1$.
Thus, we have $n(S_0) = 2 + 1 = 3$.
If the censoring on $X_2$ occurs after the $k$-th ordered completed observations, then we will not have any $C_{j-1,j}$, as shown by Example 1.
In this case
$$n(S_0) = \prod_{j=1}^{k+1} (l_{j-1} + m_j)! = K,\qquad l_0 = m_{k+1} = 0 .$$
Otherwise $n(S_0)$ is strictly larger than $K$.
The contribution of $C_{j-1,j}$ in calculating $n(S_0)$ depends on $l_j, m_j$, $j = 1,\ldots,k$.
Furthermore, $n(S_i)$, $i \ne 0$, would be affected also by $C_{j',j}$ for $j' \ne j$, which will be so complex.
Hence, in computing the value of $n(S_i)$, we may consider the following methods:
(i)  One may find a formula for $n(S_i)$ for a particular pattern, that is the pattern of his data, or he may calculate $n(S_i)$ starting from $i = 0$ using the steps shown above.
(ii) One may consider all permutations and compute the $n!$ values of $t_{c,2}$ for the bivariate sample of size $n$; then the groups are formed with respect to the same values of $t_{c,2}$.

3.7  An Index Alpha as a Modification of the Generalized Kendall's Tau
It has been noted, in previous sections, that the generalized Kendall tau, $t_c$, in general could not reach the limits $-1$ and $+1$.
This situation is shown by Example 2.
It is known that tied observations also cause this deficiency.
So, in this section, we will propose an index of correlation Alpha, $\alpha_s$, as a modification of the generalized Kendall's tau, $t_{c,s}$.
First, we will consider the index $\alpha_1$ as a modified 'unconditional' generalized Kendall's tau, $t_{c,1}$.
Considering the functions $U_{ij}$, as defined in Section 3.2, let
$$Z_U = \sum_{i<j} (U_{ij} = 0) \tag{3.7.1}$$
be the total number of pairs $(X_i, X_j)$, $i < j$; $i,j = 1,\ldots,n$ such that $U_{ij} = 0$.
Then the index $\alpha_1(X_1,X_2)$ is given by
$$\alpha_1 = \frac{\binom{n}{2}}{\binom{n}{2} - Z_U}\; t_{c,1} \tag{3.7.2}$$
or
$$\alpha_1 = \Big\{\tbinom{n}{2} - Z_U\Big\}^{-1} \sum_{i<j} U_{ij} \tag{3.7.3}$$
If $C$, $D$, $T$ and $I$ are the numbers of concordant, discordant, tied and incomparable observed pairs, respectively, then
$$\alpha_1 = \frac{C - D}{N - T - I} = \frac{C - D}{C + D} \tag{3.7.4}$$
This shows that $-1 \le \alpha_1 \le 1$.
And it has the same formula as the Goodman-Kruskal (1965) index gamma, for uncensored data.
So, this index can be considered as the generalized G-K index.
Similarly, we define an index $\alpha_2$ as a modification of $t_{c,2}$ by
$$\alpha_2 = \frac{N - N^*}{N - Z_V}\; t_{c,2} \tag{3.7.5}$$
or
$$\alpha_2 = \{N - Z_V\}^{-1} \sum_{i<j} V_{ij} \tag{3.7.6}$$
where $Z_V$ is the total number of pairs $(X_i, X_j)$, $i < j$; $i,j = 1,\ldots,n$ such that $V_{ij} = 0$.
If $C^*$ and $D^*$ are the numbers of concordant and discordant pairs out of the $N^*$ pairs, as given in Section 2.1.2, then $\alpha_2$ can be represented as
$$\alpha_2 = \frac{(C - C^*) - (D - D^*)}{(C - C^*) + (D - D^*)} \tag{3.7.8}$$
It is clear that the index $\alpha_2$ can take a value from $-1$ to $+1$.
This is considered as the conditional index alpha.
If all observations are uncensored, then $\alpha_2 = (C - D)/(C + D)$, that is the G-K index gamma.
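To make the form $(C - D)/(C + D)$ concrete, the sketch below counts concordant and discordant pairs for a small right-censored bivariate sample. The scoring rule is an assumption made for this illustration, since the functions $U_{ij}$ of Section 3.2 are not restated here: within a component, an ordering is scored only when it is certain despite censoring (the smaller observed value must be uncensored), and a pair is concordant or discordant according to the product of its two component scores. The names `order_score` and `alpha_index` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def order_score(zi, di, zj, dj):
    """+1 if the i-th true value surely exceeds the j-th, -1 if surely smaller, else 0.
    d = 0 marks an uncensored value, d = 1 a right-censored one (assumed rule)."""
    if zi > zj and dj == 0:
        return 1
    if zi < zj and di == 0:
        return -1
    return 0

def alpha_index(sample):
    """sample: list of ((z1, d1), (z2, d2)). Returns (C, D, (C - D)/(C + D))."""
    concordant = discordant = 0
    for obs_i, obs_j in combinations(sample, 2):
        u = 1
        for (zi, di), (zj, dj) in zip(obs_i, obs_j):
            u *= order_score(zi, di, zj, dj)
        concordant += (u == 1)
        discordant += (u == -1)
    if concordant + discordant == 0:
        raise ValueError("no comparable pairs")
    return concordant, discordant, Fraction(concordant - discordant,
                                            concordant + discordant)

# Data of Example 2 in Section 3.6.3: X1 = 1..5 uncensored, X2 = 1, 2, 3*, 4, 5*.
sample = [((1, 0), (1, 0)), ((2, 0), (2, 0)), ((3, 0), (3, 1)),
          ((4, 0), (4, 0)), ((5, 0), (5, 1))]
C, D, alpha = alpha_index(sample)
print(C, D, alpha)
```

For these data the natural ordering attains the upper limit of the index, whereas the corresponding value of $t_{c,2}$ in Table 3.3 is 8/9.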
3.8  Test of Independence Using $t_{c,1}$-Statistic
3.8.1  For Uncensored Data
Let $X_i = (X_{1i}, X_{2i})$ be the true observation on the $i$-th subject or individual which may be censored by a variable $Y_i = (Y_{1i}, Y_{2i})$.
So, we observe
$$Z_{ri} = \mathrm{Min}(X_{ri}, Y_{ri}) \tag{3.8.1}$$
for $r = 1,2$ along with indicator variables
$$\delta_{ri} = \begin{cases} 0 & \text{if } Z_{ri} = X_{ri} \\ 1 & \text{if } Z_{ri} = Y_{ri} \end{cases} \tag{3.8.2}$$
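The observation scheme (3.8.1) and (3.8.2) can be sketched in a few lines. The function names are illustrative, and the handling of an exact tie between the true and the censoring value (treated here as uncensored) is an assumption.

```python
def observe(x, y):
    """Return (z, delta): z = min(x, y), delta = 0 if z is the true value x, else 1.
    A tie x == y is treated as uncensored (assumption)."""
    return (x, 0) if x <= y else (y, 1)

def observe_subject(x_vec, y_vec):
    """Apply the scheme componentwise to one subject."""
    return [observe(x, y) for x, y in zip(x_vec, y_vec)]

# One subject with true values (2.0, 5.0) and censoring values (3.0, 4.0):
z = observe_subject((2.0, 5.0), (3.0, 4.0))
print(z)
```

In the example the first component is observed exactly, while the second is right-censored at 4.0.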
Associated with known d.f.'s of $X$ and $Y$, we may consider $Z_i = (Z_{1i}, Z_{2i})$ as uncensored observations for all $i = 1,2,\ldots,n$.
Thence the generalized Kendall's tau, $t_{c,1}$, between $Z_1$ and $Z_2$ is in fact the usual Kendall's tau-a:
(3.8.3)
If $F(\cdot)$, $G(\cdot)$ and $H(\cdot)$ are the d.f.'s of $X$, $Y$ and $Z$, respectively, then
(3.8.4)
Assuming that $X$ and $Y$ are independent random variables, we obtain
$$\Delta H(u,v) = \Delta F(u,v) \cdot \Delta G(u,v) \tag{3.8.5}$$
where
$$\Delta W(u,v) = 1 - W(u,\infty) - W(\infty,v) + W(u,v)$$
which will be written as
$$\Delta W = 1 - W_1 - W_2 + W \tag{3.8.6}$$
If $W = W_1 W_2$ then $\Delta W = (1 - W_1)(1 - W_2)$.
So, if $Y$ has independent components then (3.8.5) becomes
$$\Delta H = \Delta F \cdot (1 - G_1)(1 - G_2) \tag{3.8.7}$$
In this case, independence of $X_1$ and $X_2$ would clearly imply the independence of $Z_1$ and $Z_2$.
So, under the null hypothesis, $H_0$: $t_{c,1}(X_1, X_2) = 0$, $Z_1$ and $Z_2$ are independent.
Thence, using Hoeffding's results, if $Z$ has a continuous distribution, we have
$$E_{H_0}\, t_a(Z_1, Z_2) = \tau_a = 0 \tag{3.8.8}$$
$$\sigma^2_{H_0}\, t_a(Z_1, Z_2) = \frac{2(2n + 5)}{9n(n - 1)} \tag{3.8.9}$$
For large $n$, the distribution function of $\sqrt{n}\,(t_a - \tau_a)$ tends to the normal form, by theorem 7.1 of Hoeffding (1948).
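A minimal sketch of the resulting large-sample test follows: it computes Kendall's tau-a for untied data, standardizes it with the null variance $2(2n+5)/\{9n(n-1)\}$ of (3.8.9), and converts the result to a two-sided normal tail probability. The function names are illustrative, and the normal approximation is only rough for samples as small as the one shown.

```python
from itertools import combinations
from math import erfc, sqrt

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau_a(z1, z2):
    """Kendall's tau-a: sum of sign products over all pairs, divided by n(n-1)/2."""
    n = len(z1)
    s = sum(sign(z1[i] - z1[j]) * sign(z2[i] - z2[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)

def null_variance(n):
    """Variance of tau-a under independence, as in (3.8.9)."""
    return 2 * (2 * n + 5) / (9 * n * (n - 1))

def independence_test(z1, z2):
    """Return (tau_a, standardized statistic, two-sided normal p-value)."""
    n = len(z1)
    tau = kendall_tau_a(z1, z2)
    z = tau / sqrt(null_variance(n))
    return tau, z, erfc(abs(z) / sqrt(2))

tau, z_stat, p_value = independence_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(tau, z_stat, p_value)
```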
If $H_0$ is true, the critical region of size $\epsilon$ of the $t_a$-test may be defined by $|t_a| \ge c_n$ where $c_n$ is the smallest number satisfying
$$P(\,|t_a| > c_n \mid H_0\,) \le \epsilon \tag{3.8.10}$$
We may write $c_n = c'_n\,\sigma(t_a) = n^{-1/2}\,2b/3$, as given by Hoeffding.
And, the power function of the test is
$$P_n(K_n) = P\{\,|t_a| > 2b/(3\sqrt{n}) \mid K_n\,\}$$
Since $\sigma(t_a) \to 0$ as $n \to \infty$, $P_n(K_n) \to 1$ for any alternative hypothesis $K_n$ with $\tau_a \ne 0$.
Now we shall study the distribution of $t_a(Z)$ assuming a sequence of alternative hypotheses.
(3.8.12)
where
and $A$ is some function of $H_1$, $H_2$, the marginal distributions of $Z$; with $A \not\equiv 0$ for $n = 1,2,\ldots$.
This $\Omega^*$ is considered as a dependence function of $Z$, introduced by Sibuya (1960).
Writing $t_a(Z)$ as
(3.8.14)
we obtain
$$E_{K_n^*}(t_a) = \tbinom{n}{2}^{-1} \sum E_{K_n^*}(U_{12}) = E_{K_n^*}(U_{12}) \tag{3.8.15}$$
where
(3.8.16)
with
From (3.8.8), (3.8.12) and (3.8.16), we would have
$$E_{K_n^*}(t_a) = n^{-1/2}\theta^* + n^{-1}\gamma + E_{H_0}(t_a) = n^{-1/2}\theta^* \tag{3.8.18}$$
for $n$ large, with
$$\theta^* = 2\iiiint U_{12}\, d(H_{11}H_{21})\, d(H_{12}H_{22}A^*(H_{12},H_{22})) = 8\iint H_1 H_2 A^*(H_1,H_2)\, d(H_1 H_2), \tag{3.8.19}$$
since
$$\theta^* = 8\int_0^1 x f(x)\,dx \int_0^1 y g(y)\,dy \tag{3.8.21}$$
Here, we should note that
(3.8.22)
under the assumption that $Z$ has continuous density functions, with marginal d.f.'s $H_{1j}$ and $H_{2j}$.
And hence
$$U_{12} = \begin{cases} +1, & H_{i1} > H_{i2} \text{ or } H_{i1} < H_{i2} \text{ for } i = 1,2 \\ -1, & H_{11} > H_{12},\ H_{21} < H_{22} \text{ or } H_{11} < H_{12},\ H_{21} > H_{22} \end{cases} \tag{3.8.23}$$
The variance of t
2 ()
°K*
n
t
a
-
a
under K* is
n
(n)-l IIIIu 2 dA ciA
2
12 i 2
t'
(i,j)
~
t'
(r,s)
(3.8.24)
where the multiple integrals under the double summations can be 6 or 8 integrals according to whether $(i = r,\ j \ne s)$, $(i \ne r,\ j = s)$ or $(i \ne r,\ j \ne s)$.
So, we obtain
2
n -1
0K*(t a ) • (2) {I + 2(n-2)YO + (n-2)
n
~
(n-3)~*(ta)} n
Ei*(t a )
n
(3.8.25)
where
YO • I ••• fU12U13dAl dA2 dA 3 • 1/(1 - 2Hl - 2B 2 + 4H)
2
dB
(3.8.26)
If A(H ,H ) ·
l 2
YO - 1/9 +
f(H ) g(H 2 ) then we obtain
l
n-~Yl + n- l Y2
(3.8.27)
where
l
Y - 8/ (2x - l)x f(x) dx /1(2y - l)y g(y) dy
1
0
0
+ 4/ l x f(x)
o
dx /lyg(y)dy
0
And hence
2
0H (t ) + R
o
a
(3.8.28)
n
with
1
1
1 *2
1 *
R - (~)- 1 (n - 2)(2n~Yl + 2n- Y + (n - 3)n- S ) - n- S
n
2
2
So, for large $n$,
$$\sigma^2_{K_n^*}(t_a) \approx \sigma^2_{H_0}(t_a) = 2(2n + 5)/9n(n-1) .$$
Thence, as $n \to \infty$, $(t_a - n^{-1/2}\theta^*)/\sigma_{H_0}(t_a)$ has a normal distribution having zero mean and unit variance.
Finally, we would study the dependence of $X_1$ and $X_2$ associated with the dependence of $Z_1$ and $Z_2$.
From (3.8.4-7), we obtain
$$H_i = F_i + G_i(1 - F_i);\quad i = 1,2$$
where $W = W(x,y)$, $W_1 = W(x,\infty)$ and $W_2 = W(\infty,y)$.
This implies, since $0 \le (1 - W_i) \le 1$,
Theorem 3.8.1
The marginal d.f. of $Z$, $H_i$, is larger than $F_i$, the marginal d.f. of $X$.
That is
(3.8.31)
and
(3.8.32)
Remark 3.8.1
Associated with (3.8.12), let
$$\Omega(F_1,F_2) = 1 + n^{-1/2} A(F_1,F_2) \tag{3.8.33}$$
with
be the dependence function of $X$, then
IH l H2 A*(HI ,H2) I ~
IFI F2 "(F1 ,F2 ) I
(3.8.34)
and
(3.8.35)
Remark 3.8.2
Considering $n$ uncensored observations on the $X$ variable, under the alternative hypothesis
$$K_n:\ \Omega(F_1,F_2) = 1 + n^{-1/2} A(F_1,F_2) \tag{3.8.36}$$
we would have, for large $n$,
$$E_{K_n}(t_a) = n^{-1/2}\theta \tag{3.8.37}$$
with
(3.8.39)
Then
$$L = (|\theta| - |\theta^*|)/|\theta| \tag{3.8.40}$$
can be considered as the loss of the power of the test, because of censoring.
Definition 3.8.1
Two univariate variables $U$ and $V$ are called positively (or negatively) quadrant dependent if, and only if,
$$F(u,v) - F(u,\infty)\,F(\infty,v) \ge 0 \quad (\text{or} \le 0) \tag{3.8.41}$$
where $F(u,v)$ is the joint d.f. of $(U,V)$.
Theorem 3.8.2
$X_1$ and $X_2$ are positively (or negatively) quadrant dependent if, and only if, $Z_1$ and $Z_2$ are.
Theorem 3.8.3
If $X_1$ and $X_2$ are either positively or negatively quadrant dependent, then
(3.8.42)
provided any one of the following conditions holds:
(i)
(3.8.43)
(ii) If $X_i$ and $Y_i$ are non-negative variables, let
$$\lambda_{x,i}(t) = f_i(t)/\{1 - F_i(t)\},\qquad \lambda_{y,i}(t) = g_i(t)/\{1 - G_i(t)\} \tag{3.8.44}$$
be the hazard functions corresponding to $X_i$ and $Y_i$ respectively.
Then the conditions would be
$$\lambda_{x,i} + \lambda_{y,i} \le \lambda_{x,i} \cdot \lambda_{y,i} \tag{3.8.45}$$
for $i = 1,2$.
(iii) If $X_i$ and $Y_i$ have standard normal distribution functions for $i = 1,2$.
Proof: (Theorem 3.8.3)
Considering formulae (3.8.19) and (3.8.39), we need to show that
(3.8.46)
It is easy to verify that
$$H_1 H_2 A^*(H_1,H_2)\, h_1 h_2 = F_1 F_2 A(F_1,F_2)\, f_1 f_2 \times M \tag{3.8.47}$$
with
(3.8.48)
Then, we should show that
$$0 < M < 1 \tag{3.8.49}$$
Conditions (i) and (ii) clearly give this inequality.
For (iii), $M$ can be written as
(3.8.50)
Since $F_i$, $G_i$ are standard normal, then $(1 - F_i)/f_i$ and $(1 - G_i)/g_i$, $i = 1,2$, are the Mills ratios (see Johnson and Kotz (1970)), which are non-negative.
Using their upper bounds and the normal density functions for $g_i$ we obtain (3.8.49).
Example 3.8.1
Let $X$ be normal having d.f.
$$F(u,v;\rho) = \int_{-\infty}^{u}\int_{-\infty}^{v} f(s,t;\rho)\, ds\, dt \tag{3.8.51}$$
with correlation $\rho$; we have, see Sibuya (1960),
$$\frac{\partial f(u,v;\rho)}{\partial \rho} = \frac{\partial^2 f(u,v;\rho)}{\partial u\, \partial v} \tag{3.8.52}$$
We may consider $f(u,v;\rho)$ as a function of $\rho$, then using a Taylor series expansion we have
$$f(u,v;\rho) = \sum_{k=0}^{\infty} \frac{\rho^k}{k!} \left[\frac{\partial^k f(u,v;\rho)}{\partial \rho^k}\right]_{\rho = 0} \tag{3.8.53}$$
For large $n$, with $\rho = n^{-1/2} r$, this can be approximated by
$$f(u,v;n^{-1/2}r) = f_1(u)\, f_2(v)\{1 + n^{-1/2} u v r\} \tag{3.8.54}$$
where $f_i$, $i = 1,2$ are the marginal density functions, and hence
(3.8.55)
Now, we would find the value of $\theta$ under the alternative hypothesis $K_n$, which can be written as
(3.8.56)
From (3.8.33), (3.8.39) and (3.8.55), we obtain
$$\frac{d\theta}{dr} = 8\iint f_1^2(u)\, f_2^2(v)\, du\, dv \tag{3.8.57}$$
Thence, we have a differential equation
$$\frac{d\theta}{dr} = 2/\pi \tag{3.8.58}$$
with boundary condition $\theta = 0$, for $\rho = 0$.
Thus
$$\theta = 2r/\pi \tag{3.8.59}$$
This value also can be obtained based on Greiner's relation
$$\tau_a = E(t_a) = \frac{2}{\pi}\sin^{-1}\rho \tag{3.8.60}$$
(see Kendall (1949)), for $\rho = n^{-1/2} r$, by taking the first term of its Taylor series expansion.
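The agreement between Greiner's relation and its first-order term can be seen numerically. The short sketch below (function names are illustrative) compares $(2/\pi)\sin^{-1}\rho$ at $\rho = n^{-1/2} r$ with $n^{-1/2}\,(2r/\pi)$ for increasing $n$.

```python
from math import asin, pi, sqrt

def tau_greiner(rho):
    """Greiner's relation for the bivariate normal: tau = (2/pi) * arcsin(rho)."""
    return 2 / pi * asin(rho)

def first_order_gap(n, r):
    """Difference between Greiner's tau at rho = r/sqrt(n) and n^{-1/2} * 2r/pi."""
    rho = r / sqrt(n)
    return tau_greiner(rho) - (2 * r / pi) / sqrt(n)

for n in (10, 100, 10000):
    print(n, first_order_gap(n, r=1.0))
```

The gap is of order $\rho^3$, so it vanishes faster than the leading term as $n$ grows.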
Example 3.8.2
If X has d.f. (3.8.51), then (3.8.29) implies
(3.8.61)
Hence
dB * • 811{1 - G (u)}{1 - G (v)}f(u,v;p) dA
n -~ dp
2
1
(3.8.62)
with
(3.8.63)
If Gi's are standard normal d.f. 's, then
(3.8.64)
Bence
n-~~* •
3; II
f(u,v;p) d{l - F (u)}3{1 - F2 (V)}]
1
Under K in (3.8.56), (3.8.65) reduces to
n
(3.8.65)
*
dB
n -~ -dp
(3.8.66)
c
Hence
(3.8.67)
To compute the value of 8* , we should consider using a Taylor
series expansion, that is
1
3
3 5
1 - F (u) - ~ - (u - ~,+ Us ' - ••• )
l
/:2n
u.
.
(3.8.68)
and the recursive formula
k _u 2
du
1 u e
00
k-l
= -2-
• k-3
--2-.. .. ~ ./n
(3.8.69)
for k is an even integer, and it is zero for k odd.
Thence, we
have
8*
= 32r {
~l___
8;;
2
2
1 Pol(u ) 3- u du}
00
2
(3.8.70)
where
2
Pol(u )
0=
(u -
5
3
4
u
3u
2
2
3T
+ ~ - • ..) -= u _ u /3 + 7u 6 /90- .••
(3.8.71)
Based on the previous example, we obtain
Theorem 3.8.4
If X has a bivariate normal distribution, then under the
alternative hypothesis (3.8.56)
8* • 8r I 1 *1 2
with
(3.8.72)
Proof
By substituting (3.8.55) and (3.8.29) in (3.8.19), we easily obtain (3.8.72).

3.8.2  For Censored Data
In this case, we should consider the observations
$$Z_i(\delta_i) = (Z_{1i}, \delta_{1i};\ Z_{2i}, \delta_{2i})$$
with $\delta_{ri} = 0$ or $1$ according to whether $Z_{ri}$ is uncensored or censored.
For $i = 1,2,\ldots,n$, let us define
$$p_{ij} = P\{U_{ij} = +1\} \tag{3.8.73}$$
$$q_{ij} = P\{U_{ij} = -1\} \tag{3.8.74}$$
where
with
Since there are in general four types of observations, we may have ten different values of $p_{ij}$'s or $q_{ij}$'s.
And let
$$\tbinom{n}{2}\, p = \sum_{i<j} p_{ij} \tag{3.8.75}$$
$$\tbinom{n}{2}\, q = \sum_{i<j} q_{ij} \tag{3.8.76}$$
For illustration, we consider a particular case, that is a data set having $k$ uncensored observations and $(n-k)$ observations censored only on the second component.
So, we have $Z_i(0,0)$, $i = 1,\ldots,k$; and $Z_i(0,1)$, $i = k+1,\ldots,n$.
In this case, we have only three distinct values of $p_{ij}$'s (or $q_{ij}$'s).
Let
    $p_{00} = P\{U_{12} = +1$, the two observations are uncensored$\}$
    $p_{11} = P\{U_{12} = +1$, the two observations are censored$\}$
    $p_{01} = P\{U_{12} = +1$, only one observation is censored$\}$        (3.8.77)
Similarly, $q_{00}$, $q_{11}$ and $q_{01}$ are the corresponding probabilities of $U_{12} = -1$.
Having or assuming the d.f. of $Z$, as in the previous subsection, we have
1 1 H
H
1
1
P • / /(/11 / Zl + /
/ ) dA dA , k-O,l
kk
Z l
o 0 0
0
H H
ll Zl
(3.8.78)
and
1 H
1 1 H 1
qkk - / / (/11/ + / /21) dA dA , k-O,l
2
l
o 0 0 H H 0
Zl l1
1 1 Hll 1
qOl - 2 / / /
/
o0
0
H
dA Z dA l
(3.8.79)
ZI
Note the reduction of the integration regions, such as for $p_{01}$ and $q_{01}$, whenever we have two different types of observations.
In this case, we obtain $p_{00} = p_{11}$, and $q_{00} = q_{11}$.
Considering the statistics $t_{00}$, $t_{11}$ and $t_{01}$ in Section 3.4, we in fact have
$$E(t_{00}) = p_{00} - q_{00},\qquad E(t_{11}) = p_{11} - q_{11},\qquad E(t_{01}) = p_{01} - q_{01} \tag{3.8.80}$$
and then
$$E(t_{c,1}) = \tbinom{n}{2}^{-1}\Big\{\tbinom{k}{2}(p_{00} - q_{00}) + k(n-k)(p_{01} - q_{01}) + \tbinom{n-k}{2}(p_{11} - q_{11})\Big\} \tag{3.8.81}$$
$$E(t_{c,1}) = p - q \tag{3.8.82}$$
If A - H(Z) - HleH2 or n(H l ,H 2 ) - 1, then we obtain
Pkk - qkk • ~ , k-O,l
POl - qOl -
2 (~. ~
(3.8.83)
And then
This shows
$$E_{H_0}(t_{c,1}) = 0 \tag{3.8.84}$$
and the variance of $t_{c,1}$ can be written as
(3.8.85)
Under $H_0$, we have
(3.8.86)
Using similar steps to those in the previous subsection and taking all possible pairs of observations, we obtain
$$\sigma^2_{H_0}(t_{c,1}) = 2(2n+5)/(9n(n-1)) \tag{3.8.87}$$
Finally, we shall study the distribution of $t_{c,1}$ under the alternative hypothesis
$$K_n^*:\ \Omega(H_1,H_2) = 1 + n^{-1/2} f(H_1)\, g(H_2) \tag{3.8.88}$$
In this case, we substitute
(3.8.89)
in (3.8.75-79) and (3.8.85), then we obtain
$$E_{K_n^*}(t_{c,1}) = n^{-1/2}\theta^* \tag{3.8.90}$$
where
$$\theta^* = 8\int_0^1 x f(x)\,dx \int_0^1 y g(y)\,dy$$
and
(3.8.91)
As in the previous subsection $R_n \to 0$ as $n \to \infty$.
Hence for large $n$,
$$\{t_{c,1} - n^{-1/2}\theta^*\}/\sigma_{H_0}(t_{c,1}) \tag{3.8.92}$$
could be approximated by the normal distribution $N(0,1)$.
CHAPTER IV
KENDALL'S TAU AND SECTOR SYMMETRY
FOR RIGHT CENSORED MULTIVARIATE DATA
4.1  Introduction
This chapter extends the generalized Kendall's Taus (= GKT's) to right-censored multivariate data sets.
This proposal can also be considered as an extension of Simon's (1977) Kendall's Tau for uncensored multivariate data.
For the discussion, we would write the GKT's in the two-dimensional sample, in Chapter III, as
(4.1)
By substituting appropriate values for $N$ and $A_{2,ij}$, we can obtain the indexes $t_{c,1}$, $t_{c,2}$, $\alpha_1$ or $\alpha_2$ as given in the preceding chapter.
In an $m$-dimensional case, there is no real valued statistic which can provide a complete description of the association between the $m$ components.
So, we should consider a vector valued statistic having as its components GKT's of pairs of components and GKT's of higher order.
The latter can be considered as the generalized Simon's Statistics for right-censored multivariate data sets.
This chapter will also introduce a type of symmetry for $m$-variate variables.
This type of symmetry, called sector symmetry, is proposed in Section 4.5.
And in Section 4.6, the corresponding statistical tests are presented.
4.2  GKT in Multivariate Problems
Let $X_1, X_2, \ldots, X_n$ be the sample of $m$-vectors, having uncensored sub-sample $X_i$, $i = 1,\ldots,k$; and censored sub-sample $X_i$, $i = k+1,\ldots,n$.
And let
$$(X_{ri}, \delta_{ri}),\quad r = 1,2,\ldots,m \tag{4.2.1}$$
denote the $i$-th observation on the $r$-th component, with $\delta_{ri} = 0$ or $1$ according as the observation is in fact uncensored or censored.
In studying multivariate data via GKT's, we examine each of the $\tfrac{1}{2}n(n-1)$ point pairs and classify each according to the value of
$$a_{rij} = a(X_{ri} - X_{rj},\ \delta_{ri} - \delta_{rj}); \tag{4.2.2}$$
for $r = 1,\ldots,m$.
In this case Simon's "sign-case diagram" cannot be used as such.
So, we define a new variable
(4.2.3)
such that
with $\delta_{0i} = 0$ at least for $i = 1,2,\ldots,k$; and we should take $\delta_{0i} = 0$ for all $i$ whenever we are studying the conditional generalized Kendall's tau.
And we will observe the GKT's
$$t_{0r} = N^{-1} \sum_{i>j} a_{0ij}\, a_{rij} \tag{4.2.4a}$$
for $r = 1,2,\ldots,m$.
It is clear that $a_{0ij} > 0$ for all $i > j$, and hence $t_{0r}$ can be written as
$$t_{0r} = N^{-1} \sum_{i>j} a_{rij} \tag{4.2.4b}$$
The function $a_{rij}$ takes the values $-1$, $0$, $+1$, as given in Chapter II.
These three possibilities for $a_{rij}$ give rise to $3^m$ different value categories for a point pair.
Table 4.1 shows a "value-case diagram" for $m = 3$.
Note that, without the zero values, this diagram is the same as the Simon's diagram in four dimensions.
Table 4.1
Value-Case Diagram for m=3

              -1                          0                         +1
     -1        0       +1        -1       0       +1        -1       0       +1
   -1 0 1   -1 0 1   -1 0 1    -1 0 1   -1 0 1   -1 0 1    -1 0 1   -1 0 1   -1 0 1

The fractions of $\tfrac{1}{2}n(n-1)$ pairs.
Now, let $Z$ denote the $3^m \times 1$ vector giving the fractions of the $\tfrac{1}{2}n(n-1)$ point pairs falling into the various categories.
For illustration, first we consider the case $m = 2$.
In this case we have
$$M_2 Z = T_2$$
with
$$M_2 = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & -1 & 0 & 0 & 0 & 1 & 1 & 1 \\ -1 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 1 \\ 1 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 1 \end{pmatrix} \tag{4.2.6}$$
and
$$T_2 = (1,\ t_{01},\ t_{02},\ t_{12})' \tag{4.2.7}$$
where $t_{rs}$ is the GKT between $X_r$ and $X_s$.
If in the sample there are $(X_{ri}, \delta_{ri})$, $i = 1,\ldots,n$, which satisfy the conditions on the $X_0$'s, possibly after reordering, for some value of $r$, then we could take that component as $X_0$.
Thence $Z$ becomes a $3^{m-1} \times 1$ vector.
Otherwise, we observe $t_{0r}$, $r = 1,\ldots,m$, that is the GKT's between $X_r$ and a new variable $X_0$ as defined above.
All observations on $X_0$ can be considered as having natural ordering.
However, we may not be interested in these $t_{0r}$'s; then we could delete the corresponding rows of the matrix $M$.
Thence, for $m = 2$ we obtain
$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 1 \end{pmatrix} Z = \begin{pmatrix} 1 \\ t_{12} \end{pmatrix} \tag{4.2.8}$$
In the following discussion, the vector $T$ has $t_{0r}$'s as its components.
Now, for $m = 3$, we have
$$M_3 Z = T_3 \tag{4.2.9}$$
where
$$M_3 = [\,M_3(-1)\quad M_3(0)\quad M_3(1)\,] \tag{4.2.10}$$
with
101
1
1
1
1
1
1
1
1
1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1
M3 (-1)
=
.e
M3 (1) ..
0
1
1
1
0
1
0
1 -1
0
1 -1
1
1
1
0
0
o -1
1
o -1
o -1
1
-1 -1
-1
0
1
0
o -1 1 o -1
0 o -1 0 1
0 0 1 o -1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
-1 -1 -1
0
0
0
1
1
1
0
-1
0
1 -1
0
1 -1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
o -1
0
0
o -1
0
1
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
-1 -1 -1
0
0
0
1
1
1
1 -1
0
1 -1
0
1
0
0
0
1
1
1
1 -1
0
1 -1
0
1
0
0
o -1
0
1
0
0
o -1
0
1
-1
0
-1 -1 -1
-1
1
1
e
0
-1
1
M3 (0) ..
0
0
o -1
o -1
102
and
The index $t_{123}$ can be written as
$$t_{123} = N^{-1} \sum_{i>j} a_{1ij}\, a_{2ij}\, a_{3ij} \tag{4.2.11}$$
Using Simon's notation, $t_{123} = t_{0123}$.
Thus, in general, we would have
$$t_{r_1 \ldots r_s} = N^{-1} \sum_{i>j} A_{s,ij},\qquad 2 \le s \le m \tag{4.2.12}$$
where
$$A_{s,ij} = a_{r_1 ij}\, a_{r_2 ij} \cdots a_{r_s ij} \tag{4.2.13}$$
And there are $m!/(s!(m-s)!)$ combinations leading to indexes with $s$ components.
All these indexes will be considered as the possible GKT's in the $m$-dimensional data sets.
These indexes and the natural indexes $t_{0r}$, $r = 1,\ldots,m$, can be presented as a matrix equation
$$M_m Z = T_m \tag{4.2.14}$$
where $Z$ is the $3^m \times 1$ vector of the fractions, as noted above.
$M_m$ is a $2^m \times 3^m$ matrix.
The first $(m+1)$ rows can be written easily, following the value-case diagram as given in Table 4.1.
The remaining rows are the element products of combinations of $s$ rows, $r_1 < r_2 < \cdots < r_s$, out of the 2nd to the $(m+1)$-st rows of $M_m$.
$T_m$ is the $2^m \times 1$ vector of the GKT's with components
This vector will be considered as a generalized vector valued statistic of Simon's (1977) or GSS (= Generalized Simon's Statistic) for right-censored $m$-dimensional data sets.
Furthermore, we note that the two matrices:
$$M_m,\quad\text{and}\quad \begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \end{pmatrix} \otimes M_{m-1}$$
where $\otimes$ denotes the direct or Kronecker product, have the same row-vectors; that is, the first matrix can be obtained from the second by permuting its rows.
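The Kronecker structure gives a direct way to generate the rows of $M_m$. The sketch below builds the matrix by repeated Kronecker products of the $2 \times 3$ block, so its rows come out in Kronecker order, which may differ from the row order of (4.2.6) by a permutation; the function names are illustrative and plain lists are used to keep the example self-contained.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

BLOCK = [[1, 1, 1], [-1, 0, 1]]

def value_case_matrix(m):
    """2^m x 3^m matrix whose rows are those of M_m (in Kronecker order)."""
    matrix = [[1]]
    for _ in range(m):
        matrix = kron(BLOCK, matrix)
    return matrix

def apply(matrix, z):
    """Compute the matrix-vector product M Z."""
    return [sum(c * f for c, f in zip(row, z)) for row in matrix]

M2 = value_case_matrix(2)
M3 = value_case_matrix(3)
# All point pairs in the category (+1, +1): every index equals 1.
T2 = apply(M2, [0, 0, 0, 0, 0, 0, 0, 0, 1])
print(M2)
print(T2)
```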
4.3  Alternative Presentation of Simon's Statistic
Let $X_i$, $i = 1,\ldots,k$ and $X_i^*$, $i = k+1,\ldots,n$ be the uncensored and censored sub-samples of the $m$-variate variables, then the Simon's statistic (4.2.12) can be written as
$$t_{r_1 \ldots r_s} = N^{-1}\Big\{\sum_{i=1}^{k-1}\sum_{j=i+1}^{k} A_{s,ij} + \sum_{i=1}^{k}\sum_{j=k+1}^{n} A_{s,ij} + \sum_{i=k+1}^{n-1}\sum_{j=i+1}^{n} A_{s,ij}\Big\} \tag{4.3.1}$$
for $s \le m$.
This can be considered as an extension of (3.4.1) in Chapter III.
Hence, (4.2.12) can be written as
$$t_{r_1 \ldots r_s} = N^{-1}\Big\{\tbinom{k}{2}\, t_{00;r_1 \ldots r_s} + k(n-k)\, t_{01;r_1 \ldots r_s} + N^*\, t_{11;r_1 \ldots r_s}\Big\} \tag{4.3.2}$$
with
(4.3.3)
So, $t_{r_1 \ldots r_s}$ is a weighted average of $t_{00;r_1 \ldots r_s}$, the Simon's statistic of the uncensored sub-sample; $t_{01;r_1 \ldots r_s}$, an index of association of Simon's type between uncensored and censored sub-samples; and $t_{11;r_1 \ldots r_s}$, the Simon's statistic of the censored sub-sample.
As in Chapter II, in censored data, there are at least two distinct distributions for $X_i$, $i = 1,2,\ldots,n$.
If $X_i$, $i = 1,\ldots,k$ have the same distribution $F(X)$; and $X_i^*$, $i = k+1,\ldots,n$ have d.f.'s $G(X)$, then
$$E(t_{r_1 \ldots r_s}) = N^{-1}\Big\{\tbinom{k}{2}\int\!\cdots\!\int A_{s,12}\, dF(X_1)\, dF(X_2) + k(n-k)\int\!\cdots\!\int A_{s,12}\, dF(X_1)\, dG(X_2) + N^*\int\!\cdots\!\int A_{s,12}\, dG(X_1)\, dG(X_2)\Big\} \tag{4.3.4}$$
In this case, in fact, we have shown also that
(4.3.5)
(4.3.6)
(4.3.7)
Now, we will show that the index $t_{r_1 \ldots r_s}$ in (4.3.4) can be expressed as a function of the number of concordant and discordant pairs.
For two $1 \times s$ vectors $X_i(\delta_i)$, $i = 1,2$, taken at random from the population, define a vector $W = (w_1, w_2, \ldots, w_s)$ by
$$w_r = \begin{cases} +1 & \text{if } (X_{r1},\delta_{r1}) > (X_{r2},\delta_{r2}) \\ -1 & \text{if } (X_{r1},\delta_{r1}) < (X_{r2},\delta_{r2}) \\ 0 & \text{otherwise} \end{cases} \tag{4.3.8}$$
And the two vectors are called concordant or discordant if the product $\prod_{r=1}^{s} w_r$ is $+1$ or $-1$, respectively. Let
$$p_s = P\Big(\prod_{r=1}^{s} w_r = +1\Big),\qquad q_s = P\Big(\prod_{r=1}^{s} w_r = -1\Big) \tag{4.3.9}$$
Then
(4.3.10)
where $C_s$ and $D_s$ are the numbers of concordant and discordant pairs, respectively, out of the $N_1 = n(n-1)/2$ pairs of observations.
Thence, the Simon's statistic (4.2.12) can be written as
$$t_{r_1 \ldots r_s} = (C_s - D_s)/N_1 \tag{4.3.11}$$
with
(4.3.12)
(4.3.13)
For all possible values of $s$ $(\le m)$, the index $t_{r_1 \ldots r_s}$ given in (4.3.1) can be presented as a matrix equation.
Similar to Section 4.1, it is easy to verify a matrix equation
$$M \begin{pmatrix} Z_{00} & 0 & 0 \\ 0 & Z_{01} & 0 \\ 0 & 0 & Z_{11} \end{pmatrix} = T_w \tag{4.3.14}$$
where
$$T_w = \begin{pmatrix} 1 & 1 & 1 \\ t_{00;01} & t_{01;01} & t_{11;01} \\ \vdots & \vdots & \vdots \\ t_{00;12 \ldots m} & t_{01;12 \ldots m} & t_{11;12 \ldots m} \end{pmatrix} \tag{4.3.15}$$
$Z_{00}$, $Z_{01}$ and $Z_{11}$ are $3^m \times 1$ vectors giving the fractions of the $k(k-1)/2$, $k(n-k)$ and $N^*$ point pairs falling into various categories.
The subscript (00) indicates pairs within the uncensored sub-sample; (11) indicates pairs within the censored sub-sample; and (01) indicates pairs between the uncensored and censored sub-samples.
And the corresponding matrix $M$ may be written as
$$M = J \otimes M_{00} \tag{4.3.16}$$
with $J = (1\ 1\ 1)$; $M_{00}$ is in fact the matrix $M_m$ of (4.3.11) corresponding to the uncensored sub-sample, and $\otimes$ denotes a Kronecker product.
From (4.2.14), (4.3.2) and (4.3.15) we obtain
$$T_m = T_w\, W \tag{4.3.17}$$
where
$$W = N^{-1}\,(k(k-1)/2,\ k(n-k),\ N^*)'. \tag{4.3.18}$$
4.4  Dependence Functions and Tests of Independence
4.4.1  Pairwise Independence
Let $X_i = (X_{1i}, \ldots, X_{mi})$ be the true observation on the $i$-th subject which may be censored by a variable $Y_i = (Y_{1i}, \ldots, Y_{mi})$.
So, we observe
$$Z_{ri} = \mathrm{Min}(X_{ri}, Y_{ri}) \tag{4.4.1}$$
for $i = 1,\ldots,n$; $r = 1,\ldots,m$; along with indicator variables
$$\delta_{ri} = \begin{cases} 0 & \text{if } Z_{ri} = X_{ri} \\ 1 & \text{if } Z_{ri} = Y_{ri} < X_{ri} \end{cases} \tag{4.4.2}$$
As in Chapter III, let $H(z)$ be the d.f. of $Z$ and $H_r(z_r)$, $H_{rs}(z_r, z_s)$, ... be its marginal d.f.'s.
Let us define the pairwise dependence function
$$\Omega_{rs} = \Omega_{rs}(H_r, H_s) = \frac{H_{rs}(z_r, z_s)}{H_r(z_r)\, H_s(z_s)} \tag{4.4.3}$$
for all pairs $(r,s)$.
Then we will consider the problem of testing
$$H_0:\ \Omega_{rs} = 1 \text{ for all pairs } (r,s) \tag{4.4.4}$$
against the sequence of alternatives
$$K_n^*:\ \Omega_{rs}(H_r, H_s) = 1 + n^{-1/2} A_{rs}(H_r, H_s) \tag{4.4.5}$$
for $n = 1,2,\ldots$, where $A_{rs}$ is not identically equal to zero (a.e.), for at least one pair $(r,s)$, $1 \le r < s \le m$.
Using the results in Section 3.8 for each pair of $(r,s)$, the statistic
$$(t_{rs} - n^{-1/2}\theta^*_{rs})/\sigma_0 \tag{4.4.6}$$
where $t_{rs}$ is the unconditional GKT, and
$$\sigma_0^2 = 2(2n+5)/9n(n-1),\qquad \theta^*_{rs} = 8\iint H_r H_s A_{rs}(H_r,H_s)\, dH_r\, dH_s \tag{4.4.7}$$
has a normal distribution having zero mean and unit variance, as $n \to \infty$.
Suppose we have uncensored data, and $F_{rs}$, the joint d.f. of $X_r$ and $X_s$, is a bivariate normal with correlation $\rho_{rs}$.
Then under
$$K_n:\ \rho_{rs} = n^{-1/2}\rho_{0,rs} \tag{4.4.8}$$
we have, from (3.8.53-56),
(4.4.9)
4.4.2  Non-null Distribution of the Generalized Simon's Statistic
Considering the d.f. of the $m$-variate variable $Z$, let
$$\Omega(H_1, \ldots, H_m) = H(z)\Big/\prod_{r=1}^{m} H_r(z_r) \tag{4.4.10}$$
be the $m$-variate dependence function, proposed by Puri and Sen (1971).
Here, we shall study the distribution of the unconditional GKT:
(4.4.11)
under a sequence of alternative hypotheses
$$K_n^{**}:\ \Omega = 1 + n^{-1/2} A(H_1, \ldots, H_m) \tag{4.4.12}$$
for $n = 1,2,\ldots$.
Using the same method as in the bivariate case, in which $Z_i$, $i = 1,\ldots,n$ are considered as uncensored, we have
109
(4.4.13)
where
(4.4.14)
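And for large n,
$$\sigma^2_{K_n^{**}}(t_{12 \ldots m}) = \sigma^2_{H_0}(t_{12 \ldots m}) \tag{4.4.15}$$
with
$$\sigma^2_{H_0}(t_{12 \ldots m}) = \binom{n}{2}^{-1} \left\{ \int \cdots \int A^2_{m,12} \prod_{r=1}^{m} (dH_{r1}\, dH_{r2}) + 2(n-2) \int \cdots \int A_{m,12} A_{m,13} \prod_{r=1}^{m} dH_{r1}\, dH_{r2}\, dH_{r3} \right\} \tag{4.4.16}$$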
We obtain
(4.4.17)
Based on Hoeffding's theorems, as $n \to \infty$, $t_{12 \ldots m}$ has a normal distribution with mean $n^{-1/2} B^{(m)}$ and variance $\sigma^2_{H_0}(t_{12 \ldots m})$ under $K_n^{**}$. If $\lambda = \prod_{r=1}^{m} \lambda_r(H_r)$, then
$$B^{(m)} = 2 \prod_{r=1}^{m} \iint a_{r12}\, dH_{r1}\, dH_{r2}\, \lambda_r(H_{r2}) = 2^{m+1} \prod_{r=1}^{m} \int_0^1 u\, \lambda_r(u)\, du \tag{4.4.18}$$
4.5 A Type of Symmetry
Considering an m-variate variable X, we propose a type of symmetry. Observing the ordered values of the components of X, that is $X_r$, r=1,...,m; one may be interested in calculating the values of
$$p(r) = P(X_{r_1} < X_{r_2} < \cdots < X_{r_m}) \tag{4.5.1}$$
for all possible $r_i \ne r_j = 1, 2, \ldots, m$. Here, however, we are concerned with the equality of those values, instead of the values themselves. If F(x) is the d.f. of the variable X, then
$$p(r) = \int \cdots \int_{S(r)} dF(x), \tag{4.5.2}$$
where
$$S(r) = \{x : x_{r_1} < x_{r_2} < \cdots < x_{r_m}\}. \tag{4.5.3}$$
S(r) may be called the r-th sector of the m-dimensional space.
Definition 4.5.1
An m-variate variable X is said to be sector symmetric if and only if
$$p(r_1, \ldots, r_m) = p(1, 2, \ldots, m) \tag{4.5.4}$$
for all possible orderings of the $r_i$'s, that is (m!) orderings.
It is clear that
$$P_{m!} = \sum_{r} p(r) \le 1 \tag{4.5.5}$$
where the summation is over all m! possible values of r. The value $P_{m!} = 1$ is attainable if the equalities among any number of the components of X have probability zero, or F(x) is a continuous d.f. Let
$$q = 1 - P_{m!} \tag{4.5.6}$$
then q = 0 if and only if $P_{m!} = 1$.
Furthermore, we may consider testing the null hypothesis
$$H_0: p(r_1, \ldots, r_m) = p(1, \ldots, m), \quad r \in P_m \tag{4.5.7}$$
where
$$P_m = \{r : X_{r_1} < X_{r_2} < \cdots < X_{r_m}\} \tag{4.5.8}$$
is the set of all possible permutations of 1,2,...,m; which will be discussed in the following section.
This is a sector symmetry test. This $H_0$ implies that F(x) is a symmetric function of its arguments. So, the problem is in fact the same as to test the interchangeability of the m variates $x_1, x_2, \ldots, x_m$ in F(x). If $H_0$ is true, then
$$p(r) = p \le 1/m! \tag{4.5.9}$$
and the equality is attainable if F(x) is a continuous d.f.
For m=2, a positive bivariate variable $(X_1, X_2)$ has a unique characteristic. Writing
$$X_1 = R\cos\theta; \quad X_2 = R\sin\theta \tag{4.5.10}$$
for $0 < \theta < \pi/2$, then $H_0$ in (4.5.7) reduces to
$$H_{2,0}: \mathrm{Median}(\theta) = \pi/4 \tag{4.5.11}$$
4.6 Tests for Sector Symmetry
4.6.1 Complete Data Sets
4.6.1.1 Chi-Squared Tests
Considering a random sample of size n on an m-variate variable X having d.f. F(x), let n(r) be the number of observations belonging to the domain S(r) in (4.5.3), then the probability that exactly n(r) observations fall into the sector S(r) for all $r \in P_m$ is
(4.6.1)
with
(4.6.2)
where p(r) is given by (4.5.2) and q is given by (4.5.6). This is the multinomial p.d.f. of (m!) variables n(r)'s.
If F(x) is continuous, then q = 0 and (4.6.1) becomes the multinomial p.d.f. of (m! - 1) variables, that is
$$n! \prod_{r} \{n(r)!\}^{-1} \{p(r)\}^{n(r)} \tag{4.6.3}$$
with
$$\sum_{r} p(r) = 1, \quad \text{and} \quad \sum_{r} n(r) = n \tag{4.6.4}$$
From this point, it is assumed that the d.f. F(x) is continuous. In this case, the unbiased estimator of p(r) is
$$\hat{p}(r) = n^{-1}\, n(r) \tag{4.6.5}$$
Define
$$\chi^2_{m!-1} = \sum_{r} \frac{\{n(r) - n\,p(r)\}^2}{n\,p(r)} \tag{4.6.6}$$
Then, as $n \to \infty$, $\chi^2_{m!-1}$ has a limiting chi-squared distribution with (m!-1) degrees of freedom. Hence, $\chi^2_{m!-1}$ is an appropriate statistic for testing the null hypothesis
$$H_0^*: p(r) = p_0(r), \quad r \in P_m \tag{4.6.7}$$
where $p_0(r)$ is a specified number for each r. And the null hypothesis $H_0$ in (4.5.7) is a particular case of $H_0^*$. If $H_0$ is true, the statistic
$$\chi^2_{H_0} = \sum_{r} \frac{\{n(r) - n/m!\}^2}{n/m!} \tag{4.6.8}$$
has an approximate chi-square distribution with (m!-1) degrees of freedom. Hence, this is a test for sector symmetry.
4.6.1.2 Coefficient of Sector Symmetry
Let
$$S^2 = \sum_{r} \{n(r) - n/m!\}^2 \tag{4.6.9}$$
be the corrected sum of squares of the n(r)'s or the observed sum of squares of deviations from the average value, then
$$\mathrm{Max}(S^2) = n^2 (1 - 1/m!) \tag{4.6.10}$$
Define
$$R_m^2 = S^2 / \mathrm{Max}(S^2) \tag{4.6.11}$$
and call $R_m^2$ the coefficient of sector symmetry or the index of symmetry. It is clear that, as $R_m^2$ increases from 0 to +1 the deviations become more different or larger. Hence, this index is an appropriate statistic for testing the null hypothesis $H_0$ in (4.5.7). Large values of $R_m^2$ suggest the rejection of $H_0$. It is easy to verify that
$$\chi^2_{H_0} = (m!-1)\, n\, R_m^2 \tag{4.6.12}$$
and
$$R_m^2 = \frac{\sum_{r} \{n(r)/n - 1/m!\}^2}{1 - 1/m!}$$
or
$$R_m^2 = \frac{m!}{m!-1} \Big\{ \sum_{r} (n(r)/n)^2 - 1/m! \Big\} \tag{4.6.13}$$
This implies
$$R_m^2 \le \frac{m!}{m!-1} \Big\{ \sum_{r} n(r)/n - 1/m! \Big\} \tag{4.6.14}$$
with the equality attainable if n(r) = 0 for all except one value of $r \in P_m$. Hence
$$0 \le R_m^2 \le 1 \tag{4.6.15}$$
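As an illustration of (4.6.8) through (4.6.13), the following is a minimal sketch for complete data. The function names `sector_of` and `sector_symmetry` are hypothetical, and the sketch assumes that no observation has tied components (continuous F).

```python
from itertools import permutations
from math import factorial


def sector_of(x):
    """Ordering r = (r1, ..., rm), 1-based, such that x[r1] < ... < x[rm].

    Returns None when components are tied (the observation is in no sector).
    """
    if len(set(x)) < len(x):
        return None
    return tuple(i + 1 for i in sorted(range(len(x)), key=lambda i: x[i]))


def sector_symmetry(sample):
    """Sector counts n(r), chi-squared (4.6.8) and index R_m^2 (4.6.11)."""
    m = len(sample[0])
    counts = {r: 0 for r in permutations(range(1, m + 1))}
    for x in sample:
        r = sector_of(x)
        if r is None:
            raise ValueError("tied components; a continuous d.f. is assumed")
        counts[r] += 1
    n = len(sample)
    m_fact = factorial(m)
    expected = n / m_fact                      # n / m! under H0
    s2 = sum((c - expected) ** 2 for c in counts.values())   # (4.6.9)
    chi2 = s2 / expected                       # (4.6.8)
    r2 = s2 / (n ** 2 * (1 - 1 / m_fact))      # (4.6.11) with (4.6.10)
    return counts, chi2, r2


# One observation in each of the 3! sectors, then four in a single sector.
balanced = [(1, 2, 3), (2, 1, 3), (3, 1, 2), (1, 3, 2), (2, 3, 1), (3, 2, 1)]
extreme = [(1, 2, 3)] * 4
```

In both examples the identity (4.6.12) can be checked numerically by comparing `chi2` with `(factorial(m) - 1) * n * r2`.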
This $R_m^2$ can be considered as the sample index of symmetry, which is an estimator of the population index of symmetry
$$I_m^2 = \frac{\sum_{r} (p(r) - 1/m!)^2}{1 - 1/m!} \tag{4.6.16}$$
with p(r) given by (4.5.2). This index $I_m^2$ can be considered as a measure of the agreement between p(r) and 1/m!.
4.6.2 Censored Data Sets
Considering the m-variate variables X, Y, and Z discussed in Section 4.4, we have
(4.6.17)
provided the variable X and the censoring variable Y are independent. Let $z_{r0}$ be the r-th component of z, define
$$z(r) = (z_{r_1 0}, z_{r_2 0}, \ldots, z_{r_m 0}) \tag{4.6.18}$$
Then the components of z(r) are a permutation of the $z_{r0}$'s, for each $r \in P_m$.
Now, if the d.f.'s of X and Y, F(x) and G(x), are symmetric functions of their arguments, then
$$P(X \ge z) = P(X \ge z(r)), \tag{4.6.19}$$
$$P(Y \ge z) = P(Y \ge z(r)), \tag{4.6.20}$$
for all $r \in P_m$. This implies
$$P(\mathrm{Min}(X,Y) \ge z) = P(X \ge z(r))\,P(Y \ge z(r)) = P(\mathrm{Min}(X,Y) \ge z(r)) \tag{4.6.21}$$
Hence
(4.6.22)
for all $r \in P_m$, or the d.f. of Z, H(z), is a symmetric function of its arguments. Thus, it has been shown
Theorem 4.6.1
If $C_{os}$ is the class of symmetric functions, and if
(i) X and Y are independent random variables,
(ii) $F(x) \in C_{os}$ and $G(y) \in C_{os}$, and
(iii) Z = Min(X, Y),
then $H(z) \in C_{os}$.
Under the conditions of Theorem 4.6.1, the null hypothesis $H_0$ in (4.5.7) for right-censored data on X with censoring variable Y, can be tested using uncensored (complete) data on Z. Thence, the chi-squared test in (4.6.6) and the index of symmetry in (4.6.13) are applicable for the m-variate variable
$Z$.
On the other hand, without these conditions, we should consider the component $(X_{ri}, \delta_{ri})$ of $X_i$ in (4.2.1). And the classification of $X_i$ into the r-th sector S(r) in (4.5.3) is defined as
(4.6.23)
Otherwise $X_i$ is a nonclassified observation. This implies that $X_i \in S(r)$ if and only if $X_i$ has components
$$(X_{r_1 i}, 0) < \cdots < (X_{r_j i}, 0) < (X_{r_{j+1} i}, 1) < \cdots < (X_{r_m i}, 1) \tag{4.6.24}$$
for j=0,1,...,m. The value j=0 indicates all components are censored, and j=m indicates uncensored $X_i$.
A better approach to classification may be considered, following the Gehan (1965) general pattern, given in Section 3.6. In this case, it is defined that $X_i \in S(r)$ if and only if
$$(X_{r_1 i}, 0) < \cdots < (X_{r_{m-1} i}, 0) < (X_{r_m i}, \delta_{r_m i}) \tag{4.6.25}$$
where $\delta_{r_m i}$ = 0 or 1. This implies that observations having more than one censored component are considered as non-classified observations. We consider this classification to be more realistic than that in (4.6.23), because if $\delta_{r_m i} = 1$, then the true observation is certainly larger than $X_{r_m i}$ and it belongs to S(r) anyway.
Now, let $n_c$ be the number of classified observations out of n, then we have
$$\hat{p}(r \mid n_c) = n_c^{-1}\, n(r) \tag{4.6.26}$$
the estimator of the conditional probability $p(r \mid n_c)$, instead of (4.6.5). Define a conditional chi-squared statistic
$$\chi^2_c = \sum_{r} \frac{\{n(r) - n_c/m!\}^2}{n_c/m!} \tag{4.6.27}$$
for testing the null hypothesis
$$H_{0,c}: p(r \mid n_c) = p(1, 2, \ldots, m \mid n_c) \tag{4.6.28}$$
Under the assumption that if $H_{0,c}$ is true, then F(x) is symmetric, this $\chi^2_c$ becomes an appropriate test for testing the null hypothesis $H_0$ in (4.5.7).
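The classification rule (4.6.25) and the conditional statistic (4.6.27) can be sketched as follows. This is only an illustration: the names `classify` and `conditional_chi2` are hypothetical, each observation is assumed to be given as a list of (value, delta) pairs with delta = 1 for a censored component, and observations with tied values are treated as non-classified.

```python
from itertools import permutations
from math import factorial


def classify(obs):
    """Sector of one observation under (4.6.25), or None if non-classified.

    obs is a list of (value, delta) pairs, delta = 1 meaning censored.
    Only the component with the largest observed value may be censored.
    """
    values = [v for v, _ in obs]
    if len(set(values)) < len(values):
        return None                      # ties: left unclassified here
    order = sorted(range(len(obs)), key=lambda i: values[i])
    if any(obs[i][1] == 1 for i in order[:-1]):
        return None                      # a censored value below the maximum
    return tuple(i + 1 for i in order)


def conditional_chi2(data):
    """n_c, the sector counts n(r), and the statistic (4.6.27)."""
    m = len(data[0])
    counts = {r: 0 for r in permutations(range(1, m + 1))}
    for obs in data:
        r = classify(obs)
        if r is not None:
            counts[r] += 1
    n_c = sum(counts.values())
    if n_c == 0:
        raise ValueError("no classified observations")
    expected = n_c / factorial(m)        # n_c / m! under H_{0,c}
    chi2 = sum((c - expected) ** 2 for c in counts.values()) / expected
    return n_c, counts, chi2


# Four bivariate observations; the second has its smaller value censored.
data = [
    [(1, 0), (2, 1)],
    [(3, 1), (5, 0)],
    [(4, 0), (2, 0)],
    [(1, 0), (6, 0)],
]
```

Here the second observation is non-classified, so the statistic is computed from the remaining three.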
4.7 Other Statistical Tests
Here, we will consider a special case of the m-variate variable X in which the censoring may occur only on a certain component. Based on (4.4.1), without loss of generality, we can write
$$Z = (\min(X_1, Y_1),\ X_0) \quad \text{in distribution} \tag{4.7.1}$$
where $X_0$ is a 1x(m-1) vector of the uncensored components of X, and $Y_1$ is the censoring variable associated with $X_1$, the first component of X. And this special case does not satisfy the conditions of Theorem 4.6.1.
It is clear that a necessary condition for variable X to be interchangeable is that $X_0$ should be interchangeable. So, first, we need to test the null hypothesis
$$H_{01}: X_0 \text{ is interchangeable} \tag{4.7.2}$$
by applying the chi-squared test in (4.6.8) or the index of symmetry in (4.6.13). A rejection of $H_{01}$ would clearly indicate the rejection of $H_0$ in (4.5.7), that is, of the hypothesis that X has a symmetric d.f.
On the other hand, the acceptance of $H_{01}$ in (4.7.2) suggests further consideration, that is whether or not the $H_0$ in (4.5.7) can be tested. Under a certain assumption, the conditional chi-squared in (4.6.27) is applicable. In general, however, $Z_1 = \min(X_1, Y_1)$ does not have the same d.f. as the marginal d.f.'s of $X_0$. This implies the variable Z can not be interchangeable. But, this does not imply that X is not interchangeable. Hence, in general, $H_0$ in (4.5.7) can not be tested based on censored data.
Under the assumption that equality among any components of X has zero probability, the i-th observation has the following ordered components
(4.7.3)
where $(X_{1i}, \delta_{1i})$ is a possibly censored observation on $X_1$, which is larger than (k-1) observed components, for k=1,2,...,m. Note that $(X_{1i}, \delta_{1i})$ and $(X_{r_{k+j} i}, 0)$, j=1,2,...,(m-k) are non comparable if $\delta_{1i} = 1$. For a given value of k, there are
$$\binom{m-1}{k-1} \times (k-1)! \times (m-k)! = (m-1)! \tag{4.7.4}$$
possible orderings of the m components of X. Let $S_k$ be the set of all these (m-1)! orderings.
Accepting the fact that $X_0$ is symmetric or interchangeable, and assuming that all possible orderings are equally likely, then
$$P(X \in S_k : X_0 \text{ symmetric}) = (m-1)!/m! = 1/m \tag{4.7.5}$$
for k=1,2,...,m.
In this case, we may be interested in studying the d.f. of X within the set $S_k$ for a given k, under the null hypothesis $H_{01}$ in (4.7.2). Or we may be considering the conditional d.f. of $X_0$ for a given value of $Z_1 = \min(X_1, Y_1)$, that is $F(x_0 \mid z_k)$ with $(z_k, x_0) \in S_k$.
Consider the null hypothesis
(4.7.6)
where
~
is any permutation of the components of
~,
then the con-
ditional statistic
(4.7.8)
where
$n_k$ = the number of observations belonging to $S_k$,
$n(r \mid r_k = 1)$ = the number of observations belonging to $S(r \mid r_k = 1)$ in (4.5.3),
$p(r \mid r_k = 1)$ = the conditional probability of p(r) in (4.5.2) given $r_k = 1$,
and the summation is over all r with $r_k = 1$, has an approximate chi-squared distribution with ((m-1)! - 1) degrees of freedom for large values of $n_k$.
Under $H_{02,k}$, we obtain
$$\chi^2_k = \sum_{r} \frac{\{n(r \mid r_k = 1) - n_k/(m-1)!\}^2}{n_k/(m-1)!} \tag{4.7.9}$$
for k=0,1,...,(m-1).
This shows that we have m conditional chi-squared tests. So, we may combine these tests to obtain an overall test. The method is usually referred to as the summation of chi-squareds procedure. This would lead to a chi-squared having a large number of degrees of freedom. Hence, it is suggested to consider the following weighted linear combination method.
Let $r_0$ be a 1x(m-1) vector of the permutation of (2,3,...,m) or (1,2,...,m-1) denoting an ordering of the components of $X_0$ in (4.7.1). Let $r_{0i}$ be the i-th possible value of $r_0$, then each $r_{0i}$ would associate with m elements of S(r) in (4.5.3). Considering all possible values of i=1,2,...,(m-1)!, and k=1,2,...,m, within each category, we have
$n_{ik}$ = the number of observations in the (i,k)-th cell or category out of the n observations,
$p_{ik}$ = the probability that a random observation falls into the (i,k)-th cell.
Now let $p'$ be the 1x(m-1)! vector of the probabilities with the i-th m components: $p_{i1}, p_{i2}, \ldots, p_{im}$, and let $N' = \{(n_{ik})\}$ be the corresponding number of observation vector, then
$$\hat{p} = n^{-1} N \tag{4.7.10}$$
is the maximum likelihood estimator of p, or the vector of sample proportions, since N has a multinomial density function
(4.7.11)
Set
$$U_n = \sqrt{n}\,(\hat{p} - p) \tag{4.7.12}$$
then
$$E(U_n) = 0, \qquad \mathrm{Cov}(U_n) = \mathrm{Diag}(p) - p\,p' \tag{4.7.13}$$
where Diag(p) is a diagonal matrix having p as diagonal elements.
Theorem 4.7.1
As $n \to \infty$, $U_n$ has a multivariate normal distribution function with mean vector 0 and covariance matrix (Diag(p) - p p'). For a proof see Bishop, Fienberg and Holland (1975), p. 470.
Define weighted linear combinations
(4.7.14)
where
$$w_k = 1/\mathrm{s.e.}(p_{\cdot k}), \quad k = 1, 2, \ldots, m \tag{4.7.15}$$
is independent of $n_{ik}$ for i=1,2,...,(m-1)!. Here we should note that defining the values of $w_k$ in (4.7.15) corresponding to the $n_{ik}$'s makes them independent with respect to k. However, since the $n_{ik}$'s are not independent, for correction, the dependency of the $n_{ik}$'s will be taken into account in computing the variance-covariance matrix of $L = (L_1, L_2, \ldots)'$.
From (4.7.11), we obtain
(4.7.16)
(4.7.17)
Set $L_n = \sqrt{n}\,(L - E(L))$, then
$$L_n = (I \otimes w')\, U_n = W' U_n \tag{4.7.18}$$
where $w = (w_1, \ldots, w_m)'$, I is an identity matrix and $\otimes$ denotes the Kronecker product. Theorem 4.7.1 implies that $L_n$ has a multivariate normal distribution function, as $n \to \infty$, with mean vector 0 and covariance matrix
$$V = W' \{\mathrm{Diag}(p) - p\,p'\} W = \{(n \sigma_{ij})\} \tag{4.7.19}$$
with $\sigma_{ij}$ as given in (4.7.17).
Theorem 4.7.2
Let $Y \sim N(\mu, V)$. A necessary and sufficient condition that $(Y - \mu)' A (Y - \mu)$ have a chi-square distribution is
$$V A V A V = V A V \tag{4.7.20}$$
in which case the degrees of freedom are Rank(AV). If $|V| \ne 0$, then the condition reduces to
$$A V A = A \tag{4.7.21}$$
For a proof see Rao (1973), p. 188.
This theorem implies that the statistic
(4.7.22)
where $V^+$ is the Moore-Penrose inverse of V, has an approximate chi-squared distribution function with degrees of freedom rank($V^+ V$). If $|V| \ne 0$ then $V^+ = V^{-1}$ and the chi-squared has (m-1)! degrees of freedom.
Considering the null hypothesis
$$H_0: p = p_0, \quad (p_0 \text{ a fixed probability vector}) \tag{4.7.23}$$
we obtain $V = V_0$, if $H_0$ is true, and the statistic (4.7.22) reduces to
(4.7.24)
where $V_0$ is the value of V in (4.7.19) for $p = p_0$, which has an approximate chi-squared distribution function with Rank($V_0$) degrees of freedom.
Finally, we consider a sequence of alternative hypotheses
$$K_n: p = p_0 + n^{-1/2} \lambda \tag{4.7.25}$$
Under $K_n$, instead of (4.7.11), set
(4.7.26)
so that
(4.7.27)
and
$$\mathrm{Cov}_{K_n}(U_n) = \{\mathrm{Diag}(p_0) - p_0 p_0'\} + n^{-1/2} \{\mathrm{Diag}(\lambda) - 2 p_0 \lambda'\} + n^{-1} \lambda \lambda', \tag{4.7.28}$$
see Bishop, Fienberg and Holland (1975). For large n,
$$\mathrm{Cov}_{K_n}(U_n) \approx \{\mathrm{Diag}(p_0) - p_0 p_0'\} + n^{-1/2} \{\mathrm{Diag}(\lambda) - 2 p_0 \lambda'\} \tag{4.7.29}$$
From (4.7.18) we obtain
(4.7.30)
(4.7.31)
Hence we have
(4.7.32)
which is a noncentral chi-square with noncentrality parameter
$$\lambda' W V^{-} W' \lambda \tag{4.7.33}$$
Here, we should note that we may consider using another approximation for $\mathrm{Cov}_{K_n}(U_n)$, instead of (4.7.29), that is
$$\mathrm{Cov}_{K_n}(U_n) \approx \{\mathrm{Diag}(p_0) - p_0 p_0'\} \tag{4.7.34}$$
provided $n^{1/2}$ is sufficiently large. Using this approximation, (4.7.32) reduces to
(4.7.35)
CHAPTER V
CHARTS FOR K-INDEPENDENT SAMPLES
5.1 Introduction
In this chapter we propose two subjects. First, we consider the use of the censored pair chart for K-independent samples, as a descriptive statistic and to test the null hypothesis that the K parent populations have the same distribution functions. Second, we introduce a chart in 3-dimensional space based on 3-independent samples, as an extension of the (2-dimensional) pair chart of Quade (1973), for uncensored data.
5.2 Equality Test for K-Samples
Let $X_{ij}$ be the j-th observation within the i-th sample from a variable $X_i$ with distribution function (d.f.) $F_i$, for i=1,...,K; j=1,...,$n_i$. Then we consider the null hypothesis
$$H_0: F_1 = F_2 = \cdots = F_K \quad \text{against} \quad H_1: F_{i_1} \ne F_{i_2} \text{ for at least one pair } (i_1, i_2).$$
By observing the corresponding pair charts we could detect whether or not we may reject $H_0$. For this purpose, we could use the following two methods:
(i) Pairwise Comparisons. In this case we could have K(K - 1)/2 comparisons or tests whether the two corresponding variables have the same distribution functions. So, we need to construct as many pair charts. The construction of these pair charts or censored pair charts is straightforward, as given in Chapter II. However, the number of the pair charts needed may cause a problem. We should have as many as K(K - 1)/2 in order to detect that the K populations tend to have the same distribution functions. For rejecting $H_0$, we may not need so many pair charts as for the acceptance of $H_0$. But, this problem could be overcome with the help of a computer program.
(ii) Pair Chart based on Breslow (1970) statistic. Breslow defined a vector score statistic
$$W = (W_1, \ldots, W_K)'$$
where $W_i$ is in fact the conditional Gehan (1965) W statistic, as given in Section 2.1, comparing the i-th sample with the remaining (K-1) samples. Thence, the use of the censored pair chart becomes obvious. Here, we need at most K pair charts.
5.3 Triplet Chart for Uncensored Data
For K=3, we will consider using a three-dimension chart as a descriptive statistic for making a decision for the rejection of $H_0$. The construction of the 3-dimensional or triplet chart is as follows.
Considering the uncensored 3-samples of sizes $n_1, n_2, n_3$ we construct a rectangular parallelepiped of size $n_1$-units x $n_2$-units x $n_3$-units (or $n_1 \times n_2 \times n_3$) as shown in Figure 5.1.
Figure 5.1. The Perspective of the Triplet Chart
This figure shows the perspective of triplet charts with angle $\theta = 60^\circ$. The rectangular parallelepiped is subdivided into $n_1 \times n_2 \times n_3$ cubes having sides of unit length. Note that in the perspective, a unit length on the $X_1$-axis should be taken as half a unit length on the $X_2$-axis or $X_3$-axis. The path of the triplet chart will start at the point O (or the origin) and end at the point P. The construction of this path follows the steps below.
If the smallest observation in the combined samples is an $X_i$, draw a unit line (= a segment of one unit length) from O to the positive direction of $X_i$, that is from O parallel with the $X_i$ axis. From the end of this first unit-line draw a second unit-line parallel with the $X_j$ axis if the second smallest observation is an $X_j$, say. Continue in this manner for all $(n_1 + n_2 + n_3)$ observations. The $(n_1 + n_2 + n_3)$ unit-lines then form the path of a triplet chart from O to P which can be considered as the origin (0,0,0) to the point $(n_1, n_2, n_3)$.
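The construction steps above can be sketched in a few lines. The function name `triplet_path` is hypothetical, and the sketch assumes that there are no tied observations at all; it returns the lattice points visited by the path from (0,0,0) to $(n_1, n_2, n_3)$.

```python
def triplet_path(x1, x2, x3):
    """Lattice points of the triplet chart path, assuming no tied values."""
    # Pool the three samples, remembering the axis (0, 1 or 2) of each value.
    pooled = sorted((v, axis) for axis, xs in enumerate((x1, x2, x3)) for v in xs)
    values = [v for v, _ in pooled]
    if len(set(values)) < len(values):
        raise ValueError("ties are not handled in this sketch")
    point = [0, 0, 0]
    path = [tuple(point)]
    for _, axis in pooled:
        point[axis] += 1          # one unit-line parallel with that axis
        path.append(tuple(point))
    return path


# Three small untied samples of sizes 6, 5 and 6.
path = triplet_path([2, 6, 7, 10, 15, 16], [4, 5, 11, 13, 17], [1, 3, 8, 9, 12, 14])
```

Projecting each point of `path` onto two of its coordinates gives the lattice points of the corresponding pair chart.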
For illustration, we consider four data sets: (A), (B), (C),
and (D), in Table 5.1.
And the corresponding triplet charts are
given in Figure 5.2.
Table 5.1 Illustrative Data
(A)  X1: 2, 6, 7, 10, 15, 16
     X2: 4, 5, 11, 13, 17
     X3: 1, 3, 8, 9, 12, 14
(B)  X1: 2, 13, 14, 15
     X2: 1, 6, 8, 9, 10, 11
     X3: 3, 4, 5, 7, 12
(C)  X1: 1, 4, 7, 8
     X2: 2, 4, 4, 8, 9, 10
     X3: 1, 3, 5, 6, 8, 10
(D)  X1: 1, 3, 5, 8
     X2: 2, 4, 6, 9, 11, 13, 15
     X3: 7, 10, 12, 14
Figure 5.2. Triplet Charts for Data in Table 5.1 (Charts A, B, C and D)
It is easy to see that the projection of the triplet chart onto the $x_i, x_j$-plane is the pair chart between $X_i$ and $X_j$. Note that Figure (C) shows how tied observations should be treated. If we have $X_i = X_j \ne X_k$ then we have a rectangle parallel with the $x_i, x_j$-plane. For data set (C) we have $X_1 = X_3 = 1$; $X_1 = X_2 = 4$ and $X_2 = X_3 = 10$; these observations give the three rectangles along the path of the triplet chart (C). In fact, we have two unit-squares and one rectangle of size 1x2 units. If $X_1 = X_2 = X_3$ then we have a rectangular parallelepiped. For data (C) we have a cube corresponding to $X_1 = X_2 = X_3 = 8$.
5.4 Statistical Tests Based on the Triplet Chart
Considering the usage of a triplet chart as a descriptive statistic, we should observe the distributions of the corresponding unit lines, parallel with $X_1$, $X_2$ or $X_3$ which will be considered as orthogonal, horizontal or vertical unit-lines, respectively, along the path of the chart. The $(n_1 + n_2 + n_3)$ unit-lines would form broken lines or segments along the path. Let S be the number of segments along a path of a triplet chart, then
$$3 \le S \le n_1 + n_2 + n_3 \tag{5.4.1}$$
The value S = 3 is attainable if all observations on $X_i$ are larger than all observations on $X_j$, and they are smaller than all observations on $X_k$, for $i \ne j \ne k = 1, 2, 3$. The corresponding triplet chart suggests the rejection of $H_0$, in particular $F_j > F_i > F_k$. However, the value $S = n_1 + n_2 + n_3 = N$ does not directly imply the acceptance of $H_0$.
For example, for data (D) in Table 5.1, we have S = N = 15 and its triplet chart (D) in Figure 5.2. This suggests that we observe the distributions of the orthogonal segments, the horizontal segments and the vertical segments, which are parallel with the $X_1$-axis, $X_2$-axis, and $X_3$-axis respectively. In chart (D), three out of four orthogonal segments lie on the left side (or at the beginning of the path) and all four of the vertical segments lie on the right side of the three orthogonal segments. This does not indicate that $F_1 = F_2 = F_3$. In fact chart (D) suggests that $F_1 > F_3$. Note that at the beginning of the path we have orthogonal and horizontal unit-segments only, and near the end of the path we observe only the horizontal and vertical unit-segments. So, the statistic S is beneficial to show a rejection of $H_0$. Thence, the triplet chart is beneficial in showing a rejection of $H_0$. For instance, chart (A) shows that we could not reject $H_0$, and chart (B) shows a rejection of $H_0$.
In fact, in chart (A) we have S = 13 with N = 17, and the three types of segments are 'well' distributed along the path. So, based on this chart (A), we may accept the null hypothesis. We define
$$I = (S - 3)/(n_1 + n_2 + n_3 - 3) \tag{5.4.2}$$
and call I an inequality index or rejection index. It has values from 0 to 1. If I is close to zero then we reject $H_0$, or we accept the inequality of $F_1$, $F_2$, and $F_3$.
A problem which arises is: if tied observations occur, how should S be counted? First, we consider a cube along the path. It is clear that the cube shows or indicates a better chance for having $F_1 = F_2 = F_3$, so a cube should be considered as (s x 3) segments, where s is the length of the edge of the cube. Similarly, a rectangular parallelepiped of size $s_1 \times s_2 \times s_3$ units should be counted as $(s_1 + s_2 + s_3)$ segments. On the other hand, a rectangle along the path would indicate less chance for having $F_1 = F_2 = F_3$. As the rectangle size increases, the differentiation between $F_{i_1}$ (or $F_{i_2}$) and $F_{i_3}$ increases, provided $X_{i_1}$ and $X_{i_2}$ generate the rectangle. So, a rectangle will be considered as one segment for S. This would lead to a lower value of I. In chart (C), we have S = 11 and N = 16.
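The counting rules for S can be sketched as below. This is an illustrative reading of the rules, not a definitive algorithm: the names `segment_count` and `inequality_index` are hypothetical, a parallelepiped of size $s_1 \times s_2 \times s_3$ is assumed to count as $s_1 + s_2 + s_3$ segments (which gives 3s for a cube of edge s), and a unit-line following a rectangle or a parallelepiped is assumed to start a new segment.

```python
from collections import Counter


def segment_count(x1, x2, x3):
    """Number of segments S along the triplet chart path, ties included."""
    samples = [Counter(x1), Counter(x2), Counter(x3)]
    s = 0
    prev_axis = None   # axis of the current straight segment, None after a tie
    for v in sorted(set(x1) | set(x2) | set(x3)):
        mult = [c[v] for c in samples]
        axes = [a for a in range(3) if mult[a] > 0]
        if len(axes) == 1:
            if axes[0] != prev_axis:
                s += 1             # the path turns: a new segment starts
            prev_axis = axes[0]
        elif len(axes) == 2:
            s += 1                 # a rectangle counts as one segment
            prev_axis = None
        else:
            s += sum(mult)         # parallelepiped: s1 + s2 + s3 segments
            prev_axis = None
    return s


def inequality_index(x1, x2, x3):
    """Inequality index I = (S - 3) / (N - 3) of (5.4.2)."""
    n = len(x1) + len(x2) + len(x3)
    return (segment_count(x1, x2, x3) - 3) / (n - 3)


# Data sets (A) and (C) of Table 5.1.
data_a = ([2, 6, 7, 10, 15, 16], [4, 5, 11, 13, 17], [1, 3, 8, 9, 12, 14])
data_c = ([1, 4, 7, 8], [2, 4, 4, 8, 9, 10], [1, 3, 5, 6, 8, 10])
```

With these conventions, `segment_count(*data_a)` and `segment_count(*data_c)` reproduce the values S = 13 and S = 11 quoted in the text for charts (A) and (C).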
In order to differentiate large values of I or S, we need to observe the maximum value, D, of the distances from the corner points of a path to the diagonal line, OP, of the $n_1 \times n_2 \times n_3$-rectangular parallelepiped. Large value of D would suggest a rejection for the null hypothesis:
$$H_0: F_1 = F_2 = F_3 \tag{5.4.3}$$
whether the value of I is small or large. So, statistic D is worthwhile to study in more detail, instead of S or I.
Let $A_i$ be the angle between OP, with $P(n_1, n_2, n_3)$, and the $X_i$-axis then
$$\cos A_i = n_i / \sqrt{n_1^2 + n_2^2 + n_3^2}, \quad i = 1, 2, 3 \tag{5.4.4}$$
And it is easy to verify that
$$0 < D(n_1, n_2, n_3) \le \max_i\, (n_i \sin A_i) \tag{5.4.5}$$
If the vector $x = (x_1, x_2, x_3)$ denotes a point on the path of a triplet chart then
$$D(n_1, n_2, n_3) = \sup \sqrt{\sum x_i^2 - \Big(\sum x_i \cos A_i\Big)^2} \tag{5.4.6}$$
But the lattice points of the path constitute the locus of points
$$(n_1 F_{n_1}(x_1),\ n_2 F_{n_2}(x_2),\ n_3 F_{n_3}(x_3)) \tag{5.4.7}$$
for $-\infty < x_i < \infty$, where
$$F_{n_i}(x_i) = \{\text{number of observations } X_i \text{ such that } X_i \le x_i\}/n_i \tag{5.4.8}$$
is the empirical distribution function for i=1,2 and 3. Hence (5.4.6) becomes
$$D(n_1, n_2, n_3) = \sup \sqrt{\sum n_i^2 F_{n_i}^2(x_i) - \Big(\sum n_i F_{n_i}(x_i) \cos A_i\Big)^2} \tag{5.4.9}$$
By substituting (5.4.4), we obtain
$$D(n_1, n_2, n_3) = \sup \sqrt{\sum n_i^2 F_{n_i}^2(x_i) - \Big(\sum n_i^2 F_{n_i}(x_i)\Big)^2 \Big/ \sum n_i^2} \tag{5.4.10}$$
If we have equal sample sizes, then we obtain, by writing $n_i = n$ and $D(n,n,n) = D$;
$$D = \sup\, n \sqrt{\sum F_n^2(x_i) - \Big(\sum F_n(x_i)\Big)^2 \Big/ 3} \tag{5.4.11}$$
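A minimal sketch of the computation of D follows, assuming untied observations so that every lattice point of the path is a corner or lies on a straight part of the path. The function name `max_distance` is hypothetical. The squared distance of a lattice point x from the line OP is taken as $|x|^2 - (x \cdot n)^2 / |n|^2$ with $n = (n_1, n_2, n_3)$, which is the quantity under the square root in (5.4.10).

```python
from math import sqrt


def max_distance(x1, x2, x3):
    """Maximum distance D from the path's lattice points to the diagonal OP."""
    sizes = (len(x1), len(x2), len(x3))
    norm2 = sum(n * n for n in sizes)            # n1^2 + n2^2 + n3^2
    pooled = sorted((v, axis) for axis, xs in enumerate((x1, x2, x3)) for v in xs)
    point = [0, 0, 0]
    best = 0.0
    for _, axis in pooled:
        point[axis] += 1
        sq = sum(c * c for c in point)                    # |x|^2
        dot = sum(c * n for c, n in zip(point, sizes))    # x . n
        best = max(best, sq - dot * dot / norm2)          # squared distance
    return sqrt(max(best, 0.0))


# Completely separated samples and perfectly interleaved samples, n = 2.
d_separated = max_distance([1, 2], [3, 4], [5, 6])
d_interleaved = max_distance([1, 4], [2, 5], [3, 6])
```

For equal sample sizes the separated samples give the largest possible value of D and the interleaved samples the smallest.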
Based on the triplet chart, it is easy to verify that
o~
3 (D - 1) In 12 ~ 1
(5.4.12)
The distribution of this variable D will be discussed in Section 5.6. As a descriptive statistic, chart (C) with S = 11 shows a small value of $D(n_1, n_2, n_3)$. Hence we could not reject the null hypothesis $H_0$ in (5.4.3).
Here, we note that the D statistic is presented as a function of $n_i$, i=1,2,3. So, we are not considering using standardization. The idea of standardization was noted by Quade (1973) on the pair chart. Other statistical tests associated with 3-sample problems were introduced by David (1958) and Conover (1965).
Furthermore, the usage of the triplet chart may be extended for uncensored K-samples. Instead of using pairwise comparisons, as noted in Section 5.2, we would use a triple comparison method. However, since K(K-1)(K-2)/6 > K(K-1)/2 for K > 5, this method becomes worse than the previous method if K > 5 in the sense of the number of triplet charts needed.
5.5 Orthogonal Projections of the Triplet Chart
Two kinds of projections will be considered in this section; these are the projections on the coordinate-planes generated by positive axes $X_1$, $X_2$ and $X_3$ and the projection on the plane
$$X_1 + X_2 + X_3 = 0 \tag{5.5.1}$$
that is the plane perpendicular to the line $X_1 = X_2 = X_3$.
As noted in the previous section, the 3 projections of the first kind are the usual pair charts. Since the pair chart has been discussed in detail, in Chapter II, here we consider only the characteristics of the triplet chart related to its projections. First, considering a segment, in the path of a triplet chart which is parallel with $X_i$, as the segment increases in length, the differentiation between $X_i$ and $X_j$, $i \ne j$, increases, because the pair chart between $X_i$ and $X_j$ would have a segment with the same length. And this segment does not have any effect on the third pair chart, on the $x_j, x_k$-plane, for $i \ne j \ne k$.
Second, we consider a rectangle of size $s_i \times s_j$ parallel with the $x_i, x_j$-plane, $i \ne j$. Then the pair chart between $X_i$ and $X_k$ (or, $X_j$ and $X_k$) has a segment of length $s_i$ (or $s_j$). Hence, the length of $s_i$ (or $s_j$) clearly determines the differentiation between $X_i$ and $X_k$ (or, $X_j$ and $X_k$). The third projection, that is the pair chart between $X_i$ and $X_j$, would have a rectangle of the same size $s_i \times s_j$. Likewise, if we have a rectangular parallelepiped of size $s_i \times s_j \times s_k$-units in the path of the triplet chart. If $s_i = s_j$, and as $s_i$ increases, the differentiation between $X_i$ and $X_j$ decreases. However, a large difference between $s_i$ and $s_j$ may have different effects; for instance the effect on the value of the Mann-Whitney statistic $U(X_i)$. These characteristics would lead us to make a better judgment in observing the triplet chart as a descriptive statistic.
Now, we will start with the second kind of projection. It is well known that the projection of a unit-cube in the $X_1, X_2, X_3$-space on the plane (5.5.1) forms a hexagon, that is a polygon having six equal sides of length $\sqrt{6}/3$. Figure 5.3 shows the projection of a 5x4x3-rectangular parallelepiped with all its unit-cubes. This projection will be considered as hexagonal projection.
In Figure 5.3, any two adjacent points have a distance of $\sqrt{6}/3$ units. The point O is the origin and P is the projection of the point (5,4,3). $P_1$, $P_2$ and $P_3$ are the projections of the points (0,4,3), (5,0,3) and (5,4,0), respectively. Using this type of diagram, the projection of a triplet chart can be constructed as follows.
Figure 5.3. The Projection of a 5x4x3-Unit Cubes
First, we would note that drawing a line in the diagram means drawing a line from one point to the adjacent point along the positive direction of the $X_i$-axis. Then, we observe the ordered values of the combined 3-samples. If the first value is an $X_i$, draw a line from O parallel with $X_i$ to an adjacent point, then from this point draw a line parallel with $X_j$ if the second value is an $X_j$. Continue in this manner for all observations. For illustration, Figure 5.2 shows the (projection of the) triplet chart of data set (A). The lines are numbered from 1 to 17, from O to A, the projection of the point (6,5,6). It is not difficult to count the number of segments, S, along the triplet chart, by following the broken path (projection) from O to A.
This hexagonal projection has a special interest if $n_i = n$, because it shows the value of variable D.
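The hexagonal projection can be sketched numerically as below. The function name `hex_project` and the particular orthonormal basis of the plane (5.5.1), namely $(1,-1,0)/\sqrt{2}$ and $(1,1,-2)/\sqrt{6}$, are assumptions made for illustration; any other orthonormal basis of that plane would only rotate or reflect the diagram.

```python
from math import hypot, sqrt


def hex_project(point):
    """2-D coordinates of the orthogonal projection of (x1, x2, x3)
    onto the plane x1 + x2 + x3 = 0."""
    x1, x2, x3 = point
    u = (x1 - x2) / sqrt(2)             # component along (1, -1, 0)/sqrt(2)
    v = (x1 + x2 - 2 * x3) / sqrt(6)    # component along (1, 1, -2)/sqrt(6)
    return u, v


# Distance between the projections of two adjacent lattice points.
step = hypot(*hex_project((1, 0, 0)))

# For equal sample sizes the end point (n, n, n) projects onto the origin,
# and the distance of a projected point from the origin is the distance of
# the lattice point from the diagonal.
end = hex_project((4, 4, 4))
dist = hypot(*hex_project((2, 1, 0)))
```

The last two quantities illustrate why, for $n_i = n$, the value of D can be read directly from the hexagonal projection.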
Figure 5.3 also shows the path of the triplet charts for
the data
(E)
1,3,5
(F)
2
2,4,5
1,1,3,6,6
5
4,5,6
in order to illustrate the paths having tied observations. For data set (E) we have a cube, and for (F) we have a rectangle of size 2x1.
5.6 The Maximum Distance, D, Statistic
5.6.1 Values of D Statistic
For the distribution of the D statistic as given in (5.4.7), we only consider the case $n_i = n$; that is equal sample sizes. In this case, the hexagonal projection of the triplet chart has a special characteristic, that is the projections of the end points of the path coincide at O. This implies all possible values of D can be seen easily on the hexagonal projection.
As a descriptive statistic, the projection of the triplet chart would show us: (i) the length or value of D; and (ii) the location or distribution of the path around the origin O. By observing these two characteristics, we may or may not reject the null hypothesis, without knowing the distribution of D.
It is easy to see that, if $V_n$ is the set of distinct values of D for samples of sizes n, then $V_{n-1}$ is a subset of $V_n$. Applying the Pythagorean formula, we can compute easily the values of D. We obtain
$$V_n = V_{n-1} \cup A_n \cup M_n \tag{5.6.1}$$
where
$$A_n = \{(n^2/2 + (n-2k)^2/6)^{1/2} : n - 2k > 0\} \tag{5.6.2}$$
$$M_n = \begin{cases} \{n/\sqrt{2}\} & \text{if n is even} \\ \emptyset = \text{empty set}, & \text{otherwise} \end{cases} \tag{5.6.3}$$
which are the sets of the int((n+2)/2) largest values of D corresponding to the 3-samples of size n. For illustration, Table 5.2 shows the values of D for n = 2, 3 and 4.
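Table 5.2. The Values of D for n = 2, 3 and 4
n    D
2    $\sqrt{6}/3$, $\sqrt{2}$, $2\sqrt{6}/3$
3    $\sqrt{6}/3$, $\sqrt{2}$, $2\sqrt{6}/3$, $\sqrt{42}/3$, $\sqrt{6}$
4    $\sqrt{6}/3$, $\sqrt{2}$, $2\sqrt{6}/3$, $\sqrt{42}/3$, $\sqrt{6}$, $2\sqrt{2}$, $\sqrt{78}/3$, $4\sqrt{6}/3$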
It is clear that if $D_{n,i}$ denotes the i-th ordered value of D, such that $D_{n,1} < D_{n,2} < \cdots < D_{n,k}$, then
$$D_{n,i} = D_{n-1,i} \tag{5.6.4}$$
for $i = 1, 2, \ldots, \#(V_{n-1})$, where $\#(V_{n-1})$ is the number of elements of the set $V_{n-1}$ with
$$\#(V_n) - \#(V_{n-1}) = [(n+2)/2] = \mathrm{INT}((n+2)/2) \tag{5.6.5}$$
The probability function of D will be discussed in the next sub-section under the assumption that the parent population has a continuous distribution function. So, there are no tied observations. However, we do not write the probability itself, but the number of paths or triplet charts out of $(3n)!/(n!)^3$, which give a certain value of D.
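For small n the numbers of paths can be checked by direct counting. The sketch below is not the recursive method of the next sub-section: it is a plain dynamic-programming count over the lattice, with hypothetical function names, and it works with $D^2$ as exact fractions so that values can be compared without rounding. The function `d_squared_values` is a squared-form reading of (5.6.1) to (5.6.3).

```python
from fractions import Fraction
from itertools import product
from math import factorial


def d_squared(a, b, c):
    """Squared distance of (a, b, c) from the line x1 = x2 = x3."""
    return Fraction(a * a + b * b + c * c) - Fraction((a + b + c) ** 2, 3)


def path_counts(n):
    """Map D^2 -> number of paths (0,0,0) -> (n,n,n) having that maximum."""
    table = {(0, 0, 0): {Fraction(0): 1}}
    # Lexicographic order guarantees that predecessors are processed first.
    for a, b, c in product(range(n + 1), repeat=3):
        if (a, b, c) == (0, 0, 0):
            continue
        here = d_squared(a, b, c)
        merged = {}
        for prev in ((a - 1, b, c), (a, b - 1, c), (a, b, c - 1)):
            for d2, cnt in table.get(prev, {}).items():
                key = max(d2, here)
                merged[key] = merged.get(key, 0) + cnt
        table[(a, b, c)] = merged
    return table[(n, n, n)]


def d_squared_values(n):
    """Squared elements of V_n built from (5.6.1)-(5.6.3)."""
    values = set()
    for j in range(1, n + 1):
        k = 0
        while j - 2 * k > 0:
            values.add(Fraction(j * j, 2) + Fraction((j - 2 * k) ** 2, 6))
            k += 1
        if j % 2 == 0:
            values.add(Fraction(j * j, 2))
    return values


counts_2 = path_counts(2)
```

For n = 2 the counts add up to $6!/(2!)^3 = 90$ paths, and the distinct values of $D^2$ agree with the first row of Table 5.2.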
5.6.2. Recursive Formulas for D Statistic
Here we consider all possible paths which lead to a certain value of D, using a tree diagram. The tree diagram can be constructed by following the paths in the hexagonal projection. For this purpose, we need only one sixth of the projection as given in Figure 5.4.
Figure 5.4. One Sixth of the Hexagonal Projection for n=4
In Figure 5.4, we have the n-axis which is parallel with the $X_i$-axis, for one i. Each path should start from O following the arrows; the directions of the positive $X_i$-axis; and returns to O. Note that there is only one choice from O to the point n=1, then from this point there are two choices, one along the n-axis or the 'boundary' and the other 'inside' of the sector. At this point, the tree diagram has two branches having values 1 along the boundary, and 2 for the inside path. This value 2 is obtained, because of two symmetric choices with respect to the boundary. Furthermore, from a point inside the sector, T say, there are 3 choices. For a certain pair of values D,n; we may or may not use all 3 choices or paths from a certain point.
For illustration, first we consider the case $D = \sqrt{2}$, $n \ge 2$. Figure 5.5 shows their corresponding tree diagrams.
144
Figure 5.5. Tree Diagrams for $D = \sqrt{2}$: (a) n = 2, and (b) n = 3
[Tree diagrams not reproduced.]
Figure 5.5(a) shows that from each point, R, S and T, we can take only one path or direction, because the others, marked (*), lead to larger or smaller values of D. For instance, R-* gives $D = 2\sqrt{6}/3$ and S-* gives $D = \sqrt{6}/3$. So, the tree diagram has only one branch with 6 segments corresponding to the 3x2 observations. And we obtain 1·2·2·1·2·1 = 8 paths. Thence, if $P_{n,i}$ is the number of paths leading to $D_{n,i}$, the i-th ordered value of D, we obtain
$P_{2,2} = P_2(D = \sqrt{2}) = 24$ (5.6.6)
It is easy to see that
$P_{n,1} = P_n(D = \sqrt{6}/3) = 6^n$ (5.6.7)
for all n.
The value of $P_{n,2}$, n>2, can be calculated based on the tree diagram (b). This tree diagram has 3 branches; the first corresponds to the path which does not enter O within the path. This type of path will be written as $O_{n,k}$, so we have
$O_{3,2} = O_3(D = D_{3,2} = \sqrt{2}) = 3 \cdot 2^5$
This path circles the triangle RST twice. As n increases, the number of these recirclings increases. Then we obtain
$O_{n,2} = 3 \cdot 2^{2n-1}$ (5.6.8)
for all n > 1.
Now, $P_{3,2}$ can be written as
$P_{3,2} = P_{1,1}P_{2,2} + P_{2,2}P_{1,1} + O_{3,2} = 12P_{2,2} + 3 \cdot 2^5$ (5.6.9)
instead of using a tree diagram. The first term denotes the paths which enter the point O once with $D = \sqrt{2}$. Thence, we have $P_{3,2} = 3 \cdot 2^7 = 384$.
Considering all possible partitions of 4, we can compute
$P_{4,2} = P_{1,1}P_{1,1}P_{2,2} + P_{1,1}P_{2,2}P_{1,1} + P_{2,2}P_{1,1}P_{1,1} + P_{2,2}^2 + P_{1,1}O_{3,2} + O_{3,2}P_{1,1} + O_{4,2}$
Keeping $D = \sqrt{2}$, the first 4 terms on the right indicate all possible paths (in the projection) which reach O twice before the end of the paths, or the paths reentered O three times. The second 2 terms indicate the paths reentered O two times. This can be written as
$P_{4,2} = 3P_{1,1}^2 P_{2,2} + P_{2,2}^2 + 2P_{1,1}O_{3,2} + O_{4,2}$
(5.6.10)
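The partition argument can be mechanised as a first-return recursion. The sketch below is a reformulation for checking purposes, not the formula of the text: it assumes that a loop which leaves O and first comes back after 3k steps with D not exceeding $\sqrt{2}$ can be chosen in F_1 = P_{1,1} = 6 ways for k = 1 and in F_k = O_{k,2} = 3·2^{2k-1} ways for k ≥ 2, and that every path with D ≤ $\sqrt{2}$ is a sequence of such loops. The names `first_return_counts` and `p_n2` are hypothetical.

```python
def first_return_counts(n_max):
    """F[k]: number of loops of 3k steps that leave O, stay within
    D <= sqrt(2), and return to O for the first time at the last step."""
    F = [0] * (n_max + 1)
    for k in range(1, n_max + 1):
        F[k] = 6 if k == 1 else 3 * 2 ** (2 * k - 1)  # P_{1,1} or O_{k,2}
    return F

def p_n2(n_max):
    """P_{n,2} for n = 1..n_max: paths whose maximum distance is exactly sqrt(2)."""
    F = first_return_counts(n_max)
    T = [1] + [0] * n_max  # T[n]: all paths of 3n steps with D <= sqrt(2)
    for n in range(1, n_max + 1):
        # Condition on the length k of the first loop back to O.
        T[n] = sum(F[k] * T[n - k] for k in range(1, n + 1))
    # Remove the 6**n paths that never leave the innermost hexagon (D = sqrt(6)/3).
    return {n: T[n] - 6 ** n for n in range(1, n_max + 1)}

print(p_n2(4))
```

For n = 3 and n = 4 the recursion returns 384 and 4704; the latter equals 3·6²·24 + 24² + 2·6·96 + 384, the expansion over the partitions of 4.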
In general, we can write
$P_{n,2} = F(P_{1,1}, P_{2,2}, O_{3,2}, \ldots, O_{n,2})$ (5.6.11)
that is a function of $P_{1,1}, P_{2,2}, O_{3,2}, \ldots, O_{n,2}$ for n>3. This function has the following form.
$P_{n,2} = \sum_{k=1}^{[n/2]} c_k P_{1,1}^{n-2k} P_{2,2}^{k} + \sum_{k=3}^{n} \sum^{*}_{r+s+kt=n} c'_k P_{1,1}^{r} P_{2,2}^{s} O_{k,2}^{t}$ (5.6.12)
where $\sum^{*}$ denotes the summation over all possible choices, keeping in mind the number of reenterings of the path to O, that is 1, 2, ..., n; $c_k$ and $c'_k$ depend on k. For instance,
(5.6.13)
where the coefficients indicate the number of distinct orderings of the corresponding factors.
Similarly, we obtain
$P_{2,3} = P_2(D = 2\sqrt{6}/3) = 3(2 + 2^3) = 30$ (5.6.14)
$O_{n,3} = 3(2 + 2^3)\,2^{3(n-2)} + 3 \cdot 2^{2n-1}$ (5.6.15)
$P_{n,3} = \sum_{k=1}^{[n/2]} c_k P_{1,1}^{n-2k} P_{2,3}^{k} + \sum_{k=3}^{n} \sum^{*}_{r+s+kt=n} c'_k P_{1,1}^{r} P_{2,2}^{s} O_{k,2}^{t}$ (5.6.16)
Figure 5.6. Tree Diagrams for Computing $P_{3,4}$ and $P_{3,5}$
[Tree diagrams (0) to (4) not reproduced.]
for n > 3. Note that the first product in $O_{n,3}$ indicates the number of paths through the points T and A or B, see Figure 5.4; and the second product indicates the paths through the point T only, or $O_{n,2}$. And the formula of $P_{n,3}$ can be obtained from that of $P_{n,2}$ by replacing the subscript 2.
This tree diagram method and the previous general formula can be easily extended for higher ordered values of D. For instance, we obtain $P_{3,4} = P_3(D = \sqrt{42}/3) = 270$ and $P_{3,5} = P_3(D = \sqrt{6}) = 114$ using tree diagrams. It is clear that, as n increases, the tree diagram becomes more complex. So, we would like to compute these values from $P_{2,2} = P_2(D = \sqrt{2}) = 24$ and $P_{2,3} = P_2(D = 2\sqrt{6}/3) = 30$ as follows.
In Figure 5.4, $P_{3,4}$ denotes the number of paths through the point F or G, but not E and H. These paths can be constructed from $P_{2,2}$ and $P_{2,3}$, the paths through A, B or T. Again, we use a tree diagram, as given in Figure 5.6 - (0). The diagram (0) starts at T, where the (24 + 30)/6 = 9 paths, giving $D = \sqrt{2}$ or $2\sqrt{6}/3$, go through. This gives $P_3(D = OF = OG) = 6 \times 45 = 270 = P_{3,4}$. In order to compute $P_{3,5} = P_3(D = OE = OH)$ we need diagrams (1) - (4). In diagram (1) the segment AT* is replaced by AEFGT (=4 segments). Here segment AT is counted as the segment of 3 paths out of $P_{2,3}$. In diagram (2) segment ATB*, which is the part of the OATBO triplet chart, is replaced by two branches. Similarly, TB* is replaced by TFGHB, and T is a point of three paths. Finally, starting from B, a point of 3 paths, we put the path BGHB. Hence, we obtain
$P_{3,5} = (3+6+1+3+6) \times 6 = 114$
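The count for the largest value of D can be cross-checked without tree diagrams. The sketch below is an independent check, not the method of the text: it assumes that D attains its largest value exactly when the path touches a corner of the hexagon, which happens when the first n ordered observations all come from one sample or the last n all come from one sample, and it applies inclusion-exclusion to these two events. The function name is hypothetical.

```python
from math import comb

def paths_reaching_corner(n):
    """Number of orderings of three samples of size n whose path touches a
    corner of the hexagonal projection (largest possible value of D)."""
    # First n observations from one sample: 3 choices of sample, and the
    # remaining 2n observations from the other two samples in any order.
    first_block = 3 * comb(2 * n, n)
    # Last n observations from one sample: the same count by symmetry.
    last_block = 3 * comb(2 * n, n)
    # Both at once: the first and last blocks come from two different
    # samples, which forces the middle block to be the third sample.
    both = 3 * 2
    return first_block + last_block - both

print(paths_reaching_corner(2), paths_reaching_corner(3))
```

For n = 2 and n = 3 this gives 30 and 114, in agreement with $P_{2,3}$ and $P_{3,5}$.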
In these diagrams, we should note that we really use one side of the n = X_1-axis, because we use one sixth of $(P_{2,2} + P_{2,3})$. So, segment EF is not counted twice, but segment BG is.
For further discussion, we need to compute the number of paths passing each point E, F, G and H. These can be taken from the five previous tree diagrams. Table 5.3 shows the results. There, *F = 24 denotes the number of paths not containing any segments FG or FGH; FG = 16 denotes the number of paths containing FG, including the FGH = 4. Likewise for the other boundary points or segments.
Table 5.3. Paths Containing the Boundary Points or Segments for n=3
Points    Segments    # of Paths
E         EF          10
          EFG         4
          EFGH        1
F         *F          18+6 = 24
          FG          9+3+1+3 = 16
          FGH         1+3 = 4
G         *G          9+18+3 = 30
          GH          1+3+6 = 10
H         *H          1+3+6 = 10
Note that for point E there is no *E segment, because all paths (=10) contain the segment EF. Based on this table we could compute the paths for n=4. In general we would have the following diagram.
Figure 5.7. The Paths Between Two Sample Sizes n and (n+1)
where $A_i$; i=1,...,n and $B_j$; j=0,...,n+1 are the points on the side of one sixth of the hexagonal projection corresponding to 3-samples of sizes n and (n+1), respectively. Having the number of paths containing each point $A_i$, we can compute the number of paths through point $B_j$. In fact, we need to find the number of paths having points $B_j$ or $B_{n-j+1}$, and those paths do not have any point outside the segment $B_jB_{j+1} \cdots B_{n-j+1}$, which will be written as
$P_{n+1}(B_j \text{ or } B_{n-j+1})$ (5.6.17)
for j=0,1,...,[(n+1)/2].
Then, if (n+1) = 2m, we have
P +l (D = OB j ) = P +l {D = (2m
n
n
2
+
~2)~} = 6
P +l (B or Bn - j +1)
n
j
(5.6.18)
However, if (n+l) = 2m+l, that is for odd sample sizes, we have
Pn+l (D • OB)
j
= Pn+l [D
= {( 2m+l) 2 + j 2 / 9}~ ] =
(5.6.19)
for j;O.
Now, we will proceed with computing the value of $P_{n+1}(B_j \text{ or } B_{n-j+1})$. For this purpose we should distinguish between the cases j ≤ 1 and j > 1. First, we should note that there are three types of paths through each point $A_i$, as shown in Figure 5.8.
Figure 5.8.
Types of Path at Each Point
The number of paths of these types will be written as $A_{ik}$ for k = 0, 1 or 2 according to whether the paths have 0 segments in $A_0A_n$, (a); 1 segment $A_iA_{i+1}$, (b); or two segments $A_{i-1}A_i$ or $A_iA_{i+1}$, (c), respectively. So
(5.6.20)
would indicate the total number of paths containing the point $A_i$.
Then for j ≤ 1, using a tree diagram, we obtain
$P_{n+1}(B_j) = \sum_{i=1}^{n} (1 + 2\delta_{1i}) A_{i2}$ (5.6.21)
where $\delta_{11} = 1$ and $\delta_{1i} = 0$ for i ≠ 1. Hence
n-11 + 2 i:l (1 + 20 1i )Ai2
(5.6.22)
n-l
P
(D. n+1/2) • 6{1 + 2 E (1 + 20 )A }
1i i2
n+l
2
i-I
(5.6.23)
Pn+l(Bn or Bn+1 )
=
Thence
or
n
P
(D - n+1 12) - 6{-1 + 2 E (1 + 20 )A }
1i i2
n+l
2
i-I
Also
(n-1)2/6}~] •
6{-1 +
~(l
i-I
(5.6.24)
+ 20 1i )A }
i2
(5.6.25)
Next, for j > 1, the $P_{n+1}(B_j \text{ or } B_{n-j+1})$ will be written as
$2P_{n+1}(B_j \text{ or } B_k,\ j < k < n-j+1) + P_{n+1}(B_j \text{ and } B_{n-j+1})$ (5.6.26)
where the second term indicates the number of paths containing only the segment $B_jB_{n-j+1}$ of the boundary $B_0B_{n+1}$. We obtain
$P_{n+1}(B_j \text{ or } B_{n-j+1}) = 2\{P_n(A_j) + \sum_{i=j}^{n-j-1} A_{i2}\} + A_{n-j,2}$ (5.6.27)
for j=2,3,...,[(n+1)/2], and j ≠ n-j+1.
If (n+l) .. 2m, then for j .. m, we would have
(5.6.28)
So, we may write
e.
P (A.) if 2j .. n+l
n
J
otherwise
(5.6.29)
Using this result we can easily obtain the probabilities of D having values corresponding to the boundary points for any sample size.
Finally, we need to consider computing
$P_{n+k}(A_i \text{ or } A_{n-i})$ (5.6.30)
from $P_n(A_i)$, i=0,1,...,n; for k > 0. Formulas (5.6.12) and (5.6.16) show the results for n=2. In contrast with the previous paragraphs, in which we observe the paths and the corresponding tree diagrams between $A_0A_n$ and $B_0B_{n+1}$, or outside $A_0A_n$, here we should observe the path inside $A_0A_n$. So, (5.6.28) clearly indicates the paths which can not go outside of $A_0A_n$, because these paths should lead to $D = OA_i$. That is,
(5.6.31)
as shown in Figure 5.9.
Figure 5.9. The Paths for Sample Size (n+k) in the Hexagonal of Size n.
At this point, it is understood that the number of paths through each point inside the hexagonal of size n has been computed. Hence, we have the values of
$P_n(A_i \text{ or } A_{n-i})$ (5.6.32)
and
(5.6.33)
for i ≤ n/2. Having these values, including the number of paths having a certain type through each point $A_i$ or $T_i$, as given in Figure 5.9, we can compute (5.6.30), as follows.
First, we consider the case k = 1, then we use this result, as a recursive formula, to obtain $P_{n+k}(A_i \text{ or } A_{n-i})$ for k > 1. Having the values of (5.6.32-33), we can compute
(5.6.34)
for i < n/2 by applying formula (5.6.27-29). Then we obtain:
n-2
1 + 2
L (k
j=l
+ 20 lj ) Tj 2
if i .. n/2 exist
(5.6.35)
This value indicates the number of paths having $D = OA_i$, which pass along $T_iA_i$ or $A_iT_{i-1}$ only once, corresponding to the 3-samples of sizes (n+1). These paths would pass twice along a segment inside the hexagonal of size (n-1). In fact, they circle a certain triangle once along each path. So, in addition to $P^{*}_{n+1,i}$, we should find the number of paths, inside the hexagonal of size n, which circle either
$P^{**}_{n+1,i} = P^{**}_{n+1}(A_i \text{ or } A_{n-i})$ (5.6.36)
Thence
Now, we still need to find the value of (5.6.36). This value can be computed from the previously computed values of (5.6.32). Having these values, we should have also the values of
(5.6.38)
and
$P_{n,i,2} = P_n(A_i \text{ and } A_{n-i})$ (5.6.39)
as noted in formula (5.6.26). These values could be computed from $T_{n-1,j}$, j=0,1,...,(n-1) using formulas (5.6.27-28).
By observing the possible circling once around either the
**
Pn+1,i
2Pn, i , 1 + 1,
if i • 0
AiO '
if i
-
= n/2
exist
(5.6.40)
where the Kronecker $\delta_{1i} = 1$ if i=1 and $\delta_{1i} = 0$ otherwise, and $A_{i0}$ is the number of paths which touch the line $A_0A_n$ at point $A_i$, see Figure 5.9.
Thence, substituting (5.6.35) and (5.6.40) in (5.6.37), we obtain $P_{n+1}(D = OA_i)$ for i ≤ n/2. So, we have shown the existence of a recursive formula of the form
$P_{n+1}(D = OA_i) = F\{P_n(A_j), P_n(T_{j'});\ i < j \le n/2,\ j' = 0, \ldots, (n-1)\}$ (5.6.41)
Thence, using this formula k times we can compute (5.6.31) for k > 1. Formulas (5.6.12) and (5.6.14) show a result for n=2, which can be written as
(5.6.42)
for i = 0 or 1 according to whether $OA_i = 2\sqrt{6}/3$ or $\sqrt{2}$.
5.7 Triplet Chart for Censored Data
The triplet chart may be used for censored data if we can transform the combined 3-samples into uncensored data. In this case we are to use the estimated values, instead of the observed values. This transformation could be done, for instance, in life testing problems using Nelson's (1972) hazard plotting method. However, this method assumes that the parent populations have a known distribution function, such as the exponential, Weibull, extreme value, normal or log-normal distribution.
So, this method is a combination of parametric and nonparametric procedures.
5.8 Alternative Presentation of the Triplet Chart
Associated with the samples of sizes $n_i$, i=1,2,3, we observe the hexagonal projection as a polar coordinate system having O as the origin and $X_1$ as the axis, see Figure 5.2. And let
$r\,\mathrm{cis}\,\alpha = r(\cos\alpha + i \sin\alpha)$, (5.8.1)
where $i = \sqrt{-1}$, be a vector from O to a lattice point, for certain values of r and $\alpha$. For example, $r = \sqrt{6}/3$ and $\alpha = 0$, $2\pi/3$ and $-2\pi/3$ denote the points on the $X_1$, $X_2$ and $X_3$ axes, respectively, next to the origin. The projection of the point $P(n_1,n_2,n_3)$ will be written as $r(n_1,n_2,n_3)\,\mathrm{cis}\,\alpha(n_1,n_2,n_3) = r^{*}\,\mathrm{cis}\,\alpha^{*}$. If $n_1 = n_2 = n_3$ then $r^{*} = 0$.
Assuming there are no tied observations between the $X_i$'s, the path from O to $r^{*}\,\mathrm{cis}\,\alpha^{*}$ would satisfy the equation
$\frac{\sqrt{6}}{3} \sum_{j=1}^{N} \mathrm{cis}\,\alpha_j = r^{*}\,\mathrm{cis}\,\alpha^{*}$ (5.8.2)
where $N = n_1+n_2+n_3$ and $\alpha_j = 0$, $+2\pi/3$ or $-2\pi/3$, with $n_1$ of $\alpha_j = 0$; $n_2$ of $\alpha_j = +2\pi/3$, and $n_3$ of $\alpha_j = -2\pi/3$. The j-th value of $\alpha$ is associated with the j-th lattice point in the path. The $N!/(n_1!\,n_2!\,n_3!)$ orderings of the values of the $\alpha_j$'s correspond to the number of paths from (0,0,0) to $(n_1,n_2,n_3)$. The paths from one lattice point to the next point can be presented as a tree diagram:
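[Tree diagram not reproduced: from each lattice point there are three branches, $r\,\mathrm{cis}(2\pi/3)$, $r\,\mathrm{cis}\,0$ and $r\,\mathrm{cis}(-2\pi/3)$, and the same three branches start again from the end of each branch.]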
From each point there are three possible paths or branches, until we observe $n_1$ of zeros, $n_2$ of $2\pi/3$'s, or $n_3$ of $-2\pi/3$'s. Then the number of branches from each point decreases to 2; and finally to 1; until we have N branches along each path. Thus, a path can be presented by a vector $\boldsymbol{\alpha}$ having $n_1$ components of zeros, $n_2$ components of $2\pi/3$'s and $n_3$ components of $-2\pi/3$'s. It is clear that a path from the origin to the next k-th lattice point can be written as
$\boldsymbol{\alpha}(k) = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$, for 1 ≤ k ≤ N. (5.8.3)
And the k-th lattice point has polar coordinate
$\frac{\sqrt{6}}{3} \sum_{j=1}^{k} \mathrm{cis}\,\alpha_j$ (5.8.4)
If the k-th lattice point is the origin then
$\sum_{j=1}^{k} \cos\alpha_j = 0$ (5.8.5)
and
$\sum_{j=1}^{k} \sin\alpha_j = 0$ (5.8.6)
This implies k = 3m for some m, and $\boldsymbol{\alpha}(k)$ would have m of each possible component: 0, $2\pi/3$ and $-2\pi/3$.
Now, we may consider the ordered observations of the combined samples associated with its vector $\boldsymbol{\alpha}$. Let $X_{i(j)}$ be the j-th ordered observation coming from the i-th sample, then the j-th component of $\boldsymbol{\alpha}$ is
$\alpha_j = 0$ if $X_{i(j)} = X_{1(j)}$
$\alpha_j = 2\pi/3$ if $X_{i(j)} = X_{2(j)}$
$\alpha_j = -2\pi/3$ if $X_{i(j)} = X_{3(j)}$
for j=1,2,...,N.
(5.8.7)
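The construction in (5.8.7) can be sketched in a few lines. This is an illustration only: the function name `alpha_vector` and the toy data are hypothetical, and the sketch assumes there are no tied observations, as in the text.

```python
import math

# Angle attached to each sample, following (5.8.7)
ANGLES = {1: 0.0, 2: 2 * math.pi / 3, 3: -2 * math.pi / 3}

def alpha_vector(sample1, sample2, sample3):
    """Return the angles alpha_1, ..., alpha_N of the combined ordered sample.
    Assumes all observations are distinct (no ties)."""
    combined = [(x, label)
                for label, sample in enumerate((sample1, sample2, sample3), start=1)
                for x in sample]
    combined.sort()  # order the combined sample by observed value
    return [ANGLES[label] for _, label in combined]

# Toy data: the samples alternate in the combined ordering
print(alpha_vector([1.0, 4.0], [2.0, 5.0], [3.0, 6.0]))
```

With the toy data the three angles 0, $2\pi/3$ and $-2\pi/3$ repeat in turn, which corresponds to a path that returns to the origin after every third observation.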
Now, let
(5.8.8)
be the j-th component of the vector $\boldsymbol{\xi}$, then $\xi_j$ can take a value -1, 0, or +1. And it is clear that $\boldsymbol{\xi}$ can be considered as an alternative presentation of the triplet chart.
For illustration, based on data sets (A), (B) and (D) in Table 5.1, we obtain the following $\boldsymbol{\xi}$ vectors, as the alternative presentation of their triplet charts given in Figure 5.2.
$\boldsymbol{\xi}_A$ = (-1,0,-1,1,1,0,0,-1,-1,0,1,-1,1,-1,0,0,1),
$\boldsymbol{\xi}_B$ = (1,0,-1,-1,-1,-1,1,-1,1,1,1,1,1,-1,0,0,0),
and
$\boldsymbol{\xi}_D$ = (0,1,0,1,0,1,-1,0,1,-1,1,-1,1,-1,1).
In the case of tied observations, in particular between samples, we may use an "upper line" above the corresponding elements of $\boldsymbol{\xi}$. For data set (C) in Table 5.1, we have
$\boldsymbol{\xi}_C$ = (-1,0,1,-1,0,1,1,-1,-1,0,-1,0,1,1,-1,1).
If we are concerned only with the rejection index I given in (5.4.2), then we should consider using this vector $\boldsymbol{\xi}$, instead of the triplet chart, because it is much easier to obtain the elements of $\boldsymbol{\xi}$ than to construct the triplet chart.
Finally, we consider computing the value of the D statistic based on this vector representation, if we have equal sample sizes, n say. It is easy to verify that
$D = \max_{1 \le k \le 3n} \frac{\sqrt{6}}{3} \left| \sum_{j=1}^{k} \mathrm{cis}\,\alpha_j \right|$,
provided $\boldsymbol{\alpha}$ is a fixed vector for a given data set.
(5.8.9)
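The maximum over partial sums translates directly into complex arithmetic. The following is a minimal sketch, assuming the angles $\alpha_j$ are supplied in the order of the combined sample; the function name `d_statistic` is illustrative.

```python
import cmath
import math

def d_statistic(angles):
    """Largest distance from the origin reached by the partial sums
    (sqrt(6)/3) * sum_{j<=k} cis(alpha_j), k = 1, ..., len(angles)."""
    step = math.sqrt(6) / 3      # length of one segment of the path
    position = 0j                # running sum of cis(alpha_j)
    largest = 0.0
    for alpha in angles:
        position += cmath.exp(1j * alpha)   # cis(alpha) = cos(alpha) + i sin(alpha)
        largest = max(largest, step * abs(position))
    return largest

t = 2 * math.pi / 3
print(d_statistic([0, t, -t, 0, t, -t]))   # samples alternate
print(d_statistic([0, 0, t, t, -t, -t]))   # samples completely separated
```

For n = 2 the alternating ordering stays on the innermost hexagon, so D is $\sqrt{6}/3$, while the completely separated ordering reaches a corner, so D is $2\sqrt{6}/3$.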
CHAPTER VI
APPLICATIONS TO DEMOGRAPHIC DATA
6.1 Applications of the Censored Pair Chart
To illustrate application of the CPC, we use a subsample of data from the National Fertility Survey of the United States, 1970 (Ryder and Westoff, 1977). In this study, we compare the experiences of white women and black women on their first birth interval (FBI) and the time to separation of their first marriage (TSFM). However, this study is limited to women who married for the first time at 18 years of age.
Table 6.1 shows a statistical analysis, based on "Proc Means" of SAS, of the FBI data for the women under consideration, after deleting all births occurring before 7 months after marriage. Figure 6.1 shows their CPC of Type-I, which is represented as a vertical bar chart. Let $X_1$ and $X_2$ denote the blacks' and the whites' FBI, respectively; in Figure 6.1 we have statistics $A(X_i)$, $C(X_i)$, $Q(X_i)$ and $U(X_i)$ for i=1,2. This figure shows that
$U(X_1) < U(X_2)$ (6.1.1)
$A(X_1) < A(X_2)$ (6.1.2)
where $U(X_i)$ is the Mann-Whitney U statistic for the complete observations, and $W = A(X_1) - A(X_2)$ is Gehan's W statistic for the censored
Table 6.1. First Birth Interval for Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
                       White                     Black
Statistic        Complete   Incomplete     Complete   Incomplete
Sample Size         707        113             76         17
Mean FBI           22.08      67.22          19.70      67.65
S.D.               20.94      73.62          17.38      92.35
Minimum             7.00       0.00           7.00       1.00
Maximum           218.00     329.00          93.00     339.00
S.E. of Mean        0.79       6.93           1.99      22.40
C.V.               94.84     109.53          88.26     136.51
Figure 6.1. The CPC-I of the First Birth Interval Between Black and White Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
[Bar chart not reproduced; vertical axis $X_2$ = White, horizontal axis $X_1$ = Black.]
data. The inequalities (6.1.1) and (6.1.2) suggest that $X_1$ and $X_2$ do not have the same distribution functions for their first birth intervals. In fact, they suggest that the white women tend to have a longer first birth interval than the black women.
Furthermore, we obtain $U(X_1) = 23821.50$, $U(X_2) = 29910.50$, $A(X_1) = 31396.50$ and $A(X_2) = 36298.50$. Hence W = -4902.
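As a rough illustration of how a statistic of the form $W = A(X_1) - A(X_2)$ can be computed from right censored data, the sketch below scores every pair of observations in the manner of Gehan's generalized Wilcoxon statistic. It rests on assumptions that are not taken from this chapter: that $A(X_1)$ counts the pairs in which the first-sample observation is definitely the larger, that an observation censored at time t counts as larger than an uncensored value not exceeding t, and that ties between uncensored values score nothing. The fractional values reported above suggest that the original computation gives half credit to ties, which this sketch does not reproduce. All names are hypothetical.

```python
def gehan_w(sample1, sample2):
    """Each observation is a pair (time, observed); observed is False for a
    right censored time. Returns (A1, A2, W) with W = A1 - A2."""
    def definitely_greater(x, y):
        (tx, x_observed), (ty, y_observed) = x, y
        if not y_observed:
            # y is only known to exceed ty, so nothing can be definitely larger
            return False
        # an observed x must exceed ty; a censored x needs only tx >= ty
        return tx > ty if x_observed else tx >= ty

    a1 = sum(definitely_greater(x, y) for x in sample1 for y in sample2)
    a2 = sum(definitely_greater(y, x) for x in sample1 for y in sample2)
    return a1, a2, a1 - a2

# Toy data only, not the survey data
first = [(5, True), (8, False), (12, True)]
second = [(6, True), (7, False), (20, False)]
print(gehan_w(first, second))
```

A negative W under this convention means that definite orderings more often place the second sample above the first, which is the direction reported for the first birth intervals.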
Looking at the path of the pair chart based on the complete
observations, we may wonder whether the two groups have a significant
difference with respect to their FBI, because the path is relatively
close to the diagonal line.
Here, however, we are not interested in
doing further statistical tests.
Now, we will consider their first marriage experiences. Table 6.2 shows a statistical analysis, based on "Proc Means" of SAS, of the TSFM data for the women under study. In this case, right censoring occurs, because they are still married at the time of survey. So, it is not surprising that we have a large number of incomplete observations in both samples, that is 690 out of 816 observations for the whites and 58 out of 92 observations for the blacks. Let $Y_1$ and $Y_2$ denote the blacks' and the whites' TSFM, respectively. Figure 6.2 shows
$U(Y_1) > U(Y_2)$ (6.1.3)
$A(Y_1) < A(Y_2)$ (6.1.4)
Considering only the complete observations, (6.1.3) suggests that the white women tend to have a shorter time to separation than the blacks. However, based on the whole data set, complete
Table 6.2. Time to Separation From First Marriage for Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
                       White                     Black
Statistic        Complete   Incomplete     Complete   Incomplete
Sample Size         126        690             34         58
Mean               84.77     178.04         109.26     168.57
S.D.               69.29      89.94          78.54      95.45
Minimum             0          5.00          11.00      23.00
Maximum           307.00     360.00         292.00     339.00
S.E. of Mean        6.17       3.42          13.47      12.53
C.V.               81.74      50.51          71.88      56.63
Figure 6.2. The CPC-I of the Time of Separation from the First Marriage Between Black and White Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
[Bar chart not reproduced; vertical axis $Y_2$ = White, horizontal axis $Y_1$ = Black.]
and incomplete observations, (6.1.4) suggests that the whites have a longer time to separation than the blacks. Anyway, both inequalities suggest the rejection of the null hypothesis that $Y_1$ and $Y_2$ have the same distribution functions.
Furthermore, we obtain $U(Y_1) = 2563.50$, $U(Y_2) = 1720.50$, $A(Y_1) = 8191.50$ and $A(Y_2) = 18667.50$. Hence $W = A(Y_1) - A(Y_2) = -10476$.
Remarks:
(1) Note that, in Figure 6.1 and Figure 6.2, a unit length on the vertical axis is much shorter than on the horizontal axis. This situation should be taken into consideration in comparing the corresponding regions in the bar charts. For example, Figure 6.1 suggests $C(X_1) < C(X_2)$; but in fact $C(X_1) > C(X_2)$.
(2) The previous two applications show us in general how the result based on only the complete observations may be affected by the censored observations. In the second application, as shown by (6.1.3) and (6.1.4) as well as Figure 6.2, the result based on the whole data set contradicts the result based only on the complete data set.
(3) Such a large number of censored observations, in the second application, raises some questions about the distributions of the censoring variables of the white and the black groups. In this case, the censorings occur at a fixed point of time, that is, the time of survey. However, the subjects entered the studied groups at random. So, by considering the subjects as having entered at a point of time, the censoring can be considered as random.
Furthermore, because we have a large number of censored observations in both samples, we may consider testing whether the censoring variables have the same distribution functions. For this purpose, we can use the pair chart. Keeping in mind that the censored observations were assumed to have decreasing ordering, we obtain the pair chart as given in Figure 6.3, with the diagonal line from the upper left corner to the lower right corner. This pair chart suggests that the censoring variables have the same distribution functions, since its path, that is the boundary between the area having symbol (1) and that having symbol (3), is relatively close to the diagonal line. As noted in Chapter V, this pair chart can be standardized following Quade's result, such that we would have a square, instead of a rectangle.
The values of the corresponding Mann-Whitney U statistic and Gehan's W statistic are $U(Y_1^{*}) = 18653.5$; $U(Y_2^{*}) = 21366.5$, and W = -2713.
Hence, the CPC-I in Figure 6.2 is a "valid" or "good" descriptive statistic for testing the null hypothesis that the times to separation from first marriages of black and white women have the same distribution functions. And this CPC-I suggests rejection of the null hypothesis.
Figure 6.3. The Pair Chart of the Censoring Variables of Separating Times Between Black and White Women Marrying for the First Time at Age 18, Based on the U.S.-N.F.S., 1970
[Pair chart not reproduced.]
6.2 Application of the Generalized Kendall's Tau
Here, we will consider a bivariate vector $\mathbf{X} = (X_1, X_2)$ where $X_1$ is the second birth interval and $X_2$ is the length of breastfeeding after the first birth. Censoring may occur on both variables, $X_1$ and $X_2$. Censoring on $X_1$ may be caused by divorce, separation, widowhood after the first birth, and the time point of survey. Censoring on $X_2$ occurs if the mothers are still breastfeeding at the time of survey. We may note that termination of breastfeeding is either voluntary or involuntary, caused by child death. In both cases, the length of breastfeeding, $X_2$, is considered as uncensored and we are interested in comparing measures of association between $X_1$ and $X_2$ based on the GKT in Chapter III and other possible indexes, such as the product moment correlation, the rank correlation and Kendall's tau.
For illustration, we take a group of mothers having certain characteristics from the data of Sri Lanka based on the World Fertility Survey (WFS), 1975. The characteristics of the group are:
(i) 1 or 2 children ever born
(ii) neither husband nor wife sterilized
(iii) ever used contraceptive
(iv) 18 years old at the first birth
Table 6.3 shows a statistical analysis of the corresponding data.
Table 6.3. The Statistical Analysis of the Illustrative Bivariate Data
                            Indicator Variable
Statistics       (0,0)             (1,0)              (1,1)
N                57                17                 27
Mean             (38.42, 15.59)    (91.59, 11.53)     (10.11, 10.11)
S.D.             (23.31, 11.18)    (99.34, 11.43)     (7.09, 7.09)
Minimum          (12, 0)           (0, 0)             (2, 2)
Maximum          (123, 48)         (356, 36)          (26, 26)
S.E. of Mean     (3.09, 1.48)      (24.09, 2.77)      (1.36, 1.36)
C.V.             (60.68, 71.78)    (108.46, 99.14)    (70.11, 70.11)
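The last two rows of the table follow from the first three. As a quick consistency check, the sketch below recomputes them for $X_1$, assuming the usual definitions S.E. = S.D./sqrt(N) and C.V. = 100 S.D./Mean; the function and variable names are illustrative, and small differences in the last digit are expected because the inputs are already rounded.

```python
from math import sqrt

def se_and_cv(n, mean, sd):
    """Standard error of the mean and coefficient of variation (in percent)."""
    return sd / sqrt(n), 100.0 * sd / mean

# (N, mean, S.D.) of X1 in each indicator group, copied from Table 6.3
groups_x1 = {
    "(0,0)": (57, 38.42, 23.31),
    "(1,0)": (17, 91.59, 99.34),
    "(1,1)": (27, 10.11, 7.09),
}
for label, (n, mean, sd) in groups_x1.items():
    se, cv = se_and_cv(n, mean, sd)
    print(label, round(se, 2), round(cv, 2))
```

The recomputed values agree with the tabulated S.E. and C.V. entries to within rounding.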
From this table, we can note that the statistics associated with $\delta = (1,1)$ have the same values for $X_1$ and $X_2$. This indicates that the observed values on $X_1$ and $X_2$ are the same. Within this subgroup, all mothers are still breastfeeding at the time of survey, and they have not had a second birth.
Table 6.4 shows some correlation coefficients: (i) the GKT's and the index Alpha associated with the censored data with n = 101; (ii) the product moment correlation (Pearson), the rank correlation (Spearman) and Kendall's Tau-B associated with the complete sub-data with n = 57; and (iii) under the assumption that the whole data
Table 6.4.
Correlation Coefficients and Prob.>lr\
Under H :
O
Data
(J
00:
0 for the Illustrative
Correlation Coefficients
Censored Data: N
00:
101
Tau-C,l
0.132
0.0000
Tau-C,2.
0.098
0.0000(*)
Alpha-l
0.247
0.0000
Alpha-2
0.200
0.0000 (*)
Pearson
0.351
0.0074
Spearman
0.306
0.0205
Tau-B (>Tau-A)
0.234
0.0144
Uncensored Sub-Data: N
.e
00:
57
Assuming the Data are Complete: N
(*)
r
= 101
Pearson
0.258
0.0092
Spearman
0.404
0.0001
Tau-B
0.328
0.0000
Tau-C,l (oo:Tau-A)
0.315
0.0000
Alpha-l (-Gamma)
0.340
0.0000
Is computed using normal approximation of the U-statistic.
172
set (N • 101) are complete.
This
tab~~uggests
that the group
of mothers under study in Sri Lanka has positive correlation between their second birth interval and the length of their breastfeeding.
Considering the original data having complete and incomplete observations, the index Tau-C,1 of 0.132 as well as the index Alpha-1 of 0.247 clearly suggests the rejection of the null hypothesis with p = 0.0000. This probability is computed using normal approximation. The computation of the probabilities $P(|\rho| > r)$ for Tau-C,2 and Alpha-2 is still open, because of the large number of permutations involved, that is 101!.
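To make the idea of a normal approximation concrete, the sketch below treats the simplest case only: Kendall's tau-a for complete data without ties, whose null variance is 2(2n + 5)/(9n(n - 1)). This is an assumption made for illustration. The probabilities reported for the GKT rest on the variance formulas of Chapter III, which are not reproduced here, and the function names are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a for complete (uncensored) paired data."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    return 2.0 * score / (n * (n - 1))

def normal_approx_p_value(tau, n):
    """Two-sided p-value using the no-ties null variance 2(2n + 5) / (9n(n - 1))."""
    variance = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    z = tau / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Tau-A = 0.315 with N = 101, the "complete data" entry of Table 6.4.
p_complete = normal_approx_p_value(0.315, 101)
print(f"p = {p_complete:.4f}")
```

Under these assumptions the probability is far below 0.00005, so it is printed as 0.0000 when rounded to four decimals.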
Under the assumption that the data are complete, we are considering the variable $Z = (Z_1, Z_2) = \min(X, Y)$ given in (3.8.1). Table 6.4 shows that $Z_1$ and $Z_2$ have a positive correlation $t_{c,1} = t_a = .315$ with p = 0.0000. This method has advantages if the dependence of $Z_1$ and $Z_2$ implies the dependence of $X_1$ and $X_2$. Otherwise, this method is not appropriate for testing the null hypothesis that there is no association between the second birth interval and the length of breastfeeding.
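The construction of the observed pair from the true pair can be sketched as follows. We assume here that the indicator is 0 for an uncensored component and 1 for a censored one, which is consistent with the labelling of the complete sub-data as (0,0) in Table 6.3, and that a tie between a true value and its censoring value counts as uncensored. The function name and the numerical values are hypothetical.

```python
def censor_pair(x, y):
    """Return (z, delta) for a true pair x and a censoring pair y.

    z is the componentwise minimum of x and y; delta[k] is 0 when the k-th
    component of x is observed (x[k] <= y[k]) and 1 when it is censored.
    """
    z = tuple(min(xk, yk) for xk, yk in zip(x, y))
    delta = tuple(0 if xk <= yk else 1 for xk, yk in zip(x, y))
    return z, delta

# Hypothetical mother: true birth interval 30 months, censored by a survey
# at 24 months; breastfeeding lasted 12 months and is fully observed.
z, delta = censor_pair((30, 12), (24, 20))
print(z, delta)
```

Treating $Z$ as if it were complete data amounts to discarding delta, which is why the resulting coefficients describe the association of $Z_1$ and $Z_2$ rather than that of $X_1$ and $X_2$.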
For comparison, Table 6.4 also shows the correlation coefficients between $X_1$ and $X_2$ for the complete sub-sample. The table shows the Pearson correlation coefficient r = 0.351 with p = 0.0074.
Table 6.5 shows the correlation coefficients of bivariate data sets for selected age groups at first birth of mothers in Sri Lanka, that is, 20, 25 and 30 or above.

Table 6.5. Correlation Coefficients and Prob. > |r| Under $H_0$: $\rho = 0$ for Some Selected Age Groups at First Birth of Mothers in Sri Lanka

                               Age Groups at First Birth
                           20            25            ≥ 30
Censored Data
  Tau-C,1                  .165/ -       .161/ -       .197/ -
  Tau-C,2                  .130/ -       .107/ -       .251/ -
  Alpha-1                  .301/ -       .313/ -       .298/ -
  Alpha-2                  .261/ -       .248/ -       .420/ -
Complete Assumption
  Pearson                  .263/.013     .465/.000     .394/.000
  Spearman                 .474/.000     .438/.000     .459/.000
  Tau-B                    .373/.000     .356/.000     .351/.000
  Tau-C,1 (=Tau-A)         .457/ -       .345/ -       .338/ -
  Alpha-1 (=Gamma)         .384/ -       .367/ -       .364/ -
Complete Obs.              47            33            32
Incomp. Obs.               41            41            60

This table and Table 6.4 show that the number of complete observations decreases as the group age increases.
Note that the inequalities
    $T_{c,1}(X_1, X_2) < \alpha_1(X_1, X_2)$,
    $T_{c,2}(X_1, X_2) < \alpha_2(X_1, X_2)$,
    $T_{c,1}(X_1, X_2) < T_{c,1}(Z_1, Z_2)$, and
    $\alpha_1(X_1, X_2) < \alpha_1(Z_1, Z_2)$,
which are easily derived from Chapter III, are satisfied for all age groups.
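These four inequalities can be checked directly against the tabulated coefficients. The short Python sketch below does so for the four age groups, using the values of Table 6.4 (age 18) and Table 6.5 transcribed as printed. The dictionary layout and the names are assumptions made for illustration.

```python
# Per age group: (T_c1(X), alpha_1(X), T_c2(X), alpha_2(X), T_c1(Z), alpha_1(Z)),
# transcribed from Table 6.4 (age 18) and Table 6.5 (ages 20, 25 and 30 or above).
table = {
    "18": (0.132, 0.247, 0.098, 0.200, 0.315, 0.340),
    "20": (0.165, 0.301, 0.130, 0.261, 0.457, 0.384),
    "25": (0.161, 0.313, 0.107, 0.248, 0.345, 0.367),
    "30+": (0.197, 0.298, 0.251, 0.420, 0.338, 0.364),
}

def inequalities_hold(row):
    """True when all four inequalities stated in the text hold for one age group."""
    tc1_x, a1_x, tc2_x, a2_x, tc1_z, a1_z = row
    return tc1_x < a1_x and tc2_x < a2_x and tc1_x < tc1_z and a1_x < a1_z

results = {age: inequalities_hold(row) for age, row in table.items()}
print(results)
```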
Considering the value of $\alpha_1(X_1, X_2)$, as well as the Pearson $r(Z_1, Z_2)$, its value increases from age group 18 to 25 and then decreases to age group 30 or over. The maximum value of $\alpha_1(X_1, X_2)$ is .313 at age group 25 and its minimum value is .247 at age group 18. However, $\alpha_1(Z_1, Z_2)$ has maximum value of .384 at age group 20 and minimum value of .364 at age 30 or above.
Finally, all correlation coefficients are positive with $P(|\rho| > r) < .10$. So, the second birth interval and the length of breastfeeding for Sri Lanka mothers tend to have a positive correlation.
CHAPTER VII
CONCLUSION AND SUGGESTIONS FOR FUTURE RESEARCH
7.1 Charts
It has been shown that the censored pair chart (CPC), as well
as the triplet chart, can be drawn easily for small sample sizes.
Hence, the charts should be considered as descriptive statistics,
especially whenever the sample sizes are small.
For large sample sizes, we proposed a bar chart as an alternative presentation of a CPC. The bar chart can be constructed using a SAS program. As shown in Figure 2.6 and Figure 6.1-2, this bar chart has a drawback in comparing the two areas corresponding to the statistics $A(X_i)$, i=1,2, because the horizontal and vertical axes have different unit-lengths. So, a better computer plot needs to be developed for the construction of the CPC, in which both axes have the same units.
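As a starting point for such a plot, the path of an uncensored pair chart can be computed as a list of lattice points, which any plotting routine with equal axis units could then draw. The sketch below assumes the usual convention for the pair chart of Quade (1973): the pooled sample is read in increasing order, with one unit step to the right for each X and one unit step up for each Y, and there are no ties. It does not handle censored observations, so it is not the CPC itself, and the function names are hypothetical.

```python
def pair_chart_path(xs, ys):
    """Lattice path of the uncensored pair chart, starting at the origin."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    path = [(0, 0)]
    for _, is_y in pooled:
        px, py = path[-1]
        # Step up for a Y observation, step right for an X observation.
        path.append((px, py + 1) if is_y else (px + 1, py))
    return path

def area_under_path(path):
    """Area below the path: each rightward step adds a column of the current height."""
    area = 0
    for (x0, y0), (x1, _) in zip(path, path[1:]):
        if x1 > x0:
            area += y0
    return area

path = pair_chart_path([3, 5, 8], [1, 4, 6, 9])
print(path)
print(area_under_path(path))
```

Under this convention each rightward step is taken at a height equal to the number of Y's smaller than that X, so the area below the path equals the Mann-Whitney count of pairs with Y less than X.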
A computer plot of the triplet chart for a large sample size needs to be considered. Based on our study, however, the construction of the triplet chart is not worth doing whenever the sample sizes are large. Instead, we suggest considering the pairwise comparisons using the CPC. Moreover, it is easier to compute the inequality index in (5.3.2).
Based on the chart for equal sample sizes, we proposed a maximum distance, D, statistic, including its distribution. For further study, we suggest constructing a table of the critical values of the D-statistic.
7.2 Generalized Kendall's Tau
For bivariate data, the null and non-null distributions of the unconditional generalized Kendall's tau (UGKT) have been discussed in detail. For the conditional generalized Kendall's tau (CGKT), however, it is impossible to obtain an explicit general formula for its distribution function due to the variability of the pattern of the observations, even when the sample size is small. As the sample size n increases, the number of permutations, n!, which leads to a fixed pattern of observations, would increase very rapidly. So, for large n, it seems impossible to take into account the n! permutations for our study. This situation suggests that we consider taking a sufficiently large random sample from the n! permutations for further research on the conditional GKT. This kind of randomization was suggested by Chung and Fraser (1958) for a multivariate two-sample problem. Boyett and Shuster (1977) also consider using randomization for some nonparametric one-sided tests in multivariate analysis.
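The randomization idea can be sketched for the simplest case of complete data, using the ordinary Kendall score in place of the CGKT. This substitution is an assumption made for illustration: a randomization test for the CGKT would also have to respect the censoring pattern, which is not attempted here. The function names, the number of sampled permutations and the small data set are hypothetical.

```python
import math
import random
from itertools import combinations

def kendall_score(x, y):
    """Number of concordant pairs minus number of discordant pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    return score

def monte_carlo_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value estimated from a random sample of the n! permutations."""
    rng = random.Random(seed)
    observed = abs(kendall_score(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # one random permutation of the second variate
        if abs(kendall_score(x, y_perm)) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the sampled permutations.
    return (hits + 1) / (n_perm + 1)

# Full enumeration is out of reach for n = 101: 101! has this many digits.
digits_in_101_factorial = len(str(math.factorial(101)))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 3, 4, 6, 5, 7, 8]
p = monte_carlo_p_value(x, y)
print(digits_in_101_factorial, p)
```

The estimate depends on the random seed and on the number of sampled permutations, so it should be read as an approximation whose precision improves as more permutations are drawn.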
The extension of these GKT's to censored multivariate data is presented as a vector statistic. However, we only study the null and non-null distribution of the UGKT, which is considered as the generalized Simon's (1977) statistic. The previous paragraph implies that the complexity of the distribution of the CGKT would increase with the number of component variables. This problem is open for further study.
The relationship between the d.f. of the true variable, $X$, and the d.f. of the variable $Z = \min(X, Y)$ in (4.4.1), where $Y$ is the censoring variable, has been studied, under the null hypothesis of independence and a sequence of alternative hypotheses, for general distribution functions. The results are illustrated under normality assumptions. This may be extended to other types of distribution functions.
Considering the vector statistic $T_m$ in (4.2.14) and the non-null distribution of $t_{12 \cdots m}$ discussed in sub-section 4.4.2, we may consider for future study the correlation matrix of $T_m$. Under the total independence hypothesis, Simon (1977) proved that the covariance of any two components of $T_m$ is zero, for uncensored data.
7.3 Sector Symmetry
We have introduced the idea of sector symmetry, and developed a chi-squared statistic and an index of symmetry for a statistical test. However, this test is valid under the assumption that equalities among any number of the $m$ variates, $m \geq 2$, have zero probabilities. So, research on how to treat tied components or variates is still open. A problem arises because an observation having tied variates could be classified into more than one particular sector. This would violate a property of categorical data.
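The difficulty with ties can be illustrated with a small sketch. We assume here, for illustration only, that a sector corresponds to one ordering of the $m$ components of an observation; the formal definition of a sector is the one given in the chapter on sector symmetry and is not restated here. The function name and the data are hypothetical.

```python
from itertools import permutations

def sectors_containing(obs):
    """All orderings of the component indices that are consistent with obs.

    An ordering p is consistent when obs[p[0]] <= obs[p[1]] <= ... holds.
    Without ties exactly one ordering qualifies; with ties, several do.
    """
    m = len(obs)
    return [
        p for p in permutations(range(m))
        if all(obs[p[k]] <= obs[p[k + 1]] for k in range(m - 1))
    ]

print(sectors_containing((2.0, 5.0, 3.0)))  # no ties
print(sectors_containing((2.0, 2.0, 3.0)))  # first two components tied
```

With a tie between two components the observation is consistent with more than one ordering, so the sectors no longer form mutually exclusive categories.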
Compared with other nonparametric tests, the Wilcoxon and normal score statistics are usually more efficient than the Kolmogorov-Smirnov (K-S) statistic for location or scale differences or other parametric alternatives. On the other hand, the K-S statistic is valid for general alternatives, and the other rank statistics may not be so. A similar situation occurs in the test of symmetry.
7.4 Applications
There is no doubt that the CPC and the GKT can be used in the study of life testing problems. Examples have been illustrated in the previous chapters, for a clinical trial/experiment and for demographic data. As noted in Section 7.1, a better program needs to be developed for the CPC for large sample sizes. The availability of such a program would lead to a better or a wider usage of the CPC.
The difficulties in interpreting a path or a curve in 3-dimensional space suggest a limitation on the application of the triplet chart. So, we suggest using the triplet chart only for small sample sizes. In the case of equal sample sizes, the hexagonal projection of the triplet chart seems to give a good picture for deciding whether the corresponding three parent populations tend not to have the same distribution functions.
The application of the generalized Simon's statistic (GSS) is straightforward. But presenting the value of the GSS, that is $t_{12 \cdots m}$ for $m > 2$, is insufficient for describing real-life situations. A certain value of the GSS could be associated with a positive and a negative association of a certain pair of variates. So, it is suggested to use the vector statistic to present or describe the associations between the $m$ variates.
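The value of reporting every pairwise association can be seen in a small sketch for complete data. It computes the ordinary Kendall tau-a for each pair of variates, which is an assumption made for illustration: the vector statistic of (4.2.14) and the GSS for censored data are not reproduced here. The function names and the data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a for complete (uncensored) paired data."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))

def pairwise_tau_vector(columns):
    """Map each pair of variate labels (1-based) to the tau-a of that pair."""
    m = len(columns)
    return {
        (a + 1, b + 1): kendall_tau_a(columns[a], columns[b])
        for a, b in combinations(range(m), 2)
    }

# Three hypothetical variates: the first two agree, the third runs against both.
columns = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [5, 4, 3, 2, 1],
]
taus = pairwise_tau_vector(columns)
print(taus)
```

Here one pair is positively associated and two pairs are negatively associated, a pattern that a single summary value cannot convey.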
APPENDIX
SAS PROGRAM FOR THE CPC-I
// EXEC SAS,REGION=250K
//SYSIN DD *
DATA WSTAT;
* NOTE FOR UNEQUAL SAMPLE SIZES:
  PUT ADDITIONAL OBSERVATIONS HAVING MISSING VALUES;
INPUT DX X DY Y;
IF X=. AND Y=. THEN DELETE;
IF X=. THEN DO;
  DX=9; X=0;
END;
IF Y=. THEN DO;
  DY=9; Y=0;
END;
IF DX=0 THEN X1=X;
ELSE X1=-X;
CARDS;
PROC SORT OUT=WSTAT; BY DX X1;
PROC MATRIX;
FETCH ZDX DATA=WSTAT(KEEP=DX);
FETCH ZX DATA=WSTAT(KEEP= X);
FETCH ZDY DATA=WSTAT(KEEP=DY);
FETCH ZY DATA=WSTAT(KEEP= Y);
Z=ZDX||ZX||ZDY||ZY;
N=NROW(Z);
YLX=J(N,1,0);  * THE NUMBER OF Y'S LESS THAN EACH X;
NY=J(N,1,0);   * THE NUMBER OF UNCENSORED OBS. ON Y;
YCGX=J(N,1,0); * THE NUMBER OF Y.CENSORED GREATER THAN EACH X;
YLXC=J(N,1,0); * THE NUMBER OF Y'S LESS THAN EACH X.CENSORED;
Q=J(N,1,0);    * THE NUMBER OF TIMES X=Y;
I=0;
LOOP: I=I+1;
DA=Z(I,1); A=Z(I,2);
AA=0; AB=0; AC=0; AD=0; AE=0;
K=0;
LOOPA: K=K+1;
DB=Z(K,3); B=Z(K,4);
IF DA=0 THEN DO;
  IF DB=0 THEN DO;
    AB=AB+1;
    IF B<A THEN AA=AA+1;
    IF B=A THEN AE=AE+1;
  END;
  IF DB=1 THEN DO;
    IF B>=A THEN AC=AC+1;
  END;
END;
IF DA=1 THEN DO;
  IF DB=0 THEN DO;
    IF A>=B THEN AD=AD+1;
  END;
END;
IF K<N THEN GO TO LOOPA;
YLX(I,1)=AA;
NY(I,1)=AB;
YCGX(I,1)=AC;
YLXC(I,1)=AD;
Q(I,1)=AE;
IF I<N THEN GO TO LOOP;
XLY=NY-YLX-Q;
* THE NUMBER OF X'S LESS THAN EACH Y;
* START COMPUTING THE U, A AND W STATISTICS;
UX=SUM(YLX) + .5*SUM(Q);
UY=SUM(NY) - UX;
CX=SUM(YLXC);
CY=SUM(YCGX);
AX=UX+CX; AY=UY+CY;
W=AX-AY; * THE W STATISTIC OF GEHAN;
PRINT UX UY CX CY AX AY W;
I=1:N; I=I';
V=(I||YLX||J(N,1,1))//(I||Q||J(N,1,2))//(I||XLY||J(N,1,3))//
  (I||YCGX||J(N,1,4))//(I||YLXC||J(N,1,5));
V=V||(Z(,1)//Z(,1)//Z(,1)//Z(,1)//Z(,1));
OUTPUT V OUT=CPC;
DATA PC; SET CPC; IF COL4^=9;
I_TH_X1 = COL1;
J_TH_X2 = COL2;
STAT_ID = COL3;
* NOTE FOR LARGE SAMPLE SIZES:
  USE ONLY HBAR (HORIZONTAL BAR)
  OR CHART WITH VERTICAL X-AXIS;
PROC CHART;
VBAR I_TH_X1 / SUBGROUP=STAT_ID SUMVAR=J_TH_X2 DISCRETE NOSPACE;
TITLE THE CPC-I FOR THE DATA OF FREIREICH ET AL.(1963) WITH HORIZONTAL X1-AXIS;
PROC CHART;
HBAR I_TH_X1 / SUBGROUP=STAT_ID SUMVAR=J_TH_X2 DISCRETE NOSPACE;
TITLE THE CPC-I FOR THE DATA OF FREIREICH ET AL.(1963) WITH VERTICAL X1-AXIS;
//
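For readers without access to SAS, the counting step of the program can be sketched in Python. The sketch assumes that the indicators take the value 0 for an uncensored and 1 for a right censored observation, and it reproduces only the U, C, A and W totals, not the chart itself. The function name, the dictionary keys and the small data set are hypothetical and are not part of the original program.

```python
def gehan_counts(x, dx, y, dy):
    """Pairwise counts for two right censored samples.

    x, y   : observed times of the two samples
    dx, dy : indicators, 0 = uncensored, 1 = right censored
    """
    ux = uy = cx = cy = 0.0
    for a, da in zip(x, dx):
        for b, db in zip(y, dy):
            if da == 0 and db == 0:      # both uncensored
                if b < a:
                    ux += 1.0
                elif b == a:             # a tie is split between the two samples
                    ux += 0.5
                    uy += 0.5
                else:
                    uy += 1.0
            elif da == 0 and db == 1:    # Y censored at or beyond an uncensored X
                if b >= a:
                    cy += 1.0
            elif da == 1 and db == 0:    # X censored at or beyond an uncensored Y
                if a >= b:
                    cx += 1.0
            # pairs with both observations censored are not comparable
    ax, ay = ux + cx, uy + cy
    return {"UX": ux, "UY": uy, "CX": cx, "CY": cy, "AX": ax, "AY": ay, "W": ax - ay}

# Small hypothetical samples; the second X and the third Y are censored.
counts = gehan_counts([5, 8, 10], [0, 1, 0], [3, 8, 12], [0, 0, 1])
print(counts)
```

In this example X is known to exceed Y in five pairs and Y is known to exceed X in three pairs, so W = 2. One pair, in which both observations are censored, contributes nothing.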
BIBLIOGRAPHY
Barr, Donald R. and Davidson, Teddy (1973). "A Kolmogorov-Smirnov Test for Censored Samples," Technometrics, Vol. 15, No. 4, 739-757.
Basu, A.P. (1967). "On the Large Sample Properties of a Generalized Wilcoxon-Mann-Whitney Statistic," Ann. Math. Statist. 38, 905-915.
Basu, A.P. (1967). "On Two K-Sample Rank Tests for Censored Data," Ann. Math. Statist. 38, 1520-1535.
Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (1975). Discrete Multivariate Analysis, The MIT Press, Cambridge, Mass.
Blomqvist, Nils (1950). "On a Measure of Dependence Between Two Random Variables," Ann. Math. Statist. 21, 593-600.
Boyett, J.M. and Shuster, J.J. (1977). "Nonparametric One-Sided Tests in Multivariate Analysis with Medical Applications," J. Amer. Statist. Assoc. 72, 665-668.
Breslow, N. (1970). "A Generalized Kruskal-Wallis Test for Comparing K Samples Subject to Unequal Patterns of Censorship," Biometrika 57, 3, 579-594.
Brown, Jr., B.W.M., Hollander, M. and Korwar, R.M. (1974). "Nonparametric Tests of Independence for Censored Data, with Applications to Heart Transplant Studies," Reliability and Biometry, 327-354.
Chatterjee, S.K. and Sen, P.K. (1973). "Nonparametric Testing Under Progressive Censoring," Calcutta Statist. Assoc. Bull. 22 (1), 13-50.
Chung, J.H. and Fraser, D.A.S. (1958). "Randomization Tests for a Multivariate Two Sample Problem," J. Amer. Statist. Assoc. 53, 729-735.
Conover, W.J. (1965). "Several K-Sample Kolmogorov-Smirnov Tests," Ann. Math. Statist. 36, Part I, 1019-1026.
Conover, W.J. (1967). "The Distribution Functions of Tsao's Truncated Smirnov Statistics," Ann. Math. Statist. 38, 1208-1215.
Conover, W.J. (1971). Practical Nonparametric Statistics, John Wiley & Sons, Inc., New York.
Creason, J.P. (1978). "The Theory and Application of a General Iterative Maximum Likelihood Procedure to Randomly Censored Univariate and Bivariate Normal Linear Models," Ph.D. Dissertation, Dept. of Biostatistics, University of North Carolina at Chapel Hill.
Crouse, C.F. and Steffens, F.E. (1969). "A Distribution-Free Two Sample Test for Dispersion for Symmetrical Distribution," South African Statistical Journal 3, 55-67.
Daniels, H.E. (1944). "The Relation Between Measures of Correlation in the Universe of Sample Permutations," Biometrika 33, 129-135.
Daniels, H.E. and Kendall, M.G. (1947). "The Significance of Rank Correlations Where Parental Correlation Exists," Biometrika 34, 197-208.
Daniels, H.E. (1948). "A Property of Rank Correlations," Biometrika 35, 416-417.
Daniels, H.E. (1950). "Rank Correlation and Population Models," J. Roy. Statist. Soc. B 12, 171-181.
Daniels, H.E. (1952). "Note on Durbin and Stuart's Formula for $E(r_s)$," J. Roy. Statist. Soc. B 13, 310.
David, H.T. (1958). "A Three Sample Kolmogorov-Smirnov Test," Ann. Math. Statist. 29, 842-851.
Davis, C.E. (1978). "A Two-Sample Wilcoxon Test for Progressively Censored Data," Commun. Statist. Theor. Meth. A7(4), 389-398.
Davis, J.A. (1967). "A Partial Coefficient for Goodman and Kruskal's Gamma," J. Amer. Statist. Assoc. 62, 189-193.
Durbin, J. and Stuart, A. (1951). "Inversions and Rank Correlation Coefficients," J. Roy. Statist. Soc. B 13, 303-309.
Fleiss, J.L. (1973). Statistical Methods for Rates and Proportions, John Wiley & Sons, Inc., New York.
Freireich, E., et al. (1963). "The Effect of 6-mercaptopurine on the Duration of Steroid-Induced Remissions in Acute Leukemia," Blood 21, 699-716.
Gehan, E.A. (1965). "A Generalized Wilcoxon Test for Comparing Arbitrarily Singly-Censored Samples," Biometrika 52, 203-218.
Gelberg, M.G. (1974). "The Relation Between Mann-Whitney's Statistic and Kendall's Correlation Coefficient Tau," Theory Probability and Applications 19, 205-207.
Goodman, Leo A. and Kruskal, W.H. (1954). "Measures of Association for Cross Classifications," J. Amer. Statist. Assoc. 49, 732-764.
Goodman, Leo A. (1959). "Partial Test for Partial Taus," Biometrika 46, 425-432.
Halperin, Max (1960). "Extension of the Wilcoxon-Mann-Whitney Test to Samples Censored at the Same Fixed Point," J. Amer. Statist. Assoc. 55, 125-138.
Halperin, Max and Ware, James (1974). "Early Decisions in a Censored Wilcoxon Two-Sample Test for Accumulating Survival Data," J. Amer. Statist. Assoc. 69, 414-422.
Hodges, Jr., J.L. (1958). "The Significance Probability of the Smirnov Two-Sample Test," Arkiv for Matematik 3, 469-486.
Hoeffding, W. (1948). "A Class of Statistics with Asymptotically Normal Distributions," Ann. Math. Statist. 19, 293-325.
Höffding, W. (1947). "On the Distribution of the Rank Correlation Coefficient $\tau$ When the Variates are not Independent," Biometrika 34, 183-196.
Hotelling, H. and Pabst, M.R. (1936). "Rank Correlation and Tests of Significance Involving no Assumption of Normality," Ann. Math. Statist. 7, 29-43.
Johnson, N.L. and Kotz, S. (1970). Continuous Univariate Distributions - 2, Houghton Mifflin Company, New York.
Johnson, Richard A. and Mehrotra, K.G. (1972). "Locally Most Powerful Rank Tests for the Two-Sample Problem with Censored Data," Ann. Math. Statist. 43, 823-831.
Kendall, M.G. (1938). "A New Measure of Rank Correlation," Biometrika 30, 81-93.
Kendall, M.G. (1942). "Partial Rank Correlation," Biometrika 32, 277-283.
Kendall, M.G. (1949). "Rank and Product-Moment Correlation," Biometrika 36, 177-193.
Kendall, M.G. (1970). Rank Correlation Methods, Griffin, London.
Koziol, James A. and Byar, David P. (1975). "Percentage Points of the Asymptotic Distribution of One and Two Sample K-S Statistics for Truncated or Censored Data," Technometrics, Vol. 17, No. 4, 507-510.
Kruskal, W.H. (1958). "Ordinal Measure of Association," J. Amer. Statist. Assoc. 53, 814-867.
Mohanty, S.G. (1979). Lattice Path Counting and Applications, Academic Press, New York.
Moran, P.A.P. (1951). "Partial and Multiple Rank Correlation," Biometrika 38, 26-32.
Nelson, W.B. and Hahn, G.J. (1971). "Regression Analysis of Censored Data-Linear Estimation Using Ordered Observations," General Electric Corporate Research and Development TIS Report No. 71-C-122.
Nelson, W. (1972). "Theory and Applications of Hazard Plotting for Censored Failure Data," Technometrics, Vol. 14, No. 4, 945-966.
Quade, Dana (1973). "The Pair Chart," Statistica Neerlandica 27, Nr. 1, 29-45.
Quade, Dana (1974). "Nonparametric Partial Correlation," from Measurement in the Social Sciences: Theories and Strategies, edited by H.M. Blalock, Jr., Aldine Publishing Company, Chapter 13, 369-398.
Rao, C.R. (1973). Linear Statistical Inference and its Applications, John Wiley & Sons, Inc., New York.
Rao, U.V.R., Savage, I.R. and Sobel, M. (1960). "Contributions to the Theory of Rank Order Statistics: The Two-Sample Censored Case," Ann. Math. Statist. 31, 415-426.
Ryder, N.B. and Westoff, C.W. (1977). The Contraceptive Revolution, Princeton University Press, Princeton, N.J.
Sen, P.K. (1960). "On Some Convergence Properties of U-Statistics," Calcutta Statistical Association Bulletin, Vol. 10, Nos. 37 & 38, 1-18.
Sen, P.K. (1967). "On Some Nonparametric Generalization of Wilks' Test for $H_M$, $H_{VC}$ and $H_{MVC}$, I," Annals of the Inst. of Statistical Mathematics, Vol. 19, 451-471.
Shirahata, S. (1975). "Locally Most Powerful Rank Tests for Independence with Censored Data," The Annals of Statistics 3, 241-245.
Sibuya, M. (1960). "Bivariate Extreme Statistic, I," Annals of the Institute of Statistical Mathematics, Tokyo, 11, 195-210.
Simon, Gary (1977). "A Nonparametric Test of Total Independence Based on Kendall's Tau," Biometrika 64, 2, 277-82.
Sobel, M. (1966). "On a Generalization of Wilcoxon's Rank Sum Test for Censored Data," Technical Report No. 69 (Revised), University of Minnesota.
Tjostheim, Dag (1978). "A Measure of Association for Spatial Variables," Biometrika 65, 109-114.
Weier, D.R. and Basu, A.P. (1980). "An Investigation of Kendall's $\tau$ Modified for Censored Data with Application," Journal of Statistical Planning and Inference 4, 381-390.
Wilk, M.B. and Gnanadesikan, R. (1968). "Probability Plotting Methods for the Analysis of Data," Biometrika 55, 1-17.
"World Fertility Survey: Sri Lanka, 1975, First Report." Department of Census and Statistics, Ministry of Plan Implementation, Sri Lanka.