NONPARAMETRIC AND ROBUST PROCEDURES
FOR BIOEQUIVALENCE MODELS
WITH MEASUREMENT ERRORS
by
Ho Kim
Department of Biostatistics
University of North Carolina
Institute of Statistics
Mimeo Series No. 2170T
October 1996
Nonparametric and Robust Procedures
for Bioequivalence Models
with Measurement Errors
by
Ho Kim
A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics.
Chapel Hill
1996
Approved by:
Advisor
Reader
Reader
Reader
Reader
Abstract
In a biological assay (bioassay, for short), a new preparation and an old one are compared by means of the reactions that follow their applications to biological organisms,
and the relative potency of the test preparation with respect to the standard one is the
main interest under investigation. One typically assumes that the test preparation
behaves as if it is a dilution (or concentration) of the standard one which is called the
fundamental assumption of a bioassay. Our main interest is to estimate the relative
potency and to verify the fundamental assumption. The usual maximum likelihood
estimator has many optimal properties such as consistency and asymptotic normality
under the assumed distribution. But these parametric estimates are generally not
very robust against departures from the assumed form of distribution function. Rank
based estimates of relative potency have been considered by Sen (1963), Shorack
(1966) and Rao and Little (1976), among others. These estimates are invariant under
the choice of any monotone transformation on the dose, and, besides being robust, are
generally quite efficient for normal, log-normal, logistic or other common forms of the
tolerance distributions. Under the (generalized) linear models set-up, to verify the
fundamental assumption one typically tests the null hypothesis of the same slope for
the two preparations. And to estimate the relative potency, one typically estimates
the intercepts of the two dose-response curves from the two preparations. One of the
difficulties of those analyses of the bioequivalence model is the assumed form of the
expected value of the response as a function of the dose.
The conditional quantile process (CQP) is a very useful tool to verify the fundamental assumption in this situation. The k-nearest neighborhood estimator of the conditional quantile has been studied by Bhattacharya and Gangopadhyay (1988) and Sen and Gangopadhyay (1990). In our paper, these two methods are compared in terms of asymptotic properties. Also, several bias correction methods are proposed. The proposed CQP is asymptotically a Gaussian process which is fully characterized by pointwise asymptotic normality and by its covariance structure over shrinking neighborhoods of fixed points. The proposed CQP has useful asymptotic properties and generates a smooth nonparametric dose-response curve. In addition, the CQP does not require any functional form of the expected value. To estimate the relative potency, a Kolmogorov-Smirnov type estimator is proposed, together with a test of the fundamental assumption.
In a clinical trial, some biological variables like cholesterol level can be considered as a function of some other variables like blood pressure. We do not have clear-cut evidence of linear normality of the response variables with respect to the covariates. An additional problem is that the measurements of the covariates are not accurate; they include biases such as measurement bias, laboratory bias, worker bias and bias due to the data processing. In a classical parametric analysis, it is well known that the estimates are biased when there are measurement errors. In this paper it is proved that the asymptotic properties of the CQP hold in the presence of small measurement errors. The difference between the quantiles of the two conditional cdf's (one without measurement error and one with measurement error) is of the order O(||ε||²) under the assumption of a uniform measurement error distribution U(−ε, ε).
The univariate CQP is generalized to the multivariate CQP for a (q × 1) vector of covariates. We show that a similar (Bahadur type) representation of the conditional quantile holds in this case by using contiguity. We apply our method to ARIC (Atherosclerosis Risk in Communities) study data. In the first of the two examples, we compare the hypertension group and the normotension group in terms of the carotid artery far wall thickness (WT). The WT curves over Body Mass Index (BMI) are compared. In the second example, we investigate the effect of smoking on the WT curve. Our conditional quantile method has useful applications and interpretations for the examples.
Acknowledgment
First and foremost I would like to thank Dr. Pranab K. Sen. This dissertation would
not have been possible without his guidance and patience. Many thanks go to my
committee, Drs. Chirayath M. Suchindran, Michael J. Symons, Bahjat F. Qaqish
and Lloyd J. Edwards. I wish to extend my sincere appreciation to Drs. Lloyd E.
Chambless and C. Ed Davis for providing me with the opportunity to work at the
coordinating center. I also thank my fellow students, particularly my office mates:
PaiLien Chen, Hildete Pinheiro, Jinwhan Jung and Charmaine Marquis; they have been a constant source of encouragement. The next round of applause is for Dr. Louis Wijnberg without whom I would have lost my sense of humor and my way through
the days of my research. I would like to thank my parents and parents-in-law for their
encouragement and support over the years. Last, but not least, I sincerely appreciate
the support and understanding of my wife, Kyujeong, who has remained by my side.
Contents

1 Literature Review and Introduction
  1.1 Introduction
  1.2 Literature review
    1.2.1 Parametric bioassay
    1.2.2 NN method
    1.2.3 δNN method
  1.3 Discussion
  1.4 Significance of the study

2 Testing the Fundamental Assumption
  2.1 Local asymptotic behavior of a conditional quantile process
  2.2 Kolmogorov-Smirnov type estimator of the relative potency
  2.3 Test of the fundamental assumption
  2.4 Confidence intervals and powers of the tests: comparing the cardinality of the k-NN estimator
  2.5 Bias correction for k_n = O(n^{4/5})

3 NN-Estimator of a Conditional Quantile with Measurement Errors
  3.1 Asymptotic properties of k-NN estimator with measurement errors on the covariate variable
    3.1.1 Approximation of the conditional quantile
    3.1.2 Asymptotic normality of the conditional quantile estimator with measurement error
    3.1.3 Asymptotic variance and bias of the process with measurement error
  3.2 Approximation of Conditional Quantiles with Measurement Errors
    3.2.1 First order approximation
    3.2.2 Second order approximation
    3.2.3 Second order approximation with symmetric argument
    3.2.4 Bivariate symmetric measurement error
    3.2.5 Relationship between two quantiles: under normal assumptions
    3.2.6 Approximation of conditional cdf with bivariate normality and uniform measurement error
    3.2.7 Conditional quantile from general cdf with uniform measurement error

4 Multivariate Conditional Quantile Process
  4.1 Multivariate conditional quantile
  4.2 Asymptotic properties of multivariate conditional quantile through contiguity
  4.3 Conditional Independence and Contiguity
  4.4 Multiple conditional quantile process

5 Numerical Examples
  5.1 Introduction
  5.2 ARIC Black females
    5.2.1 Inferences at a given point of x
    5.2.2 Inferences for the processes
  5.3 ARIC White females
    5.3.1 Statistical inferences

6 Summary and further research

Bibliography
List of Figures

1.1 Graphical explanation of the bioequivalence model
5.1 Estimated conditional median carotid artery far wall thickness (mm) conditioned on BMI (kg/m²) for two groups, μ = 0 (black females)
5.2 sup_{x∈X} |ξ̂_n^T(x) − ξ̂_n^S(x + μ)| vs. μ (black females)
5.3 Estimated conditional median carotid artery far wall thickness (mm) conditioned on BMI (kg/m²) for two groups, μ = 10 (estimated relative potency) (black females)
5.4 sup_{x∈X} |ξ̂_n^T(x) − ξ̂_n^S(x + μ)| vs. μ (white females)
5.5 Estimated conditional median carotid artery far wall thickness (mm) conditioned on BMI (kg/m²) for two groups, μ = 0 (white females)
5.6 BMI range of k_n NN sample at each point of BMI (white females); solid line: smoker, dotted line: non-smoker
List of Tables

5.1 Estimated conditional median carotid artery far wall thickness (mm) conditioned on BMI = 30.162 kg/m² (black females)
5.2 Jackknife bias correction on k_n sample (black females)
5.3 Jackknife bias correction on the whole n data (black females)
5.4 Estimated conditional median carotid artery far wall thickness (mm) conditioned on BMI = 25.173 kg/m² (white females)
5.5 Jackknife bias correction on k_n sample (white females)
5.6 Jackknife bias correction on the whole n data (white females)
5.7 BMI range of k_n sample at each point of BMI (white females)
Chapter 1
Literature Review and Introduction

1.1 Introduction
In a biological assay (bioassay), a new (test) preparation and an old (standard) one
are compared by means of the reactions that follow their applications to biological
organisms. The main interest under investigation is to examine how biological organisms react to the two preparations and to evaluate the relative potency of the test
preparation with respect to the standard. To achieve this goal, a stimulus is applied
to a subject, and changes in the organism (response) are measured. The magnitude
of this response may depend on the amount of the stimulus (dose). Furthermore, the relationship between the dose and the response may not be exact, and it is subject to random variation in the application of the dose, in the manifestation of the response, or in both. The response to a stimulus may be classified as quantitative or quantal. For example, quantitative responses can be measured, as would be the case if we were studying weight changes following the use of a certain drug. On the other hand, quantal responses have two outcomes such as death or survival. There are three
main types of biological assays that are commonly used for numerical evaluation of
potencies: (1) direct assays, (2) indirect assays with quantitative responses, and (3) indirect assays with quantal responses. In a direct assay, the basic principle is that doses of the standard and test preparations, sufficient to produce a specific response, are directly measured so that the response is certain while the dose is the random variable. The ratio of these two doses estimates the potency of the test preparation relative to the standard. Assays (2) and (3) are similar with respect to their statistical analysis. Both make use of dose-response regression relationships. In an indirect assay, for given levels of doses, the responses are measured. Thus, the response is a
random variable whose distribution may depend on dose.
Direct assays have shortcomings, especially the bias introduced by the time lag in producing the response. Even if this difficulty is overcome, it is not easy to administer the precise dosage that produces the desired response. Let x denote the dose and Y denote the response it induces. If S and T are abbreviations for the standard and test preparations, respectively, then the regression relationships for the two preparations can be denoted by

  E(Y_S) = h_S(x)  and  E(Y_T) = h_T(x).   (1.1)

From (1.1) one can obtain the doses x_T and x_S that would induce the average response y. More precisely, the doses x_S and x_T are equally effective. In a typical indirect model, one assumes that the test preparation behaves as if it is a dilution (or concentration) of the standard one, i.e. the potency ratio x_S/x_T is independent of y. Therefore,

  x_S/x_T = ρ,   (1.2)

hence

  h_T(x) = h_S(ρx)  for all x, and ρ > 0.   (1.3)
of an assay. The main interest is to estimate the relative potency ρ and to verify the fundamental assumption. Parametric procedures for these problems are usually based on assumptions about the distribution of Y (normal, lognormal, logistic or loglogistic); these procedures are discussed in detail in Finney (1964). Using (generalized) linear models, the usual maximum likelihood estimator has many optimal properties, such as consistency and asymptotic normality under the assumed distribution. However, these parametric estimates are generally not very robust against departures from the assumed form of the distribution function. In univariate linear models, testing equation (1.3) examines the equality of the slopes and the equality of the intercepts. Rank based estimates of relative potency have been considered by Sen (1963), Shorack (1966) and Rao and Littell (1976), among others. These estimates are invariant under the choice of any monotone transformation on the dose, and, besides being robust, are generally quite efficient for normal, log-normal, logistic or other common forms of the tolerance distributions. Recently, a method to obtain a confidence interval of a conditional quantile was proposed by Sen and Gangopadhyay (1990). If we use the conditional median or quantile of the response instead of the expectation in (1.1), we can test the equivalence of a test preparation with respect to a standard one. Because the conditional quantile has a nonparametric flavor, it will be more robust than the distribution-dependent expected value. We do not need to perform separate tests for slopes and intercepts as we usually do in dose-response parallel-line bioassay. After taking logarithms of the doses, the dose-response model becomes

  M_T(x) = M_S(x + m)  for all x and for some m,   (1.4)

where M_T and M_S are the average responses for the test and standard preparations. The fundamental assumption (1.4) can be better understood from Figure 1.1. To test (1.4) is to test parallelism of the two response curves, and to estimate m is to estimate the drift m under the assumption of (1.4).
Figure 1.1: Graphical explanation of the bioequivalence model (response plotted against dose; the test curve M_T(x) is the standard curve M_S(x) shifted horizontally by m)
1.2 Literature review

1.2.1 Parametric bioassay
Bioequivalence is an important area of pharmaceutical research. The key question is whether two or more formulations containing the same active ingredient are therapeutically equivalent. For example, a manufacturer of a new tablet drug might be interested in knowing whether the tablet formulation can be compared with the most readily available form, for example, a solution. Westlake (1979) and Metzler (1974) compared several statistical methods for bioequivalence studies. The experimental design usually follows familiar paths employing either parallel groups, in which subjects in each group receive just one formulation, or the cross-over concept discussed by Grizzle (1965) and Wallenstein and Fisher (1977). The frequently used statistical framework is that of repeated measurements experiments. Some of the more general methods are: the univariate split-plot ANOVA (Box (1950), Westlake (1974), Wallenstein and Fisher (1977)), MANOVA (Cole and Grizzle (1966)), analysis of growth curves (Grizzle and Allen (1969)) and analysis of principal components (Snee (1972)).
One of the main concerns of a bioequivalence study is to establish the equivalence between two formulations. Therefore, testing a null hypothesis of no difference is completely irrelevant to the demand of the situation; obtaining confidence intervals rather than testing a hypothesis is more appropriate. Using confidence intervals leads to a different approach to the problems of choosing type I and type II error levels (Jones and Karson (1972), Westlake (1972)). The clinician or pharmacologist can specify that the bioequivalence of the new formulation relative to the standard one must be within a certain range. For example, the FDA guidelines for Bioavailability Studies state that products whose rate and extent of absorption differ by 20% or less are generally considered bioequivalent (Westlake (1979)).
To establish bioequivalence, the most frequently used method is to estimate a confidence interval for the ratio of two responses, one from the test preparation and one from the standard preparation. Fieller's theorem, referred to by Govindarajulu (1988), is the classical method of obtaining a confidence interval for a ratio. Let μ, ν be two unknown parameters (here, they are the doses of the test and standard preparations which produce a specific response) and let ρ = μ/ν. Let a and b be unbiased estimators of μ and ν, respectively. Assume that a and b are linear in observations that are normally distributed. Let R = a/b be an estimate of ρ. Then, the upper
and lower confidence limits for ρ are given by Fieller's theorem (Fieller (1940) and Fieller (1944)), i.e.,

  (1/(1 − g)) { R − g v₁₂/v₂₂ ± (t s/b) [ v₁₁ − 2R v₁₂ + R² v₂₂ − g (v₁₁ − v₁₂²/v₂₂) ]^{1/2} },

where g = t² s² v₂₂/b², s² is an error mean square having m degrees of freedom, the estimates of the variances and covariance of a and b are v₁₁ s², v₂₂ s² and s² v₁₂, respectively, and t = t_{m,1−α/2}. Therefore, we obtain a confidence interval for the relative potency by Fieller's theorem.
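For illustration, a minimal numerical sketch of these limits (in Python, assuming numpy/scipy; the input values a, b, s², v₁₁, v₂₂, v₁₂ and m below are hypothetical, not data from the text):

import math
from scipy.stats import t as t_dist

def fieller_limits(a, b, s2, v11, v22, v12, m, alpha=0.05):
    t = t_dist.ppf(1 - alpha / 2, m)          # t_{m, 1-alpha/2}
    R = a / b
    g = t**2 * s2 * v22 / b**2
    if g >= 1:                                # the interval is unbounded if b is too noisy
        raise ValueError("g >= 1: Fieller limits do not form a finite interval")
    half = (t * math.sqrt(s2) / b) * math.sqrt(
        v11 - 2 * R * v12 + R**2 * v22 - g * (v11 - v12**2 / v22))
    centre = R - g * v12 / v22
    return ((centre - half) / (1 - g), (centre + half) / (1 - g))

print(fieller_limits(a=10.0, b=5.0, s2=1.0, v11=0.5, v22=0.2, v12=0.1, m=20))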
When we are interested in the difference rather than the ratio, we can use Behrens' distribution. Let a and b be independent and unbiased estimates of μ and ν, linear in observations that are normally distributed. Also assume that the estimates of Var(a) and Var(b) are given by s₁² v₁₁ and s₂² v₂₂, where s₁² and s₂² are independent mean squares with m₁, m₂ degrees of freedom, respectively. Then the estimated variance of a − b is s₁² v₁₁ + s₂² v₂₂. Let

  D = [(a − b) − (μ − ν)] / (s₁² v₁₁ + s₂² v₂₂)^{1/2}.

Then D has Behrens' distribution (D is also called Sukhatme's D-statistic), which is tabulated by Finney (1964). Its distribution is defined in terms of the degrees of freedom m₁ and m₂ and the angle θ below.
Next, let a and b denote independent estimates of μ, and consider the weighted estimate, with weights inversely proportional to the estimated variances. Yates (1939) and Finney (1951) point out that the corresponding studentized statistic follows the Behrens distribution with d.f. m₁ and m₂ and an angle θ defined by

  tan θ = (s₂² v₂₂ / s₁² v₁₁)^{1/2}.

Hence, one can test the significance of a departure from a theoretical value μ and also set up a confidence interval for μ of the form (estimate) ± d_{1−α/2} × (estimated standard error), where d_{1−α/2} denotes the tabulated value for the probability 1 − α/2.

1.2.2 NN method
Let {(X_i, Y_i), i ≥ 1} be a sequence of i.i.d. random vectors with a distribution function π(x, y), (x, y) ∈ R² (here, X_i is the dose and Y_i is the response of the ith subject). Let F(x) = π(x, ∞), x ∈ R, and let G(y|x) be the conditional d.f. of Y given X = x, for y ∈ R, x ∈ R. A conditional quantile function (of Y given X = x) is defined by

  ξ_p(x) = inf{y : G(y|x) ≥ p},  x ∈ R (0 < p < 1).   (1.5)
The following regularity conditions are assumed to be true:

(A1) The d.f. F admits an absolutely continuous density function f, such that

(a) f(x₀) = f₀ > 0, and

(b) f″(x) = (d²/dx²)f(x) exists in a neighborhood of x₀, and there exist positive numbers ε and k₀ such that |x − x₀| < ε implies that |f″(x) − f″(x₀)| ≤ k₀|x − x₀|.

(A2) The d.f. G(y|x₀) has a continuous density function g(y|x₀) for all y close to ξ₀, such that

(a) g(ξ₀|x₀) > 0, where G(ξ₀|x₀) = p, and

(b) the partial derivatives g_y(y|x) of g(y|x) and G_{xx}(y|x) of G(y|x) exist in a neighborhood of (x₀, ξ₀), and there exist positive constants ε and k₀ such that |x − x₀| ≤ ε and |y − ξ₀| ≤ ε together imply that

  |g_y(y|x) − g_y(ξ₀|x₀)| + |G_{xx}(y|x) − G_{xx}(ξ₀|x₀)| ≤ k₀(|x − x₀| + |y − ξ₀|).

In the sequel, we shall write G(y|x₀) = G(y) and g(y|x₀) = g(y).
For given p and x₀, let {(X_i, Y_i), i ≥ 1} be a sequence of i.i.d. random vectors. Consider the transformation (X_i, Y_i) → (Z_i, Y_i) where Z_i = |X_i − x₀|. For the collection {(Z₁, Y₁), …, (Z_n, Y_n)} of r.v.'s, let Z_{n1} < ⋯ < Z_{nn} be the order statistics corresponding to Z₁, …, Z_n, and let Y_{n1}, …, Y_{nn} be the induced order statistics (i.e., Y_{ni} = Y_j if Z_{ni} = Z_j, for i, j = 1, …, n). For every positive integer k (≤ n), the k-NN (nearest neighbor) empirical d.f. of Y (with respect to x₀) is given by

  G_{nk}(y) = k⁻¹ Σ_{j=1}^{k} 1(Y_{nj} ≤ y),  y ∈ R,   (1.6)
where 1(A) stands for the indicator function of the set A. The following estimators of ξ₀ are due to Bhattacharya and Gangopadhyay (1988).

(i) The k-NN estimator of ξ₀ is

  ξ̂_{nk} = the [kp]-th order statistic of Y_{n1}, …, Y_{nk} = inf{y : G_{nk}(y) ≥ k⁻¹[kp]}.   (1.7)

(ii) The kernel estimator (with uniform kernel and bandwidth h) is

  ξ̃_{nh} = inf{y : G_{n,K_n(h)}(y) ≥ K_n(h)⁻¹ [K_n(h) p]},   (1.8)

where

  K_n(h) = Σ_{i=1}^{n} 1(Z_i ≤ h/2)

is a positive integer valued r.v. Note that by (1.7) and (1.8),

  ξ̃_{nh} = ξ̂_{n,K_n(h)},   (1.9)

where h is usually non-stochastic and K_n(h) is stochastic in nature. In order that ξ̂_{nk} and ξ̃_{nh} be consistent estimators of ξ₀, we need that, as n → ∞, k = k_n → ∞ (and h = h_n ↓ 0). For the asymptotic normality results, usually, one needs that k_n = O(n^λ) = n h_n, for some λ ∈ (0, 1). Bhattacharya and Gangopadhyay (1988) studied the case λ = 4/5, which we call the NN (nearest neighborhood) method. The following are some of their theorems, stated without proofs.
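As a concrete illustration of (1.6)-(1.7), the following sketch (in Python, assuming numpy; the toy model Y|X = x ∼ N(2x, 1) and all tuning constants are our own choices, not part of the text) computes the k-NN conditional quantile estimator at a pivot x₀:

import numpy as np

def knn_conditional_quantile(X, Y, x0, p, k):
    nn = np.argsort(np.abs(X - x0))[:k]   # Z_i = |X_i - x0|; k nearest neighbors
    Y_induced = np.sort(Y[nn])            # induced order statistics, ordered in y
    r = max(int(np.floor(k * p)), 1)      # rank [kp], as in (1.7)
    return Y_induced[r - 1]               # the [kp]-th order statistic

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 2.0, n)
Y = 2.0 * X + rng.normal(size=n)          # conditional median at x is 2x
k = int(n ** 0.8)                         # k_n = O(n^{4/5}), the NN choice
print(knn_conditional_quantile(X, Y, x0=1.0, p=0.5, k=k))   # approximately 2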
Theorem 1.1 For k ∈ I_n(a, b),

  ξ̂_{nk} − ξ = β(ξ)(k/n)² + {k g(ξ)}⁻¹ Σ_{i=1}^{k} [1(Y_{ni} > ξ) − (1 − p)] + R_{nk},

where

  max_{k∈I_n(a,b)} |R_{nk}| = O(n^{−3/5} log n),  a.s.,

G_x(y|x) = (∂/∂x)G(y|x),

  G*(y|z) = [f(x₀ + z) + f(x₀ − z)]⁻¹ × [f(x₀ + z) G(y|x₀ + z) + f(x₀ − z) G(y|x₀ − z)],

and G(y) = G*(y|0) = G(y|x₀).
Theorem 1.2 For any 0 < a < b < ∞, as n → ∞,

  {n^{2/5}(T_n(t) − ξ); t ∈ [a, b]} →_D {β t² + σ t⁻¹ B(t); t ∈ [a, b]},

where T_n(t) = ξ̂_{n,[n^{4/5}t]}, β = β(ξ) given in Theorem 1.1, σ² = p(1 − p)/g²(ξ) and {B(t), t > 0} denotes a standard Brownian motion.

Theorem 1.3 For every t ∈ [a, b], as n → ∞,

  n^{2/5}(T_n(t) − ξ) →_D N(β t², σ²/t).

1.2.3 δNN method
Sen and Gangopadhyay (1990) showed that the bias term (βt²) in Theorems 1.2 and 1.3 can be eliminated when k_n = O(n^{4/5−2δ}), δ > 0. We call their approach the δNN (δ nearest neighborhood) method. The following are their main results.

Theorem 1.4 For k_n lying in I_n(a, b),
  ξ̂_{n,k_n}(x) − ξ(x) = β(ξ(x))(k_n/n)² + {k_n g(ξ(x)|x)}⁻¹ Σ_{i=1}^{k_n} [1(Z*_{ni} > ξ(x)) − (1 − p)] + R_{n,k_n},   (1.10)

where

  β(ξ(x)) = −[f(x) G_{xx}(ξ(x)|x) + 2 f′(x) G_x(ξ(x)|x)] / {24 f³(x) g(ξ(x)|x)},   (1.11)

the Z*_{ni} denote the induced order statistics of Y with respect to the pivot x, as in section 1.2.2,   (1.12)

and

  max_{k∈I_n(a,b)} |R_{nk}| = O(n^{−3/5+3δ/2} log n)  a.s.
Theorem 1.5 Under assumptions (A1) and (A2), for every k : k = [t n^{4/5−2δ}], t ∈ [a, b],

  k^{1/2}(ξ̂_{nk} − ξ) →_D N(0, σ²),  σ² = p(1 − p)/g²(ξ),  as n → ∞.

Theorem 1.6 Under assumptions (A1) and (A2), for every t ∈ [c, d], as n → ∞, the corresponding normalized process converges weakly to a Gaussian process; in particular, the bias term βt² of Theorems 1.2 and 1.3 is absent.
They also considered the construction of (local) bootstrap confidence intervals for ξ₀ based on both the k-NN and kernel methods, and their asymptotic properties. For this purpose, in the k-NN procedure, a bootstrap sample (Y*_{n1}, …, Y*_{nk}) is obtained from the empirical d.f. G_{nk} defined by (1.6). Thus, given the induced order statistics (Y_{n1}, …, Y_{nk}), the Y*_{ni} are conditionally i.i.d. r.v. with the d.f. G_{nk}. Similarly, in the kernel method, given the Y_{ni} and h (> 0), we define K_n(h) by (1.9), and then a bootstrap sample of size K_n(h) is obtained by resampling from all the Y_{ni} for which Z_i = |X_i − x₀| ≤ h/2. For the k-NN procedure, let G*_{nk} be the bootstrap sample d.f. (based on Y*_{n1}, …, Y*_{nk}). Then, the bootstrap estimator is

  ξ̂*_{nk} = [kp]-th order statistic of Y*_{n1}, …, Y*_{nk}.

Similarly, in the kernel method, the bootstrap estimator is ξ̃*_{nh} = ξ̂*_{n,K_n(h)}, where K_n(h) = Σ_{i=1}^{n} 1(Z_i ≤ h/2) is conditionally held fixed. Moreover, parallel to Theorems 1.5 and 1.6, we have the following asymptotic representation for the bootstrap estimates.
Theorem 1.7 Under (A1) and (A2), for each t ∈ [a, b],

  k^{1/2}(ξ̂*_{nk} − ξ̂_{nk}) →_D N(0, p(1 − p)/g²(ξ₀)),

for every δ : 0 < δ < 1/10, and every 0 < a < b < ∞.

Theorem 1.8 Under (A1) and (A2), for each t ∈ [c, d], the same conclusion holds for the kernel-method bootstrap estimator ξ̃*_{nh}, for every δ : 0 < δ < 1/10, and every 0 < c < d < ∞.

One can generate a set of M bootstrap samples. From each sample, one can compute the bootstrap estimates, and use the empirical quantiles (of ξ̂*_{nk} − ξ̂_{nk} or ξ̃*_{nh} − ξ̃_{nh}) to set the desired bootstrap confidence intervals for ξ₀. The asymptotic properties of such bootstrap intervals then follow from Theorems 1.7 and 1.8.
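A rough sketch of this local bootstrap interval (in Python, assuming numpy; the number of resamples M, the level α and the percentile-of-differences construction are our own choices):

import numpy as np

def knn_bootstrap_ci(X, Y, x0, p, k, M=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    nn = np.argsort(np.abs(X - x0))[:k]
    Y_k = np.sort(Y[nn])                       # induced order statistics
    r = max(int(np.floor(k * p)), 1)
    xi_hat = Y_k[r - 1]                        # xi-hat_{nk}, as in (1.7)
    boot = np.empty(M)
    for m in range(M):
        # conditionally i.i.d. resample from G_nk: sample with replacement
        ys = np.sort(rng.choice(Y_k, size=k, replace=True))
        boot[m] = ys[r - 1]                    # bootstrap estimator xi*_{nk}
    lo, hi = np.quantile(boot - xi_hat, [alpha / 2, 1 - alpha / 2])
    return xi_hat - hi, xi_hat - lo            # interval for xi_0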
1.3 Discussion
We note that the parametric and nonparametric bioassays of a dose-response model assume functional forms for the expected value of the response with respect to the dose (e.g., that the response is linear in the dose); however, there are several situations where such a model does not seem to be appropriate, especially from the point of view of robustness. In the present paper, we consider a methodology which uses the conditional quantile instead of the conditional mean to provide a more robust approach.
In chapter 2, we will discuss our methodology, the conditional quantile process, and its asymptotic properties. The test of the fundamental assumption and the estimation of the relative potency are of primary interest. Let ξ_{pT}(Y|x) be the conditional p × 100% quantile of the test response Y for a given dose x, as defined in (1.5). Similarly, let ξ_{pS}(Y|x) be the conditional quantile of the standard response. We will test the fundamental assumption that the test and standard preparations are parallel with a constant difference μ (relative potency) in terms of conditional quantiles as follows:

  H₀ : ξ_{pT}(Y|x) = ξ_{pS}(Y|x + μ),

for all p (0 < p < 1) and x (x ∈ X), for some μ (μ ∈ R). Or, it may be of interest to fix p = 1/2 to test medians. If H₀ is not rejected, we want to make a statistical inference about μ, which we introduced in earlier sections. One possible way to get an estimate of μ is to take the distance between ξ_{pT}(Y|x) and ξ_{pS}(Y|x + μ) by shifting x, i.e.,

  inf_μ { sup_x |m_{pT}(·|x) − m_{pS}(·|x + μ)| }.   (1.13)

Then μ̂ is the value which gives the infimum in (1.13).
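A direct way to carry out this minimization numerically is a grid search; the following sketch (Python, assuming numpy, with our own grids xs and mus and the conditional median as the working quantile) illustrates (1.13):

import numpy as np

def estimate_relative_potency(XT, YT, XS, YS, k, xs, mus):
    def m_hat(X, Y, x0):
        nn = np.argsort(np.abs(X - x0))[:k]     # k nearest doses to x0
        return np.median(Y[nn])                 # k-NN conditional median
    best_mu, best_sup = None, np.inf
    for mu in mus:
        sup = max(abs(m_hat(XT, YT, x) - m_hat(XS, YS, x + mu)) for x in xs)
        if sup < best_sup:
            best_mu, best_sup = mu, sup         # keep the minimizing shift
    return best_mu, best_sup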
The δNN method (k_n = O(n^{4/5−2δ})) and the NN method (k_n = O(n^{4/5})) are compared in terms of asymptotic mean square error. The δNN method gives an unbiased estimate but larger variance. We propose bias correction methods for the NN method which give a smaller variance of the estimate.
In chapter 3, we discuss measurement errors relating to the covariate. Measurement errors are important in environmental or animal experiments. For example, an investigator may conduct an experiment to measure the effects of fat and smoking on rats. He or she may observe rats in a chamber, controlling food and air conditions, to examine whether the given fat in the food and gas in the air are exactly the same as the amounts in a rat's body. A rat may not eat some food and it may not inhale the same amount of gas in the given air. In these situations, measurement errors may play a significant role in a statistical analysis. Within a small range of measurement errors, our procedure can be robust against the measurement errors by virtue of its nonparametric property. However, when measurement errors are not small, we may need a special procedure to control them. In other words, the robustness or efficiency of our test depends on the measurement errors' characteristics; therefore, we need to explore the role of measurement errors in our robust bioassay procedures.
In general, we observe X* = X + u instead of X, where u is the added error term. Consider X* = X + u, where u is a continuous random variable. We then have Z*_i = |X*_i − x₀| instead of Z_i = |X_i − x₀|, and the induced order statistics Y*_{n1}, …, Y*_{nn} corresponding to Z*_{n1}, …, Z*_{nn}. And our estimates become

  G*_{nk}(y) = k⁻¹ Σ_{j=1}^{k} 1(Y*_{nj} ≤ y),  y ∈ R.

We are also interested in the effects of the measurement errors on our estimates under some assumptions on the distribution of u: normal, or continuous and symmetric about zero. And we are also interested in the impact of the measurement errors on our asymptotic results.
In chapter 4, we will discuss the multivariate process. If we have a (q × 1) vector of covariates, we need to modify our method for this situation. We show that a similar (Bahadur type) representation of the conditional quantile holds in this case by using contiguity.

In chapter 5, we will apply our method to ARIC (Atherosclerosis Risk in Communities) study data. In the first example, we compare the hypertension group and the control group in terms of the carotid artery far wall thickness (WT); the WT curves over Body Mass Index (BMI) are compared. In the second example, we investigate the effect of smoking on the WT curve. Our conditional quantile method has nice applications and interpretations as seen in the examples.
1.4 Significance of the study
In a classical parametric bioassay, several questions may arise. The MLE has good asymptotic properties under the assumed distribution. However, it is well known that MLE's are not very robust against departures from the assumed form of the distribution. Another difficulty of the parametric analysis is the assumed form of the expected value. In some situations, such a model does not seem to be appropriate.

A nonparametric approach is needed to overcome such problems. The conditional quantile process is a very useful tool in this situation; it has good asymptotic properties and provides a clear dose-response curve without any assumption of parametric form. And it is proved that, with some measurement errors, our asymptotic properties still hold.

In a clinical trial, some biological variables, such as cholesterol level, can be considered as a function of some other variables like blood pressure. We do not have clear evidence of linear normality of the response variables with covariates; another obstacle is that the measurements of the covariates are not accurate: they include biases such as measurement tool bias, laboratory bias, worker bias or bias due to the data processing. In a classical parametric analysis, if we have measurement errors, it is well known that the estimates are biased. But our method still works asymptotically with measurement errors. The proposed conditional quantile method works nicely in this situation, and has a clear interpretation without parametric assumptions.
Chapter 2

Testing the Fundamental Assumption

2.1 Local asymptotic behavior of a conditional quantile process
Let ξ(x) be the p-quantile of the d.f. G(·|x), x ∈ [x_l, x_u] = X ⊂ R, and let ξ̂_{n,k_n}(x) be the k-NN estimator, where the Z_{ni} are defined with respect to the pivot x. Note that these Z_{ni} are the induced order statistics corresponding to the nearest neighbors of x (i.e. those X_i's which are closest to x). As such, if we consider two distinct points, say x₁ and x₂, then the two sets of X_i forming the corresponding nearest neighbors would have at most k_n members in common. As such, the proportion of the two subsets' common elements will be asymptotically o(n⁻¹ k_n), and we may immediately say that ξ̂_{n,k_n}(x₁) and ξ̂_{n,k_n}(x₂) are asymptotically independent. We set

  W_n = {W_n(x), x ∈ [x_l, x_u] = X}

by letting

  W_n(x) = k_n^{1/2}(ξ̂_{n,k_n}(x) − ξ(x)).
Then the asymptotic behavior of W_n is depicted entirely by the pointwise asymptotic normality and the stochastic dependence pattern only in a shrinking neighborhood of fixed points.

We discussed the pointwise asymptotic normality in section 1.2.3 for k_n = t n^{4/5−2δ} (0 < a < t < b < ∞). Then we have

  W_n(x) →_D W(x),  for every x ∈ X,

where W(x) is normal with mean 0 and variance pq/g²(ξ(x)|x), q = 1 − p.
Regarding the stochastic dependence, we refer to Gangopadhyay and Sen (1993). Shrinking neighborhoods are defined by the sets of x relative to a given x₀ as

  I_n(x₀, K) = {x : |x − x₀| ≤ K k_n/n},

where K is a finite positive number. For any two points, say, x_n and x_n′, belonging to I_n(x₀, K), let k(x_n, x_n′) be the number of elements in common between the two neighbor sets. Consider the intersection set I_n(x_n, x_n′) of the two neighbor intervals, and let m_n(x_n, x_n′) be the length of this intersection set I_n(x_n, x_n′). Then, by the Bahadur representation, we may again claim that

  k(x_n, x_n′)/k_n − (n/k_n) f(x₀) m_n(x_n, x_n′) → 0,   (2.1)

and we may conclude that W_n(x_n) and W_n(x_n′) have asymptotic correlation

  (n/k_n) f(x₀) m_n(x_n, x_n′),   (2.2)

and it is easy to show that for both x_n and x_n′ belonging to I_n(x₀, K), (2.2) has a limit (which vanishes when m_n(x_n, x_n′) = 0). In order to justify this asymptotic analysis, we need to extend (1.10) for a suitable convergent sequence {x_n} (to x₀), where x_n ∈ I_n(x₀, K). In this context, we need to replace x by x_n (everywhere) in (1.10) and to show that this holds uniformly in I_n(x₀, K).
To facilitate this, we consider a stochastic process W_n*(C) = {W_n(x₀ + n⁻¹ k_n t), t ∈ C}. Consider a compact interval C = [−K, K], where K (< ∞) is a positive number so chosen that K f(x₀) ≤ 1. In passing, we may remark that (2.2) possesses a limit when we set x_n = x₀ + n⁻¹ k_n t₁ and x_n′ = x₀ + n⁻¹ k_n t₂, with t₁, t₂ ∈ C. This is given by the length of

  [t₁ f(x₀) − 1/2, t₁ f(x₀) + 1/2] ∩ [t₂ f(x₀) − 1/2, t₂ f(x₀) + 1/2],

which, for t₂ > t₁, is non-null when (t₂ − t₁) f(x₀) ≤ 1, and that is the reason we set K f(x₀) ≤ 1.
Theorem 2.1 If Z*_{ni} in (1.12) has finite fourth moment, then W_n*(C) = {W_n*(t), t ∈ C} is asymptotically tight, where W_n*(t) = W_n(x₀ + n⁻¹ k_n t), K f(x₀) ≤ 1, for k_n = O(n^{4/5−2δ}).

Proof: Let W_j* = W_n*(x₀ + n⁻¹ k_n t_j) for j = 1, 2, 3 and C = [−K, K] such that K f(x₀) ≤ 1, and t₁ < t₂ < t₃. From the representation (1.10), we have
  W_j* = k_n^{1/2} { β(ξ_{nj})(k_n/n)² + (k_n g(ξ_{nj}))⁻¹ Σ_{i=1}^{k_n} [1(Z*_{ni} > ξ_{nj}) − (1 − p)] + R_{jnk_n} },  for j = 1, 2, 3,

where ξ_{nj} = ξ(x₀ + n⁻¹ k_n t_j) and

  max_{k∈I_n(a,b)} |R_{jnk}| = O(n^{−3/5−3δ/2} log n)  a.s.

for j = 1, 2 and 3. Writing J_{jni} = 1(Z*_{ni} > ξ_{nj}) − (1 − p), we have

  W_2* − W_1* = k_n^{1/2} { (β(ξ_{n2}) − β(ξ_{n1}))(k_n/n)² + k_n⁻¹ Σ_{i=1}^{k_n} [J_{2ni} − J_{1ni}] + (R_{2nk_n} − R_{1nk_n}) },

and similarly for W_3* − W_2*. Expanding E[(W_2* − W_1*)²(W_3* − W_2*)²] gives sums of expectations of products of the J-differences over single and distinct indices, for instance

  E[ Σ_i (J_{2ni} − J_{1ni})² Σ_j (J_{3nj} − J_{2nj})² ]  and  E[ Σ_{i≠j≠k≠l} (J_{2ni} − J_{1ni})(J_{2nj} − J_{1nj})(J_{3nk} − J_{2nk})(J_{3nl} − J_{2nl}) ],

with κ_{nij} denoting the number of common elements around x₀ + n⁻¹ k_n t_i and x₀ + n⁻¹ k_n t_j; under the finite fourth moment condition each such term is bounded by a constant multiple of (t₂ − t₁)(t₃ − t₂). So

  E[(W_2* − W_1*)²(W_3* − W_2*)²] ≤ Constant (t₂ − t₁)(t₃ − t₂)

for −K < t₁ < t₂ < t₃ < K. By Theorem 2.3.3 of Sen (1981), we have shown the asymptotic tightness. □

We have proved:

1) asymptotic normality of (W_n(t₁), …, W_n(t_m)) for every fixed t_i, i = 1, 2, …, m, as n goes to ∞;

2) asymptotic tightness of {W_n(t), t ∈ [−K, K]}.

These two conditions guarantee the weak convergence of W_n(t) to W(t) for all t ∈ [−K, K] by Theorem 8.2.1 of Sen and Singer (1993).
2.2 Kolmogorov-Smirnov type estimator of the relative potency
The fundamental assumption of the bioequivalence model is

  H₀ : ξ^T(x) = ξ^S(x + μ)  for all x ∈ X ⊂ R and for some μ ∈ M ⊂ R,   (2.3)

where X and M are closed intervals of R. In this section, we discuss how to estimate μ and how to test the fundamental assumption. To estimate μ, we consider the Kolmogorov-Smirnov type estimator μ̂ which minimizes
  a_n*(μ) = sup_{x∈X} |ξ̂^T_{n,k_n}(x) − ξ̂^S_{n,k_n}(x + μ)|.

Under (2.3), we have

  sup_{x∈X} k_n^{1/2} |ξ̂^T_{n,k_n}(x) − ξ̂^S_{n,k_n}(x + μ)| = sup_{x∈X} |W_n^T(x) − W_n^S(x + μ)|,

so finding the μ which minimizes a_n*(μ) is equivalent to minimizing

  a_n(μ) = sup_{x∈X} |W_n^T(x) − W_n^S(x + μ)|

with respect to μ. We can estimate μ by varying x and μ numerically.
Lemma 2.1 Let

  a_n(μ) = sup_{x∈X} |W_n^T(x) − W_n^S(x + μ)|  for μ ∈ M

and

  a(μ) = sup_{x∈X} |W^T(x) − W^S(x + μ)|  for μ ∈ M;

then a_n(μ) converges to a(μ) in distribution for any μ ∈ M. So min_μ a_n(μ) converges to min_μ a(μ) in distribution.

Proof: For any μ ∈ M and a > 0,

  P{ sup_x |W_n^T(x) − W_n^S(x + μ)| < a } = P{ |W_n^T(x) − W_n^S(x + μ)| < a for all x ∈ X }
   → P{ |W^T(x) − W^S(x + μ)| < a for all x ∈ X } = P{ a(μ) < a },  as n → ∞. □
We may replace the conditional quantiles by the conditional d.f.'s because of the following lemma.

Lemma 2.2 ξ^T(x) = ξ^S(x + μ) iff G^T(·|x) = G^S(·|x + μ).

Proof: (⇐) If G^T(·|x) = G^S(·|x + μ) then

  G^S(ξ^S(x)|x) = p ⇒ G^T(ξ^S(x)|x − μ) = p ⇒ G^T(ξ^S(x + μ)|x) = p ⇒ ξ^S(x + μ) = ξ^T(x).

(⇒) If ξ^T(x) = ξ^S(x + μ) then

  G^S(ξ^S(x + μ)|x + μ) = p ⇒ G^S(ξ^T(x)|x + μ) = p,

and by the definition of the conditional d.f., G^T(·|x − μ) = G^S(·|x). □
Let n, m be the sample sizes of the test and standard preparations. We may get the Kolmogorov-Smirnov type estimator μ̂ which is defined by the equation

  D_{n,m}(μ̂) = inf_μ D_{n,m}(μ),

where

  D_{n,m}(μ) = sup_x |G_{nT,k_{nT}}(·|x) − G_{mS,k_{mS}}(·|x + μ)|,

and G_{nT,k_{nT}} and G_{mS,k_{mS}} are the k-NN empirical d.f.'s of Y^T and Y^S with respect to x and x + μ, respectively. We may use the method proposed by Rao, Schuster and Littell (1975), replacing order statistics by induced order statistics. Then we may have the following properties, which remain for future study:

(1) μ̂_{n,m} is translation invariant; that is, for any real number c,

  μ̂(Y^T_{n1}, …, Y^T_{nk_n}; Y^S_{m1} + c, …, Y^S_{mk_m} + c) = μ̂(Y^T_{n1}, …, Y^T_{nk_n}; Y^S_{m1}, …, Y^S_{mk_m}) + c.

(2) If N = min(n, m) → ∞ and k_N = O(N^{1/2−δ}) for some δ > 0, then μ̂_{n,m} − μ → 0 as N → ∞ with probability one.

2.3 Test of the fundamental assumption

Let ξ̂^T_{n,k_n} and ξ̂^S_{n,k_n} be the k-NN estimators of the conditional medians of the test and the standard treatments. Then we have two independent stochastic processes
  W_n^T(x) = k_n^{1/2}(ξ̂^T_{n,k_n}(x) − ξ^T(x)) →_D W^T(x)   (2.4)

and

  W_n^S(x) = k_n^{1/2}(ξ̂^S_{n,k_n}(x) − ξ^S(x)) →_D W^S(x),   (2.5)

where W^T(x) and W^S(x) are the Gaussian processes defined in the previous section. Subtracting (2.5) from (2.4) gives, under H₀ : ξ^T(x) = ξ^S(x + μ),

  k_n^{1/2}(ξ̂^T_{n,k_n}(x) − ξ̂^S_{n,k_n}(x + μ)) →_D W^T(x) − W^S(x + μ),  x ∈ X,

for all x ∈ X. We may carry out the test by a bootstrap sampling method; i.e. for an m-dimensional z = (x₁, …, x_m), we have m-dimensional multivariate normality after putting μ = μ̂. We can estimate the variance matrix by the bootstrap method, and we may have asymptotic consistency of the bootstrap estimator of the variance estimator.

Let n, m be the sample sizes of the test and standard preparations. Let m = c₁ n and k_m = c k_n. Under H₀ : ξ^T(x) = ξ^S(x + μ),

  k_n^{1/2}(ξ̂^T_{n,k_n}(x) − ξ̂^S_{m,k_m}(x + μ)) →_D N(0, p(1 − p)[g^{−2}(ξ^T(x)|x) + c⁻¹ g^{−2}(ξ^S(x + μ)|x + μ)]).   (2.6)

If the fundamental assumption holds, then we may use (2.6) to test it by a similar resampling method.
We have

  P[ sup_{x∈X} |W_n^T(x) − W_n^S(x + μ̂)| > C_{n,α} ] = P[ |W_n^T(x) − W_n^S(x + μ̂)| > C_{n,α} for some x ] → α,

provided C_{n,α} is chosen as the upper α-point of the limiting distribution of the supremum, by the weak convergence.
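As one concrete, purely illustrative way to approximate such a critical value C_{n,α}, the following sketch (Python, assuming numpy; the centering, the grid xs and the number of resamples B are our own choices, not a prescription from the text) resamples each arm and uses the centered sup-statistic:

import numpy as np

def m_hat(X, Y, x0, k):
    nn = np.argsort(np.abs(X - x0))[:k]
    return np.median(Y[nn])                    # k-NN conditional median

def bootstrap_critical_value(XT, YT, XS, YS, k, xs, mu, alpha=0.05, B=500, seed=0):
    rng = np.random.default_rng(seed)
    base_T = np.array([m_hat(XT, YT, x, k) for x in xs])
    base_S = np.array([m_hat(XS, YS, x + mu, k) for x in xs])
    stats = np.empty(B)
    for b in range(B):
        iT = rng.integers(0, len(XT), len(XT))     # resample test arm
        iS = rng.integers(0, len(XS), len(XS))     # resample standard arm
        dT = np.array([m_hat(XT[iT], YT[iT], x, k) for x in xs]) - base_T
        dS = np.array([m_hat(XS[iS], YS[iS], x + mu, k) for x in xs]) - base_S
        stats[b] = np.max(np.abs(dT - dS))         # centered sup-statistic
    return np.quantile(stats, 1 - alpha)           # estimated C_{n,alpha}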
2.4 Confidence intervals and powers of the tests: comparing the cardinality of the k-NN estimator

Stute (1986) considered the case k_n = O(n^{2/3}) for the cardinality of the NN-estimator, while Bhattacharya and Gangopadhyay (1988) considered k_n = O(n^{4/5}). However, in the latter case, there is generally a bias term of the order n^{−2/5}, and it may be necessary to eliminate the bias term when drawing statistical conclusions. Sen and Gangopadhyay (1990) considered k_n = O(n^{4/5−2δ}) for δ > 0, and they showed that the bias term is eliminated. In this section, we compare the confidence intervals and the powers of our tests between k_n = O(n^{4/5}) and k_n = O(n^{4/5−2δ}).

For k_n = t n^{4/5−2δ} (0 < a < t < b < ∞, δ > 0) and for fixed x and μ, we have

  k_n^{1/2}( ξ̂^T_n(x) − ξ̂^S_n(x + μ) − (ξ^T(x) − ξ^S(x + μ)) ) →_D N(0, σ²),   (2.7)

where

  σ² = p(1 − p)( 1/g²(x) + 1/g²(x + μ) ).
So a (1 − α) × 100% confidence interval is

  ξ̂^T_n(x) − ξ̂^S_n(x + μ) ± z_{α/2} σ k_n^{−1/2},   (2.8)

where z_α is the upper α-percentile of the standard normal distribution. The width of (2.8) is

  2 z_{α/2} σ k_n^{−1/2} = O(n^{−2/5+δ}).   (2.9)

And the power under the alternative hypothesis H_a : ξ^T(x) − ξ^S(x + μ) = k_n^{−1/2} σ d(x) is

  Pr{ |k_n^{1/2} σ⁻¹ (ξ̂^T_n(x) − ξ̂^S_n(x + μ))| > z_{α/2} } = 1 − (Φ(z_{α/2} − d(x)) − Φ(−z_{α/2} − d(x))).   (2.10)
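The power expression (2.10) is easy to evaluate; a quick numerical check (Python, assuming scipy; the drift values are arbitrary):

from scipy.stats import norm

def power(d, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    # 1 - (Phi(z - d) - Phi(-z - d)), as in (2.10)
    return 1 - (norm.cdf(z - d) - norm.cdf(-z - d))

print(power(0.0))    # 0.05: the size, attained when d(x) = 0
print(power(2.8))    # roughly 0.80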
To carry out inferences at more than one point of x ∈ X, we let z = (x₁, …, x_m)′ be the vector of the observed values. Then, as n → ∞,

  k_n^{1/2}( ξ̂^T_n(z) − ξ̂^S_n(z + μ) ) →_D N_m(0, Σ),   (2.11)

where Σ is a diagonal matrix with diagonal elements σ_i² = p(1 − p)(1/g²(x_i) + 1/g²(x_i + μ)). And

  Σ^{−1/2} k_n^{1/2}( ξ̂^T_n(z) − ξ̂^S_n(z + μ) ) →_D N_m(0, I)   (2.12)

for Σ^{−1/2} such that Σ^{−1/2} Σ Σ^{−1/2} = I. To get a (1 − α) × 100% simultaneous confidence interval under the null hypothesis H₀ : ξ^T(z) = ξ^S(z + μ), we require

  Pr{ |(Σ^{−1/2} k_n^{1/2}(ξ̂^T_n(z) − ξ̂^S_n(z + μ)))_i| > z_{α/(2m)} } ≤ α/m

for i = 1, …, m, so that the overall level is (α/m) · m = α, where A_i is the i-th element of the vector A. So the (1 − α) × 100% simultaneous confidence interval for ξ^T(z) − ξ^S(z + μ) is

  ξ̂^T_n(z) − ξ̂^S_n(z + μ) ± z_{α/(2m)} k_n^{−1/2} (σ₁, …, σ_m)′,   (2.13)

where σ_i is the i-th diagonal element of Σ^{1/2}. And the power under the alternative hypothesis H_a : ξ^T(z) − ξ^S(z + μ) = Σ^{1/2} k_n^{−1/2} d(z), where d(z) = (d(x₁), …, d(x_m))′, is

  Pr{ |(Σ^{−1/2} k_n^{1/2}(ξ̂^T_n(z) − ξ̂^S_n(z + μ)))_i| > z_{α/(2m)} for all i = 1, …, m }
   = Π_{i=1}^{m} ( 1 − Φ(z_{α/(2m)} − d(z)_i) + Φ(−z_{α/(2m)} − d(z)_i) ).   (2.14)
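A sketch of the Bonferroni-style computation behind (2.13)-(2.14) (Python, assuming numpy/scipy; the inputs diff, sigma, kn and d are placeholders for quantities estimated from data, not values from the text):

import numpy as np
from scipy.stats import norm

def simultaneous_ci(diff, sigma, kn, alpha=0.05):
    m = len(diff)
    z = norm.ppf(1 - alpha / (2 * m))        # z_{alpha/(2m)}, as in (2.13)
    half = z * sigma / np.sqrt(kn)
    return diff - half, diff + half

def simultaneous_power(d, alpha=0.05):
    m = len(d)
    z = norm.ppf(1 - alpha / (2 * m))
    # product form of (2.14) over the m coordinates of the drift d
    return np.prod(1 - norm.cdf(z - np.asarray(d)) + norm.cdf(-z - np.asarray(d)))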
For k_n = t n^{4/5} (0 < a < t < b < ∞) and for fixed x and μ, we have

  k_n^{1/2}( ξ̂^T_n(x) − ξ̂^S_n(x + μ) ) →_D N( t^{5/2}(β(x) − β(x + μ)), σ² )   (2.15)

as n → ∞, where

  σ² = p(1 − p)( 1/g²(x) + 1/g²(x + μ) )

and

  β(x) = β(ξ(x)) = −[f(x) G_{xx}(ξ(x)|x) + 2 f′(x) G_x(ξ(x)|x)] / {24 f³(x) g(ξ(x)|x)},

as defined in (1.11). So a (1 − α) × 100% confidence interval is

  ξ̂^T_n(x) − ξ̂^S_n(x + μ) − t^{5/2}(β(x) − β(x + μ)) k_n^{−1/2} ± z_{α/2} σ k_n^{−1/2},   (2.16)

and the width of (2.16) is

  2 z_{α/2} σ k_n^{−1/2} = O(n^{−2/5}).   (2.17)

And the power under the alternative hypothesis H_a : ξ^T(x) − ξ^S(x + μ) = k_n^{−1/2} σ d(x) is

  1 − Pr{ −z_{α/2} < k_n^{1/2} σ⁻¹ ( ξ̂^T_n(x) − ξ̂^S_n(x + μ) − t^{5/2}(β(x) − β(x + μ)) k_n^{−1/2} ) < z_{α/2} − d(x) }
   = 1 − (Φ(z_{α/2} − d(x)) − Φ(−z_{α/2} − d(x))).   (2.18)
The width of the confidence interval for k_n = O(n^{4/5}), (2.17), is shorter than that for k_n = O(n^{4/5−2δ}), (2.9). But the confidence interval for k_n = O(n^{4/5}) has the bias term t^{5/2}(β(x) − β(x + μ)). So the question is the effect of the bias term in (2.16). If the bias is negligible, then k_n = O(n^{4/5}) gives the smaller confidence interval; but if the bias term is not negligible, then we need to use k_n = O(n^{4/5−2δ}) or consider how to adjust for the bias term. We may say that the bias term is negligible if the confidence interval for k_n = O(n^{4/5}) is a subset of that for k_n = O(n^{4/5−2δ}). Writing k_{n1} = t n^{4/5−2δ} and k_{n2} = t n^{4/5}, and supposing ξ̂^T(x) − ξ̂^S(x + μ) are the same for the two k_n's, if both

  −t^{5/2}(β(x) − β(x + μ)) k_{n2}^{−1/2} + z_{α/2} σ k_{n2}^{−1/2} ≤ z_{α/2} σ k_{n1}^{−1/2}   (2.19)

and

  −t^{5/2}(β(x) − β(x + μ)) k_{n2}^{−1/2} − z_{α/2} σ k_{n2}^{−1/2} ≥ −z_{α/2} σ k_{n1}^{−1/2}   (2.20)

hold, then k_{n2} = t n^{4/5} is better. Conditions (2.19) and (2.20) are equivalent to

  z_{α/2} σ ( (k_{n2}/k_{n1})^{1/2} − 1 ) ≥ |t^{5/2}(β(x) − β(x + μ))|.   (2.21)

The left-hand side of (2.21) goes to ∞ as n → ∞ (since k_{n2}/k_{n1} = n^{2δ}), so we may say that k_n = t n^{4/5} is asymptotically better than k_n = t n^{4/5−2δ} in terms of the width of the confidence interval.
If our test is to reject H₀ : ξ^T(x) − ξ^S(x + μ) = 0 when

  sup_{x∈X} k_n^{1/2} |ξ̂^T_n(x) − ξ̂^S_n(x + μ)| > C_{n,α},

we need the distribution of the supremum of the difference of the processes. We may generate the critical values numerically. To calculate the power under the alternative hypothesis H_a : ξ^T(x) − ξ^S(x + μ) = d(x), we may need to generate the distribution based on the drift function d(x). To do that, we may assume some parametric form of d(x), e.g. a quadratic form.
2.5 Bias correction for k_n = O(n^{4/5})

In the previous section, we suggested that k_n = O(n^{4/5}) is asymptotically better than k_n = O(n^{4/5−2δ}) in terms of the width of the confidence interval. But k_n = O(n^{4/5}) has a bias term in the asymptotic mean, namely βt² as in Theorem 1.2. In practice, we can adjust that bias by applying the Jackknife method. In this section, we consider the bias correction for k_n = O(n^{4/5}) by applying the Jackknife method to given data.
The first method is Jackknife subsampling from the n original data (NBC). From Theorem 1.1 we have the following expression for ξ̂_{k_n,n}:

  ξ̂_{k_n,n} − ξ = β(ξ)(k_n/n)² + {k_n g(ξ)}⁻¹ Σ_{i=1}^{k_n} [1(Y*_{ni} > ξ) − (1 − p)] + R_{n,k_n},

where

  max_{k_n∈I_n(a,b)} |R_{n,k_n}| = O(n^{−3/5} log n),  a.s.

Similarly, for ξ̂_{k_{pn},pn} (0 < p < 1),

  ξ̂_{k_{pn},pn} − ξ = β(ξ)(k_{pn}/pn)² + {k_{pn} g(ξ)}⁻¹ Σ_{i=1}^{[k_{pn}]} [1(Y*_{ni} > ξ) − (1 − p)] + R_{pn,k_{pn}}.

Finally, since (k_{pn}/pn)² = p^{−2/5}(k_n/n)² for k_n = t n^{4/5}, we have

  ξ̂_{k_n,n} − ξ̂_{k_{pn},pn} = β(ξ) t² n^{−2/5}(1 − p^{−2/5}) + (stochastic terms) + R,

where

  max_{k_n∈I_n(a,b)} |R| = O(n^{−3/5} log n),  a.s.,

for k_n = t n^{4/5}. We may use

  ξ̂_{k_n,n} − (1 − p^{−2/5})⁻¹ · mean of [ξ̂_{k_n,n} − ξ̂_{k_{pn},pn}]   (2.22)
as a bias reduced estimator. We have equivalent results for the n − d subsampling method. Let ξ̂_{k_{n−d},n−d} be the conditional quantile estimator based on n − d samples; then

  ξ̂_{k_{n−d},n−d} − ξ = β(ξ)(k_{n−d}/(n − d))² + {k_{n−d} g(ξ)}⁻¹ Σ_{i=1}^{k_{n−d}} [1(Y*_{ni} > ξ) − (1 − p)] + R_{n−d,k_{n−d}}.

Similarly, for k_n = t n^{4/5} we may use

  ξ̂_{k_n,n} − (1 − ((n − d)/n)^{−2/5})⁻¹ · mean of [ξ̂_{k_n,n} − ξ̂_{k_{n−d},n−d}]

as a bias reduced estimator.
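A schematic implementation of the NBC correction (2.22) (Python, assuming numpy; the subsample fraction p, the number of replicates and the toy k-NN median are our own choices, and the divisor follows our reading of (2.22)):

import numpy as np

def nbc_bias_corrected(X, Y, x0, p=0.5, reps=50, t=1.0, seed=0):
    rng = np.random.default_rng(seed)
    def xi_hat(Xs, Ys):
        k = max(int(t * len(Xs) ** 0.8), 1)       # k_n = t * n^{4/5}
        nn = np.argsort(np.abs(Xs - x0))[:k]
        return np.median(Ys[nn])                  # conditional median estimate
    full = xi_hat(X, Y)
    m = int(p * len(X))
    diffs = [full - xi_hat(X[idx], Y[idx])        # full-data minus subsample estimate
             for idx in (rng.choice(len(X), m, replace=False) for _ in range(reps))]
    return full - np.mean(diffs) / (1 - p ** (-2.0 / 5.0))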
The second method is Jackknife subsampling from the k_n nearest neighborhood sample (KBC). For ξ̂_{pk_n,n} (0 < p < 1),

  ξ̂_{pk_n,n} − ξ = β(ξ)(k_n/n)² p² + {p k_n g(ξ)}⁻¹ Σ_{i=1}^{[pk_n]} [1(Y*_{ni} > ξ) − (1 − p)] + R_{n,pk_n}.

Finally, we have

  ξ̂_{k_n,n} − ξ̂_{pk_n,n} = β(ξ) t² n^{−2/5}(1 − p²) + (stochastic terms) + R,

where

  max_{k_n∈I_n(a,b)} |R| = O(n^{−3/5} log n),  a.s.

We may use

  ξ̂_{k_n,n} − (1 − p²)⁻¹ · mean of [ξ̂_{k_n,n} − ξ̂_{pk_n,n}]   (2.23)

as a bias reduced estimator. Similarly, let ξ̂_{k_n−d,n} be the conditional quantile estimator based on k_n − d samples; then

  ξ̂_{k_n−d,n} − ξ = β(ξ)((k_n − d)/n)² + {(k_n − d) g(ξ)}⁻¹ Σ_{i=1}^{k_n−d} [1(Y*_{ni} > ξ) − (1 − p)] + R_{n,k_n−d}.

Similarly, we may use

  ξ̂_{k_n,n} − (1 − ((k_n − d)/k_n)²)⁻¹ · mean of [ξ̂_{k_n,n} − ξ̂_{k_n−d,n}]

as a bias reduced estimator.
Chapter 3

NN-Estimator of a Conditional Quantile with Measurement Errors

3.1 Asymptotic properties of k-NN estimator with measurement errors on the covariate variable

3.1.1 Approximation of the conditional quantile
With uniform measurement error on X, we observe (T₁, Y₁), (T₂, Y₂), …, (T_n, Y_n) i.i.d. r.v., where T = X + U, U ∼ U(−ε, ε), and X and U are assumed to be independent. Then

  P(T ≤ t) = P(X + U ≤ t) = ∫ P(X ≤ t − u | U = u) f_U(u) du
   = ∫_{−ε}^{ε} G_X(t − u) (1/2ε) du = ∫_{t−ε}^{t+ε} G_X(u) (1/2ε) du → G_X(t)  as ε → 0⁺,

and

  f_T(t) = ∫_{−ε}^{ε} f_X(t − u)(1/2ε) du = ∫_{t−ε}^{t+ε} f_X(u)(1/2ε) du → f_X(t)  as ε → 0⁺.

And

  f_{X,T}(x, t) = f_{X,X+U}(x, t) = f_{X,U}(x, t − x) = f_X(x) f_U(t − x) = f_X(x) 1(x − ε < t < x + ε)/(2ε),

and we assume non-differential measurement error, i.e.,

  P(Y|T, X) = P(Y|X).
So,

  f_{Y|T}(y|t) = ∫ f_{Y,X|T}(y, x|t) dx = ∫ ( f_{Y,X,T}(y, x, t)/f_T(t) ) dx
   = ∫ ( f_{Y|X,T}(y|x, t) f_{X,T}(x, t)/f_T(t) ) dx = ∫ ( f_{Y|X}(y|x) f_{X,T}(x, t)/f_T(t) ) dx
   = (1/f_T(t)) (1/2ε) ∫_{t−ε}^{t+ε} f_{Y|X}(y|x) f_X(x) dx,

and

  G^ε_{Y|T}(y|t) = (1/f_T(t)) (1/2ε) ∫_{t−ε}^{t+ε} ∫_{−∞}^{y} f_{Y|X}(z|x) f_X(x) dz dx
   = (1/f_T(t)) (1/2ε) ∫_{t−ε}^{t+ε} G_{Y|X}(y|x) f_X(x) dx,   (3.1)
and, correspondingly, g^ε_{Y|T}(y|t) = (∂/∂y) G^ε_{Y|T}(y|t). It is easy to show that G^ε_{Y|T}(y|t) − G_{Y|X}(y|t) → 0 as ε → 0⁺. Let

  A = −(1/3) [G_{xx}(y|t) f_X(t) + 2 G_x(y|t) f_X′(t)] / f_X(t),

where G_x and G_{xx} are the first and second partial derivatives of G_{Y|X}(y|x) with respect to x, evaluated at x = t. So for small ε,

  G^ε_{Y|T}(y|t) − G_{Y|X}(y|t) = A ε²/2 + o(||ε||²),   (3.2)

and

  G_{Y|X}(τ|t) = G_{Y|X}(θ|t) + f_{Y|X}(θ|t)(τ − θ) + O(||τ − θ||²).   (3.3)
and
If we put (3.3) into (3.2) then we have
G}'IT(Tlt) =
GYIX(O\t)
+ fYlx(Ojt)(T -
0)
+ O(IIT -
-~ Gxx(Tlt)fx(tjl~t~Gx(Tlt)fx(t)
f.2 /2
OW)
+ o(IIf.W)
as f.
-+
+ 2G x(Tlt)fx (t) f. 2/2 + 0 (II f. 11 2) as f.
fl (t )
-+
0 + (:3.-±)
Due to the Lemma 3.1 (3.4) becomes
GhT(Tlt) =
GYlx(OIt) + fYlx(Olt)(T - 0)
_~ Gxx(Tlt)fx(t)
3
0 + (3 "")
. ..J
If we let
then
GhT(Tlt) - GYlx(Olt)
=
as f.
-+
frvd Olt)( T - 0) -
~ G xx (Tit )fx(tjl~ t~Gx( Tit )r~( t) f.2 /2 + o( IIf.W) =
0(:3.6)
O+. So we have the following theorem.
Theorem 3.1 Let 0 and T be the p-quantile from GYlx(·lx) and G}'IT(·lt). respec-
tivtly. where T = X
+ U,
U "'" U( -f., f.) and T and X are assumed to bf. indepf.ndf.nt.
thf.n
as f.
-+
0+.
39
Lemma 3.1 Let p = G_{Y|X}(θ|t), and suppose there exists (G^ε_{Y|T})⁻¹(p) such that G^ε_{Y|T}(y|t) = p ⇔ (G^ε_{Y|T})⁻¹(p) = y for ε < ε₁, y ∈ R, t ∈ R and p ∈ (0, 1). Then

  (∂/∂ε) (G^ε_{Y|T})⁻¹(p) → 0  as ε → 0⁺.

Proof: Let

  G^ε_{Y|T}(τ|t) = G_{Y|X}(τ|t) + (∂²/∂ε²) G^ε_{Y|T}(τ|t)|_{ε=0} ε²/2 + O(||ε||³)
   = G_{Y|X}(τ|t) + A ε²/2 + O(||ε||³) = p.

Then τ = (G^ε_{Y|T})⁻¹(p), and differentiating the identity in ε gives

  lim_{ε→0⁺} (∂/∂ε) (G^ε_{Y|T})⁻¹(p) = 0.

Therefore, ||τ^ε − θ|| = o(||ε||), and, in view of (3.2), τ^ε − θ = O(||ε||²). And the lemma is proved.
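To see Theorem 3.1 at work, a small simulation (Python, assuming numpy; the model Y = X² + noise, the pivot and the sample size are our own toy choices) compares the k-NN conditional median computed from the error-free X with the one computed from T = X + U:

import numpy as np

rng = np.random.default_rng(1)
n, x0 = 100000, 1.0
k = int(n ** 0.8)
X = rng.uniform(0.0, 2.0, n)
Y = X ** 2 + rng.normal(size=n)          # conditional median theta(x) = x^2

def knn_median(C):                        # C: the observed covariate (X or T)
    nn = np.argsort(np.abs(C - x0))[:k]
    return np.median(Y[nn])

base = knn_median(X)
for eps in (0.05, 0.1, 0.2):
    T = X + rng.uniform(-eps, eps, n)     # T = X + U, U ~ U(-eps, eps)
    print(eps, knn_median(T) - base)      # the drift shrinks roughly like eps^2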
3.1.2 Asymptotic normality of the conditional quantile estimator with measurement error

We take W_i = |T_i − x₀| for i = 1, …, n; then W_{n1} < ⋯ < W_{nn} are the order statistics of W₁, …, W_n, and Y*_{n1}, …, Y*_{nn} are the induced order statistics. The conditional quantile, ξ^ε = ξ^ε_p(x₀), satisfies

  G^ε_{Y|T}(ξ^ε|x₀) = p

for given p (0 < p < 1) and x₀ ∈ R¹. The contaminated k-NN empirical cdf of Y (with respect to x₀) is defined as

  G^ε_{nk}(y) = k⁻¹ Σ_{i=1}^{k} 1(Y*_{ni} ≤ y).

The k-NN estimator of ξ^ε can now be defined as

  ξ̂^ε_{nk} = the [kp]-th order statistic of Y*_{n1}, …, Y*_{nk} = inf{y : G^ε_{nk}(y) ≥ [kp]/k}.

Then we have the following theorems, similar to those of Bhattacharya and Gangopadhyay (1988).
Theorem 3.2 For ε > 0 and k ∈ I_n(a, b),

  ξ̂^ε_{nk} − ξ^ε = β^ε(ξ^ε)(k/n)² + {k g^ε(ξ^ε)}⁻¹ Σ_{i=1}^{k} [1(Y*_{ni} > ξ^ε) − (1 − p)] + R^ε_{nk},

where

  max_{k∈I_n(a,b)} |R^ε_{nk}| = O(n^{−3/5} log n),  a.s.,

and G^ε(y|t) = G^ε_{Y|T}(y|t), g^ε(y|t) = (∂/∂y) G^ε_{Y|T}(y|t).
Theorem 3.3 Let T^ε_n(t) = ξ̂^ε_{n,[n^{4/5}t]}. Then for any 0 < a < b, the conclusion of Theorem 1.2 holds with β^ε, g^ε and ξ^ε in place of β, g and ξ, where {B(t), t > 0} denotes a standard Brownian motion.
Theorem 3.4 For ε = ε_n = O(n^{−1/5}),

  ξ̂^ε_{nk} − ξ = β^ε(ξ^ε)(k/n)² + {k g(ξ|x₀)}⁻¹ Σ_{i=1}^{k} [1(Y*_{ni} > ξ^ε) − (1 − p)] + R^{ε*}_{nk},

where

  β^ε(ξ^ε) → β(ξ)  as n → ∞,

and

  max_{k∈I_n(a,b)} |R^{ε*}_{nk}| = O(n^{−2/5}),  a.s.

Theorem 3.5 Let T^ε_n(t) = ξ̂^ε_{n,[n^{4/5}t]}. Then for any 0 < a < b and ε_n = O(n^{−1/5}), the conclusion of Theorem 1.2 holds, where β = β(ξ) given in Theorem 3.4, σ² = p(1 − p)/g²(ξ), and {B(t), t > 0} denotes a standard Brownian motion.
Proof of Theorem 3.4: Differentiating (3.1) with respect to t gives

  G^ε_x(y|t) = { [G_{Y|X}(y|t+ε) f_X(t+ε) − G_{Y|X}(y|t−ε) f_X(t−ε)] ∫_{t−ε}^{t+ε} f_X(x) dx
   − ∫_{t−ε}^{t+ε} G_{Y|X}(y|x) f_X(x) dx · [f_X(t+ε) − f_X(t−ε)] } / ( ∫_{t−ε}^{t+ε} f_X(x) dx )²,

and a further differentiation produces the boundary terms G_x(y|t ± ε) f_X(t ± ε) + G_{Y|X}(y|t ± ε) f_X′(t ± ε). Evaluating at t = x₀ and y = ξ^ε and normalizing as in (1.11) yields

  −24 β^ε(ξ^ε) = [f^ε(x₀) G^ε_{xx}(ξ^ε|x₀) + 2 f^{ε′}(x₀) G^ε_x(ξ^ε|x₀)] / { f^{ε3}(x₀) g^ε(ξ^ε) },   (3.8)

where f^ε(x₀) = (1/2ε) ∫_{x₀−ε}^{x₀+ε} f_X(x) dx, f^{ε′}(x₀) = (1/2ε)[f_X(x₀+ε) − f_X(x₀−ε)], and g^ε(ξ^ε) = ∫_{x₀−ε}^{x₀+ε} g_{Y|X}(ξ^ε|x) f_X(x) dx / ∫_{x₀−ε}^{x₀+ε} f_X(x) dx. Since ε = ε_n = O(n^{−1/5}) → 0,

  f^ε(x₀) → f_X(x₀),  f^{ε′}(x₀) → f_X′(x₀),  G^ε_x(ξ^ε|x₀) → G_x(ξ|x₀),  G^ε_{xx}(ξ^ε|x₀) → G_{xx}(ξ|x₀),  g^ε(ξ^ε) → g(ξ|x₀),

each difference quotient converging to the corresponding derivative by the smoothness conditions (A1)-(A2) and by ξ^ε − ξ = O(ε_n²) (Theorem 3.1). Hence

  −24 β^ε(ξ^ε) → [G_{xx}(ξ|x₀) f_X(x₀) + 2 f_X′(x₀) G_x(ξ|x₀)] / {f_X³(x₀) g(ξ|x₀)} = −24 β(ξ),

i.e. β^ε(ξ^ε) → β(ξ) as n → ∞. The bound max_{k∈I_n(a,b)} |R^{ε*}_{nk}| = O(n^{−2/5}) a.s. follows from Theorem 3.2 on absorbing the O(ε_n²) = O(n^{−2/5}) difference between g^ε(ξ^ε) and g(ξ|x₀) into the remainder. □
3.1.3 Asymptotic variance and bias of the process with measurement error

In this section, we investigate the asymptotic variance and bias of our process with measurement error. We call the variance and the bias from the uniform measurement error model the contaminated variance and the contaminated bias. We will show the convergence of the contaminated variance and bias to the original (uncontaminated) variance and bias. And sufficient conditions under which the contaminated variance and bias are greater than the uncontaminated ones are derived.

From Theorem 1.4 and Theorem 1.5, we let β = β(ξ) be the asymptotic bias (ABIAS) and let p(1 − p)/g²(ξ) be the asymptotic variance (AVAR), where g(ξ) = g_{Y|X}(ξ|x₀). Similarly, from Theorem 3.2 and Theorem 3.3, we let β^ε = β^ε(ξ^ε) be the asymptotic bias (ABIAS^ε) and let p(1 − p)/(g^ε)²(ξ^ε) be the asymptotic variance (AVAR^ε), where

  β^ε(ξ^ε) = −[f^ε(x₀) G^ε_{xx}(ξ^ε|x₀) + 2 f^{ε′}(x₀) G^ε_x(ξ^ε|x₀)] / {24 f^{ε3}(x₀) g^ε(ξ^ε)},

  g^ε(ξ^ε) = g^ε_{Y|T}(ξ^ε|x₀) = ∫_{x₀−ε}^{x₀+ε} g_{Y|X}(ξ^ε|x) f_X(x) dx / ∫_{x₀−ε}^{x₀+ε} f_X(x) dx.
To prove Theorem 3.6, we need Lemma 3.2.
Lemma 3.2

  ∫_{x₀−ε}^{x₀+ε} f_X(x) dx = 2ε f_X(x₀) + O(||ε||³)  and  ∫_{x₀−ε}^{x₀+ε} (x − x₀) f_X(x) dx = O(||ε||³).

Proof: By Taylor expansion,

  ∫_{x₀−ε}^{x₀+ε} f_X(x) dx = ∫_{x₀−ε}^{x₀+ε} [f_X(x₀) + f_X′(x₀)(x − x₀) + O(||x − x₀||²)] dx = 2ε f_X(x₀) + O(||ε||³),

and

  ∫_{x₀−ε}^{x₀+ε} (x − x₀) f_X(x) dx = (2/3) f_X′(x₀) ε³ + O(||ε||⁵) = O(||ε||³). □
Let g(y|x) = g_{Y|X}(y|x) be the conditional pdf of Y given X = x; then

  g^ε(ξ^ε) = g(ξ|x₀) + g′(ξ|x₀)(ξ^ε − ξ) + O(||ε||²),   (3.9)

where

  g′(y|x) = (∂/∂y) g_{Y|X}(y|x).   (3.10)

Theorem 3.6 AVAR^ε − AVAR → 0 as ε → 0. And if g′(ξ|x₀)(ξ^ε − ξ) < 0 for all ε > 0, then AVAR^ε > AVAR for small ε.

Proof: From the definition of g^ε,

  g^ε(ξ^ε) ∫_{x₀−ε}^{x₀+ε} f_X(x) dx = ∫_{x₀−ε}^{x₀+ε} g(ξ^ε|x) f_X(x) dx   (3.11)
   = ∫_{x₀−ε}^{x₀+ε} [g(ξ|x₀) + g_x(ξ|x₀)(x − x₀) + g′(ξ|x₀)(ξ^ε − ξ) + O(||x − x₀||²) + O(||ξ^ε − ξ||²)] f_X(x) dx,

and by Lemma 3.2,

  g^ε(ξ^ε) = g(ξ|x₀) + g′(ξ|x₀)(ξ^ε − ξ) + O(||ξ^ε − ξ||²) + O(||ε||²),

which proves (3.9); both assertions then follow since AVAR^ε = p(1 − p)/(g^ε)²(ξ^ε). □
If g_{Y|X}(y|x), the conditional density of Y given X = x, is unimodal with its maximum at the median, then the difference between g^ε(ξ^ε) and g(ξ) at the median is of the order of ε², so the difference is negligible up to order ε². So our procedure is robust to the measurement error: our procedure is not sensitive to the measurement error.
From (3.8), write −24 β^ε(ξ^ε) = (A + B + C)/D, where (with the common factor 2ε absorbed in D)

  A = G_x(ξ^ε|x₀ + ε) f(x₀ + ε) − G_x(ξ^ε|x₀ − ε) f(x₀ − ε),
  B = G(ξ^ε|x₀ + ε) f′(x₀ + ε) − G(ξ^ε|x₀ − ε) f′(x₀ − ε),
  C = −[f′(x₀ + ε) − f′(x₀ − ε)] ∫_{x₀−ε}^{x₀+ε} G(ξ^ε|x) f(x) dx / ∫_{x₀−ε}^{x₀+ε} f(x) dx,
  D = 2ε ( (1/2ε) ∫_{x₀−ε}^{x₀+ε} f(x) dx )² ( (1/2ε) ∫_{x₀−ε}^{x₀+ε} g(ξ^ε|x) f(x) dx )
   = 2ε f³(x₀) [g(ξ|x₀) + g′(ξ|x₀)(ξ^ε − ξ)] + O(||ε||³).

Since

  G(ξ^ε|x₀ ± ε) − G(ξ|x₀ ± ε) = g(ξ|x₀ ± ε)(ξ^ε − ξ) + o(||ξ^ε − ξ||)  and  G(ξ|x₀ ± ε) = G(ξ|x₀) ± G_x(ξ|x₀) ε + O(||ε||²),

Taylor expansions give

  A = 2[G_{xx}(ξ|x₀) f(x₀) + G_x(ξ|x₀) f′(x₀)] ε + 2[G′_{xx}(ξ|x₀) f(x₀) + G′_x(ξ|x₀) f′(x₀)] ε (ξ^ε − ξ) + O(||ε||²),
  B = 2[G_x(ξ|x₀) f′(x₀) + G(ξ|x₀) f″(x₀)] ε + 2[g_x(ξ|x₀) f′(x₀) + g(ξ|x₀) f″(x₀)] ε (ξ^ε − ξ) + O(||ε||²),
  C = −2 f″(x₀) ε [G(ξ|x₀) + g(ξ|x₀)(ξ^ε − ξ)] + O(||ε||²),

where G′_x = (∂/∂y) G_x(y|x) and G′_{xx} = (∂/∂y) G_{xx}(y|x). Therefore,

  −24 β^ε(ξ^ε) = [G_{xx}(ξ|x₀) f(x₀) + 2 G_x(ξ|x₀) f′(x₀)] / {f³(x₀) g(ξ|x₀)}
   + (2/{f³(x₀) g(ξ|x₀)}) { g_x(ξ|x₀) f′(x₀) + G′_x(ξ|x₀) f′(x₀) + f(x₀) G′_{xx}(ξ|x₀)
   − [G_{xx}(ξ|x₀) f(x₀) + 2 G_x(ξ|x₀) f′(x₀)] g′(ξ|x₀)/(2 g(ξ|x₀)) } (ξ^ε − ξ) + O(||ε||²),

so that β^ε(ξ^ε) − β(ξ) = O(||ξ^ε − ξ||) + O(||ε||²).
Theorem 3.7 If ||ξ^ε − ξ||² = O(||ε||²), then

  (ABIAS^ε)² − (ABIAS)² = O(||ε||²).
3.2 Approximation of Conditional Quantiles with Measurement Errors

3.2.1 First order approximation

Let G(·|x) be the conditional c.d.f. of Y given x; then

  G(·|x + ε) = G(·|x) + G′(·|x) ε + o(||ε||)   (3.12)

as ε → 0, where G′(·|x) is the first derivative of G(·|x) with respect to x. And as ||τ − θ|| → 0,

  G(τ|x) = G(θ|x) + g(θ|x)(τ − θ) + O(||τ − θ||²),   (3.13)
where g(θ|x) = (∂/∂τ) G(τ|x)|_{τ=θ}. Substituting (3.13) into (3.12), we get

  G(τ|x + ε) = G(θ|x) + g(θ|x)(τ − θ) + G′(τ|x) ε + o(||ε||) + O(||τ − θ||²)   (3.14)

as ε → 0. If we let G(τ|x + ε) = G(θ|x) = p, then

  τ = θ − (G′(τ|x)/g(θ|x)) ε + O(||ε||²)   (3.15)

as ε → 0, if ||τ − θ||² = O(||ε||²).
3.2.2 Second order approximation

Let G(·|x) be the conditional c.d.f. of Y given x; then

  G(·|x + ε) = G(·|x) + G′(·|x) ε + G″(·|x) ε²/2 + o(||ε||²)   (3.16)

as ε → 0, where G′(·|x) and G″(·|x) are the first and second derivatives of G(·|x) with respect to x. And as ||τ − θ|| → 0,

  G(τ|x) = G(θ|x) + g(θ|x)(τ − θ) + g′(θ|x)(τ − θ)²/2 + o(||τ − θ||²),   (3.17)

where g(θ|x) = (∂/∂τ) G(τ|x)|_{τ=θ} and g′(θ|x) = (∂²/∂τ²) G(τ|x)|_{τ=θ}. Substituting (3.17) into (3.16) and using G(τ|x + ε) = G(θ|x) = p gives

  g(θ|x)(τ − θ) + g′(θ|x)(τ − θ)²/2 + G′(τ|x) ε + G″(τ|x) ε²/2 + o(||ε||²) = 0   (3.18)

as ε → 0. If ||τ − θ||² = o(||ε||²) as ||τ − θ|| → 0, then we have

  G(τ|x + ε) = G(θ|x) + g(θ|x)(τ − θ) + G′(τ|x) ε + G″(τ|x) ε²/2 + o(||ε||²)

as ε → 0. So,

  τ = θ − (G′(τ|x)/g(θ|x)) ε − (G″(τ|x)/g(θ|x)) ε²/2 + o(||ε||²)   (3.19)

as ε → 0.
3.2.3 Second order approximation with symmetric argument

Let G(·|x) be the conditional c.d.f. of Y given x; then

  G(·|x + ε) = G(·|x) + G′(·|x) ε + G″(·|x) ε²/2 + o(||ε||²),
  G(·|x − ε) = G(·|x) − G′(·|x) ε + G″(·|x) ε²/2 + o(||ε||²),

as ε → 0, where G′(·|x) and G″(·|x) are the first and second derivatives of G(·|x) with respect to x. And

  (G(·|x + ε) + G(·|x − ε))/2 = G*(·|x, ε) = G(·|x) + G″(·|x) ε²/2 + o(||ε||²)   (3.20)

as ε → 0. G*(·|x, ε) can be considered the conditional c.d.f. of Y given x and ε. Remember that, under the symmetric assumption on U, T = X + U equals X + ε or X − ε. And as ||τ − θ|| → 0,

  G(τ|x) = G(θ|x) + g(θ|x)(τ − θ) + O(||τ − θ||²),   (3.21)

where g(θ|x) = (∂/∂τ) G(τ|x)|_{τ=θ}. Substituting (3.21) into (3.20), we get

  G*(τ|x, ε) = G(θ|x) + g(θ|x)(τ − θ) + O(||τ − θ||²) + G″(τ|x) ε²/2 + o(||ε||²).

Since G*(τ|x, ε) = G(θ|x) = p,

  τ = θ − (G″(τ|x)/g(θ|x)) ε²/2 + o(||ε||²) + O(||τ − θ||²) = θ − (G″(τ|x)/g(θ|x)) ε²/2 + o(||ε||²)   (3.22)

as ε → 0, if ||τ − θ||² = o(||ε||²). So the question is the relationship between the magnitude of the measurement error (ε) and the difference between the two quantiles (τ − θ).

Note that

  θ(x + ε) = θ(x) + θ′(x) ε + θ″(x) ε²/2 + o(||ε||²),
  θ(x − ε) = θ(x) − θ′(x) ε + θ″(x) ε²/2 + o(||ε||²),

as ε → 0. By a similar argument, we let

  τ ≈ (θ(x + ε) + θ(x − ε))/2 = θ(x) + θ″(x) ε²/2 + o(||ε||²),

so that (3.18) and (3.22) hold.
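A quick numerical check of (3.22) (Python, assuming scipy; the toy model Y|X = x ∼ N(x², 1), the pivot x₀ = 1, p = 1/2 and the grid of ε values are our own choices) compares the exact mixture quantile with the ε²-approximation:

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

x0, p = 1.0, 0.5
G = lambda y, x: norm.cdf(y - x**2)          # conditional cdf of Y given X = x
g = lambda y, x: norm.pdf(y - x**2)          # conditional pdf
theta = x0**2                                 # p = 1/2 quantile at x0

for eps in (0.05, 0.1, 0.2):
    Gstar = lambda y: 0.5 * (G(y, x0 + eps) + G(y, x0 - eps))   # (3.20)
    tau = brentq(lambda y: Gstar(y) - p, -10.0, 10.0)           # exact mixture quantile
    h = 1e-4                                                    # central differences for G''
    Gxx = (G(theta, x0 + h) - 2 * G(theta, x0) + G(theta, x0 - h)) / h**2
    print(eps, tau - theta, -Gxx / g(theta, x0) * eps**2 / 2)   # (3.22) approximation

The two printed columns agree to leading order, consistent with the ε² rate.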
3.2.4 Bivariate symmetric measurement error

Let Y, X ∼ f(y, x) and

  T = X + ε  w.p. 1/2,  T = X − ε  w.p. 1/2.

Then

  f(y|T = t) = f_{Y,T}(y, T = t)/f_T(T = t) = [f_{Y,X}(y, t + ε) + f_{Y,X}(y, t − ε)] / [f_X(t + ε) + f_X(t − ε)].

And

  G^ε(y|t) = ∫_{−∞}^{y} [f_{Y,X}(z, t + ε) + f_{Y,X}(z, t − ε)] dz / [f_X(t + ε) + f_X(t − ε)]
   → ∫_{−∞}^{y} f_{Y,X}(z, t) dz / f_X(t) = G(y|t)  as ε → 0⁺,

  (∂/∂ε) { ∫_{−∞}^{y} [f_{Y,X}(z, t + ε) + f_{Y,X}(z, t − ε)] dz / [f_X(t + ε) + f_X(t − ε)] } |_{ε=0⁺} = 0,

and let

  A(y|t) = (∂²/∂ε²) { ∫_{−∞}^{y} [f_{Y,X}(z, t + ε) + f_{Y,X}(z, t − ε)] dz / [f_X(t + ε) + f_X(t − ε)] } |_{ε=0}
   = ∫_{−∞}^{y} [f″(z, t) f_X(t) − f(z, t) f_X″(t)] / f_X²(t) dz,

where f″(z, t) = (∂²/∂ε²) f_{Y,X}(z, t + ε)|_{ε=0⁺}; then we have

  G^ε(y|t) = G(y|t) + A(y|t) ε²/2 + o(||ε||²).

And

  (∂/∂y) G(y|x) = g(y|x),  (∂²/∂y²) G(y|x) = g′(y|x).

Therefore, if ||τ − θ||² = o(||ε||²), the relationship between θ and τ is

  p = G(θ|t) = G^ε(τ|t) = G(θ|t) + g(θ|t)(τ − θ) + A(τ|t) ε²/2 + o(||ε||²),  i.e.  τ = θ − (A(τ|t)/g(θ|t)) ε²/2 + o(||ε||²).   (3.23)
Example 3.1 In the normal case, (3.23) becomes

  G(τ|t) = G(θ|t) + g(θ|t)(τ − θ) − g(θ|t) ((θ − μ)/σ²) (τ − θ)²/2 + A(τ|t) ε²/2 + o(||ε||²),

where μ and σ are the conditional mean and standard deviation of Y given X = t.

3.2.5 Relationship between two quantiles: under normal assumptions
Let

  Y|X = x ∼ N(ax + b, σ²).   (3.25)

Then, for small ε (> 0), the expansions (3.12)-(3.14) apply.
Let θ(x) be the p × 100% quantile of Y|x, and τ(x + ε) be the p × 100% quantile of Y|x + ε, i.e.,

  Φ((θ(x) − ax − b)/σ) = p = Φ((τ(x + ε) − a(x + ε) − b)/σ).

So in the normal case we have ||θ(x) − τ(x + ε)|| = O(||ε||). Therefore, (3.14) and (3.23) hold in the normal case.

Example 3.2 Under the assumption of (3.25), (3.14) becomes

  τ = θ + a [φ((τ − ax − b)/σ) / φ((θ − ax − b)/σ)] ε + O(||ε||²).

Let Y, X ∼ BN(μ_X, μ_Y, σ_X², σ_Y², ρ_{XY}) and T = X + U, where U ∼ N(0, σ_U²) and X and U are assumed to be independent. Then T ∼ N(μ_X, σ_X² + σ_U²). We are interested in the difference between the conditional quantiles of Y given X and given T. Under these assumptions,

  Y|T = t ∼ N( μ_Y + ρ_{TY} (σ_Y/√(σ_X² + σ_U²)) (t − μ_X), σ_Y²(1 − ρ_{TY}²) ).

Our parameters, the conditional quantiles θ and τ, are expressed accordingly. If we restrict to p = 1/2, i.e. the median, then we have

  θ = μ_Y + ρ_{XY} (σ_Y/σ_X)(x − μ_X),  τ = μ_Y + ρ_{TY} (σ_Y/√(σ_X² + σ_U²))(t − μ_X).
..
So,
O-r
O'y
( PXY
x - jlx
O'x
Jt0'1- +jlxO'[r ) '
- PTY
So.
O'y
O'y
( PXY O(x) - r(t) = -(pXyx
- PTyt) - -jlx
O'x
O'x
V1 +PTY)
.
O'ij !0'1
In generaL we may assume PXy < PTY, so we may let PTY = CPXY, (0 <
O'y
O'y
(1O(x) - r(t) = -pxy(x
- ct) - -jlxPXY
O'x
O'x
C
< 1) then
C)
.
.
VI + 0'[-/0'1
For general p, (0 < p < 1), we have
j
l1(X)
-00
j
<pYjx(y)dy
~Ylx(O(x))
Again. let PTY = CPXY. (0 <
cP
(0 - jly -
C
T(t)
-00
<PYlt(y)dy = p
~Ylt(r(t))
=
= p
< 1) then
PXYO'Y/O'x(x - jlx )) =
~
(r - jlY - CPXYO'Y/VO'l + O'[-(t - jlx ))
O'YVI - c2 PlY
O'yVI - Ply
1 - ply
2/ 2
1 + O'u O'x
x +t
wp 1/2
t
wp 1/2
T={ X -
59
-
VI
-
C
2 2
PXY
)
.
Then
j(YIT)
=
j(Y, T)
j(T)
=
(f(Y, T + t) + j(Y, T - t))/2
(f(T + t) + j(T - t))/2
Mixture of two bivariate normals
Mixture of two univariate normals .
G(YIT
= t) =
=
iYoo j(zlT = t)dz
JY
-00
-
</>Y,x(z, t
<Px(t
+ t) + </>Y,x(z, t -
+ t) + </>x(t -
t) dz
t)
1
</>x(t
x
+ t) + </>x(t -
JY
-00
t)
(</>Y,X(Z, t + t) </>x(t
</>x(t + t)
+ t) + </>Y,x(z, t -
t) <Px(t - t)dZ)
<Px(t - t)
1
-
</Jx(t
x
-
+ t) + <Px(t -
t)
iYoo <PYlx(zlt + t)</>x(t + t) + <PYlx(zlt 1
<Px(t
+ t) + <Px(t -
t)
So the relationship is
G(Tlt)
=
1
<Px(t
+ t) + <Px(t 60
t)
t)</>x(t - t)dz
<I>YIX (Olx) = p,
1
<px(t
where YIX = x '" N(a
+ t) + ¢x(t - t)
+ bx,( 2 )
for a = J.ly - pay/aXJ.lx,b = pay/ax and a 2 =
a?(1-p2).
3.2.6
Approximation of conditional cdf with bivariate normality and uniform measurement error
Condition 3.1 Let (Y, X) has bivariate normal distribution and T = X
+ U.
Co ""'
F(-ct). AndX andU are assumed to be independent. Let Gf(·lt) be the conditional
cdf of Y given T = t.
Theorem 3.8 Under condition 3.1, as
t -
0+,
</>((t - t - f.lX )/a)() - </>((t + t - f.l)( )/ax)
]
x [
aY+J.ly-t
<I>((t + t - wed/ax) - <I>((t - t - J.l.d/ax)·
.
<I>ypdylt) - 9Y!-"(Ylt)
x [ox(t
<I>x(t
(:1)
- t) - d>x(t + t) a1
<I>x(t - t)
+ t) -
61
+ J.lx - t] + o(lkW),
and
as
t ~
a:r
+ band a1
0+. where <1>(.) and ¢(.) are the cdf and pdf of standard normal distribution.
are the conditional mean and the conditional standard deviation of Y
given X = x. PX and ax are the marginal mean and the standard deviation of X.
Proof: Let
<I>(ylx) = <I> (y - (:: + b)) ,
""'here a = PXyay j ax, b = py - PXyay j ax px and a1 =
<I>
~<I>
o:r
~<I>
ox
2
(y - (:~ + b)) Ix=t
(y - (ax + b)) I
x=t
a1
(y - (ax + b)) I
<I> ( Y - ;: -
-¢
ph··). Then
b)
(y - ;: - b) (:J
_¢ (
x=t
a1
Jaf· (1 -
y- ;: - b)
(:J
2
(y - ;: - b)
So
<I>(yl:r )
<I> ( y - at - b) -¢ (y - ax - b) (a)
- (:r-t)
a1
a1
a
(:J
_¢(y-;:-b)
2
(y-;:-b) (x*-t)2j2.
(:3.26)
for l:r* - t\ < Ix - tl. Note that
:x
<f>(ax
+ b)
-¢(ax + b)(ax + b)a.
-
(x-px), ()
2
<Px x ,
ax
and
J/!ff xa>.d:r )dx
J/!: a>x (:r )d:r
J/!: (-d>~dx)al + pxd>x(x))dx
J/!: d>x (x )dx
62
(3.27)
(:3.28)
oJ
=
- [<px(t + t) - <Px(t - t)]
<PX(t+t)-<PX(t-t)
=
- [<p((t + t - J.lx )/ax) - <p((t - t - J.lx )/ax)]
~<p--:-((-t-+-t---J.l:'-x-)--:-/:'-a-x-)---<P-(:'-(t:'-_-t---J.l'-x-)-'-/a-x-')----'-a x
+J.lx
.
+ fL.x(J·29)
And by (3.1) and (3.26),
+R,
where
IRI <
<
Constant
t+
It-f
f
(x
2
-
2xt
+ t 2 )dx
=
So. we have
And
· [rJJx(t-t)-<px(t+t) 2
1lID
ax
f-O+ <Px(t + t) - <Px(t - t)
=
+ J.lx -
· [<bx(t - E)((t - t - J.lx )/a1)
1lID
f-O+
<Px(t
t
]
+ <Px(t + t)((t + t -
+ t) + ¢x(t 63
E)
J.lx )/a1) 2
aJ(
.
+
fir -
t]
t - J1I
=
+ J1I
- t
0
o
The theorem is proved.
Lemma 3.3 Under condition 3.1, theorem 3.8 becomes
X
as
f --t
<px(t-t)-</>x(t+t) 2
[ <PX(t + t) _ <PX(t _ t(X
+ J1x
]
- t
<PYlx(ylt) - </>Ylx(ylt)
(:1)
x [:.:((:}) ai + PX
t] + O(lltW),
-
-
2
+ o(lltll !
0+.
Extend these to general Gx (') and g(x) = g((x - p)ja).
\Vhat about
3.2.7
l
b
<p'(x)dx=</>(a)-¢J(b)
?
Conditional quantile from general cdf with uniform
measure error
In this section, we consider general cdf of (Y, X) and uniform measurement error. Let
y. X ,...., Gyx(y. x). X '" G x (') and T = X
+ U where U ,...., U( -t. t) and X
and Care
assumed to be independent. Then, under non-differential ME assumption. we haw
GT(t)
=
j
t+€
t-€
1
Gx(u):-du
64
2t
.
gT(t)
I
t+f
t-f
1
gx(u)-du
2E
Jt~ff GYlx(ylx)gx(x)dx
Jt~ff gx(x)dx
GyIT(y!t) =
V\i'ithout
~vlE.
fz(::)
let Z =
=
g(yl::)
If we let VV =
fw(w)
g(ylw)
IX -
gx(xo+z)+gx(xo-z),
[gx(xo
IT =
xol, then we have
+ z )g(Ylxo + z) + gx(xo -
z )g(Ylxo - z)] / 1z(z),
xol, then we have the following results,
gT(XO+W)+gT(XO-W)
[gT(XO
+ w)g(ylxo + w) + gT(XO -
rXO+W+f
[ }XO+W-f GYlx(ylx)gx(x)dx
65
w)g(Ylxo - w)] / fw(w),
rXO-W+f
+ }XO-W-f
]
Gndylx)gx(x)dx
Let
1
+ fB gx(u)du
f4 gx(u)du
X
[i GYlx(ylx)gx(x)dx + kGYlx(ylx)gx(x)dx]
If we do Taylor Expansion, we have
and
lim {){) Ghw(ylw) = O.
e..... O+
f
So, we get
Ghw(ylw) = GYlz(ylw)
+ :f22Ghw(ylw)1
f2j2
+ o(llfW).
e..... O+
Theorem 3,9 Lff (Y. X) ,...., BN, T
= X + U,
U", U( -f, f) and
~v
= IT - xol.
And
X and [' art assumed to be independent. Let Ghw(ylw) be the conditional cdf of Y
given
~.
= w. then
whert
G}'~IT(ylt)
1
=
(fA dlx(x)dx + fa <Px(x)dx)
x (<PYJx(ylxo
+ w)
i
ifJx(x)dx
66
+ <PYlx(ylxo -
w)
k
ifJx(x)dx)
=
4>x(XO
<l>YI.:dylxo
.4*
w)<Px(xo - w)
+
1
=
x - [</J (
+ <P (
-
+ w) + 4>x(XO - w)
+ w)</Jx(xo + w) + <l>Ylx(ylxo -
y-a(Xo+W)-b)(a) f ,
0"1
0"1
y-a(Xo-W)-b) (a)
0"1
0"1
.
JA (-4> (X) + Jix4>X(X)) dx
f
JB (-4>'(x)
+ Jix4>x(x)) dx ]
1
(ox(xo
+ w) + <Px(xo -
w))
, ( Xo
-Jix [<P ( y-a(Xo+W)-b)(a)
<Px
0"1
0" 1
+Q(y-
a(xo - w) - b) (a)
0"1
- ((/)x (Xo + w ) ~ <PX
. (Xo -
0"1
w )) x
6i
+ W)
4>~(xo - w) ]}
+ O(lltW)·
{[4>YI-~(Ylxo + w)a6~~(xo + w)
0(11 W)
t
B*
1
(fA <Px(x)dx
x [<P (
+ ¢J (
+ IB 1Jx(x)dx)
y - a(xo
+ w) -
0"1
b) ( a )
0"1
y - a(xo - w) - b) ( a )
0"1
(xo
+ w)
J 1Jx(x)dx
•
A
{
]
(xo - w) lB </Jx(x)dx
0"1
1
(Ox(Xo
X
[<P (
+ w) + <Px(xo -
w))
y-a(oro+w)-b) (a)
0"1
0"1
+<p ( y-a(Xo-W)-b)
0"1
(a)
0"1
(xo+w)</Jx(xo+w)
(xo-w)</Jx(xo-W)
]
+O(lltW)·
1
(fA <Px(x)dx
A
+ IB 1Jx(x)dx)
= (Xo+W-t,xo+w+t),
B
= (xo-W-t,Xo-w+t)
and pdf of standard normal distribution, ax + band
0"1
and cI>(.) and <p(o) are the cdf
are the conditional mean and
the conditional standard deviation of Y given X = x, !1X and O"x are the marginal
mean and the standard deviation of X
Proof: Let
n.
(I)
"¥YIX
Yx
0
+ b))
= n. (y - (ax
0"1
"¥
where a = PXFO"Y/O"X, b =!1Y - PXyO"y/O"x!1x and
cI>y x(y\x)!
I
=
x=xo+w
0"1
,
= JO"f,-(l - ph-)o Then
cI> (y - a(xo
+ w)
0"1
68
- b)
•
So
=
1
(fA <t>x(x)dx
x
{i [<p (y - a(x:~
_¢
+
+ IB ¢(x)dx)
(y -
a(x:~ w) -
w) - b)
b)
(~) (x _
Xo _ w)
a(x:~ w) -
b)
(~) (x _ Xo _ w) + R] d>X(X)dX}
And
/ (l ¢x(x)dx + k¢x(x)dx)
-
(:J l
+d>(y-a(X:~U')-b) (:J
-
d>x(x)dx
k [<p(y-a(X:~W)-b)
_¢ (y -
A
+ R]
[¢ (y - a(x:~
w) - b)
x¢x(x) dx
kX¢X(X)dX]
/ (l d>x(x)dx + k
¢X(x)dx)
69
-
- [<p (y -
+ </J (y
/(L
-
a(x:~ W) a(x:~ w) -
<Px(x )dx
+
b)
b)
-Ilx [ <P (
/ (<i>x (xo
+
(1)
(
/(L
(-<p'(x) + Ilx<Px(x))dx
(-<i>'(x)
<Px(x )dx )
+ w) + <Px(xo -
(a)
(71
(-<p'(xo-W)+llx<Px(xo-w))dx
(71
(71
+ w) + <Px (xo -
w)) as
y-a(Xo-W)-b) (a)
(71
]
w))
y-(axo+W)-b)(a)
</Jx(x)dx +
•
+ Il X<PX(X))dX]
k
+</J ( y-a(Xo-W)-b)
(71
/ (</>x(xo
(:J L
(;J k
(71
k
<t>x(x)dx)
70
<Px(xo + w)
t -+
0+
(xo - w)
f <t>x(x)dx ]
lB
•
+1> (
y - a(xo - w) -
/ (<Px(xo
IRI < f (x -
JA
0"1
b) (a ) (xo -
+ w) + <i>x(xo -
(xo
0"1
w)1>x(xo - w)
w))
+ w))2dx + f (x JB
71
(xo - w))2dx
]
.
Chapter 4
Multivariate Conditional Quantile
Process
4.1
Multivariate conditional quantile
..
In a multivariate set-up, suppose that (}ii, xd, i = 1,"', n are independent and
identically distributed (i.i.d) random vectors with a (q + I)-dimensional distribution
function (dJ.) II(·).
For every
Zo
E Rq and
Z
E Rq,
p(Zj, zo)
=
D j say for i
= 1,···, n.
\Ve choose a sequence {k n }, such that k n is increasing in nand n -1 J..~n converges to 0
as n
--+ OG.
\Ve denote the order statistics of D j by Dn :1 • ...• D n :n . If the distribution
of D is continuous. ties among the D j may be neglected with probability one.
\Ve define the anti-ranks Sj, i = 1,···, n by letting Dn :j
= Ds ,. for i = 1. .... n.
Thus. (51.···. 5 n ) is a random permutation of the numbers (1,···, n). Consider the
72
.
set of observations:
(lS, . X s. ), i = 1, ... , kn ·
The kn -nearest neighbor (k n - NN) empirical d.L of Y with respect to
Zo
and p(.) is
defined as
k
k;;l
L
I(ls. ~ y), Y E R.
(4.1 )
i=l
The Bhattachatya (1974) lemma yields that the Ys; are conditionally (given the X s,)
independent r.v.'s, although they are not necessarily identically distributed. However.
the very construction of the neighborhood makes the marginal (conditional) distributions of the Ys , in this set nearly identical, a fact that has been established through
contiguity of (conditional) probability measures by Gangopadhyay and Sen (1992).
vVe consider the function in a general mold as
:=:(C) = {e(z) : z E C},
where C is (usually a compact) subset of Rq which includes the essential support of
the distribution of the concomitant variable X.
The following regularity conditions are assumed as Sen (1994):
[A 1] Let F (z), z E Rq be the joint d.f. of the covariate X. \\'e assume that F
admits an absolutely continuous (q-variate) density function j( z), z E Rq. such that
j(z)
> 0, for every z
E C,
(4.2)
and the Hessian ([j2j8z8z')(f(z)) exists for all z E C. Moreover. there exist positiw
and finite numbers
f.
K o and
Q,
such that liz - z* 1\
< t: implies that
II(cPj8z8z')j(z)lz - (8 2 j8z8z')j(z)lz o ll:::; Kollz -
73
z*W"
uniformly in C.
Note that the D j are defined with respect to a pivot
Zo,
and hence, both FD(d) and
fD(d) depend on this pivot zo° We therefore write fD(d) as fD(d; zo). Note that by
definition,
fD(d;zo) =
f
f(z)dv(z;zo), dE R+,
J{Z;p(Z,Zo)=d}
(-L3)
where v(z: zo) stands for the measurement on the segment {z; p(z, zo) = d}.
[A2] \Ve denote the conditional distribution of Yi given Xi =
Zo,
by g(ylzo) and
assume that G(·) has a continuous density g(ylzo) for almost all (y,zo).
(8j8z)g(ylz) and (fJ2j8z8z')g(ylz) exist in a neighborhood of
ZOo
And
The conditional
dJ. and density of }i given D j = d(with respect to the pivot zo), are respectively
given by
GD(y!zo,d) =
{fJ{Z;p(Z,Zo)=d} G(Y1Z)f(Z)dv(z:zo)}/fD(d:z o).
gD(Ylzo, d) =
{J{Z;p(Z,Zo)=d} g(ylz )f(z )dv(z; zo)} / fD(d: zo).
(4.4)
and
where
Zo E
C and y E R.
[A3] \Ve assume that there exists a positive do, such that the first and second order
partial derivatives of gD(Ylzo, d) with respect to d, denoted by gn(') and g'D(.) respectively. exist for almost all Y, Zo and d : 0 < d
~
do, and these are dominated by
some Lebesgue measurable functions. Let
(-1,;3)
(4.6)
~loreover,
(4.7)
where
U3( y)
d ~ do and
is a Lebesgue measurable,
Zo
Q
is a positive number and (4.7) holds for all
E C.
74
[A4] We assume that
I(ylzo,d) = E[{(8j8D)loggD(Yl z o,d)}2ID = d] < x,
uniformly in d : 0 < d ::; do and Zo E C. We may write (for z in a neighborhood of
zo)
j(z)
=
j(zo)
+ (z -
zo)'[(8j8z)j(z)]zo
where
(4.8)
Thus, whenever p{z, zo) = d yield a closed contour which is symmetric around zoo
we ha\'e from (4.3) and (4.8) that for d
j~(d) = j(zo)a(d)
-+
0,
+ (lj2)Ql(d) + o(d2a(d)),
as d -+ 0,
(4.9)
where a( d) is the surface area of the contour {z; p( z, zo) = d} and
Ql(d) = {
(z - zo)'A(zo)(z - zo)dv(z;zo).
(4.10)
J{Z;p(Z.zo)=d}
Note that by the Courant theorem on quadratic forms, (z - zo)'A(zo)(z - zo)jllzzoW ::; a*(X o) = largest characteristic root of A(zo), so that we obtain that
Qdd)ja(d) = O(d1 ), as d -+ O.
For the conditional d.f. G(yjz), we make similar expansion around Zo and
obtain that for small liz - zoll,
G(ylz)
=
G(ylzo)
+ (z -
zo)'[(8j8z)G(ylz)]zo
where B( zo) is the Hessian of G(y Iz) with respect to z at zoo And let us also denote
by
75
From (4.4), (4.9) and (4.11), we have for small d,
where Q2(d) and Q3(d) are defined as in (4.10) with A(zo) being replaced by B(zo)
and C(zo) respectively. Integrating over 0 ::;
d::; r, for small r, we have the average
conditional d.L
=
1
!(zo)v(r)
+ (1/2)Qi(r) + o(Qi(r))
x [f(zo)G(ylzo)v(r)
+(1/2)!(zo)Q;(r)
=
G(ylzo)
+ (1/2)G(ylzo)Q~(r)
+ Q;(r) + o(Q~(r))l
+ {!(zo)Q;(r) + 2Q;(r)} j {2!(zo)v(r) + Q~(r)}
+o((Q~(r))jv(r)),
where
v(r) =
for a(d)dd
and
Qj(r) =
~ote
for Qj(d)dd,
j = 1,2,3, for r > O.
that Q;( r) and v( r) all converges to 0 as r
and obtain that as r
{!(zo)Q;(r)
Thus, for r
--+
--+
--+
0, we may apply L'Hopitafs rule
0,
+ 2Q;(r)} j {2!(zo)v(r) + Q~(r)}
= (r 2j2)h(y, zo)
+ o(r 2).
(-1.12)
O.
(-1.1:3)
76
In a similar manner, we have
(4.14)
In the case of vector-valued concomitant variates, the rate of r n is depend on q.
the number of components. In this context, we may note that for the Euclidean norm.
corresponding to the value of d(> OJ, the volume of the sphere {~; p(~, ~o) = d} is
O( dq ). for q ~ 1. so that we have v( r) = O( r q ) as r --;. O. A similar order holds
for general norms. On the other hand. if we choose a nearest neighborhood of
members close to the point
As such. if we let r n
t'V
~o.
"'n
then under (4.2), we have
n->', for some>.
> 0, we may set
and this naturally put an upper bound for >., namely, l/q. Moreover, the bias term
arising in the conditional quantile estimation, as can be dictated by (4.13), is O(r~).
so that the contribution of the bias term in the asymptotic mean square error of the
estimator is O(r~)
= O(n- 4 >'), while the variance term is
O(k;;l)
= O(nq,\-l).
Hence
we should select >. in such a way that
1 - q>. :::; 4>' I.e. >. ~ l/(q>.),
so that
kn = O(n 4 /(q+4)), as n --;. oc.
If we choose kn = o(n 4 /(4+q)), the contribution of the bias term in the asymptotic
mean square error will be negligible, but the order of the mean square error will be
o( n- 4 /(4+q)) so that we have a somewhat slower rate of convergence.
77
4.2
Asymptotic properties of multivariate conditional quantile through contiguity
•
In this section. we will extend the Bahadur type representation for conditional quantile given by Gangopadhyay and Sen (1993) to multivariate case through contiguity.
Theorem 4.1 For k n lying in (k no , knd where k no
and 0 < a < b < ex:: , as n incerases,
kn
X
I)I(Y:i > ~(zo)) -
(1 - p)]
i=l
+ R n .kn •
(4.1.5)
where
and letting G(·) for G(ylzo) and G D (-) for GD(Ylzo, d), the Y;i are conditionally
independent r.1'. 's defined by
and J(zo) = -h(~, zo) {24j2(zo)g(~)} -1 where h(·,·) is in (4-12).
I\ote that the induced order statistics Ys,'s are conditionally independent given
A = (j( 5 1 ,52 •... ) with }'s, having conditional c.dJ. GD n , (·Izo) = C D( ·Izo. D s ,). Sup-
pose ~nk be the p-th quantile of the random c.d.f. Gnk(Y)
(Ilk)
k
= (11k) Li=l
CD
n,
(·Izo)
=
L7=1 Gni (·); i.e.,
then one can think of
t as an estimator of ~nk.
The difference between G nk (·) and
G( .) of which- ~ is the p-quantile gives rise to the bias term. as well as error of small
magnitude. The difference between G(.) and G(·) provides the empirical process.
which is the main feature of the Bahadur type representation.
78
For every k(::; n); let qnk and Qnk denote the joint conditional density and
distribution of YS 11
of (}Sl"",
•••
1
Ys given X 51' ... , X 5
k
,
k
•
Using the conditional independence
YS k )l we have
k
k
qnk(') = IIgDn,(·lzo) = IIgni(')
i=1
i=1
and
Qnk(') =
k
k
i=1
i=1
II GDn,(·lzo) = II Gni (·).
l"ow let }~'" Y~k be i.i.d. r.v."s with a density and dJ. g(y) and G(y), and let
k
Pnk(') =
II g(.)
k
and Pnk =
i=1
II G(·).
i=1
Thus. qnk is the true conditional density of (Ys 1 ,
••• ,
}Sk)
given (X 51' .... X
5 k ).
while
pnk and Pnk stands for the joint density and distribution of an i.i.d. model. In this
set-up. we may have
Theorem 4.2 Let k = [tn>-] for some "\(0 < ,,\ ::; 4/(q
+ 4))
and for some t(O < a <
t < b < 00). Then under [Al}-[A4}, the densities qnk are contiguous to the densitiEs
pnk a.e. (F)
Lemma 4.1 Let WI,"', W k i.i.d. with distribution H, where k E In(a. b). and let
T/ be the p-quantile of H, i.e. H(T/) = p. Assume that the distribution H is twiCE
differentiable with H'(T/) = h(T/) > O. Let an be a sequence of constants and that
an ,..., n- 2 /(q+4) log n. Then
~T*
n
=
max
sup
kEln(a.b) IW-1)I:S:an
I(Hnk(w) - Hnk(T/)) - (H(i.I.:) - H(T/))I
O(n- 3 /(Q+4)logn) a.s.
where Hnd') is the empirical distribution function of (WI1 ••• , W k ).
79
Proof: As in (Bahadur (1966)), choose bn = n 1 /(q+4) and divide the interval
an. I']
+ an]
into 2bn intervals Jnk,r =
+ r( ani bn ), I'] + (r -
[I']
[I'] -
1)( ani bn )] = Tnk,r. Tnk.r+l].
for r = - bn , ... , bn + 1.
Thus for
I,.i,,'
E Jnk,r,
<
max
max
kEln(a.b) -bn Sr9n
IVnk (Tnk r)1
+
max
max
'kEln(a,b) -bn Sr9n
lank TI,
'
where
<
sup
IH'(w)ln- 3 /(q+4) log n,
IW-1)!San
for -bn
::;
r ::; bn
-
1. Thus
max max ank r = O( n 3 /(q+4) log n).
k
r
'
Finally. using Bernstein's inequality,
P(maxmaxIVnk(Tnkr)1 > Mn- 3 /(q+4)logn)
k
r
'
< LLP(IVnk(Tnk,r)1 >
k
Mn- 3 /(Q+4)logn)
k
< LLexp[-CAI 2 logn]
k
r
2( b - a)n exp[-C M 2 log n],
for large constant C. Thus
for sufficiently large 1\1, which completes the proof.
80
o
Lemma 4.2 Let For B > b/ f(~o) and for large n
Proof: Apply modified version of Lemma 2 of Bhattacharya and Mack (1987).
For B > b/ f(~) and sifficiently large n
put
(1)
= 4/(q
o
+ 4).
and
! g(u)1
K = 3x
uu
u=O
'
where u = FD(d) and d = Fn1(u) = g(u).
Proof: Apply Lemma 2 of Bhattacharya and Gangopadhyay (1988). In scalar case.
~g(
u) I =
1
au
u=O
f(xo + d) + f(xo max
k~n4/(qHJb
IRn k(3)\
= O(n 4/(q+4)-2)
d)
=
I = 2f(xo)
1 .
d=O
O(n-(2-4/(q+4l).
And :3/(q + 4) < 4/(q + 4), 3/(q + 4) < 2 - 4/(q + 4) and (n- 7 / 1 0JIog log n)
o
O(n- 3 /(q+4)). So we get Lemma 4.4.
Lemma 4.4 For r
~
0,
o
Proof: (4.13).
81
Lemma 4.5 For every B, there exist Nand C such that in the sample space of infinite sequences {((d 1 ,Yd,(d2,Y2),···):di >O, Yi real}, D n,[n 4 /(Q+4)bj:::; Bn- 1 / S implies maxkEln(a,b) I~nk - ~I ~ Cn -2/(q+4) for all n ~ N.
Corollary 4.1 For 0
..
<a<b
max I~nk - ~I = O(n- 2/(q+4)) a.s.
kEln(a,b)
Lemma 4.6 For 0 < a, b.
max I~nk - ~ - ,8(O(kjn)21 = O(n- 3/(q+4)). a.s.,
kEln(a,b)
where 3(~) = -h(~) {24P(zo)g(~)} -1
.
Lemma 4.7 Under [.41]-[,44],
D*n
=
Op( n -3/(q+4) log n).
•
Proof: \Ve have shown in lemma 1 that under pnk measure, D~ is O(n- 3/(q+4) log n)
a.s.
~ow.
since qnk is contiguous to Pnk (Theorem 4.2), under qnk measure.
D~
is
o
Lemma 4.8 For an = n -2/(q+4) log n,
P[ max
kEln(a,b)
I~nk - ~I
Now, lemma 4.7 and 4.8 implies
where
82
> an i.o.]
= O.
But
where ~~k lies between ~:k and ~nk and g(.) = (ljk)L:7=lg(·IX s ,). Note that
max I~nk *
kEln{a.b)
-
~I::; kEln(a.b)
max link - ~nkl
+ kEln(a.b)
max I~nk -
~I.
and hence by lemmas 4.6 and 4.8
Consequently. lemmas 4.3 and 4.5 imply
max Ignk(~nk*) - g(OI
kEln(a.b)
+ O(n- 2 /(Q+4) logn),
a.s.
Also. by Hoeffding's inequality we have
Thus. we have
k
ink = ~nk
+ {kg(~)} -1 L[l(Yni > ~) -
{I - Gni(~)}]
+ R nk
(4.16 )
;=1
where
max IRnkl
kEln(a,b)
+ Op(n- 3 /(Q+4) logn).
(4.16) can be expressed as
ink = ~nk
k
+ {kg(~)} -1 L[l(Yn~ > 0 -
where maxkEln(a,b) IRnkl
(1 - p)]
+ Rnb
(4.11)
i=l
= Op(n- 3 /(Q+4lIogn)
and }~*i
= G-1Gni(}'~;)
and G(~)
= p.
:"ow (4.1i) along with lemma 4.7 completes the proof of Theorem 4.1.
\Ve also have following asymptotic normalities for multivariate case. The following theorem is the extension of Theorem N2 by (Bhattacharya and Gangopadhyay
(1988)).
83
Theorem 4.3 Let Tn(t) =
~n,[n4/(q+4)t].
Then for any 0
{n 2/(q+4)[Tn(t) - ~(zo)] - ;3t 2, a :S t :S
< a < t < b<
b} !: {aC
1
(x,
B(t), a :S t :S
b} .
when 0'2 = p(l - p)/g2(~), {B(t), t > O} denote a standard Brownian motion for
.
q ? 2.
Proof: From theorem 1.1, we have
n2/(q+4)(~ -
. / (1 _ )
[n 4/(q+4)t]·
2
1
2
at2n- (q-l)/(q+4) = V p
P C n- /(q+4)
",rni + R(t).
g(O
i=l .
L
0-
where
IF . _
n, -
H
l(Yni > ~(zo)) - (1 - p) 1 < . <
Vp(l-p)
,_z_n,
are iid with mean 0 and variance 1 for each nand R(t) = Op(l). Thus we have. with
0' = Vp(l - p)/g(O,
[n 4/(q+4)t]
n2/(q+4)(~ -~)
- ;3t 2n- 2(q-l)/(q+4) = aC 1 n- 2/(q+4)
L
W ni
+ op(1),
.
i=l
uniformly in a :S t :S b. Now theorem 1, page 452 of Gikhman and Skorokhod (1969)
to see that {n- 2/(q+4) L~4/(q+4)t] Wni , a:S t :S b}
!:
{B(t), a:S t :S b}. This proves
o
the theorem.
It is shown that if for some 8
> 0, k n
= O( n 4 / 5 -
6
),
then the bootstrap method
proposed by Sen and Gangopadhyay (1990) works out neatly and the leading bias
term may be eliminated in the univariate case. We extend their bootstrap method
with kn = O(n 4/(q+4)-c5) for some 8 > 0 in the multivariate case. A bootstrap sample
(}'~*l' .... }'~;U is obtained from the empirical dJ. Gnk defined by (4.1). Thus. given
the induced order statistics
(}~l,""
Ynk ), the Yn"i are conditionally i.i.d.r. v. with
the d.f. Gnk' Let G~k be the bootrap sample d.f. (based on Y;l ... , }:k)' Then the
boot rap estimator is
~~k
[kp]-th order statistic of Y;l" .. , }'~*k
84
The following two theorems can be proven without major changes to their original
ones.
Theorem 4.4 Under [AI} and [A2}, for any k E In(a, b)
I.. ::; [bn 4 /(q+4)-26] =
i~k
kIl,
- ink =
0<a < b<
00,
as n
= {k:
ko = [an 4 /(q+4 l -25
::;
--+ 00,
k
{kg(~o)}-1 L{1(Y~* > ~o) - (1- p)}
+ R~~
;=1
u'here
are i.i.d.r.v. with d.f. G(·) and as n
--+ 00
max IR~~I = O(n- 3 /(q+4)+36/2l ogn )
a.s..
kEln(a,b)
Theorem 4.5 Under the hypothesis of Theorem
{
j.
n 2/(q+4)-6( <"n,[tn4!(qHl-26]
-
j
4.4,
as n
)
<"n,[tn4!(qHl-26] ,
--+
n
t E [a, b]}
Hence. for each t
One can generate a set of M bootstrap samples and from each one. compute
,
the bootstrap estimates, and then use the empirical quantile (for the
the desired confidence interval for
~o.
,
~~k - ~nk)
to set
The asymptotic properties of such bootstrap
intervals would then follow from Theorems 4.4 and 4.5.
4.3
Conditional Independence and Contiguity
In this section. we discuss the conditional independence of Ys,'s and contiguity for
proving theorem 2.
85
Lemma 4.9 Under the continuity assumption of FD(d; zo), the marginal distribution
of D, for every n and almost all (D l
,' .. ,
D n ), YS1 ,"', YSn are conditionally inde-
pendent given X S1 ,"', XS n with conditional cdl's GD S1 (·Izo),···, G DSn (·Izo) respectively.
Proof:
(Lemma 1 of Bhattachatya (1974)) For any d n = (d l , " ' , dn ) no two
coordinates of which are equal, let
fore.
By the continuity assumption,
Dnk = D sk, D n '
}~l"'"
of
}s
k.
uct
i
D ' k =
n
}~n
}'~k
Sk,D n
,
k = 1"", n are defined a.s.
= Ysk. D n ' k = 1,"', n.
and
Hence the conditional joint cdf
given X SI' ... ,XSn is the same as the conditional joint cdf of
1" .. ,n, given X Sk' k = 1"" ,n, which is easily seen to be the prod-
IT;;=l GDs, (·jzo)
:f j
= Sk,d n be the antirank as defined ,Je-
Sk
=
IT7=1 GDnk
due to the independence of }'j and D; for every
0
= 1. .... n.
Consider a sequence {Pn, qn} of simple hypothesis Pn and simple alternatives qn
defined on measure spaces (Xn , An, J1n), n
~
1, respectively.
Definition 4.1 If for any sequence of events {An}, An E An'
holds. we says that the densities qn are contiguous to the densities pn, where dPn
Pndl'n. dQn = qndl'n'
Contiguity implies that any sequence of random variables converging to zero in
Pn -probability converges to zero in Qn -probability, n
~ 00.
Let us denote by gDnl(ylzo) = gni(y) and GDnl(ylzo). Also for every 1.. (".5:. n);
let qnk and Qnk denote the joint conditional density and distribution of }SI' .... }Sk'
gi ven X SI ..... X Sk' Using the conditional independence of (1':SI , .... }Sk ). we ha \'e
k
k
qnk(') = IIgDnl('lzo) = IIgni(')
;=1
i=l
86
•
and
Qnk(o) =
k
k
i=l
i=l
II GDm(·lzo) = II Gni (·).
Now let Y~,', Y~k be i.i.d. r.v.'s with a density and dJ. g(y) and G(y). and let
k
Pnk(')
k
= IIg(·)
and Pnk
i=l
= II G(·).
£=1
Thus, qnk is the true conditional density of (YS1" .. , 15k ) given (X 51"
..•
X
5
k), while
pnk and Pnk stands for the joint density and distribution of an i.i.d. model.
The following regularity conditions are assumed:
[AI] Let F(z). z E Rq be the joint dJ. of the covariate X. \Ve assume that F
admits an absolutely continuous (q-variate) density function j(z). z E Rq. such that
j(z) > 0, for every z E C,
and j'(z) = (8j8z)(f(z)) exists in a neighboorhod of zoo Moreover. there exist
positive and finite numbers
f:,
K o and
Q,
such that liz - z* II <
f:
implies that
11j'(z) - j'(z*)11 ~ Kollz - z*lI·
[A2] Conditional distribution of Yi given Xi =
ZO,
by g(ylzo) and assume that
G(-) has a continuous density g(ylzo) for almost all (y,zo). And (8j8z)g(ylz) and
(8 2 j8z8z')g(ylz) exist in a neighborhood of zOo
[A3] \':Oe assume that there exist
f:
>' 0 and
functions uj(y) and u2(y), such that for all r
[A4] There exists
f:
> 0 such that
87
Q
> 0 and some Lebesgue measurable
~ f:,
uniformly in r : 0 < r <
Eo
Let £(A, B) be a set of continuous linear transformations from a topological
vector space A to another B. and let C be a class of compact subset of A such that
every su bset consisting of a single point belongs to C. For G E A and for every
H E I{ E C, we assume that
T(H)
= T(G + (H -
G))
= T(G) +
JT (G;y,zo)d[H(y) - G(y)] + R (G;G - H).
1
1
where G(·) = G(·lzo) and for any sequence H k EKE C such that IIHk
o.
IRd G: H - G) = o( IIH k - Gil)' uniformly in H k E K; and
11·11
-
Gil
-+
stands for the "sup-
norm". \Ve may refer to Clarke (1986) and Fernholz (1983) for a detailed discussion
in this context. This relates to the so called first order Hadamard-differEntiability of
T(·) at G. and TdG; y. zo) is called the first order compact (or Hadamard) derivative
of T(·) at G; it is normalized that
Similarly. if we assume that
T(H)
=
JT y, zo)d[H(y) - G(y)]
+(1/2) JJT (G; y, y', zo)d[H(y) - G(y)][H(y') - G(y')]
T( G)
+
1(
G;
2
R 2 (G: H - G),
VH EKE C,
where for any sequence H k EKE C such that IIHk
-
Gil
-+
0 and
then T(·) is second order Hadamard differentiable at G and T 2 (G;y.y'.zo) is the
second order compact (or Hadamard) derivative of T(·) at G. vVe may set
88
"
f T (G;y,y',xo)dG(y') = = f T (G;y,y',xo)dG(y)
0
2
[B1] There exist f> 0,
> 0 and A
Q
2
(a.e.).
< 00 such that uniformly in r(O :::; r :::;
If T (G; y, xo) {(8j8r )gr(Ylxo)} A;
If T (G; y, xo) {(8j8r )gr(Ylxo)}
-f T (G; y, xo) {(8j8r )gD(Yl x o)lr=o}
2
dyl :::;
2
dy
1
1
2
1
dyl :::; Ar
f).
Q
•
Lemma 4.10 (a) under [Ai}, [A2} and [Bl}, we have
whErE
and
for some
r*
between 0 and r. Moreover, there exists
that uniformly for all r(O <
If
If
r
< f) for some
Q
< M
Tl(G;y,xo)q(y,xo)dzl
T 1 ( G; y, xo)R(y, r, xo)dzl < AfrO.
l:
l:
r
<
q(y; xo)dy = 0
R(y, r, xo)dy =
89
x) such
>0
(b) If in addition, we assume [A3}, then for 0 <
and
> 0 and M(O < Af <
f
o.
f
(~.18
)
( ~.19)
o
Proof: Same as the proof of Lemma 3.
Now, we define
Ink =
qnk/Pnk
if Pnk > 0
1
if Pnk
o
if Pnk = 0 < qnk.
= qnk = 0
Then along with fashon of LeCam's first lemma (see, e.g. Hajek and Sidak (1967).
page 203- 204), we shall prove the following;
Lemma 4.11 log L nk is asymptotically normal (-1/2J2, J2), so that the densities qnk
a re contiguous to the densities Pnk.
For this purpose, first note that
k
logL nk
= Llog[gndYni)/g(Yni )].
(4.20)
i=l
In the next lemma we will show that the summands in (4.20) are uniformly asymptotically negligible (DAN).
Lemma 4.12 Assume [Al}-[A4}. Then under Pnb
Proof: By Chebychev's inequality
But
E [{gni(Yndg(}'~i)} - 1]2
=
D~i[
+2
J{q2(t, zo)/g(y)} dy + J{J R 2(y, Dni . zo)/g(y)} dy
J
{q(y. zo)R(y, D ni , zo)/ g(y)} dy]
= 0(1).0
90
From this point, we restrict ourselves to the case ,.\ = 4/(q
+ 4).
this section, we will indicate how the proof can be modified when ,.\
At the end of
> 4/ (q
+ 4).
Lemma 4.13 Let
k
Wnk =
22: {[gni(Ynd/g(Yni)]1/2 -I}.
i=l
Assume [A 1}-[A4} hold and the statistics W nk are asymptotically normal ( -1 /2d 2 . d2 )
under Pnk. then under Pnk
and logLnk is asymptotically normal (-1/2d?,d 2).
Proof: Since the UAN condition is satisfied by lemma 2.4, the result follows from
o
LeCam's second lemma.
Lemma 4.14 Assume [Al}-[A4} hold. Then under pnk,
Proof: Denote
and
s(y) = s(yIO) = [g(Y~OW/2.
So under Pnk
2: J{2s(YID~i )s(y) k
,=1
91
2
2s (y)} dy
k
-L
,=1
J(S2(y) + s2(yID;i) - 2S(y)s(yID;i))
dy
-t D~i J{(s(yID;;) - S(y)f /D;i r
dy
,=1
where 0 <
d~i
~ote
< Dni;i = 1,2,'" ,k.
that for some
t
> 0, by assumption [A4]
uniformly for all d(O < d < t). Also,
So,
}l.~ J{(8/8d2)S(Yld)L~I} 2 dy = J{(8/8d2)S(Yld)ld=J 2 dy
= (1/4)
J{l(y,zo)/g(ylzo)}dy,
uniformly i (= 1,2"", k).
Finally, let UnI
< ... < Unn denote the order statistics of a random sample
U'I,''''Cn ) of size n from the uniform distribution on (0,1). Then D ni = h(l'n;}'
where h = F DI . It can cbe seen easily that under assumption [AI] the following is
true:
(i) h(u) = FDI(U) is defined for 0
F D- I =
<
u
<
U.
(ii) h' (u) is continuous at O.
(iii) h(O)
= h"(O) = 0, h'(O) = {2J(zo)} -1.
92
t
for some
t
> 0 as a unique solution of
Let
F;
denote the empirical distribution of U1 , ... , Uk; then
max
15,J5,k
<
IU .- (jjtn /(q+4))1
5
nJ
sup 1Fk'(u)-ul+(1-n- 1 /(q+4)
O<u<l
Op( n- 2 /(q+4)).
So.
By lemma 3.1 and (iii)
k
L
i=l
k
{h(Und}4 =
L
{h'(O)Uni
i=l
+ {h'(tUni ) -
k
L U;i {h'(O) + op(1)}4
i=l
4
=
(h'(O))4
L U;i + op(l).
i=l
Substituting (4.3) in (4.21). we have
k
(2j(ZO))-4 L(jjtn 5 /(q+4))4
i=I
93
+ op(l)
h'(O)} Dnd
4
(4.21)
So. we have
where
<>
Lemma 4.15 Define
k
l~k =
I: D~i {q(Yni , zo)/g(}~d}·
i=1
Assume [Al}-[A4}. Then under Pnk
Proof: F nder
Pnk
I: D~i J[{ (s(Y~iIDni)
k
<
4
- s(Ynd)/ D~i}
i=1
_( 1/2) {q(y, zo)/(g(y)
< 4 ~D~i
where 0 :S
d~i
)1/2}] 2dy
J[(a/ad2)s(Yld)ld~. -
(1/2) {q(y,zo)/(g(y))1/2}f dy
:S D ni . i = 1,2"", k. So we conclude that the last integral converges
to 0 uniformly for all i
= 1,2,··· , k.
<>
94
Proof of the theorem 4.2 : Note that under
k
E(Vnk ) =
LD~i
,=1
and similarly under
J{q(Y,~o)/g(y)}g(y)dy
Pnk,
k
=
LD~i
,=1
Jq(Y,~o)dy
= 0,
Pnk.
E(\l~k)
k
=
L
i=1
D~i
J{q(y, ~o)/g(y)}
dy
So. it follows
:\ow, under
Pnk
And
~ow. we consider the case when k =
:3.1. we ha\'e 2:7=1 D~i
under
= op(1).So,E(Wnk )
---t
[tnA] for 0 < ,\ < 4/(q
0 (i.e., d2
= 0).
+ 4).
By lemma
So it follows easily that
pnk
log Lnk
---t
0 in probability.
Hence. the LeCam first lemma holds through the degenerate normal law. This completes the proof.
4.4
Multiple conditional quantile process
In the multiple multivariate setting (i.e., we have 1 x 1 response vector Y / . and
q x 1 vector of covariates Xi for i-th subject), then we need a extension of map-
ping (Xi, V,)
---t
(Zi. Yd· Consider the transformation (Xi, Vi)
95
---t
(Zi, Yd where
D i = p(X i • zo).
Let D n1
< ... <
D nn be the order statistic corresponding to
D1 , ... , Dn . Let Y n1, ... , Y nn be the induced order statistics. (i.e., Y ni = Y j if
Zni
=
Zj, for j
=
1,"" n.) To define epT(Ylzo), we need a multivariate empir-
ical d.L and corresponding conditional quantile. The i-th element of multivariate
empirical d.L and corresponding k - N N estimator of the conditional quantile are
G~l (y) = k -1
"
•
k
L 1(yJjl :::; y),
Y E R1
j=1
and
where l(Y :::; y). is 1 if Y :::; y. 0 otherwise and i = 1,"', I. So I x 1 multivariate
", - .vS estimator of the
conditional quintile is defined by
tk has component-wise asymptotic normality. But to derive the joint normality for
tk' "ve need to consider the multivariate conditional quantile process which remains
•
as a future study. An estimator of J.t can be obtained by the marginal approach: let
.
xU) be the j-the component of z for j = 1,' .. ,q. And the j-th the component of it
can be obtained by
inf {supp(mPT('lxU)),mps('lx(j)
/lU!
xU)
96
+ 11(j)))} ,
for j = 1,··· .q.
(4.22)
Chapter 5
Numerical Examples
5.1
Introduction
In this chapter, we demonstrate numerical examples of our work. We use p = 1/2,
the median of the response. The following is the summary of the paper by The ARIC
Investigators (1989). Atherosclerosis Risk in Communities (ARIC) is a prospective
investigation of the etiology and natural history of atherosclerosis and the etiology of
clinical atherosclerotic disease in four US communities. In each of four US communities (Forsyth County, North Carolina; Jackson, Mississippi; suburbs of J\linneapolis
~linnesota:
and Washington County, Maryland) 4.000 adults aged 4.5-64 years are ex-
amined twice three years apart. Examinations include ultrasound scanning of cai dtid
artery. In this example, we separately studied the black female and the white female groups. The black female data included the participants with second visits from
all four communities; the white female data included the participants with second
visits from Forsyth County, North Carolina. The response variable is the average
carotid artery far wall thickness(WT) scanned by Beta-mode ultrasound equipment
in mm. The explanatory variable is Body Mass Index (BMI) which is defined by
weight /height 2 (Kg/ m 2 ). The group variable for black females is the hypertension
97
medicine history which is one if the participant has taken hypertension lowering medication in past two weeks, zero otherwise. The group variable for white females is the
smoking status which is zero if the participant is (was) not a smoker. one if the par-
•
ticipant is a current or former smoker. The main interest is to test the bioequivalence
of the carotid artery far wall thickness between the two groups and to estimate the
relative potency.
5.2
ARIC Black females
Since the group variable is the use of the hypertension lowering medication for black
females. the participants were classified as hypertensive or not. In turn. the use of
the hypertensive medicine is determined by the exhibition of hypertension symptoms.
The black female data has sample size 623 and 735 for the hypertension and the
normotension groups, respectively.
f
5.2.1
Inferences at a given point of x
Table 5.1 shows the results of estimating the conditional median carotid artery
far wall thickness conditioned on the mean BMI of the blacks (30.162 Kgjm 2 ). All
the variance estimators are calculated by the bootstrap method of 200 iterations.
The
b~N
estimator is shown to be asymptotically unbiased, but it has a much larger
variance than the NN estimator does. The 95 % confidence interval of the difference
between two medians of the bNN estimators includes zero; therefore. we cannot say
that the conditional medians are different between the hypertension and the normotension groups by 95% significance level. The next two lines show the inference
on the ratio. The lower and upper bounds of the 95% C.l. of the ratio by Fieller's
theorem are not in the range of
±
20 %. From the viewpoint of bioequivalence. the
bioequivalence is not acceptable based on the C.l. of the ratio. Estimated variances
98
Table 5.1: Estimated conditional median carotid artery far wall thickness (mm) conditioned on BMI=30.162 Kgjm 2 (black females)
Normotension
Hypertension
623
735
90
101
~1
0.62965
0.66521
(7
0.01553
0.02406
95% C.l.
(0.59921,0.66009)
(0.61805,0.71237)
n
kn 1=
n4/5-26
95% C.l. of the diff.
(-0.02057,0.09169 )
ratio
1.06
Fieller's C.l.(95%)
(0.53,1.96)
NN
NBC
,-
1-')
196
6
0.64124
0.68388
(7
0.006596
0.006399
6-6
0.01159
0.01867
0.00958
0.01207
(0.61878,0.64454 )
(0.6.5927.0.6843.5 )
kn =
n 4/ 5
bias correction on
n
9.5% C.l.
KBC
95% C.l. of the diff.
(0.02214,0.05816 )
ratio
1.06
Fieller's C.l.(95%)
(0.75,1.52)
bias correction on kn
0.01870
0.01098
95% C.l.
(0.60961.0.63.547)
(0.66036.0.68.544 )
95% C.l. of the diff.
(0.03235,0.06837 )
ratio
1.08
Fieller's C.l.(95%)
(0.76,1.5.5 )
99
Table 5.2: Jackknife bias correction on kn sample (black females)
Normotension
p
kn
=n
4 5
/ =172
Hypertension
kn
=n
4 5
/ =
0.90
0.00000
0.00000
0.80
0.01981
-0.00133
0.70
0.01527
-0.02294
0.60
0.01811
-0.00075
0..50
0.01823
0.01404
OAO
0.01589
0.02483
0.30
0.02055
0.02292
0.20
0.01424
0.02173
0.10
0.02751
0.02896
mean
0.01870
0.01098
196
from the NN method are much smaller, but the median estimators are theoretically
biased. The differences between the bNN and the NN estimator for the hypertension
and the normotension groups are 0.01159 and 0.01867. Jackknife bias corrections
(KBC (bias correction on the k n samples) and NBC (bias correction on the whole
n data)) were employed. Tables 5.2 and 5.3 show the estimated bias correction in
terms of p (sub-sampling rate). There is no standard way to choose a particular p.
so we tried several possible cases and used the mean for our correction. Csing the
KBC method. we do not need to sort the data for every p: however. using the .\"BC
method. the data does need to be sorted for every subsample. since the subsamples
are changing for each different p. Therefore. computationally KBC is less intensi\'e
than .\"BC. but the estimated bias from the NBC approach looks more stable than
the bias from KBC. The 95% C.L's of the differences between the two medians do
not include zero for both bias correction methods. Therefore. the results of the .\".\"
100
•
Table 5.3: Jackknife bias correction on the whole n data (black females)
#
Normotension
Hypertensi on
n =623
n =73.5
p
pn
kpn
bias
pn
kpn
bias
10
0.1000
62.3
27
0.02517
72.3
31
0.01740
9
0.1111
69.2
30
0.00402
80.3
33
0.01586
8
0.1250
77.9
33
0.01081
90.3
37
0.00702
I
0.1429
89.0
36
0.00757
103.3
41
0.02390
6
0.1667
103.8
41
0.00449
120.5
46
0.00586
5
0.2000
124.6
48
0.01069
144.6
53
0.0079:3
4
0.2500
155.8
57
0.00432
180.8
64
0.006.).5
of gps
mean
0.00958
0.01207
method with bias correction show that the median carotid artery far wall thickness of
hypertensive patients is significantly thicker than that of normotensive people after
adjusting for BML The Fieller's C.L's of both the 8NN and the NN methods are
somewhat larger than ± 20%; the bioequivalence is not acceptable from the results.
If we increase the sample size, we may be able to conclude the bioequivalence.
5.2.2
Inferences for the processes
To test the bioequivalence between the two groups and to estimate the relative
potency. we assume the bioequivalence model (parallelism of the two response curves)
which is:
Ho : ~T(x) - e(x
+ p)
= 0 for all x E .1:'.
(5.1 )
\Ve need to check the parallelism before doing any statistical inferences based on the
assumption (.5.1). Figure 5.1 shows the estimated conditional median curves over
101
co
co
o
co
CO
o
Hypertension
~
i6~
oCco.
.;:; 0
'6
C
o
()
/
/
....
.......... .....
.....
..... .....0-. ..... .....
.....
..... .....
'~-----1
/
C\l
/
co
o
Normotension
.... - - - - - .... -
-_
.
/
/
/
/
o
co
0L..-.;-------.----.-----r-----,-------,..-----,.---J
15
20
25
30
35
40
45
8MI
•
Figure .5.1: Estimated conditional median carotid artery far wall thickness (mm)
conditioned on BMI (Kgjm 2 ) for two groupsl J.l = 0 (black females)
B~n.
and we decided that there is no violation of the parallelism. 'NT curve of the
hypertension group is greater than that of the normotension group for all BM1"s. To
estimate J.l, we discussed the mini-max rule in chapter 1. Figure 5.2 shows
sup I~~(x) - ~~(x
rEX
+ J.l)1
as a function of J.l. From the figure. we estimate J.l is 10 which minimizes (;">.2). i.e. the
''''T curve of hypertension patients is shifted to the left by 10 units of
to the 'WT curve of the normotension group.
102
B~n
compared
O~-------------------------,
r-o
o
10
(0
o
o
~O
00(0
II ~
x 0
~
Q)
o>
10
10
Q)O
(J
c:
.
0
«l
iii
.- 0
'010
0.0
::::l
•
1Jl0
10
~
o
o
o
~
o
O~-----~------,.-----~------r-J
o
-10
·20
10
20
mu
Figure 5.2: supxEX
It;;(x) - t;,(x + J.l)1
vs. J.l (black females)
To test (.5.1) when J.l = 10, as we discussed in section 2.6.
for x =
(Xl,'"
, I m )'
as n
---+ 00.
We reject Ho if
(.5.:3 )
\vhere
~n
is the variance matrix of
[e~(x) - eS(x + JL)]
which is asymptotically a
diagonal matrix. where \. 2(m, 1 - a) is the upper a percentile from \2 distribution
103
Q)
<D
o
<D
Hypertension
~
o
~
~
«i v
C<D
.Q
.': 0
.
...
"0
C
o
~....
()
/
..... /
.... .... " "
.......
" ""
/
/
Normotension
~------~-----~------~
/
C\l
/
<D
o
/
/
/
.... /
o
<D
O'----,-----,------r------,----..,.....----r---------r----'
15
20
25
30
35
40
45
8MI
Figure 5.3: Estimated conditional median carotid artery far wall thickness (mm)
conditioned on BMI (Kgjm 2 ) for two groups, J-l
10 (estimated relatiye potency)
(black females)
with m dJ. Therefore, we employ the following
m
Qd = L)t~(xd
- t;(x + J-l))2o-;;\
i=l
where
\Ve use
~'
= (1.3,20,25,30,3.3,40,45) with 200 bootstrap iterations to estimate
104
~n
and
CJjj.
\Ve have Q
=
20.36 with p-value 0.005 and Qd
=
18.87 with p
=
0.009.
Therefore, we reject the hypothesis (.5.1) which means the difference between the
two WT curves are statistically significant. The difference Q - Qd = 1.:30 can be
interpreted as the increase of the X2 due to the non-zero off-diagonal elements in
i: n .
Since Q- Qd is small, we can say that the independence assumption of the conditional
quantile over x E X is not violated.
The difference of the mean WT's between the two groups is 0.04974. \Ve subtract the difference from the hypertension group and apply the test again. The values
Q = ·).88 with p
= 0.55 and
Qd
= 5.56 with p = 0.59 were obtained.
Our conclusions
are as follows: 1) the WT curve of the hypertension group is significantly higher
than that of the normotension group for the entire range of BMI, 2) but hypertension
effect-adjusted WT curves are equivalent for the two groups; WT curves have the
same shape for the two groups.
5.3
ARIC White females
Using white females, we compared the two conditional median WT curves for the
smokers and the non-smokers conditioned on BMI and evaluated the bioequivalence
between the two curves.
5.3.1
Statistical inferences
Since Tables 5.4 to 5.6 show the similar results as the black females, we omit the
details of interpretations for these tables. From Figure 5.4, \ve estimate 11 as zero.
and Figure 5.5 shows the median curves for smokers and non-smokers.
\Ve notice two things from Figure 5.5.
First, the curves cross. clearly. the
parallel assumption is violated where BMI is approximately 17. Furthermore. the
10.5
Table .5.4: Estimated conditional median carotid artery far wall thickness (mm) conditioned on BMI=25.173 Kg/m 2 (white females)
Non Smoker
Smoker
651
679
93
96
~l
0.61823
0.63371
(7
0.01694
0.01315
9.5% c.l.
(0.58503,0.65143)
(0.60793,0.6.5947)
n
kn = n 4 / 5 -
9.5% C.l. of the diff.
(-0.02655,0.057.5 1)
ratio
1.03
Fieller's C.l.(95%)
(0.59,1.88)
kn = n 4 / 5
178
184
6
0.61518
0.62432
(7
0.004833
0.004388
-0.00305
-0.00939
bias correction on k n
-0.00485
-0.0071:3
95%C.I.
(0.61056,0.62950 )
(0.62285.0.6400.5 )
NN
~2
KBC
NBC
25
-
~l
9.5% C.l. of the diff.
(-0.00137,0.02422)
ratio
1.02
Fidler's C.l.(95%)
(0.75,1.39)
bias correction on n
-0.0086.5
-0.004:30
95%C.I.
(0.61436,0.63300)
(0.62002.0.6:3722 )
9.5% C.l. of the diff.
(-0.00800,0.017.59 )
ratio
1.01
Fieller's C.I.(9.5%)
(0.74.1.37)
106
Table 5.5: Jackknife bias correction on kn sample (white females)
Non Smoker
p
kn
Smoker
= n 4 /.5=178
kn
= n 4 /.5=
0.95
0.0000
-0.01036
0.90
0.00000
-0.00532
0.80
0.00000
-0.00281
0.70
-0.00.571
-0.00198
0.60
-0.00477
-0.00158
0.50
-0.00407
-0.02070
mean
-0.00485
-0.00713
184
Table 5.6: Jackknife bias correction on the whole n data (white females)
#
Non Smoker
Smoker
n =651
n =679
p
pn
kpn
bias
pn
kpn
bias
10
0.1000
65.1
28
-0.01040
67.9
29
-0.00.519
9
0.1111
72.3
31
-0.01210
75.4
:32
-0.003.59
8
0.1250
81.4
34
-0.00993
84.8
35
-0.00698
7
0.1429
93.0
38
-0.00549
97.0
:39
-0.00:3:37
6
0.1667
108.5
42
-0.00697
113..5
44
0.0019:3
.)
0.2000
130.2
49
-0.0067.5
135.8
.51
-0.00:389
4
0.2.500
162.8
.59
-0.00891
169.8
61
-0.00902
of gps
mean
-0.00958
107
-0.01201
t--
o
o
(0
o
o
~
CO
II
LO
x 0
05 0
>
o
Ql
()
C "=t
(150
'Uio
=0
c.
::l
(fl
(\")
o
o
C\J
•
o
o
-20
-10
o
10
20
mu
Figure 5.4: sUPxEX I~~(x) - i;,(x
+ J-l)1
vs. J-l (white females)
curves are flat for large values of BMI, say, greater than 40. These two issues are
related to the kn samples for given BMI; BMI has a bell-shape distribution. Therefore.
the data points are dense in the middle of the range and sparse at both ends. To
calculate the conditional median conditioned on BMI=lO for non-smokers. the range
of k' n sample is (16.6.21.1). To calculate the conditional median at
B~H=10.
we use
the data where the BMI range does not cover the point of BMI=10. Consequently.
we have too few data points at these extreme points. and the inferences at these
points reflect the trend for the mid-ranged data points. Table 5.7 shows the
108
B~II
Smoker
_Non-Smoker
/
---------
/
/
I~C\J
/
/
/
CD
/
Clio
c::
/
o
/
~
/
:0
c::
/
o
/
()
/
/
o
CD
/
o
/
,,
co
It)
/
,
/
/
,,
/
',I
/
/
o'---;------,----..,...---,...------r-----,,..-------,----'
15
20
25
30
8MI
35
40
45
Figure .5":): Estimated conditional median carotid artery far wall thickness (mOl)
conditioned on BMI (Kgjm 2 ) for two groups: fl = 0 (white females)
ranges for kn samples at each given point of BMI. And Figure 5.6 is the diagram for
the Table 5.7. Ideally, the ranges should be exclusive, which is true in the middle of
the range. If the ranges are exclusive, then the asymptotic independence theory will
work well. In Figure .5.6. BMI ranges for BMI=lO and 15 and the same and the range
for
B~1I=40.
4.5 and 50 are the same. vVe conclude that we cannot make statistical
109
inferences for those points due to the lack of data. We restrict our analysis for BMI
bigger than or equal to 20 and less than or equal to 35.
From Figure 5.5, the WT curve of the smoking group is higher than that of
the non-smoking group. The difference is small for thin people, but the difference for
heaYy people is much larger. Moreover, WT increases as BMI increases for the two
(x) - s (x + J.l) I vs. J.l. Our estimator of the
groups. Figure 5.4 shows the supxEX
ItT
t
relative potency is zero. We use :e = (20,22.5,25,27.5,30,32.5,35) for the test of the
fundamental assumption, and obtain Q = 2.74 with p = 0.91 and Qd = 2.7:3 with
p = 0.91.
Our conclusions of the 2nd example are these: 1) \VT increases as BMI increases
for the smokers and the non-smokers, 2) WT of the smoking group is higher than that
of the non-smoking group, 3) the difference of WT between the two groups is larger for
heaYy people than for thin people, 4) but the difference in WT between the smokers
and non-smokers is not statistically significant.
110
Table ·5.7: BMI range of kn sample at each point of BMI (white females)
given BMI( = xo)
Non Smoker
Smoker
10
(16.6,21.1 )
(13.7,20.7)
15
(16.6,21.1 )
(13.7,20.7)
20
(18.6,21.4)
(19.0,21.1)
22.5
(21.8,23.2)
(21.9,23.1 )
25
(24.3,25.7)
(24.2,25.9)
30
(28.1,31.9)
(28.0,32.0 )
35
(30.0,39.5 )
(29.3,40.1)
40
(30.2,48.3)
(29 ..5,43.6)
45
(30.2,48.3)
(29..5,43.6)
50
(30.2,48.3)
(29.5,43.6)
111
oL()
45 ----------------.!--..:
40
o
v
c
t
_
35 -----..!-----
'8-
~o
30--!...--
ccC'l
c
Q)
>
25_
'51
t_
o
C\I
.15
t
o....
20
30
40
50
8MI Range
Figure ,5.6: BMI range of k n NN sample at each point of BMI (white females): solid
line~
smoker, dotted line: non-smoker
112
Chapter 6
Summary and further research
In this paper. we propose a conditional quantile process by using the k-'Y.;I:. estimator.
We have shown that our process is asymptotically a Gaussian process which is fully
characterized by pointwise asymptotic normality and covariance structures for only
shrinking neighborhood of fixed points. A Kolmogorov type estimator of the relative
potency was proposed. We have the same asymptotic normality of the conditional
quantile under the small measurement errors. The difference of the quantiles bet\veen
the two cdrs (one without measurement error and one with measurement error) is
the order of
O(!!EW) under the uniform measurement error
(U( -€, El). \\le expanded
our results to the multivariate case by contiguity arguments. Finally. we applied our
method to ARIC data.
\Ve have several questions which may become future research topics. There may
be more than two responses in which we need to consider the correlation structure
among the responses. \Ve may want to estimate the effect of measurement errors on
the quantile which was shown to be the order of O( IIEW). To test the fundamental
assumption. we can try to obtain the distribution of supxo,I{;VT(.r) - H's(.r
+ 1')1
which would make our test of the fundamental assumption of the bioequivalence
models very simple. The Kolmogorov type estimator of the relative potency ma\'
113
possess some appealing properties such as asymptotic normality or consistency. One
interesting question is how to select k-NN samples for the processes. In this paper. we
select
J' 's
by the same increment over the range of interest. VIle may select exclusive
samples to guarantee independence among them. or we may allow some dependency
by selecting not totally exclusive samples.
114
Bibliography
Bahadur. R. R. (1966). A note on quantiles in large samples, The Annals of ,\fathematical Statistics 37: .577-580.
Bhattacharya. P. K. and Gangopadhyay, A. K. (1988). Kernel and nearest neighbor
estimation of a conditional quantile. Technical report series of the intercolltge
division of statistics Technical Report 104, University of California at Dayis.
Bhattacharya. P. K. and Mack Y. P. (1987). Weak convergence of k-NN density and
regression estimators with varying k and applications, The Annals of Statistics
15: 976-994.
Bhattachatya. P. K. (1974).
Convergence of sample paths of normalized sums of
induced order statistics. The Annals of Statistics 2: 1034-1039.
Box. G. E. P. (1950). Problems in the analysis of growth and wear curves. BiomEtrics
6: :362-389.
Clarke. B. R. (1986).
Nonsmooth analysis and Frechet differentiability of
~1
functionals, Prob. Theor. Re!. Fields 73: 197-209.
Cole. J. '1'/. C. and Grizzle, J. E. (1966). Application of multivariate analysis of
variance to repeated measurements experiments. Biometrics 22: 810-828.
Fernholz. L. T. (198:3). ron Mises calculu for statistical functions. Lecture notes in
statistics. 19 edn. New York: Springer.
11.5
Fieller, E. C. (1940). The biological standardization of insulin, Journal of the. Royal
Statistical Society, Supplementary 7 pp. 1-64.
Fieller. E. C. (1944). A fundamental formula in the statistics of bioassay, and some
applications. Q. J. Pharm. Pharmaco. pp. 117-123.
Finney, D. J. (1951). Two new uses of the Behrens-Fisher distribution. Journal of
the. Royal Statistical Society. Ser. B 12: 296-300.
Finney. D. J. (1964).
Statistical Methods in Biological Assay, 2nd edn. London:
Charles Griffin.
Gangopadhyay, A. K. and Sen, P. K. (1992). Contiguity in nonparametric estimation
of a conditional functional, in A. Saleh (ed.), Nonparametric Statistics and related
topics, Amsterdam: North Holland, pp. 141-162.
Gangopadhyay, A. K. and Sen, P. K. (1993). Contiguity in Bahadur-type representation of a conditional quantile and application in conditional quantile process. in
J. K. G. et al. (ed.), Statistics and Probability, a Raghu Rai Bahadur Festschrift,
:"ew Delhi: Wiley Eastern, pp. 219-232.
Gikhman. I. I. and Skorokhod, A. V. (1969). Introduction to the theory of random
prOCfsses, vV.B. Saunders, Philadelphia.
Goyindarajulu, Z. (1988). Statistical Techniques in Bioassay, New York: Karger.
Grizzle. J. E. (1965). The two-period change-over design and its use in clinical trials,
Biometrics 21: 467-480.
Grizzle, J. E. and Allen. D. M. (1969). Analysis of grwoth dose-response cun·es.
Biometrics 25: :357-381.
...
Hajek. J. and Sidak. Z. (1967). Theory of Rank Statistics, New York: Academic Press.
116
Jones. D. A. and Karson, M. J. (1972). On the use of confidence regions to test
hypotheses. Journal of Quality Technology 4: 156-158.
Metzler. C. M. (1974).
Bioavailability-a problem in bioequivalence. Biometrics
30: 309-:317.
Rao. P. V. and Little. R. C. (1976). An estimator of relative potency, Communications
in Statistics A 5: 183-189.
Rao. P. V.. Schuster. E. F. and Little, R. C. (1975). Estimation of shift and center
of symmetry based on Kolmogorov-Smirnov statistics, The Annals of Statistics
3: 862-87:3.
Sen. P. K. (1963). On the estimation of relative potency in dilute (-direct) assays by
distribution-free methods, Biometrics 19: 532-552.
Sen. P. K. (1981). Sequential lVonparametrics, New York: John ·Wiley &, Sons.
Sen. P. K. (1994). Regression quantiles in nonparametric regression, ]\/onparamEtric
Statistics 3: 237-2.53.
Sen. P. K. and Gangopadhyay, A. K. (1990).
Bootstrap confidence intervals for
conditional quantile functions, Sankhyii : The Indian Journal of Statistics
A
52: :346-363.
Sen. P. K. and Singer, J. M. (1993). Large Smaple Methods in Statistics. \"ew York:
Chapman &, Hall.
Shorack. G. R. (1966). Graphical procedures for using distribution-free methods in the
estimation of relative potency in dilution (-direct) assays. Biometrics 22: 610619.
Snee. R. D. (1972). On the analysis of response curve data, Technometrics 14: 47-62.
117
Stute. W. (1986). Conditional empirical processes, The Annals of Statistics 5: .59.564.5.
The ARIC Investigators (1989).
The Atherosclerosis Risk in Comminities Study:
Design and Objectives. American Journal of Epidemiology 129: 68'i-'i02.
\Yallenstein. S. and Fisher, A. C. (1977). The analysis of two repeated measurements
design with appication to clinical trials. Biometrics 33: 261-269.
Westlake. \V. J. (1972). Use of confidence intervals in analysis of comparative bioavailability trials. Juurnal of Pharmaceutical Sciences 62: 1340-1341.
~·estlake.
\\T. J. (1974). The use of balanced incomplete block design in comparative
bioavailability trials, Biometrics 30: 319-327.
Westlake. "V. J. (1979). Statistical aspects of comparative bioavailability trials. Biometrics 35: 273-280.
)·ates. F. (1939). An apparent inconsistency arising from tests of significance based on
fiducial distributions of unknown parameters, Proc. Camb. Phil. Soc. 35: .579591.
118
•
© Copyright 2026 Paperzz