The Annals of Statistics 1997, Vol. 25, No. 1, 212 ] 243 SEMIPARAMETRIC SINGLE INDEX VERSUS FIXED LINK FUNCTION MODELLING1 ARDLE, V. SPOKOINY AND S. SPERLICH BY W. H¨ Humboldt Universitat ¨ and Weierstrass Institute for Applied Analysis and Stochastics Discrete choice models are frequently used in statistical and econometric practice. Standard models such as logit models are based on exact knowledge of the form of the link and linear index function. Semiparametric models avoid possible misspecification but often introduce a computational burden especially when optimization over nonparametric and parametric components are to be done iteratively. It is therefore interesting to decide between approaches. Here we propose a test of semiparametric versus parametric single index modelling. Our procedure allows the Žlinear. index of the semiparametric alternative to be different from that of the parametric hypothesis. The test is proved to be rate-optimal in the sense that it provides Žrate. minimal distance between hypothesis and alternative for a given power function. 1. Introduction. Discrete choice models are frequently used in statistical and econometric applications. Among them, binary response models, such as Probit or Logit regression, dominate the applied literature. A basic hypothesis made there is that the link and the index function have a known form; see McCullagh and Nelder Ž1989.. The fixed form of the link function, for example, by logistic cdf is rarely justified by the context of the observed data but is often motivated by numerical convenience and by reference to ‘‘standard practice,’’ say ‘‘accessible canned software.’’ Recent theoretical and practical studies have questioned this somewhat rigid approach and have proposed a more flexible semiparametric approach. Green and Silverman Ž1994. use the theory of penalizied likelihood to model nonparametric link function with splines. Horowitz Ž1993. gives an excellent survey on single index methods and stresses economic applications. Severini and Staniswalis Ž1994. use kernel methods and keep a fixed link function but allow the index to be of partial linear form. Partial linear models are semiparametric models with a parametric linear and a nonparametric index and have been studied by Rice Ž1986., Speckman Ž1988. and Engle, Granger, Rice and Weiss Ž1986.. These models enhance the class of generalized linear models w McCullagh and Nelder Ž1989.x in several ways. Here we concentrate on one generaliza- Received March 1995; revised June 1996. 1 Research supported by Deutsche Forschungsgemeinschaft SFB 373. AMS 1991 subject classifications. Primary 62G10, 62H40; secondary 62G20, 62P20. Key words and phrases. Semiparametric models, single index model, hypothesis testing. 212 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION 213 tion, the single index models with link functions of unknown nonparametric form but Žlinear. index function. The advantage of this approach is that an interpretable linear single index, a weighted sum of the predictor variables, is still produced. The link function plays, in theoretical justifications of single index models via stochastic utility functions, an important role w Maddala Ž1983.x : it is the cdf of the errors in a latent variable model. Our approach enables us to interpret the results still in terms of a stochastic utility model but enhances it by allowing for an unknown cdf of the errors. Despite the flexibility gained in semiparametric regression modelling, there is still an important gap between theory and practice, namely a device for testing between a parametric and semiparametric alternative. A first Ž1994.. They considered for paper in bridging this gap is Horowitz and Hardle ¨ response Y and predictor X the parametric null hypothesis Ž 1. H0 : Y s F Ž X i u 0 . q « , where xi u denotes the index and F is the fixed and known link function. The semiparametric alternative considered there is that the regression function has the form f Ž xi u 0 . with a nonparametric link function f and the same index xi u 0 as under H0 . The main drawback of that paper is that the index is supposed to be the same under the null and the alternative. The goal of the present paper is to construct a test which has power for a large class of alternatives. We move also to a full semiparametric alternative by considering alternatives of single index type: Ž 2. H1 : Y s f Ž X i b . q « with b possibly different from u 0 . In addition we consider a more general H 0 Ž1994., namely a parametric family Ž Fu , u g Q ., than in Horowitz and Hardle ¨ thereby allowing link function and error distribution to depend on an arbitrary parameter. The situation of our test is illustrated in Figures 1 and 2. The data is a crosssection of 462 records on apprenticeship of the German Social Economic Panel from 1984 to 1992. The dependent variable is an indicator of unemployment, Ž Y s 1 corresponds to ‘‘yes’’.. Explanatory variables are X1 , gross monthly earnings as an apprentice, X 2 , percentage of people apprenticed in a certain occupation and X 3 , unemployment rate in the state the respondent lived in during the year the apprenticeship was completed. The aim of the test is to decide between the logit model and the semiparametric model with unknown link function and possibly different index. In Hardle, Klinke and ¨ ŽHH. test Turlach Ž1995. this hypothesis is tested with the Horowitz]Hardle ¨ by Proenca and Werwatz who also prepared the dataset. They give a more detailed description of the HH test procedure which does not reject. We measure the quality of a test by the value of minimal distance between the regression function under the null and under the alternative which is sufficient to provide the desirable power of testing. The test proposed below is shown to be rate-optimal in this sense. 214 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH FIG. 1. Parametric fitting. The paper is organized as follows. Section 2 contains the main results. The test procedure is described in Section 3. In Section 5 we present a simulation study. The proofs of main results are given in Section 4 ŽTheorem 2.2. and in the Appendix ŽTheorem 2.1.. 2. Main results. We start with a brief historical background of the nonparametric hypothesis testing problem. The problem for the case of a simple hypothesis and univariate nonparametric alternative was considered by Ibragimov and Khasminskii Ž1977. and Ingster Ž1982.. It was shown that the minimax rate for the distance between the null and the alternative set is of the order ny2 srŽ4 sq1. where s is a measure of smoothness. Note that this rate differs from that of an estimation problem where we have nys rŽ2 sq1.. In the multivariate case the corresponding rate changes to ny2 srŽ4 sqd ., as Ingster Ž1993. has shown. The problem of testing a parametric hypothesis versus a nonparametric alternative was discussed also in Hardle and ¨ Mammen Ž1993.. Their results allow the extraction of the above minimax rate. The results of Friedman and Stuetzle Ž1984., Huber Ž1985., Hall Ž1989. and Golubev Ž1992. show that estimation of the function f under Ž2. can be made with the rate corresponding to the univariate case. Below we will see, PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION FIG. 2. 215 Semiparametric fitting. though, that for the problem of hypothesis testing the situation is slightly different. The rate for this additive alternative of single index type differs from that of a univariate alternative Ž d s 1. by an extra log factor. Nevertheless, we have a nearly univariate rate and we can therefore still expect efficiency of the test for practical applications. We will come back to the introductory example in Section 5. Suppose we are given independent observations Ž X i , Yi ., X i g R d , Yi g R 1 , i s 1, . . . , n, that follow the regression Yi s F Ž X i . q « i , i s 1, . . . , n. Ž 3. Here « i s Yi y F Ž X i . are mean zero error variables, E « i s 0, i s 1, . . . , n, with conditional variance si2 s E « i2 < X i , i s 1, . . . , n. Ž 4. EXAMPLE 2.1. As a first example, take the above single index binary choice model. The observed response variables Yi take two values 0, 1 and P Ž Yi s 1 < X i . s F Ž X i . , P Ž Yi s 0 < X i . s 1 y F Ž X i . . In this case, si s F Ž X i . 1 y F Ž X i .4 . 2 216 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH EXAMPLE 2.2. A second example is a nonlinear regression model with unknown transformation. An excellent introduction into nonlinear regression can be found in Huet, Jolivet and Messeau Ž1993.. The model takes the same form as Ž1. but the response Y is not necessarily binary and the variance si2 may be an unknown function of the F Ž X i .’s. Carroll and Ruppert Ž1988. use this kind of error structure to model fan-shaped residual structure. We wish to test the hypothesis H0 that the regression function F Ž x . belongs to a prescribed parametric family w Fu Ž x ., u g Q x , where Q is a subset in a finite-dimensional space R m. This hypothesis is tested versus the semiparametric alternative H1 that the regression function F Ž?. is of the form Ž 5. F Ž x . s f Ž xi b . , where b is a vector in R d with < b < s 1, and f Ž?. is a univariate function. EXAMPLE 2.3. Let the parametric family w Fu Ž x ., u g Q x be of the form 1 Fu Ž x . s Ž 6. 1 q exp Ž yxi u . and let otherwise Ž X, Y . have stochastic structure as in Example 2.1. This form of parametrization leads to a binary choice logit regression model. Probit or complementary log-log models have a different parametrization but still have this single index form. Let F0 be the set of functions w Fu Ž x ., u g Q x and let F1 be a set of alternatives of the form Ž5.. We measure the power of a test wn by its power function on the sets F0 and F1: if wn s 0 then we accept the hypothesis H0 and if wn s 1, then we accept H1. The corresponding first and second type error probabilities are defined as usual: a 0 Ž wn . s sup PF Ž wn s 1 . , Fg F0 a 1 Ž wn . s sup PF Ž wn s 0 . . Fg F1 Here PF means the distributions of observations Ž X i , Yi . given the regression function F Ž?.. When there is no risk of confusion, we write P instead of PF . Our goal is to construct a test wn that has power over a wide class of alternatives. The assumptions needed are made precise below. We start with assumptions on the error distribution. ŽE1. There are l ) 0 and C e such that E exp l < « i < 4 F C e , i s 1, . . . , n. ŽE2. The conditional distributions of errors « i given X i depend only on values of the regression function F Ž X i ., L Ž « i < X i . s L Ž « i < F Ž X i . . s PF Ž X i . , where Ž Pz . is a prescribed distribution family with one-dimensional parameter z. PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION 217 ŽE3. The variance function s 2 Ž z . s Ew « i2 < F Ž X i . s z x and the fourth central moment function k 4 Ž z . s EwŽ « i2 y E « i2 . 2 < F Ž X i . s z x are bounded away from zero and infinity, that is, 0 - a# F s Ž z . F s * - ` 0 - k# F k Ž z . F k * - ` with some prescribed s #, s *, k#, k * and these functions are Lipschitz: for some positive constants Cs and Ck one has s Ž z . y s Ž z9 . F Cs < z y z9 < , k Ž z . y k Ž z9 . F Ck < z y z9 < . Note that ŽE1. through ŽE3. are obviously fulfilled for the single index model in Example 2.1 and 2.3. Now we present assumptions on the design X. ŽD. The predictor variables X have a design density p Ž x . which is supported on the compact convex set X in R d and is separated from zero and infinity on X . Assumption ŽD. is quite common in nonparametric regression analysis. It is apparently fulfilled for the above example on apprenticeship and youth unemployment. As an alternative to ŽD., one may assume that the design density p is continuous and positive on some compact subset D of X . In the last case, only observations on D are to be taken into account and this allows reducing the problem to the situation described in the condition; see, for example, Hardle, Hall and Ichimura Ž1993.. ¨ We now specify the hypothesis and alternative. ŽH0. The parameter set Q is a compact subset in R m . For some positive constant CQ the following holds: Fu Ž x . y Fu 9 Ž x . F CQ < u y u 9 < ; x g X , u , u 9 g Q. All functions Fu Ž?. belong to the Holder class S d Ž s, L. of functions in R d. ¨ ŽH1. The univariate link function f Ž?. from Ž5. belongs to the Holder class ¨ SŽ s, L.. The function F Ž x . s f Ž xi b . is bounded away from the parametric family F0 , that is, inf 5 F y Fu 5 G c n Ž 7. ugQ with a given c n ) 0. Here 5 F y Fu 5 s H < F Ž x . y Fu Ž x .< 2p Ž x . dx. For the definition of a Holder smoothness class in the context of statistical ¨ nonparametric problems we refer, for example, to Ibragimov and Khasminskii Ž1981.. In the case of an integer s, the class SŽ s, L. can be defined as the class of s-time differentiable functions with the sth derivative bounded in absolute value by L. Assumption ŽH0. is certainly fulfilled for 218 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH Example 2.3 but also in Probit and other generalized linear regression models such as the log linear models. The main results are given below. We compute first the optimal rate of convergence of the distance c n distinguishing the null from the alternative. The second theorem states the existence of an optimal test. The test will be given more explicitly in the next section where we also apply it to the above concrete examples. Theorem 2.2 is proved in Section 4 and the proof of Theorem 2.1 is given in the Appendix. ' THEOREM 2.1. Let c n s Ž aŽ log n .rn. 2 srŽ4 sq1.. If a is small enough then for any sequence of tests wn one has lim inf a 0 Ž wn . q a 1 Ž wn . G 1. nª` THEOREM 2.2. For any constant a* large enough there is a sequence of tests wnU which distinguish consistently the hypothesis H0 from the alternative H1 s H1Ž cUn . with cUn s Ž a*Ž log n .rn. 2 srŽ4 sq1., that is, ' lim a 0 Ž wnU . s 0 nª` and lim a 1 Ž wnU . s 0. nª` 3. The test procedure. Before we describe the test procedure let us introduce some notation. Given functions F Ž x . and GŽ x . we denote by Ž 8. ² F , G: s 1 n n Ý F Ž Xi . G Ž Xi . is1 the scalar product of the functions F and G. We write also ² F : instead of ² F, F : and identify the sequences Ž Yi ., Ž « i . with the functions Y Ž X i . and « Ž X i .. We construct the tests wnU from Theorem 2.2 in several steps. First, we shall do a preliminary parametric pilot estimation F˜0 under the null. Second, we estimate the d-dimensional nonparametric regression F˜1 of EŽ Y < x . by a kernel estimation. These estimators are used in the approximating of expected value and the variance of the proposed test statistics. In the third step, we estimate for each feasible value of b the corresponding link function f under ŽH1. as in Ž2.. Finally we compute the test statistic based on comparison of residuals under H0 and H1. 3.1. Parametric pilot estimation. Let Qn be a grid in the parametric set Q with the step log nr 'n . Put Ž 9. u˜n s arginf ² Y y Fu : s arginf ugQ n u gQ n 1 n Denote also Ž 10. F˜0 Ž ? . s Fu˜nŽ ? . . n Ý is1 2 Yi y Fu Ž X i . . PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION 219 Note that u˜n is not necessarily an efficient estimator under the null since we do not correct for the variance function. REMARK. We define u˜n as a discretized minimum contrast estimator. The discretization of the pilot parametric estimator is a standard device in this kind of problem. It allows replacing in the proof the random point u˜n by a nonrandom point un which is the point of the grid closest to u ; see Lemma 4.8. 3.2. Nonparametric pilot estimation. For the nonparametric estimation of the expected value and the variance of the test statistic, we shall use the Ž1990. or Muller Ž1987.. standard kernel technique; see, for example, Hardle ¨ ¨ More precisely, we use a one-dimensional kernel satisfying the following conditions. ŽK1. ŽK2. ŽK3. ŽK4. ŽK5. K Ž?. is compactly supported. K Ž?. is symmetric. K Ž?. has s continuous derivatives. H K Ž t . dt s 1. H K Ž t . t k dt s 0, k s 1, . . . , s y 1. Recall from ŽH0. and ŽH1. that s denotes the degree of smoothness of the regression function. Note also that ŽK5. ensures that K is orthogonal to polynomials of order 1 to s y 1. For a list of kernels satisfying ŽK1. ] ŽK5., we Ž1987.. A d-dimensional product kernel K 1 is defined as refer to Muller ¨ Ž 11. K 1 Ž u1 , . . . , u d . s d Ł K Ž uj . . js1 Take now Ž 12. h1 s ny1 rŽ2 sqd . , the rate optimal smoothing bandwidth in d-dimensions, and put Ž 13. F˜1 Ž x . s Ý nis1Yi K 1 Ž Ž x y X i . rh1 . Ý nis1 K 1 Ž Ž x y X i . rh1 . . The nonparametric kernel smoother F˜1 is the well-known multidimensional Nadaraya]Watson kernel estimator. 3.3. Estimation under H1. We will use the following bandwidth: Ž 14. hs ž 'log n n / 2r Ž4 sq1 . for estimation of f in the semiparametric model. Note that this bandwidth is different from the one used for the nonparametric pilot estimation. It is also different from the bandwidth rate ny2 rŽ4 sqd . used in obtaining the purely 220 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH nonparametric rate of testing, ny2 srŽ4 sqd . even for d s 1; see Ingster Ž1993.. The additional log factor comes from the semiparametric structure of our alternative and can be viewed as extra payment for reducing the multivariate structure to a univariate one. In practice, the smoothness parameter s is typically unknown and the bandwidth h is to be chosen in a data-driven way. Unfortunately, the theory for adaptive nonparametric testing is not sufficiently developed and discussion of the possibility of a data-driven bandwidth choice lies beyond the scope of this paper. Recent results from Spokoiny Ž1996. indicate that the described testing procedure can be performed in an adaptive way without significant loss of power. Let S d be the unit sphere in R d. Denote by Sn, d a discrete grid in S d with the step bn s h 2 sq2 . Let N be the cardinality of Sn, d Ž 15. N s aSn , d . For each b g Sn, d define Ž 16. Kh, b Ž x . s K ž / xi b h , x g Rd and introduce the smoothing operator Kb with Ž 17. Kb Y Ž X i . s Pb Ž X i . Ý Yj K h , b Ž X i y X j . , j/i where Ž 18. Pb Ž X i . s žÝ j/i K h , b Ž Xi y X j . / y1 . Similarly we define Kb « and Kb F. Note that given b , the values Kb Y estimate f in Ž2.. 3.4. The test statistic. We start with some discussion explaining the applied test statistics. First we treat the case with a simple null hypothesis corresponding to the trivial regression function F ' 0. We assume also that b is known. In this situation, the obvious goodness-of-fit test can be based on ² Kb Y : or better on Tb0 s 2Ž Y, Kb Y : y ² Kb Y :. ŽThe latter test statistic has smaller variance.. Under the null, both these statistics are quadratic forms in the errors « i and, therefore, U-type statistics. Standard calculation for U-statistics show that Tb0 has under the null the expectation of order Ž nh.y1 and the variance of order ny2 hy1 , and, being centered and normalized, it is asymptotically normal; see, for example, Hall Ž1984.. The proper test could be taken in the form w *s1 wŽ Var Tb0 .y1 r2 ŽTb0 y ETb0 . ) xa x where xa is Ž1 ya .quantile of the standard normal law. If a considered model has heteroskedastic noise with variance depending on the unknown regression function Žsee, e.g., Example 2.1 for the binary response model., then the value ETb0 and VarTb0 are to be estimated using a pilot nonparametric estimator. 221 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION In the case of the parametric null, the natural idea could be to construct a pilot parametric estimator F˜0 and then to substruct it from the observation Y, as was done in Hardle and Mammen Ž1993.. This leads to the test based on ¨ 2² Y y F˜0 , Kb Ž Y y Fˆ0 .: y ² Kb Ž Y y F˜0 .:. Unfortunately, this method cannot be applied to the case of the semiparametric alternative since the null hypothesis can have another structure Že.g., with a different index. and hence the value of the bias term in Kb F˜0 can be too large. To bypass this problem, we modify slightly the latter test statistics, changing Kb Ž Y y F˜0 . to Kb Y y F˜0 . Finally, to proceed with an unknown b , we calculate a statistic Tb for each b from the grid Sn, d and the resulting test statistic is based on the supremum of all these quantities. If now b be the value of the index for the unknown alternative, then, at least for one b 9 g Sn, d , the difference b 9 y b will be small and the corresponding Tb 9 will be large enough to detect this alternative. From the other side, considering the maximum of a growing number of test statistics creates a problem with the error probability of the first kind. To provide a prescribed level for this error probability, we have to take a logarithmically growing test level that results in the extra log factor in the rate of testing. The result of Theorem 2.1 shows that this log factor is an unavoidable payment for the choice of an unknown index for the alternative of semiparametric structure. Now we present the test statistics. For each b g Sn, d we calculate Tb as follows: Ž 19. Tb s n'h Ṽb ˜b . 2² Y y F˜0 , Kb Y y F˜0 : y ² Kb Y y F˜0 : q E Here ² ? : is defined by Ž8., h by Ž14., F˜0 by Ž10.. We use the following notation: 1 ˜b s Ý Ý s˜j2 Pb2 Ž X i . K h2, b Ž X i y X j . , E Ž 20. n i j/i where Pb Ž X i . is from Ž18., ž / s˜j2 s s 2 F˜1 Ž X j . , Ž 21. j s 1, . . . , n, the function s 2 Ž?. being defined in the model assumptions of F˜1Ž x . being the nonparametric pilot estimator. Finally, Ṽb2 s h Ý i Ý s˜i2s˜j2 Pb2 Ž X i . 2 K h , b Ž X i y X j . y K hŽ2., b Ž X i , X j . 2 j/i q hÝk ˜i4 i Ý Pb2 Ž X j . K h2, b Ž X i y X j . 2 j/i with k ˜i s k Ž F˜1Ž X i .., i s 1, . . . , n, k Ž?. being from ŽE3. and Ž 22. K hŽ2., b Ž X i , X j . s 1 Pb Ž X i . Ý k/i , j Pb2 Ž X k . K h , b Ž X k y X i . K h , b Ž X k y X j . . ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 222 Put now Ž 23. TnU s sup Tb bgS n, d and wnU s 1 Ž TnU ) '4 log N . . Ž 24. Here 1Ž?. is the indicator function of the corresponding event and N is the cardinality of Sn, d ; see Ž15.. REMARK. Similarly to the case of parametric pilot estimation, we optimize the parameter b over some grid of the unit sphere. This restriction allows simplifying the proof by reducing the random field ŽTb , b g S d . to a family of random variables ŽTb , b g Sn, d .. It may be possible to define the test statistic TnU as the supremum of Tb over the whole sphere S d but the proof in this case seems to be much more involved. 4. Proof of Theorem 2.2. We start with a decomposition of the test statistics Tb . Denote by Bb Ž x . the bias function for the smoothing operator Kb from Ž17.: Ž 25. Bb Ž X i . s Kb F Ž X i . y F Ž X i . , i s 1, . . . , n. Fix some b g Sn, d and F g F0 j F1. LEMMA 4.1. Ž 26. Tb s n'h Ṽb ˜b ² F y F˜0 : y ² Bb : q 2² Kb « , « : y ² Kb « : q E q2² F y F˜0 , « : q 2² Bb , « : y 2² Bb , Kb « : . PROOF. By definition, Y s F q « and therefore Kb Y s Kb F q Kb « s F q Bb q Kb « . Now 2² Y y F˜0 , Kb Y y F˜0 : s 2² F y F˜0 q « , F y F˜0 q Bb q Kb « : s 2² F y F˜0 : q 2² F y F˜0 , Bb : q 2² F y F˜0 , Kb « : q 2² « , F y F˜0 : q 2² « , Bb : q 2² « , Kb « : and ² Kb Y y F˜0 : s ² F y F˜0 q Bb q Kb « : s ² F y F˜0 : q ² Bb : q ² Kb « : q 2² F y F˜0 , Bb : q 2² F y F˜0 , Kb « : q 2² Bb , Kb « : . 223 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION Substituting this in the definition of Tb , we obtain the assertion of the lemma. I The next step is to show that the expansion Ž26. for the statistic Tb can be simplified by discarding lower order terms. Indeed we shall see below that the ˜b and V˜b last three terms are relatively small and can be omitted. The terms E can be replaced by similar expressions Eb and Vb which use ‘‘true’’ values si and k i instead of estimated values s˜i and k ˜i and finally, the parametric estimator u˜n can be replaced by un defined by Ž 27. un s arginf ² F y Fu : ugQ n where F is a ‘‘true’’ regression function from Ž3.. Suppose that all these replacements can be done. Define now TbX s n'h ² F y Fu : y ² Bb : q 2² Kb « , « : y ² Kb « : q Eb n Vb and Eb s 1 n Ý Ý sj2 Pb2 Ž Xi . K h2, b Ž Xi y X j . , i Vb2 s h Ý i j/i Ý si2sj2 Pb2 Ž X i . 2 K h , b Ž X i y X j . y K hŽ2., b Ž X i , X j . 2 j/i q h Ý k i4 i 2 Ý Pb2 Ž X j . K h2, b Ž X k y X j . . j/i Below we show that the tests wnUU based on the statistics TnUU with Ž 28. TnUU s sup TbX bgS n, d have the same asymptotic behavior as wnU . For the moment we only consider the tests wnUU . Note that they are not tests in the usual sense since they use the nonobservable values Eb , Vb , un . Central to our proof is the analysis of the asymptotic behavior of the random variables Ž 29. jb s n'h 2² Kb « , « : y ² Kb « : q Eb . LEMMA 4.2. The following assertions hold Ž 30. E jb s 0, Ž 31. E jb2 s Vb2 , ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 224 and uniformly in F g F0 j F1 , b g Sn, d and t g w ylog n, log n x , P Ž Ž jbrVb . ) t . Ž 32. 1 y FŽ t. ª 1, n ª `, F Ž?. being the standard normal distribution. PROOF. The first two statements are derived by direct calculation. In fact, by definition and Ž22., jb s 2'h Ý « i Pb Ž X i . Ý « j K h , b Ž X i y X j . i y'h j/i Ý Pb2 Ž X i . Ý « j K h , b Ž Xi y X j . i q'h j/i Ý Ý sj2 Pb2 Ž X i . K h2, b Ž X i y X j . i s 'h j/i Ý Ý « i « j Pb Ž X i . i q'h 2 2 K h , b Ž X i y X j . y K hŽ2., b Ž X i , X j . j/i Ý Ý Ž sj2 y « j2 . Pb2 Ž X i . K h2, b Ž X i y X j . . i j/i Since the errors « i are independent and E « i s 0, E « i2 s si2 , we immediately obtain Ž30. and Ž31.. The last statement Ž32. is a particular case of the general central limit theorem for quadratic forms of independent random variables ŽU-statistics. and can be obtained in a standard way by calculation of the corresponding cumulants. We omit the details; see, for example, Hardle ¨ and Mammen Ž1993.. I The assertion Ž32. of Lemma 4.2 straightforwardly implies the following corollary. LEMMA 4.3. Uniformly in F g F0 j F1 one has Ž 33. P ž sup bgS n, d jb Vb ' ) 4 log N / ª 0, n ª `. PROOF. For any t one gets Ž 34. P ž sup bgS n, d jb Vb / )t F Ý bgS n , d P ž jb Vb / ) t F N sup P b gS n , d ž jb Vb / )t . 225 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION However, by Ž32. for n large enough, P ž jb Vb ' ) 4 log N / ' F 2 Ž 1 y F Ž 4 log N . . ½ F exp y 1 2 '4 log N 2 5 s Ny2 that implies Ž33. through Ž34.. I Now we come to the calculation of the error probabilities for the tests wnUU based on TnUU . Under the hypothesis H0 one has F s Fu , u g Q. This does not automatically yield ² F y Fu n : s 0 since un g Qn w see Ž27.x and u can be outside Qn , but the assumptions ŽH0. on the parametric family guarantee that this value is small enough. Let F s Fu , u g Q. Then LEMMA 4.4. ² Fu y Fu : F Cu2 ln2 n n n . PROOF. Let unX s arginf < u y u 9 < . u 9gQ n The definition of the grid Qn provides < u y unX < 2 F Žln2 n.rn. Now from the definition of un and the assumptions ŽH0. on the parametric family we obtain ² Fu y Fu : F ² Fu y Fu X : s n n 1 Ý n Fu Ž X i . y Fu XnŽ X i . i 2 F CQ2 < u y u 9 < 2 2 F CQ2 ln n n . Using this result, we have for F s Fu by Lemma 4.3, ' P Ž TnUU ) 4 log N . F P ž sup bgS n, d jb Vb ' ) 4 log N y Cu2 ln2 n n that is, a 0 Ž wnUU . s sup PF Ž wnUU s 1 . ª 0, Fg F0 n ª `. Next we evaluate the error probability of the second type. LEMMA 4.5. Let F g F1. Then for n large enough ² F y Fu : G c nr2. n n'h / ª 0, n ª `, I 226 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH PROOF. Let F g F1 be fixed and u F s arginf 5 F y Fu 5 . ugQ By the triangle inequality and Lemma 4.4 one has ² F y Fu : F ² F y Fu : q ² Fu y Fu : F ² F y Fu : q Cu2 F n n F n ln2 n n . It remains to check that the inequality 5 F y Fu F 5 G c n implies ² F y Fu F : G c nr2. For n large enough that is obviously the case. I The following lemma is a direct consequence of assumptions ŽE3. and ŽD.. LEMMA 4.6. There exist constants Cp , s * and V * such that Ž 35. Ž 36. Pb Ž X i . K h , b Ž X i y X j . F Cp Pb Ž X j . K h , b Ž X i y X j . ; b , Xi , X j , sup si F s * i and Ž 37. sup Vb F V *. b Recall now that each function F Ž?. from F1 is of the form F Ž x . s f Ž xi b 0 . with some b 0 g S d . As a consequence F Ž?. should be well approximated by the smoothing operator Kb with b coinciding or close to b 0 . More precisely, the following can be stated. LEMMA 4.7. There is a positive constant Cb such that for each F Ž?. g F1 , F Ž x . s f Ž xi b 0 ., Ž 38. ² Bb : F Cb h 2 s n with Ž 39. bn s arginf < b y b 0 < . bgS n, d PROOF. The definition of the grid Sn, d provides < bn y b 0 < F h 2 sq2 . Then, it is well known, for example, from Ibragimov and Khasminskii Ž1981., that for F Ž x . s f Ž xi b 0 . with f g SŽ s, L. one has Ž 40. ² Bb : s ² Kb F y F : F L9h 2 sq1 0 0 227 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION with L9 s L 5 K 5rŽ s y 1.!, but ² Bb : y ² Bb : F ² Bb y Bb : n 0 n 0 F ² Kb 0 F y Kb nF : F 1 n Ý i Pb nŽ X i . Ý F Ž X j . K h , b Ž Xi y X j . n j/i yPb 0Ž X i . Ý F Ž X j . K h , b Ž Xi y X j . 0 j/i . Now using assumptions ŽD. and ŽK1. ] ŽK5. we obtain y1 Py1 b n Ž X i . y Pb 0 Ž X i . F Ž 41. Ý j/i K h , b nŽ X i y X j . y K h , b 0 Ž X i y X j . F C Pb 0Ž X i . < bn y b 0 < h and similarly Ý j/i Ž 42. F Ž X j . K h , b nŽ X i y X j . y F Ž X j . K h , b 0 Ž X i y X j . F C Pb 0Ž X i . < bn y b 0 < h . Putting together Ž41. and Ž42. we conclude that ² Bb : y ² Bb : F C n 0 < bn y b 0 < h F Ch2 sq1 and the lemma follows from Cb s L9 q 1. To complete the proof for the tests wnUU it remains to note that for each F g F1 , TnUU G n'h ² F y Fu : y ² Bb : q n n Vb n jb n Vb n and that if Ž 43. ² F y Fu : G Cb h 2 s q n 2V * n'h '4 log N , with V * from Lemma 4.6, then by Lemma 4.3 we obtain ' P Ž TnUU - 4 log N . F P FP ž ž n'h 2V * Vb n jb n Vb n j '4 log N q Vb n'h b ' ) 4 log N / ª 0, n ' - 4 log N n n ª `. / ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 228 Finally we remark that log N F C log n and the choice of h by Ž8. yields Cb h 2s 2V * q n'h '4 log N F C9 ž 'log n / n 4 sr Ž4 sq1 . s C9h2 s , that is, Ž43. holds true if c n in the definition of the alternative H1 is taken with c n2 G 2C9h 2 s. This completes the proof for the tests wnUU . I Now we explain why the statistics TnUU can be considered in place of TnU . The idea is to show that the difference TnUU y TnU is relatively small Žbeing compared with the test level 4 log N or deviation ² F y Fu n :.. First we treat the preliminary parametric estimator u˜n . Denote for given F g F0 j F1 , ' d n Ž F . s ² F y Fu n : q ln2 n n , un being from Ž27.. LEMMA 4.8. Uniformly in F g F0 j F1 we have for each d ) 0, P Ž 44. ž 1 / / ² F y F˜0 : y ² F y Fu : ) d ª 0, n dn Ž F . P ž 1 dnŽ F . ² F y F˜0 , « : ) d ª 0. PROOF. Let us fix some d ) 0 and some u g Qn . First we show that the probability of the event ½ ž ln2 n ² F y Fu , « : ) d ² F y Fu : q n /5 is asymptotically small. More precisely, we state the following assertion: Ž 45. Ý ugQ n ž ž P ² F y Fu , « : ) d ² F y Fu : q ln2 n n // ª 0, In fact, if we put du2 s E <² F y Fu , « :< 2 , then we have du2 sE s 1 n2 2 1 Ý «i n F Ž X i . y Fu Ž X i . i Ý si2 F Ž X i . y Fu Ž X i . 2 . i Using Lemma 4.6 we have du2 F s U2 n2 Ý i F Ž X i . y Fu Ž X i . 2 s s *2 n ² F y Fu : . n ª `. 229 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION Further, 1 2 and ž ² F y Fu : q ž ln2 n n / ( ² F y Fu : G ž FP FP ž ž 1 du 1 du ² F y Fu , « : ) ² F y Fu , « : ) d du n ( log n d s* n // ln2 n P ² F y Fu , « : ) d ² F y Fu : q ln2 n ² F y Fu : n / / log n . Now we use an estimate of the large deviation probability for the centered and normalized random variables Ž1rdu .² F y Fu , « :; see Lemma 4.11. Indeed, for n large enough, Ý P ugQ n ž 1 du ² F y Fu , « : ) d s* / log n F Ý u gQ n ½ exp y d2 2 s *2 ln2 n 5 F n d exp y Ž d q 1 . log n4 F ny1 which implies Ž45.. Here we used that the cardinality of Qn is of order n d. Let u g Qn be such that Ž 46. ² F y Fu : y ² F y Fu : ) 2 d d n Ž F . . n For d small enough this yields ² F y Fu : y ² F y Fu : ) d Ž ² F y Fu : q ² F y Fu : . . n n Ž 47. Now by definition of u˜n , we obtain by Ž46. and Ž47. u˜n s u 4 : ² Y y Fu : F ² Y y Fu : 4 s ² F y Fu q « : F ² F y Fu q « : 4 s ² F y Fu : y ² F y Fu : F 2² F y Fu , « : q 2² F y Fu n n n ½ : ² F y Fu , « : ) d 2 5 ½ ² F y Fu : j ² F y Fu , « : ) n d 2 n , « :4 Using this relation and Ž45. we deduce ž P ² F y Fu˜n : y ² F y Fu n : ) 2 d d n Ž F . F F ž / / Ý 1 ² F y Fu : y ² F y Fu n : ) 2 d d n Ž F . P Ž u˜n s u . Ý P ² F y Fu , « : ) ugQ n ugQ n ž d 2 / ² F y Fu : ª 0, 5 ² F y Fu : . n n ª `; 230 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH that proves Ž44.. The second statement of the lemma follows directly from Ž45.. I The next step is to show that the last two terms in the expansion Ž29. are vanishing. LEMMA 4.9. Given F let bb s ² Bb : q ln2 n . n Then uniformly in F g F0 j F1 for each d ) 0, the following assertions hold: Ý bgS n, d Ý bgS n, d P Ž ² Bb , « : ) d bb . ª 0, P Ž ² Bb , Kb « : ) d bb . ª 0. REMARK 4.1. The statements of this lemma yield immediately that P Ž ² Bb , « : F d bb ; b g Sn , d . ª 1 and similarly for ² Bb , Kb « :. PROOF. The statements of the lemma are proved in the same manner as in the last part of the proof of Lemma 4.8. For the second statement, we use in addition the fact that C Var² Bb , Kb « : F ² Bb : . Ž 48. n Indeed, using assumptions ŽE1. ] ŽE3. and ŽK1. ] ŽK5., Lemma 4.6 and Jensen’s inequality, we have 2 1 2 E ² Bb , Kb « : s 2 E Ý Bb Ž X i . Pb Ž X i . Ý « j K h , b Ž X i y X j . n i j/i s s 1 n E 2 Ý « j Ý Bb Ž Xi . Pb Ž X i . K h , b Ž Xi y X j . j F F F i/j 1 Ý sj2 Ý Bb Ž X i . Pb Ž X i . K h , b Ž Xi y X j . n2 j s *2 Cp2 Ý Pb2 Ž X j . 2 1 n2 1 n2 j s* 2 i/j 1 n 2 2 Cp2 Ý Ý Bb Ž Xi . K h , b Ž Xi y X j . i/j Ý i / j Bb Ž X i . K h , b Ž X i y X j . j s *2 Cp2 C² Bb : . 2 2 Ýi / j K h , b Ž Xi y X j . I 231 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION ˜b and V˜b estimate Eb and Vb well Next we show that the quantities E enough. For each d ) 0 and uniformly in F g F0 j F1 , LEMMA 4.10. P ž ˜b y Eb < ) sup < E bgS n, d P ž n'h Ṽb sup bgS n, d Vb 1 log n / ª 0, / y 1 ) d ª 0. PROOF. The assumption ŽE3. implies for each j s 1, . . . , n, < sj2 y s˜j2 < F Cs F˜1 Ž X j . y F Ž X j . and hence ˜b y Eb < F <E 1 n Ý Ý < sj2 y s˜j2 < Pb2 Ž Xi . K h2, b Ž Xi y X j . . i j/i Now by the design and kernel properties we derive for each j s 1, . . . , n, C Ý Pb2 Ž X i . K h2, b Ž X i y X j . F nH j/i and using the Cauchy]Schwarz inequality we obtain ˜b y Eb < F <E C n2 h Ý F˜1 Ž X j . y F Ž X j . F i C 1 Ý F˜1 Ž X j . y F Ž X j . n2 h n i 2 1r2 . The pilot estimator F˜1 fulfills with high probability, ² F˜1 y F : F Cny2 srŽ2 sqd . . Hence using the inequality 2 srŽ2 s q d . ) 1rŽ4 s q 1. and the definition of h we arrive at the conclusion that ˜b y Eb < F n'h < E C 'h ny2 rŽ2 sqd . s o ž / 1 log n . I Lemmas 4.8]4.10 together imply the asymptotic equivalence of the tests based on Tb and TbX . We finish the proof of the theorem with a result on probabilities of deviations of centered and normalized sums of independent errors « i over the logarithmic level. The following lemma was already used in the proof of Lemma 4.8. LEMMA 4.11. For each pair of positive constants r, a, the following relation holds uniformly in functions F from the Lipschitz class S d Ž1, L. of functions in R d : n r P Ž j Ž F . ) a log n . ª 0, n ª `, 232 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH where j Ž F. s ²F, « : 'E² F , « : 2 . For the proof, we proceed in a standard way using ŽE1. through ŽE3. and the Chebyshev exponential inequality: for any l, z ) 0, one has P Ž j ) z . F eyl z Ee lj . The details are omitted. 5. A simulation and an application. The purpose of our simulation experiments was to study the quantiles of the test statistic TnU and the power of the test in finite samples. All calculations have been performed in the languages GAUSS and XploRe w Hardle, Klinke, and Turlach Ž1995.x . The ¨ observations were generated according to a binary response model. The explanatory variables were identical independent uniformly distributed on w y1, 1x . We took the parameter u s Ž 11 .1r '2 and considered the functions 1 Ž 49. f0 Ž u. s Ž 50. Ž 51. f 1 Ž u . s f 0 Ž u . q hw 9 Ž u . 1 q expyu f 2 Ž u . s 1 y exp Ž yexp Ž u . . for different 0 - h F 1, where w is the density function of the standard normal distribution. While f 0 is a logit function, f 1 consists of a logit disturbed by a bump ŽFigure 3.. The response Y under H0 was generated such that P Ž Y s 1 < x Tu 0 s u. s f 0 Ž u.. We are thus interested in the hypothesis H0 , H0 : Fu Ž x . s E Y < u Ž x , u . s u s f 0 Ž u . , u g S2 . In a first step we calculated empirically the 90 and 95% quantiles of TnU for n s 100 and 200 observations generated by f 0 . They were used then as rejection boundaries, defined as 4 log N ; see Ž24.. We calculated TnU by optimizing Tb over a grid; see Ž23., with N s 50 gridpoints. As kernel function K we used always the quartic kernel, ' K Ž u. s 15 16 2 Ž 1 y u 2 . 1< u < - 14 . In the second step we analyzed the effect of increasing sample size on the power. In Table 1 we show the power of the test when the data were generated with functions f 1 a , that is, f 1 for h s 0.2, f 1 c , where h s 0.6 and f 2 . In order not to oversmooth, we used the bandwidth h1 s h s 0.5 for n s 100, 200 and h1 s h s 0.25 for n s 350, 500. Although we substituted, for speed reasons in the cases n s 350 and 500, V˜b by V˜u for all b , the power increases very fast with n. Therefore, it could be of interest to compare the power with regard to the bump h in the logit model. In Table 2 we show for n s 200 and 350 the power of the test as a function of h. We see that for h ) 0.4 this test procedure works very well. 233 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION FIG. 3. Solid line: f 0 ; dashed line: f 1 with h s 0.2; pointed line: f 1 with h s 0.6. TABLE 1 Power and rejection boundaries for different alternatives n, h s level rejection boundary f1 a f1 c f2 100, 0.5 5% 10% 4.00 3.35 200, 0.5 5% 10% 3.30 3.25 350, 0.25 5% 10% 3.75 2.90 500, 0.25 5% 10% 3.20 2.76 0.056 0.224 0.316 0.112 0.530 0.946 0.133 0.798 0.995 0.150 0.900 0.995 0.096 0.294 0.376 0.215 0.690 0.991 0.207 0.856 1.000 0.200 0.960 1.000 TABLE 2 Power for different bumps h hs level 5% 10% 5% 10% 5% 10% 5% 10% n, h 200, 0.50 350, 0.25 0.112 0.133 0.215 0.207 0.227 0.321 0.419 0.478 0.530 0.798 0.690 0.856 0.687 0.889 0.801 0.926 0.2 0.4 0.6 1.0 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 234 The last step of the simulation experiment was the study of bandwidth choice. For the sake of simplicity, we set h1 s h as above. First, we always have had to determine numerically the rejection boundaries for the special bandwidth h. Here we observed shrinking boundaries, when h grew from 0.25 u to 2.25. In Figure 4 we plot the bandwidth versus the power of the test with observations generated by f 1 c . Obviously for this kind of alternative we get better power for larger bandwidths. In the introductory example we dealt with youth unemployment. The question is, can we explain the youth unemployment with the aforementioned predictor variables X in a single index model with logit link? In the application to this dataset, we used a slightly modified numerical procedure as described in Proenca and Ritter Ž1995.. Further, we rescaled the explanatory variables of each dimension to w y1, 1x . Since there are three dimensions Ž d s 3. for a sample size of n s 462, we choose the bandwidth h1 large, definitively 1.5, whereas h s 0.3. By Monte Carlo studies described above we U determined the 90 and 95% one-sided quantiles of T462 and got 1.74, respectively 2.38. Then we ran the test for our data and got the statistic value U T462 s 3.076 for b s Žy0.18010, y0.10725, 0.97778.. For the purpose of comparison, in Table 3 we switch the norm of b and set this first component equal to the corresponding one of u , the parameter of the logit fit in Figure 1. FIG. 4. Power function of the test with respect to the bandwidth for function f 1 c . 235 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION TABLE 3 Comparison of u and b Explanatory variables Intercept Earnings as an apprentice Percentage of apprentices divided by employees Unemployed rate u b y2.40996 } y0.07999 y0.07999 y0.17989 y0.04763 0.95113 0.43422 APPENDIX PROOF OF THEOREM 2.1. To simplify our exposition and to emphasize the main idea, we consider the case when the parametric family consists of one point, namely, a zero regression function, and errors « i are independent and standard Gaussian. Moreover, we assume random design with a design density p Ž x . in R d of the form p Ž x . s p 1Ž< x <., where a univariate function p 1Ž?. is compactly supported on w y1, 1x , symmetric, twice continuously differentiable and satisfies p 1Ž t . s 3r4 for < t < F 1r2. The method of the proof is standard; see, for example, Ingster Ž1993.. We replace the minimax problem by a Bayes one, where we consider, instead of the set F1 of alternatives, one Bayes alternative corresponding to a prior n concentrated on F1. We try to choose this prior n in such a way that the likelihood Zn s dPnrdP0 is close to 1 where the measure Pn is the Bayes measure for the prior n and P0 corresponds to the case of zero regression function. The Neyman]Pearson lemma yields that the hypothesis H0 : P s P0 cannot be consistently distinguished from the Bayes alternative Hn : P s Pn and hence from the composite alternative H1: P g F1. In the proof we use a hypercube argument as in Bickel and Ritov Ž1988.. Now we describe the structure of the prior n . Let g Ž?. be some function from the Holder class SŽ s, L., supported on w y1, 1x and satisfying the condi¨ tions Ž 52. Hg Ž t . dt s 0, 5 g 52 s Hg 2 Ž t . dt ) 0, 5 g 5 ` s sup g Ž t . F 1. t Set Ž 53. hs ž ' a log n n / 2r Ž4 sq1 . , where a constant a will be chosen later. Denote by In the partition of the interval w y 12 , 12 x into intervals of length h. Without loss of generality, we assume that the cardinality of the set In coincides with 1rh, Ž 54. a In s 1 h . ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 236 For each interval i g In introduce a function g I Ž t . of the form Ž 55. gI Ž t . s hs g ž t y tI h / , t I being the center of I. Evidently g I Ž?. is supported on I, g I g SŽ s, L. and the following hold for h small enough: Hg Ž t . dt s 0, Ž 56. Hg I 2 I Ž t . dt s h2 sq1 5 g 5 2 . Let now m be a set of binary values m I , I g In 4 , that is, m I s "1. Define a function GmŽ t . with Ž 57. Gm Ž t . s Ý Ig In mI g I Ž t . . This function Gm g SŽ s, L. vanishes outside w y 12 , 12 x and by Ž56., Ž 58. HG 2 I Ž t . dt s Ý ig In Hg 2 I Ž t . dt s 1 h h2 sq1 Hg 2 Ž t . dt s h2 s 5 g 5 2 . Taking into account Ž53. we see that the distance between zero function and each Gm is just of the rate c n2 from Theorems 2.1 and 2.2. Denote by Mn the set of all possible collections m I , I g In 4 with binaries m I s "1, and let mŽ d m . be the uniform measure on Mn . This measure can be represented as the direct product of binary measures m I Ž d m I . with m I Ž m I " 1. s 1r2. Now we pass to the semiparametric model. Let Sn be a grid on the unit sphere S d with the step bn , Ž 59. bn s h1r6 , h being from Ž53.. This means that < b y b 9 < G bn s h1r6 for each b , b 9 g Sn , b / b 9. Below we will use that for some a ) 0, Ž 60. N s aSn 7 n a and Ž 61. h1r2 log n < b y b 9< 2 F h1r2 log n bn2 ª 0, n ª ` ; b , b 9 g Sn , b / b 9. For each b g Sn and each m g Mn , define a multivariate function Gb , mŽ x . on R d with Gb , m Ž x . s Gm Ž xi b . . It is clear that the function Gb , mŽ x . is Holder, Gb , mŽ x . g S d Ž s, L., and by Ž58. ¨ we get Ž 62. HG 2 b, m Ž x . p Ž x . dx s HGm2 Ž xi b . p 1 Ž < x < . dx H s Gm2 Ž t . p 2 Ž t . dt s C0 h 2 s PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION 237 with p 2 Ž t . s Ž drdt . 1Ž xi b F t .p 1Ž< x <. dx and C0 g w 12 5 g 5 2 , 5 g 5 2 x . Finally we take the prior n as the uniform measure on the set of functions Gb , m , b g Sn , m g Mn 4 , and Ž 63. 1 Pn s Ý N bgS n 1 M Ý m g Mn PGb , m . Here M s a Mn s 2 1r h , N being from Ž60.. Denote also Zn s dPnrdP0 and notice that this likelihood can be represented in the form Zn s Ž1rN .Ý b g S nZb with Ž 64. Zb s 1 M Ý mg Mn Zb , m s 1 M Ý mg Mn dPGb , m dP0 . Our goal is to prove that for a small enough in Ž53. one has Ž 65. Zn ª 1 under the measure P0 . We start from a decomposition and an asymptotic expansion for each Zb from Ž64.. For that we need some more notation. Fix some b g Sn and put Ž 66. sb2, I s Ý g I2 Ž Xii b . , I g In , i Ž 67. jb , I s 1 sb , I Ý g I Ž Xii b . « i , I g In . i We see that jb , I are standard normal and independent for different I g In , and Ý Gb2, m Ž X i . s Ý Gm2 Ž X ii b . s Ý i i Ig In sb2, I . Recall that we assume random design and Ž 68. H E Ý Gb2, m Ž X i . s n Gb2, m Ž x . p Ž x . dx s nC0 h2 s . i Similarly for each sb2, I , Ž 69. E sb2, I s nH g I2 Ž xi b . p Ž x . dx s nH g I2 Ž xi b . p 1 Ž < x < . dx s nCI h2 sq1 , where CI does not depend on b and CI g w C0r '2 , '2 C0 x . LEMMA A.1. Zb s Ł ig In where chŽ z . s 12 Ž e z q eyz .. ch Ž sb , I jb , I . exp Ž y 12 sb2, I . , ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 238 PROOF. By Girsanov’s formulas and Ž66. and Ž67., Zb , m s exp ½ ÝG b, m Ž X i . « i y 12 Gb2, m Ž X i . i s exp s ½Ý Ł Ig In Ig In m I sb , I jb , I y 1 2 Ý Ig In 5 sb2, I 5 exp m I sb , I jb , I y 12 sb2, I 4 . Now the lemma’s assertion follows from the direct product structure of the measure mŽ d m .. I Denote also Ž 70. vb2 s Ž 71. zb s 1 2 Ý Ig In 1 vb sb4, I , Ý Ig In sb2, I Ž jb2, I y 1 . . LEMMA A.2. The following statements hold: Ži. E 0 zb s 0. Žii. E 0 zb2 s 1. Žiii. E vb2 F C1 n2 h4 sq1 s C1 log n with C1 F a; see Ž53.. Var vb2 F C2 ny1 log n with some positive C2 . Živ. The random variables zb are asymptotically normal and there exist standard normal random variables z˜b such that ž log n sup E 0 z˜b y zb bgS n log n sup b , b 9gS n / 2 ª 0, <E 0 z˜b z˜b 9 y E 0 zb zb 9 < ª 0. For the proof, the first two statements are obvious. Statement Žiii. follows from Ž69.. Finally, Živ. is the application of the Strassen type invariance principle w see, e.g., Csorgo ¨ ˝ and Revesz ´ ´ Ž1981.x . The next step is the asymptotic expansion for each Zb . LEMMA A.3. The following statements are satisfied uniformly in b g Sn , for each d ) 0: Ž i. Ž ii . ž / P0 Zb y exp vb zb y 12 vb2 4 ) d ª 0, ž ½ P0 Zb y exp vb z˜b y 12 vb2 5 / ) d ª 0. PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION 239 PROOF. The first statement is equivalent to the following one: ž / P0 log Zb y vb zb q 12 vb2 ) d ª 0, but the latter can be obtained using the Taylor expansion for log Zb : log Zb s s Ý Ig In log ch Ž sb , I jb , I . y 12 sb2, I Ý 1 2 Ig In sb2, I Ž jb2, I y 1 . y 1 12 sb4, I jb4, I q O Ž sb6, I jb6, I . and the following asymptotic relations which hold uniformly in b : P0 ž / / Ý sb4, I Ž jb4, I y 3 . ) d ª 0, P0 ž Ig In Ý Ig In sb6, I jb6, I ) d ª 0; for details we refer to Ingster Ž1993.. The second statement of the lemma follows directly from Lemma A.2Žiii.. I Now we arrive at the central point of the proof. Actually we prove that ‘‘submodels’’ corresponding to different b are in some sense asymptotically independent. That is why we have to pay with the extra log term for the choice of ‘‘direction’’ b . LEMMA A.4. There exists a constant C such that for any b , b 9 g Sn , b / b 9, Ž 72. E zb zb 9 F Ch1r2 < b y b 9< 2 . PROOF. Let us fix some b , b 9 for Sn . Denote by r their scalar product, r s Ž b , b 9. . p We will also denote by E the condition expectation given the design points X i , i s 1, . . . , n. Set for I, I9 from In , Ž 73. r s rI , I9 s Epjb , I jb 9, I9 s 1 sb , I sb 9, I9 Ý g I Ž Xii b . g I9 Ž Xii b 9 . . i Obviously, < rI, I9 < F 1. Using the normality of jb , I and jb 9, I9 we calculate easily Ž 74. Ep Ž jb2, I y 1 .Ž jj29, I9 y 1 . s 4 rI2, I9 y 2 rI , I9 . ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 240 One has E 0 zb zb 9 s E 0 s E0 1 vb Ý Ig In 1 1 vb vb 9 sb2, I Ž j bb2, I y 1 . Ý Ý Ig In I9g In 1 Ý vb 9 I9g In sb29, I9 Ž jb29, I9 y 1 . sb2, I sb29, I9 4 rI2, I9 y 2 rI , I9 and using < rI, I9 < F 1 and the Cauchy]Schwarz inequality, we get <E 0 zb zb 9 < F 6E 0 1 1 vb vb 9 F 6 E0 1 vb2 1 Ý Ý Ig In I9g In sb2, I sb29, I9 < rI , I9 < 1r2 Ý Ý vb29 Ig I I9g I n n sb2, I sb29, I9 1r2 Ý Ý E0 Ig In I9g In sb2, I sb29, I9 rI2, I9 . Next, using the Holder inequality, ¨ Ý Ý Ig In I9g In ž sb2, I sb29, I9 s N Ý / ž ž Ý /ž Ig In s Nvb vb 9 1r2 sb4, I Ig In sb2, I / / 1r2 Ý sb49, I9 Ý sb29, I9 F Nvb vb 9 , N I9g In I9g In where, recall, N s a In s hy1 . This inequality and Lemma A.2Žiii. imply <E 0 zb zb 9 < F C Ž log n . y1 N Ž 75. F C Ž log n . y1 Ý Ý Ig In I9g In N 2 sup I, I9g In 1r2 E sb2, I sb29, I9 rI2, I9 E sb2, I sb29, I9 rI2, I9 1r2 with some C ) 0. Let us fix for a moment some I, I9 g In and denote wI2, I9 s E sb2, I sb29, I9 rI2, I9 . Below we will prove the following estimate: Ž 76. wI , I9 F Cnh2 sq3 1 y r2 with some constant C depending only on the design density p . Substituting the estimates in Ž75. and using Ž53., we get that <E 0 zb zb 9 < F Ch1r2 Ž1 y r 2 .y1 . Now the assertion follows from the simple fact that Ž b y b 9. 2 G 1 y r 2 . 241 PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION It remains to check Ž76.. First we note that from Ž73., wI2, I9 s E 2 Ý g I Ž X ii b . g I9 Ž Xii b 9 . i s Ž n2 y n . Hg Ž x 1 i b . g I9 Ž xi b 9 . p Ž x . dx 2 H q n g I2 Ž xi b . g I92 Ž xi b 9 . p Ž x . dx. To simplify the notation, suppose without loss of generality that b s Ž1, 0, . . . , 0., b 9 s Ž r , 1 y r 2 , 0, . . . , 0.. Introduce new variables y 1 , y 2 , . . . , yd by the equality xi b s t I q hy1 , xi b 9 s t I9 q hy 2 , y k s x k , k s 3, . . . , d and denote by T the linear transformation in R d such that y s Tx. It is easy to verify that Tr y 1 s Ž h, 0, . . . , 0., Tr y 2 s Ž hD , hŽ1 y D 2 .y1 r2 , . . . , 0. and <det T < s h 2 Ž1 y D 2 .y1 r2 . We have ' wI2, I9 s Ž n y n. 2 q 2 h 2 sq2 '1 y r Hg Ž y . g Ž y 1 2 nh4 sq2 '1 y r Hg 2 2 2 . p (T Ž y . dy Ž y1 . g 2 Ž y 2 . p (T Ž y . dy. Here p (T means the function on R d defined as the superposition of the linear transformation T and the design density p . Since the function g has the support in w y1, 1x , the latter integrals can be considered only on the domain with < y 1 < F 1 and < y 2 < F 1. For any such point, one has <p (T Ž y 1 , y 2 , y 3 , . . . , yd . y p (T Ž0, 0, y 3 , . . . , yd .< F 5p 9 5Ž< Tr y 1 < q < Tr y 2 <. 5p 9 5Ž2 h q hŽ1 y D 2 .y1 r2 . where 5p 9 5 means the maximum of the norm of the first derivative of p . This implies Hg Ž y . g Ž y 1 2 . p (T Ž y . dy s H g Ž y1 . g Ž y 2 . p (T Ž y1 , y 2 , y 3 , . . . , yd . dy H y g Ž y 1 . g Ž y 2 . p (T Ž 0, 0, y 3 , . . . , yd . dy F Ch Ž 1 y D 2 . y1 r2 . Here we have used the equality H g Ž t . dt s 0 and boundedness of g. Similarly, one estimates the second integral H g 2 Ž y 1 . g 2 Ž y 2 .p (T Ž y . dy and Ž76. follows. Now everything is prepared to complete the proof of Ž65.. The results of Lemmas A.2 and A.3, reduce this assertion to the following one: Ž 77. 1 N Ý bgS n ½ exp vb z˜b y 1 2 5 vb2 y 1 ª 0 ¨ W. HARDLE, V. SPOKOINY AND S. SPERLICH 242 under the measure P0 . It suffices to check that 1 N Ý ž Z˜b y 1 / E0 2 2 ª0 bgS n with ½ 5 Z˜b s exp vb z˜b y 12 vb2 . Using the normality of z˜b and Lemma A.2Žiii., one derives ½ 5 E 0 exp vb z˜b y 12 vb2 y 1 2 s exp vb2 4 F n a . For different, b , b 9 g Sn , denote a s E 0 z˜b z˜b 9. Then by direct calculation, E 0 Z˜b Z˜b 9 s E 0 exp a vb vb 9 4 . The results of Lemmas A.4 and A.2Živ. allow us to obtain ž /ž / E 0 Z˜b b y 1 Z˜b 9 y 1 s E 0 Z˜b Z˜b y 1 F C a log n. Finally, we derive by Ž61., Lemmas A.4 and A.2Žiii., Živ., 1 N E0 2 s F F Ý ž Z˜b y 1 / 2 bgS n 1 N 2 1 N2 1 N ž Ý E 0 Z˜b y 1 Ý exp Ž bgS n bgS n na q 2 vb2 / 2 .q q 1 N2 Ch1r2 log n bn2 1 N2 Ý Ý Ý b gS n b 9gS n b 9/ b Ý b gS n b 9gS n b 9/ b ž /ž E 0 Z˜b y 1 Z˜b 9 y 1 / Ch1r2 log n < b y b 9< 2 ª 0, if a in Ž53. is taken small enough. Acknowledgment. We thank O. Lepski for helpful discussion and comments. REFERENCES BICKEL, P. J. and RITOV, Y. Ž1988.. Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhya Ser. A 50 381]393. CARROLL, R. J. and RUPPERT, D. Ž1988.. Transformation and Weighting in Regression. Chapman and Hall, New York. CSORGO ¨ ˝, M. and R´EVESZ ´ , P. Ž1982.. Strong Approximation in Probability and Statistics. Akademiai Kiado, ´ ´ Budapest. ENGLE, R. F., GRANGER, W. J., RICE, J. and WEISS, A. Ž1986.. Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81 310]320. FRIEDMAN, F. H. and STUETZLE, W. Ž1984.. A projection pursuit regression. J. Amer. Statist. Assoc. 79 599]608. PARAMETRIC VERSUS SEMIPERIMETRIC REGRESSION 243 GOLUBEV, G. Ž1992.. Asymptotic minimax regression estimation in additive model. Problems Inform. Transmission 28 3]15. ŽIn Russian.. GREEN, P. and SILVERMAN, B. W. Ž1994.. The Penalized Likelihood Approach. Chapman and Hall, London. H¨ ARDLE, W. Ž1990.. Applied Nonparametric Regression. Cambridge Univ. Press. ARDLE, W., HALL, P. and ICHIMURA, H. Ž1993.. Optimal smoothing in single index models. Ann. H¨ Statist. 21 157]178. H¨ ARDLE, W., KLINKE, S. and TURLACH, T. A. Ž1995.. XploRe: An Interactive Statistical Computing Environment. Springer, Berlin. H¨ ARDLE, W. and MAMMEN, E. Ž1993.. Comparing nonparametric versus parametric regression fits. Ann. Statist. 21 1926]1947. HALL, P. Ž1984.. Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal. 14 1]16. HALL, P. Ž1989.. On projection pursuit regression. Ann. Statist. 17 573]578. HOROWITZ, J. Ž1993.. Semiparametric and nonparametric estimation of quantal response models. In Handbook of Statistics ŽG. S. Maddala, C. R. Rao and H. D. Vinod, eds.. 45]72. Elsevier, New York. ARDLE, W. Ž1994.. Testing a parametric model against a semiparametric HOROWITZ, J. and H¨ alternative. Econometric Theory 10 821]848. HUBER, P. J. Ž1985.. Projection pursuit. Ann. Statist. 13 435]475. HUET, S., JOLIVET, E. and M´ ESSEAU, A. Ž1993.. La regression Non-lineaire: Methodes et Applications en Biologie. INRA, Paris. IBRAGIMOV, I. A. and KHASMINSKI, R. Z. Ž1977.. One problem of statistical estimation in Gaussian white noise. Sov. Math. Dokl. 236 1351]1354. IBRAGIMOV, I. A. and KHASMINSKI, R. Z. Ž1981.. Statistical Estimation: Asymptotic Theory. Springer, Berlin. INGSTER, YU. I. Ž1982.. Minimax nonparametric detection of signals in white Gaussian noise. Problems Inform. Transmission 18 130]140. INGSTER, YU. I. Ž1993.. Asymptotically minimax hypothesis testing for nonparametric alternatives. I, II, III. Math. Methods Statist. 2 85]114. MADDALA, G. Ž1983.. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge Univ. Press. MCCULLAGH, P. and NELDER, J. A. Ž1989.. Generalized Linear Models 2. Chapman and Hall, London. M¨ ULLER, H. G. Ž1987.. Weighted local regression and kernel methods for nonparametric curve fitting. J. Amer. Statist. Assoc. 82 231]238. PROENCA, I. and RITTER, CHR. Ž1995.. Negative bias in the H-H Statistik. Comput. Statist. RICE, J. A. Ž1986.. Convergence rates for partially splined models. Statist. Probab. Lett. 4 203]208. SEVERINI, T. A. and STANISWALIS, J. G. Ž1994.. Quasi-likelihood estimation in semiparametric models. J. Amer. Statist. Assoc. 89 501]511. SPECKMAN, P. Ž1988.. Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50 413]446. SPOKOINY, V. Ž1996.. Adaptive hypothesis testing using wavelets. Ann. Statist. 24 2477]2498. W. H¨ ARDLE S. SPERLICH INSTITUT FUR ¨ STATITIK UND OKONOMETRIE HUMBOLDT UNIVERSITAT ¨ ZU BERLIN SFB 373, SPANDAUER STR. 1 10178 BERLIN GERMANY E-MAIL: [email protected] V. SPOKOINY WEIERSTRASS INSTITUTE FOR APPLIED ANALYSIS AND STOCHASTICS MOHRENSTR. 39 10117 BERLIN GERMANY
© Copyright 2026 Paperzz