Proc. R. Soc. A (2008) 464, 3175–3192. doi:10.1098/rspa.2008.0235. Published online 12 August 2008.

A law of large numbers for nearest neighbour statistics

By Dafydd Evans*

School of Computer Science, Cardiff University, 5 The Parade, Cardiff CF24 3AA, UK

In practical data analysis, methods based on proximity (near-neighbour) relationships between sample points are important because these relations can be computed in time $O(n \log n)$ as the number of points $n \to \infty$. Associated with such methods are a class of random variables defined to be functions of a given point and its nearest neighbours in the sample. If the sample points are independent and identically distributed, the associated random variables will also be identically distributed but not independent. Despite this, we show that random variables of this type satisfy a strong law of large numbers, in the sense that their sample means converge to their expected values almost surely as the number of sample points $n \to \infty$.

Keywords: nearest neighbours; geometric probability; difference-based methods; noise estimation

*[email protected]

Received 5 June 2008; accepted 11 July 2008. This journal is © 2008 The Royal Society.

1. Introduction

Let $X = X_1, X_2, \dots$ be a sequence of independent and identically distributed random vectors $X_i \in \mathbb{R}^d$, and let $X^n = (X_1, \dots, X_n)$ denote the first $n$ points of the sequence. The common distribution of the $X_i$ will be called the sampling distribution and $X^n$ will be called a sample (of size $n$). Let $n_i(n, k)$ denote the index of the $k$-nearest neighbour of $X_i$ among the points of $X^n = (X_1, \dots, X_n)$, where the distance between any two points is defined with respect to the metric induced by any $l_p$-norm on $\mathbb{R}^d$,

$$\|(a_1, \dots, a_d)\|_p = \begin{cases} \left( \sum_{j=1}^{d} |a_j|^p \right)^{1/p}, & 1 \le p < \infty, \\ \max_{j=1,\dots,d} |a_j|, & p = \infty, \end{cases}$$

and equidistant points are ordered by their indices. Let $h : \mathbb{R}^{d \times (k+1)} \to \mathbb{R}$ be a measurable function. We define the identically distributed random variables

$$h_{i,n}(X) = h(X_i, X_{n_i(n,1)}, \dots, X_{n_i(n,k)}) \quad (i = 1, \dots, n), \tag{1.1}$$

so that $h_{i,n}(X)$ is a function of the sample point $X_i$ and its $k$-nearest neighbours among the points of $X^n$. Let $\mu_n$, $\sigma_n^2$ and $\kappa_n$, respectively, denote the mean, variance and kurtosis of the (identically distributed) random variables $h_{i,n}(X)$, and let $S_n$ and $H_n$, respectively, denote their sum and sample mean,

$$S_n(X) = \sum_{i=1}^{n} h_{i,n}(X) \quad \text{and} \quad H_n(X) = \frac{1}{n} \sum_{i=1}^{n} h_{i,n}(X). \tag{1.2}$$

Our main result is that $H_n$ is a strongly consistent estimator for $\mu_n$ as $n \to \infty$.

Theorem 1.1. If the sampling distribution is continuous and if the kurtosis $\kappa_n$ is uniformly bounded for all $n \in \mathbb{N}$, then

$$\frac{H_n - \mu_n}{\sigma_n} \to 0 \quad \text{a.s. as } n \to \infty. \tag{1.3}$$

Theorem 1.1 shows that the rate at which $H_n$ converges to $\mu_n$ is at least of the order of $O(\sigma_n)$ as $n \to \infty$. Let us define the normalized random variables

$$\bar{h}_{i,n}(X) = \frac{h_{i,n}(X) - \mu_n}{\sigma_n}, \tag{1.4}$$

along with their sum $\bar{S}_n$ and sample mean $\bar{H}_n$,

$$\bar{S}_n(X) = \sum_{i=1}^{n} \bar{h}_{i,n}(X) \quad \text{and} \quad \bar{H}_n(X) = \frac{1}{n} \sum_{i=1}^{n} \bar{h}_{i,n}(X). \tag{1.5}$$

If the sampling distribution is continuous, we show that the variance of $\bar{S}_n$ is of the asymptotic order $O(n)$ as $n \to \infty$.

Theorem 1.2. If the sampling distribution is continuous, then for all $n \ge 16k$,

$$\operatorname{var}(\bar{S}_n) \le 2(n+1)(3 + 8kb)(3 + 64k)(\kappa_n + \kappa_{n+1}), \tag{1.6}$$

where $b(k, d, p) = k \lfloor d\, v_{d,p} \rfloor$ and $v_{d,p}$ is the volume of the unit ball in $(\mathbb{R}^d, \|\cdot\|_p)$.
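To make the definitions (1.1) and (1.2) concrete, the following sketch computes $H_n(X)$ for an arbitrary kernel $h$ using a k-d tree, so the neighbour relations are found in $O(n \log n)$ time as noted above. This is an illustrative sketch rather than code from the paper; the function name `sample_mean_statistic` is an assumption, and ties (a measure-zero event for continuous distributions) are broken by the library rather than by index order.

```python
import numpy as np
from scipy.spatial import cKDTree

def sample_mean_statistic(X, h, k, p=2.0):
    """H_n(X) = (1/n) sum_i h(X_i, X_{n_i(n,1)}, ..., X_{n_i(n,k)}), cf. (1.1)-(1.2).

    X : (n, d) array of sample points.
    h : callable (point, (k, d) array of neighbours, nearest first) -> float.
    k : number of nearest neighbours.
    p : which l_p norm to use (scipy accepts 1 <= p <= np.inf).
    """
    tree = cKDTree(X)
    # Query k+1 neighbours because the nearest neighbour of X_i is X_i itself.
    _, idx = tree.query(X, k=k + 1, p=p)
    values = [h(X[i], X[idx[i, 1:]]) for i in range(len(X))]
    return float(np.mean(values))

# Example: h = distance from X_i to its k-nearest neighbour (see section 2a).
rng = np.random.default_rng(0)
X = rng.random((1000, 2))
print(sample_mean_statistic(X, lambda x, nb: np.linalg.norm(x - nb[-1]), k=3))
```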
To show that $|\bar{H}_n| \to 0$ a.s. as $n \to \infty$, we first show that the expected squared difference $E(\bar{S}_n - \bar{S}_{n+1})^2$ between successive terms of the sequence $\bar{S}_n$ is bounded independently of $n$. We then use the Efron–Stein inequality (Efron & Stein 1981; Steele 1986) to bound the variance $\operatorname{var}(\bar{S}_n)$, which by Chebyshev's inequality yields the weak convergence of $\bar{H}_n$ to zero as $n \to \infty$. Finally, we again use the bound on $E(\bar{S}_n - \bar{S}_{n+1})^2$ to prove strong convergence using a standard argument.

Similar results have previously appeared in the literature. In particular, Penrose & Yukich (2003) obtained a weak law of large numbers for functions of binomial point processes in $\mathbb{R}^d$, then applied the result to a number of different types of random proximity graphs, and a number of functions including the total edge length (under arbitrary weighting) and the number of components in the graph. Wade (2007) applied the general result of Penrose & Yukich (2003) to the total (power-weighted) edge length of several types of nearest neighbour graphs on random point sets in $\mathbb{R}^d$, and gave explicit expressions for the limiting constants. In two recent papers, Penrose (2007) obtained a law of large numbers for a class of marked point processes and gave some applications to noise estimation, while Penrose & Wade (2008) have investigated the asymptotic properties of the random online nearest neighbour graph.

The methods of Penrose & Yukich (2003) depend on the notion of stabilization, which demands that the neighbourhood of a particular vertex is unaffected by changes in the neighbourhood of another vertex located outside some sufficiently large ball (the radius of this ball is called the radius of stabilization). Because we consider only $k$-nearest neighbour graphs, we will instead rely on the standard geometric fact that there exists a finite number $b = b(k, d, p)$ such that for any countable set of points in $(\mathbb{R}^d, \|\cdot\|_p)$, any point of the set can be among the first $k$-nearest neighbours of at most $b$ other points of the set.

Motivated by the need for a multidimensional goodness-of-fit test, Bickel & Breiman (1983) investigated the asymptotic properties of sums of bounded functions of the (first) nearest neighbour distances, and proved a central limit theorem for the random variables $f(X_i)\|X_i - X_{n_i(n,1)}\|$, where $f$ is the (unknown) sampling density. This was extended to $k$-nearest neighbour distances by Penrose (2000), with $k$ being allowed to increase as a fractional power of the number of points $n$.

Functions defined in terms of proximity relations exhibit 'local' dependence, which can often be represented by dependency graphs, first described by Petrovskaya & Leontovitch (1982). For a set of random variables $\{X_i : i \in V\}$, the graph $G(V, E)$ is said to be a dependency graph for the set if for any pair of disjoint sets $A_1, A_2 \subset V$ such that no edge has one endpoint in $A_1$ and the other in $A_2$, the $\sigma$-fields $\sigma\{X_i : i \in A_1\}$ and $\sigma\{X_i : i \in A_2\}$ are mutually independent. The following result is due to Baldi & Rinott (1989).

Theorem 1.3 (Baldi & Rinott 1989). Let $\{Z_i : i \in V\}$ be random variables having a dependency graph $G(V, E)$, and define $S = \sum_{i \in V} Z_i$. Let $D$ denote the maximal degree of $G$ and suppose $|Z_i| \le B$ almost surely. Then

$$\left| P\left( \frac{S - E(S)}{\sqrt{\operatorname{var}(S)}} \le x \right) - \Phi(x) \right| \le 32 \left( 1 + \sqrt{6} \right) \left( \frac{|V|\, D^2 B^3}{\operatorname{var}(S)^{3/2}} \right)^{1/2}, \tag{1.7}$$

where $|V|$ is the cardinality of $V$ and $\Phi(x)$ is the distribution function of the standard normal distribution $N(0, 1)$.
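For a rough sense of scale, the right-hand side of (1.7) is easy to evaluate. The sketch below is illustrative only (not from the paper): the values of $D$ and $B$ and the growth assumption $\operatorname{var}(S) = 0.5n$ are hypothetical. Under that assumption the bound decays like $n^{-1/4}$, and it tends to zero whenever $\operatorname{var}(S) \ge c n^{2/3+\varepsilon}$, which is the condition mentioned in the discussion below.

```python
import math

def baldi_rinott_bound(V, D, B, var_S):
    """Right-hand side of (1.7): 32 (1 + sqrt 6) (|V| D^2 B^3 / var(S)^{3/2})^{1/2}."""
    return 32.0 * (1.0 + math.sqrt(6.0)) * math.sqrt(V * D**2 * B**3 / var_S**1.5)

# Hypothetical inputs: maximal degree D = 12, |Z_i| <= 1, var(S) growing linearly.
for n in (10**4, 10**6, 10**8):
    print(n, baldi_rinott_bound(V=n, D=12, B=1.0, var_S=0.5 * n))
```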
Avram & Bertsimas (1993) applied theorem 1.3 to the length of the $k$-nearest neighbour graph, the Delaunay triangulation and the Voronoi diagram of random point sets. More recently, Penrose & Yukich (2001) used stabilization properties to prove central limit theorems for functions of various types of (random) proximity graphs, while Chen & Shao (2004) obtained central limit theorems for various types of local dependence, and in particular improved on the result of Baldi & Rinott (1989).

In the present context, vertices in the dependency graph corresponding to the random variables $h_{i,n}$ and $h_{j,n}$ are connected by an edge only if they share a common nearest neighbour. By lemma 4.2, the maximal degree of $G$ is therefore bounded independently of the number of points $n$. Under the additional assumption that the $h_{i,n}$ are bounded a.s., if we could show that there exist constants $c, \varepsilon > 0$ such that $\operatorname{var}(S_n) \ge c n^{2/3 + \varepsilon}$, it would then follow that $\bar{S}_n \to N(0, 1)$ in distribution as $n \to \infty$. This is beyond the scope of the present paper.

2. Applications

(a) Nearest neighbour distances

Let $X = X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables taking values in the unit cube $[0,1]^d$. Let $d_{i,n}(X)$ be the distance between the point $X_i$ and its $k$-nearest neighbour in the sample $X^n$, and let $D_n(X)$ denote the sample mean of the $d_{i,n}(X)$,

$$d_{i,n}(X) = \|X_i - X_{n_i(n,k)}\| \quad \text{and} \quad D_n(X) = \frac{1}{n} \sum_{i=1}^{n} d_{i,n}(X). \tag{2.1}$$

Let $\mu_n^{(d)}$, $\sigma_n^{(d)}$ and $\kappa_n^{(d)}$, respectively, denote the mean, standard deviation and kurtosis of the (identically distributed) random variables $d_{i,n}(X)$. Let $F$ denote the sampling distribution, $B_x(r)$ denote the ball of radius $r$ centred at $x \in \mathbb{R}^d$ and $u_x(r)$ denote its probability measure:

$$u_x(r) = \int_{B_x(r)} dF. \tag{2.2}$$

Suppose that $F$ satisfies a positive density condition (Gruber 2004), in the sense that there exist constants $a > 1$ and $\rho > 0$ such that $a^{-1} r^d \le u_x(r) \le a r^d$ for all $0 \le r \le \rho$. Suppose also that $F$ satisfies a smooth density condition, in the sense that its second partial derivatives are bounded at every point of $[0,1]^d$. Under these conditions, it is known (Evans et al. 2002) that for all $\varepsilon > 0$, the moments of the $k$-nearest neighbour distance distribution satisfy

$$E\left(d_{i,n}^a\right) = \frac{\Gamma(k + a/d)}{v_{d,p}^{a/d}\, \Gamma(k)}\, \frac{1}{n^{a/d}} \int f(x)^{(d-a)/d}\, dx \left( 1 + O\left( \frac{1}{n^{1/d - \varepsilon}} \right) \right), \tag{2.3}$$

as $n \to \infty$, where $v_{d,p}$ is the volume of the unit ball in $(\mathbb{R}^d, \|\cdot\|_p)$ and $f(x) = F'(x)$ is the density of the sampling distribution. In particular, $\sigma_n^{(d)} = O(n^{-1/d})$ and $\kappa_n^{(d)} = O(1)$ as $n \to \infty$, so by theorem 1.1 it follows that

$$n^{1/d}\, |D_n - \mu_n^{(d)}| \to 0 \quad \text{a.s. as } n \to \infty. \tag{2.4}$$

Thus, $D_n$ is a strongly consistent estimator for the expected $k$-nearest neighbour distance $\mu_n^{(d)}$, and its rate of convergence is of the asymptotic order $O(n^{1/d})$ as the number of sample points $n \to \infty$.
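The convergence (2.4) is easy to observe numerically. The following is a minimal sketch (assuming numpy and scipy are available; the function name `mean_knn_distance` is illustrative): for the uniform density on $[0,1]^d$ the integral in (2.6) below equals 1, so $n^{1/d} D_n$ should approach $\Gamma(k + 1/d)/(v_{d,2}^{1/d}\, \Gamma(k))$, up to boundary effects.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma

def mean_knn_distance(X, k):
    """D_n(X): sample mean of the k-nearest neighbour distances, cf. (2.1)."""
    dist, _ = cKDTree(X).query(X, k=k + 1)  # column 0 is the point itself
    return dist[:, k].mean()

d, k = 2, 3
v_d = np.pi ** (d / 2) / gamma(1 + d / 2)          # Euclidean unit-ball volume
limit = gamma(k + 1 / d) / (v_d ** (1 / d) * gamma(k))

rng = np.random.default_rng(1)
for n in (10**3, 10**4, 10**5):
    X = rng.random((n, d))
    print(n, n ** (1 / d) * mean_knn_distance(X, k), "->", limit)
```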
In fact, taking $a = 1$ in (2.3) we have

$$\lim_{n \to \infty} n^{1/d} \mu_n^{(d)} = \frac{\Gamma(k + 1/d)}{v_{d,p}^{1/d}\, \Gamma(k)} \int f(x)^{(d-1)/d}\, dx, \tag{2.5}$$

so by theorem 1.1,

$$\lim_{n \to \infty} n^{1/d} D_n(X) = \frac{\Gamma(k + 1/d)}{v_{d,p}^{1/d}\, \Gamma(k)} \int f(x)^{(d-1)/d}\, dx \quad \text{a.s.} \tag{2.6}$$

(i) The Euclidean $k$-nearest neighbours graph

For any set of vertices $\{x_1, \dots, x_n\}$ in $\mathbb{R}^d$, the (undirected) Euclidean $k$-nearest neighbours graph is constructed by including an edge between every point and its $k$-nearest neighbours in the set, where the nearest neighbour relations are defined with respect to the Euclidean metric ($p = 2$). Let $L_n(X)$ denote the total length of the Euclidean $k$-nearest neighbours graph of the random sample $X^n$,

$$L_n(X) = \sum_{i=1}^{n} \sum_{\ell=1}^{k} \|X_i - X_{n_i(n,\ell)}\|_2. \tag{2.7}$$

Because the volume of the unit ball in $(\mathbb{R}^d, \|\cdot\|_2)$ is equal to $v_{d,2} = \pi^{d/2}/\Gamma(1 + d/2)$, then subject to the above conditions on the sampling distribution it follows by (2.6), summed over the first $\ell = 1, \dots, k$ neighbours and using the identity $\sum_{\ell=1}^{k} \Gamma(\ell + 1/d)/\Gamma(\ell) = \frac{d}{d+1} \Gamma(k + 1 + 1/d)/\Gamma(k)$, that

$$\lim_{n \to \infty} L_n(X)/n^{(d-1)/d} = \alpha(k, d) \int f(x)^{(d-1)/d}\, dx \quad \text{a.s.}, \tag{2.8}$$

where

$$\alpha(k, d) = \frac{d}{d+1}\, \frac{\Gamma(k + 1 + 1/d)}{v_{d,2}^{1/d}\, \Gamma(k)}, \tag{2.9}$$

which agrees with theorem 2(b) of Wade (2007).

(b) Noise estimation

Non-parametric regression attempts to model the behaviour of some observable variable $Y \in \mathbb{R}$ in terms of another observable variable $X \in \mathbb{R}^d$, based only on a finite number of observations of the joint variable $(X, Y)$. The relationship between $X$ and $Y$ is often assumed to satisfy the additive hypothesis

$$Y = E(Y|X) + R, \tag{2.10}$$

where $E(Y|X)$ is the regression function and $R$ is the residual variable (noise). Let $Z = Z_1, Z_2, \dots$ be a sequence of independent and identically distributed observations $Z_i = (X_i, Y_i)$ of the joint variable $Z = (X, Y)$, let $X = X_1, X_2, \dots$ and $Y = Y_1, Y_2, \dots$ denote the marginal sequences, and let $n_i(n, k)$ denote the index of the $k$-nearest neighbour of $X_i$ among the points of the marginal sample $X^n = (X_1, \dots, X_n)$. To estimate the $k$th moment $E(R^k)$ of the residual distribution ($k \in \mathbb{N}$), we consider the random variables $g_{i,n}(Z)$ and their sample mean $G_n(Z)$, defined by

$$g_{i,n}(Z) = \prod_{\ell=1}^{k} (Y_i - Y_{n_i(n,\ell)}) \quad \text{and} \quad G_n(Z) = \frac{1}{n} \sum_{i=1}^{n} g_{i,n}(Z). \tag{2.11}$$

Let $\mu_n^{(g)}$, $\sigma_n^{(g)}$ and $\kappa_n^{(g)}$, respectively, denote the mean, standard deviation and kurtosis of the (identically distributed) random variables $g_{i,n}(Z)$.

Perhaps the main contribution of this paper is that theorem 1.1 extends to possibly unbounded random variables, the only requirement being that their first four moments are bounded. This allows the residual variable $R$ in (2.10) to take arbitrarily large values.

Suppose that the sampling distribution of the explanatory variables $X_i$ satisfies smooth and positive density conditions, the regression function $E(Y|X)$ satisfies a Lipschitz condition on $\mathbb{R}^d$ and the residual variable $R$ is independent of $X$ (homoscedastic). Under these conditions, Evans & Jones (2008) have shown that for $k \in \mathbb{N}$,

$$\mu_n^{(g)} = E(R^k) + O(\mu_n^{(d)}) \quad \text{as } n \to \infty, \tag{2.12}$$

where $\mu_n^{(d)}$ is the expected $k$-nearest neighbour distance in the marginal sample $(X_1, \dots, X_n)$ and the implied constant depends on the residual moments up to order $k-1$, the constant implied by the Lipschitz condition on the regression function, and the constant implied by the positive density condition on the sampling distribution.

If the first $4k$ moments of the residual distribution are bounded, it can be shown that $\sigma_n^{(g)} = O(1)$ and $\kappa_n^{(g)} = O(1)$ as $n \to \infty$. Hence, by (2.5), (2.12) and theorem 1.1, it follows that

$$|G_n - \mu_n^{(g)}| \to 0 \quad \text{a.s. as } n \to \infty. \tag{2.13}$$

Thus, the sample mean $G_n$ is a strongly consistent estimator for the $k$th moment $E(R^k)$ of the residual distribution as $n \to \infty$.
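A minimal sketch of the estimator (2.11), assuming numpy and scipy (the function name `residual_moment` and the synthetic model are illustrative, not from the paper). Note that the neighbour relations are computed in the explanatory variables only, and that $R = Y - E(Y|X)$ has zero mean, so the product in (2.11) isolates $E(R^k)$ up to the $O(\mu_n^{(d)})$ term in (2.12).

```python
import numpy as np
from scipy.spatial import cKDTree

def residual_moment(X, Y, k):
    """G_n(Z) from (2.11): difference-based estimate of E(R^k) under model (2.10)."""
    _, idx = cKDTree(X).query(X, k=k + 1)   # neighbours found in X only
    diffs = Y[:, None] - Y[idx[:, 1:]]      # Y_i - Y_{n_i(n, l)} for l = 1, ..., k
    return diffs.prod(axis=1).mean()

# Synthetic homoscedastic model: Y = sin(2 pi X_1) + R with R ~ N(0, 0.3^2).
rng = np.random.default_rng(2)
n, sigma = 20000, 0.3
X = rng.random((n, 2))
Y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, sigma, n)
print(residual_moment(X, Y, k=2), "vs E(R^2) =", sigma**2)
```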
3. The Efron–Stein inequality

Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random vectors in $\mathbb{R}^d$, and let $g : \mathbb{R}^{d \times n} \to \mathbb{R}$ be a symmetric function of $n$ vectors in $\mathbb{R}^d$. The Efron–Stein inequality (Efron & Stein 1981) provides an upper bound for the variance of the statistic $Z = g(X_1, \dots, X_n)$ in terms of the quantities

$$Z_{(i)} = g(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_{n+1}) \quad \text{and} \quad Z_{(\cdot)} = \frac{1}{n+1} \sum_{i=1}^{n+1} Z_{(i)}. \tag{3.1}$$

Theorem 3.1 (Efron & Stein 1981). Let $X_1, X_2, \dots, X_{n+1}$ be independent and identically distributed random vectors in $\mathbb{R}^d$, let $g : \mathbb{R}^{d \times n} \to \mathbb{R}$ be a symmetric function, and define the random variable $Z = g(X_1, \dots, X_n)$. If $E(Z^2) < \infty$, then

$$\operatorname{var}(Z) \le \sum_{i=1}^{n+1} E(Z_{(i)} - Z_{(\cdot)})^2. \tag{3.2}$$

The Efron–Stein inequality has found a wide range of applications in statistics. Our approach is partly based on the work of Reitzner (2003), who uses the inequality to prove strong laws of large numbers for statistics related to random polytopes.

First we note that, because $X_1, \dots, X_{n+1}$ are independent and identically distributed, it follows by symmetry on the indices that

$$\operatorname{var}(Z) \le (n+1)\, E(Z_{(n+1)} - Z_{(\cdot)})^2. \tag{3.3}$$

Furthermore, for any $a \in \mathbb{R}$, we have that

$$\sum_{i=1}^{n+1} E(Z_{(i)} - Z_{(\cdot)})^2 = \sum_{i=1}^{n+1} E((Z_{(i)} - a) - (Z_{(\cdot)} - a))^2 \tag{3.4}$$

$$= \sum_{i=1}^{n+1} E(Z_{(i)} - a)^2 - (n+1)\, E(Z_{(\cdot)} - a)^2, \tag{3.5}$$

so inequalities (3.2) and (3.3) are preserved if we replace $Z_{(\cdot)}$ by any other function of $X_1, \dots, X_{n+1}$.

To prove theorem 1.1, we apply the Efron–Stein inequality to the sum $S_n(X)$,

$$Z(X_1, \dots, X_n) = \sum_{i=1}^{n} h_{i,n}(X) \tag{3.6}$$

$$= \sum_{i=1}^{n} h(X_i, X_{n_i(n,1)}, \dots, X_{n_i(n,k)}). \tag{3.7}$$

If $E(h_{i,n}^2) < \infty$ then $E(Z^2) < \infty$. Furthermore, because the points $X_1, \dots, X_n$ are independent and identically distributed, it follows that $S_n = Z(X_1, \dots, X_n)$ is invariant under permutations of its arguments. Thus, we may apply the Efron–Stein inequality to $S_n$. In this case $Z_{(n+1)} = S_n$, so by replacing $Z_{(\cdot)}$ by $S_{n+1}$ in (3.3), we obtain

$$\operatorname{var}(S_n) \le (n+1)\, E(S_n - S_{n+1})^2. \tag{3.8}$$

An identical expression also holds for the normalized sum $\bar{S}_n$. Lemma 3.2 shows that the expected squared difference between successive values of $\bar{S}_n$ is bounded independently of the total number of points $n$. The proof is given in §4.

Lemma 3.2. If the sampling distribution is continuous, then for all $n \ge 16k$,

$$E(\bar{S}_n - \bar{S}_{n+1})^2 \le 2(3 + 8kb)(3 + 64k)(\kappa_n + \kappa_{n+1}). \tag{3.9}$$

The following corollary of lemma 3.2 follows immediately by (3.8) and Chebyshev's inequality, and asserts the weak consistency of the sample mean $H_n$, defined in (1.2), as an estimator for the true mean $\mu_n$ as $n \to \infty$ (provided the kurtosis $\kappa_n$ is uniformly bounded for all $n \in \mathbb{N}$).

Corollary 3.3. If the sampling distribution is continuous, then for all $n \ge 16k$,

$$P\left( \left| \frac{H_n - \mu_n}{\sigma_n} \right| > \varepsilon \right) \le \frac{4(3 + 8kb)(3 + 64k)(\kappa_n + \kappa_{n+1})}{n \varepsilon^2}. \tag{3.10}$$

We prove lemma 3.2 using methods based on the approach of Bickel & Breiman (1983) and Hitczenko et al. (1999). Having established lemma 3.2, the proof of theorem 1.1 is then straightforward.
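Inequality (3.8) can be checked by simulation for a statistic of the form (3.6). A minimal Monte Carlo sketch, assuming numpy and scipy; the choice of $h$ (the $k$-nearest neighbour distance) and all parameter values are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def S(X, k):
    """S_n(X): sum of k-nearest neighbour distances, a statistic of the form (3.6)."""
    dist, _ = cKDTree(X).query(X, k=k + 1)
    return dist[:, k].sum()

rng = np.random.default_rng(3)
n, k, trials = 500, 2, 1000
S_n, sq_diff = [], []
for _ in range(trials):
    X = rng.random((n + 1, 2))          # sample n+1 points, use the first n for S_n
    s = S(X[:n], k)
    S_n.append(s)
    sq_diff.append((s - S(X, k)) ** 2)  # (S_n - S_{n+1})^2

print("var(S_n) =", np.var(S_n),
      "<= (n+1) E(S_n - S_{n+1})^2 =", (n + 1) * np.mean(sq_diff))
```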
4. Proofs

(a) Standard results

We need two standard geometric results. The first of these concerns the expected probability measure of $k$-nearest neighbour balls. For $X_i \in X^n$, let $B_{i,n}(X) \subset \mathbb{R}^d$ denote the $k$-nearest neighbour ball of $X_i$ (with respect to the finite sample $X^n$), defined to be the ball centred at $X_i$ of radius equal to the distance from $X_i$ to its $k$-nearest neighbour in $X^n$, and let $u_{i,n}(X)$ denote its probability measure,

$$u_{i,n}(X) = \int_{B_{i,n}(X)} dF, \tag{4.1}$$

where $F$ is the (common) distribution function of the sample points $X_1, X_2, \dots$. It is well known, see for example Percus & Martin (1998) or Evans et al. (2002), that provided the sampling distribution is continuous, the expected probability measure of any $k$-nearest neighbour ball (over all sample realizations) is equal to $k/n$.

Lemma 4.1. If the sampling distribution is continuous, then

$$E(u_{i,n}) = k/n. \tag{4.2}$$

The second result concerns the maximum degree of any vertex in a $k$-nearest neighbour graph. It is well known that this number is bounded independently of the total number of vertices (e.g. Stone 1977; Bickel & Breiman 1983; Zeger & Gersho 1994; Yukich 1998).

Lemma 4.2. For every countable set of points in $\mathbb{R}^d$, any point can be among the first $k$-nearest neighbours of at most $b(k, d, p) = k \lfloor d\, v_{d,p} \rfloor$ other points of the set, where $v_{d,p}$ is the volume of the unit ball in $(\mathbb{R}^d, \|\cdot\|_p)$. In particular, $b(k, d, 2) = k \lfloor 2\pi^{d/2}/\Gamma(d/2) \rfloor$ and $b(k, d, \infty) = k d 2^d$.
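The constant $b(k, d, p)$ of lemma 4.2 is elementary to compute, and doing so makes its exponential growth in $d$ (discussed in §5) visible. A minimal sketch covering the two cases stated above; the function name is illustrative.

```python
import math

def b(k, d, p):
    """Lemma 4.2 bound b(k, d, p) = k * floor(d * v_{d,p}) for p = 2 or p = inf."""
    if p == 2:
        v = math.pi ** (d / 2) / math.gamma(1 + d / 2)  # Euclidean unit-ball volume
    elif p == math.inf:
        v = 2.0 ** d                                    # sup-norm unit ball is a cube of side 2
    else:
        raise NotImplementedError("sketch covers p = 2 and p = inf only")
    return k * math.floor(d * v)

for d in (1, 2, 3, 5, 10):
    print(d, b(1, d, 2), b(1, d, math.inf))   # e.g. b(1, 2, 2) = floor(2*pi) = 6
```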
(b) Adding a sample point

Let $\lambda_n$ denote the second moment of the random variables $h_{i,n}(X)$,

$$\lambda_n = \sigma_n^2 + \mu_n^2. \tag{4.3}$$

Lemma 4.3. If the sampling distribution is continuous,

$$E\left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1})^2 \right) \le 2k(\lambda_n + \lambda_{n+1}). \tag{4.4}$$

Proof. Let $B_{i,n}(X) \subset \mathbb{R}^d$ denote the $k$-nearest neighbour ball of $X_i$ with respect to the finite sample $X^n = (X_1, \dots, X_n)$, and suppose we add another (independent and identically distributed) point $X_{n+1}$ to the ensemble. If $X_{n+1} \notin B_{i,n}(X)$ (i.e. if the new point $X_{n+1}$ falls outside the current $k$-nearest neighbour ball of $X_i$), then $X_{n_i(n,\ell)} = X_{n_i(n+1,\ell)}$ for all $1 \le \ell \le k$, and hence

$$h_{i,n} = h(X_i, X_{n_i(n,1)}, \dots, X_{n_i(n,k)}) \tag{4.5}$$

$$= h(X_i, X_{n_i(n+1,1)}, \dots, X_{n_i(n+1,k)}) \tag{4.6}$$

$$= h_{i,n+1}. \tag{4.7}$$

Thus, we have $h_{i,n} \ne h_{i,n+1}$ only if $X_{n+1} \in B_{i,n}(X)$. Let $J_{i,n+1}$ denote the indicator variable of the event $X_{n+1} \in B_{i,n}(X)$,

$$J_{i,n+1}(X) = \begin{cases} 1, & \text{if } X_{n+1} \in B_{i,n}(X), \\ 0, & \text{otherwise.} \end{cases} \tag{4.8}$$

Then

$$E\left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1})^2 \right) = E\left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1})^2 J_{i,n+1} \right). \tag{4.9}$$

By lemma 4.1, the expected probability measure of $B_{i,n}(X)$ is equal to $k/n$. Hence, because the new point $X_{n+1}$ is assumed to be independent of the previously selected points $X_1, \dots, X_n$, we have

$$P(J_{i,n+1} = 1) = k/n \tag{4.10}$$

and thus

$$E\left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1})^2 \right) = \frac{k}{n} \sum_{i=1}^{n} E\big((h_{i,n} - h_{i,n+1})^2 \mid J_{i,n+1} = 1\big). \tag{4.11}$$

So that we can consider samples of sizes $n$ and $n+1$ separately, we apply the crude bound

$$E\big((h_{i,n} - h_{i,n+1})^2 \mid J_{i,n+1} = 1\big) \le 2E\big(h_{i,n}^2 \mid J_{i,n+1} = 1\big) + 2E\big(h_{i,n+1}^2 \mid J_{i,n+1} = 1\big). \tag{4.12}$$

For samples of size $n$, because the random variable $h_{i,n}^2$ is determined only by the points $X_1, \dots, X_n$, it follows that $h_{i,n}^2$ is independent of the event $X_{n+1} \in B_{i,n}$, and therefore

$$E\big(h_{i,n}^2 \mid J_{i,n+1} = 1\big) = E(h_{i,n}^2) = \lambda_n. \tag{4.13}$$

For samples of size $n+1$, the event $X_{n+1} \in B_{i,n}$ is simply that the 'last' point of the sample $X_1, \dots, X_{n+1}$ is among the first $k$-nearest neighbours of $X_i$, i.e. $X_{n+1} \in B_{i,n}$ if and only if $n_i(n+1, \ell) = n+1$ for some $1 \le \ell \le k$. Because the points $X_1, \dots, X_{n+1}$ are independent and identically distributed, the order in which they are indexed is arbitrary. Hence, it follows by symmetry on the indices $j \ne i$ that the condition $n_i(n+1, \ell) = n+1$ cannot affect the expected value of $h_{i,n+1}^2$, so

$$E\big(h_{i,n+1}^2 \mid J_{i,n+1} = 1\big) = E(h_{i,n+1}^2) = \lambda_{n+1}. \tag{4.14}$$

Thus, we have that

$$E\big((h_{i,n} - h_{i,n+1})^2 \mid J_{i,n+1} = 1\big) \le 2(\lambda_n + \lambda_{n+1}) \tag{4.15}$$

and the result follows by (4.11). ∎

Lemma 4.4. If the sampling distribution is continuous, then for all $n \ge 16k$,

$$\lambda_{n+1} \le 3\lambda_n. \tag{4.16}$$

Proof. Because the $h_{i,n}$ are identically distributed,

$$\lambda_{n+1} = E\left( \frac{1}{n} \sum_{i=1}^{n} h_{i,n+1}^2 \right) \tag{4.17}$$

$$= E\left( \frac{1}{n} \sum_{i=1}^{n} \big(h_{i,n} - (h_{i,n} - h_{i,n+1})\big)^2 \right) \tag{4.18}$$

$$\le E\left( \frac{2}{n} \sum_{i=1}^{n} \left( h_{i,n}^2 + (h_{i,n} - h_{i,n+1})^2 \right) \right) \tag{4.19}$$

$$\le 2\lambda_n + E\left( \frac{2}{n} \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1})^2 \right). \tag{4.20}$$

Hence by lemma 4.3,

$$\lambda_{n+1} \le 2\lambda_n + \frac{4k}{n} (\lambda_n + \lambda_{n+1}). \tag{4.21}$$

Thus, for $n \ge 16k$ we have $4k/n \le 1/4$ and hence $\lambda_{n+1} \le 3\lambda_n$. ∎

Lemma 4.5. If the sampling distribution is continuous, then for all $n \ge 16k$,

$$E\left( \left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1}) \right)^2 \right) \le 8kb\lambda_n. \tag{4.22}$$

Proof. Let $M_{n+1}$ be the set containing the index of every point in $X^{n+1}$ which has the new point $X_{n+1}$ among its first $k$-nearest neighbours,

$$M_{n+1}(X) = \{ i : n_i(n+1, \ell) = n+1 \text{ for some } 1 \le \ell \le k \}. \tag{4.23}$$

By construction, if $i \notin M_{n+1}$ then $X_{n_i(n,\ell)} = X_{n_i(n+1,\ell)}$ for all $1 \le \ell \le k$, so $h_{i,n} \ne h_{i,n+1}$ only if $i \in M_{n+1}$. Furthermore, by lemma 4.2, the new point $X_{n+1}$ can become one of the first $k$-nearest neighbours of at most $b$ of the existing points $X_1, \dots, X_n$, so $|M_{n+1}| \le b$. Hence, by the Cauchy inequality,

$$\left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1}) \right)^2 = \left( \sum_{i \in M_{n+1}} (h_{i,n} - h_{i,n+1}) \right)^2 \tag{4.24}$$

$$\le |M_{n+1}| \sum_{i \in M_{n+1}} (h_{i,n} - h_{i,n+1})^2 \tag{4.25}$$

$$\le b \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1})^2, \tag{4.26}$$

and the result follows by lemmas 4.3 and 4.4. ∎

(c) An upper bound on $\operatorname{var}(S_n)$

Lemma 4.6. If the sampling distribution is continuous, then for all $n \ge 16k$,

(i) $E(S_n - S_{n+1})^2 \le 2(3 + 8kb)\lambda_n$ and (ii) $\operatorname{var}(S_n) \le 2(n+1)(3 + 8kb)\lambda_n$.

Proof. By definition,

$$S_n - S_{n+1} = \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1}) - h_{n+1,n+1}, \tag{4.27}$$

so

$$(S_n - S_{n+1})^2 \le 2 \left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1}) \right)^2 + 2 h_{n+1,n+1}^2. \tag{4.28}$$

Taking expectations then applying lemmas 4.4 and 4.5,

$$E(S_n - S_{n+1})^2 \le 2(3 + 8kb)\lambda_n, \tag{4.29}$$

and by (3.8),

$$\operatorname{var}(S_n) \le 2(n+1)(3 + 8kb)\lambda_n. \tag{4.30}$$ ∎

(d) Normalization

To find an upper bound on $E(\bar{S}_{n+1} - \bar{S}_n)^2$, we first need to quantify the squared difference between successive values of $\mu_n$ and $\sigma_n$.

Lemma 4.7. If the sampling distribution is continuous, then for all $n \ge 16k$,

(i) $(\mu_n - \mu_{n+1})^2 \le \dfrac{8kb\lambda_n}{n^2}$ and (ii) $(\sigma_n - \sigma_{n+1})^2 \le \dfrac{16k\lambda_n}{n}$.

Proof. To prove (i), because the $h_{i,n}$ are identically distributed,

$$(\mu_n - \mu_{n+1})^2 = \left[ E\left( \frac{1}{n} \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1}) \right) \right]^2 \tag{4.31}$$

$$\le \frac{1}{n^2}\, E\left( \left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1}) \right)^2 \right) \tag{4.32}$$

$$\le \frac{8kb\lambda_n}{n^2}, \tag{4.33}$$

where the last inequality follows by lemma 4.5.

To prove (ii), let $\tilde{X} = (\tilde{X}_1, \tilde{X}_2, \dots)$ be an independent copy of $X = (X_1, X_2, \dots)$, and let $\tilde{n}_i(n, k)$ denote the index of the $k$-nearest neighbour of $\tilde{X}_i$ among the points of $\tilde{X}^n = (\tilde{X}_1, \dots, \tilde{X}_n)$. We consider the random variables

$$\tilde{h}_{i,n}(\tilde{X}) = h\big(\tilde{X}_i, \tilde{X}_{\tilde{n}_i(n,1)}, \dots, \tilde{X}_{\tilde{n}_i(n,k)}\big). \tag{4.34}$$

Because $h_{i,n}$ and $\tilde{h}_{i,n}$ are independent and identically distributed,

$$E\big((h_{i,n} - \tilde{h}_{i,n})^2\big) = E(h_{i,n}^2) - 2E(h_{i,n} \tilde{h}_{i,n}) + E(\tilde{h}_{i,n}^2) \tag{4.35}$$

$$= E(h_{i,n}^2) - 2E(h_{i,n})E(\tilde{h}_{i,n}) + E(\tilde{h}_{i,n}^2) \tag{4.36}$$

$$= E(h_{i,n}^2) - \mu_n^2 + E(\tilde{h}_{i,n}^2) - \mu_n^2 \tag{4.37}$$

$$= 2\sigma_n^2. \tag{4.38}$$

By the Cauchy–Schwarz inequality,

$$E\big((h_{i,n} - \tilde{h}_{i,n})(h_{i,n+1} - \tilde{h}_{i,n+1})\big) \le 2\sigma_n \sigma_{n+1}, \tag{4.39}$$
so

$$2(\sigma_n - \sigma_{n+1})^2 = 2\sigma_n^2 + 2\sigma_{n+1}^2 - 4\sigma_n \sigma_{n+1} \tag{4.40}$$

$$\le E(h_{i,n} - \tilde{h}_{i,n})^2 + E(h_{i,n+1} - \tilde{h}_{i,n+1})^2 - 2E\big((h_{i,n} - \tilde{h}_{i,n})(h_{i,n+1} - \tilde{h}_{i,n+1})\big). \tag{4.41}$$

Thus, we have

$$2(\sigma_n - \sigma_{n+1})^2 \le E\left( \big((h_{i,n} - \tilde{h}_{i,n}) - (h_{i,n+1} - \tilde{h}_{i,n+1})\big)^2 \right) \tag{4.42}$$

$$\le 2E\left( (h_{i,n} - h_{i,n+1})^2 + (\tilde{h}_{i,n} - \tilde{h}_{i,n+1})^2 \right) \tag{4.43}$$

and because $X$ and $\tilde{X}$ are identically distributed,

$$(\sigma_n - \sigma_{n+1})^2 \le 2E(h_{i,n} - h_{i,n+1})^2. \tag{4.44}$$

Finally, because the $h_{i,n}$ are identically distributed,

$$(\sigma_n - \sigma_{n+1})^2 \le \frac{2}{n}\, E\left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1})^2 \right) \tag{4.45}$$

and (ii) follows by lemmas 4.3 and 4.4. ∎

(e) An upper bound on $\operatorname{var}(\bar{S}_n)$

Proof of lemma 3.2. We aim to show that for all $n \ge 16k$,

$$E(\bar{S}_n - \bar{S}_{n+1})^2 \le 2(3 + 8kb)(3 + 64k)(\kappa_n + \kappa_{n+1}). \tag{4.46}$$

First, we write

$$\bar{S}_n - \bar{S}_{n+1} = \frac{S_n - n\mu_n}{\sigma_n} - \frac{S_{n+1} - (n+1)\mu_{n+1}}{\sigma_{n+1}} \tag{4.47}$$

$$= \frac{1}{\sigma_n} \left( S_n - n\mu_n - \frac{\sigma_n}{\sigma_{n+1}} \big(S_{n+1} - (n+1)\mu_{n+1}\big) \right) \tag{4.48}$$

$$= \frac{1}{\sigma_n} \left( (S_n - S_{n+1}) - n(\mu_n - \mu_{n+1}) + \mu_{n+1} - \frac{1}{\sigma_{n+1}} (\sigma_n - \sigma_{n+1})\big(S_{n+1} - (n+1)\mu_{n+1}\big) \right), \tag{4.49}$$

from which it follows that

$$E(\bar{S}_n - \bar{S}_{n+1})^2 \le \frac{4}{\sigma_n^2} \left( E(S_n - S_{n+1})^2 + n^2 (\mu_n - \mu_{n+1})^2 + \mu_{n+1}^2 + \frac{1}{\sigma_{n+1}^2} (\sigma_n - \sigma_{n+1})^2 \operatorname{var}(S_{n+1}) \right). \tag{4.50}$$

Thus, by lemmas 4.4, 4.6 and 4.7 we obtain

$$E(\bar{S}_n - \bar{S}_{n+1})^2 \le 4(3 + 8kb)\, \frac{\lambda_n}{\sigma_n^2} \left( 3 + 64k\, \frac{\lambda_{n+1}}{\sigma_{n+1}^2} \right). \tag{4.51}$$

Finally, because $\lambda_n/\sigma_n^2 \le \kappa_n^{1/2}$ and $\kappa_{n+1} \ge 1$,

$$E(\bar{S}_n - \bar{S}_{n+1})^2 \le 4(3 + 8kb)(3 + 64k)\, \kappa_n^{1/2} \kappa_{n+1}^{1/2}, \tag{4.52}$$

and the result follows by the fact that $2\kappa_n^{1/2}\kappa_{n+1}^{1/2} \le \kappa_n + \kappa_{n+1}$. ∎

Proof of theorem 1.2. By lemma 3.2 and (3.8),

$$\operatorname{var}(\bar{S}_n) \le 2(n+1)(3 + 8kb)(3 + 64k)(\kappa_n + \kappa_{n+1}). \tag{4.53}$$ ∎

(f) Strong convergence

Proof of theorem 1.1. We aim to show that $|\bar{H}_n| \to 0$ a.s. as $n \to \infty$. Suppose that there exists a finite constant $c > 0$ such that $\kappa_n < c$ for all $n \in \mathbb{N}$. Then by corollary 3.3, the probabilities $P(|\bar{H}_{n_m}| > \varepsilon)$ are summable for the subsequence $n_m = m^2$,

$$\sum_{m=1}^{\infty} P\big(|\bar{H}_{m^2}| > \varepsilon\big) \le \frac{c'}{\varepsilon^2} \sum_{m=1}^{\infty} \frac{1}{m^2} < \infty, \tag{4.54}$$

where $c' = 4c(3 + 8kb)(3 + 64k)$. Hence, by the Borel–Cantelli lemma it follows that $\bar{H}_{m^2} \to 0$ a.s. as $m \to \infty$. For $m^2 < n < (m+1)^2$, we write

$$\bar{H}_n = \frac{1}{n} \left( \bar{S}_{m^2} + \big(\bar{S}_n - \bar{S}_{m^2}\big) \right) = \frac{m^2}{n}\, \bar{H}_{m^2} + \frac{1}{n} \big(\bar{S}_n - \bar{S}_{m^2}\big). \tag{4.55}$$

Let

$$W_m = \max\left\{ \frac{1}{m^2} \big|\bar{S}_n - \bar{S}_{m^2}\big| : m^2 < n < (m+1)^2 \right\}. \tag{4.56}$$

Then $|\bar{H}_n| \le |\bar{H}_{m^2}| + W_m$ for all $m^2 < n < (m+1)^2$, so by (4.54) it is sufficient to show that $W_m \to 0$ a.s. as $m \to \infty$. Writing

$$\bar{S}_n - \bar{S}_{m^2} = (\bar{S}_n - \bar{S}_{n-1}) + (\bar{S}_{n-1} - \bar{S}_{n-2}) + \dots + (\bar{S}_{m^2+1} - \bar{S}_{m^2}), \tag{4.57}$$

we see that

$$W_m \le \frac{1}{m^2} \sum_{j=m^2+1}^{(m+1)^2 - 1} \big|\bar{S}_j - \bar{S}_{j-1}\big|. \tag{4.58}$$

Hence by the Cauchy inequality, and using the fact that the sum has exactly $2m$ terms, it follows by lemma 3.2 that

$$E(W_m^2) \le \frac{1}{m^4}\, E\left( \left( \sum_{j=m^2+1}^{(m+1)^2 - 1} \big|\bar{S}_j - \bar{S}_{j-1}\big| \right)^2 \right) \tag{4.59}$$

$$\le \frac{2}{m^3} \sum_{j=m^2+1}^{(m+1)^2 - 1} E\big(\bar{S}_j - \bar{S}_{j-1}\big)^2 \tag{4.60}$$

$$\le \frac{16c(3 + 8kb)(3 + 64k)}{m^2}. \tag{4.61}$$

Thus, by Markov's inequality and the Borel–Cantelli lemma, $W_m \to 0$ a.s. as $m \to \infty$, which concludes the proof of theorem 1.1. ∎

5. Conclusion

For a sequence $X = X_1, X_2, \dots$ of independent and identically distributed random vectors in $\mathbb{R}^d$, we have proved a strong law of large numbers for functions of a point and its nearest neighbours in the sample $X^n = (X_1, \dots, X_n)$. We have also used the result to show that certain non-parametric difference-based estimators of residual moments are strongly consistent as the number of points $n \to \infty$ (provided the distribution of the explanatory vectors satisfies smooth and positive density conditions).
Perhaps the most significant advance put forward in this paper is that the random variables need not be bounded, the only requirement being that their first four moments are bounded.

In practical applications, one area of concern is that the bound on $\operatorname{var}(\bar{S}_n)$ of theorem 1.2 increases exponentially with the dimension of the underlying space $\mathbb{R}^d$, which negatively affects the rate of convergence when $d$ is large. The exponential factor $b$ is introduced in the proof of lemma 4.5, where we have used the relatively crude bound of lemma 4.2. By contrast, the proof of lemma 4.3 uses the bound of lemma 4.1, which avoids the exponential factor. If we attempt to prove lemma 4.5 along the same lines as those followed in the proof of lemma 4.3, we arrive at the expression

$$E\left( \left( \sum_{i=1}^{n} (h_{i,n} - h_{i,n+1}) \right)^2 \right) = \sum_{i,j=1}^{n} E\big( (h_{i,n} - h_{i,n+1})(h_{j,n} - h_{j,n+1})\, J_{ij} \big), \tag{5.1}$$

where $J_{ij}(X)$ is the indicator variable of the event $X_{n+1} \in B_{i,n}(X) \cap B_{j,n}(X)$, and $B_{i,n}(X)$ and $B_{j,n}(X)$ are, respectively, the $k$-nearest neighbour balls of the points $X_i$ and $X_j$ in the sample $X^n$. At this point, we would like to show that $P(X_{n+1} \in B_{i,n} \cap B_{j,n})$ is at most equal to $c(d)/n$, where $c(d)$ does not increase exponentially with $d$. However, Dwyer (1995) showed that if $k = 1$ and $d \ge 7$, then for $n \gg d$ the expected number of edges $n_e$ in the random sphere of influence graph, or equivalently the expected number of pairs $(X_i, X_j)$ for which $B_{i,n}(X) \cap B_{j,n}(X) \ne \emptyset$, satisfies

$$(0.324)\, 2^d\, n \le E(n_e) \le (0.677)\, 2^d\, n. \tag{5.2}$$

This suggests that the curse of dimensionality will not be easily lifted.
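The sphere-of-influence edge count in Dwyer's bound (5.2) can be estimated by simulation. Below is a rough Monte Carlo sketch for small $d$ (where (5.2) itself is only claimed for $d \ge 7$, so the output merely illustrates the growth trend); it assumes numpy and scipy, and counts pairs whose first-nearest-neighbour balls intersect.

```python
import numpy as np
from scipy.spatial import cKDTree

def soi_edges(X):
    """Pairs (i, j) with ||X_i - X_j|| <= r_i + r_j, where r_i is the first
    nearest neighbour distance of X_i (sphere-of-influence graph, k = 1)."""
    tree = cKDTree(X)
    dist, _ = tree.query(X, k=2)
    r = dist[:, 1]
    # Candidate pairs within twice the largest ball radius, then filter exactly.
    pairs = tree.query_pairs(r=2.0 * r.max(), output_type='ndarray')
    i, j = pairs[:, 0], pairs[:, 1]
    gap = np.linalg.norm(X[i] - X[j], axis=1)
    return int((gap <= r[i] + r[j]).sum())

rng = np.random.default_rng(4)
n = 2000
for d in (2, 3, 4):
    X = rng.random((n, d))
    print(d, soi_edges(X) / n)   # compare with the window [0.324, 0.677] * 2^d
```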
The author would like to thank the Royal Society for supporting his research through the University Research Fellowship scheme.

Appendix A. Standard results

Proof of lemma 4.1. Let $u_{i,n}$ denote the probability measure of the $k$-nearest neighbour ball of $X_i \in X^n$, $G(u)$ be the distribution function of $u_{i,n}$ and $B_i(u)$ be the ball centred at $X_i$ of probability measure $u$. Because the $X_i$ are identically distributed,

$$G(u) = P(u_{i,n} \le u) \tag{A1}$$

$$= P(B_i(u) \text{ contains at least } k \text{ points other than } X_i) \tag{A2}$$

$$= 1 - \sum_{j=0}^{k-1} P(B_i(u) \text{ contains exactly } j \text{ points other than } X_i) \tag{A3}$$

$$= 1 - \sum_{j=0}^{k-1} \binom{n-1}{j} u^j (1-u)^{n-j-1}. \tag{A4}$$

Integrating by parts,

$$E(u_{i,n}^a) = \int_0^1 u^a\, dG(u) = 1 - a \int_0^1 u^{a-1} G(u)\, du \tag{A5}$$

$$= a \sum_{j=0}^{k-1} \binom{n-1}{j} \int_0^1 u^{j+a-1} (1-u)^{n-j-1}\, du. \tag{A6}$$

The integral in (A6) is the beta function $B(j+a,\, n-j) = \Gamma(j+a)\Gamma(n-j)/\Gamma(n+a)$. Thus, we obtain

$$E(u_{i,n}^a) = \frac{a\, \Gamma(n)}{\Gamma(n+a)} \sum_{j=0}^{k-1} \frac{\Gamma(j+a)}{\Gamma(j+1)} = \frac{\Gamma(n)\Gamma(k+a)}{\Gamma(n+a)\Gamma(k)}. \tag{A7}$$

In particular, $E(u_{i,n}) = k/n$. ∎

Lemma A.1. For every countable set of points in $\mathbb{R}^d$, any point can be the (first) nearest neighbour of at most $\lfloor d\, v_{d,p} \rfloor$ other points of the set, where $v_{d,p}$ is the volume (Lebesgue measure) of the unit ball in $(\mathbb{R}^d, \|\cdot\|_p)$.

Proof. Let $t > d\, v_{d,p}$ be an integer and suppose that $x_0$ is the nearest neighbour of every point in the set $\{x_1, \dots, x_t\}$. We project the points $x_i$ onto the surface of the unit ball, writing

$$x_i = x_0 + r_i \xi_i \quad \text{where } r_i = \|x_i - x_0\| \text{ and } \|\xi_i\| = 1. \tag{A8}$$

Let $x_i$ and $x_j$ be two distinct points of $\{x_1, \dots, x_t\}$ and suppose (without loss of generality) that $\|x_j - x_0\| \le \|x_i - x_0\|$. The vector $x_i - x_j$ can be expressed as

$$x_i - x_j = r_i \xi_i - r_j \xi_j = (r_i - r_j)\xi_i + r_j(\xi_i - \xi_j), \tag{A9}$$

so

$$\|x_i - x_j\| \le (r_i - r_j)\|\xi_i\| + r_j \|\xi_i - \xi_j\|. \tag{A10}$$

Hence, because $r_i \le \|x_i - x_j\|$ and $\|\xi_i\| = 1$, we have

$$r_i \le (r_i - r_j) + r_j \|\xi_i - \xi_j\| \tag{A11}$$

and therefore

$$1 \le \|\xi_i - \xi_j\|. \tag{A12}$$

Thus, every point $\xi_i$ must be located within an otherwise empty region on the surface of the unit ball in $\mathbb{R}^d$, and the surface area of each of these regions must be at least equal to 1. By hypothesis, there are $t > d\, v_{d,p}$ points $\xi_1, \dots, \xi_t$, so the total surface area covered by these disjoint regions must be greater than $d\, v_{d,p}$. However, because the total surface area of the unit ball is equal to $d\, v_{d,p}$ we have a contradiction, and thus we conclude that $t \le \lfloor d\, v_{d,p} \rfloor$. ∎

Lemma 8.4 of Yukich (1998) shows that lemma 4.2 follows from lemma A.1. An alternative proof is given here (writing $y_1, y_2, \dots$ for the chosen points, to distinguish them from the general points $x_1, \dots, x_t$).

Proof of lemma 4.2. Let $t > k d\, v_{d,p}$ be an integer, and suppose that $x_0$ is one of the $k$-nearest neighbours of every point in the set $\{x_1, \dots, x_t\}$. First we choose $y_1$ to be the point of $\{x_1, \dots, x_t\}$ furthest away from $x_0$, then eliminate $y_1$ along with all points that are closer to $y_1$ than $x_0$ is to $y_1$. At least one point is eliminated (namely the point $y_1$ itself), and because $x_0$ is one of the $k$-nearest neighbours of $y_1$ there can be at most $k-1$ other points closer to $y_1$ than $x_0$ is to $y_1$, so at most $k$ points are eliminated in total. Next we repeat the procedure on the remaining points, choosing $y_2$ to be the furthest point away from $x_0$, then eliminating $y_2$ along with all points that are closer to $y_2$ than $x_0$ is to $y_2$. Because $y_2$ was not eliminated in the first round, it must be closer to $x_0$ than it is to $y_1$,

$$\|y_2 - x_0\| \le \|y_2 - y_1\|. \tag{A13}$$

We continue in this way, at each stage choosing $y_i$ to be the point (among those that remain) furthest away from $x_0$, and eliminating $y_i$ along with all points that are closer to $y_i$ than $x_0$ is to $y_i$. Because $y_i$ was not eliminated in the previous rounds, it must be closer to $x_0$ than to any of the points chosen previously,

$$\|y_i - x_0\| \le \|y_i - y_j\| \quad \text{for all } j = 1, \dots, i-1. \tag{A14}$$

At least one point is eliminated at each stage, so the process must eventually terminate, say after $T$ steps, and we obtain a set of points $\{y_1, \dots, y_T\}$, each having $x_0$ as its nearest neighbour. At most $k$ points are eliminated at each step, so a minimum of $t/k$ steps must be performed before the process terminates. By hypothesis, $t > k d\, v_{d,p}$ so $T > d\, v_{d,p}$. However, by lemma A.1 we know that any point $x_0$ can be the nearest neighbour of at most $\lfloor d\, v_{d,p} \rfloor$ other points. Thus we have a contradiction, and conclude that $t \le k \lfloor d\, v_{d,p} \rfloor$. ∎

For Euclidean space, Zeger & Gersho (1994) establish an alternative bound in terms of kissing numbers.

Corollary A.2. If $p = 2$ then for every countable set of points in $\mathbb{R}^d$, any point can be among the $k$-nearest neighbours of at most $k K(d)$ other points of the set, where $K(d)$ is the maximum kissing number in $\mathbb{R}^d$.

Proof. Following the proof of lemma A.1, by (A12) we can place a set of $t$ non-overlapping spheres of radius 1/2 at each point $\xi_i$, and each of these will be tangent to the sphere of radius 1/2 centred at the origin. This contradicts the fact that there can be at most $K(d)$ such tangent spheres. ∎

It is known that $K(d)$ satisfies the following inequalities, where the upper bound is due to Kabatiansky & Levenshtein (1978) and the lower bound is due to Wyner (1965),

$$2^{0.2075 d (1 + o(1))} \le K(d) \le 2^{0.401 d (1 + o(1))}.$$

References

Avram, F. & Bertsimas, D. 1993 On central limit theorems in geometrical probability. Ann. Appl. Prob. 3, 1033–1046. (doi:10.1214/aoap/1177005271)
Baldi, P. & Rinott, Y. 1989 On normal approximations of distributions in terms of dependency graphs. Ann. Prob. 17, 1646–1650. (doi:10.1214/aop/1176991178)

Bickel, P. J. & Breiman, L. 1983 Sums of functions of nearest neighbour distances, moment bounds, limit theorems and a goodness of fit test. Ann. Prob. 11, 185–214. (doi:10.1214/aop/1176993668)

Chen, L. H. Y. & Shao, Q. 2004 Normal approximation under local dependence. Ann. Prob. 32, 1985–2028. (doi:10.1214/009117904000000450)

Dwyer, R. A. 1995 The expected size of the sphere-of-influence graph. Comput. Geom. Theory Appl. 5, 155–164. (doi:10.1016/0925-7721(94)00025-Q)

Efron, B. & Stein, C. 1981 The jackknife estimate of variance. Ann. Stat. 9, 586–596. (doi:10.1214/aos/1176345462)

Evans, D. & Jones, A. J. 2008 Non-parametric estimation of residual moments and covariance. Proc. R. Soc. A 464, 2831–2846. (doi:10.1098/rspa.2007.0195)

Evans, D., Jones, A. J. & Schmidt, W. M. 2002 Asymptotic moments of near neighbour distance distributions. Proc. R. Soc. A 458, 2839–2849. (doi:10.1098/rspa.2002.1011)

Gruber, P. M. 2004 Optimum quantization and its applications. Adv. Math. 186, 456–497. (doi:10.1016/j.aim.2003.07.017)

Hitczenko, P., Janson, S. & Yukich, J. E. 1999 On the variance of the random sphere of influence graph. Random Struct. Algor. 14, 139–152. (doi:10.1002/(SICI)1098-2418(199903)14:2<139::AID-RSA2>3.0.CO;2-E)

Kabatiansky, G. A. & Levenshtein, V. I. 1978 Bounds for packings on a sphere and in space. Probl. Peredachi Inf. 14, 3–25.

Penrose, M. D. 2000 Central limit theorems for k-nearest neighbour distances. Stoch. Proc. Appl. 85, 295–320. (doi:10.1016/S0304-4149(99)00080-0)

Penrose, M. D. 2007 Laws of large numbers in stochastic geometry with statistical applications. Bernoulli 13, 1124–1150. (doi:10.3150/07-BEJ5167)

Penrose, M. D. & Wade, A. R. 2008 Limit theory for the random on-line nearest-neighbor graph. Random Struct. Algor. 32, 125–156. (doi:10.1002/rsa.20185)

Penrose, M. D. & Yukich, J. E. 2001 Central limit theorems for some graphs in computational geometry. Ann. Appl. Prob. 11, 1005–1041. (doi:10.1214/aoap/1015345393)

Penrose, M. D. & Yukich, J. E. 2003 Weak laws of large numbers in geometric probability. Ann. Appl. Prob. 13, 277–303. (doi:10.1214/aoap/1042765669)

Percus, A. G. & Martin, O. C. 1998 Scaling universalities of kth nearest neighbor distances on closed manifolds. Adv. Appl. Math. 21, 424–436. (doi:10.1006/aama.1998.0607)

Petrovskaya, M. & Leontovitch, A. 1982 The central limit theorem for a sequence of random variables with a slowly growing number of dependencies. Theory Probab. Appl. 27, 815–825. (doi:10.1137/1127089)

Reitzner, M. 2003 Random polytopes and the Efron–Stein jackknife inequality. Ann. Prob. 31, 2136–2166. (doi:10.1214/aop/1068646381)

Steele, J. M. 1986 An Efron–Stein inequality for nonsymmetric statistics. Ann. Stat. 14, 753–758. (doi:10.1214/aos/1176349952)

Stone, C. J. 1977 Consistent nonparametric regression. Ann. Stat. 5, 595–620. (doi:10.1214/aos/1176343886)

Wade, A. R. 2007 Explicit laws of large numbers for random nearest-neighbour-type graphs. Adv. Appl. Prob. 39, 326–342. (doi:10.1239/aap/1183667613)

Wyner, A. D. 1965 Capabilities of bounded discrepancy decoding. Bell Syst. Tech. J. 44, 1061–1122.

Yukich, J. E. 1998 Probability theory of classical Euclidean optimization problems. Springer Lecture Notes in Mathematics, no. 1675. Berlin, Germany: Springer.
Zeger, K. & Gersho, A. 1994 The number of nearest neighbors in a Euclidean code. IEEE Trans. Inform. Theory 40, 1647–1649. (doi:10.1109/18.333884)