Journal of Computational Geometry jocg.org

ON PROJECTIONS OF METRIC SPACES

Mark Kozdoba*

Abstract. Let X be a metric space and let µ be a probability measure on it. Consider a Lipschitz map T : X → Rⁿ with Lipschitz constant ≤ 1. One can then ask whether the image TX can have large projections on many directions. For a large class of spaces X, we show that there are directions φ ∈ Sⁿ⁻¹ on which the projection of the image TX is small on average (in L²(µ)), with bounds depending on the dimension n and on the eigenvalues of the Laplacian on X. Our results can be viewed as a multidimensional extension of the concentration of measure principle.

1 Introduction

Let (X, m) be a metric space and let µ be a probability measure on X. Consider a Lipschitz map T : X → Rⁿ with ‖T‖_Lip ≤ 1, where Rⁿ is taken with the standard inner product ⟨·,·⟩ and the corresponding Euclidean norm |·|. Let us define another inner product on Rⁿ by setting

\[ [\varphi, \theta] = \int \langle \varphi, Tx \rangle \langle \theta, Tx \rangle \, d\mu(x) \tag{1} \]

for θ, φ ∈ Rⁿ, and denote the associated norm by

\[ \|\theta\|_{L_2} := \left( \int \langle Tx, \theta \rangle^2 \, d\mu(x) \right)^{1/2}. \tag{2} \]

This norm is known as the covariance structure of the push-forward measure Tµ, and we shall regard ‖θ‖_{L₂} as measuring the size of the projection of the image of X onto the direction θ. A similar notion of size is used, for instance, in the archetypal dimensionality reduction technique in machine learning, Principal Component Analysis (PCA) (see [3] for an exposition). In PCA, the data is projected onto a subspace V ⊂ Rⁿ of a fixed dimension k, such that the variation of the data on V is maximal among all subspaces of that dimension. One then ideally expects that the projection onto V is the real source of the data, while the projection onto V⊥ comes from noise, with θ ∈ Sⁿ⁻¹ ∩ V⊥ having small ‖θ‖_{L₂} norm. In this article we show that when a metric space X is mapped into Rⁿ, the geometry of X can provide bounds on the size of the projections of the image.
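Definition (2) can be made concrete numerically: for a finite sample approximating the push-forward Tµ, the matrix of the bilinear form (1) is the covariance matrix, and the direction minimising ‖θ‖_{L₂} is its bottom eigenvector. A minimal numpy sketch (the sample below is a hypothetical stand-in for an actual map T):

```python
import numpy as np

def min_projection_direction(points, weights):
    """For a finite sample T(x_1), ..., T(x_m) in R^n with probability
    weights mu(x_i), return the direction theta in S^{n-1} minimising
    ||theta||_{L2}^2 = sum_i mu(x_i) * <T(x_i), theta>^2,
    together with the minimal norm itself."""
    # Matrix of the bilinear form (1): Cov = sum_i mu_i (Tx_i)(Tx_i)^T.
    cov = (points * weights[:, None]).T @ points
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so the first eigenvector is the direction of least variance.
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, 0], float(np.sqrt(max(vals[0], 0.0)))

# Hypothetical data: 200 points concentrated near a 2-plane inside R^5.
rng = np.random.default_rng(0)
pts = (rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
       + 0.01 * rng.normal(size=(200, 5)))
w = np.full(200, 1.0 / 200)
theta, smallest = min_projection_direction(pts, w)
```

In the PCA language above, `theta` spans the least-variance line of V⊥, and `smallest` is its ‖·‖_{L₂} size.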
* [email protected]
JoCG 5(1), 275–283, 2014

To see why this might be the case, it is instructive first to look at an example. Take X to be the unit interval [0, 1] with Lebesgue measure, and let T : [0, 1] → Rⁿ be a Lipschitz map; that is, we consider curves of length at most one in Rⁿ (here and in what follows, Lipschitz will mean "having Lipschitz constant ≤ 1"). By considering a few pictures, it is clear that if n is large then there should be a direction with a small projection, and we are interested in quantifying this phenomenon. It turns out that this particular situation was already investigated by Gohberg and Krein, in a different formulation and context (related to spectral properties of smooth kernels); see the book [7], Chapter III.10.3.

Theorem 1 ([7]). There is an absolute constant c > 0 such that for every n > 1 and every T : [0, 1] → Rⁿ with ‖T‖_Lip ≤ 1, there is θ ∈ Sⁿ⁻¹ such that ‖θ‖_{L₂} ≤ c · n^{−3/2}.

Our objective is to extend this result to a more general class of metric spaces with measure, on which an appropriate notion of a derivative can be defined. Namely, we shall discuss finite spaces X which are graphs with the shortest path metric, equipped with the stationary measure of the random walk, as described below. While this setting makes the presentation simpler, the statement and the proof of our result, Theorem 2, can be repeated with straightforward modifications in the setting of Riemannian manifolds. In addition, the approach we take does not explicitly use the stationarity of the measure. This assumption is made mainly because in the stationary case the associated Laplacians are well-investigated operators, for which explicit bounds can be provided. Details are given in Section 2.

Let X be a finite set and let E ⊂ X × X be a set of edges such that (X, E) is a connected undirected graph without loops.
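Theorem 1 can be illustrated numerically. The sketch below (an illustration only; the particular curve is a hypothetical example) builds a unit-speed trigonometric curve in R⁶, so that ‖T‖_Lip = 1, and checks that the smallest principal axis of its covariance ellipsoid lies below n^{−3/2}:

```python
import numpy as np

n, m = 6, 4000                        # ambient dimension, sample size
t = np.linspace(0.0, 1.0, m, endpoint=False)
pairs = n // 2
ks = np.arange(1, pairs + 1)

# Coordinates a_k sin(2*pi*k*t), a_k cos(2*pi*k*t) with a_k = c/k, and c
# chosen so the speed is exactly 1: |T'|^2 = sum_k (2*pi*k*a_k)^2
#                                          = (2*pi*c)^2 * pairs = 1.
c = 1.0 / (2.0 * np.pi * np.sqrt(pairs))
a = c / ks

curve = np.empty((m, n))
curve[:, 0::2] = a * np.sin(2.0 * np.pi * np.outer(t, ks))
curve[:, 1::2] = a * np.cos(2.0 * np.pi * np.outer(t, ks))

# Smallest principal axis of the covariance ellipsoid = min ||theta||_{L2}.
cov = curve.T @ curve / m
min_norm = float(np.sqrt(max(np.linalg.eigvalsh(cov)[0], 0.0)))
```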
For x, y ∈ X, write x ∼ y iff (x, y) ∈ E, and let deg(x) denote the degree of x. We endow X with the shortest path metric and set µ to be the stationary distribution of the simple nearest-neighbour random walk on (X, E), µ(x) = deg(x)/(2|E|). Denote by L(X) the space of real-valued functions on X. The graph Laplacian on L(X) is defined by

\[ \Delta f(x) = 2\left( f(x) - \frac{1}{\deg(x)} \sum_{y \sim x} f(y) \right). \tag{3} \]

For our purposes, it suffices to note that this is a self-adjoint non-negative operator, with a one-dimensional kernel consisting of the constant functions. Some of these properties will be derived in Section 2, while full details on the graph Laplacian can be found, for instance, in [4]. We denote by \(\{\lambda_i\}_{i=1}^{|X|-1}\) the sequence of non-zero eigenvalues of Δ in non-decreasing order, including multiplicities.

Theorem 2. There is an absolute constant c > 0 such that for every graph X as above, every n > 1, and every Lipschitz map T : X → Rⁿ, there is a direction θ ∈ Sⁿ⁻¹ such that

\[ \|\theta\|_{L_2} \le c \cdot n^{-1/2} \lambda_{\lfloor n/2 \rfloor - 1}^{-1/2}. \]

Example 3 (Discrete Space). Let (X, m) be a metric space such that m(x, y) = 1 for all x ≠ y, and let µ be the uniform probability measure on X. Then for every Lipschitz T : X → Rⁿ there is θ ∈ Sⁿ⁻¹ such that

\[ \|\theta\|_{L_2} \le \frac{c}{\sqrt{n}}. \tag{4} \]

Note that, interestingly enough, the size of the space, |X|, does not appear in this bound (except that, of course, for n > |X| we always have θ ∈ Sⁿ⁻¹ with ‖θ‖_{L₂} = 0). That is, one cannot increase the minimal projection size of X in, say, R²⁰ by adding more points. To prove (4), note that (X, m) corresponds to a clique graph, and the Laplacian has a single non-zero eigenvalue, which is of the order of a constant and has multiplicity |X| − 1.

Example 4 (Combinatorial Cube). Here X is the set C^d = {0, 1}^d, with the Hamming metric m(x, y) = |{i ∈ {1, …, d} | xᵢ ≠ yᵢ}|, where x = x₁…x_d, y = y₁…y_d ∈ C^d, and µ is again taken to be the uniform probability measure.
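Definition (3) and the clique computation behind Example 3 are easy to check numerically; a small sketch (the vertex count N = 7 is arbitrary):

```python
import numpy as np

def graph_laplacian(n_vertices, edges):
    """Matrix of definition (3): (Lap f)(x) = 2 * (f(x) - average of f
    over the neighbours of x)."""
    A = np.zeros((n_vertices, n_vertices))
    for x, y in edges:
        A[x, y] = A[y, x] = 1.0
    deg = A.sum(axis=1)
    return 2.0 * (np.eye(n_vertices) - A / deg[:, None])

# The clique on N vertices realises the discrete space of Example 3.
N = 7
lap = graph_laplacian(N, [(i, j) for i in range(N) for j in range(i)])
eigs = np.sort(np.linalg.eigvals(lap).real)
```

For the clique, the single non-zero eigenvalue works out to 2N/(N − 1), of constant order, with multiplicity N − 1, as used in the proof of (4).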
The non-zero eigenvalues of the Laplacian on X are 4k/d, with multiplicity \(\binom{d}{k}\), k = 1, …, d (see [5]). The n-th smallest eigenvalue is therefore of the order log(n)/d, and we obtain that for every Lipschitz T : X → Rⁿ there is θ ∈ Sⁿ⁻¹ such that

\[ \|\theta\|_{L_2} \le c \cdot \frac{\sqrt{d}}{\sqrt{n \cdot \log(n)}}. \]

We shall in fact prove a slightly stronger statement than Theorem 2. To state it, we require an additional bit of notation. Define the covariance matrix of the measure Tµ in Rⁿ to be the operator Cov_{Tµ} : Rⁿ → Rⁿ,

\[ \mathrm{Cov}_{T\mu} = \int Tx \otimes Tx \, d\mu(x), \tag{5} \]

where for y ∈ Rⁿ, y ⊗ y denotes the operator on Rⁿ which acts by (y ⊗ y)(x) = ⟨x, y⟩y. Note that for every θ, φ ∈ Rⁿ, [θ, φ] = ⟨Cov_{Tµ} θ, φ⟩. Hence Cov_{Tµ} is the non-negative matrix which connects the bilinear form [·,·] with the natural inner product ⟨·,·⟩. Let {αᵢ}_{i=1}^{n} be the eigenvalues of Cov_{Tµ} in non-increasing order. Then, with the rest of the notation as in Theorem 2, we have:

Theorem 5. There is an absolute constant c > 0 such that for every graph X as above, every n > 1, and every Lipschitz map T : X → Rⁿ which satisfies in addition

\[ \int Tx \, d\mu(x) = 0, \tag{6} \]

we have, for every i = 1, …, n,

\[ \alpha_i \le c \cdot i^{-1} \lambda_{\lfloor n/2 \rfloor}^{-1}. \tag{7} \]

Note that for T satisfying the mean-zero assumption (6), Theorem 5 indeed implies Theorem 2. This can be seen by choosing the θ of Theorem 2 to be the eigenvector of Cov_{Tµ} corresponding to the eigenvalue αₙ. The removal of assumption (6) is simple and will be discussed in Section 2.

Our approach to Theorem 2 extends the argument in [7] and, naturally, Theorem 1 can also be seen as a consequence of the principle of Theorem 2. Indeed, consider the Laplacian on [0, 1], Δf = −∂²f/∂x². The minus sign in front of ∂²f/∂x² is taken for consistency with definition (3), making Δ a non-negative operator. This Laplacian has eigenspaces spanned by {sin 2πkx, cos 2πkx | k ∈ N}, with eigenvalues λₙ ≈ n², up to multiplicative constants.
Hence the decay rate \( n^{-1/2} \cdot (n^2)^{-1/2} = n^{-3/2} \) in Theorem 1.

Finally, we note that Theorem 5 can be viewed as a multidimensional extension of the (weak) concentration of measure phenomenon. The concentration of measure phenomenon states, roughly, that the distribution of a Lipschitz function on a metric measure space tends to be more concentrated around the mean than what would be suggested by the space's diameter. Many results, utilising different geometric parameters of the space, were developed to quantify this phenomenon. We refer to [10] and [9] for further details on concentration of measure. For the purposes of the present article, let us state a classical result by Alon and Milman [1], which derives concentration of measure on a graph in terms of the first eigenvalue of the Laplacian.

Theorem 6 (Alon and Milman, 1985). There exist constants c, C > 0 such that for every graph X, every Lipschitz f : X → R, and every t > 0:

\[ \mu\left( |f - \mathbb{E}f| > \sqrt{\lambda_1^{-1}} \cdot t \right) \le C e^{-c \cdot t}. \tag{8} \]

The sub-exponential inequality (8) implies in particular the following, weaker, L² form: there is a constant C > 0 such that for every X and every Lipschitz f : X → R,

\[ \left( \mathrm{Var}_{\mu}(f) \right)^{1/2} \le C \cdot \sqrt{\lambda_1^{-1}}. \tag{9} \]

In Riemannian manifolds, this inequality is a special case of the Poincaré inequality (see [6]), and it has a short independent proof. We give a short argument for the graph case in Section 2. In M. Gromov's terminology [8], both inequalities (8) and (9) state that the observable diameter of X, viewed through Lipschitz functions, is \(\sqrt{\lambda_1^{-1}}\). In other words, a Lipschitz image of a space X in R will have most of its points within an interval of diameter of order \(\sqrt{\lambda_1^{-1}}\), which is typically much smaller than the diameter of X. In this light, one can ask what can be said about observations of X with values in higher dimensional Euclidean spaces. Theorem 5 provides one answer to this question, in terms of the Euclidean structure of the image.
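The L² inequality (9), in the variance form proved in Section 2, can be checked on a small example; a sketch on a cycle graph, using the graph distance to a fixed vertex as the Lipschitz function (the choices of N and f are arbitrary):

```python
import numpy as np

# Cycle graph C_N: 2-regular, so mu is uniform and the Laplacian (3)
# is a symmetric matrix.
N = 12
adj = np.zeros((N, N))
for i in range(N):
    adj[i, (i + 1) % N] = adj[(i + 1) % N, i] = 1.0
deg = adj.sum(axis=1)
lap = 2.0 * (np.eye(N) - adj / deg[:, None])   # definition (3)
mu = deg / deg.sum()                           # stationary measure

lam1 = np.sort(np.linalg.eigvalsh(lap))[1]     # smallest non-zero eigenvalue

# A 1-Lipschitz function: graph distance to vertex 0.
f = np.minimum(np.arange(N), N - np.arange(N)).astype(float)
var = float(mu @ (f - mu @ f) ** 2)            # Var_mu(f)
dirichlet = float(mu @ (f * (lap @ f)))        # [Lap f, f]_X = [grad f, grad f]_E
```

For this f the Dirichlet form equals 1 (every edge difference is ±1), and the variance indeed stays below λ₁⁻¹.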
Thus Theorem 5 can be viewed as an extension of inequality (9) to observations with higher dimensional values. Let us illustrate this on the example of the cube C^d. The diameter of C^d is d, and we have \(\sqrt{\lambda_1^{-1}}\) of order \(\sqrt{d}\). Let T : X → Rⁿ be a Lipschitz mapping and assume for simplicity that ∫ Tx dµ(x) = 0. For every θ ∈ Sⁿ⁻¹, define a function fθ : X → R by fθ(x) = ⟨Tx, θ⟩. Since for every θ ∈ Sⁿ⁻¹ the function fθ is Lipschitz, we can conclude from inequality (9) that

\[ \|\theta\|_{L_2} = \mathrm{Var}_{\mu}^{1/2} f_\theta \le C\sqrt{d}, \]

and this is the best possible bound which is uniform over all θ. However, from Theorem 5 we know, for instance, that this bound cannot be sharp for all θ. In particular, from Theorem 2, there exists θ ∈ Sⁿ⁻¹ with \( \mathrm{Var}_{\mu}^{1/2} f_\theta \le c \cdot \frac{\sqrt{d}}{\sqrt{n \cdot \log(n)}} \).

Our results in this article deal with the L² theory. That is, we give bounds on the second moments of the functions of the form fθ under the distribution induced by T, and these bounds are better than what can be obtained by an application of the usual, one-dimensional concentration inequalities. Whether similar results can be obtained for higher moments, or for full sub-exponential inequalities of the form (8), remains an interesting open question.

2 Proof

Let us begin by introducing some additional notions and notation used in the proof of Theorem 2. Fix a graph (X, E) with the stationary measure µ on it. The notion of a gradient on a graph is folklore, although not particularly frequent in the literature, and the notation varies considerably. We thus recall the definitions and some important properties. Define an inner product on L(X) by

\[ [f, g]_X = \int f(x) g(x) \, d\mu(x). \]

Denote by L(E) the set of real-valued functions on the set of edges E, and let ν be the uniform probability measure on E. We equip L(E) with the inner product

\[ [f, g]_E = \int f(e) g(e) \, d\nu. \]

Fix an arbitrary orientation on E, i.e.
for each edge e = {x, y} choose, in an arbitrary way, an enumeration v_e¹, v_e² of the vertices of the edge. We can express this enumeration as a {+1, −1}-valued function o,

\[ o(x, y) = \begin{cases} +1 & \text{if } x = v_e^1 \text{ and } y = v_e^2, \\ -1 & \text{if } x = v_e^2 \text{ and } y = v_e^1, \end{cases} \]

defined for pairs of vertices x, y which are connected by an edge e. The gradient operator on X is defined by ∇ : L(X) → L(E),

\[ (\nabla f)(e) = f(v_e^1) - f(v_e^2). \]

Equivalently, for e = {x, y}, (∇f)(e) = o(x, y)(f(x) − f(y)). The Laplacian on the graph (X, E) is defined as Δ = ∇* ∘ ∇, where ∇* is the Hilbert space adjoint of ∇. Let us first compute the explicit expression for the Laplacian for arbitrary measures µ and ν on X and E. By the definition of adjoints, the Laplacian needs to satisfy

\[ [\nabla^* \circ \nabla f, g]_X = [\nabla f, \nabla g]_E \tag{10} \]

for every f, g ∈ L(X). By choosing g to be the delta function at a vertex x, we get

\[ (\Delta f)(x) \cdot \mu(x) = \sum_e o(x, y) \cdot (f(x) - f(y)) \cdot o(x, y)\,\nu(e) = \sum_e (f(x) - f(y))\,\nu(e), \]

where the sum is over all edges e = {x, y} adjacent to the vertex x. Note, in particular, that the Laplacian does not depend on the orientation, but does depend on the choice of the measures µ and ν. We shall be interested in the case where µ(x) = deg(x)/(2|E|) is the stationary measure of the graph and ν(e) = 1/|E| is the uniform measure. These measures produce the classical definition (3), which has the advantage that this particular Laplacian is well studied and its eigenvalues are known in many cases (see, for instance, [4] and [5]). Our proof of Theorem 2, however, uses the Laplacian only through equation (10), and does not explicitly assume stationarity.

Denote by L₀(X) the subspace of L(X) that is orthogonal to the constant functions. Since Ker Δ is the space of the constant functions, the Laplacian is invertible on L₀(X), and hence the gradient is also an invertible operator from L₀(X) onto its image.
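The factorisation Δ = ∇* ∘ ∇ can be checked numerically. In the sketch below (the path graph is an arbitrary test case), the adjoint ∇* is computed with respect to the µ- and ν-weighted inner products, as in equation (10), and the resulting matrix is compared with definition (3):

```python
import numpy as np

def gradient_matrix(n_vertices, edges, rng=None):
    """Gradient with an arbitrarily chosen orientation: row e of the
    matrix computes (grad f)(e) = f(v_e^1) - f(v_e^2)."""
    if rng is None:
        rng = np.random.default_rng(0)
    G = np.zeros((len(edges), n_vertices))
    for row, (x, y) in enumerate(edges):
        sign = rng.choice([1.0, -1.0])     # which endpoint plays v_e^1
        G[row, x], G[row, y] = sign, -sign
    return G

def laplacian_from_gradient(G, mu, nu):
    """Lap = grad* o grad, where the adjoint is taken with respect to
    the mu-weighted inner product on L(X) and the nu-weighted one on
    L(E):  grad* = diag(mu)^{-1} G^T diag(nu)."""
    return np.diag(1.0 / mu) @ G.T @ np.diag(nu) @ G

# Path graph on 4 vertices, with stationary mu and uniform nu.
edges = [(0, 1), (1, 2), (2, 3)]
deg = np.array([1.0, 2.0, 2.0, 1.0])
mu, nu = deg / deg.sum(), np.full(len(edges), 1.0 / len(edges))
lap = laplacian_from_gradient(gradient_matrix(4, edges), mu, nu)
```

With these µ and ν the matrix coincides with definition (3), and changing the orientation leaves it unchanged.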
The inverse of the gradient will be of importance in what follows, and we denote it by Γ : Im(∇) → L₀(X).

With the preliminaries on the Laplacian in place, we give a short argument for the proof of inequality (9). This inequality, however, will not be used in the rest of this section. First note that for every f ∈ L(X),

\[ \mathrm{Var}_\mu f = [(f - \mathbb{E}f), (f - \mathbb{E}f)]_X \le \lambda_1^{-1} [\Delta f, f]_X. \tag{11} \]

Indeed, write f as an orthogonal decomposition f = Σ_{i≥0} fᵢ, where fᵢ is contained in the eigenspace of the eigenvalue λᵢ of Δ (and f₀ = 𝔼f is the mean, corresponding to eigenvalue 0). Then the left-hand side of (11) is Σ_{i>0} ⟨fᵢ, fᵢ⟩ and the right-hand side is λ₁⁻¹ Σ_{i>0} λᵢ ⟨fᵢ, fᵢ⟩. Since λ₁ is the smallest non-zero eigenvalue, (11) follows. Now, since the norm of the derivative satisfies by definition

\[ [\nabla f, \nabla f]_E = [\Delta f, f]_X \tag{12} \]

and since [∇f, ∇f]_E ≤ 1 for Lipschitz functions, inequality (9) follows.

We now briefly recall the notion of singular values of an operator and some related singular value inequalities. We refer to [2] or [7] for full details and proofs. If V and W are (say, finite dimensional) Hilbert spaces and A : V → W is a linear operator, then A*A is a non-negative self-adjoint operator. Let \(\{\lambda_i(A^*A)\}_{i=1}^{\dim V}\) be the eigenvalues of A*A, in non-increasing order, with multiplicities. The singular values of A are then the non-increasing sequence sᵢ(A) = √(λᵢ(A*A)). The singular values of A have a simple geometric interpretation: if B₂(V) denotes the unit ball of V, then the image A(B₂(V)) is an ellipsoid in W, and the sᵢ(A) are precisely the lengths of the principal semi-axes of this ellipsoid. The singular values satisfy sᵢ(A) = sᵢ(A*), and the singular values of a composition can be bounded by the following inequality due to Ky Fan. Let A : V → W and B : W → U be two operators on Hilbert spaces. Then, for every i, j ≥ 1,

\[ s_{i+j-1}(BA) \le s_i(A)\, s_j(B). \tag{13} \]
Finally, the Hilbert–Schmidt norm of an operator is defined by

\[ \|A\|_{HS} = \sqrt{\mathrm{tr}\, A^*A} = \Big( \sum_i s_i^2(A) \Big)^{1/2}. \]

Note that since sᵢ is a non-increasing sequence,

\[ s_i(A) \le \frac{\|A\|_{HS}}{\sqrt{i}} \tag{14} \]

for all i.

Before proceeding with the proof of Theorem 2, it will be convenient to introduce an additional assumption on the Lipschitz map T : X → Rⁿ, namely that

\[ \int T(x) \, d\mu = 0. \tag{15} \]

Under the conditions of Theorem 2 and this assumption, we will show that there is θ ∈ Sⁿ⁻¹ such that

\[ \|\theta\|_{L_2} \le c \cdot n^{-1/2} \lambda_{\lfloor n/2 \rfloor}^{-1/2}. \tag{16} \]

This implies the result for an arbitrary Lipschitz T : X → Rⁿ. Indeed, assume that v := ∫ T(x) dµ ≠ 0. Denote by P the orthogonal projection onto the (n − 1)-dimensional space orthogonal to v. Then the composition P ∘ T is a Lipschitz map into that space and ∫ (P ∘ T)(x) dµ = 0, so (16) can be applied in dimension n − 1.

Proof of Theorem 2. As mentioned above, we assume that condition (15) holds. Let DT denote the gradient of the map T, i.e. DT(e) = T(v_e¹) − T(v_e²). Since T is Lipschitz, DT is bounded, i.e. |DT(e)| ≤ 1 for all e ∈ E. For every θ ∈ Rⁿ consider the function on X, fθ(x) = ⟨θ, Tx⟩. By (15), fθ ∈ L₀(X), and clearly (∇fθ)(e) = ⟨θ, DT(e)⟩ and fθ = Γ(⟨θ, DT⟩). Let

\[ B_2(X) = \big\{ f : X \to \mathbb{R} \;\big|\; [f, f]_X \le 1 \big\} \]

denote the unit ball in L(X), and write ‖θ‖_{L₂} in a dual form:

\[ \|\theta\|_{L_2} = \sup_{g \in B_2(X)} \int g(x) \langle \theta, Tx \rangle \, d\mu. \]

Then

\[ \int g(x) \langle \theta, Tx \rangle \, d\mu = [g, f_\theta]_X = [g, \Gamma \circ \nabla f_\theta]_X = [\Gamma^* g, \nabla f_\theta]_E. \]

Next, write

\[ [\Gamma^* g, \nabla f_\theta]_E = \int (\Gamma^* g)(e) \cdot \langle \theta, DT(e) \rangle \, d\nu(e) = \Big\langle \theta, \int (\Gamma^* g)(e) \cdot DT(e) \, d\nu(e) \Big\rangle. \]

Denote by D̃ : L(E) → Rⁿ the operator that acts by

\[ \tilde{D}u = \int u(e) \cdot DT(e) \, d\nu. \]

With this notation,

\[ \|\theta\|_{L_2} = \sup_{g \in B_2(X)} \langle \theta, \tilde{D} \circ \Gamma^* g \rangle = \sup_{\phi \in \mathcal{E}} \langle \theta, \phi \rangle, \tag{17} \]

where \(\mathcal{E} = \tilde{D} \circ \Gamma^*(B_2(X))\) is the dual ellipsoid (in Rⁿ) of the ‖·‖_{L₂} norm.

Our aim is to bound the quantity inf_{θ∈Sⁿ⁻¹} ‖θ‖_{L₂}. By (17), this quantity equals the length of the smallest principal axis of the ellipsoid \(\mathcal{E}\).
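Both singular value facts, the Ky Fan inequality (13) and the Hilbert–Schmidt bound (14), are easy to verify numerically for random matrices; a small sketch (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))     # A : V -> W
B = rng.normal(size=(5, 8))     # B : W -> U

# np.linalg.svd returns singular values in non-increasing order.
sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
sBA = np.linalg.svd(B @ A, compute_uv=False)

# Ky Fan inequality (13); with 0-based indices: s_{i+j}(BA) <= s_i(A) s_j(B).
ky_fan_ok = all(
    sBA[i + j] <= sA[i] * sB[j] + 1e-12
    for i in range(len(sA))
    for j in range(len(sB))
    if i + j < len(sBA)
)

# Bound (14): s_i(A) <= ||A||_HS / sqrt(i), with 1-based i.
hs = np.sqrt(np.sum(sA ** 2))
hs_bound_ok = all(sA[i] <= hs / np.sqrt(i + 1) + 1e-12 for i in range(len(sA)))
```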
Since the lengths of the principal axes are the singular values of D̃ ∘ Γ*, we bound the singular values of this operator.

The singular values of Γ* are given by sᵢ(Γ*) = λᵢ^{−1/2}, where λᵢ are the non-zero eigenvalues of the Laplacian (in non-decreasing order, so that the sᵢ(Γ*) do not increase). To bound the singular values of D̃, we show that

\[ \|\tilde{D}\|_{HS} \le 1. \tag{18} \]

Indeed, one readily verifies that

\[ \tilde{D} \circ \tilde{D}^*(\theta) = \int \langle \theta, DT(e) \rangle \, DT(e) \, d\nu. \]

For fixed e ∈ E, the trace of the operator θ ↦ ⟨θ, DT(e)⟩DT(e) is |DT(e)|², which satisfies

\[ |DT(e)| \le 1 \tag{19} \]

by the Lipschitz assumption. Therefore (18) follows by averaging. Now, by (14), sᵢ(D̃) ≤ 1/√i, and an application of (13) (with i = j = n/2) completes the proof.

References

[1] N. Alon, V. D. Milman, λ₁, isoperimetric inequalities for graphs, and superconcentrators, Journal of Combinatorial Theory, Series B, Vol. 38, issue 1, pp. 73–88, 1985.
[2] R. Bhatia, Matrix Analysis, Graduate Texts in Mathematics, 169, Springer-Verlag, New York, 1997.
[3] C. M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag New York, Inc., 2006.
[4] F. R. Chung, Spectral Graph Theory, Vol. 92, American Mathematical Society, 1997.
[5] P. Diaconis, Group Representations in Probability and Statistics, Institute of Mathematical Statistics Lecture Notes—Monograph Series, 11, Institute of Mathematical Statistics, Hayward, CA, 1988.
[6] L. C. Evans, Partial Differential Equations, Graduate Studies in Mathematics, American Mathematical Society, 1998.
[7] I. C. Gohberg, M. G. Krein, Introduction to the Theory of Linear Nonselfadjoint Operators, Translations of Mathematical Monographs, Vol. 18, American Mathematical Society, 1969.
[8] M. Gromov, Metric Structures for Riemannian and Non-Riemannian Spaces. Based on the 1981 French original. With appendices by M. Katz, P. Pansu and S. Semmes. Translated from the French by Sean Michael Bates. Progress in Mathematics, 152.
Birkhäuser Boston, Inc., Boston, MA, 1999.
[9] M. Ledoux, The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs, Vol. 89, American Mathematical Society, 2005.
[10] V. Milman, G. Schechtman, Asymptotic Theory of Finite-Dimensional Normed Spaces. With an appendix by M. Gromov. Lecture Notes in Mathematics, 1200, Springer-Verlag, Berlin, 1986.