Higher Dimensions
CSI 772
James E. Gentle

The most common statistical datasets can be thought of as rows, representing observations, and columns, representing variables. In traditional multiple regression and correlation and other methods of multivariate analysis, there are generally few conceptual hurdles in thinking of the observations as ranging over a multidimensional space. In multiple regression with m regressors, for example, it is easy to visualize the hyperplane in m + 1 dimensions that represents the fit ŷ = Xβ̂. It is even easy to visualize the projection of the n-dimensional vector that represents a least-squares fit.

Many properties of one- and two-dimensional objects (lines and planes) carry over into higher-dimensional space just as we would expect. Although most of our intuition is derived from our existence in a three-dimensional world, we generally have no problem dealing with one- or two-dimensional objects. On the other hand, it can be difficult to view a 3-D world from a two-dimensional perspective. The delightful fantasy Flatland, written by Edwin Abbott in 1884, describes the travails of a two-dimensional person (one "A. Square") thrown into a three-dimensional world. (See also Stewart, 2001, Flatterland, Like Flatland Only More So.) The small book by Kendall (1961), A Course in the Geometry of n Dimensions, gives numerous examples in which common statistical concepts are elucidated by geometrical constructs.

There are many situations, however, in which our intuition derived from the familiar representations in one-, two-, and three-dimensional space leads us completely astray. This is particularly true of objects whose dimensionality is greater than three, such as volumes in higher-dimensional space. The problem is not just with our intuition, however; it is indeed the case that some properties do not generalize to higher dimensions. Exercise 2 illustrates such a situation.
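The least-squares fit mentioned above can be checked numerically: the fitted vector ŷ is the projection of y onto the column space of X, so the residual is orthogonal to every regressor column. A minimal stdlib sketch for the simplest case, one regressor plus an intercept, with made-up data:

```python
# simple linear regression: the fitted vector yhat is the projection of y
# onto the plane spanned by the all-ones vector and x (data are made up)
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
beta1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
beta0 = ybar - beta1 * xbar
yhat = [beta0 + beta1 * xi for xi in x]
resid = [yi - yh for yi, yh in zip(y, yhat)]

# least-squares property: the residual is orthogonal to both columns
print(abs(sum(resid)) < 1e-9,
      abs(sum(r * xi for r, xi in zip(resid, x))) < 1e-9)  # True True
```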
The shape of a dataset is the total information content that is invariant under translations, rotations, and scale transformations. Quantifying the shape of data is an interesting problem.

1 Data Sparsity in Higher Dimensions

We measure space both linearly and volumetrically. The basic cause of the breakdown of intuition in higher dimensions is that the relationship of linear measures to volumetric measures is exponential in dimensionality. The cubing we are familiar with in three-dimensional space cannot be used to describe the relative sizes of volumes (that is, the distribution of space). Volumes relative to the linear dimensions grow very rapidly.

There are two consequences of this. One is that the volumes of objects with interior holes, such as thin boxes or thin shells, are much larger than our intuition predicts. Another is that the density of a fixed number of points becomes extremely small.

The density of a probability distribution decreases as the distribution is extended to higher dimensions by an outer product of the range. This happens fastest going from one dimension to two dimensions but continues at a decreasing rate for higher dimensions. The effect of this is that the probability content of regions at a fixed distance from the center of the distribution increases; that is, outliers or isolated data points become more common.

This is easy to see by comparing a univariate normal distribution with a bivariate normal distribution. If X = (X1, X2) has a bivariate normal distribution with mean 0 and variance-covariance matrix diag(1, 1), then Pr(|X1| > 2) = 0.0455, whereas Pr(‖X‖ > 2) = 0.135. The probability that the bivariate random variable is more than two standard deviations from the center is much greater than the probability that the univariate random variable is more than two standard deviations from the center. We can see the relative probabilities in Figures 1 and 2.
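Both probabilities can be computed with the standard library alone: the univariate tail probability is 2(1 − Φ(2)) = erfc(2/√2), and for the bivariate standard normal ‖X‖² has a chi-squared distribution with 2 degrees of freedom, so Pr(‖X‖ > 2) = exp(−2²/2). A quick check:

```python
import math

# univariate standard normal: Pr(|X1| > 2) = 2 * (1 - Phi(2)) = erfc(2 / sqrt(2))
p_uni = math.erfc(2 / math.sqrt(2))

# bivariate standard normal: ||X||^2 is chi-squared with 2 df,
# whose survival function is exp(-t/2), so Pr(||X|| > 2) = exp(-4/2)
p_biv = math.exp(-2)

print(round(p_uni, 4), round(p_biv, 4))  # 0.0455 0.1353
```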
The area under the univariate density that is outside the central interval shown is relatively small; it is about 5% of the total area. The volume under the bivariate density in Figure 2 that is beyond the circle is relatively greater; it is about 13% of the total volume. The percentage increases with the dimensionality (see Exercise 1). The consequence of these density patterns is that an observation in higher dimensions is more likely to appear to be an outlier than one in lower dimensions.

[Figure 1: Univariate Extreme Regions]

[Figure 2: Bivariate Extreme Regions]

2 Volumes of Hyperspheres and Hypercubes

It is interesting to compare the volumes of regular geometrical objects and observe how the relationships of volumes to linear measures change as the number of dimensions changes. Consider, for example, that the volume of a sphere of radius a in d dimensions is

    a^d π^(d/2) / Γ(1 + d/2).

The volume of a superscribed cube is (2a)^d. Now, compare the volumes; the ratio of the volume of the sphere to the volume of the cube is

    π^(d/2) / (d 2^(d−1) Γ(d/2)).

For d = 3 (Figure 3), this ratio is 0.524; for d = 7, however, it is 0.037. As the number of dimensions increases, more and more of the volume of the cube is in the corners.

For two objects of different sizes but the same shape, with the smaller one centered inside the larger one, we have a similar phenomenon of the content of the interior object relative to the larger object. The volume of a thin shell of thickness ε, as a ratio of the volume of the outer figure (sphere, cube, whatever), is

    (V_d(r) − V_d(r − ε)) / V_d(r) = 1 − (1 − ε/r)^d.

As the number of dimensions increases, more and more of the volume of the larger object is in the outer thin shell. This is the same phenomenon that we observed above for probability distributions.
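Both formulas are easy to evaluate directly, which makes the speed of the collapse concrete; a short stdlib sketch:

```python
import math

def sphere_to_cube_ratio(d):
    # volume of a d-sphere of radius a divided by the volume of its
    # superscribed cube of side 2a: pi^(d/2) / (d * 2^(d-1) * Gamma(d/2))
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))

def shell_fraction(d, eps_over_r):
    # fraction of a d-dimensional volume lying in the outer shell of
    # relative thickness eps/r: 1 - (1 - eps/r)^d
    return 1 - (1 - eps_over_r) ** d

print(round(sphere_to_cube_ratio(3), 3))   # 0.524
print(round(sphere_to_cube_ratio(7), 3))   # 0.037
print(round(shell_fraction(2, 0.05), 3))   # 0.098
print(round(shell_fraction(100, 0.05), 3)) # 0.994
```

With a shell of relative thickness only 5%, almost nothing is left in the interior once d = 100.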
In a multivariate distribution whose density is the product of identical univariate densities (which is the density of a simple random sample), the relative probability content within extreme regions becomes greater as the dimension increases.

[Figure 3: A Superscribed Cube]

3 The Curse of Dimensionality

The computational and conceptual problems associated with higher dimensions have often been referred to as "the curse of dimensionality". How many dimensions cause problems depends on the nature of the application. Golub and Ortega (1993) use the phrase in describing the solution to the diffusion equation in three dimensions, plus time as the fourth dimension, of course.

In higher dimensions, not only do data appear as outliers, but they also tend to lie on lower-dimensional manifolds. This is the problem sometimes called "multicollinearity". The reason that data in higher dimensions are multicollinear, or more generally, concurve, is that the number of lower-dimensional manifolds increases very rapidly with the dimensionality: the rate is 2^d.

Whenever it is possible to collect data in a well-designed experiment or observational study, some of the problems of high dimensions can be ameliorated. In computer experiments, for example, Latin hypercube designs can be useful for exploring very high dimensional spaces.

Data in higher dimensions carry more information in the same number of observations than data in lower dimensions. Some people have referred to the increase in information as the "blessing of dimensionality". The support vector machine approach in fact attempts to detect structure in data by mapping the data to higher dimensions.

4 Tiling Space

Tessellations of the data space are useful in density estimation and in clustering and classification. Generally, regular tessellations, or tilings (objects with the same shapes), are used. Regular tessellations are easier both to form and to analyze.
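When hypercube tiles of a lattice tiling are used as bins in density estimation, assigning an observation to its tile is just a coordinatewise floor operation. A minimal sketch, with a made-up bin width h and made-up data:

```python
import math

def cube_bin(x, h):
    # index of the hypercube tile (side h) of a lattice tiling that
    # contains the point x; works in any dimension
    return tuple(math.floor(xi / h) for xi in x)

# made-up two-dimensional data, binned into squares of side 0.5
data = [(0.2, 1.7), (0.3, 1.9), (2.4, 0.1)]
counts = {}
for x in data:
    b = cube_bin(x, h=0.5)
    counts[b] = counts.get(b, 0) + 1

print(counts)  # {(0, 3): 2, (4, 0): 1}
```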
Regular tessellations in higher-dimensional space have counterintuitive properties. As an example, consider tiling by hypercubes, as illustrated in Figure 4 for squares in two dimensions. The tiling on the left-hand side in Figure 4 is a lattice tiling. In both tilings, we see that each tile has an entire side in common with at least one adjacent tile. This is a useful fact when we use the tiles as bins in data analysis, and it is always the case for a lattice tiling. It is also always the case in two dimensions. (To see this, make drawings similar to those in Figure 4.) In fact, in lower dimensions (up to six dimensions for sure), tilings by hypercubes of equal size always have the property that some adjacent tiles have an entire face (side) in common. It is an open question as to what number of dimensions ensures this property, but the property definitely does not hold in ten dimensions, as shown by Peter Shor and Jeff Lagarias. (See What's Happening in the Mathematical Sciences, Volume 1, American Mathematical Society, 1993, pages 21–25.)

[Figure 4: Hypercube (Square) Tilings of 2-Space]

5 Exercises

1. Let X be a standard 10-variate normal random variable (the mean is 0 and the variance-covariance matrix is diag(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)). What is the probability that ‖X‖ > 6? In other words, what is the probability of exceeding six sigma? Hint: Use polar coordinates. (Even then, the algebra is messy.)

2. In d dimensions, construct 2^d hyperspheres of radius 1 with centers at the points (±1, . . . , ±1), and construct the hypercube with edges of length 4 that contains these unit hyperspheres. At the point (0, . . . , 0), construct the hypersphere that is tangent to the other 2^d spheres.

[Figure: in two dimensions, four unit circles centered at (±1, ±1), with a small circle at the origin tangent to all four]

Is the interior hypersphere always inside the hypercube? (The answer is "No!") At what number of dimensions does the interior hypersphere poke outside the hypercube?
(See What's Happening in the Mathematical Sciences, Volume 1, American Mathematical Society, 1993.)

3. Consider a Cartesian coordinate system for IR^d, with d ≥ 2. Let x be a point in IR^d_+ such that ‖x‖_2 = 1 and x is equidistant from all axes of the coordinate system.

[Figure: the vector x in the positive orthant, making an angle θ with a positive coordinate axis]

What is the angle between the line through x and any of the positive axes? Hint: for d = 2, the angle is π/4. What are the angles as d → ∞?