Higher Dimensions
CSI 772
James E. Gentle
The most common statistical datasets can be thought of as rows, representing observations, and columns,
representing variables. In traditional multiple regression and correlation and other methods of multivariate analysis, there are generally few conceptual hurdles in thinking of the observations as ranging over a
multidimensional space. In multiple regression with m regressors, for example, it is easy to visualize the
hyperplane in m + 1 dimensions that represents the fit ŷ = Xβ̂. It is even easy to visualize the projection of
the n-dimensional vector that represents a least-squares fit.
Many properties of one- and two-dimensional objects (lines and planes) carry over into higher-dimensional
space just as we would expect.
Although most of our intuition is derived from our existence in a three-dimensional world, we generally
have no problem dealing with one- or two-dimensional objects. On the other hand, it can be difficult to
view a 3-D world from a two-dimensional perspective. The delightful fantasy, Flatland, written by Edwin
Abbott in 1884, describes the travails of a two-dimensional person (one “A. Square”) thrown into a three-dimensional world. (See also Stewart, 2001, Flatterland, Like Flatland Only More So.) The small book by
Kendall (1961), A Course in the Geometry of n Dimensions, gives numerous examples in which common
statistical concepts are elucidated by geometrical constructs.
There are many situations, however, in which our intuition derived from the familiar representations in
one-, two-, and three-dimensional space leads us completely astray. This is particularly true of objects whose
dimensionality is greater than three, such as volumes in higher-dimensional space. The problem is not just
with our intuition, however; it is indeed the case that some properties do not generalize to higher dimensions.
Exercise 2 illustrates such a situation.
The shape of a dataset is the total information content that is invariant under translations, rotations,
and scale transformations. Quantifying the shape of data is an interesting problem.
1 Data Sparsity in Higher Dimensions
We measure space both linearly and volumetrically. The basic cause of the breakdown of intuition in higher
dimensions is that the relationship of linear measures to volumetric measures is exponential in dimensionality.
The cubing we are familiar with in three-dimensional space cannot be used to describe the relative sizes of
volumes (that is, the distribution of space). Volumes relative to the linear dimensions grow very rapidly.
There are two consequences of this. One is that the volumes of objects with interior holes, such as thin boxes
or thin shells, are much larger than our intuition predicts. Another is that the density of a fixed number of
points becomes extremely small.
The density of a probability distribution decreases as the distribution is extended to higher dimensions
by an outer product of the range. This happens fastest going from one dimension to two dimensions but
continues at a decreasing rate for higher dimensions. The effect of this is that the probability content of
regions beyond a fixed distance from the center of the distribution increases; that is, outliers or isolated data points
become more common. This is easy to see in comparing a univariate normal distribution with a bivariate
normal distribution. If X = (X1, X2) has a bivariate normal distribution with mean 0 and variance-covariance
matrix diag(1, 1), then

    Pr(|X1| > 2) = 0.0455,

whereas

    Pr(‖X‖ > 2) = 0.135.
The probability that the bivariate random variable is greater than two standard deviations from the center is
much greater than the probability that the univariate random variable is greater than two standard deviations
from the center. We can see the relative probabilities in Figures 1 and 2. The area under the univariate
density that is outside the central interval shown is relatively small. It is about 5% of the total area. The
volume under the bivariate density in Figure 2 beyond the circle is relatively greater; it is about 13% of the total volume. The percentage increases with the dimensionality (see
Exercise 1).
The consequence of these density patterns is that an observation in higher dimensions is more likely to
appear to be an outlier than one in lower dimensions.
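The two tail probabilities above are easy to verify numerically: if X is a d-variate standard normal random variable, then ‖X‖^2 has a chi-squared distribution with d degrees of freedom, and for d = 2 the chi-squared survival function is simply exp(−x/2). A quick sketch in Python, using only the standard library:

```python
import math

# Univariate tail: Pr(|X1| > 2) for X1 ~ N(0,1), via the complementary
# error function: 2*(1 - Phi(2)) = erfc(2/sqrt(2)).
p_uni = math.erfc(2 / math.sqrt(2))

# Bivariate tail: ||X||^2 ~ chi-squared(2), whose survival function is
# exp(-x/2); so Pr(||X|| > 2) = Pr(chi-squared_2 > 4) = exp(-2).
p_biv = math.exp(-2)

print(round(p_uni, 4))  # 0.0455
print(round(p_biv, 4))  # 0.1353
```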
Figure 1: Univariate Extreme Regions

Figure 2: Bivariate Extreme Regions
2 Volumes of Hyperspheres and Hypercubes
It is interesting to compare the volumes of regular geometrical objects and observe how the relationships of
volumes to linear measures change as the number of dimensions changes. Consider, for example, that the
volume of a sphere of radius a in d dimensions is

    a^d π^(d/2) / Γ(1 + d/2).

The volume of a superscribed cube is (2a)^d. Now, compare the volumes. The ratio of the volume of the sphere to that of the cube is

    π^(d/2) / (d 2^(d−1) Γ(d/2)).
For d = 3 (Figure 3), this is 0.524; for d = 7, however, it is 0.037. As the number of dimensions increases,
more and more of the volume of the cube is in the corners.
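This ratio is simple to evaluate with the gamma function. A quick sketch in Python (standard library only; the function name is ours):

```python
import math

def sphere_to_cube_ratio(d):
    """Ratio of the volume of a d-dimensional sphere to that of its
    superscribed cube: pi^(d/2) / (d * 2^(d-1) * Gamma(d/2))."""
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))

# The ratio falls off rapidly with the dimension.
for d in (2, 3, 7, 10):
    print(d, round(sphere_to_cube_ratio(d), 3))
```

For d = 3 this gives 0.524 and for d = 7 it gives 0.037, matching the values in the text.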
For two objects of different sizes but the same shape, with the smaller one centered inside the larger one,
we have a similar phenomenon of the content of the interior object relative to the larger object. The volume
of a thin shell of thickness ε, as a ratio of the volume of the outer figure (sphere, cube, whatever), is

    (V_d(r) − V_d(r − ε)) / V_d(r) = 1 − (1 − ε/r)^d.
As the number of dimensions increases, more and more of the volume of the larger object is in the outer thin
shell. This is the same phenomenon that we observed above for probability distributions. In a multivariate
distribution whose density is the product of identical univariate densities (which is the density of a simple
random sample), the relative probability content within extreme regions becomes greater as the dimension
increases.
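The shell formula makes the effect easy to tabulate. A small sketch, assuming a shell whose thickness is one-tenth of the radius:

```python
# Fraction of the volume of a d-dimensional object of "radius" r lying in
# an outer shell of thickness eps: 1 - (1 - eps/r)**d.
def shell_fraction(d, eps_over_r):
    return 1 - (1 - eps_over_r) ** d

# Even a shell only one-tenth the radius dominates in high dimensions.
for d in (1, 3, 10, 50):
    print(d, round(shell_fraction(d, 0.1), 4))
```

For d = 1 the shell holds 10% of the volume; by d = 10 it holds about 65%, and by d = 50 over 99%.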
Figure 3: A Superscribed Cube
3 The Curse of Dimensionality
The computational and conceptual problems associated with higher dimensions have often been referred
to as “the curse of dimensionality”. How many dimensions cause problems depends on the nature of the
application. Golub and Ortega (1993) use the phrase in describing the solution to the diffusion equation in
three dimensions, plus time as the fourth dimension, of course.
In higher dimensions, not only do data appear as outliers, but they also tend to lie on lower dimensional
manifolds. This is the problem sometimes called “multicollinearity”. The reason that data in higher dimensions are multicollinear, or more generally, concurve, is that the number of lower dimensional manifolds
increases very rapidly in the dimensionality: the rate is 2^d.
Whenever it is possible to collect data in a well-designed experiment or observational study, some of the
problems of high dimensions can be ameliorated. In computer experiments, for example, Latin hypercube
designs can be useful for exploring very high dimensional spaces.
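A Latin hypercube sample in the unit hypercube can be generated with nothing more than one random permutation per axis. A minimal sketch (the function name and interface are ours, not from any particular package):

```python
import random

def latin_hypercube(n, d, seed=0):
    """A Latin hypercube sample of n points in [0, 1)^d: along every axis,
    each of the n equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    # One random permutation of the n strata for each of the d axes.
    perms = [rng.sample(range(n), n) for _ in range(d)]
    # Point i gets a uniform draw within its assigned stratum on each axis.
    return [[(perms[j][i] + rng.random()) / n for j in range(d)]
            for i in range(n)]

pts = latin_hypercube(5, 3)
```

Each of the n points lands in a distinct slab along every coordinate, so even a modest sample spreads over all d dimensions at once.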
Data in higher dimensions carry more information in the same number of observations than data in lower
dimensions. Some people have referred to the increase in information as the “blessing of dimensionality”.
The support vector machine approach in fact attempts to detect structure in data by mapping the data to
higher dimensions.
4 Tiling Space
Tessellations of the data space are useful in density estimation and in clustering and classification. Generally,
regular tessellations, or tilings (objects with the same shapes), are used. Regular tessellations are easier both
to form and to analyze.
Regular tessellations in higher dimensional space have counterintuitive properties. As an example, consider tiling by hypercubes, as illustrated in Figure 4 for squares in two dimensions.
The tiling on the left-hand side in Figure 4 is a lattice tiling. In both tilings, we see that each tile has an
entire side in common with at least one adjacent tile. This is a useful fact when we use the tiles as bins in
data analysis, and it is always the case for a lattice tiling. It is also always the case in two dimensions. (To
see this, make drawings similar to those in Figure 4.) In fact, in lower dimensions (up to six dimensions for
sure), tilings by hypercubes of equal size always have the property that some adjacent tiles have an entire
face (side) in common. It is an open question as to what number of dimensions ensures this property, but
the property definitely does not hold in ten dimensions, as shown by Peter Shor and Jeff Lagarias. (See
What’s Happening in the Mathematical Sciences, Volume 1, American Mathematical Society, 1993, pages
21–25.)
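Since the tiles are typically used as bins in data analysis, here is a minimal sketch of binning points into a lattice tiling of hypercubes, i.e., a simple multivariate histogram (the function name and interface are ours):

```python
import math
from collections import Counter

def hypercube_bin_counts(points, width):
    """Count points falling in each hypercube tile of the given edge width
    in a lattice tiling aligned with the coordinate axes."""
    counts = Counter()
    for p in points:
        # The tile is identified by the integer lattice coordinates
        # of its lower-left corner.
        counts[tuple(math.floor(c / width) for c in p)] += 1
    return counts

counts = hypercube_bin_counts([(0.2, 0.9), (0.3, 0.7), (1.4, 0.1)], 1.0)
```

Here the first two points share the tile with corner (0, 0) and the third falls in the tile with corner (1, 0).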
Figure 4: Hypercube (Square) Tilings of 2-Space
5 Exercises
1. Let X be a standard 10-variate normal random variable (the mean is 0 and the variance-covariance
is diag(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)). What is the probability that ‖X‖ > 6? In other words, what is the
probability of exceeding six sigma?
Hint: Use polar coordinates. (Even then, the algebra is messy.)
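As a numerical check on your derivation (not a substitute for it): ‖X‖^2 has a chi-squared distribution with 10 degrees of freedom, and for even degrees of freedom 2k the chi-squared survival function reduces to a finite sum, so the probability can be evaluated directly. A sketch:

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-squared survival function for even df = 2k:
    Pr(chi-squared_df > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!."""
    assert df % 2 == 0
    t = x / 2
    return math.exp(-t) * sum(t ** j / math.factorial(j)
                              for j in range(df // 2))

# Pr(||X|| > 6) = Pr(chi-squared_10 > 36)
p = chi2_sf_even_df(36.0, 10)
print(p)  # about 8.4e-05
```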
2. In d dimensions, construct 2^d hyperspheres with centers at the points (±1, . . . , ±1), and construct the hypercube with edges of length 4 that contains the unit hyperspheres. At the point (0, . . . , 0), construct the hypersphere that is tangent to the other 2^d spheres. In two dimensions, the spheres appear as follows.
[Figure: four unit circles centered at (1, 1), (−1, 1), (−1, −1), and (1, −1), with the small central circle at the origin tangent to all four.]
Is the interior hypersphere always inside the hypercube? (The answer is “No!”) At what number of
dimensions does the interior hypersphere poke outside the hypercube? (See What’s Happening in the
Mathematical Sciences, Volume 1, American Mathematical Society, 1993.)
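The geometry can be explored numerically: each corner center (±1, . . . , ±1) lies at distance √d from the origin, so the interior sphere has radius √d − 1, while the containing cube (assumed here to span [−2, 2]^d) has half-width 2. A sketch:

```python
import math

# Radius of the interior sphere tangent to the 2^d unit spheres centered
# at (+-1, ..., +-1): sqrt(d) - 1. The cube is assumed to span [-2, 2]^d.
for d in range(2, 12):
    r = math.sqrt(d) - 1
    print(d, round(r, 3), "outside the cube" if r > 2 else "inside")
```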
3. Consider a Cartesian coordinate system for ℝ^d, with d ≥ 2. Let x be a point in the positive orthant ℝ^d_+ such that ‖x‖₂ = 1 and x is equidistant from all axes of the coordinate system.
[Figure: the vector x in the positive orthant and the angle θ it makes with a positive coordinate axis.]
What is the angle between the line through x and any of the positive axes? Hint: for d = 2, the angles
are ±π/4.
What are the angles as d → ∞?
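A numerical exploration (the equidistant unit vector is x = (1/√d, . . . , 1/√d), so the cosine of each angle is 1/√d):

```python
import math

# Angle between the equidistant unit vector x = (1/sqrt(d), ..., 1/sqrt(d))
# and a positive coordinate axis: cos(theta) = 1/sqrt(d).
for d in (2, 3, 10, 100, 10**6):
    theta = math.acos(1 / math.sqrt(d))
    print(d, round(math.degrees(theta), 3))
```

For d = 2 this reproduces the 45-degree hint, and the angles approach 90 degrees as d grows: in high dimensions the "diagonal" direction is nearly orthogonal to every axis.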