
6.893 FINAL PROJECT: QUANTILE ESTIMATION IN MANY DIMENSIONS
JOSH ALMAN AND JON SCHNEIDER

May 2013
1. Introduction
In our project, we consider the following generalization of the median
estimator problem:
Problem 1. Given a stream S of n vectors in R^d, followed by a query vector v, estimate the maximum (or median, or rth quantile) of {a · v | a ∈ S}, the set of projections of the vectors from S onto v.
There is a clear algorithm that takes linear time and space to solve this
problem exactly: store all of S, then compute the projections onto v, and
find the median using a linear time median finding algorithm. We seek
an algorithm to estimate the answer using sublinear time and space. In
particular, we typically envision n ≫ d, or even d constant, and so we seek
time and space complexities that are independent of n. In our project, we
focus mainly on space complexity.
In addition to being interesting in its own right, this problem could
have useful applications, as comparisons of vector projections are
ubiquitous in combinatorial optimization, data analysis, and throughout
computer science. The variant of our problem where we find the maximum
is particularly significant in the context of optimizing an unknown weight
function over a set of given points, a common problem in combinatorial
optimization.
1.1. Estimation. The important question to ask before solving our problem is what we mean by "estimate". There are two reasonable notions of estimation: quantile estimation and geometric estimation.
1.1.1. Quantile Estimation. In quantile estimation, we want our algorithm to return a value that is within εn elements of our stream from the actual quantile we want. Hence, if we are estimating the rth quantile, we need to return a value between the (r − ε)th and (r + ε)th quantiles.
However, for fixed d, quantile estimation is just as hard in one dimension
as it is in d > 1 dimensions. Indeed, recall our sublinear time quantile
estimation algorithm from class for one dimension in the sampling model:
we sample O(log(1/δ)/ε²) points, and return the rth quantile of our sample.
With probability 1 − δ, this value is within εn elements of the real rth quantile
by the Chernoff bound. This method carries over to d > 1 dimensions in
either the streaming or the sampling model. If we sample or remember a
uniformly random selection of O(log(1/δ)/ε²) of the vectors of S, then no
matter what query vector v we get, the rth quantile of the projections of the
vectors we remember onto v is within εn elements of the real rth quantile
with probability 1 − δ.
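To make this concrete, the following is a minimal Python sketch of the sampling scheme (the function name, the use of NumPy, and sampling without replacement are our own illustrative choices; constants are suppressed):

import numpy as np

def sampled_quantile(S, v, r, eps, delta, rng=None):
    # Keep a uniform sample of k = O(log(1/delta)/eps^2) stream vectors; by
    # a Chernoff bound, the r-th quantile of the sampled projections lies
    # within eps*n stream elements of the true r-th quantile of {a . v}
    # with probability at least 1 - delta.
    rng = rng or np.random.default_rng()
    S = np.asarray(S, dtype=float)                 # n x d stream vectors
    k = min(len(S), int(np.ceil(np.log(1 / delta) / eps ** 2)))
    sample = S[rng.choice(len(S), size=k, replace=False)]
    return np.quantile(sample @ np.asarray(v, dtype=float), r)

In the streaming model, the same uniform sample can be maintained in one pass with reservoir sampling rather than by indexing into a stored array.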
For this reason, the quantile estimation variant of our problem is not very
interesting. We instead focus on the geometric estimation variant.
1.1.2. Geometric Estimation. In geometric estimation, we want our algorithm to return a value that is within a factor of (1 ± ε) of the rth quantile. In the rest of this paper, we prove the following lower bounds, when d is fixed:
Theorem 1. Approximating the maximum to within 1 ± ε requires Θ(ε^{-(d-1)/2}) space.
Theorem 2. Approximating the median to within 1 ± ε requires Ω(n) space (when d > 1).
Theorem 3. Approximating the rth quantile (for r ≠ 0, 1/2, 1) to within 1 ± ε requires Ω(√n) space (when d > 1).
2. Approximating the Maximum
We begin with the problem of estimating the maximum of the projections of the vectors of S onto v. We will first give an algorithm for the problem that uses O(ε^{-(d-1)/2}) space, and then prove a matching lower bound of Ω(ε^{-(d-1)/2}) on the space required for the problem, in order to prove Theorem 1. Both our upper and lower bounds will make use of spherical codes for
efficiently packing and covering the sphere with circles. We prove the results
about sizes of spherical codes that we use in Appendix A.
2.1. An Algorithm. To give an upper bound on the space required for approximating the maximum, we provide an algorithm for the problem that uses ideas from core-sets in computational geometry. The algorithm is a simplification of an algorithm for computing ε-kernels that was found independently by Chan [2] and Yu et al. [5].
We first give some intuition for the algorithm. Say that there is some direction v0 for which we know that the vector of S whose projection onto v0 is maximal is s0. Then, in any other direction v, we could project s0 onto v as well. If θ is the angle between v0 and v, then projecting s0 onto v gives a value within a factor of cos θ of the largest projection of a vector of S onto v. In other words, knowing the maximum projection onto v0 gives a cos θ approximation for any query vector at angle θ from v0. The proof of this can be found in [1, Lemma 2.1].
The idea for our algorithm, then, is to pick θ such that cos θ = 1 − ε, namely to pick θ ≈ √ε, and then select some directions vi so that any direction is within angle θ of one of the vi s we pick. If we find a covering of the unit sphere by circles of radius θ (see Figure 1), then picking our vi to be the vectors to the centers of each of the circles will give us this property. We know that a sphere covering exists that uses O(θ^{-(d-1)}) = O(ε^{-(d-1)/2}) circles (see Lemma 4). By picking the corresponding vi vectors, we thus get an algorithm using this much space.

[Figure 1. Covering the sphere with radius-θ circles: (a) an illustration of two radius-θ circles on a sphere; (b) a circle covering of the sphere.]
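To illustrate, here is a sketch of the algorithm in Python, specialized to d = 2, where a covering of the unit sphere (here, a circle) by radius-θ arcs is just O(ε^{-1/2}) equally spaced directions. The names and constants are our own, and, as in the ε-kernel approach, we retain the maximizing vector itself for each direction:

import numpy as np

def build_sketch(stream, eps):
    # Directions spaced theta ~ sqrt(eps) apart cover the unit circle.
    theta = np.sqrt(eps)
    k = int(np.ceil(2 * np.pi / theta))        # k = O(eps^{-1/2}) directions
    angles = 2 * np.pi * np.arange(k) / k
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    best = np.zeros((k, 2))                    # per-direction maximizer so far
    best_val = np.full(k, -np.inf)
    for a in stream:                           # one pass, O(k) space
        a = np.asarray(a, dtype=float)
        projs = dirs @ a
        improved = projs > best_val
        best_val[improved] = projs[improved]
        best[improved] = a
    return best[best_val > -np.inf]

def query_max(sketch, v):
    # Some retained vector maximizes a direction within angle theta of v, so
    # its projection is within a factor cos(theta) ~ 1 - eps of the maximum.
    return float(np.max(sketch @ np.asarray(v, dtype=float)))

Storing the maximizing vectors (rather than just the k maximum values) is what lets a single sketch answer queries in every direction at once.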
In the next section, we prove a lower bound that meets the space bound
we just achieved.
2.2. Hardness of Approximating the Maximum. In the previous section we saw that we can approximate the maximum to within 1 + ε using O(ε^{-(d-1)/2}) space. In this section, we will see that this much space is required.
The key idea in the proof is that the information in the data set is hard
to compress; in order to encode answers to all possible queries, we need this
amount of space. As a corollary, it follows that this hardness result extends
to a far more general model of computation than the streaming model:
even in the model where we have unlimited computational power
while processing the data set, we need to remember at least Ω(ε^{-(d-1)/2})
bits of information to answer the subsequent query.
Theorem 4. Approximating the maximum to within 1 ± ε requires Ω(ε^{-(d-1)/2}) space.
Proof. As mentioned above, we wish to show that it is hard to compress the information in such a way that we can still answer our desired queries. To do this, we will demonstrate a family of possible inputs, such that any two inputs differ significantly (i.e., by at least a factor of (1 + ε)² ≈ 1 + 2ε) on some possible query vector.
We will construct this family as follows. Choose a set V = {v1, v2, . . . , vN} of unit vectors in R^d such that any two vectors in V are at angle at least θ apart (where, as before, we choose θ ≈ √ε so that cos θ = (1 + ε)^{-2} ≈ 1 − 2ε).
[Figure 2. Packing the sphere with radius-θ circles.]
Note that this corresponds to a sphere packing of θ-radius circles on the surface of a (d − 1)-dimensional sphere (see Figure 2). It is known that such a set of size N = Ω(θ^{-(d-1)}) exists (see Lemma 3).
Given V, construct our set S of input vectors by, for each vi, either including vi in S or including (1 + ε)²vi in S. Since we have two choices per vector in V, our family of possible inputs contains 2^|V| = 2^N possible inputs.
Next, let mi = max⟨w, vi⟩, where w ranges over all vectors in S (in other words, mi is the answer to the query with query vector vi). We claim that mi = 1 if vi ∈ S and mi = (1 + ε)² if (1 + ε)²vi ∈ S. To show this, it suffices to notice that, since every other element of S is at angle at least θ from vi, the maximum of ⟨w, vi⟩ over all w ∈ S not proportional to vi is at most (1 + ε)² cos θ ≤ 1. It follows that vi or (1 + ε)²vi is the maximizer of max⟨w, vi⟩.
It now follows that for any two of the 2^N possible choices of input, there is a query vector for which the two exact outputs differ by a factor of at least (1 + ε)². Any correct algorithm must present different outputs for these two different inputs, and therefore any correct algorithm must be able to distinguish perfectly between all 2^N possible input sets. The required space complexity for such an algorithm is Ω(N) = Ω(θ^{-(d-1)}) = Ω(ε^{-(d-1)/2}), as desired.
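For intuition, this family is easy to write down in d = 2, where θ-separated unit vectors are simply equally spaced directions along a half-circle. The following sketch (our own illustration; the family is exponentially large by design) enumerates the 2^N inputs:

import numpy as np

def hard_input_family(eps):
    # N = Omega(1/theta) unit vectors, pairwise at angle >= theta, where
    # cos(theta) = (1 + eps)^{-2}, spread over a half-circle.
    theta = np.arccos((1.0 + eps) ** -2)
    N = int(np.pi / theta)
    angles = theta * np.arange(N)
    V = np.column_stack([np.cos(angles), np.sin(angles)])
    for mask in range(2 ** N):                 # one input per subset of V
        bits = [(mask >> i) & 1 for i in range(N)]
        scale = np.where(bits, (1.0 + eps) ** 2, 1.0)
        yield V * scale[:, None]               # query v_i answers 1 or (1+eps)^2

Since the answer to query vi reveals the ith bit of the chosen subset, the memory state of any correct algorithm must distinguish all 2^N inputs.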
3. Approximating the Median
In the previous section, we saw that we can approximate our generalized notion of maximum in space independent of n. A natural question is
whether we can obtain such a result for the median (perhaps by modifying
the maximum algorithm to instead store the median answer in a variety of
different directions).
Unfortunately, we will see that this is impossible (when d > 1). In this
section we will show that any algorithm that estimates the median to within
1 ± ε must use at least Ω(n) space (and in particular, no sublinear algorithm can exist for it). A similar yet weaker lower bound of Ω(√n) likewise carries over to the general problem of computing rth quantiles for general r.
As in the previous section, our core approach will be the same. We will
show that the required information is hard to compress; in particular, that
there are at least 2N different data sets with O(N ) points that we must
successfully distinguish between.
Theorem 5. Approximating the median to within 1 ± ε requires Ω(n) space (when d > 1).
Proof. It suffices to prove this for the case where d = 2; in any higher
dimension, we can just restrict all our points to lie in a plane.
Choose N vectors V = {v1, v2, . . . , vN} in R² such that no two are parallel, and choose any subset S of V. We will show how to construct an input set with O(N) points such that for any vi ∈ S, the median is 0, whereas for any vi ∈ V \ S, the median is nonzero (and hence multiplicatively separated from 0). Since there are 2^N possible choices of S, it follows that we need Ω(N) = Ω(n) space to distinguish between all of these possibilities.
To construct our desired input set, we will incrementally add points, starting with an input set which contains only the origin. For each v ∈ S, draw the line ⟨v, x⟩ = 0. These lines divide R² into 2|S| different regions (see Figure 3). For the median in each of these directions to occur at 0, we would like there to be the same number of points on each side of each line. Currently this condition is satisfied, since there are no points in any of the regions.
Now, consider in turn each v ∈ V \ S. For each v, draw the line ℓv given by ⟨v, x⟩ = 0; we would like there to be a different number of points on each side of this line. Each such line intersects two existing regions; label these regions R and −R. If either R or −R contains any points, then we can just move all these points to the same side of ℓv while not changing the number of points in any existing region (and hence keeping the medians across all other lines the same). On the other hand, if there are no points in R or −R, we can first add one point to each region; note that doing so preserves the absolute difference between the number of points on each side of each already existing line. We can then reduce this to the previous case.
[Figure 3. Dividing R² into regions.]
Altogether, since we consider at most n different directions v, we add at
most 2n points to our input set, so our resulting input set has size O(n), as
desired.
Remark: In the above proof, we've exploited the fact that to predict 0 up to a multiplicative factor, we need to predict it exactly. Such a proof is not necessary, however; we can straightforwardly adapt the same proof as above to the case where all input vectors have magnitude at least C (for any fixed C). The main change is that, instead of considering lines through the origin, we consider lines through some arbitrary point p, and now our regions are constrained by additional error margins about each line; see Figure 4.
We can adapt the proof above to the case of approximating the rth quantile, unfortunately at the cost of a weaker lower bound.
Theorem 6. Approximating the rth quantile (for r ≠ 0, 1/2, 1) to within 1 ± ε requires Ω(√n) space (when d > 1).
Proof. The main reason why the proof of Theorem 5 above fails in this case is that, by adding one point to each of the regions R and −R, we can no longer guarantee that the rth quantile in an existing direction has not changed (whereas we could for the median). Below we fix our proof to address this problem.

[Figure 4. Dividing R² into regions with ε-margins.]
Write r = p/(p + q) in lowest terms, and without loss of generality assume r > 1/2 (so that p > q). When originally choosing the set V of vectors, choose them all so they lie in a common half-plane (e.g., make all of them have positive x-component). Now, each vector v once again defines a line ⟨v, x⟩ = 0 which divides R² into two regions; this time, call the region ⟨v, x⟩ > 0 the "positive" region for v and call the region ⟨v, x⟩ < 0 the "negative" region for v. Since all vectors v lie in a common half-plane, it follows that there exists some region Rp that lies in all of the positive regions.
Now, we claim that, if there are 2n regions, then adding (p − q)n + q points to the region Rp and q points to each other region does not change, for any direction, whether the rth quantile is 0. To see why this is true, note that for every line, one side of the line (the side containing Rp) gets ((p − q)n + q) + (n − 1)q = pn new points, whereas the other side of the line gets qn new points. It follows that if the ratio of the number of points on the two sides of the line was already p : q (which is what it means for the rth quantile to lie at 0), it will stay equal to p : q.
Since this adds at least one point to every region, we can perform this operation in lieu of adding one point to each of R and −R (as we did for the median),
and our new construction thus works. We now possibly add O(N) points for each of the N directions, so our resulting input set has at most n = O(N²) points, from which it follows that any approximation algorithm requires at least Ω(√n) space.
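The counting step above is pure arithmetic, and a quick check (our own) confirms it with the symbols of the proof, where one side of a line holds Rp together with n − 1 other regions and the other side holds the remaining n regions:

def check_counts(p, q, n):
    # Adding (p - q)*n + q points to R_p and q points to each of the other
    # 2n - 1 regions gives the R_p side p*n new points and the other side
    # q*n new points, preserving a p : q split across every line.
    side_with_Rp = ((p - q) * n + q) + (n - 1) * q
    other_side = n * q
    assert side_with_Rp == p * n and other_side == q * n

for p, q, n in [(2, 1, 5), (3, 2, 7), (5, 1, 4)]:
    check_counts(p, q, n)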
4. Further Steps
There are many possible avenues for extending and generalizing the results
of this paper.
One such avenue is strengthening the lower bound for rth quantile finding.
At heart, the rth quantile problem seems very similar to the median problem, so a similar linear bound seems attainable (possibly via an improved
construction).
Another is proving lower bounds for median approximation in more restricted settings. Currently, our construction requires that our input points
can range across all of Rd . While we can strengthen our construction somewhat (so that we only require input points that have magnitude at least C,
for example), our results don’t seem to extend to cases where all our input
points must lie in a partially-bounded region. For example, if all input points
and query vectors must lie in (R+)^d (i.e., have all positive components), is it still true that we require Ω(n) space? What if all points lie in [1, 2]^d? In
this latter case, it seems like we should be able to subdivide this region in
such a way that we can answer such queries in space independent of n.
Finally, there are other models for median estimation which we have not
considered in this paper. For example, in [3] Greenwald and Khanna present
an algorithm for deterministic median estimation in the quantile estimation
model. Likewise, in [4] Guha and McGregor present an algorithm for median
estimation in the random stream model (where we can choose to receive the
elements in the stream in a random order). It would be interesting to adapt
either algorithm to this higher-dimensional setting.
References
[1] P. Agarwal, S. Har-Peled, and K. Varadarajan. Geometric Approximation via Coresets. Survey available at http://valis.cs.uiuc.edu/sariel/research/papers/04/survey/survey.pdf
[2] T. M. Chan. Faster coreset constructions and data stream algorithms in fixed dimensions. In Proc. 20th Annu. ACM Sympos. Comput. Geom., 152–159, 2004.
[3] M. Greenwald and S. Khanna. Space-Efficient Online Computation of Quantile Summaries. In SIGMOD, 58–66, 2001.
[4] S. Guha and A. McGregor. Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams. SIAM Journal on Computing, 38(5): 2044–2059, 2008.
[5] H. Yu, P. K. Agarwal, R. Poreddy, and K. R. Varadarajan. Practical methods for
shape fitting and kinetic data structures using core sets. In Proc. 20th Annu. ACM
Sympos. Comput. Geom., 263–272, 2004.
Appendix A. Spherical Codes
We prove some lemmas about sizes of spherical codes that are used in our
bounds on the maximum estimation problem in Section 2. We will show
that the best coverings and the best packings of the (d − 1)-sphere of radius
1 by (d − 2)-spheres of radius θ have size Θ(θ^{-(d-1)}), when d is constant.
Lemma 1. A packing of the (d − 1)-sphere of radius 1 by (d − 2)-spheres of radius θ uses O(θ^{-(d-1)}) (d − 2)-spheres.
Proof. We know from calculus that, if S_d is the surface area of the radius-1 d-sphere, and V_d is its volume, then for all d:

S_{d−1} = 2π V_{d−2}.

Moreover, if V′_{d−2} is the volume of a (d − 2)-sphere of radius θ, then

V′_{d−2} = V_{d−2} · θ^{d−1}.

If we pack (d − 2)-spheres onto a (d − 1)-sphere, then the total volume of the (d − 2)-spheres cannot exceed the surface area of the (d − 1)-sphere, since the (d − 2)-spheres do not overlap. Hence, the maximum number of (d − 2)-spheres we can pack is

S_{d−1} / V′_{d−2} = O(θ^{-(d-1)}).
Lemma 2. A covering of the (d − 1)-sphere of radius 1 by (d − 2)-spheres of radius θ uses Ω(θ^{-(d-1)}) (d − 2)-spheres.

Proof. If we use (d − 2)-spheres to cover a (d − 1)-sphere, then the total volume of the (d − 2)-spheres must be at least the surface area of the (d − 1)-sphere. Hence, the minimum number of (d − 2)-spheres we need is again

S_{d−1} / V′_{d−2} = Ω(θ^{-(d-1)}).
Lemma 3. A packing of the (d − 1)-sphere of radius 1 by (d − 2)-spheres of radius θ can be found that uses Ω(θ^{-(d-1)}) (d − 2)-spheres.

Proof. We can repeatedly place (d − 2)-spheres on the (d − 1)-sphere in any way we want so that no two intersect, until we have a maximal packing, meaning it is no longer possible to place a new (d − 2)-sphere without it intersecting one that was already placed. Then, we could increase the radius of each sphere we placed from θ to 2θ. This gives a covering of the (d − 1)-sphere, since if any point were not covered, then we could have centered a (d − 2)-sphere there initially. The result follows from our bound in Lemma 2.
Lemma 4. A covering of the (d − 1)-sphere of radius 1 by (d − 2)-spheres of radius θ can be found that uses O(θ^{-(d-1)}) (d − 2)-spheres.

Proof. We can take any maximal packing of the (d − 1)-sphere by (d − 2)-spheres of radius θ/2 and then double the radii of the (d − 2)-spheres to get a covering as above of the desired size.