
The
space
complexity
of approximating
Noga Alon *
Yossi
the
Matias
t
Mario
The numbers
Abstract
provide
The
frequency
ments
moments
of type
$\sum_{i=1}^{n} m_i^k$.
for
that
z ~
Surprisingly,
are given
and F2 can be approximated
plications
1
containing
are the
Fk, when
that
in logarithmic
bases are mentioned
features
the el-
the numbers
Haas
be
tems
space.
values
Ap-
as well.
A=(al,
az, . . ..a~
each a~ is a member
a,j
~}
=
denote the number
=
quence
) be a sequence
of N
I
A, and define,
{1,2,
of elements,
. . . ,n}.
Let
of occurrences
m,
where
=
I{j
of i in the
:
se-
determine
for data
as discussed
and Poosala
small
is based
ing in the sequence,
and
Fz is the
of distinct
elements
F1 ( = m) is the length
repeat
rate
needed
in order
quence
(see, e.g., [11]).
or
to compute
Gini’s
the
index
It is also natural
rithms
se-
the
*AT&T
ment
Exact
of
Bell
Labs,
Mathematics,
Sciences,
grant
Israel
t
and
Academy
AT&T
Bell
matias@research
t AT&T
ms@research.
att.
Aviv
NJ
and
Tel
il.
Research
supported
the
Fund
Basic
of
Sciences.
Labs,
for
applications,
07974,
USA
Beverly
University,
by
.att.
Bell
Hill,
Raymond
Tel
[email protected].
BSF
Murray
Sackler
Aviv,
in
Research
and
part
Faculty
Israel.
by
Departof
it may
frequency
moments;
purposes,
the
ments
by
The selection
note
that
sampling
proposed
F2 is the
plays
be beneficial
based
a hybrid
based
essentially
on the
algo-
approach
degree
by the function
an important
of
F2.
role for many
to maintain
and, most notably,
computation
The
Murray
Hill,
NJ
07974,
USA.
Email:
Murray
Hill,
NJ
07974,
USA.
Email:
of a relation
as the records
the
general
bution
com.
Labs,
for a
distri-
estimates
for
for F2. For efficiency
of estimates
for
frequency
mo-
Email:
a USA-Israel
administered
of
Their
histograms
the frequency
itself
is selected
measured
Since skew information
= ~~%~n m~ .
——
costs.
of relations.
with
FO, and
algorithm
skew of the data,
F-
appropriate
Haas et al [13] considered
for estimating
in which
to define
in the context
size of such join.
recently
of homogeneity
index of the
et al [6] (see
access plan
to approximate
a relation
de-
F2 is used by Ioan-
estimation
in the attributes
joining
output
of the sequence,
surprise
of values
involves
appear-
on selecting
of values
the
of algorithms
by DeWitt
sizes and
in many
for example,
In particular,
result
the degree
consideration
the selection
[14] for error
query
number
bution
F0 is the number
therein).
FO for
attribute.
Thus,
gree of the skew may
sys-
of distinct
Fk for k ≥ 2 indicate
is of major
applications.
partitioning,
database
number
i.e., the function
of the relation
which
estimating
the
Indeed,
optimization
object-relational
in a relation,
moments
important
and are important
all query
assessing
The frequency
nidis
for each k ≥ 0
for
consisting
also references
of A and
applications.
virtually
and
a means
database
the data,
of database
that
of skew of the data,
method
In particular,
context
of an attribute
parallel
moments
of a data set represent
about
relational
the sequence
Introduction
Let
in
require
the frequency
moments
information
in the
$
on the sequence.
et al [13] claim
methods
FO, F1
space, whereas
n^{Ω(1)}
statistics
demographic
moments
Szegedy
Fk are called
useful
The frequency
Fk =
one by one and cannot
out
of Fk for k ≥ 6 requires
to data
m~ ele-
numbers
of randomized
the numbers
it turns
the approximation
n,
the space complexity
approximate
of the sequence
stored.
of a sequence
1 ~
We consider
algorithms
ements
i,
frequency
Note
com.
(exact)
Permission to make digital/hard copies of all or part of this material for
personal or classroom use is granted without fee provided that the copies
are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is
given that copyright is by permission of the ACM, Inc. To copy otherwise,
to republish, to post on servers or to redistribute to lists, requires specific
permission and/or fee.
STOC '96, Philadelphia PA, USA
© 1996 ACM 0-89791-785-5/96/05 $3.50
on the
value
of the
data,
2C{l,2,
view
...,
maintaining
n},
has been
which
such
aa distri-
well-studied
maintenance
m the
(cf. [10]).
to maintain
by maintaining
a counter
requires
and updated
to the database.
views,
straightforward
moments
i.e.,
are inserted
data
it is rather
frequency
be done
of maintaining
of incremental
that
preferably
of the relation
approach
statistics,
problem
should
a full
m,
memory
for
the
histogram
each data
of size Ω(n)
(cf. [16]).
However,
computing
memory
tures
requirements
in external
overhead
memory,
require
which
and
incoming
that
quires
would
size is further
typically
relations
data
memory
the data
imply
The
on Chebyshev’s
application
restriction
some concluding
Thus,
are several
known
randomized
memory.
For
simplicity,
of approximating
these
say with
success
2
eral case, that
ris
(see also
Martin
error
(=
O(lg n)
bits
with
that
strong
required
for every
ing at most
of memory.
using
however,
of hash
bounds
to approximate
approximating
algorithm
(not surprisingly)
using
algorithms
respectively
The
very
for F-
there
moments
algorithm
and
that
simple
of [17] and
tools
a
Fk
one can
as well
result
as in the next
the given
can then
a random
variable
that
space constraints,
variance
whose
is relatively
be deduced
ran-
is a very natural
from
small.
Chebyshev’s
requires
2.1
Fk as well
2 we describe
o
Ω(n)
(randomly)
of the
approximating
Flajolet-
hash
functions,
bits,
viates from
Proof.
and
estimating
F1 and
define
here include
that
of members
and every ε > 0
computes,
given
of N={l,2,
(
klg (l/~)nl-llk(lgn
a
. . ..n}.
+ lgm)
J2
)
a number
Fk by more
Without
.SI =
presentation
FO
concerning
that
the space com-
approximate
the fre-
as on the space complexity
abstract
that
compute
is organized
space-efficient
the
algorithm
Y so that
the probability
than λFk
as at most
that Y de-
c.
FO can be implemented
linear
algorithms
our
For every k ≥ 1, every λ > 0
in one pass and using
frequency
the known
explicit
moments.
we omit,
from
now on, all floor
and ceiling
on the required
modifications
computes
their
variables
median
X~j
identically
of the variables
Choose
constructions
a random
We first
in advance,
if this
Each
distributed
assume the length
and then
variables
Yi is the
≤ s1, where
random
is computed
from
O(lg n + lg m) memory
member
the
signs
comment
is not the case.
s2 random
Y.
: 1 ~ j
X = Xtj
the same way, using
The
constants,
(To simplify
m is known
dependent,
al-
our absolute
s2 = 2 lg(1/ε).
these are not essential).
and outputs
as follows.
to optimize
and
The algorithm
of
those
randomized
trying
‘kn~~”k
of the sequence
random
applied
getting
numbers
show that
in this section,
Fh we define
under
exists a randomized
memory
a version
[8] for
algorithms
or deterministic
for
thus
the
We next
value is Fk, and whose
whenever
The rest of this extended
gorithms
value,
re-
numbers
any
precisely.
In Section
bits.
sequence A = (al ,.. .,a~)
the O(lg Ig n) and the O(lg n) bounds
of deterministic
quency
precise
by
Using
can be slightly
each of the
approximates
described
to estimate
desired
Theorem
are tight.
randomized
m, precisely.
possible
bits
be approximated
We also make some comments
plexity
its
that
pre-
bits,
Fk
algorithm
expected
F~. We prove
) memory
for approximating
that
algorithm
can be computed
bits.
we observe
analyzed
can be computed
(randomly)
of computing
one. Trying
FO in the
approximation
algorithm
F2 can
and
in the
the numbers
Ω(n^{1-5/k}
only O(lg n) memory
Martin
randommoments
using O(n lg m) memory
basic idea in our algorithm,
domized
) Whang
for the minimum
(randomized)
at least
addition
efficient
Inequality.
Surprisingly,
space.
algorithms
frequency
each of the numbers
Estimating
The
is based
are available.
each of these moments
O(n lg lg m) memory
2.1
and
functions
of approximating
from
4 contains
F1
only
Fo using
families
properties
using
do better.
space
the
of [17] the space requirement
instead
of n,
Flajolet
analysis,
Section
approximation
several
by approximating
Mor-
to approximate
counter)
final
for approximating
computing
randomized
k >0, Fk can be approximated
randomly
usbits.
We further
show
1– 1/k lg ~) memory
O (n
requires
randomized
In
e.)
(Their
tight
for k ≥ 6, any
Fk
using
as a function
how
m{
that
the gen-
for approximating
the problem
we obtain
memory
that
given
of databases.
Here
that
3/4,
explicit
random
et al [19] considered
context
complexity
) bits
of memory.
assumption
very
duced,
we consider
randomized
we describe
that
the method
the error-probability
an algorithm
Note
0.1, and
say,
3 we present
and open problems.
cisely and deterministically
constant
first
does not exceed
an approximate
O(lg lg n)
Fk.
ap-
limited
up to some fixed
sections
A and
that
simply
[7], [12] ) showed
[8] designed
on the
section
problem
of at least,
is; how to design
O(lg lg m)
efficient
the
that
is, the space
relative
for
error
(In the following
m, the
[17]
Space
In this
in the study
F~ using
us consider
numbers
relative
probability
m ≤ n^{O(1)}.
(that
let
In Section
a simple
based on techniques
The
remarks
and the
and
in one
algorithms
moments
bound.
of four-
variables,
Inequality
are mostly
complexity,
ized algorithms
some of the frequency
with
which
a sequence
random
re-
the problem
moments
binary
of the Chernoff
our lower bounds
to different
each relation
arises naturally
proximate
factor,
is based
belong
structure.
support
analysis
communication
will
which
struc-
observation
database;
spaces
uniform
of databases.
There
sample
wise independent
by the
the frequency
constraints
of small
Large
an expensive
time.
records
in the
or estimating
pass under
would
used for
be limited.
storing
update
data
its own separate
the memory
emphasized
are stored
of computing
that
the estimates
in access time
on memory
that
it is important
and maintaining
Y1, ..., Ys2
average
the
X,l
of S1
are in-
variables.
Each
the sequence
bits,
an of the sequence
in
as follows.
A, where
the
index
p is chosen
randomly
and uniformly
... m,. Suppose
bersl,2,.
that
among
(Note
the num-
aP = 1 ( c N = {1,2,.
. . ,n}.)
that
the
sequence
shows that
this
is tight,
Proof
Let
‘r=l{q:
be the
number
the sequence
q?p,
ag=~}l(>l)
of occurrences
A following
of 1 among
aP (inclusive),
x = rrl(rk –
Note
that
in order
the lg n bits
The
to compute
representing
ing the number
the
members
m:
value
E(X)
a[(1’+(2’
m
+ (2k m!
X we only
need to maintain
,=1
To estimate
Z
-1’)+
+((m,
i=l
for the last inequality
$\mathrm{Var}(X) = E(X^2) - (E(X))^2$
of X
=5
-
variables
above,
E(X^2)/s1
+(mf
(ml
-
-l)k)’)+
Therefore,
≤ k F1 F_{2k-1}/s1
. ..+((m2(l)
1’)2
krn~-l(mf
≤ k n^{1-1/k} Fk^2/s1
,
+ . . . + (m:
(ml
- 1)’))+
-
(mn -
It follows
+...+
(kl’k-’
l)–))
-t- k2k-1(2k
1(m:
– (m.
-
that
+ k2k-l(2k
- 1’)
the probability
dard estimate
+)++...+
viate
1’) -t . ..+
by more
It
+..
= k F1 F_{2k-1}
ineq. is obtained
for any numbers
.+k?n~l-l
]
m
from
the following
=
inequality
< (a – b),kak-l
+czk-’b+
--
+Ctbk-2
+6’-’)
=
.
the value
element
probability
We need the following
For every
simple
n positive
reals ml,
m’
,=1
this
e. In case
a good estimate
to
algorithm
process
can
be imple-
in advance.
In this
the member
al of the se-
computation
of X
ends,
al by a2 with
as al.
probability
sequence
1/2,
after
we replace
at by that
In case of such a replacement,
the
and
process-
we have
for at and for r.
case,
If indeed
else we update
In general,
of the
XtJ ) some value
When
element
(for
the
with
we update
it to be 1. Else, al stays as it is and r increases
easy to check that
$\left(\sum_{i=1}^{n} m_i\right)\left(\sum_{i=1}^{n} m_i^{2k-1}\right) \le n^{1-1/k}\left(\sum_{i=1}^{n} m_i^k\right)^2.$
—
i=l
the
known
by 1 in case am = al and otherwise
. . . . m~
A),
Y, de-
as needed.
of r as needed.
am arrives
[3] Appendix
Fk is at most
how
the
from
by the stan-
of the variables
Y, supplies
m — 1 elements
l/m.
r and define
inequality:
in the
1 and
Y, deviates
and hence,
sz/2
m = 1 and choose
A used
each variable
next
show
of m to 2, replace
update
a > b > 0:
with
1, r
value
to
in case m is not
we start
,
~Fb from
F’,
a single
1/8,
than
the median
quantity
ing the first
(a – b)(&l
and by the definition
(cf., for example,
more
than
remains
quence
= k m F_{2k-1}
that
that
is at most
of Chernoff
the required
- 1)’))]
+km~k-l
~F’
does not happen,
mented
< m [km,~k-l
Inequality
i,
1) ’)2)]
the probability
+hr(m$((mz(
= F’ .
k)2)++)++
+k2k-1(2k–lk)
-
= E(X)
by Chebyshev’s
Fk by more than
i=l
❑
of the random
the definition
Y, and the computation
that
whereas
(12’ + (2’
~k—bb=
we use the fact
$\left(\sum_{i=1}^{n} m_i\right)/n \le \left(\sum_{i=1}^{n} m_i^k / n\right)^{1/k}$
Var(Y,)
~ m [ (kl’~-l
Fact:
~m:)(2k-1)/k
By the above fact,
-lk)2-t-
holds
S
l)’))]
= Fk.
(;k+(2k
the first
Mk
.
nl-l/k(~m~)l/k(
of s 1, for every fixed
where
then
i=l
E(Yi)
which
1
. ..+(m~–(mz–l)k))++
[(12’ +(2’ - lk)2+...
km:-
=
,=1
.
=
(kl’k-l
m~
(l)k))+))+
lk)+ . ..+(m~–(mn–
the variance
+...
$\max_{1 \le i \le n} m_i$,
—-
i=l
.=1
E(X2);
E(X2)
=
=
of 1.
of X is, by definition,
,=1
we bound
M
.
of
where
=2
=
factor. )
1)~)
-lk)+...
(1’
mz
and hence
<
+(2k
n~lk,
and define
=
(lk
Put
fact ):
=
a~ = 1 and the lg m bits represent-
of occurrences
expected
E(X)
(?-–
(of
~~=1
ml
up to a constant
cess, O(lg n + lg m)
memory
completes
of the theorem.
the proof
does not
for the implementation
bits
for each X;j
❑
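The estimator analyzed in this proof can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `ams_fk_estimate`, its parameters and the use of Python's `random` module are assumptions made for the example. Each of the s1·s2 copies samples a position of the stream by replacing its stored element with probability 1/m when the m-th element arrives (so m need not be known in advance), maintains the count r of later occurrences, and evaluates X = m(r^k − (r−1)^k). Groups of s1 copies are averaged, and the median of the s2 averages is returned.

```python
import random
from statistics import median


def ams_fk_estimate(stream, k, s1, s2, seed=0):
    """One-pass estimate of F_k = sum_i m_i^k (illustrative sketch)."""
    rng = random.Random(seed)
    copies = s1 * s2
    samples = [None] * copies  # sampled element a_p of each copy
    counts = [0] * copies      # r: occurrences of a_p from position p onward
    m = 0                      # number of elements seen so far
    for a in stream:
        m += 1
        for j in range(copies):
            if rng.random() < 1.0 / m:
                # Replace the sample with probability 1/m and restart r at 1.
                samples[j] = a
                counts[j] = 1
            elif samples[j] == a:
                counts[j] += 1
    # X = m * (r^k - (r-1)^k) for every copy.
    xs = [m * (r ** k - (r - 1) ** k) for r in counts]
    # Y_i is the average of s1 copies; the output is the median of the Y_i.
    ys = [sum(xs[i * s1:(i + 1) * s1]) / s1 for i in range(s2)]
    return median(ys)
```

Two easy sanity checks follow from the definition of X: for k = 1 every copy equals m exactly, and for a stream of pairwise distinct elements every copy has r = 1 and therefore equals m = Fk.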
change.
of the whole
suffice.
It is
proThis
Remark.
In case m is much
n, one can use the
number
r used
in the
O(lg lg m + lg(l/A))
cu requires
plexity
bigger
algorithm
than
computation
memory
lg n additional
a polynomial
of [17] and approximate
of each X,J
bits.
bits
this
to O (wn’-’/~(lg
using
Since storing
changes
in
the process
the sum
only
be generated
the value of
computed
the space com-
terminates
n+lglgm+lg
*)).
Improved
The second
estimation
frequency
since the repeat
statistical
applications.
proximated
(for
memory
in fact
a logarithmic
Theorem
randomized
(al, .
number
every
For
—
memory
When
Z can be
the sequence
X = Z^2.
proof,
of X.
independent
we next
Since
compute
the random
the expecta-
variables
e, are
and E(εi) = 0 for all i,
interest,
O(@(lg
e >
n -i-
n
in this
case.
0 there
exists
grven
a sequence
Similarly,
a
pendent
A =
the fact
implies
that
the variables
e, are four-wise
inde-
that
in one pass and using
It follows
Y so that
than
λF2
the probability
is at most
can be implemented
that
E. For
Y
fixed
by performing,
number
a constant
operations
field
Therefore,
m) bits.
of
of VP can
)
of the sequence,
and finite
define
O(lg rz+lg
value
bits
+ Igm)
F2 by more
algorithm
member
of N,
a number
bits,
ates fr~m
(lgn
~z
0 and
p in O(lg n) space).
current
the
we show that
suffices
computes,
that
of members
(lg‘1”)
o
A >
the
p (since
F2 can be ap-
theorem
of bits
to maintain
value
arise in various
A and e) using
In the following
algorithm
, am)
e, the
positive
bits.
2.2
index
By the last theorem,
fixed
lg m))
F2, is of particular
and the surprise
from
using only
and variance
pairwise
for F2
moment,
rate
have
and to keep the
As in the previous
tion
2.2
we only
each
on elements
that
deviA and
for
each
of arithmetic
Therefore,
of O(lg n + lg n) bits.
by Chebyshev’s
Inequality,
for each fixed
i,
l<i<sz,
Proof.
Put
algorithm,
dian
S1 = ~
of S2 random
average
X,j
variables
of S1 random
Fix
tors
<...
[1] such
is, for every
4) can
number
sets
reducik,le
ij
(also
power
such
ations in the
finite
1 and
Z is a linear
be computed
in the
d over
than
h.
matrices
define
omit
the details.
2.3
Comments
Flajolet
of each
products
a random
the sum
for
very
by applying
Z with
a sufficient
parts
approxi-
of this
sum using
bits for each, and use the analysis
Inequality
high
to show
probability,
that
the required
on the estimation
.arn,.
this
result.
gives,
We
of FO
an explicit
random
family
a slight
analysis
during
linear
a randomized
O(lg n) memory
assuming
of hash
which
shows that
hash function.
functions
of the
algorithm
bits,
and ana-
one may use in the algorithm
exhibits
Since we are not aware
modification
that
only
of hash functions
properties.
of such a family
Note
[8] described
FO using
lyzed its performance
and can thus
A, where
and Martin
for estimating
p is chosen uniformly
m.,
is at
and once
inner
the sequence
Y of
λF2
2d is
difficult
of mult implic-
~~=1
can be reduced
and positive
O(lg n + lg lg m + lg(l/A))
we need an ir-
than
way to do so is to maintain
of the negative
a sufficiently
as in the
median
❑
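The F2 estimator of this proof, X = Z^2 with Z = Σ εi mi, can be sketched in Python as follows. This is a hedged illustration rather than the construction used in the proof: the four-wise independent ±1 values are obtained here from the parity of a random cubic polynomial over a prime field, which is a swapped-in technique and is only approximately unbiased, whereas the text relies on an explicit set of vectors built from codes over GF(2^d). The names `make_sign_hash` and `ams_f2_estimate` and the prime `P` are assumptions of the example.

```python
import random
from statistics import median

P = (1 << 61) - 1  # a Mersenne prime; stream elements are assumed to be < P


def make_sign_hash(rng):
    """Return i -> +1/-1 derived from a random degree-3 polynomial mod P.

    Assumption: the parity of the polynomial value stands in for the
    four-wise independent signs of the text (bias of order 1/P).
    """
    coeffs = [rng.randrange(P) for _ in range(4)]

    def sign(i):
        h = 0
        for c in coeffs:  # Horner evaluation of the cubic polynomial
            h = (h * i + c) % P
        return 1 if h & 1 else -1

    return sign


def ams_f2_estimate(stream, s1, s2, seed=0):
    """One-pass estimate of F_2 = sum_i m_i^2 (illustrative sketch)."""
    rng = random.Random(seed)
    signs = [make_sign_hash(rng) for _ in range(s1 * s2)]
    z = [0] * (s1 * s2)  # one running sum Z per copy
    for a in stream:
        for j, sign in enumerate(signs):
            z[j] += sign(a)  # add the sign of the arriving element to Z
    xs = [v * v for v in z]  # X = Z^2
    ys = [sum(xs[i * s1:(i + 1) * s1]) / s1 for i in range(s2)]
    return median(ys)
```

Each copy stores only its four coefficients and the running sum Z. When the stream consists of a single value repeated m times, Z = ±m in every copy and the estimate is exactly m^2.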
complexity
easiest
with
imply,
the
F2 by more
of [17] to maintain
The
now
that
(Ig n + lg lg m + lg(l/A))
in [12] and Chebyshev’s
X we choose
Z =
from
the proof.
space
of
It is not
binary
mations
in
Y, deviates
The
accuracy.
of Chernoff
the probability
m to O (~
the method
of strength
GF(2^d), where
of the numbers
in one pass from
described
check
number
and
d. To compute
function
As
each coordinate
a constant
Remark.
1 s
that
e, completing
large
63 in their
O(lg n) space),
to compute
We then
have
n.
vec-
estimates
proof,
numbers
most
,C4 G {–1,1}
construction
(using
GF(2d)
coordinates
arrays
parity
the
are four-wise
of C1, ...
vectors
the
this
using
of h = O(n^2)
standard
previous
as follows.
which
distinct
vp = (ε1, ε2, ..., εn) ∈ V, where
between
the sequence
bits,
The
the
vari-
1,...,4.
of degree
field
of length
=
of 2 greater
it is possible
vp in O(lg n) space,
of vectors
j
a polynomial
the
random
as orthogonal
using
polynomial
it is given
vector
for
known
To implement
smallest
to find
that
four
choice
of the
be constructed
codes
from
+1, – 1 entries,
a (1/16)—fraction
coordinate
the
is computed
~ i4 ~ n and every
exactly
distributed
set V = {v1, ..., vh}
that
independent,
BCH
identically
n with
is the me-
each being
Xij : 1 ≤ j ≤ s1, where
O(lg n + lg m) memory
an explicit
of length
algorithm
Y1, Y2, ..., Ys2,
variables
Each X = Xij
same way, using
As in the previous
Y of the present
are independent,
ables.
ii
and s2 = 2 lg(1/ε).
the output
we briefly
algorithm
for this
describe
of [8] and
version
For simplicity
some ideal
of the existence
it suffices
we only
here
a simple
to use a
describe
here
the problem
tive
of estimating
constant
possible
factor,
to improve
of the algorithm
ber Y
that
O(lg n)
ratio
at most
For
memory
d be the smallest
the
members
GF(2^d),
of N
which
and
sequence
integer
independently.
and
binary
vector
probability
r(z)
value
a, of the sequence
algorithm
is Y = 2^R.
algorithm
we only
representing
current
have
operations
two
maps
that
appear
the probability
only
the
The
the
d.
For
any
r so that
the
Let
d =
FO is the correct
bits
of the random
distributed
over
independent.
the probability
(and
F
1/2”),
Thus,
y that
an r.
~
whose
where
Wz,
ited
for every
x ranges
in the sequence
expectation
value
A.
= az +
b
a~, z,
By
FO ~ (1 – ~)
< Fo/2T.
appears
indicator
Let
+ b) ~ r.
of expectation
is 1/2^r,
The
independence,
Therefore,
the
and should
all pairs
(z, y) besides
=
appear
not exceed
variance
by Markov’s
of Z
of Z is
Inequality
Let DISn
function
: {0, 1}^n
(called
since
E(Z.
2“ >
.Fo
) = F./2’
then
Prob(Zr
is 1 iff the subsets
tors
< 1/c.
Similarly,
> O) < l/c,
In-
for
any
fixed
since Var(Zr
Var(Z,
) < Fe/2”
)/( E(Z.
then
Prob(Z,
= E(Z,
)2) < l/E(Z~)
= O) < l/c,
) and hence
= 2r/F0.
Prob(Z~
lower
= O) s
Since our algorithm
of f(z,
possible
by Babai,
of bits
Frankl
by considering
and
the related
communication
y measure
must
y)
z and
number
complex-
on the possible
apply
in-
a deterministic
value
whose
in [21], [4], C.(f)
x {O, 1}*
of {1, 2,
complexity
bound
lows easily
approximation
e <
a simple
and showed
If C2T < F.
are
the best protocol).
the correct
R
of j(z,
y) on
p-measure
does
> ~D2c(flp)
that
1/2,
{0, 1} denote
junction)
. . . n}
are z and y intersect.
exhibited
equality
parties
each of them
value
expected
case (under
of this
for
(z, y)
vec-
and Schnitger
measure
~
Ω(n).
D, (DIS~
of [15]. The
of Fk for fixed
studied
a re-
[15] proved
that
of this
k ~ 6 is more
[18]
function
Ip) ~ O(n).
of estimating
lower
the
Improving
Razborov
~ on the inputs
for this measure
the result
DISn
function.
C,(DISn)
the Boolean
characteristic
researchers
for the space complexity
from
where
whose
Several
sult in [4], Kalyanasundaram
by Chebyshev’s
the
and
correct
is the
the Disjointness
communication
If
of
vec-
all f, e and p.
and since the
E(Z)
value
At the end of the com-
the
a set of inputs
e. As shown
{0, 1},
unlim-
the
1 — e (for the worst
output
+
with
x and y are binary
as well.
Here the two parties
protocol,
pre-
definitions
communication
to compute
distributional
at least
Before
possesses z and the second
output
a probability
is
requires
x {O, l}m
computation,
can be estimated
[p) under
method
of F=
parties
by Yao [21] and extended
of the c-error
the
some basic
two
either
lower bounds
communication
to each other,
C, (f)
notion
Z = Z,
party
the
in the worst
a, and
we can
probabilistic
wish
or ap-
an appropriate
illustrates
Consider
at least
complexity
(z, y).
to
f : {0, 1}^n
decisions
[4], C,(f)
precisely
approximation
that
must
Simon
D, ( f
that
f (z, y), where
random
z that
the expectation
[20].
to send messages
probability
the
where
let us recall
n, the first
they
space
in the next subsection.
power,
random
for
of our lower bounds
problem
the e-error
is
ity
result
To perform
As shown
mapping
distinct
over all the FO elements
By linearity
pairwise
be the
c is
the
but
problem,
of a function
function
of length
y).
the
proof,
by Yao
communicated
the probability
this
x 6 N that
Wz
is 1 iff r(az
of each W=
is F0/2^r.
let
A,
Cc(f)
possesses y.
with
FO. The
f(z)
fixed
and
approximate
or prove the required
presented
simple
computing
tors
the
r(zi) ≥ r and r(zj) ≥ r is precisely
For each element
sequence
this
can make
hence
>
that
on the space required
Most
the randomized
memory,
a Boolean
puts
in the
variable
linear
that
for the corresponding
easiest
that
introduced
representing
for every fixed
results,
those
The
munication
and that
1/c
bounds
randomly
complexity
by establishing
the
in order
from
moments
use some existing
the proof
lower
and comment
by reducing
communication
senting
our
deterministically.
and facts concerning
1/2^{2r}.
Fix
are obtained
of distinct
mapping
between
algorithms
Fk
these
them
and let us estimate
considerably
each a, to z. we need is that
Z,
❑
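The F0 estimate of this proposition (hash every element with a random linear map, track the largest number R of trailing zero bits seen, output 2^R) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the construction of the text: the linear map is taken modulo a prime and truncated to d bits instead of being computed in the field GF(2^d), so pairwise independence holds only approximately. The names `trailing_zeros` and `f0_estimate` and the prime `P` are introduced for the example.

```python
import random

P = (1 << 61) - 1  # prime modulus; stands in for the field GF(2^d) of the text


def trailing_zeros(z, d):
    """r(z): number of rightmost zero bits of the d-bit value z (d if z == 0)."""
    if z == 0:
        return d
    return (z & -z).bit_length() - 1


def f0_estimate(stream, n, seed=0):
    """One-pass estimate of the number of distinct elements (sketch).

    Elements are assumed to lie in {1, ..., n}; d is the smallest integer
    with 2^d > n.
    """
    d = n.bit_length()
    rng = random.Random(seed)
    a = rng.randrange(1, P)  # random multiplier of the linear map
    b = rng.randrange(P)     # random offset of the linear map
    mask = (1 << d) - 1
    big_r = 0                # R: maximum r(z_i) seen so far
    for x in stream:
        z = ((a * x + b) % P) & mask
        big_r = max(big_r, trailing_zeros(z, d))
    return 2 ** big_r
```

Only a, b and the current maximum R are stored. Because R depends on the set of values seen and not on their multiplicities, repeating elements of the stream leaves the output unchanged.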
we present
moments
proximate
complexity
representing
number
A,
FO is not
and
as needed.
of randomized
to compute
allowed
r(zi) ≥ r is precisely
once
section
complexity
of the
O(lg n)
needed
bits
R
is taken
output
the
O(lg n)
Y
2/c,
r for which
the probability
bounds
problem.
Thus
to implement
(besides
O(lg lg n)
Y deviates
properties
pairwise
of the
F.
the maximum
in the sequence
that
is uniformly
field
polynomial
the
ai
ri = r(zi).
in order
to keep
of F, chosen
largest
A.
of
r~ value.
now,
that
that
in F)
maintain
Suppose,
that
Note
where
an irreducible
maximum
elements
of r,,
vectors
of length
the
over all elements
b and
in the
of z are all O and put
be the maximum
field
a · ai + b, where
vector
denote
finite
a member
z, =
by a binary
bits
to perform
Lower
In this
and c is
2^d > n, and
of the
members
are computed
z, let
r rightmost
aj,
3
a num-
l/c
by binary
When
compute
addition
z, is represented
that
the
so that
are represented
appears,
A
product
a and
that
between
as elements
d. Let a and b be two random
length
uniformly
between
than
frequency
Let
bits
computes
2/c.
Proof.
=
such
ratio
smaller
above show that
an algorithm
exists
of N,
bits,
consider
F
there
R is the maximum
Y = 2^R, where
0, the two inequalities
the
and the success probability
and FO is not
Y
It is
the space it uses.
every c >2
between
outputs
multiplica-
success probability.
a sequence A of members
using
the
the accuracy
2.3
given
constant
by increasing
Proposition
that,
FO up to an absolute
with
F=
bound
complicated
Our
fol-
for the
and
requires
an extension
of the result
of Razborov
in [18],
3.2
The
In this
3.1
The
space
Proposition
complexity
3.1
Any
of approximating
randomized
a number
F-
Y such
by at least
must
that
the
Fw /3
use Ω(n)
probability
than
is less
memory
e, for
outputs,
that
of N = {1,.
that
subsection
Given
Y
. . . n}
deviates
some fixed
from
3.2
domized
algorithm
at most
bits,
we describe
parties
a simple
possessing
using
only
e < 1/2,
n elements
memory
arbitrarily)
followed
The
gorithm
characteristic
rithm
first
[x [ members
is smaller
otherwise
The
than
It
or 2m/n
that
gorithm
see that
the
than
method
the linear
lower
bound
positive
Ω(lg
(simply
lg m)
because
for
holds
with
occurs
above
for
the
above
lower
disjointness
there
is only
to the second
arbitrary
that
objective
bound
approximation
output
must
one.
F~
hand,
according
protocol
ability
that
output
bit
e-correct
this
that
bit
of the protocol
length
bit
is 1 is at least
input
sequences
(Al,
player
pro-
outputs
sequence
1 — .s (The
may
1 —e
the prob-
value
of the
over all possi-
of the expected
In order
a
input
be arbitrary).
is the maximum,
in the communication.
these
can exchange
is O is at least
sequence
. . . . As),
com-
disjoint.
probabilistic
input
of
dis-
is called
if for any disjoint
this
intersecting
it
between
players
the last
of
player
a unique
are pairwise
to any predetermined
for any other
Each
and
is to distinguish
is called
iV =
A,
on those
share
To do so, the
the probability
to prove
number
of
Theorem
3.2
the following.
for
Proposition
al-
the
proof
munication
By the
(including
Since the lower
in
F~ )
we can deduce
of Fm
sequence
t)
1/2,
and any t >
protocol
is at least
from
protocol
between
is less than
tion
P=
the
into
of [15]
a space
even if we allow
errs with
A in its order
a
for
the
S4,
com-
Ω(t/s^3).
II U12 U...
players
Ω(t/s^3
y Q(l),
disjoint
~ s, z c N
all partitions
of N with
1/2,
(Al,
U lS U {z}
s + 1 pairwise
subset
define
sets,
P
these
=
~j
commu-
Define
bit
probability
a distribu-
. . . . A, ) as follows.
where
is chosen
parameters.
of cardinality
total
an output
the last
be a random
and
AJ
the
distribution.
on
communication
produces
where
to prove
a distribution
in which
) bits
sequences
each 1 ~ j
[4], in order
to exhibit
any deterministic
over the input
~ on the input
probability
the
probability
~J be a random
of times.
of [21] and
it suffices
and prove that
is computed
protocol
argument
nication
that
need a
bound
c <
e-correct
DIS(S,
proposition
the inputs
communication
as in the
any fixed
For
problem
simple
the last
for some
we only
3.3
of any randomized
length
y, as
as mentioned
one communication,
the whole
game
At the end of the protocol
we prove
attain
probability
moments
function,
of the
put
of 100, say. It is
probabilistic
communication
observe
to
disjoint,
t)
s play-
. . . . A, ) is called
Az,
if all the sets A,
of inputs.
The
of N).
has no information
(Al,
D(s,
by
is a subset
P.
x and the sets A, — {z}
types
played
t-subset
sets A~ are pairwise
two
bits
C?(n + lg lg m) is a lower
bound
for the approximation
number
subsection
but
additional
t, let
– 1)s + 1 and
(2t
sequence
intersecting
ble input
O(n lg lg m) bits.
one-way
An input
The
The
game,
n =
of Razborov
and
s and
of each player
his own subset,
and for any uniquely
O or
even if we wish
of estimating
all frequency
either
next
positive
) Thus
using
lower
is also a lower
A and c. On the other
of the
bound
algorithms
bound
of the
its final
values
in advance.
in the
complexity
party
above
of any randomized
section,
bound
described
in N
The
for the space complexity
that
of [15]
integers
input
an appro-
its complexity.
that
positive
t of N (also called a
if the
bit.
2n, since we may consider
number
times.
for F~
constant
and
the theorem
The
to
modifications
Define
P,.
, n}.
tocol.
Zk such
Ω(n^{1-5/k})
by considering
is similar
communication
messages
1 – e,
a number
several
following
mon element
It is obvious
y at least
game
requires
be the
uniquely
algo-
theorem
For
joint
. . . . n},
any ran-
sequence A of
game and by studying
Proof,
the others.
F- on
outputs
is 1 if the sets are disjoint,
to
every
can be approximated
lower
knowing
1.
probability
from
to see that
the previous
holds
with
m is bigger
m is not known
first
sends the
approximation
follows
at least Ω(lg m) distinct
Note
then
which,
an input
< -y uses at least
ideas.
cardinality
al-
above
of the
but
{1,2,...
0
the space complexity
lower
of the
the value of Fm up to a factor
not difficult
analysis
[18],
sequence
in which
approximate
fixed
party
thus
is easy
even when
bound
It
of A.
else it outputs
value
result
above.
sequences
shows
approximation
the
ers P1, P2, ...,
it is 2.
Remark.
m/n
4/3;
value of F~
desired
mentioned
holds
of
is z (arranged
for approximating
second party then
output
is the correct
since the precise
Let A be
of all members
the
to the second
(or O) iff the
this
Iyl denote
of the subset of N whose
z, runs
v! continues
to run the algorithm
the rest of the sequence A. The
that
in
O.lF~)
>
and v < 1/2,
given
= {1,2,
of N
– Fkl
k >5
outputs,
communication
The
(x, y),
that
bits.
knows
knowing
of the memory
“disjoint”
for two
DISn
Izl and
vector
priate
is y.
party,
on the
content
Let
by all members
vector
first
protocol
to compute
lx I + Iyl consisting
of N whose
uses s memory
of z and y, respectively.
of length
the subset
characteristic
communication
of communication.
of l-entries
the sequence
that
z and y respectively
s bits
the numbers
as above
Fk
the following.
any fixed
For
Prob(lZ~
that
bits.
an algorithm
of approximating
we prove
Theorem
We prove
Proof.
complexity
F~
algorithm
given a sequence A of at most 2n elements
space
partition
1lj I =
Let
of N
2t — 1 for
uniformly
among
For each j,
t of lJ. Finally,
let
with
for all 1 ~ j ~ s, and with
probability
useful
1/2,
define
to observe
Aj
that
is to choose the random
be a random
each Aj
either
them
subset
that
(A,,...,
joint
is precisely
uniquely
each disjoint
uniquely
input
(A!,.
of N.
probability
is disthat
also that
the
it
and each
input
uniquely
a lower
nication
complexity
of any
game.
Lemma
There
every
sequence,
To prove
that
Probp
[Aj
c ~j]
of the
input
denote
Probp
analogously.
is called
P.
c ~j]
a box.
case the
conditional
and
assertion
c >
later.
The
in XJ,
put
not bad,
c >0
such
We need the
x . . . x ~,.
by first
probability
that
used in the random
is P.
The
[A;
let
P,
set
the
3.5
last
In order
of s -
s, r # j),
II U12 U...
r+j,
There
two
such
1 pairwise
the
Aj
choice
conditional
c 73]
U I.
is at most
U {x}
~.
to bound
holds.
Consider,
are at least that
many
the family
{x1, x2,.
. . , zzt}.
that
Let
contain
inequality
t-subsets
of
of all these t-subsets
denote
pi
w,
the
and let H(p)
entropy
=
function.
(cf., e.g., [5]),
P = 11 U 12 U.
the partition
is that
if the
bad partition
P, then
H(pt)
c’ /s2
s
1-
b denote
Let
choice
some
- p,),
absolute
P.
crucial
zi
that
constant
whose
By the above
in a j-
implying
positive
of elements
partition
. U IS U {x}
The
of z~ as z results
p, < (1 – ~)(1
for
the number
in a j-bad
X2 as z.
c’.
choice
as z
discussion
implies
that
for b is smaller
bound
is much
if t/s3
and by choosing
O(ct/s),
Lemma
3.6
larger
than
c to be sufficiently
than
2t /(20s),
lg
t,then
small
b <
this upper
completing
the proof
❑
ZfP=
IIUIZU..
U I, U {x}
is a good partition
then
are de-
11 U I.z U . . . U A.
constant,
U {z}
By the definition
a choice
for
about
the
t-subsets
and
is j-bad,
given
c >
for every
C
the
that
N,
(1 < r ~
partition.
P =
partition
IT =
I;
for
all
able
(1 – fi)s
Similarly,
value
Multiplying
of the
the
proof
variable
p
result
of Lemma
whose
let Xj (P)
is 1 iff
above
distribution
> ~ the desired
to the
random
whose
x~=~Xj(p)
1 ~ j < s.
the definition
Returning
0
indicator
For
j,
and using
any
holds.
I:
that
partition
If it is
good
constant
the following
probability
of a good
to be chosen
for some j.
statements
that
disjoint
conditional
we have
lemma
of F
entropy
6
Therefore,
1 < j s s, if
absolute
exists
=
to determine
fact that
inequality
U {z}
of the lemma.
bad partitions.
in
probability
$-p \log_2 p - (1 - p) \log_2 (1 - p)$ be the binary
This
Recall
has been defined
is bad if it is j-bad
following
there
ProbP [A$
P,
j-bad.
– s2–ctls3
it is good.
Lemma
lj
partitions
P is not
of the
let F denote
of members
observation
protocol
For such a partition
P =
j satisfies
0 is a (small)
partition
the
the case that
13 U {z}
Proof.
where
that
we have to choose one of the elements
commu-
x . . ~ x 77,]
and Probp
A partition
()
in this
U {z}
2^{–ct/s^3}
possible
2t
of lj
than
2t
implying
2t possibilities
of t-subsets
that
lemma
average
constant
x ~z
A,)
(A 1, ...,
j-bad, where
forms
c-correct
the conditional
[A;
is smaller
is zero
results
E xl
the partition
sequence
probabilities
fined
,A~)
partition
given that
If the number
each of the
2-W*’
By a standard
x . . . x ~.
p on the inputs
a random
is
oft-
imply
the following
on the
fix a box xl
the distribution
lies in ~j,
the players
that
an absolute
x ~z
the lemma,
choosing
for
<
and
each xi
of s-tuples
arguments
deterministic
[( A:,...
to ~j
are only
A2, ..., As) corresponding
bound
exists
box xl
-&Prob
simple)
(Al,
shows
to establish
for
then
yj]
thus,
intersecting
where
a family
between
this
suffices
for the above
x . . . x ~.,
(and
sequences
see later,
3.4
belong
is
~ gives
same probability.
disjoint
is clearly
communication
As we shall
that
P.
for all r # j, the union
and there
1
sequence
distribution
Note
a random
x ~Z
This
Standard
the set of all input
that
the
1/2.
a random
xl
N.
since I, is known
as well,
for the partition
fraction
A box is a family
that
it
sequence
. . . , A: ) denote
a set of t-subsets
to a fixed
is known
contain
sequence.
subsets
Note
we discard
the same probability
input
denote
If
the input
whereas
sequence
. . . A:)
t of A~ U {z}.
z or all of them
the above
is also
intersecting
and let (A;,
input
1/2,
let
Proof.
IJ U {z}
choice.
that
under
intersecting
and then
sets, and otherwise
the random
It is
definition
P as above,
A3 contain
the probability
generated
A.)
for all j.
equivalent
of cardinality
as our input
and repeat
Note
U {x}
(13 – X3)
partition
none of the subsets
x we keep them
Let
=
an alternative,
value
well
follows.
as
is 1 iff
Note
P
be the
is a bad
random
that
the
❑
3.4, let x(P)
be the indicator
P is j-bad.
inequalities
as
vari-
X(P)
~
By computing
the expectation
F’rob[(A~,.
over all partitions
probability
P
(A;,
... A~)6~lx~s]x~s]
—
—
E (ProbP
[(A;
>
E(Probp
[( A~,.
... A~)C~lx.
..x
> ~E (ProbP
in this
Z,]
where
the last
It follows
x
inequality
that
3.4 it suffices
[(A!,...,
A~) c
~s]
(1
follows
in order
–
from
to prove
to show that
X(P)))
–
Lemma
s’2-’t/”
Q( ~2ctfs
Proof
[(A:,.
. ,A~)
[( A~,.
Consider
a fixed
definition
of the partition
choice,
element
choice
the union
z should
Given
xj(P))
x . . . x 7,]
(1)
... A~) cZl
for
the
. (2)
x . . . xx,])
subsets
j
in the
P = I1 U I2 U ... U Is U {x}.
Given
U = Ij U {x}
still
the above
be chosen
information
I.,
r
is known,
#
but the actual
randomly
in this
on P, the quantity
player
and each of these
and
communicates
ond
player.
The
The
in the
in XJ
choice
in ~i
ceived,
union.
on.
probability
at most
product,
as follows.
zk
that
1/ (~).
1/ (zt~l)
in Ij
Note,
= 21/ (~).
= 1 given
and we thus
also, that
the choice
conclude
– 1) =
Fk with
is -y-correct
This
implying
the inequality
of Lemma
Proof
3.4.
in (1), (2) and completing
the protocol
and amplify
the assertion
the probabilities,
of the proposition
thus it suffices to show that
length
is smaller
according
It
than
and
well
corresponds
known
), applied
of communication
protocol
outputs
assertion
that
of Lemma
ing to such communication
any
than
3.4 over
all the
patterns,
boxes
and
the
1.1st
so
out-
he reports
that
Else,
he
the
in-
if
of Fk is st,
value
value
of Fk
(~ +o(l))n.
a good approximation
is (s – 1) ill
SM
to
for DIS(s,
t)
< sM.
showing
≥ Ω(t/s^3),
= Ω(n^{1–5/k})
❑
the proof.
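The reduction from DIS(s, t) to the approximation of Fk rests on the gap between the value of Fk on a disjoint input sequence (st) and on a uniquely intersecting one (s^k + s(t – 1)). The sketch below checks these two values on small instances. The helper names (`frequency_moment`, `disjoint_instance`, `uniquely_intersecting_instance`) and the particular choice of elements are assumptions made for illustration; only the two formulas come from the text.

```python
from collections import Counter


def frequency_moment(seq, k):
    """F_k = sum over distinct values i of (number of occurrences of i)^k."""
    return sum(m ** k for m in Counter(seq).values())


def disjoint_instance(s, t):
    """s pairwise disjoint sets, each of cardinality t (hypothetical layout)."""
    return [list(range(j * t, (j + 1) * t)) for j in range(s)]


def uniquely_intersecting_instance(s, t):
    """s sets of cardinality t that share exactly one element x and are
    otherwise pairwise disjoint (hypothetical layout, x = 0)."""
    x, next_id, sets = 0, 1, []
    for _ in range(s):
        sets.append([x] + list(range(next_id, next_id + t - 1)))
        next_id += t - 1
    return sets


def concatenate(sets):
    """The sequence obtained by listing the players' sets one after another."""
    return [e for a in sets for e in a]


s, t, k = 3, 4, 6
fk_disjoint = frequency_moment(concatenate(disjoint_instance(s, t)), k)
fk_intersecting = frequency_moment(
    concatenate(uniquely_intersecting_instance(s, t)), k
)
```

With these parameters `fk_disjoint` equals st and `fk_intersecting` equals s^k + s(t – 1), so an estimate of Fk with small relative error separates the two cases.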
bound
not
every
and
player
for
communicates
lower
bound
holds
even for algorithms
order
that
a constant
may read
using
the
above
Fk
the sequence
section
are both
o(n)
once,
A in
of times.
of this
and approximation
protocols
of approximating
number
in the remainder
3.3 holds
one-way
only
for the space complexity
of F~ when
timation
in Proposition
only
memory
that
required
the ranin the es-
bits.
and
3.3
It
0(1).
Deterministic
space.
if the
algorithms
the
given
cannot
be computed
is crucial
the
of memory
a sequence
A,
its
length
and deterministically
Here we show that
ear number
correspondthat
that
precisely
even an approximation
by summing
we conclude
is obvious
be computed
communication
Therefore,
then,
s, obtains
the correct
= Ω(n/s^5)
protocols,
general
domization
whose
in the end of which
~2c*ls3
es the
one,
generated
probability
fixed
of inputs.
patterns
O is smaller
protocol
to inputs
p, errs with
to a box
number
the
any deterministic
Ω(t/s^3
to repeat
c < 1/2,
third
Note
that
for
its original
it suffices to prove
for some fixed
to the distribution
is easy
pattern
,
the proof
Since it is possible
3.3 this implies
completes
We show
3.3.
he re-
communicant
1 – ~, the protocol
Since the lower
❑
of Proposition
outputs
Remark.
in which
X7F.])
and
to the
communication
M ≥ Ω(t/s^4)
of 1,, r # j, is
x...
the
configuration
that
3.5, the
that
Yl
sec-
to run
1) > (3t – 2)s =
at least
and its total
By Proposition
lies
to the
the correct
s(t -
if the algorithm
probability
memory
intersecting
n+
The
of his set
..., As) is disjoint.
then
com-
continues
intersecting.
is disjoint,
=
Θ(n^{1–1/k}).
tm the players.
number
(Al,
if it is uniquely
Therefore,
Then
A;
sequence
for the
t =
If it is at most
sequence
of ~
then
memory
algorithm.
a
mo-
M memory
t elements
memory
player
Given
of the
of his set,
is uniquely
is s^k + s(t
is
for any
By Lemma
it
the
of the
input
n^{1/k},
given
player
from
player,
of the
the
whereas
the number
U {z}.
the probability that
of U as z,
xl (P)
1/(20s)
1 denote
Let
E X3],
last
5.
k >
n members
on the
content
second
elements
content
The
put
to
also in (1).
ProbP [A:
are contained
is precisely
exceed
appear
the
The
in the com-
protocol
s =
the algorithm
on the
resulting
(2) is
the one corresponding
s – 1 factors
above
which
of a member
cannot
besides
same
also easy to compute
ProbP [.4: c ~j]
for
be at
the frequency
randomized
t)
Ω(1).
must
(2t – 1)s + 1, using
DIS(s,
starting
reports
factors
r = j is fixed.
runs
algorithm,
!.=]
of t-subsets
an integer
of at most
n =
a simple
first
that
factor
sequence
game
probability
of bits
a
that
❑
Ω(t/s^3).
Fix
choosing
shows
patterns
for approximating
where
we define
put
last
any
p. By
err with
input
it outputs
p > 0 this
the number
3.2.
algorithm
munication
must
be at least
Theorem
, n},
minus
Let A1, A2, ..., As be the inputs
<,
.> ~E(ProbP
this
~ X1
. . , A:)
constant
) and hence
Fk for
{1,2,...
bits,
E (ProbP
(A?,.
O on a random
the probability
of communication
must
of
ment
of Lemma
1 ≤ j ≤ s,
outputs
times
absolute
number
randomized
3.6.
the assertion
for every j,
,
&
case the algorithm
the
least
input
small
munication
X...
the protocol
is at least
sufficiently
Thus,
X1
A;)
0 on a random
,. . . . A~)E~Ix..x~.])
.(1 –x(P)))
that
.,.,
k besides
for any nonnegative
of Fk up to, say, a relative
deterministically
bits.
using
This shows that
in the two approximation
algorithms
can
F1
in logarithmic
error
less than
1,
of 0.1
a lin-
the randomness
described
in
Section
2.
This
concerning
is a simple
the
the equality
3.7
algorithm
n/2 elements
of N
|Y – Fk|
< 0.1 Fk
Proof.
Let
For
any
G be a family
n/4
so that
n/8
elements
at most
such a G follows
from
can be proved
members
the set of n/4
algorithm
members
content
dicts
after
reading
give
G1)
erably;
G1),
for A(G2,
Therefore,
exceeds
It follows
that
differ
F.
n/4
Randomized
As shown
gorithms
then
fact
last
that
these
final
This
k #
from
of
(for
all k but
randomized
{1,2,
to that
randomized
algorithm
most
2n elements
Y = Fk
must
Proof.
works
with
use Ω(n)
For
with
that
of N
probability
memory
The reduction
here as well
(i)
F2, where
the
following
the proof
that
all these
of at most
as shown
given
1 – e for
in the two
result
must
F2 up to
use at least
Ω(lg n +
from
the
construction
with
the well
in the
known
complexity
proof
fact
that
of the equality
z and y
is Θ(lg l).
F1 of the sequence
content
m) distinct
can be any number
of the memory
values with
should
positive
admit
probability,
at
giving
result.
memory
in the proof
by the argument
5
the
is at least Ω(lg n) by the argument
of part
mentioned
(i) and is at least
in the proof
of part
Ω(lg lg m)
❑
(ii).
Concluding
k #
remarks
as
We have seen that
domized
space, even if we
1, any
for
a sequence A of at
there
algorithms
moments
over the trivial
for
be interesting
c < 1/2
plexity
other
any
of the
k
the
functions
not much
an nnclj
of ~~=1
m!
of the numbers
m~.
The
in many
of Fk
k
bound
>
2.
It
the space comfor
or the space complexity
fre-
space can be
when
or estimate
ran-
three
space lower
or non-integer),
to determine
2.1 can be applied
first
in the approximation
that
approximation
of k for k <2,
in Section
the
space efficient
approximating
algorithms
(integer
holds
using
are surprisingly
k ≥ 6. We conjecture
would
3.1 easily
for
F0, F1, F2, whereas
that
Y such
assertion
3/4
communication
The required
values
the above
follows
3.7, together
Since the length
least Ω(lg
of these mo-
of Proposition
to
lg m)
bits.
up to m, the final
al-
up
Fk, dethat
is crucial
fixed
approximating
for
at least
f(x, y) whose value is 1 iff x = y, where
moments
some
F1
numbers,
be at
to
must
must use at least Ω(lg
algorithm
probability
the desired
a number
approximating
for
3/4
are l-bit
must
up
3/4
(sketch).
The
(ii)
ele-
bits.
function
er-
m
F.
at least
+ 2^k n/8.
a relative
integer
probability
for k ~ 2,
we observe
linear
approximating
for
with
at least
the randomized
consid-
algorithm
probability
each other
bits.
and proves
and
be a sequence
A
algorithm
of Proposition
algorithms
in the proof
for
F.
for estimating
to show
small
suffice
factor,
n) memory
memory
Proof
contra-
❑
nonnegative
at least
bits
up to a constant
randomized
1, the values of Fk
computation
outputs,
A of at
..., n}.
randomized
with
mentioned
= {1, 2, ..., n}
show
randomly
bits.
Any
quency
any
suffice
difficult
of 0.1 F0
error
Any
lg lg m)
that
Let
randomized
Any
gained
3.8
of N=
0.1 F1
algorithms,
Proposition
2 here
some fixed
O (lg lg m)
the remark
is not
are tight,
with
for approximating
from
It
4.1
(ii)
of the
however,
makes
subsection
1) requires
k =
2.2.
ments
memory
are
to the two
and Fk ≤ n/4
the frequency
the precise
follows
bounds
0.1 F1
prin-
means
output
is essential
are approximation
in the sense that
precisely,
lg m) bits suffice
Section
F2 of a sequence
factor
F1, O(lg n) bits
Proposition
(ii)
computation
2. In this
More
statement
in
to approximate
F0, F1 and
probability.
use at least Ω(lg
one of these two sequences.
the randomness
in Section
ments
allow
above,
those
suffices
error
(i)
the
there
the content
of G2.
the proof.
for approximating
scribed
well,
precise
[8] and
memory
up to a constant
(iii)
3.4
[17],
moments
an additive
G2), the
elements
the space used by the algorithm
t = Ω(n), completing
lg
with
with
When
A(G1,
of G1 is equal
of the algorithm
0.1 for at least
in
m terms
upper
starting
and Fk = 2^k n/4
3n/8
~
of F0, F1, F2
for the approximation
the frequency
below.
and G2 of
order).
G1). This,
= n/4
F.
G1),
the answer
ror that
above
bounds
most
and
and ending
lg t bits,
since for every
logarithmic
a de-
By the pigeonhole
same
and A(G2,
sequences
for A(G1,
the
results
of Theorem
of
theory,
Fix
n/2
order)
elements
The
that
O(lg n +lg
of L7
existence
G1
of the form
the elements
must
A(G1,
for the two
least
the
the assumption,
whereas
of length
on G1.
lower
Fk for some fixed
members
has less than
reading
algorithm
sequences
two
Tight
approximating
in coding
sets G1 and G2 in G, so that
after
such that
members
it reads the first
only
memory
Y
argument).
of G2 (in a sorted
after
4
1, any
k #
(The
approximates
a sequence
depends
if the
two distinct
the
given
configuration
memory
results
counting
that
results
bits.
distinct
of G1 (in a sorted
runs,
the sequence
ciple,
standard
G2) be the sequence
G let A(G1,
memory
memory
❑
of [15].
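The deterministic lower bound compares the sequence A(G1, G1) with A(G2, G1), where G1 and G2 are sets of cardinality n/4 sharing at most n/8 elements. The sketch below evaluates F0 and F2 on one small pair of such sets and checks that no single output Y can be within relative error 0.1 of both values. The function names, the value n = 16 and the concrete sets are assumptions chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def frequency_moment(seq, k):
    """F_k = sum over distinct values i of (number of occurrences of i)^k."""
    return sum(m ** k for m in Counter(seq).values())


def sequence_a(g_first, g_second):
    """A(G_first, G_second): the members of G_first in sorted order,
    followed by the members of G_second in sorted order."""
    return sorted(g_first) + sorted(g_second)


def separated(a, b, eps=0.1):
    """True if no Y satisfies both |Y - a| <= eps*a and |Y - b| <= eps*b."""
    lo, hi = sorted((a, b))
    return (1 + eps) * lo < (1 - eps) * hi


n = 16                     # hypothetical universe size
g1 = set(range(0, 4))      # n/4 elements
g2 = set(range(2, 6))      # n/4 elements, n/8 of them shared with g1

same = sequence_a(g1, g1)   # every element appears twice
mixed = sequence_a(g2, g1)  # only the shared elements appear twice

f0_same, f0_mixed = frequency_moment(same, 0), frequency_moment(mixed, 0)
f2_same, f2_mixed = frequency_moment(same, 2), frequency_moment(mixed, 2)
```

Here F0 takes the values n/4 and 3n/8, and F2 takes the values 2^2 n/4 and n/4 + 2^2 n/8, matching the bounds stated in the proof; in both cases the two values are too far apart for one answer to approximate both.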
without
a sequence A of
a number
in common.
1. For every
k #
n},
any two
by a simple
algorithm
nonnegative
integer
result
of
t = 2^{Ω(n)} subsets of N, each
of
of cardinality
known
proof,
given
outputs,
use Ω(n)
have
the n/4
nonnegative
{1, 2, ...,
must
these
main
results
complexity
complexity.
that
=
known
a self contained
to communication
deterministic
terministic
however,
we present
Proposition
of the
communication
Since,
function.
are not difficult,
any reference
corollary
deterministic
non-integral
of estimating
method
described
cases and give some
nontrivial
ficult
space savings.
to design
scheme
Thus,
a randomized
in Subsection
for example,
algorithm
2.1, that
it is not too dif-
approximates
small
relative
error-probability,
using
O(lg n lg m) memory
description
In a recent
study
work
of this
[2] Alon
of the estimation
results
The
demonstrate
algorithms
namic
estimation
more
in practice,
efficient
which
than
bounds.
algorithm
for maintaining
popular
items
and their
approximating
F-)
for frequency
for typical
Gibbons
data
using
small
distributions
dy-
as well.
We
memory,
of practical
[13] P.J. Haas,
implied
by
an
Colin
Mallows
(and hence also
mation,
for helpful
comments
out
(1995).
Proc.
J. Statistical
90-92.
counting
S. Seshadri,
Estimation
of the
Proc.
of
and L. Stokes,
of Distinct
Number
of the 21st VLDB
and V. Poosala,
Balancing
Query
for
ACM-SIGMOD
a
1995.
Conf.,
Histogram
Result
Size
Esti-
1995.
well
[15] B. Kalyanasundaram
interest.
and for pointing
Issue on Ma-
Probabilistic
manuscript,
and Practicality
Optimality
regarding
the
[16] Y. Ling
[11].
and G. Schnitger,
communication
Structure
literature,
Kechris,
Naughton,
[14] Y. E. Ioannidis
Acknowledgment
statistics
N.
of an Attribute,
presented
tic
We thank
32 (1989),
J.F.
1995, 311-322.
works
and P-values,
of events,
be
sets would
which
Materi-
18(2),
indexes
Sampling-Based
e list of the k most
counts
Special
Warehousing,
and Simulation
and
Values
case performance
an approximate
Hofri
large number
be able to obtain
et al [9] recently
approximate
[12] M.
the fully
be deleted
of
and Applications,
Bulletin,
and Data
Surprise
Computation
of these algorithms.
one may
the worst
the lower
Engineering
Views
Maintenance
Techniques,
We omit
an experimental
to deal with
may
1.S. Mumick,
Problems,
Data
terialized
[11] I. J. Good,
utility
set items
algorithms
bits.
and
View:
IEEE
fixed
for F2. The experimental
the practical
that
m; lg rm
some small
et al presented
are also extended
remark
with
Gupta
alized
algorithm.
algorithms
case, in which
finally
error
~~=1
up to some fixed
the detailed
[10] A.
based on the general
complexity
in Complexity
and W.
methods
Theory
The
Conf.
RECORD,
21(4)
(1992),
Counting
large
numbers
(1987),
2nd
41-49.
to sampling-based
A supplement
size estimation
probabilis-
intersection,
set
query
for
SIGMOD
Sun,
of
in a database
system,
12-15.
References
[17] R. Morris,
[1] N. Alon,
L. Babai
domized
N. Alon,
P. Gibbons,
namic
probabilistic
limited
storage,
7(1986),
ran-
independent
Y. Matias
and
of
M. Szegedy,
N. Alon
John Wiley
[4]
L. Babai,
and Sons Inc.,
P. Frankl
in communication
IEEE
[5]
[6]
FOCS,
[19] K.-Y.
BIT
[9]
Very
Elements
D.A.
handling
Large
Y.
maintenance
algorithms
Schneider,
Data
Martin,
ing probabilistic
Laboratories,
Matias
and
jor
filtering,
Murray
Hill,
249-253.
of dis-
(To ap-
Science. )
15(2)
C. Yao,
Vander-Zanden,
ACM
(1990),
Some
computing,
and
counting
probabilistic
Systems,
[21] A.
Bases,
and S. Sejoins,
Proc.
1992. pp. 27.
a detailed
analysis,
Probabilistic
counting,
1983, 76-82.
P. Gibbons,
in small
complexity
(1990),
Computer
B.T.
applications,
tributed
of Informa-
in parallel
counting:
G. N.
Whang,
database
[20] A.
113-134.
and
On the distributional
of the ICALP
A linear-time
C. Yao,
Proc
Approximate
25 (1985),
FOCS
classes
of the 27th
1991.
skew
Conf.
P. Flajolet
Proc.
events
Transactions
H.M.
Tay-
algorithm
for
on Database
208-229.
complexity
questions
Proc. of the 11th
ACM
related
STOC,
to
dis-
1979,
209-213.
J.F. Naughton,
Practical
P. Flajolet,
1992.
1986, 337-347.
and J. A. Thomas,
D.J. DeWitt,
Method,
Complexity
theory,
Wiley,
18th Int'l.
[8]
J. Simon,
complexity
tion
Theory,
Probabilistic
New York,
T. M. Cover
shadri,
[7]
and
The
of
840-842.
in
Feb. 1996.
and J. H. Spencer,
21 (1978),
pear in Theoretical
Dy-
sizes
lor,
[3]
Proc.
jointness,
self-join
CACM
[18] A. A. Razborov,
567-583.
maintenance
manuscript,
and simple
A fast
the maximal
for
J. Algorithms
set problem,
[2]
and A. Itai,
algorithm
parallel
registers,
A.
Witkowski,
high-biased
Technical
NJ,
Dec.
Practical
histograms
Report,
AT&T
usBell
1995.
Lower
bounds
of the 24th IEEE
by probabilistic
FOCS,
1983, 420-428.
arguments,
