Tight Lower Bounds for the Distinct Elements Problem

Piotr Indyk (MIT, [email protected])*    David Woodruff (MIT, [email protected])†

* This work was supported in part by NSF ITR grant CCR-0220280 and a Sloan Research Fellowship.
† Supported by an Akamai Presidential Fellowship for the duration of this work.

Abstract

We prove strong lower bounds for the space complexity of (ε, δ)-approximating the number of distinct elements F₀ in a data stream. Let n be the size of the universe from which the stream elements are drawn. We show that any one-pass streaming algorithm for (ε, δ)-approximating F₀ must use Ω(1/ε²) space when ε = Ω(n^(−1/(9+c))), for any c > 0, improving upon the known lower bound of Ω(1/ε) for this range of ε. This lower bound is tight up to a factor of log log n. Our lower bound is derived from a reduction from the one-way communication complexity of approximating a boolean function in Euclidean space. The reduction makes use of a low-distortion embedding from an ℓ₂ to an ℓ₁ norm.

1 Introduction

Let a = a₁, …, a_m be a sequence of elements, which we will refer to as a stream, from a universe of size n, which we denote by [n] = {1, …, n}. In this paper we examine the space complexity of algorithms that count the number of distinct elements F₀(a) in a. All algorithms are given one pass over the elements of a, which are arranged in adversarial order. An algorithm A is said to (ε, δ)-approximate F₀ on stream a if A outputs a number F̃₀ such that Pr[|F̃₀ − F₀(a)| > ε·F₀(a)] < δ. Since there is a provable deterministic space lower bound of Ω(n) for computing, or even approximating within a small multiplicative factor, F₀ [1], there has been considerable effort to devise randomized approximation algorithms.

There are several practical motivations for designing space-efficient algorithms for approximating F₀. Computing the number of distinct elements is very valuable to the database community.
Query optimizers can use F₀ to find the number of unique values of a given attribute in a database without having to perform an expensive sort on those values. With commercial databases approaching the size of 100 terabytes, it is infeasible to make multiple passes over the data, since the sheer amount of time spent looking at the data is prohibitive. Other applications include networking: internet routers that get only one quick view of incoming packets can gather the number of distinct destination addresses passing through them with only limited memory. For an application of F₀ algorithms to detecting Denial of Service attacks, see [2].

There have been a multitude of algorithms proposed for computing F₀ in a data stream (e.g., [7, 1, 4]), beginning with the work of Flajolet and Martin [7]. The best known algorithm was presented in [4]; it (ε, δ)-approximates F₀ in space O(ε⁻² log log n + log n log(1/ε)). In this paper we take δ to be constant. For reasonable values of ε and n, the storage bound of these algorithms is dominated by the ε⁻² term, and if one wants even better approximation quality (smaller ε), this quadratic dependence on 1/ε constitutes a severe drawback of the existing algorithms. This raises the question, posed in [4], of whether it is possible to reduce the dependence on 1/ε. Until now, the best known lower bound for the problem was Ω(log n + 1/ε) [1, 3].

In this paper we answer this question in the negative. In particular, we show that any algorithm approximating F₀ up to a factor of 1 ± ε requires Ω(1/ε²) storage, as long as 1/ε ≤ n^(1/(9+c)) for some constant c > 0. This matches the upper bound of [4] up to a factor of log log n for small ε, and up to a logarithmic factor for large ε.

1.1 Overview of techniques and technical results

One way of establishing a space lower bound for a streaming problem is to lower bound the one-way communication complexity of computing certain boolean functions, and to reduce computing those functions to computing F₀ [3].
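Before turning to the lower-bound machinery, the upper-bound side of this comparison can be made concrete. The following is a minimal hash-based estimator, a simplified k-minimum-values sketch (our own illustration, not the algorithm of [4]): with k stored hash values it achieves relative error roughly 1/√k, which is exactly the 1/ε² space phenomenon at issue.

```python
import bisect
import hashlib

def kmv_estimate(stream, k=400):
    """Estimate the number of distinct elements by tracking the k smallest
    distinct hash values (a k-minimum-values sketch): O(k) values of state."""
    def h(x):
        # deterministic hash of x into [0, 1)
        digest = hashlib.sha1(str(x).encode()).digest()
        return int.from_bytes(digest[:8], 'big') / 2.0 ** 64

    sketch = []                        # sorted list of the k smallest distinct hashes
    for x in stream:
        v = h(x)
        i = bisect.bisect_left(sketch, v)
        if i < len(sketch) and sketch[i] == v:
            continue                   # this element's hash is already in the sketch
        if len(sketch) < k:
            sketch.insert(i, v)
        elif v < sketch[-1]:
            sketch.insert(i, v)
            sketch.pop()
    if len(sketch) < k:
        return len(sketch)             # fewer than k distinct elements: exact answer
    return int((k - 1) / sketch[-1])   # k-th smallest hash concentrates near k/(F0+1)
```

Taking k = Θ(1/ε²) gives a (1 ± ε)-estimate with constant probability; the lower bound proved in this paper shows that this quadratic dependence on 1/ε is necessary.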
In this model there are two parties, Alice and Bob, who have inputs x and y respectively and wish to compute the value of a boolean function f(x, y) with probability at least 1 − δ. The idea is that Alice can run a distinct-elements algorithm A on a stream derived from x and transmit the state of A to Bob. Bob can then plug the state into his copy of A and continue the computation on a stream derived from y. This computation returns a number which is a (1 ± ε)-approximation to F₀ of the combined stream with probability at least 1 − δ. If this approximation can be used to determine f(x, y) with error probability less than δ, then the communication cost is just the space used by A, which must therefore be at least the one-way communication complexity of computing f(x, y).

In [1] the authors reduce computing the equality function EQ(x, y) to computing F₀. Since the randomized one-way communication complexity of EQ(x, y) is Ω(log n), this established an Ω(log n) lower bound for computing F₀.

We would like to obtain lower bounds in terms of the approximation error ε. In [3] the one-way communication complexity of the t-set disjointness problem is used to derive an Ω(1/ε) space lower bound for approximating F₀. Here Alice is given a set x ⊆ [n] with |x| = t and Bob is given a set y ⊆ [n] with |y| = s, and both parties are given the promise that either y ⊆ x or y ∩ x = ∅. In the former case F₀(a_x ∘ a_y) = t, and in the latter F₀(a_x ∘ a_y) = t + s. Hence, for a suitable choice of t and s, an algorithm which (ε, δ)-approximates F₀ can distinguish the two cases. However, this reduction is particularly weak, since Alice does not need the full power of an F₀-approximation algorithm: she can instead simply send Bob a small random sample of the elements of her set x, and Bob can decide the promise by checking how many sampled elements fall in y. This motivates the search for promise problems which use the full power of a distinct elements algorithm as a subroutine.

Given a stream a, its characteristic vector χ_a is the n-dimensional vector whose i-th coordinate is 1 if and only if element i of [n] appears in a. Note that F₀(a) is just the Hamming weight of χ_a.
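The generic reduction described above can be simulated end to end. The toy streaming "algorithm" below stores the exact characteristic vector χ_a, so its state, and hence Alice's single message, is Θ(n) bits; a space-s streaming algorithm would shrink the message to s bits. All names here are ours.

```python
import pickle

class DistinctCounter:
    """Toy one-pass distinct-elements algorithm: its state is the n-bit
    characteristic vector chi_a, and F0 is the Hamming weight of the state."""
    def __init__(self, n):
        self.bits = bytearray((n + 7) // 8)
    def process(self, elem):
        self.bits[elem // 8] |= 1 << (elem % 8)
    def estimate(self):
        return sum(bin(b).count('1') for b in self.bits)

def one_way_protocol(alice_elems, bob_elems, n):
    # Alice runs the algorithm on her stream and ships its state to Bob.
    alice_alg = DistinctCounter(n)
    for e in alice_elems:
        alice_alg.process(e)
    message = pickle.dumps(alice_alg.bits)   # the only communication
    # Bob resumes the computation from Alice's state on his own stream.
    bob_alg = DistinctCounter(n)
    bob_alg.bits = pickle.loads(message)
    for e in bob_elems:
        bob_alg.process(e)
    return bob_alg.estimate(), len(message)
```

The returned message length is the communication cost, so any lower bound on the one-way communication needed to decide f from the final estimate is a lower bound on the algorithm's space.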
One natural boolean function to consider is the following: Alice and Bob are given x, y ∈ {0, 1}ⁿ with the promise that either Δ(x, y) ≤ (1 − ε)t, in which case f(x, y) = 0, or Δ(x, y) ≥ t, in which case f(x, y) = 1. Here Δ(x, y) denotes the Hamming distance between x and y, that is, the number of bit positions in which x and y differ. Alice and Bob view their inputs x and y as characteristic vectors of certain streams a_x and a_y. The value F₀(a_x ∘ a_y) is then just the Hamming weight of x ∨ y, the bitwise OR of x and y. By constraining the weights of the inputs x and y to be close to each other, one can use an (ε, δ)-F₀-approximation algorithm to determine the value of f(x, y) with error probability at most δ. Unfortunately, it is rather difficult to lower bound the one-way communication complexity of f directly.

Fix ε and set d = d(ε) = 1/ε², where we assume d is a power of 2 for convenience. In this paper we consider the one-way communication complexity of computing the following related promise problem P(d). Alice has a vector x ∈ ℝ^d with small rational coordinates and with ‖x‖₂ = 1. Bob has a vector y drawn from the standard basis {e₁, …, e_d} of ℝ^d. Both parties are given the promise that either ⟨x, y⟩ = 0 or ⟨x, y⟩ ≥ 1/√d, where ⟨·, ·⟩ denotes the Euclidean inner product. We show that the one-way communication complexity of deciding P(d) with error probability at most δ, when the inputs are drawn from an appropriate distribution, is Ω(d). We do this by using tools from information theory developed in [5] which generalize the notion of VC dimension [10] to shatter coefficients.

We then reduce P(d) to approximating F₀ of a certain stream. In the reduction we use a deterministic (1 + γ)-distortion embedding φ between the ℓ₂^d and ℓ₁^m norms (defined below) developed in [6], with m = O(d·γ⁻²·log(1/γ)). Alice computes φ(x) and Bob computes φ(y).
Depending on whether ⟨x, y⟩ = 0 or ⟨x, y⟩ ≥ 1/√d, the quantity ‖x − y‖₂² = ‖x‖₂² + ‖y‖₂² − 2⟨x, y⟩ will be 2 or at most 2(1 − 1/√d). By choosing γ appropriately small, ‖φ(x) − φ(y)‖₁ closely tracks ‖x − y‖₂. We then rationally approximate the coordinates of φ(x) and φ(y) and scale all coordinates by a common denominator. We convert the integer coordinates of the scaled φ(x) and φ(y) into their unary equivalents, obtaining bit strings x′ and y′ of length N, where N is determined by parameters chosen in the reduction (specifically, N is a function of d and γ). We show how a (Θ(ε), δ)-approximation algorithm computing F₀(a_{x′} ∘ a_{y′}) can decide P(d) with probability at least 1 − δ. Since the communication complexity of P(d) is Ω(d) = Ω(1/ε²), the space complexity of (Θ(ε), δ)-approximating F₀ in a universe of size N is Ω(1/ε²). The goal is then to find the smallest N for a fixed ε so that an (ε, δ)-F₀-approximation algorithm can decide P(d). We determine N to be Õ(ε⁻⁹), which gives an Ω(1/ε²) space lower bound whenever ε = Ω(N^(−1/(9+c))) for any constant c > 0, and hence an Ω(N^(2/(9+c))) lower bound for all smaller ε.

2 Preliminaries

2.1 Communication Complexity

Let f : X × Y → {0, 1} be a Boolean function. We will consider two parties, Alice and Bob, receiving x ∈ X and y ∈ Y respectively, who try to compute f(x, y). In the protocols we consider, Alice computes some function A(x) of x and sends the result to Bob, who then attempts to compute f(x, y) from A(x) and y. Note that only one message is sent, and it can only be from Alice to Bob.

Definition 1 For each randomized protocol P as described above for computing f, the communication cost of P is the expected length of the longest message sent from Alice to Bob over all inputs. The δ-error randomized communication complexity of f, R_δ(f), is the communication cost of the optimal protocol computing f with error probability δ (that is, Pr[P(x, y) ≠ f(x, y)] ≤ δ).

For deterministic protocols with input distribution μ on X × Y, define the (μ, δ)-distributional communication complexity D_{μ,δ}(f) to be the communication cost of an optimal such protocol with error probability at most δ with respect to μ. Using the Yao minimax principle, R_δ(f) is bounded from below by D_{μ,δ}(f) for any μ [12].

2.2 VC dimension

Boolean functions on a domain Y can be viewed as |Y|-bit strings. Let F = {f_x : Y → {0, 1} | x ∈ X} be a family of Boolean functions on Y; each f_x is identified with its truth table, a |Y|-bit string.
Definition 2 For a subset T ⊆ Y, the shatter coefficient SC(T) of F is the number of distinct bit strings obtained by restricting F to T, that is, |{(f_x(y))_{y∈T} : x ∈ X}|. The k-th shatter coefficient SC_k(F) of F is the largest shatter coefficient obtainable by considering all subsets T of size k. If SC(T) = 2^|T|, then T is said to be shattered by F. The VC dimension of F, VCD(F), is the size of the largest subset shattered by F.

The connection between VC dimension and randomized one-way communication complexity was first explored in [9]. The following theorem lower bounds the one-way communication complexity of f in terms of its VC dimension. Here, and in what follows, H denotes the binary entropy function.

Theorem 3 For every Boolean function f and every δ > 0, there exists a distribution μ on X × Y such that D_{μ,δ}(f) ≥ V·(1 − H(δ)), where V = VCD(F).

The following generalization of this theorem [5] is useful when computing VCD(F) is difficult.

Theorem 4 For every Boolean function f, every k, and every δ > 0, there exists a distribution μ on X × Y such that D_{μ,δ}(f) ≥ (1 − H(δ))·log SC_k(F) − O(log k).
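The notions of Definition 2 can be checked by brute force on a small toy family unrelated to the one used later, here the indicators of intervals on an eight-point domain (our own example): every pair of points is shattered but no triple is, so the VC dimension is 2, while the k-th shatter coefficients grow polynomially.

```python
from itertools import combinations

Y = range(8)
# Interval indicators f_{a,b}(y) = 1 iff a <= y <= b, stored as truth tables over Y.
F = [tuple(int(a <= y <= b) for y in Y) for a in Y for b in Y if a <= b]

def shatter_coeff(k):
    # SC_k: the largest number of distinct patterns over any size-k subset of Y
    return max(len({tuple(f[i] for i in T) for f in F})
               for T in combinations(Y, k))

assert shatter_coeff(1) == 2      # every single point is shattered
assert shatter_coeff(2) == 4      # some pair is shattered, so VCD >= 2
assert shatter_coeff(3) == 7      # 7 < 2**3: no triple is shattered, so VCD = 2
```

The pattern (1, 0, 1) is unrealizable on any triple by an interval, which is why SC_3 sticks at 7.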
2.3 Embeddings

For a survey on low-distortion embeddings the reader is referred to [8]. The ℓ_p norm of a vector v ∈ ℝ^m is defined to be ‖v‖_p = (Σ_{i=1}^m |v_i|^p)^(1/p). A (1 + γ)-distortion embedding φ : ℓ₂^d → ℓ₁^m is a mapping such that for any u, v ∈ ℝ^d,

‖u − v‖₂ ≤ ‖φ(u) − φ(v)‖₁ ≤ (1 + γ)·‖u − v‖₂.

We will need the following theorem [6] in our reduction:

Theorem 5 For every d and every γ > 0, there exists a (1 + γ)-distortion embedding φ : ℓ₂^d → ℓ₁^m with m = O(d·γ⁻²·log(1/γ)).

Throughout, we write ‖v‖₂ for the ℓ₂ norm of v and ‖v‖₁ for the ℓ₁ norm of v.

3 Reduction

We proceed as outlined in section 1.1. Recall that ε is fixed, and we set d = 1/ε², which we assume to be a power of 2 for convenience. We first define P(d) and lower bound its one-way communication complexity.

3.1 Complexity of P(d)

Let E = {e₁, …, e_d} be the standard basis of unit vectors in ℝ^d. We define the promise problem P′(d) as follows:

P′(d) = {(x, y) ∈ ℝ^d × E : ‖x‖₂ = 1 and either ⟨x, y⟩ = 0 or ⟨x, y⟩ ≥ 1/√d}.

We will only be interested in tuples (x, y) ∈ P′(d) in which the descriptions of the coordinates of x and y are finite, so we impose the constraint that these coordinates be rational. We will also need to assign probability distributions on the inputs in order to use Theorem 4, so it will be convenient to assume that these rational numbers have size bounded by B = O(log d), where the size of a rational number p/q in lowest terms is ⌈log |p|⌉ + ⌈log q⌉. We define the promise problem P(d):

P(d) = {(x, y) ∈ P′(d) : for all i, x_i and y_i are rational with size at most B}.

We let X denote the set of all x for which there exists a y such that (x, y) ∈ P(d), and we define Y similarly. For (x, y) ∈ P(d), we define f(x, y) = 0 if ⟨x, y⟩ = 0 and f(x, y) = 1 if ⟨x, y⟩ ≥ 1/√d.
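Inputs satisfying the promise are easy to exhibit numerically. The check below (our own, with d = 16 and k = 4) uses a normalized average of k basis vectors, the same construction that appears in the proof of the next theorem: it has unit ℓ₂ norm, rational coordinates when √k is an integer, inner product 1/√k ≥ 1/√d with each chosen basis vector, and inner product 0 with the rest.

```python
import math

d, k = 16, 4                    # d a power of 2; sqrt(k) integral keeps coords rational
T = [0, 3, 9, 14]               # an arbitrary set of k basis-vector indices
x_T = [1 / math.sqrt(k) if i in T else 0.0 for i in range(d)]

assert abs(sum(c * c for c in x_T) - 1.0) < 1e-12     # ||x_T||_2 = 1
for i in range(d):
    inner = x_T[i]                                    # <x_T, e_i>
    if i in T:
        assert inner == 1 / math.sqrt(k)              # = 1/2 >= 1/sqrt(d) = 1/4
    else:
        assert inner == 0.0
```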
As stated in the preliminaries, we can view f as a family of functions F = {f_x : Y → {0, 1} | x ∈ X}, where f_x(y) = f(x, y).

Theorem 6 For every k ≤ d, the d-th shatter coefficient of F is at least (d choose k).

Proof For any subset T = {e_{i₁}, …, e_{i_k}} ⊆ E of k basis vectors, we define x_T to be the normalized average (1/√k)·Σ_{e∈T} e. From our assumptions on d and k, the coordinates of x_T are rational with size bounded above by B, so x_T ∈ X. The claim is that every d-bit string with exactly k 1s occurs in the truth table of F restricted to E. Consider any such string with 1s in positions i₁, …, i_k, and let T = {e_{i₁}, …, e_{i_k}}. Then for every e ∈ T,

⟨x_T, e⟩ = ⟨(1/√k)·Σ_{e′∈T} e′, e⟩ = 1/√k ≥ 1/√d,

so that f_{x_T}(e) = 1, while for every e ∉ T we have f_{x_T}(e) = 0, since e lies in a subspace orthogonal to x_T, and hence ⟨x_T, e⟩ = 0. Since there are (d choose k) such strings, the theorem follows. ∎

Corollary 7 For every constant δ < 1/2, the one-way communication complexity R_δ(f) of P(d) is Ω(d).

Proof We apply Theorem 4 with k = d. By Theorem 6 applied with subsets of size d/2, SC_d(F) ≥ (d choose d/2) ≥ 2^d/(d + 1). We deduce that there exists an input distribution μ such that D_{μ,δ}(f) is at least (1 − H(δ))·(d − log(d + 1)) − O(log d) = Ω(d). By the Yao minimax principle [12] the corollary follows. ∎

3.2 Embedding ℓ₂ into ℓ₁

Let γ = γ(ε) be a function to be specified later. Let φ : ℓ₂^d → ℓ₁^m be a (1 + γ)-distortion embedding with m = O(d·γ⁻²·log(1/γ)), as per Theorem 5. The embedding of [6] is deterministic, so Alice and Bob can construct the same embedding φ locally without any communication overhead.
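The embedding of [6] is deterministic and somewhat involved; as an empirical sanity check of the ℓ₂-into-ℓ₁ phenomenon one can instead use a random Gaussian map, relying on E|⟨g, v⟩| = √(2/π)·‖v‖₂ for a standard Gaussian vector g. This randomized stand-in (parameters and names ours) is only an illustration, not the construction the reduction uses.

```python
import math
import random

random.seed(42)
d, m = 10, 20000
# Random Gaussian map; each row contributes sqrt(pi/2)*|<g, v>|/m, whose
# expectation is ||v||_2 / m, so the l1 norm of phi(v) concentrates at ||v||_2.
A = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
scale = math.sqrt(math.pi / 2) / m

def phi(v):
    return [scale * sum(a * c for a, c in zip(row, v)) for row in A]

v = [random.gauss(0.0, 1.0) for _ in range(d)]
l2 = math.sqrt(sum(c * c for c in v))
l1 = sum(abs(c) for c in phi(v))
assert abs(l1 / l2 - 1.0) < 0.05     # empirically low distortion for large m
```

By linearity the same near-isometry holds for differences φ(u) − φ(v), which is the property the reduction needs.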
Let (x, y) be an instance of P(d), with Alice holding x and Bob holding y. Alice computes φ(x) and Bob computes φ(y). We will need the following connection between ℓ₂ distances and dot products. Using the relation ‖x − y‖₂² = ‖x‖₂² + ‖y‖₂² − 2⟨x, y⟩, the property that ‖x‖₂ = ‖y‖₂ = 1, and the promise on ⟨x, y⟩, we see that ‖x − y‖₂ = √2 when f(x, y) = 0, and ‖x − y‖₂ ≤ √(2 − 2/√d) when f(x, y) = 1. By the distance-distorting properties of φ (normalizing so that φ(0) = 0), we have:

(1)  1 − γ ≤ ‖φ(x)‖₁, ‖φ(y)‖₁ ≤ 1 + γ

(2)  ‖x − y‖₂ ≤ ‖φ(x) − φ(y)‖₁ ≤ (1 + γ)·‖x − y‖₂

3.3 Rational Approximation

We will need the coordinates of φ(x) and φ(y) to be rational. This will change ‖φ(x)‖₁, ‖φ(y)‖₁, and ‖φ(x) − φ(y)‖₁, but not by much if we choose a good rational approximation. We fix an integer parameter S = S(ε), to be specified later. Let [S] denote the set of nonnegative integers less than S. Let {z} denote the fractional part of a real number z, so that z − {z} is an integer. It easily follows that for any real number z there exists a unique q = q(z) ∈ [S] with 0 ≤ {z} − q/S < 1/S. For this value of q we define the rational approximation of z to be ρ(z) = (z − {z}) + q/S. For an m-dimensional real vector v = (v₁, …, v_m), we define ρ(v) = (ρ(v₁), …, ρ(v_m)). By our choice of rational approximation, each coordinate moves by less than 1/S, so:

(3)  ‖φ(x)‖₁ − m/S ≤ ‖ρ(φ(x))‖₁ ≤ ‖φ(x)‖₁ + m/S

(4)  ‖φ(y)‖₁ − m/S ≤ ‖ρ(φ(y))‖₁ ≤ ‖φ(y)‖₁ + m/S

(5)  ‖φ(y) − φ(x)‖₁ − m/S ≤ ‖ρ(φ(y)) − ρ(φ(x))‖₁

(6)  ‖ρ(φ(y)) − ρ(φ(x))‖₁ ≤ ‖φ(y) − φ(x)‖₁ + m/S

3.4 Reduction to Distinct Elements

Alice and Bob now convert their transformed inputs to integer vectors. To do this, they scale each coordinate by S, which scales the ℓ₁ norm of their vectors by S. For an m-dimensional vector v = (v₁, …, v_m), let s(v) denote the vector (S·v₁, …, S·v_m), so that Alice and Bob now hold s(ρ(φ(x))) and s(ρ(φ(y))) respectively.
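The rounding step can be checked numerically: rounding every coordinate down to the next multiple of 1/S perturbs each coordinate by less than 1/S, so ℓ₁ norms and ℓ₁ distances move by at most m/S, as in (3)-(6). The parameter values below are ours.

```python
import math
import random

S, m = 1000, 8                       # denominator S and dimension m (illustrative)

def rho(z):
    # round z down to the nearest multiple of 1/S
    return math.floor(z * S) / S

random.seed(1)
u = [random.uniform(-2, 2) for _ in range(m)]
v = [random.uniform(-2, 2) for _ in range(m)]
ru = [rho(z) for z in u]
rv = [rho(z) for z in v]

def l1(w):
    return sum(abs(c) for c in w)

assert all(-1e-9 < z - rho(z) < 1 / S for z in u + v)   # per-coordinate error
assert abs(l1(ru) - l1(u)) <= m / S                     # norms move by <= m/S, cf. (3),(4)
diff  = l1([a - b for a, b in zip(u, v)])
rdiff = l1([a - b for a, b in zip(ru, rv)])
assert abs(rdiff - diff) <= m / S                       # distances too, cf. (5),(6)
```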
The idea is to convert each of these integer coordinates to a unary representation, so that we can reduce the problem to computing Hamming distances between bit strings. Since ‖ρ(φ(x))‖₁ ≤ ‖φ(x)‖₁ + m/S ≤ 2, each coordinate of ρ(φ(x)) is rational with absolute value at most 2. Hence, each coordinate of s(ρ(φ(x))) is an integer with absolute value at most 2S. For each integer j with −2S ≤ j ≤ 2S, we define its unary equivalent u(j) to be a bit string of length 4S consisting of two halves of length 2S: if j ≥ 0 then u(j) = 1^j 0^(2S−j) 0^(2S), and if j < 0 then u(j) = 0^(2S) 1^|j| 0^(2S−|j|). Thus u(j) has exactly |j| 1s, placed in the first half when j is nonnegative and in the second half otherwise.

For an m-dimensional integer vector v = (v₁, …, v_m) with coordinates in the range [−2S, 2S], we define U(v) to be the concatenation (u(v₁), …, u(v_m)). For any two such vectors v, v′, it is easy to see that ‖v − v′‖₁ = Δ(U(v), U(v′)), where Δ refers to Hamming distance. Let x′ = U(s(ρ(φ(x)))) and y′ = U(s(ρ(φ(y)))). Note that both x′ and y′ are bit strings of length 4Sm.

Let wt(x′) denote the number of 1s in the bit string x′. Since wt(x′) = ‖s(ρ(φ(x)))‖₁ = S·‖ρ(φ(x))‖₁, and similarly for y′, combining (1), (3), and (4) we see that:

S(1 − γ) − m ≤ wt(x′) ≤ S(1 + γ) + m

S(1 − γ) − m ≤ wt(y′) ≤ S(1 + γ) + m

Also, since Δ(x′, y′) = ‖s(ρ(φ(y))) − s(ρ(φ(x)))‖₁ = S·‖ρ(φ(y)) − ρ(φ(x))‖₁, combining (2), (5), and (6), we observe:

(7)  S·‖x − y‖₂ − m ≤ Δ(x′, y′) ≤ S(1 + γ)·‖x − y‖₂ + m

We now transform these observations on Hamming distances into observations on the number of distinct elements of streams. Alice and Bob can pretend their bit strings x′ and y′ are the characteristic vectors of certain streams a_{x′} and a_{y′} in a universe of size 4Sm. There are an unbounded number of streams that Alice and Bob can choose from which have characteristic vectors x′ and y′.
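The unary map u(·) and the identity ‖v − v′‖₁ = Δ(U(v), U(v′)) can be verified exhaustively for a small S; the code is a direct transcription of the definition above.

```python
def unary(j, S):
    """Length-4S encoding of an integer j with |j| <= 2S: exactly |j| ones,
    placed in the first half when j >= 0 and in the second half when j < 0."""
    pos, neg = max(j, 0), max(-j, 0)
    return ([1] * pos + [0] * (2 * S - pos)) + ([1] * neg + [0] * (2 * S - neg))

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

S = 5
for j in range(-2 * S, 2 * S + 1):
    assert sum(unary(j, S)) == abs(j)          # wt(u(j)) = |j|
    for jp in range(-2 * S, 2 * S + 1):
        # Hamming distance between unary codes equals integer distance,
        # whether the two integers share a sign or not.
        assert hamming(unary(j, S), unary(jp, S)) == abs(j - jp)
```

Applied coordinate-wise, this is exactly why ℓ₁ distances between the scaled integer vectors become Hamming distances between x′ and y′.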
Alice and Bob choose two such streams arbitrarily, say a_{x′} and a_{y′}, which will be fed into local copies of an F₀-approximation algorithm. Let a_{x′} ∘ a_{y′} be the stream which is the concatenation of a_{x′} and a_{y′}, so that its characteristic vector is just the bitwise OR x′ ∨ y′ of x′ and y′.

Claim 8 Let x′, y′, a_{x′}, a_{y′} be as above, and suppose B₁ ≤ wt(x′), wt(y′) ≤ B₂ for some bounds B₁ = B₁(S) and B₂ = B₂(S). Then

B₁ + Δ(x′, y′)/2 ≤ F₀(a_{x′} ∘ a_{y′}) ≤ B₂ + Δ(x′, y′)/2.

Proof As noted above, F₀(a_{x′} ∘ a_{y′}) is just wt(x′ ∨ y′). Let c be the number of positions which are 1 in y′ but 0 in x′. Then Δ(x′, y′) = (wt(x′) − (wt(y′) − c)) + c = wt(x′) − wt(y′) + 2c, so that c = ½(Δ(x′, y′) + wt(y′) − wt(x′)), and wt(x′ ∨ y′) = wt(x′) + c = ½(Δ(x′, y′) + wt(x′) + wt(y′)). The claim follows from the bounds on wt(x′) and wt(y′). ∎

Now suppose ⟨x, y⟩ = 0. Then ‖x − y‖₂ = √2, so that Δ(x′, y′) ≥ S√2 − m by (7). By Claim 8, with B₁ = S(1 − γ) − m, we conclude

F₀(a_{x′} ∘ a_{y′}) ≥ S(1 − γ) − m + (S√2 − m)/2 = S(1 − γ + √2/2) − 3m/2.

On the other hand, if ⟨x, y⟩ ≥ 1/√d, we have ‖x − y‖₂ ≤ √(2 − 2/√d), so that Δ(x′, y′) ≤ S(1 + γ)·√(2 − 2/√d) + m by (7), and hence by Claim 8, with B₂ = S(1 + γ) + m,

F₀(a_{x′} ∘ a_{y′}) ≤ S(1 + γ)·(1 + √(2 − 2/√d)/2) + 3m/2.

It remains to choose S and γ so that these two cases can be distinguished by an (ε̃, δ)-approximation algorithm for distinct elements, where ε̃ = cε for some small constant c to be determined shortly. Suppose we have such an (ε̃, δ)-approximation algorithm.
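The identity at the heart of Claim 8, 2·F₀(a_{x′} ∘ a_{y′}) = Δ(x′, y′) + wt(x′) + wt(y′), can be checked directly on random bit strings:

```python
import random

def wt(v):
    return sum(v)

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

random.seed(0)
for _ in range(100):
    x = [random.randint(0, 1) for _ in range(40)]
    y = [random.randint(0, 1) for _ in range(40)]
    f0 = wt([a | b for a, b in zip(x, y)])   # F0 of the concatenated streams
    assert 2 * f0 == hamming(x, y) + wt(x) + wt(y)
```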
Back-substituting the bounds above into the requirement that a (1 ± ε̃)-approximation separate the two cases, we want:

(8)  (1 + ε̃)·[S(1 + γ)·(1 + √(2 − 2/√d)/2) + 3m/2] < (1 − ε̃)·[S(1 − γ + √2/2) − 3m/2],

for then any (ε̃, δ)-approximation to F₀(a_{x′} ∘ a_{y′}) determines which case holds, and hence the value of f(x, y), with error probability at most δ. Recall that 1/√d = ε, so √(2 − 2/√d) = √2·√(1 − ε) ≤ √2·(1 − ε/2); before the (1 ± ε̃) and (1 ± γ) factors are applied, the right-hand side therefore exceeds the left-hand side by at least S√2·ε/4 − 3m. If S ≥ C·m/ε for a sufficiently large constant C, the 3m/2 terms are a small fraction of this gap, and one finds after some algebra that it suffices to set γ = c′ε and ε̃ = cε for sufficiently small constants c′ and c.

Our computations mean that, with γ = Θ(ε), the embedding dimension is m = O(d·γ⁻²·log(1/γ)) = O(ε⁻⁴·log(1/ε)), and S = O(m/ε) = O(ε⁻⁵·log(1/ε)), so that the universe size is n = 4Sm = O(ε⁻⁹·log²(1/ε)). Hence an (Ω(ε), δ)-approximator for F₀ in a universe of this size can distinguish the above two cases with error probability at most δ, which means by the reduction that it must use Ω(d) = Ω(1/ε²) bits of space.

Setting n = Θ(ε⁻⁹·log²(1/ε)), we see that for all ε = Ω(n^(−1/(9+c))), for any constant c > 0, there is a space lower bound of Ω(1/ε²) on (ε, δ)-approximating the number of distinct elements in a universe of size n.
4 Conclusions

We have shown a tight space lower bound of Ω(1/ε²) on (ε, δ)-approximating the number of distinct elements in a universe of size n when ε = Ω(n^(−1/(9+c))), for any c > 0.

The upper bound of n^(1/(9+c)) on 1/ε can be somewhat relaxed by strengthening the analysis presented in this paper. For example, one could use a randomized embedding of the relevant ℓ₂ vectors into ℓ₁; this would give a smaller embedding dimension m and lead to a somewhat higher upper bound on 1/ε. Instead of following along these lines, we mention that, very recently, the second author managed to improve the upper bound on 1/ε to n^(1/2) [11], which is optimal. The approach of that paper is as follows. Observe that one can reformulate the reductions given in Section 3 as a method for constructing a set of vectors in the Hamming cube that yields a large bound on the relevant shatter coefficient. The set is constructed indirectly: first, the vectors are chosen in ℓ₂^d, then they are mapped into ℓ₁^m, and finally into the Hamming space. This indirect route blows up the dimension of the vectors by a polynomial factor. Instead, in [11], the vectors are constructed directly in the Hamming space via a (fairly involved) probabilistic argument.

5 Acknowledgments

For helpful discussions, we thank Johnny Chen, Ron Rivest, Naveen Sunkavally, and Hoeteck Wee. Many thanks also to Hoeteck for reading and commenting on previous drafts of this paper.

References

[1] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing, p. 20-29, 1996.

[2] A. Akella, A. R. Bharambe, M. Reiter, and S. Seshan. Detecting DDoS attacks on ISP networks. In Proceedings of the Workshop on Management and Processing of Data Streams, 2003.

[3] Z. Bar-Yossef. The complexity of massive data set computations. Ph.D. Thesis, U.C. Berkeley, 2002.

[4] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM 2002, 6th International Workshop on Randomization and Approximation Techniques in Computer Science, p. 1-10, 2002.

[5] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. Information theory methods in communication complexity. In Proceedings of the 17th IEEE Annual Conference on Computational Complexity, p. 93-102, 2002.

[6] T. Figiel, J. Lindenstrauss, and V. D. Milman. The dimension of almost spherical sections of convex bodies. Acta Mathematica, 139:53-94, 1977.

[7] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182-209, 1985.

[8] P. Indyk. Algorithmic applications of low-distortion geometric embeddings.
In Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer Science (invited talk), Las Vegas, Nevada, October 14-17, 2001.

[9] I. Kremer, N. Nisan, and D. Ron. On randomized one-round communication complexity. Computational Complexity, 8(1):21-49, 1999.

[10] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, XVI(2):264-280, 1971.

[11] D. P. Woodruff. Optimal space lower bounds for all frequency moments. Available: http://web.mit.edu/dpwood/www

[12] A. C.-C. Yao. Lower bounds by probabilistic arguments. In Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science, p. 420-428, 1983.