New Characterizations in Turnstile Streams with Applications
Yuqing Ai (Tsinghua University), Wei Hu (Tsinghua University), Yi Li (Facebook), David Woodruff (IBM Almaden)

Turnstile Streaming Model
• Underlying $n$-dimensional vector $x$ initialized to $0$
• Stream of updates $x \leftarrow x + e_i$ or $x \leftarrow x - e_i$ for a standard unit vector $e_i$
• At the end of the stream, $x \in \{-m, \dots, -1, 0, 1, \dots, m\}^n$
• Output an approximation to $f(x)$ w.h.p.
• Goal: use as little space (in bits) as possible

Example: Estimating the $\ell_2$-norm
• Output $Z$ with $(1-\varepsilon)\|x\|_2 \le Z \le (1+\varepsilon)\|x\|_2$
• Algorithm:
  1. Let $k = O(1/\varepsilon^2)$
  2. Choose a $k \times n$ matrix $A$ of i.i.d. sign random variables ($+1$ w.p. $1/2$, $-1$ w.p. $1/2$)
  3. Maintain $Ax$ in the stream
  4. Output $\|Ax\|_2 / \sqrt{k}$

Generic Form
• All known algorithms have the following generic form (linear sketch):
  1. Sample a random matrix $A$
  2. Maintain $Ax$ in the stream
  3. Output a function of $Ax$
Question: does the optimal algorithm for approximating any function in the turnstile model have this form?

The LNW Reduction
• Yes! [Li, Nguyễn, Woodruff '14]
• Theorem: for computing a function $f$ of $x \in \{-m, \dots, m\}^n$ in the turnstile model, there is a randomized algorithm which
  1. samples a matrix $A$ and a vector $q$ uniformly from $O(n \log m)$ instances
  2. maintains $(Ax \bmod q)$ in the stream
  3. outputs a function of $(Ax \bmod q)$
• Space complexity is optimal up to a constant factor (not including the $O(\log n + \log\log m)$ bits for randomness)

Consequence: Lower Bound Technique
Alice: input $x$, creates stream $\sigma(x)$. Bob: input $y$, creates stream $\sigma(y)$. Streaming algorithm $\mathcal{A}$:
1. Run $\mathcal{A}$ on $\sigma(x)$, send the state of $\mathcal{A}(\sigma(x))$ to Bob
2. Bob computes $\mathcal{A}(\sigma(x), \sigma(y))$, the state after $\sigma(x)$ followed by $\sigma(y)$
3. If Bob solves $g(x, y)$, the space complexity of $\mathcal{A}$ is at least the 1-way communication complexity of $g$

Consequence
Alice: input $x$, creates stream $\sigma(x)$. Bob: input $y$, creates stream $\sigma(y)$.
• The LNW reduction implies: if the players can solve $g(x, y)$, then the space of $\mathcal{A}$ is at least the simultaneous communication complexity of $g$
  ◦ A weaker model, in which Alice and Bob simultaneously send a message to a referee, who outputs the answer

Our Result
• Strengthen the LNW reduction in several respects:
  ◦ Remove the "box constraint"
  ◦ Generalize to the strict turnstile model
  ◦ Extend to multi-pass algorithms
• Obtain new tight lower bounds

Strengthen the LNW Reduction
• Remove the "box constraint"
• Generalize to the strict turnstile model
• Extend to multi-pass algorithms

The "Box Constraint"
• The LNW reduction requires the algorithm to be correct as long as $x \in \{-m, \dots, m\}^n$ at the end of the stream.
• While processing the stream, we may have $\|x\|_\infty \gg m$.
• The algorithm is not allowed to abort if this happens. It must still be correct at the end of the stream as long as $x \in \{-m, \dots, m\}^n$.
• More natural requirement: the algorithm only needs to be correct when $x$ belongs to $\{-m, \dots, m\}^n$ at all times in the stream.

Stream Automaton
[Figure: an automaton with a start state and transitions labeled by updates such as $+e_1$, $-e_1$, $+e_2$, $+e_5$, $\pm e_i$]

Path-Independent Automaton
• Every $x \in \mathbb{Z}^n$ corresponds to a unique state (every stream with net update $x$ ends in the same state)
[Figure: an automaton that is not path-independent: $x = 0$ is reached in two different states]
• Equivalent to $Ax \bmod q$

Zero-Frequency Graph
• For a stream $\sigma$, let $\mathrm{freq}\,\sigma \in \mathbb{Z}^n$ be the "net update" to all coordinates.
• Zero-freq graph: directed graph $G = (V, E)$
  ◦ $V$ = states of the automaton
  ◦ $(u, v) \in E$ if there exists a stream $\sigma$ such that $u \oplus \sigma = v$ and $\mathrm{freq}\,\sigma = 0$, where $u \oplus \sigma$ is the state reached from $u$ after processing $\sigma$
• Terminal equivalence class: a strongly connected component of $G$ with no outgoing edge
• A walk in $G$ is a sequence of zero-frequency streams

The LNW Reduction
$G$: zero-frequency graph of $\mathcal{A}_{\text{old}}$
• States of the new automaton $\mathcal{A}_{\text{new}}$ = terminal equivalence classes in $G$
• For a terminal equivalence class $C$ and an update $e_i$, define the transition as:
  ◦ Let $v \in C$ be an arbitrary node
  ◦ Compute $v \oplus e_i$ using the transition function of $\mathcal{A}_{\text{old}}$
  ◦ Walk from $v \oplus e_i$ in $G$ until reaching a terminal equivalence class $C'$
• $C'$ is unique
  ◦ It does not depend on $v$ or on the walk
[Figure: from $v$ in the terminal equivalence class $C$, the update $e_i$ followed by a stream $\sigma$ with $\mathrm{freq}(\sigma) = 0$ leads into the terminal equivalence class $C'$]

The Box Constraint
• For a stream $\sigma$, define $|\sigma|_{\max} = \max_{\tau \text{ prefix of } \sigma} \|\mathrm{freq}\,\tau\|_\infty$
• $\sigma = (w_1, w_2, \dots, w_k)$ on $\mathcal{A}_{\text{new}}$
• $\sigma' = (\dots, w_1, \dots, w_2, \dots, w_k, \dots)$ on $\mathcal{A}_{\text{old}}$
[Figure: $\sigma'$ interleaves the updates of $\sigma$ with zero-frequency streams $\tau_1, \tau_2, \dots, \tau_6, \dots$]
• The "…" parts $\tau_1, \tau_2, \dots$ are zero-frequency streams (walks in $G$)
• The length of $\tau_i$ could be very large
• When $|\sigma|_{\max} \le m$, $|\sigma'|_{\max}$ could be very large

Zero-Freq Stream Length
• $L$: upper bound on the lengths of the $\tau_i$'s
• $|\sigma|_{\max} \le m \implies |\sigma'|_{\max} \le m + L/2$
• Want $L \le m$
• Let $s$ = number of states in $\mathcal{A}_{\text{old}}$
Lemma: if there is a zero-frequency stream from $u$ to $v$, then there exists such a stream of length at most $\mathrm{poly}(ns) \cdot (s/n)^{n+1}$.
• $\implies L \le \mathrm{poly}(ns) \cdot (s/n)^{n+1}$

Tightness of Our Bound
• $L \le \mathrm{poly}(ns) \cdot (s/n)^{n+1}$
• Lower bound: $L \ge (s/n)^{\Omega(n)}$

Removing the Box Constraint
• Want $L \le m$
• $L \le \mathrm{poly}(ns) \cdot (s/n)^{n+1}$
• $L \le m \Leftarrow \mathrm{poly}(ns) \cdot (s/n)^{n+1} \le s^{cn} \le m \Leftarrow \log s \le \frac{\log m}{cn}$, where $\log s$ is the space of $\mathcal{A}_{\text{old}}$

Application: Counting
• $n = 1$
• Problem: output $|x|$ up to additive error $m/4$, while $x$ varies in $\{-m, \dots, m\}$
• $O(\log m)$-space algorithm
• Is there an $\Omega(\log m)$ lower bound?
  ◦ For insertion streams, no: approximate counting uses $O(\log\log m)$ bits
  ◦ For relative error, yes, but the proof doesn't apply
  ◦ For additive error… yes!
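To make the terminal-equivalence-class construction from the LNW reduction slides concrete, here is a minimal Python sketch on a hypothetical toy automaton. It assumes zero-frequency streams can be found by searching only streams whose prefix frequencies stay within a bound `bound`; that bound only keeps the search finite and is not part of the reduction. The names `zero_freq_reach`, `terminal_classes`, `new_transition`, `toy_delta` and `run` are illustrative, not from the paper.

```python
from itertools import product


def zero_freq_reach(states, delta, n, bound):
    """Map each state u to the states v reachable from u by a zero-frequency
    stream (net update 0): the out-neighbors of u in G, plus u itself.

    Assumption of this sketch: only streams whose prefix frequencies stay in
    [-bound, bound]^n are explored, so the search is finite.
    """
    updates = [(i, sign) for i in range(n) for sign in (+1, -1)]
    reach = {}
    for u in states:
        start = (u, (0,) * n)
        seen, stack = {start}, [start]
        while stack:
            state, freq = stack.pop()
            for i, sign in updates:
                nxt_freq = list(freq)
                nxt_freq[i] += sign
                if abs(nxt_freq[i]) > bound:
                    continue
                nxt = (delta(state, i, sign), tuple(nxt_freq))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # keep only end states of streams whose net update is zero
        reach[u] = {state for state, freq in seen if not any(freq)}
    return reach


def terminal_classes(states, reach):
    """Strongly connected components of G with no outgoing edge."""
    classes = []
    for u in states:
        scc = frozenset(v for v in reach[u] if u in reach[v])
        # terminal: everything reachable from u stays inside u's component
        if reach[u] <= scc and scc not in classes:
            classes.append(scc)
    return classes


def new_transition(C, i, sign, delta, reach, classes):
    """Transition of A_new on the update sign * e_i from terminal class C."""
    v = next(iter(C))               # an arbitrary node of C
    w = delta(v, i, sign)           # one step of A_old
    # terminal classes reachable from w by a walk in G
    targets = [D for D in classes if D & reach[w]]
    assert len(targets) == 1, "reduction expects a unique terminal class"
    return targets[0]


# Hypothetical toy automaton for n = 1: state (x mod 3, flag), where flag
# records whether any decrement has been seen. It is not path-independent:
# x = 0 ends in (0, 0) after the empty stream but in (0, 1) after (+e1, -e1).
def toy_delta(state, i, sign):
    r, flag = state
    return ((r + sign) % 3, 1 if sign < 0 else flag)


states = list(product(range(3), (0, 1)))
reach = zero_freq_reach(states, toy_delta, n=1, bound=2)
classes = terminal_classes(states, reach)


def run(stream):
    """Run A_new from the terminal class reachable from the start state (0, 0)."""
    C = next(D for D in classes if D & reach[(0, 0)])
    for i, sign in stream:
        C = new_transition(C, i, sign, toy_delta, reach, classes)
    return C


print(classes)
print(run([(0, +1)] * 4))
```

On this toy automaton the terminal classes are the three singletons $\{(r, 1)\}$, so $\mathcal{A}_{\text{new}}$ simply tracks $x \bmod 3$: a path-independent automaton of the form $Ax \bmod q$ with $A = [1]$ and $q = 3$.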
Application: Counting
• Condition for removing the box constraint: space $\le \frac{\log m}{cn} = \frac{\log m}{c}$, since $n = 1$
• Assume space $\le \frac{\log m}{c}$; otherwise we are done
• $Ax \bmod q = (a_1 x \bmod q_1, a_2 x \bmod q_2, \dots, a_r x \bmod q_r)$
  ◦ Show $\mathrm{lcm}(q_1, \dots, q_r) = \Omega(m)$
• Cannot distinguish $x, x + \mathrm{lcm}, x + 2 \cdot \mathrm{lcm}, \dots$
  ◦ If $\mathrm{lcm} \le m$, some multiple of $\mathrm{lcm}$ in $(m/2, m]$ has the same sketch as $0$, yet no single output is within $m/4$ of both answers
  ◦ Hence $\Omega(m)$ different states, i.e., $\Omega(\log m)$ space

Application: Norm Estimation
• Problem: for $x \in \{-m, \dots, m\}^n$, output $\|x\|_p$ up to additive error $n^{1/p} m / 4$
• $\Omega(\log m)$ space lower bound
• $O(\log m + \log\log n)$ space algorithm ($1 \le p \le 2$) [KNW'10]
• The lower bound is tight when $\log\log n = O(\log m) \iff n \le \exp(\mathrm{poly}(m))$

Strengthen the LNW Reduction: Generalize to the strict turnstile model

The Strict Turnstile Model
• The strict turnstile model: no negative coordinates, i.e., $x_i \ge 0$ at all times in the stream
• Dynamic graph streams: insertions and deletions of edges
  ◦ Multi-graphs are allowed, but no negative edges
• Generalize the LNW reduction to the strict turnstile model:
  ◦ $L$: upper bound on the length of zero-frequency streams
  ◦ Initialize all coordinates of $x$ to $L$
  ◦ Now the reduction guarantees that $x$ is always nonnegative
  ◦ Subtract $L$ from all coordinates at the end of the stream

Application: Maximum Matching
• [AKLY'16]: for outputting an $n^\varepsilon$-approximate maximum matching, the space is $\Theta(n^{2-3\varepsilon})$
  ◦ The lower bound holds only in the simultaneous communication model
• We can apply our reduction, so the lower bound also holds in dynamic graph streams

Strengthen the LNW Reduction: Extend to multi-pass algorithms

Multi-Pass Algorithms
• $t$-pass automaton
  ◦ After the $i$-th pass ($i < t$), output an automaton $\mathcal{A}_{i+1}$
  ◦ Run $\mathcal{A}_{i+1}$ on the input stream in the $(i+1)$-st pass
  ◦ After the $t$-th pass, output the answer
• Theorem: there is a $t$-pass automaton in which the automaton used in each pass is path-independent
  ◦ Its space is optimal up to a constant factor

Conclusions
• New progress on characterizing turnstile streaming algorithms as linear sketches
• Applications
  ◦ Optimal lower bounds for counting with additive error and for maximum matching in dynamic graph streams
• Open questions
  ◦ Box constraint
  ◦ After removing the box constraint, we still have very long streams
  ◦ Better reduction?

Thank you!
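The lcm step of the counting application ($n = 1$, sketch $Ax \bmod q = (a_1 x \bmod q_1, \dots, a_r x \bmod q_r)$) can be checked on toy parameters. Below is a minimal Python sketch, assuming a single fixed choice of $(a, q)$ and ignoring the randomness over instances. Because the sketch is periodic in $x$ with a period dividing $\mathrm{lcm}(q_1, \dots, q_r)$, any correct sketch must have $\mathrm{lcm} > m$, so storing the residues takes at least $\log_2 m$ bits. The helper names `sketch` and `colliding_pair` are hypothetical.

```python
from math import lcm  # math.lcm with several arguments needs Python 3.9+


def sketch(x, a, q):
    """State of a 1-dimensional sketch: (a_j * x mod q_j) for each j."""
    return tuple((aj * x) % qj for aj, qj in zip(a, q))


def colliding_pair(a, q, m):
    """Return (0, y) with m/2 < y <= m and sketch(0) == sketch(y), else None.

    The sketch is periodic in x with a period dividing L = lcm(q_1, ..., q_r),
    so every multiple of L collides with 0; if L <= m, some multiple of L
    lies in (m/2, m].
    """
    L = lcm(*q)
    for y in range(L, m + 1, L):
        if 2 * y > m:
            assert sketch(0, a, q) == sketch(y, a, q)
            return 0, y
    return None


m = 100
# lcm(5, 7) = 35 <= m: x = 0 and x = 70 share a state, but |x| differs by
# 70 > m/2, so no single output is within m/4 of both answers.
print(colliding_pair((1, 1), (5, 7), m))
# lcm(11, 13) = 143 > m: no collision of this form inside {0, ..., m}.
print(colliding_pair((1, 1), (11, 13), m))
```

The first call returns `(0, 70)` and the second returns `None`. The check only covers collisions with $x = 0$, which is enough for the contradiction sketched in the counting argument.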