Event Stream Processing with Out-of-Order Data Arrival
Mo Liu
Database System Research Group
Worcester Polytechnic Institute

[Figure: Total Latency Comparison (Revision_Disk vs Revision_Disk_Batch). x-axis: Out-of-Order Event Percentage; y-axis: Total Latency.]
[Figure: Total Latency Comparison (Revision_Memory vs Punctuation_Memory). x-axis: Out-of-Order Event Percentage; y-axis: Total Latency.]
[Figure: Performance Gain in Execution Time. x-axis: Punctuation Generation Rate; y-axis: Execution Time.]
[Figure: Performance Gain in State Size. x-axis: Punctuation Generation Rate; y-axis: Operator State Size.]

Outline
  Introduction
  Problem with Out-of-Order Event Arrival
  Solutions
  Conclusion
  Related Work

Introduction: Event Stream Processing
  Rising interest in the database community
  Wide-ranging and growing applications
  Retail Management

Introduction: Algebraic Query Plan
  Q: EVENT WSeq (S, !C, E) WHERE s.id = c.id AND c.id = e.id WITHIN 10 mins
  S: Shelf reading
  C: Checkout counter reading
  E: Exit reading
  Query plan, from input to output:
    Input Event Stream
    WinSeq (S, E, 10)
    WinNeg (!C, 10) (S.ts < C.ts < E.ts)
    Sel (s.id = c.id AND c.id = e.id)

Problems with Out-of-Order Arrivals
  Total order assumption in event arrivals
    The order in which events are received by the query system is the same as their timestamp order.
    Under this assumption, "later arrival" means "larger timestamp".
  In the case of out-of-order event arrival
    Unbounded operator
    Blocking operator

Problem with OOO: Purge in WinSeq
  EVENT SEQ(S, !C, E) WITHIN 10 mins
  On seeing e15, the operator purges s3, and so on.
  After that, the out-of-order (OOO) event e4 arrives, but its match s3 is already gone.
  State can no longer be purged safely: unbounded operator!
  [Figure: S and E stacks; E entries annotated with their S event: (s3) e4, (s3) e6, (s7) e11, (s7) e15.]

Problem with OOO: Output in WinNeg
  OOO c5 will cancel "s3 e6".
  Results cannot be output safely until no more C events can arrive: blocking operator!
  Arrival order (subscripts are timestamps): s3, e6, s7, c10, e11, e15, e4, c5, …

Two Solution Frameworks
  Conservative solution
    Exploits partial order guarantees (POGs) to produce permanently correct results.
  Aggressive solution
    Outputs sequence results immediately and corrects them with compensation tuples when out-of-order events arrive.

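The purge problem in WinSeq described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the operator from the slides: the names `Event` and `naive_winseq` are hypothetical, the negation `!C` is left out for brevity, and the purge rule is a simplified stand-in that drops an S event once an E event more than 10 minutes later has been seen.

```python
from dataclasses import dataclass

WINDOW = 10  # minutes, from "WITHIN 10 mins"


@dataclass(frozen=True)
class Event:
    kind: str  # "S" or "E" (the negated C events are omitted in this sketch)
    ts: int    # application timestamp


def naive_winseq(arrivals, window=WINDOW):
    """Match SEQ(S, E) while purging S state under the total order assumption.

    Hypothetical helper: it assumes arrival order equals timestamp order,
    so any S event older than the window is dropped as soon as a newer E
    event is seen.
    """
    s_state = []
    results = []
    for ev in arrivals:
        if ev.kind == "S":
            s_state.append(ev)
        elif ev.kind == "E":
            # Purge: assumed safe because no later arrival should be older.
            s_state = [s for s in s_state if ev.ts - s.ts <= window]
            for s in s_state:
                if s.ts < ev.ts:
                    results.append((s.ts, ev.ts))
    return results


# Arrival order from the slide, without the C events: e4 arrives last.
arrivals = [Event("S", 3), Event("E", 6), Event("S", 7),
            Event("E", 11), Event("E", 15), Event("E", 4)]

out_of_order = naive_winseq(arrivals)
in_order = naive_winseq(sorted(arrivals, key=lambda e: e.ts))

print("out-of-order run:", out_of_order)
print("in-order run:    ", in_order)
print("lost matches:    ", set(in_order) - set(out_of_order))
```

In this run, s3 is purged when e15 arrives, so the late e4 finds no partner and the match (s3, e4) is lost. Keeping s3 indefinitely would avoid the loss but lets the state grow without bound, which is the "unbounded operator" problem.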
Conservative Method
  [Figure: Source1 … SourceN send data and metadata t over the network to the query processor; metadata t is used to unblock operators and to purge operator state.]

POG: Partial Order Guarantee
  POG <Pi, ts>
    Event type: Pi
    Timestamp: ts
  Example: <D, 6>
    No more D events with a timestamp smaller than 6 will arrive.
  [Figure: timeline of a and d events followed by the POG <D, 6>.]

POG Functionalities in WinSeq
  Insert: sort semantics
  Compute: backward and forward computation
  Purge algorithm: singleton purge using POGs
  [Figure: stream of events e interleaved with POGs P.]

POG Functionalities in WinNeg
  EVENT SEQ(E1, E2, ..., Ei, !NE, Ej, ..., Em) WITHIN 10 mins
  Insert: a holding set stores the spurious results.
  Compute: given a POG P<NE, ts>, WinNeg processes a held sequence e1 e2 ... ei, ej, ... em when ej.ts < P.ts.

Aggressive Solution
  Assumption: most data arrives on time and in order.
  Goal: send out results with small latency, and send out compensation tuples when out-of-order data arrives.

Compensation Tuple
  Insertion message (+, Seq): induced by an out-of-order positive event, where "Seq" is a new sequence.
  Deletion message (-, Seq): induced by an out-of-order negative event, where "Seq" is a previously processed sequence.

Functionalities
  EVENT SEQ(S, !C, E) WITHIN 10 mins
  Compute results
  [Figure: WinSeq outputs s3 e6; the out-of-order e4 induces <+, s3 e4>; the out-of-order c5 induces <-, s3 e6> in WinNeg.]
  Arrival order (subscripts are timestamps): s3, e6, s7, c10, c11, e15, c5, e4, …

Comparison
                        Conservative    Aggressive
  Latency               High            Low
  Correctness           Permanent       Eventual
  Memory consumption    Low             High

Conclusion
  Problem analysis
    Blocking operator
    Unbounded operator
  Two solution frameworks
    Pessimistic (conservative) solution
    Optimistic (aggressive) solution
  Experiments are ongoing…

Related Work
  Borealis system: http://www.cs.brown.edu/research/borealis/public
  CEDR system: http://research.microsoft.com/db/CEDR
  TuckerMSF03
  UJ04
  …

Happy Summer!