Disk - Worcester Polytechnic Institute

Event Stream Processing
with Out-of-Order Data Arrival
Mo Liu
Database System Research Group
Worcester Polytechnic Institute
[Figures: Total Latency Comparison (Revision_Disk vs Revision_Disk_Batch), total latency vs. out-of-order event percentage; Total Latency Comparison (Revision_Memory vs Punctuation_Memory), total latency vs. out-of-order event percentage; Performance Gain in Execution Time, execution time vs. punctuation generation rate; Performance Gain in State Size, operator state size vs. punctuation generation rate]
Outline
Introduction
Problem with Out-of-Order Event Arrival
Solutions
Conclusion
Related Work
Introduction: Event Stream Processing


Rising interest in the database community
Wide range of growing applications
Retail Management
Introduction: Algebraic Query Plan
Q: EVENT SEQ(S, !C, E)
   WHERE s.id = c.id AND c.id = e.id
   WITHIN 10 mins
S: Shelf reading
C: Checkout counter reading
E: Exit reading
Query plan (top to bottom):
Sel (s.id = c.id AND c.id = e.id)
WinNeg (!C, 10) (S.ts < C.ts < E.ts)
WinSeq (S, E, 10)
Input Event Stream
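The plan above can be sketched for the in-order case, where arrival order equals timestamp order. This is a minimal sketch, not the system's implementation: the `Event` record, its `item_id` field and the operator functions are hypothetical, and the sketch evaluates the c.id test inside WinNeg because a negated C event never appears in an output tuple.

```python
from dataclasses import dataclass
from typing import List, Tuple

WINDOW = 10  # WITHIN 10 mins; timestamps assumed to be in minutes

@dataclass(frozen=True)
class Event:
    etype: str    # "S" (shelf), "C" (checkout counter) or "E" (exit)
    ts: int       # event timestamp
    item_id: int  # hypothetical tag id compared by the WHERE clause

Pair = Tuple[Event, Event]

def win_seq(stream: List[Event]) -> List[Pair]:
    """WinSeq(S, E, 10): pair every E with each earlier S inside the window."""
    seen_s: List[Event] = []
    pairs: List[Pair] = []
    for ev in stream:  # relies on arrival order == timestamp order
        if ev.etype == "S":
            seen_s.append(ev)
        elif ev.etype == "E":
            pairs.extend((s, ev) for s in seen_s
                         if s.ts < ev.ts and ev.ts - s.ts <= WINDOW)
    return pairs

def win_neg(pairs: List[Pair], stream: List[Event]) -> List[Pair]:
    """WinNeg(!C, 10): drop (s, e) if a C for the same item has s.ts < c.ts < e.ts."""
    checkouts = [ev for ev in stream if ev.etype == "C"]
    return [(s, e) for s, e in pairs
            if not any(s.ts < c.ts < e.ts and c.item_id == s.item_id
                       for c in checkouts)]

def sel(pairs: List[Pair]) -> List[Pair]:
    """Sel: keep pairs whose S and E refer to the same item."""
    return [(s, e) for s, e in pairs if s.item_id == e.item_id]

def run_query(stream: List[Event]) -> List[Pair]:
    return sel(win_neg(win_seq(stream), stream))

stream = [Event("S", 1, 7), Event("S", 2, 8), Event("C", 3, 7),
          Event("E", 6, 7), Event("E", 7, 8), Event("E", 20, 8)]
for s, e in run_query(stream):
    print(f"item {s.item_id}: shelf at {s.ts}, exit at {e.ts}")
```

Item 7 passes the checkout counter, so only item 8 (shelf at 2, exit at 7) is reported; the exit at 20 falls outside the 10-minute window.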
Problems with Out-of-Order Arrivals
Total order assumption on event arrivals:
The order in which events are received by the query system is the same as their timestamp order.
Under this assumption, "later arrival" means "larger timestamp".
In the case of out-of-order event arrival, operators can become:
Unbounded operator
Blocking operator
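The total order assumption can be checked mechanically: an arrival violates it when its timestamp is smaller than the largest timestamp already received. The helper below is a hypothetical illustration; the arrival sequence is the deck's running example, read as (event type, timestamp) pairs.

```python
from typing import List, Tuple

def find_out_of_order(arrivals: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the events whose timestamp is smaller than that of an earlier arrival.

    Under the total order assumption this list is always empty."""
    late: List[Tuple[str, int]] = []
    max_ts = float("-inf")
    for name, ts in arrivals:
        if ts < max_ts:
            late.append((name, ts))  # arrived later but carries a smaller timestamp
        else:
            max_ts = ts
    return late

arrivals = [("s", 3), ("e", 6), ("s", 7), ("c", 10), ("e", 11), ("e", 15), ("e", 4), ("c", 5)]
print(find_out_of_order(arrivals))  # [('e', 4), ('c', 5)]
```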
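Problem with OOO: An Example
EVENT SEQ(S, !C, E)
WITHIN 10 mins
Arrival order (event type and timestamp): s3, e6, s7, c10, e11, e15, e4, c5, …
WinSeq state: stack S1 holds s3 and s7; stack S2 holds e4 (s3), e6 (s3), e11 (s7), e15 (s7), where the parenthesized event is the most recent S event preceding each E event.
WinSeq results: s3 e4, s3 e6, …, s3 e11, s3 e12
Purge in WinSeq: after seeing e15, s3 is purged, and so on. After that, the out-of-order event e4 arrives and its match with s3 is lost. Unbounded operator!
Output in WinNeg: the out-of-order event c5 cancels the result "s3 e6", so a result cannot be output safely until no such C event can still arrive. Blocking operator!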
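Both problems can be reproduced on the example stream with a few lines. This is a hypothetical sketch: events are reduced to timestamps, and the purge rule (drop S events older than the largest timestamp seen minus the window) is a stand-in for a purge that is only correct under the total order assumption.

```python
from typing import List, Tuple

WINDOW = 10
Pair = Tuple[int, int]  # (s.ts, e.ts)
# (event type, timestamp) in arrival order; e4 and c5 arrive out of order
ARRIVALS = [("s", 3), ("e", 6), ("s", 7), ("c", 10), ("e", 11), ("e", 15), ("e", 4), ("c", 5)]

def win_seq(arrivals: List[Tuple[str, int]], purge: bool) -> List[Pair]:
    """SEQ(S, E) WITHIN WINDOW; optionally purge S using the largest timestamp seen."""
    s_state: List[int] = []
    pairs: List[Pair] = []
    max_ts = float("-inf")
    for etype, ts in arrivals:
        max_ts = max(max_ts, ts)
        if purge:
            # safe only if no E with a smaller timestamp can still arrive
            s_state = [s for s in s_state if max_ts - s <= WINDOW]
        if etype == "s":
            s_state.append(ts)
        elif etype == "e":
            pairs.extend((s, ts) for s in s_state if s < ts <= s + WINDOW)
    return pairs

def win_neg(pairs: List[Pair], arrivals: List[Tuple[str, int]]) -> List[Pair]:
    """Remove pairs that have a C event strictly between S and E."""
    cs = [ts for etype, ts in arrivals if etype == "c"]
    return [(s, e) for s, e in pairs if not any(s < c < e for c in cs)]

# Unbounded operator: purging s3 after e15 loses the match (s3, e4).
print((3, 4) in win_seq(ARRIVALS, purge=True))   # False
print((3, 4) in win_seq(ARRIVALS, purge=False))  # True, but state is never purged

# Blocking operator: (s3, e6) looks valid until c5 arrives.
before_c5 = ARRIVALS[:7]
print((3, 6) in win_neg(win_seq(before_c5, purge=False), before_c5))  # True
print((3, 6) in win_neg(win_seq(ARRIVALS, purge=False), ARRIVALS))    # False
```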
Two Solution Frameworks
Conservative solution: exploits partial order guarantees (POGs) to produce permanently correct results.
Aggressive solution: outputs sequence results immediately and uses a compensation technique to correct them when out-of-order data arrives.
Conservative method
Sources (Source1 … SourceN) send data and metadata t over the network to the query processor.
The query processor uses metadata t to unblock operators and to purge operator state.
POG: Partial Order Guarantee
POG <Pi, ts>: event type Pi, timestamp ts
Example: <D, 6> means no more out-dated D events with timestamp smaller than 6 will come.
[Figure: a and d events along a timestamp axis, with the POG <D,6> marked]
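A POG can be represented as a pair and tracked per event type. The `POG` and `POGTracker` names below are hypothetical; the sketch assumes that a newer POG for the same type never weakens an older one, so only the largest timestamp is kept.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class POG:
    event_type: str  # Pi
    ts: int          # no more Pi events with a timestamp smaller than ts will come

class POGTracker:
    """Remembers, per event type, the largest POG timestamp received so far."""

    def __init__(self) -> None:
        self._bound: Dict[str, float] = {}

    def receive(self, pog: POG) -> None:
        current = self._bound.get(pog.event_type, float("-inf"))
        self._bound[pog.event_type] = max(current, pog.ts)  # keep the strongest guarantee

    def may_still_arrive(self, event_type: str, ts: int) -> bool:
        """False if a received POG rules out this (type, timestamp) combination."""
        return ts >= self._bound.get(event_type, float("-inf"))

tracker = POGTracker()
tracker.receive(POG("D", 6))
print(tracker.may_still_arrive("D", 5))  # False: ruled out by <D, 6>
print(tracker.may_still_arrive("D", 6))  # True
print(tracker.may_still_arrive("A", 1))  # True: no POG for A yet
```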
POG Functionalities in WinSeq
Insert: sort semantics
Compute: backward and forward computation
Purge: Singleton Purge algorithm using POGs
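The three functionalities can be illustrated for SEQ(S, E) WITHIN 10. This is a hypothetical sketch, not the deck's Singleton Purge algorithm: state is kept sorted with `bisect.insort`, a new E matches stored S events and a late S matches stored E events (our reading of backward and forward computation), and the purge rule is our own. An S at time s is dropped once the POG on E passes s + 10, and an E at time e is dropped once the POG on S reaches e.

```python
import bisect
from typing import Dict, List, Tuple

WINDOW = 10

class PogWinSeq:
    """Hypothetical SEQ(S, E) WITHIN WINDOW operator with sorted state and POG purging."""

    def __init__(self) -> None:
        self.s_ts: List[int] = []  # S instances, sorted by timestamp
        self.e_ts: List[int] = []  # E instances, sorted by timestamp
        self.pog: Dict[str, float] = {"S": float("-inf"), "E": float("-inf")}

    def insert(self, etype: str, ts: int) -> List[Tuple[int, int]]:
        """Insert in timestamp order and return the new (s, e) matches."""
        if etype == "S":
            bisect.insort(self.s_ts, ts)
            # a (possibly late) S can match E events that are already stored
            return [(ts, e) for e in self.e_ts if ts < e <= ts + WINDOW]
        bisect.insort(self.e_ts, ts)
        # a new E matches earlier S events within the window
        return [(s, ts) for s in self.s_ts if ts - WINDOW <= s < ts]

    def on_pog(self, etype: str, ts: int) -> None:
        """Record POG <etype, ts> and purge state that can no longer match."""
        self.pog[etype] = max(self.pog[etype], ts)
        # future E events have ts >= pog["E"]; s can match only if s + WINDOW reaches it
        self.s_ts = [s for s in self.s_ts if s + WINDOW >= self.pog["E"]]
        # future S events have ts >= pog["S"]; e can match only if e is larger
        self.e_ts = [e for e in self.e_ts if e > self.pog["S"]]

op = PogWinSeq()
for etype, ts in [("S", 3), ("E", 6), ("S", 7), ("E", 15)]:
    print(etype, ts, op.insert(etype, ts))
op.on_pog("E", 14)  # no E older than 14 will come, so s3 can never match again
print(op.s_ts)      # [7]
```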
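POG Functionalities in WinNeg
EVENT SEQ(E1, E2, ..., Ei, !NE, Ej, ..., Em)
WITHIN 10 mins
Insert: a holding set stores the potentially spurious results.
Compute: on receiving POG P<NE, ts>, WinNeg outputs a held result e1 e2 ... ei ej ... em if ej.ts < P.ts, because no NE event that could fall between ei and ej can still arrive.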
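A holding set with POG-driven release can be sketched as follows. The `HoldingWinNeg` class is hypothetical and simplifies each candidate result to the timestamps of Ei and Ej; purging the list of seen NE events is left out.

```python
from typing import List, Tuple

class HoldingWinNeg:
    """Hypothetical WinNeg for SEQ(..., Ei, !NE, Ej, ...) with a holding set."""

    def __init__(self) -> None:
        self.holding: List[Tuple[int, int]] = []  # (ei.ts, ej.ts), possibly spurious
        self.ne_ts: List[int] = []                # NE events seen so far

    def insert_candidate(self, ei_ts: int, ej_ts: int) -> None:
        # hold the candidate unless an already-seen NE lies between ei and ej
        if not any(ei_ts < n < ej_ts for n in self.ne_ts):
            self.holding.append((ei_ts, ej_ts))

    def insert_negative(self, ne_ts: int) -> None:
        self.ne_ts.append(ne_ts)
        # a (late) NE between ei and ej shows that a held candidate is spurious
        self.holding = [(a, b) for a, b in self.holding if not a < ne_ts < b]

    def on_pog(self, pog_ts: int) -> List[Tuple[int, int]]:
        """POG <NE, pog_ts>: release every held result with ej.ts < pog_ts."""
        ready = [(a, b) for a, b in self.holding if b < pog_ts]
        self.holding = [(a, b) for a, b in self.holding if b >= pog_ts]
        return ready

neg = HoldingWinNeg()
neg.insert_candidate(3, 6)
neg.insert_candidate(7, 15)
neg.insert_negative(5)  # late NE at 5 falls between 3 and 6
print(neg.on_pog(8))    # []: 15 is not < 8, keep holding
print(neg.on_pog(16))   # [(7, 15)]
```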
Aggressive solution
Assumption: most data arrives on time and in order.
Goal: send out results with low latency, and send out compensation tuples when out-of-order data arrival occurs.
Compensation tuples:
Insertion message (+, Seq): induced by an out-of-order positive event, where "Seq" is a new sequence.
Deletion message (-, Seq): induced by an out-of-order negative event, where "Seq" is a previously processed sequence.
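On the consumer side, compensation tuples can be replayed into a result set, which is how results become eventually correct. The encoding of a sequence as a tuple of event names and the function name are hypothetical.

```python
from typing import Iterable, Set, Tuple

Sequence = Tuple[str, ...]      # e.g. ("s3", "e6")
Message = Tuple[str, Sequence]  # ("+", seq) or ("-", seq)

def apply_compensations(messages: Iterable[Message]) -> Set[Sequence]:
    """Replay insertion and deletion messages into the current result set."""
    results: Set[Sequence] = set()
    for sign, seq in messages:
        if sign == "+":
            results.add(seq)      # insertion message: a new sequence
        elif sign == "-":
            results.discard(seq)  # deletion message: retract an earlier sequence
        else:
            raise ValueError(f"unknown message sign: {sign!r}")
    return results

msgs = [("+", ("s3", "e6")), ("+", ("s3", "e4")), ("-", ("s3", "e6"))]
print(apply_compensations(msgs))  # {('s3', 'e4')}
```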
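Functionalities
EVENT SEQ(S, !C, E)
WITHIN 10 mins
Compute: the result s3 e6 has already been output.
When the out-of-order events e4 and c5 arrive:
WinSeq outputs <+, s3 e4> for the late positive event e4.
WinNeg outputs <-, s3 e6> for the late negative event c5, since s3 < c5 < e6.
[Figure: s, e and c events plotted along a timestamp axis]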
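The producer side of this example can be sketched as an operator that never blocks. `AggressiveSeqNeg` is hypothetical; it keeps all events (reflecting the high memory cost of the aggressive approach) and, for uniformity, tags in-order results with "+" as well.

```python
from typing import List, Tuple

WINDOW = 10
Pair = Tuple[int, int]      # (s.ts, e.ts)
Message = Tuple[str, Pair]  # ("+" or "-", sequence)

class AggressiveSeqNeg:
    """Hypothetical SEQ(S, !C, E) WITHIN WINDOW operator that outputs immediately."""

    def __init__(self) -> None:
        self.s: List[int] = []
        self.c: List[int] = []
        self.e: List[int] = []
        self.emitted: List[Pair] = []  # sequences currently reported as results

    def _valid(self, s: int, e: int) -> bool:
        return s < e <= s + WINDOW and not any(s < c < e for c in self.c)

    def process(self, etype: str, ts: int) -> List[Message]:
        out: List[Message] = []
        new: List[Pair] = []
        if etype == "s":
            self.s.append(ts)
            new = [(ts, e) for e in self.e if self._valid(ts, e)]
        elif etype == "e":
            self.e.append(ts)
            new = [(s, ts) for s in self.s if self._valid(s, ts)]
        else:  # negative event: retract the results it invalidates
            self.c.append(ts)
            for seq in [q for q in self.emitted if q[0] < ts < q[1]]:
                self.emitted.remove(seq)
                out.append(("-", seq))
        for seq in new:
            self.emitted.append(seq)
            out.append(("+", seq))
        return out

op = AggressiveSeqNeg()
for etype, ts in [("s", 3), ("e", 6), ("e", 4), ("c", 5)]:
    print(etype, ts, op.process(etype, ts))
```

The late e4 yields <+, (3, 4)> and the late c5 yields <-, (3, 6)>, matching the messages on the slide.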
Comparison
                     Conservative   Aggressive
Latency              High           Low
Correctness          Permanent      Eventual
Memory consumption   Low            High
CONCLUSION
Problem analysis:
Blocking operator
Unbounded operator
Two solution frameworks:
Pessimistic (conservative) solution
Optimistic (aggressive) solution
Experiments are ongoing…
Related Work
Borealis system: http://www.cs.brown.edu/research/borealis/public
CEDR system: http://research.microsoft.com/db/CEDR
TuckerMSF03
UJ04
…

Happy Summer!
