Energy-efficient Polling Protocols in RFID Systems

Measuring Big Network Data
Shigang Chen
Computer & Information Science & Engineering
University of Florida, Gainesville, FL, USA
If we can learn the traffic, …
Network Access Pattern
Traffic Control and Filtering
Capacity Planning
Accounting and Billing
Worm Detection
Scanner Detection
Discover Bottleneck
Service Provision
Anomaly Detection
Traffic Engineering
Big Network Data
• Annual global IP traffic will pass the zettabyte threshold by 2016.
• The Cisco CRS-3 has a capacity of 322 terabits per second.
They are fast
They are small …
Data volume is huge …
Counting Big Network Data
• Number of distinct elements per flow, i.e., flow cardinality
  ◦ Per-source flow, scan detection.
  ◦ Per-destination flow, per-source/destination flow, TCP flow, WWW flow, P2P flow, other application-specific flow, etc.
• Doing so online in a network processor is a challenge.
Counting Big Edge Data
• Google
  ◦ over 40,000 search queries every second
  ◦ over 3.5 billion searches per day
  ◦ 1.2 trillion searches per year worldwide
• For each searched keyword/phrase/question, how many distinct users make the query?
  ◦ Query suggestion, caching, trend
• Data center or …
  ◦ 1 bit per keyword/phrase, 1/10 bit per keyword/phrase
• commodity computer
On-chip Implementation
• Data volume huge
• Space limited and shared …

Routing
Packet scheduling
Access control
Quality of service
Packet inspection and classification
Intrusion detection
Traffic measurement
  ◦ Per-destination flow, per-source/destination flow, TCP flow, WWW flow, P2P flow, other application-specific flow, etc.
Challenge
• Available on-chip memory: 8 Mb
• Concurrent flows: 8M
• Number of bits per flow: 1
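The one-bit-per-flow figure is plain division of the memory budget by the flow count. A minimal sketch of that arithmetic, assuming decimal units (8 Mb = 8,000,000 bits, 8M = 8,000,000 flows). The 32-bit counter used for comparison is a hypothetical baseline, not something the slides propose.

```python
# Budget from the slide; decimal units are an assumption.
ON_CHIP_BITS = 8_000_000       # 8 Mb of on-chip memory
CONCURRENT_FLOWS = 8_000_000   # 8M concurrent flows


def bits_per_flow(memory_bits: int, flows: int) -> float:
    """Average number of bits available to each flow."""
    return memory_bits / flows


def memory_needed(flows: int, bits_each: int) -> int:
    """Total bits required if every flow gets its own bits_each-bit structure."""
    return flows * bits_each


budget = bits_per_flow(ON_CHIP_BITS, CONCURRENT_FLOWS)

# Hypothetical baseline: one dedicated 32-bit counter per flow.
baseline = memory_needed(CONCURRENT_FLOWS, 32)

print(f"bits per flow: {budget}")
print(f"32-bit counters would need {baseline // ON_CHIP_BITS}x the on-chip memory")
```

Under these assumptions a dedicated per-flow structure of any useful width does not fit, which is why the later slides turn to sharing bits among flows.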
Challenge
• Measurement of flow size
• Measurement of flow cardinality
  ◦ Number of distinct elements in a flow
  ◦ Duplicate removal
  ◦ Store elements or use bitmaps
Per-flow spread: bit vector for each flow
[Figure: packets p1, p2, p3, p4, p5, p6, … carry elements e1, e2, e3, e1, e1, e4, …, which are mapped to a bit vector of m bits.]
Vm: fraction of bits that are zeros
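One standard way to turn the zero fraction Vm into a cardinality estimate is linear counting, n ≈ -m ln(Vm). The slide does not say which estimator is intended, so the formula, the SHA-256 hash, and the function names below are assumptions made for illustration.

```python
import hashlib
import math


def bit_index(element: str, m: int) -> int:
    """Map an element to one of m bit positions (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(element.encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def estimate_cardinality(elements, m: int = 1024) -> float:
    """Linear-counting estimate of the number of distinct elements."""
    bits = [0] * m
    for e in elements:
        # A duplicate element hits the same bit again, so it changes nothing.
        bits[bit_index(e, m)] = 1
    zeros = bits.count(0)
    if zeros == 0:
        raise ValueError("bit vector is saturated; use a larger m")
    v_m = zeros / m               # Vm: fraction of bits that are zeros
    return -m * math.log(v_m)


# 1000 packets carrying only 200 distinct elements.
stream = [f"e{i % 200}" for i in range(1000)]
print(round(estimate_cardinality(stream)))
```

The bit vector removes duplicates for free, but it needs on the order of one bit per distinct element, which is far more than the budget of one bit per flow.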
Per-flow Cardinality Estimation by Sketches
• FM Sketches
• LogLog Sketches
• HyperLogLog Sketches
Five bits per sketch and hundreds of sketches per flow
How Does a Sketch Work?
[Figure: FM sketch, drawn as a bit array holding 0 0 1 0 1 1 1.]
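For readers who have not met FM sketches, here is a minimal sketch of the textbook Flajolet-Martin scheme with stochastic averaging (PCSA). It is not taken from the slides: the hash function, the number of sketches, and all names are assumptions, and the bit arrays are stored as Python integers.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant


def _hash64(element: str) -> int:
    """64-bit hash of an element (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(element.encode()).digest()[:8], "big")


def _trailing_zeros(x: int, width: int = 58) -> int:
    """Number of trailing zero bits in x; width if x is zero."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def fm_estimate(elements, num_sketches: int = 64) -> float:
    """Estimate the number of distinct elements with num_sketches FM sketches."""
    sketches = [0] * num_sketches
    for e in elements:
        h = _hash64(e)
        which = h % num_sketches      # uniform choice of sketch
        rest = h // num_sketches      # remaining hash bits
        # Bit j is set with probability 2^-(j+1): a geometric distribution.
        sketches[which] |= 1 << _trailing_zeros(rest)
    total = 0
    for bitmap in sketches:
        r = 0
        while bitmap & (1 << r):      # r = position of the lowest zero bit
            r += 1
        total += r
    return num_sketches / PHI * 2 ** (total / num_sketches)


data = [f"user{i}" for i in range(20000)]
print(round(fm_estimate(data)))
```

Each element sets at most one bit, and a repeated element sets the same bit again, so duplicates are filtered without storing the elements themselves.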
Performance by FM Sketches
Virtual Maximum Likelihood Sketches
• Two bits per sketch
• One bit per flow
Virtual Sketches
[Figure labels: B[2], B[1], B[0]; V[0] through V[7].]
Virtual Sketch Vector
[Figure labels: V[0] through V[7]; flow f with Vf[0] Vf[1] Vf[2]; flow f’ with Vf’[0] Vf’[1] Vf’[2].]
Online Operation
[Figure labels: B[2], B[1], B[0]; arriving pair f, e …; H(f, e), uniform; H’(f, e), geometric; Vf[0] Vf[1] Vf[2].]
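One possible reading of the figure labels, written as code. Everything here is an assumption rather than the author's confirmed design: the pool is taken to be bit arrays B[0..2] over physical sketches V[0..7], each flow is assumed to pick three of them by hashing, H(f, e) is assumed to choose uniformly among the flow's virtual sketches, and H'(f, e) is assumed to choose the bit with a truncated geometric distribution. The class and method names are hypothetical, and offline estimation is not shown.

```python
import hashlib


def _h(*parts) -> int:
    """64-bit hash of the given parts (SHA-256 is an arbitrary choice)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class SharedSketchPool:
    """Hypothetical pool of small sketches shared by all flows."""

    def __init__(self, num_sketches=8, bits_per_sketch=3, sketches_per_flow=3):
        self.num_sketches = num_sketches
        self.bits_per_sketch = bits_per_sketch
        self.sketches_per_flow = sketches_per_flow
        # B[j][k] is bit j of physical sketch V[k].
        self.B = [[0] * num_sketches for _ in range(bits_per_sketch)]

    def virtual_vector(self, flow):
        """Indices of the physical sketches serving as Vf[0], Vf[1], ..."""
        return [_h("select", flow, i) % self.num_sketches
                for i in range(self.sketches_per_flow)]

    def record(self, flow, element):
        """Record one (flow, element) pair; returns the (bit, sketch) that was set."""
        i = _h("uniform", flow, element) % self.sketches_per_flow   # H(f, e)
        k = self.virtual_vector(flow)[i]
        g = _h("geometric", flow, element)                           # H'(f, e)
        j = 0
        # Count trailing zeros, truncated at the highest available bit.
        while j < self.bits_per_sketch - 1 and (g >> j) & 1 == 0:
            j += 1
        self.B[j][k] = 1
        return j, k


pool = SharedSketchPool()
print(pool.record("f", "e1"))
```

Because both hashes depend only on the pair (f, e), a repeated element sets the same bit again, and two flows whose virtual vectors overlap add noise to each other that an offline estimator would have to account for.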
Offline Estimation
[Figure labels: B[2], B[1], B[0]; f; Vf[0] Vf[1] Vf[2].]
Experimental Results
Expanding Research
Branch 1. Fundamental Limitation
0.1 bit per flow?
Where is the limit?
Conjecture: No limit, but accuracy threshold
Branch 2. How to Share?
Multi-level bit sharing
Branch 2. How to Share?
Common-pool bit sharing
Common pool of bits
Branch 2. How to Share?
Common sketch pool sharing
Common pool of bits
Branch 3. Space Dimension
• Traffic matrix
• An ISP may use the traffic matrix to align traffic distribution
[Figure: network]
Branch 3. Space Dimension
• Flow matrix
[Figure: network]
Branch 4: Time Dimension
• Example: server farm located in an intranet and protected by a gateway router
• Malicious attacks
  ◦ Stealthy scan: persistent cardinality is zero
  ◦ Stealthy DoQ attacks: persistent cardinality is larger than normal
Branch 4: Time Dimension
[Figure: bitmaps recorded at Time 1, Time 2, and Time 3 are combined by intersection.]
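The intersection in the figure can be illustrated with a bitwise AND over per-period bitmaps. This is a minimal sketch that assumes every period uses the same hash function and bitmap size; the addresses, the bitmap size, and the function names are hypothetical, and no estimator for persistent cardinality is attempted.

```python
import hashlib


def bit_index(element: str, m: int) -> int:
    """Map an element to one of m bit positions (same hash in every period)."""
    digest = hashlib.sha256(element.encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def period_bitmap(elements, m: int) -> int:
    """Bitmap for one measurement period, stored as a Python int."""
    bitmap = 0
    for e in elements:
        bitmap |= 1 << bit_index(e, m)
    return bitmap


def intersect(bitmaps) -> int:
    """Bitwise AND of all period bitmaps."""
    result = bitmaps[0]
    for b in bitmaps[1:]:
        result &= b
    return result


M = 64
persistent = ["10.0.0.1", "10.0.0.2"]   # hypothetical sources seen in every period
periods = [
    persistent + ["a1", "a2", "a3"],        # Time 1
    persistent + ["b1", "b2"],              # Time 2
    persistent + ["c1", "c2", "c3", "c4"],  # Time 3
]
bitmaps = [period_bitmap(p, M) for p in periods]
common = intersect(bitmaps)

print(format(common, f"0{M}b"))
```

An element present in every period sets the same bit each time, so its bit survives the AND. Transient elements can also leave bits in the intersection when they collide across periods, so an estimator built on top would need to correct for that.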
Branch 4: Functional Dimension
• Flow size
Differentiated Accuracy
[Figure labels: f, e …; uniform; Space Dimension; Time Dimension.]
Branch 5: Applications
Tag
Reader
Automated Warehouse Management
Warehouse and Distribution Center Management
Counting the number of tags
Group Size Estimation
Solution? Virtual Sketches
Bits -> Time Slots
[Figure labels: B[2], B[1], B[0]; g, e; H(g, e), uniform; H’(g, e), geometric; Vg[0] Vg[1] Vg[2].]
Sketches (single flow)
Virtual sketches (multiple flows)
Limits and methods (theoretical dimension)
Flow matrix (space dimension)
Persistent cardinality (time dimension)
RFID system (application dimension)
Flow size (functional dimension)
Thank you