Measuring Big Network Data
Shigang Chen
Computer & Information Science & Engineering, University of Florida, Gainesville, FL, USA

If we can learn the traffic, …
Network access pattern; traffic control and filtering; capacity planning; accounting and billing; worm detection; scanner detection; bottleneck discovery; service provision; anomaly detection; traffic engineering.

Big Network Data
Annual global IP traffic will pass the zettabyte mark by 2016.
The Cisco CRS-3 has a capacity of 322 terabits per second.

They are fast

They are small …

Data volume is huge …

Counting Big Network Data
Number of distinct elements per flow, i.e., flow cardinality.
Per-source flow (scan detection), per-destination flow, per-source/destination flow, TCP flow, WWW flow, P2P flow, other application-specific flow, etc.
Doing so online in a network processor is a challenge.

Counting Big Edge Data
Google: over 40,000 search queries every second, over 3.5 billion searches per day, 1.2 trillion searches per year worldwide.
For each searched keyword/phrase/question, how many distinct users make the query?
Query suggestion, caching, trend.
Data center or … 1 bit per keyword/phrase, 1/10 bit per keyword/phrase; commodity computer.

On-chip Implementation
Data volume is huge. Space is limited and shared …
Routing, packet scheduling, access control, quality of service, packet inspection and classification, intrusion detection, traffic measurement.
Per-destination flow, per-source/destination flow, TCP flow, WWW flow, P2P flow, other application-specific flow, etc.
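To make "flow cardinality" concrete, here is a minimal Python sketch of the exact baseline: one set of distinct elements per flow. It assumes per-source flows whose elements are destination addresses (the scan-detection case named above); the function name `per_source_cardinality` and the sample addresses are hypothetical and not from the slides. It is shown only as a reference point, since storing every element is what limited on-chip space rules out.

```python
from collections import defaultdict


def per_source_cardinality(packets):
    """Exact flow cardinality: number of distinct destinations per source.

    `packets` is an iterable of (source, destination) pairs. This layout is
    an assumption made for illustration only.
    """
    seen = defaultdict(set)
    for src, dst in packets:
        seen[src].add(dst)  # the set absorbs duplicate destinations
    return {src: len(dsts) for src, dsts in seen.items()}


# Hypothetical sample traffic: the third packet repeats a destination.
packets = [
    ("10.0.0.1", "10.0.1.1"),
    ("10.0.0.1", "10.0.1.2"),
    ("10.0.0.1", "10.0.1.1"),
    ("10.0.0.2", "10.0.1.9"),
]
counts = per_source_cardinality(packets)
print(counts)
```

Memory here grows with the number of distinct elements in every flow, which is why the rest of the talk turns to bitmaps and sketches.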
Challenge
Available on-chip memory: 8 Mb
Concurrent flows: 8M
Number of bits per flow: 1

Challenge
Measurement of flow size.
Measurement of flow cardinality: number of distinct elements in a flow.
Duplicate removal: store elements or use bitmaps.

Per-flow spread: bit vector for each flow
[Figure: bit vector m; packets p1 p2 p3 p4 p5 p6 … carrying elements e1 e2 e3 e1 e1 e4 …]
Vm: fraction of bits that are zeros.

Per-flow Cardinality Estimation by Sketches
FM sketches, LogLog sketches, HyperLogLog sketches.
Five bits per sketch and hundreds of sketches per flow.

How does a sketch work?
[Figure: FM sketch with bits 0 0 1 0 1 1 1]

Performance by FM Sketches

Virtual Maximum Likelihood Sketches
Two bits per sketch.
One bit per flow.

Virtual Sketches
[Figure: sketch bits B[2] B[1] B[0]; sketches V[0] V[1] V[2] V[3] V[4] V[5] V[6] V[7]]

Virtual Sketch Vector
[Figure: V[0] … V[7]; flow f with Vf[0] Vf[1] Vf[2]; flow f’ with Vf’[0] Vf’[1] Vf’[2]]

Online Operation
[Figure: f, e …; H(f, e), uniform; Vf[0] Vf[1] Vf[2]; H’(f, e), geometrical; B[2] B[1] B[0]]

Offline Estimation
[Figure: f; Vf[0] Vf[1] Vf[2]; B[2] B[1] B[0]]

Experimental Results

Expanding Research

Branch 1. Fundamental Limitation
0.1 bit per flow? Where is the limit?
Conjecture: no limit, but accuracy threshold.

Branch 2. How to Share?
Multi-level bit sharing.

Branch 2. How to Share?
Common-pool bit sharing: common pool of bits.

Question 2. How to Share?
Common sketch pool sharing: common pool of bits.

Branch 3. Space Dimension
Traffic matrix. [Figure: network]
An ISP may use the traffic matrix to align traffic distribution.

Branch 3. Space Dimension
Flow matrix. [Figure: network]

Branch 4: Time Dimension
Example: server farm located in an intranet and protected by a gateway router.
Malicious attacks:
Stealthy scan: persistent cardinality is zero.
Stealthy DoQ attacks: persistent cardinality is larger than normal.

Branch 4: Time Dimension
[Figure: bitmaps at Time 1, Time 2, and Time 3, combined by intersection]

Branch 4: Functional Dimension
Flow size.
Differentiated accuracy.
[Figure: f, e …; uniform; space dimension; time dimension]

Branch 5: Applications
[Figure: tag, reader]
Automated warehouse management.
Warehouse and distribution center management.
Counting the number of tags.
Group size estimation.

Solution? Virtual Sketches
Bits -> Time Slots
[Figure: g, e; H(g, e), uniform; Vg[0] Vg[1] Vg[2]; H’(g, e), geometrical; B[2] B[1] B[0]]

Sketches (single flow)
Virtual sketches (multiple flows)
Limits and methods (theoretical dimension)
Flow matrix (space dimension)
Persistent cardinality (time dimension)
RFID system (application dimension)
Flow size (functional dimension)

Thank you
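The "Per-flow spread" slide stores one bit vector per flow and reads off Vm, the fraction of bits that are zeros. A minimal Python sketch of that idea follows. The slides do not state the estimator, so this assumes the standard linear-counting formula, n ≈ -m ln(Vm), and uses SHA-256 as a stand-in hash. The class name `BitmapCounter` and the sample element names are hypothetical. It illustrates the single-flow bitmap only, not the virtual sketches presented in the talk.

```python
import hashlib
import math


class BitmapCounter:
    """Single-flow bitmap estimator (hypothetical name, illustration only)."""

    def __init__(self, m):
        self.m = m              # number of bits in the flow's bit vector
        self.bits = [0] * m

    def add(self, element):
        # Assumption: SHA-256 stands in for the uniform hash function.
        digest = hashlib.sha256(element.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.m
        self.bits[index] = 1    # a duplicate element sets the same bit again

    def zero_fraction(self):
        """Vm: fraction of bits that are zeros."""
        return self.bits.count(0) / self.m

    def estimate(self):
        """Linear-counting estimate -m * ln(Vm) of the distinct-element count."""
        vm = self.zero_fraction()
        if vm == 0:
            raise ValueError("bitmap is saturated; use a larger m")
        return -self.m * math.log(vm)


counter = BitmapCounter(m=4096)
for _ in range(3):              # every element arrives three times
    for i in range(1000):
        counter.add(f"element-{i}")
estimate = counter.estimate()
print(round(estimate))
```

Because repeated elements hit the same bit, duplicates are removed for free. The cost is m bits per flow, far more than the one bit per flow available in the 8 Mb / 8M-flow setting, which is the gap the virtual sketches are meant to close.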