Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines
By F. Bonomi et al.
Presented by Kenny Cheng, Tonny Mak Yui Kuen
Introduction
A) Motivation
B) Objectives
C) Problem statements
A) Motivation
• Routers increasingly keep per-flow state
• Storing state for a large number of flows needs a lot of memory (~100 bits per flow)
• If the memory footprint can be reduced, fast on-chip memory becomes feasible, which improves performance
B) Objectives
• Introduce the idea of an Approximate Concurrent State Machine (ACSM), which trades some accuracy for a smaller memory footprint
• Introduce and compare several solutions to the ACSM problem
• Find the approach with the best accuracy-to-memory ratio
C) Problem statements
• Describe 3 techniques based on Bloom filters
and hashing, and evaluate them using both
theoretical analysis and simulation
Bloom Filter
• A data structure proposed by Bloom in 1970
• Designed for membership tests, i.e. to test whether an element is in a set
• Fast and compact
• False positives are possible, i.e. an element not in the set may be wrongly reported as present
• No false negatives, i.e. an element in the set is always reported as present
How a Bloom Filter Works
[Figure: a bit array of all zeros, with hash functions 1, 2, 3, ..., k]
• A bit array with all zeros initially
• k hash functions
How a Bloom Filter Works
Insertion
[Figure: element x is hashed by functions 1, 2, 3, ..., k; the selected bits are set to 1]
• Hash the element with each of the k hash functions to get k indices in the bit array
• Set the bits at those indices to 1
How a Bloom Filter Works
Lookup
[Figure: element x is hashed by functions 1, 2, 3, ..., k; the selected bits are checked]
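• Hash the element with each of the k hash functions
• If all corresponding bits are 1, report the element as in the set (this may be a false positive)
• If any corresponding bit is 0, the element is definitely not in the set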
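The insertion and lookup steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `BloomFilter`, the sizes `m` and `k`, and the use of salted SHA-256 to stand in for k independent hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: m bits, k hash functions, no deletion."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # bit array, all zeros initially

    def _indices(self, item):
        # Assumption: simulate k hash functions by salting SHA-256 with i.
        out = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def insert(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1

    def lookup(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[idx] for idx in self._indices(item))


bf = BloomFilter(m=1024, k=4)
bf.insert("flow-a")
print(bf.lookup("flow-a"))  # True: inserted elements are always found
```

An inserted element is always found because its k bits are never cleared; an element that was never inserted is rejected unless all of its k bits happen to have been set by other elements.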
How a Bloom Filter Works
Deletion
[Figure: the bits selected for x by functions 1, 2, 3, ..., k are marked "?" because other elements may also use them]
• Sorry, no deletion
• You don’t know whether the bits are also used by other elements, so you cannot simply clear them
Counting Bloom Filter
[Figure: element x is hashed by functions 1, 2, 3, ..., k into an array of counters]
• Replace each bit with a counter
• For insertion, increment the counters
• For deletion, decrement the counters
• Problems: more space, counter overflow
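A counting Bloom filter along these lines might look like the sketch below. The class name, the salted SHA-256 hashing, and the saturation limit `max_count` (15, as for a 4-bit counter) are assumptions for illustration; the slides do not specify how overflow is handled, so a saturated counter is simply left untouched here.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with a small counter per cell so elements can be deleted."""

    def __init__(self, m, k, max_count=15):
        self.m = m
        self.k = k
        self.max_count = max_count  # assumption: counters saturate instead of wrapping
        self.counters = [0] * m

    def _indices(self, item):
        out = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def insert(self, item):
        for idx in self._indices(item):
            if self.counters[idx] < self.max_count:
                self.counters[idx] += 1

    def lookup(self, item):
        return all(self.counters[idx] > 0 for idx in self._indices(item))

    def delete(self, item):
        # Only call this for elements that were actually inserted.
        for idx in self._indices(item):
            # A saturated counter has lost its true count, so leave it alone.
            if 0 < self.counters[idx] < self.max_count:
                self.counters[idx] -= 1


cbf = CountingBloomFilter(m=1024, k=4)
cbf.insert("flow-a")
cbf.insert("flow-b")
cbf.delete("flow-a")
print(cbf.lookup("flow-b"))  # True: deleting flow-a does not remove flow-b
```

Each cell now costs several bits instead of one, which is the space overhead mentioned above.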
3 Approaches to ACSM
• Approaches:
1. Direct Bloom Filter
2. Stateful Bloom Filter
3. Fingerprint-compressed Filter
• Operations to implement:
1. Insert(flow, state)
2. Lookup(flow) returns (state)
3. Delete(flow)
4. Update(flow, new_state)
Direct Bloom Filter Approach
• Use a counting Bloom filter
• 4 operations:
Insert – insert the (flow_id, state) pair
Lookup – if the state is not given, test every possible state; return “don’t know” if more than one state matches
Delete – look up the state, then decrement the counters
Update – delete the old pair, then insert the new one
• Improvement: use timing-based deletion to handle flows that never terminate
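The four operations can be sketched on top of a counting Bloom filter keyed by the (flow_id, state) pair. The names `DirectBloomACSM` and `DONT_KNOW`, the salted SHA-256 hashing, and the choice to refuse a delete or update when the lookup is ambiguous are assumptions for this example, not details taken from the slides.

```python
import hashlib

DONT_KNOW = "DK"


class DirectBloomACSM:
    """ACSM sketch: a counting Bloom filter over (flow_id, state) pairs."""

    def __init__(self, m, k, states):
        self.m = m
        self.k = k
        self.states = list(states)  # every state a flow can be in
        self.counters = [0] * m

    def _indices(self, flow, state):
        out = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{flow}|{state}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def _present(self, flow, state):
        return all(self.counters[i] > 0 for i in self._indices(flow, state))

    def insert(self, flow, state):
        for i in self._indices(flow, state):
            self.counters[i] += 1

    def lookup(self, flow):
        # The state is not stored explicitly, so test every possible state.
        found = [s for s in self.states if self._present(flow, s)]
        if not found:
            return None
        return found[0] if len(found) == 1 else DONT_KNOW

    def delete(self, flow):
        state = self.lookup(flow)
        if state is None or state == DONT_KNOW:
            return False  # assumption: do nothing when the state is unknown
        for i in self._indices(flow, state):
            self.counters[i] -= 1
        return True

    def update(self, flow, new_state):
        # Delete the old pair, then insert the new one.
        if not self.delete(flow):
            return False
        self.insert(flow, new_state)
        return True


acsm = DirectBloomACSM(m=4096, k=4, states=range(1, 11))
acsm.insert("flow-a", 1)
acsm.update("flow-a", 2)
print(acsm.lookup("flow-a"))
```

Note that a lookup costs one membership test per state, which is the weakness the Stateful Bloom Filter approach addresses.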
Timing-based Deletion
[Figure: an array of counters, each with a timing bit; element x is hashed by functions 1, 2, 3, ..., k]
• Add a timing bit to each cell
• Set the bit when the cell is touched
• Periodically clear the untouched cells and reset all timing bits
• Alternative to the Direct Bloom Filter (DBF): use a standard Bloom filter instead of a counting one, and delete elements only by timing-based deletion
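The alternative in the last bullet, a standard Bloom filter whose elements expire through timing bits, can be sketched as below. The class and method names are hypothetical, and treating a successful lookup as "touching" the cells is an assumption; the slides only say the bit is set when a cell is touched.

```python
import hashlib


class TimedBloomFilter:
    """Standard Bloom filter with one timing bit per cell; no explicit delete."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m
        self.timing = [0] * m  # 1 = cell touched since the last sweep

    def _indices(self, item):
        out = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def insert(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1
            self.timing[idx] = 1

    def lookup(self, item):
        idxs = self._indices(item)
        if not all(self.bits[i] for i in idxs):
            return False
        for i in idxs:
            self.timing[i] = 1  # assumption: a hit refreshes the cells
        return True

    def sweep(self):
        # Run periodically: clear untouched cells, then reset all timing bits.
        for i in range(self.m):
            if not self.timing[i]:
                self.bits[i] = 0
            self.timing[i] = 0


tbf = TimedBloomFilter(m=256, k=3)
tbf.insert("idle-flow")
tbf.sweep()  # cells were touched by the insert, so they survive
tbf.sweep()  # nothing touched them since, so they are cleared
print(tbf.lookup("idle-flow"))  # False
```

A flow survives as long as it is touched at least once between two consecutive sweeps, so the sweep period bounds how long a silent flow stays in the filter.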
Stateful Bloom Filter Approach
• The Direct Bloom Filter doesn’t store a flow’s state explicitly, so a lookup must test every state
• Improvement: store a state value in each cell for faster lookup
• Hash the flow_id only, instead of the (flow_id, state) pair
• Introduce a “don’t know” (DK) state for cells where a collision occurs
• Keep timing-based deletion
Stateful Bloom Filter Approach
• Insert, update, delete – similar to the Direct Bloom Filter; set the cell value to DK on a collision (counter > 1)
• Lookup:
If all cells are DK, return DK
If all cells are either state i or DK (with at least one cell holding i), return state i
If the cells hold more than one state other than DK, return “not found”
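The lookup rules above can be sketched as follows. This is a simplified illustration covering insert and lookup only: the names are hypothetical, every collision sets the cell to DK as on the slide, and returning "not found" (here `None`) when any cell is empty is an assumption added for completeness.

```python
import hashlib

DK = "DK"  # "don't know" marker


class StatefulBloomFilter:
    """Each cell holds a counter and a state value; only flow_id is hashed."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.count = [0] * m
        self.value = [None] * m

    def _indices(self, flow):
        # A set, so a cell hit by two hash functions is only counted once.
        out = set()
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{flow}".encode()).digest()
            out.add(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def insert(self, flow, state):
        for idx in self._indices(flow):
            self.count[idx] += 1
            # First writer stores its state; any collision turns the cell into DK.
            self.value[idx] = state if self.count[idx] == 1 else DK

    def lookup(self, flow):
        idxs = self._indices(flow)
        if any(self.count[i] == 0 for i in idxs):
            return None  # assumption: an empty cell means the flow is absent
        states = {self.value[i] for i in idxs} - {DK}
        if not states:
            return DK  # all cells are DK
        if len(states) == 1:
            return states.pop()  # all cells are state i or DK
        return None  # more than one state other than DK: not found


sbf = StatefulBloomFilter(m=4096, k=4)
sbf.insert("flow-a", 3)
print(sbf.lookup("flow-a"))  # 3
```

Compared with the Direct Bloom Filter sketch, a lookup here reads the k cells once instead of testing every state.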
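Fingerprint-compressed Filter Approach
• Store a fingerprint of the flow together with its state in a d-left hash table
[Figure: element x is hashed into sub-tables 1, 2, ..., d; each bucket holds (fingerprint, state) entries such as the following]
Fingerprint   State
0110111010    2
1100110000    4
1001010110    1
1110001000    1
1110011101    3
0111010100    1
...
0000111101    3
1100000110    3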
Fingerprint-compressed Filter Approach
• Insert – hash the element to find the corresponding bucket in each of the d hash tables; insert the fingerprint and state into the bucket with the fewest elements (choose the left-most one to break ties)
• Lookup – retrieve the state stored with the fingerprint
• Delete – remove the fingerprint
• Update – update the state in place, or remove the old entry and add the new one
• Return DK when a fingerprint is found in multiple buckets
• Timing-based deletion can still be applied
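A d-left table with fingerprints can be sketched as below. Several simplifications are assumptions of this example rather than the paper's design: buckets are unbounded Python dicts instead of fixed-size slots, the fingerprint and bucket indices come from separately salted SHA-256 hashes, and delete and update do nothing when the lookup is ambiguous.

```python
import hashlib

DONT_KNOW = "DK"


def _h(tag, flow):
    # Hypothetical helper: one salted hash per purpose (fingerprint or table).
    digest = hashlib.sha256(f"{tag}:{flow}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class FingerprintFilter:
    """d sub-tables of buckets; each bucket maps fingerprint -> state."""

    def __init__(self, d=4, buckets=256, fp_bits=10):
        self.d = d
        self.n_buckets = buckets
        self.fp_bits = fp_bits
        self.tables = [[{} for _ in range(buckets)] for _ in range(d)]

    def _fingerprint(self, flow):
        return _h("fp", flow) % (1 << self.fp_bits)

    def _buckets(self, flow):
        # One candidate bucket per sub-table, listed left to right.
        return [self.tables[t][_h(t, flow) % self.n_buckets] for t in range(self.d)]

    def _matches(self, flow):
        fp = self._fingerprint(flow)
        return fp, [b for b in self._buckets(flow) if fp in b]

    def insert(self, flow, state):
        # min() keeps the first minimum, i.e. the left-most least-loaded bucket.
        target = min(self._buckets(flow), key=len)
        target[self._fingerprint(flow)] = state

    def lookup(self, flow):
        fp, hits = self._matches(flow)
        if not hits:
            return None
        return hits[0][fp] if len(hits) == 1 else DONT_KNOW

    def update(self, flow, new_state):
        fp, hits = self._matches(flow)
        if len(hits) != 1:
            return False
        hits[0][fp] = new_state  # direct update in place
        return True

    def delete(self, flow):
        fp, hits = self._matches(flow)
        if len(hits) != 1:
            return False
        del hits[0][fp]
        return True


fcf = FingerprintFilter()
fcf.insert("flow-a", 1)
fcf.update("flow-a", 2)
print(fcf.lookup("flow-a"))  # 2
```

Only the short fingerprint and the state are stored, never the full flow ID, which is where the memory saving comes from; the price is that two flows with the same fingerprint and bucket become indistinguishable.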
Simulation
• To investigate the size/accuracy trade-off for
the 3 approaches
• State machine: 10 states
• Legal state changes: 1 → 2 → 3 → … → 10
• Run for 1 million flows
• About 60000 simultaneous flows
• 100 ± 40 packets for each flow
• Some packets trigger state change
Simulation
• 3 kinds of simulation flows
• Interesting flows (30%) – flows with legal
state changes only, always complete
• Noise flows (30%) – flows with random (can
be legal or illegal) state changes, never
complete
• Random flows (40%) – flows without state
change
Simulation
False positive rate: % of completed flows that are not interesting
False negative rate: % of interesting flows that do not complete
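The two rates defined above reduce to simple ratios over the simulated flows. The sketch below assumes each flow is summarised as a hypothetical pair `(is_interesting, reported_complete)`; the sample data is made up for illustration and is not a result from the paper.

```python
def error_rates(flows):
    """flows: iterable of (is_interesting, reported_complete) boolean pairs."""
    flows = list(flows)
    completed = [f for f in flows if f[1]]
    interesting = [f for f in flows if f[0]]
    # False positives: completed flows that are not interesting.
    fp = sum(1 for i, _ in completed if not i) / len(completed) if completed else 0.0
    # False negatives: interesting flows that were not reported as complete.
    fn = sum(1 for _, c in interesting if not c) / len(interesting) if interesting else 0.0
    return fp, fn


# Made-up example: 4 interesting flows (3 complete), 6 other flows (1 complete).
sample = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 5
print(error_rates(sample))  # (0.25, 0.25)
```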
Applications
Uses in application-level QoS:
• Video congestion control
• Peer-to-Peer (P2P) traffic identification
Video congestion control
• Apply to MPEG video streaming
• 3 kinds of frames for MPEG video:
I frame – scene information
P frame – differential information
B frame – least important information
• Up to 30% of B frames can be dropped with acceptable quality
• Need to keep track of the current frame
Video congestion control
• Use a Fingerprint-compressed Filter (FCF) ACSM to keep track of state
• Experimentally, the highest acceptable false positive rate is 0.37%
• This requires a memory size of 27 bits per flow (about ¼ of the original 100 bits)
P2P Traffic Identification
• Goal: limit P2P flows to improve quality for other applications
• One possible sign of a P2P flow: concurrent TCP and UDP flows
• Use ACSM for real-time P2P identification
Conclusion
• ACSMs are feasible
• The FCF approach performs best of the three
• Two potential applications of ACSMs are introduced
• ACSMs may benefit QoS applications, which can tolerate occasional errors
Comments
• The authors focus on accuracy and memory size, not on real-world performance
• The FCF approach may not perform well in hardware
Question & Answer
- End -