
Cooperative Caching Middleware for
Cluster-Based Servers
Francisco Matias Cuenca-Acuna
Thu D. Nguyen
Panic Lab
Department of Computer Science
Rutgers University
Our work
• Goal
– Provide a mechanism to co-manage memory of cluster-based
servers
– Deliver a generic solution that can be reused by Internet
servers and file systems
• Motivation
– Emerging Internet computing model based on infrastructure
services like Google, Yahoo! and others
» Being built on clusters: scalability, fault-tolerance
– It’s hard to build efficient cluster-based servers
» Dealing with distributed memory
» If node memories are used independently, the server performs
well only when the working set fits in a single node's memory
Previous solutions
[Diagram: a front end performs request distribution based on load and data affinity, forwarding requests over the network to several Web Server nodes; each node has a local file system (FS), and file A appears on more than one node.]
Previous solutions
[Diagram: two prior designs: round-robin request distribution, and a distributed front end performing request distribution based on load and data affinity; each Web Server node has a local file system (FS).]
Our approach
[Diagram: round-robin request distribution over the network to the Web Server nodes, with cooperative block caching and global block replacement spanning the nodes; each node has a local file system (FS).]
Our approach
[Diagram: the same round-robin cluster, annotated with other uses for our CC layer.]
Why cooperative caching and what do
we give up?
• Advantages of our approach
– Generality
» By presenting a block-level abstraction
» Can be used across very different applications such as
web servers and file systems
» Doesn’t need any application knowledge
– Reusability
» By presenting it as a generic middleware layer
• Disadvantages of our approach
– Generality + no application knowledge → possible performance loss
– How much?
Our contributions
• Carefully study why cooperative caching, as originally
designed for cooperative client caching to reduce server load,
does not perform as well as content-aware request distribution
• When compared to a web server that uses
content-aware request distribution
– Lose 70% when using a traditional CC algorithm
– Lose only 8% when using our adapted version (CCM)
• Adapt cooperative caching to better suit cluster-based servers
– Trade lower local hit rates for higher total hit rates (local +
global)
Our cooperative caching algorithm
(CCM)
• Files are distributed across all nodes
– No replication
– The node holding a file on disk is called the file’s home
– Homes are responsible for tracking blocks in memory
• Master blocks and non-master blocks
– There is only one master block for each block/file in memory
– CCM only tracks master blocks
• Hint-based block location
– Algorithm based on Dahlin et al. (1994)
– Nodes have approximate knowledge of block locations and
may have to follow a chain of nodes to reach a block (sketched below)
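Below is a minimal sketch of hint-based lookup, assuming a simplified in-process model; the class and method names are illustrative, not CCM's actual interfaces.

# Sketch of hint-based block location: try the local cache, then follow a chain of
# (possibly stale) hints, and fall back to disk on a global miss. Hypothetical names.

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}   # block_id -> data cached on this node
        self.hints = {}   # block_id -> Node believed to hold the master block

    def lookup(self, block_id, fetch_from_disk):
        """Return (data, hops): local hit, remote hit via hints, or disk."""
        if block_id in self.cache:                      # local hit
            return self.cache[block_id], 0
        hops, node, visited = 0, self, {self}
        while block_id in node.hints and node.hints[block_id] not in visited:
            node = node.hints[block_id]                 # follow one hint
            visited.add(node)
            hops += 1
            if block_id in node.cache:                  # remote (global) hit
                self.hints[block_id] = node             # refresh our own hint
                return node.cache[block_id], hops
        data = fetch_from_disk(block_id)                # global miss: go to disk
        self.cache[block_id] = data
        return data, hops

# Example: n misses locally, its hint points at m, which holds the block (1 hop).
m, n = Node("m"), Node("n")
m.cache["b"] = b"block b"
n.hints["b"] = m
print(n.lookup("b", fetch_from_disk=lambda bid: b"from disk"))   # (b'block b', 1)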
Replacement mechanisms
• Each node maintains local LRU lists
• Exchange age hints when forwarding blocks
– Piggyback age of oldest block
• Replacement
– Victim is a local block: evict
– Victim is a master block:
» If oldest block in cluster according to age hints, evict
» Otherwise, forward to the peer with the oldest block (see sketch below)
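A sketch of the replacement decision above, assuming each node tracks, per peer, the approximate age of that peer's oldest block; the names below are illustrative.

# Sketch of CCM-style replacement: non-master victims are simply evicted; a master
# block is evicted only if the age hints say it is the oldest block in the cluster,
# otherwise it is forwarded to the peer believed to hold the oldest block.
# Ages are last-access timestamps, so a smaller age means an older block.

def handle_eviction(victim, peer_age_hints, evict, forward):
    if not victim["is_master"]:
        evict(victim)                          # duplicate copy: just drop it
        return
    oldest_peer = min(peer_age_hints, key=peer_age_hints.get, default=None)
    if oldest_peer is None or victim["age"] <= peer_age_hints[oldest_peer]:
        evict(victim)                          # we hold the cluster's oldest block
    else:
        forward(victim, oldest_peer)           # trade network bandwidth for a
                                               # higher total (cluster-wide) hit rate

# Example: the victim is a master block but peer n holds an even older block,
# so the victim is forwarded to n instead of being dropped.
handle_eviction(
    victim={"is_master": True, "age": 120},
    peer_age_hints={"n": 80, "p": 150},
    evict=lambda b: print("evict", b),
    forward=lambda b, peer: print("forward to", peer),
)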
Example of CCM at work
[Diagram: nodes m, n, and p and the file's home node (fhome) exchanging block b (master and non-master copies).]
Assessing performance
• Compare a CCM-based web server against one
that uses content-aware request distribution
– L2S (HPDC 2000)
» Efficient and scalable
» Application-specific request distribution
» Maintains global information
» File-based caching
• Event driven simulation
– The same simulator used in L2S
• The platform we simulate is equivalent to:
– 1Gbps VIA LAN
– Clusters of 4 and 8 nodes, each with a single 800MHz Pentium III
– IDE hard drive on each node
Workload
• Four WWW traces:
Trace      Avg. req. size   Num. of requests   Working set size
Calgary    13.67KB          567823             128MB
Clarknet   9.50KB           2978121            250MB
NASA       20.33KB          3147685            250MB
Rutgers    17.54KB          745815             500MB
• Drive server as fast as possible
Results
Throughput for Clarknet on 8 nodes
[Chart: throughput (req/sec) vs. memory per node (4MB–256MB) for L2S, CCM, CCM-DS, and CCM-Basic.]
Hit Rate
Hit rate distribution on CCM
[Chart: local, remote, and total hit rates (%) vs. memory per node (4MB–512MB).]
Hit rate distribution on CCM-Basic
[Chart: local, remote, and total hit rates (%) vs. memory per node (4MB–512MB).]
Normalized throughput
Throughput normalized versus L2S
[Chart: CCM throughput normalized to L2S vs. memory per node (4MB–256MB) for the Clarknet, Rutgers, NASA, and Calgary traces.]
Resource utilization
CCM’s resource utilization
[Chart: normalized Disk, CPU, and NIC usage vs. memory per node (4MB–512MB).]
Scalability
Throughput when running on varying cluster sizes
[Chart: throughput (req/sec) vs. number of nodes (2–32).]
Further results
• Performance differences between CCM and L2S
may be affected by:
– L2S’s use of TCP hand-off
– L2S’s assumption that files are replicated everywhere
– Refer to the paper for estimates of the potential performance
difference due to these factors
• Current work
– Limit the amount of metadata maintained by CCM
» To reduce memory usage
» Discard outdated information
– Lazy eviction and forwarding notification (sketched below)
» On average finds a block with 1.1 hops (vs. 2.4)
» 10% response time decrease
» 2% throughput increase
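One way forwarding notification could work (an assumption based on the hint scheme above, not necessarily the exact protocol): when a master block is forwarded during replacement, the sender also tells the block's home node where it went, so later lookups take a direct hop instead of walking stale hints.

# Hypothetical sketch of forwarding notification; class and function names are
# illustrative assumptions, not CCM's real interfaces.

class HomeNode:
    def __init__(self):
        self.location_hint = {}                 # block_id -> name of node holding the master

    def on_forward(self, block_id, new_holder):
        self.location_hint[block_id] = new_holder   # keep the home's hint fresh

def forward_master_block(block_id, data, dst_name, dst_cache, home, notify=True):
    """Move a master block to a peer; with notification the home learns its new location."""
    dst_cache[block_id] = data
    if notify:
        home.on_forward(block_id, dst_name)     # one small extra message now,
                                                # fewer forwarding hops later

home, cache_p = HomeNode(), {}
forward_master_block("b", b"data", dst_name="p", dst_cache=cache_p, home=home)
print(home.location_hint["b"])                  # p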
Conclusions
• A generic block-based cooperative caching
algorithm can efficiently co-manage cluster
memory
– CCM performs almost as well as a highly optimized content
aware request distribution web server
– CCM scales linearly with cluster size
– Presenting a block-based solution to a file-based application
only led to a small performance loss → should work well for
block-based applications
• CCM achieves high performance by using a new
replacement algorithm well-suited to a server
environment
– Trades off local hit rates and network bandwidth for increased
total hit rates
– The right trade-off given current network and disk technology
trends (see the back-of-envelope sketch below)
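A back-of-envelope check of that trade-off; the latencies below are rough assumptions for a gigabit SAN and a local IDE disk, not measurements from the paper.

# Why trading some local hits for a higher total (local + remote) hit rate pays off.
# All latencies are illustrative assumptions.
LOCAL_HIT_MS  = 0.01    # copy from local memory
REMOTE_HIT_MS = 0.2     # request plus an 8KB block over a ~1Gbps SAN
DISK_MISS_MS  = 10.0    # IDE disk seek and read

def avg_block_time(local, remote):
    """Expected per-block service time given local and remote (global) hit rates."""
    miss = 1.0 - local - remote
    return local * LOCAL_HIT_MS + remote * REMOTE_HIT_MS + miss * DISK_MISS_MS

# Independent per-node caches: decent local hit rate, but 40% of requests go to disk.
print(avg_block_time(local=0.60, remote=0.00))   # ~4.0 ms
# Cooperative caching: lower local hit rate, but most former misses become remote hits.
print(avg_block_time(local=0.45, remote=0.50))   # ~0.6 ms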
Future & related work
• Future Work
– Investigate the importance of load-balancing
– Provide support for writes
– Validate simulation results with implementation
• Some Related Work
– PRESS (PPoPP 2001)
– L2S (HPDC 2000)
– LARD (ASPLOS 1998)
– Cooperative Caching (OSDI 1994)
– Cluster-Based Scalable Network Services (SOSP 1997)
Thanks to
• Liviu Iftode
• Ricardo Bianchini
• Vinicio Carreras
• Xiaoyan Li
Want more information?
www.panic-lab.rutgers.edu
Extra slides – Simulation parameters
Extra slides – Response time
Response time normalized versus L2S
[Chart: CCM response time normalized to L2S vs. memory per node (4MB–256MB) for the Clarknet, Rutgers, NASA, and Calgary traces.]
Extra slides – Hops vs. hit rate
Number of hops versus hit rate
[Chart: average number of hops (with and without notification) and global hit rate vs. memory per node (4MB–512MB).]
Extra slides – Traces characteristics
Extra slides – Using location hints
[Diagram: nodes m, n, and p and the file's home node (fhome) locating block b through a chain of location hints.]