
Liu JL, Zhang YL, Yang L et al. SAC: Exploiting stable set model to enhance CacheFiles. JOURNAL OF COMPUTER
SCIENCE AND TECHNOLOGY 29(2): 293–302 Mar. 2014. DOI 10.1007/s11390-014-1431-z
SAC: Exploiting Stable Set Model to Enhance CacheFiles
Jian-Liang Liu1,2 (刘建亮), Yong-Le Zhang3 (张永乐), Lin Yang1,2 (杨琳), Ming-Yang Guo1 (郭明阳), Zhen-Jun Liu1 (刘振军), and Lu Xu1 (许鲁)
1 Data Storage and Management Technology Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Department of Electrical and Computer Engineering, University of Toronto, Toronto M5S 3G4, Canada
E-mail: [email protected]; [email protected]; {yanglin, guomingyang, liuzhenjun, xulu}@nrchpc.ac.cn
Received November 15, 2013; revised January 8, 2014.
Abstract Client cache is an important technology for the optimization of distributed and centralized storage systems. As a representative client cache system, CacheFiles is limited in performance by transition faults; furthermore, CacheFiles supports only a simple LRU policy with a tightly-coupled design. To overcome these limitations, we propose to employ the Stable Set Model (SSM) to improve CacheFiles and design an enhanced CacheFiles, SAC. SSM assumes that data access can be decomposed into accesses on some stable sets, in which elements are always repeatedly accessed or not accessed together. Using SSM methods can improve cache management and reduce the effect of transition faults. We also adopt loosely-coupled methods to design the prefetch and replacement policies. We implement our scheme on Linux 2.6.32 and measure its execution time with various file I/O benchmarks. Experiments show that SAC can significantly improve I/O performance and reduce execution time by up to 84%, compared with the existing CacheFiles.
Keywords Stable Set Model, cache management, CacheFiles

1 Introduction
With the era of big data and ever-increasing computing power, data centers, global enterprises, and cloud storage providers all need to share massive amounts of data. Conventional distributed storage and centralized storage are confronted with severe challenges in performance and scalability.
Client cache is an important technology for the optimization of distributed and centralized storage systems[1-4]. With local cache storage, it can reduce access latency and server load and smooth data access traffic. The rapid development of the SSD (solid-state drive) further increases the importance of client cache.
With a directory on disk, CacheFiles[2]① can be used as a caching file system layer for Linux to enhance the performance of a distributed file system (e.g., NFS, AFS). Blue Whale Cluster File System[5] (BWFS) is based on a SAN architecture and adopts an out-of-band transfer mode; BWFS clients can use CacheFiles as their local cache. However, the performance of CacheFiles is limited by two problems: CacheFiles lacks the ability to efficiently exchange data during phase transitions, and it supports only a simple file-level LRU policy with a tightly-coupled design.
Larger cache capacity leads to fewer and fewer phase faults[6], which are cache misses in stable phases. However, transition faults, which happen in transition periods, account for the majority of cache faults and reflect the inefficiency of “data exchange” between phases. How to improve the performance of CacheFiles during transition periods is therefore an important problem. Conventional research on cache management algorithms[7-9] mainly focuses on phase faults and ignores “data exchange” during transitions. To address the transition fault problem, our previous study presented the Stable Set Model (SSM)[10].
Regular Paper
This work was supported by the National Basic Research 973 Program of China under Grant No. 2011CB302304, the National High
Technology Research and Development 863 Program of China under Grant Nos. 2011AA01A102, 2013AA013201 and 2013AA013205,
the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010401, and the Chinese Academy
of Sciences Key Deployment project under Grant No. KGZD-EW-103-5(7).
The work was done while the second author was a M.S. student of Institute of Computing Technology, Chinese Academy of
Sciences.
① https://www.kernel.org/doc/Documentation/filesystems/caching/cachefiles.txt, Jan. 2014.
©2014 Springer Science + Business Media, LLC & Science Press, China
SSM considers that a data access stream can be decomposed into a number of access streams on different stable sets, and there are durable relationships between the data in each stable set.
The most important contribution of this paper is that we use SSM-based methods to manage CacheFiles, applying SSM to a real cache system for the first time. After devising a general framework into which different policies can easily be plugged, we present the overall design and implementation of the SSM-based CacheFiles, SAC, and take an incremental approach to evaluating it with representative testing tools and file system benchmarks. Experiments show that SSM-based prefetch can reduce response time by up to 54.3% and SSM-based replacement can reduce execution time by 21.76%; the overall execution time can be reduced by up to 84%.
The remainder of the paper is organized as follows. Section 2 provides a brief review of the Stable Set Model. We describe the system design in Section 3 and experiments in Section 4. Section 5 discusses the limitations of our work and presents some future work. Section 6 describes related work and Section 7 concludes the paper.
2 Stable Set Model

2.1 Definition
SSM is the first macro model used in managing the cache of storage systems. It comes from graphical observation of a large amount of data access. The majority of data access at the macroscopic level has two things in common: 1) just like program behavior, data access presents a phase-transition feature, i.e., temporal-locality behavior coexisting with mutations; 2) the dataset accessed in a phase is not randomly composed, but is formed from elementary sets. SSM is defined based on these two commonalities.
Definition 1 (Stable Set Model). The Stable Set Model considers that any data access stream RE can be decomposed into a number of stable set access streams, as follows:
    RE = \sum_{i=1}^{n} (S_i, T_i),    (1)
where (S_i, T_i) represents the access stream on stable set S_i. Stream (S_i, T_i) proceeds in a phase-based manner and is independent of the streams on other stable sets. S_i represents the dataset and is disjoint from the other stable sets. T_i represents the sequence of time phases during which stream (S_i, T_i) happens:

    T_i = (t_{i1}, t_{i2}), (t_{i3}, t_{i4}), (t_{i5}, t_{i6}), \ldots    (2)
We use the working set model[6] as a base to give the precise definition. Because the working set model ignores the access order information inside a working set, the complexity of our definition is approximately O(L), where L is the length of the data access stream.
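For reference, the working-set notation used in the two steps below can be stated in one line; this is our paraphrase of Denning's definition[6], not a formula from the original text:

    W(t, T) = \{\, b \mid b \text{ is referenced during } (t - T,\ t] \,\}

that is, W(t, T) is the set of blocks referenced in the window of length T ending at time t.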
Because a stable set is a dataset in which all data are always repeatedly accessed together, we need two steps to define the stable set via working sets.
Step 1: forming repeatedly accessed sets (defined as stable access sets in our early-stage work).
We get all repeatedly accessed sets by intersecting two contiguous working sets. R is a repeatedly accessed set, i.e.,

    R = W(t, T) ∩ W(t + T, T),    (3)

where W(t, T) and W(t + T, T) are working sets.
Step 2: forming sets that are always repeatedly accessed together (so-called stable sets).
We get stable sets by intersecting all the repeatedly accessed sets obtained in Step 1, so a stable set is an elementary set of the repeatedly accessed sets. S is a stable set, i.e.,

    ∀R, S ∩ R = S or S ∩ R = ∅.    (4)
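To make the two-step construction concrete, the following is a minimal C sketch under simplifying assumptions: blocks are small integers, working sets are bitmasks, and the trace is already split into fixed windows. It illustrates the definitions above and is not the authors' mining implementation.

    /* Two-step stable set construction (Eqs. (3) and (4)), sketched
     * with bitmask working sets. All names are hypothetical. */
    #include <stdio.h>
    #include <stdint.h>

    #define NWIN 4    /* number of working-set windows            */
    #define NBLK 16   /* block id space (small enough for a mask) */

    int main(void) {
        /* W[k] = working set of window k, one bit per block. */
        uint32_t W[NWIN] = { 0x00FF, 0x00F3, 0x0F03, 0x0F0F };

        /* Step 1: repeatedly accessed sets R[k] = W[k] ∩ W[k+1]. */
        uint32_t R[NWIN - 1];
        for (int k = 0; k < NWIN - 1; k++)
            R[k] = W[k] & W[k + 1];

        /* Step 2: blocks that belong to exactly the same subset of
         * the R's always appear together, so grouping blocks by this
         * membership signature yields the elementary sets S with
         * S ∩ R = S or S ∩ R = ∅ for every R, i.e., the stable sets. */
        for (int b = 0; b < NBLK; b++) {
            unsigned sig = 0;
            for (int k = 0; k < NWIN - 1; k++)
                if (R[k] & (1u << b))
                    sig |= 1u << k;
            if (sig)
                printf("block %2d -> stable set signature %u\n", b, sig);
        }
        return 0;
    }

Blocks printed with the same signature form one stable set; blocks with signature 0 belong to no stable set.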
Another important concept is a stable set's active time T, which we define as follows.
Definition 2 (Active Time). If repeatedly accessed set R = W(t, T_1) ∩ W(t + T_2, T_2) and stable set S satisfies S ∩ R = S, then the time section (t − T_1, t + T_2) is called the active time of S, and S in (t − T_1, t + T_2) is considered to be in the active state.
2.2 Cache Block Size
Cache block size (B) is not only the basic unit of cache resource allocation, but also the basic unit of backend load. It is very important for cache resources, backend load, and the locality of data access.

For cache resources, too small a B leads to insufficient use of cache capacity and high cache management overhead, while too large a B causes heavy cache pollution, greatly increasing the number of cache misses. For backend load, too small a B can undermine the continuity of data requests, decreasing the maximum throughput of the storage system, while too large a B may cause much wasted I/O and also reduce the maximum service ability of a network storage system. Moreover, perceiving data access streams at a larger granularity enhances the locality of data access and reduces the miss ratio of SSM. Based on these three reasons, we use SSM to choose a proper B for cache management.
Definition 3 (Stable Set Granularity). B is the stable set granularity (SSG) of a data access stream if and
only if using B as the cache granularity will not lead
to phase faults, and the backend load of its footprint is
minimal.
2.3 Cache Prefetch
Using SSM for prefetch has three advantages: 1) it can further reduce I/O response time; 2) it brings more opportunities for merging requests, further reducing backend load; 3) prefetch makes more data requests asynchronous and provides more scheduling space for storage systems, so as to smooth bursty I/O and improve the scalability of storage systems.
The core idea of stable set prefetch is to perceive the state change of stable sets and prefetch all data blocks of a stable set when it turns active, so as to reduce compulsory misses. But the active time of SSM is an a posteriori definition, which is hard to use in cache management. To detect the state change of stable sets, we need one more concept, the MAI.
Definition 4 (Maximum Active Interval). The
largest access interval on a stable set S when it is active is called the maximum active interval (MAI) of the
stable set S.
When a stable set is active, all of its elements are accessed simultaneously and sustainably, but when it is inactive, there is no access to it at all. So there is a wide gap between the data access intervals of these two states, which is why we can use data access intervals to detect the state change of stable sets; MAI is the boundary for this detection. If a stable set is accessed twice in succession with an interval smaller than MAI, we consider it to be in the active state. If a stable set has not been accessed for a time longer than MAI, we consider that it has turned inactive. The formal definition of stable set spatial-locality is as follows.
Definition 5 (Stable Set Spatial-Locality). If the last access interval of a stable set is smaller than MAI, then all the data of the stable set are likely to be accessed in the near future. This is called stable set spatial-locality.
2.4 Cache Replacement

Stable set prefetch needs a lot of cache resources to load the whole set. According to [11], prefetched data is usually treated as one-time-access data or given even lower priority. Some stable sets may be no longer active, but because their data has been accessed repeatedly, we cannot replace them from the cache, which wastes cache space.

The core idea of the stable set replacement algorithm is to perceive the state change of a stable set and degrade all its data when it becomes inactive, so as to make the data replaceable. The way to perceive the state change is again to monitor access intervals on stable sets. If a stable set is not accessed for a time longer than its MAI, we consider that it has turned inactive and degrade all its data to a low priority. The formal definition of stable set temporal-locality is given as follows.
Definition 6 (Stable Set Temporal-Locality). If a stable set is not accessed for a time longer than its MAI, all the data of the stable set are unlikely to be accessed in the near future. This is called stable set temporal-locality.
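Definitions 4-6 translate into a small per-set state machine: an access gap below MAI predicts activation (prefetch the whole set), and silence longer than MAI signals deactivation (degrade the whole set). A minimal C sketch follows, with hypothetical struct and field names, since the paper specifies the rule but not the code:

    #include <stdbool.h>
    #include <time.h>

    struct stable_set {
        time_t last_access;  /* most recent access to any element */
        double mai;          /* maximum active interval, seconds  */
        bool   active;
    };

    /* Called on every access to a block of the set: a gap smaller
     * than MAI predicts the set is turning active, so the caller
     * prefetches the whole set (stable set spatial-locality). */
    static bool on_set_access(struct stable_set *s, time_t now) {
        double gap = difftime(now, s->last_access);
        s->last_access = now;
        if (!s->active && gap < s->mai) {
            s->active = true;
            return true;     /* trigger whole-set prefetch */
        }
        return false;
    }

    /* Called periodically: silence longer than MAI means the set has
     * turned inactive, so the caller degrades all its blocks to low
     * priority (stable set temporal-locality). */
    static void on_set_tick(struct stable_set *s, time_t now) {
        if (s->active && difftime(now, s->last_access) > s->mai)
            s->active = false;
    }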
2.5 Limitation of SSM Cache Management
The premise of using SSM in cache management is that the cache must be big enough that there is a chance to eliminate all phase faults. We therefore define a large-capacity cache as follows.
Definition 7 (Large-Capacity Cache). A cache system is a large-capacity cache if and only if it is bigger than the largest phase set when using the smallest granularity.
3 System Design

3.1 SAC System Architecture
In this design, SAC is used in the client of BWFS and utilizes a large-capacity SSD with an ext4 file system as the client cache's physical medium. Fig.1 shows the architecture of our client cache: cachefilesd is a daemon that manages the cache in user space, while CacheFiles runs in kernel space. To support SSM, we add a stable set mining module. We describe each component as follows.
Fig.1. SAC system architecture.
SSM Mining. This user-space component mines stable sets. It first collects online traces captured by the kernel into a temporary trace file, and then mines the trace file with the mining program to produce the stable set file.
SSM Cache Manager. This user-space component contains the replacement and prefetch policies, which are configured on demand.
Cache-Kernel-Module. This kernel part is implemented in CacheFiles and consists of three modules.
• Prefetch-Module. This module analyzes command parameters passed by cachefilesd and performs asynchronous prefetch.
• Release-Module. This module receives and analyzes command parameters from cachefilesd, and executes ext4's fallocate operation with the FALLOC_FL_PUNCH_HOLE flag to release cache blocks (see the sketch after this list).
• Tracking-Module. This module tracks the access-record trace. Its main goal is to provide access records to the SSM mining and cache manager modules.
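For illustration, here is a user-space sketch of the hole-punching call that the Release-Module issues; the real module runs in kernel context inside CacheFiles, and the backing file path and block size below are hypothetical:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Hypothetical backing file of one cached object. */
        int fd = open("/var/cache/fscache/backing_file", O_WRONLY);
        if (fd < 0) { perror("open"); return 1; }
        /* Deallocate one 64 KB block at offset 0; PUNCH_HOLE must be
         * combined with KEEP_SIZE, so the file size is unchanged and
         * the freed extent returns to the filesystem. */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      0, 64 * 1024) < 0)
            perror("fallocate");
        close(fd);
        return 0;
    }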
Overall, cachefilesd is responsible for collecting the kernel states (via the read call) and then sending all commands (via the write call) to CacheFiles; CacheFiles does the real jobs of prefetch and replacement. That is to say, cachefilesd is to CacheFiles as policy is to mechanism.
Fig.2 shows the pseudo-code of cache management in cachefilesd (policy initializations are omitted). Here, we isolate the prefetch algorithm from the replacement algorithm, unlike conventional prefetch triggered by misses. This benefits the module design and simplifies the inner procedure.
cachefilesd() {
    n = read cache state to buffer;
    do {
        S1: collect system information;
    } while (buffer is not empty);
    for (prefetched records)
        S2: add block to cache;
    for (access records) {
        S3: access cache;
        S4: do the prefetch;
    }
    S5: cache replacing;
}
Fig.2. Simplified cache management in cachefilesd.
S1. This step processes the kernel information returned by the read call, including the prefetch trace, the data access trace, and the state of the cache resource.
S2. Because prefetch is asynchronous, this step adds successfully prefetched pages to the cache; when all pages belonging to the same block (an SSG may contain multiple pages) have been called back, SAC removes the corresponding node from the prefetch tree.
S3. This step replays the data access trace, updating the cache state (and, if needed, the states of all stable sets).
S4. This step checks whether to prefetch; if allowed, SAC issues the prefetch and inserts a node built from the prefetch information into a prefetch tree for quick lookup, so as to avoid prefetching a block repeatedly (see the sketch after these steps).
S5. The final step, according to the cache state and the physical state of the cache directory (via the statvfs call), triggers the replacement operation to remove the right data blocks.
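The duplicate suppression of step S4 can be sketched in user space with POSIX tsearch(); the paper does not specify the kernel's actual prefetch-tree structure, so the names below are hypothetical:

    #include <search.h>
    #include <stdint.h>
    #include <stdlib.h>

    static void *prefetch_tree = NULL;   /* root of the pending tree */

    static int cmp_u64(const void *a, const void *b) {
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
        return (x > y) - (x < y);
    }

    /* S4: returns 1 if the logical block is not yet pending and a
     * prefetch should be issued; 0 if it is already in flight. */
    static int should_prefetch(uint64_t lbn) {
        uint64_t *key = malloc(sizeof *key);
        if (!key)
            return 0;
        *key = lbn;
        uint64_t **slot = tsearch(key, &prefetch_tree, cmp_u64);
        if (*slot != key) {   /* already present: duplicate request */
            free(key);
            return 0;
        }
        return 1;
    }

    /* S2: all pages of the block were called back, so forget it. */
    static void prefetch_done(uint64_t lbn) {
        uint64_t **slot = tfind(&lbn, &prefetch_tree, cmp_u64);
        if (slot) {
            uint64_t *key = *slot;
            tdelete(&lbn, &prefetch_tree, cmp_u64);
            free(key);
        }
    }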
In essence, SAC is a combination of SSM and
CacheFiles, but two challenging issues need to be addressed for the design. The first issue is to find a data
organization to describe cache management. The second issue is how cache policies interact with SSM. The
next two subsections address the two design issues respectively.
3.2 File-Block Management
CacheFiles is a file-level cache and supports cache replacement only at file granularity, while SSM-based cache management needs block-level caching. To make the best use of cache management, we need to implement the cache management mechanism at block granularity.

Combining file-level and block-level cache management performs better than either alone. File-Aware-Cache[12] exploits file-level information to optimize a block-level cache; for example, it uses file information to avoid caching one-time sequential accesses to large files. That research showed that combined file- and block-level cache management can outperform simple file-level or block-level management.
To achieve a file-block-level cache, we combine the file identification and the offset. Generally, a file can be identified by its full path name, its inode, or (under NFS) its file handle. As the length of a full path name is uncertain and an inode may expire, we use the file handle as the file identifier. File handles and file block offsets are combined into logical block numbers. Since a file handle can be up to 128 bytes, using it directly as part of the logical block number would introduce a lot of time and space overhead. To reduce the overhead, we map the file handle to a 32-bit number and combine it with a 32-bit in-file offset to form the logical block number, as shown in Fig.3.
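A minimal sketch of this encoding follows; the FNV-1a hash is our assumption for illustration (the paper specifies only that the base64-encoded handle is hashed to 32 bits, not which hash is used):

    #include <stddef.h>
    #include <stdint.h>

    /* FNV-1a, a common 32-bit string hash; assumed here, not taken
     * from the paper. For simplicity this hashes the raw handle
     * bytes, whereas SAC hashes the base64-encoded characters. */
    static uint32_t fnv1a32(const unsigned char *data, size_t len) {
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < len; i++) {
            h ^= data[i];
            h *= 16777619u;
        }
        return h;
    }

    /* A file handle (up to 128 bytes under NFS) is reduced to 32 bits
     * and paired with the 32-bit block offset within the file to form
     * the 64-bit logical block number of Fig.3. */
    static uint64_t logical_block_no(const unsigned char *fh,
                                     size_t fh_len, uint32_t block_off) {
        return ((uint64_t)fnv1a32(fh, fh_len) << 32) | block_off;
    }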
Fig.3. A file handle is encoded into visible characters by the base64 algorithm, and then a hash function maps the visible characters to a 32-bit number. On the one hand, the hash function ensures that SAC runs with small overhead; on the other hand, the 32-bit in-file offset allows file sizes up to 2 TB, guaranteeing large-file support.

3.3 SSM-Based Cache Management
To use SSM for guiding cache prefetch and cache replacement, we need to define how SSM interacts with the cache (including cache replacement and cache prefetch). Unlike conventional designs, in which prefetch and cache replacement are mixed together, we isolate the prefetch interface from the replacement interface.
Both SSM-based prefetch and replacement can access stable sets through SSM's interfaces to guide their optimizations. Cache management is described as three distinct operations: cache access, cache replacement, and cache prefetch.
3.3.1 Cache Access
When a data block is accessed, we first determine whether it is in the cache; if so, the data is returned. Note that the lookup is done automatically by NFS, and we just record the access (FH, Offset, and Length). When the user daemon replays the access trace, we update the cache state, check state transitions of stable sets, and do data prefetch. Finally, if the data block belongs to a stable set, we update the stable set's last access time. Moreover, our data management uses a hash list to reduce space overhead.
3.3.2 Cache Replacement
Stable set temporal-locality can be used to enhance any replacement algorithm, because it is orthogonal to the theoretical basis of all replacement algorithms, including temporal locality, access frequency, and so on.
The replacement procedure is divided into three interfaces.
The cache hit interface (S3 in Fig.2) handles only hits. It first updates the cache management metadata according to the hit information, and then updates the states of all stable sets: if the interval since the last update time of a stable set exceeds its maximum active interval, the active set turns inactive. When a miss occurs, we simply leave it to prefetch.
The cache allocate interface (S2 in Fig.2) processes successfully prefetched data blocks, which have already been stored in the underlying physical cache. If a block belongs to a stable set, we insert it into the corresponding stable set. Under the current prefetch design, an NFS read does not write to the CacheFiles cache disk, so CacheFiles caches only the prefetched data.
The cache release interface (S5 in Fig.2) recycles cache resources, based on the cull flag from CacheFiles. It first uses the statvfs call to detect the free space in the cache (see the sketch below), then selects inactive data blocks to release according to the configured threshold, and finally uses the CacheFiles HCULL interface to release the physical blocks.
Based on SSM, the replacement policy removes only the data blocks of inactive stable sets, providing more space for active stable sets and improving cache utilization.
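The trigger for the release interface can be sketched with the statvfs call mentioned above; the cache directory path and the free-space threshold are hypothetical:

    #include <sys/statvfs.h>

    /* Returns 1 when the free fraction of the cache filesystem drops
     * below min_free (e.g., 0.10) and culling should start. */
    static int cache_needs_culling(const char *cache_dir, double min_free) {
        struct statvfs st;
        if (statvfs(cache_dir, &st) != 0)
            return 0;   /* cannot tell; do not cull */
        double free_frac = (double)st.f_bavail / (double)st.f_blocks;
        return free_frac < min_free;
    }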
3.3.3 Cache Prefetch
Stable set prefetch fetches the entire stable set of data blocks; any access not belonging to a stable set fetches one stable-set-granularity block of data (S4 in Fig.2).
The SSM-based prefetch method monitors the access distance within a stable set to predict state transitions of stable sets. If the access interval of an inactive stable set is less than its maximum active interval, the prediction is that the stable set is turning active, and we need to prefetch all data blocks of the stable set.
SSM-based prefetch decides when to issue prefetch based on the state of a stable set rather than on a single access; this principle is the biggest difference from conventional prefetch. Prefetching the whole active stable set into the cache reflects the very nature of stable sets, the always-accessed-together feature: the whole stable set will be repeatedly accessed together during a relatively long period.
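As a user-space analog, issuing a whole-set prefetch once a set is predicted active could look like the sketch below; the real Prefetch-Module works inside CacheFiles on NFS pages, so this posix_fadvise version is only an illustration:

    #include <fcntl.h>

    /* One resident block of a stable set: file descriptor, offset,
     * and length (hypothetical layout). */
    struct set_member {
        int   fd;
        off_t off;
        off_t len;
    };

    /* Advise readahead for every block of a newly active stable set,
     * making the requests asynchronous from the application's view. */
    static void prefetch_stable_set(const struct set_member *m, int n) {
        for (int i = 0; i < n; i++)
            posix_fadvise(m[i].fd, m[i].off, m[i].len,
                          POSIX_FADV_WILLNEED);
    }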
4 Experimental Evaluation
To evaluate the performance of SAC, we implemented the prototype on Linux 2.6.32, transplanted ext4's punch-hole function from Linux 3.0, and compared our scheme with the original CacheFiles. The results are shown below.
Table 1. Experimental Setup

        CPU                              Memory (GB)  Network  OS
Client  Intel® Xeon E5620 2.40 GHz       12           Gigabit  Linux 2.6.32
MDS     Intel® Xeon E5620 2.40 GHz        2           Gigabit  Linux 2.6.32
DSS     Intel® Pentium E6500 2.93 GHz     2           Gigabit  Linux 2.6.32

4.1 Experimental Setup

The experiment platform is based on BWFS and includes three nodes: the data storage server (DSS), the metadata server (MDS), and the client. Though our design assumes an SSD, as the client machine has 12 GB of memory, we simply use a 4 GB RAM disk as the cache space. The experimental setup is detailed in Table 1.
We choose several conventional testing tools[13] and measure their execution time. These tools represent different access modes:
1) Cat is a Linux tool to output the content of a file;
2) Grep② is a Linux tool to search a collection of
files for lines containing a match to a given regular expression;
3) Diff is a Linux tool that compares two files for differences. It sequentially reads two files and then compares them;
4) Cp sequentially reads from one file while writing to another;
5) Cscope③ is an interactive utility that allows users to view and edit parts of the source code relevant to specified program items with the aid of an index database;
6) Glimpse④ is a text information retrieval tool,
searching for keywords through large collections of text
documents. It builds approximate indices for words and
searches relatively fast with small index files.
7) Fio⑤ is a tool that will spawn a number of threads
or processes doing a particular type of I/O action as
specified by the user;
8) IOzone⑥ is a file system benchmark tool. It generates and measures a variety of file operations.
First we mount a BWFS directory on the client, and then run all experiments. The first six tools are used in Subsections 4.2 and 4.3 as basic experiments to show the advantages of SSM. As Fio reports all sorts of I/O performance information, including complete I/O latencies, we use it to test random reads in Subsection 4.4. IOzone is useful for performing a broad file system analysis, so we use it in the overall file system test
in Subsection 4.5. Because SSM needs a stable set file, we first run each test to produce stable sets. Between any two consecutive runs, the buffer cache is emptied to ensure that the second run does not benefit from cached data. Then we run all experiments again. Every test is run three times to get an average value.

② http://www.gnu.org/software/grep/, Jan. 2014.
③ http://cscope.sourceforge.net, Jan. 2014.
④ http://freecode.com/projects/glimpse, Jan. 2014.
⑤ http://freshmeat.com/projects/fio, Jan. 2014.
⑥ http://www.iozone.org, Jan. 2014.
4.2 Effectiveness of File Block
The original CacheFiles (orig) uses file-level management in cachefilesd and cannot prefetch data on its own, passively accepting NFS requests at page granularity. In SAC, the SSG is the basic unit of cache management (orig/ssg). We first compare the management methods of orig and orig/ssg; Table 2 lists all the experiments.
Table 2. Basic Experiments

Tools    Experiment
Cat      Traverse a glibc source code directory. The size of the source code is 142 MB.
Grep     Traverse a glibc source code directory and search for 32 system keywords. The size of the source code is 142 MB.
Diff     Compare two source code directories of glibc. The size of the dataset is 288 MB.
Cp       Copy a Linux source code directory to the local disk. The size of the source code is 545 MB.
Cscope   Traverse three glibc source code directories and search for 32 keywords. The size of the source code and index is 500 MB.
Glimpse  Traverse three glibc source code directories and search for 32 types of keywords. The size of the source code is 393 MB.

Note: all source code directories are in the mounted directory.
Fig.4 shows that orig/ssg is more effective: response time is reduced by 17%∼70%. Because the SSG size (obtained by mining) is 64 KB, bigger than the 4096-byte page size, it provides better locality. So the management scheme of SAC is feasible.
Fig.4. Execution time of different managements. Orig/ssg can fetch a block of SSG size. SSG means a bigger size that can effectively capture long-term block referencing behavior.

4.3 Performance of Prefetch

We design stable set prefetch (ssetpref) to enhance CacheFiles (orig/ssetpref) and compare orig/ssetpref with orig/ssg. All the experiments are listed in Table 2, the same as in Subsection 4.2. Fig.5 shows that orig/ssetpref is more effective: the response time is reduced by 13.7%∼54.3%.

Fig.5. Execution time of the six benchmarks. For a request, the ssg method just fetches a block of SSG size; the ssetpref method can prefetch a set of blocks.

In each test, we find only one stable set, which covers the whole amount of data, and the whole stable set is repeatedly accessed. Essentially, Fig.5 shows the effectiveness of SSM-based prefetch. On the one hand, the overall optimization comes from aggregating related blocks into one read request instead of issuing a single request at a time, which helps reduce the number of RPC interactions. On the other hand, the ssetpref method can prefetch data in time and reduce the miss ratio. Therefore, SSM-based prefetch is reasonable.

4.4 Performance of Replacement

When the cache lacks free data blocks, we need to recycle cache blocks. To demonstrate the design of the SSM-based replacement algorithm, we choose the Fio random read test, which can produce enough data to fill the cache. Although the original cachefilesd uses file-level LRU management, we directly use SSM to guide the LIRS algorithm, given the advantages of LIRS[9]. Adding SSM temporal-locality to the LIRS replacement algorithm, we get the SSLIRS replacement algorithm, whose main principle remains the same as LIRS.

Fio randomly reads 5 GB of file data, 2 MB per file. First Fio reads a directory of 3 GB of data (phase 1), during which there is no replacement; then it sleeps 10 seconds (phase 2) and reads the other directory of 2 GB of data (phase 3). At around 600 s, SAC triggers replacement.

Fig.6 shows the result: LIRS plus prefetch consumes 1043 s, while SSLIRS plus prefetch consumes 816 s, reducing the time by 21.76%. SSLIRS can adjust the cache state in time and proactively move inactive sets to HIR, so that when the available cache is insufficient, the candidate set to be replaced can be found. The original LIRS is not aware of such a situation; its replacement is only reactive.

Fig.6. Comparison of execution time. These two executions are the same until replacement occurs.

As soon as the candidate set is determined, SSM-based replacement can free up space in time, which is a radical change in the process, different from the gradual process of conventional replacement algorithms.

Conventional replacement algorithms are not aware of the SSM-based data access rule, so they choose the victim based on the current state of cached data and history information. SAC can quickly swap out inactive sets without affecting the active sets, so SSM-based replacement is reasonable.
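The SSLIRS demotion step can be sketched as follows, assuming a simplified LIRS with one LIR/HIR status bit per block; the LIRS stack pruning is omitted and all names are hypothetical:

    /* When the MAI-based detection of Subsection 2.4 finds a stable
     * set inactive, demote every resident block of the set from LIR
     * to HIR, making it an immediate eviction candidate. This is the
     * proactive step that plain LIRS, being purely reactive, lacks. */
    struct cached_block {
        struct cached_block *next;  /* next block of the same set  */
        int lir;                    /* 1 = LIR (protected), 0 = HIR */
    };

    struct sset_rt {
        struct cached_block *blocks;  /* resident blocks of the set */
    };

    static void demote_inactive_set(struct sset_rt *s) {
        for (struct cached_block *b = s->blocks; b; b = b->next)
            b->lir = 0;
    }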
4.5 Performance of File System Benchmark
To test the overall effectiveness of SSM-based cache management, we run the IOzone file system benchmark. IOzone measures file I/O performance by generating particular types of operations in batches. It creates a single large file and performs a series of operations on that file. We measure the performance by running two IOzone random read tests one after another, varying the request block size from 4 KB to 5 MB. The size of each file is 3 GB.
Fig.7. Throughput of IOzone with different request block sizes and the whole execution time. (a) Throughput of phase 1. (b) Throughput of phase 2. (c) Execution time of all phases.

Fig.7(a) shows that during phase 1 SAC improves throughput by 8.67%∼936%, with the 4096-byte request size being the most remarkable. Fig.7(b) shows that while phase 2 is going on, replacement occurs and SAC swaps out inactive stable sets accessed only in phase 1, resulting in improvements of 1.93%∼644%. We also measure the execution time: Fig.7(c) shows the overall execution time is reduced by 0.3%∼84%.
The overall effectiveness is as follows: when no replacement occurs, SSM-based prefetch reduces network overhead by aggregating related reads; replacement can swap out inactive data in time without affecting the data being accessed. So SSM-based cache management brings effective optimization to the client cache.
Because throughput mainly depends on the sequentiality of disk requests, as the request block size increases, accesses to disk become largely sequential and the benefit of SSM becomes limited.
4.6 Evaluation of Overhead
To illustrate the integration overhead caused by SSM, we disable data caching and compare SAC with the original CacheFiles system. Although SAC caches no data, it still maintains the cache state and replays the access trace.
As an example, we present the results for the experiments of Subsection 4.3. Fig.8 shows that compared with the original CacheFiles, the overheads of SAC are 8.8%, 0.695%, 0.659%, 0.811%, and 1.77%, respectively.
So we think that, with today's fast computing speeds, the integration of SSM does not bring too much negative impact.
5 Discussion
In all experiments, the effectiveness of SSM-based prefetch is the most obvious. SSM-based replacement plays a significant role when large numbers of stable sets become inactive.
Fig.8. Five applications’ execution time of SAC and CacheFiles.
In our implementation, the mining algorithm is offline and we must run applications to obtain stable sets. Because of the repeatedly-accessed feature captured by stable set temporal-locality, there is no need to mine the access trace continuously; re-mining stable sets at intervals is a feasible solution. We therefore leave research on the time regularity of stable sets as future work.
There are several limitations in our work. First, in this paper, prefetch is sometimes aggressive and the throttling of prefetch is simple; the prefetched data may be useless if applications read slowly. It is worthwhile to evaluate prefetch utility, so we will focus on actual control of prefetch in future work. Second, CacheFiles is only a read-only cache, because FS-Cache for NFS in Linux 2.6.32 does not support write operations. Third, IOzone supports multiple threads, but because SSM-based prefetch can deliver data early and a thread obtaining data early can terminate IOzone prematurely, we use only one thread.
6 Related Work
Cache Systems. Caching has been widely used in distributed file systems to improve performance. AFS[3] implements a cache manager called Venus at the client to improve system performance. Venus duplicates entire files from the remote file server to the client's local disk as replicas and carries out all of the client's read and write operations on the local replicas. Nache[1] is an NFSv4-oriented caching proxy system based on the NFSv4 implementation of proxy support. Nache includes the Nache Server and the Nache Client. Nache enables
the cache and the delegation to be shared among a set
of local clients, thereby reducing conflicts and improving performance. In Nache, CacheFS[2] is used primarily
to act as an on-disk extension of the buffer cache.
Bwcc[14] is a cooperative caching system for network storage systems that uses disks as its cache media through FS-Cache[2]. Dm-cache[15], a general block-level disk cache, can be transparently plugged into a client for any storage system and supports dynamic customization for policy-guided optimizations. To reduce the idle power caused by main memory in web server platforms, FlashCache[16] uses a two-level file buffer cache composed of a relatively small DRAM and a flash memory; it needs DMA transactions to transfer flash memory content to and from DRAM.
Cache Algorithms. There is also much research on cache algorithms[7-9], but it lacks consideration of transition faults, which are important in large-capacity cache management. Traditional prefetch and replacement algorithms cannot predict data access across transitions, leading to low efficiency of cache systems when a lot of data has to be exchanged. DiskSeen[17] performs more accurate prefetch and achieves more continuous streaming of data from disk by efficiently tracking disk accesses. However, compared with the always-repeatedly-accessed feature of stable sets, DiskSeen employs just a tuned linear sequence of address accesses.
7 Conclusions
The cache management of the existing CacheFiles cannot be aware of radical changes in data references and is limited by phase transitions. The performance of CacheFiles can be significantly improved by utilizing SSM well, which in turn demonstrates the effect of phase transitions. SSM-based prefetch and replacement both benefit from SSM's prediction of cache state. SSM is orthogonal to traditional cache management algorithms and can provide enough information about data correlations to guide cache management. Our implementation of the SAC scheme shows the effectiveness and feasibility of SSM in real systems and provides a preliminary basis for managing other large-capacity client caches for big data and cloud computing. In the future, we will focus on the time regularity of stable sets, which can provide more accurate prediction of the states of stable sets. Prefetch control and system optimization will also be part of our future work.
References
[1] Gulati A, Naik M, Tewari R. Nache: Design and implementation of a caching proxy for NFSv4. In Proc. the 5th USENIX
Conf. File and Storage Technologies, Feb. 2007, pp.199-214.
[2] Howells D. FS-Cache: A network filesystem caching facility.
In Proc. the Linux Symposium, July 2006, pp.424-440.
[3] Howard J H, Kazar M L, Menees S G et al. Scale and performance in a distributed file system. ACM Transactions on
Computer Systems, 1988, 6(1): 51-81.
[4] Satyanarayanan M, Kistler J J, Kumar P et al. Coda: A
highly available file system for a distributed workstation environment. IEEE Trans. Computers, 1990, 39(4): 447-459.
[5] Yang D, Huang H, Zhang J et al. BWFS: A distributed file
system with large capacity, high throughput and high scalability. J. Computer Research and Development, 2005, 42(6):
1028-1033. (In Chinese)
[6] Denning P J. Working sets past and present. IEEE Transactions on Software Engineering, 1980, 6(1): 64-84.
[7] Megiddo N, Modha D S. ARC: A self-tuning, low overhead
replacement cache. In Proc. the 2nd USENIX Conference on
File And Storage Technologies, March 2003, pp.115-130.
[8] Johnson T, Shasha D. 2Q: A low overhead high performance
buffer management replacement algorithm. In Proc. the 20th
Int. Conf. Very Large Data Bases, Sept. 1994, pp.439-450.
[9] Jiang S, Zhang X D. LIRS: An efficient low inter-reference
recency set replacement policy to improve buffer cache performance. In Proc. the 2002 ACM SIGMETRICS, June 2002,
pp.31-42.
[10] Guo M Y, Liu L, Zhang Y L et al. Stable set model based
methods for large-capacity client cache management. In Proc.
the 14th HPCC, June 2012, pp.681-690.
[11] Butt A R, Gniady C, Hu Y C. The performance impact of kernel prefetching on buffer cache replacement algorithms. In Proc. ACM SIGMETRICS Int. Conf. Measurement and Modeling of Computer Systems, June 2005, pp.157-168.
[12] Sivathanu M, Prabhakaran V, Popovici F I et al. Semantically-smart disk systems. In Proc. the 2nd USENIX Conference on File and Storage Technologies, March 2003, pp.73-88.
[13] Traeger A, Zadok E, Joukov N et al. A nine year study of
file system and storage benchmarking. ACM Transactions
on Storage, 2008, 4(2): Article No.5.
[14] Shi L, Liu Z J, Xu L. BWCC: A FS-cache based cooperative
caching system for network storage system. In Proc. the 2012
IEEE CLUSTER, September 2012, pp.546-550.
[15] Van Hensbergen E, Zhao M. Dynamic policy disk caching
for storage networking. Technical Report, RC24123, IBM
Research Division Austin Research Laboratory, http://citeseerx.ist.psu.edu/showciting?cid=19808002, Jan. 2014.
[16] Kgil T, Mudge T. FlashCache: A NAND flash memory file
cache for low power Web servers. In Proc. the 2006 CASES,
October 2006, pp.103-112.
[17] Ding X N, Jiang S, Chen F et al. DiskSeen: Exploiting disk
layout and access history to enhance I/O prefetch. In Proc.
USENIX Annual Technical Conference, June 2007, Article
No.20.
Jian-Liang Liu received his M.S. degree in computer science from China University of Geosciences, Beijing, in 2010. He is currently a Ph.D. candidate at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing. His research interests include block-level storage and cache management.
Yong-Le Zhang received his M.S. degree in computer science from ICT, CAS, in 2013. He is now a Ph.D. candidate at the University of Toronto, Canada. His research interests include network storage and cache management.
Lin Yang received her B.S. degree in computer science from Beijing Language and Culture University, China, in 2011. She is currently an M.S. candidate at ICT, CAS. Her research interests include network storage and cache management.
Ming-Yang Guo received his Ph.D. degree in computer architecture from ICT, CAS, Beijing, in 2012. He is now an assistant professor at the Data Storage and Management Technology Research Center, ICT, CAS. His research interests include distributed RAID and cache management.
Zhen-Jun Liu received his Ph.D. degree in computer architecture from ICT, CAS, Beijing, in 2006. He is currently an associate professor at the Data Storage and Management Technology Research Center, ICT, CAS. His research interests include WAN storage, distributed file systems, and RAID management.
Lu Xu received his M.S. degree in computer architecture from the University of Tokyo, Japan, in 1989 and his Ph.D. degree in computer systems software from Purdue University, USA, in 1995. He is currently a professor at the Data Storage and Management Technology Research Center, ICT, CAS. His research interests include computer architecture, high-performance network storage, and computer system software.