Jin PQ, Xie X, Jensen CS et al. HAG: An energy-proportional data storage scheme for disk array systems. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 30(4): 679-695 July 2015. DOI 10.1007/s11390-015-1554-x

HAG: An Energy-Proportional Data Storage Scheme for Disk Array Systems

Pei-Quan Jin 1,2, Senior Member, CCF, Member, ACM, IEEE, Xike Xie 3,∗, Member, ACM, IEEE, Christian S. Jensen 3, Fellow, ACM, IEEE, Yong Jin 1, and Li-Hua Yue 1,2, Senior Member, CCF, Member, ACM

1 School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
2 Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei 230027, China
3 Department of Computer Science, Aalborg University, Aalborg, DK-9220, Denmark

E-mail: [email protected]; {xkxie, csj}@cs.aau.dk; [email protected]; [email protected]

Received January 29, 2015; revised March 25, 2015.

Regular Paper. Special Section on Data Management and Data Mining.
The work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61379037 and 61472376, and the Oversea Academic Training Funds (OATF) sponsored by the University of Science and Technology of China.
∗Corresponding Author
©2015 Springer Science + Business Media, LLC & Science Press, China
Abstract    Energy consumption has become a critical issue for data storage systems, especially for modern data centers. A recent survey showed that power costs amount to about 50% of the total cost of ownership in a typical data center, with about 27% of the system power being consumed by storage systems. This paper aims at providing an effective solution to reducing the energy consumed by disk storage systems. Differing from previous approaches, we adopt two new designs. 1) We introduce a hotness-aware and group-based system model (HAG) to organize the disks, in which all disks are partitioned into a hot group and a cold group. We only migrate files between the two groups and avoid migration within a single group, so that we are able to reduce the total cost of file migration. 2) We use an on-demand approach to reorganize files among the disks that is based on workload change as well as the change of data hotness. We conduct trace-driven experiments involving two real and nine synthetic traces and make detailed comparisons between our method and competitor methods according to different metrics. The results show that our method can dynamically select hot files and disks when the workload changes and that it is able to reduce energy consumption for all the traces. Furthermore, its time performance is comparable to that of the compared algorithms. In general, our method exhibits the best energy efficiency in all experiments, and it is capable of maintaining an improved trade-off between performance and energy consumption.
Keywords    energy-aware system, file organization, storage management

1 Introduction
Reducing the energy cost of storage systems is a critical issue for disk-based infrastructures such as data centers. A recent study showed that energy consumption accounts for 50% of the total cost of ownership for data centers[1], and that storage systems account for 27% of system power[2-3]. To enable cost-effective data centers, there is a strong need to develop effective schemes to reduce the energy consumption of storage systems[4].
In order to reduce energy consumption, many researchers have proposed energy-proportional approaches for file storage on disk array systems[5-7], which aim to keep frequently accessed hot files on some specific disks and to let the disks hosting cold files temporarily power off. However, current solutions for energy-proportional storage may introduce high file-migration and additional energy costs. In addition, they cannot be self-tuned according to workload changes.
In this paper, we propose an effective approach to
realize energy-proportional data storage on disk array
systems. Our solution aims to reduce data migration
costs when distributing and reorganizing files among
disks, and we present a new on-demand algorithm for file reorganization that can adapt to workload changes. The major contributions of this paper are as follows.
1) We present a hotness-aware and group-based system model for disk storage systems, along with effective algorithms for dynamically determining hot files and grouping disks according to the access frequencies of their files. Data migration is only allowed between the hot and the cold disk groups. This group-based model can reduce migration costs because migration within a single group does not happen in our model. (Section 3)
2) We present a new on-demand approach to reorganizing files among disks, in which the reorganization process is triggered on the basis of the change of file hotness and the predicted migration costs. This policy can reduce the number of reorganizations as well as unnecessary adjustments to the file organization. (Section 4)
3) We conduct experiments with both real and synthetic traces to compare the proposed approach with previous solutions. The experimental results show that our approach considerably outperforms the competitors with respect to energy savings while maintaining comparable time performance. (Section 5)
2 Related Work
A number of energy conservation techniques have
been proposed for disk storage systems[4] , most of which
are based on the assumption of skewed file access patterns. Therefore, the main approach is to keep hot files
on some specific disks and to let the disks hosting cold
files temporarily power off. Such approaches are usually
called “energy proportional”[5-7] . To realize energy proportionality for disk storage systems, several techniques
have been proposed.
The first types of solutions can be called copy-based
techniques[2,5-9] , because they copy popular files onto
some additional disks or caches and let primary data
drives power off. Weddle et al.[2] exploited the unused
storage space for data replication such that one or more
disks of a RAID can be put in standby mode to save power. They called this replication technique power-aware RAID (PARAID). Inspired by PARAID, Kim
and Rotem[5-6] proposed replicating data across nodes
to facilitate node deactivation when the system load decreases. They called their technique Fractional Replication for Energy Proportionality (FREP). Verma et al.[7]
proposed to consolidate the workload across physical
volumes using a technique they called Sample-Replicate
Consolidate Mapping (SRCMap). They assumed that
a physical volume is composed of a RAID. SRCMap
samples a subset of the blocks, namely the working set,
from each physical volume. This working set is replicated on other physical volumes. SRCMap only keeps
the minimum number of physical volumes required by
the workload turned on. MAID (Massive Array of Idle
Disks)[8] was proposed as a replacement for old tape
backup archives with hundreds or thousands of tapes.
It uses a few additional always-on cache disks to hold recently accessed blocks to reduce the number of accesses
to other disks. However, this layout is not energy-efficient because the extra cache disks consume energy.
The EXCES (External Caching in Energy Saving Storage Systems)[9] technique uses a low-end flash device
for caching popular data and uses buffering of writes to
increase idle periods of disk drives. EXCES maintains
the top-k hottest files and moves them to flash devices
periodically. A dedicated storage cache does not provide fine-grained energy proportionality and adds additional energy consumption as well as economic costs. In
order to get a better trade-off between energy savings
and read/write availability, Thereska et al.[10] proposed
a distributed storage system that uses a power-aware
replica layout for data chunks and a distributed virtual logging scheme to improve read/write availability.
However, this approach also needs to maintain replicas
of data. Besides, it has to use extra logs as well as a
centralized metadata service to coordinate data replicas.
Another method is to redistribute files among
disks[11-12] . Pinheiro and Bianchini proposed the PDC
(Popular Data Concentration) technique for energy
conservation[12] . PDC does not use additional disks,
but suggests transferring data between disks according
to the popularity of the accessed data. PDC focuses
on workloads with highly skewed file access patterns.
It periodically moves files on the basis of their access
frequencies, i.e., the most popular files are moved onto
the first disk until it is full and the next most popular
files are put onto the second disk, etc. However, PDC
employs an individual-disk-based file migration policy. This policy may cause too many file movements, because the access frequencies of files change very often in many applications. As the hotness of the files on a disk and on its neighbor is very close, a slight change of file hotness may lead to file migrations. This can result in substantial performance degradation even when all disks are in active mode. In contrast, our proposal uses an improved group-based file migration model that groups all the disks into a hot disk group
and a cold disk group. File migrations are only allowed
between groups. This scheme can reduce the unnecessary file migrations in PDC that are caused by the slight
changes of the hotness of files. On the other hand, moving files between two hot disks contributes little to either energy saving or performance improvement, because these disks may always be in active mode. Therefore, reducing file migrations between disks within the same group saves energy without degrading time performance. The disk-grouping idea was also proposed in [13], which was designed for RAID disk systems and especially for read-only workloads. In that work, the authors proposed to partition all disks into RAID groups and use a disk-block-based method to exchange data between groups. However, differing from [13], our work can support non-RAID systems and handle general workloads with reads and writes. In addition, we use a file-based exchange model rather than a disk-block-based model.
A method for energy proportionality called E-HASH was proposed recently[11]. E-HASH divides all the disks into a hot disk set and a cold disk set and periodically puts the files with high access frequencies onto the hot disk set. The cold disk set can be turned off or switched into the sleep state.
Shutting off cold disks is a straightforward idea that can be effective at reducing energy consumption, but we have to consider the wake-up cost for cold disks that are shut off. Therefore, when and how to turn off cold disks has been a research topic in energy-proportional disk storage systems[4,14-15]. The arguably
best approach to addressing this problem is called DPM
(Dynamic Power Management)[14] . DPM aims at finding an optimal solution to determining the right shut-off
time for cold disks. The basic idea is to measure the
idle time interval of cold disks, and to shut them off
if the energy savings in the observed idle time interval exceeds a predefined energy penalty. The predefined energy penalty reflects the additional energy cost
of serving a request to a disk that is shut down before
the request is received (thus we have to immediately
turn on the disk). However, the DPM policy is only
based on energy penalty and idle time of disks. It does
not consider any data movement among disks. We use
DPM as a baseline in our experiments.
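To make the DPM rule concrete, here is a minimal sketch (our illustration, not the algorithm from [14]): a disk is shut off only if the energy saved over the observed idle interval exceeds a wake-up penalty. The idle and standby powers follow Table 3 of this paper; the penalty value is an assumption.

```python
# Minimal sketch of the DPM shut-off rule (illustrative values, not from [14]).
IDLE_POWER_W = 9.3        # power drawn in idle mode (W), cf. Table 3
STANDBY_POWER_W = 0.8     # power drawn in standby mode (W), cf. Table 3
WAKEUP_PENALTY_J = 450.0  # assumed extra energy for stopping and spinning up again (J)

def should_shut_off(observed_idle_seconds: float) -> bool:
    """Shut the disk off only if staying idle wastes more energy than the wake-up penalty."""
    saved = (IDLE_POWER_W - STANDBY_POWER_W) * observed_idle_seconds
    return saved > WAKEUP_PENALTY_J

if __name__ == "__main__":
    for t in (10, 30, 60, 120):
        print(t, should_shut_off(t))
```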
Our work is influenced by E-HASH. However, there
are some major differences between our approach and
E-HASH. First, the size of the hot disk set is fixed in
E-HASH while in our work, it is dynamically adapted
according to the changes in the workload. Second, E-
HASH redistributes the files among disks at a fixed period, while we use a new dynamic approach
to reorganize the files among disks. Here, the reorganization process is triggered on the basis of the changes
in the hotness of files and of predicted migration costs.
As the redistribution process adds extra energy cost
and adversely affects the time performance of the storage system, our approach is expected to have better
performance in both energy conservation and run time.
3 HAG System Model
To address the problems of previous methods such
as PDC and E-HASH, we first propose an improved
model for disk storage systems, which is called HAG
(Hotness-Aware and Group-Based model). The model
takes into account the hotness of files and disks when
reorganizing files. The group-based design groups disks
into two sets, namely hot and cold groups.
3.1 System Model
Fig.1 shows the general idea of the HAG model. All
disks in a storage system are categorized into a hot
group and a cold group. The hotness of a disk is computed based on the hotness of the files residing on the
disk. In Subsection 3.2, we cover the details on how to
determine the hotness of files. In general, the hotness
of a file is influenced by its access frequency. As the access frequency of a file changes with time, the hotness of
its associated disk will eventually change, too. Therefore, in the HAG model, the hot disks are dynamically
selected on the basis of the hotness of files as well as
disks.
Fig.1. HAG (Hotness-Aware and Group-Based) system model. (Figure: file I/O requests pass through a file I/O dispatcher with a file-disk mapping table and are routed to the hot disks or the cold disks, with dynamic migration between the two groups.)
The operational process of the HAG model is shown
in Fig.2. We periodically check the state of the system
to determine whether file reorganization is needed. If
yes, we determine the hot files and the hot disks. Both
the checking of the file reorganization condition and
the selection of hot files and disks are based on statistics collected during the operation of the system. After
that, we move data between hot disks and cold disks so
that the hottest data can be allocated on the selected
hot disks. The cold disks are then set in standby states
for the sake of saving energy.
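To make the model concrete, the sketch below shows one possible in-memory representation of the HAG state just described: a file-disk mapping table plus disjoint hot and cold disk groups, with migration restricted to crossing the two groups. The class and method names are ours, not part of the paper.

```python
# Illustrative data structures for the HAG model (names are ours, not the paper's).
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class HagState:
    file_to_disk: Dict[str, int] = field(default_factory=dict)  # file-disk mapping table
    hot_disks: Set[int] = field(default_factory=set)            # hot disk group
    cold_disks: Set[int] = field(default_factory=set)           # cold disk group

    def dispatch(self, file_id: str) -> int:
        """Route an I/O request to the disk currently hosting the file."""
        return self.file_to_disk[file_id]

    def migrate(self, file_id: str, target_disk: int) -> None:
        """Move a file; in HAG this only happens between the two groups."""
        src = self.file_to_disk[file_id]
        assert (src in self.hot_disks) != (target_disk in self.hot_disks), \
            "HAG only migrates files between the hot group and the cold group"
        self.file_to_disk[file_id] = target_disk
```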
Fig.2. HAG-based system procedure. (Flowchart: start with the initialization HOT = {all the disks}, COLD = ∅; wait tp seconds; check whether file reorganization is needed; if not, make a slight adjustment on the file allocation; if yes, detect hot files, select hot disks, interchange data between hot and cold disks, and set the cold disks into standby states; repeat until the end.)

Compared with previous models such as PDC and E-HASH, the major benefits of HAG can be described in two aspects. First, differing from the single-disk-based model of PDC, in which file migration happens between two disks with different hotness values, the HAG model only performs file migration between the two disk groups (hot group and cold group). There will not be any file migrations between two disks within the same group. Therefore, HAG can reduce the unnecessary file migration costs incurred by PDC. Second, in E-HASH, the number of hot disks remains stable, while in HAG, we use a dynamic selection procedure to determine the set of hot disks according to workload change. Generally, different applications may have different file access patterns, and the frequently accessed files change as applications change. Thus, it is not practical to always keep a fixed number of hot disks.

3.2 Determining Hot Files

Many database applications exhibit skewed access patterns. Basically, we can use the access count to measure the hotness of a file. However, as access counts change with time, we have to introduce a smoothing or aging scheme to reflect the time factor. For example, a file accessed frequently and recently should be regarded as hotter than a file that was accessed frequently in the past. Therefore, we define the hotness of a file as follows:

est_r(t_n) = α × x_{t_n} + (1 − α) × est_r(t_{n−1}).  (1)
In (1), t_n represents the current time slice and x_{t_n} represents the observed value at time t_n. In our framework, x_{t_n} is 1 if file r was accessed during t_n; otherwise x_{t_n} is 0. Next, est_r(t_{n−1}) is the hotness estimate for the previous time slice t_{n−1}. The variable α is a decay factor that reflects the weight given to the new observation versus the earlier estimate est_r(t_{n−1}). In our experiments, the I/O requests to files are written into a log file, and we can simply scan the log to calculate the hotness of the files.
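As a minimal sketch of how (1) could be evaluated over such a log, the function below applies one time slice of the estimate to every known file; the per-slice log format is an assumption on our part, and α = 0.05 is the decay factor used later in the experiments.

```python
def update_hotness(est, accessed_files, alpha=0.05):
    """One time slice of (1): est_r(t_n) = alpha * x + (1 - alpha) * est_r(t_{n-1}),
    where x is 1 if file r was accessed in this slice and 0 otherwise."""
    for f in set(est) | set(accessed_files):
        x = 1.0 if f in accessed_files else 0.0
        est[f] = alpha * x + (1 - alpha) * est.get(f, 0.0)
    return est

# Toy usage: three time slices scanned from a log (assumed per-slice access sets).
est = {}
for accessed in [{"a", "b"}, {"a"}, {"c"}]:
    est = update_hotness(est, accessed)
print(sorted(est.items(), key=lambda kv: -kv[1]))
```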
In order to finally determine which files are hot, we sort all the files in descending order of est_r(t_n). Then the top 1/4 of the files are initially identified as hot files. According to the 20/80 rule in database applications, we suppose that 20% of the files are possibly hot. Here, we slightly increase the expected size of the set of hot files to 25%, because the energy penalty of recognizing a cold file as a hot one is smaller than that of the opposite case. When a hot file is labeled as cold, we have to turn the disk hosting the file on and off frequently. Such switches of power states cost more power than ordinary operations on a disk. On the other hand, it is not energy-efficient to label all the remaining 3/4 of the files as cold data. In general, the hottest and coldest files are easy to recognize based on access frequency. The difficult part lies in the identification of "warm files". Since they are likely in a changing state, either from hot to cold or from cold to hot, we have to deal with those files carefully. In order to find the warm files that are likely to turn into hot files, we first compute the mean est value of all the warm files, and
then identify those files whose est values are higher than
the mean as hot, and the others as cold. The detailed
process is described in Algorithm 1.
Algorithm 1. Determining HotFiles(F)
Input: the set of files F, each of which has an associated est value
Output: H: the set of hot files
1: G ← sort all files in F in descending order of est;
2: H ← the top 1/4 files in G;
3: C ← the bottom 1/4 files in G;
4: W ← G − H − C; // get the warm files
5: m ← the mean est value of all the files in W;
6: for each file f in W do
7:    if f.est > m then H ← H ∪ {f};
8: end for
9: return H;
End Determining HotFiles
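The Python sketch below mirrors the steps of Algorithm 1; it assumes each file is given as an (identifier, est) pair, which is our representation rather than the paper's.

```python
def determine_hot_files(files):
    """Sketch of Algorithm 1. files: list of (file_id, est) pairs; returns the set of hot file ids."""
    g = sorted(files, key=lambda fe: fe[1], reverse=True)   # line 1: sort by est, descending
    q = len(g) // 4
    hot = [fid for fid, _ in g[:q]]                         # line 2: top 1/4 identified as hot
    warm = g[q:len(g) - q] if q > 0 else list(g)            # line 4: the files between hot and cold
    if warm:
        m = sum(e for _, e in warm) / len(warm)             # line 5: mean est of the warm files
        hot.extend(fid for fid, e in warm if e > m)         # lines 6-8: warm files above the mean
    return set(hot)                                         # line 9

# Toy example with eight files: the top quarter plus warm files above the warm mean are returned.
files = [("f%d" % i, e) for i, e in enumerate([0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1, 0.05])]
print(determine_hot_files(files))   # {'f0', 'f1', 'f2', 'f3'}
```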
In our design, we do not introduce a moderate group, e.g., a warm file group. Warm files will eventually lead to warm disks. However, if we leave those warm disks in the active state, they are actually treated as hot disks; if they are set in standby mode, they are treated as cold disks. As a result, a warm file group as well as a warm disk group would complicate the control but contribute little to energy efficiency.
3.3 Selecting Hot Disks
Based on the hot files as detailed in Subsection 3.2,
we can further determine the hot disks. The rest of
the disks are then identified as cold disks. The basic
idea is to let hot files reside in hot disks and make the
number of hot disks as small as possible so that we can
maximize the chance to shut off more disks and in turn
reduce energy consumption.
There are two principles when selecting the hot
disks.
1) The selection of hot disks should consider both
energy savings and time performance.
2) The file migration between disks should be reduced as much as possible.
To simplify the analysis, we introduce the notations shown in Table 1. The purpose of the parameters maxRatio_hot and minRatio_free is to control the number of hot files on hot disks, as we have to keep some free space for new insertions or updates.

Table 1. Notations Used in Determining Hot Disks
Notation        Description
disk_size       Capacity of a single disk (bytes)
maxRatio_hot    Maximal ratio of hot data in a disk (%)
minRatio_free   Required minimal ratio of free space in a disk (%)
hot_files       Number of the hot files selected in Subsection 3.2
hot_volume      Volume of all the selected hot files (bytes)
hot_disks       Estimated number of hot disks
α_i             Current hot data ratio in disk i
β_i             Current free space ratio in disk i

After determining the hot files using the algorithm in Subsection 3.2, the estimated number of hot disks, namely hot_disks in Table 1, can be simply calculated by (2):

hot_disks = ⌈hot_volume / (maxRatio_hot × disk_size)⌉.  (2)
After that, we select hot_disks disks among all the disks as the hot disks. As the current ratio of hot files in a disk, i.e., α_i, may exceed maxRatio_hot, we need to be able to swap files among disks to get the ratio of hot files in each hot disk below maxRatio_hot. This will introduce additional migration cost, and we first want to determine the minimal migration cost.
Property 1. The migration cost in selecting the set of hot disks, denoted as MCOST, is determined by α_i and β_i. It can be represented by (3), where C is a constant:

MCOST = C − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.  (3)
The proof of Property 1 is given in Appendix A. Then, based on (3), we give Algorithm 2 for selecting the set of hot disks.
Algorithm 2. Selecting HotDisks(α, β, hot_disks)
Input: α: the hot data ratios of all disks; β: the free space ratios of all disks; hot_disks: the number of hot disks to be returned
Output: D: an array storing the identifiers of the selected hot disks
Preliminary: the disks are numbered from 0 to |α| − 1
1: for i = 0 to |α| − 1 do
2:    H[i].cost ← 2α_i + β_i;
3:    H[i].disk_no ← i;
4: end for
5: H ← sort H descendingly w.r.t. H[i].cost;
6: for i = 0 to hot_disks − 1 do
7:    D[i] ← H[i].disk_no;
8: end for
9: return D;
End Selecting HotDisks
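A compact sketch of (2) and Algorithm 2 together: the number of hot disks is estimated from the hot-file volume, and the disks with the largest 2α_i + β_i are selected, which by Property 1 minimizes MCOST. The example ratios and sizes below are made up for illustration.

```python
import math

def estimate_hot_disk_count(hot_volume, max_ratio_hot, disk_size):
    """Eq. (2): hot_disks = ceil(hot_volume / (maxRatio_hot * disk_size))."""
    return math.ceil(hot_volume / (max_ratio_hot * disk_size))

def select_hot_disks(alpha, beta, hot_disks):
    """Algorithm 2: rank disks by 2*alpha_i + beta_i (descending) and take the first
    hot_disks of them, which minimizes MCOST by Property 1."""
    ranked = sorted(range(len(alpha)), key=lambda i: 2 * alpha[i] + beta[i], reverse=True)
    return ranked[:hot_disks]

# Illustrative values (not from the paper): 4 disks, 25% hot-data cap, capacity in files.
alpha = [0.10, 0.30, 0.05, 0.20]
beta = [0.40, 0.10, 0.30, 0.35]
n = estimate_hot_disk_count(hot_volume=12000, max_ratio_hot=0.25, disk_size=27084)
print(n, select_hot_disks(alpha, beta, n))   # 2 [3, 1]
```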
4 File Reorganization
File access patterns generally change with time.
Therefore, the hotness of a file as well as the hotness
of a disk changes with time in turn. In order to make
the file distribution energy-efficient when file access patterns change, we present an on-demand approach to reorganize files on disks to ensure continued high energy
efficiency.
The basic idea is to reorganize files periodically.
This approach has been used in previous studies such as
PDC[12]. However, as file reorganization introduces additional energy costs because of data migration operations, periodic reorganization may worsen the energy
efficiency of the target storage system. Basically, it is
appropriate to check periodically whether file reorganization is needed, but physical reorganization should not
be performed in every period, because in some cases, the
reorganization costs may exceed the predicted benefits.
In our proposal, we use on-demand file reorganization,
where we periodically check whether file reorganization
is needed and only conduct physical reorganization if
the current file organization is poor in energy efficiency.
Generally, there are two major rules for designing
an efficient approach to file reorganization. First, the
approach should be adaptive to workload change. Second, it should introduce little overhead in terms of energy consumption and runtime.
Based on these rules, and in contrast to the periodic reorganization approach, we consider two factors to determine whether file reorganization is necessary.
1) ratio_file: the ratio of accesses to the current hot disks, defined in Definition 1.
2) ratio_disk: the ratio of hot disks that are frequently accessed, defined in Definition 2.
Definition 1. The hot disk file access ratio, ratio_file, is defined as the total number of accesses to the files on hot disks divided by the total number of file accesses. This quantifier captures the popularity of the files on hot disks.
Definition 2. The hot disk popularity, ratio_disk, is computed as follows: we sort all the disks in descending order of the number of accesses to them. Then, we pick the top hot_disks disks from the sorted list and count how many of the current hot disks appear in this set of top hot_disks disks. This number divided by hot_disks is returned as ratio_disk.
Example 1. Suppose there are in total 9 disks numbered from 0 to 8, and the current hot disks are 1, 4, and 7. Thus hot_disks is 3 in this example. Table 2 shows the number of accesses to the files on each disk.
Table 2. Accesses to Each Disk in Example 1
Disk      0    1    2    3    4    5    6    7    8
Accesses  100  400  600  500  800  200  300  900  700
Then, we can compute ratio_file and ratio_disk as follows:
1) ratio_file = (400 + 800 + 900)/(100 + 400 + 600 + 500 + 800 + 200 + 300 + 900 + 700) = 0.47;
2) ratio_disk = 2/3 = 0.67.
Here, for ratio_disk, we first sort the disks according to the number of accesses to them, which results in the list (7, 4, 8, 2, 3, 1, 6, 5, 0). Then, we check the top hot_disks disks in the list, i.e., {7, 4, 8}, and find that it contains two current hot disks, i.e., {7, 4}. Thus ratio_disk is calculated as |{7, 4}| / hot_disks, i.e., 2/3.
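The following sketch implements Definitions 1 and 2 and reproduces the numbers of Example 1; the final line shows the reorganization check described next, which averages the two ratios against the threshold δ (0.7 in the later experiments).

```python
def ratio_file(accesses, hot_disks):
    """Definition 1: share of all file accesses that go to the current hot disks."""
    return sum(accesses[d] for d in hot_disks) / sum(accesses)

def ratio_disk(accesses, hot_disks):
    """Definition 2: fraction of current hot disks that are among the
    top-|hot_disks| most accessed disks."""
    k = len(hot_disks)
    top = sorted(range(len(accesses)), key=lambda d: accesses[d], reverse=True)[:k]
    return len(set(top) & set(hot_disks)) / k

accesses = [100, 400, 600, 500, 800, 200, 300, 900, 700]   # Example 1, disks 0..8
hot = {1, 4, 7}
rf, rd = ratio_file(accesses, hot), ratio_disk(accesses, hot)
print(round(rf, 2), round(rd, 2))               # 0.47 and 0.67
print("reorganize:", (rf + rd) / 2 < 0.7)       # trigger check with delta = 0.7
```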
We use the average value of ratio_file and ratio_disk and perform file reorganization if this value is below a threshold. In this case, there has been a substantial change in the access pattern, and the current file organization does not support the new workload. Therefore, file reorganization is performed to re-select the hot files as well as the hot disks and further to distribute the newly hot files onto the newly identified hot disks. If the average ratio exceeds the threshold, meaning that the current accesses are still focused on the current hot files and hot disks, we start a slight adjustment procedure. This procedure interchanges the hot files on current cold disks with the cold files on current hot disks, so that the hot files on cold disks have a chance to move to hot disks even when physical file reorganization is not triggered. We present the resulting on-demand file reorganization procedure in Algorithm 3.
Algorithm 3. File Reorganization(F, α, β, tp, δ)
Input: F: the file set; α: the hot data ratios of the disks; β: the free space ratios of the disks; tp: a time interval; δ: the threshold for triggering physical file reorganization
Preliminary: disks are identified by unique numbers from 0 to |α| − 1
/* The check of the file reorganization condition is executed periodically, tp time units after the last check */
1: for each time interval tp do
2:    ratio_file ← computed by Definition 1;
3:    ratio_disk ← computed by Definition 2;
4:    if (ratio_file + ratio_disk) / 2 < δ then
5:        hot_files ← Determining HotFiles(F);
6:        hot_disks ← ⌈hot_volume / (maxRatio_hot × disk_size)⌉;
7:        D ← Selecting HotDisks(α, β, hot_disks);
8:        Swap files between the hot disks D and the cold disks;
9:    else
10:       Slight adjustment: interchange hot files on cold disks with cold files on hot disks;
11:   end if;
12: end for;
End File Reorganization

5 Performance Evaluation

As our approach aims at reducing the energy consumption of a disk storage system while guaranteeing time performance, the evaluation focuses on two metrics: energy saving and runtime. We first describe the experimental setup and workloads in Subsections 5.1 and 5.2, and then we present the evaluation results w.r.t. various metrics in Subsections 5.3 to 5.8.
5.1 Experimental Settings
The experiments are conducted in a simulation environment developed using SimPy①. The environment consists of a workload generator, a file dispatcher, and a group of virtual hard disks. Each virtual hard disk is designed as a container of files. The parameters of the hard disks are based on a modern hard disk, a Seagate Barracuda 7200.10②. Table 3 shows the detailed parameters of the disk.
Table 3. Power Parameters Used in the Experiments
Description       Value
Idle mode         9.30 W
Operating mode    13.00 W
Standby mode      0.80 W
Spinup power      24.00 W
Stopping power    9.30 W
Seek power        12.60 W
Spinup time       15.00 s
Stopping time     10.00 s
I/O latency       4.16 ms
Fig.3 illustrates the power model of the disk used in the experiments.

Fig.3. Illustration of the disk power model. (Figure: state diagram with operating mode at 13 W, idle mode at 9.3 W, standby mode at 0.8 W, seek at 12.6 W with 4.16 ms latency, spinup at 24 W for 15 s, and stop at 9.3 W for 10 s.)

For the computation of energy consumption when running an algorithm g on trace s, we define energy(s, g) by (4):

energy(s, g) = energy_run + energy_spin + energy_stop + energy_idle + energy_standby.  (4)

The elements in (4) are defined as follows:

energy_run = Σ_{i=1}^{|s|} ((seek_power + operating_power) × IO_latency),
energy_spin = Σ (spinup_power × spinup_time),
energy_stop = Σ (stopping_power × stopping_time),
energy_idle = Σ (idle_power × idle_time),
energy_standby = Σ (standby_power × standby_time).
Here, idle time and standby time are dynamically
aggregated for each disk in the idle state and the
standby state. Other parameters are listed in Table 3.
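To make the accounting in (4) concrete, here is a sketch that evaluates the energy of a single disk from the Table 3 parameters; the request count and per-state times are assumed inputs that the simulator would accumulate.

```python
# Sketch of the energy accounting in (4) for one disk, using Table 3 values.
SEEK_W, OPERATING_W, IDLE_W, STANDBY_W = 12.6, 13.0, 9.3, 0.8
SPINUP_W, SPINUP_S, STOP_W, STOP_S = 24.0, 15.0, 9.3, 10.0
IO_LATENCY_S = 4.16e-3

def disk_energy(num_requests, num_spinups, num_stops, idle_s, standby_s):
    """Return the energy (in joules) of one disk; the arguments are assumed
    counters and per-state times accumulated by the simulator."""
    e_run = num_requests * (SEEK_W + OPERATING_W) * IO_LATENCY_S
    e_spin = num_spinups * SPINUP_W * SPINUP_S
    e_stop = num_stops * STOP_W * STOP_S
    e_idle = IDLE_W * idle_s
    e_standby = STANDBY_W * standby_s
    return e_run + e_spin + e_stop + e_idle + e_standby

print(disk_energy(num_requests=10000, num_spinups=2, num_stops=2,
                  idle_s=1800.0, standby_s=5400.0))
```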
We compare the proposed approach with three competitors, i.e., DPM[14], PDC[12], and E-HASH[11]. The E-HASH method uses a fixed set of disks as the hot disks. In our experiments, the ratio of hot disks in E-HASH is set to two different values, namely 25% and 50%, in order to make a comprehensive comparison between E-HASH and our approach. We denote the two settings as E-HASH(25%) and E-HASH(50%) in the results.
①http://simpy.readthedocs.org, May 2015.
②http://www.seagate.com/support/disc/manuals/sata/100402371a.pdf, May 2015.
In the experiments, we set the parameter tp to 7 200
seconds. This time period is also used in E-HASH, i.e.,
both E-HASH(25%) and E-HASH(50%) perform file reorganization every 7 200 seconds. The parameter δ is
set to 0.7. These parameters are likely to impact the
overall performance, and we will discuss their effects
later in Subsection 5.7.
5.2 Workloads
We use two types of workloads. The first type consists of six synthetic traces, each composed of 20 million
read/write requests over 500 thousand files randomly
distributed among 24 disks. This disk number is the
same as that in one of the real traces. We suppose
that the disks have the same volume and each file has
the same size. Therefore, we use the number of files to
represent the volume of a disk, which is calculated by (5):

disk_size = ⌈(file_number / disk_number) × (1 + free_space_ratio)⌉.  (5)
In the synthetic traces, the total number of files is 500 000, and the size of each file is 48 KB. The free_space_ratio is set to 0.3. Thus, each disk is able to contain a maximum of 27 084 files.
As modern disks usually contain internal caches, we implement a buffering scheme for each simulated disk. The buffer for a disk can hold 1 000 files (about 3.7% of the total files in a disk), and the buffers of all disks have the same size. In our experiments, we use the LRU scheme as the buffer replacement algorithm.
For generating each synthetic trace, we use the concept of file locality to indicate the percentage of files being accessed. For example, a file locality of 25% means that the I/O requests concern 25% of all files. To reflect the dynamics of workloads, we use four different file locality values, namely 25%, 50%, 75%, and 100%, when generating the traces.
Next, we use the Zipf distribution to generate the synthetic traces so that they reflect skewed file access patterns. Access patterns with a Zipf distribution have been characterized and used in many previous studies[16]. For instance, Internet data access patterns can usually be characterized as satisfying a Zipf distribution with the distribution parameter α = 1.0[16]. In our experiments, we use two different Zipf parameters to generate the synthetic traces. In addition to the case of α = 1.0, we also consider α = 1.4.
Another factor that needs to be considered when generating synthetic traces is disk locality. The disk locality refers to the probability of a disk receiving I/O requests. We use the exponential distribution to simulate disk locality; exp(u) denotes the exponential distribution with mean u. A previous disk storage system named Hibernator[17] uses exp(6) and exp(20) to characterize disk locality. We add the case of exp(50). Generally, a larger value of the parameter u indicates that the accesses are more load-balanced.
We show the synthetic Zipf traces in Table 4. Each trace consists of 20 million requests in total, with 5 million requests for each file locality value (25%, 50%, 75%, and 100%). In order to obtain changing file access patterns, we make each block of 5 million requests focus on different files.
Table 4. Synthetic Zipf Traces
Name  Zipf Parameter  Disk Locality
S11   α = 1.0         exp(6)
S12   α = 1.0         exp(20)
S13   α = 1.0         exp(50)
S21   α = 1.4         exp(6)
S22   α = 1.4         exp(20)
S23   α = 1.4         exp(50)
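As an illustration of the described trace generator (our sketch, not the authors' code), the snippet below places files on disks following an exponential disk-locality distribution and draws requests from a Zipf popularity law restricted to a file-locality subset.

```python
import random

def generate_trace(num_requests, num_files, num_disks,
                   zipf_a=1.0, disk_mean=20.0, file_locality=0.25, seed=42):
    """Sketch of a Table-4 style trace: files are placed on disks following an
    exponential disk-locality distribution exp(disk_mean), and requests pick files
    from a Zipf(zipf_a) popularity law restricted to a file_locality subset."""
    rng = random.Random(seed)
    placement = [min(int(rng.expovariate(1.0 / disk_mean)), num_disks - 1)
                 for _ in range(num_files)]
    active = max(1, int(num_files * file_locality))        # files eligible for access
    weights = [1.0 / (rank ** zipf_a) for rank in range(1, active + 1)]
    for _ in range(num_requests):
        f = rng.choices(range(active), weights=weights, k=1)[0]
        yield f, placement[f]                               # (file_id, disk_id) request

for req in generate_trace(5, num_files=1000, num_disks=24):
    print(req)
```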
In addition to the synthetic traces, we use two real traces: OLTP and WEB③.
OLTP. This is an I/O trace from a real OLTP application running at two large financial institutions.
It contains 4 099 352 write requests interleaved with
1 235 632 read requests to 24 disks. The mean request
rate is 123.5 per second.
WEB. The second trace is recorded on a storage
system that supports a Web search engine. There are
4 579 809 read requests invoked within 4.5 hours, and
99% of the requests are to three disks. The mean request rate is 297.487 per second.
The original OLTP and WEB traces only record
block-level requests, and thus we map those requests
into file-level requests by regarding each block number
as a file number.
③http://traces.cs.umass.edu/index.php/Storage/Storage, May 2015.
5.3 Adaptivity in Selecting Hot Disks
We first consider the adaptivity of our approach in
selecting the set of hot disks. As discussed before, our
method is able to dynamically determine the hot disks
according to the changing workload. Fig.4 presents the
different numbers of hot disks selected by our method
and the three previous approaches.
Fig.4. Hot disks varying with time. (Figure: the number of hot disks at four periods t1-t4 for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)
Here, we use the trace S12 as the workload, and the
decay factor in (1) is set to 0.05. When scanning the
trace, we manually select four time instants to calculate
the number of hot disks. In detail, the first time instant
t1 is the time when the first 2.5 million requests have
been processed and the second time instant t2 is the
time when the first 7.5 million requests have been finished. Finally, t3 is the time when 12.5 million requests
have been processed, and t4 is the time after executing
17.5 million requests.
As DPM uses all 24 disks and is not able to select hot disks, the number of hot disks is constant at 24. E-HASH uses a static set of disks as hot disks, i.e., in E-HASH(25%) there is a stable set of six hot disks, while in E-HASH(50%) the set consists of 12 disks. However, in both implementations of E-HASH, the number of hot disks cannot be adjusted in accordance with workload change. In contrast, our method adapts to the changing workload. This is mainly because more and more files are being accessed over time.
The dynamicity of the number of hot disks is critical to the energy proportionality of disk storage systems. Since different applications usually have different data access patterns, and even the same application may have different access patterns at different time points, it is more appropriate to dynamically change the set of hot disks according to the change of access patterns. Thus we direct most requests to hot disks and let the cold disks power off.

5.4 Effectiveness of File Reorganization

As discussed in Section 4, our method is able to dynamically reorganize files based on the computation of ratio_file and ratio_disk. These two quantifiers enable us to conduct on-demand file reorganization based on the changing file access patterns. Compared with the periodic file reorganization in E-HASH, this technique can reduce the number of file reorganizations and further save energy and improve time performance.
We conduct an experiment on trace S12 to measure the effectiveness of file reorganization. The results are shown in Fig.5, where we show the I/O requests to each disk generated by the different methods. The disks are ordered ascendingly by the number of I/O requests.
Fig.5. Request ratios to each disk. (Figure: request ratio of each of the 24 disks, ordered ascendingly w.r.t. request ratio, for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)
Because DPM does not involve any file reorganization, there is no change in the request ratios of the 24 disks. As the hotness of files changes with time, DPM is not suitable for achieving energy proportionality. Next, E-HASH(50%) and E-HASH(25%) generally have higher access ratios on cold disks than our method. This is mainly because of the limited number of hot disks in E-HASH(25%) and because the sets of hot disks are static. The PDC method allocates most files to one disk (the No.24 disk in Fig.5). This causes high I/O overheads on the hottest disk. Besides, this method has to perform many file migrations in order to make the hottest files always reside on the hottest disk. The experimental results in Subsection 5.6 will demonstrate this claim. In contrast, our method can
dynamically select an appropriate number of hot disks
according to workload change; therefore, we can see in
Fig.5 that more requests are concentrated on eight hot
disks (No.17∼No.24) by our method. This will finally
lead to better performance in energy saving and run
time, as we will discuss in Subsections 5.5 and 5.6.
5.5 Energy Savings

In this experiment, we measure the percentage of energy savings of each method. The baseline method, denoted as NEC (No Energy Saving), keeps all the 24 disks in the active mode all the time, i.e., they are all powered on and no energy-aware techniques are applied. We then compare the energy savings of each method using (6), where energy_saving(s, g) refers to the energy savings rate of algorithm g under trace s, and energy(s, g) is defined by (4).

energy_saving(s, g) = 1 − energy(s, g) / energy(s, NEC).  (6)
Fig.6 presents the energy savings rate of the four
methods as measured for the six traces. In most cases,
our method has 50% or higher energy savings. The
average energy savings rate of our method is 52.79%,
while E-HASH(50%) and E-HASH(25%) reach energy
savings rates of 31.58% and 23.40% on average, respectively. The PDC method performs better than
our method under traces S22 and S23. This is simply because these two traces have a very high access
locality. Thus, the hottest files can be stored on a single hottest disk. However, PDC cannot adapt to the workload change, which causes worse energy-saving results under the other synthetic traces. Notably, PDC has
an average energy saving rate of 50.44% under all the
synthetic traces, which is worse than our method. As
our method is designed for the better trade-off between
energy saving and time performance, we can see in Subsection 5.6 that the time performance of our method is
significantly superior to that of the PDC method. The
average energy saving rate of DPM is only 3.09% because of its inability to identify hot and cold disks. Further, as studied in [18], DPM is only helpful for energy
savings under very low request rates (less than 0.029
requests per second). However, in server-side disk storage systems, this cannot always be expected. And in
our next experiment with real traces, we will see that
the performance of DPM is even worse because these
traces involve high request rates.
Fig.6. Energy savings for the synthetic traces. (Figure: energy savings rate on traces S11-S23 for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)
Fig.7 presents the energy savings of the four methods for the two real traces. As both traces have high request rates, DPM offers no energy savings, because in such scenarios, no disks have a chance to be turned off; the energy consumption is the same as that of the baseline. Our method reaches an energy savings rate of 43.62% on average. Compared with the average rates of the other methods, which are 28.15% for PDC, 31.99% for E-HASH(50%), and 26.61% for E-HASH(25%), our method is more effective.
Fig.7. Energy savings for the real traces. (Figure: energy savings rate on OLTP and WEB for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)
5.6 Time Performance
We first measure the number of file reorganizations performed by our method. The results under the six synthetic traces are shown in Fig.8. As each synthetic trace contains four types of access patterns, each of which is generated with a different file locality value (25%, 50%, 75%, or 100%) and focuses on different files, there are actually four changes of file access patterns in each trace. Fig.8 shows that our method is able to detect these changes correctly.
Fig.8. Count of file reorganization. (Figure: number of file reorganizations performed by our method on traces S11-S23.)
Next, we conduct an experiment on trace S12 to examine the number of disk switches for each method. As power switches introduce additional energy and time costs, the number of disk switches is critical to the overall performance. Fig.9 shows the results for our method and the four comparative methods. Here, the DPM method has no disk switches, because it always keeps all the disks powered on. But this is obviously not energy-efficient. Recall that DPM only turns off a disk when the disk has been idle for a considerable time interval. The PDC method shows an extremely high number of disk switches, due to its single-disk-based and periodic file reorganization scheme. Our method incurs the smallest number of disk switches, because of its group-based disk organization and on-demand file reorganization policy.
Fig.9. Disk switch numbers. (Figure: number of disk switches (×10³) on trace S12 for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)
The file migration numbers under trace S12 are shown in Fig.10, which indicates that our method incurs very few file migrations compared with E-HASH and PDC. This is mainly owing to the group-based disk organization and on-demand file reorganization policy of our method. As there are no file migration operations in DPM, we do not present it in Fig.10. Notably, PDC has the largest number of file migrations, due to its individual-disk-based migration design.
Fig.10. File migration numbers. (Figure: number of file migrations on trace S12 for PDC, E-HASH(50%), E-HASH(25%), and our method.)
We also consider the response time of the proposed method. The average response time per I/O request on the six synthetic traces is shown in Fig.11. It shows that the average I/O response time of E-HASH(25%) is much higher than that of the other methods in most cases. This is because the small set of hot disks in E-HASH(25%) causes some hot files to be placed on cold disks, which leads to many frequent spin-up operations on the cold disks. Although E-HASH(50%) has a larger set of hot disks, its low efficiency in file reorganization lowers its time performance. The poor time performance of PDC is due to the frequent disk switches and file migrations shown in Figs.9 and 10. The low time performance of DPM is due to the fact that DPM does not use any file reorganization techniques. Therefore, when some disks are turned off by DPM, they are very likely to spin up again, which increases the average I/O time.
We can also infer from Fig.11 that our method has relatively stable time performance under all the traces, owing to its dynamic algorithms for hot disk selection and file reorganization. In contrast, all the other methods show very different time performance when measured under different traces. In this sense, our method is adaptive to workload changes.
Fig.12 presents the average I/O time for the real traces. Since both traces have very high request rates, no disks are turned off in DPM; in other words, all the disks in DPM are active all the time. As a consequence, DPM gets a low I/O time in this experiment. However, as shown in Fig.7, the energy saving of DPM is almost zero, which indicates that DPM is not an energy-efficient approach. The average I/O time of PDC is the worst for both traces, mainly due to its frequent disk switches and file migrations. In both traces, our method achieves response times comparable to those of E-HASH(50%) and DPM.
Fig.11. Average I/O time for the synthetic traces. (Figure: average I/O time (ms) on traces S11-S23 for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)

Fig.12. Average I/O time for the real traces. (Figure: average I/O time (ms) on OLTP and WEB for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)

Combined with the energy saving results in Figs.6 and 7, we can see that for the synthetic traces, our method saves about 20% more energy while maintaining even better time performance than E-HASH(50%). For the real traces, our method consumes 3% more time but reduces the energy cost by 12% more on average, compared with E-HASH(50%).

5.7 Influence of Parameters tp and δ

The parameter tp refers to the time interval at which we check whether a file reorganization operation should be started. Theoretically, if we set tp to a very large number, our method becomes very similar to the traditional approaches that do not consider energy issues, as the file reorganization operation will no longer be executed (see Algorithm 3). In contrast, using a small value for tp may lead to frequent computation of ratio_file and ratio_disk, which is not only time-consuming but also incurs additional energy costs. In Figs.13 and 14, we show the energy savings rate and the average I/O time when different tp values are used on trace S12.

Fig.13. Influence of tp on energy savings. (Figure: energy savings rate for tp from 3 600 s to 25 200 s.)

Fig.14. Influence of tp on average I/O time. (Figure: average I/O time (ms) for tp from 3 600 s to 25 200 s.)
The results show that both metrics tend to decrease with the increase of tp. The best performance is reached when tp is set to 7 200 s. They also show that when the parameter exceeds 7 200 s, the average I/O time remains nearly stable. In those situations, a long time interval is used for checking the condition of file reorganization. Thus cold disks are very likely to be turned on, because the system runs for a long time and more files may be accessed compared with the situation where a short time interval is used. Further, the
cold disks that are turned on during the time interval
may also be in the active mode for a long time before
the next checking. Therefore, those cold disks will have
few chances to be turned off even if a file reorganization
is performed. When a lot of cold disks are active, our
method has little influence on the time performance,
because very few switches of power state of disks are
introduced. As a result, we use tp = 7 200 s in other
experiments. But generally, this parameter should be
first tuned when we apply the proposed method in different disk storage systems.
The parameter δ is the threshold controlling when
to conduct physical file reorganization. This parameter influences the overall performance, as the physical
file reorganization operations are both time-consuming
and incur additional energy cost. On the other hand,
appropriate file reorganizations can adjust the file distribution to fit the workload, so that the coming I/O
requests are handled more energy efficiently.
In this experiment, we also use trace S12. Parameters other than δ are the same as those in the other experiments. As this parameter basically reflects the skewness of access patterns, we vary δ from 0.5 to 1.0. The results regarding the energy savings rate and the average I/O time are shown in Figs.15 and 16, respectively. We find that δ = 0.7 is the most appropriate setting considering both energy and time performance. It can be observed in Fig.16 that the average I/O time rises with the increase of the parameter, because more computations and file reorganizations, including slight adjustments, are introduced. This in turn worsens the energy savings rate, which appears in Fig.15 as a declining curve when δ exceeds 0.7. For this reason, in our previous experiments, we set δ to 0.7.
Fig.15. Influence of δ on energy savings. (Figure: energy savings rate for δ from 0.5 to 1.0.)

Fig.16. Influence of δ on average I/O time. (Figure: average I/O time (ms) for δ from 0.5 to 1.0.)
5.8 Influence of I/O Request Type
In this experiment, we measure whether our method is influenced by the I/O request type. Generally, file systems use buffers to cache files when they are accessed. Therefore, if a file from a cold disk is cached by the file system before the disk is powered off, future read requests to this file can still be served until the file is selected as the victim by the buffer replacement scheme of the file system. However, we have to turn on a powered-off cold disk if write requests to the files on this disk arrive. To this extent, write-intensive workloads will probably worsen the energy savings of our proposal. On the other hand, our method is able to keep the hottest files on active disks, and thus we expect that write requests to cold disks will rarely happen, so that we can still keep a high rate of energy savings even for write-intensive workloads.
We also use trace S12 but slightly modify it to generate read-intensive and write-intensive traces by using
different read ratios among the total requests. If the
read ratio in the trace is set to 0, it means the trace
is a write-only trace. Thus, we can use the read ratio
to produce traces with different read/write ratios. As a
result, we use six read ratios in the experiment, namely
0, 0.2, 0.4, 0.6, 0.8, and 1.0. When the read ratio is 1.0,
the trace is actually a read-only one.
The results are shown in Fig.17, which indicates that the change of I/O request type has little influence on the performance of our method. The reason is that our method is able to keep the hottest files stored on hot disks. To this extent, our method is resistant to changes of the read/write ratio and can suit different types of I/O requests. In contrast, PDC and E-HASH(25%) are not able to keep stable energy saving rates when the read ratio varies. Although E-HASH(50%) and DPM show relatively stable performance in energy saving, their low energy saving rates indicate that they are not practical for real applications.
Fig.17. Energy savings of our method under varying read ratios. (Figure: energy savings rate for read ratios 0 to 1.0 for DPM, PDC, E-HASH(50%), E-HASH(25%), and our method.)
6 Conclusions
Energy proportionality is a key metric for future
data centers for both energy conservation and performance guarantees. In this work, we presented a new design for energy-proportional disk storage systems. This
design employs a hotness-aware and group-based technique to organize files and disks. We further presented
effective algorithms to dynamically determine hot files
and hot disks according to workload changes. Finally,
a new algorithm for file reorganization was presented
for conducting the on-demand adjustment of file placement. We conducted comprehensive experiments on
both synthetic and real traces to quantify the performance of our proposal. The results show that our proposal is capable of saving more than 50% of the energy costs on average when compared with traditional
energy-unaware approaches. Furthermore, it can save
12%∼20% of the energy consumption on average when
compared with the state-of-the-art E-HASH method.
While offering low energy costs, our proposal is also
able to keep time performance comparable with previous approaches.
Future work is needed in the area of hybrid storage systems involving SSDs and magnetic hard disks. In such systems, we have to distinguish the I/O request type when determining hot files and disks, as SSDs have asymmetric read/write speeds. Another direction of future work is to consider energy-proportional storage for MapReduce workloads[19] and the Hadoop distributed file system (HDFS)④. HDFS provides file storage using a distributed master-slave architecture and uses data blocks (64 MB by default) rather than files to organize data. With the wide application of MapReduce and HDFS in the big data era, it is helpful to devise energy-proportional storage schemes to improve the energy efficiency of the MapReduce framework and HDFS.

Acknowledgements    We would like to thank the anonymous reviewers and editors for their valuable suggestions and comments to improve the quality of the paper.

④http://hadoop.apache.org/, May 2015.

References
[1] Joukov N, Sipek J. GreenFS: Making enterprise computers
greener by protecting them better. In Proc. EuroSys, April
2008, pp.69-80.
[2] Weddle C, Oldham M, Qian J, Wang A A, Reiher P, Kuenning G H. PARAID: A gear-shifting power-aware RAID.
ACM Transaction on Storage, 2007, 3(3): Article No. 13.
[3] Jin Y, Xing B, Jin P. Towards a benchmark platform for
measuring the energy consumption of database systems. Advanced Science and Technology Letters, 2013, 29: 385-389.
[4] Bostoen T, Mullender S J, Berbers Y. Power-reduction techniques for data-center storage systems. ACM Computing
Surveys, 2013, 45(3): 33:1-33:38.
[5] Kim J, Rotem D. Energy proportionality for disk storage
using replication. In Proc. the 14th EDBT, March 2011,
pp.81-92.
[6] Kim J, Rotem D. FREP: Energy proportionality for disk
storage using replication. Journal of Parallel and Distributed Computing, 2012, 72(8): 960-974.
[7] Verma A, Koller R, Useche L, Rangaswami R. SRCMap:
Energy proportional storage using dynamic consolidation.
In Proc. the 8th FAST, February 2010, pp.267-280.
[8] Colarelli D, Grunwald D. Massive arrays of idle disks for
storage archives. In Proc. the ACM/IEEE SC, November
2002, pp.1-11.
[9] Useche L, Guerra J, Bhadkamkar M, Alarcon M, Rangaswami R. EXCES: External caching in energy saving storage systems. In Proc. the 14th HPCA, Feb. 2008, pp.89-100.
[10] Thereska E, Donnelly A, Narayanan D. Sierra: Practical
power-proportionality for data center storage. In Proc. the
6th EuroSys, April 2011, pp.169-182.
[11] Hui J, Ge X, Huang X, Liu Y, Ran Q. E-HASH: An energyefficient hybrid storage system composed of one SSD and
multiple HDDs. In Proc. the 3rd ICSI, June 2012, pp.527-534.
[12] Pinheiro E, Bianchini R. Energy conservation techniques
for disk array-based servers. In Proc. the 18th ICS, June
26-July 1, 2004, pp.68-78.
[13] Otoo E J, Rotem D, Tsao S C. Dynamic data reorganization for energy savings in disk storage systems. In Proc. the
22nd SSDBM, June 30-July 2, 2010, pp.322-341.
[14] Irani S, Singh G, Shukla S K, Gupta R K. An overview of the competitive and adversarial approaches to designing dynamic power management strategies. IEEE Transactions on Very Large Scale Integration Systems, 2005, 13(12): 1349-1361.
[15] Irani S, Gupta R K, Shukla S K. Competitive analysis of dynamic power management strategies for systems with multiple power savings states. In Proc. DATE, March 2002,
pp.117-123.
[16] Padmanabhan V N, Qiu L. The content and access dynamics of a busy web site: Findings and implications. In Proc.
SIGCOMM, August 28-September 1, 2000, pp.111-123.
[17] Zhu Q, Chen Z, Tan L, Zhou Y, Keeton K, Wilkes J. Hibernator: Helping disk arrays sleep through the winter. In
Proc. the 20th SOSP, October 2005, pp.177-190.
[18] Otoo E J, Rotem D, Tsao S C. Energy smart management of scientific data. In Proc. the 21st SSDBM, June 2009, pp.92-109.
[19] Chen Y, Ganapathi A S, Fox A, Katz R H, Patterson D A. Statistical workloads for energy efficient MapReduce. Technical Report, No. UCB/EECS-2010-6, EECS Department, University of California at Berkeley, January 2010.
Christian S. Jensen is Obel Professor of computer science at Aalborg
University, Denmark. He has recently
been at Aarhus University for three
years and Google Inc. for one year. His
research concerns data management and
data-intensive systems, and its focus is
on temporal and spatio-temporal data
management. Christian is an ACM fellow and an IEEE
fellow, and he is a member of the Academia Europaea, the
Royal Danish Academy of Sciences and Letters, and the
Danish Academy of Technical Sciences. He has received
several national and international awards for his research.
He is the editor-in-chief of ACM Transactions on Database
Systems (TODS) and was an editor-in-chief of The VLDB
Journal from 2008 to 2014.
Yong Jin is currently a master student in the School of Computer Science and Technology at University of Science and Technology of China (USTC), Hefei. His research interests include databases on new hardware and energy-proportional computing.
Pei-Quan Jin received his Ph.D.
degree in computer science from University of Science and Technology of China
(USTC), Hefei, in 2003. After that, he
spent two years as a postdoctoral researcher in the Department of Electronic
Engineering & Information Science,
USTC. He is now an associate professor
in the School of Computer Science and Technology, USTC.
He was a visiting scientist of University of Kaiserslautern
(2009) and Aalborg University (2014).
His research
interests focus on spatio-temporal databases, flash-based
databases, and Web information retrieval. He served as the
PC co-chair of HardBD 2014, HardBD 2013, and FlashDB
2011, the demo co-chair of WAIM 2013 and NDBC 2012,
and a PC member of many international conferences such
as DASFAA, WAIM, DEXA, and WISE. He is a senior
member of CCF and a member of both ACM and IEEE.
Xike Xie is currently an assistant
professor in the Department of Computer
Science, Aalborg University, Denmark.
He received his Ph.D. degree in computer science from the University of
Hong Kong in 2012, and B.S. and M.S.
degrees from Xi’an Jiaotong University,
Xi’an, in 2003 and 2006, respectively. His research interests include data uncertainty, spatio-temporal databases,
and mobile computing. He is a member of ACM and IEEE.
Li-Hua Yue is a full professor in
the School of Computer Science and
Technology at University of Science
and Technology of China (USTC),
Hefei. She received her B.S. and M.S.
degrees in computer science both from
USTC. Her research interests include
flash-based databases, spatio-temporal
databases, information retrieval, and image processing.
She is a senior member of CCF and a member of ACM.
Appendix A Proof of Property 1
1) Suppose we have selected hot disks disks as hot.
Then we start the migration process.
2) The current free space of all the hot disks can be represented as follows, where we suppose all disks have the same size:

current_free_space = Σ_{i=1}^{hot_disks} β_i × disk_size.

3) The required free space on the hot disks is determined by minRatio_free, so we have:

required_free_space = Σ_{i=1}^{hot_disks} minRatio_free × disk_size.
4) Case 1. If required_free_space exceeds current_free_space, we have to move some files from the hot disks so as to increase the free space on the hot disks. After that, if the current hot data volume on the hot disks, namely Σ_{i=1}^{hot_disks} α_i × disk_size, is less than the total volume of all the selected hot files, i.e., hot_volume, we can swap between hot and cold disks to move some cold files on hot disks to cold disks and vice versa. Fig.A1 shows an example of the migration process for this case.
Fig.A1. Example of data migration for case 1. (Figure: a file is moved from a hot disk to a cold disk to enlarge the free space; then hot files on cold disks are swapped with cold files on hot disks, e.g., file 2 with file 10 and file 5 with file 11.)

In this case, the migration cost MCOST can be calculated as follows:

MCOST = COST(enlarging_free_space) + COST(swap),
COST(enlarging_free_space) = Σ_{i=1}^{hot_disks} minRatio_free × disk_size − Σ_{i=1}^{hot_disks} β_i × disk_size,
COST(swap) = 2 × (hot_volume − Σ_{i=1}^{hot_disks} α_i × disk_size),
MCOST = 2 × hot_volume + minRatio_free × disk_size × hot_disks − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.

5) Case 2. If required_free_space does not exceed current_free_space, we first move some hot files on cold disks to hot disks. After that, we can conduct swapping between cold and hot disks, just as we did in the previous step. In this case, the migration cost can be formulated as follows:

MCOST = COST(filling_free_space) + COST(swap),
COST(filling_free_space) = Σ_{i=1}^{hot_disks} β_i × disk_size − Σ_{i=1}^{hot_disks} minRatio_free × disk_size.

The hot data moved from cold disks to hot disks should be deducted from the total hot data volume before swapping. Then, we can get the following migration cost:

COST(swap) = 2 × (hot_volume − Σ_{i=1}^{hot_disks} α_i × disk_size − COST(filling_free_space)),
MCOST = 2 × hot_volume + minRatio_free × disk_size × hot_disks − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.

The migration cost for case 2 is exactly the same as that for case 1. For the purpose of illustration, we show an example in Fig.A2.

Fig.A2. Example of data migration for case 2. (Figure: a file is moved from a cold disk to a hot disk to fill the free space; then file 2 is swapped with file 7 and file 5 with file 11 between hot and cold disks.)

Since hot_volume, minRatio_free, disk_size, and hot_disks are all constants for a given scenario, we can simply represent 2 × hot_volume + minRatio_free × disk_size × hot_disks as a constant C. Thus, we have the following result:

MCOST = C − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.

The above equation shows that the migration cost is determined by the parameters α_i and β_i. Moreover, the migration cost decreases with the increase of 2α_i + β_i.
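As a quick numeric sanity check of the derivation above (our addition, with arbitrary example values), the sketch below evaluates the case-1 and case-2 decompositions and the closed form C − Σ(2α_i + β_i) × disk_size; the three values agree regardless of which case actually applies.

```python
# Numeric sanity check of the MCOST derivation with arbitrary example values.
alpha = [0.10, 0.30, 0.05]          # current hot-data ratios of the selected hot disks
beta = [0.40, 0.10, 0.30]           # current free-space ratios of the selected hot disks
disk_size = 27084.0
min_ratio_free = 0.20
hot_volume = 15000.0
hot_disks = len(alpha)

cost_enlarge = sum(min_ratio_free * disk_size - b * disk_size for b in beta)   # case 1
swap1 = 2 * (hot_volume - sum(a * disk_size for a in alpha))
cost_fill = -cost_enlarge                                                      # case 2
swap2 = 2 * (hot_volume - sum(a * disk_size for a in alpha) - cost_fill)

closed = (2 * hot_volume + min_ratio_free * disk_size * hot_disks
          - sum((2 * a + b) * disk_size for a, b in zip(alpha, beta)))

print(round(cost_enlarge + swap1, 3), round(cost_fill + swap2, 3), round(closed, 3))
```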