Jin PQ, Xie X, Jensen CS et al. HAG: An energy-proportional data storage scheme for disk array systems. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 30(4): 679–695 July 2015. DOI 10.1007/s11390-015-1554-x

HAG: An Energy-Proportional Data Storage Scheme for Disk Array Systems

Pei-Quan Jin 1,2, Senior Member, CCF, Member, ACM, IEEE, Xike Xie 3,*, Member, ACM, IEEE, Christian S. Jensen 3, Fellow, ACM, IEEE, Yong Jin 1, and Li-Hua Yue 1,2, Senior Member, CCF, Member, ACM

1 School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
2 Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei 230027, China
3 Department of Computer Science, Aalborg University, Aalborg, DK-9220, Denmark

E-mail: [email protected]; {xkxie, csj}@cs.aau.dk; [email protected]; [email protected]

Received January 29, 2015; revised March 25, 2015.

Abstract   Energy consumption is a critical issue for data storage systems, especially for modern data centers. A recent survey has shown that power costs amount to about 50% of the total cost of ownership in a typical data center, with about 27% of the system power being consumed by storage systems. This paper provides an effective solution for reducing the energy consumed by disk storage systems. Differing from previous approaches, we adopt two new designs. 1) We introduce a hotness-aware and group-based system model (HAG) to organize the disks, in which all disks are partitioned into a hot group and a cold group. Files are migrated only between the two groups and never within a single group, which reduces the total cost of file migration. 2) We use an on-demand approach to reorganize files among the disks that is based on workload change as well as the change of data hotness. We conduct trace-driven experiments involving two real and nine synthetic traces and make detailed comparisons between our method and competitor methods according to different metrics. The results show that our method can dynamically select hot files and disks when the workload changes and that it reduces energy consumption for all the traces. Furthermore, its time performance is comparable to that of the compared algorithms. In general, our method exhibits the best energy efficiency in all experiments, and it maintains an improved trade-off between performance and energy consumption.

Keywords   energy-aware system, file organization, storage management

1 Introduction

Reducing the energy cost of storage systems is a critical issue for disk storage systems such as those in data centers. A recent study has shown that energy consumption accounts for 50% of the total cost of ownership of data centers[1], and that storage systems account for 27% of system power[2-3]. To enable cost-effective data centers, there is a strong need for effective schemes that reduce the energy consumption of storage systems[4].

In order to reduce energy consumption, many researchers have proposed energy-proportional approaches for file storage on disk array systems[5-7], which aim to keep frequently accessed hot files on specific disks and to let the disks hosting cold files power off temporarily. However, current energy-proportional storage solutions may introduce high file-migration and additional energy costs. In addition, they cannot be self-tuned according to workload changes.
In this paper, we propose an effective approach to realize energy-proportional data storage on disk array systems. Our solution aims to reduce data migration costs when distributing and reorganizing files among disks, and it includes a new on-demand algorithm for file reorganization that can adapt to workload changes. The major contributions of this paper are as follows.

1) We present a hotness-aware and group-based system model for disk storage systems, along with effective algorithms for dynamically determining hot files and grouping disks according to the access frequencies of their files. Data migration is only allowed between the hot disk group and the cold disk group. This group-based model reduces migration costs because migration within a single group never happens in our model. (Section 3)

2) We present a new on-demand approach to reorganizing files among disks, in which the reorganization process is triggered on the basis of changes in file hotness and of predicted migration costs. This policy reduces the number of reorganizations as well as unnecessary adjustments of the file organization. (Section 4)

3) We conduct experiments with both real and synthetic traces to compare the proposed approach with previous solutions. The experimental results show that our approach considerably outperforms the competitors in terms of energy savings while maintaining comparable time performance. (Section 5)

2 Related Work

A number of energy conservation techniques have been proposed for disk storage systems[4], most of which are based on the assumption of skewed file access patterns. Therefore, the main approach is to keep hot files on specific disks and to let the disks hosting cold files power off temporarily. Such approaches are usually called "energy proportional"[5-7].

To realize energy proportionality for disk storage systems, several techniques have been proposed. The first type of solution can be called copy-based techniques[2,5-9], because they copy popular files onto additional disks or caches and let the primary data drives power off. Weddle et al.[2] exploited the unused storage space for data replication such that one or more disks of an RAID can be put in standby mode to save power. They called this replication technique power-aware RAID (PARAID). Inspired by PARAID, Kim and Rotem[5-6] proposed replicating data across nodes to facilitate node deactivation when the system load decreases. They called their technique Fractional Replication for Energy Proportionality (FREP). Verma et al.[7] proposed to consolidate the workload across physical volumes using a technique they called Sample-Replicate-Consolidate Mapping (SRCMap). They assumed that a physical volume is composed of an RAID. SRCMap samples a subset of the blocks, namely the working set, from each physical volume. This working set is replicated on other physical volumes. SRCMap only keeps the minimum number of physical volumes required by the workload turned on. MAID (Massive Array of Idle Disks)[8] was proposed as a replacement for old tape backup archives with hundreds or thousands of tapes.
It uses a few additional always-on cache disks to hold recently accessed blocks and thereby reduce the number of accesses to the other disks. However, this layout is not energy-efficient because the extra cache disks consume energy. The EXCES (External Caching in Energy Saving Storage Systems)[9] technique uses a low-end flash device for caching popular data and uses buffering of writes to increase the idle periods of disk drives. EXCES maintains the top-k hottest files and moves them to flash devices periodically. A dedicated storage cache does not provide fine-grained energy proportionality, and it adds additional energy consumption as well as economic costs. In order to obtain a better trade-off between energy savings and read/write availability, Thereska et al.[10] proposed a distributed storage system that uses a power-aware replica layout for data chunks and a distributed virtual logging scheme to improve read/write availability. However, this approach also needs to maintain replicas of data. Besides, it has to use extra logs as well as a centralized metadata service to coordinate data replicas.

Another method is to redistribute files among disks[11-12]. Pinheiro and Bianchini proposed the PDC (Popular Data Concentration) technique for energy conservation[12]. PDC does not use additional disks, but suggests transferring data between disks according to the popularity of the accessed data. PDC focuses on workloads with highly skewed file access patterns. It periodically moves files on the basis of their access frequencies, i.e., the most popular files are moved onto the first disk until it is full, the next most popular files are put onto the second disk, and so on. However, PDC employs an individual-disk-based file migration policy. This policy may cause too many file movements, because the access frequencies of files change very often in many applications. As the hotness of the files on a disk and on its neighbor is very close, a slight change in file hotness may trigger file migrations. This can result in substantial performance degradation even when all disks are in active mode. In contrast, our proposal uses an improved group-based file migration model that partitions all the disks into a hot disk group and a cold disk group. File migrations are only allowed between the groups. This scheme avoids the unnecessary file migrations in PDC that are caused by slight changes in file hotness. On the other hand, moving files between two hot disks contributes little to both energy saving and performance improvement, because these disks may always be in active mode. Therefore, reducing file migrations between disks within the same group saves energy without degrading time performance.

The disk-grouping idea was also proposed in [13], which was designed for RAID disk systems and especially for read-only workloads. In that work, the authors proposed to partition all disks into RAID groups and use a disk-block-based method to exchange data between groups. Differing from [13], our work supports non-RAID systems and handles general workloads with reads and writes. In addition, we use a file-based rather than a disk-block-based exchange model.

A new method for energy proportionality called E-HASH was proposed recently[11]. E-HASH divides all the disks into a hot disk set and a cold disk set and periodically places the files with high access frequency onto the hot disk set.
The cold disk set can be turned off or switched into a sleep state. Shutting off cold disks is a straightforward idea that can effectively reduce energy consumption, but the wake-up cost of shut-off cold disks has to be considered. Therefore, when and how to turn off cold disks has been a research topic in energy-proportional disk storage systems[4,14-15]. The arguably best approach to addressing this problem is called DPM (Dynamic Power Management)[14]. DPM aims at finding an optimal solution to determining the right shut-off time for cold disks. The basic idea is to measure the idle time interval of cold disks and to shut them off if the energy savings in the observed idle time interval exceed a predefined energy penalty. The predefined energy penalty reflects the additional energy cost of serving a request to a disk that was shut down before the request was received (so that the disk has to be turned on immediately). However, the DPM policy is only based on the energy penalty and the idle time of disks. It does not consider any data movement among disks. We use DPM as a baseline in our experiments.

Our work is influenced by E-HASH. However, there are some major differences between our approach and E-HASH. First, the size of the hot disk set is fixed in E-HASH, while in our work, it is dynamically adapted according to changes in the workload. Second, E-HASH redistributes the files among disks at a fixed period, while we use a new dynamic approach to reorganize the files among disks: the reorganization process is triggered on the basis of the changes in the hotness of files and of predicted migration costs. As the redistribution process adds extra energy cost and adversely affects the time performance of the storage system, our approach is expected to perform better in both energy conservation and run time.

3 HAG System Model

To address the problems of previous methods such as PDC and E-HASH, we first propose an improved model for disk storage systems, called HAG (Hotness-Aware and Group-based model). The model takes the hotness of files and disks into account when reorganizing files. The group-based design partitions the disks into two sets, namely a hot group and a cold group.

3.1 System Model

Fig.1 shows the general idea of the HAG model. All disks in a storage system are categorized into a hot group and a cold group. The hotness of a disk is computed from the hotness of the files residing on the disk. In Subsection 3.2, we cover the details of how to determine the hotness of files. In general, the hotness of a file is determined by its access frequency. As the access frequency of a file changes with time, the hotness of its associated disk will eventually change, too. Therefore, in the HAG model, the hot disks are dynamically selected on the basis of the hotness of files as well as disks.

Fig.1. HAG (Hotness-Aware and Group-based) system model: file I/O requests reach a file I/O dispatcher that consults a file-disk mapping table, and files are migrated dynamically between the hot disks and the cold disks.
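To make the model concrete, the following sketch shows one way the bookkeeping implied by Fig.1 could be represented: a file-disk mapping table plus two disjoint disk groups, with migration permitted only across groups. The class and method names are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (in Python) of the HAG bookkeeping implied by Fig.1.
# The names (HAGModel, dispatch, migrate) are hypothetical.

class HAGModel:
    def __init__(self, num_disks):
        self.hot_group = set(range(num_disks))   # initially HOT = {all the disks}
        self.cold_group = set()                   # COLD = empty set
        self.file_to_disk = {}                    # the file-disk mapping table

    def dispatch(self, file_id):
        """Route an I/O request to the disk currently holding the file."""
        return self.file_to_disk[file_id]

    def migrate(self, file_id, target_disk):
        """Move a file, allowed only between the hot group and the cold group."""
        source_disk = self.file_to_disk[file_id]
        same_group = ({source_disk, target_disk} <= self.hot_group or
                      {source_disk, target_disk} <= self.cold_group)
        if same_group:
            raise ValueError("HAG forbids migration within a single group")
        self.file_to_disk[file_id] = target_disk
```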
The operational process of the HAG model is shown in Fig.2. We periodically check the state of the system to determine whether file reorganization is needed. If so, we determine the hot files and the hot disks. Both the checking of the file reorganization condition and the selection of hot files and disks are based on statistics collected during the operation of the system. After that, we move data between the hot disks and the cold disks so that the hottest data is allocated on the selected hot disks. The cold disks are then set to the standby state in order to save energy.

Fig.2. HAG-based system procedure: initialize HOT = {all the disks} and COLD = ∅; every tp seconds, check whether file reorganization is needed; if so, detect hot files, select hot disks, interchange data between the hot and the cold disks, and set the cold disks to standby; otherwise, perform a slight adjustment of the file allocation.

Compared with previous models such as PDC and E-HASH, the major benefits of HAG are twofold. First, differing from the single-disk-based model of PDC, in which file migration happens between two disks with different hotness values, the HAG model only performs file migration between the two disk groups (the hot group and the cold group). There are no file migrations between two disks within the same group. Therefore, HAG avoids the unnecessary file migration costs incurred by PDC. Second, in E-HASH, the number of hot disks remains fixed, while in HAG, we use a dynamic selection procedure to determine the set of hot disks according to workload change. Generally, different applications may have different file access patterns, and the frequently accessed files change as applications change. Thus, it is not practical to always keep a fixed number of hot disks.

3.2 Determining Hot Files

Many database applications exhibit skewed access patterns. Basically, we can use the access count to measure the hotness of a file. However, as access counts change with time, we have to introduce a smoothing or aging scheme to reflect the time factor. For example, a file accessed frequently and recently should be regarded as hotter than a file that was accessed frequently in the past. Therefore, we define the hotness of a file as follows:

    est_r(t_n) = α × x_{t_n} + (1 − α) × est_r(t_{n−1}).    (1)

In (1), t_n represents the current time slice and x_{t_n} represents the observed value at time t_n. In our framework, x_{t_n} is 1 if file r was accessed during t_n, and otherwise x_{t_n} is 0. Next, est_r(t_{n−1}) is the hotness estimate for the previous time slice t_{n−1}. The variable α is a decay factor that reflects the weight given to the new observation versus the earlier estimate est_r(t_{n−1}). In our experiments, the I/O requests to files are written into a log file, and we can simply scan the log to calculate the hotness of files.
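As an illustration, the following sketch replays such a log slice by slice and maintains est_r according to (1). The log format (an iterable of (time_slice, file_id) pairs) and the helper name are assumptions; α = 0.05 mirrors the value later used in Subsection 5.3.

```python
from collections import defaultdict

def update_hotness(log, num_slices, alpha=0.05):
    """Replay an access log and return est[file] as defined by (1)."""
    est = defaultdict(float)
    accessed_in = defaultdict(set)        # time slice -> files accessed in it
    for t, f in log:
        accessed_in[t].add(f)
    for t in range(num_slices):
        accessed = accessed_in.get(t, set())
        for f in set(est) | accessed:     # decay every tracked file each slice
            x = 1.0 if f in accessed else 0.0
            est[f] = alpha * x + (1.0 - alpha) * est[f]
    return est
```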
In order to finally determine which files are hot, we sort all the files in descending order of est_r(t_n). The top 1/4 of the files are initially identified as hot files. According to the 20/80 rule in database applications, we assume that 20% of the files are potentially hot. Here, we slightly increase the expected size of the set of hot files to 25%, because the energy penalty of recognizing a cold file as a hot one is lower than that of the opposite case. When a hot file is labeled as cold, the disk hosting the file has to be turned on and off frequently, and such switches of power states cost more power than ordinary disk operations. On the other hand, it is not energy-efficient to label all of the remaining 3/4 of the files as cold data. In general, the hottest and the coldest files are easy to recognize based on access frequency. The difficult part lies in the identification of the "warm files". Since they are likely in a changing state, either from hot to cold or from cold to hot, we have to deal with those files carefully. In order to find the warm files that are likely to turn into hot files, we first compute the mean est value of all the warm files, and then identify the files whose est values are higher than the mean as hot, and the others as cold. The detailed process is described in Algorithm 1.

Algorithm 1. DeterminingHotFiles(F)
Input: the set of files F, each of which has an associated est value
Output: H: the set of hot files
1: G ← sort all files in F in descending order of est;
2: H ← the top 1/4 files in G;
3: C ← the bottom 1/4 files in G;
4: W ← G − H − C;  // get the warm files
5: m ← the mean est value of all the files in W;
6: for each file f in W do
7:   if f.est > m then H ← H ∪ {f};
8: end for
9: return H;
End DeterminingHotFiles
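A direct, runnable rendering of Algorithm 1 might look as follows; it assumes est is a dict mapping file identifiers to the hotness values of (1), and it is a sketch rather than the authors' code.

```python
def determining_hot_files(est):
    """Algorithm 1: top 1/4 of files plus the above-average warm files."""
    ranked = sorted(est, key=est.get, reverse=True)   # descending order of est
    quarter = len(ranked) // 4
    hot = set(ranked[:quarter])                       # top 1/4 files
    warm = ranked[quarter:len(ranked) - quarter]      # between the hot and cold 1/4s
    if warm:
        mean = sum(est[f] for f in warm) / len(warm)
        hot |= {f for f in warm if est[f] > mean}     # warm files above the mean
    return hot
```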
In our design, we do not introduce a moderate group, e.g., a warm file group. Warm files would eventually lead to warm disks. However, if those warm disks are left in the active state, they are effectively treated as hot disks; if they are set to standby mode, they are treated as cold disks. As a result, a warm file group, and thus a warm disk group, would complicate the control but contribute little to energy efficiency.

3.3 Selecting Hot Disks

Based on the hot files determined in Subsection 3.2, we can further determine the hot disks. The remaining disks are identified as cold disks. The basic idea is to let hot files reside on hot disks and to make the number of hot disks as small as possible, so as to maximize the chance of shutting off more disks and in turn reduce energy consumption. There are two principles for selecting the hot disks. 1) The selection of hot disks should consider both energy savings and time performance. 2) The file migration between disks should be reduced as much as possible.

To simplify the analysis, we introduce the notations shown in Table 1. The purpose of the parameters maxRatio_hot and minRatio_free is to control the number of hot files on hot disks, as we have to keep some free space for new insertions or updates.

Table 1. Notations Used in Determining Hot Disks

  Notation        Description
  disk_size       Capacity of a single disk (bytes)
  maxRatio_hot    Maximal ratio of hot data in a disk (%)
  minRatio_free   Required minimal ratio of free space in a disk (%)
  hot_files       Number of the hot files selected in Subsection 3.2
  hot_volume      Volume of all the selected hot files (bytes)
  hot_disks       Estimated number of hot disks
  α_i             Current hot data ratio in disk i
  β_i             Current free space ratio in disk i

After determining the hot files using the algorithm in Subsection 3.2, the estimated number of hot disks, namely hot_disks in Table 1, can simply be calculated by (2):

    hot_disks = hot_volume / (maxRatio_hot × disk_size).    (2)

After that, we select hot_disks disks among all the disks as the hot disks. As the current ratio of hot files in a disk, i.e., α_i, may exceed maxRatio_hot, we need to be able to swap files among disks to get the ratio of hot files in each hot disk below maxRatio_hot. This introduces additional migration cost, and we want to first determine the minimal migration cost.

Property 1. The migration cost in selecting the set of hot disks, denoted as MCOST, is determined by α_i and β_i. It can be represented by (3), where C is a constant:

    MCOST = C − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.    (3)

The proof of Property 1 is given in Appendix A. Based on (3), we give Algorithm 2 for selecting the set of hot disks.

Algorithm 2. SelectingHotDisks(α, β, hot_disks)
Input: α: the hot data ratios of all disks; β: the free space ratios of all disks; hot_disks: the number of hot disks to be returned
Output: D: an array storing the identifiers of the hot_disks hot disks
Preliminary: the disks are numbered from 0 to |α| − 1
1: for i = 0 to |α| − 1 do
2:   H[i].cost ← 2α_i + β_i;
3:   H[i].disk_no ← i;
4: end for
5: H ← sort H descendingly w.r.t. H[i].cost;
6: for i = 0 to hot_disks − 1 do
7:   D[i] ← H[i].disk_no;
8: end for
9: return D;
End SelectingHotDisks
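The estimate of (2) and the ranking of Algorithm 2 translate into a few lines of Python; rounding the estimate up to a whole number of disks is an assumption, and the names follow Table 1.

```python
import math

def estimate_hot_disks(hot_volume, max_ratio_hot, disk_size):
    """Equation (2): number of disks needed to host the hot files (rounded up)."""
    return math.ceil(hot_volume / (max_ratio_hot * disk_size))

def selecting_hot_disks(alpha, beta, hot_disks):
    """Algorithm 2: ids of the hot_disks disks with the largest 2*alpha_i + beta_i,
    i.e., the disks that minimize the migration cost of (3)."""
    cost = sorted(((2 * a + b, i) for i, (a, b) in enumerate(zip(alpha, beta))),
                  reverse=True)
    return [i for _, i in cost[:hot_disks]]
```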
4 File Reorganization

File access patterns generally change with time. Therefore, the hotness of a file, and in turn the hotness of a disk, changes with time. In order to keep the file distribution energy-efficient when file access patterns change, we present an on-demand approach to reorganizing files on disks, which ensures continued high energy efficiency.

A basic idea is to reorganize files periodically; this approach has been used in previous studies such as PDC[12]. However, as file reorganization introduces additional energy costs because of data migration operations, periodical reorganization may worsen the energy efficiency of the target storage system. Basically, it is appropriate to check periodically whether file reorganization is needed, but physical reorganization should not be performed periodically, because in some cases the reorganization costs may exceed the predicted benefits. In our proposal, we use on-demand file reorganization: we periodically check whether file reorganization is needed and only conduct physical reorganization if the current file organization is poor in energy efficiency.

Generally, there are two major rules for designing an efficient approach to file reorganization. First, the approach should be adaptive to workload change. Second, it should introduce only little overhead in energy consumption and run time. Based on these rules, and in contrast to periodical reorganization, we consider two factors to determine whether file reorganization is necessary:

1) ratio_file: the ratio of accesses to the current hot disks, defined in Definition 1;
2) ratio_disk: the ratio of hot disks that are frequently accessed, defined in Definition 2.

Definition 1. The hot disk file access ratio, ratio_file, is defined as the total number of accesses to the files on hot disks divided by the total number of file accesses. This quantifier captures the popularity of the files on hot disks.

Definition 2. The hot disk popularity, ratio_disk, is computed as follows: we sort all the disks in descending order of the number of accesses to them. Then, we pick the top hot_disks disks from the sorted list and count the current hot disks that appear in this set of top hot_disks disks. This count divided by hot_disks is returned as ratio_disk.

Example 1. Suppose there are 9 disks in total, numbered from 0 to 8, and the current hot disks are 1, 4, and 7. Thus the constant hot_disks is 3 in this example. Table 2 shows the number of accesses to the files on each disk.

Table 2. Accesses to Each Disk in Example 1

  Disk   Accesses
  0       100
  1       400
  2       600
  3       500
  4       800
  5       200
  6       300
  7       900
  8       700

Then, we can compute ratio_file and ratio_disk as follows:
1) ratio_file = (400 + 800 + 900) / (100 + 400 + 600 + 500 + 800 + 200 + 300 + 900 + 700) = 0.47;
2) ratio_disk = 2/3 = 0.67.

Here, for ratio_disk, we first sort the disks according to the number of accesses to them, which results in the list (7, 4, 8, 2, 3, 1, 6, 5, 0). Then, we check the top hot_disks disks in the list, i.e., {7, 4, 8}, and find that they contain two of the current hot disks, i.e., {7, 4}. Thus ratio_disk is calculated as |{7, 4}| / hot_disks, i.e., 2/3.
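The two indicators, and the trigger test used in Algorithm 3 below, can be computed as in the following sketch. Running it on the access counts of Example 1 gives ratio_file ≈ 0.47 and ratio_disk ≈ 0.67, so with the default threshold δ = 0.7 the average of about 0.57 would trigger a physical reorganization.

```python
def ratio_file(accesses, hot_disks):
    """Definition 1: share of all accesses that go to the current hot disks."""
    return sum(accesses[d] for d in hot_disks) / sum(accesses)

def ratio_disk(accesses, hot_disks):
    """Definition 2: fraction of current hot disks among the |hot_disks| most
    accessed disks."""
    ranked = sorted(range(len(accesses)), key=lambda d: accesses[d], reverse=True)
    top = set(ranked[:len(hot_disks)])
    return len(top & set(hot_disks)) / len(hot_disks)

accesses = [100, 400, 600, 500, 800, 200, 300, 900, 700]   # disks 0..8 (Table 2)
hot = {1, 4, 7}
rf, rd = ratio_file(accesses, hot), ratio_disk(accesses, hot)
needs_physical_reorganization = (rf + rd) / 2 < 0.7         # delta = 0.7
```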
We use the average value of ratio_file and ratio_disk and perform file reorganization if this value is below a threshold. In this case, there has been a substantial change in the access pattern, and the current file organization does not support the new workload. Therefore, file reorganization is performed to re-select the hot files as well as the hot disks and to distribute the newly hot files onto the newly identified hot disks. In case the average ratio exceeds the threshold, meaning that the current accesses are still focused on the current hot files and hot disks, we start a slight adjustment procedure. This procedure interchanges the hot files on current cold disks with the cold files on current hot disks, so that hot files on cold disks have a chance to move to hot disks even when physical file reorganization is not triggered. The resulting on-demand file reorganization is presented in Algorithm 3.

Algorithm 3. FileReorganization(F, α, β, tp, δ)
Input: F: the file set; α: the hot data ratios of disks; β: the free space ratios of disks; tp: a time interval; δ: the threshold for triggering physical file reorganization
Preliminary: disks are identified by unique numbers from 0 to |α| − 1
/* The check of the file reorganization condition is executed periodically, tp time units after the last check */
1: for each time interval tp do
2:   ratio_file ← computed by Definition 1;
3:   ratio_disk ← computed by Definition 2;
4:   if (ratio_file + ratio_disk) / 2 < δ then
5:     hot_files ← DeterminingHotFiles(F);
6:     hot_disks ← hot_volume / (maxRatio_hot × disk_size);
7:     D ← SelectingHotDisks(α, β, hot_disks);
8:     Swap files between the hot disks D and the cold disks;
9:   else
10:    Slight adjustment: interchange hot files on cold disks with cold files on hot disks;
11:  end if
12: end for
End FileReorganization

5 Performance Evaluation

As our approach aims at reducing the energy consumption of a disk storage system while guaranteeing time performance, the evaluation focuses on two metrics: energy savings and run time. We first describe the experimental setup and workloads in Subsections 5.1 and 5.2, and then we present the evaluation results w.r.t. various metrics in Subsections 5.3 to 5.8.

5.1 Experimental Settings

The experiments are conducted in a simulation environment developed using SimPy①. The environment consists of a workload generator, a file dispatcher, and a group of virtual hard disks. Each virtual hard disk is designed as a container of files. The parameters of the hard disks are based on a modern hard disk, a Seagate Barracuda 7200.10②. Table 3 shows the detailed parameters of the disk.

① http://simpy.readthedocs.org, May 2015.
② http://www.seagate.com/support/disc/manuals/sata/100402371a.pdf, May 2015.

Table 3. Power Parameters Used in the Experiments

  Description       Value
  Idle mode          9.30 W
  Operating mode    13.00 W
  Standby mode       0.80 W
  Spinup power      24.00 W
  Stopping power     9.30 W
  Seek power        12.60 W
  Spinup time       15.00 s
  Stopping time     10.00 s
  I/O latency        4.16 ms

Fig.3 shows an illustration of the power model of the disk used in the experiments.

Fig.3. Illustration of the disk power model (operating mode 13 W; idle mode 9.3 W; standby mode 0.8 W; seek 12.6 W, 4.16 ms; spin-up 24 W, 15 s; stop 9.3 W, 10 s).

For the computation of the energy consumption when running an algorithm g on a trace s, we define energy(s, g) by (4):

    energy(s, g) = energy_run + energy_spin + energy_stop + energy_idle + energy_standby.    (4)

The elements in (4) are defined as follows:

    energy_run     = Σ_{i=1}^{|s|} (seek_power + operating_power) × IO_latency,
    energy_spin    = Σ spinup_power × spinup_time,
    energy_stop    = Σ stopping_power × stopping_time,
    energy_idle    = Σ idle_power × idle_time,
    energy_standby = Σ standby_power × standby_time.

Here, idle_time and standby_time are dynamically aggregated for each disk in the idle state and the standby state. The other parameters are listed in Table 3.
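As a concrete reading of (4), the sketch below sums the five terms for one simulated run using the drive parameters of Table 3. The per-run aggregates (request count, spin-up and stop counts, idle and standby seconds) are assumed to be collected by the simulator; this is an illustration, not the authors' code.

```python
SEEK_POWER, OPERATING_POWER = 12.6, 13.0     # W (Table 3)
SPINUP_POWER, SPINUP_TIME = 24.0, 15.0       # W, s
STOPPING_POWER, STOPPING_TIME = 9.3, 10.0    # W, s
IDLE_POWER, STANDBY_POWER = 9.3, 0.8         # W
IO_LATENCY = 4.16e-3                         # s per request

def energy(num_requests, num_spinups, num_stops, idle_seconds, standby_seconds):
    """energy(s, g) in joules, as the sum of the five terms of (4)."""
    e_run = num_requests * (SEEK_POWER + OPERATING_POWER) * IO_LATENCY
    e_spin = num_spinups * SPINUP_POWER * SPINUP_TIME
    e_stop = num_stops * STOPPING_POWER * STOPPING_TIME
    e_idle = IDLE_POWER * idle_seconds
    e_standby = STANDBY_POWER * standby_seconds
    return e_run + e_spin + e_stop + e_idle + e_standby
```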
We compare the proposed approach with three competitors, i.e., DPM[14], PDC[12], and E-HASH[11]. The E-HASH method uses a fixed set of disks as the hot disks. In our experiments, the ratio of hot disks in E-HASH is set to two different values, namely 25% and 50%, in order to make a comprehensive comparison between E-HASH and our approach. We denote the two settings as E-HASH(25%) and E-HASH(50%) in the results.

In the experiments, we set the parameter tp to 7 200 seconds. This time period is also used in E-HASH, i.e., both E-HASH(25%) and E-HASH(50%) perform file reorganization every 7 200 seconds. The parameter δ is set to 0.7. These parameters are likely to impact the overall performance, and we discuss their effects in Subsection 5.7.

5.2 Workloads

We use two types of workloads. The first type consists of six synthetic traces, each composed of 20 million read/write requests over 500 thousand files randomly distributed among 24 disks. This disk number is the same as that in one of the real traces. We assume that the disks have the same volume and that each file has the same size. Therefore, we use the number of files to represent the volume of a disk, which is calculated by (5):

    disk_size = ⌈(file_number / disk_number) × (1 + free_space_ratio)⌉.    (5)

In the synthetic traces, the total number of files is 500 000, and the size of each file is 48 KB. The free space ratio is set to 0.3. Thus, each disk can contain a maximum of 27 084 files. As modern disks usually contain internal caches, we implement a buffering scheme for each simulated disk. The buffer for a disk can contain 1 000 files (about 3.7% of the total files in a disk), and the buffers of all disks have the same size. In our experiments, we use the LRU scheme as the buffer replacement algorithm.

For generating each synthetic trace, we use the concept of file locality to indicate the percentage of files being accessed. For example, a file locality of 25% means that the I/O requests concern 25% of all files. To reflect the dynamics of workloads, we use four different file locality values, namely 25%, 50%, 75%, and 100%, when generating a trace. Next, we use the Zipf distribution to generate the synthetic traces so that they reflect skewed file access patterns. Access patterns with a Zipf distribution have been characterized and used in many previous studies[16]. For instance, Internet data access patterns can usually be characterized as satisfying a Zipf distribution with the distribution parameter α = 1.0[16]. In our experiments, we use two different Zipf parameters to generate the synthetic traces: in addition to the case of α = 1.0, we also consider the case of α = 1.4.

Another factor that needs to be considered when generating synthetic traces is disk locality. Disk locality refers to the probability for a disk to receive I/O requests. We use the exponential function to simulate disk locality; specifically, exp(u) indicates the exponential distribution with mean u. A previous disk storage system named Hibernator[17] uses exp(6) and exp(20) to characterize disk locality. We add the case of exp(50). Generally, a larger value of the parameter u indicates that the accesses are more load-balanced. The synthetic Zipf traces are shown in Table 4.

Table 4. Synthetic Zipf Traces

  Name   Zipf Parameter   Disk Locality
  S11    α = 1.0          exp(6)
  S12    α = 1.0          exp(20)
  S13    α = 1.0          exp(50)
  S21    α = 1.4          exp(6)
  S22    α = 1.4          exp(20)
  S23    α = 1.4          exp(50)

Each trace consists of 20 million requests in total and contains 5 million requests for each file locality value (25%, 50%, 75%, and 100%). In order to obtain changing file access patterns, we make each group of 5 million requests focus on different files.
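For illustration only, a trace with the flavor of S12 (Zipf α = 1.0, disk locality exp(20)) could be generated along the following lines. The paper does not specify how the Zipf file choice and the exponential disk choice are combined, so sampling them independently, the rejection step that enforces the file-locality window, and the use of NumPy's Zipf sampler (which requires α > 1, hence 1.01) are all assumptions.

```python
import numpy as np

def synthetic_requests(n, num_files=500_000, file_locality=0.25, zipf_a=1.01,
                       num_disks=24, disk_mean=20, seed=1):
    """Yield n (disk, file) request pairs with skewed file and disk access."""
    rng = np.random.default_rng(seed)
    window = int(num_files * file_locality)      # only this many files are touched
    for _ in range(n):
        rank = int(rng.zipf(zipf_a))
        while rank > window:                     # rejection: stay inside the window
            rank = int(rng.zipf(zipf_a))
        disk = min(int(rng.exponential(disk_mean)), num_disks - 1)
        yield disk, rank - 1
```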
In addition to the synthetic traces, we use two real traces: OLTP and WEB③.

OLTP. This is an I/O trace from a real OLTP application running at two large financial institutions. It contains 4 099 352 write requests interleaved with 1 235 632 read requests to 24 disks. The mean request rate is 123.5 requests per second.

WEB. The second trace was recorded on a storage system that supports a Web search engine. It contains 4 579 809 read requests issued within 4.5 hours, and 99% of the requests go to three disks. The mean request rate is 297.487 requests per second.

The original OLTP and WEB traces only record block-level requests, and thus we map those requests to file-level requests by regarding each block number as a file number.

③ http://traces.cs.umass.edu/index.php/Storage/Storage, May 2015.

5.3 Adaptivity in Selecting Hot Disks

We first consider the adaptivity of our approach in selecting the set of hot disks. As discussed before, our method is able to dynamically determine the hot disks according to the changing workload. Fig.4 presents the numbers of hot disks selected by our method and the three previous approaches at four different periods.

Fig.4. Hot disks varying with time.

Here, we use the trace S12 as the workload, and the decay factor in (1) is set to 0.05. When scanning the trace, we manually select four time instants at which to record the number of hot disks. In detail, the first time instant t1 is the time when the first 2.5 million requests have been processed, and the second time instant t2 is the time when the first 7.5 million requests have been finished. Further, t3 is the time when 12.5 million requests have been processed, and t4 is the time after executing 17.5 million requests.

As DPM uses all 24 disks and is not able to select hot disks, its number of hot disks is constant at 24. E-HASH uses a static set of disks as hot disks, i.e., in E-HASH(25%) there is a fixed set of six hot disks, while in E-HASH(50%) the set consists of 12 disks. In both configurations of E-HASH, the number of hot disks cannot be adjusted in accordance with workload change. In contrast, our method adapts to the changing workload, mainly because more and more files are being accessed.

The dynamicity of the number of hot disks is critical to the energy proportionality of disk storage systems. Since different applications usually have different data access patterns, and even the same application may have different access patterns at different points in time, it is more appropriate to dynamically change the set of hot disks according to the change of access patterns. Thus we direct most requests to the hot disks and let the cold disks power off.

5.4 Effectiveness of File Reorganization

As discussed in Section 4, our method dynamically reorganizes files based on the computation of ratio_file and ratio_disk. These two quantifiers enable us to conduct on-demand file reorganization based on the changing file access patterns. Compared with the periodical file reorganization of E-HASH, this technique reduces the number of file reorganizations and thereby saves energy and improves time performance.

We conduct an experiment on trace S12 to measure the effectiveness of file reorganization. The results are shown in Fig.5, where we show the I/O requests to each disk generated by the different methods. The disks are ordered ascendingly by the number of I/O requests.

Fig.5. Request ratios to each disk.

Because DPM does not involve any file reorganization, there is no change in the request ratios across the 24 disks. As the hotness of files changes with time, DPM is not suitable for achieving energy proportionality. Next, E-HASH(50%) and E-HASH(25%) generally have higher access ratios on cold disks than our method. This is mainly because of the limited number of hot disks in E-HASH(25%) and because the sets of hot disks are static. The PDC method allocates most files on one disk (the No.24 disk in Fig.5). This causes high I/O overheads on the hottest disk. Besides, this method has to perform many file migrations in order to keep the hottest files on the hottest disk; the experimental results in Subsection 5.6 will demonstrate this claim. In contrast, our method can dynamically select an appropriate number of hot disks according to workload change; therefore, we can see in Fig.5 that more requests are concentrated on the eight hot disks (No.17~No.24) selected by our method. This leads to better performance in energy savings and run time, as we discuss in Subsections 5.5 and 5.6.

5.5 Energy Savings

In this experiment, we measure the percentage of energy savings of each method. The baseline method, denoted as NEC (No Energy Saving), keeps all 24 disks in active mode at all times, i.e., they are all powered on and no energy-aware techniques are applied. We then compare the energy savings of each method using (6), where energy_saving(s, g) refers to the energy savings rate of the algorithm g under trace s, and energy(s, g) is defined by (4):

    energy_saving(s, g) = 1 − energy(s, g) / energy(s, NEC).    (6)

Fig.6 presents the energy savings rates of the four methods measured on the six synthetic traces. In most cases, our method achieves 50% or higher energy savings.
The average energy savings rate of our method is 52.79%, while E-HASH(50%) and E-HASH(25%) reach average energy savings rates of 31.58% and 23.40%, respectively. The PDC method performs better than our method under traces S22 and S23. This is simply because these two traces have very high access locality, so the hottest files can be stored on a single hottest disk. However, PDC cannot adapt to workload change, which causes its worse energy-saving results under the other synthetic traces. Notably, PDC has an average energy savings rate of 50.44% over all the synthetic traces, which is worse than that of our method. As our method is designed for a better trade-off between energy savings and time performance, we will see in Subsection 5.6 that the time performance of our method is significantly superior to that of PDC. The average energy savings rate of DPM is only 3.09% because of its inability to identify hot and cold disks. Further, as studied in [18], DPM is only helpful for energy savings under very low request rates (less than 0.029 requests per second). However, this cannot always be expected in server-side disk storage systems. In our next experiment with real traces, we will see that the performance of DPM is even worse because these traces involve high request rates.

Fig.6. Energy savings for the synthetic traces.

Fig.7 presents the energy savings of the four methods for the two real traces. As both traces have high request rates, DPM offers no energy savings, because in such scenarios no disk has a chance to be turned off; its energy consumption is the same as that of the baseline. Our method reaches an energy savings rate of 43.62% on average. Compared with the average rates of the other methods, which are 28.15% for PDC, 31.99% for E-HASH(50%), and 26.61% for E-HASH(25%), our method is more effective.

Fig.7. Energy savings for the real traces.

5.6 Time Performance

We first measure the number of file reorganizations performed by our method. The results for the six synthetic traces are shown in Fig.8. As each synthetic trace contains four types of access patterns, each of which is generated with a different file locality value (25%, 50%, 75%, or 100%) and focuses on different files, there are actually four changes of file access pattern in each trace. Fig.8 shows that our method is able to detect these changes correctly.

Fig.8. Count of file reorganizations.

Next, we conduct an experiment on trace S12 to show the number of disk switches for each method. As power switches introduce additional energy and time costs, the number of disk switches is critical to the overall performance. Fig.9 shows the results for our method and the four comparative methods. Here, the DPM method has no disk switches, because it always keeps all the disks powered on, which is obviously not energy-efficient; recall that DPM only turns off a disk when the disk has been idle for a considerable time interval.
The PDC method shows an extremely high number of disk switches, due to its single-disk-based and periodical file reorganization scheme. Our method incurs the smallest number of disk switches, owing to its group-based disk organization and on-demand file reorganization policy.

Fig.9. Disk switch numbers.

The numbers of file migrations under trace S12 are shown in Fig.10, which indicates that our method performs very few file migrations compared with E-HASH and PDC. This is mainly owing to the group-based disk organization and the on-demand file reorganization policy of our method. As there are no file migration operations in DPM, it is not included in Fig.10. Notably, PDC has the largest number of file migrations, due to its individual-disk-based migration design.

Fig.10. File migration numbers.

We also consider the response time of the proposed method. The average response time per I/O request on the six synthetic traces is shown in Fig.11. It shows that the average I/O response time of E-HASH(25%) is much higher than that of the other methods in most cases. This is because the small set of hot disks in E-HASH(25%) causes some hot files to be placed on cold disks, which leads to frequent spin-up operations on the cold disks. Although E-HASH(50%) has a larger set of hot disks, its low efficiency in file reorganization lowers its time performance. The poor time performance of PDC is due to the frequent disk switches and file migrations shown in Figs.9 and 10. The low time performance of DPM is due to the fact that DPM does not use any file reorganization technique; when some disks are turned off by DPM, they are very likely to spin up again, which increases the average I/O time. We can also infer from Fig.11 that our method has relatively stable time performance under all the traces, owing to its dynamic algorithms for hot disk selection and file reorganization. In contrast, all the other methods show very different time performance when measured under different traces. In this sense, our method is adaptive to workload changes.

Fig.11. Average I/O time for the synthetic traces.

Fig.12 presents the average I/O time for the real traces. Since both traces have very high request rates, no disks are turned off by DPM; in other words, all the disks in DPM are active all the time. As a consequence, DPM obtains a low I/O time in this experiment. However, as shown in Fig.7, the energy savings of DPM are almost zero, which indicates that DPM is not an energy-efficient approach. The average I/O time of PDC is the worst for both traces, mainly due to its frequent disk switches and file migrations. On both traces, our method achieves response times comparable to those of E-HASH(50%) and DPM.

Fig.12. Average I/O time for the real traces.

Combined with the energy saving results in Figs.6 and 7, we can see that for the synthetic traces, our method saves about 20% more energy while keeping even higher time performance than E-HASH(50%). For the real traces, our method consumes 3% more time but reduces the energy cost by 12% more on average, compared with E-HASH(50%).
5.7 Influence of Parameters tp and δ

The parameter tp is the time interval at which we check whether a file reorganization operation should be started. Theoretically, if we set tp to a very large number, our method becomes very similar to the traditional approaches that do not consider energy issues, as the file reorganization operation will no longer be executed (see Algorithm 3). In contrast, using a small value for tp may lead to frequent computation of ratio_file and ratio_disk, which is not only time-consuming but also incurs additional energy costs. In Figs.13 and 14, we show the energy savings rate and the average I/O time when different tp values are used on trace S12.

Fig.13. Influence of tp on energy savings.

Fig.14. Influence of tp on average I/O time.

The results show that both metrics tend to decrease with increasing tp. The best performance is reached when tp is set to 7 200 s. The figures also show that when the parameter exceeds 7 200 s, the average I/O time remains nearly stable. In those situations, a long time interval is used for checking the file reorganization condition. Thus cold disks are very likely to be turned on, because the system runs for a long time and more files may be accessed than with a short time interval. Further, the cold disks that are turned on during the time interval may stay in active mode for a long time before the next check. Therefore, those cold disks will have few chances to be turned off even if a file reorganization is performed. When many cold disks are active, our method has little influence on the time performance, because very few disk power-state switches are introduced. As a result, we use tp = 7 200 s in the other experiments. In general, however, this parameter should first be tuned when applying the proposed method to a different disk storage system.

The parameter δ is the threshold controlling when to conduct physical file reorganization. This parameter influences the overall performance, as physical file reorganization operations are both time-consuming and incur additional energy costs. On the other hand, appropriate file reorganizations adjust the file distribution to fit the workload, so that the subsequent I/O requests are handled more energy-efficiently. In this experiment, we also use trace S12; parameters other than δ are the same as in the other experiments. As this parameter basically reflects the skewness of access patterns, we vary δ from 0.5 to 1. The results regarding the energy savings rate and the average I/O time are shown in Figs.15 and 16, respectively.

Fig.15. Influence of δ on energy savings.

Fig.16. Influence of δ on average I/O time.
We find that δ = 0.7 is the most appropriate setting considering both energy and time performance. It can be observed in Fig.16 that the average I/O time rises as the parameter increases, because more computations and file reorganizations, including slight adjustments, are introduced. This in turn worsens the energy savings rate, which is shown in Fig.15 as a declining curve when δ exceeds 0.7. For this reason, we set δ to 0.7 in the other experiments.

5.8 Influence of I/O Request Type

In this experiment, we measure whether our method is influenced by the I/O request type. Generally, file systems have buffers to cache files when they are accessed. Therefore, if a file from a cold disk is cached by the file system before the disk is powered off, future read requests to this file can still be served until the file is selected as the victim by the buffer replacement scheme of the file system. However, we have to turn on a powered-off cold disk if write requests to the files on this disk arrive. To this extent, write-intensive workloads will probably worsen the energy savings of our proposal. On the other hand, our method keeps the hottest files on active disks, and thus we expect that write requests to cold disks will rarely happen, so that a high rate of energy savings can be maintained even for write-intensive workloads.

We also use trace S12, but slightly modify it to generate read-intensive and write-intensive traces by using different read ratios among the total requests. If the read ratio in the trace is set to 0, the trace is write-only. Thus, we can use the read ratio to produce traces with different read/write ratios. We use six read ratios in the experiment, namely 0, 0.2, 0.4, 0.6, 0.8, and 1.0. When the read ratio is 1.0, the trace is read-only. The results are shown in Fig.17, which indicates that the change of I/O request type has little influence on the performance of our method. The reason is that our method keeps the hottest files on hot disks. To this extent, our method is resistant to changes in the read/write ratio and suits different types of I/O requests. In contrast, PDC and E-HASH(25%) are not able to keep stable energy saving rates when the read ratio varies. Although E-HASH(50%) and DPM show relatively stable performance in energy saving, their low energy saving rates indicate that they are not practical for real applications.

Fig.17. Energy savings of our method under varying read ratios.

6 Conclusions

Energy proportionality is a key metric for future data centers, for both energy conservation and performance guarantees. In this work, we presented a new design for energy-proportional disk storage systems. The design employs a hotness-aware and group-based technique to organize files and disks. We further presented effective algorithms to dynamically determine hot files and hot disks according to workload changes. Finally, a new algorithm for file reorganization was presented for conducting the on-demand adjustment of file placement. We conducted comprehensive experiments on both synthetic and real traces to quantify the performance of our proposal.
The results show that our proposal is capable of saving more than 50% of the energy costs on average when compared with traditional energy-unaware approaches. Furthermore, it saves 12%∼20% of the energy consumption on average when compared with the state-of-the-art E-HASH method. While offering low energy costs, our proposal is also able to keep time performance comparable with previous approaches.

Future work is needed in the area of hybrid storage systems involving SSDs and magnetic hard disks. In such systems, we have to distinguish the I/O request type when determining hot files and disks, as SSDs have asymmetric read/write speeds. Another direction for future work is to consider energy-proportional storage for MapReduce workloads[19] and the Hadoop distributed file system (HDFS)④. HDFS provides file storage using a distributed master-slave architecture and uses data blocks (64 MB by default) rather than files to organize data. With the wide application of MapReduce and HDFS in the big data era, it is helpful to devise energy-proportional storage schemes that improve the energy efficiency of the MapReduce framework and HDFS.

④ http://hadoop.apache.org/, May 2015.

Acknowledgements   We would like to thank the anonymous reviewers and editors for their valuable suggestions and comments to improve the quality of the paper.

References

[1] Joukov N, Sipek J. GreenFS: Making enterprise computers greener by protecting them better. In Proc. EuroSys, April 2008, pp.69-80.
[2] Weddle C, Oldham M, Qian J, Wang A A, Reiher P, Kuenning G H. PARAID: A gear-shifting power-aware RAID. ACM Transactions on Storage, 2007, 3(3): Article No. 13.
[3] Jin Y, Xing B, Jin P. Towards a benchmark platform for measuring the energy consumption of database systems. Advanced Science and Technology Letters, 2013, 29: 385-389.
[4] Bostoen T, Mullender S J, Berbers Y. Power-reduction techniques for data-center storage systems. ACM Computing Surveys, 2013, 45(3): 33:1-33:38.
[5] Kim J, Rotem D. Energy proportionality for disk storage using replication. In Proc. the 14th EDBT, March 2011, pp.81-92.
[6] Kim J, Rotem D. FREP: Energy proportionality for disk storage using replication. Journal of Parallel and Distributed Computing, 2012, 72(8): 960-974.
[7] Verma A, Koller R, Useche L, Rangaswami R. SRCMap: Energy proportional storage using dynamic consolidation. In Proc. the 8th FAST, February 2010, pp.267-280.
[8] Colarelli D, Grunwald D. Massive arrays of idle disks for storage archives. In Proc. the ACM/IEEE SC, November 2002, pp.1-11.
[9] Useche L, Guerra J, Bhadkamkar M, Alarcon M, Rangaswami R. EXCES: External caching in energy saving storage systems. In Proc. the 14th HPCA, Feb. 2008, pp.89-100.
[10] Thereska E, Donnelly A, Narayanan D. Sierra: Practical power-proportionality for data center storage. In Proc. the 6th EuroSys, April 2011, pp.169-182.
[11] Hui J, Ge X, Huang X, Liu Y, Ran Q. E-HASH: An energy-efficient hybrid storage system composed of one SSD and multiple HDDs. In Proc. the 3rd ICSI, June 2012, pp.527-534.
[12] Pinheiro E, Bianchini R. Energy conservation techniques for disk array-based servers. In Proc. the 18th ICS, June 26-July 1, 2004, pp.68-78.
[13] Otoo E J, Rotem D, Tsao S C. Dynamic data reorganization for energy savings in disk storage systems. In Proc. the 22nd SSDBM, June 30-July 2, 2010, pp.322-341.
[14] Irani S, Singh G, Shukla S K, Gupta R K. An overview of the competitive and adversarial approaches to designing dynamic power management strategies. IEEE Transactions on Very Large Scale Integration Systems, 2005, 13(12): 1349-1361.
[15] Irani S, Gupta R K, Shukla S K. Competitive analysis of dynamic power management strategies for systems with multiple power savings states. In Proc. DATE, March 2002, pp.117-123.
[16] Padmanabhan V N, Qiu L. The content and access dynamics of a busy web site: Findings and implications. In Proc. SIGCOMM, August 28-September 1, 2000, pp.111-123.
[17] Zhu Q, Chen Z, Tan L, Zhou Y, Keeton K, Wilkes J. Hibernator: Helping disk arrays sleep through the winter. In Proc. the 20th SOSP, October 2005, pp.177-190.
[18] Otoo E J, Rotem D, Tsao S C. Energy smart management of scientific data. In Proc. the 21st SSDBM, June 2009, pp.92-109.
[19] Chen Y, Ganapathi A S, Fox A, Katz R H, Patterson D A. Statistical workloads for energy efficient MapReduce. Technical Report No. UCB/EECS-2010-6, EECS Department, University of California at Berkeley, January 2010.

Pei-Quan Jin received his Ph.D. degree in computer science from University of Science and Technology of China (USTC), Hefei, in 2003. After that, he spent two years as a postdoctoral researcher in the Department of Electronic Engineering and Information Science, USTC. He is now an associate professor in the School of Computer Science and Technology, USTC. He was a visiting scientist at the University of Kaiserslautern (2009) and Aalborg University (2014). His research interests focus on spatio-temporal databases, flash-based databases, and Web information retrieval. He served as the PC co-chair of HardBD 2014, HardBD 2013, and FlashDB 2011, the demo co-chair of WAIM 2013 and NDBC 2012, and a PC member of many international conferences such as DASFAA, WAIM, DEXA, and WISE. He is a senior member of CCF and a member of both ACM and IEEE.

Xike Xie is currently an assistant professor in the Department of Computer Science, Aalborg University, Denmark. He received his Ph.D. degree in computer science from the University of Hong Kong in 2012, and his B.S. and M.S. degrees from Xi'an Jiaotong University, Xi'an, in 2003 and 2006, respectively. His research interests include data uncertainty, spatio-temporal databases, and mobile computing. He is a member of ACM and IEEE.

Christian S. Jensen is Obel Professor of computer science at Aalborg University, Denmark. He was recently at Aarhus University for three years and at Google Inc. for one year. His research concerns data management and data-intensive systems, and its focus is on temporal and spatio-temporal data management. Christian is an ACM fellow and an IEEE fellow, and he is a member of the Academia Europaea, the Royal Danish Academy of Sciences and Letters, and the Danish Academy of Technical Sciences. He has received several national and international awards for his research. He is the editor-in-chief of ACM Transactions on Database Systems (TODS) and was an editor-in-chief of The VLDB Journal from 2008 to 2014.

Yong Jin is currently a master student in the School of Computer Science and Technology at University of Science and Technology of China (USTC), Hefei. His research interests include databases on new hardware and energy-proportional computing.

Li-Hua Yue is a full professor in the School of Computer Science and Technology at University of Science and Technology of China (USTC), Hefei. She received her B.S. and M.S. degrees in computer science both from USTC. Her research interests include flash-based databases, spatio-temporal databases, information retrieval, and image processing. She is a senior member of CCF and a member of ACM.

Appendix A   Proof of Property 1

1) Suppose we have selected hot_disks disks as hot. Then we start the migration process.

2) The current free space of all the hot disks can be represented as follows, where we suppose all disks have the same size:

    current_free_space = Σ_{i=1}^{hot_disks} β_i × disk_size.

3) The required free space on the hot disks is determined by minRatio_free, so we have:

    required_free_space = Σ_{i=1}^{hot_disks} minRatio_free × disk_size.
4) Case 1. If required_free_space exceeds current_free_space, we have to move some files off the hot disks so as to increase the free space on the hot disks. After that, if the current hot data volume on the hot disks, namely Σ_{i=1}^{hot_disks} α_i × disk_size, is less than the total volume of all the selected hot files, i.e., hot_volume, we can swap between hot and cold disks to move cold files on hot disks to cold disks and vice versa. Fig.A1 shows an example of the migration process for this case.

Fig.A1. Example of data migration for case 1.

In this case, the migration cost MCOST can be calculated as follows:

    MCOST = COST(enlarging_free_space) + COST(swap),

    COST(enlarging_free_space) = Σ_{i=1}^{hot_disks} minRatio_free × disk_size − Σ_{i=1}^{hot_disks} β_i × disk_size,

    COST(swap) = 2 × (hot_volume − Σ_{i=1}^{hot_disks} α_i × disk_size),

so that

    MCOST = 2 × hot_volume + minRatio_free × disk_size × hot_disks − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.

5) Case 2. If required_free_space does not exceed current_free_space, we first move some hot files on cold disks to hot disks. After that, we conduct swapping between cold and hot disks, just as in the previous case. In this case, the migration cost can be formulated as follows:

    MCOST = COST(filling_free_space) + COST(swap),

    COST(filling_free_space) = Σ_{i=1}^{hot_disks} β_i × disk_size − Σ_{i=1}^{hot_disks} minRatio_free × disk_size.

The hot data moved from cold disks to hot disks should be deducted from the total hot data volume before swapping. Then, we get the following swap cost:

    COST(swap) = 2 × (hot_volume − Σ_{i=1}^{hot_disks} α_i × disk_size − COST(filling_free_space)),

so that

    MCOST = 2 × hot_volume + minRatio_free × disk_size × hot_disks − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.

The migration cost for case 2 is thus exactly the same as that for case 1. For the purpose of illustration, an example is shown in Fig.A2.

Since hot_volume, minRatio_free, disk_size, and hot_disks are all constants for a given scenario, we can represent 2 × hot_volume + minRatio_free × disk_size × hot_disks as a constant C. Thus, we have the following result:

    MCOST = C − Σ_{i=1}^{hot_disks} (2α_i + β_i) × disk_size.

The above equation shows that the migration cost is determined by the parameters α_i and β_i. Moreover, the migration cost decreases as 2α_i + β_i increases.

Fig.A2. Example of data migration for case 2.
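As a quick numeric cross-check of Property 1 (with made-up, illustrative values), the two-step cost of case 1, enlarging the free space and then swapping, coincides with the closed form C − Σ(2α_i + β_i) × disk_size:

```python
def mcost_case1(alpha, beta, disk_size, min_ratio_free, hot_volume):
    """Case 1: enlarge the free space on the hot disks, then swap."""
    cost_enlarge = (min_ratio_free * len(alpha) - sum(beta)) * disk_size
    cost_swap = 2 * (hot_volume - sum(alpha) * disk_size)
    return cost_enlarge + cost_swap

def mcost_closed_form(alpha, beta, disk_size, min_ratio_free, hot_volume):
    """Property 1: C - sum(2*alpha_i + beta_i) * disk_size."""
    c = 2 * hot_volume + min_ratio_free * disk_size * len(alpha)
    return c - sum(2 * a + b for a, b in zip(alpha, beta)) * disk_size

# Illustrative values (chosen so case 1 applies); both functions return 1080.0.
assert mcost_case1([0.5, 0.5], [0.0625, 0.0625], 1024, 0.125, 1500) == \
       mcost_closed_form([0.5, 0.5], [0.0625, 0.0625], 1024, 0.125, 1500)
```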