Future Storage Systems

INVITED PAPER
Recent Advancements and
Future Challenges of
Storage Systems
In the Internet environment, hard drives are becoming global storage devices
that can access many types of data, manage and locate desired data,
and securely preserve it wherever it resides.
By David H. C. Du, Fellow IEEE
ABSTRACT | The rapid development of the Internet, the fast drop in
storage cost, and the great improvement in storage capacity have
created an environment with an enormous amount of digital
data. In addition to the traditional goals like performance,
scalability, availability, and reliability, other challenges including
manageability, searching for the desired information, energy
efficiency, long-term data preservation, and data security
become increasingly important for storage systems. We first
present the evolution path of the past development of storage
systems. The impact of the advancement of disk technology on
fault tolerance is discussed next. The potential solutions and
research issues of the new challenges are also briefly discussed.
KEYWORDS | Disk technology; file systems; intelligent storage;
object-based storage device (OSD)

Manuscript received December 15, 2007; revised February 13, 2008. Current version published December 2, 2008. This work was supported in part by NSF Grant 0621462. The author is with the Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA. Digital Object Identifier: 10.1109/JPROC.2008.2004321
I. INTRODUCTION
The past decade has witnessed tremendous advances in
computing, wired and wireless communication, and
storage technologies. Storage devices have increased in
capacity to 1 Tbyte per drive. The rotational speed of the disk
spindle has also increased to 15 000 revolutions per
minute (RPM). Equally important, remarkable cost
reductions have made large computing and storage capacity
available to an increasing number of consumers. Many
wired and wireless networks are integrated with Internet
protocol (IP) into a single information technology infrastructure called the Internet. With this unprecedented
connectivity provided by the Internet, many new applications have emerged and developed. A huge amount of data
has become available for access to satisfy the demand of
these new applications. Each person has access to a
relatively large number of storage devices in desktop and
laptop computers as well as in personal digital assistants and
cellular phones. The traditional focus of storage systems on
high-performance computing has been extended to cover
this new environment. It increasingly becomes a challenge
to manage the huge amount of data available at our fingertips
and to locate the data we need at any time from anywhere.
More data types such as audio and video are being converted into
digital format, and data of various types need to be
preserved for a very long term (beyond 100 years).1

1 SNIA 100 Year Archive Requirements Survey.
In the current computer systems, data are stored as a
file under a given file system or in databases. A file system
is associated with a particular computer (or computers in
the distributed file systems) and its operating system. A file
system can be on top of a storage system. A database can be
on top of file systems or directly on a storage system. A
storage system consists of a number of disks or other
storage devices that are connected and perform as a single
integrated storage space.
A disk usually contains one or more platters, with
magnetic heads attached to an arm that read bits from the
platters. Each platter is divided into tracks. Each track is
further divided into sectors. The same track on all the
platters forms a cylinder. To access a particular sector from
a platter, the disk arm moves in or out to be on top of the
track where the sector is located, while the platter rotates
for the magnetic head to be on top of that sector. Since
outer tracks are longer than the inner tracks yet rotate at the
same speed, many high-performance disks place more sectors on the outer tracks than
on the inner ones. This technique is called zone bit encoding. A disk employing
zone bit encoding divides a platter into multiple zones,
each consisting of a set of contiguous cylinders. The
cylinders in the same zone have the same recording density.
Outer zones have higher numbers of sectors per cylinder
and therefore have a higher data transfer rate.
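To make the zone effect concrete, the short sketch below computes the sustained media rate of a zone from its sectors per track and the spindle speed; the geometry numbers are illustrative assumptions, not the parameters of any particular drive.

    # Illustrative calculation: media transfer rate of a disk zone.
    # Assumed (hypothetical) geometry: 512-byte sectors, 15 000 RPM spindle.

    SECTOR_BYTES = 512
    RPM = 15_000

    def zone_transfer_rate(sectors_per_track: int) -> float:
        """Return the sustained media rate of a zone in MB/s."""
        revs_per_second = RPM / 60.0
        bytes_per_rev = sectors_per_track * SECTOR_BYTES
        return bytes_per_rev * revs_per_second / 1e6

    # An outer zone with more sectors per track than an inner zone
    # yields a proportionally higher data rate at the same RPM.
    for name, spt in [("inner zone", 500), ("outer zone", 900)]:
        print(f"{name}: {spt} sectors/track -> {zone_transfer_rate(spt):.1f} MB/s")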
A challenge for disk technologies is that the magnetic
heads need to be close enough to sense the data on the disk
platter while staying far enough away not to touch the surface. It is
extremely challenging in physics to reduce platter vibration while rotating at high speed. This is one of the reasons
that disk rotation speeds have not improved as much as areal density. Today's popular off-the-shelf disks
have rotation speeds of either 7200 or 10 000 RPM. High-performance disks have rotation speeds from 10 000 to
15 000 RPM. The research on magnetic disks focuses on
how to increase the areal density and how to improve the
reliability of the drive.
Most of the disks sold today have one or two 32-bit
internal microprocessors and several megabytes of internal
random access memory (RAM). The processor and RAM on
disks enable disks to perform complex tasks that are
unnoticed by the users. One of the complex tasks is error
detection and correction. Sophisticated algorithms using
digital error checking and correction codes are used on
disks to ensure data integrity. Error-correction code (ECC)
bits are stored along with the user data on disks. Such an
error detection and correction mechanism is built into
drive control electronics as the first line of error correction
to deliver user data without delay. When such a first line of
error correction fails, additional error recovery mechanisms will be invoked. Sophisticated algorithms are used to
schedule multiple disk requests in the most efficient way
to access data. Another feature that has been included in
today’s disk drives is a failure warning provided to the host
systems when error rates exceed a threshold value. In
1994, a failure warning standard specification called self-monitoring and reporting technology (SMART)2 was
adopted by the disk drive industry. SMART uses an
algorithm running on the drive’s microprocessor that
continuously checks whether internal drive errors such as
bit-read errors and track-seek errors have exceeded a
threshold rate value. A warning will be given to the host
CPU when the error rates exceed the threshold.
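The warning logic amounts to a threshold comparison on internally tracked error rates, roughly as in the sketch below; the attribute names and threshold values are hypothetical and are not taken from the SMART specification.

    # Minimal sketch of a SMART-style failure warning: compare observed
    # internal error rates against per-attribute thresholds and warn the host.
    # Attribute names and thresholds below are hypothetical examples.

    THRESHOLDS = {
        "bit_read_error_rate": 1e-12,   # errors per bit read
        "track_seek_error_rate": 1e-6,  # errors per seek
    }

    def smart_check(observed_rates: dict) -> list:
        """Return the attributes whose observed rate exceeds the threshold."""
        return [attr for attr, rate in observed_rates.items()
                if rate > THRESHOLDS.get(attr, float("inf"))]

    warnings = smart_check({"bit_read_error_rate": 5e-12,
                            "track_seek_error_rate": 2e-7})
    if warnings:
        print("Warn host CPU: thresholds exceeded for", warnings)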
On some disks like fiber channel (FC) drives, the
processor can be programmed to execute XOR computations. Computing XOR on the disks helps to reduce the
need for such resources as the host CPU and redundant arrays of
independent disks (RAID) controllers in parity computations [1], [2]. The processing capability of a disk is expected
to continuously increase, and so is the memory capacity.
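The parity computation such a drive could offload is simply a byte-wise XOR across the data blocks of a stripe, as in the small sketch below (a simplified illustration, not the firmware interface of any drive or RAID controller).

    # Byte-wise XOR parity over the data blocks of a stripe. Performing this
    # on the drive's own processor spares the host CPU or RAID controller.

    def xor_parity(blocks: list) -> bytes:
        """Return the XOR of equally sized data blocks."""
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                parity[i] ^= byte
        return bytes(parity)

    d0, d1, d2 = b"\x0f" * 4, b"\xf0" * 4, b"\xaa" * 4
    p = xor_parity([d0, d1, d2])
    # Any single lost block can be rebuilt by XORing the survivors with the parity.
    assert xor_parity([d1, d2, p]) == d0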
2 ANSI-SCSI Informational Exception Control (IEC) document X3T10/94-190.
The data available over the Internet today are highly
unstructured and heterogeneous. Therefore, the traditional
way of accessing data via file name or URL may not be
adequate. As we move in the direction of accessing
data based on their semantics, how to store semantic
information with the data and how to allow data retrieval
based on semantics become extremely challenging. The
design goals of a traditional storage system are performance,
scalability, reliability, and availability. With the dramatic
changes in our computing and communication environment, how to deal with various types of data, how to manage
and locate the desired data, and how to securely preserve the
data for longer terms become increasingly challenging. In
the extreme case, we can consider all the data stored on the
Web servers over the Internet today as a single large
integrated storage system. To meet these challenges, we
have to reexamine the potential and basic functions that can
be supported by a single storage device (disk drive) as well as
the architecture of future storage systems. The solutions may
depend on how we take advantage of the onboard computing
and memory capability of a disk. We shall call such disks
intelligent storage devices.
The rest of this paper is organized as follows. Section II
describes the evolution path of storage systems from directly
attached devices to global storage systems. Section III
addresses the fault-tolerance issue of storage systems and the
impact of the advancement of today's disk drives on
fault-tolerant storage systems. Some of the challenges
and research issues for current and future storage systems
are discussed in Section IV. It is clear that more research is
needed to investigate these issues. We believe that intelligent storage is one of the promising approaches. However,
we are still far away from solving these challenges. We offer
some conclusions in the last section.
II. FROM DIRECTLY ATTACHED TO GLOBAL STORAGE DEVICES
Ever since the late 1950s and early 1960s, when online
digital storage devices were first introduced, scientists and
researchers have never stopped searching for technologies
to support bigger, better, and more scalable storage solutions. The truth is that improvements in storage
capacity alone will not satisfy the growing needs for more
scalable storage solutions. Many emerging applications
such as digital medical imaging systems in picture archive
and communication system, digital libraries, and video on
demand require not only a huge storage capacity but also
high-performance and fault-tolerant storage systems.
Other applications such as e-commerce and scientific
analysis and visualization of a very large data set such as
the human genome require more powerful and scalable
capabilities for searching and data mining.
In the past, the storage model was always assumed to
be the direct attached storage (DAS) model. In DAS,
storage devices are directly attached to a host system. This
model is the simplest way to connect host and storage
devices and is efficient for storage systems at a small scale.
It provides security, manageability (in small scales), and
performance. However, this model provides a very limited
scalability. When a system reaches its limit on the number
of devices it can attach, the system must be replicated. In
addition, when additional host systems are needed, data
stored on the storage systems will need to be either
replicated or partitioned. There is no direct device sharing
or data sharing in the DAS model. Device sharing between
two DAS systems must be served by the host systems
through computer networks. It introduces extra CPU
overhead and access latency. Many old mainframe systems
use the DAS model and proprietary protocols to connect
host and storage devices. In recent decades, standardized
protocols such as small computer systems interface (SCSI)
[3] and advanced technology attachment (ATA) (also
known as IDE)3 have been used to reduce the cost and
increase the interoperability. However, both SCSI and ATA
have very limited connectivity. A SCSI channel can only
connect up to 15 devices. Its popular parallel interfaces
have a maximum distance of only a few feet. Although the
speed and reliability of parallel SCSI and ATA have been
improved with their serial versions (serial SCSI and serial
ATA)4 [4], the extensibility and the number of devices that
can be connected have not. Examples of DAS include a
laptop, desktop PC, or single server storage devices that
usually reside inside the system. When multiple servers are
required in an organization, DAS may seem to be low in
cost from each server’s point of view. When the number of
servers becomes larger, the DAS model becomes inefficient
and expensive in total cost of ownership.

3 X3T10/0948D Information technology - ATA attachment interface with extensions (ATA-2).
4 SATA-1 specification, serial ATA: high speed serial AT attachment.
Network attached storage (NAS) is a technology that
provides file-level access on any type of network. A NAS
device consists of one or more storage devices and an
engine that implements and provides file services. A host
system accesses data on NAS devices using a file system
device driver with a file access protocol such as network file
system (NFS), server message block (SMB), or common Internet
file system (CIFS). With NAS, the storage can increase in
capacity and scale in number independently from the
applications. Another advantage of the NAS is that it can
use any popular network transport protocols such as
asynchronous transfer mode, gigabit Ethernet networks, or
even IP networks. Therefore, the cost for networking is
relatively low. NAS technology is mature and interoperable. It is popular for providing scalable file systems. However,
because of its file-level access latency, NAS may not be suitable for applications
that require low latency, such as database transactions,
or for other high-performance applications.
To provide scalable low-latency data transfers from a
storage device to a host or another storage device over a
network, storage area network (SAN) is a technology that
3
X3T10/0948D Information technologyVATA attachment interface
with extensions (ATA-2).
4
SATA-1 specification, serial ATA: high speed serial AT attachment.
emerged and has gained popularity in recent years. Its
primary use is for transferring data between computer
systems and storage devices as well as among storage
devices. A SAN consists of a communication infrastructure,
a management layer, storage devices, and computer systems. The communication infrastructure provides physical
connections, and the management layer organizes them.
From each computer system’s point of view, the storage
devices are connected as if they were connected in a DAS
model. Nonetheless, storage devices can be shared very
easily through the SAN. This is very beneficial and cost
effective, especially when sharing such expensive devices as
tape libraries and CD-ROM jukeboxes. In addition, storage
devices can be added to the storage network independently
from the hosts. Therefore, the scalability is greatly
improved. Fiber channel [5] and InfiniBand (IB)5 are two
popular standards that provide physical connections and
communication protocols in SAN. To a lesser extent, serial
storage architecture (SSA) [6]–[8] and gigabit (or 10 Gbit)
Ethernet are also used for this purpose. Some of these
protocols include a layer mapping between SCSI and their
lower layer protocols. One of the big differences between
NAS and SAN is that SAN provides a block-level access to
the storage devices and NAS provides a file-level access.
Therefore, SAN could be optimized for data movement
from servers to disk and tape drives. Because of its block-level access, SAN is more suitable for applications
that require low latency and high performance.
Among the communication protocols that support SAN,
fiber channel is the most popular one. It provides up to four
gigabits per second link bandwidth. It can run on either
optical or electrical connections. It supports three different
basic topologies. The first supported topology is point to
point, which allows a direct connection between a server and
a storage device. The second is loop topology, also referred to
as fiber channel-arbitrated loop (FC-AL) [9], [10]. Up to 127
devices can be connected to an FC-AL. The third topology is
fabric (or switched). It connects hosts and storage devices
through fiber channel switches. By adding more ports per
switch or more switches, more devices can be added to a SAN
and the total bandwidth can be scaled up with the total
number of ports in the network. However, fiber channel
switches are more expensive. Each fiber channel switch port
costs about two to three times more than an IP-based gigabit
Ethernet port. FC-AL provides a low-cost alternative to the
switched architecture. It provides connectivity up to 127
hosts and devices without the need of a fiber channel switch.
FC-AL uses an arbitration protocol to determine who has the
access right of the loop. Only the node (a host or device) that
won in an arbitration phase is allowed to use the loop for
transfer. Only one simultaneous transfer is allowed on a
loop. When a transfer ends, the next transfer right will be
given to the node that won in the following arbitration
phase.
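A toy model of this access discipline is sketched below; it is purely illustrative and abstracts away the actual FC-AL arbitration primitives, keeping only the property that each transfer needs its own arbitration win and only one transfer uses the loop at a time.

    # Toy model of FC-AL loop access: one arbitration per transfer,
    # and only the winner may use the loop. Real FC-AL arbitration
    # primitives (ARB/OPN/CLS) are abstracted away here.

    def fcal_schedule(pending: list) -> list:
        """Serialize pending transfers: each needs its own arbitration win."""
        order = []
        while pending:
            winner = min(pending)        # stand-in for priority-based arbitration
            order.append(winner)         # winner transfers; the loop is busy
            pending.remove(winner)       # everyone else re-arbitrates next phase
        return order

    print(fcal_schedule(["node7->disk2", "node1->disk5", "node3->disk9"]))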
5 http://www.infinibandta.org.
Another alternative solution is InfiniBand. IB was originally designed for "system area networks" that would
connect CPUs, memory, and I/O devices in a scalable cluster
environment. IB offers switch fabric based point-to-point
connections. Each connection is a bidirectional serial link
with transmission rate of 2 Gbit/s in each direction. IB also
offers double (DDR) and quad (QDR) data rates at 4 and
8 Gbit/s, respectively. Links can be aggregated in units of
four (4X) or 12 (12X). Therefore, a high I/O rate can be
easily accomplished. The other unique features of IB include
remote direct memory access (RDMA) support for low CPU
overhead and quality of service for data transmissions. A
storage subsystem needs to be connected via a target channel
adapter and each processor via a host channel adapter. Data
are transmitted in packets of up to 4 KB. IB uses a superset
of primitives from virtual interface architecture [11]. For an
application to communicate with another application on a
different node via IB, a work queue that consists of a queue
pair (QP) has to be created first. For each work queue, a
corresponding completion queue is also created. The QP
represents the application on the source and destination
nodes. An operation (send/receive, RDMA-read, RDMAwrite, or RDMA atomics) is placed in the corresponding
work queue as a work queue element (WQE). The channel
adapter will pick up the WQE to carry out the intended
operation. Once a WQE has been completed, a completion queue
element (CQE) is created and placed in the completion
queue. The benefit of using the CQE in the completion queue is to
reduce the interrupts that otherwise would be generated.
For each created QP, one of the following types of transport
service can be chosen: reliable connection (RC), unreliable
connection (UC), reliable datagram (RD), unreliable
datagram (UD), and raw datagram. The QP with raw
datagram is a data link layer service that sends and receives
raw datagram messages that are not interpreted. Through
these services and a type of credit-based flow control,
quality of service can be accomplished in IB.
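The queueing structure described above can be modeled roughly as follows; this is a conceptual sketch of queue pairs, work queue elements, and completion queue elements, not the verbs interface of any actual InfiniBand stack.

    # Conceptual model of InfiniBand work submission: an application posts a
    # work queue element (WQE) to a queue pair, the channel adapter executes
    # it, and a completion queue element (CQE) is placed in the completion queue.
    from collections import deque

    class QueuePair:
        def __init__(self, transport="RC"):   # e.g., RC, UC, RD, UD, raw datagram
            self.transport = transport
            self.send_queue = deque()          # WQEs waiting for the channel adapter
            self.completion_queue = deque()    # CQEs, polled instead of interrupting

        def post_send(self, opcode, payload):
            self.send_queue.append({"opcode": opcode, "payload": payload})

        def channel_adapter_poll(self):
            """Stand-in for the channel adapter consuming WQEs."""
            while self.send_queue:
                wqe = self.send_queue.popleft()
                # ... data would be segmented into packets of up to 4 KB here ...
                self.completion_queue.append({"status": "OK", "opcode": wqe["opcode"]})

    qp = QueuePair("RC")
    qp.post_send("RDMA_WRITE", b"block of data")
    qp.channel_adapter_poll()
    print(qp.completion_queue.popleft())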
SSA [6]–[8] is another possibility and was originally
designed by IBM. It provides two input and two output
links to and from each node (host or device). Each link has
a 20 MB/s (megabytes per second) bandwidth. There are
three different topologies supported in SSA. The first one
is SSA string. In an SSA string, the nodes are connected one
next to another as a string, with each node in the middle
connecting to a left neighboring node and a right
neighboring node. The beginning node and the ending
node of a string are connected to only one other node. The
second topology is the SSA loop. An SSA loop can be
formed by connecting the beginning and the ending nodes
of an SSA string. An SSA loop provides two nonoverlapped
paths from one node to any other node on the same loop.
Therefore, SSA loops provide fault tolerance against single
link failure. SSA loop is the most common topology in SSA.
The third topology is the switched topology, which uses a switch node with
six or more ports to connect a mix of nodes, SSA strings,
and SSA loops. One of the most attractive features in SSA is
spatial reuse. In SSA, each link operates independently. As
opposed to FC-AL, which requires arbitration before
accessing the loop and allows only one simultaneous
transfer, SSA does not need any arbitration for accessing
the loop and it allows multiple nonoverlapping simultaneous transfers on a loop. Furthermore, since each link is
operating independently, simultaneous transfers on the
nonoverlapped portion of the loop can utilize the full link
bandwidth. This is called spatial reuse. Spatial reuse allows
multiple nonoverlapped transfers, with each utilizing the
full link bandwidth. With spatial reuse, the total system
throughput can be much higher than the link bandwidth.
Although technically interesting, SSA is less successful or
popular due to its proprietary nature.
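A back-of-the-envelope sketch of spatial reuse follows (illustrative arithmetic only): with 20-MB/s links, transfers whose paths share no link each run at full link speed, so aggregate throughput grows with the number of nonoverlapping transfers.

    # Spatial reuse on an SSA loop: transfers whose paths share no link
    # each get the full link bandwidth, so their rates add up.

    LINK_MBPS = 20  # MB/s per SSA link

    def aggregate_throughput(nonoverlapping_transfers: int) -> int:
        """Aggregate throughput when the given transfers share no links."""
        return nonoverlapping_transfers * LINK_MBPS

    # Contrast with FC-AL, where only one transfer may use the loop at a time.
    print("FC-AL style (one transfer):", aggregate_throughput(1), "MB/s")
    print("SSA spatial reuse (4 disjoint transfers):", aggregate_throughput(4), "MB/s")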
We will not discuss gigabit or 10 Gbit Ethernet in this
paper since they are popular local-area network standards
and are commonly understood by the research community.
Recently a new effort to integrate gigabit Ethernet, IB, and
FC in a data center has emerged (called data center Ethernet,
an IEEE 802.1 standard). Storage area networks have some
common limitations. The first one is their distance
limitation. Although fiber channel provides a maximum
of 10 km in distance, it is still not long enough for connections across a wide area such as multiple cities and
states. Although NAS transfers data via computer networks
that can span wide-area networks, it does not provide
the low-latency access that many applications need. For
applications that require low-latency accesses over a large
geographic area, neither NAS nor SAN can provide a
satisfactory solution. The second limitation is the cost.
Even with fiber channel's popularity, a fiber channel switch
port costs about three times as much as an equivalent IP-based port such
as gigabit Ethernet. Expertise in fiber channel, InfiniBand,
and SSA is also relatively rare compared to that in IP-based networks. This adds to the total cost of ownership
for SAN. Both of these limitations lead to IP-based
storage solutions.
IP storage is widely envisioned as the storage networking of the future to provide low-latency block-level
access across longer distances. IP storage takes advantage of
popular low-cost IP networking and provides a block-level
access to the storage devices. Three major IP storage
standards are currently being defined: Internet SCSI
(iSCSI) [12]–[15], fiber channel over IP (FCIP) [16], and
Internet fiber channel protocol (iFCP) [17]. The concept is
to transport SCSI packets or fiber channel frames over IP
networks. The objective is to utilize the low-cost and high-speed IP networks to provide low-latency block-level
accesses to storage devices over a long distance. In the
short term, IP storage protocols including iSCSI, FCIP, and
iFCP will more likely be used for connecting multiple
remote SANs. There are still many challenges to overcome
for native IP storage to be accepted. A short description of
iSCSI, FCIP, and iFCP is provided below.
• iSCSI defines a protocol that will enable transferring
SCSI data over TCP/IP networks [12]–[15]. To
identify an iSCSI device, each iSCSI node has two
types of identifiers: an iSCSI name and an iSCSI
address. A permanent globally unique iSCSI name
with length up to 255 bytes will be assigned to each
iSCSI initiator and target by a naming authority. The
combination of TCP port and IP address provides a
unique network address for an iSCSI device. While
the iSCSI name will remain unchanged regardless of
the location of the iSCSI device, the TCP port and IP
address combination changes from one subnet to
another. The iSCSI address is composed of an IP
address, a TCP port number, and the iSCSI name of
the node. In order for an initiator to connect to a
storage resource, the Internet storage name service
(iSNS) provides a DNS-type service for device
discovery for an iSCSI initiator. An initiator would
first query an iSNS server to resolve the IP addresses
of potential target resources. The IP address will
then be used to establish a TCP/IP connection to the
target resource (see the sketch following this list).
• FCIP [16] is a protocol that is intended for transferring FC frames over IP networks. By creating
point-to-point IP tunnels between FCIP endpoints
on remote SANs, it forms a single unified FC SAN
across IP networks. Once the tunnels are set up,
they are transparent to the FC devices. The devices
use FC addressing scheme and view each of the
devices on the unified SAN as if they were in a
local SAN. A single FC fabric namespace will be
established. The FCIP bridge at the sending FCIP
endpoint encapsulates the FC frames, then uses
TCP/IP for transferring. On the receiving FCIP endpoint, the encapsulated FC frames will be decapsulated and transferred in FC frame format on the
remote SAN.
• iFCP [17] is also intended to extend FC SAN over
wide-area networks. It provides an alternative to
FCIP. As in FCIP, it uses the common FC encapsulation format specified by the Internet Engineering Task Force to encapsulate FC frames. The major
difference between FCIP and iFCP is in the addressing schemes. As opposed to FCIP, which establishes point-to-point tunnels to connect two FC
SANs, iFCP is a gateway-to-gateway protocol. iFCP
protocol allows each interconnected SAN to retain
its own independent namespace as opposed to
FCIP’s single unified namespace.
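The sketch below, referred to in the iSCSI item above, illustrates the addressing and discovery flow in simplified form: a permanent iSCSI name, a location-dependent IP address and TCP port, and a dictionary lookup standing in for an iSNS query. The names, addresses, and functions are hypothetical; this is not the iSCSI or iSNS wire protocol.

    # Simplified view of iSCSI addressing and iSNS-style discovery.
    # The registry below stands in for an iSNS server; names and addresses
    # are made up for illustration.

    ISNS_REGISTRY = {
        # permanent iSCSI name (unchanged as the device moves) -> current location
        "iqn.2008-11.edu.example:storage.target01": ("192.0.2.10", 3260),
    }

    def discover_target(iscsi_name: str) -> tuple:
        """Resolve an iSCSI name to its current (IP address, TCP port)."""
        return ISNS_REGISTRY[iscsi_name]

    def iscsi_address(iscsi_name: str) -> str:
        """iSCSI address = IP address + TCP port + iSCSI name of the node."""
        ip, port = discover_target(iscsi_name)
        return f"{ip}:{port},{iscsi_name}"

    # An initiator would use the resolved IP/port to open a TCP connection,
    # then carry SCSI commands over it.
    print(iscsi_address("iqn.2008-11.edu.example:storage.target01"))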
As discussed above, to meet the new challenges of the
future computing and communication environment, disk
drives can be further improved by adding more intelligence to the devices. Advantages of putting intelligence on
the storage devices include direct connection to networks,
parallel processing power across a large number of
devices, integration of metadata with data, better security,
search and filtering capability, and layout-aware scheduling. The concept of on-device processing leads to the recent
development of an object-based storage device (OSD)
standard by the Storage Networking Industry Association.
OSD [18], [19] provides a layer between block-level and
file-level access that is intended to make the existing and
future storage access more effective. It includes three
logical components: Object Manager, OSD Intelligence,
and File Manager. The main function of the Object
Manager is to locate objects and secure access of objects.
File Manager provides the security mechanism to ensure
data privacy and integrity. It also provides an application
programming interface to the legacy applications to access
files on OSD for backward compatibility. OSD Intelligence
is the software that runs on storage devices. It provides the
device with capabilities such as a sense of time, communication with other OSDs, device and data management, and
interpretation of OSD commands and object attributes. OSD
is intended to create a new class of storage-centric devices
that are capable of understanding, interpreting, and
processing data they store more effectively. The OSD
specification [18] defines a new device-type specific
command set in the SCSI standards family. The object-based storage device model is defined by this specification.
It specifies the required commands and behavior that are
specific to the OSD device type.
Fig. 1. Comparison of traditional and OSD storage models.

Fig. 1 depicts the abstract model of OSD in comparison to
the traditional block-based device model for a file system.
The traditional functionality of file systems is repartitioned
primarily to take advantage of the increased intelligence that
is available in storage devices. Object-based storage devices
are capable of managing their capacity and presenting file-like storage objects to their hosts. These storage objects are
like files in that they are byte vectors that can be created and
destroyed and can grow and shrink their size during their
lifetimes. Like a file, a single command can be used to read or
write any consecutive stream of the bytes constituting a
storage object. In addition to mapping data to storage
objects, the OSD storage management component maintains
other information about the storage objects in attributes,
e.g., size, usage quotas, associated user name, access
control, and semantic information.
A. OSD Objects
In the OSD specification, the storage objects that are
used to store regular data are called user objects. In addition, the specification defines three other kinds of objects
to assist navigating user objects, i.e., root object, partition
objects, and collection objects. There is one root object for
each OSD logical unit [18]. It is the starting point for
navigation of the structure on an OSD logical unit
analogous to a partition table for a logical unit of block
devices. User objects are collected into partitions that are
represented by partition objects. There may be any number
of partitions within a logical unit up to a specific quota
defined in the root object. Every user object belongs to one
and only one partition. The collection represented by a
collection object is a more flexible way to organize user
objects for navigation. Each collection object belongs to
one and only one partition and may contain zero or more
user objects belonging to the same partition. Unlike
user objects, the three aforementioned kinds of
navigation objects do not contain a read/write data area.
All relationships between objects are represented by object
attributes, discussed in the next section.
Various storage objects are uniquely identified within
an OSD logical unit by the combination of two identification numbers: the partition ID and the user object ID. The
OSD standard offers improved but still limited capabilities
of intelligence. We believe OSD is the beginning of
developing more sophisticated intelligent storage devices.
Other possible future features include storing semantic
information in the object attributes, indexing objects based
on the attribute values and metadata information on the
drive, using onboard computing power to respond to some
simple queries, and a small search engine to filter unwanted data objects when responding to a query.
To fully take advantage of intelligent storage devices,
the concept of a global file system has also been developed. One
possible global file system based on a hybrid architecture is
illustrated in Fig. 2. The base level of the multitier architecture contains an arbitrary number of variable-sized regions
comprising numerous intelligent OSDs under the control of
a single regional manager. These regions then are composed
into a peer-to-peer overlay network. These regions can
themselves be hierarchical and so may be divided recursively
into some appropriate number of subregions.

Fig. 2. A two-tier architecture.
B. Regional Organization
To support all of the operations necessary to make each
data object accessible from anywhere at any time, each
region, in addition to a set of OSD devices, will include the
following two components.
The regional manager serves like a metadata server and
provides centralized management of the clients and objects
within the region. It maintains the metadata for each object
whose home is within the region (as well as replicated
metadata), performs required security-related tasks, and
authenticates access requests from within and outside the
region. It also provides name services for all of the objects for
which it serves as the home region and supports consistency
management, replication, and migration. The regional
manager must monitor the storage devices registered within
the region, and it must track both permanent and visiting
clients. While the regional manager provides centralized
functionality, its actual operations can be distributed among
a cluster of servers and storage devices.
Clients are end users or applications that want to access
objects within a region. After obtaining location and access
information from the regional manager, clients can use an
Internet (IP) connection to access individual objects
directly from the appropriate storage device. Clients will
have a home region that stores important information
about each client, such as basic profile information, but
clients are not tied to a specific region and can freely move
among regions while still maintaining the same personalized view of the data objects they are allowed to access.
The home region of a client can be changed depending on
access patterns and performance-related issues. An issue
we are investigating is the appropriateness of allowing
clients to automatically change their home regions or of
limiting this to a manual operation.
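In rough form, the access path described above could look like the sketch below; the regional manager, capability, and device interfaces are hypothetical placeholders rather than an implemented system. The client asks its home regional manager for the object's location and an access credential, then reads the object directly from the OSD over IP.

    # Sketch of the two-tier access flow: metadata and authorization come from
    # the regional manager; data moves directly between client and OSD over IP.
    # All classes and calls here are hypothetical placeholders.

    class RegionalManager:
        def __init__(self, catalog):
            self.catalog = catalog                  # object_id -> OSD handle (a network address in practice)

        def lookup(self, client_id, object_id):
            """Authenticate the client and return (device, capability)."""
            device = self.catalog[object_id]
            capability = f"cap:{client_id}:{object_id}"  # stand-in for a signed credential
            return device, capability

    class IntelligentOSD:
        def __init__(self):
            self.objects = {}

        def read(self, object_id, capability):
            assert capability.endswith(str(object_id))   # stand-in for capability check
            return self.objects.get(object_id, b"")

    osd = IntelligentOSD()
    osd.objects[7] = b"object payload"
    manager = RegionalManager({7: osd})

    device, cap = manager.lookup(client_id="alice", object_id=7)  # step 1: ask the home region
    print(device.read(7, cap))                                    # step 2: direct access to the OSD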
An intelligent storage device (OSD in this case) can be
either a low-level storage device like a disk, a storage subsystem (i.e., the intelligence is implemented in the storage
controller of the storage subsystem), or a high-level storage
server. The partitioning of clients and storage devices into
regions can be done in response to logical or physical
affinities. For instance, it is likely that many of the clients
located on a single physical campus of an enterprise will
access the same subset of data objects from the overall
universe of objects (the same subset of accounts, for
example). In this case, a physical grouping of clients and
storage devices into the same region will improve performance by allowing the system to exploit the physical locality
of data accesses. In other situations, however, a logical
grouping into a region may be more appropriate, as when a
department within an enterprise tends to access a common
subset of data but is physically distributed in different locales.
Regions will dynamically merge and split as performance
characteristics within and between regions change.
Some work in the area of distributed and object-based
file systems exists. Several projects, for instance, have
investigated techniques for sharing storage devices within
a cluster environment. The Linux-based Lustre project [20],
which uses object-based storage devices (OSDs), separates
metadata from object data and allows direct data access
between storage devices and clients; it does not, however,
support data replication and migration or mobile clients,
and assumes that all devices within the cluster can be
trusted. Both Panasas’s ActiveStor6 and EMC’s Centera7
have implemented some aspects of OSD. The Direct Access
File System (DAFS) (DAFS Collaborative), which evolved
from the NFS 4.0 system [21], is based on a central server.
StorageTank [22], [23] uses a Storage Area Network (SAN)
architecture somewhat similar to the description above in
which clients obtain the metadata needed to access files
from a server cluster, while the data itself is accessed
directly from the distributed storage devices.
Another similar global file system is the OceanStore
project [24], which proposed a global-scale persistent
storage infrastructure constructed from what could potentially be millions of untrustworthy storage devices. A key
design goal of that project was to support nomadic data that
could flow freely through the network while being cached
anywhere at any time. Another project with similarities is
the SHORE project [25], which examined a hybrid two-tier
network with peer-to-peer communication between servers
in which each server manages a group of clients. In this
system, a client’s managing server provides access to both
local data and data managed by a remote server.

6 http://www.panasas.com/activestor.html.
7 http://www.emc.com/products/systems/centera.jsp?openfolder=platform.
III. FAULT-TOLERANT STORAGE
The storage systems today may consist of hundreds or
thousands of disks. At any time, the probability of having
one or more disk failures is large enough and cannot
be ignored. The fundamental way of dealing with disk
failure is to group disks into redundant arrays of independent disks (RAIDs) [26], [27]. RAID uses parity information to provide fault tolerance against disk failure and to
stripe data over multiple disks for performance.
RAID is one of the most important storage technologies. There are several RAID variants. The most popular
ones include RAID-0, RAID-1, RAID-3, RAID-5, RAID-6,
and RAID-10. In RAID, a number of disks are grouped
together as a RAID group. For RAID-0, disk space is
divided into blocks of a fixed size. A disk stripe consists of
one block from each disk in a RAID group, with different
blocks in a disk belonging to different disk stripes. The
disk space in the RAID group can be viewed as one huge
space that is partitioned into many nonoverlapping disk
stripes. When storing data, if data are greater than one disk
block, they will be stored into data blocks in the same
disk stripe. If more data blocks are needed, they will be
stored in the data blocks in the next striping units.
RAID-1 is mainly data mirroring. Data on one half of the
RAID group are mirrored on the other half of the group.
RAID-3 uses a different striping than that in RAID-0. Each
data byte is stored across eight disks (one bit on each disk)
with an additional disk allocated for storing the parity bit
for each stripe. Disks in a RAID-3 group need to be
synchronized for either reading or writing data. RAID-5
[28] is the most popular RAID version that provides fault
tolerance. It uses striping similar to RAID-0 with a
difference that a parity block is allocated in each striping
unit for storing the parity data of the striping unit. The
parity blocks are allocated in a circular fashion and evenly
among the disks. The parity block of stripe k is allocated on
disk (k mod n), where n is the number of disks in the
RAID-5 group. In RAID-6, the striped set has dual
distributed parity. This provides fault tolerance against
two disk failures. RAID-6 makes large RAID groups more
practical since the larger the RAID group is, the higher the
probability is of more than one disk failure. RAID-10 is a
combination of RAID-1 (mirroring) and RAID-0 (striping).
Disks are used in sets of two: each disk is paired with a
mirroring disk. Therefore, there are N mirrored groups.
The data are then striped over the N groups. RAID-10 can
tolerate more than one disk failure as long as the two disks
in the same mirrored group have not failed.
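The rotating parity placement described above for RAID-5 can be written down directly, as in the sketch below (the group size is an arbitrary example value).

    # RAID-5 layout sketch: in stripe k, the parity block sits on disk (k mod n)
    # and the data blocks of the stripe occupy the remaining n - 1 disks.

    def raid5_layout(stripe: int, n_disks: int) -> dict:
        parity_disk = stripe % n_disks
        data_disks = [d for d in range(n_disks) if d != parity_disk]
        return {"parity_disk": parity_disk, "data_disks": data_disks}

    for k in range(4):                   # parity rotates evenly across the disks
        print(f"stripe {k}: {raid5_layout(k, n_disks=5)}")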
RAID can be found in almost all mission-critical systems.
The parity data in RAID will be used for reconstructing the
lost data block when a disk failure occurs. When data are
written to a data block, the corresponding parity block will
also be updated with the exclusive-or (XOR) of the new and
old data blocks and the old parity block. This task is usually
referred to as a read–modify–write operation. Based on how
and where the parity blocks are computed, traditional
RAID implementations fall into two categories. The first category uses RAID controllers to process all
the corresponding tasks required in RAID functions. This
type of implementation can be referred to as hardware RAID.
A hardware RAID is more expensive because it
usually requires multiple channel interfaces for the participating disks, large memory for buffering temporary data,
processing units to execute XOR computations, and all the
other management tasks. The other category of implementation uses the host’s main processor (or CPU) instead for all
the RAID-related tasks. The advantage of such implementations is that it reduces the cost of RAID systems since it
requires no RAID controllers. On the other hand, the RAID-related tasks compete with the applications for the CPU and
memory as well as other system resources. As a result, the
applications and RAID operations may suffer poor,
unstable, and unpredictable performance.
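The read-modify-write operation mentioned above reduces to three XOR inputs, as the sketch below shows in simplified form (ignoring caching and failure handling): the new parity is the XOR of the old parity, the old data block, and the new data block.

    # RAID read-modify-write in miniature: updating one data block requires
    # reading the old data and old parity, then writing new data and
    # new_parity = old_parity XOR old_data XOR new_data.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    old_data   = bytes([1, 2, 3, 4])
    other_data = bytes([9, 9, 9, 9])
    old_parity = xor_bytes(old_data, other_data)      # parity of the whole stripe

    new_data   = bytes([5, 6, 7, 8])
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)

    # The updated parity equals a full recomputation over the new stripe contents.
    assert new_parity == xor_bytes(new_data, other_data)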
With the capacity of a disk increasing beyond
1 Tbyte and limited I/O bandwidth improvement, the
rebuild time of a failed disk becomes a concern. Rebuilding
a RAID group may take several days instead of a few hours for a disk
of 1 Tbyte. If a RAID like RAID-5 can tolerate only one disk
failure, the chance of losing data is greatly increased since
another disk failure can occur during this long rebuild
period. Since the rebuild period is long, the performance
degradation during the rebuild time also becomes a concern. This tends to favor RAID-6, since a single disk
failure will not seriously degrade the performance, and
RAID-10, since mirroring is the easiest and most efficient
way to rebuild a failed disk. RAID-10 can potentially
tolerate even more than two disk failures as long as the failed
disks are not in the same mirrored group.
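A quick back-of-the-envelope calculation illustrates why rebuild time has become a concern; the bandwidth figures below are illustrative assumptions, not measurements.

    # Rough rebuild-time estimate for a 1-Tbyte disk. Bandwidth values are
    # illustrative: a dedicated rebuild vs. a rebuild throttled by foreground I/O.

    CAPACITY_BYTES = 1e12  # 1 Tbyte

    def rebuild_hours(rebuild_mb_per_s: float) -> float:
        return CAPACITY_BYTES / (rebuild_mb_per_s * 1e6) / 3600

    print(f"dedicated at 80 MB/s : {rebuild_hours(80):5.1f} hours")
    print(f"throttled at  5 MB/s : {rebuild_hours(5):5.1f} hours (several days)")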
IV. ADDITIONAL CHALLENGES FOR CURRENT AND FUTURE STORAGE SYSTEMS
The concept of intelligent storage was first proposed in
[29]–[31], and [47] as active storage. Further extending this
concept, an intelligent storage device can recognize the data
objects stored in the device, monitor the usage of data
information, and reallocate data objects if necessary to meet
the required I/O bandwidth (object storage). It can interact
with other intelligent storage devices or metadata servers to
enforce certain data storage policy for ease of management,
for example, a duplicated copy must be maintained in
another storage device for highly critical data objects. Similar
to recently proposed self-regulating autonomic computing
systems in which processes are performed automatically in
response to internal causes or influences, the actions of
intelligent storage devices are regulated by the needs of the
data they contain (autonomic storage). Some of these possible
approaches are discussed in previous sections. In this section, we focus our
discussion on energy efficiency for storage systems,
safe data protection, and long-term
data preservation.
Rapid digitization of content has led to extreme demands
on storage systems. The nature of data access such as
simulation data dumps, check-pointing, real-time data access
queries, data warehousing queries, etc., warrants an online
data management solution. Most online data management
solutions make use of hierarchical storage management
techniques to accommodate the large volume of digital data.
In such solutions, a major portion of the data set is usually
hosted by tape-based archival solutions, which offer cheaper
storage at the cost of higher access latencies. This loss in
performance due to tape-based archive solutions limits the
performance of the higher level applications that make these
different types of data accesses. This is particularly true since
many queries may require access to older, archived data.
The decreasing cost and increasing capacity of commodity disks are rapidly changing the economics of online
storage and making the use of these large disk arrays more
practical for low-latency applications. Large disk arrays
also enable system scaling, an important property as the
growth of online content is predicted to be enormous. The
enhanced performance offered by disk-based solutions
comes at a price, however. Keeping huge arrays of spinning
disks has a hidden cost, i.e., energy.
Industry surveys suggest that the cost of powering our
country’s data centers is growing at a rate of 25% every
year [32]. Among various components of a data center,
storage is one of the biggest energy consumers, consuming
almost 27% of the total. To make matters worse, increasing
performance demands have led to disks with higher power
requirements; moreover, storage demands are growing by
60% annually, according to an industry report [33]. Given
the well-known growth in total cost of ownership, a
solution that can mitigate the high cost of power, yet keep
data online, is needed.
Various studies of data access patterns in data centers
suggest that on any given day, the total amount of data
accessed is less than 5% of the total stored [36]. Most
energy conservation techniques make use of various
optimizations to conserve energy, but this usually comes
with a huge performance penalty. Massive array of idle
disks (MAID) is a design philosophy recently adopted [34],
[35]. The central idea behind MAID is that not all disks in a
MAID storage array are spinning all the time. Within a
MAID subsystem, disks remain dormant (i.e., powered off)
until the data they hold is requested. When a request
arrives for data on a disk that is off, the controller turns on
the disk, which takes around 7–10 s, and services the
request. Additionally, a set of disks is designated as cache
disks, which are always spinning (i.e., never turned off).
This disk-based caching is necessary because the regular
memory cache is usually not large enough to hold all of
the frequently accessed data. The MAID concept works
on the assumption that less than 5% of the stored data
actually gets accessed on any given day. Keeping this in
mind, the MAID controller tries to make sure that
frequently accessed data are moved to the always-on
cache disks. For this reason, the response time of the
system is very tightly tied to the size of the cache disk set.
By increasing the cache hit ratio, the controller tries to
minimize the response time and also conserve energy.
The savings increase as the storage environments get
larger. A commercial product based on this idea, Copan
MAID, has seen a great deal of success in the realm of
archival systems. One of the main drawbacks with the
MAID approach is that it tries to keep the most frequently
accessed data in the cache disk set, but this will not
ensure good response time for noncached data. Data that
are not cached could include data being accessed for the
first time or data that cannot be cached due to their sheer
volume or the access pattern. A study of using application
hints to increase the efficiency of prefetching and to
achieve better energy efficiency is presented in [37].
Application hinting has drawn a great amount of interest
in the high-performance computing community [38],
[39]. The idea is to use application hints for the purpose
of prefetching data ahead of time, thereby reducing the
file system I/O latencies.
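A highly simplified MAID controller decision loop is sketched below; the spin-up delay and caching policy are assumptions standing in for a real controller. Requests that hit the always-on cache disks are served immediately, while a request to a dormant disk pays the spin-up delay.

    # Toy MAID controller: a few always-spinning cache disks hold popular data;
    # the rest stay powered off until requested, paying a spin-up delay.
    # Numbers and policy are illustrative assumptions.

    SPIN_UP_SECONDS = 8          # roughly the 7-10 s quoted for powering a disk on

    class MaidController:
        def __init__(self, cache_capacity=2):
            self.cache = {}                       # block -> disk, held on always-on cache disks
            self.cache_capacity = cache_capacity
            self.spinning = set()                 # archive disks currently powered on

        def read(self, block, disk):
            if block in self.cache:
                return 0.0                        # served from cache disks, no spin-up
            delay = 0.0 if disk in self.spinning else SPIN_UP_SECONDS
            self.spinning.add(disk)
            if len(self.cache) < self.cache_capacity:
                self.cache[block] = disk          # promote frequently accessed data
            return delay

    ctrl = MaidController()
    print(ctrl.read("blk A", disk=3))   # first access: spin-up penalty
    print(ctrl.read("blk A", disk=3))   # now cached: served immediately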
Other approaches to increasing energy efficiency for
storage systems are possible. A new energy conservation
technique for disk array-based network servers called
popular data concentration (PDC) was proposed in [40].
According to this scheme, frequently accessed data are
migrated to a subset of the disks. The main assumption here
is that data exhibit heavily clustered popularities. PDC tries
to lay data out across the disk array so that the first disk
stores the most popular disk data, the second disk stores the
next most popular disk data, and so on. Since data blocks
are always moved around to different locations in the disk
array, this mapping mechanism becomes very important. A
new solution called Hibernator was presented in [41]. The
main idea here is the dynamic switching of disk speeds
based on observed performance. This approach makes use
of multispeed disk drives that can run at different speeds
but have to be shut down to make a transition between
different speeds. Such disks have been demonstrated by
Sony. Several other power-aware storage cache management algorithms that provide more opportunities for the
underlying disk power management schemes to save
energy were discussed in [42]. The general consensus of
all these works is that normal cache management
algorithms are not necessarily the best option when it
comes to power conservation. Specifically, they explore the
use of spatial and temporal locality information together in
order to develop cache replacement algorithms. A new type
of hard disk drive that can operate at multiple speeds is also
explored for energy saving [43]. It was demonstrated that
using dynamic revolutions per minute (DRPM) speed
control for power management in server disk arrays can
provide large savings in power consumption with very
little degradation in delivered performance. The DRPM
technique dynamically modulates the hard disk rotation
speed so that the disk can service requests at different
RPMs. For such a scheme to work, a dynamically
adjustable speed disk must first be designed, for which
there are several manufacturing/mechanical challenges.
Designing a disk that can operate at various speeds is very
complex and almost infeasible. Of course, energy-efficient
solutions for data centers are not limited to
storage devices/systems; cooling and servers are important too. However, these topics are outside the scope
of this paper.
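The popular data concentration idea can be expressed as a simple placement rule, sketched below under the assumption that per-block popularity counts are known; it is a conceptual illustration, not the algorithm of [40] in detail.

    # Popular data concentration (PDC) in miniature: lay data out so the most
    # popular blocks land on the first disks, letting later disks idle.
    # Popularity counts and per-disk capacity are assumed inputs.

    def pdc_layout(popularity: dict, n_disks: int, blocks_per_disk: int) -> list:
        """Return a list of per-disk block lists, hottest data first."""
        ranked = sorted(popularity, key=popularity.get, reverse=True)
        return [ranked[d * blocks_per_disk:(d + 1) * blocks_per_disk]
                for d in range(n_disks)]

    counts = {"b1": 900, "b2": 850, "b3": 40, "b4": 30, "b5": 2, "b6": 1}
    layout = pdc_layout(counts, n_disks=3, blocks_per_disk=2)
    print(layout)   # disk 0 holds the hot blocks; disks 1 and 2 can spin down more often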
As for the data protection, one of the major issues is
where to protect the data. An earlier (and most common)
mechanism is to protect the communication link between the
client and the storage systems (for example, using IPsec).
In this case, the data are assumed to be stored in a safely
protected storage system without encryption. When the
storage system was trusted, this was considered to be
enough. However, as indicated by many recent security
incidents, this is not the case. When a disk is stolen, the
access control mechanism enforced in the file system is
useless. As demonstrated in [43], insider misuse is the most
common cause of system break-ins. Administrators have
full access to the storage system, and they become the
central point of social attacks. Further, the data can pass
through several different entities, such as administrators,
backup servers, and users. To prevent or mitigate the risk of
insider misuse, the data have to be encrypted by the writer
and decrypted by the reader. This rather simple-looking
transition introduces new challenges, since the overhead
for key management is now shifted to the client. The client
now has to remember (store) all keys used to encrypt any
file. When the file has to be shared with other clients,
distributing the key becomes the new problem. When the
key is lost, the client will not be able to recover any
encrypted data. Therefore, designing an efficient, transparent, and user-friendly key management mechanism is
crucial. This is especially crucial for the enterprise to adopt
data encryption. The enterprise has to control the keys in
order to keep accessing the data for many years to come.
We have identified a few challenging issues related to key
management for long-term data preservation.
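One way to picture the backup requirement discussed here is a per-object key store with an escrowed copy, as in the structural sketch below; it uses randomly generated key bytes, the class and method names are hypothetical, and it is not a recommendation for any particular cryptographic design.

    # Sketch of per-object key management for writer-encrypted data: every
    # object gets its own key, keys are indexed for efficient retrieval, and a
    # backup (escrow) copy is kept so data remain recoverable for the long term.
    import secrets

    class KeyStore:
        def __init__(self):
            self.keys = {}      # (object_id, version) -> key bytes
            self.escrow = {}    # backup copy held by a designated recovery agent

        def create_key(self, object_id, version=1):
            key = secrets.token_bytes(32)
            self.keys[(object_id, version)] = key
            self.escrow[(object_id, version)] = key   # requirement 1/3: back keys up, securely
            return key

        def delete_key(self, object_id, version=1):
            self.keys.pop((object_id, version), None)  # requirement 2: delete keys with the data
            self.escrow.pop((object_id, version), None)

        def recover_key(self, object_id, version=1):
            """Recovery path when the client's copy is lost or the user is gone."""
            return self.escrow[(object_id, version)]

    store = KeyStore()
    k = store.create_key("report-2008.doc")
    assert store.recover_key("report-2008.doc") == k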
A. Backup and Efficient Retrieval of Keys
Key backup is one of the fundamental requirements of
key management for long-term storage. The cryptographic
keys should be secured as long as the data are considered in
existence. It is almost impossible to recover encrypted
data if the associated keys are lost. This implies the
following.
1) If the data are backed up, keys must be backed up.
2) If the data are deleted, keys should be deleted for
ease of management.
3) Keys must be backed up securely.
In order to retrieve the data that were encrypted (or
signed) several years ago, the system should also retrieve
the keys that were used to encrypt (or sign) the data.
Considering the large number of keys that will be created
in an organization over time, this can be a daunting task.
As a result, efficient search and retrieval of these keys is
important.
B. Key Recovery
In practice, just like the data, keys can be lost or
accidentally deleted. Loss of keys will result in loss of data.
Therefore, the key management mechanisms must ensure
that lost keys can be easily recovered. Further, it can
happen that the user who encrypted the file may not be
available at the time of decryption (e.g., the user no longer
works for the organization). Under such circumstances,
some designated agents of the organization should be able
to retrieve the keys. Key recovery mechanisms must be
secure and efficient. The failure of key recovery mechanisms can jeopardize proper operation and the underlying
confidentiality and can also result in improper disclosure
of keys.
C. Long-Term Management
Typically, enterprises get reorganized periodically.
Reorganization can require merging or splitting of existing
groups. As a result, the keys should be reorganized as well.
Therefore, key management mechanisms should efficiently handle such reorganization activities.
Since the data and the keys should be protected for a
longer duration of time, a lot of unforeseen events can
happen. For example, the keys can be compromised, or the
cryptographic algorithm can be broken, since an
attacker has more time to attempt breaking a key or an
algorithm. If an attacker can compromise a key, s/he could
read encrypted data or even write data without being
detected since the attacker can generate a valid signature.
D. Usability
Strong security with poor usability and/or performance
is not sufficient to make the system practical. As illustrated
by [44]–[46], secure systems and cryptographic software
are used less than we would expect due to the lack of
consideration for the usability of these products. A recent
survey [46] performed by Sun Microsystems shows that
45% of the surveyed companies find storage management
to be the most expensive task, and 95% of the surveyed
companies state that it is the most difficult. Considering
these statistics, usability (by the administrator and,
naturally, by the client) is one of the major concerns. A
user should be able to use the file system that provides
long-term key management without having to know any
underlying cryptographic operations. Transparency should
be provided as much as is necessary.
E. Scalability
Due to the huge amount of data generated over the long
term, the number of keys will increase along with
the data. The key management system must scale well with
respect to the number of keys. Further, the number of keys
that have to be secured physically should be minimal.
In addition to these requirements, the long-term data
preservation by itself poses a great challenge to future
storage systems.
V. CONCLUSION
The rapidly changing computing and communication
environment has imposed many new challenges on the
current and future storage systems. It is our belief that the
storage devices and systems have to be drastically
redesigned to meet these challenges. In this paper, we
have briefly covered some of the challenges and potential solutions. However, many more studies and experiments are necessary before a storage system that can meet these challenges is developed.

Acknowledgment

This work was partially supported by NSF Grant 0621462.
REFERENCES
[1] S. Shim, Y. Wang, J. Hsieh, T.-S. Chang, and D. H. C. Du, "Efficient implementation of RAID-5 using disk based read modify writes," Dept. of Computer Science, Univ. of Minnesota, Tech. Rep., 1996.
[2] T.-S. Chang, S. Shim, and D. H. C. Du, "The designs of RAID with XOR engines on disks for mass storage systems," in Proc. 6th NASA Goddard Conf. Mass Storage Syst. Technol./15th IEEE Symp. Mass Storage Syst., Mar. 22–24, 1998, pp. 181–186.
[3] Information Technology Small Computer System Interface 2, ANSI X3.131:1994, American National Standards Institute, Sep. 7, 1993.
[4] Serial Attached SCSI (SAS), working draft, American National Standards Institute. [Online]. Available: http://www.t10.org/ftp/t10/drafts/sas2/sas2r12.pdf
[5] Fibre Channel 3rd Generation Physical Interface (FC-PH-3), ANSI X3.303:1998, American National Standards Institute, Nov. 5, 1998.
[6] Information Technology Serial Storage Architecture, Physical Layer 1 (SSA-PH1), ANSI X3.293:1996, American National Standards Institute, Jul. 22, 1996.
[7] Information Technology Serial Storage Architecture, Transport Layer 2 (SSA-TL2), Rev. 05b, ANSI NCITS.308:1997, American National Standards Institute, Apr. 1, 1997.
[8] Information Technology Serial Storage Architecture, SCSI-2 Protocol (SSA-S2P), ANSI T10.1/1051D, American National Standards Institute, Apr. 1, 1997, draft proposed rev. 05b.
[9] Fibre Channel Arbitrated Loop (FC-AL), Rev. 4.5, ANSI X3.272-1996, American National Standards Institute, Jun. 1, 1995.
[10] Fibre Channel 2nd Generation Arbitrated Loop (FC-AL-2), Rev. 6.4, NCITS 332:1999, American National Standards Institute, Mar. 26, 1996.
[11] Compaq, Intel, and Microsoft, Virtual Interface Architecture Specification, Version 1.0. [Online]. Available: ftp://download.intel.com/design/servers/vi/VI_Arch_Specification10.pdf
[12] J. Satran et al. (2002, Sep. 16). iSCSI, Internet draft. Internet Engineering Task Force (IETF). [Online]. Available: http://ietf.org/ids.by.wg/ips.html
[13] Intel iSCSI reference implementation. [Online]. Available: http://sourceforge.net/projects/intel-iscsi
[14] Y. Lu and D. Du, "Performance evaluation of iSCSI-based storage subsystem," IEEE Commun. Mag., vol. 41, pp. 76–82, Aug. 2003.
[15] S. Y. Tang, Y. Lu, and D. Du, "Performance study of software-based iSCSI security," in Proc. 1st Int. IEEE Security in Storage Workshop, 2002.
[16] M. Rajagopal et al. (2002, Aug. 28). Fibre Channel over TCP/IP (FCIP). Internet Engineering Task Force (IETF). [Online]. Available: http://ietf.org/internet-drafts/draft-ietf-ips-fcovertcpip-12.txt
[17] C. Monia et al. (2002, Aug. 2). iFCP: A protocol for Internet Fibre Channel storage networking. Internet Engineering Task Force (IETF). [Online]. Available: http://ietf.org/internet-drafts/draft-ietf-ips-ifcp-13.txt
[18] Information Technology SCSI Object-Based Storage Device Commands (OSD), working draft, Project T10/1355-D.
[19] D. Du, D. H. He et al., "Experiences on building an object-based storage system based on T10 standard," in Proc. IEEE Mass Storage Syst. Technol. Conf., 2006.
[20] P. J. Braam and M. J. Callahan, "Lustre: A SAN file system for Linux," white paper.
[21] S. Shepler, B. Callaghan, D. Robinson, R. Thurlow, C. Beame, M. Eisler, and D. Noveck, RFC 3010: NFS Version 4 Protocol, 2000.
[22] IBM Almaden Research, StorageTank. [Online]. Available: http://www.almaden.ibm.com/StorageSystems
[23] D. A. Pease, R. M. Rees, W. C. Hineman, D. L. Plantenberg, R. A. Becker-Szendy, R. Ananthanarayanan, M. Sivan-Zimet, C. J. Sullivan, R. C. Burns, and D. D. E. Long, "IBM StorageTank: A distributed storage system," white paper.
[24] J. Kubiatowicz, D. Bindel et al., "OceanStore: An architecture for global-scale persistent storage," in Proc. ACM ASPLOS, 2000.
[25] M. Carey et al., "Shoring up persistent applications," in Proc. ACM SIGMOD Int. Conf. Manage. Data, Minneapolis, MN, 1994, pp. 383–394.
[26] D. Patterson, G. Gibson, and R. Katz, "A case for redundant arrays of inexpensive disks (RAID)," in Proc. SIGMOD, Chicago, IL, Jun. 1988.
[27] P. Chen, E. Lee, G. Gibson, R. Katz, and D. Patterson, "RAID: High-performance, reliable secondary storage," ACM Comput. Surv., pp. 145–185, Jun. 1994.
[28] P. Chen and E. Lee, "Striping in a RAID level 5 disk array," in Proc. 1995 ACM SIGMETRICS Conf. Meas. Model. Comput. Syst., May 1995.
[29] G. A. Gibson et al., "File server scaling with network-attached secure disks," in Proc. ACM SIGMETRICS, 1997, pp. 272–284.
[30] E. Riedel, G. A. Gibson, and C. Faloutsos, "Active storage for large scale data mining and multimedia," in Proc. 1998 Very Large Data Bases Conf. (VLDB), Aug. 1998, pp. 62–73.
[31] H. R. Lim, V. Kapoor, C. Wighe, and D. Du, "Active disk file system: A distributed scalable file system," in Proc. IEEE/NASA MSST, 2001.
[32] American Power Conversion Corporation (APC). (2004). Determining total cost of ownership for data center and network room infrastructure. [Online]. Available: http://www.apcmedia.com/salestools/CMRP-5T9PQG_R3_EN.pdf
[33] M. Hopkins. (2004). The onsite energy generation option, Data Center J. [Online]. Available: http://datacenterjournal.com/News/Article.asp?articleid=66
[34] D. Colarelli and D. Grunwald, "Massive arrays of idle disks for storage archives," in Proc. 2002 ACM/IEEE Conf. Supercomput. (Supercomputing '02), Los Alamitos, CA, 2002, pp. 1–11.
[35] D. Colarelli, D. Grunwald, and M. Neufeld, The case for massive arrays of idle disks (MAID). [Online]. Available: http://citeseer.ist.psu.edu/colarelli02case.html
[36] K. Rajamani and C. Lefurgy, "On evaluating request distribution schemes for saving energy in server clusters," in Proc. ISPASS, 2003, pp. 111–122.
[37] N. Mandagere, J. Diehl, and D. Du, "GreenStor: Application-aided energy-efficient storage," in Proc. IEEE Mass Storage Syst. Technol. Conf., 2007.
[38] R. H. Patterson, G. A. Gibson, and M. Satyanarayanan, "Using transparent informed prefetching to reduce file read latency," in Proc. 1992 NASA Goddard Conf. Mass Storage Syst., 1992, pp. 329–342.
[39] D. Rochberg and G. Gibson, "Prefetching over a network: Early experience with CTIP," SIGMETRICS Perform. Eval. Rev., vol. 25, no. 3, pp. 29–36, 1997.
[40] E. Pinheiro and R. Bianchini, "Energy conservation techniques for disk array-based servers," in Proc. 18th Annu. Int. Conf. Supercomput. (ICS '04), New York, 2004, pp. 68–78.
[41] Q. Zhu, Z. Chen, L. Tan, Y. Zhou, K. Keeton, and J. Wilkes, "Hibernator: Helping disk arrays sleep through the winter," SIGOPS Oper. Syst. Rev., vol. 39, no. 5, pp. 177–190, 2005.
[42] A. P. Papathanasiou and M. L. Scott, "Energy efficient prefetching and caching," in Proc. USENIX 2004 Annu. Tech. Conf., 2004, pp. 255–268.
[43] S. Gurumurthi, A. Sivasubramaniam, M. Kandemir, and H. Franke, "DRPM: Dynamic speed control for power management in server class disks," in Proc. 30th Annu. Int. Symp. Comput. Architect. (ISCA '03), New York, 2003, pp. 169–181.
[44] A. Whitten and J. D. Tygar, "Why Johnny can't encrypt: A usability evaluation of PGP 5.0," in Proc. USENIX Security Symp., 1999.
[45] J. Kaiser and M. Reichenbach, "Evaluating security tools towards usable security: A usability taxonomy for the evaluation of security tools based on a categorization of user errors," in Proc. Usability, 2002.
[46] Sun Microsystems. (2003, Sep.). Importance of total cost of management. [Online]. Available: http://sg.sun.com/events/presentation/files/data_center/TCM.pdf
[47] A. Acharya, M. Uysal, and J. Saltz, "Active disks: Programming model, algorithms and evaluation," in Proc. 8th Int. Conf. Architect. Support Program. Lang. Oper. Syst. (ASPLOS), San Jose, CA, 1998.
ABOUT THE AUTHOR
David H. C. Du (Fellow, IEEE) received the B.S.
degree in mathematics from National Tsing-Hua
University, Taiwan, R.O.C., in 1974, and the M.S. and
Ph.D. degrees in computer science from the University of Washington, Seattle, in 1980 and 1981,
respectively.
He is currently a Qwest Chair Professor in the
Computer Science and Engineering Department,
University of Minnesota, Minneapolis. He has also been a Program Director with the National Science Foundation since 2006. His research interests include cyber security,
sensor networks, multimedia computing, storage systems, high-speed
networking, high-performance computing over clusters of workstations,
database design, and computer-aided design for very large-scale
integrated circuits. He has authored or coauthored more than 190
technical papers, including 95 refereed journal publications, in his
research areas. He has also graduated 49 Ph.D. and 80 M.S. students.
He is currently serving on a number of journal editorial boards. He has
been a Guest Editor for Communications of the ACM. He has been
Conference Chair and Program Committee Chair for several conferences
in multimedia, database, and networking areas.
Dr. Du is a Fellow of the Minnesota Supercomputer Institute. He has
been Guest Editor for IEEE COMPUTER.