White Paper
RAID: End of an Era?
Ceph-based FUJITSU Storage ETERNUS CD10000: an improvement over
traditional RAID-based storage systems in OpenStack cloud environments
Content
■ Introduction
■ Cloud infrastructures: The game changer
■ What does this mean for storage?
■ What is Ceph?
■ ETERNUS CD10000: The better Ceph hyperscale storage
■ Comparing ETERNUS CD10000 with RAID systems for OpenStack
– Flexibility
– Fault Tolerance & Disaster Resilience
– Exploiting Performance
– Capacity Utilization
■ RAID: End of an Era?
Introduction
RAID technology has been the fundamental building block for storage systems for many years. It
has proven successful for almost every kind of data that has been generated in the last three decades, and the various RAID levels have provided sufficient flexibility for balancing usable capacity,
performance, and redundancy up to now.
However, over the last few years cloud infrastructures have gained strong momentum
and are imposing new requirements on storage and challenging traditional RAID-based storage
systems (in the remaining text we will refer to these systems simply as “RAID systems”).
This white paper explains
■ current trends in application development, how they challenge IT operations to become faster and more agile, and the impact on storage system design.
■ the essential ideas behind a new open source storage architecture called Ceph and its implementation in an end-to-end software/hardware solution called ETERNUS CD10000, which address the previously mentioned need for speed and agility.
■ how Ceph/ETERNUS CD10000 adds new value to OpenStack-based cloud environments in contrast
to traditional RAID systems by
– automating storage scaling, i.e. adding new storage nodes and retiring old ones with automatic data migration, on the fly and with zero downtime.
– offering practically unlimited storage scalability in a single system, thereby removing the need
for multiple individual RAID systems.
– overcoming the issue of long disk recovery times and degraded performance in RAID systems
– offering disaster resilience as part of the basic product instead of using complex layers on top
of RAID systems
– reducing unnecessary copy operations in the process of spawning OpenStack VMs
– improving capacity utilization by managing all storage needed for recovery, headroom, and
growth in a single pool on a data center level.
– introducing the concept of an “immortal system”
Among other information sources, Fujitsu used the results of a proof of concept of the Fujitsu
Storage ETERNUS CD10000 cloud storage system which were provided by Karan Singh of the Center
for Scientific Computing in Helsinki, Finland.
Cloud infrastructures: The game changer
Most cloud environments are characterized by the enormous speed at which new applications are created and new versions are released. Companies that are able to rapidly deploy the latest application version in production become increasingly competitive and innovative. New infrastructure tools and processes are needed in order to avoid IT infrastructure operation becoming a bottleneck and consequently a threat to innovation.
In the meantime the term "DevOps" has become a synonym for this quest to achieve a new level of agility in the cooperation of development with IT operations. The challenge is to overcome the typical "blame game" between development and operation teams: development blames IT operations for not deploying new software versions fast enough, and IT operations blames development for delivering software versions that are not ready for production. In order to resolve this conflict, new methods and processes are being discussed between the parties, and some solution concepts have already been proposed in the direction of increased automation of quality assurance and deployment of new application software via self-service portals.
Figure 1: Agile Development vs. Lengthy Deployment
On this playing field OpenStack, an open source cloud operating system, provides the foundation for a new dynamic infrastructure that will help IT operations to cope with the enormous increase in innovation speed required by application development. According to the OpenStack Foundation, OpenStack is "a cloud operating system that controls large pools of compute, storage, and networking resources throughout a data center, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface" (for details please refer to http://www.openstack.org).

What does this mean for storage?
As a cloud operating system, OpenStack also requires storage components that allow and fully support the idea of rapid and simple deployment of new applications. However, with the only exception of the Swift object storage layer, OpenStack only defines the storage interface but leaves the actual implementation to other software and hardware. Of course traditional RAID systems are the usual suspects and are among the storage platforms supported by OpenStack. All the more surprising, then, that according to a user survey conducted by OpenStack.org in November 2014, RAID systems are no longer very popular for OpenStack-based clouds: the most prominent RAID system vendor could only reach a share of 4% among all production OpenStack installations.
The clear number one, with a share of 37%, is a new open source storage software called "Ceph" that follows a paradigm called "Software Defined Storage" (SDS). This raises a valid question: why is Ceph turning the storage market upside down as far as (OpenStack-based) cloud infrastructures are concerned? This white paper analyzes this situation and explains why Ceph has evolved to become the number one storage platform for OpenStack-based cloud infrastructures, and furthermore why Ceph could easily become the number one storage solution for any cloud-type infrastructure in the near future.
What is Ceph?
According to http://www.ceph.com “Ceph is a distributed object store and
file system designed to provide excellent performance, reliability and
scalability”. Like OpenStack, Ceph is open source and freely available.
The main motivation for developing Ceph was to overcome the limitations
of traditional storage like
■ direct attached disks
■ network attached storage
■ or large SAN-based RAID system installations
All these approaches have in common that they do not scale well into the PB range, and even if they do, they can only offer this at very high cost for the management layer involved.
The Ceph design addresses new requirements for cloud environments by
offering support for very large data sets and fully automated provisioning
of resources. Ceph supports big data environments by offering distributed
data processing and furthermore it supports the continued growth of
traditional environments.
It has also been designed
■ to support environments ranging from Terabyte to Exabyte capacities
■ to offer fault tolerance and disaster resilience with no single point
of failure
■ and to be self-managing and self-healing.
This positions Ceph as an ideal storage for OpenStack but also for other
cloud and hyperscale use cases.
The Ceph Hyperscale Approach: No data caging in RAID groups, objects instead of blocks or files, manage storage locations without meta data
In order to support scaling in capacity and performance as well as provide the necessary economy of scale in terms of price and management effort, Ceph follows three paradigms:
1. Ceph offers direct access to a distributed set of storage nodes rather
than offering access to large monolithic storage boxes caging data
in RAID groups. The storage nodes are implemented on industry-standard servers hosting "just a bunch of disks". Storage is provided
without using any sort of disk aggregation in the form of standard
RAID groups or systems. Ceph presents these nodes as a single and
practically unlimited storage repository while providing every storage
client immediate access to the node and disk in question.
2. Ceph uses objects as the basic storage entity instead of blocks, which are too primitive, or files, which are too complex. Based on this object architecture, presented via TCP/IP, Ceph offers unified parallel access to a single large storage repository through
a. a block device interface (used by OpenStack)
b. a RESTful object storage layer either in the form of S3 or OpenStack
Swift
c. and a clustered file system representing one global namespace
3. Ceph automatically places, replicates, balances, and migrates data
using a topology-aware algorithm called CRUSH (Controlled Replication
Under Scalable Hashing) that calculates storage locations instead of
looking them up, thereby avoiding unnecessary meta data.
Figure 2: A typical Ceph cluster – clients reach the storage nodes over the customer network, while replica, recovery, balancing, and migration traffic uses a separate backend network
Figure 3: The various Ceph APIs – a RESTful object storage layer (S3, Swift), a block device interface, and a file system, all built on the Ceph object store
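To make the interfaces in Figure 3 more tangible, here is a minimal access sketch using the standard Ceph Python bindings (python-rados/python-rbd). It touches the underlying object store directly and then creates a block device image of the kind OpenStack consumes. The pool, object, and image names are invented for illustration, and a reachable cluster with /etc/ceph/ceph.conf and a keyring is assumed; this is not an ETERNUS CD10000-specific procedure.

```python
# Minimal sketch: object-level and block-level access to a Ceph cluster.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')          # hypothetical pool name
try:
    # Native object interface: write and read a small object directly.
    ioctx.write_full('demo-object', b'hello ceph')
    print(ioctx.read('demo-object'))

    # Block device interface (RBD): create a 1 GiB image, the kind of
    # volume that OpenStack Cinder/Nova would consume.
    rbd.RBD().create(ioctx, 'demo-image', 1024 ** 3)
    image = rbd.Image(ioctx, 'demo-image')
    try:
        image.write(b'first block of data', 0)
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()
```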
Objects as the basic storage entity were chosen to combine the advantages of the two other approaches, block and file, as outlined in the following table:

                      Blocks         Objects                     Files
Addressing            Block number   Names in a flat namespace   Names in a complex hierarchy
Size                  Fixed          Variable                    Variable
API                   Simple         Simple                      Complex
Semantic              Primitive      Rich                        Rich
Updates               Simple         Simple                      Complex
Parallel Operations   Simple         Simple                      Complex
CRUSH: The Ceph crown jewel
A central Ceph technology element is the intelligent data distribution
across the storage nodes. In contrast to other systems, Ceph manages
read and write operations with practically no access to a central instance
and without meta data. A special data distribution algorithm called
CRUSH (Controlled Replication under Scalable Hashing) enables all
storage clients to exactly calculate (not look up) where to write and read
data within the storage cluster. This pseudo-random but still deterministic
data distribution method enables equal utilization of all involved storage
nodes in the cluster. In addition, all data can be written in a redundant
way. Using this method the system can tolerate the loss of hard drives
as well as the loss of complete storage nodes without losing data (and
all this in the absence of RAID).
In addition, lost data copies are recreated in the background and the
original redundancy level is restored automatically. New nodes can be
added easily while the system is running.
Figure 4: The CRUSH algorithm – objects are grouped into placement groups within a pool; a CRUSH ruleset maps each placement group to OSDs (Object Storage Daemons, the processes maintaining all I/O of a disk) in different failure zones, holding the primary, 2nd, and 3rd copy
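To make the "calculate instead of lookup" idea concrete, the following toy sketch derives a replica placement purely from the object name and a topology map that every client holds, spreading copies across failure zones. It is written in the spirit of CRUSH but is not the actual algorithm; the OSD ids and zone names are invented for illustration.

```python
# Toy topology-aware placement: every client derives the same replica set from
# the object name and the topology alone, without asking a metadata server.
# This illustrates the idea only; it is not the real CRUSH algorithm.
import hashlib

# Hypothetical topology: OSD id -> failure zone (in Ceph this is the CRUSH map).
TOPOLOGY = {0: 'zone-a', 1: 'zone-a', 2: 'zone-b', 3: 'zone-b', 4: 'zone-c', 5: 'zone-c'}

def place(object_name, replicas=3):
    """Deterministically choose OSDs for an object, one per failure zone."""
    chosen, used_zones = [], set()
    # Rank OSDs by a pseudo-random but stable score derived from (object, osd).
    ranked = sorted(TOPOLOGY,
                    key=lambda osd: hashlib.sha256(f'{object_name}:{osd}'.encode()).hexdigest())
    for osd in ranked:
        if TOPOLOGY[osd] not in used_zones:   # spread copies across failure zones
            chosen.append(osd)
            used_zones.add(TOPOLOGY[osd])
        if len(chosen) == replicas:
            break
    return chosen

# Any client computes the same answer, so reads and writes need no lookup.
print(place('rbd_data.volume42.0000000000000001'))
```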
The following table summarizes the main features and advantages of using CRUSH.
CRUSH (Controlled Replication under Scalable Hashing)
Data equally distributed across the whole cluster
No meta data involved in determining storage location for a given storage object
■ calculate instead of lookup
■ storage location is a pure function of the object name and the current storage topology
Automatic creation of data redundancy
Fast and automatic data recovery after losing disks and nodes
■ No system administrator involved in data recovery
■ Data copies of a disk are distributed across the whole cluster. As a consequence, the whole cluster's performance can be brought to bear while restoring the lost content of a disk or node
Automatically adapts to changes in storage configuration including load balancing
■ Disk and node failures
■ Addition and removal of disks and nodes
All adaptation and load balancing activities aim to achieve
■ Equal distribution of data also in the new configuration
■ Optimum utilization of all available I/O capacity
■ Movement of the smallest possible amount of data during load balancing
CRUSH is storage-topology-aware: In addition to equally distributing data across the whole cluster
■ It is possible to place data
- On faster or slower disks/SSDs
- On more expensive or less expensive disks
■ It is possible to place data copies
- Across disks to survive disk failures
- Across storage nodes to survive storage node failures
- Across fire sections to survive the outage of a fire section
- Across data centers to survive the outage of a complete data center
All changes to the storage configuration are application-transparent
More information can be found at
http://docs.ceph.com/docs/master/
http://docs.ceph.com/docs/master/architecture/
http://ceph.com/papers/weil-ceph-osdi06.pdf
https://www.youtube.com/watch?v=gKpM90jXZrM
ETERNUS CD10000:
The better Ceph hyperscale storage
In August 2014 Fujitsu released a combined hardware and software
solution based on Ceph, called ETERNUS CD10000. The idea behind this
concept is to address and overcome typical shortcomings of software-only approaches, especially open source projects.
In particular and in addition to pure Ceph, ETERNUS CD10000 offers
■ End-to-end integration of hardware & software also covered by a
solution maintenance contract
■ An architecture based on design criteria ensuring
– a balanced node design, avoiding bottlenecks and wasted
investment
– a scalable network topology as part of the product
■ Solid lifecycle management based on quality assured software and
hardware versions avoiding
– exploding combinations of hardware and software variants
– deteriorating cluster quality and reliability over time
■ Simplified administration and deployment, compensating for some otherwise negative effects of a software-only approach.
■ Introduction of an “immortal system” concept that allows you to
start with a defined configuration, adapt this over the years by
introducing new hardware and software and phasing out old
components while maintaining the same system identity forever.
ETERNUS CD10000 is a comprehensive integration of
■ Ceph software,
■ Quality-assured hardware,
■ Complementary software for out-of-the-box deployment, further automation, and simplified management
Like Ceph, ETERNUS CD10000 is an ideal platform for OpenStack but is
also able to serve additional cloud and hyperscale use cases. Because
ETERNUS CD10000 is an end-to-end hardware and software solution, it is also an ideal target for a comparison with today's dominating RAID-based
storage hardware platforms.
Figure 5: ETERNUS CD10000: "Ceph++" – Ceph-based cloud storage complemented by an E2E solution contract, the immortal system concept, automated deployment, automated management, and hardware/software integration
Comparing ETERNUS CD10000 with RAID
systems for OpenStack
In the subsequent chapters we will take a closer look
at the following main areas:
■ Flexibility
■ Fault Tolerance & Disaster Resilience
■ Exploiting Performance
■ Capacity Utilization
Flexibility
Cloud requirements challenge the static design of RAID systems
As already mentioned cloud environments are characterized by rapid
changes: New virtual machines are created and started frequently. Old
ones are shut down and may be retired forever. Consequently, the underlying storage is also undergoing rapid changes. New data needs to
be allocated quickly; old data needs to be freed up. As storage ages, it
needs to be retired and its content moved to new places. All these requirements seriously challenge RAID systems.
Data on a RAID system is caged in RAID groups. Even thin provisioning
does not provide enough freedom (the RAID group can only grow within
certain limits). In general, RAID systems require a lot of capacity management by defining RAID groups, extending RAID groups, defining volumes
on top of RAID groups and eventually making them available as LUNs
on a SAN (not to mention the complex task of defining host access to
the LUNs). Once data is stored on a volume (i.e. the associated RAID
group) it is stuck.
If data needs to be moved to other locations (e.g. because the RAID
system is retired), it has to be done explicitly on an application level.
This becomes very apparent when it’s no longer possible to extend a
RAID system with new shelves. Storage virtualization techniques have
been introduced to the market to address this issue but they usually
create additional costs, easily within the same range as running the
RAID systems alone.
Self-management of highly dynamic storage with ETERNUS CD10000
Managing Ceph is now simpler with ETERNUS CD10000. Scaling your
storage cluster is no longer a manual and complex task. With ETERNUS
CD10000 storage nodes can simply be added on the fly without any
downtime or any change on the OpenStack/application layer. Since
the complete ETERNUS CD10000 storage capacity can be managed as a
single entity on a data center level, the typical issues of the past when adding new storage in the form of RAID systems are overcome. In addition, old ETERNUS CD10000 storage can be phased out completely transparently to OpenStack, i.e. data migration is handled automatically and nothing needs to be done on an OpenStack level.
All this is made possible by using the notion of a volume that runs completely virtualized on a set of disks equally distributed across the complete
cluster, instead of binding it to an exact set of disks like in a RAID group.
The Ceph/ETERNUS CD10000 volume will then take advantage of a wide
striping algorithm that utilizes the full cluster bandwidth at any time.
As soon as new storage nodes are added, the volume will adapt itself to
the new layout and distribute data evenly to the disks of new nodes as
part of the CRUSH-based load balancing process. The entire operation
takes place in a fully automated way and is completely transparent to
the application layer. As a consequence, a rapidly growing Ceph/ETERNUS
CD10000 environment will always offer optimum conditions for all involved data volumes without the need to optimize data layout on an
application level.
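A small simulation illustrates why growing the cluster moves only a minimal share of the data: placement is recomputed per object, and roughly 1/N of the objects relocate when the cluster grows from N-1 to N nodes. The sketch uses a simple rendezvous-style hash, not the real CRUSH implementation, and the node names are invented.

```python
# Toy rebalancing experiment: add a fifth node and count how many objects move.
import hashlib

def node_for(obj, nodes):
    # Rendezvous-style choice: the node with the highest hash score wins,
    # so every client computes the same placement without a lookup table.
    return max(nodes, key=lambda n: hashlib.sha256(f'{obj}:{n}'.encode()).hexdigest())

objects = [f'volume-chunk-{i}' for i in range(100_000)]
before = {o: node_for(o, ['node1', 'node2', 'node3', 'node4']) for o in objects}
after = {o: node_for(o, ['node1', 'node2', 'node3', 'node4', 'node5']) for o in objects}

moved = sum(1 for o in objects if before[o] != after[o])
print(f"objects moved: {moved / len(objects):.1%}")   # roughly 20%, i.e. 1/5
```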
Figure 6: Data layout on RAID systems (a volume bound to a RAID group behind a RAID controller) vs. Ceph/ETERNUS CD10000 (a volume striped via CRUSH across all nodes)
Seamless Integration of ETERNUS CD10000 with OpenStack
The Glance program of OpenStack is responsible for storing the VM images that OpenStack instances use to spin up VMs. One of the OpenStack Glance pain points is that it does not support RAID systems (i.e. their block storage) directly. The workaround for this problem is to store the VM image repository on an NFS share.
On a RAID system this means that multiple interfaces for the different
tasks are needed, i.e. FC/SCSI/iSCSI for volume access and TCP/IP / NFS
for file access.
In contrast to this, Ceph/ETERNUS CD10000 offers volume access to the relevant OpenStack programs (Glance, Cinder, and Nova) via a single network technology (TCP/IP) and a single storage interface (block device access). Hence, the need to invest in two different network and storage technologies goes away. In addition, Ceph/ETERNUS CD10000 offers further consolidation potential through native support for the OpenStack Swift object storage interface, which is not supported by RAID systems at all.
Fault Tolerance & Disaster Resilience
There are three scenarios to consider here:
■ The failure of one or more disks
■ The failure of a data access component (RAID controller/storage node)
■ The failure of all storage belonging to a data center location

The failure of one or more disks: Long recovery times for RAID systems
In a traditional RAID system, disk failure is handled via defined RAID levels and the usage of hot spare disks. In the event of a disk failure, the RAID system reconstructs the entire data onto a spare disk. The time required for this reconstruction depends on the amount of failed data. As disk sizes become larger and larger, this way of failure handling creates more and more problems. Think of RAID systems using 2, 3, or 4 TB disk drives: in the event of a disk failure, it takes approx. 18 hours per TB to reconstruct the lost redundancy on a hot spare disk.
Depending on the RAID group's usage during the repair, the data reconstruction can last much longer, i.e. the RAID group is exposed to this vulnerability for several days if not more than a week. If another drive fails in the meantime, the situation becomes critical. In the case of RAID5 real data is lost, and even in the case of RAID6 there is a significant chance that the next disk failure will lead to real data loss (for example in the special case that a systematic failure hits the series of disks belonging to that RAID group). This situation continues to worsen with the introduction of even larger disks (6 TB or larger).
The failure of one or more disks: Use the combined power of the whole cluster for quick recovery with Ceph/ETERNUS CD10000
In contrast to a RAID system, Ceph/ETERNUS CD10000 follows a different
approach. Instead of organizing data in RAID groups consisting of a
static set of disks plus some hot spare disks, Ceph/ETERNUS CD10000
distributes all data in a wide striping fashion across the whole cluster.
From a recovery point of view the basic data entity in a Ceph/ETERNUS
CD10000 is a placement group. A placement group is an aggregation
of objects (the atomic unit in a Ceph/ETERNUS CD10000 cluster), i.e. a
placement group is simply a collection of objects that can be managed
as a single entity. For example, placement groups are replicated rather
than individual objects. As a consequence, a disk failure (and even a
node failure) is handled as a failure of placement groups residing on
that disk.
Thanks to the clever design of the CRUSH algorithm all replicas of lost
placement groups belonging to a single disk are evenly distributed
across the cluster. This offers the opportunity to use the combined power
of the whole cluster to regenerate the original redundancy level, which
makes the entire recovery operation blazing fast. Please keep in mind
that not only are the copies of lost placement groups (i.e. the source
of the restore operation) distributed across the cluster, but also the
target destination for the restored redundancy level is not located
on a single disk (like the one we just lost) but again distributed
across the cluster.
Although the basic CRUSH algorithm seeks to distribute all placement
groups evenly across the whole cluster, it is possible to influence this
behaviour. It is for example possible to define so-called failure groups
like storage nodes, racks, fire zones, and even data centers so that
CRUSH can ensure that replicas are placed across those failure groups,
so that the loss of a storage node or a fire zone etc. does not lead to the loss of all placement group replicas.
By this paradigm of equally distributing all data copies in a failure-group-aware way across the cluster
■ the data layout is extremely resilient against all sorts of failures, unlike a RAID system that is limited by its static design
■ the recreation of lost redundancy is not limited by the bandwidth of a single disk, in contrast to a RAID system (see the back-of-envelope sketch below)
■ the data layout is constantly optimized, in contrast to a RAID system that will always store the data in the same RAID group.
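A back-of-envelope comparison makes the bandwidth argument concrete. The RAID figure reuses the 18 hours per TB quoted earlier in this paper; the Ceph-side numbers (100 participating disks, a throttled 25 MB/s recovery rate per disk) are purely illustrative assumptions, not measurements.

```python
# Illustrative recovery-time comparison: single hot spare vs. whole cluster.
DISK_TB = 4                       # capacity of the failed disk
RAID_REBUILD_H_PER_TB = 18        # figure quoted earlier in this paper
raid_hours = DISK_TB * RAID_REBUILD_H_PER_TB

PARTICIPATING_DISKS = 100         # assumed disks contributing to recovery
RECOVERY_MB_S_PER_DISK = 25       # assumed throttled per-disk recovery rate
ceph_hours = (DISK_TB * 1e6) / (PARTICIPATING_DISKS * RECOVERY_MB_S_PER_DISK) / 3600

print(f"RAID hot spare rebuild: ~{raid_hours:.0f} h")
print(f"Cluster-wide recovery:  ~{ceph_hours:.1f} h")
```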
Figure 7: Data recovery on a RAID system (RAID group plus hot spare) vs. Ceph/ETERNUS CD10000 (placement groups distributed across the whole cluster)
Less impact and better resilience with Ceph/ETERNUS CD10000 while coping with the failure of a data access component
For a RAID system this effectively means the failure of the RAID controller.
For the Ceph/ETERNUS CD10000 cluster it means the failure of a storage
node. A RAID system will typically protect itself by a redundant (i.e. dual)
controller design. If one controller fails, the remaining one will take over
but will cut the available performance by half.
In contrast, a Ceph/ETERNUS CD10000 cluster consists of more than one
storage node (typically eight and more) and in case one node is lost it
will only lose a minor part of its I/O capability. Moreover, based on the
configured number of data copies and failure zones it can also survive
the loss of multiple storage nodes at any given time.
Figure 8: Controller-based RAID system vs. node-based Ceph/ETERNUS CD10000 (a dual RAID controller in front of the disks vs. CRUSH distributing I/O across many nodes)
Built-in disaster resilience with Ceph/ETERNUS CD10000
The scenario of losing all storage belonging to one data center location is not foreseen by the RAID system design and constitutes the biggest challenge. At least two RAID systems are needed in two locations; they remain two independent systems and need to be orchestrated by additional hardware and/or software (like storage virtualization) in order to become disaster resilient. All this adds complexity, which creates additional costs of at least the same magnitude as the RAID systems' TCO.
In contrast to this, a single Ceph/ETERNUS CD10000 cluster can easily
be spread across multiple fire sections (preferably three or more) or
data center locations and is thereby able to seamlessly survive larger
disasters that might impact selected data center fire sections/locations
without any additional costs and complexity.
Figure 9: Two data center architectures – two RAID systems in two locations coupled by storage virtualization or similar vs. a single Ceph/ETERNUS CD10000 cluster whose CRUSH map places copies across nodes in both locations
Exploiting Performance
At first glance achievable performance is mainly dominated by the storage media used (fast disks, slow disks, all sorts of SSDs), and RAID systems make good use of these technologies in combination with highly optimized controller techniques. Although this leaves little room for improvement, there is one area that has a direct impact on the achievable performance: to what degree is an application able to actually use the potential performance a storage system can offer?
Full performance for all resources at any time
In a highly dynamic environment like a cloud infrastructure data is quickly
changed, deleted, and newly generated. It is always hard to predict how best to distribute data initially and how this (static) layout will perform over time, with future hot spots and underutilized data areas existing side by side. In the end this can easily result in a badly
performing system, although the general I/O architecture is nearly
perfect. In contrast to RAID systems with their static storage allocation
scheme bound to individual RAID groups, Ceph/ETERNUS CD10000 follows
a different approach in which all data is equally distributed across all
available nodes (somewhat comparable to an algorithm striping data across multiple RAID systems, but with better disaster resilience).
Thanks to the built-in storage virtualization this optimized data layout
will prevail over time and remain resilient against all sorts of changes
like adding new storage nodes, losing disks, phasing out old storage
nodes and the like. Instead of remaining in its static layout all data
entities (e.g. volumes) will be rebalanced to make optimum use of all
available storage resources in the cluster while maintaining the same
identity towards the application (OpenStack VMs in our case). This leads
to a system that can maintain (and even improve) its performance
characteristics without the need to change the data layout on an application level.
No performance impact during recovery from lost disks
As mentioned earlier, the recovery times after losing disks in a RAID
system grow dramatically with the disk capacity leading to degraded
RAID groups for several days if not more than a week. This implies that
during this recovery time the performance for all volumes belonging to
this RAID group is severely impacted. In contrast, Ceph/ETERNUS CD10000
follows a much more parallelized process of involving the whole cluster
in the recreation of lost data (for details please refer to the fault tolerance
section). This approach avoids negative performance impact on selected
data areas. Especially in the Ceph implementation of ETERNUS CD10000
a clear separation of front end network from back end network data
traffic helps avoid negative performance impacts while data is recovered,
load balanced, or replicated.
Avoid unnecessary copy operations in OpenStack
Ceph/ETERNUS CD10000 offers seamless support across all relevant
OpenStack programs: For example VM images are stored in Glance
using the Ceph/ETERNUS CD10000 block interface. This enables the
usage of cloning and snapshotting while assembling production VMs
with Cinder and deploying them with Nova, i.e. all modules use the
same interface at the same time and can benefit from Ceph/ETERNUS
CD10000’s cluster and sharing capability. This has the effect that even
hundreds of VM instances can be deployed in only a few minutes.
In contrast, RAID systems use different interfaces for Glance as opposed to Cinder and Nova. The consequence is that you typically set up an NFS store for Glance and SCSI block devices from the RAID system for Cinder. This has the effect that images need to be explicitly copied between Glance-controlled and Cinder-controlled repositories (as opposed to simply snapshotting and cloning entities). Furthermore, organizations have to maintain two network infrastructures (Ethernet for NFS, FC* for SCSI block devices) at the same time.
*) although iSCSI is an alternative, most larger installations still rely on FC-based storage as far as RAID systems are concerned.
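To illustrate the snapshot/clone mechanism described above, the following sketch uses the Ceph Python bindings directly; OpenStack performs equivalent steps internally when Glance and Cinder share the same Ceph cluster. The pool, image, and snapshot names are made up for illustration, and a reachable cluster with python-rados/python-rbd installed is assumed.

```python
# Sketch of copy-on-write cloning: one "golden" image, many instantly created
# VM disks, no full data copies.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('images')           # hypothetical pool name
try:
    rbd_inst = rbd.RBD()
    # Base image such as Glance would store; layering enables cloning.
    rbd_inst.create(ioctx, 'golden-image', 10 * 1024 ** 3,
                    old_format=False, features=rbd.RBD_FEATURE_LAYERING)
    base = rbd.Image(ioctx, 'golden-image')
    try:
        base.create_snap('base-snap')
        base.protect_snap('base-snap')         # clones require a protected snapshot
    finally:
        base.close()
    # Each VM boot disk is a copy-on-write clone of the protected snapshot.
    for i in range(3):
        rbd_inst.clone(ioctx, 'golden-image', 'base-snap', ioctx, f'vm-disk-{i}',
                       features=rbd.RBD_FEATURE_LAYERING)
finally:
    ioctx.close()
    cluster.shutdown()
```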
Capacity Utilization
A typical claim brought up by people who are familiar with RAID is that RAID systems use RAID levels like RAID5 and RAID6 to protect their data, while the typical recommendation for Ceph/ETERNUS CD10000 is to use 3 data copies instead. Leaving aside the cost aspect (which puts this discussion into perspective, since Ceph/ETERNUS CD10000 offers better raw capacity prices anyway), it is worthwhile discussing this claim in more detail:
Capacity utilization in reliability-sensitive environments for frequently used data
In all reliability-sensitive environments data centers will consider the
deployment of mirrored RAID system pairs distributed across fire sections.
As a consequence, the capacity requirement doubles on top of the defined RAID level. In Ceph/ETERNUS CD10000 you can still live with 3 copies distributed across those sections. Furthermore, the usage of hot spare disks in a RAID configuration adds further "hidden" capacity.
In Ceph/ETERNUS CD10000 this way of provisioning extra capacity can
be dynamically consumed by the typical growth and headroom planning
(as opposed to the hot spare concept in RAID systems that doesn’t
allow any other usage for hot spares). Let’s have a look at the picture
for a typical OpenStack VM store:
Type   Level      Level Multiplier   Hot Spare   Hot Spare Multiplier   Fire Sections   Fire Section Multiplier   Effective Factor***
RAID   RAID10     x2                 yes         x1.05                  2               x2                        4.2
RAID   RAID5      x1.3*              yes         x1.05                  2               x2                        2.8
Ceph   3 copies   x3                 no          x1                     2               x1**                      3.0
*) We are well aware that this multiplier greatly depends on the individual implementation of RAID5 (i.e. there is a difference if a setup like 3+1 or 8+1 is used). However, due to growing disk capacities going hand in hand with growing recovery times, a 3+1 setup has become very common in the meantime.
**) Multiple fire sections are already covered by the original number of 3 copies
***) The effective factor is the product of the involved multipliers for level, hot spare, and fire sections
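As footnote ***) states, the effective factor is just the product of the three multipliers. The short calculation below reproduces the numbers in the table; the only subtlety is that the RAID5 (3+1) level multiplier is really 4/3, shown rounded as x1.3.

```python
# Reproduces the "effective factor" column above: the product of the level,
# hot spare, and fire section multipliers (values taken from the table).
def effective_factor(level, hot_spare, fire_sections):
    return level * hot_spare * fire_sections

print(f"{effective_factor(2.0, 1.05, 2):.1f}")   # RAID10, hot spares, 2 mirrored sections -> 4.2
print(f"{effective_factor(4/3, 1.05, 2):.1f}")   # RAID5 (3+1, i.e. 4/3), hot spares, 2 sections -> 2.8
print(f"{effective_factor(3.0, 1.0, 1):.1f}")    # Ceph, 3 copies already spread across sections -> 3.0
```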
This shows that the originally perceived disadvantage of Ceph/ETERNUS CD10000 is almost compensated for in the light of actual deployment schemes. The better price per raw capacity of Ceph/ETERNUS CD10000 will eventually bring this solution to the forefront in a capacity-oriented discussion.
Capacity utilization in normal environments for less frequently used data
A similar picture emerges when looking at the typical storage of less frequently used (especially less frequently written) data. Apart from the typical 3-copy concept used for production VMs in OpenStack, there is another scheme available called erasure coding. Erasure coding actually combines the advantages of less capacity overhead caused by RAID levels and the flexibility and virtualization offered by Ceph/ETERNUS CD10000. Erasure coding stores data in chunks across the whole cluster pretty much like multiple data copies do. In contrast to multiple data copies, it uses the concept of so-called data and coding chunks. Data chunks contain the actual data while coding chunks contain redundancy information. From a capacity perspective erasure coding works in a similar way to a RAID setup with k data chunks and m parity chunks. Unlike RAID, it is now possible to even store the chunks across multiple fire sections. In essence this delivers the following picture:
Type   Level             Level Multiplier   Hot Spare   Hot Spare Multiplier   Fire Sections   Fire Section Multiplier   Effective Factor***
RAID   RAID6             x1.25*             yes         x1.05                  1               x1                        1.3
Ceph   EC**** k=2, m=1   x1.5               no          x1                     3               x1**                      1.5
*) Assuming an 8+2 setup
**) Multiple fire sections are already covered by the original number of 3 data/coding chunks
***) The effective factor is the product of the involved multipliers for level, hot spare, and fire sections
****) The parameter k defines the number of data chunks and the parameter m defines the number of coding (parity) chunks.
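The capacity multiplier of an erasure-coded pool follows directly from the chunk counts: k data chunks plus m coding chunks cost (k + m) / k times the raw data size, before any additional cross-location factor. A one-line check reproduces the EC values used in this and the following table:

```python
# Capacity multiplier of erasure coding: (k data chunks + m coding chunks) / k.
def ec_multiplier(k, m):
    return (k + m) / k

print(ec_multiplier(2, 1))   # k=2, m=1 -> 1.5 (table above)
print(ec_multiplier(4, 1))   # k=4, m=1 -> 1.25 (used in the next table)
```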
Capacity utilization in reliability-sensitive environments for less frequently used data
It now becomes obvious that, although using a similar redundancy overhead at first glance, Ceph/ETERNUS CD10000 offers the much more resilient concept of distributing data across 3 fire sections as opposed to storing the RAID data in just one. This effectively means that a Ceph/ETERNUS CD10000 configuration can survive a disaster striking one fire section without any data loss. A similar picture emerges when considering storage across two geographically separated locations:
Type   Level            Level Multiplier   Hot Spare   Hot Spare Multiplier   Fire Sections   Fire Section Multiplier   Effective Factor**
RAID   RAID6            x1.25*             yes         x1.05                  2               x2                        2.6
Ceph   EC*** k=4, m=1   x1.25              no          x1                     2               x2                        2.5
*) Assuming an 8+2 setup
**) The effective factor is the product of the involved multipliers for level, hot spare, and fire sections
***) The parameter k defines the number of data chunks and the parameter m defines the number of coding (parity) chunks.
Apart from these very obvious calculations there are still some more hidden aspects that shed further light on the overall capacity discussion:
■ Storage capacity planning for a Ceph/ETERNUS CD10000 configuration
is simplified because all planning for headroom, growth rate, and
data recovery is consolidated in a single data-center-wide pool. In
contrast, RAID systems need separate planning based on individual
RAID groups and per system hot spares.
■ A typical characteristic of RAID systems is that, for performance reasons, storage extensions have to be provided with the same disk types and RAID aggregation levels as the ones already in use. Furthermore, any disk replacement must happen with the identical type of disk already in use. With Ceph/ETERNUS CD10000 it is possible to use new node types with different characteristics at any point in time to extend or replace existing storage. Ceph/ETERNUS CD10000 uses a weighting mechanism for its disks, so different disk sizes are not a problem: data is stored based on intelligently managed disk weights (see the sketch below). Unlike RAID, Ceph/ETERNUS CD10000 offers a vendor-agnostic storage concept and works nicely on heterogeneous storage node types, providing a unified solution for file, block, and object storage out of the box.
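The disk weighting mentioned in the last point can be illustrated with a small, hedged sketch. It uses weighted rendezvous hashing, which is not the actual Ceph implementation, and the disk names and weights are invented; it merely shows how bigger disks deterministically receive a proportionally larger share of the objects.

```python
# Weight-aware deterministic placement (weighted rendezvous hashing) as an
# illustration of mixing heterogeneous disk sizes; not the real Ceph code.
import hashlib
import math
from collections import Counter

DISK_WEIGHTS = {'osd.0': 2.0, 'osd.1': 2.0, 'osd.2': 4.0, 'osd.3': 8.0}  # e.g. TB per disk

def pick_disk(obj):
    def score(disk):
        h = int(hashlib.sha256(f'{obj}:{disk}'.encode()).hexdigest(), 16)
        u = (h % 10**9 + 1) / (10**9 + 1)           # pseudo-random, stable, in (0, 1)
        return -DISK_WEIGHTS[disk] / math.log(u)    # weighted rendezvous score
    return max(DISK_WEIGHTS, key=score)

counts = Counter(pick_disk(f'object-{i}') for i in range(100_000))
print(counts)   # object counts land roughly in the 2:2:4:8 weight ratio
```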
RAID: End of an Era?
“RAID: End of an Era?” This is the question posed by this white paper, and it still needs a conclusion.
We can sum it up with the statement: “It depends.”
Two scenarios need to be distinguished:
■ We have seen that for cloud-based environments (we used OpenStack as an example) the infrastructure requirements are substantially different and dominated by the need for speed and agility, calling for hyperscale storage that is self-managing and self-healing. In this environment RAID systems struggle to deliver the right functions at the right cost level.
■ In contrast, there is still a significant demand for environments seeking to deliver the optimum performance and service level for specific legacy applications. These environments are less dominated by cloud infrastructure requirements, and here traditional RAID systems will remain a good choice because Ceph/ETERNUS CD10000 cannot fully play to its strengths.

Published by
Fujitsu Limited
Copyright © 2015 Fujitsu Limited
All rights reserved, including intellectual property rights. Technical data subject to modifications and delivery
subject to availability. Any liability that the data and illustrations are complete, actual or correct is excluded.
Designations may be trademarks and/or copyrights of the respective manufacturer, the use of which by third
parties for their own purposes may infringe the rights of such owner.
For further information see ts.fujitsu.com/terms_of_use.html
www.fujitsu.com/eternus