White Paper
RAID: End of an Era?
Ceph-based FUJITSU Storage ETERNUS CD10000: an improvement over traditional RAID-based storage systems in OpenStack cloud environments

Content
■ Introduction
■ Cloud infrastructures: The game changer
■ What does this mean for storage?
■ What is Ceph?
■ ETERNUS CD10000: The better Ceph hyperscale storage
■ Comparing ETERNUS CD10000 with RAID systems for OpenStack
  – Flexibility
  – Fault Tolerance & Disaster Resilience
  – Exploiting Performance
  – Capacity Utilization
■ RAID: End of an Era?

Introduction
RAID technology has been the fundamental building block for storage systems for many years. It has proven successful for almost every kind of data generated in the last three decades, and the various RAID levels have so far provided sufficient flexibility for balancing usable capacity, performance, and redundancy. Over the last few years, however, cloud infrastructures have gained strong momentum, imposing new requirements on storage and challenging traditional RAID-based storage systems (in the remaining text we will refer to these simply as “RAID systems”).

This white paper explains
■ current trends in application development, how they challenge IT operations to become faster and more agile, and the impact on storage system design.
■ the essential ideas behind a new open source storage architecture called Ceph and its implementation in an end-to-end (E2E) software/hardware solution called ETERNUS CD10000, which addresses the need for speed and agility mentioned above.
■ how Ceph/ETERNUS CD10000 adds new value to OpenStack-based cloud environments in contrast to traditional RAID systems by
  – automating storage scaling, i.e. adding new storage and retiring old storage with automatic data migration, on the fly and with zero downtime.
  – offering practically unlimited storage scalability in a single system, thereby removing the need for multiple individual RAID systems.
  – overcoming the issue of long disk recovery times and degraded performance in RAID systems.
  – offering disaster resilience as part of the basic product instead of relying on complex layers on top of RAID systems.
  – reducing unnecessary copy operations in the process of spawning OpenStack VMs.
  – improving capacity utilization by managing all storage needed for recovery, headroom, and growth in a single pool at data center level.
  – introducing the concept of an “immortal system”.

Among other information sources, Fujitsu used the results of a proof of concept of the Fujitsu Storage ETERNUS CD10000 cloud storage system provided by Karan Singh of the Center for Scientific Computing in Helsinki, Finland.

Cloud infrastructures: The game changer
Most cloud environments are characterized by the enormous speed at which new applications are created and new versions are released. Companies that are able to rapidly deploy the latest application version in production become increasingly competitive and innovative. New infrastructure tools and processes are needed to prevent IT infrastructure operation from becoming a bottleneck and consequently a threat to innovation.
In the meantime the term “DevOps” has become a synonym for this quest to achieve a new level of agility in the cooperation between development and IT operations. The challenge is to overcome the typical “blame game” between development and operations teams: development blames IT operations for not deploying new software versions fast enough, and IT operations blames development for delivering software versions that are not ready for production. In order to resolve this conflict, new methods and processes are being discussed between the parties, and some solution concepts have already been proposed in the direction of increased automation of quality assurance and deployment of new application software via self-service portals.

Figure 1: Agile Development vs. Lengthy Deployment

On this playing field OpenStack, an open source cloud operating system, provides the foundation for a new dynamic infrastructure that helps IT operations cope with the enormous increase in innovation speed required by application development. According to the OpenStack Foundation, OpenStack is “a cloud operating system that controls large pools of compute, storage, and networking resources throughout a data center, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface” (for details please refer to http://www.openstack.org).

What does this mean for storage?
As a cloud operating system, OpenStack also requires storage components that allow and fully support the idea of rapid and simple deployment of new applications. However (with the sole exception of the Swift object storage layer), OpenStack only defines the storage interface and leaves the actual implementation to other software and hardware. Of course, traditional RAID systems are the usual suspects and are among the storage platforms supported by OpenStack. It is therefore all the more surprising that, according to a user survey conducted by OpenStack.org in November 2014, RAID systems are no longer very popular for OpenStack-based clouds: the most prominent RAID system vendor reached a share of only 4% among all production OpenStack installations.

The clear number one, with a share of 37%, is a new open source storage software called “Ceph” that follows a paradigm called “Software Defined Storage” (SDS). This raises a valid question: Why is Ceph turning the storage market upside down as far as (OpenStack-based) cloud infrastructures are concerned? This white paper analyzes this situation and explains why Ceph has evolved to become the number one storage platform for OpenStack-based cloud infrastructures, and furthermore why Ceph could easily become the number one storage solution for any cloud-type infrastructure in the near future.

What is Ceph?
According to http://www.ceph.com, “Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability”. Like OpenStack, Ceph is open source and freely available. The main motivation for developing Ceph was to overcome the limitations of traditional storage like
■ direct attached disks,
■ network attached storage,
■ or large SAN-based RAID system installations.
All these approaches have in common that they do not scale well into the PB range, and even if they do, they can only offer this at very high cost for the management layer involved. The Ceph design addresses the new requirements of cloud environments by offering support for very large data sets and fully automated provisioning of resources.
Ceph supports big data environments by offering distributed data processing, and it also supports the continued growth of traditional environments. It has been designed
■ to support environments ranging from terabyte to exabyte capacities,
■ to offer fault tolerance and disaster resilience with no single point of failure,
■ and to be self-managing and self-healing.
This positions Ceph as an ideal storage platform for OpenStack, but also for other cloud and hyperscale use cases.

The Ceph hyperscale approach: no data caging in RAID groups, objects instead of blocks or files, managing storage locations without metadata
In order to support scaling in capacity and performance as well as provide the necessary economy of scale in terms of price and management effort, Ceph follows three paradigms:
1. Ceph offers direct access to a distributed set of storage nodes rather than access to large monolithic storage boxes that cage data in RAID groups. The storage nodes are implemented on industry-standard servers hosting “just a bunch of disks”. Storage is provided without any sort of disk aggregation in the form of standard RAID groups or systems. Ceph presents these nodes as a single, practically unlimited storage repository while giving every storage client immediate access to the node and disk in question.
2. Ceph uses objects as the basic storage entity instead of blocks (too primitive) or files (too complex). Based on this object architecture, presented via TCP/IP, Ceph offers unified parallel access to a single large storage repository through
   a. a block device interface (used by OpenStack),
   b. a RESTful object storage layer in the form of either S3 or OpenStack Swift,
   c. and a clustered file system representing one global namespace.
3. Ceph automatically places, replicates, balances, and migrates data using a topology-aware algorithm called CRUSH (Controlled Replication Under Scalable Hashing) that calculates storage locations instead of looking them up, thereby avoiding unnecessary metadata.

Figure 2: A typical Ceph cluster
Figure 3: The various Ceph APIs

Objects as the basic storage entity were chosen to combine the advantages of the two other approaches, block and file, as outlined in the following table. A minimal example of working with such objects follows the table.

Criterion           | Blocks       | Objects                   | Files
Addressing          | Block number | Names in a flat namespace | Names in a complex hierarchy
Size                | Fixed        | Variable                  | Variable
API                 | Simple       | Simple                    | Complex
Semantic            | Primitive    | Rich                      | Rich
Updates             | Simple       | Simple                    | Complex
Parallel Operations | Simple       | Simple                    | Complex
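To make the notion of objects addressed by name through a simple API concrete, here is a minimal sketch using the Python bindings of Ceph’s native object interface (librados). It assumes a reachable cluster, a readable /etc/ceph/ceph.conf, and an existing pool; the pool name, object name, and payload are illustrative only.

```python
import rados

# Connect to the cluster using the local Ceph configuration (assumed to exist)
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Open an I/O context on an existing pool (the pool name "data" is an assumption)
ioctx = cluster.open_ioctx('data')

# Objects are addressed purely by name in a flat namespace: write one, read it back
ioctx.write_full('vm-image-0001.meta', b'{"format": "raw", "size_gb": 20}')
print(ioctx.read('vm-image-0001.meta'))

ioctx.close()
cluster.shutdown()
```

The client talks directly to the storage nodes responsible for the object; there is no central data path component in between, which is part of what allows the flat object namespace to scale.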
CRUSH: The Ceph crown jewel
A central element of the Ceph technology is the intelligent data distribution across the storage nodes. In contrast to other systems, Ceph manages read and write operations with practically no access to a central instance and without metadata. A special data distribution algorithm called CRUSH (Controlled Replication Under Scalable Hashing) enables all storage clients to calculate exactly (not look up) where to write and read data within the storage cluster (a simplified sketch of this idea follows at the end of this section). This pseudo-random but still deterministic data distribution method enables equal utilization of all storage nodes involved in the cluster. In addition, all data can be written in a redundant way. Using this method the system can tolerate the loss of hard drives as well as the loss of complete storage nodes without losing data (and all this in the absence of RAID). In addition, lost data copies are recreated in the background and the original redundancy level is restored automatically. New nodes can be added easily while the system is running.

Figure 4: The CRUSH algorithm (OSD = Object Storage Daemon, the process maintaining all I/O of a disk)

The following list summarizes the main features and advantages of using CRUSH (Controlled Replication Under Scalable Hashing):
■ Data is distributed equally across the whole cluster.
■ No metadata is involved in determining the storage location of a given storage object:
  – calculate instead of look up;
  – the storage location is a pure function of the object name and the current storage topology.
■ Automatic creation of data redundancy.
■ Fast and automatic data recovery after losing disks and nodes:
  – no system administrator is involved in data recovery;
  – the copies of a disk’s data are distributed across the whole cluster, so the performance of the whole cluster can be used while restoring the lost content of a disk or node.
■ Automatic adaptation to changes in the storage configuration, including load balancing:
  – disk and node failures;
  – addition and removal of disks and nodes.
■ All adaptation and load balancing activities pursue the goals of
  – equal distribution of data in the new configuration,
  – optimum utilization of all available I/O capacity,
  – moving the smallest possible amount of data during load balancing.
■ CRUSH is storage-topology-aware: in addition to distributing data equally across the whole cluster,
  – data can be placed on faster or slower disks/SSDs, or on more or less expensive disks;
  – data copies can be placed across disks to survive disk failures, across storage nodes to survive storage node failures, across fire sections to survive the outage of a fire section, or across data centers to survive the outage of a complete data center.
■ All changes to the storage configuration are transparent to applications.

More information can be found at
http://docs.ceph.com/docs/master/
http://docs.ceph.com/docs/master/architecture/
http://ceph.com/papers/weil-ceph-osdi06.pdf
https://www.youtube.com/watch?v=gKpM90jXZrM
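The “calculate instead of look up” idea can be illustrated with a deliberately simplified sketch: every client hashes the object name together with each candidate node and picks the highest-ranking nodes. This is not the actual CRUSH algorithm (which works on placement groups and a hierarchical, weighted topology), but a toy model of deterministic, weight-aware placement; all node names and weights below are made up.

```python
import hashlib

# Hypothetical flat topology: storage node -> relative weight (e.g. capacity), illustrative only
NODES = {'node1': 1.0, 'node2': 1.0, 'node3': 1.0, 'node4': 2.0, 'node5': 2.0}

def draw(obj_name, node, weight):
    # Deterministic pseudo-random score per (object, node) pair, biased by the node weight.
    # The real CRUSH "straw" buckets use a different weighting scheme; this is only a sketch.
    h = hashlib.sha256(f'{obj_name}:{node}'.encode()).hexdigest()
    return (int(h, 16) / 16.0 ** 64) * weight

def place(obj_name, replicas=3):
    # Rank all nodes by their score and take the top ones. Every client computes exactly
    # the same ranking, so no lookup table or central metadata service is needed.
    ranked = sorted(NODES, key=lambda n: draw(obj_name, n, NODES[n]), reverse=True)
    return ranked[:replicas]

print(place('vol01-chunk-0007'))  # same result on every client
```

When a node is added or removed, only the objects whose ranking changes have to move, which is the property that keeps rebalancing traffic small.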
ETERNUS CD10000: The better Ceph hyperscale storage
In August 2014 Fujitsu released a combined hardware and software solution based on Ceph, called ETERNUS CD10000. The idea behind this concept is to address and overcome the typical shortcomings of software-only approaches, especially open source projects.

In particular, and in addition to pure Ceph, ETERNUS CD10000 offers
■ end-to-end integration of hardware and software, also covered by a solution maintenance contract;
■ an architecture based on design criteria ensuring
  – a balanced node design, avoiding bottlenecks and wasted investment,
  – a scalable network topology as part of the product;
■ solid lifecycle management based on quality-assured software and hardware versions, avoiding
  – exploding combinations of hardware and software variants,
  – deteriorating cluster quality and reliability over time;
■ simplified administration and deployment, compensating some otherwise negative effects of a software-only approach;
■ the concept of an “immortal system” that allows you to start with a defined configuration and adapt it over the years by introducing new hardware and software and phasing out old components while maintaining the same system identity forever.

ETERNUS CD10000 is a comprehensive integration of
■ Ceph software,
■ quality-assured hardware,
■ complementary software for out-of-the-box deployment, further automation, and simplified management.

Like Ceph, ETERNUS CD10000 is an ideal platform for OpenStack but is also able to serve additional cloud and hyperscale use cases. Because ETERNUS CD10000 is an end-to-end hardware and software solution, it is also an ideal candidate for a comparison with today’s dominating RAID-based storage hardware platforms.

Figure 5: ETERNUS CD10000: “Ceph++”

Comparing ETERNUS CD10000 with RAID systems for OpenStack
In the subsequent chapters we take a closer look at the following main areas:
■ Flexibility
■ Fault Tolerance & Disaster Resilience
■ Exploiting Performance
■ Capacity Utilization

Flexibility
Cloud requirements challenge the static design of RAID systems
As already mentioned, cloud environments are characterized by rapid change: new virtual machines are created and started frequently, while old ones are shut down and may be retired forever. Consequently, the underlying storage also undergoes rapid change. New data needs to be allocated quickly; old data needs to be freed up. As storage hardware ages, it needs to be retired and its content moved elsewhere. All these requirements seriously challenge RAID systems. Data on a RAID system is caged in RAID groups. Even thin provisioning does not provide enough freedom (a RAID group can only grow within certain limits). In general, RAID systems require a lot of capacity management: defining RAID groups, extending RAID groups, defining volumes on top of RAID groups, and eventually making them available as LUNs on a SAN (not to mention the complex task of defining host access to the LUNs). Once data is stored on a volume (i.e. the associated RAID group) it is stuck. If data needs to be moved to another location (e.g. because the RAID system is retired), this has to be done explicitly at application level. This becomes very apparent when it is no longer possible to extend a RAID system with new shelves. Storage virtualization techniques have been introduced to the market to address this issue, but they usually create additional costs, easily within the same range as running the RAID systems alone.
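For contrast with the provisioning chain just described, carving out a new thin-provisioned volume on a Ceph-based system typically takes only a few calls. The following is a hedged sketch using the python-rbd bindings; it assumes an existing pool named “volumes” and makes no claim about the additional management tooling bundled with ETERNUS CD10000.

```python
import rados
import rbd

# Connect and open the pool that will hold the volumes (pool name is illustrative)
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')

# Create a 100 GiB thin-provisioned RADOS block device. No RAID group sizing,
# LUN masking, or SAN zoning step is needed before a client can use it.
rbd.RBD().create(ioctx, 'vol01', 100 * 1024 ** 3)

ioctx.close()
cluster.shutdown()
```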
Self-management of highly dynamic storage with ETERNUS CD10000
Managing Ceph becomes simpler with ETERNUS CD10000: scaling the storage cluster is no longer a manual and complex task. With ETERNUS CD10000, storage nodes can simply be added on the fly without any downtime or any change on the OpenStack/application layer. Since the complete ETERNUS CD10000 storage capacity can be managed as a single entity at data center level, the typical issues of the past when adding new storage in the form of RAID systems are overcome. In addition, old ETERNUS CD10000 storage can be phased out completely transparently for OpenStack, i.e. data migration is handled automatically and nothing needs to be done at the OpenStack level.

All this is made possible by the notion of a volume that is completely virtualized on a set of disks distributed equally across the complete cluster, instead of being bound to an exact set of disks as in a RAID group. The Ceph/ETERNUS CD10000 volume then takes advantage of a wide striping algorithm that utilizes the full cluster bandwidth at any time. As soon as new storage nodes are added, the volume adapts itself to the new layout and distributes data evenly to the disks of the new nodes as part of the CRUSH-based load balancing process. The entire operation takes place in a fully automated way and is completely transparent to the application layer. As a consequence, a rapidly growing Ceph/ETERNUS CD10000 environment always offers optimum conditions for all data volumes involved, without the need to optimize the data layout at application level.

Figure 6: Data Layout on RAID Systems vs. Ceph/ETERNUS CD10000

Seamless integration of ETERNUS CD10000 with OpenStack
The Glance program of OpenStack is responsible for storing the VM images that OpenStack instances use to spin up VMs. One of the OpenStack Glance pain points is that it does not support RAID systems directly; the workaround is to store the VM image repository on an NFS share. On a RAID system this means that multiple interfaces are needed for the different tasks, i.e. FC/SCSI/iSCSI for volume access and TCP/IP with NFS for file access. In contrast, Ceph/ETERNUS CD10000 offers volume access to the relevant OpenStack programs (Glance, Cinder, and Nova volumes) via a single network technology (TCP/IP) and a single storage interface (block device access). Hence, the need to invest in two different network and storage technologies goes away. In addition, Ceph/ETERNUS CD10000 offers further consolidation potential through its native support for the OpenStack Swift object storage interface, which is not supported by RAID systems at all.
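As an illustration of this single-interface integration, the fragment below lists the handful of options that point Glance and Cinder at RBD pools on the same Ceph cluster. They are rendered with Python’s ConfigParser purely for compactness; in a real deployment they live in glance-api.conf and cinder.conf, exact option names vary slightly between OpenStack releases, and all pool names, user names, and the secret UUID are placeholders, not an ETERNUS CD10000 reference configuration.

```python
import sys
from configparser import ConfigParser

# Glance: store images directly as RBD volumes (values are placeholders)
glance = ConfigParser()
glance['glance_store'] = {
    'stores': 'rbd',
    'default_store': 'rbd',
    'rbd_store_pool': 'images',
    'rbd_store_user': 'glance',
    'rbd_store_ceph_conf': '/etc/ceph/ceph.conf',
}

# Cinder: serve block volumes from the same cluster via the RBD driver
cinder = ConfigParser()
cinder['DEFAULT'] = {'enabled_backends': 'ceph'}
cinder['ceph'] = {
    'volume_backend_name': 'ceph',
    'volume_driver': 'cinder.volume.drivers.rbd.RBDDriver',
    'rbd_pool': 'volumes',
    'rbd_user': 'cinder',
    'rbd_ceph_conf': '/etc/ceph/ceph.conf',
    'rbd_secret_uuid': '00000000-0000-0000-0000-000000000000',
}

# Print the resulting INI fragments for inspection
glance.write(sys.stdout)
cinder.write(sys.stdout)
```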
Fault Tolerance & Disaster Resilience
There are three scenarios to consider here:
■ the failure of one or more disks,
■ the failure of a data access component (RAID controller/storage node),
■ the failure of all storage belonging to a data center location.

The failure of one or more disks: long recovery times for RAID systems
In a traditional RAID system, disk failure is handled via the defined RAID levels and the use of hot spare disks. In the event of a disk failure, the RAID system reconstructs the entire data set onto a spare disk. The time required for this reconstruction depends on the amount of failed data, and as disk sizes become larger and larger, this way of handling failures creates more and more problems. Think of RAID systems using 2, 3, or 4 TB disk drives: in the event of a disk failure, it takes approx. 18 hours per TB to reconstruct the lost redundancy on a hot spare disk. Depending on how the RAID group is used during the repair, the data reconstruction can last much longer, i.e. the RAID group is exposed to this vulnerability for several days if not more than a week. If another drive fails in the meantime, the situation becomes critical: in the case of RAID5 real data is lost, and even with RAID6 there is a significant chance that the next disk failure will lead to real data loss (for example, in the special case that a systematic failure hits the series of disks belonging to that RAID group). This situation continues to worsen with the introduction of even larger disks (6 TB or larger).

The failure of one or more disks: use the combined power of the whole cluster for quick recovery with Ceph/ETERNUS CD10000
In contrast to a RAID system, Ceph/ETERNUS CD10000 follows a different approach. Instead of organizing data in RAID groups consisting of a static set of disks plus some hot spare disks, Ceph/ETERNUS CD10000 distributes all data in a wide striping fashion across the whole cluster. From a recovery point of view, the basic data entity in a Ceph/ETERNUS CD10000 cluster is a placement group. A placement group is an aggregation of objects (the atomic unit in a Ceph/ETERNUS CD10000 cluster), i.e. simply a collection of objects that can be managed as a single entity; for example, placement groups are replicated rather than individual objects. As a consequence, a disk failure (and even a node failure) is handled as a failure of the placement groups residing on that disk. Thanks to the clever design of the CRUSH algorithm, the replicas of the placement groups lost with a single disk are evenly distributed across the cluster. This offers the opportunity to use the combined power of the whole cluster to regenerate the original redundancy level, which makes the entire recovery operation blazingly fast (a back-of-the-envelope comparison follows at the end of this subsection). Please keep in mind that not only are the copies of the lost placement groups (i.e. the source of the restore operation) distributed across the cluster, but the target destination for the restored redundancy is also not a single disk (like the one we just lost) but again distributed across the cluster.

Although the basic CRUSH algorithm seeks to distribute all placement groups evenly across the whole cluster, it is possible to influence this behaviour. It is, for example, possible to define so-called failure groups such as storage nodes, racks, fire zones, and even data centers, so that CRUSH can ensure that replicas are placed across those failure groups, thereby preventing the loss of a storage node or a fire zone etc. from leading to the loss of all replicas of a placement group.

Through this paradigm of distributing all data copies equally and in a failure-group-aware way across the cluster,
■ the data layout is extremely resilient against all sorts of failures, unlike a RAID system that is limited by its static design;
■ the recreation of lost redundancy is not limited by the bandwidth of a single disk, in contrast to a RAID system;
■ the data layout is constantly optimized, in contrast to a RAID system that will always store the data in the same RAID group.

Figure 7: Data Recovery on a RAID system vs. Ceph/ETERNUS CD10000
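To put the two recovery models side by side, here is a deliberately rough sketch. The 18 hours per TB for a RAID hot-spare rebuild is the figure quoted above; the number of participating disks and the per-disk recovery rate on the cluster side are assumptions made purely for illustration, not measured ETERNUS CD10000 values.

```python
# Back-of-the-envelope comparison of rebuild behaviour, for illustration only.

def raid_rebuild_hours(disk_tb, hours_per_tb=18.0):
    # Classic RAID: the whole disk is rewritten onto a single hot spare,
    # so the spare disk is the bottleneck (18 h/TB taken from the text above).
    return disk_tb * hours_per_tb

def distributed_recovery_hours(disk_tb, participating_disks=100, tb_per_disk_hour=0.05):
    # Ceph-style recovery: the lost placement groups are re-replicated by many
    # disks in parallel, so the work is divided across the cluster.
    # Both parameters here are assumed values, not product specifications.
    return disk_tb / (participating_disks * tb_per_disk_hour)

print(raid_rebuild_hours(4))          # roughly 72 hours for a 4 TB disk
print(distributed_recovery_hours(4))  # under an hour with these assumptions
```

The point is not the exact numbers but the scaling behaviour: the RAID rebuild is bottlenecked by one spare disk, while the distributed recovery gets faster as the cluster grows.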
Less impact and better resilience with Ceph/ETERNUS CD10000 when coping with the failure of a data access component
For a RAID system this effectively means the failure of a RAID controller; for the Ceph/ETERNUS CD10000 cluster it means the failure of a storage node. A RAID system typically protects itself with a redundant (i.e. dual) controller design: if one controller fails, the remaining one takes over but cuts the available performance in half. In contrast, a Ceph/ETERNUS CD10000 cluster consists of more than one storage node (typically eight or more), and if one node is lost it only loses a minor part of its I/O capability. Moreover, depending on the configured number of data copies and failure zones, it can also survive the loss of multiple storage nodes at any given time.

Figure 8: Controller-based RAID system vs. node-based Ceph/ETERNUS CD10000
Built-in disaster resilience with Ceph/ETERNUS CD10000
The scenario of losing all storage belonging to one data center location is not foreseen by the RAID system design and constitutes the biggest challenge. At least two RAID systems are needed in two locations; they constitute two independent systems that need to be orchestrated by additional hardware and/or software (such as storage virtualization) in order to become disaster resilient. All this adds complexity, which creates additional costs of at least the same magnitude as the RAID systems’ TCO. In contrast, a single Ceph/ETERNUS CD10000 cluster can easily be spread across multiple fire sections (preferably three or more) or data center locations and is thereby able to seamlessly survive larger disasters that impact individual data center fire sections or locations, without any additional cost or complexity.

Figure 9: Two data center architectures with RAID systems vs. Ceph/ETERNUS CD10000

Exploiting Performance
At first glance, achievable performance is mainly determined by the storage media used (fast disks, slow disks, all sorts of SSDs), and RAID systems make good use of these technologies in combination with highly optimized controller techniques. Although this leaves little room for improvement, there is one area that has a direct impact on the achievable performance: to what degree is an application actually able to use the potential performance a storage system can offer?

Full performance for all resources at any time
In a highly dynamic environment like a cloud infrastructure, data is quickly changed, deleted, and newly generated. It is always hard to predict how best to distribute data initially and how this (static) layout will perform over time, with future hot spots and underutilized data areas existing in parallel. In the end this can easily result in a badly performing system, even though the general I/O architecture is nearly perfect. In contrast to RAID systems with their static storage allocation scheme bound to individual RAID groups, Ceph/ETERNUS CD10000 follows a different approach in which all data is distributed equally across all available nodes (somewhat comparable to an algorithm striping data across multiple RAID systems, but with better disaster resilience). Thanks to the built-in storage virtualization, this optimized data layout will prevail over time and remain resilient against all sorts of changes, such as adding new storage nodes, losing disks, or phasing out old storage nodes. Instead of remaining in a static layout, all data entities (e.g. volumes) are rebalanced to make optimum use of all available storage resources in the cluster while maintaining the same identity towards the application (OpenStack VMs in our case). This leads to a system that can maintain (and even improve) its performance characteristics without the need to change the data layout at application level.

No performance impact during recovery from lost disks
As mentioned earlier, the recovery times after losing disks in a RAID system grow dramatically with disk capacity, leading to degraded RAID groups for several days if not more than a week. During this recovery time, the performance of all volumes belonging to the affected RAID group is severely impacted. In contrast, Ceph/ETERNUS CD10000 follows a much more parallelized process, involving the whole cluster in the recreation of lost data (for details please refer to the fault tolerance section). This approach avoids a negative performance impact on selected data areas. In the Ceph implementation of ETERNUS CD10000 in particular, a clear separation of front-end and back-end network traffic helps avoid negative performance impacts while data is recovered, load balanced, or replicated.

Avoid unnecessary copy operations in OpenStack
Ceph/ETERNUS CD10000 offers seamless support across all relevant OpenStack programs: for example, VM images are stored in Glance using the Ceph/ETERNUS CD10000 block interface. This enables the use of cloning and snapshotting while assembling production VMs with Cinder and deploying them with Nova, i.e. all modules use the same interface at the same time and can benefit from Ceph/ETERNUS CD10000’s cluster and sharing capability (sketched below). The effect is that even hundreds of VM instances can be deployed in only a few minutes. In contrast, RAID systems use different interfaces for Glance as opposed to Cinder and Nova: you typically set up an NFS store for Glance and SCSI block devices on the RAID system. As a result, images need to be explicitly copied between Glance-controlled and Cinder-controlled repositories (as opposed to simply snapshotting and cloning entities). Furthermore, organizations have to maintain two network infrastructures (Ethernet for NFS, FC* for SCSI block devices) at the same time.

*) Although iSCSI is an alternative, most larger installations still rely on FC-based storage as far as RAID systems are concerned.
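The following sketch illustrates the snapshot-and-clone mechanism referred to above, using the python-rbd bindings. Pool and image names are placeholders, the golden image is assumed to have been filled with an operating system image beforehand, and keyword details can differ slightly between Ceph releases; this is an illustration of the copy-on-write idea, not a Glance or Cinder implementation.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
images = cluster.open_ioctx('images')    # pool used for VM images (assumed)
volumes = cluster.open_ioctx('volumes')  # pool used for boot volumes (assumed)

# A "golden" VM image stored once; format 2 with layering enables cloning
rbd.RBD().create(images, 'ubuntu-golden', 20 * 1024 ** 3,
                 old_format=False, features=rbd.RBD_FEATURE_LAYERING)

# Snapshot the image and protect the snapshot so clones can be based on it
img = rbd.Image(images, 'ubuntu-golden')
img.create_snap('base')
img.protect_snap('base')
img.close()

# Spawn boot volumes as copy-on-write clones: no bulk copy of image data occurs
for i in range(3):
    rbd.RBD().clone(images, 'ubuntu-golden', 'base',
                    volumes, 'vm-%04d-boot' % i,
                    features=rbd.RBD_FEATURE_LAYERING)

images.close()
volumes.close()
cluster.shutdown()
```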
Capacity Utilization
A typical claim brought up by people who are familiar with RAID is that RAID systems use RAID levels like RAID5 and RAID6 to protect their data, while the typical recommendation for Ceph/ETERNUS CD10000 is to use 3 data copies instead. Leaving aside the cost aspects (which put this discussion into perspective, since Ceph/ETERNUS CD10000 allows better raw capacity prices anyway), it is worthwhile discussing this claim in more detail.

Capacity utilization in reliability-sensitive environments for frequently used data
In all reliability-sensitive environments, data centers will consider deploying mirrored RAID system pairs distributed across fire sections. This doubles the capacity requirements on top of the defined RAID level. With Ceph/ETERNUS CD10000 you can still live with 3 copies distributed across those sections. Furthermore, the use of hot spare disks in a RAID configuration adds another “hidden” capacity requirement. In Ceph/ETERNUS CD10000, this kind of extra capacity can be consumed dynamically as part of the usual growth and headroom planning (as opposed to the hot spare concept in RAID systems, which does not allow any other use of the hot spares). Let’s have a look at the picture for a typical OpenStack VM store:

Type | Level    | Level multiplier | Hot spare | Hot spare multiplier | Fire sections | Fire section multiplier | Effective factor***
RAID | RAID10   | x2    | yes | x1.05 | 2 | x2   | 4.2
RAID | RAID5    | x1.3* | yes | x1.05 | 2 | x2   | 2.8
Ceph | 3 copies | x3    | no  | x1    | 2 | x1** | 3.0

*) We are well aware that this multiplier greatly depends on the individual implementation of RAID5 (i.e. it makes a difference whether a 3+1 or an 8+1 setup is used). However, because growing disk capacities go hand in hand with growing recovery times, a 3+1 setup has become very common in the meantime.
**) Multiple fire sections are already covered by the original number of 3 copies.
***) The effective factor is the product of the multipliers for level, hot spare, and fire sections (e.g. RAID10: 2 x 1.05 x 2 = 4.2).

This shows that the originally perceived disadvantage of Ceph/ETERNUS CD10000 is almost compensated in the light of actual deployment schemes. The better raw capacity price of Ceph/ETERNUS CD10000 will eventually bring this solution to the forefront of a capacity-oriented discussion.

Capacity utilization in normal environments for less frequently used data
A similar picture emerges when looking at the typical storage of less frequently used (especially less frequently written) data. Apart from the typical 3-copy concept used for production VMs in OpenStack, another scheme is available: erasure coding. Erasure coding combines the advantage of the lower capacity overhead of RAID levels with the flexibility and virtualization offered by Ceph/ETERNUS CD10000. Erasure coding stores data in chunks across the whole cluster, much like multiple data copies do. In contrast to multiple data copies, it uses the concept of so-called data and coding chunks: data chunks contain the actual data, while coding chunks contain redundancy information. From a capacity perspective, erasure coding works in a similar way to a RAID setup with k data chunks and m parity chunks. Unlike RAID, it is now possible to store the chunks across multiple fire sections. In essence this delivers the following picture:

Type | Level           | Level multiplier | Hot spare | Hot spare multiplier | Fire sections | Fire section multiplier | Effective factor***
RAID | RAID6           | x1.25* | yes | x1.05 | 1 | x1   | 1.3
Ceph | EC**** k=2, m=1 | x1.5   | no  | x1    | 3 | x1** | 1.5

*) Assuming an 8+2 setup.
**) Multiple fire sections are already covered by the original number of 3 data/coding chunks.
***) The effective factor is the product of the multipliers for level, hot spare, and fire sections.
****) The parameter k defines the number of data chunks and the parameter m defines the number of coding (parity) chunks.
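The effective factors in these tables, and in the two-location table of the next subsection, are simply the product of the individual multipliers. A short sketch makes the arithmetic explicit; the RAID5 factor is written as 4/3 for the 3+1 setup mentioned in the footnote, all other values are taken directly from the tables.

```python
# Effective factor = redundancy multiplier x hot spare multiplier x fire section multiplier
def effective_factor(redundancy, hot_spare, fire_sections):
    return round(redundancy * hot_spare * fire_sections, 1)

def ec_overhead(k, m):
    # k data chunks plus m coding chunks are stored per k chunks of user data
    return (k + m) / k

print(effective_factor(2.0,   1.05, 2))  # RAID10, mirrored across two fire sections -> 4.2
print(effective_factor(4 / 3, 1.05, 2))  # RAID5 (3+1), mirrored                     -> 2.8
print(effective_factor(3.0,   1.0,  1))  # Ceph, 3 copies spread over the sections   -> 3.0
print(effective_factor(1.25,  1.05, 1))  # RAID6 (8+2), single fire section          -> 1.3
print(effective_factor(ec_overhead(2, 1), 1.0, 1))  # Ceph EC k=2, m=1               -> 1.5
print(effective_factor(ec_overhead(4, 1), 1.0, 2))  # Ceph EC k=4, m=1, two sites    -> 2.5
```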
Capacity utilization in reliability-sensitive environments for less frequently used data
It now becomes obvious that, although the same redundancy level is used at first glance, Ceph/ETERNUS CD10000 offers the much more resilient concept of distributing data across 3 fire sections as opposed to storing the RAID data in just one. This effectively means that a Ceph/ETERNUS CD10000 configuration can survive a disaster striking one fire section without any data loss. A similar picture emerges when considering storage across two geographically separated locations:

Type | Level          | Level multiplier | Hot spare | Hot spare multiplier | Fire sections | Fire section multiplier | Effective factor**
RAID | RAID6          | x1.25* | yes | x1.05 | 2 | x2 | 2.6
Ceph | EC*** k=4, m=1 | x1.25  | no  | x1    | 2 | x2 | 2.5

*) Assuming an 8+2 setup.
**) The effective factor is the product of the multipliers for level, hot spare, and fire sections.
***) The parameter k defines the number of data chunks and the parameter m defines the number of coding (parity) chunks.

Apart from these very obvious calculations, there are some additional, more hidden aspects that shine a particular light on the overall capacity discussion:
■ Storage capacity planning for a Ceph/ETERNUS CD10000 configuration is simplified because all planning for headroom, growth rate, and data recovery is consolidated in a single data-center-wide pool. In contrast, RAID systems need separate planning based on individual RAID groups and per-system hot spares.
■ A typical characteristic of RAID systems is that, for performance reasons, storage extensions have to be provided with the same disk types and RAID aggregation levels as those already in use. Furthermore, any disk replacement must use the identical type of disk already in use. With Ceph/ETERNUS CD10000 it is possible to use new node types with different characteristics at any point in time to extend or replace existing storage. Ceph/ETERNUS CD10000 uses a weighting mechanism for its disks, so different disk sizes are not a problem: data is stored based on disk weight, which is managed intelligently. Unlike RAID, Ceph/ETERNUS CD10000 offers a vendor-agnostic storage concept and works well on heterogeneous storage node types, providing a unified solution for file, block, and object storage out of the box.
RAID: End of an Era?
“RAID: End of an Era?” is the question in the title of this white paper, and it still needs an answer. The short answer is: it depends. Two scenarios need to be distinguished:
■ We have seen that for cloud-based environments (we used OpenStack as an example) the infrastructure requirements are substantially different, dominated by the need for speed and agility and calling for hyperscale storage that is self-managing and self-healing. In this environment, RAID systems have problems delivering the right functions at the right cost level.
■ In contrast, there is still significant demand for environments seeking to deliver optimum performance and service levels for specific legacy applications. These environments are less dominated by cloud infrastructure requirements, and here traditional RAID systems will remain a good choice because Ceph/ETERNUS CD10000 cannot fully unfold its strengths.

Published by Fujitsu Limited
Copyright © 2015 Fujitsu Limited
www.fujitsu.com/eternus
All rights reserved, including intellectual property rights. Technical data subject to modifications and delivery subject to availability. Any liability that the data and illustrations are complete, actual or correct is excluded. Designations may be trademarks and/or copyrights of the respective manufacturer, the use of which by third parties for their own purposes may infringe the rights of such owner. For further information see ts.fujitsu.com/terms_of_use.html