Meeting the Challenge of Large Data

The IT industry is abuzz with advice on what to do about Big Data, but some organizations are facing another important topic that receives much less attention: Large Data.
Large Data refers to the need to store individual pieces of data that are massive
in size, such as audio, high-definition video, and graphics (see Figure 1). Examples
of Large Data include an archive of videos, such as security videos for a
department store or footage for a movie production company; seismic maps generated
by oil and gas companies for exploration; and high-resolution digital X-rays and 3D
ultrasound scans stored by a hospital.
Large Data Characteristics
Some important characteristics of Large Data are:
• Large total size for the archive (over 200TB)
• Large individual pieces of data stored on the system
• Reliable but relatively infrequent access to stored resources
• Potential need for future expansion and scale

SUSE Enterprise Storage is a comprehensive storage solution tailored for the Large Data environment. You get what you need, and you don't pay for what you don't need. Behind the scenes, SUSE Enterprise Storage adds additional savings with fault-tolerant, self-healing Ceph technology that minimizes administration and maintenance. The result is a powerful solution optimized for Large Data customers.

Conventional block storage solutions cost around $1-2 per GB for installations in the 100TB to 1PB range. With SUSE Enterprise Storage, the cost is often only 30 cents per GB, dropping to as low as 10 cents per GB for installations above 10PB.

The problems in Large Data are compounded when it comes to the total size of the storage system. A 10GB video file is almost ten thousand times bigger than a standard Microsoft Word document, which means you would need ten thousand times more capacity to store an equal number of files. If you attempt to address your Large Data storage needs with a conventional storage solution, the total cost of deploying and operating the archive increases with the storage capacity, which is infeasible for many IT budgets.
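The arithmetic behind these cost claims can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 500TB archive size and the 1MB size for a typical Word document are hypothetical, decimal units are assumed (1TB = 1000GB), and the per-GB prices are the figures quoted in this paper.

```python
GB_PER_TB = 1000  # assumption: decimal units (1 TB = 1000 GB)


def archive_cost(capacity_tb, cents_per_gb):
    """Total cost in dollars for an archive of capacity_tb terabytes."""
    return capacity_tb * GB_PER_TB * cents_per_gb / 100


def size_ratio(large_file_gb, small_file_mb):
    """How many times larger a file of large_file_gb is than one of small_file_mb."""
    return large_file_gb * 1000 / small_file_mb


capacity_tb = 500  # hypothetical archive size, chosen for illustration

# Per-GB prices quoted in this paper, expressed in cents.
conventional_low = archive_cost(capacity_tb, 100)   # $1 per GB
conventional_high = archive_cost(capacity_tb, 200)  # $2 per GB
object_storage = archive_cost(capacity_tb, 30)      # 30 cents per GB

print(f"Conventional block storage: ${conventional_low:,.0f} to ${conventional_high:,.0f}")
print(f"At 30 cents per GB:         ${object_storage:,.0f}")

# A 10GB video compared with a Word document assumed to be about 1MB.
print(f"Size ratio: {size_ratio(10, 1):,.0f}x")
```

Because the cost is a simple product of capacity and price per GB, any reduction in the per-GB price carries through the whole archive, which is why the price difference matters more as the archive grows.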
Scalable storage to deal with Large Data emerged
because data vendors and customers were not
satisfied with the prospect of letting storage costs
scale linearly to very large implementations. As you
will learn in this paper, a storage solution specifically
designed for the needs of Large Data can deliver
savings that aren't possible if you depend on
conventional storage technologies.
Figure 1: Large Data scenarios encompass a wide range of
use cases, but they have one common feature: files are large.
One file might be bigger than hundreds of the spreadsheets
and still-shot images you find on a conventional disk.
Low-Latency/High Overhead
When hard disk technology began to emerge many years ago, disks were small, and system administrators spent a lot of time tending to the storage system to make sure it didn't fill up. Users had to make choices about which files to save and which to delete. Files that might need occasional access even if they weren't in active use were relegated to offline storage.

Offline storage usually consisted of some kind of archival storage medium, which could be anything from a floppy disk, to a magnetic tape, to a DVD. Offline storage was quite cumbersome and inefficient, typically requiring a human to store and classify disks in some kind of library-like storage collection, but offline storage served an important function: preserving the expensive, low-latency block storage tier for files that required frequent access.

As storage capacities increased over time, storage vendors were all too happy to offer solutions that provided more and bigger hard disks so that all the data could simply reside in conventional, low-latency block storage. This solution was not especially efficient from a cost viewpoint, but rising disk capacities obscured the cost penalty.

The emergence of Large Data as a distinctive market puts the block storage cost penalty in the foreground. Large Data buyers need a solution that offers a significantly lower cost than pouring all the storage resources into conventional block storage. At the same time, a Large Data solution must provide:
• Convenient, always-on live access to all data
• Fault tolerance
• A self-healing architecture that minimizes maintenance

An alternative solution that can meet these needs and deliver big savings for your Large Data customers is object storage.
Understanding Object Storage
Conventional block storage breaks a data file into fixed-sized blocks, placing the blocks
at different (typically discontiguous) places on the disk (Figure 2). A file saved through
block storage in a Large Data storage environment might consist of hundreds of blocks.
Managing and manipulating the storage blocks requires system resources. Perhaps more
importantly, some component of the system must serve as the central source for
information on where all the blocks are stored, which can cause a processing bottleneck.
Object storage, on the other hand, saves a file without breaking it into separate blocks.
The location is derived using a hash of the filename and metadata, which means the
system is not dependent on a central lookup table.
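The idea of deriving a location from a hash can be illustrated with a minimal Python sketch. It is not Ceph's actual placement algorithm (Ceph uses its own scheme, CRUSH); the `locate` function and the node names are hypothetical, and for simplicity only the object name is hashed rather than the name plus metadata.

```python
import hashlib


def locate(object_name, nodes):
    """Pick a storage node from a hash of the object name.

    Any client that knows the node list computes the same answer,
    so no central lookup table is consulted.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer and map it to a node.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

for name in ["xray-0001.dcm", "survey-2016.segy", "camera7-0412.mp4"]:
    print(name, "->", locate(name, nodes))
```

A plain modulo mapping like this one moves most objects when the node list changes, which is one reason production systems use more sophisticated placement functions. The sketch only shows why a computed location removes the need for a central index.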
Object storage simplifies the handoff from the client to the storage system, creating the
potential for innovation and automation within the storage system.
Ceph is a powerful and versatile solution based on object storage.
Figure 2: Block storage systems divide the file into blocks, storing the blocks separately on the disk.
Discover Ceph
Ceph builds the promise of object storage into a real-world, self-healing, fault-tolerant framework. Unlike other enterprise-ready storage solutions, which often require the purchase of expensive, proprietary hardware, Ceph runs in a cluster formation on commodity hardware. You decide what hardware you wish to use based on a price you negotiate with your hardware vendor. The Ceph object store is accessible from a variety of storage client technologies (see the box entitled “Interfaces”), which means you can deploy Ceph in a number of ways depending on your needs and your network configuration.

The Ceph object store is a self-managing, self-contained system. Once you get your Ceph cluster up and running, it stores and retrieves files with little or no intervention. Ceph has been called “valet parking” for file storage. A file shows up at the front door, and Ceph does the rest, managing the process automatically and invisibly. And Ceph doesn't just stuff the data on a disk; the data is stored redundantly within the cluster, using replication or a fault-tolerance technology known as erasure coding, to ensure the seamless recovery of your data if a disk is lost.

If one of the nodes in the cluster goes down, Ceph senses the loss and redistributes the data to other systems, restoring full fault tolerance and providing continuous operation. If your Large Data archive grows beyond its capacity, simply add another node to the cluster. Ceph integrates the new node automatically.

The automation and native fault tolerance built into Ceph result in huge savings. A good rule of thumb for conventional block storage is that it requires around 1 storage admin per 500TB of data. Because a Ceph cluster manages itself, a Ceph storage admin can administer as much as 3-4PB of data, and some networks report up to 10PB per admin.

SUSE Enterprise Storage: The Best Way to Leverage Ceph
SUSE Enterprise Storage takes the promise of Ceph and packages it into a form that is easy to use and implement. The storage experts at SUSE bring value to Ceph by adding:
• Easy deployment: SUSE assembles and integrates a helpful collection of deployment and management tools to simplify administration and extend productivity
• Hardware certification: SUSE tests and certifies common hardware to minimize hardware headaches and IT overhead
• Support: the SUSE team provides several options for comprehensive, inexpensive customer support that will keep your systems running and minimize the need for on-site expertise

The nature of the Large Data environment means that storage archives tend to grow with time. Unlike other Ceph-based options, which charge based on the amount of data in your archive, SUSE offers per-node licensing, which means the price of your network won't go up just because you store more data.
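The staffing rule of thumb quoted in this paper (around 1 admin per 500TB for conventional block storage, versus 3-4PB or more per admin for Ceph) can be turned into a quick estimate. The Python sketch below is illustrative only: the 6PB archive size is hypothetical, decimal units are assumed (1PB = 1000TB), and real staffing depends on many factors the rule of thumb ignores.

```python
import math


def admins_needed(archive_tb, tb_per_admin):
    """Number of storage admins required, rounded up to a whole person."""
    return math.ceil(archive_tb / tb_per_admin)


archive_tb = 6000  # hypothetical 6PB archive, assuming 1PB = 1000TB

# Capacity one admin can manage, taken from the rule of thumb in this paper.
block_storage = admins_needed(archive_tb, 500)   # 1 admin per 500TB
ceph_low = admins_needed(archive_tb, 3000)       # 3PB per admin
ceph_high = admins_needed(archive_tb, 10000)     # 10PB per admin (best reported case)

print(f"Conventional block storage: {block_storage} admins")
print(f"Ceph: between {ceph_high} and {ceph_low} admins")
```

For the hypothetical 6PB archive, the rule of thumb gives 12 admins for conventional block storage and 1 to 2 for Ceph.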
Interfaces
Ceph is based on object storage, but a robust and versatile system of interfaces makes the Ceph
object store accessible from many different kinds of storage systems (see Figure 3). In addition to an
object storage interface, your Ceph cluster offers interfaces for block storage, a RESTful gateway for
web services, and a network filesystem for access from file service clients.
Figure 3: Ceph supports a number of different file storage interfaces and therefore can serve in a number of different roles.
[Diagram labels: Block Storage Clients; RESTful Gateway; File Server Clients; SUSE Enterprise Storage; Storage Cluster; Local Cloud; Object Storage; iSCSI (Network Block Storage)]
Questions for your Storage Vendor
If you are comparing storage solutions to address your Large Data problem, keep the following questions in mind:
• What is the total cost of the solution in $ per GB?
• How many storage admins will you need to administer the system?
• How easy is it to expand the system after you deploy it? Does the cost rise when you add more data?
• What is the cost of migrating the data to newer hardware when the warranty of the storage being procured expires?
• Is the solution based on Open Source technology, or is it subject to vendor lock-in and proprietary control?
• Does the solution work with commodity hardware?
If you consider all these questions carefully, Ceph and SUSE Enterprise Storage emerge as a sensible solution for your Large Data storage needs.

Conclusion
Pouring 100% of your storage resources into conventional block storage is inefficient and counter-productive if you work in a Large Data environment. Ceph is a powerful alternative built on commodity hardware and object storage technology. SUSE Enterprise Storage puts the power of Ceph into a form that is easy to implement and deploy. The SUSE team adds hardware certification and support, and a per-node licensing scheme means you won't pay more when you add more data to your archive. Ceph and SUSE Enterprise Storage bring the total cost of your Large Data archive down to as low as 40 cents per GB, one third the cost of using conventional block storage for the same scenario.

If your environment has a need for storing jumbo-sized files in a Large Data setting, and your storage solution requires (or might someday require) more than 200TB in storage capacity, consider the benefits of Ceph and SUSE Enterprise Storage.
For more information
800-796-3700 U.S./Canada
801-861-4500 Worldwide
Learn more at suse.com
SUSE
Maxfeldstrasse 5
90409 Nuremberg
Germany
Part number
163-000006-001
© 2016 SUSE. All rights reserved.
SUSE and the SUSE logo are registered trademarks of SUSE, LLC in
the United States and other countries. All third-party trademarks are
the property of their respective owners.