
Micron® SolidScale™ Platform Architecture for Cassandra™
Big Data Needs
Big Performance
Today it is no longer enough to simply store data. You need to
be able to glean actionable information from it—whether it is
sensor data from thousands of IoT devices distributed around
the world, sales transaction data from some of the world’s
largest ecommerce websites, or data that may lead to finding
the newest energy reserve. As data grows at an ever-increasing
rate, it becomes harder to process that data and
convert it into useful information. Apache Cassandra™ software
is one of the industry’s most popular NoSQL database
solutions, powering some of the world’s most precise data
analytics applications. It is designed to provide fast, scalable,
and optimized access to dynamic, unstructured data. Deployed
as part of larger analytic solutions such as Apache Spark,
Elasticsearch® and Apache Hadoop® MapReduce, Cassandra
can put an incredible demand on a storage infrastructure.
Cassandra is designed to take advantage of a scale-out,
“shared-nothing” resource infrastructure where a portion of the data is
stored in each local server. When your administrators need
additional performance, they add servers (nodes) to
the cluster. This expansion can leave you paying for unused
storage and can require additional administration resources.
Analytics solutions need exceedingly fast access to all types of
data — structured, unstructured, semi-structured — and
Cassandra is a proven, high-performance solution, especially
when paired with fast, low-latency NVMe™ SSDs. SSDs are
traditionally installed inside the servers in the cluster to gain
the performance advantages of PCIe®. Ideally, it would be
more efficient for administrators to share these devices and
manage them as a single pool of capacity and IOPS, but this is
difficult to do with server-local storage. Administrators can turn
to storage area networks (SANs) to share storage, but
traditional SANs carry their own limitations and costs when
used with scale-out application solutions like Cassandra, and
there are currently no SAN options that provide the full
performance benefits of NVMe at scale. This is where
leveraging a high-speed, low-latency infrastructure such as the
new Micron® SolidScale™ architecture can provide the
flexibility you need to take full advantage of those NVMe SSD
resources.
Get Aboard the Data Express
for Your Cassandra Solution
SolidScale architecture is the next-generation, intelligent
infrastructure providing scale-out applications, like Cassandra,
with all of the raw performance administrators, users and
automated systems demand from server-local storage. It has all
of the flexibility, manageability and scalability of SANs,
leveraging the latest NVMe and PCIe standards with a low-latency,
high-bandwidth RDMA over Converged Ethernet
(RoCE) fabric to connect compute and storage. By decoupling
your application servers and storage, the SolidScale
architecture allows you the flexibility to deploy compute and
storage resources to meet your needs over time. The SolidScale
infrastructure can be deployed in storage-centric (centralized,
dedicated storage nodes) or compute-centric (nodes that can
run both applications and provide storage services to the data
center) configurations or by using a combination of both
deployment strategies at the same time.
By combining all of your NVMe SSD storage into a single pool,
the SolidScale architecture unlocks all of the capacity and IOPS
of the aggregated NVMe SSDs. Allowing more SSDs to
contribute to each Cassandra server’s storage needs potentially
enables you to run the same workload with fewer storage
devices. Simply configure logical volumes of any desired size
using the capacity of one or more physical NVMe devices in
one or more SolidScale nodes to optimize your Cassandra
deployment’s needs.
A basic SolidScale solution consists of three storage nodes
configured as a clustered storage resource in a high-availability
configuration to prevent data loss. For Cassandra, SolidScale
Infrastructure Manager allows you to create logical volumes in
a striped (RAID 0) configuration to scale the performance of
your data volumes to more SSD devices than you could in a
typical in-server storage Cassandra deployment. The SolidScale
infrastructure gives you access to thousands of NVMe SSDs
across hundreds of SolidScale nodes in a single storage cluster.
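To illustrate how a striped (RAID 0) logical volume spreads I/O across several SSDs, here is a minimal sketch of the standard RAID 0 address calculation. It is not the SolidScale implementation, which this brief does not describe. The function name, the stripe size and the device count are all hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    device_index: int   # which SSD in the striped set holds the byte
    device_offset: int  # byte offset within that SSD


def locate(logical_offset: int, num_devices: int, stripe_size: int) -> StripeLocation:
    """Map a logical-volume byte offset to a device using plain RAID 0 striping.

    All parameters are illustrative assumptions, not SolidScale settings.
    """
    if logical_offset < 0 or num_devices < 1 or stripe_size < 1:
        raise ValueError("offset must be >= 0; devices and stripe size must be >= 1")
    # Which stripe unit the offset falls in, and the position inside that unit.
    stripe_number, within_stripe = divmod(logical_offset, stripe_size)
    # Stripe units are dealt round-robin across the devices, one row at a time.
    row, device_index = divmod(stripe_number, num_devices)
    return StripeLocation(device_index, row * stripe_size + within_stripe)


# Hypothetical volume: 3 SSDs with a 4-byte stripe unit (tiny, for readability).
for offset in (0, 4, 8, 12, 13):
    print(offset, locate(offset, num_devices=3, stripe_size=4))
```

Because consecutive stripe units land on different devices, large or concurrent requests are serviced by several SSDs at once, which is why adding devices to the striped set can scale volume performance.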
The SolidScale architecture is built on a low-latency, 100Gb
RoCE fabric to connect application servers to their NVMe SSDs.
RoCE networks are the latest high-performance interconnect
technology specifically designed for memory and storage data
transfer between independent server resources. While it’s
expected that any networked solution will introduce some
additional latency to the end-to-end I/O, our initial testing
shows that the RoCE network introduces an average of only
5µs additional latency.
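To put the 5µs figure in context, the following sketch computes the added fabric latency as a percentage of a device's own latency. Only the 5µs average comes from our testing. The device latencies in the loop are hypothetical values chosen for illustration, not measurements of any Micron SSD.

```python
ROCE_ADDED_LATENCY_US = 5.0  # average added latency reported in this brief


def fabric_overhead_percent(device_latency_us: float,
                            added_latency_us: float = ROCE_ADDED_LATENCY_US) -> float:
    """Return the fabric's added latency as a percentage of device latency."""
    if device_latency_us <= 0:
        raise ValueError("device latency must be positive")
    return 100.0 * added_latency_us / device_latency_us


# Hypothetical device latencies in microseconds (assumptions, not test data).
for assumed_device_latency_us in (50.0, 100.0, 200.0):
    overhead = fabric_overhead_percent(assumed_device_latency_us)
    print(f"{assumed_device_latency_us:.0f} us device latency -> {overhead:.1f}% overhead")
```

The relative cost of the fabric shrinks as the latency of the rest of the I/O path grows, so the impact on an application depends on its end-to-end latency, not on the fabric alone.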
Early Testing
We tested two Cassandra cluster configurations: the first used
server-local NVMe SSDs and the second used a SolidScale
storage cluster for the database repository. With each
configuration we used the Yahoo Cloud Serving Benchmark
(YCSB) workloads A–D and F to reflect a broad set of cloud
application workload types and their respective I/O read and
write profiles. Table 1 shows the YCSB workload definitions.
Workload(1)   Operation Mix (R/W)   Description
A             50% / 50%             Session store recording actions in a user session
B             95% / 5%              Photo tagging
C             100% / 0%             User profile cache (typical for Hadoop solutions)
D             95% / 5%              Insert records and read latest inserted records multiple times
E             95% / 5%              Threaded conversation perusal
F             50% / 50%             Read, modify, write database activity
Table 1: YCSB Workload Definitions
1. We did not test workload E (short ranges) as it is not universally
supported on NoSQL-type databases.
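As a quick illustration of what the operation mixes in Table 1 mean in practice, this sketch splits a total operation count into reads and writes for each workload we tested. The read percentages come from Table 1. The function name and the operation count are hypothetical, and this is not how YCSB itself schedules operations.

```python
# Read percentage per tested workload, taken from Table 1 (E was not tested).
WORKLOAD_READ_PERCENT = {"A": 50, "B": 95, "C": 100, "D": 95, "F": 50}


def expected_operations(workload: str, total_ops: int) -> tuple[int, int]:
    """Return (reads, writes) for a workload's nominal read/write mix."""
    if total_ops < 0:
        raise ValueError("total_ops must be non-negative")
    read_percent = WORKLOAD_READ_PERCENT[workload]
    reads = total_ops * read_percent // 100
    return reads, total_ops - reads


# Hypothetical run of 1,000,000 operations per workload.
for name in WORKLOAD_READ_PERCENT:
    reads, writes = expected_operations(name, 1_000_000)
    print(f"Workload {name}: {reads} reads, {writes} writes")
```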
The baseline Cassandra solution used six application servers
configured as described in Table 2 and deployed as shown in
Figure 1. We compared these results to those obtained by
running the same YCSB test matrix against a six-node
Cassandra solution using a three-node SolidScale storage
configuration, with each server using a single NVMe SSD device
provided by the SolidScale architecture. Using six application
servers distributed across the SolidScale cluster meant each
SolidScale node hosted the data for two of the Cassandra
servers, as shown in Figure 2.
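The placement described above, six Cassandra servers spread evenly over three SolidScale nodes, can be sketched as a simple round-robin assignment. The round-robin policy and all server and node names are assumptions for illustration. This brief states only that each SolidScale node hosted the data for two Cassandra servers.

```python
def assign_servers(servers: list[str], storage_nodes: list[str]) -> dict[str, list[str]]:
    """Assign each application server's volume to a storage node, round-robin."""
    if not storage_nodes:
        raise ValueError("at least one storage node is required")
    placement: dict[str, list[str]] = {node: [] for node in storage_nodes}
    for index, server in enumerate(servers):
        node = storage_nodes[index % len(storage_nodes)]
        placement[node].append(server)
    return placement


# Hypothetical names for the six Cassandra servers and three storage nodes.
cassandra_servers = [f"cassandra-{n}" for n in range(1, 7)]
solidscale_nodes = [f"solidscale-{n}" for n in range(1, 4)]

for node, hosted in assign_servers(cassandra_servers, solidscale_nodes).items():
    print(node, hosted)
```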
This early testing indicates that a SolidScale infrastructure
can deliver performance close to that of server-local
deployments.
Figure 1: Cassandra test configuration using server-local SSDs
Figure 2: Cassandra test configuration using Micron
SolidScale infrastructure
                                              Baseline (Figure 1)                    SolidScale (Figure 2)
# Application servers                         6                                      6
Application server configuration              • 2X Intel® Xeon® 26xxv4 @ 2.1GHz      • 2X Intel Xeon 26xxv4 @ 2.1GHz
                                              • 8X 32GB RDIMMs                       • 8X 32GB RDIMMs
                                              • 2X 500GB M500DC SSDs RAID1 (boot)    • 2X 500GB M500DC SSDs RAID1 (boot)
                                              • 2X 10GbE local area network ports    • 2X 10GbE local area network ports
                                                                                     • 2X 25Gb Mellanox ConnectX®-4 RoCE storage network ports
# Data drives per server                      1X 2.4TB Micron 9100 NVMe SSD          0
# Physical drives per logical volume          1                                      1
Logical volume capacity                       2.4TB                                  2.4TB
# Logical data volumes per application server 1                                      1
Total Cassandra database size                 1TB                                    1TB
Cassandra replication factor                  3                                      3
Table 2: Cassandra test configurations
The Results
The results of our testing with YCSB workloads show that the
SolidScale infrastructure performance, in terms of operations
per second and latency, is less than 10 percent slower than
local NVMe storage performance. While our testing was done
on an early technology preview configuration of hardware and
software, we are already seeing significant performance
increases from our SolidScale infrastructure. We used only a single
25Gb RoCE port and a single NVMe SSD for each Cassandra
server, without any network optimization, so we expected
performance to be lower than what we should be able to
achieve once we include additional, redundant data paths,
advanced multipath software and network quality of service,
none of which were used in these early tests. When compared
to a server-local SSD solution, some additional latency will be
incurred because of the network, but our early numbers are
very encouraging.
The Bottom Line
As data continues to grow and the need to analyze this data at
an ever faster pace becomes more essential to your business
success, advanced, high-performance data center
infrastructures that can power next-generation applications
must be used. As we have shown, SolidScale can provide the
capacity, performance, and scalability that you will need to be
successful. Cassandra performance on SolidScale is extremely
fast, demonstrating a scalable, centrally managed solution with
more performance potential than traditional Cassandra
deployments that use server-local storage.
Want to Learn More?
Interested in participating in the SolidScale platform early access program? Or, are you an OEM company interested in partnering with Micron to
extend the SolidScale architecture across your hardware platforms? Visit micron.com/solidscale or send an email to [email protected].
www.micron.com
No hardware, software or system can provide absolute security and protection of data under all conditions. Micron assumes no liability for lost, stolen or corrupted data arising from the use of any Micron product.
Products are warranted only to meet Micron’s production data sheet specifications. Products, programs and specifications are subject to change without notice. Dates are estimates only.
©2017 Micron Technology, Inc. All rights reserved. All information is provided on an “AS IS” basis without warranties of any kind. Micron, the Micron logo and SolidScale are trademarks of Micron Technology, Inc.
Apache, Apache Cassandra, and Apache Hadoop are registered trademarks of The Apache Software Foundation. NVMe is a trademark of NVM Express, Inc. Elasticsearch is a trademark of Elasticsearch BV, registered in the
U.S. and in other countries. PCIe is a registered trademark of PCI-SIG. Intel and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. Mellanox and ConnectX are registered
trademarks of Mellanox Technologies, Ltd. Rev. C 6/17 CCMMD-676576390-10708