What’s New in Apache Cassandra™ 1.2?
An Overview for Architects, Developers, and IT Managers
White Paper
BY DATASTAX CORPORATION
DECEMBER 2012
Abstract

Many modern businesses use Cassandra to power the applications that transform their business, with widely varying use cases: healthcare management, online gaming, e-commerce, media streaming, social media, and many more. Cassandra users enjoy massive scalability, continuous availability, fault detection and recovery, data consistency, and simplicity of installation that remain unmatched. The newest version of Cassandra offers many new enhancements, including virtual nodes, parallel leveled compaction, query profiling/tracing, faster node bootup/startup, collections, and atomic batches.

Contents

Introduction
Why Cassandra?
What’s New in Cassandra 1.2?
   Manageability Enhancements
      Virtual Nodes (Vnodes)
      Parallel Leveled Compaction
      Off-Heap Bloom Filters and Compression Metadata
      Improved JBOD Functionality
   Performance Enhancements
      Query Profiling/Tracing
      Faster Node Bootup/Startup
      Murmur3Partitioner
      Miscellaneous Performance Enhancements
   Development Enhancements
      Collections
      Atomic Batches
      Flat File Load/Export Utility
      Native/Binary CQL Transport
      Concurrent Schema Changes
      CQL Enhancements
      Additional Metadata Information
Getting Started with Cassandra 1.2
Cassandra for Production Environments
Conclusion
About DataStax
Introduction
Apache Cassandra, an Apache Software Foundation project, is a massively scalable NoSQL
database. Cassandra is designed to handle big data workloads across multiple data centers with
no single point of failure, providing enterprises with continuous availability without compromising
performance.
This paper discusses the new features contained within the 1.2 version of Cassandra. For more
general information on NoSQL and NoSQL use cases, as well as an introduction to Cassandra,
please see the “Why NoSQL?” and “Introduction to Apache Cassandra” white papers on
DataStax.com.
Why Cassandra?
Many modern businesses use Cassandra to power the applications that transform their business.
Some companies using Cassandra today include the following:
Figure 1: Sample of companies currently using Cassandra
Core features in Cassandra that cause many to choose the database for their big data, modern
business systems include the following:
Massively scalable architecture – Cassandra’s masterless, peer-to-peer architecture
overcomes the limitations of master-slave designs and allows for both high availability and
massive scalability. Cassandra is the acknowledged NoSQL leader [1] when it comes to
comfortably scaling to terabytes or petabytes of data, while maintaining industry-leading
write and read performance.
Linear scale performance – Nodes added to a Cassandra cluster (all done online) increase the
throughput of a database in a predictable, linear fashion [2] for both read and write operations,
even in the cloud where such predictability can be difficult to ensure.
Continuous availability – Data is replicated to multiple nodes in a Cassandra database cluster
to protect from loss during node failure and provide continuous availability with no downtime.
[1] http://wikibon.org/wiki/v/Cassandra_Continues_to_Win_Real-Time_Big_Data_Converts
[2] http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Transparent fault detection and recovery – Cassandra clusters can grow into the hundreds or
thousands of nodes. Because Cassandra was designed for commodity servers, machine
failure is expected. Cassandra utilizes gossip protocols to detect machine failure and recover
when a machine is brought back into the cluster – all without the application noticing.
Flexible, dynamic schema data modeling – Cassandra offers the organization of a
traditional RDBMS table layout combined with the flexibility that comes from imposing
no stringent structural requirements. This allows data to be stored dynamically as needed,
without performance penalty when changes occur. In addition, Cassandra can store structured,
semi-structured, and unstructured data.
Guaranteed data safety – Cassandra far exceeds other systems on write performance thanks to
its append-only commit log, while always ensuring durability. Users no longer have to trade
off durability to keep up with immense write streams: once a write has been acknowledged and
committed to the log, it will not be lost.
Distributed, location independence design – Cassandra’s architecture avoids the hot spots
and read/write issues found in master-slave designs. Users can have a highly distributed
database (e.g., multiple geographies, multiple data centers) and read or write to any node in
a cluster without concern over what node is being accessed.
Tunable data consistency – Cassandra offers flexible data consistency on a cluster, data
center, or individual I/O operation basis. Very strong or eventual data consistency among all
participating nodes can be set globally and also controlled on a per-operation basis (e.g.,
per INSERT, per UPDATE) in Cassandra’s drivers and client libraries.
Multiple data center replication – Whether it’s keeping data in multiple locations for disaster
recovery scenarios or locating data physically near its end users for fast performance,
Cassandra offers support for multiple data centers. Administrators simply configure how many
copies of the data they want in each data center, and Cassandra handles the rest – replicating
the data automatically. Cassandra is also rack-aware and can keep replicas of data stored on
different physical racks, which helps ensure uptime in the case of single rack failures.
Cloud-enabled – Cassandra’s architecture maximizes the benefits of running in the cloud.
Also, Cassandra allows for hybrid data distribution where some data can be kept on-premise
and some in the cloud.
Data compression – Cassandra supplies built-in data compression, with up to an 80 percent
reduction in raw data footprint. More importantly, Cassandra’s compression results in no
performance penalty, with some use cases showing actual read/write operations speeding
up due to less physical I/O being required.
CQL (Cassandra Query Language) – Cassandra provides a SQL-like language called CQL
that mirrors SQL’s DDL, DML, and SELECT syntax. CQL greatly decreases the learning curve
for those coming from RDBMS systems because they can use familiar syntax for all object
creation and data access operations.
No caching layer required – Cassandra offers caching on each of its nodes. Coupled with
Cassandra’s scalability characteristics, nodes can be incrementally added to the cluster to
keep as much data in memory as needed. The result is that there is no need for a separate
caching layer.
No special hardware needed – Cassandra runs on commodity machines and requires no
expensive or special hardware.
Incremental and elastic expansion – The Cassandra ring allows online node additions.
Because of Cassandra’s fully distributed architecture, every node type is the same, which
means clusters can grow as needed without any complex architecture decisions.
Simple install and setup – Cassandra can be downloaded and installed in minutes, even for
multi-cluster installs.
Ready for developers – Cassandra has drivers and client libraries for all the popular development
languages (e.g., Java, Python).
Given these technical features and benefits, the following are typical big data use cases handled well by
Cassandra in the enterprise:
Real-time, big data workloads
Time series data management
High-velocity device data ingestion and analysis
Healthcare system input and analysis
Media streaming management (e.g., music, movies)
Social media (i.e., unstructured data) input and analysis
Online web retail (e.g., shopping carts, user transactions)
Real-time data analytics
Online gaming (e.g., real-time messaging)
Software as a Service (SaaS) applications that utilize web services
Write-intensive systems
What’s New in Cassandra 1.2?
Cassandra 1.2 includes a number of new features in the areas of manageability, performance,
and developer functionality.
Manageability Enhancements
Virtual Nodes (Vnodes)
Those who have worked with Cassandra in the past know how the database distributes data across a
cluster of nodes: a numerical token is assigned to each node, making it responsible for one range of
data in the cluster. While this paradigm has worked very well for scaling out massive databases, it has a
few limitations.
First, when a new node is added to an existing cluster, anywhere from one to a handful of existing nodes
will participate in bootstrapping the new node with its data. The same is true if a node goes down and
needs to be replaced. If the amount of data is large, the process can be very resource intensive for the
participating nodes and time consuming overall. Because of this, a rule-of-thumb recommendation for
Cassandra has been to deploy ‘thin nodes’, which equates to keeping about ½ TB of data on each node.

Second, when new nodes are added to an existing cluster, the cluster’s data distribution becomes
unbalanced (i.e., some nodes hold more or less data than others). Because an even distribution of data
is desired for performance, the newly modified cluster must then go through a rebalance operation,
which, if done manually, can be an error-prone and potentially long process.
In Cassandra 1.2, virtual nodes – or ‘vnodes’ – have been implemented to overcome these issues and
provide easier manageability. Vnodes change the previous Cassandra paradigm from one token or
range per node to many per node. Within a cluster, these tokens can be randomly selected and
non-contiguous, resulting in smaller ranges that belong to each node:
Figure 2: Comparison between the non-vnode and vnode implementations
Vnodes provide the following core benefits:
Rather than just one or a couple of nodes participating in bootstrapping a new node, all nodes
participate in the operation, parallelizing the task and making node addition operations much
faster.
The ‘thin node’ recommendation for Cassandra no longer applies.
Vnodes automatically maintain the data distribution/balance of a cluster, so there is no need to
perform any rebalance operation after a cluster has been modified.
Enabling vnodes for a Cassandra cluster is easy and straightforward. Rather than assigning each node a
token, a new configuration parameter – num_tokens – is used to specify the number of tokens (vnodes)
each node should handle (a good default is 256).
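As a minimal illustration, the relevant cassandra.yaml excerpt for a vnode-enabled node might look like the following (the value of 256 is the suggested default; surrounding settings will vary by deployment):

   # cassandra.yaml (excerpt) - enable vnodes by assigning many tokens per node
   num_tokens: 256
   # initial_token is left unset when vnodes are in use
   # initial_token: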
For more information on vnodes, including instructions on upgrading an existing cluster to use vnodes,
please see the DataStax online documentation.
Parallel Leveled Compaction
Leveled compaction has proven to be effective for update-intensive workloads, but it has been limited
by allowing only one leveled compaction to run at a time per table. This has been true no matter
how many hard disks or SSDs the data was spread across.
Parallel leveled compaction in Cassandra 1.2 is aimed at providing more efficient and faster compaction
operations, especially for deployments on SSD hardware. Whereas the general idea of compaction tuning
is to mitigate its impact on the overall operation of nodes (which typically results in longer compaction
times but less resource-intensive operations), SSD implementations lend themselves to speeding up
compaction tasks. Parallel leveled compaction makes this possible on clusters deployed with SSDs by
allowing the leveled compaction strategy (LCS) to run up to concurrent_compactors compactions across
different SSTable ranges (including multiple compactions within the same level).
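As a sketch, the cassandra.yaml setting that governs this parallelism is shown below; the value of 4 is purely illustrative (by default, Cassandra derives the value from the node's disk and core counts):

   # cassandra.yaml (excerpt) - allow multiple compactions to run in parallel
   concurrent_compactors: 4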
Off-Heap Bloom Filters and Compression Metadata
Cassandra 1.2 helps reduce the Java heap requirements for large datasets by moving the memory used
for bloom filters and compression metadata into native memory. Java heap sizes that exceed 8GB tend
to cause garbage collection operations to impact performance, so the off-heap enhancement for bloom
filters helps reduce that possibility.
Improved JBOD Functionality
Before Cassandra 1.2, a single disk going down in a JBOD (just a bunch of disks) configuration had the
potential to make an entire node unavailable for I/O operations. Version 1.2 introduces a new
disk_failure_policy configuration setting that lets you choose among the following policies for dealing
with disk failure:

stop – the default setting for new 1.2 installations. Upon encountering a file system error, Cassandra
shuts down gossip and Thrift services, leaving the node effectively unavailable, but still reachable via
JMX for troubleshooting.

best_effort – this new option allows the database to do its best in the event of a disk error. If
Cassandra can’t write to a disk, the disk is blacklisted for writes and the node continues writing
elsewhere. If Cassandra can’t read from a disk, it is marked as unreadable, and the node continues
serving data from the readable SSTables only. This implies that, at consistency level ONE, it is possible
for stale data to be served when the most recent version of the data is on the unreadable disk, so
choose this option with care.

ignore – this policy exists for users upgrading from prior versions of Cassandra. In this mode, the
database behaves exactly as older versions did: all file system errors are logged but otherwise ignored.
DataStax recommends using either the stop or best_effort policies instead.
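A minimal cassandra.yaml excerpt selecting one of these policies might look like:

   # cassandra.yaml (excerpt) - how a node reacts when a data disk fails
   disk_failure_policy: best_effort   # alternatives: stop (the default), ignore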
For more information on version 1.2’s improved JBOD support, see DataStax’s online documentation.
Performance Enhancements
Query Profiling/Tracing
Version 1.2 of Cassandra provides new performance diagnostic utilities aimed at helping you understand, diagnose, and troubleshoot CQL statements that are sent to a Cassandra cluster. You can
interrogate individual CQL statements in an ad-hoc manner, or perform a system-wide collection of all
queries/commands that are sent to a cluster.
For example, to understand how a Cassandra cluster will satisfy a single CQL INSERT statement, you
would enable the trace utility from the CQL command prompt, issue your query, and review the
diagnostic information provided:
cqlsh> tracing on;
Now tracing requests.
cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');

Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
                  execute_cql3_query | 00:02:37,015 | 127.0.0.1 |              0
                   Parsing statement | 00:02:37,015 | 127.0.0.1 |             81
                 Preparing statement | 00:02:37,015 | 127.0.0.1 |            273
   Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 |            540
       Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 |            779
   Message received from /127.0.0.1  | 00:02:37,016 | 127.0.0.2 |             63
                   Applying mutation | 00:02:37,016 | 127.0.0.2 |            220
                Acquiring switchLock | 00:02:37,016 | 127.0.0.2 |            250
              Appending to commitlog | 00:02:37,016 | 127.0.0.2 |            277
                  Adding to memtable | 00:02:37,016 | 127.0.0.2 |            378
    Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            710
       Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            888
   Message received from /127.0.0.2  | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
                    Request complete | 00:02:37,017 | 127.0.0.1 |           2581
Cassandra provides a description of each step it takes to satisfy the request along with what node(s)
are affected, the time for each step, and the total time for the request.
In addition to individual query analysis, database administrators and system admins oftentimes need to
collect all statements that are sent to a database to understand what the most resource intensive
statements are and to locate queries that need tuning. Cassandra 1.2 allows you to set a new nodetool
option – settraceprobability – to trace some or all statements sent to a cluster. A probability of 1.0
traces everything, whereas lesser values (e.g., 0.10) sample only that fraction of statements. Care
should be taken on large and active systems, as system-wide tracing will have a performance impact.
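For example, to sample roughly 10 percent of the statements arriving at a node (the host address here is illustrative):

   nodetool -h 127.0.0.1 settraceprobability 0.1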
The trace information is stored in a new system_traces keyspace that holds two tables –
sessions and events – which can easily be queried to answer questions such as which query has been
the most time-consuming since tracing was started, and much more.
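As a sketch, a query such as the following lists recent traced sessions along with their total duration in microseconds (column names per the 1.2 system_traces.sessions table):

   cqlsh> SELECT session_id, duration, request
          FROM system_traces.sessions
          LIMIT 10;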
For more information on tracing and troubleshooting CQL statements in version 1.2, see DataStax’s
online documentation.
Faster Node Bootup/Startup
Cassandra 1.2 provides faster startup/bootup times for each node in a cluster, with internal tests performed
at DataStax showing up to 80% less time needed to start a Cassandra node. The startup reductions were
realized through more efficient sampling and loading of SSTable indexes into memory caches.
Murmur3Partitioner
Cassandra 1.2 supplies a new default partitioner: the Murmur3Partitioner, which is based on the Murmur3
hash. The Murmur3 hash is 3x-5x faster than the MD5 hash used in earlier versions of Cassandra;
this translates into performance gains of over 10% for index-heavy workloads.
Note that the new Murmur3Partitioner is not backwards compatible with the previously used
RandomPartitioner. Clusters upgraded from earlier versions of Cassandra must continue to use the
RandomPartitioner.
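The partitioner is configured in cassandra.yaml; a sketch for a brand-new 1.2 cluster is shown below (an upgraded cluster keeps its existing entry):

   # cassandra.yaml (excerpt) - default partitioner for new 1.2 clusters
   partitioner: org.apache.cassandra.dht.Murmur3Partitioner
   # upgraded clusters must retain: org.apache.cassandra.dht.RandomPartitioner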
Miscellaneous Performance Enhancements
Version 1.2 of Cassandra supplies a number of other performance enhancements, including:
A new approach to index maintenance, which improves the speed at which indexes are updated.
More efficient and faster streaming of data during bootstrap or repair operations.
Faster replica recovery via a new concurrent hint delivery mechanism.
Development Enhancements
Collections
Version 1.2 of Cassandra introduces a new mechanism for storing data called collections. The general idea
behind collections is to provide easier methods for inserting and manipulating data that consists of multiple
items that you want to store in a single column; for example, multiple email addresses for a single employee.
There are three different types of collections you can select from: (1) sets; (2) lists; (3) maps.
Sets
A set allows for the storage of a group of elements that are returned in sorted order when queried. For
example, if you wanted to store multiple emails for the employees of a company, you might create the
following table:
cqlsh> CREATE TABLE emp (
emp_id int PRIMARY KEY,
first_name text,
last_name text,
emails set<text>
);
cqlsh> INSERT INTO emp (emp_id, first_name, last_name, emails)
VALUES(1, 'Laura', 'Jung', {'[email protected]',
'[email protected]'});
Sets may be added to:
cqlsh> UPDATE emp
SET emails = emails + {'[email protected]'}
WHERE emp_id = 1;
Sets may be queried:
cqlsh> SELECT emp_id, emails
FROM emp
WHERE emp_id = 1;
 emp_id | emails
--------+--------------------------------------------------------------------
      1 | {'[email protected]', '[email protected]', '[email protected]'}
Sets may be deleted from on an individual item basis:
cqlsh> UPDATE emp
SET emails = emails - {'[email protected]'} WHERE emp_id = 1;
Or sets may be deleted from in total, in one of two ways:
cqlsh> UPDATE emp SET emails = {} WHERE emp_id = 1;
cqlsh> DELETE emails FROM emp WHERE emp_id = 1;
Lists
A list allows for the storage of a group of ordered elements, which (unlike a set) may contain
duplicates. For example, if you wanted to track the departments each employee has managed, you might
add the following column:
cqlsh> ALTER TABLE emp ADD depts_mngd list<text>;
cqlsh> UPDATE emp
SET depts_mngd = [ 'engineering', 'support' ] WHERE emp_id = 1;
With lists, you can prepend and append new items:
cqlsh> UPDATE emp
SET depts_mngd = ['QA' ] + depts_mngd WHERE emp_id = 1;
cqlsh> UPDATE emp
SET depts_mngd = depts_mngd + [ 'doc' ] WHERE emp_id = 1;
You can also manipulate an item by its index (the list at this point is ['QA', 'engineering',
'support', 'doc'], so index 3 refers to 'doc'):
cqlsh> UPDATE emp SET depts_mngd[3] = 'docs' WHERE emp_id = 1;
cqlsh> DELETE depts_mngd[3] FROM emp WHERE emp_id = 1;
Lastly, you can remove list items by value (note that all instances of the value will be removed
from the list):
cqlsh> UPDATE emp
SET depts_mngd = depts_mngd - ['QA'] WHERE emp_id = 1;
Maps
As its name implies, a map maps one thing to another. For example, you might want to record the dates
of performance reviews along with the end score of each review for each employee in your employee
table:
cqlsh> ALTER TABLE emp ADD perf_reviews map<timestamp, int>;
cqlsh> UPDATE emp
SET perf_reviews = { '2012-04-01' : 95,
'2012-07-01' : 97 }
WHERE emp_id = 1;
Maps can be added to and items can be manipulated:
cqlsh> UPDATE emp
SET perf_reviews['2012-10-01'] = 90
WHERE emp_id = 1;
cqlsh> DELETE perf_reviews['2012-10-01']
FROM emp
WHERE emp_id = 1;
For more information on collections, see DataStax’s online documentation.
Atomic Batches
Prior versions of Cassandra provided batch operations, which let you group related updates into a
single statement. If some of the replicas for the batch failed mid-operation, the coordinator would hint
those rows automatically. However, if the coordinator itself failed mid-operation, you could end up with
partially applied batches.
In version 1.2 of Cassandra, batch operations are guaranteed by default to be atomic, and are handled
differently than in earlier versions of the database. When a batch is written in 1.2, it is first written to a new
system table that consumes the serialized batch as blob data. After the rows in the batch have been
successfully written and persisted (or hinted), the system table entry is removed.
Again, the default behavior for batches in version 1.2 is atomicity (i.e., all or nothing). It should be
noted that there is a performance penalty for using atomic batches. For use cases that require batch
operations but have client-side workarounds or other means of ensuring batch atomicity, a BEGIN
UNLOGGED BATCH command is supplied for when performance is more important than atomicity
guarantees. This is akin to using unlogged statements in many RDBMSs.
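For illustration, reusing the emp table from the collections examples above (the inserted rows are hypothetical), a default atomic batch and its unlogged counterpart look like this:

   cqlsh> BEGIN BATCH
              INSERT INTO emp (emp_id, first_name, last_name) VALUES (2, 'Sam', 'Lee');
              UPDATE emp SET depts_mngd = [ 'sales' ] WHERE emp_id = 2;
          APPLY BATCH;

   cqlsh> BEGIN UNLOGGED BATCH
              INSERT INTO emp (emp_id, first_name, last_name) VALUES (3, 'Kim', 'Park');
          APPLY BATCH;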
In addition, version 1.2 also introduces a new BEGIN COUNTER BATCH command for batched counter
updates. Unlike other writes in Cassandra, counter updates are not idempotent, so replaying them automatically from the new system table is not safe. Counter batches are thus strictly for improved performance when
updating multiple counters in the same partition.
Lastly, it should be understood that although an atomic batch guarantees that if any part of the batch
succeeds, all of it will, no other transactional enforcement is done at the batch level. For example, there
is no batch isolation: other clients will be able to read the first updated rows from the batch while the
remaining rows are still in progress. However, updates within a single row are isolated (i.e., a partial
row update cannot be read).
Flat File Load/Export Utility
Cassandra 1.2 contains a utility that makes it easy to import and export flat file data to/from Cassandra
tables. Although it was initially introduced in Cassandra 1.1.3, the new load utility wasn’t formally announced
with that version, so an explanation of it is warranted in this document.
The utility mirrors the COPY command from the PostgreSQL RDBMS and is used in Cassandra’s CQL shell. A
variety of file formats and delimiters are supported including comma-separated value (CSV), tabs, and more,
with CSV being the default.
The syntax for the COPY command is the following:
COPY <column family / table name> [ ( column [, ...] ) ]
    FROM ( '<filename>' | STDIN )
    [ WITH <option>='value' [AND ...] ];

COPY <column family / table name> [ ( column [, ...] ) ]
    TO ( '<filename>' | STDOUT )
    [ WITH <option>='value' [AND ...] ];
Below are simple examples of the COPY command in action:
cqlsh> SELECT * FROM airplanes;

 name          | mach | manufacturer | year
---------------+------+--------------+------
 P38-Lightning |  0.7 | Lockheed     | 1937

cqlsh> COPY airplanes (name, mach, year, manufacturer) TO 'temp.csv';
1 rows exported in 0.004 seconds.

cqlsh> TRUNCATE airplanes;

cqlsh> COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';
1 rows imported in 0.087 seconds.
For more information about the COPY command, see DataStax’s online documentation.
Native/Binary CQL Transport
Prior to Cassandra 1.2, the Cassandra Query Language (CQL) API had been using Thrift as a network
transport, but now with version 1.2 and above, a new binary protocol is available for CQL that does not
require Thrift.
There are a number of benefits that the new 1.2 native CQL transport provides:
Thrift is a synchronous transport, meaning only one request can be active at a time per
connection. By contrast, the new native CQL transport allows each connection to handle more
than one active request at the same time. This translates into client libraries needing to
maintain only a relatively low number of open connections to a Cassandra node in order to
maximize performance, which helps when scaling large clusters.
Thrift is an RPC mechanism, which means you cannot have a Cassandra server push information
to a client. However, the new native CQL protocol allows clients to register for certain types of
event notifications from a server. As of 1.2, supported events include (1) cluster topology
changes (e.g., a node joins the cluster, is removed, or is moved); (2) status changes (e.g., a node
is detected up/down); and (3) schema changes (e.g., a table has been modified). These new
capabilities allow clients to stay up to date with the state of the Cassandra cluster without having
to poll it regularly.
The new native protocol allows for messages to be compressed if desired.
Thrift is still the default transport in 1.2, so if you want to use the new binary protocol, you will need to
change the start_native_transport option to true in the cassandra.yaml file (you can also set
start_rpc to false if you are not going to use the Thrift interface). You will also need a client
driver that supports the new binary protocol, such as the new DataStax Java and .NET drivers.
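A minimal cassandra.yaml excerpt enabling the native transport might look like:

   # cassandra.yaml (excerpt) - turn on the binary CQL protocol
   start_native_transport: true
   # optionally disable Thrift if no clients still depend on it
   start_rpc: false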
Concurrent Schema Changes
While Cassandra 1.1 introduced the ability to modify objects in a concurrent fashion across a cluster, it
did not include support for programmatically creating and dropping column families / tables (either
permanent or temporary) in a concurrent manner. Version 1.2 supplies this functionality, which means
multiple users may add/drop tables at the same time in the same cluster.
CQL Enhancements
There have been numerous enhancements to CQL in Cassandra 1.2. Changes include a new
ALTER KEYSPACE statement, syntax additions for determining how much time a column’s TTL has
remaining, support for conditional operators, and much more. For a full list of all CQL additions in
version 1.2, please see the DataStax online documentation.
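As a brief sketch of two of these additions (reusing the history keyspace and emp table from examples elsewhere in this paper):

   cqlsh> ALTER KEYSPACE history
          WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

   cqlsh> SELECT TTL(first_name) FROM emp WHERE emp_id = 1;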
Additional Metadata Information
Cassandra 1.2 delivers new data dictionary objects that can be queried to find out cluster demographic
information and more. The three new metadata tables in the Cassandra system keyspace are:
schema_keyspaces – provides quick access to keyspace metadata.
local – supplies information about the local node to which the client is currently connected.
peers – provides information about the peer nodes in a cluster.
Example output from the schema_keyspaces data dictionary object might be:
SELECT * from system.schema_keyspaces;

 keyspace | durable_writes | name    | strategy_class | strategy_options
----------+----------------+---------+----------------+----------------------------
 history  |           True | history | SimpleStrategy | {"replication_factor":"1"}
 ks_info  |           True | ks_info | SimpleStrategy | {"replication_factor":"1"}
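The local and peers tables can be queried in the same way; for example (column choices based on the 1.2 system tables):

   cqlsh> SELECT cluster_name, release_version FROM system.local;
   cqlsh> SELECT peer, data_center, rack FROM system.peers;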
Getting Started with Cassandra 1.2
The easiest way to get started with Cassandra 1.2 is by downloading and using DataStax
Community Edition, which bundles the latest version of Apache Cassandra, a sample database and
applications, and a free version of DataStax OpsCenter, a visual management and monitoring
solution for Cassandra and other big data platforms.
You can find out more about DataStax Community Edition and obtain downloads by visiting:
http://www.planetcassandra.org.
Cassandra for Production Environments
DataStax Enterprise Edition is a big data platform that provides a production-ready version of
Cassandra, integrated with Hadoop for analytics and Apache Solr for enterprise search.
DataStax Enterprise Edition is completely free to use in development environments; however,
production deployments do require a subscription purchased from DataStax.
You can find out more about DataStax Enterprise Edition and find downloads at:
http://www.datastax.com/products/enterprise.
Conclusion
To find out more about Apache Cassandra and DataStax, and to obtain downloads of Cassandra
and DataStax Enterprise software, please visit www.datastax.com or send an email to
[email protected]. Note that DataStax Enterprise Edition is completely free to use in
development environments, while production deployments require a software subscription to be
purchased.
About DataStax
DataStax provides a massively scalable big data platform to run mission-critical business
applications for some of the world’s most innovative and data-intensive enterprises. Powered by
the open source Apache Cassandra™ database, DataStax delivers a fully distributed, continuously
available platform that is faster to deploy and less expensive to maintain than other database
platforms. DataStax has more than 250 customers, including leaders such as Netflix, Rackspace,
Pearson Education, and Constant Contact, and spans verticals including web, financial services,
telecommunications, logistics, and government. Based in San Mateo, Calif., DataStax is backed by
industry-leading investors including Lightspeed Venture Partners, Meritech Capital, and Crosslink
Capital. For more information, visit www.datastax.com.