Introduction to Cassandra

HDB++: HIGH AVAILABILITY WITH CASSANDRA
TANGO Meeting | 20 May 2015 | Reynald Bourtembourg
OVERVIEW
• What is Cassandra (C*)?
• Who is using C*?
• CQL
• C* architecture
• Request Coordination
• Consistency
• Monitoring tool
• HDB++
WHAT IS CASSANDRA?
• Mythology: an excellent Oracle who was not believed
• A massively scalable open source NoSQL (Not Only SQL) database
• Created by Facebook
• Open source since 2008
• Apache License 2.0, compatible with GPLv3
WHAT IS CASSANDRA?
• Peer-to-peer architecture
• No single point of failure
• Replication
• Continuous availability
• Multi-data-center support
• Scales to 100s or 1000s of nodes
• Written in Java
• High write throughput
• Read efficiency
WHAT IS CASSANDRA?
Source: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
WHO IS USING CASSANDRA?
CASSANDRA QUERY LANGUAGE
• CQL: Cassandra Query Language
• Very similar to SQL
• But with restrictions and limitations:
  • JOIN requests are forbidden
  • No subqueries
  • String comparisons are limited (when not using Solr): select * from my_table where mystring like '%tango%' is not possible
  • No OR operator
  • A WHERE condition can only be applied to an indexed column (or the primary key)
CASSANDRA QUERY LANGUAGE
• Collections (64K limitation):
  • list
  • set
  • map
• TTL (time to live)
• INSERT = UPDATE (upsert)
• Doc: http://www.datastax.com/documentation/cql/3.1/cql/cql_intro_c.html
• cqlsh (interactive CQL shell)
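The upsert and TTL behaviour listed above can be illustrated with a toy in-memory model. This is a minimal Python sketch under loudly stated assumptions: the ToyTable class and its insert/update/select methods are hypothetical (not a Cassandra or driver API), time is passed explicitly, and nothing about real storage is modelled. It only shows that INSERT and UPDATE both write the row whether or not it exists, and that a cell written with a TTL stops being returned once it expires.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical toy model (not the Cassandra API): each cell is (value, expiry time or None).
Cell = Tuple[Any, Optional[float]]


class ToyTable:
    def __init__(self) -> None:
        self.rows: Dict[Any, Dict[str, Cell]] = {}

    def _write(self, key: Any, values: Dict[str, Any], now: float, ttl: Optional[float]) -> None:
        # No existence check: the row is created if missing, overwritten if present (upsert).
        row = self.rows.setdefault(key, {})
        expiry = now + ttl if ttl is not None else None
        for column, value in values.items():
            row[column] = (value, expiry)

    def insert(self, key: Any, values: Dict[str, Any], now: float, ttl: Optional[float] = None) -> None:
        self._write(key, values, now, ttl)

    def update(self, key: Any, values: Dict[str, Any], now: float, ttl: Optional[float] = None) -> None:
        # Same code path as insert: INSERT = UPDATE.
        self._write(key, values, now, ttl)

    def select(self, key: Any, now: float) -> Dict[str, Any]:
        # Only return cells whose TTL, if any, has not expired yet.
        row = self.rows.get(key, {})
        return {c: v for c, (v, exp) in row.items() if exp is None or now < exp}


t = ToyTable()
t.update("sensor1", {"value_r": 1.5}, now=0.0)                     # UPDATE on a missing row creates it
t.insert("sensor1", {"value_r": 2.0}, now=1.0)                     # INSERT on an existing row overwrites it
t.insert("sensor1", {"error_desc": "timeout"}, now=2.0, ttl=10.0)  # this cell expires at t=12
print(t.select("sensor1", now=5.0))   # {'value_r': 2.0, 'error_desc': 'timeout'}
print(t.select("sensor1", now=20.0))  # {'value_r': 2.0}
```

In real Cassandra the TTL is also attached per cell, which is why only error_desc disappears in this example while value_r stays.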
CASSANDRA QUERY LANGUAGE
CREATE TABLE IF NOT EXISTS att_scalar_devdouble_ro (
    att_conf_id timeuuid,
    period text,
    data_time timestamp,
    data_time_us int,
    value_r double,
    quality int,
    error_desc text,
    -- Partition key: (att_conf_id, period)
    -- Clustering columns: data_time, data_time_us
    PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)
)
WITH comment = 'Scalar DevDouble ReadOnly Values Table';
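To illustrate how the primary key above splits into a partition key and clustering columns, here is a small Python sketch. It is an in-memory model, not how Cassandra stores data, and the sample values (including the period strings) are made up: rows sharing the same (att_conf_id, period) pair land in the same partition, and inside a partition they are kept sorted by (data_time, data_time_us).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Row(NamedTuple):
    att_conf_id: str   # a timeuuid in the real table; a plain string here for brevity
    period: str        # hypothetical period label
    data_time: int     # simplified to integer seconds
    data_time_us: int
    value_r: float


def build_partitions(rows: List[Row]) -> Dict[Tuple[str, str], List[Row]]:
    partitions: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    for row in rows:
        # Partition key: the composite (att_conf_id, period).
        partitions[(row.att_conf_id, row.period)].append(row)
    for part in partitions.values():
        # Clustering columns: rows are ordered by (data_time, data_time_us) inside a partition.
        part.sort(key=lambda r: (r.data_time, r.data_time_us))
    return dict(partitions)


rows = [
    Row("attr-A", "2015-05-20", 100, 500, 2.0),
    Row("attr-A", "2015-05-20", 100, 200, 1.0),
    Row("attr-A", "2015-05-21", 50, 0, 3.0),
    Row("attr-B", "2015-05-20", 10, 0, 4.0),
]
parts = build_partitions(rows)
print(len(parts))                                            # 3
print([r.value_r for r in parts[("attr-A", "2015-05-20")]])  # [1.0, 2.0]
```

Grouping by period keeps each partition bounded in size, while the clustering order makes a time slice inside one partition a contiguous range.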
CASSANDRA ARCHITECTURE
• Node: one Cassandra instance (Java process)
[Figure: token ring covering the token range -2^63 to +2^63 - 1, shared by Node 1 to Node 8]
CASSANDRA ARCHITECTURE
• Partition: ordered and replicable unit of data on a node, identified by a token (the hash of its partition key)
• The partitioner (based on the Murmur3 algorithm by default) distributes the data across the nodes
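The partition/partitioner idea can be sketched in a few lines of Python. Assumptions, stated loudly: the hash below is a stand-in built from MD5, NOT Murmur3 (which is not in the Python standard library); the 8-node ring with evenly spaced tokens and the names token_for and owner are hypothetical; and real clusters usually give each node many tokens (virtual nodes), whereas here each node has exactly one. A node owns the range from the previous node's token (exclusive) to its own token (inclusive).

```python
import bisect
import hashlib
from typing import List, Tuple

MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def token_for(partition_key: str) -> int:
    # Stand-in hash, NOT Murmur3: first 8 bytes of an MD5 digest read as a
    # signed 64-bit integer, so the result lies in [-2**63, 2**63 - 1].
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner(token: int, ring: List[Tuple[int, str]]) -> str:
    # ring: (node_token, node_name) sorted by token. The first node whose token is
    # >= the data token owns it; tokens past the last node wrap to the first one.
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Hypothetical ring: 8 nodes with evenly spaced tokens; the last one is MAX_TOKEN.
step = 2 ** 64 // 8
ring = [(MIN_TOKEN + (i + 1) * step - 1, f"Node {i + 1}") for i in range(8)]

key = "att_conf_id=42,period=2015-05-20"  # made-up partition key
print(token_for(key), owner(token_for(key), ring))
```

Because the token is derived only from the partition key, every row of a partition ends up on the same node (and its replicas).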
CASSANDRA ARCHITECTURE
• Rack: logical set of nodes
[Figure: the eight nodes grouped into Rack 1 (Node 1, Node 3), Rack 2 (Node 5, Node 7), Rack 3 (Node 2, Node 4) and Rack 4 (Node 6, Node 8)]
CASSANDRA ARCHITECTURE
• Data Center: logical set of racks
[Figure: Data Center 1 (Rack 1, Rack 2) and Data Center 2 (Rack 3, Rack 4) on the same token ring]
CASSANDRA ARCHITECTURE
• Cluster: full set of nodes, which maps to a single complete token ring
[Figure: a Cassandra cluster made of Data Center 1 and Data Center 2, together covering the whole token ring]
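The node / rack / data center / cluster hierarchy above can be written down as a nested mapping. This Python sketch is only an illustration: the assignment of racks to data centers is read from the slide figures, and the CLUSTER dictionary and the locate/all_nodes helpers are hypothetical. In Cassandra, the component that tells each node which data center and rack its peers belong to is the snitch.

```python
from typing import Dict, List, Tuple

# Topology as read from the slide figures: cluster -> data center -> rack -> nodes.
# Hypothetical in-memory representation; in Cassandra the snitch provides this information.
CLUSTER: Dict[str, Dict[str, List[str]]] = {
    "Data Center 1": {"Rack 1": ["Node 1", "Node 3"], "Rack 2": ["Node 5", "Node 7"]},
    "Data Center 2": {"Rack 3": ["Node 2", "Node 4"], "Rack 4": ["Node 6", "Node 8"]},
}


def locate(node: str) -> Tuple[str, str]:
    """Return the (data center, rack) of a node, like a snitch lookup."""
    for dc, racks in CLUSTER.items():
        for rack, nodes in racks.items():
            if node in nodes:
                return dc, rack
    raise KeyError(node)


def all_nodes() -> List[str]:
    # The cluster is the full set of nodes, i.e. the whole token ring.
    return [n for racks in CLUSTER.values() for nodes in racks.values() for n in nodes]


print(locate("Node 6"))  # ('Data Center 2', 'Rack 4')
print(len(all_nodes()))  # 8
```

Knowing the rack and data center of every node is what lets replicas be spread across failure domains rather than placed on neighbouring machines.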
REQUEST COORDINATION
• Coordinator: the node chosen by the client to receive a particular read or write request to its cluster
[Figure: Data Center 1 with Nodes 1 to 4; the client sends its request to Node 1, which acts as coordinator and forwards the read/write to the other nodes]
REQUEST COORDINATION
• Any node can coordinate any request
• Each client request may be coordinated by a different node
=> No single point of failure
REQUEST COORDINATION
• The Cassandra driver chooses the coordinator node
  • Round-robin pattern, token-aware pattern
• Driver: client library to manage requests
  • Many open source drivers for many programming languages: Java, Python, C++, Node.js, C#, Perl, PHP, Go, Clojure, Ruby, Scala, R (GNU S), Erlang, Haskell, ODBC, Rust
[Figure: Client → Driver → coordinator node among Nodes 1 to 4]
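The two coordinator-selection patterns named above can be sketched as follows. This is a simplified Python illustration, not any driver's API: the class names RoundRobinPolicy and TokenAwarePolicy only echo the idea, and the replica map in the example is made up. Round-robin ignores the data and cycles through the nodes; token-aware sends the request directly to a replica of the partition, which saves the coordinator a network hop. Real token-aware policies typically wrap another policy and may choose among several replicas; here the first replica is always picked.

```python
import itertools
from typing import Callable, Dict, List


class RoundRobinPolicy:
    """Cycle through the nodes: each request goes to the next node in turn."""

    def __init__(self, nodes: List[str]) -> None:
        self._cycle = itertools.cycle(nodes)

    def coordinator(self, partition_key: str) -> str:
        return next(self._cycle)  # the partition key is ignored


class TokenAwarePolicy:
    """Send the request straight to a replica of the partition (simplified: the first one)."""

    def __init__(self, replicas_for: Callable[[str], List[str]]) -> None:
        self._replicas_for = replicas_for

    def coordinator(self, partition_key: str) -> str:
        return self._replicas_for(partition_key)[0]


nodes = ["Node 1", "Node 2", "Node 3", "Node 4"]
rr = RoundRobinPolicy(nodes)
print([rr.coordinator("sensor1") for _ in range(5)])  # ['Node 1', 'Node 2', 'Node 3', 'Node 4', 'Node 1']

replica_map: Dict[str, List[str]] = {"sensor1": ["Node 3", "Node 4", "Node 1"]}  # made-up replicas
ta = TokenAwarePolicy(lambda key: replica_map[key])
print(ta.coordinator("sensor1"))  # Node 3
```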
REQUEST COORDINATION
• The coordinator manages the replication process
• Replication Factor (RF): the number of nodes onto which a write is copied
• The write occurs on the nodes responsible for that partition (its replicas)
• 1 ≤ RF ≤ (number of nodes in the cluster)
• Every write is time-stamped
[Figure: Client → Driver → coordinator Node 1, which replicates the write with RF=3 among Nodes 1 to 4]
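How the coordinator finds "the nodes responsible for that partition" can be sketched with SimpleStrategy-style placement: the node owning the token holds the first replica, and the next RF - 1 nodes clockwise on the ring hold the others. The ring with small integer tokens is hypothetical and chosen for readability; multi-data-center keyspaces use NetworkTopologyStrategy instead, which places replicas per data center and across racks, and this sketch ignores that.

```python
import bisect
from typing import List, Tuple


def replicas(token: int, ring: List[Tuple[int, str]], rf: int) -> List[str]:
    """SimpleStrategy-style placement: the token's owner, then the next rf - 1 nodes clockwise."""
    if not 1 <= rf <= len(ring):
        raise ValueError("replication factor must satisfy 1 <= RF <= number of nodes")
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token)  # index of the owning node (may wrap)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


# Hypothetical 4-node ring with small tokens, sorted by token.
ring = [(-100, "Node 1"), (0, "Node 2"), (100, "Node 3"), (200, "Node 4")]
print(replicas(50, ring, 3))   # ['Node 3', 'Node 4', 'Node 1']
print(replicas(250, ring, 3))  # ['Node 1', 'Node 2', 'Node 3'] (wraps around the ring)
```

The RF check mirrors the 1 ≤ RF ≤ (number of nodes) constraint: with more replicas than nodes, some node would have to hold the same data twice.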
CONSISTENCY
• The coordinator applies the Consistency Level (CL)
• Consistency Level (CL): number of replica nodes that must acknowledge a request
• Examples of CL:
  • ONE
  • TWO
  • THREE
  • ANY (writes only; a stored hint is enough)
  • ALL
  • QUORUM (= floor(RF/2) + 1)
  • EACH_QUORUM (a quorum in each data center)
  • LOCAL_QUORUM (a quorum in the coordinator's local data center)
• CL may vary for each request
• On success, the coordinator notifies the client (with the most recent partition data in case of a read request)
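The QUORUM formula above uses integer division, which is easy to get wrong for even replication factors. The Python sketch below computes the number of acknowledgements for a few single-data-center levels (ANY and the multi-DC levels are left out) and the usual overlap rule: if the replicas read (R) plus the replicas written (W) exceed RF, every read set shares at least one replica with every acknowledged write set. The function names are hypothetical helpers, not part of any Cassandra API.

```python
from typing import Dict


def quorum(rf: int) -> int:
    # QUORUM = floor(RF / 2) + 1: a strict majority of the replicas.
    return rf // 2 + 1


def required_acks(cl: str, rf: int) -> int:
    """Replica acknowledgements needed for a few single-DC consistency levels."""
    table: Dict[str, int] = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}
    needed = table[cl]
    if needed > rf:
        raise ValueError(f"{cl} cannot be satisfied with RF={rf}")
    return needed


def overlaps(read_acks: int, write_acks: int, rf: int) -> bool:
    # R + W > RF: any read set and any write set share at least one replica.
    return read_acks + write_acks > rf


for rf in (1, 2, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, replicas that may be down={rf - q}")
```

With RF=3, QUORUM writes plus QUORUM reads (2 + 2 > 3) always meet on an up-to-date replica, while ONE plus ONE (1 + 1) does not. Note also that RF=2 gives QUORUM=2, so it tolerates no replica failure.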
CONSISTENCY ONE - READ - SINGLE DC
[Figure: Nodes 1 to 6, RF=3. The client's driver sends the read to the coordinator, which sends a direct read request to one replica; the other replicas may receive a digest read request (hash), followed by an eventual read repair]
CONSISTENCY QUORUM – READ - SINGLE DC
[Figure: Nodes 1 to 6, RF=3. The coordinator sends a direct read request to one replica and digest read requests (hashes) to the remaining replicas needed to reach a quorum of two]
• In case of inconsistency, the most recent data is returned
• Read repair if needed
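The QUORUM read sequence above (one direct read, digests from the other contacted replicas, newest data wins, read repair if needed) can be sketched as follows. Assumptions: the MD5-based digest is a stand-in for the hash a replica returns, the Cell type and coordinate_read helper are hypothetical, and the mismatch path is simplified (real Cassandra then fetches full data from the contacted replicas before reconciling). Reconciliation is last write wins on the write timestamp, since every write is time-stamped.

```python
import hashlib
from typing import Dict, List, NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # write timestamp attached to every write


def digest(cell: Cell) -> str:
    # Stand-in for the digest a replica returns instead of the full data.
    return hashlib.md5(f"{cell.value}|{cell.timestamp}".encode()).hexdigest()


def coordinate_read(replica_data: Dict[str, Cell], contacted: List[str]) -> Cell:
    """Full data from the first contacted replica, digests from the others."""
    direct = replica_data[contacted[0]]
    digests = [digest(replica_data[node]) for node in contacted[1:]]
    if all(d == digest(direct) for d in digests):
        return direct  # all contacted replicas agree
    # Mismatch: keep the most recent write (last write wins)...
    newest = max((replica_data[n] for n in contacted), key=lambda c: c.timestamp)
    # ...and repair the contacted replicas that returned stale data.
    for node in contacted:
        if replica_data[node] != newest:
            replica_data[node] = newest
    return newest


data = {"Node 2": Cell("old", 1), "Node 3": Cell("new", 2), "Node 4": Cell("new", 2)}
result = coordinate_read(data, ["Node 2", "Node 3"])  # QUORUM with RF=3: two replicas
print(result)          # Cell(value='new', timestamp=2)
print(data["Node 2"])  # Cell(value='new', timestamp=2), repaired
```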
CONSISTENCY ONE – WRITE - SINGLE DC
[Figure: Nodes 1 to 6, RF=3. The client's driver sends the write request to the coordinator, which forwards it to the three replicas and returns SUCCESS to the client as soon as one replica has acknowledged (ACK) the write]
• Hinted handoff mechanism: when a replica is down, the coordinator stores a hint for it and replays the write once the replica is back, provided the downtime does not exceed the max_hint_window_in_ms property in the cassandra.yaml file
CONSISTENCY
• If node downtime > max_hint_window_in_ms: anti-entropy node repair (nodetool repair)
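The coordinator side of hinted handoff, and the fallback to repair, can be modelled as below. This is a hypothetical Python sketch, not Cassandra code: the Coordinator class, its fields and methods are made up, and MAX_HINT_WINDOW_MS is an assumed value (3 hours here) standing in for max_hint_window_in_ms. Live replicas apply the write and acknowledge it; a replica that has been down for less than the window gets a hint that is replayed when it comes back; beyond the window no hint is kept and the node must be repaired.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # assumed value standing in for max_hint_window_in_ms


@dataclass
class Coordinator:
    down_since_ms: Dict[str, int] = field(default_factory=dict)  # node -> time it went down
    hints: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    needs_repair: Set[str] = field(default_factory=set)

    def write(self, replicas: Dict[str, Dict[str, str]], key: str, value: str, now_ms: int) -> int:
        """Send the write to every replica; return the number of ACKs received."""
        acks = 0
        for node, store in replicas.items():
            if node not in self.down_since_ms:
                store[key] = value
                acks += 1
            elif now_ms - self.down_since_ms[node] <= MAX_HINT_WINDOW_MS:
                # Down for less than the window: keep a hint to replay later.
                self.hints.setdefault(node, []).append((key, value))
            else:
                # Down for too long: no hint, anti-entropy repair will be needed.
                self.needs_repair.add(node)
        return acks

    def node_back(self, node: str, store: Dict[str, str]) -> None:
        """Replay the stored hints when the replica comes back up."""
        del self.down_since_ms[node]
        for key, value in self.hints.pop(node, []):
            store[key] = value


stores: Dict[str, Dict[str, str]] = {"Node 2": {}, "Node 3": {}, "Node 4": {}}
c = Coordinator(down_since_ms={"Node 4": 0})
acks = c.write(stores, "k", "v", now_ms=60_000)  # 2 ACKs: enough for CL ONE
c.node_back("Node 4", stores["Node 4"])          # the hint is replayed on Node 4
print(acks, stores["Node 4"])                    # 2 {'k': 'v'}
```

Hints only count towards CL ANY; for ONE and QUORUM, only real replica acknowledgements are counted, as in this sketch.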
CONSISTENCY QUORUM – WRITE - SINGLE DC
[Figure: Nodes 1 to 6, RF=3. The coordinator forwards the write request to the three replicas and returns SUCCESS to the client once two of them (a quorum) have acknowledged; when fewer than two replicas acknowledge, it returns FAILURE]
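The SUCCESS/FAILURE outcomes in the ONE and QUORUM write sequences reduce to counting acknowledgements against the consistency level. This minimal Python sketch assumes that every live replica acknowledges and every down replica does not; write_outcome is a hypothetical helper, and in a real cluster a failure surfaces to the client as an unavailable or timeout error rather than a string.

```python
from typing import Dict


def write_outcome(replica_up: Dict[str, bool], cl: str) -> str:
    """Coordinator view of a single-DC write: SUCCESS if enough replicas acknowledge."""
    rf = len(replica_up)
    needed = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]
    acks = sum(replica_up.values())  # each live replica sends an ACK
    return "SUCCESS" if acks >= needed else "FAILURE"


one_down = {"Node 2": True, "Node 3": True, "Node 4": False}
two_down = {"Node 2": True, "Node 3": False, "Node 4": False}
print(write_outcome(one_down, "QUORUM"))  # SUCCESS
print(write_outcome(two_down, "QUORUM"))  # FAILURE
print(write_outcome(two_down, "ONE"))     # SUCCESS
```

A FAILURE does not roll back the write on the replicas that did acknowledge it, so the client should treat the write as "not confirmed" rather than "not applied".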
MONITORING TOOL: OPSCENTER
http://cassandra2:8888
HDB++
• libhdb++: archiving library interface
  • libhdb++mysql <<implements>> libhdb++ and stores data in MySQL
  • libhdb++cassandra <<implements>> libhdb++ and stores data in a Cassandra cluster
• hdb++cm-srv (configuration manager) and the hdb++es-srv instances (event subscribers) <<use>> the chosen implementation
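The <<implements>>/<<use>> relationships above follow a plain interface-plus-backends pattern. The real libhdb++ is a C++ library; the Python sketch below only mirrors the structure, and every class and method name in it (ArchiverBackend, store, MySQLBackend, CassandraBackend, EventSubscriber, on_event) is hypothetical. The point is that the event subscriber code depends only on the interface, so switching from MySQL to Cassandra means plugging in another implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class ArchiverBackend(ABC):
    """Hypothetical stand-in for the role of libhdb++: one interface, several backends."""

    @abstractmethod
    def store(self, attribute: str, timestamp_us: int, value: float) -> None: ...


class MySQLBackend(ArchiverBackend):
    # Would wrap libhdb++mysql; here it just records rows in memory.
    def __init__(self) -> None:
        self.rows: List[Tuple[str, int, float]] = []

    def store(self, attribute: str, timestamp_us: int, value: float) -> None:
        self.rows.append((attribute, timestamp_us, value))


class CassandraBackend(ArchiverBackend):
    # Would wrap libhdb++cassandra; here it groups values per attribute in memory.
    def __init__(self) -> None:
        self.partitions: Dict[str, List[Tuple[int, float]]] = {}

    def store(self, attribute: str, timestamp_us: int, value: float) -> None:
        self.partitions.setdefault(attribute, []).append((timestamp_us, value))


class EventSubscriber:
    """Plays the role of hdb++es-srv: it only talks to the abstract interface."""

    def __init__(self, backend: ArchiverBackend) -> None:
        self.backend = backend

    def on_event(self, attribute: str, timestamp_us: int, value: float) -> None:
        self.backend.store(attribute, timestamp_us, value)


mysql, cassandra = MySQLBackend(), CassandraBackend()
for backend in (mysql, cassandra):
    EventSubscriber(backend).on_event("sys/tg_test/1/double_scalar", 1_000_000, 3.14)
print(mysql.rows)
print(cassandra.partitions)
```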
CONCLUSION: C* PROS
• High Availability
  • Software upgrades with no downtime
  • Tolerates hardware failures
• Linear Scalability
  • Need more performance? => Add nodes
• Big community with industrial support
• Can use Apache Spark for analytics (distributed processing)
• List, Set, Map data types (tuples and user-defined types soon)
• Tries to prevent operations that would not perform well
• Backups = snapshots = hard links => very fast
• Difficult to lose data
• Good fit for time series data
CONCLUSION: C* CONS
• Requires more total disk space and more machines
• The SSTable format can change from one version to another
• No easy way to go back to a previous version once the SSTables have been converted to a newer version
• Cannot rename keyspaces or tables easily (not supported in CQL)
• Difficult to modify existing partitions (the data must be duplicated at some point in the process)
• Different way of modelling
• Not designed for huge read requests
• Can be tricky to tune to avoid long GC pauses
• Maintenance: nodetool repair must be run regularly when data is deleted, to avoid resurrected data (CPU-intensive operation)
• Reclaiming disk space after deletions can take quite some time in some specific cases
THE END
USEFUL LINKS
• http://cassandra.apache.org
• Planet Cassandra (http://planetcassandra.org)
• DataStax Academy (https://academy.datastax.com)
• Cassandra Java Driver getting started (https://academy.datastax.com/demos/cassandra-java-driver-gettingstarted)
• Cassandra C++ Driver: https://github.com/datastax/cpp-driver
• DataStax documentation (http://www.datastax.com/docs)
• Users mailing list: [email protected]
• #Cassandra channel on IRC (http://webchat.freenode.net/?channels=#Cassandra)
CASSANDRA FUTURE DEPLOYMENT
• DC Prod
  • 1 partition/hour
  • Keyspace prod RF:3 (write LOCAL_QUORUM)
  • 7200 RPM disks
  • Big CPU, 64 GB RAM
• DC Analytics 1
  • Keyspace prod RF:3 (read LOCAL_QUORUM)
  • Keyspace analytics RF:3 (write LOCAL_QUORUM)
  • SSD disks
  • Big CPU, 128 GB RAM
• DC Analytics 2
  • Keyspace analytics RF:5 (read LOCAL_QUORUM)
  • 7200 RPM disks
  • Tiny CPU, 32 GB RAM
Cassandra HDB++ Implementation Status | 9th April 2015 | Accelerator Control Unit
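Because every read and write in this layout uses LOCAL_QUORUM, the replication factor of a keyspace in each data center decides how many local replicas must answer and how many local nodes may be down. The Python sketch below encodes the per-DC replication factors from the slide (keyspaces with per-DC replication factors like these are declared with NetworkTopologyStrategy in Cassandra); the KEYSPACES dictionary and local_quorum helper are illustrative only.

```python
from typing import Dict

# Replication factor per data center, as listed on the deployment slide.
KEYSPACES: Dict[str, Dict[str, int]] = {
    "prod": {"DC Prod": 3, "DC Analytics 1": 3},
    "analytics": {"DC Analytics 1": 3, "DC Analytics 2": 5},
}


def local_quorum(rf_in_local_dc: int) -> int:
    # LOCAL_QUORUM only counts replicas in the coordinator's own data center.
    return rf_in_local_dc // 2 + 1


for keyspace, per_dc in KEYSPACES.items():
    for dc, rf in per_dc.items():
        q = local_quorum(rf)
        print(f"{keyspace} @ {dc}: RF={rf}, LOCAL_QUORUM={q}, tolerates {rf - q} node(s) down")
```

With these numbers, each RF:3 data center tolerates one node down for LOCAL_QUORUM requests, and DC Analytics 2 (RF:5) tolerates two, without any request ever waiting on another data center.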
CASSANDRA’S NODE-BASED ARCHITECTURE
BASIC WRITE PATH CONCEPT
BASIC READ PATH CONCEPT
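Only the titles of these backup slides survive. As a rough, hedged illustration of the concepts they name, the Python sketch below models a single node: a write is appended to a commit log and applied to an in-memory memtable, which is flushed to an immutable, sorted SSTable when it fills up; a read merges the memtable and all SSTables and keeps the value with the newest timestamp. The ToyNode class and the flush threshold of two keys are made up, and compaction, bloom filters, caches and tombstones are left out.

```python
from typing import Dict, List, Optional, Tuple

# Simplified single-node model of the basic write and read paths (not Cassandra code).
Entry = Tuple[str, int]  # (value, write timestamp)


class ToyNode:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log: List[Tuple[str, str, int]] = []  # append-only, for crash recovery
        self.memtable: Dict[str, Entry] = {}               # in memory, mutable
        self.sstables: List[Dict[str, Entry]] = []         # immutable once flushed
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str, ts: int) -> None:
        self.commit_log.append((key, value, ts))  # 1. append to the commit log
        self.memtable[key] = (value, ts)          # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                          # 3. flush to a new SSTable when full

    def flush(self) -> None:
        self.sstables.append(dict(sorted(self.memtable.items())))  # sorted by key
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs replaying (simplified)

    def read(self, key: str) -> Optional[str]:
        # Merge the memtable and every SSTable; the newest timestamp wins.
        candidates = [t[key] for t in [self.memtable, *self.sstables] if key in t]
        return max(candidates, key=lambda e: e[1])[0] if candidates else None


node = ToyNode()
node.write("a", "1", ts=1)
node.write("b", "2", ts=2)  # memtable full: flushed to the first SSTable
node.write("a", "3", ts=3)  # newer value for "a" stays in the memtable
print(node.read("a"), node.read("b"), len(node.sstables))  # 3 2 1
```

Writes never modify an existing SSTable, which is one reason write throughput is high, while a read may have to look in several places and merge the results.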