HDB++: HIGH AVAILABILITY WITH CASSANDRA
TANGO Meeting, 20 May 2015, Reynald Bourtembourg

OVERVIEW
• What is Cassandra (C*)?
• Who is using C*?
• CQL
• C* architecture
• Request coordination
• Consistency
• Monitoring tool
• HDB++

WHAT IS CASSANDRA?
• Mythology: Cassandra was an excellent oracle whose prophecies were not believed
• A massively scalable open-source NoSQL ("Not Only SQL") database
• Created by Facebook
• Open source since 2008
• Apache License 2.0, compatible with GPLv3

WHAT IS CASSANDRA?
• Peer-to-peer architecture
• No single point of failure
• Replication
• Continuous availability
• Multi data center support
• Scales from hundreds to thousands of nodes
• Written in Java
• High write throughput
• Efficient reads

[Figure: Netflix benchmark showing near-linear write scalability as nodes are added.
Source: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html]

WHO IS USING CASSANDRA?
[Slide of user/company logos]

CASSANDRA QUERY LANGUAGE
• CQL: Cassandra Query Language
• Very similar to SQL, but with restrictions and limitations:
• JOIN requests are forbidden
• No subqueries
• String comparisons are limited (when not using Solr), e.g. select * from my_table where mystring like '%tango%' is not possible
• No OR operator
• A WHERE condition can only be applied to an indexed column (or to the primary key)

CASSANDRA QUERY LANGUAGE
• Collections (64K limitation): list, set, map
• TTL (time-to-live) on inserted data
• INSERT = UPDATE ("upsert" semantics)
• Documentation: http://www.datastax.com/documentation/cql/3.1/cql/cql_intro_c.html
• cqlsh: interactive command-line shell

CASSANDRA QUERY LANGUAGE

CREATE TABLE IF NOT EXISTS att_scalar_devdouble_ro (
    att_conf_id timeuuid,
    period text,
    data_time timestamp,
    data_time_us int,
    value_r double,
    quality int,
    error_desc text,
    PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)
) WITH comment='Scalar DevDouble ReadOnly Values Table';

• Partition key: (att_conf_id, period); clustering columns: data_time, data_time_us (a query sketch follows below)
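As an illustration of these restrictions, a query against the table above has to constrain the full partition key with equalities before it can range over the clustering columns. This is a minimal sketch only: the timeuuid literal and the 'period' bucket format are hypothetical placeholders, not values taken from HDB++.

-- Hedged sketch: read back one hour of archived values for one attribute.
-- Both partition key columns must be fixed; data_time (a clustering column)
-- can then be restricted by a range.
SELECT data_time, data_time_us, value_r, quality
FROM att_scalar_devdouble_ro
WHERE att_conf_id = 8fa2dbb0-fe99-11e4-a322-1697f925ec7b   -- hypothetical attribute id
  AND period = '2015-05-20-14'                             -- hypothetical hourly bucket
  AND data_time >= '2015-05-20 14:00:00'
  AND data_time <  '2015-05-20 15:00:00';

Omitting the WHERE clause entirely would also work (a full scan), but filtering on a non-indexed, non-key column such as value_r would be rejected unless ALLOW FILTERING is explicitly requested, which is exactly the restriction listed above.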
CASSANDRA ARCHITECTURE
• Node: one Cassandra instance (a Java process)
• Partition: ordered and replicable unit of data on a node, identified by a token
• The partitioner (based on the Murmur3 algorithm by default) distributes the data across the nodes
• Rack: logical set of nodes
• Data center: logical set of racks
• Cluster: the full set of nodes, which maps to a single complete token ring
[Diagram: eight nodes sharing the token range from -2^63 to +2^63-1, grouped into racks and into two data centers]

REQUEST COORDINATION
• Coordinator: the node chosen by the client to receive a particular read or write request to its cluster
• Any node can coordinate any request, and each client request may be coordinated by a different node => no single point of failure
• The Cassandra driver chooses the coordinator node (round-robin or token-aware pattern)
• The driver is a client library that manages requests; many open-source drivers exist for many programming languages: Java, Python, C++, Node.js, C#, Perl, PHP, Go, Clojure, Ruby, Scala, R (GNU S), Erlang, Haskell, ODBC, Rust
• The coordinator manages the replication process
• Replication Factor (RF): onto how many nodes each write should be copied (a keyspace sketch follows below)
• The write occurs on the nodes responsible for that partition
• 1 ≤ RF ≤ number of nodes in the cluster
• Every write is time-stamped
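The replication factor is defined per keyspace when the keyspace is created. A minimal sketch, assuming a keyspace named hdb and two data centers named DC_PROD and DC_ANALYTICS; these names are illustrative placeholders, not taken from the slides.

-- Hedged sketch: NetworkTopologyStrategy lets each data center keep its
-- own number of replicas of every partition in this keyspace.
CREATE KEYSPACE IF NOT EXISTS hdb
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC_PROD': 3,        -- RF = 3 in the production data center
    'DC_ANALYTICS': 3    -- RF = 3 in the analytics data center
};

With RF = 3, the coordinator copies every write in this keyspace onto the three replica nodes responsible for the partition in each data center.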
CONSISTENCY
• The coordinator applies the Consistency Level (CL)
• Consistency Level (CL): the number of replica nodes that must acknowledge a request
• Examples of CL: ONE, TWO, THREE, ANY, ALL, QUORUM (= RF/2 + 1, using integer division), EACH_QUORUM, LOCAL_QUORUM
• The CL may vary for each request (a cqlsh sketch follows below)
• On success, the coordinator notifies the client (returning the most recent partition data in the case of a read request)

CONSISTENCY: ONE – READ – SINGLE DC
• With RF = 3 and CL = ONE, the coordinator sends a direct read request to one replica and digest read requests (hashes) to the other replicas, followed by an eventual read repair
[Diagram: read at CL ONE in a six-node, single data center cluster]

CONSISTENCY: QUORUM – READ – SINGLE DC
• With RF = 3 and CL = QUORUM, the coordinator sends a direct read request to one replica and digest read requests to the others, and waits until two of the three replicas have answered
• In case of inconsistency between the replicas, the most recent data is returned to the client
• A read repair is performed if needed
[Diagram: read at CL QUORUM in a six-node, single data center cluster]
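In cqlsh, the consistency level is set for the session with the CONSISTENCY shell command (a cqlsh command, not a CQL statement); application drivers typically set it per statement instead. The query reuses the hypothetical values from the earlier sketch.

-- Hedged sketch: reading at QUORUM from cqlsh.
-- With RF = 3, QUORUM = 3/2 + 1 = 2, so two replicas must answer.
CONSISTENCY QUORUM
SELECT data_time, value_r
FROM att_scalar_devdouble_ro
WHERE att_conf_id = 8fa2dbb0-fe99-11e4-a322-1697f925ec7b   -- hypothetical attribute id
  AND period = '2015-05-20-14';                            -- hypothetical hourly bucket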
CONSISTENCY: ONE – WRITE – SINGLE DC
• The coordinator forwards the write request to every replica of the partition (RF = 3)
• With CL = ONE, the coordinator reports SUCCESS to the client as soon as one replica has acknowledged the write
[Diagram: write at CL ONE in a six-node, single data center cluster]

CONSISTENCY: HINTED HANDOFF
• If a replica is down, the coordinator still reports SUCCESS at CL = ONE and stores a hint; the missed write is replayed when the replica comes back online
• Hints are kept for at most max_hint_window_in_ms (property in the cassandra.yaml file)
• If the node's downtime exceeds max_hint_window_in_ms, an anti-entropy node repair is needed to bring it back in sync

CONSISTENCY: QUORUM – WRITE – SINGLE DC
• With CL = QUORUM and RF = 3, the coordinator reports SUCCESS once two replicas have acknowledged the write
• If fewer than two replicas can acknowledge the write (for example when two of the three replicas are down), the coordinator reports FAILURE to the client (a write sketch follows below)
[Diagram: write at CL QUORUM in a six-node, single data center cluster]
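Writes go through the same machinery. A minimal sketch of an archiving-style insert at CL ONE, again with hypothetical attribute id and period values; the TTL simply illustrates the CQL feature mentioned earlier and is not an HDB++ setting.

-- Hedged sketch: write one sample at CL ONE, with an optional time-to-live.
-- CONSISTENCY is a cqlsh command; INSERT and UPDATE are both upserts, so
-- re-running this statement overwrites the same row with a newer timestamp.
CONSISTENCY ONE
INSERT INTO att_scalar_devdouble_ro
    (att_conf_id, period, data_time, data_time_us, value_r, quality)
VALUES
    (8fa2dbb0-fe99-11e4-a322-1697f925ec7b,   -- hypothetical attribute id
     '2015-05-20-14',                        -- hypothetical hourly bucket
     '2015-05-20 14:05:00', 123456, 42.0, 0)
USING TTL 2592000;                           -- keep this sample 30 days (illustration only)

Every write also carries a timestamp, which is what the coordinator uses to decide which value is "most recent" during reads and repairs.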
MONITORING TOOL: OPSCENTER
• DataStax OpsCenter: web-based cluster monitoring tool (e.g. http://cassandra2:8888)

HDB++
• libhdb++ defines the archiving interface used by the hdb++cm-srv (configuration manager) and hdb++es-srv (event subscriber) TANGO device servers
• libhdb++mysql implements this interface on top of MySQL
• libhdb++cassandra implements the same interface on top of a Cassandra cluster
[Diagram: one hdb++cm-srv and several hdb++es-srv instances writing through libhdb++ to either MySQL or Cassandra]

CONCLUSION: C* PROS
• High availability: software upgrades with no downtime, tolerance of hardware failures
• Linear scalability: need more performance? Add nodes
• Big community with industrial support
• Apache Spark can be used for analytics (distributed processing)
• List, set and map data types (tuples and user-defined types coming soon)
• Tries not to let you do things that do not perform well
• Backups = snapshots = hard links => very fast
• Difficult to lose data
• Good fit for time-series data

CONCLUSION: C* CONS
• Requires more total disk space and more machines
• The sstable format can change from one version to another, and there is no easy way to go back to a previous version once the sstables have been converted to the newer format
• Keyspaces and tables cannot easily be renamed (not foreseen in CQL)
• Existing partitions are difficult to modify (the data has to be duplicated at some point in the process)
• A different way of modelling data
• Not designed for huge read requests
• Can be tricky to tune to avoid long GC pauses
• Maintenance: nodetool repair must be run regularly if data is deleted, to avoid resurrections (a CPU-intensive operation)
• Reclaiming disk space after deletions can take quite some time in some specific cases

THE END

USEFUL LINKS
• http://cassandra.apache.org
• Planet Cassandra (http://planetcassandra.org)
• DataStax Academy (https://academy.datastax.com)
• Cassandra Java driver getting started (https://academy.datastax.com/demos/cassandra-java-driver-getting-started)
• Cassandra C++ driver: https://github.com/datastax/cpp-driver
• DataStax documentation (http://www.datastax.com/docs)
• Users mailing list: [email protected]
• #Cassandra channel on IRC (http://webchat.freenode.net/?channels=#Cassandra)

CASSANDRA FUTURE DEPLOYMENT
• 1 partition per hour (a query sketch for this hourly scheme closes the document)
• DC Prod: keyspace prod, RF:3, written at LOCAL_QUORUM; 7200 RPM disks, big CPU, 64 GB RAM
• DC Analytics 1: keyspace prod, RF:3, read at LOCAL_QUORUM; keyspace analytics, RF:3, written at LOCAL_QUORUM; SSD disks, big CPU, 128 GB RAM
• DC Analytics 2: keyspace analytics, RF:5, read at LOCAL_QUORUM; 7200 RPM disks, tiny CPU, 32 GB RAM

CASSANDRA'S NODE-BASED ARCHITECTURE
[Diagram only]

BASIC WRITE PATH CONCEPT
[Diagram only]

BASIC READ PATH CONCEPT
[Diagram only]
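Following up on the hourly partitioning above: because period is part of the partition key, a read spanning several hours must address each hourly partition explicitly. A minimal sketch with the same hypothetical attribute id and bucket format as before.

-- Hedged sketch: one partition per hour means a 3-hour read
-- has to enumerate the hourly buckets (or run one query per bucket).
SELECT data_time, value_r
FROM att_scalar_devdouble_ro
WHERE att_conf_id = 8fa2dbb0-fe99-11e4-a322-1697f925ec7b             -- hypothetical attribute id
  AND period IN ('2015-05-20-14', '2015-05-20-15', '2015-05-20-16');  -- hypothetical buckets

In practice, issuing one query per bucket and merging the results on the client side can be preferable to a large IN list, since every partition in the list has to be fetched through a single coordinator.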