Cassandra - Cs Team Site | courses.cs.tau.ac.il

Cassandra Tutorial
Workshop in information security by Yosi Barad, Ainat Chervin and Ilia Oshmiansky
Cassandra - Tutorial
Introduction:
This document provides a few easy-to-follow steps for the first-time user.
We will cover the following subjects regarding the Cassandra database:
- Installation and configuration of Cassandra on Windows.
- Installation and configuration of Cassandra on Linux.
- Running a single Cassandra node.
- Examples of usage.
- Extending Cassandra to multiple nodes.
Installation and configuration of Cassandra on Windows:
1. Cassandra is a Java-based application, so you first need to install Java on your machine.
You can download the latest JRE from here:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Download Cassandra from here: http://cassandra.apache.org/download/
3. Extract the Cassandra files, e.g. to c:\cassandra
4. Set environment variables:
- Go to System Properties.
- Click on the Advanced tab, then click on the Environment Variables button.

- Add the following new variables and values:
  JAVA_HOME=c:\Program Files\Java\jre6\
  This value should be the path to the JRE directory.
  CASSANDRA_HOME=c:\cassandra
  This value should be the path to the directory where you extracted Cassandra.
5. Go to the conf folder inside the Cassandra directory:
- Edit the cassandra.yaml file and change the var prefix in the data_file_directories, commitlog_directory, and saved_caches_directory entries to your Cassandra directory, as follows:

  Default value:
  data_file_directories:
  - var/lib/cassandra/data
  Change to:
  data_file_directories:
  - c:\cassandra\lib\cassandra\data

  Default value:
  commitlog_directory: var/lib/cassandra/commitlog
  Change to:
  commitlog_directory: c:\cassandra\lib\cassandra\commitlog

  Default value:
  saved_caches_directory: var/lib/cassandra/saved_caches
  Change to:
  saved_caches_directory: c:\cassandra\lib\cassandra\saved_caches

- Edit the log4j-server.properties file and change the log4j.appender.R.File line to point at the system log file to be created in the Cassandra folder:
  log4j.appender.R.File=c:\cassandra\log\cassandra\system.log
6. Because we added new variables to the system environment, the computer must be restarted for the changes to take effect. If you prefer to restart later, continue with the next step, which sets the variables for the current command prompt session so that Cassandra works without a restart.
7. Open the command prompt from the Start menu and enter the following commands:
set CASSANDRA_HOME=c:\cassandra
This should be the path to the Cassandra folder.
set JAVA_HOME=c:\Program Files\Java\jre6\
This should be the path to the Java folder.
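Before starting Cassandra it can be useful to confirm that both variables are visible and point to real directories. The following is a minimal Python sketch of such a check; the function name missing_home_dirs is hypothetical and is not part of Cassandra or of this tutorial's original material.

```python
import os


def missing_home_dirs(env, names=("JAVA_HOME", "CASSANDRA_HOME")):
    """Return the variable names that are unset or do not point to a directory.

    env is any mapping of variable names to values, e.g. os.environ.
    """
    problems = []
    for name in names:
        value = env.get(name, "")
        if not value or not os.path.isdir(value):
            problems.append(name)
    return problems


# Check the environment of the current session.
bad = missing_home_dirs(os.environ)
if bad:
    print("Check these variables:", ", ".join(bad))
else:
    print("JAVA_HOME and CASSANDRA_HOME both point to existing directories.")
```

Run it from the same command prompt (or terminal) in which the variables were set, since variables set with the set command are only visible in that session.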
Installation and configuration of Cassandra on Linux:
1. Cassandra is a Java-based application, so you first need to install Java on your machine.
You can download the latest JRE from here:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Download Cassandra from here: http://cassandra.apache.org/download/
3. Extract the Cassandra files, e.g. to /specific/disk1/temp/cassandra/
4. Set environment variables:
- Add the following new variables and values to the system (setenv is csh/tcsh syntax):
  setenv CASSANDRA_HOME "/specific/disk1/temp/cassandra:."
  This should be the path to the Cassandra folder.
  setenv JAVA_HOME "/usr/local/lib/jdk-6u25-ea-bin-b03:."
  This should be the path to the Java folder.
5. Go to the conf folder inside the Cassandra directory:
- Edit the cassandra.yaml file and change the following values:

  Default value:
  data_file_directories:
  - var/lib/cassandra/data
  Change to:
  data_file_directories:
  - /specific/disk1/temp/cassandra/lib/cassandra/data

  Default value:
  commitlog_directory: var/lib/cassandra/commitlog
  Change to:
  commitlog_directory: /specific/disk1/temp/cassandra/lib/cassandra/commitlog

  Default value:
  saved_caches_directory: var/lib/cassandra/saved_caches
  Change to:
  saved_caches_directory: /specific/disk1/temp/cassandra/lib/cassandra/saved_caches

- Edit the log4j-server.properties file and change the log4j.appender.R.File line to point at the system log file to be created in the Cassandra folder:
  log4j.appender.R.File=/specific/disk1/temp/cassandra/log/cassandra/system.log
Running a single Cassandra node:
1. Now we are ready to run Cassandra. Enter the following command at the command prompt (or terminal) from the Cassandra folder:
bin/cassandra -f
2. Cassandra should start up and listen for client connections.
3. To stop Cassandra, press Ctrl+C and the server will shut down.
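One way to confirm that the node is accepting client connections is to try opening a TCP connection to the client port. The sketch below assumes the node listens on 127.0.0.1 and on port 9160, the port used for the client shell later in this tutorial. The helper name is_listening is hypothetical.

```python
import socket


def is_listening(host="127.0.0.1", port=9160, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if is_listening():
    print("Something is accepting connections on 127.0.0.1:9160.")
else:
    print("Nothing is listening on 127.0.0.1:9160 yet.")
```

A successful connection only shows that some process accepts connections on that port. It does not prove that the process is Cassandra or that the node is healthy.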
Examples of usage:
1. Once the Cassandra server is up, start the client shell with:
bin/cassandra-cli -host <ip address> -p 9160
For example:
bin/cassandra-cli -host 127.0.0.1 -p 9160
You will then see the cassandra-cli prompt.
2. At any time you can display the help menu by entering the command: help;
3. First we'll create a new keyspace for our test called DEMO:
create keyspace DEMO;
4. Next we'll use the keyspace and create a new column family called Users:
use DEMO;
create column family Users;
5. Now you can store data in the Users column family by inserting a row.
In this example we insert a row into the Users column family. The row key is '1234', and we set two columns in the row, named 'name' and 'password'. 'utf8()' means that the data is treated as a UTF-8 string.
The columns of the row can then be retrieved again.
6. You can use the 'assume' command to let the client shell know the data type of the key, column name and value.
7. You can exit the client shell at any time with: exit;
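The insert and retrieve commands from step 5 are not reproduced in this text, so the following Python sketch only illustrates the likely shape of the statements. It assumes the cassandra-cli 'set' and 'get' syntax with utf8() wrappers. The column values are placeholders, because the values actually stored are not stated in this tutorial. The helper names are hypothetical.

```python
def utf8(text):
    """Wrap a literal in the CLI's utf8() function (no quote escaping is done)."""
    return "utf8('%s')" % text


def set_statements(column_family, row_key, columns):
    """Build one cassandra-cli 'set' statement per (name, value) column pair."""
    return [
        "set %s[%s][%s] = %s;" % (column_family, utf8(row_key), utf8(name), utf8(value))
        for name, value in columns
    ]


def get_statement(column_family, row_key):
    """Build the cassandra-cli 'get' statement that reads back a whole row."""
    return "get %s[%s];" % (column_family, utf8(row_key))


# Placeholder values: replace <name> and <password> with real data.
for line in set_statements("Users", "1234", [("name", "<name>"), ("password", "<password>")]):
    print(line)
print(get_statement("Users", "1234"))
```

The printed statements can be typed at the cassandra-cli prompt after 'use DEMO;'. Check them against the shell's own 'help;' output, since the exact syntax may differ between Cassandra versions.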
Extend Cassandra to multiple nodes:
To add nodes to a Cassandra cluster, you have to repeat the same series of operations on each node you would like to add.
First make sure that Cassandra is installed properly on the new node: perform all the steps described in the installation and configuration section of this document.
In addition, you must perform the configuration steps described below before starting the expanded cluster.
To expand a single node to a two-node cluster, as we do in the examples on this page, you must edit the configuration file cassandra.yaml, which is located in the conf folder under the Cassandra directory.
The following values must be specified on both the existing and new nodes:
- seeds: the list of seeds for the cluster.
- rpc_address and listen_address: the network addresses on which the nodes listen.
- initial_token: defines the node's token range for load balancing in the cluster.
1. Seed List:
You must specify at least one node to act as a seed for other nodes joining the ring.
When additional nodes are added, the seed nodes provide the information required to join the ring, such as which other nodes are included in it and where they are located.
After a node joins the ring, it shares ring information through the gossip protocol and does not make any further special contact with the seed node.
There is no strict rule for which hosts should be listed as seeds, but all nodes in a cluster should have the same seed list.
To configure the seed list:
Edit cassandra.yaml on each node and add the first node (132.67.104.197 in this example) as the seed in each.
seeds: "132.67.104.197"
If more than one seed node should be defined, use the following pattern:
seeds: "<ip-1>,<ip-2>,…,<ip-n>"
2. Listen Address and RPC Address:
Each node needs to know the interfaces on which it listens for client traffic (via Thrift) and for inter-node traffic (via the gossip protocol).
Set the rpc_address value to an interface accessible by clients, and the listen_address value to an interface routable from the other servers in the cluster.
To configure the listen_address and rpc_address settings:
Edit cassandra.yaml on all nodes in the cluster and replace the default localhost entries with the interfaces that will listen for traffic.
For the first node in this example:
listen_address: 132.67.104.197
...
rpc_address: 132.67.104.197
And for the second node (132.67.104.238 in this example):
listen_address: 132.67.104.238
...
rpc_address: 132.67.104.238
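Because the seed list must be identical on every node while the two addresses differ per node, it can help to generate the lines from one list of node addresses. This is a minimal Python sketch that uses the two example addresses from this section. It prints the settings in the form shown above and does not edit cassandra.yaml. The function name node_settings is hypothetical.

```python
def node_settings(node_ip, seed_ips):
    """Render the seeds, listen_address and rpc_address lines for one node."""
    return "\n".join([
        'seeds: "%s"' % ",".join(seed_ips),   # same seed list on every node
        "listen_address: %s" % node_ip,       # address used by other nodes
        "rpc_address: %s" % node_ip,          # address used by clients
    ])


nodes = ["132.67.104.197", "132.67.104.238"]
seeds = nodes[:1]  # the first node acts as the only seed, as in this tutorial
for ip in nodes:
    print("# settings for node %s" % ip)
    print(node_settings(ip, seeds))
```

This sketch uses the same address for listen_address and rpc_address, as the example above does. If clients reach a node through a different interface, pass a separate value for rpc_address instead.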
3. Initial Token Values:
Whenever you expand the node capacity of a Cassandra cluster, you need to explicitly set each node's initial token in cassandra.yaml. This is required for all nodes in order to balance the load evenly.
The very first node in the cluster is already set to zero, and its initial_token value never needs to be edited, but all other tokens must be recalculated every time you expand the cluster.
To determine the correct initial token value for each node in the cluster, use the following token configurations:
One Node:
node 0: 0
Two Nodes:
node 0: 0
node 1: 85070591730234615865843651857942052864
Three Nodes:
node 0: 0
node 1: 56713727820156410577229101238628035242
node 2: 113427455640312821154458202477256070485
Four Nodes:
node 0: 0
node 1: 42535295865117307932921825928971026432
node 2: 85070591730234615865843651857942052864
node 3: 127605887595351923798765477786913079296
Five Nodes:
node 0: 0
node 1: 34028236692093846346337460743176821145
node 2: 68056473384187692692674921486353642291
node 3: 102084710076281539039012382229530463436
node 4: 136112946768375385385349842972707284582
Six Nodes:
node 0: 0
node 1: 28356863910078205288614550619314017621
node 2: 56713727820156410577229101238628035242
node 3: 85070591730234615865843651857942052864
node 4: 113427455640312821154458202477256070485
node 5: 141784319550391026443072753096570088106
Seven Nodes:
node 0: 0
node 1: 24305883351495604533098186245126300818
node 2: 48611766702991209066196372490252601636
node 3: 72917650054486813599294558735378902454
node 4: 97223533405982418132392744980505203273
node 5: 121529416757478022665490931225631504091
node 6: 145835300108973627198589117470757804909
Eight Nodes:
node 0: 0
node 1: 21267647932558653966460912964485513216
node 2: 42535295865117307932921825928971026432
node 3: 63802943797675961899382738893456539648
node 4: 85070591730234615865843651857942052864
node 5: 106338239662793269832304564822427566080
node 6: 127605887595351923798765477786913079296
node 7: 148873535527910577765226390751398592512
Nine Nodes:
node 0: 0
node 1: 18904575940052136859076367079542678414
node 2: 37809151880104273718152734159085356828
node 3: 56713727820156410577229101238628035242
node 4: 75618303760208547436305468318170713656
node 5: 94522879700260684295381835397713392071
node 6: 113427455640312821154458202477256070485
node 7: 132332031580364958013534569556798748899
node 8: 151236607520417094872610936636341427313
Ten Nodes:
node 0: 0
node 1: 17014118346046923173168730371588410572
node 2: 34028236692093846346337460743176821145
node 3: 51042355038140769519506191114765231718
node 4: 68056473384187692692674921486353642291
node 5: 85070591730234615865843651857942052864
node 6: 102084710076281539039012382229530463436
node 7: 119098828422328462212181112601118874009
node 8: 136112946768375385385349842972707284582
node 9: 153127065114422308558518573344295695155
If you would like to set up a larger cluster you may check the token calculator on:
http://blog.milford.io/cassandra-token-calculator/
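The values in the lists above are consistent with dividing a token space of size 2**127 evenly among the nodes and rounding down. The following Python sketch reproduces them under that assumption. The function name initial_tokens is hypothetical, and the formula is inferred from the listed values rather than stated in this tutorial.

```python
RING_SIZE = 2 ** 127  # assumed size of the token space, inferred from the lists


def initial_tokens(node_count):
    """Return evenly spaced initial_token values for node 0 .. node_count - 1."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps full precision for these 39-digit values.
    return [i * RING_SIZE // node_count for i in range(node_count)]


for index, token in enumerate(initial_tokens(4)):
    print("node %d: %d" % (index, token))
```

For four nodes this prints the same four values as the "Four Nodes" list above. Python integers have arbitrary precision, so no rounding error is introduced for larger clusters.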
Finally we'll start the nodes in the cluster.
Starting a Cassandra Cluster:
Start the seed node and verify connectivity with the nodetool ring command, as for a single node. Then start the remaining node. After a few minutes, once the nodes have exchanged data, all of them should be up. Run the nodetool ring command again; it should list all the nodes in the cluster, which implies that the nodes are running correctly.