Cassandra Tutorial
Workshop in information security by Yosi Barad, Ainat Chervin and Ilia Oshmiansky

Cassandra - Tutorial

Introduction:
This document aims to provide a few easy-to-follow steps for the first-time user. We will cover the following subjects regarding the Cassandra database:
- Installation and configuration of Cassandra on Windows.
- Installation and configuration of Cassandra on Linux.
- Running a single Cassandra node.
- Examples of usage.
- Extending Cassandra to multiple nodes.

Installation and configuration of Cassandra on Windows:
1. Cassandra is a Java-based application, so first you need to install Java on your machine. You can download the latest JRE from here: http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Download Cassandra from here: http://cassandra.apache.org/download/
3. Extract the Cassandra files, e.g. to c:\cassandra
4. Set the environment variables: go to System Properties, click on the Advanced tab and then click on the Environment Variables button. Add the following new variables and values:
   JAVA_HOME=c:\Program Files\Java\jre6\
   This value should be the path to the JRE directory.
   CASSANDRA_HOME=c:\cassandra
   This value should be the path to where you extracted Cassandra.
5. Go to the conf folder inside the Cassandra directory.
   Edit the cassandra.yaml file and replace the default /var/lib/cassandra paths in the data_file_directories, commitlog_directory and saved_caches_directory entries with paths inside your Cassandra directory.
   Default values:
   data_file_directories:
       - /var/lib/cassandra/data
   commitlog_directory: /var/lib/cassandra/commitlog
   saved_caches_directory: /var/lib/cassandra/saved_caches
   New values:
   data_file_directories:
       - c:\cassandra\lib\cassandra\data
   commitlog_directory: c:\cassandra\lib\cassandra\commitlog
   saved_caches_directory: c:\cassandra\lib\cassandra\saved_caches
   Edit the log4j-server.properties file.
   Change the log4j.appender.R.File line so that it points at a system log file to be created in the Cassandra folder. Use forward slashes, because a backslash is an escape character in .properties files:
   log4j.appender.R.File=c:/cassandra/log/cassandra/system.log
6. Since we added new environment variables to the system, you need to restart the computer for the changes to take effect. If you prefer to restart later, continue to the next step instead.
7. Open the command prompt from the Start menu and enter the following commands:
   set CASSANDRA_HOME=c:\cassandra
   This should be the path to the Cassandra folder.
   set JAVA_HOME=c:\Program Files\Java\jre6\
   This should be the path to the Java folder.
   After that, Cassandra should work properly without a restart (variables set this way apply only to the current command prompt window).

Installation and configuration of Cassandra on Linux:
1. Cassandra is a Java-based application, so first you need to install Java on your machine. You can download the latest JRE from here: http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Download Cassandra from here: http://cassandra.apache.org/download/
3. Extract the Cassandra files, e.g. to /specific/disk1/temp/cassandra/
4. Set the environment variables: add the following new variables and values to the system (setenv is the csh/tcsh syntax; in bash, use export CASSANDRA_HOME=... instead):
   setenv CASSANDRA_HOME "/specific/disk1/temp/cassandra"
   This should be the path to the Cassandra folder.
   setenv JAVA_HOME "/usr/local/lib/jdk-6u25-ea-bin-b03"
   This should be the path to the Java folder.
5. Go to the conf folder inside the Cassandra directory and change the following cassandra.yaml values.
   Default values:
   data_file_directories:
       - /var/lib/cassandra/data
   commitlog_directory: /var/lib/cassandra/commitlog
   saved_caches_directory: /var/lib/cassandra/saved_caches
   New values:
   data_file_directories:
       - /specific/disk1/temp/cassandra/lib/cassandra/data
   commitlog_directory: /specific/disk1/temp/cassandra/lib/cassandra/commitlog
   saved_caches_directory: /specific/disk1/temp/cassandra/lib/cassandra/saved_caches
   Edit the log4j-server.properties file. Change the log4j.appender.R.File line to point at the system log file to be created in the Cassandra folder:
   log4j.appender.R.File=/specific/disk1/temp/cassandra/log/cassandra/system.log

Running a single Cassandra node:
1. Now we are ready to run Cassandra. Enter the following command at the command prompt (or terminal) from the Cassandra folder:
   bin/cassandra -f
2. Cassandra should start up and begin listening for client connections.
3. To stop Cassandra, press Control+C and the server will shut down.

Examples of usage:
1. Once the Cassandra server is up, start the client shell with:
   bin/cassandra-cli -host <ip address> -p 9160
   For example:
   bin/cassandra-cli -host 127.0.0.1 -p 9160
   You should then see the cassandra-cli prompt.
2. At any time you can view the help menu by entering the command:
   help;
3. First, we'll create a new keyspace for our test called DEMO:
   create keyspace DEMO;
4. Next, we'll switch to the keyspace and create a new column family called Users:
   use DEMO;
   create column family Users;
5. Now you can store data in the Users column family by inserting a row. In this example we insert a row with the row key '1234' and set two columns in it, named 'name' and 'password'. Wrapping a value in utf8() tells Cassandra to treat it as a UTF-8 string.
   Next, you can retrieve the columns stored in that row.
6. You can use the assume command to tell Cassandra the data types of the row key, the column names and the column values.
7. You can exit the client shell at any time with:
   exit;

Extending Cassandra to multiple nodes:
To add a node to a Cassandra cluster, you have to carry out the same series of operations on each node you would like to add.
First, make sure that Cassandra is installed properly on the new node by performing all the steps described in the installation and configuration sections of this document. In addition, you must perform the configuration steps described below before starting the expanded cluster.
To expand a single node to a two-node cluster, as we do in the examples on this page, you must edit the cassandra.yaml configuration file located in the conf folder under the Cassandra directory. The following values must be specified on both the existing and the new nodes:
- seeds: the list of seed nodes for the cluster.
- rpc_address and listen_address: the network addresses on which the node listens.
- initial_token: defines the node's token range, which is used to balance the load across the cluster.

1. Seed List:
You must specify at least one node to act as a seed for other nodes joining the ring. When nodes are added, the seed nodes provide the information required to join the ring, such as which other nodes are in it and where they are located. After a node joins the ring, it shares ring information through the gossip protocol and makes no further special contact with the seed node. There is no strict rule for choosing which hosts to list as seeds, but all nodes in a cluster should have the same seed list.
To configure the seed list, edit cassandra.yaml on each node and add the first node (132.67.104.197 in this example) as the seed:
seeds: "132.67.104.197"
If more than one seed node should be defined, use the following pattern:
seeds: "<ip-1>,<ip-2>,…,<ip-n>"

2. Listen Address and RPC Address:
For nodes to communicate with clients and with each other, you need to specify the interfaces on which each node listens for client traffic (via Thrift) and for internode gossip traffic within the cluster. Set rpc_address to an interface accessible by clients, and listen_address to an interface routable from the other servers in the cluster.
To configure the listen_address and rpc_address settings, edit cassandra.yaml on all nodes in the cluster and replace the default localhost entries with the interfaces that will listen for traffic.
For the first node in this example:
listen_address: 132.67.104.197
...
rpc_address: 132.67.104.197
And for the second node (132.67.104.238 in this example):
listen_address: 132.67.104.238
...
rpc_address: 132.67.104.238

3. Initial Token Values:
Whenever you expand the node capacity of a Cassandra cluster, you need to set each node's initial_token explicitly in cassandra.yaml. This is required for all nodes in order to balance the load evenly. The very first node in the cluster is set to zero and its initial_token never needs to be edited, but all other tokens must be recalculated every time you expand the cluster.
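The token lists that follow are evenly spaced around the ring. Assuming the default RandomPartitioner, whose tokens run from 0 to 2^127, node i of an N-node cluster gets floor(i * 2^127 / N). The short Python sketch below reproduces those lists and can generate values for larger clusters; the helper name initial_tokens is our own and is not part of Cassandra.

```python
def initial_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial_token values for an N-node ring.

    Assumption: the RandomPartitioner token space of size 2**127,
    so node i gets floor(i * 2**127 / N).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Integer arithmetic: a float cannot represent 39-digit tokens exactly.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print the values to paste into each node's cassandra.yaml.
    for i, token in enumerate(initial_tokens(3)):
        print(f"node {i}: {token}")
```

Running the script prints the three-node values from the list that follows. Integer division is used instead of floating point because a double has only 53 bits of precision, far fewer than these tokens need.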
To determine the correct initial token values for each node in the cluster, you can use the following token configurations:

One node:
node 0: 0

Two nodes:
node 0: 0
node 1: 85070591730234615865843651857942052864

Three nodes:
node 0: 0
node 1: 56713727820156410577229101238628035242
node 2: 113427455640312821154458202477256070485

Four nodes:
node 0: 0
node 1: 42535295865117307932921825928971026432
node 2: 85070591730234615865843651857942052864
node 3: 127605887595351923798765477786913079296

Five nodes:
node 0: 0
node 1: 34028236692093846346337460743176821145
node 2: 68056473384187692692674921486353642291
node 3: 102084710076281539039012382229530463436
node 4: 136112946768375385385349842972707284582

Six nodes:
node 0: 0
node 1: 28356863910078205288614550619314017621
node 2: 56713727820156410577229101238628035242
node 3: 85070591730234615865843651857942052864
node 4: 113427455640312821154458202477256070485
node 5: 141784319550391026443072753096570088106

Seven nodes:
node 0: 0
node 1: 24305883351495604533098186245126300818
node 2: 48611766702991209066196372490252601636
node 3: 72917650054486813599294558735378902454
node 4: 97223533405982418132392744980505203273
node 5: 121529416757478022665490931225631504091
node 6: 145835300108973627198589117470757804909

Eight nodes:
node 0: 0
node 1: 21267647932558653966460912964485513216
node 2: 42535295865117307932921825928971026432
node 3: 63802943797675961899382738893456539648
node 4: 85070591730234615865843651857942052864
node 5: 106338239662793269832304564822427566080
node 6: 127605887595351923798765477786913079296
node 7: 148873535527910577765226390751398592512

Nine nodes:
node 0: 0
node 1: 18904575940052136859076367079542678414
node 2: 37809151880104273718152734159085356828
node 3: 56713727820156410577229101238628035242
node 4: 75618303760208547436305468318170713656
node 5: 94522879700260684295381835397713392071
node 6: 113427455640312821154458202477256070485
node 7: 132332031580364958013534569556798748899
node 8: 151236607520417094872610936636341427313

Ten nodes:
node 0: 0
node 1: 17014118346046923173168730371588410572
node 2: 34028236692093846346337460743176821145
node 3: 51042355038140769519506191114765231718
node 4: 68056473384187692692674921486353642291
node 5: 85070591730234615865843651857942052864
node 6: 102084710076281539039012382229530463436
node 7: 119098828422328462212181112601118874009
node 8: 136112946768375385385349842972707284582
node 9: 153127065114422308558518573344295695155

If you would like to set up a larger cluster, you can use the token calculator at: http://blog.milford.io/cassandra-token-calculator/
Finally, we'll start the nodes in the cluster.

Starting a Cassandra Cluster:
Start the seed node and verify that it is running with the nodetool ring command. Then start the remaining node. After a few minutes, once the nodes have exchanged ring information, all of the nodes should be up. Run nodetool ring again: its output should list every node in the cluster with the status Up, which indicates that the nodes are running correctly.
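nodetool ring is the authoritative check that the nodes have joined the ring. As a quick complementary test, a client machine can confirm that each node accepts TCP connections on its rpc_address at the Thrift port 9160 used with cassandra-cli earlier. The following is a minimal sketch using only the Python standard library; the helper name port_is_open and the NODES list are illustrative assumptions, and an open port on its own does not prove that a node has joined the ring.

```python
import socket

# Illustrative values (assumptions): the two rpc_address settings from the
# example above and the Thrift port used with cassandra-cli. Adjust as needed.
NODES = ["132.67.104.197", "132.67.104.238"]
THRIFT_PORT = 9160


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    for node in NODES:
        state = "reachable" if port_is_open(node, THRIFT_PORT) else "NOT reachable"
        print(f"{node}:{THRIFT_PORT} {state}")
```

If a node shows as unreachable, check that its rpc_address is an interface the client can route to and that the node has finished starting.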