
Lecture 5: Build-A-Cloud
http://www.cs.columbia.edu/~sambits/
Life Cycle in a Cloud
 Build a image(s) for the software/application that we want to host on cloud
(lecture 4)
 Request a VM – pass appropriate parameters such as resource needs and
image details (lecture 3)
 When the VM is started up, parameters are passed to it at appropriate run
levels to auto-configure the software image (lecture 4)
 Now, in this lecture:
– Let's monitor the provisioned VM
– Manage it at run time
– As the workload changes, adjust the amount of requested resources
2
What we shall learn
 We shall put together a cloud piece by piece
– Open Nebula as the cluster manager
– KVM as the hypervisor for host machines
– Creating and managing guest VMs
– Creating Cluster Application(s) using VMs
– Application level management
 Interesting sub-topics we will touch on
– Monitoring clusters and applications in such an environment
– Example of application-level management
o How to add on-demand resource scaling using OpenNebula and Ganglia
3
Cloud Setup
 Basic Management
– Image Management
– VM Monitoring & Management
– Host Monitoring & Management
[Diagram: private cloud client → Management Layer (Image Management, VM Management, VN Management, Host Management) → Infrastructure info]
4
Our stack for the cloud
 OpenNebula – for managing a set of host machines that have a hypervisor on them
 KVM – hypervisor on the host machines
 Ganglia – for monitoring the guest VMs
 Glue code for implementing Application management: e.g. resource scaling
5
OpenNebula Setup
 Install the OpenNebula management node
– Download and compile the source on the mgmt-node (easy installation; install as the oneadmin user)
– Set up sshd on all hosts that are to be added (also install Ruby on them)
– Allow root of the mgmt-node password-less access to all the managed hosts
– Set up the image repository (a shared-FS-based setup is required for live migration)
– If you do not have a Linux server, download VirtualBox and create a Linux VM on your laptop
 OpenNebula Architecture
– Tools written on top of OpenNebula interact with the core via XML-RPC
– The core exposes VM, host, and network management APIs
– The core stores all installation and monitoring information in an SQLite3 (or MySQL) DB
– Most of the DB information can be accessed using XML-RPC calls
– All the drivers are written in Ruby and run as daemons, which in turn call small shell scripts to get the work done
6
Create a Cloud
 Start the oned daemon
– Edit $ONEHOME/etc/oned.conf for necessary changes (quite intuitive)
– Put login:passwd in $ONEHOME/etc/one_auth
– “one start” does that
– All the DB files and logs are kept in $ONEHOME/var/
– NOTE: if you want a fresh setup, simply stop oned, delete $ONEHOME/var/, and start the OpenNebula daemon again
 Set up ssh on the host machines (allow oneadmin password-less entry)
– Append the admin-node's .ssh/id_rsa.pub to the host-server's .ssh/authorized_keys
– chmod 600 .ssh/authorized_keys
 Add hosts to OpenNebula
– Use the onehost command
o The command is written in Ruby
o It basically makes an XML-RPC call to the OpenNebula server's HostAllocate method
– E.g.
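As an illustration, an OpenNebula 1.4-era invocation might look like the following; the host name and the information/virtualization/transfer driver arguments are assumptions that depend on your installation:

```
onehost create host01 im_kvm vmm_kvm tm_ssh
onehost list
```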
7
 Configure the network
– Fixed: defines a fixed set of IP–MAC pairs
– Ranged: defines a network by a range (e.g. a class-C network)
– E.g. a fixed-set network setting (assuming you have a set of static IP addresses allotted to you, this is how to set it up)
Note: good site for help: http://www.opennebula.org/documentation:rel1.4:vgg
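A fixed-network definition, following the rel1.4 virtual-network guide linked above, could look roughly like this (bridge name, IPs, and MAC are illustrative):

```
NAME    = "StaticNet"
TYPE    = FIXED
BRIDGE  = br0
LEASES  = [IP=192.168.0.5]
LEASES  = [IP=192.168.0.6, MAC=50:20:20:20:20:21]
```

Saved as, say, static.net, it would be registered with "onevnet create static.net".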
8
How to access OpenNebula
 All APIs can be called using XML-RPC client libraries
– Nebula command line client (Ruby)
– Java Client
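For instance, a minimal Python client sketch; the port 2633 endpoint, the `one.hostpool.info` method name, and the plain `user:password` session string are assumptions that vary by OpenNebula version, so check the API docs for yours:

```python
import xmlrpc.client
import xml.etree.ElementTree as ET

ENDPOINT = "http://mgmt-node:2633/RPC2"   # assumed host/port; see oned.conf
SESSION = "oneadmin:oneadmin"             # session format varies by version

def host_names(hostpool_xml):
    """Extract every HOST NAME from a HOST_POOL XML document."""
    root = ET.fromstring(hostpool_xml)
    return [h.findtext("NAME") for h in root.findall("HOST")]

def list_hosts():
    """Query the core for the host pool and return the host names."""
    proxy = xmlrpc.client.ServerProxy(ENDPOINT)
    result = proxy.one.hostpool.info(SESSION)
    if not result[0]:                     # first element is a success flag
        raise RuntimeError(result[1])
    return host_names(result[1])

# Offline check against a hand-written sample document:
sample = "<HOST_POOL><HOST><NAME>host01</NAME></HOST></HOST_POOL>"
print(host_names(sample))  # → ['host01']
```

The same dotted-method pattern covers the VM and network calls.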
9
Setup Monitoring
 Requirements for monitoring
– We need something that stores resource monitoring data as a time series
– Exposes interfaces for querying it and simple aggregation of the data
– Automatically archives older data
 How to achieve it?
– Install Ganglia!
– Tune the VM images to automatically report their monitoring data via Ganglia
– Install gmond on the host servers
 What is Ganglia?
– It is open-source software (BSD license)
– Distributed monitoring of clusters and grids
– Stores time-series data, and historical data as archives (RRDs)
 How to get Ganglia
– Download the source code from http://ganglia.info/downloads.php
– For some Linux distributions, RPMs are available
10
Components of ganglia
 It has two prime daemons
– Gmond: a multi-threaded daemon that runs on the monitored nodes
o Collects data on the monitored nodes and broadcasts it as XML (can be accessed at port 8649)
o Configuration file: /etc/gmond.conf
– Gmetad:
o periodically polls a collection of child data sources
o parses the collected XML and saves all numeric metrics to round-robin databases
o exports the aggregated XML over a TCP socket to clients (port 8651)
o Configuration file: /etc/gmetad.conf
o One for each cluster
– Round-Robin Database
o RRDtool is a well-known tool for creating, storing, and retrieving/plotting RRD data
o Maintains data at various granularities; e.g., the defaults are:
• 1-hour data averaged over 15-sec intervals (rra[0])
• 1-day data averaged over 6-min intervals (rra[1])
• 1-week data averaged over 42-min intervals (rra[2])
– The web GUI tools
o A collection of PHP scripts, invoked by the web server, that extract the Ganglia data and generate the graphs for the website
– Additional tools
o gmetric to add extra stats – in fact anything you like: numbers or strings, with units etc.
o gstat to get at the Ganglia data to do anything else you like
Note: good site for help: http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia
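Since gmond serves its full XML state to any client that connects to port 8649, the data can be pulled programmatically. A minimal Python sketch; the host name, metric name, and sample document below are illustrative (real gmond output also carries a DOCTYPE and many more attributes):

```python
import socket
import xml.etree.ElementTree as ET

def read_gmond_xml(host, port=8649):
    """gmond dumps its full XML state to any client that connects."""
    chunks = []
    with socket.create_connection((host, port)) as s:
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode()

def metric(xml_text, host_name, metric_name):
    """Find the METRIC VAL for one host in a gmond/gmetad XML dump."""
    root = ET.fromstring(xml_text)
    for h in root.iter("HOST"):
        if h.get("NAME") == host_name:
            for m in h.iter("METRIC"):
                if m.get("NAME") == metric_name:
                    return m.get("VAL")
    return None

sample = ('<GANGLIA_XML><CLUSTER NAME="cloud">'
          '<HOST NAME="vm-01" IP="10.0.0.5">'
          '<METRIC NAME="load_one" VAL="0.42"/>'
          '</HOST></CLUSTER></GANGLIA_XML>')
print(metric(sample, "vm-01", "load_one"))  # → 0.42
```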
11
How to get monitoring-data?
 How to get the time-series data?
– Ganglia stores all RRDs under /var/lib/ganglia/rrds/<cluster_name>/<machine_ip>
– There is an .rrd file for each metric
– Data is collected at a fixed time interval (the default is 15 sec)
– One can retrieve the complete time series of monitored data from each .rrd file using rrdtool, e.g.:
o Get the average load_one for every 15 sec of the last hour:
o rrdtool fetch load_one.rrd AVERAGE --end now --start end-1h --resolution 15
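One hedged way to use this from a program is to shell out to rrdtool and parse its "timestamp: value" output lines; everything below except the fetch syntax quoted above is an illustrative sketch:

```python
import subprocess

def fetch_cmd(rrd_path):
    """Build the same query as above: 15-sec averages for the last hour."""
    return ["rrdtool", "fetch", rrd_path, "AVERAGE",
            "--start", "end-1h", "--end", "now", "--resolution", "15"]

def parse_fetch(output):
    """Turn 'timestamp: value' lines into (epoch, value) pairs, skipping NaNs."""
    series = []
    for line in output.splitlines():
        if ":" not in line:
            continue                      # metric-name header or blank line
        ts, val = line.split(":", 1)
        if not ts.strip().isdigit():
            continue
        v = float(val.split()[0])
        if v == v:                        # NaN is the only value where v != v
            series.append((int(ts), v))
    return series

def fetch(rrd_path):
    """Run rrdtool and return the parsed series (requires rrdtool installed)."""
    out = subprocess.run(fetch_cmd(rrd_path), capture_output=True,
                         text=True, check=True)
    return parse_fetch(out.stdout)

sample = "load_one\n\n1300000000: 1.5e-01\n1300000015: nan\n"
print(parse_fetch(sample))  # → [(1300000000, 0.15)]
```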
12
How to get monitoring-data? …
 How to access this data from inside a program
– Either use an ssh library (for Perl, Python, or Java) and remotely execute the rrdtool command with the correct parameters
– Or write a small XML-RPC server which exposes a function to run rrdtool fetch queries
– E.g. a Perl XML-RPC server:
use Frontier::Daemon;

# Expose a "sum" method over XML-RPC (replace it with an rrdtool-fetch wrapper).
my $d = Frontier::Daemon->new(
    methods   => { sum => \&sum },
    LocalAddr => $server_ip,
    LocalPort => $server_port,
    debug     => 1,
);

sub sum {
    my ($auth, $arg1, $arg2) = @_;
    return { SUCCESS => 1, MESSAGE => $arg1 + $arg2 };
}
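The same service can be written with Python's standard library instead of Perl's Frontier::Daemon; this is a stdlib-only sketch, not part of the lecture's stack (the ip/port arguments mirror $server_ip/$server_port above):

```python
from xmlrpc.server import SimpleXMLRPCServer

def total(auth, a, b):
    """Same shape as the Perl handler: an auth token plus two operands."""
    return {"SUCCESS": 1, "MESSAGE": a + b}

def serve(ip, port):
    """Blocks forever, answering XML-RPC calls to the method named 'sum'."""
    server = SimpleXMLRPCServer((ip, port))
    server.register_function(total, "sum")
    server.serve_forever()

print(total("token", 2, 3))  # → {'SUCCESS': 1, 'MESSAGE': 5}
```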
13
Create a Multi-tiered Clustered Application
 Lets us consider a two-tired TPC-W (web server and database performance
benchmark)
 How to create an application on custom-images
– Create a 6-GB file using dd (utility for converting and copying files)
– Attach a loop-back device to it
– Format it like a file-system (say ext3)
– Partition it into 3(swap, boot and root)
– Install complete OS and application stack on the relevant partitions.
– Install gmond and configure it.
– Save it as a custom-image
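A hedged command transcript of the image-creation steps above (file name, size, and device paths are illustrative; needs root, and the partition-mapping step assumes kpartx is installed):

```
dd if=/dev/zero of=guest.img bs=1M count=6144   # 6-GB backing file
losetup /dev/loop0 guest.img                    # attach a loop-back device
fdisk /dev/loop0                                # partition: swap, boot, root
kpartx -a /dev/loop0                            # expose /dev/mapper/loop0p1..p3
mkswap /dev/mapper/loop0p1
mkfs.ext3 /dev/mapper/loop0p2                   # format boot and root as ext3
mkfs.ext3 /dev/mapper/loop0p3
```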
 For TPC-W one will need:
– an Apache Tomcat server,
– the Java implementation of TPC-W,
– a MySQL server.
 We will need a load-balancer which can route HTTP packets to the various backend servers (and is also HTTP-session aware)
– I am using HAProxy (easy to install and configure)
– Nginx and lighttpd are other popular HTTP-proxy servers.
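A sticky-session HAProxy fragment for the two-backend setup might look roughly like this (listener address, backend IPs/ports, and cookie names are illustrative):

```
listen tpcw 0.0.0.0:80
    mode http
    balance roundrobin
    cookie SERVERID insert indirect
    server tpcw-0 10.0.0.10:8080 cookie tpcw0 check
    server tpcw-1 10.0.0.11:8080 cookie tpcw1 check
```

The cookie lines give the HTTP-session awareness mentioned above: a client keeps hitting the backend that served it first.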
14
Installing a multi-tier application
 Install a two-tiered Application
– Create a template of load-balancer
– Create a template of TPCW
– Deploy the LB-VM (using OpenNebula)
– Deploy the TPCW-VM (using OpenNebula)
– Attach TPCW application VMs to LB-VM
– Test using a web browser whether the setup is working
– Create a Client-template
– Deploy the client VM
– Test client
15
[Diagram: Client → LoadBalancer → TPCW-0, TPCW-1]
Application Level Operation
 One needs to maintain application-level information, e.g. which VM is the load-balancer and which VMs are backend servers.
 Keep the application-level knowledge in some local database.
 Application-level operation example: dynamic provisioning
 Case 1: increasing capacity using replication
– Monitor the average utilization of the VMs over, say, 1 min (using Ganglia)
– If the average utilization of all the VMs under the load-balancer is above, say, 70%:
o provision a new VM using OpenNebula (reactive provisioning is also supported by EC2)
o run the post-install script to add the new VM to the application
 Case 2: increasing capacity using migration/resizing
– Monitor the average utilization of the VMs over, say, 1 min (using Ganglia)
– If only one VM is over-utilized and its host does not have more resources:
o migrate it to another host and resize it to a higher capacity (note: OpenNebula does not support this)
o Migrate-and-resize the VM:
• Migrate the image to another host
• Change the VM configuration file to the new configuration
• Start the VM with the new configuration file (with more RAM and CPU)
16
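The two triggers above reduce to simple decision functions; a sketch under the stated assumptions (utilizations are 1-min averages in [0, 1], threshold 70%):

```python
def should_scale_out(utilizations, threshold=0.70):
    """Case 1: replicate when the average utilization across all VMs
    behind the load-balancer exceeds the threshold."""
    avg = sum(utilizations) / len(utilizations)
    return avg > threshold

def pick_overloaded(utilizations, threshold=0.70):
    """Case 2 trigger: return the index of the single over-utilized VM,
    or None if zero or several VMs are hot."""
    hot = [i for i, u in enumerate(utilizations) if u > threshold]
    return hot[0] if len(hot) == 1 else None

print(should_scale_out([0.8, 0.9, 0.75]))  # → True
print(pick_overloaded([0.3, 0.95, 0.4]))   # → 1
```

The glue code would feed these from Ganglia and act via OpenNebula.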
Application Level Operations (e.g. Dynamic Provisioning) …
 Where and how to implement the application-scaling logic:
– The application-scaling logic needs knowledge of the application topology
o It obviously resides above the infrastructure-management layer (i.e. OpenNebula)
– Choose an easy-to-build language (Perl, Python, Ruby, Java, etc.)
o An XML-RPC client is required to access OpenNebula
– Write a management program, using the language of your choice, which
o Installs the multi-tier application and stores the application topology in a local DB
o Periodically monitors:
• the average load on each server
• proxy errors
o Implements case 1 and case 2
o A post-install script adds the new VM to the load-balancer and restarts it
 Problem: live-resize and migrate-and-resize are not present in OpenNebula
– Hack: create a script which does the following (very dirty, but it works)
o Migrate the current VM to the destination host
o Alter the configuration file of this migrated VM
o Destroy and recreate the VM
– Neater solution
o Add a class in include/RequestManager.h (say VirtualMachineResize, similar to the class VirtualMachineMigrate)
o Add another method in src/rm/RequestManager.cc (say migrateResize)
o Implement the class in src/rm/RequestManagerResize.cc (implement the resize)
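The template edit in the middle of the hack is easy to script; a sketch where the MEMORY/CPU attribute names follow OpenNebula VM templates but the onevm commands in the comments are hedged, version-dependent assumptions:

```python
import re

def resize_template(template, memory_mb, cpu):
    """Rewrite the MEMORY and CPU attributes of an OpenNebula VM template."""
    template = re.sub(r"(?m)^MEMORY\s*=.*$", f"MEMORY = {memory_mb}", template)
    template = re.sub(r"(?m)^CPU\s*=.*$", f"CPU = {cpu}", template)
    return template

# The "dirty" sequence around it would be roughly (treat the exact
# command syntax as an assumption for your OpenNebula version):
#   onevm migrate <vmid> <dst_host>     # move the VM
#   ...rewrite the saved template with resize_template()...
#   onevm delete <vmid>                 # destroy, then recreate bigger:
#   onevm create resized.tmpl

tmpl = "NAME = tpcw-0\nMEMORY = 512\nCPU = 0.5\n"
print(resize_template(tmpl, 2048, 1.0))
```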
17
Solution Architecture
 Application Manager (written on top of OpenNebula): the high-level control flow is:
– Periodically monitor workload changes and application performance
– Manage the current configuration and actuate configuration changes
– Calculate the changed capacity (using some model plus feedback from the monitoring block)
– Find the new configuration of the application
– Start the process of actuating the change
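The control flow above can be written as a small loop; plan_capacity is a toy proportional model of my own (the 50% utilization target and the monitor/actuate callbacks are assumptions, not the lecture's):

```python
import math
import time

def plan_capacity(current_vms, avg_util, target=0.50):
    """Toy model: size the fleet so average utilization lands near target."""
    return max(math.ceil(current_vms * avg_util / target), 1)

def manager_loop(monitor, actuate, period=60):
    """monitor() -> (vm_count, avg_util); actuate(n) resizes the fleet."""
    while True:
        vms, util = monitor()
        wanted = plan_capacity(vms, util)
        if wanted != vms:
            actuate(wanted)
        time.sleep(period)

print(plan_capacity(2, 0.75))  # → 3
```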
18
How to use (demo!)
 Command-line scripts
– VM lifecycle steps
o Creation: show the template and image naming
o Suspension: just the command
o Migration: migration (suspend and migrate)
o Deletion: removing the image
– Show Ganglia monitoring
o Host monitoring through the VM lifecycle
o VM monitoring
19
Cloud Management using this Setup
 Integrate OpenNebula monitoring with Ganglia and make it more efficient
 Use monitoring for VM placement on hosts
 Use monitoring to do reactive provisioning
20