HP Helion CloudSystem Monasca Agent

HP Helion CloudSystem Enterprise and Foundation Software
HP Helion CloudSystem 9.0: Monasca Agent
This white paper describes the Monitoring-as-a-Service (Monasca) agent in CloudSystem.
Overview
Monasca, an HP open source project, is a comprehensive cloud monitoring solution for OpenStack-based clouds.
Monasca enables users to understand the operational effectiveness of the services and underlying infrastructure
that make up their cloud, and provides actionable details when there is a problem. System status and supporting
metrics are constantly monitored, readily available, and trackable, making system management tasks more
timely and predictable.
Monasca uses node-based agents to report metrics to a centralized collection point, where alarms are triggered.
The Monasca agent is written in Python and consists of several sub-components. The agent supports system
metrics, such as CPU utilization and available memory. The agent also supports StatsD and built-in checks for
services such as MySQL and RabbitMQ.
The Monasca agent includes:
• System metrics, for example, CPU and memory utilization
• Integrated StatsD daemon that can be used by applications via a StatsD client library
• Process checks that return several metrics on a process, such as number of instances, memory, I/O, and
  threads
• Service checks, for example, MySQL and RabbitMQ
• OpenStack metrics
• Automatic checks that detect and set up checks on certain processes and resources
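As a minimal sketch of how an application might hand a metric to the integrated StatsD daemon, the following Python builds a StatsD line and sends it as a UDP datagram. The address, the port (8125 is the conventional StatsD port), and the metric name are assumptions for illustration, not values taken from CloudSystem. In practice, a StatsD client library would normally do this work.

```python
import socket

# Assumed address of the agent's StatsD daemon. 8125 is the conventional
# StatsD port; this paper does not state the port used in CloudSystem.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_statsd(name, value, metric_type="g"):
    """Build a StatsD line: <name>:<value>|<type> (c=counter, g=gauge, ms=timer)."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError("unsupported StatsD metric type: %r" % metric_type)
    return "%s:%s|%s" % (name, value, metric_type)


def send_statsd(line, addr=STATSD_ADDR):
    """Send one StatsD line as a UDP datagram (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), addr)
    finally:
        sock.close()


if __name__ == "__main__":
    # "myapp.requests" is a hypothetical application metric name.
    send_statsd(format_statsd("myapp.requests", 1, "c"))
```

Because StatsD uses UDP, the sender does not learn whether the daemon received the datagram, so an application is not blocked when the agent is unavailable.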
Deployment
The Monasca agent is installed on all of the CloudSystem virtual appliances and KVM compute nodes. Metrics are
collected on all servers where the agent is installed. When the Monasca agent monitors ESXi compute nodes, it
collects the required information from VMware vCenter using the VMware Infrastructure (VI) API.
Once metrics are fetched, the agent pushes these metrics to the Monasca server. The following steps and
diagram show the process.
1. Metrics are published to the Monasca API from each agent.
2. The Monasca API publishes the metrics to the Message Queue (Kafka).
3. The Persister consumes metrics from the Message Queue and publishes them to Vertica.
4. The Threshold Engine consumes metrics from the Message Queue and uses them to either evaluate
   current alarms or create new alarms. If an alarm transitions to a new state because of the incoming metrics,
   the Threshold Engine updates that state in the MySQL database and publishes an “alarm-state-transitioned-event” to the
   Message Queue.
5. The Persister consumes alarm-state-transitioned-events from the Message Queue and publishes them
   to Vertica as alarm state history.
6. The Notification Engine consumes alarm-state-transitioned-events from the Message Queue and sends
   out a webhook to the monasca-webhook-handler.
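The six steps above can be sketched as a small in-memory simulation. This is not Monasca code: Python lists stand in for Kafka, Vertica, and MySQL, every function name is hypothetical, and the threshold rule (a value above 90 raises an alarm) is an assumed example.

```python
from collections import defaultdict

# In-memory stand-ins for the real components (all names are hypothetical).
queue = defaultdict(list)   # topic -> messages; stands in for Kafka
metric_store = []           # stands in for metric storage in Vertica
alarm_history = []          # stands in for alarm state history in Vertica
alarm_states = {}           # stands in for the alarm state held in MySQL
webhooks_sent = []          # stands in for calls to monasca-webhook-handler

THRESHOLD = 90.0            # assumed rule: ALARM when the value exceeds 90


def api_receive(metric):
    """Steps 1-2: the API accepts a metric from an agent and queues it."""
    queue["metrics"].append(metric)


def threshold_engine_run():
    """Step 4: evaluate metrics and publish an event only on a state change."""
    for m in queue["metrics"]:
        new_state = "ALARM" if m["value"] > THRESHOLD else "OK"
        old_state = alarm_states.get(m["hostname"], "UNDETERMINED")
        if new_state != old_state:
            alarm_states[m["hostname"]] = new_state
            queue["alarm-state-transitions"].append(
                {"hostname": m["hostname"], "old": old_state, "new": new_state})


def persister_run():
    """Steps 3 and 5: store metrics and alarm state transitions."""
    metric_store.extend(queue["metrics"])
    alarm_history.extend(queue["alarm-state-transitions"])


def notification_engine_run():
    """Step 6: turn each transition event into a webhook call."""
    webhooks_sent.extend(queue["alarm-state-transitions"])


def process_batch(metrics):
    """Push a batch of metrics through every stage, then empty the queue."""
    for m in metrics:
        api_receive(m)
    threshold_engine_run()
    persister_run()
    notification_engine_run()
    queue.clear()  # every consumer has now seen every message
```

Calling process_batch twice with the same high CPU value for one host stores two metrics but records only one transition, which mirrors the point in step 4 that an event is published only when an alarm changes state.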
Alarm Definitions
Alarm Definitions are policies that specify how Alarms are created. By using Alarm Definitions, you do not need to
create individual alarms for each system or service. Instead, you manage a small number of Alarm Definitions,
and Monasca creates Alarms for systems and services as they appear. An Alarm is created when metrics
match the Alarm Definition.
An Alarm Definition has an expression for evaluating metrics to determine if one or more alarms need to be
created.
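To illustrate how one Alarm Definition can yield many Alarms, the sketch below creates an alarm the first time a matching metric appears for each host. The class, its fields, and the idea of grouping by a hostname dimension are simplifications assumed for this example. The expression string is shown for illustration only and is not parsed.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmDefinition:
    """Hypothetical, simplified model of an Alarm Definition."""
    name: str
    expression: str              # kept for display only; not parsed here
    metric_name: str             # metric the expression refers to
    match_by: str = "hostname"   # dimension that separates one alarm from another
    alarms: dict = field(default_factory=dict)

    def on_metric(self, metric):
        """Return the alarm for this metric, creating it on first sight."""
        if metric["name"] != self.metric_name:
            return None  # the metric does not match this definition
        key = metric["dimensions"].get(self.match_by)
        if key is None:
            return None  # the metric lacks the grouping dimension
        return self.alarms.setdefault(
            key, {"definition": self.name, self.match_by: key})


if __name__ == "__main__":
    definition = AlarmDefinition(
        "High CPU", "avg(cpu.user_perc) > 90", "cpu.user_perc")
    for host in ("appliance-1", "compute-1", "appliance-1"):
        definition.on_metric(
            {"name": "cpu.user_perc", "dimensions": {"hostname": host}})
    print(sorted(definition.alarms))  # ['appliance-1', 'compute-1']
```

Three metrics from two hosts produce two alarms from a single definition, so no per-host alarm needs to be created by hand.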
Using the Monasca Agent
The Monasca agent reports the status and usage of resources on the appliances and compute nodes where
monasca-agent is installed.
To view the Monitoring Dashboard:
1. In the Operations Console, from the General menu, select Monitoring Dashboard.
2. Click Launch Monitoring Dashboard.
3. Log in using the user name and password you set for the Operations Console during First-Time
   Installation.
4. On the overview of the Monitoring Dashboard, the alarms for all the appliances and activated compute nodes
   are displayed.
5. The alarm states are:
   o OK [Green] - Metrics have been received and the Alarm Definition Expression evaluates to false
     for the given metrics
   o Alarm [Orange] - Metrics have been received and the Alarm Definition Expression evaluates to
     true for the given metrics
   o Undetermined [Gray] - No information has been received in (period + 2) time periods
6. To view alarms:
   o In the left navigation, click Alarms to see alarms for all services and appliances. From the
     Actions menu, to the right of each row, click Graph metrics, Show History, or Show Alarm
     Definition.
   o Click any service name from the Overview screen to view alarms.
   o Click any server name from the Overview screen to see alarms for a CloudSystem appliance.
7. In the left navigation, click Alarm Definitions to view and edit the types of alarms that are enabled. You
   can change the name, expression, and other details about the alarm. You might want to raise or lower
   alarm thresholds if you are receiving too many or too few alarms. Do not edit the default alarm
   definitions, because they are used in the Operations Console.
8. Click Dashboard. The Grafana Dashboard opens. From this dashboard, you can view a graphical
   representation of the health of services, and the CPU and database usage of each CloudSystem
   appliance.
   o To edit a graph, click the graph title (for example, CPU), and then click Edit.
9. Click Monasca Health. The Monasca Service Dashboard opens. From this dashboard, you can view a
   graphical representation of the health of the Monasca services.
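The three alarm states described above can be sketched as a single function. The expression form (average of recent values above a threshold) and the stale_after window are assumptions for this example. The function does not attempt to reproduce the exact "(period + 2)" rule, which the caller would supply as a number of seconds.

```python
# Colors used by the Monitoring Dashboard for each state.
STATE_COLORS = {"OK": "green", "ALARM": "orange", "UNDETERMINED": "gray"}


def alarm_state(values, threshold, last_seen, now, stale_after):
    """Return 'OK', 'ALARM', or 'UNDETERMINED' for one alarm.

    values      -- recent measurements for the alarm's metric
    threshold   -- assumed expression: average(values) > threshold
    last_seen   -- time of the newest measurement in seconds (None if never)
    now         -- current time in seconds
    stale_after -- seconds without data before the state is unknown (assumed)
    """
    if not values or last_seen is None or now - last_seen > stale_after:
        return "UNDETERMINED"  # no information received recently enough
    average = sum(values) / len(values)
    # The expression evaluating to true means ALARM; false means OK.
    return "ALARM" if average > threshold else "OK"
```

For example, alarm_state([95, 97], 90, 100, 130, 180) returns "ALARM", while the same values with now set to 400 return "UNDETERMINED" because the newest measurement is older than the assumed window.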
Compute Nodes
The Monasca agent supports KVM and ESXi compute node monitoring. Hyper-V monitoring is not supported in
CloudSystem 9.0.
The Operations Console uses the Monasca agent to display the status and usage data of CPU, memory, and
storage of the compute nodes as shown in the figure below.
• KVM Monitoring
  o When a KVM compute node is activated, a CloudSystem RPM installs monasca-agent on the
    compute node.
  o The Monasca agent collects the metrics from the compute node and sends them to the
    Monitoring appliance.
  o When the KVM compute node is deactivated, the RPM and monasca-agent are uninstalled.
• ESXi Monitoring
  o A custom plug-in for the Monasca agent exists in each of the Cloud controllers to monitor ESXi
    compute clusters.
  o When an ESXi cluster is activated, the plug-in's configuration (vcenter.yaml) is
    modified to monitor the activated cluster.
  o When an ESXi cluster is deactivated, the cluster’s details are removed from the plug-in
    configuration.
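The ESXi activation and deactivation behavior can be pictured as edits to a configuration structure. The sketch below works on a Python dictionary rather than on the real vcenter.yaml, and the keys shown (instances, vcenter_ip, clusters) are assumptions for illustration. The actual layout of the CloudSystem plug-in configuration is not described in this paper.

```python
def activate_cluster(config, vcenter_ip, cluster):
    """Add a cluster to the plug-in configuration (no-op if already present)."""
    for instance in config["instances"]:
        if instance["vcenter_ip"] == vcenter_ip:
            if cluster not in instance["clusters"]:
                instance["clusters"].append(cluster)
            return config
    # First cluster activated for this vCenter: create its entry.
    config["instances"].append({"vcenter_ip": vcenter_ip, "clusters": [cluster]})
    return config


def deactivate_cluster(config, vcenter_ip, cluster):
    """Remove a cluster; drop the vCenter entry once it has no clusters left."""
    for instance in list(config["instances"]):
        if instance["vcenter_ip"] == vcenter_ip and cluster in instance["clusters"]:
            instance["clusters"].remove(cluster)
            if not instance["clusters"]:
                config["instances"].remove(instance)
    return config


if __name__ == "__main__":
    # Hypothetical starting point: no clusters are monitored yet.
    config = {"init_config": None, "instances": []}
    activate_cluster(config, "10.0.0.5", "cluster-a")
    deactivate_cluster(config, "10.0.0.5", "cluster-a")
    print(config["instances"])  # []
```

Making activation idempotent means that activating the same cluster twice does not create a duplicate entry to monitor.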
Notification Methods: Webhook
Webhook is the notification method used to report state changes for all alarms. When an alarm changes
state, the Notification Engine on the Monitoring appliance sends a notification to the monasca-webhook-handler service, which converts the notification to the appropriate payload. Other services process this payload
for display on the Activity Dashboard screen.
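The conversion step performed by the webhook handler might look like the following sketch, which turns a notification body into an entry suitable for a dashboard. The field names in the incoming JSON (alarm_name, old_state, state, message) and the shape of the resulting entry are assumptions for illustration. This paper does not define either format.

```python
import json


def to_activity_entry(body):
    """Convert a webhook notification (JSON text) into a dashboard entry."""
    note = json.loads(body)
    # Map the new alarm state to a hypothetical display severity.
    severity = {"ALARM": "warning", "OK": "info"}.get(note["state"], "unknown")
    return {
        "title": "%s: %s -> %s" % (
            note["alarm_name"], note["old_state"], note["state"]),
        "severity": severity,
        "detail": note.get("message", ""),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "alarm_name": "High CPU",   # hypothetical alarm
        "old_state": "OK",
        "state": "ALARM",
        "message": "cpu high",
    })
    print(to_activity_entry(sample)["title"])  # High CPU: OK -> ALARM
```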
To view the Activity Dashboard:
1. In the Operations Console, from the main menu, select Monitoring Dashboard.
2. Click Launch Activity Dashboard. The Monitoring Activity dashboard in Horizon on the Management
   appliance opens.
3. Log in using the user name and password you set for the Operations Console during First-Time
   Installation.
Learn more about HP Helion CloudSystem:
http://www.hp.com/go/CloudSystem
http://www.hp.com/go/CloudSystem/docs
© Copyright 2014-2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services.
Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions
contained herein.
5900-0103, December 2015