
Extending OpenStack-Ansible with Automated Operational Management
William D. Irons
[email protected]
5/10/2017
Agenda
• Background
• OpenStack Ansible and How It Can Be Extended
• Operational Manager (OpsMgr)
• Demo
Background
• OpenStack Ansible [OSA] provides a fully automated and consistent install of OpenStack.
• However, monitoring of the environment after it is installed is lacking.
• Rackspace has already extended OpenStack Ansible to install the Elastic Stack on an OpenStack cluster.
• Using the Rackspace concepts and some existing in-house work, we built a solution to monitor hardware and the applications running on that hardware using Nagios Core and the Elastic Stack.
Operational Management Background
• This is part of a larger effort to provide open source reference architectures for running OpenStack and Ceph on IBM Power hardware, along with tools to provision the hardware, install the applications, and monitor the entire cluster.
• Operational Management works on both x86_64 and Power LE with Ubuntu 16.04
• Currently using the Newton branch of OSA
[Diagram: OpenStack Cloud Toolkit for OpenPOWER reference architectures: Hardware Setup, Private Cloud, Block Storage, Object Storage, and Database as a Service.]
Agenda
• Background
• OpenStack Ansible and How It Can Be Extended
• Operational Manager (OpsMgr)
• Demo
OpenStack Ansible
• Ansible is an open source automation platform
• Uses SSH to configure each endpoint rather than agents
• OpenStack Ansible provides Ansible playbooks for the deployment and configuration of an OpenStack cluster
• LXC Containers for each OpenStack service
• haproxy provides high availability and load balancing across multiple controller nodes and proxies requests from the host to the back-end LXC Containers
Extending OpenStack Ansible
• The main purpose of extending OpenStack Ansible is to install your own customizations in the same consistent manner in which OpenStack itself is installed.
• Four main things to do:
• Create LXC Containers for additional services
• Playbooks for installing custom services and configuration
• Variable files for user-defined variables
• haproxy configuration for accessing the services
Create LXC Containers for additional services
• Need to create yml files under /etc/openstack_deploy/env.d for each container to create
• Running the setup-hosts.yml playbook from OSA will create the containers
• Or you could write your own playbook to create the LXC containers
• Need to ensure you don't have IP address conflicts with OSA containers
elasticsearch.yml
---
component_skel:
  elasticsearch:
    belongs_to:
      - elasticsearch_all

container_skel:
  elasticsearch_container:
    belongs_to:
      - log_containers
    contains:
      - elasticsearch
    properties:
      service_name: elasticsearch
Playbooks for custom services and configuration
• Ansible playbooks are necessary for installing the custom services and any configuration of those services.
• Playbooks may be placed in any directory.
• An ansible.cfg file should be created to reference the OpenStack Ansible scripts as necessary (see the sketch after this list):
• library = /etc/ansible/roles/plugins/action
• roles_path = /opt/openstack-ansible/playbooks/roles:/etc/ansible/roles
• inventory = /opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py
• Run the playbook using either the 'openstack-ansible' or 'ansible-playbook' command.
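Collecting the settings above into a file, a minimal ansible.cfg placed alongside your custom playbooks might look like the following (the paths are the ones listed above; adjust them to your deployment):

[defaults]
library = /etc/ansible/roles/plugins/action
roles_path = /opt/openstack-ansible/playbooks/roles:/etc/ansible/roles
inventory = /opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py

With this in place, ansible-playbook run from that directory can resolve the OSA roles, action plugins, and dynamic inventory much as the openstack-ansible wrapper does.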
Variable files for user defined variables
• Variable files matching the name format /etc/openstack_deploy/user_*.yml will automatically be included when using the openstack-ansible command line.
• Don't add new variables to the existing user_variables.yml and user_secrets.yml files. Create new files.
• Variable files can be included manually using the -e option with the ansible-playbook command.
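As a sketch, a new variable file such as the one below (the file name and variable are hypothetical, chosen only to match the user_*.yml convention) would be picked up automatically by openstack-ansible, or could be passed explicitly with ansible-playbook -e @/etc/openstack_deploy/user_opsmgr_variables.yml:

# /etc/openstack_deploy/user_opsmgr_variables.yml (hypothetical file name)
# Example user-defined variable consumed by a custom playbook
opsmgr_elasticsearch_heap_size: "2g"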
haproxy configuration for accessing the services
• Define haproxy_extra_services in a variable file you create
haproxy_extra_services:
  - service:
      haproxy_service_name: elasticsearch
      haproxy_backend_nodes: "{{ groups['elasticsearch_all'] | default([]) }}"
      haproxy_port: 9200
      haproxy_balance_type: http
• These are the minimal parameters; many other options are available. See the template for configuring haproxy.
• Running the haproxy-install.yml playbook from OSA will configure haproxy.
Agenda
• Background
• OpenStack Ansible and How It Can Be Extended
• Operational Manager (OpsMgr)
• Demo
Operational Manager (OpsMgr)
• Operational Manager consists of three main components:
• Horizon User Interface Extension
• Resource Monitoring & Alerts (Nagios Core)
• Log/Metric Analysis (Elastic Stack / Filebeat & Metricbeat)
Horizon User Interface extension
• Created our own dashboard for Operational Management with a panel for Inventory.
• Keeps track of the physical inventory of your OpenStack cluster.
• Possible future functions initiated from this interface:
• The ability to add and remove nodes from a cluster
• Cluster Maintenance
• Guided Updates
Resource Monitoring & Alerts
• Needed an open source monitoring tool that could be completely installed and configured to monitor OpenStack without manual intervention.
• Chose Nagios because of its history and the fact that all configuration can be done via config files.
• Our solution is extensible in that another monitoring tool could be added, along with plugins to monitor the OpenStack services.
Resource Monitoring & Alerts
[Diagram: Nagios Core on the controllers uses check_nrpe to call NRPE, which is installed on all endpoints and runs local checks such as check_proc, check_load, and check_ceph_mon.]
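Because all Nagios and NRPE configuration lives in plain config files, an additional check can be rolled out with a small playbook. The fragment below is only a hypothetical sketch (the host group, plugin name, file paths, and service name are assumptions for Ubuntu 16.04, not the actual opsmgr playbooks):

# Hypothetical playbook: distribute an extra NRPE check to monitored endpoints
- hosts: all
  become: yes
  tasks:
    - name: Install a custom Nagios plugin on the endpoint
      copy:
        src: check_ceph_mon
        dest: /usr/lib/nagios/plugins/check_ceph_mon
        mode: '0755'

    - name: Register the plugin as an NRPE command
      lineinfile:
        dest: /etc/nagios/nrpe.d/opsmgr.cfg
        line: "command[check_ceph_mon]=/usr/lib/nagios/plugins/check_ceph_mon"
        create: yes

    - name: Restart NRPE so the new command is picked up
      service:
        name: nagios-nrpe-server
        state: restarted

The Nagios server on the controllers would then call the new command through check_nrpe, as shown in the diagram above.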
Log/Metric Analysis
• The Elastic Stack is a popular open source log analysis tool
• High availability and load balancing are built into the design
• Log analysis and visualizations of log data help you understand how applications are being used and trends over time
• The analysis and visualizations are only as good as the information provided in the logs
Log/Metric Analysis
[Diagram: On all endpoints, Metricbeat sends metric data (CPU, memory, process data, ...) to Elasticsearch, and Filebeat sends application logs to Logstash to be parsed. On the controllers, Logstash parses application logs into individual fields that can be queried, Elasticsearch stores the data across all controllers and provides an API for querying it, and Kibana visualizes the Elasticsearch data.]
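As a sketch of how the Filebeat half of this flow might be configured on an endpoint (Filebeat 5.x syntax; the log paths, tags, endpoint address, and port are assumptions rather than the actual opsmgr settings), a minimal filebeat.yml could look like:

# Hypothetical filebeat.yml on a monitored endpoint
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/ceph/*.log
    tags: ["ceph", "infrastructure"]

output.logstash:
  # Assumption: Logstash on the controllers is reached through the haproxy VIP
  hosts: ["<haproxy-vip>:5044"]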
Logstash Parsing Example
Raw log event:
  host:    int3-controller-1
  message: 2017-05-02 15:13:18.327915 3fff946be1f0 0 mon.int3-controller-1@0(leader).data_health(8) update_stats avail 85% total 1407 GB, used 127 GB, avail 1208 GB
  source:  /var/log/ceph/ceph-mon.int3-controller-1.log
  tags:    ceph-mon, ceph, infrastructure, beats_input_codec_plain_applied

Fields parsed by Logstash:
  avail_percent: 85
  avail_space:   1,236,992
  avail_units:   MB
  date:          2017-05-02 15:13:18.327915
  percent_used:  15
  total_space:   1,440,768
  total_units:   MB
  used_space:    130,048
  used_units:    MB
Agenda
• Background
• OpenStack Ansible and How It Can Be Extended
• Operational Manager (OpsMgr)
• Demo
• Slides of the demo are available at the end of this presentation
Future Work Items
• Software Currency (Ocata, Elastic Stack 5.3 now in our master branch)
• Investigate Monasca and how to leverage its monitoring
• Investigate the OpenStack Ansible monitoring script framework for Pike
• CentOS / RedHat Support
Links
• https://github.com/open-power-ref-design-toolkit/opsmgr
• https://github.com/open-power-ref-design-toolkit
• https://github.com/open-power-ref-design
Questions?
[email protected]
Demo Slides
Ability to launch Nagios or Kibana
Physical inventory of the rack
Nagios Core Interface summarizing host and service status
Checks are specific to what is being monitored: compute node services for compute nodes, ceph services for ceph nodes.
Each LXC Container is monitored on the controller node as a service.
OpenStack Compute Node
Each service being monitored has unique checks based on the type of service it is.
Ceph Monitor
The default dashboard when you log into Kibana gives an overall summary of the requests made and the number of requests that returned errors.
The timeframe can easily be changed, and Kibana will recalculate the graphs based on the new timeframe.
The response time / request rate dashboard shows the average response time of individual OpenStack REST services and the number of requests coming in.
The ceph dashboard shows the number of bytes read and written and the operations per second over time. Also graphed is the cluster's space usage.
Metricbeat data can be visualized to show system metrics like CPU, memory, disk, and network usage.