Extending OpenStack-Ansible with Automated Operational Management
William D. Irons
5/10/2017

Agenda
• Background
• OpenStack Ansible and How It Can Be Extended
• Operational Manager (OpsMgr)
• Demo

Background
• OpenStack Ansible (OSA) provides a fully automated and consistent install of OpenStack.
• However, monitoring of the environment after installation is lacking.
• Rackspace has already extended OpenStack Ansible to install Elastic Stack on an OpenStack cluster.
• Using the Rackspace concepts and some existing in-house work, we built a solution that monitors the hardware and the applications running on it using Nagios Core and Elastic Stack.

Operational Management Background
• This is part of a larger effort to provide open source reference architectures for running OpenStack and Ceph on IBM Power hardware, along with tools to provision the hardware, install the applications, and monitor the entire cluster.
• Operational Management works on both x86_64 and Power LE with Ubuntu 16.04.
• It currently uses the Newton branch of OSA.

Hardware Setup
[Figure: hardware setup diagram. Recoverable labels: Block Storage, Database as a Service, OpenStack Cloud, Toolkit for OpenPOWER, Private Cloud, Object Storage]

OpenStack Ansible
• Ansible is an open source automation platform.
• It uses SSH, not agents, to configure each endpoint.
• OpenStack Ansible provides Ansible playbooks for the deployment and configuration of an OpenStack cluster.
• LXC containers host each OpenStack service.
• haproxy provides both high availability and load balancing between multiple controller nodes, and proxies requests from the host to the back-end LXC containers.

Extending OpenStack Ansible
• The main purpose of extending OpenStack Ansible is to install your own customizations in the same consistent manner in which OpenStack itself is installed.
• Four main things to do:
  • Create LXC containers for additional services
  • Playbooks for installing custom services and configuration
  • Variable files for user-defined variables
  • haproxy configuration for accessing the services

Create LXC Containers for additional services
• Create a yml file under /etc/openstack_deploy/env.d for each container to be created.
• Running the setup-hosts.yml playbook from OSA will create the containers.
• Alternatively, you can write your own playbook to create the LXC containers.
• Ensure there are no IP address conflicts with the OSA containers.

elasticsearch.yml:

    ---
    component_skel:
      elasticsearch:
        belongs_to:
          - elasticsearch_all

    container_skel:
      elasticsearch_container:
        belongs_to:
          - log_containers
        contains:
          - elasticsearch
        properties:
          service_name: elasticsearch

Playbooks for custom services and configuration
• Ansible playbooks are necessary for installing the custom services and for any configuration of those services.
• Playbooks may be placed in any directory.
• Create an ansible.cfg file that references the OpenStack Ansible scripts as necessary:
  • library = /etc/ansible/roles/plugins/action
  • roles_path = /opt/openstack-ansible/playbooks/roles:/etc/ansible/roles
  • inventory = /opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py
• Run the playbook using either the 'openstack-ansible' or the 'ansible-playbook' command.

Variable files for user defined variables
• Variable files whose names match /etc/openstack_deploy/user_*.yml are included automatically when using the openstack-ansible command.
• Don't add new variables to the existing user_variables.yml and user_secrets.yml files; create new files instead.
• Variable files can be included manually using the -e option of the ansible-playbook command.

haproxy configuration for accessing the services
• Define haproxy_extra_services in a variable file you create:

    haproxy_extra_services:
      - service:
          haproxy_service_name: elasticsearch
          haproxy_backend_nodes: "{{ groups['elasticsearch_all'] | default([]) }}"
          haproxy_port: 9200
          haproxy_balance_type: http

• These are the minimal parameters; many other options are available. See the template for configuring haproxy.
• Running the haproxy-install.yml playbook from OSA will configure haproxy.

Operational Manager (OpsMgr)
• Operational Manager consists of three main components:
  • Horizon User Interface Extension
  • Resource Monitoring & Alerts (Nagios Core)
  • Log/Metric Analysis (Elastic Stack / Filebeat & Metricbeat)

Horizon User Interface extension
• We created our own dashboard for Operational Management with a panel for Inventory.
• It keeps track of the physical inventory of your OpenStack cluster.
• Possible future functions initiated from this interface:
  • The ability to add and remove nodes from a cluster
  • Cluster maintenance
  • Guided updates

Resource Monitoring & Alerts
• We needed an open source monitoring tool that could be completely installed and configured to monitor OpenStack without manual intervention.
• We chose Nagios because of its history and because all configuration can be done via config files.
• Our solution is extensible: another monitoring tool could be added, along with plugins to monitor the OpenStack services.

Resource Monitoring & Alerts
[Figure: monitoring architecture diagram. Recoverable labels: check_proc, check_load, check_ceph_mon, NRPE, check_nrpe, …; "Installed on all endpoints"; "Installed on controllers"]

Log/Metric Analysis
• The Elastic Stack is a popular open source log analysis tool.
• High availability and load balancing are built into the design.
• Log analysis and visualizations of log data help you understand how applications are being used and how usage trends over time.
• The analysis and visualizations are only as good as the information provided in the logs.

Log/Metric Analysis
Installed on all endpoints:
• Metricbeat: sends metric data (CPU, memory, process data, …) to Elasticsearch.
• Filebeat: sends application logs to Logstash to be parsed.
Installed on controllers:
• Elasticsearch: stores the data across all controllers and provides an API for querying the data.
• Logstash: parses application logs into individual fields that can be queried.
Kibana: visualizes Elasticsearch data.

Logstash Parsing Example
Event fields:
  host      int3-controller-1
  message   2017-05-02 15:13:18.327915 3fff946be1f0 0 mon.int3-controller-1@0(leader).data_health(8) update_stats avail 85% total 1407 GB, used 127 GB, avail 1208 GB
  source    /var/log/ceph/ceph-mon.int3-controller-1.log
  tags      ceph-mon, ceph, infrastructure, beats_input_codec_plain_applied
Fields parsed from the message:
  avail_percent   85
  avail_space     1,236,992
  avail_units     MB
  date            2017-05-02 15:13:18.327915
  percent_used    15
  total_space     1,440,768
  total_units     MB
  used_space      130,048
  used_units      MB

Demo
• Slides of the demo are available at the end of this presentation.

Future Work Items
• Software currency (Ocata and Elastic Stack 5.3 are now in our master branch)
• Investigate Monasca and how to leverage its monitoring
• Investigate the OpenStack Ansible monitoring script framework for Pike
• CentOS / Red Hat support

Links
• https://github.com/open-power-ref-design-toolkit/opsmgr
• https://github.com/open-power-ref-design-toolkit
• https://github.com/open-power-ref-design

Questions?

Demo Slides
Ability to launch into Nagios or Kibana.
Physical inventory of the rack.
Nagios Core interface summarizing host and service status.
Checks are specific to what is being monitored: compute node services for compute nodes, Ceph services for Ceph nodes.
Each LXC container is monitored on the controller node as a service.
Each service being monitored has unique checks based on the type of service it is. [Screenshots: OpenStack Compute Node, Ceph Monitor]
The default dashboard when you log into Kibana gives an overall summary of the requests made and the number of requests that returned errors.
The time frame can easily be changed, and Kibana will recalculate the graphs based on the new time frame.
The response time / request rate dashboard shows the average response time of individual OpenStack REST services and the number of requests coming in.
The Ceph dashboard shows the number of bytes read and written and the operations per second over time. The cluster's space usage is also graphed.
The Metricbeat dashboards visualize system metrics such as CPU, memory, disk, and network usage.
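
Returning to the Logstash Parsing Example: the field extraction shown there can be illustrated outside Logstash. The following is a minimal Python sketch, not the OpsMgr Logstash filter. The regular expression, the function name parse_data_health, the conversion of 1 GB to 1024 MB, and the rule percent_used = 100 - avail_percent are all assumptions, chosen because they reproduce the values in the example.

```python
import re

# Hypothetical pattern for the ceph-mon "data_health ... update_stats" line
# from the Logstash Parsing Example; not taken from the OpsMgr filters.
DATA_HEALTH = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"update_stats avail (?P<avail_percent>\d+)% "
    r"total (?P<total>\d+) (?P<total_units>MB|GB|TB), "
    r"used (?P<used>\d+) (?P<used_units>MB|GB|TB), "
    r"avail (?P<avail>\d+) (?P<avail_units>MB|GB|TB)$"
)

# Assumed conversion factors to MB (binary multiples).
TO_MB = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}


def parse_data_health(message):
    """Split a data_health message into queryable fields, or return None."""
    match = DATA_HEALTH.match(message)
    if match is None:
        return None
    avail_percent = int(match.group("avail_percent"))
    fields = {
        "date": match.group("date"),
        "avail_percent": avail_percent,
        # Assumption: used percentage is the complement of available.
        "percent_used": 100 - avail_percent,
    }
    for name in ("total", "used", "avail"):
        units = match.group(name + "_units")
        # Normalize every size to MB so the fields are comparable.
        fields[name + "_space"] = int(match.group(name)) * TO_MB[units]
        fields[name + "_units"] = "MB"
    return fields


SAMPLE = (
    "2017-05-02 15:13:18.327915 3fff946be1f0 0 "
    "mon.int3-controller-1@0(leader).data_health(8) update_stats "
    "avail 85% total 1407 GB, used 127 GB, avail 1208 GB"
)

if __name__ == "__main__":
    for key, value in sorted(parse_data_health(SAMPLE).items()):
        print(key, value)
```

Normalizing every size to a single unit at parse time keeps the resulting fields directly comparable in queries and graphs, which is the point of parsing logs into individual fields.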