New jobcomp plugin for elasticsearch

A job completion plugin for
ElasticSearch
Alejandro Sanchez
[email protected]
Index
1. Introduction
2. ElasticSearch
3. MareNostrumIII solution
4. Plugin goals
5. Plugin development
6. Production Integration
7. Future work
8. References and conclusions
Introduction and motivation
• BSC-CNS (Barcelona Supercomputing Center)
• Officially constituted in April 2005
• Variety of clusters:
• MareNostrumIII
• 48,896 cores (3,056 compute nodes), 103.5TB of main memory
• IBM Platform LSF
• MinoTauro (GPU based), CNAG (genomics), BSCCV (life sciences), ...
• Different sizes/configurations
• SLURM
• Research, develop and manage IT in order to ease scientific progress
• Special dedication to some areas:
• Computer Sciences, Life Sciences, Earth Sciences and Computational
Applications
Introduction and motivation
• Who makes use of the clusters?
• How do we divide CPU hours among projects?
→ Sharing depends on the cluster
[Figure: pie chart of MN3 CPU-hour sharing. PRACE (Partnership for
Advanced Computing in Europe) holds 70% of the cluster; the remaining
24% and 6% are split between RES (Red Española de Supercomputación)
and BSC projects. Each share maps onto queues such as prace, bsc_ls,
class_a, class_b, class_c, bsc_cs and bsc_es.]
Introduction and motivation
• We need to ensure that the CPU usage distribution among
projects meets the agreement
• Analyzing data about finished jobs gives us very valuable
information
o Correlations between users' time_limit and elapsed time
o Statistical information about projects, groups, users and
how their executions finish
• Use the results for:
o Corrections to the scheduling configuration
o Training users on how to properly submit jobs
o Accounting purposes
→ There is a NEED to store historical data about finished jobs
ElasticSearch basics
“Elasticsearch is a flexible and powerful open source, distributed,
real-time search and analytics engine.”
Features:
• Real-time data
• Distributed
• High-availability
• Document oriented (JSON)
• RESTful API
• Schema free
• Based on Apache Lucene
www.elasticsearch.org
ElasticSearch basics
Structure:
• Cluster: “collection of one or more nodes (servers) that
together holds your entire data”
• Node: “single server that is part of your cluster”
• Index: “collection of documents that have somewhat similar
characteristics”
• Type: “within an index, you can define one or more types
(logical category/partition)”
• Document: “basic unit of information that can be indexed,
expressed in JSON format”
• Shard: “subdivision of an index”
MareNostrumIII solution
[Diagram: on the Scheduling Server running LSF, mbatchd writes the
events_log and lsb.acct files; inotify detects new events, which
netcat pipes over TCP to the Monitoring Server; there, logstash
indexes each event into ElasticSearch; Kibana, served by httpd,
presents the job historical data to web browsers.]
Plugin goals
• The rest of the BSC clusters use SLURM
• Make it generic, following the SLURM guidelines
• The existing jobcomp plugins didn’t satisfy our needs
mysql, filetxt or script → elasticsearch
Finished job data, 37 fields:
[Diagram: slurmctld indexes the finished job data into ElasticSearch.]
account, alloc_node, cluster, cpu_hours, cpus_per_task,
derived_exitcode, elapsed, eligible_time, end_time, excluded_nodes,
exitcode, gres_alloc, gres_req, group_id, groupname, jobid, nodes,
ntasks, ntasks_per_node, orig_dependency, parent_accounts, partition,
qos, reservation_name, script, start_time, state, std_err, std_in,
std_out, submit_time, time_limit, total_cpus, total_nodes, user_id,
username, work_dir
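As a hedged illustration of the record being built (the real plugin
serializes all 37 fields; job_to_json() and the chosen fields here are
examples only), a finished job could be turned into a JSON document
like this:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: serialize a handful of the fields listed
 * above into the JSON document to be indexed. */
int job_to_json(char *buf, size_t len, uint32_t jobid,
                const char *username, const char *partition,
                uint32_t elapsed)
{
        return snprintf(buf, len,
                        "{\"jobid\":%u,\"username\":\"%s\","
                        "\"partition\":\"%s\",\"elapsed\":%u}",
                        jobid, username, partition, elapsed);
}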
Plugin development
• Operations against elasticsearch server are executed through
HTTP requests/responses
• Request pattern:
$ curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>?<QUERY_STRING>' -d '<BODY>'
• Request to index a document, example:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
• The plugin uses the libcurl library (libcurl-devel) to handle
requests/responses
→ autoconf checks have been added
→ the plugin is not installed unless the library is detected and usable
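A minimal sketch of how such an index request can be issued from C
with libcurl (index_job_record() is an illustrative name, not the
plugin's actual code):

#include <curl/curl.h>

/* Sketch: PUT a JSON document to the configured ElasticSearch URL. */
int index_job_record(const char *url, const char *json)
{
        CURL *curl = curl_easy_init();
        struct curl_slist *hdrs = NULL;
        CURLcode rc;

        if (!curl)
                return -1;
        hdrs = curl_slist_append(hdrs, "Content-Type: application/json");
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PUT");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, json);
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
        rc = curl_easy_perform(curl);
        curl_slist_free_all(hdrs);
        curl_easy_cleanup(curl);
        return (rc == CURLE_OK) ? 0 : -1;
}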
Plugin development
• Plugin can be enabled and configured in slurm.conf
JobCompType=jobcomp/elasticsearch
JobCompLoc=http://YOURELASTICSERVER:9200
• So the plugin has to check that the server referenced by the
configured URL is reachable and accessible
• How? By capturing and parsing the HTTP response headers
received from the server side
Plugin development
• Example of a properly indexed document’s response:
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 92
{"_index":"someindex","_type":"sometype","_id":
"fsAx6qXcQGCSrY1DWvQACw","_version":1,"created"
:true}
• Just the header is needed (not the body). So libcurl
parameters are configured to just capture the headers
• Specifically, plugin checks whether the status code is 200
(OK) or 201 (Created)
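A hedged sketch of that check with libcurl (the callback and function
names here are invented; the plugin's real code may differ):

#include <curl/curl.h>

/* Consume and discard the response body: only the headers matter. */
static size_t _discard_body(char *ptr, size_t size, size_t nmemb,
                            void *userdata)
{
        (void) ptr; (void) userdata;
        return size * nmemb;    /* report everything as consumed */
}

/* Accept only 200 (OK) or 201 (Created). CURLINFO_RESPONSE_CODE
 * reports the last status line received, so an interim
 * "100 Continue" does not hide the final code (see next slide). */
static int _response_ok(CURL *curl)
{
        long code = 0;

        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
        return (code == 200 || code == 201);
}

/* Wiring, before curl_easy_perform():
 *      curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, _discard_body);
 */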
Plugin development
• Different sources of failure: server unavailable, index read-only, etc.
• Example of a document not indexed:
HTTP/1.1 403 Forbidden
Content-Type: application/json; charset=UTF-8
Content-Length: 96
{"error":"ClusterBlockException[blocked by: [FORBIDDEN/5/index read-only
(api)];]","status":403}
• Does it mean that every status code different from 200 or 201
indicates a failure? … NO (corner case found while testing)
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: application/json
(100 Continue is used to determine whether the origin server is
willing to accept the request, based on the headers, before the
client sends the body)
Plugin development
• What happens with job data that can’t be indexed?
The plugin manages a memory structure to keep track of the
data of pending jobs
[Diagram: in-memory array of pending job data (job0, job1, …, jobN-1),
kept coherent with the state file.]
typedef struct {
        uint32_t nelems;   /* number of pending job records */
        char **jobs;       /* serialized job data, one per job */
} pending_jobs_t;
State file: StateSaveLocation/elasticsearch_state
Data is saved in network byte order, using the SLURM functions
pack_str_array() and safe_unpackstr_array()
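A minimal sketch of that save/restore, assuming the pack API from the
SLURM source tree of that era ("src/common/pack.h"); the function
names _pack_pending_jobs()/_unpack_pending_jobs() are illustrative:

#include "src/common/pack.h"    /* Buf, pack_str_array(), ... */

static void _pack_pending_jobs(pending_jobs_t *pj, Buf buffer)
{
        /* Writes the element count and each string in network
         * byte order. */
        pack_str_array(pj->jobs, pj->nelems, buffer);
}

static int _unpack_pending_jobs(pending_jobs_t *pj, Buf buffer)
{
        /* Allocates the array; the safe_* macro jumps to
         * unpack_error on a truncated or corrupt state file. */
        safe_unpackstr_array(&pj->jobs, &pj->nelems, buffer);
        return SLURM_SUCCESS;

unpack_error:
        pj->jobs = NULL;
        pj->nelems = 0;
        return SLURM_ERROR;
}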
Plugin development
• When does the plugin try to reindex the pending jobs?
1. When the plugin is loaded:
_load_pending_jobs() restores the elasticsearch_state file into the
pending_jobs_t structure, then _index_retry() attempts to index
job0 … jobN-1 again
2. Just after a successfully indexed job:
_index_retry() runs again against ElasticSearch
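A hedged sketch of one retry pass (the compaction scheme here is an
assumption, not necessarily the plugin's; index_job_record() is from
the earlier libcurl sketch and es_url is a placeholder):

#include <stdint.h>
#include <stdlib.h>

typedef struct {
        uint32_t nelems;
        char **jobs;
} pending_jobs_t;       /* as on the previous slide */

int index_job_record(const char *url, const char *json); /* earlier sketch */

static const char *es_url = "http://YOURELASTICSERVER:9200"; /* placeholder */

/* Try every pending job once; keep only the ones that still fail. */
static void _index_retry(pending_jobs_t *pj)
{
        uint32_t i, kept = 0;

        for (i = 0; i < pj->nelems; i++) {
                if (index_job_record(es_url, pj->jobs[i]) == 0)
                        free(pj->jobs[i]);              /* indexed: drop */
                else
                        pj->jobs[kept++] = pj->jobs[i]; /* still pending */
        }
        pj->nelems = kept;
}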
Production integration
• A web layer has been added (Kibana)
o Configurable dashboards, time-based comparisons
o Make sense of your data: create bar, line and scatter plots
o Flexible interface, easy to share
o Powerful search syntax and easy setup
Production integration
• Plugin already running in the MinoTauro cluster
• 126 compute nodes, GPU based
• 2 login nodes
• Planned integration in the CNAG cluster in the coming months
• Genomics analysis and research
• 100 compute nodes, 20 HiMem nodes
• 2 login nodes
• The same for the rest of the BSC SLURM clusters
• BSCCV, Altix2 UV100, etc.
Production integration
• Kibana global view
Production integration
• Zoom in/out time range
• Expand job data details
• Search, filter, pagination…
Future work (basic statistics)
• Elapsed time vs project/qos
• Mins, maxs, means, std-devs, …
Future work (Machine Learning)
• Simple prediction methods (Linear Regression)
• time_limit prediction based on submit parameters
Y_t = β_0 + β_1·X_1 + β_2·X_2 + … + β_p·X_p + ε
Y_t: measured or dependent variable
X_i: input or independent variables
β_i: regression coefficients
ε: error term
→ Helps improve backfill scheduling (more efficient usage of
cluster resources)
→ A submit plugin could be developed applying the prediction
formula
→ There are more complex models, using decision trees or
combining different models into one
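A toy sketch of applying such a fitted model at submit time (the
coefficients and the two input features are invented for illustration;
a real model would be trained on the indexed job history):

#include <stdio.h>

/* y = b0 + b1*x1 + ... + bp*xp */
static double predict(const double *beta, const double *x, int p)
{
        double y = beta[0];

        for (int i = 1; i <= p; i++)
                y += beta[i] * x[i - 1];
        return y;
}

int main(void)
{
        /* Invented model: predicted elapsed minutes from the
         * requested time_limit (minutes) and ntasks. */
        double beta[] = { 12.0, 0.35, 0.20 };
        double x[] = { 240.0 /* time_limit */, 64.0 /* ntasks */ };

        printf("predicted elapsed: %.1f min\n", predict(beta, x, 2));
        return 0;
}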
References and conclusions
• SLURM reference to the plugin
http://slurm.schedmd.com/download.html
• Github repository
https://github.com/asanchez1987/jobcomp-elasticsearch
• Possible merge in future stable releases
• Final Master’s Thesis, in a university-company context
o Barcelona School of Informatics, www.fib.upc.edu/en
o Barcelona Supercomputing Center, www.bsc.es