Ceph status and plans

Alastair Dewhurst, Alison Packer, James Adams,
Bruno Canning, George Vasilakakos, Ian Johnson
Alastair Dewhurst, 1st September 2016
Ceph at RAL

[Diagram: Ceph services at RAL. Front ends: S3 / Swift service, XrootD / GridFTP, Castor buffers, CephFS (Facilities) and RBD (Cloud). Two clusters sit behind them: Echo (EC 8+3) and Sirius (Rep x3).]

• Common configuration management.
• Both clusters run by the data service team.
• Working on improving monitoring and operational procedures.
Sirius cluster

• Has been providing reliable RBD storage for internal cloud services for over a year.
  • Providing pools for OpenNebula and a test OpenStack (see the provisioning sketch below).
• One serious incident resulted in the loss and recreation of all Cloud VMs.
  • We learnt much from this!
• Future:
  • Additional rack of hardware being purchased.
  • CephFS planned for STFC Facilities users.
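As an illustration of the kind of RBD provisioning Sirius does for the cloud (pool and image names, sizes and PG counts are assumptions, not the production values), a replicated pool and a test image can be created with:

  # Create a replicated pool for RBD use (PG count is illustrative)
  ceph osd pool create cloud-rbd 2048 2048 replicated
  # Create a 10 GB test image that a hypervisor could attach, then inspect it
  rbd create cloud-rbd/test-image --size 10240
  rbd info cloud-rbd/test-image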
Echo cluster

• Working (new) cluster since July 2016:
  • 3 physical monitors (hot spare on order).
  • 60 x 216 TB storage nodes.
  • 3 gateway machines (awaiting network upgrade to 40 Gb/s).
  • Running Jewel on SL7.
  • 9.4 PB usable storage (8+3 erasure coding); see the setup sketch below.
• Intend to provide ATLAS and CMS with 2.8 PB each as part of pledged resources in April 2017.
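For reference, a minimal sketch of setting up an 8+3 erasure-coded pool of the kind Echo uses (profile name, pool name and PG counts are assumptions; on Jewel the failure-domain key is ruleset-failure-domain, renamed crush-failure-domain in later releases):

  # Define an 8+3 erasure-code profile with a host-level failure domain
  ceph osd erasure-code-profile set ec-8-3 k=8 m=3 ruleset-failure-domain=host
  # Create an erasure-coded pool that uses the profile
  ceph osd pool create echo-ec-test 4096 4096 erasure ec-8-3
  # Verify what was created
  ceph osd erasure-code-profile get ec-8-3
  ceph osd pool get echo-ec-test erasure_code_profile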
Ceph Monitoring

• Nagios for exceptions (a minimal check sketch follows below).
• Ceph Dashboard: https://github.com/Crapworks/ceph-dash
• Telegraf plugin for Ceph → InfluxDB: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/ceph
• Log files → ELK.
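A minimal sketch of the sort of exception check Nagios can run (script name is an assumption; it simply maps 'ceph health' output to Nagios exit codes):

  #!/bin/bash
  # check_ceph_health: translate 'ceph health' into Nagios OK/WARNING/CRITICAL/UNKNOWN
  STATUS=$(ceph health 2>/dev/null)
  case "$STATUS" in
    HEALTH_OK*)   echo "OK - $STATUS";       exit 0 ;;
    HEALTH_WARN*) echo "WARNING - $STATUS";  exit 1 ;;
    HEALTH_ERR*)  echo "CRITICAL - $STATUS"; exit 2 ;;
    *)            echo "UNKNOWN - no response from ceph"; exit 3 ;;
  esac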
Echo Benchmarking

• Ran Ceph 'rados bench' from 3 machines with 10 Gb/s links (example command below).
• Each machine concurrently writing 64 x 4 MB objects.
• Average latency 0.35 s.
• Bottleneck appears to be with the benchmarking machines (not the cluster).
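For comparison, a run equivalent to the tests above can be reproduced with the standard tool (pool name is illustrative; 4 MB is also the default object size):

  # 60-second write test: 64 concurrent 4 MB objects, kept on disk for the read test
  rados bench -p test-pool 60 write -t 64 -b 4194304 --no-cleanup
  # Sequential read back at the same concurrency
  rados bench -p test-pool 60 seq -t 64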
GridFTP + XrootD plugins

• For the LHC VOs we need working GridFTP and XrootD access.
• Without modifying any XrootD source code we have a working plugin (configuration sketch below):
  • Authorization using a gridmap file + AuthDB.
  • A large amount of performance optimization is being done by CERN [1].
  • Minor bugs expected to be fixed in XrootD 4.45.
• Ian Johnson has written authorization for GridFTP using the same model as XrootD.

[1] Talk by Sebastien Ponce describing performance:
https://indico.cern.ch/event/524549/contributions/2185945/attachments/1289528/1919824/CephForHighThroughput.pdf
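A sketch of the relevant part of an XrootD gateway configuration under this model (library and file paths are assumptions; the directives themselves are standard XrootD ones):

  # Write the Ceph-backed gateway configuration (paths are illustrative)
  cat > /etc/xrootd/xrootd-ceph.cfg <<'EOF'
  # Use the Ceph/libradosstriper storage plugin instead of a POSIX filesystem
  ofs.osslib /usr/lib64/libXrdCeph.so
  # Authenticate clients with GSI, mapping DNs via the gridmap file
  sec.protocol /usr/lib64 gsi -gridmap:/etc/grid-security/grid-mapfile
  # Enable authorization and take the rules from the AuthDB
  ofs.authorize
  acc.authdb /etc/xrootd/AuthDB
  EOF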
Plugin architecture

[Diagram: client software (xrdcp / globus-url-copy) talks to a gateway in front of the storage. The gateway first authenticates the user against the gridmap file ("Is user valid?") and then authorizes the request against the AuthDB ("Is user allowed to perform op?"); failure at either step returns an error, otherwise data flows to the storage via libRadosStriper. For XrootD the gateway could be the WN.]
GridFTP plugin design

[Diagram: FTS client → gateway → Ceph backend. Arriving data is staged in 2 buffers on the gateway; a patch prevents writes from getting too far off-stream.]

• FTS transfers use GridFTP with multiple streams.
• Need to re-assemble data to allow decent performance.
  • Also done for HDFS at Nebraska (Brian Bockelman).
XrootD Architecture

[Diagram: UK XrootD redirector → RAL XrootD redirector → Castor and Echo XrootD redirectors → gateways, each node running an xrootd/cmsd pair in front of the Ceph backend.]

• The Echo redirector is used to load balance across the Gateways.
• ATLAS/CMS jobs will use the redirectors as failover.
• Each WN will be installed with an XrootD Gateway, allowing direct connection to Echo.
• A minimal clustering sketch follows below.
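A minimal sketch of how the xrootd/cmsd pairs are typically clustered (host names and ports are assumptions; a real gateway configuration would also load the Ceph plugin shown earlier):

  # On the Echo redirector
  cat > /etc/xrootd/xrootd-redirector.cfg <<'EOF'
  all.role manager
  all.manager echo-redirector.example.ac.uk:3121
  xrd.port 1094
  EOF

  # On each gateway / worker node
  cat > /etc/xrootd/xrootd-gateway.cfg <<'EOF'
  all.role server
  all.manager echo-redirector.example.ac.uk:3121
  xrd.port 1094
  EOF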
S3 / Swift

• We believe S3 / Swift are the industry-standard protocols we should be supporting.
• An S3 / Swift gateway is being provided for all users (example setup below):
  • In the process of enabling SSL.
  • Need to ensure credentials are looked after properly.
  • Then we will look at opening access to the world (all UK sites should have access).
• Within the Tier 1:
  • CVMFS Stratum 1 (tested).
  • Docker images (in use).
  • ELK backup (in progress).
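For illustration, creating an S3 user on the gateway and exercising it with a standard client might look like this (uid, gateway host and bucket name are assumptions; the keys are placeholders for the values the first command prints):

  # Create a RadosGW (S3) user; the command prints its access and secret keys
  radosgw-admin user create --uid=tier1-test --display-name="Tier 1 test user"
  # Exercise the gateway with any standard S3 client, e.g. s3cmd
  ACCESS_KEY="replace-with-access-key"
  SECRET_KEY="replace-with-secret-key"
  GW_HOST=s3-gateway.example.ac.uk
  s3cmd --access_key="$ACCESS_KEY" --secret_key="$SECRET_KEY" \
        --host="$GW_HOST" --host-bucket="$GW_HOST" mb s3://test-bucket
  s3cmd --access_key="$ACCESS_KEY" --secret_key="$SECRET_KEY" \
        --host="$GW_HOST" --host-bucket="$GW_HOST" put testfile s3://test-bucket/testfile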
ATLAS and S3

• ATLAS are currently the main external users of S3:
  • Log files.
  • ATLAS Event Service (AES).
• The AES writes the output of individual events to an S3 endpoint so that jobs can be killed at any time.
  • All UK sites write to either RAL or Lancaster.
  • The AES is working, but there is currently very little work for it.
• ATLAS log files are known to cause stress on storage that is designed for large files.
  • At RAL they are 20-30% of the transactions, with ~50 TB of space used.
  • Tested, but waiting on pilot development to implement in production.
Grid tools for S3/Swift

• Are there tools available that will integrate S3/Swift into the Grid?
• Oliver Keeble's group at CERN has developed:
  • Davix - a high-performance HTTP client (used by ROOT), with S3 optimisations (example below).
  • gfal2 - using Davix, full HTTP/S3 support.
  • Dynafed - dynamic HTTP storage federation and S3 gateway.
  • FTS3 - support for HTTP, S3 & Co., plus protocol translation (e.g. gridftp → S3).
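As an example of driving an S3 endpoint directly with Davix (gateway host and bucket are assumptions, the keys are placeholders, and the exact flag spellings should be checked against the installed Davix release):

  # Upload and list against an S3 gateway using access/secret keys
  ACCESS_KEY="replace-with-access-key"
  SECRET_KEY="replace-with-secret-key"
  davix-put --s3accesskey "$ACCESS_KEY" --s3secretkey "$SECRET_KEY" \
            testfile-50M https://s3-gateway.example.ac.uk/test-bucket/testfile-50M
  davix-ls  --s3accesskey "$ACCESS_KEY" --s3secretkey "$SECRET_KEY" \
            https://s3-gateway.example.ac.uk/test-bucket/

gfal2 and FTS3 use Davix underneath for HTTP/S3, so the same endpoints can be driven through those tools as well.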
DynaFed

• We believe DynaFed is the best tool to allow small VOs secure access:
  • S3/Swift credentials are stored on the DynaFed box.
  • Users use a VOMS proxy.

davix-ls -k https://vm118.nubes.stfc.ac.uk/myfed/
davix-put -k testfile-50M https://vm118.nubes.stfc.ac.uk/myfed/s3-federation/testfile-50M

[Diagram: a job with a proxy (1) sends the proxy + request to the DynaFed box, (2) receives a pre-signed URL, and (3) transfers the data directly to/from S3 / Swift.]
DynaFed WebUI

• A WebUI has been created for ease of use:
  • Creates a directory structure (an object called a/foobar).
  • Not quite complete (upload is still buggy).
• Can add any other endpoint with WebDAV support.
• 'Metalink' will provide a pre-signed URL with a 1-hour lifetime.
• Dashboard and browser are based on DPM tools.
Summary & Plans

• Future Tier 1 storage is heavily Ceph based.
• We intend to provide ATLAS and CMS pledged storage on Echo in 2017.
  • Waiting on the GridFTP plugin.
• Have been exploring ways to use S3 / Swift.
  • We welcome any feedback regarding DynaFed.
Backup