Ceph status and plans
Alastair Dewhurst, Alison Packer, James Adams, Bruno Canning, George Vasilakakos, Ian Johnson
Alastair Dewhurst, 1st September 2016

Ceph at RAL
[Diagram: two production clusters. Echo (8+3 erasure-coded cluster) provides the S3/Swift service, XrootD/GridFTP access and Castor buffers; Sirius (3x replication) provides RBD for the Cloud and CephFS for Facilities.]
• Common configuration management.
• Both clusters run by the data services team.
• Working on improving monitoring and operational procedures.

Sirius cluster
• Has been providing reliable RBD storage for internal cloud services for over a year.
• One serious incident resulted in the loss and recreation of all Cloud VMs.
   • We learnt much from this!
• Providing pools for OpenNebula and a test OpenStack.
• Future:
   • Additional rack of hardware being purchased.
   • CephFS planned for STFC Facilities users.

Echo cluster
• Working (new) cluster since July 2016:
   • 3 physical monitors (hot spare on order).
   • 60 x 216TB storage nodes.
   • 3 gateway machines (awaiting network upgrade to 40Gb/s).
   • Running Jewel on SL7.
   • 9.4PB usable storage (8+3 erasure coding).
• Intend to provide ATLAS and CMS 2.8PB each as part of pledged resources in April 2017.

Ceph Monitoring
• Nagios for exceptions.
• Ceph Dashboard: https://github.com/Crapworks/ceph-dash
• Telegraf plugin for Ceph → InfluxDB: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/ceph
• Log files → ELK

Echo Benchmarking
• Ran Ceph 'rados bench' from 3 machines with 10Gb/s links.
• Each machine concurrently writing 64 x 4MB objects.
• Average latency 0.35s.
• Bottleneck appears to be the benchmarking machines, not the cluster (a python-rados sketch of an equivalent test appears after the XrootD architecture slide).

GridFTP + XrootD plugins
• For the LHC VOs we need working GridFTP and XrootD access.
• Without modifying any XrootD source code we have a working plugin:
   • Authorization using Gridmap file + AuthDB.
   • Large amount of performance optimization being done by CERN [1].
   • Minor bugs expected to be fixed in XrootD 4.45.
• Ian Johnson has written authorization for GridFTP using the same model as XrootD.
[1] Talk by Sebastien Ponce describing performance: https://indico.cern.ch/event/524549/contributions/2185945/attachments/1289528/1919824/CephForHighThroughput.pdf

Plugin architecture
[Diagram: client software (xrdcp / globus-url-copy) contacts a gateway, which authenticates the user against a Gridmap file ("Is user valid?") and authorizes the operation against an AuthDB ("Is user allowed to perform op?"). Failures return an error; successful requests transfer data to/from storage via libRadosStriper. For XrootD the gateway could be the WN.]

GridFTP plugin design
[Diagram: an FTS client sends multiple GridFTP streams to the gateway, which reassembles arriving data in two buffers before writing to the Ceph backend; a patch prevents writes from getting too far off-stream.]
• FTS transfers use GridFTP with multiple streams.
• Need to re-assemble data to allow decent performance.
   • Also done for HDFS at Nebraska (Brian Bockelman).

XrootD Architecture
• XrootD redirector used to load balance across gateways.
• ATLAS/CMS jobs will use redirectors as failover.
• WNs will be installed with an XrootD gateway, allowing direct connection to Echo.
[Diagram: a UK XrootD redirector sits above the RAL XrootD redirector, which forwards to the Echo and Castor XrootD redirectors; redirectors and gateways each run xrootd + cmsd, and every WN runs its own XrootD gateway in front of the Ceph backend.]
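For reference, a minimal sketch of the parallel-write test described on the Echo benchmarking slide, written against the python-rados bindings: it issues 64 asynchronous 4MB object writes from one client and reports the average per-object latency. The pool name, payload and ceph.conf path are placeholders for illustration only; the measurements quoted above were taken with the stock 'rados bench' tool.

#!/usr/bin/env python
# Sketch only: 64 concurrent 4MB object writes, timed per object.
# 'bench-test' and the ceph.conf path are assumed values, not the real cluster config.
import threading
import time
import rados

POOL = 'bench-test'                      # assumed benchmark pool
N_OBJECTS = 64                           # concurrent writes per client, as on the slide
PAYLOAD = b'\0' * (4 * 1024 * 1024)      # 4MB object

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)

lock = threading.Lock()
done = threading.Event()
latencies = []
outstanding = [N_OBJECTS]
completions = []                         # keep references so completions are not GC'd

def make_callback(t0):
    def _on_complete(completion):
        # Called by librados once the write is complete on all shards/replicas.
        with lock:
            latencies.append(time.time() - t0)
            outstanding[0] -= 1
            if outstanding[0] == 0:
                done.set()
    return _on_complete

start = time.time()
for i in range(N_OBJECTS):
    t0 = time.time()
    completions.append(
        ioctx.aio_write_full('bench_obj_%d' % i, PAYLOAD, oncomplete=make_callback(t0)))

done.wait()
elapsed = time.time() - start
print('%d x 4MB objects in %.2fs, average latency %.2fs'
      % (N_OBJECTS, elapsed, sum(latencies) / len(latencies)))

ioctx.close()
cluster.shutdown()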
S3 / Swift
• We believe S3 / Swift are the industry-standard protocols we should be supporting.
• S3 / Swift gateway is being provided for all users.
   • In the process of enabling SSL.
• Need to ensure credentials are looked after properly.
   • Then will look at opening access to the world (all UK sites should have access).
• Within the Tier 1:
   • CVMFS Stratum 1 (tested)
   • Docker images (in use)
   • ELK backup (in progress)

ATLAS and S3
• ATLAS are currently the main external users of S3:
   • ATLAS Event Service.
   • Log files.
• The ATLAS Event Service (AES) writes the output of individual events to an S3 endpoint so that jobs can be killed at any time.
   • All UK sites write to either RAL or Lancaster.
   • AES working, but very little work for it.
• ATLAS log files known to cause stress on storage that is designed for large files.
   • At RAL 20-30% of the transactions; ~50TB space used.
   • Tested, but waiting on pilot development to implement in production.

Grid tools for S3/Swift
• Are there tools available that will integrate S3/Swift into the Grid?
• Oliver Keeble's group at CERN have developed:
   • Davix - high performance HTTP client (used by ROOT), with S3 optimisations.
   • gfal2 - using Davix, full HTTP/S3 support.
   • Dynafed - dynamic HTTP storage federation and S3 gateway.
   • FTS3 - support for HTTP, S3 & Co., plus protocol translation (e.g. gridftp → S3).

DynaFed
• We believe DynaFed is the best tool to allow small VOs secure access.
   • S3/Swift credentials stored on the DynaFed box.
   • Users use a VOMS proxy.
davix-ls -k https://vm118.nubes.stfc.ac.uk/myfed/
davix-put -k testfile-50M https://vm118.nubes.stfc.ac.uk/myfed/s3-federation/testfile-50M
[Diagram: a job with a proxy sends (1) the proxy + request to the DynaFed box, receives (2) a pre-signed URL, and then transfers (3) the data directly to/from S3/Swift. A sketch of the pre-signed URL step appears in the backup material at the end.]

DynaFed WebUI
• WebUI has been created for ease of use:
   • Creates a directory structure (object called a/foobar).
   • Not quite complete (upload is still buggy).
• Can add any other endpoint with WebDAV support.
• 'Metalink' will provide a pre-signed URL with a 1 hour lifetime.
• Dashboard and browser based on DPM tools.

Summary & Plans
• Future Tier 1 storage heavily Ceph based.
• We intend to provide ATLAS and CMS pledged storage on Echo in 2017.
   • Waiting on the GridFTP plugin.
• Have been exploring ways to use S3 / Swift.
   • Welcome any feedback regarding DynaFed.

Backup
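Backup: the pre-signed URL mechanism that DynaFed relies on can be illustrated with boto3. This is a sketch under assumed names: the endpoint, credentials and bucket below are placeholders rather than the RAL production values; only the object key a/foobar and the 1 hour lifetime come from the slides above.

#!/usr/bin/env python
# Sketch of generating a pre-signed S3 URL with a 1 hour lifetime, the same
# mechanism DynaFed uses for its 'Metalink' redirect. Endpoint, credentials and
# bucket name are placeholders.
import boto3
from botocore.client import Config

s3 = boto3.client(
    's3',
    endpoint_url='https://s3.example.ac.uk',      # placeholder gateway endpoint
    aws_access_key_id='ACCESS_KEY',               # placeholder credentials
    aws_secret_access_key='SECRET_KEY',
    config=Config(signature_version='s3v4'),
)

# Anyone holding this URL can GET the object until it expires, without ever
# seeing the S3 credentials themselves.
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'example-bucket', 'Key': 'a/foobar'},
    ExpiresIn=3600,                               # 1 hour, as quoted on the WebUI slide
)
print(url)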