SGI’s Lustre HSM
Rob Mollard
Senior Storage Specialist (APAC)
Some names and brands may be claimed as the property of others
©2016 SGI
Why consider Lustre HSM?
Nov, 2013
Galen Shipman (LANL) – Former OpenSFS Chairman
Data always lives longer than the hardware it’s stored on.
Forward migration to new technology should never adversely impact the users.
Overview of Lustre HSM
Features:
• Migrate data to and from external storage (HSM)
• Free disk space when needed
• Restore data on cache-miss
• Policy management (migration, purge, soft rm,...)
• Import from existing backend
• Disaster recovery (restore Lustre filesystem from backend)
Supported Lustre HSM Actions
Archive
• Archiving a file means pre-copying a file from Lustre to an external HSM.
• A Copy Tool (“copytool”) reads file content and copies it to an external HSM.
• Once it has been copied, a file is then ready to be released.
Release
• Remove all file data objects.
• Synchronous action that involves neither the copytool nor the coordinator.
Restore
• All file accesses are blocked until the file is fully restored.
• Copytool will write file data back from an external HSM to Lustre.
• File data accesses are unblocked when the restore is finished.
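The three actions above can be read as transitions on a per-file state. The following is a minimal Python sketch of that reading, not Lustre code: the class `HsmFile`, its fields and its methods are hypothetical names chosen for illustration. On a real system the corresponding user commands are `lfs hsm_archive`, `lfs hsm_release` and `lfs hsm_restore`.

```python
from dataclasses import dataclass


@dataclass
class HsmFile:
    """Toy model of one file's HSM state (all names are illustrative)."""
    path: str
    archived: bool = False   # a copy exists in the external HSM
    released: bool = False   # file data objects were removed from Lustre

    def archive(self) -> None:
        # Stands in for the copytool copying file content to the HSM.
        self.archived = True

    def release(self) -> None:
        # A file is only ready to be released once it has been copied.
        if not self.archived:
            raise ValueError(f"{self.path}: archive before release")
        self.released = True

    def restore(self) -> None:
        # Stands in for the copytool writing data back into Lustre.
        self.released = False

    def read(self) -> str:
        # Access to a released file is a cache miss: restore first,
        # then let the access proceed.
        if self.released:
            self.restore()
        return f"data of {self.path}"
```

For example, `f = HsmFile("/lustre/run42.dat"); f.archive(); f.release(); f.read()` leaves the file archived and back online, while calling `release()` on a file that was never archived raises an error.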
Components of Lustre HSM
• Coordinator
• Policy Engine (robinhood)
• Copytool (DMF copytool)
• HSM (DMF)
[Diagram: Lustre clients, Lustre OSS/OST building blocks and the Lustre MDS/MDT (coordinator), linked through the Lustre HSM client agent (copytool) and the policy engine (DB) to the DMF managed environment. Tier 2: Disk Cache Manager (ZWS). Tier 3: Secondary Storage (TAPE).]
Lustre HSM | Coordinator
• Centralises reception of HSM requests and ignores duplicate ones.
• Dispatches and balances requests across the available copytools.
• Manages a log of all requests.
– Requests are replayed if the MDS/MDT has crashed.
– Requests can be manually cancelled.
• Behaviour is tuneable:
– Can be stopped, resumed, purged
– Retry/no retry on error
– Timeouts, number of simultaneous requests, etc.
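The de-duplication and balancing described above can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the Lustre coordinator: the class `Coordinator`, the round-robin choice of copytool and the `max_requests` parameter are hypothetical, and the request log here is a plain dictionary rather than a persistent log.

```python
from collections import deque
from itertools import cycle


class Coordinator:
    """Toy coordinator: ignores duplicates, balances over copytools."""

    def __init__(self, copytools, max_requests=3):
        self._copytools = cycle(copytools)  # round-robin balancing (assumption)
        self.max_requests = max_requests    # requests started per dispatch call
        self.enabled = True                 # set False to stop, True to resume
        self.log = {}                       # (action, path) -> copytool or None
        self.pending = deque()

    def submit(self, action, path):
        key = (action, path)
        if key in self.log:
            return False                    # duplicate request is ignored
        self.log[key] = None
        self.pending.append(key)
        return True

    def cancel(self, action, path):
        key = (action, path)
        if key in self.pending:
            self.pending.remove(key)
        self.log.pop(key, None)

    def dispatch(self):
        """Start at most max_requests pending requests in this call."""
        started = []
        while self.enabled and self.pending and len(started) < self.max_requests:
            key = self.pending.popleft()
            self.log[key] = next(self._copytools)
            started.append((key, self.log[key]))
        return started
```

Submitting the same `("archive", path)` pair twice queues it once, and a stopped coordinator (`enabled = False`) keeps requests pending until it is resumed.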
[Diagram: Lustre MDS/MDT with the coordinator, the Lustre HSM client agent (copytool), the policy engine (DB) and the DMF managed environment. Tier 2: Disk Cache Manager (ZWS). Tier 3: Secondary Storage (TAPE).]
Lustre HSM | Policy Engine
[Diagram: Lustre clients, Lustre OSS/OST building blocks, and the Lustre MDS/MDT with coordinator, Lustre HSM client agent and policy engine (DB).]
Policy Engine manages pre-migration and purge policies.
• A user-space tool which communicates with the MDS and the coordinator.
• Watches Lustre file system changes.
• Triggers HSM actions/requests like pre-migration, purges and removal.
Policy Engine: “Robinhood”
Robinhood User Group 2016 - September 19th, 2016 - Paris, France
Robinhood is a user-space daemon for managing filesystems
• Purge oldest files when needed
• Custom policies
With a database backend
• Persistent, avoid scanning for each action
• Currently MySQL or MariaDB are supported
Supports features like:
• User/group usage accounting, including file size profiling.
• Extra-fast 'du' and 'find' clones.
• Customizable alerts on filesystem entries.
• Aware of Lustre OSTs and pools (striping).
• Filesystem disaster recovery tools.
[Diagram: Robinhood, with MDT ChangeLogs and OST usage as inputs and a DB, applies policies that issue ‘archive’ requests (copy data to target), ‘release’ requests (free disk space in Lustre) and ‘restore’ requests (copy data from target).]
Robinhood Policies
Robinhood manages 3 types of policies
• Migration policy
• Purge policy
• Removal policy
Policies - schedule actions on filesystem entries according to admin-defined criteria
• File class definitions, associated with policies
• Based on file attributes (path, size, owner, age, xattrs, ...)
• Rules can be combined with boolean operators
• Least Recently Used (LRU) based migration/purge policies
• Entries can be white-listed
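The policy ideas above (attribute-based rules, boolean combination, LRU ordering, white-listing) can be sketched in Python as follows. This is not Robinhood's configuration syntax or API: every name here (`Entry`, `path_like`, `all_of`, `purge_candidates`, and so on) is hypothetical and only illustrates the selection logic.

```python
import fnmatch
import time
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    size: int            # bytes
    owner: str
    last_access: float   # epoch seconds


# Rules are plain predicates on an entry's attributes.
def path_like(pattern):
    return lambda e: fnmatch.fnmatch(e.path, pattern)

def larger_than(nbytes):
    return lambda e: e.size > nbytes

def older_than(seconds, now=None):
    return lambda e: ((time.time() if now is None else now) - e.last_access) > seconds

# Boolean operators for combining rules.
def all_of(*rules):
    return lambda e: all(r(e) for r in rules)

def any_of(*rules):
    return lambda e: any(r(e) for r in rules)

def negate(rule):
    return lambda e: not rule(e)


def purge_candidates(entries, rule, whitelist, bytes_needed):
    """Pick least recently used matching entries until enough bytes are freed."""
    chosen, freed = [], 0
    eligible = [e for e in entries if rule(e) and not whitelist(e)]
    for e in sorted(eligible, key=lambda e: e.last_access):  # oldest access first
        if freed >= bytes_needed:
            break
        chosen.append(e)
        freed += e.size
    return chosen
```

A purge rule such as `all_of(path_like("/lustre/scratch/*"), larger_than(100), older_than(600))` with a white-list of `path_like("*/keep/*")` would select the least recently accessed matching files first and never touch anything under a `keep` directory.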
Use cases
HPC Active Backup
HPC Disaster Recovery
HPC Workflow Optimisation
Persistent HPC Storage
Lower HPC storage costs
Long Term HPC Archives
HPC Active Archive
Lustre DMF
Active Data Protection
Data Assurance and Reliability
A “3-2-1” Approach
3 copies, 2 media types, 1 copy offsite
Copies:
1. Performance copy
2. Secure copy
3. Disaster Recovery copy
Media: RAID, Flash, Disc, Tape, Object & ZWS
Locations shown: Primary Data Center; Tape or Cloud Object; Offsite or Cloud Storage
Advantage of 3 copies of all data:
• Fast data access
• Data retention
• Archive resilience
Advantage of keeping data on two different media types:
• Optimized use of storage HW
• Elimination of backup
Advantage of keeping one copy offsite:
• Lower power consumption
• Base for compliance
• Disaster recovery
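The “3-2-1” rule itself is simple enough to state as a check. The sketch below is illustrative only: the `Copy` type and the `satisfies_3_2_1` function are hypothetical names, and which media type and site each copy uses is an assumption made by whoever fills in the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str      # e.g. "performance", "secure", "disaster recovery"
    media: str     # e.g. "disk", "tape", "object"
    offsite: bool


def satisfies_3_2_1(copies):
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite
```

Three copies on disk, tape and tape with one of them offsite pass the check; three disk copies in one data centre do not.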
Lustre DMF Overview
[Diagram components:]
• Management Server (MGS) with Management Target (MGT)
• Metadata Server (MDS) with Metadata Target (MDT)
• Object Storage Servers (OSS) with Object Storage Targets (OSTs); storage servers grouped into failover pairs
• Management Network; Storage Monitoring
• Data Network (LNET) (InfiniBand/Ethernet/Omnipath)
• Policy Engine Server
• DMF Agent (copytool)
• DMF Storage Network (InfiniBand/Fibre Channel/SAS)
• Lustre Clients
• Archive
HSM | Data Migration Facility (DMF)
Hierarchical Storage Management
Transparently migrate data to Tape, MAID or Cloud
Data life cycle management
- DMF manages the placement of data within multiple tiers of storage
Automated data migration
- From expensive, production disk to 2nd or 3rd tier storage
Transparent to user
- All data appears online all the time
Key Benefits
- DMF reduces tier 1 disk investment
- DMF reduces power consumption
- DMF protects data long term
SGI® DMF™ 25 years in production
• Active User Community (DMFUG)
[Diagram: Lustre*, DMF and Cloud]
Lustre DMF | Communication & Data Flow
Legend: Lustre* HSM Communications; DMF Communications; DMF Metadata update; DMF Data
[Diagram: Lustre* clients, Lustre* OSS/OST building blocks and the Lustre* MDS/MDT building block (coordinator), linked through the Lustre* HSM client agent (copytool) and the policy engine (DB) to the DMF managed HSM environment: DMF MDS and parallel DMF data mover building blocks, each with TAPE and ZWS. DMF Direct Archiving is marked on the data path.]
Parallel Data Mover Option
• Data migration from multiple parallel servers
• Scales I/O performance
• Add additional data movers as required
DMF Direct Archiving | Data Flow
[Diagram: metadata and data flows, shown as logical and physical paths, between Lustre* clients, the Lustre* MDS (MDS 1, MDS 2), the Lustre* OSS primary storage (OSS 1 to OSS 4) and the DMF servers (DMF MDS 1, DMF MDS 2, DMF Data Mover) with a high performance disk cache, DMF Tier2 (ZWS) and DMF Tier3 (TAPE).]
Key Differentiators
• High performance data migrations
– DMF Parallel Data Movers (pDMO)
• Simplified Lustre HSM communication
– Single DMF copytool instance
• Optimised data migrations
– DMF Direct Archiving
• MAID storage target
– DMF Zero Watt Storage
• Trusted data protection
– Over 25 years preserving data
• Active user community
– DMF User Group (Feb 2017): http://hpc.csiro.au/users/dmfug/
SGI Standardised Lustre Implementation Service
Tasks & Deliverables
• Project management
• Pre-sales assessment of storage requirements
- Provide guidance on Lustre best practices
- Determine LUN layout
- Assess Lustre networking needs
• Connect/verify all HW
• Install Lustre Environment
- Install & configure Lustre SW for servers
- Tune Lustre file system environment
- Install Lustre clients
- Create failover configuration files, if required
• Benchmark & Test environment
• Documentation of system and Lustre networking configuration
• Knowledge Transfer & Onsite Training
Lustre DMF
Customer Examples
• Lustre Version: 2.5.41.1
• HSM: DMF 6.5 (ISSP 3.5)
• In use: since early 2015
• Capacity: 3PB
• Performance: ~50GB/s
• How is it used: Protection of scientific data. Customer supports users all over France, with Lustre used for persistent storage. Data replication/duplication is not feasible with PBs of data.
High Performance
Automotive Manufacturing
Formula 1 Engine Manufacturer
Lustre Version: 2.5
HSM: DMF 6.5 (ISSP 3.5)
In use: since April 2015
Capacity: 1PB
Performance: ~5GB/s
How is it used: Active Archive (HSM)
• Lustre Version: 2.5 – DDN EXAScaler (Intel EEL)
• HSM: DMF 6.5 (ISSP 3.5)
• In use: Testing for several months – just going into production now
• Capacity: 1.9PB
• Performance: ~20GB/s
• How is it used: Robinhood is running in HSM mode, although there is sufficient capacity for all data to remain online in Lustre (dual state)
Q&A