HUF2015: Lustre HSM status

FROM RESEARCH TO INDUSTRY
Lustre HSM
integration
Project update
HUF 2015 | Thomas Leibovici <[email protected]>
CEA, DAM, DIF, F-91297 Arpajon, France
September 29th, 2015
SUMMARY
Principle
Architecture and components
Project status
Vendor integration
HPSS integration
Future work
16 September 2015
HUF2015 | 29 SEPTEMBER 2015 | PAGE 2
BIG PICTURE
Principle
Take the best of each world:
Seamless HSM integration
[Diagram: Lustre clients in front of a Lustre disk cache, backed by an HSM]
Lustre: high-performance disk cache in front of the HSM
- Parallel filesystem
- High I/O performance
- POSIX access
HSM: long-term data storage
- Manages large numbers of cheaper disks and tapes
- Huge storage capacity
Ideal for a center-wide Lustre filesystem.
FEATURES AND COMPONENTS
Features
Copy data to HSM (Archive)
Free disk space when needed (Release)
Bring back data on cache-miss (Restore)
Supports multiple backends
Policy management (migration, purge, removal,…)
Import from existing backend
Undelete
Needed components
Lustre (2.5+)
Copy tool (backend specific user-space daemon)
POSIX copy tool shipped with Lustre
Policy Engine (user-space daemon)
RobinHood Policy Engine (open source)
ARCHITECTURE (1/2)
Coordinator, Agent and Copy tool
[Diagram: the coordinator on the MDS dispatches requests to "agent" clients; their copytools speak the HSM protocols, bridging the Lustre world and the HSM world]
The coordinator gathers archive requests and dispatches them to agents.
An agent is a Lustre client that runs a copytool to transfer data between Lustre and
the HSM.
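As a rough illustration of this dispatch model (a toy sketch only, not the actual coordinator code, which also batches by archive_id and tracks request progress):

```python
from itertools import cycle

def dispatch(requests, agents):
    """Toy model of the coordinator: gather HSM requests and hand
    them to copytool agents round-robin."""
    assignment = {agent: [] for agent in agents}
    for agent, request in zip(cycle(agents), requests):
        assignment[agent].append(request)
    return assignment

work = dispatch(["archive fid1", "archive fid2", "restore fid3"],
                ["agent0", "agent1"])
```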
ARCHITECTURE (2/2)
Policy Engine manages Archive and Release policies
[Diagram: the Policy Engine, running alongside the Lustre clients, talks to the MDS coordinator and the OSSes]
A user-space tool that communicates with the MDT and the coordinator.
It watches filesystem changes.
It triggers actions such as archive, release, and removal in the backend.
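A toy sketch of that reaction loop (illustrative only; the real Robinhood consumes Lustre changelog records and keeps its state in a database before applying policies):

```python
# Toy sketch of the policy-engine reaction loop (illustrative only).
def react(record):
    kind, path = record
    if kind == "CLOSE":   # file written then closed: candidate for archive
        return ("archive", path)
    if kind == "UNLNK":   # file deleted: remove its copy in the backend
        return ("remove", path)
    return None           # other record types: just update internal state

events = [("CLOSE", "/mnt/lustre/foo"), ("UNLNK", "/mnt/lustre/old")]
actions = [a for a in map(react, events) if a]
```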
EXAMPLES (1/3)
Command line tools
Sysadmins and users can manage file system states:
$ lfs hsm_archive /mnt/lustre/foo                        # ARCHIVE
$ lfs hsm_state /mnt/lustre/foo
/mnt/lustre/foo: (0x00000009) exists archived, archive_id:1
$ lfs hsm_release /mnt/lustre/foo                        # RELEASE
$ lfs hsm_state /mnt/lustre/foo
/mnt/lustre/foo: (0x0000000d) released exists archived, archive_id:1
$ md5sum /mnt/lustre/foo                                 # AUTOMATIC RESTORE
ded5b0680e566aa024d47ac53e48cdac  /mnt/lustre/foo
$ lfs hsm_state /mnt/lustre/foo
/mnt/lustre/foo: (0x00000009) exists archived, archive_id:1
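The hexadecimal value printed by lfs hsm_state is a bit mask of HSM state flags. A small sketch decoding the two values above (flag bits as defined in Lustre's lustre_user.h; assumed unchanged in your release):

```python
# Flag bits as defined in Lustre's lustre_user.h (an assumption for
# your Lustre version; check the header shipped with your release).
HSM_FLAGS = {
    0x00000001: "exists",
    0x00000002: "dirty",
    0x00000004: "released",
    0x00000008: "archived",
    0x00000010: "no_release",
    0x00000020: "no_archive",
    0x00000040: "lost",
}

def decode_hsm_state(mask: int) -> str:
    """Name the flags set in the mask printed by `lfs hsm_state`
    (ordered by bit value here, not necessarily lfs's print order)."""
    return " ".join(name for bit, name in sorted(HSM_FLAGS.items())
                    if mask & bit)

decode_hsm_state(0x00000009)  # "exists archived"
decode_hsm_state(0x0000000d)  # "exists released archived"
```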
EXAMPLES (2/3)
Example RobinHood policy: Migration
Migrate files older than 12 hours with a different behavior for small ones.
Filesets {
FileClass small_files {
definition { tree == "/mnt/lustre/project" and size < 1MB }
...
}
}
Migration_Policies {
ignore { size == 0 or xattr.user.no_copy == 1 }
ignore { tree == "/mnt/lustre/logs" and name == "*.log" }
policy migrate_small {
target_fileclass = small_files;
condition { last_mod > 6h or last_archive > 1d }
migration_hints = "cos=12" ;
}
...
policy default {
condition { last_mod > 12h }
migration_hints = "cos=3" ;
}
}
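The matching logic of such a policy can be sketched as follows (a hypothetical helper, not Robinhood code; the xattr ignore rule and the last_archive clause are omitted for brevity, and 1MB is taken as 1,000,000 bytes):

```python
# Toy evaluation of the migration policy above (hypothetical helper):
# ignore rules first, then fileclass policies, then the default.
HOUR = 3600

def migration_decision(path, size, age_s, in_project_tree):
    """Return (action, hints) or None, given a file's path, size in
    bytes, seconds since last modification, and whether it lies under
    /mnt/lustre/project."""
    # ignore rules
    if size == 0 or (path.startswith("/mnt/lustre/logs")
                     and path.endswith(".log")):
        return None
    # FileClass small_files -> policy migrate_small
    if in_project_tree and size < 1_000_000:
        return ("archive", "cos=12") if age_s > 6 * HOUR else None
    # policy default
    return ("archive", "cos=3") if age_s > 12 * HOUR else None
```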
EXAMPLES (3/3)
Example RobinHood policy: Release
Release archived files when FS usage is above 90% but ignore some files.
Purge_trigger {
trigger_on = ost_usage;
high_watermark_pct = 90%;
low_watermark_pct = 80%;
}
Purge_Policies {
ignore { size < 1KB or owner == "root" }
policy purge_quickly {
target_fileclass = class_foo;
condition { last_access > 1min }
}
...
policy default {
condition { last_access > 1h }
}
}
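The watermark mechanism above can be sketched as follows (toy model, not Robinhood code): when usage exceeds the high watermark, release least-recently-accessed files until usage falls below the low watermark:

```python
def purge(files, used, capacity, high_pct=90, low_pct=80):
    """Toy purge trigger. files: (path, size_bytes, last_access)
    tuples. Releases oldest-accessed files first until usage drops
    to the low watermark; returns the released paths."""
    released = []
    if used * 100 / capacity < high_pct:
        return released                      # trigger not reached
    for path, size, _ in sorted(files, key=lambda f: f[2]):
        if used * 100 / capacity <= low_pct:
            break                            # low watermark reached
        used -= size
        released.append(path)
    return released
```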
PROJECT STATUS
Initially developed by CEA in collaboration with
CFS/Sun/Oracle/WhamCloud/Intel...
HSM feature initially released in Lustre 2.5.0 (October 2013)
Now supported and maintained by main Lustre vendors:
Intel, Cray, Seagate, Bull, DDN, SGI...
HSM support is very active:
Lustre 2.5.1: 19 patches (1 improvement, 1 feature)
Lustre 2.5.2: 7 patches (1 improvement)
Lustre 2.5.3: 6 patches
Lustre 2.5.4 (Intel Foundation Edition): 8 patches (1 improvement)
Lustre 2.6: 35 patches (3 improvements)
Lustre 2.7: 18 patches
Lustre 2.8 (under development): 14 patches
VENDOR INTEGRATION
Intel:
Support for the HSM feature, the POSIX copytool and the Robinhood Policy Engine
is part of the Intel Enterprise Edition for Lustre solution.
SGI:
Developed a specific copytool for DMF
Now in production on large systems: NCI/ANU (Australia), CINES (France), ...
Cray:
Developed a specific copytool (enhanced POSIX copytool + Versity support)
Active developer of Robinhood Policy Engine
First customer: KAUST University (Saudi Arabia)
Seagate:
Working on HSM support (including Robinhood)
Grau Data:
Developed a parallel copy tool for OpenArchive
Support for other backends under development
HPSS INTEGRATION
CEA developed a copytool for HPSS
based on HPSS Client API
Available as open-source to HPSS sites:
http://lustrehpss.sourceforge.net
In production at CEA
Advanced (successful) testing at SLAC
Other sites have downloaded it (no feedback yet)
Mailing list (questions, new releases...):
[email protected]
[Diagram: Lustre clients and the Robinhood Policy Engine use lustreapi on the Lustre side; the HPSS copytools use the HPSS client API to reach the HPSS servers, bridging the Lustre world and the HPSS world]
HPSS SPECIFIC FEATURES
Admin-Friendly HPSS namespace
Lustre namespace
/fs/proj1/grp/user/foo
/fs/proj/grp/user1/dir1/bar
/fs/proj/grp/user1/dir2/save
/fs/mylogs/tool/20131011.log
/fs/mylogs/tool/20131012.log
HPSS namespace
/fs/proj/grp/user/foo__0x200000201:0x1a43f5:0x0
/fs/proj/grp/user1/dir1/bar__0x200100201:0x2f320:0x0
/fs/proj/grp/user1/dir2/save__0x200450201:0x74320:0x0
/fs/mylogs/tool/20131011.log__0x201210201:0x43112:0x0
/fs/mylogs/tool/20131012.log__0x200300201:0x24120:0x0
Relies on HPSS UDAs (otherwise the namespace is based on ids)
Just for admin convenience (no path update on rename)
Lustre/HSM configurations
1 HPSS system <--> several Lustre filesystems
Constraint: distinct directories
1 Lustre filesystem <--> several storage systems (HPSS, POSIX, ...)
Policy driven
e.g. small files to a NFS filer, big files to HPSS...
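The naming scheme shown above can be sketched as follows (hypothetical helper): the Lustre path suffixed with the immutable Lustre FID, which is why a rename in Lustre needs no path update in HPSS:

```python
def hpss_entry_name(lustre_path: str, fid: str) -> str:
    """Build the HPSS entry name shown above: the Lustre path (kept
    via HPSS UDAs for admin convenience) suffixed with the immutable
    Lustre FID, so a rename in Lustre needs no update in HPSS."""
    return f"{lustre_path}__{fid}"

hpss_entry_name("/fs/proj/grp/user/foo", "0x200000201:0x1a43f5:0x0")
# -> "/fs/proj/grp/user/foo__0x200000201:0x1a43f5:0x0"
```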
FAQ
“If a top directory is renamed, does this trigger millions of renames in HPSS?”
The only operations in the backend are the copy operations ("archive" and "restore"),
plus "remove" (cleaning up deleted files).
The path in HPSS is only for admin convenience; it is not used operationally.
The metadata replica used for disaster recovery is maintained by the Policy
Engine (Robinhood). No massive metadata update is performed in HPSS.
“How long does an import operation take?”
Import is a metadata-only operation (it creates a “released” file in Lustre). File
data is restored on first access (or by an explicit “hsm_restore” command).
“How is Disaster Recovery performed? Are the files recovered by copying the
files from HPSS to Lustre?”
The Policy Engine replicates filesystem metadata in near real time (using Lustre
changelogs). The contents of its database can be used to re-create files in the
“released” state with the right paths and attributes. This is a metadata-only operation.
LIMITATIONS AND FUTURE WORK
Whole files
The current implementation only supports whole-file copies
Support for partial-file copies is on the HSMv2 TODO list
DNE support (multiple Lustre MDSes)
HSM is compatible with DNE phase 1 (static namespace partitioning)
HSM is not compatible with later phases (on the TODO list for HSMv2)
Scalability: distributed policy engine DB (WIP)
Disaster recovery process
All the information needed to restore a Lustre filesystem consistent with the
current backend contents is available in the Robinhood DB.
Disaster recovery can be implemented using the 'import' command plus information
from the Robinhood DB.
However, a more automated/integrated command would be better.
Intel is working on it.
GETTING STARTED
Enable HSM feature on your Lustre filesystem:
In “Lustre Manual”:
- See “Hierarchical Storage Management (HSM)”
Get HPSS copytool at:
http://lustrehpss.sourceforge.net (download)
Doc is in tar file: share/doc/hpss_ct.pdf
Then you can test manual actions (archive, release, restore...)
To massively trigger automatic actions, get the policy engine:
http://robinhood.sourceforge.net -> “Download latest version” (lhsm flavor)
Lustre/HSM related doc: “Online documentation” -> “robinhood-lhsm tutorial”
Thanks for your attention!
Questions?
Commissariat à l’énergie atomique et aux énergies alternatives
CEA / DAM Ile-de-France| Bruyères-le-Châtel - 91297 Arpajon Cedex
T. +33 (0)1 69 26 40 00
Etablissement public à caractère industriel et commercial | RCS Paris B 775 685 019
DAM Île-de-France