Lustre HSM Test Plan

Test Plan for "HSM"

Revision History
Date    Revision    Author
Table of Contents
Introduction
Feature Setup
Test cases
The intent of this document is to provide a guideline for engineers to use when creating the test plan for a feature they are developing. Sections that do not apply to a particular feature may be deleted or marked as not applicable.
Introduction (Content should be detailed so that it can be directly used in the Lustre Operations Manual
when appropriate)
HSM is a set of new features that allow a Lustre file system to transparently use an external backend to extend the Lustre storage capacity. The namespace always remains entirely in Lustre; only file data is copied to the backend, removed from Lustre, and restored when the file is accessed.
Feature Setup
The feature is part of Lustre 2.5. One additional component is needed for large-scale tests: the robinhood policy engine (named RBH). So to set up an HSM configuration you need to:
- set up Lustre (a minimum of 2 clients is better)
- set up RBH (build, initialize the DB, and start it)
- start HSM in Lustre
To build and install RBH
Retrieve the sources from the git repo and set them up:
$ git clone https://github.com/cea-hpc/robinhood.git
$ cd robinhood
$ git checkout master
$ sh autogen.sh
Build and install the RPMs:
$ ./configure --with-purpose=LUSTRE_HSM [--with-lustre=/path/to/specific/lustre_sources]
$ make rpm
$ rpm -ivh rpms/RPMS/<arch>/robinhood-adm-<version>.<arch>.rpm rpms/RPMS/<arch>/robinhood-lhsm-<version>.<arch>.rpm

Robinhood runs on a Lustre client. A common configuration is to run it on the same host as its database to minimize DB request latency.
Setup
Enable changelogs on the MDS:
• using the config helper:
$ rbh-config enable_chglogs <fsname>
• or using lctl commands:
$ lctl --device <fsname>-MDT* changelog_register
$ lctl set_param mdd.<fsname>-MDT*.changelog_mask "all-XATTR-MARK-ATIME"
On DB host:
$ service mysqld start
$ rbh-config create_db robinhood_hsm localhost rbh_db_password
Write the password to a file with restricted access (root/0600):
$ echo rbh_db_password > /etc/robinhood.d/lhsm/.dbpassword
Get the simple configuration file from the source directory
(doc/templates/hsm_policy_basic.conf), or the detailed example in /etc/robinhood.d/lhsm/templates:
$ cp doc/templates/hsm_policy_basic.conf /etc/robinhood.d/lhsm/lustre.conf
Run
First, populate the robinhood database by running an initial scan:
$ rbh-lhsm --scan --once
Then, run the daemon to read changelogs, apply policies, etc.:
$ service robinhood-lhsm start
To use the HSM feature:
Reporting commands: rbh-lhsm-report, rbh-lhsm-find, rbh-lhsm-du...
Manually apply mass policy actions using rbh-lhsm
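A few illustrative invocations follow, assuming /mnt/lustre is the Lustre mount point; the option names are assumptions that may differ between robinhood releases, so check each command's --help output:
# illustrative only: option names vary between robinhood versions
$ rbh-lhsm-report --fs-info            # global file system summary built from the DB
$ rbh-lhsm-find /mnt/lustre -user bob  # find-like query answered from the DB
$ rbh-lhsm-du /mnt/lustre/project      # du-like usage report from the DB
# mass policy actions (archive/release) are triggered with rbh-lhsm;
# see rbh-lhsm --help for the policy-run options of your version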
Setup HSM in Lustre
HSM has to be explicitly enabled on the MDT:
$ lctl conf_param $FSNAME-MDT0000.mdt.hsm_control=enabled
To check whether HSM (actually, the coordinator) is running:
$ lctl get_param mdt.$FSNAME-MDT0000.hsm_control
This must return: mdt.lustre-MDT0000.hsm_control=enabled
To control the current coordinator state:
$ lctl set_param mdt.$FSNAME-MDT0000.hsm_control=VALUE
where VALUE can be:
* enabled: start the CDT, all features are on;
* disabled: suspend the CDT, only registration/display of requests is possible, no data movement;
* shutdown: stop the CDT;
* purge: cancel all pending requests.
There are 2 types of clients: the standard one, named client, and the agent. An agent is a Lustre client on which a copytool runs. All the agents that serve the same backend must be able to access it. The backend needed by the POSIX copytool is a POSIX FS, so it can be another Lustre FS or an NFS-mounted FS. This backend FS needs to be mounted only on the agent nodes.
Agent setup
Start the copytool daemon:
$ lhsmtool_posix --daemon -v --hsm_root /mnt/backend --archive 0 $MOUNT
Make sure (on the MDT) that the copytool is properly registered:
$ cat /proc/fs/lustre/mdt/$FSNAME-MDT0000/hsm/agents
Test cases
This should include the areas of testing that will need to be completed for this feature. Below are
examples:
The test cases are divided into 7 components:
1 – Unit Testing
2 – Regression Testing
3 – Scalability Testing
4 – Compatibility/Interop testing
5 – Performance testing
6 – Failover testing
7 – User Interface Components
Unit Testing
- set up a cfg file based on your Lustre configuration, including the agent
- run sanity-hsm.sh on a client (an example invocation is sketched below)
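A minimal invocation sketch, assuming the test scripts from the lustre-tests package are installed in the standard location and the cfg file describes the servers, clients and agent; ONLY is the stock test-framework variable for selecting a subset of tests:
$ cd /usr/lib64/lustre/tests
$ sh sanity-hsm.sh                  # full run against the configured cluster
$ ONLY="1 2 3" sh sanity-hsm.sh     # run only a subset of the test cases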
Test cases template
Test Case Name: sanity-hsm
Purpose: Test unitary features of HSM
Actors:
Description:
Environment Settings: Lustre FS with HSM enabled
Trigger:
Preconditions:
Post-conditions:
Special Requirements:
Assumptions:
Expected Results:
Notes and Issues:
Regression Testing
All acceptance-small.sh tests must pass.
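A sketch of a regression run, assuming the standard lustre-tests layout; ACC_SM_ONLY is the stock variable for restricting acceptance-small.sh to selected suites:
$ cd /usr/lib64/lustre/tests
$ sh acceptance-small.sh                                # full regression run
$ ACC_SM_ONLY="sanity sanityn" sh acceptance-small.sh   # restricted run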
Test cases template
Test Case Name
Purpose
Actors
Description
Environment Settings
Trigger
Preconditions
Postconditions
Special Requirements
Assumptions
Expected Results
Notes and Issues
Scalability Testing
The scalability points are:
- the number of pending requests
- the number of running agents
Set up 2 Lustre file systems, with separate OSTs and MDTs. One FS will be used as the backend (named the backend FS), the other one will use the HSM feature (named the HSM FS). The backend FS must be the bigger one (10 times the size of the HSM FS if possible). The HSM FS has more clients than the backend FS.
Set up the HSM configuration as previously described; RBH is not needed.
From half of the HSM clients, create thousands or millions of files of multiple sizes. After each file creation, archive the file (lfs hsm_archive file) and release it (lfs hsm_release file). From the other half of the HSM clients, read the previously created files and check that they are correct (md5sum). The read test must be done after the release, so that the restore feature is exercised. A simple way is to start the read test a few minutes after the creation/archive test. If possible, read the files in an order different from the creation order.
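A minimal sketch of the two phases, assuming $MOUNT is the HSM FS mount point; the file count, sizes and checksum file path are illustrative placeholders:
# phase 1, on half of the HSM clients: create, archive (wait for completion), release
for i in $(seq 1 100000); do
    dd if=/dev/urandom of=$MOUNT/scal.$i bs=1M count=$((RANDOM % 16 + 1))
    md5sum $MOUNT/scal.$i >> /tmp/scal.md5
    lfs hsm_archive $MOUNT/scal.$i
    # release only succeeds once the copy has completed on the agent
    until lfs hsm_state $MOUNT/scal.$i | grep -q archived; do sleep 1; done
    lfs hsm_release $MOUNT/scal.$i
done
# phase 2, a few minutes later, on the other half of the clients:
# read the files back in a random order and verify them; reads trigger restores
sort -R /tmp/scal.md5 | md5sum -c -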
A variant of the test is the following (a shell sketch is given below):
- create millions of files
- from all the clients, loop:
  - randomly choose a file
  - check its status:
    - if released, read it
    - if neither released nor archived, archive it and release it
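A possible shell sketch of this loop, assuming a pre-built list of the created files in /tmp/filelist (the list path is an illustrative placeholder):
# run on every client
while true; do
    f=$(shuf -n 1 /tmp/filelist)
    state=$(lfs hsm_state "$f")
    if echo "$state" | grep -q released; then
        # reading a released file transparently triggers its restore
        md5sum "$f" > /dev/null
    elif ! echo "$state" | grep -q archived; then
        lfs hsm_archive "$f"
        # wait for the copy to complete before releasing
        until lfs hsm_state "$f" | grep -q archived; do sleep 1; done
        lfs hsm_release "$f"
    fi
done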
Compatibility/Interoperability Testing
Set up an HSM configuration. Set up 2.4 clients.
From the HSM clients, create files, then archive and release them.
Then, from the older clients:
- read the files; they must be seen as released (read/write fails with -ENODATA)
From an HSM client, read a file; this will restore it.
From the older client, re-read the file; it must be seen the same as from the HSM client (same content).
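A minimal sketch of this sequence, assuming $MOUNT is the Lustre mount point on every client; the file name and size are arbitrary:
# on a 2.5 HSM client
$ dd if=/dev/urandom of=$MOUNT/interop.f bs=1M count=10
$ md5sum $MOUNT/interop.f > /tmp/interop.md5
$ lfs hsm_archive $MOUNT/interop.f
$ lfs hsm_release $MOUNT/interop.f      # once the archive request has completed
# on the 2.4 client: access to the released file is expected to fail
$ md5sum $MOUNT/interop.f
# back on the HSM client: reading the file triggers its restore
$ md5sum -c /tmp/interop.md5
# on the 2.4 client again: the content must now match
$ md5sum -c /tmp/interop.md5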
Performance Testing
An agent must be able to move data at Lustre client speed.
How many transfers per client are needed to reach this speed?
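A possible way to probe this, assuming $MOUNT is the HSM FS; the file count and sizes are illustrative, and since archive requests are asynchronous the rate is measured from submission until every file reports the archived state:
# create a set of large files on the agent's Lustre client
for i in $(seq 1 8); do
    dd if=/dev/zero of=$MOUNT/perf.$i bs=1M count=1024
done
start=$(date +%s)
lfs hsm_archive $MOUNT/perf.*
# wait until every file is archived, then compute the aggregate rate
until [ "$(lfs hsm_state $MOUNT/perf.* | grep -c archived)" -eq 8 ]; do sleep 1; done
echo "aggregate archive rate: $((8 * 1024 / ($(date +%s) - start))) MB/s"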
Test cases template (including the current typical test runs)
Test Case Name
Purpose
Actors
Description
Environment Settings
Trigger
Preconditions
Postconditions
Special Requirements
Assumptions
Expected Results
Notes and Issues
Failover Testing
During a test like the scalability or the compatibility test (but with HSM clients only, in place of the 2.4 clients), crash and reboot the MDT. This must be transparent to the test.
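A minimal sketch of one failover cycle while the workload is running, assuming a single MDS without automatic failover; the device path and mount point are illustrative placeholders:
# on the MDS node: simulate a crash
$ echo b > /proc/sysrq-trigger
# after the node reboots, remount the MDT so recovery can complete
$ mount -t lustre /dev/mdt_device /mnt/mdt
# from a client: check that the coordinator and its request list came back
$ lctl get_param mdt.$FSNAME-MDT0000.hsm_control
$ lctl get_param mdt.$FSNAME-MDT0000.hsm.requests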
Test cases template
Test Case Name
Purpose
Actors
Description
Environment Settings
Trigger
Preconditions
Postconditions
Special Requirements
Assumptions
Expected Results
Notes and Issues
User Interface Components: (content should be detailed so that it can be directly used in the Lustre
Operations Manual if appropriate)
Procfs/Sysfs entries added or changed:
NODE    PATH                                SETTINGS
MDT     MDT0000.mdt.hsm_control             enabled
MDT     MDT0000.mdt.hsm.agent_actions       List of registered requests (RO)
MDT     MDT0000.mdt.hsm.agents              List of registered agents (RO)
MDT     MDT0000.mdt.hsm.archive_id          1
MDT     MDT0000.mdt.hsm.grace_delay         60
MDT     MDT0000.mdt.hsm.loop_period         10
MDT     MDT0000.mdt.hsm.max_requests        10 x Agent count
MDT     MDT0000.mdt.hsm.policy              NonBlockingRestore [NoRetryAction]
MDT     MDT0000.mdt.hsm.requests            List of running requests (RO)
MDT     MDT0000.mdt.hsm.request_timeout     3600
These settings are present for each MDT in a DNE configuration.
Other system settings added or changed:
New llapi calls
LFS commands added or changed:
- lfs hsm_archive
- lfs hsm_release
- lfs hsm_restore
- lfs hsm_cancel
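A quick usage sketch of these subcommands ($MOUNT/file is an arbitrary placeholder path):
$ lfs hsm_archive $MOUNT/file    # copy the file data to the backend
$ lfs hsm_release $MOUNT/file    # free the Lustre objects, keep the metadata
$ lfs hsm_restore $MOUNT/file    # explicitly stage the data back into Lustre
$ lfs hsm_cancel $MOUNT/file     # cancel an in-flight HSM request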
LCTL commands added or changed:
None
New Scripts added:
- sanity-hsm.sh
Man Pages entries added or changed:
- lfs-hsm.1
Test Plan Results: (To be populated by the Test Engineer)
- Necessary data that demonstrates the success of the feature
- Important data from performance tests (speed, latency, and bandwidth)
- Include the data defined above in the test plan
- Additional reports or supporting documentation as defined above in the test plan
- Detailed configuration of the system used for testing