Test Plan For "HSM"

Revision History

Date | Revision | Author

Table of Contents

Introduction
Feature Setup
Test cases

The intent of this document is to provide a guideline for the engineers developing features to use when creating the test plan for the feature. Sections that do not apply to a particular feature may be deleted or marked as not applicable.

Introduction (Content should be detailed so that it can be directly used in the Lustre Operations Manual when appropriate)

HSM is a set of new features that allow a Lustre file system to transparently use an external backend to extend the Lustre storage capacity. The entire namespace always remains in Lustre; only file data is copied to the backend, removed from Lustre, and restored when the file is accessed.

Feature Setup

The feature is part of Lustre 2.5. One additional component is needed for large-scale tests: the Robinhood policy engine (named RBH).

To set up an HSM configuration you need to:
- set up Lustre (a minimum of 2 clients is recommended)
- set up RBH (build, initialize the database and start it)
- start HSM in Lustre

To build and install RBH

Retrieve the sources from the git repository and set them up:

$ git clone https://github.com/cea-hpc/robinhood.git
$ cd robinhood
$ git checkout master
$ sh autogen.sh

Build and install the RPMs:

$ ./configure --with-purpose=LUSTRE_HSM [--with-lustre=/path/to/specific/lustre_sources]
$ make rpm
$ rpm -ivh rpms/RPMS/<arch>/robinhood-adm-<version>.<arch>.rpm rpms/RPMS/<arch>/robinhood-lhsm-<version>.<arch>.rpm

Robinhood runs on a Lustre client. A common configuration is to run it on the same host as its database, to minimize DB request latency.

Setup

Enable changelogs on the MDS:

• using the config helper:
$ rbh-config enable_chglogs <fsname>

• or using lctl commands:
$ lctl --device <fsname>-MDT* changelog_register
$ lctl set_param mdd.<fsname>-MDT*.changelog_mask "all-XATTR-MARK-ATIME"

On the DB host:

$ service mysqld start
$ rbh-config create_db robinhood_hsm localhost rbh_db_password

Write the password to a file with restricted access (root/0600):

$ echo rbh_db_password > /etc/robinhood.d/lhsm/.dbpassword

Get the simple configuration file from the source directory (doc/templates/hsm_policy_basic.conf), or the detailed example in /etc/robinhood.d/lhsm/templates:

$ cp doc/templates/hsm_policy_basic.conf /etc/robinhood.d/lhsm/lustre.conf

Run

First, populate the Robinhood database by running an initial scan:

$ rbh-lhsm --scan --once

Then, run the daemon to read changelogs and apply policies:

$ service robinhood-lhsm start

To use the HSM feature:
- Reporting commands: rbh-lhsm-report, rbh-lhsm-find, rbh-lhsm-du, ... (see the example invocations below)
- Manually apply mass policy actions using rbh-lhsm
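As a quick check that the initial scan populated the database, the reporting commands can be run from the Robinhood host. This is only a sketch: /mnt/lustre/testdir is an example path, and the option name --fs-info is an assumption based on common Robinhood reporting usage; verify the exact options against rbh-lhsm-report --help for the installed version.

# Summary of entries known to Robinhood (option name is an assumption)
$ rbh-lhsm-report --fs-info
# find-like listing of a subtree already scanned by Robinhood
$ rbh-lhsm-find /mnt/lustre/testdir
# du-like space usage report for the same subtree
$ rbh-lhsm-du /mnt/lustre/testdir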
Setup HSM in Lustre

HSM has to be explicitly enabled on the MDT:

$ lctl conf_param $FSNAME-MDT0000.mdt.hsm_control=enabled

To check whether HSM (actually, the coordinator) is running:

$ lctl get_param mdt.$FSNAME-MDT0000.hsm_control

must return:

mdt.lustre-MDT0000.hsm_control=enabled

To control the current coordinator state:

$ lctl set_param mdt.$FSNAME-MDT0000.hsm_control=VALUE

where VALUE can be:
* enabled: start the CDT, all features are on
* disabled: suspend the CDT, only registration/display of requests is possible, no data movement
* shutdown: stop the CDT
* purge: cancel all pending requests

We have two types of clients: the standard one, named client, and the agent. An agent is a Lustre client on which a copytool runs. All the agents that serve the same backend must be able to access it. The backend needed by the POSIX copytool is a POSIX FS, so it can be another Lustre FS or an NFS-mounted FS. This backend FS needs to be mounted only on the agent nodes.

Agent setup

Start the copytool daemon:

$ lhsmtool_posix --daemon -v --hsm_root /mnt/backend --archive 0 $MOUNT

Make sure (on the MDT) that the copytool is properly registered:

$ cat /proc/fs/lustre/mdt/$FSNAME-MDT0000/hsm/agents

Test cases

This should include the areas of testing that will need to be completed for this feature. Below are examples. The test cases are divided into 7 components:

1 – Unit Testing
2 – Regression Testing
3 – Scalability Testing
4 – Compatibility/Interop Testing
5 – Performance Testing
6 – Failover Testing
7 – User Interface Components

Unit Testing

- set up a cfg file based on your Lustre configuration, including the agent
- run sanity-hsm.sh on a client

Test cases template

Test Case Name: sanity-hsm
Purpose: Test unitary features of HSM
Actors:
Description:
Environment Settings: Lustre FS with HSM enabled
Trigger:
Preconditions:
Post-conditions:
Special Requirements:
Assumptions:
Expected Results:
Notes and Issues:

Regression Testing

All acceptance_small tests must pass.

Test cases template

Test Case Name:
Purpose:
Actors:
Description:
Environment Settings:
Trigger:
Preconditions:
Post-conditions:
Special Requirements:
Assumptions:
Expected Results:
Notes and Issues:

Scalability Testing

The scalability points are:
- the number of pending requests
- the number of running agents

Set up 2 Lustre file systems with separate OSTs and MDTs. One FS will be used as the backend (named backend FS); the other one will use the HSM feature (named HSM FS). The backend FS must be the larger one (10 times the size of the HSM FS, if possible). The HSM FS has more clients than the backend FS. Set up the HSM configuration as previously described; RBH is not needed.

From half of the HSM clients, create thousands or millions of files of various sizes. After each file creation, archive the file (lfs hsm_archive file) and release it (lfs hsm_release file). From the other half of the HSM clients, read the previously created files and check that they are correct (md5sum). The read test must be done after the release, so that the restore feature is exercised. A simple way is to start the read test a few minutes after the creation/archive test. If possible, read the files in an order different from the creation order.

A variant of this test is the following (see the sketch after this list): create millions of files from all the clients, then loop:
- randomly choose a file
- check its status:
  - if it is released, read it
  - if it is neither released nor archived, archive it and release it
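A minimal sketch of the variant loop is given below. It assumes the test files live under /mnt/lustre/hsmtest (an example path) and relies only on the standard lfs hsm_state, hsm_archive and hsm_release commands; comparing the md5sum output against the checksums recorded at creation time, error handling and parallel execution are left to the test harness.

#!/bin/bash
# Variant scalability loop: pick random files and exercise archive/release/restore.
FSROOT=/mnt/lustre/hsmtest    # example path, adapt to the test setup

while true; do
    # pick a random file from the test tree
    f=$(find "$FSROOT" -type f | shuf -n 1)
    state=$(lfs hsm_state "$f")

    if echo "$state" | grep -q released; then
        # released file: reading it triggers an implicit restore; check its content
        md5sum "$f"
    elif ! echo "$state" | grep -q archived; then
        # new file: archive it, wait for the copytool to finish, then release it
        lfs hsm_archive "$f"
        until lfs hsm_state "$f" | grep -q archived; do sleep 1; done
        lfs hsm_release "$f"
    fi
done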
Compatibility/Interoperability Testing

Set up an HSM configuration. Set up Lustre 2.4 clients. From the HSM clients, create files, then archive and release them. Then, from the older clients, read the files: they must be seen as released (read/write fails with -ENODATA). From an HSM client, read a file: this restores it. From the older client, re-read the file: it must be seen the same as from the HSM client (same content).

Performance Testing

An agent must be able to move data at Lustre client speed. How many transfers per client are needed to reach this speed?

Test cases template (including the current typical test runs)

Test Case Name:
Purpose:
Actors:
Description:
Environment Settings:
Trigger:
Preconditions:
Post-conditions:
Special Requirements:
Assumptions:
Expected Results:
Notes and Issues:

Failover Testing

During a test like the scalability or compatibility test (but with HSM clients only, in place of the 2.4 clients), crash and reboot the MDT. This must be transparent to the test.

Test cases template

Test Case Name:
Purpose:
Actors:
Description:
Environment Settings:
Trigger:
Preconditions:
Post-conditions:
Special Requirements:
Assumptions:
Expected Results:
Notes and Issues:

User Interface Components: (content should be detailed so that it can be directly used in the Lustre Operations Manual if appropriate)

Procfs/Sysfs entries added or changed:

NODE  PATH                              SETTINGS
MDT   MDT0000.mdt.hsm_control           enabled
MDT   MDT0000.mdt.hsm.agent_actions     List of registered requests (RO)
MDT   MDT0000.mdt.hsm.agents            List of registered agents (RO)
MDT   MDT0000.mdt.hsm.archive_id        1
MDT   MDT0000.mdt.hsm.grace_delay       60
MDT   MDT0000.mdt.hsm.loop_period       10
MDT   MDT0000.mdt.hsm.max_requests      10 x agent count
MDT   MDT0000.mdt.hsm.policy            NonBlockingRestore [NoRetryAction]
MDT   MDT0000.mdt.hsm.requests          List of running requests (RO)
MDT   MDT0000.mdt.hsm.request_timeout   3600

These settings apply to each MDT in a DNE configuration (see the tuning example at the end of this section).

Other system settings added or changed:

New llapi calls

LFS commands added or changed:
- lfs hsm_archive
- lfs hsm_release
- lfs hsm_restore
- lfs hsm_cancel

LCTL commands added or changed:
None

New Scripts added:
- sanity-hsm.sh

Man Pages entries added or changed:
- lfs-hsm.1
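The coordinator tunables listed in the table above can be inspected and adjusted with lctl on the MDS. The example below is a sketch only: the numeric values are illustrative (it assumes 3 registered agents) and are not recommendations for a given test.

# Current limit on concurrently handled requests
$ lctl get_param mdt.$FSNAME-MDT0000.hsm.max_requests
# Allow roughly 10 in-flight requests per agent (example: 3 agents)
$ lctl set_param mdt.$FSNAME-MDT0000.hsm.max_requests=30
# Give a slow backend more time before a request is considered failed
$ lctl set_param mdt.$FSNAME-MDT0000.hsm.request_timeout=7200
# Current coordinator policy flags
$ lctl get_param mdt.$FSNAME-MDT0000.hsm.policy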
Test Plan Results: (To be populated by the Test Engineer)

- Necessary data that demonstrates the success of the feature
- Important data from performance tests (speed, latency, and bandwidth)
- Include the data defined above in the test plan
- Additional reports or supporting documentation as defined above in the test plan
- Detailed configuration of the system used for testing