TECHNICAL REPORT
ZFS Backup, Recovery and Cloning with the HGST Active Archive System
Roye Avidor
Technical Marketing Engineer
HGST
JANUARY 2016
Contents
Overview
Installation
Configuration
ZFS snapshot backup by examples
ZFS snapshot recovery by examples
ZFS incremental snapshot backup by examples
ZFS incremental snapshot recovery by examples
ZFS file system transfer

Overview
This paper shows you how to easily back up and recover ZFS file systems, using the HGST Active Archive System as a central repository.
The backup process makes a read-only copy of a ZFS file system state, using ZFS file system snapshots. These copies are stored in the HGST Active Archive System. The stored copies can then be used to recover corrupted file systems, roll back changes in test/dev environments, or transfer file systems to other servers.
Using the s3cmd utility, you can easily transfer data to and from the HGST Active Archive System, and manage the stored ZFS snapshot repository within it.
[Figure: ZFS agents send a backup stream to, and receive a recovery stream from, the HGST Active Archive System.]
Installation
Step 1: Install the ZFS file system (not covered in this paper).
Step 2: Install the s3cmd command-line utility from http://s3tools.org, or use the prebuilt binaries from http://s3tools.org/repositories (an installation sketch follows at the end of this section).
Step 3: Get the S3 endpoint hostname and port number for your HGST Active Archive System.
Step 4: Get credentials (account) to access the HGST Active Archive System.
In this paper, our S3 endpoint is “s3.hgst.com” with port number 7730, and we use the following credentials:
Username: zfsUser
Password: zfsSecret
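If your ZFS server has a package manager available, Step 2 can also be done without building from source. The following is a minimal sketch, assuming a RHEL/CentOS host with the EPEL repository enabled, or any host with Python and pip; adapt it to your environment:

# Install s3cmd from the distribution repositories (requires EPEL on RHEL/CentOS):
yum install -y s3cmd
# Or install it from PyPI instead:
pip install s3cmd
# Verify that the utility is available:
s3cmd --version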
Configuration
To configure the s3cmd command-line utility to access our HGST Active Archive System, we first need to initialize its configuration file and change a few configuration parameters.
Configure s3cmd as below:
#s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty
for using the env variables.
Access Key: zfsUser
Secret Key: zfsSecret
Default Region [US]:
Encryption password:
Path to GPG program [/bin/gpg]:
Use HTTPS protocol [Yes]: No
HTTP Proxy server name: s3.hgst.com
HTTP Proxy server port [3128]: 7730
New settings:
Access Key: zfsUser
Secret Key: zfsSecret
Default Region: US
Encryption password:
Path to GPG program: /bin/gpg
Use HTTPS protocol: False
HTTP Proxy server name: s3.hgst.com
HTTP Proxy server port: 7730
Test access with supplied credentials? [Y/n] n
Save settings? [y/N] y
Configuration saved to ‘/root/.s3cfg’
#
Change the authentication version from Version 4 to Version 2 by editing the .s3cfg file.
Before the change:
signature_v2 = False
After the change:
signature_v2 = True
Change the host_base and host_bucket to point to our S3 endpoint by editing the .s3cfg file.
Before the change:
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
After the change:
host_base = s3.hgst.com
host_bucket = %(bucket)s.s3.hgst.com
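If you configure many ZFS servers, the same two edits can be applied non-interactively. The following is a minimal sketch using sed, assuming the configuration file lives at its default location, /root/.s3cfg:

# Enable signature version 2 and point s3cmd at the HGST Active Archive System
# by editing /root/.s3cfg in place (assumes the default file location):
sed -i \
    -e 's/^signature_v2 = False/signature_v2 = True/' \
    -e 's/^host_base = .*/host_base = s3.hgst.com/' \
    -e 's/^host_bucket = .*/host_bucket = %(bucket)s.s3.hgst.com/' \
    /root/.s3cfg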
The last step is to create a bucket to be used for backup and recovery procedures. Later, you can
create any number of buckets to manage the repository. You can organize different ZFS pools by
different buckets, or by servers.
[root@zfs ~]# s3cmd mb s3://zfs-bucket
Bucket ‘s3://zfs-bucket/’ created
[root@zfs ~]# s3cmd ls
2015-11-09 21:34 s3://zfs-bucket
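For example, a per-server or per-pool layout could use bucket names like the following; the names are hypothetical, so pick a convention that fits your environment:

# One bucket per server and pool (hypothetical names):
s3cmd mb s3://zfs-server01-pool1
s3cmd mb s3://zfs-server02-pool1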
From now on in this document, we will use `s3://zfs-bucket` as the bucket for all S3 operations.
ZFS snapshot backup by examples
This section shows how to create a snapshot, and back it up to the S3 HGST Active Archive System.
Start by creating a new ZFS file system:
[root@zfs ~]# zfs create pool1/myDataFS
[root@zfs ~]# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
pool1           1002M  95.4G  23.5K  /pool1
pool1/myDataFS    19K  95.4G    19K  /pool1/myDataFS
[root@zfs ~]# ls -l /pool1/myDataFS
total 0
[root@zfs ~]#
Copy some data into the newly created ZFS file system:
[root@zfs ~]# cp -pr /usr/share/dict /pool1/myDataFS/
[root@zfs ~]# ls -l /pool1/myDataFS/
total 1
drwxr-xr-x. 2 root root 4 Nov 5 08:10 dict
Create a new snapshot:
[root@zfs ~]# zfs snapshot pool1/myDataFS@Snap1
[root@zfs ~]# zfs list -t snapshot
NAME                  USED  AVAIL  REFER  MOUNTPOINT
pool1/myDataFS@Snap1     0      -  4.77M  -
Now, back up the snapshot to the HGST Active Archive System:
[root@zfs ~]# zfs send pool1/myDataFS@Snap1 | s3cmd put - s3://zfs-bucket/Snap1
‘<stdin>’ -> ‘s3://zfs-bucket/Snap1’  [part 1, 4MB]
 5043928 of 5043928   100% in    0s     7.55 MB/s  done
See how we redirected the zfs data stream to the s3cmd utility using a pipe (‘|’)?
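Because the backup is just a pipeline, it is easy to wrap in a script. Below is a minimal sketch that creates a timestamped snapshot and streams it to the bucket; the dataset name, bucket name, and naming scheme are illustrative assumptions, not part of the example above:

#!/bin/bash
# Sketch: snapshot a dataset and stream it to the HGST Active Archive System
# under a timestamped object name. Dataset, bucket, and naming scheme are
# example values.
set -e

DATASET="pool1/myDataFS"
BUCKET="s3://zfs-bucket"
STAMP=$(date +%Y%m%d-%H%M%S)
SNAP="${DATASET}@backup-${STAMP}"

zfs snapshot "${SNAP}"
zfs send "${SNAP}" | s3cmd put - "${BUCKET}/backup-${STAMP}"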
ZFS snapshot recovery by examples
This section shows how to recover a stored ZFS snapshot from the HGST Active Archive System.
Let’s simulate a corrupted ZFS file system by removing the entire file system from its pool:
[root@zfs ~]# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
pool1           1.01G  95.4G  27.5K  /pool1
pool1/myDataFS  37.2M  95.4G  37.2M  /pool1/myDataFS
[root@zfs ~]# zfs destroy -r pool1/myDataFS
[root@zfs ~]# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
pool1  1.01G  95.4G  27.5K  /pool1
[root@zfs ~]# ls /pool1/myDataFS
ls: cannot access /pool1/myDataFS: No such file or directory
Now to recover the file system, first we will list the stored snapshots:
[root@zfs ~]# s3cmd ls
2015-11-09 21:34 s3://zfs-bucket
[root@zfs ~]# s3cmd ls s3://zfs-bucket
2015-11-09 21:47   5043928   s3://zfs-bucket/Snap1
Recovering the ZFS file system is a one-line command:
# s3cmd -F --no-progress get s3://zfs-bucket/Snap1 - | zfs recv pool1/myDataFS
Pay attention to the “--no-progress” switch. It tells the s3cmd utility not to print progress information to the standard output. Otherwise, the progress output would be mixed into the data stream on STDOUT and corrupt the input of the zfs recv command.
Checking the recovered file system:
[root@zfs ~]# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
pool1           1.01G  95.4G  27.5K  /pool1
pool1/myDataFS  37.2M  95.4G  37.2M  /pool1/myDataFS
[root@zfs ~]# ls -l /pool1/myDataFS/
total 1
drwxr-xr-x. 2 root root 4 Nov 5 08:10 dict
How easy was that?
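If you restore often, the two commands can be wrapped in a small helper that takes the stored object and the target dataset as arguments. A sketch, with illustrative names:

#!/bin/bash
# Sketch: restore a stored ZFS stream into a target dataset.
# Usage example: ./zfs-restore.sh s3://zfs-bucket/Snap1 pool1/myDataFS
set -e

OBJECT="$1"   # e.g. s3://zfs-bucket/Snap1
TARGET="$2"   # e.g. pool1/myDataFS

s3cmd --no-progress get "${OBJECT}" - | zfs recv "${TARGET}"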
ZFS incremental snapshot backup by examples
An incremental snapshot stream captures only the changes between two specific snapshots.
Before creating the second snapshot, let’s add new data to the file system, so each snapshot will
hold a different state of the ZFS file system:
[root@zfs ~]# ls -l /pool1/myDataFS/
total 1
drwxr-xr-x. 2 root root 4 Nov 5 08:10 dict
[root@zfs ~]# cp -pr /usr/share/man /pool1/myDataFS/
[root@zfs ~]# ls -l /pool1/myDataFS/
total 2
drwxr-xr-x. 2 root root 4 Nov 5 08:10 dict
drwxr-xr-x. 47 root root 47 Nov 9 12:58 man
Create the second snapshot:
[root@zfs ~]# zfs list -t snapshot
NAME                  USED  AVAIL  REFER  MOUNTPOINT
pool1/myDataFS@Snap1   10K      -  4.77M  -
[root@zfs ~]# zfs snapshot pool1/myDataFS@Snap2
[root@zfs ~]# zfs list -t snapshot
NAME                  USED  AVAIL  REFER  MOUNTPOINT
pool1/myDataFS@Snap1   10K      -  4.77M  -
pool1/myDataFS@Snap2     0      -  37.2M  -
Back up the incremental snapshot between Snap1 and Snap2, and store it:
# zfs send -i pool1/myDataFS@Snap1 pool1/myDataFS@Snap2 | s3cmd put - s3://zfs-bucket/Snap1_Snap2
‘<stdin>’ -> ‘s3://zfs-bucket/Snap1_Snap2’  [part 1, 15MB]
 15728640 of 15728640   100% in    1s     9.17 MB/s  done
‘<stdin>’ -> ‘s3://zfs-bucket/Snap1_Snap2’  [part 2, 15MB]
 15728640 of 15728640   100% in    1s     9.66 MB/s  done
‘<stdin>’ -> ‘s3://zfs-bucket/Snap1_Snap2’  [part 3, 15MB]
 15728640 of 15728640   100% in    1s     8.94 MB/s  done
‘<stdin>’ -> ‘s3://zfs-bucket/Snap1_Snap2’  [part 4, 15MB]
 15728640 of 15728640   100% in    1s     8.84 MB/s  done
‘<stdin>’ -> ‘s3://zfs-bucket/Snap1_Snap2’  [part 5, 7MB]
 7502992 of 7502992   100% in    0s     8.36 MB/s  done
Here, we used the “-i” option of zfs send, which takes two snapshots and produces an incremental data stream containing only the differences between them.
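Incremental backups are a natural fit for a scheduled job. The sketch below keeps the name of the last snapshot it sent in a small state file and uploads only the changes since then; the dataset, bucket, and state-file paths are illustrative assumptions:

#!/bin/bash
# Sketch: recurring incremental backup. The previous snapshot name is kept in a
# local state file (seed it once after a full backup, e.g. pool1/myDataFS@Snap1).
# Dataset, bucket, and state-file locations are example values.
set -e

DATASET="pool1/myDataFS"
BUCKET="s3://zfs-bucket"
STATE="/var/lib/zfs-backup/last-snapshot"

PREV=$(cat "${STATE}")                              # e.g. pool1/myDataFS@Snap1
CURR="${DATASET}@backup-$(date +%Y%m%d-%H%M%S)"

zfs snapshot "${CURR}"
zfs send -i "${PREV}" "${CURR}" | s3cmd put - "${BUCKET}/${PREV##*@}_${CURR##*@}"
echo "${CURR}" > "${STATE}"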
ZFS incremental snapshot recovery by examples
To show how a ZFS file system can be recovered using incremental snapshots, we will corrupt the
local file system by destroying it.
Execute the destroy command to remove the pool1/myDataFS file system:
[root@zfs ~]# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
pool1           1.01G  95.4G  27.5K  /pool1
pool1/myDataFS  37.2M  95.4G  37.2M  /pool1/myDataFS
[root@zfs ~]# zfs destroy -r pool1/myDataFS
[root@zfs ~]# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
pool1  1.01G  95.4G  27.5K  /pool1
[root@zfs ~]# ls /pool1/myDataFS
ls: cannot access /pool1/myDataFS: No such file or directory
To apply an incremental snapshot, we first need to recover the base snapshot that was used when the incremental snapshot was taken. In our example, the incremental snapshot was taken between snapshot Snap1 and snapshot Snap2, so we will first recover Snap1:
# s3cmd ls s3://zfs-bucket
2015-11-09 22:07  70417552   s3://zfs-bucket/Snap1_Snap2
2015-11-09 21:47   5043928   s3://zfs-bucket/Snap1
# s3cmd -F --no-progress get s3://zfs-bucket/Snap1 - | zfs recv pool1/myDataFS
# ls -l /pool1/myDataFS/
total 1
drwxr-xr-x. 2 root root 4 Nov  5 08:10 dict
After recovering the base snapshot, apply the incremental snapshot:
# s3cmd -F --no-progress get s3://zfs-bucket/Snap1_Snap2 - | zfs recv pool1/myDataFS
[root@zfs ~]# ls -l /pool1/myDataFS/
total 2
drwxr-xr-x. 2 root root 4 Nov 5 08:10 dict
drwxr-xr-x. 47 root root 47 Nov 9 12:58 man
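The order matters: the base stream must be received before the incremental stream. The two restores can be chained in a short script; the object keys and dataset name below are the ones used in this example:

#!/bin/bash
# Sketch: restore the base snapshot, then apply the incremental stream on top.
# Object keys and dataset name match the example above.
set -e

s3cmd --no-progress get s3://zfs-bucket/Snap1       - | zfs recv pool1/myDataFS
s3cmd --no-progress get s3://zfs-bucket/Snap1_Snap2 - | zfs recv pool1/myDataFS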
ZFS file system transfer
A ZFS file system transfer is similar to the recovery step, but this time we will use two different
servers. This is a simple way to clone file systems for test and development. These are the steps we
will take:
1. On the source server, create a snapshot and store it on the HGST Active Archive System:
source# zfs snapshot pool1/myDataFS@current
source# zfs send pool1/myDataFS@current | s3cmd put - s3://zfs-bucket/SNAP-current
2. On the target server, create a file system based on the stored snapshot:
target# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
pool2   55K  96.4G    19K  /pool2
target# s3cmd --no-progress get s3://zfs-bucket/SNAP-current - | zfs recv pool2/MyDataFS
target# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
pool2           37.3M  96.4G    21K  /pool2
pool2/MyDataFS  37.2M  96.4G  37.2M  /pool2/MyDataFS
target# ls -l /pool2/MyDataFS/
total 2
drwxr-xr-x. 2 root root 4 Nov 5 08:10 dict
drwxr-xr-x. 47 root root 47 Nov 9 12:58 man
This process can easily be scripted or automated, so that a test and development workflow can
easily take advantage of the quick snapshot capabilities of ZFS, while storing data in a highly
durable repository.
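As a starting point, a clone workflow driven from a management host might look like the sketch below; the host names, dataset names, and object key are illustrative assumptions, and both servers are assumed to have s3cmd configured as described earlier:

#!/bin/bash
# Sketch: clone a ZFS file system from a source server to a target server,
# using the HGST Active Archive System as the hand-off point.
# Host names, dataset names, and the object key are example values.
set -e

# On the source server: snapshot the dataset and upload the stream.
ssh source 'zfs snapshot pool1/myDataFS@clone'
ssh source 'zfs send pool1/myDataFS@clone | s3cmd put - s3://zfs-bucket/SNAP-current'

# On the target server: download the stream and receive it into a new dataset.
ssh target 's3cmd --no-progress get s3://zfs-bucket/SNAP-current - | zfs recv pool2/MyDataFS'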
© 2016 HGST, Inc. 3403 Yerba Buena Road, San Jose, CA 95135 USA. Produced in the United States 01/16. All rights reserved.
HGST trademarks are intended and authorized for use only in countries and jurisdictions in which HGST has obtained the rights to use, market and advertise the brand. Contact HGST for
additional information. HGST shall not be liable to third parties for unauthorized use of this document or unauthorized use of its trademarks. Other trademarks are the property of their respective
owners.
References in this publication to HGST’s products, programs, or services do not imply that HGST intends to make these available in all countries in which it operates. Product specifications
provided are sample specifications and do not constitute a warranty. Information is true as of the date of publication and is subject to change. Actual specifications for unique part numbers may
vary. Please visit the Support section of our website, www.hgst.com/support/systems-support, for additional information on product specifications. Photographs may show design models.
TR01-ZFS-Backup-Recovery-Cloning-EN-0116-1