EMC Celerra Automated Storage Tiering

EMC CONFIDENTIAL – INTERNAL USE ONLY
EMC CONFIDENTIAL – INTERNAL AND PARTNER USE ONLY
DELETE IF THIS IS A PUBLIC DOCUMENT
EMC Celerra Automated Storage Tiering
Applied Best Practices Guide
EMC NAS Product Validation
Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
Copyright © 2009 EMC Corporation. All rights reserved.
Published August, 2009
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
EMC Celerra Automated Storage Tiering Applied Best Practices Guide
P/N h6499
Contents
About this Document
Chapter 1
Automated Storage Tiering Overview
Architectural overview
Benefits of automated storage tiering
Chapter 2
General Recommendations
Storage provisioning recommendations
Recommendation #1 Evaluate the data set’s suitability for a tiered-storage architecture
Recommendation #2 Store primary tier data on EFD
Recommendation #3 Store secondary tier data on SATA
Recommendation #4 Correctly size the primary file system on Tier 0/1
Recommendation #5 Correctly size the secondary file system on Tier 2
Recommendation #6 Use deduplication on the secondary tier file system
Recommendation #7 Use manual file system extension on the primary tier file system
Recommendation #8 Use automatic file system extension on the secondary tier file system
Recommendation #9 Use virtual provisioning on the secondary tier file system
Recommendation #10 Avoid client access to the secondary file system
Archiving and deduplication policy recommendations
Recommendation #11 Avoid unintended recalls to primary storage
Recommendation #12 Use the default FileMover offline attribute setting
Recommendation #13 Use aggressive deduplication policy settings
Recommendation #14 Tune archiving policies to optimize primary tier capacity usage
Recommendation #15 Use the FMA Create Preview feature
Recommendation #16 Tune archiving policies to take advantage of data set idiosyncrasies
Recommendation #17 Tune archiving policies to match the active data set’s rate of change
Recommendation #18 Tune archiving policies to account for data that is not archivable
Recommendation #19 Run archiving jobs often enough to avoid overfilling the primary tier
Chapter 3
Migration Strategies
Single-tier migrations
Recommendation #20 Use the multiple-tier migration technique whenever possible
Recommendation #21 Use the single-tier migration technique for NFS data
Recommendation #22 Use the single-tier migration technique if extra space is not available
Recommendation #23 Use FMA to preview policies when migrating from Celerra or NetApp
Recommendation #24 If file sizes and ages are unknown, archive all files during migration
Recommendation #25 Migrate only a subset of the data set at a time
Recommendation #26 After the migration, temporarily set the read recall policy to full recall
Multiple-tier migrations
Recommendation #27 Use the single-tier migration technique for NFS data
Recommendation #28 Correctly size the temporary file system
Recommendation #29 Configure FMA and FileMover on the temporary file system and the target primary file system
Recommendation #30 Use the temporary file system to tune archiving policies
Recommendation #31 Set the CIFS backup option on the temporary file system to offline
Chapter 4
Data Protection Strategies
Backup recommendations
Recommendation #32 If the backup window allows, back up the data set as a single file system
Recommendation #33 To reduce the backup window, back up primary and secondary file systems separately
Recommendation #34 When using the separate backup technique, back up the secondary file system after FMA operations
Restore recommendations
Recommendation #35 Use single-tier migration techniques when performing the full restore of backups containing both primary and secondary tier files
Recommendation #36 Preserve stub file synchronization when restoring separate backups of primary and secondary tier data
SnapSure recommendations
Recommendation #37 Take separate snapshots of primary and secondary file systems
Recommendation #38 Take snapshots of secondary file systems only after FMA activities
Recommendation #39 Preserve stub file synchronization when restoring snapshots of primary or secondary tier file systems
Recommendation #40 Do not keep primary tier snapshots for more than 30 days
Recommendation #41 Minimize the number and age of secondary tier snapshots
Recommendation #42 Minimize the number and age of primary tier snapshots
Recommendation #43 Store primary tier file system snapshots on secondary tier storage
Replication recommendations
Recommendation #44 Replicate both primary and secondary file systems
Recommendation #45 Configure source and destination FMA devices
Recommendation #46 Populate and deduplicate the source site secondary tier file system before configuring replication
Recommendation #47 Use the same CIFS domain on replicated sites
Recommendation #48 Use local host files instead of DNS entries to resolve secondary tier hostnames when using replication
Chapter 5
Performance Recommendations
Introduction
Recommendation #49 Use VLANs to segregate network traffic
Recommendation #50 Monitor read access to the secondary tier file system
About this Document
This document describes best practices for configuring and managing the Celerra automated storage
tiering architecture.
Audience
This document is intended for storage administrators who are engaged in planning deployments of
network-attached storage.
Related documents
The following documents provide additional, relevant information. Access to these documents is based
on your login credentials. If you do not have access to the following content, contact your EMC
representative:
♦ Achieving Storage Efficiency with EMC Celerra — Best Practices Planning
♦ Automated Tiered Storage with EMC Celerra and Rainfinity File Management Appliance — Applied Best Practices
♦ EMC Rainfinity File Management Application Installation and User Guide
♦ Using EMC Celerra FileMover — Technical Module
♦ Using Celerra Data Deduplication — Technical Module
Chapter 1
Automated Storage Tiering Overview
This chapter presents these topics:
Architectural overview
Benefits of automated storage tiering
Architectural overview
EMC® Celerra® automated storage tiering is a storage architecture that uses Celerra FileMover and
Rainfinity® File Management Appliance (FMA) to create what is effectively a multi-tiered file system by
distributing the contents of a data set across multiple file systems that reside on different storage tiers.
Network-attached storage (NAS) clients access the data set from the primary tier file system regardless
of where the data is stored. To the client, the multi-tiered storage used for the data set appears to be a
single homogeneous file system with a contiguous namespace. As shown in Figure 1, the active files in
the data set are stored on Tier 0 or Tier 1, which consist of enterprise Flash drives (EFD) or Fibre
Channel (FC) disks, respectively. The inactive files in the data set are stored on Tier 2 Serial ATA (SATA) disks.
Figure 1 Celerra automated storage tiering architecture with FMA (Tier 0: EFD; Tier 1: FC / SAS; Tier 2: SATA with deduplication; Rainfinity FMA)
Benefits of automated storage tiering
The automated storage tiering architecture delivers Tier 0/1 performance for the active portion of a data set
while providing Tier 2 storage cost savings for the inactive portion, without the cost of allocating Tier 0/1
storage for the entire data set. In addition, deduplication on Tier 2 reduces the amount of storage required for
the data set, thereby reducing the number of required drives along with power and cooling costs.
Chapter 2
General Recommendations
This chapter presents these topics:
Storage provisioning recommendations
Recommendation #1 Evaluate the data set’s suitability for a tiered-storage architecture
Recommendation #2 Store primary tier data on EFD
Recommendation #3 Store secondary tier data on SATA
Recommendation #4 Correctly size the primary file system on Tier 0/1
Recommendation #5 Correctly size the secondary file system on Tier 2
Recommendation #6 Use deduplication on the secondary tier file system
Recommendation #7 Use manual file system extension on the primary tier file system
Recommendation #8 Use automatic file system extension on the secondary tier file system
Recommendation #9 Use virtual provisioning on the secondary tier file system
Recommendation #10 Avoid client access to the secondary file system
Archiving and deduplication policy recommendations
Recommendation #11 Avoid unintended recalls to primary storage
Recommendation #12 Use the default FileMover offline attribute setting
Recommendation #13 Use aggressive deduplication policy settings
Recommendation #14 Tune archiving policies to optimize primary tier capacity usage
Recommendation #15 Use the FMA Create Preview feature
Recommendation #16 Tune archiving policies to take advantage of data set idiosyncrasies
Recommendation #17 Tune archiving policies to match the active data set’s rate of change
Recommendation #18 Tune archiving policies to account for data that is not archivable
Recommendation #19 Run archiving jobs often enough to avoid overfilling the primary tier
Storage provisioning recommendations
This section details recommendations for storage provisioning.
Recommendation #1 Evaluate the data set’s suitability for a tiered-storage architecture
Evaluate the data set for compatibility with a tiered-storage architecture before implementing a tiered-storage solution. The three data set characteristics that affect the effectiveness of tiering are:
♦ Distribution of file sizes
♦ Distribution of file access dates
♦ Number of files in the data set
Based on these characteristics, the following sections discuss the file size, file age, and file count
considerations.
File size considerations
Celerra FileMover does not reduce the space used on the primary tier storage for files that are smaller than
8 KB because each archived file is replaced with an 8 KB stub file on the primary tier file system.
Therefore, a data set that is mostly made up of 5 KB files is unsuitable for tiering. As the average file size
increases, the benefits of tiering also increase because greater primary tier space savings can be attained
with larger file sizes.
For example:
♦ A 1 TB data set of 5 KB files requires 1 TB of primary storage (You attain 0 percent primary tier space savings by tiering).
♦ A 1 TB data set of 50 KB files requires approximately 160 GB of primary storage for stub files if the entire data set is archived (You can attain approximately 84 percent primary tier space savings by tiering).
♦ A 1 TB data set of 1 MB files requires approximately 8 GB of primary storage for stub files if the entire data set is archived (You can attain 99 percent primary tier space savings by tiering).
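The arithmetic behind these examples can be sketched in a few lines of Python. This is a minimal sketch, not an EMC tool: it assumes every file in the data set has the same (average) size, uses binary units (1 TB = 1,024 GB, 1 GB = 1,048,576 KB), and assumes that files no larger than a stub simply stay on the primary tier. The function names are illustrative only.

```python
STUB_KB = 8              # each archived file leaves an 8 KB stub on the primary tier
KB_PER_GB = 1024 * 1024  # binary units (assumption)


def primary_space_gb(data_set_gb: float, avg_file_kb: float) -> float:
    """Primary-tier space needed if the entire data set is archived.

    Files no larger than a stub gain nothing from archiving, so in that
    case the whole data set is assumed to stay on the primary tier.
    """
    if avg_file_kb <= STUB_KB:
        return data_set_gb
    file_count = data_set_gb * KB_PER_GB / avg_file_kb
    return file_count * STUB_KB / KB_PER_GB


def primary_savings(data_set_gb: float, avg_file_kb: float) -> float:
    """Fraction of primary-tier space saved by archiving every file."""
    return 1.0 - primary_space_gb(data_set_gb, avg_file_kb) / data_set_gb


if __name__ == "__main__":
    # 1 TB data sets of 5 KB, 50 KB, and 1 MB files, as in the examples above.
    for size_kb in (5, 50, 1024):
        used = primary_space_gb(1024, size_kb)
        saved = primary_savings(1024, size_kb)
        print(f"{size_kb:>5} KB files: {used:7.1f} GB on primary, {saved:.0%} saved")
```

Because the savings reduce to 1 minus (8 KB divided by the average file size), the average file size alone is a quick first filter when evaluating a data set.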
File age considerations
You can configure the FMA policy engine to select files for archiving based on a file’s level of activity,
which is indicated by the number of days since it was last accessed or modified. The goal of automated
storage tiering is to store the active data on the primary tier storage and the inactive data on the lower tiers.
If all files in a data set are accessed on a regular basis, none of the files are inactive enough to be
candidates for archiving. Tiering is most beneficial when last-accessed dates are evenly distributed and
older files are less likely to be accessed than newer ones.
For example:
♦ A 1 TB data set where all files are accessed on a regular basis requires 1 TB of primary storage (You attain 0 percent primary tier space savings by tiering).
♦ A 1 TB data set where 80 percent of the files are rarely accessed requires approximately 200 GB for the active files (You can attain 80 percent primary tier space savings by tiering).
♦ A 1 TB data set where 80 percent of the files are more than 2 months old and files more than 2 months old are never accessed requires approximately 200 GB for the active files (You can attain 80 percent primary tier space savings by tiering. The percentage savings increase as the data set grows).
Note that the amount of storage consumed by the active files in the third example remains constant over
time because it is possible to schedule a policy to archive all files that are more than two months old at
regular intervals. This creates a “sliding window” such that at any given time, only two months’ worth of
data is stored on the primary storage tier. Regardless of how much the data set grows, the amount of
primary storage space required to store the active files remains at 200 GB. This makes the example a very
good candidate for tiering because you can use the primary tier to store the sliding window of 200 GB of
active data and use the secondary tier as an overflow for an ever-increasing amount of inactive data.
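The sliding window can be illustrated with a small simulation. This sketch assumes a constant, hypothetical ingest of 100 GB per month (chosen so that two months of data equals the 200 GB in the example), a policy that archives everything outside the window once a month, and archived files that are never accessed again. It ignores stub files, which Recommendation #4 accounts for separately.

```python
from typing import List, Tuple


def simulate_sliding_window(
    monthly_ingest_gb: float, window_months: int, months: int
) -> List[Tuple[float, float]]:
    """Return (primary_active_gb, archived_gb) at the end of each month.

    Each month adds monthly_ingest_gb of new data; a scheduled archiving
    policy then moves every month's data that falls outside the window.
    """
    if window_months < 1:
        raise ValueError("window_months must be at least 1")
    ingest_by_month: List[float] = []
    results: List[Tuple[float, float]] = []
    for _ in range(months):
        ingest_by_month.append(monthly_ingest_gb)
        active = sum(ingest_by_month[-window_months:])    # still inside the window
        archived = sum(ingest_by_month[:-window_months])  # aged out to the secondary tier
        results.append((active, archived))
    return results


if __name__ == "__main__":
    for month, (active, archived) in enumerate(simulate_sliding_window(100, 2, 12), 1):
        print(f"month {month:2}: primary {active:5.0f} GB, secondary {archived:5.0f} GB")
```

After the window fills, the primary figure stays flat while the secondary figure grows by the monthly ingest, which is the behavior the example describes.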
File count considerations
Each FMA/VE virtual appliance can archive up to 50 million files, and each hardware FMA appliance can
archive up to 200 million files. When evaluating tiering for a data set with a large number of files, include
the projected growth in the total number of files across all tiered data sets on Celerra, so that the number of
archived files stays within the published limits of the selected FMA product.
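A quick headroom check along these lines can be sketched as follows. The 50 million and 200 million limits come from this guide; the file counts and growth rate in the usage block are hypothetical, and the sketch conservatively treats every file in the tiered data sets as a potential archived file.

```python
import math

# Per-appliance archived-file limits quoted in this guide.
FMA_FILE_LIMITS = {
    "FMA/VE": 50_000_000,
    "FMA (hardware)": 200_000_000,
}


def years_until_limit(current_files: int, files_per_year: int, limit: int) -> float:
    """Years before the projected file count reaches an FMA limit.

    Conservatively counts every file in all tiered data sets as archivable.
    Returns 0.0 if the limit is already reached and infinity if there is
    no growth.
    """
    if current_files >= limit:
        return 0.0
    if files_per_year <= 0:
        return math.inf
    return (limit - current_files) / files_per_year


if __name__ == "__main__":
    # Hypothetical: 30 million files today across all tiered data sets,
    # growing by 5 million files per year.
    for product, limit in FMA_FILE_LIMITS.items():
        years = years_until_limit(30_000_000, 5_000_000, limit)
        print(f"{product}: about {years:.1f} years of headroom")
```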
Recommendation #2 Store primary tier data on EFD
The performance of enterprise Flash drives (EFD) far exceeds that of other disk drive technologies. Depending on the
data access characteristics, a set of EFDs can support up to 30 times the I/O rate of a similar number of FC
drives.
Recommendation #3 Store secondary tier data on SATA
In the automated storage tiering architectural model, only inactive data is stored on the secondary tier.
Therefore, you can achieve the cost savings of storing secondary tier data on high-capacity SATA drives
with minimal impact on production data access because all active data is stored on the primary tier. The
files archived to the secondary tier remain online and readily available, and when they are updated, they are
automatically promoted to the primary tier.
Recommendation #4 Correctly size the primary file system on Tier 0/1
The benefits of automated storage tiering are maximized by restricting the amount of storage allocated to
the primary file system to the size that is required to store active files and stubs for inactive files. Before
migrating to an automated storage tiering architecture, choose the criteria to differentiate between active and
inactive data. The criteria can be based on access characteristics such as the date last accessed or modified, or
you can choose criteria that keep an arbitrary percentage of the data on the primary tier, for example,
declaring the most recently accessed 20 percent of the data set as the active portion.
To protect against accidental recalls of archived data, an archived file is not relocated back to the
primary tier storage unless it is modified (Recommendation #11). Because read access to the
archived files is significantly slower than read access to the primary tier files, it is important to evaluate
data access patterns to ensure that archived files are unlikely to require repeated future read access.
After you select the archiving criteria, use FMA to preview policy definitions, which split the data into
active and inactive portions, paying special attention to the total size of the active portion.
Provision the primary tier file system so that it is large enough to store all active data plus an 8 KB stub
for each file in the inactive portion. Leave additional space in the primary file system for future growth in
the form of space for additional stub files. The primary tier always stores the active portion of the data set;
over time, as new data is added to the primary tier, older data is archived to the secondary tier. With a
constant rate of growth, the rate of new file creation reaches equilibrium with the rate at which old files are
archived, so the size required for the primary file system remains fairly static. However, each archived file
still requires 8 KB of space on the primary file system. Therefore, make an allowance for the primary storage
consumed by these stub files as the data set continues to grow.
For example, the following calculation provisions primary storage for a target utilization of 70 percent when
the primary tier stores the most active 20 percent of a 1 TB data set of 2 million files. The data set is
expected to grow by 1 million files (500 GB) per year for the next two years:
Current active data
= 20% of 1 TB, 20% of 2 million files
= 200 GB consumed by 400,000 files
Current inactive data
= 80% of 1 TB, 80% of 2 million files
= 800 GB consumed by 1,600,000 files
Stub files for the inactive data
= 8 KB * 1,600,000 files
= 12 GB
Stub files for the next 2 years
= 8 KB * 2,000,000 files
= 15 GB
Space required on the primary tier
= 200 GB + 12 GB + 15 GB
= 227 GB
Target utilization of the primary tier
= 70%
Recommended size of the primary tier
= 227 GB * (100%/70%)
= 324 GB
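The same calculation can be expressed as a small Python function. This is a sketch under stated assumptions: it uses binary units (1 GB = 1,048,576 KB), which matches the 12 GB and 15 GB stub figures above, and because it does not round intermediate values it returns about 325 GB rather than 324 GB. The function and parameter names are illustrative.

```python
STUB_KB = 8
KB_PER_GB = 1024 * 1024  # binary units (assumption)


def primary_tier_size_gb(
    active_gb: float,
    inactive_files: int,
    future_files: int,
    target_utilization: float,
) -> float:
    """Recommended primary file system size, following Recommendation #4.

    active_gb: space consumed by the active portion of the data set
    inactive_files: files already inactive (each leaves an 8 KB stub)
    future_files: files expected to be added, and eventually archived,
        during the planning period
    target_utilization: desired fraction of the file system in use (0-1]
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    stub_gb = (inactive_files + future_files) * STUB_KB / KB_PER_GB
    required_gb = active_gb + stub_gb
    return required_gb / target_utilization


if __name__ == "__main__":
    size = primary_tier_size_gb(200, 1_600_000, 2_000_000, 0.70)
    print(f"Recommended primary tier size: {size:.0f} GB")
```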
Recommendation #5 Correctly size the secondary file system on Tier 2
Ensure that the secondary tier file system is large enough to hold all currently inactive data, plus space for
currently active data and new data as they become inactive over time.
For the data set described in Recommendation #4, use the following calculation to estimate the secondary
storage required to store the inactive portion of the data set:
Growth of the inactive data (2 years)
= 1 TB consumed by 2 million files
Logical size of the secondary data
= 800 GB currently inactive + 1 TB future growth
= 1.8 TB
Target utilization of the secondary
= 70%
Recommended size of the secondary
= 1.8 TB * (100%/70%)
= 2.6 TB
Recommendation #6 Use deduplication on the secondary tier file system
The use of deduplication on the secondary file system reduces the amount of storage that needs to be
allocated to it. For the data set described in Recommendation #5, the following calculation yields a
reduced recommended size for the secondary file system:
Logical size of the secondary data
= 1.8 TB
Expected deduplication savings
= 40%
Deduplicated size of the secondary data
= (100% - 40%) * 1.8 TB
= 1.1 TB
Target utilization of the secondary tier
= 70%
Recommended size of the secondary
= 1.1 TB * (100%/70%)
= 1.6 TB
This calculation yields a net savings of 1 TB of secondary storage when compared with the calculation in
Recommendation #5.
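Recommendations #5 and #6 can be combined into one sizing function, where a deduplication savings of 0 reproduces the Recommendation #5 result. This is a sketch only; the 40 percent savings is the example's expected value, not a guaranteed figure. Because the guide rounds the deduplicated size to 1.1 TB before dividing, its 1.6 TB figure is slightly more conservative than the unrounded result of about 1.54 TB.

```python
def secondary_tier_size_tb(
    inactive_tb: float,
    future_inactive_tb: float,
    dedupe_savings: float,
    target_utilization: float,
) -> float:
    """Recommended secondary file system size (Recommendations #5 and #6).

    dedupe_savings: expected fraction of logical capacity saved, in [0, 1);
        use 0 when deduplication is not enabled.
    target_utilization: desired fraction of the file system in use (0-1]
    """
    if not 0 <= dedupe_savings < 1:
        raise ValueError("dedupe_savings must be in [0, 1)")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    logical_tb = inactive_tb + future_inactive_tb
    physical_tb = logical_tb * (1 - dedupe_savings)
    return physical_tb / target_utilization


if __name__ == "__main__":
    without_dedupe = secondary_tier_size_tb(0.8, 1.0, 0.0, 0.70)
    with_dedupe = secondary_tier_size_tb(0.8, 1.0, 0.40, 0.70)
    print(f"Without deduplication: {without_dedupe:.2f} TB")
    print(f"With 40% deduplication: {with_dedupe:.2f} TB")
```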
The deduplication savings percentage varies based on the degree of duplication and the compressibility of
the files in the data set. Files smaller than 24 KB are not deduplicated. Therefore, data sets
that contain a high percentage of small files experience lower rates of deduplication savings.
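To gauge how much a small-file population limits deduplication, you can compute the share of the data set's bytes held in files at or above the 24 KB threshold. This sketch gives only an upper bound on the data that deduplication can act on; actual savings still depend on duplication and compressibility. The sample file sizes in the usage block are hypothetical.

```python
from typing import Iterable

DEDUPE_MIN_KB = 24  # files smaller than this are not deduplicated (per this guide)


def dedupe_eligible_fraction(file_sizes_kb: Iterable[float]) -> float:
    """Fraction of total bytes held in files large enough to deduplicate."""
    sizes = list(file_sizes_kb)
    total = sum(sizes)
    if total == 0:
        return 0.0
    eligible = sum(size for size in sizes if size >= DEDUPE_MIN_KB)
    return eligible / total


if __name__ == "__main__":
    # Hypothetical sample: many small files plus a few larger documents.
    sample = [4] * 900 + [200] * 100
    print(f"{dedupe_eligible_fraction(sample):.0%} of bytes are eligible")
```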
Use FMA orphan file management to minimize the consumption of secondary tier storage by archived
versions of files that have been deleted or modified on the primary tier file system.
Recommendation #7 Use manual file system extension on the primary tier file system
File system auto-extension enables the space allocated to a file system to grow automatically upon reaching
a capacity utilization high water mark. The automated storage tiering architecture is designed to offload
primary tier data to secondary tiers to avoid growing the primary tier file system. A primary tier file system
that is used with a properly tuned archiving policy does not require additional space unless the data set has
increased to the point where the sum of the space required for the active primary tier data and for the stub
files pointing to the archived secondary tier files does not fit into the allocated primary tier space.
Automatic file system extension runs the risk of accidentally extending the primary file system because of
an accumulation of inactive files on the primary tier. A properly tuned archiving policy archives these files to
the secondary tier to free up space on the primary tier, thereby avoiding the need for file system extension.
Recommendation #8 Use automatic file system extension on the secondary tier file system
You need to extend the secondary tier file system only if all eligible files have been deduplicated and more
space is required to archive additional files from the primary file system. Ensure that you do not extend the
secondary file system because of an accumulation of files that have not been deduplicated. Deduplicate all
eligible files on the secondary tier file system to free space in the secondary tier and avoid the need for file
system extension. Using aggressive deduplication policies (Recommendation #13) helps to
avoid unnecessary secondary tier file system extension.
However, because the secondary tier represents an overflow area for the primary tier, it is reasonable to use
auto-extension to let the secondary file system start small and grow as needed over time, as files age out of the primary file system and are archived to the secondary file system.
Recommendation #9 Use virtual provisioning on the secondary tier file system
In the automated storage tiering architecture, the goal is to keep primary tier file system utilization at a
constant and moderately high level so that the performance advantages of primary tier disk technologies are
maximized. In contrast, usage of the secondary tier increases over time as new data is added to the primary
tier and older data stored on the primary tier becomes inactive and is archived to the secondary tier. The
continuous growth typical of archive storage is a natural fit for using storage pools, automatic file system
extension, and virtual provisioning. Because a portion of the available secondary tier storage space is
intended for future growth needs, a collection of secondary file systems can share the same storage pool
and use virtual provisioning to effectively “share” the storage space set aside for future growth needs. As
usage increases over time, some file systems use more or less than what was originally predicted. Virtual
provisioning avoids the trap of over-allocating space for future growth to specific file systems that do not
require it.
Recommendation #10 Avoid client access to the secondary file system
Do not allow Celerra clients to directly access the secondary file system. Any modification of archived files
on the secondary tier causes the files to be inaccessible through the stub files on the primary storage tier.
Archiving and deduplication policy recommendations
This section details recommendations for archiving and deduplication.
Recommendation #11 Avoid unintended recalls to primary storage
The automated storage tiering architecture provides enough space in the primary tier for active data, but not
enough to store the entire data set. Therefore, it is important to avoid accidentally recalling data from the
secondary tier to the primary tier because these uncontrolled recalls may consume all the capacity allocated
to the file system on the primary tier. To avoid accidental read recalls triggered by content searches,
Network File System (NFS) backups, Windows offline folder synchronization or client-based anti-virus
scans, use the FileMover fs_dhsm command to set the read_policy_override option to passthrough.
$ fs_dhsm -modify fs_name -read_policy_override passthrough
where fs_name is the name of the FileMover-enabled primary file system.
Recommendation #12 Use the default FileMover offline attribute setting
If Recommendation #11 is not followed, EMC strongly recommends that you accept the default value for
the -offline_attr option. The default value enables Celerra to indicate to Common Internet File System
(CIFS) clients that a file is archived: in Windows Explorer, a small marker appears at the bottom-left corner
of the archived file’s icon. If the option is disabled and the read_policy_override option is not
set to passthrough, archived files can be recalled to primary storage unnecessarily because some
applications, such as Windows Explorer, read parts of files when a user views the file’s enclosing folder.
Recommendation #13 Use aggressive deduplication policy settings
Archived files on the secondary file system are never modified in place because modifications to archived
files occur only after the files have been recalled to the primary file system. Therefore, deduplication policy
settings on the secondary tier can treat archived files as if they are read-only and disregard the modification
time parameter built into the default deduplication policy. In addition, because typical archiving policies
include an access time parameter, newly archived files are likely not to have been accessed for some time.
Therefore, it is reasonable to disregard the access time parameter built into the default deduplication policy.
Setting both of these values to 0 causes all files on the secondary tier to be considered for deduplication
without delay. This saves capacity on the secondary tier while incurring little or no performance penalty
when archived files are accessed from the primary file system, because reading deduplicated files carries
only a very small performance penalty.
Use the following commands on the Celerra Control Station to set the policy engine for a Data Mover to
ignore the access and modification time when considering files for deduplication:
$ server_param server_2 -f dedupe -modify accessTime -value 0
$ server_param server_2 -f dedupe -modify modificationTime -value 0
where server_2 is the Data Mover name that hosts the secondary file system.
Recommendation #14 Tune archiving policies to optimize primary tier capacity
usage
One of the automated storage tiering goals is to maximize the usage of high-performance primary tier
storage and take advantage of the cost savings offered by high-capacity secondary tier storage. An
archiving policy designed to achieve this goal must keep as much data as possible on the primary tier
without allowing the primary tier to reach 100-percent capacity. If automatic file system extension is used
on the primary tier, the goal of the archiving policy must be to keep the primary tier utilization below the
auto-extension high water mark. Careful tuning of the archiving policy is required to ensure that primary
tier usage does not exceed the limits.
There are several factors that affect the capacity usage of the primary tier file system:
♦ Capacity consumed by stub files
♦ Capacity consumed by files that are too small to archive
♦ Capacity consumed by files that are too recently active to archive
Stub files generally consume 8 KB of space. In a typical file system, the number of stub files grows as new
files are added to the file system and older files are archived. This increases the amount of primary tier
capacity that is consumed by the stub files. The percentage of data set space consumed by stub files is
highly dependent on the distribution of file sizes in the data set. In general, the smaller the average file size,
the higher the percentage of primary tier space that is consumed by the stub files. Recommendation #19
on page 19 provides more information.
Likewise, if some of the new files added to the file system are below the size threshold for archiving (8 KB
by default), the capacity consumed by small files increases over time. The amount of primary tier space
consumed by files that are too small to archive is also dependent on the distribution of file sizes in the data
set and is discussed further in Recommendation #19 on page 19.
The last category of files that consumes space in the primary tier is the set of files that are too active to
archive. The administrator can control the capacity consumed by files that are too recently active to archive
through careful selection of the archiving criteria when the archiving policy is created or modified. For
example, consider a policy that archives files that match the policy rule “last_accessed > 30 days”.
Compared with that policy, a more recent cutoff date (last_accessed > 15 days) results in more files matching
the rule and therefore more files being archived, which decreases the capacity usage of the primary tier. In
contrast, a cutoff date that is further in the past (last_accessed > 60 days) results in fewer files matching the
rule and therefore fewer files being archived, which increases the capacity usage of the primary tier. Figure 2 shows
the general relationship between the archiving policy and the capacity usage of the primary tier file system.
Note that the actual effect of a specific last_accessed criterion on a particular file system depends on the
distribution of last_accessed dates in the file system and on the sizes of the files that meet the archiving
criteria.
Figure 2 Effect of archiving policy criteria on primary tier capacity usage
(Panels for last_accessed > 15, last_accessed > 30, and last_accessed > 60, each showing free space, active files, stub files, and small files.)
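The relationship shown in Figure 2 can be modeled with a short calculation. The following Python sketch is illustrative only: it assumes you already have each file's size and the number of days since it was last accessed (the FileInfo record and the sample values are hypothetical), treats every archived file as leaving an 8 KB stub, and charges 8 KB for each file below the default 8 KB archiving threshold, as the example in Recommendation #19 does.

```python
from dataclasses import dataclass

KIB = 1024                   # bytes per KB
STUB_BYTES = 8 * KIB         # assumed size of one stub file (8 KB)
MIN_ARCHIVE_BYTES = 8 * KIB  # default size threshold for archiving (8 KB)


@dataclass
class FileInfo:
    """Hypothetical record describing one file on the primary tier."""
    size_bytes: int
    days_since_access: int


def primary_usage(files, cutoff_days):
    """Estimate primary tier bytes left after applying last_accessed > cutoff_days.

    Returns a tuple (active, stubs, small), all in bytes.
    """
    active = stubs = small = 0
    for f in files:
        if f.size_bytes < MIN_ARCHIVE_BYTES:
            small += MIN_ARCHIVE_BYTES  # too small to archive; charged as 8 KB
        elif f.days_since_access > cutoff_days:
            stubs += STUB_BYTES         # archived; only the stub stays on primary
        else:
            active += f.size_bytes      # too recently active to archive
    return active, stubs, small


if __name__ == "__main__":
    sample = [
        FileInfo(100 * KIB, 10),
        FileInfo(200 * KIB, 20),
        FileInfo(300 * KIB, 40),
        FileInfo(400 * KIB, 90),
        FileInfo(4 * KIB, 5),
    ]
    for cutoff in (15, 30, 60):
        active, stubs, small = primary_usage(sample, cutoff)
        print(f"last_accessed > {cutoff}: active={active // KIB} KB, "
              f"stubs={stubs // KIB} KB, small={small // KIB} KB")
```

With the sample data, moving the cutoff from 15 to 60 days increases the active share while the stub share shrinks, which is the trend the text describes; real numbers depend entirely on the file size and access date distribution of the data set.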
Recommendation #15 Use the FMA Create Preview feature
Use the FMA Create Preview feature to measure the effects of an archiving policy without actually moving
files. Compare these results to the available capacity in the primary file system before scheduling the
archiving job. Note that generating a preview on a large data set may take a significant amount of time.
Generate previews when the FMA is not under a heavy load from other tasks.
Recommendation #16 Tune archiving policies to take advantage of data set
idiosyncrasies
If there are known non-date attributes that you can use to predict whether a file needs to be accessed or
modified, set up separate policies to archive files based on these attributes. For example, if all files in a
particular directory or with a particular file name format are known to be static records that are unlikely to
be accessed, create a policy rule that specifically matches these files. For instance, if data about cancelled
accounts is rarely accessed and is always stored in a specific directory, create an archiving rule to
automatically archive all files that reside in that directory. The “Implementing a File Tiering Strategy”
chapter in the Automated Tiered Storage with EMC Celerra and Rainfinity File Management Appliance —
Applied Best Practices document provides more information about tailoring archiving criteria for specific
data set characteristics.
Recommendation #17 Tune archiving policies to match the active data set’s rate
of change
After an archiving policy is created to optimize the amount of active data that is stored directly on the
primary tier file system, it is important to check the effects of the policy periodically to ensure that
variations in the growth rate of the active portion of the data set do not cause the policy to archive too many
or too few files to keep the primary tier capacity usage at an optimal level.
For example, previews of archiving policies run against a data set at a particular point in time may indicate
that archiving all files that have not been accessed in the past 30 days results in a comfortable margin of 40
percent free space in the primary tier file system. However, if the rate of data access increases or if an
unexpectedly large number of new files are added to the data set, it is possible that re-running the same
policy might not archive enough files to keep the primary tier capacity usage from exceeding the desired
level. In a worst-case scenario, a poorly tuned policy will not archive any files at all. As a result, the file
system may grow until it reaches 100 percent or is automatically extended.
Conversely, if no new files are added, or if access to the existing data drops off dramatically, re-running
the same policy might archive a significantly higher percentage of files, resulting in very low capacity usage
on the primary tier. In a worst-case scenario, all files are archived. Unnecessary archiving also degrades
performance when the unnecessarily archived files need to be accessed, or recalled to the primary tier file
system for modification.
It is worth noting the variety of mechanisms that can result in the growth of primary tier active data, where
a file is considered active based on the time since it was last accessed:
♦ Creating new files
♦ Modifying primary tier files where the size of the file is increased
♦ Modifying inactive but not yet archived files, regardless of whether the file size is increased
♦ Reading inactive but not yet archived files (because reading changes the last accessed date)
♦ Modifying previously archived files (Regardless of whether the size of the file is changed, archived
files are recalled upon modification. If the new size of the file is greater than the size of a stub file, the
size of the set of active files in the data set increases.)
The following types of access have no effect on the size of the set of active files:
♦ Reading existing active files
♦ Reading previously archived files (assuming read recall is set to passthrough)
♦ Deleting inactive files, whether or not they have been archived yet
Several mechanisms shrink the size of the active portion of the data set:
♦ The passage of time (active files age-out into inactive files if they are not accessed)
♦ Modifying existing active files where the size of the file is reduced
♦ Deleting an active file
Because it is difficult to quantify these factors independently, it is easier to think in terms of the total
net growth rate of active data. If this growth rate is expressed as new space consumed per day, that is, N
GB/day, an archiving policy that archives files that have not been accessed in the past M days leaves M days'
worth of active data, or N * M GB, on the primary tier. For example, an active data growth rate of 10 GB/day
and an archiving policy that archives files that have not been active in the past 30 days leaves 30 days' worth
of data (30 days * 10 GB/day), or 300 GB of active data, on the primary tier. Add this active data to the space
consumed by stub files and files that are too small to archive to calculate the total usage of the primary tier
after the archiving policy is run.
Files that are too small to archive are excluded from the growth rate of active files. Small files that are
added to the data set over time can be considered file system overhead, in the same way that stub files are.
Both consume space in the primary file system that could otherwise be used for active data. Over a long
period of time, this overhead reduces the amount of space that is available for truly active data, so archiving
policies must be tuned to shrink the time window in which a file is considered active.
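The N GB/day reasoning above reduces to two one-line formulas. The following is a minimal Python sketch, assuming a constant net growth rate of active data; the function names are illustrative and are not part of any EMC tool. The second function shows how accumulated overhead from stub files and small files shrinks the number of days of active data that fit on the primary tier.

```python
def active_data_gb(growth_gb_per_day: float, window_days: float) -> float:
    """Active data left on the primary tier by a 'not accessed in window_days' policy."""
    return growth_gb_per_day * window_days


def max_window_days(target_gb: float, growth_gb_per_day: float,
                    overhead_gb: float = 0.0) -> float:
    """Longest window whose active data, plus stub/small-file overhead, fits target_gb."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return max(0.0, (target_gb - overhead_gb) / growth_gb_per_day)


if __name__ == "__main__":
    print(active_data_gb(10, 30))        # the 10 GB/day, 30-day example
    print(max_window_days(120, 2))       # no overhead yet
    print(max_window_days(120, 2, 20))   # hypothetical 20 GB of accumulated overhead
```

Rerunning max_window_days as the overhead grows shows directly how much the archiving window has to shrink to keep the same target.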
Recommendation #18 Tune archiving policies to account for data that is not
archivable
Over a period of time, the primary file system accumulates stub files that point to previously archived files.
It also accumulates files that cannot be archived because of their small size or because they do not match
the archiving policy’s file selection criteria. This accumulation of used capacity reduces the amount of
capacity available to store currently active files.
The amount of primary tier space consumed by stub files and small files is directly related to the
distribution of file sizes in the data set.
Example 1
1 TB of storage space consumed by 1 million files
Average file size = 1 MB
Space required for stub files for the entire file system = (1 million * 8 KB per stub) = 7.6 GB
Example 2
1 TB of storage space consumed by 20 million files
Average file size = 50 KB
Space required for stub files for the entire file system = (20 million * 8 KB per stub) = 152.6 GB
For Example 1, the upper limit of the space that stub files can consume is less than 1 percent of the data set
size. The space consumed by the stub files therefore grows at less than 1 percent of the growth rate of the
entire file system and can largely be ignored for the purposes of tuning archiving policies.
In Example 2, the potential primary tier space needed for stub files is more than 75 percent of the 200 GB
allocated to the primary tier file system, in an automated storage tiering architecture that allocates 20
percent of the file system's storage on the primary tier and 80 percent on the secondary tier. Because 75
percent is already a high capacity usage rate for the primary tier, no primary tier space would remain to
comfortably store active files. If the data set in Example 2 was created and archived over an extended period
of time, the administrator who is responsible for archiving needs to adjust the archiving policy to gradually
reduce the amount of active data allowed on the primary tier, to account for the increasing amount of space
consumed by the stub files pointing to the archived data on the secondary tier. Although this practice
accommodates continued growth of the data set, the primary tier will eventually run out of space unless older
files are purged when they are no longer needed. Furthermore, unless space is freed on the primary tier by
deleting outdated files, the space consumed by the stub files eventually reduces the space available for
active files to the point where even active files must be archived. At that point, storage must be added to
the primary tier file system to continue providing high-tier performance for the active files in the data set.
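The stub arithmetic in Examples 1 and 2 can be checked with a few lines of Python. This sketch assumes 8 KB per stub, as the text does. The published figures (7.6 GB and 152.6 GB) match binary units, that is, 1 GB = 1,048,576 KB, so the sketch uses that conversion; the 200 GB primary tier size is the 20 percent allocation described above.

```python
STUB_KB = 8              # assumed size of one stub file
KB_PER_GB = 1024 * 1024  # binary units, which reproduce the figures above


def stub_space_gb(num_files: int) -> float:
    """Space consumed by stubs if every file in the data set is archived."""
    return num_files * STUB_KB / KB_PER_GB


def percent_of(part_gb: float, whole_gb: float) -> float:
    """Express part_gb as a percentage of whole_gb."""
    return 100.0 * part_gb / whole_gb


if __name__ == "__main__":
    ex1 = stub_space_gb(1_000_000)   # Example 1: 1 million files
    ex2 = stub_space_gb(20_000_000)  # Example 2: 20 million files
    print(f"Example 1: {ex1:.1f} GB, {percent_of(ex1, 1024):.2f}% of a 1 TB data set")
    print(f"Example 2: {ex2:.1f} GB, {percent_of(ex2, 200):.1f}% of a 200 GB primary tier")
```

The second line confirms that Example 2's stubs alone would fill more than three quarters of the primary tier.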
Recommendation #19 Run archiving jobs often enough to avoid overfilling the
primary tier
An optimally tuned archiving policy resets the capacity usage of the primary tier file system to the lower
end of the target usage range defined by the administrator. As new files are created and existing files are
recalled for modification, the capacity usage increases. Run scheduled archiving jobs often enough to keep
the primary file system from running out of space or exceeding the auto-extension high water mark. The
required frequency depends on the amount of free space in the primary tier file system at the lower end of
the target usage range and on the rate of growth between archiving runs.
The following example demonstrates a method of deriving archiving policies and scheduling from a data
set’s rate of change and the desired capacity usage of the primary file system. The example also shows how
to determine the impact of small files and stub files on archiving policies and schedules.
Example
Size of the primary file system: 200 GB
Growth rate of the active data: 2 GB/day consumed by 10,000 new files, including 2,000 files
that are less than 8 KB in size
Target primary utilization: 60 percent to 80 percent full
Using these parameters, it is possible to calculate an archiving policy and archiving schedule, and also to
assess how much space will be consumed by stub files and files that are too small to archive.
Archiving policy to preserve the capacity utilization target:
Lower range of target = 60% of 200 GB = 120 GB
120 GB primary tier space is available for the active data initially.
This space will be consumed by a sliding window of 60 days' worth of new data:
120 GB target / 2 GB per day growth rate = 60 days
Recommendation: Archive all files that have not been accessed in more than 60 days (last_accessed > 60 days)
Archiving scheduling frequency:
Calculate the difference between the upper and lower bounds of target usage
= (80% * 200 GB) – (60% * 200 GB)
= 160 GB – 120 GB
= 40 GB
Calculate the number of days at the projected growth rate that will consume this space
= 40 GB / 2 GB per day = 20 days
Recommendation: Run an archiving scan at least once every 20 days
Growth rate of capacity usage overhead that cannot be reduced by archiving:
Calculate the stub file growth rate
= 8 KB per stub * 8,000 files per day growth rate for files large enough to archive
(10,000 new files minus 2,000 files that are less than 8 KB)
= 62.5 MB/day
Calculate growth rate for files that are too small to archive
= 8 KB per file * 2,000 files per day growth rate for files less than 8 KB in size
= 15.6 MB/day
Add these together to get the primary file system space consumption that cannot be archived
= 62.5 MB/day stub file growth rate + 15.6 MB/day small file growth rate
= 78.1 MB/day
Primary file system usage that cannot be reclaimed by archiving grows by 78.1 MB/day.
After a year, the space that cannot be reclaimed by archiving grows to approximately 28 GB, or about 14
percent of the 200 GB primary file system. This overhead is not negligible: over time it erodes the 40 GB
band between the lower and upper usage targets, so the archiving policy must be retuned periodically to
shorten the time window in which a file is considered active, as described in Recommendation #17 on page
17. Note that this conclusion depends on the distribution of file sizes in the data set. The example uses an
average file size of approximately 200 KB (2 GB per day / 10,000 files per day). Significantly smaller files
result in a much higher stub file growth rate as a percentage of the overall data growth rate.
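The whole calculation can be wrapped in one function so that it is easy to rerun when the growth rate or the target range changes. This is a minimal Python sketch; the function and field names are hypothetical. Like the example, it counts one 8 KB stub for each new file that is large enough to archive, charges 8 KB (an upper bound) for each new file below the 8 KB threshold, and assumes a constant daily growth rate.

```python
from dataclasses import dataclass

STUB_KB = 8        # assumed stub size
SMALL_FILE_KB = 8  # upper bound charged for each file below the 8 KB threshold
KB_PER_MB = 1024


@dataclass
class ArchivingPlan:
    archive_after_days: float     # use last_accessed > this value
    max_days_between_runs: float  # run archiving at least this often
    overhead_mb_per_day: float    # primary usage that archiving cannot reclaim


def plan_archiving(primary_gb: float, low_pct: float, high_pct: float,
                   growth_gb_per_day: float, new_files_per_day: int,
                   small_files_per_day: int) -> ArchivingPlan:
    """Derive an archiving window, run interval, and daily overhead."""
    low_gb = primary_gb * low_pct / 100
    high_gb = primary_gb * high_pct / 100
    archivable_files = new_files_per_day - small_files_per_day
    overhead_kb = (STUB_KB * archivable_files
                   + SMALL_FILE_KB * small_files_per_day)
    return ArchivingPlan(
        archive_after_days=low_gb / growth_gb_per_day,
        max_days_between_runs=(high_gb - low_gb) / growth_gb_per_day,
        overhead_mb_per_day=overhead_kb / KB_PER_MB,
    )


if __name__ == "__main__":
    plan = plan_archiving(primary_gb=200, low_pct=60, high_pct=80,
                          growth_gb_per_day=2, new_files_per_day=10_000,
                          small_files_per_day=2_000)
    print(plan)
```

With the example's inputs, the first two fields reproduce the 60-day archiving window and the 20-day maximum interval between archiving runs derived above.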
Chapter 3
Migration Strategies
This chapter presents these topics:
Single-tier migrations.................................................................................................................................... 22
Recommendation #20 Use the multiple-tier migration technique whenever possible ................................. 22
Recommendation #21 Use the single-tier migration technique for NFS data .............................................. 22
Recommendation #22 Use the single-tier migration technique if extra space is not available .................... 22
Recommendation #23 Use FMA to preview policies when migrating from Celerra or NetApp ................. 22
Recommendation #24 If file sizes and ages are unknown, archive all files during migration ..................... 23
Recommendation #25 Migrate only a subset of the data set at a time ......................................................... 23
Recommendation #26 After the migration, temporarily set the read recall policy to full recall .................. 23
Multiple-tier migrations ................................................................................................................................ 24
Recommendation #27 Use the single-tier migration technique for NFS data .............................................. 24
Recommendation #28 Correctly size the temporary file system.................................................................. 25
Recommendation #29 Configure FMA and FileMover on the temporary file system and the target
primary file system........................................................................................................................................ 25
Recommendation #30 Use the temporary file system to tune archiving policies......................................... 25
Recommendation #31 Set the CIFS backup option on the temporary file system to offline........................ 25
Single-tier migrations
The following recommendations are applicable only to single-tier migrations. Figure 3 shows the single-tier
migration technique. It involves migrating data directly to the primary tier file system and actively
archiving inactive files from the primary tier to the secondary tier during the migration process.
Figure 3 Single-tier migration technique with active archiving
(Migration source: source file system. Migration destination (Celerra): files are migrated to the primary tier and archived to the secondary tier during migration.)
Recommendation #20 Use the multiple-tier migration technique whenever
possible
If extra storage space is available and the migrated data uses the CIFS protocol, use the multiple-tier
migration technique described in “Multiple-tier migrations” on page 24. The multiple-tier migration
technique is enabled by stub-aware copy utilities, which EMC supports only for the CIFS protocol.
Recommendation #21 Use the single-tier migration technique for NFS data
Because EMC does not currently support a stub-aware copy mechanism for NFS, use the single-tier
migration method to copy data into the automated storage tiering architecture if the data set consists of NFS
data.
Recommendation #22 Use the single-tier migration technique if extra space is
not available
Because the multiple-tier migration technique requires an additional temporary file system that is large
enough to store the entire data set, use the single-tier technique if the storage system does not have
sufficient extra space to allocate to the temporary file system for the duration of the migration.
Recommendation #23 Use FMA to preview policies when migrating from Celerra
or NetApp
If the data to be migrated to the automated storage tiering architecture is currently residing on a Celerra or
NetApp system, use the FMA Create Preview feature to establish an initial archiving policy before the data
is migrated. If the data set is currently active, use the preview feature to determine the file-age selection
criteria that will archive enough data to the secondary storage so that the active data remaining on the
primary storage meets the targeted lower range of the desired primary tier capacity usage.
For example, suppose a 1 TB data set is being migrated to an automated storage tiering architecture
consisting of a 200 GB primary tier file system with a target capacity usage of 60 percent to 80 percent of
primary tier space. Use the FMA preview feature to discover the policy criteria that will leave the 120 GB of
most active data on the primary tier (120 GB = 60 percent of the 200 GB of total primary tier capacity).
Remember to make an allowance for space consumed by stub files (8 KB for each archived file) and files that
are too small to archive (less than 8 KB in size), as described in Recommendation #18 on page 18. After an
appropriate policy is discovered, use it during the migration process to offload inactive files from the
primary tier file system.
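The search for a suitable policy amounts to picking the longest last_accessed cutoff whose remaining data, plus the allowance for stub files and small files, still fits the lower end of the target range. The Python sketch below assumes you have run previews for several candidate cutoffs and converted each result into the amount of data that would remain on the primary tier; the preview numbers and the 8 GB allowance are hypothetical, and the function is not an FMA interface.

```python
from typing import Mapping, Optional


def choose_cutoff(remaining_gb_by_cutoff: Mapping[int, float],
                  target_gb: float, overhead_gb: float) -> Optional[int]:
    """Return the longest last_accessed cutoff (in days) that fits the target.

    remaining_gb_by_cutoff maps a candidate cutoff to the GB of data that a
    preview indicates would stay on the primary tier with that cutoff.
    Returns None if no candidate fits.
    """
    fitting = [days for days, remaining in remaining_gb_by_cutoff.items()
               if remaining + overhead_gb <= target_gb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    previews = {15: 70.0, 30: 105.0, 45: 118.0, 60: 140.0}  # hypothetical
    print(choose_cutoff(previews, target_gb=120.0, overhead_gb=8.0))
```

A longer cutoff keeps more recently used data on the primary tier, so the largest fitting value is preferred. If the result is None, even the shortest previewed cutoff leaves too much data, and a shorter cutoff should be previewed.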
Recommendation #24 If file sizes and ages are unknown, archive all files during
migration
If the data set is migrated from a Celerra or NetApp system, use FMA to derive an initial archiving policy
during the migration as described in Recommendation #23. If the data is migrated from another platform, it
may not be possible to decide on an archiving policy before the migration because of a lack of information
about the file size and age distributions in the data set. Therefore, establish a working archiving policy after
the migration, and use a special archiving policy during the migration process to offload data from the
primary tier. A suitable archiving policy that can be used during the migration is one that archives all files,
so as to allow the entire data set to be migrated to a primary tier file system that is smaller than the size of
the data set. Before starting the migration, create a policy that selects all files for archiving, such as a policy
with the criteria last_accessed > 0.
Recommendation #25 Migrate only a subset of the data set at a time
During the migration, FMA may not be able to archive files off the primary tier file system at a sufficient
rate to keep up with the incoming data from the migration process. To avoid accidentally filling the primary
file system to 100 percent, divide the data set into smaller portions and migrate them serially. Divide data
sets into smaller portions by migrating sets of directories at a time and by using Windows and Linux tools
to assess the size of the directories targeted for each migration session. After each subset of the data set is
migrated, manually execute FMA archiving to offload newly migrated data to the secondary tier file
system. Allow FMA to complete the scanning and archiving process between the migration sessions.
Ensure that each migrated portion is small enough to fit completely into the free space remaining in the
primary tier file system after the previous migration/archiving operation. When using the “archive
everything” approach described in Recommendation #24, most of the space in the primary file system is
freed up after each archiving session. However, if more selective archiving criteria are used as suggested in
Recommendation #23 on page 22, each migration/archiving session leaves less free primary tier file system
space for migrating the next portion of the data set. Therefore, pay careful attention
to the available primary tier space and to the size of the migrated portion of the data set to avoid filling the
primary tier file system to 100 percent. If the primary tier file system is configured to auto-extend, ensure
that it does not exceed the high water mark threshold for auto-extension during the migration process.
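Splitting the data set into sessions can be planned ahead of time from directory sizes gathered with Windows or Linux tools. The following Python sketch is a simple greedy grouping under stated assumptions: it keeps directories in the order given, assumes the “archive everything” approach of Recommendation #24 so that roughly the same free space is available before each session, and reserves a safety margin so that the primary tier stays below its auto-extension high water mark. The directory names, sizes, and margin are hypothetical.

```python
from typing import List, Sequence, Tuple


def plan_sessions(dir_sizes_gb: Sequence[Tuple[str, float]],
                  free_gb: float, margin_gb: float = 0.0) -> List[List[str]]:
    """Group directories into migration sessions that each fit in free_gb - margin_gb."""
    capacity = free_gb - margin_gb
    sessions: List[List[str]] = []
    current: List[str] = []
    used = 0.0
    for name, size in dir_sizes_gb:
        if size > capacity:
            raise ValueError(f"{name} ({size} GB) does not fit in one session; split it")
        if used + size > capacity:
            # Close this session; migrate and archive it before the next one starts.
            sessions.append(current)
            current, used = [], 0.0
        current.append(name)
        used += size
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    dirs = [("projects", 50.0), ("home", 60.0), ("scans", 30.0), ("archive", 100.0)]
    for i, names in enumerate(plan_sessions(dirs, free_gb=150.0, margin_gb=20.0), 1):
        print(f"session {i}: {names}")
```

With selective archiving (Recommendation #23), recompute free_gb after each archiving run instead of reusing one value, because each session leaves less free space than the one before.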
Recommendation #26 After the migration, temporarily set the read recall policy
to full recall
If a tuned archiving policy is used during the migration (Recommendation #23 on page 22), the end result
of the migration is a primary tier file system filled with active data at the desired target capacity usage
level. However, if the archive everything approach of Recommendation #24 is used, after the migration is
complete, the primary tier file system contains only stub files and files that are too small to archive. You
can address this underutilization of the primary tier by doing nothing and allowing the primary tier file
system usage to increase gradually as new data is created. However, this approach exposes users to a
temporary performance penalty when they access recent and presumably active files that were archived to
the secondary tier during the migration process. You can minimize this performance penalty by allowing
Celerra to automatically recall archived files back to the primary tier file system whenever they are
accessed for the first time after the migration. To allow the recall of archived files on read access, use the
FileMover fs_dhsm command to set the read_policy_override option to full.
$ fs_dhsm -modify fs_name -read_policy_override full
where fs_name is the name of the FileMover-enabled primary file system.
Closely monitor the capacity usage on the primary tier file system when using this setting. When the
primary tier capacity usage reaches the desired target usage, disable read recalls as described in
Recommendation #11 on page 14. Note that when the read recall policy setting is full, it is possible that file
system scanning applications like content searches, NFS backups, Windows offline folder synchronization,
and client-based anti-virus scans may accidentally trigger recalls of all archived data to the primary file
system. Because this may lead to accidental file system auto-extension or failed client writes, limit
exposure to this risk by minimizing the time period during which the read recall policy is set to full. Use
the default offline attribute settings described in Recommendation #12 on page 14 to further
limit the risk of accidental recalls.
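The monitor-then-switch procedure above can be expressed as a small decision helper. This Python sketch only builds the fs_dhsm command line shown in this section and in Recommendation #11; it assumes you obtain the primary tier usage figures separately, and the function name and the file system name fs_primary are hypothetical. Running the command on the Celerra Control Station remains a manual step.

```python
def read_policy_command(fs_name: str, used_gb: float, size_gb: float,
                        target_pct: float) -> str:
    """Return the fs_dhsm command that sets the appropriate read recall policy.

    While usage is below the target, keep full recall so that active files
    return to the primary tier; once the target is reached, switch back to
    passthrough to stop further read recalls.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    usage_pct = 100.0 * used_gb / size_gb
    policy = "passthrough" if usage_pct >= target_pct else "full"
    return f"fs_dhsm -modify {fs_name} -read_policy_override {policy}"


if __name__ == "__main__":
    print(read_policy_command("fs_primary", used_gb=90, size_gb=200, target_pct=60))
    print(read_policy_command("fs_primary", used_gb=125, size_gb=200, target_pct=60))
```

Checking usage frequently matters more than the helper itself: a single scanning application can recall large amounts of data between checks while the policy is set to full.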
Multiple-tier migrations
The following recommendations are applicable only to multiple-tier migrations. The multiple-tier migration
technique uses three file systems: the target primary tier file system, the target secondary tier file system,
and a temporary file system. The migration involves copying the entire data set to a temporary file system,
and then using FMA to archive inactive files from the temporary file system to the target secondary tier file
system. After archiving is complete, the remaining files and stub files on the temporary file system are
copied to the target primary tier file system by using a stub-aware migration tool such as the EMCopy
utility. After the stub-aware migration is complete, the temporary file system can be deleted and its space
freed for other uses. Figure 4 illustrates the multiple-tier migration technique.
Figure 4 Multiple-tier migration technique by using a temporary file system
(Migration source: source file system. Migration destination (Celerra): 1) migrate files to the temporary file system; 2) archive files to the secondary tier file system; 3) stub-aware copy to the primary tier.)
Recommendation #27 Use the single-tier migration technique for NFS data
The multiple-tier migration technique requires a stub-aware copy mechanism. Because EMC does not
currently support a stub-aware copy mechanism for NFS, use the single-tier migration method to copy NFS
data into the automated storage tiering architecture. This recommendation is a repeat of Recommendation
#21 on page 22.
Recommendation #28 Correctly size the temporary file system
Ensure that the temporary file system is large enough to contain the entire data set. Because it is only used
for staging, it can be provisioned from any tier of storage. Size the target secondary tier file system as
described in Recommendation #5 on page 12.
Recommendation #29 Configure FMA and FileMover on the temporary file
system and the target primary file system
Ensure that FileMover is enabled to point to the same target secondary tier file system for archiving on both
the temporary file system and the target primary tier file system. Similarly, configure FMA such that both
primary file systems archive to the same secondary file system. Before doing the stub-aware copy, enable FileMover
on the target primary tier file system, and ensure that the connections to the target secondary file system
match the connections defined on the temporary file system.
Recommendation #30 Use the temporary file system to tune archiving policies
The temporary file system is used as a staging location to divide the data set into active and inactive
portions that will be stored on the target primary and secondary file systems, respectively. Use the FMA
preview feature (Recommendation #15 on page 16) to select archiving criteria that ensure that the size of
the data remaining in the temporary file system after archiving equals the desired target capacity usage
of the primary tier file system. After tuning the archiving policy, run it against the data in the temporary file
system and verify that the remaining data in the temporary file system fits into the primary tier file system.
Recommendation #31 Set the CIFS backup option on the temporary file system
to offline
Use a stub-aware copy mechanism to copy the data that remains on the temporary file system after
archiving to the target primary tier file system. The EMCopy utility performs a stub-aware copy as long as
the FileMover CIFS backup option on the temporary file system is set to offline. However, it is unnecessary
to set this option on the target primary tier file system. This parameter is set by executing the following CLI
command on the Celerra Control Station:
$ fs_dhsm -modify fs_name -backup offline
where fs_name is the name of the temporary file system.
Note: Failure to set this parameter results in EMCopy using read recalls to migrate archived data from the
target secondary tier file system to the target primary tier file system. Consequently, the primary tier file
system may reach 100-percent capacity or extend automatically.
Chapter 4
Data Protection Strategies
This chapter presents these topics:
Backup recommendations...................................................................................................................... 28
Recommendation #32 If the backup window allows, back up the data set as a single file system ....... 28
Recommendation #33 To reduce the backup window, back up primary and secondary file systems
separately .......................................................................................................................................... 28
Recommendation #34 When using the separate backup technique, back up the secondary file
system after FMA operations................................................................................................................. 29
Restore recommendations...................................................................................................................... 29
Recommendation #35 Use single-tier migration techniques when performing the full restore of
backups containing both primary and secondary tier files ..................................................................... 29
Recommendation #36 Preserve stub file synchronization when restoring separate backups of
primary and secondary tier data ............................................................................................................. 30
SnapSure recommendations................................................................................................................... 30
Recommendation #37 Take separate snapshots of primary and secondary file systems ...................... 30
Recommendation #38 Take snapshots of secondary file systems only after FMA activities................ 30
Recommendation #39 Preserve stub file synchronization when restoring snapshots of primary or
secondary tier file systems ..................................................................................................................... 31
Recommendation #40 Do not keep primary tier snapshots for more than 30 days............................... 31
Recommendation #41 Minimize the number and age of secondary tier snapshots............................... 31
Recommendation #42 Minimize the number and age of primary tier snapshots ................................. 31
Recommendation #43 Store primary tier file system snapshots on secondary tier storage................... 31
Replication recommendations................................................................................................................ 32
Recommendation #44 Replicate both primary and secondary file systems .......................................... 32
Recommendation #45 Configure source and destination FMA devices ............................................... 32
Recommendation #46 Populate and deduplicate the source site secondary tier file system before
configuring replication........................................................................................................................... 32
Recommendation #47 Use the same CIFS domain on replicated sites ................................................. 32
Recommendation #48 Use local host files instead of DNS entries to resolve secondary tier
hostnames when using replication ......................................................................................................... 32
Backup recommendations
Data sets that span multiple storage tiers present special requirements for backups. There are two
general approaches that can be taken:
♦ Back up the data set as a single file system
♦ Back up the data set as separate file systems (one backup for each storage tier)
Recommendation #32 If the backup window allows, back up the data set as a
single file system
Backing up the data set as a single file system avoids the need for synchronized backups and restores
of the primary and secondary portions of the data set. It is therefore the recommended method for backing up
tiered data sets. With this technique, the backup software backs up the primary tier file system in a
manner that also includes the contents of the archived files stored in the secondary tier file system.
This is enabled by automatic passthrough reads, which read the archived files from the secondary tier
during backup processing without relocating them to the primary tier file system.
The technique to enable passthrough reads during backup processing is dependent on the backup
method.
Network Data Management Protocol (NDMP): For NDMP backups, set the NDMP environment
variable in the backup job to configure the behavior of the backup software to include the contents of
archived files from the secondary tier file system in backups of the primary tier file system:
EMC_OFFLINE=y
Note that the default setting is EMC_OFFLINE=n, which means that only the primary storage tier is
backed up. NDMP supports integrated snapshots that automate checkpoint creation, management, and
deletion activities if the SNAPSURE=y environment variable is configured for qualified
vendor-backup software.
CIFS: When the Celerra FileMover -backup option is set to passthrough (the default setting), CIFS
backups of the primary tier file system include archived file data from the secondary tier. CIFS
backups of the primary tier file system read file content from the secondary storage tier in passthrough
mode, which does not relocate the data to the primary tier storage.
NFS: NFS backups of the primary file system always include associated secondary tier file content. Set
the Celerra FileMover -read_policy_override option to passthrough (as described in
Recommendation #11 on page 14) to ensure that NFS backup operations do not accidentally relocate
secondary tier file content back to the primary storage tier.
Recommendation #33 To reduce the backup window, back up primary and
secondary file systems separately
It is possible to reduce the time to back up the primary tier file system by taking separate backups of
the primary tier and secondary tier file systems that comprise the data set. Configure the backups of the
primary file system to be stub-aware to avoid backing up file content that has been archived to the
secondary file system. This reduces the size of the primary file system backup and therefore reduces
the time it takes to perform the backup.
Taking separate data backups on primary and secondary tiers requires the ability to do stub-aware
backups of the primary file system. The technique to enable stub-aware backup processing is
dependent on the backup method.
NDMP: For NDMP backups, set the backup software NDMP environment variable in the backup job
definition to avoid including secondary tier content in the primary file system backup.
EMC_OFFLINE=n
Note that this is the default setting.
NDMP supports integrated snapshots that automate checkpoint creation, management, and deletion
activities if the SNAPSURE=y environment variable is configured for qualified vendor-backup
software.
NDMP Volume Backup (NVB): Celerra supports an EMC-specific type of NDMP backup
mechanism called NVB or Volume Based Backup (VBB). Celerra NVB backs up data blocks at a
volume level rather than at a file level, and it reads disk data blocks more efficiently than traditional
file-based backups do. NVB does not support the NDMP environment variable EMC_OFFLINE_DATA, so when
you use NVB you must back up the contents of migrated and offline files independently.
In addition to providing faster backups, NVB retains the benefits of deduplication on secondary tier
file systems because NVB operates at the block level. Therefore, it does not reinflate deduplicated files
during backup operations. Note that NVB does not provide for single file restore or file-by-file restore.
CIFS: Set the Celerra FileMover -backup option to offline to back up only stub files and files that
have not been archived when backing up primary tier file systems. Execute the following command at
the Celerra CLI prompt to set this option:
$ fs_dhsm -modify fs_name -backup offline
where fs_name is the name of the primary tier file system.
NFS: It is not possible to configure NFS backups of the primary file system to be stub-aware. NFS
backups of the primary tier file system include all associated secondary tier file content, as described
in Recommendation #32 on page 28.
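The per-method settings in Recommendations #32 and #33 can be summarized in a small lookup. The shell function below is a hypothetical sketch: it only prints the setting to apply and does not run anything. The NDMP variable name and the -backup offline form are copied from this guide. The fs_dhsm -backup passthrough form is an assumption that mirrors the documented -backup offline syntax, so confirm it against the fs_dhsm documentation before you use it.

```shell
#!/bin/sh
# Hypothetical helper: print the setting that makes a backup of a primary
# tier file system either include archived content ("single", Rec #32) or
# skip it ("separate", Rec #33). Nothing is executed.
backup_setting() {
    strategy=$1   # single | separate
    method=$2     # ndmp | nvb | cifs | nfs
    fs=$3         # primary tier file system name
    case "$strategy:$method" in
        single:ndmp)   echo "EMC_OFFLINE=y" ;;
        # Assumed syntax, mirroring "-backup offline"; passthrough is the default.
        single:cifs)   echo "fs_dhsm -modify $fs -backup passthrough" ;;
        # NFS backups always read archived content; passthrough prevents recalls.
        single:nfs)    echo "fs_dhsm -modify $fs -read_policy_override passthrough" ;;
        separate:ndmp) echo "EMC_OFFLINE=n" ;;
        separate:cifs) echo "fs_dhsm -modify $fs -backup offline" ;;
        # NVB cannot use the offline-data variable; protect archived data separately.
        separate:nvb)  echo "back up the secondary tier file system separately" ;;
        separate:nfs)  echo "unsupported: NFS backups cannot be stub-aware" ;;
        *)             echo "no setting defined for $strategy/$method" >&2
                       return 1 ;;
    esac
}
```

For example, `backup_setting separate cifs pfs01` prints `fs_dhsm -modify pfs01 -backup offline`. Here, pfs01 is a hypothetical file system name.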
Recommendation #34 When using the separate backup technique, back up
the secondary file system after FMA operations
A best practice is to back up the secondary file system after every FMA policy engine scan or orphan
file cleanup operation. This is because the secondary tier file system is only updated by FMA, and
these updates only occur during archiving or orphan file cleanup. Therefore, backups of the secondary
file system need to occur only after an archiving or orphan file cleanup process has run.
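A backup scheduler can apply this rule by comparing timestamps. The following is a minimal sketch. It assumes that you record, in epoch seconds, when the last archiving job or orphan file cleanup finished and when the last secondary tier backup started. The function name and the way the timestamps are collected are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: a secondary tier backup is needed only when FMA has
# changed the secondary file system since the last backup started.
needs_secondary_backup() {
    last_fma_end=$1       # epoch seconds: last archiving job or orphan cleanup finished
    last_backup_start=$2  # epoch seconds: last secondary tier backup started
    if [ "$last_fma_end" -gt "$last_backup_start" ]; then
        echo yes
    else
        echo no
    fi
}
```

If FMA activity finished after the last backup started, that backup may have missed the changes, so the function answers yes.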
Restore recommendations
This section details the restore recommendations.
Recommendation #35 Use single-tier migration techniques when performing
the full restore of backups containing both primary and secondary tier files
A full restore of primary file system backups that contain both primary and secondary file system
content (that is, backups created with the read passthrough option) writes the complete contents of both
tiers into the primary tier file system. The content that was originally on the secondary tier can be
rearchived to the secondary tier by subsequent passes of the FMA policy engine scan. Because most
tiered data sets are larger than the primary tier file system, performing a full restore of data sets that
were backed up as a single file system (using the techniques in Recommendation #32 on page 28)
requires the use of single-tier migration techniques. Use the technique described in Recommendation
#25 on page 23 to avoid overfilling the primary tier file system when restoring backups that are larger
than the primary tier file system.
Recommendation #36 Preserve stub file synchronization when restoring
separate backups of primary and secondary tier data
The disadvantage of the separate backup strategy described in Recommendation #33 on page 28 is that
it requires special considerations during the restore operations to ensure that the stub files in the
restored primary file system contain valid links to associated file content on the secondary file system.
When restoring a separate backup to either the primary or secondary tier, follow these guidelines:
♦ If the primary backup is more than 30 days old and orphan file cleanup procedures are regularly
used, restore the secondary tier from a backup taken more recently than the primary backup but
within 30 days of the primary backup.
♦ Restoring a backup of the secondary file system requires restoring a backed-up copy of the
primary file system that is older than the secondary backup.
♦ Single file or directory-level restores from primary file system backups that are more than 30 days
old may require administrative action to determine if it is necessary to recover older versions of
archived files from secondary file system backups.
Note: When restoring a secondary file system that is shared by several primary file systems, it may
be necessary to restore all primary file systems to the same recovery point. This is because all stub
files on all primary file systems must have corresponding files on
the shared secondary. As a result, you may need to revert all primary file systems to the earlier
versions.
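The first two guidelines can be expressed as a pre-restore check on backup dates. This is a hypothetical sketch. Days are passed as integer day numbers (for example, days since an arbitrary reference date). The check does not cover single-file restores or shared secondaries, which still need the administrative review described above.

```shell
#!/bin/sh
# Hypothetical pre-restore check for a pair of separate backups.
# Days are integer day numbers; "cleanup" is "yes" when orphan file
# cleanup runs regularly on the secondary tier.
check_restore_pair() {
    primary_day=$1    # day the primary tier backup was taken
    secondary_day=$2  # day the secondary tier backup was taken
    today=$3          # current day
    cleanup=$4        # yes | no
    # The primary copy must be older than the secondary copy it is paired with.
    if [ "$primary_day" -ge "$secondary_day" ]; then
        echo "reject: primary backup must be older than the secondary backup"
        return 1
    fi
    # With regular orphan cleanup, a primary backup older than 30 days needs
    # a secondary backup taken within 30 days of it.
    if [ $((today - primary_day)) -gt 30 ] && [ "$cleanup" = yes ] &&
       [ $((secondary_day - primary_day)) -gt 30 ]; then
        echo "reject: secondary backup must be within 30 days of the primary backup"
        return 1
    fi
    echo ok
}
```

For example, `check_restore_pair 100 110 150 yes` prints `ok`: the primary backup is 50 days old, but the secondary backup was taken 10 days after it.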
SnapSure recommendations
This section details SnapSure™ recommendations.
Recommendation #37 Take separate snapshots of primary and secondary file
systems
For tiered data sets, take snapshots separately on the primary and secondary file systems. When
snapshots are used to back up consistent point-in-time images of a live file system, delete each
snapshot after the backup operation completes so that Celerra can release storage space during future
archiving and deduplication operations.
Recommendation #38 Take snapshots of secondary file systems only after
FMA activities
Because the policy engine is the only process that modifies files on the secondary file system, it is
sufficient to take snapshots of the secondary file system only after FMA activities, such as archiving
jobs or orphan file cleanups, have completed. This is similar to Recommendation #34 on page 29.
Recommendation #39 Preserve stub file synchronization when restoring
snapshots of primary or secondary tier file systems
Because snapshots are essentially separate backups of the primary and secondary file systems, ensure
that stub files on the primary file system remain synchronized with the archived data files on the
secondary file system when restoring snapshots of either file system.
When restoring entire file systems from snapshots, follow the guidelines listed in Recommendation
#36 on page 30 for restoring separate primary and secondary backups.
Recommendation #40 Do not keep primary tier snapshots for more than 30
days
User-initiated restores of individual files or directories from primary file system snapshots through
the “Microsoft Previous Versions” tab may leave stub files on the primary file system that do not match
the corresponding file versions on the secondary file system. This situation is more likely to occur with snapshots that are
older than 30 days and can be generally avoided by maintaining snapshots for shorter periods of time.
Recommendation #41 Minimize the number and age of secondary tier
snapshots
Snapshots reduce the effectiveness of deduplication because the space saved in the secondary file
system during deduplication is added to the previous snapshots’ SavVol storage usage until the
snapshot is deleted.
To realize the full-capacity saving benefits of the automated storage tiering architecture, keep
snapshots only for short-term use. You can automate this through the Celerra Manager checkpoint
scheduling feature. It allows an administrator to configure the system to automatically take snapshots
at a specific time of day, on specific days of the week or month, and to define a specific number of
snapshots to keep. For example, snapshots scheduled to be taken once a day can be configured to keep
only seven copies, ensuring that no snapshot on the system is ever more than a week
old.
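The Celerra Manager checkpoint scheduler applies the retention count itself. The sketch below only illustrates the keep-the-newest-N rule described above, to help you reason about a retention count. The snapshot names are hypothetical and are assumed to be passed oldest first.

```shell
#!/bin/sh
# Hypothetical illustration of a "keep N snapshots" rule: given snapshot
# names ordered oldest first, print the ones that fall outside the newest N.
snapshots_to_delete() {
    keep=$1
    shift
    excess=$(($# - keep))
    i=0
    for snap in "$@"; do
        i=$((i + 1))
        [ "$i" -le "$excess" ] && echo "$snap"
    done
    return 0
}
```

For example, with daily snapshots and a retention count of seven, `snapshots_to_delete 7 d1 d2 d3 d4 d5 d6 d7 d8 d9` lists `d1` and `d2`. Only the newest week of snapshots remains.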
Recommendation #42 Minimize the number and age of primary tier
snapshots
Snapshots reduce the effectiveness of tiering because space saved in the primary tier file system during
an archiving operation is added to the previously created snapshots’ SavVol storage usage until the
snapshot is deleted. To realize the full-capacity saving benefits of the automated storage tiering
architecture, keep snapshots only for short-term use by configuring the snapshot scheduling as
described in Recommendation #41.
Recommendation #43 Store primary tier file system snapshots on secondary
tier storage
The goal of the automated storage tiering architecture is to limit the use of primary tier storage to
active data. Snapshots reduce the effectiveness of tiering because space saved in the primary tier file
system during an archiving operation is added to the previously created snapshots’ SavVol storage
usage until the snapshot is deleted. For this reason, store snapshots of primary tier file systems on the
secondary tier to preserve primary tier capacity. However, if high IOPS write performance to files on
primary storage is considered more important than conserving primary storage tier capacity, store the
snapshots on the primary tier storage. The read IOPS performance is not affected by the snapshots.
Replication recommendations
This section details the replication recommendations.
Recommendation #44 Replicate both primary and secondary file systems
Although it is possible to replicate only the primary file system in a tiered storage architecture and
have the stubs on the replicated copy of the primary file system point back to the non-replicated
secondary file system at the source site, this is not recommended for the automated storage tiering
architecture. Because the automated storage tiering architecture stores a tiered data set’s content on a
single Celerra system, better protection is realized by replicating both primary and secondary tier file
systems in separate replication sessions so that both primary and secondary tier content is fully
protected against site disasters that affect the source Celerra.
Recommendation #45 Configure source and destination FMA devices
Although only a single FMA device is required (at the replication source site) for basic disaster
protection, EMC recommends providing a second FMA device (at the replication destination site) to
allow FMA policy engine scans to continue during a source site outage. During an extended source site
outage, continued data set growth might cause the destination site’s primary tier file system to fill to
capacity if there is no FMA device available to perform archiving at the destination site.
Recommendation #46 Populate and deduplicate the source site secondary
tier file system before configuring replication
Deduplicating the contents of a file system before it is replicated can greatly reduce the amount of data
that has to be sent over the network as part of the initial baseline copy process. Therefore, do not
configure replication until after the initial migration of the data set to the source site and after
completing a deduplication scan on the archived data in the source site secondary tier file system.
Once replication and deduplication are both running, most newly archived files are replicated
before they are deduplicated. For this reason, deduplication has little effect on the steady-state quantity
of data replicated. However, deduplication saves the same amount of space in the replicated file system
at the source and destination sites, thereby reducing the overall secondary tier disk usage at both sites.
Recommendation #47 Use the same CIFS domain on replicated sites
If you are using CIFS connections on the primary storage, the CIFS server on the Replicator destination
(secondary) side must be in the same domain as the CIFS server on the source primary Data Mover. Otherwise, the
CIFS connections on the replicated file system do not work and any attempts to access the stub files on
the replicated file system result in I/O errors.
Recommendation #48 Use local host files instead of DNS entries to resolve
secondary tier hostnames when using replication
Use local host files on Data Movers for FileMover secondary server host name resolution rather than
DNS. On each Data Mover that hosts a FileMover primary file system, use the local host file for name
resolution of the secondary file system’s hostname. On the production site, associate the address for
the secondary server at the production site with the secondary server hostname entry in the local host
file. Conversely, at the replication destination site, associate the address for the secondary server at the
replication destination site with the secondary server hostname entry in the local host file.
This causes the primary Data Mover at the production site to use the secondary server at the production
site to retrieve archived files. It also causes the primary Data Mover at the replication destination site
to use the secondary server at the replication destination site to retrieve archived files. In both cases,
archived file retrieval is accomplished locally without the need to traverse the WAN.
You can implement an additional level of disaster protection by listing the IP address of the secondary
server at the remote site as a second IP address in the local host file entry for the secondary server
hostname. This way, each site uses a local secondary server unless the local secondary server is not
available. If the local secondary server is not available, the remote secondary server will automatically
be used.
Use the following Celerra CLI command to retrieve a Data Mover’s local host file:
$ server_file server_2 -get /.etc/hosts hosts
where server_2 is the name of the Data Mover and hosts is the name of the local file on the Control
Station that contains the contents of the Data Mover file /.etc/hosts after the command is executed.
Next, edit the retrieved file named hosts on the Control Station, and move the edited file back to the
Data Mover to replace the contents of the Data Mover’s local /.etc/hosts file:
$ server_file server_2 -put hosts /.etc/hosts
where server_2 is the name of the Data Mover and hosts is the name of the local file on the Control
Station that replaces the contents of the /.etc/hosts file on the Data Mover after the command is
executed.
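You can script the editing step between server_file -get and server_file -put on the Control Station. This is a hypothetical sketch with several assumptions. The retrieved file is assumed to use the conventional "address hostname" line format. The "second IP address" for the secondary server hostname is represented as a second line for the same hostname, and the Data Mover is assumed to try entries in file order. Verify both assumptions for your release. The hostnames and addresses in the usage example are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the secondary server entry in a local copy of
# the Data Mover hosts file (retrieved with server_file -get). The local
# secondary server is listed first and the remote one second.
set_secondary_entry() {
    file=$1       # local copy of /.etc/hosts on the Control Station
    name=$2       # secondary server hostname used by FileMover connections
    local_ip=$3   # secondary server at this site
    remote_ip=$4  # secondary server at the other site (optional)
    tmp="$file.tmp"
    # Keep comments and every line that does not map the secondary hostname.
    awk -v n="$name" '
        /^[ \t]*#/ { print; next }
        {
            for (i = 2; i <= NF; i++)
                if ($i == n) next
            print
        }' "$file" > "$tmp" || return 1
    printf '%s %s\n' "$local_ip" "$name" >> "$tmp"
    if [ -n "$remote_ip" ]; then
        printf '%s %s\n' "$remote_ip" "$name" >> "$tmp"
    fi
    mv "$tmp" "$file"
}
```

At the production site, for example, `set_secondary_entry hosts sec01 10.1.1.10 10.2.2.10` lists the local secondary first and the remote secondary second. At the replication destination site, the same call with the two addresses swapped gives the mirror-image entry.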
If you use local host file name resolution for the secondary server hostname, do not create DNS entries
for the secondary server hostnames.
If you use DNS instead of local host files:
♦ On a Celerra file system or secondary server failover, update the DNS entry for the secondary
servers to reference the IP addresses of the secondary servers at the disaster recovery site. This
allows stub files in the destination primary tier file system to resolve to the secondary file system
on the destination Celerra instead of the secondary file system on the source Celerra.
♦ Flush the DNS cache on the Data Mover if the DNS entry for the secondary server hostname has a
Time to Live (TTL) of more than a few minutes.
Do not use multiple IP addresses for a single hostname in DNS because most DNS servers return all
the addresses but rotate their order each time the name is queried. This means it is not possible to predict which
secondary server a Data Mover recalls data from when the IP addresses for multiple Celerra secondary
servers are listed in a single DNS hostname entry.
Chapter 5
Performance Recommendations
This chapter presents these topics:
Introduction.......................................................................................................................................... 36
Recommendation #49 Use VLANs to segregate network traffic.......................................................... 36
Recommendation #50 Monitor read access to the secondary tier file system....................................... 36
Introduction
This section details the performance recommendations.
Recommendation #49 Use VLANs to segregate network traffic
Use VLANs to segregate network traffic of different types to improve throughput, manageability,
application separation, high availability, and security. You can attain better archiving performance by
using separate interfaces on the Rainfinity FMA to read data from the primary tier and write data to the
secondary tier. For FMA/VE, use two separate interfaces configured in two separate VLANs through
two virtual switches on the ESX server to enhance archiving performance.
Use additional VLANs for client access to the primary tier file system.
Recommendation #50 Monitor read access to the secondary tier file system
High rates of read operations on the secondary tier file system during normal client access to the
primary tier file system may indicate that some archived files are more active than intended. Archived
files that continue to get repeated client read requests can be relocated to the primary tier file system by
temporarily adjusting the FileMover read_policy_override setting to full. This causes Celerra to
automatically recall any archived file back to the primary tier file system the next time the file is
accessed for reading. To allow the recall of archived files on read access, use the FileMover fs_dhsm
command to set the read_policy_override option to full.
$ fs_dhsm -modify fs_name -read_policy_override full
where fs_name is the name of the FileMover-enabled primary file system.
Closely monitor the capacity usage on the primary tier file system when using this setting. You can
explicitly recall archived files that get repeated client read requests by viewing the files’ contents from
a NAS client, or recall them implicitly by leaving the read_policy_override option set to full for an
extended period. This recalls the archived files that continue to be accessed by read requests. After the
desired files are recalled or when the primary tier capacity usage reaches the desired target usage,
disable read recalls as described in Recommendation #11 on page 14. Note that when the read recall
policy setting is full, it is possible that file system scanning applications like content searches, NFS
backups, Windows offline folder synchronization, and client-based anti-virus scans may accidentally
trigger recalls of all archived data to the primary file system. This may lead to accidental file system
auto-extension or failed client writes. Therefore, it is desirable to limit exposure to this risk by
minimizing the time period during which the read recall policy is set to full. Use the default offline
attribute settings described in Recommendation #12 on page 14 to further limit the risk of accidental
recalls.
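While read recalls are enabled, a periodic check can report when to switch back to passthrough. This hypothetical sketch only prints a command. The function name is illustrative. The current usage percentage is assumed to come from whatever capacity monitoring you already use. The passthrough value follows the read_policy_override usage described in this guide.

```shell
#!/bin/sh
# Hypothetical helper: while read_policy_override is temporarily set to full,
# decide whether primary tier usage has reached the target and print the
# command that disables read recalls again (passthrough, Recommendation #11).
recall_window_action() {
    fs=$1          # FileMover-enabled primary tier file system
    used_pct=$2    # current capacity usage of the primary tier, integer percent
    target_pct=$3  # desired target usage, integer percent
    if [ "$used_pct" -ge "$target_pct" ]; then
        echo "fs_dhsm -modify $fs -read_policy_override passthrough"
    else
        echo "keep full: ${used_pct}% used, target ${target_pct}%"
    fi
}
```

Printing the command rather than running it leaves the final decision with the administrator. This matters because a scan-driven recall storm can fill the file system faster than a periodic check would notice.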
In a controlled environment where there is little risk of unintentional recalls, you can set the
read_policy_override option to full for normal operations. This has a net performance benefit on the
client access to files that become active after periods of inactivity. Such files get archived during
periods of inactivity, and this setting causes Celerra to automatically migrate archived files back to the
primary tier when and if they become active again.