EMC CONFIDENTIAL – INTERNAL USE ONLY
EMC CONFIDENTIAL – INTERNAL AND PARTNER USE ONLY
DELETE IF THIS IS A PUBLIC DOCUMENT

EMC Celerra Automated Storage Tiering
Applied Best Practices Guide

EMC NAS Product Validation

Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

Copyright © 2009 EMC Corporation. All rights reserved.

Published August, 2009

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

EMC Celerra Automated Storage Tiering Applied Best Practices Guide
P/N h6499

Contents

About this Document
Chapter 1 Automated Storage Tiering Overview
  Architectural overview
  Benefits of automated storage tiering
Chapter 2 General Recommendations
  Storage provisioning recommendations
    Recommendation #1 Evaluate the data set’s suitability for a tiered-storage architecture
    Recommendation #2 Store primary tier data on EFD
    Recommendation #3 Store secondary tier data on SATA
    Recommendation #4 Correctly size the primary file system on Tier 0/1
    Recommendation #5 Correctly size the secondary file system on Tier 2
    Recommendation #6 Use deduplication on the secondary tier file system
    Recommendation #7 Use manual file system extension on the primary tier file system
    Recommendation #8 Use automatic file system extension on the secondary tier file system
    Recommendation #9 Use virtual provisioning on the secondary tier file system
    Recommendation #10 Avoid client access to the secondary file system
  Archiving and deduplication policy recommendations
    Recommendation #11 Avoid unintended recalls to primary storage
    Recommendation #12 Use the default FileMover offline attribute setting
    Recommendation #13 Use aggressive deduplication policy settings
    Recommendation #14 Tune archiving policies to optimize primary tier capacity usage
    Recommendation #15 Use the FMA Create Preview feature
    Recommendation #16 Tune archiving policies to take advantage of data set idiosyncrasies
    Recommendation #17 Tune archiving policies to match the active data set’s rate of change
    Recommendation #18 Tune archiving policies to account for data that is not archivable
    Recommendation #19 Run archiving jobs often enough to avoid overfilling the primary tier
Chapter 3 Migration Strategies
  Single-tier migrations
    Recommendation #20 Use the multiple-tier migration technique whenever possible
    Recommendation #21 Use the single-tier migration technique for NFS data
    Recommendation #22 Use the single-tier migration technique if extra space is not available
    Recommendation #23 Use FMA to preview policies when migrating from Celerra or NetApp
    Recommendation #24 If file sizes and ages are unknown, archive all files during migration
    Recommendation #25 Migrate only a subset of the data set at a time
    Recommendation #26 After the migration, temporarily set the read recall policy to full recall
  Multiple-tier migrations
    Recommendation #27 Use the single-tier migration technique for NFS data
    Recommendation #28 Correctly size the temporary file system
    Recommendation #29 Configure FMA and FileMover on the temporary file system and the target primary file system
    Recommendation #30 Use the temporary file system to tune archiving policies
    Recommendation #31 Set the CIFS backup option on the temporary file system to offline
Chapter 4 Data Protection Strategies
  Backup recommendations
    Recommendation #32 If the backup window allows, back up the data set as a single file system
    Recommendation #33 To reduce the backup window, back up primary and secondary file systems separately
    Recommendation #34 When using the separate backup technique, back up the secondary file system after FMA operations
  Restore recommendations
    Recommendation #35 Use single-tier migration techniques when performing the full restore of backups containing both primary and secondary tier files
    Recommendation #36 Preserve stub file synchronization when restoring separate backups of primary and secondary tier data
  SnapSure recommendations
    Recommendation #37 Take separate snapshots of primary and secondary file systems
    Recommendation #38 Take snapshots of secondary file systems only after FMA activities
    Recommendation #39 Preserve stub file synchronization when restoring snapshots of primary or secondary tier file systems
    Recommendation #40 Do not keep primary tier snapshots for more than 30 days
    Recommendation #41 Minimize the number and age of secondary tier snapshots
    Recommendation #42 Minimize the number and age of primary tier snapshots
    Recommendation #43 Store primary tier file system snapshots on secondary tier storage
  Replication recommendations
    Recommendation #44 Replicate both primary and secondary file systems
    Recommendation #45 Configure source and destination FMA devices
    Recommendation #46 Populate and deduplicate the source site secondary tier file system before configuring replication
    Recommendation #47 Use the same CIFS domain on replicated sites
    Recommendation #48 Use local host files instead of DNS entries to resolve secondary tier hostnames when using replication
Chapter 5 Performance Recommendations
  Introduction
    Recommendation #49 Use VLANs to segregate network traffic
    Recommendation #50 Monitor read access to the secondary tier file system

About this Document

This document describes the best practices to configure and manage the Celerra automated storage tiering architecture.

Audience

This document is intended for storage administrators who are engaged in planning deployments of network-attached storage.

Related documents

The following documents provide additional, relevant information. Access to these documents is based on your login credentials.
If you do not have access to the following content, contact your EMC representative:
♦ Achieving Storage Efficiency with EMC Celerra — Best Practices Planning
♦ Automated Tiered Storage with EMC Celerra and Rainfinity File Management Appliance — Applied Best Practices
♦ EMC Rainfinity File Management Application Installation and User Guide
♦ Using EMC Celerra FileMover — Technical Module
♦ Using Celerra Data Deduplication — Technical Module

Chapter 1 Automated Storage Tiering Overview

This chapter presents these topics:
Architectural overview
Benefits of automated storage tiering

Architectural overview

EMC® Celerra® automated storage tiering is a storage architecture that uses Celerra FileMover and Rainfinity® File Management Appliance (FMA) to effectively create a multi-tiered file system by distributing the contents of a data set across multiple file systems that reside on different storage tiers. Network-attached storage (NAS) clients access the data set from the primary tier file system regardless of where the data is stored. To the client, the multi-tiered storage used for the data set appears to be a single homogeneous file system with a contiguous namespace.

As shown in Figure 1, the active files in the data set are stored on Tier 0 or Tier 1, which consist of enterprise Flash drives (EFD) and Fibre Channel (FC) disks, respectively. The inactive files in the data set are stored on Tier 2 Serial ATA (SATA) disks.
Figure 1 Celerra automated storage tiering architecture with FMA (components shown: Tier 0: EFD; Tier 1: FC/SAS; Tier 2: SATA (Deduplication); Rainfinity FMA)

Benefits of automated storage tiering

The automated storage tiering architecture delivers Tier 0/1 performance for the active portion of a data set while providing Tier 2 storage cost savings for the inactive portion. This provides Tier 0/1 performance without the cost of allocating Tier 0/1 storage for the entire data set. In addition, deduplication on Tier 2 reduces the amount of storage required for the data set, thereby reducing the number of required drives along with the associated power and cooling costs.

Chapter 2 General Recommendations

This chapter presents these topics:
Storage provisioning recommendations
  Recommendation #1 Evaluate the data set’s suitability for a tiered-storage architecture
  Recommendation #2 Store primary tier data on EFD
  Recommendation #3 Store secondary tier data on SATA
  Recommendation #4 Correctly size the primary file system on Tier 0/1
  Recommendation #5 Correctly size the secondary file system on Tier 2
  Recommendation #6 Use deduplication on the secondary tier file system
  Recommendation #7 Use manual file system extension on the primary tier file system
  Recommendation #8 Use automatic file system extension on the secondary tier file system
  Recommendation #9 Use virtual provisioning on the secondary tier file system
  Recommendation #10 Avoid client access to the secondary file system
Archiving and deduplication policy recommendations
  Recommendation #11 Avoid unintended recalls to primary storage
  Recommendation #12 Use the default FileMover offline attribute setting
  Recommendation #13 Use aggressive deduplication policy settings
  Recommendation #14 Tune archiving policies to optimize primary tier capacity usage
  Recommendation #15 Use the FMA Create Preview feature
  Recommendation #16 Tune archiving policies to take advantage of data set idiosyncrasies
  Recommendation #17 Tune archiving policies to match the active data set’s rate of change
  Recommendation #18 Tune archiving policies to account for data that is not archivable
  Recommendation #19 Run archiving jobs often enough to avoid overfilling the primary tier

Storage provisioning recommendations

This section details recommendations for storage provisioning.

Recommendation #1 Evaluate the data set’s suitability for a tiered-storage architecture

Evaluate the data set for compatibility with a tiered-storage architecture before implementing a tiered-storage solution.
The three data set characteristics that affect the effectiveness of tiering are:
♦ Distribution of file sizes
♦ Distribution of file access dates
♦ Number of files in the data set

Based on these characteristics, the following sections discuss the file size, file age, and file count considerations.

File size considerations

Celerra FileMover does not reduce the space used on the primary tier storage for files that are smaller than 8 KB, because each archived file is replaced with an 8 KB stub file on the primary tier file system. Therefore, a data set that consists mostly of 5 KB files is unsuitable for tiering. As the average file size increases, the benefits of tiering also increase, because larger files yield greater primary tier space savings. For example:
♦ A 1 TB data set of 5 KB files requires 1 TB of primary storage (You attain 0 percent primary tier space savings by tiering.)
♦ A 1 TB data set of 50 KB files requires approximately 160 GB of primary storage for stub files if the entire data set is archived (You can attain approximately 84 percent primary tier space savings by tiering.)
♦ A 1 TB data set of 1 MB files requires approximately 8 GB of primary storage for stub files if the entire data set is archived (You can attain 99 percent primary tier space savings by tiering.)

File age considerations

You can configure the FMA policy engine to select files for archiving based on a file’s level of activity, as indicated by the number of days since it was last accessed or modified. The goal of automated storage tiering is to store the active data on the primary tier storage and the inactive data on the lower tiers. If all files in a data set are accessed on a regular basis, none of the files are inactive enough to be candidates for archiving. Tiering is most beneficial when last-accessed dates are evenly distributed across the data set and older files are less likely to be accessed than newer ones.
For example:
♦ A 1 TB data set where all files are accessed on a regular basis requires 1 TB of primary storage (You attain 0 percent primary tier space savings by tiering.)
♦ A 1 TB data set where 80 percent of the files are rarely accessed requires approximately 200 GB for the active files (You can attain 80 percent primary tier space savings by tiering.)
♦ A 1 TB data set where 80 percent of the files are more than 2 months old, and where files that are more than 2 months old are never accessed, requires approximately 200 GB for the active files (You can attain 80 percent primary tier space savings by tiering. The percentage savings increase as the data set grows.)

Note that the amount of storage consumed by the active files in the third example remains constant over time, because you can schedule a policy to archive all files that are more than two months old at regular intervals. This creates a “sliding window” such that, at any given time, only two months’ worth of data is stored on the primary storage tier. Assuming new data arrives at a steady rate, the primary storage space required for the active files remains at approximately 200 GB regardless of how much the data set grows. This makes the example a very good candidate for tiering: the primary tier stores the 200 GB sliding window of active data, and the secondary tier serves as overflow for an ever-increasing amount of inactive data.

File count considerations

Each FMA/VE virtual appliance can archive up to 50 million files. The hardware FMA appliance is limited to 200 million files per appliance. When evaluating tiering for a data set with a large number of files, include the expected future growth in the total number of files across all tiered data sets on the Celerra, so that the number of archived files stays within the published limits of the selected FMA product.
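As a rough illustration of the file size and file count checks above, the following Python sketch computes the best-case primary tier savings for a data set of uniformly sized files (every file archived and replaced by an 8 KB stub) and compares a projected file count against the FMA/VE and hardware FMA limits quoted in this section. The function names and the growth figures in the example are hypothetical. Real data sets have a distribution of file sizes, so a policy preview in FMA remains the better estimate.

```python
"""Rough tiering-suitability checks for Recommendation #1 (illustrative sketch)."""

STUB_KB = 8  # each archived file leaves an 8 KB stub on the primary tier

# Per-appliance archived-file limits quoted in this guide.
FMA_FILE_LIMITS = {
    "FMA/VE": 50_000_000,
    "hardware FMA": 200_000_000,
}


def primary_savings_fraction(avg_file_kb: float) -> float:
    """Fraction of primary tier space freed if every file is archived.

    Assumes uniformly sized files. Files of 8 KB or less gain nothing,
    because the stub is at least as large as the file.
    """
    if avg_file_kb <= STUB_KB:
        return 0.0
    return 1.0 - STUB_KB / avg_file_kb


def projected_file_count(current_files: int, files_per_year: int, years: int) -> int:
    """Total file count after `years` of linear growth (hypothetical helper)."""
    return current_files + files_per_year * years


def fits_fma_limit(total_files: int, appliance: str) -> bool:
    """True if the projected file count stays within the appliance limit.

    Conservatively treats every file as potentially archived.
    """
    return total_files <= FMA_FILE_LIMITS[appliance]


if __name__ == "__main__":
    for size_kb in (5, 50, 1024):
        savings = primary_savings_fraction(size_kb)
        print(f"{size_kb:>5} KB files: {savings:.1%} primary tier savings")

    # Hypothetical growth: 30 million files today, 10 million more per year, 3 years.
    total = projected_file_count(30_000_000, 10_000_000, 3)
    for appliance in FMA_FILE_LIMITS:
        print(f"{total:,} files within {appliance} limit: {fits_fma_limit(total, appliance)}")
```

With these inputs, the 50 KB case reproduces the roughly 84 percent savings from the file size examples. The hypothetical 60 million-file projection exceeds a single FMA/VE limit but fits within the hardware FMA limit.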
Recommendation #2 Store primary tier data on EFD

The performance of enterprise Flash drives (EFD) far exceeds that of other disk drive technologies. Depending on the data access characteristics, a set of EFDs can support up to 30 times the I/O rate of the same number of FC drives.

Recommendation #3 Store secondary tier data on SATA

In the automated storage tiering architectural model, only inactive data is stored on the secondary tier. Therefore, you can achieve the cost savings of storing secondary tier data on high-capacity SATA drives with minimal impact on production data access, because all active data is stored on the primary tier. Files archived to the secondary tier remain online and readily available, and when they are updated, they are automatically promoted to the primary tier.

Recommendation #4 Correctly size the primary file system on Tier 0/1

The benefits of automated storage tiering are maximized by restricting the storage allocated to the primary file system to the size required to store the active files and the stubs for inactive files. Before migrating to an automated storage tiering architecture, choose the criteria that differentiate active data from inactive data. The criteria can be based on access characteristics, such as the date a file was last accessed or modified, or they can be chosen so that an arbitrary percentage of the data remains on the primary tier, for example, by declaring the most recently accessed 20 percent of the data set as the active portion. To protect against accidental recalls of archived data, an archived file is not relocated back to the primary tier storage unless it is modified (Recommendation #11). Because read access to archived files is significantly slower than read access to primary tier files, evaluate data access patterns to ensure that archived files are unlikely to require repeated read access in the future.
After you select the archiving criteria, use FMA to preview the policy definitions that split the data into active and inactive portions, paying special attention to the total size of the active portion.

Provision the primary tier file system so that it is large enough to store all active data plus an 8 KB stub for each file in the inactive portion. Leave additional space in the primary file system for future growth in the form of additional stub files.

Note that the primary tier always stores the active portion of the data set: over time, as new data is added to the primary tier, older data is archived to the secondary tier. With a constant rate of growth, the rate at which new files are created balances the rate at which old files are archived, so the size required for the primary file system remains fairly static. However, each archived file still requires 8 KB of space on the primary file system. Therefore, make an allowance for the primary storage consumed by these stub files as the data set continues to grow.

For example, the following calculation provisions primary storage for 70 percent utilization when the primary tier stores the most active 20 percent of the files of a 1 TB data set of 2 million files.
This data set is expected to grow by 1 million files (500 GB) per year for the next two years:

Current active data = 20% of 1 TB, 20% of 2 million files = 200 GB consumed by 400,000 files
Current inactive data = 80% of 1 TB, 80% of 2 million files = 800 GB consumed by 1,600,000 files
Stub files for the inactive data = 8 KB * 1,600,000 files ≈ 12 GB
Stub files for the next 2 years = 8 KB * 2,000,000 files ≈ 15 GB
Space required on the primary tier = 200 GB + 12 GB + 15 GB = 227 GB
Target utilization of the primary tier = 70%
Recommended size of the primary tier = 227 GB * (100%/70%) ≈ 324 GB

Recommendation #5 Correctly size the secondary file system on Tier 2

Ensure that the secondary tier file system is large enough to hold all currently inactive data, plus space for currently active data and future growth as that data becomes inactive over time. Using the same data set described in Recommendation #4, use the following calculation to estimate the secondary storage required to store the inactive portion of the data set:

Growth of the inactive data (2 years) = 1 TB consumed by 2 million files
Logical size of the secondary data = 800 GB currently inactive + 1 TB future growth = 1.8 TB
Target utilization of the secondary tier = 70%
Recommended size of the secondary tier = 1.8 TB * (100%/70%) ≈ 2.6 TB

Recommendation #6 Use deduplication on the secondary tier file system

Using deduplication on the secondary file system reduces the amount of storage that must be allocated to it.
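The sizing arithmetic in Recommendations #4 and #5, together with the deduplication adjustment that Recommendation #6 applies to the same data set, can be reproduced with a short Python sketch. The function names are hypothetical. Stub space is computed with 1 GB = 1,048,576 KB, which reproduces the 12 GB and 15 GB stub figures in Recommendation #4, while data set sizes use 1 TB = 1,000 GB, as the worked examples do. The 40 percent deduplication savings is an assumed input, not a prediction.

```python
"""Tier sizing arithmetic from Recommendations #4, #5, and #6 (illustrative sketch)."""

STUB_KB = 8              # each archived file leaves an 8 KB stub on the primary tier
KB_PER_GB = 1024 * 1024  # binary units for stub space, matching the worked examples


def stub_gb(file_count: int) -> float:
    """Primary tier space (GB) consumed by stubs for `file_count` archived files."""
    return file_count * STUB_KB / KB_PER_GB


def primary_size_gb(data_gb: float, files: int, active_fraction: float,
                    future_files: int, target_util: float) -> float:
    """Active data + stubs for inactive files + stubs for future growth, at target utilization."""
    active_gb = data_gb * active_fraction
    inactive_files = round(files * (1 - active_fraction))
    required_gb = active_gb + stub_gb(inactive_files) + stub_gb(future_files)
    return required_gb / target_util


def secondary_size_gb(inactive_gb: float, growth_gb: float, target_util: float,
                      dedupe_savings: float = 0.0) -> float:
    """Inactive data + future inactive growth, reduced by dedupe savings, at target utilization."""
    logical_gb = inactive_gb + growth_gb
    return logical_gb * (1 - dedupe_savings) / target_util


if __name__ == "__main__":
    # Data set from Recommendation #4: 1 TB, 2 million files, 20% active,
    # 2 million new files over two years, 70% target utilization.
    print(f"Primary:   {primary_size_gb(1000, 2_000_000, 0.20, 2_000_000, 0.70):.0f} GB")
    print(f"Secondary: {secondary_size_gb(800, 1000, 0.70):.0f} GB")
    print(f"Secondary, 40% dedupe: {secondary_size_gb(800, 1000, 0.70, 0.40):.0f} GB")
```

The script does not round intermediate values. As a result, it reports roughly 325 GB for the primary tier, 2.57 TB for the secondary tier without deduplication, and 1.54 TB with 40 percent deduplication savings. The hand calculations round at each step and arrive at 324 GB, 2.6 TB, and 1.6 TB.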
Using the data set described in Recommendation #5, the following calculation yields a reduced recommended size for the secondary file system:

Logical size of the secondary data = 1.8 TB
Expected deduplication savings = 40%
Deduplicated size of the secondary data = (100% - 40%) * 1.8 TB ≈ 1.1 TB
Target utilization of the secondary tier = 70%
Recommended size of the secondary tier = 1.1 TB * (100%/70%) ≈ 1.6 TB

This calculation yields a net savings of 1 TB of secondary storage compared with the calculation in Recommendation #5. The deduplication savings percentage varies with the degree of duplication and the compressibility of the files in the data set. Files smaller than 24 KB are not deduplicated, so data sets that contain a high percentage of small files achieve lower deduplication savings.

Use FMA orphan file management to minimize the secondary tier storage consumed by archived versions of files that have been deleted or modified on the primary tier file system.

Recommendation #7 Use manual file system extension on the primary tier file system

File system auto-extension enables the space allocated to a file system to grow automatically when its capacity utilization reaches a high water mark. The automated storage tiering architecture is designed to offload primary tier data to secondary tiers so that the primary tier file system does not need to grow. A primary tier file system that is used with a properly tuned archiving policy does not require additional space unless the data set grows to the point where the active primary tier data plus the stub files pointing to archived secondary tier files no longer fit in the allocated primary tier space. Automatic file system extension runs the risk of accidentally extending the primary file system because of an accumulation of inactive files in the primary tier.
A properly tuned archiving policy archives these files to the secondary tier to free space in the primary tier, thereby avoiding the need for file system extension.

Recommendation #8 Use automatic file system extension on the secondary tier file system

You need to extend the secondary tier file system only if all eligible files have been deduplicated and more space is required to archive additional files from the primary file system. Ensure that you do not extend the secondary file system because of an accumulation of files that have not been deduplicated. Deduplicate all eligible files on the secondary tier file system to free space in the secondary tier and avoid the need for file system extension. Aggressive deduplication policies (Recommendation #13) help to avoid unnecessary secondary tier file system extension. However, because the secondary tier serves as an overflow area for the primary tier, it is reasonable to use auto-extension so that the secondary file system can start small and grow as needed over time, as files age out of the primary file system and are archived to the secondary file system.

Recommendation #9 Use virtual provisioning on the secondary tier file system

In the automated storage tiering architecture, the goal is to keep primary tier file system utilization at a constant and moderately high level so that the performance advantages of primary tier disk technologies are maximized. In contrast, usage of the secondary tier increases over time as new data is added to the primary tier and older data on the primary tier becomes inactive and is archived to the secondary tier. The continuous growth typical of archive storage is a natural fit for storage pools, automatic file system extension, and virtual provisioning.
Because a portion of the available secondary tier storage space is intended for future growth, a collection of secondary file systems can share the same storage pool and use virtual provisioning to effectively “share” the space set aside for future growth. As usage increases over time, some file systems use more, and others less, than originally predicted. Virtual provisioning avoids the trap of over-allocating space for future growth to specific file systems that do not require it.

Recommendation #10 Avoid client access to the secondary file system

Do not allow Celerra clients to access the secondary file system directly. Any modification of archived files on the secondary tier makes those files inaccessible through the stub files on the primary storage tier.

Archiving and deduplication policy recommendations

This section details recommendations for archiving and deduplication.

Recommendation #11 Avoid unintended recalls to primary storage

The automated storage tiering architecture provides enough space in the primary tier for active data, but not enough to store the entire data set. Therefore, it is important to avoid accidentally recalling data from the secondary tier to the primary tier, because these uncontrolled recalls may consume all the capacity allocated to the file system on the primary tier. To avoid accidental read recalls triggered by content searches, Network File System (NFS) backups, Windows offline folder synchronization, or client-based anti-virus scans, use the FileMover fs_dhsm command to set the read_policy_override option to passthrough:

$ fs_dhsm -modify fs_name -read_policy_override passthrough

where fs_name is the name of the FileMover-enabled primary file system.

Recommendation #12 Use the default FileMover offline attribute setting

If Recommendation #11 is not followed, EMC strongly recommends that you accept the default value for the -offline_attr option.
The default value enables Celerra to indicate to Common Internet File System (CIFS) clients that a file is archived: a small marker appears at the bottom-left corner of the file icon when the file is displayed in Windows Explorer. If the option is disabled and the read_policy_override option is not set to passthrough, archived files can be recalled to primary storage unnecessarily, because some applications, such as Windows Explorer, read parts of files when a user views the file’s enclosing folder.

Recommendation #13 Use aggressive deduplication policy settings

Archived files on the secondary file system are never modified in place, because modifications to archived files occur only after the files have been recalled to the primary file system. Therefore, deduplication policy settings on the secondary tier can treat archived files as read-only and disregard the modification time parameter built into the default deduplication policy. In addition, because typical archiving policies include an access time parameter, newly archived files are likely not to have been accessed in some time. Therefore, it is also reasonable to disregard the access time parameter built into the default deduplication policy. Setting both values to 0 causes all files on the secondary file system to be considered for deduplication without delay. This saves capacity on the secondary tier while incurring little or no performance penalty when archived files are accessed from the primary file system, because reading deduplicated files carries only a very small performance penalty.
Use the following commands on the Celerra Control Station to set the policy engine for a Data Mover to ignore the access and modification times when considering files for deduplication:

$ server_param server_2 -f dedupe -modify accessTime -value 0
$ server_param server_2 -f dedupe -modify modificationTime -value 0

where server_2 is the name of the Data Mover that hosts the secondary file system.

Recommendation #14 Tune archiving policies to optimize primary tier capacity usage
One goal of automated storage tiering is to maximize the usage of high-performance primary tier storage while taking advantage of the cost savings offered by high-capacity secondary tier storage. An archiving policy designed to achieve this goal must keep as much data as possible on the primary tier without allowing the primary tier to reach 100-percent capacity. If automatic file system extension is used on the primary tier, the archiving policy must keep primary tier utilization below the auto-extension high water mark. Careful tuning of the archiving policy is required to ensure that primary tier usage does not exceed these limits.

Several factors affect the capacity usage of the primary tier file system:
♦ Capacity consumed by stub files
♦ Capacity consumed by files that are too small to archive
♦ Capacity consumed by files that are too recently active to archive

Stub files generally consume 8 KB of space each. In a typical file system, the number of stub files grows as new files are added and older files are archived, which increases the primary tier capacity consumed by stub files. The percentage of data set space consumed by stub files is highly dependent on the distribution of file sizes in the data set. In general, the smaller the average file size, the higher the percentage of primary tier space consumed by stub files. Recommendation #19 on page 19 provides more information.
Likewise, if some of the new files added to the file system are below the size threshold for archiving (8 KB by default), the capacity consumed by small files increases over time. The amount of primary tier space consumed by files that are too small to archive also depends on the distribution of file sizes in the data set. It is discussed further in Recommendation #19 on page 19.

The last category of files that consumes space in the primary tier is the set of files that are too recently active to archive. The administrator can control the capacity consumed by these files through careful selection of the archiving criteria when the archiving policy is created or modified. For example, consider a policy that archives files that match the policy rule “last_accessed > 30 days”. A more recent cutoff (last_accessed > 15 days) results in more files matching the rule and therefore more files being archived, which decreases the capacity usage of the primary tier. In contrast, a cutoff further in the past (last_accessed > 60 days) results in fewer files matching the rule and therefore fewer files being archived, which increases the capacity usage of the primary tier.

Figure 2 shows the general relationship between the archiving policy and the capacity usage of the primary tier file system. The actual effect of a specific last_accessed criterion on a specific file system depends on two things: the distribution of last_accessed dates in the file system, and the sizes of the files that meet the archiving criteria.
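The relationship described above can be sketched with a small, hypothetical file inventory. Each entry is a (size in GB, days since last access) pair invented for illustration. A rule of the form last_accessed > N archives every file idle for more than N days, and the function sums what stays on the primary tier. Stub files and small files are ignored here for simplicity.

```python
# Hypothetical primary tier inventory: (size in GB, days since last access).
files = [
    (40.0, 5), (25.0, 12), (30.0, 20), (20.0, 28),
    (35.0, 45), (15.0, 70), (10.0, 120),
]


def primary_usage_after_archiving(files, cutoff_days):
    """GB left on the primary tier after archiving with 'last_accessed > cutoff_days'.

    Files idle for more than cutoff_days are archived; stub files and small
    files are ignored in this simplified model.
    """
    return sum(size for size, idle_days in files if idle_days <= cutoff_days)


for cutoff in (15, 30, 60):
    print(f"last_accessed > {cutoff}: {primary_usage_after_archiving(files, cutoff)} GB remain")
```

The more recent cutoff (15 days) leaves the least data on the primary tier, and the older cutoff (60 days) leaves the most, matching the trend shown in Figure 2.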
Figure 2 Effect of archiving policy criteria on primary tier capacity usage (for last_accessed > 15, last_accessed > 30, and last_accessed > 60, primary tier capacity divided into free space, active files, stub files, and small files)

Recommendation #15 Use the FMA Create Preview feature
Use the FMA Create Preview feature to measure the effects of an archiving policy without actually moving files. Compare these results to the available capacity in the primary file system before scheduling the archiving job. Generating a preview on a large data set may take a significant amount of time, so generate previews when the FMA is not heavily loaded with other tasks.

Recommendation #16 Tune archiving policies to take advantage of data set idiosyncrasies
If there are known non-date attributes that you can use to predict whether a file will need to be accessed or modified, set up separate policies to archive files based on these attributes. For example, if all files in a particular directory or with a particular file name format are known to be static records that are unlikely to be accessed, create a policy rule that specifically matches these files. If, for instance, data about cancelled accounts is rarely accessed and is always stored in a specific directory, create an archiving rule that automatically archives all files in that directory. The “Implementing a File Tiering Strategy” chapter in the Automated Tiered Storage with EMC Celerra and Rainfinity File Management Appliance — Applied Best Practices document provides more information about tailoring archiving criteria to specific data set characteristics.
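A preview is essentially a dry run: evaluate the rules, report what would move, and move nothing. The sketch below combines a date rule with a directory rule of the kind described in Recommendation #16. The inventory, the /share/accounts/cancelled path, and the function names are hypothetical. A real preview comes from the FMA Create Preview feature, not from code like this.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Hypothetical inventory: (path, size in MB, days since last access).
inventory = [
    ("/share/projects/plan.docx", 4, 3),
    ("/share/projects/old_plan.docx", 6, 90),
    ("/share/accounts/cancelled/acct_1001.pdf", 2, 10),
    ("/share/accounts/cancelled/acct_1002.pdf", 3, 400),
    ("/share/accounts/active/acct_2001.pdf", 5, 1),
]

STATIC_DIR = PurePosixPath("/share/accounts/cancelled")  # hypothetical static-records directory


def matches_policy(path: str, idle_days: int, cutoff_days: int = 30) -> bool:
    # Rule 1: date-based, equivalent to last_accessed > cutoff_days.
    # Rule 2: anything under the known static directory, regardless of age.
    return idle_days > cutoff_days or STATIC_DIR in PurePosixPath(path).parents


def preview(files: List[Tuple[str, int, int]], cutoff_days: int = 30):
    """Dry run: list what would be archived and total what would remain. Nothing moves."""
    archived = [path for path, _, idle in files if matches_policy(path, idle, cutoff_days)]
    remaining_mb = sum(size for path, size, idle in files
                       if not matches_policy(path, idle, cutoff_days))
    return archived, remaining_mb


would_archive, remaining_mb = preview(inventory)
print(would_archive)
print(remaining_mb)  # 9
```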
Recommendation #17 Tune archiving policies to match the active data set’s rate of change
After an archiving policy is created to optimize the amount of active data stored directly on the primary tier file system, check the effects of the policy periodically. Variations in the growth rate of the active portion of the data set can cause the policy to archive too many or too few files to keep primary tier capacity usage at an optimal level.

For example, previews of archiving policies run against a data set at a particular point in time may indicate that archiving all files that have not been accessed in the past 30 days leaves a comfortable margin of 40 percent free space in the primary tier file system. However, the rate of data access may increase, or an unexpectedly large number of new files may be added to the data set. In either case, re-running the same policy might not archive enough files to keep primary tier capacity usage below the desired level. In the worst case, a poorly tuned policy archives no files at all, and the file system may grow until it reaches 100 percent or is automatically extended.

At the other extreme, no new files may be added, or access to the existing data may drop off dramatically. Re-running the same policy might then archive a significantly higher percentage of files, resulting in very low capacity usage on the primary tier. In the worst case, all files are archived. This also degrades performance if the unnecessarily archived files must be accessed from the secondary tier or recalled to the primary tier file system for modification.
Several mechanisms can increase the amount of active data on the primary tier. Here, a file is considered active based on the time since it was last accessed:
♦ Creating new files
♦ Modifying primary tier files in a way that increases their size
♦ Modifying inactive but not yet archived files, regardless of whether the file size increases
♦ Reading inactive but not yet archived files (because reading changes the last accessed date)
♦ Modifying previously archived files (Regardless of whether the size of the file changes, archived files are recalled upon modification. If the new size of the file is greater than the size of a stub file, the size of the set of active files in the data set increases.)

The following types of access have no effect on the size of the set of active files:
♦ Reading existing active files
♦ Reading previously archived files (assuming read recall is set to passthrough)
♦ Deleting inactive files, whether or not they have been archived yet

Several mechanisms shrink the active portion of the data set:
♦ The passage of time (active files age out into inactive files if they are not accessed)
♦ Modifying existing active files in a way that reduces their size
♦ Deleting an active file

Because it is difficult to quantify these factors independently, it is easier to think in terms of the total net growth rate of active data. Express this growth rate as new space consumed per day, N GB/day. A policy that archives files not accessed in the past M days then leaves M days’ worth of active data on the primary tier. That amount is the active data growth rate multiplied by the number of days excluded from the archiving policy, N * M GB.
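The estimate reduces to a multiplication, with the non-archivable overhead added on top to get total primary tier usage. This is a minimal sketch. The stub and small-file overhead figures in the usage line (5 GB and 3 GB) are hypothetical placeholders, not values from this guide.

```python
def active_data_gb(growth_gb_per_day: float, window_days: int) -> float:
    """Active data left on the primary tier by a policy that archives
    files not accessed in the past window_days (N * M)."""
    return growth_gb_per_day * window_days


def total_primary_usage_gb(growth_gb_per_day: float, window_days: int,
                           stub_gb: float, small_files_gb: float) -> float:
    # Active data plus the overhead that archiving cannot remove.
    return active_data_gb(growth_gb_per_day, window_days) + stub_gb + small_files_gb


print(active_data_gb(10, 30))                # 300
print(total_primary_usage_gb(10, 30, 5, 3))  # 308 (5 GB and 3 GB are hypothetical)
```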
For example, an active data growth rate of 10 GB/day combined with an archiving policy that archives files that have not been active in the past 30 days leaves 30 days’ worth of data (30 days * 10 GB/day), or 300 GB of active data, on the primary tier. To calculate the total usage of the primary tier after the archiving policy runs, add this active data to the space consumed by stub files and by files that are too small to archive.

Files that are too small to archive are excluded from the growth rate of active files. Small files that are added to the data set over time can be considered file system overhead, in the same way as stub files. Both consume space in the primary file system that could otherwise be used for active data. Over a long period of time, this overhead reduces the space available for truly active data, so archiving policies must be tuned to shrink the time window in which a file is considered active.

Recommendation #18 Tune archiving policies to account for data that is not archivable
Over time, the primary file system accumulates stub files that point to previously archived files. It also accumulates files that cannot be archived, either because of their small size or because they do not match the archiving policy’s file selection criteria. This accumulated used capacity reduces the capacity available to store currently active files. The amount of primary tier space consumed by stub files and small files is directly related to the distribution of file sizes in the data set.
Example 1
1 TB of storage space consumed by 1 million files
Average file size = 1 MB
Space required for stub files for the entire file system = 1 million * 8 KB per stub = 7.6 GB

Example 2
1 TB of storage space consumed by 20 million files
Average file size = 50 KB
Space required for stub files for the entire file system = 20 million * 8 KB per stub = 152.6 GB

In Example 1, the upper limit on the space that stub files can consume is less than 1 percent of the data set size. Stub file space therefore grows at less than 1 percent of the growth rate of the entire file system, and can largely be ignored when tuning archiving policies.

In Example 2, consider an automated storage tiering architecture that allocates 20 percent of the file system’s storage (200 GB) to the primary tier and 80 percent to the secondary tier. The stub files could potentially need about 76 percent (152.6 GB) of the 200 GB primary tier file system. Because that is already a high capacity usage level for the primary tier, no comfortable margin of primary tier space remains for active files. Suppose the data set in Example 2 was created and archived over an extended period of time. The administrator responsible for archiving must then adjust the archiving policy to gradually reduce the amount of active data allowed on the primary tier. This accounts for the growing space consumed by stub files pointing to the archived data on the secondary tier. Although this practice accommodates continued growth of the data set, the primary tier eventually runs out of space unless older files are purged when they are no longer needed.
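The stub-space figures in Examples 1 and 2 can be checked with a few lines of code. The sketch assumes binary units (1 GB = 1024 * 1024 KB and 1 TB = 1024 GB), which reproduce the 7.6 GB and 152.6 GB values. It also assumes the 200 GB primary tier described for Example 2.

```python
STUB_KB = 8               # stub size used throughout this guide
KB_PER_GB = 1024 ** 2     # binary units


def stub_space_gb(num_archived_files: int) -> float:
    """Primary tier space consumed by stubs if every file in the data set is archived."""
    return num_archived_files * STUB_KB / KB_PER_GB


ex1 = stub_space_gb(1_000_000)    # Example 1
ex2 = stub_space_gb(20_000_000)   # Example 2
print(round(ex1, 1), round(ex2, 1))  # 7.6 152.6
print(f"Example 1: {100 * ex1 / 1024:.2f}% of the 1 TB data set")
print(f"Example 2: {100 * ex2 / 200:.1f}% of a 200 GB primary tier")
```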
Furthermore, unless space is freed on the primary tier by deleting outdated files, the space consumed by stub files eventually reduces the space available for active files. At that point, even active files must be archived, and storage must be added to the primary tier file system to continue providing high-tier performance for the active files in the data set.

Recommendation #19 Run archiving jobs often enough to avoid overfilling the primary tier
An optimally tuned archiving policy resets the capacity usage of the primary tier file system to the lower end of the target usage range defined by the administrator. As new files are created and existing files are recalled for modification, capacity usage increases. Run scheduled archiving jobs often enough to keep the primary file system from running out of space or exceeding auto-extension high water mark triggers. The required frequency depends on two factors: the amount of free space in the primary tier file system at the desired lower end of capacity usage, and the rate of growth between archiving runs.

The following example demonstrates a method of deriving archiving policies and schedules from a data set’s rate of change and the desired capacity usage of the primary file system. The example also shows how to determine the impact of small files and stub files on archiving policies and schedules.

Example
Size of the primary file system: 200 GB
Growth rate of the active data: 2 GB/day consumed by 1,000 new files, including 200 files that are less than 8 KB in size
Target primary utilization: 60 percent to 80 percent full

Using these parameters, it is possible to calculate an archiving policy and an archiving schedule. It is also possible to assess how much space will be consumed by stub files and files that are too small to archive.

Archiving policy to preserve the capacity utilization target:
Lower end of target = 60% of 200 GB = 120 GB
Initially, 120 GB of primary tier space is available for active data. This space is consumed by a sliding window of 60 days’ worth of new data:
120 GB target / 2 GB per day growth rate = 60 days
Recommendation: Archive all files that have not been accessed in more than 60 days.

Archiving schedule frequency:
Difference between the upper and lower bounds of target usage = (80% * 200 GB) - (60% * 200 GB) = 160 GB - 120 GB = 40 GB
Number of days at the projected growth rate that will consume this space = 40 GB / 2 GB per day = 20 days
Recommendation: Run an archiving scan at least once every 20 days.

Growth rate of capacity usage overhead that cannot be reduced by archiving:
Stub file growth rate = 8 KB per stub * 800 files per day large enough to archive = 6.25 MB/day
Growth rate of files too small to archive = 8 KB per file * 200 files per day less than 8 KB in size = 1.56 MB/day (an upper bound)
Primary file system space consumption that cannot be archived = 6.25 MB/day + 1.56 MB/day = 7.81 MB/day

Primary file system usage that cannot be reclaimed by archiving grows by about 7.8 MB/day, or about 2.8 GB after a year. This has a negligible effect on the available space in the primary file system and can comfortably be ignored for several years. Note that this conclusion depends on the distribution of file sizes in the data set. The example uses an average file size of approximately 2 MB. Significantly smaller files result in a much higher stub file growth rate as a percentage of the overall data growth rate.
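The arithmetic in this example can be reproduced with a short script, which makes it easy to re-run the calculation with a data set’s own figures. The script follows the example’s assumptions: binary units (1 MB = 1024 KB), 8 KB per stub, and every small file counted at the full 8 KB, which overstates their actual size.

```python
FS_GB = 200                  # size of the primary file system
LOW_PCT, HIGH_PCT = 60, 80   # target utilization band, in percent
GROWTH_GB_PER_DAY = 2        # growth rate of active data
LARGE_FILES_PER_DAY = 800    # new files large enough to archive (each leaves a stub)
SMALL_FILES_PER_DAY = 200    # new files under 8 KB (counted at 8 KB, an upper bound)
STUB_KB = 8

# Archiving window: days of new data that fit in the lower end of the band.
window_days = LOW_PCT * FS_GB / 100 / GROWTH_GB_PER_DAY

# Scan interval: days until growth fills the gap between the two bounds.
scan_interval_days = (HIGH_PCT - LOW_PCT) * FS_GB / 100 / GROWTH_GB_PER_DAY

# Overhead that archiving cannot reclaim, in MB/day and GB/year.
stub_mb_per_day = LARGE_FILES_PER_DAY * STUB_KB / 1024
small_mb_per_day = SMALL_FILES_PER_DAY * STUB_KB / 1024
overhead_mb_per_day = stub_mb_per_day + small_mb_per_day
overhead_gb_per_year = overhead_mb_per_day * 365 / 1024

print(f"archive files not accessed in more than {window_days:.0f} days")
print(f"run an archiving scan at least every {scan_interval_days:.0f} days")
print(f"overhead: {overhead_mb_per_day:.2f} MB/day, {overhead_gb_per_year:.2f} GB/year")
```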
Chapter 3 Migration Strategies
This chapter presents these topics:
Single-tier migrations ..... 22
Recommendation #20 Use the multiple-tier migration technique whenever possible ..... 22
Recommendation #21 Use the single-tier migration technique for NFS data ..... 22
Recommendation #22 Use the single-tier migration technique if extra space is not available ..... 22
Recommendation #23 Use FMA to preview policies when migrating from Celerra or NetApp ..... 22
Recommendation #24 If file sizes and ages are unknown, archive all files during migration ..... 23
Recommendation #25 Migrate only a subset of the data set at a time ..... 23
Recommendation #26 After the migration, temporarily set the read recall policy to full recall ..... 23
Multiple-tier migrations ..... 24
Recommendation #27 Use the single-tier migration technique for NFS data ..... 24
Recommendation #28 Correctly size the temporary file system ..... 25
Recommendation #29 Configure FMA and FileMover on the temporary file system and the target primary file system ..... 25
Recommendation #30 Use the temporary file system to tune archiving policies ..... 25
Recommendation #31 Set the CIFS backup option on the temporary file system to offline ..... 25

Single-tier migrations
The following recommendations apply only to single-tier migrations. Figure 3 shows the single-tier migration technique. It involves migrating data directly to the primary tier file system and actively archiving inactive files from the primary tier to the secondary tier during the migration process.

Figure 3 Single-tier migration technique with active archiving (files are migrated from the source file system to the primary tier on the Celerra migration destination and archived to the secondary tier during migration)

Recommendation #20 Use the multiple-tier migration technique whenever possible
If extra storage space is available and the migrated data uses the CIFS protocol, use the multiple-tier migration technique described on page 24. The multiple-tier migration technique relies on stub-aware copy utilities, which EMC supports only for the CIFS protocol.

Recommendation #21 Use the single-tier migration technique for NFS data
EMC does not currently support a stub-aware copy mechanism for NFS. If the data set consists of NFS data, therefore, use the single-tier migration method to copy it into the automated storage tiering architecture.

Recommendation #22 Use the single-tier migration technique if extra space is not available
The multiple-tier migration technique requires an additional temporary file system that is large enough to store the entire data set. If the storage system does not have enough extra space to allocate to the temporary file system for the duration of the migration, use the single-tier technique.
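The decision rules in Recommendations #20 through #22 can be written as a small function. This is a sketch: the function name and parameters are hypothetical. “Enough extra space” is simplified to spare capacity at least as large as the data set, since that is what the temporary file system must hold.

```python
from enum import Enum


class Technique(Enum):
    SINGLE_TIER = "single-tier"
    MULTIPLE_TIER = "multiple-tier"


def choose_migration_technique(protocol: str, spare_gb: float,
                               data_set_gb: float) -> Technique:
    # Rec #21: stub-aware copy is supported only for CIFS, so NFS (or any
    # non-CIFS) data uses the single-tier technique.
    if protocol.upper() != "CIFS":
        return Technique.SINGLE_TIER
    # Rec #22: the temporary file system must hold the entire data set.
    if spare_gb < data_set_gb:
        return Technique.SINGLE_TIER
    # Rec #20: otherwise, prefer the multiple-tier technique.
    return Technique.MULTIPLE_TIER


print(choose_migration_technique("NFS", spare_gb=2000, data_set_gb=1000).value)
print(choose_migration_technique("CIFS", spare_gb=500, data_set_gb=1000).value)
print(choose_migration_technique("CIFS", spare_gb=1200, data_set_gb=1000).value)
```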
Recommendation #23 Use FMA to preview policies when migrating from Celerra or NetApp
If the data to be migrated to the automated storage tiering architecture currently resides on a Celerra or NetApp system, use the FMA Create Preview feature to establish an initial archiving policy before the data is migrated. If the data set is currently active, use the preview feature to determine the right file-age selection criteria. These criteria must archive enough data to the secondary storage that the active data remaining on the primary storage meets the targeted lower end of the desired primary tier capacity usage.

For example, consider migrating a 1 TB data set to an automated storage tiering architecture consisting of a 200 GB primary tier file system with a target capacity usage of 60 percent to 80 percent of primary tier space. Use the FMA preview feature to discover the policy criteria that will leave 120 GB of the most active data on the primary tier (120 GB = 60 percent of the 200 GB of total primary tier capacity). Remember to make an allowance for the space consumed by stub files (8 KB for each archived file) and by files that are too small to archive (less than 8 KB in size), as described in Recommendation #18 on page 18. After an appropriate policy is discovered, use it during the migration process to offload inactive files from the primary tier file system.

Recommendation #24 If file sizes and ages are unknown, archive all files during migration
If the data set is migrated from a Celerra or NetApp system, use FMA to derive an initial archiving policy during the migration, as described in Recommendation #23. If the data is migrated from another platform, it may not be possible to decide on an archiving policy before the migration because of a lack of information about the file size and age distributions in the data set.
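The policy choice in Recommendations #23 and #24 can be sketched as follows. The previews argument is a hypothetical stand-in for results read off FMA Create Preview runs. It maps each candidate cutoff, in days, to the GB that would remain on the primary tier, including the stub and small-file allowance. The rule strings only mimic the last_accessed notation used in this guide and are not FMA syntax.

```python
from typing import Dict, Optional


def choose_migration_policy(previews: Optional[Dict[int, float]],
                            target_gb: float) -> str:
    """Return an archiving rule to use during the migration."""
    if not previews:
        # Rec #24: no preview data, so archive everything during migration.
        return "last_accessed > 0"
    # Rec #23: keep only cutoffs whose remaining data fits the lower target.
    fitting = [days for days, remaining_gb in previews.items()
               if remaining_gb <= target_gb]
    if not fitting:
        raise ValueError("no candidate cutoff fits the target; preview shorter cutoffs")
    # The longest fitting cutoff keeps the most active data on the primary tier.
    return f"last_accessed > {max(fitting)}"


target = 60 * 200 / 100  # lower end of the 60-80 percent band on a 200 GB primary tier
print(choose_migration_policy({15: 90.0, 30: 118.0, 60: 165.0}, target))
print(choose_migration_policy(None, target))
```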
When these distributions are unknown, establish a working archiving policy after the migration, and use a special archiving policy during the migration process to offload data from the primary tier. A suitable policy for use during the migration is one that archives all files. This allows the entire data set to be migrated to a primary tier file system that is smaller than the data set. Before starting the migration, create a policy that selects all files for archiving, such as a policy with the criterion last_accessed > 0.

Recommendation #25 Migrate only a subset of the data set at a time
During the migration, FMA may not be able to archive files off the primary tier file system fast enough to keep up with the incoming data from the migration process. To avoid accidentally filling the primary file system to 100 percent, divide the data set into smaller portions and migrate them serially. Divide the data set by migrating sets of directories at a time, and use Windows and Linux tools to assess the size of the directories targeted for each migration session. After each subset of the data set is migrated, manually run FMA archiving to offload the newly migrated data to the secondary tier file system. Allow FMA to complete the scanning and archiving process before starting the next migration session. Ensure that each migrated portion is small enough to fit completely into the free space remaining in the primary tier file system after the previous migration/archiving operation.

When using the “archive everything” approach described in Recommendation #24, most of the space in the primary file system is freed after each archiving session. However, if more selective archiving criteria are used, as suggested in Recommendation #23 on page 22, each migration/archiving session leaves less free primary tier space for migrating the next portion of the data set.
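A simple way to plan the sessions is to walk the directory list in order and start a new session whenever the next directory would not fit. This sketch assumes the archive-everything case of Recommendation #24, where roughly the same free space is available before every session. With selective criteria, recompute free_gb after each archiving run. The directory names and sizes are hypothetical. In practice, keep free_gb below the auto-extension high water mark rather than at the full free capacity.

```python
from typing import List, Tuple


def plan_sessions(directories: List[Tuple[str, float]], free_gb: float) -> List[List[str]]:
    """Group directories, in order, into migration sessions that each fit free_gb."""
    sessions: List[List[str]] = []
    current: List[str] = []
    used = 0.0
    for name, size_gb in directories:
        if size_gb > free_gb:
            raise ValueError(f"{name} ({size_gb} GB) must be split into smaller directories")
        if used + size_gb > free_gb:
            # Close the current session; FMA archiving runs before the next one.
            sessions.append(current)
            current, used = [], 0.0
        current.append(name)
        used += size_gb
    if current:
        sessions.append(current)
    return sessions


dirs = [("eng", 60), ("sales", 50), ("hr", 30), ("legal", 80)]
print(plan_sessions(dirs, free_gb=120))
```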
In either case, pay careful attention to the available primary tier space and to the size of each migrated portion of the data set to avoid filling the primary tier file system to 100 percent. If the primary tier file system is configured to auto-extend, ensure that it does not exceed the high water mark threshold for auto-extension during the migration process.

Recommendation #26 After the migration, temporarily set the read recall policy to full recall
If a tuned archiving policy is used during the migration (Recommendation #23 on page 22), the migration ends with a primary tier file system filled with active data at the desired target capacity usage level. However, if the archive everything approach of Recommendation #24 is used, the primary tier file system contains only stub files and files that are too small to archive after the migration is complete.

You can address this underutilization of the primary tier by doing nothing and allowing primary tier file system usage to increase gradually as new data is created. However, this approach exposes users to a temporary performance penalty when they access recent and presumably active files that were archived to the secondary tier during the migration. You can minimize this penalty by allowing Celerra to automatically recall archived files back to the primary tier file system the first time they are accessed after the migration. To allow the recall of archived files on read access, use the FileMover fs_dhsm command to set the read_policy_override option to full:

$ fs_dhsm -modify fs_name -read_policy_override full

where fs_name is the name of the FileMover-enabled primary file system.

Closely monitor the capacity usage on the primary tier file system when using this setting.
When the primary tier capacity usage reaches the desired target, disable read recalls as described in Recommendation #11 on page 14. Note that while the read recall policy is set to full, file system scanning applications may accidentally trigger recalls of all archived data to the primary file system. These applications include content searches, NFS backups, Windows offline folder synchronization, and client-based anti-virus scans. Such recalls may lead to accidental file system auto-extension or failed client writes, so limit exposure to this risk by minimizing the time during which the read recall policy is set to full. Use the default offline attribute settings described in Recommendation #12 on page 14 to further limit the risk of accidental recalls.

Multiple-tier migrations
The following recommendations apply only to multiple-tier migrations. The multiple-tier migration technique uses three file systems: the target primary tier file system, the target secondary tier file system, and a temporary file system. The migration involves copying the entire data set to the temporary file system, and then using FMA to archive inactive files from the temporary file system to the target secondary tier file system. After archiving is complete, the remaining files and stub files on the temporary file system are copied to the target primary tier file system by using a stub-aware migration tool such as the EMCopy utility. After the stub-aware migration is complete, the temporary file system can be deleted and its space freed for other uses. Figure 4 illustrates the multiple-tier migration technique.
Figure 4 Multiple-tier migration technique by using a temporary file system. Steps shown: 1) migrate files from the source file system to the temporary file system; 2) archive files to the secondary tier file system; 3) stub-aware copy to the primary tier on the Celerra migration destination.

Recommendation #27 Use the single-tier migration technique for NFS data
The multiple-tier migration technique requires a stub-aware copy mechanism. Because EMC does not currently support a stub-aware copy mechanism for NFS, use the single-tier migration method to copy NFS data into the automated storage tiering architecture. This recommendation repeats Recommendation #21 on page 22.

Recommendation #28 Correctly size the temporary file system
Ensure that the temporary file system is large enough to contain the entire data set. Because it is used only for staging, it can be provisioned from any tier of storage. Size the target secondary tier file system as described in Recommendation #5 on page 12.

Recommendation #29 Configure FMA and FileMover on the temporary file system and the target primary file system
Enable FileMover on both the temporary file system and the target primary tier file system, and ensure that both point to the same target secondary tier file system for archiving. Similarly, configure FMA so that both primary file systems archive to the same secondary file system. Before performing the stub-aware copy, enable FileMover on the target primary tier file system. Then ensure that its connections to the target secondary file system match the connections defined on the temporary file system.

Recommendation #30 Use the temporary file system to tune archiving policies
The temporary file system is used as a staging location to divide the data set into active and inactive portions that will be stored on the target primary and secondary file systems, respectively.
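The two sizing checks in Recommendations #28 and #30 can be expressed as a pre-flight function. It is a hypothetical helper, not part of any EMC tool. The inputs would come from the planned file system sizes and from an FMA preview run against the temporary file system.

```python
from typing import List


def check_multiple_tier_plan(data_set_gb: float, temp_fs_gb: float,
                             remaining_after_archive_gb: float,
                             primary_fs_gb: float, target_pct: float) -> List[str]:
    """Return the problems found in a multiple-tier migration plan (empty if none)."""
    problems = []
    # Rec #28: the temporary file system must hold the entire data set.
    if temp_fs_gb < data_set_gb:
        problems.append("temporary file system is smaller than the data set")
    # Rec #30: data left after archiving must fit the primary tier target usage.
    if remaining_after_archive_gb > primary_fs_gb * target_pct / 100:
        problems.append("data remaining after archiving exceeds the primary tier target")
    return problems


print(check_multiple_tier_plan(1000, 1100, 115, 200, 60))
print(check_multiple_tier_plan(1000, 900, 130, 200, 60))
```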
Use the FMA preview feature (Recommendation #14 on page 15) to select the archiving criteria that ensures the size of the data remaining in the temporary file system after archiving is equal to the desired target capacity usage of the primary tier file system. After tuning the archiving policy, run it against the data in the temporary file system and verify that the remaining data in the temporary file system fits into the primary tier file system. Recommendation #31 Set the CIFS backup option on the temporary file system to offline Use a stub-aware copy mechanism to copy the data that remains on the temporary file system after archiving to the target primary tier file system. The EMCopy utility performs a stub-aware copy as long as the FileMover CIFS backup option on the temporary file system is set to offline. However, it is unnecessary to set this option on the target primary tier file system. This parameter is set by executing the following CLI command on the Celerra Control Station: $ fs_dhsm -modify fs_name -backup offline where fs_name is the name of the temporary file system. Note: Failure to set this parameter results in EMCopy using read recalls to migrate archived data from the target secondary tier file system to the target primary tier file system. Consequently, the primary tier file system may reach 100-percent capacity or extend automatically. EMC Celerra Automated Storage Tiering Applied Best Practices Guide 25 Migration Strategies 26 EMC Celerra Automated Storage Tiering Applied Best Practices Guide Chapter 4 Data Protection Strategies This chapter presents these topics: Backup recommendations...................................................................................................................... 28 Recommendation #32 If the backup window allows, back up the data set as a single file system ....... 
28 Recommendation #33 To reduce the backup window, back up primary and secondary file systems separately .......................................................................................................................................... 28 Recommendation #34 When using the separate backup technique, back up the secondary file system after FMA operations................................................................................................................. 29 Restore recommendations...................................................................................................................... 29 Recommendation #35 Use single-tier migration techniques when performing the full restore of backups containing both primary and secondary tier files ..................................................................... 29 Recommendation #36 Preserve stub file synchronization when restoring separate backups of primary and secondary tier data ............................................................................................................. 30 SnapSure recommendations................................................................................................................... 30 Recommendation #37 Take separate snapshots of primary and secondary file systems ...................... 30 Recommendation #38 Take snapshots of secondary file systems only after FMA activities................ 30 Recommendation #39 Preserve stub file synchronization when restoring snapshots of primary or secondary tier file systems ..................................................................................................................... 31 Recommendation #40 Do not keep primary tier snapshots for more than 30 days............................... 31 Recommendation #41 Minimize the number and age of secondary tier snapshots............................... 
Recommendation #42 Minimize the number and age of primary tier snapshots
Recommendation #43 Store primary tier file system snapshots on secondary tier storage
Replication recommendations
Recommendation #44 Replicate both primary and secondary file systems
Recommendation #45 Configure source and destination FMA devices
Recommendation #46 Populate and deduplicate the source site secondary tier file system before configuring replication
Recommendation #47 Use the same CIFS domain on replicated sites
Recommendation #48 Use local host files instead of DNS entries to resolve secondary tier hostnames when using replication

Backup recommendations

Data sets that span multiple storage tiers present special requirements for backups. There are two general approaches:
♦ Back up the data set as a single file system
♦ Back up the data set as separate file systems (one backup for each storage tier)

Recommendation #32 If the backup window allows, back up the data set as a single file system

Backing up the data set as a single file system avoids the need for synchronized backups and restores of the primary and secondary portions of the data set. It is therefore the recommended method for backing up tiered data sets.
To use this technique, configure the backup software to back up the primary tier file system so that the backup includes both the contents of the primary tier file system and the archived files stored in the secondary tier file system. This is enabled by automatic passthrough reads, which read the archived files from the secondary tier during backup processing without relocating them to the primary tier file system. How passthrough reads are enabled during backup processing depends on the backup method.

Network Data Management Protocol (NDMP): For NDMP backups, set the following NDMP environment variable in the backup job so that backups of the primary tier file system include the contents of archived files from the secondary tier file system:

EMC_OFFLINE=y

The default setting is EMC_OFFLINE=n, which means that only the primary storage tier is backed up. NDMP supports integrated snapshots that automate checkpoint creation, management, and deletion if the SNAPSURE=y environment variable is configured for qualified vendor backup software.

CIFS: When the Celerra FileMover -backup option is set to passthrough (the default setting), CIFS backups of the primary tier file system include archived file data from the secondary tier. These backups read file content from the secondary storage tier in passthrough mode, which does not relocate the data to the primary tier storage.

NFS: NFS backups of the primary file system always include associated secondary tier file content. Set the Celerra FileMover -read_policy_override option to passthrough (as described in Recommendation #11 on page 14) to ensure that NFS backup operations do not accidentally relocate secondary tier file content back to the primary storage tier.
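As a quick reference, the per-method settings above can be collected in a small lookup. This is a minimal Python sketch: the function name single_fs_backup_settings is hypothetical, and the -backup passthrough and -read_policy_override passthrough command lines are extrapolated from the fs_dhsm syntax shown elsewhere in this guide rather than copied from a command reference.

```python
# Hypothetical helper: settings that make a backup of the primary tier file
# system include archived (secondary tier) content via passthrough reads.
# Setting names come from this guide; the command strings are extrapolated
# from the fs_dhsm examples in this guide.


def single_fs_backup_settings(method: str, fs_name: str) -> list[str]:
    method = method.upper()
    if method == "NDMP":
        # Set in the NDMP backup job; the default is EMC_OFFLINE=n.
        return ["EMC_OFFLINE=y"]
    if method == "CIFS":
        # passthrough is already the default; shown explicitly for clarity.
        return [f"fs_dhsm -modify {fs_name} -backup passthrough"]
    if method == "NFS":
        # NFS backups always include secondary content; passthrough keeps
        # them from recalling that content to the primary tier.
        return [f"fs_dhsm -modify {fs_name} -read_policy_override passthrough"]
    raise ValueError(f"unsupported backup method: {method}")


for m in ("NDMP", "CIFS", "NFS"):
    print(m, single_fs_backup_settings(m, "fs_name"))
```

Only the NDMP setting differs from the default; the CIFS and NFS lines mainly guard against a previously changed configuration.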
Recommendation #33 To reduce the backup window, back up primary and secondary file systems separately

You can reduce the time needed to back up the primary tier file system by taking separate backups of the primary tier and secondary tier file systems that make up the data set. Configure the backups of the primary file system to be stub-aware so that file content already archived to the secondary file system is not backed up again. This reduces the size of the primary file system backup and therefore the time it takes to perform the backup.

Taking separate backups of the primary and secondary tiers requires stub-aware backups of the primary file system. How stub-aware backup processing is enabled depends on the backup method.

NDMP: For NDMP backups, set the following NDMP environment variable in the backup job definition to exclude secondary tier content from the primary file system backup:

EMC_OFFLINE=n

This is the default setting. NDMP supports integrated snapshots that automate checkpoint creation, management, and deletion if the SNAPSURE=y environment variable is configured for qualified vendor backup software.

NDMP Volume Backup (NVB): Celerra supports an EMC-specific NDMP backup mechanism called NVB, also known as Volume Based Backup (VBB). NVB backs up data blocks at the volume level rather than at the file level, and it reads disk data blocks more efficiently than traditional file-based backups do. NVB does not support the NDMP environment variable EMC_OFFLINE_DATA; when using NVB, back up the contents of migrated files and offline files independently. In addition to providing faster backups, NVB retains the benefits of deduplication on secondary tier file systems because NVB operates at the block level.
Therefore, it does not reinflate deduplicated files during backup operations. Note that NVB does not support single-file or file-by-file restore.

CIFS: When backing up primary tier file systems, set the Celerra FileMover -backup option to offline so that only stub files and files that have not been archived are backed up. Execute the following command at the Celerra CLI prompt to set this option:

$ fs_dhsm -modify fs_name -backup offline

where fs_name is the name of the primary tier file system.

NFS: It is not possible to configure NFS backups of the primary file system to be stub-aware. NFS backups of the primary tier file system include all associated secondary tier file content, as described in Recommendation #32 on page 28.

Recommendation #34 When using the separate backup technique, back up the secondary file system after FMA operations

Back up the secondary file system after every FMA policy engine scan or orphan file cleanup operation. The secondary tier file system is updated only by FMA, and only during archiving or orphan file cleanup, so backups of the secondary file system are needed only after one of these operations has run.

Restore recommendations

This section details the restore recommendations.

Recommendation #35 Use single-tier migration techniques when performing the full restore of backups containing both primary and secondary tier files

A full restore of a primary file system backup that contains both primary and secondary file system content (that is, a backup created with the read passthrough option) writes the complete contents of both tiers into the primary tier file system. The content that was originally on the secondary tier can be rearchived to the secondary tier by subsequent passes of the FMA policy engine scan.
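The capacity problem behind this recommendation is simple arithmetic: the restored total is the primary tier content plus the archived content. The following Python sketch, with hypothetical names and sizes, computes how far a direct full restore would overrun the primary tier file system; a non-zero result indicates that the single-tier migration technique is needed.

```python
# Hypothetical sizing check for a full restore of a single-file-system
# (passthrough) backup. All names and sizes are illustrative.


def restore_shortfall_gb(primary_capacity_gb: int,
                         primary_content_gb: int,
                         archived_content_gb: int) -> int:
    """Return how many GB a direct full restore would exceed the primary
    tier file system by (0 if it fits)."""
    # A full restore writes the content of both tiers into the primary tier.
    restored_total_gb = primary_content_gb + archived_content_gb
    return max(0, restored_total_gb - primary_capacity_gb)


# 400 GB of active data plus 1,600 GB of archived data restored into a
# 500 GB primary tier file system overruns it by 1,500 GB.
print(restore_shortfall_gb(500, 400, 1600))  # 1500
```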
Because most tiered data sets are larger than the primary tier file system, performing a full restore of data sets that were backed up as a single file system (using the techniques in Recommendation #32 on page 28) requires single-tier migration techniques. Use the technique described in Recommendation #25 on page 23 to avoid overfilling the primary tier file system when restoring backups that are larger than the primary tier file system.

Recommendation #36 Preserve stub file synchronization when restoring separate backups of primary and secondary tier data

The disadvantage of the separate backup strategy described in Recommendation #33 on page 28 is that restore operations require special care to ensure that the stub files in the restored primary file system contain valid links to the associated file content on the secondary file system. When restoring a separate backup to either the primary or secondary tier, follow these guidelines:
♦ If the primary backup is more than 30 days old and orphan file cleanup procedures are regularly used, restore the secondary tier from a backup taken more recently than the primary backup but within 30 days of it.
♦ Restoring a backup of the secondary file system requires restoring a backed-up copy of the primary file system that is older than the secondary backup.
♦ Single file or directory-level restores from primary file system backups that are more than 30 days old may require administrative action to determine whether older versions of archived files must be recovered from secondary file system backups.

Note: When restoring a secondary file system that is shared by several primary file systems, it may be necessary to restore all primary file systems to the same recovery point. This is because all stub files on all primary file systems must have corresponding files on the shared secondary.
As a result, you may need to revert all primary file systems to earlier versions.

SnapSure recommendations

This section details SnapSure™ recommendations.

Recommendation #37 Take separate snapshots of primary and secondary file systems

For tiered data sets, take snapshots of the primary and secondary file systems separately. When snapshots are used to back up consistent point-in-time images of a live file system, delete each snapshot after the backup operation completes so that Celerra can release storage space during subsequent archiving and deduplication operations.

Recommendation #38 Take snapshots of secondary file systems only after FMA activities

Because the policy engine is the only process that modifies files on the secondary file system, it is sufficient to take snapshots of the secondary file system only after FMA activities such as archiving jobs or orphan file cleanups complete. This is similar to Recommendation #34 on page 29.

Recommendation #39 Preserve stub file synchronization when restoring snapshots of primary or secondary tier file systems

Because snapshots are essentially separate backups of the primary and secondary file systems, when restoring snapshots of either file system, ensure that the stub files on the primary file system remain synchronized with the archived data files on the secondary file system. When restoring entire file systems from snapshots, follow the guidelines listed in Recommendation #36 on page 30 for restoring separate primary and secondary backups.

Recommendation #40 Do not keep primary tier snapshots for more than 30 days

User-initiated restores of individual files or directories from primary file system snapshots through the Microsoft “Previous Versions” tab may result in stub files on the primary file system that do not match the corresponding file versions on the secondary file system.
This situation is more likely with snapshots older than 30 days and can generally be avoided by keeping snapshots for shorter periods.

Recommendation #41 Minimize the number and age of secondary tier snapshots

Snapshots reduce the effectiveness of deduplication because the space saved in the secondary file system during deduplication is added to the SavVol storage usage of the previous snapshots until those snapshots are deleted. To realize the full capacity savings of the automated storage tiering architecture, keep snapshots only for short-term use. You can automate this with the Celerra Manager checkpoint scheduling feature, which lets an administrator configure the system to take snapshots automatically at a specific time of day, on specific days of the week or month, and to keep a specific number of snapshots. For example, daily snapshots can be configured to keep only seven copies, ensuring that no snapshot on the system is more than a week old.

Recommendation #42 Minimize the number and age of primary tier snapshots

Snapshots reduce the effectiveness of tiering because the space saved in the primary tier file system during an archiving operation is added to the SavVol storage usage of previously created snapshots until those snapshots are deleted. To realize the full capacity savings of the automated storage tiering architecture, keep snapshots only for short-term use by configuring snapshot scheduling as described in Recommendation #41.

Recommendation #43 Store primary tier file system snapshots on secondary tier storage

The goal of the automated storage tiering architecture is to limit the use of primary tier storage to active data. Snapshots reduce the effectiveness of tiering because the space saved in the primary tier file system during an archiving operation is added to the SavVol storage usage of previously created snapshots until those snapshots are deleted.
For this reason, store snapshots of primary tier file systems on the secondary tier to preserve primary tier capacity. However, if high write IOPS performance for files on primary storage is more important than conserving primary tier capacity, store the snapshots on the primary tier storage. Snapshots do not affect read IOPS performance.

Replication recommendations

This section details the replication recommendations.

Recommendation #44 Replicate both primary and secondary file systems

Although it is possible to replicate only the primary file system in a tiered storage architecture and have the stubs on the replicated copy point back to the non-replicated secondary file system at the source site, this is not recommended for the automated storage tiering architecture. Because this architecture stores a tiered data set’s content on a single Celerra system, replicate both the primary and secondary tier file systems in separate replication sessions so that all of the data set’s content is protected against site disasters that affect the source Celerra.

Recommendation #45 Configure source and destination FMA devices

Although only a single FMA device (at the replication source site) is required for basic disaster protection, EMC recommends providing a second FMA device (at the replication destination site) so that FMA policy engine scans can continue during a source site outage. Without an FMA device to perform archiving at the destination site, continued data set growth during an extended source site outage might fill the destination site’s primary tier file system to capacity.
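To gauge how long a destination site can run without an FMA device, divide the free space on the destination primary tier file system by the daily growth of the data set. A minimal Python sketch with hypothetical figures follows; it assumes constant growth and no file system extension, so treat the result as a rough estimate only.

```python
# Rough estimate of how long a destination primary tier file system can
# absorb data set growth while no FMA device archives at that site.
# Assumes constant daily growth and no file system extension; all figures
# are hypothetical.


def days_until_full(capacity_gb: float, used_gb: float,
                    daily_growth_gb: float) -> float:
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: the file system never fills
    free_gb = max(0.0, capacity_gb - used_gb)
    return free_gb / daily_growth_gb


# 1,000 GB file system, 800 GB used, growing 10 GB per day.
print(days_until_full(1000, 800, 10))  # 20.0
```

If the estimate is shorter than the expected recovery time of the source site, a destination FMA device becomes more than a convenience.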
Recommendation #46 Populate and deduplicate the source site secondary tier file system before configuring replication

Deduplicating the contents of a file system before it is replicated can greatly reduce the amount of data sent over the network during the initial baseline copy. Therefore, do not configure replication until the data set has been migrated to the source site and a deduplication scan has completed on the archived data in the source site secondary tier file system. Once replication and deduplication are running together, most newly archived files are replicated before they are deduplicated, so deduplication has little effect on the steady-state quantity of data replicated. However, deduplication saves the same amount of space in the replicated file system at the source and destination sites, reducing overall secondary tier disk usage at both sites.

Recommendation #47 Use the same CIFS domain on replicated sites

If you use CIFS connections on the primary storage, the CIFS server on the destination side of Replicator must be in the same domain as the CIFS server on the source Data Mover. Otherwise, the CIFS connections on the replicated file system do not work, and any attempt to access the stub files on the replicated file system results in I/O errors.

Recommendation #48 Use local host files instead of DNS entries to resolve secondary tier hostnames when using replication

Use local host files on Data Movers, rather than DNS, for FileMover secondary server hostname resolution. On each Data Mover that hosts a FileMover primary file system, use the local host file to resolve the secondary file system’s hostname. At the production site, associate the address of the production site secondary server with the secondary server hostname entry in the local host file.
Conversely, at the replication destination site, associate the address of the destination site secondary server with the secondary server hostname entry in the local host file. This causes the primary Data Mover at each site to retrieve archived files from the secondary server at the same site, so archived file retrieval happens locally without traversing the WAN.

You can add a further level of disaster protection by listing the IP address of the secondary server at the remote site as a second IP address in the local host file entry for the secondary server hostname. Each site then uses its local secondary server and automatically falls back to the remote secondary server if the local one is not available.

Use the following Celerra CLI command to retrieve a Data Mover’s local host file:

$ server_file server_2 -get /.etc/hosts hosts

where server_2 is the name of the Data Mover and hosts is the name of the local file on the Control Station that receives the contents of the Data Mover file /.etc/hosts.

Next, edit the retrieved hosts file on the Control Station, and copy the edited file back to the Data Mover to replace the contents of the Data Mover’s /.etc/hosts file:

$ server_file server_2 -put hosts /.etc/hosts

where server_2 is the name of the Data Mover and hosts is the name of the edited file on the Control Station.

If you use local host file name resolution for the secondary server hostname, do not create DNS entries for the secondary server hostnames.
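The edit step can be sketched in Python as a function that rewrites the retrieved hosts file text before it is copied back with server_file -put. This is a hypothetical sketch: the function name, hostname, and addresses are placeholders, and it assumes that the "second IP address" for the hostname is expressed as a second line after the local one, with the first matching line taking precedence. Confirm the exact entry format your Data Mover release supports.

```python
# Hypothetical sketch: rewrite the secondary server entries in a copy of a
# Data Mover's /.etc/hosts retrieved with server_file -get. Assumes the
# first matching line wins, so the local secondary server is listed first
# and the remote-site secondary server second as a fallback.


def set_secondary_entries(hosts_text: str, hostname: str,
                          local_ip: str, remote_ip: str = "") -> str:
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        # Drop any existing line that maps this hostname (even as an alias).
        if len(fields) >= 2 and hostname in fields[1:]:
            continue
        kept.append(line)
    kept.append(f"{local_ip} {hostname}")
    if remote_ip:
        kept.append(f"{remote_ip} {hostname}")
    return "\n".join(kept) + "\n"


# Production site: production secondary first, destination secondary second.
# At the destination site, swap the two addresses.
original = "127.0.0.1 localhost\n10.0.0.5 fm-secondary\n"
print(set_secondary_entries(original, "fm-secondary",
                            "192.0.2.10", "198.51.100.10"), end="")
```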
If you use DNS instead of local host files:
♦ On a Celerra file system or secondary server failover, update the DNS entries for the secondary servers to reference the IP addresses of the secondary servers at the disaster recovery site. This allows stub files in the destination primary tier file system to resolve to the secondary file system on the destination Celerra instead of the secondary file system on the source Celerra.
♦ Flush the DNS cache on the Data Mover if the DNS entry for the secondary server hostname has a Time to Live (TTL) of more than a few minutes.

Do not use multiple IP addresses for a single hostname in DNS, because most DNS servers return all the addresses but rotate their order each time the name is queried. As a result, it is not possible to predict which secondary server a Data Mover recalls data from when the IP addresses of multiple Celerra secondary servers are listed in a single DNS hostname entry.

Chapter 5 Performance Recommendations

This chapter presents these topics:

Introduction
Recommendation #49 Use VLANs to segregate network traffic
Recommendation #50 Monitor read access to the secondary tier file system

Introduction

This section details the performance recommendations.

Recommendation #49 Use VLANs to segregate network traffic

Use VLANs to segregate different types of network traffic to improve throughput, manageability, application separation, high availability, and security.
You can achieve better archiving performance by using separate interfaces on the Rainfinity FMA to read data from the primary tier and to write data to the secondary tier. For FMA/VE, use two separate interfaces configured in two separate VLANs through two virtual switches on the ESX server to enhance archiving performance. Use additional VLANs for client access to the primary tier file system.

Recommendation #50 Monitor read access to the secondary tier file system

High rates of read operations on the secondary tier file system during normal client access to the primary tier file system may indicate that some archived files are more active than intended. Archived files that continue to receive repeated client read requests can be relocated to the primary tier file system by temporarily setting the FileMover read_policy_override option to full. This causes Celerra to automatically recall an archived file to the primary tier file system the next time the file is read. To allow the recall of archived files on read access, use the FileMover fs_dhsm command to set the read_policy_override option to full:

$ fs_dhsm -modify fs_name -read_policy_override full

where fs_name is the name of the FileMover-enabled primary file system.

Closely monitor capacity usage on the primary tier file system while this setting is in effect. You can recall archived files that receive repeated client read requests explicitly, by viewing the files’ contents from a NAS client, or implicitly, by leaving the read_policy_override option set to full for an extended period so that archived files that continue to be read are recalled. After the desired files are recalled, or when primary tier capacity usage reaches the desired target, disable read recalls as described in Recommendation #11 on page 14.
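The decision of when to end a temporary recall window can be expressed as a small rule. The Python sketch below is hypothetical (function names, inputs, and the target percentage are assumptions); it returns full while recalls should continue, and passthrough, the setting this guide uses to disable read recalls, once the wanted files are back or primary tier usage reaches the target.

```python
# Hypothetical rule for ending a temporary read-recall window on the primary
# tier file system. Thresholds and inputs are assumptions; command strings
# follow the fs_dhsm syntax shown in this guide.


def next_read_policy(used_gb: float, capacity_gb: float,
                     target_pct: float, hot_files_recalled: bool) -> str:
    usage_pct = 100.0 * used_gb / capacity_gb
    # Stop recalling once the wanted files are back or capacity hits target.
    if hot_files_recalled or usage_pct >= target_pct:
        return "passthrough"
    return "full"


def policy_command(fs_name: str, policy: str) -> str:
    return f"fs_dhsm -modify {fs_name} -read_policy_override {policy}"


# 820 GB used of 1,000 GB with an 80 percent target: end the recall window.
print(policy_command("fs_name", next_read_policy(820, 1000, 80, False)))
```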
Note that when the read recall policy is set to full, file system scanning applications such as content searches, NFS backups, Windows offline folder synchronization, and client-based antivirus scans may inadvertently trigger recalls of all archived data to the primary file system. This may lead to unintended file system auto-extension or failed client writes. Limit exposure to this risk by minimizing the time during which the read recall policy is set to full. Use the default offline attribute settings described in Recommendation #12 on page 14 to further limit the risk of accidental recalls.

In a controlled environment with little risk of unintentional recalls, you can leave the read_policy_override option set to full during normal operations. This improves client access performance for files that become active again after periods of inactivity: such files are archived while inactive, and this setting causes Celerra to migrate them back to the primary tier automatically when they become active again.