Smart Storage Sizing: Managing Performance and Saving Energy
© IntelliMagic 2012

IntelliMagic
• Storage Performance Management firm
• Based in Leiden, The Netherlands; sales office in Dallas, Texas
• Privately owned by Gilbert Houtekamer and Els Das
• 30 people and growing
• z/OS and SAN solutions

Overview
• Managing large amounts of data
• Tiered Storage
• Migrating to Tiered Storage
• Green Storage

Managing Large Amounts of Data
"When you can, keep everything."

How Large Is Storage Expenditure Projected to Be?
[Chart: projected storage growth from gigabytes toward zettabytes, 2005-2020, with a lower "optimized" curve]
• Storage requirements are growing 20%-40% every year

These Savings Are Possible
[Chart: a real-world example of the benefits that Storage Performance Management can provide]

How to Manage Large Amounts of Data
How to store large amounts of data:
• Not all data is very active
• Depending on how active the data is, an appropriate tier can be selected
• Access density can be used to determine storage requirements
How to quickly access that data:
• Important data needs to be accessed quickly
• Tiers need to be "important data" aware
• Data placement is critical

Access Density
• Access density is the number of I/Os per second per GB of storage
• It is important that a drive be used that can handle the required access density
• For that, it is important to look at the back-end I/Os

Front-end vs. Back-end
• Front-end I/O activity is different from back-end I/O activity
• I/O response time is directly influenced by back-end performance
[Diagram: read hits are served from cache; read misses, random writes, and sequential writes drive back-end activity on the HDD arrays]

Front End vs. Back End
[Charts: front-end and back-end I/Os per second per GB over 24 hours, broken down by storage group (SGBKUP, SGDB2LOG, S2DB2TRN, SGVSAM02, and others)]
• Back-end access density will be different from front-end access density
• Look at the storage group at the back end, at the array level

Access Density: Planning Values
• Access density = number of I/Os per second per GB of storage
• Back-end access density will differ from front-end access density
• The degree depends on the RAID implementation
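To make the planning approach concrete, here is a minimal sketch (my own illustration, not IntelliMagic methodology) that computes a pool's back-end access density and lists the drive types whose planning value covers it; the PLANNING list restates a subset of the table that follows.

  PLANNING = [                    # (drive type, max back-end I/Os per second per GB)
      ("2 TB 7.2K RPM", 0.02),
      ("1 TB 7.2K RPM", 0.04),
      ("450 GB 10K RPM", 0.13),
      ("600 GB 10K RPM", 0.2),
      ("300 GB 15K RPM", 0.3),
      ("146 GB 10K RPM", 0.4),
      ("146 GB 15K RPM", 0.7),
      ("73 GB 15K RPM", 1.4),
      ("300 GB SSD", 10.0),
  ]

  def access_density(backend_ios_per_sec, capacity_gb):
      """Back-end I/Os per second per GB of storage."""
      return backend_ios_per_sec / capacity_gb

  def suitable_drives(density):
      """All drive types whose planning value covers the required density."""
      return [drive for drive, max_ad in PLANNING if max_ad >= density]

  ad = access_density(10_000, 20_000)   # the 20 TB example used later in this deck
  print(ad, suitable_drives(ad))        # 0.5 -> 146 GB 15K, 73 GB 15K, or SSD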
Drive                     Max. back-end access density for planning
300 GB SSD                > 10.0
73 GB, 15K RPM, 3.5"      1.4
73 GB, 10K RPM, 3.5"      0.8
146 GB, 15K RPM, 3.5"     0.7
146 GB, 10K RPM, 3.5"     0.4
300 GB, 15K RPM, 3.5"     0.3
300 GB, 10K RPM, 3.5"     0.2
450 GB, 15K RPM, 3.5"     0.2
450 GB, 10K RPM, 3.5"     0.13
450 GB, 10K RPM, 2.5"     0.25
600 GB, 10K RPM, 2.5"     0.2
1 TB, 7.2K RPM, 3.5"      0.04
2 TB, 7.2K RPM, 3.5"      0.02

Access Density and Drive Size
Suppose the back-end access density is 1 disk access per second per GB:
• A 73 GB drive would have to do 73 accesses per second
• A 146 GB drive would have to do 146 accesses per second
• A 300 GB drive would have to do 300 accesses per second (too many for a hard disk)
For the same access density, HDD busy values on a 146 GB drive will be twice as high as on a 73 GB drive, and on a 300 GB drive they are quadrupled. HDD response time is a function of HDD busy, so for the same back-end access density the performance of the larger drives will be much worse.

Response Time vs. Back-end Access Density
[Chart: HDD response time (ms) vs. back-end I/O rate per GB for SATA 250 GB, 400 GB 10K Fibre, 146 GB 15K Fibre, 300 GB 15K Fibre, 450 GB 15K Fibre, 600 GB 10K SAS, and 146 GB 15K SAS drives]
• The curves for larger drives are much steeper
• This is the effect of the higher HDD utilization for the same access density
• Response time for the large 15K RPM drives quickly exceeds that of the smaller (146 GB) drives

Tiered Storage

Data Lifecycle Management
• Data growth is exponential
• Once created, most data is rarely accessed again
• It should be moved off expensive higher tiers and onto inexpensive lower tiers
• But how do you determine what data should be moved?
• Each tier is a combination of performance, access rate, and cost

HSM
• HSM is an example of a Data Lifecycle Management product
• Typically, data is migrated based on the elapsed time since last access
• Data may stay on a higher tier for many days before it migrates to a lower, inexpensive tier
• Migration is done at the data set level
• When data needs to be accessed, it is recalled back to a higher tier

Possible Tier Implementation
• Tier 0: SSD (Flash), for very high access density data such as checkpoint, RACF, and HSC data sets
• Tier 1: FC/SAS HDD, for data with high access requirements
• Tier 2: SATA HDD or large FC HDD, for passive data sets, HSM ML1, and image copies
• Tier 3: (virtual) tape, for all other data

Tiering
• It may make sense to combine different HDD sizes and RAID types within a single DSS
• A smaller DSS footprint may be achievable by combining different drives within the same hardware
  • Power savings are possible compared to a separate DSS per HDD type
• Each vendor has its own implementation of automatic tiering, internal to the DSS (microcode)
• IntelliMagic Balance is a hardware-vendor-independent, host-based tiering product

Autotiering
• Each vendor has its own size of data "chunk" that can be automatically moved between defined tiers of storage
• Some elapsed time must pass before recommendations are made as to which chunks can be promoted to a higher tier or demoted to a lower tier
• Heat maps are used to identify chunks that can be moved
• Chunk movement is scheduled during low back-end activity (not the case for all vendors!)
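A minimal sketch of the heat-map idea just described (a generic illustration, not any vendor's actual algorithm; the tier numbering, thresholds, and decay factor are assumptions of mine):

  from collections import defaultdict

  PROMOTE_IOPS = 5.0   # assumed promotion threshold (I/Os per second per chunk)
  DEMOTE_IOPS = 0.1    # assumed demotion threshold
  heat = defaultdict(float)   # chunk id -> decayed I/O rate ("heat")

  def record_interval(chunk_ios, interval_s, decay=0.5):
      """Fold one measurement interval into the heat map. chunk_ios maps
      chunk id -> I/O count observed during the interval."""
      for chunk, ios in chunk_ios.items():
          heat[chunk] = decay * heat[chunk] + (1 - decay) * (ios / interval_s)

  def migration_plan(tier_of):
      """tier_of maps chunk id -> current tier (0 = SSD, 1 = FC/SAS, 2 = SATA).
      Returns (chunk, from_tier, to_tier) moves, to be executed later during
      a period of low back-end activity."""
      moves = []
      for chunk, rate in heat.items():
          if rate > PROMOTE_IOPS and tier_of[chunk] > 0:
              moves.append((chunk, tier_of[chunk], tier_of[chunk] - 1))   # promote
          elif rate < DEMOTE_IOPS and tier_of[chunk] < 2:
              moves.append((chunk, tier_of[chunk], tier_of[chunk] + 1))   # demote
      return moves

The exponential decay is one reason autotiering lags sudden workload shifts: a chunk has to stay hot across several intervals before it earns a promotion.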
Autotiering (2)
• Heat is gathered over time, based on I/O rate and throughput
[Diagram: a heat map built per logical volume drives hot chunks to higher-performance HDD and cold chunks to lower-performance HDD]

Vendor Implementations
IBM Easy Tier
• 1 GB extents
• Currently supports 3 tiers
• Migration plan created every 24 hours
EMC FAST (Fully Automated Storage Tiering)
• Variable extent size (360 MB down to 7.6 MB)
• Data may initially be moved at the extent size (360 MB) and then moved back at the sub-extent size (7.6 MB)
Hitachi HDT (Hitachi Dynamic Tiering)
• Pages are 42 MB in size
HP Auto LUN
• Can automatically move LUNs between tiers or within parity groups

IntelliMagic Balance
• Analyzes many days' worth of data to see workload trends
• Creates recommendations based on I/O rate and throughput for the front end and the back end
• Creates before/after charts and heat maps to verify volume migration
• Creates a migration plan to show the required device or storage group data movement
• z/OS host-data aware, and you have the final say on whether to perform a certain move or not

Tiering with SSD

Flash Drives/SSD
• Flash storage is a nonvolatile computer memory that can be electrically erased and reprogrammed
• All major vendors offer Flash drives
• SSDs have a limited number of write (program/erase) cycles, typically up to 100,000
  • "Wear leveling" ensures all cells are written about the same number of times
  • Redundant cells are added, so that cells that become unusable can be replaced
• Flash drives are still very expensive
• There is no significant advantage for sequential I/O
• The small-block read rate is 100 times higher than for a conventional HDD
• Role in multi-tiered storage:
  • Ideally suited for data sets with high read-miss disconnect time
  • Also suited for data sets that have a very high access density and are currently on "short-stroked" DASD

Selecting Data for SSD
1. Look for the highest response time improvement
   a) Selectively move data sets onto SSD that experience high disconnect times due to read misses
   b) Selectively move volumes that experience high levels of random read misses
2. Look for the overall throughput improvement
   This allows for a substantial reduction of the total number of HDDs and a decreased footprint, because the remainder can be placed on slow, large drives

Disconnect Time Selection
[Chart: the best data set candidates for SSD, based on random read disconnect intensity]

Random Read Misses
[Chart: the best volume candidates for SSD, based on random read misses]

Example
• The goal is to replace the current HDDs with a combination of SSDs and larger-capacity drives to reduce the energy footprint
• Determine which volumes must reside on SSD and which volumes can reside on larger-capacity drives
• The decision is based on the access density of the data

Example: 20 TB Disk System
• Horizontal storage groups spread the load over all the array groups
• 10,000 I/Os per second
• Average access density = 0.5 I/Os per second per GB
• Requires 146 GB 15K RPM HDDs
• We need a total of 20 array groups
• Total = 160 HDDs

Example: 20 TB Disk System (continued)
• 10% of the capacity is handling 90% of the load:
  • 2 TB at 9,000 I/Os per second; access density = 4.5; 2 x 146 GB SSD array groups
  • 18 TB at 1,000 I/Os per second; access density = 0.06; 6 x 450 GB 10K RPM array groups
• We need a total of 8 array groups
• Total = 64 HDDs, i.e. 64/160 = 40% of the HDDs previously required
• Fewer HDDs means power savings; for a large DSS a complete I/O frame may be saved!
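A worked version of the arithmetic behind the two slides above, under assumptions of mine that the slides do not spell out: 8-drive RAID-5 (7+1) array groups, so roughly 7 times the drive size is usable per group, and decimal capacities (20 TB = 20,000 GB):

  import math

  TOTAL_GB, TOTAL_IOPS = 20_000, 10_000

  def groups_needed(gb, drive_gb):
      return math.ceil(gb / (7 * drive_gb))    # 7 data drives per 7+1 group

  # One-tier baseline on 146 GB 15K drives (planning density 0.7 covers
  # the average access density of 10,000 / 20,000 = 0.5):
  print(groups_needed(TOTAL_GB, 146) * 8)      # 20 groups -> 160 HDDs

  # Two-tier split: 10% of the capacity carries 90% of the I/O load.
  hot_gb,  hot_iops  = 0.10 * TOTAL_GB, 0.90 * TOTAL_IOPS
  cold_gb, cold_iops = 0.90 * TOTAL_GB, 0.10 * TOTAL_IOPS
  print(hot_iops / hot_gb)                     # 4.5   -> SSD territory
  print(cold_iops / cold_gb)                   # 0.056 -> fits 450 GB 10K (0.13)
  ssd_groups = groups_needed(hot_gb, 146)      # 2 x 146 GB SSD array groups
  hdd_groups = groups_needed(cold_gb, 450)     # 6 x 450 GB 10K array groups
  print((ssd_groups + hdd_groups) * 8)         # 64 drives: 40% of the 160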
Migrating to Tiered Storage

Storage Management
• Data set allocation is controlled through SMS
• DFSMS will assign a data set to a volume in a storage group, and it will try to balance the load across LCUs
• You define the rules it should use
• You assign volumes to LUNs in the storage system
• You assign LUNs to RAID arrays in your storage system
• There is a lot you can do to automatically "tune" allocation

Allocation
• Complexity increases with the level of detail of what is being migrated onto which tier
• For example, migrating a complete SMS storage group does not require recoding the SMS allocation routines; it simply means moving a large amount of data
• The number of storage groups will typically stay the same; the only change is the tier on which they reside

How to Direct Allocations
• Set up SMS storage groups that only contain the same kind of back-end device: size, RPM, attachment, and RAID type
• Fewer storage groups will waste less space
• Use large logical volumes
[Diagram: storage groups SG1 and SG2 on 146 GB 15K arrays, SG3 on 450 GB 10K arrays]

Allocation (2)
• Selectively moving specific volumes means moving less data, but the SMS allocation routines need to be modified
• The number of storage groups will expand, since each storage group may need to be split into two or more
• Volumes still reside in storage groups, so the allocation routines will need to be updated
• Depending on how the allocation routines are coded, this may not be trivial

Allocation (3)
• Selectively moving individual data sets requires a large amount of work to identify the appropriate candidates
• Extensive SMS allocation routine changes are subsequently required; it is not a matter of simply moving data
• It is important that any future allocation of such a data set continues to land on the intended tier (e.g. SSD)

Allocation (4)
A simple sample ACS routine prior to migrating to tiered storage. The DATA_HLQ filter list would have to be modified and/or an additional filter list would have to be defined:

  FILTLIST DATA_HLQ INCLUDE(DB2PROD.**,DB2TEST.**,DB2DEV.**)
  /**************************************************************/
  /* DATA STORAGE GROUP                                         */
  /**************************************************************/
  WHEN (&DSN = &DATA_HLQ)
    SET &STORGRP = 'DATA'

Allocation (5)
A new DATA_HLQ_HIGH filter list would have to be created for those data sets that belong on SSD (for example):

  FILTLIST DATA_HLQ INCLUDE(DB2PROD.**,DB2TEST.**,DB2DEV.**)
  FILTLIST DATA_HLQ_HIGH INCLUDE(DB2PROD.HIGH.**,DB2PROD.IMPORT.**)
  /**************************************************************/
  /* DATA STORAGE GROUP                                         */
  /**************************************************************/
  WHEN (&DSN = &DATA_HLQ_HIGH)
    SET &STORGRP = 'DATAHIGH'   /* SSD STORAGE GROUP */
  WHEN (&DSN = &DATA_HLQ)
    SET &STORGRP = 'DATA'       /* TIER 1 STORAGE GROUP */
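Since reworked filter lists are easy to get wrong, a preview step helps. The sketch below is a hypothetical helper of mine, not a DFSMS facility: it crudely approximates ACS '**' matching (any number of qualifiers) with Python's fnmatch, so the new routing can be sanity-checked against a data set name inventory before activation:

  import fnmatch

  DATA_HLQ      = ["DB2PROD.**", "DB2TEST.**", "DB2DEV.**"]
  DATA_HLQ_HIGH = ["DB2PROD.HIGH.**", "DB2PROD.IMPORT.**"]

  def matches(dsname, filtlist):
      # Approximate ACS '**' with a fnmatch '*' across the whole name.
      return any(fnmatch.fnmatch(dsname, pat.replace("**", "*")) for pat in filtlist)

  def storage_group(dsname):
      # Same order as the ACS routine above: the more specific filter wins.
      if matches(dsname, DATA_HLQ_HIGH):
          return "DATAHIGH"          # SSD storage group
      if matches(dsname, DATA_HLQ):
          return "DATA"              # tier-1 storage group
      return None                    # would fall through to other ACS logic

  for dsn in ["DB2PROD.HIGH.TRANTAB", "DB2PROD.ARCHIVE.D2012", "DB2TEST.LOADLIB"]:
      print(dsn, "->", storage_group(dsn))
  # DB2PROD.HIGH.TRANTAB -> DATAHIGH; the other two -> DATA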
Automated Tiering
• Transparent to the host operating system
• Less work to implement compared to manual tiering
• May lag behind the desired performance requirement (e.g. IBM uses a 24-hour window)
• Doesn't work at the secondary location: after a move to the secondary location it will take some time before the "chunks" are on the right tier there
• Not suitable for workloads that reallocate active data sets regularly

Green Storage

Power Requirements
"Less than one third of any disk drive contains useful data. But you are powering the other 66%."*
* Sun Microsystems
This doesn't mean migrating 66% of your data to tape, but selecting the right HDD technology for your data to minimize cost. Each HDD technology has its own power requirements.

Factors that Influence Power
1. Number of drives: every platter consumes power
2. RPM: the higher the rotation speed, the more power used
3. Size: the larger the drive, the more power used, because more platters and more heads must be moved
4. Technology: FATA/SATA interfaces use less power than Fibre

Power Usage per Drive Type
[Chart: watts per drive (0-20 W) vs. disk drive capacity (73 GB to 1 TB) for Fibre 15K, Fibre 10K, SATA, SAS 15K, SAS 10K, and SSD]

Observations
• Migrate to smaller, efficient HDDs (2.5-inch SAS) for energy savings
• The RAID configuration also plays a part in energy consumption
• SATA drives have a low energy requirement but may not be able to handle the I/O load demand
• Implementing SSDs together with larger-capacity drives will reduce the total energy requirement

RAID Revisited
• RAID-10: 4 data disks and 4 mirrored disks per group of 8 drives; 100% drive and power overhead for protection
• RAID-5: 7 data disks and 1 parity disk per group of 8 drives; 14% drive and power overhead for protection
• RAID-6 uses 2 parity disks per 8 drives, so it is obviously less energy efficient; it is used for extra protection, e.g. for 1 TB SATA, and is not discussed here

More Arms in RAID-10
• RAID-10 for the same net capacity gives you more arms
• More arms can process more read-miss I/Os
• For high access densities, those extra arms may be needed
• But you could also simply add extra RAID-5 arrays, such that you have as many drives as RAID-10 would have
• So still no green reason to pick RAID-10!

RAID Writes
• So is RAID-10 never green? Well, let's look at the writes
• RAID-10 and RAID-5 behave differently for write operations
• And random and sequential writes are treated differently on RAID-5

Random Write Penalty
• A random write operation results in 2 writes for RAID-10 (primary and mirror copy)
• But in 4 operations for RAID-5:
  • The old data and the old parity are read (2 reads)
  • The new parity is computed
  • The new data and the new parity are both written (2 writes)
• So RAID-10 supports double the RAID-5 random write throughput per array group

Sequential Efficiency
• For sequential workloads the RAID-5 scheme is more efficient: all data drives and the parity drive are written as a full stripe
• So the RAID-5 write overhead for sequential workloads is only 1/7th
• RAID-10 still requires two writes for every application write
• So RAID-5 supports almost double the RAID-10 sequential write throughput per array group
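The two write behaviours can be folded into one simple model (my sketch, using the 7+1 and 4+4 layouts from the RAID Revisited slide; cache hits are assumed to cost no back-end access and sequential writes to destage as full stripes):

  def backend_accesses(read_miss, random_write, seq_write, raid):
      """All rates in I/Os per second; returns back-end accesses per second."""
      if raid == "RAID-5":
          return (read_miss
                  + 4 * random_write        # read old data + parity, write both
                  + seq_write * 8 / 7)      # full stripe: 1/7th parity overhead
      if raid == "RAID-10":
          return read_miss + 2 * (random_write + seq_write)  # every write twice
      raise ValueError(raid)

  # A random-write-heavy mix favors RAID-10:
  print(backend_accesses(1000, 800, 0, "RAID-5"))    # 4200.0
  print(backend_accesses(1000, 800, 0, "RAID-10"))   # 2600
  # ...while a sequential-write-heavy mix favors RAID-5:
  print(backend_accesses(200, 0, 1400, "RAID-5"))    # 1800.0
  print(backend_accesses(200, 0, 1400, "RAID-10"))   # 3000

As the two example mixes show, the winner depends entirely on the workload, which is exactly the point of the next slide.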
Reality: Mixed Workloads
• Real workloads are a varying mix of reads and writes, cache hits and misses, random and sequential I/O
• So the number of back-end accesses can go either way:
  • more for a RAID-5 implementation
  • more for a RAID-10 implementation
  • or almost the same for both

RAID-5 or RAID-10?
Questions to be answered:
• How many disks are minimally needed to satisfy the capacity requirements?
• How many disks are minimally needed to support the peak throughputs for RAID-5 versus RAID-10?
  • Translate front-end I/Os to back-end accesses for each RAID type
• Workloads with a large random-write fraction and a large throughput could benefit from RAID-10
• You may need to over-configure (short-stroke) anyhow

Summary
• Large amounts of data require careful planning to ensure that data access times are good while the cost of storing the data stays low
• A combination of fast SSDs and slower, larger-capacity drives creates an efficient storage hierarchy: adequate performance with reduced power consumption
• It is important to place the correct data on the correct tier

Thank You