HYDRAstor: a Scalable Secondary Storage 7th TF-Storage Meeting September 9th 2010 Łukasz Heldt Largest Japanese IT company $43 Billion in annual revenue 143,000 staff www.nec.com Owns & sells Polish R&D company 50 engineers and scientists www.9livesdata.com R&D of critical backend component Scalable disk based storage for backup with global deduplication Started in 2003 in NEC Labs by Cezary Dubnicki 2007 Product of the year award by SearchStorage.com 2008 Product innovation award by Network Products Guide 2009/2010 FAST conference publication in San Jose Sold in US and Japan since 2007 Will be sold in Poland in 2011 by 9LivesData in coop. with NEC HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Backup storage ● Tapes are most common, despite: ● Sensitive environment requirements ● Unreliable restore ● Low performance ● Manual labor or expensive robots ● Problematic replication 3 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Backup storage size ● Usual backup policy ● 4-12+ full backups ● 7-30+ incremental ● ● Majority of data does not change ● Secondary storage size: ● ● Data compression 2:1 ● 5x-20x more than primary storage Includes many copies of the same data Each data chunk stored 5-10+ times HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Backup storage size ● Usual backup policy ● 4-12+ full backups ● 7-30+ incremental ● ● Majority of data does not change ● Secondary storage size: ● ● Data compression 2:1 ● 5x-20x more than primary storage Includes many copies of the same data Each data chunk stored 5-10+ times High potential for the deduplication technology. HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Deduplication ● Save disk space by eliminating duplicates ● Sample reduction ratio 10:1 ● Lowers price of gigabyte (depends on backup policy) Sub-file level deduplication File A A B C File B A D E File A A B C Only unique blocks are stored Stored blocks A B C D E HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Global deduplication ● Prevent silos of deduped data ● One system to manage Global vs. siloed dedup 7 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor product ● Provides ● ● global deduplication using DataRedux™ performance, storage scalability and data resiliency using Distributed Resilient Data™ 8 HYDRAstor deployment ● Interface: CIFS, NFS, Symantec OST ● Marker filtering for: Tivoli, Netbackup, Networker, CommVault 9 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 10 HYDRAstor architecture ● Accelerator Nodes realize performance ● Storage Nodes realize capacity NFS / CIFS / OST over Ethernet Accelerator Nodes Internal Network Storage Nodes HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 11 HYDRAstor architecture ● Accelerator Nodes realize performance ● Storage Nodes realize capacity NFS / CIFS / OST over Ethernet Accelerator Nodes Internal Network Storage Nodes Non-disruptive grid expansion HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor scalability ● ● ● MiniHYDRA – single server ● Storage: 12 TB – 240 TB* ● Performance: 1.3 TB / hour 2AN 4SN ● Storage: 48 TB – 960 TB* ● Performance: 3.6 TB / hour 20AN 40SN (4 racks) ● Storage: 480 TB – 9600 TB* ● Performance: 36 TB / hour * - assuming 20x data reduction through DataRedux™ 12 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 13 HYDRAstor scalability ● Slide from Curtis Preston presentation Curtis Preston is a famous storage analyst owning independent consulting company HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor other features ● ● Fully automatic/non-disruptive mgmt ● Recovery of lost data resiliency ● Periodic data scrubbing ● Machine and disk failure recovery Configurable redundancy level ● erasure coding – better than RAID6 ● Optimized replication ● Smart resource management 14 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor backend design Details of the design: http://www.usenix.org/events/fast09/tech/full_papers/dubnicki/dubnicki.pdf 15 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 16 Programming Model ● Repository of blocks ● Content-addressed ● Immutable ● Variable-sized hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 17 Programming Model ● Repository of blocks ● Content-addressed ● Immutable ● Variable-sized Exposed pointers to other blocks E 011. .0 ● hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 18 Programming Model ● ● Repository of blocks ● Content-addressed ● Immutable ● Variable-sized Exposed pointers to other blocks Trees of blocks hash=010..1 Root1 E E E 011. .0 ● hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 19 Programming Model ● ● Content-addressed ● Immutable ● Variable-sized Exposed pointers to other blocks Trees of blocks ● DAGs due to deduplication ● No cycles possible Root2 E Root1 E hash=110..0 E 01 1. .0 ● Repository of blocks hash=010..1 E 011. .0 ● hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 20 Programming Model ● ● ● Content-addressed ● Immutable ● Variable-sized Exposed pointers to other blocks Trees of blocks ● DAGs due to deduplication ● No cycles possible Deletion of whole trees Root2 E Root1 E hash=110..0 E 01 1. .0 ● Repository of blocks hash=010..1 E 011. .0 ● hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 21 Programming Model ● ● ● Content-addressed ● Immutable ● Variable-sized Exposed pointers to other blocks Trees of blocks ● DAGs due to deduplication ● No cycles possible Deletion of whole trees Root2 E Root1 E hash=110..0 E 01 1. .0 ● Repository of blocks hash=010..1 E 011. .0 ● hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 22 Programming Model ● ● ● Content-addressed ● Immutable ● Variable-sized Exposed pointers to other blocks Trees of blocks ● DAGs due to deduplication ● No cycles possible Deletion of whole trees Root2 E Root1 E hash=110..0 E 01 1. .0 ● Repository of blocks hash=010..1 E 011. .0 ● hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 23 Programming Model ● ● ● Content-addressed ● Immutable ● Variable-sized Root2 E hash=110..0 Exposed pointers to other blocks Trees of blocks ● DAGs due to deduplication ● No cycles possible Deletion of whole trees 01 1. .0 ● Repository of blocks E 011. .0 ● hash=011..0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 24 Original Fragments Original block Any 3 fragments can be lost Decode Encode Example: N=8, m=5 Redundant Fragments Failure tolerance: erasure coding HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 25 Original Fragments Original block Decode Encode Example: N=8, m=5 Redundant Fragments Failure tolerance: erasure coding Any 3 fragments can be lost Assuming 12 disks array Mirror 3-copy RAID6 Erasure coding Resiliency 1 2 2 2 3 Overhead 100% 200% 20% 20% 33% HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 26 Scalability with DHT: data placement ● Block location: DHT with prefix routing empty prefix 0 00 1 01 10 11 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 27 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix hash=011..0 empty prefix Block 0 00 1 01 10 11 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 28 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix ● hash=011..0 empty prefix Block Prefix components ● ● 0 Hosted on SNs N components per prefix 00 Node 1 1 Node 1 2 3 1 Node 3 2 Node 4 1 0 1 01 1 N=4 10 11 0 2 1 3 0 1 Node 5 2 1 Node 6 3 2 1 3 0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 29 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix ● Prefix components ● ● ● Store fragments Block 0 Hosted on SNs N components per prefix hash=011..0 empty prefix 00 Node 1 1 Node 1 2 3 1 Node 3 2 Node 4 1 0 1 01 1 N=4 10 11 0 2 1 3 0 1 Node 5 2 1 Node 6 3 2 1 3 0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 30 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix ● Prefix components ● ● ● ● Store fragments Distributed consensus Block 0 Hosted on SNs N components per prefix hash=011..0 empty prefix 00 Node 1 1 Node 1 2 3 1 Node 3 2 Node 4 1 0 1 01 1 N=4 10 11 0 2 1 3 0 1 Node 5 2 1 Node 6 3 2 1 3 0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 31 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix ● Prefix components ● ● ● ● Store fragments Distributed consensus Block 0 Hosted on SNs N components per prefix hash=011..0 empty prefix 00 Node 1 1 Node 1 2 3 1 Node 3 2 Node 4 1 0 1 01 1 N=4 10 11 0 2 1 3 0 1 Node 5 2 1 Node 6 3 2 1 3 0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 32 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix ● Prefix components ● ● ● ● Store fragments Distributed consensus Block 0 Hosted on SNs N components per prefix hash=011..0 empty prefix 1 00 01 10 Node 1 2 3 1 1 1 Node 3 2 Node 4 1 0 0 1 Node 5 1 2 N=4 11 Node 1 1 Node 6 3 0 3 2 2 1 3 0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 33 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix ● Prefix components ● ● ● ● Store fragments Distributed consensus Block 0 Hosted on SNs N components per prefix hash=011..0 empty prefix 1 00 01 10 Node 1 2 3 1 1 1 Node 3 2 Node 4 1 0 0 1 Node 5 1 2 N=4 11 Node 1 1 Node 6 3 0 3 2 2 1 3 0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 34 Scalability with DHT: data placement ● Block location: DHT with prefix routing ● Block mapped to hash prefix ● Prefix components ● ● ● ● ● Store fragments Distributed consensus Load balancing Block 0 Hosted on SNs N components per prefix hash=011..0 empty prefix 1 00 01 10 Node 1 2 3 1 1 1 Node 3 2 Node 4 1 0 0 1 Node 5 1 2 N=4 11 Node 1 1 Node 6 3 0 3 2 2 1 3 0 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 35 Data organization: synchrun chains A B C D E F G ● Data stream split to blocks HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 36 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E Hash 000… F Hash 011… G Hash 100… ● Data stream split to blocks ● Hashes of blocks computed HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 37 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E Hash 000… F Hash 011… G Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 38 Data organization: synchrun chains A Hash 010… B Hash 101… Prefix 01 C Hash 110… D Hash 011… E Hash 000… F Hash 011… G Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 39 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E F Hash 000… Prefix 01 Compression Erasure Coding Hash 011… G Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 40 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E F Hash 000… Hash 011… G Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT Prefix 01 Compression Erasure Coding Component 0 Component 1 Component 2 Component 3 ● Erasure-coded fragments stored by components HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 41 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E F Hash 000… Hash 011… G Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT Prefix 01 Compression Erasure Coding Component 0 Component 1 Component 2 ● A D F A D F A D F Component 3 A D F Erasure-coded fragments stored by components HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 42 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E F Hash 000… G Hash 011… Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT Prefix 01 Compression Erasure Coding Synchrun 1 Synchrun 2 Component ● Synchrun 3 A D F 0 Component A D F 1 Component A D F 2 Component 3 A D F Synchrun ● Erasure-coded fragments stored by components Grouped into synchruns HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 43 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E F Hash 000… G Hash 011… Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT Prefix 01 Compression Erasure Coding Synchrun 1 Synchrun 2 Component ● Synchrun 3 A D F 0 Component A D F 1 Erasure-coded fragments stored by components ● Grouped into synchruns ● Containers stored on disks ● Component A D F 2 Component 3 Container A D F Synchrun Fragment metadata separately from data HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 44 Data organization: synchrun chains A Hash 010… B Hash 101… C Hash 110… D Hash 011… E F Hash 000… G Hash 011… Hash 100… ● Data stream split to blocks ● Hashes of blocks computed ● Routing through DHT Prefix 01 Compression Erasure Coding Synchrun 1 Synchrun 2 Component ● Synchrun 3 A D F 0 Component ● Grouped into synchruns ● Containers stored on disks A D F 1 Erasure-coded fragments stored by components ● Component A D F 2 ● Component 3 Container Fragment metadata separately from data Ordered synchrun chains A D F Synchrun ● Preserve order & locality ● Manageable HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 45 Data Services: Identification of data resiliency level Missing fragments Component 01:0 Component 01:1 Component 01:2 Component 01:3 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data Services: Identification of data resiliency level Component 01:0 Component 01:1 Component 01:2 Component 01:3 Chain scanning 46 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data Services: Identification of data resiliency level Component 01:0 Component 01:1 Component 01:2 Component 01:3 Chain scanning 47 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data Services: Identification of data resiliency level Component 01:0 Component 01:1 Component 01:2 Component 01:3 Chain scanning 48 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data Services: Identification of data resiliency level Component 01:0 Component 01:1 Component 01:2 Component 01:3 Chain scanning 49 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services: reconstruction Component 01:0 Component 01:1 Component 01:2 Component 01:3 ● Sequential read/write of entire Containers ● Erasure decoding and re-encoding 50 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services: reconstruction Component 01:0 Component 01:1 Component 01:2 Component 01:3 ● Sequential read/write of entire Containers ● Erasure decoding and re-encoding 51 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services: reconstruction Component 01:0 Component 01:1 Component 01:2 Component 01:3 ● Sequential read/write of entire Containers ● Erasure decoding and re-encoding 52 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 53 Data services: fast data transfer Component 01:0 Component 01:1 Component 01:2 Component 01:3 Location of new node (DHT) Old component 01:3 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services: fast data transfer Component 01:0 Component 01:1 Component 01:2 Component 01:3 Data transfer Old component 01:3 54 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services: fast data transfer Component 01:0 Component 01:1 Component 01:2 Component 01:3 Data transfer Old component 01:3 55 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services: fast data transfer Component 01:0 Component 01:1 Component 01:2 Component 01:3 Data transfer Old component 01:3 56 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services: fast data transfer Component 01:0 Component 01:1 Component 01:2 Component 01:3 Old component 01:3 57 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 58 Data services for deduplication hash=011.. Block Component 01:0 Component Choose complete chain 01:1 Component 01:2 Component 01:3 Completeness: “definitely not a duplicate” Deletion interaction: wasn't the block scheduled for deletion? HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services for deduplication hash=011.. Block Component 01:0 Component 01:1 Component 01:2 Query Component 01:3 59 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services for deduplication hash=011.. Block Component 01:0 Component 01:1 Component 01:2 Component 01:3 Local candidate found 60 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Data services for deduplication hash=011.. Block Component 01:0 Component 01:1 Component 01:2 Successful dedup Component 01:3 Candidate verification 61 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC On-demand data deletion ● ● ● Distributed garbage collection Per-block reference counter stored perfragment Failure-tolerant ● ● Block reference counter calculated independently on peer Container chains Interference with duplicate elimination: ● duplicates resurrection after garbage collection ● space reclamation in background 62 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Resource management ● Configurable load balancing between: ● backup/restore ● background tasks (reconstruction, transfer, etc.) ● garbage collection ● Shares depend on system state ● Assigns priority of tasks automatically ● ● e.g. reconstruction before transfer or space reclamation Maximizes resources utilization 63 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Topics for further discussion ● Features and technical details of HYDRAstor ● Sales of HYDRAstor in Poland ● Cooperation with 9LivesData on other projects 64 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Questions? Contact: [email protected] www.9livesdata.com www.hydrastor.com 65
© Copyright 2026 Paperzz