Slide - Percona

Exploiting Multi-Core, Flash, and High Performance Networking in Web 2.0 and Clouds
Opportunities and Challenges
Dr. Brian O’Krafka, Fellow
Darpan Dinker, Vice President of Database Technologies
© 2009
SCHOONER INFORMATION TECHNOLOGY: CONFIDENTIAL
Web 2.0 and Cloud Challenges
Unmanageable Data Growth
• Web traffic
• Web transactions
• User interaction response time
[Chart: Database Size, Web Traffic, Web Transactions, and Response Time over Time; axis labels: Billion Transactions, Million Users]
Rack, Power, and Pipe Usage
• Increasing power costs
• Limited datacenter rack space
• Inefficient, underutilized hardware means wasted energy
Data Complexity
• Too much data to process effectively
• Requires extensive data partitioning, application-level mapping, caching, replication/recovery, and load balancing
• Commodity, non-application-specific hardware is difficult to use and manage
Computing Trends and New Technologies
Four transformational technologies:
• Multi-core processors
• State-of-the-art, enterprise-class flash memory
• Low-latency interconnects
• Optimized data access and caching applications
Multi-Core Processors
Benefits:
• High computational density (4-8 cores/chip)
• High-speed data access (on-chip caches)
• High thread-level throughput
• Multi-chip coherency
• Standard and commodity products
Challenges:
• Requires extensive optimization to exploit, including:
  - High thread-level parallelism in software
  - Granular concurrency control
  - Memory affinity to a core
  - Thread affinity to a core
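One common way to get granular concurrency control is lock striping: instead of one global lock, the data is split into shards, each with its own lock, so threads touching different keys rarely contend. The sketch below is illustrative only; the class and function names are hypothetical, and in CPython the global interpreter lock limits real parallel speedup, so it shows the structure rather than the performance gain.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StripedCounterMap:
    """Map of counters guarded by several independent locks (lock striping)."""

    def __init__(self, num_stripes=16):
        # One lock per shard; a key only ever contends with keys in its shard.
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._shards = [{} for _ in range(num_stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._stripe(key)
        with self._locks[i]:
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + amount

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)


def run_demo(num_threads=8, keys=("a", "b", "c", "d"), per_thread=1000):
    """Increment the keys round-robin from several threads; return the totals."""
    counters = StripedCounterMap()

    def worker():
        for n in range(per_thread):
            counters.increment(keys[n % len(keys)])

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(worker) for _ in range(num_threads)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return {k: counters.get(k) for k in keys}
```

With the defaults, each of the four keys receives 250 increments from each of the 8 threads, so every total should come out to 2000 regardless of thread interleaving.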
Leading-Edge Example: Intel® Core™ i7
• Announced April 1, 2009
• 4 cores/chip, 8 simultaneous threads/chip
• 32 kB L1, 512 kB L2, shared inclusive 8 MB L3 cache
• Fast-switching, low-power 45 nm high-k metal gate silicon
• Multi-chip shared memory coherency with point-to-point memory interconnects (4x memory bandwidth)
• Commodity and standard
Enterprise-Class Flash Memory
Compared to HDD:
• 100x faster
• More reliable
Compared to DRAM:
• Consumes 1/100th the power of DRAM
• Much higher density, and therefore capacity
• Cheaper than DRAM
• Persistent when written
Can be organized into modules of different capacities, form factors, and physical and programmatic interfaces
Flash Memory Challenges
Write access behaves differently from read access
Writes are done in large blocks (~128 kB)
• Small writes need to be buffered and combined into large blocks before writing (write coalescing)
Before writing, a region must be erased
• Background garbage collection is needed to create free regions for writing
The number of erase cycles is limited (~100k)
• Writes must be spread uniformly across the total flash memory subsystem to maximize effective lifetime (wear levelling)
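Write coalescing can be sketched as a buffer that accumulates small writes and only hands full blocks to the flash. This is a minimal illustration, not how any particular controller works: the `CoalescingWriter` class and its `flush_block` callback are hypothetical names, and the 128 kB block size is simply the approximate figure from the slide. A real device would also have to protect the buffered data against power loss.

```python
BLOCK_SIZE = 128 * 1024  # ~128 kB write block, the approximate figure above


class CoalescingWriter:
    """Buffers small writes and emits them in full-size blocks."""

    def __init__(self, flush_block, block_size=BLOCK_SIZE):
        self._flush_block = flush_block  # hypothetical callback taking bytes
        self._block_size = block_size
        self._buffer = bytearray()

    def write(self, data):
        self._buffer.extend(data)
        # Emit every complete block; the remainder stays buffered.
        while len(self._buffer) >= self._block_size:
            self._flush_block(bytes(self._buffer[:self._block_size]))
            del self._buffer[:self._block_size]

    def pending(self):
        """Number of bytes buffered but not yet written as a block."""
        return len(self._buffer)


def demo():
    blocks = []
    writer = CoalescingWriter(blocks.append)
    for _ in range(100):
        writer.write(b"x" * 4096)  # 100 small 4 kB writes
    return len(blocks), writer.pending()
```

In the demo, 100 writes of 4 kB (409,600 bytes) become three full 128 kB block writes, with 16,384 bytes still waiting in the buffer.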
All Flash Memory Is Not the Same
            Read B/W     Write B/W    Erase Lat.   Read Lat.    Cost per GB
HDD         100 MB/s     150 MB/s     -            5,000 us     $0.10
NAND MLC    250 MB/s     70 MB/s      3.5 ms       85 us        $3.50
NAND SLC    250 MB/s     170 MB/s     1.5 ms       75 us        $11.00
NOR SLC     58 MB/s      0.13 MB/s    5,000 ms     0.27 us      $70.00
DRAM        2,000 MB/s   2,000 MB/s   -            0.08 us      $75.00
NOR vs. NAND Comparison
NOR flash memory chips:
• Low density
• Long write and erase latencies
• High cost
• Minimal technology investment by tier 1 memory suppliers
• Used primarily in consumer devices
NAND flash memory chips:
• Leading SSDs are all designed with NAND flash
• 2-8x the capacity of equivalent-sized NOR chips
• 5x the bandwidth for large reads, 100x the write bandwidth
• 1/7th the cost of NOR
Flash: SLC vs. MLC Comparison
• MLC increases density by storing more than a single bit per memory cell
• MLC costs less than SLC
• MLC write bandwidth and erase latency are ~2.5 times slower than SLC
• MLC has a significantly lower lifetime than SLC
• SLC is dominant in enterprise flash usage
Wear Levelling: Typical Flash Lifetimes
InnoDB - DBT2
  Drives                      8
  GB/drive                    64
  TPM                         45,000
  NewOrders/sec               750
  Buffer writes/NewOrder      15
  Bytes/page                  4,096
  Writes/sec per node         11,250
  Writes/sec per drive        1,406
  MB/sec/drive                5.5
  Seconds to rewrite drive    11,930
  Hours to rewrite drive      3.3
  Write amplification         3.8
  Page reprogram hours        0.9
  Hours lifetime              87,211
  Years lifetime              10.0

Memcache
  Drives                      8
  GB/drive                    64
  Requests/sec                240,000
  SET fraction                10%
  Bytes/object                2,048
  SETs/sec per node           24,000
  Writes/sec per drive        3,000
  MB/sec/drive                5.9
  Seconds to rewrite drive    11,185
  Hours to rewrite drive      3.1
  Write amplification         3.8
  Page reprogram hours        0.8
  Hours lifetime              81,760
  Years lifetime              9.3
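The lifetime figures above appear to follow from straightforward arithmetic, which the sketch below reproduces. It rests on assumptions not stated on the slide: that "64 GB" means 64 GiB, that "MB" means MiB, that each cell tolerates about 100,000 erase cycles (the ~100k limit quoted earlier in the deck), and that a year is 365 days. The function name is hypothetical.

```python
def flash_lifetime(writes_per_sec_per_node, bytes_per_write,
                   drives=8, drive_bytes=64 * 2**30,
                   write_amplification=3.8, erase_cycles=100_000):
    """Return (lifetime_hours, lifetime_years) for one drive.

    Assumes writes are spread evenly over all drives and, through
    wear levelling, evenly over every page of each drive.
    """
    writes_per_drive = writes_per_sec_per_node / drives
    bytes_per_sec = writes_per_drive * bytes_per_write
    # Time for the host to write one full drive's worth of data.
    rewrite_hours = drive_bytes / bytes_per_sec / 3600
    # Write amplification means each page is reprogrammed more often.
    reprogram_hours = rewrite_hours / write_amplification
    lifetime_hours = reprogram_hours * erase_cycles
    return lifetime_hours, lifetime_hours / (24 * 365)


# InnoDB/DBT2: 45,000 TPM -> 750 NewOrders/sec, 15 page writes each, 4 kB pages.
innodb_hours, innodb_years = flash_lifetime(45_000 / 60 * 15, 4096)

# Memcache: 240,000 requests/sec, 10% of them SETs, 2 kB objects.
memcache_hours, memcache_years = flash_lifetime(240_000 * 0.10, 2048)
```

Under these assumptions the results round to 87,211 hours (10.0 years) for the InnoDB case and 81,760 hours (9.3 years) for the Memcache case, matching the table.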
Flash Form Factor/Interface: SSDs vs. PCIe Cards
• Flash memory can be installed as SSDs or as PCIe cards
• PCIe flash cards use many server processor cycles and much server memory for garbage collection, write coalescing, wear levelling, and mapping:
  - SSDs have ASICs and memory that perform these functions efficiently close to the flash, freeing server processor cores and memory for productive use
• SSDs provide a higher degree of parallelism, balance, and maintainability than PCIe flash cards:
  - Configurations can be adjusted to match workload capacity, bandwidth, and latency requirements
  - Servers can be provisioned with higher capacity than with PCIe flash memory subsystems
  - SSDs are easier to replace than PCIe cards
The Intel® X25-E Extreme SSD
• Leading-edge example of flash technology
• 64 GB 2.5” SATA SSD
• 10 parallel channels accessing SLC NAND flash memory
• Native command queuing for up to 32 concurrent operations
• Each device delivers 55k IOPS at 75 us read latency, with write buffering for virtually instantaneous writes
• On-board ASIC provides write coalescing, wear levelling, and space management
Challenges: Incorporating Flash Memory
Requires optimized apps and an optimized server operating environment to utilize the potential I/O throughput and bandwidth, including:
• High degree of parallelism
• Granular concurrency control
• Low-latency access paths, interrupt batching
• Write synchronization
• Space management, caching, persistence control, wear-level control
Flash memory driver, controller, and device firmware optimization and tuning are required to match workload characteristics
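Why parallelism matters for flash can be seen with Little's law (requests in flight = throughput x latency), applied to the X25-E figures quoted earlier in the deck: 55k IOPS at 75 us read latency. This is a back-of-the-envelope sketch with hypothetical function names; it assumes latency stays constant under load, which real devices only approximate.

```python
def sync_iops(latency_us):
    """IOPS reachable with only one request outstanding at a time."""
    return 1_000_000 / latency_us


def required_queue_depth(target_iops, latency_us):
    """Little's law: average requests in flight = throughput x latency."""
    return target_iops * latency_us / 1_000_000


single_thread_iops = sync_iops(75)              # about 13,300 IOPS
depth_for_55k = required_queue_depth(55_000, 75)  # about 4.1 requests in flight
```

Under this model, a single synchronous reader reaches only about a quarter of the device's rated IOPS; more than four requests must be kept in flight per device to saturate it, well within the 32-entry native command queue.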
High-Performance Interconnects
                 Latency    Bandwidth
1Gb Ethernet     25.0 us    1 Gb/sec
10Gb Ethernet    6.0 us     10 Gb/sec
Benefits:
• Latencies as low as six microseconds between server nodes
• Feasible to distribute workloads across multiple servers and use replication to multiple server nodes
• Provide high availability and data integrity
Challenges:
• Require apps to handle parallel communication, efficient initiation, completion, asynchrony, and batching
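The value of batching can be estimated with a simple cost model: each message pays the link latency once, plus the time to serialize its payload at the link bandwidth. The sketch below uses the latency and bandwidth figures from the table above; the model itself, and the function names, are assumptions for illustration and ignore protocol overhead, pipelining, and CPU cost.

```python
# (latency in seconds, bandwidth in bits/second), taken from the table above
LINKS = {
    "1Gb Ethernet": (25.0e-6, 1e9),
    "10Gb Ethernet": (6.0e-6, 10e9),
}


def transfer_time(link, payload_bytes, messages=1):
    """Seconds to move payload_bytes split over `messages` sequential messages."""
    latency_s, bits_per_s = LINKS[link]
    return messages * latency_s + payload_bytes * 8 / bits_per_s


def batching_speedup(link, object_bytes, count):
    """How much faster one batched message is than `count` separate ones."""
    total = object_bytes * count
    unbatched = transfer_time(link, total, messages=count)
    batched = transfer_time(link, total, messages=1)
    return unbatched / batched


# Example: 100 objects of 2 kB each over 10Gb Ethernet.
speedup = batching_speedup("10Gb Ethernet", 2048, 100)
```

In this model, small objects are dominated by per-message latency, so sending 100 objects of 2 kB in one batch over 10Gb Ethernet comes out roughly 4.5 times faster than sending them one at a time.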
Challenges: Utilizing New Technologies
Using multi-core, flash, and high-speed networking in Web 2.0 and cloud computing environments requires:
• High thread-level parallelism and granular concurrency control at each level
• Data/thread affinity management down to the core level
• Optimized path lengths for data access and thread context switching
• Multiple caching levels to exploit component technology access time and bandwidth variations
• Optimization for simultaneous multi-threading and flash memory
• Selecting, tuning, and balancing component technologies and configurations to match workloads
• Measurement and analysis tools for implementation and deployment optimization
• Application and storage distribution
• Failure-tolerant replication and recovery mechanisms
Schooner Scalable Hardware Platform
• Intel Nehalem 5550 processors
• IBM System x3650 M2 server platform
• 512 GB flash memory subsystem with highly parallel flash controllers and Intel X25-E SSDs
• High-throughput networking: 1/10 GbE
Schooner Operating Environment
[Architecture diagram, top to bottom]
Networked Service Application
• 1/10 GbE Management
• Application Protocol Handling
Data Fabric
• Object Attributes
• Thread and Core Management
• Synchronization/Concurrency Management
• DRAM Cache Management
• Container Management
• Object Metadata Management
• Replication Management
Admin
• Configure
• Monitor
• Control
• Optimize
Flash Management Subsystem
• Space Allocation and Shard Management
• Object Replacement (cache mode)
• Persistency Management
• Tiered Storage Management
Flash Access
• Asynchronous IO Handling
• Data Striping
• Interrupt Batching
Schooner Appliance for Memcached
Increases throughput and network bandwidth and improves power efficiency compared to legacy Memcached servers
Schooner Appliance for MySQL Enterprise
[Chart: DBT2 TPM per Watt, InnoDB/Disks vs. Schooner; axis from 0 to 150; data labels 7,000 and 55,000]
Provides an 8x performance increase over legacy disks
Schooner Data Access Appliances
First two Schooner appliances:
• Schooner Appliance for Memcached
• Schooner Appliance for MySQL Enterprise
• Designed for Web 2.0 and cloud computing datacenters
• Provide 8x performance improvements
• Consume 1/8th the power and space
• Lower TCO by 60%
• Provide 100% compatibility with existing applications and management tools
Q&A
Comparison of MySQL Performance
NOTE: Durability not guaranteed for “Commodity/Opt. Flash” configuration
Contact Us
For more information:
www.schoonerinfotech.com

(650) 328-4200
[email protected]