LOSF Optimization in OpenStack Swift

Jeff Li
Senior Software Engineer, Technology and Products Center, iQiyi.com
Outline
• Background
  – Introduction
  – Motivation
• Blob Engine
  – Persist objects
  – Locate objects
  – Replicate objects
  – Volume compaction
• Performance
• Future
• Q&A
Background
Who are we
Why Swift
• Simple
• Low cost
• In use since 2012
• Serves video, images, text, etc. at iQiyi
Video Transcoding
(Architecture diagram: Clients and the Transcoding System issue W/R requests through Proxy Node 1 … Proxy Node M, each running standard Swift plus customized middlewares, to Storage Node 1 … Storage Node N, each running standard Swift plus customized services with entry /srv/node/.)
Other Use Cases
• Video snapshots
• Archive with Swift EC
• Social product
Massive small-file storage matters!
Our Problem with LOSF
• Write performance degradation
  – The replication storage engine contributes most of the latency
Storage Engine
• Erasure coding
• Replication Engine
– Every replica is saved as a file
– Metadata is saved as extended attributes
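As a minimal sketch of the "metadata in extended attributes" point: Swift's replication engine pickles the per-object metadata dict and stores it in an xattr on the object file itself. The helper names below and the single-xattr layout are illustrative simplifications; real Swift also handles chunking and checksumming of the metadata.

```python
import os
import pickle

# Swift-style: the metadata dict is pickled and stored in an xattr on the
# object file, so no separate metadata database is needed.
XATTR_KEY = "user.swift.metadata"  # key name used by Swift's DiskFile

def serialize_metadata(metadata: dict) -> bytes:
    # Pickle keeps arbitrary key/value types; Swift uses a similar scheme.
    return pickle.dumps(metadata, protocol=2)

def deserialize_metadata(raw: bytes) -> dict:
    return pickle.loads(raw)

def write_metadata(path: str, metadata: dict) -> None:
    # Requires a filesystem with user-xattr support (e.g. ext4, XFS).
    os.setxattr(path, XATTR_KEY, serialize_metadata(metadata))

def read_metadata(path: str) -> dict:
    return deserialize_metadata(os.getxattr(path, XATTR_KEY))
```

One consequence, relevant to the LOSF problem: every metadata read or write is an extra inode-level operation on a tiny file.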
Write Pipeline of the Replication Engine
Begin → Check if the object exists → Make dirs → Create temp file → Write data → Write metadata → fsync → Drop cache → Rename → Invalidate hash → End
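The durability-critical steps of the pipeline above can be sketched as follows. This is a simplified illustration, not Swift's actual `DiskFile` code: the function name is hypothetical, and the xattr-metadata and suffix-hash-invalidation steps are only noted in comments.

```python
import os
import tempfile

def put_object(device_dir: str, rel_path: str, data: bytes) -> str:
    """Sketch of the replication engine's durable-write steps:
    make dirs -> temp file -> write data -> fsync -> atomic rename.
    (Swift also writes pickled metadata to xattrs, drops the page
    cache, and invalidates the partition's suffix hashes; omitted.)"""
    final_path = os.path.join(device_dir, rel_path)
    os.makedirs(os.path.dirname(final_path), exist_ok=True)  # "make dirs"
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(final_path))
    try:
        os.write(fd, data)   # "write data"
        os.fsync(fd)         # force data to disk before publishing
    finally:
        os.close(fd)
    os.rename(tmp_path, final_path)  # atomic "rename" publishes the object
    return final_path
```

Note that every small object pays for a directory creation, a temp-file inode, an fsync, and a rename, which is exactly the per-object overhead the next slide calls out.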
Why the Replication Engine Is Inadequate for LOSF
• Heavy inode usage
• Heavy random I/O
• Synchronous pipeline
Our attempts
• Expand the cluster
• PyPy
• Hummingbird
None of these resolves the issue completely.
Blob Engine
Blob Store System
• Mainly designed for binary object storage
• Small files are stored in a big file
• File handle with encoded metadata
• Reduces random I/O as much as possible
• Examples: FastDFS, Haystack, SeaweedFS, Ambry, TFS
Blob Store Architecture
• Distributed and fault tolerant
• Central lightweight metadata server
• Data servers
• File handle with encoded information
(Diagram: Clients, a Metadata Server, and Data Servers holding Disk i … Disk n; numbered arrows 1–3 show the request flow.)
Challenges in Swift
• No centralized metadata servers
• No file handle
• File-path-based replication
• Customized object metadata
• WSGI's multiple-workers model
Persist Objects
• Volume files to save needles (objects)
• Embedded key-value database
(Diagram: Disk A holds Volume 0 … Volume n alongside the KV database.)
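A minimal sketch of the persist step: append a needle to a volume file and record its location in an embedded key-value index. The header layout (32-byte object hash + 8-byte length) is illustrative only; the real on-disk format (magic, flags, checksum) is not shown, and a plain dict stands in for the embedded KV database.

```python
import struct

# Illustrative needle header: 32-byte object hash + 8-byte data length.
HEADER = struct.Struct(">32sQ")

def append_needle(volume_path: str, obj_hash: bytes, data: bytes, index: dict):
    """Append one needle to the volume and index it as (offset, size)."""
    with open(volume_path, "ab") as vol:
        offset = vol.tell()
        vol.write(HEADER.pack(obj_hash, len(data)))
        vol.write(data)
    # A dict stands in for the embedded KV database (RocksDB in this talk).
    index[obj_hash] = (offset, HEADER.size + len(data))
    return index[obj_hash]

def read_needle(volume_path: str, index: dict, obj_hash: bytes) -> bytes:
    """Resolve the handle in the index, then do one seek + read."""
    offset, size = index[obj_hash]
    with open(volume_path, "rb") as vol:
        vol.seek(offset)
        stored_hash, length = HEADER.unpack(vol.read(HEADER.size))
        assert stored_hash == obj_hash
        return vol.read(length)
```

The design point: writes become sequential appends to one big file, and reads cost one index lookup plus one positioned read, instead of per-object inode traffic.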
Locate Objects
• Replication Engine
– /account/container/object -> Partition
– Partition -> Disk
• Blob Engine
– Partition
– Disk
– Volume
– Offset
– Size
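Both engines first map the object path to a partition via the ring's hash. A simplified sketch of that mapping is below; the partition power is an example value, and `hash_suffix` stands in for the cluster's `hash_path_suffix` secret, so this is the flavor of Swift's scheme rather than its exact code.

```python
import hashlib
import struct

PART_POWER = 18                 # example partition power, not a Swift default
PART_SHIFT = 32 - PART_POWER

def get_partition(account: str, container: str, obj: str,
                  hash_suffix: bytes = b"") -> int:
    """Simplified Swift-ring-style mapping: md5 the object path,
    then let the top PART_POWER bits of the digest pick the partition."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path + hash_suffix).digest()
    return struct.unpack(">I", digest[:4])[0] >> PART_SHIFT
```

In the blob engine, the partition and disk come from the ring as usual; the extra hop is a KV lookup that resolves the object to (volume, offset, size).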
Locate Objects (cont.)
(Diagram: on each of Disks A, B, and C, objects o1 and o2 land in partitions, and each partition maps to its own volume file, e.g. Partition 0 → Volume 0, Partition x → Volume x.)
Replicate Objects
• Based on the Object Replicator
• Paths are mocked in the key-value database
DB Key: /3/63c/3e19cafe6fc6d71c6ee3fe814ef4d63c/
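The mocked key mirrors Swift's on-disk layout `/<partition>/<suffix>/<hash>/`, where the suffix is the last three hex characters of the object hash. A sketch of building such a key (the function name is hypothetical):

```python
def mock_path_key(partition: int, object_hash: str) -> str:
    """Build a KV key that mimics Swift's on-disk layout
    /<partition>/<suffix>/<hash>/, where the suffix is the last three
    hex characters of the object hash. This lets the stock object
    replicator walk 'directories' that exist only in the database."""
    suffix = object_hash[-3:]
    return f"/{partition}/{suffix}/{object_hash}/"
```

For example, partition 3 and hash `3e19cafe6fc6d71c6ee3fe814ef4d63c` yield the DB key shown on the slide.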
Compact Volumes
• In-place copy
• Punch a continuous file hole in volume files
(Diagram: the original volume holds the superblock, a hole, live needles 4, 6, and 7, and a deleted needle; the compacted volume keeps the superblock with the live needles copied together and the freed space punched as a hole.)
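A sketch of the in-place copy step, using the same illustrative needle header as earlier. For portability this sketch truncates the freed tail; the engine described in the talk instead punches a continuous hole with `fallocate(2)` (`FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`), which Python exposes only via ctypes, so that step is noted in a comment.

```python
import struct

HEADER = struct.Struct(">32sQ")  # illustrative: 32-byte hash + data length

def compact_volume(volume_path: str, index: dict, superblock_size: int = 0):
    """In-place compaction sketch: copy live needles toward the front of
    the volume, update their (offset, size) entries in the index, then
    free the tail. Entries absent from the index are deleted needles and
    are simply skipped. The real engine punches a hole with fallocate
    rather than truncating."""
    live = sorted(index.items(), key=lambda kv: kv[1][0])  # by old offset
    with open(volume_path, "r+b") as vol:
        write_at = superblock_size
        for obj_hash, (offset, size) in live:
            vol.seek(offset)
            needle = vol.read(size)       # read the live needle ...
            vol.seek(write_at)
            vol.write(needle)             # ... and copy it forward in place
            index[obj_hash] = (write_at, size)
            write_at += size
        vol.truncate(write_at)            # release the freed tail
```

Since offsets move, the KV index must be updated in the same pass, which is why compaction and the embedded database live on the same node.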
Implementation
• Based on Hummingbird
• RocksDB as the key-value database
• Leverage the Python Swift code
• gRPC
Performance
(Charts: average and 95th-percentile write latency; average and 95th-percentile read latency.)
Future
• Full Go stack
• Operation tools
• Better large-file support
• System performance observation
Summary
• Motivation
• Blob Engine
• Roadmap
THANK YOU!
@ffejfd