Lecture 13: I/O Queuing Theory, RAID

Reliability vs Availability
• Reliability: Is anything broken?
• Availability: Is the system still available to the
user?
Lecture 4 1
RAID
• Redundant Array of Inexpensive Disks
– Higher throughput
– Ability to recover from failures
•
•
•
•
•
Disk Array (Availability reduced to 1/N)
Data Integrity
Striping
MTTR
MTTF
Lecture 4 2
Manufacturing Advantages
of Disk Arrays
Disk Product Families
Conventional:
4 disk
3.5” 5.25”
designs
Low End
10”
14”
High End
Disk Array:
1 disk design
3.5”
Lecture 4 3
Replace Small # of Large Disks with
Large # of Small Disks! (1988 Disks)
IBM 3390 (K)
IBM 3.5" 0061
x70
20 GBytes
320 MBytes
23 GBytes
Volume
97 cu. ft.
0.1 cu. ft.
11 cu. ft.
Power
3 KW
11 W
1 KW
15 MB/s
1.5 MB/s
120 MB/s
I/O Rate
600 I/Os/s
55 I/Os/s
3900 IOs/s
MTTF
250 KHrs
50 KHrs
??? Hrs
Cost
$250K
$2K
$150K
Data Capacity
Data Rate
large data and I/O rates
Disk Arrays have potential for
high MB per cu. ft., high MB per KW
reliability?
Lecture 4 4
Array Reliability
• Reliability of N disks = Reliability of 1 Disk / N
50,000 Hours / 70 disks = 700 hours
Disk system MTTF: Drops from 6 years to 1 month!
• Arrays (without redundancy) too unreliable to be useful!
Hot spares support reconstruction in parallel with
access: very high media availability can be achieved
Lecture 4 5
Redundant Arrays of Disks
• Files are "striped" across multiple spindles
• Redundancy yields high data availability
Disks will fail
Contents reconstructed from data redundantly stored in the array
Capacity penalty to store it
Bandwidth penalty to update
Mirroring/Shadowing (high capacity cost)
Techniques:
Horizontal Hamming Codes (overkill)
Parity & Reed-Solomon Codes
Failure Prediction (no capacity overhead!)
VaxSimPlus — Technique is controversial
Lecture 4 6
Redundant Arrays of Disks
RAID 1: Disk Mirroring/Shadowing
recovery
group
• Each disk is fully duplicated onto its "shadow"
Very high availability can be achieved
• Bandwidth sacrifice on write:
Logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
Targeted for high I/O rate , high availability environments
Lecture 4 7
Redundant Arrays of Disks
RAID 3: Parity Disk
10010011
11001101
10010011
...
logical record
Striped physical
records
P
1
0
0
1
0
0
1
1
1
1
0
0
1
1
0
1
1
0
0
1
0
0
1
1
0
0
1
1
0
0
0
0
• Parity computed across recovery group to protect against
hard disk failures
33% capacity cost for parity in this configuration
wider arrays reduce capacity costs, decrease expected availability,
increase reconstruction time
• Arms logically synchronized, spindles rotationally synchronized
logically a single high capacity, high transfer rate disk
Targeted for high bandwidth applications: Scientific, Image Processing
Lecture 4 8
Redundant Arrays of Disks RAID 5+:
High I/O Rate Parity
A logical write
becomes four
physical I/Os
Independent writes
possible because of
interleaved parity
Reed-Solomon
Codes ("Q") for
protection during
reconstruction
Targeted for mixed
applications
D0
D1
D2
D3
P
D4
D5
D6
P
D7
D8
D9
P
D10
D11
D12
P
D13
D14
D15
Increasing
Logical
Disk
Addresses
Stripe
P
D16
D17
D18
D19
D20
D21
D22
D23
P
.
.
.
.
.
.
.
.
.
.
Disk Columns
.
.
.
.
.
Stripe
Unit
Lecture 4 9
Problems of Disk Arrays:
Small Writes
RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes
D0'
new
data
D0
D1
D2
D3
old
data (1. Read)
P
old
(2. Read)
parity
+ XOR
+ XOR
(3. Write)
D0'
D1
(4. Write)
D2
D3
P'
Lecture 4 10
Subsystem Organization
host
host
adapter
array
controller
manages interface
to host, DMA
control, buffering,
parity logic
physical device
control
striping software off-loaded from
host to array controller
no applications modifications
no reduction of host performance
single board
disk
controller
single board
disk
controller
single board
disk
controller
single board
disk
controller
often piggy-backed
in small format devices
Lecture 4 11
Summary: A Little Queuing
Theory
System
Queue
Proc
server
IOC
Device
• Queuing models assume state of equilibrium:
input rate = output rate
• Notation:
r
Tser
u
Tq
Tsys
Lq
Lsys
average number of arriving customers/second
average time to service a customer (tradtionally ต = 1/ Tser )
server utilization (0..1): u = r x Tser
average time/customer in queue
average time/customer in system: Tsys = Tq + Tser
average length of queue: Lq = r x Tq
average length of system : Lsys = r x Tsys
• Little’s Law: Lengthsystem = rate x Timesystem
(Mean number customers = arrival rate x mean service time)
Lecture 4 12
Summary: Redundant Arrays of
Disks (RAID) Techniques
• Disk Mirroring, Shadowing (RAID 1)
Each disk is fully duplicated onto its "shadow"
Logical write = two physical writes
100% capacity overhead
• Parity Data Bandwidth Array (RAID 3)
Parity computed horizontally
Logically a single high data bw disk
• High I/O Rate Parity Array (RAID 5)
1
0
0
1
0
0
1
1
1
0
0
1
0
0
1
1
1
0
0
1
0
0
1
1
1
1
0
0
1
1
0
1
1
0
0
1
0
0
1
1
0
0
1
1
0
0
1
0
Interleaved parity blocks
Independent reads and writes
Logical write = 2 reads + 2 writes
Lecture 4 13
Parity + Reed-Solomon codes
Homework #3
• Problem 6.5 and 6.6
• Due date: June 26, 2001
Lecture 4 14
Assignment #1
• Explain RAID 0-6 in greater detail.
• Due date: July 3, 2001
Lecture 4 15