
Reliability of MEMS-Based Storage Enclosures
Bo Hong, Thomas J. E. Schwarz, S. J.*
Scott A. Brandt, Darrell D. E. Long
Storage Systems Research Center
University of California, Santa Cruz
*Also Santa Clara University, Santa Clara, CA
MEMS Storage Technology
Micro-Electro-Mechanical Systems (MEMS) storage
• A promising alternative secondary storage technology
• Hardware research: IBM, HP, CMU, Nanochip
• Magnetic storage, but with very different mechanics
MEMS Storage Technology
MEMS-based storage vs. magnetic disk
• Provides non-volatile storage, too
• Delivers 10× faster access time (< 1 ms)
• Delivers higher bandwidth (100 MB/s – 1 GB/s)
• Small (about the size of a penny)
• Consumes 100× less power
• Costs ~10 USD per device
• Expected to be more reliable
• Stores a limited amount of data per device (3–10 GB)
A serious alternative to disk drives, in particular for mobile computing applications
Reliability Implications of MEMS-Based Storage
Storage systems built from MEMS-based storage …
• Require more MEMS devices
  - At least 10 times as many devices as disks to meet capacity requirements
• Require more connection components
Reliability implication
• More components, hence possibly lower reliability
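The reliability concern above comes down to simple arithmetic. The capacity gap sets the device count, and without redundancy every device is a single point of failure. The sketch below assumes independent devices with exponentially distributed lifetimes, meaning a constant failure rate. It uses the per-device capacities (3–10 GB MEMS, 100–500 GB disk) and the 23-year MTTF_MEMS quoted elsewhere in these slides. The function names are hypothetical.

```python
import math


def devices_needed(capacity_gb: float, per_device_gb: float) -> int:
    """Devices required to reach a raw capacity (no redundancy)."""
    return math.ceil(capacity_gb / per_device_gb)


def series_mttf(device_mttf: float, n_devices: int) -> float:
    """MTTF of an array in which any single device failure loses data.

    Assumes independent, exponentially distributed lifetimes: failure rates
    add, so n devices fail n times as often as one.
    """
    return device_mttf / n_devices


if __name__ == "__main__":
    MTTF_MEMS = 23.0  # years, as on the enclosure slides
    for disk_gb in (100, 500):
        for mems_gb in (3, 10):
            n = devices_needed(disk_gb, mems_gb)
            print(f"{disk_gb} GB from {mems_gb} GB devices: {n} devices, "
                  f"MTTF {series_mttf(MTTF_MEMS, n):.2f} years")
```

For example, twenty 23-year devices without redundancy give 23/20 = 1.15 years. That falls within the 1–2 year range cited for enclosures without redundant data storage.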
MEMS Storage Enclosure
Our proposal: MEMS enclosures
• A device containing dozens of MEMS devices
• A single interface to the rest of the system
• Possibly serviceable, but service calls during the economic lifetime should be very rare
MEMS Storage Enclosures
Reliability is an issue:
• MTTF of 1–2 years without redundant data storage
Uses RAID Level 5 technology with distributed sparing
• Plus k additional spares
Calls for service when necessary
• i.e., when we run out of spares
The organization and number of spares can
• Decrease the data recovery time and thus improve reliability
• Reduce human intervention
  - Fewer servicing errors
  - Lower maintenance costs
MEMS Enclosure Reliability
Measure the MTBF of enclosures
• Without replacing spares
• With replacing spares (service calls)
Determine the number of failures that triggers a service call
• Mandatory replacement: no redundancy left
• Preventive replacement: no spare left
MEMS Enclosure Reliability without Replacement
Figure: enclosure MTTF vs. number of dedicated spares (19 data + 1 parity + k dedicated spares; MTTF_MEMS = 23 yrs; 15-minute data recovery), compared with a disk (MTTF_DISK = 11.5 or 23 yrs)
• No spare: 2.3 yrs
• 1 spare: 3.5 yrs
• 2 spares: 4.6 yrs
• 3 spares: 5.8 yrs
• 4 spares: 6.9 yrs
• 5 spares: 8.1 yrs
MTTF alone is not enough to measure the reliability of enclosures without repairs
Instead: focus on data reliability during the economic lifetime (3–5 years) of enclosures
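The MTTF values on this slide can be approximated in closed form. The assumptions are ours and not necessarily the authors' model:
• Idle dedicated spares never fail.
• The 15-minute rebuild is short enough to ignore a second failure during it.
• Lifetimes are exponential.

Under these assumptions, with n = 20 active devices and per-device failure rate λ = 1/MTTF_MEMS, the first k failures each consume a spare and the (k+1)th leaves the array degraded. Each of these k+1 failures arrives after an expected 1/(nλ). The fatal failure among the n−1 survivors then follows after 1/((n−1)λ). The function name is hypothetical.

```python
def mttf_without_replacement(k: int, n_active: int = 20, mttf_device: float = 23.0) -> float:
    """Approximate enclosure MTTF (years) with k dedicated spares and no service.

    Assumptions (a sketch, not the authors' exact model):
      * n_active devices (19 data + 1 parity) fail independently at rate 1/mttf_device;
      * idle dedicated spares do not fail;
      * rebuilding onto a spare is instantaneous on this time scale.
    """
    lam = 1.0 / mttf_device
    # k failures absorbed by spares, plus the failure that leaves the array
    # degraded, each arrive after an expected 1 / (n * lambda) at full strength.
    time_at_full_strength = (k + 1) / (n_active * lam)
    # In degraded mode, the failure of any of the n - 1 survivors loses data.
    time_degraded = 1.0 / ((n_active - 1) * lam)
    return time_at_full_strength + time_degraded


if __name__ == "__main__":
    for k in range(6):
        print(f"{k} spares: {mttf_without_replacement(k):.2f} yrs")
```

Each spare adds one more expected inter-failure time of 23/20 = 1.15 years. For 0 to 5 spares the results fall within 0.1 year of the 2.3–8.1 year values on this slide.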
MEMS Enclosures with Replacement
Markov model for a MEMS enclosure with N data devices, one parity device, and one dedicated spare
• States: N – Normal; D – Degraded; DL – Data Loss
• 1/λ – MTTF_MEMS (tens of years)
• 1/μ – mean time to recover data (minutes)
• 1/ν – mean time to replacement (days to weeks)
Figure: state diagram with both preventive replacement and mandatory replacement transitions
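A Markov model like this one can be evaluated numerically. The mean time to data loss (MTTDL) is the expected time for the chain to reach DL, and it solves a small linear system. The sketch below is a generic solver, assuming exponential transition times as in any such model.

It is checked on the textbook RAID 5 chain without spares: N → D at rate nλ, D → N at rate μ, and D → DL at rate (n−1)λ, where λ is the per-device failure rate and μ the recovery rate. That chain's MTTDL is (μ + (2n−1)λ)/(n(n−1)λ²). The extra spare and replacement states of the slide's model would have to be read off the original diagram and are not reproduced here. The function name is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def mean_time_to_absorption(
    transient: List[Hashable],
    rates: Dict[Tuple[Hashable, Hashable], float],
    start: Hashable,
) -> float:
    """Expected time for a CTMC starting in `start` to reach an absorbing state.

    rates[(i, j)] is the transition rate from state i to state j. Any target
    not listed in `transient` (e.g. "DL") is absorbing. The expected times T
    satisfy, for every transient state i,
        out_rate(i) * T[i] - sum over transient j of rate(i, j) * T[j] = 1,
    which is solved here by Gauss-Jordan elimination with partial pivoting.
    """
    index = {s: i for i, s in enumerate(transient)}
    n = len(transient)
    a = [[0.0] * n + [1.0] for _ in range(n)]  # augmented matrix [A | 1]
    for (src, dst), r in rates.items():
        if src not in index:
            continue  # nothing leaves an absorbing state
        i = index[src]
        a[i][i] += r
        if dst in index:
            a[i][index[dst]] -= r
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col and a[row][col] != 0.0:
                f = a[row][col] / a[col][col]
                for c in range(col, n + 1):
                    a[row][c] -= f * a[col][c]
    i = index[start]
    return a[i][n] / a[i][i]


if __name__ == "__main__":
    n, lam = 20, 1 / 23.0        # 20 active devices, MTTF_MEMS = 23 years
    mu = 365.25 * 24 * 4         # 15-minute recovery, expressed per year
    rates = {("N", "D"): n * lam, ("D", "N"): mu, ("D", "DL"): (n - 1) * lam}
    print(f"MTTDL = {mean_time_to_absorption(['N', 'D'], rates, 'N'):.0f} years")
```

Adding states for "spare consumed, replacement pending" only adds rows to the same system. The solver itself does not change.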
MEMS Enclosure Reliability with Replacement
Figure: reliability with 1, 2, or 3 spares under preventive + mandatory replacement and under mandatory replacement only, compared with no spare
Preventive replacement increases reliability and reduces replacement urgency
MEMS Enclosure Reliability
Dedicated sparing
• Rebuild all data from a failed MEMS device onto a single spare MEMS device
Distributed sparing
• Every MEMS device contains
  - Client data
  - Parity data
  - Spare space
Distributed Sparing [Menon and Mattson 1992]
Figure: layout before failure and after MEMS 4 fails
• Shorter data recovery time
• More devices can fail
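Distributed sparing shortens recovery because rebuilt blocks land in spare space on every surviving device, not on one dedicated spare. The sketch below shows this effect. It uses a hypothetical rotation that is not necessarily the layout in the figure: in stripe s, device s mod D holds spare space, device (s+1) mod D holds XOR parity, and the rest hold client data. The block values are random bytes, and all function names are illustrative.

```python
import random
from collections import Counter
from functools import reduce
from operator import xor


def build_layout(num_devices: int, num_stripes: int, seed: int = 0):
    """Return one dict per stripe mapping device -> (role, value).

    Hypothetical rotation: in stripe s, device s % D holds spare space and
    device (s + 1) % D holds the XOR parity; every other device holds data.
    """
    rng = random.Random(seed)
    layout = []
    for s in range(num_stripes):
        spare_dev, parity_dev = s % num_devices, (s + 1) % num_devices
        row = {d: ("data", rng.randrange(256))
               for d in range(num_devices) if d not in (spare_dev, parity_dev)}
        row[parity_dev] = ("parity", reduce(xor, (v for _, v in row.values()), 0))
        row[spare_dev] = ("spare", None)
        layout.append(row)
    return layout


def rebuild_after_failure(layout, failed: int) -> Counter:
    """Rebuild the failed device's blocks into each stripe's spare space.

    Returns the number of rebuilt blocks written to each surviving device.
    """
    writes = Counter()
    for row in layout:
        role, _ = row.pop(failed)
        if role == "spare":
            continue  # this stripe only lost unused spare space
        spare_dev = next(d for d, (r, _) in row.items() if r == "spare")
        # XOR of the surviving data and parity blocks equals the lost block.
        lost = reduce(xor, (v for r, v in row.values() if r != "spare"), 0)
        row[spare_dev] = (role, lost)
        writes[spare_dev] += 1
    return writes


if __name__ == "__main__":
    layout = build_layout(num_devices=5, num_stripes=10)
    print(rebuild_after_failure(layout, failed=3))
```

With five devices and ten stripes, losing device 3 sends two rebuilt blocks to each of devices 0, 1, 2 and 4. Dedicated sparing would instead write all eight to a single spare.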
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing
Figure: reliability with dedicated sparing (1 or 2 spares) under preventive + mandatory replacement and under mandatory replacement only, compared with no spare
Compare with the following slide
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing
Figure: the same comparison with distributed sparing (1 or 2 spares) added alongside dedicated sparing
Distributed sparing is better only at short replacement times, and only when preventive replacement is used
Durability of MEMS Storage Enclosures
All about economy
• How long can MEMS enclosures work without repairs?
• How often do they need repairs in the first 3–5 years?
• How do replacement policies affect maintenance frequency?
Number of failures an enclosure with k spares can tolerate before the (m+1)th repair is scheduled (m ≥ 0):
• (m + 1) × k under the preventive replacement policy
• (m + 1) × (k + 1) under the mandatory replacement policy
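Both counts follow from the service triggers defined earlier: preventive replacement calls for service when no spare is left, and mandatory replacement when no redundancy is left. The sketch below assumes k ≥ 1 and that every service call restores all k spares before the next failure. Replaying failures one at a time reproduces the closed forms. The function names are illustrative.

```python
def failures_tolerated(k: int, m: int, policy: str) -> int:
    """Failures an enclosure with k spares tolerates before repair m + 1 is scheduled."""
    if policy == "preventive":
        return (m + 1) * k
    if policy == "mandatory":
        return (m + 1) * (k + 1)
    raise ValueError(f"unknown policy: {policy!r}")


def repairs_scheduled(failures: int, k: int, policy: str) -> int:
    """Count service calls by replaying `failures` device failures one at a time.

    Assumes k >= 1 and that each service call restores all k spares and full
    redundancy before the next failure occurs.
    """
    if policy not in ("preventive", "mandatory"):
        raise ValueError(f"unknown policy: {policy!r}")
    spares, degraded, repairs = k, False, 0
    for _ in range(failures):
        if spares > 0:
            spares -= 1          # data is rebuilt onto a spare
        else:
            degraded = True      # no spare left: the array runs without redundancy
        needs_service = (spares == 0) if policy == "preventive" else degraded
        if needs_service:
            spares, degraded, repairs = k, False, repairs + 1
    return repairs


if __name__ == "__main__":
    # With four spares, nine failures cost two service calls under the
    # preventive policy but only one under the mandatory policy.
    print(repairs_scheduled(9, 4, "preventive"), repairs_scheduled(9, 4, "mandatory"))
```

The example shows the trade-off in the summary: preventive replacement schedules service sooner, buying reliability with more maintenance.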
Durability of MEMS Storage Enclosures
Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t], for k = 0, 1, 2, 4, 6, 8, 10, compared with a disk (MTTF 23 yrs)
First-year survivability: 95.7% for a disk vs. 98.8% for a MEMS enclosure with two spares
Chance that a MEMS enclosure with four spares requires more than one service in five years: 3.5% (preventive) vs. 0.6% (mandatory)
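The "up to k failures" curves can be approximated by treating device failures as a Poisson process. This holds if failed devices are immediately replaced from the spares, so the number of active devices stays constant. The sketch below rests on that assumption, which is ours and ignores rebuild windows and replacement delays; the function name is hypothetical. Under this model an enclosure with two spares survives a year with at most three failures: two are absorbed by spares, and the third leaves the array degraded but intact.

```python
import math


def prob_at_most(j: int, t_years: float, n_active: int, mttf_device: float) -> float:
    """P(at most j device failures during (0, t]) for a Poisson failure process.

    Assumes n_active devices stay in service (a failed device is immediately
    replaced from the spares), each failing independently at rate
    1 / mttf_device, and ignores rebuild windows and replacement delays.
    """
    mean = n_active * t_years / mttf_device
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(j + 1))


if __name__ == "__main__":
    # A single disk survives the year only if it never fails.
    disk = prob_at_most(0, 1.0, n_active=1, mttf_device=23.0)
    # 19 data + 1 parity MEMS devices with two spares survive up to 3 failures.
    mems = prob_at_most(3, 1.0, n_active=20, mttf_device=23.0)
    print(f"disk: {disk:.1%}, MEMS enclosure with two spares: {mems:.1%}")
```

The one-year values come out at 95.7% and 98.8%, matching the slide. The five-year service probabilities depend on replacement timing, which this simple sketch does not model.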
Related Work
MEMS-based storage technology development
• IBM, HP, CMU CHI2PS, Nanochip
Digital Micromirror Devices by TI
• Reported mean time between failures: 650,000 hours [Douglass]
RAID reliability
• Dedicated sparing [Dunphy et al.]
• Distributed sparing [Menon and Mattson]
• Parity sparing [Reddy and Banerjee]
Disk failure prediction
• S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology)
Summary
Reliability of MEMS storage enclosures
• Can be more reliable than disks even without replacing failed devices
• Highly reliable when using preventive replacement
• Dedicated sparing and distributed sparing provide comparable or almost identical reliability
Economy of MEMS storage enclosures
• Preventive replacement trades more maintenance service calls for higher reliability
Thank You!
Acknowledgements
• Dave Nagle, Greg Ganger, CMU PDL
• The rest of the UCSC SSRC
More information:
• http://ssrc.cse.ucsc.edu
• http://ssrc.cse.ucsc.edu/mems.shtml
Questions?
Backup Slides
MEMS Storage Technology
Micro-Electro-Mechanical Systems (MEMS) storage
• A promising alternative secondary storage technology
• Hardware research: IBM, HP, CMU, Nanochip
Radical differences between MEMS storage and magnetic disk technologies:
Feature | Disk | MEMS
Recording media | Magnetic | Magnetic or physical (non-volatile)
Recording technique | Longitudinal | Orthogonal (higher density)
R/W head | Single | Thousands in a tip array (higher bandwidth and parallelism)
Media movement | Rotation | Media sled moves in X and Y independently (no rotational delay)
MEMS Storage Device Characteristics
Physical size: 1–2 cm²
Recording density: 250–750 Gb/in²
Figure: predicted performance in 2005, throughput (1–7 GB/s) vs. access latency (1 ns to 10 ms)
• DRAM: 0.5–2 GB, $100–$200/GB
• MEMS: 3–10 GB, $5–$50/GB
• Disk: 100–500 GB, $1–$2/GB
MEMS Storage Device
Figure: MEMS storage device, with the springs and the X and Y axes labeled
Durability of MEMS Storage Enclosures
Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t], for k = 0, 1, 2, 4, 6, 8, 10, compared with a disk (MTTF 23 yrs)