EXPLODE: A Lightweight, General Approach to Finding Serious Errors in Storage Systems
Junfeng Yang, Can Sar, and Dawson Engler
Presented by Pramodh Pochu
Agenda
• Motivation
• Introduction
• Key Contributions
• Checking a Storage System
• Implementation
• Exploring choices
• Generating crash disks
• Checking different storage systems
• Conclusion
• Evaluation
Motivation
• Storage systems are important pieces of systems code
• Storage systems are difficult to test
• Verifying that a system recovers correctly from a crash is difficult
• Previous approaches to finding data-integrity bugs (traditional model checking and implementation-level model checking) are heavyweight
Introduction
• Traditional model checking takes a model of the system as its specification and checks it by starting from an initial state and repeatedly performing all possible actions on that state and its successors
• It requires rewriting the system in an artificial modeling language such as Promela
Introduction
• Implementation-level model checking uses the code itself as the model
• This eliminates the need to write a separate model
• But it requires porting the entire OS to run on top of the model checker as a user process
• Verisoft and CMC are examples of this approach
Key Contributions
• Explode checkers are effective
• Explode checkers can do thorough checks
• A series of new file-system-specific checks for catching bugs in the data synchronization facilities used by applications
• Even simple checkers find bugs
Principles
• Explore all choices: when a program point can legally do one of N different actions, fork execution N times and do each
• Exhaust states: do every possible action to a state before proceeding to the next state
• Touch nothing: modifying large checked systems risks introducing corner-case mistakes, so the checked systems are not modified
• Report only true errors, and report them deterministically
Checking a Storage System
• Clients use Explode for two things
– Systematically exhausting all possibilities
– Checking that the storage system recovers correctly from a crash
• Clients provide three system-specific components
– Checker
– Storage component
– Checking stack
Choose
• Explode provides a “choose” function. At a program point that has N possible actions, clients insert a call to “choose(N)”, which appears to fork execution N times, returning 0, 1, ..., N-1 in the respective child executions.
Choose
• In low-memory situations, kmalloc can return NULL when called without the __GFP_NOFAIL flag, so both outcomes are explored:
void *kmalloc(size_t size, int flags) {
    if ((flags & __GFP_NOFAIL) == 0)
        if (choose(2) == 0)   /* explore the allocation-failure branch */
            return NULL;
    ……
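The forking behavior of choose(N) can be imitated in user space by re-running a program once per choice sequence. The sketch below is only an illustration of that idea, not Explode's implementation: `Explorer`, `toy_kmalloc`, and the `GFP_NOFAIL` value are hypothetical names invented for this example.

```python
class Explorer:
    """Toy, user-space stand-in for choose(N): explores every choice sequence by re-running."""

    def __init__(self):
        self.prefix = []  # choices to replay at the start of the current run
        self.trace = []   # (choice, n) pairs taken so far in the current run

    def choose(self, n):
        i = len(self.trace)
        # Replay a recorded choice if there is one; otherwise take branch 0.
        c = self.prefix[i] if i < len(self.prefix) else 0
        self.trace.append((c, n))
        return c

    def run_all(self, program):
        results = []
        self.prefix = []
        while True:
            self.trace = []
            results.append(program(self.choose))
            # Backtrack: drop trailing choices that have no untried branch left.
            while self.trace and self.trace[-1][0] + 1 >= self.trace[-1][1]:
                self.trace.pop()
            if not self.trace:
                return results
            last, _ = self.trace.pop()
            self.prefix = [c for c, _ in self.trace] + [last + 1]


GFP_NOFAIL = 0x1  # hypothetical flag value, for illustration only


def toy_kmalloc(choose, size, flags):
    """Mimics the kmalloc slide: without the no-fail flag, also explore failure."""
    if (flags & GFP_NOFAIL) == 0 and choose(2) == 0:
        return None
    return bytearray(size)


def program(choose):
    # Two allocations, each of which may fail: four executions in total.
    a = toy_kmalloc(choose, 8, 0)
    b = toy_kmalloc(choose, 8, 0)
    return (a is not None, b is not None)


outcomes = Explorer().run_all(program)
print(outcomes)
```

Each entry in `outcomes` corresponds to one "child execution"; re-running with a recorded prefix is what makes every path reproducible.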
Writing Checkers
• The client provides a checker that Explode uses to drive and check a storage system
• A checker implements five methods
– mutate: performs system-specific operations and calls into Explode to explore choices and to do crash checking
– check: checks for storage errors after a crash
– get_sig: returns a byte-array signature representing the current state of the checked system
Writing Checkers
– init and finish: set up and clear the checker's internal state
• Example: a file system checker checks whether a file synchronously written to disk persists after a crash
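The five checker methods can be sketched against a toy storage system. This is a minimal illustration under stated assumptions: `ToyFS` and `SyncWriteChecker` are hypothetical, and the real Explode checker interface may differ in names and signatures.

```python
import hashlib


class ToyFS:
    """Hypothetical in-memory 'file system': a write cache in front of a durable store."""

    def __init__(self):
        self.cache = {}
        self.durable = {}

    def write(self, name, data, sync=False):
        self.cache[name] = data
        if sync:
            self.durable[name] = data  # synchronous writes reach the "disk" at once

    def crash_and_recover(self):
        # A crash loses everything that only lived in the cache.
        self.cache = dict(self.durable)


class SyncWriteChecker:
    """Checks that synchronously written files survive a crash."""

    def __init__(self, fs):
        self.fs = fs
        self.expected = {}

    def init(self):
        self.expected = {}

    def mutate(self):
        self.fs.write("a.txt", b"hello", sync=True)
        self.expected["a.txt"] = b"hello"             # must survive a crash
        self.fs.write("b.txt", b"maybe", sync=False)  # no guarantee, so not recorded

    def check(self):
        return all(self.fs.cache.get(n) == d for n, d in self.expected.items())

    def get_sig(self):
        state = repr(sorted(self.fs.cache.items())).encode()
        return hashlib.sha1(state).digest()

    def finish(self):
        self.expected = {}


fs = ToyFS()
checker = SyncWriteChecker(fs)
checker.init()
checker.mutate()
sig_before = checker.get_sig()
fs.crash_and_recover()
ok = checker.check()
sig_after = checker.get_sig()
checker.finish()
print(ok, sig_before != sig_after)
```

The signature changes across the crash because the unsynchronized file is lost, while the check still passes because only the synchronous write was promised to persist.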
Storage Components
• For each storage subsystem involved in checking, clients provide a storage component that implements five methods
– init: initializes the component
– mount: sets up the storage system
– unmount: tears down the storage system
– recover: repairs the storage system after a crash
– threads: returns the thread IDs of the storage system's kernel threads
Checking stack
• A checking stack builds a checker by gluing storage system components together and then attaching a single checker on top of them
• Currently a stack can have only one checker
• Storage components can fan out, for example by setting up multiple RAM disks to make a RAID array
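A checking stack with fan-out can be pictured as a small tree of components. The sketch below assumes, purely for illustration, that a component mounts its children before itself and unmounts in the reverse order; the `Component` class and its constructor arguments are hypothetical and not Explode's actual interface.

```python
class Component:
    """Hypothetical storage component exposing the five methods named on the slide."""

    def __init__(self, name, log, children=(), thread_ids=()):
        self.name = name
        self.log = log                      # shared list recording the call order
        self.children = list(children)
        self.thread_ids = list(thread_ids)

    def init(self):
        for child in self.children:
            child.init()
        self.log.append("init " + self.name)

    def mount(self):
        # Lower layers must exist before the layer built on top of them.
        for child in self.children:
            child.mount()
        self.log.append("mount " + self.name)

    def unmount(self):
        # Tear down top-down, the reverse of mounting.
        self.log.append("unmount " + self.name)
        for child in reversed(self.children):
            child.unmount()

    def recover(self):
        for child in self.children:
            child.recover()
        self.log.append("recover " + self.name)

    def threads(self):
        ids = []
        for child in self.children:
            ids.extend(child.threads())
        return ids + self.thread_ids


log = []
rd0 = Component("rd0", log)
rd1 = Component("rd1", log)
raid = Component("raid", log, children=[rd0, rd1], thread_ids=[101])
fs = Component("fs", log, children=[raid], thread_ids=[202])

fs.mount()
fs.unmount()
all_threads = fs.threads()
print(log, all_threads)
```

A checker would sit on top of `fs`; the two RAM disks under `raid` show the fan-out case.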
Implementation
• Create a clean initial state and invoke the client's mutate method on it
• At every choose(N) call, fork N children
• On client request, generate all crash disks and run the client's check method on them
• When mutate returns, re-invoke it
State Checkpoint and Restore
• Explode checkpoints a state S by recording the sequence of choices the checked code took to reach S
• It restores S by starting from a clean state and replaying these choices
• A checkpoint stores the sequence of n choices that produced S in an array
State Checkpoint and Restore
• Unmounting clears in-memory state, removes buffer cache entries, and frees kernel data structures
• To restore a state, the current disk is unmounted; then a copy of the pristine initial disk is mounted and all previously made choices are replayed
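Checkpointing by recording choices and restoring by replay can be shown with a toy deterministic system. This is a sketch under the assumption that each step depends only on the disk contents and the choice taken; `apply_choice` and `run` are hypothetical helpers.

```python
def apply_choice(disk, choice):
    """Hypothetical deterministic step: the choice decides which file is written."""
    name = "file%d" % choice
    disk[name] = disk.get(name, 0) + 1


def run(pristine, choices):
    """Start from a copy of the pristine disk and replay the recorded choices."""
    disk = dict(pristine)  # never modify the pristine image itself
    for c in choices:
        apply_choice(disk, c)
    return disk


pristine = {}
checkpoint = [0, 1, 1]           # the array of choices that produced state S

live = run(pristine, checkpoint)
snapshot = dict(live)            # kept only to verify the restore below
apply_choice(live, 0)            # execution moves on past S

restored = run(pristine, checkpoint)
print(restored, live)
```

The checkpoint itself is just the short list of choices, which is why it is cheap to store compared with a copy of the whole disk.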
Re-executing the code deterministically
• Making the same choices: Explode discards any choose calls made from an interrupt or from any process whose ID is not associated with the checked system
• Controlling threads: Explode uses priorities to control when storage system threads run
• Requirements on the checked system: the checked system must issue the same choose calls across replay runs. The checked systems are partly isolated during checking, and nothing besides the checker and the systems' kernel threads modifies the RAM disks
Generating crash disks
• As the storage system executes, EKM logs the operations that affect which blocks could be written to disk
• Explode extracts this log using an EKM ioctl
• It then applies the logged add/remove operations to the initial write set
• Whenever the write set shrinks, it generates all possible crash disks
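The crash-disk enumeration can be sketched as applying every subset of the write set to the current disk image. The log format below is an assumption made for illustration: an "add" entry marks a block as pending, and a "remove" entry is taken to mean the block has reached the disk. The real EKM log is not described at that level of detail here.

```python
from itertools import combinations


def crash_disks(disk, write_set):
    """Yield every disk obtainable by writing some subset of the pending blocks."""
    blocks = sorted(write_set)
    for k in range(len(blocks) + 1):
        for subset in combinations(blocks, k):
            crashed = dict(disk)
            for b in subset:
                crashed[b] = write_set[b]
            yield crashed


def process_log(initial_disk, log):
    disk = dict(initial_disk)
    write_set = {}   # block number -> pending contents
    generated = []
    for op, block, data in log:
        if op == "add":
            write_set[block] = data
        elif op == "remove":
            # The write set is about to shrink: enumerate crashes first.
            generated.extend(crash_disks(disk, write_set))
            disk[block] = write_set.pop(block)
    return disk, generated


initial = {0: "A", 1: "B"}
log = [("add", 0, "A2"), ("add", 1, "B2"), ("remove", 0, None)]
final_disk, disks = process_log(initial, log)
print(len(disks), final_disk)
```

With two pending blocks there are four possible crash disks, and the check method would be run on each of them after recovery.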
Checking different storage systems
• Explode checks different types of storage systems, including file systems, version control systems, and databases
• Bugs were found in every storage system checked with Explode
Checking File System
• Explode checks ten different Linux file systems
• A common checker core is used as the starting point
• Three checkers are built on the common checker, each focusing on a different special case
Generic checker core
• Starts from an empty file system and systematically generates file system topologies
• mutate applies eight system calls to each node (file, link, directory) in the current topology before exploring the next one
• For each operation it invokes, mutate duplicates the operation's effect on a fake, abstract file system
Check: Failed system calls have no user-visible effect
• The checker uses Explode to systematically fail calls to six kernel functions
• If a system call succeeds, the checker updates the abstract file system; otherwise it does not
• It then compares the abstract file system with the real one
• Two bugs were found
• One is a bug in the ReiserFS ftruncate, which can fail with its job half done if a memory allocation fails
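The comparison between the abstract and the real file system can be illustrated with a deliberately buggy truncate. This sketch is not the ReiserFS code; `buggy_truncate` and `run_check` are hypothetical and only reproduce the shape of the bug (an error returned after part of the work is done).

```python
def buggy_truncate(files, name, size, alloc_fails):
    """Toy truncate that does half of its job before hitting an allocation failure."""
    half = (len(files[name]) + size) // 2
    files[name] = files[name][:half]   # first part of the job
    if alloc_fails:
        return -1                      # error reported with the job half done
    files[name] = files[name][:size]
    return 0


def run_check(alloc_fails):
    real = {"f": b"0123456789"}
    abstract = dict(real)              # fake, abstract file system
    rc = buggy_truncate(real, "f", 2, alloc_fails)
    if rc == 0:
        abstract["f"] = abstract["f"][:2]  # mirror the effect only on success
    return real == abstract            # False means a user-visible effect leaked


print(run_check(False), run_check(True))
```

When the injected failure is taken, the real file has shrunk even though the call reported an error, so the comparison fails and the checker reports a bug.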
Check: sync operations work
• Applications use OS-provided methods to force data to disk so that a crash cannot destroy it
– sync
– fsync
– Synchronous mount
– O_SYNC
Check: a recovered FS is reasonable
• After a crash, a file system should recover to a reasonable state
• As mutate issues operations, it builds two sets
– Stable set: operations it knows are on disk
– Volatile set: operations that may or may not be on disk
Check: a recovered FS is reasonable
• check verifies that the recovered file system can be constructed from all of the stable operations legally combined with some of the volatile ones
• Two bugs (in JFS and Reiser4) were found: crashed disks could not be recovered using fsck
• A bug was found in Reiser4 where mounting causes a kernel panic
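The stable/volatile check can be sketched as a search over subsets of the volatile operations. The operation format and the simplification that all stable operations precede the volatile ones are assumptions made for this illustration; `apply_ops` and `is_reasonable` are hypothetical names.

```python
from itertools import combinations


def apply_ops(ops):
    """Build a toy file system (name -> contents) from a list of operations."""
    fs = {}
    for kind, name, data in ops:
        if kind == "write":
            fs[name] = data
        elif kind == "unlink":
            fs.pop(name, None)
    return fs


def is_reasonable(recovered, stable, volatile):
    """True if recovered == all stable ops plus some in-order subset of volatile ops."""
    for k in range(len(volatile) + 1):
        # combinations() keeps the original order of the chosen operations.
        for subset in combinations(volatile, k):
            if apply_ops(list(stable) + list(subset)) == recovered:
                return True
    return False


stable = [("write", "a", b"1")]
volatile = [("write", "b", b"2"), ("unlink", "a", None)]

print(is_reasonable({"a": b"1"}, stable, volatile))
print(is_reasonable({"b": b"x"}, stable, volatile))
```

A recovered state that matches no combination, such as a file with contents nobody wrote, is reported as an error.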
Checking Version Control
• The checker's mutate method checks out a repository, makes a local modification, commits the change, and simulates a crash on the block device
• It then calls check_crashes_now()
• All three systems (CVS, Subversion, and ExpENSiv) made the same mistake
• To update file A, they write the new contents to a temporary file B and then atomically rename B to A. However, they forget to force B's contents to disk before the rename, so a crash can destroy the data
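The usual remedy for this class of bug is to fsync the temporary file before renaming it. The sketch below shows that pattern on a POSIX-like system; it is not the fix applied by any of the three systems, and `atomic_update` is a hypothetical helper.

```python
import os
import tempfile


def atomic_update(path, data):
    """Replace the file at `path` with `data`, forcing the data to disk before the rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # temporary file B, next to A
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # the step the checked systems skipped
        os.replace(tmp, path)     # atomic rename of B over A
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    if hasattr(os, "O_DIRECTORY"):
        # On POSIX systems, also sync the directory so the rename itself is durable.
        dfd = os.open(dirname, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "a.txt")
    atomic_update(target, b"version 1")
    atomic_update(target, b"version 2")
    with open(target, "rb") as f:
        final_contents = f.read()
    leftover = os.listdir(d)

print(final_contents, leftover)
```

Without the fsync on B, a crash shortly after the rename can leave A pointing at a file whose contents never reached the disk.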
Checking Berkeley DB
• The database checker checks that after a crash no committed transaction records are corrupted or missing
• The mutate method is a simple loop that starts a transaction, adds several records to it, and then commits the transaction. It records the committed transactions
• On ext2, creating a database inside a transaction can leave the database corrupted if the system crashes
Checking Berkeley DB
• Furthermore, even committed transactions may disappear
• On ext3, a crash while adding a record can lead to an unrecoverable state
• On all three file systems (ext2, ext3, and JFS), a record that was added but not committed can appear after a crash
Checking RAID
• Two checks are done
– A file system's behavior (with and without crashes) on top of RAID should be the same as without RAID
– Losing any single sector in a RAID array does not cause data loss
• Explode found that Linux RAID does not reconstruct the contents of an unreadable sector
• If two sector read errors occur, all maintenance operations fail, and disk writes fail as well
Checking NFS
• Setup: export a local file system as an NFS partition over the loopback interface
• Teardown: unmount it
• Recovery: run fsck for the local file system
• Writing to a file and then reading the same file through a hard link in a different directory yields inconsistent data
• A Linux NFS security feature called subtree checking caused this error
• There are also additional bugs specific to individual file systems
Conclusion
• A lightweight approach can be used to find crash recovery errors
• Explode runs on a slightly modified Linux kernel on raw hardware
• Explode has been applied to a variety of storage systems, and serious bugs were found
Evaluation
• Explode is more general, more lightweight, and easier to apply than FiSC
• Efforts are being made to make Explode open source
• Improvements have been made compared with the paper on Explode presented at the 2004 workshop
Questions?
Thank you
Related Work
• File system testing tools
• Software Model Checking
• Generic bug finding
File System Checking Tools
• Non-deterministic
• Less comprehensive
• But lightweight, and they work ‘out of the box’
• Complementary to Explode
Software Model Checking
• Model checkers are used to find errors in the design and implementation of software systems
• Verisoft does not store states at checkpoints. It relies heavily on partial-order reduction techniques
• Java PathFinder relies on a virtual machine that extracts the current state of a Java program
Generic Bug finding
• Static analysis is better at finding errors in surface properties visible in the source code
• Model checking is more strenuous because it requires running the code, and it checks only the paths that are executed
• Because it executes code, model checking can check properties implied by the code
• Static analysis is complementary to Explode