Mass Storage

Lecture 22: Mass Storage
Spring 2016
Jason Tang
Slides based upon Operating System Concept slides,
http://codex.cs.yale.edu/avi/os-book/OS9/slide-dir/index.html
Copyright Silberschatz, Galvin, and Gagne, 2013
1
Topics
• Mass Storage Systems
• Disk Scheduling
• Disk Management
2
Mass Storage Systems
• After startup, OS loads into RAM applications that are stored elsewhere:
• ROM / EPROM / FPGA
• Magnetic hard disk
• Non-volatile random-access memory (NVRAM)
• Tape drive
• Or others: boot CD, ZIP drive, punch card, …
3
EPROM
• Erasable programmable read-only memory
• Manufacturer or OEM burns image into EPROM
• Used in older computing systems and in modern embedded systems
4
Magnetic Hard Disk
• Spins magnetic platters, typically 5400 or 7200 RPM (or higher for server-grade)
• Transfer rate: rate at which data flows between drive and computer
• Positioning time (random-access time): time to move disk arm to desired
cylinder (seek time) plus time for desired sector to rotate under disk head
(rotational latency)
• Head crash: when disk head hits platter
• Attached to computer via some bus: SCSI, IDE, SATA, Fibre Channel, USB,
Thunderbolt, others
• Host controller in computer uses bus to talk to disk controller
5
Magnetic Hard Disk (HDD)
• Multi-terabyte drives are common
• Theoretical SATA III speed is 6 Gb/sec
• Seek times from 3 ms to 15 ms
• Average seek time typically estimated as the time to seek across
1/3 of the tracks
• Average rotational latency from 2.0 ms to 7.14 ms
• Most drives also have an internal cache, approximately 64 MiB
6
Hard Disk Performance
• Average access time = average seek time + average latency
• Server-grade hard disks average 5 ms (3 ms seek time + 2 ms latency)
• Average I/O time = average access time + (transfer size / transfer rate) +
controller overhead
• Example: transfer 4 KB data with 9 ms average access time, 1 Gb/s
transfer rate, 0.1 ms controller overhead
• = 9 ms + (4 KB × (1 GB / 1000×1000 KB) × (8 b / 1 B) / (1 Gb/s)) + 0.1 ms
• = 9 ms + 0.032 ms + 0.1 ms = 9.132 ms (or about 100,000 times slower than modern RAM)
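The arithmetic above can be checked with a short script, using the example's numbers (9 ms access time, 4 KB transfer with 1 KB = 1000 B, 1 Gb/s link, 0.1 ms controller overhead):

```python
# Average I/O time = average access time + transfer time + controller overhead
access_ms = 9.0
overhead_ms = 0.1
transfer_bits = 4_000 * 8            # 4 KB payload, 8 bits per byte
rate_bits_per_s = 1_000_000_000      # 1 Gb/s link

transfer_ms = transfer_bits / rate_bits_per_s * 1000
io_time_ms = access_ms + transfer_ms + overhead_ms
print(f"{io_time_ms:.3f} ms")        # 9.132 ms
```

Note how the transfer itself (0.032 ms) is dwarfed by the mechanical access time, which is why disk scheduling focuses on seek distance.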
7
Non-Volatile RAM (NVRAM)
• Used in modern computers for secondary storage, like a hard drive
• Also known as flash memory, flash drive, or solid-state drive (SSD)
• Two variants: NAND flash (most common) or NOR flash
• Requires less power, much faster than magnetic hard disk
• Does not suffer from head crash
• Block erasure: must erase entire block at a time
• Memory wear: each block can only be erased a finite number of times (usually over 10,000)
8
Tape Drive
• Early read/write secondary storage medium
• Linear search: tape drive had to fast-forward or rewind spool of tape to
correct place; very slow
• Can hold up to 200 TB
• Transfer rate on order of 140 MB/s
(40 times slower than hard disk)
• Origin of tar (tape archive) command
9
Disk Structure
• Addressed as large 1-dimensional array of logical blocks
• Block is smallest unit of transfer; HDD is usually 512 or 4096 bytes, NAND
flash anywhere from 512 bytes to 128 KiB
• On HDD, sector 0 is first sector on first track on outermost cylinder
• Logical to physical addressing tricky, due to bad sectors
• For HDD, non-constant number of sectors per track: platter spins at constant
angular velocity, so longer outer tracks hold more sectors at the same bit density
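For drives that expose a fixed geometry (older HDDs, or modern drives emulating one), a logical block address can be derived from cylinder/head/sector coordinates with the classic mapping below; the 16-head, 63-sector geometry is purely an illustrative assumption:

```python
def chs_to_lba(cylinder, head, sector, heads_per_cyl, sectors_per_track):
    """Classic CHS -> LBA mapping; sector numbers are 1-based within a track."""
    return (cylinder * heads_per_cyl + head) * sectors_per_track + (sector - 1)

# Sector 0 is the first sector of the first track on the outermost cylinder
print(chs_to_lba(0, 0, 1, 16, 63))   # 0
print(chs_to_lba(1, 0, 1, 16, 63))   # 1008 = one full cylinder (16 tracks x 63 sectors)
```

Real drives complicate this with zoned recording and remapped bad sectors, which is why the logical-to-physical translation is left to the disk controller.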
10
I/O Scheduling
• Just as OS has a process scheduler to decide which process to run next, OS
has an I/O scheduler to decide which disk operation to perform next
• On HDD, minimize seek time by reducing the distance the disk arm must travel
to the correct cylinder, along with the rotation needed for the sector to arrive
• On SSD, combine write requests to same block
• Disk bandwidth: total number of bytes transferred, divided by total time
between start of first request to completion of last transfer
• While data being transferred via DMA, OS can do other things
11
Disk Scheduling
• Disk I/O request includes input or output mode, disk address, memory
address, number of sectors to transfer
• OS maintains queue of requests
• Idle disk can immediately work on I/O request, while requests are queued for
a busy disk
• Optimization algorithms only make sense when a queue exists
• HDD controllers have small buffers and can manage a queue of I/O requests
12
FCFS Scheduling (HDD)
• Example: requests for cylinders 98, 183, 37, 122, 14, 124, 65, 67; head is
currently on cylinder 53
• Requests serviced first-come, first-served
• Note wild swing between cylinders 37 to 122 to 14; would be faster if 37 and
14 were serviced consecutively
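The cost of those swings can be totaled with a small simulation sketch (not from the slides) that sums head movement over the example queue:

```python
def fcfs_head_movement(head, requests):
    """Total cylinders traversed when servicing requests in arrival order."""
    total = 0
    for cyl in requests:
        total += abs(cyl - head)   # distance to the next request
        head = cyl
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(53, queue))   # 640 cylinders
```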
13
SSTF (HDD)
• Shortest Seek Time First selects the request with minimum seek time from the current
head position
• Form of shortest-job first scheduling, but may starve a request
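A minimal sketch of the greedy selection, run on the same example queue:

```python
def sstf_order(head, requests):
    """Service order when always picking the closest pending cylinder."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda c: abs(c - head))  # greedy choice
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

print(sstf_order(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# [65, 67, 37, 14, 98, 122, 124, 183] -- 236 cylinders total, vs. 640 for FCFS
```

The starvation risk is visible here: if new requests near the head keep arriving, a distant request like 183 may wait indefinitely.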
14
SCAN (HDD)
• Disk arm starts at one end of disk, moves towards other end, servicing
requests until it reaches other end; head then reverses direction
• Also known as elevator algorithm
• Works well if requests are uniformly dense; a large density of requests at
other end of disk will wait the longest
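The sweep can be sketched as a sort-and-split (an illustrative simulation, not from the slides); with the head at 53 moving toward cylinder 0 first, the example queue is serviced as shown:

```python
def scan_order(head, requests, direction="down"):
    """SCAN: sweep in one direction to the disk edge, then reverse."""
    below = sorted((c for c in requests if c <= head), reverse=True)
    above = sorted(c for c in requests if c > head)
    return below + above if direction == "down" else above + below

print(scan_order(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# [37, 14, 65, 67, 98, 122, 124, 183]
```

On a 200-cylinder disk the head travels 53 + 183 = 236 cylinders, because SCAN runs all the way to cylinder 0 before reversing even though no request sits there.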
15
Circular SCAN (C-SCAN) (HDD)
• More uniform wait time than SCAN
• When head reaches one end, return to beginning of disk instead of reversing
direction
• Treats cylinders as circular list that wraps around from last cylinder to first
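The wrap-around changes only where requests below the head land in the order; a sketch for an upward-sweeping head:

```python
def cscan_order(head, requests):
    """C-SCAN: sweep upward servicing requests, jump back to cylinder 0, repeat."""
    above = sorted(c for c in requests if c >= head)
    below = sorted(c for c in requests if c < head)   # serviced after the wrap
    return above + below

print(cscan_order(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# [65, 67, 98, 122, 124, 183, 14, 37]
```

Requests just behind the head (like 37) now wait nearly a full sweep, but no request ever waits much longer than that, which is the source of the more uniform wait time.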
16
LOOK and Circular-LOOK (C-LOOK) (HDD)
• Arm only goes as far as last request in each direction, then reverses direction
immediately, without going all of the way to end of disk
• C-LOOK is LOOK with circular list
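Reversing at the last request rather than the disk edge trims the travel; a sketch for LOOK (downward first) on the running example:

```python
def look_movement(head, requests):
    """LOOK, sweeping down first: reverse at the last request, not the disk edge."""
    below = sorted((c for c in requests if c <= head), reverse=True)
    above = sorted(c for c in requests if c > head)
    path = [head] + below + above
    return sum(abs(b - a) for a, b in zip(path, path[1:]))

print(look_movement(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# 208 cylinders, vs. 236 for SCAN (which travels all the way to cylinder 0)
```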
17
Disk Management (HDD)
• Low-level formatting: dividing a disk into sectors that disk controller can read
and write
• Each sector holds header information, data, plus error correction code
(ECC)
• Usually done by manufacturer
• Usually, disk is partitioned into one or more groups of cylinders, where each
partition treated as a logical disk
• Logical formatting: creating a file system
18
Disk Management (SSD)
• On flash drives, a bit can always be flipped from 1 to 0, but the reverse is not
physically possible without an erase
• Must instead erase entire flash sector (flip all bits from 0 to 1)
• Erasing is slow, from 3 ms per sector for NAND flash up to 5 s for NOR flash
• No such thing as low-level formatting a flash drive, though file systems still
exist
• File systems optimized for SSDs operate very differently than ones
designed for HDDs
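The one-way nature of programming can be sketched in a few lines (an illustrative check, not any real controller's API):

```python
def can_program(old, new):
    # A flash program operation only flips bits 1 -> 0, so every 1-bit
    # in the new value must already be 1 in the old value.
    return (new & ~old & 0xFF) == 0

print(can_program(0xFF, 0x3C))  # True: a freshly erased byte accepts any value
print(can_program(0x3C, 0x0C))  # True: only clears bits
print(can_program(0x0C, 0x3C))  # False: 0 -> 1 requires erasing the whole block
```

This constraint is why SSD-aware file systems prefer writing updated data to fresh pages and erasing stale blocks later, rather than updating in place.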
19
Bad Blocks
• For HDD, bad blocks discovered during low-level initialization
• Disk controller automatically skips over that block when formatting disk
• For SSD, bad blocks discovered during erase
• OS maintains list of bad blocks and skips them during operations
• Bad blocks can also be found during operations, when controller/OS
calculates a different ECC than one currently stored
• Spare sectors: extra space not normally allocated, used when replacing a bad
sector
20
Boot Block
• At startup, ROM bootstrap loads OS from a fixed location on HDD/SSD
• On x86 non-EFI systems, the master boot record (MBR) is stored on first
sector; it contains a boot loader and partition table
21
Modern x86 Booting
• Modern systems employ the extensible firmware interface (EFI) instead of
traditional BIOS
• EFI systems can boot from a variety of sources:
• Local hard disk, with a GUID partition table (GPT)
• Over a network, via preboot execution environment (PXE)
• Regardless of boot medium, EFI then passes various parameters to the
operating system
• Requires kernel to be EFI aware (all modern desktop OSes are)
22
EFI Booting
23