Lecture 22: Mass Storage
Spring 2016, Jason Tang
Slides based upon Operating System Concepts slides, http://codex.cs.yale.edu/avi/os-book/OS9/slide-dir/index.html
Copyright Silberschatz, Galvin, and Gagne, 2013

Topics
• Mass Storage Systems
• Disk Scheduling
• Disk Management

Mass Storage Systems
• After startup, the OS loads into RAM applications that are stored elsewhere:
• ROM / EPROM / FPGA
• Magnetic hard disk
• Non-volatile random-access memory (NVRAM)
• Tape drive
• Or others: boot CD, ZIP drive, punch card, …

EPROM
• Erasable programmable read-only memory
• Manufacturer or OEM burns an image into the EPROM
• Used in older computing systems and in modern embedded systems

Magnetic Hard Disk
• Spins magnetic platters, typically at 5400 or 7200 RPM (or higher for server-grade drives)
• Transfer rate: rate at which data flows between drive and computer
• Positioning time (random-access time): time to move the disk arm to the desired cylinder (seek time) plus time for the desired sector to rotate under the disk head (rotational latency)
• Head crash: when the disk head hits the platter
• Attached to the computer via some bus: SCSI, IDE, SATA, Fibre Channel, USB, Thunderbolt, others
• Host controller in the computer uses the bus to talk to the disk controller

Magnetic Hard Disk (HDD)
• Multi-terabyte drives are common
• Theoretical SATA III speed is 6 Gb/s
• Seek times range from 3 ms to 15 ms
• Average seek time is calculated based on 1/3 of the tracks
• Average rotational latency ranges from 2.0 ms to 7.14 ms
• Most drives also have an internal cache, approximately 64 MiB

Hard Disk Performance
• Average access time = average seek time + average rotational latency
• Server-grade hard disks average 5 ms (3 ms seek time + 2 ms latency)
• Average I/O time = average access time + (transfer size / transfer rate) + controller overhead
• Example: transfer 4 KB of data with 9 ms average access time, 1 Gb/s transfer rate, 0.1 ms controller overhead
• = 9 ms + (4 KB × (8 b / 1 B) / (1 Gb/s)) + 0.1 ms = 9 ms + 0.032 ms + 0.1 ms
• = 9.132 ms (or about 100,000 times slower than modern RAM)

Non-Volatile RAM (NVRAM)
• Used in modern computers for secondary storage, like a hard drive
• Also known as flash memory, a flash drive, or a solid-state drive (SSD)
• Two variants: NAND flash (most common) or NOR flash
• Requires less power and is much faster than a magnetic hard disk
• Does not suffer from head crashes
• Block erasure: must erase an entire block at a time
• Memory wear: each block may only be erased a finite number of times (usually over 10,000)

Tape Drive
• Early read/write secondary storage medium
• Linear access: the tape drive must fast-forward or rewind the spool of tape to the correct place; very slow
• Can hold up to 200 TB
• Transfer rate on the order of 140 MB/s (40 times slower than a hard disk)
• Origin of the tar (tape archive) command

Disk Structure
• Addressed as a large one-dimensional array of logical blocks
• A block is the smallest unit of transfer; on an HDD usually 512 or 4096 bytes, on NAND flash anywhere from 512 bytes to 128 KiB
• On an HDD, sector 0 is the first sector of the first track on the outermost cylinder
• Logical-to-physical address mapping is tricky, due to bad sectors
• On an HDD, the number of sectors per track is not constant: the platter spins at a constant speed, so outer (longer) tracks hold more sectors than inner ones

I/O Scheduling
• Just as the OS has a process scheduler to decide which process runs next, it has an I/O scheduler to decide which disk operation to perform next
• On an HDD, minimize seek time by reducing how far the disk arm must move to the correct cylinder and how far the platter must rotate
• On an SSD, combine write requests to the same block
• Disk bandwidth: total number of bytes transferred, divided by the total time between the start of the first request and the completion of the last transfer
• While data is being transferred via DMA, the OS can do other things

Disk Scheduling
• A disk I/O request includes the input or output mode, disk address, memory address, and number of sectors to transfer
• OS maintains a queue of requests
• An idle disk can immediately work on an I/O request, while requests are queued for a busy disk
• Optimization algorithms only make sense when a queue exists
• HDD controllers have small buffers and can manage a queue of I/O requests

FCFS Scheduling (HDD)
• Example: requests for cylinders 98, 183, 37, 122, 14, 124, 65, 67; head is currently on cylinder 53
• Requests are serviced first-come, first-served
• Note the wild swing from cylinder 37 to 122 to 14; it would be faster if 37 and 14 were serviced consecutively

SSTF (HDD)
• Shortest Seek Time First selects the request with the minimum seek time from the current head position
• A form of shortest-job-first scheduling, but may starve a request

SCAN (HDD)
• Disk arm starts at one end of the disk and moves toward the other end, servicing requests along the way; upon reaching the other end, the head reverses direction
• Also known as the elevator algorithm
• Works well if requests are uniformly dense; a large density of requests at the far end of the disk will wait the longest

Circular SCAN (C-SCAN) (HDD)
• Provides a more uniform wait time than SCAN
• When the head reaches one end, it returns to the beginning of the disk instead of reversing direction
• Treats the cylinders as a circular list that wraps around from the last cylinder to the first

LOOK and Circular-LOOK (C-LOOK) (HDD)
• Arm only goes as far as the last request in each direction, then reverses immediately, without going all the way to the end of the disk
• C-LOOK is LOOK with a circular list

Disk Management (HDD)
• Low-level formatting: dividing a disk into sectors that the disk controller can read and write
• Each sector holds header information, data, plus an error-correcting code (ECC)
• Usually done by the manufacturer
• Usually, the disk is then partitioned into one or more groups of cylinders, where each partition is treated as a logical disk
• Logical formatting: creating a file system

Disk Management (SSD)
• On flash drives, a single bit can always be flipped from 1 to 0, but the reverse is not physically possible
• Must instead erase an entire flash sector (flipping all bits back to 1)
• Erasing is slow, from 3 ms per sector for NAND flash up to 5 s for NOR flash
• There is no such thing as low-level formatting a flash drive, though file systems still exist
• File systems optimized for SSDs operate very differently from ones designed for HDDs

Bad Blocks
• On an HDD, bad blocks are discovered during low-level formatting
• The disk controller automatically skips over those blocks when formatting the disk
• On an SSD, bad blocks are discovered during erase
• The OS maintains a list of bad blocks and skips them during operations
• Bad blocks can also be found during operation, when the controller/OS calculates a different ECC than the one currently stored
• Spare sectors: extra space not normally allocated, used to replace a bad sector

Boot Block
• At startup, the ROM bootstrap loads the OS from a fixed location on the HDD/SSD
• On x86 non-EFI systems, the master boot record (MBR) is stored in the first sector; it contains a boot loader and the partition table

Modern x86 Booting
• Modern systems employ the Extensible Firmware Interface (EFI) instead of the traditional BIOS
• EFI systems can boot from a variety of sources:
• A local hard disk, with a GUID partition table (GPT)
• Over a network, via the Preboot Execution Environment (PXE)
• Regardless of boot medium, EFI then passes various parameters to the operating system
• Requires the kernel to be EFI-aware (all modern desktop OSes are)

EFI Booting
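The average I/O time example from the Hard Disk Performance slide can be checked numerically. This is a minimal sketch in Python; the function name is hypothetical, and it follows the slide's 1 KB = 1000 B convention:

```python
# Average I/O time = average access time + (transfer size / transfer rate)
#                    + controller overhead

def avg_io_time_ms(access_ms, transfer_bytes, rate_bits_per_sec, overhead_ms):
    """Return the average I/O time in milliseconds."""
    transfer_ms = transfer_bytes * 8 / rate_bits_per_sec * 1000  # seconds -> ms
    return access_ms + transfer_ms + overhead_ms

# Slide example: 4 KB transfer, 9 ms access time, 1 Gb/s link, 0.1 ms overhead
t = avg_io_time_ms(9.0, 4_000, 1_000_000_000, 0.1)
print(round(t, 3))  # 9.132
```

The transfer term (0.032 ms) is dwarfed by the mechanical access time, which is why HDD scheduling focuses on seek and rotational delay rather than transfer rate.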
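The Disk Structure slide's logical-block addressing can be illustrated with the classic simplified LBA-to-CHS mapping. This sketch assumes a constant number of sectors per track; real drives use zoned recording and remap bad sectors, which is exactly why the slide calls logical-to-physical mapping tricky:

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Simplified logical block address -> (cylinder, head, sector).

    Assumes every track holds the same number of sectors; by convention,
    sectors within a track are numbered starting from 1.
    """
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1
    return cylinder, head, sector

print(lba_to_chs(0, 16, 63))     # (0, 0, 1): first sector of the disk
print(lba_to_chs(1008, 16, 63))  # (1, 0, 1): one full cylinder (16*63) later
```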
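The FCFS and SSTF examples can be compared by simulating total head movement for the slide's request queue (cylinders 98, 183, 37, 122, 14, 124, 65, 67, head at 53). A sketch with hypothetical helper names:

```python
def fcfs(start, requests):
    """Total head movement when servicing requests in arrival order."""
    total, pos = 0, start
    for r in requests:
        total += abs(pos - r)
        pos = r
    return total

def sstf(start, requests):
    """Total head movement when always choosing the closest pending request."""
    pending, total, pos = list(requests), 0, start
    while pending:
        nxt = min(pending, key=lambda r: abs(pos - r))  # shortest seek next
        total += abs(pos - nxt)
        pos = nxt
        pending.remove(nxt)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, queue))  # 640 cylinders
print(sstf(53, queue))  # 236 cylinders
```

SSTF cuts total movement by almost two thirds on this queue, but note the starvation risk: a steady stream of nearby requests can postpone a distant one indefinitely.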
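The SSD write/erase asymmetry from the Disk Management (SSD) slide can be modeled as: programming a cell ANDs new bits into place (it can only clear 1s to 0s), while only a whole-block erase restores all bits to 1. A toy sketch, with invented class and sizes:

```python
BLOCK_SIZE = 4  # bytes per block; toy value, real blocks are far larger

class FlashBlock:
    def __init__(self):
        self.data = [0xFF] * BLOCK_SIZE  # erased state: all bits set to 1

    def program(self, offset, value):
        # Programming can only clear bits: new = old AND value
        self.data[offset] &= value

    def erase(self):
        # Erasure resets the entire block to all 1s in one slow operation
        self.data = [0xFF] * BLOCK_SIZE

b = FlashBlock()
b.program(0, 0b10101010)
print(bin(b.data[0]))     # 0b10101010
b.program(0, 0b11001100)  # rewriting can only clear more bits...
print(bin(b.data[0]))     # 0b10001000
b.erase()                 # ...so restoring any 1 forces a full block erase
print(bin(b.data[0]))     # 0b11111111
```

This is why SSD-aware file systems batch writes into fresh blocks and garbage-collect stale data, instead of overwriting sectors in place as HDD file systems do.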