I/O MANAGEMENT

Slide 1
Overview:
➜ Categories of I/O devices and their integration with processor and bus
➜ Design of I/O subsystems
➜ I/O buffering and disk scheduling
➜ RAID

CATEGORIES OF I/O DEVICES

Slide 2
There exists a large variety of I/O devices:
➜ Many of them have different properties
➜ They seem to require a range of different interfaces
  • We don't want a new device interface for every new device
  • Diverse, but similar interfaces lead to duplication of code
➜ Challenge: uniform and efficient approach to I/O
Slide 3
[Figure: CPU, memory, video controller, keyboard controller, floppy disk controller and hard disk controller attached to a common bus; the monitor, keyboard, floppy disk drive and hard disk drive hang off their respective controllers.]
Controller:
➜ can often handle more than one identical device
➜ low-level interface to the actual device
➜ for disks: perform error checks, assemble data in a buffer

Slide 4
Three classes of devices (by usage):
➜ Human readable:
  • For communication with the user
  • Keyboard, mouse, video display, printer
➜ Machine readable:
  • For communication with electronic equipment
  • Disks, tapes, sensors, controllers, actuators
➜ Remote communication:
  • For communication with remote devices
  • Modems, Ethernet, wireless
WHICH DIFFERENCES IMPACT DEVICE HANDLING?

Slide 5
➜ Data rate
➜ Complexity of control
  - e.g., line printer versus high-speed graphics
➜ Unit of transfer
  - stream-oriented: e.g., terminal I/O
  - block-oriented: e.g., disk I/O
➜ Data representation (encoding)
➜ Error conditions (types, severity, recovery, etc.)
Hopefully there is similarity within a class, but there are exceptions!
Slide 6
Typical data rates of I/O devices:
[Figure: data rates (bps) on a logarithmic scale from about 10^1 to 10^9, in increasing order: keyboard, mouse, modem, floppy disk, laser printer, scanner, optical disk, Ethernet, hard disk, graphics display, Gigabit Ethernet.]

ACCESSING I/O DEVICES — HARDWARE

Slide 7
Interfacing alternatives:
➜ Port mapped I/O
➜ Memory mapped I/O
➜ Direct memory access (DMA)

Slide 8
Port mapped versus memory mapped I/O:
➜ Memory-mapped I/O:
  • I/O device registers and memory are mapped into the normal memory address range
  • Standard memory access is used
  - any memory access function can be used to manipulate an I/O device
➜ Port-mapped I/O:
  • I/O devices are accessed using special I/O port instructions
  • Only part of the address lines are needed
  - standard PC architecture: 16-bit port address space
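To make the contrast concrete, here is a minimal C sketch. The memory-mapped register addresses and the status bit are invented for illustration; the port-mapped half uses the classic PC COM1 serial port via the x86 inb()/outb() helpers from Linux's <sys/io.h> (which need ioperm() privileges in practice).

#include <stdint.h>
#include <sys/io.h>        /* x86 port I/O: inb(), outb() (Linux) */

/* Memory-mapped I/O: device registers live at some physical address
 * (0x10000000 is made up); ordinary volatile loads/stores reach them. */
#define UART_DATA   (*(volatile uint8_t *)0x10000000)
#define UART_STATUS (*(volatile uint8_t *)0x10000001)
#define TX_READY    0x01   /* hypothetical "transmitter idle" bit */

static void mmio_putc(char c)
{
    while (!(UART_STATUS & TX_READY))   /* plain load hits the device  */
        ;
    UART_DATA = (uint8_t)c;             /* plain store hits the device */
}

/* Port-mapped I/O: a separate 16-bit port address space, reachable
 * only through special instructions (in/out on x86).                */
#define COM1_DATA 0x3F8    /* classic PC COM1 data port */
#define COM1_LSR  0x3FD    /* line status register      */

static void pio_putc(char c)
{
    while (!(inb(COM1_LSR) & 0x20))     /* wait for TX buffer empty */
        ;
    outb((uint8_t)c, COM1_DATA);
}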
Slide 9
[Figure: (a) port mapped: two separate address spaces, one for memory (0 to 0xFFFF…) and one for I/O ports; (b) memory mapped: one address space shared by memory and devices; (c) hybrid: memory-mapped data buffers plus separate I/O ports.]

Slide 10
Memory mapped I/O:
✔ directly accessible in high-level languages
✔ no need for a special protection mechanism
✔ not necessary to load contents into main memory/registers first
✘ memory modules and I/O devices must inspect all memory references
✘ complex problem if multiple buses are present
✘ interference with caching

DIRECT MEMORY ACCESS (DMA)

Slide 11
Basics:
➜ Takes control of the system from the CPU to transfer data to and from memory over the system bus
➜ Cycle stealing is used to transfer data on the system bus
  - the instruction cycle is suspended so data can be transferred
  - the CPU pauses for one bus cycle
➜ No interrupts occur (i.e., no expensive context switches)

Slide 12
DMA transfer of a disk block:
[Figure: 1. the CPU programs the DMA controller (buffer address, count, control registers); 2. the DMA controller requests a transfer to memory from the disk controller; 3. data is transferred over the bus into main memory; 4. the disk controller acknowledges the DMA controller, which interrupts the CPU when done.]
Slide 13
[Figure: instruction cycle over time (fetch instruction, decode instruction, fetch operand, execute instruction, store result, process interrupt); DMA breakpoints can occur between any two processor cycles, interrupt breakpoints only at the end of an instruction cycle.]

Slide 14
➜ Cycle stealing causes the CPU to execute more slowly
➜ most buses as well as DMA controllers can operate in word-by-word or block mode
  - word-by-word mode: use cycle stealing to transfer data
  - block mode: DMA controller acquires the bus to transfer data
➜ typically, devices like disks, sound or graphics cards use DMA

Processor transfers data vs DMA:
➜ Processor transfers data:
  - Processor copies data from main memory into processor registers or memory
  - Large data volume ⇒ CPU load can be very high
➜ Direct memory access (DMA):
  - Processor forwards the address range of the data to a DMA controller
  - The DMA controller performs the data transfer without processor intervention (but locks the memory bus)
  - Slows down the processor, but the overhead is much less

Slide 15
Configurations: single bus, detached DMA:
[Figure (a): processor, DMA controller, memory and I/O modules all attached to a single system bus.]

Slide 16
Single bus, integrated DMA-I/O:
[Figure (b): each DMA controller is integrated with its I/O modules, so device transfers do not cross the system bus.]
➜ Reduce busy cycles by integrating the DMA and I/O devices
Slide 17
Separate I/O bus:
[Figure (c): processor, memory and DMA controller on the system bus; the I/O modules hang off a separate I/O bus behind the DMA controller.]
➜ Path between the DMA module and the I/O modules that does not include the system bus

ACCESSING I/O DEVICES

Slide 18
Three general approaches on the software level:
➀ Programmed I/O
  - poll on device
➁ Interrupt-driven I/O
  - suspend when device not ready
➂ I/O using direct memory access (DMA)
  - use extra hardware component
Let's have a look at each of the three methods.

PROGRAMMED AND INTERRUPT DRIVEN I/O

Slide 19
Example: what happens when the CPU reads from disk?
➀ Disk controller
  - reads the block bit by bit into its buffer
  - computes a checksum to detect read errors
  - causes an interrupt
➁ CPU copies the block byte by byte into main memory

PROGRAMMED I/O

Slide 20
Read:
➀ poll on status of device
➁ issue read command to device
➂ wait for read to complete, poll on status of device
➃ copy data from device register into main memory
➄ jump to (1) until all data read
Write:
➀ poll on status of device
➁ copy data to device register
➂ issue write command to device
➃ jump to (1) until all data written
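A minimal C sketch of the read loop, assuming hypothetical memory-mapped status, data and command registers with invented bit masks:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-mapped device registers (addresses invented). */
#define DEV_STATUS (*(volatile uint8_t  *)0x10000000)
#define DEV_DATA   (*(volatile uint32_t *)0x10000004)
#define DEV_CMD    (*(volatile uint8_t  *)0x10000008)

#define ST_READY 0x01   /* device idle, can accept a command */
#define ST_DONE  0x02   /* requested word is in DEV_DATA     */
#define CMD_READ 0x01

/* Programmed I/O read: the CPU does all the work, polling between
 * every word, following steps (1)-(5) from the slide.              */
void pio_read(uint32_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        while (!(DEV_STATUS & ST_READY))  /* (1) poll on status      */
            ;
        DEV_CMD = CMD_READ;               /* (2) issue read command  */
        while (!(DEV_STATUS & ST_DONE))   /* (3) poll for completion */
            ;
        buf[i] = DEV_DATA;                /* (4) copy to main memory */
    }                                     /* (5) repeat until done   */
}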
Slide 21
Properties:
➜ programmed I/O is suitable in some situations (e.g., single-threaded, embedded systems)
➜ usually inefficient, a waste of CPU cycles

INTERRUPT-DRIVEN I/O

Slide 22
➜ avoid polling on the device!
[Figure: disk, keyboard, clock and printer attached to an interrupt controller on the bus; 1. the device is finished, 2. the controller issues an interrupt, 3. the CPU acknowledges the interrupt.]

Slide 23
Steps involved:
➀ issue read/write command to device
➁ wait for the corresponding interrupt, suspend
➂ device ready: acknowledge I/O interrupt
➃ handle interrupt:
  - read data from device register
  - write data to device register
Properties:
➜ high overhead due to frequent context switching
➜ more efficient use of the CPU than programmed I/O
➜ but still a waste of CPU cycles, as the CPU does all the work
Alternative: use extra hardware (a direct memory access controller) to offload some of the work from the CPU.

DIRECT MEMORY ACCESS (DMA)

Slide 24
How DMA works:
➀ CPU tells the DMA controller what it should copy, and where it should copy it to
➁ DMA controller issues a read request to the disk controller
➂ disk controller
  - transfers the word into main memory
  - signals the DMA controller
➃ DMA controller decrements its counter
  - if all words are transferred, signal the CPU
  - otherwise, continue with step (2)
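A sketch of the CPU's side of this protocol, assuming an invented register layout for the DMA controller; steps ➁ and ➂ happen in hardware, so the CPU only programs the controller and later takes the completion interrupt.

#include <stdint.h>

/* Hypothetical DMA controller registers (layout invented). */
struct dma_ctrl {
    volatile uint32_t addr;    /* destination address in main memory */
    volatile uint32_t count;   /* number of words to transfer        */
    volatile uint32_t control; /* start bit, direction, etc.         */
};
#define DMA ((struct dma_ctrl *)0x20000000)   /* made-up base address */
#define DMA_START 0x1

static volatile int dma_done;

/* Step (1): the CPU programs the DMA controller, then is free to do
 * something else while the transfer proceeds.                        */
void dma_start_read(void *dst, uint32_t nwords)
{
    dma_done = 0;
    DMA->addr    = (uint32_t)(uintptr_t)dst;
    DMA->count   = nwords;
    DMA->control = DMA_START;
    /* Steps (2)-(3) happen in hardware: the controller requests each
     * word from the disk controller, which stores it straight to memory. */
}

/* Step (4): when the counter reaches zero the controller raises an
 * interrupt; the handler just records completion.                    */
void dma_interrupt_handler(void)
{
    dma_done = 1;              /* a real OS would wake the waiting driver */
}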
THE EVOLUTION OF THE I/O FUNCTION

Slides 25–26
Hardware changes trigger changes in the handling of I/O devices:
➀ Processor directly controls a peripheral device
➁ Controller or I/O module is added
  • Programmed I/O without interrupts
  • Example: Universal Asynchronous Receiver Transmitter (UART)
  • CPU reads or writes bytes to the I/O controller
  • CPU does not need to handle details of the external device
➂ Controller or I/O module with interrupts
  • CPU does not spend time waiting on completion of an operation
➃ DMA
  • CPU involved at beginning and end only
➄ I/O module has a separate processor
  • Example: SCSI controller
  • Controller CPU executes SCSI code out of main memory
➅ I/O module is a computer in its own right
  • Example: Myrinet multi-gigabit network controller

THE QUEST FOR GENERALITY/UNIFORMITY

Slide 27
Ideal state:
➜ handle all I/O devices in the same way (both in the OS and in user processes)
Problem:
➜ Diversity of I/O devices
➜ Especially, different access methods (random access versus stream-based) as well as vastly different data rates
➜ Generality often compromises efficiency!

I/O SOFTWARE LAYERS

Slide 28
I/O software is divided into a number of layers, to provide adequate abstraction and modularisation:
  User-level I/O software
  Device-independent operating system software
  Device drivers
  Interrupt handlers
  Hardware
Slide 29
I/O organisation:
[Figure: three software stacks between user processes and hardware; (a) local peripheral device: logical I/O, device I/O, scheduling & control; (b) communications port: communication architecture, device I/O, scheduling & control; (c) file system: directory management, file system, physical organization, device I/O, scheduling & control.]

I/O INTERRUPT HANDLER

Slide 30
➜ Interrupt handlers are best "hidden"
➜ Can execute almost any time; raises complex concurrency issues
➜ Generally, a driver starting an I/O operation blocks until an interrupt notifies it of completion (dev_read())
➜ The interrupt handler does its task, then unblocks the driver
Slide 31
Main steps involved:
➀ save state that has not already been saved by hardware
➁ set up context (address space) for the specific interrupt handler (stack, TLB, MMU, page table)
➂ set up stack for the interrupt service procedure (usually runs on the kernel stack of the current process)
➃ acknowledge the interrupt controller, re-enable other interrupts
➄ run the specific interrupt handler
  • find out what caused the interrupt (received network packet, disk read finished, etc.)
  • get information from the device controller
➅ re-enable the interrupt
➆ invoke the scheduler

DEVICE DRIVER

Slide 32
Code to control a specific device:
➜ usually differs from device to device
➜ sometimes suitable for a class of devices adhering to a standard (e.g., SCSI)
➜ common interface to the rest of the OS, depending on the type (block, character, network) of device
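Tying the last two slides together: a schematic C sketch of how a driver's read operation blocks and how the interrupt handler unblocks it. All primitives here are hypothetical placeholders for what a real kernel provides.

#include <stddef.h>

extern void device_start_read(void *buf, size_t len); /* program controller */
extern void sleep_on(void *event);                    /* block current proc */
extern void wakeup(void *event);                      /* unblock sleepers   */

static int io_complete;            /* event the driver sleeps on */

/* The driver starts the I/O and blocks until notified. */
size_t dev_read(void *buf, size_t len)
{
    device_start_read(buf, len);   /* set up device registers           */
    sleep_on(&io_complete);        /* block until the interrupt arrives */
    return len;                    /* data is in buf by now             */
}

/* The interrupt handler does its task, then unblocks the driver. */
void dev_interrupt_handler(void)
{
    /* acknowledge the controller, check status, etc., then: */
    wakeup(&io_complete);
}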
Slide 33
[Figure: the user program runs in user space; the rest of the operating system, together with the printer, camcorder and CD-ROM drivers, runs in kernel space; each driver talks to its controller, which talks to the device.]

Slide 34
Main steps involved:
➀ check input parameters
➁ translate the request (open, close, read, ...) into the appropriate sequence of commands for the particular hardware
➂ convert into device-specific format, e.g., for a disk:
  - linear block number into head, track, sector, cylinder number
➃ check if the device is available, queue the request if necessary
➄ program the device controller (may block, depending on the device)
Device drivers also initialise the hardware at boot time and shut it down cleanly.
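Step ➂, the linear-block-to-geometry conversion, is simple arithmetic. A minimal C sketch, assuming a hypothetical geometry structure and the convention that sectors are numbered from 1:

#include <stdint.h>

/* Hypothetical drive geometry, for illustration only. */
struct disk_geometry {
    uint32_t heads;              /* surfaces per cylinder */
    uint32_t sectors_per_track;
};

struct chs { uint32_t cylinder, head, sector; };

/* Convert a linear block number into (cylinder, head, sector). */
struct chs block_to_chs(uint32_t block, const struct disk_geometry *g)
{
    struct chs c;
    uint32_t per_cyl = g->heads * g->sectors_per_track;

    c.cylinder = block / per_cyl;
    c.head     = (block % per_cyl) / g->sectors_per_track;
    c.sector   = (block % g->sectors_per_track) + 1;  /* 1-based */
    return c;
}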
DEVICE INDEPENDENT I/O SOFTWARE

Slide 35
➜ Commonality between drivers of different classes
➜ Split the software into a device-dependent and a device-independent part
Device independent software is responsible for:
➜ Error reporting
➜ Uniform interfacing to device drivers
➜ Buffering
➜ Allocating/releasing devices
➜ Providing device-independent block sizes
We will look into each of these tasks separately.

UNIFORM INTERFACING

Slide 36
Design goal:
➜ the interface to all device drivers should be the same
➜ may not be possible, but there are few classes of different devices, with high similarity within a class
➜ provides an abstraction layer over the concrete device
➜ uniform kernel interface for device code
  - kmalloc, installing IRQ handlers
  - allows the kernel to evolve without breaking existing drivers
Naming of devices:
➜ map symbolic device names to drivers
➜ Unix, Windows 2000:
  - devices appear in the file system
  - the usual file protection rules apply to devices
Slide 37
Unix device files:
➜ Uniform device interface: devices as files
  - read()
  - write()
  - seek()
  - ioctl(), etc.
➜ Main attributes of a device file:
  - Device type: block versus character (stream) devices
  - Major number (1–255): identifies the driver (device group)
  - Minor number (8 bit): identifies a specific device within a group
Slide 38
Implementation of device files:
➜ Virtual File System (VFS)
  • re-directs file operations to the device driver
  • the driver is determined by the major number
Examples:

  Name          Type   Major  Minor  Description
  /dev/fd0      block      2      0  floppy disk
  /dev/hda      block      3      0  first IDE disk
  /dev/hda2     block      3      2  2nd primary partition of IDE disk
  /dev/ttyp0    char       3      0  terminal
  /dev/console  char       5      1  console
  /dev/null     char       1      3  null device

Slide 39
Some I/O devices have no device file:
➜ network interfaces are handled differently
➜ however, there is a symbolic name for network interfaces (eth0)
➜ the device name is associated with a network address
➜ user-level interface: sockets (a BSD invention)
➜ sockets are also files in the Unix file system, but offer a different interface
  • socket(), bind(), receive(), send() instead of open(), read() and write()

Slide 40
Device drivers:
➜ Device-dependent low-level code
  • converts between device operations and standard file operations (read(), write(), ioctl(), etc.)
➜ Device drivers in Linux are:
  • statically linked into the kernel, or
  • dynamically loaded as kernel modules
➜ The kernel provides standardised access to DMA etc.
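From user space, these attributes can be inspected with stat(); for instance, the following small program, using the standard major()/minor() macros, reproduces the /dev/null row of the table above.

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

int main(void)
{
    struct stat st;
    if (stat("/dev/null", &st) != 0) {
        perror("stat");
        return 1;
    }
    /* st_rdev encodes the device's major and minor numbers. */
    printf("%s device, major %u, minor %u\n",
           S_ISBLK(st.st_mode) ? "block" : "char",
           major(st.st_rdev), minor(st.st_rdev));
    return 0;   /* prints: char device, major 1, minor 3 */
}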
Slide 41
Association of device drivers with device files:
➜ VFS maintains tables of device file class descriptors
  • chrdevs for character devices
  • blkdevs for block devices
  • indexed by major device number
  • each entry contains a file operation table
  • a device driver registers itself with the VFS by providing an entry

struct file_operations {
    ssize_t (*read) (struct file *, char *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
    int (*ioctl) (struct inode *, struct file *,
                  unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*lock) (struct file *, int, struct file_lock *);
    ...
};

Slide 42
Device access via device files:
➜ On file open(), the VFS does the following:
  • retrieves the file type as well as the major and minor number
  • indexes either chrdevs or blkdevs with the major number
  • places the file operations table into the file descriptor
  • invokes the open() member of the file operations table
➜ On other file operations:
  • the VFS performs an indirect jump to the handler function via the file operations table
➜ The file operations table provides a small and well-defined interface to most devices
➜ However, the ioctl() entry is a kind of "back door":
  • implements functionality not covered by the other routines
  • ugly!
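For illustration, this is roughly how a Linux 2.x character driver registered its file operation table with the VFS; a minimal sketch with an invented major number and no-op operations, not a complete module.

/* Minimal sketch of a Linux 2.x character driver registration;
 * error handling and module plumbing omitted.                   */
#include <linux/fs.h>
#include <linux/module.h>

#define MYDEV_MAJOR 42          /* made-up major number */

static ssize_t mydev_read(struct file *f, char *buf,
                          size_t len, loff_t *off)
{
    return 0;                   /* a real driver would talk to the device */
}

static int mydev_open(struct inode *ino, struct file *f)
{
    return 0;
}

static struct file_operations mydev_fops = {
    .read = mydev_read,         /* entries not set stay NULL */
    .open = mydev_open,
};

int init_module(void)
{
    /* The VFS indexes chrdevs[MYDEV_MAJOR] to find mydev_fops. */
    return register_chrdev(MYDEV_MAJOR, "mydev", &mydev_fops);
}

void cleanup_module(void)
{
    unregister_chrdev(MYDEV_MAJOR, "mydev");
}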
ALLOCATING AND RELEASING DEVICES

Slide 43
➜ some devices can only be used by a single process at a time
  - e.g., CD burner
➜ the OS must manage the allocation of devices
➜ if the device is mapped to a special file:
  - open and close to acquire and release the device
  - may be blocking or non-blocking (fail)

I/O BUFFERING

Slide 44
Why do we need buffering?
➜ Performance:
  • Put output data in a buffer and write it asynchronously
  • Batch several small writes into one large one
  • Some devices can only read/write largish blocks
  • Read ahead
➜ Locking of memory pages; deadlock avoidance:
  • Cannot swap out pending I/O data (especially with DMA)
  • Cannot deliver data to a process that is swapped out
  ➼ lock pages when an I/O operation is queued
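The first performance point, batching several small writes into one large one, can be sketched entirely in user space; the file name and buffer size below are arbitrary.

#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char buf[4096];
    size_t used = 0;
    const char *msg = "hello\n";
    int fd = open("/tmp/out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    for (int i = 0; i < 500; i++) {
        if (used + strlen(msg) > sizeof buf) {  /* flush when full */
            write(fd, buf, used);
            used = 0;
        }
        memcpy(buf + used, msg, strlen(msg));   /* cheap: memory only */
        used += strlen(msg);
    }
    if (used > 0)
        write(fd, buf, used);                   /* one large request */
    close(fd);
    return 0;
}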
Slide 45
Buffering schemes for input arriving from a modem:
[Figure: four variants of moving data from a modem to a user process.]
(a) no buffering
(b) buffer in user space
  - what happens if the page is swapped out to disk?
(c) buffer in kernel space, copy to user space
(d) double buffering in the kernel
Slide 46
Buffering for network output:
[Figure: a user process sends a message over the network.]
➀ data is copied to kernel space; the user process does not block
➁ the driver copies it into the controller register
➂ copy to the network, then into the receiver's buffer
➃ send acknowledgement
➄ copy to kernel space, then to user space

ERROR REPORTING

Slide 48
Errors in the context of I/O are common and can occur on different levels:
➜ programming errors:
  - write to a read-only device
  - invalid buffer address
  ⇒ report error code to caller
➜ I/O errors:
  • device not present
  • storage medium defect
➜ critical errors:
  - critical data structure destroyed
USER SPACE I/O SOFTWARE

Slide 49
Library functions of different complexity:
➜ write
➜ fprintf
➜ graphics libraries
Spooling:
➜ Special OS processes (daemons) control the device
➜ Linux: check ls /var/spool to see what types of I/O operations use spooling
SUMMARY OF I/O SYSTEM LAYERS

Slide 50
I/O requests travel down the layers; replies travel back up:

  Layer                        I/O functions
  User processes               Make I/O call; format I/O; spooling
  Device-independent software  Naming, protection, blocking, buffering, allocation
  Device drivers               Set up device registers; check status
  Interrupt handlers           Wake up driver when I/O completed
  Hardware                     Perform I/O operation

HARD DISKS

Slide 51
After the general discussion of I/O, let's look at hard disks in detail:
➜ Disk hardware
  - architecture influences lower-level I/O software
➜ Disk formatting
➜ Disk scheduling
  - dealing with disk requests
➜ RAID

COMPONENTS OF A DISK DRIVE

Slide 52
[Figure: a stack of platters on a spindle, surfaces 0–9, with one read/write head per surface mounted on a boom; the arrow shows the direction of arm motion.]
TRACKS PER CYLINDER

Slide 53
Disk geometry:
[Figure: physical geometry of a zoned disk, with different numbers of sectors per track, next to the virtual geometry presented to the OS, with a constant number of sectors (0–31) per track.]

Slide 54
Disk geometry:
[Figure: each surface is divided into tracks and each track into sectors S1…SN; sectors are separated by inter-sector gaps, tracks by inter-track gaps.]
Slide 55
➜ modern disks are divided into zones
➜ different number of sectors per track in each zone
➜ present a virtual geometry to the OS
  - pretends to have an equal number of sectors per track
  - maps the virtual (cylinder, head, sector) coordinates to the real location

DISK FORMATTING

Slide 56
Low-level formatting:
Layout of a sector:
  | Preamble | Data | ECC |
- Preamble: marks the start of a sector; contains cylinder and sector number
- Data: usually a 512-byte section
- Error correction code (ECC): used to detect and correct errors
DISK FORMATTING — CYLINDER SKEW

Slide 57
[Figure: sector 0 of each successive track is offset (skewed) relative to the previous track, in the direction of disk rotation, so the head can switch tracks without losing a whole revolution.]

DISK FORMATTING — INTERLEAVING SECTORS

Slide 58
[Figure: (a) no interleaving, (b) single interleaving, (c) double interleaving of sector numbers around a track.]
➜ the system may not be able to keep up with the rotation speed
➜ to avoid interleaving sectors, modern controllers are able to buffer an entire track
Slide 59
Partitioning:
➜ the disk is divided into different partitions, each treated as a separate logical disk
➜ master boot record: boot code and partition table
➜ the partition table gives the starting sector and size of each partition
High-level formatting: each partition contains
➜ a boot block
➜ free storage admin info
➜ a root directory
➜ the type of the file system

DISK ERROR HANDLING

Slide 60
➜ due to the high density, most disks have bad sectors
➜ small defects can be masked using ECC
➜ major defects are handled by remapping to spare sectors
Substituting bad sectors:
[Figure: (a) a track with a bad sector and spare sectors, (b) the bad sector is replaced by a spare, (c) all subsequent sectors are shifted up.]
Slide 61
Bad-sector remapping can be done by:
➜ the disk controller
  - before the disk is shipped
  - dynamically, when repeated errors occur
  - remapping by maintaining an internal table or by rewriting the preamble
➜ the operating system: tricky (e.g., backups)

DISK SCHEDULING

Slide 62
➜ Disk performance is critical for system performance
➜ Management and ordering of disk access requests have a strong influence on
  - access time
  - bandwidth

PERFORMANCE PARAMETERS

Slide 63
Disk performance parameters:
➜ The disk is a moving device ⇒ it must be positioned correctly for I/O
➜ Execution of a disk operation involves:
  • Wait time: the process waits to be granted device access
    - Wait for device: time the request spends in a wait queue
    - Wait for channel: time until a shared I/O channel is available
  • Access time: time the hardware needs to position the head
    - Seek time: position the head over the desired track
    - Rotational delay (latency): spin the disk to the desired sector
  • Transfer time: the sectors to be read/written rotate below the head

Slide 64
[Figure: timeline of a disk request — wait for device, wait for channel, then seek, rotational delay and data transfer while the device is busy.]
➜ Important to optimise because:
  • huge speed gap between memory and disk
  • disk throughput is extremely sensitive to
    - request order ⇒ disk scheduling
    - placement of data on the disk ⇒ file system design
➜ The request scheduler must be aware of the disk geometry
PERFORMANCE PARAMETERS

Slide 65
➜ Seek time Ts: moving the head to the required track
  • not linear in the number of tracks to traverse:
    - startup and settling time
  • typical average seek time: a few milliseconds
➜ Rotational delay:
  - rotational speed, r, of 5,000 to 10,000 rpm
  - at 10,000 rpm, one revolution per 6 ms ⇒ average delay 3 ms
➜ Transfer time:
  • to transfer b bytes, with N bytes per track and rotational speed r:
      T = b / (rN)
  • total average access time:
      Ta = Ts + 1/(2r) + b/(rN)

Slide 66
A timing comparison:
➜ Ts = 2 ms, r = 10,000 rpm, 512 B sectors, 320 sectors/track
➜ Read a file with 2560 sectors (= 1.3 MB)
➜ File stored compactly (8 adjacent tracks):
    Read first track: average seek (2 ms) + rotational delay (3 ms) + read 320 sectors (6 ms) = 11 ms
    ⇒ all sectors: 11 + 7 × 9 = 74 ms
➜ Sectors distributed randomly over the disk:
    Read any sector: average seek (2 ms) + rotational delay (3 ms) + read 1 sector (0.01875 ms) = 5.01875 ms
    ⇒ all sectors: 2560 × 5.01875 = 12,848 ms

DISK SCHEDULING POLICY

Slide 67
Observations from the calculation:
➜ For a single disk there will be a number of I/O requests
➜ Processing them in random order leads to the worst possible performance
➜ Seek time is the reason for the differences in performance
➜ We need better strategies
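The numbers above can be checked mechanically; a small C program using the slide's parameters:

#include <stdio.h>

/* Reproduce the timing comparison: Ts = 2 ms, 10,000 rpm (one
 * revolution per 6 ms, average rotational delay 3 ms), 320
 * sectors/track, file of 2560 sectors on 8 adjacent tracks.     */
int main(void)
{
    double Ts = 2.0, rev = 6.0, rot = rev / 2.0;
    double track_read  = rev;                 /* 320 sectors = 6 ms */
    double sector_read = rev / 320.0;         /* 0.01875 ms         */

    double compact = (Ts + rot + track_read)  /* first track: 11 ms */
                   + 7 * (rot + track_read);  /* 7 more at 9 ms     */
    double random  = 2560 * (Ts + rot + sector_read);

    printf("compact: %.0f ms\n", compact);    /* 74 ms    */
    printf("random:  %.0f ms\n", random);     /* 12848 ms */
    return 0;
}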
Slide 68
First-in, first-out (FIFO):
➜ Process requests as they come in
Request tracks: 55, 58, 39, 18, 90, 160, 150, 38, 184
➜ Fair (no starvation!)
➜ Good for a few processes with clustered requests
➜ deteriorates to random if there are many processes
[Figure (a): FIFO, head position (tracks 0–199) over time, starting at track 100.]

Slide 69
Shortest Service Time First (SSTF):
➜ Select the request that minimises the seek time
Request tracks: 55, 58, 39, 18, 90, 160, 150, 38, 184
➜ service order: 90, 58, 55, 39, 38, 18, 150, 160, 184
➜ Minimising locally may not lead to the overall minimum!
➜ Can lead to starvation
[Figure (b): SSTF, head position over time.]
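SSTF is easy to state in code; this sketch replays the request stream with the head starting at track 100 and prints the service order above:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int req[] = {55, 58, 39, 18, 90, 160, 150, 38, 184};
    int n = sizeof req / sizeof req[0], head = 100;

    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)       /* find nearest pending track */
            if (req[i] >= 0 &&
                (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        head = req[best];
        req[best] = -1;                   /* mark as served */
        printf("%d ", head);              /* 90 58 55 39 38 18 150 160 184 */
    }
    printf("\n");
    return 0;
}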
Slide 70
SCAN (Elevator): move the head in one direction
➜ services requests in track order until it reaches the last track, then reverses direction
Request tracks: 55, 58, 39, 18, 90, 160, 150, 38, 184
➜ service order: 150, 160, 184, (200), 90, 58, 55, 39, 38, 18, (0)
➜ Similar to SSTF, but avoids starvation
➜ LOOK: a variant of SCAN, moves the head only to the last request in each direction: 150, 160, 184, 90, 58, 55, 39, 38, 18
➜ SCAN/LOOK are biased against the region most recently traversed
  - favour innermost and outermost tracks
  - make poor use of sequential reads (on the down-scan)
[Figure (c): SCAN, head position over time.]

Slide 71
Circular SCAN (C-SCAN):
➜ Like SCAN, but scanning in one direction only
  • when reaching the last track, go back to the first non-stop
Request tracks: 55, 58, 39, 18, 90, 160, 150, 38, 184
➜ Better use of locality (sequential reads)
➜ Better use of the disk controller's read-ahead cache
➜ Reduces the maximum delay compared to SCAN
[Figure (d): C-SCAN, head position over time.]

Slide 72
N-step-SCAN:
➜ SSTF, SCAN & C-SCAN allow device monopolisation
  • a process issues many requests to the same track
➜ N-step-SCAN segments the request queue:
  • subqueues of length N
  • process one queue at a time, using SCAN
  • add new requests to the other queue
Slide 73
FSCAN:
➜ Two queues
  • one is presently being processed
  • the other holds new incoming requests

Slide 74
Disk scheduling algorithms:

  Name    Description               Remarks
  Selection according to requestor:
  RSS     Random scheduling         For analysis and simulation
  FIFO    First in, first out       Fairest
  PRI     By process priority       Control outside disk mgmt
  LIFO    Last in, first out        Maximise locality & utilisation
  Selection according to requested item:
  SSTF    Shortest seek time first  High utilisation, small queues
  SCAN    Back and forth over disk  Better service distribution
  C-SCAN  One-way with fast return  Better worst-case time
  N-SCAN  SCAN of N recs at once    Service guarantee
  FSCAN   N-SCAN (N = init. queue)  Load sensitive

Slide 75
➜ Modern disks:
  • seek and rotational delay dominate performance
  • not efficient to read only a few sectors
  • the cache contains a substantial part of the currently read track
➜ assume the real disk geometry is the same as the virtual geometry
➜ if not, the controller can use a scheduling algorithm internally
So, does OS disk scheduling make any difference at all?

LINUX 2.4

Slide 76
➜ Used a version of C-SCAN
➜ no real-time support
➜ Writes and reads handled in the same way — read requests have to be prioritised
LINUX 2.6

Slide 77
Deadline I/O scheduler:
➜ two additional queues: a FIFO read queue with a deadline of 5 ms, and a FIFO write queue with a deadline of 500 ms
➜ each request is submitted to both queues
➜ if a request expires, the scheduler dispatches from the FIFO queue
✔ seeks minimised
✔ requests not starved
✔ read requests handled faster
✘ can result in a seek storm, with everything read from the FIFO queues

Slide 78
Anticipatory scheduling:
➜ Same, but anticipates dependent read requests
➜ After a read request: waits for a few ms
✔ can dramatically reduce the number of seek operations
✘ if no requests follow, the time is wasted

PERFORMANCE

Slide 79
➜ Reads
  - deadline: about 10 times faster for reads
  - AS: 100 times faster for streaming reads
➜ Writes
  - similar for writes
  - the deadline scheduler is slightly better than AS

RAID

Slide 80
➜ CPU performance has improved exponentially
➜ disk performance only by a factor of 5 to 10
➜ huge gap between CPU and disk performance
Parallel processing is used to improve CPU performance.
Question: can parallel I/O be used to speed up and improve the reliability of I/O?
RAID: REDUNDANT ARRAY OF INEXPENSIVE/INDEPENDENT DISKS

Slide 81
Multiple disks for improved performance or reliability:
➜ Set of physical disks
➜ Treated as a single logical drive by the OS
➜ Data is distributed over a number of physical disks
➜ Redundancy is used to recover from disk failure (exception: RAID 0)
➜ There is a range of standard configurations
  - numbered 0 to 6
  - various redundancy and distribution arrangements

Slide 82
[Figure: (a) RAID 0 (non-redundant): strips 0–15 striped across four disks; (b) RAID 1 (mirrored): the same strips duplicated on a second set of four disks.]
Slide 83
Data mapping for RAID 0:
[Figure: the array management software distributes the strips of the logical disk round-robin: physical disk 0 holds strips 0, 4, 8, 12; disk 1 holds 1, 5, 9, 13; disk 2 holds 2, 6, 10, 14; disk 3 holds 3, 7, 11, 15.]

Slide 84
RAID 0 (striped, non-redundant):
➜ requests can be processed in parallel
➜ the controller translates a single request into separate requests to the individual disks
➜ simple, works well for large requests
➜ does not improve reliability, no redundancy
RAID 1 (mirrored, 2× redundancy):
➜ duplicates all disks
➜ read: data can be read from either disk
➜ write: each request is written twice
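The RAID 0 strip mapping from the figure is a pair of integer divisions; a small sketch assuming four disks:

#include <stdio.h>

/* RAID 0 data mapping: logical strip i lives on disk (i % ndisks)
 * at strip offset (i / ndisks), exactly as in the figure.          */
int main(void)
{
    int ndisks = 4;
    for (int strip = 0; strip < 16; strip++)
        printf("logical strip %2d -> disk %d, offset %d\n",
               strip, strip % ndisks, strip / ndisks);
    return 0;
}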
Slide 85
RAID 2 (redundancy through Hamming code):
[Figure (c): data bits b0–b3 striped bit-wise across disks, with Hamming-code bits f0(b), f1(b), f2(b) on dedicated disks.]
➜ strips are very small (a single byte or word)
➜ error correction code across corresponding bit positions
➜ for n disks, log2 n redundancy disks
➜ expensive
➜ high data rate, but only a single request at a time

Slide 86
RAID 3 (bit-interleaved parity):
[Figure (d): data bits b0–b3 striped across disks, with a single parity disk P(b).]
➜ strips are very small (a single byte or word)
➜ simple parity-bit based redundancy
➜ error detection
➜ partial error correction (if the offender is known)

Slide 87
RAID 4 (block-level parity):
[Figure (e): blocks 0–15 striped across four disks, with a dedicated parity disk holding P(0-3), P(4-7), P(8-11), P(12-15).]

Slide 88
RAID 5 (block-level distributed parity):
[Figure (f): blocks striped across five disks, with the parity blocks P(0-3)…P(16-19) rotated across all the disks.]

Slide 89
RAID 6 (dual redundancy):
[Figure (g): blocks striped across disks with two independent redundancy blocks per group, P(0-3) and Q(0-3), etc., distributed across the disks.]
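The block-level parity of RAID 4/5 is a plain XOR across the data strips, and reconstruction after a single-disk failure is the same XOR; a sketch with an illustrative strip size:

#include <stdint.h>
#include <string.h>

#define NDATA 4            /* data disks, as in the figures      */
#define STRIP 512          /* strip size in bytes (illustrative) */

/* The parity strip is the XOR of the corresponding data strips. */
void parity(uint8_t p[STRIP], uint8_t d[NDATA][STRIP])
{
    memset(p, 0, STRIP);
    for (int i = 0; i < NDATA; i++)
        for (int j = 0; j < STRIP; j++)
            p[j] ^= d[i][j];
}

/* A lost strip is rebuilt by XORing the parity strip with the
 * surviving data strips.                                        */
void reconstruct(uint8_t lost[STRIP], int failed,
                 uint8_t d[NDATA][STRIP], uint8_t p[STRIP])
{
    memcpy(lost, p, STRIP);
    for (int i = 0; i < NDATA; i++)
        if (i != failed)
            for (int j = 0; j < STRIP; j++)
                lost[j] ^= d[i][j];
}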