Notes 01

CS222: Principles of Database Management
Fall 2010
Professor Chen Li
Department of Computer Science
University of California, Irvine
Topic 1: Data Storage and Record-Oriented File Systems
• Data Storage
– Storage hierarchy
– Disks
• Record-oriented file systems
Storage hierarchy
[Figure: storage hierarchy, from CPU and cache, through memory and controller, down to disk/tape]
Storage Media
• Cache: inside/outside CPU
– CPU: becoming faster and faster (>=3 GHz now)
• Main Memory
– costs $100/Mbyte -- reduces every year
– ‘volatile’ -- does not survive system failures
– random I/O very fast
– data can be processed by CPU directly
– capacity is orders of magnitude lower than what a database needs
Storage Media: secondary storage
• Disks (floppy disks, hard disks, CD)
– Cheap, and price reduces each year
– Non-volatile (except when disk crashes)
– Random I/O slow
– Data needs to be transferred to memory to be processed by CPU
• Tape
– Cheaper but slower than disks.
– Sequential I/O devices.
– Handy for backups, sometimes for archival.
Databases and Storage Devices
• Due to capacity, cost, and volatility factors, DBs are usually stored on disks.
• Data is brought from disk into main memory for processing.
• There are many ways to interface memory with disk-resident data.
• E.g., virtual memory:
– VM size limited to max address generated by CPU
– Existing VM does not support durability
• A file system provides a more powerful mapping between memory and disk storage
• A number of tricks are used to ensure that the high latency of secondary storage does not impact application response time and system throughput
– access disks asynchronously with active applications
– prefetch data before application needs it
– intelligent caching techniques
Disk Storage -- Outline
• Disk mechanics
• Access times (random, sequential)
• Examples
• Optimization
• Other topics
Disk mechanics
…
Terms: Spindle, Platters, Magnetic surfaces, Disk head, Disk controller, …
Top Views
[Figure: top views of a disk, labeling tracks, sectors, gaps, and cylinders]
Characteristics
• Diameter: 1 inch -- 15 inches
• Cylinders: 100 -- 2000
• Surfaces: 1 (CDs) -- many
• Tracks/Cyl: 2 (floppies) -- 30
• Sector Size: 512B -- 50K
• Capacity: 360 KB (old floppy) -- >=200GB
“Block”
• Corresponds to 1 or multiple sectors
• Its address consists of:
– Physical device # (in case of multi disks)
– Cylinder #
– Surface #
– Sector #
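A minimal sketch of such an address as a data structure. The names `BlockAddress` and `to_linear` are illustrative and not from the notes; the linear numbering (cylinder by cylinder, then surface, then sector) and the choice of 1 block = 1 sector are assumptions.

```python
from typing import NamedTuple


class BlockAddress(NamedTuple):
    device: int    # physical device # (for multi-disk systems)
    cylinder: int  # cylinder #
    surface: int   # surface #
    sector: int    # sector # within the track


def to_linear(addr: BlockAddress, surfaces: int, sectors_per_track: int) -> int:
    """Number the blocks of one device cylinder by cylinder (assumed ordering)."""
    track_index = addr.cylinder * surfaces + addr.surface
    return track_index * sectors_per_track + addr.sector


addr = BlockAddress(device=0, cylinder=2, surface=1, sector=5)
print(to_linear(addr, surfaces=4, sectors_per_track=128))
```

With this ordering, consecutive block numbers stay on the same cylinder, so reading them in order needs no seek until the cylinder is exhausted.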
Random disk access time
[Figure: request “I want block X”; result: block X in memory]
Time = Seek Time (time 1) + Rotational Delay (time 2) + Transfer Time (time 3) + Other (time 4)
Time 1: seek time
[Figure: seek time vs. cylinders traveled; time is x for 1 cylinder and 3x or 5x for N cylinders]
Average Random Seek Time
$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SeekTime}(\text{Track } i \rightarrow \text{Track } j)}{N(N-1)}$$
• Assumptions:
– Each track has the same probability of being accessed.
– From any track, the head is equally likely to jump to any other track.
• Typical S value: 10 ms – 50 ms
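The average can be evaluated by brute force over all ordered pairs of distinct tracks. This is a sketch under stated assumptions: `linear_seek` is a hypothetical seek model (a fixed startup cost plus a per-track cost) and its constants are made up for illustration; real drives have their own seek curves.

```python
def average_seek_time(n_tracks, seek_time):
    """Average seek_time(i, j) over all ordered pairs of distinct tracks i != j."""
    total = sum(
        seek_time(i, j)
        for i in range(1, n_tracks + 1)
        for j in range(1, n_tracks + 1)
        if j != i
    )
    return total / (n_tracks * (n_tracks - 1))


def linear_seek(i, j, startup_ms=5.0, per_track_ms=0.1):
    """Hypothetical model: startup cost plus a cost proportional to distance."""
    return startup_ms + per_track_ms * abs(i - j)


print(average_seek_time(100, linear_seek))
```

Under the uniform-access assumption the average distance traveled is about one third of the tracks, which is why the average seek is much larger than the adjacent-track seek.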
Time 2: Rotational Delay
[Figure: initial head position and the wanted block on a track]
• Average delay:
– R = 1/2 revolution
– If disk speed 3600 RPM, then R = 8.33 ms
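The half-revolution average follows directly from the rotation speed. A small sketch; the function name is illustrative.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay = time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms per minute
    return ms_per_revolution / 2


for rpm in (3600, 7200):
    print(rpm, round(avg_rotational_delay_ms(rpm), 2))
```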
Complication
We may have to wait for the start of the track before we can read the desired block.
[Figure: track start, current head position, and the block we want]
Time 3: Transfer time
• Transfer time: block size / transfer rate
• Typical transfer rate: 1 -- 3 MB/sec
Time 4: Other Delays
• CPU time to issue I/O
• Contention for controller
• Contention for bus, memory, etc.
Typical value: “0”
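With all four components defined, the random access time can be put together as below. This is a sketch: the function name and the sample numbers (25 ms seek, 3600 RPM, 1 KB block, 2 MB/sec) are illustrative choices, and "other" defaults to 0 as on the slide.

```python
def random_access_ms(seek_ms, rpm, block_bytes, rate_bytes_per_sec, other_ms=0.0):
    """Seek time + average rotational delay + transfer time + other delays."""
    rotational_ms = (60_000 / rpm) / 2                     # half a revolution
    transfer_ms = block_bytes / rate_bytes_per_sec * 1000  # block size / rate
    return seek_ms + rotational_ms + transfer_ms + other_ms


t = random_access_ms(seek_ms=25.0, rpm=3600,
                     block_bytes=1024, rate_bytes_per_sec=2 * 2**20)
print(round(t, 2))
```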
Sequential disk access
• Reading “Next” block
• Additional time = Block size/transfer rate
• Other time negligible:
– skip gaps
– once in a while, next cylinder
Random I/O vs Sequential I/O
• Average sequential I/O time is much smaller than random I/O time
– Random I/O: about 20 ms (most time on the initial delay)
– Sequential I/O: about 1 ms.
• When designing a structure, try to use sequential I/Os.
– Data layout on disk becomes critical
– Do not just look at the number of IOs
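A rough comparison shows why layout matters more than the raw I/O count. This sketch uses the approximate per-I/O figures from this slide (about 20 ms random, about 1 ms sequential); the function names are illustrative.

```python
RANDOM_IO_MS = 20.0      # approximate cost of one random I/O
SEQUENTIAL_IO_MS = 1.0   # approximate cost of reading the "next" block


def scattered_read_ms(n_blocks):
    """Every block sits somewhere else on disk: n random I/Os."""
    return n_blocks * RANDOM_IO_MS


def contiguous_read_ms(n_blocks):
    """Blocks are laid out consecutively: one random I/O, then sequential I/Os."""
    return RANDOM_IO_MS + (n_blocks - 1) * SEQUENTIAL_IO_MS


print(scattered_read_ms(100), contiguous_read_ms(100))
```

Both plans perform the same number of I/Os, yet the contiguous layout is more than an order of magnitude faster for 100 blocks.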
Modify blocks
• Read block
• Modify in memory
• Write block
• Verify
– Optional
– If so, the access time needs to add: full rotation + block size/transfer rate
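The extra cost of the optional verify step can be sketched as follows. The function name and the sample numbers (3600 RPM, 1 KB block, 2 MB/sec) are illustrative assumptions.

```python
def verify_extra_ms(rpm, block_bytes, rate_bytes_per_sec):
    """Extra time for verify: one full rotation plus one block transfer."""
    full_rotation_ms = 60_000 / rpm
    transfer_ms = block_bytes / rate_bytes_per_sec * 1000
    return full_rotation_ms + transfer_ms


print(round(verify_extra_ms(3600, 1024, 2 * 2**20), 2))
```

The full rotation appears because, right after the write, the block has just passed under the head and must come around again before it can be read back.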
Example 1
Disk Specs:
• 3.5 in diameter
• 3600 RPM
• 1 surface
• Usable capacity: 16 MB = 2^24 bytes
• # of cylinders: 128 = 2^7
• 1 block = 1 sector = 1 KB
• 10% overhead between blocks (gaps)
• seek time:
– average = 25 ms.
– adjacent cyl = 5 ms.
Cylinder
• bytes/cyl = 2^24 / 2^7 = 2^17 = 128 KB
• blocks/cyl = 128 KB / 1 KB = 128
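The same arithmetic as a quick check, using the Example 1 specs. The function name is illustrative.

```python
def cylinder_geometry(capacity_bytes, cylinders, block_bytes):
    """Return (bytes per cylinder, blocks per cylinder)."""
    bytes_per_cyl = capacity_bytes // cylinders
    blocks_per_cyl = bytes_per_cyl // block_bytes
    return bytes_per_cyl, blocks_per_cyl


# Example 1: 16 MB usable capacity, 128 cylinders, 1 KB blocks
print(cylinder_geometry(2**24, 2**7, 2**10))
```

Because the disk has a single surface, each cylinder is one track, so these are also the per-track figures.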
Track
[Figure: one track]
• Speed:
– 3600 RPM → 60 revolutions/sec → 16.66 ms/rev
• In each revolution:
– Time over useful data: 16.66 * 0.9 = 14.99 ms
– Time over gaps: 16.66 * 0.1 = 1.66 ms
– Transfer time 1 block = 14.99/128 = 0.117 ms
– Trans. time 1 block + gap = 16.66/128 = 0.13 ms
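The per-revolution breakdown can be reproduced with a short sketch. The function name and dictionary keys are illustrative. The code uses exact arithmetic, so some results differ in the last digit from the slide values, which start from 16.66 ms rather than 16.67 ms.

```python
def track_timing(rpm, blocks_per_track, gap_fraction):
    """Per-revolution timing in ms for a track with a given gap overhead."""
    ms_per_rev = 60_000 / rpm
    data_ms = ms_per_rev * (1 - gap_fraction)   # time over useful data
    gap_ms = ms_per_rev * gap_fraction          # time over gaps
    return {
        "ms_per_rev": ms_per_rev,
        "data_ms": data_ms,
        "gap_ms": gap_ms,
        "block_ms": data_ms / blocks_per_track,         # 1 block, no gap
        "block_gap_ms": ms_per_rev / blocks_per_track,  # 1 block + its gap
    }


for name, value in track_timing(3600, 128, 0.10).items():
    print(name, round(value, 3))
```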
Bandwidths
• Burst bandwidth:
– No time on gaps (10%)
– 1 KB in 0.117 ms.
BB=1KB / 0.117ms = 8.54 KB/ms = 8.33MB/sec
• Sustained bandwidth:
– Including time on gaps
– 128 KB in 16.66 ms.
SB=128KB /16.66ms = 7.68 KB/ms = 7.50 MB/sec
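Both bandwidths can be checked from the disk specs. In this sketch the function names are illustrative, and the conversion assumes 1 MB = 1024 KB, which is the convention that matches the slide's MB/sec figures.

```python
def burst_bandwidth_kb_per_ms(rpm, kb_per_track, gap_fraction):
    """Rate while the head is over data only (gaps excluded)."""
    data_ms = (60_000 / rpm) * (1 - gap_fraction)
    return kb_per_track / data_ms


def sustained_bandwidth_kb_per_ms(rpm, kb_per_track):
    """Rate over a whole revolution (gaps included)."""
    return kb_per_track / (60_000 / rpm)


def kb_per_ms_to_mb_per_sec(rate):
    return rate * 1000 / 1024  # 1000 ms per sec, 1024 KB per MB (assumed)


bb = burst_bandwidth_kb_per_ms(3600, 128, 0.10)
sb = sustained_bandwidth_kb_per_ms(3600, 128)
print(round(kb_per_ms_to_mb_per_sec(bb), 2), round(kb_per_ms_to_mb_per_sec(sb), 2))
```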
Time of random block access
• Time to read one random block T1
• T1 = seek time + rotational delay + Transfer time
– Assume we do not have to wait for track start
– Seek time = 25 ms
– Rotational delay = 16.66 ms / 2 = 8.33 ms
– Transfer time = 0.117 ms
– Total = 25 ms + 8.33 ms + 0.117 ms = 33.45 ms
• Most of the time is on “seek time” and “rotational delay”!
Larger blocks?
[Figure: consecutive blocks 1, 2, 3, 4, ... on a track, with the extent of 1 block marked]
• Suppose OS deals with 4 KB blocks
• We need to include the time of reading 3 blocks with their gaps and 1 last block without its gap
• T4 = 25 ms + (16.66 ms / 2) + 0.117 ms * 1 + 0.130 ms * 3 = 33.83 ms
• Compare to T1 = 33.45 ms – not much difference
– That’s why we want to use sequential IOs!
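The T1 and T4 calculations generalize to any number of consecutive blocks. This is a sketch with an illustrative function name and defaults taken from Example 1; it assumes the blocks lie on one track and that there is no wait for the track start. Exact arithmetic gives about 33.84 ms for four blocks, slightly above the slide's 33.83 ms, which starts from 16.66 ms per revolution.

```python
def read_blocks_ms(k, seek_ms=25.0, rpm=3600, blocks_per_track=128,
                   gap_fraction=0.10):
    """Time to read k consecutive blocks after a random seek (same track)."""
    ms_per_rev = 60_000 / rpm
    block_ms = ms_per_rev * (1 - gap_fraction) / blocks_per_track
    block_gap_ms = ms_per_rev / blocks_per_track
    # k - 1 blocks are read together with their gaps; the last one without.
    return seek_ms + ms_per_rev / 2 + (k - 1) * block_gap_ms + block_ms


for k in (1, 4, 16):
    print(k, round(read_blocks_ms(k), 2))
```

Each extra block adds only about 0.13 ms to a cost dominated by the seek and the rotational delay.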
Reading a track
• TT = Time to read a full track (start at any block)
• TT = 25 ms (seek time) + (0.13 ms / 2) (rotational delay, half of a block) + 16.66 ms (transfer time) = 41.73 ms
• The time could be a bit less by ignoring the last gap.
• Question: what if we need to wait for the start of a track?
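The full-track figure can be checked the same way. This sketch covers only the case computed on this slide, where reading may start at any block boundary; the function name is illustrative and the defaults come from Example 1.

```python
def full_track_read_ms(seek_ms=25.0, rpm=3600, blocks_per_track=128):
    """Seek, wait half a block on average for a block boundary, then read
    for one full revolution."""
    ms_per_rev = 60_000 / rpm
    block_gap_ms = ms_per_rev / blocks_per_track  # one block plus its gap
    return seek_ms + block_gap_ms / 2 + ms_per_rev


print(round(full_track_read_ms(), 2))
```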