ppt - SFU Computing Science

CMPT 300
Introduction to Operating
Systems
File systems
© Janice Regan, CMPT 300, May 2007
File management system
• System software to provide I/O services to users
• Meet the needs of users: access and organize files and directories through a standardized interface
  - Each user can create, modify, and delete their own files and directories, and has controlled access to the files of other users
  - Each user may control access by others to their files
  - Each user should be able to organize their files and directories for efficient use, and refer to them by symbolic names
  - Verify the validity of files and minimize lost or damaged data
  - Optimize, where possible, the throughput and system usage for I/O to files
  - Provide support for a variety of devices
  - Support multiple users
Issues to consider
• If we append new files to the end of the file system, the disk will eventually fill up
• After the disk is full (or perhaps earlier), subsequent files must be fitted into spaces left by files that have been deleted
• If files are stored contiguously (either in one piece or in successive blocks):
  - File size must be known to determine which available spaces (series of empty blocks) are big enough
  - Large files may not fit into the available spaces
  - Space between the end of an inserted file and the next file may be too small to be usable (part of a block, or too small to hold a file)
Approaches: Summary

Contiguous storage:
  - Excellent access time (sequential and random)
  - Poor space usage, with external fragmentation
  - Simple management; no need for data structures to relate noncontiguous blocks
  - VERY difficult and inefficient to extend existing files!
FAT:
  - Good sequential access (2 dereferences per block)
  - Good random access (order N)
  - Good space usage, with some internal fragmentation
  - One pointer per block must be stored in memory and on disk; the table grows as the size of the disk increases
  - Not efficiently scalable to large disks
I-nodes:
  - Good sequential access (2-4 dereferences per block)
  - Better random access (order log N)
  - Good space usage, with some internal fragmentation
  - One pointer and one I-node per file on disk
  - One pointer and one I-node per OPEN file in memory
More issues
• Finding a series of empty blocks to expand an existing file or add a new file
• Which series of blocks do we choose if multiple series of blocks are available?
  - First fit: choose the ‘first’ group found (for example, the next in the free list)
  - Best fit: choose the smallest group that will hold the file
  - Nearest fit: choose the series of blocks ‘closest’ (on the physical disk) to the previous allocation
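
The three selection policies can be compared on a small free list. The following is a minimal sketch, not taken from the slides: the `(start, length)` representation of a free series and the function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (start block, length in blocks) of one free series


def first_fit(free_runs: List[Run], need: int) -> Optional[Run]:
    """Return the first free run (in list order) that can hold `need` blocks."""
    for run in free_runs:
        if run[1] >= need:
            return run
    return None


def best_fit(free_runs: List[Run], need: int) -> Optional[Run]:
    """Return the smallest free run that can still hold `need` blocks."""
    candidates = [run for run in free_runs if run[1] >= need]
    return min(candidates, key=lambda run: run[1], default=None)


def nearest_fit(free_runs: List[Run], need: int, last_block: int) -> Optional[Run]:
    """Return the adequate run whose start is closest to the previous allocation."""
    candidates = [run for run in free_runs if run[1] >= need]
    return min(candidates, key=lambda run: abs(run[0] - last_block), default=None)


free_runs = [(10, 8), (40, 3), (100, 5), (200, 4)]
print(first_fit(free_runs, 4))         # (10, 8)
print(best_fit(free_runs, 4))          # (200, 4)
print(nearest_fit(free_runs, 4, 110))  # (100, 5)
```

First fit stops at the first adequate run, while best fit and nearest fit must examine every candidate, which is the usual trade-off between search time and placement quality.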

Free list
• Need to keep track of which blocks are free
• Two common approaches:
  - Linked list of disk blocks holding the addresses of free blocks
  - Bitmap: 1 bit for each block, 0 if not allocated, 1 if allocated
• The list or bitmap is kept on the disk (more about where later); all or part of it is also kept in memory
Chaining of free portions
• Most useful for contiguous file systems; not commonly used
  - Create a linked list of addresses
  - Each link contains the length of the free portion (in blocks or bytes) and the address of its first block or byte
  - Free portions of the disk are linked in sequence
• Easy to use a first-fit algorithm for contiguous file systems
• The list becomes long as free portions become small
• Overhead increases rapidly as free portions become smaller
• Response is slow: it takes a long time to create or delete a file
Linked list of Free blocks
• Linked list or stack of disk blocks holding the addresses of free blocks
  - For example, with a 1 KB block and a 32-bit word, one block holds 256 addresses
  - The last address in each block is reserved for a pointer to the next block
  - Free blocks are added to the front of the list
  - Free blocks are removed from the front of the list (LIFO stack, UNIX) or from the end of the list (FIFO linked list, FAT)
• The list is kept on the disk (more about where later)
• One block from the list is also kept in memory
  - Which block is kept in memory at what time?
• Free blocks may be used to hold the free list
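
The arithmetic behind the 1 KB example can be checked with a short calculation. This is an illustrative sketch only: the function names are hypothetical, and it assumes exactly one slot per list block is reserved for the pointer to the next list block, as described above.

```python
def addresses_per_block(block_size_bytes: int, word_size_bits: int) -> int:
    """Number of block addresses that fit in one disk block."""
    return block_size_bytes // (word_size_bits // 8)


def free_list_blocks(free_blocks: int, block_size_bytes: int, word_size_bits: int) -> int:
    """Disk blocks needed to list `free_blocks` addresses when the last slot
    of each list block is reserved for the pointer to the next list block."""
    usable = addresses_per_block(block_size_bytes, word_size_bits) - 1
    return (free_blocks + usable - 1) // usable  # integer ceiling


print(addresses_per_block(1024, 32))     # 256
print(free_list_blocks(1000, 1024, 32))  # 4
```

With 255 usable slots per list block, 1000 free blocks need 4 list blocks; the list shrinks as the disk fills, which is why free blocks themselves can be used to hold it.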
Free list, LIFO stack
• One block of the free list is kept in memory (the free list is organized as a LIFO stack of blocks of free block addresses)
  - When that block fills with free block addresses, it is replaced by an empty block
  - When a file is deleted, its blocks are added to the free list
  - When a file is created, the ‘next’ blocks in the free list are removed from the free list and used for the file
  - Consider an example: a file using 4 blocks is deleted, then a file using 3 blocks is added
Free list issues (LIFO)
• When the present block of addresses fills, it must be copied to disk and a new empty block copied to memory from disk
• Therefore, creating new files when the block in memory is almost empty, or deleting files when the block is almost full, causes blocks to be swapped in and out of memory
• If a series of small or temporary files is created and deleted when a block of addresses is nearly full (or nearly empty), significant swapping can occur
• To avoid this extra system load when a block fills:
  - Either split the block, saving half of it to disk in a new block,
  - or write out the full block and load the half-full block from the previous split
Free list, LIFO stack
• One block of the free list is kept in memory
  - When that block fills with free block addresses, either half of the addresses in the block are copied into a new block on disk, or the full block is written out of memory and a half-full block is read in
  - When a file is deleted, its blocks are added to the free list
  - When a file is created, the ‘next’ blocks in the free list are removed from the free list and used for the file
  - Consider an example where a file using 4 blocks is deleted, then a file using 3 blocks is added
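
The benefit of splitting a full block can be seen in a small simulation. This is a sketch under stated assumptions, not the UNIX implementation: the `FreeStack` class, its capacity of 4 addresses per list block, and the workload of small files created and deleted right at the block boundary are all hypothetical choices made to expose the swapping problem.

```python
from typing import List


class FreeStack:
    """LIFO free list: one block of addresses in memory, the rest on 'disk'."""

    def __init__(self, capacity: int, split_on_full: bool) -> None:
        self.capacity = capacity              # addresses that fit in one list block
        self.split_on_full = split_on_full    # True: keep half in memory when full
        self.memory: List[int] = []           # the list block currently in memory
        self.disk: List[List[int]] = []       # list blocks written out to disk
        self.transfers = 0                    # block transfers between memory and disk

    def push(self, addr: int) -> None:
        """A block was freed (file deleted): add its address to the list."""
        if len(self.memory) == self.capacity:
            keep = self.capacity // 2 if self.split_on_full else 0
            cut = len(self.memory) - keep
            self.disk.append(self.memory[:cut])  # write the oldest addresses out
            self.memory = self.memory[cut:]
            self.transfers += 1
        self.memory.append(addr)

    def pop(self) -> int:
        """A block is needed (file created): take the most recently freed one."""
        if not self.memory:
            if not self.disk:
                raise RuntimeError("no free blocks")
            self.memory = self.disk.pop()        # read a list block back in
            self.transfers += 1
        return self.memory.pop()


def run_workload(split_on_full: bool, cycles: int = 100) -> int:
    """Count block transfers for small files deleted and created at the boundary."""
    stack = FreeStack(capacity=4, split_on_full=split_on_full)
    for addr in range(4):          # start with a full in-memory block
        stack.push(addr)
    for i in range(cycles):
        stack.push(1000 + i)       # delete a 1-block file
        stack.pop()                # create a 1-block file
        stack.pop()                # create another 1-block file
        stack.push(2000 + i)       # delete a 1-block file
    return stack.transfers


print(run_workload(split_on_full=False))  # 200
print(run_workload(split_on_full=True))   # 1
```

Without splitting, every cycle of this workload writes a full block out and reads it straight back in. With splitting, the in-memory block stays away from both the full and the empty boundary after the first transfer.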
Free list issues (FIFO)
• Using a FAT
  - The linked list of free blocks is actually implemented inside the FAT, as we described for individual files
  - One or more ‘files’ containing free blocks are recorded in the FAT
• Using an actual linked list
  - Keep only a portion of the list at the head and a portion of the list at the tail in memory
  - Take free blocks from the head of the list; add freed blocks to the tail of the list
  - Disk access is needed only when the head portion empties or the tail portion fills
Free space: Bit tables
• How much space is required?
  - One bit for each block on the disk
  - Table size in bytes = disk size in bytes / (8 * block size in bytes)
  - Example for an 8 GB disk with 2 KB blocks:
    8*2^30 / (8*2*2^10) = 0.5*2^20 bytes = 0.5 MB (256 disk blocks)
• Less space
• More time searching
• Bit tables are used by MacOS and NTFS (Windows)
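
The size formula can be checked directly. The helper names below are hypothetical; the sketch only restates the slide's formula (one bit per block) and rounds up to whole bytes and whole blocks.

```python
def bit_table_bytes(disk_bytes: int, block_bytes: int) -> int:
    """Bytes needed for a bit table with one bit per disk block."""
    blocks = disk_bytes // block_bytes
    return (blocks + 7) // 8


def bit_table_blocks(disk_bytes: int, block_bytes: int) -> int:
    """Disk blocks needed to store the bit table itself."""
    size = bit_table_bytes(disk_bytes, block_bytes)
    return (size + block_bytes - 1) // block_bytes


GB, KB = 2**30, 2**10
print(bit_table_bytes(8 * GB, 2 * KB))   # 524288 bytes, i.e. 0.5 MB
print(bit_table_blocks(8 * GB, 2 * KB))  # 256
```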
Choosing a block size
• A large unit gives:
  - A large amount of internal fragmentation and decreased space utilization (less of the disk actually being used)
  - Continuity of blocks in a file (more efficient access and increased data rate)
  - A smaller table or list describing free blocks
Choosing block size
• Large block size
  - One file, no matter how small, occupies 1 block
  - Small files waste a lot of space
  - Some systems compensate by dividing a large block and placing partially full blocks from multiple files in that block
• Small block size
  - Files span many blocks, more overhead
  - More seek time and rotational delay
• Base the choice on the observed file size distribution and disk properties
  - Small files: more efficient disk usage
  - Large files: more efficient disk access
I-Node structure
[Figure: I-node layout. The I-node holds the following fields, in order.]
• File information
• Addresses of blocks 0 to 9: ten direct addresses, each pointing to one data block of the file (block 0 ... block 9)
• Single indirect: the address of 1 block of addresses of further blocks (an address array pointing to indirect block 0 ... indirect block N)
• Double indirect: the address of 1 block of addresses of further blocks of addresses (0th ... Nth address array), each of which points to indirect block 0 ... indirect block N
• Triple indirect: the address of an address array that adds one more level of address arrays (0th ... Nth) before the indirect blocks are reached
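
The I-node layout determines how many address blocks must be read to reach a given block of a file. The sketch below assumes the ten direct addresses shown in the figure and 256 addresses per address block (a 1 KB block with 32-bit addresses); the function names and the default value are assumptions for illustration.

```python
from typing import Tuple

DIRECT = 10  # direct addresses held in the I-node itself (blocks 0 to 9)


def locate(block_index: int, addrs_per_block: int = 256) -> Tuple[str, int]:
    """Return (level, address blocks read) for a file's logical block number."""
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    if block_index < DIRECT:
        return ("direct", 0)
    remaining = block_index - DIRECT
    span = addrs_per_block  # data blocks reachable through the current level
    levels = ("single indirect", "double indirect", "triple indirect")
    for depth, name in enumerate(levels, start=1):
        if remaining < span:
            return (name, depth)
        remaining -= span
        span *= addrs_per_block
    raise ValueError("block index beyond the triple indirect range")


def max_file_blocks(addrs_per_block: int = 256) -> int:
    """Largest number of data blocks one I-node can address."""
    n = addrs_per_block
    return DIRECT + n + n**2 + n**3


print(locate(9))          # ('direct', 0)
print(locate(10))         # ('single indirect', 1)
print(locate(266))        # ('double indirect', 2)
print(max_file_blocks())  # 16843018
```

The number of address blocks read grows by one per level, which is why access cost grows only logarithmically with file size.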
Linux file system organization
Further Details on Linux filesystems
• For detailed information on the contents of each of the blocks shown on the previous slide, go to
  http://www.nongnu.org/ext2-doc/ext2.html#I-MODE
Linux organization
• Each Linux filesystem is kept on one partition of one disk (or other device)
• Each disk may be separated into multiple partitions, each containing its own filesystem
• The file system includes a boot block and several block groups
  - A block group is stored on a group of adjacent cylinders
• This keeps files stored in a particular block group localized on the disk
File system info: Superblock
• The superblock of each block group contains the same information about the whole file system:
  - The block size and the inode size (2^n <= block size)
  - The maximum number of blocks, inodes, and fragments in each group
  - The number of free blocks and inodes
  - List of open files
  - Location of the first inode (directory file for /)
  - Block group number of the block group holding this copy of the superblock
• Usually the superblock of block group 0 is read; the copies in the other block groups can be used if the superblock of block group 0 is corrupted. In ext2 all groups have a copy of the superblock. For efficiency, in ext3 and later only some predetermined block groups include a copy of the superblock
Other info: Superblock
• The superblock of each block group contains other information (again common to all block groups) used in administering the block group:
  - Magic number (indicates the block is a superblock)
  - Mount count and last mount time, maximum mount count, and time interval between disk checks
  - The version (e.g. ext2, ext4) and release
  - Features supported and not supported; some features were added later, e.g. journaling in ext3
  - If supported, a description of the journaling system
Linux Organization: group descriptors
• Group descriptors contain the block numbers (block ids) of:
  - The block bitmap
  - The inode bitmap
  - The inode table
• Group descriptors also contain the number of each of the following within the block group:
  - Free information nodes (inodes)
  - Used directories
  - Free blocks
• The group descriptor block in each block group contains the group descriptors for all block groups
  - Again, the descriptors in block group 0 are usually used; the group descriptor lists in the other block groups are backups
Block and inode bitmaps
• Each bit in the block bitmap represents one block in the block group
• Each bit in the inode bitmap represents one inode in the block group
• If the value of the bit in the bitmap is 1, the corresponding block or inode is in use
• If the value of the bit in the bitmap is 0, the corresponding block or inode is not in use
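
A bitmap with these semantics (1 = in use, 0 = not in use) can be sketched in a few lines. The `Bitmap` class and its method names are hypothetical, and the choice to number bits from the least significant bit of each byte is an assumption of this sketch, not something stated in the slides.

```python
from typing import Optional


class Bitmap:
    """One bit per block or inode: 1 means in use, 0 means not in use."""

    def __init__(self, nbits: int) -> None:
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)  # all zero: everything free

    def in_use(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def set(self, i: int) -> None:
        self.bits[i // 8] |= 1 << (i % 8)

    def clear(self, i: int) -> None:
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def allocate(self) -> Optional[int]:
        """Find the first free entry, mark it in use, and return its index."""
        for i in range(self.nbits):
            if not self.in_use(i):
                self.set(i)
                return i
        return None  # bitmap is full


blocks = Bitmap(10)
print(blocks.allocate())  # 0
print(blocks.allocate())  # 1
blocks.clear(0)           # block 0 is released
print(blocks.allocate())  # 0
```

The linear scan in `allocate` illustrates why a bitmap costs little space but may take longer to search than a free list.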
The inode table
• The inode table includes the inode description for every file in the block group
• The description of each inode includes:
  - The block describing the inode, with the pointers to the first 10 blocks and the single, double, and triple indirection blocks (not the blocks containing addresses of further blocks or inodes)
  - Variables and flags describing the attributes of the file: file type, ownership, access history, etc.
Windows Disk organization
• NTFS (New Technology File System) volume layout:
  Boot block | Master File Table (12.5% of volume) | File area | MFT mirror | File area
• NTFS provides more flexibility than FAT
• FAT has limitations
  - For example, a FAT32 system uses 16 KB clusters for partition sizes between 16 and 32 GB. A 20 KB file would require two 16 KB clusters, actually occupying 32 KB of space. A mere 1 KB file still requires 16 KB of space.
• NTFS uses 4 KB clusters for disks up to 2 TB, so small files waste less space (smaller clusters are used on disks < 2 GB)
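
The cluster-size comparison can be reproduced with a short calculation. The function name `allocated_bytes` is hypothetical; the file and cluster sizes are the ones quoted above.

```python
def allocated_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Space occupied on disk when a file is stored in whole clusters."""
    clusters = (file_bytes + cluster_bytes - 1) // cluster_bytes  # round up
    return clusters * cluster_bytes


KB = 1024
for cluster_kb in (16, 4):
    for file_kb in (20, 1):
        used = allocated_bytes(file_kb * KB, cluster_kb * KB)
        unused = used - file_kb * KB
        print(f"{file_kb} KB file, {cluster_kb} KB clusters: "
              f"{used // KB} KB allocated, {unused // KB} KB unused")
```

With 16 KB clusters the 20 KB file occupies 32 KB and the 1 KB file occupies 16 KB; with 4 KB clusters the same files occupy 20 KB and 4 KB.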
Further Details on Windows, NTFS filesystems
• For detailed information on the contents of each of the blocks shown on the previous slide, go to
  http://www.pcguide.com/ref/hdd/file/ntfs/
Boot Sector
• Contains information such as the number of sectors per cluster, the number of sectors per track, the total number of sectors, and the cluster numbers of the start of the Master File Table and of its copy (the MFT mirror)
• Contains bootstrap code to properly initialize the file system for use
• Should not be confused with the master boot block that is used to start up the computer system (it is loaded by the BIOS). The code in the master boot block will run the bootstrap code in the NTFS boot block
Master File Table (MFT)
• Contains all information needed to retrieve files, organized as a relational database (rows are files, columns are attributes) that describes every file on the file system (volume), including:
  - The master file table itself
  - The mirror of part of the master file table
  - The metadata files
  - Every other system or user file added to the file system
Size of MFT
• When a volume (file system) is first made by formatting the disk, 12.5% of the storage in that file system is set aside for the MFT. This area is referred to as the MFT zone and is used to help prevent fragmentation of the MFT
• File data is placed in the remaining portion of the file system
• If the data portion fills before the MFT, then a portion of the MFT zone may be used for file data
Metadata Files
• Special files created when the disk volume (file system) is formatted. These files contain the information needed to access and manage each file. They are placed at the beginning of the MFT. They include:
  - MFT and mirror MFT
  - File system log files
  - Cluster bitmap (which clusters are in use / not in use)
  - Description of the physical properties of the volume
  - Definitions of attributes
  - List of bad clusters
  - The root file system and the boot sector
Mirror MFT
• A partial copy of the metadata files contained in the MFT, used to recover the file system in the case of damage to the original MFT
• Copies the first 4 MB of the MFT or the first four records of the MFT, whichever is larger
• Guarantees continued access to the file system in the case of a single sector failure
Some (of 11) attributes
• Standard Information: access mode (read-only, read/write, and so forth), timestamp, link count, ...
• Attribute List: locations of all additional attribute records that do not fit in the MFT record
• File Name: a repeatable attribute for both long and short file names
• Data: file data; NTFS supports multiple data attributes per file
• Object ID: a volume-unique file identifier
• Reparse Point: used for mounting file systems
• Index Root: used to implement folders and other indexes
• Index Allocation and Bitmap: used to implement the B-tree structure for large folders and other large indexes
File records in the MFT



• Each file record includes all attributes of the file
• Small files (< 900 B) are stored in the 1 KB MFT record itself and can therefore be rapidly accessed. When the data for a file is stored in the MFT record, it is called an immediate file
• Larger (or more fragmented) files have the data attribute (the actual data) replaced by a series of runs
  - A run describes the initial address (cluster number) and the length in clusters of the location where a portion of the data is stored
  - The data referred to (stored in the listed clusters rather than in the MFT record) is called a nonresident attribute
• Other attributes (other than data) may also be nonresident if they are too large to fit into a single MFT record
Example: MFT record, small file
[Figure: MFT record layout: MFT header | Standard Info header, Standard Information | Filename header, Filename | data header, FILE DATA (RESIDENT FILE)]
Data attribute Header
• Consider a stream of data. We break the stream into pieces, each large enough to fill one cluster
• A particular record describes a particular file. Assume that the 5th through 21st clusters of the stream will be placed into a file
  - Counting starts at 0, so the 5th cluster in the stream is cluster 4 and the 21st cluster is cluster 20
• The data header in the MFT record would contain:
  - The number of the first cluster of the stream to be placed in the file: cluster 4
  - The number of the first cluster in the stream that will not be placed in the file: cluster 21
Example: MFT record, large file
[Figure: MFT record layout: MFT header | Standard Info header, Standard Information | Filename header, Filename | data header (4, 21) followed by the runs (27, 6), (52, 8), (73, 3), ...]
• Numbers in black indicate runs of clusters holding file data: the first number is the first block in the run, the second number is the number of blocks in the run
• Numbers in blue indicate which clusters contain the data in the file: 27 to 32, 52 to 59, and 73 to 75
• Data in the file is a NONRESIDENT ATTRIBUTE
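
The run list in the large-file example can be used to find where any cluster of the file is stored. This is a minimal sketch, not NTFS code: the function name `stream_to_disk_cluster` is hypothetical, and the runs (27, 6), (52, 8), (73, 3) and the header values 4 and 21 are read from the example figure.

```python
from typing import List, Tuple


def stream_to_disk_cluster(runs: List[Tuple[int, int]], first: int, cluster: int) -> int:
    """Map a cluster number in the data stream to the disk cluster holding it.

    `runs` is a list of (first disk cluster, length in clusters) pairs and
    `first` is the stream cluster number stored at the start of the first run.
    """
    offset = cluster - first
    if offset < 0:
        raise ValueError("cluster lies before the start of the file")
    for start, length in runs:
        if offset < length:
            return start + offset
        offset -= length  # skip this run and continue with the next one
    raise ValueError("cluster lies beyond the end of the file")


runs = [(27, 6), (52, 8), (73, 3)]  # 6 + 8 + 3 = 17 clusters: stream clusters 4 to 20
print(stream_to_disk_cluster(runs, 4, 4))   # 27
print(stream_to_disk_cluster(runs, 4, 10))  # 52
print(stream_to_disk_cluster(runs, 4, 20))  # 75
```

The three runs together hold 17 clusters, which matches the header: stream clusters 4 through 20 are in the file and cluster 21 is the first one that is not.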
File records in the MFT
• Larger files will have the data attribute replaced by a series of record numbers. Each listed MFT record number refers to an MFT record that contains runs describing where the file’s data is stored
• The system is hierarchical
File records in the MFT
[Figure: a base MFT record (MFT header, Standard Info header, Standard Information, Filename header, Filename, data header 4, 21, run (27, 6), ...) with a "Continue with MFT record N" entry, and MFT record N (Standard Information, Filename) holding the further runs (52, 8), (73, 3), ...]
Numbers in black indicate runs of clusters holding file data: the first number is the first block in the run, the second number is the number of blocks in the run.