UNIX Internals – The New Frontiers
Chapters 8 & 9
File Systems
Contents
The User Interface to Files
File System
File System Framework
The Vnode/VFS Architecture
Implementation Overview
File-System-Dependent Objects
Mounting a File System
Operations on Files
The System V File System (s5fs)
s5fs Kernel Organization
8.2 The User Interface
Key concepts: files, directories, file descriptors, file systems
Files & Directories
File: logically, a container for data
Directories: form a hierarchical, tree-structured name space
Pathname: all the components on the path
from the root to the node, separated by “/”
“.” & “..”: the directory itself and its parent
Link: a directory entry for a file.
Directory tree
Operations on directories
dirp = opendir(const char *filename);
direntp = readdir(dirp);
rewinddir(dirp);
status = closedir(dirp);
struct dirent {
    ino_t d_ino;
    char d_name[NAME_MAX + 1];
};
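The calls above can be combined into a short directory scan. The following is a minimal sketch, assuming a POSIX system; the function name count_entries is made up for illustration and is not part of any library.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries of a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    int n = 0;
    struct dirent *direntp;
    while ((direntp = readdir(dirp)) != NULL) {
        if (strcmp(direntp->d_name, ".") == 0 ||
            strcmp(direntp->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(dirp);
    return n;
}
```

A caller would use it as count_entries("/tmp"); only d_name is read here, since it is the one field every implementation is required to provide portably.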
File Attributes
Kept in the inode: index node
File attributes:
File type
Number of hard links
File size
Device ID
Inode number
User and Group Ids of the owner of the file.
Timestamps
Permissions and mode flags
Permissions and mode flags
Owner, group, others (3 x 3 bits)
Read, write, execute (3 bits each)
Mode flags - apply to executable files
- suid, sgid – to set the process's effective UID (or GID)
to that of the owner (or group) of the file,
- sticky – to retain the program image in the swap area
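To make the bit layout concrete, here is a small sketch that renders the nine permission bits and the three mode flags in the style of ls -l. The function name and the TOY_ constants are illustrative; the octal values are the conventional UNIX ones (04000 suid, 02000 sgid, 01000 sticky).

```c
/* Conventional octal values of the three mode flags. */
#define TOY_ISUID 04000u
#define TOY_ISGID 02000u
#define TOY_ISVTX 01000u

/* Render the low 12 mode bits as text, e.g. 0755 -> "rwxr-xr-x".
 * out must have room for 10 characters (9 plus the terminator). */
void mode_to_string(unsigned mode, char out[10])
{
    static const char rwx[] = "rwx";

    /* Bit 8 is owner-read, bit 0 is others-execute. */
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (1u << (8 - i))) ? rwx[i % 3] : '-';

    /* The mode flags are shown in the execute positions. */
    if (mode & TOY_ISUID) out[2] = (mode & 0100u) ? 's' : 'S';
    if (mode & TOY_ISGID) out[5] = (mode & 0010u) ? 's' : 'S';
    if (mode & TOY_ISVTX) out[8] = (mode & 0001u) ? 't' : 'T';
    out[9] = '\0';
}
```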
System calls
link, unlink – to create and delete hard links
utimes – to change the access and modification
timestamps,
chown – to change the owner UID and GID,
chmod – to change permissions and mode flags.
File Descriptors
fd = open (path, oflag, mode);
fd is a per-process object.
File descriptors
File I/O
Random and sequential access
nread = read(fd, buf, count);
write has similar semantics
Operations on a file are serialized
In append mode, the offset pointer is set to the
end of the file before each write
lseek – random access
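Random access comes down to moving the offset pointer with lseek before calling read. A minimal sketch, assuming a POSIX system; read_at is a hypothetical helper name, not a library call.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read up to count bytes starting at byte offset `offset` of the file.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_at(const char *path, off_t offset, void *buf, size_t count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t nread = -1;
    /* Position the per-open-file offset pointer, then read sequentially. */
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1)
        nread = read(fd, buf, count);

    close(fd);
    return nread;
}
```

The offset belongs to the open file object, so a second read on the same descriptor would continue where this one stopped.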
Scatter-Gather I/O
nbytes = writev(fd, iov, iovcnt);
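As an illustration of gather output, the sketch below writes a header and a body held in two separate buffers with a single writev call. The function name write_record is an assumption made for this example.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write two separate buffers with one system call.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t write_record(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)header;   /* first buffer to gather */
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;     /* second buffer to gather */
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);
}
```

The benefit over two write calls is that the data need not be copied into one contiguous buffer first and only one system call is made.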
File Locking
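Individual read and write calls are atomic.
Advisory locks: protect only against
cooperating processes that check for the lock; flock() in 4BSD.
Mandatory locks: enforced by the kernel;
in SVR3, mandatory locking must first be enabled for the file through chmod.
SVR4: read/write (shared/exclusive) locks.
C library function lockf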
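A rough sketch of advisory locking through fcntl follows, assuming a POSIX system. lock_whole_file is a made-up helper name; F_SETLK returns immediately instead of waiting when the lock is held by another process.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply an advisory lock to the whole file.
 * type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
 * Returns 0 on success, -1 if the lock cannot be obtained. */
int lock_whole_file(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;          /* 0 means "through end of file" */

    return fcntl(fd, F_SETLK, &fl);
}
```

Because the lock is advisory, it only has an effect on other processes that also call fcntl (or lockf) before touching the file.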
8.3 File systems
Mount-on
- the mount point directory is covered by the root of the mounted file system.
- mount table (original) & vfs list (modern)
Restrictions
- a file cannot span file systems,
- each file system must reside on a single logical
disk
Logical Disks
A logical disk is a storage abstraction that the kernel
sees as a linear sequence of fixed-size, randomly
accessible blocks.
newfs, mkfs: utilities that create a file system on a logical disk
Traditional: partition – physical storage of a file
system
Modern configurations:
Volume (several disks combined),
Disk mirroring
Stripe sets
RAID(Redundant Array of Inexpensive Disks)
Special files
Generalization of the file abstraction to include all kinds of
I/O-related objects such as directories, symbolic links, hardware
devices (disks, terminals, printers), pseudodevices
such as the system memory, and communications
abstractions such as pipes and sockets.
Problems with hard links – may not span file
systems, links to directories can be created by the superuser only,
ownership problems.
Special files
Symbolic links – special file that points to another file
(linked-to file); the data portion of the file contains
the pathname of the linked-to file; may be stored in
the inode of the symbolic link (more on this in
Practical UNIX Programming pp. 90-96);
Pipes – created by pipe system call, deleted by the
kernel automatically
FIFOs - created by mknod system call, must be
explicitly deleted;
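That the data portion of a symbolic link is just a pathname can be seen with readlink, which returns the stored text without following it. A minimal sketch, assuming a POSIX system; link_target is a hypothetical helper name.

```c
#include <unistd.h>

/* Copy the pathname stored in a symbolic link into buf as a C string.
 * Returns the length of the stored pathname, or -1 on error. */
ssize_t link_target(const char *linkpath, char *buf, size_t bufsize)
{
    if (bufsize == 0)
        return -1;

    /* readlink does not add a terminator, so leave room for one. */
    ssize_t n = readlink(linkpath, buf, bufsize - 1);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

The linked-to file does not have to exist: the kernel interprets the stored pathname only when the link is traversed.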
8.5 File System Framework
Traditional UNIX cannot support more than one type of
file system.
New developments (DOS file systems, file sharing,
RFS, NFS) required the framework to change.
AT&T: file system switch
Sun Microsystems: vnode/vfs
DEC: gnode
SVR4: (AT&T + vnode/vfs + NFS) -> de facto standard
8.6 The Vnode/Vfs Architecture
Objectives
Support several file system types
simultaneously.
Different disk partitions may contain
different types of file systems.
Support for sharing files over a network.
Vendors should be able to create their own
file system types and add them to the
kernel.
Lessons from Device I/O
Devices: block & character
Character device switch:
struct cdevsw {
    int (*d_open)();
    int (*d_close)();
    int (*d_read)();
    int (*d_write)();
} cdevsw[];
Major device number: used as the index into cdevsw[]
read system call (in traditional UNIX)
1) Use the file descriptor to get to the open file object;
2) Check the entry to see if the file is open for read;
3) Get the pointer to the in-core inode from this entry;
4) Lock the inode so as to serialize access to the file;
5) Check the inode mode field and find that the file is a
character device file;
6) Use the major device number to index into a table of
character devices and obtain the cdevsw entry for this
device;
7) From the cdevsw, obtain the pointer to the d_read
routine for this device;
8) Invoke the d_read operation to perform the device-specific processing of the read request;
9) Unlock the inode and return to the user.
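Steps 6 to 8 are an indirect call through a table indexed by the major number. The sketch below imitates that dispatch in user space; every name in it (toy_cdevsw, toy_cdev_read, the two drivers, the 8-bit minor field) is an assumption made for illustration, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Trimmed-down character device switch entry (read only). */
struct toy_cdevsw {
    int (*d_read)(int minor, char *buf, size_t n);
};

/* Driver for a "null"-like device: always reports end of file. */
static int toy_null_read(int minor, char *buf, size_t n)
{
    (void)minor; (void)buf; (void)n;
    return 0;
}

/* Driver for a "zero"-like device: fills the buffer with zero bytes. */
static int toy_zero_read(int minor, char *buf, size_t n)
{
    (void)minor;
    memset(buf, 0, n);
    return (int)n;
}

static struct toy_cdevsw toy_cdevsw[] = {
    { toy_null_read },   /* major 0 */
    { toy_zero_read },   /* major 1 */
};

/* Assumed encoding: major in the upper bits, minor in the low 8 bits. */
#define TOY_MAJOR(dev) ((dev) >> 8)
#define TOY_MINOR(dev) ((dev) & 0xffu)

/* Device-independent read: index the switch by major number and
 * call the driver's d_read routine. Returns -1 for an unknown major. */
int toy_cdev_read(unsigned dev, char *buf, size_t n)
{
    unsigned maj = TOY_MAJOR(dev);

    if (maj >= sizeof toy_cdevsw / sizeof toy_cdevsw[0])
        return -1;
    return (*toy_cdevsw[maj].d_read)((int)TOY_MINOR(dev), buf, n);
}
```

The caller never names a driver; adding a device means adding a table entry, which is the idea the vnode/vfs interface later applies to file systems.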
Lessons from Device I/O
It is necessary to separate the file
subsystem code into file-system-independent code and file-system-dependent code.
The interface between these two parts
is defined by a set of generic functions
that are called by the file-system-independent code.
Object Oriented Design
Overview of the Vnode/Vfs Interface
A vnode represents a file in the UNIX
kernel.
A vfs represents a file system.
Base class data and operations
Pointers:
v_data: inode (s5fs), rnode (NFS),
tmpnode (tmpfs),
v_op: vnodeops
Example: to close the file associated with the vnode
#define VOP_CLOSE(vp,…) (*((vp)->v_op->vop_close))(vp,…)
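The macro is easier to follow next to a toy version of the two pointers. The sketch below is a user-space imitation, not kernel code: the toy_ types, the single-argument TOY_VOP_CLOSE and the two close routines are all invented for illustration.

```c
struct toy_vnode;

/* Operations vector: one function pointer per vnode operation. */
struct toy_vnodeops {
    int (*vop_close)(struct toy_vnode *vp);
};

/* "Base class": generic fields plus pointers to the dependent parts. */
struct toy_vnode {
    struct toy_vnodeops *v_op;   /* file-system-dependent operations */
    void *v_data;                /* file-system-dependent private data */
};

/* Same shape as VOP_CLOSE, reduced to one argument. */
#define TOY_VOP_CLOSE(vp) ((*((vp)->v_op->vop_close))(vp))

/* Private per-file objects of two imaginary file systems. */
struct toy_inode { int i_closes; };
struct toy_rnode { int r_closes; };

static int toy_s5_close(struct toy_vnode *vp)
{
    ((struct toy_inode *)vp->v_data)->i_closes++;
    return 0;
}

static int toy_nfs_close(struct toy_vnode *vp)
{
    ((struct toy_rnode *)vp->v_data)->r_closes++;
    return 0;
}

struct toy_vnodeops toy_s5_vnodeops  = { toy_s5_close };
struct toy_vnodeops toy_nfs_vnodeops = { toy_nfs_close };
```

Code that holds only a struct toy_vnode pointer can call TOY_VOP_CLOSE(vp) without knowing which file system owns the file; the cast of v_data happens only inside the dependent routine.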
VFS base class
8.7 Implementation Overview
Objectives
Each operation must be carried out on behalf of the
current process.
Certain operations may need to serialize access to the
file.
The interface must be stateless and reentrant.
FS implementation should be allowed to use global
resources, such as buffer cache.
The interface should be usable by the server side of a remote file system such as NFS.
The use of fixed-size static tables must be avoided.
Vnodes and Open Files
The vnode is the fundamental
abstraction that represents an active
file in the kernel.
Access to a vnode:
by a file descriptor
by file-system-dependent data structures
Data structures
Reference count
The Vnode
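struct vnode {
    u_short v_flag;                  /* vnode flags */
    u_short v_count;                 /* reference count */
    struct vfs *v_vfsmountedhere;    /* vfs mounted on this directory, if any */
    struct vnodeops *v_op;           /* vnode operations vector */
    struct vfs *v_vfsp;              /* file system to which this vnode belongs */
    …
};
// p242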
Vnode Reference Count
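It determines how long the vnode must remain in the
kernel.
Reference versus lock: a reference only keeps the vnode in memory; a lock restricts access by others.
A reference is acquired when:
a file is opened;
a process holds a reference to its current directory;
a new file system is mounted (on the mount point directory);
the pathname traversal routine visits an intermediate directory.
An unlinked file is deleted physically when its reference count becomes
zero.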
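The hold/release discipline can be sketched in a few lines. SVR4 does this with the VN_HOLD and VN_RELE macros; the toy_ names and the inactive_called flag below are stand-ins invented for this example.

```c
/* Minimal stand-in for a vnode: only the reference count matters here. */
struct toy_vnode {
    unsigned short v_count;   /* number of active references */
    int inactive_called;      /* set when the last reference goes away */
};

/* Acquire a reference: the vnode may not be freed while it is held. */
void toy_vn_hold(struct toy_vnode *vp)
{
    vp->v_count++;
}

/* Release a reference. When the count reaches zero the file system
 * would be told that the vnode is inactive and may be reclaimed. */
void toy_vn_rele(struct toy_vnode *vp)
{
    if (--vp->v_count == 0)
        vp->inactive_called = 1;
}
```

Nothing here stops two holders from using the vnode at the same time; that is the job of a lock, not of the reference count.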
The Vfs Object
struct vfs {
    struct vfs *vfs_next;
    struct vfsops *vfs_op;
    struct vnode *vfs_vnodecovered;
    int vfs_fstype;
    caddr_t vfs_data;
    dev_t vfs_dev;
    …
}; //p243
8.8 File-System-Dependent Objects
Per-File Private Data
The vnode is an abstract object.
The vnodeops Vector
struct vnodeops {
    int (*vop_open)();
    int (*vop_close)();
    …
}; //p245
For ufs:
struct vnodeops ufs_vnodeops = {
    ufs_open,
    ufs_close,
    …
}; //p246
File-System-Dependent Parts of
the Vfs Layer
struct vfsops {
    int (*vfs_mount)();
    int (*vfs_unmount)();
    int (*vfs_root)();
    int (*vfs_statvfs)();
    int (*vfs_sync)();
    …
}; //p246
8.9 Mounting a File System
mount(spec, dir, flags, type, dataptr, datalen) //SVR4
Virtual File System Switch - a global table containing
one entry for each file system type.
struct vfssw {
    char *vsw_name;
    int (*vsw_init)();
    struct vfsops *vsw_vfsops;
    …
} vfssw[];
mount Implementation
Adds the vfs structure to the linked list
headed by rootvfs.
Sets the vfs_op field to the vfsops
vector specified in the switch entry.
Sets the vfs_vnodecovered field to
point to the vnode of the mount point
directory.
VFS_MOUNT processing
Verify permissions for the operation.
Allocate and initialize the private data
object of the file system.
Store a pointer to it in the vfs_data field
of the vfs object.
Access the root directory of the file
system and initialize its vnode in
memory.
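The mount steps can be tied together in a user-space sketch. Everything here is a simplified assumption: the toy_ structures keep only the fields named on the slides, the switch has a single made-up "s5fs" entry, and permission checks and root vnode setup are left out.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_vfs;

struct toy_vfsops { int (*vfs_mount)(struct toy_vfs *vfsp); };

struct toy_vnode { struct toy_vfs *v_vfsmountedhere; };

struct toy_vfs {
    struct toy_vfs *vfs_next;
    struct toy_vfsops *vfs_op;
    struct toy_vnode *vfs_vnodecovered;
    void *vfs_data;
};

struct toy_vfssw { const char *vsw_name; struct toy_vfsops *vsw_vfsops; };

/* File-system-dependent part: set up the private data object. */
static int toy_s5_private;
static int toy_s5_mount(struct toy_vfs *vfsp)
{
    vfsp->vfs_data = &toy_s5_private;
    return 0;
}

static struct toy_vfsops toy_s5_vfsops = { toy_s5_mount };
static struct toy_vfssw toy_vfssw[] = { { "s5fs", &toy_s5_vfsops } };
static struct toy_vfs *toy_rootvfs;   /* head of the vfs list */

/* File-system-independent part of mount. Returns NULL on failure. */
struct toy_vfs *toy_mount(const char *type, struct toy_vnode *dir)
{
    for (size_t i = 0; i < sizeof toy_vfssw / sizeof toy_vfssw[0]; i++) {
        if (strcmp(toy_vfssw[i].vsw_name, type) != 0)
            continue;

        struct toy_vfs *vfsp = calloc(1, sizeof *vfsp);
        if (vfsp == NULL)
            return NULL;
        vfsp->vfs_op = toy_vfssw[i].vsw_vfsops;   /* from the switch entry */
        vfsp->vfs_vnodecovered = dir;             /* mount point directory */

        if (vfsp->vfs_op->vfs_mount(vfsp) != 0) { /* dependent processing */
            free(vfsp);
            return NULL;
        }

        /* Append to the list headed by toy_rootvfs. */
        struct toy_vfs **pp = &toy_rootvfs;
        while (*pp != NULL)
            pp = &(*pp)->vfs_next;
        *pp = vfsp;

        dir->v_vfsmountedhere = vfsp;
        return vfsp;
    }
    return NULL;   /* unknown file system type */
}
```

The split mirrors the slides: toy_mount knows nothing about s5fs beyond its switch entry, and toy_s5_mount touches only vfs_data.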
8.10 Operations on Files
Pathname Traversal
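lookuppn(): starts from the current directory (u_cdir), or from the root directory for an absolute pathname; then, for each component:
1. Check that v_type is that of a directory
2. “..” & system root – move on to the next component
3. “..” & the root of a mounted file system – access the mount point
4. VOP_LOOKUP
5. Not found: if it is the last component – success, else – error ENOENT
6. A mount point - go to the root of the mounted vfs
7. A symbolic link – translate it and append the rest of the pathname
8. Release the directory
9. Go back to the top of the loop
10. Terminate; do not release the reference to the final vnode
//p250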
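The loop can be imitated on an in-memory tree. This sketch covers steps 1 to 6 and 9 to 10 only: symbolic links, reference counts and the "not found but last component" case are left out, and all toy_ names and fields are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAXCHILD 4

struct toy_node {
    const char *name;
    int is_dir;
    struct toy_node *parent;                 /* what ".." resolves to */
    struct toy_node *child[TOY_MAXCHILD];    /* NULL-terminated if not full */
    struct toy_node *mounted_root;           /* root of fs mounted here, if any */
    struct toy_node *covered;                /* set on the root of a mounted fs */
};

/* Search one directory for a name; stands in for VOP_LOOKUP. */
static struct toy_node *toy_dir_lookup(struct toy_node *dir,
                                       const char *name, size_t len)
{
    for (int i = 0; i < TOY_MAXCHILD && dir->child[i] != NULL; i++) {
        const char *cn = dir->child[i]->name;
        if (strlen(cn) == len && memcmp(cn, name, len) == 0)
            return dir->child[i];
    }
    return NULL;
}

/* Resolve path component by component. Returns NULL on ENOTDIR/ENOENT. */
struct toy_node *toy_lookuppn(struct toy_node *root, struct toy_node *cwd,
                              const char *path)
{
    struct toy_node *vp = (*path == '/') ? root : cwd;

    while (*path != '\0') {
        while (*path == '/')
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");

        if (!vp->is_dir)
            return NULL;                      /* step 1: must be a directory */

        if (len == 2 && memcmp(path, "..", 2) == 0) {
            if (vp->covered != NULL)
                vp = vp->covered;             /* step 3: back to mount point */
            vp = vp->parent;                  /* step 2: root's parent is root */
        } else if (!(len == 1 && path[0] == '.')) {
            vp = toy_dir_lookup(vp, path, len);   /* step 4 */
            if (vp == NULL)
                return NULL;                  /* step 5, simplified */
            if (vp->mounted_root != NULL)
                vp = vp->mounted_root;        /* step 6: cross the mount point */
        }
        path += len;                          /* step 9: next component */
    }
    return vp;                                /* step 10 */
}
```

The system root is expected to have its parent field point to itself, which is how step 2 falls out without a special case.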
Opening a file
fd = open(pathname, mode)
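1. Allocate a file descriptor
2. Allocate an open file object
3. Call lookuppn() to obtain the vnode
4. Check the vnode for access permissions
5. Check for illegal operations (e.g. opening a directory for writing)
6. If the file does not exist: with O_CREAT, call VOP_CREATE; otherwise return ENOENT
7. Call VOP_OPEN
8. If O_TRUNC is set, call VOP_SETATTR to set the file size to zero
9. Initialize the open file object
10. Return the index of the file descriptor
//p252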
Other topics
File I/O
File attributes
User credentials
Analysis
Drawbacks of the SVR4 Implementation
The 4.4 BSD Model
Chapter 9
File System Implementations
9.2 The System V File System(s5fs)
The layout of an s5fs partition:
B | S | inode list | data blocks
(B = boot area, S = superblock)
Directories: an s5fs directory is a
special file containing a
list of files and
subdirectories.
Inodes
The inode contains administrative
information, or metadata.
The inode list contains all the inodes.
On-disk inode - see Tab. 9-1
In-core inodes have more fields
Inode Fields
di_mode
Bit-fields
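Block array of inode: di_addr (1 KB blocks, 256 block numbers per indirect block)
Direct: 10 blocks, 10 KB
Single indirect: 256 blocks, 256 KB
Double indirect: 256*256 = 65,536 (64K) blocks, 64 MB
Triple indirect: 256*256*256 = 16,777,216 (16M) blocks, 16 GB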
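The figures above follow from simple arithmetic, sketched below. The function names are invented for this example; the parameters (10 direct entries, 256 block numbers per indirect block) are the ones quoted on the slide.

```c
#include <stdint.h>

/* Total number of data blocks addressable through di_addr:
 * direct + single + double + triple indirect. */
uint64_t s5fs_max_blocks(uint64_t ndirect, uint64_t ptrs)
{
    uint64_t single = ptrs;
    uint64_t dbl    = single * ptrs;
    uint64_t triple = dbl * ptrs;

    return ndirect + single + dbl + triple;
}

/* Which level of indirection holds logical block number lbn?
 * Returns 0 (direct) through 3 (triple indirect), or -1 if the
 * block lies beyond what di_addr can address. */
int s5fs_indirection_level(uint64_t lbn, uint64_t ndirect, uint64_t ptrs)
{
    if (lbn < ndirect)
        return 0;
    lbn -= ndirect;
    if (lbn < ptrs)
        return 1;
    lbn -= ptrs;
    if (lbn < ptrs * ptrs)
        return 2;
    lbn -= ptrs * ptrs;
    if (lbn < ptrs * ptrs * ptrs)
        return 3;
    return -1;
}
```

With 10 direct entries and 256 pointers per block this gives 16,843,018 addressable blocks, which at 1024 bytes per block is a little over 16 GB.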
The superblock
Size in blocks of the file system
Size in blocks of the inode list
Number of free blocks and inodes
Free block list
Free inode list
Free block list
9.3 s5fs Kernel Organization
In-core Inodes
The vnode
Device ID
Inode number of the file
Flags for synchronization and cache management
Pointers to keep the inode on a free list
Pointers to keep the inode on a hash queue.
Block number of last block read
Allocating and Reclaiming Inodes
Inode table (LRU) containing the active
inodes
When the reference count of a vnode drops to 0, the
inode can be reclaimed as free
iget() (allocating):
Inode lookup
s5lookup()
Checks the directory name lookup cache.
On a cache miss, reads
the directory one block at a time, searching
the entries for the specified file name.
If the file is in the directory, gets the inode
number and uses iget() to locate the inode.
iget(): if the inode is in the table, returns it; otherwise allocates a new
inode, initializes it, copies in the on-disk inode, puts it on the hash queue,
and initializes the vnode (v_op, v_data, v_vfsp).
Returns the pointer to the inode.
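The hash-queue part of iget() can be sketched as follows. This is a simplified user-space imitation: toy_iget, the hash function and the table size are assumptions, and reading the on-disk inode, the free list and locking are all omitted.

```c
#include <stdlib.h>

#define TOY_NHASH 8

/* Stripped-down in-core inode: identity, reference count, hash link. */
struct toy_inode {
    int i_dev;                   /* device ID */
    unsigned i_number;           /* inode number */
    int i_count;                 /* reference count */
    struct toy_inode *i_hnext;   /* next inode on the same hash queue */
};

static struct toy_inode *toy_ihash[TOY_NHASH];

/* Return the in-core inode for (dev, ino), creating it if necessary.
 * Returns NULL only if memory cannot be allocated. */
struct toy_inode *toy_iget(int dev, unsigned ino)
{
    unsigned h = ((unsigned)dev + ino) % TOY_NHASH;

    /* Already in the table? Take another reference and return it. */
    for (struct toy_inode *ip = toy_ihash[h]; ip != NULL; ip = ip->i_hnext) {
        if (ip->i_dev == dev && ip->i_number == ino) {
            ip->i_count++;
            return ip;
        }
    }

    /* Not found: allocate and initialize a new in-core inode. */
    struct toy_inode *ip = calloc(1, sizeof *ip);
    if (ip == NULL)
        return NULL;
    ip->i_dev = dev;
    ip->i_number = ino;
    ip->i_count = 1;
    /* A real iget() would copy in the on-disk inode at this point. */

    /* Put it at the front of its hash queue. */
    ip->i_hnext = toy_ihash[h];
    toy_ihash[h] = ip;
    return ip;
}
```

Keying on both device ID and inode number matters because inode numbers are unique only within one file system.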
File I/O (1)
Read (to a user buffer address):
fd -> the open file object, verify mode -> vnode -> get
the rw-lock -> call s5read()
Offset -> block number & the offset within the block -> uiomove() ->
call copyout()
If the page is not in memory: page fault -> the handler -> s5getpage() -> call bmap()
bmap() does the logical-to-physical mapping; s5getpage() searches the vnode's page
list; if the page is not there, it allocates a free page and calls the disk
driver to read the data from disk
Sleeps until the I/O completes. Before copying to
user data space, copyout() verifies that the user has access
s5read() returns; the caller unlocks the vnode, advances the offset, and
returns the number of bytes read
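The "offset -> block number & offset" step is plain integer arithmetic. A small sketch, with invented names (toy_blkpos, toy_split) and the block size passed in as a parameter:

```c
#include <stdint.h>

/* Where a transfer starts and how much of it fits in that block. */
struct toy_blkpos {
    uint32_t lbn;   /* logical block number within the file */
    uint32_t off;   /* byte offset within that block */
    uint32_t n;     /* bytes that can be transferred from this block */
};

/* Split a (file offset, byte count) request at the first block boundary. */
struct toy_blkpos toy_split(uint32_t offset, uint32_t count, uint32_t bsize)
{
    struct toy_blkpos p;

    p.lbn = offset / bsize;
    p.off = offset % bsize;

    uint32_t room = bsize - p.off;     /* bytes left in this block */
    p.n = count < room ? count : room;
    return p;
}
```

A read that crosses block boundaries repeats this step, advancing the offset by n each time until count is exhausted.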
File I/O (2)
Write:
Data is not written to disk immediately
May increase the file size
May require the allocation of data blocks
For a partial-block write: read the entire block, modify the relevant data,
write back the whole block
Allocating and Reclaiming Inodes
When the reference count drops to 0..
When a file becomes inactive….
It is better to reuse inodes…………
Analysis of s5fs
Reliability concern: the superblock
Performance:
2 disk I/Os (inode, then file data)
Blocks randomly located
Block size: 512 (SVR2), 1024 (SVR3)
Name: 14 characters
Inodes limit: 65535
The Berkeley Fast File System
Hard disk structure
On-disk organization
- Blocks and fragments
- Allocation policy
FFS functionality enhancements
– long file names,
- symbolic links,
- other enhancements;
Analysis
Other file systems
Temporary file systems
- RAM disk, mfs, tmpfs
The Specfs File System
The /proc File System
Linux Virtual File System
Uniform file system interface to user
processes
Represents any conceivable file
system’s general features and behavior
Assumes files are objects that share
basic properties regardless of the target
file system
Primary Objects in VFS
Superblock object: represents a specific mounted file system
Inode object: represents a specific file
Dentry object: represents a specific directory entry
File object: represents an open file associated with a process