CSMC 412 Operating Systems Prof. Ashok K Agrawala © 2006 Ashok Agrawala Set 11 Operating System Concepts 11.1 File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency and Performance Recovery Log-Structured File Systems NFS Example: WAFL File System Operating System Concepts 11.2 1 Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block allocation and free-block algorithms and trade-offs Operating System Concepts 11.3 File System Structure Primary device for File System – Disks z Direct Access z Written in place Operating System Concepts 11.4 2 File-System Structure File structure z Logical storage unit z Collection of related information File system resides on secondary storage (disks) z Direct Access z Written in place File system organized into layers File control block – storage structure consisting of information about a file Operating System Concepts 11.5 Layered File System Operating System Concepts 11.6 3 File System Implementation Several disk and in memory structures are used to support the file system On disk the file system contains a variety of information z Boot information z Disk information – No of blocks z Directory Structure z Free blocks z Files – their data Operating System Concepts 11.7 File System Implementation Boot Control Block – per volume Information to boot the OS – typically the first block of the disk. Boot Block in UFS Partition Boot Sector in NTFS Volume Control Block – per volume z Contains volume or partition details No of blocks. Size of clocks, free block count, free-block pointers, free FCB count Super Block in UFS Stored in Master File table in NTFS Directory Structure – per file system z In UFS – INODE number and File name z In NTFS – Master File Table File Control Block – per file z Operating System Concepts 11.8 4 A Typical File Control Block Operating System Concepts 11.9 In Memory Information Mount Table Directory Structure cache z Information about recently accessed directories System-wide Open-file Table z Copy of FCB of all open files z Additional information Per-process open-file Table z Pointers into System-wide Open File Table. Operating System Concepts 11.10 5 In-Memory File System Structures Operating System Concepts 11.11 File Operations Creating a file z Calls logical file system – knows the format of the directory z Allocates/creates an FCB z Reads the directory into memory z Updates it with the new file information z Writes back to the disk Opening a file z Call passes a file name z Search System-Wide Open File Table to see if it is open If there – create a per-process entry and point to it z Search directory structure for the file name z Copy FCB into System-Wide open file table z Returns a pointer to per-process table entry Make Operating System Concepts an entry in per-process open file table 11.12 6 File Operations Read and Write Operations z Use process open file table entry which also has the pointer z This entry is called File Descriptor in UFS and File Handle in Windows Close z Remove process table entry z Decrease the open count in the System-wide table. If zero, write back any updated meta data to disk and remove the entry. Caching of information – done for efficiency Operating System Concepts 11.13 Partitions and Mounting Partitions or volumes z Divide one disk to look like multiple disks z Manage by having a partition table z Each partition is treated as a raw disk z Root partition dual boot Contains the OS kernel and other system files Mounted at Boot time Other z volumes can be mounted if desired In Memory Mount Table Operating System Concepts 11.14 7 Virtual File Systems Virtual File Systems (VFS) provide an object-oriented way of implementing file systems. VFS allows the same system call interface (the API) to be used for different types of file systems. The API is to the VFS interface, rather than any specific type of file system. Operating System Concepts 11.15 Schematic View of Virtual File System Operating System Concepts 11.16 8 Directory Implementation Linear list of file names with pointer to the data blocks. z simple to program z time-consuming to execute Hash Table – linear list with hash data structure. z decreases directory search time z collisions – situations where two file names hash to the same location z fixed size Operating System Concepts 11.17 Allocation Methods An allocation method refers to how disk blocks are allocated for files: Contiguous allocation Linked allocation Indexed allocation Operating System Concepts 11.18 9 Contiguous Allocation Each file occupies a set of contiguous blocks on the disk Simple – only starting location (block #) and length (number of blocks) are required Random access Wasteful of space (dynamic storage-allocation problem) Files cannot grow Operating System Concepts 11.19 Contiguous Allocation Mapping from logical to physical Q LA/512 R Block to be accessed = Q + starting address Displacement into block = R Operating System Concepts 11.20 10 Contiguous Allocation of Disk Space Operating System Concepts 11.21 Extent-Based Systems Many newer file systems (I.e. Veritas File System) use a modified contiguous allocation scheme Extent-based file systems allocate disk blocks in extents An extent is a contiguous block of disks z Extents are allocated for file allocation z A file consists of one or more extents. Operating System Concepts 11.22 11 Linked Allocation Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk. block = Operating System Concepts pointer 11.23 Linked Allocation (Cont.) Simple – need only starting address Free-space management system – no waste of space No random access Mapping Q LA/511 R Block to be accessed is the Qth block in the linked chain of blocks representing the file. Displacement into block = R + 1 File-allocation table (FAT) – disk-space allocation used by MS-DOS and OS/2. Operating System Concepts 11.24 12 Linked Allocation Operating System Concepts 11.25 File-Allocation Table Operating System Concepts 11.26 13 Indexed Allocation Brings all pointers together into the index block. Logical view. index table Operating System Concepts 11.27 Example of Indexed Allocation Operating System Concepts 11.28 14 Indexed Allocation (Cont.) Need index table Random access Dynamic access without external fragmentation, but have overhead of index block. Mapping from logical to physical in a file of maximum size of 256K words and block size of 512 words. We need only 1 block for index table. Q LA/512 R Q = displacement into index table R = displacement into block Operating System Concepts 11.29 Indexed Allocation – Mapping (Cont.) Mapping from logical to physical in a file of unbounded length (block size of 512 words). Linked scheme – Link blocks of index table (no limit on size). Q1 LA / (512 x 511) R1 Q1 = block of index table R1 is used as follows: Q2 R1 / 512 R2 Q2 = displacement into block of index table R2 displacement into block of file: Operating System Concepts 11.30 15 Indexed Allocation – Mapping (Cont.) Two-level index (maximum file size is 5123) Q1 LA / (512 x 512) R1 Q1 = displacement into outer-index R1 is used as follows: Q2 R1 / 512 R2 Q2 = displacement into block of index table R2 displacement into block of file: Operating System Concepts 11.31 Indexed Allocation – Mapping (Cont.) # outer-index index table Operating System Concepts file 11.32 16 Combined Scheme: UNIX (4K bytes per block) Operating System Concepts 11.33 Free-Space Management Bit vector (n blocks) 0 1 2 n-1 … bit[i] = 0 ⇒ block[i] free 1 ⇒ block[i] occupied Block number calculation (number of bits per word) * (number of 0-value words) + offset of first 1 bit Operating System Concepts 11.34 17 Free-Space Management (Cont.) Bit map requires extra space z Example: block size = 212 bytes disk size = 230 bytes (1 gigabyte) n = 230/212 = 218 bits (or 32K bytes) Easy to get contiguous files Linked list (free list) z Cannot get contiguous space easily z No waste of space Grouping Counting Operating System Concepts 11.35 Free-Space Management (Cont.) Need to protect: z Pointer to free list z Bit map Must be kept on disk Copy in memory and disk may differ Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk z Solution: Set bit[i] = 1 in disk Allocate Set Operating System Concepts block[i] bit[i] = 1 in memory 11.36 18 Linked Free Space List on Disk Operating System Concepts 11.37 Efficiency and Performance Efficiency dependent on: z disk allocation and directory algorithms z types of data kept in file’s directory entry Performance z disk cache – separate section of main memory for frequently used blocks z free-behind and read-ahead – techniques to optimize sequential access z improve PC performance by dedicating section of memory as virtual disk, or RAM disk Operating System Concepts 11.38 19 Page Cache A page cache caches pages rather than disk blocks using virtual memory techniques Memory-mapped I/O uses a page cache Routine I/O through the file system uses the buffer (disk) cache This leads to the following figure Operating System Concepts 11.39 I/O Without a Unified Buffer Cache Operating System Concepts 11.40 20 Unified Buffer Cache A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O Operating System Concepts 11.41 I/O Using a Unified Buffer Cache Operating System Concepts 11.42 21 Various Disk-Caching Locations Operating System Concepts 11.43 Recovery Consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical) Recover lost file or disk by restoring data from backup Operating System Concepts 11.44 22 Log Structured File Systems Log structured (or journaling) file systems record each update to the file system as a transaction All transactions are written to a log z A transaction is considered committed once it is written to the log z However, the file system may not yet be updated The transactions in the log are asynchronously written to the file system z When the file system is modified, the transaction is removed from the log If the file system crashes, all remaining transactions in the log must still be performed Operating System Concepts 11.45 UNIX- File System The UNIX file system supports two main objects: files and directories. Directories are just files with a special format, so the representation of a file is the basic UNIX concept. Operating System Concepts 11.46 23 Blocks and Fragments Most of the file system is taken up by data blocks. 4.2BSD uses two block sized for files which have no indirect blocks: z All the blocks of a file are of a large block size (such as 8K), except the last. z The last block is an appropriate multiple of a smaller fragment size (i.e., 1024) to fill out the file. z Thus, a file of size 18,000 bytes would have two 8K blocks and one 2K fragment (which would not be filled completely). Operating System Concepts 11.47 Blocks and Fragments (Cont.) The block and fragment sizes are set during file-system creation according to the intended use of the file system: z If many small files are expected, the fragment size should be small. z If repeated transfers of large files are expected, the basic block size should be large. The maximum block-to-fragment ratio is 8 : 1; the minimum block size is 4K (typical choices are 4096 : 512 and 8192 : 1024). Operating System Concepts 11.48 24 Inodes A file is represented by an inode — a record that stores information about a specific file on the disk. The inode also contains 15 pointer to the disk blocks containing the files’s data contents. z First 12 point to direct blocks. z Next three point to indirect blocks First indirect block pointer is the address of a single indirect block — an index block containing the addresses of blocks that do contain data. Second is a double-indirect-block pointer, the address of a block that contains the addresses of blocks that contain pointer to the actual data blocks. A triple indirect pointer is not needed; files with as many as 232 bytes will use only double indirection. Operating System Concepts 11.49 Directories The inode type field distinguishes between plain files and directories. Directory entries are of variable length; each entry contains first the length of the entry, then the file name and the inode number. The user refers to a file by a path name,whereas the file system uses the inode as its definition of a file. z The kernel has to map the supplied user path name to an inode z Directories are used for this mapping. Operating System Concepts 11.50 25 Directories (Cont.) First determine the starting directory: If the first character is “/”, the starting directory is the root directory. z For any other starting character, the starting directory is the current directory. The search process continues until the end of the path name is reached and the desired inode is returned. Once the inode is found, a file structure is allocated to point to the inode. 4.3BSD improved file system performance by adding a directory name cache to hold recent directory-to-inode translations. z Operating System Concepts 11.51 Mapping of a File Descriptor to an Inode System calls that refer to open files indicate the file is passing a file descriptor as an argument. The file descriptor is used by the kernel to index a table of open files for the current process. Each entry of the table contains a pointer to a file structure. This file structure in turn points to the inode. Since the open file table has a fixed length which is only setable at boot time, there is a fixed limit on the number of concurrently open files in a system. Operating System Concepts 11.52 26 File-System Control Blocks Operating System Concepts 11.53 Disk Structures The one file system that a user ordinarily sees may actually consist of several physical file systems, each on a different device. Partitioning a physical device into multiple file systems has several benefits. z Different file systems can support different uses. z Reliability is improved z Can improve efficiency by varying file-system parameters. z Prevents one program form using all available space for a large file. z Speeds up searches on backup tapes and restoring partitions from tape. Operating System Concepts 11.54 27 Disk Structures (Cont.) The root file system is always available on a drive. Other file systems may be mounted — i.e., integrated into the directory hierarchy of the root file system. The following figure illustrates how a directory structure is partitioned into file systems, which are mapped onto logical devices, which are partitions of physical devices. Operating System Concepts 11.55 Mapping File System to Physical Devices Operating System Concepts 11.56 28 Implementations The user interface to the file system is simple and well defined, allowing the implementation of the file system itself to be changed without significant effect on the user. For Version 7, the size of inodes doubled, the maximum file and file system sized increased, and the details of free-list handling and superblock information changed. In 4.0BSD, the size of blocks used in the file system was increased form 512 bytes to 1024 bytes — increased internal fragmentation, but doubled throughput. 4.2BSD added the Berkeley Fast File System, which increased speed, and included new features. z New directory system calls z truncate calls z Fast File System found in most implementations of UNIX. Operating System Concepts 11.57 Layout and Allocation Policy The kernel uses a <logical device number, inode number> pair to identify a file. z The logical device number defines the file system involved. z The inodes in the file system are numbered in sequence. 4.3BSD introduced the cylinder group — allows localization of the blocks in a file. z Each cylinder gorup occupies one or more consecutive cylinders of the disk, so that disk accesses within the cylinder group require minimal disk head movement. z Every cylinder group has a superblock, a cylinder block, an array of inodes, and some data blocks. Operating System Concepts 11.58 29 4.3BSD Cylinder Group Operating System Concepts 11.59 The Sun Network File System (NFS) An implementation and a specification of a software system for accessing remote files across LANs (or WANs) The implementation is part of the Solaris and SunOS operating systems running on Sun workstations using an unreliable datagram protocol (UDP/IP protocol and Ethernet Operating System Concepts 11.60 30 NFS (Cont.) Interconnected workstations viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner z A remote directory is mounted over a local file system directory z Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided z The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory Files in the remote directory can then be accessed in a transparent manner Subject to access-rights accreditation, potentially any file system (or directory within a file system), can be mounted remotely on top of any local directory Operating System Concepts 11.61 NFS (Cont.) NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specifications independent of these media This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces The NFS specification distinguishes between the services provided by a mount mechanism and the actual remote-file-access services Operating System Concepts 11.62 31 Three Independent File Systems Operating System Concepts 11.63 Mounting in NFS Mounts Operating System Concepts Cascading mounts 11.64 32 NFS Mount Protocol Establishes initial logical connection between server and client Mount operation includes name of remote directory to be mounted and name of server machine storing it z Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine z Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them Following a mount request that conforms to its export list, the server returns a file handle—a key for further accesses File handle – a file-system identifier, and an inode number to identify the mounted directory within the exported file system The mount operation changes only the user’s view and does not affect the server side Operating System Concepts 11.65 NFS Protocol Provides a set of remote procedure calls for remote file operations. The procedures support the following operations: z searching for a file within a directory z reading a set of directory entries z manipulating links and directories z accessing file attributes z reading and writing files NFS servers are stateless; each request has to provide a full set of arguments (NFS V4 is just coming available – very different, stateful) Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching) The NFS protocol does not provide concurrency-control mechanisms Operating System Concepts 11.66 33 Three Major Layers of NFS Architecture UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors) Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types z The VFS activates file-system-specific operations to handle local requests according to their file-system types z Calls the NFS protocol procedures for remote requests NFS service layer – bottom layer of the architecture z Implements the NFS protocol Operating System Concepts 11.67 Schematic View of NFS Architecture Operating System Concepts 11.68 34 NFS Path-Name Translation Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode To make lookup faster, a directory name lookup cache on the client’s side holds the vnodes for remote directory names Operating System Concepts 11.69 NFS Remote Operations Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files) NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes z Cached file blocks are used only if the corresponding cached attributes are up to date File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server Clients do not free delayed-write blocks until the server confirms that the data have been written to disk Operating System Concepts 11.70 35 Example: WAFL File System Used on Network Appliance “Filers” – distributed file system appliances “Write-anywhere file layout” Serves up NFS, CIFS, http, ftp Random I/O optimized, write optimized z NVRAM for write caching Similar to Berkeley Fast File System, with extensive modifications Operating System Concepts 11.71 The WAFL File Layout Operating System Concepts 11.72 36 Snapshots in WAFL Operating System Concepts 11.73 NTFS File Systems supported by Windows NTFS Design Goals File System Driver Architecture NTFS Operation Windows File System On-Disk Structure NTFS File Compression Operating System Concepts 11.74 37 Windows File System - Terminology Sectors: z z File system formats: z z Define the way data is stored on storage media Impact a file system features: permissions & security, limitations on file size, support for small/large files/disks Clusters: z z z hardware-addressable blocks on a storage medium Typical sector size on hard disks for x86-based systems is 512 bytes Addressable blocks that many file system formats use Cluster size is always a multiple of the sector size Cluster size tradeoff: space efficiency vs. access speed Metadata: z Data stored on a volume in support of file system format management z Metadata includes the data that defines the placement of files and directories on a volume, for example z Typically not accessible to applications Operating System Concepts 11.75 Formats Supported by Windows CD-ROM File System (CDFS) Universal Disk Format (UDF) File Allocation Table (FAT12, FAT16, and FAT32) New Technology File System (NTFS) Operating System Concepts 11.76 38 CDFS CDFS, or, is a relatively simple format that was defined in 1988 as the read-only formatting standard for CD-ROM media. Windows 2000 implements ISO 9660-compliant CDFS in \Winnt\System32\Drivers\Cdfs.sys, with long filename support defined by Level 2 of the ISO 9660 standard Because of its simplicity, the CDFS format has a number of restrictions z Directory and file names must be fewer than 32 characters long z Directory trees can be no more than eight levels deep CDFS is considered a legacy format because the industry has adopted the Universal Disk Format (UDF) as the standard for read-only media Operating System Concepts 11.77 UDF OSTA (Optical Storage Technology Association) defined UDF in 1995 as a format to replace CDFS for magneto-optical storage media, mainly DVDROM z The Windows 2000 UDF file system implementation is ISO 13346compliant and supports UDF versions 1.02 and 1.5 UDF file systems have the following traits: z Filenames can be 255 characters long z The maximum path length is 1023 characters Although the UDF format was designed with rewritable media in mind, the Windows 2000 UDF driver (\Winnt\System32\Drivers\Udfs.sys) provides read-only support Operating System Concepts 11.78 39 FAT FAT (File Allocation Table) file systems are a legacy format that originated in DOS and Windows 9x Reasons why Windows supports FAT file systems: z to enable upgrades from other versions of Windows z compatibility with other operating systems in multiboot systems z as a floppy disk format Windows FAT file system driver is implemented in \Winnt\System32\Drivers\Fastfat.sys Each FAT format includes a number that indicates the number of bits the format uses to identify clusters on a disk Boot sector File allocation table 1 File allocation table 2 (duplicate) Root directory Other directories and all files FAT format organization Operating System Concepts 11.79 FAT12 FAT12's 12-bit cluster identifier limits a partition to storing a maximum of 212 (4096) clusters z Windows uses cluster sizes from 512 bytes to 8 KB in size, which limits a FAT12 volume size to 32 MB z Windows uses FAT12 as the format for all 5-inch floppy disks and 3.5-inch floppy disks, which store up to 1.44 MB of data Operating System Concepts 11.80 40 FAT16 FAT16, with a 16-bit cluster identifier, can address 216 (65,536) clusters z On Windows, FAT16 cluster sizes range from 512 bytes (the sector size) to 64 KB, which limits FAT16 volume sizes to 4 GB z The cluster size Windows uses depends on the size of a volume Operating System Concepts 11.81 FAT32 FAT32 is the most recently defined FAT-based file system format z it's included with Windows 95 OSR2, Windows 98, and Windows Millennium Edition FAT32 uses 32-bit cluster identifiers but reserves the high 4 bits, so in effect it has 28bit cluster identifiers z Because FAT32 cluster sizes can be as large as 32 KB, FAT32 has a theoretical ability to address 8 TB volumes z Although Windows works with existing FAT32 volumes of larger sizes (created in other operating systems), it limits new FAT32 volumes to a maximum of 32 GB z FAT32's higher potential cluster numbers let it more efficiently manage disks than FAT16; it can handle up to 128-MB volumes with 512-byte clusters Unlike FAT12 and FAT16, root directory is not fixed size or location z Largest file size on Windows is 4GB (largest on Win9x is 2G) Operating System Concepts 11.82 41 NTFS NTFS is the native file system format of Windows NTFS uses 64-bit cluster indexes z Theoretical ability to address volumes of up to 16 exabytes (16 billion GB) z Windows 2000 limits the size of an NTFS volume to that addressable with 32-bit clusters, which is 128 TB (using 64-KB clusters) Why use NTFS instead of FAT? FAT is simpler, making it faster for some operations, but NTFS supports: z Larger file sizes and disks z Better performance on large disks, large directories, and small files z Reliability z Security Operating System Concepts 11.83 CIFS – the Common Internet File System The standard Windows network file system The file sharing protocol at the heart of CIFS is an updated version of the Server Message Block (SMB) protocol z dates back to the mid-1980s z in 1996/97, Microsoft submitted draft CIFS specifications to the IETF The SMB protocol was originally developed to run over NetBIOS (Network Basic Input Output System) LANs z Until Windows 2000, NetBIOS support was required for SMB transport z The machine and service names visible in the Windows Network Neighborhood are, basically, NetBIOS addresses (Windows 2000 and later use DNS names) Windows 3.11 (WfW) introduced: z service announcement and location system called Browsing z The browser service provides the list of available file and print services presented in the Network Neighborhood z Workgroup concept was expanded to create NT Domains Operating System Concepts 11.84 42 File System Format Compatibility FAT12/FAT16 supported on all Microsoft OS’s FAT32: z Only Windows 2000/XP/2003 z Winternals FAT32 driver for NT4 NTFS: z Only Windows NT-based OS’s z Winternals NTFSDOS for DOS access z Winternals NTFS for Windows 98 for Win9x/Me Operating System Concepts 11.85 NTFS Design Goals Overcome limitations inherent in FAT / HPFS z FAT (File Allocation Table) does not support large disks very well z FAT16 (MS-DOS file system) supports only up to 216 clusters and 2 GB disks (with 64 Kb clusters!!) z FAT / root directory represents single point of failure z Number of entries in root directory is limited z HPFS removed some of FAT‘s limitations, but still did not support recoverability, security, data redundancy, and fault-tolerance (later versions of HPFS support up to 2TeraByte disks) Operating System Concepts 11.86 43 NTFS Recoverability PC disk I/O in the old days: Speed was most important NTFS changes this view – Reliability counts most: I/O operations that alter NTFS structure are implemented as atomic transactions z Change directory structure, z extend files, allocate space for new files Transactions are either completed or rolled back NTFS uses redundant storage for vital FS information z Contrasts with FAT / HPFS on-disk structures, which have single sectors containing critical file system data z Read error in these sectors -> volume lost Operating System Concepts 11.87 NTFS Security and Recoverability NTFS security is derived from Windows object model Open file is implemented as file object; security descriptor is stored on disk as part of the file NT security system verifies access rights when a process tries to open a handle to any object Administrator or file owner may set permissions NTFS recoverability ensures integrity of FS structure No guarantees for complete recovery of user files Layered driver model + FTDISK driver z z Mirroring of data – RAID level 1 Striping of data - RAID level 5 (one disk with parity info) Operating System Concepts 11.88 44 Large Disks and Large Files Efficient support for large files and disks in NTFS FAT16: z 16-bit wide table stores allocation status of disk Up to 65.536 clusters per volume (#files !!); adjustable cluster size z FAT32: z New in since Windows 2000 4kb clusters on volumes up to 8 GB Can relocate root directory / use backup copy of FAT Root directory is ordinary cluster chain – no limits on #entries z z z HPFS (support dropped in NT 4.0): z 32 bits to enumerate allocation units; maximum file size: 4GB Allocates disk space in terms of physical sectors of 512 bytes; problem with some disks (1024 bit sectors) z Operating System Concepts 11.89 Large Disks and Large Files (contd.) NTFS enumerates cluster with 64-bit numbers Up to 264 clusters of up to 64 Kbytes size Maximum file size: 264 bytes Cluster size is adjustable z z 512 bytes on small disks Maximum of 64Kb on large disks Multiple data streams z z z z z File info: name, owner, time stamps, type implemented as attribute Each attribute consists of a stream – sequence of bytes Default data stream has no name Used to implement services for Macintosh in Windows NT Server New streams can be added: myfile.dat:stream2 File operations manipulate all streams simultaneously Operating System Concepts 11.90 45 Other NTFS Features Multiple data streams Unicode-based names Hard links Junctions Compression and sparse files Change logging Per-user volume quotas Link tracking Encryption POSIX support Defragmentation Operating System Concepts 11.91 Multiple Data Streams In NTFS, each unit of information associated with a file, including its name, its owner, its time stamps, its contents, and so on, is implemented as a file attribute (NTFS object attribute) Each attribute consists of a single stream, that is, a simple sequence of bytes z This generic implementation makes it easy to add more attributes (and therefore more streams) to a file z Because a file's data is "just another attribute" of the file and because new attributes can be added, NTFS files (and file directories) can contain multiple data streams Operating System Concepts 11.92 46 Multiple Data Streams An NTFS file has one default data stream, which has no name z An application can create additional, named data streams and access them by referring to their names. z To avoid altering the Microsoft Windows I/O APIs, which take a string as a filename argument, the name of the data stream is specified by appending a colon (:) to the filename e.g. myfile:stream2 Operating System Concepts 11.93 Unicode Names Like Windows as a whole, NTFS is fully Unicode enabled, using Unicode characters to store names of files, directories, and volumes Operating System Concepts 11.94 47 Hard Links A hard link allows multiple paths to refer to the same file or directory z If you create a hard link named C:\Users\Documents\Spec.doc that refers to the existing file C:\My Documents\Spec.doc, the two paths link to the same on-disk file and you can make changes to the file using either path z can create hard links with the Windoqs API CreateHardLink function or the ln POSIX function Operating System Concepts 11.95 Junctions Junctions, also called symbolic links, allow a directory to redirect file or directory pathname translation to an alternate directory z If the path C:\Drivers is a junction that redirects to C:\Winnt\System32\Drivers, an application reading C:\Drivers\Ntfs.sys actually reads C:\Winnt\System\Drivers\Ntfs.sys z Junctions are a useful way to lift directories that are deep in a directory tree to a more convenient depth without disturbing the original tree's structure or contents Operating System Concepts 11.96 48 Junctions You can create junctions with the junction tool from Sysinternals or the linkd tool from the Resource Kits Operating System Concepts 11.97 Change Logging Many types of applications, such as incremental backup utilities, need to monitor a volume for changes An obvious way to watch for changes is to perform a full scan z Very performance inefficient There is a way for an application to “wait” on a directory and be told of notifications z An application can miss changes since it must specify a buffer to hold them Operating System Concepts 11.98 49 Change Logging With Windows 2000, NTFS introduces the change log, which is a sparse metadata file that records file system events (not enabled by default) z As the file exceeds its maximum on-disk size, NTFS frees the disk space for the oldest portions marking them empty z An application uses Win32 APIs to read events z The log file is shared, and generally large enough that an application won’t miss changes even during heavy file system activity Operating System Concepts 11.99 Per-User Volume Quotas NTFS quota-management support allows for per-user specification of quota enforcement z Can be configured to log an event indicating the occurrence to the system Event Log if a user surpasses his warning limit z If a user attempts to use more volume storage then her quota limit permits, NTFS can log an event to the system Event Log and fail the application file I/O that would have caused the quota violation with a "disk full" error code User disk space is tracked on a per-volume basis by summing the logical sizes of all the files and directories that have the user as the owner in their security descriptors Operating System Concepts 11.100 50 Link Tracking Several types of symbolic file links are used by layered applications z Shell shortcuts allow users to place files in their shell namespace (on their desktop, for example) that link to files located in the file system namespace z Object linking and embedding (OLE) links allow documents from one application to be transparently embedded in the documents of other applications In the past, these links were difficult to manage Windows now has a link-tracking service, TrkWks (it runs in services.exe), tags link sources with a unique object ID z If someone moved a link source (what a link points to), the link broke z NTFS can return the name of a file given a link, so if the link moves the service can query each of a system’s volume for the object ID z A distributed link-tracking service, TrkSvr, works to track link source movement across systems Operating System Concepts 11.101 Encryption While NTFS implements security for files and directories, the security is ineffective if the physical security of the computer is compromised z Can install a parallel copy of Windows z NTFSDOS Encrypting File System (EFS) z Like compression, its operation is transparent z Also like compression, encryption is a file and directory attribute z Files that are encrypted can be accessed only by using the private key of an account's EFS private/public key pair, and private keys are locked using an account's password While you might think that its implemented as a file system filter driver, it’s a driver that’s tightly connected to NTFS Operating System Concepts 11.102 51 POSIX Support POSIX support requires two file system features: z Primary group in security descriptor z Case-sensitive names Operating System Concepts 11.103 Defragmentation Fragmentation: A file is fragmented if its data occupies discontiguous clusters Operating System Concepts 11.104 52 Defragmentation A common myth is that NTFS doesn’t fragment, but it does z Defragmentation APIs have been present since NT 4 z Windows 2000 introduced a non-schedulable graphical defragmenter z A command line interface was added in Windows XP Operating System Concepts 11.105 Compression and Sparse Files NTFS supports transparent compression of files z When a directories is marked compressed it means any files or subdirectories are marked compressed z Compression is performed on 16-cluster blocks of a file z Use Explorer or the compact tool to compress files (compact shows compression rations for compressed files) Sparse files are an application-controlled form of compression that define parts of a file as empty – those areas don’t occupy any disk space z Applications use Windows APIs to define empty areas Operating System Concepts 11.106 53 NTFS File System Driver Flush the log file Log file service Write the cache Cache manager Access the mapped file or flush the cache Log the transaction Read/write the file I/O manager NTFS driver Fault tolerant driver Load data from disk into memory Disk driver Read/write a mirrored or striped volume Read/write the disk Virtual memory manager Operating System Concepts 11.107 Components related to NTFS Cache Manager System wide caching z for NTFS and other file systems drivers z Including network file system drivers (server and redirectors) Cached files are mapped into virtual memory z Specialized interface from Cache Manager to NT virtual memory manager z Memory manager calls NTFS to access disk driver and obtain file Log File Service z z z 2 copies of transaction logs Transaction log is flushed to disk before write-data is sent to disk Cache manager performs actual flush operation Operating System Concepts 11.108 54 NTFS & File Objects Process Handle table File object Object manager data structures File object File control block Data attribute NTFS data structures App accesses files as NT objects by handles. Object Manager and security subsystem verify access rights Operating System Concepts Stream control blocks Master file table Userdefined attribute (used to manage the on-disk structure) NTFS database (on disk) 11.109 NTFS On-Disk Structure Volumes correspond to logical partitions on disk Fault tolerant volumes may span multiple disks z Windows 2000 Disk Administrator utility Volume consists of series of files + unallocated space z FAT volume: some areas specially formatted for file system z NTFS volume: all data are stored as ordinary files NTFS refers internally to clusters z Cluster factor: #sectors/cluster; varies with volume size; (integral number of physical sectors; always a power of 2) Logical Cluster Numbers (LCNs): z refer to physical location z LCNs are contiguous enumeration of all clusters on a volume Operating System Concepts 11.110 55 NTFS Cluster Size Default cluster size is disk-size dependent z 512 bytes for small disks (up to 512 MB) z 1 KB for disks up to 1 GB z 2 KB for disks between 1 and 2 GB z 4 KB for disks larger than 2 GB Tradeoff: disk fragmentation versus wasted space NTFS refers to physical locations via LCNs z Physical cluster = LCN * cluster-factor Virtual Cluster Numbers (VCNs): z Enumerates clusters belonging to a file; mapped to LCNs z LCNs are not necessarily physically contiguous Operating System Concepts 11.111 Master File Table All data stored on a volume is contained in a file MFT: Heart of NTFS volume structure z Implemented as array of file records z One row for each file on the volume (including one row for MFT itself) z Metadata files store file system structure information (hidden files; $MFT; $Volume...) z More than one MFT record for highly fragmented files NTFS z Nfi.exe Utility from OEM Support Tools allows to metadata dump MFT content file (see support.microsoft.com/support/ kb/articles/Q253/0/66.asp) MFT MFT copy (partial) Log file Volume file Attribute def. table Root directory Bitmap file Boot file Bad cluster file ... User files and dirs. Operating System Concepts 11.112 56 NTFS operation Mounting a volume NTFS looks in boot file for physical address of MFT ($MFT) 2nd entry in MFT points to copy of MFT ($MFTMirr) z used to locate metadata files if MFT is corrupted MFT entry in MFT contains VCN-to-LCN mapping info NTFS obtains from MFT addresses of metadata files z NTFS opens these files NTFS performs recovery operations File system is now ready for user access 1. 2. 3. 4. 5. 6. Operating System Concepts 11.113 NTFS metadata NTFS writes to log file ($LogFile) Root directory: z z Record all commands that change volume structure z When NTFS tries to open a file, it starts search in the root directory Once the file is found, NTFS stores the file‘s MFT file reference z Subsequent read/write ops. may access file‘s MFT record directly Bitmap file ($Bitmap): Boot file ($Boot): z z z z stores allocation state volume; each bit represents one cluster Stores bootstrap code Has to be located at special disk address Represented as file by NTFS -> file ops. possible (!) (no editing) Operating System Concepts 11.114 57 NTFS metadata (contd.) Bad-cluster file ($BadClus) z Records bad spots on the disk Volume file ($Volume) z Contains: volume name, NTFS version z Bit, which indicates whether volume is corrupted Attribute Definition Table ($AttrDef) z Defines attribute types supported on the volume z Indicates whether they can be indexed, recovered, etc. Operating System Concepts 11.115 File Records & File Reference Numbers Sequence number 63 File number 0 47 File on NTFS volume is identified by file reference z File number == index in MFT z Sequence number – used by NTFS for consistency checking; incremented each time a reference is re-used File Records: z File is collection of attribute/value pairs (one of which is data) z Unnamed data attribute z Other attributes: filename, time stamp, security descriptor,... z Each file attribute is stored as separate stream of bytes within a file Operating System Concepts 11.116 58 File Records (contd.) NTFS doesn‘t read/write files: z It reads/writes attribute streams z Operations: create, delete, read (byte range), write (byte range) z Read/write normally operate on unnamed data attribute Windows optimization: Security descriptors are stored in a central file and referenced by each file record (saves disk space) Master File Table Standard information Security Filename descriptor Data MFT record for a small file Operating System Concepts 11.117 Standard Attributes for NTFS Files Attribute Description Standard information File attributes: read-only, archive, etc; time stamps; creation/modification time; hard link count Filename Name in Unicode characters; multiple filename attributes possible (POSIX links!!); short names for access by MS-DOS and 16-bin Win applications Security descriptor Specifies who owns the file and who can access it data Contents of the file; a file has one default unnamed data attribute; directory has no default data attrib. Index root, index Three attributes used to implement filename allocation, bitmap index for large directories (dirs. only) Attribute list List of attributes that make up the file and first reference of the MFT record in which the attribute is located (for files which require multiple MFT file records) Operating System Concepts 11.118 59 Attributes (contd.) Each attribute in a file record has a name and a value NTFS identifies attributes: z Uppercase name starting with $: $FILENAME, $DATA Attribute‘s value: Byte stream z The filename for $FILENAME z The data bytes for $DATA Attribute names correspond to numeric typecodes File attributes in an MFT record are ordered by typecodes z Some attribute types may appear more than once (e.g. Filename) Operating System Concepts 11.119 Filenames POSIX: z Case-sensitive, trailing periods & spaces z NTFS namespace equiv. to POSIX space Win32: z Long filenames, unicode names z Multiple dots, embedded spaces, beginning dots MS-DOS: z 8.3 names, case does not matter NTFS generates MS-DOS names for Win32 files automatically z Fully functional aliases for NTFS names z Stored in same directory as long names; dir /x Namespaces POSIX subsystem Win32 subsystem MS-DOS Win16 clients Operating System Concepts 11.120 60 MS-DOS filenames in NTFS Standard info NTFS filename MS-DOS filename Security desc. Data MFT file record with MS-DOS filename attribute NTFS name and MS-DOS name are stored in same file record and refer to same file z Renaming changes both filenames Open, read, write, delete work with both names equally z POSIX hardlinks are implemented in similar way Generation of MS-DOS names: z Deleting a file with multiple names only decreases link count Remove all illegal chars; remove all but one period; truncate to 6 chars Append ~1 to name; truncate extension to 3 chars; all uppercase Increment ~1 if filename duplicates an existing name in directory 1. 2. 3. Operating System Concepts 11.121 Resident & Nonresident Attributes Small files: z All attributes and values fit into MFT z Attribute with value in MFT is called „resident“ z All attributes start with header (always resident) z Header contains offset to attr. value and length of value Standard info NTFS filename Security desc. Data header „RESIDENT“ Offset: 8h Length: 14h Operating System Concepts value MYFILE.DAT 11.122 61 Attributes (contd.) Small directory: z index root attribute contains index of file references for files and subdirectories Standard info NTFS filename Security desc. Index root Empty Index of files file1, file2, file3,... MFT file record for a small directory • If file attribute does not fit into MFT: • • • • NTFS allocates separate cluster (run, extent) to store the values NTFS allocates additional runs if an attribute‘s value later grows Those attributes are called „non-resident“ Header of non-resident attribute contains location info Operating System Concepts 11.123 Large files & directories Standard info NTFS filename Security desc. Data HPFS extended attr. MFT record for large filegrow with data runs Only attributes that can can2be non-resident Filename & standard info are always resident Index of files for directories forms B+ tree Standard info NTFS filename Security desc. Index root Index allocation Bitmap Index of files file4, file8 MFT file record for a large directory with nonresident filename index file1, file2, file3 Operating System Concepts VCN-to-LCN mappings file5, file6 11.124 62 Large files (contd.) NTFS keeps track of runs by means of VCN (Virtual Cluster Numbers) z z z Logical Cluster Numbers represent an entire volume Virtual Cluster Numbers represent clusters belonging to one file Attribute lists may extend over multiple runs (not only data) Standard info NTFS filename Security desc. VCN-to-LCN mappings for a nonresident data attribute VCN 0 1 2 Data Starting VCN Starting LCN Number of clusters 0 1355 4 4 1588 4 VCN 4 3 Data 6 7 Data LCN 1355 1356 1357 1358 Operating System Concepts 5 LCN 1588 1589 1590 1591 11.125 Data Compression NTFS supports compression z Per-file, per-directory, per-volume basis z NTFS compression is performed on user data only, not NTFS metadata Inspect files/volume via Winndows API: GetVolumeInformation(), GetCompressedFileSize() Change settings for files/directories: DeviceIoControl() with flags FSCTL_GET_COMPRESSION, FSCTL_SET_COMPRESSION Operating System Concepts 11.126 63 Compression of sparse files NTFS zeroes all file contents on creation (C2 req.) Many sparse files contain large amount of zero-bytes z These bytes occupy space on disk – unless files are compressed Standard info VCN 0 1 NTFS filename Security desc. 2 3 .... 15 Data Starting VCN Starting LCN Number of clusters 0 1355 16 32 1588 16 48 96 16 128 324 16 Data LCN 1355 1356 1357 1358 .... VCN 32 33 34 35 ... Data LCN 1588 1589 1590 1591 .... Operating System Concepts 1370 47 Certain ranges of VCNs have no (16-31, 6464-127) 1603 disk allocation (1611.127 Compressing Nonsparse Data NTFS divides the file‘s unprocessed data into compression units 16 clusters long Certain sequence might not compress much z NTFS determines for each compression unit whether it will shrink by at least on cluster z If data does not compress, NTFS allocates cluster space and simply writes data z If data compresses at least one cluster, NTFS allocates only the clusters needed for compressed data When writing data, NTFS ensures that each run begins on virtual 16-cluster boundary z NTFS reads/writes at least one compression unit when accessing a file z Read-ahead + asynch. decompression improves performance Operating System Concepts 11.128 64 Data runs of a compressed file 15 VCN 0 Compressed data LCN 19 20 21 16 22 31 Compressed data 23 24 25 32 26 27 28 29 30 47 Noncompressed data 97 Starting VCN Starting LCN No. of clusters 0 19 4 16 23 8 32 97 16 48 113 10 98 99 48 100 101 102 103 104 105 106 107 108 109 110 111 112 63 Compressed data 113 114 115 116 117 118 119 120 121 122 Operating System Concepts MFT record for a compressed file 11.129 Windows - NTFS Extensions Disk quotas on per-user bases Security descriptors (ACLs) can be stored once but referenced in multiple files Native support for properties (OLE), including indexing Reparse points – implementation of symbolic links z z Mount points for arbitrary file system volumes Support for sparse files Distributed link tracking (via global object Ids) Renaming the target file will no longer break links (shortcuts...) Add disk space to an NTFS volume without reboot No decompressing when transmitting files over network Operating System Concepts 11.130 65 File System Driver Architecture Local File System Drivers (Local FSDs): z z z z z z Ntfs.sys, Fastfat.sys, Udfs,sys, Cdfs,sys Responsible for registering with the I/O manager and volume recognition/integrity checks FSD creates device objects for each mounted file system format I/O manager makes connection between volume‘s device objects (Created by storage device) and the FSD‘s device object Local FSDs use cache manager to improve file access performance Dismount operation permits the system to disconnect FSD from volume object When media is changed or when application requires raw device access I/O manager reinitiated volume mount operation on next access to media Operating System Concepts 11.131 Layered Drivers I/O System Architecture Environment subsystem or DLL 1) Call I/O service 2)The I/O manager creates an IRP, initializes first stack location and calls file system driver IRP 3)File system driver fills in a 2nd IRP stack location and calls the disk driver 7)Return I/O pending status User mode Kernel mode Services I/O manager 6)Return I/O pending status File system driver IRP 5)Return I/O pending status Disk driver 4)Send IRP data to device (or queue IRP), and return Operating System Concepts Optimization: associated IRPs may work in parallel on a single I/O request 11.132 66 File System Driver Architecture (contd.) Remote File System Drivers (Remote FSDs): Client-side FSD translates I/O requests from applications into network file system protocol commands Server-side FSD listens for network commands and issues I/O requests to local FSD Windows client-side remote FSD: LANMan Redirector z z z user mode kernel mode I/O manager Implemented as port/miniport driver Includes Windows service Workstation Server-side FSD server: LANMan Server z Application Includes Windows service Server CIFS – common internet file system (enhancement of Server Message Block protocol) Remote FSD (redirector) server client Remote FSD (server) Local FSD volume Operating System Concepts Storage device driver 11.133 Windows Remote File Drivers: Server Message Block (SMB) protocol SMB is a client server, request-response protocol. Addl. info at http://anu.samba.org/ cifs/docs/what-is-smb.html The only exception to the request-response nature of SMB is when the client has requested opportunistic locks (oplocks) and the server subsequently has to break an already granted oplock because another client has requested a file open with a mode that is incompatible with the granted oplock. In this case, the server sends an unsolicited message to the client signaling the oplock break. Operating System Concepts 11.134 67 SMB and the OSI model Clients connect to servers using TCP/IP (actually NetBIOS over TCP/IP as specified in RFC1001 and RFC1002), NetBEUI or IPX/SPX. SMB was also sent over the DECnet protocol. Digital (now HP) did this for their PATHWORKS product Operating System Concepts 11.135 SMB characteristics NetBIOS Names z If SMB is used over TCP/IP, DECnet or NetBEUI, then NetBIOS names must be used in a number of cases. z NetBIOS names are up to 15 characters long, and are usually the name of the computer that is running NetBIOS. z Microsoft, and some other implementers, insist that NetBIOS names be in upper case, especially when presented to servers as the CALLED NAME. Protocol functionality (Core protocol): z connecting to and disconnecting from file and print shares z opening and closing files z opening and closing print files z reading and writing files z creating and deleting files and directories z searching directories z getting and setting file attributes z Locking and unlocking byte ranges in files Operating System Concepts 11.136 68 SMB characteristics (contd.) Security The SMB model defines two levels of security: Share level. z Each share can have a password, and a client only needs that password to access all files under that share. z This was the first security model that SMB had and is the only security model available in the Core and CorePlus protocols. User Level. z Protection is applied to individual files in each share and is based on user access rights. z Each user (client) must log in to the server and be authenticated by the server. z When it is authenticated, the client is given a UID which it must present on all subsequent accesses to the server. z This model has been available since LAN Manager 1.0. Operating System Concepts 11.137 SMB Clients and Servers Clients: z Included in WfW 3.x, Win 95, Win98, Win ME and Windows NT/2000/XP/Server 2003/Vista. z smbclient from Samba, smbfs for Linux, SMBlib Servers: z Microsoft Windows for Workgroups 3.x, Win95, Win98, Win ME, Windows NT/2000/XP/Server 2003/Vista z Samba (Linux, Solaris, SunOS, HP-UX, ULTRIX, DEC OSF/1, Digital UNIX, Dynix (Sequent), IRIX (SGI), SCO Open Server, DG-UX, UNIXWARE, AIX, BSDI, NetBSD, NEXTSTEP, A/UX) z The PATHWORKS family of servers from Digital z LAN Manager for OS/2, SCO, etc z VisionFS from SCO z Advanced Server for UNIX from AT&T (NCR?) z LAN Server for OS/2 from IBM Operating System Concepts 11.138 69
© Copyright 2026 Paperzz