http://www.usenix.org/events/fast08/wips_posters/slides/wong.pdf

Mirror File System
A Multiple Server File System
John Wong
CTO
[email protected]
Twin Peaks Software Inc.
Multiple Server File System
• Conventional File System – EXT3/UFS and NFS
  – Manage files on a single server and its storage devices
• Multiple Server File System
  – Manages files on multiple servers and their storage devices
Problems
• A single resource is vulnerable
• Redundancy provides a safety net
  – Disk level        => RAID
  – Storage level     => Storage Replication
  – TCP/IP level      => SNDR
  – File System level => CFS, MFS
  – System level      => Clustering system
  – Application       => Database
Why MFS?
• Many advantages over existing technologies
Local File System
[Diagram: Applications 1 and 2 in user space call into EXT3 in kernel space, which drives the disk driver to reach the data.]
EXT3 manages files on the local server's storage devices.
Network File System
[Diagram: Client applications go through an NFS client mount, across the network, to NFSD on the server, which uses EXT3/UFS to reach the data.]
NFS manages files on a remote server's storage devices.
EXT3 | NFS
[Diagram: Client and server each run EXT3/UFS over their own copy of Data B; the client reaches the server's copy through an NFS client mount and NFSD, and the two copies are kept in step out of band with tools such as rsync or tar.]
Applications can only use either one, not both.
EXT3 + NFS ??
• Combine these two file systems to manage files on both local and remote servers' storage devices
  -- at the same time
  -- in real time
MFS = EXT3 + NFS
[Diagram: On the active MFS server, applications in user space call into MFS in kernel space; MFS is stacked on EXT3/UFS, which holds the local copy of the data, and on NFS, which carries writes to the passive MFS server, where the data lands on its own EXT3/UFS.]
Building Block Approach
• MFS is a kernel loadable module
  - loaded on top of EXT3/UFS and NFS
• Standard VFS interface (see the sketch below)
• Provides complete transparency
  - to users and applications
  - to the underlying file systems
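The stacking can be pictured with a small user-space sketch: MFS exports the same operations table the VFS expects and forwards each call to the file systems below it. The struct and function names here are hypothetical stand-ins, not the actual Solaris/Linux vnode interfaces.

#include <stdio.h>
#include <stddef.h>

struct fs_ops {
    int (*write)(const char *path, const char *buf, size_t len);
};

/* Stub lower file systems standing in for EXT3/UFS and NFS. */
static int ufs_write(const char *p, const char *b, size_t n)
{
    (void)b;
    printf("ufs: %zu bytes -> %s\n", n, p);
    return 0;
}
static int nfs_write(const char *p, const char *b, size_t n)
{
    (void)b;
    printf("nfs: %zu bytes -> %s\n", n, p);
    return 0;
}

static struct fs_ops ufs_ops = { ufs_write };
static struct fs_ops nfs_ops = { nfs_write };

/* MFS exports the same interface and forwards each write to both
 * file systems below it, so the layers above cannot tell it apart
 * from a conventional file system. */
static int mfs_write(const char *p, const char *b, size_t n)
{
    int local  = ufs_ops.write(p, b, n);
    int remote = nfs_ops.write(p, b, n);
    return (local == 0 && remote == 0) ? 0 : -1;
}

struct fs_ops mfs_ops = { mfs_write };

int main(void)
{
    return mfs_ops.write("/udir1/udir2/file", "hello", 5);
}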
Q&A
[Diagram: two MFS instances, each serving applications over its own data set (Data A and Data B).]
Advantages
• Building block approach
  -- Builds upon existing EXT3, NFS, NTFS, CIFS infrastructures
• No metadata is replicated
  -- Superblocks, cylinder groups, and file allocation maps are not replicated.
• Every file write operation is checked by the file system
  -- file consistency, integrity
• Live file, not raw data, replication
  -- The primary and backup copies are both live files
Advantages
• Interoperability
  -- The two nodes can be different systems
  -- The storage systems can be different
• Small granularity
  -- Directory level, not the entire file system
• One-to-many or many-to-one replication
Advantages
• Fast replication
  -- Replication happens in the kernel file system module
• Immediate failover
  -- No need for fsck or a mount operation
• Geographically dispersed clustering
  -- The two nodes can be separated by hundreds of miles
• Easy to deploy and manage
  -- Only one copy of MFS, running on the primary server, is needed for replication
Why MFS?
• Better Data Protection
• Better Disaster Recovery
• Better RAS
• Better Scalability
• Better Performance
• Better Resource Utilization
File System Framework
[Diagram: User applications issue file operation system calls (open, close, read, write, mkdir, rmdir, link, ioctl, creat, lseek, mount, umount, statfs, sync) through the system call interface to the VFS and vnode interfaces, which dispatch to file systems such as UFS, NFS, VxFS, QFS, PCFS, and HSFS, each backed by its own data disks, network, or optical drive.]
(Source: Solaris Internals: Core Kernel Architecture, Jim Mauro and Richard McDougall, Prentice Hall)
MFS Framework
[Diagram: The same framework with MFS added at the vnode/VFS interface. User applications issue the same file operation system calls through the system call interface; the VFS and vnode interfaces route them to MFS, which sits above UFS and NFS (alongside VxFS, QFS, PCFS, and HSFS) and mirrors data through its own vnode/VFS interface to both.]
Transparency
• Transparent to users and applications
  - No recompilation or relinking needed
• Transparent to existing file structures
  - Same pathname access
• Transparent to the underlying file systems
  - UFS, NFS
Mount Mechanism
• Conventional Mount
- One directory, one file system
• MFS Mount
- One directory, two or more file systems
Mount Mechanism
# mount -F mfs host:/ndir1/ndir2 /udir1/udir2
- First mounts the NFS file system on a UFS directory
- Then mounts MFS on top of UFS and NFS
- The existing UFS tree structure /udir1/udir2 becomes the local copy of MFS
- The newly mounted host:/ndir1/ndir2 becomes the remote copy of MFS
- Same mount options as NFS, except no '-o hard' option
MFS mfsck Command
# /usr/lib/fs/mfs/mfsck mfs_dir
- After an MFS mount succeeds, the local copy may not be identical to the remote copy.
- Use mfsck (the MFS fsck) to synchronize them (a sketch follows).
- mfs_dir can be any directory under the MFS mount point.
- Multiple mfsck commands can be invoked at the same time.
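What mfsck has to do can be sketched in user space: walk the local copy and flag any file that is missing or stale on the remote side. The mount paths and the newer-mtime policy are assumptions; the slides only say the two copies get synchronized.

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical mount points standing in for the MFS pair. */
#define LOCAL_ROOT  "/udir1/udir2"
#define REMOTE_ROOT "/mnt/nfs/ndir1/ndir2"

static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftw)
{
    char remote[1024];
    struct stat rs;

    (void)ftw;
    if (type != FTW_F)                      /* regular files only */
        return 0;
    snprintf(remote, sizeof remote, "%s%s",
             REMOTE_ROOT, path + strlen(LOCAL_ROOT));

    /* Flag the file when the remote copy is absent or older. */
    if (stat(remote, &rs) != 0 || rs.st_mtime < sb->st_mtime)
        printf("sync needed: %s -> %s\n", path, remote);
    return 0;
}

int main(void)
{
    return nftw(LOCAL_ROOT, visit, 16, FTW_PHYS);
}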
READ/WRITE Vnode Operation
• All VFS/vnode operations are received by MFS
• READ-related operations (read, getattr, ...) only need to go to the local copy (UFS)
• WRITE-related operations (write, setattr, ...) go to both the local (UFS) and remote (NFS) copies simultaneously, using threads (see the sketch below)
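In user-space terms, the write fan-out looks roughly like the following sketch: one thread per copy, and the operation succeeds only if both copies are fully written. The file paths and the mirror_write() helper are hypothetical; the real MFS does this inside the kernel at the vnode layer.

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct write_job {
    int         fd;   /* target file descriptor        */
    const char *buf;  /* data to write                 */
    size_t      len;  /* number of bytes               */
    ssize_t     ret;  /* bytes written, or -1 on error */
};

static void *do_write(void *arg)
{
    struct write_job *job = arg;
    job->ret = pwrite(job->fd, job->buf, job->len, 0);
    return NULL;
}

int mirror_write(int local_fd, int remote_fd, const char *buf, size_t len)
{
    struct write_job jobs[2] = {
        { local_fd,  buf, len, -1 },
        { remote_fd, buf, len, -1 },
    };
    pthread_t tid[2];
    int i;

    /* Fan the write out to both copies in parallel. */
    for (i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, do_write, &jobs[i]);
    for (i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);

    /* Succeed only if both copies were fully written. */
    return (jobs[0].ret == (ssize_t)len &&
            jobs[1].ret == (ssize_t)len) ? 0 : -1;
}

int main(void)
{
    /* Hypothetical paths: the local UFS copy and the NFS-mounted remote copy. */
    int local  = open("/udir1/udir2/file", O_RDWR | O_CREAT, 0644);
    int remote = open("/mnt/nfs/ndir1/ndir2/file", O_RDWR | O_CREAT, 0644);

    if (local < 0 || remote < 0)
        return 1;
    if (mirror_write(local, remote, "mirrored write\n", 15) != 0)
        fprintf(stderr, "one of the copies failed\n");
    close(local);
    close(remote);
    return 0;
}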
Mirroring Granularity
• Directory level
  - Mirrors any UFS directory instead of the entire UFS file system
  - Directory A mirrored to Server A
  - Directory B mirrored to Server B
• Block-level update
  - Only changed blocks are mirrored
MFS msync Command
# /usr/lib/fs/mfs/msync mfs_root_dir
- A daemon that resynchronizes an MFS pair after the remote MFS partner fails.
- Upon a write failure, MFS:
  - Logs the name of the file whose write operation failed
  - Starts a heartbeat thread to detect when the remote MFS server is back online
- Once the remote MFS server is back online, msync uses the log to sync the missing files to the remote server (sketched below).
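A minimal sketch of the msync flow, under the assumption that the failure log holds one pathname per line and that reachability of the remote mount point serves as the heartbeat; all paths here are hypothetical.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical mount points and log location. */
#define LOCAL_ROOT  "/udir1/udir2"
#define REMOTE_ROOT "/mnt/nfs/ndir1/ndir2"
#define FAIL_LOG    "/var/mfs/failed.log"

/* Heartbeat: the remote copy is reachable when its root is writable. */
static int remote_alive(void) { return access(REMOTE_ROOT, W_OK) == 0; }

static int copy_file(const char *rel)
{
    char src[1024], dst[1024], buf[8192];
    ssize_t n;
    int in, out;

    snprintf(src, sizeof src, "%s/%s", LOCAL_ROOT, rel);
    snprintf(dst, sizeof dst, "%s/%s", REMOTE_ROOT, rel);
    in  = open(src, O_RDONLY);
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0)
        return -1;
    while ((n = read(in, buf, sizeof buf)) > 0)
        if (write(out, buf, n) != n) { n = -1; break; }
    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}

int main(void)
{
    FILE *log;
    char rel[512];

    while (!remote_alive())        /* heartbeat loop */
        sleep(5);

    log = fopen(FAIL_LOG, "r");    /* one failed pathname per line */
    if (!log)
        return 1;
    while (fgets(rel, sizeof rel, log)) {
        rel[strcspn(rel, "\n")] = '\0';
        if (copy_file(rel) != 0)
            fprintf(stderr, "resync failed: %s\n", rel);
    }
    fclose(log);
    return 0;
}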
Active/Active Configuration
[Diagram: Two active MFS servers. Each runs its own applications over MFS with a local UFS copy (Data A on one server, Data B on the other), and each mirrors to its peer over NFS.]
MFS Locking Mechanism
• MFS uses the UFS and NFS file record locks.
• Locking is required for the active-active configuration.
• Locking makes write-related vnode operations atomic (see the sketch below).
• Locking is enabled by default.
• Locking is not necessary in the active-passive configuration.
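The record locking MFS relies on is the standard POSIX fcntl() lock, which both UFS and NFS support. A sketch of how holding the lock across the mirrored update makes a write atomic with respect to the peer server; locked_write() and the path are hypothetical.

#include <fcntl.h>
#include <unistd.h>

int locked_write(int fd, const char *buf, size_t len, off_t off)
{
    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive write lock      */
        .l_whence = SEEK_SET,
        .l_start  = off,        /* lock only the region...   */
        .l_len    = (off_t)len  /* ...that is being written  */
    };
    ssize_t n;

    if (fcntl(fd, F_SETLKW, &fl) == -1)  /* block until granted */
        return -1;

    n = pwrite(fd, buf, len, off);       /* the mirrored update */

    fl.l_type = F_UNLCK;                 /* release the region  */
    fcntl(fd, F_SETLK, &fl);
    return n == (ssize_t)len ? 0 : -1;
}

int main(void)
{
    int fd = open("/udir1/udir2/shared", O_RDWR | O_CREAT, 0644);
    return fd < 0 ? 1 : locked_write(fd, "x", 1, 0);
}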
Real-Time and Scheduled
• Real-time
  -- Replicates files in real time
• Scheduled
  -- Logs the file path, offset, and size
  -- Replicates only the changed portion of a file (see the sketch below)
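For scheduled mode, the log can be imagined as fixed-size records of (path, offset, length), so the replay pass copies only those byte ranges rather than whole files. The record layout and log path below are assumptions; the slides do not give the actual format.

#include <stdio.h>
#include <sys/types.h>

struct change_rec {
    char   path[512];  /* file that was modified      */
    off_t  offset;     /* where the write started     */
    size_t length;     /* how many bytes were written */
};

/* Append one record; a scheduled job later replays the log, reading
 * each (offset, length) range from the local copy and writing it to
 * the remote copy. */
static int log_change(FILE *log, const char *path, off_t off, size_t len)
{
    struct change_rec rec = { .offset = off, .length = len };
    snprintf(rec.path, sizeof rec.path, "%s", path);
    return fwrite(&rec, sizeof rec, 1, log) == 1 ? 0 : -1;
}

int main(void)
{
    FILE *log = fopen("/var/mfs/change.log", "ab");  /* hypothetical path */
    if (!log)
        return 1;
    log_change(log, "/udir1/udir2/file", 4096, 512);
    fclose(log);
    return 0;
}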
Applications
• Online File Backup
• Server File Backup, active → passive
• Server/NAS Clustering, active ↔ active
MFS = NTFS + CIFS
[Diagram: On a Windows desktop/laptop, applications run over MFS, which is stacked on NTFS (the local data) and on CIFS, which carries the mirrored writes to a remote server that stores its copy on NTFS.]
Online File Backup
[Diagram: A folder on a user's desktop/laptop running MFS is mirrored over a LAN or WAN, in real time or at scheduled times, to a folder on an ISP server running MFS.]
Server Replication
[Diagram: A primary server (email application) and a secondary server, each running the Mirror File System, connected by a heartbeat. Mirroring paths: /home and /var/spool/mail.]
Enterprise Clusters
[Diagram: Multiple application servers, each running the Mirror File System, mirror along their configured mirroring paths to a central server that also runs the Mirror File System (many-to-one replication).]