Professional Documents
Culture Documents
Peter J. Braam
P.J.Braam/CMU -- 1
Aims
Present the data structures in Linux VFS Provide information about flow of control Describe methods and invariants needed to implement a new file system Illustrate with some examples
P.J.Braam/CMU -- 2
History
BSD implemented VFS for NFS: aim dispatch to different filesystems VMS had elaborate filesystem NT/Win95 have VFS type interfaces Newer systems integrate VM with buffer cache. File access
P.J.Braam/CMU -- 3
Linux Filesystems
Media based
ext2 - Linux native ufs - BSD fat - DOS FS vfat - win 95 hpfs - OS/2 minix - well. Isofs - CDROM sysv - Sysv Unix hfs - Macintosh affs - Amiga Fast FS NTFS - NTs FS adfs - Acorn-strongarm
Network
nfs Coda AFS - Andrew FS smbfs - LanManager ncpfs - Novell
Special ones
P.J.Braam/CMU -- 4
Varia:
cfs - crypt filesystem cfs - cache filesystem ftpfs - ftp filesystem mailfs - mail filesystem pgfs - Postgres versioning file system
P.J.Braam/CMU -- 5
P.J.Braam/CMU -- 6
Linux VFS
Multiple interfaces build up VFS:
files dentries inodes superblock quota
File access
VFS can do all caching & provides utility fctns to FS FS provides methods to VFS; many are optional
P.J.Braam/CMU -- 7
P.J.Braam/CMU -- 8
VFS
Manages kernel level file abstractions in one format for all file systems Receives system call requests from user level (e.g. write, open, stat, link) Interacts with a specific file system based on mount point traversal Receives requests from other parts of the kernel, mostly from memory management
P.J.Braam/CMU -- 9
Call into inode layer rc = inode->i_op->i_getattr(inode, buf); of filesystem dput(dentry); return rc; }
P.J.Braam/CMU -- 11
P.J.Braam/CMU -- 12
Data structures
VFS data structures for:
VFS handle to the file: inode (BSD: vnode) User instantiated file handle: file (BSD: file) The whole filesystem: superblock (BSD: vfs) A name to inode translation: dentry
P.J.Braam/CMU -- 13
namei
VFS
struct dentry *namei(parent, name) { if (dentry = d_lookup(parent,name)) ddd_hash(parent, name) ddd_revalidate(dentry)
FS
else
iii_lookup(parent, name) struct inode *iget(ino, dev) { /* try cache else .. */ sss_read_inode() }
P.J.Braam/CMU -- 15
Superblocks
Handle metadata only (attributes etc) Responsible for retrieving and storing metadata from the FS media or peers Struct superblocks hold things like:
device, blocksize, dirty flags, list of dirty inodes super operations wait queue pointer to the root inode of this FS
P.J.Braam/CMU -- 16
Superblock manips:
read_super (mount) put_super (unmount) write_super (unmount) statfs (attributes)
P.J.Braam/CMU -- 17
Inodes
Inodes are VFS abstraction for the file Inode has operations (iii_methods) VFS maintains an inode cache, NOT the individual FSs (compare NT, BSD etc) Inodes contain an FS specific area where:
ext2 stores disk block numbers etc AFS would store the FID
Extraordinary inode ops are good for dealing with stale NFS file handles etc.
P.J.Braam/CMU -- 18
caching
Identifies file
Usual stuff
P.J.Braam/CMU -- 19
Which FS
Inode state
Inode can be on one or two lists:
(hash & in_use) or (hash & dirty ) or unused inode has a use count i_count
Transitions
unused hash: iget calls sss_read_inode
dirty in_use: sss_write_inode hash unused: call on sss_clear_inode, but if i_nlink = 0: iput calls sss_delete_inode when i_count falls to 0
P.J.Braam/CMU -- 21
Inode Cache
Players: 1. iget: if i_count>0 ++ 2. iput: if i_count>1 - 3. free_inodes 4. syncing inodes
Inode_hashtable
sss_clear_inode (freeing inos) or sss_delete_inode (iput) Fs storage sss_read_inode (iget) Fs storage Unused inodes
Dirty inodes
media fs only (mark_inode_dirty) Used inodes
Sales
Red Hat Software sold 240,000 copies of Red Hat Linux in 1997 and expects to reach 400,000 in 1998.
Estimates of installed servers (InfoWorld): - Linux: 7 million - OS/2: 5 million - Macintosh: 1 million
P.J.Braam/CMU -- 23
symbolic links
readlink follow link
pages
readpage, writepage, updatepage - read or write page. Generic for mediafs. bmap - return disk block number of logical block
special operations
Dentry world
Dentry is a name to inode translation structure Cached agressively by VFS Eliminates lookups by FS & private caches
timing on Coda FS: ls -lR 1000 files after priming cache linux 2.0.32: 7.2secs linux 2.1.92: 0.6secs
P.J.Braam/CMU -- 25
Inside dentrys
name pointer to inode pointer to parent dentry list head of children chains for lots of lists use count
P.J.Braam/CMU -- 26
= d_inode pointer
d_alias chains place: d_instantiate remove: dentry_iput
= d_parent pointer
d_child chains place: d_alloc remove: d_prune, d_invalidate, d_put
P.J.Braam/CMU -- 27
Dcache
dentry_hashtable (d_hash chains) dhash(parent, name) list head namei tries cache: d_lookup ddd_compare Success: ddd_revalidate d_invalidate if fails proceed if success Failure: iii_lookup find inode iget sss_read_inode finish: d_add can give negative entry in dcache
P.J.Braam/CMU -- 28
Dentry methods
ddd_revalidate: can force new lookup ddd_hash: compute hash value of name ddd_compare: are names equal? ddd_delete, ddd_put, ddd_iput: FS cleanup opportunity
P.J.Braam/CMU -- 29
Dentry particulars:
ddd_hash and ddd_compare have to deal with extraordinary cases for msdos/vfat:
case insensitive long and short filename pleasantries
Style
P.J.Braam/CMU -- 31
Memory mapping
vm_area structure has
vm_operations inode, addresses etc.
vm_operations
map, unmap swapin, swapout nopage -- read when page isnt in VM
mmap
calls on iii_readpage keeps a use count on the inode until unmap
P.J.Braam/CMU -- 32