Filesystems

Data model

inode

Index nodes, or inodes, are structures describing each file; the stat() system call returns their contents. In most filesystems a fixed number of inodes is allocated when the filesystem is created; XFS is a notable exception, allocating them dynamically. Filenames are not stored in the inode but elsewhere, either in directory data or in a filesystem-managed B-tree.

At a high level, inodes contain the following properties:

  • Inode number.
  • Device containing the file.
  • Mode: file type (socket, symbolic link, regular file, block device, directory, character device, FIFO), the setuid, setgid, and sticky bits, and user, group, and other permissions.
  • uid and gid define the owning user and group.
  • Device ID, if the file itself represents a device.
  • acl defines the POSIX access ACL.
  • default_acl defines the default POSIX ACL, which new entries in a directory inherit.
  • Number of allocated blocks.
  • size contains the file size in bytes.
  • blocksize denotes the preferred I/O block size.
  • atime contains the last access time.
  • mtime contains the last data modification time.
  • ctime contains the last inode (status) change time.
  • Number of hard links.
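Python's os.stat() wraps the stat() family of system calls, so it exposes most of the fields above. A minimal Unix-only sketch; describe_inode is a hypothetical helper name. ACLs are not part of the stat() result and have to be read separately (for example with getfacl).

```python
import os
import stat
import tempfile

def describe_inode(path):
    """Hypothetical helper: return the inode fields that stat() exposes."""
    st = os.stat(path, follow_symlinks=False)  # like lstat(): describe a symlink itself
    return {
        "inode": st.st_ino,
        "device": st.st_dev,                 # device containing the file
        "mode": stat.filemode(st.st_mode),   # type + permission bits, e.g. '-rw-r--r--'
        "uid": st.st_uid,
        "gid": st.st_gid,
        "rdev": st.st_rdev,                  # only meaningful for device files
        "blocks": st.st_blocks,              # counted in 512-byte units
        "size": st.st_size,                  # bytes
        "blksize": st.st_blksize,            # preferred I/O block size
        "atime": st.st_atime,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
        "nlink": st.st_nlink,                # number of hard links
    }

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
info = describe_inode(f.name)
os.unlink(f.name)                            # clean up the temporary file
print(info["size"], info["nlink"], info["mode"])
```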

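unlink() removes a directory entry (a name), not the inode: it decrements the inode's link count, and the inode and its data blocks are freed only once that count reaches zero and no process still has the file open.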
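The link-count behaviour can be observed directly with hard links and an open descriptor. A minimal sketch assuming a Unix system and a filesystem that supports hard links:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()                      # throwaway directory for the demo
a = os.path.join(tmp, "a")
b = os.path.join(tmp, "b")

with open(a, "w") as f:
    f.write("data")
os.link(a, b)                                 # second name (hard link) for the same inode
links_after_link = os.stat(b).st_nlink        # 2

os.unlink(a)                                  # removes only the name "a"
links_after_unlink = os.stat(b).st_nlink      # 1

fd = os.open(b, os.O_RDONLY)
os.unlink(b)                                  # last name gone, but fd is still open
links_while_open = os.fstat(fd).st_nlink      # 0
contents = os.read(fd, 4)                     # the inode and its data are still readable
os.close(fd)                                  # last reference dropped; space can be freed
os.rmdir(tmp)
print(links_after_link, links_after_unlink, links_while_open, contents)
```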

dentry

Dentries, or directory entries, map filenames to inode numbers. The kernel also caches them in the dentry cache (dcache), which speeds up path lookup during filesystem traversal.
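Reading a directory returns exactly these name-to-inode pairs. A minimal sketch using os.scandir(), whose DirEntry.inode() reports the inode number for each name; name_to_inode is a hypothetical helper. Two hard links in the same directory show up as two entries pointing at one inode:

```python
import os
import shutil
import tempfile

def name_to_inode(directory):
    """Hypothetical helper: map each entry name in directory to its inode number."""
    with os.scandir(directory) as it:
        return {entry.name: entry.inode() for entry in it}

tmp = tempfile.mkdtemp()
original = os.path.join(tmp, "original")
open(original, "w").close()                       # create an empty file
os.link(original, os.path.join(tmp, "alias"))    # second dentry, same inode
entries = name_to_inode(tmp)
shutil.rmtree(tmp)                                # clean up
print(entries)                                    # two names, one inode number
```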

File descriptors

Superblock

Superblocks are crucial data structures that contain metadata about the filesystem as a whole. Losing the superblock makes the filesystem unusable, so filesystems usually keep backup copies at several locations across the volume (ext4, for example, stores them in some of its block groups).

They typically record:

  • Filesystem size
  • Block size
  • Free and used block information (counts or bitmaps)
  • Size and location of the inode table
  • Disk block map
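From user space, some of this filesystem-wide information is visible through statvfs(). A minimal Unix-only sketch; fs_summary is a hypothetical helper, and how closely each value mirrors the on-disk superblock depends on the filesystem driver:

```python
import os

def fs_summary(path="/"):
    """Hypothetical helper: filesystem-wide metadata for the filesystem holding path."""
    vfs = os.statvfs(path)
    return {
        "block_size": vfs.f_frsize,                 # fundamental block size
        "size_bytes": vfs.f_blocks * vfs.f_frsize,  # total filesystem size
        "free_bytes": vfs.f_bfree * vfs.f_frsize,   # free space, including root-reserved blocks
        "total_inodes": vfs.f_files,                # may be 0 where inodes are dynamic
        "free_inodes": vfs.f_ffree,
        "read_only": bool(vfs.f_flag & os.ST_RDONLY),
    }

summary = fs_summary("/")
print(summary)
```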

Mounting

Filesystems are mounted onto directories (mount points) within a single directory hierarchy rooted at the root filesystem (/).

Common flags:

  • remount changes the options of an already-mounted filesystem without unmounting it.
  • ro and rw mount the filesystem read-only or read-write.
  • exec and noexec allow or prevent direct execution of binaries on the filesystem, even when they have the execute (+x) bit set.
  • async makes I/O to the filesystem asynchronous (the opposite of sync).
  • auto and noauto control whether the filesystem is mounted by mount -a.
  • defaults sets rw, suid, dev, exec, auto, nouser, and async.
  • suid and nosuid honor or ignore the setuid and setgid bits.
  • user allows an ordinary (non-root) user to mount the filesystem; nouser restricts mounting to root.
  • loop attaches an image file to a loop device so the filesystem inside it can be mounted.
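These options appear as the fourth, comma-separated field of an /etc/fstab line. A minimal sketch of how such a field could be resolved: parse_fstab_line and effective_options are hypothetical helpers, "defaults" is expanded using the list above, and when opposing options conflict the later one is assumed to win. It deliberately ignores options implied by others (for example, user also implies noexec, nosuid, and nodev).

```python
# Expansion of "defaults", as listed above.
DEFAULTS = ["rw", "suid", "dev", "exec", "auto", "nouser", "async"]
OPPOSITES = {
    "ro": "rw", "rw": "ro",
    "exec": "noexec", "noexec": "exec",
    "suid": "nosuid", "nosuid": "suid",
    "dev": "nodev", "nodev": "dev",
    "auto": "noauto", "noauto": "auto",
    "user": "nouser", "nouser": "user",
    "async": "sync", "sync": "async",
}

def parse_fstab_line(line):
    """Hypothetical helper: split an fstab line; dump and pass fields default to 0."""
    fields = line.split()
    fields += ["0"] * (6 - len(fields))
    spec, target, fstype, opts, freq, passno = fields
    return spec, target, fstype, opts.split(","), int(freq), int(passno)

def effective_options(opts):
    """Hypothetical helper: expand 'defaults'; a later option cancels its opposite."""
    result = []
    for opt in opts:
        for o in (DEFAULTS if opt == "defaults" else [opt]):
            if OPPOSITES.get(o) in result:
                result.remove(OPPOSITES[o])
            if o not in result:
                result.append(o)
    return result

_, target, fstype, opts, _, _ = parse_fstab_line("/dev/sdb1 /data ext4 defaults,noexec,ro 0 2")
resolved = effective_options(opts)
print(target, fstype, resolved)
```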

ext4

ext4 uses 48-bit block addressing, giving a maximum filesystem size of 1 EiB and a maximum file size of 16 TiB with a 4 KiB block size.
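Both limits follow from multiplying the number of addressable blocks by the block size. The file-size figure assumes each file's logical block numbers are 32 bits wide:

```python
KiB, TiB, EiB = 2**10, 2**40, 2**60
block_size = 4 * KiB

max_fs = 2**48 * block_size     # 48-bit physical block numbers: 2^48 * 2^12 = 2^60
max_file = 2**32 * block_size   # assumed 32-bit logical block numbers: 2^44

print(max_fs // EiB, "EiB")
print(max_file // TiB, "TiB")
```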

It offers three journaling modes, selected with the data mount option:

  • journal offers the lowest risk: both metadata and data are written to the journal before being written to their final locations in the filesystem. This protects data as well as metadata, at the cost of performance, since all data is written twice.
  • ordered (the default) journals only metadata, but forces data blocks to be written to the filesystem before the related metadata is committed to the journal. After a crash, metadata therefore never points at unwritten blocks; incomplete transactions in the journal are discarded during recovery.
  • writeback also journals only metadata but removes the ordering constraint, so metadata can be committed to the journal before the data it refers to is written. It is the fastest mode, but after a crash files may contain stale data.
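The difference between the modes comes down to where the commit sits relative to the data write. A toy model, not ext4's actual implementation: each mode is represented by one possible order of write steps (writeback permits, but does not require, the order shown), and the check asks whether a crash at any point can leave committed metadata referencing data that never reached disk:

```python
# Toy model: one possible sequence of write steps per mode. After "commit",
# journal replay treats the metadata update as durable.
MODES = {
    "journal":   ["data->journal", "metadata->journal", "commit", "checkpoint"],
    "ordered":   ["data->fs", "metadata->journal", "commit", "checkpoint"],
    "writeback": ["metadata->journal", "commit", "data->fs", "checkpoint"],
}

def stale_data_possible(mode):
    """True if some crash point leaves committed metadata without durable data."""
    steps = MODES[mode]
    for crash_after in range(len(steps) + 1):
        done = steps[:crash_after]               # steps completed before the crash
        data_durable = "data->journal" in done or "data->fs" in done
        if "commit" in done and not data_durable:
            return True
    return False

for mode in MODES:
    print(mode, stale_data_possible(mode))
```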

Children
  1. NFS
  2. Process VFS
  3. ZFS
