Filesystems
Filesystems, or file systems (written as two words), are responsible for laying out data on a mass storage device. These systems implement data structures that make it easy to locate, read, and write data. In addition, filesystems make using persistent data easier for human users by organizing data into hierarchies of folders and providing human-readable names. Many filesystems also facilitate sharing computer systems among multiple users and programs, as file permissions can be used to prevent one user from changing another user’s saved data.
Metadata
A key contribution of filesystems to the field of digital forensics is the automatic addition of metadata to files and folders stored on the system. Metadata, or data about the data, include human-readable filenames and file sizes. Filesystems also distinguish among several kinds of entries, including directories (folders), regular files, links, and special files.
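As a rough illustration, Python's standard `os` and `stat` modules expose the entry type that the filesystem records for each name. The sketch below assumes a Unix-like system (on Windows, for instance, device files do not show up this way); the function name `entry_kind` and its category labels are our own, not part of any library.

```python
import os
import stat


def entry_kind(path: str) -> str:
    """Classify a directory entry from the file-type bits in its metadata.

    os.lstat() is used instead of os.stat() so that a symbolic link is
    reported as a link rather than as whatever it points to.
    """
    mode = os.lstat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    # Device files, FIFOs (named pipes), and sockets end up here.
    return "special file"


if __name__ == "__main__":
    # List the entries in the current directory along with their kinds.
    for name in sorted(os.listdir(".")):
        print(f"{entry_kind(name):<14} {name}")
```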
Importantly, the filesystem also stores timestamps as metadata. Whenever a file is created or modified, a timestamp is updated to reflect the date and time at which the action occurred. The creation time is the time at which the file was first created, while the modification time is the most recent time at which data were written to the file. Many filesystems also record a change time, which is updated whenever the file's metadata (such as its permissions or owner) change, and an access time, the time at which the file was last read. On devices with solid state storage (SSDs, SD cards, and other flash memory), the system administrator will often disable the recording of access times. Flash memory cells can only be written a limited number of times, so writing a timestamp every time a file is read contributes unnecessary wear to the storage device.
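Python's standard `os.stat()` exposes these timestamps. The following is a minimal sketch, assuming Python 3; the helper name `file_times` is ours, and which fields are available (in particular a true creation time) varies by operating system and filesystem.

```python
import os
from datetime import datetime, timezone


def file_times(path: str) -> dict:
    """Return the timestamps stored for *path*, converted to UTC datetimes."""
    st = os.stat(path)

    def utc(seconds: float) -> datetime:
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    times = {
        "modified": utc(st.st_mtime),  # last write to the file's contents
        "accessed": utc(st.st_atime),  # last read; may be stale if atime updates are off
        # On Unix-like systems st_ctime is the metadata *change* time, not the
        # creation time; on Windows, Python has historically put creation time here.
        "changed": utc(st.st_ctime),
    }
    # A true creation ("birth") time is exposed only on some platforms,
    # such as macOS and FreeBSD; elsewhere the attribute is missing.
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        times["created"] = utc(birth)
    return times


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for label, when in file_times(target).items():
        print(f"{label:>8}: {when.isoformat()}")
```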
Whether or not access times are enabled, filesystem metadata are a critical part of any forensic investigation. Ownership information can be used to link a specific individual to a file, while the pattern of timestamps can be used to establish a timeline of the actions that user took. In some cases, the metadata alone might be enough to implicate the user in some action. Conversely, the timestamps might be so closely spaced that they clearly show that an automated process performed an action, which might be exculpatory. For example, if many illicit files were downloaded from a server only a few milliseconds apart, the metadata would suggest that the system may have been infected with a virus or made part of a botnet, since a human could not act that quickly.1
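One way to spot such machine-speed activity is to sort the timestamps and group together those that fall within a small gap of one another. The sketch below is a heuristic of our own, not a standard forensic tool; the function name `find_bursts` and its default thresholds (a 50 ms gap, at least three events) are arbitrary assumptions that an examiner would tune to the case at hand.

```python
from typing import List, Sequence, Tuple


def find_bursts(timestamps: Sequence[float], max_gap: float = 0.05,
                min_count: int = 3) -> List[Tuple[float, float, int]]:
    """Group timestamps (in seconds) into bursts of rapid activity.

    Consecutive events no more than max_gap seconds apart belong to the same
    burst; bursts with fewer than min_count events are discarded.
    Returns a (start, end, count) tuple for each burst.
    """
    bursts: List[Tuple[float, float, int]] = []
    ordered = sorted(timestamps)
    if not ordered:
        return bursts
    start = prev = ordered[0]
    count = 1
    for t in ordered[1:]:
        if t - prev <= max_gap:
            count += 1  # still inside the current burst
        else:
            if count >= min_count:
                bursts.append((start, prev, count))
            start, count = t, 1  # begin a new candidate burst
        prev = t
    if count >= min_count:
        bursts.append((start, prev, count))
    return bursts


if __name__ == "__main__":
    # Hypothetical modification times: four files within 30 ms, then two stragglers.
    print(find_bursts([0.0, 0.01, 0.02, 0.03, 10.0, 20.0]))  # [(0.0, 0.03, 4)]
```

In practice the input might be the `st_mtime` values of every file in a download folder.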
Structure and Organization of a Filesystem
In a nutshell, a filesystem organizes a hierarchy of files and folders (Figure 1) and stores them inside a linear block of space. There are several different data structures that can be used for this purpose, including B-trees2, B+ trees3, and red-black trees.4 Each choice of implementation has various trade-offs in terms of performance, capabilities, and feature sets. As a result, many different filesystems have been created, and the development of new filesystems with new capabilities is an ongoing area of research.
Data Layout
A primary difference between filesystems involves the way data are laid out on the storage medium. One relatively simple design is the File Allocation Table (FAT), which stores file metadata in directory entries and uses a single table, the file allocation table itself, to record which storage units (clusters) belong to each file as a chain of cluster numbers. Another design is to use some type of hierarchical data structure, which permits greater flexibility at the expense of more complex code.
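Following a file's contents through a FAT amounts to walking a linked list stored in the table. This sketch assumes the table has already been read into a Python list of integers and uses FAT32's end-of-chain convention; reading the table from a real disk image (locating it via the boot sector) is omitted, and `cluster_chain` is a hypothetical helper name.

```python
from typing import List

# FAT32 uses 28-bit cluster numbers; the top 4 bits of each entry are reserved.
FAT32_MASK = 0x0FFFFFFF
FAT32_BAD_CLUSTER = 0x0FFFFFF7
FAT32_END_OF_CHAIN = 0x0FFFFFF8  # this value and above mark a file's last cluster


def cluster_chain(fat: List[int], first_cluster: int) -> List[int]:
    """Follow a file's cluster chain through an in-memory FAT32-style table.

    The file's directory entry supplies first_cluster; each table entry
    holds the number of the next cluster belonging to the same file.
    """
    chain: List[int] = []
    cluster = first_cluster
    while True:
        if cluster in chain:
            raise ValueError(f"cycle in cluster chain at {cluster}")
        chain.append(cluster)
        nxt = fat[cluster] & FAT32_MASK
        if nxt >= FAT32_END_OF_CHAIN:
            return chain
        if nxt < 2 or nxt == FAT32_BAD_CLUSTER:
            # 0 means "free" and 1 is reserved; neither may appear mid-chain.
            raise ValueError(f"cluster {cluster} points to invalid entry {nxt:#x}")
        cluster = nxt


if __name__ == "__main__":
    # Hypothetical table: a file starting at cluster 2 occupies clusters 2, 3, 7.
    fat = [0x0FFFFFF8, 0x0FFFFFFF, 3, 7, 0, 0, 0, 0x0FFFFFFF]
    print(cluster_chain(fat, 2))  # [2, 3, 7]
```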
In Unix-like systems, such as Linux, the basic structural unit in the filesystem is the inode. An inode stores basic metadata about a file (such as its size, owner, permissions, and timestamps), along with pointers to the file contents. These pointers may be a list of individual block addresses, or they may take the form of extents, each of which records the start and length of a contiguous run of blocks; a large file can then be described compactly as a handful of contiguous chunks, even when those chunks are not adjacent on the storage device. In terms of metadata, one thing that an inode does not store is the filename. Instead, filenames are stored in special directory structures that map filenames to inodes.
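Hard links make this separation between names and inodes visible: two directory entries can map different filenames to the same inode. A minimal sketch, assuming a Unix-like OS and a filesystem that supports hard links; `demo_hard_link` and the filenames are illustrative only.

```python
import os


def demo_hard_link(directory: str) -> tuple:
    """Give one file two names and show that both refer to a single inode."""
    original = os.path.join(directory, "report.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as f:
        f.write("hello\n")

    os.link(original, alias)  # new directory entry -> same inode; no data copied
    a, b = os.stat(original), os.stat(alias)
    same_inode = a.st_ino == b.st_ino and a.st_dev == b.st_dev
    link_count = a.st_nlink  # how many directory entries point at this inode

    os.remove(original)  # removes one name; the inode survives via the other
    with open(alias) as f:
        contents = f.read()
    return same_inode, link_count, contents


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        print(demo_hard_link(d))  # (True, 2, 'hello\n')
```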
Clusters and Slack Space
Since mass storage devices cannot be accessed in units smaller than a sector or block, filesystems also allocate space in units of one or more sectors, called clusters (many Unix-like filesystems call them blocks). A single cluster represents the smallest amount of storage space that a single file can consume, even if the actual size of the file is smaller than the size of the cluster. Cluster sizes vary: they may be as small as a single sector (traditionally 512 bytes) or quite large (64 KiB or more). Larger clusters can improve efficiency for workloads dominated by large files, since each file then spans fewer clusters that the filesystem must track.
There is a fundamental trade-off between cluster size and wasted space. Because storage inside a filesystem cannot be allocated in units smaller than a cluster, and file sizes are rarely exact multiples of the cluster size, the last cluster of a file is usually only partly filled; the larger the cluster, the more space tends to be wasted. In digital forensics, this unused space at the end of a file's last cluster is known as slack space, and it is a potentially important source of evidence.
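The arithmetic is straightforward: the number of clusters is the file size divided by the cluster size, rounded up, and the slack is whatever remains unused in the last cluster. A sketch, with `allocation` as a hypothetical helper name; it assumes the file's data occupy whole clusters, which does not hold for filesystems that store very small files inside their metadata.

```python
def allocation(file_size: int, cluster_size: int) -> tuple:
    """Return (clusters used, bytes allocated, slack bytes) for one file.

    Assumes the file's data occupy whole clusters; some filesystems store
    very small files inside their metadata instead.
    """
    clusters = -(-file_size // cluster_size)  # ceiling division on integers
    allocated = clusters * cluster_size
    return clusters, allocated, allocated - file_size


if __name__ == "__main__":
    for cluster_size in (512, 4096, 65536):
        print(cluster_size, allocation(10_000, cluster_size))
```

For a 10,000-byte file, this gives 240 bytes of slack with 512-byte clusters, 2,288 bytes with 4 KiB clusters, and 55,536 bytes with 64 KiB clusters.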
It is possible for a computer user to hide data in slack space using special tools.5 However, a more likely source of data in slack space is files that were previously deleted and then overwritten. Since deleting a file typically only removes its directory entry and the internal pointers to its contents, the contents of the old file often remain on the storage device. When the contents of a new file are written over the same clusters, the new data may be shorter than the old data, and portions of the old data might then be recoverable from the slack space.
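The following toy simulation shows how a fragment of a deleted file can survive in the slack of a reused cluster. It is deliberately simplified: the 16-byte cluster and the function names are hypothetical, and real disks are written a whole sector at a time, so in practice the surviving old bytes are typically those in the cluster's trailing sectors that the new file never touched.

```python
def overwrite_cluster(cluster: bytearray, new_data: bytes) -> None:
    """Write new_data at the start of a reused cluster.

    Only len(new_data) bytes change; the rest of the cluster keeps whatever
    the previous (deleted) file left there.
    """
    if len(new_data) > len(cluster):
        raise ValueError("new data does not fit in one cluster")
    cluster[: len(new_data)] = new_data


def file_slack(cluster: bytes, file_size: int) -> bytes:
    """Bytes between the logical end of the file and the end of its cluster."""
    return bytes(cluster[file_size:])


if __name__ == "__main__":
    # A hypothetical 16-byte cluster, fully used by a file that was later deleted.
    cluster = bytearray(b"SECRET-PLANS-V2!")
    new_file = b"grocery"  # a shorter new file is written into the same cluster
    overwrite_cluster(cluster, new_file)
    print(file_slack(cluster, len(new_file)))  # b'PLANS-V2!'
```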
Fragmentation
While larger cluster sizes waste more slack space, small cluster sizes have their own problems. A small cluster size combined with a poor layout algorithm can cause the individual clusters holding the file to be scattered all over the disk, or fragmented. Mechanical hard disks are especially susceptible to the effects of fragmentation, since the fragments usually wind up on separate tracks, requiring the drive heads to be moved in the middle of a file access operation. This type of head movement reduces disk performance.
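Given the ordered list of clusters that hold a file, fragmentation can be measured by collapsing consecutive cluster numbers into runs; each run after the first is another place where a hard disk may have to move its heads. The cluster numbers below are made up, and `extents` is a hypothetical helper name (each run is the same kind of start-and-length pair that extent-based filesystems record).

```python
from typing import List, Tuple


def extents(clusters: List[int]) -> List[Tuple[int, int]]:
    """Collapse a file's ordered cluster list into (start, length) runs.

    Each run is one contiguous chunk on the device; more than one run
    means the file is fragmented.
    """
    runs: List[Tuple[int, int]] = []
    for c in clusters:
        if runs and c == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extends the current run
        else:
            runs.append((c, 1))  # starts a new fragment
    return runs


if __name__ == "__main__":
    runs = extents([10, 11, 12, 40, 41, 7])
    print(runs)       # [(10, 3), (40, 2), (7, 1)]
    print(len(runs))  # 3 fragments
```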
Some filesystems resist fragmentation better than others. For filesystems that are prone to fragmentation (such as FAT32), disk defragmentation (“defrag”) tools can collect the fragments of each file and rewrite them into contiguous clusters. Alternatively, if a fragmentation-prone filesystem must be used, it can be placed on a solid state drive instead of a hard drive. Since SSDs have no drive heads to move, the performance impact of fragmentation is relatively minor.
Journaling
Some filesystems implement journaling, a strategy that provides some protection against computer crashes during write operations. In a non-journaling filesystem, if the computer crashes or loses power while filesystem structures are being modified, the system can be left in an inconsistent state. Journaling avoids this problem by first writing a record of the operations to be performed to a journal. Once the journal records have been written, the filesystem begins to make the requested structural modifications, such as those that occur when writing data. If a crash or power loss occurs during the write, the journal is replayed at the next system boot or filesystem mount, bringing the filesystem back into a consistent state relatively quickly.
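The record, commit, apply, and replay sequence can be modeled in a few lines. This is a toy sketch under explicit assumptions: an in-memory dictionary stands in for on-disk structures, the class and method names are hypothetical, and real journals (whose formats differ between filesystems) live on the storage device itself.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str]  # (location, new value); stands in for a metadata update


class JournaledStore:
    """Toy write-ahead journal over an in-memory dict, not a real filesystem."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.journal: List[Tuple[str, Optional[Op]]] = []

    def transaction(self, ops: List[Op], crash_before: Optional[int] = None) -> None:
        # 1. Record every intended change, then a commit marker, in the journal.
        for op in ops:
            self.journal.append(("op", op))
        self.journal.append(("commit", None))
        # 2. Apply the changes in place. crash_before simulates power loss
        #    just before the op with that index is applied.
        for i, (key, value) in enumerate(ops):
            if i == crash_before:
                raise RuntimeError("simulated power loss")
            self.data[key] = value
        # 3. All changes applied, so the journal records can be discarded.
        self.journal.clear()

    def replay(self) -> None:
        """At mount time: re-apply committed transactions, drop incomplete ones."""
        pending: List[Op] = []
        for kind, op in self.journal:
            if kind == "op" and op is not None:
                pending.append(op)
            elif kind == "commit":
                for key, value in pending:
                    self.data[key] = value
                pending = []
        self.journal.clear()


if __name__ == "__main__":
    store = JournaledStore()
    try:
        store.transaction(
            [("inode 12", "size=4096"), ("dir /home", "report.txt -> inode 12")],
            crash_before=1,
        )
    except RuntimeError:
        print("after crash:", store.data)  # only the inode update landed
    store.replay()
    print("after replay:", store.data)  # both updates present
```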
Because journaling increases the number of writes to a device, it may be turned off on some types of flash storage, or journaling filesystems may be avoided there altogether. However, if journaling is enabled, the filesystem journal might contain evidence related to filesystem operations that have not yet been performed, as well as records of recently completed operations that have not yet been overwritten.
Using a Filesystem
A filesystem is typically created inside a single partition on a storage device. Creating a filesystem is called formatting, which refers to the process of laying out the initial filesystem data structures inside the partition. Formatting destroys any filesystem structures that were on the partition before the format operation started. However, unless the format procedure also wipes the disk (as a full, rather than quick, format does in Windows Vista and later, by writing zeros over the volume), the contents of files from a prior filesystem are often largely recoverable with forensic tools.
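A toy model of why a quick format leaves old data recoverable while a zeroing format does not. The 256-byte “partition”, the 64-byte metadata area, and the function names are all hypothetical; `carve` stands in, very crudely, for the signature searches that forensic carving tools perform on raw images.

```python
METADATA_BYTES = 64  # hypothetical size of the metadata area at the start


def quick_format(image: bytearray) -> None:
    """Write fresh (here simply zeroed) metadata; leave the data area alone."""
    image[:METADATA_BYTES] = bytes(METADATA_BYTES)


def full_format(image: bytearray) -> None:
    """Overwrite the whole partition with zeros (metadata setup omitted)."""
    image[:] = bytes(len(image))


def carve(image: bytes, signature: bytes) -> int:
    """Crude carving: offset of signature in the raw image, or -1 if absent."""
    return image.find(signature)


if __name__ == "__main__":
    old = bytearray(256)  # hypothetical 256-byte partition
    old[100:117] = b"old file contents"  # data left by the previous filesystem
    quick, full = bytearray(old), bytearray(old)
    quick_format(quick)
    full_format(full)
    print(carve(quick, b"old file contents"))  # 100
    print(carve(full, b"old file contents"))  # -1
```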
Once a filesystem has been formatted, the system administrator can mount it, making it ready for use. On Unix-like operating systems, including Linux and macOS, filesystems are mounted onto a directory within a single hierarchy. For example, on Linux a flash drive might be mounted at “/mnt/flash_drive” or “/run/media/user/flash_drive”. Recent versions of Windows can also mount filesystems onto directories, but by default, Windows systems use a separate hierarchy per filesystem, each starting with a drive letter. Drive letters A: and B: are traditionally reserved for floppy disk drives, and C: is normally the system partition, so D: through Z: are typically used for additional volumes.
On all operating systems, it is good practice to unmount a filesystem before removing its underlying storage device from the computer. Because operating systems tend to cache pending writes rather than performing them immediately, removing a device without unmounting it may result in data loss.
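On a Unix-like system, the mount point holding a given path can be found by walking up the directory tree until Python's `os.path.ismount()` returns true. A minimal sketch; `mount_point` is a hypothetical helper name. The optional `os.sync()` call (Unix only) asks the OS to write out cached data, but it is not a substitute for unmounting.

```python
import os


def mount_point(path: str) -> str:
    """Return the mount point of the filesystem that contains *path*.

    Walks up the directory tree until os.path.ismount() reports a mount
    point; on Unix-like systems the root "/" is always one.
    """
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the top without finding a mount point
            break
        path = parent
    return path


if __name__ == "__main__":
    print(mount_point(os.getcwd()))
    # Flush cached writes to storage; the filesystem still has to be
    # unmounted before the device is removed.
    if hasattr(os, "sync"):
        os.sync()
```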
Examples of Filesystems
Table 1 lists various filesystems by the operating system on which they are most common. Note that not all filesystems are listed, as there are a large number of them, and new ones are under constant development. In addition, it is possible for an operating system to support filesystems that are primarily used by other operating systems. For example, most Linux systems can be configured with at least partial support for nearly any filesystem in the table.
Operating System | Filesystems |
---|---|
Linux | Ext2, Ext3, Ext4, Btrfs, XFS, JFS, YAFFS |
macOS | APFS, HFS+ |
Windows | FAT12, FAT16, FAT32, exFAT, NTFS, ReFS |
*BSD | ZFS, UFS, UFS2, HAMMER, HAMMER2, LFS |
Notes and References
- “A Misconfigured Laptop, a Wrecked Life.” ABC News.
- R. Bayer and E. McCreight. “Organization and Maintenance of Large Ordered Indices.” Acta Informatica, 1(3): 173-189.
- Rob Landley. “Red-black Trees (rbtree) in Linux.”