In the previous post, I shared the basics of filesystems in Node.js. In this post, we'll delve into the best practices for handling filesystems and explore the reasons behind them.
First, let's review what a filesystem is.
A filesystem is a logical structure that organizes, manages, and provides access to files and directories on storage devices like hard drives or SSDs. Filesystems can vary between operating systems, but some are designed to work across multiple platforms or for specific applications.
The Node.js documentation outlines several best practices for handling filesystems:
- Filesystem Behavior
- Avoid a Lowest Common Denominator Approach
- Adopt a Superset Approach
- Case Preservation
- Unicode Form Preservation
- Timestamp Resolution
- Normalization Functions
Filesystem Behavior
As mentioned at the beginning, some filesystems work across multiple platforms, but their behaviors differ significantly in areas like case sensitivity, Unicode form preservation, and timestamp resolution. Instead of relying on assumptions (e.g., inferring behavior from process.platform), it's better to directly probe the filesystem to understand its features. This approach helps account for variations, including users with different filesystems mounted at different paths.
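As a rough illustration of probing, here is a minimal sketch that checks whether the filesystem behind a given directory is case-sensitive. The helper name isCaseSensitive and the probe file names are my own, not something from the Node.js documentation, and the sketch assumes the directory is writable.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: returns true if the filesystem holding `dir`
// treats names that differ only in case as different files.
function isCaseSensitive(dir) {
  // Work inside a throwaway directory so nothing existing is touched.
  const probeDir = fs.mkdtempSync(path.join(dir, 'probe-'));
  try {
    fs.writeFileSync(path.join(probeDir, 'casetest'), '');
    // On a case-insensitive filesystem the uppercase name
    // resolves to the file we just created.
    return !fs.existsSync(path.join(probeDir, 'CASETEST'));
  } finally {
    fs.rmSync(probeDir, { recursive: true, force: true });
  }
}

console.log(isCaseSensitive(os.tmpdir()));
```

Because the answer depends on the path, not the operating system, the probe should be run against the directory you actually plan to work in.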
Avoid a Lowest Common Denominator Approach
Implementing "a lowest common denominator" means aligning systems to the simplest and most restrictive method to ensure compatibility across all environments. Some examples the documentation shares are normalizing all filenames to uppercase, normalizing all filenames to NFC Unicode form, and normalizing all file timestamps to, say, 1-second resolution. The documentation advises against this: a program built this way cannot work with more capable filesystems the way users expect, and it risks filename or timestamp collisions and, eventually, lost data.
Adopt a Superset Approach
A superset approach does the opposite: support the most capable behavior found on any platform, so the program works correctly with advanced filesystems and still functions on simpler ones. For example, preserve features like case sensitivity, Unicode form preservation, and high-resolution timestamps, even when some platforms do not natively support them.
Case Preservation
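Case preservation is not the same as case sensitivity. Some filesystems keep the case a filename was created with, while others convert filenames to uppercase or lowercase when storing them, so the name you read back may not match the name you wrote. To maintain compatibility, it's crucial to understand and handle these differences.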
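To see which behavior a directory has, one option is to write a mixed-case name and check what the directory listing returns. This is only a sketch under the assumption that the directory is writable; preservesCase and the probe file name are hypothetical names I made up for the example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: returns true if the filesystem holding `dir`
// stores a filename with the same case it was created with.
function preservesCase(dir) {
  const probeDir = fs.mkdtempSync(path.join(dir, 'probe-'));
  try {
    const name = 'MixedCase.tmp';
    fs.writeFileSync(path.join(probeDir, name), '');
    // readdirSync returns names as the filesystem stored them.
    return fs.readdirSync(probeDir).includes(name);
  } finally {
    fs.rmSync(probeDir, { recursive: true, force: true });
  }
}

console.log(preservesCase(os.tmpdir()));
```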
Unicode Form Preservation
Unicode allows the same visible character to be written as more than one sequence of code points. The character "é", for example, can be a single precomposed code point (U+00E9, the NFC form) or the letter "e" followed by a combining acute accent (U+0065 U+0301, the NFD form). The two look identical on screen, but they encode to different bytes in UTF-8 and do not compare as equal strings.
An example provided in the Node.js documentation is the character "é" in café. If you create a directory named "test/café" in NFC Unicode form (with a byte sequence <63 61 66 c3 a9>, which is 5 bytes and string.length === 4), reading the directory with fs.readdir('test') may return ['café'] in NFD Unicode form (with a byte sequence <63 61 66 65 cc 81>, which is 6 bytes and string.length === 5).
This difference is not a bug. Node.js simply returns filenames as stored by the filesystem, and not all filesystems preserve Unicode forms. For instance:
- HFS+ (used in macOS) normalizes filenames to a Unicode form similar to NFD.
- NTFS (Windows) and EXT4 (Linux) generally store the filename in the form it was given, so do not expect them to behave like HFS+.
To avoid problems, do not permanently change how data is stored through normalization. Instead, preserve the original Unicode form and use normalization only for comparison or processing.
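Here is a small sketch of "normalize only for comparison", using the café example. The function name sameName is an assumption of mine. String.prototype.normalize is standard JavaScript, and the stored names are never modified.

```javascript
// The same visible name in two Unicode forms.
const nfc = 'caf\u00e9';  // precomposed "é"
const nfd = 'cafe\u0301'; // "e" + combining acute accent

// Hypothetical helper: compares two filenames by meaning,
// without changing either of the original strings.
function sameName(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

console.log(nfc === nfd);                    // false
console.log(sameName(nfc, nfd));             // true
console.log(Buffer.byteLength(nfc, 'utf8')); // 5
console.log(Buffer.byteLength(nfd, 'utf8')); // 6
```

When you need to open or rename the file afterwards, use the name exactly as readdir returned it, not the normalized copy.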
Timestamp Resolution
Filesystems store timestamps with varying levels of precision:
- Milliseconds: Some filesystems record timestamps in milliseconds (e.g., 1444291759414).
- Seconds: Others might round to 1-second resolution (e.g., 1444291759000) or 2-second resolution (e.g., 1444291758000).
- Coarse Resolutions: Certain filesystems have very coarse resolution for some attributes, for example 24 hours for atime (access time) on FAT.
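When comparing timestamps that came from filesystems with different precision, one option is to compare them at the coarser resolution while leaving the stored values alone. The sketch below uses the sample values from the list above. The helper names are hypothetical, and it assumes the coarser filesystem truncates (rounds down) rather than rounds to nearest.

```javascript
// Hypothetical helper: truncate a millisecond timestamp to a resolution.
// Assumption: the filesystem rounds down to its resolution.
function truncateTo(ms, resolutionMs) {
  return Math.floor(ms / resolutionMs) * resolutionMs;
}

// Hypothetical helper: treat two timestamps as equal if they match
// once both are reduced to the coarser resolution.
function sameTimestamp(a, b, resolutionMs) {
  return truncateTo(a, resolutionMs) === truncateTo(b, resolutionMs);
}

const precise = 1444291759414; // millisecond resolution

console.log(truncateTo(precise, 1000));                   // 1444291759000
console.log(truncateTo(precise, 2000));                   // 1444291758000
console.log(sameTimestamp(precise, 1444291758000, 2000)); // true
```

The original millisecond value stays untouched, so nothing is lost if the file later moves to a filesystem with finer resolution.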
Normalization Functions
There are some functions for normalization, and you might also consider using toUpperCase() and toLowerCase(). However, we should never use them to permanently normalize or store filenames. These functions should only be used for comparison.
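As a sketch of comparison-only normalization, the lookup below matches a requested name against a directory listing while ignoring case and Unicode form, then returns the name exactly as it was stored. findStoredName is a hypothetical helper, and plain toLowerCase() is a simplification that will not cover every language's case rules.

```javascript
// Hypothetical helper: find the stored filename that matches `wanted`,
// ignoring case and Unicode form. The stored name is returned unchanged.
function findStoredName(names, wanted) {
  // Normalized copies are used as comparison keys only.
  const key = wanted.normalize('NFC').toLowerCase();
  return names.find((name) => name.normalize('NFC').toLowerCase() === key);
}

const listing = ['README.md', 'Caf\u00e9.txt'];

console.log(findStoredName(listing, 'readme.MD'));       // README.md
console.log(findStoredName(listing, 'cafe\u0301.TXT'));  // Café.txt
console.log(findStoredName(listing, 'missing.txt'));     // undefined
```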
It might seem a bit overwhelming at first, but as developers, it's crucial to consider these practices. They help us work effectively as a team, create robust applications, and deliver better services to our users.