What is ZFS?

ZFS is a revolutionary file system and logical volume manager that fundamentally changes the way file systems are administered, with features and benefits not found in any other file system available today. ZFS is robust, scalable, and easy to administer.

It provides far greater capacity for files, much simpler administration and much stronger data integrity.

ZFS uses a 128-bit addressing scheme and can store up to 275 billion TB per storage pool. These limits are far beyond the size of any storage system built today.

Storage Pools

ZFS does away with the concept of disk volumes, partitions and disk provisioning by adopting pooled storage, where all available hard drives in a system are essentially joined together. The combined bandwidth of the pooled devices is available to ZFS, which effectively maximizes storage space, speed and availability.

ZFS takes available storage drives and pools them together as a single resource, called a “zpool”. This can be optimized for capacity, I/O performance or redundancy, using striping, mirroring or some form of RAID. If more storage is needed, then more drives can simply be added to the zpool. ZFS sees the new capacity and starts using it automatically, balancing I/O and maximizing throughput.

Instead of pre-allocating metadata like other file systems, ZFS allocates metadata dynamically as it is needed. No space has to be set aside for metadata when the pool is initialized, and there is no fixed limit on the number of files or directories a file system can hold.

File systems are no longer constrained to individual devices, allowing them to share disk space with all file systems in the pool. You no longer need to predetermine the size of a file system, as file systems grow automatically within the disk space allocated to the storage pool. When new storage is added, all file systems within the pool can immediately use the additional disk space without additional work.
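The shared-space behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Pool` class and its method names are hypothetical, sizes are abstract units, and nothing here reflects how ZFS actually tracks space.

```python
class Pool:
    """Toy storage pool: every file system draws from one shared free-space counter."""

    def __init__(self, device_sizes):
        self.capacity = sum(device_sizes)   # all devices joined together
        self.used = {}                      # file system name -> units used

    def free(self):
        return self.capacity - sum(self.used.values())

    def create_fs(self, name):
        self.used[name] = 0                 # no size is chosen up front

    def write(self, name, amount):
        if amount > self.free():
            raise OSError("pool is full")
        self.used[name] += amount

    def add_device(self, size):
        self.capacity += size               # new space is visible to every file system


pool = Pool([100, 100])
pool.create_fs("home")
pool.create_fs("projects")
pool.write("home", 150)       # larger than any single device, still fits
pool.add_device(100)          # grow the pool
pool.write("projects", 120)   # only possible because of the added device
```

Neither file system was given a size, and the second write succeeds only because the added device enlarged the space both of them share.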

Data Integrity

One major feature that distinguishes ZFS from other file systems is that ZFS is designed with a focus on data integrity. That is, it is designed to protect the data on disk against silent data corruption caused by bit rot, current spikes, bugs in disk firmware, phantom writes, misdirected reads/writes, memory parity errors between the array and server memory, driver errors and accidental overwrites.

ZFS keeps data on disk consistent at all times using a number of techniques, including copy-on-write. When data is changed, it is not overwritten in place: it is always written to a new block and checksummed before the pointers to the data are changed. The old data may be retained, which makes it possible to keep snapshots of the data through time as changes are made. File writes using ZFS are transactional: either everything or nothing is written to disk.
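As a rough illustration of the copy-on-write idea, the sketch below writes every change to a fresh block, records a checksum, and only then switches the live pointers in one step. The `CowStore` class is hypothetical and SHA-256 is simply an assumed checksum for the example; it is not a description of the ZFS on-disk format.

```python
import hashlib


class CowStore:
    """Toy copy-on-write store with per-block checksums."""

    def __init__(self):
        self.blocks = {}     # block id -> (data, checksum)
        self.root = {}       # file name -> block id (the live pointers)
        self.next_id = 0

    def _write_block(self, data):
        digest = hashlib.sha256(data).hexdigest()
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = (data, digest)
        return block_id

    def commit(self, changes):
        """Apply a dict of name -> bytes as a single transaction."""
        new_root = dict(self.root)
        for name, data in changes.items():
            new_root[name] = self._write_block(data)   # never overwrite in place
        self.root = new_root                           # one pointer switch at the end

    def read(self, name):
        data, digest = self.blocks[self.root[name]]
        if hashlib.sha256(data).hexdigest() != digest:
            raise OSError("checksum mismatch")
        return data


store = CowStore()
store.commit({"a.txt": b"version 1"})
old_root = store.root
store.commit({"a.txt": b"version 2"})
```

After the second commit, `old_root` still points at the untouched first block. If a commit fails part-way through, `self.root` is never reassigned, so readers keep seeing the previous consistent state.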

Scrubbing

ZFS can be scheduled to perform a “scrub” on all the data in a storage pool, checking each piece of data against its corresponding checksum to verify its integrity, detect any silent data corruption and correct any errors where possible.

When the data is stored in a redundant fashion — in a mirrored or RAID-type array — it can be self-healed automatically and without any administrator intervention. Since data corruption is logged, ZFS can bring to light defects in memory modules (or other hardware) that cause data to be stored on hard drives incorrectly.

Scrubbing is given low I/O priority so that it has a minimal effect on system performance and can operate while the storage pool is in use.
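A scrub with self-healing on a two-way mirror can be sketched as follows. This is a simplified model, assuming SHA-256 checksums stored separately from the data; the function names and the list-based "disks" are hypothetical.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


def scrub(mirror_a, mirror_b, checksums):
    """Check every block on both copies; repair a bad copy from the good one.

    Returns a log of (block index, action) tuples.
    """
    log = []
    for i, expected in enumerate(checksums):
        ok_a = checksum(mirror_a[i]) == expected
        ok_b = checksum(mirror_b[i]) == expected
        if ok_a and not ok_b:
            mirror_b[i] = mirror_a[i]
            log.append((i, "repaired copy B"))
        elif ok_b and not ok_a:
            mirror_a[i] = mirror_b[i]
            log.append((i, "repaired copy A"))
        elif not ok_a and not ok_b:
            log.append((i, "unrecoverable"))
    return log


blocks = [b"alpha", b"beta", b"gamma"]
sums = [checksum(b) for b in blocks]
disk_a, disk_b = list(blocks), list(blocks)
disk_b[1] = b"bet\x00"                    # simulate silent corruption on one disk
report = scrub(disk_a, disk_b, sums)      # [(1, 'repaired copy B')]
```

The returned log plays the role of the corruption record mentioned above: repeated repairs on the same device would point to failing hardware.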

Snapshots

An advantage of copy-on-write is that, when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are created very quickly, since all the data composing the snapshot is already stored.

They are also space efficient, since any unchanged data is shared among the file system and its snapshots.

Initially, snapshots consume no additional disk space within the pool. As data within the active dataset changes, the snapshot consumes disk space by continuing to reference the old data. As a result, the snapshot prevents the data from being freed back to the pool.
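The space accounting described above can be modelled in a few lines. In this sketch a snapshot is just a frozen copy of the pointer map, and it starts to "cost" space only once the live data stops referencing a block. The `Dataset` class and its methods are hypothetical names for illustration.

```python
class Dataset:
    """Toy dataset: files point at blocks, snapshots freeze the pointer map."""

    def __init__(self):
        self.blocks = {}       # block id -> bytes
        self.live = {}         # file name -> block id
        self.snapshots = {}    # snapshot name -> {file name -> block id}
        self._next = 0

    def write(self, name, data):
        self.blocks[self._next] = data     # always a fresh block
        self.live[name] = self._next
        self._next += 1

    def snapshot(self, snap_name):
        self.snapshots[snap_name] = dict(self.live)   # copies pointers, not data

    def held_by_snapshot(self, snap_name):
        """Bytes that stay allocated only because the snapshot references them."""
        live_ids = set(self.live.values())
        snap_ids = set(self.snapshots[snap_name].values())
        return sum(len(self.blocks[b]) for b in snap_ids - live_ids)


ds = Dataset()
ds.write("report", b"x" * 100)
ds.write("notes", b"y" * 50)
ds.snapshot("monday")
before = ds.held_by_snapshot("monday")    # 0: everything is still shared
ds.write("report", b"z" * 120)            # live data moves on
after = ds.held_by_snapshot("monday")     # 100: the old block is kept for the snapshot
```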

Clones

Writeable snapshots (“clones”) can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist. This is possible due to the copy-on-write design.
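Block sharing between clones can be sketched in the same spirit. The helper names below are hypothetical, and the dictionary-based block store is only a stand-in for real on-disk blocks.

```python
blocks = {0: b"kernel", 1: b"config v1", 2: b"data"}   # shared block store
snapshot = {"kernel": 0, "config": 1, "data": 2}       # read-only pointer map


def make_clone(snap):
    return dict(snap)            # new writable pointer map, no data copied


def write(fs, name, data):
    new_id = max(blocks) + 1
    blocks[new_id] = data        # the change lands in a new block
    fs[name] = new_id


def shared_blocks(*filesystems):
    return set.intersection(*(set(fs.values()) for fs in filesystems))


clone_a = make_clone(snapshot)
clone_b = make_clone(snapshot)
write(clone_a, "config", b"config for A")
still_shared = shared_blocks(clone_a, clone_b)   # {0, 2}
```

Only the block that `clone_a` changed is duplicated; the other clone and the original snapshot are unaffected.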

Sending & Receiving Snapshots

Snapshots of ZFS file systems and volumes can be sent to remote hosts over the network. This data stream can be an entire file system or volume, or it can be the changes since it was last sent. When sending only the changes, the stream size depends on the number of blocks changed between the snapshots. This provides a very efficient strategy for synchronizing backups.
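The difference between a full and an incremental stream can be shown with a toy model in which a snapshot is a mapping from names to block contents. The function names and stream layout are hypothetical and do not reflect the real send stream format.

```python
def incremental_stream(old_snap, new_snap):
    """Describe only what is needed to turn old_snap into new_snap."""
    changed = {name: block for name, block in new_snap.items()
               if old_snap.get(name) != block}
    deleted = [name for name in old_snap if name not in new_snap]
    return {"changed": changed, "deleted": deleted}


def receive(target, stream):
    """Apply a stream on the receiving side and return the resulting state."""
    result = dict(target)
    result.update(stream["changed"])
    for name in stream["deleted"]:
        result.pop(name, None)
    return result


monday = {"a": b"1", "b": b"2", "c": b"3"}
tuesday = {"a": b"1", "b": b"22", "d": b"4"}

stream = incremental_stream(monday, tuesday)
remote = receive(monday, stream)     # the remote copy now matches tuesday
```

The unchanged entry `"a"` never appears in the stream, which is why the stream size follows the amount of change rather than the size of the file system.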

Caching

ARC & L2ARC

The ARC is the “adaptive replacement cache”. ARC is a very fast block-level cache located in the system's memory. Any read requests for data in the cache can be served directly from the ARC memory cache instead of hitting the much slower hard drives. This creates a noticeable performance increase for data that is accessed frequently.

As a general rule, you want to install as much RAM into the server as you can to make the ARC as big as possible. At some point adding more memory becomes cost prohibitive, which is where the L2ARC becomes important. The L2ARC is the second level adaptive replacement cache. The L2ARC is often called the “cache drive” in ZFS systems. The algorithms that manage L2ARC population are automatic and intelligent.

When cache drives are present in the ZFS pool, they cache frequently accessed data that did not fit in the ARC. When read requests come into the system, ZFS will attempt to serve those requests from the ARC. If the data is not in the ARC, ZFS will attempt to serve the requests from the L2ARC. Hard drives are only accessed when data does not exist in either the ARC or the L2ARC. This means the hard drives receive far fewer requests, which can dramatically improve IOPS.
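The read path described above (ARC first, then L2ARC, then disk) can be sketched as a two-level cache. Note the simplifications: this toy uses plain least-recently-used eviction and moves evicted blocks into the second tier, whereas the real ARC uses an adaptive algorithm and feeds the L2ARC differently. The `TieredReader` class is a hypothetical name.

```python
from collections import OrderedDict


class TieredReader:
    """Toy read path: memory cache, then cache drive, then slow disk."""

    def __init__(self, disk, arc_size, l2arc_size):
        self.disk = disk                  # dict: block id -> data (slow tier)
        self.arc = OrderedDict()          # stands in for RAM
        self.l2arc = OrderedDict()        # stands in for the cache drive
        self.arc_size = arc_size
        self.l2arc_size = l2arc_size
        self.hits = {"arc": 0, "l2arc": 0, "disk": 0}

    def _insert_arc(self, key, data):
        self.arc[key] = data
        if len(self.arc) > self.arc_size:
            old_key, old_data = self.arc.popitem(last=False)
            self.l2arc[old_key] = old_data        # spill to the second tier
            if len(self.l2arc) > self.l2arc_size:
                self.l2arc.popitem(last=False)

    def read(self, key):
        if key in self.arc:
            self.hits["arc"] += 1
            self.arc.move_to_end(key)             # mark as recently used
            return self.arc[key]
        if key in self.l2arc:
            self.hits["l2arc"] += 1
            data = self.l2arc.pop(key)
        else:
            self.hits["disk"] += 1
            data = self.disk[key]
        self._insert_arc(key, data)
        return data


reader = TieredReader({1: b"one", 2: b"two", 3: b"three"}, arc_size=2, l2arc_size=2)
for block_id in (1, 2, 3, 3, 1):
    reader.read(block_id)
```

In this run the disk is touched three times, once per distinct block; the repeated reads are served from the two cache tiers.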

ZIL

The ZIL is the “ZFS intent log” and acts as a logging mechanism to store synchronous writes until they are safely written to the main data structure on the storage pool. The speed at which data can be written to the ZIL determines the speed at which synchronous write requests can be completed. Placing the ZIL on fast disks accelerates it and improves synchronous write performance. Like the L2ARC, the ZIL is managed automatically and intelligently by ZFS.
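The role of the intent log can be sketched with a toy model: a synchronous write is acknowledged once it is in the log, the main pool is updated later in bulk, and the log is replayed if the system crashes in between. The `SyncWriter` class and its method names are hypothetical.

```python
class SyncWriter:
    """Toy intent log in front of a slower main pool."""

    def __init__(self):
        self.intent_log = []   # stands in for the fast, persistent log device
        self.pending = {}      # in-memory changes awaiting the next bulk commit
        self.pool = {}         # main on-disk state

    def sync_write(self, name, data):
        self.intent_log.append((name, data))   # durable once the log write lands
        self.pending[name] = data
        return "acknowledged"

    def commit(self):
        self.pool.update(self.pending)         # bulk write to the main pool
        self.pending.clear()
        self.intent_log.clear()                # logged entries are no longer needed

    def crash_and_recover(self):
        self.pending.clear()                   # memory contents are lost
        for name, data in self.intent_log:     # replay what was acknowledged
            self.pool[name] = data
        self.intent_log.clear()


writer = SyncWriter()
writer.sync_write("a", b"1")
writer.commit()
writer.sync_write("b", b"2")    # acknowledged, but not yet in the main pool
writer.crash_and_recover()      # the log brings it back
```

Because the caller waits only for the append to `intent_log`, the speed of that device sets the latency of every synchronous write.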

SSD Drives

High-performance SSDs can be added to a storage pool to create a hybrid storage pool. SSDs can be added to a ZFS pool as “cache” drives (for the L2ARC) or as “log” drives (for the ZIL).

By adding drives for both the L2ARC and the ZIL, both reads and synchronous writes are accelerated.

Other Features

Block Level Deduplication

ZFS can employ block-level deduplication: it detects identical blocks and keeps only one copy of the data. This can significantly reduce storage allocation. Deduplication is an inline process, occurring when the data is written.
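Inline block-level deduplication can be sketched as a table keyed by block checksum. This is a minimal illustration, assuming SHA-256 as the checksum; the `DedupStore` class is hypothetical and ignores details such as hash collisions and freeing blocks.

```python
import hashlib


class DedupStore:
    """Toy deduplicating block store."""

    def __init__(self):
        self.table = {}    # checksum -> [data, reference count]

    def write(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key in self.table:
            self.table[key][1] += 1        # identical block: add a reference only
        else:
            self.table[key] = [data, 1]    # first copy: actually store it
        return key

    def stored_bytes(self):
        return sum(len(data) for data, _ in self.table.values())


dedup = DedupStore()
for block in (b"A" * 4096, b"A" * 4096, b"A" * 4096, b"B" * 4096):
    dedup.write(block)
```

Four 4096-byte blocks were written, but only two distinct blocks are stored, so `dedup.stored_bytes()` is 8192 rather than 16384.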

Compression

ZFS offers native compression. The LZ4 algorithm offers very fast compression and decompression. Compression is completely transparent to applications.
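Transparent compression can be illustrated with a small wrapper that compresses on write and decompresses on read, so callers only ever see the original bytes. LZ4 is not part of the Python standard library, so this sketch uses `zlib` purely as a stand-in; the `CompressedStore` class is hypothetical.

```python
import zlib


class CompressedStore:
    """Toy block store that compresses transparently (zlib stands in for LZ4)."""

    def __init__(self):
        self.blocks = {}   # name -> (is_compressed, payload)

    def write(self, name, data):
        packed = zlib.compress(data)
        if len(packed) < len(data):
            self.blocks[name] = (True, packed)
        else:
            self.blocks[name] = (False, data)   # keep raw when compression does not help

    def read(self, name):
        is_compressed, payload = self.blocks[name]
        return zlib.decompress(payload) if is_compressed else payload


cstore = CompressedStore()
cstore.write("log", b"abc" * 1000)   # highly repetitive, compresses well
cstore.write("tiny", b"x")           # too small to benefit, stored as-is
```

Reading either entry returns exactly what was written; whether a block was compressed is visible only inside the store.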