One of the many features OpenZFS brings to the table is ZFS native encryption. First introduced in OpenZFS 0.8, native encryption allows a system administrator to transparently encrypt data at-rest within ZFS itself. This obviates the need for separate tools like LUKS, VeraCrypt, or BitLocker.
The OpenZFS encryption algorithm defaults to either aes-256-ccm (prior to 0.8.4) or aes-256-gcm (>= 0.8.4) when encryption=on is set, but it may also be specified directly. Currently supported algorithms are:
aes-128-ccm
aes-192-ccm
aes-256-ccm (default in OpenZFS < 0.8.4)
aes-128-gcm
aes-192-gcm
aes-256-gcm (default in OpenZFS >= 0.8.4)
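As a minimal sketch of specifying the algorithm directly, the helper below passes a cipher name to the encryption property instead of relying on encryption=on. The run and create_encrypted helpers and the dataset name tank/secure are assumptions made for illustration; by default the script only prints the command it would run.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless APPLY=1 is set.
# Run as root with APPLY=1 to actually execute it.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Create an encrypted dataset with an explicitly chosen cipher.
# "tank/secure" below is a placeholder dataset name, not one from the article.
create_encrypted() {
    dataset=$1
    cipher=${2:-aes-256-gcm}
    run zfs create -o "encryption=$cipher" -o keyformat=passphrase \
        -o keylocation=prompt "$dataset"
}

create_encrypted tank/secure aes-256-gcm
```

Pinning the cipher this way keeps the result the same regardless of which OpenZFS version's default would otherwise apply.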
There’s more to OpenZFS native encryption than the algorithms used, though—so we’ll try to give you a brief but solid grounding in the sysadmin’s-eye perspective on the “why” and “what” as well as the simple “how.”
Why (or why not) OpenZFS native encryption?
A clever sysadmin who wants to provide at-rest encryption doesn’t actually need OpenZFS native encryption, obviously. As mentioned in the introduction, LUKS, VeraCrypt, and many other schemes are available and can be layered either beneath or atop OpenZFS itself.
First, the “why not”
Putting something like Linux’s LUKS underneath OpenZFS has an advantage: with the entire disk encrypted, an enterprising attacker can no longer see the names, sizes, or properties of ZFS datasets and zvols without access to the key. In fact, the attacker can’t necessarily see that ZFS is in use at all!
But there are significant disadvantages to putting LUKS (or similar) beneath OpenZFS. One of the gnarliest is that each individual disk which will be part of the pool must be encrypted, with each volume loaded and decrypted prior to the ZFS pool import stage. This can be a noticeable challenge for ZFS systems with many disks (in some cases, many tens of disks). Another problem with encryption-beneath-ZFS is that the extra layer is an extra thing to go wrong, and it’s in a position to undo all of ZFS’ normal integrity guarantees.
LUKS or similar atop OpenZFS gets rid of the aforementioned problems: a LUKS-encrypted zvol only needs one key regardless of how many disks are involved, and the LUKS layer cannot undo OpenZFS’ integrity guarantees from here. Unfortunately, encryption-atop-ZFS introduces a new problem: it effectively nerfs OpenZFS inline compression, since encrypted data is generally incompressible. This approach also requires the use of one zvol per encrypted filesystem, along with a guest filesystem (e.g., ext4) to format the LUKS volume itself with.
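To make the encryption-atop-ZFS layering concrete, here is a rough sketch of the steps it implies: one zvol, one LUKS container on that zvol, and one guest filesystem inside the container. The run and luks_on_zvol helpers, the names tank/vault and vault, and the 10G size are all placeholders chosen for illustration. The script prints the commands by default rather than executing them.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Build one LUKS-encrypted ext4 filesystem on top of one zvol.
luks_on_zvol() {
    zvol=$1   # e.g. tank/vault (placeholder)
    size=$2   # e.g. 10G (placeholder)
    name=$3   # device-mapper name for the opened LUKS container
    run zfs create -V "$size" "$zvol"              # the zvol itself
    run cryptsetup luksFormat "/dev/zvol/$zvol"    # LUKS header on the zvol
    run cryptsetup open "/dev/zvol/$zvol" "$name"  # unlock the container
    run mkfs.ext4 "/dev/mapper/$name"              # guest filesystem
}

luks_on_zvol tank/vault 10G vault
```

Every additional encrypted filesystem repeats all four steps, which is the per-filesystem overhead the paragraph above describes.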
Now, the “why”
OpenZFS native encryption splits the difference: it operates atop the normal ZFS storage layers and therefore doesn’t nerf ZFS’ own integrity guarantees. But it also doesn’t interfere with ZFS compression, because data is compressed prior to being saved to an encrypted dataset or zvol.
There’s an even more compelling reason to choose OpenZFS native encryption, though: something called “raw send.” ZFS replication is ridiculously fast and efficient, frequently several orders of magnitude faster than filesystem-neutral tools like rsync, and raw send makes it possible not only to replicate encrypted datasets and zvols, but to do so without exposing the key to the remote system.
This means that you can use ZFS replication to back up your data to an untrusted location, without concerns about your private data being read. With raw send, your data is replicated without ever being decrypted, and without the backup target ever being able to decrypt it at all. You can replicate your offsite backups to a friend’s house or to a commercial service like rsync.net or zfs.rent without compromising your privacy, even if the service (or friend) is itself compromised.
In the event that you need to recover your offsite backup, you can simply replicate it back to your own location—then, and only then, loading the decryption key to actually access the data. This works for either full replication (moving every single block across the wire) or asynchronous incremental replication (beginning from a commonly held snapshot and only moving the blocks which have changed since that snapshot).
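The full and incremental raw replication described above might look roughly like the following. This is a sketch under assumptions: the function names, the host backup.example.com, and the dataset and snapshot names are invented for illustration, and the functions only print the pipelines so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (not run) raw-send pipelines. All names passed in are placeholders.

# Full raw replication: every block of one snapshot, still encrypted.
raw_send_full() {
    snap=$1
    host=$2
    target=$3
    printf 'zfs send --raw %s | ssh %s zfs receive %s\n' "$snap" "$host" "$target"
}

# Incremental raw replication: only blocks changed since a snapshot
# that both sides already hold.
raw_send_incremental() {
    common=$1
    newer=$2
    host=$3
    target=$4
    printf 'zfs send --raw -i %s %s | ssh %s zfs receive %s\n' \
        "$common" "$newer" "$host" "$target"
}

raw_send_full tank/secure@monday backup.example.com backuppool/secure
raw_send_incremental tank/secure@monday tank/secure@tuesday \
    backup.example.com backuppool/secure
```

Neither pipeline loads a key on the receiving host, so the backup target stores the data without being able to read it.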
What’s encrypted—and what isn’t?
OpenZFS native encryption isn’t a full-disk encryption scheme—it’s enabled or disabled on a per-dataset / per-zvol basis, and it cannot be turned on for entire pools as a whole. The contents of encrypted datasets or zvols are protected from at-rest spying—but the metadata describing the datasets/zvols themselves is not.
Let’s say we create an encrypted dataset named pool/encrypted (banshee/encrypted on our test system), and beneath it we create several more child datasets. The encryption property for the children is inherited by default from the parent dataset, so we can see the following:
root@banshee:~# zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase banshee/encrypted
Enter passphrase:
Re-enter passphrase:
root@banshee:~# zfs create banshee/encrypted/child1
root@banshee:~# zfs create banshee/encrypted/child2
root@banshee:~# zfs create banshee/encrypted/child3
root@banshee:~# zfs list -r banshee/encrypted
NAME                      USED  AVAIL  REFER  MOUNTPOINT
banshee/encrypted        1.58M   848G   432K  /banshee/encrypted
banshee/encrypted/child1  320K   848G   320K  /banshee/encrypted/child1
banshee/encrypted/child2  320K   848G   320K  /banshee/encrypted/child2
banshee/encrypted/child3  320K   848G   320K  /banshee/encrypted/child3
root@banshee:~# zfs get encryption banshee/encrypted/child1
NAME                      PROPERTY    VALUE        SOURCE
banshee/encrypted/child1  encryption  aes-256-gcm  -
At the moment, our encrypted datasets are all mounted. But even if we unmount them and unload the encryption key—making them inaccessible—we can still see that they exist, along with their properties:
root@banshee:~# wget -qO /banshee/encrypted/child2/HuckFinn.txt http://textfiles.com/etext/AUTHORS/TWAIN/huck_finn
root@banshee:~# zfs unmount banshee/encrypted
root@banshee:~# zfs unload-key -r banshee/encrypted
1 / 1 key(s) successfully unloaded
root@banshee:~# zfs mount banshee/encrypted
cannot mount 'banshee/encrypted': encryption key not loaded
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
root@banshee:~# zfs list -r banshee/encrypted
NAME                      USED  AVAIL  REFER  MOUNTPOINT
banshee/encrypted        2.19M   848G   432K  /banshee/encrypted
banshee/encrypted/child1  320K   848G   320K  /banshee/encrypted/child1
banshee/encrypted/child2  944K   848G   720K  /banshee/encrypted/child2
banshee/encrypted/child3  320K   848G   320K  /banshee/encrypted/child3
As we can see above, after unloading the encryption key, we can no longer see our freshly-downloaded copy of Huckleberry Finn in /banshee/encrypted/child2/. What we can still see is the existence, and structure, of our entire ZFS-encrypted tree. We can also see each encrypted dataset’s properties, including but not limited to the USED, AVAIL, and REFER of each dataset.
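Because this metadata stays readable, a script can tell which datasets are locked without holding any key. The sketch below filters tab-separated name and keystatus pairs, the shape produced by zfs get -H -r -o name,value keystatus. The locked_datasets helper is hypothetical, and the sample input is canned for illustration (banshee/plain is an invented unencrypted dataset), not captured from a real system.

```shell
#!/bin/sh
# Read "name<TAB>keystatus" lines on stdin and print the names of
# datasets whose encryption key is not currently loaded.
# Intended real-world input (needs a ZFS system):
#   zfs get -H -r -o name,value keystatus banshee/encrypted | locked_datasets
locked_datasets() {
    awk -F '\t' '$2 == "unavailable" { print $1 }'
}

# Canned sample input, for illustration only.
printf 'banshee/encrypted\tunavailable\nbanshee/plain\t-\n' | locked_datasets
```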
It’s worth noting that trying to ls an encrypted dataset which doesn’t have its key loaded won’t necessarily produce an error:
root@banshee:~# zfs get keystatus banshee/encrypted
NAME               PROPERTY   VALUE        SOURCE
banshee/encrypted  keystatus  unavailable  -
root@banshee:~# ls /banshee/encrypted
root@banshee:~#
This is because a naked directory exists on the host, even when the actual dataset is not mounted. Reloading the key doesn’t automatically remount the dataset, either:
root@banshee:~# zfs load-key -r banshee/encrypted
Enter passphrase for 'banshee/encrypted':
1 / 1 key(s) successfully loaded
root@banshee:~# zfs mount | grep encr
root@banshee:~# ls /banshee/encrypted
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
In order to access our fresh copy of Huckleberry Finn, we’ll also need to actually mount the freshly key-reloaded datasets:
root@banshee:~# zfs get keystatus banshee/encrypted/child2
NAME                      PROPERTY   VALUE      SOURCE
banshee/encrypted/child2  keystatus  available  -
root@banshee:~# ls -l /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
root@banshee:~# zfs mount -a
root@banshee:~# ls -lh /banshee/encrypted/child2
total 401K
-rw-r--r-- 1 root root 554K Jun 13 2002 HuckFinn.txt
Now that we’ve both loaded the necessary key and mounted the datasets, we can see our encrypted data again.
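Since loading the key and mounting are separate steps, it can be handy to wrap them together. The following is a minimal sketch using the same two commands as the session above; the run and unlock_and_mount helpers are hypothetical, and the script prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Load the key for a dataset tree, then mount everything that can be mounted.
unlock_and_mount() {
    run zfs load-key -r "$1"   # prompts for the passphrase when applied
    run zfs mount -a           # mounts all mountable datasets, system-wide
}

unlock_and_mount banshee/encrypted
```

Note that zfs mount -a is not limited to the unlocked tree. zfs mount also accepts an -l flag that loads keys as datasets are mounted, which may be a tidier option where it is available.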