This is part of my series on renovating my homelabs to ring in the roaring ’20s.

I have a few TB of data that’s been living in a Btrfs array on an underpowered FX “server”, and I want to move it to a more robust ZFS array in the new Epyc server. This post describes setting up ZFS on Linux, ZED, and Postfix+Mailx for regular filesystem status emails.

ZFS

ZFS is an enterprise filesystem created by Sun for Solaris more than 15 years ago. It has since been ported to Linux under the stewardship of the OpenZFS project - and the port has become popular enough that the ZFS-on-Linux codebase (zfsonlinux/zfs) is becoming the primary codebase for OpenZFS (according to the 2019 ZFS Summit slides).

Warning regarding desktop drives

ZFS wants to be run on enterprise or NAS drives. Consumer desktop drives that spin down or park heads during idle periods can drop out of the zpool, causing RAIDZ degradation or data loss.

Benefits of ZFS

Why am I moving to ZFS?
ZFS has some features that make it very appealing like storage pools, snapshots, native encryption and compression, software raid (raidz), native caching, and more.
My primary reasons for choosing it (and for moving away from Btrfs) are its integrity guarantees, transparent native compression, and cache support.

RAIDZ

RAID (Redundant Array of Inexpensive Disks) is a method of pooling several disks together into a single storage unit, trading some of their capacity for redundancy and fault tolerance. ZFS has a built-in filesystem RAID known as RAIDZ with several non-standard RAID levels. In addition to supporting standard RAID0, RAID1, and RAID10 setups with combinations of striped and mirrored vdevs, ZFS offers RAIDZ, RAIDZ2, and RAIDZ3 as its improved takes on RAID5/6.

ZFS RAIDZ   RAID equivalent   Parity stripes   Tolerable disk failures
RAIDZ(1)    RAID5             1                1
RAIDZ2      RAID6             2                2
RAIDZ3      -                 3                3

I recommend playing with the RAIDZ calculator here to get an idea of what the different RAIDZ levels do to the available storage in the pool and in exchange for disk failure tolerance.

In addition to tolerating whole-disk failures up to the specified RAIDZ level, ZFS is also capable of silently and transparently detecting and repairing bit-rot and filesystem corruption that may occur during the lifetime of the array. This means that our ZFS pool will never silently serve corrupted data: ZFS will detect that a file is corrupted, correct the corruption using the redundant parity data, and serve us the corrected data while writing it back to the corrupted location.

This self-healing capability requires redundancy, such as RAIDZ (or mirrored vdevs). To take full advantage of it, we’ll also want to run zpool scrub on a regular basis - daily or weekly if possible - to let ZFS verify checksums across the entire pool and fix parity errors. And to make it almost completely bulletproof, we’ll also want ECC RAM to prevent memory corruption that could affect our filesystem integrity.

Let me stop to answer the question you should be asking:

Do I need ECC (or RAIDZ, or even ZFS)?

No, not really.

The statistical likelihood of a bitflip in memory corrupting a file being written is tiny.

And the odds of a hard drive failing catastrophically decrease with every new drive released as manufacturers like Seagate and WD get better and better at building them. The drives that I have installed are warrantied for 5 years! But as drive capacity increases (I’ve installed 10TB units), so does the impact of a single drive failure. I want to protect my data from at least 1 drive failure, and preferably more.

But I don’t even really need ZFS for that. Btrfs has decent software RAID1 and RAID10 (but don’t use its RAID5/6!), and Linux’s MD softraid could do it too. Or a hardware RAID card, and so on.

But, all together, ECC and ZFS make it very, very unlikely that I will lose data on this system. (Lose some data, that is - it’s still vulnerable to a total system failure event; see the 3-2-1 backup strategy.)

Compression

ZFS supports native compression, meaning that data is compressed on disk but appears uncompressed when accessed. ZFS handles the compression transparently - you never need to unzip or tar -xvf anything - and it applies to the entire filesystem with practically no performance penalty.
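For reference, compression is a single per-pool (or per-dataset) property. This is a sketch assuming the tank pool created later in this post; lz4 is a commonly recommended default:

```shell
# Enable lz4 compression pool-wide; child datasets inherit the property.
sudo zfs set compression=lz4 tank

# Verify the setting and watch the achieved compression ratio over time:
zfs get compression,compressratio tank
```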

ZFS also supports deduplication, where blocks or byte patterns that are already present on the disk can be reused instead of written again across the entire filesystem, but this has performance and resource usage implications:

  • It eats RAM like a monster - the deduplication table needs about 5GB per 1TB of data stored.
  • But because the dedupe table can’t be more than 1/4 of the ARC (which is also in-memory), we need 4 times that much memory to be able to hold the entire ARC and dedupe table in RAM. Make that 20GB/TB of storage.
  • And there are write delays - data coming in has to be deduped before it’s written to disk.

Deduping can save as much as 50% capacity - writing 10TB of data to 5TB of disk - but I’m going to leave it turned off.
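The RAM rules of thumb above work out like this (the 10TB pool size is just an example):

```shell
# Back-of-the-envelope dedup memory estimate, using the rules of thumb above.
pool_tb=10                   # example pool capacity in TB
ddt_gb=$((pool_tb * 5))      # ~5 GB of dedup table per TB of data
ram_gb=$((ddt_gb * 4))       # the table may only occupy 1/4 of the ARC
echo "${ddt_gb} GB dedup table, ~${ram_gb} GB RAM to hold it"
# → 50 GB dedup table, ~200 GB RAM to hold it
```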

Installing ZFS on Linux

Installing ZoL on Fedora is the easiest part of getting ZFS up:

$ sudo dnf install http://download.zfsonlinux.org/fedora/zfs-release$(rpm -E %dist).noarch.rpm
$ sudo dnf install zfs kernel-devel

It may be necessary to reboot if the kernel-devel version does not match the running kernel. This page on the Fedora wiki should be referenced for any issues.

With ZFS installed, we can create our zpool. I’m going to create a 6 disk RAIDZ2 pool:

$ sudo zpool create tank raidz2 sda sdb sdc sde sdd sdf
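One caveat: bare names like sda can be reassigned between boots, so /dev/disk/by-id paths are often preferred when creating a pool. The ids below are hypothetical placeholders - substitute your own:

```shell
# List the stable ids for your disks:
ls -l /dev/disk/by-id/

# Create the pool using stable ids (hypothetical serials shown):
sudo zpool create tank raidz2 \
  ata-EXAMPLE_SN1 ata-EXAMPLE_SN2 ata-EXAMPLE_SN3 \
  ata-EXAMPLE_SN4 ata-EXAMPLE_SN5 ata-EXAMPLE_SN6
```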

And I’ll create a dataset and check the mounts (note that /tank is already mounted and ready to use!):

$ sudo zfs create tank/media
$ sudo zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank        0.0T   X.XT     0K  /tank
tank/media  0.0T   X.XT     0T  /tank/media

This ZFS documentation from Aaron Toponce is fantastic for building a quick understanding of vdevs, RAIDZ, zpools, and zvols.

Automating daily scrubs

To take full advantage of ZFS’s self-healing capabilities, we should run regular scrubs. We can automate this with systemd.

I created a systemd unit called zpool-scrub@.service:

$ sudo vi /etc/systemd/system/zpool-scrub@.service
[Unit]
Description=zpool scrub on %i

[Service]
Nice=19
IOSchedulingClass=idle
KillSignal=SIGINT
ExecStart=/usr/sbin/zpool scrub %i

(the @ is the systemd “service template” syntax)

To periodically execute the scrub service, I also created zpool-scrub@.timer:

$ sudo vi /etc/systemd/system/zpool-scrub@.timer
[Unit]
Description=Daily zpool scrub on %i

[Timer]
# adjust this interval to suit your needs
OnCalendar=daily
AccuracySec=1h
Persistent=true

[Install]
WantedBy=timers.target

Finally, to wire everything up and schedule daily scrubs of my /tank, I enabled the timer:

$ sudo systemctl enable --now zpool-scrub@tank.timer

Now, the tank will automatically be scrubbed every day at midnight!
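To sanity-check the schedule and, later, the scrub results, something like this should work (the timer name assumes a template named zpool-scrub@, as above):

```shell
# Confirm the timer is loaded and see when it will next fire:
systemctl list-timers 'zpool-scrub@tank.timer'

# After a scrub has run, the pool status reports the outcome:
sudo zpool status tank
```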

ZED

I don’t want to have to log in to my server every morning to check the status of last night’s zpool scrub. Wouldn’t it be nice if I could just be notified of the results?

ZED is the ZFS Event Daemon. It watches for ZFS events (like a scrub completing) and reacts to those events (by, for example, sending an email). It comes with ZFS and can be enabled like:

$ sudo systemctl enable --now zfs-zed.service

However, the default configuration probably doesn’t know how to send emails correctly. I’m going to set ZED up to post mails to a local mail relay (postfix, via mailx), so I need to adjust the ZED config to send to my email, using mail:

$ sudo vi /etc/zfs/zed.d/zed.rc
...
ZED_EMAIL_ADDR="[email protected]"
ZED_EMAIL_PROG="mail"
...
$ sudo systemctl restart zfs-zed.service
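One more knob worth knowing about: by default ZED may skip notifications for uneventful successes. To get an email after every scrub, even a clean one, zed.rc has a verbosity flag (check the comments in your zed.rc - this reflects my reading of the option):

```shell
$ sudo vi /etc/zfs/zed.d/zed.rc
...
ZED_NOTIFY_VERBOSE=1
...
$ sudo systemctl restart zfs-zed.service
```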

Postfix, Mailx and Gmail

I tested several combinations of mail relays, mail servers, and CLI wrappers, and Postfix+Mailx is the combination that works best with ZED. Other sendmail-compatible programs may work, but I couldn’t get them to play together.

Install Postfix and Mailx:

$ sudo dnf install -y postfix mailx
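Once Postfix is configured (see the link below), a quick end-to-end test with mailx is worthwhile before trusting ZED to it - the address here is a placeholder:

```shell
# Send a one-line test message through the local relay.
echo "zfs mail test" | mail -s "test from $(hostname)" you@example.com

# If nothing arrives, the Postfix log usually says why:
sudo tail -n 20 /var/log/maillog
```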

Setting up Postfix, Mailx and ZED is continued in “Configuring Postfix with Gmail”.


Read the other articles in this series here: