ZFS on NetBSD

This page attempts to do two things: to provide enough orientation and pointers to standard ZFS documentation for NetBSD users who are new to ZFS, and to describe NetBSD-specific ZFS information. It is emphatically not a tutorial or an introduction to ZFS.

Many things are marked with \todo because they need a better explanation, and some have question marks where the facts are uncertain.

This HOWTO describes the most recent state of the stable branches, and does not attempt to describe formal releases. This is a hint: if you are using NetBSD-9 and ZFS, you should update along the netbsd-9 branch. References to "NetBSD-10" are a further hint that some fixes and improvements exist only in newer branches.

This HOWTO errs on the side of saying things if they seem 90-95% true. Feel free to complain if you think something is wrong!

Status of ZFS in NetBSD

NetBSD-8

NetBSD-8 has an old version of ZFS, and it is not recommended for use at all. There is no evidence that anyone is interested in helping with ZFS on 8, so those wishing to use ZFS on NetBSD 8 should update to NetBSD-9. This page contains no further status or hints about 8.

NetBSD-9

NetBSD-9 has ZFS that is considered to work well. There have been fixes since 9.0_RELEASE. As always, people running NetBSD-9 are likely best served by the most recent version of the netbsd-9 stable branch. This page assumes anyone using 9 is using 9.3 or netbsd-9 after 9.3; issues fixed in earlier versions have been removed.

Extended attributes are not supported. \todo Link to PR.

There is a workaround where removing a file commits the ZIL (normally this would not be done), to avoid crashes due to vnode reclaims. \todo Link to PR.

There has been a report of an occasional panic somewhere in zfs_putpages.

NetBSD-10

NetBSD-10 (as of 2023-07) very likely has similar ZFS code to 9.

There is initial support for ZFS root, via booting from ffs and pivoting.

NetBSD-current

NetBSD-current (as of 2023-07) has similar ZFS code to 10.

NetBSD/xen special issues

Summary: if you are using NetBSD, xen and zfs, use NetBSD-10 or newer.

In NetBSD-9, MAXPHYS is 64KB in most places, but because of xbd(4) it is set to 32KB for XEN kernels. Thus the standard zfs kernel modules do not work under xen. In NetBSD-10 and newer, xbd(4) supports 64 KB MAXPHYS and this is no longer an issue. Xen and zfs on 10/current are reported to work well together, as of 2021-02.

Architectures

Most people seem to be using amd64.

To build zfs, one puts MKZFS=yes in mk.conf. This is the default on amd64 and aarch64 on netbsd-9; in 10 and current, it is also the default on sparc64.
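
For example, a minimal sketch for an architecture where it is not the default:

    # /etc/mk.conf
    MKZFS=yes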

More or less, it is acceptable to commit enabling zfs on an architecture when it is known to build and run reliably. (Of course, anyone is welcome to build it and use it, and reports of success or unexpected failure are welcome.)

Known Problems

Excessive ARC usage

The upstream code sets the maximum ARC size (and hence the initial target) to all of physical memory except 1 GB. This results in consuming excessive amounts of memory on systems with less than about 8 GB, and systems with 4 GB or less have been observed to lock up. It seems likely that even high-memory systems would have trouble if enough data were paged in. Patches to change this behavior have been sent to netbsd-users@.

See the section "Memory Usage", below.

Difficulty freeing metadata

FreeBSD has a function arc_dnlc_evicts_thread. It frees objects outside ZFS that refer to metadata entries in the ARC; those references keep the entries from being evictable. A work item is to understand this and adapt it to NetBSD.

Draining under memory pressure

There should be a mechanism to shrink the ARC under memory pressure. FreeBSD hooks in a drain procedure to do this. A work item is to understand this and adapt it.

Quick Start

See the FreeBSD Quickstart Guide; only the first item is FreeBSD-specific.

Documentation Pointers

See the man pages for zfs(8) and zpool(8). Also see zdb(8), if only for seeing pool configuration info when run with no arguments.

NetBSD-specific information

rc.conf

The main configuration is to put zfs=YES in rc.conf, so that the rc.d scripts bring up ZFS and mount ZFS file systems.
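
For example:

    # /etc/rc.conf
    zfs=YES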

pool locations

One can add whole disks or parts of disks into pools. Methods of specifying areas to be included include:

- an entire disk (e.g. /dev/wd0)
- a disklabel partition (e.g. /dev/wd0f)
- a wedge (e.g. /dev/dk0)
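
A minimal sketch of creating a pool on a wedge (the pool name tank0 and device dk0 are hypothetical):

    zpool create tank0 /dev/dk0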

Information about created or imported pools is stored in /etc/zfs/zpool.cache.

Conventional wisdom is that a pool that is more than 80% full gets unhappy; so far there is no NetBSD-specific evidence to confirm or refute that.

pool native blocksize mismatch

ZFS attempts to find out the native blocksize for a disk when using it in a pool; this is almost always 512 or 4096. Somewhere between 9.0 and 9.1, at least some disks on some controllers that used to report 512 now report 4096. This provokes a blocksize mismatch warning.

Given that the native blocksize of the disk didn't change, and that things seemed OK using the emulated 512-byte blocks, the warning is likely not critical. However, rebuilding the pool with the 4096-byte blocksize is likely to result in better behavior, because ZFS will then only issue 4096-byte writes. \todo Verify this and find the actual change and explain better.

pool importing problems with disklabel partitions

While one can "zpool create pool0 /dev/wd0f" and have a working pool, this pool cannot (after having been exported) be imported straightforwardly. "zpool export" works fine, and deletes zpool.cache. "zpool import", however, only looks at entire disks (e.g. /dev/wd0) and wedges (e.g. /dev/dk0). It does not look at disklabel partitions like /dev/wd0f.

One can mkdir e.g. /etc/zfs/pool0 and in it place a symlink to /dev/wd0f. Then "zpool import -d /etc/zfs/pool0" will scan /etc/zfs/pool0/wd0f and succeed. The resulting zpool.cache will contain that path, but having symlinks in /etc/zfs/POOLNAME seems acceptable.
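
A sketch of that workaround, using the pool0/wd0f example:

    mkdir /etc/zfs/pool0
    ln -s /dev/wd0f /etc/zfs/pool0/wd0f
    zpool import -d /etc/zfs/pool0 pool0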

\todo Link to a PR about the zpool import man page claiming that devices in /dev are searched, even though disklabel partitions are excluded.

mountpoint conventions

By default, datasets are mounted as /poolname/datasetname. One can also set a mountpoint; see zfs(8).
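
For example, to set an explicit mountpoint (tank0/pkgsrc is a hypothetical dataset name):

    zfs set mountpoint=/usr/pkgsrc tank0/pkgsrc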

There does not appear to be any reason to choose explicit mountpoints vs the default (and either using data in place or symlinking to it).

mount order

NetBSD-9 mounts other file systems and then ZFS file systems. This can be a problem if /usr/pkgsrc is on ZFS and /usr/pkgsrc/distfiles is on NFS. A workaround is to use noauto and do the mounts in /etc/rc.local.

NetBSD-current after 20200301, and thus NetBSD-10, mounts ZFS file systems first. The same kind of issue, and the same workaround, applies in the reverse circumstances.
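
A sketch of the noauto workaround for the NetBSD-9 case (the NFS server name and paths are hypothetical):

    # /etc/fstab: defer the NFS mount
    server:/export/distfiles /usr/pkgsrc/distfiles nfs rw,noauto 0 0

    # /etc/rc.local: mount it once ZFS file systems are up
    mount /usr/pkgsrc/distfiles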

NFS

ZFS file systems can be exported via NFS, simply by placing them in /etc/exports like any other file system.

The "zfs share" command adds a line for each filesystem with the sharenfs property set to /etc/zfs/exports, and "zfs unshare" removes it. This file is ignored on NetBSD-9 and current before 20210216; on current after 20210216 and thus also 10, those filesystems should be exported (assuming NFS is enabled). It does not appear to be possible to set options like maproot and network restrictions via this method.

zvol

Within a ZFS pool, the standard approach is to have file systems, but one can also create a zvol, which is a block device of a certain size.

As an example, "zfs create -V 16G tank0/xen-netbsd-9-amd64" creates a zvol (intended to be a virtual disk for a domU).

The zvol in the example will appear as /dev/zvol/rdsk/tank0/xen-netbsd-9-amd64 and /dev/zvol/dsk/tank0/xen-netbsd-9-amd64 and can be used like a disklabel partition or wedge. However, the system will not read disklabels and GPT labels from a zvol.
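
As a hedged sketch, the zvol above could be handed to a Xen domU in an xl configuration file (the vdev name xvda and the rest of the domU configuration are assumptions):

    # fragment of a domU configuration file
    disk = [ 'phy:/dev/zvol/dsk/tank0/xen-netbsd-9-amd64,xvda,w' ]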

Doing "swapctl -a" on a zvol device node fails. \todo Is it really true that NetBSD can't swap on a zvol? (When using a zvol for swap, standard advice is to avoid the "-s" option which avoids reserving the allocated space. Standard advice is also to consider using a dedicated pool.)

\todo Explain that one can export a zvol via iscsi.

One can use ccd to create a normal-looking disk from a zvol. This allows reading a GPT label from the zvol, which is useful in case the zvol had been exported via iscsi and some other system created a label.
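
A hedged sketch of the ccd approach, using the zvol from the example above (an interleave of 0 concatenates the single component):

    ccdconfig ccd0 0 none /dev/zvol/dsk/tank0/xen-netbsd-9-amd64
    dkctl ccd0 makewedges      # discover GPT partitions as wedges
    dkctl ccd0 listwedges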

Memory usage

Basically, ZFS uses lots of memory and most people run it on systems with large amounts of memory. This is not specifically about NetBSD; the same "ZFS is piggy" issues arise with other operating systems as well.

NetBSD works well on systems with comparatively small amounts of memory. So a natural question is how well ZFS works on one's VAX with 2 MB of RAM :-). More seriously, one might ask if it is reasonable to run ZFS on a RPI3 with 1G of RAM, or if it is reasonable on an amd64 system with 4G.

ZFS uses memory for various things, but a particularly significant use is ARC, or "Adaptive Replacement Cache". A limit on ARC is set in the kernel at boot. Enough memory can be consumed that the system deadlocks.

NetBSD does not have a mechanism to see or set this limit. FreeBSD has tunables, described in the FreeBSD Handbook's ZFS Advanced section.

ARC usage manifests as kernel pool(9) usage, e.g. as shown by vmstat -m; note that these are kernel memory pools, not ZFS pools. A system with 8 GB maxes out at around 4 GB of pool usage (not just from ZFS).

Anecdata is that systems with 32 GB never have problems and are entirely safe in production. Problems have not been reported on 16 GB systems and they are believed safe. 8 GB systems seem ok, but confidence is lower. 4 GB systems are known to lock up.

Several changes would improve behavior:

- expose the limit as a sysctl so that an admin can tune it
- set the limit based on RAM, to avoid using large fractions of lower-memory machines
- enable the system, when under memory pressure, to ask ARC to free memory

Besides RAM, zfs requires that the architecture's kernel stack size be at least 12 KB; some operations cause stack overflow with an 8 KB kernel stack. On NetBSD, the architectures with a 16 KB kernel stack are amd64, sparc64, powerpc, and the experimental ia64 and hppa. mac68k and sh3 have a 12 KB kernel stack. All others use only an 8 KB stack, which is not enough to run zfs.

\todo Try zfs on i386 and see what happens, and report to netbsd-users@.

NetBSD has many statistics provided via sysctl; see "sysctl kstat.zfs".
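
For example (these node names come from the upstream ARC kstats and are assumed to be present):

    sysctl kstat.zfs.misc.arcstats.size
    sysctl kstat.zfs.misc.arcstats.c_max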

Interoperability with other systems

Modern ZFS uses pool version 5000 and feature flags.

It is in general possible to export a pool and then import it on some other system, as long as the other system supports all the features in use.
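
A minimal sketch of the mechanics (tank0 is hypothetical, and the destination system must support all features in use):

    zpool export tank0
    # move the disks to the other system, then:
    zpool import tank0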

\todo Explain how to do this and what is known to work.

\todo Explain the relationship of feature flags to FreeBSD, Linux, illumos, and macOS.

Sources of ZFS code

Currently, there are multiple ZFS projects and codebases:

OpenZFS is a coordinating project to align open ZFS codebases. There is a notion of a shared core codebase and OS-specific adaptation code.