# ZFS on NetBSD

This page attempts to do two things: provide enough orientation and pointers to standard ZFS documentation for NetBSD users who are new to ZFS, and describe NetBSD-specific ZFS information. It is emphatically not a tutorial or an introduction to ZFS. Many things are marked with \todo because they need a better explanation, and some are marked with question marks.

This HOWTO describes the most recent state of branches, and does not attempt to describe formal releases. This is a hint: if you are using NetBSD-9 and ZFS, you should update along the branch. Writing "NetBSD-10" is a further hint. This HOWTO errs on the side of saying things if they seem 90-95% true. Feel free to complain if you think something is wrong!

As of 2024-11, multiple people report full-system lockups when using zfs and pushing it hard. There are no reports of systems that are pushed hard and stay up for months.

# Status of ZFS in NetBSD

NetBSD-8 and older are no longer supported, and hence not addressed.

## NetBSD-9

NetBSD-9 has ZFS that is considered to work well, except for the lockups. There have been fixes since 9.0_RELEASE; as always, people running NetBSD-9 are likely best served by the most recent version of the netbsd-9 stable branch. This page assumes anyone using 9 is using 9.3 or netbsd-9 after 9.3; issues fixed in earlier versions have been removed.

Extended attributes are not supported. \todo Link to PR.

As a workaround for crashes due to vnode reclaims, removing a file commits the ZIL (normally this would not be done). \todo Link to PR.

There has been a report of an occasional panic somewhere in zfs_putpages.

## NetBSD-10

NetBSD-10 (as of 2023-07) very likely has similar ZFS code to 9. There is initial support for [[ZFS root|Root_on_zfs]], via booting from ffs and pivoting.

## NetBSD-current

NetBSD-current (as of 2023-07) has similar ZFS code to 10.

## NetBSD/xen special issues

Summary: if you are using NetBSD, Xen and zfs, use NetBSD-10 or newer.

In NetBSD-9, MAXPHYS is 64 KB in most places, but because of xbd(4) it is set to 32 KB for XEN kernels. Thus the standard zfs kernel modules do not work under Xen. In NetBSD-10 and newer, xbd(4) supports 64 KB MAXPHYS and this is no longer an issue. Xen and zfs on 10 (and post-10 current) work well (except for the lockups, which also happen without Xen).

## Architectures

Most people seem to be using amd64.

To build zfs, one puts MKZFS=yes in mk.conf. This is the default on amd64 and aarch64 on netbsd-9; in 10 and current, it is also the default on sparc64. More or less, it is acceptable to commit enabling zfs on an architecture when it is known to build and run as reliably as it does on amd64. (Of course, anyone is welcome to build it and use it, and reports of success or unexpected failure are welcome.)

## Known Problems

See https://codeberg.org/gdt/netbsd-zfs/ for a patch to reduce ARC usage and add debug output, scripts to monitor memory usage, and scripts to stress zfs and memory.

### Lockups

zfs is prone to full-system lockups. They seem to be related to heavy zfs activity, especially writing data or deleting files, while at the same time there is memory pressure. The odds of a lockup seem to go up over time, suggesting a leak; probably the bug is in an error or exception path. When it has not locked up, zfs works well. The /etc/daily script seems to provoke lockups.

### Not enough sysctls

Many things should be sysctls and aren't.

### Excessive ARC usage

The upstream code sets the max ARC size (and hence the initial target) to all but 1 GB of RAM. This results in consuming excessive amounts of memory on systems with less than about 8 GB, and systems with 4 GB or less have been observed to lock up. It is far from clear that using all spare RAM for ARC is the real cause. See the section "Memory Usage", below. One way to watch ARC behavior is via the kstat sysctls, as sketched below.
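A minimal sketch for watching ARC consumption over time; the node names are assumptions taken from the FreeBSD-derived kstat naming, so check the output of `sysctl kstat.zfs` on your own system for the exact spelling:

```sh
# Print ARC size against its current target (c) and maximum (c_max)
# every 10 seconds.  Node names under kstat.zfs.misc.arcstats are
# assumed here; adjust after inspecting "sysctl kstat.zfs".
while sleep 10; do
	sysctl kstat.zfs.misc.arcstats.size \
	    kstat.zfs.misc.arcstats.c \
	    kstat.zfs.misc.arcstats.c_max
done
```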
### Difficulty freeing metadata, vnodes

FreeBSD has a function `arc_dnlc_evicts_thread`. This has something to do with freeing objects outside ZFS that refer to metadata entries in the ARC, which keeps those entries from being evictable. A work item is to understand this and adapt it to NetBSD.

In NetBSD, zfs vnodes have a 'dnode' associated with them. This memory is from a pool, and is counted as being in the ARC, but it is not actually a DVA/data tuple in the ARC. Reducing kern.maxvnodes seems to help keep this from causing problems.

### Prefetching

Prefetching is perhaps related to lockups.

### Allocating memory in routines called under memory pressure

The NetBSD glue code allocates memory when asked to free memory. This could be related to the lockup problem, but there is no clear evidence so far.

# Quick Start

See the [FreeBSD Quickstart Guide](https://www.freebsd.org/doc/handbook/zfs-quickstart.html); only the first item is NetBSD-specific.

- Put zfs=YES in rc.conf.
- Create a pool as "zpool create pool1 /dev/dk0".
- Run df and see /pool1.
- Create a filesystem mounted on /n0 as "zfs create -o mountpoint=/n0 pool1/n0".
- Read the documentation referenced in the next section.

## Documentation Pointers

See the man pages for zfs(8) and zpool(8). Also see zdb(8), if only for seeing pool config info when run with no arguments.

- [OpenZFS Documentation](https://openzfs.github.io/openzfs-docs/)
- [OpenZFS admin docs index page](https://github.com/openzfs/zfs/wiki/Admin-Documentation)
- [FreeBSD Handbook ZFS Chapter](https://www.freebsd.org/doc/handbook/zfs.html)
- [Oracle ZFS Administration Manual](https://docs.oracle.com/cd/E26505_01/html/E37384/index.html)
- [Wikipedia](https://en.wikipedia.org/wiki/ZFS)

# NetBSD-specific information

## rc.conf

The main configuration is to put zfs=YES in rc.conf, so that the rc.d scripts bring up ZFS and mount ZFS file systems.

## pool locations

One can add disks or parts of disks into pools. Areas to be included can be specified as:

- entire disks (e.g., /dev/wd0d on amd64, or /dev/wd0, which has the same major/minor)
- disklabel partitions (e.g., /dev/sd0e)
- wedges (e.g., /dev/dk0)

Information about created or imported pools is stored in /etc/zfs/zpool.cache.

Conventional wisdom is that a pool that is more than 80% used gets unhappy; so far there is no NetBSD-specific wisdom to confirm or refute that.

## pool native blocksize mismatch

ZFS attempts to find out the native blocksize for a disk when using it in a pool; this is almost always 512 or 4096. Somewhere between 9.0 and 9.1, at least some disks on some controllers changed from reporting 512 to reporting 4096. This provokes a blocksize mismatch warning. Given that the native blocksize of the disk didn't change, and things seemed OK with the emulated 512-byte blocks, the warning is likely not critical. However, rebuilding the pool with the 4096 blocksize is likely to result in better behavior, because ZFS will then only try to do 4096-byte writes. \todo Verify this and find the actual change and explain better.

## pool importing problems with disklabel partitions

While one can "zpool create pool0 /dev/wd0f" and have a working pool, this pool cannot (after having been exported) be imported straightforwardly. "zpool export" works fine, and deletes zpool.cache. "zpool import", however, only looks at entire disks (e.g., /dev/wd0) and wedges (e.g., /dev/dk0); it does not look at disklabel partitions like /dev/wd0f.

As a workaround (sketched below), one can mkdir e.g. `/etc/zfs/pool0` and in it place a symlink to `/dev/wd0f`. Then, `zpool import -d /etc/zfs/pool0` will scan `/etc/zfs/pool0/wd0f` and succeed. The resulting `zpool.cache` will have that path, but having symlinks in /etc/zfs/POOLNAME seems acceptable.

\todo Link to a PR that says that the `zpool import` man page claims that devices in `/dev` are searched, when disklabel partitions are excluded.
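A minimal sketch of the workaround, assuming a pool named pool0 that was created on the disklabel partition wd0f:

```sh
# Hypothetical pool name (pool0) and partition (wd0f); adjust to taste.
mkdir -p /etc/zfs/pool0
ln -s /dev/wd0f /etc/zfs/pool0/wd0f

# Scan only the directory of symlinks, then import the pool.
zpool import -d /etc/zfs/pool0 pool0
```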
"zpool export" works fine, and deletes zpool.cache. "zpool import", however, only looks at entire disks (e.g. /dev/wd0), and wedges (e.g. /dev/dk0). It does not look at partitions like /dev/wd0f. One an mkdir e.g. `/etc/zfs/pool0` and in it have a symlink to `/dev/wd0f`. Then, `zpool import -d /etc/zfs/pool0` will scan `/etc/zfs/pool0/wd0f` and succeed. The resulting `zpool.cache` will have that path, but having symlinks in /etc/zfs/POOLNAME seems acceptable. \todo Link to a PR that says that the `zpool import` man page claims that devices in `/dev` are searched, when disklabel partitions are excluded. ## mountpoint conventions By default, datasets are mounted as /poolname/datasetname. One can also set a mountpoint; see zfs(8). There does not appear to be any reason to choose explicit mountpoints vs the default (and either using data in place or symlinking to it). ## mount order NetBSD-9 mounts other file systems and then ZFS file systems. This can be a problem if /usr/pkgsrc is on ZFS and /usr/pkgsrc/distfiles is on NFS. A workaround is to use noauto and do the mounts in /etc/rc.local. NetBSD current after 20200301, and thus NetBSD-10 mounts ZFS first. The same issues and workarounds apply in different circumstances. ## NFS zfs filesystems can be exported via NFS, simply by placing them in /etc/exports like any other filesystem. The "zfs share" command adds a line for each filesystem with the sharenfs property set to /etc/zfs/exports, and "zfs unshare" removes it. This file is ignored on NetBSD-9 and current before 20210216; on current after 20210216 and thus also 10, those filesystems should be exported (assuming NFS is enabled). It does not appear to be possible to set options like maproot and network restrictions via this method. ## zvol Within a ZFS pool, the standard approach is to have file systems, but one can also create a zvol, which is a block device of a certain size. As an example, "zfs create -V 16G tank0/xen-netbsd-9-amd64" creates a zvol (intended to be a virtual disk for a domU). The zvol in the example will appear as /dev/zvol/rdsk/tank0/xen-netbsd-9-amd64 and /dev/zvol/dsk/tank0/xen-netbsd-9-amd64 and can be used like a disklabel partition or wedge. However, the system will not read disklabels and gpt labels from a zvol. Doing "swapctl -a" on a zvol device node fails. \todo Is it really true that NetBSD can't swap on a zvol? (When using a zvol for swap, standard advice is to avoid the "-s" option which avoids reserving the allocated space. Standard advice is also to consider using a dedicated pool.) \todo Explain that one can export a zvol via iscsi. One can use ccd to create a normal-looking disk from a zvol. This allows reading a GPT label from the zvol, which is useful in case the zvol had been exported via iscsi and some other system created a label. # Memory usage Basically, ZFS uses lots of memory and most people run it on systems with large amounts of memory. This is not specifically about NetBSD; the same "ZFS is piggy" issues arise with other operating systems as well. NetBSD works well on systems with comparatively small amounts of memory. So a natural question is how well ZFS works on one's VAX with 2 MB of RAM :-). More seriously, one might ask if it is reasonable to run ZFS on a RPI3 with 1G of RAM, or if it is reasonable on an amd64 system with 4G. ZFS uses memory for various things, but a particularly significant use is ARC, or "Adaptive Replacement Cache". A limit on ARC is set in the kernel at boot. 
# Memory usage

Basically, ZFS uses lots of memory, and most people run it on systems with large amounts of memory. This is not specifically about NetBSD; the same "ZFS is piggy" issues arise with other operating systems as well.

NetBSD works well on systems with comparatively small amounts of memory. So a natural question is how well ZFS works on one's VAX with 2 MB of RAM :-). More seriously, one might ask if it is reasonable to run ZFS on a RPI3 with 1 GB of RAM, or on an amd64 system with 4 GB.

ZFS uses memory for various things, but a particularly significant use is the ARC, or "Adaptive Replacement Cache". A limit on ARC size is set in the kernel at boot; even so, enough memory can be consumed that the system deadlocks. NetBSD does not have a mechanism to see or set this limit. FreeBSD has tunables, described in the [FreeBSD Handbook ZFS Advanced section](https://docs.freebsd.org/en/books/handbook/zfs/#zfs-advanced).

ARC usage manifests as kernel pool usage; a system with 8 GB maxes out around 4 GB of pool usage (not just from ZFS). Anecdata is that systems with 32 GB never have problems and are entirely safe in production. Problems have not been reported on 16 GB systems, and they are believed safe. 8 GB systems seem OK, but confidence is lower. 4 GB systems are known to lock up.

Several changes would improve behavior:

- expose the limit as a sysctl so that an admin can tune it
- set the limit based on RAM, to avoid using large fractions of the memory of lower-memory machines
- enable the system, when under memory pressure, to ask the ARC to free memory

Besides RAM, zfs requires that the architecture's kernel stack size is at least 12 KB; some operations cause stack overflow with an 8 KB kernel stack. On NetBSD, the architectures with 16 KB kernel stacks are amd64, sparc64, powerpc, and the experimental ia64 and hppa. mac68k and sh3 have 12 KB kernel stacks. All others use only 8 KB stacks, which is not enough to run zfs. \todo Try zfs on i386 and see what happens, and report to `netbsd-users@`.

NetBSD has many statistics provided via sysctl; see "sysctl kstat.zfs".

# Interoperability with other systems

Modern ZFS uses pool version 5000 and feature flags. It is in general possible to export a pool and then import the pool on some other system, as long as the other system supports all the features in use. \todo Explain how to do this and what is known to work. \todo Explain the feature flags relationship to FreeBSD, Linux, illumos, and macOS.

# Sources of ZFS code

Currently, there are multiple ZFS projects and codebases:

- [OpenZFS](http://www.open-zfs.org/wiki/Main_Page)
- [openzfs repository](https://github.com/openzfs/zfs)
- [zfsonlinux](https://zfsonlinux.org/)
- [OpenZFS on OS X](https://openzfsonosx.org/) [repo](https://github.com/openzfsonosx)
- proprietary ZFS in Solaris (not relevant in open source)
- ZFS as released under the CDDL (the common ancestor, now of historical interest)

OpenZFS is a coordinating project to align the open ZFS codebases. There is a notion of a shared core codebase and OS-specific adaptation code.

- [zfsonlinux relationship to OpenZFS](https://github.com/openzfs/zfs/wiki/OpenZFS-Patches)
- FreeBSD more or less imports code from openzfs and pushes back fixes. \todo Verify this.
- NetBSD has imported code from FreeBSD.
- The status of ZFS on macOS is unclear (2021-02).