Thu Mar 25 22:59:39 2021 UTC (6 months, 3 weeks ago) by gdt

# ZFS on NetBSD

This page attempts to do two things: provide enough orientation and
pointers to standard ZFS documentation for NetBSD users who are new to
ZFS, and to describe NetBSD-specific ZFS information.  It is
emphatically not a tutorial or an introduction to ZFS.

Many things are marked with \todo because they need a better
explanation, and some have question marks.

This HOWTO describes the most recent state of each branch, and does
not attempt to describe formal releases.  Take this as a hint: if you
are using ZFS on NetBSD 9, you should update along the netbsd-9 branch.

# Status of ZFS in NetBSD

## NetBSD 8

NetBSD 8 has an old version of ZFS, and it is not recommended for use
at all.  There is no evidence that anyone is interested in helping
with ZFS on 8.  Those wishing to use ZFS on NetBSD 8 should therefore
update to NetBSD 9.

## NetBSD 9

NetBSD 9 has ZFS that is considered to work well.  There have been
fixes since 9.0_RELEASE.  As always, people running NetBSD 9 are
likely best served by the most recent version of the netbsd-9 stable
branch.  As of 2021-03, ZFS in the NetBSD 9.1 release is very close to
netbsd-9; the main difference is that the fix for the mkdir crash over
NFS is in netbsd-9 but not in 9.1.

There was a crash with mkdir over NFS with maproot, resolved in March
2021 in 9 and current.  See http://gnats.netbsd.org/55042

There is a workaround where removing a file will commit the ZIL
(normally this would not be done), to avoid crashes due to vnode
reclaims.  \todo Link to PR.

There has been a report of an occasional panic somewhere in
zfs_putpages.

## NetBSD-current

NetBSD-current (as of 2021-03) has similar ZFS code to 9.

There is initial support for [[ZFS root|wiki/RootOnZFS]], via booting
from ffs and pivoting.

## NetBSD/xen special issues

Summary: if you are using NetBSD, xen and zfs, use NetBSD-current.

In NetBSD-9, MAXPHYS is 64KB in most places, but because of xbd(4) it
is set to 32KB for XEN kernels.  Thus the standard zfs kernel modules
do not work under xen.  In NetBSD-current, xbd(4) supports 64 KB
MAXPHYS and this is no longer an issue.  Xen and zfs on current are
reported to work well together, as of 2021-02.

## Architectures

Most people seem to be using amd64.

To build zfs, one puts MKZFS=yes in mk.conf.  This is the default on
amd64 and aarch64 on netbsd-9.  In current, it is also the default on
sparc64.
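
As a rough sketch of the mk.conf step, the following helper appends
MKZFS=yes only when no MKZFS line is present yet.  The function name
enable_mkzfs is made up for this illustration, and /etc/mk.conf is
assumed as the default location; pass another path to try it on a
scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append MKZFS=yes to a mk.conf file unless an
# MKZFS line already exists.  The default path is an assumption.
enable_mkzfs() {
    conf=${1:-/etc/mk.conf}
    if grep -q '^MKZFS=' "$conf" 2>/dev/null; then
        echo "MKZFS already set in $conf"
    else
        echo 'MKZFS=yes' >> "$conf"
    fi
}

# Example (not run here):
#   enable_mkzfs /etc/mk.conf
```

The setting only matters on architectures where it is not already the
default.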

More or less, zfs can be enabled on an architecture when it is known
to build and run reliably.  (Of course, users are welcome to build it
and report.)

# Quick Start

See the [FreeBSD Quickstart
Guide](https://www.freebsd.org/doc/handbook/zfs-quickstart.html); only
the first item is NetBSD specific.

  - Put zfs=YES in rc.conf.

  - Create a pool as "zpool create pool1 /dev/dk0".

  - df and see /pool1

  - Create a filesystem mounted on /n0 as "zfs create -o
    mountpoint=/n0 pool1/n0".

  - Read the documentation referenced in the next section.
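
The quick-start steps above can be sketched as one shell script.  This
is only a sketch: pool1, /dev/dk0 and /n0 are the example values from
the list, while the run helper and the DRY_RUN variable are made up
for this illustration.  By default the script only prints what it
would do, because "zpool create" takes over the named device.

```shell
#!/bin/sh
# Sketch of the quick-start steps.  With DRY_RUN=1 (the default here)
# commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

quickstart() {
    # zfs=YES must already be in rc.conf; that edit is left to the admin.
    run zpool create pool1 /dev/dk0
    run df /pool1
    run zfs create -o mountpoint=/n0 pool1/n0
}

quickstart
```

Running it with DRY_RUN=0 as root would execute the commands for real.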

## Documentation Pointers

See the man pages for zfs(8), zpool(8).  Also see zdb(8), if only for
seeing pool config info when run with no arguments.

  - [OpenZFS Documentation](https://openzfs.github.io/openzfs-docs/)
  - [OpenZFS admin docs index page](https://github.com/openzfs/zfs/wiki/Admin-Documentation)
  - [FreeBSD Handbook ZFS Chapter](https://www.freebsd.org/doc/handbook/zfs.html)
  - [Oracle ZFS Administration Manual](https://docs.oracle.com/cd/E26505_01/html/E37384/index.html)
  - [Wikipedia](https://en.wikipedia.org/wiki/ZFS)

# NetBSD-specific information

## rc.conf

The main configuration is to put zfs=YES in rc.conf, so that the rc.d
scripts bring up ZFS and mount ZFS file systems.

## pool locations

One can add disks or parts of disks into pools.  The areas to be
included can be specified as:

  - entire disks (e.g., /dev/wd0d on amd64, or /dev/wd0 which has the same major/minor)
  - disklabel partitions (e.g., /dev/sd0e)
  - wedges (e.g., /dev/dk0)

Information about created or imported pools is stored in
/etc/zfs/zpool.cache.

Conventional wisdom is that a pool that is more than 80% used gets
unhappy; so far there is no NetBSD-specific wisdom to confirm or
refute that.
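
To keep an eye on the 80% figure, one could filter the output of
"zpool list -H -o name,capacity", which prints one tab-separated line
per pool.  The following is a minimal sketch; the function name
check_capacity is made up here, and the limit is just the
conventional-wisdom number, not a NetBSD-specific threshold.

```shell
#!/bin/sh
# Read "name<TAB>capacity%" lines on stdin and report pools at or
# above a limit (default 80).
check_capacity() {
    limit=${1:-80}
    while read -r name cap; do
        pct=${cap%\%}                  # strip the trailing percent sign
        case $pct in
            ''|*[!0-9]*) continue ;;   # skip "-" or unparsable values
        esac
        if [ "$pct" -ge "$limit" ]; then
            echo "$name is ${pct}% full (limit ${limit}%)"
        fi
    done
}

# Typical use on a live system (not run here):
#   zpool list -H -o name,capacity | check_capacity 80
```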

## pool native blocksize mismatch

ZFS attempts to find out the native blocksize for a disk when using it
in a pool; this is almost always 512 or 4096.  Somewhere between 9.0
and 9.1, at least some disks on some controllers that used to report
512 now report 4096.  This provokes a blocksize mismatch warning.

Given that the native blocksize of the disk didn't change, and things
seemed OK using the 512 emulated blocks, the warning is likely not
critical.  However, rebuilding the pool with the 4096 blocksize is
likely to result in better behavior, because ZFS will then only try
to do 4096-byte writes.  \todo Verify this and find the actual change
and explain better.

## pool importing problems

While one can "zpool create pool0 /dev/wd0f" and have a working pool,
this pool cannot be exported and imported straightforwardly.  "zpool
export" works fine, and deletes zpool.cache.  "zpool import", however,
only looks at entire disks (e.g. /dev/wd0), and might look at wedges
(e.g. /dev/dk0).  It does not look at partitions like /dev/wd0f, and
there is no way on the command line to ask that specific devices be
examined.  Thus, export/import fails for pools with disklabel
partitions.

One can make wd0 be a link to wd0f temporarily, and the pool will then
be importable.  However, "wd0" is stored in zpool.cache, and on the
next boot the system will attempt to use that path.  This is obviously
not a good approach.

One can mkdir e.g. /etc/zfs/pool0 and in it have a symlink to
/dev/wd0f.  Then, zpool import -d /etc/zfs/pool0 will scan
/etc/zfs/pool0/wd0f and succeed.  The resulting zpool.cache will have
that path, but having symlinks in /etc/zfs/POOLNAME seems acceptable.
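
The symlink-directory workaround can be sketched as follows.  The
names pool0 and /dev/wd0f are the examples from the text; the helper
functions and DRY_RUN are made up for illustration, and by default
the commands are only printed.  The pool name is passed to "zpool
import" explicitly, since without one it only lists the pools it
finds.

```shell
#!/bin/sh
# Sketch: import a pool that lives on a disklabel partition by scanning
# a directory of symlinks.  DRY_RUN=1 (default) prints commands only.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

import_via_symlink_dir() {
    pool=$1                 # e.g. pool0
    dev=$2                  # e.g. /dev/wd0f
    dir=/etc/zfs/$pool
    run mkdir -p "$dir"
    run ln -s "$dev" "$dir/${dev##*/}"
    run zpool import -d "$dir" "$pool"
}

import_via_symlink_dir pool0 /dev/wd0f
```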

\todo Determine a good fix, perhaps man page changes only, fix it
upstream, in current, and in 9, before removing this discussion.

## mountpoint conventions

By default, datasets are mounted as /poolname/datasetname.  One can
also set a mountpoint; see zfs(8).

There does not appear to be any strong reason to prefer explicit
mountpoints over the default or the reverse; with the default, one can
either use the data in place or symlink to it.

## mount order

NetBSD 9 mounts other file systems and then ZFS file systems.  This can
be a problem if /usr/pkgsrc is on ZFS and /usr/pkgsrc/distfiles is on
NFS.  A workaround is to use noauto and do the mounts in
/etc/rc.local.

NetBSD current after 20200301 mounts ZFS first.  The same issues and
workarounds apply in different circumstances.

## NFS

zfs filesystems can be exported via NFS, simply by placing them in
/etc/exports like any other filesystem.

The "zfs share" command adds a line to /etc/zfs/exports for each
filesystem with the sharenfs property set, and "zfs unshare" removes
it.  This file is ignored on NetBSD 9 and on current before 20210216;
on current after 20210216 those filesystems should be exported
(assuming NFS is enabled).  It does not appear to be possible to set
options like maproot and network restrictions via this method.

On current before 20210216, a remote mkdir on a filesystem exported
with -maproot=0:10 causes a kernel NULL pointer dereference.  This is
now fixed.

## zvol

Within a ZFS pool, the standard approach is to have file systems, but
one can also create a zvol, which is a block device of a certain size.

As an example, "zfs create -V 16G tank0/xen-netbsd-9-amd64" creates a
zvol (intended to be a virtual disk for a domU).

The zvol in the example will appear as
/dev/zvol/rdsk/tank0/xen-netbsd-9-amd64 and
/dev/zvol/dsk/tank0/xen-netbsd-9-amd64 and can be used like a
disklabel partition or wedge.  However, the system will not read
disklabels and gpt labels from a zvol.
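
As a small illustration of the naming scheme just described, this
sketch prints the raw and block device nodes for a zvol.  The function
name zvol_paths is made up; the dataset name is the one from the
example.

```shell
#!/bin/sh
# Print the raw and block device nodes for a zvol, following the
# /dev/zvol/rdsk/DATASET and /dev/zvol/dsk/DATASET layout.
zvol_paths() {
    dataset=$1
    echo "/dev/zvol/rdsk/$dataset"
    echo "/dev/zvol/dsk/$dataset"
}

# Creating the zvol itself (not run here):
#   zfs create -V 16G tank0/xen-netbsd-9-amd64
zvol_paths tank0/xen-netbsd-9-amd64
```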

Doing "swapctl -a" on a zvol device node fails.  \todo Is it really
true that NetBSD can't swap on a zvol?  (When using a zvol for swap,
standard advice is to avoid the "-s" option which avoids reserving the
allocated space.  Standard advice is also to consider using a
dedicated pool.)

\todo Explain that one can export a zvol via iscsi.

One can use ccd to create a normal-looking disk from a zvol.  This
allows reading a GPT label from the zvol, which is useful in case the
zvol had been exported via iscsi and some other system created a
label.

# Memory usage

Basically, ZFS uses lots of memory and most people run it on systems
with large amounts of memory.  NetBSD works well on systems with
comparatively small amounts of memory.  So a natural question is how
well ZFS works on one's VAX with 2M of RAM :-) More seriously, one
might ask if it is reasonable to run ZFS on a RPI3 with 1G of RAM, or
if it is reasonable on a system with 4G.

The prevailing wisdom is more or less that ZFS consumes 1G plus 1G per
1T of disk.  32-bit architectures are viewed as too small to run ZFS.
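
The rule of thumb is simple arithmetic; as a worked sketch (function
names made up for this page, whole-number inputs assumed), a 4G
machine with 2T of disk needs 1 + 2 = 3G by this rule and so passes,
while a 1G machine with 1T of disk does not.

```shell
#!/bin/sh
# Rule of thumb from the text: 1G of RAM plus 1G per 1T of disk.
# Arguments are whole numbers; the result is in gigabytes.
zfs_ram_rule_gb() {
    disk_tb=$1
    echo $((disk_tb + 1))
}

# Compare installed RAM (in G) with the rule-of-thumb figure.
ram_enough() {
    installed_gb=$1
    disk_tb=$2
    need=$(zfs_ram_rule_gb "$disk_tb")
    if [ "$installed_gb" -ge "$need" ]; then
        echo "ok: ${installed_gb}G >= ${need}G"
    else
        echo "short: ${installed_gb}G < ${need}G"
    fi
}

ram_enough 4 2    # a 4G machine with 2T of disk
```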

Besides RAM, zfs requires that the architecture's kernel stack be at
least 12KB; some operations overflow an 8KB kernel stack.  On NetBSD,
the architectures with a 16KB kernel stack are amd64, sparc64,
powerpc, and the experimental ia64 and hppa.  mac68k and sh3 have a
12KB kernel stack.  All others use only an 8KB stack, which is not
enough to run zfs.

NetBSD has many statistics provided via sysctl; see "sysctl
kstat.zfs".
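
To pull a single number out of that output, a filter along these lines
may help.  It assumes the usual "name = value" format printed by
sysctl; the function name kstat_value is made up, and the statistic
name in the usage comment is an assumption that should be checked
against what "sysctl kstat.zfs" actually prints on the system at hand.

```shell
#!/bin/sh
# Print the value of one statistic from "name = value" lines on stdin.
kstat_value() {
    key=$1
    awk -v key="$key" '$1 == key && $2 == "=" { print $3 }'
}

# Typical use (not run here; the statistic name is an assumption):
#   sysctl kstat.zfs | kstat_value kstat.zfs.misc.arcstats.size
```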

FreeBSD has tunables that NetBSD does not seem to have, described in
[FreeBSD Handbook ZFS Advanced
section](https://docs.freebsd.org/en/books/handbook/zfs/#zfs-advanced).

# Interoperability with other systems

Modern ZFS uses pool version 5000 and feature flags.

It is in general possible to export a pool and then import the pool on
some other system, as long as the other system supports all the used
features.
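
Before moving a pool, it can help to see which feature flags it uses.
The following sketch filters the table printed by "zpool get all POOL"
(columns NAME, PROPERTY, VALUE, SOURCE) down to the features that are
enabled or active; active features have already changed the on-disk
format, and enabled ones may do so later.  The function name
list_features is made up, and this says nothing about which
combinations of systems are known to work.

```shell
#!/bin/sh
# Read "zpool get all POOL" output on stdin and print each feature
# that is enabled or active, without the "feature@" prefix.
list_features() {
    awk '$2 ~ /^feature@/ && ($3 == "active" || $3 == "enabled") {
        sub(/^feature@/, "", $2)
        print $2, $3
    }'
}

# Typical use (not run here):
#   zpool get all pool1 | list_features
#   zpool export pool1        # on the old system
#   zpool import pool1        # on the new system
```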

\todo Explain how to do this and what is known to work.

\todo Explain feature flags relationship to FreeBSD, Linux, illumos,
macOS.

# Sources of ZFS code

Currently, there are multiple ZFS projects and codebases:

  - [OpenZFS](http://www.open-zfs.org/wiki/Main_Page)
  - [openzfs repository](https://github.com/openzfs/zfs)
  - [zfsonlinux](https://zfsonlinux.org/)
  - [OpenZFS on OS X](https://openzfsonosx.org/) [repo](https://github.com/openzfsonosx)
  - proprietary ZFS in Solaris (not relevant in open source)
  - ZFS as released under the CDDL (common ancestor, now of historical interest)

OpenZFS is a coordinating project to align open ZFS codebases.  There
is a notion of a shared core codebase and OS-specific adaptation code.

  - [zfsonlinux relationship to OpenZFS](https://github.com/openzfs/zfs/wiki/OpenZFS-Patches)
  - FreeBSD more or less imports code from openzfs and pushes back fixes. \todo Verify this.
  - NetBSD has imported code from FreeBSD.
  - The status of ZFS on macOS is unclear (2021-02).
