This page is a blog mirror of sorts. It pulls in articles from blog's feed and publishes them here (with a feed, too).
So, can we do better? Looking at the size of a GENERIC kernel:
text data bss dec hex filename 2997389 67748 173044 3238181 316925 netbsdit seems we can not easily go below 4 MB (and for other reasons we would need to compile the bootloader differently for that anyway). But 16MB is still quite a difference, so it should work.
Now at the time I started this quest, I only had one VAX machine in real hardware - a VaxStation 4000 M96a, one of the fastest machines, and with 128 MB RAM well equipped. This is nice if you try to natively compile modern gcc, but I did not feel like fiddling with my hardware to create a better test environment for small RAM installations.
Like a year (or so) ago, when I fixed the VAX primary boot blocks (with lots of help from various vaxperts on the port-vax mailing list), SIMH, found in pkgsrc as emulators/simh, proved helpful. Testing various configurations I found an emulated VAX 11/780 with 8 MB to be the smallest I could get working.
The first step of the tuning was obvious: the CD image used a ramdisk based kernel, with the ramdisk containing all of the install system. At the same time, most of the CD was unused. We already use different schemes on i386, amd64 and sparc64 - so I cloned the sparc64 one and adjusted it to VAX. Now we use the GENERIC kernel on CD and mount the ISO9660 filesystem on the CD itself as root file system. The VAX boot loader already could deal with this, only a minor fix was needed for the kernel to recognize some variants of CD drives as boot device.
The resulting CD did boot, but did not go far in userland. The CD did only contain a (mostly) empty /dev directory (without /dev/console), which causes init(8) to mount a tmpfs on /dev and run the MAKEDEV script there. But to my surprise, on the 11/780 mfs was used instead of tmpfs - and we will see why soon. Next step in preparation of the userland for the installer is creating additional tmpfs instances to deal with the read-only nature of the CD used as root. This did not work at all, the mount attempts simply failed - and the installer was very unhappy, as it could not create files in /tmp for example.
I checked, but tmpfs was part of the VAX GENERIC kernel. I tried the install CD on a simulated MicroVAX 3900 with 64 MB of RAM - and to my surprise all of /dev and the three additional tmpfs instances created later worked (as well as the installation procedure). I checked the source (stupid me) and then found the documentation: tmpfs reserved a hard coded 4 MB of RAM for the system. With the GENERIC kernel booted on a 8 MB machine, we had slightly less than 4 MB RAM free, so tmpfs never worked.
One step back - this explained why /dev ended up as a mfs instead of tmpfs. The MAKEDEV code is written to deal with kernels that do include tmpfs, but also with those that do not: it tried tmpfs, and falls back to mfs if that does not work. This made me think, I could do the same (but without even trying tmpfs): I changed the install CD scripts to use mfs instead of tmpfs. The main difference is: mfs uses a userland process to manage the swappable memory. However, we do not have any swap space yet. Checking when sysinst enables swapping for the first time, I found: it never did on VAX. Duh! I added the missing calls to machine dependent code in sysinst, but of course the installer can only enable swap after partitioning is done (and a swap partition got created).
Testing showed: we did not get far enough with four mfs instances. So let us try with fewer. One we do not need is the /dev one: I changed the CD content creation code to pre-populate /dev on the CD. This is not possible with all filesystems, including the original ISO9660 one, but with the so-called Rockridge Extensions it works. We know that it is a modern NetBSD kernel mounting the CD - so support for those extensions is always present. I made some errors and hit some bugs (that got fixed) on the way there, but soon the CD booted without creating a mfs (nor tmpfs) for /dev.
Still, three mfs instances did not survive until sysinst enabled swapping. The userland part was killed once the kernel ran out of memory. I needed tmpfs working with less than 4 MB memory free. After a slight detour and some discussion on the tech-kern mailing list, I changed tmpfs to deal (and only reserve a dynamically scaled amount of memory calculated bv the UVM memory management). With this change, a current install CD just works, and installation completes successful.
The following is just the start of the installation process, the sysinst part afterwards is standard stuff and left out for brevity.
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014 The NetBSD Foundation, Inc. All rights reserved. Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. NetBSD 6.99.43 (GENERIC) #1: Thu Jun 5 22:01:14 CEST 2014 firstname.lastname@example.org:/usr/obj/vax/usr/src/sys/arch/vax/compile/GENERIC VAX 11/780 total memory = 8188 KB avail memory = 3884 KB mainbus0 (root) cpu0 at mainbus0: KA80, S/N 1234(0), hardware ECO level 7(112) cpu0: 4KB L1 cachen, no FPA sbi0 at mainbus0 mem0 at sbi0 tr1: standard mem1 at sbi0 tr2: standard uba1 at sbi0 tr3: DW780 dz1 at uba1 csr 160100 vec 304 ipl 15 mtc0 at uba1 csr 174500 vec 774 ipl 15 mscpbus0 at mtc0: version 5 model 5 mscpbus0: DMA burst size set to 4 uda0 at uba1 csr 172150 vec 770 ipl 15 mscpbus1 at uda0: version 3 model 6 mscpbus1: DMA burst size set to 4 de0 at uba1 csr 174510 vec 120 ipl 15: delua, hardware address 08:00:2b:cc:dd:ee mt0 at mscpbus0 drive 0: TU81 mt1 at mscpbus0 drive 1: TU81 mt2 at mscpbus0 drive 2: TU81 mt3 at mscpbus0 drive 3: TU81 ra0 at mscpbus1 drive 0: RA92 ra1 at mscpbus1 drive 1: RA92 racd0 at mscpbus1 drive 3: RRD40 ra0: size 2940951 sectors ra1: no disk label: size 2940951 sectors racd0: size 1331200 sectors boot device: racd0 root on racd0a dumps on racd0b root file system type: cd9660 init: kernel secur You are using a serial console, we do not know your terminal emulation. Please select one, typical values are: vt100 ansi xterm Terminal type (just hit ENTER for 'vt220'): xterm NetBSD/vax 6.99.43 This menu-driven tool is designed to help you install NetBSD to a hard disk, or upgrade an existing NetBSD system, with a minimum of work. In the following menus type the reference letter (a, b, c, ...) to select an item, or type CTRL+N/CTRL+P to select the next/previous item. The arrow keys and Page-up/Page-down may also work. Activate the current selection from the menu by typing the enter key. +---------------------------------------------+ |>a: Installation messages in English | | b: Installation auf Deutsch | | c: Mensajes de instalacion en castellano | | d: Messages d'installation en français | | e: Komunikaty instalacyjne w jezyku polskim | +---------------------------------------------+Overall this improved NetBSD to better deal with small memory systems. The VAX specific install changes can be brought over to other ports as well, but sometimes changes to the bootloader will be needed.
The NetBSD Project is pleased to announce NetBSD 6.1.4, the fourth security/bugfix update of the NetBSD 6.1 release branch, and NetBSD 6.0.5, the fifth security/bugfix update of the NetBSD 6.0 release branch. They represent a selected subset of fixes deemed important for security or stability reasons, and if you are running a prior release of either branch, we strongly suggest that you update to one of these releases.http://www.NetBSD.org/mirrors/.
Due to a strange series of events the code changes needed to support the (slightly unusual) MIPS CPU used in the playstation2 had never been merged into gcc nor binutils mainline. Only recently this has been fixed. Unfortunately the changes have not been pulled up to the gcc 4.8.3 branch (which is available in NetBSD-current), so an external toolchain from pkgsrc is needed for the playstation2.
To install this toolchain, use a pkgsrc-current checkout and cd to cross/gcc-mips-current, then do "make install" - that is all.
Work is in progress to bring the old code up to -current. Hopefully a bootable NetBSD-current kernel will be available soon.
Work is ongoing to bring this modern toolchain to all other ports too (most of them already work, but some more testing will be done). If you want to try it, just add -V HAVE_GCC=48 to the build.sh invocation.
Note that in parallel clang is available as an alternative option for a few architectures already (i386, amd64, arm, and sparc64), but needs more testing and debugging at least on some of them (e.g. the sparc64 kernel does not boot).
For a project with diverse hardware support like NetBSD, all toolchain updates are a big pain - so a big THANK YOU! to everyone involved; in no particular order Christos Zoulas, matthew green, Nick Hudson, Tohru Nishimura, Frank Wille (and myself).
The NetBSD Project is pleased to announce:
- NetBSD 6.1.3, the third security/bugfix update of the NetBSD 6.1 release branch,
- NetBSD 6.0.4, the fourth security/bugfix update of the NetBSD 6.0 release branch,
- NetBSD 5.2.2, the second security/bugfix update of the NetBSD 5.2 release branch,
- and NetBSD 5.1.4, the fourth security/bugfix update of the NetBSD 5.1 release branch
These releases represent a selected subset of fixes deemed important for security or stability reasons. Updating to one of these versions is recommended for users of all prior releases.
For more details, please see the NetBSD 6.1.3 release notes, the NetBSD 6.0.4 release notes, the NetBSD 5.2.2 release notes, or the NetBSD 5.1.4 release notes.
Complete source and binaries for NetBSD 6.1.3, NetBSD 6.0.4, NetBSD 5.2.2 and NetBSD 5.1.4 are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, SUP, and other services may be found at http://www.NetBSD.org/mirrors/.
A cyclic trend in operating systems is moving things in and out of the kernel for better performance. Currently, the pendulum is swinging in the direction of userspace being the locus of high performance. The anykernel architecture of NetBSD ensures that the same kernel drivers work in a monolithic kernel, userspace and beyond. One of those driver stacks is networking. In this article we assume that the NetBSD networking stack is run outside of the monolithic kernel in a rump kernel and survey the open source interface layer options.
There are two sub-aspects to networking. The first facet is supporting network protocols and suites such as IPv6, IPSec and MPLS. The second facet is delivering packets to and from the protocol stack, commonly referred to as the interface layer. While the first facet for rump kernels is unchanged from the networking stack running in a monolithic NetBSD kernel, there is support for a number of interfaces not available in kernel mode.
The Data Plane Development Kit is meant to be used for high-performance, multiprocessor-aware networking. DPDK offers network access by attaching to hardware and providing a hardware-independent API for sending and receiving packets. The most common runtime environment for DPDK is Linux userspace, where a UIO userspace driver framework kernel module is used to enable access to PCI hardware. The NIC drivers themselves are provided by DPDK and run in application processes.
For high performance, DPDK uses a run-to-completion scheduling model -- the same model is used by rump kernels. This scheduling model means that NIC devices are accessed in polled mode without any interrupts on the fast path. The only interrupts that are used by DPDK are for slow-path operations such as notifications of link status change.
Like DPDK, netmap offers user processes access to NIC hardware with a high-performance userspace packet processing intent. Unlike DPDK, netmap reuses NIC drivers from the host kernel and provides memory-mapped buffer rings for accessing the device packet queues. In other words, the device drivers still remain in the host kernel, but low-level and low-overhead access to hardware is made available to userspace processes. In addition to the memory-mapping of buffers, netmap uses other performance optimization methods such as batch processing and buffer reallocation, and can easily saturate a 10GigE with minimum-size frames. Another significant difference to DPDK is that netmap allows also for a blocking mode of operation.
Netmap is coupled with a high-performance software virtual switch called VALE. It can be used to interconnect networks between virtual machines and processes such as rump kernels. The netmap API is used also by VALE, so VALE switching can be used with the rump kernel driver for netmap.
A tap device injects packets written into a device node, e.g. /dev/tap, to a tap virtual network interface. Conversely, packets received by the virtual tap network can be read from the device node. The tap network interface can be bridged with other network interfaces to provide further network access. While indirect access to network hardware via the bridge is not maximally efficient, it is not hideously slow either: a rump kernel backed by a tap device can saturate a gigabit Ethernet. The advantage of the tap device is portability, as it is widely available on Unix-type systems. Tap interfaces also virtualize nicely, and most operating systems will allow unprivileged processes to use tap interface as long as the processes have the credentials to access the respective device nodes.
The tap device was the original method for accessing with a rump kernel. In fact, the in-kernel side of the rump kernel network driver was rather short-sightedly named virt back in 2008. The virt driver and the associated hypercalls are available in the NetBSD tree. Fun fact: the tap driver is also the method for packet shovelling when running the NetBSD TCP/IP stack in the Linux kernel; the rationale is provided in a comment here and also by running wc -l.
After a fashion, using Xen hypercalls is a variant of using the TAP device: a virtualized network resource is accessed using high-level hypercalls. However, instead of accessing the network backend from a device node, Xen hypercalls are used. The Xen driver is limited to the Xen environment and is available here.
NetBSD PCI NIC drivers
The previous examples we have discussed use a high-level interface to packet I/O functions. For example, to send a packet, the rump kernel will issue a hypercall which essentially says "transmit these data", and the network backend handles the request. When using NetBSD PCI drivers, the hypercalls work at a low level, and deal with for example reading/writing the PCI configuration space and mapping the device memory space into the rump kernel. As a result, using NetBSD PCI device drivers in a rump kernel work exactly like in a regular kernel: the PCI devices are probed during rump kernel bootstrap, relevant drivers are attached, and packet shovelling works by the drivers fiddling the relevant device registers.
The hypercall interfaces and necessary kernel-side implementations are currently hosted in the repository providing Xen support for rump kernels. Strictly speaking, there is nothing specific to Xen in these bits, and they will most likely be moved out of the Xen repository once PCI device driver support for other planned platforms, such as Linux userspace, is completed. The hypercall implementations, which are Xen specific, are available here.
For testing networking, it is advantageous to have an interface which can communicate with other networking stacks on the same host without requiring elevated privileges, special kernel features or a priori setup in the form of e.g. a daemon process. These requirements are filled by shmif, which uses file-backed shared memory as a bus for Ethernet frames. Each interface attaches to a pathname, and interfaces attached to the same pathname see the the same traffic.
The shmif driver is available in the NetBSD tree.
We presented a total of six open source network backends for networking with rump kernels. These backends represent four different methodologies:
- DPDK and netmap provide high-performance network hardware access using high-level hypercalls.
- TAP and Xen hypercall drivers provide access to virtualized network resources using high-level hypercalls.
- NetBSD PCI drivers access hardware directly using register-level device access to send and receive packets.
- shmif allows for unprivileged testing of the networking stack without relying on any special kernel drivers or global resources.
Choice is a good thing here, as the optimal backend ultimately depends on the characteristics of the application.
FOSDEM 2014 will take place on 1–2 February, 2014, in Brussels, Belgium. Just like in the last years, there will be both a BSD booth and a developer's room (on Saturday).
The topics of the devroom include all BSD operating systems. Every talk is welcome, from internal hacker discussion to real-world examples and presentations about new and shiny features. The default duration for talks will be 45 minutes including discussion. Feel free to ask if you want to have a longer or a shorter slot.
If you already submitted a talk last time, please note that the procedure is slightly different.
To submit your proposal, visit
and follow the instructions to create an account and an “event”. Please select “BSD devroom” as the track. (Click on “Show all” in the top right corner to display the full form.)
Please include the following information in your submission:
- The title and subtitle of your talk (please be descriptive, as titles will be listed with ~500 from other projects)
- A short abstract of one paragraph
- A longer description if you wish to do so
- Links to related websites/blogs etc.
The deadline for submissions is December 20, 2013. The talk committee, consisting of Daniel Seuffert, Marius Nünnerich and Benny Siegert, will consider the proposals. If yours has been accepted, you will be informed by e-mail before the end of the year.
The following report is by Manuel Wiesinger:
First of all, I like to thank the NetBSD Foundation for enabling me to successfully complete this Google Summer of Code. It has been a very valuable experience for me.
My project is a defragmentation tool for FFS. I want to point out at the beginning that it is not ready for use yet.
What has been done:
Fragment analysis + reordering. When a file is smaller or equal than the file system's fragment size, it is stored as a fragment. One can think of a fragment as a block. It can happen that there are many small files that occupy a fragment. When the file systems changes over time it can happen that there are many blocks containing fewer fragments than they can hold. The optimization my tool does is to pack all these fragments into fewer blocks. This way the system may get a little more free space.
Directory optimization. When a directory gets deleted, the space for that directory and its name are appended to the previous directory. This can be imagined like a linked list. My tool reads that list and writes all entries sequentially.
Non-contiguous files analysis + reordering strategy. This is what most other operating systems call defragmentation - a reordering of blocks, so that blocks belonging to the same file or directory can be read sequentially.
What did not work as expected
Testing: I thought that it is the most productive and stable to work with unit tests. Strictly test driven development. It was not really effective to play around with rump/atf. Although I always started a new implementation step by generating a file system in a state where it can be optimized. So I wrote the scripts, took a look if they did what I intended (of course, they did not always).
I'm a bit disappointed about the amount of code. But as I said before, the hardest part is to figure out how things work. The amount it does is relatively much, I expected more lines of code to be needed to get where I am now.
Before applying for this project, I took a close look at UFS. But it was not close enough. There were many surprises. E.g. I had no idea that there are gaps in files on purpose, to exploit the rotation of hard disks.
Time management, everything took longer than I expected. Mostly because it was really hard to figure out how things work. Lacking documentation is a huge problem too.
Things I learned
A huge lesson learned in software engineering. It is always different than expected, if you do not have a lot of experience.
I feel more confident to read and patch kernel code. All my previous experiences were not so in-depth. (e.g., I worked with pintos). The (mental) barrier of kernel/system programming is gone. For example I see a chance now to take a look on ACPI, and see if I can write a patch to get suspend working on my notebook.
I got more contact with the NetBSD community, and got a nice overview how things work. The BSD community here is very mixed. There are not many NetBSD people.
CVS is better than most of my friends say.
I learned about pkgsrc, UVM, and other smaller things about NetBSD too, but that's not worth mentioning in detail.
How I intend to continue:
After a sanity break, of the whole project, there are several possibilities.
In the next days I will speak to a supervisor at my university, if I can continue the project as a project thesis (I still need to do one). It may even include online defragmentation, based on snapshots. That's my preferred option.
I definitely want to finish this project, since I spent so much time and effort. It would be a shame otherwise.
What there is to do technically
Once the defragmentation works, given enough space to move a file. I want to find a way where you can defragment it even when there is too little space. This can be achieved by simply moving blocks piece by piece, and use the files' space as 'free space'.
Online defragmentation. I already skimmed hot snapshots work. It should be possible.
Improve the tests.
It should be easy to get this compiling on older releases. Currently it compiles only on -current.
Eventually it's worth it to port the tests to atf/rump on the long run.
I will continue to work, and definitely continue to use and play with NetBSD!
It's a stupid thing that a defrag tool is worth little (nothing?) on SSDs. But since NetBSD is designed to run on (almost) any hardware, this does not bug me a lot.
Thank you NetBSD! Although it was a lot of hard work. It was a lot of fun!
The NetBSD Project is pleased to announce NetBSD 6.1.2, the second security/bugfix update of the NetBSD 6.1 release branch, and NetBSD 6.0.3, the third security/bugfix update of the NetBSD 6.0 release branch. They represent a selected subset of fixes deemed important for security or stability reasons, and if you are running a prior release of NetBSD 6.x3, you are recommended to update.http://www.NetBSD.org/mirrors/.