This page is a blog mirror of sorts. It pulls in articles from the blog's feed and publishes them here (with a feed, too).
The NetBSD Project is pleased to announce NetBSD 6.1.5, the fifth security/bugfix update of the NetBSD 6.1 release branch, and NetBSD 6.0.6, the sixth security/bugfix update of the NetBSD 6.0 release branch. They represent a selected subset of fixes deemed important for security or stability reasons, and if you are running a prior release of either branch, we strongly suggest that you update to one of these releases. A list of download sites may be found at http://www.NetBSD.org/mirrors/.
There was a nice, friendly and informal summit of NetBSD developers (and interested users) in Sofia on Friday, September 26, 2014.
Bernd Ernesti (veego@) took this photo:
In the back row from left to right:
Masao Uebayashi (uebayasi), Thomas Klausner (wiz), Yann Sionneau, Marc Balmer (mbalmer), Justin Cormack (justin), Jaap Boender (jaapb), Adrian Steinmann (ast), Martin Husemann (martin), Taylor R Campbell (riastradh), Michael van Elst (mlelstv), Sevan Janiyan, Alexander Nasonov (alnsn).
In the front row, from left to right:
Julian Coleman (jdc), Joerg Sonnenberger (joerg), Valeriy E. Ushakov (uwe), Christoph Badura (bad), S.P.Zeidler (spz), Pierre Pronchery (khorben), Stephen Borrill (sborrill)
Some developers made it to the conference only after the summit, so Emmanuel Dreyfus (manu), Luke Mewburn (lukem) and Lourival Neto (lneto) are unfortunately missing from that picture.
Marc Balmer presented some slides prepared by Masanobu SAITOH about ongoing work at IIJ.
Marc also proposed some extensions to the in-tree httpd (aka bozohttpd) to allow creation of simple dynamic content via "Lua templates". The question whether it would be possible to serve www.netbsd.org by bozohttpd was discussed, and only administrative reasons seem to prevent it - which was overall considered "good enough".
Pierre Pronchery presented his work on EdgeBSD - and why he does not consider it a fork, but more a playground for experimentation. He also reported on some of his experiences with using git on the NetBSD source tree.
Two more slightly TNF internal issues were discussed, and after that the whole crowd moved to dinner (including a bit of Bulgarian wine).
Having an in-person meeting of a relatively large number of NetBSD developers was considered very useful, and we will try to repeat it on other occasions. Next time, if similar attendance is likely, we will plan for more time (like a full day) and also create a schedule of talks/presentations up front, still with the option to add ad-hoc ones.
The most time-consuming part of operating system development is obtaining enough drivers to enable the OS to run real applications which interact with the real world. NetBSD's rump kernels allow reducing that time to almost zero, for example for developing special-purpose operating systems for the cloud and embedded IoT devices. This article describes an experiment in creating an OS by using a rump kernel for drivers. It attempts to avoid going into full detail on the principles of rump kernels, which are available for interested readers from rumpkernel.org. We start by defining the terms in the title:
- OS: operating system, i.e. the overhead that enables applications to run
- internet-ready: supports POSIX applications and talks TCP/IP
- a week: 7 days, in this case the period between Wednesday night last week and Wednesday night this week
- from scratch: began by writing the assembly instructions for the kernel entry point
- rump kernel: partial kernel consisting of unmodified NetBSD kernel drivers
- bare metal: what you get from the BIOS/firmware
Why would anyone want to write a new OS? If you look at our definition of "OS", you notice that you want to keep the OS as small as possible. Sometimes you might not care, e.g. in case of a desktop PC, but other times, when hardware resources are limited or you have high enough security concerns, you actually might care. For example, NetBSD itself is not able to run on systems without an MMU, but the OS described in this article does not use virtual memory at all, and yet it can run most of the same applications as NetBSD can. Another example: if you want to fine-tune the OS to suit your application, it's easier to tune a simple OS than a very complicated general purpose OS. The motivation for this work came in fact from someone who was looking to provision applications as services on top of VMware, but found that no existing solution supported the system interfaces his applications needed without dragging an entire classic OS along for the ride.
Let's move on to discussing what an OS needs to support for it to be able to host, for example, a web server written for a regular OS such as Linux or the BSDs. The list gets quite long. You need a file system where the web server reads the served pages from, you need a TCP/IP stack to communicate with the clients, and you need a network interface driver to be able to send and receive packets. Furthermore, you need the often overlooked, yet surprisingly complicated system call handlers. For example, opening a socket is not really very complicated to handle. Neither is reading and writing data. However, when you start piling things like fcntl(O_NONBLOCK) and poll() on top, things get trickier. By a rough estimate, if you run an httpd on NetBSD, approximately 100k lines of code from the kernel are used just to service the requests that the httpd makes. If you do the math (and bc did), there are 604800 seconds in a week, so 100k lines would mean one line roughly every six seconds around the clock. The OS we are discussing is able to run an off-the-shelf httpd, but I definitely did not write code at that rate 24/7 during the past week.
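As a quick sanity check of the arithmetic, here is a minimal sketch. The 100k figure is the rough estimate quoted above, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check: how fast would one have to type to
# produce ~100k lines of kernel code in a single week, 24/7?
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY
LINES_OF_CODE = 100_000  # rough estimate from the text, not a measurement

seconds_per_line = SECONDS_PER_WEEK / LINES_OF_CODE

print(SECONDS_PER_DAY)             # 86400
print(SECONDS_PER_WEEK)            # 604800
print(round(seconds_per_line, 2))  # 6.05
```

One line every six seconds, without sleeping or debugging, is clearly not a realistic rate for writing drivers, which is the point of reusing them instead.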
Smoke and Mirrors, CGI Edition
The key to happiness is not to write 100k lines of code from scratch, nor to port it from another OS, as both are time-consuming and error-prone techniques, and error-proneness leads to even more consumption of time. Rump kernels come into the picture as the key to happiness and provide the necessary drivers.
As the old saying goes: "rump kernels do not an OS make", and we need the rest of the bits that make up the OS side of the software stack from somewhere. These bits need to make it seem like the drivers in a rump kernel are running inside the NetBSD kernel, hence "smoke and mirrors". What is surprising is how little code needs to exist between the drivers and the hardware, just some hundreds of lines of code. More specifically, in the bare metal scenario we need support for:
- low level machine dependent code
- thread support and a scheduler
- rump kernel hypercall layer
- additionally: bundling the application into a bootable image
The figure below illustrates the rump kernel software stack. The arrows correspond to the above list (in reverse order). We go over the list starting from the top of the list (bottom of the figure).
Low level machine dependent code is what the OS uses to get the CPU and devices on talking terms with the rest of the OS. Before we can do anything useful, we need to bootstrap. Bootstrapping x86-32 is less work than one would expect, which incidentally is also why the OS runs only in 32bit mode (adding 64bit support would likely not be many hours of work, and patches are welcome). Thanks to the Multiboot specification, the bootstrap code is more or less just a question of setting the stack pointer and jumping to C code. In C code we need to parse the amount of physical memory available and initialize the console. Since NetBSD device drivers mainly use interrupts, we also need interrupt support for the drivers to function correctly. On x86, interrupt support means setting up the CPU's interrupt descriptor tables and programming the interrupt controller. Since rump kernels do not support interrupts, we additionally need a small interrupt stub that transfers the interrupt request to a thread context which calls the rump kernel. In total, the machine dependent code is only a few hundred lines. The OSDev.org wiki contains a lot of information which was useful when hammering the hardware into shape. The other source of x86 hardware knowledge was x86 support in NetBSD.
Threads and scheduling might sound intimidating, but they are not. First, rump kernels can run on top of any kind of threads you throw at them, so we can just use the ones which are the simplest to implement: cooperative threads. Note that simple does not mean poorly performing, and in fact the predictability of cooperative threads, at least in my opinion, makes them more likely to perform better than preemptive threading in cases where you are honing an OS for a single application. Second, I already had access to an implementation which served as the basis: Justin Cormack's work on userspace fibers, which in turn has its roots in the Xen MiniOS we use for running rump kernels on the Xen hypervisor, could be re-purposed as the threads+scheduler implementation, with the context switch code kindly borrowed from MiniOS.
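To give a feel for why cooperative threads are simple to implement, here is a minimal sketch of a round-robin cooperative scheduler. It uses Python generators in place of real context switches and is not the MiniOS-derived code the OS actually uses; the names worker and run are made up for illustration.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative thread: does a unit of work, then yields voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand the CPU back to the scheduler

def run(threads):
    """Round-robin scheduler: a thread runs until it yields or finishes."""
    runq = deque(threads)
    while runq:
        thread = runq.popleft()
        try:
            next(thread)         # "context switch" into the thread
        except StopIteration:
            continue             # thread exited, drop it from the run queue
        runq.append(thread)      # still runnable, queue it again

log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Because a thread is only ever switched out at a point it chose itself, the interleaving is fully predictable, which is the property the text argues for.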
The rump kernel hypercall interface is what rump kernels themselves run on. While the implementation is platform-specific, our baremetal OS shares a large portion of its qualities with the Xen platform that was already supported. Therefore, most of the Xen implementation applied more or less directly. One notable exception to the similarities is that Xen paravirtualized devices are not available on bare metal and therefore we access all I/O devices via the PCI bus.
All we need now is the application, a.k.a. "userspace". Support for application interfaces (POSIX syscalls, libc, etc.) readily exists for rump kernels, so we just use what is already available. The only remaining issue is building the bundle that we bootstrap. For that, we can repurpose Ian Jackson's app-tools which were originally written for the rump kernel Xen platform. Using app-tools, we could build a bootable image containing thttpd simply by running the app-tools wrappers for ./configure and make. The image below illustrates part of the build output, along with booting the image in QEMU and testing that the httpd really works. The use of QEMU, i.e. software-emulated bare metal, is due to convenience reasons.
You probably noticed that the whole thing is just bolting a lot of working components together while writing minimal amounts of necessary glue. That is exactly the point: never write or port or hack what you can reuse without modification. Code reusability has always been the strength of NetBSD and rump kernels add another dimension to that quality.
The source code for the OS discussed in this post is available under a 2-clause BSD license from repo.rumpkernel.org/rumpuser-baremetal.
So, can we do better? Looking at the size of a GENERIC kernel:
   text    data     bss     dec     hex filename
2997389   67748  173044 3238181  316925 netbsd

it seems we can not easily go below 4 MB (and for other reasons we would need to compile the bootloader differently for that anyway). But 16MB is still quite a difference, so it should work.
Now at the time I started this quest, I only had one VAX machine in real hardware - a VaxStation 4000 M96a, one of the fastest machines, and with 128 MB RAM well equipped. This is nice if you try to natively compile modern gcc, but I did not feel like fiddling with my hardware to create a better test environment for small RAM installations.
Like a year (or so) ago, when I fixed the VAX primary boot blocks (with lots of help from various vaxperts on the port-vax mailing list), SIMH, found in pkgsrc as emulators/simh, proved helpful. Testing various configurations I found an emulated VAX 11/780 with 8 MB to be the smallest I could get working.
The first step of the tuning was obvious: the CD image used a ramdisk based kernel, with the ramdisk containing all of the install system. At the same time, most of the CD was unused. We already use different schemes on i386, amd64 and sparc64 - so I cloned the sparc64 one and adjusted it to VAX. Now we use the GENERIC kernel on CD and mount the ISO9660 filesystem on the CD itself as root file system. The VAX boot loader already could deal with this, only a minor fix was needed for the kernel to recognize some variants of CD drives as boot device.
The resulting CD did boot, but did not get far in userland. The CD only contained a (mostly) empty /dev directory (without /dev/console), which causes init(8) to mount a tmpfs on /dev and run the MAKEDEV script there. But to my surprise, on the 11/780 mfs was used instead of tmpfs - and we will see why soon. The next step in preparing the userland for the installer is creating additional tmpfs instances to deal with the read-only nature of the CD used as root. This did not work at all: the mount attempts simply failed - and the installer was very unhappy, as it could not create files in /tmp, for example.
I checked, but tmpfs was part of the VAX GENERIC kernel. I tried the install CD on a simulated MicroVAX 3900 with 64 MB of RAM - and to my surprise all of /dev and the three additional tmpfs instances created later worked (as well as the installation procedure). I checked the source (stupid me) and then found the documentation: tmpfs reserved a hard coded 4 MB of RAM for the system. With the GENERIC kernel booted on an 8 MB machine, we had slightly less than 4 MB RAM free, so tmpfs never worked.
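The effect of a fixed reserve can be illustrated with a small sketch. This is a simplified model rather than the tmpfs source: the function name is made up, the 3884 KB figure is the "avail memory" value from the 8 MB boot log, and the 60 MB figure for the larger machine is an assumed value.

```python
KB = 1024
FIXED_RESERVE = 4 * 1024 * KB  # hard coded 4 MB reserve described in the text

def tmpfs_usable_bytes(free_bytes, reserve_bytes=FIXED_RESERVE):
    """Simplified model: tmpfs may only use free memory beyond the reserve."""
    return max(0, free_bytes - reserve_bytes)

small_vax = 3884 * KB       # free memory on the emulated 8 MB VAX 11/780
large_vax = 60 * 1024 * KB  # assumed free memory on a 64 MB machine

print(tmpfs_usable_bytes(small_vax))        # 0
print(tmpfs_usable_bytes(large_vax) // KB)  # 57344
```

With less free memory than the reserve, every allocation attempt fails, which matches the mounts failing on the small machine while working on the large one.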
One step back - this explained why /dev ended up as a mfs instead of tmpfs. The MAKEDEV code is written to deal with kernels that do include tmpfs, but also with those that do not: it tries tmpfs, and falls back to mfs if that does not work. This made me think I could do the same (but without even trying tmpfs): I changed the install CD scripts to use mfs instead of tmpfs. The main difference is: mfs uses a userland process to manage the swappable memory. However, we do not have any swap space yet. Checking when sysinst enables swapping for the first time, I found: it never did on VAX. Duh! I added the missing calls to machine dependent code in sysinst, but of course the installer can only enable swap after partitioning is done (and a swap partition got created).
Testing showed: we did not get far enough with four mfs instances. So let us try with fewer. One we do not need is the /dev one: I changed the CD content creation code to pre-populate /dev on the CD. This is not possible with all filesystems, including the original ISO9660 one, but with the so-called Rock Ridge extensions it works. We know that it is a modern NetBSD kernel mounting the CD - so support for those extensions is always present. I made some errors and hit some bugs (that got fixed) on the way there, but soon the CD booted without creating a mfs (nor tmpfs) for /dev.
Still, three mfs instances did not survive until sysinst enabled swapping. The userland part was killed once the kernel ran out of memory. I needed tmpfs working with less than 4 MB memory free. After a slight detour and some discussion on the tech-kern mailing list, I changed tmpfs to deal with this (and only reserve a dynamically scaled amount of memory calculated by the UVM memory management). With this change, a current install CD just works, and installation completes successfully.
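The try-then-fall-back pattern is easy to sketch. MAKEDEV itself is a shell script; the Python below only illustrates the control flow, and the mount functions are stand-ins that simulate the behaviour seen on the 8 MB machine.

```python
def mount_with_fallback(target, mounters):
    """Try each (name, mount_fn) in order; return the first name that works."""
    for name, mount_fn in mounters:
        try:
            mount_fn(target)
            return name
        except OSError:
            continue  # this file system type did not work, try the next one
    raise OSError(f"no file system could be mounted on {target}")

# Hypothetical stand-ins for the real mount operations.
def fake_tmpfs(target):
    raise OSError("tmpfs: not enough free memory")

def fake_mfs(target):
    return None  # pretend the mount succeeded

chosen = mount_with_fallback("/dev", [("tmpfs", fake_tmpfs), ("mfs", fake_mfs)])
print(chosen)  # mfs
```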
The following is just the start of the installation process, the sysinst part afterwards is standard stuff and left out for brevity.
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014
    The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.

NetBSD 6.99.43 (GENERIC) #1: Thu Jun 5 22:01:14 CEST 2014
    firstname.lastname@example.org:/usr/obj/vax/usr/src/sys/arch/vax/compile/GENERIC
VAX 11/780
total memory = 8188 KB
avail memory = 3884 KB
mainbus0 (root)
cpu0 at mainbus0: KA80, S/N 1234(0), hardware ECO level 7(112)
cpu0: 4KB L1 cachen, no FPA
sbi0 at mainbus0
mem0 at sbi0 tr1: standard
mem1 at sbi0 tr2: standard
uba1 at sbi0 tr3: DW780
dz1 at uba1 csr 160100 vec 304 ipl 15
mtc0 at uba1 csr 174500 vec 774 ipl 15
mscpbus0 at mtc0: version 5 model 5
mscpbus0: DMA burst size set to 4
uda0 at uba1 csr 172150 vec 770 ipl 15
mscpbus1 at uda0: version 3 model 6
mscpbus1: DMA burst size set to 4
de0 at uba1 csr 174510 vec 120 ipl 15: delua, hardware address 08:00:2b:cc:dd:ee
mt0 at mscpbus0 drive 0: TU81
mt1 at mscpbus0 drive 1: TU81
mt2 at mscpbus0 drive 2: TU81
mt3 at mscpbus0 drive 3: TU81
ra0 at mscpbus1 drive 0: RA92
ra1 at mscpbus1 drive 1: RA92
racd0 at mscpbus1 drive 3: RRD40
ra0: size 2940951 sectors
ra1: no disk label: size 2940951 sectors
racd0: size 1331200 sectors
boot device: racd0
root on racd0a dumps on racd0b
root file system type: cd9660
init: kernel secur
You are using a serial console, we do not know your terminal emulation.
Please select one, typical values are:

    vt100
    ansi
    xterm

Terminal type (just hit ENTER for 'vt220'): xterm

NetBSD/vax 6.99.43

This menu-driven tool is designed to help you install NetBSD to a hard disk,
or upgrade an existing NetBSD system, with a minimum of work.
In the following menus type the reference letter (a, b, c, ...) to select an
item, or type CTRL+N/CTRL+P to select the next/previous item.
The arrow keys and Page-up/Page-down may also work.
Activate the current selection from the menu by typing the enter key.

+---------------------------------------------+
|>a: Installation messages in English         |
| b: Installation auf Deutsch                 |
| c: Mensajes de instalacion en castellano    |
| d: Messages d'installation en français      |
| e: Komunikaty instalacyjne w jezyku polskim |
+---------------------------------------------+

Overall this improved NetBSD to better deal with small memory systems. The VAX specific install changes can be brought over to other ports as well, but sometimes changes to the bootloader will be needed.
The NetBSD Project is pleased to announce NetBSD 6.1.4, the fourth security/bugfix update of the NetBSD 6.1 release branch, and NetBSD 6.0.5, the fifth security/bugfix update of the NetBSD 6.0 release branch. They represent a selected subset of fixes deemed important for security or stability reasons, and if you are running a prior release of either branch, we strongly suggest that you update to one of these releases. A list of download sites may be found at http://www.NetBSD.org/mirrors/.
Due to a strange series of events, the code changes needed to support the (slightly unusual) MIPS CPU used in the PlayStation 2 had never been merged into gcc or binutils mainline. This has only recently been fixed. Unfortunately, the changes have not been pulled up to the gcc 4.8.3 branch (which is available in NetBSD-current), so an external toolchain from pkgsrc is needed for the PlayStation 2.
To install this toolchain, use a pkgsrc-current checkout and cd to cross/gcc-mips-current, then do "make install" - that is all.
Work is in progress to bring the old code up to -current. Hopefully a bootable NetBSD-current kernel will be available soon.
Work is ongoing to bring this modern toolchain to all other ports too (most of them already work, but some more testing will be done). If you want to try it, just add -V HAVE_GCC=48 to the build.sh invocation.
Note that in parallel clang is available as an alternative option for a few architectures already (i386, amd64, arm, and sparc64), but needs more testing and debugging at least on some of them (e.g. the sparc64 kernel does not boot).
For a project with diverse hardware support like NetBSD, all toolchain updates are a big pain - so a big THANK YOU! to everyone involved; in no particular order Christos Zoulas, matthew green, Nick Hudson, Tohru Nishimura, Frank Wille (and myself).
The NetBSD Project is pleased to announce:
- NetBSD 6.1.3, the third security/bugfix update of the NetBSD 6.1 release branch,
- NetBSD 6.0.4, the fourth security/bugfix update of the NetBSD 6.0 release branch,
- NetBSD 5.2.2, the second security/bugfix update of the NetBSD 5.2 release branch,
- and NetBSD 5.1.4, the fourth security/bugfix update of the NetBSD 5.1 release branch
These releases represent a selected subset of fixes deemed important for security or stability reasons. Updating to one of these versions is recommended for users of all prior releases.
For more details, please see the NetBSD 6.1.3 release notes, the NetBSD 6.0.4 release notes, the NetBSD 5.2.2 release notes, or the NetBSD 5.1.4 release notes.
Complete source and binaries for NetBSD 6.1.3, NetBSD 6.0.4, NetBSD 5.2.2 and NetBSD 5.1.4 are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, SUP, and other services may be found at http://www.NetBSD.org/mirrors/.
A cyclic trend in operating systems is moving things in and out of the kernel for better performance. Currently, the pendulum is swinging in the direction of userspace being the locus of high performance. The anykernel architecture of NetBSD ensures that the same kernel drivers work in a monolithic kernel, userspace and beyond. One of those driver stacks is networking. In this article we assume that the NetBSD networking stack is run outside of the monolithic kernel in a rump kernel and survey the open source interface layer options.
There are two sub-aspects to networking. The first facet is supporting network protocols and suites such as IPv6, IPsec and MPLS. The second facet is delivering packets to and from the protocol stack, commonly referred to as the interface layer. While the first facet for rump kernels is unchanged from the networking stack running in a monolithic NetBSD kernel, the second differs: rump kernels support a number of interfaces not available in kernel mode.
The Data Plane Development Kit is meant to be used for high-performance, multiprocessor-aware networking. DPDK offers network access by attaching to hardware and providing a hardware-independent API for sending and receiving packets. The most common runtime environment for DPDK is Linux userspace, where a UIO userspace driver framework kernel module is used to enable access to PCI hardware. The NIC drivers themselves are provided by DPDK and run in application processes.
For high performance, DPDK uses a run-to-completion scheduling model -- the same model is used by rump kernels. This scheduling model means that NIC devices are accessed in polled mode without any interrupts on the fast path. The only interrupts that are used by DPDK are for slow-path operations such as notifications of link status change.
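As a rough illustration of polled-mode, run-to-completion processing, consider the sketch below. It does not use the DPDK API: FakeNic, rx_burst and poll_loop are hypothetical names, and the idle-poll limit exists only so that the example terminates, whereas a real fast path would keep polling.

```python
from collections import deque

class FakeNic:
    """Hypothetical stand-in for a NIC receive queue."""
    def __init__(self, frames):
        self._rx = deque(frames)

    def rx_burst(self, max_frames):
        """Return up to max_frames received frames without blocking."""
        burst = []
        while self._rx and len(burst) < max_frames:
            burst.append(self._rx.popleft())
        return burst

def poll_loop(nic, handle, burst_size=4, max_idle_polls=3):
    """Poll the NIC and process every frame to completion, no interrupts."""
    idle = 0
    handled = 0
    while idle < max_idle_polls:
        burst = nic.rx_burst(burst_size)
        if not burst:
            idle += 1          # nothing arrived, just poll again
            continue
        idle = 0
        for frame in burst:
            handle(frame)      # run to completion before touching the next frame
            handled += 1
    return handled

seen = []
count = poll_loop(FakeNic([b"frame%d" % i for i in range(10)]), seen.append)
print(count, seen[0], seen[-1])  # 10 b'frame0' b'frame9'
```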
Like DPDK, netmap offers user processes access to NIC hardware with a high-performance userspace packet processing intent. Unlike DPDK, netmap reuses NIC drivers from the host kernel and provides memory-mapped buffer rings for accessing the device packet queues. In other words, the device drivers still remain in the host kernel, but low-level and low-overhead access to hardware is made available to userspace processes. In addition to the memory-mapping of buffers, netmap uses other performance optimization methods such as batch processing and buffer preallocation, and can easily saturate a 10GigE with minimum-size frames. Another significant difference to DPDK is that netmap also allows for a blocking mode of operation.
Netmap is coupled with a high-performance software virtual switch called VALE. It can be used to interconnect networks between virtual machines and processes such as rump kernels. The netmap API is used also by VALE, so VALE switching can be used with the rump kernel driver for netmap.
A tap device injects packets written into a device node, e.g. /dev/tap, to a tap virtual network interface. Conversely, packets received by the virtual tap network can be read from the device node. The tap network interface can be bridged with other network interfaces to provide further network access. While indirect access to network hardware via the bridge is not maximally efficient, it is not hideously slow either: a rump kernel backed by a tap device can saturate a gigabit Ethernet. The advantage of the tap device is portability, as it is widely available on Unix-type systems. Tap interfaces also virtualize nicely, and most operating systems will allow unprivileged processes to use tap interfaces as long as the processes have the credentials to access the respective device nodes.
The tap device was the original method for network access with a rump kernel. In fact, the in-kernel side of the rump kernel network driver was rather short-sightedly named virt back in 2008. The virt driver and the associated hypercalls are available in the NetBSD tree. Fun fact: the tap driver is also the method for packet shovelling when running the NetBSD TCP/IP stack in the Linux kernel; the rationale is provided in a comment here and also by running wc -l.
After a fashion, using Xen hypercalls is a variant of using the TAP device: a virtualized network resource is accessed using high-level hypercalls. However, instead of accessing the network backend from a device node, Xen hypercalls are used. The Xen driver is limited to the Xen environment and is available here.
NetBSD PCI NIC drivers
The previous examples we have discussed use a high-level interface to packet I/O functions. For example, to send a packet, the rump kernel will issue a hypercall which essentially says "transmit these data", and the network backend handles the request. When using NetBSD PCI drivers, the hypercalls work at a low level and deal with, for example, reading/writing the PCI configuration space and mapping the device memory space into the rump kernel. As a result, using NetBSD PCI device drivers in a rump kernel works exactly like in a regular kernel: the PCI devices are probed during rump kernel bootstrap, relevant drivers are attached, and packet shovelling works by the drivers fiddling the relevant device registers.
The hypercall interfaces and necessary kernel-side implementations are currently hosted in the repository providing Xen support for rump kernels. Strictly speaking, there is nothing specific to Xen in these bits, and they will most likely be moved out of the Xen repository once PCI device driver support for other planned platforms, such as Linux userspace, is completed. The hypercall implementations, which are Xen specific, are available here.
For testing networking, it is advantageous to have an interface which can communicate with other networking stacks on the same host without requiring elevated privileges, special kernel features or a priori setup in the form of e.g. a daemon process. These requirements are filled by shmif, which uses file-backed shared memory as a bus for Ethernet frames. Each interface attaches to a pathname, and interfaces attached to the same pathname see the same traffic.
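The idea of a file-backed shared memory bus can be sketched as follows. This is not the shmif on-disk format or locking scheme: the layout (a 4-byte tail offset followed by length-prefixed frames), the class name BusInterface and the lack of wrap-around and locking are all simplifying assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

BUS_SIZE = 4096
HDR = struct.Struct("<I")  # bus header: offset of the next free byte
LEN = struct.Struct("<H")  # per-frame length prefix

class BusInterface:
    """One interface attached to the bus file at a given pathname."""
    def __init__(self, path):
        self._fd = os.open(path, os.O_RDWR)
        self._mem = mmap.mmap(self._fd, BUS_SIZE)  # shared, file-backed mapping
        self._next = HDR.size                      # private read position

    def send(self, frame):
        (tail,) = HDR.unpack_from(self._mem, 0)
        tail = max(tail, HDR.size)                 # a fresh bus has tail == 0
        end = tail + LEN.size + len(frame)
        if end > BUS_SIZE:
            raise BufferError("bus full (a real bus would wrap around)")
        LEN.pack_into(self._mem, tail, len(frame))
        self._mem[tail + LEN.size:end] = frame
        HDR.pack_into(self._mem, 0, end)           # publish the new tail

    def receive(self):
        """Return all frames put on the bus since the last call."""
        (tail,) = HDR.unpack_from(self._mem, 0)
        frames = []
        while self._next < tail:
            (size,) = LEN.unpack_from(self._mem, self._next)
            start = self._next + LEN.size
            frames.append(bytes(self._mem[start:start + size]))
            self._next = start + size
        return frames

    def close(self):
        self._mem.close()
        os.close(self._fd)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "bus")
    with open(path, "wb") as f:
        f.write(b"\0" * BUS_SIZE)                  # create an empty bus file
    if_a, if_b, if_c = BusInterface(path), BusInterface(path), BusInterface(path)
    if_a.send(b"hello")
    if_a.send(b"world")
    seen_b = if_b.receive()
    seen_c = if_c.receive()
    for iface in (if_a, if_b, if_c):
        iface.close()

print(seen_b == seen_c, seen_b)  # True [b'hello', b'world']
```

Only ordinary file permissions are needed to attach, and no daemon or kernel driver is involved, which is what makes this style of bus convenient for unprivileged testing.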
The shmif driver is available in the NetBSD tree.
We presented a total of six open source network backends for networking with rump kernels. These backends represent four different methodologies:
- DPDK and netmap provide high-performance network hardware access using high-level hypercalls.
- TAP and Xen hypercall drivers provide access to virtualized network resources using high-level hypercalls.
- NetBSD PCI drivers access hardware directly using register-level device access to send and receive packets.
- shmif allows for unprivileged testing of the networking stack without relying on any special kernel drivers or global resources.
Choice is a good thing here, as the optimal backend ultimately depends on the characteristics of the application.