There are a number of motivations for running applications directly on top of the Xen hypervisor without resorting to a full general-purpose OS. For example, one might want to isolate applications as strongly as possible with minimal overhead. Leaving the OS out of the picture decreases overhead, since, for example, the inter-application protection normally offered by virtual memory is already handled by the Xen hypervisor. At the same time, however, problems arise: applications expect and use many services normally provided by the OS, for example files, sockets, event notification and so forth. We were able to set up a production-quality environment for running applications as Xen DomUs in a few weeks by reusing hundreds of thousands of lines of unmodified driver and infrastructure code from NetBSD. While that amount of driver code may sound like a lot for running single applications, keep in mind that it covers, for example, file systems, the TCP/IP stack, stdio, system calls and so forth -- the innocent-looking open() alone accepts over 20 flags which must be properly handled. The remainder of this post looks at the effort in more detail.

I have been on a path to maximize the reuse potential of the NetBSD kernel with a technology called rump kernels. Things started out with running unmodified drivers in userspace on NetBSD, but the ability to host rump kernels has since spread to the userspace of other operating systems, web browsers (when compiled to javascript), and even the Linux kernel. Running rump kernels directly on the Xen hypervisor has been suggested by a number of people over the years. It provides a different sort of challenge, since, unlike the environments mentioned previously, the Xen hypervisor is a "bare metal" type of environment: the guest is in charge of everything starting from bootstrap and page table management. The conveniently reusable Xen Mini-OS was employed for interfacing with the bare metal environment, and the necessary rump kernel hypercalls were built upon that. The environment for running unmodified NetBSD kernel drivers (e.g. TCP/IP) and system call handlers (e.g. socket()) directly on top of the Xen hypervisor was available after implementing the necessary rump kernel hypercalls (note: link points to the initial revision).

Shortly after I had published the above, I was contacted by Justin Cormack, with whom I had collaborated earlier on his ljsyscall project, which provides system call interfaces to Lua programs. He wanted to run the LuaJIT interpreter and his ljsyscall implementation directly on top of Xen. However, in addition to system calls, which were already handled by the rump kernel, the LuaJIT interpreter uses interfaces from libc, so we added the NetBSD libc to the mix. While the libc sources are currently hosted on GitHub, we plan to integrate the changes into the upstream NetBSD sources as soon as things settle (see the repo for instructions on how to produce a diff and verify that the changes really are tiny). The same repository hosts the math library libm, but it is there purely for convenience, so that these early builds deliver everything from a single checkout. We have verified that you can alternatively use libm from a standard NetBSD binary distribution, and presumably the same holds for any other user-level library. The resulting architecture is depicted below.

architecture diagram

The API and ABI we provide are the same as those of a regular NetBSD installation. Apart from some limitations, such as the absence of fork() -- would it duplicate the DomU? -- objects compiled for a regular NetBSD installation can be linked into the DomU image and booted to run directly as standalone applications on top of Xen. As proofs of concept, I created a demo where a Xen DomU configures TCP/IP networking, mounts a file system image, and runs an httpd daemon to serve the contents, and Justin's demo runs the LuaJIT interpreter and executes the self-test suite for ljsyscall. Though there is solid support for running applications, not all of the work is done. Especially the build framework needs to be more flexible, and everyone who has a use case for this technology is welcome to test out their application and contribute ideas and code for improving the framework.

In conclusion, we have shown that it is straightforward to reuse both kernel and library code from an existing real-world operating system to create an application environment which can run on top of a bare-metal type cloud platform. By being able to use on the order of 99.9% of the code -- that's 1,000 lines written per 1,000,000 used unmodified -- from an existing, real-world proven source, the task was quick to pull off, the result is robust, and the offered application interfaces are complete. Some might call our work "yet another $fookernel", but we call it a working result and challenge everyone to evaluate it for themselves.

Posted at teatime on Tuesday, September 17th, 2013 Tags:

Yesterday I wrote a serious, user-oriented post about running applications directly on the Xen hypervisor. Today I compensate for the seriousness by writing a why-so-serious, happy-buddha type kernel hacker post. This post is about using NetBSD kernel PCI drivers in rump kernels on Xen, with device access courtesy of Xen PCI passthrough.

I do not like hardware. The best thing about hardware is that it gives software developers the perfect excuse to blame something else for their problems. The second best thing about hardware is that most of the time you can fix problems with physical violence. The third best thing about hardware is that it enables running software. Apart from that, the characteristics of hardware are undesirable: you have to possess the hardware, it does not virtualize nicely, it is a black box subject to the whims of whoever documented it, etc. Since rump kernels are targeting reuse and virtualization of kernel drivers in an environment-agnostic fashion, needless to say, there is a long-standing, uneasy truce between hardware drivers and rump kernels.

Many years ago I did work which enabled USB drivers to run in rump kernels. The approach was to use the ugen device node to access the physical device from userspace. In other words, the layers which transported the USB protocol to and from the device remained in the host kernel, while the interpretation of the contents was moved to userspace; a USB host controller driver was written to act as the middleman between these two. While the approach did make it possible to run USB drivers such as umass and ucom, and it did give me much-needed exercise in the form of having to plug and unplug USB devices while testing, the whole effort was not entirely successful. The lack of success was due to too much of the driver stack, namely the USB host controller and ugen drivers, residing outside of the rump kernel. The first effect was that due to my in-userspace development exercising in-kernel code (via the ugen device) in creative ways, I experienced way too many development host kernel panics. Some of the panics could be fixed, while others were more in the department "well I have no idea why it decided to crash now or how to repeat the problem". The second effect was being able to use USB drivers in rump kernels only on NetBSD hosts, again foiling environment-agnostism (is that even a word?). The positive side-effect of the effort was adding ioconf and pseudo-root support to config(1), thereby allowing modular driver device tree specifications to be written in the autoconf DSL instead of having to be open-coded into the driver in C.
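
For illustration, such a device tree specification in the autoconf DSL looks roughly like the following sketch for a umass-style stack. The include paths and attachment lines here are approximations from memory; the authoritative examples are the ioconf files in the NetBSD source tree:

```
ioconf umass

include "conf/files"
include "dev/usb/files.usb"

pseudo-root usb*

uhub*	at usb?
uhub*	at uhub? port ?
umass*	at uhub? port ? configuration ? interface ?
```

The pseudo-root directive is what lets the component's device tree stand on its own instead of having to attach somewhere inside a full kernel configuration.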

In the years that followed, the question of rump kernels supporting real device drivers which did not half-hide behind the host's skirt became a veritable FAQ. My answer remained the same: "I don't think it's difficult at all, but there's no way I'm going to do it since I hate hardware". While it was possible to run specially crafted drivers in conjunction with rump kernels, e.g. DPDK drivers for PCI NICs, using any NetBSD driver and supported device was not possible. However, after bolting rump kernels to run on top of Xen, the opportunity to investigate Xen's PCI passthrough capabilities presented itself, and I did end up with support for PCI drivers. Conclusion: I cannot be trusted to not do something.

The path to making PCI devices work consisted of taking n small steps. The trick was staying on the path instead of heading toward the light. If you do the "imagine how it could work and then make it work like that" development like I do, you'll no doubt agree that the steps presented below are rather obvious. (The relevant NetBSD man pages are linked in parentheses. Also note that the implementations of these interfaces are machine-dependent (MD) in NetBSD, making for a clean cut into the NetBSD kernel architecture.)

  1. passing PCI config space read and writes to the Xen hypervisor (pci(9))
  2. mapping the device memory space into the Xen guest and providing access methods (bus_space(9))
  3. mapping Xen events channels to driver interrupt handlers (pci_intr(9))
  4. allocating DMA-safe memory and translating memory addresses to and from machine addresses, which are even more physical than physical addresses (bus_dma(9))

On the Xen side of things, the hypercalls for all of these tasks are more or less one-liner calls into the Xen Mini-OS (which, if you read my previous post, is the layer which takes care of the lowest level details of running rump kernels on top of Xen).

And there we have it, NetBSD PCI drivers running on a rump kernel on Xen. The two PCI NIC drivers I tested both even pass the all-encompassing ping test (and the can-configure-networking-using-dhcp test too). There's nothing like a dmesg to brighten the day.

Closing thoughts: virtual machine emulators are great, but you lose the ability to kick the hardware.

Posted late Wednesday afternoon, September 18th, 2013 Tags:
Just a small update on the previous post about firefox on sparc64: after a bit more work, the brand new version 24 ESR builds straight from pkgsrc (so should be included in the next set of binary pkgs).

All open issues (wrong colours on scaled images, failing https, ...) have been resolved.

Here is a new screenshot:

Posted at lunch time on Monday, September 23rd, 2013 Tags:

The NetBSD Project is pleased to announce NetBSD 5.2.1, the first security/bugfix update of the NetBSD 5.2 release branch, and NetBSD 5.1.3, the third security/bugfix update of the NetBSD 5.1 release branch. They represent a selected subset of fixes deemed important for security or stability reasons, and if you are running a release of NetBSD prior to 5.1.3, we recommend that you update to a supported NetBSD 5.x or NetBSD 6.x version.

For more details, please see the NetBSD 5.2.1 release notes or NetBSD 5.1.3 release notes.

Complete source and binaries for NetBSD 5.2.1 and NetBSD 5.1.3 are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, SUP, and other services may be found at http://www.NetBSD.org/mirrors/.

Updates to NetBSD 6.x will be coming in the next few days.

Posted Saturday evening, September 28th, 2013 Tags:

The NetBSD Project is pleased to announce NetBSD 6.1.2, the second security/bugfix update of the NetBSD 6.1 release branch, and NetBSD 6.0.3, the third security/bugfix update of the NetBSD 6.0 release branch. They represent a selected subset of fixes deemed important for security or stability reasons, and if you are running an earlier release of NetBSD 6.x, we recommend that you update.

For more details, please see the NetBSD 6.1.2 release notes or NetBSD 6.0.3 release notes.

Complete source and binaries for NetBSD 6.1.2 and NetBSD 6.0.3 are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, SUP, and other services may be found at http://www.NetBSD.org/mirrors/.

Posted in the wee hours of Sunday night, September 30th, 2013 Tags: