There are a number of motivations for running applications directly on top of the Xen hypervisor without resorting to a full general-purpose OS. For example, one might want to maximally isolate applications with minimal overhead. Leaving the OS out of the picture decreases overhead, since, for example, the inter-application protection normally offered by virtual memory is already handled by the Xen hypervisor. However, at the same time problems arise: applications expect and use many services normally provided by the OS, for example files, sockets, event notification and so forth. We were able to set up a production-quality environment for running applications as Xen DomUs in a few weeks by reusing hundreds of thousands of lines of unmodified driver and infrastructure code from NetBSD. While that amount of driver code may sound like a lot for running single applications, keep in mind that it covers, for example, file systems, the TCP/IP stack, stdio, system calls and so forth -- the innocent-looking open() alone accepts over 20 flags which must be properly handled. The remainder of this post looks at the effort in more detail.
Shortly after I had published the above, I was contacted by Justin Cormack, with whom I had collaborated earlier on his ljsyscall project, which provides system call interfaces to Lua programs. He wanted to run the LuaJIT interpreter and his ljsyscall implementation directly on top of Xen. However, in addition to system calls, which were already handled by the rump kernel, the LuaJIT interpreter uses interfaces from libc, so we added the NetBSD libc to the mix. While the libc sources are currently hosted on GitHub, we plan to integrate the changes into the upstream NetBSD sources as soon as things settle (see the repo for instructions on how to produce a diff and verify that the changes really are tiny). The same repository hosts the math library libm, but it is there purely for convenience, so that these early builds deliver everything from a single checkout. It has been verified that you can alternatively use libm from a standard NetBSD binary distribution, as you presumably could any other user-level library. The resulting architecture is depicted below.
The API and ABI we provide are the same as those of a regular NetBSD installation. Apart from some limitations, such as the absence of fork() -- would it duplicate the DomU? -- objects compiled for a regular NetBSD installation can be linked into the DomU image and booted to run directly as standalone applications on top of Xen. As proofs of concept, I created a demo where a Xen DomU configures TCP/IP networking, mounts a file system image, and runs an httpd daemon to serve the contents, and Justin's demo runs the LuaJIT interpreter and executes the self-test suite for ljsyscall. Though there is solid support for running applications, not all of the work is done. Especially the build framework needs to be more flexible, and everyone who has a use case for this technology is welcome to test out their application and contribute ideas and code for improving the framework.
In conclusion, we have shown that it is straightforward to reuse both kernel and library code from an existing real-world operating system in creating an application environment which can run on top of a bare-metal type cloud platform. By being able to use on the order of 99.9% of the code -- that's 1,000 lines written per 1,000,000 used unmodified -- from an existing, real-world proven source, the task was quick to pull off, the result is robust, and the offered application interfaces are complete. Some might call our work "yet another $fookernel", but we call it a working result and challenge everyone to evaluate it for themselves.
Yesterday I wrote a serious, user-oriented post about running applications directly on the Xen hypervisor. Today I compensate for the seriousness by writing a why-so-serious, happy-buddha type kernel hacker post. This post is about using NetBSD kernel PCI drivers in rump kernels on Xen, with device access courtesy of Xen PCI passthrough.
I do not like hardware. The best thing about hardware is that it gives software developers the perfect excuse to blame something else for their problems. The second best thing about hardware is that most of the time you can fix problems with physical violence. The third best thing about hardware is that it enables running software. Apart from that, the characteristics of hardware are undesirable: you have to possess the hardware, it does not virtualize nicely, it is a black box subject to the whims of whoever documented it, etc. Since rump kernels target reuse and virtualization of kernel drivers in an environment-agnostic fashion, needless to say, there is a long, uneasy truce between hardware drivers and rump kernels.
Many years ago I did work which enabled USB drivers to run in rump kernels. The approach was to use the ugen device node to access the physical device from userspace. In other words, the layers which transported the USB protocol to and from the device remained in the host kernel, while the interpretation of the contents was moved to userspace; a USB host controller driver was written to act as the middleman between these two. While the approach did allow running USB drivers such as umass and ucom, and it did give me much-needed exercise in the form of having to plug and unplug USB devices while testing, the whole effort was not entirely successful. The lack of success was due to too much of the driver stack, namely the USB host controller and ugen drivers, residing outside of the rump kernel. The first effect was that due to my in-userspace development exercising in-kernel code (via the ugen device) in creative ways, I experienced way too many development host kernel panics. Some of the panics could be fixed, while others were more in the department of "well, I have no idea why it decided to crash now or how to repeat the problem". The second effect was being able to use USB drivers in rump kernels only on NetBSD hosts, again foiling environment-agnosticism (is that even a word?). The positive side effect of the effort was adding ioconf and pseudo-root support to config(1), thereby allowing modular driver device tree specifications to be written in the autoconf DSL instead of having to be open-coded into the driver in C.
In the years that followed, the question of rump kernels supporting real device drivers which did not half-hide behind the host's skirt became a veritable FAQ. My answer remained the same: "I don't think it's difficult at all, but there's no way I'm going to do it since I hate hardware". While it was possible to run specially crafted drivers in conjunction with rump kernels, e.g. DPDK drivers for PCI NICs, using any NetBSD driver and supported device was not possible. However, after bolting rump kernels to run on top of Xen, the opportunity to investigate Xen's PCI passthrough capabilities presented itself, and I did end up with support for PCI drivers. Conclusion: I cannot be trusted to not do something.
The path to making PCI devices work consisted of taking n small steps. The trick was staying on the path instead of heading toward the light. If you do the "imagine how it could work and then make it work like that" style of development like I do, you'll no doubt agree that the steps presented below are rather obvious. (The relevant NetBSD man pages are linked in parentheses. Also note that the implementations of these interfaces are MD in NetBSD, making for a clean cut into the NetBSD kernel architecture.)
- passing PCI config space read and writes to the Xen hypervisor (pci(9))
- mapping the device memory space into the Xen guest and providing access methods (bus_space(9))
- mapping Xen events channels to driver interrupt handlers (pci_intr(9))
- allocating DMA-safe memory and translating memory addresses to and from machine addresses, which are even more physical than physical addresses (bus_dma(9))
On the Xen side of things, the hypercalls for all of these tasks are more or less one-liner calls into the Xen Mini-OS (which, if you read my previous post, is the layer which takes care of the lowest level details of running rump kernels on top of Xen).
And there we have it, NetBSD PCI drivers running on a rump kernel on Xen. The two PCI NIC drivers I tested both even pass the all-encompassing ping test (and the can-configure-networking-using-dhcp test too). There's nothing like a dmesg to brighten the day.
Closing thoughts: virtual machine emulators are great, but you lose the ability to kick the hardware.
All open issues (wrong colours on scaled images, failing https, ...) have been resolved.
Here is a new screenshot:
The NetBSD Project is pleased to announce NetBSD 5.2.1, the first security/bugfix update of the NetBSD 5.2 release branch, and NetBSD 5.1.3, the third security/bugfix update of the NetBSD 5.1 release branch. They represent a selected subset of fixes deemed important for security or stability reasons; if you are running a release of NetBSD prior to 5.1.3, you are recommended to update to a supported NetBSD 5.x or NetBSD 6.x version. Download links are available at http://www.NetBSD.org/mirrors/.
Updates to NetBSD 6.x will be coming in the next few days.
The NetBSD Project is pleased to announce NetBSD 6.1.2, the second security/bugfix update of the NetBSD 6.1 release branch, and NetBSD 6.0.3, the third security/bugfix update of the NetBSD 6.0 release branch. They represent a selected subset of fixes deemed important for security or stability reasons; if you are running an earlier release of NetBSD 6.x, you are recommended to update. Download links are available at http://www.NetBSD.org/mirrors/.