
Archives

This page is a blog mirror of sorts. It pulls in articles from the blog's feed and publishes them here (with a feed, too).

Six months ago, I told myself I would write a small hypervisor for an old x86 AMD CPU I had. Just to learn more about virtualization, and see how far I could go alone in my spare time. Today, it turns out that I've gone as far as implementing a full, fast and flexible virtualization stack for NetBSD. I'd like to present some aspects of it here.

Design Aspects

General Considerations

In order to achieve hardware-accelerated virtualization, two components need to interact together:
  • A kernel driver that switches the machine's CPU to a mode where it can safely execute guest instructions.
  • A userland emulator, which talks to the kernel driver to run virtual machines.
Simply put, the emulator asks the kernel driver to run virtual machines, and the kernel driver runs them until a VM exit occurs. When this happens, the kernel driver returns to the emulator, telling it along the way why the VM exit occurred. Such exits can be caused, for instance, by I/O accesses that the virtual machine is not allowed to perform and that the emulator therefore has to virtualize.

The NVMM Design

NVMM provides the infrastructure needed for both the kernel driver and the userland emulators.

The kernel NVMM driver comes as a kernel module that can be dynamically loaded into the kernel. It is made of a generic machine-independent frontend and of several machine-dependent backends. In practice, this means that NVMM is not specific to x86, and could support 64-bit ARM for example. During initialization, NVMM selects the appropriate backend for the system. The frontend handles everything that is not CPU-specific: the virtual machines, the virtual CPUs, the guest physical address spaces, and so forth. The frontend also has an IOCTL interface that a userland emulator can use to communicate with the driver.

When it comes to the userland emulators, NVMM does not provide one. In other words, it does not re-implement a Qemu, a VirtualBox, a Bhyve (FreeBSD) or a VMD (OpenBSD). Rather, it provides a virtualization API via the libnvmm library, which makes it easy to add NVMM support to already existing emulators. This API is meant to be simple and straightforward, and is fully documented. It has some similarities with WHPX on Windows and HVF on macOS.


Fig. A: General overview of the NVMM design.

The Virtualization API: An Example

The virtualization API is installed by default on NetBSD. The idea is to provide an easy way for applications to use NVMM to implement services, ranging from small sandboxing systems to advanced system emulators.

Let's put ourselves in the context of a simple C application we want to write, to briefly showcase the virtualization API. Note that this API may change a little in the future.

Creating Machines and VCPUs

In libnvmm, each machine is described by an opaque nvmm_machine structure. We start with:

#include <nvmm.h>
...
	struct nvmm_machine mach;
	nvmm_machine_create(&mach);
	nvmm_vcpu_create(&mach, 0);

This creates a machine in 'mach', and then creates VCPU number zero (VCPU0) in this machine. This VM is associated with our process, so if our application gets killed or exits bluntly, NVMM will automatically destroy the VM.

Fetching and Setting the VCPU State

In order to operate our VM, we need to be able to fetch and set the state of its VCPU0, that is, the content of VCPU0's registers. Let's say we want to set the value '123' in VCPU0's RAX register. We can do this by adding four more lines:

	struct nvmm_x64_state state;
	nvmm_vcpu_getstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);
	state.gprs[NVMM_X64_GPR_RAX] = 123;
	nvmm_vcpu_setstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);

Here, we fetch the GPR component of the VCPU0 state (GPR stands for General Purpose Registers), we set RAX to '123', and we put the state back into VCPU0. We're done.

Allocating Guest Memory

Now it is time to give our VM some memory: let's say one single page. (What follows is a bit technical.)

The VM has its own MMU, which translates guest virtual addresses (GVA) to guest physical addresses (GPA). A secondary MMU (which we won't discuss) is set up by the host to translate the GPAs to host physical addresses. To give our single page of memory to our VM, we need to tell the host to create this secondary MMU.

Then, we will want to read/write data in the guest memory, that is to say, read/write data into our guest's single GPA. To do that, in NVMM, we also need to tell the host to associate the GPA we want to read/write with a host virtual address (HVA) in our application. The big picture:


Fig. B: Memory relations between our application and our VM.

In Fig. B above, if the VM wants to read data at virtual address 0x4000, the CPU will perform a GVA→GPA translation towards the GPA 0x3000. Our application is able to see the content of this GPA, via its virtual address 0x2000. For example, if our application wants to zero out the page, it can simply invoke:

	memset((void *)0x2000, 0, PAGE_SIZE);

With this system, our application can modify guest memory, by reading/writing to it as if it was its own memory. All of this sounds complex, but comes down to only the following four lines of code:

	uintptr_t hva = (uintptr_t)mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
	gpaddr_t gpa = 0x3000;
	nvmm_hva_map(&mach, hva, PAGE_SIZE);
	nvmm_gpa_map(&mach, hva, gpa, PAGE_SIZE, PROT_READ|PROT_WRITE);

Here we allocate a simple HVA in our application via mmap. Then, we turn this HVA into a special buffer that NVMM will be able to use. Finally, we tell the host to link the GPA (0x3000) towards the HVA. From then on, the guest is allowed to touch what it perceives as being a simple physical page located at address 0x3000, and our application can directly modify the content of this page by reading and writing into the address pointed to by 'hva'.
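
To make this concrete, here is a minimal sketch of how an application could use the 'hva' mapping to load a tiny piece of guest code into that page. The payload bytes are plain x86 machine code (add the BX register into AX, then halt); only 'mach', 'hva' and 'gpa' from the snippet above are reused, everything else is illustrative.

	/* Illustrative only: a two-instruction 16-bit guest payload. */
	static const uint8_t guest_code[] = {
		0x01, 0xD8,	/* add %bx,%ax - add BX into AX */
		0xF4		/* hlt         - stop, causing a VM exit */
	};

	/* Copy the payload into the guest page through our HVA mapping. */
	memcpy((void *)hva, guest_code, sizeof(guest_code));

	/*
	 * The VCPU would then be pointed at GPA 0x3000 (our 'gpa') by
	 * setting its instruction pointer with nvmm_vcpu_setstate(),
	 * in the same way as the register example shown earlier.
	 */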

Running the VM

The final step is running the VM for real. This is achieved with a VCPU Loop, which runs our VCPU0 and processes the different exit reasons, typically in the following form:

	struct nvmm_exit exit;
	while (1) {
		nvmm_vcpu_run(&mach, 0, &exit);
		switch (exit.reason) {
		case NVMM_EXIT_NONE:
			break; /* nothing to do */
		case ... /* completed as needed */
		}
	}

The nvmm_vcpu_run function blocks, and runs the VM until an exit or a rescheduling occurs.

Full Code

We're done now: we know how to create a VM and give it VCPUs, we know how to modify the registers of the VCPUs, we know how to allocate and modify guest memory, and we know how to run a guest.

Let's sum it all up in one concrete example: a calculator that runs inside a VM. This simple application receives two 16-bit integers as parameters, launches a VM that performs the addition of these two integers, fetches the result, and displays it.

Full code: calc-vm.c
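
Below is a rough sketch of the register plumbing such a calculator could use. It is not the linked calc-vm.c: 'op1' and 'op2' are hypothetical variables holding the two operands, and the NVMM_X64_GPR_RBX index name is an assumption modeled on the NVMM_X64_GPR_RAX name from the earlier example.

	/* Hypothetical sketch: pass the operands in via RAX/RBX. */
	struct nvmm_x64_state state;

	nvmm_vcpu_getstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);
	state.gprs[NVMM_X64_GPR_RAX] = op1;	/* first 16-bit operand */
	state.gprs[NVMM_X64_GPR_RBX] = op2;	/* second operand (index name assumed) */
	nvmm_vcpu_setstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);

	/* ... run the VCPU loop until the guest halts ... */

	/* Read the sum back out of RAX afterwards. */
	nvmm_vcpu_getstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);
	printf("result: %u\n", (unsigned)(state.gprs[NVMM_X64_GPR_RAX] & 0xFFFF));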

That's about it: we have our first NVMM-based application in less than 100 lines of C code, and it is an example of how NetBSD's new virtualization API can be used to easily implement VM-related services.

Advanced Use of the Virtualization API

Libnvmm can go further than just providing wrapper functions around IOCTLs. Simply put, certain exit reasons are very complex to handle, and libnvmm provides assists that can emulate certain guest operations on behalf of the userland emulator.

Libnvmm embeds a comprehensive machinery, made of three main components:

  • The MMU Walker: the component in charge of performing a manual GVA→GPA translation. It basically walks the MMU page tree of the guest; if the guest is running in x86 64bit mode for example, it will walk the four layers of pages in the guest to obtain a GPA.
  • The Instruction decoder: fetches and disassembles the guest instructions that cause MMIO exits. The disassembler uses a Finite State Machine. The result of the disassembly is summed up in a structure that is passed to the instruction emulator, possibly several times consecutively.
  • The instruction emulator: as its name indicates, it emulates the execution of an instruction. Contrary to many other disassemblers and hypervisors, NVMM makes a clear distinction between the decoder and the emulator.

An NVMM-based application can therefore avoid the burden of implementing these components, by just leveraging the assists provided in libnvmm.
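
As a rough illustration of how an emulator might lean on these assists from its VCPU loop, consider the sketch below. The nvmm_assist_mem()/nvmm_assist_io() names, their signatures, and the NVMM_EXIT_MEMORY/NVMM_EXIT_IO exit reasons are assumptions based on the description above; the actual entry points and the callbacks they require are documented in libnvmm(3).

	/* Hypothetical sketch: delegating complex exits to the libnvmm assists. */
	struct nvmm_exit exit;

	while (1) {
		nvmm_vcpu_run(&mach, 0, &exit);
		switch (exit.reason) {
		case NVMM_EXIT_MEMORY:		/* exit reason name assumed */
			/* MMU walk + instruction decode + emulation, done for us. */
			nvmm_assist_mem(&mach, 0, &exit);	/* signature assumed */
			break;
		case NVMM_EXIT_IO:		/* exit reason name assumed */
			nvmm_assist_io(&mach, 0, &exit);	/* signature assumed */
			break;
		default:
			break;
		}
	}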

Security Aspects

NVMM can be used in security products, such as sandboxing systems, to provide contained environments. Without elaborating more on my warplans, this is a project I've been thinking about for some time on NetBSD.

One thing you may have noticed from Fig. A is that the complex emulation machinery is not in the kernel, but in userland. This is an excellent security property of NVMM, because it reduces the risk for the host in case of a bug or vulnerability – the host kernel remains unaffected – and it also has the advantage of making the machinery easily fuzzable. Currently, this property is not found in other hypervisors such as KVM, HAXM or Bhyve, and I hope we'll be able to preserve it as we move forward with more backends.

Another security property of NVMM is that the assists provided by libnvmm are invoked only if the emulator explicitly called them. In other words, the complex machinery is not launched automatically, and an emulator is free not to use it if it doesn't want to. This can limit the attack surface of applications that create limited VMs, and want to keep things simple and under control as much as possible.

Finally, NVMM naturally benefits from the modern bug detection features available in NetBSD (KASAN, KUBSAN, and more), and from NetBSD's automated test framework.

Performance Aspects

Contrary to other pseudo-cross-platform kernel drivers such as VirtualBox or HAXM, NVMM is well integrated into the NetBSD kernel, and this allows us to optimize the context switches between the guests and the host, in order to avoid expensive operations in certain cases.

Another performance aspect of NVMM is the fact that, in order to implement the secondary MMU, NVMM uses NetBSD's pmap subsystem. This allows us to have pageable guest pages, which the host can allocate on demand to limit memory consumption, and can then swap out when it comes under memory pressure.

It also goes without saying that NVMM is fully MP-safe, and uses fine-grained locking to be able to run many VMs and many VCPUs simultaneously.

On the userland side, libnvmm tries to minimize the processing cost, by for example doing only a partial emulation of certain instructions, or by batching together certain guest IO operations. A lot of work has been done to try to reduce the number of syscalls an emulator would have to make, in order to increase the overall performance on the userland side; but there are several cases where it is not easy to keep a clean design.

Hardware Support

As of this writing, NVMM supports two backends, x86-SVM for AMD CPUs and x86-VMX for Intel CPUs. In each case, NVMM can support up to 128 virtual machines, each having a maximum of 256 VCPUs and 128GB of RAM.

Emulator Support

Armed with our full virtualization stack, our flexible backends, our user-friendly virtualization API, our comprehensive assists, and our swag NVMM logo, we can now add NVMM support in whatever existing emulator we want.

That's what was done in Qemu, with this patch, which shall soon be upstreamed. It uses libnvmm to provide hardware-accelerated virtualization on NetBSD.

It is now fully functional, and can run a wide variety of operating systems, such as NetBSD (of course), FreeBSD, OpenBSD, Linux, Windows XP/7/8.1/10, among others. All of that works equally across the currently supported NVMM backends, which means that Qemu+NVMM can be used on both AMD and Intel CPUs.

Windows 10 on Qemu+NVMM
Fig. C: Example, Windows 10 running on Qemu+NVMM, with 3 VCPUs, on a host that has a quad-core AMD CPU.

Fedora 29 on Qemu+NVMM
Fig. D: Example, Fedora 29 running on Qemu+NVMM, with 8 VCPUs, on a host that has a quad-core Intel CPU.

The instructions on how to use Qemu+NVMM are available on this page.

What Now

All of NVMM is available in NetBSD-current, and will be part of the NetBSD 9 release.

Even though it is perfectly functional, the Intel backend of NVMM is younger than its AMD counterpart, and it will probably receive some more performance and stability improvements.

There are also several design aspects that I haven't yet settled, because I haven't decided on the best way to address them yet.

Overall, I expect new backends to be added for other architectures than x86, and I also expect to add NVMM support in more emulators.

That's all, ladies and gentlemen. In six months of spare time, we went from Zero to NVMM, and now have a full virtualization stack that can run advanced operating systems in a flexible, fast and secure fashion.

Not bad

Posted late Tuesday evening, April 9th, 2019 Tags: blog
Over the past month I've finally managed to correct the masking semantics of crash signals (SIGSEGV, SIGTRAP, SIGILL, SIGFPE, SIGBUS). Additionally, I've fixed the masking semantics for fork(2) and vfork(2) events (they trigger a crash signal, SIGTRAP). There is remaining work in the signal semantics for other types of events (mainly thread-related). The coverage of signal code in the ptrace(2) regression tests keeps growing.

Crash signal masking

Certain applications and frameworks mask signals that occur during crashes. This can happen deliberately or by accident, when masking all signals in a process.

There are two basic types of signals in this regard:

  • emitted by a debugger-related event (such as software or hardware breakpoint),
  • emitted by another source, such as another process (kill(2)), or raised within a thread (raise(2)).
The NetBSD kernel did not distinguish between these two types of events, and regular signal masking affected both sources of these signals. This caused various side effects, such as a developer being unable to single-step code, or an application crashing due to abnormal conditions after placing a software trap and silently moving over it.

Not only debuggers were affected, but also software that reuses the debugging APIs internally, including the DTrace tools in userland.

Right now, the semantics of crash signals have been fixed for traps issued by crashes (such as a software breakpoint or a segmentation fault) and for fork(2)/vfork(2) events.
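
To make the scenario concrete, here is a minimal sketch of the child (tracee) side of such a case, assuming x86 and omitting error checks; it is an illustration, not a copy of the actual t_ptrace_wait* test code. With the fixed semantics, the debugger still observes the SIGTRAP stop even though the child has masked the signal.

	/* Illustrative tracee side of a traceme-style crash-signal test. */
	#include <sys/ptrace.h>
	#include <signal.h>

	void
	child_side(void)
	{
		sigset_t mask;

		ptrace(PT_TRACE_ME, 0, NULL, 0);	/* become traced by the parent */

		sigfillset(&mask);			/* mask everything, including SIGTRAP */
		sigprocmask(SIG_BLOCK, &mask, NULL);

		__asm__ __volatile__("int3");		/* software breakpoint (x86 only) */

		/*
		 * With the fix, the debugger sees a SIGTRAP stop here instead
		 * of the trap being silently swallowed by the signal mask.
		 */
	}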

New ATF tests for ptrace(2)

Browsing the available Linux resources with tests against ptrace(2), I was inspired to validate whether unaligned memory access works correctly through the PT_READ/PT_WRITE and PIOD_READ/PIOD_WRITE/PIOD_READ_AUXV operations. These calls are needed to transfer data between the memory of a debugger and a debuggee. They are documented and expected to be safe for potentially misaligned access. Newly added tests validate whether this is true.

It's much better to detect a potential problem with ATF than with a kernel crash during operation on a more alignment-sensitive CPU (most RISC ones).
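
For illustration, the debugger side of such a misaligned transfer might look roughly like the following sketch (not one of the new ATF tests; 'child' and 'base_addr' are hypothetical parameters and error handling is omitted):

	#include <sys/types.h>
	#include <sys/ptrace.h>

	/* Read debuggee memory at a (possibly misaligned) address. */
	static void
	read_misaligned(pid_t child, char *base_addr)
	{
		struct ptrace_io_desc piod;
		char buf[32];
		int word;

		/* Single-word read from the debuggee's data space. */
		word = ptrace(PT_READ_D, child, base_addr + 1, 0);
		(void)word;

		/* Bulk transfer through PT_IO at the same misaligned offset. */
		piod.piod_op = PIOD_READ_D;		/* read from the data space */
		piod.piod_offs = base_addr + 1;		/* misaligned on purpose */
		piod.piod_addr = buf;			/* destination in the debugger */
		piod.piod_len = sizeof(buf);
		ptrace(PT_IO, child, &piod, 0);
	}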

Plan for the next milestone

Keep preparing kernel fixes and, after thorough verification, apply them to the mainline distribution.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted in the wee hours of Wednesday night, April 4th, 2019 Tags: blog

Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

Originally, LLDB was ported to NetBSD by Kamil Rytarowski. However, multiple upstream changes and a lack of continuous testing have resulted in a decline of support. So far, we haven't been able to restore the previous state.

In February, I have started working on LLDB, as contracted by the NetBSD Foundation. My initial effort was focused on restoring continuous integration via buildbot and restoring core file support. You can read more about that in my Feb 2019 report.

In March, I have been continuing this work and this report aims to summarize what I have done and what challenges still lie ahead of me.

Followup from last month and buildbot updates

By the end of February, I was working on fixing the most urgent test failures in order to resume continuous testing on buildbot. In early March, I was able to get all the necessary fixes committed. The list includes:

  • fixing tests to always use libc++ to avoid libstdc++-related breakage: r355273,

  • enabling passing -pthread flag: r355274,

  • enabling support for finding libc++ headers relatively to the executable location: r355282,

  • fixing additional case of bind/connect address mismatch: r355285,

  • passing appropriate -L and -Wl,-rpath flags for tests: r355502 with followup test fix in r355510.

The commit series also included an update for the code finding the main executable location to use sysctl(): r355283. However, I've reverted it afterwards as it did not work reliably: r355302. Since this was neither really necessary nor unanimously considered correct, I've abandoned the idea.

Once those initial issues were fixed, I was able to enable the full LLDB test suite on the buildbot. Based on the initial results, I updated the list of tests known to fail on NetBSD: r355320, and later r355774, and started looking into ‘flaky’ tests.

Tests are called flaky if they can either pass or fail unpredictably — usually as a result of race conditions, timeouts and other events that may depend on execution order, system load, etc. The LLDB test suite provides a workaround for flaky tests — through executing them multiple times, and requiring only one of the runs to pass.

I initially attempted to take advantage of this, committing r355830, then r355838. However, this approach turned out to be suboptimal. Sadly, marking more tests flaky only yielded more failures than before — presumably because of the load increase from rerunning failing tests. Therefore, I've decided it would be more prudent to focus on finding the root issue instead.

Currently we are facing a temporary interruption in our buildbot service. We will restore it as soon as possible.

Threaded and AArch64 core file support

The next task on the list was to finish the work on improving NetBSD core file support that was started by Kamil Rytarowski almost two years ago. This involved rebasing his old patches after major upstream refactoring, adding tests, and addressing remarks from upstream.

Firstly, I've addressed support for core files created from threaded programs. The new code itself landed as r355736. However, one of the tests was initially broken as it relied on symbol names provided by libc, and therefore failed on non-NetBSD systems. After initially disabling the test, I've finally fixed it by refactoring the code to fail in a regular program function rather than a libc call: r355786.

Secondly, I've added support for core files from AArch64 systems. To achieve this, I have set up a QEMU VM with a lot of help from Jared McNeill. For completeness, I include his very helpful instructions here:

  1. Fetch http://snapshots.linaro.org/components/kernel/leg-virt-tianocore-edk2-upstream/latest/QEMU-AARCH64/RELEASE_GCC49/QEMU_EFI.fd

  2. Fetch and uncompress latest arm64.img.gz from http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/latest/evbarm-aarch64/binary/gzimg/

  3. Use qemu-img resize arm64.img <newsize> to expand the image for your needs

  4. Start qemu:

    SMP=4
    MEM=2g
    qemu-system-aarch64 -M virt -machine gic-version=3 -cpu cortex-a53 -smp $SMP -m $MEM \
       -drive if=none,file=arm64.img,id=hd0 -device virtio-blk-device,drive=hd0 \
       -netdev type=user,id=net0 -device virtio-net-device,netdev=net0,mac=00:11:22:33:44:55 \
       -bios QEMU_EFI.fd \
       -nographic

With this approach, I've finally been able to run NetBSD/arm64 via QEMU. I used it to create matching core dumps for the tests, and afterwards implemented AArch64 support: r357399.

Other improvements

During the month, upstream has introduced a new SBReproducer module that wrapped most of the LLDB API. The original implementation involved generating the wrappers for all modules in a single file. Combined with heavy use of templates, building this file caused huge memory consumption, making it impossible to build on systems with 4 GiB of RAM. I've discussed possible solutions with upstream and finally implemented one that splits the wrappers and moves them into the individual modules: r356481.

Furthermore, in an effort to reduce the flakiness of tests, I've worked on catching and fixing more functions potentially interrupted via signals (EINTR): r356703, with a fixup in r356960. Once the LLVM bot is back, we will check whether that solved our flakiness problems.

LLVM 8 release

Additionally, LLVM 8.0.0 was finally released during the month. Following Kamil's request, by the end of the month I started working on updating NetBSD src for this release. This is still a work in progress, and I've only managed to get LLVM and Clang to build (with other system components failing afterwards). Nevertheless, if you're interested, the current set of changes can be seen on my GitHub fork of netbsd/src, in the llvm8 branch. The URL for comparison is: https://github.com/NetBSD/src/compare/6e11444..mgorny:llvm8

Please note that for practical reasons this omits the commit updating distributed LLVM and Clang sources.

Future plans

The plans for the nearest future include finishing the efforts mentioned here and working with others on resuming the buildbot. I have also discussed this with Pavel Labath, and he suggested working on improving error reporting to help us resolve current and future test failures.

The next milestones in the LLDB development plan are:

  1. Add support for FPU registers for NetBSD/i386 and NetBSD/amd64.

  2. Support XSAVE, XSAVEOPT, ... registers in core(5) files on NetBSD/amd64.

This work is sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted late Tuesday evening, April 2nd, 2019 Tags: blog
Kernel signal code is a complex maze; it's very difficult to introduce non-trivial changes without regressions. Over the past month I worked on covering missing elementary scenarios involving the ptrace(2) API. Some of the new tests are marked as expected to succeed; however, a number of them are expected to fail.

The NetBSD distribution changes

I've also introduced non-ptrace(2)-related changes, namely in the domain of kernel sanitizers, kernel fixes, and corresponding ATF tests. I won't discuss them further as they were beyond the ptrace(2) scope. These changes were largely stimulated by students preparing for summer work as a part of Google Summer of Code.

The ptrace(2) ATF commits landed into the repository:

  • Define PTRACE_ILLEGAL_ASM for NetBSD/amd64 in ptrace.h
  • Enable 3 new ptrace(2) tests for SIGILL
  • Refactor GPR and FPR tests in t_ptrace_wait* tests
  • Refactor definition of PT_STEP tests into single macro
  • Correct a style in description of PT_STEP tests in t_ptrace_wait*
  • Refactor kill* test in t_ptrace_wait*
  • Add infinite_thread() for ptrace(2) ATF tests
  • Add initial pthread(3) tests in ATF t_prace_wait* tests
  • Link t_ptrace_wait* tests with -pthread
  • Initial refactoring of siginfo* tests in t_ptrace_wait*
  • Drop siginfo5 from ATF tests in t_ptrace_wait*
  • Merge siginfo6 into other PT_STEP tests in t_ptrace_wait*
  • Rename the siginfo4 test in ATF t_ptrace_wait*
  • Refactor lwp_create1 and lwp_exit1 into trace_thread* in ptrace(2) tests
  • Rename signal1 to signal_mask_unrelated in t_ptrace_wait*
  • Add new regression scenarios for crash signals in t_ptrace_wait*
  • Replace signal2 in t_ptrace_wait* with new tests
  • Add new ATF tests traceme_raisesignal_ignored in t_ptrace_wait*
  • Add new ATF tests traceme_signal{ignored,masked}_crash* in t_ptrace_wait*
  • Add additional assert in traceme_signalmasked_crash t_ptrace_wait* tests
  • Add additional assert in traceme_signalignored_crash t_ptrace_wait* tests
  • Remove redundant test from ATF t_ptrace_wait*
  • Add new ATF t_ptrace_wait* vfork(2) tests
  • Add minor improvements in unrelated_tracer_sees_crash in t_ptrace_wait*
  • Add more tests for variations of unrelated_tracer_sees_crash in ATF
  • Replace signal4 (PT_STEP) test with refactored ones with extra asserts
  • Add signal masked and ignored variations of traceme_vfork_exec in ATF tests
  • Add signal masked and ignored variations of traceme_exec in ATF tests
  • Drop signal5 test-case from ATF t_ptrace_wait*
  • Refactor signal6-8 tests in t_ptrace_wait*

Trap signals processing without signal context reset

The current NetBSD kernel approach to processing crash signals (SEGV, FPE, BUS, ILL, TRAP) is to reset the signal context. This behavior was introduced as an intermediate and partially legitimate fix for cases of masking a crash signal that was causing an infinite loop in a dying process.

The expected behavior is to never reset the signal context of a trap signal (or any other signal) when the process is executed under a debugger. In order to achieve these semantics, I introduced a fix for this for the first time last year, but I had to revert it quickly, as it caused side-effect breakage that was not covered by the ATF ptrace(2) regression tests existing at that time. This time I made sure to cover upfront almost all the interesting scenarios that are required to function properly. Surprisingly, after grabbing the old faulty fix and improving it locally, the current signal maze code still caused various side effects in corner cases, such as translating SIGKILL in certain tests into the previous trap signal (like SIGSEGV). In other cases the side-effect behavior seems even stranger, as one test hangs only against a certain type of wait(2)-like function (waitid(2)), and executes without hangs against the other wait(2)-like function types.

For reference, such surprises can be reproduced with the following patch:

Index: sys/kern/kern_sig.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_sig.c,v
retrieving revision 1.350
diff -u -r1.350 kern_sig.c
--- sys/kern/kern_sig.c	29 Nov 2018 10:27:36 -0000	1.350
+++ sys/kern/kern_sig.c	3 Mar 2019 19:26:54 -0000
@@ -911,13 +911,25 @@
 	KASSERT(!cpu_intr_p());
 	mutex_enter(proc_lock);
 	mutex_enter(p->p_lock);
+
+	if (ISSET(p->p_slflag, PSL_TRACED) &&
+	    !(p->p_pptr == p->p_opptr && ISSET(p->p_lflag, PL_PPWAIT))) {
+		p->p_xsig = signo;
+		p->p_sigctx.ps_faked = true; // XXX
+		p->p_sigctx.ps_info._signo = signo;
+		p->p_sigctx.ps_info._code = ksi->ksi_code;
+		sigswitch(0, signo, false);
+		// XXX ktrpoint(KTR_PSIG)
+		mutex_exit(p->p_lock);
+		return;
+	}
+
 	mask = &l->l_sigmask;
 	ps = p->p_sigacts;
 
-	const bool traced = (p->p_slflag & PSL_TRACED) != 0;
 	const bool caught = sigismember(&p->p_sigctx.ps_sigcatch, signo);
 	const bool masked = sigismember(mask, signo);
-	if (!traced && caught && !masked) {
+	if (caught && !masked) {
 		mutex_exit(proc_lock);
 		l->l_ru.ru_nsignals++;
 		kpsendsig(l, ksi, mask);

Such changes need proper investigation, and the bugs that are now easier to detect with the extended test suite need to be addressed.

Plan for the next milestone

Keep preparing kernel fixes and, after thorough verification, apply them to the mainline.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted early Monday morning, March 4th, 2019 Tags: blog

Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

Originally, LLDB was ported to NetBSD by Kamil Rytarowski. However, multiple upstream changes and a lack of continuous testing have resulted in a decline of support. So far, we haven't been able to restore the previous state.

In February, I started working on LLDB, as contracted by the NetBSD Foundation. My first four goals, as detailed in the previous report, were:

  1. Restore tracing in LLDB for NetBSD (i386/amd64/aarch64) for single-threaded applications.

  2. Restore execution of LLDB regression tests, unless there is need for a significant LLDB or kernel work, mark detected bugs as failing or unsupported ones.

  3. Enable execution of LLDB regression tests on the buildbot in order to catch regressions.

  4. Upstream NetBSD (i386/amd64) core(5) support. Develop LLDB regression tests (and the testing framework enhancement) as requested by upstream.

Of those tasks, I consider running regression tests on the buildbot the highest priority. Bisecting regressions post-factum is hard due to long build times, and having continuous integration working is going to be very helpful for maintaining the code long-term.

In this report, I'd like to summarize what I achieved and what technical difficulties I met.

The kqueue interoperability issues

Given no specific clue as to why LLDB was no longer able to start processes on NetBSD, I've decided to start by establishing the status of the test suites. More specifically, I've started with a small subset of the LLDB test suite — the unittests. In this section, I'd like to focus on two important issues I had with them.

Firstly, one of the tests was hanging indefinitely. As I established, the purpose of the test was to check whether the main loop implementation correctly detects and reports when all the slaves of a pty are disconnected (and therefore reads on the master would fail). Through debugging, I came to the conclusion that kevent() was not reporting this particular scenario.

I have built a simple test case (which is now part of kqueue ATF tests) and confirmed it. Afterwards, I have attempted to establish whether this behavior is correct. While kqueue(2) does not mention ptys specifically, it states the following for pipes:

Fifos, Pipes

Returns when there is data to read; data contains the number of bytes available.

When the last writer disconnects, the filter will set EV_EOF in flags. This may be cleared by passing in EV_CLEAR, at which point the filter will resume waiting for data to become available before returning.

Furthermore, my test program indicated that FreeBSD exhibits the described EV_EOF behavior. Therefore, I decided to write a kernel patch adding this functionality, submitted it for review and eventually committed it after applying helpful suggestions from Robert Elz ([PATCH v3] kern/tty_pty: Fix reporting EOF via kevent and add a test case). I have also disabled the test case temporarily since the functionality is non-critical to LLDB (r353545).
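
A minimal sketch of the scenario the test case exercises might look like this (openpty() from util.h is used to obtain a pty pair, so it would link with -lutil; error handling is omitted):

	#include <sys/types.h>
	#include <sys/event.h>
	#include <sys/time.h>
	#include <unistd.h>
	#include <util.h>

	int
	main(void)
	{
		int master, slave, kq;
		struct kevent ev;

		openpty(&master, &slave, NULL, NULL, NULL);	/* pty pair */

		kq = kqueue();
		EV_SET(&ev, master, EVFILT_READ, EV_ADD, 0, 0, NULL);
		kevent(kq, &ev, 1, NULL, 0, NULL);		/* register the master */

		close(slave);					/* last slave disconnects */

		/* With the fix, this returns with EV_EOF set in ev.flags. */
		kevent(kq, NULL, 0, &ev, 1, NULL);
		return (ev.flags & EV_EOF) ? 0 : 1;
	}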

Secondly, a few gdbserver-based tests were flaky — i.e. they unpredictably passed or failed on every iteration. I've started debugging this with a test whose purpose was to check verbose error message support in the protocol. To my surprise, it seemed as if gdbserver worked fine as far as the error message exchange was concerned. This packet was followed by a termination request from the client — and it seemed that the server sometimes replied to it correctly, and sometimes terminated just before receiving it.

While working on this particular issue, I've noticed a few deficiencies in LLDB's error handling. In this case, this involved two major issues:

  1. gdbserver ignored errors from the main loop. As a result, if kevent() failed, it silently exited with a successful status. I've fixed it to catch and report the error verbosely instead: r354030.

  2. The main loop reported a meaningless return value (-1) from kevent(). I've established that most likely all kevent() implementations use errno instead, and made the function return it: r354029.

After applying those two fixes, gdbserver clearly indicated the problem: kevent() returned due to EINTR (i.e. the process receiving a signal). Lacking correct handling for this value, the main loop implementation wrongly treated it as a fatal error and terminated the program. I've fixed this by implementing EINTR support for kevent() in r354122.
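
The shape of such a fix is the classic EINTR retry loop; a generic sketch (not the actual LLDB code from r354122) is shown below.

	#include <sys/types.h>
	#include <sys/event.h>
	#include <sys/time.h>
	#include <errno.h>

	/* Wait for events, retrying when kevent() is interrupted by a signal. */
	static int
	wait_for_events(int kq, struct kevent *out, size_t nout)
	{
		int n;

		do {
			n = kevent(kq, NULL, 0, out, nout, NULL);
		} while (n == -1 && errno == EINTR);	/* EINTR is not a fatal error */

		return n;	/* -1 with errno set on real errors */
	}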

This trivial fix not only resolved most of the flaky tests but also turned out to be the root cause for LLDB being unable to start processes. Therefore, at this point tracing for single-threaded processes was restored on amd64. Testing on other platforms is pending.

Now, for the moral: working error reporting can save a lot of time.

Socket issues

The next issue I hit while working on the unittests is rather curious, and I have to admit I haven't managed to either find the root cause or build a good reproducer for it. Nevertheless, I seem to have caught the gist of it and found a good workaround.

The test in question focuses on the high-level socket API in LLDB. It is rather trivial — it binds a server in one thread, and tries to connect to it from a second thread. So far, so good. Most of the time the test works just fine. However, sometimes — especially early after booting — it hangs forever.

I've debugged this thoroughly and came to the following conclusion: the test binds to 127.0.0.1 (i.e. purely IPv4) but tries to connect to localhost. The latter results in the client trying IPv6 first, failing and then succeeding with IPv4. The connection is accepted, the test case moves forward and terminates successfully.

Now, in the failing case, the IPv6 connection attempt succeeds, even though there is no server bound to that port. As a result, the client part is happily connected to a non-existing service, and the server part hangs forever waiting for the connection to come.

I have attempted to reproduce this with an isolated test case, mimicking the use of threads, binding to port zero and the IPv4/IPv6 mixup, and I simply haven't been able to trigger the problem. However, curiously enough, my test case actually fixes it: if I start it before the LLDB unit tests, they work fine afterwards (until the next reboot).

Being unable to make any further progress on this weird behavior, I've decided to fix the test design instead — and make it connect to the same address it binds to: r353868.
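
The fixed design essentially means connecting to the exact bound address instead of resolving localhost again; a rough sketch using the standard sockets API (not the actual LLDB test code, error handling omitted) could look like this:

	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <arpa/inet.h>
	#include <string.h>

	int srv, cli;
	struct sockaddr_in sin;
	socklen_t slen = sizeof(sin);

	/* Server side: bind to 127.0.0.1 with an ephemeral port, then listen. */
	srv = socket(AF_INET, SOCK_STREAM, 0);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;
	bind(srv, (struct sockaddr *)&sin, sizeof(sin));
	listen(srv, 1);

	/* Learn the exact bound address, and connect to that same address. */
	getsockname(srv, (struct sockaddr *)&sin, &slen);
	cli = socket(AF_INET, SOCK_STREAM, 0);
	connect(cli, (struct sockaddr *)&sin, slen);	/* no name resolution involved */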

Getting the right toolchain for live testing

The largest problem so far was getting LLDB tests to interoperate with NetBSD's clang driver correctly. On other systems, clang either defaults to libstdc++, or has libc++ installed as part of the system (FreeBSD, Darwin). The NetBSD driver wants to use libc++ but we do not have it installed by default.

While this could be solved via installing libc++ on the buildbot host, I thought it would be better to establish a solution that would allow LLDB to use just-built clang — similarly to how other LLVM projects (such as OpenMP) do. This way, we would be testing the matching libc++ revision and users would be able to run the tests in a single checkout out of the box.

Sadly, this is non-trivial. While it could all be hacked into the driver itself, it does not really belong there. And while it is reasonable to link tests into the build tree, we wouldn't want regular executables built by the user to bind to it. This is why normally this is handled via the test system. However, the tests in LLDB are an accumulation of at least three different test systems, each one calling the compiler separately.

In order to establish a baseline for this, I have created wrappers for clang that added the necessary command-line options. The state-of-the-art wrapper for clang looked like the following:

#!/usr/bin/env bash

topdir=/home/mgorny/llvm-project/build-rel-master
cxxinc="-cxx-isystem $topdir/include/c++/v1"
lpath="-L $topdir/lib"
rpath="-Wl,-rpath,$topdir/lib"
pthread="-pthread"
libs="-lunwind"

# needed to handle 'clang -v' correctly
[ $# -eq 1 ] && [ "$1" = -v ] && exec $topdir/bin/clang-9-real "$@"
exec $topdir/bin/clang-9-real $cxxinc $lpath $rpath "$@" $pthread $libs

The actual executable I renamed to clang-9-real, and this wrapper replaced clang and a similar one replaced clang++. clang-cl was linked to the real executable (as it wasn't called in wrapper-relevant contexts), while clang-9 was linked to the wrapper.

After establishing a baseline of working tests, I've looked into migrating the necessary bits one by one to the driver and/or the LLDB test system, removing the migrated parts and verifying that the tests still pass.

My proposal so far involves, appropriately:

  1. Replacing -cxx-isystem with libc++ header search using a path relative to the compiler executable: D58592.

  2. Integrating -L and -Wl,-rpath with the LLDB test system: D58630.

  3. Adding NetBSD to list of platforms needing -pthread: r355274.

  4. The need for -lunwind is solved via switching the test failing due to the lack of it to use libc++ instead of libstdc++: r355273.

The reason for adjusting the libc++ header search in the driver rather than in the LLDB tests is that the path is specific to building against libc++, and the driver makes it convenient to adjust the path conditionally on the standard C++ library being used. In other words, it saves us from hard-coding the assumption that tests will be run against libc++ only.

I've gone for integrating -L into the test system, since we do not want to link arbitrary programs to the libraries in LLVM's build directory. Appending this path unconditionally should be otherwise harmless to LLDB's tests, so that is the easier way to go.

Originally I wanted to avoid appending RPATHs. However, it seems that the LD_LIBRARY_PATH solution that works for Linux does not work reliably on NetBSD with LLDB. Therefore, passing -Wl,-rpath along with -L allowed me to solve the problem more simply.

Furthermore, those design solutions match other LLVM projects. I've mentioned OpenMP before — so far we had to pass -cxx-isystem to its tests explicitly, but it passed -L for us. Those patches render passing -cxx-isystem unnecessary, and therefore make LLDB follow OpenMP's suit.

Finishing touches

Having a reasonably working compiler and the major regressions fixed, I have focused on establishing a baseline for running tests. The goal is to mark broken tests XFAIL or skip them. With all tests marked appropriately, we will be able to start running tests on the buildbot and catch regressions compared to this baseline. The current progress on this can be seen in D58527.

Sadly, besides failing tests there is still a small number of flaky or hanging tests which are non-trivial to detect. The upstream maintainer, Pavel Labath, is very helpful, and I hope to be able to finally get all the flaky tests either fixed or covered with his help.

Other fixes not worth a separate section include:

  • fixing compiler warnings about empty format strings: r354922,

  • fixing two dlopen() based test cases not to link -ldl on NetBSD: r354617,

  • finishing Kamil's patch for core file support: r354466, followup fix in r354483,

  • removing dead code in main loop: r354050,

  • fixing stand-alone builds after they've been switched to LLVMConfig.cmake: r353925,

  • skipping lldb-mi tests when Python support (needed by lldb-mi) is disabled: r353700,

  • fixing incorrect initialization of sigset_t (not actually used right now): r353675.

Buildbot updates

The last part worth mentioning is that the NetBSD LLVM buildbot has seen some changes. Notably, zorg r354820 included:

  • fixing the bot commit filtering to include all projects built,

  • renaming the bot to shorter netbsd-amd64,

  • and moving it to toolchain category.

One of the most useful functions of buildbot is that it associates every successive build with new commits. If a build fails, it blames the authors of those commits and reports the failure to them. However, for this to work buildbot needs to be aware of which projects are being tested.

Our buildbot configuration has been initially based on one used for LLDB, and it assumed LLVM, Clang and LLDB are the only projects built and tested. Over time, we've added additional projects but we failed to update the buildbot configs appropriately. Finally, with the help of Jonas Hahnfeld, Pavel Labath and Galina Kistanova we've managed to update the list and make the bot blame all projects correctly.

While at it, it was suggested that we rename the bot. The previous name was lldb-amd64-ninja-netbsd8, and others suggested that developers might ignore failures in other projects upon seeing lldb there. Kamil Rytarowski also pointed out that the version number confuses users into believing that we're running separate bots for different versions. The new name and category are meant to clearly indicate that we're running a single bot instance for multiple projects.

Quick summary and future plans

At this point, the most important regressions in LLDB have been fixed and it is able to debug simple programs on amd64 once again. The test suite patches are still waiting for review, and once they're approved I still need to work on flaky tests before we can reliably enable that on the buildbot. This is the first priority.

The next item on the TODO list is to take over and finish Kamil's patch for core files with threads. Most notably, the patch requires writing tests and verifying that there are no new bugs affecting it.

On a semi-related note, LLVM 8.0.0 will be released in a few days and I will probably be working on updating src to the new version. I will also try to convince Joerg to switch from the unmaintained libcxxrt to upstream libc++abi. Kamil also wanted to change the libc++ include path to match upstream (NetBSD is dropping the /v1 suffix at the moment).

Once this is done, the next big step is to fix threading support. Testing on non-amd64 arches is deferred until I gain access to some hardware.

This work is sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted Saturday evening, March 2nd, 2019 Tags: blog

For the 4th year in a row and for the 13th time, The NetBSD Foundation will participate in Google Summer of Code 2019!

If you are a student and would like to learn more about Google Summer of Code please go to the Google Summer of Code homepage.

You can find a list of projects in Google Summer of Code project proposals in the wiki.

Do not hesitate to get in touch with us via the #netbsd-code IRC channel on Freenode and via the NetBSD mailing lists!

Looking forward to a great summer!

Posted Wednesday afternoon, February 27th, 2019 Tags: blog

Starting this month, I will be focusing my effort on LLDB, the debugger of the LLVM toolchain, as work contracted by the NetBSD Foundation. In this entry, I would like to briefly summarize what I've been working on before, what I have been able to accomplish, and what I am going to do next.

Final status of LLD support

LLD is the link editor (linker) component of the Clang toolchain. Its main advantages over GNU ld are a much lower memory footprint and faster linking. I started working on LLD this month and encountered a few difficulties. I have explained them in detail in the first report on NetBSD LLD porting.

The aforementioned impasse between LLD and NetBSD toolchain maintainers still stands. A few comments have been exchanged but it doesn't seem that either of the sides have managed to convince the other. Right now, it seems that the most probable course of action for the future would be for NetBSD to maintain necessary changes as a local patchset.

To finish my work on LLD, I have committed devel/lld to pkgsrc. It is based on 7.0.1 release with NetBSD patches. I will update it to 8.0.0 once it is released.

Other work on LLVM

Besides the specific effort on LLD, I have been focusing on preparing and testing for the upcoming 8.0.0 release of LLVM. Upstream has set the branching point to Jan 16th, and we wanted to get all the pending changes merged if possible.

Of the compiler-rt patches previously submitted for review, the following changes have been merged (and will be included in 8.0.0):

  • added interceptor tests for clearerr, feof, ferror, fileno, fgetc, getc, ungetc (r350225)

  • fixed more interceptor tests to use assert in order to report potential errors verbosely (r350227)

  • fixed return type of devname_r() interceptor (r350228)

  • added interceptor tests for fputc, putc, putchar, getc_unlocked, putc_unlocked, putchar_unlocked (r350229)

  • added interceptor tests for popen, pclose (r350230)

  • added interceptor tests for funopen (r350231)

  • added interception support for popen, popenve, pclose (r350232)

  • added interception support for funopen* (r350233)

  • implemented FILE structure sanitization (r350882)

Additionally, the following changes have been made to other LLVM components and merged into 8.0.0:

  • enabled system-linker-elf LLD feature on NetBSD (NFC) (r350253)

  • made clang driver permit building instrumented code for all kinds of sanitizers (r351002)

  • added appropriate RPATH when building instrumented code with shared sanitizer runtime (e.g. via -shared-libasan option) (r352610)

Post-release commits were focused on fixing new or newly noticed bugs:

  • fixed the Mac-specific compilation-db test to not fail on systems where the getMainExecutable function's result depends on argv[0] being correct (r351752) (see below)

  • fixed formatting error in polly that caused tests to fail (r351808)

  • fixed missing -lutil linkage in LLDB that caused build using LLD as linker to fail (r352116)

Finding executable path in LLVM

The LLVM support library defines the getMainExecutable() function, whose purpose is to find the path to the currently executed program. It is used e.g. by clang to determine the driver mode depending on whether you executed clang or clang++, etc. It is also used to determine the resource directory when it is specified relative to the program installation directory.

The function implements a few different execution paths depending on the platform used:

  • on Apple platforms, it uses _NSGetExecutablePath()

  • on BSD variants and AIX, it does path lookup on argv[0]

  • on Linux and Cygwin, it uses /proc/self/exe, or argv[0] lookup if it is not present

  • on other platforms supporting dladdr(), it attempts to find the program path via Dl_info structure corresponding to the main function

  • on Windows, it uses GetModuleFileNameW()

For consistency, all symlinks are eliminated via realpath().

The different function versions require different arguments. The argv[0]-based methods require passing argv[0]; the dladdr method requires passing a pointer to the main function. Other variants ignore those parameters.

When the clang-check-mac-libcxx-fixed-compilation-db test was added to clang, it failed on NetBSD because the variant of getMainExecutable() used on NetBSD requires passing argv[0], and the specific part of Clang did not pass it correctly. However, the test authors did not notice the problem since non-BSD platforms normally do not use argv[0] to find the executable path.

I have determined three possible solutions here (ideally, all of them would be implemented simultaneously):

  1. Modifying getMainExecutable() to use KERN_PROC_PATHNAME sysctl on NetBSD (D56975).

  2. Fixing the compilation database code to pass argv[0] through.

  3. Adding -ccc-install-dir argument to the invocation in the test to force assuming specific install directory (D56976).

The sysctl change was already implemented historically (r303015). It was afterwards reverted (r303285) since it did not provide the expected paths on FreeBSD when the executable was referenced via multiple links. However, NetBSD does not suffer from the same issue, so we may switch back to sysctl.
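
On NetBSD, such a lookup would go through the kern.proc_args sysctl node; a minimal sketch is shown below, assuming the usual CTL_KERN/KERN_PROC_ARGS/KERN_PROC_PATHNAME MIB layout with -1 denoting the current process.

	#include <sys/param.h>
	#include <sys/sysctl.h>
	#include <stdio.h>

	int
	main(void)
	{
		char path[MAXPATHLEN];
		size_t len = sizeof(path);
		int mib[4] = { CTL_KERN, KERN_PROC_ARGS, -1, KERN_PROC_PATHNAME };

		/* Ask the kernel for the path of the current executable. */
		if (sysctl(mib, 4, path, &len, NULL, 0) == -1)
			return 1;

		printf("%s\n", path);
		return 0;
	}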

Fixing the compilation database would be non-trivial, as it would probably involve passing argv[0] to the constructor, and effectively some API changes. Given that the code is apparently only used on Apple platforms, where argv[0] is not used, I have decided not to explore it for the time being.

Finally, passing -ccc-install-dir seems like the simplest workaround for the problem. Other path-related tests in clang already pass this option to reliably override path detection. I've committed it as r351752.

Future plans: LLDB

The plan for the next 6 months is as follows:

  1. Restore tracing in LLDB for NetBSD (i386/amd64/aarch64) for single-threaded applications.

  2. Restore execution of LLDB regression tests, unless there is need for a significant LLDB or kernel work, mark detected bugs as failing or unsupported ones.

  3. Enable execution of LLDB regression tests on the build bot in order to catch regressions.

  4. Upstream NetBSD (i386/amd64) core(5) support. Develop LLDB regression tests (and the testing framework enhancement) as requested by upstream.

  5. Upstream NetBSD aarch64 core(5) support. This might involve generic LLDB work on the interfaces and/or kernel fixes. Add regression tests as will be requested by upstream.

  6. Rework the threading plan in LLDB's Remote Process Plugin to be more agnostic to the non-Linux world and to support the NetBSD threading model.

  7. Add support for FPU registers for NetBSD/i386 and NetBSD/amd64.

  8. Support XSAVE, XSAVEOPT, ... registers in core(5) files on NetBSD/amd64.

  9. Add support for Debug Registers for NetBSD/i386 and NetBSD/amd64.

  10. Add support for backtracing through the signal trampoline, and extend the support to libexecinfo and the unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).

  11. Stabilize LLDB and address breaking tests from the test-suite.

  12. Merge LLDB with the basesystem (under LLVM-style distribution).

I will be working closely with Kamil Rytarowski who will support me on the NetBSD kernel side. I'm officially starting today in order to resolve the presented problems one by one.

This work will be sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted late Friday afternoon, February 1st, 2019 Tags: blog
Over the past month I've merged the LLVM compiler-rt sanitizers (LLVM svn r350590) with the base system. I've also managed to get a functional set of Makefile rules to build all of them, namely:
  • ASan
  • UBSan
  • TSan
  • MSan
  • libFuzzer
  • SafeStack
  • XRay
In all the variations and modes that are supported by the original LLVM compiler-rt package.

Integration of sanitizers with the base system

I've submitted a patch for internal review but I was asked to push it through tech-toolchain@ first. I'm still waiting for active feedback on moving it in the proper direction.

The final merge of the build rules will be done once we get LLVM 8.0(rc2) into the base, as there is a small ABI mismatch between Clang/LLVM (7.0svn) and compiler-rt (8.0svn). I've ported/adapted-with-a-hack all the upstream tests for the supported sanitizers so that they are executed against the ones newly integrated with the base system, and everything has been adjusted to pass with a few exceptions that still need to be fixed: the ASan dynamic (.so) tests are still crashy, and around 1/3 of the UBSan tests are failing due to an ABI mismatch. This is caused by a number of new UBSan features that are not supported by the older Clang/LLVM.

Changes integrated with LLVM projects

LLVM 8.0 was branched in the middle of January, causing a lot of breakage that required collaboration with the LLVM people to get things back into proper shape. I've also taken part in the LLD porting effort with Michal Gorny.

After the branching point there was also a refactoring of existing features in compiler-rt, such as LSan, SafeStack and Scudo. I had to apply appropriate patches to these sanitizers and temporarily disable LSan until it can be fully ported.

Changes in the base system

Outside the context of sanitizers, I've fixed two bugs related to my previous work on interfaces for debuggers:

  • PR kern/53817 Random panics in vfs_mountroot()
  • PR lib/53343 t_ptrace_wait*:traceme_vfork_crash_bus test cases fail

Plan for the next milestone

Collect feedback for the patch integrating LLVM sanitizers and merge the final version with the base system.

Return to ptrace(2) kernel fixes and start the work with a focus on improving correctness of signal handling.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted Friday afternoon, February 1st, 2019 Tags: blog
Over two years ago, I made a pledge to use NetBSD as my sole and only operating system, and to resist booting into any other OS until I had implemented hardware-accelerated virtualization in the NetBSD kernel (the equivalent of Linux's KVM, or Hyper-V).

Today, I am here to report: Mission Accomplished!

It's been a long road, but we now have hardware-accelerated virtualization in the kernel! And while I had initially only planned to get Oracle VirtualBox working, with the help of the Intel HAXM engine (the same backend used for virtualization in Android Studio) and a qemu frontend, I have successfully managed to boot a range of mainstream operating systems.

With the advent of Intel's open-sourcing of their HAXM engine, we now have access to an important set of features:

  • A BSD-style license.
  • Support for multiple platforms: Windows, Darwin, Linux, and now NetBSD.
  • HAXM provides Intel hardware-assisted virtualization for their CPUs (VT-x and EPT needed).
  • Support for an arbitrary number of concurrent VMs. For simplicity's sake, NetBSD only supports 8, whereas Windows/Darwin/Linux support 64.
  • An arbitrary number of supported VCPUs per VM. All OSes support up to 64 VCPUs.
  • ioctl(2) based API (/dev/HAX, /dev/haxmvm/vmXX, /dev/haxmvm/haxmvmXXvcpuYY).
  • Implemented non-intrusively as part of the kernel, rather than as an out-of-tree, standalone kernel module.
  • Default compatibility with qemu as a frontend.
  • Active upstream support from Intel, which is driven by commercial needs.
  • Optimized for desktop scenarios.
  • Probably the only open-source cross-OS virtualization engine.
  • An active and passionate community dedicated to continually improving it.

As well as a few of HAXM's downsides:

  • No AMD (SVM) support (although there are community plans to implement it).
  • No support for non-x86 architectures.
  • Need for a relatively recent Intel CPU (EPT required).
  • Not as flexible as KVM-like solutions for embedded use-cases or servers.
  • Not as quick as KVM (probably about 80% as fast).

If you'd like more details on HAXM, check out the following sites:

Showcase

I've managed to boot and run several operating systems as guests running on HAXM/NetBSD, all of which are multi-core (2-4 VCPUs):

  • NetBSD. And as our motto goes, of course it runs NetBSD. And, of course, my main priority as a NetBSD developer is achieving excellent support for NetBSD guest operating systems. With the massive performance gains of hardware-accelerated virtualization, it'll finally be possible to run kernel fuzzers and many other practical, real-world workloads on virtualized NetBSD systems.


    [NetBSD at the bootloader]


    [NetBSD kernel booting]


    [NetBSD at a shell prompt]


    [NetBSD with X Window session]


    [NetBSD with X Window session]


    [NetBSD guest and qemu's ACPI integration - emitted poweroff button press]


    [NetBSD in the qemu's curses mode display, which I find convenient to use, especially on a headless remote computer]

  • Linux. When I pledged not to boot any other OS before accomplishing my goal, I was mostly thinking about giving up the benefits of the Linux ecosystem (its driver and software base). There is still a selection of programs that I miss, such as valgrind, but with each week we are getting closer to filling the missing gaps. Linux guests seem to work; however, the IOAPIC needs to be tuned or disabled to get them running (I had to pass "noapic" as a Linux kernel option).


    [ArchLinux at a bootloader]


    [ArchLinux at a shell (ZSH) prompt]


    [Ubuntu's installer]


    [Ubuntu at a bootloader]

  • Windows. While I have no personal need or use case for running Windows, it's a must-have, prestigious target for virtualization. I've obtained a Windows 7 x86 trial image from the official Microsoft webpage for testing purposes. Support for Windows 8.1 or newer, and for 64-bit versions, is still in development in HAXM.


    [Windows 7 booting]


    [Windows 7 welcome message]


    [Windows 7 running]


    [Windows 7 Control Panel]


    [Windows 7 multitasking]


    [Windows 7 MS Paint]

  • DragonflyBSD. I was prompted to test this FreeBSD derivative by one of its developers, and it just worked flawlessly.


    [DragonflyBSD at a bootloader]


    [DragonflyBSD at a shell prompt]

  • FreeDOS. It seems to just work, but I have no idea what I can use it for.


    [FREEDOS at an installer]


    [FREEDOS at a command line]

Unfortunately, not all operating systems are supported yet. I've found issues with the following ones:
  • Android. The kernel seems to boot fine (with "noapic"), but later, while loading the distribution, it freezes the host computer.


    [Android bootloader]


    [Android just before the host crash]

  • FreeBSD. It hangs during the boot process.


    [FreeBSD hanging at boot]

Summary and future plans

One thing I must clarify, since I'm frequently asked about it, is that HAXM/NetBSD does not attempt to compete with the NVMM (NetBSD Virtual Machine Monitor) work being done by Maxime Villard. I'm doing this primarily for my own educational purposes, and because I find reaching feature parity with other open-source projects to be important work. Additionally, NVMM currently has only AMD CPU support, whereas I'm primarily a user of the Intel x86 platform, and thus, so is HAXM/NetBSD. The Intel port of NVMM, and NVMM in general, is still in development, which means that HAXM is probably the first solution that has ever successfully managed to run Windows on NetBSD (has anyone done it with Xen before?).

I will keep working on this project in my spare time and try to correct the IOAPIC issues on Linux, the hang during FreeBSD's boot process, and the Android host crashes.

Most of the NetBSD-specific patches for qemu and Intel HAXM have already been merged upstream. Once this process is complete, there are plans to make HAXM available in pkgsrc. There's also at least one kernel-level workaround for HAXM behavior related to FPU state, which triggers an assert due to an abnormal condition. For this to be amended, fixes would have to land upstream in HAXM's code.

Subnote

I confess that I've been playing with OpenVMS/VAX in SIMH, as I have a hobbyist license, but it can hardly be treated as competition to NetBSD. Another exception was a DTrace tutorial at EuroBSDCon 2017 in Paris, where I had to boot and use a FreeBSD image shared by the lecturer.

Posted in the wee hours of Tuesday night, January 30th, 2019 Tags: blog
Prepared by Michał Górny (mgorny AT gentoo.org).

LLD is the link editor (linker) component of the Clang toolchain. Its main advantages over GNU ld are a much lower memory footprint and faster linking. It is of particular interest to me since currently 8 GiB of memory are insufficient to link LLVM statically (which is the upstream default).

The first goal of the LLD porting effort is to ensure that LLD can produce working NetBSD executables and be used to build LLVM itself. Then, it is desirable to look into building additional NetBSD components with it, and eventually into replacing /usr/bin/ld entirely with lld.

In this report, I would like to briefly summarize the issues I have found so far while trying to use LLD on NetBSD.

DT_RPATH vs DT_RUNPATH

RPATH is used to embed a library search path in the executable. Since it takes precedence over default system library paths, it can be used both to specify the location of additional program libraries and to override system libraries.

Currently, RPATH can be embedded in executables using two tags: the “old” DT_RPATH tag and the “new” DT_RUNPATH tag. The existence of two tags comes from behavior exhibited by some operating systems (e.g. glibc systems): DT_RPATH used to take precedence over LD_LIBRARY_PATH, making it impossible to override the paths specified there. Therefore, a new DT_RUNPATH tag was added that comes after LD_LIBRARY_PATH in precedence. When both DT_RPATH and DT_RUNPATH are specified, the former is ignored.

On NetBSD, DT_RPATH does not take precedence over LD_LIBRARY_PATH. Therefore, there was never a need for DT_RUNPATH, and support for it (as an alias to DT_RPATH) was added only very recently: on 2018-12-30.

Unlike GNU ld, LLD uses the “new” tag by default and therefore produces executables whose RPATHs do not work on older NetBSD versions. Given that using DT_RUNPATH on NetBSD has no real advantage, it is preferable to suppress it with the --disable-new-dtags option.

More than two PT_LOAD segments

PT_LOAD segments are used to map the executable image into memory. Traditionally, GNU ld produces exactly two PT_LOAD segments: an RX text (code) segment and an RW data segment. The NetBSD dynamic loader (ld.elf_so) hardcodes that assumption. However, LLD sometimes produces an additional read-only data segment, causing assertions in the dynamic loader to fail.

I have attempted to rewrite the memory mapping routine to allow for an arbitrary number of segments. However, apparently my patch is just “a step in the wrong direction”, and Joerg Sonnenberger is working on a proper fix.

Alternatively, LLD has a --no-rosegment option that can be used to suppress the additional segment and work around the problem.

Clang/LLD driver design issues

Both GCC and Clang use a design based on a front-end driver component. That is, the executable called directly by the user is a driver whose purpose is to perform initial command-line option and input processing, and to run the appropriate tools that perform the actual work. Those tools may include the C preprocessor (cpp), the C/C++ compiler (cc1), the assembler, and the link editor (ld).

This follows the original UNIX principle of simple tools that perform a single task well, combined with a wrapper that composes those tools into complete workflows. Interestingly enough, it makes it possible to keep all system-specific defaults and logic in a single place, without having to make every single tool aware of them. Instead, they are passed to those tools as command-line options.

This also makes for a simpler and more portable build system design. The gcc/clang driver provides a single high-level interface for performing a multitude of tasks, including compiling assembly files or linking executables. Therefore, the build system and the user do not need to be explicitly aware of the low-level tooling and its usage. Not to mention that it makes it much easier to swap that tooling transparently.

For example, if you are linking an executable via the driver, it takes care of finding the appropriate link editor (and makes it easy to change it via -fuse-ld), preparing appropriate command-line options (e.g. if you do a -m32 multilib build, it sets the emulation for you), and passing the necessary libraries to link (e.g. an appropriate standard C++ library when building a C++ program).

The clang toolchain considers LLD an explicit part of this workflow, and, unlike GNU ld, ld.lld is not really suitable for stand-alone use. For example, it does not include any standard search paths for libraries, expecting the driver to provide them in the form of appropriate -L options. This way, all the logic responsible for figuring out the operating system used (including possible cross-compilation scenarios) and choosing appropriate paths is located in one component.

However, Joerg Sonnenberger disagrees with this and believes LLD should contain all the defaults necessary for it to be used stand-alone on NetBSD. Effectively, we have two conflicting designs: one where all logic is in the clang driver, and another where some of the logic is moved into LLD. At the moment, LLD follows the former assumption, and the clang driver for NetBSD follows the latter. As a result, neither using LLD directly nor using it via clang works out of the box on NetBSD; to use either, the user has to pass all the appropriate -L and -z options explicitly.

Fixing LLD to work with the current clang driver would require adding target awareness to LLD and changing a number of defaults for NetBSD based on the target used. However, LLD maintainer Rui Ueyama is opposed to introducing this extra logic specifically for NetBSD, and believes it should be added to the clang driver, as is done for other platforms. On the other hand, the NetBSD toolchain driver maintainer, Joerg Sonnenberger, blocks adding it to the driver. Therefore, we have reached an impasse that prevents LLD from working out of the box on NetBSD without local patches.

A work-in-progress implementation of the local target logic approach requested by Joerg can be seen in D56650. Afterwards, additional behavior can be enabled on NetBSD by using target triple properties, as in D56215 (which copies the libdir logic from clang).

For comparison, the same problem solved in a way consistent with other distributions (and rejected by Joerg) can be seen in D56932 (which updates D33726). However, in some cases this approach will require adding additional options to LLD (e.g. -z nognustack, D56554), and corresponding dummy switches in GNU ld.

Handling of indirect shared library dependencies

When starting a program, the dynamic loader needs to find and load all shared libraries listed via DT_NEEDED entries in order to obtain the symbols needed by the program (functions, variables). Naturally, it also needs to process the DT_NEEDED entries of those libraries to satisfy their own symbol dependencies, and so on. As a result, the program can also reference symbols declared in dependencies of its DT_NEEDED libraries, that is, its indirect dependencies.

While linking executables, link editors normally verify that all symbols can be resolved in one of the linked libraries. Historically, GNU ld followed the logic used by the dynamic loader and permitted symbols used by the program to be present in either direct or indirect dependencies. However, GNU gold, LLD and newer versions of GNU ld use a different logic and permit only symbols provided by the direct dependencies.

Let's take an example: you are writing a program that works with .zip files, and therefore you link -lzip. However, you also implement support for .gz files, and therefore call gzopen() provided by -lz, which is also a dependency of libzip.so. Now, with old GNU ld versions you could just use -lzip, since it would indirectly include libz.so. However, modern linkers will refuse to link, claiming that gzopen is undefined. You need to link -lzip -lz explicitly to resolve that.
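
Here is a toy program illustrating that scenario. The file name is hypothetical, and the link commands in the leading comment assume libzip and zlib are installed; only the linker's behavior differs between the two command lines.

/*
 * ziptool.c (hypothetical name): calls libzip and zlib directly.
 *
 *   old GNU ld:            cc ziptool.c -lzip       links; libz.so comes in
 *                                                   indirectly via libzip.so
 *   gold / LLD / newer ld: cc ziptool.c -lzip       fails: gzopen undefined
 *                          cc ziptool.c -lzip -lz   links
 */
#include <stdio.h>
#include <stdlib.h>

#include <zip.h>	/* zip_open(), provided by -lzip */
#include <zlib.h>	/* gzopen(), provided by -lz */

int
main(int argc, char *argv[])
{
	zip_t *za;
	gzFile gz;
	int zerr = 0;

	if (argc < 3) {
		fprintf(stderr, "usage: %s file.zip file.gz\n", argv[0]);
		return EXIT_FAILURE;
	}

	za = zip_open(argv[1], 0, &zerr);	/* direct call into libzip */
	gz = gzopen(argv[2], "rb");		/* direct call into zlib */

	printf("zip: %s, gz: %s\n",
	    za != NULL ? "opened" : "failed",
	    gz != NULL ? "opened" : "failed");

	if (gz != NULL)
		gzclose(gz);
	if (za != NULL)
		zip_close(za);
	return EXIT_SUCCESS;
}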

Joerg Sonnenberger disagrees with this new behavior, and explicitly preserves the old behavior in the GNU ld version used in NetBSD. However, LLD does not support the historical GNU ld behavior at all. It will probably become necessary to implement it from scratch to support NetBSD fully.

Summary

At this point, it seems that using LLD for the majority of regular packages is a goal that can be achieved soon. The main blocker right now is the disagreement between developers on how to proceed. Once we resolve that and agree on a single way forward, most of the patches will become trivial.

Sadly, at this point I really do not see any way to convince either side. The problem was reported in May 2017, and in the same month a fix consistent with all other platforms was provided. However, it is being blocked, and LLD cannot work out of the box.

Hopefully, we will finally be able to find a way forward that does not involve keeping the upstream clang driver for NetBSD incompatible with upstream LLD, and carrying a number of local patches to LLD to make it work.

Further goals include attempting to build the kernel and the complete userland using LLD, as well as further verifying the compatibility of executables produced with various combinations of linker options (e.g. static PIE executables).

Posted late Friday evening, January 18th, 2019 Tags: blog