This page is a blog mirror of sorts. It pulls in articles from the blog's feed and publishes them here (with a feed, too).

Kernel signal code is a complex maze, and it is very difficult to introduce non-trivial changes without regressions. Over the past month I worked on covering missing elementary scenarios involving the ptrace(2) API. Some of the new tests are marked as expected to succeed; however, a number of them are expected to fail.

The NetBSD distribution changes

I've also introduced changes unrelated to ptrace(2), namely kernel sanitizer work, kernel fixes and the corresponding ATF tests. I won't discuss them further, as they are beyond the scope of ptrace(2). These changes were largely prompted by students preparing for summer work as part of Google Summer of Code.

The following ptrace(2) ATF commits landed in the repository:

  • Define PTRACE_ILLEGAL_ASM for NetBSD/amd64 in ptrace.h
  • Enable 3 new ptrace(2) tests for SIGILL
  • Refactor GPR and FPR tests in t_ptrace_wait* tests
  • Refactor definition of PT_STEP tests into single macro
  • Correct a style in description of PT_STEP tests in t_ptrace_wait*
  • Refactor kill* test in t_ptrace_wait*
  • Add infinite_thread() for ptrace(2) ATF tests
  • Add initial pthread(3) tests in ATF t_ptrace_wait* tests
  • Link t_ptrace_wait* tests with -pthread
  • Initial refactoring of siginfo* tests in t_ptrace_wait*
  • Drop siginfo5 from ATF tests in t_ptrace_wait*
  • Merge siginfo6 into other PT_STEP tests in t_ptrace_wait*
  • Rename the siginfo4 test in ATF t_ptrace_wait*
  • Refactor lwp_create1 and lwp_exit1 into trace_thread* in ptrace(2) tests
  • Rename signal1 to signal_mask_unrelated in t_ptrace_wait*
  • Add new regression scenarios for crash signals in t_ptrace_wait*
  • Replace signal2 in t_ptrace_wait* with new tests
  • Add new ATF tests traceme_raisesignal_ignored in t_ptrace_wait*
  • Add new ATF tests traceme_signal{ignored,masked}_crash* in t_ptrace_wait*
  • Add additional assert in traceme_signalmasked_crash t_ptrace_wait* tests
  • Add additional assert in traceme_signalignored_crash t_ptrace_wait* tests
  • Remove redundant test from ATF t_ptrace_wait*
  • Add new ATF t_ptrace_wait* vfork(2) tests
  • Add minor improvements in unrelated_tracer_sees_crash in t_ptrace_wait*
  • Add more tests for variations of unrelated_tracer_sees_crash in ATF
  • Replace signal4 (PT_STEP) test with refactored ones with extra asserts
  • Add signal masked and ignored variations of traceme_vfork_exec in ATF tests
  • Add signal masked and ignored variations of traceme_exec in ATF tests
  • Drop signal5 test-case from ATF t_ptrace_wait*
  • Refactor signal6-8 tests in t_ptrace_wait*
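
For readers unfamiliar with the shape of these scenarios, here is a minimal sketch of a traceme-style check. It is not taken from the ATF suite: the helper name observed_stop_signal() is hypothetical, and it uses raise(3) rather than a real hardware trap to keep the example deterministic. The tracee asks to be traced, raises a signal, and the tracer verifies that it sees a stop carrying that signal instead of the death of the child.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of t_ptrace_wait*: fork a tracee that
 * raises `signo` after PT_TRACE_ME and return the signal the tracer
 * observes in the resulting stop, or -1 on any other outcome.
 */
int
observed_stop_signal(int signo)
{
	int status, seen;
	pid_t child;

	assert(signo > 0);

	child = fork();
	if (child == -1)
		return -1;

	if (child == 0) {
		/* Tracee: ask to be traced by the parent, then raise. */
		if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
			_exit(1);
		raise(signo);
		_exit(0);
	}

	/* Tracer: the signal must be reported as a stop of the tracee. */
	if (waitpid(child, &status, 0) != child)
		return -1;
	seen = WIFSTOPPED(status) ? WSTOPSIG(status) : -1;

	/* Clean up: kill the (possibly stopped) tracee and reap it. */
	kill(child, SIGKILL);
	waitpid(child, &status, 0);
	return seen;
}
```

A caller would expect observed_stop_signal(SIGSEGV) to return SIGSEGV. The real tests go further and also inspect siginfo, signal masks and the behavior of every wait(2)-like function.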

Trap signals processing without signal context reset

The current NetBSD kernel approach to processing crash signals (SEGV, FPE, BUS, ILL, TRAP) is to reset the signal context. This behavior was introduced as an intermediate and partially legitimate fix for cases where masking a crash signal caused an infinite loop in a dying process.

The expected behavior is to never reset the signal context of a trap signal (or any other signal) when the process runs under a debugger. I first introduced a fix to achieve these semantics last year, but I had to revert it quickly, as it caused breakage through side effects that the ATF ptrace(2) regression tests existing at that time did not cover. This time I made sure to cover upfront almost all interesting scenarios that are required to function properly. Surprisingly, after picking up the old faulty fix and improving it locally, the current signal maze code caused various side effects in corner cases, such as translating SIGKILL in certain tests into the previous trap signal (like SIGSEGV). In other cases the side effects are even stranger: one test hangs only against a certain wait(2)-like function (waitid(2)) and runs without hangs against the other wait(2)-like functions.

For reference, these surprises can be reproduced with the following patch:

Index: sys/kern/kern_sig.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_sig.c,v
retrieving revision 1.350
diff -u -r1.350 kern_sig.c
--- sys/kern/kern_sig.c	29 Nov 2018 10:27:36 -0000	1.350
+++ sys/kern/kern_sig.c	3 Mar 2019 19:26:54 -0000
@@ -911,13 +911,25 @@
 	KASSERT(!cpu_intr_p());
 	mutex_enter(proc_lock);
 	mutex_enter(p->p_lock);
+
+	if (ISSET(p->p_slflag, PSL_TRACED) &&
+	    !(p->p_pptr == p->p_opptr && ISSET(p->p_lflag, PL_PPWAIT))) {
+		p->p_xsig = signo;
+		p->p_sigctx.ps_faked = true; // XXX
+		p->p_sigctx.ps_info._signo = signo;
+		p->p_sigctx.ps_info._code = ksi->ksi_code;
+		sigswitch(0, signo, false);
+		// XXX ktrpoint(KTR_PSIG)
+		mutex_exit(p->p_lock);
+		return;
+	}
+
 	mask = &l->l_sigmask;
 	ps = p->p_sigacts;
 
-	const bool traced = (p->p_slflag & PSL_TRACED) != 0;
 	const bool caught = sigismember(&p->p_sigctx.ps_sigcatch, signo);
 	const bool masked = sigismember(mask, signo);
-	if (!traced && caught && !masked) {
+	if (caught && !masked) {
 		mutex_exit(proc_lock);
 		l->l_ru.ru_nsignals++;
 		kpsendsig(l, ksi, mask);

Such changes need proper investigation, and the bugs they expose, which are now easier to detect with the extended test suite, need to be addressed.

Plan for the next milestone

Keep preparing kernel fixes and, after thorough verification, apply them to the mainline.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted early Monday morning, March 4th, 2019 Tags: blog

Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

Originally, LLDB was ported to NetBSD by Kamil Rytarowski. However, multiple upstream changes and a lack of continuous testing have resulted in a decline of support. So far we haven't been able to restore the previous state.

In February, I started working on LLDB, as contracted by the NetBSD Foundation. My first four goals, as detailed in the previous report, were:

  1. Restore tracing in LLDB for NetBSD (i386/amd64/aarch64) for single-threaded applications.

  2. Restore execution of LLDB regression tests, unless there is need for a significant LLDB or kernel work, mark detected bugs as failing or unsupported ones.

  3. Enable execution of LLDB regression tests on the buildbot in order to catch regressions.

  4. Upstream NetBSD (i386/amd64) core(5) support. Develop LLDB regression tests (and the testing framework enhancement) as requested by upstream.

Of those tasks, I consider running regression tests on the buildbot the highest priority. Bisecting regressions after the fact is hard due to long build times, and having continuous integration working is going to be very helpful for maintaining the code long-term.

In this report, I'd like to summarize what I achieved and what technical difficulties I met.

The kqueue interoperability issues

Given no specific clue as to why LLDB was no longer able to start processes on NetBSD, I've decided to start by establishing the status of the test suites. More specifically, I've started with a small subset of LLDB test suite — unittests. In this section, I'd like to focus on two important issues I had with them.

Firstly, one of the tests was hanging indefinitely. As I established, the purpose of the test was to check whether the main loop implementation correctly detects and reports when all the slaves of a pty are disconnected (and therefore the reads on master would fail). Through debugging, I came to the conclusion that kevent() does not report this particular scenario.

I have built a simple test case (which is now part of kqueue ATF tests) and confirmed it. Afterwards, I have attempted to establish whether this behavior is correct. While kqueue(2) does not mention ptys specifically, it states the following for pipes:

Fifos, Pipes

Returns when there is data to read; data contains the number of bytes available.

When the last writer disconnects, the filter will set EV_EOF in flags. This may be cleared by passing in EV_CLEAR, at which point the filter will resume waiting for data to become available before returning.

Furthermore, my test program indicated that FreeBSD exhibits the described EV_EOF behavior. Therefore, I decided to write a kernel patch adding this functionality, submitted it for review and eventually committed it after applying helpful suggestions from Robert Elz ([PATCH v3] kern/tty_pty: Fix reporting EOF via kevent and add a test case). I have also disabled the test case temporarily since the functionality is non-critical to LLDB (r353545).

Secondly, a few gdbserver-based tests were flaky, i.e. they passed or failed unpredictably from run to run. I started debugging this with a test whose purpose was to check verbose error message support in the protocol. To my surprise, it seemed as if gdbserver worked fine as far as the error message exchange was concerned. This packet was followed by a termination request from the client, and it seemed that the server sometimes replied to it correctly and sometimes terminated just before receiving it.

While working on this particular issue, I've noticed a few deficiencies in LLDB's error handling. In this case, this involved two major issues:

  1. gdbserver ignored errors from main loop. As a result, if kevent() failed, it silently exited with a successful status. I've fixed it to catch and report the error verbosely instead: r354030.

  2. Main loop reported a meaningless return value (-1) from kevent(). I've established that most likely all kevent() implementations use errno instead, and made the function return it: r354029.

After applying those two fixes, gdbserver clearly indicated the problem: kevent() returned due to EINTR (i.e. the process receiving a signal). Lacking correct handling for this value, the main loop implementation wrongly treated it as fatal error and terminated the program. I've fixed this via implementing EINTR support for kevent() in r354122.

This trivial fix not only resolved most of the flaky tests but also turned out to be the root cause for LLDB being unable to start processes. Therefore, at this point tracing for single-threaded processes was restored on amd64. Testing on other platforms is pending.
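
The EINTR handling described above can be sketched roughly as follows. This is not the LLDB change itself: wait_retrying_on_eintr() and the fake_wait stand-in are hypothetical names made up for illustration, and the blocking call is abstracted behind a function pointer so that the retry logic can be exercised without a kqueue. In a real main loop the callback would be a thin wrapper around kevent().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Signature of a blocking wait, e.g. a thin wrapper around kevent(). */
typedef int (*wait_fn)(void *ctx);

/*
 * Call wait_once() until it either succeeds or fails with something
 * other than EINTR.  Returns the non-negative result on success, or the
 * negated errno value on a real failure, so the caller never has to
 * interpret a bare -1.
 */
int
wait_retrying_on_eintr(wait_fn wait_once, void *ctx)
{
	int rc;

	assert(wait_once != NULL);

	do {
		rc = wait_once(ctx);
	} while (rc == -1 && errno == EINTR);

	return (rc == -1) ? -errno : rc;
}

/* Stand-in for kevent(): interrupted a few times, then one event ready. */
struct fake_wait {
	int interruptions;	/* remaining EINTR failures to simulate */
	int calls;		/* how many times the wait was attempted */
};

int
fake_wait_once(void *ctx)
{
	struct fake_wait *fw = ctx;

	fw->calls++;
	if (fw->interruptions > 0) {
		fw->interruptions--;
		errno = EINTR;
		return -1;
	}
	return 1;	/* one event ready */
}
```

With a fake_wait initialized to two interruptions, the helper calls the wait three times and returns 1. A signal arriving mid-wait therefore delays the loop instead of terminating the program.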

Now, for the moral: working error reporting can save a lot of time.

Socket issues

The next issue I hit while working on the unittests is rather curious, and I have to admit I haven't managed either to find the root cause or to build a good reproducer for it. Nevertheless, I seem to have caught the gist of it and found a good workaround.

The test in question focuses on the high-level socket API in LLDB. It is rather trivial — it binds a server in one thread, and tries to connect to it from a second thread. So far, so good. Most of the time the test works just fine. However, sometimes — especially early after booting — it hangs forever.

I've debugged this thoroughly and came to the following conclusion: the test binds to 127.0.0.1 (i.e. purely IPv4) but tries to connect to localhost. The latter results in the client trying IPv6 first, failing and then succeeding with IPv4. The connection is accepted, the test case moves forward and terminates successfully.

Now, in the failing case, the IPv6 connection attempt succeeds, even though there is no server bound to that port. As a result, the client part is happily connected to a non-existing service, and the server part hangs forever waiting for the connection to come.

I have attempted to reproduce this with an isolated test case that replicates the use of threads, binding to port zero and the IPv4/IPv6 mixup, but I simply haven't been able to reproduce it. Curiously enough, however, running my test case actually fixes the problem: if I start it before the LLDB unit tests, they work fine afterwards (until the next reboot).

Being unable to make any further progress on this weird behavior, I've decided to fix the test design instead — and make it connect to the same address it binds to: r353868.
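
The idea behind the fixed test design can be sketched as below. This is an illustration rather than the code from r353868: connect_to_bound_address() is a hypothetical helper using plain BSD sockets in a single thread instead of LLDB's socket classes. The point is that the client connects to the exact address the listener reports, so no name resolution (and no IPv6 attempt) is involved.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind a listener to 127.0.0.1 on a kernel-chosen port, then connect a
 * client to exactly the address the listener reports.  Returns 0 if the
 * connection is accepted, -1 on any failure.
 */
int
connect_to_bound_address(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int srv = -1, cli = -1, conn = -1, ret = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(0);	/* port zero: let the kernel pick */
	if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) != 1)
		return -1;

	if ((srv = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		goto out;
	if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == -1)
		goto out;
	if (listen(srv, 1) == -1)
		goto out;

	/* Ask the listener which address and port it actually got. */
	if (getsockname(srv, (struct sockaddr *)&addr, &len) == -1)
		goto out;
	assert(len == sizeof(addr));

	/* Connect to that same address; "localhost" is never resolved. */
	if ((cli = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		goto out;
	if (connect(cli, (struct sockaddr *)&addr, len) == -1)
		goto out;

	if ((conn = accept(srv, NULL, NULL)) == -1)
		goto out;
	ret = 0;
out:
	if (conn != -1)
		close(conn);
	if (cli != -1)
		close(cli);
	if (srv != -1)
		close(srv);
	return ret;
}
```

Because the listener is already in the listening state when connect() is called, the sketch works without a second thread; the pending connection simply waits in the backlog until accept() picks it up.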

Getting the right toolchain for live testing

The largest problem so far was getting LLDB tests to interoperate with NetBSD's clang driver correctly. On other systems, clang either defaults to libstdc++, or has libc++ installed as part of the system (FreeBSD, Darwin). The NetBSD driver wants to use libc++ but we do not have it installed by default.

While this could be solved via installing libc++ on the buildbot host, I thought it would be better to establish a solution that would allow LLDB to use just-built clang — similarly to how other LLVM projects (such as OpenMP) do. This way, we would be testing the matching libc++ revision and users would be able to run the tests in a single checkout out of the box.

Sadly, this is non-trivial. While it could all be hacked into the driver itself, it does not really belong there. While it is reasonable to link tests against the libraries in the build tree, we wouldn't want regular executables built by users to bind to them. This is why this is normally handled via the test system. However, the tests in LLDB are an accumulation of at least three different test systems, each one calling the compiler separately.

In order to establish a baseline for this, I have created wrappers for clang that added the necessary command-line options. The latest version of the wrapper for clang looked like the following:

#!/usr/bin/env bash

topdir=/home/mgorny/llvm-project/build-rel-master
cxxinc="-cxx-isystem $topdir/include/c++/v1"
lpath="-L $topdir/lib"
rpath="-Wl,-rpath,$topdir/lib"
pthread="-pthread"
libs="-lunwind"

# needed to handle 'clang -v' correctly
[ $# -eq 1 ] && [ "$1" = -v ] && exec $topdir/bin/clang-9-real "$@"
exec $topdir/bin/clang-9-real $cxxinc $lpath $rpath "$@" $pthread $libs

The actual executable I renamed to clang-9-real, and this wrapper replaced clang and a similar one replaced clang++. clang-cl was linked to the real executable (as it wasn't called in wrapper-relevant contexts), while clang-9 was linked to the wrapper.

After establishing a baseline of working tests, I've looked into migrating the necessary bits one by one to the driver and/or LLDB test system, removing the migrated parts and verifying whether tests pass the same.

My proposal so far involves, respectively:

  1. Replacing -cxx-isystem with libc++ header search using a path relative to the compiler executable: D58592.

  2. Integrating -L and -Wl,-rpath with the LLDB test system: D58630.

  3. Adding NetBSD to list of platforms needing -pthread: r355274.

  4. The need for -lunwind is solved via switching the test failing due to the lack of it to use libc++ instead of libstdc++: r355273.

The reason for adjusting libc++ header search in the driver rather than in LLDB tests is that the path is specific to building against libc++, and the driver makes it convenient to adjust the path conditionally on the standard C++ library being used. In other words, it saves us from hard-relying on the assumption that tests will be run against libc++ only.
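
To illustrate what a header search path relative to the compiler executable means, here is a minimal sketch. It is not the driver code from D58592, whose details may differ: libcxx_include_dir() is a hypothetical helper, and the "../include/c++/v1" layout is assumed from the wrapper script above, where clang lives in $topdir/bin and the headers in $topdir/include/c++/v1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the path of the compiler executable (e.g. ".../bin/clang"),
 * write "<directory of the executable>/../include/c++/v1" into out.
 * Returns 0 on success, -1 if the path has no directory component or
 * the result does not fit into outlen bytes.
 */
int
libcxx_include_dir(const char *exe, char *out, size_t outlen)
{
	const char *slash;
	int n;

	assert(exe != NULL && out != NULL);

	slash = strrchr(exe, '/');
	if (slash == NULL)
		return -1;

	/* Keep everything before the last slash, then step up one level. */
	n = snprintf(out, outlen, "%.*s/../include/c++/v1",
	    (int)(slash - exe), exe);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For an executable at /opt/llvm/bin/clang (a made-up location), the helper yields /opt/llvm/bin/../include/c++/v1, so a just-built compiler finds the headers built alongside it without any -cxx-isystem option.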

I went for integrating -L in the test system since we do not want to link arbitrary programs to the libraries in LLVM's build directory. Appending this path unconditionally should be otherwise harmless to LLDB's tests, so that is the easier way to go.

Originally I wanted to avoid appending RPATHs. However, it seems that the LD_LIBRARY_PATH solution that works for Linux does not reliably work on NetBSD with LLDB. Therefore, passing -Wl,-rpath along with -L allowed me to solve the problem more simply.

Furthermore, those design solutions match other LLVM projects. I've mentioned OpenMP before: so far we had to pass -cxx-isystem to its tests explicitly, but it passed -L for us. Those patches render passing -cxx-isystem unnecessary, and therefore make LLDB follow suit with OpenMP.

Finishing touches

With a reasonably working compiler and the major regressions fixed, I focused on establishing a baseline for running tests. The goal is to mark broken tests XFAIL or skip them. With all tests marked appropriately, we would be able to start running tests on the buildbot and catch regressions compared to this baseline. The current progress on this can be seen in D58527.

Sadly, besides failing tests there is still a small number of flaky or hanging tests which are non-trivial to detect. The upstream maintainer, Pavel Labath, is very helpful, and I hope to finally get all the flaky tests either fixed or covered with his help.

Other fixes not worth a separate section include:

  • fixing compiler warnings about empty format strings: r354922,

  • fixing two dlopen() based test cases not to link -ldl on NetBSD: r354617,

  • finishing Kamil's patch for core file support: r354466, followup fix in r354483,

  • removing dead code in main loop: r354050,

  • fixing stand-alone builds after they've been switched to LLVMConfig.cmake: r353925,

  • skipping lldb-mi tests when Python support (needed by lldb-mi) is disabled: r353700,

  • fixing incorrect initialization of sigset_t (not actually used right now): r353675.

Buildbot updates

The last part worth mentioning is that the NetBSD LLVM buildbot has seen some changes. Notably, zorg r354820 included:

  • fixing the bot commit filtering to include all projects built,

  • renaming the bot to shorter netbsd-amd64,

  • and moving it to toolchain category.

One of the most useful functions of buildbot is that it associates every successive build with new commits. If the build fails, it blames the authors of those commits and reports the failure to them. However, for this to work buildbot needs to be aware which projects are being tested.

Our buildbot configuration has been initially based on one used for LLDB, and it assumed LLVM, Clang and LLDB are the only projects built and tested. Over time, we've added additional projects but we failed to update the buildbot configs appropriately. Finally, with the help of Jonas Hahnfeld, Pavel Labath and Galina Kistanova we've managed to update the list and make the bot blame all projects correctly.

While at it, it was suggested that we rename the bot. The previous name was lldb-amd64-ninja-netbsd8, and others pointed out that developers might ignore failures in other projects on seeing lldb there. Kamil Rytarowski also pointed out that the version number misleads users into believing that we're running separate bots for different versions. The new name and category are meant to clearly indicate that we're running a single bot instance for multiple projects.

Quick summary and future plans

At this point, the most important regressions in LLDB have been fixed and it is able to debug simple programs on amd64 once again. The test suite patches are still waiting for review, and once they're approved I still need to work on flaky tests before we can reliably enable that on the buildbot. This is the first priority.

The next item on the TODO list is to take over and finish Kamil's patch for core files with threads. Most notably, the patch requires writing tests and verifying that no new bugs affect it.

On a semi-related note, LLVM 8.0.0 will be released in a few days and I will be probably working on updating src to the new version. I will also try to convince Joerg to switch from unmaintained libcxxrt to upstream libc++abi. Kamil also wanted to change libc++ include path to match upstream (NetBSD is dropping /v1 suffix at the moment).

Once this is done, the next big step is to fix threading support. Testing on non-amd64 arches is deferred until I gain access to some hardware.

This work is sponsored by The NetBSD Foundation

Posted Saturday evening, March 2nd, 2019 Tags: blog

For the 4th year in a row and for the 13th time The NetBSD Foundation will participate in Google Summer of Code 2019!

If you are a student and would like to learn more about Google Summer of Code please go to the Google Summer of Code homepage.

You can find a list of projects in Google Summer of Code project proposals in the wiki.

Do not hesitate to get in touch with us via #netbsd-code IRC channel on Freenode and via NetBSD mailing lists!

Looking forward to having a great summer!

Posted Wednesday afternoon, February 27th, 2019 Tags: blog

Starting this month, I will be focusing my effort on LLDB, the debugger of the LLVM toolchain, as work contracted by the NetBSD Foundation. In this entry, I would like to briefly summarize what I've been working on before, what I have been able to accomplish, and what I am going to do next.

Final status of LLD support

LLD is the link editor (linker) component of the Clang toolchain. Its main advantages over GNU ld are a much lower memory footprint and higher linking speed. I started working on LLD this month, and encountered a few difficulties. I have explained them in detail in the first report on NetBSD LLD porting.

The aforementioned impasse between LLD and NetBSD toolchain maintainers still stands. A few comments have been exchanged, but it doesn't seem that either side has managed to convince the other. Right now, it seems that the most probable course of action for the future would be for NetBSD to maintain the necessary changes as a local patchset.

To finish my work on LLD, I have committed devel/lld to pkgsrc. It is based on 7.0.1 release with NetBSD patches. I will update it to 8.0.0 once it is released.

Other work on LLVM

Besides the specific effort on LLD, I have been focusing on preparing and testing for the upcoming 8.0.0 release of LLVM. Upstream has set branching point to Jan 16th, and we wanted to get all the pending changes merged if possible.

Of the compiler-rt patches previously submitted for review, the following changes have been merged (and will be included in 8.0.0):

  • added interceptor tests for clearerr, feof, ferror, fileno, fgetc, getc, ungetc (r350225)

  • fixed more interceptor tests to use assert in order to report potential errors verbosely (r350227)

  • fixed return type of devname_r() interceptor (r350228)

  • added interceptor tests for fputc, putc, putchar, getc_unlocked, putc_unlocked, putchar_unlocked (r350229)

  • added interceptor tests for popen, pclose (r350230)

  • added interceptor tests for funopen (r350231)

  • added interception support for popen, popenve, pclose (r350232)

  • added interception support for funopen* (r350233)

  • implemented FILE structure sanitization (r350882)

Additionally, the following changes have been made to other LLVM components and merged into 8.0.0:

  • enabled system-linker-elf LLD feature on NetBSD (NFC) (r350253)

  • made clang driver permit building instrumented code for all kinds of sanitizers (r351002)

  • added appropriate RPATH when building instrumented code with shared sanitizer runtime (e.g. via -shared-libasan option) (r352610)

Post-release commits were focused on fixing new or newly noticed bugs:

  • fixed Mac-specific compilation-db test to not fail on systems where the result of the getMainExecutable function depends on argv[0] being correct (r351752) (see below)

  • fixed formatting error in polly that caused tests to fail (r351808)

  • fixed missing -lutil linkage in LLDB that caused build using LLD as linker to fail (r352116)

Finding executable path in LLVM

The LLVM support library defines the getMainExecutable() function, whose purpose is to find the path to the currently executed program. It is used e.g. by clang to determine the driver mode depending on whether you executed clang or clang++, etc. It is also used to determine the resource directory when it is specified relative to the program installation directory.

The function implements a few different execution paths depending on the platform used:

  • on Apple platforms, it uses _NSGetExecutablePath()

  • on BSD variants and AIX, it does path lookup on argv[0]

  • on Linux and Cygwin, it uses /proc/self/exe, or argv[0] lookup if it is not present

  • on other platforms supporting dladdr(), it attempts to find the program path via Dl_info structure corresponding to the main function

  • on Windows, it uses GetModuleFileNameW()

For consistency, all symlinks are eliminated via realpath().

The different function versions require different arguments. argv[0]-based methods require passing argv[0]; dladdr method requires passing pointer to the main function. Other variants ignore those parameters.
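
To make the argv[0]-based variant more concrete, here is a minimal sketch of such a lookup. It is not LLVM's implementation: find_main_executable() is a hypothetical helper, and the search path is passed in explicitly (a real caller would typically pass getenv("PATH")) so that the behavior is easy to test.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for realpath() under strict -std= modes */
#endif
#include <sys/stat.h>
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve argv0 the way a shell would have found it: if it contains a
 * slash, use it as given; otherwise try each directory of `path`, a
 * colon-separated list such as $PATH.  The result is canonicalized with
 * realpath() into `out`, which must hold at least PATH_MAX bytes.
 * Returns 0 on success, -1 if no executable regular file was found.
 */
int
find_main_executable(const char *argv0, const char *path, char *out)
{
	char candidate[PATH_MAX];
	struct stat st;

	assert(argv0 != NULL && out != NULL);

	if (strchr(argv0, '/') != NULL)
		return realpath(argv0, out) != NULL ? 0 : -1;

	while (path != NULL && *path != '\0') {
		size_t dirlen = strcspn(path, ":");
		/* An empty component means the current directory. */
		const char *dir = dirlen > 0 ? path : ".";
		int dlen = dirlen > 0 ? (int)dirlen : 1;
		int n = snprintf(candidate, sizeof(candidate), "%.*s/%s",
		    dlen, dir, argv0);

		if (n > 0 && (size_t)n < sizeof(candidate) &&
		    stat(candidate, &st) == 0 && S_ISREG(st.st_mode) &&
		    access(candidate, X_OK) == 0 &&
		    realpath(candidate, out) != NULL)
			return 0;

		path += dirlen;
		if (*path == ':')
			path++;
	}
	return -1;
}
```

The sketch also shows why this method is fragile: if the caller does not pass the original argv[0] through, or the program was started via a path that is not in the search list, the lookup has nothing reliable to work with.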

When clang-check-mac-libcxx-fixed-compilation-db test was added to clang, it failed on NetBSD because the variant of getMainExecutable() used on NetBSD requires passing argv[0], and the specific part of Clang did not pass it correctly. However, the test authors did not notice the problem since non-BSD platforms normally do not use argv[0] for executable path.

I have identified three possible solutions here (ideally, all of them would be implemented simultaneously):

  1. Modifying getMainExecutable() to use KERN_PROC_PATHNAME sysctl on NetBSD (D56975).

  2. Fixing the compilation database code to pass argv[0] through.

  3. Adding -ccc-install-dir argument to the invocation in the test to force assuming specific install directory (D56976).

The sysctl change was already historically implemented (r303015). It was afterwards reverted (r303285) since it did not provide expected paths on FreeBSD when the executable was referenced via multiple links. However, NetBSD does not suffer from the same issue, so we may switch back to sysctl.

Fixing compilation database would be non-trivial as it would probably involve passing argv[0] to constructor, and effectively some API changes. Given that the code is apparently only used on Apple where argv[0] is not used, I have decided not to explore it for the time being.

Finally, passing -ccc-install-dir seems like the simplest workaround for the problem. Other tests based on paths in clang already pass this option to reliably override path detection. I've committed it as r351752.

Future plans: LLDB

The plan for the next 6 months is as follows:

  1. Restore tracing in LLDB for NetBSD (i386/amd64/aarch64) for single-threaded applications.

  2. Restore execution of LLDB regression tests, unless there is need for a significant LLDB or kernel work, mark detected bugs as failing or unsupported ones.

  3. Enable execution of LLDB regression tests on the build bot in order to catch regressions.

  4. Upstream NetBSD (i386/amd64) core(5) support. Develop LLDB regression tests (and the testing framework enhancement) as requested by upstream.

  5. Upstream NetBSD aarch64 core(5) support. This might involve generic LLDB work on the interfaces and/or kernel fixes. Add regression tests as will be requested by upstream.

  6. Rework threading plan in LLDB in Remote Process Plugin to be more agnostic to non-Linux world and support the NetBSD threading model.

  7. Add support for FPU registers on NetBSD/i386 and NetBSD/amd64.

  8. Support XSAVE, XSAVEOPT, ... registers in core(5) files on NetBSD/amd64.

  9. Add support for Debug Registers on NetBSD/i386 and NetBSD/amd64.

  10. Add support to backtrace through signal trampoline and extend the support to libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).

  11. Stabilize LLDB and address breaking tests from the test-suite.

  12. Merge LLDB with the base system (under LLVM-style distribution).

I will be working closely with Kamil Rytarowski who will support me on the NetBSD kernel side. I'm officially starting today in order to resolve the presented problems one by one.

This work will be sponsored by The NetBSD Foundation

Posted late Friday afternoon, February 1st, 2019 Tags: blog

Over the past month I've merged the LLVM compiler-rt sanitizers (LLVM svn r350590) with the base system. I've also managed to get a functional set of Makefile rules to build all of them, namely:

  • ASan
  • UBSan
  • TSan
  • MSan
  • libFuzzer
  • SafeStack
  • XRay

They build in all variations and modes that are supported by the original LLVM compiler-rt package.

Integration of sanitizers with the base system

I've submitted a patch for internal review but I was asked to push it through tech-toolchain@ first. I'm still waiting for active feedback on moving it in the proper direction.

The final merge of the build rules will be done once we get LLVM 8.0 (rc2) into the base, as there is a small ABI mismatch between Clang/LLVM (7.0svn) and compiler-rt (8.0svn). I've ported, or adapted with a hack, all the upstream tests for the supported sanitizers so that they run against the sanitizers newly integrated with the base system. Everything has been adjusted to pass, with a few exceptions that still need to be fixed: the ASan dynamic (.so) tests are still crashy, and around 1/3 of the UBSan tests fail due to an ABI mismatch. This is caused by a number of new UBSan features that are not supported by older Clang/LLVM.

Changes integrated with LLVM projects

LLVM 8.0 was branched in the middle of January, causing a lot of breakage that required collaboration with the LLVM people to get things back into proper shape. I've also taken part in the LLD porting effort with Michal Gorny.

After the branching point, existing features in compiler-rt such as LSan, SafeStack and Scudo were also refactored. I had to apply appropriate patches to these sanitizers and temporarily disable LSan until it can be fully ported.

Changes in the base system

Outside the context of sanitizers, I've fixed two bugs that relate to my previous work on interfaces for debuggers:

  • PR kern/53817 Random panics in vfs_mountroot()
  • PR lib/53343 t_ptrace_wait*:traceme_vfork_crash_bus test cases fail

Plan for the next milestone

Collect feedback for the patch integrating LLVM sanitizers and merge the final version with the base system.

Return to ptrace(2) kernel fixes and start the work with a focus on improving correctness of signal handling.

This work was sponsored by The NetBSD Foundation.

Posted Friday afternoon, February 1st, 2019 Tags: blog

Over two years ago, I made a pledge to use NetBSD as my sole operating system, and to resist booting into any other OS until I had implemented hardware-accelerated virtualization in the NetBSD kernel (the equivalent of Linux' KVM, or Hyper-V).

Today, I am here to report: Mission Accomplished!

It's been a long road, but we now have hardware-accelerated virtualization in the kernel! And while I had initially only planned to get Oracle VirtualBox working, I have, with the help of the Intel HAXM engine (the same backend used for virtualization in Android Studio) and a qemu frontend, successfully managed to boot a range of mainstream operating systems.

With the advent of Intel's open-sourcing of their HAXM engine, we now have access to an important set of features:

  • A BSD-style license.
  • Support for multiple platforms: Windows, Darwin, Linux, and now NetBSD.
  • Hardware-assisted virtualization for Intel CPUs (VT-x and EPT required).
  • Support for an arbitrary number of concurrent VMs. For simplicity's sake, NetBSD only supports 8, whereas Windows/Darwin/Linux support 64.
  • An arbitrary number of supported VCPUs per VM. All OSes support up to 64 VCPUs.
  • ioctl(2) based API (/dev/HAX, /dev/haxmvm/vmXX, /dev/haxmvm/haxmvmXXvcpuYY).
  • Implemented non-intrusively as part of the kernel, rather than as an out-of-tree, standalone executable kernel module.
  • Default compatibility with qemu as a frontend.
  • Active upstream support from Intel, which is driven by commercial needs.
  • Optimized for desktop scenarios.
  • Probably the only open-source cross-OS virtualization engine.
  • An active and passionate community that's dedicated to improving it.

As well as a few of HAXM's downsides:

  • No AMD (SVM) support (although there are community plans to implement it).
  • No support for non-x86 architectures.
  • Need for a relatively recent Intel CPU (EPT required).
  • Not as flexible as KVM-like solutions for embedded use-cases or servers.
  • Not as quick as KVM (probably 80% as fast as KVM).

If you'd like more details on HAXM, check out the following sites:

Showcase

I've managed to boot and run several operating systems as guests running on HAXM/NetBSD, all of which are multi-core (2-4 VCPUs):

  • NetBSD. And as our motto goes, of course it runs NetBSD. And, of course, my main priority as a NetBSD developer is achieving excellent support for NetBSD guest operating systems. With the massive performance gains of hardware-accelerated virtualization, it'll finally be possible to run kernel fuzzers and many other practical, real-world workloads on virtualized NetBSD systems.


    [NetBSD at the bootloader]


    [NetBSD kernel booting]


    [NetBSD at a shell prompt]


    [NetBSD with X Window session]


    [NetBSD with X Window session]


    [NetBSD guest and qemu's ACPI integration - emitted poweroff button press]


    [NetBSD in the qemu's curses mode display, which I find convenient to use, especially on a headless remote computer]

  • Linux. When I pledged not to boot any other OS before accomplishing my goal, I was mostly thinking about giving up the benefits of the Linux ecosystem (driver and software base). There is still a selection of programs that I miss, such as valgrind, but with each week we are getting closer to filling the gaps. Linux guests seem to work, however there is a need to tune or disable IOAPIC in order to get them running (I had to pass "noapic" as a Linux kernel option).


    [ArchLinux at a bootloader]


    [ArchLinux at a shell (ZSH) prompt]


    [Ubuntu's installer]


    [Ubuntu at a bootloader]

  • Windows. While I have no personal need or use-case to run Windows, it's a must-have prestigious target for virtualization. I've obtained a Windows 7 x86 trial image from the official Microsoft webpage for testing purposes. Support for Windows 8.1 or newer and for 64-bit versions is still in development in HAXM.


    [Windows 7 booting]


    [Windows 7 welcome message]


    [Windows 7 running]


    [Windows 7 Control Panel]


    [Windows 7 multitasking]


    [Windows 7 MS Paint]

  • DragonFlyBSD. I was prompted to test this FreeBSD derivative by a developer of this OS and it just worked flawlessly.


    [DragonflyBSD at a bootloader]


    [DragonflyBSD at a shell prompt]

  • FreeDOS. It seems to just work, but I have no idea what I can use it for.


    [FREEDOS at an installer]


    [FREEDOS at a command line]

Unfortunately, not all operating systems are supported yet. I've found issues with the following ones:

  • Android. The kernel booting seems fine (with "noapic"), but later during distribution load it freezes the host computer.


    [Android bootloader]


    [Android just before the host crash]

  • FreeBSD. Hangs during the booting process.


    [FreeBSD hanging at boot]

Summary and future plans

One thing I must clarify, since I'm frequently asked about it, is that HAXM/NetBSD does not attempt to compete with the NVMM (NetBSD Virtual Machine Monitor) work being done by Maxime Villard. I'm primarily doing this for my own educational purposes, and because I find reaching feature parity with other open-source projects to be important work. Additionally, NVMM only has AMD CPU support, whereas I'm primarily a user of the Intel x86 platform, which is what HAXM/NetBSD targets. The Intel port of NVMM, and NVMM in general, is still in development, which means that HAXM is probably the first solution that has ever successfully managed to run Windows on NetBSD (has anyone done it with Xen before?).

I will keep working on this project in my spare time and try to correct IOAPIC issues on Linux, hangs during FreeBSD's boot process, and Android crashes.

Most of the NetBSD-specific patches for qemu and Intel HAXM have already been merged upstream. Once this process has been completed, there are plans to make it available in pkgsrc. There's also at least one kernel-level workaround for HAXM behavior related to FPU state, which triggers an assert due to an abnormal condition. For this to be amended, fixes would have to land upstream in HAXM's code.

Subnote

I confess that I've been playing with OpenVMS/VAX in SIMH, as I have got a hobbyist license, but it can hardly be treated as competition to NetBSD. Another exception was a DTrace tutorial at EuroBSDCon 2017 in Paris, where I had to boot and use a FreeBSD image shared by the lecturer.

Posted in the wee hours of Tuesday night, January 30th, 2019 Tags: blog
Prepared by Michał Górny (mgorny AT gentoo.org).

LLD is the link editor (linker) component of the Clang toolchain. Its main advantages over GNU ld are a much lower memory footprint and faster linking. It is of specific interest to me since currently 8 GiB of memory is insufficient to link LLVM statically (which is the upstream default).

The first goal of LLD porting is to ensure that LLD can produce working NetBSD executables, and be used to build LLVM itself. Then, it is desirable to look into trying to build additional NetBSD components, and eventually into replacing /usr/bin/ld entirely with lld.

In this report, I would like to briefly summarize the issues I have found so far while trying to use LLD on NetBSD.

DT_RPATH vs DT_RUNPATH

RPATH is used to embed a library search path in the executable. Since it takes precedence over default system library paths, it can be used both to specify the location of additional program libraries and to override system libraries.

Currently, RPATH can be embedded in executables using two tags: the “old” DT_RPATH tag and the “new” DT_RUNPATH tag. The existence of two tags comes from behavior exhibited by some operating systems (e.g. glibc systems): DT_RPATH used to take precedence over LD_LIBRARY_PATH, making it impossible to override the paths specified there. Therefore, a new DT_RUNPATH tag was added that comes after LD_LIBRARY_PATH in precedence. When both DT_RPATH and DT_RUNPATH are specified, the former is ignored.

On NetBSD, DT_RPATH does not take precedence over LD_LIBRARY_PATH. Therefore, there was never a need for DT_RUNPATH, and support for it (as an alias for DT_RPATH) was added only very recently: on 2018-12-30.

Unlike GNU ld, LLD uses the “new” tag by default and therefore produces executables whose RPATHs do not work on older NetBSD versions. Given that using DT_RUNPATH on NetBSD has no real advantage, using the --disable-new-dtags option to suppress it is preferable.

More than two PT_LOAD segments

PT_LOAD segments are used to map the executable image into memory. Traditionally, GNU ld produces exactly two PT_LOAD segments: an RX text (code) segment, and an RW data segment. The NetBSD dynamic loader (ld.elf_so) hardcodes the assumption of that design. However, lld sometimes produces an additional read-only data segment, causing the assertions to fail in the dynamic loader.

I have attempted to rewrite the memory mapping routine to allow for an arbitrary number of segments. However, apparently my patch is just “a step in the wrong direction” and Joerg Sonnenberger is working on a proper fix.

Alternatively, LLD has a --no-rosegment option that can be used to suppress the additional segment and work around the problem.

Clang/LLD driver design issues

Both GCC and Clang use a design based on a front-end driver component. That is, the executable called directly by the user is a driver whose purpose is to perform initial command-line option and input processing, and run appropriate tools performing the actual work. Those tools may include the C preprocessor (cpp), C/C++ compiler (cc1), assembler, link editor (ld).

This follows the original UNIX principle of simple tools that perform a single task well, and a wrapper that combines those tools into complete workflows. Interestingly enough, it makes it possible to keep all system-specific defaults and logic in a single place, without having to make every single tool aware of them. Instead, they are passed to those tools as command-line options.

This also accounts for simpler and more portable build system design. The gcc/clang driver provides a single high-level interface for performing a multitude of tasks, including compiling assembly files or linking executables. Therefore, the build system and the user do not need to be explicitly aware of low-level tooling and its usage. Not to mention that it makes it much easier to swap that tooling transparently.

For example, if you are linking an executable via the driver, it takes care of finding the appropriate link editor (and makes it easy to change it via -fuse-ld), preparing appropriate command-line options (e.g. if you do a -m32 multilib build, it sets the emulation for you) and passing the necessary libraries to link (e.g. an appropriate standard C++ library when building a C++ program).

The clang toolchain considers LLD an explicit part of this workflow and, unlike GNU ld, ld.lld is not really suitable for stand-alone use. For example, it does not include any standard search paths for libraries, expecting the driver to provide them in the form of appropriate -L options. This way, all the logic responsible for figuring out the operating system used (including possible cross-compilation scenarios) and using appropriate paths is located in one component.

However, Joerg Sonnenberger disagrees with this and believes LLD should contain all the defaults necessary for it to be used stand-alone on NetBSD. Effectively, we have two conflicting designs: one where all logic is in the clang driver, and the other where some of the logic is moved into LLD. At this moment, LLD is following the former assumption, and the clang driver for NetBSD is following the latter. As a result, neither using LLD directly nor via clang works out of the box on NetBSD; to use either, the user would have to pass all appropriate -L and -z options explicitly.

Fixing LLD to work with the current clang driver would require adding target awareness to LLD and changing a number of defaults for NetBSD based on the target used. However, LLD maintainer Rui Ueyama is opposed to introducing this extra logic specifically for NetBSD, and believes it should be added to the clang driver as for other platforms. On the other side, the NetBSD toolchain driver maintainer Joerg Sonnenberger blocks adding this to the driver. Therefore, we have reached an impasse that prevents LLD from working out of the box on NetBSD without local patches.

A work-in-progress implementation of the local target logic approach requested by Joerg can be seen in D56650. Afterwards, additional behavior can be enabled on NetBSD by using target triple properties such as in D56215 (which copies the libdir logic from clang).

For comparison, the same problem solved in a way consistent with other distributions (and rejected by Joerg) can be seen in D56932 (which updates D33726). However, in some cases it will require adding additional options to LLD (e.g. -z nognustack, D56554), and corresponding dummy switches in GNU ld.

Handling of indirect shared library dependencies

When starting a program, the dynamic loader needs to find and load all shared libraries listed via DT_NEEDED entries in order to obtain symbols needed by the program (functions, variables). Naturally, it also needs to process DT_NEEDED entries of those libraries to satisfy their symbol dependencies, and so on. As a result, the program can also reference symbols declared in dependencies of its DT_NEEDED libraries, that is, its indirect dependencies.

While linking executables, link editors normally verify that all symbols can be resolved in one of the linked libraries. Historically, GNU ld followed the logic used by the dynamic loader and permitted symbols used by the program to be present in either direct or indirect dependencies. However, GNU gold, LLD and newer versions of GNU ld use different logic and permit only symbols provided by the direct dependencies.

Let's take an example: you are writing a program that works with .zip files, and therefore you link -lzip. However, you also implement support for .gz files, and therefore call gzopen() provided by -lz which is also a dependency of libzip.so. Now, with old GNU ld versions you could just use -lzip since it would indirectly include libz.so. However, modern linkers will refuse to link it claiming that gzopen is undefined. You need to explicitly link -lzip -lz to resolve that.

Joerg Sonnenberger disagrees with this new behavior, and explicitly preserves the old behavior in the GNU ld version used in NetBSD. However, LLD does not support the historical GNU ld behavior at all. It will probably become necessary to implement it from scratch to support NetBSD fully.

Summary

At this point, it seems that using LLD for the majority of regular packages is a goal that can be achieved soon. The main blocker right now is the disagreement between developers on how to proceed. When we can resolve that and agree on a single way forward, most of the patches become trivial.

Sadly, at this point I really do not see any way to convince either of the sides. The problem was reported in May 2017, and in the same month a fix consistent with all other platforms was provided. However, it is being blocked and LLD cannot work out of the box.

Hopefully, we will finally be able to find a way forward that does not involve keeping the upstream clang driver for NetBSD incompatible with upstream LLD, and carrying a number of patches to LLD locally to make it work.

The further goals include attempting to build the kernel and complete userland using LLD, as well as further verifying compatibility of executables produced with various combinations of linker options (e.g. static PIE executables).

Posted late Friday evening, January 18th, 2019 Tags: blog
I've finished the process of upstreaming patches to LLVM sanitizers (almost 2000 LOC of local code) and submitted new improvements for the NetBSD support upstream. Today, out of the box (in an unpatched version), we have support for a variety of compiler-rt LLVM features: ASan (finds unauthorized memory access), UBSan (finds unspecified code semantics), TSan (finds threading bugs), MSan (finds uninitialized memory use), SafeStack (double stack hardening), Profile (code coverage), XRay (dynamic code tracing); while other ones such as Scudo (hardened allocator) or DFSan (generic data flow sanitizer) are not far from complete.

The NetBSD support is no longer visibly lagging behind Linux in sanitizers, although there are still failing tests on NetBSD that are not observed on Linux. On the other hand, there are features working on NetBSD that are not functional on Linux, like sanitizing programs during the early initialization process of the OS (this is caused by the dependency on /proc on Linux, which is mounted by startup programs, while NetBSD relies on sysctl(3) interfaces that are always available).

Changes in compiler-rt

A number of patches have been merged upstream this month. Part of the upstreamed code was originally written by Yang Zheng during GSoC-2018. My work was about cleaning the patches, applying comments from upstream review and writing new regression tests. Some of the changes were newly written over the past month, like background thread support in ASan/NetBSD or NetBSD-compatible per-thread cleanup destructors in ASan and MSan.

Additionally, I've ported the LLVM profile (--coverage) feature to NetBSD and investigated the remaining failing tests. Some of the failures were caused by a copy of an older runtime already present inside the NetBSD libc (ABIv2 libc vs ABIv4 current). The remaining two tests are affected by incompatible behavior of atexit(3) in Dynamic Shared Objects. Replacing the functionality with destructors didn't work, so I've marked these tests as expected failures and moved on.

Changes in compiler-rt:

  • 3d5a3668a Reenable hard_rss_limit_mb_test.cc for android-26
  • decb231c3 Add support for background thread on NetBSD in ASan
  • 3ebc523bb Fix a mistake in previous
  • df1f46250 Update NetBSD ioctl(2) entries with 8.99.28
  • 2de4ff725 Enable asan_and_llvm_coverage_test.cc for NetBSD
  • 4d9ac421b Reimplement Thread Static Data MSan routines with TLS
  • f4a536af4 Adjust NetBSD/sha2.cc to be portable to more environments
  • 21bd4bd9f Adjust NetBSD/md2.cc to be portable to more environments
  • 1f2d0324e Adjust NetBSD/md[45].cc to be portable to more environments
  • 2835fe7cf Add support for LLVM profile for NetBSD
  • 52af2fe7c Reimplement Thread Static Data ASan routines with TLS
  • 79d385b5c Improve the comment in previous
  • 384486fa4 Expand TSan sysroot workaround to NetBSD
  • 81e370964 Enable test/msan/pthread_getname_np.cc for NetBSD
  • 19e2af50e Enable SANITIZER_INTERCEPT_PTHREAD_GETNAME_NP for NetBSD
  • 5088473c7 Fix internal_sleep() for NetBSD
  • cd24f2f94 Mark interception_failure_test.cc as passing for NetBSD and asan-dynamic-runtime
  • ececda6ca Set shared_libasan_path in lit tests for NetBSD
  • 429bc2d51 Add a new interceptors for cdbr(3) and cdbw(3) API from NetBSD
  • 71553eb50 Add new interceptors for vis(3) API in NetBSD
  • a3e78a793 Add data types needed for md2(3)/NetBSD interceptors
  • 0ddb9d099 Add interceptors for the sha2(3) from NetBSD
  • 9e2ff43a6 Add interceptors for md2(3) from NetBSD
  • 42ac31ee6 Add new interceptors for FILE repositioning stream
  • 8627b4b30 Fix a typo in the strtoi test
  • 660f7441b Revert a chunk of previous change in sanitizer_platform_limits_netbsd.h
  • 9a087462c Add interceptors for md5(3) from NetBSD
  • 086caf6a2 Add interceptors for the rmd160(3) from NetBSD
  • 8f77a2e89 Add interceptors for the md4(3) from NetBSD
  • 27af3db52 Add interceptors for the sha1(3) from NetBSD
  • 6b9f7889b Add interceptors for the strtoi(3)/strtou(3) from NetBSD
  • 195044df9 Add a new interceptors for statvfs1(2) and fstatvfs1(2) from NetBSD
  • f0835eb01 Add a new interceptor for fparseln(3) from NetBSD
  • 11ecbe602 Add new interceptor for strtonum(3)
  • 19b47fcc0 Remove XFAIL in get_module_and_offset_for_pc.cc for NetBSD-MSan
  • b3a7f1d78 Add a new interceptor for modctl(2) from NetBSD
  • 39c2acc81 Add a new interceptor for nl_langinfo(3) from NetBSD
  • 2eb9a4c53 Update GET_LINK_MAP_BY_DLOPEN_HANDLE() for NetBSD x86
  • e8dd644be Improve the regerror(3) interceptor
  • dd939986a Add interceptors for the sysctl(3) API family from NetBSD
  • 67639f9cc Add interceptors for the fts(3) API family from NetBSD
  • c8fae517a Add new interceptor for regex(3) in NetBSD

Part of the new code has been quickly ported from NetBSD to other Operating Systems, mostly FreeBSD, and when applicable to Darwin and Linux.

Changes in other LLVM projects

In order to eliminate local diffs in other LLVM projects, I've upstreamed two patches to LLVM and two to OpenMP. I've also helped other BSDs to get their support in OpenMP (DragonFlyBSD and OpenBSD).

LLVM changes:

  • 50df229c26a Add NetBSD support in needsRuntimeRegistrationOfSectionRange.
  • 267dfed3ade Register kASan shadow offset for NetBSD/amd64

OpenMP changes:

  • 67d037d Implement __kmp_is_address_mapped() for NetBSD
  • 9761977 Implement __kmp_gettid() for NetBSD
  • a72c79b Add OpenBSD support to OpenMP
  • b3d05ab Add DragonFlyBSD support to OpenMP

NetBSD changes

I've introduced 5 changes to the NetBSD source tree over the past month, not counting updates to TODO lists.

  • Raise the fill_vmentries() E2BIG limit from 1MB to 10MB
  • Correct libproc_p.a in distribution sets
  • compiler_rt: Update prepare-import.sh according to future updates
  • Correct handling of minval > maxval in strtonum(3)
  • Stop mangling __func__ for C++11 and newer

The first change is needed to handle large address space with sysctl(3) operation to retrieve the map. This feature is required in sanitizers and part of the tests were failing because within 1MB it wasn't possible to pass all the information about the process virtual map (mostly due to a large number of small allocations).

The second change was introduced to unbreak the MKPROFILE=no build; I needed this during my work on porting the modern LLVM profile feature.

The third change is a preparation for import of compiler-rt sanitizers into the NetBSD distribution.

The fourth change was a bug fix for the strtonum(3) implementation in libc.

The fifth change was intended to reuse native compiler support for the __func__ compiler symbol.

Integration of LLVM sanitizers with the NetBSD basesystem

We are ready to push support for LLVM sanitizers into the NetBSD basesystem as all the needed patches have been merged. I've divided the remaining integration work into three milestones:

  1. Import compiler-rt sources into src/. A complete diff is pending for final acceptance in internal review.
  2. Integrate building of compiler-rt sanitizers under the MKLLVM=yes option. This has been made functional, but it needs polishing and submitting to internal review.
  3. Make MKSANITIZER available out of the box with the toolchain available in "./build.sh tools". This will be a continuation of the previous point. All the MKSANITIZER patches independent of compiler type are already committed into the NetBSD distribution, however there will likely be some extra minor adaptation work here too.

Plan for the next milestone

Finish the integration of LLVM sanitizers with the NetBSD distribution.

This work was sponsored by The NetBSD Foundation.

Posted late Thursday afternoon, January 3rd, 2019 Tags: blog

Prepared by Michał Górny (mgorny AT gentoo.org).

I have recently been helping the NetBSD developers to improve the support for this operating system in various LLVM components. As you can read in my previous report, I've been focusing on fixing build and test failures for the purpose of improving the buildbot coverage.

Previously, I've resolved test failures in LLVM, Clang, LLD, libunwind, openmp and partially libc++. During the remainder of the month, I've been working on the remaining libc++ test failures, improving the NetBSD clang driver and helping Kamil Rytarowski with compiler-rt.

Locale issues in NetBSD / libc++

The remaining libc++ work focused on resolving locale issues. This consisted of two parts:

  1. Resolving incorrect assumptions in re.traits case-insensitivity translation handling: r349378 (D55746),

  2. Enabling locale support and disabling tests failing because of partial locale support in NetBSD: r349379 (D55767).

The first of the problems was related to testing the routine converting strings to common case for the purpose of case-insensitive regular expression matching. The test attempted to translate \xDA and \xFA characters (stored as char) in a UTF-8 locale, and expected both of them to map to \xFA, i.e. according to the Unicode Latin-1 Supplement. However, this behavior is only exhibited on OSX. Other systems, including NetBSD, FreeBSD and Linux, return the character unmodified (i.e. \xDA and \xFA respectively).

I've come to the conclusion that the most likely cause of the incompatible behavior is that both \xDA and \xFA alone do not comprise valid UTF-8 sequences. However, since the translation function can take only a single char and is therefore incapable of processing multi-byte sequences, it is unclear how it should handle the range \x80..\xFF that's normally used for multi-byte sequences or not used at all. Apparently, the OSX implementers decided to map it into the \u0080..\u00FF range, i.e. treat it equivalently to wchar_t, while others decided to ignore it.

After some discussion, upstream agreed on removing the two tests for now, and treating the result of this translation as undefined implementation-specific behavior. At the same time, we've left similar cases for L'\xDA' and L'\xFA' which seem to work reliably on all implementations (assuming wchar_t is using UCS-4, UCS-2 or UTF-16 encoding).

The second issue was adding libc++ target info for NetBSD, which has been missing so far. I've based this on the rules for FreeBSD, modified to use libc++abi instead of libcxxrt (the former being the upstream default, and the latter being an abandoned external project). This also included a list of supported locales, which caused a large number of additional tests to be run (if I counted correctly, 43 passing and 30 failing).

Some of the tests started failing due to limited locale support on NetBSD. Apparently, the system supports locales only for the purpose of character encoding (i.e. LC_CTYPE category), while the remaining categories are implemented as stubs (see localeconv(3)). After some thinking, we've agreed to list all locales expected by libc++ as supported since NetBSD has support files for them, and mark relevant tests as expected failures. This has the advantage that when locale support becomes more complete, the expected failures will be reported as unexpected passes, and we will know to update the tests accordingly.

Clang driver updates

At Kamil's specific request, I have looked into the NetBSD Clang driver code afterwards. My driver work focused on three separate issues:

  1. Recently added address significance tables (-faddrsig) causing crashes of binutils (r349647; D55828),

  2. Passing -D_REENTRANT when building sanitized code (with prerequisite: r349649, D55832; r349650, D55654),

  3. Establishing proper support for finding and using locally installed LLVM components, i.e. libc++, compiler-rt, etc.

The first problem was mostly a side effect of Kamil testing my z3 bump. He noticed that binutils crash on executables produced by recent versions of clang (example backtrace <http://netbsd.org/~kamil/llvm/strip.txt>). Given my earlier experiences in Gentoo, I correctly guessed that this is caused by address significance table support that is enabled by default in LLVM 7. While technically they should be ignored by tooling not supporting them, it causes verbose warnings in some versions of binutils, and explicit crashes in the 2.27 version used by NetBSD at the moment.

In order to prevent users from experiencing this, we've agreed to disable address significance tables by default on NetBSD (users can still explicitly enable them via -faddrsig). Interestingly, the block for disabling LLVM_ADDRSIG has since grown to include PS4 and Gentoo, which indicates that we're not the only ones considering this default a bad idea.

The second problem was that Kamil indicated that we should only support sanitizing reentrant versions of the system API. Accordingly, he requested that the driver automatically includes -D_REENTRANT when compiling code with any of the sanitizers enabled. I've implemented a new internal clang API that checks whether any of the sanitizers were enabled, and used it to implicitly pass this option from driver to the actual compiler instance.

The third problem is much more complex. It boils down to the driver using hardcoded paths from a few years back, assuming LLVM being installed to /usr as part of /usr/src integration. However, those paths do not really work when clang is run from the build directory or installed manually to /usr/local, which means that e.g. a clang install with libc++ does not work out of the box.

Sadly, we haven't been able to come up with a really good and reliable solution to this. Even if we could make clang find other components reasonably reliably using executable-relative paths, this would require building executables with RPATHs potentially pointing to (temporary) build directory of LLVM. Eventually, I've decided to abandon the problem and focus on passing appropriate options explicitly when using just-built clang to build and test other components. For the purpose of buildbot, this required using the following option:

-DOPENMP_TEST_FLAGS="-cxx-isystem${PWD}/include/c++/v1"

compiler-rt work

As Kamil is finalizing his work on sanitizers, he left tasks in TODO lists and I picked them up to work on them.

Build fixes

My first two fixes to compiler-rt were plain build fixes, necessary to proceed further. Those were:

  1. Fixing use of variables (whose values could not be directly determined at compile time) for array length: r349645, D55811.

  2. Detecting missing libLLVMTestingSupport and skipping tests requiring it: r349899, D55891.

The first one was a trivial coding slip that was easily fixed by using a constant value for hash length. The second one was more problematic.

Two tests in the XRay test suite required LLVMTestingSupport. However, this library is not installed by LLVM, and so is not present when building stand-alone. Normally, similar issues in LLVM were resolved by building the relevant library locally (that's e.g. what we do with gtest). In this case this wasn't feasible due to the library being more tightly coupled with LLVM itself, and adjusting the build system would be non-trivial. Therefore, since only two tests required it, we've agreed on disabling them when the library isn't present.

XRay: alignment and MPROTECT problems

The next step was to research XRay test failures. Firstly, all the tests were failing due to PaX MPROTECT. Secondly, after disabling MPROTECT I've been getting the following test failures from check-xray:

XRay-x86_64-netbsd :: TestCases/Posix/fdr-reinit.cc
XRay-x86_64-netbsd :: TestCases/Posix/fdr-single-thread.cc

Both tests segfaulted on initializing a thread-local data structure. I've come to the conclusion that somehow initializing aligned thread-local data is causing the issue, and built a simple test case for it:

struct FDRLogWriter {
  FDRLogWriter() {}
};

struct alignas(64) ThreadLocalData {
  FDRLogWriter Buffer{};
};

void f() { thread_local ThreadLocalData foo{}; }

int main() {
  f();
  return 0;
}

The really curious part of this was that the code produced by g++ worked fine, while the one produced by clang++ segfaulted. At the same time, the LLVM bytecode generated by clang worked fine on both FreeBSD and Linux. Finally, through comparing and manipulating LLVM bytecode I've found the culprit: the alignment was not respected while allocating storage for thread-local data. However, clang assumed it will be and used MOVAPS instructions that caused a segfault on unaligned data.

I wrote about the problem to tech-toolchain and received a reply that TLS alignment is mostly ignored and there is no short term plan for fixing it. As an interim solution, I wrote a patch that disables alignment tips sufficiently to prevent clang from emitting code relying on it: r350029, D56000.

As suggested by Kamil, I discussed the PaX MPROTECT issues with upstream. The direct problem with solving it the usual way is that the code relies on remapping program memory mapped by ld.so, and altering the program code in place. I've been informed that this design was chosen to keep the mechanism simple, and that XRay is explicitly declared incompatible with system security features such as PaX MPROTECT. As Kamil suggested, I've written an explicit check for MPROTECT being enabled, and made XRay fail with an explanatory error message in that case: r350030, D56049.
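To illustrate the kind of restriction involved, here is a rough behavioural probe. It is not the check from D56049 (that patch is not reproduced here), and the function name is hypothetical. It maps a writable page and then asks for it to become executable, which is the kind of transition a W^X policy such as PaX MPROTECT is expected to refuse:

```cpp
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical probe: returns true if a page that was mapped writable can
// later be switched to executable. A false result suggests that a W^X
// policy is in force (or that the probe could not allocate memory).
bool can_upgrade_to_exec() {
  long page = sysconf(_SC_PAGESIZE);
  if (page <= 0)
    return false;
  void *p = mmap(nullptr, static_cast<size_t>(page), PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  if (p == MAP_FAILED)
    return false;
  // The interesting step: request execute permission on formerly
  // writable memory.
  bool ok = mprotect(p, static_cast<size_t>(page), PROT_READ | PROT_EXEC) == 0;
  munmap(p, static_cast<size_t>(page));
  return ok;
}
```

A tool that patches code in place could call such a probe at startup and print a clear diagnostic instead of crashing later; an explicit query of the system setting, as done in the actual patch, gives a more precise answer.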

Improving stdio.h / FILE* interceptor coverage

My final work has been on improving stdio.h coverage in interceptors. It started as a task on adding NetBSD FILE structure support for interceptors (D56109), and expanded into increasing test and interceptor coverage, both for POSIX and NetBSD-specific stdio.h functions.

I have implemented tests for the following functions:

  • clearerr, feof, ferror, fileno, fgetc, getc, ungetc: D56136,

  • fputc, putc, putchar, getc_unlocked, putc_unlocked, putchar_unlocked: D56152,

  • popen, pclose: D56153,

  • funopen, funopen2 (in multiple parameter variants): D56154.

Furthermore, I have implemented missing interceptors for the following functions:

  • popen, popenve, pclose: D56157,

  • funopen, funopen2: D56158.

Kamil also pointed out that the devname_r interceptor has the wrong return type on non-NetBSD systems, and I've fixed that for completeness: D56150.

Summary

At this point, NetBSD LLVM buildbot is building and testing the following projects with no expected test failures:

  • llvm: core LLVM libraries; also includes the test suite of LLVM's lit testing tool

  • clang: C/C++ compiler

  • clang-tools-extra: extra C/C++ code manipulation tools (tidy, rename...)

  • lld: link editor

  • polly: loop and data-locality optimizer for LLVM

  • openmp: OpenMP runtime library

  • libunwind: unwinder library

  • libcxxabi: low-level support library for libcxx

  • libcxx: implementation of C++ standard library

It also builds the lldb debugger but does not run its test suite.

The support for compiler-rt is progressing but it is not ready for buildbot inclusion yet.

Future plans

Next month, I'm planning to work on the next items from the TODO list. Most notably, this includes:

  • building compiler-rt on buildbot and running the tests,

  • porting LLD to actually produce working executables on NetBSD,

  • porting remaining compiler-rt components: DFSan, ESan, LSan, shadowcallstack.

Posted Sunday night, December 30th, 2018 Tags: blog

Prepared by Michał Górny (mgorny AT gentoo.org).

I have recently been helping the NetBSD developers to improve the support for this operating system in various LLVM components. My first task in this endeavor was to fix build and test issues in as many LLVM projects as possible in the time available, and get them all covered by the NetBSD LLVM buildbot.

Including more projects in the continuous integration builds is important as it provides the means to catch regressions and new issues in NetBSD support in a timely manner. It is beneficial not only because it lets us find offending commits easily, but also because it makes other LLVM developers aware of NetBSD porting issues, and increases the chances that the patch authors will fix their mistakes themselves.

Initial buildbot setup and issues

The buildbot setup used by NetBSD is largely based on the LLDB setup used originally by Android, published in the lldb-utils repository. For the purpose of necessary changes, I have forked it as netbsd-llvm-build and Kamil Rytarowski has updated the buildbot configuration to use our setup.

Initially, the very high memory use of GNU ld combined with a high job count caused our builds to swap significantly. As a result, the builds were very slow and were frequently terminated for producing no output, as buildbot presumed them to have hung. The fix for this problem consisted of two changes.

Firstly, I have extended the building script to periodically report that it is still active. This ensured that even during prolonged linking buildbot would receive some output and would not terminate the build prematurely.

Secondly, I have split the build task into two parts. The first part uses the full ninja job count to build all static libraries. The second part runs with a reduced job count to build everything else. Since most of the LLVM source files are part of static libraries, this solution makes it possible to build as much as possible with the full job count, while reducing it only where necessary, for the later GNU ld invocations.

While working on this setup, we have been informed that the buildbot setup based on external scripts is a legacy design, and that it would be preferable to update it to define buildbot rules directly. However, we have agreed to defer that until our builds mature, as external scripts are more flexible and can be updated without having to request a restart of the LLVM buildbot.

The NetBSD buildbot is part of LLVM buildbot setup, and can be accessed via http://lab.llvm.org:8011/builders/lldb-amd64-ninja-netbsd8. The same machine is also used to run GDB and binutils build tests.

RPATH setup for LLVM builds

Another problem that needed solving was to fix RPATH in built executables to include /usr/pkg/lib, as necessary to find dependencies installed via pkgsrc. Normally, the LLVM build system sets RPATH itself, using a path based on $ORIGIN. However, I have been informed that NetBSD discourages the use of $ORIGIN, and accordingly I looked for a better solution.

Eventually, after some experimentation I have come up with the following CMake parameters:

-DCMAKE_BUILD_RPATH="${PWD}/lib;/usr/pkg/lib"
-DCMAKE_INSTALL_RPATH=/usr/pkg/lib

This explicitly disables the standard logic used by LLVM. The build-time RPATH includes the build directory explicitly so as to ensure that freshly built shared libraries are preferred at build time (e.g. when running tests) over a previous pkgsrc install; this directory is afterwards removed from RPATH when installing.

Building and testing more LLVM sub-projects

The effort so far was to include the following projects in LLVM buildbot runs:

  • llvm: core LLVM libraries; also includes the test suite of LLVM's lit testing tool

  • clang: C/C++ compiler

  • clang-tools-extra: extra C/C++ code manipulation tools (tidy, rename...)

  • lld: link editor

  • polly: loop and data-locality optimizer for LLVM

  • openmp: OpenMP runtime library

  • libunwind: unwinder library

  • libcxxabi: low-level support library for libcxx

  • libcxx: implementation of C++ standard library

  • lldb: debugger; built without test suite at the moment

Additionally, the following project was considered but it was ultimately skipped as it was not ready for wider testing yet:

  • llgo: Go compiler

My project fixes

During my work, I have been trying to upstream all the necessary changes ASAP, so as to avoid creating an additional local patch maintenance burden. This section provides a short list of all patches that have either been merged upstream or are waiting for review.

LLVM

Waiting for upstream review:

pkgsrc

  • z3 version bump (submitted to maintainer, waiting for reply)

NetBSD portability

During my work, I have met with a few interesting divergences between the assumptions made by LLVM developers and the actual behavior of NetBSD. While some of them might be considered bugs, we determined it was preferable to support the current behavior in LLVM. In this section I briefly describe each of them, and indicate the path I took in making LLVM work.

unwind.h

The problem with the unwind.h header is part of a bigger issue: while the unwinder API is somewhat defined as part of the system ABI, there is no well-defined single implementation on most of the systems. In practice, there are multiple implementations both of the unwinding library and of its headers:

  • gcc: it implements unwinder library in libgcc; also, has its own unwind.h on Linux (but not on NetBSD)

  • clang: it has its own unwind.h (but no library)

  • 'non-GNU' libunwind: stand-alone implementation of library and headers

  • llvm-libunwind: stand-alone implementation of library and headers

  • libexecinfo: provides unwinder library and unwind.h on NetBSD

Since gcc does not provide unwind.h on NetBSD, using it to build LLVM normally results in the built-in unwinder library from GCC being combined with unwind.h installed as part of libexecinfo. However, the API defined by the latter header is type-incompatible with most of the other implementations, and caused libc++abi build to fail.

In order to resolve the build issue, we agreed to use LLVM's own unwinder implementation (llvm-libunwind) which we were building anyway, via the following CMake option:

-DLIBCXXABI_USE_LLVM_UNWINDER=ON

I have started a thread about fixing unwind.h to be more compatible.

noatime behavior

noatime is a filesystem mount option that is meant to inhibit atime updates on file accesses. This is usually done in order to avoid spurious inode writes when performing read-level operations. However, unlike the other implementations, NetBSD not only disables automatic atime updates but also blocks explicit updates via the utime() family of functions.

Technically, this behavior is permitted by POSIX as it permits implementation-defined behavior on updating atimes. However, a small number of LLVM tests explicitly rely on being able to set atime on a test file, and the NetBSD behavior causes them to fail. Without a way to set atime, we had to mark those tests unsupported.

I have started a thread about noatime behavior on tech-kern.
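Whether a given filesystem accepts explicit atime updates can be probed at run time. The following is a minimal sketch assuming POSIX futimens() is available; the function name, the probe file name pattern and the chosen timestamp are illustrative only and are not taken from the LLVM tests:

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstdlib>
#include <ctime>
#include <string>

// Hypothetical probe. Returns 1 if an explicitly requested atime is stored,
// 0 if the request is rejected or silently ignored, and -1 if the probe
// file could not be created in `dir`.
int explicit_atime_honoured(const std::string &dir) {
  std::string path = dir + "/atime-probe-XXXXXX";
  int fd = mkstemp(&path[0]);
  if (fd < 0)
    return -1;

  struct timespec times[2];
  times[0].tv_sec = 1000000000;   // requested atime (September 2001)
  times[0].tv_nsec = 0;
  times[1].tv_sec = 0;
  times[1].tv_nsec = UTIME_OMIT;  // leave mtime untouched

  int result = 0;
  struct stat st;
  if (futimens(fd, times) == 0 && fstat(fd, &st) == 0 &&
      st.st_atime == times[0].tv_sec)
    result = 1;

  close(fd);
  unlink(path.c_str());
  return result;
}
```

A test suite could use a probe of this kind to decide at run time whether atime-dependent tests are meaningful on the filesystem holding its temporary directory.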

__func__ value

__func__ is defined by the standard to be an arbitrary form of function identifier. On most of the other systems, it is equal to the value of __FUNCTION__ defined by gcc, that is the undecorated function name. However, NetBSD system headers conditionally override this to __PRETTY_FUNCTION__, that is a full function prototype.

This has caused one of the LLVM tests to fail due to matching debug output. Admittedly, this was definitely a problem with the test (since __func__ can have an arbitrary value) and I have fixed it to permit the pretty function form.

Kamil Rytarowski has noted that the override is probably more accidental than expected since the header was not updated for C++11 compilers providing __func__, and started a thread about disabling it.
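The difference between the two forms, and a check that tolerates both, can be sketched as follows. This is only an illustration with made-up function names, not the actual LLVM test fix; __PRETTY_FUNCTION__ is a GCC/Clang extension:

```cpp
#include <string>

// __func__ is normally the bare function name on GCC and Clang.
std::string plain_name() { return __func__; }

// __PRETTY_FUNCTION__ includes the return type and the parameter list.
std::string pretty_name() { return __PRETTY_FUNCTION__; }

// A tolerant check: instead of comparing for equality, only require that
// the bare name appears somewhere in the reported identifier.
bool mentions(const std::string &reported, const std::string &name) {
  return reported.find(name) != std::string::npos;
}
```

With such a check, debug output is accepted whether the system headers leave __func__ alone or redefine it to the pretty form.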

tar -t output

Another difference I have noted while investigating test failures was in the output of tar -t (listing files inside a tarball). Curiously enough, both GNU tar and libarchive use C-style escapes in the file list output, while NetBSD pax/tar output the filenames raw.

The test was meant to verify whether a backslash in filenames is archived properly (i.e. not treated as equivalent to a forward slash). It failed because it expected the backslash to be escaped. I was able to fix it by permitting both forms, as the exact treatment of the backslash was not relevant to the test case at hand.

I have compared different tar implementations including NetBSD pax in the article portability of tar features.

(time_t)-1 meaning

One of the libc++ test cases was verifying the handling of negative timestamps using a value of -1 (i.e. one second before the epoch). However, this value seems to be mishandled in some of the BSD implementations, FreeBSD and NetBSD in particular. Curiously enough, other negative values work fine.

The easier side of the issue is that some functions (e.g. mktime()) use -1 as an error value. However, this can be easily fixed by inspecting errno for actual errors.

The harder side is that the kernel uses a value of -1 (called VNOVAL) to internally indicate that the timestamp is not to be changed. As a result, an attempt to update the file timestamp to one second before the epoch is going to be silently ignored.

I have fixed the test by extending the FreeBSD workaround to NetBSD, and using a different timestamp. I have also started a thread about (time_t)-1 handling on tech-kern.

Future plans

The plans for the remainder of December include, as time permits:

  • finishing the upstreaming of the aforementioned patches

  • fixing flaky tests on NetBSD buildbot

  • upstreaming (and fixing if necessary) the remaining pkgsrc patches

  • improving NetBSD support in profiling and xray (of compiler-rt)

  • porting ESan/DFSan

The long-term goals include:

  • improving support for __float128

  • porting LLD to NetBSD (currently it passes all tests but does not produce working executables)

  • finishing LLDB port to NetBSD

  • porting remaining sanitizers to NetBSD

Posted mid-morning Sunday, December 16th, 2018 Tags: blog