Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

In February 2019, I started working on LLDB, as contracted by the NetBSD Foundation. So far I've been working on reenabling continuous integration, squashing bugs, improving NetBSD core file support, extending NetBSD's ptrace interface to cover more register types and fix compat32 issues, fixing watchpoint and threading support, and porting to i386.

During the last month, I've finally managed to create proper reproducers (and tests) for the remaining concurrent signal delivery problems. I have started working on backtracing through signal trampolines, and prepared a libc++ update.

NetBSD concurrent signal updates

While finishing the last report, I was trying to reproduce some of the concurrent test failures in LLDB with plain ptrace(). I've finally managed to do that and therefore discover the factor causing all my earlier attempts to fail — concurrent signal delivery works fine unless the signal is actually delivered to the process and handled by it.

Let me explain this a bit. When a signal is delivered to a debugged process (or one of its threads), the process is stopped and the debugger receives the stopping signal via waitpid(). Now, if the debugger wishes the signal to be delivered to the process (thread), it needs to pass the signal number as an argument to PT_CONTINUE. If it neglects to do so (passes 0), the signal is discarded.

My tests so far were doing precisely that — discarding the signal. However, once I modified them to pass it back, they started failing similarly to how LLDB tests are failing.
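The difference between discarding a signal and passing it back can be sketched with plain ptrace(). This is only an illustration and not one of the actual test cases: the helper name run_traced_child() and the use of SIGUSR1 are hypothetical, the child reports through its exit status whether its handler ran, and the call follows the NetBSD argument order for ptrace(), where an address of (void *)1 means "continue where the process stopped".

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Set by the handler so the child can tell whether the signal arrived. */
static volatile sig_atomic_t handler_ran = 0;

static void on_sigusr1(int sig)
{
	(void)sig;
	handler_ran = 1;
}

/*
 * Hypothetical helper: fork a traced child that raises SIGUSR1.
 * If pass_signal is non-zero, the debugger side hands the signal back
 * via PT_CONTINUE; otherwise it passes 0 and the signal is discarded.
 * Returns 1 if the child's handler ran, 0 if it did not, -1 on error.
 */
int run_traced_child(int pass_signal)
{
	int status, sig;
	pid_t pid = fork();

	if (pid == -1)
		return -1;

	if (pid == 0) {
		/* Child: install the handler, request tracing, signal itself. */
		signal(SIGUSR1, on_sigusr1);
		if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
			_exit(2);
		raise(SIGUSR1);
		_exit(handler_ran ? 1 : 0);
	}

	/* Debugger: the child stops and the signal is reported to us. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;	/* child exited early, e.g. tracing not permitted */
	sig = WSTOPSIG(status);

	/* Resume the child; the last argument decides the signal's fate. */
	if (ptrace(PT_CONTINUE, pid, (void *)1, pass_signal ? sig : 0) == -1)
		return -1;

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

With a single thread and a single signal both variants behave as expected; the problems described here only appear once several threads receive signals concurrently.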

Whenever the debugged program receives concurrent signals to different threads and the debugger requests their delivery, the process is stopped multiple times for some of the signals. Curiously enough, during my testing every signal to a thread was reported at least once, which means no signals were lost. I suspect that, in an attempt to deliver pending concurrent signals, the kernel is passing them again to the debugger rather than to the process itself.

I've used this research to extend testing of concurrent behavior. More specifically, I have:

  1. Made the signal concurrency test into a reusable factory.

  2. Started testing passing the signal back to the process.

  3. Extended the test to verify that the signal is actually being delivered.

  4. Included catching newly-created processes in the test.

  5. Added concurrent breakpoints to the test.

  6. Added concurrent watchpoints to the test.

  7. Finally, started testing combinations of simultaneous signals, breakpoints and watchpoints.

Research into backtrace through signal trampoline

The most important of the remaining tasks was to enhance LLDB with NetBSD signal trampoline support.

Signal trampolines on NetBSD

Signal trampolines are briefly covered by the Signal delivery chapter of NetBSD Internals.

When a signal is delivered to a running program, the system needs to interrupt its execution and run its defined signal handler. Once the signal handler finishes, the program execution resumes where it left off. How this is achieved differs from system to system.

On NetBSD, a so-called signal trampoline is used. The kernel saves the program context and executes the signal handler (on newer ABIs this is done by the sendsig_siginfo() function, found e.g. in amd64/machdep.c). When the signal handler returns, it returns to a trampoline function defined by libc, which restores the saved context and therefore resumes the program execution.

From the debugger's perspective, the backtrace for a process interrupted in the midst of a signal handler ends on this trampoline function. However, it is often useful to know the state of the process just before the signal was received, and therefore the point where program execution will continue. The goal at this point was to make LLDB aware of NetBSD's trampoline design and capable of locating and using the saved context to produce a full backtrace.

The two possible solutions

There are two approaches to implementing signal trampoline handling:

  1. Explicitly detecting and processing signal trampolines in the debugger.

  2. Adding CFI code to the signal trampoline implementation in order to store the necessary information in libc itself.

GDB on NetBSD is currently using the first approach. The code (found in nbsd-tdep.c and e.g. amd64-nbsd-tdep.c) explicitly establishes whether the current frame corresponds to a signal trampoline, finds the saved context and processes it.

Long-term, the second approach is preferable. Instead of explicitly writing platform-specific code, we add CFI annotations to the trampoline code (e.g. in __sigtramp2.S). Those annotations are consumed by the toolchain and used to construct frame information inside the executable that can be afterwards consumed by the debugger.

Both approaches are therefore roughly equivalent. The main difference is that approach 1 stores platform-specific logic in the debugger, while approach 2 stores it in the executable for all debuggers to consume.

libc++ update

Another task undertaken during this period was updating libc++ in the NetBSD src tree. It was last imported in 2015, at a version roughly corresponding to the LLVM 3.7 release. This version is dated and has some bugs; in particular, it is prone to miscompilation due to undefined behavior (e.g. a segfault in std::map). I've decided to upgrade to the commit corresponding to the most recent LLVM/Clang update.

max_align_t visibility

The first problem I've hit after upgrading is that max_align_t is declared on NetBSD only for C11/C++11. However, libc++ on NetBSD exposes it unconditionally.

Kamil Rytarowski proposed to expose max_align_t unconditionally in our headers as well. Joerg Sonnenberger on the other hand wants to change libc++ instead.

Missing errno constants

Another issue I've found is that NetBSD is missing the two errno constants for robust mutexes: EOWNERDEAD and ENOTRECOVERABLE. While libc++ has a hack to redefine them when missing, it seemed a better idea to assign them on our end.

I've learned that adding errno constants involves a few changes besides adding new constants:

  1. Adding mapping to Linux compat in sys/compat/linux/common/linux_errno.c.

  2. Adding descriptions to manpage lib/libc/sys/intro.2.

  3. Adding messages to libc catalogs.

  4. Enabling appropriate features in libstdc++.

  5. Adding new error codes to libdtrace.

  6. Adding errno mapping to NFS support in sys/nfs/nfs_subs.c.

While at it, I've made sure to make it harder to accidentally miss doing some of that in the future. Notably:

  1. I've added ATF tests to make sure that libc catalogs stay in sync with errno and signal descriptions in code.

  2. I've added a script to autogenerate libdtrace errno lists.

  3. I've added a compile-time assertion that NFS errno mapping covers all values.
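The idea behind the last item can be sketched in a few lines of C. This is not the actual NFS code: the error codes, the table contents and the names wire_errno_map and to_wire_errno() are all hypothetical, and C11's _Static_assert is used here as an assumption in place of whatever assertion macro the kernel sources provide.

```c
#include <stddef.h>

/* Hypothetical local error codes; X_LAST must track the highest one. */
enum {
	X_OK = 0,
	X_PERM = 1,
	X_NOENT = 2,
	X_OWNERDEAD = 3,
	X_NOTRECOVERABLE = 4,
	X_LAST = X_NOTRECOVERABLE
};

/* Hypothetical wire-protocol codes, indexed by local error code. */
static const short wire_errno_map[] = {
	0,	/* X_OK */
	1,	/* X_PERM */
	2,	/* X_NOENT */
	5,	/* X_OWNERDEAD: no direct equivalent, use a generic code */
	5,	/* X_NOTRECOVERABLE: likewise */
};

/* Fails the build if a code is added without extending the table. */
_Static_assert(sizeof(wire_errno_map) / sizeof(wire_errno_map[0]) == X_LAST + 1,
    "wire_errno_map must have one entry per error code");

/* Translate a local error code, falling back to the generic code. */
int to_wire_errno(int err)
{
	if (err < 0 || err > X_LAST)
		return 5;
	return wire_errno_map[err];
}
```

Adding a new code to the enum and bumping X_LAST without touching the table then stops the build instead of silently reading past the end of the array at run time.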

The complete list of commits:

  1. Sync errno messages between catalog and errno.h

  2. Sync signal messages between catalog and sys_siglist

  3. Add tests for missing libc catalog entries

  4. PR standards/44921: Add errno consts for robust mutexes

  5. Enable EOWNERDEAD & ENOTRECOVERABLE in libstdc++

  6. Update dtrace errno.d mapping and add a script for it

  7. Update NFS errno mapping and add assert for correctness

The update

I have sent the libc++ update to commit 01f3a59fb3e2542fce74c768718f594d0debd0da to the mailing list for review. The proposed patch set includes:

  1. Adjusting the cleanup script for the new version.

  2. Cleaning up extraneous files from the old import (to make the diff clearer).

  3. Importing the new version and updating Makefiles.

  4. Moving headers to standard /usr/include/c++/v1 location for better interoperability.

  5. Moving libc++ to apache2 license group.

Future plans

This is the final month of my contract and therefore I would like to primarily focus on importing LLDB into the src tree. As time permits, I will continue attempting to improve support for backtracing through signal trampolines.

The exact list of remaining tasks in my contract follows:

  1. Add support to backtrace through signal trampoline and extend the support to libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).

  2. Add support for aarch64 target.

  3. Stabilize LLDB and address breaking tests from the test suite.

  4. Merge LLDB with the base system (under LLVM-style distribution).

This work is sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

Posted Monday evening, March 9th, 2020

This month I have finished porting ptrace(2) tests from other operating systems. I have determined which test scenarios were missing, compared to FreeBSD and Linux, and integrated them into the ATF framework. I have skipped some of the tests because the interesting behavior was already covered by existing tests (sometimes indirectly) or tools (like picotrace), or because the NetBSD kernel exhibits different behavior.

As my work is nearing its end, I have been trying to wrap up the state of my other projects.

ptrace(2) ATF tests

I have determined which test scenarios were missing and integrated them. Certain tests, such as those wrapping the FreeBSD-specific pdfork(2) call, were omitted as not applicable.

There are a few new tests marked as expected failures for corner cases that are scheduled to be fixed in the future.

I have also worked on SIGCHLD-based debugging and analysis of its behavior. I have found out that SA_NOCLDWAIT behaves suspiciously. This flag passed to sigaction(2) is an extension. If set, the system will not create a zombie when the child exits; instead, the child process will be automatically waited for. The same effect can be achieved by setting the signal handler for SIGCHLD to SIG_IGN. Currently it behaves differently under a debugger, as the child process is never collected and keeps waiting for the parent to collect it. According to my research this behavior is unexpected. A potential fix might not be difficult in the kernel, but due to time constraints I have decided to add an ATF test for this scenario, mark it as an expected failure and include a comment deferring this case to the future.
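The expected behavior without a debugger can be sketched as follows. This is a minimal illustration rather than the ATF test itself; the helper name child_is_autoreaped() is hypothetical, and it relies on waitpid() failing with ECHILD once a child has been collected automatically.

```c
#define _XOPEN_SOURCE 700	/* for sigaction(2) and SA_NOCLDWAIT */
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if an exited child was collected
 * automatically (waitpid() fails with ECHILD), 0 if it was left for
 * the parent to reap, and -1 on any other error.
 */
int child_is_autoreaped(void)
{
	struct sigaction sa, old;
	pid_t pid;
	int status, result;

	/* Ask the system not to turn exited children into zombies. */
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_DFL;
	sa.sa_flags = SA_NOCLDWAIT;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, &old) == -1)
		return -1;

	pid = fork();
	if (pid == 0)
		_exit(0);	/* child: exit immediately */

	if (pid == -1)
		result = -1;
	else if (waitpid(pid, &status, 0) == -1)
		result = (errno == ECHILD) ? 1 : -1;
	else
		result = 0;	/* the child was still there to be reaped */

	/* Restore the previous SIGCHLD disposition. */
	sigaction(SIGCHLD, &old, NULL);
	return result;
}
```

Replacing the SA_NOCLDWAIT setup with a SIG_IGN handler for SIGCHLD should give the same result, which is the equivalence the paragraph above refers to.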

I have also refactored the remaining threaded tests, switching them from the low-level LWP API to the pthread(3) one.

Other changes

I was working on finishing projects that were left behind.

GDB and qemu upstreaming

I'm working on upstreaming NVMM support to mainline QEMU. This process is still ongoing.

I am slowly reducing the patchset against the GDB repository.

jemalloc changes

The jemalloc allocator is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. It's the default allocator in the NetBSD Operating System since 2007.

There are a few workarounds that make jemalloc compatible with NetBSD internals and I was trying to remove them. Unfortunately, the allocator tries to initialize itself too early using a C++-like constructor and intercepts the first malloc(3). This is done before initializing libpthread, and the pthread startup code uses malloc() when registering pthread_atfork(3) callbacks. In order to make it work, we allow premature usage of the libpthread functionality. I was trying to correct this, but I've introduced slight regressions in corner cases. They are hard to debug as the allocator is corrupted internally and randomly misbehaves (hangs, occasional crashes). I've discussed addressing this properly with the upstream developers, but as reproducing the setup requires familiarity with the NetBSD development process, we are still working on it.
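As a rough illustration of what a "C++-like constructor" means in C, the sketch below uses the GCC/Clang constructor attribute to run code before main(). It is not jemalloc's code: the names allocator_ready, allocator_bootstrap() and allocator_was_ready_at_startup() are hypothetical, and the flag merely stands in for real allocator state.

```c
#include <stddef.h>

/* Hypothetical stand-in for the allocator's internal state. */
static int allocator_ready = 0;

/*
 * Runs while the program image is being initialized, before main()
 * is entered. Anything it depends on must already be usable at that
 * point, which is where ordering problems between libraries come from.
 */
__attribute__((constructor))
static void allocator_bootstrap(void)
{
	allocator_ready = 1;
}

/* Reports whether the constructor has already run. */
int allocator_was_ready_at_startup(void)
{
	return allocator_ready;
}
```

The order in which constructors from different libraries run relative to each other is what makes this kind of early initialization fragile.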

Meanwhile, I have managed to correct known Undefined Behavior issues in jemalloc and address all known issues working together with upstream.


I received write access to the syzkaller GitHub repository. I also helped to get Kernel MSan (detection of uninitialized memory access) operational on the syzbot node.

Miscellaneous changes

I helped with the libc++ upgrade that was done by Michal Gorny (but is still not merged into mainline). As part of this work we gained support for the errno codes for POSIX robust mutexes.

I have implemented missing DT_GNU_HASH support as specified by GNU and LLVM linkers. This code was based on the implementation from three other major BSDs.
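For context, DT_GNU_HASH tables are keyed by a simple string hash of the symbol name. A minimal sketch of that hash function is shown below; the function name gnu_hash() is only illustrative, and the Bloom filter and bucket/chain lookup that surround it in a real dynamic linker are omitted.

```c
#include <stdint.h>

/*
 * GNU-style symbol hash: start from 5381 and fold in each byte as
 * h = h * 33 + c, with arithmetic wrapping at 32 bits.
 */
uint32_t gnu_hash(const char *name)
{
	const unsigned char *p = (const unsigned char *)name;
	uint32_t h = 5381;

	while (*p != '\0')
		h = h * 33 + *p++;
	return h;
}
```

The dynamic linker computes this value once per looked-up symbol and uses it to pick a bucket in each loaded object's hash table.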

The micro-UBSan implementation gained support for alignment_assumptions. A number of UBSan reports were addressed.

Plan for the next and the last milestone

Upstream gdbserver support and address as many remaining bugs as the time will permit.

This work was sponsored by The NetBSD Foundation.

Posted late Tuesday afternoon, March 10th, 2020