This page is a blog mirror of sorts. It pulls in articles from the blog's feed and publishes them here (with a feed, too).

In the past 31 days, I've managed to get the core functionality of MSan to work. This is an uninitialized memory usage detector. MSan is a special sanitizer because it requires knowledge of every entry to the basesystem library and every entry to the kernel through public interfaces. This is mandatory in order to mark memory regions as initialized.

Most of the work has been done directly for MSan. However, part of the work helped generic features in compiler-rt.

Sanitizers

Changes in the sanitizer are listed below in chronological order. Almost all of the changes mentioned here landed upstream. A few small patches were reverted due to breaking non-NetBSD hosts and are rescheduled for further investigation. I maintain these patches locally and have moved on for now to work on the remaining features.

  • Move __tsan::Vector to the __sanitizer namespace. This move allows reuse of the homegrown Vector container in sanitizers other than TSan. I've implemented atexit(3) and __cxa_atexit() handling in MSan using this class.
  • I've added function renaming of the NetBSD libc calls to the common code of sanitizer interceptors (sanitizer_common_interceptors.inc). There have been 37 renames registered. In general, I don't intend to support compat code and focus entirely on newly compiled features, taking NetBSD-8 as the basesystem release. Therefore I support only the newest function version that is basically always used in new code.

    • clock_gettime -> __clock_gettime50
    • clock_getres -> __clock_getres50
    • clock_settime -> __clock_settime50
    • setitimer -> __setitimer50
    • getitimer -> __getitimer50
    • opendir -> __opendir30
    • readdir -> __readdir30
    • time -> __time50
    • localtime_r -> __localtime_r50
    • gmtime_r -> __gmtime_r50
    • gmtime -> __gmtime50
    • ctime -> __ctime50
    • ctime_r -> __ctime_r50
    • mktime -> __mktime50
    • getpwnam -> __getpwnam50
    • getpwuid -> __getpwuid50
    • getpwnam_r -> __getpwnam_r50
    • getpwuid_r -> __getpwuid_r50
    • getpwent -> __getpwent50
    • glob -> __glob30
    • wait3 -> __wait350
    • wait4 -> __wait450
    • readdir_r -> __readdir_r30
    • setlocale -> __setlocale50
    • scandir -> __scandir30
    • sigtimedwait -> __sigtimedwait50
    • sigemptyset -> __sigemptyset14
    • sigfillset -> __sigfillset14
    • sigpending -> __sigpending14
    • sigprocmask -> __sigprocmask14
    • shmctl -> __shmctl50
    • times -> __times13
    • stat -> __stat50
    • getutent -> __getutent50
    • getutxent -> __getutxent50
    • getutxid -> __getutxid50
    • getutxline -> __getutxline50
  • I've disabled intercepting absent functions on NetBSD, located in the MSan specific code: mempcpy, __libc_memalign, malloc_usable_size, stpcpy, gcvt, wmempcpy, fcvt.
  • I've added handling of symbol aliases of certain POSIX threading library functions in MSan interceptors, namely __libc_thr_keycreate -> pthread_key_create.
  • I've reused the Linux/x86_64 memory layout for NetBSD/amd64. There is a requirement to disable ASLR to get a functional mapping of memory regions on NetBSD. The NetBSD PaX ASLR implementation is too aggressive for the MSan needs, and this is similar to the TSan/NetBSD case.
  • I've switched the strerror_r(3) interceptor for NetBSD from the GNU-specific to the POSIX one. They behave differently and are not interchangeable.
  • I've disabled unsupported MSan tests on NetBSD: ftime, pvalloc and tsearch. They ship with unportable function calls that are absent on NetBSD.
  • I've corrected execution of the ifaddrs, textdomain and iconv tests on NetBSD. They were adapted for the NetBSD specific behavior.
  • I've finished handling the TLS block of the main program in the interceptors. This is a continuation of the past month's effort; the new solution is now corrected and has been made machine independent. It's worth noting that NetBSD is the only OS that handles the main thread TLS block with generic interfaces.
  • I've introduced dlopen(3) support fixes. I've corrected the retrieval of link_map pointer from the dlopen(3) opaque handler structure.
  • I've fixed the list of intercepted functions for NetBSD: disabling getpshared-specific functions from the POSIX threading library and switching from UTMP-like to UTMPX-like interceptors.
  • I've disabled a MSan test verifying fgetgrent_r() on NetBSD, as this function call is absent.
  • By mistake, I've disabled handling of fstatat()/MSan on NetBSD, and I've enabled it again.
  • MSan supports all of the strtol-like function symbols in the GNU C library. I've adapted this for the NetBSD case, intercepting only the _l variations.

NetBSD syscall hooks

I wrote a large patch (815KB!) adding support for NetBSD syscall hooks for use with sanitizers. I wrote the following description for the patch, which is still pending review:

Implement the initial set of NetBSD syscall hooks for use with sanitizers.

Add a script that generates the rules to handle syscalls
on NetBSD: generate_netbsd_syscalls.awk. It has been written
in NetBSD awk(1) (patched nawk) and is compatible with gawk.

Generate lib/sanitizer_common/sanitizer_platform_limits_netbsd.h
that is a public header for applications, and included as:
<sanitizer_common/sanitizer_platform_limits_netbsd.h>.

Generate sanitizer_netbsd_syscalls.inc that defines all the
syscall rules for NetBSD. This file is modeled after the Linux
specific file: sanitizer_common_syscalls.inc.

Start recognizing NetBSD syscalls with existing sanitizers:
ASan, ESan, HWASan, TSan, MSan.

Update the list of platform (NetBSD OS) specific structs
in lib/sanitizer_common/sanitizer_platform_limits_netbsd.

This patch does contain the most wanted structs
and handles the most wanted syscalls as of now, the rest
of them will be implemented in future when needed.

This patch is 815KB, therefore I will restrict the detailed
description to a demo:

$ uname -a
NetBSD chieftec 8.99.9 NetBSD 8.99.9 (GENERIC) #0: Mon Dec 25 12:58:16 CET 2017  root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
$ cat s.cc                                                                                                                   
#include <assert.h>
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

#include <sanitizer/netbsd_syscall_hooks.h>

int main(int argc, char *argv[]) {
  char buf[1000];
  __sanitizer_syscall_pre_recvmsg(0, buf - 1, 0);
  // CHECK: AddressSanitizer: stack-buffer-{{.*}}erflow
  // CHECK: READ of size {{.*}} at {{.*}} thread T0
  // CHECK: #0 {{.*}} in __sanitizer_syscall{{.*}}recvmsg
  return 0;
}
$ ./a.out   
=================================================================
==18015==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f7fffe9c2ff at pc 0x000000467798 bp 0x7f7fffe9c2d0 sp 0x7f7fffe9ba90
WRITE of size 48 at 0x7f7fffe9c2ff thread T16777215
    #0 0x467797 in __sanitizer_syscall_pre_impl_recvmsg /public/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_netbsd_syscalls.inc:394:3
    #1 0x4abeb2 in main (/public/llvm-build/./a.out+0x4abeb2)
    #2 0x419bba in ___start (/public/llvm-build/./a.out+0x419bba)

Address 0x7f7fffe9c2ff is located in stack of thread T0 at offset 31 in frame
    #0 0x4abd7f in main (/public/llvm-build/./a.out+0x4abd7f)

  This frame has 1 object(s):
    [32, 1032) 'buf' <== Memory access at offset 31 partially underflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-underflow /public/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_netbsd_syscalls.inc:394:3 in __sanitizer_syscall_pre_impl_recvmsg
Shadow bytes around the buggy address:
  0x4feffffd3800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x4feffffd3850: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1[f1]
  0x4feffffd3860: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3870: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3890: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd38a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18015==ABORTING

NetBSD ioctl(2) hooks

Similar to the syscall hooks, there is a need to handle every ioctl(2) call. I've created the needed patch, which is shorter this time, at less than 300KB. This code is still pending upstream review:

Introduce handling of 1200 NetBSD specific ioctl(2) calls.
Over 100 operations are disabled as unavailable or conflicting
with the existing ones (the same operation number).

Add a script that generates the rules to detect ioctls on NetBSD.
The generate_netbsd_ioctls.awk script has been written
in NetBSD awk(1) (patched nawk) and is compatible with gawk.

Generate lib/sanitizer_common/sanitizer_netbsd_interceptors_ioctl.inc
with the awk(1) script.

Update sanitizer_platform_limits_netbsd accordingly to add the needed
definitions.

New patches still pending for upstream review

There are two corrections that I've created that are still pending upstream review.

I've got a few more local patches that require cleanup before submitting to review.

NetBSD basesystem corrections

I've introduced a few corrections in the NetBSD codebase:

  • Rename of a local function uname() in ps(1) to usrname(). This removes a name clash with libc. This situation is legal in the C language, but unwanted in the POSIX environment, and it makes it harder to use sanitizers with such code.
  • Removal of three unused and unimplemented legacy syscalls: sstk (stack section size change), sbrk (change data segment size) and vadvise (specify system paging behaviour). The sbrk syscall could be removed as the functionality is implemented in libc brk(2) in assembly.
  • I've detected and corrected a bug in the pipe2() syscall: it was incorrectly returning two integers instead of a single integer with the status of the operation. This was a leftover from the pipe(2) system call behavior and implementation detail. I've introduced refactoring to address this and make the code cleaner.
  • Committed and fixed the jobs variable usage in sh(1) - by Christos Zoulas and K. Robert Elz - the first real (but harmless) bug detected and squashed thanks to MSan.

Sanitizers in Go

I've prepared a scratch port of TSan and MSan to the Go language environment. This code mostly works. However, there are remaining bugs that must be fixed.

Results of ./race.bash:

Passed 340 of 347 tests (97.98%, 0+, 7-)
0 expected failures (0 has not fail)

The MSan state as of today

Although I've managed to pass more than 90% of the tests in the check-msan target within approximately one week, the gap between passing most of the tests and sanitizing real-world applications - even small ones like cat(1) - is large. This pushed me towards supporting most of the important NetBSD syscalls and NetBSD ioctls, and requires me to support most of the libc, libutil, librt, libm and libkvm entry calls.

********************
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Testing Time: 27.92s
********************
Failing Tests (7):
    MemorySanitizer-X86_64 :: chained_origin_with_signals.cc
    MemorySanitizer-X86_64 :: dtls_test.c
    MemorySanitizer-X86_64 :: ioctl_custom.cc
    MemorySanitizer-X86_64 :: sem_getvalue.cc
    MemorySanitizer-X86_64 :: signal_stress_test.cc
    MemorySanitizer-X86_64 :: textdomain.cc
    MemorySanitizer-X86_64 :: tzset.cc

  Expected Passes    : 97
  Expected Failures  : 1
  Unsupported Tests  : 27
  Unexpected Failures: 7

I can already execute cat(1) under sanitizers, and this milestone was achieved just at the end of the past month.

Most other NetBSD base programs are not yet usable. A few examples:

  • test(1) - bug with handling isspace(3), still not understood.
  • sh(1) - bug with signal-specific function calls.
  • ksh(1) - clash with a libc symbol.
  • ps(1) - lack of libkvm(3) handling.

There are also general stability problems with signals and forks. This all makes MSan not ready for larger applications like LLDB.

Solaris support in sanitizers

I've helped the Solaris team add basic support for sanitizers (ASan, UBSan). This does not help NetBSD directly; however, it indirectly improves the overall support for non-Linux hosts and helps to catch more Linuxisms in the code.

Plan for the next milestone

I plan to continue the work on MSan and correct sanitizing of the NetBSD basesystem utilities. This requires me to iterate over the basesystem libraries, implementing the missing interceptors and correcting the support of the existing ones. My milestone is to build all src/*bin* programs against Memory Sanitizer and, when possible, execute them cleanly.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted Wednesday afternoon, January 3rd, 2018 Tags: blog

The NetBSD Project is pleased to announce NetBSD 7.1.1, the first security/bugfix update of the NetBSD 7.1 release branch. It represents a selected subset of fixes deemed important for security or stability reasons. If you are running an earlier release of NetBSD, we strongly suggest updating to 7.1.1.

For more details, please see the release notes.

Complete source and binaries for NetBSD are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, and other services may be found at http://www.NetBSD.org/mirrors/.

Posted early Thursday morning, December 28th, 2017 Tags: blog

During the past month I've finished my work on TSan for NetBSD/amd64. There are still a few minor issues, but the sanitizer is already stable and suitable for real applications. I was able to build real applications like LLDB against TSan and use it to find real threading problems.

The process of stabilizing and fixing TSan was challenging, as intermixed types of issues resulted in one big random breakage that was difficult to analyze. Software debuggers need more work with threaded programs, so this was a chicken-and-egg problem: debugging the debugging utilities.

Corrections

Most of the corrections were in TSan-specific and Common Sanitizer code. There was also one fix in LSan.

TSan: on_exit()/atexit(3)/__cxa_atexit()

There are different function types for the same purpose: to execute a callback function on thread or process termination. The existing code in TSan wasn't compatible with the NetBSD Operating System:

  • on_exit() - This function is Linux-specific, I've disabled it for NetBSD.
  • atexit(3) - It was reimplemented by TSan using __cxa_atexit(), however in a way that is incompatible with NetBSD. TSan was attempting to register a wrapper callback through __cxa_atexit() with the second argument as a function pointer and the third argument (Dynamic Shared Object pointer) equal to NULL. This approach is not portable and it broke on NetBSD, therefore I had to add a new implementation based on a stack (LIFO container).
  • Every atexit(3) registration is intercepted by TSan: the sanitizer pushes the callback to the local LIFO container and passes its own wrapper function to the system. When the OS executes a callback, it calls the wrapper, which pops the originally saved function pointer from the stack and executes it.
  • __cxa_atexit() - This callback shares TSan internals with atexit(3) and is functional on NetBSD.

To verify the changes, I've added a new test named atexit3, which checks the correct order of execution of the atexit(3) callbacks.
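
The stack-based scheme described in this section can be sketched roughly as follows. This is only an illustration and not the TSan code: the demo_* names, the fixed-size array and the recording callbacks are all hypothetical, and the real interceptor does additional bookkeeping.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_CALLBACKS 32

typedef void (*demo_exit_callback)(void);

/* LIFO container holding the callbacks the program asked to register. */
static demo_exit_callback demo_saved[DEMO_MAX_CALLBACKS];
static int demo_saved_count;

/* The only function the system ever sees. Each invocation pops and runs
   the most recently saved callback. */
void demo_atexit_wrapper(void) {
  if (demo_saved_count == 0)
    return; /* nothing left to run */
  demo_exit_callback cb = demo_saved[--demo_saved_count];
  assert(cb != NULL);
  cb();
}

/* Stand-in for the interceptor: register the wrapper with the system and
   remember the real callback. Returns 0 on success, -1 on failure. */
int demo_intercepted_atexit(demo_exit_callback cb) {
  if (cb == NULL || demo_saved_count == DEMO_MAX_CALLBACKS)
    return -1;
  if (atexit(demo_atexit_wrapper) != 0)
    return -1;
  demo_saved[demo_saved_count++] = cb;
  return 0;
}

/* Hypothetical callbacks that record the order in which they run. */
char demo_log[8];
int demo_log_len;

static void demo_record(char c) {
  if (demo_log_len < (int)sizeof(demo_log) - 1)
    demo_log[demo_log_len++] = c;
}

void demo_first(void) { demo_record('1'); }
void demo_second(void) { demo_record('2'); }
```

Because the system runs the registered wrappers in reverse order of registration and every wrapper call pops the newest saved pointer, the callbacks still run last-registered-first, which is the ordering a test such as atexit3 is meant to check.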

TSan: _lwp_exit()

In order to detect a thread's termination by the TSan interceptors, a mechanism to register a callback function in the pthread(3) destructor was used. The destructor callback was registered with pthread_key_create(3) and this approach was broken on NetBSD for two reasons.

  1. We cannot register it during early libc and libpthread(3) bootstrap, as the system functions need to initialize.
  2. The execution of callback functions is not the last event during a POSIX thread entity termination.

I was looking for a mechanism to defer the destructor callback registration to subsequent libc initialization stages, similar to constructor sections. I realized that this approach was suboptimal because it resulted in further breakage. The NetBSD implementation of POSIX thread termination notifies a parent thread (the waiter for join) and then still attempts to acquire a mutex. TSan assumed that no thread-specific function, such as a mutex acquisition, would be called any longer, and had already destroyed part of the thread-specific data needed to trace such events. I've switched the POSIX thread termination detection to interception of the _lwp_exit(2) call, as it's truly the last interceptable function on NetBSD, detaching the low-level thread entity (LWP) that is the kernel context for a POSIX thread.

TSan: Thread Joined vs Thread Exited

Correcting the detection of termination of a thread caused new problems, with a race between two event notifications that happen at the same time:

  • Thread A sleeps waiting for joining of thread B.
  • Thread B wakes thread A notifying it as joinable.
  • Thread B terminates calling _lwp_exit().

TSan traces both events, joining and exiting, and they must be intercepted in the order of exiting followed by joining (unless a thread is marked to be detached without joining).

This problem has been analyzed and fixed by the introduction of atomic-function waiters in low-level parts (not exposed to TSan or other sanitizers), which cause ThreadRegistry::JoinThread to busy-wait until ThreadRegistry::FinishThread signals the end of its execution. This approach has proven stable and so far no failures have been observed. There was a tiny breakage on ppc64-linux, as this change introduced an infinite freeze, but it was caused by an unrelated problem and the faulty test was switched from failing to unsupported.
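
The ordering constraint (exit recorded first, join second) can be pictured with a small C11 sketch. It is an assumption-laden illustration rather than the ThreadRegistry code: the demo_* names and the single atomic flag are hypothetical, and sched_yield() is used here merely to make the spin polite.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-thread record shared by the exiting and joining sides. */
struct demo_thread_state {
  atomic_int finished; /* becomes 1 once the exit has been recorded */
};

void demo_thread_init(struct demo_thread_state *t) {
  assert(t != NULL);
  atomic_init(&t->finished, 0);
}

int demo_thread_is_finished(struct demo_thread_state *t) {
  assert(t != NULL);
  return atomic_load_explicit(&t->finished, memory_order_acquire);
}

/* Called on the exiting thread, after its exit bookkeeping is complete. */
void demo_finish_thread(struct demo_thread_state *t) {
  assert(t != NULL);
  atomic_store_explicit(&t->finished, 1, memory_order_release);
}

/* Called on the joining thread: busy-wait until the exit was recorded,
   so that the join is never processed before the exit. */
void demo_join_thread(struct demo_thread_state *t) {
  assert(t != NULL);
  while (!demo_thread_is_finished(t))
    sched_yield();
  /* Join bookkeeping would follow here. */
}
```

The release store paired with the acquire load ensures that everything the exiting side wrote before setting the flag is visible to the joining side once the loop ends.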

Sanitizers: GetTls

I've implemented the initial support for determining whether a memory buffer is allocated as Thread-Local Storage. The current approach uses the FreeBSD code; however, it's subject to future improvement in order to make it more generic and aware of dynamically allocated TLS vectors (such as those created after dlopen(3)).

Sanitizers: Handling NetBSD specific indirection of libpthread functions

I've corrected handling of three libpthread(3) functions on NetBSD:

  • pthread_mutex_lock(3),
  • pthread_mutex_unlock(3),
  • pthread_setcancelstate(3).

Code outside the libpthread(3) context uses the libc symbols:

  • __libc_mutex_lock,
  • __libc_mutex_unlock,
  • __libc_thr_setcancelstate.

The threading library (libpthread(3)) defines strong aliases:

  • __strong_alias(__libc_mutex_lock,pthread_mutex_lock)
  • __strong_alias(__libc_mutex_unlock,pthread_mutex_unlock)
  • __strong_alias(__libc_thr_setcancelstate,pthread_setcancelstate)

As a result, these functions were invisible to sanitizers on NetBSD. I've introduced interception of the libc-specific functions and added them as NetBSD-specific aliases for the common pthread(3) functions.

NetBSD needs to intercept both functions, as the regularly named ones are used internally in libpthread(3).
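
Why both names matter can be shown with a small sketch of a strong alias. It assumes GCC or Clang on an ELF platform and uses the compiler's alias attribute directly instead of NetBSD's __strong_alias macro; the demo_* functions are hypothetical stand-ins for the real locking functions.

```c
#include <assert.h>

static int demo_lock_calls;

/* Stand-in for the function defined by the threading library. */
int demo_pthread_mutex_lock(void) {
  return ++demo_lock_calls;
}

/* Second symbol name bound to the very same code, comparable to
   __strong_alias(__libc_mutex_lock,pthread_mutex_lock). */
int demo_libc_mutex_lock(void)
    __attribute__((alias("demo_pthread_mutex_lock")));

/* Both names must resolve to the same address. */
void demo_check_alias(void) {
  assert(demo_libc_mutex_lock == demo_pthread_mutex_lock);
}
```

A caller that reaches the code through demo_libc_mutex_lock never references the other symbol name, so a tool that hooks only demo_pthread_mutex_lock would miss that call.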

Sanitizers: Adding DemangleFunctionName for backtracing on NetBSD

NetBSD uses indirection for old threading functions for historical reasons. The mangled names are an internal implementation detail and should not be exposed even in backtraces.

  • __libc_mutex_init -> pthread_mutex_init
  • __libc_mutex_lock -> pthread_mutex_lock
  • __libc_mutex_trylock -> pthread_mutex_trylock
  • __libc_mutex_unlock -> pthread_mutex_unlock
  • __libc_mutex_destroy -> pthread_mutex_destroy
  • __libc_mutexattr_init -> pthread_mutexattr_init
  • __libc_mutexattr_settype -> pthread_mutexattr_settype
  • __libc_mutexattr_destroy -> pthread_mutexattr_destroy
  • __libc_cond_init -> pthread_cond_init
  • __libc_cond_signal -> pthread_cond_signal
  • __libc_cond_broadcast -> pthread_cond_broadcast
  • __libc_cond_wait -> pthread_cond_wait
  • __libc_cond_timedwait -> pthread_cond_timedwait
  • __libc_cond_destroy -> pthread_cond_destroy
  • __libc_rwlock_init -> pthread_rwlock_init
  • __libc_rwlock_rdlock -> pthread_rwlock_rdlock
  • __libc_rwlock_wrlock -> pthread_rwlock_wrlock
  • __libc_rwlock_tryrdlock -> pthread_rwlock_tryrdlock
  • __libc_rwlock_trywrlock -> pthread_rwlock_trywrlock
  • __libc_rwlock_unlock -> pthread_rwlock_unlock
  • __libc_rwlock_destroy -> pthread_rwlock_destroy
  • __libc_thr_keycreate -> pthread_key_create
  • __libc_thr_setspecific -> pthread_setspecific
  • __libc_thr_getspecific -> pthread_getspecific
  • __libc_thr_keydelete -> pthread_key_delete
  • __libc_thr_once -> pthread_once
  • __libc_thr_self -> pthread_self
  • __libc_thr_exit -> pthread_exit
  • __libc_thr_setcancelstate -> pthread_setcancelstate
  • __libc_thr_equal -> pthread_equal
  • __libc_thr_curcpu -> pthread_curcpu_np

This demangling also fixes several tests that expect the regular pthread(3) function names.
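
A name translation of this kind can be sketched as a simple table lookup. The function below is a hypothetical illustration with only a handful of the pairs listed above; it is not the DemangleFunctionName implementation from the sanitizer sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_name_pair {
  const char *internal_name; /* symbol as it appears in the binary */
  const char *public_name;   /* name that a backtrace should show */
};

/* A subset of the pairs listed above. */
static const struct demo_name_pair demo_names[] = {
  {"__libc_mutex_lock", "pthread_mutex_lock"},
  {"__libc_mutex_unlock", "pthread_mutex_unlock"},
  {"__libc_cond_wait", "pthread_cond_wait"},
  {"__libc_thr_keycreate", "pthread_key_create"},
  {"__libc_thr_curcpu", "pthread_curcpu_np"},
};

/* Return the public name for a known internal symbol,
   or the input unchanged when there is no entry for it. */
const char *demo_demangle_function_name(const char *name) {
  assert(name != NULL);
  for (size_t i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
    if (strcmp(name, demo_names[i].internal_name) == 0)
      return demo_names[i].public_name;
  }
  return name;
}
```

An explicit table is used because the mapping is not a plain prefix swap: __libc_thr_keycreate becomes pthread_key_create and __libc_thr_curcpu becomes pthread_curcpu_np.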

TSan: Handling NetBSD specific indirection of libpthread functions

I've corrected handling of libpthread(3) functions in TSan/NetBSD:

  • pthread_cond_init(3),
  • pthread_cond_signal(3),
  • pthread_cond_broadcast(3),
  • pthread_cond_wait(3),
  • pthread_cond_destroy(3),
  • pthread_mutex_init(3),
  • pthread_mutex_destroy(3),
  • pthread_mutex_trylock(3),
  • pthread_rwlock_init(3),
  • pthread_rwlock_destroy(3),
  • pthread_rwlock_rdlock(3),
  • pthread_rwlock_tryrdlock(3),
  • pthread_rwlock_wrlock(3),
  • pthread_rwlock_trywrlock(3),
  • pthread_rwlock_unlock(3),
  • pthread_once(3).

Code outside the libpthread(3) context uses the libc symbols that are prefixed with __libc_, for example: __libc_cond_init.

As a result, these functions were invisible to sanitizers on NetBSD. I've introduced interception of the libc-specific functions and added them as NetBSD-specific aliases for the common pthread(3) functions.

NetBSD needs to intercept both functions, as the regularly named ones are used internally in libpthread(3).

TSan: Correcting NetBSD support in pthread_once(3)

The pthread_once(3)/NetBSD type is built with the following structure:

struct __pthread_once_st { pthread_mutex_t pto_mutex; int pto_done; };

I've set the pto_done position as shifted by __sanitizer::pthread_mutex_t_sz from the beginning of the pthread_once struct.

This corrects deadlocks when the pthread_once(3) function is used.
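
The offset computation can be illustrated with a mirror of the structure quoted above. The demo_* names are hypothetical, the host's pthread_mutex_t stands in for the NetBSD one, and the sketch assumes that no padding separates the two fields (the static assertion rejects platforms where that does not hold).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Mirror of the layout quoted above, under a hypothetical name. */
struct demo_pthread_once_st {
  pthread_mutex_t pto_mutex;
  int pto_done;
};

/* The shortcut below is only valid when pto_done directly follows the mutex. */
static_assert(offsetof(struct demo_pthread_once_st, pto_done) ==
                  sizeof(pthread_mutex_t),
              "pto_done is expected right after pto_mutex");

/* Locate pto_done inside an opaque once-control object by skipping
   sizeof(pthread_mutex_t) bytes from its beginning. */
int *demo_once_done_field(void *once) {
  return (int *)((char *)once + sizeof(pthread_mutex_t));
}
```

A tool that only knows the size of the mutex type, and not the full structure definition, can find the done flag this way.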

Sanitizers: Plug dlerror() leak for swift_demangle

InitializeSwiftDemangler() attempts to resolve the swift_demangle symbol. If it is not available, a dlerror() message leak is observed.

LSan: Detecting thread's termination

I've fixed the same problem that was analyzed in TSan by switching to the _lwp_exit(2) approach.

Sanitizers: Handling symbol renaming of sigaction on NetBSD

NetBSD uses the __sigaction14 symbol name for historical and compat reasons for the sigaction(2) function name.

I've renamed the interceptors and users of sigaction to sigaction_symname and I've reused it in the code base.

TSan: Correcting mangled_sp on NetBSD/amd64

I've fixed the LongJmp(3) function on NetBSD by pointing it at the correct location of the RSP (stack pointer) register on NetBSD/amd64.

TSan: Supporting the setjmp(3) family of functions on NetBSD/amd64

I've added support for handling the setjmp(3)/longjmp(3) family of functions on NetBSD/amd64.

There are three types of them on NetBSD:

  • setjmp(3) / longjmp(3)
  • sigsetjmp(3) / siglongjmp(3)
  • _setjmp(3) / _longjmp(3)

Due to historical and compat reasons the symbol names are mangled:

  • setjmp -> __setjmp14
  • longjmp -> __longjmp14
  • sigsetjmp -> __sigsetjmp14
  • siglongjmp -> __siglongjmp14
  • _setjmp -> _setjmp
  • _longjmp -> _longjmp

This leads to symbol renaming in the existing codebase.

There is no such symbol as __sigsetjmp/__longsetjmp on NetBSD so it has been disabled.

Additionally, I've added a comment that the GNU-style executable stack note is not needed on NetBSD. The stack is not executable without it.

TSan: Deferring StartBackgroundThread() and StopBackgroundThread()

NetBSD cannot spawn new POSIX thread entities in the early libc and libpthread initialization stages. I've deferred this to the point of intercepting the first pthread_create(3) call.
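
Deferring one-time work to the first intercepted call can be sketched with a C11 atomic flag. This is a hypothetical illustration: the demo_* functions only count invocations and do not correspond to the actual StartBackgroundThread() code path.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag demo_background_started = ATOMIC_FLAG_INIT;
static int demo_background_starts;

/* Stand-in for the expensive start-up step that must not run too early. */
static void demo_start_background_thread(void) {
  assert(demo_background_starts == 0); /* must only ever run once */
  demo_background_starts++;
}

/* Called at the top of a (hypothetical) pthread_create interceptor.
   Only the first caller sees the flag clear and performs the start-up. */
void demo_on_pthread_create(void) {
  if (!atomic_flag_test_and_set(&demo_background_started))
    demo_start_background_thread();
}

int demo_background_start_count(void) {
  return demo_background_starts;
}
```

By the time a program calls pthread_create(3), libc and libpthread have finished bootstrapping, so work that needs a functional threading library can safely run at that point.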

This is the last change that makes Thread Sanitizer functional on NetBSD/amd64 without downstream patches.

Final TSan results

Results for the check-tsan test-target.

********************
Testing Time: 64.91s
********************
Failing Tests (5):
    ThreadSanitizer-x86_64 :: dtls.c
    ThreadSanitizer-x86_64 :: ignore_lib5.cc
    ThreadSanitizer-x86_64 :: ignored-interceptors-mmap.cc
    ThreadSanitizer-x86_64 :: mutex_lock_destroyed.cc
    ThreadSanitizer-x86_64 :: vfork.cc

  Expected Passes    : 290
  Expected Failures  : 1
  Unsupported Tests  : 83
  Unexpected Failures: 5

These results show that all crucial issues are now fixed, and this sanitizer can be used to trace real software. The remaining problems are minor ones and they are scheduled to be fixed in the future:

  • signal_block.cc - there is some race; sometimes it works and sometimes it does not.
  • dtls.c - it looks like dynamically allocated TLS vectors are missing on the NetBSD side.
  • vfork.cc - testing UB; it looks like NetBSD behaves the same way as Linux does, however the test is failing.
  • mutex_lock_destroyed.cc - it is based on UB implemented in style of Linux.
  • The other tests fail for similar rare case scenarios like massive mmap(2) calls that seem to overflow the shadow.

LLVM JIT

As noted in the previous reports, there is an ongoing process to improve NetBSD compatibility with existing Just-In-Time frameworks in LLVM. In the past month the existing code has been adjusted to the point of passing all existing LLVM tests of JIT code on NetBSD under PaX MPROTECT.

Scudo hardened allocator

I've added initial support for NetBSD in the Scudo hardened allocator. I keep this code locally in pkgsrc-wip/compiler-rt-netbsd.

More work is needed in order to correct the known failures in tests. These are largely caused by the fact that Scudo was a Linux-only feature and the existing tests depend on GLIBC specific internals. They need to be adapted for the default NetBSD allocator (jemalloc(3)).

********************
Testing Time: 5.40s
********************
Failing Tests (32):
    Scudo-i386 :: double-free.cpp
    Scudo-i386 :: interface.cpp
    Scudo-i386 :: memalign.c
    Scudo-i386 :: mismatch.cpp
    Scudo-i386 :: options.cpp
    Scudo-i386 :: overflow.c
    Scudo-i386 :: preload.cpp
    Scudo-i386 :: quarantine.c
    Scudo-i386 :: realloc.cpp
    Scudo-i386 :: rss.c
    Scudo-i386 :: secondary.c
    Scudo-i386 :: sizes.cpp
    Scudo-i386 :: valloc.c
    Scudo-x86_64 :: alignment.c
    Scudo-x86_64 :: double-free.cpp
    Scudo-x86_64 :: interface.cpp
    Scudo-x86_64 :: malloc.cpp
    Scudo-x86_64 :: memalign.c
    Scudo-x86_64 :: mismatch.cpp
    Scudo-x86_64 :: options.cpp
    Scudo-x86_64 :: overflow.c
    Scudo-x86_64 :: preload.cpp
    Scudo-x86_64 :: quarantine.c
    Scudo-x86_64 :: random_shuffle.cpp
    Scudo-x86_64 :: realloc.cpp
    Scudo-x86_64 :: rss.c
    Scudo-x86_64 :: secondary.c
    Scudo-x86_64 :: sized-delete.cpp
    Scudo-x86_64 :: sizes.cpp
    Scudo-x86_64 :: threads.c
    Scudo-x86_64 :: valloc.c

  Expected Passes    : 8
  Unexpected Failures: 32

Plans for the next milestone

The next goal is to finish MSan and switch back to LLDB restoration for tracing single threaded programs.

The TSan corrections indirectly increased the number of passing MSan tests. I'm going to solve the detected problems, and thanks to the experience with other sanitizers the MSan issues don't seem as challenging as they did before finishing TSan.

********************
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Testing Time: 30.91s
********************
Failing Tests (69):
    MemorySanitizer-x86_64 :: allocator_returns_null.cc
    MemorySanitizer-x86_64 :: backtrace.cc
    MemorySanitizer-x86_64 :: c-strdup.c
    MemorySanitizer-x86_64 :: chained_origin.cc
    MemorySanitizer-x86_64 :: chained_origin_empty_stack.cc
    MemorySanitizer-x86_64 :: chained_origin_limits.cc
    MemorySanitizer-x86_64 :: chained_origin_memcpy.cc
    MemorySanitizer-x86_64 :: chained_origin_with_signals.cc
    MemorySanitizer-x86_64 :: check_mem_is_initialized.cc
    MemorySanitizer-x86_64 :: death-callback.cc
    MemorySanitizer-x86_64 :: dlopen_executable.cc
    MemorySanitizer-x86_64 :: dso-origin.cc
    MemorySanitizer-x86_64 :: dtls_test.c
    MemorySanitizer-x86_64 :: dtor-base-access.cc
    MemorySanitizer-x86_64 :: dtor-bit-fields.cc
    MemorySanitizer-x86_64 :: dtor-derived-class.cc
    MemorySanitizer-x86_64 :: dtor-multiple-inheritance-nontrivial-class-members.cc
    MemorySanitizer-x86_64 :: dtor-multiple-inheritance.cc
    MemorySanitizer-x86_64 :: dtor-trivial-class-members.cc
    MemorySanitizer-x86_64 :: dtor-vtable-multiple-inheritance.cc
    MemorySanitizer-x86_64 :: dtor-vtable.cc
    MemorySanitizer-x86_64 :: fork.cc
    MemorySanitizer-x86_64 :: ftime.cc
    MemorySanitizer-x86_64 :: getaddrinfo-positive.cc
    MemorySanitizer-x86_64 :: getaddrinfo.cc
    MemorySanitizer-x86_64 :: getc_unlocked.c
    MemorySanitizer-x86_64 :: heap-origin.cc
    MemorySanitizer-x86_64 :: icmp_slt_allones.cc
    MemorySanitizer-x86_64 :: iconv.cc
    MemorySanitizer-x86_64 :: ifaddrs.cc
    MemorySanitizer-x86_64 :: insertvalue_origin.cc
    MemorySanitizer-x86_64 :: mktime.cc
    MemorySanitizer-x86_64 :: mmap.cc
    MemorySanitizer-x86_64 :: msan_copy_shadow.cc
    MemorySanitizer-x86_64 :: msan_dump_shadow.cc
    MemorySanitizer-x86_64 :: msan_print_shadow.cc
    MemorySanitizer-x86_64 :: msan_print_shadow2.cc
    MemorySanitizer-x86_64 :: origin-store-long.cc
    MemorySanitizer-x86_64 :: param_tls_limit.cc
    MemorySanitizer-x86_64 :: print_stats.cc
    MemorySanitizer-x86_64 :: pthread_getattr_np_deadlock.cc
    MemorySanitizer-x86_64 :: pvalloc.cc
    MemorySanitizer-x86_64 :: readdir64.cc
    MemorySanitizer-x86_64 :: realloc-large-origin.cc
    MemorySanitizer-x86_64 :: realloc-origin.cc
    MemorySanitizer-x86_64 :: report-demangling.cc
    MemorySanitizer-x86_64 :: scandir.cc
    MemorySanitizer-x86_64 :: scandir_null.cc
    MemorySanitizer-x86_64 :: select_float_origin.cc
    MemorySanitizer-x86_64 :: select_origin.cc
    MemorySanitizer-x86_64 :: sem_getvalue.cc
    MemorySanitizer-x86_64 :: signal_stress_test.cc
    MemorySanitizer-x86_64 :: sigwait.cc
    MemorySanitizer-x86_64 :: stack-origin.cc
    MemorySanitizer-x86_64 :: stack-origin2.cc
    MemorySanitizer-x86_64 :: strerror_r-non-gnu.c
    MemorySanitizer-x86_64 :: strlen_of_shadow.cc
    MemorySanitizer-x86_64 :: strndup.cc
    MemorySanitizer-x86_64 :: textdomain.cc
    MemorySanitizer-x86_64 :: times.cc
    MemorySanitizer-x86_64 :: tls_reuse.cc
    MemorySanitizer-x86_64 :: tsearch.cc
    MemorySanitizer-x86_64 :: tzset.cc
    MemorySanitizer-x86_64 :: unaligned_read_origin.cc
    MemorySanitizer-x86_64 :: unpoison_string.cc
    MemorySanitizer-x86_64 :: use-after-dtor.cc
    MemorySanitizer-x86_64 :: use-after-free.cc
    MemorySanitizer-x86_64 :: wcsncpy.cc

  Expected Passes    : 38
  Expected Failures  : 1
  Unsupported Tests  : 24
  Unexpected Failures: 69

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted Thursday evening, November 30th, 2017 Tags: blog
latest developments in the Kernel ASLR district

Initial design

As I said in the previous episode, I added in October a Kernel ASLR implementation in NetBSD for 64bit x86 CPUs. This implementation would randomize the location of the kernel in virtual memory as one block: a random VA would be chosen, and the kernel ELF sections would be mapped contiguously starting from there.

This design had several drawbacks: one leak, or one successful cache attack, could be enough to reconstruct the layout of the entire kernel and defeat KASLR.

NetBSD’s new KASLR design significantly improves this situation.

New design

In the new design, each kernel ELF section is randomized independently. That is to say, the base addresses of .text, .rodata, .data and .bss are not correlated. KASLR is already at this stage more difficult to defeat, since you would need a leak or cache attack on each of the kernel sections in order to reconstruct the in-memory kernel layout.

Then, starting from there, several techniques are used to strengthen the implementation even more.

Sub-blocks

The kernel ELF sections are themselves split into sub-blocks of approximately 1MB. The kernel therefore goes from having:

	{ .text .rodata .data .bss }
to having
	{ .text .text.0 .text.1 ... .text.i .rodata .rodata.0 ... .rodata.j ... .data ...etc }
As of today, this produces a kernel with ~33 sections, each of which is mapped at a random address and in a random order.

This implies that there can be dozens of .text segments. Therefore, even if you are able to conduct a cache attack and determine that a given range of memory is mapped as executable, you don’t know which sub-block of .text it is. If you manage to obtain a kernel pointer via a leak, you can at most guess the address of the section it finds itself in, but you don’t know the layout of the remaining 32 sections. In other words, defeating this KASLR implementation is much more complicated than in the initial design.
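As a rough illustration of the splitting idea, here is a minimal Python sketch. It is not the NetBSD build or prekern code: the section sizes, the helper names `split_section` and `shuffled_layout`, and the simplified `.name.N` numbering are all assumptions made for the example, and the post does not detail how the real kernel arrives at its sub-blocks.

```python
import random

MB = 1 << 20

def split_section(name, size, block=MB):
    """Split a section of `size` bytes into sub-blocks of at most `block` bytes.

    Returns a list of (sub_block_name, offset_in_section, length) tuples.
    The naming scheme is illustrative only.
    """
    blocks = []
    offset = 0
    index = 0
    while offset < size:
        length = min(block, size - offset)
        blocks.append((f"{name}.{index}", offset, length))
        offset += length
        index += 1
    return blocks

def shuffled_layout(sizes, rng):
    """Split every section in `sizes` and return the sub-block names in random order."""
    names = [sub for sec, size in sizes.items()
             for sub, _, _ in split_section(sec, size)]
    rng.shuffle(names)
    return names

# Hypothetical section sizes, chosen only to show the effect.
print(shuffled_layout({".text": 3 * MB, ".rodata": 2 * MB, ".data": MB},
                      random.Random()))
```

Learning where one sub-block lives says nothing about the order or position of the others, which is the property the paragraph above relies on.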

Higher entropy

Each section is put in a 2MB-sized physical memory chunk. Given that the sections are 1MB in size, this leaves half of the 2MB chunk unused. Once in control, the prekern shifts the section within the chunk using a random offset, aligned to the ELF alignment constraint. This offset has a maximum value of 1MB, so that once shifted the section still resides in its initial 2MB chunk:


Fig. A: Physical memory, a random offset has been added.

The prekern then maps these 2MB physical chunks at random virtual addresses that are aligned to 2MB. For example, the two sections in Fig. A will be mapped at two distinct VAs:


Fig. B: two random, 2MB-aligned ranges of VAs point to the chunks the sections find themselves in.
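The two steps described above (a random aligned shift inside the 2MB chunk, then a random 2MB-aligned VA range) can be sketched as follows. This is a simplified model under stated assumptions, not the prekern code: the function name `place_sections` is made up, VAs are expressed relative to the start of a 2GB window, and Python's `random` module stands in for whatever entropy source a real implementation uses (it is not suitable for security purposes).

```python
import random

MB = 1 << 20
GB = 1 << 30
CHUNK = 2 * MB    # size of a physical chunk and of a VA range
WINDOW = 2 * GB   # VA window the ranges are chosen from

def place_sections(sections, rng):
    """Return {name: (offset_in_chunk, va_relative_to_window)}.

    sections: list of (name, size, align) tuples with size <= CHUNK.
    """
    slots = (WINDOW - CHUNK) // CHUNK                  # candidate 2MB-aligned ranges
    chosen = rng.sample(range(slots), len(sections))   # distinct range per section
    layout = {}
    for (name, size, align), slot in zip(sections, chosen):
        choices = (CHUNK - size) // align              # aligned offsets that still fit
        offset = rng.randrange(choices) * align if choices > 0 else 0
        layout[name] = (offset, slot * CHUNK)
    return layout

# Two hypothetical 1MB sub-blocks with a 64-byte alignment constraint.
layout = place_sections([(".text.0", MB, 64), (".text.1", MB, 64)], random.Random())
for name, (offset, va) in layout.items():
    print(f"{name}: shifted by {offset:#x} in its chunk, mapped at window+{va:#x}")
```

Because the shift never exceeds the free space in the chunk, each section stays inside its own 2MB chunk, and each chunk can still be mapped as a single 2MB-aligned range.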

There is a reason the sections are shifted in memory: it offers higher entropy. Consider a .text.i section with a 64-byte ELF alignment constraint, and look at the number of possibilities for the location of the section in memory:

  • The prekern shifts the 1MB section in its 2MB chunk, with an offset aligned to 64 bytes. So there are (2MB-1MB)/(64B) = 2^14 possibilities for the offset.
  • Then, the prekern uses a 2MB-sized 2MB-aligned range of VA, chosen in a 2GB window. So there are (2GB-2MB)/(2MB) = 2^10 - 1 possibilities for the VA.

Therefore, there are 2^14 x (2^10 - 1) ≈ 2^24 possible locations for the section. As a comparison with other systems:

OS        # of possibilities
Linux     2^6
MacOS     2^8
Windows   2^13
NetBSD    2^24

Fig. C: comparison of entropies. Note that the other KASLR implementations do not split the kernel sections in sub-blocks.

Of course, we are talking about one .text.i section here; the sections that will be mapped afterwards will have fewer location possibilities because some slots will be already occupied. However, this does not alter the fact that the resulting entropy is still higher than that of the other implementations. Note also that several sections have an alignment constraint smaller than 64 bytes, and that in such cases the entropy is even higher.
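The arithmetic above can be checked with a few lines of Python. The function name `location_count` is made up for this sketch; the sizes are the ones quoted in the text (1MB section, 2MB chunk, 2GB window).

```python
import math

MB = 1 << 20
GB = 1 << 30

def location_count(size, align, chunk=2 * MB, window=2 * GB):
    """Number of (offset, VA range) combinations for one section."""
    offsets = (chunk - size) // align     # aligned shifts inside the chunk
    ranges = (window - chunk) // chunk    # 2MB-aligned VA ranges in the window
    return offsets * ranges

count = location_count(1 * MB, 64)
print(count, math.log2(count))            # about 24 bits of entropy

# A smaller alignment constraint doubles the number of offsets (one more bit).
print(math.log2(location_count(1 * MB, 32)))
```

With a 64-byte constraint the count is 2^14 x 1023 = 16760832, just under 2^24. Halving the alignment to 32 bytes adds one bit, which matches the remark that sections with a smaller alignment constraint get even higher entropy.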

Large pages

There is also a reason we chose to use 2MB-aligned 2MB-sized ranges of VAs: when the kernel is in control and initializes itself, it can now use large pages to map the physical 2MB chunks. This greatly improves memory access performance at the CPU level.

Countermeasures against TLB cache attacks

With the memory shift explained above, randomness is therefore enforced at both the physical and virtual levels: the address of the first page of a section does not equal the address of the section itself anymore.

It has, as a side effect, an interesting property: it can mostly mitigate TLB cache attacks. Such attacks operate at the virtual-page level; they will allow you to know that a given large page is mapped as executable, but you don’t know where exactly within that page the section actually begins.

Strong?

This KASLR implementation splits the kernel into dozens of sub-blocks and randomizes them independently, while at the same time allowing for higher entropy in a way that offers large page support and some countermeasures against TLB cache attacks. It appears to be the most advanced KASLR implementation available publicly as of today.

Feel free to prove me wrong, I would be happy to know!

WIP

Even if it is in a functional state, this implementation is still a work in progress, and some of the issues mentioned in the previous blog post haven't been addressed yet. But feel free to test it and report any issue you encounter. Instructions on how to use this implementation can still be found in the previous blog post, and haven’t changed since.

See you in the next episode!

Posted at lunch time on Monday, November 20th, 2017 Tags: blog

Since the last update, we've made a number of improvements to the NetBSD Allwinner port. The SUNXI kernel has grown support for 8 new SoCs, and we added many new device drivers to the source repository.

Supported systems

Device driver support

In addition to the countless machine-independent device drivers already in NetBSD, the following Allwinner-specific devices are supported:

Audio codec

The built-in analog audio codec is supported on the following SoCs with the sunxicodec driver: A10, A13, A20, A31, GR8, H2+, H3, and R8.

Ethernet

Ethernet is supported on all applicable Allwinner SoCs. Three ethernet drivers are available:

  • Fast Ethernet MAC (EMAC) as found in A10 and A20 family SoCs
  • Gigabit Ethernet MAC (GMAC) as found in A20, A31, and A80 family SoCs
  • Gigabit Ethernet MAC (EMAC) as found in A64, A83T, H2+, and H3 family SoCs

Framebuffer

Framebuffer console support is available wherever it is supported by U-Boot using the simplefb(4) driver.

Thermal sensors

Thermal sensors are supported on A10, A13, A20, A31, A64, A83T, H2+, and H3 SoCs.

CPU frequency and voltage scaling

On A10, A20, H2+, and H3 SoCs, dynamic CPU frequency and voltage scaling support is available when configured in the device tree. In addition, on H2+ and H3 SoCs, the kernel will automatically detect when the CPU temperature is too high and throttle the CPU frequency and voltage to prevent overheating.

Touch screen

The touch screen controller found in A10, A13, A20, and A31 SoCs is fully supported. The tpctl(8) utility can be used to calibrate the touch screen and has been updated to support standard wsdisplay APIs.

Other drivers

A standard set of devices are supported across all SoCs (where applicable): DMA, GPIO, I2C, interrupt controllers, RTC, SATA, SD/MMC, timers, UART, USB, watchdog, and more.

U-Boot

A framework for U-Boot packages has been added to pkgsrc, and U-Boot packages for many boards already exist.

What now?

There are a few missing features that would be nice to have:

  • Wi-Fi (SDIO). There are a lot of different wireless chips used on these boards, but the majority seem to be either Broadcom or Realtek based. We recently ported OpenBSD's bwfm(4) driver to support the USB version of the Broadcom Wi-Fi controllers, with an expectation that SDIO support will follow at some point in the future.
  • NAND controller. Most boards have eMMC and/or microSD slots, but this would be really useful for the CHIP / CHIP Pro / PocketCHIP family of devices.
  • 64-bit support for sun50i family SoCs
  • Readily available install images. A prototype NetBSD ARM Bootable Images site is available with a limited selection of supported boards.

More information

Posted in the wee hours of Tuesday night, November 8th, 2017 Tags: blog

The past year started with bugfixes and the development of regression tests for ptrace(2) and related kernel features, as well as the continuation of bringing LLDB support and LLVM sanitizers (ASan + UBsan and partial TSan + MSan) to NetBSD.
My plan for the next year is to finish implementing TSan and MSan support, followed by a long run of bug fixes for LLDB, ptrace(2), and other related kernel subsystems.

TSan

In the past month, I've developed Thread Sanitizer far enough to have a subset of its tests pass on NetBSD, starting with addressing breakage related to the memory layout of processes. The reason for this breakage was narrowed down to the current implementation of ASLR, which was too aggressive and didn't leave enough space to map the shadow memory. The fix was to disable ASLR, either per-process or globally on the system. The same will certainly happen for MSan executables. After some other corrections, I got TSan to work for the first time ever on October 14th. This was a big achievement, so I've made a snapshot available. Getting the snapshot of execution under GDB was pure chance.

$ gdb ./a.out                                  
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
(gdb) r
Starting program: /public/llvm-build/a.out 
[New LWP 2]
==================
WARNING: ThreadSanitizer: data race (pid=1621)
  Write of size 4 at 0x000001475d70 by thread T1:
    #0 Thread1 /public/llvm-build/tsan.c:4:10 (a.out+0x46bf71)

  Previous write of size 4 at 0x000001475d70 by main thread:
    #0 main /public/llvm-build/tsan.c:10:10 (a.out+0x46bfe6)

  Location is global 'Global' of size 4 at 0x000001475d70 (a.out+0x000001475d70)

  Thread T1 (tid=2, running) created by main thread at:
    #0 pthread_create /public/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:930:3 (a.out+0x412120)
    #1 main /public/llvm-build/tsan.c:9:3 (a.out+0x46bfd1)

SUMMARY: ThreadSanitizer: data race /public/llvm-build/tsan.c:4:10 in Thread1
==================

Thread 2 received signal SIGSEGV, Segmentation fault.

I was able to get the above execution results around 10% of the time (being under a tracer had no positive effect on the frequency of successful executions).

I've managed to hit the following final results for this month, with another set of bugfixes and improvements:

check-tsan:
Expected Passes    : 248
Expected Failures  : 1
Unsupported Tests  : 83
Unexpected Failures: 44

At the end of the month, TSan can now reliably execute the same (already-working) program every time. The majority of failures are in tests that verify the sanitization of mutex locking usage.

There are still problems with NetBSD-specific libc and libpthread bootstrap code that conflicts with TSan. Certain functions (pthread_create(3), pthread_key_create(3), __cxa_atexit()) cannot be called early during TSan initialization, and must be deferred until late enough for the sanitizer to work correctly.

MSan

I've prepared scratch support for MSan on NetBSD to help in researching how far along it is. I've also cloned and adapted the existing FreeBSD bits; however, the code still needs more work and isn't functional yet. The number of passed tests (5) is negligible, and MSan most likely does not work at all.

The conclusion after this research is that TSan shall be finished first, as it touches similar code.

In the future, there will likely be another round of iterating over the system structs and types and adding the missing ones for NetBSD. So far, this part has been done before executing the real MSan code. I've added one missing symbol that was detected when attempting to link a test program with MSan.

Sanitizers

The GCC team has merged the LLVM sanitizer code, which has resulted in almost-complete support for ASan and UBsan on NetBSD. It can be found in the latest GCC8 snapshot, located in pkgsrc-wip/gcc8snapshot. Note, though, that there is an issue with getting backtraces from libasan.so, which can be worked around by backtracing ASan events in a debugger. UBsan also passes all GCC regression tests and appears to work fine. The code enabling sanitizers on the GCC/NetBSD frontend will be submitted upstream once the backtracing issue is fixed and I'm satisfied that there are no other problems.

I've managed to upstream a large portion of generic+TSan+MSan code to compiler-rt and reduce local patches to only the ones that are in progress. This deals with any rebasing issues, and allows me to just focus on the delta that is being worked on.

I've tried out the LLDB builds which have TSan/NetBSD enabled, and they built and started fine. However, there were some false positives related to the mutex locking/unlocking code.

Plans for the next milestone

The general goals are to finish TSan and MSan and switch back to LLDB debugging. I plan to verify the impact of the TSan bootstrap initialization on the observed crashes and research the remaining failures.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted in the wee hours of Tuesday night, November 1st, 2017 Tags: blog
NetBSD participated in the 2017 edition of Google Summer of Code with 3 students. All of the students finished their projects successfully. The following links report on their activities:
Congratulations to the students for finishing their projects successfully, and thanks to Google for sponsoring!
Posted Wednesday afternoon, October 18th, 2017 Tags: blog
Contact | Disclaimer | Copyright © 1994-2018 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.
NetBSD® is a registered trademark of The NetBSD Foundation, Inc.