Jul 2018


This page is a blog mirror of sorts. It pulls in articles from the blog's feed and publishes them here (with a feed, too).

On July 7th and 8th, pkgsrcCon 2018 took place in Berlin, Germany. It was my first pkgsrcCon and it was really, really nice! So let's share a report about it: what we did, the talks presented and everything else!

Friday (06/07): Social Event

I arrived by plane at Berlin Tegel Airport in the middle of the afternoon. The TXL buses were pretty full, but after waiting for three of them I was finally on my way to Berlin Hauptbahnhof (the nice thing about the buses is that once they start getting too full, they arrive minute after minute!). From there I took the S7 to Berlin Jannowitzbrücke station, just a couple of minutes on foot from republik-berlin (the venue of the Friday social event).

At 18:00 we met at republik-berlin for the social event. We had good burgers there and one^Wtwo^Wsome beers together!

The place was a bit noisy because of the Belgium vs Brazil World Cup match, but we still had nice discussions together (without missing out on the cheering crowd, either! :))

There was also a table tennis table, and spz, maya, youri and I played (I'm a terrible table tennis player, but it was great fun to play wild-west style without any rules! :)).

Saturday (07/07): Talks session

Meet & Greet -- Pierre Pronchery (khorben), Thomas Merkel (tm)

Pierre and Thomas welcomed us (aliens! :)) to c-base. c-base is a space station under Berlin (or, more down to earth, probably one of the oldest hackerspaces, old enough that the word "hackerspace" didn't even exist yet!).

Slides (PDF) are available!

Keynote: Beautiful Open Source -- Hugo Teso

Hugo talked about his experience as an open source developer and focused in particular on how important the user interface is.

He illustrated this by examining some projects he worked on (Inguma, Bokken, Iaitö and Cutter) and extracting patterns from his experience.

Slides (PDF) are available!

The state of desktops in pkgsrc -- Youri Mouton (youri)

Youri discussed the state of desktop environments (DEs) in pkgsrc, starting with Xfce, MATE, LXDE, KDE and Defora.

He then covered the WIP desktop environments (Cinnamon, LXQt, GNOME 3 and CDE), hardware support and login managers.

Help is more than welcome, especially for the WIP desktop environments, so if you're interested in any of them and would like to help (that's also a great way to get involved in pkgsrc!), please get in touch with youri and/or take a look at the wip/*/TODO files in pkgsrc-wip!

NetBSD & Mercurial: One year later -- Jörg Sonnenberger (joerg)

Jörg started by discussing Git (citing "High-level Problems with Git and How to Fix Them" by Gregory Szorc) and then explained why to use Mercurial.

Then he announced the latest changes: hgmaster.NetBSD.org and anonhg.NetBSD.org, which make it possible to experiment with Mercurial, and the source-changes-hg@ and pkgsrc-changes-hg@ mailing lists.

The talk ended by describing the missing/TODO steps.

Slides (HTML) are available!

Maintaining qmail in 2018 -- Amitai Schleier (schmonz)

Amitai shared his long experience in maintaining qmail.

He shared many lessons learned along the way, and it was also fun to see how, starting as the package MAINTAINER, he became more and more involved, ending up writing patches and tools for qmail.

Slides (HTML) are available!

A beginner's introduction to GCC -- Maya Rashish (maya)

Maya talked about GCC. She started with an overview of the toolchain in general and the corresponding GCC components, how to pass flags to each of them and how to stop the compilation process after each stage.

Then she talked about the black magic that happens in the preprocessor, for example, what happens when a program does #include <math.h> and why __NetBSD__ is defined.

We then saw that -save-temps makes it possible to keep all intermediate results, and how helpful this is for debugging problems.

The compiler, assembler and linker were then discussed. We also looked at spec files, readelf and other GCC internals.

Slides (HTML) are available!

Handling the workflow of pkgsrc-security -- Leonardo Taccari (leot)

I talked about the workflow of the pkgsrc Security Team (pkgsrc-security).

I gave a brief introduction to the nmh (new MH) message handling system.

Then I talked about the mission, tasks and workflow of pkgsrc-security.

In the last part of the talk, I put everything together and showed how to automate some parts of the pkgsrc-security workflow with nmh and some shell scripting.

Slides (PDF) are available!

Preaching for releng-pkgsrc -- Benny Siegert (bsiegert)

Benny talked about the pkgsrc Releng team (releng-pkgsrc).

The talk started with the pkgsrc quarterly releases. Since 2003Q4, a new pkgsrc release has been published every quarter. Stable releases are the basis for binary packages. Security, build and bug fixes are applied over the lifetime of a release via pullups, until the next quarterly release. The release procedure and the freeze period were also discussed.

Then we examined the life of a pullup. Benny first explained what a pullup is and the rules for requesting one, and gave a practical example of how to file a good pullup request. Some under-the-hood parts of releng were also discussed, for example how tickets are handled with req, the helper script that eases pullups, etc.

The talk concluded by stressing the importance of releng-pkgsrc, with a call for volunteers to join it! (Although they are doing a really great job, releng-pkgsrc is currently short of members, so if you are interested and would like to join them, please get in touch!)

Something old, something new, something borrowed -- Sevan Janiyan (sevan)

Sevan talked about the state of the NetBSD/macppc port.

Lots of improvements and news happened (particular kudos to macallan for the amazing work on the macppc port!). HEAD-llvm builds for macppc were added, and awacs(4) Bluetooth support, IPsec support and Veriexec support are all enabled by default now.

radeonfb(4) and the XCOFF boot loader received several improvements, and DVI is now supported on the G4 Mac Mini.

The other big news in macppc land is G5 support, which will probably also be interesting for pkgsrc bulk builds.

Sevan also discussed some current problems (and workarounds!): bulk builds take time, and no modern browser with JavaScript support is easily available right now. He also showed how using the macppc port helped to spot several bugs.

Then he talked about Upspin (please also take a look at the corresponding package in wip/go-upspin!).

Slides (PDF) are available!

Magit -- Christoph Badura (bad)

Christoph's talk was a live introduction to Magit, a Git interface for Emacs.

The talk started by quoting James Mickens's talk "It Was Never Going to Work, So Let's Have Some Tea", presented at USENIX LISA15, where he gave a high-level picture of how Git works.

We then saw how to clone a repository inside Magit, navigate the commits, create a new branch, edit a file and look at unstaged changes, stage just some hunks of a change and commit them, and rebase them (everything is just one or two keystrokes away!).

Post conf dinner

After the talks we had some burgers and beers together at Spud Bencer.

We split into several groups to get there from c-base. I was in the group that walked, so it was also a nice chance to see some of Berlin (thanks to khorben for being a great guide! :)).

Sunday (08/07): Hacking session

An introduction to Forth -- Valery Ushakov (uwe)

On Sunday morning Valery talked about Forth from the ground up.

We saw how to implement a Forth interpreter step by step and discussed threaded code.

Unfortunately the talk was not recorded... However, if you are curious, I suggest taking a look at the nbuwe/forth Bitbucket repository. Its internals.txt file also contains a lot of interesting resources about Forth.

Learning about Forth from uwe! @netbsd #pkgsrcCon

Hacking session

After Valery's talk there was the hacking session, where we hacked on pkgsrc, discussed together, etc.

Late in the afternoon some of us visited Computerspielemuseum.

The museum covers more than 50 years of computer games, and it was fun to play several historical as well as more recent video games.

We then met again for dinner together at Potsdamer Platz.

Group photograph of the pkgsrcCon 2018 kindly taken by Gilberto Taccari


pkgsrcCon 2018 was really really great!

First of all I would like to thank all the pkgsrcCon organizers: khorben and tm. It was very well organized and everything went well, thank you Pierre and Thomas!

A big thank you also to wiedi: all the recordings of the talks were shared after just a few hours, and that's really impressive!

Thanks also to youri and Gilberto for photographs.

Last, but not least, thanks to The NetBSD Foundation for supporting three developers to attend the conference; to c-base for kindly providing a very nice location for pkgsrcCon; and to our sponsors: Defora Networks for sponsoring the t-shirts and badges for the conference and SkyLime for sponsoring the catering on Saturday.

Thank you!

Posted early Saturday morning, July 14th, 2018 Tags: blog

Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018

This is the second part of the project of integrating libFuzzer with userland applications; you can learn about the first part of this project in this post.

After the preparation done in the first part, I started fuzzing userland programs with libFuzzer. We chose five programs:

  1. expr(1)
  2. sed(1)
  3. sh(1)
  4. file(1)
  5. ping(8)

After fuzzing them with libFuzzer, we also tried other fuzzers, namely American Fuzzy Lop (AFL), honggfuzz and Radamsa.

Fuzz Userland Programs with libFuzzer

"LLVM Logo" by Teresa Chang / All Right Retained by Apple

In this section, I'll describe how to fuzz the five programs with libFuzzer. libFuzzer is an in-process, coverage-guided fuzzing engine. It defines some interfaces to be implemented by users:

  • LLVMFuzzerTestOneInput: fuzzing target
  • LLVMFuzzerInitialize: initialization function to access argc and argv
  • LLVMFuzzerCustomMutator: user-provided custom mutator
  • LLVMFuzzerCustomCrossOver: user-provided custom cross-over function
Of these functions, only LLVMFuzzerTestOneInput must be implemented by every fuzzing program. It takes a buffer and the buffer length as input, and it is the target that gets fuzzed again and again. Users who need to perform some initialization with the argc and argv parameters also implement LLVMFuzzerInitialize. With LLVMFuzzerCustomMutator and LLVMFuzzerCustomCrossOver, users can also change how new input buffers are produced from one or two existing ones. For more details, you can refer to this document.
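For reference, a minimal target implementing the two most common hooks is sketched below. The hook signatures follow the libFuzzer documentation; parse_input and parsed_bytes are hypothetical stand-ins for the code under test. The buffer libFuzzer passes in is not NUL-terminated, so string-oriented targets such as the programs in this project first copy it into a terminated string.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical code under test: here it only measures the string. */
static size_t parsed_bytes;

static void parse_input(const char *s)
{
	parsed_bytes = strlen(s);
}

/* Optional hook: called once at startup with main()'s argc/argv. */
int LLVMFuzzerInitialize(int *argc, char ***argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

/* Required hook: called by libFuzzer for every generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	/* The input is not NUL-terminated, so make a terminated copy. */
	char *s = malloc(size + 1);

	if (s == NULL)
		return 0;
	if (size > 0)
		memcpy(s, data, size);
	s[size] = '\0';
	parse_input(s);
	free(s);
	return 0;
}
```

When this is built with clang -fsanitize=fuzzer,address,undefined, libFuzzer supplies main() itself and calls LLVMFuzzerTestOneInput repeatedly with mutated inputs. An embedded NUL byte truncates the string that parse_input sees; whether that matters depends on the code under test.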

Fuzz Userland Programs with Sanitizers

libFuzzer can be used with different sanitizers. Using sanitizers together with libFuzzer is quite simple: you just need to add the sanitizer names to the option, as in -fsanitize=fuzzer,address,undefined. However, MemorySanitizer seems to be an exception. When we tried to use it together with libFuzzer, we got some runtime errors. The official documentation mentions that "using MemorySanitizer (MSAN) with libFuzzer is possible too, but tricky", but it doesn't explain how to use it properly.

In the rest of this article, unless stated otherwise, you can assume that we used the address and undefined sanitizers together with the fuzzers.

Fuzz expr(1) with libFuzzer

expr(1) takes its parameters from the command line and treats them as a single expression to be evaluated. An example usage of expr(1) looks like this:

    $ expr 1 + 1
This program is relatively easy to fuzz: all we need to do is transform the original main function into the form of LLVMFuzzerTestOneInput. Since the parser in expr(1) takes the argc and argv parameters as input, we need to convert the buffer provided to LLVMFuzzerTestOneInput into the format the parser expects. In my implementation, I assume the buffer consists of several strings separated by whitespace characters (i.e. ' ', '\t' and '\n'). We can then split the buffer into separate strings and arrange them as argc and argv parameters.
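The conversion from the fuzzer buffer to argc/argv could be written as below. This is a sketch, not the project's code: buffer_to_argv is a hypothetical name, and argv[0] is set to a placeholder program name, as a real main() would expect.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Separators between arguments; NUL is included so that embedded NUL
 * bytes in the fuzzer input simply end the current argument. */
static int is_sep(char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\0';
}

/*
 * Split a fuzzer buffer into a NULL-terminated argv array. argv[0] is a
 * placeholder program name. On success returns argc and hands back the
 * array and its backing string through *argvp and *storagep (both must
 * be freed by the caller); returns -1 if allocation fails.
 */
static int buffer_to_argv(const uint8_t *data, size_t size,
    char ***argvp, char **storagep)
{
	static char progname[] = "expr";
	char *buf = malloc(size + 1);
	/* At most (size + 1) / 2 words, plus argv[0] and the final NULL. */
	char **argv = malloc((size / 2 + 3) * sizeof(*argv));
	size_t i = 0;
	int argc = 0;

	if (buf == NULL || argv == NULL) {
		free(buf);
		free(argv);
		return -1;
	}
	if (size > 0)
		memcpy(buf, data, size);
	buf[size] = '\0';

	argv[argc++] = progname;
	while (i < size) {
		while (i < size && is_sep(buf[i]))	/* skip separators */
			i++;
		if (i == size)
			break;
		argv[argc++] = &buf[i];			/* start of a word */
		while (i < size && !is_sep(buf[i]))
			i++;
		buf[i] = '\0';				/* terminate it */
	}
	argv[argc] = NULL;
	*argvp = argv;
	*storagep = buf;
	return argc;
}
```

In the fuzz target, LLVMFuzzerTestOneInput would call buffer_to_argv, pass the result to the original main of expr(1) renamed to something like expr_main (also a hypothetical name), and free both allocations afterwards.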

However, the first problem appeared as soon as I started fuzzing expr(1) with this modification. Since libFuzzer treats every exit as an error while fuzzing, there are a lot of false positives. Fortunately, expr(1) is simple, so we only need to replace the calls to exit(3) with return statements. When discussing the other programs, I'll show how to handle exit(3) and other error-handling interfaces more elegantly.

You can also pass a fuzzing dictionary file (to provide keywords) and initial input cases to libFuzzer, so that it can produce test cases more smartly. For expr(1), the dictionary file looks like this:

And there is only one initial test case:
    1 / 2

With this setup, we can quickly reproduce an existing bug that was fixed by Kamil Rytarowski in this patch: feeding either -9223372036854775808 / -1 or -9223372036854775808 % -1 to expr(1) results in a SIGFPE. After applying that fix, fuzzing also detected an integer overflow bug when feeding expr(1) with 9223372036854775807 * -3. This bug was detected with the help of the undefined behavior sanitizer (UBSan) and has been fixed in this commit. The fuzzing of expr(1) can be reproduced with this script.
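Both failures come from signed 64-bit arithmetic whose result is not representable: INT64_MIN / -1 and INT64_MIN % -1 overflow (and trap with SIGFPE on amd64), while 9223372036854775807 * -3 overflows silently, which UBSan reports. A minimal sketch of guards that avoid them is shown below. It is my own illustration, not the code of the actual patch or commit, and it assumes a compiler providing the __builtin_mul_overflow builtin (GCC 5+ and Clang do).

```c
#include <stdbool.h>
#include <stdint.h>

/* Each helper stores the result in *res and returns true on success,
 * or returns false when the C operation would be undefined. */

static bool checked_div(int64_t a, int64_t b, int64_t *res)
{
	if (b == 0 || (a == INT64_MIN && b == -1))
		return false;
	*res = a / b;
	return true;
}

static bool checked_mod(int64_t a, int64_t b, int64_t *res)
{
	/* a % b is undefined whenever a / b is (C11 6.5.5). */
	if (b == 0 || (a == INT64_MIN && b == -1))
		return false;
	*res = a % b;
	return true;
}

static bool checked_mul(int64_t a, int64_t b, int64_t *res)
{
	/* The builtin returns true if the product overflowed. */
	return !__builtin_mul_overflow(a, b, res);
}
```

An expr(1)-style evaluator would call these helpers and report an overflow error instead of computing the result when they return false.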

Fuzz sed(1) with libFuzzer

sed(1) reads from files or standard input (stdin) and modifies the input as specified by a list of commands. It is more complicated to fuzz than expr(1), because it can receive input from several sources, including command line parameters (commands), standard input (text to be operated on) and files (both commands and text). After reading the source code of sed(1), I made two observations:

  1. The commands are added by the add_compunit function
  2. The input files (including standard input) are organized by the s_flist structure and the mf_fgets function
With these observations, we can parse the libFuzzer buffer manually and feed it through the interfaces above. I organized the buffer as follows:
    command #1
    command #2
    command #N
        // an empty line
    text strings
The first lines are the commands, one command per line. Then an empty line marks the end of the command list. Finally, the remaining part of the buffer is the text to be operated on. After parsing the buffer this way, we can add the commands one by one with the add_compunit interface. For the text, since it is already available as one contiguous buffer, I re-implemented the mf_fgets interface to read the input directly from the buffer provided by libFuzzer.
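The buffer split described above might look like the sketch below. It only records where each command line starts and where the text begins. The struct cmd_span type and split_sed_buffer are hypothetical names; a real harness would then copy each span into a NUL-terminated string for add_compunit and serve the text part through the re-implemented mf_fgets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One command line inside the fuzzer buffer (not NUL-terminated). */
struct cmd_span {
	const char *s;
	size_t len;
};

/*
 * Split a buffer laid out as "cmd\ncmd\n...\n\ntext". Up to max_cmds
 * command lines are stored in cmds (*ncmds receives the count) and the
 * offset where the text starts is returned. Without an empty line,
 * everything is treated as commands and the text is empty.
 */
static size_t split_sed_buffer(const uint8_t *data, size_t size,
    struct cmd_span *cmds, size_t max_cmds, size_t *ncmds)
{
	size_t start = 0;

	*ncmds = 0;
	while (start < size) {
		const uint8_t *nl = memchr(data + start, '\n', size - start);
		size_t end = nl != NULL ? (size_t)(nl - data) : size;

		if (end == start)	/* empty line: end of the command list */
			return start + 1;
		if (*ncmds < max_cmds) {
			cmds[*ncmds].s = (const char *)data + start;
			cmds[*ncmds].len = end - start;
			(*ncmds)++;
		}
		start = end + 1;
	}
	return size;	/* no empty line: no text to operate on */
}
```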

As mentioned for expr(1), exit(3) results in false positives with libFuzzer. Replacing exit(3) with a return statement solved this problem in expr(1), but it does not work in sed(1) because of its deeper call stack. exit(3) is usually used to handle unexpected cases in these programs, so replacing it with exceptions would be a good idea; unfortunately, all the programs we fuzzed are written in C rather than C++. In the end, I chose to use the setjmp/longjmp interfaces: setjmp defines an exit point in the LLVMFuzzerTestOneInput function, and longjmp jumps back to this point whenever the original code would call exit(3).
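A stripped-down version of this pattern is sketched below. The names fuzz_exit, fuzz_exit_status and deep_helper are hypothetical; in a real harness every exit(3) call in the fuzzed program's sources would be changed to call the replacement.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Exit point for the current fuzzing iteration, set in the entry point. */
static jmp_buf fuzz_exit_point;
static int fuzz_exit_status;

/* Replacement for exit(3): record the status and unwind to setjmp(). */
static void fuzz_exit(int status)
{
	fuzz_exit_status = status;
	longjmp(fuzz_exit_point, 1);
}

/* Hypothetical program code that used to call exit(3) deep in the stack. */
static void deep_helper(int depth)
{
	if (depth == 0)
		fuzz_exit(2);		/* was: exit(2); */
	else
		deep_helper(depth - 1);
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	(void)data;
	if (setjmp(fuzz_exit_point) != 0)
		return 0;	/* the program "exited": not a crash */
	deep_helper((int)(size % 8));
	return 0;
}
```

Two caveats apply. longjmp skips any cleanup in the abandoned stack frames, so memory or descriptors allocated during the iteration leak unless they are tracked, and the program's global state has to be reset at the start of every iteration. Also, automatic variables in LLVMFuzzerTestOneInput that are modified after setjmp and read after the jump must be declared volatile.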

The dictionary file for sed(1) looks like this:

And here is an initial test case:

    hello, world!
which means replacing "hello" with "hi" in the text "hello, world!". The fuzzing script for sed(1) can be found here.

Fuzz sh(1) with libFuzzer

sh(1) is the standard command interpreter for the system. I chose the evalstring function as the fuzzing entry point for sh(1). This function takes a string of commands to be executed, so we can pass the libFuzzer input buffer directly to it to start fuzzing. The dictionary file we used looks like this:

We could also add other commands and shell syntax to this file to reach other conditions. An initial test case is also provided:
    echo "hello, world!"
You can also reproduce the fuzzing of sh(1) with this script.

Fuzz file(1) with libFuzzer

Fuzzing of file(1) has already been done by Christos Zoulas in this project. What distinguishes this program from the others in the list is that its main functionality is provided by the libmagic library. As a result, we can directly fuzz the important functions of this library (e.g. magic_buffer).

Fuzz ping(8) with libFuzzer

ping(8) is quite different from all of the programs mentioned above: its main input comes from the network instead of the command line, standard input or files. This was a real challenge, because network data is usually received through the socket interface, which makes it harder to map a single buffer onto the socket model.

Fortunately, ping(8) organizes all its network interfaces as hooks registered in a structure. So I re-implemented all the necessary interfaces (including socket(2), recvfrom(2), sendto(2), poll(2), etc.) for ping(8). These re-implemented interfaces take the data from the libFuzzer buffer and present it as the data returned by the network interfaces. After that, we can use libFuzzer to fuzz the network data for ping(8). The script to reproduce this can be found here.
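The fake recvfrom below shows the idea under stated assumptions: the struct net_ops hook table, its field names and the helper functions are hypothetical (ping(8)'s actual structure differs in detail), and the fake call simply hands out successive chunks of the fuzzer buffer as received data.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical hook table; only two of the hooks are shown. */
struct net_ops {
	int (*op_socket)(int, int, int);
	ssize_t (*op_recvfrom)(int, void *, size_t, int,
	    struct sockaddr *, socklen_t *);
};

/* Current fuzzer input, consumed piece by piece by the fake calls. */
static const uint8_t *fuzz_data;
static size_t fuzz_size, fuzz_off;

static void fuzz_set_input(const uint8_t *data, size_t size)
{
	fuzz_data = data;
	fuzz_size = size;
	fuzz_off = 0;
}

static int fake_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	return 3;	/* a plausible descriptor; no real socket is opened */
}

/* Return the next chunk of the fuzzer buffer as if it had been received. */
static ssize_t fake_recvfrom(int s, void *buf, size_t len, int flags,
    struct sockaddr *from, socklen_t *fromlen)
{
	size_t n = fuzz_size - fuzz_off;

	(void)s;
	(void)flags;
	if (n == 0) {
		errno = EAGAIN;	/* input exhausted: nothing to read */
		return -1;
	}
	if (n > len)
		n = len;
	memcpy(buf, fuzz_data + fuzz_off, n);
	fuzz_off += n;
	if (from != NULL && fromlen != NULL)
		memset(from, 0, *fromlen);	/* anonymous sender */
	return (ssize_t)n;
}

static const struct net_ops fuzz_ops = { fake_socket, fake_recvfrom };
```

A LLVMFuzzerTestOneInput for ping would call fuzz_set_input(data, size) and then run ping's receive path with fuzz_ops installed. Since a real recvfrom on a raw socket returns one whole datagram per call, a more faithful harness might prefix each packet in the buffer with its length.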

Fuzz Userland Programs with Other Fuzzers

To compare libFuzzer with other fuzzers in several respects, including porting effort, performance and functionality, we also fuzzed these five programs with AFL, honggfuzz and Radamsa.

Fuzz Programs with AFL and honggfuzz

AFL and honggfuzz can fuzz input from standard input and files. Both provide specific compilers (such as afl-cc, afl-clang, hfuzz-cc, hfuzz-clang, etc.) to build programs with coverage information. So the basic process to fuzz programs with them is:

  1. Use the specific compilers to compile programs with necessary sanitizers
  2. Run the fuzzed programs with proper command line parameters
For detailed parameters, you can refer to the scripts for expr(1), sed(1), sh(1), file(1) and ping(8).

"Miniature Lop" (A kind of fuzzy lop) from Wikipedia / CC BY-SA 3.0

No modifications are needed to fuzz sed(1), sh(1) and file(1) with AFL and honggfuzz, because these programs mainly get their input from standard input or files. But this doesn't mean they achieve the same functionality as with libFuzzer. For example, to fuzz sed(1), you also need to pass commands as command line parameters. These commands must then be specified manually, and AFL and honggfuzz cannot fuzz them, because they only fuzz input from standard input and files. One option is to reuse the modifications made for libFuzzer, but then we additionally need to add a main function to the fuzzed program.
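That extra main function can be quite small. The sketch below assumes the libFuzzer harness is reused unchanged: it reads a whole stream (standard input, when the fuzzer feeds test cases that way) into memory and hands it to LLVMFuzzerTestOneInput. The stand-in target and the helper name run_from_stream are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical target standing in for the harness written for libFuzzer;
 * here it only remembers how many bytes it was given. */
static size_t last_size;

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	(void)data;
	last_size = size;
	return 0;
}

/*
 * Read an entire stream into memory and pass it to the libFuzzer-style
 * entry point. Returns the target's result, or -1 on read/allocation
 * errors. A stand-alone driver's main() would call run_from_stream(stdin).
 */
static int run_from_stream(FILE *fp)
{
	size_t cap = 4096, len = 0, n;
	uint8_t *buf = malloc(cap);
	int ret;

	if (buf == NULL)
		return -1;
	while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
		len += n;
		if (len == cap) {	/* buffer full: double it */
			uint8_t *nbuf = realloc(buf, cap * 2);

			if (nbuf == NULL) {
				free(buf);
				return -1;
			}
			buf = nbuf;
			cap *= 2;
		}
	}
	if (ferror(fp)) {
		free(buf);
		return -1;
	}
	ret = LLVMFuzzerTestOneInput(buf, len);
	free(buf);
	return ret;
}
```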

"Höngg" (A quarter in district 10 in Zürich) by Ikiwaner / CC BY-SA 3.0

For expr(1) and ping(8), we need even more modifications than for the libFuzzer solution, because expr(1) mainly gets its input from command line parameters and ping(8) mainly from the network.

During this period, I also prepared a pkgsrc-wip package to install honggfuzz. To make honggfuzz compatible with NetBSD, we also contributed improvements to the code in the official repository; for more details, you can refer to this pull request.

Fuzz Programs with Radamsa

Radamsa is a test case generator: it reads sample files and generates interesting variations of them. Radamsa does not depend on the fuzzed program, only on the input samples, which means it does not collect coverage information.

"The Moomins" ("Radamsa" is a word spoken by a creature in Moomins) from the comic book cover by Tove Jansson

With Radamsa, we can use scripts to fuzz different programs with different input sources. For expr(1), we can generate a mutated string, store it in a shell variable and then pass it to expr(1) as command line parameters. For sed(1), we can generate both the command strings and the text with Radamsa and pass them as command line parameters and as a file, respectively. For both sh(1) and file(1), we can generate the needed input file with Radamsa in the shell scripts.

It might seem that the combination of shell scripts and Radamsa can fuzz any kind of program, but it runs into problems with ping(8). Although Radamsa can generate input cases as a network server or client, it doesn't support the ICMP protocol. This means that we cannot fuzz ping(8) without modifying it or getting help from other applications.

Comparison Among Different Fuzzers

In this project, we tried four different fuzzers: libFuzzer, AFL, honggfuzz and Radamsa. In this section, I compare them from several angles.

Modification of Fuzzing

For the programs mentioned above, the following table lists the number of lines of code we had to modify, as an indicator of porting difficulty:

    Fuzzer          expr(1)  sed(1)   sh(1)    file(1)  ping(8)
    libFuzzer       128      96       60       48       582
    AFL/honggfuzz   142      0        0        0        590
    Radamsa         0        0        0        0        N/A
As mentioned before, libFuzzer requires modifying more lines for programs that mainly get their input from standard input and files. However, for the other programs (i.e. expr(1) and ping(8)), AFL and honggfuzz need more added lines of code to get input from these sources. As for Radamsa, since it only needs sample input data to generate outputs, it can fuzz all programs without modifications except ping(8).

Binary Sizes

The binary sizes of these fuzzers should also be considered if we want to ship them with NetBSD. The following sizes are based on NetBSD-current with a near-latest LLVM (compiled from source) as an external toolchain:

                Dependency  Compilers  Fuzzer  Tools   Total
    libFuzzer   0           56MB       N/A     0       56MB
    AFL         0           24KB       292KB   152KB   468KB
    honggfuzz   36KB        840KB      124KB   0       1000KB
    Radamsa     588KB       0          608KB   0       1196KB
The above table shows the space needed to install each fuzzer. The "Dependency" column shows the size of dependent libraries; the "Compilers" column shows the size of the compilers used to re-compile the fuzzed programs; the "Fuzzer" column shows the size of the fuzzer itself; and the "Tools" column shows the size of the analysis tools.

For libFuzzer, if the system already includes LLVM together with compiler-rt as its toolchain, no extra space is needed. The libFuzzer engine is compiled into the user's program, so its size is not counted. The compiler size shown in the table is that of a statically linked clang; if it is linked dynamically, there are plenty of dependent libraries to account for. AFL has no dependency other than libc, so its dependency size is zero; it also installs some tools such as afl-analyze and afl-cmin. honggfuzz depends on the libBlocksRuntime library, whose size is 36KB. This library is also included in LLVM's compiler-rt, so if you have already installed that, this size can be ignored. As for Radamsa, it needs Owl Lisp during the build process, so its dependency size is the size of the Owl Lisp interpreter.

Compiler Compatibility

All these fuzzers except libFuzzer are compatible with both GCC and clang. AFL and honggfuzz provide wrappers for the native compiler, and Radamsa does not care about the compiler at all. libFuzzer, on the other hand, is implemented in LLVM's compiler-rt, so it cannot be used with GCC.

Support for Sanitizers

All these fuzzers can work together with sanitizers, but only libFuzzer provides a relatively strong guarantee that they are available. AFL and honggfuzz, as mentioned above, provide wrappers for the underlying compiler, so whether they can fuzz programs with sanitizers depends on the native compiler. Radamsa can only fuzz a binary directly, so the programs must be compiled with the sanitizers first. With libFuzzer, by contrast, the sanitizers live in compiler-rt together with the fuzzer, so you can directly add the sanitizer flags while compiling the fuzzed programs.


Finally, you may wonder how fast these fuzzers are at finding an existing bug. Among the programs we fuzzed on NetBSD, only libFuzzer found bugs: two in expr(1). However, this does not prove that libFuzzer performs better than the others. To further evaluate the fuzzers we used, I chose some simple functions with known bugs and measured how fast each fuzzer finds them. The following table shows the time each fuzzer took to find the first bug:

    Test               libFuzzer   AFL       honggfuzz   Radamsa
    DivTest+S          <1s         7s        1s          7s
    DivTest            >10min      >10min    2s          >10min
    SimpleTest+S       <1s         >10min    1s          >10min
    SimpleTest         <1s         >10min    1s          >10min
    CxxStringEqTest+S  <1s         >10min    2s          >10min
    CxxStringEqTest    >10min      >10min    2s          >10min
    CounterTest+S      1s          5min      1s          7min
    CounterTest        1s          4min      1s          7min
    SimpleHashTest+S   <1s         3s        1s          2s

The "+S" suffix denotes the version built with sanitizers (in this evaluation, I used the address and undefined sanitizers). The table shows that libFuzzer and honggfuzz perform better than the others in most cases. Another observation is that fuzzers work better with sanitizers. For example, the primary goal of DivTest is to trigger a "divide-by-zero" error; however, with the undefined sanitizer, all of these fuzzers trigger an "integer overflow" error more quickly. I only present some of the interesting results of this evaluation here. You can refer to this script to reproduce some results or to do more evaluation yourself.
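To see why the sanitizer helps here, consider a DivTest-style target. The version below is my own hypothetical reconstruction, and its constants are assumptions that may differ from the actual test: it computes 12345678 / (987654 - a) for a 4-byte input a. Only one of the 2^32 possible values of a divides by zero, but 987655 values make 987654 - a overflow, which UBSan reports immediately. The helper classifies an input without executing the undefined operation.

```c
#include <stdint.h>
#include <string.h>

enum div_outcome { DIV_OK, DIV_BY_ZERO, DIV_SUB_OVERFLOW };

/*
 * The hypothetical target body would be:
 *     int32_t a; memcpy(&a, data, 4); sink = 12345678 / (987654 - a);
 * This helper reports which undefined behaviour, if any, that line
 * would hit for a given 4-byte input.
 */
static enum div_outcome classify_div_input(const uint8_t data[4])
{
	int32_t a;

	memcpy(&a, data, sizeof(a));
	/* 987654 - a exceeds INT32_MAX exactly when a < 987654 - INT32_MAX. */
	if (a < 987654 - INT32_MAX)
		return DIV_SUB_OVERFLOW;
	if (a == 987654)
		return DIV_BY_ZERO;
	return DIV_OK;
}
```

A fuzzer without UBSan only notices the SIGFPE from the single zero divisor, whereas with UBSan roughly a million inputs near INT32_MIN already count as a bug.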


Over the past month, my main contributions were:

  1. Porting the libFuzzer to NetBSD
  2. Preparing a pkgsrc-wip package for honggfuzz
  3. Fuzzing some userland programs with libFuzzer and three other fuzzers
  4. Evaluating different fuzzers from different aspects
Regarding the third contribution, I used different methods for each program depending on how it receives its input. During this period, I was fortunate to find two bugs in expr(1).

I'd like to thank my mentor Kamil Rytarowski and Christos Zoulas for their suggestions and proposals. I also want to thank Kamil Frankowicz for his advice on fuzzing and playing with AFL. At last, thanks to Google and the NetBSD community for giving me a good opportunity to work on this project.

Posted early Friday morning, July 13th, 2018 Tags: blog

Prepared by Siddharth Muralee (@Tr3x__) as a part of GSoC'18

I have been working on porting the Kernel Address Sanitizer (KASAN) to the NetBSD kernel. This report summarizes the work done up to the second evaluation.

The first report can be found here.

What is a Kernel Address Sanitizer?

The Kernel Address Sanitizer, or KASAN, is a fast and efficient memory error detector designed by developers at Google. It relies heavily on compiler instrumentation and has been very effective at finding bugs in the Linux kernel.

The aim of my project is to build the NetBSD kernel with the KASAN and use it to find bugs and improve code quality in the kernel. This Sanitizer will help detect a lot of memory errors that otherwise would be hard to detect.

Porting code from Linux to NetBSD

The design of KASAN in the NetBSD kernel is based on its Linux counterpart. Linux code is GPL-licensed, hence we intend to rewrite it completely and/or relicense certain parts of the code. We will handle this once we have a working prototype ready.

This is by no means an easy task, especially since the code we are porting touches multiple areas of the kernel, such as memory management and process management.

The complete port involves around 3000 lines in about 6 files, referenced from 20 or more other locations.

Design of KASAN and how it works

Kernel Address Sanitizer works by instrumenting all memory accesses and keeping a separate "shadow buffer" that tracks which addresses are legitimate and accessible; it complains (very descriptively!) when the kernel reads or writes anywhere else.

The basic idea behind Kernel ASan is to set aside a map/buffer in which every 8 bytes of kernel memory are represented by one shadow byte. This means the size of the buffer is 1/8th of the total memory accessible by the kernel. On amd64 (also known as x86_64), this means setting aside 16TB of address space to cover a total of 128TB of kernel memory.

Implementation Outline

The bulk of the work is done by code that the compiler inserts (GCC, as of now), but there are still many features we have to implement:

  • Checking and reporting Infrastructure
  • Allocation and population of the Shadow buffer during boot
  • Modification of Allocators to update the Shadow buffer upon allocations and deallocations

Kernel Address Sanitizer is useful in finding bugs/coding errors in the kernel such as:

  • Use-after-free
  • Stack, heap and global buffer overflows
  • Double free
  • Use-after-scope

This design makes it faster than other tools such as kmemcheck. The average slowdown is expected to be around 2x or less.

KASAN Initialisation

KASAN initialization happens in two stages:

  • early in the boot stage, we set each page entry of the entire shadow region to zero_page (early_kasan_init)
  • after the physical memory has been mapped and the pmap(9) has been bootstrapped during kernel startup, the zero_pages are unmapped and the real pages are allocated and mapped (kasan_init).

Below is a short description of what kasan_init() does in the Linux code:

  • It loads the kernel boot time page table and clears all the page table entries for the shadow buffer region which had been populated with zero_pages during early_kasan_init.
  • It marks the shadow regions of those parts of kernel memory that we don't want to track, or must not touch, by populating them with kasan_populate_zero_shadow, which iterates through all the page tables.
  • Write-protects the mappings and flushes the TLB.

Allocating the shadow buffer

Instead of iterating through the page table entries as Linux preferred to do, we decided to use our low-level kernel memory allocators to do the job for us. This helped in reducing the code complexity and allowed us to reduce the size of the code by a significant amount.

One may then ask: does that allocator itself need to be sanitized? We propose adding a kasan_inited variable so that sanitization only happens after initialization has completed.

We are still in the process of testing this part.

Shadow translation (Address Sanitizer Algorithm)

The translation from a memory address to the corresponding shadow offset must be very fast, since it happens on every memory read/write. It is implemented similarly to the code below:

    void *KmemToShadow(void *addr) {
        return (void *)(((uintptr_t)addr >> Shadow_scale) + Shadow_buffer_start);
    }

    shadow_address = KmemToShadow(address);

The reverse function, which maps shadow offsets back to kernel memory addresses, is similar.

The shadow translation functions have already been implemented and can be found in kasan.h in my Github repository.
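To make the mapping concrete, here is a user-space model of the translation in both directions. It is only a sketch: kmem and shadow are ordinary arrays standing in for kernel memory and the shadow buffer, and the function names are hypothetical. The subtraction of the kmem base is needed only in this model; in the kernel, the base is folded into the constant added after the shift.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE	3			/* one shadow byte per 8 bytes */
#define GRANULE		(1UL << SHADOW_SCALE)

static uint8_t kmem[4096];			/* modelled "kernel memory" */
static int8_t shadow[sizeof(kmem) / GRANULE];	/* 1/8th of its size */

/* Forward mapping: the shadow byte covering addr. */
static int8_t *kmem_to_shadow(const void *addr)
{
	uintptr_t off = (uintptr_t)addr - (uintptr_t)kmem;

	return &shadow[off >> SHADOW_SCALE];
}

/* Reverse mapping: first byte of the 8-byte granule a shadow byte covers. */
static void *shadow_to_kmem(const int8_t *shad)
{
	uintptr_t off = (uintptr_t)(shad - shadow);

	return &kmem[off << SHADOW_SCALE];
}
```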

Error Detection

Every read/write is instrumented to have a check which would decide if the memory access was legitimate or not. This would be done in the manner shown below.

    shadow_address = KmemToShadow(address);
    if (IsPoisoned(shadow_address)) {
        ReportError(address, Size, IsWrite);
    }

The actual implementation of the Error detection is a bit more complex since we have to include the mapping aspect as well.

Each byte of shadow memory maps to a qword (8 bytes) of kernel memory. Because of this, a shadow value (*shadow_address) falls into one of only three cases:

  • The value is 0 (meaning that all 8 bytes are unpoisoned)
  • The value is negative (meaning that all 8 bytes are poisoned)
  • The value is k, with 1 ≤ k ≤ 7 (meaning that the first k bytes are unpoisoned and the remaining 8 - k are poisoned)

The shadow value itself therefore tells us which part of the qword may be accessed, which assists error detection.
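
One way to turn the three cases into a check is the classic comparison of the offset of the last accessed byte against the shadow value. This sketch assumes an access of 1 to 8 bytes that does not cross an 8-byte granule boundary (a real implementation must handle the crossing case separately), and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if an access of `size` bytes (1..8) at `addr` touches a
 * poisoned byte, given the shadow value of addr's 8-byte granule.
 * Assumes the access stays within a single granule.
 */
static bool access_is_poisoned(uintptr_t addr, size_t size, int8_t shadow)
{
	/* Case 1: all 8 bytes are addressable. */
	if (shadow == 0)
		return false;

	/* Offset within the granule of the last byte being accessed. */
	int8_t last = (int8_t)((addr & 7) + size - 1);

	/*
	 * Case 2 (shadow < 0): every offset is >= shadow, so it's poisoned.
	 * Case 3 (shadow == k): only offsets 0..k-1 are valid.
	 */
	return last >= shadow;
}
```

A single signed comparison covers both the fully poisoned and the partially poisoned case, which keeps the fast path short.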

Basic Bug Report

The information about each bug is stored in struct kasan_access_info, which is then used to determine the following:

  • The kind of bug
  • Whether read/write caused it
  • Process ID of the task being executed
  • The address which caused the error
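
A simplified sketch of how the stored information might be used to name the kind of bug, based on the shadow value at the first bad address. The struct layout and the marker values below are illustrative assumptions (the values are modeled on Linux KASAN's shadow markers); the real struct kasan_access_info and any NetBSD markers may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout; the real struct kasan_access_info may differ. */
struct kasan_access_info {
	uintptr_t access_addr;	/* address which caused the error */
	size_t access_size;	/* size of the access in bytes */
	bool is_write;		/* whether a write (or a read) caused it */
	int pid;		/* process ID of the task being executed */
	int8_t shadow_val;	/* shadow byte of the first bad address */
};

/* Illustrative shadow markers, modeled on Linux KASAN. */
#define KASAN_FREE_PAGE		((int8_t)0xFF)
#define KASAN_KMALLOC_REDZONE	((int8_t)0xFC)
#define KASAN_KMALLOC_FREE	((int8_t)0xFB)
#define KASAN_GLOBAL_REDZONE	((int8_t)0xFA)

enum kasan_bug {
	KASAN_BUG_OUT_OF_BOUNDS,
	KASAN_BUG_SLAB_OUT_OF_BOUNDS,
	KASAN_BUG_USE_AFTER_FREE,
	KASAN_BUG_GLOBAL_OUT_OF_BOUNDS,
	KASAN_BUG_UNKNOWN
};

static const char *const kasan_bug_names[] = {
	"out-of-bounds", "slab-out-of-bounds", "use-after-free",
	"global-out-of-bounds", "unknown-crash"
};

/* Classify a bug by the shadow value found at the first bad address. */
static enum kasan_bug kasan_bug_type(const struct kasan_access_info *info)
{
	/* 1..7: partially addressable granule, i.e. just past an object */
	if (info->shadow_val > 0 && info->shadow_val < 8)
		return KASAN_BUG_OUT_OF_BOUNDS;

	switch (info->shadow_val) {
	case KASAN_FREE_PAGE:
	case KASAN_KMALLOC_FREE:
		return KASAN_BUG_USE_AFTER_FREE;
	case KASAN_KMALLOC_REDZONE:
		return KASAN_BUG_SLAB_OUT_OF_BOUNDS;
	case KASAN_GLOBAL_REDZONE:
		return KASAN_BUG_GLOBAL_OUT_OF_BOUNDS;
	default:
		return KASAN_BUG_UNKNOWN;
	}
}

/* Human-readable name for the report header. */
static const char *kasan_bug_name(enum kasan_bug bug)
{
	return kasan_bug_names[bug];
}
```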

We also print the stack backtrace, which helps identify the function containing the bug and the execution flow that led to it.

One of the best features is that we will be able to use the address where the error occurred to display the poisoning state of the surrounding shadow buffer. This diagram will be very useful for developers trying to fix the bugs found by KASAN.

Unfortunately, since we haven't finished modifying the allocators to update the shadow buffer, we cannot test this yet.


I have managed to get a good initial grasp of the internals of the NetBSD kernel over the last two months.

I would like to thank my mentor Kamil for his constant support and valuable suggestions. A huge thanks to the NetBSD community who have been supportive throughout.

Most of my work is done on my fork of NetBSD.

Work left to be done

There are a lot of important features that still remain to be implemented. Below is the list of features that I will be working on.

  • Solve licensing issues
  • sysctl switches to tune options of kern_asan.c (quarantine size, halt_on_error, etc.)
  • Move the KASAN code to src/sys/kernel and call the MI part kern_asan.c (similar to kern_ubsan.c)
  • Ability to run KUBSAN and KASAN concurrently
  • Refactor kasan_depth and in_ubsan to be shared between sanitizers: probably as a bit in a private LWP bitfield
  • ATF tests verifying KASAN's detection of bugs
  • The first boot of a kernel executing with KASAN to a functional shell
  • Finish execution of ATF tests with a kernel running with KASAN
  • Quarantine List
  • Report generation
  • Continue execution
  • Allocator hooks and functions
  • Memory hotplug
  • Kernel module shadowing
  • Quarantine for reusable structs like LWP
Posted late Wednesday afternoon, July 11th, 2018 Tags: blog

The NetBSD Project is pleased to announce NetBSD 8.0 RC 2, the second (and hopefully final) release candidate for the upcoming NetBSD 8.0 release.

Unfortunately the first release candidate did not hold up in our extensive testing (also known as eating our own dog food): many NetBSD.org servers/machines were updated to it and worked fine, but the auto build cluster, where we produce our binaries, did not work well. The issue was tracked down to a driver bug (Intel 10 GBit Ethernet) that only showed up in certain configurations, and it has now been fixed.

Other security events, like the new FPU-related exploit on some Intel CPUs, caused further kernel changes, so instead of releasing NetBSD 8.0 directly, we are providing this new release candidate for additional testing.

The official RC2 announcement lists these major changes compared to older releases:

  • USB stack rework, USB3 support added
  • In-kernel audio mixer
  • Reproducible builds
  • Full userland debug information (MKDEBUG) available. While most install media do not include it (for size reasons), the debug and xdebug sets can be downloaded and extracted later as needed. They provide full symbol information for all base system and X binaries and libraries, and allow better error reporting and (userland) crash analysis.
  • PaX MPROTECT (W^X) memory protection enforced by default on some architectures with fine-grained memory protection and suitable ELF formats: i386, amd64, evbarm, landisk, pmax
  • PaX ASLR enabled by default on: i386, amd64, evbarm, landisk, pmax, sparc64
  • MKPIE (position independent executables) by default for userland on: i386, amd64, arm, m68k, mips, sh3, sparc64
  • added can(4), a socket layer for CAN busses
  • added ipsecif(4) for route-based VPNs
  • made part of the network stack MP-safe (the NET_MPSAFE kernel option is required to try it)
  • WAPBL stability and performance improvements

Specific to i386 and amd64 CPUs:
  • Meltdown mitigation: SVS (separate virtual address spaces)
  • Spectre mitigation (support in gcc, used by default for kernels)
  • Lazy FPU saving disabled on some Intel CPUs ("eagerfpu")
  • SMAP support
  • (U)EFI bootloader

Various new drivers:
  • nvme(4) for modern solid state disks
  • iwm(4), a driver for Intel Wireless devices (AC7260, AC7265, AC3160...)
  • ixg(4): X540, X550 and newer device support.
  • ixv(4): Intel 10G Ethernet virtual function driver.
  • bta2dpd - new Bluetooth Advanced Audio Distribution Profile daemon

Many evbarm kernels now use FDT (flat device tree) information (loadable at boot time from an external file) for device configuration; the number of kernels has decreased, but the number of supported boards has vastly increased.

Lots of updates to 3rd party software included:
  • GCC 5.5 with support for Address Sanitizer and Undefined Behavior Sanitizer
  • GDB 7.12
  • GNU binutils 2.27
  • Clang/LLVM 3.8.1
  • OpenSSH 7.6
  • OpenSSL 1.0.2k
  • mdocml 1.14.1
  • acpica 20170303
  • ntp 4.2.8p11-o
  • dhcpcd 7.0.6
  • Lua 5.3.4

The NetBSD developers and the release engineering team have spent a lot of effort to make sure NetBSD 8.0 will be a superb release, but we have not yet fixed most of the accompanying documentation. The included release notes and install documents will therefore be updated before the final release, and the above list of major items may also lack important things.

Get NetBSD 8.0 RC2 from our CDN (provided by fastly) or one of the ftp mirrors.

Complete source and binaries for NetBSD are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, and other services may be found at http://www.NetBSD.org/mirrors/.

Please test RC2, so we can make the final release the best one ever so far. We are looking forward to your feedback. Please send-pr any bugs or mail us at releng at NetBSD.org for more general comments.


Posted Monday evening, July 2nd, 2018 Tags: blog
I've finished the integration of sanitizers with the distribution build framework. A bootable and installable distribution is now available, verified with Address Sanitizer, with Undefined Behavior Sanitizer, or with both concurrently. A few dozen bugs were detected and the majority of them addressed.

LLVM sanitizers are compiler features that help find common software bugs. The following sanitizers are available:

  • TSan: Finds threading bugs,
  • MSan: Finds uninitialized memory read,
  • ASan: Finds invalid address usage bugs,
  • UBSan: Finds undefined behavior at runtime.

The new MKSANITIZER option supports full coverage of the NetBSD code base with these sanitizers, which helps reduce bugs and serves high security demands.

A brief overview of MKSANITIZER

A sanitizer is a special type of addition to a compiled program, provided by the toolchain (LLVM or GCC). There are a few types of sanitizers. Their usual purposes are bug detection, profiling, and security hardening.

NetBSD already supports the most useful ones with decent completeness:

  • Address Sanitizer (ASan, memory usage bug detector),
  • Undefined Behavior Sanitizer (UBSan, runtime undefined behavior detector),
  • Thread Sanitizer (TSan, data race detector), and
  • Memory Sanitizer (MSan, uninitialized memory read detector).

It's possible to combine compatible sanitizers in a single application; NetBSD and MKSANITIZER support doing so.

There are various advantages and limitations. Properties and requirements vary, mainly reflecting the type of sanitization. Comparisons against other software with similar properties (such as Valgrind) may provide a fuller picture.

Sanitizers usually introduce a relatively small overhead (~2x) compared to Valgrind (~20x). Their portability is decent, as they don't depend heavily on the underlying CPU architecture; UBSan in particular works on basically everything, including VAX. Valgrind's portability, by contrast, is extremely dependent on the kernel and CPU, which makes this diagnostic tool very difficult to port across platforms.

ASan, MSan and TSan require a large addressable memory space due to their design. This restricts MSan and TSan to 64-bit architectures with a lot of RAM, and ASan to architectures that can address the full 4GB (32-bit) address space (it's still possible to use ASan with smaller resources, but it's a tradeoff between usability, time investment, and gain). Although sanitized programs use more memory, the modern design and implementation of the memory management subsystem in the NetBSD kernel allows it to manage that memory lazily: even though TBs of address space are reserved for metadata, the physically used memory is significantly lower, usually about double the regular memory usage of a process. Memory demands are higher for processes that are being fuzzed, so there is an option to cap the number of physical pages used, beyond which the program halts (by default 2GB for libFuzzer).

Some LLVM sanitizers may conflict with other tools (like Valgrind) and mechanisms (like PaX ASLR in the case of ASan, TSan and MSan). Others, like PaX MPROTECT (sometimes called W^X), are fully compatible with all the currently supported sanitizers.
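
As a rough, illustrative calculation of why the metadata reservation reaches terabytes: assuming ASan's usual mapping of one shadow byte per 8 bytes of application memory and a 47-bit user address space (a typical amd64 layout), the shadow region alone spans

```latex
\frac{2^{47}\ \text{bytes}}{8} = 2^{44}\ \text{bytes} = 16\ \text{TiB}
```

of reserved address space, of which only the touched parts are ever backed by physical pages.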

The main purposes of sanitizations from a user point of view are:

  • bug detecting and assuring correctness,
  • high security demands, and
  • auxiliary feature for fuzzing.

It's worth adding a few notes on the security aspect, as there are numerous good security approaches. One of them is proactive secure coding: a regime of using safe constructs in the source code and replacing error-prone functions with versions that are harder to misuse.

However, the disadvantage of this approach is that it only applies during the coding phase. The probability of introducing a bug is minimized, but it still exists. A bug in a program written in either style (proactive secure coding or careless coding) is almost indistinguishable in the final product, and an attacker can use the same methods, such as integer overflow or use after free, to exploit it.

The usual way to deal with the remaining bugs is to assume that the code is buggy and add mitigations that aim to reduce the chance of exploiting it. An example of this is sandboxing an application.

Code built with sanitizers can be configured, either at build time or at run time, to report a bug such as an integer overflow at the moment it is executed and to halt the application immediately. No coding regime can have the same effect, and perhaps the number of programming languages with this property is also limited.
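
To make the halting behavior concrete, here is a minimal C sketch (the function name and the file name in the comment are hypothetical). With Clang or GCC, -fsanitize=signed-integer-overflow instruments the multiplication; adding -fno-sanitize-recover=signed-integer-overflow makes the first report fatal at build time, while for a build with recovery enabled, UBSAN_OPTIONS=halt_on_error=1 achieves the same at run time.

```c
/*
 * Build-time choice (report is fatal):
 *   cc -fsanitize=signed-integer-overflow \
 *      -fno-sanitize-recover=signed-integer-overflow overflow.c
 *
 * Run-time choice (binary built with recovery enabled):
 *   UBSAN_OPTIONS=halt_on_error=1 ./a.out
 */

/*
 * Signed multiplication: Undefined Behavior if the product does not fit
 * in a long. With the flags above, UBSan reports it and stops the program.
 */
long multiply(long a, long b)
{
	return a * b;
}
```

In such a build, calling multiply(LONG_MAX, -3) produces a "signed integer overflow ... cannot be represented in type 'long'" report and the program stops at that point instead of continuing with a wrapped value.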

In order to use sanitizers effectively within a distribution, a program and all of its dependencies (with few exceptions) need to be rebuilt with the same sanitizing configuration. Furthermore, in order to use some fuzzing engines with some types of sanitizers, the fuzzing libraries need to be built with the same sanitization as well (this is true e.g. for Memory Sanitizer used together with libFuzzer).

This was my primary motivation for introducing a new NetBSD distribution build option: MKSANITIZER.

NetBSD is probably the only distribution that ships with a fully sanitized distribution option. Today it "just" requires a locally patched external LLVM toolchain, and the work on this is still ongoing.

Sanitization of the whole userland skips the following exceptions, where it is not applicable:

  • low-level libc libraries crt0, crtbegin, crtend, crti, crtn etc,
  • libc,
  • libm,
  • librt,
  • libpthread,
  • bootloader,
  • crunchgen programs like rescue,
  • dynamic ELF loader (implemented as a library),
  • static libraries and executables (as of today),
  • ldd(1), which borrows parts from the dynamic ELF loader (as of today, as an exception).

Leaving base libraries like libc unsanitized is a design choice of the sanitizers themselves: part of the base code stays unsanitized, and the sanitizers install interceptors for its public symbols. Sanitizers expect to use these APIs at a high level, which prevents recursive sanitization (although it still happens in some narrow cases). A good illustration of this design choice is the sanitization of users of the threading library. Sanitizers, and TSan in particular, register interceptors for the public symbols of libpthread and treat it mostly as a black box (with few exceptions). With a fully sanitized libpthread, the sanitizers would instead need a fully OS-dependent implementation of each feature based on the available kernel features, would have to handle relatively opaque syscalls, CPU-specific differences in the implementation, etc., and in the end it would be very difficult to handle operations like pthread_join(3) without a full reimplementation of libpthread.
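
To illustrate what intercepting a public symbol means, here is a simplified C sketch of a check that a memcpy(3) interceptor could perform before forwarding to the real, unsanitized function; ASan, for example, reports overlapping memcpy() arguments. The names intercepted_memcpy() and sanitizer_reports are hypothetical, and real sanitizers hook the actual symbol through their runtime's interception machinery, which is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustration only: number of problems reported so far. */
static int sanitizer_reports = 0;

/* True if [a, a+n) and [b, b+n) share at least one byte. */
static bool ranges_overlap(const void *a, const void *b, size_t n)
{
	uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;

	return n != 0 && x < y + n && y < x + n;
}

/* Hypothetical wrapper standing in for an interceptor of memcpy(3). */
void *intercepted_memcpy(void *dst, const void *src, size_t n)
{
	if (ranges_overlap(dst, src, n)) {
		sanitizer_reports++;
		fprintf(stderr, "ERROR: memcpy-param-overlap\n");
		/* A real sanitizer would typically abort here. */
		return dst;
	}
	/* Forward to the real, unsanitized libc implementation. */
	return memcpy(dst, src, n);
}
```

The libc implementation itself is never instrumented; all checking happens at the boundary, which is exactly why base libraries like libc can stay unsanitized.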

The sanitization of static programs as of today is a low priority and falls outside the scope of my work.

The situation with ldd(1) will be resolved in the future, and it will most probably be sanitized.

The kernel and kernel modules use a different version of the sanitizers, and the porting of Kernel-AddressSanitizer and Kernel-UndefinedBehaviorSanitizer is ongoing outside the MKSANITIZER context.

There was an analogous attempt in the Gentoo world (asantoo), but these efforts stalled two years ago. The Google Chromium team uses a set of scripts to bootstrap sanitized dependencies for their programs on top of a Linux distribution (as of today Ubuntu Trusty x86_64).

I've started to document the bugs detected with MKSANITIZER in a dedicated directory on my NetBSD homepage, together with my code and notes. So far there are 35 documented findings. Most of them are real problems in programs, some might be considered overcautious (mostly ones detected with UBSan), and probably none of them poses a serious security risk, privilege escalation or system crash. Some of the findings (0029-0035, the MemorySanitizer userland ones) are probably caused by problems in the sanitizers themselves (their NetBSD support).

This list shows that some of the problems are located in formally externally maintained software, such as tmux, heimdal, grep, nvi and nawk.

I think that the following patch is a good example of a valuable finding in a privileged (setuid) program, passwd(1), which reads an array out of bounds and writes a null character into a random byte on the stack (documented as report 0024).

From 28dd358940af30f434a930fd1977e3bf2b69dcb1 Mon Sep 17 00:00:00 2001
From: kamil 
Date: Sun, 24 Jun 2018 01:53:14 +0000
Subject: [PATCH] Prevent underflow buffer read in trim_whitespace() in

If a string is empty or contains only white characters, the algorithm of
removal of white characters at the end of the passed string will read
buffer at index -1 and keep iterating backward.

Detected with MKSANITIZER/ASan when executing passwd(1).
 lib/libutil/passwd.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/libutil/passwd.c b/lib/libutil/passwd.c
index 9cc1d481a349..cee168e7d678 100644
--- a/lib/libutil/passwd.c
+++ b/lib/libutil/passwd.c
@@ -1,4 +1,4 @@
-/*	$NetBSD: passwd.c,v 1.52 2012/06/25 22:32:47 abs Exp $	*/
+/*	$NetBSD: passwd.c,v 1.53 2018/06/24 01:53:14 kamil Exp $	*/
  * Copyright (c) 1987, 1993, 1994, 1995
@@ -31,7 +31,7 @@
 #if defined(LIBC_SCCS) && !defined(lint)
-__RCSID("$NetBSD: passwd.c,v 1.52 2012/06/25 22:32:47 abs Exp $");
+__RCSID("$NetBSD: passwd.c,v 1.53 2018/06/24 01:53:14 kamil Exp $");
 #endif /* LIBC_SCCS and not lint */
@@ -503,13 +503,21 @@ trim_whitespace(char *line)
 	_DIAGASSERT(line != NULL);
+	/* Handle empty string */
+	if (*line == '\0')
+		return;
 	/* Remove leading spaces */
 	p = line;
 	while (isspace((unsigned char) *p))
 	memmove(line, p, strlen(p) + 1);
-	/* Remove trailing spaces */
+	/* Handle empty string after removal of whitespace characters */
+	if (*line == '\0')
+		return;
+	/* Remove trailing spaces, line must not be empty string here */
 	p = line + strlen(line) - 1;
 	while (isspace((unsigned char) *p))
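
For reference, here is a self-contained sketch of the patched trim_whitespace() in C. The loop bodies and the final terminator that are not visible in the diff above (p++, p--, *++p = '\0') are filled in by assumption, and _DIAGASSERT() is dropped, so treat it as an illustration rather than the committed libutil code.

```c
#include <ctype.h>
#include <string.h>

/* Strip leading and trailing whitespace from `line` in place. */
static void trim_whitespace(char *line)
{
	char *p;

	/* Handle empty string */
	if (*line == '\0')
		return;

	/* Remove leading spaces */
	p = line;
	while (isspace((unsigned char)*p))
		p++;
	memmove(line, p, strlen(p) + 1);

	/* Handle empty string after removal of whitespace characters */
	if (*line == '\0')
		return;

	/*
	 * Remove trailing spaces. line[0] is not whitespace here, so the
	 * backward scan stops at line[0] at the latest (no index -1).
	 */
	p = line + strlen(line) - 1;
	while (isspace((unsigned char)*p))
		p--;
	*++p = '\0';
}
```

Without the second empty-string check, an all-whitespace input leaves line empty, so line + strlen(line) - 1 points one byte before the buffer, which is the underflow ASan reported.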

The first boot of a MKSANITIZER distribution with Address Sanitizer

Getting a bootable and installable installation ISO image (leaving aside whether it is buildable and generatable) was a loop of fixing bugs and retrying the process. At the end of it, there is an option to install a fully sanitized userland with ASan, UBSan or both. The MSan version is scheduled after finishing the kernel ptrace(2) work. Other options, like a target prebuilt with ThreadSanitizer, SafeStack or the Scudo Hardened Allocator, are untested.

I have also documented an example of a Heimdal bug that appeared during a login attempt to a fully ASanitized userland (and actually prevented the login).

This particular issue has been fixed with the following patch:

From ddc98829a64357ad73af0d0fa60c8d9c8499cce3 Mon Sep 17 00:00:00 2001
From: kamil 
Date: Sat, 16 Jun 2018 18:51:36 +0000
Subject: [PATCH] Do not reference buffer after the code scope {}

rk_getpwuid_r() returns a pointer pwd->pw_dir to a buffer pwbuf[].

It's not safe to store another a copy of pwd->pw_dir in outter scope and
use it out of the scope where there exists pwbuf[].

This fixes a problem reported by ASan under MKSANITIZER.
 crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c b/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c
index 47cb4481962e..6af30502ed5e 100644
--- a/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c
+++ b/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c
@@ -1,4 +1,4 @@
-/*	$NetBSD: config_file.c,v 1.3 2017/09/08 15:29:43 christos Exp $	*/
+/*	$NetBSD: config_file.c,v 1.4 2018/06/16 18:51:36 kamil Exp $	*/
  * Copyright (c) 1997 - 2004 Kungliga Tekniska Hogskolan
@@ -430,6 +430,8 @@ krb5_config_parse_file_multi (krb5_context context,
     if (ISTILDE(fname[0]) && ISPATHSEP(fname[1])) {
 	const char *home = NULL;
+	struct passwd pw, *pwd = NULL;
+	char pwbuf[2048];
 	if (!_krb5_homedir_access(context)) {
 	    krb5_set_error_message(context, EPERM,
@@ -441,9 +443,6 @@ krb5_config_parse_file_multi (krb5_context context,
 	    home = getenv("HOME");
 	if (home == NULL) {
-	    struct passwd pw, *pwd = NULL;
-	    char pwbuf[2048];
 	    if (rk_getpwuid_r(getuid(), &pw, pwbuf, sizeof(pwbuf), &pwd) == 0)
 		home = pwd->pw_dir;
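
The pattern behind the Heimdal fix can be reduced to a small, self-contained C sketch. lookup_home() is a hypothetical stand-in for rk_getpwuid_r(), and the buffer size and path are arbitrary; the buggy shape appears only in a comment, because running it would be Undefined Behavior.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for rk_getpwuid_r(): writes the home directory
 * into the caller's buffer and returns a pointer into that buffer.
 */
static const char *lookup_home(char *buf, size_t len)
{
	int n = snprintf(buf, len, "/home/%s", "user");

	return (n < 0 || (size_t)n >= len) ? NULL : buf;
}

/*
 * Buggy shape (before the fix):
 *
 *	const char *home = NULL;
 *	if (...) {
 *		char pwbuf[64];
 *		home = lookup_home(pwbuf, sizeof(pwbuf));
 *	}
 *	use(home);	<- pwbuf is out of scope: stack-use-after-scope
 */

/* Fixed shape: the buffer lives in the same scope as every use of home. */
static size_t home_length(void)
{
	char pwbuf[64];
	const char *home = lookup_home(pwbuf, sizeof(pwbuf));

	return home != NULL ? strlen(home) : 0;
}
```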

Sending this patch upstream is on my TODO list, so that other projects can benefit from this work too. A single patch preventing NULL pointer arithmetic in tmux has already been submitted upstream and merged.

After a long run of booting newer versions of the locally patched distribution, I finally got into a functional shell.

Here is a stored "copy-pasted" terminal screenshot after logging into the shell:

also known as NetBSD-current.  It is very possible that it has serious bugs,
regressions, broken features or other problems.  Please bear this in mind
and use the system with care.

You are encouraged to test this version as thoroughly as possible.  Should you
encounter any problem, please report it back to the development team using the
send-pr(1) utility (requires a working MTA).  If yours is not properly set up,
use the web interface at: http://www.NetBSD.org/support/send-pr.html

Thank you for helping us test and improve NetBSD.

We recommend that you create a non-root account and use su(1) for root access.
qemu# uname -a
NetBSD qemu 8.99.19 NetBSD 8.99.19 (GENERIC) #12: Sat Jun 16 02:39:37 CEST 2018
 root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
qemu# nm /bin/ksh |grep asan|grep init
0000000000439bf8 B _ZN6__asan11asan_initedE
0000000000439bfc B _ZN6__asan20asan_init_is_runningE
00000000004387a1 b _ZN6__asanL14tsd_key_initedE
0000000000430f18 b _ZN6__asanL20dynamic_init_globalsE
000000000043a190 b _ZZN6__asan18asanThreadRegistryEvE11initialized
00000000000cfaf0 T __asan_after_dynamic_init
00000000000cf8a0 T __asan_before_dynamic_init
0000000000199b50 T __asan_init

The sshd(8) crash has been fixed by Christos Zoulas. There are still at least 2 unfixed ASan bugs in the installer, and a few others that prevent booting and using the distribution as if the sanitizers were not enabled. The most notorious ones are the ssh(1) and sshd(8) startup breakage and egrep(1) misbehaving in corner cases; both might be false positives or bugs in the sanitizers.

Validation of the MKSANITIZER=yes distribution

I've managed to execute the ATF regression tests against a sanitized distribution prebuilt with Address Sanitizer and in another attempt against Undefined Behavior Sanitizer.

In my setup of the external toolchain I had a broken C++ runtime library, caused by a complicated bootstrap chain. Building the various LLVM projects from a GCC distribution requires bootstrapping them, building and reusing intermediate steps. For example, the compiler-rt project, which contains various low-level libraries (including the sanitizers), requires Clang as the compiler, as otherwise it's not buildable. This is why I've deferred testing all the features at the current stage, and I'm trying to coordinate with the maintainer, Joerg Sonnenberger, the process of upgrading the LLVM projects in the NetBSD distribution. I will reuse that work to rebase my patches and ship a readme to users and other developers who want to run a release built with the MKSANITIZER option.

The lack of a C++ runtime pushed me towards reusing non-sanitized ATF tests (the ATF framework is written in C++) against the sanitized userland. Two bugs were detected:

  • expr(1) triggering Undefined Behavior in the routines detecting overflow in arithmetic operations,
  • sh(1) use after free in corner case of redefining an active function.

I've addressed the expr(1) issues and added new ATF tests in order to catch regressions in future potential changes. The Almquist Shell bug has been reported to the maintainer K. Robert Elz and fixed accordingly.

libFuzzer integration with the userland programs

During the Google Summer of Code project "libFuzzer integration with the basesystem" by Yang Zheng, it was detected that my original expr(1) fix was not fully correct.

Yang Zheng detected that the new version of expr(1) was still crashing in narrow cases. I checked his libFuzzer integration patch for expr(1), reproduced the problem myself and documented it:

$ ./expr -only_ascii=1 -max_len=32 -dict=expr-dict expr_corpus/ 1>/dev/null 
Dictionary: 12 entries
INFO: Seed: 2332047193
INFO: Loaded 1 modules   (725 inline 8-bit counters): 725 [0x7a11f0, 0x7a14c5), 
INFO: Loaded 1 PC tables (725 PCs): 725 [0x579d18,0x57ca68), 
INFO:      269 files found in expr_corpus/
INFO: seed corpus: files: 269 min: 1b max: 31b total: 3629b rss: 29Mb
expr.y:377:12: runtime error: signed integer overflow: 9223172036854775807 * -3 cannot be represented in type 'long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior expr.y:377:12 in 
MS: 0 ; base unit: 0000000000000000000000000000000000000000
9223172036854775807 * -3
artifact_prefix='./'; Test unit written to ./crash-9c3dd31298882557484a14ce0261e7bfd38e882d
Base64: OTIyMzE3MjAzNjg1NDc3NTgwNyAqIC0z

And the offending operation is INT * -INT:

$ eval ./expr-ubsan '9223372036854775807 \* -3'
expr.y:377:12: runtime error: signed integer overflow: 9223372036854775807 * -3 cannot be represented in type 'long'
-9223372036854775805

This has been fixed as well and the set of ATF tests for expr(1) extended for missing scenarios.
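
The fix itself lives in expr(1)'s own code; the following is only a sketch of one portable way to detect the INT * -INT case (and the other sign combinations) before multiplying, using the well-known division-based pre-checks against LONG_MAX and LONG_MIN. The function name checked_mul() is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Store a * b in *res and return true, or return false (leaving *res
 * untouched) if the product would overflow a long.
 */
static bool checked_mul(long a, long b, long *res)
{
	if (a > 0) {
		if (b > 0) {
			if (a > LONG_MAX / b)		/* too large positive */
				return false;
		} else if (b < LONG_MIN / a) {		/* too large negative */
			return false;
		}
	} else {
		if (b > 0) {
			if (a < LONG_MIN / b)		/* too large negative */
				return false;
		} else if (a != 0 && b < LONG_MAX / a) { /* too large positive */
			return false;
		}
	}
	*res = a * b;
	return true;
}
```

With GCC 5+ or Clang, __builtin_mul_overflow(a, b, &res) is a compiler-specific alternative that performs the same test.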

MKSANITIZER implementation

The initial implementation of MKSANITIZER has been designed and implemented by Christos Zoulas. I took this code and continued working on it with an external LLVM toolchain (version 7svn with local patches). The final result has been documented in share/mk/bsd.README:

MKSANITIZER     if "yes", use the selected sanitizer to compile userland
                programs as defined in USE_SANITIZER, which defaults to
                "address". A selection of available sanitizers:
                        address:        A memory error detector (default)
                        thread:         A data race detector
                        memory:         An uninitialized memory read detector
                        undefined:      An undefined behavior detector
                        leak:           A memory leak detector
                        dataflow:       A general data flow analysis
                        cfi:            A control flow detector
                        safe-stack:     Protect against stack-based corruption
                        scudo:          The Scudo Hardened allocator
                It's possible to specify multiple sanitizers within the
                USE_SANITIZER option (comma separated). The USE_SANITIZER value
                is passed to the -fsanitize= argument to the compiler.
                Additional arguments can be passed through SANITIZERFLAGS.
                The list of supported features and their valid combinations
                depends on the compiler version and target CPU architecture.

As an illustration, to build a distribution with ASan and UBSan using the LLVM toolchain, one needs to enter a command line like:

./build.sh -V MKLLVM=yes -V MKGCC=no -V HAVE_LLVM=yes -V MKSANITIZER=yes -V USE_SANITIZER="address,undefined" distribution

There is an ongoing effort to upstream the remaining toolchain patches; right now we need to use a specially preprocessed external LLVM toolchain with a pile of local patches.

The GCC toolchain is downstream of the LLVM sanitizers and is outside the current focus, although there are local NetBSD patches for ASan, UBSan and LSan in GCC's libsanitizer. Starting with GCC 8.x, the first upstreamed block of NetBSD code has been pulled in from the LLVM sanitizers.

Golang and TSan (-race)

The compiler-rt update patch has finally been merged into Golang:

runtime/race: update most syso files to compiler-rt fe2c72

These were generated using the racebuild configuration from
https://golang.org/cl/115375, with the LLVM compiler-rt repository at
commit fe2c72c59aa7f4afa45e3f65a5d16a374b6cce26 for most platforms.

The Windows build is from an older compiler-rt revision, because the
compiler-rt build script for the Go race detector has been broken
since January 2017 (https://reviews.llvm.org/D28596).

Updates #24354.

Change-Id: Ica05a5d0545de61172f52ab97e7f8f57fb73dbfd
Reviewed-on: https://go-review.googlesource.com/112896
Reviewed-by: Brad Fitzpatrick 
Run-TryBot: Brad Fitzpatrick 
TryBot-Result: Gobot Gobot 

This means that the TSan/amd64 support syso file is now included for NetBSD, next to Darwin, FreeBSD and Linux (Windows is broken and no longer maintained). The remaining patches for shell scripts and Go files still need to be merged; that code is still in review, waiting for feedback.

Changes merged with the NetBSD sources

  • ksh: Remove symbol clash with libc -- rename twalk() to ksh_twalk()
  • ktruss: Remove symbol clash with libc -- rename wprintf() to xwprintf()
  • ksh: Remove symbol clash with libc -- rename glob() to ksh_glob()
  • Don't pass -z defs to libc++ with MKSANITIZER=yes
  • Mark sigbus ATF tests in t_ptrace_wait as expected failure
  • Make new DTrace and ZFS code buildable with Clang/LLVM
  • Fix the MKGROFF=no MKCXX=yes build
  • Correct Undefined Behavior in ifconfig(8)
  • Correct Undefined Behavior in libc/citrus
  • Correct Undefined Behavior in gzip(1)
  • Do not use index out of bounds in nawk
  • Change type of tilde_ok from int to unsigned int in ksh(1)
  • Rework perform_arith_op() in expr(1) to omit Undefined Behavior
  • Add 2 new expr(1) ATF tests
  • Prevent Undefined Behavior in shift of signed integer in grep(1)
  • Set NOSANITIZER in i386 mbr files
  • Disable sanitizers for libm and librt
  • Avoid Undefined Behavior in DEFAULT_ALIGNMENT in GNU grep(1)
  • Detect properly overflow in expr(1) for 0 + INT
  • Make the alignof() usage more portable in grep(1)
  • heimdal: Do not reference buffer after the code scope {}
  • Do not cause Undefined Behavior in vi(1)
  • Disable MKSANITIZER in lib/csu
  • Disable SANITIZER for ldd(1)
  • Set NOSANITIZER in rescue/Makefile
  • Add new option -s to crunchgen(1) -- enable sanitization
  • Make building of dhcp compatible with MKSANITIZER
  • Refactor MKSANITIZER flags in mk rules
  • Specify NOSANITIZER in distrib/amd64/ramdisks/common
  • Fix invalid free(3) in sysinst(8)
  • Fix integer overflow in installboot(8)
  • Specify -Wno-format-extra-args for Clang/LLVM in gpl2/gettext
  • sysinst: Enlarge the set_status[] array by a single element
  • Prevent underflow buffer read in trim_whitespace() in libutil/passwd.c
  • Fix stack use after scope in libutil/pty
  • Prevent signed integer left shift UB in FD_SET(), FD_CLR(), FD_ISSET()
  • Enhance the documentation of MKSANITIZER in bsd.README
  • Avoid unportable offsetof(3) calculation in nvi in log1.c
  • Add a framework for renaming symbols in libc&co for MKSANITIZER
  • Specify SANITIZER_RENAME_SYMBOL in diffutils
  • Specify SANITIZER_RENAME_SYMBOL in chpass
  • Include for offsetof(3)
  • Avoid UB in tmux/window_copy_add_formats()
  • Document sanitizers in acronyms.comp
  • Add TODO.sanitizer
  • Avoid misaligned access in disklabel(8) in find_label() (patch by Christos Zoulas)
  • Improve the * operator handling in expr(1)
  • Add a couple of new ATF expr(1) tests
  • Add a missing check to handle correctly 0 * 0 in expr(1)
  • Add 3 more expr(1) ATF tests detecting overflow

Changes merged with the LLVM projects

  • LLVM: Handle NetBSD specific path in findDebugBinary()
  • compiler-rt: Disable recursive interceptors in signal(3)/MSan
  • Introduce CheckASLR() in sanitizers

Plan for the next milestone

The ptrace(2) tasks have been preempted: they are suspended in favor of the work on sanitizers, in order to actively collaborate with the Google Summer of Code students (libFuzzer integration with userland, KUBSan, KASan).

I have planned the following tasks before returning back to the ptrace(2) fixes:

  • upgrade base Clang/LLVM, libcxx, libcxxabi to at least 7svn (HEAD) (needs cooperation with Joerg Sonnenberger)
  • compiler-rt import and integration with base (needs cooperation with Joerg Sonnenberger)
  • merge TSan, MSan and libFuzzer ATF tests
  • prepare MKSANITIZER readme
  • kernel-asan port
  • kernel-ubsan port
  • switch syscall(2)/__syscall(2) to libc calls
  • upstream local patches, mostly to compiler-rt
  • develop fts(3) interceptors (MSan, for ls(1), find(1), mtree(8))
  • investigate and address the libcxx failing tests on NetBSD
  • no-ASLR boot.cfg option, required for MKSANITIZER
My plan for the next milestone is to reduce the list and keep actively collaborating with the summer students.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:


Posted Monday afternoon, July 2nd, 2018 Tags: blog
Prepared by Harry Pantazis (IRC:luserx0, Mail:luserx0 AT gmail DOT com) as part of GSoC 2018.

For GSoC '18, I'm working on the Kernel Undefined Behavior Sanitizer (KUBSAN) project for the integration of Undefined Behavior regression testing on the amd64 kernel. This article summarizes what has been done up to this point (Phase 1 Evaluation), future goals and a brief introduction to Undefined Behavior.

So, first things first, let's get started.
The mailing list project presentation

What is Undefined Behavior?

For Turing-complete languages we cannot reliably decide offline whether a program has the potential to execute an error; we just have to run it and see. DUH!

Undefined Behavior in C is behavior for which the ANSI standard imposes no requirements. Code containing Undefined Behavior is usually perfectly valid-looking C: it compiles, often without a single warning, and yet it causes real trouble. It's whatever the compiler doesn't moan about, but when run causes run-time bugs that are hard to locate.

The C FAQ defines "Undefined Behavior" like this:

Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

A brief explanation of what is classified as UB and some real-world scenarios

A great blog post explaining more than mere mortals might need

The important and scary thing to realize is that just about *any* optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above.

Solution: Make a UB Sanitizer

What we can do to find undefined behavior errors in our code is create a Sanitizer. Fortunately, both Clang and GCC have taken care of such "dream" tools, covering the majority of undefined behavior cases in a very nice manner. They allow us to simply pass the -fsanitize=undefined option when we build our code, and the instrumented program "spits out" simple warnings at run time for us to see. The Clang supported flags (same as GCC's, but GCC doesn't have such extensive explanatory docs).
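As a minimal illustration (a hypothetical example, not code from the project), naive_add() below contains the classic signed-overflow UB that -fsanitize=undefined is designed to report at run time, while checked_add() shows a well-defined alternative built on __builtin_add_overflow(), a builtin provided by both GCC (5 and later) and Clang.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example: if a + b does not fit in an int, this is
 * Undefined Behavior. Built with -fsanitize=undefined, a call such as
 * naive_add(INT_MAX, 1) makes the runtime print a
 * "signed integer overflow" error. */
int naive_add(int a, int b)
{
    return a + b;
}

/* Well-defined alternative: __builtin_add_overflow() computes the sum
 * as if with infinite precision, stores the (wrapped) result in *out
 * and returns true when the result did not fit. We invert it so that
 * checked_add() returns true on success. */
bool checked_add(int a, int b, int *out)
{
    return !__builtin_add_overflow(a, b, out);
}
```

Building this with `cc -fsanitize=undefined` and calling naive_add(INT_MAX, 1) should produce a runtime report instead of the value silently wrapping.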

Adding ATF Tests for Userland UBSan

This was my first deliverable for the integration of KUBSan. The idea was to include tests in which simple C programs exhibit Undefined Behavior, such as overflows, erroneous shifting and out-of-bounds accesses of arrays (VLAs actually). The ATF framework is not a real "sweetheart" to learn, so this preliminary step took me longer than expected. The good news was that I had enough time to understand Undefined Behavior in satisfying depth and do extensive research for ideas (and more).
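To show precisely what makes a shift "erroneous", here is a hedged sketch of a hypothetical helper (not part of the committed ATF tests) that encodes the C99/C11 rules for left-shifting a signed int. The cases where it returns false are the ones UBSan's shift checks are meant to catch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if `value << amount` is defined
 * behavior for a signed int under the C99/C11 rules. */
bool left_shift_is_defined(int value, int amount)
{
    const int width = (int)(sizeof(int) * CHAR_BIT);

    /* A shift count that is negative or >= the type width is UB. */
    if (amount < 0 || amount >= width)
        return false;
    /* Left-shifting a negative signed value is UB. */
    if (value < 0)
        return false;
    /* The result value * 2^amount must still be representable. */
    return value <= (INT_MAX >> amount);
}
```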

The initial commit of the tests cleaned up and submitted by my mentor Kamil Rytarowski.

Addition of Example Kernel Module Panic_String

Next on our roadmap was understanding NetBSD's loadable kernel modules. For this, I created a kernel module that reads a string from a device named /dev/panic and, after syncing the system, calls the kernel's panic(9) with it as the argument. This took a long time, but in the process I had the privilege of reading FreeBSD Device Drivers: A Guide for the Intrepid, which, unfortunately for our foundation, is the only book that closely resembles our kernel module infrastructure.

The panic_string module commit revised, corrected and uploaded by Kamil.

Compiling the kernel with -fsanitize=undefined

I compiled the kernel with the aforementioned option to catch UB bugs. We got one. Only one! It was reported to the tech-kern mailing list in this thread.

Adding the option to compile the Kernel with KUBSan

Our last deliverable for GSoC's first evaluation was getting the amd64 kernel to boot with the KUBSan option enabled. This was tricky. We only needed the appropriate dummy functions, so that their symbols could be resolved when linking a kernel build. At first I created KUBSan as a loadable kernel module, but the chaotic structure of our codebase was too much for me: I spent 4 whole days searching for a way to link the exported symbols into the kernel build, without success :(. But everything happens for a reason, because that one failure pushed me to look at all the available UBSan implementations, and I was able to locate the initial KUBSan support for Linux, Chromium and FreeBSD. This in turn made me realise that the module was not necessary, since I could include the KUBSan functionality in our /sys infrastructure. I did so, it worked, and it allowed me to boot and run a fully KUBSan-ed kernel.
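For readers curious about what such "dummy functions" look like: code built with -fsanitize=undefined contains compiler-generated calls to runtime handlers named __ubsan_handle_*(), so the kernel only links once those symbols exist. The sketch below is a hypothetical minimal version of that idea, not the code in the fork. The handler names and argument counts follow the UBSan runtime ABI used by Clang and GCC (as implemented in LLVM's compiler-rt); ubsan_value_t, the call counter and ubsan_report_count() are made up for illustration, and only a handful of the many handlers are shown.

```c
#include <stdint.h>

/* Operands reach the handlers as pointer-sized integer handles
 * (compiler-rt calls this type ValueHandle); the typedef name is ours. */
typedef uintptr_t ubsan_value_t;

/* Hypothetical bookkeeping: a real runtime would decode `data`
 * (source location, type descriptors) and print a report instead. */
static unsigned long ubsan_reports;

unsigned long ubsan_report_count(void)
{
    return ubsan_reports;
}

/* Arithmetic overflow: data, left operand, right operand. */
void __ubsan_handle_add_overflow(void *data, ubsan_value_t lhs, ubsan_value_t rhs)
{
    (void)data; (void)lhs; (void)rhs;
    ubsan_reports++;
}

void __ubsan_handle_sub_overflow(void *data, ubsan_value_t lhs, ubsan_value_t rhs)
{
    (void)data; (void)lhs; (void)rhs;
    ubsan_reports++;
}

void __ubsan_handle_mul_overflow(void *data, ubsan_value_t lhs, ubsan_value_t rhs)
{
    (void)data; (void)lhs; (void)rhs;
    ubsan_reports++;
}

/* Invalid shift: data, shifted value, shift amount. */
void __ubsan_handle_shift_out_of_bounds(void *data, ubsan_value_t lhs, ubsan_value_t rhs)
{
    (void)data; (void)lhs; (void)rhs;
    ubsan_reports++;
}

/* Array index out of bounds: data, offending index. */
void __ubsan_handle_out_of_bounds(void *data, ubsan_value_t index)
{
    (void)data; (void)index;
    ubsan_reports++;
}
```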

It hasn't been uploaded to upstream yet, but you can have a look at my local (and totally messy) fork.

Summary and Future Goals

This first month of GSoC has been a great experience. I also participated last year, with a project trying to "revamp" support for Scheme R7RS in the Eclipse IDE (we later tried to create a Kawa-Scheme Language Server (LSP), but that's a sad story), and my overall experience was really bad (I had to quit mid-July). This year we are doing things in a much friendlier, more cooperative and result-producing manner. I'm really happy about that.

In brief: the kernel boots with KUBSan, and I now know all the tools needed to extend that functionality. That's all ye need to know up to this point.

Future goals include:
  1. Making a full implementation of KUBSan, aiming to surpass other existing implementations,
  2. Clearing up any license issues,
  3. Finishing the amd64 implementation and switching focus to i386,
  4. Spreading the NetBSD hype

Last but not least, I would like to deliver a huge thanks to my mentors Kamil and Christos for their advice and help with the project, but mostly for their incredible behavior towards the problems I went through this past month. Much love :)

Posted Friday afternoon, June 15th, 2018 Tags: blog
Prepared by Siddharth Muralee (@Tr3x__) as part of GSoC 2018.

It's been a fun couple of weeks since I started working on the Kernel Address Sanitizer (KASan) project with NetBSD. I have learned a lot during this period. It's been pretty amazing. This is a report on the work I have done prior to the first evaluation period.

What is an Address Sanitizer?

The Address Sanitizer (ASan) is an open source tool that was developed by Google to detect memory corruption bugs such as buffer overflows or accesses to dangling pointers (use after free). It's part of a toolset from Google that also includes the Undefined Behaviour Sanitizer (UBSan), the Thread Sanitizer (TSan) and the Leak Sanitizer (LSan).

With this feature added to NetBSD, it would be possible to build the kernel with ASan and then use it to find memory corruption bugs.
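To give an idea of how ASan finds such bugs, here is a simplified sketch of its shadow memory scheme: each 8-byte granule of memory is described by one shadow byte located at (addr >> 3) + offset, and every instrumented load or store checks that byte first. The 8-byte granularity and the shadow byte encoding follow ASan's published design; the offset is left as a parameter because it differs per platform, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One shadow byte describes 2^3 = 8 bytes of application memory. */
#define SHADOW_SCALE 3
#define SHADOW_GRANULARITY (1u << SHADOW_SCALE)

/* Address of the shadow byte for `addr`; the offset is platform
 * specific, so it is passed in rather than hard-coded. */
uintptr_t mem_to_shadow(uintptr_t addr, uintptr_t shadow_offset)
{
    return (addr >> SHADOW_SCALE) + shadow_offset;
}

/* Check an access of `size` bytes (1..8, within one granule) at `addr`,
 * given the shadow byte of that granule:
 *   0     -> all 8 bytes are addressable
 *   1..7  -> only the first k bytes are addressable
 *   < 0   -> whole granule poisoned (redzone, freed memory, ...) */
bool access_is_valid(int8_t shadow, uintptr_t addr, unsigned size)
{
    if (shadow == 0)
        return true;
    if (shadow < 0)
        return false;
    /* The last byte touched must lie inside the addressable prefix. */
    return (int)((addr & (SHADOW_GRANULARITY - 1)) + size - 1) < shadow;
}
```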

Testing ASan in the User Space

My first step was to test whether ASan had been implemented in the NetBSD userspace. I wrote a couple of ATF regression tests checking whether ASan worked in userspace with the C and C++ compilers, and whether manual poisoning worked.

This allowed me to get familiar with the ATF testing framework that NetBSD uses.

Added a couple of Kernel Modules

I was asked to add a set of example kernel modules to the kernel. I added an example module to show how to make a /dev module multiprocessor safe and to add a node in the sysctl tree.

Reading about UVM

My next task was to get familiar with UVM (the virtual memory system of NetBSD). I read through a 1998 dissertation by Dr. Chuck Cranor, and published a blog article containing my scratch notes on reading it.

Adding an option to compile the kernel with KASan

Finally, I had to build the kernel with the KASan stubs (dummy functions so that the build would link). I added a configuration file which can be used to build the kernel with KASan. I also published a blog post on how to do the same.


In short, I am pretty excited to move forward with the project. The community has been supportive and helpful all the way.

I would like to thank my mentor, Kamil Rytarowski who was always ready to dive deep into code and help whenever required. I also want to thank Cherry Mathews for helping clear up doubts related to UVM.

Posted Wednesday afternoon, June 13th, 2018 Tags: blog

Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018

During the Google Summer of Code 2018, I'm working on the project of integrating libFuzzer for the userland applications. libFuzzer is a fuzzing engine based on the coverage information provided by SanitizerCoverage in LLVM. It repeatedly generates mutations of input data and tests them until it finds potential bugs. In this post, I'm going to share what I have done in the first month of this summer.
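For reference, a libFuzzer fuzz target is an ordinary C function named LLVMFuzzerTestOneInput() that the engine calls with each generated input. The minimal sketch below assumes a hypothetical function under test, count_escapes(); the copy into a NUL-terminated buffer matters because the input data is not NUL-terminated.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical code under test: counts '%' characters in a C string. */
static size_t count_escapes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == '%')
            n++;
    return n;
}

/* libFuzzer entry point, called repeatedly with mutated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* The input is NOT NUL-terminated: copy it before using string code. */
    char *buf = malloc(size + 1);
    if (buf == NULL)
        return 0;
    if (size > 0)
        memcpy(buf, data, size);
    buf[size] = '\0';
    (void)count_escapes(buf);
    free(buf);
    return 0;   /* libFuzzer expects 0 from ordinary targets */
}
```

With a recent Clang such a target is typically built with `-fsanitize=fuzzer,address`; on NetBSD this depends on the integration work described in this report.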

For the first month, I mainly tried to apply the sanitizers to the userland applications. Sanitizers (such as MemorySanitizer, AddressSanitizer, etc.) are helpful to the fuzzing process because they can detect various types of run-time errors like uninitialized reads, out-of-bounds accesses, use-after-free and so on. I started with MemorySanitizer, and there were three steps to finish this:

  1. Import a new version of LLVM as an external toolchain
  2. Add new interceptors for userland applications
  3. Enable MemorySanitizer for userland applications and test them

Compile New Version LLVM Statically with EXTERNAL_TOOLCHAIN

Using a new version of LLVM toolchain is necessary because the LLVM in NetBSD trunk is old and there are some changes in the new version. However, updating the toolchain in the src will introduce extra work for this project, so we decided to use the EXTERNAL_TOOLCHAIN parameter provided by NetBSD to work with the new version.

During this period, I chose to use a pure-LLVM userland to avoid potential problems. This means that userland programs should use the libc++ library instead of libstdc++. As a result, I used the -DSANITIZER_CXX_ABI=libc++ and -DCLANG_DEFAULT_CXX_STDLIB=libc++ flags to eliminate some compilation errors while compiling the LLVM toolchain.

Another compilation issue is related to the sanitizers. Whenever a sanitizer check fails, the program aborts with backtrace information like this:

    ==15299==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837 in main /home/zhengy/free.c:6:3
        #1 0x41c580 in ___start (//./a.out+0x41c580)

    SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/zhengy/free.c:6:3 in main

The backtrace is generated with the help of llvm-symbolizer. However, if we compile with sanitizers some of the dynamic libraries that llvm-symbolizer depends on (because some sanitized userland programs also need them), then llvm-symbolizer can no longer produce a readable backtrace:

    ==1623==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837  (//./a.out+0x41c837)
        #1 0x41c580  (//./a.out+0x41c580)

    SUMMARY: MemorySanitizer: use-of-uninitialized-value (//./a.out+0x41c837)

So, to remove the dependencies of llvm-symbolizer and the other LLVM tools on sanitized dynamic libraries, we chose to compile the whole LLVM toolchain statically. For this purpose, we found that LLVM's static build does not work on NetBSD as-is, so we had to make a subtle modification to the CMake files. This modification still needs confirmation of its correctness from the LLVM community.

After all of these preparations, I wrote a shell script that automatically prepares the external LLVM toolchain, compiles NetBSD from source and finally generates a chroot(8)-able environment to work with sanitizers and libFuzzer.

With this environment, I first tried to run the test cases from both LLVM and NetBSD. For the LLVM part, I found that some libFuzzer test cases were not working. In the end, we found that this resulted from improper usage of the sem_open(3) interface in libFuzzer, so I submitted a patch to fix it.

For the NetBSD part, it worked well with the existing ATF(7) test cases for the AddressSanitizer and UndefinedBehaviorSanitizer. To test the MemorySanitizer, ThreadSanitizer, and libFuzzer, I added some test cases for them.

Add New Interceptors

Some libraries (such as libc, libm, and libpthread) and syscalls cannot be properly instrumented by sanitizers. This causes trouble, because we lack information about what these unsanitized interfaces do. Fortunately, sanitizers can provide wrappers, namely interceptors, for these interfaces to supply this information manually. However, the set of interceptors is quite incomplete, so some effort is needed to add the unsanitized functions required by userland applications. In summary, I added interceptors for the following interfaces:

  • strtonum(3) family: strtonum(3), strtoi(3), strtou(3)
  • vis(3) family: vis(3), nvis(3), strvis(3), etc.
  • getmntinfo(3)
  • puts(3), fputs(3)
  • Hash interfaces: sha1(3), md2(3), md4(3), md5(3), rmd160(3) and sha2(3)
  • getvfsstat(2)
  • nl_langinfo(3)
  • fparseln(3)
  • unvis(3) family: unvis(3), strunvis(3), etc.
  • statvfs(2) family: statvfs(2), fstatvfs(2), etc.
  • mount(2) and unmount(2)
  • fseek(3) family: fseek(3), ftell(3), rewind(3), etc.
  • cdbr(3) family: cdbr_open(3), cdbr_get(3), cdbr_find(3), etc.
  • setvbuf(3) family: setbuf(3), setbuffer(3), setlinebuf(3), setvbuf(3)
  • mi_vector_hash(3)

Most of these interceptors are easy to add: we only need to leverage the interceptor interfaces provided by compiler-rt and do the pre- and post-call checks. As an example, I chose the interceptor of strvis(3) to illustrate the implementation:

    INTERCEPTOR(int, strvis, char *dst, const char *src, int flag) {
      void *ctx;
      COMMON_INTERCEPTOR_ENTER(ctx, strvis, dst, src, flag);
      // strvis(3) reads src up to and including its terminating NUL
      if (src)
        COMMON_INTERCEPTOR_READ_RANGE(ctx, src, REAL(strlen)(src) + 1);
      // Call the real (unsanitized) libc implementation
      int len = REAL(strvis)(dst, src, flag);
      // It writes len encoded characters plus a terminating NUL to dst
      if (dst)
        COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dst, len + 1);
      return len;
    }

The strvis(3) interface encodes the string stored in src and writes the result to dst. So, its interceptor needs to tell the sanitizers two things:
  1. strvis(3) will read the string in src (COMMON_INTERCEPTOR_READ_RANGE interface)
  2. strvis(3) will write a string to dst (COMMON_INTERCEPTOR_WRITE_RANGE interface)

So, with interceptors, the sanitizers can obtain information about unsanitized interfaces. There are three unsolved issues with interceptors:

  1. Interceptors with the FILE type: FILE is implemented as a structure and contains some pointers inside, which means that we should check these pointers one by one in the interceptors. However, although the FILE type exists on all OSs, its implementation varies a lot, so we need different conditions for different OSs. What's worse, some existing interceptors written by others (such as fopen) skip the checks for FILE. This introduces incompatibilities if we enforce the check in other interfaces (like fputs). For example, fopen is the interface that initializes a FILE; if we skip marking the returned FILE pointer as initialized (with COMMON_INTERCEPTOR_WRITE_RANGE), we will get an error in the interceptor of fputs after we enforce the check of this pointer (with COMMON_INTERCEPTOR_READ_RANGE).
  2. mount(2) interface: mount(2) requires a data parameter specific to each file system. This parameter can have different types, such as struct ufs_args, struct nfs_args and so on. These types usually contain pointers, so we need to check them one by one. However, there are around 34 different struct xxx_args types in NetBSD, so it would be quite hard to add and maintain them all in the compiler-rt repository.
  3. getchar(3) and putchar(3) family interfaces: these interfaces may be defined as macros, depending on various compile-time conditions, so intercepting them is complicated.
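The fopen/fputs mismatch in issue 1 is easier to see with a toy model. The sketch below is hypothetical and far simpler than MemorySanitizer's real shadow memory: it keeps one "initialized" flag per byte of a small arena, model_write_range() plays the role of COMMON_INTERCEPTOR_WRITE_RANGE and model_read_range() the role of COMMON_INTERCEPTOR_READ_RANGE. If the fopen-like initializer does not report its write, a later fputs-like read of the same object is flagged even though the memory was in fact initialized.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model: one "initialized" flag per byte of a small arena. */
#define ARENA_SIZE 64
static unsigned char arena[ARENA_SIZE];
static bool initialized[ARENA_SIZE];

/* Like COMMON_INTERCEPTOR_WRITE_RANGE: these bytes now hold data. */
void model_write_range(size_t off, size_t len)
{
    for (size_t i = off; i < off + len && i < ARENA_SIZE; i++)
        initialized[i] = true;
}

/* Like COMMON_INTERCEPTOR_READ_RANGE: every byte read must be
 * initialized; returns false where MSan would report an error. */
bool model_read_range(size_t off, size_t len)
{
    if (off > ARENA_SIZE || len > ARENA_SIZE - off)
        return false;   /* outside the arena: treat as an error */
    for (size_t i = off; i < off + len; i++)
        if (!initialized[i])
            return false;
    return true;
}

/* Hypothetical fopen-like initializer for an 8-byte "FILE" object at
 * `off` (off <= ARENA_SIZE - 8); optionally reports its write. */
void model_open(size_t off, bool report_write)
{
    memset(arena + off, 0, 8);
    if (report_write)
        model_write_range(off, 8);
}
```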

Enable the Sanitizers for the Userland with MKSANITIZER

After adding interceptors, we can then enable the sanitizers for userland applications. To ship the sanitizers to users, Christos Zoulas prepared the MKSANITIZER framework, dedicated to building the whole userland with a chosen sanitizer (UndefinedBehaviorSanitizer, Control Flow Integrity, MemorySanitizer, ThreadSanitizer, SafeStack, LeakSanitizer, etc.).

Based on this framework, Kamil Rytarowski used NetBSD build parameters like MKSANITIZER=yes USE_SANITIZER=undefined HAVE_LLVM=yes and managed to enable the UndefinedBehaviorSanitizer option for the whole userland. There is an ongoing effort to upstream local patches and fix the detected bugs. The plan is to follow up with the remaining sanitizer options.

I also tried to enable MemorySanitizer for the userland programs, and here is the result. If you have any insights or suggestions, please feel free to comment on it. Applying the MemorySanitizer option also helped to improve the interceptors and to integrate MKSANITIZER. MemorySanitizer is sensitive to interceptor issues, so this job was actually intertwined with adding and improving the interceptors. With MemorySanitizer, I also found two bugs in the top(1) program. You can refer to this post to learn more about them.
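One of these top(1) bugs was an uninitialized signal mask passed to sigaction(2). As a generic sketch (not the actual top(1) patch), a fully initialized handler installation looks like this; on_signal(), got_signal and install_handler() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install on_signal() for signo with a fully initialized struct
 * sigaction. Returns 0 on success, -1 on failure (errno set). */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));  /* no stale fields from the stack */
    sigemptyset(&sa.sa_mask);    /* the mask MSan flags if left unset */
    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```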

There are also some unsolved issues with some applications. As shown in the sheet, I divide them into five categories:

  1. DEADLYSIGNAL: mainly happening when sending CTRL-C to programs
  2. IOCTL: ioctl(2)-related errors
  3. GETC, PUTC, FFLUSH: stdio(3)-related errors
  4. REALLOC: realloc(3)-related errors
  5. Compilation errors: conflict symbols between programs and base libraries

The challenge of the GETC, PUTC, FFLUSH category was mentioned above: it mainly results from the lack of interceptors for these interfaces. The other categories remain to be investigated.


In the last month, I made a good start working with LLVM and NetBSD and successfully built some userland programs with MemorySanitizer. All of the work mentioned above is based on forked repositories instead of the official ones. If you are interested in them, please refer to these repositories: NetBSD source, pkgsrc-wip, LLVM, clang, and compiler-rt. Next, I will switch to the integration work of libFuzzer and try to run some programs as a trial.

Last but not least, I want to thank my mentors, Christos Zoulas and Kamil Rytarowski; they helped me a lot with many good suggestions and assistance. I also want to thank Matthew Green and Joerg Sonnenberger for their LLVM-related suggestions. Finally, thanks to Google for giving me a good chance to work with the NetBSD community.

Posted early Wednesday morning, June 13th, 2018 Tags: blog

I like to use CLI email clients (mutt). This by itself is not unusual, but I happen to do this while speaking a language written right-to-left, Hebrew.
Decent bidi support in CLI tools is rare, so my impression is that very few people do this.

In the dark ages before Unicode, Hebrew used its own encodings which allowed typing both Latin and Hebrew letters: Windows-1255, ISO-8859-8.
I speculate that people initially expected input to be written in reverse order (aka "visual order"), assuming that everything would display text left to right.

When people wanted to use e-mail, they decided to write a line stating the charset encoding, as others do, and to use quoted-printable or base64 to avoid the content being mangled by clueless servers (8BITMIME wasn't around then).

But then they thought about bidi, and realized that writing in reverse isn't that great once you can have some bidi support. I've yet to write a bidi algorithm, but I suspect reverse order makes line-wrapping illogical.

To avoid conflicts with existing emails, they decided on separate encodings for the purpose of conveying that the information isn't in reverse:
  • iso-8859-8-i: the content is in logical order, and Hebrew is assumed to be rtl.
  • iso-8859-8-e: the text direction is explicit using control codes.

The latter is a neat idea, but hasn't caught on. Now it's common to assume logical order, and even iso-8859-8 might be in that format.
While defining this, they've also done the same for Arabic (iso-8859-6).

This is a discussion that should've been part of the past - Unicode is now a thing, and I can send messages that contain Hebrew, Arabic, Chinese, English - without flipping back and forth in encoding (if that was ever even possible?), and out of the box! Never a need to enable support for specifying charset. Unicode has a detailed algorithm for handling bidi.
Unicode is love. Unicode is life. Use Unicode.
But I was recently looking for work, and HR's presumed Microsoft Outlook MUA did not use Unicode.

One of the emails I got was encoded as iso-8859-8-i.
It turns out my MUA setup cannot handle this charset. The body ended up looking like \344 things, and the subject like boxes.
mail is a plaintext format with extensions hacked into it, so you can view the raw content as a file. I used 'e' on mutt to open it:

Subject: =?iso-8859-8-i?B?base64stuff
(The magical 'encode this in a different way' for email subjects)
Content-Type: text/plain; charset="iso-8859-8-i"
Content-Transfer-Encoding: quoted-printable
So this is an iso-8859-8-i file.

OK, let's just read this file. I've got python.
I saved the file, which looked like this in its raw format:

Oh, quoted-printable. Gotta turn that into raw data, then convert ISO-8859-8 to UTF-8.
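    import quopri
    import sys

    # Read the raw quoted-printable body as bytes from standard input
    rawmsg = sys.stdin.buffer.read()
    # Undo quoted-printable, then decode the ISO-8859-8 bytes to text
    notutf8msg = quopri.decodestring(rawmsg)
    utf8msg = notutf8msg.decode('iso-8859-8')
    print(utf8msg)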

Cool. I can read the message. I even discover 'fribidi' isn't just a library, but also provides a command I can pipe this into and see nicely-formatted Hebrew even without using weirdo terminal emulators.

But let's not leave bugs like that lurking around. It is my duty as an RTL warrior to fix it.

One of the perks to using pkgsrc/netbsd and open source is that I can immediately look at mutt's source code. I knew it could handle iso-8859-8, so that's what I looked for.

The number of results (combined with experience) quickly suggested that the encoding is handled by the OS, NetBSD in this case.
NetBSD didn't know about iso-8859-8-i.

Experience meant I knew to look in either src/lib/ (it wasn't there) or src/share/ for 'data used by things'. I looked for 'iso-8859-8' to see if it appears anywhere, and found it. It was good to see that NetBSD does appear to have a way to alias charsets as being equivalent, so I added iso-8859-8-i here, and did a full build because I didn't know how the files are used.
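mutt relies on the C library's iconv(3) for such conversions, which is why the missing alias surfaced there. As a hedged, application-side illustration (not the NetBSD fix, which adds the alias to the system's charset data), the sketch below treats iso-8859-8-i as plain ISO-8859-8, since the "-i" suffix only states that the text is in logical order. normalize_charset() and to_utf8() are hypothetical names. The code follows the POSIX iconv(3) prototype; as far as I know, NetBSD releases of that era declared the input argument as const char **, so a cast may be needed there.

```c
#define _POSIX_C_SOURCE 200809L
#include <iconv.h>
#include <stddef.h>
#include <strings.h>

/* iso-8859-8-i uses the same byte values as ISO-8859-8; the suffix
 * only says the text is stored in logical order. */
static const char *normalize_charset(const char *name)
{
    if (strcasecmp(name, "iso-8859-8-i") == 0)
        return "ISO-8859-8";
    return name;
}

/* Convert `inlen` bytes in `charset` to UTF-8 in `out`. Returns the
 * number of output bytes, or (size_t)-1 on any error. */
size_t to_utf8(const char *charset, const char *in, size_t inlen,
               char *out, size_t outlen)
{
    iconv_t cd = iconv_open("UTF-8", normalize_charset(charset));
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;      /* POSIX iconv(3) takes char ** */
    char *outp = out;
    size_t inleft = inlen, outleft = outlen;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return outlen - outleft;
}
```

For example, the single ISO-8859-8 byte 0xE0 (HEBREW LETTER ALEF) converts to the two UTF-8 bytes 0xD7 0x90.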

Testing locally, I could now read the email with mutt! But what about replying?
My email setup is unusual in another way, too: I had a hard time setting up a remote POP/IMAP thing, so I ssh to sdf.org and email from there. And I can't change their libc or install.
Hoping to just elide all the corrupted characters and reply with UTF-8 was too optimistic - mutt wanted to reply in the original encoding, and again could not handle it properly.

Well, I'll just put in my updated libc, and LD_PRELOAD it, then!
Except, after ktracing it (via 'ktruss -i mutt |grep esdb'), it turns out that it opens a file in /usr/share/i18n/ to figure out charset aliases.
I'll need to tell it to look elsewhere I can modify.
I edited paths.h, which is where the lookup path is stored, changed the path to my home directory on sdf.org, and then built myself a fresh libc.
(It was during this I realized I could've just edited the email to say it's iso-8859-8, rather than iso-8859-8-i)

A few minor setbacks, and I could finally reply to the email, saying that yes, I will show up to the job interview.

I leave you with this tidbit from the RFC announcing these encodings and that finally, emails in Hebrew are possible:
"Within Israel there are in excess of 40 Listserv lists which will now start using Hebrew for part of their conversations."

Posted at teatime on Sunday, June 10th, 2018 Tags: blog
During the past month I have been working on coverage of various corner cases in the signal subsystem of the kernel. I have also spent some time on improvements in the land of sanitizers. As a mentor, thanks to my full-time focus on NetBSD work, I was able to actively help three Google Summer of Code students. I could not answer every question without reading code first, but at least I am available for active collaboration, especially when it comes to improving code that I have already authored, like sanitizers. At the end of the month we managed to catch two uninitialized memory reads in the top(1) utility, using the Memory Sanitizer feature and rebuilding part of the base system (i.e. library dependencies: libterminfo, libkvm, libutil) with dedicated sanitization flags.

ptrace(2) and related distribution changes

I am actively working on making the handling of processes, forks/vforks, signals and threads reliable and fully functional under a debugger. This is a process, and the situation is actively improving. For the end user, this means that we are approaching the state in which a developer will be able to trace an application like Firefox using modern tools and save time by detecting issues quickly.

I am using the Test-Driven Development approach in my work. I keep extending the Automated Testing Framework with new tests, covering sets of scenarios handled by debuggers and related code. This is followed by kernel fixes. Thanks to the tests, I can more confidently introduce changes to critical routines inside the Operating System, quickly test new changes for regressions and keep covering new verifiable scenarios.

Titles of the merged commits with the main tree of NetBSD:

  • Remove an element from struct emul: e_tracesig.

    e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin compatib[i]lity layer is gone and there are no other users.

  • Refactoring of can_we_set_dbregs() in ATF ptrace(2) tests. Push this auxiliary function to all ports.
  • Add a new ptrace(2) ATF exploit for: CVE-2018-8897 (POP SS debug exception).
  • Correct handling of: vfork(2) + PT_TRACE_ME + raise(2).
  • Add a new ATF ptrace(2) test: traceme_vfork_breakpoint.
  • Improve the description of traceme_vfork_raise in ATF ptrace(2) tests.
  • Add a new ATF ptrace(2) test: traceme_vfork_exec.
  • Improve the description of traceme_vfork_breakpoint (ATF ptrace(2) test).
  • Add extra asserts in three ATF ptrace(2) tests.

    In traceme* tests after validate_status_stopped() include an additional check to verify the received signal with PT_GET_SIGINFO.

  • Correct assert in ATF t_zombie test.
  • Add new ATF tests: t_fork and t_vfork.
  • Stop masking SIGSTOP in a vfork(2)ed child.
  • Stop masking raise(SIGSTOP) in a vfork(2)ed child that called PT_TRACE_ME.
  • Add new auxiliary functions in t_ptrace_wait.h

    New functions:

    • await_stopped_child()
  • Enable traceme_vfork_raise2 in ATF ptrace(2) tests.

    raise(SIGSTOP) is now handled correctly by the kernel, in a child that vfork(2)ed and called PT_TRACE_ME.

  • Cover SIGTSTP, SIGTTIN and SIGTTOU in traceme_vfork_raise ATF tests.
  • Note in vfork(2) that SIGTSTP is masked.
  • Fix and enable traceme_signal_nohandler2 in ATF ptrace(2) tests.
  • Make stopsigmask a non-static symbol now as it's used in ptrace(2) code.
  • Refactor and enable the signal3 ATF ptrace(2) test

    Adapt the test to be independent from the software breakpoint trap behavior, whether the Program Counter is moved or not. Just kill the process after catching the expected signal, instead of pretending to resume it.

  • Add new ATF test: t_trapsignal:trap_ignore.
  • Minor update to signal(7)

    Note that SIGCHLD is not just a child exit signal. Note that SIGIOT is PDP-11 specific signal.

  • Minor improvement in sigaction(2)

    Note that SIGCHLD covers process continued event.

  • Extend ATF tests in t_trapsignal.sh to verify software breakpoint traps.
  • Add new ATF ptrace(2) tests: traceme_sendsignal_{masked,ignored}[1-3].
  • Define PTRACE_BREAKPOINT_ASM for i386 in the MD part of .
  • Refactor the attach[1-8] and race1 ATF t_ptrace_wait* tests.
  • Cherry-pick upstream patch for internal_mmap() in GCC sanitizers.
  • Cherry-pick upstream patch for internal_mmap() in GCC(.old) sanitizers
  • Add new auxiliary functions in ATF ptrace(2) tests


    • trigger_trap()
    • trigger_segv()
    • trigger_ill()
    • trigger_fpe()
    • trigger_bus()
  • Extend traceme_vfork_breakpoint in ATF ptrace(2) tests for more scenarios

    Added tests:

    • traceme_vfork_crash_trap
    • traceme_vfork_crash_segv (renamed from traceme_vfork_breakpoint)
    • traceme_vfork_crash_ill (disabled)
    • traceme_vfork_crash_fpe
    • traceme_vfork_crash_bus
  • Merge the eventmask[1-6] ATF ptrace(2) tests into a shared function body.
  • Introduce can_we_write_to_text() to ATF ptrace(2) tests

    The purpose of this function is to detect whether a tracer can write to the .text section of its tracee.

  • Refactor the PT_WRITE*/PT_READ* and PIOD_* ATF ptrace(2) tests.
  • Handle vm.maxaddress in compat_netbsd32(8).
  • Port the CVE 2018-8897 mitigation to i386 ATF ptrace(2) tests.
  • Fix sysctl(3):vm.minaddress in compat_netbsd32(8).
  • Fix ATF ptrace(2) bytes_transfer_piod_read_auxv test.
  • Handle FPE and BUS scenarios in the ATF t_trapsignal tests.
  • Try to fool $CC harder in ATF ptrace(2) tests in trigger_fpe().
  • Correct reporting SIGTRAP TRAP_EXEC when SIGTRAP is masked.
  • Correct the t_ptrace_wait*:signal5 ATF test case.

    This functionality now works.

  • Add new ATF ptrace(2) tests verifying crash signal handling.
  • Harden PT_ATTACH in ptrace(2).

    Don't allow to PT_ATTACH from a vfork(2)ed child (before exec(3)/_exit(3)) to its parent. Return error with EPERM errno.

    This scenario does not have a purpose and there is no clear picture how to route signals.

  • Simplify comparison of two processes

    No need to check p_pid to compare whether two processes are the same.
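The traceme_* tests revolve around a child that stops itself with raise(2) and a parent that inspects the resulting wait status (validate_status_stopped() in the test suite). As a hedged, ptrace-free analogue that runs on any POSIX system, the hypothetical helper below shows the same fork, raise, waitpid(2) and WIFSTOPPED()/WSTOPSIG() pattern; without a tracer, the WUNTRACED flag is what makes waitpid(2) report the stop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself with raise(sig) and return the stop
 * signal observed by the parent, or -1 on any failure. */
int child_stop_signal(int sig)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        raise(sig);   /* stop ourselves, like the traceme tests do */
        _exit(0);     /* only reached if the child is resumed */
    }
    /* WUNTRACED: also report children that stopped, not only exited. */
    if (waitpid(child, &status, WUNTRACED) != child)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;    /* the child exited or was killed instead */
    int result = WSTOPSIG(status);

    /* Clean up: SIGKILL terminates even a stopped process; reap it. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return result;
}
```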

LLVM compiler-rt features

I have helped the GSoC student prepare for LLVM libFuzzer integration with the NetBSD base system. We managed to get down to the following results for the test targets in the upstream repository:

$ check-fuzzer-default

  Expected Passes    : 105
  Unsupported Tests  : 8
  Unexpected Failures: 2

$ check-fuzzer

  Expected Passes    : 105
  Unsupported Tests  : 8
  Unexpected Failures: 2

$ check-fuzzer-unit

  Expected Passes    : 35

The remaining two failures appear to be false positives, specific to differences between the NetBSD setup and other supported Operating Systems (including Linux). I decided not to investigate them and to move on to more urgent tasks instead.

While there, I have been working on restoring the userland LLVM sanitizers in the upstream repository to a good state, in order to ship them in the NetBSD distribution along with the libFuzzer utility.

A number of patches were merged upstream:

  • LLVM: Register NetBSD/i386 in AddressSanitizer.cpp
  • Clang: Permit -fxray-instrument for NetBSD/amd64
  • Clang: Support XRay in the NetBSD driver
  • compiler-rt: Remove dead sanitizer_procmaps_freebsd.cc
  • compiler-rt: wrong usages of sem_open in the libFuzzer (patch by Yang Zheng, the GSoC student)
  • compiler-rt: Register NetBSD/i386 in asan_mapping.h
  • compiler-rt: Setup ORIGIN/NetBSD option in sanitizer tests
  • compiler-rt: Enable SANITIZER_INTERCEPTOR_HOOKS for NetBSD

There is also at least one pending upstream patch that is worth noting: Introduce CheckASLR() in sanitizers

At least the ASan, MSan, TSan sanitizers require disabled ASLR on a NetBSD.

Introduce a generic CheckASLR() routine, that implements a check for the
current process. This flag depends on the global or per-process settings.

There is no simple way to disable ASLR in the build process from the
level of a sanitizer or during the runtime execution.

With ASLR enabled sanitizers that operate over the process virtual address
space can misbehave usually breaking with cryptic messages.

This check is a no-op (dummy) for !NetBSD.
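A minimal sketch of what such a per-process check could look like in plain C. It assumes NetBSD's per-process sysctl node (`CTL_PROC`, the process ID, `PROC_PID_PAXFLAGS`) and the `CTL_PROC_PAXFLAGS_ASLR` bit from `<sys/sysctl.h>`; verify these names against your headers. The function names `aslr_enabled_for_self()` and `check_aslr()` are hypothetical, and the upstream patch presumably uses the sanitizer runtime's own internal wrappers rather than plain libc calls.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(__NetBSD__)
#include <sys/param.h>
#include <sys/sysctl.h>
#endif

/* Query whether ASLR (a PaX feature on NetBSD) is active for the current
 * process. Returns 1 if enabled, 0 if disabled, -1 if the query failed.
 * On other systems this is a no-op that always reports 0. */
int aslr_enabled_for_self(void) {
#if defined(__NetBSD__)
  int mib[3];
  int paxflags = 0;
  size_t len = sizeof(paxflags);

  mib[0] = CTL_PROC;          /* per-process sysctl tree */
  mib[1] = (int)getpid();     /* this process */
  mib[2] = PROC_PID_PAXFLAGS; /* effective PaX flags (assumed node) */

  if (sysctl(mib, 3, &paxflags, &len, NULL, 0) == -1)
    return -1;
  return (paxflags & CTL_PROC_PAXFLAGS_ASLR) != 0;
#else
  return 0;
#endif
}

/* Stop early with a readable message instead of letting the sanitizer
 * fail later with a cryptic one. */
void check_aslr(void) {
  switch (aslr_enabled_for_self()) {
  case -1:
    fprintf(stderr, "sysctl failed\n");
    exit(EXIT_FAILURE);
  case 1:
    fprintf(stderr, "This sanitizer is not compatible with enabled ASLR\n");
    exit(EXIT_FAILURE);
  default:
    break; /* ASLR disabled, or not NetBSD: nothing to do */
  }
}
```

The design choice is to fail fast at startup: a single sysctl query is cheap, and the explicit message replaces whatever obscure mapping failure the sanitizer would otherwise hit later.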

The current results for the compiler-rt test targets are as follows:

$ make check-builtins

  Expected Passes    : 343
  Expected Failures  : 4
  Unsupported Tests  : 36
  Unexpected Failures: 5

$ check-interception

-- Testing: 0 tests, 0 threads --

$ check-lsan

  Expected Passes    : 6
  Unsupported Tests  : 60
  Unexpected Failures: 106

$ check-ubsan

  Expected Passes    : 229
  Expected Failures  : 1
  Unsupported Tests  : 32
  Unexpected Failures: 2

$ check-cfi

  Unsupported Tests  : 232

$ check-cfi-and-supported

BaseException: Tests unsupported

$ make check-sanitizer

  Expected Passes    : 576
  Expected Failures  : 13
  Unsupported Tests  : 206
  Unexpected Failures: 31

$ check-asan

  Expected Passes    : 852
  Expected Failures  : 4
  Unsupported Tests  : 440
  Unexpected Failures: 16

$ check-asan-dynamic

  Expected Passes    : 394
  Expected Failures  : 3
  Unsupported Tests  : 440
  Unexpected Passes  : 1
  Unexpected Failures: 222

$ check-msan

  Expected Passes    : 102
  Expected Failures  : 1
  Unsupported Tests  : 30
  Unexpected Failures: 4

$ check-tsan

  Expected Passes    : 288
  Expected Failures  : 1
  Unsupported Tests  : 84
  Unexpected Failures: 8

$ check-safestack

  Expected Passes    : 7
  Unsupported Tests  : 1

$ check-scudo

  Expected Passes    : 14
  Unexpected Failures: 28

$ check-ubsan-minimal

  Expected Passes    : 6
  Unsupported Tests  : 2

$ check-profile

  Unsupported Tests  : 116

$ check-xray

  Expected Passes    : 21
  Unsupported Tests  : 1
  Unexpected Failures: 21

$ check-shadowcallstack

  Unsupported Tests  : 4

Sanitization of userland and the kernel

I am helping to set up the process for shipping a NetBSD userland prebuilt with a desired sanitizer. This involves advising the Google Summer of Code student, fixing known issues, reviewing patches, etc.

There were two new uninitialized memory read bugs detected in the top(1) program:

Fix uninitialized signal mask passed to sigaction(2) in top(1)

Detected with Memory Sanitizer during the integration of sanitizers with
the NetBSD base system.

Reported by <Yang Zheng>
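The first fix concerns a `struct sigaction` whose `sa_mask` reached sigaction(2) without being initialized. A minimal sketch of the safe pattern follows; the handler, the signal (`SIGUSR1`) and the function names are hypothetical and not taken from top(1), whose actual patch may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: only records that the signal arrived. */
static void on_signal(int signo) {
  (void)signo;
  got_signal = 1;
}

/* Install a handler with every field of struct sigaction initialized.
 * Without memset/sigemptyset, sa_mask would hold stack garbage, which is
 * the kind of uninitialized read MSan reports. Returns 0 on success,
 * -1 on error (as sigaction(2) does). */
int install_handler(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa)); /* zero all fields, including sa_flags */
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);   /* block no additional signals in the handler */
  sa.sa_flags = 0;
  return sigaction(SIGUSR1, &sa, NULL);
}
```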

Fix read of uninitialized array elements in top(1)

The cp_old array is allocated with malloc(3) and its pointer is passed to percentages64().

This function calculates total_change, whose value depends on the
contents of the uninitialized cp_old[] array.

==26662==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x268a2c in percentages64 /usr/src/external/bsd/top/bin/../dist/machine/m_netbsd.c:1341:6
#1 0x26748b in get_system_info /usr/src/external/bsd/top/bin/../dist/machine/m_netbsd.c:478:6
#2 0x25518e in do_display /usr/src/external/bsd/top/bin/../dist/top.c:507:5
#3 0x253038 in main /usr/src/external/bsd/top/bin/../dist/top.c:975:2
#4 0x21cad1 in ___start (/usr/bin/top+0x1cad1)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /usr/src/external/bsd/top/bin/../dist/machine/m_netbsd.c:1341:6 in percentages64

Fix this issue by replacing malloc(3) with calloc(3).

Detected with Memory Sanitizer during the integration of sanitizers with
the NetBSD base system.

Reported by <Yang Zheng>
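The second fix follows a common pattern: a "previous sample" array is diffed against the current one, so it has to start from a defined value. The sketch below is a simplified, hypothetical version of that calculation, loosely modeled on the percentages64() frame in the trace above; it is not the top(1) source, and the names `compute_total_change()` and `alloc_old_sample()` are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Compute per-state deltas between the current and previous samples and
 * return their sum (total_change). The current sample is then saved as
 * the new baseline. If `old` came from malloc(3) and was never written,
 * every delta, and thus the total, would depend on uninitialized memory. */
uint64_t compute_total_change(size_t cnt, const uint64_t *cur,
                              uint64_t *old, uint64_t *diffs) {
  uint64_t total = 0;
  for (size_t i = 0; i < cnt; i++) {
    diffs[i] = cur[i] - old[i];
    total += diffs[i];
    old[i] = cur[i]; /* baseline for the next refresh */
  }
  return total;
}

/* The fix: calloc(3) zero-fills, so the first refresh is measured
 * against a well-defined baseline of 0 instead of garbage. */
uint64_t *alloc_old_sample(size_t cnt) {
  return calloc(cnt, sizeof(uint64_t));
}
```

With a zeroed baseline the first result is simply the cumulative counters since boot, which is well defined; later refreshes then report true deltas.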

A similar process is under way for two kernel sanitizer GSoC tasks: kernel-ubsan and kernel-asan.

Thanks to my involvement in The NetBSD Foundation tasks, I am available to the students (though not always in every case) for active feedback and collaboration.


The number of ATF ptrace(2) test cases has increased significantly; however, there is still a substantial amount of work to be done and a number of serious bugs to be resolved.

With fixes and the addition of new test cases, as of today we are passing 1,206 (last month: 961) ptrace(2) tests and skipping 1 (out of 1,256 total; last month: 1,018 total). Tests outside the ptrace(2) context are not counted here.

Plan for the next milestone

Cover the remaining elementary crash-signal handling scenarios with regression tests. Fix known bugs in the NetBSD kernel.

Follow up with the remaining fork(2) and vfork(2) scenarios.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:


Posted at teatime on Friday, June 1st, 2018 Tags: blog