File:  [NetBSD Developer Wiki] / wikisrc / guide / tuning.mdwn
Revision 1.2: download - view: text, annotated - select for diffs
Thu Mar 21 11:11:21 2013 UTC (5 years, 3 months ago) by jdf
Branches: MAIN
CVS tags: HEAD
Add TOCs to all chapters.


[[!toc levels=3]]

# Tuning NetBSD

## Introduction

### Overview

This section covers a variety of performance tuning topics. It attempts to span
tuning from the perspective of the system administrator to systems programmer.
The art of performance tuning itself is very old. To tune something means to
make it operate more efficiently, whether one is referring to a NetBSD based
technical server or a vacuum cleaner, the goal is to improve something, whether
that be the way something is done, how it works or how it is put together.

#### What is Performance Tuning?

A view from 10,000 feet pretty much dictates that everything we do is task
oriented, this pertains to a NetBSD system as well. When the system boots, it
automatically begins to perform a variety of tasks. When a user logs in, they
usually have a wide variety of tasks they have to accomplish. In the scope of
these documents, however, performance tuning strictly means to improve how
efficient a NetBSD system performs.

The most common thought that crops into someone's mind when they think "tuning"
is some sort of speed increase or decreasing the size of the kernel - while
those are ways to improve performance, they are not the only ends an
administrator may have to take for increasing efficiency. For our purposes,
performance tuning means this: *To make a NetBSD system operate in an optimum

Which could mean a variety of things, not necessarily speed enhancements. A good
example of this is filesystem formatting parameters, on a system that has a lot
of small files (say like a source repository) an administrator may need to
increase the number of inodes by making their size smaller (say down to 1024k)
and then increasing the amount of inodes. In this case the number of inodes was
increased, however, it keeps the administrator from getting those nasty out of
inodes messages, which ultimately makes the system more efficient.

Tuning normally revolves around finding and eliminating bottlenecks. Most of the
time, such bottlenecks are spurious, for example, a release of Mozilla that does
not quite handle java applets too well can cause Mozilla to start crunching the
CPU, especially applets that are not done well. Occasions when processes seem to
spin off into nowhere and eat CPU are almost always resolved with a kill. There
are instances, however, when resolving bottlenecks takes a lot longer, for
example, say an rsynced server is just getting larger and larger. Slowly,
performance begins to fade and the administrator may have to take some sort of
action to speed things up, however, the situation is relative to say an
emergency like an instantly spiked CPU.

#### When does one tune?

Many NetBSD users rarely have to tune a system. The GENERIC kernel may run just
fine and the layout/configuration of the system may do the job as well. By the
same token, as a pragma it is always good to know how to tune a system. Most
often tuning comes as a result of a sudden bottleneck issue (which may occur
randomly) or a gradual loss of performance. It does happen in a sense to
everyone at some point, one process that is eating the CPU is a bottleneck as
much as a gradual increase in paging. So, the question should not be when to
tune so much as when to learn to tune.

One last time to tune is if you can tune in a preventive manner (and you think
you might need to) then do it. One example of this was on a system that needed
to be able to reboot quickly. Instead of waiting, I did everything I could to
trim the kernel and make sure there was absolutely nothing running that was not
needed, I even removed drivers that did have devices, but were never used (lp).
The result was reducing reboot time by nearly two-thirds. In the long run, it
was a smart move to tune it before it became an issue.

#### What these Documents Will Not Cover

Before I wrap up the introduction, I think it is important to note what these
documents will not cover. This guide will pertain only to the core NetBSD
system. In other words, it will not cover tuning a web server's configuration to
make it run better; however, it might mention how to tune NetBSD to run better
as a web server. The logic behind this is simple: web servers, database
software, etc. are third party and almost limitless. I could easily get mired
down in details that do not apply to the NetBSD system. Almost all third party
software have their own documentation about tuning anyhow.

#### How Examples are Laid Out

Since there is ample man page documentation, only used options and arguments
with examples are discussed. In some cases, material is truncated for brevity
and not thoroughly discussed because, quite simply, there is too much. For
example, every single device driver entry in the kernel will not be discussed,
however, an example of determining whether or not a given system needs one will
be. Nothing in this Guide is concrete, tuning and performance are very
subjective, instead, it is a guide for the reader to learn what some of the
tools available to them can do.

## Tuning Considerations

Tuning a system is not really too difficult when pro-active tuning is the
approach. This document approaches tuning from a *before it comes up* approach.
While tuning in spare time is considerably easier versus say, a server that is
almost completely bogged down to 0.1% idle time, there are still a few things
that should be mulled over about tuning before actually doing it, hopefully,
before a system is even installed.

### General System Configuration

Of course, how the system is setup makes a big difference. Sometimes small items
can be overlooked which may in fact cause some sort of long term performance

#### Filesystems and Disks

How the filesystem is laid out relative to disk drives is very important. On
hardware RAID systems, it is not such a big deal, but, many NetBSD users
specifically use NetBSD on older hardware where hardware RAID simply is not an
option. The idea of `/` being close to the first drive is a good one, but for
example if there are several drives to choose from that will be the first one,
is the best performing the one that `/` will be on? On a related note, is it
wise to split off `/usr`? Will the system see heavy usage say in `/usr/pkgsrc`?
It might make sense to slap a fast drive in and mount it under `/usr/pkgsrc`, or
it might not. Like all things in performance tuning, this is subjective.

#### Swap Configuration

There are three schools of thought on swap size and about fifty on using split
swap files with prioritizing and how that should be done. In the swap size
arena, the vendor schools (at least most commercial ones) usually have their own
formulas per OS. As an example, on a particular version of HP-UX with a
particular version of Oracle the formula was:

2.5 GB \* Number\_of\_processor

Well, that all really depends on what type of usage the database is having and
how large it is, for instance if it is so large that it must be distributed,
that formula does not fit well.

The next school of thought about swap sizing is sort of strange but makes some
sense, it says, if possible, get a reference amount of memory used by the
system. It goes something like this:

 1. Startup a machine and estimate total memory needs by running everything that
    may ever be needed at once. Databases, web servers .... whatever. Total up
	the amount.
 2. Add a few MB for padding.
 3. Subtract the amount of physical RAM from this total.

If the amount leftover is 3 times the size of physical RAM, consider getting
more RAM. The problem, of course, is figuring out what is needed and how much
space it will take. There is also another flaw in this method, some programs do
not behave well. A glaring example of misbehaved software is web browsers. On
certain versions of Netscape, when something went wrong it had a tendency to
runaway and eat swap space. So, the more spare space available, the more time to
kill it.

Last and not least is the tried and true PHYSICAL\_RAM \* 2 method. On modern
machines and even older ones (with limited purpose of course) this seems to work

All in all, it is hard to tell when swapping will start. Even on small 16MB RAM
machines (and less) NetBSD has always worked well for most people until
misbehaving software is running.

### System Services

On servers, system services have a large impact. Getting them to run at their
best almost always requires some sort of network level change or a fundamental
speed increase in the underlying system (which of course is what this is all
about). There are instances when some simple solutions can improve services. One
example, an ftp server is becoming slower and a new release of the ftp server
that is shipped with the system comes out that, just happens to run faster. By
upgrading the ftp software, a performance boost is accomplished.

Another good example where services are concerned is the age old question: *To
use inetd or not to use inetd?* A great service example is pop3. Pop3
connections can conceivably clog up inetd. While the pop3 service itself starts
to degrade slowly, other services that are multiplexed through inetd will also
degrade (in some case more than pop3). Setting up pop3 to run outside of inetd
and on its own may help.

### The NetBSD Kernel

The NetBSD kernel obviously plays a key role in how well a system performs,
while rebuilding and tuning the kernel is covered later in the text, it is worth
discussing in the local context from a high level.

Tuning the NetBSD kernel really involves three main areas:

 1. removing unrequired drivers
 2. configuring options
 3. system settings

#### Removing Unrequired Drivers

Taking drivers out of the kernel that are not needed achieves several results;
first, the system boots faster since the kernel is smaller, second again since
the kernel is smaller, more memory is free to users and processes and third, the
kernel tends to respond quicker.

#### Configuring Options

Configuring options such as enabling/disabling certain subsystems, specific
hardware and filesystems can also improve performance pretty much the same way
removing unrequired drivers does. A very simple example of this is a FTP server
that only hosts ftp files - nothing else. On this particular server there is no
need to have anything but native filesystem support and perhaps a few options to
help speed things along. Why would it ever need NTFS support for example?
Besides, if it did, support for NTFS could be added at some later time. In an
opposite case, a workstation may need to support a lot of different filesystem
types to share and access files.

#### System Settings

System wide settings are controlled by the kernel, a few examples are filesystem
settings, network settings and core kernel settings such as the maximum number
of processes. Almost all system settings can be at least looked at or modified
via the sysctl facility. Examples using the sysctl facility are given later on.

## Visual Monitoring Tools

NetBSD ships a variety of performance monitoring tools with the system. Most of
these tools are common on all UNIX systems. In this section some example usage
of the tools is given with interpretation of the output.

### The top Process Monitor

The [top(1)](
monitor does exactly what it says: it displays the CPU hogs on the
system. To run the monitor, simply type top at the prompt. Without any
arguments, it should look like:

    load averages:  0.09,  0.12,  0.08                                     20:23:41
    21 processes:  20 sleeping, 1 on processor
    CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Memory: 15M Act, 1104K Inact, 208K Wired, 22M Free, 129M Swap free
    13663 root       2    0  1552K 1836K sleep     0:08  0.00%  0.00% httpd
      127 root      10    0   129M 4464K sleep     0:01  0.00%  0.00% mount_mfs
    22591 root       2    0   388K 1156K sleep     0:01  0.00%  0.00% sshd
      108 root       2    0   132K  472K sleep     0:01  0.00%  0.00% syslogd
    22597 jrf       28    0   156K  616K onproc    0:00  0.00%  0.00% top
    22592 jrf       18    0   828K 1128K sleep     0:00  0.00%  0.00% tcsh
      203 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      205 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      206 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      208 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      207 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
    13667 nobody     2    0  1660K 1508K sleep     0:00  0.00%  0.00% httpd
     9926 root       2    0   336K  588K sleep     0:00  0.00%  0.00% sshd
      200 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd
      182 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
      180 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
    13666 nobody    -4    0  1600K 1260K sleep     0:00  0.00%  0.00% httpd

The top(1) utility is great for finding CPU hogs, runaway processes or groups of
processes that may be causing problems. The output shown above indicates that
this particular system is in good health. Now, the next display should show some
very different results:

    load averages:  0.34,  0.16,  0.13                                     21:13:47
    25 processes:  24 sleeping, 1 on processor
    CPU states:  0.5% user,  0.0% nice,  9.0% system,  1.0% interrupt, 89.6% idle
    Memory: 20M Act, 1712K Inact, 240K Wired, 30M Free, 129M Swap free
     5304 jrf       -5    0    56K  336K sleep     0:04 66.07% 19.53% bonnie
     5294 root       2    0   412K 1176K sleep     0:02  1.01%  0.93% sshd
      108 root       2    0   132K  472K sleep     1:23  0.00%  0.00% syslogd
      187 root       2    0  1552K 1824K sleep     0:07  0.00%  0.00% httpd
     5288 root       2    0   412K 1176K sleep     0:02  0.00%  0.00% sshd
     5302 jrf       28    0   160K  620K onproc    0:00  0.00%  0.00% top
     5295 jrf       18    0   828K 1116K sleep     0:00  0.00%  0.00% tcsh
     5289 jrf       18    0   828K 1112K sleep     0:00  0.00%  0.00% tcsh
      127 root      10    0   129M 8388K sleep     0:00  0.00%  0.00% mount_mfs
      204 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      208 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      210 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      209 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      211 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      217 nobody     2    0  1616K 1272K sleep     0:00  0.00%  0.00% httpd
      184 root       2    0   336K  580K sleep     0:00  0.00%  0.00% sshd
      201 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd

At first, it should seem rather obvious which process is hogging the system,
however, what is interesting in this case is why. The bonnie program is a disk
benchmark tool which can write large files in a variety of sizes and ways. What
the previous output indicates is only that the bonnie program is a CPU hog, but
not why.

#### Other Neat Things About Top

A careful examination of the manual page
[top(1)]( shows
that there is a lot more that can be done with top, for example, processes can
have their priority changed and killed. Additionally, filters can be set for
looking at processes.

### The sysstat utility

As the man page
indicates, the sysstat utility shows a variety of system statistics using the
curses library. While it is running the screen is shown in two parts, the upper
window shows the current load average while the lower screen depends on user
commands. The exception to the split window view is when vmstat display is on
which takes up the whole screen. Following is what sysstat looks like on a
fairly idle system with no arguments given when it was invoked:

                       /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |
                             /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100

Basically a lot of dead time there, so now have a look with some arguments
provided, in this case, `sysstat inet.tcp` which looks like this:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |
            0 connections initiated           19 total TCP packets sent
            0 connections accepted            11   data
            0 connections established          0   data (retransmit)
                                               8   ack-only
            0 connections dropped              0   window probes
            0   in embryonic state             0   window updates
            0   on retransmit timeout          0   urgent data only
            0   by keepalive                   0   control
            0   by persist
                                              29 total TCP packets received
           11 potential rtt updates           17   in sequence
           11 successful rtt updates           0   completely duplicate
            9 delayed acks sent                0   with some duplicate data
            0 retransmit timeouts              4   out of order
            0 persist timeouts                 0   duplicate acks
            0 keepalive probes                11   acks
            0 keepalive timeouts               0   window probes
                                               0   window updates

Now that is informative. The first poll is accumulative, so it is possible to
see quite a lot of information in the output when sysstat is invoked. Now, while
that may be interesting, how about a look at the buffer cache with `sysstat

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average
    There are 1642 buffers using 6568 kBytes of memory.
    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          877  53        6171  93        6516  99      94
    /var/tmp                     5   0          17   0          28   0      60
    Total:                     882  53        6188  94        6544  99

Again, a pretty boring system, but great information to have available. While
this is all nice to look at, it is time to put a false load on the system to see
how sysstat can be used as a performance monitoring tool. As with top, bonnie++
will be used to put a high load on the I/O subsystems and a little on the CPU.
The bufcache will be looked at again to see of there are any noticeable

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |||
    There are 1642 buffers using 6568 kBytes of memory.
    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          811  49        6422  97        6444  98      99
    Total:                     811  49        6422  97        6444  98

First, notice that the load average shot up, this is to be expected of course,
then, while most of the numbers are close, notice that utilization is at 99%.
Throughout the time that bonnie++ was running the utilization percentage
remained at 99, this of course makes sense, however, in a real troubleshooting
situation, it could be indicative of a process doing heavy I/O on one particular
file or filesystem.

## Monitoring Tools

In addition to screen oriented monitors and tools, the NetBSD system also ships
with a set of command line oriented tools. Many of the tools that ship with a
NetBSD system can be found on other UNIX and UNIX-like systems.

### fstat

The [fstat(1)](
utility reports the status of open files on the system, while it is not what
many administrators consider a performance monitor, it can help find out if a
particular user or process is using an inordinate amount of files, generating
large files and similar information.

Following is a sample of some fstat output:

    USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
    jrf      tcsh       21607   wd /         29772 drwxr-xr-x     512 r
    jrf      tcsh       21607    3* unix stream c057acc0<-> c0553280
    jrf      tcsh       21607    4* unix stream c0553280 <-> c057acc0
    root     sshd       21597   wd /             2 drwxr-xr-x     512 r
    root     sshd       21597    0 /         11921 crw-rw-rw-    null rw
    nobody   httpd       5032   wd /             2 drwxr-xr-x     512 r
    nobody   httpd       5032    0 /         11921 crw-rw-rw-    null r
    nobody   httpd       5032    1 /         11921 crw-rw-rw-    null w
    nobody   httpd       5032    2 /         15890 -rw-r--r--  353533 rw

The fields are pretty self explanatory, again, this tool while not as
performance oriented as others, can come in handy when trying to find out
information about file usage.

### iostat

The [iostat(8)](
command does exactly what it sounds like, it reports the status of the I/O
subsystems on the system. When iostat is employed, the user typically runs it
with a certain number of counts and an interval between them like so:

    $ iostat 5 5
          tty            wd0             cd0             fd0             md0             cpu
     tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
       0    1  5.13   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   54  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   18  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   18  8.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   28  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100

The above output is from a very quiet ftp server. The fields represent the
various I/O devices, the tty (which, ironically, is the most active because
iostat is running), wd0 which is the primary IDE disk, cd0, the cdrom drive,
fd0, the floppy and the memory filesystem.

Now, let's see if we can pummel the system with some heavy usage. First, a large
ftp transaction consisting of a tarball of netbsd-current source along with the
`bonnie++` disk benchmark program running at the same time.

    $ iostat 5 5
          tty            wd0             cd0             fd0             md0             cpu
     tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
       0    1  5.68   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   54 61.03 150 8.92   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 18  4 78
       0   26 63.14 157 9.71   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 20  4 75
       0   20 43.58  26 1.12   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  9  2 88
       0   28 19.49  82 1.55   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0  7  3 89

As can be expected, notice that wd0 is very active, what is interesting about
this output is how the processor's I/O seems to rise in proportion to wd0. This
makes perfect sense, however, it is worth noting that only because this ftp
server is hardly being used can that be observed. If, for example, the cpu I/O
subsystem was already under a moderate load and the disk subsystem was under the
same load as it is now, it could appear that the cpu is bottlenecked when in
fact it would have been the disk. In such a case, we can observe that *one tool*
is rarely enough to completely analyze a problem. A quick glance at processes
probably would tell us (after watching iostat) which processes were causing

### ps

Using the [ps(1)](
command or process status, a great deal of information about the system can be
discovered. Most of the time, the ps command is used to isolate a particular
process by name, group, owner etc. Invoked with no options or arguments, ps
simply prints out information about the user executing it.

    $ ps
    21560 p0 Is   0:00.04 -tcsh
    21564 p0 I+   0:00.37 ssh
    21598 p1 Ss   0:00.12 -tcsh
    21673 p1 R+   0:00.00 ps
    21638 p2 Is+  0:00.06 -tcsh

Not very exciting. The fields are self explanatory with the exception of `STAT`
which is actually the state a process is in. The flags are all documented in the
man page, however, in the above example, `I` is idle, `S` is sleeping, `R` is
runnable, the `+` means the process is in a foreground state, and the s means
the process is a session leader. This all makes perfect sense when looking at
the flags, for example, PID 21560 is a shell, it is idle and (as would be
expected) the shell is the process leader.

In most cases, someone is looking for something very specific in the process
listing. As an example, looking at all processes is specified with `-a`, to see
all processes plus those without controlling terminals is `-ax` and to get a
much more verbose listing (basically everything plus information about the
impact processes are having) aux:

    # ps aux
    root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
    root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
    root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
    jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
    root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
    jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh
    jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
    root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
    nobody  5032  0.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd

Again, most of the fields are self explanatory with the exception of `VSZ` and
`RSS` which can be a little confusing. `RSS` is the real size of a process in
1024 byte units while `VSZ` is the virtual size. This is all great, but again,
how can ps help? Well, for one, take a look at this modified version of the same

    # ps aux
    root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
    root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
    root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
    jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
    root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
    jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh
    jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
    root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
    nobody  5032  9.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd

Given that on this server, our baseline indicates a relatively quiet system, the
PID 5032 has an unusually large amount of `%CPU`. Sometimes this can also cause
high `TIME` numbers. The ps command can be grepped on for PIDs, username and
process name and hence help track down processes that may be experiencing

### vmstat

information pertaining to virtual memory can be monitored and measured. Not
unlike iostat, vmstat can be invoked with a count and interval. Following is
some sample output using `5 5` like the iostat example:

    # vmstat 5 5
     procs   memory     page                       disks         faults      cpu
     r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
     0 7 0 17716 33160    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
     0 7 0 17724 33156    2   0   0    0    0    0  1  0  0  0  109    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  1  0  0  0  105    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  107    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  105    6   3  0  0 100

Yet again, relatively quiet, for posterity, the exact same load that was put on
this server in the iostat example will be used. The load is a large file
transfer and the bonnie benchmark program.

    # vmstat 5 5
     procs   memory     page                       disks         faults      cpu
     r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
     1 8 0 18880 31968    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
     0 8 0 18888 31964    2   0   0    0    0    0 130  0  0  0 1804 5539 1094 31 22 47
     1 7 0 18888 31964    1   0   0    0    0    0 130  0  0  0 1802 5500 1060 36 16 49
     1 8 0 18888 31964    1   0   0    0    0    0 160  0  0  0 1849 5905 1107 21 22 57
     1 7 0 18888 31964    1   0   0    0    0    0 175  0  0  0 1893 6167 1082  1 25 75

Just a little different. Notice, since most of the work was I/O based, the
actual memory used was not very much. Since this system uses mfs for `/tmp`,
however, it can certainly get beat up. Have a look at this:

    # vmstat 5 5
     procs   memory     page                       disks         faults      cpu
     r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
     0 2 0 99188   500    2   0   0    0    0    0  1  0  0  0  105   16   4  0  0 100
     0 2 0111596   436  592   0 587  624  586 1210 624  0  0  0  741  883 1088  0 11 89
     0 3 0123976   784  666   0 662  643  683 1326 702  0  0  0  828  993 1237  0 12 88
     0 2 0134692  1236  581   0 571  563  595 1158 599  0  0  0  722  863 1066  0  9 90
     2 0 0142860   912  433   0 406  403  405  808 429  0  0  0  552  602 768  0  7 93

Pretty scary stuff. That was created by running bonnie in `/tmp` on a memory
based filesystem. If it continued for too long, it is possible the system could
have started thrashing. Notice that even though the VM subsystem was taking a
beating, the processors still were not getting too battered.

## Network Tools

Sometimes a performance problem is not a particular machine, it is the network
or some sort of device on the network such as another host, a router etc. What
other machines that provide a service or some sort of connectivity to a
particular NetBSD system do and how they act can have a very large impact on
performance of the NetBSD system itself, or the perception of performance by
users. A really great example of this is when a DNS server that a NetBSD machine
is using suddenly disappears. Lookups take long and they eventually fail.
Someone logged into the NetBSD machine who is not experienced would undoubtedly
(provided they had no other evidence) blame the NetBSD system. One of my
personal favorites, *the Internet is broke*, usually means either DNS service or
a router/gateway has dropped offline. Whatever the case may be, a NetBSD system
comes adequately armed to deal with finding out what network issues may be
cropping up whether the fault of the local system or some other issue.

### ping

The classic
[ping(8)]( utility
can tell us if there is just plain connectivity, it can also tell if host
resolution (depending on how `nsswitch.conf` dictates) is working. Following is
some typical ping output on a local network with a count of 3 specified:

    # ping -c 3 marie
    PING marie ( 56 data bytes
    64 bytes from icmp_seq=0 ttl=255 time=0.571 ms
    64 bytes from icmp_seq=1 ttl=255 time=0.361 ms
    64 bytes from icmp_seq=2 ttl=255 time=0.371 ms
    ----marie PING Statistics----
    3 packets transmitted, 3 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.361/0.434/0.571/0.118 ms

Not only does ping tell us if a host is alive, it tells us how long it took and
gives some nice details at the very end. If a host cannot be resolved, just the
IP address can be specified as well:

    # ping -c 1
    PING ash ( 56 data bytes
    64 bytes from icmp_seq=0 ttl=64 time=0.452 ms
    ----ash PING Statistics----
    1 packets transmitted, 1 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.452/0.452/0.452/0.000 ms

Now, not unlike any other tool, the times are very subjective, especially in
regards to networking. For example, while the times in the examples are good,
take a look at the localhost ping:

    # ping -c 4 localhost
    PING localhost ( 56 data bytes
    64 bytes from icmp_seq=0 ttl=255 time=0.091 ms
    64 bytes from icmp_seq=1 ttl=255 time=0.129 ms
    64 bytes from icmp_seq=2 ttl=255 time=0.120 ms
    64 bytes from icmp_seq=3 ttl=255 time=0.122 ms
    ----localhost PING Statistics----
    4 packets transmitted, 4 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.091/0.115/0.129/0.017 ms

Much smaller because the request never left the machine. Pings can be used to
gather information about how well a network is performing. It is also good for
problem isolation, for instance, if there are three relatively close in size
NetBSD systems on a network and one of them simply has horrible ping times,
chances are something is wrong on that one particular machine.

### traceroute

command is great for making sure a path is available or detecting problems on a
particular path. As an example, here is a trace between the example ftp server

    # traceroute
    traceroute to (, 30 hops max, 40 byte packets
     1 (  1.646 ms  1.492 ms  1.456 ms
     2 (  7.318 ms  3.249 ms  3.854 ms
     3 (  35.982 ms  28.667 ms  21.971 ms
     4 (  22.607 ms  26.242 ms  19.631 ms
     5 (  78.586 ms  70.585 ms  84.779 ms
     6 (  69.222 ms  85.739 ms  75.979 ms
     7 (  83.882 ms  67.739 ms  69.937 ms
     8 (  72.782 ms  67.687 ms  73.320 ms
     9 (  78.007 ms  81.860 ms  77.069 ms
    10 (  70.808 ms  75.151 ms  81.485 ms
    11 (  69.700 ms  69.528 ms  77.788 ms

All in all, not bad. The trace went from the host to the local router, then out
onto the provider network and finally out onto the Internet looking for the
final destination. How to interpret traceroutes, again, are subjective, but
abnormally high times in portions of a path can indicate a bottleneck on a piece
of network equipment. Not unlike ping, if the host itself is suspect, run
traceroute from another host to the same destination. Now, for the worst case

    # traceroute
    traceroute: Warning: has multiple addresses; using
    traceroute to (, 30 hops max, 40 byte packets
     1 (  2.517 ms  4.922 ms  5.987 ms
     2 (  10.981 ms  3.374 ms  3.249 ms
     3 (  37.810 ms  37.505 ms  20.795 ms
     4 (  36.987 ms  32.320 ms  22.430 ms
     5 (  33.155 ms  32.859 ms  33.462 ms
     6 (  39.265 ms  20.482 ms  26.084 ms
     7 (  26.681 ms  24.000 ms  28.975 ms
     8 (  65.329 ms  69.694 ms  76.704 ms
     9 (  65.659 ms  66.797 ms  74.408 ms
    10 (  104.657 ms  89.958 ms  91.754 ms
    11 (  89.197 ms  84.527 ms  81.629 ms
    12 (  78.090 ms  91.550 ms  89.480 ms
    13  * * *

In this case, the Microsoft server cannot be found either because of multiple
addresses or somewhere along the line a system or server cannot reply to the
information request. At that point, one might think to try ping, in the
Microsoft case, a ping does not reply, that is because somewhere on their
network ICMP is most likely disabled.

### netstat

Another problem that can crop up on a NetBSD system is routing table issues.
These issues are not always the systems fault. The
[route(8)]( and
commands can show information about routes and network connections

The route command can be used to look at and modify routing tables while netstat
can display information about network connections and routes. First, here is
some output from `route show`:

    # route show
    Routing tables
    Destination      Gateway            Flags
    default        UG
    loopback          UG
    localhost          UH       UG       link#2             U      0:80:d3:cc:2c:0    UH     link#2             UH
    marie            0:10:83:f9:6f:2c   UH     0:5:32:8f:d2:35    UH     link#2             UH
    loghost          8:0:20:a7:f0:75    UH
    artemus          8:0:20:a8:d:7e     UH
    ash              0:b0:d0:de:49:df   UH      link#1             U      0:4:27:3:94:20     UH      0:5:32:8f:d2:34    UH     0:c0:4f:10:79:92   UH
    Destination      Gateway            Flags
    default          localhost          UG
    default          localhost          UG
    localhost        localhost          UH
    ::      localhost          UG
    ::      localhost          UG
    ::      localhost          UG
    ::ffff:   localhost          UG
    2002::           localhost          UG
    2002:7f00::      localhost          UG
    2002:e000::      localhost          UG
    2002:ff00::      localhost          UG
    fe80::           localhost          UG
    fe80::%ex0       link#1             U
    fe80::%ex1       link#2             U
    fe80::%lo0       fe80::1%lo0        U
    fec0::           localhost          UG
    ff01::           localhost          U
    ff02::%ex0       link#1             U
    ff02::%ex1       link#2             U
    ff02::%lo0       fe80::1%lo0        U

The flags section shows the status and whether or not it is a gateway. In this
case we see `U`, `H` and `G` (`U` is up, `H` is host and `G` is gateway, see
the man page for additional flags).

Now for some netstat output using the `-r` (routing) and `-n` (show network
numbers) options:

    Routing tables
    Destination        Gateway            Flags     Refs     Use    Mtu  Interface
    default          UGS         0   330309   1500  ex0
    127                UGRS        0        0  33228  lo0          UH          1     1624  33228  lo0
    172.15.13/24       UGS         0        0   1500  ex1
    172.16             link#2             UC         13        0   1500  ex1
    Destination                   Gateway                   Flags     Refs     Use
      Mtu  Interface
    ::/104                        ::1                       UGRS        0        0
    33228  lo0 =>
    ::/96                         ::1                       UGRS        0        0

The above output is a little more verbose. So, how can this help? Well, a good
example is when routes between networks get changed while users are connected. I
saw this happen several times when someone was rebooting routers all day long
after each change. Several users called up saying they were getting kicked out
and it was taking very long to log back in. As it turned out, the clients
connecting to the system were redirected to another router (which took a very
long route) to reconnect. I observed the `M` flag or Modified dynamically (by
redirect) on their connections. I deleted the routes, had them reconnect and
summarily followed up with the offending technician.

### tcpdump

Last, and definitely not least is
the network sniffer that can retrieve a lot of information. In this discussion,
there will be some sample output and an explanation of some of the more useful
options of tcpdump.

Following is a small snippet of tcpdump in action just as it starts:

    # tcpdump
    tcpdump: listening on ex0
    14:07:29.920651 mail.ssh > P 2951836801:2951836845(44) ack 2
    476972923 win 17520 <nop,nop,timestamp 1219259 128519450> [tos 0x10]
    14:07:29.950594 > ESP(spi=2548773187,seq=0x3e8c) (DF)
    14:07:29.983117 > . ack 420285166 win
    16500 (DF)
    14:07:29.984406 > . 1:1376(1375) ack 0
     win 7431 (DF)

Given that the particular server is a mail server, what is shown makes perfect
sense, however, the utility is very verbose, I prefer to initially run tcpdump
with no options and send the text output into a file for later digestion like

    # tcpdump > tcpdump.out
    tcpdump: listening on ex0

So, what precisely in the mish mosh are we looking for? In short, anything that
does not seem to fit, for example, messed up packet lengths (as in a lot of
them) will show up as improper lens or malformed packets (basically garbage).
If, however, we are looking for something specific, tcpdump may be able to help
depending on the problem.

#### Specific tcpdump Usage

These are just examples of a few things one can do with tcpdump.

Look for duplicate IP addresses:

    tcpdump -e host ip-address

For example:

    tcpdump -e host

Routing Problems:

    tcpdump icmp

There are plenty of third party tools available, however, NetBSD comes shipped
with a good tool set for tracking down network level performance problems.

## Accounting

The NetBSD system comes equipped with a great deal of performance monitors for
active monitoring, but what about long term monitoring? Well, of course the
output of a variety of commands can be sent to files and re-parsed later with a
meaningful shell script or program. NetBSD does, by default, offer some
extraordinarily powerful low level monitoring tools for the programmer,
administrator or really astute hobbyist.

### Accounting

While accounting gives system usage at an almost userland level, kernel
profiling with gprof provides explicit system call usage.

Using the accounting tools can help figure out what possible performance
problems may be laying in wait, such as increased usage of compilers or network
services for example.

Starting accounting is actually fairly simple, as root, use the
command. The syntax to start accounting is: `accton filename`

Where accounting information is appended to filename, now, strangely enough, the
lastcomm command which reads from an accounting output file, by default, looks
in `/var/account/acct` so I tend to just use the default location, however,
lastcomm can be told to look elsewhere.

To stop accounting, simply type accton with no arguments.

### Reading Accounting Information

To read accounting information, there are two tools that can be used:

 * [lastcomm(1)](
 * [sa(8)](

#### lastcomm

The lastcomm command shows the last commands executed in order, like all of
them. It can, however, select by user, here is some sample output:

    $ lastcomm jrf
    last       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:39 (0:00:00.02)
    man        -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
    sh         -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
    less       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
    lastcomm   -       jrf      ttyp3      0.02 secs Tue Sep  3 14:38 (0:00:00.02)
    stty       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
    tset       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:01.05)
    hostname   -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
    ls         -       jrf      ttyp0      0.00 secs Tue Sep  3 14:36 (0:00:00.00)

Pretty nice, the lastcomm command gets its information from the default location
of /var/account/acct, however, using the -f option, another file may be

As may seem obvious, the output of lastcomm could get a little heavy on large
multi user systems. That is where sa comes into play.

#### sa

The sa command (meaning "print system accounting statistics") can be used to
maintain information. It can also be used interactively to create reports.
Following is the default output of sa:

    $ sa
          77       18.62re        0.02cp        8avio        0k
           3        4.27re        0.01cp       45avio        0k   ispell
           2        0.68re        0.00cp       33avio        0k   mutt
           2        1.09re        0.00cp       23avio        0k   vi
          10        0.61re        0.00cp        7avio        0k   ***other
           2        0.01re        0.00cp       29avio        0k   exim
           4        0.00re        0.00cp        8avio        0k   lastcomm
           2        0.00re        0.00cp        3avio        0k   atrun
           3        0.03re        0.00cp        1avio        0k   cron*
           5        0.02re        0.00cp       10avio        0k   exim*
          10        3.98re        0.00cp        2avio        0k   less
          11        0.00re        0.00cp        0avio        0k   ls
           9        3.95re        0.00cp       12avio        0k   man
           2        0.00re        0.00cp        4avio        0k   sa
          12        3.97re        0.00cp        1avio        0k   sh

From left to right, total times called, real time in minutes, sum of user and
system time, in minutes, Average number of I/O operations per execution, size,
command name.

The sa command can also be used to create summary files or reports based on some
options, for example, here is the output when specifying a sort by CPU-time
average memory usage:

    $ sa -k
          86       30.81re        0.02cp        8avio        0k
          10        0.61re        0.00cp        7avio        0k   ***other
           2        0.00re        0.00cp        3avio        0k   atrun
           3        0.03re        0.00cp        1avio        0k   cron*
           2        0.01re        0.00cp       29avio        0k   exim
           5        0.02re        0.00cp       10avio        0k   exim*
           3        4.27re        0.01cp       45avio        0k   ispell
           4        0.00re        0.00cp        8avio        0k   lastcomm
          12        8.04re        0.00cp        2avio        0k   less
          13        0.00re        0.00cp        0avio        0k   ls
          11        8.01re        0.00cp       12avio        0k   man
           2        0.68re        0.00cp       33avio        0k   mutt
           3        0.00re        0.00cp        4avio        0k   sa
          14        8.03re        0.00cp        1avio        0k   sh
           2        1.09re        0.00cp       23avio        0k   vi

The sa command is very helpful on larger systems.

### How to Put Accounting to Use

Accounting reports, as was mentioned earlier, offer a way to help predict
trends, for example, on a system that has cc and make being used more and more
may indicate that in a few months some changes will need to be made to keep the
system running at an optimum level. Another good example is web server usage. If
it begins to gradually increase, again, some sort of action may need to be taken
before it becomes a problem. Luckily, with accounting tools, said actions can be
reasonably predicted and planned for ahead of time.

## Kernel Profiling

Profiling a kernel is normally employed when the goal is to compare the
difference of new changes in the kernel to a previous one or to track down some
sort of low level performance problem. Two sets of data about profiled code
behavior are recorded independently: function call frequency and time spent in
each function.

### Getting Started

First, take a look at both [[Kernel Tuning|guide/tuning#kernel]] and [[Compiling
the kernel|guide/kernel]]. The only difference in procedure for setting up a
kernel with profiling enabled is when you run config add the `-p` option. The
build area is `../compile/<KERNEL_NAME>.PROF` , for example, a GENERIC kernel
would be `../compile/GENERIC.PROF`.

Following is a quick summary of how to compile a kernel with profiling enabled
on the i386 port, the assumptions are that the appropriate sources are available
under `/usr/src` and the GENERIC configuration is being used, of course, that
may not always be the situation:

 1. **`cd /usr/src/sys/arch/i386/conf`**
 2. **`config -p GENERIC`**
 3. **`cd ../compile/GENERIC.PROF`**
 4. **`make depend && make`**
 5. **`cp /netbsd /netbsd.old`**
 6. **`cp netbsd /`**
 7. **`reboot`**

Once the new kernel is in place and the system has rebooted, it is time to turn
on the monitoring and start looking at results.

#### Using kgmon

To start kgmon:

    $ kgmon -b
    kgmon: kernel profiling is running.

Next, send the data into the file `gmon.out`:

    $ kgmon -p

Now, it is time to make the output readable:

    $ gprof /netbsd > gprof.out

Since gmon is looking for `gmon.out`, it should find it in the current working

By just running kgmon alone, you may not get the information you need, however,
if you are comparing the differences between two different kernels, then a known
good baseline should be used. Note that it is generally a good idea to  stress
the subsystem if you know what it is both in the baseline and with the newer (or
different) kernel.

### Interpretation of kgmon Output

Now that kgmon can run, collect and parse information, it is time to actually
look at some of that information. In this particular instance, a GENERIC kernel
is running with profiling enabled for about an hour with only system processes
and no adverse load, in the fault insertion section, the example will be large
enough that even under a minimal load detection of the problem should be easy.

#### Flat Profile

The flat profile is a list of functions, the number of times they were called
and how long it took (in seconds). Following is sample output from the quiet

    Flat profile:
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
     99.77    163.87   163.87                             idle
      0.03    163.92     0.05      219 228310.50 228354.34  _wdc_ata_bio_start
      0.02    163.96     0.04      219 182648.40 391184.96  wdc_ata_bio_intr
      0.01    163.98     0.02     3412  5861.66  6463.02  pmap_enter
      0.01    164.00     0.02      548 36496.35 36496.35  pmap_zero_page
      0.01    164.02     0.02                             Xspllower
      0.01    164.03     0.01   481968    20.75    20.75  gettick
      0.01    164.04     0.01     6695  1493.65  1493.65  VOP_LOCK
      0.01    164.05     0.01     3251  3075.98 21013.45  syscall_plain

As expected, idle was the highest in percentage, however, there were still some
things going on, for example, a little further down there is the `vn\_lock`

      0.00    164.14     0.00     6711     0.00     0.00  VOP_UNLOCK
      0.00    164.14     0.00     6677     0.00  1493.65  vn_lock
      0.00    164.14     0.00     6441     0.00     0.00  genfs_unlock

This is to be expected, since locking still has to take place, regardless.

#### Call Graph Profile

The call graph is an augmented version of the flat profile showing subsequent
calls from the listed functions. First, here is some sample output:

                         Call graph (explanation follows)
    granularity: each sample hit covers 4 byte(s) for 0.01% of 164.14 seconds
    index % time    self  children    called     name
    [1]     99.8  163.87    0.00                 idle [1]
    [2]      0.1    0.01    0.08                 syscall1 [2]
                    0.01    0.06    3251/3251        syscall_plain [7]
                    0.00    0.01     414/1660        trap [9]
                    0.00    0.09     219/219         Xintr14 [6]
    [3]      0.1    0.00    0.09     219         pciide_compat_intr [3]
                    0.00    0.09     219/219         wdcintr [5]

Now this can be a little confusing. The index number is mapped to from the
trailing number on the end of the line, for example,

                    0.00    0.01      85/85          dofilewrite [68]
    [72]     0.0    0.00    0.01      85         soo_write [72]
                    0.00    0.01      85/89          sosend [71]

Here we see that dofilewrite was called first, now we can look at the index
number for 64 and see what was happening there:

                    0.00    0.01     101/103         ffs_full_fsync <cycle 6> [58]
    [64]     0.0    0.00    0.01     103         bawrite [64]
                    0.00    0.01     103/105         VOP_BWRITE [60]

And so on, in this way, a "visual trace" can be established.

At the end of the call graph right after the terms section is an index by
function name which can help map indexes as well.

### Putting it to Use

In this example, I have modified an area of the kernel I know will create a problem that will be blatantly obvious.

Here is the top portion of the flat profile after running the system for about an hour with little interaction from users:

    Flat profile:
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  us/call  us/call  name
     93.97    139.13   139.13                             idle
      5.87    147.82     8.69       23 377826.09 377842.52  check_exec
      0.01    147.84     0.02      243    82.30    82.30  pmap_copy_page
      0.01    147.86     0.02      131   152.67   152.67  _wdc_ata_bio_start
      0.01    147.88     0.02      131   152.67   271.85  wdc_ata_bio_intr
      0.01    147.89     0.01     4428     2.26     2.66  uvn_findpage
      0.01    147.90     0.01     4145     2.41     2.41  uvm_pageactivate
      0.01    147.91     0.01     2473     4.04  3532.40  syscall_plain
      0.01    147.92     0.01     1717     5.82     5.82  i486_copyout
      0.01    147.93     0.01     1430     6.99    56.52  uvm_fault
      0.01    147.94     0.01     1309     7.64     7.64  pool_get
      0.01    147.95     0.01      673    14.86    38.43  genfs_getpages
      0.01    147.96     0.01      498    20.08    20.08  pmap_zero_page
      0.01    147.97     0.01      219    45.66    46.28  uvm_unmap_remove
      0.01    147.98     0.01      111    90.09    90.09  selscan

As is obvious, there is a large difference in performance. Right off the bat the
idle time is noticeably less. The main difference here is that one particular
function has a large time across the board with very few calls. That function is
`check_exec`. While at first, this may not seem strange if a lot of commands
had been executed, when compared to the flat profile of the first measurement,
proportionally it does not seem right:

      0.00    164.14     0.00       37     0.00 62747.49  check_exec

The call in the first measurement is made 37 times and has a better performance.
Obviously something in or around that function is wrong. To eliminate other
functions, a look at the call graph can help, here is the first instance of

                    0.00    8.69      23/23          syscall_plain [3]
    [4]      5.9    0.00    8.69      23         sys_execve [4]
                    8.69    0.00      23/23          check_exec [5]
                    0.00    0.00      20/20          elf32_copyargs [67]

Notice how the time of 8.69 seems to affect the two previous functions. It is
possible that there is something wrong with them, however, the next instance of
`check_exec` seems to prove otherwise:

                    8.69    0.00      23/23          sys_execve [4]
    [5]      5.9    8.69    0.00      23         check_exec [5]

Now we can see that the problem, most likely, resides in `check_exec`. Of
course, problems are not always this simple and in fact, here is the simpleton
code that was inserted right after `check_exec` (the function is in

            /* A Cheap fault insertion */
            for (x = 0; x < 100000000; x++) {
                    y = x;

Not exactly glamorous, but enough to register a large change with profiling.

### Summary

Kernel profiling can be enlightening for anyone and provides a much more refined
method of hunting down performance problems that are not as easy to find using
conventional means, it is also not nearly as hard as most people think, if you
can compile a kernel, you can get profiling to work.

## System Tuning

Now that monitoring and analysis tools have been addressed, it is time to look
into some actual methods. In this section, tools and methods that can affect how
the system performs that are applied without recompiling the kernel are
addressed, the next section examines kernel tuning by recompiling.

### Using sysctl

The sysctl utility can be used to look at and in some cases alter system
parameters. There are so many parameters that can be viewed and changed they
cannot all be shown here, however, for the first example here is a simple usage
of sysctl to look at the system PATH environment variable:

    $ sysctl user.cs_path
    user.cs_path = /usr/bin:/bin:/usr/sbin:/sbin:/usr/pkg/bin:/usr/pkg/sbin:/usr/local/bin:/usr/local/sbin

Fairly simple. Now something that is actually related to performance. As an
example, lets say a system with many users is having file open issues, by
examining and perhaps raising the kern.maxfiles parameter the problem may be
fixed, but first, a look:

    $ sysctl kern.maxfiles
    kern.maxfiles = 1772

Now, to change it, as root with the -w option specified:

    # sysctl -w kern.maxfiles=1972
    kern.maxfiles: 1772 -> 1972

Note, when the system is rebooted, the old value will return, there are two
cures for this, first, modify that parameter in the kernel and recompile, second
(and simpler) add this line to `/etc/sysctl.conf`:


### tmpfs & mfs

NetBSD's *ramdisk* implementations cache all data in the RAM, and if that is
full, the swap space is used as backing store. NetBSD comes with two
implementations, the traditional BSD memory-based file system
and the more modern
While the former can only grow in size, the latter can also shrink if space is
no longer needed.

When to use and not to use a memory based filesystem can be hard on large multi
user systems. In some cases, however, it makes pretty good sense, for example,
on a development machine used by only one developer at a time, the obj directory
might be a good place, or some of the tmp directories for builds. In a case like
that, it makes sense on machines that have a fair amount of RAM on them. On the
other side of the coin, if a system only has 16MB of RAM and `/var/tmp` is
mfs-based, there could be severe applications issues that occur.

The GENERIC kernel has both tmpfs and mfs enabled by default. To use it on a
particular directory first determine where the swap space is that you wish to
use, in the example case, a quick look in `/etc/fstab` indicates that
`/dev/wd0b` is the swap partition:

    mail% cat /etc/fstab
    /dev/wd0a / ffs rw 1 1
    /dev/wd0b none swap sw 0 0
    /kern /kern kernfs rw

This system is a mail server so I only want to use `/tmp` with tmpfs, also on
this particular system I have linked `/tmp` to `/var/tmp` to save space (they
are on the same drive). All I need to do is add the following entry:

    /dev/wd0b /var/tmp tmpfs rw 0 0

If you want to use mfs instead of tmpfs, put just that into the above place.

Now, a word of warning: make sure said directories are empty and nothing is
using them when you mount the memory file system! After changing `/etc/fstab`,
you can either run `mount -a` or reboot the system.

### Soft-dependencies

Soft-dependencies (softdeps) is a mechanism that does not write meta-data to
disk immediately, but it is written in an ordered fashion, which keeps the
filesystem consistent in case of a crash. The main benefit of softdeps is
processing speed. Soft-dependencies have some sharp edges, so beware! Also note
that soft-dependencies will not be present in any releases past 5.x. See
[[Journaling|guide/tuning#system-logging]] for information about WAPBL, which is
the replacement for soft-dependencies.

Soft-dependencies can be enabled by adding `softdep` to the filesystem options
in `/etc/fstab`. Let's look at an example of `/etc/fstab`:

    /dev/wd0a / ffs rw 1 1
    /dev/wd0b none swap sw 0 0
    /dev/wd0e /var ffs rw 1 2
    /dev/wd0f /tmp ffs rw 1 2
    /dev/wd0g /usr ffs rw 1 2

Suppose we want to enable soft-dependencies for all file systems, except for the
`/` partition. We would change it to (changes are emphasized):

    /dev/wd0a / ffs rw 1 1
    /dev/wd0b none swap sw 0 0
    /dev/wd0e /var ffs rw,softdep 1 2
    /dev/wd0f /tmp ffs rw,softdep 1 2
    /dev/wd0g /usr ffs rw,softdep 1 2

More information about softdep capabilities can be found on the
[author's page](

### Journaling

Journaling is a mechanism which puts written data in a so-called *journal*
first, and in a second step the data from the journal is written to disk. In the
event of a system crash, data that was not written to disk but that is in the
journal can be replayed, and will thus get the disk into a proper state. The
main effect of this is that no file system check (fsck) is needed after a rough
reboot. As of 5.0, NetBSD includes WAPBL, which provides journaling for FFS.

Journaling can be enabled by adding `log` to the filesystem options in
`/etc/fstab`. Here is an example which enables journaling for the root (`/`),
`/var`, and `/usr` file systems:

    /dev/wd0a /    ffs rw,log 1 1
    /dev/wd0e /var ffs rw,log 1 2
    /dev/wd0g /usr ffs rw,log 1 2

### LFS

LFS, the log structured filesystem, writes data to disk in a way that is
sometimes too aggressive and leads to congestion. To throttle writing, the
following sysctls can be used:


Besides tuning those parameters, disabling write-back caching on
[wd(4)]( devices may
be beneficial. See the
[dkctl(8)]( man
page for details.

More is available in the NetBSD mailing list archives. See
[this]( and
[this]( mail.

## Kernel Tuning

While many system parameters can be changed with sysctl, many improvements by
using enhanced system software, layout of the system and managing services
(moving them in and out of inetd for example) can be achieved as well. Tuning
the kernel however will provide better performance, even if it appears to be

### Preparing to Recompile a Kernel

First, get the kernel sources for the release as described in
[[Obtaining the sources|guide/fetch]], reading
[[Compiling the kernel|guide/kernel]]for more information on building the kernel
is recommended. Note, this document can be used for -current tuning, however, a
read of the
[[Tracking -current|tracking_current]] documentation should be done first, much
of the information there is repeated here.

### Configuring the Kernel

Configuring a kernel in NetBSD can be daunting. This is because of multiple line
dependencies within the configuration file itself, however, there is a benefit
to this method and that is, all it really takes is an ASCII editor to get a new
kernel configured and some dmesg output. The kernel configuration file is under
`src/sys/arch/ARCH/conf` where ARCH is your architecture (for example, on a
SPARC it would be under `src/sys/arch/sparc/conf`).

After you have located your kernel config file, copy it and remove (comment out)
all the entries you don't need. This is where
becomes your friend. A clean
will show all of the devices detected by the kernel at boot time. Using
output, the device options really needed can be determined.

#### Some example Configuration Items

In this example, an ftp server's kernel is being reconfigured to run with the
bare minimum drivers and options and any other items that might make it run
faster (again, not necessarily smaller, although it will be). The first thing to
do is take a look at some of the main configuration items. So, in
`/usr/src/sys/arch/i386/conf` the GENERIC file is copied to FTP, then the file
FTP edited.

At the start of the file there are a bunch of options beginning with maxusers,
which will be left alone, however, on larger multi-user systems it might be help
to crank that value up a bit. Next is CPU support, looking at the dmesg output
this is seen:

    cpu0: Intel Pentium II/Celeron (Deschutes) (686-class), 400.93 MHz

Indicating that only the options `I686_CPU` options needs to be used. In the next
section, all options are left alone except the `PIC_DELAY` which is recommended
unless it is an older machine. In this case it is enabled since the 686 is
*relatively new*.

Between the last section all the way down to compat options there really was no
need to change anything on this particular system. In the compat section,
however, there are several options that do not need to be enabled, again this is
because this machine is strictly a FTP server, all compat options were turned

The next section is File systems, and again, for this server very few need to be
on, the following were left on:

    # File systems
    file-system     FFS             # UFS
    file-system     LFS             # log-structured file system
    file-system     MFS             # memory file system
    file-system     CD9660          # ISO 9660 + Rock Ridge file system
    file-system     FDESC           # /dev/fd
    file-system     KERNFS          # /kern
    file-system     NULLFS          # loopback file system
    file-system     PROCFS          # /proc
    file-system     UMAPFS          # NULLFS + uid and gid remapping
    options         SOFTDEP         # FFS soft updates support.

Next comes the network options section. The only options left on were:

    options         INET            # IP + ICMP + TCP + UDP
    options         INET6           # IPV6
    options         IPFILTER_LOG    # ipmon(8) log support

`IPFILTER_LOG` is a nice one to have around since the server will be running

The next section is verbose messages for various subsystems, since this machine
is already running and had no major problems, all of them are commented out.

#### Some Drivers

The configurable items in the config file are relatively few and easy to cover,
however, device drivers are a different story. In the following examples, two
drivers are examined and their associated *areas* in the file trimmed down.
First, a small example: the cdrom, in dmesg, is the following lines:

    cd0 at atapibus0 drive 0: <CD-540E, , 1.0A> type 5 cdrom removable
    cd0: 32-bit data port
    cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
    pciide0: secondary channel interrupting at irq 15
    cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfer

Now, it is time to track that section down in the configuration file. Notice
that the `cd`-drive is on an atapibus and requires pciide support. The section
that is of interest in this case is the kernel config's "IDE and related
devices" section. It is worth noting at this point, in and around the IDE
section are also ISA, PCMCIA etc., on this machine in the
output there are no PCMCIA devices, so it stands to reason that all PCMCIA
references can be removed. But first, the `cd` drive.

At the start of the IDE section is the following:

    wd*     at atabus? drive ? flags 0x0000
    atapibus* at atapi?

Well, it is pretty obvious that those lines need to be kept. Next is this:

    # ATAPI devices
    # flags have the same meaning as for IDE drives.
    cd*     at atapibus? drive ? flags 0x0000       # ATAPI CD-ROM drives
    sd*     at atapibus? drive ? flags 0x0000       # ATAPI disk drives
    st*     at atapibus? drive ? flags 0x0000       # ATAPI tape drives
    uk*     at atapibus? drive ? flags 0x0000       # ATAPI unknown

The only device type that was in the
output was the cd, the rest can be commented out.

The next example is slightly more difficult, network interfaces. This machine
has two of them:

    ex0 at pci0 dev 17 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x64)
    ex0: interrupting at irq 10
    ex0: MAC address 00:50:04:83:ff:b7
    UI 0x001018 model 0x0012 rev 0 at ex0 phy 24 not configured
    ex1 at pci0 dev 19 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
    ex1: interrupting at irq 11
    ex1: MAC address 00:50:da:63:91:2e
    exphy0 at ex1 phy 24: 3Com internal media interface
    exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

At first glance it may appear that there are in fact three devices, however, a
closer look at this line:

    exphy0 at ex1 phy 24: 3Com internal media interface

Reveals that it is only two physical cards, not unlike the cdrom, simply
removing names that are not in dmesg will do the job. In the beginning of the
network interfaces section is:

    # Network Interfaces
    # PCI network interfaces
    an*     at pci? dev ? function ?        # Aironet PC4500/PC4800 (802.11)
    bge*    at pci? dev ? function ?        # Broadcom 570x gigabit Ethernet
    en*     at pci? dev ? function ?        # ENI/Adaptec ATM
    ep*     at pci? dev ? function ?        # 3Com 3c59x
    epic*   at pci? dev ? function ?        # SMC EPIC/100 Ethernet
    esh*    at pci? dev ? function ?        # Essential HIPPI card
    ex*     at pci? dev ? function ?        # 3Com 90x[BC]

There is the ex device. So all of the rest under the PCI section can be removed.
Additionally, every single line all the way down to this one:

    exphy*  at mii? phy ?                   # 3Com internal PHYs

can be commented out as well as the remaining.

#### Multi Pass

When I tune a kernel, I like to do it remotely in an X windows session, in one
window the dmesg output, in the other the config file. It can sometimes take a
few passes to rebuild a very trimmed kernel since it is easy to accidentally
remove dependencies.

### Building the New Kernel

Now it is time to build the kernel and put it in place. In the conf directory on
the ftp server, the following command prepares the build:

    $ config FTP

When it is done a message reminding me to make depend will display, next:

    $ cd ../compile/FTP
    $ make depend && make

When it is done, I backup the old kernel and drop the new one in place:

    # cp /netbsd /netbsd.orig
    # cp netbsd /

Now reboot. If the kernel cannot boot, stop the boot process when prompted and
type `boot netbsd.orig` to boot from the previous kernel.

### Shrinking the NetBSD kernel

When building a kernel for embedded systems, it's often necessary to modify the
Kernel binary to reduce space or memory footprint.

#### Removing ELF sections and debug information

We already know how to remove Kernel support for drivers and options that you
don't need, thus saving memory and space, but you can save some KiloBytes of
space by removing debugging symbols and two ELF sections if you don't need them:
`.comment` and `.ident`. They are used for storing RCS strings viewable with
[ident(1)]( and a
[gcc(1)]( version
string. The following examples assume you have your `TOOLDIR` under
`/usr/src/tooldir.NetBSD-2.0-i386` and the target architecture is `i386`.

    $ /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-objdump -h /netbsd
    /netbsd:     file format elf32-i386
    Idx Name          Size      VMA       LMA       File off  Algn
      0 .text         0057a374  c0100000  c0100000  00001000  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 .rodata       00131433  c067a380  c067a380  0057b380  2**5
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      2 .rodata.str1.1 00035ea0  c07ab7b3  c07ab7b3  006ac7b3  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      3 .rodata.str1.32 00059d13  c07e1660  c07e1660  006e2660  2**5
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 link_set_malloc_types 00000198  c083b374  c083b374  0073c374  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      5 link_set_domains 00000024  c083b50c  c083b50c  0073c50c  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      6 link_set_pools 00000158  c083b530  c083b530  0073c530  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      7 link_set_sysctl_funcs 000000f0  c083b688  c083b688  0073c688  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      8 link_set_vfsops 00000044  c083b778  c083b778  0073c778  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      9 link_set_dkwedge_methods 00000004  c083b7bc  c083b7bc  0073c7bc  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     10 link_set_bufq_strats 0000000c  c083b7c0  c083b7c0  0073c7c0  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     11 link_set_evcnts 00000030  c083b7cc  c083b7cc  0073c7cc  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     12 .data         00048ae4  c083c800  c083c800  0073c800  2**5
                      CONTENTS, ALLOC, LOAD, DATA
     13 .bss          00058974  c0885300  c0885300  00785300  2**5
     14 .comment      0000cda0  00000000  00000000  00785300  2**0
                      CONTENTS, READONLY
     15 .ident        000119e4  00000000  00000000  007920a0  2**0
                      CONTENTS, READONLY

On the third column we can see the size of the sections in hexadecimal form. By
summing `.comment` and `.ident` sizes we know how much we're going to save with
their removal: around 120KB (= 52640 + 72164 = 0xcda0 + 0x119e4). To remove the
sections and debugging symbols that may be present, we're going to use

    # cp /netbsd /netbsd.orig
    # /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-strip -S -R .ident -R .comment /netbsd
    # ls -l /netbsd /netbsd.orig
    -rwxr-xr-x  1 root  wheel  8590668 Apr 30 15:56 netbsd
    -rwxr-xr-x  1 root  wheel  8757547 Apr 30 15:56 netbsd.orig

Since we also removed debugging symbols, the total amount of disk space saved is
around 160KB.

#### Compressing the Kernel

On some architectures, the bootloader can boot a compressed kernel. You can save
several MegaBytes of disk space by using this method, but the bootloader will
take longer to load the Kernel.

    # cp /netbsd /netbsd.plain
    # gzip -9 /netbsd

To see how much space we've saved:

    $ ls -l /netbsd.plain /netbsd.gz
    -rwxr-xr-x  1 root  wheel  8757547 Apr 29 18:05 /netbsd.plain
    -rwxr-xr-x  1 root  wheel  3987769 Apr 29 18:05 /netbsd.gz

Note that you can only use gzip coding, by using
[gzip(1)](, bzip2
is not supported by the NetBSD bootloaders!

CVSweb for NetBSD wikisrc <> software: FreeBSD-CVSweb