Annotation of wikisrc/guide/tuning.mdwn, revision 1.2

**Contents**

[[!toc levels=3]]

# Tuning NetBSD

## Introduction

### Overview

This section covers a variety of performance tuning topics, attempting to span
tuning from the perspective of the system administrator to that of the systems
programmer. The art of performance tuning itself is very old. To tune something
means to make it operate more efficiently: whether the subject is a NetBSD
server or a vacuum cleaner, the goal is to improve something, be it the way it
is done, how it works, or how it is put together.

#### What is Performance Tuning?

A view from 10,000 feet shows that pretty much everything we do is task
oriented; this applies to a NetBSD system as well. When the system boots, it
automatically begins to perform a variety of tasks. When a user logs in, they
usually have a wide variety of tasks to accomplish. In the scope of these
documents, however, performance tuning strictly means improving how efficiently
a NetBSD system performs.

The most common thought that crops into someone's mind when they think "tuning"
is some sort of speed increase or a reduction in the size of the kernel. While
those are ways to improve performance, they are not the only measures an
administrator can take to increase efficiency. For our purposes, performance
tuning means this: *to make a NetBSD system operate in an optimum state.*

That could mean a variety of things, not necessarily speed enhancements. A good
example is filesystem formatting parameters: on a system that stores a lot of
small files (say, a source repository), an administrator may need to increase
the number of inodes by lowering the bytes-per-inode ratio (say, down to 1024
bytes) when the filesystem is created. The system is not faster as a result,
but it keeps the administrator from getting those nasty "out of inodes"
messages, which ultimately makes the system more efficient.

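As a hedged sketch of the idea (the partition name `/dev/rwd0e` is a
hypothetical example, and `-N` makes newfs(8) only display the parameters it
would use without writing anything to disk), the bytes-per-inode ratio is set
when the filesystem is created:

```shell
# Preview filesystem parameters with a 1024 bytes-per-inode ratio.
# -N: display what would be done without actually creating the filesystem.
# /dev/rwd0e is a hypothetical example partition; adjust for your disk.
newfs -N -i 1024 /dev/rwd0e
```

A lower `-i` value means more inodes per unit of disk space, at the cost of
slightly less room for file data.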
Tuning normally revolves around finding and eliminating bottlenecks. Most of
the time, such bottlenecks are spurious: for example, a release of Mozilla that
does not handle Java applets very well can cause Mozilla to start crunching the
CPU, especially on applets that are poorly written. Occasions when processes
seem to spin off into nowhere and eat CPU are almost always resolved with a
kill. There are instances, however, when resolving a bottleneck takes much
longer: say an rsynced server just keeps getting larger and larger. Slowly,
performance begins to fade, and the administrator may have to take some sort of
action to speed things up. That situation is mild, however, compared to an
emergency such as an instantly spiked CPU.

#### When does one tune?

Many NetBSD users rarely have to tune a system. The GENERIC kernel may run just
fine, and the layout/configuration of the system may do the job as well. By the
same token, it is always good practice to know how to tune a system. Most often,
tuning comes as a result of a sudden bottleneck (which may occur randomly) or a
gradual loss of performance. Both happen to everyone at some point: one process
eating the CPU is as much a bottleneck as a gradual increase in paging. So the
question is not so much when to tune as when to learn to tune.

One last time to tune: if you can tune in a preventive manner (and you think
you might need to), then do it. One example of this was a system that needed to
be able to reboot quickly. Instead of waiting for problems, I did everything I
could to trim the kernel and make sure absolutely nothing was running that was
not needed; I even removed drivers for devices that were present but never used
(lp). The result was a reboot time reduced by nearly two-thirds. In the long
run, it was a smart move to tune it before it became an issue.

#### What these Documents Will Not Cover

Before wrapping up the introduction, it is important to note what these
documents will not cover. This guide pertains only to the core NetBSD system.
In other words, it will not cover tuning a web server's configuration to make
it run better; however, it might mention how to tune NetBSD to run better as a
web server. The logic behind this is simple: web servers, database software,
etc. are third party and almost limitless. It would be easy to get mired in
details that do not apply to the NetBSD system. Almost all third party software
has its own documentation about tuning anyhow.

#### How Examples are Laid Out

Since there is ample man page documentation, only the options and arguments
used in the examples are discussed. In some cases, material is truncated for
brevity and not thoroughly discussed because, quite simply, there is too much.
For example, not every single device driver entry in the kernel will be
discussed, but an example of determining whether or not a given system needs
one will be. Nothing in this guide is concrete; tuning and performance are very
subjective. Instead, it is a guide for readers to learn what some of the tools
available to them can do.

## Tuning Considerations

Tuning a system is not really too difficult when the approach is pro-active.
This document approaches tuning from a *before it comes up* standpoint. While
tuning in spare time is considerably easier than, say, on a server that is
almost completely bogged down to 0.1% idle time, there are still a few things
that should be mulled over before actually tuning, ideally before a system is
even installed.

### General System Configuration

Of course, how the system is set up makes a big difference. Sometimes small
items can be overlooked which may in fact cause some sort of long-term
performance problem.

#### Filesystems and Disks

How the filesystem is laid out relative to disk drives is very important. On
hardware RAID systems it is not such a big deal, but many NetBSD users
specifically run NetBSD on older hardware where hardware RAID simply is not an
option. The idea of `/` being close to the first drive is a good one, but if
there are several drives to choose from, is the best-performing one the drive
that `/` will be on? On a related note, is it wise to split off `/usr`? Will
the system see heavy usage in, say, `/usr/pkgsrc`? It might make sense to slap
a fast drive in and mount it under `/usr/pkgsrc`, or it might not. Like all
things in performance tuning, this is subjective.

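As an illustrative sketch only (the device name `wd1e` and the mount options
are hypothetical; adjust them for the actual disk and NetBSD version),
dedicating a fast second disk to `/usr/pkgsrc` amounts to one `/etc/fstab`
entry:

```
# /etc/fstab fragment: mount a hypothetical fast second disk on /usr/pkgsrc.
# device        mount point     type  options   dump  pass
/dev/wd1e       /usr/pkgsrc     ffs   rw        1     2
```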
#### Swap Configuration

There are three schools of thought on swap size, and about fifty on using
split swap files with prioritizing and how that should be done. In the swap
size arena, the vendor schools (at least most commercial ones) usually have
their own formula per OS. As an example, on a particular version of HP-UX with
a particular version of Oracle, the formula was:

2.5 GB \* number\_of\_processors

Of course, that all really depends on what type of usage the database sees and
how large it is; for instance, if it is so large that it must be distributed,
that formula does not fit well.

The next school of thought on swap sizing is sort of strange but makes some
sense. It says: if possible, measure a reference amount of memory used by the
system. It goes something like this:

 1. Start up a machine and estimate total memory needs by running everything
    that may ever be needed at once: databases, web servers, whatever. Total
    up the amount.
 2. Add a few MB for padding.
 3. Subtract the amount of physical RAM from this total.

If the amount left over is 3 times the size of physical RAM, consider getting
more RAM. The problem, of course, is figuring out what is needed and how much
space it will take. There is also another flaw in this method: some programs do
not behave well. A glaring example of misbehaved software is web browsers.
Certain versions of Netscape, when something went wrong, had a tendency to run
away and eat swap space. So the more spare space available, the more time to
kill the runaway process.

Last but not least is the tried and true PHYSICAL\_RAM \* 2 method. On modern
machines, and even on older ones (with limited purposes, of course), this seems
to work best.

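The rule of thumb is simple arithmetic; here is a minimal sketch in shell (the
RAM size is a hypothetical example value; on a real NetBSD system it could be
derived from `sysctl -n hw.physmem64`, which reports bytes):

```shell
#!/bin/sh
# Classic PHYSICAL_RAM * 2 swap-sizing rule of thumb.
# ram_mb is a hypothetical example value; on NetBSD it could come from
# `sysctl -n hw.physmem64` (bytes) divided by 1048576.
ram_mb=512
swap_mb=$((ram_mb * 2))
echo "physical RAM: ${ram_mb} MB, suggested swap: ${swap_mb} MB"
```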
All in all, it is hard to tell when swapping will start. Even on machines with
16MB of RAM (and less), NetBSD has always worked well for most people, until
misbehaving software is run.

### System Services

On servers, system services have a large impact. Getting them to run at their
best almost always requires some sort of network-level change or a fundamental
speed increase in the underlying system (which, of course, is what this is all
about). There are instances, however, when some simple solutions can improve
services. One example: an ftp server is becoming slower, and a new release of
the ftp server shipped with the system comes out that just happens to run
faster. By upgrading the ftp software, a performance boost is accomplished.

Another good example where services are concerned is the age-old question: *to
use inetd or not to use inetd?* A great service example is pop3. Pop3
connections can conceivably clog up inetd, and while the pop3 service itself
starts to degrade slowly, other services multiplexed through inetd will also
degrade (in some cases more than pop3). Setting up pop3 to run outside of
inetd, as a standalone daemon, may help.

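As an illustrative sketch (the daemon path and the `inetd.conf` line are
hypothetical; check the actual entries on your system), moving pop3 out of
inetd means commenting out its line in `/etc/inetd.conf` and running the
server standalone:

```shell
# Hypothetical /etc/inetd.conf line for pop3, commented out so inetd no
# longer multiplexes it:
#   pop3  stream  tcp  nowait  root  /usr/pkg/libexec/popper  popper
#
# Then start the (hypothetical) server as a standalone daemon instead:
#   /usr/pkg/libexec/popper -d
#
# and tell inetd to reread its configuration:
#   kill -HUP "$(cat /var/run/inetd.pid)"
```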
### The NetBSD Kernel

The NetBSD kernel obviously plays a key role in how well a system performs.
While rebuilding and tuning the kernel is covered later in the text, it is
worth discussing here at a high level.

Tuning the NetBSD kernel really involves three main areas:

 1. removing unrequired drivers
 2. configuring options
 3. system settings

#### Removing Unrequired Drivers

Taking drivers that are not needed out of the kernel achieves several results:
first, the system boots faster since the kernel is smaller; second, again
because the kernel is smaller, more memory is free to users and processes; and
third, the kernel tends to respond more quickly.

#### Configuring Options

Configuring options, such as enabling/disabling certain subsystems, specific
hardware, and filesystems, can also improve performance in pretty much the same
way removing unrequired drivers does. A very simple example of this is an FTP
server that only hosts ftp files and nothing else. On this particular server
there is no need for anything but native filesystem support and perhaps a few
options to help speed things along. Why would it ever need NTFS support, for
example? Besides, if it did, support for NTFS could be added at some later
time. Conversely, a workstation may need to support many different filesystem
types in order to share and access files.

#### System Settings

System-wide settings are controlled by the kernel. A few examples are
filesystem settings, network settings, and core kernel settings such as the
maximum number of processes. Almost all system settings can be at least viewed
or modified via the sysctl facility. Examples using the sysctl facility are
given later on.

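For a quick taste of the sysctl facility (the node names below exist on
NetBSD, but the value shown is only an example and varies per system):

```shell
# Read a single kernel setting (the maximum number of processes):
sysctl kern.maxproc

# List an entire subtree, e.g. all filesystem-related settings:
sysctl vfs

# Change a setting at run time (requires root); the value here is only
# an example:
sysctl -w kern.maxproc=2084
```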
## Visual Monitoring Tools

NetBSD ships a variety of performance monitoring tools with the system. Most of
these tools are common to all UNIX systems. In this section, some example usage
of the tools is given, with interpretation of the output.

### The top Process Monitor

The [top(1)](http://netbsd.gw.com/cgi-bin/man-cgi?top+1+NetBSD-current)
monitor does exactly what it says: it displays the CPU hogs on the system. To
run the monitor, simply type `top` at the prompt. Without any arguments, it
should look like:

    load averages:  0.09,  0.12,  0.08                                     20:23:41
    21 processes:  20 sleeping, 1 on processor
    CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Memory: 15M Act, 1104K Inact, 208K Wired, 22M Free, 129M Swap free

      PID USERNAME PRI NICE   SIZE   RES STATE     TIME   WCPU    CPU COMMAND
    13663 root       2    0  1552K 1836K sleep     0:08  0.00%  0.00% httpd
      127 root      10    0   129M 4464K sleep     0:01  0.00%  0.00% mount_mfs
    22591 root       2    0   388K 1156K sleep     0:01  0.00%  0.00% sshd
      108 root       2    0   132K  472K sleep     0:01  0.00%  0.00% syslogd
    22597 jrf       28    0   156K  616K onproc    0:00  0.00%  0.00% top
    22592 jrf       18    0   828K 1128K sleep     0:00  0.00%  0.00% tcsh
      203 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      205 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      206 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      208 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      207 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
    13667 nobody     2    0  1660K 1508K sleep     0:00  0.00%  0.00% httpd
     9926 root       2    0   336K  588K sleep     0:00  0.00%  0.00% sshd
      200 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd
      182 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
      180 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
    13666 nobody    -4    0  1600K 1260K sleep     0:00  0.00%  0.00% httpd

The top(1) utility is great for finding CPU hogs, runaway processes, or groups
of processes that may be causing problems. The output shown above indicates
that this particular system is in good health. The next display, however,
shows some very different results:

    load averages:  0.34,  0.16,  0.13                                     21:13:47
    25 processes:  24 sleeping, 1 on processor
    CPU states:  0.5% user,  0.0% nice,  9.0% system,  1.0% interrupt, 89.6% idle
    Memory: 20M Act, 1712K Inact, 240K Wired, 30M Free, 129M Swap free

      PID USERNAME PRI NICE   SIZE   RES STATE     TIME   WCPU    CPU COMMAND
     5304 jrf       -5    0    56K  336K sleep     0:04 66.07% 19.53% bonnie
     5294 root       2    0   412K 1176K sleep     0:02  1.01%  0.93% sshd
      108 root       2    0   132K  472K sleep     1:23  0.00%  0.00% syslogd
      187 root       2    0  1552K 1824K sleep     0:07  0.00%  0.00% httpd
     5288 root       2    0   412K 1176K sleep     0:02  0.00%  0.00% sshd
     5302 jrf       28    0   160K  620K onproc    0:00  0.00%  0.00% top
     5295 jrf       18    0   828K 1116K sleep     0:00  0.00%  0.00% tcsh
     5289 jrf       18    0   828K 1112K sleep     0:00  0.00%  0.00% tcsh
      127 root      10    0   129M 8388K sleep     0:00  0.00%  0.00% mount_mfs
      204 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      208 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      210 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      209 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      211 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      217 nobody     2    0  1616K 1272K sleep     0:00  0.00%  0.00% httpd
      184 root       2    0   336K  580K sleep     0:00  0.00%  0.00% sshd
      201 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd

At first glance, it should seem rather obvious which process is hogging the
system; what is interesting in this case, however, is why. The bonnie program
is a disk benchmark tool which can write large files in a variety of sizes and
ways. What the previous output indicates is only that bonnie is a CPU hog, not
why.

#### Other Neat Things About Top

A careful examination of the manual page
[top(1)](http://netbsd.gw.com/cgi-bin/man-cgi?top+1+NetBSD-5.0.1+i386) shows
that there is a lot more that can be done with top; for example, processes can
have their priority changed or be killed. Additionally, filters can be set for
looking at processes.

### The sysstat utility

As the man page
[sysstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?sysstat+1+NetBSD-5.0.1+i386)
indicates, the sysstat utility shows a variety of system statistics using the
curses library. While it is running, the screen is split in two parts: the
upper window shows the current load average, while the lower screen depends on
user commands. The exception to the split-window view is the vmstat display,
which takes up the whole screen. Following is what sysstat looks like on a
fairly idle system when invoked with no arguments:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |

                             /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
                      <idle> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Basically a lot of dead time there, so now take a look with some arguments
provided, in this case `sysstat inet.tcp`, which looks like this:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |

            0 connections initiated           19 total TCP packets sent
            0 connections accepted            11   data
            0 connections established          0   data (retransmit)
                                               8   ack-only
            0 connections dropped              0   window probes
            0   in embryonic state             0   window updates
            0   on retransmit timeout          0   urgent data only
            0   by keepalive                   0   control
            0   by persist
                                              29 total TCP packets received
           11 potential rtt updates           17   in sequence
           11 successful rtt updates           0   completely duplicate
            9 delayed acks sent                0   with some duplicate data
            0 retransmit timeouts              4   out of order
            0 persist timeouts                 0   duplicate acks
            0 keepalive probes                11   acks
            0 keepalive timeouts               0   window probes
                                               0   window updates

Now that is informative. The first poll is cumulative, so it is possible to see
quite a lot of information in the output when sysstat is invoked. While that
may be interesting, how about a look at the buffer cache with `sysstat
bufcache`:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average

    There are 1642 buffers using 6568 kBytes of memory.

    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          877  53        6171  93        6516  99      94
    /var/tmp                     5   0          17   0          28   0      60

    Total:                     882  53        6188  94        6544  99

Again, a pretty boring system, but great information to have available. While
this is all nice to look at, it is time to put a false load on the system to
see how sysstat can be used as a performance monitoring tool. As with top,
bonnie++ will be used to put a high load on the I/O subsystems and a little on
the CPU. The bufcache will be looked at again to see if there are any
noticeable differences:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |||

    There are 1642 buffers using 6568 kBytes of memory.

    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          811  49        6422  97        6444  98      99

    Total:                     811  49        6422  97        6444  98

First, notice that the load average shot up; this is to be expected, of
course. Then, while most of the numbers are close, notice that utilization is
at 99%. Throughout the time that bonnie++ was running, the utilization
percentage remained at 99. This makes sense here, but in a real
troubleshooting situation it could be indicative of a process doing heavy I/O
on one particular file or filesystem.

## Monitoring Tools

In addition to screen-oriented monitors and tools, the NetBSD system also
ships with a set of command-line oriented tools. Many of the tools that ship
with a NetBSD system can be found on other UNIX and UNIX-like systems.

### fstat

The [fstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?fstat+1+NetBSD-5.0.1+i386)
utility reports the status of open files on the system. While it is not what
many administrators consider a performance monitor, it can help find out
whether a particular user or process is using an inordinate number of files,
generating large files, and similar information.

Following is a sample of some fstat output:

    USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
    jrf      tcsh       21607   wd /         29772 drwxr-xr-x     512 r
    jrf      tcsh       21607    3* unix stream c057acc0<-> c0553280
    jrf      tcsh       21607    4* unix stream c0553280 <-> c057acc0
    root     sshd       21597   wd /             2 drwxr-xr-x     512 r
    root     sshd       21597    0 /         11921 crw-rw-rw-    null rw
    nobody   httpd       5032   wd /             2 drwxr-xr-x     512 r
    nobody   httpd       5032    0 /         11921 crw-rw-rw-    null r
    nobody   httpd       5032    1 /         11921 crw-rw-rw-    null w
    nobody   httpd       5032    2 /         15890 -rw-r--r--  353533 rw
    ...

The fields are pretty self-explanatory. Again, while this tool is not as
performance-oriented as others, it can come in handy when trying to find out
information about file usage.

                    412: ### iostat
                    413: 
The [iostat(8)](http://netbsd.gw.com/cgi-bin/man-cgi?iostat+8+NetBSD-5.0.1+i386)
command does exactly what it sounds like: it reports the status of the I/O
subsystems on the system. When iostat is employed, the user typically runs it
with a certain number of counts and an interval between them, like so:
                    418: 
                    419:     $ iostat 5 5
                    420:           tty            wd0             cd0             fd0             md0             cpu
                    421:      tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
                    422:        0    1  5.13   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
                    423:        0   54  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
                    424:        0   18  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
                    425:        0   18  8.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
                    426:        0   28  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
                    427: 
The above output is from a very quiet ftp server. The fields represent the
various I/O devices: the tty (which, ironically, is the most active because
iostat is running), wd0, the primary IDE disk, cd0, the CD-ROM drive, fd0,
the floppy, and md0, the memory filesystem.
                    432: 
Now, let's see if we can pummel the system with some heavy usage: a large
ftp transaction consisting of a tarball of the netbsd-current source, with the
`bonnie++` disk benchmark program running at the same time.
                    436: 
                    437:     $ iostat 5 5
                    438:           tty            wd0             cd0             fd0             md0             cpu
                    439:      tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
                    440:        0    1  5.68   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
                    441:        0   54 61.03 150 8.92   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 18  4 78
                    442:        0   26 63.14 157 9.71   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 20  4 75
                    443:        0   20 43.58  26 1.12   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  9  2 88
                    444:        0   28 19.49  82 1.55   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0  7  3 89
                    445: 
As can be expected, wd0 is very active. What is interesting about this output
is how the processor's I/O seems to rise in proportion to wd0. This makes
perfect sense; however, it can only be observed because this ftp server is
hardly being used. If, for example, the CPU was already under a moderate load
and the disk subsystem was under the same load as it is now, it could appear
that the CPU is the bottleneck when in fact it would have been the disk. This
shows that *one tool* is rarely enough to completely analyze a problem. A
quick glance at processes (after watching iostat) would probably tell us which
processes were causing problems.
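To connect that disk activity back to processes, a CPU-sorted process listing
is often enough. A quick sketch:

```shell
# After iostat shows heavy activity, list the top five processes
# by %CPU (column 3 of "ps aux") to find the likely culprits.
ps aux | sort -rnk3 | head -5
```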
                    456: 
                    457: ### ps
                    458: 
Using the [ps(1)](http://netbsd.gw.com/cgi-bin/man-cgi?ps+1+NetBSD-5.0.1+i386)
command, or process status, a great deal of information about the system can be
discovered. Most of the time, the ps command is used to isolate a particular
process by name, group, owner, etc. Invoked with no options or arguments, ps
simply prints out information about the user executing it.
                    464: 
                    465:     $ ps
                    466:       PID TT STAT    TIME COMMAND
                    467:     21560 p0 Is   0:00.04 -tcsh
                    468:     21564 p0 I+   0:00.37 ssh jrf.odpn.net
                    469:     21598 p1 Ss   0:00.12 -tcsh
                    470:     21673 p1 R+   0:00.00 ps
                    471:     21638 p2 Is+  0:00.06 -tcsh
                    472: 
Not very exciting. The fields are self-explanatory with the exception of
`STAT`, which shows the state a process is in. The flags are all documented in
the man page; in the above example, `I` is idle, `S` is sleeping, `R` is
runnable, the `+` means the process is in a foreground state, and the `s` means
the process is a session leader. This all makes perfect sense when looking at
the flags; for example, PID 21560 is a shell, it is idle, and (as would be
expected) the shell is a session leader.
                    480: 
In most cases, someone is looking for something very specific in the process
listing. As an example, `-a` shows all processes, `-ax` also shows processes
without controlling terminals, and `aux` gives a much more verbose listing
(basically everything, plus information about the impact processes are having):
                    486: 
                    487:     # ps aux
                    488:     USER     PID %CPU %MEM    VSZ  RSS TT STAT STARTED    TIME COMMAND
                    489:     root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
                    490:     root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
                    491:     root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
                    492:     jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
                    493:     root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
                    494:     jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh jrf.odpn.net
                    495:     jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
                    496:     root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
                    497:     nobody  5032  0.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd
                    498:     ...
                    499: 
Again, most of the fields are self-explanatory, with the exception of `VSZ` and
`RSS`, which can be a little confusing. `RSS` is the resident set size of a
process in 1024-byte units, while `VSZ` is the virtual size. This is all great,
but again, how can ps help? Well, for one, take a look at this modified version
of the same output:
                    505: 
                    506:     # ps aux
                    507:     USER     PID %CPU %MEM    VSZ  RSS TT STAT STARTED    TIME COMMAND
                    508:     root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
                    509:     root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
                    510:     root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
                    511:     jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
                    512:     root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
                    513:     jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh jrf.odpn.net
                    514:     jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
                    515:     root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
                    516:     nobody  5032  9.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd
                    517:     ...
                    518: 
Given that our baseline on this server indicates a relatively quiet system,
PID 5032 has an unusually large `%CPU`. Sometimes this can also cause high
`TIME` numbers. The output of ps can be grepped for PIDs, user names, and
process names, and can hence help track down processes that may be
experiencing problems.
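For example, the suspect httpd above could be isolated by name or by owner
(the process name and user here are just the ones from the example output):

```shell
# Isolate processes by name or owner; "httpd" and "nobody" come
# from the example output and may differ on another system.
ps aux | grep '[h]ttpd'   # bracket trick keeps grep out of the listing
ps -U nobody              # all processes owned by one user
```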
                    524: 
                    525: ### vmstat
                    526: 
                    527: Using
                    528: [vmstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?vmstat+1+NetBSD-5.0.1+i386),
                    529: information pertaining to virtual memory can be monitored and measured. Not
                    530: unlike iostat, vmstat can be invoked with a count and interval. Following is
                    531: some sample output using `5 5` like the iostat example:
                    532: 
                    533:     # vmstat 5 5
                    534:      procs   memory     page                       disks         faults      cpu
                    535:      r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
                    536:      0 7 0 17716 33160    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
                    537:      0 7 0 17724 33156    2   0   0    0    0    0  1  0  0  0  109    6   3  0  0 100
                    538:      0 7 0 17724 33156    1   0   0    0    0    0  1  0  0  0  105    6   3  0  0 100
                    539:      0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  107    6   3  0  0 100
                    540:      0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  105    6   3  0  0 100
                    541: 
Yet again, relatively quiet. For comparison, the exact same load that was put
on this server in the iostat example will be used: a large file transfer and
the bonnie benchmark program.
                    545: 
                    546:     # vmstat 5 5
                    547:      procs   memory     page                       disks         faults      cpu
                    548:      r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
                    549:      1 8 0 18880 31968    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
                    550:      0 8 0 18888 31964    2   0   0    0    0    0 130  0  0  0 1804 5539 1094 31 22 47
                    551:      1 7 0 18888 31964    1   0   0    0    0    0 130  0  0  0 1802 5500 1060 36 16 49
                    552:      1 8 0 18888 31964    1   0   0    0    0    0 160  0  0  0 1849 5905 1107 21 22 57
                    553:      1 7 0 18888 31964    1   0   0    0    0    0 175  0  0  0 1893 6167 1082  1 25 75
                    554: 
Just a little different. Notice that, since most of the work was I/O-based, the
actual memory used was not very much. Since this system uses mfs for `/tmp`,
however, it can certainly get beat up. Have a look at this:
                    558: 
                    559:     # vmstat 5 5
                    560:      procs   memory     page                       disks         faults      cpu
                    561:      r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
                    562:      0 2 0 99188   500    2   0   0    0    0    0  1  0  0  0  105   16   4  0  0 100
                    563:      0 2 0111596   436  592   0 587  624  586 1210 624  0  0  0  741  883 1088  0 11 89
                    564:      0 3 0123976   784  666   0 662  643  683 1326 702  0  0  0  828  993 1237  0 12 88
                    565:      0 2 0134692  1236  581   0 571  563  595 1158 599  0  0  0  722  863 1066  0  9 90
                    566:      2 0 0142860   912  433   0 406  403  405  808 429  0  0  0  552  602 768  0  7 93
                    567: 
Pretty scary stuff. That was created by running bonnie in `/tmp` on a
memory-based filesystem. If it had continued for too long, it is possible the
system would have started thrashing. Notice that even though the VM subsystem
was taking a beating, the processors still were not getting too battered.
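Thrashing shows up in the paging columns, which makes captured vmstat output
easy to scan by script. A sketch, assuming the NetBSD column layout shown
above (`po` is field 9, `sr` is field 11) and a hypothetical capture file
made with `vmstat 5 > vmstat.out`:

```shell
# Print any sample with page-out activity from a saved vmstat run;
# fields 9 (po) and 11 (sr) follow the NetBSD layout shown above.
awk 'NR > 2 && $9 > 0 { print "paging: po=" $9 " sr=" $11 }' vmstat.out
```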
                    572: 
                    573: ## Network Tools
                    574: 
Sometimes a performance problem is not a particular machine; it is the network,
or some sort of device on the network such as another host, a router, etc. What
the machines that provide a service or connectivity to a particular NetBSD
system do, and how they act, can have a very large impact on the performance of
the NetBSD system itself, or on the perception of performance by users. A
really great example of this is when a DNS server that a NetBSD machine is
using suddenly disappears: lookups take a long time and eventually fail.
Someone logged into the NetBSD machine who is not experienced would undoubtedly
(provided they had no other evidence) blame the NetBSD system. One of my
personal favorites, *the Internet is broke*, usually means either DNS service
or a router/gateway has dropped offline. Whatever the case may be, a NetBSD
system comes adequately armed to track down network issues, whether they are
the fault of the local system or of something else.
                    588: 
                    589: ### ping
                    590: 
The classic
[ping(8)](http://netbsd.gw.com/cgi-bin/man-cgi?ping+8+NetBSD-5.0.1+i386) utility
can tell us if there is plain connectivity; it can also tell us if host
resolution (depending on how `nsswitch.conf` dictates) is working. Following is
some typical ping output on a local network, with a count of 3 specified:
                    596: 
                    597:     # ping -c 3 marie
                    598:     PING marie (172.16.14.12): 56 data bytes
                    599:     64 bytes from 172.16.14.12: icmp_seq=0 ttl=255 time=0.571 ms
                    600:     64 bytes from 172.16.14.12: icmp_seq=1 ttl=255 time=0.361 ms
                    601:     64 bytes from 172.16.14.12: icmp_seq=2 ttl=255 time=0.371 ms
                    602:     
                    603:     ----marie PING Statistics----
                    604:     3 packets transmitted, 3 packets received, 0.0% packet loss
                    605:     round-trip min/avg/max/stddev = 0.361/0.434/0.571/0.118 ms
                    606: 
Not only does ping tell us if a host is alive, it tells us how long it took and
gives some nice details at the very end. If a host name cannot be resolved, the
IP address can be specified instead:
                    610: 
                    611:     # ping -c 1 172.16.20.5
                    612:     PING ash (172.16.20.5): 56 data bytes
                    613:     64 bytes from 172.16.20.5: icmp_seq=0 ttl=64 time=0.452 ms
                    614:     
                    615:     ----ash PING Statistics----
                    616:     1 packets transmitted, 1 packets received, 0.0% packet loss
                    617:     round-trip min/avg/max/stddev = 0.452/0.452/0.452/0.000 ms
                    618: 
Now, as with any other tool, the times are very subjective, especially in
regard to networking. For example, while the times in the examples are good,
take a look at the localhost ping:
                    622: 
                    623:     # ping -c 4 localhost
                    624:     PING localhost (127.0.0.1): 56 data bytes
                    625:     64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.091 ms
                    626:     64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=0.129 ms
                    627:     64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=0.120 ms
                    628:     64 bytes from 127.0.0.1: icmp_seq=3 ttl=255 time=0.122 ms
                    629:     
                    630:     ----localhost PING Statistics----
                    631:     4 packets transmitted, 4 packets received, 0.0% packet loss
                    632:     round-trip min/avg/max/stddev = 0.091/0.115/0.129/0.017 ms
                    633: 
Much smaller, because the request never left the machine. Pings can be used to
gather information about how well a network is performing. They are also good
for problem isolation: for instance, if there are three NetBSD systems of
relatively similar specifications on a network and one of them simply has
horrible ping times, chances are something is wrong on that one particular
machine.
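When comparing hosts this way, the summary line is the part worth keeping. A
sketch that pulls the average round-trip time out of saved ping output (the
capture file name is illustrative):

```shell
# Extract the average RTT from ping's summary line, e.g.
# "round-trip min/avg/max/stddev = 0.361/0.434/0.571/0.118 ms";
# splitting on "=" and "/" leaves the average in field 6.
awk -F'[=/]' '/round-trip/ { print "avg rtt:", $6, "ms" }' ping.out
```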
                    639: 
                    640: ### traceroute
                    641: 
                    642: The
                    643: [traceroute(8)](http://netbsd.gw.com/cgi-bin/man-cgi?traceroute+8+NetBSD-5.0.1+i386)
                    644: command is great for making sure a path is available or detecting problems on a
                    645: particular path. As an example, here is a trace between the example ftp server
                    646: and ftp.NetBSD.org:
                    647: 
                    648:     # traceroute ftp.NetBSD.org
                    649:     traceroute to ftp.NetBSD.org (204.152.184.75), 30 hops max, 40 byte packets
                    650:      1  208.44.95.1 (208.44.95.1)  1.646 ms  1.492 ms  1.456 ms
                    651:      2  63.144.65.170 (63.144.65.170)  7.318 ms  3.249 ms  3.854 ms
                    652:      3  chcg01-edge18.il.inet.qwest.net (65.113.85.229)  35.982 ms  28.667 ms  21.971 ms
                    653:      4  chcg01-core01.il.inet.qwest.net (205.171.20.1)  22.607 ms  26.242 ms  19.631 ms
                    654:      5  snva01-core01.ca.inet.qwest.net (205.171.8.50)  78.586 ms  70.585 ms  84.779 ms
                    655:      6  snva01-core03.ca.inet.qwest.net (205.171.14.122)  69.222 ms  85.739 ms  75.979 ms
                    656:      7  paix01-brdr02.ca.inet.qwest.net (205.171.205.30)  83.882 ms  67.739 ms  69.937 ms
                    657:      8  198.32.175.3 (198.32.175.3)  72.782 ms  67.687 ms  73.320 ms
                    658:      9  so-1-0-0.orpa8.pf.isc.org (192.5.4.231)  78.007 ms  81.860 ms  77.069 ms
                    659:     10  tun0.orrc5.pf.isc.org (192.5.4.165)  70.808 ms  75.151 ms  81.485 ms
                    660:     11  ftp.NetBSD.org (204.152.184.75)  69.700 ms  69.528 ms  77.788 ms
                    661: 
All in all, not bad. The trace went from the host to the local router, then out
onto the provider's network, and finally out onto the Internet to the final
destination. How to interpret a traceroute is, again, subjective, but
abnormally high times in portions of a path can indicate a bottleneck in a
piece of network equipment. Not unlike ping, if the host itself is suspect,
run traceroute from another host to the same destination. Now, for the worst
case scenario:
                    669: 
                    670:     # traceroute www.microsoft.com
                    671:     traceroute: Warning: www.microsoft.com has multiple addresses; using 207.46.230.220
                    672:     traceroute to www.microsoft.akadns.net (207.46.230.220), 30 hops max, 40 byte packets
                    673:      1  208.44.95.1 (208.44.95.1)  2.517 ms  4.922 ms  5.987 ms
                    674:      2  63.144.65.170 (63.144.65.170)  10.981 ms  3.374 ms  3.249 ms
                    675:      3  chcg01-edge18.il.inet.qwest.net (65.113.85.229)  37.810 ms  37.505 ms  20.795 ms
                    676:      4  chcg01-core03.il.inet.qwest.net (205.171.20.21)  36.987 ms  32.320 ms  22.430 ms
                    677:      5  chcg01-brdr03.il.inet.qwest.net (205.171.20.142)  33.155 ms  32.859 ms  33.462 ms
                    678:      6  205.171.1.162 (205.171.1.162)  39.265 ms  20.482 ms  26.084 ms
                    679:      7  sl-bb24-chi-13-0.sprintlink.net (144.232.26.85)  26.681 ms  24.000 ms  28.975 ms
                    680:      8  sl-bb21-sea-10-0.sprintlink.net (144.232.20.30)  65.329 ms  69.694 ms  76.704 ms
                    681:      9  sl-bb21-tac-9-1.sprintlink.net (144.232.9.221)  65.659 ms  66.797 ms  74.408 ms
                    682:     10  144.232.187.194 (144.232.187.194)  104.657 ms  89.958 ms  91.754 ms
                    683:     11  207.46.154.1 (207.46.154.1)  89.197 ms  84.527 ms  81.629 ms
                    684:     12  207.46.155.10 (207.46.155.10)  78.090 ms  91.550 ms  89.480 ms
                    685:     13  * * *
                    686:     .......
                    687: 
In this case, the trace never reaches the Microsoft server: despite the
harmless warning about multiple addresses, the real problem is that somewhere
along the line a system or router stops replying to the probes (the `* * *`
lines). At that point, one might think to try ping; in the Microsoft case, a
ping gets no reply either, because somewhere on their network ICMP is most
likely disabled.
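Scanning a long trace for the slow spots is easy to script. This sketch flags
any hop whose first probe took more than 50 ms in a saved trace (the threshold
and file name are arbitrary):

```shell
# Flag hops whose first probe time (field 4) exceeds 50 ms in
# saved traceroute output; trace.out and 50 ms are illustrative.
awk '$4 ~ /^[0-9.]+$/ && $4 + 0 > 50 { print "slow hop " $1 ": " $2 " (" $4 " ms)" }' trace.out
```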
                    693: 
                    694: ### netstat
                    695: 
Another problem that can crop up on a NetBSD system is routing table issues.
These issues are not always the system's fault. The
[route(8)](http://netbsd.gw.com/cgi-bin/man-cgi?route+8+NetBSD-5.0.1+i386) and
[netstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?netstat+1+NetBSD-5.0.1+i386)
commands can show information about routes and network connections,
respectively.
                    702: 
                    703: The route command can be used to look at and modify routing tables while netstat
                    704: can display information about network connections and routes. First, here is
                    705: some output from `route show`:
                    706: 
                    707:     # route show
                    708:     Routing tables
                    709:     
                    710:     Internet:
                    711:     Destination      Gateway            Flags
                    712:     default          208.44.95.1        UG
                    713:     loopback         127.0.0.1          UG
                    714:     localhost        127.0.0.1          UH
                    715:     172.15.13.0      172.16.14.37       UG
                    716:     172.16.0.0       link#2             U
                    717:     172.16.14.8      0:80:d3:cc:2c:0    UH
                    718:     172.16.14.10     link#2             UH
                    719:     marie            0:10:83:f9:6f:2c   UH
                    720:     172.16.14.37     0:5:32:8f:d2:35    UH
                    721:     172.16.16.15     link#2             UH
                    722:     loghost          8:0:20:a7:f0:75    UH
                    723:     artemus          8:0:20:a8:d:7e     UH
                    724:     ash              0:b0:d0:de:49:df   UH
                    725:     208.44.95.0      link#1             U
                    726:     208.44.95.1      0:4:27:3:94:20     UH
                    727:     208.44.95.2      0:5:32:8f:d2:34    UH
                    728:     208.44.95.25     0:c0:4f:10:79:92   UH
                    729:     
                    730:     Internet6:
                    731:     Destination      Gateway            Flags
                    732:     default          localhost          UG
                    733:     default          localhost          UG
                    734:     localhost        localhost          UH
                    735:     ::127.0.0.0      localhost          UG
                    736:     ::224.0.0.0      localhost          UG
                    737:     ::255.0.0.0      localhost          UG
                    738:     ::ffff:0.0.0.0   localhost          UG
                    739:     2002::           localhost          UG
                    740:     2002:7f00::      localhost          UG
                    741:     2002:e000::      localhost          UG
                    742:     2002:ff00::      localhost          UG
                    743:     fe80::           localhost          UG
                    744:     fe80::%ex0       link#1             U
                    745:     fe80::%ex1       link#2             U
                    746:     fe80::%lo0       fe80::1%lo0        U
                    747:     fec0::           localhost          UG
                    748:     ff01::           localhost          U
                    749:     ff02::%ex0       link#1             U
                    750:     ff02::%ex1       link#2             U
                    751:     ff02::%lo0       fe80::1%lo0        U
                    752: 
The Flags column shows the status of each route and whether or not it goes
through a gateway. In this case we see `U`, `H`, and `G` (`U` is up, `H` is
host, and `G` is gateway; see the man page for additional flags).
                    756: 
Now for some netstat output, using the `-r` (routing tables) and `-n` (show
numeric network addresses) options:
                    759: 
                    760:     Routing tables
                    761:     
                    762:     Internet:
                    763:     Destination        Gateway            Flags     Refs     Use    Mtu  Interface
                    764:     default            208.44.95.1        UGS         0   330309   1500  ex0
                    765:     127                127.0.0.1          UGRS        0        0  33228  lo0
                    766:     127.0.0.1          127.0.0.1          UH          1     1624  33228  lo0
                    767:     172.15.13/24       172.16.14.37       UGS         0        0   1500  ex1
                    768:     172.16             link#2             UC         13        0   1500  ex1
                    769:     ...
                    770:     Internet6:
                    771:     Destination                   Gateway                   Flags     Refs     Use
                    772:       Mtu  Interface
                    773:     ::/104                        ::1                       UGRS        0        0
                    774:     33228  lo0 =>
                    775:     ::/96                         ::1                       UGRS        0        0
                    776: 
The above output is a little more verbose. So, how can this help? Well, a good
example is when routes between networks get changed while users are connected.
I saw this happen several times when someone was rebooting routers all day
long, once after each change. Several users called up saying they were getting
kicked out and that it was taking very long to log back in. As it turned out,
the clients connecting to the system had been redirected to another router
(which took a very long route) in order to reconnect. I observed the `M` flag,
for "modified dynamically (by redirect)", on their connections. I deleted the
routes, had them reconnect, and summarily followed up with the offending
technician.
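Spotting such redirected entries can be scripted: the `M` flag appears in the
Flags column, so a saved `netstat -rn` capture can be scanned like this (the
capture file name is hypothetical):

```shell
# List routes whose flags contain M (modified dynamically by an
# ICMP redirect) in saved "netstat -rn" output.
awk '$3 ~ /M/ { print $1, "via", $2, "flags", $3 }' netstat-rn.out
```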
                    786: 
                    787: ### tcpdump
                    788: 
Last, and definitely not least, is
[tcpdump(8)](http://netbsd.gw.com/cgi-bin/man-cgi?tcpdump+8+NetBSD-5.0.1+i386),
the network sniffer, which can retrieve a lot of information. This discussion
includes some sample output and an explanation of some of the more useful
options of tcpdump.
                    794: 
                    795: Following is a small snippet of tcpdump in action just as it starts:
                    796: 
                    797:     # tcpdump
                    798:     tcpdump: listening on ex0
                    799:     14:07:29.920651 mail.ssh > 208.44.95.231.3551: P 2951836801:2951836845(44) ack 2
                    800:     476972923 win 17520 <nop,nop,timestamp 1219259 128519450> [tos 0x10]
                    801:     14:07:29.950594 12.125.61.34 >  208.44.95.16: ESP(spi=2548773187,seq=0x3e8c) (DF)
                    802:     14:07:29.983117 smtp.somecorp.com.smtp > 208.44.95.30.42828: . ack 420285166 win
                    803:     16500 (DF)
                    804:     14:07:29.984406 208.44.95.30.42828 > smtp.somecorp.com.smtp: . 1:1376(1375) ack 0
                    805:      win 7431 (DF)
                    806:     ...
                    807: 
Given that this particular server is a mail server, what is shown makes perfect
sense; however, the utility is very verbose. I prefer to initially run tcpdump
with no options and send the text output into a file for later digestion, like
so:
                    812: 
                    813:     # tcpdump > tcpdump.out
                    814:     tcpdump: listening on ex0
                    815: 
So, what precisely in the mishmash are we looking for? In short, anything that
does not seem to fit; for example, messed-up packet lengths (a lot of them,
that is) will show up as improper lengths or malformed packets (basically
garbage). If, however, we are looking for something specific, tcpdump may be
able to help, depending on the problem.
                    821: 
                    822: #### Specific tcpdump Usage
                    823: 
                    824: These are just examples of a few things one can do with tcpdump.
                    825: 
                    826: Look for duplicate IP addresses:
                    827: 
                    828:     tcpdump -e host ip-address
                    829: 
                    830: For example:
                    831: 
                    832:     tcpdump -e host 192.168.0.2
                    833: 
                    834: Routing Problems:
                    835: 
                    836:     tcpdump icmp
                    837: 
                     838: There are plenty of third-party tools available; however, NetBSD ships with a
                     839: good tool set for tracking down network-level performance problems.
                    840: 
                    841: ## Accounting
                    842: 
                     843: The NetBSD system comes equipped with a great many performance monitors for
                     844: active monitoring, but what about long-term monitoring? Of course, the
                     845: output of a variety of commands can be sent to files and re-parsed later with a
                     846: meaningful shell script or program. NetBSD does, by default, offer some
                     847: extraordinarily powerful low-level monitoring tools for the programmer,
                     848: administrator, or really astute hobbyist.
                    849: 
                    850: ### Accounting
                    851: 
                    852: While accounting gives system usage at an almost userland level, kernel
                    853: profiling with gprof provides explicit system call usage.
                    854: 
                     855: Using the accounting tools can help figure out what possible performance
                     856: problems may be lying in wait, such as increased usage of compilers or network
                     857: services.
                    858: 
                     859: Starting accounting is actually fairly simple: as root, use the
                    860: [accton(8)](http://netbsd.gw.com/cgi-bin/man-cgi?accton+8+NetBSD-5.0.1+i386)
                    861: command. The syntax to start accounting is: `accton filename`
                    862: 
                     863: Accounting information is then appended to filename. The lastcomm
                     864: command, which reads from an accounting output file, looks in
                     865: `/var/account/acct` by default, so I tend to just use the default location;
                     866: however, lastcomm can be told to look elsewhere.
                    867: 
                    868: To stop accounting, simply type accton with no arguments.
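
Rather than starting accton by hand after every boot, the rc framework can do
it. The following is a sketch of the relevant `/etc/rc.conf` setting; the
`accounting` variable is handled by the stock `/etc/rc.d/accounting` script:

```shell
# /etc/rc.conf -- enable process accounting at boot; the rc.d script
# creates /var/account/acct if needed and runs accton on it.
accounting=YES
```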
                    869: 
                    870: ### Reading Accounting Information
                    871: 
                    872: To read accounting information, there are two tools that can be used:
                    873: 
                    874:  * [lastcomm(1)](http://netbsd.gw.com/cgi-bin/man-cgi?lastcomm+1+NetBSD-5.0.1+i386)
                    875:  * [sa(8)](http://netbsd.gw.com/cgi-bin/man-cgi?sa+8+NetBSD-5.0.1+i386)
                    876: 
                    877: #### lastcomm
                    878: 
                     879: The lastcomm command shows the last commands executed, in order, for the
                     880: entire system. It can, however, select by user; here is some sample output:
                    881: 
                    882:     $ lastcomm jrf
                    883:     last       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:39 (0:00:00.02)
                    884:     man        -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
                    885:     sh         -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
                    886:     less       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
                    887:     lastcomm   -       jrf      ttyp3      0.02 secs Tue Sep  3 14:38 (0:00:00.02)
                    888:     stty       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
                    889:     tset       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:01.05)
                    890:     hostname   -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
                    891:     ls         -       jrf      ttyp0      0.00 secs Tue Sep  3 14:36 (0:00:00.00)
                    892:     ...
                    893: 
                     894: Pretty nice. The lastcomm command gets its information from the default
                     895: location of `/var/account/acct`; however, using the -f option, another file
                     896: may be specified.
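
On a busy system the raw listing is more useful after a little summarizing.
Here is a minimal sketch, assuming the output of lastcomm has been saved to a
file (`lastcomm.out` is a name chosen purely for illustration); it tallies
invocations per command:

```shell
# Tally how many times each command appears in saved lastcomm output.
# Field 1 of each lastcomm line is the command name.
awk '{ count[$1]++ }
     END { for (c in count) print count[c], c }' lastcomm.out | sort -rn
```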
                    897: 
                     898: As may seem obvious, the output of lastcomm could get a little heavy on large
                     899: multi-user systems. That is where sa comes into play.
                    900: 
                    901: #### sa
                    902: 
                     903: The sa command (meaning "print system accounting statistics") can be used to
                     904: summarize and maintain accounting information. It can also be used
                     905: interactively to create reports. Following is the default output of sa:
                    906: 
                    907:     $ sa
                    908:           77       18.62re        0.02cp        8avio        0k
                    909:            3        4.27re        0.01cp       45avio        0k   ispell
                    910:            2        0.68re        0.00cp       33avio        0k   mutt
                    911:            2        1.09re        0.00cp       23avio        0k   vi
                    912:           10        0.61re        0.00cp        7avio        0k   ***other
                    913:            2        0.01re        0.00cp       29avio        0k   exim
                    914:            4        0.00re        0.00cp        8avio        0k   lastcomm
                    915:            2        0.00re        0.00cp        3avio        0k   atrun
                    916:            3        0.03re        0.00cp        1avio        0k   cron*
                    917:            5        0.02re        0.00cp       10avio        0k   exim*
                    918:           10        3.98re        0.00cp        2avio        0k   less
                    919:           11        0.00re        0.00cp        0avio        0k   ls
                    920:            9        3.95re        0.00cp       12avio        0k   man
                    921:            2        0.00re        0.00cp        4avio        0k   sa
                    922:           12        3.97re        0.00cp        1avio        0k   sh
                    923:     ...
                    924: 
                     925: From left to right, the fields are: total times called, real time in minutes,
                     926: sum of user and system time in minutes, average number of I/O operations per
                     927: execution, average memory usage, and command name.
                    928: 
                     929: The sa command can also be used to create summary files or reports based on a
                     930: number of options. For example, here is the output when sorting by CPU-time
                     931: average memory usage (the -k option):
                    932: 
                    933:     $ sa -k
                    934:           86       30.81re        0.02cp        8avio        0k
                    935:           10        0.61re        0.00cp        7avio        0k   ***other
                    936:            2        0.00re        0.00cp        3avio        0k   atrun
                    937:            3        0.03re        0.00cp        1avio        0k   cron*
                    938:            2        0.01re        0.00cp       29avio        0k   exim
                    939:            5        0.02re        0.00cp       10avio        0k   exim*
                    940:            3        4.27re        0.01cp       45avio        0k   ispell
                    941:            4        0.00re        0.00cp        8avio        0k   lastcomm
                    942:           12        8.04re        0.00cp        2avio        0k   less
                    943:           13        0.00re        0.00cp        0avio        0k   ls
                    944:           11        8.01re        0.00cp       12avio        0k   man
                    945:            2        0.68re        0.00cp       33avio        0k   mutt
                    946:            3        0.00re        0.00cp        4avio        0k   sa
                    947:           14        8.03re        0.00cp        1avio        0k   sh
                    948:            2        1.09re        0.00cp       23avio        0k   vi
                    949: 
                    950: The sa command is very helpful on larger systems.
                    951: 
                    952: ### How to Put Accounting to Use
                    953: 
                     954: Accounting reports, as was mentioned earlier, offer a way to help predict
                     955: trends. For example, seeing cc and make used more and more on a system
                     956: may indicate that in a few months some changes will need to be made to keep the
                     957: system running at an optimum level. Another good example is web server usage:
                     958: if it begins to gradually increase, some sort of action may need to be taken
                     959: before it becomes a problem. Luckily, with accounting tools, said actions can be
                     960: reasonably predicted and planned for ahead of time.
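
Trend-spotting of this sort can be partly automated. The following is a minimal
sketch, assuming two sa reports have been saved over time (`sa.old` and
`sa.new` are names chosen here for illustration); it prints how the call count
of each command changed between the two reports:

```shell
# Compare per-command call counts between two saved "sa" reports.
# Column 1 is the number of calls and column 6 the command name; the
# summary line (no name) and headers are skipped by requiring 6 fields.
awk 'NF == 6 {
         if (NR == FNR) { old[$6] = $1; next }
         printf "%-12s %6d -> %6d\n", $6, old[$6], $1
     }' sa.old sa.new
```

A command whose count keeps climbing from report to report is exactly the kind
of trend the text above describes.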
                    961: 
                    962: ## Kernel Profiling
                    963: 
                     964: Profiling a kernel is normally employed when the goal is to compare the
                     965: behavior of a changed kernel against a previous one, or to track down some
                     966: sort of low-level performance problem.
                    967: behavior are recorded independently: function call frequency and time spent in
                    968: each function.
                    969: 
                    970: ### Getting Started
                    971: 
                     972: First, take a look at both [[Kernel Tuning|guide/tuning#kernel]] and [[Compiling
                     973: the kernel|guide/kernel]]. The only difference in procedure for setting up a
                     974: kernel with profiling enabled is to add the `-p` option when you run config. The
                     975: build area is `../compile/<KERNEL_NAME>.PROF`; for example, a GENERIC kernel
                     976: would be `../compile/GENERIC.PROF`.
                    977: 
                     978: Following is a quick summary of how to compile a kernel with profiling enabled
                     979: on the i386 port. The assumptions are that the appropriate sources are available
                     980: under `/usr/src` and that the GENERIC configuration is being used; of course,
                     981: that may not always be the situation:
                    982: 
                    983:  1. **`cd /usr/src/sys/arch/i386/conf`**
                    984:  2. **`config -p GENERIC`**
                    985:  3. **`cd ../compile/GENERIC.PROF`**
                    986:  4. **`make depend && make`**
                    987:  5. **`cp /netbsd /netbsd.old`**
                    988:  6. **`cp netbsd /`**
                    989:  7. **`reboot`**
                    990: 
                    991: Once the new kernel is in place and the system has rebooted, it is time to turn
                    992: on the monitoring and start looking at results.
                    993: 
                    994: #### Using kgmon
                    995: 
                    996: To start kgmon:
                    997: 
                    998:     $ kgmon -b
                    999:     kgmon: kernel profiling is running.
                   1000: 
                   1001: Next, send the data into the file `gmon.out`:
                   1002: 
                   1003:     $ kgmon -p
                   1004: 
                   1005: Now, it is time to make the output readable:
                   1006: 
                   1007:     $ gprof /netbsd > gprof.out
                   1008: 
                    1009: Since gprof is looking for `gmon.out`, it should find it in the current working
                    1010: directory.
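
The resulting report is long; to skim just the top of the flat profile,
ordinary text tools again suffice. A minimal sketch, assuming `gprof.out` was
produced as above:

```shell
# Show just the top of the flat profile from a saved gprof report:
# print from the flat-profile header line through the first blank line,
# keeping only the first handful of entries.
sed -n '/cumulative/,/^$/p' gprof.out | head -12
```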
                   1011: 
                    1012: By just running kgmon alone, you may not get the information you need; however,
                    1013: if you are comparing the differences between two different kernels, then a known
                    1014: good baseline should be used. Note that it is generally a good idea to stress
                    1015: the subsystem in question, if you know what it is, both in the baseline and in
                    1016: the newer (or different) kernel.
                   1017: 
                   1018: ### Interpretation of kgmon Output
                   1019: 
                    1020: Now that kgmon can run, collect, and parse information, it is time to actually
                    1021: look at some of that information. In this particular instance, a GENERIC kernel
                    1022: was run with profiling enabled for about an hour with only system processes
                    1023: and no adverse load. In the fault insertion section, the example will be large
                    1024: enough that even under a minimal load, detection of the problem should be easy.
                   1025: 
                   1026: #### Flat Profile
                   1027: 
                    1028: The flat profile is a list of functions, the number of times they were called,
                    1029: and how long each took (in seconds). Following is sample output from the quiet
                    1030: system:
                   1031: 
                   1032:     Flat profile:
                   1033:     
                   1034:     Each sample counts as 0.01 seconds.
                   1035:       %   cumulative   self              self     total
                   1036:      time   seconds   seconds    calls  ns/call  ns/call  name
                   1037:      99.77    163.87   163.87                             idle
                   1038:       0.03    163.92     0.05      219 228310.50 228354.34  _wdc_ata_bio_start
                   1039:       0.02    163.96     0.04      219 182648.40 391184.96  wdc_ata_bio_intr
                   1040:       0.01    163.98     0.02     3412  5861.66  6463.02  pmap_enter
                   1041:       0.01    164.00     0.02      548 36496.35 36496.35  pmap_zero_page
                   1042:       0.01    164.02     0.02                             Xspllower
                   1043:       0.01    164.03     0.01   481968    20.75    20.75  gettick
                   1044:       0.01    164.04     0.01     6695  1493.65  1493.65  VOP_LOCK
                   1045:       0.01    164.05     0.01     3251  3075.98 21013.45  syscall_plain
                   1046:     ...
                   1047: 
                    1048: As expected, idle had the highest percentage; however, there were still some
                    1049: things going on. For example, a little further down there is the `vn_lock`
                    1050: function:
                   1051: 
                   1052:     ...
                   1053:       0.00    164.14     0.00     6711     0.00     0.00  VOP_UNLOCK
                   1054:       0.00    164.14     0.00     6677     0.00  1493.65  vn_lock
                   1055:       0.00    164.14     0.00     6441     0.00     0.00  genfs_unlock
                   1056: 
                   1057: This is to be expected, since locking still has to take place, regardless.
                   1058: 
                   1059: #### Call Graph Profile
                   1060: 
                   1061: The call graph is an augmented version of the flat profile showing subsequent
                   1062: calls from the listed functions. First, here is some sample output:
                   1063: 
                   1064:                          Call graph (explanation follows)
                   1065:     
                   1066:     
                   1067:     granularity: each sample hit covers 4 byte(s) for 0.01% of 164.14 seconds
                   1068:     
                   1069:     index % time    self  children    called     name
                   1070:                                                      <spontaneous>
                   1071:     [1]     99.8  163.87    0.00                 idle [1]
                   1072:     -----------------------------------------------
                   1073:                                                      <spontaneous>
                   1074:     [2]      0.1    0.01    0.08                 syscall1 [2]
                   1075:                     0.01    0.06    3251/3251        syscall_plain [7]
                   1076:                     0.00    0.01     414/1660        trap [9]
                   1077:     -----------------------------------------------
                   1078:                     0.00    0.09     219/219         Xintr14 [6]
                   1079:     [3]      0.1    0.00    0.09     219         pciide_compat_intr [3]
                   1080:                     0.00    0.09     219/219         wdcintr [5]
                   1081:     -----------------------------------------------
                   1082:     ...
                   1083: 
                    1084: Now this can be a little confusing. The index number in brackets maps to the
                    1085: trailing number at the end of each line; for example,
                   1086: 
                   1087:     ...
                   1088:                     0.00    0.01      85/85          dofilewrite [68]
                   1089:     [72]     0.0    0.00    0.01      85         soo_write [72]
                   1090:                     0.00    0.01      85/89          sosend [71]
                   1091:     ...
                   1092: 
                    1093: Here we see that dofilewrite was called first; now we can look at
                    1094: index number 64 and see what was happening there:
                   1095: 
                   1096:     ...
                   1097:                     0.00    0.01     101/103         ffs_full_fsync <cycle 6> [58]
                   1098:     [64]     0.0    0.00    0.01     103         bawrite [64]
                   1099:                     0.00    0.01     103/105         VOP_BWRITE [60]
                   1100:     ...
                   1101: 
                   1102: And so on, in this way, a "visual trace" can be established.
                   1103: 
                    1104: At the end of the call graph, right after the terms section, there is an index
                    1105: by function name, which can help map indexes as well.
                   1106: 
                   1107: ### Putting it to Use
                   1108: 
                   1109: In this example, I have modified an area of the kernel I know will create a problem that will be blatantly obvious.
                   1110: 
                   1111: Here is the top portion of the flat profile after running the system for about an hour with little interaction from users:
                   1112: 
                   1113:     Flat profile:
                   1114:     
                   1115:     Each sample counts as 0.01 seconds.
                   1116:       %   cumulative   self              self     total
                   1117:      time   seconds   seconds    calls  us/call  us/call  name
                   1118:      93.97    139.13   139.13                             idle
                   1119:       5.87    147.82     8.69       23 377826.09 377842.52  check_exec
                   1120:       0.01    147.84     0.02      243    82.30    82.30  pmap_copy_page
                   1121:       0.01    147.86     0.02      131   152.67   152.67  _wdc_ata_bio_start
                   1122:       0.01    147.88     0.02      131   152.67   271.85  wdc_ata_bio_intr
                   1123:       0.01    147.89     0.01     4428     2.26     2.66  uvn_findpage
                   1124:       0.01    147.90     0.01     4145     2.41     2.41  uvm_pageactivate
                   1125:       0.01    147.91     0.01     2473     4.04  3532.40  syscall_plain
                   1126:       0.01    147.92     0.01     1717     5.82     5.82  i486_copyout
                   1127:       0.01    147.93     0.01     1430     6.99    56.52  uvm_fault
                   1128:       0.01    147.94     0.01     1309     7.64     7.64  pool_get
                   1129:       0.01    147.95     0.01      673    14.86    38.43  genfs_getpages
                   1130:       0.01    147.96     0.01      498    20.08    20.08  pmap_zero_page
                   1131:       0.01    147.97     0.01      219    45.66    46.28  uvm_unmap_remove
                   1132:       0.01    147.98     0.01      111    90.09    90.09  selscan
                   1133:     ...
                   1134: 
                    1135: As is obvious, there is a large difference in performance. Right off the bat,
                    1136: the idle time is noticeably less. The main difference here is that one
                    1137: particular function has a large time across the board with very few calls:
                    1138: `check_exec`. While at first this may not seem strange if a lot of commands
                    1139: had been executed, when compared to the flat profile of the first measurement,
                    1140: proportionally it does not seem right:
                   1141: 
                   1142:     ...
                   1143:       0.00    164.14     0.00       37     0.00 62747.49  check_exec
                   1144:     ...
                   1145: 
                    1146: In the first measurement the call is made 37 times and performs far better.
                    1147: Obviously something in or around that function is wrong. To eliminate other
                    1148: functions, a look at the call graph can help; here is the first instance of
                    1149: `check_exec`:
                   1150: 
                   1151:     ...
                   1152:     -----------------------------------------------
                   1153:                     0.00    8.69      23/23          syscall_plain [3]
                   1154:     [4]      5.9    0.00    8.69      23         sys_execve [4]
                   1155:                     8.69    0.00      23/23          check_exec [5]
                   1156:                     0.00    0.00      20/20          elf32_copyargs [67]
                   1157:     ...
                   1158: 
                   1159: Notice how the time of 8.69 seems to affect the two previous functions. It is
                   1160: possible that there is something wrong with them, however, the next instance of
                   1161: `check_exec` seems to prove otherwise:
                   1162: 
                   1163:     ...
                   1164:     -----------------------------------------------
                   1165:                     8.69    0.00      23/23          sys_execve [4]
                   1166:     [5]      5.9    8.69    0.00      23         check_exec [5]
                   1167:     ...
                   1168: 
                    1169: Now we can see that the problem, most likely, resides in `check_exec`. Of
                    1170: course, problems are not always this simple; in fact, here is the trivial
                    1171: code that was inserted right after `check_exec` (the function is in
                    1172: `sys/kern/kern_exec.c`):
                   1173: 
                   1174:     ...
                   1175:             /* A Cheap fault insertion */
                   1176:             for (x = 0; x < 100000000; x++) {
                   1177:                     y = x;
                   1178:             }
                    1179:     ...
                   1180: 
                   1181: Not exactly glamorous, but enough to register a large change with profiling.
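
The baseline-versus-modified comparison done by eye above can be partly
automated. Here is a minimal sketch, assuming both flat profiles have been
saved to files; `gprof.base.out` and `gprof.new.out` are illustrative names,
not files produced by any command above:

```shell
# Compare the "self seconds" column (field 3) of two saved gprof flat
# profiles, keyed on the function name in the last field. Header lines
# are skipped by requiring the line to start with a number.
awk '$1 ~ /^[0-9]/ && NF >= 5 {
        if (NR == FNR) { base[$NF] = $3; next }
        if ($NF in base && $3 != base[$NF])
            printf "%-20s %8.2f -> %8.2f\n", $NF, base[$NF], $3
    }' gprof.base.out gprof.new.out
```

Run against the two profiles above, a jump such as `check_exec` going from
0.00 to 8.69 self seconds stands out at once.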
                   1182: 
                   1183: ### Summary
                   1184: 
                    1185: Kernel profiling can be enlightening for anyone and provides a much more refined
                    1186: method of hunting down performance problems that are not as easy to find using
                    1187: conventional means. It is also not nearly as hard as most people think: if you
                    1188: can compile a kernel, you can get profiling to work.
                   1189: 
                   1190: ## System Tuning
                   1191: 
                    1192: Now that monitoring and analysis tools have been addressed, it is time to look
                    1193: into some actual methods. This section addresses tools and methods that affect
                    1194: how the system performs and that are applied without recompiling the kernel;
                    1195: the next section examines kernel tuning by recompiling.
                   1196: 
                   1197: ### Using sysctl
                   1198: 
                    1199: The sysctl utility can be used to look at, and in some cases alter, system
                    1200: parameters. There are so many parameters that can be viewed and changed that
                    1201: they cannot all be shown here; as a first example, here is a simple usage
                    1202: of sysctl to look at the system PATH environment variable:
                   1203: 
                   1204:     $ sysctl user.cs_path
                   1205:     user.cs_path = /usr/bin:/bin:/usr/sbin:/sbin:/usr/pkg/bin:/usr/pkg/sbin:/usr/local/bin:/usr/local/sbin
                   1206: 
                    1207: Fairly simple. Now for something that is actually related to performance. As an
                    1208: example, let's say a system with many users is having file open issues; by
                    1209: examining and perhaps raising the kern.maxfiles parameter, the problem may be
                    1210: fixed. But first, a look:
                   1211: 
                   1212:     $ sysctl kern.maxfiles
                   1213:     kern.maxfiles = 1772
                   1214: 
                   1215: Now, to change it, as root with the -w option specified:
                   1216: 
                   1217:     # sysctl -w kern.maxfiles=1972
                   1218:     kern.maxfiles: 1772 -> 1972
                   1219: 
                    1220: Note that when the system is rebooted, the old value will return. There are two
                    1221: cures for this: first, modify that parameter in the kernel and recompile;
                    1222: second (and simpler), add this line to `/etc/sysctl.conf`:
                   1223: 
                   1224:     kern.maxfiles=1972
                   1225: 
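A slightly fuller `/etc/sysctl.conf` sketch might persist several related
limits at once. The values below are purely illustrative; `kern.maxproc` and
`kern.maxvnodes` are other real tunables named here only by way of example:

```shell
# /etc/sysctl.conf -- settings applied at boot by the rc framework.
# Values are illustrative only; size them for the machine at hand.
kern.maxfiles=1972      # system-wide open file limit
kern.maxproc=1044       # maximum number of simultaneous processes
kern.maxvnodes=8192     # size of the vnode cache
```
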
                   1226: ### tmpfs & mfs
                   1227: 
                   1228: NetBSD's *ramdisk* implementations cache all data in the RAM, and if that is
                   1229: full, the swap space is used as backing store. NetBSD comes with two
                   1230: implementations, the traditional BSD memory-based file system
                   1231: [mfs](http://netbsd.gw.com/cgi-bin/man-cgi?mount_mfs+8+NetBSD-current)
                   1232: and the more modern
                   1233: [tmpfs](http://netbsd.gw.com/cgi-bin/man-cgi?mount_tmpfs+8+NetBSD-current).
                   1234: While the former can only grow in size, the latter can also shrink if space is
                   1235: no longer needed.
                   1236: 
                    1237: Deciding when to use and not to use a memory-based filesystem can be hard on
                    1238: large multi-user systems. In some cases, however, it makes pretty good sense;
                    1239: for example, on a development machine used by only one developer at a time, the
                    1240: obj directory might be a good place, or some of the tmp directories for builds.
                    1241: In a case like that, it makes sense on machines that have a fair amount of RAM.
                    1242: On the other side of the coin, if a system only has 16MB of RAM and `/var/tmp`
                    1243: is mfs-based, severe application issues could occur.
                   1244: 
                    1245: The GENERIC kernel has both tmpfs and mfs enabled by default. To use one on a
                    1246: particular directory, first determine where the swap space you wish to use is
                    1247: located; in the example case, a quick look in `/etc/fstab` indicates that
                    1248: `/dev/wd0b` is the swap partition:
                   1249: 
                   1250:     mail% cat /etc/fstab
                   1251:     /dev/wd0a / ffs rw 1 1
                   1252:     /dev/wd0b none swap sw 0 0
                   1253:     /kern /kern kernfs rw
                   1254: 
                    1255: This system is a mail server, so I only want to use `/tmp` with tmpfs; also, on
                    1256: this particular system, I have linked `/tmp` to `/var/tmp` to save space (they
                    1257: are on the same drive). All I need to do is add the following entry:
                   1258: 
                   1259:     /dev/wd0b /var/tmp tmpfs rw 0 0
                   1260: 
                    1261: If you want to use mfs instead of tmpfs, simply substitute mfs for tmpfs in
                            the entry above.
                   1262: 
                   1263: Now, a word of warning: make sure said directories are empty and nothing is
                   1264: using them when you mount the memory file system! After changing `/etc/fstab`,
                   1265: you can either run `mount -a` or reboot the system.
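
tmpfs mounts can also be bounded rather than left to grow freely. The fstab
line below is a sketch adapted from the examples in the mount_tmpfs(8) manual
page, and is illustrative rather than prescriptive; `-s` sets a size limit and
`-m` the mode of the mount point:

```shell
# /etc/fstab -- mount /tmp as tmpfs, capped at 25% of RAM, mode 1777.
tmpfs /tmp tmpfs rw,-m1777,-sram%25
```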
                   1266: 
                   1267: ### Soft-dependencies
                   1268: 
                    1269: Soft-dependencies (softdeps) are a mechanism by which metadata is not written
                    1270: to disk immediately, but is written in an ordered fashion, which keeps the
                    1271: filesystem consistent in case of a crash. The main benefit of softdeps is
                    1272: processing speed. Soft-dependencies have some sharp edges, so beware! Also note
                    1273: that soft-dependencies are not present in any releases past 5.x. See
                    1274: [[Journaling|guide/tuning#system-logging]] for information about WAPBL, which is
                    1275: the replacement for soft-dependencies.
                   1276: 
                   1277: Soft-dependencies can be enabled by adding `softdep` to the filesystem options
                   1278: in `/etc/fstab`. Let's look at an example of `/etc/fstab`:
                   1279: 
                   1280:     /dev/wd0a / ffs rw 1 1
                   1281:     /dev/wd0b none swap sw 0 0
                   1282:     /dev/wd0e /var ffs rw 1 2
                   1283:     /dev/wd0f /tmp ffs rw 1 2
                   1284:     /dev/wd0g /usr ffs rw 1 2
                   1285: 
                    1286: Suppose we want to enable soft-dependencies for all file systems except the
                    1287: `/` partition. We would change it to:
                   1288: 
                   1289:     /dev/wd0a / ffs rw 1 1
                   1290:     /dev/wd0b none swap sw 0 0
                   1291:     /dev/wd0e /var ffs rw,softdep 1 2
                   1292:     /dev/wd0f /tmp ffs rw,softdep 1 2
                   1293:     /dev/wd0g /usr ffs rw,softdep 1 2
                   1294: 
                   1295: More information about softdep capabilities can be found on the
                   1296: [author's page](http://www.mckusick.com/softdep/index.html).
                   1297: 
                   1298: ### Journaling
                   1299: 
Journaling is a mechanism which writes data to a so-called *journal* first,
and only in a second step writes the data from the journal to disk. In the
event of a system crash, data that was not yet written to disk but is still in
the journal can be replayed, bringing the disk back into a consistent state.
The main effect of this is that no file system check (fsck) is needed after an
unclean shutdown. As of 5.0, NetBSD includes WAPBL, which provides journaling
for FFS.
                   1306: 
                   1307: Journaling can be enabled by adding `log` to the filesystem options in
                   1308: `/etc/fstab`. Here is an example which enables journaling for the root (`/`),
                   1309: `/var`, and `/usr` file systems:
                   1310: 
                   1311:     /dev/wd0a /    ffs rw,log 1 1
                   1312:     /dev/wd0e /var ffs rw,log 1 2
                   1313:     /dev/wd0g /usr ffs rw,log 1 2
                   1314: 
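The
[wapbl(4)](http://netbsd.gw.com/cgi-bin/man-cgi?wapbl+4+NetBSD-5.0.1+i386)
man page also describes enabling the journal on an already-mounted file system
via mount's update flag; as a sketch (check the man page on your release
before relying on this):

    # mount -u -o log /usr
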
                   1315: ### LFS
                   1316: 
LFS, the log-structured file system, writes data to disk in a way that is
sometimes too aggressive and leads to congestion. To throttle writing, the
following sysctl variables can be used:
                   1320: 
                   1321:     vfs.sync.delay
                   1322:     vfs.sync.filedelay
                   1323:     vfs.sync.dirdelay
                   1324:     vfs.sync.metadelay
                   1325:     vfs.lfs.flushindir
                   1326:     vfs.lfs.clean_vnhead
                   1327:     vfs.lfs.dostats
                   1328:     vfs.lfs.pagetrip
                   1329:     vfs.lfs.stats.segsused
                   1330:     vfs.lfs.stats.psegwrites
                   1331:     vfs.lfs.stats.psyncwrites
                   1332:     vfs.lfs.stats.pcleanwrites
                   1333:     vfs.lfs.stats.blocktot
                   1334:     vfs.lfs.stats.cleanblocks
                   1335:     vfs.lfs.stats.ncheckpoints
                   1336:     vfs.lfs.stats.nwrites
                   1337:     vfs.lfs.stats.nsync_writes
                   1338:     vfs.lfs.stats.wait_exceeded
                   1339:     vfs.lfs.stats.write_exceeded
                   1340:     vfs.lfs.stats.flush_invoked
                   1341:     vfs.lfs.stats.vflush_invoked
                   1342:     vfs.lfs.stats.clean_inlocked
                   1343:     vfs.lfs.stats.clean_vnlocked
                   1344:     vfs.lfs.stats.segs_reclaimed
                   1345:     vfs.lfs.ignore_lazy_sync
                   1346: 
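For example, one of the delay values can be inspected and adjusted at run time
with
[sysctl(8)](http://netbsd.gw.com/cgi-bin/man-cgi?sysctl+8+NetBSD-5.0.1+i386);
the value 15 below is purely illustrative, not a recommendation:

    # sysctl vfs.sync.dirdelay
    # sysctl -w vfs.sync.dirdelay=15
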
                   1347: Besides tuning those parameters, disabling write-back caching on
                   1348: [wd(4)](http://netbsd.gw.com/cgi-bin/man-cgi?wd+4+NetBSD-5.0.1+i386) devices may
                   1349: be beneficial. See the
                   1350: [dkctl(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dkctl+8+NetBSD-5.0.1+i386) man
                   1351: page for details.
                   1352: 
                   1353: More is available in the NetBSD mailing list archives. See
                   1354: [this](http://mail-index.NetBSD.org/tech-perform/2007/04/01/0000.html) and
                   1355: [this](http://mail-index.NetBSD.org/tech-perform/2007/04/01/0001.html) mail.
                   1356: 
                   1357: ## Kernel Tuning
                   1358: 
While many system parameters can be changed with sysctl, much can also be
gained from enhanced system software, a sensible system layout, and good
management of services (moving them in and out of inetd, for example). Tuning
the kernel, however, will provide better performance, even if the gains
sometimes appear marginal.
                   1364: 
                   1365: ### Preparing to Recompile a Kernel
                   1366: 
First, get the kernel sources for the release as described in
[[Obtaining the sources|guide/fetch]]; reading
[[Compiling the kernel|guide/kernel]] for more information on building the
kernel is recommended. Note that this document can also be used when tuning
-current; however, the
[[Tracking -current|tracking_current]] documentation should be read first, as
much of the information there is repeated here.
                   1374: 
                   1375: ### Configuring the Kernel
                   1376: 
Configuring a kernel in NetBSD can be daunting because of multi-line
dependencies within the configuration file itself. In return, the method has a
benefit: all it really takes to configure a new kernel is an ASCII editor and
some dmesg output. The kernel configuration file is under
`src/sys/arch/ARCH/conf`, where ARCH is your architecture (for example, on a
SPARC it would be under `src/sys/arch/sparc/conf`).
                   1383: 
                   1384: After you have located your kernel config file, copy it and remove (comment out)
                   1385: all the entries you don't need. This is where
                   1386: [dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
                   1387: becomes your friend. A clean
                   1388: [dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)-output
                   1389: will show all of the devices detected by the kernel at boot time. Using
                   1390: [dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
                   1391: output, the device options really needed can be determined.
                   1392: 
                   1393: #### Some example Configuration Items
                   1394: 
In this example, an FTP server's kernel is being reconfigured to run with the
bare minimum of drivers and options, plus any other items that might make it
run faster (again, not necessarily smaller, although it will be that too). The
first thing to do is to look at some of the main configuration items. So, in
`/usr/src/sys/arch/i386/conf` the GENERIC file is copied to FTP, and the file
FTP is then edited.
                   1401: 
At the start of the file there are a bunch of options beginning with maxusers,
which will be left alone; on larger multi-user systems, however, it might help
to increase that value a bit. Next is CPU support; looking at the dmesg output,
this is seen:
                   1406: 
                   1407:     cpu0: Intel Pentium II/Celeron (Deschutes) (686-class), 400.93 MHz
                   1408: 
This indicates that only the `I686_CPU` option needs to be used. In the next
section, all options are left alone except `PIC_DELAY`, which is recommended
unless the machine is an older one. In this case it is enabled, since the 686
is *relatively new*.
                   1413: 
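With only 686-class support needed, the CPU section of the FTP config ends up
looking roughly like this (option names as they appear in the i386 GENERIC
file; older releases may also list `I386_CPU`):

    #options        I486_CPU
    #options        I586_CPU
    options         I686_CPU
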
From that section all the way down to the compat options there was no need to
change anything on this particular system. In the compat section, however,
there are several options that do not need to be enabled; since this machine
is strictly an FTP server, all compat options were turned off.
                   1419: 
The next section covers file systems; again, for this server only a few need
to be enabled. The following were left on:
                   1422: 
                   1423:     # File systems
                   1424:     file-system     FFS             # UFS
                   1425:     file-system     LFS             # log-structured file system
                   1426:     file-system     MFS             # memory file system
                   1427:     file-system     CD9660          # ISO 9660 + Rock Ridge file system
                   1428:     file-system     FDESC           # /dev/fd
                   1429:     file-system     KERNFS          # /kern
                   1430:     file-system     NULLFS          # loopback file system
                   1431:     file-system     PROCFS          # /proc
                   1432:     file-system     UMAPFS          # NULLFS + uid and gid remapping
                   1433:     ...
                   1434:     options         SOFTDEP         # FFS soft updates support.
                   1435:     ...
                   1436: 
                   1437: Next comes the network options section. The only options left on were:
                   1438: 
                   1439:     options         INET            # IP + ICMP + TCP + UDP
                   1440:     options         INET6           # IPV6
                   1441:     options         IPFILTER_LOG    # ipmon(8) log support
                   1442: 
                   1443: `IPFILTER_LOG` is a nice one to have around since the server will be running
                   1444: ipf.
                   1445: 
The next section contains verbose messages for various subsystems. Since this
machine is already running and has had no major problems, all of them are
commented out.
                   1448: 
                   1449: #### Some Drivers
                   1450: 
The configurable items in the config file are relatively few and easy to
cover; device drivers, however, are a different story. In the following
examples, two drivers are examined and their associated *areas* in the file
trimmed down. First, a small example: the CD-ROM drive, which appears in dmesg
as the following lines:
                   1455: 
                   1456:     ...
                   1457:     cd0 at atapibus0 drive 0: <CD-540E, , 1.0A> type 5 cdrom removable
                   1458:     cd0: 32-bit data port
                   1459:     cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
                   1460:     pciide0: secondary channel interrupting at irq 15
                   1461:     cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfer
                   1462:     ...
                   1463: 
Now it is time to track that section down in the configuration file. Notice
that the `cd` drive is on an ATAPI bus and requires pciide support. The
section of interest in this case is the kernel config's "IDE and related
devices" section. It is worth noting that in and around the IDE section there
are also ISA, PCMCIA, etc. sections; on this machine the
[dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
output shows no PCMCIA devices, so it stands to reason that all PCMCIA
references can be removed. But first, the `cd` drive.
                   1472: 
                   1473: At the start of the IDE section is the following:
                   1474: 
                   1475:     ...
                   1476:     wd*     at atabus? drive ? flags 0x0000
                   1477:     ...
                   1478:     atapibus* at atapi?
                   1479:     ...
                   1480: 
                   1481: Well, it is pretty obvious that those lines need to be kept. Next is this:
                   1482: 
                   1483:     ...
                   1484:     # ATAPI devices
                   1485:     # flags have the same meaning as for IDE drives.
                   1486:     cd*     at atapibus? drive ? flags 0x0000       # ATAPI CD-ROM drives
                   1487:     sd*     at atapibus? drive ? flags 0x0000       # ATAPI disk drives
                   1488:     st*     at atapibus? drive ? flags 0x0000       # ATAPI tape drives
                   1489:     uk*     at atapibus? drive ? flags 0x0000       # ATAPI unknown
                   1490:     ...
                   1491: 
The only one of these device types that appeared in the
[dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
output was the cd; the rest can be commented out.
                   1495: 
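After trimming, that part of the FTP config would look like this:

    # ATAPI devices
    # flags have the same meaning as for IDE drives.
    cd*     at atapibus? drive ? flags 0x0000       # ATAPI CD-ROM drives
    #sd*    at atapibus? drive ? flags 0x0000       # ATAPI disk drives
    #st*    at atapibus? drive ? flags 0x0000       # ATAPI tape drives
    #uk*    at atapibus? drive ? flags 0x0000       # ATAPI unknown
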
                   1496: The next example is slightly more difficult, network interfaces. This machine
                   1497: has two of them:
                   1498: 
                   1499:     ...
                   1500:     ex0 at pci0 dev 17 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x64)
                   1501:     ex0: interrupting at irq 10
                   1502:     ex0: MAC address 00:50:04:83:ff:b7
                   1503:     UI 0x001018 model 0x0012 rev 0 at ex0 phy 24 not configured
                   1504:     ex1 at pci0 dev 19 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
                   1505:     ex1: interrupting at irq 11
                   1506:     ex1: MAC address 00:50:da:63:91:2e
                   1507:     exphy0 at ex1 phy 24: 3Com internal media interface
                   1508:     exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
                   1509:     ...
                   1510: 
At first glance it may appear that there are in fact three devices; however, a
closer look at this line:

    exphy0 at ex1 phy 24: 3Com internal media interface

reveals that there are only two physical cards. As with the CD-ROM, simply
removing the names that do not appear in dmesg will do the job. At the
beginning of the network interfaces section is:
                   1519: 
                   1520:     ...
                   1521:     # Network Interfaces
                   1522:     
                   1523:     # PCI network interfaces
                   1524:     an*     at pci? dev ? function ?        # Aironet PC4500/PC4800 (802.11)
                   1525:     bge*    at pci? dev ? function ?        # Broadcom 570x gigabit Ethernet
                   1526:     en*     at pci? dev ? function ?        # ENI/Adaptec ATM
                   1527:     ep*     at pci? dev ? function ?        # 3Com 3c59x
                   1528:     epic*   at pci? dev ? function ?        # SMC EPIC/100 Ethernet
                   1529:     esh*    at pci? dev ? function ?        # Essential HIPPI card
                   1530:     ex*     at pci? dev ? function ?        # 3Com 90x[BC]
                   1531:     ...
                   1532: 
There is the ex device, so all of the rest of the PCI section can be removed.
Additionally, every single line all the way down to this one:

    exphy*  at mii? phy ?                   # 3Com internal PHYs

can be commented out, as can the remaining lines of the section.
                   1539: 
                   1540: #### Multi Pass
                   1541: 
When I tune a kernel, I like to do it remotely in an X session: in one window
the dmesg output, in the other the config file. It can sometimes take a few
passes to rebuild a heavily trimmed kernel, since it is easy to accidentally
remove dependencies.
                   1546: 
                   1547: ### Building the New Kernel
                   1548: 
Now it is time to build the kernel and put it in place. In the conf directory
on the FTP server, the following command prepares the build:

    $ config FTP

When it is done, a message reminding you to run `make depend` is displayed.
Next:
                   1555: 
                   1556:     $ cd ../compile/FTP
                   1557:     $ make depend && make
                   1558: 
                   1559: When it is done, I backup the old kernel and drop the new one in place:
                   1560: 
                   1561:     # cp /netbsd /netbsd.orig
                   1562:     # cp netbsd /
                   1563: 
                   1564: Now reboot. If the kernel cannot boot, stop the boot process when prompted and
                   1565: type `boot netbsd.orig` to boot from the previous kernel.
                   1566: 
                   1567: ### Shrinking the NetBSD kernel
                   1568: 
When building a kernel for embedded systems, it is often necessary to modify
the kernel binary to reduce its disk or memory footprint.
                   1571: 
                   1572: #### Removing ELF sections and debug information
                   1573: 
We already know how to remove kernel support for drivers and options that you
don't need, thus saving memory and space, but you can save some additional
kilobytes of space by removing debugging symbols and two ELF sections, if you
don't need them: `.comment` and `.ident`. They are used for storing RCS strings
viewable with
[ident(1)](http://netbsd.gw.com/cgi-bin/man-cgi?ident+1+NetBSD-5.0.1+i386) and a
[gcc(1)](http://netbsd.gw.com/cgi-bin/man-cgi?gcc+1+NetBSD-5.0.1+i386) version
string. The following examples assume you have your `TOOLDIR` under
`/usr/src/tooldir.NetBSD-2.0-i386` and that the target architecture is `i386`.
                   1582: 
                   1583:     $ /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-objdump -h /netbsd
                   1584:     
                   1585:     /netbsd:     file format elf32-i386
                   1586:     
                   1587:     Sections:
                   1588:     Idx Name          Size      VMA       LMA       File off  Algn
                   1589:       0 .text         0057a374  c0100000  c0100000  00001000  2**4
                   1590:                       CONTENTS, ALLOC, LOAD, READONLY, CODE
                   1591:       1 .rodata       00131433  c067a380  c067a380  0057b380  2**5
                   1592:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1593:       2 .rodata.str1.1 00035ea0  c07ab7b3  c07ab7b3  006ac7b3  2**0
                   1594:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1595:       3 .rodata.str1.32 00059d13  c07e1660  c07e1660  006e2660  2**5
                   1596:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1597:       4 link_set_malloc_types 00000198  c083b374  c083b374  0073c374  2**2
                   1598:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1599:       5 link_set_domains 00000024  c083b50c  c083b50c  0073c50c  2**2
                   1600:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1601:       6 link_set_pools 00000158  c083b530  c083b530  0073c530  2**2
                   1602:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1603:       7 link_set_sysctl_funcs 000000f0  c083b688  c083b688  0073c688  2**2
                   1604:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1605:       8 link_set_vfsops 00000044  c083b778  c083b778  0073c778  2**2
                   1606:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1607:       9 link_set_dkwedge_methods 00000004  c083b7bc  c083b7bc  0073c7bc  2**2
                   1608:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1609:      10 link_set_bufq_strats 0000000c  c083b7c0  c083b7c0  0073c7c0  2**2
                   1610:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1611:      11 link_set_evcnts 00000030  c083b7cc  c083b7cc  0073c7cc  2**2
                   1612:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
                   1613:      12 .data         00048ae4  c083c800  c083c800  0073c800  2**5
                   1614:                       CONTENTS, ALLOC, LOAD, DATA
                   1615:      13 .bss          00058974  c0885300  c0885300  00785300  2**5
                   1616:                       ALLOC
                   1617:      14 .comment      0000cda0  00000000  00000000  00785300  2**0
                   1618:                       CONTENTS, READONLY
                   1619:      15 .ident        000119e4  00000000  00000000  007920a0  2**0
                   1620:                       CONTENTS, READONLY
                   1621: 
In the third column we can see the size of each section in hexadecimal. By
summing the `.comment` and `.ident` sizes we know how much their removal will
save: around 120 KB (52640 + 72164 = 0xcda0 + 0x119e4 = 124804 bytes). To
remove the sections and any debugging symbols that may be present, we use
[strip(1)](http://netbsd.gw.com/cgi-bin/man-cgi?strip+1+NetBSD-5.0.1+i386):
                   1627: 
                   1628:     # cp /netbsd /netbsd.orig
                   1629:     # /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-strip -S -R .ident -R .comment /netbsd
                   1630:     # ls -l /netbsd /netbsd.orig
                   1631:     -rwxr-xr-x  1 root  wheel  8590668 Apr 30 15:56 netbsd
                   1632:     -rwxr-xr-x  1 root  wheel  8757547 Apr 30 15:56 netbsd.orig
                   1633: 
Since we also removed debugging symbols, the total amount of disk space saved
is around 160 KB.
                   1636: 
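The arithmetic above can be double-checked with plain shell arithmetic; the
two hexadecimal sizes are taken from the objdump output earlier:

```shell
# sizes of the .comment and .ident sections from the objdump output above
comment=$((0xcda0))            # 52640 bytes
ident=$((0x119e4))             # 72164 bytes
total=$((comment + ident))
# print the total in bytes and (truncated) kilobytes
echo "$total bytes (~$((total / 1024)) KB)"
```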
                   1637: #### Compressing the Kernel
                   1638: 
On some architectures, the bootloader can boot a compressed kernel. You can
save several megabytes of disk space by using this method, but the bootloader
will take longer to load the kernel.
                   1642: 
                   1643:     # cp /netbsd /netbsd.plain
                   1644:     # gzip -9 /netbsd
                   1645: 
                   1646: To see how much space we've saved:
                   1647: 
                   1648:     $ ls -l /netbsd.plain /netbsd.gz
                   1649:     -rwxr-xr-x  1 root  wheel  8757547 Apr 29 18:05 /netbsd.plain
                   1650:     -rwxr-xr-x  1 root  wheel  3987769 Apr 29 18:05 /netbsd.gz
                   1651: 
Note that only gzip compression, via
[gzip(1)](http://netbsd.gw.com/cgi-bin/man-cgi?gzip+1+NetBSD-5.0.1+i386), can
be used; bzip2 is not supported by the NetBSD bootloaders!
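
As a quick, portable illustration of the same trade-off, the sketch below
compresses a dummy file instead of a real kernel (zero-filled data compresses
far better than kernel code, but the mechanics are identical):

```shell
# create a 512 KB dummy file standing in for a kernel image
dd if=/dev/zero of=demo.bin bs=1024 count=512 2>/dev/null
# compress with maximum effort, keeping the original for comparison
gzip -9 -c demo.bin > demo.bin.gz
# compare the sizes of the original and the compressed copy
ls -l demo.bin demo.bin.gz
```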
                   1655: 
