File: [NetBSD Developer Wiki] / wikisrc / guide / tuning.mdwn
Revision 1.1, Mon Mar 4 21:21:18 2013 UTC, by jdf: Migrate tuning part from the guide to the wiki

# Tuning NetBSD

## Introduction

### Overview

This section covers a variety of performance tuning topics. It attempts to
span tuning from the perspective of the system administrator to that of the
systems programmer. The art of performance tuning itself is very old. To tune
something means to make it operate more efficiently. Whether one is referring
to a NetBSD-based server or a vacuum cleaner, the goal is the same: to improve
something, whether that be the way it is done, how it works, or how it is put
together.

#### What is Performance Tuning?

A view from 10,000 feet pretty much dictates that everything we do is task
oriented, and this pertains to a NetBSD system as well. When the system boots,
it automatically begins to perform a variety of tasks. When a user logs in,
they usually have a wide variety of tasks to accomplish. In the scope of these
documents, however, performance tuning strictly means improving how
efficiently a NetBSD system performs.

The most common thought that crops into someone's mind when they think
"tuning" is some sort of speed increase, or decreasing the size of the kernel.
While those are ways to improve performance, they are not the only ends an
administrator may pursue to increase efficiency. For our purposes, performance
tuning means this: *to make a NetBSD system operate in an optimum state.*

That could mean a variety of things, and not necessarily speed enhancements. A
good example is filesystem formatting parameters: on a system that has a lot
of small files (say, a source repository), an administrator may need to
increase the number of inodes by lowering the bytes-per-inode ratio (say, down
to 1024 bytes). Nothing got faster here, but the larger number of inodes keeps
the administrator from getting those nasty "out of inodes" messages, which
ultimately makes the system more efficient.
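
On FFS, that density is set when the filesystem is created; a minimal sketch
with newfs(8) (the disk device name here is purely illustrative):

    # one inode per 1024 bytes of data space instead of the default
    newfs -i 1024 /dev/rwd0a

The `-i` argument is the number of bytes of data space per inode, so lowering
it yields more inodes.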

Tuning normally revolves around finding and eliminating bottlenecks. Most of
the time, such bottlenecks are spurious; for example, a release of Mozilla
that does not quite handle Java applets well can cause Mozilla to start
crunching the CPU, especially on applets that are not written well. Occasions
when processes seem to spin off into nowhere and eat CPU are almost always
resolved with a kill. There are instances, however, when resolving bottlenecks
takes a lot longer; for example, an rsynced server that just keeps getting
larger and larger. Slowly, performance begins to fade, and the administrator
may have to take some sort of action to speed things up. That situation is
mild, though, relative to an emergency like an instantly spiked CPU.

#### When does one tune?

Many NetBSD users rarely have to tune a system. The GENERIC kernel may run
just fine, and the layout/configuration of the system may do the job as well.
By the same token, it is always good to know how to tune a system. Most often,
tuning comes as a result of a sudden bottleneck (which may occur randomly) or
a gradual loss of performance. Both happen to everyone at some point; one
process eating the CPU is as much a bottleneck as a gradual increase in
paging. So the question should not be when to tune so much as when to learn to
tune.

One last time to tune: if you can tune in a preventive manner (and you think
you might need to), then do it. One example of this was a system that needed
to be able to reboot quickly. Instead of waiting, I did everything I could to
trim the kernel and make sure there was absolutely nothing running that was
not needed; I even removed drivers that did have devices but were never used
(lp). The result was reducing reboot time by nearly two-thirds. In the long
run, it was a smart move to tune it before it became an issue.

#### What these Documents Will Not Cover

Before wrapping up the introduction, it is important to note what these
documents will not cover. This guide pertains only to the core NetBSD system.
In other words, it will not cover tuning a web server's configuration to make
it run better; however, it might mention how to tune NetBSD to run better as a
web server. The logic behind this is simple: web servers, database software,
etc. are third party and almost limitless. I could easily get mired in details
that do not apply to the NetBSD system. Almost all third-party software has
its own documentation about tuning anyhow.

#### How Examples are Laid Out

Since there is ample man page documentation, only the options and arguments
used in the examples are discussed. In some cases, material is truncated for
brevity and not thoroughly discussed because, quite simply, there is too much.
For example, not every single device driver entry in the kernel will be
discussed, but an example of determining whether or not a given system needs
one will be. Nothing in this guide is concrete; tuning and performance are
very subjective. Instead, it is a guide for the reader to learn what some of
the tools available to them can do.

## Tuning Considerations

Tuning a system is not really too difficult when proactive tuning is the
approach. This document approaches tuning from a *before it comes up*
perspective. While tuning in spare time is considerably easier than tuning,
say, a server that is almost completely bogged down to 0.1% idle time, there
are still a few things that should be mulled over before actually tuning,
hopefully before a system is even installed.

### General System Configuration

Of course, how the system is set up makes a big difference. Sometimes small
items can be overlooked which may in fact cause some sort of long-term
performance problem.

#### Filesystems and Disks

How the filesystem is laid out relative to the disk drives is very important.
On hardware RAID systems it is not such a big deal, but many NetBSD users
specifically run NetBSD on older hardware where hardware RAID simply is not an
option. The idea of `/` being close to the first drive is a good one, but, for
example, if there are several drives to choose from, is the best-performing
one the one that `/` will be on? On a related note, is it wise to split off
`/usr`? Will the system see heavy usage in, say, `/usr/pkgsrc`? It might make
sense to slap a fast drive in and mount it under `/usr/pkgsrc`, or it might
not. Like all things in performance tuning, this is subjective.

#### Swap Configuration

There are three schools of thought on swap size, and about fifty on using
split swap files with prioritizing and how that should be done. In the
swap-size arena, the vendor schools (at least most commercial ones) usually
have their own formula per OS. As an example, on a particular version of HP-UX
with a particular version of Oracle, the formula was:

2.5 GB \* number\_of\_processors

Well, that all really depends on what type of usage the database is seeing and
how large it is. For instance, if it is so large that it must be distributed,
that formula does not fit well.

The next school of thought about swap sizing is sort of strange but makes some
sense. It says: if possible, get a reference amount of memory used by the
system. It goes something like this:

 1. Start up a machine and estimate total memory needs by running everything
    that may ever be needed at once: databases, web servers, whatever. Total
    up the amount.
 2. Add a few MB for padding.
 3. Subtract the amount of physical RAM from this total; what is left over is
    the swap size.

If the amount left over is 3 times the size of physical RAM, consider getting
more RAM. The problem, of course, is figuring out what is needed and how much
space it will take. There is also another flaw in this method: some programs
do not behave well. A glaring example of misbehaved software is web browsers.
On certain versions of Netscape, when something went wrong it had a tendency
to run away and eat swap space. So, the more spare space available, the more
time to kill it.
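
The arithmetic of that method can be sketched in shell, with entirely made-up
numbers (900 MB of services, 64 MB of padding, 512 MB of RAM; none of these
figures come from a real system):

    # total estimated need plus padding, minus physical RAM = swap size
    total=900 padding=64 ram=512
    swap=$((total + padding - ram))
    echo "configure roughly ${swap} MB of swap"
    # if the leftover is 3x physical RAM, swap alone will not save the system
    if [ "$swap" -ge $((3 * ram)) ]; then
        echo "consider getting more RAM instead"
    fi
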

Last but not least is the tried and true PHYSICAL\_RAM \* 2 method. On modern
machines, and even older ones (with limited purpose, of course), this seems to
work best.

All in all, it is hard to tell when swapping will start. Even on small 16 MB
RAM machines (and less), NetBSD has always worked well for most people until
misbehaving software is run.

### System Services

On servers, system services have a large impact. Getting them to run at their
best almost always requires some sort of network-level change or a fundamental
speed increase in the underlying system (which, of course, is what this is all
about). There are instances when some simple solutions can improve services.
One example: an ftp server is becoming slower, and a new release of the ftp
server shipped with the system comes out that just happens to run faster. By
upgrading the ftp software, a performance boost is accomplished.

Another good example where services are concerned is the age-old question: *to
use inetd or not to use inetd?* A great service example is pop3. Pop3
connections can conceivably clog up inetd. While the pop3 service itself
starts to degrade slowly, other services that are multiplexed through inetd
will also degrade (in some cases more than pop3). Setting up pop3 to run
outside of inetd, as its own daemon, may help.
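
As a sketch, the change amounts to commenting out the pop3 entry in
`/etc/inetd.conf` (the daemon path below is hypothetical; the real line
depends on which pop3 server is installed):

    #pop3   stream  tcp     nowait  root    /usr/pkg/libexec/popper popper

After sending inetd a HUP signal so it rereads its configuration, the pop3
daemon can then be started standalone from an rc script.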

### The NetBSD Kernel

The NetBSD kernel obviously plays a key role in how well a system performs.
While rebuilding and tuning the kernel is covered later in the text, it is
worth discussing in the local context from a high level.

Tuning the NetBSD kernel really involves three main areas:

 1. removing unrequired drivers
 2. configuring options
 3. system settings

#### Removing Unrequired Drivers

Taking drivers that are not needed out of the kernel achieves several results:
first, the system boots faster, since the kernel is smaller; second, again
since the kernel is smaller, more memory is free to users and processes; and
third, the kernel tends to respond quicker.

#### Configuring Options

Configuring options, such as enabling/disabling certain subsystems, specific
hardware, and filesystems, can also improve performance in pretty much the
same way removing unrequired drivers does. A very simple example is an FTP
server that only hosts ftp files, nothing else. On this particular server
there is no need for anything but native filesystem support, plus perhaps a
few options to help speed things along. Why would it ever need NTFS support,
for example? Besides, if it did, support for NTFS could be added at some later
time. In the opposite case, a workstation may need to support many different
filesystem types to share and access files.
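
In a kernel configuration file, this comes down to which `file-system` lines
are left in. A hypothetical fragment for the ftp-only server might look like
(see the kernel tuning material later in the text for the full procedure):

    file-system     FFS             # the native filesystem, needed here
    #file-system    NTFS            # commented out: never used on this server
    #file-system    MSDOSFS         # likewise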

#### System Settings

System-wide settings are controlled by the kernel. A few examples are
filesystem settings, network settings, and core kernel settings such as the
maximum number of processes. Almost all system settings can be at least looked
at, or modified, via the sysctl facility. Examples using the sysctl facility
are given later on.

## Visual Monitoring Tools

NetBSD ships a variety of performance monitoring tools with the system. Most
of these tools are common to all UNIX systems. In this section, some example
usage of the tools is given, with interpretation of the output.

### The top Process Monitor

The [top(1)](http://netbsd.gw.com/cgi-bin/man-cgi?top+1+NetBSD-current)
monitor does exactly what it says: it displays the CPU hogs on the system. To
run the monitor, simply type `top` at the prompt. Without any arguments, it
should look like:

    load averages:  0.09,  0.12,  0.08                                     20:23:41
    21 processes:  20 sleeping, 1 on processor
    CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Memory: 15M Act, 1104K Inact, 208K Wired, 22M Free, 129M Swap free
    
      PID USERNAME PRI NICE   SIZE   RES STATE     TIME   WCPU    CPU COMMAND
    13663 root       2    0  1552K 1836K sleep     0:08  0.00%  0.00% httpd
      127 root      10    0   129M 4464K sleep     0:01  0.00%  0.00% mount_mfs
    22591 root       2    0   388K 1156K sleep     0:01  0.00%  0.00% sshd
      108 root       2    0   132K  472K sleep     0:01  0.00%  0.00% syslogd
    22597 jrf       28    0   156K  616K onproc    0:00  0.00%  0.00% top
    22592 jrf       18    0   828K 1128K sleep     0:00  0.00%  0.00% tcsh
      203 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      205 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      206 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      208 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      207 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
    13667 nobody     2    0  1660K 1508K sleep     0:00  0.00%  0.00% httpd
     9926 root       2    0   336K  588K sleep     0:00  0.00%  0.00% sshd
      200 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd
      182 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
      180 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
    13666 nobody    -4    0  1600K 1260K sleep     0:00  0.00%  0.00% httpd

The top(1) utility is great for finding CPU hogs, runaway processes, or groups
of processes that may be causing problems. The output shown above indicates
that this particular system is in good health. The next display, however,
shows some very different results:

    load averages:  0.34,  0.16,  0.13                                     21:13:47
    25 processes:  24 sleeping, 1 on processor
    CPU states:  0.5% user,  0.0% nice,  9.0% system,  1.0% interrupt, 89.6% idle
    Memory: 20M Act, 1712K Inact, 240K Wired, 30M Free, 129M Swap free
    
      PID USERNAME PRI NICE   SIZE   RES STATE     TIME   WCPU    CPU COMMAND
     5304 jrf       -5    0    56K  336K sleep     0:04 66.07% 19.53% bonnie
     5294 root       2    0   412K 1176K sleep     0:02  1.01%  0.93% sshd
      108 root       2    0   132K  472K sleep     1:23  0.00%  0.00% syslogd
      187 root       2    0  1552K 1824K sleep     0:07  0.00%  0.00% httpd
     5288 root       2    0   412K 1176K sleep     0:02  0.00%  0.00% sshd
     5302 jrf       28    0   160K  620K onproc    0:00  0.00%  0.00% top
     5295 jrf       18    0   828K 1116K sleep     0:00  0.00%  0.00% tcsh
     5289 jrf       18    0   828K 1112K sleep     0:00  0.00%  0.00% tcsh
      127 root      10    0   129M 8388K sleep     0:00  0.00%  0.00% mount_mfs
      204 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      208 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      210 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      209 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      211 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      217 nobody     2    0  1616K 1272K sleep     0:00  0.00%  0.00% httpd
      184 root       2    0   336K  580K sleep     0:00  0.00%  0.00% sshd
      201 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd

At first glance, it should seem rather obvious which process is hogging the
system; what is interesting in this case, however, is why. The bonnie program
is a disk benchmark tool which can write large files in a variety of sizes and
ways. What the previous output indicates is only that the bonnie program is a
CPU hog, not why.

#### Other Neat Things About Top

A careful examination of the manual page
[top(1)](http://netbsd.gw.com/cgi-bin/man-cgi?top+1+NetBSD-5.0.1+i386) shows
that there is a lot more that can be done with top; for example, processes can
have their priority changed or be killed. Additionally, filters can be set for
looking at processes.

### The sysstat utility

As the man page
[sysstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?sysstat+1+NetBSD-5.0.1+i386)
indicates, the sysstat utility shows a variety of system statistics using the
curses library. While it is running, the screen is shown in two parts: the
upper window shows the current load average, while the lower window depends on
user commands. The exception to the split-window view is when the vmstat
display is on, which takes up the whole screen. Following is what sysstat
looks like on a fairly idle system when invoked with no arguments:

                       /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |
    
                             /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
                      <idle> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Basically there is a lot of dead time there, so now have a look with some
arguments provided, in this case `sysstat inet.tcp`, which looks like this:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |
    
            0 connections initiated           19 total TCP packets sent
            0 connections accepted            11   data
            0 connections established          0   data (retransmit)
                                               8   ack-only
            0 connections dropped              0   window probes
            0   in embryonic state             0   window updates
            0   on retransmit timeout          0   urgent data only
            0   by keepalive                   0   control
            0   by persist
                                              29 total TCP packets received
           11 potential rtt updates           17   in sequence
           11 successful rtt updates           0   completely duplicate
            9 delayed acks sent                0   with some duplicate data
            0 retransmit timeouts              4   out of order
            0 persist timeouts                 0   duplicate acks
            0 keepalive probes                11   acks
            0 keepalive timeouts               0   window probes
                                               0   window updates

Now that is informative. The first poll is cumulative, so it is possible to
see quite a lot of information in the output when sysstat is invoked. While
that may be interesting, how about a look at the buffer cache with `sysstat
bufcache`:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average
    
    There are 1642 buffers using 6568 kBytes of memory.
    
    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          877  53        6171  93        6516  99      94
    /var/tmp                     5   0          17   0          28   0      60
    
    Total:                     882  53        6188  94        6544  99

Again, a pretty boring system, but great information to have available. While
this is all nice to look at, it is time to put a false load on the system to
see how sysstat can be used as a performance monitoring tool. As with top,
bonnie++ will be used to put a high load on the I/O subsystems and a little on
the CPU. The bufcache will be looked at again to see if there are any
noticeable differences:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |||
    
    There are 1642 buffers using 6568 kBytes of memory.
    
    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          811  49        6422  97        6444  98      99
    
    Total:                     811  49        6422  97        6444  98

First, notice that the load average shot up; this is to be expected, of
course. Then, while most of the numbers are close, notice that utilization is
at 99%. Throughout the time that bonnie++ was running, the utilization
percentage remained at 99. This of course makes sense; in a real
troubleshooting situation, however, it could be indicative of a process doing
heavy I/O on one particular file or filesystem.

## Monitoring Tools

In addition to screen-oriented monitors and tools, the NetBSD system also
ships with a set of command-line-oriented tools. Many of the tools that ship
with a NetBSD system can be found on other UNIX and UNIX-like systems.

### fstat

The [fstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?fstat+1+NetBSD-5.0.1+i386)
utility reports the status of open files on the system. While it is not what
many administrators consider a performance monitor, it can help find out
whether a particular user or process is using an inordinate number of files,
generating large files, and similar information.

Following is a sample of some fstat output:

    USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
    jrf      tcsh       21607   wd /         29772 drwxr-xr-x     512 r
    jrf      tcsh       21607    3* unix stream c057acc0<-> c0553280
    jrf      tcsh       21607    4* unix stream c0553280 <-> c057acc0
    root     sshd       21597   wd /             2 drwxr-xr-x     512 r
    root     sshd       21597    0 /         11921 crw-rw-rw-    null rw
    nobody   httpd       5032   wd /             2 drwxr-xr-x     512 r
    nobody   httpd       5032    0 /         11921 crw-rw-rw-    null r
    nobody   httpd       5032    1 /         11921 crw-rw-rw-    null w
    nobody   httpd       5032    2 /         15890 -rw-r--r--  353533 rw
    ...

The fields are pretty self-explanatory. Again, this tool, while not as
performance-oriented as others, can come in handy when trying to find out
information about file usage.

### iostat

The [iostat(8)](http://netbsd.gw.com/cgi-bin/man-cgi?iostat+8+NetBSD-5.0.1+i386)
command does exactly what it sounds like: it reports the status of the I/O
subsystems on the system. When iostat is employed, the user typically runs it
with a certain number of counts and an interval between them, like so:

    $ iostat 5 5
          tty            wd0             cd0             fd0             md0             cpu
     tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
       0    1  5.13   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   54  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   18  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   18  8.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   28  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100

The above output is from a very quiet ftp server. The fields represent the
various I/O devices: the tty (which, ironically, is the most active, because
iostat is running), wd0, the primary IDE disk, cd0, the cdrom drive, fd0, the
floppy, and md0, the memory filesystem.

Now, let's see if we can pummel the system with some heavy usage: a large ftp
transaction consisting of a tarball of the netbsd-current source, along with
the `bonnie++` disk benchmark program running at the same time.

    $ iostat 5 5
          tty            wd0             cd0             fd0             md0             cpu
     tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
       0    1  5.68   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   54 61.03 150 8.92   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 18  4 78
       0   26 63.14 157 9.71   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 20  4 75
       0   20 43.58  26 1.12   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  9  2 88
       0   28 19.49  82 1.55   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0  7  3 89

As can be expected, wd0 is very active. What is interesting about this output
is how the processor's I/O seems to rise in proportion to wd0. This makes
perfect sense; however, it is worth noting that this can be observed only
because the ftp server is otherwise hardly being used. If, for example, the
CPU's I/O subsystem were already under a moderate load while the disk
subsystem was under the same load as it is now, it could appear that the CPU
is bottlenecked, when in fact it would have been the disk. In such a case, we
can observe that *one tool* is rarely enough to completely analyze a problem.
A quick glance at processes (after watching iostat) would probably tell us
which processes were causing problems.

### ps

Using the [ps(1)](http://netbsd.gw.com/cgi-bin/man-cgi?ps+1+NetBSD-5.0.1+i386)
command, or process status, a great deal of information about the system can
be discovered. Most of the time, the ps command is used to isolate a
particular process by name, group, owner, etc. Invoked with no options or
arguments, ps simply prints out information about the user executing it.

    $ ps
      PID TT STAT    TIME COMMAND
    21560 p0 Is   0:00.04 -tcsh
    21564 p0 I+   0:00.37 ssh jrf.odpn.net
    21598 p1 Ss   0:00.12 -tcsh
    21673 p1 R+   0:00.00 ps
    21638 p2 Is+  0:00.06 -tcsh

Not very exciting. The fields are self-explanatory with the exception of
`STAT`, which shows the state a process is in. The flags are all documented in
the man page; in the above example, `I` is idle, `S` is sleeping, `R` is
runnable, the `+` means the process is in the foreground, and the `s` means
the process is a session leader. This all makes perfect sense when looking at
the flags; for example, PID 21560 is a shell, it is idle, and (as would be
expected) the shell is a session leader.

In most cases, someone is looking for something very specific in the process
listing. As an example, looking at all processes is specified with `-a`; to
see all processes plus those without controlling terminals, use `-ax`; and to
get a much more verbose listing (basically everything, plus information about
the impact processes are having), use `aux`:

    # ps aux
    USER     PID %CPU %MEM    VSZ  RSS TT STAT STARTED    TIME COMMAND
    root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
    root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
    root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
    jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
    root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
    jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh jrf.odpn.net
    jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
    root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
    nobody  5032  0.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd
    ...

Again, most of the fields are self-explanatory, with the exception of `VSZ`
and `RSS`, which can be a little confusing. `RSS` is the resident (real) size
of a process in 1024-byte units, while `VSZ` is the virtual size. This is all
great, but again, how can ps help? Well, for one, take a look at this modified
version of the same output:

    # ps aux
    USER     PID %CPU %MEM    VSZ  RSS TT STAT STARTED    TIME COMMAND
    root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
    root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
    root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
    jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
    root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
    jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh jrf.odpn.net
    jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
    root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
    nobody  5032  9.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd
    ...
  514: 
Given that our baseline indicates a relatively quiet system, the process with
PID 5032 has an unusually large amount of `%CPU`. Sometimes this can also cause
high `TIME` numbers. The output of ps can be grepped for PIDs, usernames and
process names, and hence help track down processes that may be experiencing
problems.
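
For example, a small awk filter (a hypothetical sketch, not part of the
original guide; the field positions assume the `ps aux` layout shown above) can
flag processes above a chosen `%CPU` threshold:

```shell
# high_cpu: pass through ps(1)-style lines whose %CPU column (field 3)
# exceeds the given threshold. The field layout assumes "ps aux" output;
# the threshold is arbitrary and should come from a per-system baseline.
high_cpu() {
    awk -v limit="$1" 'NR > 1 && $3 + 0 > limit'
}

# Demo on a canned snippet of the output above:
printf '%s\n' \
    'USER     PID %CPU %MEM    VSZ  RSS TT STAT STARTED    TIME COMMAND' \
    'nobody  5032  9.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd' \
    'root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l' |
    high_cpu 5
```

On a live system, `ps aux | high_cpu 5` would print only the suspicious
entries.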

### vmstat

Using
[vmstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?vmstat+1+NetBSD-5.0.1+i386),
information pertaining to virtual memory can be monitored and measured. Not
unlike iostat, vmstat can be invoked with a count and interval. Following is
some sample output using `5 5` like the iostat example:

    # vmstat 5 5
     procs   memory     page                       disks         faults      cpu
     r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
     0 7 0 17716 33160    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
     0 7 0 17724 33156    2   0   0    0    0    0  1  0  0  0  109    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  1  0  0  0  105    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  107    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  105    6   3  0  0 100

Yet again, the system is relatively quiet. For comparison, the exact same load
that was put on this server in the iostat example will be used: a large file
transfer and the bonnie benchmark program.

    # vmstat 5 5
     procs   memory     page                       disks         faults      cpu
     r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
     1 8 0 18880 31968    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
     0 8 0 18888 31964    2   0   0    0    0    0 130  0  0  0 1804 5539 1094 31 22 47
     1 7 0 18888 31964    1   0   0    0    0    0 130  0  0  0 1802 5500 1060 36 16 49
     1 8 0 18888 31964    1   0   0    0    0    0 160  0  0  0 1849 5905 1107 21 22 57
     1 7 0 18888 31964    1   0   0    0    0    0 175  0  0  0 1893 6167 1082  1 25 75

Just a little different. Notice that since most of the work was I/O based, the
actual memory used was not very much. Because this system uses mfs for `/tmp`,
however, it can certainly get beat up. Have a look at this:

    # vmstat 5 5
     procs   memory     page                       disks         faults      cpu
     r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
     0 2 0 99188   500    2   0   0    0    0    0  1  0  0  0  105   16   4  0  0 100
     0 2 0111596   436  592   0 587  624  586 1210 624  0  0  0  741  883 1088  0 11 89
     0 3 0123976   784  666   0 662  643  683 1326 702  0  0  0  828  993 1237  0 12 88
     0 2 0134692  1236  581   0 571  563  595 1158 599  0  0  0  722  863 1066  0  9 90
     2 0 0142860   912  433   0 406  403  405  808 429  0  0  0  552  602 768  0  7 93

Pretty scary stuff. That was created by running bonnie in `/tmp` on a memory
based filesystem. Had it continued for too long, the system could have started
thrashing. Notice that even though the VM subsystem was taking a beating, the
processors still were not getting too battered.
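
Because vmstat's columns are stable, its output is easy to post-process. As a
hedged sketch (the column number assumes the header layout shown above, where
`fre` is the fifth field, and the 1024 KB floor is an arbitrary assumption):

```shell
# low_free: print vmstat samples whose "fre" column (field 5 in the
# layout above) falls below a floor in KB. Header lines are skipped by
# requiring a numeric first field; the 1024 KB floor is an assumption.
low_free() {
    awk -v floor="$1" '$1 ~ /^[0-9]+$/ && $5 + 0 < floor { print "low free:", $5 }'
}

# Demo on a canned sample from the mfs /tmp run above:
printf '%s\n' \
    ' r b w   avm   fre' \
    ' 0 2 0 99188   500' \
    ' 0 7 0 17716 33160' |
    low_free 1024
```

A live invocation would be `vmstat 5 | low_free 1024`. Note that very wide
values can run columns together (as in the `0111596` sample above), so this is
a rough screen, not a parser.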

## Network Tools

Sometimes a performance problem is not a particular machine but the network, or
some device on the network such as another host, a router, etc. What the other
machines that provide a service or connectivity to a particular NetBSD system
do, and how they act, can have a very large impact on the performance of the
NetBSD system itself, or on the perception of performance by users. A really
great example of this is when a DNS server that a NetBSD machine uses suddenly
disappears: lookups take a long time and eventually fail. Someone logged into
the NetBSD machine who is not experienced would undoubtedly (provided they had
no other evidence) blame the NetBSD system. One of my personal favorites, *the
Internet is broke*, usually means either DNS service or a router/gateway has
dropped offline. Whatever the case may be, a NetBSD system comes adequately
armed to find out which network issues may be cropping up, whether they are the
fault of the local system or of some other equipment.

### ping

The classic
[ping(8)](http://netbsd.gw.com/cgi-bin/man-cgi?ping+8+NetBSD-5.0.1+i386) utility
can tell us whether there is plain connectivity, and it can also tell whether
host resolution (depending on how `nsswitch.conf` dictates) is working.
Following is some typical ping output on a local network with a count of 3
specified:

    # ping -c 3 marie
    PING marie (172.16.14.12): 56 data bytes
    64 bytes from 172.16.14.12: icmp_seq=0 ttl=255 time=0.571 ms
    64 bytes from 172.16.14.12: icmp_seq=1 ttl=255 time=0.361 ms
    64 bytes from 172.16.14.12: icmp_seq=2 ttl=255 time=0.371 ms
    
    ----marie PING Statistics----
    3 packets transmitted, 3 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.361/0.434/0.571/0.118 ms

Not only does ping tell us whether a host is alive, it tells us how long the
round trip took and gives some nice details at the very end. If a hostname
cannot be resolved, the IP address can be specified instead:

    # ping -c 1 172.16.20.5
    PING ash (172.16.20.5): 56 data bytes
    64 bytes from 172.16.20.5: icmp_seq=0 ttl=64 time=0.452 ms
    
    ----ash PING Statistics----
    1 packets transmitted, 1 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.452/0.452/0.452/0.000 ms

Now, as with any other tool, the times are very subjective, especially in
regards to networking. For example, while the times in the examples are good,
take a look at this localhost ping:

    # ping -c 4 localhost
    PING localhost (127.0.0.1): 56 data bytes
    64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.091 ms
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=0.129 ms
    64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=0.120 ms
    64 bytes from 127.0.0.1: icmp_seq=3 ttl=255 time=0.122 ms
    
    ----localhost PING Statistics----
    4 packets transmitted, 4 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.091/0.115/0.129/0.017 ms

The times are much smaller because the request never left the machine. Pings
can be used to gather information about how well a network is performing. Ping
is also good for problem isolation: if there are three roughly comparable
NetBSD systems on a network and one of them simply has horrible ping times,
chances are something is wrong on that one particular machine.
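
The summary line lends itself to scripting. As an illustrative sketch (assuming
the `round-trip min/avg/max/stddev` summary format shown above), the average
round-trip time can be pulled out for logging baselines:

```shell
# avg_rtt: extract the average round-trip time (ms) from ping's
# "round-trip min/avg/max/stddev = ..." summary line. With the field
# separator set to runs of '/', '=' or space, the average is field 7.
avg_rtt() {
    awk -F'[/= ]+' '/^round-trip/ { print $7 }'
}

# Demo on the canned summary line from the example above:
printf 'round-trip min/avg/max/stddev = 0.361/0.434/0.571/0.118 ms\n' |
    avg_rtt
```

On a live system, `ping -c 3 marie | avg_rtt` would print just the average,
which is handy for periodic, cron-driven measurements.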

### traceroute

The
[traceroute(8)](http://netbsd.gw.com/cgi-bin/man-cgi?traceroute+8+NetBSD-5.0.1+i386)
command is great for making sure a path is available or detecting problems on a
particular path. As an example, here is a trace between the example ftp server
and ftp.NetBSD.org:

    # traceroute ftp.NetBSD.org
    traceroute to ftp.NetBSD.org (204.152.184.75), 30 hops max, 40 byte packets
     1  208.44.95.1 (208.44.95.1)  1.646 ms  1.492 ms  1.456 ms
     2  63.144.65.170 (63.144.65.170)  7.318 ms  3.249 ms  3.854 ms
     3  chcg01-edge18.il.inet.qwest.net (65.113.85.229)  35.982 ms  28.667 ms  21.971 ms
     4  chcg01-core01.il.inet.qwest.net (205.171.20.1)  22.607 ms  26.242 ms  19.631 ms
     5  snva01-core01.ca.inet.qwest.net (205.171.8.50)  78.586 ms  70.585 ms  84.779 ms
     6  snva01-core03.ca.inet.qwest.net (205.171.14.122)  69.222 ms  85.739 ms  75.979 ms
     7  paix01-brdr02.ca.inet.qwest.net (205.171.205.30)  83.882 ms  67.739 ms  69.937 ms
     8  198.32.175.3 (198.32.175.3)  72.782 ms  67.687 ms  73.320 ms
     9  so-1-0-0.orpa8.pf.isc.org (192.5.4.231)  78.007 ms  81.860 ms  77.069 ms
    10  tun0.orrc5.pf.isc.org (192.5.4.165)  70.808 ms  75.151 ms  81.485 ms
    11  ftp.NetBSD.org (204.152.184.75)  69.700 ms  69.528 ms  77.788 ms

All in all, not bad. The trace went from the host to the local router, then out
onto the provider network, and finally out onto the Internet looking for the
final destination. How to interpret a traceroute is, again, subjective, but
abnormally high times in portions of a path can indicate a bottleneck on a
piece of network equipment. Not unlike ping, if the host itself is suspect, run
traceroute from another host to the same destination. Now, for a worst case
scenario:

    # traceroute www.microsoft.com
    traceroute: Warning: www.microsoft.com has multiple addresses; using 207.46.230.220
    traceroute to www.microsoft.akadns.net (207.46.230.220), 30 hops max, 40 byte packets
     1  208.44.95.1 (208.44.95.1)  2.517 ms  4.922 ms  5.987 ms
     2  63.144.65.170 (63.144.65.170)  10.981 ms  3.374 ms  3.249 ms
     3  chcg01-edge18.il.inet.qwest.net (65.113.85.229)  37.810 ms  37.505 ms  20.795 ms
     4  chcg01-core03.il.inet.qwest.net (205.171.20.21)  36.987 ms  32.320 ms  22.430 ms
     5  chcg01-brdr03.il.inet.qwest.net (205.171.20.142)  33.155 ms  32.859 ms  33.462 ms
     6  205.171.1.162 (205.171.1.162)  39.265 ms  20.482 ms  26.084 ms
     7  sl-bb24-chi-13-0.sprintlink.net (144.232.26.85)  26.681 ms  24.000 ms  28.975 ms
     8  sl-bb21-sea-10-0.sprintlink.net (144.232.20.30)  65.329 ms  69.694 ms  76.704 ms
     9  sl-bb21-tac-9-1.sprintlink.net (144.232.9.221)  65.659 ms  66.797 ms  74.408 ms
    10  144.232.187.194 (144.232.187.194)  104.657 ms  89.958 ms  91.754 ms
    11  207.46.154.1 (207.46.154.1)  89.197 ms  84.527 ms  81.629 ms
    12  207.46.155.10 (207.46.155.10)  78.090 ms  91.550 ms  89.480 ms
    13  * * *
    .......

In this case, the trace never reaches the Microsoft server: somewhere past hop
12, a system or server cannot (or will not) reply to the probes. At that point,
one might think to try ping; in the Microsoft case, ping gets no reply either,
because somewhere on their network ICMP is most likely disabled.
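
Hop timings can also be post-processed. A hypothetical sketch (assuming the
traceroute output format shown above, where each probe time is followed by an
`ms` token) that reduces each hop to its average time:

```shell
# hop_avg: reduce traceroute output to "hop average-ms" pairs by
# averaging every value that precedes an "ms" token. Unanswered hops
# ("* * *") produce no times and are silently skipped.
hop_avg() {
    awk '/^ *[0-9]+ / {
        sum = n = 0
        for (i = 1; i <= NF; i++)
            if ($i == "ms") { sum += $(i - 1); n++ }
        if (n) printf "%s %.3f\n", $1, sum / n
    }'
}

# Demo on two canned hops from the traces above:
printf '%s\n' \
    ' 1  208.44.95.1 (208.44.95.1)  1.646 ms  1.492 ms  1.456 ms' \
    '13  * * *' |
    hop_avg
```

Running `traceroute ftp.NetBSD.org | hop_avg` and comparing two days' output
makes a sudden jump at one hop easy to spot.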

### netstat

Another problem that can crop up on a NetBSD system is routing table issues.
These issues are not always the system's fault. The
[route(8)](http://netbsd.gw.com/cgi-bin/man-cgi?route+8+NetBSD-5.0.1+i386) and
[netstat(1)](http://netbsd.gw.com/cgi-bin/man-cgi?netstat+1+NetBSD-5.0.1+i386)
commands can show information about routes and network connections,
respectively.

The route command can be used to look at and modify routing tables while
netstat can display information about network connections and routes. First,
here is some output from `route show`:

    # route show
    Routing tables
    
    Internet:
    Destination      Gateway            Flags
    default          208.44.95.1        UG
    loopback         127.0.0.1          UG
    localhost        127.0.0.1          UH
    172.15.13.0      172.16.14.37       UG
    172.16.0.0       link#2             U
    172.16.14.8      0:80:d3:cc:2c:0    UH
    172.16.14.10     link#2             UH
    marie            0:10:83:f9:6f:2c   UH
    172.16.14.37     0:5:32:8f:d2:35    UH
    172.16.16.15     link#2             UH
    loghost          8:0:20:a7:f0:75    UH
    artemus          8:0:20:a8:d:7e     UH
    ash              0:b0:d0:de:49:df   UH
    208.44.95.0      link#1             U
    208.44.95.1      0:4:27:3:94:20     UH
    208.44.95.2      0:5:32:8f:d2:34    UH
    208.44.95.25     0:c0:4f:10:79:92   UH
    
    Internet6:
    Destination      Gateway            Flags
    default          localhost          UG
    default          localhost          UG
    localhost        localhost          UH
    ::127.0.0.0      localhost          UG
    ::224.0.0.0      localhost          UG
    ::255.0.0.0      localhost          UG
    ::ffff:0.0.0.0   localhost          UG
    2002::           localhost          UG
    2002:7f00::      localhost          UG
    2002:e000::      localhost          UG
    2002:ff00::      localhost          UG
    fe80::           localhost          UG
    fe80::%ex0       link#1             U
    fe80::%ex1       link#2             U
    fe80::%lo0       fe80::1%lo0        U
    fec0::           localhost          UG
    ff01::           localhost          U
    ff02::%ex0       link#1             U
    ff02::%ex1       link#2             U
    ff02::%lo0       fe80::1%lo0        U

The flags column shows the status of the route and whether or not it is a
gateway. In this case we see `U`, `H` and `G` (`U` is up, `H` is host and `G`
is gateway; see the man page for additional flags).

Now for some netstat output using the `-r` (routing) and `-n` (show network
numbers) options:

    Routing tables
    
    Internet:
    Destination        Gateway            Flags     Refs     Use    Mtu  Interface
    default            208.44.95.1        UGS         0   330309   1500  ex0
    127                127.0.0.1          UGRS        0        0  33228  lo0
    127.0.0.1          127.0.0.1          UH          1     1624  33228  lo0
    172.15.13/24       172.16.14.37       UGS         0        0   1500  ex1
    172.16             link#2             UC         13        0   1500  ex1
    ...
    Internet6:
    Destination                   Gateway                   Flags     Refs     Use
      Mtu  Interface
    ::/104                        ::1                       UGRS        0        0
    33228  lo0 =>
    ::/96                         ::1                       UGRS        0        0

The above output is a little more verbose. So, how can this help? Well, a good
example is when routes between networks get changed while users are connected.
I saw this happen several times when someone was rebooting routers all day
long, once after each change. Several users called up saying they were getting
kicked out and that it was taking very long to log back in. As it turned out,
the clients connecting to the system were being redirected to another router
(which took a very long route) to reconnect. I observed the `M` flag, Modified
dynamically (by redirect), on their connections. I deleted the routes, had them
reconnect, and summarily followed up with the offending technician.
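
Hunting for such redirected routes can be scripted. A hedged sketch (assuming
the `netstat -rn` column layout shown earlier, with the flags in the third
column; the 172.16.14.99 entry in the demo is invented):

```shell
# modified_routes: list routing entries whose flags column contains M,
# i.e. routes modified dynamically by an ICMP redirect. The all-caps
# check on field 3 skips header lines like "Flags".
modified_routes() {
    awk '$3 ~ /M/ && $3 ~ /^[A-Z]+$/ { print $1, $2, $3 }'
}

# Demo on a canned routing table; the UGHM entry is hypothetical:
printf '%s\n' \
    'Destination        Gateway            Flags     Refs     Use    Mtu  Interface' \
    'default            208.44.95.1        UGS         0   330309   1500  ex0' \
    '172.16.14.99       208.44.95.2        UGHM        0       12   1500  ex0' |
    modified_routes
```

On a live system, `netstat -rn | modified_routes` would list the suspect
entries, which could then be cleared with route delete.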

### tcpdump

Last, and definitely not least, is
[tcpdump(8)](http://netbsd.gw.com/cgi-bin/man-cgi?tcpdump+8+NetBSD-5.0.1+i386),
the network sniffer, which can retrieve a lot of information. In this
discussion, there will be some sample output and an explanation of some of the
more useful options of tcpdump.

Following is a small snippet of tcpdump in action just as it starts:

    # tcpdump
    tcpdump: listening on ex0
    14:07:29.920651 mail.ssh > 208.44.95.231.3551: P 2951836801:2951836845(44) ack 2
    476972923 win 17520 <nop,nop,timestamp 1219259 128519450> [tos 0x10]
    14:07:29.950594 12.125.61.34 >  208.44.95.16: ESP(spi=2548773187,seq=0x3e8c) (DF)
    14:07:29.983117 smtp.somecorp.com.smtp > 208.44.95.30.42828: . ack 420285166 win
    16500 (DF)
    14:07:29.984406 208.44.95.30.42828 > smtp.somecorp.com.smtp: . 1:1376(1375) ack 0
     win 7431 (DF)
    ...

Given that this particular server is a mail server, what is shown makes perfect
sense; however, the utility is very verbose. I prefer to initially run tcpdump
with no options and send the text output into a file for later digestion, like
so:

    # tcpdump > tcpdump.out
    tcpdump: listening on ex0

So, what precisely in this mish mosh are we looking for? In short, anything
that does not seem to fit; for example, messed up packet lengths (as in a lot
of them) will show up as improper lens or malformed packets (basically
garbage). If, however, we are looking for something specific, tcpdump may be
able to help, depending on the problem.
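
Once `tcpdump.out` exists, ordinary text tools can digest it. A hypothetical
sketch, assuming tcpdump's default one-line `src > dst:` text format shown
above, that counts packets per source:

```shell
# top_talkers: count packets per source in tcpdump's default text
# output, where field 2 is the source and field 3 is ">". Continuation
# lines (which do not have ">" in field 3) are ignored.
top_talkers() {
    awk '$3 == ">" { count[$2]++ }
         END { for (h in count) print count[h], h }' | sort -rn
}

# Demo on canned lines in the format shown above:
printf '%s\n' \
    '14:07:29.920651 mail.ssh > 208.44.95.231.3551: P 1:44(44) ack 2' \
    '14:07:29.984406 208.44.95.30.42828 > smtp.somecorp.com.smtp: . ack 0' \
    '14:07:29.999999 mail.ssh > 208.44.95.231.3551: P 44:88(44) ack 2' |
    top_talkers
```

For a real capture, `top_talkers < tcpdump.out | head` gives a quick first look
at who is generating the traffic.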

#### Specific tcpdump Usage

These are just examples of a few things one can do with tcpdump.

Look for duplicate IP addresses:

    tcpdump -e host ip-address

For example:

    tcpdump -e host 192.168.0.2

Routing problems:

    tcpdump icmp

There are plenty of third party tools available; however, NetBSD comes shipped
with a good tool set for tracking down network level performance problems.

## Accounting

The NetBSD system comes equipped with a great deal of performance monitors for
active monitoring, but what about long term monitoring? Well, of course the
output of a variety of commands can be sent to files and re-parsed later with a
meaningful shell script or program. NetBSD does, by default, offer some
extraordinarily powerful low level monitoring tools for the programmer,
administrator or really astute hobbyist.

### Accounting

While accounting gives system usage at an almost userland level, kernel
profiling with gprof provides explicit system call usage.

Using the accounting tools can help figure out what possible performance
problems may be lying in wait, such as increased usage of compilers or network
services, for example.

Starting accounting is actually fairly simple; as root, use the
[accton(8)](http://netbsd.gw.com/cgi-bin/man-cgi?accton+8+NetBSD-5.0.1+i386)
command. The syntax to start accounting is `accton filename`, where accounting
information is appended to filename.

Strangely enough, the lastcomm command, which reads from an accounting output
file, looks in `/var/account/acct` by default, so I tend to just use the
default location; however, lastcomm can be told to look elsewhere.

To stop accounting, simply type accton with no arguments.

### Reading Accounting Information

To read accounting information, there are two tools that can be used:

 * [lastcomm(1)](http://netbsd.gw.com/cgi-bin/man-cgi?lastcomm+1+NetBSD-5.0.1+i386)
 * [sa(8)](http://netbsd.gw.com/cgi-bin/man-cgi?sa+8+NetBSD-5.0.1+i386)

#### lastcomm

The lastcomm command shows the last commands executed in order, all of them by
default. It can, however, select by user. Here is some sample output:

    $ lastcomm jrf
    last       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:39 (0:00:00.02)
    man        -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
    sh         -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
    less       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
    lastcomm   -       jrf      ttyp3      0.02 secs Tue Sep  3 14:38 (0:00:00.02)
    stty       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
    tset       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:01.05)
    hostname   -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
    ls         -       jrf      ttyp0      0.00 secs Tue Sep  3 14:36 (0:00:00.00)
    ...

Pretty nice. The lastcomm command gets its information from the default
location of `/var/account/acct`; however, using the `-f` option, another file
may be specified.

As may seem obvious, the output of lastcomm could get a little heavy on large
multi-user systems. That is where sa comes into play.
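
Before reaching for sa, a quick per-command tally can be pulled straight from
lastcomm output. A hedged sketch, assuming the column layout shown above
(command name in the first field):

```shell
# cmd_counts: tally how many times each command appears in lastcomm
# output (field 1 is the command name in the layout above), most
# frequent first.
cmd_counts() {
    awk '{ count[$1]++ } END { for (c in count) print count[c], c }' | sort -rn
}

# Demo on canned lastcomm lines:
printf '%s\n' \
    'sh         -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38' \
    'less       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38' \
    'sh         -       jrf      ttyp3      0.00 secs Tue Sep  3 14:37' |
    cmd_counts
```

On a live system, `lastcomm | cmd_counts | head` gives a rough frequency view,
though sa (below) does this job properly.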

#### sa

The sa command (meaning "print system accounting statistics") can be used to
maintain information. It can also be used interactively to create reports.
Following is the default output of sa:

    $ sa
          77       18.62re        0.02cp        8avio        0k
           3        4.27re        0.01cp       45avio        0k   ispell
           2        0.68re        0.00cp       33avio        0k   mutt
           2        1.09re        0.00cp       23avio        0k   vi
          10        0.61re        0.00cp        7avio        0k   ***other
           2        0.01re        0.00cp       29avio        0k   exim
           4        0.00re        0.00cp        8avio        0k   lastcomm
           2        0.00re        0.00cp        3avio        0k   atrun
           3        0.03re        0.00cp        1avio        0k   cron*
           5        0.02re        0.00cp       10avio        0k   exim*
          10        3.98re        0.00cp        2avio        0k   less
          11        0.00re        0.00cp        0avio        0k   ls
           9        3.95re        0.00cp       12avio        0k   man
           2        0.00re        0.00cp        4avio        0k   sa
          12        3.97re        0.00cp        1avio        0k   sh
    ...

From left to right, the fields are: total number of times called, real time in
minutes, sum of user and system time in minutes, average number of I/O
operations per execution, size, and command name.

The sa command can also be used to create summary files or reports based on
some options; for example, here is the output when specifying a sort by
CPU-time average memory usage:

    $ sa -k
          86       30.81re        0.02cp        8avio        0k
          10        0.61re        0.00cp        7avio        0k   ***other
           2        0.00re        0.00cp        3avio        0k   atrun
           3        0.03re        0.00cp        1avio        0k   cron*
           2        0.01re        0.00cp       29avio        0k   exim
           5        0.02re        0.00cp       10avio        0k   exim*
           3        4.27re        0.01cp       45avio        0k   ispell
           4        0.00re        0.00cp        8avio        0k   lastcomm
          12        8.04re        0.00cp        2avio        0k   less
          13        0.00re        0.00cp        0avio        0k   ls
          11        8.01re        0.00cp       12avio        0k   man
           2        0.68re        0.00cp       33avio        0k   mutt
           3        0.00re        0.00cp        4avio        0k   sa
          14        8.03re        0.00cp        1avio        0k   sh
           2        1.09re        0.00cp       23avio        0k   vi

The sa command is very helpful on larger systems.

### How to Put Accounting to Use

Accounting reports, as was mentioned earlier, offer a way to help predict
trends. For example, a system on which cc and make are being used more and more
may indicate that in a few months some changes will need to be made to keep the
system running at an optimum level. Another good example is web server usage:
if it begins to gradually increase, some sort of action may need to be taken
before it becomes a problem. Luckily, with accounting tools, said actions can
be reasonably predicted and planned for ahead of time.

## Kernel Profiling

Profiling a kernel is normally employed when the goal is to compare the
difference of new changes in the kernel to a previous one, or to track down
some sort of low level performance problem. Two sets of data about profiled
code behavior are recorded independently: function call frequency and time
spent in each function.

### Getting Started

First, take a look at both [[Kernel Tuning|guide/tuning#kernel]] and [[Compiling
the kernel|guide/kernel]]. The only difference in procedure for setting up a
kernel with profiling enabled is that when you run config you add the `-p`
option. The build area is `../compile/<KERNEL_NAME>.PROF`; for example, a
GENERIC kernel would be built in `../compile/GENERIC.PROF`.

Following is a quick summary of how to compile a kernel with profiling enabled
on the i386 port. The assumptions are that the appropriate sources are
available under `/usr/src` and the GENERIC configuration is being used; of
course, that may not always be the situation:

 1. **`cd /usr/src/sys/arch/i386/conf`**
 2. **`config -p GENERIC`**
 3. **`cd ../compile/GENERIC.PROF`**
 4. **`make depend && make`**
 5. **`cp /netbsd /netbsd.old`**
 6. **`cp netbsd /`**
 7. **`reboot`**

Once the new kernel is in place and the system has rebooted, it is time to turn
on the monitoring and start looking at results.

#### Using kgmon

To start kgmon:

    $ kgmon -b
    kgmon: kernel profiling is running.

Next, dump the profile data into the file `gmon.out`:

    $ kgmon -p

Now, it is time to make the output readable:

    $ gprof /netbsd > gprof.out

Since gprof is looking for `gmon.out`, it should find it in the current working
directory.

By just running kgmon alone, you may not get the information you need; however,
if you are comparing the differences between two different kernels, then a
known good baseline should be used. Note that it is generally a good idea to
stress the subsystem in question, if you know what it is, both on the baseline
and on the newer (or different) kernel.

### Interpretation of kgmon Output

Now that kgmon can run, collect, and parse information, it is time to actually
look at some of that information. In this particular instance, a GENERIC kernel
was run with profiling enabled for about an hour with only system processes and
no adverse load. In the fault insertion section, the example will be large
enough that, even under a minimal load, detection of the problem should be
easy.

#### Flat Profile

The flat profile is a list of functions, the number of times they were called,
and how long each took (in seconds). Following is sample output from the quiet
system:

    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
     99.77    163.87   163.87                             idle
      0.03    163.92     0.05      219 228310.50 228354.34  _wdc_ata_bio_start
      0.02    163.96     0.04      219 182648.40 391184.96  wdc_ata_bio_intr
      0.01    163.98     0.02     3412  5861.66  6463.02  pmap_enter
      0.01    164.00     0.02      548 36496.35 36496.35  pmap_zero_page
      0.01    164.02     0.02                             Xspllower
      0.01    164.03     0.01   481968    20.75    20.75  gettick
      0.01    164.04     0.01     6695  1493.65  1493.65  VOP_LOCK
      0.01    164.05     0.01     3251  3075.98 21013.45  syscall_plain
    ...

As expected, idle was the highest in percentage; however, there were still some
things going on. For example, a little further down there is the `vn_lock`
function:

    ...
      0.00    164.14     0.00     6711     0.00     0.00  VOP_UNLOCK
      0.00    164.14     0.00     6677     0.00  1493.65  vn_lock
      0.00    164.14     0.00     6441     0.00     0.00  genfs_unlock

This is to be expected, since locking still has to take place, regardless.
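
When comparing a baseline kernel against a modified one, pulling a single
function's row out of each flat profile makes the difference obvious. A
hypothetical sketch (the last field of each gprof flat-profile row is the
function name; the sample file path and the choice of `vn_lock` are just for
illustration):

```shell
# fn_profile: print the flat-profile row(s) for a named function from a
# gprof report such as gprof.out (the function name is the last field).
fn_profile() {
    awk -v fn="$1" '$NF == fn' "$2"
}

# Demo against a canned fragment of the report above:
cat > /tmp/gprof.sample <<'EOF'
  0.00    164.14     0.00     6677     0.00  1493.65  vn_lock
  0.00    164.14     0.00     6441     0.00     0.00  genfs_unlock
EOF
fn_profile vn_lock /tmp/gprof.sample
```

Running `fn_profile vn_lock gprof.out` against the baseline and the new
kernel's reports puts the two rows side by side for comparison.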

#### Call Graph Profile

The call graph is an augmented version of the flat profile showing subsequent
calls from the listed functions. First, here is some sample output:

                         Call graph (explanation follows)
    
    
    granularity: each sample hit covers 4 byte(s) for 0.01% of 164.14 seconds
    
    index % time    self  children    called     name
                                                     <spontaneous>
    [1]     99.8  163.87    0.00                 idle [1]
    -----------------------------------------------
                                                     <spontaneous>
    [2]      0.1    0.01    0.08                 syscall1 [2]
                    0.01    0.06    3251/3251        syscall_plain [7]
                    0.00    0.01     414/1660        trap [9]
    -----------------------------------------------
                    0.00    0.09     219/219         Xintr14 [6]
    [3]      0.1    0.00    0.09     219         pciide_compat_intr [3]
                    0.00    0.09     219/219         wdcintr [5]
    -----------------------------------------------
    ...

Now this can be a little confusing. The index number is mapped to by the
trailing number at the end of each line; for example:

    ...
                    0.00    0.01      85/85          dofilewrite [68]
    [72]     0.0    0.00    0.01      85         soo_write [72]
                    0.00    0.01      85/89          sosend [71]
    ...

Here we see that dofilewrite was the caller; in the same way, we can look at
the entry for index number 64 and see what was happening there:

    ...
                    0.00    0.01     101/103         ffs_full_fsync <cycle 6> [58]
    [64]     0.0    0.00    0.01     103         bawrite [64]
                    0.00    0.01     103/105         VOP_BWRITE [60]
    ...

And so on; in this way, a "visual trace" can be established.

At the end of the call graph, right after the terms section, is an index by
function name which can help map indexes as well.

### Putting it to Use

In this example, I have modified an area of the kernel I know will create a
problem that will be blatantly obvious.

Here is the top portion of the flat profile after running the system for about
an hour with little interaction from users:

    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  us/call  us/call  name
     93.97    139.13   139.13                             idle
      5.87    147.82     8.69       23 377826.09 377842.52  check_exec
      0.01    147.84     0.02      243    82.30    82.30  pmap_copy_page
      0.01    147.86     0.02      131   152.67   152.67  _wdc_ata_bio_start
      0.01    147.88     0.02      131   152.67   271.85  wdc_ata_bio_intr
      0.01    147.89     0.01     4428     2.26     2.66  uvn_findpage
      0.01    147.90     0.01     4145     2.41     2.41  uvm_pageactivate
      0.01    147.91     0.01     2473     4.04  3532.40  syscall_plain
      0.01    147.92     0.01     1717     5.82     5.82  i486_copyout
      0.01    147.93     0.01     1430     6.99    56.52  uvm_fault
      0.01    147.94     0.01     1309     7.64     7.64  pool_get
      0.01    147.95     0.01      673    14.86    38.43  genfs_getpages
      0.01    147.96     0.01      498    20.08    20.08  pmap_zero_page
      0.01    147.97     0.01      219    45.66    46.28  uvm_unmap_remove
      0.01    147.98     0.01      111    90.09    90.09  selscan
    ...

The difference in performance is obvious: right off the bat, the idle time is
noticeably lower. The main anomaly is that one particular function shows a
large amount of time across the board with very few calls: `check_exec`. At
first this may not seem strange if a lot of commands had been executed, but
compared to the flat profile of the first measurement, the proportion does not
look right:
 1137: 
 1138:     ...
 1139:       0.00    164.14     0.00       37     0.00 62747.49  check_exec
 1140:     ...
 1141: 
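As a quick sanity check, the `us/call` column of the flat profile can be
reproduced from the self seconds and the call count; for the suspicious
`check_exec` entry above:

```shell
# 8.69 self seconds over 23 calls, expressed in microseconds per call
awk 'BEGIN { printf "%.2f us/call\n", 8.69 / 23 * 1e6 }'
# prints: 377826.09 us/call
```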
In the first measurement the call is made 37 times and performs far better.
Obviously something in or around that function is wrong. To rule out other
functions, a look at the call graph can help. Here is the first instance of
`check_exec`:
 1146: 
 1147:     ...
 1148:     -----------------------------------------------
 1149:                     0.00    8.69      23/23          syscall_plain [3]
 1150:     [4]      5.9    0.00    8.69      23         sys_execve [4]
 1151:                     8.69    0.00      23/23          check_exec [5]
 1152:                     0.00    0.00      20/20          elf32_copyargs [67]
 1153:     ...
 1154: 
Notice how the time of 8.69 seconds propagates up to the two calling functions.
It is possible that something is wrong with them; however, the next instance of
`check_exec` suggests otherwise:
 1158: 
 1159:     ...
 1160:     -----------------------------------------------
 1161:                     8.69    0.00      23/23          sys_execve [4]
 1162:     [5]      5.9    8.69    0.00      23         check_exec [5]
 1163:     ...
 1164: 
Now we can see that the problem most likely resides in `check_exec` itself. Of
course, problems are not always this simple; in fact, here is the trivial code
that was inserted right after `check_exec` (the function is in
`sys/kern/kern_exec.c`):
 1169: 
 1170:     ...
 1171:             /* A Cheap fault insertion */
 1172:             for (x = 0; x < 100000000; x++) {
 1173:                     y = x;
 1174:             }
    ...
 1176: 
 1177: Not exactly glamorous, but enough to register a large change with profiling.
 1178: 
 1179: ### Summary
 1180: 
Kernel profiling can be enlightening for anyone and provides a much more
refined method of hunting down performance problems that are hard to find by
conventional means. It is also not nearly as difficult as most people think: if
you can compile a kernel, you can get profiling to work.
 1185: 
 1186: ## System Tuning
 1187: 
Now that monitoring and analysis tools have been addressed, it is time to look
at some actual methods. This section covers tools and techniques that affect
how the system performs and that are applied without recompiling the kernel;
the next section examines kernel tuning by recompiling.
 1192: 
 1193: ### Using sysctl
 1194: 
The sysctl utility can be used to look at, and in some cases alter, system
parameters. There are far too many parameters to show them all here, so as a
first example, here is a simple use of sysctl to look at the system PATH
variable:
 1199: 
 1200:     $ sysctl user.cs_path
 1201:     user.cs_path = /usr/bin:/bin:/usr/sbin:/sbin:/usr/pkg/bin:/usr/pkg/sbin:/usr/local/bin:/usr/local/sbin
 1202: 
Fairly simple. Now for something that is actually related to performance. As an
example, let's say a system with many users is running out of open files; by
examining, and perhaps raising, the `kern.maxfiles` parameter the problem may
be fixed. But first, a look at the current value:
 1207: 
 1208:     $ sysctl kern.maxfiles
 1209:     kern.maxfiles = 1772
 1210: 
Now, to change it, run sysctl as root with the `-w` option:
 1212: 
 1213:     # sysctl -w kern.maxfiles=1972
 1214:     kern.maxfiles: 1772 -> 1972
 1215: 
Note that when the system is rebooted, the old value will return. There are two
cures for this: first, modify that parameter in the kernel and recompile;
second (and simpler), add this line to `/etc/sysctl.conf`:
 1219: 
 1220:     kern.maxfiles=1972
 1221: 
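Several settings can be collected in `/etc/sysctl.conf`, which is applied at
boot by the `/etc/rc.d/sysctl` script. A sketch (the second entry and its value
are purely illustrative, not a recommendation):

```
# /etc/sysctl.conf -- one variable=value per line
kern.maxfiles=1972
# illustrative only: a larger vnode cache for a busy file server
kern.maxvnodes=8192
```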
 1222: ### tmpfs & mfs
 1223: 
NetBSD's *ramdisk* implementations cache all data in RAM, and if that is full,
the swap space is used as backing store. NetBSD comes with two
 1226: implementations, the traditional BSD memory-based file system
 1227: [mfs](http://netbsd.gw.com/cgi-bin/man-cgi?mount_mfs+8+NetBSD-current)
 1228: and the more modern
 1229: [tmpfs](http://netbsd.gw.com/cgi-bin/man-cgi?mount_tmpfs+8+NetBSD-current).
 1230: While the former can only grow in size, the latter can also shrink if space is
 1231: no longer needed.
 1232: 
Deciding when to use a memory-based filesystem can be hard on large multi-user
systems. In some cases, however, it makes pretty good sense; for example, on a
development machine used by only one developer at a time, the obj directory or
some of the tmp directories for builds might be good candidates, provided the
machine has a fair amount of RAM. On the other side of the coin, if a system
has only 16 MB of RAM and `/var/tmp` is mfs-based, severe application issues
could occur.
 1240: 
The GENERIC kernel has both tmpfs and mfs enabled by default. To use one of
them for a particular directory, first determine which swap space you wish to
use as backing store. In the example case, a quick look in `/etc/fstab`
indicates that `/dev/wd0b` is the swap partition:
 1245: 
 1246:     mail% cat /etc/fstab
 1247:     /dev/wd0a / ffs rw 1 1
 1248:     /dev/wd0b none swap sw 0 0
 1249:     /kern /kern kernfs rw
 1250: 
This system is a mail server, so I only want to use tmpfs for `/tmp`; also, on
this particular system, I have linked `/tmp` to `/var/tmp` to save space (they
are on the same drive). All I need to do is add the following entry:
 1254: 
 1255:     /dev/wd0b /var/tmp tmpfs rw 0 0
 1256: 
If you want to use mfs instead of tmpfs, simply replace `tmpfs` with `mfs` in
the entry above.
 1258: 
 1259: Now, a word of warning: make sure said directories are empty and nothing is
 1260: using them when you mount the memory file system! After changing `/etc/fstab`,
 1261: you can either run `mount -a` or reboot the system.
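tmpfs can also be given a size cap through mount options in `/etc/fstab`; a
hedged sketch (the 32 MB figure is arbitrary; see mount_tmpfs(8) for the exact
option syntax):

```
tmpfs /var/tmp tmpfs rw,-s=32M 0 0
```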
 1262: 
 1263: ### Soft-dependencies
 1264: 
Soft-dependencies (softdeps) is a mechanism by which metadata is not written to
disk immediately, but is written in an ordered fashion, which keeps the
filesystem consistent in case of a crash. The main benefit of softdeps is
processing speed. Soft-dependencies have some sharp edges, so beware! Also note
that soft-dependencies are not present in any release after 5.x. See
[[Journaling|guide/tuning#journaling]] for information about WAPBL, which is
the replacement for soft-dependencies.
 1272: 
 1273: Soft-dependencies can be enabled by adding `softdep` to the filesystem options
 1274: in `/etc/fstab`. Let's look at an example of `/etc/fstab`:
 1275: 
 1276:     /dev/wd0a / ffs rw 1 1
 1277:     /dev/wd0b none swap sw 0 0
 1278:     /dev/wd0e /var ffs rw 1 2
 1279:     /dev/wd0f /tmp ffs rw 1 2
 1280:     /dev/wd0g /usr ffs rw 1 2
 1281: 
Suppose we want to enable soft-dependencies for all file systems except the
`/` partition. We would change it to:
 1284: 
 1285:     /dev/wd0a / ffs rw 1 1
 1286:     /dev/wd0b none swap sw 0 0
 1287:     /dev/wd0e /var ffs rw,softdep 1 2
 1288:     /dev/wd0f /tmp ffs rw,softdep 1 2
 1289:     /dev/wd0g /usr ffs rw,softdep 1 2
 1290: 
 1291: More information about softdep capabilities can be found on the
 1292: [author's page](http://www.mckusick.com/softdep/index.html).
 1293: 
 1294: ### Journaling
 1295: 
 1296: Journaling is a mechanism which puts written data in a so-called *journal*
 1297: first, and in a second step the data from the journal is written to disk. In the
 1298: event of a system crash, data that was not written to disk but that is in the
 1299: journal can be replayed, and will thus get the disk into a proper state. The
 1300: main effect of this is that no file system check (fsck) is needed after a rough
 1301: reboot. As of 5.0, NetBSD includes WAPBL, which provides journaling for FFS.
 1302: 
 1303: Journaling can be enabled by adding `log` to the filesystem options in
 1304: `/etc/fstab`. Here is an example which enables journaling for the root (`/`),
 1305: `/var`, and `/usr` file systems:
 1306: 
 1307:     /dev/wd0a /    ffs rw,log 1 1
 1308:     /dev/wd0e /var ffs rw,log 1 2
 1309:     /dev/wd0g /usr ffs rw,log 1 2
 1310: 
 1311: ### LFS
 1312: 
LFS, the log-structured filesystem, writes data to disk in a way that is
sometimes too aggressive and leads to congestion. To throttle writing, the
following sysctls can be used (the `vfs.lfs.stats.*` nodes are read-only
counters, useful for monitoring rather than tuning):
 1316: 
 1317:     vfs.sync.delay
 1318:     vfs.sync.filedelay
 1319:     vfs.sync.dirdelay
 1320:     vfs.sync.metadelay
 1321:     vfs.lfs.flushindir
 1322:     vfs.lfs.clean_vnhead
 1323:     vfs.lfs.dostats
 1324:     vfs.lfs.pagetrip
 1325:     vfs.lfs.stats.segsused
 1326:     vfs.lfs.stats.psegwrites
 1327:     vfs.lfs.stats.psyncwrites
 1328:     vfs.lfs.stats.pcleanwrites
 1329:     vfs.lfs.stats.blocktot
 1330:     vfs.lfs.stats.cleanblocks
 1331:     vfs.lfs.stats.ncheckpoints
 1332:     vfs.lfs.stats.nwrites
 1333:     vfs.lfs.stats.nsync_writes
 1334:     vfs.lfs.stats.wait_exceeded
 1335:     vfs.lfs.stats.write_exceeded
 1336:     vfs.lfs.stats.flush_invoked
 1337:     vfs.lfs.stats.vflush_invoked
 1338:     vfs.lfs.stats.clean_inlocked
 1339:     vfs.lfs.stats.clean_vnlocked
 1340:     vfs.lfs.stats.segs_reclaimed
 1341:     vfs.lfs.ignore_lazy_sync
 1342: 
 1343: Besides tuning those parameters, disabling write-back caching on
 1344: [wd(4)](http://netbsd.gw.com/cgi-bin/man-cgi?wd+4+NetBSD-5.0.1+i386) devices may
 1345: be beneficial. See the
 1346: [dkctl(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dkctl+8+NetBSD-5.0.1+i386) man
 1347: page for details.
 1348: 
 1349: More is available in the NetBSD mailing list archives. See
 1350: [this](http://mail-index.NetBSD.org/tech-perform/2007/04/01/0000.html) and
 1351: [this](http://mail-index.NetBSD.org/tech-perform/2007/04/01/0001.html) mail.
 1352: 
 1353: ## Kernel Tuning
 1354: 
While many system parameters can be changed with sysctl, and many improvements
can be achieved with enhanced system software, careful system layout, and
service management (moving services in and out of inetd, for example), tuning
the kernel will yield better performance as well, even if the gain appears
marginal.
 1360: 
 1361: ### Preparing to Recompile a Kernel
 1362: 
First, get the kernel sources for the release as described in
[[Obtaining the sources|guide/fetch]]; reading
[[Compiling the kernel|guide/kernel]] for more information on building the
kernel is recommended. Note that this document can also be used for -current
tuning; however, the [[Tracking -current|tracking_current]] documentation
should be read first, as much of the information there is repeated here.
 1370: 
 1371: ### Configuring the Kernel
 1372: 
Configuring a kernel in NetBSD can be daunting because of multi-line
dependencies within the configuration file itself. However, this method has a
benefit: all it really takes to configure a new kernel is an ASCII editor and
some dmesg output. The kernel configuration file is under
`src/sys/arch/ARCH/conf`, where ARCH is your architecture (for example, on a
SPARC it would be under `src/sys/arch/sparc/conf`).
 1379: 
 1380: After you have located your kernel config file, copy it and remove (comment out)
 1381: all the entries you don't need. This is where
 1382: [dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
 1383: becomes your friend. A clean
[dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386) output
 1385: will show all of the devices detected by the kernel at boot time. Using
 1386: [dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
 1387: output, the device options really needed can be determined.
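One way to turn dmesg output into a checklist of drivers to keep is a small
pipeline; a sketch (the sample lines and the `dmesg.sample` file name are made
up for illustration):

```shell
# Pull the driver name (the letters before the unit number and colon)
# out of each attachment line, then de-duplicate.
cat > dmesg.sample <<'EOF'
cpu0: Intel Pentium II/Celeron (Deschutes) (686-class), 400.93 MHz
ex0: interrupting at irq 10
cd0: 32-bit data port
EOF
sed -n 's/^\([a-z]*\)[0-9]*:.*/\1/p' dmesg.sample | sort -u
```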
 1388: 
 1389: #### Some example Configuration Items
 1390: 
 1391: In this example, an ftp server's kernel is being reconfigured to run with the
 1392: bare minimum drivers and options and any other items that might make it run
 1393: faster (again, not necessarily smaller, although it will be). The first thing to
 1394: do is take a look at some of the main configuration items. So, in
`/usr/src/sys/arch/i386/conf` the GENERIC file is copied to FTP, and then the
FTP file is edited.
 1397: 
At the start of the file there are a bunch of options beginning with maxusers,
which will be left alone; however, on larger multi-user systems it might help
to crank that value up a bit. Next is CPU support. Looking at the dmesg output,
this is seen:
 1402: 
 1403:     cpu0: Intel Pentium II/Celeron (Deschutes) (686-class), 400.93 MHz
 1404: 
This indicates that only the `I686_CPU` option needs to be used. In the next
section, all options are left alone except `PIC_DELAY`, which is recommended
unless it is an older machine. In this case it is enabled since the 686 is
*relatively new*.
 1409: 
From the last section all the way down to the compat options there was really
no need to change anything on this particular system. In the compat section,
however, there are several options that do not need to be enabled; again,
because this machine is strictly an FTP server, all compat options were turned
off.
 1415: 
The next section is file systems; again, for this server very few need to be
enabled. The following were left on:
 1418: 
 1419:     # File systems
 1420:     file-system     FFS             # UFS
 1421:     file-system     LFS             # log-structured file system
 1422:     file-system     MFS             # memory file system
 1423:     file-system     CD9660          # ISO 9660 + Rock Ridge file system
 1424:     file-system     FDESC           # /dev/fd
 1425:     file-system     KERNFS          # /kern
 1426:     file-system     NULLFS          # loopback file system
 1427:     file-system     PROCFS          # /proc
 1428:     file-system     UMAPFS          # NULLFS + uid and gid remapping
 1429:     ...
 1430:     options         SOFTDEP         # FFS soft updates support.
 1431:     ...
 1432: 
 1433: Next comes the network options section. The only options left on were:
 1434: 
 1435:     options         INET            # IP + ICMP + TCP + UDP
 1436:     options         INET6           # IPV6
 1437:     options         IPFILTER_LOG    # ipmon(8) log support
 1438: 
 1439: `IPFILTER_LOG` is a nice one to have around since the server will be running
 1440: ipf.
 1441: 
The next section covers verbose messages for various subsystems; since this
machine is already running and had no major problems, all of them are commented
out.
 1444: 
 1445: #### Some Drivers
 1446: 
The configurable items in the config file are relatively few and easy to
cover; device drivers, however, are a different story. In the following
examples, two drivers are examined and their associated *areas* in the file
trimmed down. First, a small example: the cdrom appears in dmesg as the
following lines:
 1451: 
 1452:     ...
 1453:     cd0 at atapibus0 drive 0: <CD-540E, , 1.0A> type 5 cdrom removable
 1454:     cd0: 32-bit data port
 1455:     cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
 1456:     pciide0: secondary channel interrupting at irq 15
    cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
 1458:     ...
 1459: 
Now it is time to track that section down in the configuration file. Notice
that the `cd` drive is on an atapibus and requires pciide support. The section
of interest in this case is the kernel config's "IDE and related devices"
section. It is worth noting that in and around the IDE section there are also
ISA, PCMCIA, etc. entries; on this machine there are no PCMCIA devices in the
[dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
output, so it stands to reason that all PCMCIA references can be removed. But
first, the `cd` drive.
 1468: 
 1469: At the start of the IDE section is the following:
 1470: 
 1471:     ...
 1472:     wd*     at atabus? drive ? flags 0x0000
 1473:     ...
 1474:     atapibus* at atapi?
 1475:     ...
 1476: 
 1477: Well, it is pretty obvious that those lines need to be kept. Next is this:
 1478: 
 1479:     ...
 1480:     # ATAPI devices
 1481:     # flags have the same meaning as for IDE drives.
 1482:     cd*     at atapibus? drive ? flags 0x0000       # ATAPI CD-ROM drives
 1483:     sd*     at atapibus? drive ? flags 0x0000       # ATAPI disk drives
 1484:     st*     at atapibus? drive ? flags 0x0000       # ATAPI tape drives
 1485:     uk*     at atapibus? drive ? flags 0x0000       # ATAPI unknown
 1486:     ...
 1487: 
The only device type present in the
[dmesg(8)](http://netbsd.gw.com/cgi-bin/man-cgi?dmesg+8+NetBSD-5.0.1+i386)
output was the cd; the rest can be commented out.
 1491: 
 1492: The next example is slightly more difficult, network interfaces. This machine
 1493: has two of them:
 1494: 
 1495:     ...
 1496:     ex0 at pci0 dev 17 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x64)
 1497:     ex0: interrupting at irq 10
 1498:     ex0: MAC address 00:50:04:83:ff:b7
 1499:     UI 0x001018 model 0x0012 rev 0 at ex0 phy 24 not configured
 1500:     ex1 at pci0 dev 19 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
 1501:     ex1: interrupting at irq 11
 1502:     ex1: MAC address 00:50:da:63:91:2e
 1503:     exphy0 at ex1 phy 24: 3Com internal media interface
 1504:     exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 1505:     ...
 1506: 
At first glance it may appear that there are in fact three devices; however, a
closer look at this line:
 1509: 
 1510:     exphy0 at ex1 phy 24: 3Com internal media interface
 1511: 
reveals that there are only two physical cards. As with the cdrom, simply
removing the names that are not in dmesg will do the job. At the beginning of
the network interfaces section is:
 1515: 
 1516:     ...
 1517:     # Network Interfaces
 1518:     
 1519:     # PCI network interfaces
 1520:     an*     at pci? dev ? function ?        # Aironet PC4500/PC4800 (802.11)
 1521:     bge*    at pci? dev ? function ?        # Broadcom 570x gigabit Ethernet
 1522:     en*     at pci? dev ? function ?        # ENI/Adaptec ATM
 1523:     ep*     at pci? dev ? function ?        # 3Com 3c59x
 1524:     epic*   at pci? dev ? function ?        # SMC EPIC/100 Ethernet
 1525:     esh*    at pci? dev ? function ?        # Essential HIPPI card
 1526:     ex*     at pci? dev ? function ?        # 3Com 90x[BC]
 1527:     ...
 1528: 
There is the ex device, so all of the rest under the PCI section can be
removed. Additionally, every single line all the way down to this one:
 1531: 
 1532:     exphy*  at mii? phy ?                   # 3Com internal PHYs
 1533: 
can be commented out, as can the remaining lines after it (the `exphy` line
itself stays, since exphy0 appeared in the dmesg output).
 1535: 
 1536: #### Multi Pass
 1537: 
When I tune a kernel, I like to do it remotely in an X session: the dmesg
output in one window, the config file in the other. It can sometimes take a few
passes to rebuild a very trimmed kernel, since it is easy to accidentally
remove dependencies.
 1542: 
 1543: ### Building the New Kernel
 1544: 
 1545: Now it is time to build the kernel and put it in place. In the conf directory on
 1546: the ftp server, the following command prepares the build:
 1547: 
 1548:     $ config FTP
 1549: 
When it is done, a message reminding me to run `make depend` is displayed.
Next:
 1551: 
 1552:     $ cd ../compile/FTP
 1553:     $ make depend && make
 1554: 
When it is done, I back up the old kernel and drop the new one in place:
 1556: 
 1557:     # cp /netbsd /netbsd.orig
 1558:     # cp netbsd /
 1559: 
 1560: Now reboot. If the kernel cannot boot, stop the boot process when prompted and
 1561: type `boot netbsd.orig` to boot from the previous kernel.
 1562: 
 1563: ### Shrinking the NetBSD kernel
 1564: 
When building a kernel for embedded systems, it is often necessary to modify
the kernel binary to reduce its disk or memory footprint.
 1567: 
 1568: #### Removing ELF sections and debug information
 1569: 
We already know how to remove kernel support for drivers and options that you
don't need, thus saving memory and space, but you can save some kilobytes of
space by removing debugging symbols and two ELF sections, if you don't need
them: `.comment` and `.ident`. They are used for storing RCS strings, viewable
with
[ident(1)](http://netbsd.gw.com/cgi-bin/man-cgi?ident+1+NetBSD-5.0.1+i386), and
a [gcc(1)](http://netbsd.gw.com/cgi-bin/man-cgi?gcc+1+NetBSD-5.0.1+i386)
version string. The following examples assume you have your `TOOLDIR` under
`/usr/src/tooldir.NetBSD-2.0-i386` and that the target architecture is `i386`.
 1578: 
 1579:     $ /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-objdump -h /netbsd
 1580:     
 1581:     /netbsd:     file format elf32-i386
 1582:     
 1583:     Sections:
 1584:     Idx Name          Size      VMA       LMA       File off  Algn
 1585:       0 .text         0057a374  c0100000  c0100000  00001000  2**4
 1586:                       CONTENTS, ALLOC, LOAD, READONLY, CODE
 1587:       1 .rodata       00131433  c067a380  c067a380  0057b380  2**5
 1588:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1589:       2 .rodata.str1.1 00035ea0  c07ab7b3  c07ab7b3  006ac7b3  2**0
 1590:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1591:       3 .rodata.str1.32 00059d13  c07e1660  c07e1660  006e2660  2**5
 1592:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1593:       4 link_set_malloc_types 00000198  c083b374  c083b374  0073c374  2**2
 1594:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1595:       5 link_set_domains 00000024  c083b50c  c083b50c  0073c50c  2**2
 1596:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1597:       6 link_set_pools 00000158  c083b530  c083b530  0073c530  2**2
 1598:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1599:       7 link_set_sysctl_funcs 000000f0  c083b688  c083b688  0073c688  2**2
 1600:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1601:       8 link_set_vfsops 00000044  c083b778  c083b778  0073c778  2**2
 1602:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1603:       9 link_set_dkwedge_methods 00000004  c083b7bc  c083b7bc  0073c7bc  2**2
 1604:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1605:      10 link_set_bufq_strats 0000000c  c083b7c0  c083b7c0  0073c7c0  2**2
 1606:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1607:      11 link_set_evcnts 00000030  c083b7cc  c083b7cc  0073c7cc  2**2
 1608:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1609:      12 .data         00048ae4  c083c800  c083c800  0073c800  2**5
 1610:                       CONTENTS, ALLOC, LOAD, DATA
 1611:      13 .bss          00058974  c0885300  c0885300  00785300  2**5
 1612:                       ALLOC
 1613:      14 .comment      0000cda0  00000000  00000000  00785300  2**0
 1614:                       CONTENTS, READONLY
 1615:      15 .ident        000119e4  00000000  00000000  007920a0  2**0
 1616:                       CONTENTS, READONLY
 1617: 
In the third column we can see the size of each section in hexadecimal. By
summing the `.comment` and `.ident` sizes we know how much we're going to save
by removing them: around 120 KB (0xcda0 + 0x119e4 = 52640 + 72164 bytes). To
remove the sections, and any debugging symbols that may be present, we're going
to use
[strip(1)](http://netbsd.gw.com/cgi-bin/man-cgi?strip+1+NetBSD-5.0.1+i386):
 1623: 
 1624:     # cp /netbsd /netbsd.orig
 1625:     # /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-strip -S -R .ident -R .comment /netbsd
 1626:     # ls -l /netbsd /netbsd.orig
 1627:     -rwxr-xr-x  1 root  wheel  8590668 Apr 30 15:56 netbsd
 1628:     -rwxr-xr-x  1 root  wheel  8757547 Apr 30 15:56 netbsd.orig
 1629: 
 1630: Since we also removed debugging symbols, the total amount of disk space saved is
 1631: around 160KB.
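The hexadecimal arithmetic quoted above is easy to check, since the shell
accepts hex constants directly:

```shell
# .comment (0xcda0) and .ident (0x119e4) sizes from the objdump listing
printf '%d + %d = %d bytes\n' $((0xcda0)) $((0x119e4)) $((0xcda0 + 0x119e4))
# prints: 52640 + 72164 = 124804 bytes
```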
 1632: 
 1633: #### Compressing the Kernel
 1634: 
On some architectures, the bootloader can boot a compressed kernel. You can
save several megabytes of disk space by using this method, but the bootloader
will take longer to load the kernel.
 1638: 
 1639:     # cp /netbsd /netbsd.plain
 1640:     # gzip -9 /netbsd
 1641: 
 1642: To see how much space we've saved:
 1643: 
 1644:     $ ls -l /netbsd.plain /netbsd.gz
 1645:     -rwxr-xr-x  1 root  wheel  8757547 Apr 29 18:05 /netbsd.plain
 1646:     -rwxr-xr-x  1 root  wheel  3987769 Apr 29 18:05 /netbsd.gz
 1647: 
Note that only gzip compression, via
[gzip(1)](http://netbsd.gw.com/cgi-bin/man-cgi?gzip+1+NetBSD-5.0.1+i386), can
be used; bzip2 is not supported by the NetBSD bootloaders!
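From the `ls -l` sizes above, the saving works out to a bit more than half the
original kernel size:

```shell
# compressed vs. original kernel size, as a percentage saved
awk 'BEGIN { printf "%.1f%% saved\n", (1 - 3987769 / 8757547) * 100 }'
# prints: 54.5% saved
```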
 1651: 
