File: [NetBSD Developer Wiki] / wikisrc / guide / tuning.mdwn
Revision 1.3, Fri Jun 19 19:18:31 2015 UTC, by plunky: replace direct links to manpages on netbsd.gw.com with templates

**Contents**

[[!toc levels=3]]

# Tuning NetBSD

## Introduction

### Overview

This section covers a variety of performance tuning topics, spanning tuning
from the perspective of the system administrator to that of the systems
programmer. The art of performance tuning itself is very old. To tune something
means to make it operate more efficiently: whether one is referring to a
NetBSD server or a vacuum cleaner, the goal is to improve something, be that
the way something is done, how it works, or how it is put together.

#### What is Performance Tuning?

A view from 10,000 feet shows that pretty much everything we do is task
oriented, and that applies to a NetBSD system as well. When the system boots, it
automatically begins to perform a variety of tasks. When a user logs in, they
usually have a wide variety of tasks to accomplish. In the scope of
these documents, however, performance tuning strictly means improving how
efficiently a NetBSD system performs its tasks.

The most common thought that crops up when someone thinks "tuning" is some sort
of speed increase, or decreasing the size of the kernel - while those are ways
to improve performance, they are not the only avenues an administrator may have
to take to increase efficiency. For our purposes, performance tuning means
this: *to make a NetBSD system operate in an optimum state.*

That could mean a variety of things, not necessarily speed enhancements. A good
example is filesystem formatting parameters: on a system that has a lot of
small files (say, a source repository), an administrator may need to increase
the number of inodes by lowering the bytes-per-inode ratio (say, down to 1024
bytes of data space per inode). The larger number of inodes keeps the
administrator from seeing those nasty "out of inodes" messages, which
ultimately makes the system more efficient.

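Inode usage can be checked before deciding to reformat; the following sketch is
illustrative only, and the device name `wd0e` is an assumption:

```shell
# Show inode usage per filesystem; a filesystem that is "out of inodes"
# shows a high inode-used percentage while data blocks remain free.
df -i

# Hypothetical (and destructive!) example: re-create a filesystem with
# one inode per 1024 bytes of data space, yielding far more inodes.
# The device name is purely illustrative.
# newfs -i 1024 /dev/rwd0e
```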
Tuning normally revolves around finding and eliminating bottlenecks. Most of
the time, such bottlenecks are transient: for example, a release of Mozilla
that does not quite handle Java applets well can cause Mozilla to start
crunching the CPU, especially on poorly written applets. Occasions when
processes seem to spin off into nowhere and eat CPU are almost always resolved
with a kill. There are instances, however, when resolving a bottleneck takes a
lot longer: say, an rsync server whose data set just keeps growing.
Performance slowly fades, and the administrator may have to take some sort of
action to speed things up; still, that situation is gradual, as opposed to an
emergency such as an instantly spiked CPU.

#### When does one tune?

Many NetBSD users rarely have to tune a system. The GENERIC kernel may run just
fine and the layout/configuration of the system may do the job as well. By the
same token, as a practical matter it is always good to know how to tune a
system. Most often tuning comes as a result of a sudden bottleneck (which may
occur randomly) or a gradual loss of performance. In a sense it happens to
everyone at some point: one process eating the CPU is as much a bottleneck as a
gradual increase in paging. So, the question should not be so much when to tune
as when to learn to tune.

One last time to tune: if you can tune in a preventive manner (and you think
you might need to), then do it. One example of this was a system that needed
to be able to reboot quickly. Instead of waiting, I did everything I could to
trim the kernel and make sure there was absolutely nothing running that was not
needed; I even removed drivers for devices that were present but never used
(lp). The result was a reboot time reduced by nearly two-thirds. In the long
run, it was a smart move to tune it before it became an issue.

#### What these Documents Will Not Cover

Before wrapping up the introduction, it is important to note what these
documents will not cover. This guide pertains only to the core NetBSD
system. In other words, it will not cover tuning a web server's configuration
to make it run better, though it might mention how to tune NetBSD to run better
as a web server. The logic behind this is simple: web servers, database
software and the like are third party and almost limitless; it would be easy to
get mired in details that do not apply to the NetBSD system. Almost all third
party software has its own tuning documentation anyhow.

#### How Examples are Laid Out

Since there is ample man page documentation, only the options and arguments
used in the examples are discussed. In some cases, material is truncated for
brevity and not thoroughly discussed because, quite simply, there is too much.
For example, not every single device driver entry in the kernel will be
discussed, but an example of determining whether or not a given system needs
one will be. Nothing in this guide is concrete; tuning and performance are very
subjective. Instead, it is a guide for the reader to learn what some of the
available tools can do.

## Tuning Considerations

Tuning a system is not really too difficult when the approach is pro-active.
This document approaches tuning from a *before it comes up* perspective. While
tuning in spare time is considerably easier than, say, on a server that is
almost completely bogged down to 0.1% idle time, there are still a few things
that should be mulled over before actually tuning - ideally, before a system is
even installed.

### General System Configuration

Of course, how the system is set up makes a big difference. Sometimes small
items are overlooked which may in fact cause some sort of long term performance
problem.

#### Filesystems and Disks

How the filesystem is laid out relative to the disk drives is very important.
On hardware RAID systems it is not such a big deal, but many NetBSD users
specifically run NetBSD on older hardware where hardware RAID simply is not an
option. The idea of `/` being close to the first drive is a good one, but if
there are several drives that could be the first one, is the best performing
drive the one that `/` will be on? On a related note, is it wise to split off
`/usr`? Will the system see heavy usage in, say, `/usr/pkgsrc`? It might make
sense to slap a fast drive in and mount it under `/usr/pkgsrc`, or it might
not. Like all things in performance tuning, this is subjective.

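If a dedicated fast drive for `/usr/pkgsrc` does make sense, the split is just
an `/etc/fstab` entry; the device names below are illustrative assumptions, not
a recommendation:

```
# /etc/fstab -- hypothetical layout: root on the first system disk,
# a second, faster drive dedicated to pkgsrc work
/dev/wd0a  /            ffs  rw  1 1
/dev/wd1a  /usr/pkgsrc  ffs  rw  1 2
```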
#### Swap Configuration

There are three schools of thought on swap size, and about fifty on using split
swap files with prioritizing and how that should be done. In the swap size
arena, the vendor schools (at least most commercial ones) usually have their
own formula per OS. As an example, on a particular version of HP-UX with a
particular version of Oracle the formula was:

2.5 GB \* number\_of\_processors

Of course, that all really depends on what type of usage the database sees and
how large it is; for instance, if it is so large that it must be distributed,
that formula does not fit well.

The next school of thought about swap sizing is sort of strange but makes some
sense. It says: if possible, get a reference amount of memory used by the
system. It goes something like this:

 1. Start up a machine and estimate total memory needs by running everything
    that may ever be needed at once - databases, web servers, whatever. Total
    up the amount.
 2. Add a few MB for padding.
 3. Subtract the amount of physical RAM from this total.

If the amount left over is 3 times the size of physical RAM, consider getting
more RAM. The problem, of course, is figuring out what is needed and how much
space it will take. There is another flaw in this method as well: some programs
do not behave. A glaring example of misbehaved software is web browsers. On
certain versions of Netscape, when something went wrong it had a tendency to
run away and eat swap space. So, the more spare space available, the more time
to kill it.

Last but not least is the tried and true PHYSICAL\_RAM \* 2 method. On modern
machines, and even older ones (with limited purpose, of course), this seems to
work best.

All in all, it is hard to tell when swapping will start. Even on small 16MB RAM
machines (and less), NetBSD has always worked well for most people unless
misbehaving software is running.

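The estimation method above is plain arithmetic; here is a throwaway sketch
with made-up numbers (all figures are assumptions for illustration):

```shell
# Step 1: total memory needed with everything running at once (MB)
total_needed=1280
# Step 2: a few MB of padding
padding=64
# Amount of physical RAM in the machine (MB)
physical_ram=512

# Step 3: whatever is left over must be covered by swap
swap=$(( total_needed + padding - physical_ram ))
echo "suggested swap: ${swap}MB"    # prints: suggested swap: 832MB

# The rule of thumb: if the leftover exceeds three times physical RAM,
# consider buying memory instead of disk.
if [ "$swap" -gt $(( 3 * physical_ram )) ]; then
    echo "consider adding RAM"
fi
```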
### System Services

On servers, system services have a large impact. Getting them to run at their
best almost always requires some sort of network level change or a fundamental
speed increase in the underlying system (which, of course, is what this is all
about). There are instances, though, when some simple solutions can improve
services. One example: an ftp server is becoming slower, and a new release of
the ftp server that ships with the system comes out which just happens to run
faster. By upgrading the ftp software, a performance boost is accomplished.

Another good example where services are concerned is the age old question: *to
use inetd or not to use inetd?* A great service example is POP3. POP3
connections can conceivably clog up inetd; while the POP3 service itself starts
to degrade slowly, other services multiplexed through inetd will also degrade
(in some cases more than POP3). Setting up POP3 to run outside of inetd, on its
own, may help.

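Moving a service out of inetd is mostly an edit to `/etc/inetd.conf`; the pop3
daemon path and flags below are illustrative assumptions:

```
# 1. Comment out the pop3 entry in /etc/inetd.conf:
#pop3  stream  tcp  nowait  root  /usr/pkg/libexec/popper  popper

# 2. Tell inetd to reread its configuration:
#    kill -HUP $(cat /var/run/inetd.pid)

# 3. Start the pop3 server standalone, in its own daemon mode
#    (the exact flag depends on the server software):
#    /usr/pkg/libexec/popper -d
```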
### The NetBSD Kernel

The NetBSD kernel obviously plays a key role in how well a system performs.
While rebuilding and tuning the kernel is covered later in the text, it is
worth discussing here from a high level.

Tuning the NetBSD kernel really involves three main areas:

 1. removing unrequired drivers
 2. configuring options
 3. system settings

#### Removing Unrequired Drivers

Taking drivers that are not needed out of the kernel achieves several results:
first, the system boots faster since the kernel is smaller; second, again
because the kernel is smaller, more memory is free to users and processes; and
third, the kernel tends to respond quicker.

#### Configuring Options

Configuring options, such as enabling or disabling certain subsystems, specific
hardware and filesystems, can also improve performance in much the same way
that removing unrequired drivers does. A very simple example is an FTP server
that only hosts ftp files - nothing else. On this particular server there is no
need for anything but native filesystem support, plus perhaps a few options to
help speed things along. Why would it ever need NTFS support, for example?
Besides, if it did, support for NTFS could be added at some later time. In the
opposite case, a workstation may need to support many different filesystem
types to share and access files.

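In a kernel configuration file this amounts to which `file-system` lines are
left uncommented; the excerpt below is a hypothetical sketch for such an
FTP-only server:

```
# Hypothetical kernel config excerpt for an FTP-only server
file-system     FFS             # native Fast File System: required
#file-system    NTFS            # foreign filesystems commented out
#file-system    MSDOSFS
#file-system    CD9660
```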
#### System Settings

System-wide settings are controlled by the kernel. A few examples are
filesystem settings, network settings, and core kernel settings such as the
maximum number of processes. Almost all system settings can be at least
examined, and usually modified, via the sysctl facility. Examples using the
sysctl facility are given later on.

## Visual Monitoring Tools

NetBSD ships a variety of performance monitoring tools with the system. Most of
these tools are common to all UNIX systems. In this section some example usage
of the tools is given, with interpretation of the output.

### The top Process Monitor

The [[!template id=man name="top" section="1"]]
monitor does exactly what it says: it displays the CPU hogs on the
system. To run the monitor, simply type `top` at the prompt. Without any
arguments, it should look like:

    load averages:  0.09,  0.12,  0.08                                     20:23:41
    21 processes:  20 sleeping, 1 on processor
    CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Memory: 15M Act, 1104K Inact, 208K Wired, 22M Free, 129M Swap free
    
      PID USERNAME PRI NICE   SIZE   RES STATE     TIME   WCPU    CPU COMMAND
    13663 root       2    0  1552K 1836K sleep     0:08  0.00%  0.00% httpd
      127 root      10    0   129M 4464K sleep     0:01  0.00%  0.00% mount_mfs
    22591 root       2    0   388K 1156K sleep     0:01  0.00%  0.00% sshd
      108 root       2    0   132K  472K sleep     0:01  0.00%  0.00% syslogd
    22597 jrf       28    0   156K  616K onproc    0:00  0.00%  0.00% top
    22592 jrf       18    0   828K 1128K sleep     0:00  0.00%  0.00% tcsh
      203 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      205 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      206 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      208 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      207 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
    13667 nobody     2    0  1660K 1508K sleep     0:00  0.00%  0.00% httpd
     9926 root       2    0   336K  588K sleep     0:00  0.00%  0.00% sshd
      200 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd
      182 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
      180 root       2    0    92K  436K sleep     0:00  0.00%  0.00% portsentry
    13666 nobody    -4    0  1600K 1260K sleep     0:00  0.00%  0.00% httpd

The top(1) utility is great for finding CPU hogs, runaway processes, or groups
of processes that may be causing problems. The output shown above indicates
that this particular system is in good health. The next display, however, shows
some very different results:

    load averages:  0.34,  0.16,  0.13                                     21:13:47
    25 processes:  24 sleeping, 1 on processor
    CPU states:  0.5% user,  0.0% nice,  9.0% system,  1.0% interrupt, 89.6% idle
    Memory: 20M Act, 1712K Inact, 240K Wired, 30M Free, 129M Swap free
    
      PID USERNAME PRI NICE   SIZE   RES STATE     TIME   WCPU    CPU COMMAND
     5304 jrf       -5    0    56K  336K sleep     0:04 66.07% 19.53% bonnie
     5294 root       2    0   412K 1176K sleep     0:02  1.01%  0.93% sshd
      108 root       2    0   132K  472K sleep     1:23  0.00%  0.00% syslogd
      187 root       2    0  1552K 1824K sleep     0:07  0.00%  0.00% httpd
     5288 root       2    0   412K 1176K sleep     0:02  0.00%  0.00% sshd
     5302 jrf       28    0   160K  620K onproc    0:00  0.00%  0.00% top
     5295 jrf       18    0   828K 1116K sleep     0:00  0.00%  0.00% tcsh
     5289 jrf       18    0   828K 1112K sleep     0:00  0.00%  0.00% tcsh
      127 root      10    0   129M 8388K sleep     0:00  0.00%  0.00% mount_mfs
      204 root      10    0   220K  424K sleep     0:00  0.00%  0.00% cron
        1 root      10    0   312K  192K sleep     0:00  0.00%  0.00% init
      208 root       3    0    48K  432K sleep     0:00  0.00%  0.00% getty
      210 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      209 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      211 root       3    0    48K  424K sleep     0:00  0.00%  0.00% getty
      217 nobody     2    0  1616K 1272K sleep     0:00  0.00%  0.00% httpd
      184 root       2    0   336K  580K sleep     0:00  0.00%  0.00% sshd
      201 root       2    0    76K  456K sleep     0:00  0.00%  0.00% inetd

At first glance it should seem rather obvious which process is hogging the
system; what is interesting in this case is why. The bonnie program is a disk
benchmark tool which can write large files in a variety of sizes and ways. What
the previous output indicates is only that bonnie is a CPU hog, not why.

#### Other Neat Things About Top

A careful examination of the manual page
[[!template id=man name="top" section="1"]] shows
that there is a lot more that can be done with top; for example, processes can
have their priority changed and be killed. Additionally, filters can be set for
looking at processes.

### The systat utility

As the man page
[[!template id=man name="systat" section="1"]]
indicates, the systat utility shows a variety of system statistics using the
curses library. While it is running, the screen is shown in two parts: the
upper window shows the current load average, while the lower window depends on
user commands. The exception to the split window view is the vmstat display,
which takes up the whole screen. Here is what systat looks like on a fairly
idle system when invoked with no arguments:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |
    
                             /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
                      <idle> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Basically a lot of dead time there, so now have a look with an argument
provided, in this case `systat inet.tcp`, which looks like this:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |
    
            0 connections initiated           19 total TCP packets sent
            0 connections accepted            11   data
            0 connections established          0   data (retransmit)
                                               8   ack-only
            0 connections dropped              0   window probes
            0   in embryonic state             0   window updates
            0   on retransmit timeout          0   urgent data only
            0   by keepalive                   0   control
            0   by persist
                                              29 total TCP packets received
           11 potential rtt updates           17   in sequence
           11 successful rtt updates           0   completely duplicate
            9 delayed acks sent                0   with some duplicate data
            0 retransmit timeouts              4   out of order
            0 persist timeouts                 0   duplicate acks
            0 keepalive probes                11   acks
            0 keepalive timeouts               0   window probes
                                               0   window updates

Now that is informative. The first poll is cumulative, so it is possible to
see quite a lot of information in the output when systat is invoked. Now, while
that may be interesting, how about a look at the buffer cache with `systat
bufcache`:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average
    
    There are 1642 buffers using 6568 kBytes of memory.
    
    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          877  53        6171  93        6516  99      94
    /var/tmp                     5   0          17   0          28   0      60
    
    Total:                     882  53        6188  94        6544  99

Again, a pretty boring system, but great information to have available. While
this is all nice to look at, it is time to put an artificial load on the system
to see how systat can be used as a performance monitoring tool. As with top,
bonnie++ will be used to put a high load on the I/O subsystems and a little on
the CPU. The bufcache is examined again to see if there are any noticeable
differences:

                        /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
         Load Average   |||
    
    There are 1642 buffers using 6568 kBytes of memory.
    
    File System          Bufs used   %   kB in use   %  Bufsize kB   %  Util %
    /                          811  49        6422  97        6444  98      99
    
    Total:                     811  49        6422  97        6444  98

First, notice that the load average shot up; this is to be expected, of course.
Then, while most of the numbers are close, notice that utilization is at 99%.
Throughout the time that bonnie++ was running, the utilization percentage
remained at 99. This of course makes sense here, but in a real troubleshooting
situation it could be indicative of a process doing heavy I/O on one particular
file or filesystem.

## Monitoring Tools

In addition to screen oriented monitors and tools, the NetBSD system also ships
with a set of command line oriented tools. Many of the tools that ship with a
NetBSD system can be found on other UNIX and UNIX-like systems.

### fstat

The [[!template id=man name="fstat" section="1"]]
utility reports the status of open files on the system. While it is not what
many administrators consider a performance monitor, it can help find out
whether a particular user or process is using an inordinate number of files,
generating large files, and similar information.

Following is a sample of some fstat output:

    USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
    jrf      tcsh       21607   wd /         29772 drwxr-xr-x     512 r
    jrf      tcsh       21607    3* unix stream c057acc0<-> c0553280
    jrf      tcsh       21607    4* unix stream c0553280 <-> c057acc0
    root     sshd       21597   wd /             2 drwxr-xr-x     512 r
    root     sshd       21597    0 /         11921 crw-rw-rw-    null rw
    nobody   httpd       5032   wd /             2 drwxr-xr-x     512 r
    nobody   httpd       5032    0 /         11921 crw-rw-rw-    null r
    nobody   httpd       5032    1 /         11921 crw-rw-rw-    null w
    nobody   httpd       5032    2 /         15890 -rw-r--r--  353533 rw
    ...

The fields are pretty self explanatory. Again, while this tool is not as
performance oriented as others, it can come in handy when trying to track down
information about file usage.

### iostat

The [[!template id=man name="iostat" section="8"]]
command does exactly what it sounds like: it reports the status of the I/O
subsystems on the system. When iostat is employed, the user typically runs it
with a count and an interval between samples, like so:

    $ iostat 5 5
          tty            wd0             cd0             fd0             md0             cpu
     tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
       0    1  5.13   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   54  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   18  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   18  8.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   28  0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100

The above output is from a very quiet ftp server. The fields represent the
various I/O devices: the tty (which, ironically, is the most active because
iostat is running), wd0, the primary IDE disk, cd0, the CD-ROM drive, fd0, the
floppy, and md0, the memory filesystem.

Now, let's see if we can pummel the system with some heavy usage: a large ftp
transfer consisting of a tarball of the netbsd-current source, with the
`bonnie++` disk benchmark program running at the same time.

    $ iostat 5 5
          tty            wd0             cd0             fd0             md0             cpu
     tin tout  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
       0    1  5.68   1 0.00   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
       0   54 61.03 150 8.92   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 18  4 78
       0   26 63.14 157 9.71   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0 20  4 75
       0   20 43.58  26 1.12   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   0  0  9  2 88
       0   28 19.49  82 1.55   0.00   0 0.00   0.00   0 0.00   0.00   0 0.00   1  0  7  3 89

As can be expected, wd0 is very active. What is interesting about this output
is how the CPU's system and interrupt time rises in proportion to wd0's
activity. This makes perfect sense, but it can only be observed because this
ftp server is hardly being used. If, for example, the CPU were already under a
moderate load while the disk subsystem was under the same load as it is now, it
could appear that the CPU was the bottleneck when in fact it was the disk. This
illustrates that *one tool* is rarely enough to completely analyze a problem; a
quick glance at processes (after watching iostat) would probably tell us which
processes were causing problems.

### ps

Using the [[!template id=man name="ps" section="1"]]
(process status) command, a great deal of information about the system can be
discovered. Most of the time, the ps command is used to isolate a particular
process by name, group, owner, and so on. Invoked with no options or arguments,
ps simply prints out information about the processes of the user executing it.

    $ ps
      PID TT STAT    TIME COMMAND
    21560 p0 Is   0:00.04 -tcsh
    21564 p0 I+   0:00.37 ssh jrf.odpn.net
    21598 p1 Ss   0:00.12 -tcsh
    21673 p1 R+   0:00.00 ps
    21638 p2 Is+  0:00.06 -tcsh

Not very exciting. The fields are self explanatory with the exception of
`STAT`, which shows the state a process is in. The flags are all documented in
the man page; in the above example, `I` is idle, `S` is sleeping, `R` is
runnable, `+` means the process is in a foreground state, and `s` means the
process is a session leader. This all makes perfect sense when looking at the
flags: for example, PID 21560 is a shell, it is idle, and (as would be
expected) the shell is a session leader.

In most cases, someone is looking for something very specific in the process
listing. As an example, `-a` shows all processes with controlling terminals,
`-ax` adds those without controlling terminals, and `aux` gives a much more
verbose listing (basically everything, plus information about the impact
processes are having):

    # ps aux
    USER     PID %CPU %MEM    VSZ  RSS TT STAT STARTED    TIME COMMAND
    root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
    root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
    root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
    jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
    root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
    jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh jrf.odpn.net
    jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
    root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
    nobody  5032  0.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd
    ...

Again, most of the fields are self explanatory, with the exception of `VSZ` and
`RSS`, which can be a little confusing. `RSS` is the resident (real memory)
size of a process in 1024 byte units, while `VSZ` is the virtual size. This is
all great, but again, how can ps help? Well, for one, take a look at this
modified version of the same output:

    # ps aux
    USER     PID %CPU %MEM    VSZ  RSS TT STAT STARTED    TIME COMMAND
    root       0  0.0  9.6      0 6260 ?? DLs  16Jul02 0:01.00 (swapper)
    root   23362  0.0  0.8    144  488 ?? S    12:38PM 0:00.01 ftpd -l
    root   23328  0.0  0.4    428  280 p1 S    12:34PM 0:00.04 -csh
    jrf    23312  0.0  1.8    828 1132 p1 Is   12:32PM 0:00.06 -tcsh
    root   23311  0.0  1.8    388 1156 ?? S    12:32PM 0:01.60 sshd: jrf@ttyp1
    jrf    21951  0.0  1.7    244 1124 p0 S+    4:22PM 0:02.90 ssh jrf.odpn.net
    jrf    21947  0.0  1.7    828 1128 p0 Is    4:21PM 0:00.04 -tcsh
    root   21946  0.0  1.8    388 1156 ?? S     4:21PM 0:04.94 sshd: jrf@ttyp0
    nobody  5032  9.0  2.0   1616 1300 ?? I    19Jul02 0:00.02 /usr/pkg/sbin/httpd
    ...

Given that our baseline indicates a relatively quiet system, PID 5032 has an
unusually large `%CPU`. Sometimes this can also cause high `TIME` numbers. The
ps output can be searched by PID, username, and process name, and hence help
track down processes that may be experiencing problems.

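A couple of hedged examples of such filtering (the process name httpd is just
an example; the bracket trick keeps the grep command itself out of the
results):

```shell
# Show the column header, then any httpd processes (if none are
# running, say so instead of failing)
ps aux | head -n 1
ps aux | grep '[h]ttpd' || echo "no httpd processes"

# Or filter by the owning user at the start of the line
ps aux | grep '^nobody' || echo "no processes owned by nobody"
```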
### vmstat

Using
[[!template id=man name="vmstat" section="1"]],
information pertaining to virtual memory can be monitored and measured. Not
unlike iostat, vmstat can be invoked with a count and interval. Following is
some sample output using `5 5`, like the iostat example:

    # vmstat 5 5
     procs   memory     page                       disks         faults      cpu
     r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
     0 7 0 17716 33160    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
     0 7 0 17724 33156    2   0   0    0    0    0  1  0  0  0  109    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  1  0  0  0  105    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  107    6   3  0  0 100
     0 7 0 17724 33156    1   0   0    0    0    0  0  0  0  0  105    6   3  0  0 100

   542: Yet again, relatively quiet. For comparison, the exact same load that was put on
   543: this server in the iostat example will be used: a large file transfer plus the
   544: bonnie benchmark program.
  545: 
  546:     # vmstat 5 5
  547:      procs   memory     page                       disks         faults      cpu
  548:      r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
  549:      1 8 0 18880 31968    2   0   0    0    0    0  1  0  0  0  105   15   4  0  0 100
  550:      0 8 0 18888 31964    2   0   0    0    0    0 130  0  0  0 1804 5539 1094 31 22 47
  551:      1 7 0 18888 31964    1   0   0    0    0    0 130  0  0  0 1802 5500 1060 36 16 49
  552:      1 8 0 18888 31964    1   0   0    0    0    0 160  0  0  0 1849 5905 1107 21 22 57
  553:      1 7 0 18888 31964    1   0   0    0    0    0 175  0  0  0 1893 6167 1082  1 25 75
  554: 
   555: Just a little different. Notice that, since most of the work was I/O based, not
   556: much memory was actually used. Since this system uses mfs for `/tmp`, however,
   557: it can certainly get beat up. Have a look at this:
  558: 
  559:     # vmstat 5 5
  560:      procs   memory     page                       disks         faults      cpu
  561:      r b w   avm   fre  flt  re  pi   po   fr   sr w0 c0 f0 m0   in   sy  cs us sy id
  562:      0 2 0 99188   500    2   0   0    0    0    0  1  0  0  0  105   16   4  0  0 100
  563:      0 2 0111596   436  592   0 587  624  586 1210 624  0  0  0  741  883 1088  0 11 89
  564:      0 3 0123976   784  666   0 662  643  683 1326 702  0  0  0  828  993 1237  0 12 88
  565:      0 2 0134692  1236  581   0 571  563  595 1158 599  0  0  0  722  863 1066  0  9 90
  566:      2 0 0142860   912  433   0 406  403  405  808 429  0  0  0  552  602 768  0  7 93
  567: 
   568: Pretty scary stuff. That was created by running bonnie in `/tmp` on a
   569: memory-based filesystem. Had it continued for too long, the system might have
   570: started thrashing. Notice that even though the VM subsystem was taking a
   571: beating, the processors still were not getting too battered.
  572: 
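Watching for exactly that situation can be automated by post-processing vmstat's output. A sketch follows; the `low_free` helper and the 1024 floor are invented for illustration, the field positions assume the header layout shown above, and note that very large values can run columns together and defeat simple field splitting:

```shell
# low_free: flag vmstat data lines whose "fre" column (5th field)
# has fallen below a floor; the two header lines are skipped.
low_free() {
    awk -v floor="$1" 'NR > 2 && $5 + 0 < floor + 0 { print "low free:", $5 }'
}

# Live usage would be:  vmstat 5 5 | low_free 1024
```
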
  573: ## Network Tools
  574: 
   575: Sometimes a performance problem is not a particular machine but the network, or
   576: some sort of device on the network such as another host, a router, etc. What
   577: other machines that provide a service or some sort of connectivity to a
   578: particular NetBSD system do, and how they act, can have a very large impact on
   579: the performance of the NetBSD system itself, or on the perception of performance
   580: by its users. A really great example of this is when a DNS server that a NetBSD
   581: machine uses suddenly disappears: lookups take a long time and eventually fail.
   582: Someone logged into the NetBSD machine who is not experienced would undoubtedly
   583: (provided they had no other evidence) blame the NetBSD system. One of my
   584: personal favorites, *the Internet is broke*, usually means either DNS service or
   585: a router/gateway has dropped offline. Whatever the case may be, a NetBSD system
   586: comes adequately armed to find out which network issues may be cropping up,
   587: whether they are the fault of the local system or of something else.
  588: 
  589: ### ping
  590: 
  591: The classic
  592: [[!template id=man name="ping" section="8"]] utility
   593: can tell us whether there is plain connectivity; it can also tell whether host
   594: name resolution (depending on what `nsswitch.conf` dictates) is working.
   595: Following is some typical ping output on a local network, with a count of 3:
  596: 
  597:     # ping -c 3 marie
  598:     PING marie (172.16.14.12): 56 data bytes
  599:     64 bytes from 172.16.14.12: icmp_seq=0 ttl=255 time=0.571 ms
  600:     64 bytes from 172.16.14.12: icmp_seq=1 ttl=255 time=0.361 ms
  601:     64 bytes from 172.16.14.12: icmp_seq=2 ttl=255 time=0.371 ms
  602:     
  603:     ----marie PING Statistics----
  604:     3 packets transmitted, 3 packets received, 0.0% packet loss
  605:     round-trip min/avg/max/stddev = 0.361/0.434/0.571/0.118 ms
  606: 
   607: Not only does ping tell us if a host is alive, it tells us how long the round
   608: trip took and gives some nice statistics at the very end. If a host cannot be
   609: resolved, its IP address can be specified instead:
  610: 
  611:     # ping -c 1 172.16.20.5
  612:     PING ash (172.16.20.5): 56 data bytes
  613:     64 bytes from 172.16.20.5: icmp_seq=0 ttl=64 time=0.452 ms
  614:     
  615:     ----ash PING Statistics----
  616:     1 packets transmitted, 1 packets received, 0.0% packet loss
  617:     round-trip min/avg/max/stddev = 0.452/0.452/0.452/0.000 ms
  618: 
   619: Now, as with any other tool, the times are very subjective, especially in
   620: regard to networking. For example, while the times in the examples are good,
   621: take a look at the localhost ping:
  622: 
  623:     # ping -c 4 localhost
  624:     PING localhost (127.0.0.1): 56 data bytes
  625:     64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.091 ms
  626:     64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=0.129 ms
  627:     64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=0.120 ms
  628:     64 bytes from 127.0.0.1: icmp_seq=3 ttl=255 time=0.122 ms
  629:     
  630:     ----localhost PING Statistics----
  631:     4 packets transmitted, 4 packets received, 0.0% packet loss
  632:     round-trip min/avg/max/stddev = 0.091/0.115/0.129/0.017 ms
  633: 
   634: Much smaller, because the request never left the machine. Pings can be used to
   635: gather information about how well a network is performing. Ping is also good
   636: for problem isolation: if there are three similarly equipped NetBSD systems on
   637: a network and one of them simply has horrible ping times, chances are something
   638: is wrong on that one particular machine.
  639: 
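When gathering such times over a period, ping's summary line is easy to post-process. A sketch follows; the `avg_rtt` helper is an invented name, and it relies on the `min/avg/max/stddev` summary format shown above:

```shell
# avg_rtt: pull the average round-trip time out of ping's summary
# line ("round-trip min/avg/max/stddev = a/b/c/d ms").
avg_rtt() {
    awk -F/ '/round-trip/ { print $5 }'
}

# Live usage would be:  ping -c 3 marie | avg_rtt
printf '%s\n' 'round-trip min/avg/max/stddev = 0.361/0.434/0.571/0.118 ms' \
    | avg_rtt
# prints: 0.434
```
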
  640: ### traceroute
  641: 
  642: The
  643: [[!template id=man name="traceroute" section="8"]]
  644: command is great for making sure a path is available or detecting problems on a
  645: particular path. As an example, here is a trace between the example ftp server
  646: and ftp.NetBSD.org:
  647: 
  648:     # traceroute ftp.NetBSD.org
  649:     traceroute to ftp.NetBSD.org (204.152.184.75), 30 hops max, 40 byte packets
  650:      1  208.44.95.1 (208.44.95.1)  1.646 ms  1.492 ms  1.456 ms
  651:      2  63.144.65.170 (63.144.65.170)  7.318 ms  3.249 ms  3.854 ms
  652:      3  chcg01-edge18.il.inet.qwest.net (65.113.85.229)  35.982 ms  28.667 ms  21.971 ms
  653:      4  chcg01-core01.il.inet.qwest.net (205.171.20.1)  22.607 ms  26.242 ms  19.631 ms
  654:      5  snva01-core01.ca.inet.qwest.net (205.171.8.50)  78.586 ms  70.585 ms  84.779 ms
  655:      6  snva01-core03.ca.inet.qwest.net (205.171.14.122)  69.222 ms  85.739 ms  75.979 ms
  656:      7  paix01-brdr02.ca.inet.qwest.net (205.171.205.30)  83.882 ms  67.739 ms  69.937 ms
  657:      8  198.32.175.3 (198.32.175.3)  72.782 ms  67.687 ms  73.320 ms
  658:      9  so-1-0-0.orpa8.pf.isc.org (192.5.4.231)  78.007 ms  81.860 ms  77.069 ms
  659:     10  tun0.orrc5.pf.isc.org (192.5.4.165)  70.808 ms  75.151 ms  81.485 ms
  660:     11  ftp.NetBSD.org (204.152.184.75)  69.700 ms  69.528 ms  77.788 ms
  661: 
   662: All in all, not bad. The trace went from the host to the local router, then out
   663: onto the provider network, and finally across the Internet to the final
   664: destination. How to interpret a traceroute is, again, subjective, but
   665: abnormally high times in portions of a path can indicate a bottleneck on a
   666: piece of network equipment. As with ping, if the host itself is suspect, run
   667: traceroute from another host to the same destination. Now, for the worst case
   668: scenario:
  669: 
  670:     # traceroute www.microsoft.com
  671:     traceroute: Warning: www.microsoft.com has multiple addresses; using 207.46.230.220
  672:     traceroute to www.microsoft.akadns.net (207.46.230.220), 30 hops max, 40 byte packets
  673:      1  208.44.95.1 (208.44.95.1)  2.517 ms  4.922 ms  5.987 ms
  674:      2  63.144.65.170 (63.144.65.170)  10.981 ms  3.374 ms  3.249 ms
  675:      3  chcg01-edge18.il.inet.qwest.net (65.113.85.229)  37.810 ms  37.505 ms  20.795 ms
  676:      4  chcg01-core03.il.inet.qwest.net (205.171.20.21)  36.987 ms  32.320 ms  22.430 ms
  677:      5  chcg01-brdr03.il.inet.qwest.net (205.171.20.142)  33.155 ms  32.859 ms  33.462 ms
  678:      6  205.171.1.162 (205.171.1.162)  39.265 ms  20.482 ms  26.084 ms
  679:      7  sl-bb24-chi-13-0.sprintlink.net (144.232.26.85)  26.681 ms  24.000 ms  28.975 ms
  680:      8  sl-bb21-sea-10-0.sprintlink.net (144.232.20.30)  65.329 ms  69.694 ms  76.704 ms
  681:      9  sl-bb21-tac-9-1.sprintlink.net (144.232.9.221)  65.659 ms  66.797 ms  74.408 ms
  682:     10  144.232.187.194 (144.232.187.194)  104.657 ms  89.958 ms  91.754 ms
  683:     11  207.46.154.1 (207.46.154.1)  89.197 ms  84.527 ms  81.629 ms
  684:     12  207.46.155.10 (207.46.155.10)  78.090 ms  91.550 ms  89.480 ms
  685:     13  * * *
  686:     .......
  687: 
   688: In this case, the trace never reaches the Microsoft server: somewhere along
   689: the line, a system or router does not reply to the probes. At that point, one
   690: might think to try ping; in the Microsoft case, ping gets no reply either, and
   691: that is because somewhere on their network ICMP is most likely disabled.
  693: 
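When traces are saved for later comparison, the unanswered hops are simple to count. A sketch follows; the `dead_hops` helper and the `trace.out` file name are invented for illustration:

```shell
# dead_hops: count hops in saved traceroute output that gave no
# reply at all, i.e. lines consisting of "* * *".
dead_hops() {
    grep -c '\* \* \*'
}

# Live usage would be:  traceroute ftp.NetBSD.org > trace.out
#                       dead_hops < trace.out
```
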
  694: ### netstat
  695: 
  696: Another problem that can crop up on a NetBSD system is routing table issues.
   697: These issues are not always the system's fault. The
  698: [[!template id=man name="route" section="8"]] and
  699: [[!template id=man name="netstat" section="1"]]
  700: commands can show information about routes and network connections
  701: (respectively).
  702: 
  703: The route command can be used to look at and modify routing tables while netstat
  704: can display information about network connections and routes. First, here is
  705: some output from `route show`:
  706: 
  707:     # route show
  708:     Routing tables
  709:     
  710:     Internet:
  711:     Destination      Gateway            Flags
  712:     default          208.44.95.1        UG
  713:     loopback         127.0.0.1          UG
  714:     localhost        127.0.0.1          UH
  715:     172.15.13.0      172.16.14.37       UG
  716:     172.16.0.0       link#2             U
  717:     172.16.14.8      0:80:d3:cc:2c:0    UH
  718:     172.16.14.10     link#2             UH
  719:     marie            0:10:83:f9:6f:2c   UH
  720:     172.16.14.37     0:5:32:8f:d2:35    UH
  721:     172.16.16.15     link#2             UH
  722:     loghost          8:0:20:a7:f0:75    UH
  723:     artemus          8:0:20:a8:d:7e     UH
  724:     ash              0:b0:d0:de:49:df   UH
  725:     208.44.95.0      link#1             U
  726:     208.44.95.1      0:4:27:3:94:20     UH
  727:     208.44.95.2      0:5:32:8f:d2:34    UH
  728:     208.44.95.25     0:c0:4f:10:79:92   UH
  729:     
  730:     Internet6:
  731:     Destination      Gateway            Flags
  732:     default          localhost          UG
  733:     default          localhost          UG
  734:     localhost        localhost          UH
  735:     ::127.0.0.0      localhost          UG
  736:     ::224.0.0.0      localhost          UG
  737:     ::255.0.0.0      localhost          UG
  738:     ::ffff:0.0.0.0   localhost          UG
  739:     2002::           localhost          UG
  740:     2002:7f00::      localhost          UG
  741:     2002:e000::      localhost          UG
  742:     2002:ff00::      localhost          UG
  743:     fe80::           localhost          UG
  744:     fe80::%ex0       link#1             U
  745:     fe80::%ex1       link#2             U
  746:     fe80::%lo0       fe80::1%lo0        U
  747:     fec0::           localhost          UG
  748:     ff01::           localhost          U
  749:     ff02::%ex0       link#1             U
  750:     ff02::%ex1       link#2             U
  751:     ff02::%lo0       fe80::1%lo0        U
  752: 
   753: The flags column shows the status of each route and whether or not it goes via
   754: a gateway. In this case we see `U`, `H` and `G` (`U` is up, `H` is host and `G`
   755: is gateway; see the man page for additional flags).
  756: 
  757: Now for some netstat output using the `-r` (routing) and `-n` (show network
  758: numbers) options:
  759: 
  760:     Routing tables
  761:     
  762:     Internet:
  763:     Destination        Gateway            Flags     Refs     Use    Mtu  Interface
  764:     default            208.44.95.1        UGS         0   330309   1500  ex0
  765:     127                127.0.0.1          UGRS        0        0  33228  lo0
  766:     127.0.0.1          127.0.0.1          UH          1     1624  33228  lo0
  767:     172.15.13/24       172.16.14.37       UGS         0        0   1500  ex1
  768:     172.16             link#2             UC         13        0   1500  ex1
  769:     ...
  770:     Internet6:
  771:     Destination                   Gateway                   Flags     Refs     Use
  772:       Mtu  Interface
  773:     ::/104                        ::1                       UGRS        0        0
  774:     33228  lo0 =>
  775:     ::/96                         ::1                       UGRS        0        0
  776: 
   777: The above output is a little more verbose. So, how can this help? Well, a good
   778: example is when routes between networks get changed while users are connected. I
   779: saw this happen several times when someone was rebooting routers all day long.
   780: After each change, several users called up saying they were getting kicked out
   781: and that it was taking very long to log back in. As it turned out, the clients
   782: connecting to the system were being redirected to another router (which took a
   783: very long route) to reconnect. I observed the `M` flag, Modified dynamically (by
   784: redirect), on their connections. I deleted the routes, had them reconnect, and
   785: summarily followed up with the offending technician.
  786: 
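Spotting such redirected entries can be scripted. A sketch follows; the `redirected` helper is an invented name, the `UGHM` entry below is a made-up example of a redirected route, and the flags are assumed to sit in the third column as in `netstat -rn` output:

```shell
# redirected: print routing entries whose flags (3rd column) contain
# M, i.e. routes modified dynamically, normally by an ICMP redirect.
redirected() {
    awk '$3 ~ /M/ { print $1, $2, $3 }'
}

# Live usage would be:  netstat -rn | redirected
printf '%s\n' \
    'Destination Gateway Flags' \
    'default 208.44.95.1 UGS' \
    '172.16.14.10 172.16.14.1 UGHM' \
    | redirected
# prints: 172.16.14.10 172.16.14.1 UGHM
```
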
  787: ### tcpdump
  788: 
  789: Last, and definitely not least is
  790: [[!template id=man name="tcpdump" section="8"]],
  791: the network sniffer that can retrieve a lot of information. In this discussion,
  792: there will be some sample output and an explanation of some of the more useful
  793: options of tcpdump.
  794: 
  795: Following is a small snippet of tcpdump in action just as it starts:
  796: 
  797:     # tcpdump
  798:     tcpdump: listening on ex0
  799:     14:07:29.920651 mail.ssh > 208.44.95.231.3551: P 2951836801:2951836845(44) ack 2
  800:     476972923 win 17520 <nop,nop,timestamp 1219259 128519450> [tos 0x10]
  801:     14:07:29.950594 12.125.61.34 >  208.44.95.16: ESP(spi=2548773187,seq=0x3e8c) (DF)
  802:     14:07:29.983117 smtp.somecorp.com.smtp > 208.44.95.30.42828: . ack 420285166 win
  803:     16500 (DF)
  804:     14:07:29.984406 208.44.95.30.42828 > smtp.somecorp.com.smtp: . 1:1376(1375) ack 0
  805:      win 7431 (DF)
  806:     ...
  807: 
   808: Given that this particular server is a mail server, what is shown makes perfect
   809: sense; however, the utility is very verbose, so I prefer to initially run
   810: tcpdump with no options and send the text output into a file for later
   811: digestion, like so:
  812: 
  813:     # tcpdump > tcpdump.out
  814:     tcpdump: listening on ex0
  815: 
   816: So, what precisely in the mishmash are we looking for? In short, anything that
   817: does not seem to fit; for example, a large number of messed-up packets will show
   818: up as improper lengths or malformed packets (basically garbage). If, however, we
   819: are looking for something specific, tcpdump may be able to help, depending on
   820: the problem.
  821: 
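One way to dig something specific out of a saved text dump is to tally traffic per source. A sketch follows; the `top_talkers` helper is an invented name, and it relies on the `src > dst:` layout of each dumped packet line shown above:

```shell
# top_talkers: tally packets per source address in tcpdump's text
# output, then list the busiest sources first.
top_talkers() {
    awk '$3 == ">" { print $2 }' | sort | uniq -c | sort -rn
}

# Live usage would be:  top_talkers < tcpdump.out
```
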
  822: #### Specific tcpdump Usage
  823: 
  824: These are just examples of a few things one can do with tcpdump.
  825: 
  826: Look for duplicate IP addresses:
  827: 
  828:     tcpdump -e host ip-address
  829: 
  830: For example:
  831: 
  832:     tcpdump -e host 192.168.0.2
  833: 
  834: Routing Problems:
  835: 
  836:     tcpdump icmp
  837: 
   838: There are plenty of third-party tools available; however, NetBSD comes shipped
   839: with a good tool set for tracking down network-level performance problems.
  840: 
  841: ## Accounting
  842: 
   843: The NetBSD system comes equipped with a great number of performance monitors
   844: for active monitoring, but what about long-term monitoring? Well, of course the
   845: output of a variety of commands can be sent to files and re-parsed later with a
   846: meaningful shell script or program. NetBSD does, by default, offer some
   847: extraordinarily powerful low-level monitoring tools for the programmer,
   848: administrator or really astute hobbyist.
  849: 
  850: ### Accounting
  851: 
  852: While accounting gives system usage at an almost userland level, kernel
  853: profiling with gprof provides explicit system call usage.
  854: 
   855: Using the accounting tools can help figure out what possible performance
   856: problems may be lying in wait, such as increased usage of compilers or network
   857: services, for example.
  858: 
   859: Starting accounting is actually fairly simple; as root, use the
   860: [[!template id=man name="accton" section="8"]]
   861: command. The syntax to start accounting is: `accton filename`
   862: 
   863: Accounting information is then appended to filename. Interestingly, the
   864: lastcomm command, which reads from an accounting output file, looks in
   865: `/var/account/acct` by default, so I tend to just use the default location;
   866: however, lastcomm can be told to look elsewhere.
   867: 
   868: To stop accounting, simply run accton with no arguments.
  869: 
  870: ### Reading Accounting Information
  871: 
  872: To read accounting information, there are two tools that can be used:
  873: 
  874:  * [[!template id=man name="lastcomm" section="1"]]
  875:  * [[!template id=man name="sa" section="8"]]
  876: 
  877: #### lastcomm
  878: 
   879: The lastcomm command shows the last commands executed, in order (all of them,
   880: by default). It can, however, select by user; here is some sample output:
  881: 
  882:     $ lastcomm jrf
  883:     last       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:39 (0:00:00.02)
  884:     man        -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
  885:     sh         -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
  886:     less       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:01:49.03)
  887:     lastcomm   -       jrf      ttyp3      0.02 secs Tue Sep  3 14:38 (0:00:00.02)
  888:     stty       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
  889:     tset       -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:01.05)
  890:     hostname   -       jrf      ttyp3      0.00 secs Tue Sep  3 14:38 (0:00:00.02)
  891:     ls         -       jrf      ttyp0      0.00 secs Tue Sep  3 14:36 (0:00:00.00)
  892:     ...
  893: 
   894: Pretty nice. The lastcomm command gets its information from the default
   895: location of `/var/account/acct`; however, using the `-f` option, another file
   896: may be specified.
  897: 
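Since that text output piles up quickly, a little post-processing helps. A sketch follows; the `cpu_by_cmd` helper is an invented name, and the command name is assumed to be in field 1 and the CPU seconds in field 5, as in the sample above:

```shell
# cpu_by_cmd: total the CPU seconds charged to each command name
# in lastcomm output.
cpu_by_cmd() {
    awk '{ cpu[$1] += $5 } END { for (c in cpu) print c, cpu[c] }' | sort
}

# Live usage would be:  lastcomm jrf | cpu_by_cmd
```
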
   898: As may seem obvious, the output of lastcomm could get a little heavy on large
   899: multi-user systems. That is where sa comes into play.
  900: 
  901: #### sa
  902: 
   903: The sa command (meaning "print system accounting statistics") can be used to
   904: summarize and maintain accounting information. It can also be used
   905: interactively to create reports. Following is the default output of sa:
  906: 
  907:     $ sa
  908:           77       18.62re        0.02cp        8avio        0k
  909:            3        4.27re        0.01cp       45avio        0k   ispell
  910:            2        0.68re        0.00cp       33avio        0k   mutt
  911:            2        1.09re        0.00cp       23avio        0k   vi
  912:           10        0.61re        0.00cp        7avio        0k   ***other
  913:            2        0.01re        0.00cp       29avio        0k   exim
  914:            4        0.00re        0.00cp        8avio        0k   lastcomm
  915:            2        0.00re        0.00cp        3avio        0k   atrun
  916:            3        0.03re        0.00cp        1avio        0k   cron*
  917:            5        0.02re        0.00cp       10avio        0k   exim*
  918:           10        3.98re        0.00cp        2avio        0k   less
  919:           11        0.00re        0.00cp        0avio        0k   ls
  920:            9        3.95re        0.00cp       12avio        0k   man
  921:            2        0.00re        0.00cp        4avio        0k   sa
  922:           12        3.97re        0.00cp        1avio        0k   sh
  923:     ...
  924: 
   925: From left to right the fields are: total number of times called, real time in
   926: minutes, sum of user and system time in minutes, average number of I/O
   927: operations per execution, size, and command name.
  928: 
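Those columns also lend themselves to ad hoc re-sorting from the shell. A sketch follows; the `by_avio` helper is an invented name, per-command lines are assumed to have six fields as in the sample above, and sort's numeric key reads the leading digits of values like `45avio`:

```shell
# by_avio: re-sort sa's per-command lines by the average I/O column
# (field 4, e.g. "45avio"); the five-field summary line is dropped.
by_avio() {
    awk 'NF == 6' | sort -k 4 -rn
}

# Live usage would be:  sa | by_avio
```
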
   929: The sa command can also be used to create summary files or reports based on
   930: several options; for example, here is the output when specifying a sort by
   931: CPU-time average memory usage:
  932: 
  933:     $ sa -k
  934:           86       30.81re        0.02cp        8avio        0k
  935:           10        0.61re        0.00cp        7avio        0k   ***other
  936:            2        0.00re        0.00cp        3avio        0k   atrun
  937:            3        0.03re        0.00cp        1avio        0k   cron*
  938:            2        0.01re        0.00cp       29avio        0k   exim
  939:            5        0.02re        0.00cp       10avio        0k   exim*
  940:            3        4.27re        0.01cp       45avio        0k   ispell
  941:            4        0.00re        0.00cp        8avio        0k   lastcomm
  942:           12        8.04re        0.00cp        2avio        0k   less
  943:           13        0.00re        0.00cp        0avio        0k   ls
  944:           11        8.01re        0.00cp       12avio        0k   man
  945:            2        0.68re        0.00cp       33avio        0k   mutt
  946:            3        0.00re        0.00cp        4avio        0k   sa
  947:           14        8.03re        0.00cp        1avio        0k   sh
  948:            2        1.09re        0.00cp       23avio        0k   vi
  949: 
  950: The sa command is very helpful on larger systems.
  951: 
  952: ### How to Put Accounting to Use
  953: 
   954: Accounting reports, as was mentioned earlier, offer a way to help predict
   955: trends. For example, cc and make being used more and more on a system may
   956: indicate that in a few months some changes will need to be made to keep the
   957: system running at an optimum level. Another good example is web server usage:
   958: if it begins to gradually increase, again, some sort of action may need to be
   959: taken before it becomes a problem. Luckily, with accounting tools, such actions
   960: can be reasonably predicted and planned for ahead of time.
  961: 
  962: ## Kernel Profiling
  963: 
  964: Profiling a kernel is normally employed when the goal is to compare the
  965: difference of new changes in the kernel to a previous one or to track down some
  966: sort of low level performance problem. Two sets of data about profiled code
  967: behavior are recorded independently: function call frequency and time spent in
  968: each function.
  969: 
  970: ### Getting Started
  971: 
   972: First, take a look at both [[Kernel Tuning|guide/tuning#kernel]] and [[Compiling
   973: the kernel|guide/kernel]]. The only difference in procedure for setting up a
   974: kernel with profiling enabled is that when you run config you add the `-p`
   975: option. The build area is `../compile/<KERNEL_NAME>.PROF`; for example, a
   976: GENERIC kernel would be built in `../compile/GENERIC.PROF`.
  977: 
   978: Following is a quick summary of how to compile a kernel with profiling enabled
   979: on the i386 port. The assumptions are that the appropriate sources are available
   980: under `/usr/src` and that the GENERIC configuration is being used; of course,
   981: that may not always be the situation:
  982: 
  983:  1. **`cd /usr/src/sys/arch/i386/conf`**
  984:  2. **`config -p GENERIC`**
  985:  3. **`cd ../compile/GENERIC.PROF`**
  986:  4. **`make depend && make`**
  987:  5. **`cp /netbsd /netbsd.old`**
  988:  6. **`cp netbsd /`**
  989:  7. **`reboot`**
  990: 
  991: Once the new kernel is in place and the system has rebooted, it is time to turn
  992: on the monitoring and start looking at results.
  993: 
  994: #### Using kgmon
  995: 
  996: To start kgmon:
  997: 
  998:     $ kgmon -b
  999:     kgmon: kernel profiling is running.
 1000: 
 1001: Next, send the data into the file `gmon.out`:
 1002: 
 1003:     $ kgmon -p
 1004: 
 1005: Now, it is time to make the output readable:
 1006: 
 1007:     $ gprof /netbsd > gprof.out
 1008: 
  1009: Since gprof looks for `gmon.out`, it should find it in the current working
  1010: directory.
 1011: 
  1012: By just running kgmon alone, you may not get the information you need; however,
  1013: if you are comparing the differences between two different kernels, then a known
  1014: good baseline should be used. Note that it is generally a good idea to stress
  1015: the subsystem in question, if you know what it is, both in the baseline and with
  1016: the newer (or different) kernel.
 1017: 
 1018: ### Interpretation of kgmon Output
 1019: 
  1020: Now that kgmon can run, collect and parse information, it is time to actually
  1021: look at some of that information. In this particular instance, a GENERIC kernel
  1022: was run with profiling enabled for about an hour with only system processes and
  1023: no adverse load. In the fault insertion section, the example will be large
  1024: enough that detection of the problem should be easy even under a minimal load.
 1025: 
 1026: #### Flat Profile
 1027: 
  1028: The flat profile is a list of functions, the number of times each was called,
  1029: and how long it took (in seconds). Following is sample output from the quiet
  1030: system:
 1031: 
 1032:     Flat profile:
 1033:     
 1034:     Each sample counts as 0.01 seconds.
 1035:       %   cumulative   self              self     total
 1036:      time   seconds   seconds    calls  ns/call  ns/call  name
 1037:      99.77    163.87   163.87                             idle
 1038:       0.03    163.92     0.05      219 228310.50 228354.34  _wdc_ata_bio_start
 1039:       0.02    163.96     0.04      219 182648.40 391184.96  wdc_ata_bio_intr
 1040:       0.01    163.98     0.02     3412  5861.66  6463.02  pmap_enter
 1041:       0.01    164.00     0.02      548 36496.35 36496.35  pmap_zero_page
 1042:       0.01    164.02     0.02                             Xspllower
 1043:       0.01    164.03     0.01   481968    20.75    20.75  gettick
 1044:       0.01    164.04     0.01     6695  1493.65  1493.65  VOP_LOCK
 1045:       0.01    164.05     0.01     3251  3075.98 21013.45  syscall_plain
 1046:     ...
 1047: 
  1048: As expected, idle was the highest in percentage; however, there were still some
  1049: things going on. For example, a little further down there is the `vn_lock`
  1050: function:
 1051: 
 1052:     ...
 1053:       0.00    164.14     0.00     6711     0.00     0.00  VOP_UNLOCK
 1054:       0.00    164.14     0.00     6677     0.00  1493.65  vn_lock
 1055:       0.00    164.14     0.00     6441     0.00     0.00  genfs_unlock
 1056: 
 1057: This is to be expected, since locking still has to take place, regardless.
 1058: 
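When comparing two such profiles, it can help to first reduce each flat profile to its busiest entries. A sketch follows; the `hot_funcs` helper and the 1.0 cutoff are invented for illustration, and the trick relies on the %time value being the first field of each entry, so that non-numeric header lines simply fail the comparison:

```shell
# hot_funcs: show flat-profile entries whose %time (first column)
# exceeds a cutoff; non-numeric header lines fail the comparison.
hot_funcs() {
    awk -v cut="$1" '$1 + 0 > cut + 0 { print $1, $NF }'
}

# Live usage would be:  hot_funcs 1.0 < gprof.out
```
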
 1059: #### Call Graph Profile
 1060: 
 1061: The call graph is an augmented version of the flat profile showing subsequent
 1062: calls from the listed functions. First, here is some sample output:
 1063: 
 1064:                          Call graph (explanation follows)
 1065:     
 1066:     
 1067:     granularity: each sample hit covers 4 byte(s) for 0.01% of 164.14 seconds
 1068:     
 1069:     index % time    self  children    called     name
 1070:                                                      <spontaneous>
 1071:     [1]     99.8  163.87    0.00                 idle [1]
 1072:     -----------------------------------------------
 1073:                                                      <spontaneous>
 1074:     [2]      0.1    0.01    0.08                 syscall1 [2]
 1075:                     0.01    0.06    3251/3251        syscall_plain [7]
 1076:                     0.00    0.01     414/1660        trap [9]
 1077:     -----------------------------------------------
 1078:                     0.00    0.09     219/219         Xintr14 [6]
 1079:     [3]      0.1    0.00    0.09     219         pciide_compat_intr [3]
 1080:                     0.00    0.09     219/219         wdcintr [5]
 1081:     -----------------------------------------------
 1082:     ...
 1083: 
  1084: Now this can be a little confusing. The index number corresponds to the
  1085: bracketed number at the end of each line; for example:
 1086: 
 1087:     ...
 1088:                     0.00    0.01      85/85          dofilewrite [68]
 1089:     [72]     0.0    0.00    0.01      85         soo_write [72]
 1090:                     0.00    0.01      85/89          sosend [71]
 1091:     ...
 1092: 
  1093: Here we see that dofilewrite was called first; now we can look at the index
  1094: number for 64 and see what was happening there:
 1095: 
 1096:     ...
 1097:                     0.00    0.01     101/103         ffs_full_fsync <cycle 6> [58]
 1098:     [64]     0.0    0.00    0.01     103         bawrite [64]
 1099:                     0.00    0.01     103/105         VOP_BWRITE [60]
 1100:     ...
 1101: 
  1102: And so on; in this way, a "visual trace" can be established.
 1103: 
  1104: At the end of the call graph, right after the terms section, is an index by
  1105: function name which can help map indexes as well.
 1106: 
 1107: ### Putting it to Use
 1108: 
  1109: In this example, I have modified an area of the kernel that I know will create a blatantly obvious problem.
 1110: 
 1111: Here is the top portion of the flat profile after running the system for about an hour with little interaction from users:
 1112: 
 1113:     Flat profile:
 1114:     
 1115:     Each sample counts as 0.01 seconds.
 1116:       %   cumulative   self              self     total
 1117:      time   seconds   seconds    calls  us/call  us/call  name
 1118:      93.97    139.13   139.13                             idle
 1119:       5.87    147.82     8.69       23 377826.09 377842.52  check_exec
 1120:       0.01    147.84     0.02      243    82.30    82.30  pmap_copy_page
 1121:       0.01    147.86     0.02      131   152.67   152.67  _wdc_ata_bio_start
 1122:       0.01    147.88     0.02      131   152.67   271.85  wdc_ata_bio_intr
 1123:       0.01    147.89     0.01     4428     2.26     2.66  uvn_findpage
 1124:       0.01    147.90     0.01     4145     2.41     2.41  uvm_pageactivate
 1125:       0.01    147.91     0.01     2473     4.04  3532.40  syscall_plain
 1126:       0.01    147.92     0.01     1717     5.82     5.82  i486_copyout
 1127:       0.01    147.93     0.01     1430     6.99    56.52  uvm_fault
 1128:       0.01    147.94     0.01     1309     7.64     7.64  pool_get
 1129:       0.01    147.95     0.01      673    14.86    38.43  genfs_getpages
 1130:       0.01    147.96     0.01      498    20.08    20.08  pmap_zero_page
 1131:       0.01    147.97     0.01      219    45.66    46.28  uvm_unmap_remove
 1132:       0.01    147.98     0.01      111    90.09    90.09  selscan
 1133:     ...
 1134: 
 1135: As is obvious, there is a large difference in performance. Right off the bat,
 1136: the idle time is noticeably lower. The main difference here is that one
 1137: particular function, `check_exec`, shows a large amount of time across the
 1138: board with very few calls. At first this may not seem strange if a lot of
 1139: commands had been executed, but compared to the flat profile of the first
 1140: measurement, the proportions do not seem right:
 1141: 
 1142:     ...
 1143:       0.00    164.14     0.00       37     0.00 62747.49  check_exec
 1144:     ...
 1145: 
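As a quick sanity check (back-of-the-envelope arithmetic on the numbers above, not part of the measurements themselves), 8.69 seconds of self time spread over 23 calls matches the 377826.09 us/call figure reported in the flat profile:

```shell
# 8.69 s of self time over 23 calls, expressed in microseconds per call.
awk 'BEGIN { printf "%.2f\n", 8.69 / 23 * 1000000 }'
```

In the first measurement, by contrast, 37 calls came to only about 62747 us/call including children, roughly 2.3 seconds in all.
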
 1146: In the first measurement the call is made 37 times and performs far better.
 1147: Obviously something in or around that function is wrong. To rule out other
 1148: functions, a look at the call graph can help; here is the first instance of
 1149: `check_exec`:
 1150: 
 1151:     ...
 1152:     -----------------------------------------------
 1153:                     0.00    8.69      23/23          syscall_plain [3]
 1154:     [4]      5.9    0.00    8.69      23         sys_execve [4]
 1155:                     8.69    0.00      23/23          check_exec [5]
 1156:                     0.00    0.00      20/20          elf32_copyargs [67]
 1157:     ...
 1158: 
 1159: Notice how the time of 8.69 seems to affect the two previous functions. It is
 1160: possible that there is something wrong with them; however, the next instance
 1161: of `check_exec` seems to prove otherwise:
 1162: 
 1163:     ...
 1164:     -----------------------------------------------
 1165:                     8.69    0.00      23/23          sys_execve [4]
 1166:     [5]      5.9    8.69    0.00      23         check_exec [5]
 1167:     ...
 1168: 
 1169: Now we can see that the problem most likely resides in `check_exec`. Of
 1170: course, problems are not always this simple; in fact, here is the simplistic
 1171: code that was inserted into `check_exec` (the function is in
 1172: `sys/kern/kern_exec.c`):
 1173: 
 1174:     ...
 1175:             /* A Cheap fault insertion */
 1176:             for (x = 0; x < 100000000; x++) {
 1177:                     y = x;
 1178:             }
 1179:     ...
 1180: 
 1181: Not exactly glamorous, but enough to register a large change with profiling.
 1182: 
 1183: ### Summary
 1184: 
 1185: Kernel profiling can be enlightening for anyone and provides a much more
 1186: refined method of hunting down performance problems that are hard to find by
 1187: conventional means. It is also not nearly as hard as most people think: if
 1188: you can compile a kernel, you can get profiling to work.
 1189: 
 1190: ## System Tuning
 1191: 
 1192: Now that monitoring and analysis tools have been addressed, it is time to look
 1193: into some actual methods. This section addresses tools and methods that can
 1194: affect how the system performs and that are applied without recompiling the
 1195: kernel; the next section examines kernel tuning by recompiling.
 1196: 
 1197: ### Using sysctl
 1198: 
 1199: The sysctl utility can be used to look at, and in some cases alter, system
 1200: parameters. There are so many parameters that can be viewed and changed that
 1201: they cannot all be shown here; as a first example, here is a simple usage of
 1202: sysctl to look at the system PATH environment variable:
 1203: 
 1204:     $ sysctl user.cs_path
 1205:     user.cs_path = /usr/bin:/bin:/usr/sbin:/sbin:/usr/pkg/bin:/usr/pkg/sbin:/usr/local/bin:/usr/local/sbin
 1206: 
 1207: Fairly simple. Now for something that is actually related to performance: as
 1208: an example, let's say a system with many users is having trouble opening
 1209: files. By examining, and perhaps raising, the kern.maxfiles parameter, the
 1210: problem may be fixed. But first, a look:
 1211: 
 1212:     $ sysctl kern.maxfiles
 1213:     kern.maxfiles = 1772
 1214: 
 1215: Now, to change it, as root with the -w option specified:
 1216: 
 1217:     # sysctl -w kern.maxfiles=1972
 1218:     kern.maxfiles: 1772 -> 1972
 1219: 
 1220: Note that when the system is rebooted, the old value will return. There are
 1221: two cures for this: first, modify that parameter in the kernel and recompile;
 1222: second (and simpler), add this line to `/etc/sysctl.conf`:
 1223: 
 1224:     kern.maxfiles=1972
 1225: 
 1226: ### tmpfs & mfs
 1227: 
 1228: NetBSD's *ramdisk* implementations cache all data in RAM, and if that is
 1229: full, the swap space is used as backing store. NetBSD comes with two
 1230: implementations, the traditional BSD memory-based file system
 1231: [[!template id=man name="mfs" section="8"]]
 1232: and the more modern
 1233: [[!template id=man name="tmpfs" section="8"]].
 1234: While the former can only grow in size, the latter can also shrink if space is
 1235: no longer needed.
 1236: 
 1237: Deciding when to use a memory-based file system can be hard on large
 1238: multi-user systems. In some cases, however, it makes pretty good sense; for
 1239: example, on a development machine used by only one developer at a time, the
 1240: obj directory, or some of the tmp directories used for builds, might be a
 1241: good candidate. In a case like that, it makes sense on machines that have a
 1242: fair amount of RAM. On the other side of the coin, if a system has only 16MB
 1243: of RAM and `/var/tmp` is mfs-based, severe application issues could occur.
 1244: 
 1245: The GENERIC kernel has both tmpfs and mfs enabled by default. To use one of
 1246: them for a particular directory, first determine where the swap space you
 1247: wish to use is. In the example case, a quick look in `/etc/fstab` indicates
 1248: that `/dev/wd0b` is the swap partition:
 1249: 
 1250:     mail% cat /etc/fstab
 1251:     /dev/wd0a / ffs rw 1 1
 1252:     /dev/wd0b none swap sw 0 0
 1253:     /kern /kern kernfs rw
 1254: 
 1255: This system is a mail server, so I only want to use tmpfs for `/tmp`; also,
 1256: on this particular system, I have linked `/tmp` to `/var/tmp` to save space
 1257: (they are on the same drive). All I need to do is add the following entry:
 1258: 
 1259:     /dev/wd0b /var/tmp tmpfs rw 0 0
 1260: 
 1261: If you want to use mfs instead of tmpfs, simply substitute `mfs` for `tmpfs`
 1262: in the line above.
 1262: 
 1263: Now, a word of warning: make sure said directories are empty and nothing is
 1264: using them when you mount the memory file system! After changing `/etc/fstab`,
 1265: you can either run `mount -a` or reboot the system.
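
A size cap keeps a memory file system from eating into RAM and swap without bound. As an illustration only (the `-s` and `-m` mount options are described in the mount_tmpfs(8) man page; the quarter-of-RAM figure here is a hypothetical example, not a recommendation), a size-limited tmpfs entry could look like:

```
# Hypothetical example: /tmp on tmpfs, capped at 25% of physical RAM, mode 1777
tmpfs /tmp tmpfs rw,-m1777,-sram%25 0 0
```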
 1266: 
 1267: ### Soft-dependencies
 1268: 
 1269: Soft-dependencies (softdeps) is a mechanism that does not write metadata to
 1270: disk immediately, but writes it in an ordered fashion, which keeps the
 1271: filesystem consistent in case of a crash. The main benefit of softdeps is
 1272: processing speed. Soft-dependencies have some sharp edges, so beware! Also note
 1273: that soft-dependencies are not present in any releases after 5.x. See
 1274: [[Journaling|guide/tuning#system-logging]] for information about WAPBL, which is
 1275: the replacement for soft-dependencies.
 1276: 
 1277: Soft-dependencies can be enabled by adding `softdep` to the filesystem options
 1278: in `/etc/fstab`. Let's look at an example of `/etc/fstab`:
 1279: 
 1280:     /dev/wd0a / ffs rw 1 1
 1281:     /dev/wd0b none swap sw 0 0
 1282:     /dev/wd0e /var ffs rw 1 2
 1283:     /dev/wd0f /tmp ffs rw 1 2
 1284:     /dev/wd0g /usr ffs rw 1 2
 1285: 
 1286: Suppose we want to enable soft-dependencies for all file systems except for
 1287: the `/` partition. We would change it to:
 1288: 
 1289:     /dev/wd0a / ffs rw 1 1
 1290:     /dev/wd0b none swap sw 0 0
 1291:     /dev/wd0e /var ffs rw,softdep 1 2
 1292:     /dev/wd0f /tmp ffs rw,softdep 1 2
 1293:     /dev/wd0g /usr ffs rw,softdep 1 2
 1294: 
 1295: More information about softdep capabilities can be found on the
 1296: [author's page](http://www.mckusick.com/softdep/index.html).
 1297: 
 1298: ### Journaling
 1299: 
 1300: Journaling is a mechanism which puts written data into a so-called *journal*
 1301: first; in a second step, the data from the journal is written to disk. In the
 1302: event of a system crash, data that is in the journal but was not yet written
 1303: to disk can be replayed, thus bringing the disk back into a consistent state.
 1304: The main effect of this is that no file system check (fsck) is needed after
 1305: an unclean shutdown. As of 5.0, NetBSD includes WAPBL, which provides
 1306: journaling for FFS.
 1306: 
 1307: Journaling can be enabled by adding `log` to the filesystem options in
 1308: `/etc/fstab`. Here is an example which enables journaling for the root (`/`),
 1309: `/var`, and `/usr` file systems:
 1310: 
 1311:     /dev/wd0a /    ffs rw,log 1 1
 1312:     /dev/wd0e /var ffs rw,log 1 2
 1313:     /dev/wd0g /usr ffs rw,log 1 2
 1314: 
 1315: ### LFS
 1316: 
 1317: LFS, the log structured filesystem, writes data to disk in a way that is
 1318: sometimes too aggressive and leads to congestion. To throttle writing, the
 1319: following sysctls can be used:
 1320: 
 1321:     vfs.sync.delay
 1322:     vfs.sync.filedelay
 1323:     vfs.sync.dirdelay
 1324:     vfs.sync.metadelay
 1325:     vfs.lfs.flushindir
 1326:     vfs.lfs.clean_vnhead
 1327:     vfs.lfs.dostats
 1328:     vfs.lfs.pagetrip
 1329:     vfs.lfs.stats.segsused
 1330:     vfs.lfs.stats.psegwrites
 1331:     vfs.lfs.stats.psyncwrites
 1332:     vfs.lfs.stats.pcleanwrites
 1333:     vfs.lfs.stats.blocktot
 1334:     vfs.lfs.stats.cleanblocks
 1335:     vfs.lfs.stats.ncheckpoints
 1336:     vfs.lfs.stats.nwrites
 1337:     vfs.lfs.stats.nsync_writes
 1338:     vfs.lfs.stats.wait_exceeded
 1339:     vfs.lfs.stats.write_exceeded
 1340:     vfs.lfs.stats.flush_invoked
 1341:     vfs.lfs.stats.vflush_invoked
 1342:     vfs.lfs.stats.clean_inlocked
 1343:     vfs.lfs.stats.clean_vnlocked
 1344:     vfs.lfs.stats.segs_reclaimed
 1345:     vfs.lfs.ignore_lazy_sync
 1346: 
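Values changed with `sysctl -w` revert at reboot; as with `kern.maxfiles` earlier, throttling settings can be made persistent in `/etc/sysctl.conf`. The values below are hypothetical placeholders, chosen only to show the syntax, not tuning advice:

```
# /etc/sysctl.conf -- LFS write throttling (hypothetical example values)
vfs.sync.filedelay=30
vfs.sync.dirdelay=29
vfs.sync.metadelay=28
```
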
 1347: Besides tuning those parameters, disabling write-back caching on
 1348: [[!template id=man name="wd" section="4"]] devices may
 1349: be beneficial. See the
 1350: [[!template id=man name="dkctl" section="8"]] man
 1351: page for details.
 1352: 
 1353: More is available in the NetBSD mailing list archives. See
 1354: [this](http://mail-index.NetBSD.org/tech-perform/2007/04/01/0000.html) and
 1355: [this](http://mail-index.NetBSD.org/tech-perform/2007/04/01/0001.html) mail.
 1356: 
 1357: ## Kernel Tuning
 1358: 
 1359: While many system parameters can be changed with sysctl, and many
 1360: improvements can be achieved with enhanced system software, better system
 1361: layout, and careful management of services (moving them in and out of inetd,
 1362: for example), tuning the kernel will also provide better performance, even
 1363: if the gain sometimes appears marginal.
 1364: 
 1365: ### Preparing to Recompile a Kernel
 1366: 
 1367: First, get the kernel sources for the release as described in
 1368: [[Obtaining the sources|guide/fetch]]; reading
 1369: [[Compiling the kernel|guide/kernel]] for more information on building the
 1370: kernel is recommended. Note that this document can be used for -current
 1371: tuning; however, the
 1372: [[Tracking -current|tracking_current]] documentation should be read first, as
 1373: much of the information there is repeated here.
 1374: 
 1375: ### Configuring the Kernel
 1376: 
 1377: Configuring a kernel in NetBSD can be daunting because of multiple-line
 1378: dependencies within the configuration file itself. There is, however, a
 1379: benefit to this method: all it really takes to configure a new kernel is an
 1380: ASCII editor and some dmesg output. The kernel configuration file is under
 1381: `src/sys/arch/ARCH/conf` where ARCH is your architecture (for example, on a
 1382: SPARC it would be under `src/sys/arch/sparc/conf`).
 1383: 
 1384: After you have located your kernel config file, copy it and remove (comment out)
 1385: all the entries you don't need. This is where
 1386: [[!template id=man name="dmesg" section="8"]]
 1387: becomes your friend. A clean
 1388: [[!template id=man name="dmesg" section="8"]]-output
 1389: will show all of the devices detected by the kernel at boot time. Using
 1390: [[!template id=man name="dmesg" section="8"]]
 1391: output, the device options really needed can be determined.
 1392: 
 1393: #### Some example Configuration Items
 1394: 
 1395: In this example, an FTP server's kernel is being reconfigured to run with the
 1396: bare minimum of drivers and options, plus any other items that might make it
 1397: run faster (again, not necessarily smaller, although it will be). The first
 1398: thing to do is take a look at some of the main configuration items. So, in
 1399: `/usr/src/sys/arch/i386/conf`, the GENERIC file is copied to FTP, and then
 1400: the file FTP is edited.
 1401: 
 1402: At the start of the file there are a number of options beginning with
 1403: maxusers, which will be left alone; however, on larger multi-user systems it
 1404: might help to crank that value up a bit. Next is CPU support; looking at the
 1405: dmesg output, this is seen:
 1406: 
 1407:     cpu0: Intel Pentium II/Celeron (Deschutes) (686-class), 400.93 MHz
 1408: 
 1409: This indicates that only the `I686_CPU` option needs to be enabled. In the
 1410: next section, all options are left alone except `PIC_DELAY`, which is
 1411: recommended unless the machine is an older one. In this case it is enabled,
 1412: since the 686 is *relatively new*.
 1413: 
 1414: From the last section all the way down to the compat options, there was
 1415: really no need to change anything on this particular system. In the compat
 1416: section, however, there are several options that do not need to be enabled;
 1417: again, because this machine is strictly an FTP server, all compat options
 1418: were turned off.
 1419: 
 1420: The next section is file systems and, again, for this server very few need
 1421: to be enabled; the following were left on:
 1422: 
 1423:     # File systems
 1424:     file-system     FFS             # UFS
 1425:     file-system     LFS             # log-structured file system
 1426:     file-system     MFS             # memory file system
 1427:     file-system     CD9660          # ISO 9660 + Rock Ridge file system
 1428:     file-system     FDESC           # /dev/fd
 1429:     file-system     KERNFS          # /kern
 1430:     file-system     NULLFS          # loopback file system
 1431:     file-system     PROCFS          # /proc
 1432:     file-system     UMAPFS          # NULLFS + uid and gid remapping
 1433:     ...
 1434:     options         SOFTDEP         # FFS soft updates support.
 1435:     ...
 1436: 
 1437: Next comes the network options section. The only options left on were:
 1438: 
 1439:     options         INET            # IP + ICMP + TCP + UDP
 1440:     options         INET6           # IPV6
 1441:     options         IPFILTER_LOG    # ipmon(8) log support
 1442: 
 1443: `IPFILTER_LOG` is a nice one to have around since the server will be running
 1444: ipf.
 1445: 
 1446: The next section is verbose messages for various subsystems; since this
 1447: machine is already running and has had no major problems, all of them are
 1448: commented out.
 1448: 
 1449: #### Some Drivers
 1450: 
 1451: The configurable items in the config file are relatively few and easy to
 1452: cover; device drivers, however, are a different story. In the following
 1453: examples, two drivers are examined and their associated *areas* in the file
 1454: trimmed down. First, a small example: the CD-ROM drive, which appears in
 1455: dmesg as the following lines:
 1455: 
 1456:     ...
 1457:     cd0 at atapibus0 drive 0: <CD-540E, , 1.0A> type 5 cdrom removable
 1458:     cd0: 32-bit data port
 1459:     cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
 1460:     pciide0: secondary channel interrupting at irq 15
 1461:     cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfer
 1462:     ...
 1463: 
 1464: Now it is time to track that section down in the configuration file. Notice
 1465: that the `cd` drive is on an atapibus and requires pciide support. The
 1466: section of interest in this case is the kernel config's "IDE and related
 1467: devices" section. It is worth noting that in and around the IDE section
 1468: there are also ISA, PCMCIA, etc. entries; in this machine's
 1469: [[!template id=man name="dmesg" section="8"]]
 1470: output there are no PCMCIA devices, so it stands to reason that all PCMCIA
 1471: references can be removed. But first, the `cd` drive.
 1472: 
 1473: At the start of the IDE section is the following:
 1474: 
 1475:     ...
 1476:     wd*     at atabus? drive ? flags 0x0000
 1477:     ...
 1478:     atapibus* at atapi?
 1479:     ...
 1480: 
 1481: Well, it is pretty obvious that those lines need to be kept. Next is this:
 1482: 
 1483:     ...
 1484:     # ATAPI devices
 1485:     # flags have the same meaning as for IDE drives.
 1486:     cd*     at atapibus? drive ? flags 0x0000       # ATAPI CD-ROM drives
 1487:     sd*     at atapibus? drive ? flags 0x0000       # ATAPI disk drives
 1488:     st*     at atapibus? drive ? flags 0x0000       # ATAPI tape drives
 1489:     uk*     at atapibus? drive ? flags 0x0000       # ATAPI unknown
 1490:     ...
 1491: 
 1492: The only device type present in the
 1493: [[!template id=man name="dmesg" section="8"]]
 1494: output was the cd; the rest can be commented out.
 1495: 
 1496: The next example is slightly more difficult, network interfaces. This machine
 1497: has two of them:
 1498: 
 1499:     ...
 1500:     ex0 at pci0 dev 17 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x64)
 1501:     ex0: interrupting at irq 10
 1502:     ex0: MAC address 00:50:04:83:ff:b7
 1503:     UI 0x001018 model 0x0012 rev 0 at ex0 phy 24 not configured
 1504:     ex1 at pci0 dev 19 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
 1505:     ex1: interrupting at irq 11
 1506:     ex1: MAC address 00:50:da:63:91:2e
 1507:     exphy0 at ex1 phy 24: 3Com internal media interface
 1508:     exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 1509:     ...
 1510: 
 1511: At first glance it may appear that there are in fact three devices; however,
 1512: a closer look at this line:
 1513: 
 1514:     exphy0 at ex1 phy 24: 3Com internal media interface
 1515: 
 1516: reveals that there are only two physical cards. As with the CD-ROM drive,
 1517: simply removing the names that are not in dmesg will do the job. At the
 1518: beginning of the network interfaces section is:
 1519: 
 1520:     ...
 1521:     # Network Interfaces
 1522:     
 1523:     # PCI network interfaces
 1524:     an*     at pci? dev ? function ?        # Aironet PC4500/PC4800 (802.11)
 1525:     bge*    at pci? dev ? function ?        # Broadcom 570x gigabit Ethernet
 1526:     en*     at pci? dev ? function ?        # ENI/Adaptec ATM
 1527:     ep*     at pci? dev ? function ?        # 3Com 3c59x
 1528:     epic*   at pci? dev ? function ?        # SMC EPIC/100 Ethernet
 1529:     esh*    at pci? dev ? function ?        # Essential HIPPI card
 1530:     ex*     at pci? dev ? function ?        # 3Com 90x[BC]
 1531:     ...
 1532: 
 1533: There is the ex device, so all of the rest under the PCI section can be
 1534: removed. Additionally, every single line all the way down to this one:
 1535: 
 1536:     exphy*  at mii? phy ?                   # 3Com internal PHYs
 1537: 
 1538: can be commented out, as can the remaining interface lines.
 1539: 
 1540: #### Multi Pass
 1541: 
 1542: When I tune a kernel, I like to do it remotely in an X session: the dmesg
 1543: output in one window, the config file in the other. It can sometimes take a
 1544: few passes to rebuild a very trimmed kernel, since it is easy to
 1545: accidentally remove dependencies.
 1546: 
 1547: ### Building the New Kernel
 1548: 
 1549: Now it is time to build the kernel and put it in place. In the conf directory on
 1550: the ftp server, the following command prepares the build:
 1551: 
 1552:     $ config FTP
 1553: 
 1554: When it is done, a message reminding me to run make depend will be
 1555: displayed; next:
 1555: 
 1556:     $ cd ../compile/FTP
 1557:     $ make depend && make
 1558: 
 1559: When it is done, I back up the old kernel and drop the new one in place:
 1560: 
 1561:     # cp /netbsd /netbsd.orig
 1562:     # cp netbsd /
 1563: 
 1564: Now reboot. If the kernel cannot boot, stop the boot process when prompted and
 1565: type `boot netbsd.orig` to boot from the previous kernel.
 1566: 
 1567: ### Shrinking the NetBSD kernel
 1568: 
 1569: When building a kernel for embedded systems, it is often necessary to modify
 1570: the kernel binary to reduce its disk space or memory footprint.
 1571: 
 1572: #### Removing ELF sections and debug information
 1573: 
 1574: We already know how to remove kernel support for drivers and options that you
 1575: don't need, thus saving memory and space, but you can also save some kilobytes
 1576: of space by removing debugging symbols and two ELF sections if you don't need
 1577: them: `.comment` and `.ident`. They are used for storing RCS strings viewable with
 1578: [[!template id=man name="ident" section="1"]] and a
 1579: [[!template id=man name="gcc" section="1"]] version
 1580: string. The following examples assume you have your `TOOLDIR` under
 1581: `/usr/src/tooldir.NetBSD-2.0-i386` and the target architecture is `i386`.
 1582: 
 1583:     $ /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-objdump -h /netbsd
 1584:     
 1585:     /netbsd:     file format elf32-i386
 1586:     
 1587:     Sections:
 1588:     Idx Name          Size      VMA       LMA       File off  Algn
 1589:       0 .text         0057a374  c0100000  c0100000  00001000  2**4
 1590:                       CONTENTS, ALLOC, LOAD, READONLY, CODE
 1591:       1 .rodata       00131433  c067a380  c067a380  0057b380  2**5
 1592:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1593:       2 .rodata.str1.1 00035ea0  c07ab7b3  c07ab7b3  006ac7b3  2**0
 1594:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1595:       3 .rodata.str1.32 00059d13  c07e1660  c07e1660  006e2660  2**5
 1596:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1597:       4 link_set_malloc_types 00000198  c083b374  c083b374  0073c374  2**2
 1598:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1599:       5 link_set_domains 00000024  c083b50c  c083b50c  0073c50c  2**2
 1600:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1601:       6 link_set_pools 00000158  c083b530  c083b530  0073c530  2**2
 1602:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1603:       7 link_set_sysctl_funcs 000000f0  c083b688  c083b688  0073c688  2**2
 1604:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1605:       8 link_set_vfsops 00000044  c083b778  c083b778  0073c778  2**2
 1606:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1607:       9 link_set_dkwedge_methods 00000004  c083b7bc  c083b7bc  0073c7bc  2**2
 1608:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1609:      10 link_set_bufq_strats 0000000c  c083b7c0  c083b7c0  0073c7c0  2**2
 1610:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1611:      11 link_set_evcnts 00000030  c083b7cc  c083b7cc  0073c7cc  2**2
 1612:                       CONTENTS, ALLOC, LOAD, READONLY, DATA
 1613:      12 .data         00048ae4  c083c800  c083c800  0073c800  2**5
 1614:                       CONTENTS, ALLOC, LOAD, DATA
 1615:      13 .bss          00058974  c0885300  c0885300  00785300  2**5
 1616:                       ALLOC
 1617:      14 .comment      0000cda0  00000000  00000000  00785300  2**0
 1618:                       CONTENTS, READONLY
 1619:      15 .ident        000119e4  00000000  00000000  007920a0  2**0
 1620:                       CONTENTS, READONLY
 1621: 
 1622: In the third column we can see the size of each section in hexadecimal. By
 1623: summing the `.comment` and `.ident` sizes we know how much their removal will
 1624: save: around 120KB (0xcda0 + 0x119e4 = 52640 + 72164 = 124804 bytes). To
 1625: remove the sections and any debugging symbols that may be present, we're
 1626: going to use
 1627: [[!template id=man name="strip" section="1"]]:
 1627: 
 1628:     # cp /netbsd /netbsd.orig
 1629:     # /usr/src/tooldir.NetBSD-2.0-i386/bin/i386--netbsdelf-strip -S -R .ident -R .comment /netbsd
 1630:     # ls -l /netbsd /netbsd.orig
 1631:     -rwxr-xr-x  1 root  wheel  8590668 Apr 30 15:56 netbsd
 1632:     -rwxr-xr-x  1 root  wheel  8757547 Apr 30 15:56 netbsd.orig
 1633: 
 1634: Since we also removed debugging symbols, the total amount of disk space saved is
 1635: around 160KB.
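
The arithmetic is easy to double-check from a shell, using the section sizes from the objdump listing and the file sizes from the ls -l output above:

```shell
# Sum of the .comment and .ident section sizes reported by objdump -h.
printf '%d\n' "$((0xcda0 + 0x119e4))"

# Total bytes removed from the kernel file (original size minus stripped
# size); anything beyond the two sections is debugging symbol data.
printf '%d\n' "$((8757547 - 8590668))"
```

The first figure is about 122KB and the second about 163KB, matching the "around 120KB" and "around 160KB" estimates above.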
 1636: 
 1637: #### Compressing the Kernel
 1638: 
 1639: On some architectures, the bootloader can boot a compressed kernel. You can
 1640: save several megabytes of disk space by using this method, but the bootloader
 1641: will take longer to load the kernel.
 1642: 
 1643:     # cp /netbsd /netbsd.plain
 1644:     # gzip -9 /netbsd
 1645: 
 1646: To see how much space we've saved:
 1647: 
 1648:     $ ls -l /netbsd.plain /netbsd.gz
 1649:     -rwxr-xr-x  1 root  wheel  8757547 Apr 29 18:05 /netbsd.plain
 1650:     -rwxr-xr-x  1 root  wheel  3987769 Apr 29 18:05 /netbsd.gz
 1651: 
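As a quick check of the claim, the saving is the difference between the two file sizes above:

```shell
# Disk space saved by gzip-compressing the kernel (sizes from ls -l above).
printf '%d\n' "$((8757547 - 3987769))"
```

That comes to 4769778 bytes, roughly 4.5MB.
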
 1652: Note that you can only use gzip compression, produced with
 1653: [[!template id=man name="gzip" section="1"]]; bzip2
 1654: is not supported by the NetBSD bootloaders!
 1655: 
