Kernel Panic Procedures

This article is a work in progress or otherwise under review and does not represent current policy.

Contents

  1. Synopsis
  2. Preliminary Notes
  3. Obtaining a Kernel Dump
  4. Finding which line caused the crash
  5. Backtrace through trap() in GDB
  6. Example Crash: Force Panic from WSCons via KVM: Dell DRAC4
  7. What Now
  8. Processing the core dump

Synopsis

Although a few official NetBSD.org documents cover advanced kernel debugging with KGDB (the kernelized GNU Debugger, GDB), few documents formalize a "Kernel Panic/Crash Reporting Procedure" that combines DDB (the minimalist in-kernel debugger) with GDB analysis after the crash.

http://www.netbsd.org/docs/kernel/#ddb

Preliminary Notes

If the problem is easily re-created, try to obtain a kernel backtrace.

DDB is the minimalist in-kernel debugger, added to the kernel with options DDB.

Obtain a backtrace at the db{0}> prompt using the bt command.

Search the mailing list archives and query the NetBSD.org PR database for reports of similar issues.

Post the problem for discussion on the appropriate mailing list.

Obtaining a Kernel Dump

A kernel dump can be obtained from many kernel panics. At the DDB prompt, simply execute:

db{0}> sync

The dump of memory will be written to the swap partition.

At boot time, the core dump is copied from the swap partition to /var/crash. The default settings that control this behaviour are in /etc/defaults/rc.conf and can be overridden in /etc/rc.conf:

savecore=yes
savecore_flags="-N /netbsd -z"
savecore_dir="/var/crash"

A gzip(1)-compressed core file will then be available for analysis with gdb(1) or crash(8). To load the core dump into gdb, first uncompress it with gunzip(1), then use target kvm /path/to/netbsd.core.

Your swap partition must be at least the size of your physical RAM.

Your /var/crash partition must have sufficient space to hold the saved dump.
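
The two size requirements above can be checked before a panic ever happens. The following is a minimal sketch rather than an official tool: check_dump_space is a hypothetical helper name, and it conservatively assumes the dump is as large as physical RAM, ignoring any savings from the -z compression flag.

```shell
#!/bin/sh
# check_dump_space RAM_BYTES SWAP_KB CRASH_FREE_KB
# Hypothetical helper: reports whether swap and /var/crash can each hold
# a dump as large as physical RAM. Returns 0 if both fit, 1 otherwise.
check_dump_space() {
    ram_kb=$(( $1 / 1024 ))
    status=0
    if [ "$2" -lt "$ram_kb" ]; then
        echo "swap too small: ${2} KB < ${ram_kb} KB of RAM"
        status=1
    fi
    if [ "$3" -lt "$ram_kb" ]; then
        echo "/var/crash too small: ${3} KB free < ${ram_kb} KB of RAM"
        status=1
    fi
    return $status
}
```

On NetBSD the inputs could plausibly be gathered from sysctl -n hw.physmem64 (RAM in bytes), from swapctl(8) (swap size) and from df -k /var/crash (free space); how those outputs are parsed is left open here because their exact format is not covered by this article.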

Finding which line caused the crash

With a backtrace, it is possible to translate an address into a line of source code.

Stopped in pid 496.1 (gdb) at netbsd:breakpoint+0x5: leave

To find the address of the breakpoint function in the running kernel, use nm(1).

nm /netbsd | grep breakpoint
ffffffff8021df70 T breakpoint
ffffffff8079d944 T db_breakpoint_cmd
ffffffff81644b38 d db_breakpoint_list
ffffffff81644b30 d db_breakpoints_inserted
ffffffff8079d892 T db_clear_breakpoints
ffffffff8079d7d0 t db_find_breakpoint
ffffffff8079d824 T db_find_breakpoint_here
ffffffff81644b40 d db_free_breakpoints
ffffffff81644b48 d db_next_free_breakpoint
ffffffff8079d835 T db_set_breakpoints

Then add 0x5 to the address (the offset 0x5 comes from the panic message above and is not a fixed value) and pass the result to addr2line(1):

addr2line -e /netbsd ffffffff8021df75
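
The lookup and the addition can also be scripted. The sketch below is one possible way to do it, not part of any NetBSD tool: sym_addr and add_offset are hypothetical helper names, and add_offset assumes a 16-digit (64-bit) hex address such as the amd64 one shown above.

```shell
#!/bin/sh
# sym_addr SYMBOL: read nm(1) output on stdin and print the address of the
# symbol whose name matches exactly (a plain grep would also match
# db_breakpoint_cmd and friends).
sym_addr() {
    awk -v sym="$1" '$3 == sym { print $1; exit }'
}

# add_offset ADDR OFFSET: add a small hex offset to a 16-digit hex address.
# The address is split into 32-bit halves so the result does not depend on
# how the shell treats values above the signed 64-bit range.
add_offset() {
    hi=${1%????????}    # upper 8 hex digits
    lo=${1#????????}    # lower 8 hex digits
    sum=$(( 0x$lo + $2 ))
    carry=$(( sum >> 32 ))
    printf '%08x%08x\n' $(( (0x$hi + carry) & 0xffffffff )) $(( sum & 0xffffffff ))
}

# Possible usage on the crashed machine:
#   addr=$(nm /netbsd | sym_addr breakpoint)
#   addr2line -e /netbsd "$(add_offset "$addr" 0x5)"
```

With the values from the example, add_offset ffffffff8021df70 0x5 yields ffffffff8021df75, the address passed to addr2line(1) above.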

Backtrace through trap() in GDB

In gdb(1), import the stack script and run the stack command.

(gdb) source /usr/src/sys/arch/i386/gdbscripts/stack

See port-i386/10313 for more info.

Example Crash: Force Panic from WSCons via KVM: Dell DRAC4

You can invoke the kernel debugger from the console on amd64/i386 using the special key sequence Control+Alt+Esc. See the "Entering the debugger" section of ddb(4) for the key sequence on other platforms.

Once in the debugger, you can have DDB run a preliminary backtrace with the bt command to get a general idea of what went wrong.

You can then force a sync of the file system and a dump of the kernel memory into the swap partition using the sync command.

On the subsequent boot, the /etc/rc.d/savecore script will perform the necessary tasks to archive and gzip(1) the dump.

You can then load the core dump into gdb(1) or crash(8).
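
After several crashes /var/crash may hold more than one dump. As a rough sketch, the hypothetical helper latest_core below picks the highest-numbered core file; it assumes the netbsd.N.core naming seen later in this article, with an optional .gz suffix when savecore runs with -z.

```shell
#!/bin/sh
# latest_core [DIR]: print the highest-numbered netbsd.N.core(.gz) in DIR.
# DIR defaults to /var/crash. Returns 1 if no core file is found.
latest_core() {
    dir=${1:-/var/crash}
    best=-1
    bestfile=
    for f in "$dir"/netbsd.*.core "$dir"/netbsd.*.core.gz; do
        [ -e "$f" ] || continue          # skip unmatched glob patterns
        n=${f##*/netbsd.}                # strip directory and "netbsd."
        n=${n%%.core*}                   # strip ".core" or ".core.gz"
        case $n in
            ''|*[!0-9]*) continue ;;     # ignore non-numeric names
        esac
        if [ "$n" -gt "$best" ]; then
            best=$n
            bestfile=$f
        fi
    done
    [ -n "$bestfile" ] && printf '%s\n' "$bestfile"
}
```

The numeric comparison matters because a plain alphabetical listing would sort netbsd.10.core before netbsd.2.core.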

What Now

You can submit your findings as a problem report (PR) to the NetBSD GNATS system.

Processing the core dump

Hubert Feyrer has a great guide to analyzing kernel panic core dumps.

Additionally, the following command can be used to create a reasonably useful backtrace:

localhost# cd /var/crash
localhost# gunzip *.gz
localhost# gdb  --symbols=/netbsd.gdb --quiet --eval-command="file /netbsd.gdb" \
                --eval-command="target kvm netbsd.1.core" --eval-command "bt" \
                --eval-command "list" --eval-command "info all-registers" 2>&1
Load new symbol table from "/netbsd.gdb"? (y or n) y
Reading symbols from /netbsd.gdb...done.
#0  0xc047c9f8 in cpu_reboot (howto=256, bootstr=0x0) at /usr/src/sys/arch/i386/i386/machdep.c:927
927                     dumpsys();
#0  0xc047c9f8 in cpu_reboot (howto=256, bootstr=0x0) at /usr/src/sys/arch/i386/i386/machdep.c:927
#1  0xc01c3f2a in db_sync_cmd (addr=-1065223264, have_addr=false, count=-1071881791, modif=0xcc883c04 "[BINARY]") at /usr/src/sys/ddb/db_command.c:1304
#2  0xc01c45fa in db_command (last_cmdp=0xc07dfe3c) at /usr/src/sys/ddb/db_command.c:926
#3  0xc01c4856 in db_command_loop () at /usr/src/sys/ddb/db_command.c:583
#4  0xc01c7320 in db_trap (type=1, code=0) at /usr/src/sys/ddb/db_trap.c:101
#5  0xc0478855 in kdb_trap (type=1, code=0, regs=0xcc883e3c) at /usr/src/sys/arch/i386/i386/db_interface.c:229
#6  0xc047efe2 in trap (frame=0xcc883e3c) at /usr/src/sys/arch/i386/i386/trap.c:350
#7  0xc010cb80 in calltrap ()
#8  0xc047717c in breakpoint ()
#9  0xc02e3676 in wskbd_translate (id=0xc0833ae0, type=2, value=<value optimized out>) at /usr/src/sys/dev/wscons/wskbd.c:1586
#10 0xc02e386e in wskbd_input (dev=0xcc888800, type=2, value=1) at /usr/src/sys/dev/wscons/wskbd.c:682
#11 0xc054c27a in pckbd_input (vsc=0xcc0cc6a8, data=1) at /usr/src/sys/dev/pckbport/pckbd.c:584
#12 0xc02ba80d in pckbcintr (vsc=0xcc0d6ebc) at /usr/src/sys/dev/ic/pckbc.c:607
#13 0xc0465798 in intr_biglock_wrapper (vp=0xc2e853c0) at /usr/src/sys/arch/x86/x86/intr.c:617
#14 0xc01036d9 in Xintr_ioapic_edge3 ()
#15 0xc0477234 in x86_mwait ()
Previous frame inner to this frame (corrupt stack?)
922             /* Disable interrupts. */
923             splhigh();
924     
925             /* Do a dump if requested. */
926             if ((howto & (RB_DUMP | RB_HALT)) == RB_DUMP)
927                     dumpsys();
928     
929     haltsys:
930             doshutdownhooks();
931     
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x100    256
esp            0xcc883bb8       0xcc883bb8
ebp            0xcc883bc0       0xcc883bc0
esi            0xc07dfe3c       -1065484740
edi            0x0      0
eip            0xc047c9f8       0xc047c9f8 <cpu_reboot+368>
eflags         0x0      [ ]
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
st0            0        (raw 0x00000000000000000000)
st1            0        (raw 0x00000000000000000000)
st2            0        (raw 0x00000000000000000000)
st3            0        (raw 0x00000000000000000000)
st4            0        (raw 0x00000000000000000000)
st5            0        (raw 0x00000000000000000000)
st6            0        (raw 0x00000000000000000000)
st7            0        (raw 0x00000000000000000000)
fctrl          0x0      0
fstat          0x0      0
ftag           0x0      0
fiseg          0x0      0
fioff          0x0      0
foseg          0x0      0
fooff          0x0      0
fop            0x0      0
xmm0           
xmm1           
xmm2           
xmm3           
xmm4           
xmm5           
xmm6           
xmm7           
mm0            
mm1            
mm2            
mm3            
mm4            
mm5            
mm6            
mm7            
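
The session above stops at an interactive "Load new symbol table" prompt. One way to avoid that is to put the same commands into a command file and run gdb(1) with -batch, which disables confirmation prompts. This is only a sketch: gdb_report_cmds is a hypothetical helper name, /tmp/panic.gdb and panic-report.txt are illustrative file names, and target kvm is taken from the command used above.

```shell
#!/bin/sh
# gdb_report_cmds CORE: print a gdb command file that loads the given
# kernel core and emits a backtrace, source listing and register dump.
gdb_report_cmds() {
    cat <<EOF
target kvm $1
bt
list
info all-registers
EOF
}

# Possible usage:
#   gdb_report_cmds netbsd.1.core > /tmp/panic.gdb
#   gdb -batch -x /tmp/panic.gdb /netbsd.gdb > panic-report.txt 2>&1
```

The resulting text file can be attached to a mailing list post or problem report.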