@c Copyright (c) 1998 - 2000 Kungliga Tekniska Högskolan @c (Royal Institute of Technology, Stockholm, Sweden). @c All rights reserved. @c $arla: debugging.texi,v 1.28 2002/11/29 19:53:00 tol Exp $ @node Debugging, Arlad, Parts of Arla, Top @chapter Debugging @cindex Debugging This chapter of the manual includes tips that are useful when debugging arla. Arla and nnpfs contains logging facilities that is quite useful when debugging when something goes wrong. This and some kernel debugging tips are described. @menu * Arlad:: * Debugging LWP with GDB:: * nnpfs:: * nnpfs on linux:: * Debugging techniques:: * Kernel debuggers:: * Darwin/MacOS X:: @end menu @node Arlad, Debugging LWP with GDB, Debugging, Debugging @comment node-name, next, previous, up @section Arlad @cindex Arlad debugging @cindex Debugging arlad If arlad is run without any arguments arlad will fork(2) and log to syslog(3). To disable forking use the --no-fork (-n) switch. In the current state of the code, arlad is allways to be started with the recover (-z) switch. This will invalidate your cache at startup. This restriction may be dropped in the future. To enable more debuggning run arla with the switch --debug=module1,module2,... One useful combination is @example --debug=all,-cleaner @end example The cleaner output is usully not that intresting and can be ignored. A convenient way to debug arlad is to start it inside gdb. @example datan:~# gdb /usr/arla/libexec/arlad (gdb) run -z -n @end example This gives you the opportunity to examine a crashed arlad. @example (gdb) bt @end example The arla team appreciates cut and paste information from the beginning to the end of the bt output from such a gdb run. To set the debugging with a running arlad use @code{fs arladeb} as root. @example datan:~# fs arladeb arladebug is: none datan:~# fs arladeb almost-all datan:~# @end example By default, arlad logs through syslog if running as a daemon and to stderr when running in the foreground (with @kbd{--no-fork}). @node Debugging LWP with GDB, nnpfs, Arlad, Debugging @comment node-name, next, previous, up @section Debugging LWP with GDB @cindex Gdb For easy tracing of threads we have patch (@url{http://www.stacken.kth.se/projekt/arla/gdb-4.18-backfrom.diff}) for gdb 4.18 (a new command) and a gdb sequence (think script). The sequence only works for i386, but its just matter of choosing different offset in topstack to find $fp and $pc in the lwp_ps_internal part of the sequence. You should copy the @file{.gdbinit} (that you can find in the arlad directory in the source-code) to your home-directory, the directory from where you startat the patched gdb or use flag -x to gdb. Your debugging session might look like this: @example (gdb) lwp_ps Runnable[0] name: IO MANAGER eventlist: fp: 0x806aac4 pc: 0x806aac4 name: producer eventlist: 8048b00 fp: 0x8083b40 pc: 0x8083b40 Runnable[1] [...] (gdb) help backfrom Print backtrace of FRAMEPOINTER and PROGRAMCOUNTER. (gdb) backfrom 0x8083b40 0x8083b40 #0 0x8083b40 in ?? () #1 0x8049e2f in LWP_MwaitProcess (wcount=1, evlist=0x8083b70) at /afs/e.kth.se/home/staff/lha/src/cvs/arla-foo/lwp/lwp.c:567 #2 0x8049eaf in LWP_WaitProcess (event=0x8048b00) at /afs/e.kth.se/home/staff/lha/src/cvs/arla-foo/lwp/lwp.c:585 #3 0x8048b12 in Producer (foo=0x0) at /afs/e.kth.se/home/staff/lha/src/cvs/arla-foo/lwp/testlwp.c:76 #4 0x804a00c in Create_Process_Part2 () at /afs/e.kth.se/home/staff/lha/src/cvs/arla-foo/lwp/lwp.c:629 #5 0xfffefdfc in ?? () #6 0x8051980 in ?? () @end example There also the possibility to run arla with pthreads (run configure with --with-pthreads). @node nnpfs, nnpfs on linux, Debugging LWP with GDB, Debugging @comment node-name, next, previous, up @section nnpfs @cindex Debugging NNPFS @cindex NNPFS debugging NNPFS debugging does almost look the same on all platforms. They all share same debugging flags, but not all are enabled on all platforms. Change the debugging with the @kbd{fs nnpfsdebug} command. @example datan:~# fs nnpfsdebug nnpfsdebug is: none datan:~# fs nnpfsdebug almost-all datan:~# @end example If it crashes before you have an opportunity to set the debug level, you will have to edit @file{nnpfs/@var{your-os}/nnpfs_deb.c} and recompile. The logging of nnpfs ends up in your syslog. Syslog usully logs to /var/log or /var/adm (look in /etc/syslog.conf). @node nnpfs on linux, Debugging techniques, nnpfs, Debugging @comment node-name, next, previous, up @section nnpfs on linux There is a problem with klogd, it's too slow. Cat the @file{/proc/kmsg} file instead. Remember to kill klogd, since the reader will delete the text from the ring-bufer, and you will only get some of the message in your cat. @node Debugging techniques, Kernel debuggers, nnpfs on linux, Debugging @comment node-name, next, previous, up @section Debugging techniques @cindex Debugging techniques Kernel debugging can sometimes force you to exercise your imagination. We have learned some different techniques that can be useful. @subsection Signals On operatingsystems with kernel debugger that you can use probably find where in the kernel a user-program live, and thus deadlocks or trigger the bad event, that later will result in a bug. This is a problem, how do you some a process to find where it did the intresting thing when you can't set a kernel breakpoint ? One way to be notified is to send a signal from the kernel module (psignal() on a BSD and force_sig() on linux). SIGABRT() is quite useful if you want to force a coredump. If you want to continue debugging, use SIGSTOP. @subsection Recreateable testcases Make sure bugs don't get reintroduced. @node Kernel debuggers, Darwin/MacOS X, Debugging techniques, Debugging @comment node-name, next, previous, up @section Kernel debuggers @cindex Kernel debugging Kernel debuggers are the most useful tool when you are trying to figure out what's wrong with nnpfs. Unfortunately they also seem to have their own life and do not always behave as expected. @subsection Using GDB Kernel debugging on NetBSD, OpenBSD, FreeBSD and Darwin are almost the same. You get the idea from the NetBSD example below: @example (gdb) file netbsd.N (gdb) target kcore netbsd.N.core (gdb) symbol-file /sys/arch/i386/compile/NUTCRACKER/netbsd.gdb @end example This example loads the kernel symbols into gdb. But this doesn't show the nnpfs symbols, and that makes your life harder. @subsection Getting all symbols loaded at the same time If you want to use the symbols of nnpfs, there is a gdb command called @samp{add-symbol-file} that is useful. The symbol file is obtained by loading the kernel module nnpfs with @samp{kmodload -o /tmp/nnpfs-sym} (Darwin) or @samp{modload} (NetBSD and OpenBSD). FreeBSD has a linker in the kernel that does the linking instead of relying on @samp{ld}. The symbol address where the module is loaded get be gotten from @samp{modstat}, @samp{kldstat} or @samp{kmodstat} (it's in the @code{area} field). If you forgot the to run modstat/kldstat/kmodstat, you can extract the information from the kernel. In Darwin you look at the variable kmod (you might have to case it to a (kmod_info_t *). We have seen gdb lose the debugging info). kmod is the start of a linked list. Other BSDs have some variant of this. You should also source the commands in /sys/gdbscripts (NetBSD), or System/xnu/osfmk/.gdbinit (Darwin) to get commands like ps inside gdb. @example datan:~# modstat Type Id Off Loadaddr Size Info Rev Module Name DEV 0 29 ce37f000 002c ce388fc0 1 nnpfs_mod [...] [...] (gdb) add-symbol-table nnpfs.sym ce37f000 @end example @subsection Debugging processes, examine their stack, etc One of diffrencies between the BSD's are the @code{proc}, a command that enables you do to a backtrace on all processes. On FreeBSD you give the @code{proc} command a @samp{pid}, but on NetBSD and OpenBSD you give a pointer to a @code{struct proc}. After you have ran @code{proc} to set the current process, you can examine the backtrace with the regular @code{backtrace} command. Darwin does't have a @code{proc} command. Instead you are supposed to use gdb sequences (System/xnu/osfmk/.gdbinit) to print process stacks, threads, activations, and other information. @subsection Debugging Linux @cindex Kernel debuging on linux @cindex Linux kernel debugging You can't get a crashdump for linux with patching the kernel. There are two projects have support for this. Mission Critical Linux @url{http://www.missioncritiallinux.com} and SGI @url{http://oss.sgi.com/}. Remember save the context of /proc/ksyms before you crash, since this is needed to figure out where the nnpfs symbols are located in the kernel. But you can still use the debugger (or objdump) to figure out where in the binary that you crashed. @cindex Ksymoops @code{ksymoops} can be used to create a backtrace. @subsection Using adb @cindex adb Adb is not a symbolic debugger, this means that you have to read the disassembled object-code to figure out where it made the wrong turn and died. You might be able to use GNU objdump to list the assembler and source-code intertwined (@samp{objdump -S -d mod_nnpfs.o}). Remember that GNU binutils for sparc-v9 isn't that good yet. You can find the script that use use for the adb command @samp{$<} in @file{/usr/lib/adb} and @file{/usr/platform/PLATFORNAME/adb}. @subsection Debugging a live kernel @cindex Live kernel An important thing to know is that you can debug a live kernel too, this can be useful to find dead-locks. To attach to a kernel you use a command like this on a BSD system (that is using gdb): @example (gdb) file /netbsd (gdb) target kcore /dev/mem (gdb) symbol-file /sys/arch/i386/compile/NUTCRACKER/netbsd.gdb @end example And on Solaris: @example # adb -k /dev/ksyms /dev/mem @end example @subsection Other useful debugging tools Most diagnosics tools like ps, dmesg, and pstat on BSD systems used to look in kernel memory to extract information (and thus earned the name kmem-groovlers). On some systems they have been replaced with other method of getting their data, like /proc and sysctl. But due to their heritage they can still be used in with a kernel and coredump to extract information on some system. @node Darwin/MacOS X, Porting, Kernel debuggers, Debugging @comment node-name, next, previous, up @section Darwin/MacOS X You'll need two computers to debug arlad/nnpfs on darwin since the common way to debug is to use a remote kernel-debugger over IP/UDP. First you need to publish the arp-address of the computer that you are going to crash. We have not found any kernel symbols in MacOSX Public Beta, so you should probably build your own kernel. Use Darwin xnu kernel source with cvs-tag: Apple-103-0-1 (not xnu-103). @example gdb nnpfs.out target remote-kdp add-symbol-table ... attach @end example