Many tools lie at your disposal for debugging Pintos. This appendix introduces you to a few of them.
printf()
Don't underestimate the value of printf(). The way printf() is implemented in Pintos, you can call it from practically anywhere in the kernel, whether it's in a kernel thread or an interrupt handler, almost regardless of what locks are held (but see printf Reboots for a counterexample).
printf() is useful for more than just examining data. It can also help figure out when and where something goes wrong, even when the kernel crashes or panics without a useful error message. The strategy is to sprinkle calls to printf() with different strings (e.g. "<1>", "<2>", ...) throughout the pieces of code you suspect are failing. If you don't even see <1> printed, then something bad happened before that point; if you see <1> but not <2>, then something bad happened between those two points; and so on. Based on what you learn, you can then insert more printf() calls in the new, smaller region of code you suspect. Eventually you can narrow the problem down to a single statement. See section E.6 Debugging by Infinite Loop, for a related technique.
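The marker strategy can be sketched as follows. This is an illustration written as ordinary hosted C, not code from Pintos: the function sum_entries() and its arguments are hypothetical, and the only point is where the numbered markers go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical routine under suspicion: sums the first n entries of data.
   The numbered markers show how far execution got before a crash. */
int
sum_entries (const int *data, size_t n)
{
  int total = 0;
  size_t i;

  assert (data != NULL);

  printf ("<1>\n");                   /* Reached the start of the function. */
  for (i = 0; i < n; i++)
    total += data[i];
  printf ("<2>\n");                   /* The loop finished without crashing. */

  printf ("<3> total=%d\n", total);   /* Markers can also carry data. */
  return total;
}
```

If <1> appears but <2> does not, the loop is the place to add finer-grained markers. When trying this in a hosted program rather than in the kernel, standard output may be buffered, so a crash can swallow markers that were already printed; calling fflush (stdout) after each marker avoids that.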
ASSERT
Assertions are useful because they can catch problems early, before they'd otherwise be noticed. Pintos provides the ASSERT macro, defined in <debug.h>, for assertions. Ideally, each function should begin with a set of assertions that check its arguments for validity. (Initializers for functions' local variables are evaluated before assertions are checked, so be careful not to assume that an argument is valid in an initializer.) You can also sprinkle assertions throughout the body of functions in places where you suspect things are likely to go wrong. They are especially useful for checking loop invariants.
When an assertion proves untrue, the kernel panics. The panic message should help you to find the problem. See the description of backtraces below for more information.
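A minimal sketch of the advice about checking arguments first, including the caveat about initializers. So that it builds outside Pintos, ASSERT is defined here as a stand-in on top of the standard assert(); in the kernel you would include <debug.h> instead. The struct buffer type and buffer_last_byte() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in so the sketch builds outside Pintos; in the kernel, include
   <debug.h> and use its ASSERT instead. */
#define ASSERT(CONDITION) assert (CONDITION)

struct buffer
  {
    const char *bytes;
    size_t length;
  };

/* Hypothetical helper: returns the last byte of a non-empty buffer. */
char
buffer_last_byte (const struct buffer *b)
{
  /* Writing "size_t length = b->length;" here as an initializer would
     dereference b before the assertions below had a chance to run. */
  size_t length;

  ASSERT (b != NULL);
  ASSERT (b->bytes != NULL);
  ASSERT (b->length > 0);

  length = b->length;           /* Safe: b has been validated. */
  return b->bytes[length - 1];
}
```

Declaring the local variable without an initializer and assigning it after the assertions keeps a bad argument from being used before it is checked.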
The macros defined in <debug.h> tell the compiler special attributes of a function or function parameter. Their expansions are GCC-specific. For example, PRINTF_FORMAT (format, first), appended to a function prototype, tells the compiler that the function takes a printf()-like format string as the argument numbered format (starting from 1) and that the corresponding value arguments start at the argument numbered first. This lets the compiler tell you if you pass the wrong argument types.
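The following sketch shows how a format-checking attribute of this kind is typically used. It assumes the macro expands to GCC's format (printf, ...) function attribute, which matches the description above but is restated here rather than taken from <debug.h>; format_message() is a hypothetical wrapper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed expansion (GCC-specific): FMT is the position of the format
   string, FIRST the position of the first value argument. */
#define PRINTF_FORMAT(FMT, FIRST) \
  __attribute__ ((format (printf, FMT, FIRST)))

/* Hypothetical wrapper: the format string is argument 3 and the values
   start at argument 4. */
int format_message (char *buf, size_t size, const char *format, ...)
  PRINTF_FORMAT (3, 4);

int
format_message (char *buf, size_t size, const char *format, ...)
{
  va_list args;
  int length;

  assert (buf != NULL && format != NULL);

  va_start (args, format);
  length = vsnprintf (buf, size, format, args);
  va_end (args);
  return length;
}
```

With the attribute in place, a call such as format_message (buf, sizeof buf, "%d", "oops") draws a format warning from GCC when -Wformat (part of -Wall) is enabled, instead of misbehaving at run time.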
When the kernel panics, it prints a "backtrace," that is, a summary of how your program got where it is, as a list of addresses inside the functions that were running at the time of the panic. You can also insert a call to debug_backtrace(), prototyped in <debug.h>, to print a backtrace at any point in your code.
The addresses in a backtrace are listed as raw hexadecimal numbers, which are meaningless by themselves. You can translate them into function names and source file line numbers using a tool called addr2line.
The output format of addr2line is not ideal, so we've supplied a wrapper for it simply called backtrace. Give it the name of your kernel.o as the first argument and the hexadecimal numbers composing the backtrace (including the 0x prefixes) as the remaining arguments. It outputs the function name and source file line numbers that correspond to each address.
If the translated form of a backtrace is garbled, or doesn't make sense (e.g. function A is listed above function B, but B doesn't call A), then it's a good sign that you're corrupting a kernel thread's stack, because the backtrace is extracted from the stack. Alternatively, it could be that the kernel.o you passed to backtrace does not correspond to the kernel that produced the backtrace.
Sometimes backtraces can be confusing without implying corruption. Compiler optimizations can cause surprising behavior. When a function has called another function as its final action (a tail call), the calling function may not appear in a backtrace at all. Similarly, when function A calls another function B that never returns, the compiler may optimize such that an unrelated function C appears in the backtrace instead of A. Function C is simply the function that happens to be in memory just after A. In the threads project, this is commonly seen in backtraces for test failures; see pass() Fails for more information.
Here's an example. Suppose that Pintos printed out the following call stack, which is taken from an actual Pintos submission for the file system project:
Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
You would then invoke the backtrace utility as shown below, cutting and pasting the backtrace information into the command line. This assumes that kernel.o is in the current directory. You would of course enter all of the following on a single shell command line, even though that would overflow our margins here:
backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
The backtrace output would then look something like this:
0xc0106eff: debug_panic (../../lib/debug.c:86)
0xc01102fb: file_seek (../../filesys/file.c:405)
0xc010dc22: seek (../../userprog/syscall.c:744)
0xc010cf67: syscall_handler (../../userprog/syscall.c:444)
0xc0102319: intr_handler (../../threads/interrupt.c:334)
0xc010325a: ?? (threads/intr-stubs.S:1554)
0x804812c: ?? (??:0)
0x8048a96: ?? (??:0)
0x8048ac8: ?? (??:0)
(You will probably not get the same results if you run the command above on your own kernel binary, because the source code you compiled from is different from the source code that panicked.)
The first line in the backtrace refers to debug_panic(), the function that implements kernel panics. Because backtraces commonly result from kernel panics, debug_panic() will often be the first function shown in a backtrace.
The second line shows file_seek() as the function that panicked, in this case as the result of an assertion failure. In the source code tree used for this example, line 405 of filesys/file.c is the assertion
ASSERT (file_ofs >= 0);
(This line was also cited in the assertion failure message.) Thus, file_seek() panicked because it was passed a negative file offset argument.
The third line indicates that seek() called file_seek(), presumably without validating the offset argument. In this submission, seek() implements the seek system call.
The fourth line shows that syscall_handler(), the system call handler, invoked seek().
The fifth and sixth lines are the interrupt handler entry path.
The remaining lines are for addresses below PHYS_BASE. This means that they refer to addresses in the user program, not in the kernel. If you know what user program was running when the kernel panicked, you can re-run backtrace on the user program, like so (typing the command on a single line, of course):
backtrace grow-too-big 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
The results look like this:
0xc0106eff: ?? (??:0)
0xc01102fb: ?? (??:0)
0xc010dc22: ?? (??:0)
0xc010cf67: ?? (??:0)
0xc0102319: ?? (??:0)
0xc010325a: ?? (??:0)
0x804812c: test_main (../../tests/filesys/extended/grow-too-big.c:20)
0x8048a96: main (../../tests/main.c:10)
0x8048ac8: _start (../../lib/user/entry.c:9)
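The split between kernel and user addresses in this call stack can also be checked mechanically. The sketch below assumes the default Pintos layout, in which PHYS_BASE is 0xc0000000; the constant and both function names are defined locally for illustration and are not taken from the Pintos headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the default Pintos layout, with PHYS_BASE at 0xc0000000. */
#define ASSUMED_PHYS_BASE 0xc0000000u

/* Returns true if ADDRESS lies in kernel space, false if it lies in the
   user program's part of the address space. */
bool
is_kernel_address (uint32_t address)
{
  return address >= ASSUMED_PHYS_BASE;
}

/* Counts how many leading entries of a call stack are kernel addresses.
   The entries after that belong to the user program. */
size_t
count_kernel_frames (const uint32_t *frames, size_t n)
{
  size_t count = 0;

  assert (frames != NULL);
  while (count < n && is_kernel_address (frames[count]))
    count++;
  return count;
}
```

For the nine addresses in the example, the first six are kernel addresses and the last three are user addresses, which is why each run of backtrace resolves only one group of them.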
Here's an extra tip for anyone who read this far: backtrace is smart enough to strip the "Call stack:" header and "." trailer from the command line if you include them. This can save you a little bit of trouble in cutting and pasting. Thus, the following command prints the same output as the first one we used:
backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
gdb
You can run the Pintos kernel under the supervision of the gdb (80x86) or i386-elf-gdb (SPARC) debugger. First, start Pintos with the --gdb option, e.g. pintos --gdb -- run mytest. Second, in a separate terminal, invoke gdb (or i386-elf-gdb) on kernel.o:
gdb kernel.o
Then, issue this gdb command:
target remote localhost:1234
(If the target remote command fails, then make sure that both gdb and pintos are running on the same machine by running hostname in each terminal. If the names printed differ, then you need to open a new terminal for gdb on the machine running pintos.)
Now gdb is connected to the simulator over a local network connection. You can now issue any normal gdb commands. If you issue the c command, the simulated BIOS will take control, load Pintos, and then Pintos will run in the usual way. You can pause the process at any point with Ctrl+C. If you want gdb to stop when Pintos starts running, set a breakpoint on main() with the command break main before c.
You can read the gdb manual by typing info gdb at a terminal command prompt, or you can view it in Emacs with the command C-h i. Here are a few commonly useful gdb commands:
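c
    Continues execution until Ctrl+C or the next breakpoint.
break function
break filename:linenum
break *address
    Sets a breakpoint at function, at linenum within filename, or at address. (Use a 0x prefix to specify an address in hex.)
p expression
    Evaluates the given expression and prints its value.
l *address
    Lists a few lines of code around address. (Use a 0x prefix to specify an address in hex.)
bt
    Prints a stack backtrace similar to that output by the backtrace program described above.
p/a address
    Prints the name of the function or variable that occupies address. (Use a 0x prefix to specify an address in hex.)
disassemble function
    Disassembles function.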
If you notice other strange behavior while using gdb, there are three possibilities: a bug in your modified Pintos, a bug in Bochs's interface to gdb or in gdb itself, or a bug in the original Pintos code. The first and second are quite likely, and you should seriously consider both. We hope that the third is less likely, but it is also possible.
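You can also use gdb to debug a user program running under Pintos. Start by issuing this gdb command to load the program's symbol table:
add-symbol-file program
Here program is the name of the program's executable. (You can also give the user executable's name on the gdb command line, instead of kernel.o.)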
If you get yourself into a situation where the machine reboots in a loop, that's probably a "triple fault." In such a situation you might not be able to use printf() for debugging, because the reboots might be happening even before everything needed for printf() is initialized. In such a situation, you might want to try what I call "debugging by infinite loop."
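What you do is pick a place in the Pintos code, insert the statement for (;;); there, and recompile and run. There are two likely possibilities:
The machine hangs without rebooting. In that case the infinite loop is running, so whatever caused the reboot must come after the place where you inserted the loop. Move the loop later in the code sequence.
The machine reboots in a loop. In that case the machine never reached the infinite loop, so whatever caused the reboot must come before the place where you inserted the loop. Move the loop earlier in the code sequence.
If you move around the infinite loop in a "binary search" fashion, you can use this technique to pin down the exact spot where everything goes wrong. It should only take a few minutes at most.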
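To see why the binary search converges quickly, here is a small simulation written as ordinary hosted C. It is a model of the procedure, not kernel code: the boot sequence is reduced to n_steps numbered steps, machine_hangs() stands in for one recompile-and-run cycle, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulates one build-and-run cycle with "for (;;);" placed just before
   step loop_at.  Returns 1 if the machine hangs (the loop was reached
   before the fault), 0 if it reboots (the faulting step ran first). */
static int
machine_hangs (int loop_at, int faulting_step)
{
  return loop_at <= faulting_step;
}

/* Narrows down which of n_steps steps triggers the reboot.  *runs is set
   to the number of simulated build-and-run cycles that were needed. */
int
locate_fault (int n_steps, int faulting_step, int *runs)
{
  int lo = 0;                   /* The fault is at step lo or later... */
  int hi = n_steps;             /* ...and before step hi. */

  assert (n_steps > 0);
  assert (faulting_step >= 0 && faulting_step < n_steps);
  assert (runs != NULL);

  *runs = 0;
  while (hi - lo > 1)
    {
      int mid = lo + (hi - lo) / 2;

      (*runs)++;
      if (machine_hangs (mid, faulting_step))
        lo = mid;               /* Hang: the fault comes at mid or later. */
      else
        hi = mid;               /* Reboot: the fault comes before mid. */
    }
  return lo;
}
```

Each run halves the region that can contain the fault, so the number of rebuilds grows only with the logarithm of the number of candidate locations: 16 candidates take 4 runs in this model.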
An advanced debugging technique is to modify and recompile the simulator. This proves useful when the simulated hardware has more information than it makes available to the OS. For example, page faults have a long list of potential causes, but the hardware does not report to the OS exactly which one is the particular cause. Furthermore, a bug in the kernel's handling of page faults can easily lead to recursive faults, but a "triple fault" will cause the CPU to reset itself, which is hardly conducive to debugging.
In a case like this, you might appreciate being able to make Bochs print out more debug information, such as the exact type of fault that occurred. It's not very hard. You start by retrieving the source code for Bochs 2.2.5 from http://bochs.sourceforge.net and extracting it into a directory. If desired, apply pintos/src/misc/bochs-2.2.5.jitter.patch. Then run ./configure, supplying the options you want (some suggestions are in the patch file). Finally, run make. This will compile Bochs and eventually produce a new binary bochs. To use your bochs binary with pintos, put it in your PATH, and make sure that it is earlier than /home/courses/cs3204/bin/bochs.
Of course, to get any good out of this you'll have to actually modify Bochs. Instructions for doing this are firmly out of the scope of this document. However, if you want to debug page faults as suggested above, a good place to start adding printf()s is BX_CPU_C::dtranslate_linear() in cpu/paging.cc.
The page allocator in threads/palloc.c and the block allocator in threads/malloc.c both set all the bytes in pages and blocks to 0xcc when they are freed. Thus, if you see an attempt to dereference a pointer like 0xcccccccc, or some other reference to 0xcc, there's a good chance you're trying to reuse a page that's already been freed. Also, byte 0xcc is the CPU opcode for "invoke interrupt 3," so if you see an error like Interrupt 0x03 (#BP Breakpoint Exception), Pintos tried to execute code in a freed page or block.
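The effect of the 0xcc fill can be reproduced in a few lines of hosted C. This sketch only mimics the fill step and releases no memory, so reading the structure afterwards is well defined here; struct node, poison_freed(), and looks_freed() are hypothetical names, and the 32-bit next field stands in for a kernel pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct node
  {
    uint32_t next;              /* Stand-in for a 32-bit kernel pointer. */
    uint32_t value;
  };

/* Mimics only the fill performed on free: every byte becomes 0xcc.
   Nothing is actually released in this sketch. */
void
poison_freed (void *block, size_t size)
{
  assert (block != NULL);
  memset (block, 0xcc, size);
}

/* Returns nonzero if WORD has the pattern left behind by the fill. */
int
looks_freed (uint32_t word)
{
  return word == 0xccccccccu;
}
```

After the fill, every 32-bit field reads back as 0xcccccccc, which is why that value showing up in a fault address or a panic message points at use of freed memory.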
An assertion failure on the expression sec_no < d->capacity indicates that Pintos tried to access a file through an inode that has been closed and freed. Freeing an inode clears its starting sector number to 0xcccccccc, which is not a valid sector number for disks smaller than about 1.6 TB.
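The 1.6 TB figure follows from simple arithmetic, sketched below under the assumption of 512-byte sectors. The constant and function names are hypothetical; only the expression sec_no < capacity mirrors the assertion quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512-byte disk sectors. */
#define ASSUMED_SECTOR_SIZE 512u

/* Returns the byte offset at which sector sec_no starts. */
uint64_t
sector_byte_offset (uint32_t sec_no)
{
  return (uint64_t) sec_no * ASSUMED_SECTOR_SIZE;
}

/* Returns nonzero if sec_no is a valid sector on a disk that holds
   capacity sectors. */
int
sector_in_range (uint32_t sec_no, uint32_t capacity)
{
  return sec_no < capacity;
}

/* Hosted equivalent of the check that fails in the kernel. */
void
check_sector (uint32_t sec_no, uint32_t capacity)
{
  assert (sector_in_range (sec_no, capacity));
}
```

Sector 0xcccccccc would start at byte 1,759,218,604,032, which is 1.6 times 2^40 bytes, so only a disk larger than that could contain it. On the small disks used with Pintos the range check fails immediately.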