The handling of such faults is implemented as a combination of hardware and software. The hardware (specifically the CPU's memory management unit or MMU) knows which addresses are valid and which are not. When an instruction attempts to read from or write to an address that is not valid, the CPU aborts this instruction and jumps to a fault handler at a particular address ("intr_entry" in Pintos).
This fault handler must handle this situation. If the fault occurred in user code (which is possible in Pintos Project 2 and up), then the user process must be terminated. If the fault occurred in kernel code, then this is indicative of a bug (the kernel should never access invalid addresses). In this case, Pintos attempts to output as much useful information as it can to the console, before calling PANIC() which will halt the machine.
Let's look at an example. When running test_priority_change, you might see:
gback@cantaloupe [21](~/pintos/pintos-solution2/src/threads/build) > pintos -v -T 60 -- -q run priority-change Writing command line to /tmp/Wh1QpnjYEa.dsk... bochs -q ======================================================================== Bochs x86 Emulator 2.2.5 Build from CVS snapshot on December 30, 2005 ======================================================================== 00000000000i[ ] reading configuration from bochsrc.txt 00000000000i[ ] installing nogui module as the Bochs GUI 00000000000i[ ] using log file bochsout.txt Pintos booting with 4,096 kB RAM... Kernel command line: -q run priority-change 374 pages available in kernel pool. 373 pages available in user pool. Interrupt 0x0e (#PF Page-Fault Exception) at eip=0xc0101647 cr2=00000000 error=00000002 eax=00000000 ebx=00000089 ecx=00000004 edx=00000002 esi=00007369 edi=c0111000 esp=c0116000 ebp=c002ff58 cs=0008 ds=0010 es=0010 ss=d581 Kernel PANIC at ../../threads/interrupt.c:354 in intr_handler(): Unexpected interrupt Call stack: 0xc01088c4 0xc0101fce 0xc010218b 0xc0101647 0xc0100e9f 0xc01008d8 0xc01000a2. The `backtrace' program can make call stacks useful. Read "Backtraces" in the "Debugging Tools" chapter of the Pintos documentation for more information. Timer: 0 ticks Thread: 0 idle ticks, 0 kernel ticks, 0 user ticks Console: 774 characters output Keyboard: 0 keys pressed Powering off... ======================================================================== Bochs is exiting with the following message: [UNMP ] Shutdown port: shutdown requested ========================================================================Your first step (the automated grading scripts even do that for you if they detect a test fails) is to run backtrace on the list of addresses listed after Call stack:
gback@cantaloupe [22](~/pintos/pintos-solution2/src/threads/build) > backtrace 0xc01088c4 0xc0101fce 0xc010218b 0xc0101647 0xc0100e9f 0xc01008d8 0xc01000a2 0xc01088c4: debug_panic (../../lib/kernel/debug.c:32) 0xc0101fce: intr_handler (..../threads/interrupt.c:356) 0xc010218b: intr_entry (..../threads/intr-stubs.S:38) 0xc0101647: alloc_frame (../../threads/thread.c:635) 0xc0100e9f: thread_create (../../threads/thread.c:291) 0xc01008d8: thread_start (../../threads/thread.c:105) 0xc01000a2: main (../../threads/init.c:123)This backtrace is read backwards. It says that main() called thread_start() on line 123 in file threads/init.c, that thread_start() called thread_create() on line 105 in thread.c, and so on.
You should first examine the backtrace if it makes sense (meaning whether indeed in your code, in thread.c, on line 105, main() calls thread_start(). See Section E.4 Backtraces in the Pintos documentation for more information on when not to trust the information displayed.) In this case, you'll notice that everything checks out, except for intr_entry(), which is not called in alloc_frame() on thread.c:635. Instead, this is where the abrupt control transfer happened as a result of the page fault exception. You need to take a closer look. You should first look at line 635 in thread.c and see whether it is obvious what's causing the fault.
If you can't see it already, you'll have to dig deeper. The state dump in the interrupt handler gives you two more important pieces of information: it tells you the instruction that caused the fault, and it tells you the address the instruction tried to access. Look at eip for the address of the faulting instruction (it's eip because that's where the instruction pointer (aka program counter) was when the fault happened. In this case, the fault happened at eip=0xc0101647.)
You can use the objdump program to disassemble your kernel and see what that instruction is (alternatively, gdb will tell you as well, see the "disass" command.) objdump works like so:
gback@cantaloupe [24](~/pintos/pintos-solution2/src/threads/build) > objdump -d kernel.o | less ... Now type /^c0101647 and 'b' to go back a page until you see all of alloc_frame: c01015de <alloc_frame>: c01015de: 55 push %ebp c01015df: 89 e5 mov %esp,%ebp c01015e1: 83 ec 08 sub $0x8,%esp c01015e4: ff 75 08 pushl 0x8(%ebp) c01015e7: e8 9c fe ff ff call c0101488 <is_thread> c01015ec: 83 c4 04 add $0x4,%esp c01015ef: 0f b6 c0 movzbl %al,%eax c01015f2: 85 c0 test %eax,%eax c01015f4: 75 21 jne c0101617 <alloc_frame+0x39> c01015f6: 83 ec 0c sub $0xc,%esp c01015f9: 68 71 d6 10 c0 push $0xc010d671 c01015fe: 68 4e d5 10 c0 push $0xc010d54e c0101603: 68 94 d7 10 c0 push $0xc010d794 c0101608: 68 78 02 00 00 push $0x278 c010160d: 68 65 d5 10 c0 push $0xc010d565 c0101612: e8 4d 72 00 00 call c0108864 <debug_panic> c0101617: 8b 45 0c mov 0xc(%ebp),%eax c010161a: 83 e0 03 and $0x3,%eax c010161d: 85 c0 test %eax,%eax c010161f: 74 21 je c0101642 <alloc_frame+0x64> c0101621: 83 ec 0c sub $0xc,%esp c0101624: 68 a0 d7 10 c0 push $0xc010d7a0 c0101629: 68 4e d5 10 c0 push $0xc010d54e c010162e: 68 94 d7 10 c0 push $0xc010d794 c0101633: 68 79 02 00 00 push $0x279 c0101638: 68 65 d5 10 c0 push $0xc010d565 c010163d: e8 22 72 00 00 call c0108864 <debug_panic> c0101642: b8 00 00 00 00 mov $0x0,%eax c0101647: c7 00 2a 00 00 00 movl $0x2a,(%eax) c010164d: 8b 45 08 mov 0x8(%ebp),%eax c0101650: 8b 50 18 mov 0x18(%eax),%edx c0101653: 8b 45 0c mov 0xc(%ebp),%eax c0101656: 29 c2 sub %eax,%edx c0101658: 8b 45 08 mov 0x8(%ebp),%eax c010165b: 89 50 18 mov %edx,0x18(%eax) c010165e: 8b 45 08 mov 0x8(%ebp),%eax c0101661: 8b 40 18 mov 0x18(%eax),%eax c0101664: c9 leave c0101665: c3 retThe instruction at 0xc0101647 is shown in red. It attempts to write the value 0x2a to memory at the address contained in %eax. The previous instruction, shown in blue, set %eax to 0, however. Consequently, this was an attempt to dereference a NULL pointer.
You can confirm this by looking at the value of %eax in the dump above (eax=00000000). You also see that 0 was the faulting address, which is stored in cr2 (cr2=00000000). Let's look at thread.c, line 635 and up:
0626 /* Allocates a SIZE-byte frame at the top of thread T's stack and 0627 returns a pointer to the frame's base. */ 0628 static void * 0629 alloc_frame (struct thread *t, size_t size) 0630 { 0631 /* Stack data is always allocated in word-size units. */ 0632 ASSERT (is_thread (t)); 0633 ASSERT (size % sizeof (uint32_t) == 0); 0634 0635 *(void**)NULL = 42; 0636 t->stack -= size; 0637 return t->stack; 0638 }The offending line is shown in red. Indeed, I am trying to dereference a NULL pointer.
Note that if you dereference a NULL pointer, the offending address may not always be 00000000. For instance, if 'p' is a NULL pointer of type struct P *, then an access to p->x will attempt to access memory at offset(struct P, x), which might be 00000004, 00000008, 0000000c etc.
Conversely, if a list_elem pointer is 0, then list_entry might give you a pointer that is below zero (recall that -1 is FFFFFFFF, -2 is FFFFFFFE, and so on.) For instance, if you have:
struct X { int x; struct list_elem elem; }and you use struct X * xp = list_entry(e, struct X, elem); and access xp->x, then the offending address will be fffffffc if e was NULL:
struct list_elem * e = NULL; // a NULL pointer struct X * xp = list_entry(e, struct X, elem); // no memory access here, just pointer arithmetic xp->x; // accesses memory at -4 which is FFFFFFFC
Note that there's nothing special about NULL. It's just an invalid address because of how Pintos maps its memory - most OS make sure that address 00000000 and the addresses around it (typically at least -4096 ... + 4095) are invalid, because it helps detect accesses to NULL pointers. The same scenario would apply to any invalid address. In Pintos Project 1, only addresses between 0xC0000000 (PHYS_BASE) and 0xC0400000 (PHYS_BASE plus 4 MB) are valid, assuming you run Pintos with 4MB of physical memory. All other addresses are invalid and cause page fault exceptions.