[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

5. Project 3: Virtual Memory

By now you should be familiar with the inner workings of Pintos. Your OS can properly handle multiple threads of execution with proper synchronization, and can load multiple user programs at once. However, the number and size of programs that can run is limited by the machine's main memory size. In this assignment, you will remove that limitation.

You will build this assignment on top of the last one. It will benefit you to get your project 2 in good working order before this assignment so those bugs don't keep haunting you. Test programs from the previous project should also work with this project.

You will continue to handle Pintos disks and file systems the same way you did in the previous assignment (see section 4.1.2 Using the File System).


5.1 Background


5.1.1 Source Files

You will work in the vm directory for this project. The vm directory contains only Makefiles. The only change from userprog is that this new Makefile turns on the setting -DVM. All code you write will be in new files or in files introduced in earlier projects.

You will probably be encountering just a few files for the first time:

devices/disk.h
devices/disk.c
Provides access to the physical disk, abstracting away the rather awful IDE interface. You will use this interface to access the swap disk.


5.1.2 Page Faults

For the last assignment, whenever a context switch occurred, the new process installed its own page table into the machine. The new page table contained all the virtual-to-physical translations for the process. Whenever the processor needed to look up a translation, it consulted the page table. As long as the process only accessed memory that it owned, all was well. If the process accessed memory it didn't own, it "page faulted" and page_fault() terminated the process.

When we implement virtual memory, the rules have to change. A page fault is no longer necessarily an error, since it might only indicate that the page must be brought in from a disk file or from swap. You will have to implement a more sophisticated page fault handler to handle these cases.


5.1.2.1 Page Table Structure

On the 80x86, the page table format is fixed by hardware. We have provided code for managing page tables for you to use in userprog/pagedir.c. The functions in there provide an abstract interface to all the page table functionality that you should need to complete the project. However, you may still find it worthwhile to understand a little about the hardware page table format, so we'll go into a little of detail about that in this section.

The top-level paging data structure is a page called the "page directory" (PD) arranged as an array of 1,024 32-bit page directory entries (PDEs), each of which represents 4 MB of virtual memory. Each PDE may point to the physical address of another page called a "page table" (PT) arranged, similarly, as an array of 1,024 32-bit page table entries (PTEs), each of which translates a single 4 kB virtual page to a physical page.

Translation of a virtual address into a physical address follows the three-step process illustrated in the diagram below:(4)

  1. The most-significant 10 bits of the virtual address (bits 22...32) index the page directory. If the PDE is marked "present," the physical address of a page table is read from the PDE thus obtained. If the PDE is marked "not present" then a page fault occurs.

  2. The next 10 bits of the virtual address (bits 12...21) index the page table. If the PTE is marked "present," the physical address of a data page is read from the PTE thus obtained. If the PTE is marked "not present" then a page fault occurs.

  3. The least-significant 12 bits of the virtual address (bits 0...11) are added to the data page's physical base address, yielding the final physical address.

 
 31                  22 21                  12 11                   0
+----------------------+----------------------+----------------------+
| Page Directory Index |   Page Table Index   |    Page Offset       |
+----------------------+----------------------+----------------------+
             |                    |                     |
     _______/             _______/                _____/
    /                    /                       /
   /    Page Directory  /      Page Table       /    Data Page
  /     .____________. /     .____________.    /   .____________.
  |1,023|____________| |1,023|____________|    |   |____________|
  |1,022|____________| |1,022|____________|    |   |____________|
  |1,021|____________| |1,021|____________|    \__\|____________|
  |1,020|____________| |1,020|____________|       /|____________|
  |     |            | |     |            |        |            |
  |     |            | \____\|            |_       |            |
  |     |      .     |      /|      .     | \      |      .     |
  \____\|      .     |_      |      .     |  |     |      .     |
       /|      .     | \     |      .     |  |     |      .     |
        |      .     |  |    |      .     |  |     |      .     |
        |            |  |    |            |  |     |            |
        |____________|  |    |____________|  |     |____________|
       4|____________|  |   4|____________|  |     |____________|
       3|____________|  |   3|____________|  |     |____________|
       2|____________|  |   2|____________|  |     |____________|
       1|____________|  |   1|____________|  |     |____________|
       0|____________|  \__\0|____________|  \____\|____________|
                           /                      /


5.1.2.2 Working with Virtual Addresses

Header threads/mmu.h has useful functions for various operations on virtual addresses. You should look over the header yourself. The most important functions are described below.

Function: uintptr_t pd_no (const void *va)
Returns the page directory index for virtual address va.

Function: uintptr_t pt_no (const void *va)
Returns the page table index for virtual address va.

Function: unsigned pg_ofs (const void *va)
Returns the page offset of virtual address va.

Function: void *pg_round_down (const void *va)
Returns va rounded down to the nearest page boundary, that is, va with its page offset set to 0.

Function: void *pg_round_up (const void *va)
Returns va rounded up to the nearest page boundary.


5.1.2.3 Accessed and Dirty Bits

Most of the page table is under the control of the operating system, but two bits in each page table entry are also manipulated by the CPU. On any read or write to the page referenced by a PTE, the CPU sets the PTE's accessed bit to 1; on any write, the CPU sets the dirty bit to 1. The CPU never resets these bits to 0, but the OS may do so.

You will need to use the accessed and dirty bits in your submission to choose which pages to evict from memory and to decide whether evicted pages need to be written to disk. The page table code in userprog/pagedir.c provides functions for checking and setting these bits. These functions are described at the end of this section.

You need to watch out for aliases, that is, two (or more) different virtual pages that refer to the same physical page frame. When an aliased page is accessed, the accessed and dirty bits are updated in only one page table entry (the one for the virtual address used to access the page). The accessed and dirty bits for the other aliased virtual addresses are not updated.

In Pintos, every user virtual page is aliased to its kernel virtual address. You must manage these aliases somehow. For example, your code could check and update the accessed and dirty bits for both addresses. Alternatively, the kernel could avoid the problem by only accessing user data through the user virtual address.

Other aliases should only arise if you implement sharing, as extra credit (see VM Extra Credit), or as bugs elsewhere in your code.

Function: bool pagedir_is_dirty (uint32_t *pd, const void *vpage)
Function: bool pagedir_is_accessed (uint32_t *pd, const void *vpage)
Returns true if page directory pd contains a page table entry for virtual page vpage that is marked dirty (or accessed). Otherwise, returns false.

Function: void pagedir_set_dirty (uint32_t *pd, const void *vpage, bool value)
Function: void pagedir_set_accessed (uint32_t *pd, const void *vpage, bool value)
If page directory pd has a page table entry for vpage, then its dirty (or accessed) bit is set to value.


5.1.3 Disk as Backing Store

VM systems effectively use memory as a cache for disk. From another perspective, disk is a "backing store" for memory. This provides the abstraction of an (almost) unlimited virtual memory size. You must implement such a system, with the additional constraint that performance should be close to that provided by physical memory. You can use dirty bits to tell whether pages need to be written back to disk when they're evicted from main memory, and the accessed bits for page replacement algorithms (see section 5.1.2.3 Accessed and Dirty Bits).

As with any caching system, performance depends on the policy used to decide what to keep in the cache and what to evict. On a page fault, the kernel must decide which page to replace. Ideally, it will throw out a page that will not be referenced for a long time, keeping in memory those pages that are soon to be referenced. Another consideration is that if the replaced page has been modified, the page must be first written to disk before the needed page can be brought in. Many virtual memory systems avoid this extra overhead by writing modified pages to disk in advance, so that later page faults can be completed more quickly (but you do not have to implement this optimization).


5.1.4 Memory Mapped Files

The file system is most commonly accessed with read and write system calls. A secondary interface is to "map" the file into the virtual address space. The program can then use load and store instructions directly on the file data. An alternative view is to see the file system is as "durable memory": files just store data structures, so if you access ordinary data structures using normal program instructions, why not access durable data structures the same way?

Suppose file foo is 0x1000 bytes (4 kB, or one page) long. If foo is mapped into memory starting at address 0x5000, then any memory accesses to locations 0x5000...0x5fff will access the corresponding bytes of foo.

A consequence of memory mapped files is that address spaces are sparsely populated with lots of segments, one for each memory mapped file (plus one each for code, data, and stack).


5.2 Requirements

This assignment is an open-ended design problem. We are going to say as little as possible about how to do things. Instead we will focus on what functionality we require your OS to support. We will expect you to come up with a design that makes sense. You will have the freedom to choose how to handle page faults, how to organize the swap disk, how to implement paging, etc.


5.2.1 Design Document

Before you turn in your project, you must copy the project 3 design document template into your source tree under the name pintos/src/vm/DESIGNDOC and fill it in. We recommend that you read the design document template before you start working on the project. See section D. Project Documentation, for a sample design document that goes along with a fictitious project.


5.2.2 Page Table Management

Implement page directory and page table management to support virtual memory. You will need data structures to accomplish the following tasks:

The page fault handler, page_fault() in threads/exception.c, needs to do roughly the following:

  1. Locate the page backing the virtual address that faulted. It might be in the file system, in swap, or it might be an invalid virtual address. If you implement sharing, it might even already be in physical memory, but not in the page table.

    If the virtual address is invalid, that is, if there's nothing assigned to go there, or if the virtual address is above PHYS_BASE, meaning that it belongs to the kernel instead of the user, then the process's memory access must be disallowed. In this case, terminate the process and free all of its resources.

  2. If the page is not in physical memory, fetch it by appropriate means. If necessary to make room, first evict some other page from memory. (When you do that you need to first remove references to the page from any page table that refers to it.)

  3. Point the page table entry for the faulting virtual address to the physical page. You can use the functions in userprog/pagedir.c.

You'll need to modify the ELF loader in userprog/process.c to do page table management according to your new design. As supplied, it reads all the process's pages from disk and initializes the page tables for them at the same time. As a first step, you might want to leave the code that reads the pages from disk, but use your new page table management code to construct the page tables only as page faults occur for them.

You should use the palloc_get_page() function to get the page frames that you use for storing user virtual pages. Be sure to pass the PAL_USER flag to this function when you do so, because that allocates pages from a "user pool" separate from the "kernel pool" that other calls to palloc_get_page() make (see Why PAL_USER?).

You might find the Pintos bitmap code, in lib/kernel/bitmap.c and lib/kernel/bitmap.h, useful for tracking pages. A bitmap is an array of bits, each of which can be true or false. Bitmaps are typically used to track usage in a set of (identical) resources: if resource n is in use, then bit n of the bitmap is true.

There are many possible ways to implement virtual memory. The above is simply an outline of our suggested implementation.


5.2.3 Paging To and From Disk

Implement paging to and from files and the swap disk. You may use the disk on interface hd1:1 as the swap disk, using the disk interface prototyped in devices/disk.h. From the vm/build directory, use the command pintos-mkdisk swap.dsk n to create an n MB swap disk named swap.dsk. Afterward, swap.dsk will automatically be attached as hd1:1 when you run pintos. Alternatively, you can tell pintos to use a temporary n-MB swap disk for a single run with --swap-disk=n.

You will need routines to move a page from memory to disk and from disk to memory, where "disk" is either a file or the swap disk. If you do a good job, your VM should still work when you implement your own file system for the next assignment.

To fully handle page faults, you will need a way to track pages that are used by a process but which are not in physical memory. Pages in swap should not be constrained to any particular ordering. You will also need a way to track physical page frames, to find an unused one when needed, or to evict a page when memory is needed but no empty pages are available. The page table data structure that you designed should do most of the work for you.

Implement a global page replacement algorithm. You should be able to use the "accessed" and "dirty" bits (see section 5.1.2.3 Accessed and Dirty Bits) to implement an approximation to LRU. Your algorithm should perform at least as well as the "second chance" or "clock" algorithm.

Your design should allow for parallelism. Multiple processes should be able to process page faults at once. If one page fault require I/O, in the meantime processes that do not fault should continue executing and other page faults that do not require I/O should be able to complete. These criteria require some synchronization effort.


5.2.4 Lazy Loading

Since you will already be paging from disk, you should implement a "lazy" loading scheme for new processes. When a process is created, it will not need all of its resources immediately, so it doesn't make sense to load all its code, data, and stack into memory when the process is created. Instead, bring pages in from the executable only as needed. Use the executable file itself as backing store for read-only segments, since these segments won't change. This means that read-only pages should not be written to swap.

The core of the program loader is the loop in load_segment() in userprog/process.c. Each time around the loop, read_bytes receives the number of bytes to read from the executable file and zero_bytes receives the number of bytes to initialize to zero following the bytes read. The two always sum to PGSIZE (4,096). The handling of a page depends on these variables' values:

Incidentally, if you have trouble handling the third case above, you can eliminate it temporarily by linking the test programs with a special "linker script." Read Makefile.userprog for details. We will not test your submission with this special linker script, so the code you turn in must properly handle all cases.


5.2.5 Stack Growth

Implement stack growth. In project 2, the stack was a single page at the top of the user virtual address space, and programs were limited to that much stack. Now, if the stack grows past its current size, allocate additional pages as necessary.

Allocate additional pages only if they "appear" to be stack accesses. Devise a heuristic that attempts to distinguish stack accesses from other accesses.

User programs are buggy if they write to the stack below the stack pointer, because typical real OSes may interrupt a process at any time to deliver a "signal," which pushes data on the stack.(5) However, the 80x86 PUSH instruction checks access permissions before it adjusts the stack pointer, so it may cause a page fault 4 bytes below the stack pointer. (Otherwise, PUSH would not be restartable in a straightforward fashion.) Similarly, the PUSHA instruction pushes 32 bytes at once, so it can fault 32 bytes below the stack pointer.

You will need to be able to obtain the current value of the user program's stack pointer. Within a system call or a page fault generated by a user program, you can retrieve it from esp member of the struct intr_frame passed to syscall_handler() or page_fault(), respectively. If you verify user pointers before accessing them (see section 4.1.5 Accessing User Memory), these are the only cases you need to handle. On the other hand, if you depend on page faults to detect invalid memory access, you will need to handle another case, where a page fault occurs in the kernel. Reading esp out of the struct intr_frame passed to page_fault() in that case will obtain the kernel stack pointer, not the user stack pointer. You will need to arrange another way, e.g. by saving esp into struct thread on the initial transition from user to kernel mode.

You may impose some absolute limit on stack size, as do most OSes. Some OSes make the limit user-adjustable, e.g. with the ulimit command on many Unix systems. On many GNU/Linux systems, the default limit is 8 MB.

The first stack page need not be allocated lazily. You can initialize it with the command line arguments at load time, with no need to wait for it to be faulted in. (Even if you did wait, the very first instruction in the user program is likely to be one that faults in the page.)


5.2.6 Memory Mapped Files

Implement memory mapped files, including the following system calls.

System Call: mapid_t mmap (int fd, void *addr)
Maps the file open as fd into the process's virtual address space. The entire file is mapped into consecutive virtual pages starting at addr.

If the file's length is not a multiple of PGSIZE, then some bytes in the final mapped page "stick out" beyond the end of the file. Set these bytes to zero when the page is faulted in from disk, and discard them when the page is written back to disk. A partial page need not be lazily loaded, as in the case of a partial page in an executable (see section 5.2.4 Lazy Loading).

If successful, this function returns a "mapping ID" that uniquely identifies the mapping within the process. On failure, it must return -1, which otherwise should not be a valid mapping id, and the process's mappings must be unchanged.

A call to mmap may fail if the file open as fd has a length of zero bytes. It must fail if addr is not page-aligned or if the range of pages mapped overlaps any existing set of mapped pages, including the stack or pages mapped at executable load time. It must also fail if addr is 0, because some Pintos code assumes virtual page 0 is not mapped. Finally, file descriptors 0 and 1, representing console input and output, are not mappable.

Your VM system should use the mmap'd file itself as backing store for the mapping. That is, to evict a page mapped by mmap, write it to the file it was mapped from. (In fact, you may choose to implement executable mappings as special, copy-on-write file mappings.)

System Call: void munmap (mapid_t mapping)
Unmaps the mapping designated by mapping, which must be a mapping ID returned by a previous call to mmap by the same process that has not yet been unmapped.

All mappings are implicitly unmapped when a process exits, whether via exit or by any other means. When a mapping is unmapped, whether implicitly or explicitly, all pages written to by the process are written back to the file, and pages not written must not be. The pages are then removed from the process's list of virtual pages.

Closing or removing a file does not unmap any of its mappings. Once created, a mapping is valid until munmap is called or the process exits, following the Unix convention. See Removing an Open File, for more information.

If two or more processes map the same file, there is no requirement that they see consistent data. Unix handles this by making the two mappings share the same physical page, but the mmap system call also has an argument allowing the client to specify whether the page is shared or private (i.e. copy-on-write).


5.3 FAQ

How much code will I need to write?

Here's a summary of our reference solution, produced by the diffstat program. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion.

This summary is relative to the Pintos base code, but the reference solution for project 3 starts from the reference solution to project 2. See section 4.4 FAQ, for the summary of project 2.

The reference solution represents just one possible solution. Many other solutions are also possible and many of those differ greatly from the reference solution. Some excellent solutions may not modify all the files modified by the reference solution, and some may modify files not modified by the reference solution.

 
 Makefile.build       |    4 
 devices/timer.c      |   42 ++
 threads/init.c       |    5 
 threads/interrupt.c  |    2 
 threads/thread.c     |   31 +
 threads/thread.h     |   37 +-
 userprog/exception.c |   12 
 userprog/pagedir.c   |   10 
 userprog/process.c   |  319 +++++++++++++-----
 userprog/syscall.c   |  545 ++++++++++++++++++++++++++++++-
 userprog/syscall.h   |    1 
 vm/frame.c           |  162 +++++++++
 vm/frame.h           |   23 +
 vm/page.c            |  297 ++++++++++++++++
 vm/page.h            |   50 ++
 vm/swap.c            |   85 ++++
 vm/swap.h            |   11 
 17 files changed, 1532 insertions(+), 104 deletions(-)

Do we need a working Project 2 to implement Project 3?

Yes.

What extra credit is available?

You may implement sharing: when multiple processes are created that use the same executable file, share read-only pages among those processes instead of creating separate copies of read-only segments for each process. If you carefully designed your page table data structures, sharing of read-only pages should not make this part significantly harder.


5.3.1 Page Table and Paging FAQ

Does the virtual memory system need to support data segment growth?

No. The size of the data segment is determined by the linker. We still have no dynamic allocation in Pintos (although it is possible to "fake" it at the user level by using memory-mapped files). Supporting data segment growth should add little additional complexity to a well-designed system.

Why should I use PAL_USER for allocating page frames?

Passing PAL_USER to palloc_get_page() causes it to allocate memory from the user pool, instead of the main kernel pool. Running out of pages in the user pool just causes user programs to page, but running out of pages in the kernel pool will cause many failures because so many kernel functions need to obtain memory. You can layer some other allocator on top of palloc_get_page() if you like, but it should be the underlying mechanism.

Also, you can use the -u option to pintos to limit the size of the user pool, which makes it easy to test your VM implementation with various user memory sizes.

Data pages might need swap space. Can I swap them out at process load?

No. Reading data pages from the executable and writing them to swap immediately at program startup is not demand paging. You need to demand page everything (except partial pages).


5.3.2 Memory Mapped File FAQ

How do we interact with memory-mapped files?

Let's say you want to map a file called foo into your address space at address 0x10000000. You open the file then use mmap:

 
#include <stdio.h>
#include <syscall.h>

int main (void)
{
    void *addr = (void *) 0x10000000;
    int fd = open ("foo");
    mapid_t map = mmap (fd, addr);
    if (map != -1)
        printf ("success!\n");
}

Suppose foo is a text file and you want to print the first 64 bytes on the screen (assuming, of course, that the length of the file is at least 64). Without mmap, you'd need to allocate a buffer, use read to get the data from the file into the buffer, and finally use write to put the buffer out to the display. But with the file mapped into your address space, you can directly address it like so:

 
write (STDOUT_FILENO, addr, 64);

Similarly, if you wanted to replace the first byte of the file, all you need to do is:

 
addr[0] = 'b';

When you're done using the memory-mapped file, you simply unmap it:

 
munmap (map);

The mcp program in src/examples shows how to copy a file using memory-mapped I/O.


[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by CS3204 on February, 2 2006 using texi2html