In this assignment, we give you a minimally functional thread system. Your job is to extend the functionality of this system to gain a better understanding of synchronization problems.
You will be working primarily in the threads directory for this assignment, with some work in the devices directory on the side. Compilation should be done in the threads directory.
Before you read the description of this project, you should read all of the following sections: 1. Introduction, C. Coding Standards, D. Debugging Tools, and E. Development Tools. You should at least skim the material from A.1 Loading through A.6 Memory Allocation, especially A.4 Synchronization. To complete this project you will also need to read B. Completely Fair Scheduler.
The first step is to read and understand the code for the initial thread system. Pintos already implements thread creation and thread completion, a simple scheduler to switch between threads, and synchronization primitives (semaphores, locks, condition variables, and optimization barriers).
Some of this code might seem slightly mysterious. If you haven't already compiled and run the base system, as described in the introduction (see section 1. Introduction), you should do so now. You can read through parts of the source code to see what's going on. If you like, you can add calls to printf() almost anywhere, then recompile and run to see what happens and in what order. You can also run the kernel in a debugger and set breakpoints at interesting spots, single-step through code and examine data, and so on.
When a thread is created, you are creating a new context to be scheduled. You provide a function to be run in this context as an argument to thread_create(). The first time the thread is scheduled and runs, it starts from the beginning of that function and executes in that context. When the function returns, the thread terminates. Each thread, therefore, acts like a mini-program running inside Pintos, with the function passed to thread_create() acting like main().
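For example, a thread whose body prints a message and then exits could be created as in the following sketch. The call shown uses the thread_create() signature and PRI_DEFAULT constant from the base Pintos sources; check threads/thread.h for the exact signature in this tree, since the scheduler project may alter the priority argument.

#include <debug.h>
#include <stdio.h>
#include "threads/thread.h"

/* Runs in a fresh context; when it returns, the thread terminates. */
static void
hello_thread (void *aux UNUSED)
{
  printf ("hello from thread %s\n", thread_name ());
}

/* Somewhere during kernel initialization: */
thread_create ("hello", PRI_DEFAULT, hello_thread, NULL);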
At any given time, exactly one thread runs on each CPU. The remaining threads, if any, become inactive. The scheduler decides which thread to run next on a CPU. (If no thread is ready to run at any given time, then the special "idle" thread, implemented in idle(), runs.)
Synchronization primitives can force context switches when one
thread needs to wait for another thread to do something.
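For instance, a semaphore initialized to 0 lets one thread block until another signals it. A minimal sketch using the primitives declared in threads/synch.h:

#include "threads/synch.h"

static struct semaphore io_done;

void
io_init (void)
{
  sema_init (&io_done, 0);      /* 0: nothing available yet */
}

void
io_wait (void)                  /* called by the waiting thread */
{
  sema_down (&io_done);         /* blocks; scheduler switches to another thread */
}

void
io_signal (void)                /* called by the thread that finishes the work */
{
  sema_up (&io_done);           /* unblocks the waiter */
}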
The mechanics of a context switch are in threads/switch.S, which is 80x86 assembly code. (You don't have to understand it.) It saves the state of the currently running thread and restores the state of the thread we're switching to.
Using the GDB debugger, slowly trace through a context switch to see what happens (see section D.5 GDB). You can set a breakpoint on schedule() to start out, and then single-step from there.(1) Be sure to keep track of each thread's address and state, and what procedures are on the call stack for each thread. You will notice that when one thread calls switch_threads(), another thread starts running, and the first thing the new thread does is to return from switch_threads(). You will understand the thread system once you understand why and how the switch_threads() that gets called is different from the switch_threads() that returns. See section A.3.3 Thread Switching, for more information.
Warning: In Pintos, each thread is assigned a small, fixed-size execution stack just under 4 kB in size. The kernel tries to detect stack overflow, but it cannot do so perfectly. You may cause bizarre problems, such as mysterious kernel panics, if you declare large data structures as non-static local variables, e.g. int buf[1000];. Alternatives to stack allocation include the page allocator and the block allocator (see section A.6 Memory Allocation).
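For instance, a large scratch buffer can come from the page allocator instead of the stack. A sketch using the interface in threads/palloc.h:

#include <debug.h>
#include "threads/palloc.h"

void
scratch_example (void)
{
  int *buf = palloc_get_page (PAL_ZERO);   /* one zeroed 4 kB page: room for 1024 ints */
  if (buf == NULL)
    PANIC ("couldn't allocate scratch page");
  /* ... use buf ... */
  palloc_free_page (buf);
}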
In an operating system that supports multiple CPUs, such as this version of Pintos, each CPU must be independently managed. For instance, the OS needs to keep track of which threads are currently assigned to that CPU, and which thread is currently running on that CPU (or whether that CPU's idle thread is running).
Per-CPU information is also used for interrupt management, as multiple CPUs may each be handling interrupts.
See section A.2 Struct CPU for full details of how a CPU is represented in Pintos.
Here is a brief overview of the files in the threads
directory. You will not need to modify most of this code, but the
hope is that presenting this overview will give you a start on what
code to look at.
loader.S
loader.h
The kernel loader: loads the kernel into memory and jumps to start() in start.S. See section A.1.1 The Loader, for details. You should not need to look at this code or modify it.
start.S
Startup code that jumps to main().
startother.S
Entry point for the non-boot processors (APs), which jumps to mpenter().
kernel.lds.S
The linker script used to link the kernel, which arranges for start.S to be near the beginning of the kernel image. See section A.1.1 The Loader, for details. Again, you should not need to look at this code or modify it, but it's here in case you're curious.
init.c
init.h
Kernel initialization, including main(), the kernel's "main program." You should look over main() at least to see what gets initialized. You might want to add your own initialization code here. See section A.1.3 High-Level Kernel Initialization, for details.
thread.c
thread.h
thread.h defines struct thread, which you are likely to modify in all four projects. See A.3.1 struct thread and A.3 Threads for more information.
switch.S
switch.h
palloc.c
palloc.h
malloc.c
malloc.h
A simple implementation of malloc() and free() for the kernel. See section A.6.2 Block Allocator, for more information.
mp.c
mp.h
interrupt.c
interrupt.h
intr-stubs.S
intr-stubs.h
spinlock.c
spinlock.h
synch.c
synch.h
gdt.c
gdt.h
tss.c
tss.h
io.h
Functions for I/O port access. This is mostly used by source code in the devices directory that you won't have to touch.
vaddr.h
pte.h
flags.h
devices code
The basic threaded kernel also includes these files in the devices directory:
timer.c
timer.h
vga.c
vga.h
printf() calls into the VGA display driver for you, so there's little reason to call this code yourself.
serial.c
serial.h
printf() calls this code for you, so you don't need to do so yourself. It handles serial input by passing it to the input layer (see below).
block.c
block.h
ide.c
ide.h
partition.c
partition.h
kbd.c
kbd.h
input.c
input.h
intq.c
intq.h
rtc.c
rtc.h
Used by thread/init.c to choose an initial seed for the random number generator.
speaker.c
speaker.h
pit.c
pit.h
Used by both devices/timer.c and devices/speaker.c because each device uses one of the PIT's output channels.
ioapic.c
ioapic.h
ioapicenable() is called by several device drivers during initialization. All I/O drivers in Pintos route I/O interrupts to CPU0.
lapic.c
lapic.h
lib files
Finally, lib and lib/kernel contain useful library routines. (lib/user will be used by user programs, starting in project 2, but it is not part of the kernel.) Here are a few more details:
ctype.h
inttypes.h
limits.h
stdarg.h
stdbool.h
stddef.h
stdint.h
stdio.c
stdio.h
stdlib.c
stdlib.h
string.c
string.h
debug.c
debug.h
random.c
random.h
Pseudo-random number generator. The actual sequence of random values will not vary from one Pintos run to another unless you do one of three things: specify a new random seed value on the -rs kernel command-line option on each run, or use a simulator other than Bochs, or specify the -r option to pintos.
atomic-ops.c
atomic-ops.h
round.h
syscall-nr.h
kernel/list.c
kernel/list.h
kernel/bitmap.c
kernel/bitmap.h
kernel/hash.c
kernel/hash.h
kernel/console.c
kernel/console.h
kernel/stdio.h
Implements printf() and a few other functions.
Proper synchronization is an important part of the solutions to these problems. We strongly recommend that you first read the tour section on synchronization (see section A.4 Synchronization) or the comments in threads/synch.c to learn what synchronization constructs Pintos provides and which to use for what situations. In particular, it is important to know when a spinlock should be acquired as opposed to a lock (and vice versa).
Disabling interrupts as a synchronization technique would work on a uniprocessor system, but not on Pintos: if a thread on one CPU disables interrupts, it disables them only on that CPU, providing no synchronization with threads running on other CPUs.
Yet, to prevent the current thread from being preempted on its CPU, spinlocks do disable interrupts during the entire period they are held. This can have performance implications, so you should hold spinlocks, and thus disable interrupts, for as little code as possible; otherwise you can end up losing important things such as timer ticks or input events. Turning off interrupts for any reason also increases interrupt handling latency, which can make a machine feel sluggish if taken too far.
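As a sketch, a short critical section under a spinlock might look like this. The spinlock_init()/spinlock_acquire()/spinlock_release() names are assumptions based on the threads/spinlock.h module listed above; check that header for the exact interface in your tree.

#include "threads/spinlock.h"

static struct spinlock counter_lock;   /* initialized once with spinlock_init() */
static int event_count;

void
count_event (void)
{
  /* Interrupts are off from acquire to release, so keep this short. */
  spinlock_acquire (&counter_lock);
  event_count++;
  spinlock_release (&counter_lock);
}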
Disabling interrupts can be useful for debugging when running Pintos on a single CPU, if you want to make sure that a section of code is not interrupted. You should remove debugging code before turning in your project. (Don't just comment it out, because that can make the code difficult to read.)
There should be no busy waiting in your submission. A tight loop that calls thread_yield() is one form of busy waiting.
In the past, many groups divided the assignment into pieces, then each group member worked on his or her piece until just before the deadline, at which time the group reconvened to combine their code and submit. This is a bad idea; we do not recommend it. Groups that do this often find that two changes conflict with each other, requiring lots of last-minute debugging. Some groups who have done this have turned in code that did not even compile or boot, much less pass any tests.
Instead, we recommend integrating your team's changes early and often, using the git source code control system (see section E.3 git). These systems also make it possible to review changes and, when a change introduces a bug, drop back to working versions of code.
You can decide which model to use: either a shared repository model, in which team partners share access to an upstream repository kept on git.cs.vt.edu (which in turn is forked from the provided pintos-2017 repository), or a pull-request-based model that gives other team members a chance to review changes first.
You should expect to run into bugs that you simply don't understand while working on this and subsequent projects. When you do, reread the appendix on debugging tools, which is filled with useful debugging tips that should help you to get back up to speed (see section D. Debugging Tools). Be sure to read the section on backtraces (see section D.4 Backtraces), which will help you to get the most out of every kernel panic or assertion failure.
Before you turn in your project, you must copy the project 1 design document template into your source tree under the name pintos/src/threads/DESIGNDOC and fill it in. We recommend that you read the design document template before you start working on the project.
To start, we ask that you implement a simple timer facility. Timers are frequently used by operating systems for many tasks: device drivers, networking code, or letting processes wait for some time.
Reimplement timer_sleep(), defined in devices/timer.c. Although a working implementation is provided, it "busy waits," that is, it spins in a loop checking the current time and calling thread_yield() until enough time has gone by. Reimplement it to avoid busy waiting.
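The provided implementation looks roughly like this (see devices/timer.c for the exact code):

/* Busy-waiting version: spins on the CPU until enough ticks pass. */
void
timer_sleep (int64_t ticks)
{
  int64_t start = timer_ticks ();
  while (timer_elapsed (start) < ticks)
    thread_yield ();    /* still busy waiting: the thread stays runnable */
}

Your version should instead block the thread and wake it only once the right number of ticks has elapsed.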
timer_sleep() is useful for threads that operate in real time, e.g. for blinking the cursor once per second.
The argument to timer_sleep() is expressed in timer ticks, not in milliseconds or any other unit. There are TIMER_FREQ timer ticks per second, where TIMER_FREQ is a macro defined in devices/timer.h.
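For example, to sleep for roughly half a second you would convert from wall-clock time to ticks first (a sketch; in practice, timer_msleep() below does such conversions for you):

#include "devices/timer.h"

int64_t half_second = TIMER_FREQ / 2;   /* TIMER_FREQ ticks == 1 second */
timer_sleep (half_second);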
Separate functions timer_msleep(), timer_usleep(), and timer_nsleep() do exist for sleeping a specific number of milliseconds, microseconds, or nanoseconds, respectively, but these will call timer_sleep() automatically when necessary. You do not need to modify them.
The alarm clock implementation is not needed for later projects, although it could be useful for project 4.
Scheduling is a domain full of trade-offs in which many different algorithms have been developed, tested, and tuned over the years.
Pintos as provided comes with a simple scheduler implementation that manages each CPU's ready queue separately. Threads are assigned to a CPU upon creation and will never migrate between CPUs. The scheduler pursues a simple round-robin policy: when a thread's time slice expires, it is moved to the end of the ready queue and whichever thread is at the front is scheduled. The length of a time slice is the same for all threads.
Clearly, this simple policy lacks sophistication. Therefore, in this project, we ask that you implement a simplified version of the so-called CFS ("Completely Fair Scheduler") scheduler used in the Linux kernel since about 2009.
This scheduler pursues the following goals: providing fairness among threads, keeping scheduling overhead low, and reacting to the different scheduling needs of different threads.
Often, these goals conflict with each other: generally, providing fairness can increase scheduling overhead, and reacting to different scheduling needs may adversely impact fairness.
For this part of the project, you will be working primarily in threads/scheduler.c.
See section B. Completely Fair Scheduler, for detailed requirements.
Many scheduling decisions in CFS depend on how much CPU time a thread has received. Your scheduler must calculate this by recording when a thread starts and when it stops using the CPU. You will find the function timer_gettime() useful.
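One possible shape for this bookkeeping, purely as a sketch: the cpu_consumed and last_scheduled fields below are hypothetical additions to struct thread, not part of the provided sources.

/* Called when t is put on a CPU. */
void
record_schedule_in (struct thread *t)
{
  t->last_scheduled = timer_gettime ();
}

/* Called when t gives up the CPU; accumulates time actually used. */
void
record_schedule_out (struct thread *t)
{
  t->cpu_consumed += timer_gettime () - t->last_scheduled;
}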
The fair scheduler is not strictly required for any later project, but should be useful.
A work-conserving scheduler tries to keep available CPUs busy when there are ready processes to run, a goal pursued by most widely used operating systems.
One of the simplest ways to do this is to keep the threads in a global queue that is shared by all CPUs. An advantage to this approach is that it ensures that no CPU is idle while threads are ready to run (but are not currently running). However, using a global queue has two main weaknesses.
The first weakness is lack of scalability. The global queue must be locked while choosing the next thread to run, and locking greatly reduces performance as the number of CPUs grows: each CPU spends more and more time contending for the global queue lock and less time actually running threads.
The second weakness is loss of processor affinity. A thread can build up a fair amount of state in the caches and TLB of the CPU it runs on. It is advantageous to try to run it on the same CPU each time, as it will run faster than if it ran on a different CPU where its data is far less likely to be stored in the CPU cache. A global queue in which threads are equally likely to be chosen by any CPU may not preserve processor affinity.
Because of the weaknesses described above, many operating systems, including Pintos, use per-CPU queues. Each CPU manages only the threads on its own queue, independent of the other CPUs, thereby avoiding the scalability problem outlined above and improving processor affinity.
Using separate, per-CPU ready queues, however, has a potential drawback: it may lead to load imbalance between CPUs, which in turn can affect fairness and the ability to use all CPUs fully. For instance, if CPU0 manages thread A, and CPU1 manages two threads B and C, then A has exclusive access to CPU0, while B and C take turns being scheduled on CPU1. A is then given twice as much CPU time as B or C. Even worse, imagine that thread A finishes: CPU0 would then be idle, while CPU1 is still shared between threads B and C. Load balancing avoids this problem by providing mechanisms and policies to migrate threads between CPUs so that each CPU is shared between approximately the same number of threads.
Implement load balancing in Pintos.
In Pintos, when a thread is created, it is assigned a CPU in a round-robin fashion and initially added to that CPU's ready queue. Although one could imagine better policies for initial placement, for the purposes of this project we require that you DO NOT change this, as it is an assumption made by the load balance tests.
As shown by the example above, this simple placement policy does not guarantee that CPUs will be balanced because the threads initially placed on one CPU may finish faster than those placed on another, causing the former to become idle.
Thus, a good load balancing strategy should pursue the following goals:
In this assignment, we ask that you implement the load balancing policy used by the CFS scheduler, which uses a load metric that is specific to it.
You should create a function called load_balance() and call it from appropriate places. It is up to you how frequently you call load_balance(); at the very least, load balancing must be performed inside the idle loop, so that an idling CPU does not miss available threads in other CPUs' ready queues.
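A possible skeleton, purely as a sketch: find_busiest_cpu() and migrate_threads() are hypothetical helper names, and your load metric and locking scheme will determine the real details.

void
load_balance (void)
{
  struct cpu *self = get_cpu ();
  struct cpu *busiest = find_busiest_cpu ();     /* hypothetical helper */

  if (busiest == NULL || busiest == self)
    return;

  /* Lock both ready queues in a fixed global order (e.g., by CPU
     index) so that two CPUs balancing at once cannot deadlock,
     then migrate threads until the loads are roughly equal. */
  migrate_threads (busiest, self);               /* hypothetical helper */
}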
In the presence of load balancing, care must be taken whenever accessing the data structure representing the current CPU. You must avoid a scenario where a thread reads the current CPU value (via get_cpu () or by accessing thread_current ()->cpu) and has been migrated by the load balancer to another CPU by the time it is ready to use that value. The easiest way to do that is to prevent preemption of that thread on its CPU, which is accomplished by disabling interrupts. See lock_own_ready_queue () in threads/thread.c for an example.
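In code, the pattern looks like this (a sketch using the interrupt interface from threads/interrupt.h):

#include "threads/interrupt.h"

void
use_current_cpu (void)
{
  enum intr_level old_level = intr_disable ();
  struct cpu *c = get_cpu ();   /* safe: the thread cannot migrate while interrupts are off */
  /* ... use c ... */
  intr_set_level (old_level);
}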
Testing the correct behavior of a scheduler can be tricky. On the one hand, tests need to verify that the desired policy is implemented correctly, which tends to favor a unit-test based approach. On the other hand, the scheduler implementation must work in an actual kernel environment to schedule a workload of real threads.
For this project, we pursue a dual approach to testing that includes both simulation and actual execution. The simulator framework is built into the Pintos kernel, ensuring no changes are needed to your scheduler for testing.
Simulated Tests. The majority of CFS tests are performed under the simulated scheduler. In these tests, we do not actually create or schedule any real threads. Instead, the scheduler simulator simulates how threads would be scheduled under your scheduler. The simulator framework asks your scheduler which scheduling decisions to make at which points, but it does not actually switch between threads. Instead, it verifies that the correct thread is selected to run at the correct time, based on the algorithm. As such, it is able to create a wide variety of scheduling scenarios and check whether your scheduler makes the correct decisions.
See tests/threads/cfstest.c and tests/threads/simulator.c. The tests are set up and torn down by the functions cfstest_set_up() and cfstest_tear_down(), defined in tests/threads/cfstest.c.
The simulator sets up a "fake CPU" that does not represent an actual
CPU on the hardware, but rather a virtual environment where the simulator
can create threads, execute timer interrupts, etc., without affecting the
system. During setup, change_cpu()
is called so that the CPU local
variable cpu
points to the fake CPU. After that, all OS events are
directed towards the simulated CPU, causing your scheduler to be invoked
in the process. The real CPU is restored at the end of the test.
During simulated testing, interrupts are disabled, so no real timer interrupts will arrive. Timer interrupts are simulated by setting the system time via timer_settime() and then executing driver_interrupt_tick(), which in turn invokes driver_tick(). These functions are almost identical to timer_interrupt() in devices/timer.c and thread_tick() in thread.c, respectively.
At the beginning of each test, the system time is set to 0, so any time spent prior to the test does not affect the test. Each test defines a set of OS events that arrive after a certain amount of time. Each OS event is a scheduling event that will invoke your scheduler. At the end of each event, the test checks that the thread that your scheduler would run at the end of the event is the correct one. The real time is restored at the end of the test.
Restrictions. While in simulated testing mode, your scheduler code is exercised very similarly to how it is exercised during actual operation. However, you must be careful not to call functions that assume that the machine is operating on the real CPU. These include most functions in threads/thread.c, including thread_current() and thread_yield(), as well as any functions that may call them transitively. In addition, since the simulator replaces most functions in threads/thread.c with its own while operating, changes you make to functions in that file will not be used during simulation. As a concrete example, you cannot update scheduler values such as ideal_vruntime in thread_set_nice(). See section B. Completely Fair Scheduler. We hope to lift these restrictions in future versions of Pintos.
Note that your scheduler's sched_init() function will be called for every ready queue on which it operates: that is, once for each (real) CPU found on the system, and once for the simulated CPU used during testing. Furthermore, to successfully run the tests, it needs to support real operation before cfstest_set_up() and after cfstest_tear_down().
Real Workload Tests. The alarm, cfs-run, and balance tests do not run the simulator, but rather schedule real threads doing work under your scheduler to ensure that your scheduler works under real conditions.
Since both the driver and thread.c call into the same module, threads/scheduler.c, you should not have to make any special changes to make Pintos invoke your scheduler, provided that you do not remove any of its exported functions.
The real workload tests take significantly longer to run than the simulated ones.
How do I update the Makefiles when I add a new source file?
To add a .c file, edit the top-level Makefile.build. Add the new file to the variable dir_SRC, where dir is the directory where you added the file. For this project, that means you should add it to threads_SRC or devices_SRC. Then run make. If your new file doesn't get compiled, run make clean and then try again.
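For example, if you added a hypothetical threads/alarm.c, the change to Makefile.build would be a single line:

threads_SRC += threads/alarm.c    # hypothetical new file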
When you modify the top-level Makefile.build and re-run make, the modified version should be automatically copied to threads/build/Makefile. The converse is not true, so any changes made to threads/build/Makefile will be lost the next time you run make clean from the threads directory. Unless your changes are truly temporary, you should prefer to edit Makefile.build.
A new .h file does not require editing the Makefiles.
What does warning: no previous prototype for `func' mean?
It means that you defined a non-static function without preceding it by a prototype. Because non-static functions are intended for use by other .c files, for safety they should be prototyped in a header file included before their definition. To fix the problem, add a prototype in a header file that you include, or, if the function isn't actually used by other .c files, make it static.
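For example (frob.h and frob() are hypothetical names):

/* In frob.h, which frob.c includes: */
void frob (int *data, size_t n);

/* Or, if frob() is used only inside one .c file: */
static void
frob (int *data, size_t n)
{
  /* ... */
}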
Timer interrupts occur TIMER_FREQ times per second, where TIMER_FREQ is set in devices/timer.h; the default is 1000 Hz. We do not recommend changing it, since doing so may cause some of the tests to fail.
There are TIME_SLICE ticks per time slice; this macro is declared in threads/thread.c, and the default is 4 ticks. However, in Project 1 you will change the scheduler to dynamically calculate an ideal time slice, expressed in nanoseconds rather than ticks.
See section 1.2.1 Testing.
Why do I get a test failure in pass()?
You are probably looking at a backtrace that looks something like this:
0xc0108810: debug_panic (lib/kernel/debug.c:32)
0xc010a99f: pass (tests/threads/tests.c:93)
0xc010bdd3: test_mlfqs_load_1 (...threads/mlfqs-load-1.c:33)
0xc010a8cf: run_test (tests/threads/tests.c:51)
0xc0100452: run_task (threads/init.c:283)
0xc0100536: run_actions (threads/init.c:333)
0xc01000bb: main (threads/init.c:137)
This is just confusing output from the backtrace program. It does not actually mean that pass() called debug_panic(). In fact, fail() called debug_panic() (via the PANIC() macro). GCC knows that debug_panic() does not return, because it is declared NO_RETURN (see section D.3 Function and Parameter Attributes), so it doesn't include any code in fail() to take control when debug_panic() returns. This means that the return address on the stack looks like it is at the beginning of the function that happens to follow fail() in memory, which in this case happens to be pass().
See section D.4 Backtraces, for more information.
Don't worry about the possibility of timer values overflowing. Timer values are expressed as signed 64-bit numbers, which at 100 ticks per second should be good for almost 2,924,712,087 years. By then, we expect Pintos to have been phased out of the CS 4284 curriculum.
Linux's implementation of CFS uses a red/black tree to implement insertion and retrieval in O(log n) time. For the purposes of this project, it is acceptable for these operations to be performed in O(n) time.
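For instance, an O(n) ordered ready list can be built with list_insert_ordered() from lib/kernel/list.h. A sketch follows; the vruntime field is a hypothetical addition to struct thread, not part of the provided sources.

#include <debug.h>
#include <list.h>
#include "threads/thread.h"

/* Orders threads by increasing (hypothetical) vruntime. */
static bool
vruntime_less (const struct list_elem *a, const struct list_elem *b,
               void *aux UNUSED)
{
  const struct thread *x = list_entry (a, struct thread, elem);
  const struct thread *y = list_entry (b, struct thread, elem);
  return x->vruntime < y->vruntime;
}

/* O(n) insertion; list_pop_front() then retrieves the minimum in O(1). */
list_insert_ordered (&ready_list, &t->elem, vruntime_less, NULL);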
If your implementation mysteriously fails some of the advanced scheduler tests, try the following:
You should think about how your policy preserves the potential for threads to retain processor affinity.
Race conditions are, by nature, not guaranteed to occur. The goal of the test is to fail with high probability if race conditions are present. We designed them by identifying the critical sections that you will have to protect with synchronization, and entering the critical sections enough times that it is likely two threads will try to enter at the same time, either because a timer interrupt preempted the first thread or because they are running on different CPUs. The critical sections are rather small, so the tests have to be repeated, which leads to high execution time.
You can try speeding the tests up by enabling KVM if it is available to you, but this is not guaranteed to help: the speedup provided by KVM may make the already-small critical sections even smaller, so a failure may not occur even if race conditions are present. Remember that timer interrupts still arrive at the same time intervals, despite the code running much faster. Instead, we recommend writing a script that runs the tests many times (and in parallel, by using tmux for example) and saves the output in case of a kernel panic.