CS3214 Computer Systems

Why and how should we use valgrind in CS3214, particularly in p1, and how do we interpret its error messages?

valgrind is an instrumentation framework for building dynamic analysis tools. Out of the box, it comes with a number of tools. If you run valgrind programname args... you are using the memcheck tool.

The memcheck tool performs many different checks, which roughly fall into 2 groups:

It checks for situations in which a buggy program encountered undefined behavior as described in the C standard. These include illegal read and write errors, the use of uninitialized values (or uninitialised as valgrind calls it), use after free errors, overruns of dynamically allocated heap buffers, and the like.
It detects certain kinds of memory leaks upon program exit, particularly when run with --leak-check=full.

You should use valgrind to identify and fix any and all bugs in the first group. No ifs, ands, or buts. C programs that exhibit undefined behavior can compute, not compute, fail, or not fail, in any way they please. The compiler and runtime system not only make no guarantees about such programs; in practice it can feel as though they do their absolute best to deliver the most unexpected result possible and make your life hell.

❗ ❗ ❗ Therefore, if you encounter one of those errors that are flagged during execution, you need to fix it before continuing to debug. This means, in practice, that you should be doing most of your initial testing (if not all of it) running valgrind ./cush. ❗ ❗ ❗

The remainder of this note focuses on the second group, memory leaks, whose discussion is more nuanced. Generally speaking, a memory leak occurs when a program allocates a region of memory but then at some point stops accessing this region of memory, yet fails to deallocate it. We refer to such regions as "objects" that represent blocks of memory managed by a memory allocator.

Answering the question whether a program will use allocated memory in the future is, in general, undecidable - it would require predicting all possible paths a program can take. However, if a program failed to store a pointer to the allocated area in a location to which it still has access (such as a global variable, or local variable, or a heap object that is directly or indirectly reachable from a global or local variable), then we can say with certainty that the program will not access this region in the future. (In a garbage collected language, such objects would then be reclaimed by the garbage collector.)

There is a second important aspect, which is that when a process exits, the OS will reclaim all of its resources, which includes all of its virtual memory, including all regions where dynamically allocated objects are located. This is true for all commonly used process-based environments, including Linux, Windows, etc., despite the fact that there may be some ambiguity about this in the POSIX specification and that different rules may apply in certain embedded systems.

Therefore, reclaiming (as in calling free() on dynamically allocated memory objects) directly before calling exit() is generally not done and can also be detrimental to locality and performance.

valgrind's leak checker runs when a process exits. At this point, it performs an analysis to check which allocated objects are still reachable from a local or global variable.
If an object is not reachable, then if the program had continued to run, it would be impossible for the program to free this object. valgrind reports these as definitely lost or indirectly lost.

You should address those leaks - they represent true defects. If your program continued to run, such leaks could accumulate over time and cause an increase in the program's virtual memory size, which in turn can have a number of adverse effects.

If valgrind finds objects that are still reachable when a program exits, then one of two cases applies. In the first case, the objects would be freed if the program continued to execute. An example in CS3214 is the ast* objects allocated by the shell's parser, which your shell should deallocate when a job exits while the shell continues to run. You therefore do not need to address such leak reports.

Many libraries also perform one-time allocations that are not deallocated before exit, particularly if the library does not provide a deinitialization function. You also do not need to address such leak reports.

The other case is that even though the object is still reachable, it actually will not be accessed in the future. For instance, a program might store objects in a hash table, global list, or similar container even though there is no path on which they would ever be retrieved. valgrind also reports such blocks as 'still reachable,' but unlike in the first case, such leaks should be addressed. (This is the type of leak that also occurs in languages such as Java that provide automatic memory management.) Identifying such leaks, and distinguishing them from the former case, requires a case-by-case analysis.

In some CS3214 assignments, we will use valgrind's memory leak checking facilities to ensure that your code deallocates all allocated memory. In these situations, you must ensure that all memory your code allocates is deallocated by the time the program exits, so that any object valgrind still flags is one your code should have deallocated.