3. Using and understanding the Valgrind core: Advanced Topics

			Valgrind User Manual

Table of Contents

3.2.1. A Simple Example
3.2.2. Wrapping Specifications
3.2.3. Wrapping Semantics
3.2.4. Debugging
3.2.5. Limitations - control flow
3.2.6. Limitations - original function signatures
3.2.7. Examples

This chapter describes advanced aspects of the Valgrind core services, which are mostly of interest to power users who wish to customise and modify Valgrind's default behaviours in certain useful ways. The subjects covered are:

The "Client Request" mechanism
Function Wrapping

3.1. The Client Request mechanism

Valgrind has a trapdoor mechanism via which the client program can pass all manner of requests and queries to Valgrind and the current tool. Internally, this is used extensively to make malloc, free, etc, work, although you don't see that.

For your convenience, a subset of these so-called client requests is provided to allow you to tell Valgrind facts about the behaviour of your program, and also to make queries. In particular, your program can tell Valgrind about changes in memory range permissions that Valgrind would not otherwise know about, and so allows clients to get Valgrind to do arbitrary custom checks.

Clients need to include a header file to make this work. Which header file depends on which client requests you use. Some client requests are handled by the core, and are defined in the header file valgrind/valgrind.h. Tool-specific header files are named after the tool, e.g. valgrind/memcheck.h. All header files can be found in the include/valgrind directory of wherever Valgrind was installed.

The macros in these header files have the magical property that they generate code in-line which Valgrind can spot. However, the code does nothing when not run on Valgrind, so you are not forced to run your program under Valgrind just because you use the macros in this file. Also, you are not required to link your program with any extra supporting libraries.

The code added to your binary has negligible performance impact: on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions and is probably undetectable except in tight loops. However, if you really wish to compile out the client requests, you can compile with -DNVALGRIND (analogous to -DNDEBUG's effect on assert()).

You are encouraged to copy the valgrind/*.h headers into your project's include directory, so your program doesn't have a compile-time dependency on Valgrind being installed. The Valgrind headers, unlike most of the rest of the code, are under a BSD-style license so you may include them without worrying about license incompatibility.

Here is a brief description of the macros available in valgrind.h, which work with more than one tool (see the tool-specific documentation for explanations of the tool-specific macros).

RUNNING_ON_VALGRIND:

Returns 1 if running on Valgrind, 0 if running on the real CPU. If you are running Valgrind on itself, returns the number of layers of Valgrind emulation you're running on.

VALGRIND_DISCARD_TRANSLATIONS:

Discards translations of code in the specified address range. Useful if you are debugging a JIT compiler or some other dynamic code generation system. After this call, attempts to execute code in the invalidated address range will cause Valgrind to make new translations of that code, which is probably the semantics you want. Note that code invalidations are expensive because finding all the relevant translations quickly is very difficult. So try not to call it often. Note that you can be clever about this: you only need to call it when an area which previously contained code is overwritten with new code. You can choose to write code into fresh memory, and just call this occasionally to discard large chunks of old code all at once.

Alternatively, for transparent self-modifying-code support, use--smc-check=all, or run on ppc32/Linux or ppc64/Linux.

VALGRIND_COUNT_ERRORS:

Returns the number of errors found so far by Valgrind. Can be useful in test harness code when combined with the --log-fd=-1 option; this runs Valgrind silently, but the client program can detect when errors occur. Only useful for tools that report errors, e.g. it's useful for Memcheck, but for Cachegrind it will always return zero because Cachegrind doesn't report errors.

VALGRIND_MALLOCLIKE_BLOCK:

If your program manages its own memory instead of using the standard malloc() / new / new[], tools that track information about heap blocks will not do nearly as good a job. For example, Memcheck won't detect nearly as many errors, and the error messages won't be as informative. To improve this situation, use this macro just after your custom allocator allocates some new memory. See the comments in valgrind.h for information on how to use it.

VALGRIND_FREELIKE_BLOCK:

This should be used in conjunction with VALGRIND_MALLOCLIKE_BLOCK. Again, see memcheck/memcheck.h for information on how to use it.

VALGRIND_CREATE_MEMPOOL:

This is similar to VALGRIND_MALLOCLIKE_BLOCK, but is tailored towards code that uses memory pools. See the comments in valgrind.h for information on how to use it.

VALGRIND_DESTROY_MEMPOOL:

This should be used in conjunction with VALGRIND_CREATE_MEMPOOL. Again, see the comments in valgrind.h for information on how to use it.

VALGRIND_MEMPOOL_ALLOC:

This should be used in conjunction with VALGRIND_CREATE_MEMPOOL. Again, see the comments in valgrind.h for information on how to use it.

VALGRIND_MEMPOOL_FREE:

This should be used in conjunction with VALGRIND_CREATE_MEMPOOL. Again, see the comments in valgrind.h for information on how to use it.

VALGRIND_NON_SIMD_CALL[0123]:

Executes a function of 0, 1, 2 or 3 args in the client program on the real CPU, not the virtual CPU that Valgrind normally runs code on. These are used in various ways internally to Valgrind. They might be useful to client programs.

Warning: Only use these if you really know what you are doing. They aren't entirely reliable, and can cause Valgrind to crash. Generally, your prospects of these working are made higher if the called function does not refer to any global variables, and does not refer to any libc or other functions (printf et al). Any kind of entanglement with libc or dynamic linking is likely to have a bad outcome, for tricky reasons which we've grappled with a lot in the past.

VALGRIND_PRINTF(format, ...):

printf a message to the log file when running under Valgrind. Nothing is output if not running under Valgrind. Returns the number of characters output.

VALGRIND_PRINTF_BACKTRACE(format, ...):

printf a message to the log file along with a stack backtrace when running under Valgrind. Nothing is output if not running under Valgrind. Returns the number of characters output.

VALGRIND_STACK_REGISTER(start, end):

Registers a new stack. Informs Valgrind that the memory range between start and end is a unique stack. Returns a stack identifier that can be used with other VALGRIND_STACK_* calls.

Valgrind will use this information to determine if a change to the stack pointer is an item pushed onto the stack or a change over to a new stack. Use this if you're using a user-level thread package and are noticing spurious errors from Valgrind about uninitialized memory reads.

VALGRIND_STACK_DEREGISTER(id):

Deregisters a previously registered stack. Informs Valgrind that previously registered memory range with stack id id is no longer a stack.

VALGRIND_STACK_CHANGE(id, start, end):

Changes a previously registered stack. Informs Valgrind that the previously registered stack with stack id id has changed its start and end values. Use this if your user-level thread package implements stack growth.

Note that valgrind.h is included by all the tool-specific header files (such as memcheck.h), so you don't need to include it in your client if you include a tool-specific header.

3.2. Function wrapping

Valgrind versions 3.2.0 and above can do function wrapping on all supported targets. In function wrapping, calls to some specified function are intercepted and rerouted to a different, user-supplied function. This can do whatever it likes, typically examining the arguments, calling onwards to the original, and possibly examining the result. Any number of functions may be wrapped.

Function wrapping is useful for instrumenting an API in some way. For example, wrapping functions in the POSIX pthreads API makes it possible to notify Valgrind of thread status changes, and wrapping functions in the MPI (message-passing) API allows notifying Valgrind of memory status changes associated with message arrival/departure. Such information is usually passed to Valgrind by using client requests in the wrapper functions, although that is not of relevance here.

3.2.1. A Simple Example

Supposing we want to wrap some function

int foo ( int x, int y ) { return x + y; }

A wrapper is a function of identical type, but with a special name which identifies it as the wrapper for foo. Wrappers need to include supporting macros from valgrind.h. Here is a simple wrapper which prints the arguments and return value:

#include <stdio.h>
#include "valgrind.h"
int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y )
{
   int    result;
   OrigFn fn;
   VALGRIND_GET_ORIG_FN(fn);
   printf("foo's wrapper: args %d %d\n", x, y);
   CALL_FN_W_WW(result, fn, x,y);
   printf("foo's wrapper: result %d\n", result);
   return result;
}

To become active, the wrapper merely needs to be present in a text section somewhere in the same process' address space as the function it wraps, and for its ELF symbol name to be visible to Valgrind. In practice, this means either compiling to a .o and linking it in, or compiling to a .so and LD_PRELOADing it in. The latter is more convenient in that it doesn't require relinking.

All wrappers have approximately the above form. There are three crucial macros:

I_WRAP_SONAME_FNNAME_ZU: this generates the real name of the wrapper. This is an encoded name which Valgrind notices when reading symbol table information. What it says is: I am the wrapper for any function named foo which is found in an ELF shared object with an empty ("NONE") soname field. The specification mechanism is powerful in that wildcards are allowed for both sonames and function names. The details are discussed below.

VALGRIND_GET_ORIG_FN: once in the the wrapper, the first priority is to get hold of the address of the original (and any other supporting information needed). This is stored in a value of opaque type OrigFn. The information is acquired using VALGRIND_GET_ORIG_FN. It is crucial to make this macro call before calling any other wrapped function in the same thread.

CALL_FN_W_WW: eventually we will want to call the function being wrapped. Calling it directly does not work, since that just gets us back to the wrapper and tends to kill the program in short order by stack overflow. Instead, the result lvalue, OrigFn and arguments are handed to one of a family of macros of the form CALL_FN_*. These cause Valgrind to call the original and avoid recursion back to the wrapper.

3.2.2. Wrapping Specifications

This scheme has the advantage of being self-contained. A library of wrappers can be compiled to object code in the normal way, and does not rely on an external script telling Valgrind which wrappers pertain to which originals.

Each wrapper has a name which, in the most general case says: I am the wrapper for any function whose name matches FNPATT and whose ELF "soname" matches SOPATT. Both FNPATT and SOPATT may contain wildcards (asterisks) and other characters (spaces, dots, @, etc) which are not generally regarded as valid C identifier names.

This flexibility is needed to write robust wrappers for POSIX pthread functions, where typically we are not completely sure of either the function name or the soname, or alternatively we want to wrap a whole set of functions at once.

For example, pthread_create in GNU libpthread is usually a versioned symbol - one whose name ends in, eg, @GLIBC_2.3. Hence we are not sure what its real name is. We also want to cover any soname of the form libpthread.so*. So the header of the wrapper will be

int I_WRAP_SONAME_FNNAME_ZZ(libpthreadZdsoZd0,pthreadZucreateZAZa)
  ( ... formals ... )
  { ... body ... }

In order to write unusual characters as valid C function names, a Z-encoding scheme is used. Names are written literally, except that a capital Z acts as an escape character, with the following encoding:

     Za   encodes    *
     Zp              +
     Zc              :
     Zd              .
     Zu              _
     Zh              -
     Zs              (space)
     ZA              @
     ZZ              Z
     ZL              (       # only in valgrind 3.3.0 and later
     ZR              )       # only in valgrind 3.3.0 and later

Hence libpthreadZdsoZd0 is an encoding of the soname libpthread.so.0 and pthreadZucreateZAZa is an encoding of the function name pthread_create@*.

The macro I_WRAP_SONAME_FNNAME_ZZ constructs a wrapper name in which both the soname (first component) and function name (second component) are Z-encoded. Encoding the function name can be tiresome and is often unnecessary, so a second macro, I_WRAP_SONAME_FNNAME_ZU, can be used instead. The _ZU variant is also useful for writing wrappers for C++ functions, in which the function name is usually already mangled using some other convention in which Z plays an important role. Having to encode a second time quickly becomes confusing.

Since the function name field may contain wildcards, it can be anything, including just *. The same is true for the soname. However, some ELF objects - specifically, main executables - do not have sonames. Any object lacking a soname is treated as if its soname was NONE, which is why the original example above had a name I_WRAP_SONAME_FNNAME_ZU(NONE,foo).

Note that the soname of an ELF object is not the same as its file name, although it is often similar. You can find the soname of an object libfoo.so using the command readelf -a libfoo.so | grep soname.

3.2.3. Wrapping Semantics

The ability for a wrapper to replace an infinite family of functions is powerful but brings complications in situations where ELF objects appear and disappear (are dlopen'd and dlclose'd) on the fly. Valgrind tries to maintain sensible behaviour in such situations.

For example, suppose a process has dlopened (an ELF object with soname) object1.so, which contains function1. It starts to use function1 immediately.

After a while it dlopens wrappers.so, which contains a wrapper for function1 in (soname) object1.so. All subsequent calls to function1 are rerouted to the wrapper.

If wrappers.so is later dlclose'd, calls to function1 are naturally routed back to the original.

Alternatively, if object1.so is dlclose'd but wrappers.so remains, then the wrapper exported by wrapper.so becomes inactive, since there is no way to get to it - there is no original to call any more. However, Valgrind remembers that the wrapper is still present. If object1.so is eventually dlopen'd again, the wrapper will become active again.

In short, valgrind inspects all code loading/unloading events to ensure that the set of currently active wrappers remains consistent.

A second possible problem is that of conflicting wrappers. It is easily possible to load two or more wrappers, both of which claim to be wrappers for some third function. In such cases Valgrind will complain about conflicting wrappers when the second one appears, and will honour only the first one.

3.2.4. Debugging

Figuring out what's going on given the dynamic nature of wrapping can be difficult. The --trace-redir=yes flag makes this possible by showing the complete state of the redirection subsystem after every mmap/munmap event affecting code (text).

There are two central concepts:

A "redirection specification" is a binding of a (soname pattern, fnname pattern) pair to a code address. These bindings are created by writing functions with names made with the I_WRAP_SONAME_FNNAME_{ZZ,_ZU} macros.
An "active redirection" is code-address to code-address binding currently in effect.

The state of the wrapping-and-redirection subsystem comprises a set of specifications and a set of active bindings. The specifications are acquired/discarded by watching all mmap/munmap events on code (text) sections. The active binding set is (conceptually) recomputed from the specifications, and all known symbol names, following any change to the specification set.

--trace-redir=yes shows the contents of both sets following any such event.

-v prints a line of text each time an active specification is used for the first time.

Hence for maximum debugging effectiveness you will need to use both flags.

One final comment. The function-wrapping facility is closely tied to Valgrind's ability to replace (redirect) specified functions, for example to redirect calls to malloc to its own implementation. Indeed, a replacement function can be regarded as a wrapper function which does not call the original. However, to make the implementation more robust, the two kinds of interception (wrapping vs replacement) are treated differently.

--trace-redir=yes shows specifications and bindings for both replacement and wrapper functions. To differentiate the two, replacement bindings are printed using R-> whereas wraps are printed using W->.

3.2.5. Limitations - control flow

For the most part, the function wrapping implementation is robust. The only important caveat is: in a wrapper, get hold of the OrigFn information using VALGRIND_GET_ORIG_FN before calling any other wrapped function. Once you have the OrigFn, arbitrary calls between, recursion between, and longjumps out of wrappers should work correctly. There is never any interaction between wrapped functions and merely replaced functions (eg malloc), so you can call malloc etc safely from within wrappers.

The above comments are true for {x86,amd64,ppc32}-linux. On ppc64-linux function wrapping is more fragile due to the (arguably poorly designed) ppc64-linux ABI. This mandates the use of a shadow stack which tracks entries/exits of both wrapper and replacement functions. This gives two limitations: firstly, longjumping out of wrappers will rapidly lead to disaster, since the shadow stack will not get correctly cleared. Secondly, since the shadow stack has finite size, recursion between wrapper/replacement functions is only possible to a limited depth, beyond which Valgrind has to abort the run. This depth is currently 16 calls.

For all platforms ({x86,amd64,ppc32,ppc64}-linux) all the above comments apply on a per-thread basis. In other words, wrapping is thread-safe: each thread must individually observe the above restrictions, but there is no need for any kind of inter-thread cooperation.

3.2.6. Limitations - original function signatures

As shown in the above example, to call the original you must use a macro of the form CALL_FN_*. For technical reasons it is impossible to create a single macro to deal with all argument types and numbers, so a family of macros covering the most common cases is supplied. In what follows, 'W' denotes a machine-word-typed value (a pointer or a C long), and 'v' denotes C's void type. The currently available macros are:

CALL_FN_v_v       -- call an original of type  void fn ( void )
CALL_FN_W_v       -- call an original of type  long fn ( void )

CALL_FN_v_W       -- void fn ( long )
CALL_FN_W_W       -- long fn ( long )

CALL_FN_v_WW      -- void fn ( long, long )
CALL_FN_W_WW      -- long fn ( long, long )

CALL_FN_v_WWW     -- void fn ( long, long, long )
CALL_FN_W_WWW     -- long fn ( long, long, long )

CALL_FN_W_WWWW    -- long fn ( long, long, long, long )
CALL_FN_W_5W      -- long fn ( long, long, long, long, long )
CALL_FN_W_6W      -- long fn ( long, long, long, long, long, long )
and so on, up to 
CALL_FN_W_12W

The set of supported types can be expanded as needed. It is regrettable that this limitation exists. Function wrapping has proven difficult to implement, with a certain apparently unavoidable level of ickyness. After several implementation attempts, the present arrangement appears to be the least-worst tradeoff. At least it works reliably in the presence of dynamic linking and dynamic code loading/unloading.

You should not attempt to wrap a function of one type signature with a wrapper of a different type signature. Such trickery will surely lead to crashes or strange behaviour. This is not of course a limitation of the function wrapping implementation, merely a reflection of the fact that it gives you sweeping powers to shoot yourself in the foot if you are not careful. Imagine the instant havoc you could wreak by writing a wrapper which matched any function name in any soname - in effect, one which claimed to be a wrapper for all functions in the process.

3.2.7. Examples

In the source tree, memcheck/tests/wrap[1-8].c provide a series of examples, ranging from very simple to quite advanced.

auxprogs/libmpiwrap.c is an example of wrapping a big, complex API (the MPI-2 interface). This file defines almost 300 different wrappers.