12. Lackey: a simple profiler and memory tracer

Table of Contents

12.1. Overview
12.2. Lackey Options

To use this tool, you must specify --tool=lackey on the Valgrind command line.

12.1. Overview

Lackey is a simple Valgrind tool that does some basic program measurement. It adds quite a lot of simple instrumentation to the program's code. It is intended primarily as an example tool, and so emphasises clarity of implementation over performance.

Depending on the options given, it measures and reports the following:

  1. When command line option --basic-counts=yes is specified, it prints the following statistics and information about the execution of the client program:

    1. The number of calls to _dl_runtime_resolve(), the function in glibc's dynamic linker that resolves function references to shared objects.

      You can change the name of the function tracked with command line option --fnname=<name>.

    2. The number of conditional branches encountered and the number and proportion of those taken.

    3. The number of superblocks entered and completed by the program. Because of optimisations done by the JIT, these counts are not accurate.

    4. The number of guest (x86, amd64, ppc, etc.) instructions and IR statements executed. IR is Valgrind's RISC-like intermediate representation via which all instrumentation is done.

    5. Ratios between some of these counts.

    6. The exit code of the client program.

  2. When command line option --detailed-counts=yes is specified, a table is printed with counts of loads, stores and ALU operations for various types of operands.

    The types are identified by their IR name ("I1" ... "I128", "F32", "F64", and "V128").

  3. When command line option --trace-mem=yes is specified, it prints out the size and address of almost every load and store made by the program. See the comments at the top of the file lackey/lk_main.c for details about the output format, how it works, and inaccuracies in the address trace.

  4. When command line option --trace-superblocks=yes is specified, it prints out the address of every superblock (extended basic block) executed by the program. This is primarily of interest to Valgrind developers. See the comments at the top of the file lackey/lk_main.c for details about the output format.
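As a rough illustration of how a --trace-mem=yes log might be post-processed, the following Python sketch reads trace records one line at a time and totals the bytes touched per access kind. The record shape it assumes (a kind letter I, L, S or M, then a hexadecimal address, a comma and a decimal size) should be checked against the comments at the top of lackey/lk_main.c for your Valgrind version. The names MemRef, parse_trace and bytes_by_kind are hypothetical helpers, not part of Valgrind.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# ASSUMPTION: each trace record looks like " L BE801950,4", that is a kind
# letter, a hexadecimal address, a comma and a decimal size. Verify this
# against the comments in lackey/lk_main.c before relying on it.
_RECORD = re.compile(r"^\s*([ILSM])\s+([0-9A-Fa-f]+),(\d+)\s*$")


class MemRef(NamedTuple):
    """One memory reference taken from the trace (hypothetical helper type)."""
    kind: str      # "I" instruction, "L" load, "S" store, "M" modify
    address: int
    size: int


def parse_trace(lines: Iterable[str]) -> Iterator[MemRef]:
    """Yield one MemRef per matching line; lines of any other shape are skipped."""
    for line in lines:
        match = _RECORD.match(line)
        if match:
            kind, address, size = match.groups()
            yield MemRef(kind, int(address, 16), int(size))


def bytes_by_kind(lines: Iterable[str]) -> Counter:
    """Total the bytes touched for each access kind, streaming over the input."""
    totals: Counter = Counter()
    for ref in parse_trace(lines):
        totals[ref.kind] += ref.size
    return totals


if __name__ == "__main__":
    # Illustrative records only; they are not output from a real run.
    sample = ["I  0023C790,2", " S BE80199C,4", " L BE801950,4", " M 0025747C,1"]
    for kind, total in sorted(bytes_by_kind(sample).items()):
        print(kind, total)
```

Because both functions consume any iterable of lines, an open file object can be passed directly, so the whole trace never needs to be held in memory.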

Note that Lackey runs quite slowly, especially when --detailed-counts=yes is specified. It could be made to run a lot faster by doing a slightly more sophisticated job of the instrumentation, but that would undermine its role as a simple example tool. Hence we have chosen not to do so.

Note also that --trace-mem=yes and --trace-superblocks=yes produce immense amounts of output. If you save the output to a file, it can consume tens of gigabytes of disk space very quickly. Printing this much output also makes the program run extremely slowly.
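To get a feel for the disk space involved, the following Python sketch does the back-of-the-envelope arithmetic. Both inputs are assumptions rather than measurements: the figure of 14 bytes per record assumes a short text line holding an 8-digit hexadecimal address and a single-digit size, and the record count is a hypothetical workload. The function names are illustrative only.

```python
def estimate_trace_bytes(num_records: int, bytes_per_record: int = 14) -> int:
    """Estimate the size in bytes of a trace with one text line per record.

    ASSUMPTION: the default of 14 bytes corresponds to a line such as
    " L BE801950,4" plus a newline; 64-bit addresses or larger sizes
    would make each line longer.
    """
    if num_records < 0 or bytes_per_record <= 0:
        raise ValueError("num_records must be >= 0 and bytes_per_record > 0")
    return num_records * bytes_per_record


def format_gib(num_bytes: int) -> str:
    """Render a byte count in GiB with one decimal place."""
    return f"{num_bytes / 2**30:.1f} GiB"


if __name__ == "__main__":
    # Hypothetical workload: two billion traced memory references.
    print(format_gib(estimate_trace_bytes(2_000_000_000)))
```

Under these assumptions, two billion records already occupy about 26 GiB, and a program can make that many memory references in a short run.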

12.2. Lackey Options

Lackey-specific options are:

--basic-counts=<no|yes> [default: yes]

Count basic events, as described above.

--detailed-counts=<no|yes> [default: no]

Count loads, stores and ALU operations, differentiated by their IR types.

--fnname=<name> [default: _dl_runtime_resolve()]

Count calls to the function <name>.

--trace-mem=<no|yes> [default: no]

Produce a log of all memory references, as described above.

--trace-superblocks=<no|yes> [default: no]

Print a line of text giving the address of each superblock (a single-entry, multiple-exit chunk of code) executed by the program.
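The options above can be combined on one command line. As a minimal sketch, the following Python helper assembles such a command line as an argument list suitable for subprocess.run. It uses only the options documented in this section; the helper name lackey_command and the client program ./myprog are hypothetical.

```python
from typing import List, Optional, Sequence


def _yes_no(flag: bool) -> str:
    """Map a boolean to the yes/no spelling the options expect."""
    return "yes" if flag else "no"


def lackey_command(
    program: Sequence[str],
    *,
    basic_counts: bool = True,
    detailed_counts: bool = False,
    fnname: Optional[str] = None,
    trace_mem: bool = False,
    trace_superblocks: bool = False,
) -> List[str]:
    """Build an argv list that runs `program` under Lackey.

    The keyword defaults mirror the defaults listed in this section.
    --fnname is passed only when a name is given, so that Lackey's own
    default applies otherwise.
    """
    argv = [
        "valgrind",
        "--tool=lackey",
        f"--basic-counts={_yes_no(basic_counts)}",
        f"--detailed-counts={_yes_no(detailed_counts)}",
        f"--trace-mem={_yes_no(trace_mem)}",
        f"--trace-superblocks={_yes_no(trace_superblocks)}",
    ]
    if fnname is not None:
        argv.append(f"--fnname={fnname}")
    argv.extend(program)
    return argv


if __name__ == "__main__":
    # ./myprog is a placeholder for the client program; nothing is executed.
    print(" ".join(lackey_command(["./myprog"], trace_mem=True)))
```

Passing the list to subprocess.run avoids shell quoting problems in the client program's arguments. When tracing, consider redirecting the output to a file on a disk with plenty of free space.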