DynamoRIO
Sample Tools

We provide several examples in the samples directory of the release package to illustrate how the DynamoRIO API is used to build a DynamoRIO client.

There are also samples for the Dr. Memory Framework located in the drmemory/drmf/samples directory of the release package.

For larger, more polished end-user clients than these samples, see the provided Available Tools.

List of Samples

Each sample below is in the api/samples directory. The links point to the corresponding source file in the Git repository.

bbbuf.c

The sample bbbuf.c records each basic block’s start PC into a per-thread fast circular buffer (default 64KB) by inserting inline instrumentation at the first instruction of each basic block using the drx_buf extension.

DynamoRIO concepts demonstrated
Typical use cases
  • Hot path profiling based on basic block history
  • Lightweight execution tracing for debugging

bbcount.c

The sample bbcount.c counts dynamic basic block executions by inserting a non-atomic update of a global counter at the start of each basic block, using the counter helpers from the DynamoRIO eXtension utilities.

DynamoRIO concepts demonstrated
Typical use cases
  • Measuring overall execution volume
  • Comparing workloads by basic block activity

bbsize.c

The sample bbsize.c computes the number, maximum size, and average size of basic blocks using a basic-block analysis callback and explicit floating-point state save/restore.

DynamoRIO concepts demonstrated
  • Basic block analysis callbacks
  • Floating-point state preservation in callbacks
  • Shared statistics protected by a mutex
Typical use cases
  • Characterizing code layout behavior
  • Comparing compiler output across binaries

callstack.cpp

The sample callstack.cpp wraps a configurable function (default: malloc) and prints a symbolized call stack on each call, using the Function Wrapping and Replacing Extension, Callstack Walking, and the Symbol Access Library.

DynamoRIO concepts demonstrated
Typical use cases
  • Debugging unexpected call paths
  • Capturing stack traces at selected points

cbr.c

The sample cbr.c instruments conditional branches with separate clean calls for the taken and not-taken edges and records which edges execute. When an edge is seen for the first time, it flushes the block and redirects execution so that the block is rebuilt without instrumentation for the edges already seen.

DynamoRIO concepts demonstrated
  • Conditional branch instrumentation with clean calls
  • Fragment flushing and execution redirection
  • Per-branch state tracking with a hash table
Typical use cases
  • Building dynamic control-flow graphs
  • Reducing overhead after edge discovery

cbrtrace.c

The sample cbrtrace.c logs each conditional branch’s address, fall-through, target, and taken direction to per-thread log files using dr_insert_cbr_instrumentation_ex().

DynamoRIO concepts demonstrated
Typical use cases
  • Debugging branch behavior
  • Building branch outcome traces

countcalls.c

The sample countcalls.c counts dynamic direct calls, indirect calls, and returns using per-thread TLS data and inline counter updates with flag preservation.

DynamoRIO concepts demonstrated
Typical use cases
  • Profiling call/return behavior
  • Comparing indirect call frequency across inputs

div.c

The sample div.c counts unsigned division instructions and how often the runtime divisor is a power of two, using a clean call to read the divisor value.

DynamoRIO concepts demonstrated
  • Clean calls for operand inspection
  • Runtime operand value extraction
  • Thread-safe counters with a mutex
Typical use cases
  • Identifying optimization opportunities
  • Finding expensive divisions in hot code

empty.c

The sample empty.c is a minimal client that only registers an exit event and performs no instrumentation.

DynamoRIO concepts demonstrated
  • Client initialization
  • Exit event handling
Typical use cases
  • Starting point for new clients
  • Verifying build and load pipelines

hot_bbcount.c

The sample hot_bbcount.c uses the Basic Block Duplicator to create cold and hot versions of each basic block's instrumentation, switching to inline counter updates once a per-block hit threshold is reached.

DynamoRIO concepts demonstrated
  • Basic block duplication (Basic Block Duplicator)
  • Clean calls versus inline counter updates
  • Runtime case encoding stored in raw TLS
Typical use cases
  • Hot path profiling with reduced overhead
  • Tiered instrumentation strategies

inc2add.c

The sample inc2add.c performs an app2app transformation that replaces inc/dec with add/sub in trace blocks when the carry flag is not live, using drreg liveness analysis to preserve correctness.

DynamoRIO concepts demonstrated
  • App2app transformations
  • Instruction replacement
  • Flag liveness checks with Register Management
Typical use cases
  • Exploring micro-architecture-specific optimizations
  • Testing transformation pipelines

inline.c

The sample inline.c customizes trace formation to inline callees into traces by marking call sites as trace heads and ending traces after returns or size limits.

DynamoRIO concepts demonstrated
Typical use cases
  • Dynamic inlining experiments
  • Trace-level performance research

inscount.cpp

The sample inscount.cpp counts executed instructions by computing per-basic-block instruction counts in analysis and adding them via an auto-inlined clean call, with an option to count only application code.

DynamoRIO concepts demonstrated
Typical use cases
  • Instruction count profiling
  • Validating instrumentation overhead

instrace_simple.c

The sample instrace_simple.c records a per-thread instruction trace (PC and opcode) into a raw TLS buffer with inline instrumentation and flushes it to a text file using clean calls.

DynamoRIO concepts demonstrated
  • Raw TLS buffer management
  • Inline per-instruction instrumentation
  • Clean calls to flush per-thread buffers
Typical use cases
  • Building simple instruction traces
  • Validating instrumentation correctness
  • For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

instrace_x86.c

The sample instrace_x86.c captures a per-thread instruction trace on x86 using inline buffer filling and a local code cache for lean clean calls when the buffer fills.

DynamoRIO concepts demonstrated
  • Local code cache generation
  • Lean clean calls
  • Per-thread buffering and file output
Typical use cases
  • High-performance instruction tracing
  • Producing binary traces for offline analysis
  • For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

instrcalls.c

The sample instrcalls.c logs each direct call, indirect call, and return with target information to per-thread files, with optional symbolization via drsyms (Symbol Access Library).

DynamoRIO concepts demonstrated
  • Call and return instrumentation
  • Per-thread logging
  • Optional symbol lookup with drsyms
Typical use cases
  • Building call flow traces
  • Investigating indirect call behavior

memtrace_simple.c

The sample memtrace_simple.c records instruction and memory-reference entries (type, size, address) into a per-thread buffer and dumps them to a text file using clean calls.

DynamoRIO concepts demonstrated
Typical use cases
  • Studying memory access patterns
  • Validating memory reference instrumentation
  • For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

memtrace_x86.c

The sample memtrace_x86.c captures memory references on x86 with inline buffer filling and a local code cache for lean clean calls, writing binary or text traces per thread.

DynamoRIO concepts demonstrated
  • Instrumentation Utilities string and scatter/gather expansion
  • Local code cache generation
  • Per-thread buffering with lean clean calls
Typical use cases
  • High-performance memory tracing
  • Generating binary traces for offline analysis
  • For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

memval_simple.c

The sample memval_simple.c records memory write addresses and the written bytes using drx_buf trace and circular buffers, flushing via a trace-buffer fault handler to per-thread log files.

DynamoRIO concepts demonstrated
Typical use cases
  • Tracking data writes for debugging
  • Building lightweight memory value traces

modxfer.c

The sample modxfer.c counts instructions per module and tracks cross-module indirect branch transfers between any modules, logging summary statistics at exit.

DynamoRIO concepts demonstrated
Typical use cases
  • Studying inter-module call patterns
  • Identifying frequent cross-module transfers

modxfer_app2lib.c

The sample modxfer_app2lib.c counts instructions in the main application versus libraries and tracks indirect call/jump transfers between them using per-basic-block clean calls.

DynamoRIO concepts demonstrated
  • Module range checks for app vs. library code
  • Clean call updates of per-basic-block counts
  • Indirect branch instrumentation for cross-module transfers
Typical use cases
  • Measuring time spent in app vs. libraries
  • Detecting app-to-lib transition frequency

opcode_count.cpp

The sample opcode_count.cpp counts executions of a selected opcode and the total instruction count using drmgr opcode instrumentation events and drx counter updates.

DynamoRIO concepts demonstrated
Typical use cases
  • Focused opcode profiling
  • Regression checks for specific instruction patterns

opcodes.c

The sample opcodes.c counts dynamic instruction executions by opcode, grouped by ISA mode, and reports the top opcode counts at exit.

DynamoRIO concepts demonstrated
Typical use cases
  • Opcode mix profiling
  • Comparing dynamic behavior across workloads

prefetch.c

The sample prefetch.c removes prefetch and prefetchw instructions on Intel CPUs using an app2app pass, counting the removals for reporting.

DynamoRIO concepts demonstrated
  • App2app instruction removal
  • CPU vendor detection
  • Thread-safe counting with a mutex
Typical use cases
  • Running AMD-optimized binaries on Intel CPUs
  • Testing instruction stream sanitization

signal.c

The sample signal.c monitors UNIX signals, suppressing SIGTERM and redirecting SIGSEGV by skipping the faulting instruction while counting signals seen.

DynamoRIO concepts demonstrated
  • Signal event callbacks (UNIX)
  • Signal suppression and redirection
  • Atomic counters
Typical use cases
  • Observing crash signals in a controlled way
  • Prototyping signal-based fault handling

ssljack.c

The sample ssljack.c wraps OpenSSL and GnuTLS read/write functions on module load and logs plaintext data to per-SSL-context files.

DynamoRIO concepts demonstrated
Typical use cases
  • Inspecting decrypted SSL/TLS traffic
  • Debugging application-level crypto usage

statecmp.c

The sample statecmp.c uses Machine State Comparison Library to detect instrumentation-induced state mismatches by intentionally clobbering flags and handling mismatches via a user callback.

DynamoRIO concepts demonstrated
Typical use cases
  • Validating new instrumentation passes
  • Debugging subtle state clobbers

stats.c

The sample stats.c (Windows-only) exports instruction, floating-point, and syscall counters via shared memory for the stats viewer, using drx counter updates per basic block.

DynamoRIO concepts demonstrated
Typical use cases
  • Live statistics dashboards
  • Comparing runs without heavy logging

strace.c

The sample strace.c prints the name and result of every system call using the Dr. Syscall: System Call Monitoring Extension from DRMF.

DynamoRIO concepts demonstrated
Typical use cases
  • Debugging unexpected syscalls
  • Building basic syscall traces

stl_test.cpp

The sample stl_test.cpp is a C++ client that exercises STL containers (vector, list, map) and optionally uses a TLS variable on UNIX when SHOW_RESULTS is enabled.

DynamoRIO concepts demonstrated
  • C++ client setup
  • STL usage in a client
  • Exit event registration
Typical use cases
  • C++ client scaffolding
  • Validating STL usage in DynamoRIO clients

syscall.c

The sample syscall.c monitors system calls, counts them, and modifies write syscalls (SYS_write/NtWriteFile) using drmgr’s pre/post syscall events and thread-context-local storage (Multi-Instrumentation Manager).

DynamoRIO concepts demonstrated
  • Syscall interception events
  • Thread-context-local storage via drmgr CLS (Multi-Instrumentation Manager)
  • Platform-specific syscall handling and parameter modification
Typical use cases
  • Auditing syscall activity
  • Prototyping syscall rewriting

tracedump.c

The sample tracedump.c uses the standalone API to parse and disassemble binary trace dump files produced by the -tracedump_binary option.

DynamoRIO concepts demonstrated
  • Standalone API usage
  • Trace dump file parsing
  • Disassembly of cached code
Typical use cases
  • Inspecting trace dumps offline
  • Debugging code cache behavior

utils.c

The sample utils.c provides shared logging helpers for samples, including unique per-process log file creation and FILE* stream helpers.

DynamoRIO concepts demonstrated
Typical use cases
  • Reusing logging utilities across sample clients
  • Standardizing output file naming

wrap.c

The sample wrap.c wraps malloc (Linux) or HeapAlloc (Windows) to track the maximum allocation size and optionally force occasional allocation failures.

DynamoRIO concepts demonstrated
Typical use cases
  • Testing error handling paths
  • Tracking allocation patterns

Discussion of Selected Samples

Instruction Counting

We now illustrate how to use the DynamoRIO API to implement a simple instrumentation client that counts the executed call and return instructions in the input program. Full code for this example is in the file countcalls.c.

The client maintains a set of three counters, num_direct_calls, num_indirect_calls, and num_returns, to count three different types of instructions during execution. It keeps both thread-private and global versions of these counters. The client initializes everything by supplying the following dr_client_main routine:

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    /* register events */
    dr_register_exit_event(event_exit);
    dr_register_thread_init_event(event_thread_init);
    dr_register_thread_exit_event(event_thread_exit);
    dr_register_bb_event(event_basic_block);
    /* make it easy to tell, by looking at log file, which client executed */
    dr_log(NULL, DR_LOG_ALL, 1, "Client 'countcalls' initializing\n");
}

The client provides an event_exit routine that displays the final values of the global counters, as well as an event_thread_exit routine that shows the counter totals on a per-thread basis.

The client keeps track of each thread's instruction counts separately. To do this, it creates a data structure that will be separately allocated for each thread:

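typedef struct {
    int num_direct_calls;
    int num_indirect_calls;
    int num_returns;
} per_thread_t;

/* the global totals use the same layout, so the same field offsets apply to both */
static per_thread_t global_count;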

Now the thread hooks are used to initialize the data structure and to display the thread-private totals:

static void
event_thread_init(void *drcontext)
{
    /* create an instance of our data structure for this thread */
    per_thread_t *data = (per_thread_t *)
        dr_thread_alloc(drcontext, sizeof(per_thread_t));
    /* store it in the slot provided in the drcontext */
    dr_set_tls_field(drcontext, data);
    data->num_direct_calls = 0;
    data->num_indirect_calls = 0;
    data->num_returns = 0;
    dr_log(drcontext, DR_LOG_ALL, 1, "countcalls: set up for thread " TIDFMT "\n",
           dr_get_thread_id(drcontext));
}

static void
event_thread_exit(void *drcontext)
{
    per_thread_t *data = (per_thread_t *) dr_get_tls_field(drcontext);
    /* ... string formatting and displaying ... */
    /* clean up memory */
    dr_thread_free(drcontext, data, sizeof(per_thread_t));
}

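The real work is done in the basic block hook. We look for the instructions we are interested in and insert an increment of the appropriate thread-local and global counters, saving and restoring the arithmetic flags around each increment because inc modifies them. The global counter is incremented with a lock prefix so that the update is atomic across threads. This sample has separate paths for incrementing the thread-private counts with shared and with thread-private caches (see the -thread_private option) to illustrate how they are addressed: with private caches the counter's absolute address can be embedded in the code, while with shared caches the per-thread pointer must be loaded from thread-local storage. The shared path would also work fine with private caches.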

static void
insert_counter_update(void *drcontext, instrlist_t *bb, instr_t *where, int offset)
{
    /* Since the inc instruction clobbers 5 of the arithmetic eflags,
     * we have to save them around the inc. We could be more efficient
     * by not bothering to save the overflow flag and constructing our
     * own sequence of instructions to save the other 5 flags (using
     * lahf) or by doing a liveness analysis on the flags and saving
     * only if live.
     */
    dr_save_arith_flags(drcontext, bb, where, SPILL_SLOT_1);

    /* Increment the global counter using the lock prefix to make it atomic
     * across threads. It would be cheaper to aggregate the thread counters
     * in the exit events, but this sample is intended to illustrate inserted
     * instrumentation.
     */
    instrlist_meta_preinsert(bb, where,
        LOCK(INSTR_CREATE_inc(drcontext,
            OPND_CREATE_ABSMEM(((byte *)&global_count) + offset, OPSZ_4))));

    /* Increment the thread private counter. */
    if (dr_using_all_private_caches()) {
        per_thread_t *data = (per_thread_t *) dr_get_tls_field(drcontext);
        /* private caches - we can use an absolute address; the counters
         * live at data itself, so the operand is data + offset */
        instrlist_meta_preinsert(bb, where,
            INSTR_CREATE_inc(drcontext,
                OPND_CREATE_ABSMEM(((byte *)data) + offset, OPSZ_4)));
    } else {
        /* shared caches - we must indirect via thread local storage */
        /* We spill xbx to use a scratch register (we could do a liveness
         * analysis to try and find a dead register to use). Note that xax
         * is currently holding the saved eflags. */
        dr_save_reg(drcontext, bb, where, REG_XBX, SPILL_SLOT_2);
        dr_insert_read_tls_field(drcontext, bb, where, REG_XBX);
        instrlist_meta_preinsert(bb, where,
            INSTR_CREATE_inc(drcontext, OPND_CREATE_MEM32(REG_XBX, offset)));
        dr_restore_reg(drcontext, bb, where, REG_XBX, SPILL_SLOT_2);
    }

    /* restore flags */
    dr_restore_arith_flags(drcontext, bb, where, SPILL_SLOT_1);
}

static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating)
{
    instr_t *instr, *next_instr;
    /* ... some logging ... */
    for (instr = instrlist_first(bb); instr != NULL; instr = next_instr) {
        /* grab next now so we don't go over instructions we insert */
        next_instr = instr_get_next(instr);
        /* instrument calls and returns -- ignore far calls/rets */
        if (instr_is_call_direct(instr)) {
            insert_counter_update(drcontext, bb, instr,
                                  offsetof(per_thread_t, num_direct_calls));
        } else if (instr_is_call_indirect(instr)) {
            insert_counter_update(drcontext, bb, instr,
                                  offsetof(per_thread_t, num_indirect_calls));
        } else if (instr_is_return(instr)) {
            insert_counter_update(drcontext, bb, instr,
                                  offsetof(per_thread_t, num_returns));
        }
    }
    /* ... some logging ... */
    return DR_EMIT_DEFAULT;
}
Building the Example

For general instructions on building a client, see How to Build a Tool.

To build the instrcalls.c client using CMake, if DYNAMORIO_HOME is set to the base of the DynamoRIO release package:

mkdir build
cd build
cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples
make instrcalls

To build 32-bit samples when using gcc with a default of 64-bit, use:

mkdir build
cd build
CFLAGS=-m32 CXXFLAGS=-m32 cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples
make instrcalls

The result is a shared library instrcalls.dll or libinstrcalls.so. To invoke the client library, follow the instructions under How to Run.

Instruction Profiling

The next example shows how to use the provided control flow instrumentation routines, which allow more sophisticated profiling than simply counting instructions. Full code for this example is in the file instrcalls.c.

As in the previous example, the client is interested in direct and indirect calls and returns. The client wants to analyze the target address of each dynamic instance of a call or return. For our example, we simply dump the data in text format to a separate file for each thread. Since FILE cannot be exported from a DLL on Windows, we use the DynamoRIO-provided file_t type that hides the distinction between FILE and HANDLE to allow the same code to work on Linux and Windows. We make use of the thread initialization and exit routines to open and close the file. We store the file for a thread in the user slot in the drcontext.

static void
event_thread_init(void *drcontext)
{
    /* we're going to dump our data to a per-thread file */
    file_t f;
    char logname[512];
    /* ... filename generation (fills in logname) ... */
    f = dr_open_file(logname, DR_FILE_WRITE_REQUIRE_NEW);
    DR_ASSERT(f != INVALID_FILE);
    /* store it in the slot provided in the drcontext */
    dr_set_tls_field(drcontext, (void *)(ptr_uint_t)f);
    /* ... logging ... */
}

static void
event_thread_exit(void *drcontext)
{
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(drcontext);
    dr_close_file(f);
}

The basic block hook inserts a call to a procedure for each type of instruction, using the API-provided dr_insert_call_instrumentation and dr_insert_mbr_instrumentation routines, which insert clean calls to procedures with the signature void callee(app_pc instr_addr, app_pc target_addr).

static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating)
{
    instr_t *instr, *next_instr;
    /* ... logging ... */
    for (instr = instrlist_first(bb); instr != NULL; instr = next_instr) {
        next_instr = instr_get_next(instr);
        if (!instr_opcode_valid(instr))
            continue;
        /* instrument calls and returns -- ignore far calls/rets */
        if (instr_is_call_direct(instr)) {
            dr_insert_call_instrumentation(drcontext, bb, instr, (app_pc)at_call);
        } else if (instr_is_call_indirect(instr)) {
            dr_insert_mbr_instrumentation(drcontext, bb, instr, (app_pc)at_call_ind,
                                          SPILL_SLOT_1);
        } else if (instr_is_return(instr)) {
            dr_insert_mbr_instrumentation(drcontext, bb, instr, (app_pc)at_return,
                                          SPILL_SLOT_1);
        }
    }
    return DR_EMIT_DEFAULT;
}

These procedures look like this:

static void
at_call(app_pc instr_addr, app_pc target_addr)
{
    void *drcontext = dr_get_current_drcontext();
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(drcontext);
    /* only the control registers are needed to read the stack pointer */
    dr_mcontext_t mc = { sizeof(mc), DR_MC_CONTROL };
    dr_get_mcontext(drcontext, &mc);
    dr_fprintf(f, "CALL @ " PFX " to " PFX ", TOS is " PFX "\n",
               instr_addr, target_addr, mc.xsp);
}

static void
at_call_ind(app_pc instr_addr, app_pc target_addr)
{
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
    dr_fprintf(f, "CALL INDIRECT @ " PFX " to " PFX "\n", instr_addr, target_addr);
}

static void
at_return(app_pc instr_addr, app_pc target_addr)
{
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
    dr_fprintf(f, "RETURN @ " PFX " to " PFX "\n", instr_addr, target_addr);
}

The address of the instruction and the address of its target are both provided. These routines could perform some sort of analysis based on these addresses. In our example we simply print out the data.

Modifying Existing Instrumentation

This example shows how to update or replace existing instrumentation after it executes. This ability is useful for clients performing adaptive optimization. Here, however, we are interested in recording the direction of all conditional branches, but wish to remove the overhead of instrumentation once we have gathered that information. This code could form part of a dynamic CFG builder, where we want to observe the control-flow edges that execute at runtime and then remove the instrumentation.

While DynamoRIO supports direct fragment replacement, another method for re-instrumentation is to flush the fragment from the code cache and rebuild it in the basic block event callback. In other words, we take the following approach:

  1. In the basic block event callback, insert separate instrumentation for the taken and fall-through edges.
  2. When the basic block executes, note the direction taken and flush the fragment from the code cache.
  3. When the basic block event triggers again, insert instrumentation only for the unseen edge. After both edges have triggered, remove all instrumentation for the conditional branch (cbr).

We insert separate clean calls for the taken and fall-through cases. In each clean call, we record the observed direction and immediately flush the basic block using dr_flush_region(). Since that routine removes the calling block, we redirect execution to the target or fall-through address with dr_redirect_execution(). The file cbr.c contains the full code for this sample.

Optimization

For the next example we consider a client that performs a simple optimization. When running on a Pentium 4, where add/subtract is less expensive, the optimizer replaces increment/decrement instructions with the corresponding add/subtract instructions wherever the carry flag is not live. For optimizations, we are less concerned with covering all the code that is executed; on the contrary, to amortize the optimization overhead, we only want to apply the optimization to hot code. Thus, we apply the optimization at the trace level rather than the basic block level. Full code for this example is in the file inc2add.c.

Custom Tracing

This example demonstrates the custom tracing interface. It changes DynamoRIO's tracing behavior to favor traces that start at a call and end right after a return. It demonstrates both elements of the custom trace API:

/* client callback, registered with dr_register_end_trace_event() */
dr_custom_trace_action_t query_end_trace(void *drcontext, void *trace_tag, void *next_tag);
bool dr_mark_trace_head(void *drcontext, void *tag);

Full code for this example is in the file inline.c.

Use of x87 Floating Point Operation in a Client

Because saving the x87 floating-point state is very expensive, on x86 DynamoRIO seeks to do so only on an as-needed basis. If a client wishes to use floating-point operations and is unsure whether its compiler will use x87, or if it wishes to use MMX registers, it must save and restore the application's floating-point state around the usage. For an inserted clean call out of the code cache, this can be conveniently done by passing true for the save_fpstate parameter of dr_insert_clean_call(). It can also be done explicitly using these routines:

size_t proc_save_fpstate(byte *buf);
void proc_restore_fpstate(byte *buf);

These routines must be used if x87 floating point operations are performed in non-inserted-call locations, such as event callbacks. Note that there are restrictions on how these methods may be called: see the documentation in the header files for additional information. Note also that the floating point state must be saved around calls to our provided printing routines when they are used to print floats. However, it is not necessary to save and restore the floating point state around floating point operations if they are being used in the initialization or termination routines.

On ARM and AArch64 the SIMD/FP registers are always saved, so proc_save_fpstate and proc_restore_fpstate are no-ops. On x86, modern compilers typically do not use x87 operations, but to be safe clients are still advised to either avoid floating-point operations or use the preservation routines listed here.

This example client counts the number of basic blocks processed and keeps statistics on their average size using floating point operations. Full code for this example is in the file bbsize.c.

Use of Custom Client Statistics with the Windows GUI

The Windows GUI displays custom client statistics if they are placed in shared memory with a particular name. The sample stats.c implements this protocol in a client that counts total instructions, floating-point instructions, and system calls.

Note that the stats.c example client and the Windows GUI must both be run within the same session in order for the statistics to be shared properly. They can be modified to use a "Global" prefix instead of "Local" for cross-session sharing, though this requires running with administrative privileges.

Use of Standalone API

The binary tracedump reader, tracedump.c, also serves as an example of using the Disassembly Library.