DynamoRIO

We provide several examples in the samples directory of the release package to illustrate how the DynamoRIO API is used to build a DynamoRIO client.

There are also samples for the Dr. Memory Framework located in the drmemory/drmf/samples directory of the release package.

For more substantial clients, see the provided Available Tools, which are larger and more polished end-user tools than these samples.

List of Samples

Each sample below is in the api/samples directory. The links point to the corresponding source file in the Git repository.

bbbuf.c

The sample bbbuf.c records each basic block’s start PC into a per-thread fast circular buffer (default 64KB) by inserting inline instrumentation at the first instruction of each basic block using the drx_buf extension.

DynamoRIO concepts demonstrated

TLS-backed per-thread buffers via DynamoRIO eXtension utilities (Buffer Filling API)
Basic block instrumentation at block entry
Scratch register management with Register Management

Typical use cases

Hot path profiling based on basic block history
Lightweight execution tracing for debugging
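
The wrap-around behavior of such a buffer can be modeled outside DynamoRIO with a plain ring buffer. The sketch below is a standalone illustration, not the drx_buf API: the type pc_ring_t, the function ring_record, and the 8-entry capacity are hypothetical names and values chosen for the example, whereas the sample itself relies on drx_buf with its 64KB default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; the real sample uses a 64KB drx_buf buffer. */
#define RING_ENTRIES 8

/* Standalone model of a per-thread circular buffer of block start PCs. */
typedef struct {
    uintptr_t slots[RING_ENTRIES];
    size_t next; /* index of the slot the next record will overwrite */
} pc_ring_t;

/* Record one basic block start PC, wrapping to slot 0 at the end. */
static void
ring_record(pc_ring_t *ring, uintptr_t pc)
{
    assert(ring != NULL && ring->next < RING_ENTRIES);
    ring->slots[ring->next] = pc;
    ring->next = (ring->next + 1) % RING_ENTRIES;
}
```

Once more records have been written than the ring has slots, the oldest entries are overwritten and only the most recent ones remain, which is the property that makes a circular buffer suitable for a recent-history trace.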

bbcount.c

The sample bbcount.c counts dynamic basic block executions by inserting a non-atomic global counter update at the start of each basic block using DynamoRIO eXtension utilities counter helpers.

DynamoRIO concepts demonstrated

Basic block instrumentation
Inline counter updates with DynamoRIO eXtension utilities
AFLAGS liveness checks via Register Management

Typical use cases

Measuring overall execution volume
Comparing workloads by basic block activity

bbsize.c

The sample bbsize.c computes the number, maximum size, and average size of basic blocks using a basic-block analysis callback and explicit floating-point state save/restore.

DynamoRIO concepts demonstrated

Basic block analysis callbacks
Floating-point state preservation in callbacks
Shared statistics protected by a mutex

Typical use cases

Characterizing code layout behavior
Comparing compiler output across binaries

callstack.cpp

The sample callstack.cpp wraps a configurable function (default: malloc) and prints a symbolized call stack on each call using Function Wrapping and Replacing Extension, Callstack Walking, and Symbol Access Library.

DynamoRIO concepts demonstrated

Function wrapping with Function Wrapping and Replacing Extension
Call stack walking via the Callstack Walking extension
Symbol lookup with Symbol Access Library
Runtime options via DynamoRIO Option Parser

Typical use cases

Debugging unexpected call paths
Capturing stack traces at selected points

cbr.c

The sample cbr.c instruments conditional branches with separate taken/not-taken clean calls and records which edges execute. Once an edge has been seen, it flushes the block and redirects execution so that the instrumentation for that edge is removed while the application keeps running.

DynamoRIO concepts demonstrated

Conditional branch instrumentation with clean calls
Fragment flushing and execution redirection
Per-branch state tracking with a hash table

Typical use cases

Building dynamic control-flow graphs
Reducing overhead after edge discovery

cbrtrace.c

The sample cbrtrace.c logs each conditional branch’s address, fall-through, target, and taken direction to per-thread log files using dr_insert_cbr_instrumentation_ex().

DynamoRIO concepts demonstrated

Conditional branch instrumentation
Thread-local storage via drmgr TLS fields (Multi-Instrumentation Manager)
Per-thread log file output

Typical use cases

Debugging branch behavior
Building branch outcome traces

countcalls.c

The sample countcalls.c counts dynamic direct calls, indirect calls, and returns using per-thread TLS data and inline counter updates with flag preservation.

DynamoRIO concepts demonstrated

TLS-based per-thread data via drmgr (Multi-Instrumentation Manager)
Instruction classification for calls and returns
Inline counter updates with Register Management

Typical use cases

Profiling call/return behavior
Comparing indirect call frequency across inputs

div.c

The sample div.c counts unsigned division instructions and how often the runtime divisor is a power of two, using a clean call to read the divisor value.

DynamoRIO concepts demonstrated

Clean calls for operand inspection
Runtime operand value extraction
Thread-safe counters with a mutex

Typical use cases

Identifying optimization opportunities
Finding expensive divisions in hot code

empty.c

The sample empty.c is a minimal client that only registers an exit event and performs no instrumentation.

DynamoRIO concepts demonstrated

Client initialization
Exit event handling

Typical use cases

Starting point for new clients
Verifying build and load pipelines

hot_bbcount.c

The sample hot_bbcount.c uses Basic Block Duplicator to create cold and hot versions of basic block instrumentation, switching to inline counter updates after a per-block hit threshold is reached.

DynamoRIO concepts demonstrated

Basic block duplication (Basic Block Duplicator)
Clean calls versus inline counter updates
Runtime case encoding stored in raw TLS

Typical use cases

Hot path profiling with reduced overhead
Tiered instrumentation strategies
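
The tiering decision itself can be modeled in a few lines of standalone C. The sketch below only illustrates switching from a cold case to a hot case after a hit threshold. It does not use or describe the Basic Block Duplicator API, and bb_state_t, on_block_entry, and the threshold of 3 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold; a real client would choose its own value. */
#define HOT_THRESHOLD 3

typedef enum { CASE_COLD, CASE_HOT } bb_case_t;

/* Standalone model of the per-block state that selects a version. */
typedef struct {
    unsigned hits;     /* executions observed while the block was cold */
    bb_case_t current; /* version that the next execution will run */
} bb_state_t;

/* Returns the version executed on this entry and promotes the block
 * to the hot version once it has been hit HOT_THRESHOLD times.
 */
static bb_case_t
on_block_entry(bb_state_t *bb)
{
    bb_case_t executed;
    assert(bb != NULL);
    executed = bb->current;
    if (bb->current == CASE_COLD && ++bb->hits >= HOT_THRESHOLD)
        bb->current = CASE_HOT;
    return executed;
}
```

In this model the more expensive cold path runs only until the threshold is reached, after which every execution takes the cheaper hot path.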

inc2add.c

The sample inc2add.c performs an app2app transformation that replaces inc/dec with add/sub in trace blocks when the carry flag is not live, using drreg liveness analysis to preserve correctness.

DynamoRIO concepts demonstrated

App2app transformations
Instruction replacement
Flag liveness checks with Register Management

Typical use cases

Exploring micro-architecture-specific optimizations
Testing transformation pipelines

inline.c

The sample inline.c customizes trace formation to inline callees into traces by marking call sites as trace heads and ending traces after returns or size limits.

DynamoRIO concepts demonstrated

Custom trace control via end-trace events
Trace head tracking with Multi-Instrumentation Manager and dr_mark_trace_head()
Fragment lifecycle handling on deletion

Typical use cases

Dynamic inlining experiments
Trace-level performance research

inscount.cpp

The sample inscount.cpp counts executed instructions by computing per-basic-block instruction counts in analysis and adding them via an auto-inlined clean call, with an option to count only application code.

DynamoRIO concepts demonstrated

Clean calls optimized by DynamoRIO
Runtime options via DynamoRIO Option Parser
Emulation-aware instruction counting with drmgr (Multi-Instrumentation Manager)

Typical use cases

Instruction count profiling
Validating instrumentation overhead

instrace_simple.c

The sample instrace_simple.c records a per-thread instruction trace (PC and opcode) into a raw TLS buffer with inline instrumentation and flushes it to a text file using clean calls.

DynamoRIO concepts demonstrated

Raw TLS buffer management
Inline per-instruction instrumentation
Clean calls to flush per-thread buffers

Typical use cases

Building simple instruction traces
Validating instrumentation correctness
For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

instrace_x86.c

The sample instrace_x86.c captures a per-thread instruction trace on x86 using inline buffer filling and a local code cache for lean clean calls when the buffer fills.

DynamoRIO concepts demonstrated

Local code cache generation
Lean clean calls
Per-thread buffering and file output

Typical use cases

High-performance instruction tracing
Producing binary traces for offline analysis
For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

instrcalls.c

The sample instrcalls.c logs each direct call, indirect call, and return with target information to per-thread files, with optional symbolization via drsyms (Symbol Access Library).

DynamoRIO concepts demonstrated

Call and return instrumentation
Per-thread logging
Optional symbol lookup with drsyms

Typical use cases

Building call flow traces
Investigating indirect call behavior

memtrace_simple.c

The sample memtrace_simple.c records instruction and memory-reference entries (type, size, address) into a per-thread buffer and dumps them to a text file using clean calls.

DynamoRIO concepts demonstrated

Memory operand analysis
Instrumentation Utilities helpers for address and size
DynamoRIO eXtension utilities expansion of string and scatter/gather operations

Typical use cases

Studying memory access patterns
Validating memory reference instrumentation
For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

memtrace_x86.c

The sample memtrace_x86.c captures memory references on x86 with inline buffer filling and a local code cache for lean clean calls, writing binary or text traces per thread.

DynamoRIO concepts demonstrated

Instrumentation Utilities string and scatter/gather expansion
Local code cache generation
Per-thread buffering with lean clean calls

Typical use cases

High-performance memory tracing
Generating binary traces for offline analysis
For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.

memval_simple.c

The sample memval_simple.c records memory write addresses and the written bytes using drx_buf trace and circular buffers, flushing via a trace-buffer fault handler to per-thread log files.

DynamoRIO concepts demonstrated

Post-instruction instrumentation for write values
DynamoRIO eXtension utilities trace and circular buffers with a fault handler (Buffer Filling API)
Memory operand address calculation with Instrumentation Utilities

Typical use cases

Tracking data writes for debugging
Building lightweight memory value traces

modxfer.c

The sample modxfer.c counts instructions per module and tracks cross-module indirect branch transfers between any modules, logging summary statistics at exit.

DynamoRIO concepts demonstrated

Module load and unload tracking
Indirect branch instrumentation
Per-module counters with DynamoRIO eXtension utilities

Typical use cases

Studying inter-module call patterns
Identifying frequent cross-module transfers

modxfer_app2lib.c

The sample modxfer_app2lib.c counts instructions in the main application versus libraries and tracks indirect call/jump transfers between them using per-basic-block clean calls.

DynamoRIO concepts demonstrated

Module range checks for app vs. library code
Clean call updates of per-basic-block counts
Indirect branch instrumentation for cross-module transfers

Typical use cases

Measuring time spent in app vs. libraries
Detecting app-to-lib transition frequency

opcode_count.cpp

The sample opcode_count.cpp counts executions of a selected opcode and the total instruction count using drmgr opcode instrumentation events and drx counter updates.

DynamoRIO concepts demonstrated

Opcode instrumentation events via Multi-Instrumentation Manager
Inline counter updates with DynamoRIO eXtension utilities
Runtime options via DynamoRIO Option Parser

Typical use cases

Focused opcode profiling
Regression checks for specific instruction patterns

opcodes.c

The sample opcodes.c counts dynamic instruction executions by opcode, grouped by ISA mode, and reports the top opcode counts at exit.

DynamoRIO concepts demonstrated

Basic block instrumentation
Inline counter updates with DynamoRIO eXtension utilities
ISA mode awareness

Typical use cases

Opcode mix profiling
Comparing dynamic behavior across workloads

prefetch.c

The sample prefetch.c removes prefetch and prefetchw instructions on Intel CPUs using an app2app pass, counting the removals for reporting.

DynamoRIO concepts demonstrated

App2app instruction removal
CPU vendor detection
Thread-safe counting with a mutex

Typical use cases

Running AMD-optimized binaries on Intel CPUs
Testing instruction stream sanitization

signal.c

The sample signal.c monitors UNIX signals, suppressing SIGTERM and redirecting SIGSEGV by skipping the faulting instruction while counting signals seen.

DynamoRIO concepts demonstrated

Signal event callbacks (UNIX)
Signal suppression and redirection
Atomic counters

Typical use cases

Observing crash signals in a controlled way
Prototyping signal-based fault handling

ssljack.c

The sample ssljack.c wraps OpenSSL and GnuTLS read/write functions on module load and logs plaintext data to per-SSL-context files.

DynamoRIO concepts demonstrated

Module load events
Function wrapping (Function Wrapping and Replacing Extension)
Per-context file logging

Typical use cases

Inspecting decrypted SSL/TLS traffic
Debugging application-level crypto usage

statecmp.c

The sample statecmp.c uses Machine State Comparison Library to detect instrumentation-induced state mismatches by intentionally clobbering flags and handling mismatches via a user callback.

DynamoRIO concepts demonstrated

Machine State Comparison Library integration
Instrumentation correctness checking
Custom mismatch callbacks

Typical use cases

Validating new instrumentation passes
Debugging subtle state clobbers

stats.c

The sample stats.c (Windows-only) exports instruction, floating-point, and syscall counters via shared memory for the stats viewer, using drx counter updates per basic block.

DynamoRIO concepts demonstrated

Shared memory usage for client stats
Inline counters with DynamoRIO eXtension utilities
Windows-specific API interactions

Typical use cases

Live statistics dashboards
Comparing runs without heavy logging

strace.c

The sample strace.c prints the name and result of every system call using the Dr. Syscall: System Call Monitoring Extension from DRMF.

DynamoRIO concepts demonstrated

Syscall event callbacks via Dr. Syscall: System Call Monitoring Extension
Syscall name lookup with DRMF
Result decoding

Typical use cases

Debugging unexpected syscalls
Building basic syscall traces

stl_test.cpp

The sample stl_test.cpp is a C++ client that exercises STL containers (vector, list, map) and optionally uses a TLS variable on UNIX when SHOW_RESULTS is enabled.

DynamoRIO concepts demonstrated

C++ client setup
STL usage in a client
Exit event registration

Typical use cases

C++ client scaffolding
Validating STL usage in DynamoRIO clients

syscall.c

The sample syscall.c monitors system calls, counts them, and modifies write syscalls (SYS_write/NtWriteFile) using drmgr’s pre/post syscall events and thread-context-local storage (Multi-Instrumentation Manager).

DynamoRIO concepts demonstrated

Syscall interception events
Thread-context-local storage via drmgr CLS (Multi-Instrumentation Manager)
Platform-specific syscall handling and parameter modification

Typical use cases

Auditing syscall activity
Prototyping syscall rewriting

tracedump.c

The sample tracedump.c uses the standalone API to parse and disassemble binary trace dump files produced by the -tracedump_binary option.

DynamoRIO concepts demonstrated

Standalone API usage
Trace dump file parsing
Disassembly of cached code

Typical use cases

Inspecting trace dumps offline
Debugging code cache behavior

utils.c

The sample utils.c provides shared logging helpers for samples, including unique per-process log file creation and FILE* stream helpers.

DynamoRIO concepts demonstrated

Client path discovery
Log file creation helpers via DynamoRIO eXtension utilities
Stream helper utilities

Typical use cases

Reusing logging utilities across sample clients
Standardizing output file naming

wrap.c

The sample wrap.c wraps malloc (Linux) or HeapAlloc (Windows) to track the maximum allocation size and optionally force occasional allocation failures.

DynamoRIO concepts demonstrated

Module load callbacks
Function wrapping (Function Wrapping and Replacing Extension)
Return value modification

Typical use cases

Testing error handling paths
Tracking allocation patterns

Discussion of Selected Samples

Instruction Counting

We now illustrate how to use the above API to implement a simple instrumentation client for counting the number of executed call and return instructions in the input program. Full code for this example is in the file countcalls.c.

The client maintains a set of three counters, num_direct_calls, num_indirect_calls, and num_returns, to count three different types of instructions during execution. It keeps both thread-private and global versions of these counters. The client initializes everything by supplying the following dr_client_main routine:

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
  /* register events */
  dr_register_exit_event(event_exit);
  dr_register_thread_init_event(event_thread_init);
  dr_register_thread_exit_event(event_thread_exit);
  dr_register_bb_event(event_basic_block);
 
  /* make it easy to tell, by looking at log file, which client executed */
  dr_log(NULL, DR_LOG_ALL, 1, "Client 'countcalls' initializing\n");
}

The client provides an event_exit routine that displays the final values of the global counters as well as an event_thread_exit routine that shows the counter totals on a per-thread basis.

The client keeps track of each thread's instruction counts separately. To do this, it creates a data structure that will be separately allocated for each thread:

typedef struct {
  int num_direct_calls;
  int num_indirect_calls;
  int num_returns;
} per_thread_t;

The thread hooks are then used to initialize the data structure and to display the thread-private totals:

static void event_thread_init(void *drcontext)
{
  /* create an instance of our data structure for this thread */
  per_thread_t *data = (per_thread_t *)
    dr_thread_alloc(drcontext, sizeof(per_thread_t));
  /* store it in the slot provided in the drcontext */
  dr_set_tls_field(drcontext, data);
  data->num_direct_calls = 0;
  data->num_indirect_calls = 0;
  data->num_returns = 0;
  dr_log(drcontext, DR_LOG_ALL, 1, "countcalls: set up for thread "TIDFMT"\n",
         dr_get_thread_id(drcontext));
}
 
static void event_thread_exit(void *drcontext)
{
  per_thread_t *data = (per_thread_t *) dr_get_tls_field(drcontext);
 
  ... // string formatting and displaying
 
  /* clean up memory */
  dr_thread_free(drcontext, data, sizeof(per_thread_t));
}

The real work is done in the basic block hook. We simply look for the instructions we're interested in and insert an increment of the appropriate thread-local and global counters, remembering to save the flags, of course. This sample has separate paths for incrementing the thread private counts for shared vs. thread-private caches (see the -thread_private option) to illustrate the differences in targeting for them. Note that the shared path would work fine with private caches.

static void
insert_counter_update(void *drcontext, instrlist_t *bb, instr_t *where, int offset)
{
  /* Since the inc instruction clobbers 5 of the arithmetic eflags,
   * we have to save them around the inc. We could be more efficient
   * by not bothering to save the overflow flag and constructing our
   * own sequence of instructions to save the other 5 flags (using
   * lahf) or by doing a liveness analysis on the flags and saving
   * only if live.
   */
  dr_save_arith_flags(drcontext, bb, where, SPILL_SLOT_1);
 
  /* Increment the global counter using the lock prefix to make it atomic
   * across threads. It would be cheaper to aggregate the thread counters
   * in the exit events, but this sample is intended to illustrate inserted
   * instrumentation.
   */
  instrlist_meta_preinsert(bb, where, LOCK(INSTR_CREATE_inc
    (drcontext, OPND_CREATE_ABSMEM(((byte *)&global_count) + offset, OPSZ_4))));
 
  /* Increment the thread private counter. */
  if (dr_using_all_private_caches()) {
    per_thread_t *data = (per_thread_t *) dr_get_tls_field(drcontext);
    /* private caches - we can use an absolute address */
    instrlist_meta_preinsert(bb, where, INSTR_CREATE_inc(drcontext,
        OPND_CREATE_ABSMEM(((byte *)data) + offset, OPSZ_4)));
  } else {
    /* shared caches - we must indirect via thread local storage */
    /* We spill xbx to use a scratch register (we could do a liveness
     * analysis to try and find a dead register to use). Note that xax
     * is currently holding the saved eflags. */
    dr_save_reg(drcontext, bb, where, REG_XBX, SPILL_SLOT_2);
    dr_insert_read_tls_field(drcontext, bb, where, REG_XBX);
    instrlist_meta_preinsert(bb, where,
        INSTR_CREATE_inc(drcontext, OPND_CREATE_MEM32(REG_XBX, offset)));
    dr_restore_reg(drcontext, bb, where, REG_XBX, SPILL_SLOT_2);
  }
 
  /* restore flags */
  dr_restore_arith_flags(drcontext, bb, where, SPILL_SLOT_1);
}
 
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating)
{
  instr_t *instr, *next_instr;
 
  ... // some logging
 
  for (instr = instrlist_first(bb); instr != NULL; instr = next_instr) {
    /* grab next now so we don't go over instructions we insert */
    next_instr = instr_get_next(instr);
 
    /* instrument calls and returns -- ignore far calls/rets */
    if (instr_is_call_direct(instr)) {
      insert_counter_update(drcontext, bb, instr,
                            offsetof(per_thread_t, num_direct_calls));
    } else if (instr_is_call_indirect(instr)) {
      insert_counter_update(drcontext, bb, instr,
                            offsetof(per_thread_t, num_indirect_calls));
    } else if (instr_is_return(instr)) {
      insert_counter_update(drcontext, bb, instr,
                            offsetof(per_thread_t, num_returns));
    }
  }
 
  ... // some logging
 
  return DR_EMIT_DEFAULT;
}
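
The offset-based addressing that insert_counter_update relies on can be tried in isolation. The following standalone sketch assumes, as the text above suggests, that the global counters share the per_thread_t layout. The helpers counter_at and bump_both are hypothetical and perform at run time in plain C what the inserted inc instructions do inside the code cache.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int num_direct_calls;
    int num_indirect_calls;
    int num_returns;
} per_thread_t;

/* Turn a byte offset obtained with offsetof() back into a counter pointer. */
static int *
counter_at(per_thread_t *counts, size_t offset)
{
    assert(counts != NULL && offset + sizeof(int) <= sizeof(per_thread_t));
    return (int *)((unsigned char *)counts + offset);
}

/* Increment the same field in the global and the thread-private copies. */
static void
bump_both(per_thread_t *global, per_thread_t *thread_private, size_t offset)
{
    ++*counter_at(global, offset);
    ++*counter_at(thread_private, offset);
}
```

Passing a single offset keeps the instrumentation routine generic: the caller selects the counter with offsetof(per_thread_t, field), and the same value addresses both copies.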

Building the Example

For general instructions on building a client, see How to Build a Tool.

To build the instrcalls.c client using CMake, if DYNAMORIO_HOME is set to the base of the DynamoRIO release package:

mkdir build
cd build
cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples
make instrcalls

To build 32-bit samples when using gcc with a default of 64-bit, use:

mkdir build
cd build
CFLAGS=-m32 CXXFLAGS=-m32 cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples
make instrcalls

The result is a shared library instrcalls.dll or libinstrcalls.so. To invoke the client library, follow the instructions under How to Run.

Instruction Profiling

The next example shows how to use the provided control flow instrumentation routines, which allow more sophisticated profiling than simply counting instructions. Full code for this example is in the file instrcalls.c.

As in the previous example, the client is interested in direct and indirect calls and returns. The client wants to analyze the target address of each dynamic instance of a call or return. For our example, we simply dump the data in text format to a separate file for each thread. Since FILE cannot be exported from a DLL on Windows, we use the DynamoRIO-provided file_t type that hides the distinction between FILE and HANDLE to allow the same code to work on Linux and Windows. We make use of the thread initialization and exit routines to open and close the file. We store the file for a thread in the user slot in the drcontext.

static void event_thread_init(void *drcontext)
{
  /* we're going to dump our data to a per-thread file */
  file_t f;
  char logname[512];
 
  ... // filename generation
 
  f = dr_open_file(logname, DR_FILE_WRITE_OVERWRITE);
  DR_ASSERT(f != INVALID_FILE);
 
  /* store it in the slot provided in the drcontext */
  dr_set_tls_field(drcontext, (void *)(ptr_uint_t)f);
 
  ... // logging
}
 
static void event_thread_exit(void *drcontext)
{
  file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(drcontext);
  dr_close_file(f);
}

The basic block hook inserts a call to a procedure for each type of instruction, using the API-provided dr_insert_call_instrumentation and dr_insert_mbr_instrumentation routines, which insert calls to procedures with a certain signature.

static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating)
{
  instr_t *instr, *next_instr;
 
  ... // logging
 
  for (instr = instrlist_first(bb); instr != NULL; instr = next_instr) {
    next_instr = instr_get_next(instr);
    if (!instr_opcode_valid(instr))
        continue;
    /* instrument calls and returns -- ignore far calls/rets */
    if (instr_is_call_direct(instr)) {
        dr_insert_call_instrumentation(drcontext, bb, instr, (app_pc)at_call);
    } else if (instr_is_call_indirect(instr)) {
        dr_insert_mbr_instrumentation(drcontext, bb, instr, (app_pc)at_call_ind,
                                      SPILL_SLOT_1);
    } else if (instr_is_return(instr)) {
        dr_insert_mbr_instrumentation(drcontext, bb, instr, (app_pc)at_return,
                                      SPILL_SLOT_1);
    }
  }
  return DR_EMIT_DEFAULT;
}

These procedures look like this:

static void
at_call(app_pc instr_addr, app_pc target_addr)
{
  file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
  dr_mcontext_t mc = { sizeof(mc), DR_MC_CONTROL /* only need xsp */ };
  dr_get_mcontext(dr_get_current_drcontext(), &mc);
  dr_fprintf(f, "CALL @ "PFX" to "PFX", TOS is "PFX"\n",
             instr_addr, target_addr, mc.xsp);
}
 
static void
at_call_ind(app_pc instr_addr, app_pc target_addr)
{
  file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
  dr_fprintf(f, "CALL INDIRECT @ "PFX" to "PFX"\n", instr_addr, target_addr);
}
 
static void
at_return(app_pc instr_addr, app_pc target_addr)
{
  file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
  dr_fprintf(f, "RETURN @ "PFX" to "PFX"\n", instr_addr, target_addr);
}

The address of the instruction and the address of its target are both provided. These routines could perform some sort of analysis based on these addresses. In our example we simply print out the data.

Modifying Existing Instrumentation

In this example, we show how to update or replace existing instrumentation after it executes. This ability is useful for clients performing adaptive optimization. Here, however, we are interested in recording the direction of all conditional branches, and we wish to remove the overhead of instrumentation once we have gathered that information. This code could form part of a dynamic CFG builder, where we want to observe the control-flow edges that execute at runtime but remove the instrumentation after it executes.

While DynamoRIO supports direct fragment replacement, another method for re-instrumentation is to flush the fragment from the code cache and rebuild it in the basic block event callback. In other words, we take the following approach:

In the basic block event callback, insert separate instrumentation for the taken and fall-through edges.
When the basic block executes, note the direction taken and flush the fragment from the code cache.
When the basic block event triggers again, insert instrumentation only for the unseen edge. After both edges have triggered, remove all instrumentation for the cbr.

We insert separate clean calls for the taken and fall-through cases. In each clean call, we record the observed direction and immediately flush the basic block using dr_flush_region(). Since that routine removes the calling block, we redirect execution to the target or fall-through address with dr_redirect_execution(). The file cbr.c contains the full code for this sample.
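
The per-branch bookkeeping behind this approach can be sketched as a small standalone state machine. The flag names and the helpers record_edge and edges_to_instrument are hypothetical and do not come from cbr.c. The sketch models only the decision of which edges still need instrumentation when the block is rebuilt, not the flushing or redirection.

```c
#include <assert.h>

/* Hypothetical bit flags for the two edges of one conditional branch. */
enum {
    EDGE_NONE = 0x0,
    EDGE_TAKEN = 0x1,
    EDGE_FALLTHROUGH = 0x2,
    EDGE_BOTH = EDGE_TAKEN | EDGE_FALLTHROUGH
};

/* Model of a clean call: remember that one edge has executed. */
static unsigned
record_edge(unsigned seen, unsigned edge)
{
    assert(edge == EDGE_TAKEN || edge == EDGE_FALLTHROUGH);
    return seen | edge;
}

/* Model of the basic block event: report which edges are still unseen
 * and therefore still need instrumentation when the block is rebuilt.
 */
static unsigned
edges_to_instrument(unsigned seen)
{
    return EDGE_BOTH & ~seen;
}
```

When edges_to_instrument returns EDGE_NONE, both directions have been observed and the rebuilt block can be left uninstrumented.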

Optimization

For the next example we consider a client application for a simple optimization. The optimizer replaces every increment/decrement operation with a corresponding add/subtract operation if running on a Pentium 4, where the add/subtract is less expensive. For optimizations, we are less concerned with covering all the code that is executed; on the contrary, in order to amortize the optimization overhead, we only want to apply the optimization to hot code. Thus, we apply the optimization at the trace level rather than the basic block level. Full code for this example is in the file inc2add.c.

Custom Tracing

This example demonstrates the custom tracing interface. It changes DynamoRIO's tracing behavior to favor making traces that start at a call and end right after a return. It demonstrates the use of both custom trace API elements:

int query_end_trace(void *drcontext, void *trace_tag, void *next_tag);

bool dr_mark_trace_head(void *drcontext, void *tag);

Full code for this example is in the file inline.c.

Use of x87 Floating Point Operation in a Client

Because saving the x87 floating point state is very expensive, on x86 DynamoRIO seeks to do so only on an as-needed basis. If a client wishes to use floating point operations and is unsure whether its compiler will use x87 or not, or if it wishes to use MMX registers, it must save and restore the application's floating point state around the usage. For an inserted clean call out of the code cache, this can be conveniently done using dr_insert_clean_call() and passing true for the save_fpstate parameter. It can also be done explicitly using these routines:

size_t proc_save_fpstate(byte *buf);

void proc_restore_fpstate(byte *buf);

These routines must be used if x87 floating point operations are performed in non-inserted-call locations, such as event callbacks. Note that there are restrictions on how these methods may be called: see the documentation in the header files for additional information. Note also that the floating point state must be saved around calls to our provided printing routines when they are used to print floats. However, it is not necessary to save and restore the floating point state around floating point operations if they are being used in the initialization or termination routines.

On ARM and AArch64 the SIMD/FP registers are always saved, so proc_save_fpstate and proc_restore_fpstate are no-ops. On x86, modern compilers typically do not use x87 operations, but to be safe clients are still advised to either avoid floating-point operations or use the preservation routines listed here.

This example client counts the number of basic blocks processed and keeps statistics on their average size using floating point operations. Full code for this example is in the file bbsize.c.
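
The statistics involved can be sketched in standalone C as a running maximum and an incrementally updated average. The names bb_stats_t and stats_add_block are hypothetical. In an actual client, the floating point arithmetic shown here is the part that would need the state preservation described above, along with whatever locking the client uses for shared data.

```c
#include <assert.h>
#include <stddef.h>

/* Standalone model of basic block size statistics. */
typedef struct {
    unsigned num_bb;   /* basic blocks processed so far */
    unsigned max_size; /* largest block seen, in instructions */
    double ave_size;   /* running average block size */
} bb_stats_t;

/* Fold one block of cur_size instructions into the statistics. */
static void
stats_add_block(bb_stats_t *stats, unsigned cur_size)
{
    assert(stats != NULL);
    if (cur_size > stats->max_size)
        stats->max_size = cur_size;
    /* Incremental mean: re-weight the old average and add the new sample. */
    stats->ave_size = (stats->ave_size * stats->num_bb + cur_size) /
        (double)(stats->num_bb + 1);
    stats->num_bb++;
}
```

Updating the average incrementally avoids storing every block size, at the cost of performing a floating point operation each time a block is processed.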

Use of Custom Client Statistics with the Windows GUI

The new Windows GUI displays custom client statistics if they are placed in shared memory with a certain name. The sample stats.c demonstrates the protocol in the form of a client that counts total instructions, floating-point instructions, and system calls.

Note that the stats.c example client and the Windows GUI must both be run within the same session in order for the statistics to be shared properly. They can be modified to use a "Global" prefix instead of "Local" for cross-session sharing, though this requires running with administrative privileges.

Use of Standalone API

The binary tracedump reader also functions as an example of the Disassembly Library: tracedump.c.

DynamoRIO version 11.91.20503 --- Fri Feb 20 2026 03:50:16