DynamoRIO
We provide several examples in the samples directory of the release package to illustrate how the DynamoRIO API is used to build a DynamoRIO client.
There are also samples for the Dr. Memory Framework located in the drmemory/drmf/samples directory of the release package.
For larger examples of clients, see Available Tools, which describes end-user clients that are larger and more polished than these samples.
List of Samples
Each sample below is in the api/samples directory. The links point to the corresponding source file in the Git repository.
bbbuf.c
The sample bbbuf.c records each basic block’s start PC into a per-thread fast circular buffer (default 64KB) by inserting inline instrumentation at the first instruction of each basic block using the drx_buf extension.
- DynamoRIO concepts demonstrated
- TLS-backed per-thread buffers via DynamoRIO eXtension utilities (Buffer Filling API)
- Basic block instrumentation at block entry
- Scratch register management with Register Management
- Typical use cases
- Hot path profiling based on basic block history
- Lightweight execution tracing for debugging
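The idea behind a circular PC buffer can be modeled in ordinary C, independent of DynamoRIO. Each write stores a PC in the current slot and advances an index that wraps around, so the buffer always holds the most recent block starts. This is a standalone sketch, not drx_buf code. The names pc_ring_t, pc_ring_push, and pc_ring_recent and the 8-entry capacity are hypothetical. drx_buf performs the wraparound with inline instrumentation in the code cache rather than through a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; must be a power of two so that masking wraps the index. */
#define PC_RING_ENTRIES 8

typedef struct {
    uintptr_t pcs[PC_RING_ENTRIES]; /* most recent basic-block start PCs */
    size_t next;                    /* total number of writes so far */
} pc_ring_t;

/* Record a basic block's start PC, overwriting the oldest entry once full. */
void pc_ring_push(pc_ring_t *ring, uintptr_t pc)
{
    ring->pcs[ring->next & (PC_RING_ENTRIES - 1)] = pc;
    ring->next++;
}

/* Return the i-th most recent PC (i == 0 is the newest); i must be smaller
 * than the number of live entries. */
uintptr_t pc_ring_recent(const pc_ring_t *ring, size_t i)
{
    size_t live = ring->next < PC_RING_ENTRIES ? ring->next : PC_RING_ENTRIES;
    assert(i < live);
    return ring->pcs[(ring->next - 1 - i) & (PC_RING_ENTRIES - 1)];
}
```

After more than PC_RING_ENTRIES writes, the oldest entries are silently overwritten. This is the trade-off that makes such a buffer cheap enough for block-history profiling.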
bbcount.c
The sample bbcount.c counts dynamic basic block executions by inserting a non-atomic global counter update at the start of each basic block using DynamoRIO eXtension utilities counter helpers.
- DynamoRIO concepts demonstrated
- Basic block instrumentation
- Inline counter updates with DynamoRIO eXtension utilities
- AFLAGS liveness checks via Register Management
- Typical use cases
- Measuring overall execution volume
- Comparing workloads by basic block activity
bbsize.c
The sample bbsize.c computes the number, maximum size, and average size of basic blocks using a basic-block analysis callback and explicit floating-point state save/restore.
- DynamoRIO concepts demonstrated
- Basic block analysis callbacks
- Floating-point state preservation in callbacks
- Shared statistics protected by a mutex
- Typical use cases
- Characterizing code layout behavior
- Comparing compiler output across binaries
callstack.cpp
The sample callstack.cpp wraps a configurable function (default: malloc) and prints a symbolized call stack on each call using Function Wrapping and Replacing Extension, Callstack Walking, and Symbol Access Library.
- DynamoRIO concepts demonstrated
- Function wrapping with Function Wrapping and Replacing Extension
- Call stack walking via the Callstack Walking extension
- Symbol lookup with Symbol Access Library
- Runtime options via DynamoRIO Option Parser
- Typical use cases
- Debugging unexpected call paths
- Capturing stack traces at selected points
cbr.c
The sample cbr.c instruments conditional branches with separate taken and not-taken clean calls and records which edges execute. Once an edge has been seen, it flushes the block and redirects execution, so the rebuilt block no longer carries instrumentation for that edge.
- DynamoRIO concepts demonstrated
- Conditional branch instrumentation with clean calls
- Fragment flushing and execution redirection
- Per-branch state tracking with a hash table
- Typical use cases
- Building dynamic control-flow graphs
- Reducing overhead after edge discovery
cbrtrace.c
The sample cbrtrace.c logs each conditional branch’s address, fall-through, target, and taken direction to per-thread log files using dr_insert_cbr_instrumentation_ex().
- DynamoRIO concepts demonstrated
- Conditional branch instrumentation
- Thread-local storage via drmgr TLS fields (Multi-Instrumentation Manager)
- Per-thread log file output
- Typical use cases
- Debugging branch behavior
- Building branch outcome traces
countcalls.c
The sample countcalls.c counts dynamic direct calls, indirect calls, and returns using per-thread TLS data and inline counter updates with flag preservation.
- DynamoRIO concepts demonstrated
- TLS-based per-thread data via drmgr (Multi-Instrumentation Manager)
- Instruction classification for calls and returns
- Inline counter updates with Register Management
- Typical use cases
- Profiling call/return behavior
- Comparing indirect call frequency across inputs
div.c
The sample div.c counts unsigned division instructions and how often the runtime divisor is a power of two, using a clean call to read the divisor value.
- DynamoRIO concepts demonstrated
- Clean calls for operand inspection
- Runtime operand value extraction
- Thread-safe counters with a mutex
- Typical use cases
- Identifying optimization opportunities
- Finding expensive divisions in hot code
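The test applied to each observed divisor is simple: a nonzero unsigned value is a power of two exactly when it has a single set bit. Such a division could instead be done with a right shift, which is why the count points at optimization opportunities. This is a standalone sketch with hypothetical names (is_power_of_two, div_stats_t, record_division). The sample itself reads the divisor inside a clean call and protects its counters with a mutex, and both of those steps are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clearing the lowest set bit with d & (d - 1) leaves zero only when d had
 * exactly one bit set; zero itself is excluded explicitly. */
bool is_power_of_two(uint64_t divisor)
{
    return divisor != 0 && (divisor & (divisor - 1)) == 0;
}

/* Hypothetical counters mirroring what the clean call might update. */
typedef struct {
    uint64_t div_count;  /* unsigned divisions observed */
    uint64_t pow2_count; /* of those, divisions by a power of two */
} div_stats_t;

void record_division(div_stats_t *stats, uint64_t divisor)
{
    stats->div_count++;
    if (is_power_of_two(divisor))
        stats->pow2_count++;
}
```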
empty.c
The sample empty.c is a minimal client that only registers an exit event and performs no instrumentation.
- DynamoRIO concepts demonstrated
- Client initialization
- Exit event handling
- Typical use cases
- Starting point for new clients
- Verifying build and load pipelines
hot_bbcount.c
The sample hot_bbcount.c uses Basic Block Duplicator to create cold and hot versions of basic block instrumentation, switching to inline counter updates after a per-block hit threshold is reached.
- DynamoRIO concepts demonstrated
- Basic block duplication (Basic Block Duplicator)
- Clean calls versus inline counter updates
- Runtime case encoding stored in raw TLS
- Typical use cases
- Hot path profiling with reduced overhead
- Tiered instrumentation strategies
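The cold/hot switch can be modeled as a small per-block state machine. The cold version performs the more expensive bookkeeping and promotes the block once it reaches a threshold, and the hot version only bumps a counter. This is a standalone sketch: the threshold value and all names are hypothetical. The boolean field stands in for the runtime case value that the Basic Block Duplicator uses to pick which copy of the block runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of cold executions before a block is treated as hot. */
#define HOT_THRESHOLD 3

typedef struct {
    uint64_t cold_hits; /* executions counted on the slow, clean-call-style path */
    uint64_t hot_hits;  /* executions counted on the fast, inline-style path */
    bool hot;           /* which version of the instrumentation is active */
} block_profile_t;

/* Simulate one execution of a block under whichever version is active. */
void execute_block(block_profile_t *b)
{
    if (b->hot) {
        b->hot_hits++;
        return;
    }
    b->cold_hits++;
    if (b->cold_hits >= HOT_THRESHOLD)
        b->hot = true; /* later executions take the cheap path */
}

uint64_t total_executions(const block_profile_t *b)
{
    return b->cold_hits + b->hot_hits;
}
```

The total stays exact across the switch. Only the cost per execution changes.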
inc2add.c
The sample inc2add.c performs an app2app transformation that replaces inc/dec with add/sub in traces when the carry flag is not live, using drreg liveness analysis to preserve correctness.
- DynamoRIO concepts demonstrated
- App2app transformations
- Instruction replacement
- Flag liveness checks with Register Management
- Typical use cases
- Exploring micro-architecture-specific optimizations
- Testing transformation pipelines
inline.c
The sample inline.c customizes trace formation to inline callees into traces by marking call sites as trace heads and ending traces after returns or size limits.
- DynamoRIO concepts demonstrated
- Custom trace control via end-trace events
- Trace head tracking with Multi-Instrumentation Manager and dr_mark_trace_head()
- Fragment lifecycle handling on deletion
- Typical use cases
- Dynamic inlining experiments
- Trace-level performance research
inscount.cpp
The sample inscount.cpp counts executed instructions by computing per-basic-block instruction counts in analysis and adding them via an auto-inlined clean call, with an option to count only application code.
- DynamoRIO concepts demonstrated
- Clean calls optimized by DynamoRIO
- Runtime options via DynamoRIO Option Parser
- Emulation-aware instruction counting with drmgr (Multi-Instrumentation Manager)
- Typical use cases
- Instruction count profiling
- Validating instrumentation overhead
instrace_simple.c
The sample instrace_simple.c records a per-thread instruction trace (PC and opcode) into a raw TLS buffer with inline instrumentation and flushes it to a text file using clean calls.
- DynamoRIO concepts demonstrated
- Raw TLS buffer management
- Inline per-instruction instrumentation
- Clean calls to flush per-thread buffers
- Typical use cases
- Building simple instruction traces
- Validating instrumentation correctness
- For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.
instrace_x86.c
The sample instrace_x86.c captures a per-thread instruction trace on x86 using inline buffer filling and a local code cache for lean clean calls when the buffer fills.
- DynamoRIO concepts demonstrated
- Local code cache generation
- Lean clean calls
- Per-thread buffering and file output
- Typical use cases
- High-performance instruction tracing
- Producing binary traces for offline analysis
- For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.
instrcalls.c
The sample instrcalls.c logs each direct call, indirect call, and return with target information to per-thread files, with optional symbolization via drsyms (Symbol Access Library).
- DynamoRIO concepts demonstrated
- Call and return instrumentation
- Per-thread logging
- Optional symbol lookup with drsyms
- Typical use cases
- Building call flow traces
- Investigating indirect call behavior
memtrace_simple.c
The sample memtrace_simple.c records instruction and memory-reference entries (type, size, address) into a per-thread buffer and dumps them to a text file using clean calls.
- DynamoRIO concepts demonstrated
- Memory operand analysis
- Instrumentation Utilities helpers for address and size
- DynamoRIO eXtension utilities expansion of string and scatter/gather operations
- Typical use cases
- Studying memory access patterns
- Validating memory reference instrumentation
- For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.
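A per-thread buffer of fixed-size (type, size, address) entries that is periodically dumped as text can be sketched in plain C. This models only the data layout and flush logic. The entry layout, the tiny 2-entry capacity, the flush-when-full policy, and the output format are all assumptions for illustration, and the sample's own flush points and format may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical entry kinds for this sketch. */
enum { REF_READ = 0, REF_WRITE = 1 };

typedef struct {
    unsigned short type; /* REF_READ or REF_WRITE */
    unsigned short size; /* bytes accessed */
    uintptr_t addr;      /* effective address */
} mem_ref_t;

#define BUF_ENTRIES 2 /* hypothetical, tiny so flushing is easy to observe */

typedef struct {
    mem_ref_t buf[BUF_ENTRIES];
    size_t used;    /* entries currently buffered */
    FILE *out;      /* text output; NULL discards */
    size_t flushed; /* entries dumped so far */
} thread_trace_t;

/* Dump all buffered entries as text lines and reset the buffer. */
void trace_flush(thread_trace_t *t)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->out != NULL)
            fprintf(t->out, "%#lx: %u, %s\n", (unsigned long)t->buf[i].addr,
                    (unsigned)t->buf[i].size, t->buf[i].type == REF_WRITE ? "w" : "r");
    }
    t->flushed += t->used;
    t->used = 0;
}

/* Append one reference, flushing first if the buffer is already full. */
void trace_record(thread_trace_t *t, unsigned short type, unsigned short size,
                  uintptr_t addr)
{
    if (t->used == BUF_ENTRIES)
        trace_flush(t);
    t->buf[t->used++] = (mem_ref_t){ type, size, addr };
}
```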
memtrace_x86.c
The sample memtrace_x86.c captures memory references on x86 with inline buffer filling and a local code cache for lean clean calls, writing binary or text traces per thread.
- DynamoRIO concepts demonstrated
- Instrumentation Utilities string and scatter/gather expansion
- Local code cache generation
- Per-thread buffering with lean clean calls
- Typical use cases
- High-performance memory tracing
- Generating binary traces for offline analysis
- For a full-featured instruction and data address tracing tool, see Tracing and Analysis Framework.
memval_simple.c
The sample memval_simple.c records memory write addresses and the written bytes using drx_buf trace and circular buffers, flushing via a trace-buffer fault handler to per-thread log files.
- DynamoRIO concepts demonstrated
- Post-instruction instrumentation for write values
- DynamoRIO eXtension utilities trace and circular buffers with a fault handler (Buffer Filling API)
- Memory operand address calculation with Instrumentation Utilities
- Typical use cases
- Tracking data writes for debugging
- Building lightweight memory value traces
modxfer.c
The sample modxfer.c counts instructions per module and tracks indirect branch transfers that cross from one module to another, logging summary statistics at exit.
- DynamoRIO concepts demonstrated
- Module load and unload tracking
- Indirect branch instrumentation
- Per-module counters with DynamoRIO eXtension utilities
- Typical use cases
- Studying inter-module call patterns
- Identifying frequent cross-module transfers
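Attributing a transfer to a pair of modules comes down to two address-range lookups and a matrix increment. This standalone sketch uses a fixed, hypothetical module table. A client would build its table from module load and unload events. As a simplification of this sketch, branches to or from addresses outside every known module are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MODULES 4 /* hypothetical table size */

typedef struct {
    uintptr_t start; /* first byte of the module */
    uintptr_t end;   /* one past the last byte */
} module_range_t;

typedef struct {
    module_range_t mods[MAX_MODULES];
    size_t num_mods;
    uint64_t xfer[MAX_MODULES][MAX_MODULES]; /* [source][target] counts */
} xfer_table_t;

/* Return the index of the module containing pc, or -1 if none does. */
int module_index(const xfer_table_t *t, uintptr_t pc)
{
    for (size_t i = 0; i < t->num_mods; i++) {
        if (pc >= t->mods[i].start && pc < t->mods[i].end)
            return (int)i;
    }
    return -1;
}

/* Count an indirect branch from src_pc to target_pc if it crosses modules. */
void record_indirect_branch(xfer_table_t *t, uintptr_t src_pc, uintptr_t target_pc)
{
    int src = module_index(t, src_pc);
    int dst = module_index(t, target_pc);
    if (src >= 0 && dst >= 0 && src != dst)
        t->xfer[src][dst]++;
}
```

The linear lookup is fine for a handful of modules. A real client would likely cache the module of the branch source per block.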
modxfer_app2lib.c
The sample modxfer_app2lib.c counts instructions in the main application versus libraries and tracks indirect call/jump transfers between them using per-basic-block clean calls.
- DynamoRIO concepts demonstrated
- Module range checks for app vs. library code
- Clean call updates of per-basic-block counts
- Indirect branch instrumentation for cross-module transfers
- Typical use cases
- Measuring time spent in app vs. libraries
- Detecting app-to-lib transition frequency
opcode_count.cpp
The sample opcode_count.cpp counts executions of a selected opcode and the total instruction count using drmgr opcode instrumentation events and drx counter updates.
- DynamoRIO concepts demonstrated
- Opcode instrumentation events via Multi-Instrumentation Manager
- Inline counter updates with DynamoRIO eXtension utilities
- Runtime options via DynamoRIO Option Parser
- Typical use cases
- Focused opcode profiling
- Regression checks for specific instruction patterns
opcodes.c
The sample opcodes.c counts dynamic instruction executions by opcode, grouped by ISA mode, and reports the top opcode counts at exit.
- DynamoRIO concepts demonstrated
- Basic block instrumentation
- Inline counter updates with DynamoRIO eXtension utilities
- ISA mode awareness
- Typical use cases
- Opcode mix profiling
- Comparing dynamic behavior across workloads
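Reporting the top opcode counts at exit amounts to sorting (opcode, count) pairs by count. This is a standalone sketch of that ranking step only. The opcode numbers are hypothetical placeholders, and the per-ISA-mode grouping that the sample performs is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int opcode;     /* hypothetical opcode number */
    uint64_t count; /* dynamic executions */
} opcode_count_t;

/* qsort comparator: higher counts first, ties broken by lower opcode number. */
int compare_counts_desc(const void *a, const void *b)
{
    const opcode_count_t *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return (x->opcode > y->opcode) - (x->opcode < y->opcode);
}

/* Sort in place; afterwards the first n entries are the top n opcodes. */
void sort_top_opcodes(opcode_count_t *table, size_t len)
{
    qsort(table, len, sizeof(*table), compare_counts_desc);
}
```

Comparing with explicit branches, rather than subtracting 64-bit counts, avoids overflow and truncation when converting to int.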
prefetch.c
The sample prefetch.c removes prefetch and prefetchw instructions on Intel CPUs using an app2app pass, counting the removals for reporting.
- DynamoRIO concepts demonstrated
- App2app instruction removal
- CPU vendor detection
- Thread-safe counting with a mutex
- Typical use cases
- Running AMD-optimized binaries on Intel CPUs
- Testing instruction stream sanitization
signal.c
The sample signal.c monitors UNIX signals, suppressing SIGTERM and redirecting SIGSEGV by skipping the faulting instruction while counting signals seen.
- DynamoRIO concepts demonstrated
- Signal event callbacks (UNIX)
- Signal suppression and redirection
- Atomic counters
- Typical use cases
- Observing crash signals in a controlled way
- Prototyping signal-based fault handling
ssljack.c
The sample ssljack.c wraps OpenSSL and GnuTLS read/write functions on module load and logs plaintext data to per-SSL-context files.
- DynamoRIO concepts demonstrated
- Module load events
- Function wrapping (Function Wrapping and Replacing Extension)
- Per-context file logging
- Typical use cases
- Inspecting decrypted SSL/TLS traffic
- Debugging application-level crypto usage
statecmp.c
The sample statecmp.c uses Machine State Comparison Library to detect instrumentation-induced state mismatches by intentionally clobbering flags and handling mismatches via a user callback.
- DynamoRIO concepts demonstrated
- Machine State Comparison Library integration
- Instrumentation correctness checking
- Custom mismatch callbacks
- Typical use cases
- Validating new instrumentation passes
- Debugging subtle state clobbers
stats.c
The sample stats.c (Windows-only) exports instruction, floating-point, and syscall counters via shared memory for the stats viewer, using drx counter updates per basic block.
- DynamoRIO concepts demonstrated
- Shared memory usage for client stats
- Inline counters with DynamoRIO eXtension utilities
- Windows-specific API interactions
- Typical use cases
- Live statistics dashboards
- Comparing runs without heavy logging
strace.c
The sample strace.c prints the name and result of every system call using Dr. Syscall (System Call Monitoring Extension) from DRMF.
- DynamoRIO concepts demonstrated
- Syscall event callbacks via Dr. Syscall: System Call Monitoring Extension
- Syscall name lookup with DRMF
- Result decoding
- Typical use cases
- Debugging unexpected syscalls
- Building basic syscall traces
stl_test.cpp
The sample stl_test.cpp is a C++ client that exercises STL containers (vector, list, map) and optionally uses a TLS variable on UNIX when SHOW_RESULTS is enabled.
- DynamoRIO concepts demonstrated
- C++ client setup
- STL usage in a client
- Exit event registration
- Typical use cases
- C++ client scaffolding
- Validating STL usage in DynamoRIO clients
syscall.c
The sample syscall.c monitors system calls, counts them, and modifies write syscalls (SYS_write/NtWriteFile) using drmgr’s pre/post syscall events and thread-context-local storage (Multi-Instrumentation Manager).
- DynamoRIO concepts demonstrated
- Syscall interception events
- Thread-context-local storage via drmgr CLS (Multi-Instrumentation Manager)
- Platform-specific syscall handling and parameter modification
- Typical use cases
- Auditing syscall activity
- Prototyping syscall rewriting
tracedump.c
The sample tracedump.c uses the standalone API to parse and disassemble binary trace dump files produced by the -tracedump_binary option.
- DynamoRIO concepts demonstrated
- Standalone API usage
- Trace dump file parsing
- Disassembly of cached code
- Typical use cases
- Inspecting trace dumps offline
- Debugging code cache behavior
utils.c
The sample utils.c provides shared logging helpers for samples, including unique per-process log file creation and FILE* stream helpers.
- DynamoRIO concepts demonstrated
- Client path discovery
- Log file creation helpers via DynamoRIO eXtension utilities
- Stream helper utilities
- Typical use cases
- Reusing logging utilities across sample clients
- Standardizing output file naming
wrap.c
The sample wrap.c wraps malloc (Linux) or HeapAlloc (Windows) to track the maximum allocation size and optionally force occasional allocation failures.
- DynamoRIO concepts demonstrated
- Module load callbacks
- Function wrapping (Function Wrapping and Replacing Extension)
- Return value modification
- Typical use cases
- Testing error handling paths
- Tracking allocation patterns
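The bookkeeping behind wrap.c can be separated from the wrapping itself: a pre-call hook records the requested size, and a post-call hook decides whether to replace the real result with NULL. This standalone sketch models only that decision logic. The "fail every Nth request" policy and all names are hypothetical. In a client, the post hook would apply such a decision through drwrap_set_retval().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: force every FAIL_EVERY-th allocation to fail. */
#define FAIL_EVERY 4

typedef struct {
    size_t max_size; /* largest size requested so far */
    unsigned calls;  /* allocation requests seen */
} alloc_stats_t;

/* Pre-call logic: note the requested size. */
void on_alloc_pre(alloc_stats_t *s, size_t requested)
{
    s->calls++;
    if (requested > s->max_size)
        s->max_size = requested;
}

/* Post-call logic: return the value the application should see, which is
 * NULL when a failure is being simulated. */
void *on_alloc_post(const alloc_stats_t *s, void *real_result)
{
    if (s->calls % FAIL_EVERY == 0)
        return NULL;
    return real_result;
}
```

Injecting failures at a predictable rate makes error-handling bugs reproducible from run to run.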
Discussion of Selected Samples
Instruction Counting
We now illustrate how to use the above API to implement a simple instrumentation client for counting the number of executed call and return instructions in the input program. Full code for this example is in the file countcalls.c.
The client maintains a set of three counters, num_direct_calls, num_indirect_calls, and num_returns, to count three different types of instructions during execution. It keeps both thread-private and global versions of these counters. The client initializes everything in its dr_client_main routine.
The client provides an event_exit routine that displays the final values of the global counters as well as a thread_exit routine that shows the counter totals on a per-thread basis.
The client keeps track of each thread's instruction counts separately. To do this, it uses a data structure holding the three counters, which is allocated separately for each thread.
The thread initialization and exit hooks are used to initialize this data structure and to display the thread-private totals.
The real work is done in the basic block hook. We simply look for the instructions we're interested in and insert an increment of the appropriate thread-local and global counters, remembering to save the flags. This sample has separate paths for incrementing the thread-private counts for shared vs. thread-private caches (see the -thread_private option) to illustrate the differences in targeting them. Note that the shared path would also work with private caches.
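The counter bookkeeping can be modeled in plain C to show how per-thread and global totals relate. In this standalone sketch, each executed instruction is classified and the matching per-thread counter is bumped. As a simplification that differs from the sample, the global totals are formed by merging each thread's counts at thread exit instead of being incremented inline alongside the thread-local ones. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Kinds of control transfer the client distinguishes. */
typedef enum { XFER_DIRECT_CALL, XFER_INDIRECT_CALL, XFER_RETURN, XFER_OTHER } xfer_kind_t;

/* Per-thread counters; a client would allocate one of these per thread. */
typedef struct {
    uint64_t num_direct_calls;
    uint64_t num_indirect_calls;
    uint64_t num_returns;
} per_thread_t;

/* Bump the counter matching one executed instruction. */
void count_instr(per_thread_t *counts, xfer_kind_t kind)
{
    switch (kind) {
    case XFER_DIRECT_CALL: counts->num_direct_calls++; break;
    case XFER_INDIRECT_CALL: counts->num_indirect_calls++; break;
    case XFER_RETURN: counts->num_returns++; break;
    default: break; /* not a call or return */
    }
}

/* Fold one thread's totals into the global totals, e.g. at thread exit. */
void merge_into_global(per_thread_t *global, const per_thread_t *thread)
{
    global->num_direct_calls += thread->num_direct_calls;
    global->num_indirect_calls += thread->num_indirect_calls;
    global->num_returns += thread->num_returns;
}
```

Merging at thread exit avoids contention on shared counters, but the global numbers are only complete once every thread has exited. Updating both counters inline, as the sample does, keeps the global totals current at all times.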
Building the Example
For general instructions on building a client, see How to Build a Tool.
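To build the instrcalls.c client using CMake, if DYNAMORIO_HOME is set to the base of the DynamoRIO release package, create and enter a build directory, configure the samples with cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples, and then run make.
To build 32-bit samples when using gcc with a default of 64-bit, set CFLAGS=-m32 in the environment for the cmake configuration step.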
The result is a shared library instrcalls.dll or libinstrcalls.so. To invoke the client library, follow the instructions under How to Run.
Instruction Profiling
The next example shows how to use the provided control flow instrumentation routines, which allow more sophisticated profiling than simply counting instructions. Full code for this example is in the file instrcalls.c.
As in the previous example, the client is interested in direct and indirect calls and returns. The client wants to analyze the target address of each dynamic instance of a call or return. For our example, we simply dump the data in text format to a separate file for each thread. Since FILE cannot be exported from a DLL on Windows, we use the DynamoRIO-provided file_t type that hides the distinction between FILE and HANDLE to allow the same code to work on Linux and Windows. We make use of the thread initialization and exit routines to open and close the file. We store the file for a thread in the user slot in the drcontext.
The basic block hook inserts a call to a procedure for each type of instruction, using the API-provided dr_insert_call_instrumentation and dr_insert_mbr_instrumentation routines, which insert calls to procedures with a certain signature.
Each inserted procedure receives two arguments: the address of the instruction and the address of its target. These routines could perform some sort of analysis based on these addresses; in our example we simply print out the data.
Modifying Existing Instrumentation
In this example, we show how to update or replace existing instrumentation after it executes. This ability is useful for clients performing adaptive optimization. In this example, however, we are interested in recording the direction of all conditional branches, but wish to remove the overhead of instrumentation once we've gathered that information. This code could form part of a dynamic CFG builder, where we want to observe the control-flow edges that execute at runtime, but remove the instrumentation after it executes.
While DynamoRIO supports direct fragment replacement, another method for re-instrumentation is to flush the fragment from the code cache and rebuild it in the basic block event callback. In other words, we take the following approach:
- In the basic block event callback, insert separate instrumentation for the taken and fall-through edges.
- When the basic block executes, note the direction taken and flush the fragment from the code cache.
- When the basic block event triggers again, insert instrumentation only for the unseen edge. After both edges have triggered, remove all instrumentation for the cbr.
We insert separate clean calls for the taken and fall-through cases. In each clean call, we record the observed direction and immediately flush the basic block using dr_flush_region(). Since that routine removes the calling block, we redirect execution to the target or fall-through address with dr_redirect_execution(). The file cbr.c contains the full code for this sample.
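The per-branch bookkeeping behind these steps is a two-bit state: which of the two edges has been seen. This standalone sketch models that state and the two questions the client asks of it. Was this edge new, so the block must be flushed? Which edges still need instrumentation when the block is rebuilt? The names are hypothetical, and the hash-table lookup by branch address, the flush, and the redirect are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Which edges of one conditional branch have been observed. */
typedef enum {
    CBR_NONE_SEEN = 0,
    CBR_TAKEN_SEEN = 1,
    CBR_FALLTHROUGH_SEEN = 2,
    CBR_BOTH_SEEN = CBR_TAKEN_SEEN | CBR_FALLTHROUGH_SEEN
} cbr_state_t;

/* Called from the taken or fall-through clean call. Returns true the first
 * time an edge is seen, meaning the block should be flushed and rebuilt. */
bool cbr_record_edge(cbr_state_t *state, bool taken)
{
    cbr_state_t bit = taken ? CBR_TAKEN_SEEN : CBR_FALLTHROUGH_SEEN;
    if (*state & bit)
        return false;
    *state = (cbr_state_t)(*state | bit);
    return true;
}

/* Queried when the block is (re)built: does each edge still need a clean call? */
bool cbr_needs_taken_instr(cbr_state_t state)
{
    return !(state & CBR_TAKEN_SEEN);
}

bool cbr_needs_fallthrough_instr(cbr_state_t state)
{
    return !(state & CBR_FALLTHROUGH_SEEN);
}
```

Once the state reaches CBR_BOTH_SEEN, both queries return false and the rebuilt block carries no instrumentation for the branch.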
Optimization
For the next example we consider a client application for a simple optimization. The optimizer replaces increment/decrement operations with corresponding add/subtract operations if running on a Pentium 4, where the add/subtract is less expensive. Because add and sub write the carry flag while inc and dec leave it unchanged, the replacement is only safe where the carry flag is not live, which the client checks using drreg liveness analysis. For optimizations, we are less concerned with covering all the code that is executed; on the contrary, in order to amortize the optimization overhead, we only want to apply the optimization to hot code. Thus, we apply the optimization at the trace level rather than the basic block level. Full code for this example is in the file inc2add.c.
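Why the carry flag is the only obstacle can be checked with a small model of the x86 flag semantics for 32-bit operands. add r, 1 and inc r produce the same result and the same ZF, SF, and OF (AF and PF also match but are not modeled here). The difference is that inc preserves the previous CF, while add computes a new one. This is a standalone C model written from the architectural flag definitions, not code from the sample, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The subset of x86 arithmetic flags modeled here. */
typedef struct {
    bool cf, zf, sf, of;
} flags_t;

/* add r, 1 on a 32-bit register: CF is the unsigned carry out. */
flags_t add_one(uint32_t x, uint32_t *result)
{
    uint32_t r = x + 1u;
    flags_t f = { r < x, r == 0, (r >> 31) != 0, x == 0x7fffffffu };
    *result = r;
    return f;
}

/* inc r on a 32-bit register: same result, ZF, SF, OF; CF left unchanged. */
flags_t inc(uint32_t x, bool old_cf, uint32_t *result)
{
    uint32_t r = x + 1u;
    flags_t f = { old_cf, r == 0, (r >> 31) != 0, x == 0x7fffffffu };
    *result = r;
    return f;
}
```

The two differ only in CF, and only when the old CF and the carry out disagree (for example, incrementing 0xffffffff with CF clear). If no later instruction reads CF before it is overwritten, that difference is unobservable.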
Custom Tracing
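This example demonstrates the custom tracing interface. It changes DynamoRIO's tracing behavior to favor traces that start at a call and end right after a return. It demonstrates both elements of the custom trace API: an end-trace event callback, which decides when a trace should stop growing, and dr_mark_trace_head(), which designates a block as a trace head.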
Full code for this example is in the file inline.c.
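The policy an end-trace callback applies here can be sketched as a pure decision function: stop right after the callee returns, and also stop if the trace grows past a size cap. This is a standalone model with hypothetical names, a hypothetical decision enum, and a hypothetical block limit. DynamoRIO's actual end-trace event receives block tags and returns one of DynamoRIO's own custom-trace action values, and inline.c tracks the needed state through its own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decision values for this sketch only. */
typedef enum { TRACE_CONTINUE, TRACE_END_NOW } trace_decision_t;

/* Hypothetical cap on the number of blocks added to one trace. */
#define MAX_TRACE_BLOCKS 8

/* Decide whether to stop extending a trace that started at a call site.
 * prev_block_ended_in_return: the block just added ended with a return.
 * blocks_so_far: number of blocks already in the trace. */
trace_decision_t decide_end_trace(bool prev_block_ended_in_return, size_t blocks_so_far)
{
    if (prev_block_ended_in_return)
        return TRACE_END_NOW; /* the inlined callee has returned */
    if (blocks_so_far >= MAX_TRACE_BLOCKS)
        return TRACE_END_NOW; /* keep traces from growing without bound */
    return TRACE_CONTINUE;
}
```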
Use of x87 Floating Point Operation in a Client
Because saving the x87 floating-point state is very expensive, on x86 DynamoRIO does so only on an as-needed basis. If a client wishes to use floating-point operations and is unsure whether its compiler will use x87, or if it wishes to use MMX registers, it must save and restore the application's floating-point state around that usage. For a clean call out of the code cache, this is conveniently done by passing true for the save_fpstate parameter of dr_insert_clean_call(). It can also be done explicitly using proc_save_fpstate() and proc_restore_fpstate().
These routines must be used if x87 floating point operations are performed in non-inserted-call locations, such as event callbacks. Note that there are restrictions on how these methods may be called: see the documentation in the header files for additional information. Note also that the floating point state must be saved around calls to our provided printing routines when they are used to print floats. However, it is not necessary to save and restore the floating point state around floating point operations if they are being used in the initialization or termination routines.
On ARM and AArch64 the SIMD/FP registers are always saved, so proc_save_fpstate and proc_restore_fpstate are no-ops. On x86, modern compilers typically do not use x87 operations, but to be safe clients are still advised to either avoid floating-point operations or use the preservation routines listed here.
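The explicit save routines write into a caller-provided buffer that must meet size and alignment requirements (see the header documentation for proc_save_fpstate()). A common way to get an aligned buffer on the stack is to over-allocate and round the pointer up. This standalone sketch shows only that rounding. The 512-byte size and 16-byte alignment are assumptions that match the x86 FXSAVE area, and a client should use the constants from the DynamoRIO headers rather than these literals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch; use DynamoRIO's header constants in a client. */
#define FPSTATE_BUF_SIZE 512
#define FPSTATE_ALIGN 16

/* Round an address up to the next multiple of align (a power of two). */
static inline uintptr_t align_forward(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Return an aligned pointer inside raw, which must hold
 * FPSTATE_BUF_SIZE + FPSTATE_ALIGN bytes so the aligned region still fits. */
unsigned char *aligned_fpstate_buf(unsigned char *raw)
{
    return (unsigned char *)align_forward((uintptr_t)raw, FPSTATE_ALIGN);
}
```

The extra FPSTATE_ALIGN bytes guarantee that the rounded-up pointer still has FPSTATE_BUF_SIZE usable bytes after it.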
This example client counts the number of basic blocks processed and keeps statistics on their average size using floating point operations. Full code for this example is in the file bbsize.c.
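The statistics themselves can be kept with an incremental mean, which needs only the running count and the current average. This is a standalone sketch, and the update formula is an assumption rather than a copy of bbsize.c. In a client, this update would run under the statistics mutex, with the floating-point state saved and restored around it on x86 as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Running statistics over basic-block sizes (instruction counts). */
typedef struct {
    uint64_t num_bbs;  /* blocks seen so far */
    uint64_t max_size; /* largest block seen */
    double ave_size;   /* running mean */
} bb_stats_t;

/* Fold one block's size into the statistics. The mean is updated
 * incrementally: mean += (x - mean) / n. */
void bb_stats_add(bb_stats_t *s, uint64_t size)
{
    s->num_bbs++;
    if (size > s->max_size)
        s->max_size = size;
    s->ave_size += ((double)size - s->ave_size) / (double)s->num_bbs;
}
```

For block sizes 2, 4, and 9, the mean goes 2, then 3, then 5.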
Use of Custom Client Statistics with the Windows GUI
The new Windows GUI will display custom client statistics, if they are placed in shared memory with a certain name. The sample stats.c gives code for the protocol used in the form of a sample client that counts total instructions, floating-point instructions, and system calls.
Note that the stats.c example client and the Windows GUI must both be run within the same session in order for the statistics to be shared properly. They can be modified to use a "Global" prefix instead of "Local" for cross-session sharing, though this requires running with administrative privileges.
Use of Standalone API
The binary tracedump reader, tracedump.c, also serves as an example of using the Disassembly Library from the standalone API.