TL;DR
This blog post explains how ltrace works internally. It is a companion to our previous blog post, which describes strace internals.
We’ll begin by examining the difference between ltrace and strace. Next, we’ll move on to examining the ptrace system call and the ways in which ltrace uses it to get information about library calls being made in a running process.
ltrace vs strace
strace is a system call and signal tracer. It is primarily used to trace system calls (that is, function calls made from programs to the kernel), print the arguments passed to system calls, print return values, timing information and more. It can also trace and output information about signals received by the process.
As described in our previous blog post, strace relies on the ptrace system call.
ltrace is a library call tracer and it is primarily used to trace calls made by programs to library functions. It can also trace system calls and signals, like strace.
Both programs have some similar command line options for things like printing timing information, return values, attaching to running processes, and following forked processes.
ltrace also relies on the ptrace system call, but tracing library functions works differently than tracing system calls and this is where the tools differ.
Before we can examine how ltrace uses ptrace, we need to understand a few key ideas.
How do programs call functions in libraries?
Shared libraries can be loaded at any address. This means that functions within shared libraries will exist at an address that is not known until the library is loaded at runtime. Subsequent runs of the same program with the same libraries may load those libraries at different addresses (for example, when address space layout randomization is enabled).
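To see this for yourself, here is a minimal Python sketch (Python is our choice for illustration, and the helper names address_of and address_in_fresh_process are hypothetical). It uses ctypes to ask where a C library function such as puts is mapped in the current process, then asks freshly started interpreters the same question. It assumes a system where ctypes.CDLL(None) exposes the C library, as it does on typical Linux installs.

```python
import ctypes
import subprocess
import sys


def address_of(symbol: str) -> int:
    """Return the address at which `symbol` is mapped in this process."""
    # CDLL(None) gives a handle to the symbols already loaded into the
    # process, which includes the C library.
    handle = ctypes.CDLL(None)
    return ctypes.cast(getattr(handle, symbol), ctypes.c_void_p).value


def address_in_fresh_process(symbol: str) -> int:
    """Start a new interpreter and ask it for the address of the same symbol."""
    code = (
        "import ctypes; "
        "h = ctypes.CDLL(None); "
        f"print(ctypes.cast(getattr(h, {symbol!r}), ctypes.c_void_p).value)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return int(result.stdout)


if __name__ == "__main__":
    print(hex(address_of("puts")))
    print(hex(address_in_fresh_process("puts")))
    print(hex(address_in_fresh_process("puts")))
```

With address space layout randomization enabled, the three printed addresses will usually differ. With it disabled, they may match, which is why the sketch makes no claim about the exact output.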
So, how do programs call functions at addresses that are unknown?
The short answer is: it depends on the binary format, operating system, and loader. On Linux, it is a delicate dance between the program and the dynamic linker.
And now, the long answer.
Programs on Linux use the ELF binary format, which provides for many features. For the purposes of understanding how library functions are called, we’ll direct our attention to the Procedure Linkage Table (PLT) and the Global Offset Table (GOT).
The PLT contains a small group of assembly instructions for each library function; that group executes when the library function is called. These groups of instructions are often called “trampolines.”
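Here’s an example of a PLT trampoline, following the template given in the System V AMD64 ABI:
    .PLT1: jmp *name1@GOTPCREL(%rip)
           pushq $index1
           jmp .PLT0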
All of the entries in the PLT follow the same template.
This code starts by jumping to the address stored in an entry in the GOT.
The GOT contains a list of absolute addresses. At program start, these addresses are initialized to point to the pushq instruction inside the PLT (the second line in the assembly code above).
The pushq code executes to store some data for the dynamic linker, and the jmp on line 3 runs, which transfers execution to another piece of code that calls into the dynamic linker.
The dynamic linker then uses the value $index1 and other data to find out which library function the program tried to call. It locates the address of the library function and writes it to the entry in the GOT, overwriting the previous entry, which pointed inside of the PLT.
Any call to the same library function after this point will execute the function directly instead of invoking the dynamic linker.
You can read the System V AMD64 ABI starting at page 75 for the full, in-depth walkthrough of this process.
To review, the high level summary is:
- When the program is loaded into memory, the program and each dynamic shared object (DSO for short, also known as a shared library) has its PLT and GOT mapped into memory.
- At the start of execution, the memory locations of functions in a shared library are not known. This is because a shared library can be loaded at any address in the address space of a program.
- When a library function is called, execution is transferred to the function’s PLT entry. The PLT entry is a set of assembly instructions (called a ‘trampoline’).
- This ‘trampoline’ arranges data about the function the program was trying to call and invokes the dynamic linker.
- The dynamic linker runs, takes the data arranged by the PLT trampoline and uses it to find the address of the function that the program is trying to call.
- Once found, the address is written to the GOT and execution is transferred to the function.
- Subsequent calls to the same function do not invoke the dynamic linker. Instead, the PLT entry jumps directly to the function using the address stored in the GOT.
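The steps above can be modeled in a few lines of Python. This is a toy sketch, not how the dynamic linker is implemented: a dict stands in for the GOT, a closure stands in for each PLT trampoline, and the names ToyProcess, call, and resolver_calls are invented for illustration.

```python
class ToyProcess:
    """Toy model of lazy binding: a dict plays the GOT, closures play PLT stubs."""

    def __init__(self, library):
        self.library = library      # symbol name -> function, stands in for a loaded DSO
        self.got = {}               # symbol name -> current jump target
        self.resolver_calls = 0     # how many times the "dynamic linker" ran
        for name in library:
            # At program start, each GOT slot points back into its PLT stub.
            self.got[name] = self._make_stub(name)

    def _make_stub(self, name):
        def stub(*args):
            # Plays the role of the pushq + jmp pair: hand the symbol to the
            # dynamic linker, which resolves it and patches the GOT slot.
            self.resolver_calls += 1
            target = self.library[name]
            self.got[name] = target
            return target(*args)
        return stub

    def call(self, name, *args):
        # Plays the role of the first PLT instruction: jump through the GOT.
        return self.got[name](*args)


if __name__ == "__main__":
    proc = ToyProcess({"strlen": len})
    print(proc.call("strlen", "hello"), proc.resolver_calls)
    print(proc.call("strlen", "hi"), proc.resolver_calls)
```

The first call goes through the stub and bumps resolver_calls; the second call finds the real function already in the GOT slot, so the counter stays where it was.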
In order to hook library function calls, ltrace must insert itself into this workflow.
ltrace inserts itself by setting a software breakpoint in the PLT entry for a particular library function.
What is a breakpoint and how does it work?
A breakpoint is a way of getting a program to stop executing at a certain point so that another program (like a debugger, or tracer, or similar) can intervene.
There are two types of breakpoints: hardware breakpoints and software breakpoints.
A hardware breakpoint is a feature of the CPU and is a very limited resource. On the amd64 CPU, there are special debug registers that can be set with an address at which to halt execution. Only 4 of these registers can be used for storing breakpoint addresses.
A software breakpoint is a breakpoint that is triggered by running a special assembly instruction. You can write this instruction as many times as you like. Thus, the number of software breakpoints you can set is unlimited.
On the amd64 CPU, a software breakpoint is the following assembly instruction:
    int $3
This instruction causes the processor to raise interrupt #3, which is a special interrupt for debuggers. The Linux kernel has a handler that executes when interrupt #3 is triggered. The kernel delivers a SIGTRAP to the program which has executed the instruction.
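A small Python model can make the mechanics of a software breakpoint concrete. It treats a bytearray as the program's code and uses 0xCC, the single-byte encoding of the amd64 breakpoint instruction. The stub bytes and the helper names set_breakpoint and clear_breakpoint are illustrative and not taken from any real tracer.

```python
INT3 = 0xCC  # one-byte encoding of the amd64 breakpoint instruction


def set_breakpoint(code: bytearray, offset: int) -> int:
    """Overwrite one byte with INT3 and return the byte that was there."""
    original = code[offset]
    code[offset] = INT3
    return original


def clear_breakpoint(code: bytearray, offset: int, original: int) -> None:
    """Put the saved byte back so the original instruction is whole again."""
    code[offset] = original


if __name__ == "__main__":
    # Illustrative bytes shaped like a PLT entry:
    # jmp *disp32(%rip); pushq $imm32; jmp rel32
    stub = bytearray.fromhex("ff25e20f2000" "6800000000" "e9e0ffffff")
    saved = set_breakpoint(stub, 0)
    print(stub.hex())
    clear_breakpoint(stub, 0, saved)
    print(stub.hex())
```

Because the breakpoint is a single byte, only the first byte of the overwritten instruction has to be remembered in order to undo the change later.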
Recall from our previous article about strace that a tracer can attach to a program using ptrace. Any signal delivered to the traced program (other than SIGKILL) causes that program to halt while the tracer is notified about the signal.
So:
- A program executes int $3 and execution is halted.
- The Linux kernel’s handler for this interrupt is triggered.
- The handler eventually calls into some code in the Linux kernel which delivers a SIGTRAP to the program.
- If the program has been attached to by another program that used ptrace, that other program will be notified that a SIGTRAP is pending.
(This is similar to what happens if you use the PTRACE_SYSCALL argument described in our other blog post.)
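Here is a rough sketch of that notification flow, written in Python with ctypes. It makes several assumptions: a Linux system, a C library whose ptrace wrapper can be reached through ctypes.CDLL(None), and an environment that permits tracing. To stay architecture independent, the child sends itself SIGTRAP with kill instead of executing int $3, and the function name observe_sigtrap is hypothetical.

```python
import ctypes
import os
import signal

# Request numbers from <sys/ptrace.h> on Linux.
PTRACE_TRACEME = 0
PTRACE_CONT = 7

libc = ctypes.CDLL(None, use_errno=True)
libc.ptrace.restype = ctypes.c_long
libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long, ctypes.c_void_p, ctypes.c_void_p]


def observe_sigtrap():
    """Fork a traced child that sends itself SIGTRAP; return (stop signal, exit code)."""
    pid = os.fork()
    if pid == 0:
        try:
            # Child: ask to be traced by the parent, then signal itself.
            if libc.ptrace(PTRACE_TRACEME, 0, None, None) != 0:
                os._exit(254)  # tracing is not permitted here
            os.kill(os.getpid(), signal.SIGTRAP)
            os._exit(0)  # reached only if the tracer suppressed the signal
        except BaseException:
            os._exit(255)

    # Tracer: waitpid() returns once the child halts with a signal pending.
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        return None, None
    stop_signal = os.WSTOPSIG(status)

    # Resume the child. A data argument of 0 means "do not deliver the signal".
    libc.ptrace(PTRACE_CONT, pid, None, None)
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return stop_signal, exit_code


if __name__ == "__main__":
    print(observe_sigtrap())
```

The tracer learns about the pending signal through waitpid, and it gets to decide whether the traced program ever sees that signal. Here it resumes the child without delivering SIGTRAP, so the child exits normally.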
So, how does a tracer or debugger insert this int $3 instruction into a program?
ptrace + PTRACE_POKETEXT to modify memory in running programs
The ptrace system call takes a request argument which can be set to PTRACE_POKETEXT. The PTRACE_POKETEXT argument allows the program calling ptrace to modify memory in a running process.
Debuggers and tracers can use PTRACE_POKETEXT to write the int $3 instruction into a program’s memory while it is running. This is how breakpoints are set in programs.
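The following Python sketch shows PTRACE_POKETEXT changing memory in another process. It is an illustration under assumptions, not ltrace's code: it assumes Linux, a ptrace wrapper reachable through ctypes.CDLL(None), and permission to trace. Instead of patching instructions, it overwrites a data word that the child later reports through its exit code (on Linux, PTRACE_POKETEXT and PTRACE_POKEDATA are equivalent). The name poke_child_word is hypothetical.

```python
import ctypes
import os
import signal

# Request numbers from <sys/ptrace.h> on Linux.
PTRACE_TRACEME = 0
PTRACE_POKETEXT = 4
PTRACE_CONT = 7

libc = ctypes.CDLL(None, use_errno=True)
libc.ptrace.restype = ctypes.c_long
libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long, ctypes.c_void_p, ctypes.c_void_p]


def poke_child_word(new_value: int) -> int:
    """Overwrite one word in a stopped child; return the exit code it derives from it."""
    word = ctypes.c_long(1)            # after fork(), the child has its own copy
    address = ctypes.addressof(word)   # located at the same virtual address

    pid = os.fork()
    if pid == 0:
        try:
            if libc.ptrace(PTRACE_TRACEME, 0, None, None) != 0:
                os._exit(254)  # tracing is not permitted here
            os.kill(os.getpid(), signal.SIGSTOP)  # halt until the tracer resumes us
            os._exit(word.value & 0xFF)           # report what the word holds now
        except BaseException:
            os._exit(255)

    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        return -1
    # Write new_value into the child's copy of the word, then let it continue.
    libc.ptrace(PTRACE_POKETEXT, pid, address, new_value)
    libc.ptrace(PTRACE_CONT, pid, None, None)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(poke_child_word(42))
```

PTRACE_POKETEXT writes a whole machine word at a time. A tracer that wants to change a single byte, such as the first byte of an instruction, typically reads the word with PTRACE_PEEKTEXT, replaces that one byte, and writes the word back.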
ptrace + PTRACE_POKETEXT + int $3 = ltrace
Combine all of the things examined previously and you end up with ltrace.
ltrace works by:
- Attaching to the running program with ptrace
- Locating the PLT of a program
- Using ptrace with PTRACE_POKETEXT to overwrite the assembly trampolines in the program’s PLT entry for each library function with the int $3 instruction
- Resuming execution of the program
Then, when a program makes a library call, it will execute the int $3 instruction:
- Program executes int $3
- Linux kernel handler for int $3 runs
- Kernel notifies ltrace that a SIGTRAP is pending for the process
- ltrace examines the program to determine which library call it was making and prints the library call name, arguments, time stamps, and other data requested by the user.
And finally, ltrace must replace the int $3 instruction it wrote in the PLT with the original code, so that the program can be resumed and execute correctly:
- ltrace uses PTRACE_POKETEXT to restore the original instruction
- Program execution is resumed
- Program continues executing as expected because the original instruction was restored.
And this is how ltrace traces calls to library functions.
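To tie the steps together, here is a toy Python model of the bookkeeping a tracer like this needs. It is a sketch under assumptions and not ltrace's implementation: a bytearray stands in for the traced program's memory, the PLT entry addresses are handed in rather than located in an ELF file, and the names ToyTracer and on_sigtrap are invented for illustration.

```python
INT3 = 0xCC  # one-byte encoding of the amd64 breakpoint instruction


class ToyTracer:
    """Toy model: plant breakpoints on PLT entries, then handle each trap."""

    def __init__(self, memory: bytearray, plt_entries: dict):
        self.memory = memory
        self.saved = {}   # breakpoint address -> (function name, original byte)
        self.log = []     # names of the library calls seen so far
        for name, address in plt_entries.items():
            self.saved[address] = (name, memory[address])
            memory[address] = INT3

    def on_sigtrap(self, instruction_pointer: int) -> int:
        """Handle one trap and return the address at which to resume."""
        # The breakpoint instruction is one byte long, so by the time the
        # trap is reported the instruction pointer is one past its address.
        address = instruction_pointer - 1
        name, original = self.saved[address]
        self.log.append(name)
        self.memory[address] = original   # restore the original instruction
        return address                    # resume at the restored instruction


if __name__ == "__main__":
    memory = bytearray(b"\xff\x25" + bytes(14) + b"\xff\x25" + bytes(14))
    tracer = ToyTracer(memory, {"puts": 0, "malloc": 16})
    resume_at = tracer.on_sigtrap(17)     # pretend the program hit malloc's entry
    print(tracer.log, resume_at)
```

One detail this model leaves out: once the original byte is restored, the breakpoint is gone. A tracer that wants to see the next call to the same function has to put the breakpoint back after the program has moved past that instruction.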
Conclusion
The ptrace system call is incredibly powerful and can be used to trace system calls, overwrite memory in a running program, read registers in a running program, and more.
strace and ltrace both use PTRACE_SYSCALL to trace system calls. ltrace also uses PTRACE_POKETEXT to overwrite memory in a program in order to write a special instruction which halts the program.
To learn more about PTRACE_SYSCALL internals, read our previous blog post about strace.
In both cases, the kernel generates a SIGTRAP for the traced program, halts it, and notifies the tracer (strace or ltrace) that a signal is pending. This is how ltrace and strace are “woken up” to analyze a halted program.