Two frequently used system calls are ~77% slower on AWS EC2

Mar 07, 2017

TL;DR

This blog post dives into an interesting finding: two frequently used system calls (gettimeofday, clock_gettime) are much slower on AWS EC2.

Linux provides a mechanism for speeding up those two frequently used system calls by implementing the system call code in userland and avoiding the switch to the kernel entirely. This is done via a virtual shared library provided by the kernel that is mapped into the address space of every running program.

On EC2, the two system calls listed cannot use the vDSO as they normally would on other systems. This is because the virtualized clock source on Xen (and in some KVM configurations) does not support reading the time in userland via the vDSO.

There is no safe workaround for this. The user may decide to change their clock source to tsc by writing to a file in sysfs, but this is considered dangerous. Continue reading to learn more and to see the results of a microbenchmark.

Confirming you are affected by this issue

To quickly confirm if your system is affected by this issue, you can compile and run the following program with strace:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int
main(int argc, char *argv[])
{
        struct timeval tv;
        int i = 0;
        for (; i<100; i++) {
                gettimeofday(&tv,NULL);
        }

        return 0;
}

Compile with: gcc -o test test.c and then trace with strace -ce gettimeofday ./test

% strace -ce gettimeofday ./test
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0       100           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                   100           total

As you can see, strace counted 100 calls to gettimeofday. This means that the vDSO is not being used and real system calls are being made, each causing a context switch to the kernel. The Linux vDSO was designed with gettimeofday in mind (in fact, it’s even mentioned in the vDSO man page). Any system call that is served by the vDSO is executed completely in userland, causing no context switch to the kernel. As a result, any system call that successfully uses the vDSO will not appear in strace output.

Continue reading to learn why exactly this is and to see some interesting profiling data.

Prerequisite information

There are a few important things the reader will need to be familiar with in order to follow the explanation and code snippets that illustrate this issue.

Linux system calls and the vDSO

Before proceeding with this post, it is strongly recommended that the reader carefully read our previous post detailing how system calls work on Linux: The Definitive Guide to Linux System Calls.

As described in detail in that blog post, the vDSO is essentially a shared library that is provided by the kernel which is mapped into every process’ address space. When the gettimeofday, clock_gettime, getcpu, or time system calls are made, glibc will attempt to call the code provided by the vDSO. This code will access the needed data without entering the kernel, saving the process the overhead of making a real system call.

Because system calls made via the vDSO do not enter the kernel, strace is not notified that the vDSO system call was made. As a result, a program which calls gettimeofday successfully via the vDSO will not show gettimeofday in the strace output. You would need to use ltrace instead. Learn more about how strace works by reading an older post of ours.

On AWS EC2, gettimeofday appears in the strace output. This is because the vDSO falls back to a regular system call in certain situations.

Linux timekeeping

There are many different apparatuses that can be used for timekeeping on x86 systems running the Linux kernel.

Each system has its own benefits and drawbacks. More detailed information on each method is conveniently presented in the kernel source in Documentation/virtual/kvm/timekeeping.txt.

It’s important to understand that virtualization introduces many complexities when it comes to timekeeping. Some examples include:

The virtual machines running on a host now all share the same source of time, but it is impossible for every VM to have its time updated at exactly the same instant. Moreover, a VM may have interrupts disabled while executing critical sections of the kernel while the hypervisor may be happily generating timer interrupts.
Certain timekeeping systems (like the Time Stamp Counter) are themselves virtualized; reads from the TSC register may incur a performance penalty and can yield inaccurate readings or backward time drift.
Migration of VMs between hypervisors with different CPUs may be problematic if the timekeeping system relies on the clock rate of the processor.

The folks at VMware published a very interesting paper describing these and other timekeeping issues. The information is presented as being specific to VMware, but most of it is generally applicable to any virtualization system.

In order to deal with these and other issues, KVM and Xen provide their own timekeeping systems: the KVM PVclock and the Xen time implementation. In the Linux kernel each of these is commonly referred to as a clocksource.

The system’s current clocksource can be found by checking the file /sys/devices/system/clocksource/clocksource0/current_clocksource.

This is the clocksource which will be consulted when system calls like gettimeofday or clock_gettime are executed.

vDSO fallback mechanism

Let’s take a look at the vDSO code implementing gettimeofday for more clarity. Remember, this code is packaged with the kernel, but is actually run completely in userland.

If we examine the code in arch/x86/vdso/vclock_gettime.c and check the vDSO implementations for gettimeofday (__vdso_gettimeofday) and clock_gettime (__vdso_clock_gettime), we’ll find that both pieces of code have a similar conditional near the end of the function:

if (ret == VCLOCK_NONE)
	return vdso_fallback_gtod(tv, tz);

(The code for __vdso_clock_gettime has the same check, but calls vdso_fallback_gettime instead.)

If ret is set to VCLOCK_NONE, this indicates that the system’s current clocksource does not support the vDSO. In this case, the vdso_fallback_gtod failsafe function is called, which simply executes the system call normally: by entering the kernel and incurring all the normal overhead.

But, in which cases does ret get set to VCLOCK_NONE?

If we follow the code backward from this point, we’ll find that ret is set to the vclock_mode field of the current clocksource. Clocksources such as:

the High Precision Event Timer,
the Time Stamp Counter, and
in some cases, the KVM PVClock

all have their vclock_mode fields set to an identifier other than VCLOCK_NONE.

On the other hand, clocksources such as:

the Xen time implementation, and
systems where either CONFIG_PARAVIRT_CLOCK is not enabled in the kernel configuration or the CPU does not provide a paravirtualized clock feature

all have their vclock_mode fields set to VCLOCK_NONE (0).

AWS EC2 uses Xen. Xen’s default clocksource (xen) has its vclock_mode field set to VCLOCK_NONE which means EC2 instances will always fall back to using the slower system call path – the vDSO will never be used.

But, what effect does this have on performance?

Profiling the performance difference between regular system calls vs vDSO system calls

The purpose of the following microbenchmark is to measure the difference in wall clock time between the fast, vDSO-enabled gettimeofday calls and the regular, slow gettimeofday system calls.

In order to test this, we’ll run the sample program above with three different loop counts on an EC2 instance with the clocksource set to xen and then again with the clocksource set to tsc.

It is not safe to switch the clocksource to tsc on EC2. It is unlikely, but possible, that this will lead to unexpected backwards clock drift. Do not do this on your production systems.

Experiment setup

All tests were run:

with the Amazon Linux AMI 2016.09.1 (HVM), SSD Volume Type (AMI: ami-f173cc91),
on an m4.xlarge instance size,
in the us-west-2c availability zone.

We’ll time the execution of the program using the time program. Readers may wonder: “how can you use the time program if you are potentially destabilizing the clocksource?”

Luckily, the kernel developer Ingo Molnar wrote a program for detecting time warps: time-warp-test.c. Note that you will need to modify this program slightly for 64-bit x86 systems.

We ran this helpful program while performing our experiments to help detect if any time warps were experienced. There were none.

There are a few things that someone wishing to get a more scientific result could do to increase confidence in the result:

Backward clock drift is unlikely, but possible. Running the experiment many times, gathering the outcomes, and applying a statistical analysis can help filter out bad data points.
The experiment can be re-run on a non-virtualized system that would not suffer from clock drift: first run the test program as provided so that it exercises the vDSO, then modify the program to call the syscall directly.

For our purposes, running the time warp test to detect time warps while performing the experiment was sufficient.

Results

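The results of this microbenchmark show that the regular system call method used on EC2 is about 77% slower than the vDSO method, in the sense that the vDSO runs finish in roughly 77% less wall clock time (the regular path takes about 4.4 times as long):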

A tight loop of 5 million calls to gettimeofday:

vDSO enabled:
- real: 0m0.123s
- user: 0m0.120s
- sys: 0m0.000s
regular system call:
- real: 0m0.547s
- user: 0m0.120s
- sys: 0m0.424s

A tight loop of 50 million calls to gettimeofday:

vDSO enabled:
- real: 0m1.225s
- user: 0m1.224s
- sys: 0m0.000s
regular system call:
- real: 0m5.459s
- user: 0m1.316s
- sys: 0m4.140s

A tight loop of 500 million calls to gettimeofday:

vDSO enabled:
- real: 0m12.247s
- user: 0m12.244s
- sys: 0m0.000s
regular system call:
- real: 0m54.606s
- user: 0m13.192s
- sys: 0m41.412s

Patches to Xen in progress

The proper fix for this issue would be to add vDSO support to the xen clocksource. Luckily, there are some patches in the works that aim to do just that.

Until this change (or one like it) is merged to the kernel and deployed to EC2, the gettimeofday and clock_gettime system calls will execute 77% slower than they otherwise could when run on EC2.

Conclusion

As expected, the vDSO system call path is measurably faster than the normal system call path. This is because the vDSO system call path avoids a context switch into the kernel. Remember: vDSO system calls will not appear in strace output if they successfully pass through the vDSO. If they are unable to use the vDSO for some reason, they will fall back to regular system calls and will appear in strace output.

There are some patches in the works to add vDSO support to Xen, but there’s no telling when this will be available in AWS EC2.

Until a change like this is deployed to EC2, gettimeofday and clock_gettime will perform approximately 77% slower than they normally would.

Using strace on your applications incurs overhead while it is in use, but it provides invaluable insight into what exactly your applications are doing. All programmers deploying software to production environments should regularly strace their applications in development mode and question all output they find.