June 07, 2024

Cache Attacks

For my Master thesis, I had to dive into the fascinating world of microarchitectural attacks, especially cache attacks. These attacks target how the CPU cache is implemented, so I learned cool concepts from computer microarchitecture and a different aspect of computer security.

Side-channel attacks

Before getting into cache attacks, I think it is a good idea to understand what side-channel attacks are in general, and timing attacks in particular.

Side-channel

A side-channel is an unintended channel of information in a system. It can be, for example:

The heat, sound or light produced by the system
The change in the power consumption of the system
The time it takes to perform an operation

Side-channels are due to the physical properties of the system implementation. For example, transistors are used to create logic functions, but they have latency, consume electrical power and even produce electromagnetic radiation ¹.

Unfortunately, such channels can sometimes reveal critical information if certain conditions are met. We will see some examples below.

Differential Power Analysis

The power consumed by a system depends on the control flow, as different operations require the use of different parts of the physical circuit.

Here is an example of a power trace when running AES-128 – a cryptographic scheme – where we can clearly distinguish the ten encryption rounds:

Power trace of AES-128 on a smart card 2 — Power trace of AES-128 on a smart card ²

Differential Power Analysis consists of comparing a power trace with two reference traces:

The one in case of known event
The one in case of known non-event

We can determine if the event we are looking for has occurred by looking at the actual power trace and seeing whether it is closer to case 1 or case 2.

Keystroke acoustic emanations

Differential power analysis requires a probe to measure the power consumed, but we do not necessarily need specific instruments to perform side-channel attacks. For example, research has shown that it is possible to recover text typed on a keyboard only by listening to the sound emitted by keystrokes ³.

In fact, different keystrokes produce different sounds because they are located at different places on the keyboard plate, as we can see in the sound spectrum below.

Average spectrums for keystrokes q and w 3 — Average spectrums for keystrokes `q` and `w` ³

Timing attacks

Now that you (hopefully) have a better understanding of what a side-channel attack is, let's talk about another well-known side-channel: timing.

Timing attacks exploit the difference in execution time between different inputs to reveal sensitive data. Let's take for example the code below, where the function takes two strings as parameters and returns whether they are similar. Note that this implementation is different from that of the standard library.

bool strcmp(string a, string b) {
  if (a.length() != b.length()) {
    return false;
  }

  for (size_t i = 0; i < a.length(); ++i) {
    if (a[i] != b[i]) {
      return false;
    }
  }

  return true;
}

Function comparing two strings

The above strcmp function compares the inputs character by character and returns immediately when it reaches mismatching characters. We say that the function is not in constant-time because its execution time depends on the branch taken on line 2 and 7.

Now, imagine a software relying on strcmp to check whether a user input matches a secret.

bool verify(string input) {
  return strcmp(input, PASSWORD);
}

Simple password verification using strcmp

An attacker can deduce the secret by bruteforcing one character at a time rather than the password as a whole, which greatly improves the attack performance. Here is how:

The attacker measures the response time for inputs of increasing size. The function returns false almost immediately if the input size doesn't match the secret size. As a result, the query that takes the longer is the one whose input is the correct size.
The attacker queries the function with a random well-sized input by changing the first character in each query. The query with the longest response time is the one where the loop entered its second iteration, meaning that the first character was correct.
The attacker repeats step 2 with the next characters until the entire secret is known.

Microarchitectural attacks

We have seen how timing attacks exploit subtle variations in the execution time to infer sensitive information. In the next sections, we will explore how to exploit timing vulnerabilities in the microarchitecture. That is, cache attacks are a form of timing attacks that exploit the cache microarchitecture.

Cache

Now that we know what a timing attack is, we only need a few notions of microarchitecture before we can talk about cache attacks.

The cache is a fast memory used to temporarily store data that is likely to be accessed in the future, so it can be fetched more quickly than from the main memory. Before accessing the memory, the processor checks whether the data or the instruction is in the cache:

Cache hit: data is in the cache, so no need to fetch the main memory.
Cache miss: data is not in the cache, the processor queries main memory and stores the result in the cache.

The cache microarchitecture may slightly vary depending on the target architecture, but the approach should be quite similar. In the following we consider caches in ARMv8-A processors ⁴.

Set-associative cache

When we refer to a cache, we usually mean a set-associative cache. Such a cache is organized in lines, sets and ways. Each memory address is mapped to a set index (bits 6\to13 on the figure), meaning they are cached in a line in that set. Ways refer to the amount of parallel lines in each set. A collision arises when different congruent addresses map to the same set.

As we can see above, the data address is divided into three parts:

A tag used at read time to check if the cached data matches the one at the requested address. All cache ways are fetched in parallel to verify the tag.
A set index to find out which set is mapped to this address.
An offset used to fetch the requested bytes from the cache line.

The cache may either use virtual or physical addresses for indexing and tagging, depending on the configuration. For example, VIPT stands for Virtually Indexed and Physically Tagged.

On Linux, you can check your cache configuration in /sys/devices/system/cpu/cpuX/cache/indexY ⁵, where X refers to the core number and Y to the cache number from the core perspective:

$ ls /sys/devices/system/cpu/cpu0/cache/index0
coherency_line_size  physical_line_partition  type
id                   shared_cpu_list          uevent
level                shared_cpu_map           ways_of_associativity
number_of_sets       size

Cache attacks

Now that we know the basics of cache microarchitecture, we can finally move on to main topic. Cache attacks take advantage of the time difference between cache hits and cache misses. Let's take a simple example to understand how.

bool strcmp(string a, string b) {
  static bool ok = true;

  for (size_t i = 0; i < a.length(); ++i) {
    if (a[i] != b[i]) {
      ok = false;
    }
  }

  return ok;
}

Constant-time function comparing two strings

The above function is a constant-time variant of that we have seen earlier. The difference is that it now turns a flag off when it encounters mismatched characters and only returns after the entire loop has finished. We assume that a and b are the same length and that ok is defined to true before the function is called, hence the static keyword.

In this code, the variable ok is read from main memory on line 6 only if the input characters match. If so, the new ok value is also stored in the cache. So, an attacker measuring access time to that memory location and able to distinguish cache hits from cache misses can determine if the ok variable was accessed, thus if characters mismatched.

Let's see how we can detect cache hits without accessing the variable ok directly.

Prime+Probe

Prime+Probe ⁶ is one of the main techniques used in cache attacks to detect cache hits. It consists of three steps:

Prime: the attacker fills the entire cache set by reading data with congruent addresses from its own address space, i.e. data whose addresses map to the same cache set.
The victim process possibly evicts one of the lines when loading data into the cache.
Probe: the attacker measures how long it takes to read all the congruent addresses from step 1. A longer access time means that the data was evicted and the victim accessed an address mapped to that cache set.

Case studies

In the previous section, we have seen an illustrative example where the cache leaks sensitive data. But there are several limitations to this example, the first being that no one would implement this function with a static ok variable. We will now look at four realistic scenarios where cache attacks can leak sensitive data.

RSA

Our first example is an attack against RSA implementations that use the square-and-multiply algorithm ⁷. RSA is a widely used cryptographic scheme used in the TLS protocol to encrypt HTTP communications, which is referred to as HTTPS.

RSA deals with huge modular exponentiations that are not as easy to compute as with low numbers. This is where the square-and-multiply algorithm comes in.

/* Compute m = c^d */
long squareAndMultiply(long c, long d, long n) {
  long m = c;

  for (size_t i = 0; i < log2(d); ++i) {
    m = (m * m) % n;  // Modular exponentiation

    if (e[i] == 1) {
      m = (m * c) % n;
    }
  }

  return m;
}

Square-and-multiply algorithm

All the security of RSA relies on the fact that there is a private exponent that cannot be known by an attacker. We won't go into more details because only the above algorithm is relevant for us.

At some point during the decryption, the square-and-multiply algorithm is used to compute c^d \mod n, where c is the ciphertext, d is the private exponent, and n is the modulus. At each loop iteration, the algorithm accesses c on line 9 only if the i^\text{th} exponent bit is 1. An attacker measuring access time of congruent addresses with Prime+Probe can therefore determine whether c was accessed, thus the value of the private exponent's i^\text{th} bit.

AES

Our second case study exploits a specific optimization in the implementation of AES ⁸. AES is a widely used encryption scheme that runs in a certain number of rounds of four subsequent operations: SubBytes, ShiftRows, MixColumns and AddRoundKey. Modern architectures now have a specific set of instructions to run AES much faster, but in the past it had to be done "by hand" and software optimization was welcome to improve performance.

One of these optimizations relies on precomputed lookup tables called T-Tables ⁹. Each T-Table is 256 bytes long, thus spanning four different cache sets of 64-byte lines. The actual byte accessed from these tables depends on the key, so there is a data-dependent memory access that an attacker can exploit to recover parts of the secret key at each round.

SVG filter

SVG filters allow you to apply graphical transformations on HTML elements with the filter CSS property. For example, the images on this page are transformed to suit dark and light themes.

You can also use the <filter> SVG element, which lets you put filter primitives together to create more complex transformations. Here is an example:

<svg xmlns="http://www.w3.org/2000/svg">
  <filter id="bw">
    <feComponentTransfer>
      <feFuncR type="discrete" tableValues="0 1" />
      <feFuncG type="discrete" tableValues="0 1" />
      <feFuncB type="discrete" tableValues="0 1" />
    </feComponentTransfer>
  </filter>
</svg>

Custom black & white filter

The <feComponentTransfer> filter primitive remaps the colors according using transfer functions that are applied on the pixel’s red, green, blue and alpha components. These functions are interpolated from the tableValues attributes.

Recent studies have shown that the <feComponentTransfer> filter can be exploited in a cache attack because transfer functions are precomputed and stored in lookup tables of 256 bytes, so they span multiple cache sets ¹⁰. As for AES, an attacker monitoring the cache latency can determine which cache set is accessed by the browser, thus the color of the pixel.

<feComponentTransfer> lookup table — `<feComponentTransfer>` lookup table

The ability to determine the color of an arbitrary pixel allows an attacker to determine whether a link <a> points to a visited URL, or even to read the content of an <iframe>. In both cases, it defeats the browser security features.

Transient execution attacks

The last example of cache attack application we will discuss is Spectre, a well-known transient execution attack ¹¹. There are several variants of this attack, but we will essentially focus on the first one, also called Spectre-PHT.

A transient execution is when a sequence of instructions is executed but not committed. This can happen for example with branch prediction, as in the following code:

if (x < size(arr1)) {
  y = arr2[arr1[x] * 4096];
}

Spectre-PHT gadget

We won't go into detail here, but in some situations the code on line 2 may be speculatively executed before the result of the condition on line 1 is known. This mechanism is called branch prediction and considerably improves performance when the predicted branch is correct. Otherwise, the state must be reverted as it was before the execution of line 2.

Unfortunately, speculative execution has brought a new category of microarchitectural attacks known as transient execution attacks. Spectre is arguably the most notable example.

The main idea of Spectre is to trigger a gadget, a piece of code in the victim's program that can leak secret data through the cache in a transient execution. For instance, the code above leaks the value of arr1[x], no matter if x is actually in the bounds of arr1. This would not be possible without speculative execution because of the bound check on line 1.

Instead of conclusion

This concludes our brief exploration of cache attacks. I hope this has taught you something and that you are now as fascinated as I am by this unusual kind of attack.

If you want a deeper look at the topic, I suggest checking out the work of Daniel Gruss and Moritz Lipp, especially their respective PhD theses ⁶ ¹². They provide a great overview to help you dive deeper into the topic of microarchitectural attacks.

If you're not afraid of reading scientific publications, you'll find plenty of other documented microarchitectural attacks on academic databases. You might also want to take a look at the Arm documentation ¹³.

Finally, you can also check out my Master thesis here :)