Published on 5 March 2019

Escape from a Docker container: Explanation of the latest patched vulnerability in Docker < 18.09.2 (CVE-2019-5736)

A Common Vulnerabilities and Exposures entry (CVE-2019-5736) was published on February 11, 2019, regarding Docker, the well-known containerization platform. This vulnerability allows an attacker to escape from a container and gain root access on its host machine. This is done by overwriting a binary file (runC) on the host machine from inside one of its containers. The flaw was also present in privileged Linux Containers (LXC), a “userspace interface for the Linux kernel containment features”.

The goal of this article is to explain in detail how an attacker could exploit the flaw and how it was corrected to prevent such an attack. The following points are covered:

1. What is runC?
2. What is the proc filesystem?
3. How can the proc filesystem of a Docker container be used to overwrite the runC program?
4. How can a malicious image be built? (Shared libraries)
5. How was the vulnerability fixed?

1. What is runC?

According to the Open Container Initiative (OCI), runC is a lightweight, universal container runtime. It is used by containerd as a CLI tool for spawning and running containers. On Linux systems, the runC binary can be found at /usr/sbin/runc.

containerd is a container runtime available as a daemon for Linux and Windows. Docker uses it to manage the complete container lifecycle on its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments.

The CVE-2019-5736 bulletin revealed that due to a flaw in runC, it was possible to overwrite the host machine’s runC binary in /usr/sbin/runc using the proc filesystem of one of its containers.

2. What is the proc filesystem?

On Linux, the proc filesystem is a pseudo filesystem that exposes a lot of data about the kernel and about processes. It is mounted at /proc. Every process is represented by a directory, /proc/[pid_of_the_process], and all information about the process is stored in files located in that directory. For example, it contains files and subdirectories such as the following:

  • /proc/[pid]/exe: this file is a symbolic link containing the actual pathname of the executed command.
  • /proc/[pid]/fd/: this subdirectory contains one entry for each file the process has open, named by its file descriptor; each entry is a symbolic link to the actual file.

Using the same structure, the directory /proc/self refers to the process accessing the /proc filesystem and is identical to the /proc directory named by the process ID of the same process. This means that the exe file and the fd/ subdirectory can also be found in the /proc/self directory as well as every other file contained in /proc/[pid] directories.
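The two entries described above can be inspected from any process. The following is a minimal Python sketch, assuming a Linux system with /proc mounted; the helper name inspect_self is hypothetical and only serves the illustration.

```python
import os


def inspect_self():
    """Return (exe_target, {fd: target}) for the calling process."""
    # /proc/self/exe is a symbolic link to the binary running this process.
    exe_target = os.readlink("/proc/self/exe")

    fd_targets = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            fd_targets[int(name)] = os.readlink(f"/proc/self/fd/{name}")
        except FileNotFoundError:
            # listdir() used a temporary descriptor that is closed by now.
            pass
    return exe_target, fd_targets


if __name__ == "__main__":
    exe, fds = inspect_self()
    print("exe ->", exe)
    for fd, target in sorted(fds.items()):
        print(f"fd {fd} -> {target}")
```

Run from a Python interpreter, the exe link points to the interpreter binary; in the scenario of this article, the same link points to the runC binary while runC is the calling process.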

3. How can the proc filesystem of a Docker container be used to overwrite the runC program?

When someone executes docker run or docker exec, containerd calls the runC binary to spawn and run a container. During this operation, /proc/self refers to the runC process. In other words, at that moment /proc/self/exe resolves to /proc/[runC_pid]/exe, which in turn points to the /usr/sbin/runc file on the host machine.

However, while the runC process is running, its binary cannot be overwritten: the kernel refuses to open a file that is currently being executed for writing (the attempt fails with ETXTBSY, “Text file busy”). Moreover, if the attacker waits for the process to end before overwriting it, /proc/self/exe no longer points to the runC binary. Nevertheless, the runC file can still be overwritten with a trick that relies on runC’s shared libraries.
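The protection of a running binary can be observed on a harmless file. The sketch below copies the sleep utility to a temporary directory, runs the copy, and tries to open it for writing while it runs. Linux has traditionally rejected such an open with ETXTBSY; if the kernel in use does not enforce the check, the function reports "opened" instead. The function name is hypothetical, and the sketch assumes that sleep is installed and that the temporary directory allows execution.

```python
import errno
import os
import shutil
import subprocess
import tempfile


def write_open_while_running():
    """Try to open a binary for writing while a process is executing it.

    Returns the errno name of the failure, or "opened" if the kernel
    allowed the open.
    """
    with tempfile.TemporaryDirectory() as workdir:
        copy = os.path.join(workdir, "sleep")
        shutil.copy(shutil.which("sleep"), copy)  # copies data and mode bits
        proc = subprocess.Popen([copy, "5"])      # the copy is now running
        try:
            try:
                fd = os.open(copy, os.O_WRONLY)
            except OSError as exc:
                return errno.errorcode.get(exc.errno, str(exc.errno))
            os.close(fd)
            return "opened"
        finally:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    print(write_open_while_running())
```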

4. How can a malicious image be built? (Shared libraries)

The runC binary dynamically loads several shared libraries when it is executed. Unlike static libraries, shared libraries are loaded by the executable (or by other shared libraries) at runtime. runC’s shared libraries can be listed with the command below:

$ ldd /usr/sbin/runc
    linux-vdso.so.1 (0x00007ffc9d526000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fed38ddf000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fed38bdb000)
    libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007fed38995000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fed385f6000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fed39de0000)

Overwriting one of those libraries would allow us to make runC call our code. This gives us the opportunity to run inside the runC process and perform the trick described below to “overwrite a binary used to launch a running process”.

The scenario introduced here is to craft a malicious Docker image and deliver it to a victim by any means (through a Docker registry, for example). To craft it, a Dockerfile performing the following actions needs to be created:

  1. Install the shared library which will be overwritten.
  2. Overwrite one of the files of the library with malicious code. We will name it reader.c for the explanation.
  3. Create a symbolic link to /proc/self/exe.
  4. Set the previous symbolic link as the entrypoint.

Now, let’s describe the actions performed by reader.c.

  1. Open the /proc/self/exe file in read-only mode. This creates a file descriptor located at /proc/self/fd/[fd_for_reading].
  2. Call another executable file (we will call this file overwriter for the explanation) using the execve() function, passing the path to the read-only file descriptor as an argument.

To be overwritten, the runC executable must no longer be in use by any process. At the same time, the file descriptor opened in read-only mode has to remain open so that the binary stays reachable.

According to the man page of the execve() function, it “executes the program pointed to by filename. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments.” Another point in the man page is important: “By default, file descriptors remain open across an execve().”

The execve() function is therefore perfect for ending the use of the runC binary and launching the executable (overwriter) that will overwrite it. Note that the path /proc/self/fd/[fd_for_reading] is passed to overwriter.
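The descriptor behaviour quoted from the man page can be checked with a harmless experiment. In the Python sketch below, a first stage opens a file read-only and replaces itself through execve() with a second stage that only receives the descriptor number. The names STAGE1, STAGE2 and fd_target_after_exec are hypothetical. One difference from C is worth stating: Python marks descriptors close-on-exec by default, so the sketch has to call os.set_inheritable() to obtain the default C behaviour described in the man page.

```python
import os
import subprocess
import sys

# Stage 2 runs after execve(): it resolves the inherited descriptor via /proc.
STAGE2 = "import os, sys; print(os.readlink('/proc/self/fd/' + sys.argv[1]))"

# Stage 1 opens the file read-only, then replaces itself with stage 2.
STAGE1 = """
import os, sys
fd = os.open(sys.argv[1], os.O_RDONLY)
os.set_inheritable(fd, True)  # undo Python's default close-on-exec flag
os.execve(sys.executable,
          [sys.executable, "-c", sys.argv[2], str(fd)],
          dict(os.environ))
"""


def fd_target_after_exec(path):
    """Return what the inherited descriptor points to after execve()."""
    result = subprocess.run(
        [sys.executable, "-c", STAGE1, path, STAGE2],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(fd_target_after_exec("/dev/null"))
```

The second stage never opened the file itself, yet the descriptor is still valid in its /proc/self/fd directory, which is the property the article relies on.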

Finally, the overwriter binary will follow the steps below:

  1. Read the crafted runC that we wrote with a back door.
  2. Try to open /proc/self/fd/[fd_for_reading] with the write flag, retrying until it succeeds. This gives us a new file descriptor with write permission.
  3. Write the crafted runC over the original runC using the new file descriptor.
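Step 2 relies on a general property of the proc filesystem: opening an entry of /proc/self/fd performs a fresh open of the underlying file, so a read-only descriptor can lead to a writable one whenever the caller has write permission on the file. The Python sketch below illustrates only this property, on a scratch file created for the purpose; the helper names are hypothetical and nothing here touches runC.

```python
import os
import tempfile


def reopen_for_writing(read_fd, payload):
    """Reopen a read-only descriptor through /proc and write via the new one."""
    write_fd = os.open(f"/proc/self/fd/{read_fd}", os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(write_fd, payload)
    finally:
        os.close(write_fd)


def demo():
    """Replace the content of a scratch file starting from a read-only fd."""
    with tempfile.NamedTemporaryFile() as scratch:
        scratch.write(b"original")
        scratch.flush()
        read_fd = os.open(scratch.name, os.O_RDONLY)
        try:
            reopen_for_writing(read_fd, b"replaced")
            # Read back through the descriptor that was opened read-only.
            return os.pread(read_fd, 64, 0)
        finally:
            os.close(read_fd)


if __name__ == "__main__":
    print(demo())
```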

After the overwriter process has ended, the runC binary (now overwritten) is called a second time to end the container. This executes the malicious crafted runC binary.


5. How was the vulnerability fixed?

To prevent the runC file from being overwritten, runC’s maintainers made some changes to its mechanism:

  1. They sealed the runC binary.
  2. They created an original runC. Now, every time the runC binary is called, a new copy of the original runC is made, and only this copy is executed. If the first mechanism is bypassed and an attacker succeeds in overwriting the runC binary copy, the system is not impacted, because the hijacked copy will be replaced by a fresh copy of the original on the next call.

However, this fix seems to increase the memory used to launch containers, according to an issue opened on the runC GitHub repository. A maintainer is currently implementing a solution to this side effect (see runC’s GitHub repository for more details).


This article has explained CVE-2019-5736, a vulnerability found in Docker’s runC component. An introduction to runC and the proc filesystem was provided to give the basic knowledge needed to understand why the vulnerability was present.

An exploitation of the vulnerability using one of runC’s shared libraries was then presented, to show in detail how attackers could gain root access on your system (if it runs a version of Docker < 18.09.2) using a malicious image.

Finally, the fix was explained in order to show how runC’s maintainers prevent attackers from overwriting the binary.

I hope this explanation caught your attention and that you found it interesting. Note that the fix is available in Docker Engine 18.09.2. Feel free to contact me at contact@evagroup.fr if you have any questions regarding this article.

Article written by
Stanley Ragava