There’s a logic error in the PCRE engine version used in Flash that allows the execution of arbitrary PCRE bytecode, with potential for memory corruption and RCE.
The issue is in the handling of comments containing multibyte UTF8 characters in extended-mode regular expressions.
The simplest testcase that crashes an ASAN build of avmshell is the following:
(?x)#䞅䞅*(?1)
‘(?x)#\xe4\x9e\x85\xe4\x9e\x85*(?1)’
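The escaped form is the UTF-8 encoding of 䞅 (U+4785). The following sketch uses illustrative helper names that do not come from PCRE or avmshell. It shows that the last byte of that encoding, 0x85, is a continuation byte (bit pattern 10xxxxxx). The same value 0x85 is also the code point of NEL, whose own UTF-8 encoding is C2 85:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper: encode a Unicode code point as UTF-8 into out and
   return the number of bytes written (1 to 4). */
static size_t utf8_encode(unsigned long cp, unsigned char out[4]) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Continuation bytes have the bit pattern 10xxxxxx; they never start a
   character on their own. */
static int is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}
```

Calling utf8_encode(0x4785, buf) fills buf with E4 9E 85 and returns 3. The final byte passes is_continuation, so a correct decoder never treats it as a character by itself.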
During compilation, an intermediate stage compiles to the following bytecode:
0000 5d0000 93 BRA [0]
0003 1b9e 27 CHAR ['\x9e']
0005 1e1b 30 STAR ['\x1b']
0007 85 133 INVALID
Later, the (?1) triggers a call to find_recurse, which indexes off the end of _pcre_OP_lengths, as in the previously reported issues.
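find_recurse steps through compiled code by adding the length of each opcode, taken from _pcre_OP_lengths, to its code pointer. The sketch below replays that walk over the bytes from the dump, using a toy length table. The table size, OP_END = 0 and all names are assumptions for illustration; only the opcode values and lengths are read off the dump. Unlike the real walk, this version checks each opcode against the table before indexing it:

```c
#include <assert.h>
#include <stddef.h>

/* Opcode values as they appear in the dump (BRA = 0x5d, CHAR = 0x1b,
   STAR = 0x1e). OP_END = 0 is an assumption for this toy model. */
enum { OP_END = 0, OP_CHAR = 27, OP_STAR = 30, OP_BRA = 93 };
#define OP_TABLE_LENGTH 100 /* hypothetical; the real table size differs */

/* Toy stand-in for _pcre_OP_lengths: total length of each opcode including
   its operands, inferred from the dump. Entries left at 0 are unknown. */
static const unsigned char op_lengths[OP_TABLE_LENGTH] = {
    [OP_END] = 1, [OP_CHAR] = 2, [OP_STAR] = 2, [OP_BRA] = 3,
};

/* Hypothetical helper: advances like find_recurse (offset += length of
   opcode), but validates the opcode first. Returns the offset of the first
   opcode the table does not cover, or -1 if OP_END or the end of the buffer
   is reached. */
static long find_bad_opcode(const unsigned char *code, size_t len) {
    assert(code != NULL);
    size_t off = 0;
    while (off < len) {
        unsigned op = code[off];
        if (op == OP_END)
            return -1;
        /* An unchecked walk would read op_lengths[op] here even when
           op >= OP_TABLE_LENGTH, i.e. past the end of the table. */
        if (op >= OP_TABLE_LENGTH || op_lengths[op] == 0)
            return (long)off;
        off += op_lengths[op];
    }
    return -1;
}
```

Run over the eight bytes of the dump (5d 00 00 1b 9e 1e 1b 85), the walk visits BRA, CHAR and STAR, then stops at offset 7, where opcode 133 lies outside the toy table. The 0x1b at offset 6 is consumed as STAR's operand and is never treated as an opcode.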
In this case, the responsible code is the following, from compile_branch (beautified):
if (c == '#')
  {
  while (*(++ptr) != 0)     /* <---- this increments one byte at a time */
    {
    if (IS_NEWLINE(ptr))    /* <---- this understands UTF8 */
      {
      ptr += cd->nllen;
      break;
      }
    }
  }
This loop iterates through a string of UTF8 characters one byte at a time, instead of one character at a time.
The UTF8 representation of 䞅 (\xe4\x9e\x85) ends with the byte \x85, which _pcre_is_newline interprets as the Unicode NEL (next line) newline character. The compiler therefore concludes that the comment ends part-way through a multibyte character. It then continues compiling the regex from the offcut bytes of the multibyte characters, which results in a situation similar to the earlier issue PSIRT-3161 (https://code.google.com/p/google-security-research/issues/detail?id=199).
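The difference between scanning a comment byte by byte and scanning it character by character can be sketched in C. This is a simplified model, not PCRE's code. is_newline_byte is a hypothetical stand-in for IS_NEWLINE. It assumes a newline convention that accepts LF, CR and NEL, and it reads the single byte under the cursor as a code point. skip_comment_utf8 shows one possible fix: it steps over whole UTF-8 characters, so a continuation byte is never tested on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IS_NEWLINE: accepts LF, CR and NEL (U+0085),
   treating the byte under the cursor as a code point. A continuation byte
   0x85 therefore looks like NEL. */
static int is_newline_byte(unsigned char b) {
    return b == 0x0A || b == 0x0D || b == 0x85;
}

/* Models the compile_branch loop: p[start] is '#' and the cursor moves one
   byte at a time. Returns the offset where the comment is taken to end, or
   len if no newline is found. */
static size_t skip_comment_bytewise(const unsigned char *p, size_t len, size_t start) {
    assert(p != NULL && start < len);
    for (size_t i = start + 1; i < len; i++) {
        if (is_newline_byte(p[i]))
            return i;
    }
    return len;
}

/* Length of a UTF-8 sequence from its lead byte. Invalid lead bytes count
   as 1 so the scan always makes progress. */
static size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Possible fix (a sketch, not the upstream patch): advance one character at
   a time and recognise NEL only in its real UTF-8 form, C2 85. */
static size_t skip_comment_utf8(const unsigned char *p, size_t len, size_t start) {
    assert(p != NULL && start < len);
    size_t i = start + 1;
    while (i < len) {
        if (p[i] == 0x0A || p[i] == 0x0D)
            return i;
        if (p[i] == 0xC2 && i + 1 < len && p[i + 1] == 0x85)
            return i;
        i += utf8_seq_len(p[i]);
    }
    return len;
}
```

Apply both functions to the testcase pattern with start at the '#' (offset 4). The byte-wise scan stops at offset 7, the \x85 inside the first 䞅. The UTF-8-aware scan runs to the end of the pattern, because the comment contains no real newline.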
This bug was found using AFL, written by lcamtuf.
See the attachment for a crash PoC, tested on the latest Chrome/Flash on x64 Linux.
This bug is subject to a 90 day disclosure deadline. If 90 days elapse
without a broadly available patch, then the bug report will automatically
become visible to the public.