Project: v8 Issues People Development process History Sign in
New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Issue 2254 Negated regexp character class take unusually long to evaluate.
Starred by 7 users Project Member Reported by yangguo@chromium.org, Jul 24 2012 Back to list
Status: Duplicate
Merged: issue 430
Owner:
Closed: May 2014
Cc:
HW: ----
OS: ----
Priority: 3
Type: FeatureRequest



Sign in to add a comment
new RegExp("([^a]*)*b").test("ccccccccccccccccccccccccccccccccc")

takes extremely long time, whereas
new RegExp("([ad]*)*b").test("ccccccccccccccccccccccccccccccccc")

terminates almost immediately.
 
The first case takes exponential time because [^a] includes c, and back tracking takes up all the time.

We could notice that the pattern does not match since b is not part of the subject string. We already do smart multi-pass checks for elements of a TextNode. however, text class in quantifiers are not included as elements of a TextNode.
A solution that comes to mind is to determine the atoms on the critical path of the regular expression and search for those atoms in the subject string first, in order to fail early.
Labels: -Type-Bug -Priority-Medium Type-FeatureRequest Priority-Low
Comment 4 by dontf...@gmail.com, Apr 15 2013
I guess it is worth pointing out that this provides an easy DoS attack on, for example, Node.js applications that use a regular expression of this form on user input. In the example I bumped into, a single small request could keep a process running at 100% CPU for hours.

The risk of this is increased by the common recommendation to use [^] in place of . if you want something that matches newlines. For example: http://stackoverflow.com/questions/1387116/matching-multiline-patterns .
There is no general way to detect exponential backtracking regexps, so if you let the attacker provide the regexp then you can be DoSed.  The discussion on this bug is about a way to mitigate this particular exponential backtracking regexp, it doesn't solve the general case.

This bug could be dupped to  issue 430 .

Note that the regexp engine does check the stack limit on loop back edges, so it would be possible to break execution after a timeout or some other heuristic, in the same way that the debugger can break an infinite loop.  But right now there is AFAIK no mechanism to declare the regexp 'failed to match' and continue execution normally.
I think there is certain value in preventing DoS attacks that could happen this way, but there are two questions to be answered:
a) Where do we set the time limit before aborting? 10 seconds for example might be too short, but too long to prevent DoS attacks.
b) What do we do when the time runs out? Return with no match? Throw an exception?

We could introduce a flag for (a), but I wonder whether this is an actual problem.
 Issue 2668  has been merged into this issue.
http://code.google.com/p/v8/issues/detail?id=2912

It is an actual problem. We have a server app that uses a lot of regexes.
We faced that issue twice already: dead hang of a server process, a regex going exponential on *some* inputs.

Regarding those two questions:
a) time limit is better given at compile time; we can always set it to e.g. 3 sec for a server process;
b) clearly, throw an exception.

The actual issue is being able to locate/fix that regex fast; graceful degradation is less important here, IMHO.
Mergedinto: 430
Status: Duplicate
It's bad to nest quantifiers in this way:

(.*)*

where the dot represents anything that is likely to match a lot of consecutive characters in a string.  This pattern is a known weakness of backtracking regexps, which is what JS has.  V8 could do a better job of identifying and optimizing such regexps, but you can never catch them all.

The reason the positive one matches faster than the negative one is that [^c] matches many consecutive characters in the string, whereas [ad] does not.
Labels: Priority-3
Sign in to add a comment