Issue metadata
Sign in to add a comment
|
[regexp] RegExp with exponential runtime on blog.chromium.org
Reported by
jleedev@gmail.com,
Jun 12 2016
|
||||||||||||||||||||||||
Issue descriptionChrome Version : 53.0.2750.0 (Developer Build) (64-bit) Operating System and Version: OS X, Chrome OS, Node. URLs (if applicable) : http://blog.chromium.org/2016_06_01_archive.html Description of performance problem: 1. Load http://blog.chromium.org/2016_06_01_archive.html 2. Observe 100% CPU usage in BreakpointHandler.finalizeSummary I can reproduce this in Firefox, but I cannot reproduce this in Safari. Remember to attach your trace file to this bug! Attached. Also attached is a stripped down copy of blog.chromium.org that still exhibits the issue, as well as a single regexp call; pasted below but also attached since it has significant whitespace. s="\n\n \n <ul>\n <li>\n <div>\n <span>Sites</span>\n <a><span>can now experiment</span></a>\n <span>with</span>\n <a><span>persistent storage</span></a>\n <span>as an</span>\n <span>, allowing a site to disable automatic storage clearing when bookmarked.</span>\n </div>\n </li><li>\n <div>\n <span>Multiple WebVTT tracks will now be presented as</span>\n <a><span>user options</span></a> \n </div>\n </li></ul>"; t=s.replace(/(<br>|\s+)*$/,'');
,
Jun 16 2016
Setting component Blink>Javascript for V8 team to triage.
,
Jun 17 2016
Feel free to close it if this is not something actionable.
,
Jul 20 2016
This is back, on googlechromereleases now.
,
Jul 20 2016
Confirmed. The page takes 100% CPU and doesn't render. http://googlechromereleases.blogspot.com/
,
Apr 20 2017
Here are some more data-points: https://jsperf.com/pathological-regexp - Safari is 300million times faster than Chrome recognizing a tiny, 30character string. Some googling suggests this may be the problem: https://swtch.com/~rsc/regexp/regexp1.html
,
Apr 20 2017
In case jsperf is down - which it is sometimes these days, here is the repro javascript: var regexp10 = /a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa/, string10 = "aaaaaaaaaa", regexp20 = /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa/, string20 = "aaaaaaaaaaaaaaaaaaaa", regexp30 = /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/, string30 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" ; # Tests: regexp10.test(string10); regexp20.test(string20); regexp30.test(string30); # results: # Chrome 57.0.2987.133 (64-bit): # regexp10: 35,8914 / second # regexp20: 110 / second # regexp30: 0.11 / second # # Safari 10.1 (12603.1.30.0.34): # regexp10: 50,098,582 / second # regexp20: 40,519,142 / second # regexp30: 32,515,791 / second
,
Aug 21 2017
,
Nov 3 2017
,
Nov 6 2017
Exponential backtracking again. A sequence of n 'a?' in the given patterns in #7 will result in 2^n options that the regexp engine needs to try. A better solution would be /a{0,30}aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/.
The pattern from #4 and #0 is a bit different but also results in exponential behavior. A brief repro there is:
/(<br>|\s+)*$/.exec(" a")
Here, I'd suggest slightly changing the pattern to
/(<br>|\s)*$/
to remove the nested repetition.
,
Nov 6 2017
To be clear, the "aaa" example was an example. Here is what I ran into in the real world:
/^([^,:]|\(.*\)|\{.*\}|\".*\"|\'.*\'|\[.*\])*$/.test('{"mappings":{"post_search":{"_parent":{"type":"topic"},"properties":{"topicId":{"type":"keyword"},"userId":{"type":"keyword"},"postOrder":{"type":"long"},"createdAt":{"type":"long"},"updatedAt":{"type":"long"},"isChapterStart":{"type":"boolean"},"caption":{"type":"text","analyzer":"english"},"tags":{"type":"text"},"mentions":{"type":"text"},"mediaMimeType":{"type":"keyword"},"mediaAspectRatio":{"type":"short"},"mediaUrl":{"type":"keyword","index":false},"mediaFocus":{"type":"object","index":false},"mediaDimensions":{"type":"object","index":false},"mediaColorInfo":{"type":"object","index":false},"templateUrl":{"type":"keyword","index":false},"templateType":{"type":"keyword"},"templateDropInCount":{"type":"byte"},"templateText":{"type":"text","analyzer":"english"},"templateDropInLocations":{"type":"nested","index":false},"templateUses":{"type":"integer"},"activityCount":{"type":"integer"},"lastActivityAt":{"type":"long"},"messageCount":{"type":"integer","index":false},"participantCount":{"type":"integer","index":false},"lastActiveUserId":{"type":"keyword","index":false},"lastMessageId":{"type":"keyword","index":false}}},"topic_search":{"_parent":{"type":"user"},"properties":{"title":{"type":"text","analyzer":"english"},"createdAt":{"type":"long"},"updatedAt":{"type":"long"},"lastPostCreatedAt":{"type":"long"},"lastPostId":{"type":"keyword"},"lastChapterPostId":{"type":"keyword"},"postCount":{"type":"integer"},"followerCount":{"type":"integer"},"activityCount":{"type":"long"},"messageCount":{"type":"long"},"isProfileTopic":{"type":"boolean"}}},"user_search":{"properties":{"displayName":{"type":"text","analyzer":"english"},"postCount":{"type":"integer"},"topicCount":{"type":"short"},"followerCount":{"type":"integer"},"messageCount":{"type":"integer"},"lastTopicCreatedAt":{"type":"long"},"lastPostCreatedAt":{"type":"long"},"profileTopicId":{"type":"keyword","index":false}}}},"settings":{}}');
,
Nov 6 2017
#11 seems similar to /(<br>|\s+)*$/ as described in #10, A repeated alteration with nested repetitions.
,
Nov 6 2017
Evidently there is a better way to implement RegExp which avoids these problems. There is a well-written article on this here: https://swtch.com/~rsc/regexp/regexp1.html Did I mention Safari is 300 MILLION times faster than Chrome on this? 300 million? Granted this doesn't effect most RegExp, BUT, it does effect real-world examples. I was shocked when my above example brought chrome to its knees where Safari breezed through it without a hiccup.
,
Nov 6 2017
Last, this is not a duplicated of 287: input box rendering error : Facebook. https://bugs.chromium.org/p/chromium/issues/detail?id=287 has nothing to do with RegExp. Can we unmerge this? |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by jleedev@gmail.com
, Jun 16 2016It appears the regexp on the blog has been fixed: summaryHtml = $.trim(summaryHtml).replace(/(<br>|\s)+$/,''); Nonetheless, Safari appears to optimize the regexp in this example in a way which Chrome does not, which may still be interesting.