Issue metadata
Sign in to add a comment
|
11.4%-12.5% regression in blink_perf.parser at 417443:417526 |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Sep 12 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9001716210969876336
,
Sep 12 2016
=== Auto-CCing suspected CL author thakis@chromium.org === Hi thakis@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL be related. ===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : Roll clang 280106:280836. Author : thakis Commit description: New: * win: Speculative fix for LNK1285/delete-pdb-and-rebuild * win: Support for [] ATL-style uuid() attributes * win: __nop intrinsic now has a definition * win: complete codeview debug info for vptrs Also switch to gnuwin-5, which is identical to gnuwin-4 except that it has gnuwin32's od.exe too. Also switch to the python version of the Android NDK "make standalone toolchain" script, since the .sh version is deprecated and stopped working after the recent NDK update. BUG= 644351 , 644976 , 644977 Review-Url: https://codereview.chromium.org/2317123004 Cr-Commit-Position: refs/heads/master@{#417492} Commit : e9ed3eadc79673e746507158cfbdf2651961af6f Date : Fri Sep 09 03:19:26 2016 ===== TESTED REVISIONS ===== Revision Mean Std Dev N Good? chromium@417491 6.85555 0.176865 5 good chromium@417492 6.11672 0.0977077 5 bad <-- chromium@417493 6.17671 0.147906 4 bad chromium@417494 6.13866 0.12526 5 bad chromium@417497 6.02044 0.0366794 5 bad Bisect job ran on: mac_10_11_perf_bisect Bug ID: 645962 Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests blink_perf.parser Test Metric: url-parser/url-parser Relative Change: 12.18% Score: 99.9 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_11_perf_bisect/builds/888 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9001716210969876336 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=5804581143969792 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Sep 12 2016
Probably the same root cause as bug 643724 (an optimization landed in clang in the previous roll and got temporarily disabled again in this roll).
,
Sep 12 2016
I don't see an improvement within the last several months, so it seems somewhat unlikely to be that.
,
Sep 13 2016
krasin, I don't suppose this rings a bell as well? (Admittedly a bit of a long shot.)
,
Sep 13 2016
I will take a look, but tomorrow (I am fully booked for today, sorry).
,
Sep 13 2016
Profiles: Old clang (r280106): + 8.04% url::RemoveURLWhitespace(char const*, int, url::CanonOutputT<char>*, int*) + 7.64% WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits(char const*, char const*, unsigned int&, unsigned int&) + 4.39% bool url::(anonymous namespace)::DoPartialPath<char, unsigned char>(char const*, url::Component const&, int, url::CanonOutputT<char>*) + 3.83% bool url::(anonymous namespace)::DoSimpleHost<char, char>(char const*, int, url::CanonOutputT<char>*, bool*) + 3.27% void url::(anonymous namespace)::DoParseAuthority<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*, url::Component*) + 3.15% WTF::String::containsOnlyASCII() const + 3.02% bool url::(anonymous namespace)::DoIsStandard<char>(char const*, url::Component const&, url::SchemeType*) + 2.85% void url::(anonymous namespace)::ParsePath<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*) + 2.77% WTF::HashTableAddResult<WTF::HashTable<WTF::StringImpl*, WTF::StringImpl*, WTF::IdentityExtractor, WTF::StringHash, WTF::HashTraits<WTF::StringImpl*>, WTF::HashTrai + 2.69% void url::(anonymous namespace)::DoParseAfterScheme<char>(char const*, int, int, url::Parsed*) + 2.65% void url::(anonymous namespace)::DoHost<char, unsigned char>(char const*, url::Component const&, url::CanonOutputT<char>*, url::CanonHostInfo*) + 2.30% WTF::HashAndUTF8CharactersTranslator::equal(WTF::StringImpl* const&, WTF::HashAndUTF8Characters const&) + 1.76% WTF::String blink::v8StringToWebCoreString<WTF::String>(v8::Local<v8::String>, blink::ExternalMode) New clang (r280836): + 10.28% bool url::(anonymous namespace)::DoPartialPath<char, unsigned char>(char const*, url::Component const&, int, url::CanonOutputT<char>*) + 7.16% url::RemoveURLWhitespace(char const*, int, url::CanonOutputT<char>*, int*) + 7.00% WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits(char const*, char const*, unsigned int&, unsigned int&) + 6.67% bool url::(anonymous namespace)::DoSimpleHost<char, char>(char const*, int, url::CanonOutputT<char>*, bool*) + 3.55% WTF::HashTableAddResult<WTF::HashTable<WTF::StringImpl*, WTF::StringImpl*, WTF::IdentityExtractor, WTF::StringHash, WTF::HashTraits<WTF::StringImpl*>, WTF::HashTrai + 2.83% WTF::HashAndUTF8CharactersTranslator::equal(WTF::StringImpl* const&, WTF::HashAndUTF8Characters const&) + 2.67% WTF::String blink::v8StringToWebCoreString<WTF::String>(v8::Local<v8::String>, blink::ExternalMode) + 2.57% bool url::(anonymous namespace)::DoIsStandard<char>(char const*, url::Component const&, url::SchemeType*) + 2.50% WTF::String::containsOnlyASCII() const + 2.47% void url::(anonymous namespace)::DoParseAuthority<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*, url::Component*) + 2.41% void url::(anonymous namespace)::ParsePath<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*) + 2.09% void url::(anonymous namespace)::DoParseAfterScheme<char>(char const*, int, int, url::Parsed*) + 1.83% url::CanonicalizeScheme(char const*, url::Component const&, url::CanonOutputT<char>*, url::Component*) + 1.73% url::CanonicalizeQuery(char const*, url::Component const&, url::CharsetConverter*, url::CanonOutputT<char>*, url::Component*) As we can see, DoPartialPath, DoSimpleHost are >2x slower with the new Clang. Taking a closer look at DoPartialPath.
,
Sep 13 2016
For the record, I build Chromium, not Chrome. So, this slowdown is 100% is not related to CFI.
,
Sep 14 2016
So, it's a spill. DoPartialPath calls CanonOutputT<char>::push_back and there's a load in the performance-critical check: https://cs.chromium.org/chromium/src/url/url_canon.h?q=In+VC2005,+putting.this.common.case.first.speeds.up.execution&sq=package:chromium&l=91&dr=C // In VC2005, putting this common case first speeds up execution // dramatically because this branch is predicted as taken. if (cur_len_ < buffer_len_) { buffer_[cur_len_] = ch; cur_len_++; return; } ... (slow path) Old Clang (r280106): │ void push_back(T ch) { │ // In VC2005, putting this common case first speeds up execution │ // dramatically because this branch is predicted as taken. │ if (cur_len_ < buffer_len_) { 3.76 │ mov 0x10(%rbx),%eax 2.09 │ cmp %eax,%ecx │ ↓ jge 600 │ buffer_[cur_len_] = ch; 1.25 │ mov 0x8(%rbx),%rax 1.88 │ mov %bpl,(%rax,%rcx,1) 34.66 │ ↓ jmpq 7f0 New Clang (r280836): │ void push_back(T ch) { │ // In VC2005, putting this common case first speeds up execution │ // dramatically because this branch is predicted as taken. │ if (cur_len_ < buffer_len_) { 4.49 │ mov 0x10(%r13),%rax 46.09 │ cmp %eax,0x14(%r13) <==================== LOAD ! │ ↓ jge 6c0 7.73 │ shr $0x20,%rax │ ↓ jmpq 700 │ nop Looking at Clang ToT. Recently, we had at least one similar issue fixed: https://llvm.org/bugs/show_bug.cgi?id=30244
,
Sep 14 2016
Haha, synchronicity, I think one of the changes on that bug caused https://llvm.org/bugs/show_bug.cgi?id=30373 (which I need to reduce tomorrow) :-) While looking at it, I figured that it's probably unrelated since the regression was earlier than the roll blamed here.
,
Sep 14 2016
In any case, we don't care much about VC2005 anymore, so if we can change the code to fix things that's probably fine too :-)
,
Sep 14 2016
The code actually looks reasonable to me even without taking VS2005 into the account. Also, the spill would have happened anyway.
,
Sep 14 2016
But the comment could now be removed, I guess.
,
Sep 14 2016
Yes, it's fixed in ToT: Clang ToT (r281246) │ void push_back(T ch) { │ // In VC2005, putting this common case first speeds up execution │ // dramatically because this branch is predicted as taken. │ if (cur_len_ < buffer_len_) { 6.67 │ mov 0x10(%rbx),%eax 4.37 │ cmp %eax,%ecx │ ↓ jge 600 │ buffer_[cur_len_] = ch; 1.15 │ mov 0x8(%rbx),%rax 1.61 │ mov %bpl,(%rax,%rcx,1) 35.40 │ ┌──jmpq 7f0 There's not much we can do now, other than verify that it's not regressed again before making the next Clang roll. For the reference, I used the following commands to build and run the benchmark: $ gn gen out/Release-tot '--args=is_debug=false symbol_level=2 clang_use_chrome_plugins=false clang_base_path="/usr/local/google/home/krasin/src/llvm.org/release-build"' --check $ time ninja -C out/Release-tot chrome $ ./tools/perf/run_benchmark --browser=exact --browser-executable=out/Release/chrome --story-filter=url-parser blink_perf.parser --results-label='Release' --profiler=perf $ perf report -i tools/perf/url_parser_html.renderer0
,
Sep 14 2016
,
Sep 14 2016
And thank you for taking a look!
,
Sep 22 2016
|
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by rsch...@chromium.org
, Sep 12 2016