New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 645962 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression

Blocked on:
issue 645953



Sign in to add a comment

11.4%-12.5% regression in blink_perf.parser at 417443:417526

Project Member Reported by rsch...@chromium.org, Sep 12 2016

Issue description

See the link to graphs below.
 
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Sep 12 2016

Cc: thakis@chromium.org
Owner: thakis@chromium.org

=== Auto-CCing suspected CL author thakis@chromium.org ===

Hi thakis@chromium.org, the bisect results pointed to your CL below as possibly
causing a regression. Please have a look at this info and see whether
your CL be related.


===== BISECT JOB RESULTS =====
Status: completed


===== SUSPECTED CL(s) =====
Subject : Roll clang 280106:280836.
Author  : thakis
Commit description:
  
New:
* win: Speculative fix for LNK1285/delete-pdb-and-rebuild
* win: Support for [] ATL-style uuid() attributes
* win: __nop intrinsic now has a definition
* win: complete codeview debug info for vptrs

Also switch to gnuwin-5, which is identical to gnuwin-4
except that it has gnuwin32's od.exe too.

Also switch to the python version of the Android NDK
"make standalone toolchain" script, since the .sh version
is deprecated and stopped working after the recent NDK
update.

BUG= 644351 , 644976 , 644977 

Review-Url: https://codereview.chromium.org/2317123004
Cr-Commit-Position: refs/heads/master@{#417492}
Commit  : e9ed3eadc79673e746507158cfbdf2651961af6f
Date    : Fri Sep 09 03:19:26 2016


===== TESTED REVISIONS =====
Revision         Mean     Std Dev    N  Good?
chromium@417491  6.85555  0.176865   5  good
chromium@417492  6.11672  0.0977077  5  bad    <--
chromium@417493  6.17671  0.147906   4  bad
chromium@417494  6.13866  0.12526    5  bad
chromium@417497  6.02044  0.0366794  5  bad

Bisect job ran on: mac_10_11_perf_bisect
Bug ID: 645962

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests blink_perf.parser
Test Metric: url-parser/url-parser
Relative Change: 12.18%
Score: 99.9

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_11_perf_bisect/builds/888
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9001716210969876336


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5804581143969792

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!

Comment 4 by thakis@chromium.org, Sep 12 2016

Probably the same root cause as  bug 643724  (an optimization landed in clang in the previous roll and got temporarily disabled again in this roll).

Comment 5 Deleted

I don't see an improvement within the last several months, so it seems somewhat unlikely to be that.

Comment 7 by thakis@chromium.org, Sep 13 2016

Cc: krasin@chromium.org
krasin, I don't suppose this rings a bell as well? (Admittedly a bit of a long shot.)

Comment 8 by krasin@chromium.org, Sep 13 2016

Owner: krasin@chromium.org
I will take a look, but tomorrow (I am fully booked for today, sorry).

Comment 9 by krasin@chromium.org, Sep 13 2016

Profiles:

Old clang (r280106):

+   8.04% url::RemoveURLWhitespace(char const*, int, url::CanonOutputT<char>*, int*)
+   7.64% WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits(char const*, char const*, unsigned int&, unsigned int&)
+   4.39% bool url::(anonymous namespace)::DoPartialPath<char, unsigned char>(char const*, url::Component const&, int, url::CanonOutputT<char>*)
+   3.83% bool url::(anonymous namespace)::DoSimpleHost<char, char>(char const*, int, url::CanonOutputT<char>*, bool*)
+   3.27% void url::(anonymous namespace)::DoParseAuthority<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*, url::Component*)
+   3.15% WTF::String::containsOnlyASCII() const
+   3.02% bool url::(anonymous namespace)::DoIsStandard<char>(char const*, url::Component const&, url::SchemeType*)
+   2.85% void url::(anonymous namespace)::ParsePath<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*)
+   2.77% WTF::HashTableAddResult<WTF::HashTable<WTF::StringImpl*, WTF::StringImpl*, WTF::IdentityExtractor, WTF::StringHash, WTF::HashTraits<WTF::StringImpl*>, WTF::HashTrai
+   2.69% void url::(anonymous namespace)::DoParseAfterScheme<char>(char const*, int, int, url::Parsed*)
+   2.65% void url::(anonymous namespace)::DoHost<char, unsigned char>(char const*, url::Component const&, url::CanonOutputT<char>*, url::CanonHostInfo*)
+   2.30% WTF::HashAndUTF8CharactersTranslator::equal(WTF::StringImpl* const&, WTF::HashAndUTF8Characters const&)
+   1.76% WTF::String blink::v8StringToWebCoreString<WTF::String>(v8::Local<v8::String>, blink::ExternalMode)

New clang (r280836):

+  10.28% bool url::(anonymous namespace)::DoPartialPath<char, unsigned char>(char const*, url::Component const&, int, url::CanonOutputT<char>*)
+   7.16% url::RemoveURLWhitespace(char const*, int, url::CanonOutputT<char>*, int*)
+   7.00% WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits(char const*, char const*, unsigned int&, unsigned int&)
+   6.67% bool url::(anonymous namespace)::DoSimpleHost<char, char>(char const*, int, url::CanonOutputT<char>*, bool*)
+   3.55% WTF::HashTableAddResult<WTF::HashTable<WTF::StringImpl*, WTF::StringImpl*, WTF::IdentityExtractor, WTF::StringHash, WTF::HashTraits<WTF::StringImpl*>, WTF::HashTrai
+   2.83% WTF::HashAndUTF8CharactersTranslator::equal(WTF::StringImpl* const&, WTF::HashAndUTF8Characters const&)
+   2.67% WTF::String blink::v8StringToWebCoreString<WTF::String>(v8::Local<v8::String>, blink::ExternalMode)
+   2.57% bool url::(anonymous namespace)::DoIsStandard<char>(char const*, url::Component const&, url::SchemeType*)
+   2.50% WTF::String::containsOnlyASCII() const
+   2.47% void url::(anonymous namespace)::DoParseAuthority<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*, url::Component*)
+   2.41% void url::(anonymous namespace)::ParsePath<char>(char const*, url::Component const&, url::Component*, url::Component*, url::Component*)
+   2.09% void url::(anonymous namespace)::DoParseAfterScheme<char>(char const*, int, int, url::Parsed*)
+   1.83% url::CanonicalizeScheme(char const*, url::Component const&, url::CanonOutputT<char>*, url::Component*)
+   1.73% url::CanonicalizeQuery(char const*, url::Component const&, url::CharsetConverter*, url::CanonOutputT<char>*, url::Component*)

As we can see, DoPartialPath, DoSimpleHost are >2x slower with the new Clang. Taking a closer look at DoPartialPath.
For the record, I build Chromium, not Chrome. So, this slowdown is 100% is not related to CFI.

So, it's a spill. DoPartialPath calls CanonOutputT<char>::push_back and there's a load in the performance-critical check:

https://cs.chromium.org/chromium/src/url/url_canon.h?q=In+VC2005,+putting.this.common.case.first.speeds.up.execution&sq=package:chromium&l=91&dr=C

    // In VC2005, putting this common case first speeds up execution
    // dramatically because this branch is predicted as taken.
    if (cur_len_ < buffer_len_) {
      buffer_[cur_len_] = ch;
      cur_len_++;
      return;
    }
... (slow path)

Old Clang (r280106):

       │       void push_back(T ch) {
       │         // In VC2005, putting this common case first speeds up execution
       │         // dramatically because this branch is predicted as taken.
       │         if (cur_len_ < buffer_len_) {
  3.76 │       mov    0x10(%rbx),%eax
  2.09 │       cmp    %eax,%ecx
       │     ↓ jge    600
       │           buffer_[cur_len_] = ch;
  1.25 │       mov    0x8(%rbx),%rax
  1.88 │       mov    %bpl,(%rax,%rcx,1)
 34.66 │     ↓ jmpq   7f0

New Clang (r280836):

       │       void push_back(T ch) {
       │         // In VC2005, putting this common case first speeds up execution
       │         // dramatically because this branch is predicted as taken.
       │         if (cur_len_ < buffer_len_) {
  4.49 │       mov    0x10(%r13),%rax
 46.09 │       cmp    %eax,0x14(%r13) <==================== LOAD !
       │     ↓ jge    6c0
  7.73 │       shr    $0x20,%rax
       │     ↓ jmpq   700
       │       nop         

Looking at Clang ToT. Recently, we had at least one similar issue fixed: https://llvm.org/bugs/show_bug.cgi?id=30244

Haha, synchronicity, I think one of the changes on that bug caused https://llvm.org/bugs/show_bug.cgi?id=30373 (which I need to reduce tomorrow) :-)

While looking at it, I figured that it's probably unrelated since the regression was earlier than the roll blamed here.
In any case, we don't care much about VC2005 anymore, so if we can change the code to fix things that's probably fine too :-)
The code actually looks reasonable to me even without taking VS2005 into the account. Also, the spill would have happened anyway.

But the comment could now be removed, I guess.
Owner: thakis@chromium.org
Yes, it's fixed in ToT:

Clang ToT (r281246)

       │       void push_back(T ch) {
       │         // In VC2005, putting this common case first speeds up execution
       │         // dramatically because this branch is predicted as taken.
       │         if (cur_len_ < buffer_len_) {
  6.67 │       mov    0x10(%rbx),%eax
  4.37 │       cmp    %eax,%ecx
       │     ↓ jge    600
       │           buffer_[cur_len_] = ch;
  1.15 │       mov    0x8(%rbx),%rax
  1.61 │       mov    %bpl,(%rax,%rcx,1)
 35.40 │    ┌──jmpq   7f0

There's not much we can do now, other than verify that it's not regressed again before making the next Clang roll.
For the reference, I used the following commands to build and run the benchmark:

$ gn gen out/Release-tot '--args=is_debug=false symbol_level=2 clang_use_chrome_plugins=false clang_base_path="/usr/local/google/home/krasin/src/llvm.org/release-build"' --check
$ time ninja -C out/Release-tot chrome
$ ./tools/perf/run_benchmark --browser=exact --browser-executable=out/Release/chrome  --story-filter=url-parser blink_perf.parser --results-label='Release' --profiler=perf
$ perf report -i tools/perf/url_parser_html.renderer0

Blockedon: 645953
Cool, that was easy :-)
And thank you for taking a look!
Status: Fixed (was: Assigned)
Should be fixed after https://codereview.chromium.org/2361513002

Sign in to add a comment