New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 851539 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----



Sign in to add a comment

Several blink_perf.events and blink_perf.parser stories failing on chromium.perf/Android Nexus5X WebView Perf

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Jun 11 2018

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of charliea@chromium.org

blink_perf.events on Android device Nexus 5X failing on chromium.perf/Android Nexus5X WebView Perf

Builders failed on: 
- Android Nexus5X WebView Perf: 
  https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus5X%20WebView%20Perf

This seemed to go from passing 100% to failing 100% between 564829 and 564866. I'm going to kick off a bisect and send out a CL disabling this story.
 
Project Member

Comment 2 by bugdroid1@chromium.org, Jun 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9f907acb7e32c92cba347c4e142fd1566c444728

commit 9f907acb7e32c92cba347c4e142fd1566c444728
Author: Charlie Andrews <charliea@chromium.org>
Date: Mon Jun 11 17:20:20 2018

Disable failing blink_perf.events story on 5X webview

NOTRY=true
TBR=nednguyen@chromium.org

Bug:  851539 
Change-Id: I2128758cdc151cf496ee9485d9f639641bb48c69
Reviewed-on: https://chromium-review.googlesource.com/1095636
Commit-Queue: Charlie Andrews <charliea@chromium.org>
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566040}
[modify] https://crrev.com/9f907acb7e32c92cba347c4e142fd1566c444728/tools/perf/expectations.config

Cc: haraken@chromium.org jbroman@chromium.org yukishiino@chromium.org
Components: Speed>Benchmarks>Waterfall
Owner: charliea@chromium.org
Status: Assigned (was: Available)
Summary: Several blink_perf.events and blink_perf.parser stories failing on chromium.perf/Android Nexus5X WebView Perf (was: blink_perf.events/EventsDispatchingInDeeplyNestedShadowTrees.html failing on chromium.perf/Android Nexus5X WebView Perf)
Looks like this is also happening for three blink_perf.parser stories, also on 5X webview perf:

- query-selector-all-class-deep.html
- query-selector-all-deep.html
- query-selector-all-id-deep.html

Given that these are all blink_perf tests that started failing in the same revision range, I'm going to assume that the culprit is the same and group these two bugs together.

I'm going to send out another CL that disables these blink_perf.parser stories.
Project Member

Comment 4 by bugdroid1@chromium.org, Jun 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4

commit 5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4
Author: Charlie Andrews <charliea@chromium.org>
Date: Mon Jun 11 17:38:48 2018

Disable several failing blink_perf.parser stories on Nexus 5X webview

NOTRY=true
TBR=nednguyen@chromium.org

Bug:  851539 
Change-Id: Ia51f717aa8c7ca47b47007fa4aa5077767a86e91
Reviewed-on: https://chromium-review.googlesource.com/1095676
Commit-Queue: Charlie Andrews <charliea@chromium.org>
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566043}
[modify] https://crrev.com/5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4/tools/perf/expectations.config

Project Member

Comment 5 by 42576172...@developer.gserviceaccount.com, Jun 11 2018

Cc: afdo-chr...@skia-buildbots.google.com.iam.gserviceaccount.com
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/16ee8ce9240000

Roll AFDO from 69.0.3450.0_rc-r1 to 69.0.3451.0_rc-r1 by afdo-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com
https://chromium.googlesource.com/chromium/src/+/b15094a674a95da8308f9516d5e35fea26f55169
0 → 1 (+1)

Assigning to sheriff gbiv@chromium.org because "Roll AFDO from 69.0.3450.0_rc-r1 to 69.0.3451.0_rc-r1" is a roll.

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Reverting is okay for now, but this needs to be fixed. We're actively using the three benchmarks to evaluate performance regressions.


Owner: g...@chromium.org
gbiv@, it looks like the above roll resulted in several functional breakages in blink_perf benchmark suite. How can we go about reverting it?

Comment 8 by g...@chromium.org, Jun 12 2018

That's highly concerning, since AFDO isn't meant to cause functional changes. If AFDO is what's breaking things, then it's either a compiler bug, or AFDO is exposing a latent bug somewhere in Chrome (presumably blink).

It looks like there was another roll 4 hours ago. I'll sync and see if that fixes this. If it doesn't (or if I can't get anywhere on it within the next 5ish hours), I'll stop the autoroller and roll us back to the 69.0.3450.0_rc-r1 profile until I can figure out what's going on.
Cc: llozano@chromium.org

Comment 10 by g...@chromium.org, Jun 13 2018

Having trouble reproducing this locally, so I stopped the autoroller.

Waiting for CR+1 on the rollback: https://chromium-review.googlesource.com/c/chromium/src/+/1097941 . It's really straightforward, so if anyone not in the R= line wants to jump in, please feel free. :)

Comment 11 by g...@chromium.org, Jun 13 2018

Looks like AFDO is causing this (turning it off fixes the test, as does flipping to an older profile), and these tests only fail on arm64 (for added fun).

I still get crashes on r565942, which has the profile we rolled to after 3451, so this doesn't appear to be a one-off profile flake.

I'll look more into it tomorrow, but we should be able to turn the blink tests back on after said revert lands.
Project Member

Comment 12 by bugdroid1@chromium.org, Jun 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/599a9e359e2d7da2e00b490fad9af5d50b1385a7

commit 599a9e359e2d7da2e00b490fad9af5d50b1385a7
Author: George Burgess IV <gbiv@chromium.org>
Date: Wed Jun 13 17:48:58 2018

Revert AFDO to 69.0.3450.0_rc-r1

The AFDO roll to 3451.0_rc-r1 is somehow breaking blink tests.
Until we can figure out why, revert to something known to work
with said tests.

This isn't a literal revert, since there have been AFDO rolls
since the one that broke us.

Offending commit: http://crrev.com/564831

Bug:  851539 
Test: CQ
Change-Id: Ifb172873492673945396dfd5b593bdc7e44ff590
Reviewed-on: https://chromium-review.googlesource.com/1097941
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Commit-Queue: George Burgess <gbiv@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566896}
[modify] https://crrev.com/599a9e359e2d7da2e00b490fad9af5d50b1385a7/chrome/android/profiles/newest.txt

Comment 13 by g...@chromium.org, Jun 15 2018

As an update, this appears to be a stack overflow. I can repro it without AFDO enabled if I add -fno-inline (which, as you may guess, prevents inlining) to the build rules in //third_party/blink/renderer/core/layout/BUILD.gn.

In this case, my assumption is that the AFDO roll caused us to not inline some pieces of layout/ as aggressively as before. If this is correct, this is a latent bug in blink.

The stack we're blowing is around 1MB. Linux's resource limits don't matter, since ART adds its own guard page at the 1MB limit. Modifying ART to place that at 2MB makes everything pass, so "we've been flying under the 1MB limit for a while, and are now starting to bump into it" seems like the most likely explanation to me.

Since all of the tests that this appears to break have "deep" in their names, this makes sense. :)

Neither gdb nor adb_gdb are willing to provide me with stack traces or symbol information, but I'll try to see what I can figure out from here.

Comment 14 by g...@chromium.org, Jun 19 2018

So, thanks to some python scripts, I think I can provide more insight here.

The backtrace contains 6 functions that use each other to recurse. In this case, we get 863 layers of recursion deep before hitting the segv (meaning there are 863 calls to func1(), 863 to func2(), 863 to func3(), ...)

The functions of interest are, in order of calling:

blink::LayoutBlockFlow::PositionAndLayoutOnceIfNeeded, which reserves 96 bytes of stack,
blink::LayoutBlock::UpdateLayout, which reserves 48 bytes of stack,
blink::LayoutBlockFlow::UpdateBlockLayout, which reserves 176 bytes of stack,
blink::LayoutBlockFlow::LayoutChildren, which reserves 112 bytes of stack,
blink::LayoutBlockFlow::LayoutBlockChildren, which reserves a whopping 512 bytes of stack, and finally
blink::LayoutBlockFlow::LayoutBlockChild, which reserves 272 bytes of stack,

Each "layer" of these collectively eats 1216B of stack. So, 863 "layers" consumes 1,049,408B of stack, which is 832B shy of our 1MB limit.

I'm unfamiliar with blink, but glancing at the code, the recursion here seems to be very much intended (no virtual/indirect call trickery). So, I'm unsure if we should just make these benchmarks less "deep", or if it's unexpected for these frames to be as large as they are. I'll ask for help on layout-dev@ shortly.

Comment 15 by g...@chromium.org, Jun 21 2018

Our blink friends said that the overly deep recursion was WAI, and collectively suggested:
- upping the stack limit
- turning the 'deepness' of the test down
- figuring out if there's any low-hanging fruit for making the frame sizes above smaller

I spent some time trying the third; here's what I found out:
- Non-official builds have really large CHECK()s. Like, 292B of stack large. That's where a majority of LayoutBlockChildren's stack usage came from.
- The biggest difference in stack usage between an AFDO and an AFDO -fno-inline build is in blink::LayoutBlockFlow::LayoutChildren. With inlining, this function eats 272 bytes of stack. Without, it eats 112.

The extra stack appears to be consumed entirely because we're not inlining enough. In particular, clang is reluctant to inline `PhysicalToLogical<BorderValue>::Before()`, which requires a 72 byte structure to be built up + passed in as `this`. The entirety of this method boils down to `if (fieldA == foo) return fieldB; return fieldC == bar ? fieldC : fieldD;`.

Marking this (and all other accessors in the class in question) as always_inline actually decreases our binary size by over 900 bytes (I'll get the champagne).

I'm happy to add always_inline if people want, but I think a more sure fix would just be to raise the stack limit for our renderer thread: https://crrev.com/c/1109213
wrt:

"// Our renderer process can recurse quite a bit on some of our own
// (somewhat patholgoical) benchmarks."

My vote is that we turn down the recursion level in the benchmarks rather than add a special case just so that we can handle our benchmarks. It seems like, if this test breakage reflects real user pain, then we should be fixing it in the Chrome code base. If it instead reflects unrealistic benchmarks, we should be fixing the benchmarks.

Comment 17 by g...@chromium.org, Jun 21 2018

I'm happy with this if the benchmark owners are. (...I also continue to be grateful that we have very clear benchmark owners :) )

hayato: Are you fine with lowering the deepness of blink_perf.events::EventsDispatchingInDeeplyNestedShadowTrees.html ?

jbroman/yukishiino/haraken: Are you all fine with lowering the deepness in blink_perf.parser::query-selector-all-(deep|class-deep|id-deep).html ?
Thanks for investigating the issue.  For query-selector-*.html, it seems reasonable to lower the deepness.  Could you lower the deepness for *all* query-selector-*.html tests?  Not only (deep|class-deep|id-deep), but also other family members seem potentially having the same issue.

> hayato: Are you fine with lowering the deepness of blink_perf.events::EventsDispatchingInDeeplyNestedShadowTrees.html ?

Fine if the causes a stack-overflow in some platforms.


> createComposedTree(document.getElementById('child1'), 50, 20);

Now, the test creates a tree of trees whose depth is 50, where each tree has 20 height.

I'd suggest:

> createComposedTree(document.getElementById('child1'), 50, 10);



#17: Sounds okay to me.
Project Member

Comment 21 by bugdroid1@chromium.org, Jun 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a9cd34ec652eeeef52d3326234c93df499cf54e7

commit a9cd34ec652eeeef52d3326234c93df499cf54e7
Author: George Burgess IV <gbiv@chromium.org>
Date: Fri Jun 22 17:42:47 2018

blink_perf.events: Reduce how much we recurse

This test caused blink to recurse to stacks > 4,000 frames deep. This is
caused crashes for Android on ARM64, which has a standard stack limit of
1MB.

Since it appears that this benchmark's depth is somewhat arbitrary (e.g.
the crashes we're seeing when running the benchmarks aren't indicative
of a problem in the real world), turning down the recursion seems
reasonable here.

Bug:  851539 
Test: Ran blink_perf.events on ARM64 Android; no crashes observed.
Change-Id: I71290e7d2ddc4786aa1ba09c5030375d2027b39e
Reviewed-on: https://chromium-review.googlesource.com/1111565
Reviewed-by: Hayato Ito <hayato@chromium.org>
Commit-Queue: George Burgess <gbiv@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569685}
[modify] https://crrev.com/a9cd34ec652eeeef52d3326234c93df499cf54e7/third_party/blink/perf_tests/events/EventsDispatchingInDeeplyNestedShadowTrees.html

Project Member

Comment 22 by bugdroid1@chromium.org, Jun 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/11b1eb7d4df34e860e58405aecdf2feb49944a27

commit 11b1eb7d4df34e860e58405aecdf2feb49944a27
Author: George Burgess IV <gbiv@chromium.org>
Date: Fri Jun 22 19:03:46 2018

blink_perf.parser: Reduce how much we recurse

Some of these tests caused blink to recurse to stacks > 4,000 frames
deep. This caused crashes for Android on ARM64, which has a standard
stack limit of 1MB.

Since it appears that these benchmarks' depths are somewhat arbitrary
(e.g. the crashes that we're seeing when running these benchmarks aren't
indicative of a problem in the real world), turning down the recursion
seems reasonable here.

New limits were chosen arbitrarily.

Bug:  851539 
Test: Ran blink_perf.parser on ARM64 Android; no crashes observed.
Change-Id: I310b5e92fcb32524ec6e1c8a1c67226d68f4e5aa
Reviewed-on: https://chromium-review.googlesource.com/1111685
Commit-Queue: George Burgess <gbiv@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Yuki Shiino <yukishiino@chromium.org>
Reviewed-by: Jeremy Roman <jbroman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569720}
[modify] https://crrev.com/11b1eb7d4df34e860e58405aecdf2feb49944a27/third_party/blink/perf_tests/parser/query-selector-all-class-deep.html
[modify] https://crrev.com/11b1eb7d4df34e860e58405aecdf2feb49944a27/third_party/blink/perf_tests/parser/query-selector-all-deep.html
[modify] https://crrev.com/11b1eb7d4df34e860e58405aecdf2feb49944a27/third_party/blink/perf_tests/parser/query-selector-all-id-deep.html

Comment 23 by g...@chromium.org, Jun 22 2018

I dunno what Chromium etiquette is for turning tests back on (e.g. if I should just TBR+CQ these, or if I should ask for review), so a review would be appreciated for undisabling the tests that were disabled for this bug:

https://chromium-review.googlesource.com/c/chromium/src/+/1112517
https://chromium-review.googlesource.com/c/chromium/src/+/1112181

After I get those in, I'll turn the AFDO autoroller back on, which should just about wrap this up.

Thank you all :)
Project Member

Comment 24 by bugdroid1@chromium.org, Jun 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/be6a0e57f139253500a73d705b61c4d834dfd775

commit be6a0e57f139253500a73d705b61c4d834dfd775
Author: George Burgess <gbiv@chromium.org>
Date: Sat Jun 23 22:48:24 2018

Revert "Disable several failing blink_perf.parser stories on Nexus 5X webview"

This reverts commit 5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4.

Reason for revert: The cause of the failures has (hopefully) been fixed.

Original change's description:
> Disable several failing blink_perf.parser stories on Nexus 5X webview
> 
> NOTRY=true
> TBR=nednguyen@chromium.org
> 
> Bug:  851539 
> Change-Id: Ia51f717aa8c7ca47b47007fa4aa5077767a86e91
> Reviewed-on: https://chromium-review.googlesource.com/1095676
> Commit-Queue: Charlie Andrews <charliea@chromium.org>
> Reviewed-by: Charlie Andrews <charliea@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#566043}

Bug:  851539 
Change-Id: I69b8f51a4e91171448f6d61c052168deaa3dc222
Reviewed-on: https://chromium-review.googlesource.com/1112181
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#569913}
[modify] https://crrev.com/be6a0e57f139253500a73d705b61c4d834dfd775/tools/perf/expectations.config

Project Member

Comment 25 by bugdroid1@chromium.org, Jun 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7dd7d795a3c6198af56bf8a215ace37fd2880f47

commit 7dd7d795a3c6198af56bf8a215ace37fd2880f47
Author: George Burgess <gbiv@chromium.org>
Date: Mon Jun 25 19:43:58 2018

Revert "Disable failing blink_perf.events story on 5X webview"

This reverts commit 9f907acb7e32c92cba347c4e142fd1566c444728.

Reason for revert: The cause of the failures has (hopefully) been fixed.

Original change's description:
> Disable failing blink_perf.events story on 5X webview
> 
> NOTRY=true
> TBR=nednguyen@chromium.org
> 
> Bug:  851539 
> Change-Id: I2128758cdc151cf496ee9485d9f639641bb48c69
> Reviewed-on: https://chromium-review.googlesource.com/1095636
> Commit-Queue: Charlie Andrews <charliea@chromium.org>
> Reviewed-by: Charlie Andrews <charliea@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#566040}

Bug:  851539 
Change-Id: I46cfb5ff14395207d25c4b1d05bdc479fadcdd47
Reviewed-on: https://chromium-review.googlesource.com/1112517
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#570138}
[modify] https://crrev.com/7dd7d795a3c6198af56bf8a215ace37fd2880f47/tools/perf/expectations.config

Comment 26 by g...@chromium.org, Jun 25 2018

Status: Fixed (was: Assigned)
Autoroller is back up and running, and we're on a new profile at r570151. Fingers crossed that this doesn't find a new way to smash our stack. ;)

Sign in to add a comment