Several blink_perf.events and blink_perf.parser stories failing on chromium.perf/Android Nexus5X WebView Perf |
||||||
Issue descriptionFiled by sheriff-o-matic@appspot.gserviceaccount.com on behalf of charliea@chromium.org blink_perf.events on Android device Nexus 5X failing on chromium.perf/Android Nexus5X WebView Perf Builders failed on: - Android Nexus5X WebView Perf: https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus5X%20WebView%20Perf This seemed to go from passing 100% to failing 100% between 564829 and 564866. I'm going to kick off a bisect and send out a CL disabling this story.
,
Jun 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9f907acb7e32c92cba347c4e142fd1566c444728 commit 9f907acb7e32c92cba347c4e142fd1566c444728 Author: Charlie Andrews <charliea@chromium.org> Date: Mon Jun 11 17:20:20 2018 Disable failing blink_perf.events story on 5X webview NOTRY=true TBR=nednguyen@chromium.org Bug: 851539 Change-Id: I2128758cdc151cf496ee9485d9f639641bb48c69 Reviewed-on: https://chromium-review.googlesource.com/1095636 Commit-Queue: Charlie Andrews <charliea@chromium.org> Reviewed-by: Charlie Andrews <charliea@chromium.org> Cr-Commit-Position: refs/heads/master@{#566040} [modify] https://crrev.com/9f907acb7e32c92cba347c4e142fd1566c444728/tools/perf/expectations.config
,
Jun 11 2018
Looks like this is also happening for three blink_perf.parser stories, also on 5X webview perf: - query-selector-all-class-deep.html - query-selector-all-deep.html - query-selector-all-id-deep.html Given that these are all blink_perf tests that started failing in the same revision range, I'm going to assume that the culprit is the same and group these two bugs together. I'm going to send out another CL that disables these blink_perf.parser stories.
,
Jun 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4 commit 5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4 Author: Charlie Andrews <charliea@chromium.org> Date: Mon Jun 11 17:38:48 2018 Disable several failing blink_perf.parser stories on Nexus 5X webview NOTRY=true TBR=nednguyen@chromium.org Bug: 851539 Change-Id: Ia51f717aa8c7ca47b47007fa4aa5077767a86e91 Reviewed-on: https://chromium-review.googlesource.com/1095676 Commit-Queue: Charlie Andrews <charliea@chromium.org> Reviewed-by: Charlie Andrews <charliea@chromium.org> Cr-Commit-Position: refs/heads/master@{#566043} [modify] https://crrev.com/5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4/tools/perf/expectations.config
,
Jun 11 2018
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/16ee8ce9240000 Roll AFDO from 69.0.3450.0_rc-r1 to 69.0.3451.0_rc-r1 by afdo-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com https://chromium.googlesource.com/chromium/src/+/b15094a674a95da8308f9516d5e35fea26f55169 0 → 1 (+1) Assigning to sheriff gbiv@chromium.org because "Roll AFDO from 69.0.3450.0_rc-r1 to 69.0.3451.0_rc-r1" is a roll. Understanding performance regressions: http://g.co/ChromePerformanceRegressions
,
Jun 11 2018
Reverting is okay for now, but this needs to be fixed. We're actively using the three benchmarks to evaluate performance regressions.
,
Jun 12 2018
gbiv@, it looks like the above roll resulted in several functional breakages in blink_perf benchmark suite. How can we go about reverting it?
,
Jun 12 2018
That's highly concerning, since AFDO isn't meant to cause functional changes. If AFDO is what's breaking things, then it's either a compiler bug, or AFDO is exposing a latent bug somewhere in Chrome (presumably blink). It looks like there was another roll 4 hours ago. I'll sync and see if that fixes this. If it doesn't (or if I can't get anywhere on it within the next 5ish hours), I'll stop the autoroller and roll us back to the 69.0.3450.0_rc-r1 profile until I can figure out what's going on.
,
Jun 12 2018
,
Jun 13 2018
Having trouble reproducing this locally, so I stopped the autoroller. Waiting for CR+1 on the rollback: https://chromium-review.googlesource.com/c/chromium/src/+/1097941 . It's really straightforward, so if anyone not in the R= line wants to jump in, please feel free. :)
,
Jun 13 2018
Looks like AFDO is causing this (turning it off fixes the test, as does flipping to an older profile), and these tests only fail on arm64 (for added fun). I still get crashes on r565942, which has the profile we rolled to after 3451, so this doesn't appear to be a one-off profile flake. I'll look more into it tomorrow, but we should be able to turn the blink tests back on after said revert lands.
,
Jun 13 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/599a9e359e2d7da2e00b490fad9af5d50b1385a7 commit 599a9e359e2d7da2e00b490fad9af5d50b1385a7 Author: George Burgess IV <gbiv@chromium.org> Date: Wed Jun 13 17:48:58 2018 Revert AFDO to 69.0.3450.0_rc-r1 The AFDO roll to 3451.0_rc-r1 is somehow breaking blink tests. Until we can figure out why, revert to something known to work with said tests. This isn't a literal revert, since there have been AFDO rolls since the one that broke us. Offending commit: http://crrev.com/564831 Bug: 851539 Test: CQ Change-Id: Ifb172873492673945396dfd5b593bdc7e44ff590 Reviewed-on: https://chromium-review.googlesource.com/1097941 Reviewed-by: Charlie Andrews <charliea@chromium.org> Commit-Queue: George Burgess <gbiv@chromium.org> Cr-Commit-Position: refs/heads/master@{#566896} [modify] https://crrev.com/599a9e359e2d7da2e00b490fad9af5d50b1385a7/chrome/android/profiles/newest.txt
,
Jun 15 2018
As an update, this appears to be a stack overflow. I can repro it without AFDO enabled if I add -fno-inline (which, as you may guess, prevents inlining) to the build rules in //third_party/blink/renderer/core/layout/BUILD.gn. In this case, my assumption is that the AFDO roll caused us to not inline some pieces of layout/ as aggressively as before. If this is correct, this is a latent bug in blink. The stack we're blowing is around 1MB. Linux's resource limits don't matter, since ART adds its own guard page at the 1MB limit. Modifying ART to place that at 2MB makes everything pass, so "we've been flying under the 1MB limit for a while, and are now starting to bump into it" seems like the most likely explanation to me. Since all of the tests that this appears to break have "deep" in their names, this makes sense. :) Neither gdb nor adb_gdb are willing to provide me with stack traces or symbol information, but I'll try to see what I can figure out from here.
,
Jun 19 2018
So, thanks to some python scripts, I think I can provide more insight here. The backtrace contains 6 functions that use each other to recurse. In this case, we get 863 layers of recursion deep before hitting the segv (meaning there are 863 calls to func1(), 863 to func2(), 863 to func3(), ...) The functions of interest are, in order of calling: blink::LayoutBlockFlow::PositionAndLayoutOnceIfNeeded, which reserves 96 bytes of stack, blink::LayoutBlock::UpdateLayout, which reserves 48 bytes of stack, blink::LayoutBlockFlow::UpdateBlockLayout, which reserves 176 bytes of stack, blink::LayoutBlockFlow::LayoutChildren, which reserves 112 bytes of stack, blink::LayoutBlockFlow::LayoutBlockChildren, which reserves a whopping 512 bytes of stack, and finally blink::LayoutBlockFlow::LayoutBlockChild, which reserves 272 bytes of stack, Each "layer" of these collectively eats 1216B of stack. So, 863 "layers" consumes 1,049,408B of stack, which is 832B shy of our 1MB limit. I'm unfamiliar with blink, but glancing at the code, the recursion here seems to be very much intended (no virtual/indirect call trickery). So, I'm unsure if we should just make these benchmarks less "deep", or if it's unexpected for these frames to be as large as they are. I'll ask for help on layout-dev@ shortly.
,
Jun 21 2018
Our blink friends said that the overly deep recursion was WAI, and collectively suggested: - upping the stack limit - turning the 'deepness' of the test down - figuring out if there's any low-hanging fruit for making the frame sizes above smaller I spent some time trying the third; here's what I found out: - Non-official builds have really large CHECK()s. Like, 292B of stack large. That's where a majority of LayoutBlockChildren's stack usage came from. - The biggest difference in stack usage between an AFDO and an AFDO -fno-inline build is in blink::LayoutBlockFlow::LayoutChildren. With inlining, this function eats 272 bytes of stack. Without, it eats 112. The extra stack appears to be consumed entirely because we're not inlining enough. In particular, clang is reluctant to inline `PhysicalToLogical<BorderValue>::Before()`, which requires a 72 byte structure to be built up + passed in as `this`. The entirety of this method boils down to `if (fieldA == foo) return fieldB; return fieldC == bar ? fieldC : fieldD;`. Marking this (and all other accessors in the class in question) as always_inline actually decreases our binary size by over 900 bytes (I'll get the champagne). I'm happy to add always_inline if people want, but I think a more sure fix would just be to raise the stack limit for our renderer thread: https://crrev.com/c/1109213
,
Jun 21 2018
wrt: "// Our renderer process can recurse quite a bit on some of our own // (somewhat patholgoical) benchmarks." My vote is that we turn down the recursion level in the benchmarks rather than add a special case just so that we can handle our benchmarks. It seems like, if this test breakage reflects real user pain, then we should be fixing it in the Chrome code base. If it instead reflects unrealistic benchmarks, we should be fixing the benchmarks.
,
Jun 21 2018
I'm happy with this if the benchmark owners are. (...I also continue to be grateful that we have very clear benchmark owners :) ) hayato: Are you fine with lowering the deepness of blink_perf.events::EventsDispatchingInDeeplyNestedShadowTrees.html ? jbroman/yukishiino/haraken: Are you all fine with lowering the deepness in blink_perf.parser::query-selector-all-(deep|class-deep|id-deep).html ?
,
Jun 21 2018
Thanks for investigating the issue. For query-selector-*.html, it seems reasonable to lower the deepness. Could you lower the deepness for *all* query-selector-*.html tests? Not only (deep|class-deep|id-deep), but also other family members seem potentially having the same issue.
,
Jun 21 2018
> hayato: Are you fine with lowering the deepness of blink_perf.events::EventsDispatchingInDeeplyNestedShadowTrees.html ?
Fine if the causes a stack-overflow in some platforms.
> createComposedTree(document.getElementById('child1'), 50, 20);
Now, the test creates a tree of trees whose depth is 50, where each tree has 20 height.
I'd suggest:
> createComposedTree(document.getElementById('child1'), 50, 10);
,
Jun 21 2018
#17: Sounds okay to me.
,
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a9cd34ec652eeeef52d3326234c93df499cf54e7 commit a9cd34ec652eeeef52d3326234c93df499cf54e7 Author: George Burgess IV <gbiv@chromium.org> Date: Fri Jun 22 17:42:47 2018 blink_perf.events: Reduce how much we recurse This test caused blink to recurse to stacks > 4,000 frames deep. This is caused crashes for Android on ARM64, which has a standard stack limit of 1MB. Since it appears that this benchmark's depth is somewhat arbitrary (e.g. the crashes we're seeing when running the benchmarks aren't indicative of a problem in the real world), turning down the recursion seems reasonable here. Bug: 851539 Test: Ran blink_perf.events on ARM64 Android; no crashes observed. Change-Id: I71290e7d2ddc4786aa1ba09c5030375d2027b39e Reviewed-on: https://chromium-review.googlesource.com/1111565 Reviewed-by: Hayato Ito <hayato@chromium.org> Commit-Queue: George Burgess <gbiv@chromium.org> Cr-Commit-Position: refs/heads/master@{#569685} [modify] https://crrev.com/a9cd34ec652eeeef52d3326234c93df499cf54e7/third_party/blink/perf_tests/events/EventsDispatchingInDeeplyNestedShadowTrees.html
,
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/11b1eb7d4df34e860e58405aecdf2feb49944a27 commit 11b1eb7d4df34e860e58405aecdf2feb49944a27 Author: George Burgess IV <gbiv@chromium.org> Date: Fri Jun 22 19:03:46 2018 blink_perf.parser: Reduce how much we recurse Some of these tests caused blink to recurse to stacks > 4,000 frames deep. This caused crashes for Android on ARM64, which has a standard stack limit of 1MB. Since it appears that these benchmarks' depths are somewhat arbitrary (e.g. the crashes that we're seeing when running these benchmarks aren't indicative of a problem in the real world), turning down the recursion seems reasonable here. New limits were chosen arbitrarily. Bug: 851539 Test: Ran blink_perf.parser on ARM64 Android; no crashes observed. Change-Id: I310b5e92fcb32524ec6e1c8a1c67226d68f4e5aa Reviewed-on: https://chromium-review.googlesource.com/1111685 Commit-Queue: George Burgess <gbiv@chromium.org> Reviewed-by: Kentaro Hara <haraken@chromium.org> Reviewed-by: Yuki Shiino <yukishiino@chromium.org> Reviewed-by: Jeremy Roman <jbroman@chromium.org> Cr-Commit-Position: refs/heads/master@{#569720} [modify] https://crrev.com/11b1eb7d4df34e860e58405aecdf2feb49944a27/third_party/blink/perf_tests/parser/query-selector-all-class-deep.html [modify] https://crrev.com/11b1eb7d4df34e860e58405aecdf2feb49944a27/third_party/blink/perf_tests/parser/query-selector-all-deep.html [modify] https://crrev.com/11b1eb7d4df34e860e58405aecdf2feb49944a27/third_party/blink/perf_tests/parser/query-selector-all-id-deep.html
,
Jun 22 2018
I dunno what Chromium etiquette is for turning tests back on (e.g. if I should just TBR+CQ these, or if I should ask for review), so a review would be appreciated for undisabling the tests that were disabled for this bug: https://chromium-review.googlesource.com/c/chromium/src/+/1112517 https://chromium-review.googlesource.com/c/chromium/src/+/1112181 After I get those in, I'll turn the AFDO autoroller back on, which should just about wrap this up. Thank you all :)
,
Jun 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/be6a0e57f139253500a73d705b61c4d834dfd775 commit be6a0e57f139253500a73d705b61c4d834dfd775 Author: George Burgess <gbiv@chromium.org> Date: Sat Jun 23 22:48:24 2018 Revert "Disable several failing blink_perf.parser stories on Nexus 5X webview" This reverts commit 5e32cab2b0e38ec1aa5c0cb10299dadcfb6142e4. Reason for revert: The cause of the failures has (hopefully) been fixed. Original change's description: > Disable several failing blink_perf.parser stories on Nexus 5X webview > > NOTRY=true > TBR=nednguyen@chromium.org > > Bug: 851539 > Change-Id: Ia51f717aa8c7ca47b47007fa4aa5077767a86e91 > Reviewed-on: https://chromium-review.googlesource.com/1095676 > Commit-Queue: Charlie Andrews <charliea@chromium.org> > Reviewed-by: Charlie Andrews <charliea@chromium.org> > Cr-Commit-Position: refs/heads/master@{#566043} Bug: 851539 Change-Id: I69b8f51a4e91171448f6d61c052168deaa3dc222 Reviewed-on: https://chromium-review.googlesource.com/1112181 Reviewed-by: Kentaro Hara <haraken@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Ned Nguyen <nednguyen@google.com> Cr-Commit-Position: refs/heads/master@{#569913} [modify] https://crrev.com/be6a0e57f139253500a73d705b61c4d834dfd775/tools/perf/expectations.config
,
Jun 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7dd7d795a3c6198af56bf8a215ace37fd2880f47 commit 7dd7d795a3c6198af56bf8a215ace37fd2880f47 Author: George Burgess <gbiv@chromium.org> Date: Mon Jun 25 19:43:58 2018 Revert "Disable failing blink_perf.events story on 5X webview" This reverts commit 9f907acb7e32c92cba347c4e142fd1566c444728. Reason for revert: The cause of the failures has (hopefully) been fixed. Original change's description: > Disable failing blink_perf.events story on 5X webview > > NOTRY=true > TBR=nednguyen@chromium.org > > Bug: 851539 > Change-Id: I2128758cdc151cf496ee9485d9f639641bb48c69 > Reviewed-on: https://chromium-review.googlesource.com/1095636 > Commit-Queue: Charlie Andrews <charliea@chromium.org> > Reviewed-by: Charlie Andrews <charliea@chromium.org> > Cr-Commit-Position: refs/heads/master@{#566040} Bug: 851539 Change-Id: I46cfb5ff14395207d25c4b1d05bdc479fadcdd47 Reviewed-on: https://chromium-review.googlesource.com/1112517 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Ned Nguyen <nednguyen@google.com> Cr-Commit-Position: refs/heads/master@{#570138} [modify] https://crrev.com/7dd7d795a3c6198af56bf8a215ace37fd2880f47/tools/perf/expectations.config
,
Jun 25 2018
Autoroller is back up and running, and we're on a new profile at r570151. Fingers crossed that this doesn't find a new way to smash our stack. ;) |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Jun 11 2018