Issue metadata
Sign in to add a comment
|
rasterize_and_record_micro.top_25_smooth crash on chromium.perf at 384235:384244 |
||||||||||||||||||||||
Issue descriptionRevision range first seen: 384235:384244 Link to failing step log: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14428/steps/rasterize_and_record_micro.top_25_smooth/logs/stdio Stacks are broken: issue 561447 Will try a return-code bisect If the test is disabled, please downgrade to Pri-2.
,
Apr 4 2016
Looks like it's not 100% consistent - I see crashes on different pages (sometimes Blogger, sometimes Google calendar, etc.). Hopefully a return-code bisect can still deal with the non-determinism.
,
Apr 5 2016
===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : Large heap collection type hits assertion in Heap::allocationSizeFromSize Author : keishi Commit description: allocationSizeFromSize is called for large object page allocations as well. BUG=597953 Review URL: https://codereview.chromium.org/1842263004 Cr-Commit-Position: refs/heads/master@{#384230} Commit : 57a75864e012b06db30f4b027e24f5fd758e46a3 Date : Thu Mar 31 09:29:16 2016 ===== TESTED REVISIONS ===== Revision Exit Code Std. Dev. Num Values Good? chromium@384198 0 N/A 20 good chromium@384221 0 N/A 20 good chromium@384227 0 N/A 20 good chromium@384229 0 N/A 20 good chromium@384230 1 N/A 20 bad <- chromium@384233 1 N/A 20 bad chromium@384244 1 N/A 20 bad Bisect job ran on: linux_perf_bisect Bug ID: 600377 Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --also-run-disabled-tests rasterize_and_record_micro.top_25_smooth Test Metric: rasterize_time/https___www.google.com_calendar_ Relative Change: Zero to non-zero Score: 0.0 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6370 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9016302404771402992 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=600377 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Apr 5 2016
,
Apr 5 2016
keishi@ it looks like your CL is causing crashes on the perf bots here (I was worried it was a little non-deterministic, but it looks like this test passed 20 times without your CL then failed 20 time with your CL). Can you please investigate ASAP (or roll your change back)? Sorry there are no stacks from the bots, but it should be easy to reproduce locally. I'm happy to help if necessary.
,
Apr 6 2016
I reverted my CL with r385379, but the bot is still crashing. https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14456 (r385387) I ran a bisect locally but the crash was flaky and couldn't pinpoint the cause. It seems to be crashing at either PaintLayerScrollableArea::invalidateAllStickyConstraints or FramePainter::paintContents. I don't see any suspicious changes. +flackr since you seem to be working on sticky, could you take a look? Thanks!
,
Apr 6 2016
My CL which introduced PaintLayerScrollableArea::invalidateAllStickyConstraints was reverted in #384331 https://chromium.googlesource.com/chromium/src/+/e37fce4deefe8c872c4329bdf4233858aab8b9a5# as it was indeed crashing on other bots. I'm currently working on fixing the crash and relanding but it should not be causing any current problems in the bots. I only see crashes in FramePainter::paintContents on the linked crash. +wangxianzhu, since you've somewhat recently touched FramePainter. Any idea where the FramePainter::paintContents crash may be coming from?
,
Apr 6 2016
I saw stack information on bot like this: 0 chrome!blink::FramePainter::paintContents [FramePainter.cpp : 124 + 0x0] Can this be trusted?
,
Apr 6 2016
,
Apr 7 2016
wangxianzhu@ this specific issue (a number of perf tests becoming very flaky) is new in the past few days. Can the old issue 590856 really be the cause of this? Regardless this crash is causing active bot redness so we should keep this bug open (as Pri-1) as long as the perf bots are still crashing. Given all the confusion over different possible sources of crashes and flakiness, I'd say that reliable stacks in the logs are really essential for making progress on issues like this. Blocking on issue 561447 .
,
Apr 7 2016
The stack traces should be fixed already (I even added a unittest to test it for this case). I am only keeping issue 561447 open while I do some refactoring. Re: #8, yes that should be correct now. I fixed it earlier in the week. The initial failure probably happened right before my fix landed.
,
Apr 7 2016
,
Apr 7 2016
We use bug 590856 to track all RELEASE_ASSERT(!m_frameView.needsLayout()) failures in FramePainter.cpp. We have fixed some root reasons of them but there are still crash reports so it's still open. As this affects perf bot, I agree to keep this open. Will update bug 590856 after we figure out the root reason of this bug.
,
Apr 8 2016
Tried many times but still could't reproduce the crash locally (except the crashes in debug build, recorded in #140~#142 in bug 590856). rbyers@ can you run the bisect again?
,
Apr 9 2016
I seems that the current crash is different from the crash after 384231. My try job shows that 384300 crashes at code related to position:sticky (which has been reverted in 384331): 0 chrome!blink::PaintLayerScrollableArea::invalidateAllStickyConstraints [OwnPtr.h : 61 + 0x8] 1 chrome!blink::LayoutBoxModelObject::invalidateStickyConstraints [LayoutBoxModelObject.cpp : 332 + 0x5] 2 chrome!blink::LayoutBlock::updateAfterLayout [LayoutBlock.cpp : 888 + 0x5] Tried 383432 (after position:sticky is reverted). The stack trace isn't symbolized because of bug 561447 . It seems that we have another regression between position:sticky was landed and reverted. Another possibility is that the revert of position:sticky is not complete. flackr@ can you verify if the revert of position:sticky is complete? Are there any CLs related to position:sticky other than 384231? dyen@ can I try on perf bot with the stack trace issue fixed on a previous revision? Which revision fixed the stack trace issue?
,
Apr 12 2016
flackr@, dyen@ can you answer the questions in #15? Thanks.
,
Apr 12 2016
flackr@ can you answer the question in #15? Thanks.
,
Apr 12 2016
The fix for the stack trace issue was actually in catapult. If you are using Chromium CLs you would need to find the catapult roll CL. It is in r385180: commit 4dbb26170e3e350d7b7ce17703e48bebfc3cf8ad Author: catapult-deps-roller <catapult-deps-roller@chromium.org> AuthorDate: Tue Apr 5 09:03:34 2016 Commit: Commit bot <commit-bot@chromium.org> CommitDate: Tue Apr 5 09:04:50 2016 Roll src/third_party/catapult/ 6b041c490..6d71ed556 (6 commits).
,
Apr 13 2016
Yes, the revert of sticky was complete. There had only been one sticky related CL to land which was reverted as you pointed out.
,
Apr 14 2016
Issue 602439 has been merged into this issue.
,
Apr 14 2016
Sheriff peeking in, here's what it looks like to me so far: The test started failing on build 14369, which has a regression range of r384233:r384244. But it appears that the failure was flaky at that time (comment 2). Flake caused the bisect (comment 3) to fail incorrectly. If you dig into the bisect results: chromium@384230[0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] chromium@384231[] chromium@384232[] chromium@384233[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] The test passed at r384230 and failed at r384233. This, and the stack trace in comment 6 imply that flackr's sticky CL (r384231) did cause the test to start failing flakily. But the current stack trace (issue 590856 comment 140) is a different stack trace, and the revert of sticky (r384331) didn't resolve the failure. That implies that a second CL between the land and revert of sticky (with a range of r384231:r384331) caused an additional failure, that caused the test to start failing consistently.
,
Apr 14 2016
,
Apr 15 2016
https://codereview.chromium.org/1878943011/ added some logs and tried on linux_perf_bisect. It seems another frame throttling issue: a frame just become unthrottled is painted with dirty layout. Sami can you take a look? Link to log: https://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6421/steps/Performance%20Test%20%28With%20Patch%29%201%20of%201/logs/stdio
,
Apr 15 2016
Issue 602413 has been merged into this issue.
,
Apr 15 2016
,
Apr 15 2016
Apologies, I only saw this now. I'll take a look on Monday. Looks like no-one has managed to reproduce this locally?
,
Apr 15 2016
You can search for "!!!!@@@" in the log (link in #23) to find where the problem happened.
,
Apr 15 2016
So, just a timeline of the commits we've talked about: ## 3/31 02:29 ## https://chromium.googlesource.com/chromium/src/+/57a75864e012b06db30f4b027e24f5fd758e46a3 (large heap collection) is committed. This is the first failure that we know of for rasterize_and_record_micro.top_25_smooth, seen in these bisect results (http://bit.ly/1W4bzU5). However, if you look closely at the bisect results, you'll see that out of 20 runs, the test only failed once (returning an error code of 24?). ## 3/31 02:31 ## https://chromium.googlesource.com/chromium/src/+/846f623cb1ef78d90e620c2a7a6557544a86b692 (reland of position sticky) is committed, introducing flaky failures for several tests, including rasterize_and_record_micro.top_25_smooth. We can see this failure in the the same bisect as above. With this CL, the failures of the test jumped up from 1/20 to 19/20. Tests that become flaky/failing: blink_style.top_25, v8.top_25_smooth, rasterize_and_record_micro.top_25_smooth Perbot runs during this period: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14368 https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14369 https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14370 https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14371 https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14372 https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14373 Run 14368 is especially interesting, because rasterize_and_record_micro.top_25_smooth actually *passes*. If our bisect results are indicative, there was only a 1/20 change that this would happen. ## 3/31 11:15 ## https://chromium.googlesource.com/chromium/src/+/e37fce4deefe8c872c4329bdf4233858aab8b9a5 (revert of reland of position sticky) is committed. Flakiness in v8.top_25_smooth and blink_style.top_25_smooth goes away, but rasterize_and_record_micro.top_25_smooth failures continue. Perfbot runs during this period: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14374 https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14375 https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14376 ## 4/5 20:45 ## https://chromium.googlesource.com/chromium/src/+/c3d03859aa54039e399d2938c09f24fbcb1e8fd3 (revert of large heap collection) is committed. Test continues to fail. rasterize_and_record_micro.top_25_smooth hasn't passed since 14368 (over 10 days), which signals to me that even if the reland of position sticky introduced a flaky failure, some other CL has since introduced a consistent one.
,
Apr 17 2016
I finally figured out how to run a perf try job at a past revision: - Sync to that past CL and make the change git checkout -b my_perf_test a1b2c3 - Create the correct run-perf-test.cfg file that specifies what benchmark you want to run. I did this by running: tools/perf/run_benchmark try linux rasterize_and_record_micro.top_25_smooth and copying the file that it creates into my local checkout, although there could certainly be a better way of creating this file. - Upload your changes - After that, manually specify the revision that you're changes are based on git cl try -m tryserver.chromium.perf --bot=linux_perf_bisect -r a1b2c3 There might be an easier way to do this, but it was the only one I could find.
,
Apr 18 2016
(Also, big thanks to prasadv@ who helped me figure out how the heck to do that)
,
Apr 18 2016
Also, I've confirmed that the error we're seeing (problems with chrome!blink::FramePainter::paintContents) is happening at e37fce4deefe8c872c4329bdf4233858aab8b9a5 and isn't happening at 57a75864e012b06db30f4b027e24f5fd758e46a3. This isn't any new news, besides confirming that our good and bad revisions are okay. I'll be bisecting this afternoon (I'm OOO for the rest of the morning).
,
Apr 18 2016
charliea@, based on #27, bisecting and reproducing with older version is not necessary for now. Thanks for the findings.
,
Apr 18 2016
I think the problem is that we have a document with one or more throttled frames and the benchmark calls cc_blink::WebContentLayerImpl::PaintContentsToDisplayList() to rasterize the layer *without* allowing throttling. In this case shouldThrottleRendering() returns false (canThrottleRendering()) is true, so the painting code tries to paint stale contents and blows up. I can think of two ways to fix this: 1. Always allow throttling when painting (e.g., in CompositedLayerMapping::paintContents) 2. Make cc_blink::WebContentLayerImpl::PaintContentsToDisplayList do a synchronous lifecycle update without allowing throttling to make sure everything is up to date. Probable #2 makes more sense -- WDYT?
,
Apr 18 2016
I think we should allow throttling in ContentLayerDelegate::paintContents(). Normally WebContentLayerImpl::PaintContentsToDisplayList() just get the display list cached by the previous full lifecycle update without actual painting, so whether throttling is allowed doesn't matter. During rasterize_and_record_micro test, in some modes we force painting during WebContentLayerImpl::PaintContentsToDisplayList() to measure the painting performance. This painting should produce the same result as normal painting.
,
Apr 18 2016
,
Apr 18 2016
That makes sense. Proposed fix here: https://codereview.chromium.org/1898813002/
,
Apr 18 2016
Got it - sounds good. For my own curiosity, do we know what started the breakage?
,
Apr 19 2016
I suspect it started with https://codereview.chromium.org/1845223002.
,
Apr 20 2016
I'm disabling the benchmark per perf bot sheriffing instructions (https://chromium.googlesource.com/chromium/src/+/master/tools/perf/docs/perf_bot_sheriffing.md): https://codereview.chromium.org/1907463002/ I discussed this offline with skyostil@ and he will re-enable the benchmark as part of his fix.
,
Apr 20 2016
,
Apr 20 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/502afb658d9b946b123ced6d0c91c444adb23f19 commit 502afb658d9b946b123ced6d0c91c444adb23f19 Author: petrcermak <petrcermak@chromium.org> Date: Wed Apr 20 17:43:37 2016 Disable rasterize_and_record_micro.top_25_smooth on Linux and Android Rationale: The benchmark is causing persistent redness on one Linux and several Android perf bots: https://build.chromium.org/p/chromium.perf/builders/Linux%20Perf%20(5) https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20(2) https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5X%20Perf%20(2) https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus7v2%20Perf%20(2) https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus6%20Perf%20(2) https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20(2) https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20(2) TBR=skyostil BUG= 600377 , 605120 CQ_EXTRA_TRYBOTS=tryserver.chromium.perf:android_s5_perf_cq;tryserver.chromium.perf:winx64_10_perf_cq;tryserver.chromium.perf:mac_retina_perf_cq;tryserver.chromium.perf:linux_perf_cq Review URL: https://codereview.chromium.org/1907463002 Cr-Commit-Position: refs/heads/master@{#388528} [modify] https://crrev.com/502afb658d9b946b123ced6d0c91c444adb23f19/tools/perf/benchmarks/rasterize_and_record_micro.py
,
Apr 20 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/30585301cd73bf1e6cff2c9bf22422e515b728d8 commit 30585301cd73bf1e6cff2c9bf22422e515b728d8 Author: skyostil <skyostil@chromium.org> Date: Wed Apr 20 23:51:51 2016 Allow throttling in painting microbenchmark ContentLayerDelegate::paintContents() is being used as an alternate painting path by the rasterize and record microbenchmark. This benchmark can crash if any frame on the page is being throttled, because this way of painting doesn't enable throttling. This patch fixes that by enabling throttling, which also makes the benchmark more realistic since the real painting path also uses throttling. BUG= 600377 TEST=tools/perf/run_benchmark rasterize_and_record_micro.top_25_smooth --browser=debug --story-filter=Blogger Review URL: https://codereview.chromium.org/1898813002 Cr-Commit-Position: refs/heads/master@{#388598} [modify] https://crrev.com/30585301cd73bf1e6cff2c9bf22422e515b728d8/third_party/WebKit/Source/core/layout/compositing/CompositedLayerMapping.cpp [modify] https://crrev.com/30585301cd73bf1e6cff2c9bf22422e515b728d8/third_party/WebKit/Source/web/tests/FrameThrottlingTest.cpp
,
Apr 21 2016
The benchmark is now disabled on Linux, so I'm reducing the priority of this issue.
,
Apr 21 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/523707adcb03a0fa6123831eaa8cad0048f911a4 commit 523707adcb03a0fa6123831eaa8cad0048f911a4 Author: skyostil <skyostil@chromium.org> Date: Thu Apr 21 10:10:53 2016 Revert of Disable rasterize_and_record_micro.top_25_smooth on Linux and Android (patchset #2 id:20001 of https://codereview.chromium.org/1907463002/ ) Reason for revert: Benchmark fixed with https://codereview.chromium.org/1898813002. Original issue's description: > Disable rasterize_and_record_micro.top_25_smooth on Linux and Android > > Rationale: The benchmark is causing persistent redness on one Linux and > several Android perf bots: > > https://build.chromium.org/p/chromium.perf/builders/Linux%20Perf%20(5) > https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20(2) > https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5X%20Perf%20(2) > https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus7v2%20Perf%20(2) > https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus6%20Perf%20(2) > https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20(2) > https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20(2) > > TBR=skyostil > BUG= 600377 , 605120 > CQ_EXTRA_TRYBOTS=tryserver.chromium.perf:android_s5_perf_cq;tryserver.chromium.perf:winx64_10_perf_cq;tryserver.chromium.perf:mac_retina_perf_cq;tryserver.chromium.perf:linux_perf_cq TBR=skyostil@google.com,petrcermak@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 600377 , 605120 Review URL: https://codereview.chromium.org/1903903003 Cr-Commit-Position: refs/heads/master@{#388734} [modify] https://crrev.com/523707adcb03a0fa6123831eaa8cad0048f911a4/tools/perf/benchmarks/rasterize_and_record_micro.py
,
Apr 21 2016
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by rbyers@chromium.org
, Apr 4 2016