New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 600377 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug-Regression



Sign in to add a comment

rasterize_and_record_micro.top_25_smooth crash on chromium.perf at 384235:384244

Project Member Reported by rbyers@chromium.org, Apr 4 2016

Issue description

Revision range first seen: 384235:384244
Link to failing step log: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14428/steps/rasterize_and_record_micro.top_25_smooth/logs/stdio

Stacks are broken:  issue 561447 

Will try a return-code bisect

If the test is disabled, please downgrade to Pri-2.

 
Cc: vmp...@chromium.org
Looks like it's not 100% consistent - I see crashes on different pages (sometimes Blogger, sometimes Google calendar, etc.).  Hopefully a return-code bisect can still deal with the non-determinism.

===== BISECT JOB RESULTS =====
Status: completed


===== SUSPECTED CL(s) =====
Subject : Large heap collection type hits assertion in Heap::allocationSizeFromSize
Author  : keishi
Commit description:
  
allocationSizeFromSize is called for large object page allocations as well.

BUG=597953

Review URL: https://codereview.chromium.org/1842263004

Cr-Commit-Position: refs/heads/master@{#384230}
Commit  : 57a75864e012b06db30f4b027e24f5fd758e46a3
Date    : Thu Mar 31 09:29:16 2016


===== TESTED REVISIONS =====
Revision                Exit Code   Std. Dev.   Num Values  Good?
chromium@384198         0           N/A         20          good
chromium@384221         0           N/A         20          good
chromium@384227         0           N/A         20          good
chromium@384229         0           N/A         20          good
chromium@384230         1           N/A         20          bad         <-
chromium@384233         1           N/A         20          bad
chromium@384244         1           N/A         20          bad

Bisect job ran on: linux_perf_bisect
Bug ID: 600377

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --also-run-disabled-tests rasterize_and_record_micro.top_25_smooth
Test Metric: rasterize_time/https___www.google.com_calendar_
Relative Change: Zero to non-zero
Score: 0.0

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6370
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9016302404771402992


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=600377

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!

Comment 4 by rmis...@google.com, Apr 5 2016

Cc: nednguyen@chromium.org rmis...@chromium.org
 Issue 597589  has been merged into this issue.
Cc: rbyers@chromium.org
Owner: keishi@chromium.org
Status: Assigned (was: Untriaged)
keishi@ it looks like your CL is causing crashes on the perf bots here (I was worried it was a little non-deterministic, but it looks like this test passed 20 times without your CL then failed 20 time with your CL).  Can you please investigate ASAP (or roll your change back)?

Sorry there are no stacks from the bots, but it should be easy to reproduce locally.  I'm happy to help if necessary.
Owner: flackr@chromium.org
I reverted my CL with r385379, but the bot is still crashing.
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14456 (r385387)
I ran a bisect locally but the crash was flaky and couldn't pinpoint the cause.
It seems to be crashing at either PaintLayerScrollableArea::invalidateAllStickyConstraints or FramePainter::paintContents.
I don't see any suspicious changes.

+flackr since you seem to be working on sticky, could you take a look?
Thanks!
Owner: wangxianzhu@chromium.org
My CL which introduced PaintLayerScrollableArea::invalidateAllStickyConstraints was reverted in #384331 https://chromium.googlesource.com/chromium/src/+/e37fce4deefe8c872c4329bdf4233858aab8b9a5# as it was indeed crashing on other bots. I'm currently working on fixing the crash and relanding but it should not be causing any current problems in the bots. I only see crashes in FramePainter::paintContents on the linked crash.

+wangxianzhu, since you've somewhat recently touched FramePainter. Any idea where the FramePainter::paintContents crash may be coming from?
I saw stack information on bot like this:

	 0  chrome!blink::FramePainter::paintContents [FramePainter.cpp : 124 + 0x0]

Can this be trusted?
Mergedinto: 590856
Status: Duplicate (was: Assigned)
Blockedon: 561447
Status: Assigned (was: Duplicate)
wangxianzhu@ this specific issue (a number of perf tests becoming very flaky) is new in the past few days.  Can the old issue 590856 really be the cause of this?  

Regardless this crash is causing active bot redness so we should keep this bug open (as Pri-1) as long as the perf bots are still crashing.

Given all the confusion over different possible sources of crashes and flakiness, I'd say that reliable stacks in the logs are really essential for making progress on issues like this.  Blocking on  issue 561447 .

Comment 11 by dyen@chromium.org, Apr 7 2016

The stack traces should be fixed already (I even added a unittest to test it for this case). I am only keeping  issue 561447  open while I do some refactoring.

Re: #8, yes that should be correct now. I fixed it earlier in the week. The initial failure probably happened right before my fix landed.

Comment 12 by dyen@chromium.org, Apr 7 2016

Cc: dyen@chromium.org
We use bug 590856 to track all RELEASE_ASSERT(!m_frameView.needsLayout()) failures in FramePainter.cpp. We have fixed some root reasons of them but there are still crash reports so it's still open.

As this affects perf bot, I agree to keep this open. Will update bug 590856 after we figure out the root reason of this bug.
Blockedon: -561447
Tried many times but still could't reproduce the crash locally (except the crashes in debug build, recorded in #140~#142 in bug 590856).

rbyers@ can you run the bisect again?


I seems that the current crash is different from the crash after 384231. My try job shows that 384300 crashes at code related to position:sticky (which has been reverted in 384331):
	 0  chrome!blink::PaintLayerScrollableArea::invalidateAllStickyConstraints [OwnPtr.h : 61 + 0x8]
	 1  chrome!blink::LayoutBoxModelObject::invalidateStickyConstraints [LayoutBoxModelObject.cpp : 332 + 0x5]
	 2  chrome!blink::LayoutBlock::updateAfterLayout [LayoutBlock.cpp : 888 + 0x5]

Tried 383432 (after position:sticky is reverted). The stack trace isn't symbolized because of  bug 561447 . It seems that we have another regression between position:sticky was landed and reverted. Another possibility is that the revert of position:sticky is not complete.

flackr@ can you verify if the revert of position:sticky is complete? Are there any CLs related to position:sticky other than 384231? 

dyen@ can I try on perf bot with the stack trace issue fixed on a previous revision? Which revision fixed the stack trace issue?

 
flackr@, dyen@ can you answer the questions in #15? Thanks.
Cc: flackr@chromium.org
flackr@ can you answer the question in #15? Thanks.

Comment 18 by dyen@chromium.org, Apr 12 2016

The fix for the stack trace issue was actually in catapult. If you are using Chromium CLs you would need to find the catapult roll CL. It is in r385180:

commit 4dbb26170e3e350d7b7ce17703e48bebfc3cf8ad
Author:     catapult-deps-roller <catapult-deps-roller@chromium.org>
AuthorDate: Tue Apr 5 09:03:34 2016
Commit:     Commit bot <commit-bot@chromium.org>
CommitDate: Tue Apr 5 09:04:50 2016

    Roll src/third_party/catapult/ 6b041c490..6d71ed556 (6 commits).
Yes, the revert of sticky was complete. There had only been one sticky related CL to land which was reverted as you pointed out.

Comment 20 by dtu@chromium.org, Apr 14 2016

Cc: rnep...@chromium.org sullivan@chromium.org charliea@chromium.org
 Issue 602439  has been merged into this issue.

Comment 21 by dtu@chromium.org, Apr 14 2016

Sheriff peeking in, here's what it looks like to me so far:

The test started failing on build 14369, which has a regression range of r384233:r384244.
But it appears that the failure was flaky at that time (comment 2).

Flake caused the bisect (comment 3) to fail incorrectly. If you dig into the bisect results:
chromium@384230[0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
chromium@384231[]             
chromium@384232[]             
chromium@384233[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

The test passed at r384230 and failed at r384233. This, and the stack trace in comment 6 imply that flackr's sticky CL (r384231) did cause the test to start failing flakily.

But the current stack trace (issue 590856 comment 140) is a different stack trace, and the revert of sticky (r384331) didn't resolve the failure. That implies that a second CL between the land and revert of sticky (with a range of r384231:r384331) caused an additional failure, that caused the test to start failing consistently.

Comment 22 by dtu@chromium.org, Apr 14 2016

Labels: OS-Linux
Cc: wangxianzhu@chromium.org
Components: Blink>Scheduling
Owner: skyos...@chromium.org
https://codereview.chromium.org/1878943011/ added some logs and tried on linux_perf_bisect. It seems another frame throttling issue: a frame just become unthrottled is painted with dirty layout.

Sami can you take a look?

Link to log: https://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6421/steps/Performance%20Test%20%28With%20Patch%29%201%20of%201/logs/stdio
Cc: aiolos@chromium.org jbudorick@chromium.org stip@chromium.org reve...@chromium.org dtu@chromium.org jo...@chromium.org m...@chromium.org
 Issue 602413  has been merged into this issue.
Cc: wnwen@chromium.org
 Issue 602297  has been merged into this issue.
Apologies, I only saw this now. I'll take a look on Monday. Looks like no-one has managed to reproduce this locally?
You can search for "!!!!@@@" in the log (link in #23) to find where the problem happened.

Comment 28 Deleted

So, just a timeline of the commits we've talked about:

## 3/31 02:29 ##
https://chromium.googlesource.com/chromium/src/+/57a75864e012b06db30f4b027e24f5fd758e46a3 (large heap collection) is committed. This is the first failure that we know of for rasterize_and_record_micro.top_25_smooth, seen in these bisect results (http://bit.ly/1W4bzU5). However, if you look closely at the bisect results, you'll see that out of 20 runs, the test only failed once (returning an error code of 24?).

## 3/31 02:31 ##
https://chromium.googlesource.com/chromium/src/+/846f623cb1ef78d90e620c2a7a6557544a86b692 (reland of position sticky) is committed, introducing flaky failures for several tests, including rasterize_and_record_micro.top_25_smooth. We can see this failure in the the same bisect as above. With this CL, the failures of the test jumped up from 1/20 to 19/20.

Tests that become flaky/failing: blink_style.top_25, v8.top_25_smooth, rasterize_and_record_micro.top_25_smooth

Perbot runs during this period:
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14368
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14369
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14370
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14371
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14372
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14373

Run 14368 is especially interesting, because rasterize_and_record_micro.top_25_smooth actually *passes*. If our bisect results are indicative, there was only a 1/20 change that this would happen.

## 3/31 11:15 ##

https://chromium.googlesource.com/chromium/src/+/e37fce4deefe8c872c4329bdf4233858aab8b9a5 (revert of reland of position sticky) is committed. Flakiness in v8.top_25_smooth and blink_style.top_25_smooth goes away, but rasterize_and_record_micro.top_25_smooth failures continue.

Perfbot runs during this period:
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14374
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14375
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf%20%285%29/builds/14376

## 4/5 20:45 ## 

https://chromium.googlesource.com/chromium/src/+/c3d03859aa54039e399d2938c09f24fbcb1e8fd3 (revert of large heap collection) is committed. Test continues to fail.





rasterize_and_record_micro.top_25_smooth hasn't passed since 14368 (over 10 days), which signals to me that even if the reland of position sticky introduced a flaky failure, some other CL has since introduced a consistent one.
I finally figured out how to run a perf try job at a past revision:

- Sync to that past CL and make the change

  git checkout -b my_perf_test a1b2c3


- Create the correct run-perf-test.cfg file that specifies what benchmark you want to run. I did this by running:

  tools/perf/run_benchmark try linux rasterize_and_record_micro.top_25_smooth

and copying the file that it creates into my local checkout, although there could certainly be a better way of creating this file.

- Upload your changes

- After that, manually specify the revision that you're changes are based on 

  git cl try -m tryserver.chromium.perf --bot=linux_perf_bisect -r a1b2c3

There might be an easier way to do this, but it was the only one I could find.
(Also, big thanks to prasadv@ who helped me figure out how the heck to do that)
Also, I've confirmed that the error we're seeing (problems with chrome!blink::FramePainter::paintContents) is happening at  e37fce4deefe8c872c4329bdf4233858aab8b9a5 and isn't happening at 57a75864e012b06db30f4b027e24f5fd758e46a3. This isn't any new news, besides confirming that our good and bad revisions are okay. I'll be bisecting this afternoon (I'm OOO for the rest of the morning).
charliea@, based on #27, bisecting and reproducing with older version is not necessary for now. Thanks for the findings.
I think the problem is that we have a document with one or more throttled frames and the benchmark calls cc_blink::WebContentLayerImpl::PaintContentsToDisplayList() to rasterize the layer *without* allowing throttling. In this case shouldThrottleRendering() returns false (canThrottleRendering()) is true, so the painting code tries to paint stale contents and blows up.

I can think of two ways to fix this:

1. Always allow throttling when painting (e.g., in CompositedLayerMapping::paintContents)

2. Make cc_blink::WebContentLayerImpl::PaintContentsToDisplayList do a synchronous lifecycle update without allowing throttling to make sure everything is up to date.

Probable #2 makes more sense -- WDYT?
I think we should allow throttling in ContentLayerDelegate::paintContents().

Normally WebContentLayerImpl::PaintContentsToDisplayList() just get the display list cached by the previous full lifecycle update without actual painting, so whether throttling is allowed doesn't matter.

During rasterize_and_record_micro test, in some modes we force painting during WebContentLayerImpl::PaintContentsToDisplayList() to measure the painting performance. This painting should produce the same result as normal painting.

Comment 36 by wnwen@chromium.org, Apr 18 2016

Cc: -wnwen@chromium.org
Status: Started (was: Assigned)
That makes sense. Proposed fix here: https://codereview.chromium.org/1898813002/
Got it - sounds good. For my own curiosity, do we know what started the breakage?
I suspect it started with https://codereview.chromium.org/1845223002.
I'm disabling the benchmark per perf bot sheriffing instructions (https://chromium.googlesource.com/chromium/src/+/master/tools/perf/docs/perf_bot_sheriffing.md): https://codereview.chromium.org/1907463002/

I discussed this offline with skyostil@ and he will re-enable the benchmark as part of his fix.
Cc: petrcermak@chromium.org
Project Member

Comment 42 by bugdroid1@chromium.org, Apr 20 2016

Project Member

Comment 43 by bugdroid1@chromium.org, Apr 20 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/30585301cd73bf1e6cff2c9bf22422e515b728d8

commit 30585301cd73bf1e6cff2c9bf22422e515b728d8
Author: skyostil <skyostil@chromium.org>
Date: Wed Apr 20 23:51:51 2016

Allow throttling in painting microbenchmark

ContentLayerDelegate::paintContents() is being used as an alternate
painting path by the rasterize and record microbenchmark. This benchmark
can crash if any frame on the page is being throttled, because this
way of painting doesn't enable throttling. This patch fixes that by
enabling throttling, which also makes the benchmark more realistic since
the real painting path also uses throttling.

BUG= 600377 
TEST=tools/perf/run_benchmark rasterize_and_record_micro.top_25_smooth --browser=debug --story-filter=Blogger

Review URL: https://codereview.chromium.org/1898813002

Cr-Commit-Position: refs/heads/master@{#388598}

[modify] https://crrev.com/30585301cd73bf1e6cff2c9bf22422e515b728d8/third_party/WebKit/Source/core/layout/compositing/CompositedLayerMapping.cpp
[modify] https://crrev.com/30585301cd73bf1e6cff2c9bf22422e515b728d8/third_party/WebKit/Source/web/tests/FrameThrottlingTest.cpp

Labels: -Pri-1 Pri-2
The benchmark is now disabled on Linux, so I'm reducing the priority of this issue.
Project Member

Comment 45 by bugdroid1@chromium.org, Apr 21 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/523707adcb03a0fa6123831eaa8cad0048f911a4

commit 523707adcb03a0fa6123831eaa8cad0048f911a4
Author: skyostil <skyostil@chromium.org>
Date: Thu Apr 21 10:10:53 2016

Revert of Disable rasterize_and_record_micro.top_25_smooth on Linux and Android (patchset #2 id:20001 of https://codereview.chromium.org/1907463002/ )

Reason for revert:
Benchmark fixed with https://codereview.chromium.org/1898813002.

Original issue's description:
> Disable rasterize_and_record_micro.top_25_smooth on Linux and Android
>
> Rationale: The benchmark is causing persistent redness on one Linux and
> several Android perf bots:
>
> https://build.chromium.org/p/chromium.perf/builders/Linux%20Perf%20(5)
> https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20(2)
> https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5X%20Perf%20(2)
> https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus7v2%20Perf%20(2)
> https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus6%20Perf%20(2)
> https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20(2)
> https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20(2)
>
> TBR=skyostil
> BUG= 600377 , 605120 
> CQ_EXTRA_TRYBOTS=tryserver.chromium.perf:android_s5_perf_cq;tryserver.chromium.perf:winx64_10_perf_cq;tryserver.chromium.perf:mac_retina_perf_cq;tryserver.chromium.perf:linux_perf_cq

TBR=skyostil@google.com,petrcermak@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 600377 , 605120 

Review URL: https://codereview.chromium.org/1903903003

Cr-Commit-Position: refs/heads/master@{#388734}

[modify] https://crrev.com/523707adcb03a0fa6123831eaa8cad0048f911a4/tools/perf/benchmarks/rasterize_and_record_micro.py

Status: Fixed (was: Started)

Sign in to add a comment