
Issue 714207


Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug-Regression




Flaky V8_Fatal in MarkCompactCollector::Sweeper::RawSweep during WebglConformance_deqp_functional_gles3_multisample

Project Member Reported by ynovikov@chromium.org, Apr 21 2017

Issue description

Seen in:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48296
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48260
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48212
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48159

Stack:
#
# Fatal error in ../../v8/src/heap/mark-compact-inl.h, line 164
# Check failed: not_done.
#
#0 0x7fcd3aa39b27 base::debug::StackTrace::StackTrace()
#1 0x7fcd3d14fc75 gin::(anonymous namespace)::PrintStackTrace()
#2 0x7fcd3d01d30d V8_Fatal
#3 0x7fcd39fd1435 v8::internal::LiveObjectIterator<>::Next()
#4 0x7fcd39fd0cba v8::internal::MarkCompactCollector::Sweeper::RawSweep()
#5 0x7fcd39fc7ca4 v8::internal::MarkCompactCollector::Sweeper::ParallelSweepPage()
#6 0x7fcd39fc805d v8::internal::MarkCompactCollector::Sweeper::ParallelSweepSpace()
#7 0x7fcd39fd8c54 v8::internal::MarkCompactCollector::Sweeper::SweeperTask::Run()
#8 0x7fcd38d787c1 _ZNO4base8CallbackIFvvELNS_8internal8CopyModeE1ELNS2_10RepeatModeE1EE3RunEv
#9 0x7fcd3aab23c7 base::(anonymous namespace)::WorkerThread::ThreadMain()
#10 0x7fcd3aaa7a4c base::(anonymous namespace)::ThreadFunc()
#11 0x7fcd37d91182 start_thread
#12 0x7fcd3180747d clone

Going to mark the test as flaky to prevent CQ flakiness.
 

Comment 1 by kbr@chromium.org, Apr 21 2017

Cc: u...@chromium.org
Owner: hpayer@chromium.org
Status: Assigned (was: Unconfirmed)
Hannes, could you please take and assign this as necessary?

Comment 2 by kbr@chromium.org, Apr 21 2017

Cc: machenb...@chromium.org hablich@chromium.org

Comment 3 by bugdroid1@chromium.org, Apr 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/74486301cfa4958e9627a17d053ecf2886f74d50

commit 74486301cfa4958e9627a17d053ecf2886f74d50
Author: ynovikov <ynovikov@chromium.org>
Date: Fri Apr 21 20:41:28 2017

Mark a WebGL2 test Flaky

deqp/functional/gles3/multisample.html
on linux nvidia

Used to be Fail on Linux NVIDIA Quadro P400,
tentatively marking as Flaky there as well.

BUG= 714207 , 702861 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2835693002
Cr-Commit-Position: refs/heads/master@{#466438}

[modify] https://crrev.com/74486301cfa4958e9627a17d053ecf2886f74d50/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Cc: jgruber@chromium.org
+jgruber, next on V8 memory rotation
Cc: serg...@chromium.org
Can we see on the flakiness dashboard how often this flakes, even after it was marked as flaky? Can we also see this for the continuous bots? From a superficial look, I didn't see it on V8's FYI bots:
https://build.chromium.org/p/client.v8.fyi/builders/Linux%20Release%20%28NVIDIA%29?numbuilds=100

Comment 6 by hpayer@chromium.org, Apr 24 2017

Cc: -jgruber@chromium.org hpayer@chromium.org
Owner: jgruber@chromium.org
Hi Jakob,
can you please try to repro? This one is important.

Flakiness dashboard shows this test failing twice recently:
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webgl2_conformance_tests&tests=WebglConformance_deqp_functional_gles3_multisample

The two red ones on the CI bot are indeed this check failure...
Cc: -serg...@chromium.org
As discussed offline, I am not sure how the WebGL conformance test launcher works, but I have created a query for Michal that he can use to find recent flakes.
To run the tests locally:

content/test/gpu/run_gpu_integration_test.py webgl_conformance --webgl-conformance-version=2.0.1 --test-filter 'functional_gles3_multisample' --show-stdout -v --extra-browser-args='--js-flags=--expose-gc' --browser=exact --browser-executable=out/debug/chrome

No luck with a repro so far (and it's a very long-running test).
Note I marked it Flaky, so it won't show as a failure. You'd need to
analyze the logs or revert my CL locally.
Thanks, I did unmark it locally, still no luck. Will keep trying tomorrow.

Comment 12 by bugdroid1@chromium.org, Apr 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/443a29819b5b55a5a9bf8fff7b6275a29028cfac

commit 443a29819b5b55a5a9bf8fff7b6275a29028cfac
Author: ynovikov <ynovikov@chromium.org>
Date: Mon Apr 24 19:04:11 2017

Fix WebglConformance_deqp_functional_gles3_multisample expectations

Fail on Linux NVIDIA Quadro P400,
Flaky on rest on Linux NVIDIA,
but explicitly restricting to GeForce GT 610,
to avoid rule conflicts.

BUG= 714207 ,  702861 
TBR=kbr@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2837883002
Cr-Commit-Position: refs/heads/master@{#466706}

[modify] https://crrev.com/443a29819b5b55a5a9bf8fff7b6275a29028cfac/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Still can't get a local repro. The test does fail, but not due to a crash:

  multisample.fbo_max_samples.depth: Failure: Number of distinct colors detected is lower than sample count+1
  FAIL Failure: Number of distinct colors detected is lower than sample count+1

Possibly related:  http://crbug.com/704530 

Comment 15 by kbr@chromium.org, Apr 25 2017

Sorry this is difficult to reproduce.

You may have better luck by running the tests from that shard (#9 of 15), by adding the following arguments to your run_gpu_integration_test.py invocation:

  --read-abbreviated-json-results-from=content/test/data/gpu/webgl2_conformance_tests_output.json --shard-index=9 --total-shards=15

and removing the --test-filter option. This will increase the run time but may help repro.

You might then be able to narrow down the set of tests using --test-filter=[regex] --filter-tests-after-sharding , to preserve the fact that this test runs on shard #9.
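Why the filter order matters can be shown with a small sketch. It assumes, purely for illustration, that the harness assigns tests round-robin (test i goes to shard i % total_shards); the real assignment in run_gpu_integration_test.py may differ, and Shard and Filter are hypothetical helpers, not harness APIs. Filtering after sharding only ever narrows what shard #9 would normally run, while filtering first re-deals the filtered subset across shards.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical round-robin sharding (an assumption, not the harness's code):
// the i-th test in the list goes to shard i % total_shards.
std::vector<std::string> Shard(const std::vector<std::string>& tests,
                               size_t shard_index, size_t total_shards) {
  std::vector<std::string> out;
  for (size_t i = 0; i < tests.size(); ++i) {
    if (i % total_shards == shard_index) out.push_back(tests[i]);
  }
  return out;
}

// Keeps only tests whose names contain a match for the pattern,
// similar in spirit to a --test-filter=[regex] option.
std::vector<std::string> Filter(const std::vector<std::string>& tests,
                                const std::regex& pattern) {
  std::vector<std::string> out;
  for (const auto& name : tests) {
    if (std::regex_search(name, pattern)) out.push_back(name);
  }
  return out;
}
```

With 30 tests named t0..t29 and the pattern "t2", Filter(Shard(all, 9, 15), re) yields only t24, a test that shard #9 runs anyway, whereas Shard(Filter(all, re), 9, 15) yields t28, which would normally run on a different shard.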

Labels: GPU-Intel OS-Mac
Summary: Flaky V8_Fatal in MarkCompactCollector::Sweeper::RawSweep during WebglConformance_deqp_functional_gles3_multisample on Linux NVIDIA and Mac Intel (was: Flaky V8_Fatal in MarkCompactCollector::Sweeper::RawSweep during WebglConformance_deqp_functional_gles3_multisample on Linux NVIDIA)
Based on comment #13, it looks like you are hitting issue 702861 instead of this one.
I guess you need to try a different GPU. The bots have NVIDIA GeForce GT 610.

Or, you could try Mac Intel, since you've seen it happen there in #14.
But it may be rarer there, since build 9175 is the only one where I see it happening.
Comment 15: Thanks, good idea. Still no repro unfortunately.

Comment 16: Right. I gave it another try on my Windows machine (NVIDIA Quadro K620). So far the test hasn't failed at all on this machine, hitting neither issue 702861 nor the crash from this bug.

Is there a way to get access to a machine with a GT610 (as on the bots)?

Comment 18 by kbr@chromium.org, Apr 28 2017

You could take one of the machines out of the Swarming pool and try to reproduce on it.

As you found, it was also seen on macOS (https://bugs.chromium.org/p/chromium/issues/detail?id=714207#c14), so it can't really be that OS-specific, though I can believe it's hard to reproduce locally.

Could you add more logging, enabled when DCHECK_IS_ON() and emitted when this particular DCHECK fails, that would help you debug it from the bots' logs? That may be the most expedient way of debugging it.

I've been running the tests on a trybot (ssh'd in following the instructions from [0]) for the past couple of hours, and surprisingly I still haven't seen a crash.

I used the prebuilt version from https://chromium-swarm.appspot.com/task?id=35a2d0a8c0da3110&refresh=10&show_raw=1, and I'm running the tests for shard #9 with:

while DISPLAY=:0 content/test/gpu/run_gpu_integration_test.py webgl_conformance --webgl-conformance-version=2.0.1 --show-stdout -v --extra-browser-args='--js-flags=--expose-gc' --browser=exact --browser-executable=out/Release/chrome --read-abbreviated-json-results-from=content/test/data/gpu/webgl2_conformance_tests_output.json --shard-index=9 --total-shards=15 ; do echo 'No failures, retrying'; done

I also tried running only the affected test with GC stressing enabled.

Will leave this running over the weekend, but I'm starting to lose hope that this approach is leading anywhere.

[0] https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions?pli=1
Labels: -OS-Linux -GPU-Intel -GPU-NVidia -OS-Mac OS-All
Summary: Flaky V8_Fatal in MarkCompactCollector::Sweeper::RawSweep during WebglConformance_deqp_functional_gles3_multisample (was: Flaky V8_Fatal in MarkCompactCollector::Sweeper::RawSweep during WebglConformance_deqp_functional_gles3_multisample on Linux NVIDIA and Mac Intel)
Examining the dashboard in #7, I see that, though this bug is rare, it seems platform-independent:

https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Retina%20Release%20(NVIDIA)/builds/1886
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Retina%20Release%20(NVIDIA)/builds/1912
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Release%20(Intel)/builds/2466
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20x64%20Release%20(NVIDIA)/builds/10273

FWIW, I've also examined "Linux Release (NVIDIA)" builds since https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48296, and there are no new occurrences there.

Comment 21 by kbr@chromium.org, Apr 28 2017

jgruber@: if direct reproduction isn't working (and it might not, given how rarely this reproduces), please consider two things: adding logging to V8 that fires when the DCHECK fails, so that you can make progress from the bots' logs, and reasoning about possible root causes by examining the code yourself.

A crash like this is extremely serious and it's important that progress continue to be made on tracking it down, rather than giving up because it's difficult to reproduce locally.

Cc: -hpayer@chromium.org jgruber@chromium.org
Owner: hpayer@chromium.org
kbr@: This is definitely staying on our radar. Just chatted with the GC team, assigning this to Hannes to take a look.
After staring at the code a bit I think I know what is going on. Instead of DCHECKing we should actually bail out. This is actually the same case as described a few lines above: "However, if there is a black area at the end of the page, and the last word is a one word filler, we are not allowed to advance. In that case we can return immediately."
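The bail-out can be illustrated with a deliberately simplified, hypothetical model of a page's mark bitmap. Assumptions (not V8's real layout or code): one mark bit per word, a black object is encoded as two consecutive set bits at its first word, every bit inside a black area is set, and object sizes come from a side table instead of the object's map. Page, LiveObject and this LiveObjectIterator are illustrative names only; the actual fix is in src/heap/mark-compact-inl.h.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of one page: one mark bit per word, plus object sizes
// (in words) stored at each object's start index. NOT V8's data layout.
struct Page {
  std::vector<bool> mark_bits;
  std::vector<size_t> size_words;
};

struct LiveObject {
  size_t start;  // index of the object's first word
  size_t size;   // object size in words
};

class LiveObjectIterator {
 public:
  explicit LiveObjectIterator(const Page& page) : page_(page) {}

  // Returns the next black object on the page, or std::nullopt when done.
  std::optional<LiveObject> Next() {
    const size_t n = page_.mark_bits.size();
    while (pos_ < n) {
      if (!page_.mark_bits[pos_]) {
        ++pos_;  // unmarked word: keep scanning
        continue;
      }
      // Bail-out case: the first mark bit is on the page's last word, e.g. a
      // one-word filler at the end of a black area. There is no room for a
      // second bit, so we must not advance. Instead of asserting, report
      // that iteration is finished.
      if (pos_ + 1 >= n) {
        pos_ = n;
        return std::nullopt;
      }
      if (!page_.mark_bits[pos_ + 1]) {
        ++pos_;  // "10" pattern (grey in this model): not reported, skipped
        continue;
      }
      const size_t size = page_.size_words[pos_];
      LiveObject obj{pos_, size};
      pos_ += (size > 0 ? size : 1);  // skip the object's body (all bits set in black areas)
      return obj;
    }
    return std::nullopt;
  }

 private:
  const Page& page_;
  size_t pos_ = 0;
};
```

On an 8-word page with a 2-word object at word 0 and a black area covering words 5 to 7 (a 2-word object at 5 followed by a one-word filler at 7), Next() returns the objects at 0 and 5 and then stops cleanly at the filler instead of reading past the end of the bitmap.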

Uploading CL...

Comment 24 by kbr@chromium.org, May 2 2017

Very nice! Good analysis.


Comment 25 by bugdroid1@chromium.org, May 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/f82a59ac30565dd0c4653a22ec7d96d14d050bf2

commit f82a59ac30565dd0c4653a22ec7d96d14d050bf2
Author: hpayer <hpayer@chromium.org>
Date: Wed May 03 10:11:21 2017

[heap] Fix live object iterator bail out case.

BUG= chromium:714207 

Review-Url: https://codereview.chromium.org/2857003002
Cr-Commit-Position: refs/heads/master@{#45055}

[modify] https://crrev.com/f82a59ac30565dd0c4653a22ec7d96d14d050bf2/src/heap/mark-compact-inl.h

Status: Fixed (was: Assigned)
That should fix it. Please re-open if you observe the crasher.

Comment 27 by bugdroid1@chromium.org, May 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cbb8751cd7410acdb6535865a0a1beb41051fc96

commit cbb8751cd7410acdb6535865a0a1beb41051fc96
Author: ynovikov <ynovikov@chromium.org>
Date: Thu May 04 21:03:42 2017

Update WebGL2 expectations

deqp/functional/gles3/multisample.html flakiness is supposed to be fixed
by https://codereview.chromium.org/2857953002/

BUG= 714207 
TBR=kbr@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2856403003
Cr-Commit-Position: refs/heads/master@{#469467}

[modify] https://crrev.com/cbb8751cd7410acdb6535865a0a1beb41051fc96/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Labels: Merge-Request-58 Merge-Request-59

Comment 29 by sheriffbot@chromium.org, May 15 2017

Labels: -Merge-Request-59 Hotlist-Merge-Approved Merge-Approved-59
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 30 by bugdroid1@chromium.org, May 15 2017

Labels: merge-merged-5.9
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/ad4b7a1b683193a877bfd1c64a5f693ee43519d3

commit ad4b7a1b683193a877bfd1c64a5f693ee43519d3
Author: Hannes Payer <hpayer@chromium.org>
Date: Mon May 15 14:39:30 2017

Merged: [heap] Fix live object iterator bail out case.

Revision: f82a59ac30565dd0c4653a22ec7d96d14d050bf2

BUG= chromium:714207 
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
R=mlippautz@chromium.org

Change-Id: Iae7b7032e1d72fe9574a61dc632d3411e1289109
Reviewed-on: https://chromium-review.googlesource.com/506072
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/branch-heads/5.9@{#49}
Cr-Branched-From: fe9bb7e6e251159852770160cfb21dad3cf03523-refs/heads/5.9.211@{#1}
Cr-Branched-From: 70ad23791a21c0dd7ecef8d4d8dd30ff6fc291f6-refs/heads/master@{#44591}
[modify] https://crrev.com/ad4b7a1b683193a877bfd1c64a5f693ee43519d3/src/heap/mark-compact-inl.h

Labels: -Merge-Approved-59
Per comment #30, this is already merged to M59. Hence, removing the "Merge-Approved-59" label.
Labels: -Merge-Request-58 Merge-Rejected-58
We are not planning any further M58 stable releases. Rejecting merge to M58.
