Flaky V8_Fatal in MarkCompactCollector::Sweeper::RawSweep during WebglConformance_deqp_functional_gles3_multisample
Issue description
Seen in:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48296
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48260
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48212
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48159
Stack:
#
# Fatal error in ../../v8/src/heap/mark-compact-inl.h, line 164
# Check failed: not_done.
#
#0 0x7fcd3aa39b27 base::debug::StackTrace::StackTrace()
#1 0x7fcd3d14fc75 gin::(anonymous namespace)::PrintStackTrace()
#2 0x7fcd3d01d30d V8_Fatal
#3 0x7fcd39fd1435 v8::internal::LiveObjectIterator<>::Next()
#4 0x7fcd39fd0cba v8::internal::MarkCompactCollector::Sweeper::RawSweep()
#5 0x7fcd39fc7ca4 v8::internal::MarkCompactCollector::Sweeper::ParallelSweepPage()
#6 0x7fcd39fc805d v8::internal::MarkCompactCollector::Sweeper::ParallelSweepSpace()
#7 0x7fcd39fd8c54 v8::internal::MarkCompactCollector::Sweeper::SweeperTask::Run()
#8 0x7fcd38d787c1 _ZNO4base8CallbackIFvvELNS_8internal8CopyModeE1ELNS2_10RepeatModeE1EE3RunEv
#9 0x7fcd3aab23c7 base::(anonymous namespace)::WorkerThread::ThreadMain()
#10 0x7fcd3aaa7a4c base::(anonymous namespace)::ThreadFunc()
#11 0x7fcd37d91182 start_thread
#12 0x7fcd3180747d clone
Going to mark the test as flaky to prevent CQ flakiness.
Apr 21 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/74486301cfa4958e9627a17d053ecf2886f74d50
commit 74486301cfa4958e9627a17d053ecf2886f74d50
Author: ynovikov <ynovikov@chromium.org>
Date: Fri Apr 21 20:41:28 2017
Mark a WebGL2 test Flaky
deqp/functional/gles3/multisample.html on linux nvidia
Used to be Fail on Linux NVIDIA Quadro P400, tentatively marking as Flaky there as well.
BUG=714207,702861
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Review-Url: https://codereview.chromium.org/2835693002
Cr-Commit-Position: refs/heads/master@{#466438}
[modify] https://crrev.com/74486301cfa4958e9627a17d053ecf2886f74d50/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
Apr 23 2017
+jgruber, next on V8 memory rotation
Apr 23 2017
Can we see on the flakiness dashboard how often this still flakes after being marked as flaky? Can we also see this for the continuous bots? From a superficial look, I didn't see it on V8's FYI bots: https://build.chromium.org/p/client.v8.fyi/builders/Linux%20Release%20%28NVIDIA%29?numbuilds=100
Apr 24 2017
Hi Jakob, can you please try to re-pro, this one is important.
Apr 24 2017
The flakiness dashboard shows this test failing twice recently:
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webgl2_conformance_tests&tests=WebglConformance_deqp_functional_gles3_multisample
The two red ones on the CI bot are indeed this check failure.
Apr 24 2017
As discussed offline, I am not sure how the WebGL conformance test launcher works, but I have created a query for Michal that he can use to find recent flakes.
Apr 24 2017
To run the tests locally:
content/test/gpu/run_gpu_integration_test.py webgl_conformance --webgl-conformance-version=2.0.1 --test-filter 'functional_gles3_multisample' --show-stdout -v --extra-browser-args='--js-flags=--expose-gc' --browser=exact --browser-executable=out/debug/chrome
No luck with a repro so far (and it's a very long-running test).
Apr 24 2017
Note that I marked it Flaky, so it won't show up as a failure. You'd need to analyze the logs or revert my CL locally.
Apr 24 2017
Thanks, I did unmark it locally, still no luck. Will keep trying tomorrow.
Apr 24 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/443a29819b5b55a5a9bf8fff7b6275a29028cfac
commit 443a29819b5b55a5a9bf8fff7b6275a29028cfac
Author: ynovikov <ynovikov@chromium.org>
Date: Mon Apr 24 19:04:11 2017
Fix WebglConformance_deqp_functional_gles3_multisample expectations
Fail on Linux NVIDIA Quadro P400, Flaky on rest on Linux NVIDIA, but explicitly restricting to GeForce GT 610, to avoid rule conflicts.
BUG=714207,702861
TBR=kbr@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Review-Url: https://codereview.chromium.org/2837883002
Cr-Commit-Position: refs/heads/master@{#466706}
[modify] https://crrev.com/443a29819b5b55a5a9bf8fff7b6275a29028cfac/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
Apr 25 2017
Still can't get a local repro. The test does fail, but not due to a crash:
multisample.fbo_max_samples.depth: Failure: Number of distinct colors detected is lower than sample count+1
FAIL Failure: Number of distinct colors detected is lower than sample count+1
Possibly related: http://crbug.com/704530
Apr 25 2017
FYI, there's also a failure on Intel GPUs: https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/9175.
Apr 25 2017
Sorry this is difficult to reproduce. You may have better luck running the tests from that shard (#9 of 15) by adding the following arguments to your run_gpu_integration_test.py invocation:
--read-abbreviated-json-results-from=content/test/data/gpu/webgl2_conformance_tests_output.json --shard-index=9 --total-shards=15
and removing the --test-filter option. This will increase the run time but may help repro. You might then be able to narrow down the set of tests using --test-filter=[regex] --filter-tests-after-sharding, which preserves the fact that this test runs on shard #9.
Apr 25 2017
Based on comment #13, it looks like you are hitting issue 702861 instead of this one. I guess you need to try a different GPU; the bots have an NVIDIA GeForce GT 610. Or you could try Mac Intel, since you've seen it happen there in #14. But it may be rarer there, since build 9175 is the only one where I see it happening.
Apr 26 2017
Comment 15: Thanks, good idea. Still no repro, unfortunately. Comment 16: Right. I gave it another try on my Windows machine (NVIDIA Quadro K620). So far the test hasn't failed at all on this machine: it has hit neither issue 702861 nor the crash from this bug. Is there a way to get access to a machine with a GT 610 (as on the bots)?
Apr 28 2017
You could take one of the machines out of the Swarming pool and try to reproduce on it. As you found, it also happened on macOS: https://bugs.chromium.org/p/chromium/issues/detail?id=714207#c14, so it can't really be that OS-specific, though I can believe it's hard to reproduce locally. Is there more logging you could add, when DCHECK_IS_ON() and when this particular DCHECK fails, that would help you debug it from the logs? That may be the most expedient way of debugging it.
Apr 28 2017
I've been running the tests directly on a trybot (ssh'd in following the instructions from [0]) for the past couple of hours; surprisingly, I still haven't seen a crash. I used the prebuilt version from https://chromium-swarm.appspot.com/task?id=35a2d0a8c0da3110&refresh=10&show_raw=1, and I'm running the tests for shard #9 with:
while DISPLAY=:0 content/test/gpu/run_gpu_integration_test.py webgl_conformance --webgl-conformance-version=2.0.1 --show-stdout -v --extra-browser-args='--js-flags=--expose-gc' --browser=exact --browser-executable=out/Release/chrome --read-abbreviated-json-results-from=content/test/data/gpu/webgl2_conformance_tests_output.json --shard-index=9 --total-shards=15 ; do echo 'No failures, retrying'; done
I also tried running only the affected test under GC stress. I will leave this running over the weekend, but I'm starting to lose hope that this approach will lead anywhere.
[0] https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions?pli=1
Apr 28 2017
Examining the dashboard in #7, I see that though this bug is rare, it seems platform-independent:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Retina%20Release%20(NVIDIA)/builds/1886
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Retina%20Release%20(NVIDIA)/builds/1912
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Release%20(Intel)/builds/2466
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20x64%20Release%20(NVIDIA)/builds/10273
FWIW, I've also examined "Linux Release (NVIDIA)" builds since https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48296, and there are no new occurrences there.
Apr 28 2017
jgruber@: if direct reproduction isn't working (and it might not, given how rarely this reproduces), then please consider adding logging to V8 for the situation where the DCHECK fires, so that you can make progress from the bots' logs, and also think about possible root causes and examine the code yourself. A crash like this is extremely serious, and it's important that progress continue to be made on tracking it down rather than giving up because it's difficult to reproduce locally.
May 2 2017
kbr@: This is definitely staying on our radar. Just chatted with the GC team, assigning this to Hannes to take a look.
May 2 2017
After staring at the code a bit I think I know what is going on. Instead of DCHECKing we should actually bail out. This is actually the same case as described a few lines above: "However, if there is a black area at the end of the page, and the last word is a one word filler, we are not allowed to advance. In that case we can return immediately." Uploading CL...
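To make the bail-out case concrete, here is a toy model in C++. It is a hypothetical simplification, not V8's actual LiveObjectIterator: it assumes one mark bit per word, treats an object as black when the bits for its first word and the following word are both set, and reads object sizes from a side table. The names `ToyPage`, `NextBlackObject` and `LiveObjects` are invented for illustration. When a marked object starts on the page's last word, its second mark bit would lie past the end of the bitmap, so the iterator cannot advance and returns instead of failing a check like the `not_done` check in the stack above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy model of a page's mark bitmap (not V8's real layout):
// one mark bit per word; a black object has the bits for its first word and
// the following word both set.
struct ToyPage {
  std::vector<bool> marks;         // one mark bit per word of the page
  std::vector<std::size_t> sizes;  // object size in words, indexed by start word
};

// Returns the start word of the next black object at or after `from`,
// or std::nullopt when there is nothing more to report.
std::optional<std::size_t> NextBlackObject(const ToyPage& page,
                                           std::size_t from) {
  const std::size_t n = page.marks.size();
  for (std::size_t i = from; i < n; ++i) {
    if (!page.marks[i]) continue;
    if (i + 1 == n) {
      // The second mark bit would be past the end of the bitmap, so we are
      // not allowed to advance. In this model that only happens for a
      // one-word filler at the end of a black area, which is safe to skip,
      // so bail out here instead of asserting.
      return std::nullopt;
    }
    if (page.marks[i + 1]) return i;  // two consecutive set bits: black
    // A lone set bit is not reported as a live object in this toy model.
  }
  return std::nullopt;
}

// Collects the start words of all black objects, skipping over each body.
std::vector<std::size_t> LiveObjects(const ToyPage& page) {
  std::vector<std::size_t> result;
  std::size_t pos = 0;
  while (auto start = NextBlackObject(page, pos)) {
    result.push_back(*start);
    // Guard against a missing size entry so the scan always makes progress.
    pos = *start + std::max<std::size_t>(1, page.sizes[*start]);
  }
  return result;
}
```

For example, a page with marks {1,1,0,0,1,1,1,1} and sizes {2,0,0,0,3,0,0,1} (a two-word black object at word 0, then a black area holding a three-word object at word 4 and a one-word filler at word 7) yields {0, 4}: the filler on the last word triggers the bail-out rather than a failed check.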
May 2 2017
Very nice! Good analysis.
May 3 2017
The following revision refers to this bug:
https://chromium.googlesource.com/v8/v8.git/+/f82a59ac30565dd0c4653a22ec7d96d14d050bf2
commit f82a59ac30565dd0c4653a22ec7d96d14d050bf2
Author: hpayer <hpayer@chromium.org>
Date: Wed May 03 10:11:21 2017
[heap] Fix live object iterator bail out case.
BUG=chromium:714207
Review-Url: https://codereview.chromium.org/2857003002
Cr-Commit-Position: refs/heads/master@{#45055}
[modify] https://crrev.com/f82a59ac30565dd0c4653a22ec7d96d14d050bf2/src/heap/mark-compact-inl.h
May 3 2017
That should fix it. Please re-open if you observe the crasher.
May 4 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/cbb8751cd7410acdb6535865a0a1beb41051fc96
commit cbb8751cd7410acdb6535865a0a1beb41051fc96
Author: ynovikov <ynovikov@chromium.org>
Date: Thu May 04 21:03:42 2017
Update WebGL2 expectations
deqp/functional/gles3/multisample.html flakiness is supposed to be fixed by https://codereview.chromium.org/2857953002/
BUG=714207
TBR=kbr@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Review-Url: https://codereview.chromium.org/2856403003
Cr-Commit-Position: refs/heads/master@{#469467}
[modify] https://crrev.com/cbb8751cd7410acdb6535865a0a1beb41051fc96/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
May 15 2017
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
May 15 2017
The following revision refers to this bug:
https://chromium.googlesource.com/v8/v8.git/+/ad4b7a1b683193a877bfd1c64a5f693ee43519d3
commit ad4b7a1b683193a877bfd1c64a5f693ee43519d3
Author: Hannes Payer <hpayer@chromium.org>
Date: Mon May 15 14:39:30 2017
Merged: [heap] Fix live object iterator bail out case.
Revision: f82a59ac30565dd0c4653a22ec7d96d14d050bf2
BUG=chromium:714207
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
R=mlippautz@chromium.org
Change-Id: Iae7b7032e1d72fe9574a61dc632d3411e1289109
Reviewed-on: https://chromium-review.googlesource.com/506072
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/branch-heads/5.9@{#49}
Cr-Branched-From: fe9bb7e6e251159852770160cfb21dad3cf03523-refs/heads/5.9.211@{#1}
Cr-Branched-From: 70ad23791a21c0dd7ecef8d4d8dd30ff6fc291f6-refs/heads/master@{#44591}
[modify] https://crrev.com/ad4b7a1b683193a877bfd1c64a5f693ee43519d3/src/heap/mark-compact-inl.h
May 17 2017
Per comment #30, this is already merged to M59. Hence, removing the "Merge-Approved-59" label.
May 18 2017
We are not planning any further M58 stable releases. Rejecting merge to M58.
Comment 1 by kbr@chromium.org, Apr 21 2017
Owner: hpayer@chromium.org
Status: Assigned (was: Unconfirmed)