
Issue 614142

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug




Latest v8 roll introduced flakiness into GPU bots

Reported by zmo@chromium.org (Project Member), May 23 2016

Issue description

This is the roll:
https://codereview.chromium.org/2004883002

This is one of the failing bots:
https://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29

This is one of the builds that failed:
https://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29/builds/62138

It's also happening on the Linux Release bots, so it affects our CQ.


 

Comment 1 by kbr@chromium.org, May 23 2016

The crash stack is as follows:

	Thread 14 (crashed)
	 0  libstdc++.so.6.0.19 + 0x749e4
	 1  libstdc++.so.6.0.19 + 0x74cc1
	 2  libv8.so!_M_insert_unique_<std::pair<v8::internal::JSArrayBuffer *const, std::pair<void *, unsigned long> > > [stl_tree.h : 973 + 0xe]
	 3  libv8.so!AddLive [stl_map.h : 579 + 0xb]
	 4  libv8.so!Process<(lambda at ../../v8/src/heap/array-buffer-tracker-inl.h:51:15)> [array-buffer-tracker-inl.h : 24 + 0xb]
	 5  libv8.so!EvacuatePage [array-buffer-tracker-inl.h : 51 + 0x5]
	 6  libv8.so!RunInternal [mark-compact.cc : 3213 + 0x5]
	 7  libgin.so!void base::internal::RunnableAdapter<void (gin::Timer::*)()>::Run<gin::Timer*>(gin::Timer*&&) + 0x70
	 8  libgin.so!void base::internal::InvokeHelper<false, void, base::internal::RunnableAdapter<void (base::Timer::*)()> >::MakeItSo<base::Timer*>(base::internal::RunnableAdapter<void (base::Timer::*)()>, base::Timer*&&) + 0x29
	 9  libgin.so!base::internal::Invoker<base::IndexSequence<0ul>, base::internal::BindState<base::internal::RunnableAdapter<void (v8::Task::*)()>, void (v8::Task*), base::internal::OwnedWrapper<v8::Task> >, base::internal::InvokeHelper<false, void, base::internal::RunnableAdapter<void (v8::Task::*)()> >, void ()>::Run(base::internal::BindStateBase*) + 0x55
	10  libbase.so!base::Callback<void (), (base::internal::CopyMode)1>::Run() const + 0x2e
	11  libbase.so!base::(anonymous namespace)::WorkerThread::ThreadMain() + 0x28d
	12  libbase.so!base::(anonymous namespace)::ThreadFunc(void*) + 0xba
	13  libpthread-2.19.so + 0x8182
	14  libc-2.19.so + 0xfa47d

We're going to roll V8 back to the last known good version, and halt V8 rolls. This is affecting many bots, and needs to be addressed before V8 rolls in again.
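
Frames 2 through 5 point at a compaction/evacuation task on a worker thread inserting into a std::map of live JSArrayBuffers. As a rough, hypothetical illustration of why that crashes inside libstdc++ (this is not V8's actual code): unsynchronized concurrent insertion into a shared std::map is a data race that can corrupt the underlying red-black tree.

    // Hypothetical sketch, not V8 code: two "evacuation tasks" insert into
    // a shared map with no lock, mirroring frames 2-5 above. Build with
    // -fsanitize=thread and TSAN reports the race reliably.
    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <thread>
    #include <utility>

    int main() {
      // Stand-in for the tracker's live_ map; the key and value types
      // mirror frame 2's template arguments.
      std::map<void*, std::pair<void*, size_t>> live;

      auto insert_task = [&live](uintptr_t base) {
        for (uintptr_t i = 0; i < 100000; ++i) {
          // Each insertion may rebalance the red-black tree; with two
          // threads and no lock this can corrupt the tree and crash
          // inside libstdc++, as in frames 0-2 of the stack.
          live.emplace(reinterpret_cast<void*>(base + i),
                       std::make_pair(nullptr, size_t{0}));
        }
      };

      std::thread t1(insert_task, 0x100000);
      std::thread t2(insert_task, 0x200000);
      t1.join();
      t2.join();
    }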

Comment 2 by zmo@chromium.org, May 23 2016

Here is one of the stack traces:

#
# Fatal error in ../../v8/src/heap/array-buffer-tracker.cc, line 47
# Check failed: live_.count(key) == 1 (0 vs. 1).
#

==== C stack trace ===============================

 1: V8_Fatal
 2: v8::internal::LocalArrayBufferTracker::Remove(v8::internal::JSArrayBuffer*)
 3: v8::internal::ArrayBufferTracker::Unregister(v8::internal::JSArrayBuffer*)
 4: v8::ArrayBuffer::Externalize()
 5: blink::V8ArrayBuffer::toImpl(v8::Local<v8::Object>)
 6: blink::V8Uint8Array::toImpl(v8::Local<v8::Object>)
 7: blink::V8ArrayBufferView::toImpl(v8::Local<v8::Object>)
 8: blink::WebGL2RenderingContextV8Internal::texImage2D2Method(v8::FunctionCallbackInfo<v8::Value> const&)
 9: blink::WebGL2RenderingContextV8Internal::texImage2DMethodCallback(v8::FunctionCallbackInfo<v8::Value> const&)
10: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&))
11: v8::internal::(anonymous namespace)::HandleApiCallHelper(v8::internal::Isolate*, v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)3>)
12: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)3>, v8::internal::Isolate*)
13: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*)
14: 0x2d5c17208ba7
Received signal 4 <unknown> 000110aa8f82
 [0x00010e7b5956]
 [0x7fff89904f1a]
 [0x000000000000]
 [0x000110670d77]
 [0x000110671265]
 [0x0001102f4774]
 [0x000111cefd4e]
 [0x000111c9e2be]
 [0x000111cc70aa]
 [0x000110f1f78d]
 [0x000110f11323]
 [0x000110302aaa]
 [0x00011034a406]
 [0x00011037e0e9]
 [0x0001103559b6]
 [0x2d5c17208ba7]
[end of stack trace]
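
The failing check encodes the tracker's invariant: a live JSArrayBuffer is registered exactly once, so Unregister must find exactly one entry before erasing it. Below is a heavily simplified, hypothetical sketch of that invariant (the real code is in v8/src/heap/array-buffer-tracker.cc; only the names come from the trace). If a concurrent GC task erases or moves the entry while the main thread externalizes the buffer, the lookup can observe a count of 0 and the check fires with the "0 vs. 1" seen in the log.

    // Hypothetical simplification; the names follow the trace above, but
    // the bodies are illustrative only.
    #include <cassert>
    #include <cstddef>
    #include <map>
    #include <utility>

    struct JSArrayBuffer;  // opaque stand-in for v8::internal::JSArrayBuffer

    class LocalArrayBufferTracker {
     public:
      void Add(JSArrayBuffer* key, void* backing_store, size_t length) {
        live_.emplace(key, std::make_pair(backing_store, length));
      }

      void Remove(JSArrayBuffer* key) {
        // The check from the log: "Check failed: live_.count(key) == 1".
        // If another thread removed the entry concurrently, count(key)
        // returns 0 here and the assertion fails ("0 vs. 1").
        assert(live_.count(key) == 1);
        live_.erase(key);
      }

     private:
      std::map<JSArrayBuffer*, std::pair<void*, size_t>> live_;
    };

    int main() {
      LocalArrayBufferTracker tracker;
      JSArrayBuffer* buf = reinterpret_cast<JSArrayBuffer*>(0x1000);
      tracker.Add(buf, nullptr, 0);
      tracker.Remove(buf);  // Single-threaded: the count is 1, no failure.
    }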

Comment 3 by mek@chromium.org, May 23 2016

Cc: mlippautz@chromium.org
Issue 614087 has been merged into this issue.

Comment 4 by bugdroid1@chromium.org (Project Member), May 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ab21a0a4d53222d3f016dd610c7129c07b3f574f

commit ab21a0a4d53222d3f016dd610c7129c07b3f574f
Author: zmo <zmo@chromium.org>
Date: Mon May 23 21:52:03 2016

Roll V8 back to e50f265d.

https://chromium.googlesource.com/v8/v8/+log/e50f265d..68d87836

BUG=614142
TEST=linux gpu bots
TBR=kbr@chromium.org
NOTRY=true

Review-Url: https://codereview.chromium.org/2005983003
Cr-Commit-Position: refs/heads/master@{#395425}

[modify] https://crrev.com/ab21a0a4d53222d3f016dd610c7129c07b3f574f/DEPS

Comment 5 by zmo@chromium.org, May 23 2016

Labels: OS-Windows
Summary: Latest v8 roll introduced flakiness into GPU bots (was: Latest v8 roll introduced flakiness into GPU linux bots)
I also see this on Windows: 

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20x64%20Release%20%28NVIDIA%29/builds/4274/steps/webgl2_conformance_tests%20on%20NVIDIA%20GPU%20on%20Windows%20on%20Windows-2008ServerR2-SP1/logs/stdio

#
# Fatal error in e:\b\build\slave\gpu_win_x64_builder\build\src\v8\src\heap\array-buffer-tracker.cc, line 47
# Check failed: live_.count(key) == 1 (0 vs. 1).
#

Cc: machenb...@chromium.org hablich@chromium.org
Owner: mlippautz@chromium.org

Without much doubt, I think it's this:
https://chromium.googlesource.com/v8/v8/+/b2d8bfc7931eef49d527605ba485950dea41cde3

I'll start reverting this to unblock our rolls.

Status: Fixed (was: Assigned)

Reverted the commit and its unrelated dependencies. We will start rolling again from ToT, because we cannot easily backmerge onto 5.3.14 and only ~20 commits have landed since then.

Please CC hablich@ instead of machenbach@ on release/roll-related issues.

Comment 9 by kbr@chromium.org, May 24 2016

Thanks for the quick triage and revert, and for the information on whom to CC on future issues.

If we could help add tests to V8's tree that would have caught this sooner, please tell us. It's a little distressing that the bad roll made it through Chromium's commit queue in the first place and then required manual sheriffing.

In general: yes, we have already started discussing how we could leverage the test coverage of the GPU bots (which turns out to be huge). I already knew about potential concurrency issues, which is why I added some extra trybots. Any chance we could run the GPU tests in such cases? (Maybe we actually can and I just don't know about it...)

About JSArrayBuffer issues specifically (this CL): unfortunately, we've maneuvered ourselves into a position where the current implementation just doesn't scale anymore. Worse, we have zero explicit tests for the current implementation and rely solely on other components exercising it. The new implementation comes with a set of explicit tests, but lacked one for this corner case (which is a concurrency bug). I added a test case relying on TSAN to catch this for a reland.
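
For illustration, here is a sketch of the shape such a TSAN-reliant regression test can take (all names here are invented; V8's actual test lives in its own test suite): mutate the map on a background "evacuation" thread while the main thread performs Unregister-style lookups, and let -fsanitize=thread report the race even on runs that happen not to crash.

    // Sketch only: "live" stands in for the tracker's live_ map.
    // Compile with -fsanitize=thread; TSAN reports the data race between
    // the background erase/insert and the main-thread lookup even when
    // the run does not crash.
    #include <cstddef>
    #include <map>
    #include <thread>

    struct Buffer {};

    int main() {
      std::map<Buffer*, size_t> live;
      static Buffer buffers[1024];
      for (auto& b : buffers) live.emplace(&b, size_t{0});

      // Background thread mutates the map, like a concurrent evacuation task.
      std::thread evacuator([&] {
        for (auto& b : buffers) {
          live.erase(&b);
          live.emplace(&b, size_t{1});
        }
      });

      // Main thread looks entries up concurrently, like the
      // ArrayBuffer::Externalize() path in the trace from comment 2.
      for (auto& b : buffers) {
        (void)live.count(&b);  // racy read; TSAN flags it
      }

      evacuator.join();
    }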

Comment 11 by kbr@chromium.org, May 24 2016

I see from the original V8 roll https://codereview.chromium.org/2004883002 that linux_blink_rel was added as another trybot. The GPU tests already run against V8 rolls; see e.g. the win_chromium_rel_ng, linux_chromium_rel_ng and mac_chromium_rel_ng tryjobs.

Surprisingly, there weren't any flaky failures in the tryjobs for that V8 roll or the three subsequent ones (they would have shown up as a "More..." link at the end of the bots' results). However, as soon as the roll landed, failures started showing up on the waterfall. The trybots run exactly the same way as the waterfall bots, so this is mystifying. The only way I can think of to catch these failures earlier in this case would be to add GPU bots to the V8 waterfall. This should be pretty straightforward, since those bots could build top-of-tree V8 into Chromium and run the existing set of tests.

It is possible to add the standard Chromium trybots, which also run the GPU tests, to V8 CLs. But they wouldn't catch more or less than the same trybots did on the roll, unless the flakes show up somehow and somebody actively looks for them (which nobody ever does).

We could certainly set up gpu tests on our waterfall. This approach has some drawbacks and challenges:
- Nobody will look at the results manually
- It is hard to differentiate V8 problems from Chromium noise (the bot might break upstream). There are ways to overcome this, e.g. searching for V8 check failures explicitly and alerting only on those
- Infrastructure tends to get out of sync over time. We've set up similar Chromium bots before; they all eventually broke due to upstream infra changes, and we don't have the human resources to keep them in sync.

I'd be happier if we found out more about the nature of the missing test coverage and tried to simulate it in d8, e.g. with a different kind of GC stress testing.


Comment 13 by kbr@chromium.org, May 26 2016

Issue 612385 has been merged into this issue.
