VR instrumentation tests flakily crashing due to bad file descriptor |
||||||
Issue descriptionSince around the 3rd or 4th of April, the VR instrumentation tests with Daydream View paired have been flakily crashing due to bad file descriptor checks. The runs of the tests with Cardboard paired look completely fine, but the same tests with DDView paired run into this issue. See https://test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&testType=chrome_public_test_vr_apk-pixelxl-ddview-nougat. An example stack trace of the crash occurring on the Nougat Phone Tester bot can be found at https://luci-logdog.appspot.com/v/?s=chromium%2Fandroid%2Fswarming%2Flogcats%2F3ca87fbb2a1c5d11%2F%2B%2Ftombstones_tombstones_20180404T081655-UTC_HT73Y0201102, although it looks like garbage to me. A more useful trace from the FYI bot running O is at https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.fyi%2FAndroid_VR_Tests%2F16844%2F%2B%2Frecipes%2Fsteps%2Fstack_tool_with_logcat_dump%2F0%2Fstdout, copied below in case the logs disappear. signal 6 (SIGABRT), code -6 in tid 6852 (VrShellGL), pid 6598 (chromium.chrome) pid: 6598, tid: 6852, name: VrShellGL >>> org.chromium.chrome <<< signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr -------- [FATAL:platform_handle.cc(44)] Check failed: success. : Bad file descriptor (9) Stack Trace: RELADDR FUNCTION FILE:LINE 00e5207d logging::LogMessage::~LogMessage() ../../base/logging.cc:594:29 00e52415 logging::ErrnoLogMessage::~ErrnoLogMessage() ../../base/logging.cc:992:1 01aed475 mojo::edk::PlatformHandle::CloseIfNecessary() ../../mojo/edk/embedder/platform_handle.cc:44:5 v------> mojo::edk::ScopedPlatformHandle::~ScopedPlatformHandle() ../../mojo/edk/embedder/scoped_platform_handle.h:21:37 v------> std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle>::destroy(mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/memory:1814:0 v------> void std::__ndk1::allocator_traits<std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::__destroy<mojo::edk::ScopedPlatformHandle>(std::__ndk1::integral_constant<bool, true>, std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle>&, mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/memory:1682:0 v------> void std::__ndk1::allocator_traits<std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::destroy<mojo::edk::ScopedPlatformHandle>(std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle>&, mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/memory:1550:0 01aee51d std::__ndk1::__vector_base<mojo::edk::ScopedPlatformHandle, std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::__destruct_at_end(mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/vector:415:0 ----------------------------------------------------- r0 00000000 r1 00001ac4 r2 00000006 r3 00000008 r4 000019c6 r5 00001ac4 r6 c5977d4c r7 0000010c r8 00000000 r9 cd7f8120 sl c5978214 fp c5978210 ip c5977da5 sp c5977d38 lr eb804bdf pc eb7fe5ec Stack Trace: RELADDR FUNCTION FILE:LINE 0001a5ec <UNKNOWN> /system/lib/libc.so v------> base::debug::(anonymous namespace)::DebugBreak() ../../base/debug/debugger_posix.cc:228:5 00e416ab base::debug::BreakDebugger() ../../base/debug/debugger_posix.cc:258:0 00e52293 logging::LogMessage::~LogMessage() ../../base/logging.cc:855:7 00e52413 logging::ErrnoLogMessage::~ErrnoLogMessage() ../../base/logging.cc:992:1 01aed475 mojo::edk::PlatformHandle::CloseIfNecessary() ../../mojo/edk/embedder/platform_handle.cc:44:5 v------> mojo::edk::ScopedPlatformHandle::~ScopedPlatformHandle() ../../mojo/edk/embedder/scoped_platform_handle.h:21:37 v------> std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle>::destroy(mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/memory:1814:0 v------> void std::__ndk1::allocator_traits<std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::__destroy<mojo::edk::ScopedPlatformHandle>(std::__ndk1::integral_constant<bool, true>, std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle>&, mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/memory:1682:0 v------> void std::__ndk1::allocator_traits<std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::destroy<mojo::edk::ScopedPlatformHandle>(std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle>&, mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/memory:1550:0 01aee51b std::__ndk1::__vector_base<mojo::edk::ScopedPlatformHandle, std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::__destruct_at_end(mojo::edk::ScopedPlatformHandle*) ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/vector:415:0 v------> std::__ndk1::__vector_base<mojo::edk::ScopedPlatformHandle, std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::clear() ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/vector:356:29 01aee2cb std::__ndk1::__vector_base<mojo::edk::ScopedPlatformHandle, std::__ndk1::allocator<mojo::edk::ScopedPlatformHandle> >::~__vector_base() ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/include/vector:443:0 01b000cd mojo::edk::(anonymous namespace)::ChannelPosix::WriteNoLock(mojo::edk::(anonymous namespace)::MessageView) ../../mojo/edk/system/channel_posix.cc:426:5 01aff7ad mojo::edk::(anonymous namespace)::ChannelPosix::Write(std::__ndk1::unique_ptr<mojo::edk::Channel::Message, std::__ndk1::default_delete<mojo::edk::Channel::Message> >) ../../mojo/edk/system/channel_posix.cc:125:14 01af5237 mojo::edk::NodeChannel::WriteChannelMessage(std::__ndk1::unique_ptr<mojo::edk::Channel::Message, std::__ndk1::default_delete<mojo::edk::Channel::Message> >) ../../mojo/edk/system/node_channel.cc:906:15 01af56d5 mojo::edk::NodeChannel::SendChannelMessage(std::__ndk1::unique_ptr<mojo::edk::Channel::Message, std::__ndk1::default_delete<mojo::edk::Channel::Message> >) ../../mojo/edk/system/node_channel.cc:389:3 01af7b83 mojo::edk::NodeController::SendPeerEvent(mojo::edk::ports::NodeName const&, std::__ndk1::unique_ptr<mojo::edk::ports::Event, std::__ndk1::default_delete<mojo::edk::ports::Event> >) ../../mojo/edk/system/node_controller.cc:619:11 01af7f09 mojo::edk::NodeController::ForwardEvent(mojo::edk::ports::NodeName const&, std::__ndk1::unique_ptr<mojo::edk::ports::Event, std::__ndk1::default_delete<mojo::edk::ports::Event> >) ../../mojo/edk/system/node_controller.cc:696:5 01b02c37 mojo::edk::ports::Node::SendUserMessageInternal(mojo::edk::ports::PortRef const&, std::__ndk1::unique_ptr<mojo::edk::ports::UserMessageEvent, std::__ndk1::default_delete<mojo::edk::ports::UserMessageEvent> >*) ../../mojo/edk/system/ports/node.cc:838:16 01b02a73 mojo::edk::ports::Node::SendUserMessage(mojo::edk::ports::PortRef const&, std::__ndk1::unique_ptr<mojo::edk::ports::UserMessageEvent, std::__ndk1::default_delete<mojo::edk::ports::UserMessageEvent> >) ../../mojo/edk/system/ports/node.cc:359:12 01af70d3 mojo::edk::NodeController::SendUserMessage(mojo::edk::ports::PortRef const&, std::__ndk1::unique_ptr<mojo::edk::ports::UserMessageEvent, std::__ndk1::default_delete<mojo::edk::ports::UserMessageEvent> >) ../../mojo/edk/system/node_controller.cc:270:17 01af4673 mojo::edk::MessagePipeDispatcher::WriteMessage(std::__ndk1::unique_ptr<mojo::edk::ports::UserMessageEvent, std::__ndk1::default_delete<mojo::edk::ports::UserMessageEvent> >, unsigned int) ../../mojo/edk/system/message_pipe_dispatcher.cc:144:30 01af0299 mojo::edk::Core::WriteMessage(unsigned int, unsigned int, unsigned int) ../../mojo/edk/system/core.cc:701:22 v------> mojo::WriteMessageNew(mojo::MessagePipeHandle, mojo::ScopedHandleBase<mojo::MessageHandle>, unsigned int) ../../mojo/public/cpp/system/message_pipe.h:94:10 012c1bc3 mojo::Connector::Accept(mojo::Message*) ../../mojo/public/cpp/bindings/lib/connector.cc:278:0 012c6edb mojo::internal::MultiplexRouter::InterfaceEndpoint::SendMessage(mojo::Message*) ../../mojo/public/cpp/bindings/lib/multiplex_router.cc:132:32 012c4385 mojo::InterfaceEndpointClient::Accept(mojo::Message*) ../../mojo/public/cpp/bindings/lib/interface_endpoint_client.cc:242:23 01f2081d device::mojom::VRSubmitFrameClientProxy::OnSubmitFrameGpuFence(gfx::GpuFenceHandle const&) gen/device/vr/public/mojom/vr_service.mojom.cc:969:28 03508c3f vr::VrShellGl::DrawFrameSubmitNow(short, gfx::Transform const&) ../../chrome/browser/android/vr/vr_shell_gl.cc:1349:23 0350869f vr::VrShellGl::DrawIntoAcquiredFrame(short, base::TimeTicks) ../../chrome/browser/android/vr/vr_shell_gl.cc:1236:5 035071f5 vr::VrShellGl::DrawFrame(short, base::TimeTicks) ../../chrome/browser/android/vr/vr_shell_gl.cc:0:31 0350667b vr::VrShellGl::OnVSync(base::TimeTicks) ../../chrome/browser/android/vr/vr_shell_gl.cc:1585:5 017adc09 void base::internal::Invoker<base::internal::BindState<void (media::ChunkDemuxer::*)(base::TimeDelta), base::internal::UnretainedWrapper<media::ChunkDemuxer> >, void (base::TimeDelta)>::RunImpl<void (media::ChunkDemuxer::* const&)(base::TimeDelta), std::__ndk1::tuple<base::internal::UnretainedWrapper<media::ChunkDemuxer> > const&, 0u>(void (media::ChunkDemuxer::* const&&&)(base::TimeDelta), std::__ndk1::tuple<base::internal::UnretainedWrapper<media::ChunkDemuxer> > const&&&, std::__ndk1::integer_sequence<unsigned int, 0u>, base::TimeDelta&&) ../../base/bind_internal.h:604:12 027ff847 base::OnceCallback<void (base::TimeTicks)>::Run(base::TimeTicks) && ../../base/callback.h:95:12 034f8e7b vr::AndroidVSyncHelper::OnVSync(_JNIEnv*, base::android::JavaParamRef<_jobject*> const&, long long) ../../chrome/browser/android/vr/android_vsync_helper.cc:43:36 034f8dc7 Java_org_chromium_chrome_browser_vr_1shell_AndroidVSyncHelper_nativeOnVSync gen/chrome/browser/android/vr/vr_shell_jni_headers/vr_shell/jni/AndroidVSyncHelper_jni.h:48:18 00087a1d <UNKNOWN>
,
Apr 5 2018
It's hard to tell since we can't get screenshots on crashes. Based on the times in the logcat output (e.g. https://luci-logdog.appspot.com/v/?s=chromium%2Fandroid%2Fswarming%2Flogcats%2F3cafbe715dfc4711%2F%2B%2Flogcat_logcat_org.chromium.chrome.browser.vr_shell.WebVrTransitionTest.testNonExclusiveStopsDuringExclusive__ChromeTabbedActivity_20180405T175645-UTC_HT73Y0201102), it's either entering presentation or immediately after. My initial guess would be that it's happening on the first frame after entering VR?
,
Apr 5 2018
This might also be related to the flaky timeouts of some of the tests. Looking at their logcat output, the tests are completing, but their completion is not being recognized by the host-side test runner because the call to destroy the activity is timing out. Logcat is filled with tons of spam of lines like "Looper : Ignoring unexpected epoll events 0x1 on fd 155 that is no longer registered.", even after the test is supposed to be completed. That, combined with the fact that the two seemed to start around the same time, makes me think they might be related.
,
Apr 5 2018
Unassigning myself for now based on this. As discussed offline, IPC transport of a GpuFence involves duplicating a file descriptor, so this error may be a side effect of a file descriptor leak elsewhere that breaks this. Try "lsof -p <PID>" using the PID of org.chromium.chrome (main browser process) to see if the number of file descriptors seems to be increasing while running.
,
Apr 6 2018
This appears to be tied to the recent change to enable the use of the VrCore head tracking service - I'm unable to reproduce the crashes when VrCore has EnableVrCoreHeadTracking set to false, but can reproduce occasionally (~2% of the time?) when set to true. In both cases, the maximum number of file descriptors is around 1140, although I haven't been able to reproduce yet with my script running that repeatedly prints out the number of file descriptors Chrome has open. I'll have to also check the total number of file descriptors open, as you said that there's some global maximum, right? It might be that Chrome doesn't necessarily leak file descriptors, but the head tracking service does.
,
Apr 6 2018
So nothing particularly interesting happens with the browser process's file descriptors when the issue reproduces.
,
Apr 6 2018
A summary of what I've found out so far: On my Pixel, the total number of file descriptors open on the system peaked at around 51400 when running the test. The one time I was able to reproduce the issue while monitoring total file descriptors before my Pixel decided to die, the number got up to 52135. On my Pixel 2, the max number of file descriptors peaks at ~3500, and doesn't change when the issue reproduces. However, the issue on the Pixel 2 is slightly different - whereas on the Pixel, it was failing a check in Chrome, the Pixel 2 fails due to being unable to write to a bad file descriptor. This appears to occur while transitioning into VR. The abort message is: "Could not write wake signal to fd 168: Bad file descriptor" Attached is the full logcat output of the failed run on the Pixel 2.
,
Apr 6 2018
Also, I mistakenly said that this only occurs on DDView - it does seem to be happening on Cardboard, just not as often.
,
Apr 6 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/dffda945364daa67ce98260f74e76fa59768de80 commit dffda945364daa67ce98260f74e76fa59768de80 Author: bsheedy <bsheedy@chromium.org> Date: Fri Apr 06 19:26:26 2018 Disable VrCore head tracking service Enabling the VrCore head tracking service somehow causes Chrome to flakily crash due to bad file descriptors. So, disable it for the time being. TBR=mthiesse@chromium.org Bug: 829520 Change-Id: If5443c7e5dff5dda366fe04301487b3bac018c76 Reviewed-on: https://chromium-review.googlesource.com/1000024 Reviewed-by: Brian Sheedy <bsheedy@chromium.org> Commit-Queue: Brian Sheedy <bsheedy@chromium.org> Cr-Commit-Position: refs/heads/master@{#548897} [modify] https://crrev.com/dffda945364daa67ce98260f74e76fa59768de80/chrome/android/shared_preference_files/test/vr_cardboard_skipdon_setupcomplete.json [modify] https://crrev.com/dffda945364daa67ce98260f74e76fa59768de80/chrome/android/shared_preference_files/test/vr_ddview_skipdon_setupcomplete.json
,
Apr 6 2018
I've filed b/77718627 for tracking this on VrCore's end.
,
Apr 6 2018
,
Jul 4
,
Aug 7
Removing Internals>VR component and assigning to Internals>XR
,
Aug 7
,
Aug 29
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by klausw@chromium.org
, Apr 5 2018Status: Started (was: Available)