Many tests are crashing on WebKit Android
Issue description
Many tests are crashing on WebKit Android after https://chromium-review.googlesource.com/c/chromium/src/+/1128207 landed. Affected bot: https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/ I think the problem is that SwiftShader is not yet enabled for some CPU architectures used on Android (https://cs.chromium.org/chromium/src/ui/gl/BUILD.gn?type=cs&q=enable_swiftshader&sq=package:chromium&g=0&l=14), but the CL was landed assuming that SwiftShader is available on all platforms. Given the scale of the CL, it's hard for me to revert it. Would you mind handling this ASAP? I've CCed a couple of reviewers of the CL. Thanks!
,
Aug 17
Assigning to capn@ since I'm on vacation next week. Why aren't we using the hardware GPU on those devices? There should be no reason to use SwiftShader on Android, except if it's being emulated on Linux, which should be supported.
,
Aug 17
SwiftShader supports ARM 32-bit so we could in theory enable that. But as questioned by Alexis I'm not sure if we want CPU-based testing on these GPU-enabled devices in the first place. I'll create a patch to try enabling ARM to see if that can at least avoid us having to revert things.
,
Aug 17
This is probably helpful, in case no one has seen it yet: https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.webkit%2FWebKit_Android__Nexus4_%2F81683%2F%2B%2Frecipes%2Fsteps%2Fstack_tool_with_logcat_dump%2F0%2Fstdout
The libdalvik one might be this, which is pretty common in logcat:
JNI posting fatal error: Native registration unable to find class 'android/debug/JNITest'; aborting...
No idea why that happens, though. FallBackToNextGpuMode is caused by this:
db8bb: 08-17 18:45:21.715 15953 15980 E chromium: [15953:15980:0817/184521.727939:ERROR:viz_main_impl.cc(184)] Exiting GPU process due to errors during initialization
But I'm not sure what causes the GPU process to exit. There's a bunch of other stuff that's not related to the GPU at all, though.
,
Aug 17
The GPU process crash has been seen before in the development of https://chromium-review.googlesource.com/c/chromium/src/+/1128207 (see below). I think these layout tests are run with SwiftShader to eliminate differences between GPUs; there aren't GPU-specific baselines for layout tests. The quickest way to make this work again would be to enable SwiftShader builds for 32-bit ARM.
Stack Trace:
RELADDR FUNCTION FILE:LINE
016a1749 logging::LogMessage::~LogMessage() ??:0:0
00e12aa9 content::GpuDataManagerImplPrivate::FallBackToNextGpuMode() ??:0:0
00e11b8b content::GpuDataManagerImpl::FallBackToNextGpuMode() ??:0:0
00ce61cb viz::mojom::GpuHostStubDispatch::Accept(viz::mojom::GpuHost*, mojo::Message*) ??:0:0
01714897 mojo::internal::MultiplexRouter::ProcessIncomingMessage(mojo::internal::MultiplexRouter::MessageWrapper*, mojo::internal::MultiplexRouter::ClientCallBehavior, base::SequencedTaskRunner*) ??:0:0
017146d3 mojo::internal::MultiplexRouter::Accept(mojo::Message*) ??:0:0
01711ec9 mojo::Connector::ReadSingleMessage(unsigned int*) ??:0:0
01712195 mojo::Connector::ReadAllAvailableMessages() ??:0:0
0095ebc3 void base::internal::Invoker<base::internal::BindState<void (net::HostResolverImpl::LegacyRequestImpl::*)(int), base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >, void (int)>::RunImpl<void (net::HostResolverImpl::LegacyRequestImpl::*)(int), std::__ndk1::tuple<base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >, 0u>(void (net::HostResolverImpl::LegacyRequestImpl::*&&)(int), std::__ndk1::tuple<base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >&&, std::__ndk1::integer_sequence<unsigned int, 0u>, int&&) ??:0:0
0095ebb5 base::internal::Invoker<base::internal::BindState<void (net::HostResolverImpl::LegacyRequestImpl::*)(int), base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >, void (int)>::RunOnce(base::internal::BindStateBase*, int) ??:0:0
0170ee3d mojo::SimpleWatcher::OnHandleReady(int, unsigned int, mojo::HandleSignalsState const&) ??:0:0
0170ef49 mojo::SimpleWatcher::Context::Notify(unsigned int, MojoHandleSignalsState, unsigned int) ??:0:0
0170eadd mojo::SimpleWatcher::Context::CallNotify(MojoTrapEvent const*) ??:0:0
00cfdfa3 mojo::core::WatcherDispatcher::InvokeWatchCallback(unsigned int, unsigned int, mojo::core::HandleSignalsState const&, unsigned int) ??:0:0
00cfdda5 mojo::core::Watch::InvokeCallback(unsigned int, mojo::core::HandleSignalsState const&, unsigned int) ??:0:0
00cfc1bf mojo::core::RequestContext::~RequestContext() ??:0:0
00cf828f mojo::core::NodeChannel::OnChannelMessage(void const*, unsigned int, std::__ndk1::vector<mojo::PlatformHandle, std::__ndk1::allocator<mojo::PlatformHandle> >) ??:0:0
00cf1633 mojo::core::Channel::OnReadComplete(unsigned int, unsigned int*) ??:0:0
00cffb37 mojo::core::(anonymous namespace)::ChannelPosix::OnFileCanReadWithoutBlocking(int) ??:0:0
016f7ed1 base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) ??:0:0
016f9267 event_base_loop ??:0:0
016f80cb base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) ??:0:0
016b3a45 base::RunLoop::Run() ??:0:0
00d732ad content::BrowserProcessSubThread::IOThreadRun(base::RunLoop*) ??:0:0
016d892d base::Thread::ThreadMain() ??:0:0
016f379d base::(anonymous namespace)::ThreadFunc(void*) ??:0:0
0000d173 <UNKNOWN> /system/lib/libc.so
0000d30b <UNKNOWN> /system/lib/libc.so
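For anyone triaging similar logs, here is a rough sketch of pulling frames out of a symbolized trace like the one above. It assumes each frame is an 8-digit hex relative address, a function name, and then either `??:0:0` or a library path; `FRAME_RE`, `parse_frames`, and `frames_matching` are hypothetical names, not part of the Chromium stack tool.

```python
import re

# Assumed frame layout: RELADDR FUNCTION FILE:LINE, where FILE:LINE is
# either "??:0:0" or an absolute library path such as /system/lib/libc.so.
FRAME_RE = re.compile(
    r"(?<!\S)([0-9a-f]{8})\s+(.+?)\s+(\?\?:0:0|/\S+)(?=\s|$)"
)


def parse_frames(trace_text):
    """Return a list of (address, function, location) tuples."""
    return [
        (int(addr, 16), func, loc)
        for addr, func, loc in FRAME_RE.findall(trace_text)
    ]


def frames_matching(frames, needle):
    """Return the frames whose function name contains the given substring."""
    return [frame for frame in frames if needle in frame[1]]


sample = (
    "016a1749 logging::LogMessage::~LogMessage() ??:0:0 "
    "00e12aa9 content::GpuDataManagerImplPrivate::FallBackToNextGpuMode() ??:0:0 "
    "0000d173 <UNKNOWN> /system/lib/libc.so"
)
frames = parse_frames(sample)
fallback_frames = frames_matching(frames, "FallBackToNextGpuMode")
```

The regular expression tolerates frames separated by spaces or by newlines, so it should work on a trace that has been flattened onto one line as well as on one frame per line.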
,
Aug 19
Unfortunately it looks like enabling ARM for this won't be so quick and easy. SwiftShader's current ARM builds are for system-level Android, while here we need an NDK based build (native app-level). Several headers aren't available. Even if we can get it to build it might take some time to get everything working as expected. Also, I'm traveling and won't be back at my workstation until Friday. So I'm leaning toward reverting to get things green again and giving us some time to do the ARM NDK build properly.
,
Aug 20
Just so that I understand fully what's going on here: From the stack, it looks like it's trying to use libGLESv2_adreno.so and libEGL_adreno.so and failing. Why is it failing to use the GPU? Even though a few differences could arise from using the GPU, most layout tests should still pass and definitely none of them should crash, so why does rendering on a Nexus 4 using the GPU cause crashes? Is there any acceptable workaround for what this bot is testing, like testing on different hardware and using a GPU that works (after all, the Nexus 4 is quite dated at this point)? What is tested here that isn't already covered by other bots? Is Android emulation on Linux not working / insufficient? AFAIK, we don't ship Android with OSMesa to users, so what use cases are we testing here? We can either temporarily revert for Android or temporarily disable these layout tests if they are redundant with other tests we already run, but it would be nice to understand why these failures happen in the first place from someone who understands Android testing better than I do.
,
Aug 20
I think content_shell is forcibly disabling the use of the GPU internally here: https://cs.chromium.org/chromium/src/content/shell/app/shell_main_delegate.cc?q=shell_main_delegate.cc&sq=package:chromium&g=0&l=215 Before we revert the removal of OSMesa from the Chromium repo, may I please try enabling the GPU for these tests on this device in https://chromium-review.googlesource.com/1181683 ? This will probably fix the widespread crashes on this device and only leave a couple of remaining failures on this bot.
,
Aug 20
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/4e038507d5b0ecfcc8445243e92e9125d9aa34fd commit 4e038507d5b0ecfcc8445243e92e9125d9aa34fd Author: Kenneth Russell <kbr@chromium.org> Date: Mon Aug 20 18:13:09 2018 Run layout tests with the real GPU on "WebKit Android (Nexus4)". Disable the software fallback on this one bot because SwiftShader doesn't yet run on 32-bit ARM. Hopefully most of these tests will run correctly on top of the real GPU on this device. Bug: 875172 Change-Id: I80064474a2be69b4331dd2f36786f1ba1e8830d5 Reviewed-on: https://chromium-review.googlesource.com/1181683 Reviewed-by: Dirk Pranke <dpranke@chromium.org> Reviewed-by: John Budorick <jbudorick@chromium.org> Commit-Queue: Kenneth Russell <kbr@chromium.org> [modify] https://crrev.com/4e038507d5b0ecfcc8445243e92e9125d9aa34fd/scripts/slave/recipe_modules/chromium_tests/chromium_webkit.py
,
Aug 20
jbudorick@ pointed out that the layout tests are also failing on this Nexus 5 bot: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20(dbg) whose parent builder is a 32-bit ARM builder: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20arm%20Builder%20(dbg) Attempting to use the GPU for layout tests on this bot as well in this CL: https://chromium-review.googlesource.com/1181767
,
Aug 20
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/8a39612b49d0abbfb6bac50563193ab2b80c2720 commit 8a39612b49d0abbfb6bac50563193ab2b80c2720 Author: Kenneth Russell <kbr@chromium.org> Date: Mon Aug 20 21:18:24 2018 Fix extra_args specification on 'WebKit Android (Nexus4)'. It should have been an array of strings. Bug: 875172 Change-Id: Ie9d52022b58f962b1bd2b86c8f9486f2b00de76f Reviewed-on: https://chromium-review.googlesource.com/1182195 Reviewed-by: John Budorick <jbudorick@chromium.org> Commit-Queue: Kenneth Russell <kbr@chromium.org> [modify] https://crrev.com/8a39612b49d0abbfb6bac50563193ab2b80c2720/scripts/slave/recipe_modules/chromium_tests/chromium_webkit.py
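A string-versus-list mix-up of this kind is easy to reproduce in plain Python. The sketch below is not the recipe code; `build_command` and the `run_tests` command name are hypothetical and only illustrate why a bare string can misbehave where a list of strings is expected.

```python
def build_command(base_cmd, extra_args):
    """Append extra_args to base_cmd; extra_args must be a list of strings.

    Hypothetical helper for illustration, not part of the Chromium recipes.
    """
    if isinstance(extra_args, str):
        raise TypeError("extra_args must be a list of strings, not a bare string")
    return list(base_cmd) + list(extra_args)


# A bare string is itself iterable, so treating it as a list silently
# splits the flag into single characters.
broken = ["run_tests"] + list("--use-gpu-in-tests")

# Wrapping the flag in a list keeps it as one argument.
fixed = build_command(["run_tests"], ["--use-gpu-in-tests"])
```

Rejecting a bare string up front, as the helper does, turns a confusing downstream failure into an immediate error at the configuration site.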
,
Aug 20
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/45cf5409ce0926426ec90284be3874791f44be17 commit 45cf5409ce0926426ec90284be3874791f44be17 Author: Kenneth Russell <kbr@chromium.org> Date: Mon Aug 20 21:21:30 2018 Use GPU for layout tests on KitKat Phone Tester (dbg). SwiftShader isn't available on 32-bit ARM yet. Bug: 875172 Change-Id: I3ddeec51d8ef156def5f5194c75cec31a4aa9412 Reviewed-on: https://chromium-review.googlesource.com/1181767 Reviewed-by: Dirk Pranke <dpranke@chromium.org> Reviewed-by: John Budorick <jbudorick@chromium.org> Commit-Queue: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/heads/master@{#584563} [modify] https://crrev.com/45cf5409ce0926426ec90284be3874791f44be17/testing/buildbot/chromium.android.json [modify] https://crrev.com/45cf5409ce0926426ec90284be3874791f44be17/testing/buildbot/test_suite_exceptions.pyl
,
Aug 21
It turns out that while content_shell would obey the command line flag --use-gpu-in-tests, Blink's layout test runner doesn't. These two bots are still failing to run layout tests: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20(dbg) https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/ I'll pick this up again tomorrow as my top priority. Please don't revert sugoi's CL in the meantime.
,
Aug 21
I assume we need to pass it w/ --additional-driver-flag=--use-gpu-in-tests?
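A minimal sketch of that wrapping, assuming the layout test runner forwards each `--additional-driver-flag=<flag>` value on to content_shell. Both flag names come from this thread; `wrap_driver_flags` is a hypothetical helper, not an existing function.

```python
def wrap_driver_flags(driver_flags):
    """Prefix each driver (content_shell) flag for the layout test runner."""
    return ["--additional-driver-flag=" + flag for flag in driver_flags]


# The runner would receive the wrapped form rather than the bare flag.
runner_args = wrap_driver_flags(["--use-gpu-in-tests"])
```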
,
Aug 21
Issue 874058 has been merged into this issue.
,
Aug 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/101c8698c55c7c2c2a071fe84f4780983025f4ba commit 101c8698c55c7c2c2a071fe84f4780983025f4ba Author: Kenneth Russell <kbr@chromium.org> Date: Tue Aug 21 17:30:52 2018 Fix additional driver flag on 'WebKit Android (Nexus4)'. Neglected to use --additional-driver-flag command line argument. Bug: 875172 Change-Id: I9546f06c7fc84394cd9983f4534f3ece858c2b47 Reviewed-on: https://chromium-review.googlesource.com/1183882 Reviewed-by: John Budorick <jbudorick@chromium.org> Commit-Queue: Kenneth Russell <kbr@chromium.org> [modify] https://crrev.com/101c8698c55c7c2c2a071fe84f4780983025f4ba/scripts/slave/recipe_modules/chromium_tests/chromium_webkit.py
,
Aug 21
Thanks jbudorick@ for pointing out that missing flag. The layout test harness crashes on Linux while initializing the GLX connection when passing that flag, so I'm building blink_tests locally on Android to test on a Nexus 4 to see how they'll work.
,
Aug 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e830534349791e9fa99b52dd7b8e201d426d12bf commit e830534349791e9fa99b52dd7b8e201d426d12bf Author: Kenneth Russell <kbr@chromium.org> Date: Tue Aug 21 19:34:23 2018 Fix additional driver flag on 'KitKat Phone Tester (dbg)'. Neglected to use --additional-driver-flag command line arg. Bug: 875172 Change-Id: I3603e0b3659a2e6309673aad66b76a3dcb01fcc0 Reviewed-on: https://chromium-review.googlesource.com/1183884 Reviewed-by: Dirk Pranke <dpranke@chromium.org> Reviewed-by: John Budorick <jbudorick@chromium.org> Cr-Commit-Position: refs/heads/master@{#584880} [modify] https://crrev.com/e830534349791e9fa99b52dd7b8e201d426d12bf/testing/buildbot/chromium.android.json [modify] https://crrev.com/e830534349791e9fa99b52dd7b8e201d426d12bf/testing/buildbot/test_suite_exceptions.pyl
,
Aug 21
After fixing the driver flag, only two tests fail on 'WebKit Android (Nexus4)': https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/81815 external/wpt/web-animations/interfaces/Animatable/animate.html http/tests/worklet/webexposed/global-interface-listing-paint-worklet.html https://chromium-review.googlesource.com/1183979 will suppress these if necessary, but we're going to wait for the first build on the Nexus 5 bot with the fixed flag: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20%28dbg%29/8897 to understand whether these suppressions are actually needed there. jbudorick@ points out that https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/ is on its way out. Downgrading this to P1 from P0 after discussion with jbudorick@ and dpranke@. Only a couple of bots are affected.
,
Aug 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/07d8d2d9a382a0c5a9d851ebe5145dc4512b8612 commit 07d8d2d9a382a0c5a9d851ebe5145dc4512b8612 Author: Kenneth Russell <kbr@chromium.org> Date: Tue Aug 21 22:53:41 2018 Suppress two layout test failures on 'WebKit Android (Nexus4)'. external/wpt/web-animations/interfaces/Animatable/animate.html http/tests/worklet/webexposed/global-interface-listing-paint-worklet.html These are the remaining test failures seen on this bot when running with --use-gpu-in-tests. Unfortunately there doesn't seem to be a way to specialize these for this device. Bug: 875172 Change-Id: I050b131b2702987f2e5803ac6e75ea6dfae35b2b Reviewed-on: https://chromium-review.googlesource.com/1183979 Reviewed-by: John Budorick <jbudorick@chromium.org> Commit-Queue: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/heads/master@{#584915} [modify] https://crrev.com/07d8d2d9a382a0c5a9d851ebe5145dc4512b8612/third_party/WebKit/LayoutTests/TestExpectations
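For illustration only, suppression entries in the usual `bug [ specifiers ] test [ expectations ]` TestExpectations layout could be generated as sketched below. The `Android` specifier and the `Failure` expectation are assumptions, not taken from the commit, and `expectation_line` is a hypothetical helper.

```python
def expectation_line(bug, test_path, specifiers, expectations):
    """Format one entry as: bug [ specifiers ] test [ expectations ]."""
    spec = "[ " + " ".join(specifiers) + " ]"
    exp = "[ " + " ".join(expectations) + " ]"
    return " ".join([bug, spec, test_path, exp])


# The two tests named in the commit message above.
TESTS = [
    "external/wpt/web-animations/interfaces/Animatable/animate.html",
    "http/tests/worklet/webexposed/global-interface-listing-paint-worklet.html",
]

# "Android" and "Failure" are assumed values for illustration only.
lines = [
    expectation_line("crbug.com/875172", test, ["Android"], ["Failure"])
    for test in TESTS
]
```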
,
Aug 22
'WebKit Android (Nexus4)' is green after the above suppressions: https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/81824 Still watching the build of 'KitKat Phone Tester (dbg)' which contains those suppressions: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20%28dbg%29/8900
,
Aug 22
webkit_layout_tests are passing on this bot now. There are some capacity problems on the bot occasionally causing shards to fail. Linking this to the related issue.
,
Aug 27
Marking Verified as per comments #24 and #25.
Comment 1 by fhorschig@chromium.org
, Aug 17