
Issue 875172

Starred by 3 users

Issue metadata

Status: Verified
Owner:
OOO until 2019-01-24
Closed: Aug 22
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug

Blocking:
issue 873321
issue 876539




Many tests are crashing on WebKit Android

Project Member Reported by haraken@chromium.org, Aug 17

Issue description

Many tests are crashing on WebKit Android after landing https://chromium-review.googlesource.com/c/chromium/src/+/1128207.

https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/

I think the problem is that SwiftShader is not yet enabled on some CPUs used by Android devices (https://cs.chromium.org/chromium/src/ui/gl/BUILD.gn?type=cs&q=enable_swiftshader&sq=package:chromium&g=0&l=14), but the CL was landed on the assumption that SwiftShader is available on all platforms.

Given the scale of the CL, it's hard for me to revert it. Would you mind handling this asap?

CCed a couple of reviewers of the CL.

Thanks!

 
Labels: Sheriff-Chromium
Cc: sugoi@chromium.org
Owner: capn@chromium.org
Assigning to capn@ since I'm on vacation next week.
Why aren't we using the hardware GPU on those devices?
There should be no reason to use SwiftShader on Android, except if it's being emulated on Linux, which should be supported.
Cc: boliu@chromium.org zmo@chromium.org
Components: Mobile>WebView Internals>GPU
Labels: OS-Android
SwiftShader supports ARM 32-bit so we could in theory enable that. But as questioned by Alexis I'm not sure if we want CPU-based testing on these GPU-enabled devices in the first place.

I'll create a patch to try enabling ARM to see if that can at least avoid us having to revert things.
This is probably helpful, in case no one has seen it yet:
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.webkit%2FWebKit_Android__Nexus4_%2F81683%2F%2B%2Frecipes%2Fsteps%2Fstack_tool_with_logcat_dump%2F0%2Fstdout


libdalvik one might be this, which is pretty common in logcat:
JNI posting fatal error: Native registration unable to find class 'android/debug/JNITest'; aborting...

No idea why that happens though.


FallBackToNextGpuMode is caused by this:
db8bb:  08-17 18:45:21.715 15953 15980 E chromium: [15953:15980:0817/184521.727939:ERROR:viz_main_impl.cc(184)] Exiting GPU process due to errors during initialization

But I'm not sure what causes the GPU process to exit.


There's a bunch of other stuff that's not related to GPU at all though.
Blocking: 873321
Components: Internals>GPU>SwiftShader
The GPU process crash has been seen before in the development of https://chromium-review.googlesource.com/c/chromium/src/+/1128207 . See below.

I think these layout tests are run with SwiftShader to eliminate differences between GPUs; there aren't GPU-specific baselines for layout tests.

The quickest way to make this work again would be to enable SwiftShader builds for 32-bit ARM.


Stack Trace:

  016a1749  logging::LogMessage::~LogMessage
  00e12aa9  content::GpuDataManagerImplPrivate::FallBackToNextGpuMode
  00e11b8b  content::GpuDataManagerImpl::FallBackToNextGpuMode
  00ce61cb  viz::mojom::GpuHostStubDispatch::Accept(viz::mojom::GpuHost*, mojo::Message
  01714897  mojo::internal::MultiplexRouter::ProcessIncomingMessage(mojo::internal::MultiplexRouter::MessageWrapper*, mojo::internal::MultiplexRouter::ClientCallBehavior, base::SequencedTaskRunner*)  ??:0:0
  017146d3  mojo::internal::MultiplexRouter::Accept(mojo::Message
  01711ec9  mojo::Connector::ReadSingleMessage(unsigned int
  01712195  mojo::Connector::ReadAllAvailableMessages
  0095ebc3  void base::internal::Invoker<base::internal::BindState<void (net::HostResolverImpl::LegacyRequestImpl::*)(int), base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >, void (int)>::RunImpl<void (net::HostResolverImpl::LegacyRequestImpl::*)(int), std::__ndk1::tuple<base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >, 0u>(void (net::HostResolverImpl::LegacyRequestImpl::*&&)(int), std::__ndk1::tuple<base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >&&, std::__ndk1::integer_sequence<unsigned int, 0u>, int&&)  ??:0:0
  0095ebb5  base::internal::Invoker<base::internal::BindState<void (net::HostResolverImpl::LegacyRequestImpl::*)(int), base::internal::UnretainedWrapper<net::HostResolverImpl::LegacyRequestImpl> >, void (int)>::RunOnce(base::internal::BindStateBase*, int)  ??:0:0
  0170ee3d  mojo::SimpleWatcher::OnHandleReady(int, unsigned int, mojo::HandleSignalsState const
  0170ef49  mojo::SimpleWatcher::Context::Notify(unsigned int, MojoHandleSignalsState, unsigned int
  0170eadd  mojo::SimpleWatcher::Context::CallNotify(MojoTrapEvent const
  00cfdfa3  mojo::core::WatcherDispatcher::InvokeWatchCallback(unsigned int, unsigned int, mojo::core::HandleSignalsState const&, unsigned int)  ??:0:0
  00cfdda5  mojo::core::Watch::InvokeCallback(unsigned int, mojo::core::HandleSignalsState const&, unsigned int)  ??:0:0
  00cfc1bf  mojo::core::RequestContext::~RequestContext
  00cf828f  mojo::core::NodeChannel::OnChannelMessage(void const*, unsigned int, std::__ndk1::vector<mojo::PlatformHandle, std::__ndk1::allocator<mojo::PlatformHandle> >)  ??:0:0
  00cf1633  mojo::core::Channel::OnReadComplete(unsigned int, unsigned int
  00cffb37  mojo::core::(anonymous namespace)::ChannelPosix::OnFileCanReadWithoutBlocking(int
  016f7ed1  base::MessagePumpLibevent::OnLibeventNotification(int, short, void
  016f9267  event_base_loop
  016f80cb  base::MessagePumpLibevent::Run(base::MessagePump::Delegate
  016b3a45  base::RunLoop::Run
  00d732ad  content::BrowserProcessSubThread::IOThreadRun(base::RunLoop
  016d892d  base::Thread::ThreadMain
  016f379d  base::(anonymous namespace)::ThreadFunc(void
  0000dsystem/lib/libc.so
  0000d30b  system/lib/libc.so

Unfortunately it looks like enabling ARM for this won't be so quick and easy. SwiftShader's current ARM builds are for system-level Android, while here we need an NDK-based (native app-level) build. Several headers aren't available. Even if we can get it to build, it might take some time to get everything working as expected.

Also, I'm traveling and won't be back at my workstation until Friday. So I'm leaning toward reverting to get things green again and giving us some time to do the ARM NDK build properly.
Just so that I understand fully what's going on here:
From the stack, it looks like it's trying to use libGLESv2_adreno.so and libEGL_adreno.so and failing. Why is it failing to use the GPU? Even though a few differences could arise from using the GPU, most layout tests should still pass and definitely none of them should crash, so why does rendering on a Nexus 4 using the GPU cause crashes?

Is there any acceptable workaround for what this bot is testing, like testing on different hardware and using a GPU that works (after all, the Nexus 4 is quite dated at this point)? What is tested here that isn't already covered by other bots? Is Android emulation on Linux not working / insufficient? AFAIK, we don't ship Android with OSMesa to users, so what use cases are we testing here?

We can either temporarily revert for Android or temporarily disable these layout tests if they are redundant with other tests we already run, but it would be nice to understand why these failures happen in the first place from someone who understands Android testing better than I do.
Cc: capn@chromium.org
Owner: kbr@chromium.org
I think content_shell is forcibly disabling the use of the GPU internally here:

https://cs.chromium.org/chromium/src/content/shell/app/shell_main_delegate.cc?q=shell_main_delegate.cc&sq=package:chromium&g=0&l=215

Before we revert the removal of OSMesa from the Chromium repo, may I please try enabling the GPU for these tests on this device in https://chromium-review.googlesource.com/1181683 ? This will probably fix the widespread crashes on this device and only leave a couple of remaining failures on this bot.
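
The behavior described above can be modeled roughly as follows. This is a minimal Python sketch of the assumed logic, not the actual C++ in shell_main_delegate.cc: the function name effective_flags and the SOFTWARE_GL_PLACEHOLDER value are hypothetical, and only the --use-gpu-in-tests flag name comes from this thread.

```python
USE_GPU_IN_TESTS = "--use-gpu-in-tests"

# Placeholder, NOT a real Chromium switch: it stands in for whatever the shell
# appends to select software rendering when running layout tests.
SOFTWARE_GL_PLACEHOLDER = "<software-gl-switch>"


def effective_flags(flags, running_layout_tests=True):
    """Return the flags the shell would end up with under the assumed behavior.

    Assumption: when running layout tests, software rendering is forced
    unless the caller explicitly opts in to the real GPU.
    """
    result = list(flags)
    if running_layout_tests and USE_GPU_IN_TESTS not in result:
        result.append(SOFTWARE_GL_PLACEHOLDER)
    return result


# Without the opt-in flag, the software path is selected.
default_flags = effective_flags([])
# With the opt-in flag, the command line is left alone and the GPU is used.
gpu_flags = effective_flags([USE_GPU_IN_TESTS])
```

Under this model, a device with no software renderer available would fail at GPU process startup unless the opt-in flag reaches the shell, which is consistent with the crashes reported here.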

Project Member

Comment 9 by bugdroid1@chromium.org, Aug 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/4e038507d5b0ecfcc8445243e92e9125d9aa34fd

commit 4e038507d5b0ecfcc8445243e92e9125d9aa34fd
Author: Kenneth Russell <kbr@chromium.org>
Date: Mon Aug 20 18:13:09 2018

Run layout tests with the real GPU on "WebKit Android (Nexus4)".

Disable the software fallback on this one bot because SwiftShader
doesn't yet run on 32-bit ARM. Hopefully most of these tests will run
correctly on top of the real GPU on this device.

Bug:  875172 
Change-Id: I80064474a2be69b4331dd2f36786f1ba1e8830d5
Reviewed-on: https://chromium-review.googlesource.com/1181683
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>

[modify] https://crrev.com/4e038507d5b0ecfcc8445243e92e9125d9aa34fd/scripts/slave/recipe_modules/chromium_tests/chromium_webkit.py

jbudorick@ pointed out that the layout tests are also failing on this Nexus 5 bot:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20(dbg)

whose parent builder is a 32-bit ARM builder:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20arm%20Builder%20(dbg)

Attempting to use the GPU for layout tests on this bot as well in this CL:
https://chromium-review.googlesource.com/1181767

Project Member

Comment 11 by bugdroid1@chromium.org, Aug 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/8a39612b49d0abbfb6bac50563193ab2b80c2720

commit 8a39612b49d0abbfb6bac50563193ab2b80c2720
Author: Kenneth Russell <kbr@chromium.org>
Date: Mon Aug 20 21:18:24 2018

Fix extra_args specification on 'WebKit Android (Nexus4)'.

It should have been an array of strings.

Bug:  875172 
Change-Id: Ie9d52022b58f962b1bd2b86c8f9486f2b00de76f
Reviewed-on: https://chromium-review.googlesource.com/1182195
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>

[modify] https://crrev.com/8a39612b49d0abbfb6bac50563193ab2b80c2720/scripts/slave/recipe_modules/chromium_tests/chromium_webkit.py
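
As an illustration of why the type matters, here is a minimal Python sketch of one common way a bare string in place of a list of strings goes wrong. It is not taken from chromium_webkit.py: build_command and the "run_layout_tests" command name are hypothetical, and the recipe's actual failure mode may have differed.

```python
def build_command(base_cmd, extra_args):
    """Append extra_args to base_cmd, as list-based command builders often do."""
    cmd = list(base_cmd)
    cmd.extend(extra_args)  # iterates extra_args element by element
    return cmd


base = ["run_layout_tests"]  # hypothetical command name

# A bare string is iterated character by character, so the flag is shredded.
broken = build_command(base, "--use-gpu-in-tests")

# A list of strings is appended as whole arguments.
fixed = build_command(base, ["--use-gpu-in-tests"])
```

In this sketch, broken ends up with one single-character argument per character of the flag, while fixed carries the flag as a single argument.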

Project Member

Comment 12 by bugdroid1@chromium.org, Aug 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/45cf5409ce0926426ec90284be3874791f44be17

commit 45cf5409ce0926426ec90284be3874791f44be17
Author: Kenneth Russell <kbr@chromium.org>
Date: Mon Aug 20 21:21:30 2018

Use GPU for layout tests on KitKat Phone Tester (dbg).

SwiftShader isn't available on 32-bit ARM yet.

Bug:  875172 
Change-Id: I3ddeec51d8ef156def5f5194c75cec31a4aa9412
Reviewed-on: https://chromium-review.googlesource.com/1181767
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#584563}
[modify] https://crrev.com/45cf5409ce0926426ec90284be3874791f44be17/testing/buildbot/chromium.android.json
[modify] https://crrev.com/45cf5409ce0926426ec90284be3874791f44be17/testing/buildbot/test_suite_exceptions.pyl

It turns out that while content_shell would obey the command line flag --use-gpu-in-tests, Blink's layout test runner doesn't. These two bots are still failing to run layout tests:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20(dbg)
https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/

I'll pick this up again tomorrow as my top priority. Please don't revert sugoi's CL in the meantime.

I assume we need to pass it w/ --additional-driver-flag=--use-gpu-in-tests?
Issue 874058 has been merged into this issue.
Project Member

Comment 16 by bugdroid1@chromium.org, Aug 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/101c8698c55c7c2c2a071fe84f4780983025f4ba

commit 101c8698c55c7c2c2a071fe84f4780983025f4ba
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Aug 21 17:30:52 2018

Fix additional driver flag on 'WebKit Android (Nexus4)'.

Neglected to use --additional-driver-flag command line argument.

Bug:  875172 
Change-Id: I9546f06c7fc84394cd9983f4534f3ece858c2b47
Reviewed-on: https://chromium-review.googlesource.com/1183882
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>

[modify] https://crrev.com/101c8698c55c7c2c2a071fe84f4780983025f4ba/scripts/slave/recipe_modules/chromium_tests/chromium_webkit.py
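
The wrapping discussed above can be sketched as below. The --additional-driver-flag=--use-gpu-in-tests form is the one suggested earlier in this thread; the helper name wrap_driver_flags is hypothetical and the code is only a minimal illustration, not the recipe's implementation.

```python
DRIVER_FLAG_PREFIX = "--additional-driver-flag="


def wrap_driver_flags(driver_flags):
    """Wrap each driver flag so the layout test runner forwards it to the driver.

    Passing the flag unwrapped hands it to the runner itself, which, per this
    thread, does not act on --use-gpu-in-tests.
    """
    return [DRIVER_FLAG_PREFIX + flag for flag in driver_flags]


runner_args = wrap_driver_flags(["--use-gpu-in-tests"])
print(runner_args)  # ['--additional-driver-flag=--use-gpu-in-tests']
```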

Cc: -fhorschig@chromium.org
Status: Started (was: Assigned)
Thanks jbudorick@ for pointing out that missing flag.

The layout test harness crashes on Linux while initializing the GLX connection when passing that flag, so I'm building blink_tests locally on Android to test on a Nexus 4 to see how they'll work.

Project Member

Comment 19 by bugdroid1@chromium.org, Aug 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e830534349791e9fa99b52dd7b8e201d426d12bf

commit e830534349791e9fa99b52dd7b8e201d426d12bf
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Aug 21 19:34:23 2018

Fix additional driver flag on 'KitKat Phone Tester (dbg)'.

Neglected to use --additional-driver-flag command line arg.

Bug:  875172 
Change-Id: I3603e0b3659a2e6309673aad66b76a3dcb01fcc0
Reviewed-on: https://chromium-review.googlesource.com/1183884
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#584880}
[modify] https://crrev.com/e830534349791e9fa99b52dd7b8e201d426d12bf/testing/buildbot/chromium.android.json
[modify] https://crrev.com/e830534349791e9fa99b52dd7b8e201d426d12bf/testing/buildbot/test_suite_exceptions.pyl

Labels: -Pri-0 Pri-1
After fixing the driver flag, only two tests fail on 'WebKit Android (Nexus4)':

https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/81815

external/wpt/web-animations/interfaces/Animatable/animate.html
http/tests/worklet/webexposed/global-interface-listing-paint-worklet.html

https://chromium-review.googlesource.com/1183979 will suppress these if necessary, but we're going to wait for the first build on the Nexus 5 bot with the fixed flag:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20%28dbg%29/8897

to understand whether these suppressions are actually needed there. jbudorick@ points out that https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/ is on its way out.

Downgrading this to P1 from P0 after discussion with jbudorick@ and dpranke@. Only a couple of bots are affected.

Cc: jonr...@chromium.org
Project Member

Comment 22 by bugdroid1@chromium.org, Aug 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/07d8d2d9a382a0c5a9d851ebe5145dc4512b8612

commit 07d8d2d9a382a0c5a9d851ebe5145dc4512b8612
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Aug 21 22:53:41 2018

Suppress two layout test failures on 'WebKit Android (Nexus4)'.

  external/wpt/web-animations/interfaces/Animatable/animate.html
  http/tests/worklet/webexposed/
    global-interface-listing-paint-worklet.html

These are the remaining test failures seen on this bot when running
with --use-gpu-in-tests. Unfortunately there doesn't seem to be a way
to specialize these for this device.

Bug:  875172 
Change-Id: I050b131b2702987f2e5803ac6e75ea6dfae35b2b
Reviewed-on: https://chromium-review.googlesource.com/1183979
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#584915}
[modify] https://crrev.com/07d8d2d9a382a0c5a9d851ebe5145dc4512b8612/third_party/WebKit/LayoutTests/TestExpectations

'WebKit Android (Nexus4)' is green after the above suppressions:
https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Android%20%28Nexus4%29/81824

Still watching the build of 'KitKat Phone Tester (dbg)' which contains those suppressions:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Phone%20Tester%20%28dbg%29/8900

Blocking: 876539
Status: Fixed (was: Started)
webkit_layout_tests are passing on this bot now. There are some capacity problems on the bot occasionally causing shards to fail. Linking this to the related issue.

Comment 25 Deleted

Status: Verified (was: Fixed)
Marking Verified as per comment#24 & 25
