New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 633321 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug



Sign in to add a comment

webkit_unit_tests failing after didn't run for ~10 hours, still failing for 8+ days

Project Member Reported by gab@chromium.org, Aug 1 2016

Issue description

Large Revision range:
chromium 408864 : 408966 => https://chromium.googlesource.com/chromium/src/+log/12e4094f1942c015925e406c7ce86c9cdafea645..7c19b1a766b1aa7aea6aa5b03fb5dd8dbed62ec8?pretty=fuller&n=10000

Failing builders:
WebKit Android (Nexus4): https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Android%20(Nexus4)

Initially suspecting https://codereview.chromium.org/2134663002 per it touching both Android and scroll (errors seem to be scrolling related...). Attempting revert
 

Comment 1 by gab@chromium.org, Aug 1 2016

Cc: ma...@chromium.org tzik@chromium.org
Forgot to add BUG= but https://codereview.chromium.org/2198993002/ landed as a revert for suspect.

Leaving the office for today though, @other sheriffs to track whether this fixes https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Android%20(Nexus4) and reland if not.

Comment 2 by ma...@chromium.org, Aug 1 2016

Was not the culprit, relanding.

Comment 3 by gab@chromium.org, Aug 2 2016

Cc: gab@chromium.org
Owner: kozyatinskiy@chromium.org
The test runner seems to be timing out about 80% though. Seeing a bunch of "Timed out. Dumping threads." errors towards the end.
Cc: kozyatinskiy@chromium.org
Labels: -Pri-2 -Sheriff-Chromium Pri-1
Owner: stip@chromium.org
This looks like an infra bug. The tests all fail due to some Android problem.

This is pretty crazy to *still* be failing 8 days later. Please disable this bot or fix it. :)
Labels: Infra-Troopers
Summary: webkit_unit_tests failing after didn't run for ~10 hours, still failing for 8+ days (was: webkit_unit_tests failing after didn't run for ~10 hours...)
Cc: qyears...@chromium.org
Cc: stip@chromium.org
Owner: ----
Status: Available (was: Assigned)
Putting this back in the trooper queue, this is pretty bad, it's wasting lots of resources failing and retrying all the time. Can we just disable this bot entirely if we're going to ignore it for weeks?

I  126.197s run_tests_on_device(02741ca41a0ea5b6)  Still working on StartInstrumentation(02741ca41a0ea5b6, org.chromium.native_test/org.chromium.native_test.NativeTestInstrumentationTestRunner, retries=0, extras={'org.chromium.native_test.NativeTestInstrumentationTestRunner.ShardNanoTimeout': 120000000000, 'org.chromium.native_test.NativeTestInstrumentationTestRunner.NativeTestActivity': 'org.chromium.native_test.NativeUnitTestActivity', 'org.chromium.native_test.NativeTestInstrumentationTestRunner.TestList': '/data/local/tmp/temp_file-3c2316023c0c4'}, timeout=120.0, raw=False)
C  165.211s run_tests_on_device(01cb7ce0cfe305ae)  Timed out. Dumping threads.
C  165.211s run_tests_on_device(01cb7ce0cfe305ae)  ********************************************************************************
C  165.212s run_tests_on_device(01cb7ce0cfe305ae)  Stack dump for thread 'TimeoutThread-1-for-run_tests_on_device(01cb7ce0cfe305ae)'
C  165.212s run_tests_on_device(01cb7ce0cfe305ae)  ********************************************************************************
C  165.225s run_tests_on_device(01cb7ce0cfe305ae)  File: "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
C  165.225s run_tests_on_device(01cb7ce0cfe305ae)    self.__bootstrap_inner()
C  165.225s run_tests_on_device(01cb7ce0cfe305ae)  File: "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
C  165.225s run_tests_on_device(01cb7ce0cfe305ae)    self.run()
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)    self._ret = self._func(*self._args, **self._kwargs)
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/timeout_retry.py", line 150, in <lambda>
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)    child_thread = reraiser_thread.ReraiserThread(lambda: func(*args, **kwargs),
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)    return f(*args, **kwargs)
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 1052, in StartInstrumentation
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)    large_output=True)
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 51, in timeout_retry_wrapper
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)    return impl()
C  165.226s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)    return f(*args, **kwargs)
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 898, in RunShellCommand
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)    output = handle_large_output(cmd, large_output)
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 872, in handle_large_output
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)    handle_large_command(cmd)
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 858, in handle_large_command
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)    return handle_check_return(cmd)
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 849, in handle_check_return
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)    return run(cmd)
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 845, in run
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)    return self.adb.Shell(cmd)
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 476, in Shell
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)    output = self._RunDeviceAdbCmd(args, timeout, retries, check_error=False)
C  165.227s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 278, in _RunDeviceAdbCmd
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)    check_error=check_error)
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 51, in timeout_retry_wrapper
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)    return impl()
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)    return f(*args, **kwargs)
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 245, in _RunAdbCmd
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)    timeout_retry.CurrentTimeoutThreadGroup().GetRemainingTime())
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/cmd_helper.py", line 268, in GetCmdStatusAndOutputWithTimeout
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)    for data in _IterProcessStdout(process, timeout=timeout):
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)  File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/cmd_helper.py", line 227, in _IterProcessStdout
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
C  165.228s run_tests_on_device(01cb7ce0cfe305ae)  ********************************************************************************
I  165.229s TimeoutThread-1-for-run_tests_on_device(01cb7ce0cfe305ae)  [host]> /b/c/b/WebKit_Android__Nexus4_/src/third_party/android_tools/sdk/platform-tools/adb -s 01cb7ce0cfe305ae shell '( p=org.chromium.native_test;if [[ "$(ps)" = *$p* ]]; then am force-stop $p; fi );echo %$?'
I  165.286s TimeoutThread-1-for-delete_temporary_file(01cb7ce0cfe305ae)  [host]> /b/c/b/WebKit_Android__Nexus4_/src/third_party/android_tools/sdk/platform-tools/adb -s 01cb7ce0cfe305ae shell 'rm -f /data/local/tmp/temp_file-96380d03fc814'
E  166.132s run_tests_on_device(01cb7ce0cfe305ae)  Shard timed out: run_tests_on_device(01cb7ce0cfe305ae)
Traceback (most recent call last):
  File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_environment.py", line 55, in wrapper
    return f(dev, *args, **kwargs)
  File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_test_run.py", line 62, in run_tests_on_device
    result = self._RunTest(dev, test)
  File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_gtest_run.py", line 346, in _RunTest
    timeout=timeout, retries=0)
  File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_gtest_run.py", line 152, in Run
    self._component, extras=extras, raw=False, **kwargs)
  File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 57, in timeout_retry_wrapper
    retry_if_func=retry_if_func)
  File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/timeout_retry.py", line 157, in Run
    error_log_func=error_log_func)
  File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 186, in JoinAll
    self._JoinAll(watcher, timeout)
  File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 151, in _JoinAll
    (len(alive_threads), len(self._threads)))
CommandTimeoutError: Timed out waiting for 1 of 1 threads.

I think something is wrong with the suite, not with infra.

I've attached a graph of the median and mean runtime for the last 30 days for webkit_unit_tests on the N4 bot. You'll note that it used to take less than two minutes for the entire suite to run. Something landed around August 1 that dramatically increased the runtime.
Screen Shot 2016-08-10 at 11.35.12 AM.png
114 KB View Download
Components: Infra>Client>Android
Labels: -Infra-Troopers
Triaging over to android infra, this looks like a failure within catapult (?).

Also Troopers don't have the authority to remove builders from waterfalls, but sheriffs do.
no, this isn't a failure within catapult, either.
Issue 636314 has been merged into this issue.

Comment 14 by kolos@chromium.org, Aug 11 2016

Labels: Sheriff-Chromium

Comment 15 by kolos@chromium.org, Aug 11 2016

Cc: kolos@chromium.org
I rerun last successful build #51255 (https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51255). New log is here https://uberchromegw.corp.google.com/i/chromium.webkit/builders/Android%20Builder/builds/90128 It was successful, i.e. the problem is not with infra. 

I am going to find the culprit with bisect on the commits listed here (https://chromium.googlesource.com/chromium/src/+log/12e4094f1942c015925e406c7ce86c9cdafea645..7c19b1a766b1aa7aea6aa5b03fb5dd8dbed62ec8?pretty=fuller&n=10000). It takes some time. 

Comment 16 by kolos@chromium.org, Aug 11 2016

Owner: kolos@chromium.org
Status: Assigned (was: Available)
Run at master@{408914} was successful. 

New range is 408915-408966.

Comment 17 by kolos@chromium.org, Aug 12 2016

Bisect pointed to this CL: Change VirtualTimePolicy::PAUSE_IF_NETWORK_FETCHES_PENDING (https://codereview.chromium.org/2184123002)


408943 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51744 Passed

408944 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51787 -> FAIL

I will double check and try to revert this CL

Comment 19 by kolos@chromium.org, Aug 12 2016

Cc: alexclarke@chromium.org

Comment 20 by kolos@chromium.org, Aug 12 2016

Is is impossible to revert this CL (https://codereview.chromium.org/2246493002/#). 

alexclarke@ will have a look what is going on w/ tests. 
Owner: alexclarke@chromium.org
Status: Started (was: Assigned)
I suspect one of the VirtualTimeTest tests is failing on android in some weird way that makes it seem all the subsequent tests are failing too.

It's probably quite hard to revert that patch since various patches have landed on top, instead I'll find out what is going on android and check the test is behaving on other platforms.  If it turns out it's 1 or 2 tests failing only on android I'll disable them on that platform. 

I've reproduced this problem locally, the VirtualTimeTests pass when run individually, but the suite fails when they're run. Commenting them out makes the test suite pass.

I suspect there's something different about the way these tests are run on android that makes enabling virtual time is bad.
Project Member

Comment 24 by bugdroid1@chromium.org, Aug 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/50aaa9cf939631691a4a0dfde16f1fc429db7b1b

commit 50aaa9cf939631691a4a0dfde16f1fc429db7b1b
Author: alexclarke <alexclarke@chromium.org>
Date: Fri Aug 12 16:45:44 2016

Disable VirtualTimeTest* on Andorid

These tests are failing on Android.  We don't actually need this code
to run there currently, so lets disable them and figure out later why
they fail there.  (My suspicion is something is different about the
android gtest harness which causes virtual time to break things)

BUG= 633321 

Review-Url: https://codereview.chromium.org/2239293003
Cr-Commit-Position: refs/heads/master@{#411675}

[modify] https://crrev.com/50aaa9cf939631691a4a0dfde16f1fc429db7b1b/third_party/WebKit/Source/web/tests/VirtualTimeTest.cpp

The bot should go green now.  If not I'll look into it more on monday.

Comment 26 by kolos@chromium.org, Aug 12 2016

The bot passed webkit_unit_tests! webkit_tests were failed (because of fast/dom/Window/window-scaled-viewport-properties.html), but this is another issue. 
Cc: -qyears...@chromium.org
Labels: -Sheriff-Chromium
Removing Sheriff-Chromium label because now bots are green.
Labels: Type-Bug
Status: Fixed (was: Started)
I think we can close this.

Sign in to add a comment