webkit_unit_tests failing after didn't run for ~10 hours, still failing for 8+ days |
|||||||||||||||||
Issue descriptionLarge Revision range: chromium 408864 : 408966 => https://chromium.googlesource.com/chromium/src/+log/12e4094f1942c015925e406c7ce86c9cdafea645..7c19b1a766b1aa7aea6aa5b03fb5dd8dbed62ec8?pretty=fuller&n=10000 Failing builders: WebKit Android (Nexus4): https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Android%20(Nexus4) Initially suspecting https://codereview.chromium.org/2134663002 per it touching both Android and scroll (errors seem to be scrolling related...). Attempting revert
,
Aug 1 2016
Was not the culprit, relanding.
,
Aug 2 2016
,
Aug 9 2016
The test runner seems to be timing out about 80% though. Seeing a bunch of "Timed out. Dumping threads." errors towards the end.
,
Aug 9 2016
This looks like an infra bug. The tests all fail due to some Android problem. This is pretty crazy to *still* be failing 8 days later. Please disable this bot or fix it. :)
,
Aug 9 2016
,
Aug 9 2016
,
Aug 9 2016
,
Aug 10 2016
Putting this back in the trooper queue, this is pretty bad, it's wasting lots of resources failing and retrying all the time. Can we just disable this bot entirely if we're going to ignore it for weeks?
I 126.197s run_tests_on_device(02741ca41a0ea5b6) Still working on StartInstrumentation(02741ca41a0ea5b6, org.chromium.native_test/org.chromium.native_test.NativeTestInstrumentationTestRunner, retries=0, extras={'org.chromium.native_test.NativeTestInstrumentationTestRunner.ShardNanoTimeout': 120000000000, 'org.chromium.native_test.NativeTestInstrumentationTestRunner.NativeTestActivity': 'org.chromium.native_test.NativeUnitTestActivity', 'org.chromium.native_test.NativeTestInstrumentationTestRunner.TestList': '/data/local/tmp/temp_file-3c2316023c0c4'}, timeout=120.0, raw=False)
C 165.211s run_tests_on_device(01cb7ce0cfe305ae) Timed out. Dumping threads.
C 165.211s run_tests_on_device(01cb7ce0cfe305ae) ********************************************************************************
C 165.212s run_tests_on_device(01cb7ce0cfe305ae) Stack dump for thread 'TimeoutThread-1-for-run_tests_on_device(01cb7ce0cfe305ae)'
C 165.212s run_tests_on_device(01cb7ce0cfe305ae) ********************************************************************************
C 165.225s run_tests_on_device(01cb7ce0cfe305ae) File: "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
C 165.225s run_tests_on_device(01cb7ce0cfe305ae) self.__bootstrap_inner()
C 165.225s run_tests_on_device(01cb7ce0cfe305ae) File: "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
C 165.225s run_tests_on_device(01cb7ce0cfe305ae) self.run()
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) self._ret = self._func(*self._args, **self._kwargs)
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/timeout_retry.py", line 150, in <lambda>
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) child_thread = reraiser_thread.ReraiserThread(lambda: func(*args, **kwargs),
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) return f(*args, **kwargs)
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 1052, in StartInstrumentation
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) large_output=True)
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 51, in timeout_retry_wrapper
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) return impl()
C 165.226s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) return f(*args, **kwargs)
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 898, in RunShellCommand
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) output = handle_large_output(cmd, large_output)
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 872, in handle_large_output
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) handle_large_command(cmd)
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 858, in handle_large_command
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) return handle_check_return(cmd)
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 849, in handle_check_return
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) return run(cmd)
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/device_utils.py", line 845, in run
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) return self.adb.Shell(cmd)
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 476, in Shell
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) output = self._RunDeviceAdbCmd(args, timeout, retries, check_error=False)
C 165.227s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 278, in _RunDeviceAdbCmd
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) check_error=check_error)
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 51, in timeout_retry_wrapper
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) return impl()
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) return f(*args, **kwargs)
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 245, in _RunAdbCmd
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) timeout_retry.CurrentTimeoutThreadGroup().GetRemainingTime())
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/cmd_helper.py", line 268, in GetCmdStatusAndOutputWithTimeout
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) for data in _IterProcessStdout(process, timeout=timeout):
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) File: "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/cmd_helper.py", line 227, in _IterProcessStdout
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
C 165.228s run_tests_on_device(01cb7ce0cfe305ae) ********************************************************************************
I 165.229s TimeoutThread-1-for-run_tests_on_device(01cb7ce0cfe305ae) [host]> /b/c/b/WebKit_Android__Nexus4_/src/third_party/android_tools/sdk/platform-tools/adb -s 01cb7ce0cfe305ae shell '( p=org.chromium.native_test;if [[ "$(ps)" = *$p* ]]; then am force-stop $p; fi );echo %$?'
I 165.286s TimeoutThread-1-for-delete_temporary_file(01cb7ce0cfe305ae) [host]> /b/c/b/WebKit_Android__Nexus4_/src/third_party/android_tools/sdk/platform-tools/adb -s 01cb7ce0cfe305ae shell 'rm -f /data/local/tmp/temp_file-96380d03fc814'
E 166.132s run_tests_on_device(01cb7ce0cfe305ae) Shard timed out: run_tests_on_device(01cb7ce0cfe305ae)
Traceback (most recent call last):
File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_environment.py", line 55, in wrapper
return f(dev, *args, **kwargs)
File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_test_run.py", line 62, in run_tests_on_device
result = self._RunTest(dev, test)
File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_gtest_run.py", line 346, in _RunTest
timeout=timeout, retries=0)
File "/b/c/b/WebKit_Android__Nexus4_/src/build/android/pylib/local/device/local_device_gtest_run.py", line 152, in Run
self._component, extras=extras, raw=False, **kwargs)
File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/android/decorators.py", line 57, in timeout_retry_wrapper
retry_if_func=retry_if_func)
File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/timeout_retry.py", line 157, in Run
error_log_func=error_log_func)
File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 186, in JoinAll
self._JoinAll(watcher, timeout)
File "/b/c/b/WebKit_Android__Nexus4_/src/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 151, in _JoinAll
(len(alive_threads), len(self._threads)))
CommandTimeoutError: Timed out waiting for 1 of 1 threads.
,
Aug 10 2016
I think something is wrong with the suite, not with infra. I've attached a graph of the median and mean runtime for the last 30 days for webkit_unit_tests on the N4 bot. You'll note that it used to take less than two minutes for the entire suite to run. Something landed around August 1 that dramatically increased the runtime.
,
Aug 10 2016
Triaging over to android infra, this looks like a failure within catapult (?). Also Troopers don't have the authority to remove builders from waterfalls, but sheriffs do.
,
Aug 10 2016
no, this isn't a failure within catapult, either.
,
Aug 10 2016
Issue 636314 has been merged into this issue.
,
Aug 11 2016
,
Aug 11 2016
I rerun last successful build #51255 (https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51255). New log is here https://uberchromegw.corp.google.com/i/chromium.webkit/builders/Android%20Builder/builds/90128 It was successful, i.e. the problem is not with infra. I am going to find the culprit with bisect on the commits listed here (https://chromium.googlesource.com/chromium/src/+log/12e4094f1942c015925e406c7ce86c9cdafea645..7c19b1a766b1aa7aea6aa5b03fb5dd8dbed62ec8?pretty=fuller&n=10000). It takes some time.
,
Aug 11 2016
Run at master@{408914} was successful.
New range is 408915-408966.
,
Aug 12 2016
Bisect pointed to this CL: Change VirtualTimePolicy::PAUSE_IF_NETWORK_FETCHES_PENDING (https://codereview.chromium.org/2184123002) 408943 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51744 Passed 408944 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51787 -> FAIL I will double check and try to revert this CL
,
Aug 12 2016
Bisect runs: 408863 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51735 Passed 408914 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51740 Passed 408943 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51744 Passed Recheck: https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51790 Passed 408955 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51749 FAIL 408950 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51777 FAIL 408947 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51781 FAIL 408945 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51783 FAIL 408944 https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51787 FAIL
,
Aug 12 2016
,
Aug 12 2016
Is is impossible to revert this CL (https://codereview.chromium.org/2246493002/#). alexclarke@ will have a look what is going on w/ tests.
,
Aug 12 2016
I suspect one of the VirtualTimeTest tests is failing on android in some weird way that makes it seem all the subsequent tests are failing too. It's probably quite hard to revert that patch since various patches have landed on top, instead I'll find out what is going on android and check the test is behaving on other platforms. If it turns out it's 1 or 2 tests failing only on android I'll disable them on that platform.
,
Aug 12 2016
This suggests the problem is only on android: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webkit_unit_tests&tests=VirtualTimeTest*
,
Aug 12 2016
I've reproduced this problem locally, the VirtualTimeTests pass when run individually, but the suite fails when they're run. Commenting them out makes the test suite pass. I suspect there's something different about the way these tests are run on android that makes enabling virtual time is bad.
,
Aug 12 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/50aaa9cf939631691a4a0dfde16f1fc429db7b1b commit 50aaa9cf939631691a4a0dfde16f1fc429db7b1b Author: alexclarke <alexclarke@chromium.org> Date: Fri Aug 12 16:45:44 2016 Disable VirtualTimeTest* on Andorid These tests are failing on Android. We don't actually need this code to run there currently, so lets disable them and figure out later why they fail there. (My suspicion is something is different about the android gtest harness which causes virtual time to break things) BUG= 633321 Review-Url: https://codereview.chromium.org/2239293003 Cr-Commit-Position: refs/heads/master@{#411675} [modify] https://crrev.com/50aaa9cf939631691a4a0dfde16f1fc429db7b1b/third_party/WebKit/Source/web/tests/VirtualTimeTest.cpp
,
Aug 12 2016
The bot should go green now. If not I'll look into it more on monday.
,
Aug 12 2016
The bot passed webkit_unit_tests! webkit_tests were failed (because of fast/dom/Window/window-scaled-viewport-properties.html), but this is another issue.
,
Aug 12 2016
The next build (https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Android%20%28Nexus4%29/builds/51802) is green.
,
Aug 13 2016
,
Aug 15 2016
Removing Sheriff-Chromium label because now bots are green.
,
Sep 1 2016
,
Oct 25 2016
I think we can close this. |
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by gab@chromium.org
, Aug 1 2016