Issue metadata
Sign in to add a comment
|
gl_unittests timing out on android_n5x_swarming_rel |
||||||||||||||||||||||
Issue descriptionThere are many failures (timeouts) of gl_unittests on the android_n5x_swarming_rel trybot: https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel?numbuilds=200 This is causing valid CLs like https://codereview.chromium.org/2696203004/ to be rejected. It's hard to see what's going on from the logs. My best guess is that this assertion failure is related: [FATAL:task_scheduler_impl.cc(39)] Check failed: join_for_testing_returned_.IsSet(). This assertion was added back on January 25 in https://chromium.googlesource.com/chromium/src/+/7bba05ed69f2a2fdf3aca587dec0c9f148bdc469 , and I don't imagine that these tests have been broken since then. We need to figure this out immediately. It's causing a lot of failures on the CQ. For this reason marking P0.
,
Feb 16 2017
,
Feb 16 2017
Interesting, Android Release (Nexus 5X), running on same devices, is green.
,
Feb 16 2017
The error seems to be:
Device(003c66e07c6ed1a4) 02-16 21:09:03.074 15901 15901 E libEGL : validate_display:255 error 3008 (EGL_BAD_DISPLAY)
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 E chromium: [ERROR:gl_surface_egl.cc(658)] ANGLE Platform initialization failed.
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno : QUALCOMM build : 63c06b2, I8366cd0437
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno : Build Date : 10/21/15
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno : OpenGL ES Shader Compiler Version: XE031.05.13.02
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno : Local Branch :
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno : Remote Branch : quic/LA.BF64.1.2.9_v2
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno : Remote Branch : NONE
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno : Reconstruct Branch : NOTHING
Device(003c66e07c6ed1a4) 02-16 21:09:03.149 15887 15887 I Process : Sending signal. PID: 15901 SIG: 9
Device(003c66e07c6ed1a4) 02-16 21:09:03.188 972 983 I WindowState: WIN DEATH: Window{f2f839 u0 org.chromium.native_test/org.chromium.native_test.NativeUnitTestActivity}
,
Feb 16 2017
Jamie, looks like this is related to your change for platform init?
,
Feb 16 2017
Sorry, a red herring. This seems to be the real failure: Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: [FATAL:task_scheduler_impl.cc(39)] Check failed: join_for_testing_returned_.IsSet(). Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #00 0x0000007f974c020b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000013d20b Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #01 0x0000007f974ef197 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016c197 Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #02 0x0000007f974ef1b3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016c1b3 Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #03 0x0000007f974edec3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016aec3 Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #04 0x0000007f974ee1ff /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016b1ff Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #05 0x0000007f9743582b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x00000000000b282b Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #06 0x0000007f97986ab3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000603ab3 Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #07 0x0000007f9798ac9f /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000607c9f Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #08 0x0000007f9798ae03 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000607e03 Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #09 0x0000007f9798aebf /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000607ebf Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #10 0x0000007f9798d43f /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000060a43f Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #11 0x0000007f9798d64b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000060a64b Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #12 0x0000007f979a6ef3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000623ef3 Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #13 0x0000007f974a207b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000011f07b Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #14 0x0000007f979a8303 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000625303 Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #15 0x0000007f979a847f /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000062547f Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #16 0x0000007f974279eb /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x00000000000a49eb Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #17 0x0000007f978cc3cf /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x00000000005493cf Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #18 0x0000007f980f6c3b /data/app/org.chromium.native_test-1/oat/arm64/base.odex+0x0000000000525c3b Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: Device(003c66e07c6ed1a4) 02-16 21:08:45.036 15600 15600 E chromium: [ERROR:test_suite.cc(295)] Currently running: GLImageSharedMemory/GLImageTest/0.Create
,
Feb 16 2017
That error should be benign, and should go away with https://codereview.chromium.org/2700713004/. Please review. I still don't see why my change would be related so not claiming ownership.
,
Feb 16 2017
I have compared GPU.FYI waterfall bot with this one.
Both have the same failing stack, but looks like all the tests eventually pass in 2nd try.
I also have it fail in the same way locally, but am unable to symbolize with third_party/android_platform/development/scripts/stack.
The difference is, I think, in uploading logcats.
This is from GPU.FYI:
I 45.370s Main FINISHED TRY #2/3
I 45.370s Main All tests completed.
C 45.371s Main ********************************************************************************
C 45.371s Main Summary
C 45.371s Main ********************************************************************************
C 45.372s Main [==========] 41 tests ran.
C 45.372s Main [ PASSED ] 41 tests.
C 45.372s Main ********************************************************************************
I 45.379s tear_down_device(004efe612fa2b6d2) Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_004efe612fa2b6d2.json
I 45.380s tear_down_device(004309a44234a94d) Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_004309a44234a94d.json
I 45.381s tear_down_device(008c30d196b52b59) Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_008c30d196b52b59.json
I 45.382s tear_down_device(00a7d3d53f6c144e) Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00a7d3d53f6c144e.json
I 45.382s tear_down_device(00bb487c8078c728) Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00bb487c8078c728.json
I 45.382s tear_down_device(00ec0b662757c00a) Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00ec0b662757c00a.json
I 45.383s tear_down_device(00e17bd389b310f9) Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00e17bd389b310f9.json
2017/02/16 13:05:45 proto: duplicate proto type registered: google.protobuf.Duration
2017/02/16 13:05:45 proto: duplicate proto type registered: google.protobuf.Timestamp
2017/02/16 13:05:45 proto: duplicate proto type registered: google.protobuf.Empty
[I2017-02-16T13:05:46.739367-08:00 10998 0 main.go:332] Terminating. {"returnCode":0}
INFO:root:Logcats are located at: https://luci-logdog.appspot.com/v/?s=chromium%2Fandroid%2Fswarming%2Flogcats%2F34618efcc8c82811%2F%2B%2Funified_logcats
+---------------------------------------------------------------------------+
| End of shard 0 Pending: 8.7s Duration: 60.6s Bot: build290-m4 Exit: 0 |
+---------------------------------------------------------------------------+
Total duration: 60.6s
exit code (as seen by runtest.py): 0
This is from CQ:
I 45.427s Main FINISHED TRY #2/3
I 45.427s Main All tests completed.
C 45.428s Main ********************************************************************************
C 45.428s Main Summary
C 45.428s Main ********************************************************************************
C 45.428s Main [==========] 41 tests ran.
C 45.428s Main [ PASSED ] 41 tests.
C 45.428s Main ********************************************************************************
I 45.432s tear_down_device(00a695a948c7bbb1) Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00a695a948c7bbb1.json
I 45.433s tear_down_device(003c66e07c6ed1a4) Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_003c66e07c6ed1a4.json
I 45.433s tear_down_device(004f23f589db3cec) Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_004f23f589db3cec.json
I 45.433s tear_down_device(00b806fa8098a327) Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00b806fa8098a327.json
I 45.433s tear_down_device(00b930c18f571e23) Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00b930c18f571e23.json
I 45.433s tear_down_device(00d2a8de2797ce04) Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00d2a8de2797ce04.json
I 45.434s tear_down_device(00bae14274cecaf8) Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00bae14274cecaf8.json
[I2017-02-16T13:09:26.706428-08:00 2348 0 main.go:346] Terminating. {"returnCode":0}
+-----------------------------------------------------------------------------+
| End of shard 0 Pending: 9.8s Duration: 60.0s Bot: build261-m4 Exit: -15 |
+-----------------------------------------------------------------------------+
Total duration: 60.0s
@@@STEP_WARNINGS@@@
Test runtime exceeded allocated time
@@@STEP_LOG_LINE@Test runtime exceeded allocated time@Either it ran for too long (hard timeout) or it didn't produce I/O for an extended period of time (I/O timeout)@@@
@@@STEP_LINK@logdog-->Test runtime exceeded allocated time@https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.android%2Fandroid_n5x_swarming_rel%2F120844%2F%2B%2Frecipes%2Fsteps%2Fgl_unittests__with_patch__on_Android%2F0%2Flogs%2FTest_runtime_exceeded_allocated_time%2F0@@@
@@@STEP_LOG_END@Test runtime exceeded allocated time@@@
@@@STEP_WARNINGS@@@
some shards did not complete: 0
@@@STEP_LOG_LINE@some shards did not complete: 0@Missing results from the following shard(s): 0@@@
@@@STEP_LINK@logdog-->some shards did not complete: 0@https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.android%2Fandroid_n5x_swarming_rel%2F120844%2F%2B%2Frecipes%2Fsteps%2Fgl_unittests__with_patch__on_Android%2F0%2Flogs%2Fsome_shards_did_not_complete%3A_0%2F0@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@This can happen in following cases:@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@ * Test failed to start (missing *.dll/*.so dependency for example)@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@ * Test crashed or hung@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@ * Task expired because there are not enough bots available and are all used@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@ * Swarming service experienced problems@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@Please examine logs to figure out what happened.@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@@@@
@@@STEP_LOG_END@some shards did not complete: 0@@@
exit code (as seen by runtest.py): 241
,
Feb 16 2017
Weird how the main waterfall version of that bot seems unaffected: https://build.chromium.org/p/chromium.android/builders/Android%20N5X%20Swarm%20Builder?numbuilds=200
,
Feb 16 2017
Here's the tail of a log for a test that timed out:
I 45.292s Main FINISHED TRY #2/3
I 45.293s Main All tests completed.
C 45.293s Main ********************************************************************************
C 45.293s Main Summary
C 45.293s Main ********************************************************************************
C 45.294s Main [==========] 41 tests ran.
C 45.294s Main [ PASSED ] 41 tests.
C 45.294s Main ********************************************************************************
I 45.300s tear_down_device(004f06d285a2964f) Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_004f06d285a2964f.json
I 45.300s tear_down_device(00490e5a853aa657) Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00490e5a853aa657.json
I 45.301s tear_down_device(00b66ea289732aec) Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00b66ea289732aec.json
I 45.303s tear_down_device(00ba801480809f24) Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00ba801480809f24.json
I 45.303s tear_down_device(00b931eb258ea2d5) Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00b931eb258ea2d5.json
I 45.304s tear_down_device(00bd3d1a7446baf4) Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00bd3d1a7446baf4.json
I 45.304s tear_down_device(00d4d4956535a083) Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00d4d4956535a083.json
[I2017-02-16T14:56:09.802642-08:00 13646 0 main.go:346] Terminating. {"returnCode":0}
And here's the tail for a log that didn't time out:
I 22.200s Main FINISHED TRY #1/3
I 22.200s Main All tests completed.
C 22.200s Main ********************************************************************************
C 22.200s Main Summary
C 22.200s Main ********************************************************************************
C 22.200s Main [==========] 41 tests ran.
C 22.200s Main [ PASSED ] 41 tests.
C 22.200s Main ********************************************************************************
I 22.201s tear_down_device(008b0a75425cab55) Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_008b0a75425cab55.json
I 22.202s tear_down_device(00cabca27e9739d5) Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00cabca27e9739d5.json
I 22.202s tear_down_device(004ef5b07c8ecda8) Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_004ef5b07c8ecda8.json
I 22.203s tear_down_device(00b7e7d08da636b9) Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00b7e7d08da636b9.json
I 22.203s tear_down_device(00b92ff025b69ad1) Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00b92ff025b69ad1.json
I 22.203s tear_down_device(00ca940725cebace) Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00ca940725cebace.json
I 22.203s tear_down_device(00caa86a80b0d125) Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00caa86a80b0d125.json
[I2017-02-16T14:56:34.908273-08:00 31796 0 main.go:346] Terminating. {"returnCode":0}
INFO:root:Logcats are located at: https://luci-logdog.appspot.com/v/?s=chromium%2Fandroid%2Fswarming%2Flogcats%2F3461f4b010a4d711%2F%2B%2Funified_logcats
Looks like the butler exits, but the logdog_wrapper hangs around and doesn't complete the log uploads.
,
Feb 16 2017
I'll poke around for a bit, but if I don't find anything obvious, I'll try committing https://codereview.chromium.org/2653313003/ which I have hanging around for the last time logdog_wrapper misbehaved.
,
Feb 16 2017
#11: that CL still lgtm. I think users are primarily getting the full logcat through the isolateserver, not that either is easy to find.
,
Feb 16 2017
This timeout has a bit more info: https://chromium-swarm.appspot.com/task?id=346210f8bd98a010 Thread "record_to_file" (ident: 140574187308800) is currently running: File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap self.__bootstrap_inner() File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner self.run() File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run self._ret = self._func(*self._args, **self._kwargs) File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file logcat_format='threadtime'): File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines for data in _IterProcessStdout(process, timeout=timeout): File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout read_fds, _, _ = select.select([child_fd], [], [], poll_interval) Definitely something with logcats. Not sure it's the wrapper though.
,
Feb 16 2017
I'll go ahead with the CL in #11, but it might not fix all our problems.
,
Feb 16 2017
hm, yeah, that's hanging before it has a chance to get to the wrapper.
,
Feb 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e428ea9a94c51f8f16af35dac2d03ec97793a100 commit e428ea9a94c51f8f16af35dac2d03ec97793a100 Author: bpastene <bpastene@chromium.org> Date: Fri Feb 17 02:20:32 2017 Temporarily disable logdog uploading of android logcats in swarming. Many android swarming tasks are timing out due to a hanging logdog butler process. This disables the mechanism that uses it. BUG=684559, 693203 Review-Url: https://codereview.chromium.org/2653313003 Cr-Commit-Position: refs/heads/master@{#451184} [modify] https://crrev.com/e428ea9a94c51f8f16af35dac2d03ec97793a100/tools/mb/mb.py
,
Feb 17 2017
Looks like things have gotten significantly better: https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel?numbuilds=200 Although, the test does timeout every now and then: https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/121450 https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/121521
,
Feb 17 2017
Issue 693249 has been merged into this issue.
,
Feb 17 2017
bpastene@: there still seems to be some infra-level problem. The test harness seems to have exited in under a minute, but exceptions were thrown in code underneath build/android/test_runner.py:
C 45.869s Main Summary
C 45.869s Main ********************************************************************************
C 45.869s Main [==========] 41 tests ran.
C 45.869s Main [ PASSED ] 41 tests.
C 45.869s Main ********************************************************************************
I 45.874s tear_down_device(004919f28552b857) Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_004919f28552b857.json
I 45.876s tear_down_device(004ef33c858ad857) Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_004ef33c858ad857.json
I 45.877s tear_down_device(00769bda427cd74f) Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00769bda427cd74f.json
I 45.877s tear_down_device(00b924f3259e9ecf) Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00b924f3259e9ecf.json
I 45.878s tear_down_device(00b9305f8f974228) Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00b9305f8f974228.json
I 45.879s tear_down_device(00bd3eb774c6b4f4) Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00bd3eb774c6b4f4.json
I 45.880s tear_down_device(00c6855bac6f208d) Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00c6855bac6f208d.json
C 59.753s Main Received SIGTERM. Shutting down.
Thread "record_to_file" (ident: 140348793812736) is currently running:
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
self._ret = self._func(*self._args, **self._kwargs)
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
logcat_format='threadtime'):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
for data in _IterProcessStdout(process, timeout=timeout):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
Thread "record_to_file" (ident: 140349353723648) is currently running:
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
self._ret = self._func(*self._args, **self._kwargs)
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
logcat_format='threadtime'):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
for data in _IterProcessStdout(process, timeout=timeout):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
Thread "record_to_file" (ident: 140349370509056) is currently running:
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
self._ret = self._func(*self._args, **self._kwargs)
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
logcat_format='threadtime'):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
for data in _IterProcessStdout(process, timeout=timeout):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
Thread "record_to_file" (ident: 140348777027328) is currently running:
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
self._ret = self._func(*self._args, **self._kwargs)
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
logcat_format='threadtime'):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
for data in _IterProcessStdout(process, timeout=timeout):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
Thread "record_to_file" (ident: 140349397792512) is currently running:
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
self._ret = self._func(*self._args, **self._kwargs)
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
logcat_format='threadtime'):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
for data in _IterProcessStdout(process, timeout=timeout):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
Thread "record_to_file" (ident: 140349388347136) is currently running:
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
self._ret = self._func(*self._args, **self._kwargs)
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
logcat_format='threadtime'):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
for data in _IterProcessStdout(process, timeout=timeout):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
Thread "record_to_file" (ident: 140349624678144) is currently running:
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
self._ret = self._func(*self._args, **self._kwargs)
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
logcat_format='threadtime'):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
for data in _IterProcessStdout(process, timeout=timeout):
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
read_fds, _, _ = select.select([child_fd], [], [], poll_interval)
Thread "MainThread" (ident: 140349698787136) is currently running:
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 862, in <module>
sys.exit(main())
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 850, in main
return RunTestsCommand(args)
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 645, in RunTestsCommand
return RunTestsInPlatformMode(args)
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 792, in RunTestsInPlatformMode
str(iteration_count))
File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/base/environment.py", line 33, in __exit__
self.TearDown()
File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/local/device/local_device_environment.py", line 225, in TearDown
m.Stop()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 204, in Stop
self._StopRecording()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 183, in _StopRecording
self._record_thread.join(timeout=self._RECORD_THREAD_JOIN_WAIT)
File "/usr/lib/python2.7/threading.py", line 960, in join
self.__block.wait(delay)
File "/usr/lib/python2.7/threading.py", line 358, in wait
_sleep(delay)
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 691, in unexpected_sigterm
sys._current_frames()[live_thread.ident]))
E 59.756s Main Unrecognized error occurred.
Traceback (most recent call last):
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 850, in main
return RunTestsCommand(args)
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 645, in RunTestsCommand
return RunTestsInPlatformMode(args)
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 792, in RunTestsInPlatformMode
str(iteration_count))
File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/base/environment.py", line 33, in __exit__
self.TearDown()
File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/local/device/local_device_environment.py", line 225, in TearDown
m.Stop()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 204, in Stop
self._StopRecording()
File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 183, in _StopRecording
self._record_thread.join(timeout=self._RECORD_THREAD_JOIN_WAIT)
File "/usr/lib/python2.7/threading.py", line 960, in join
self.__block.wait(delay)
File "/usr/lib/python2.7/threading.py", line 358, in wait
_sleep(delay)
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 697, in unexpected_sigterm
infra_error('\n'.join(msg))
File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 677, in infra_error
sys.exit(constants.INFRA_EXIT_CODE)
SystemExit: 87
,
Feb 17 2017
So it seems the threads responsible for pulling logcats off of the devices are either stuck or taking too long. Since this only seems to be a problem for gl_unittests, is it possible that this test is now suddenly spewing a lot more logging to the device? I can look into that, but the logic behind the actual monitor is outside my scope, so passing to jbudorick@
,
Feb 17 2017
Yeah, the monitor needs to behave better in this case.
,
Feb 17 2017
Re more logging - it may be because of the crashing tests. I've opened issue 693369 for this.
,
Feb 18 2017
Ah, the dcheck gate would explain why this never showed up on the main waterfall. (trybots = main watefall bots + dcheck enabled)
,
Feb 21 2017
,
Feb 21 2017
Fix pending review https://codereview.chromium.org/2708923003
,
Feb 21 2017
LogcatMonitor fix on review: https://codereview.chromium.org/2707113002/
,
Feb 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240 commit 8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240 Author: fdoray <fdoray@chromium.org> Date: Tue Feb 21 21:53:17 2017 Use base::test::ScopedAsyncTaskScheduler in GLImageTest. GLImageTest registers a TaskScheduler by calling base::test::CreateAndSetSimpleTaskScheduler, but never unregisters it. The second GLImageTest running in fails because there is already a registered TaskScheduler. This CL replaces the call to base::test::CreateAndSetSimpleTaskScheduler with a base::test::ScopedAsyncTaskScheduler member which takes care of registering a TaskScheduler at the beginning of the test and unregistering it at the end of the test. BUG= 693203 Review-Url: https://codereview.chromium.org/2708923003 Cr-Commit-Position: refs/heads/master@{#451842} [modify] https://crrev.com/8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240/ui/gl/BUILD.gn [modify] https://crrev.com/8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240/ui/gl/test/gl_image_test_template.h
,
Feb 22 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3c96661cd7d066e3f4d6fe8b9760661ff76ea005 commit 3c96661cd7d066e3f4d6fe8b9760661ff76ea005 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Wed Feb 22 09:17:23 2017 Roll src/third_party/catapult/ d9036c9b7..e56eec861 (13 commits). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/d9036c9b7e3d..e56eec86141a $ git log d9036c9b7..e56eec861 --date=short --no-merges --format='%ad %ae %s' 2017-02-21 nednguyen Revert of [Telemetry] Migrate browser_test_runner to use typ as the test runner (patchset #1 id:1 of https://codereview.chromium.org/2701253008/ ) 2017-02-21 jbudorick [devil] Add an iteration timeout option to cmd_helper.IterCmdStdoutLines. 2017-02-21 benjhayden Update style in value/ui/. 2017-02-21 benjhayden Dedent CSS in histogram_set_view. 2017-02-21 benjhayden Fix merged relationships in histogram-set-table. 2017-02-21 benjhayden Merge sample diagnostics. 2017-02-21 nednguyen Reland of [Telemetry] Migrate browser_test_runner to use typ as the test runner (patchset #1 id:1 of https://codereview.chromium.org/2700263002/ ) 2017-02-21 benjhayden Update style in tab_view.html. 2017-02-21 benjhayden Allow toggling merged-telemetry-info-span. 2017-02-21 benjhayden Document createScalarSpan. 2017-02-21 charliea Update the update_v8 script and roll d8 on Windows to stable 2017-02-21 perezju [Devil] Improve error message for pull failure 2017-02-21 sebmarchand Use pgosweep.exe from the build directory if it's available Created with: roll-dep src/third_party/catapult BUG= 636153 , 693203 , 636153 , 693714 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2710903002 Cr-Commit-Position: refs/heads/master@{#451952} [modify] https://crrev.com/3c96661cd7d066e3f4d6fe8b9760661ff76ea005/DEPS
,
Feb 22 2017
I think this is fixed in both regards. We've had no gl_unittests timeouts since #27 landed: https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=user&c=bot&et=1487779980000&f=device_type%3ANexus%205X%20(bullhead)&f=name%3Agl_unittests&f=buildername%3Aandroid_n5x_swarming_rel&f=state%3ATIMED_OUT&l=50&n=true&q=state%3A&s=created_ts%3Adesc&st=1487347980000 Thanks fdoray for fixing the test side of things!
,
Feb 22 2017
Thanks John and Francois for getting to the bottom of this!!! |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by kbr@chromium.org
, Feb 16 2017Components: Infra>Client>Android