New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 693203 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug-Regression



Sign in to add a comment

gl_unittests timing out on android_n5x_swarming_rel

Project Member Reported by kbr@chromium.org, Feb 16 2017

Issue description

There are many failures (timeouts) of gl_unittests on the android_n5x_swarming_rel trybot:
https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel?numbuilds=200

This is causing valid CLs like https://codereview.chromium.org/2696203004/ to be rejected.

It's hard to see what's going on from the logs. My best guess is that this assertion failure is related:

[FATAL:task_scheduler_impl.cc(39)] Check failed: join_for_testing_returned_.IsSet(). 

This assertion was added back on January 25 in https://chromium.googlesource.com/chromium/src/+/7bba05ed69f2a2fdf3aca587dec0c9f148bdc469 , and I don't imagine that these tests have been broken since then.

We need to figure this out immediately. It's causing a lot of failures on the CQ. For this reason marking P0.

 

Comment 1 by kbr@chromium.org, Feb 16 2017

Cc: jbudorick@chromium.org
Components: Infra>Client>Android
Looking at the end of the log, it seems that the shard completed successfully.

https://chromium-swarm.appspot.com/task?id=34611fb8b02d2710&refresh=10&show_raw=1

I   45.576s Main  FINISHED TRY #2/3
I   45.576s Main  All tests completed.
C   45.577s Main  ********************************************************************************
C   45.577s Main  Summary
C   45.577s Main  ********************************************************************************
C   45.578s Main  [==========] 41 tests ran.
C   45.578s Main  [  PASSED  ] 41 tests.
C   45.578s Main  ********************************************************************************
I   45.584s tear_down_device(00439efe1d86cae3)  Wrote device cache: /b/swarm_slave/w/ir0bWNlQ/out/Release/device_cache_00439efe1d86cae3.json
I   45.588s tear_down_device(0049003c85aa9c55)  Wrote device cache: /b/swarm_slave/w/ir0bWNlQ/out/Release/device_cache_0049003c85aa9c55.json
I   45.589s tear_down_device(0078f1f6834a3dd6)  Wrote device cache: /b/swarm_slave/w/ir0bWNlQ/out/Release/device_cache_0078f1f6834a3dd6.json
I   45.589s tear_down_device(006717981de69ae8)  Wrote device cache: /b/swarm_slave/w/ir0bWNlQ/out/Release/device_cache_006717981de69ae8.json
I   45.590s tear_down_device(00c68ae98f974c29)  Wrote device cache: /b/swarm_slave/w/ir0bWNlQ/out/Release/device_cache_00c68ae98f974c29.json
I   45.591s tear_down_device(00b923002566c4d0)  Wrote device cache: /b/swarm_slave/w/ir0bWNlQ/out/Release/device_cache_00b923002566c4d0.json
I   45.591s tear_down_device(00c8b45d79e9b4f5)  Wrote device cache: /b/swarm_slave/w/ir0bWNlQ/out/Release/device_cache_00c8b45d79e9b4f5.json
[I2017-02-16T11:04:17.517512-08:00 8966 0 main.go:346] Terminating.                                {"returnCode":0}


Comment 2 by kbr@chromium.org, Feb 16 2017

Cc: bpastene@chromium.org
Interesting, Android Release (Nexus 5X), running on same devices, is green.
The error seems to be:
Device(003c66e07c6ed1a4) 02-16 21:09:03.074 15901 15901 E libEGL  : validate_display:255 error 3008 (EGL_BAD_DISPLAY)
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 E chromium: [ERROR:gl_surface_egl.cc(658)] ANGLE Platform initialization failed.
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno  : QUALCOMM build                   : 63c06b2, I8366cd0437
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno  : Build Date                       : 10/21/15
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno  : OpenGL ES Shader Compiler Version: XE031.05.13.02
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno  : Local Branch                     : 
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno  : Remote Branch                    : quic/LA.BF64.1.2.9_v2
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno  : Remote Branch                    : NONE
Device(003c66e07c6ed1a4) 02-16 21:09:03.075 15901 15901 I Adreno  : Reconstruct Branch               : NOTHING
Device(003c66e07c6ed1a4) 02-16 21:09:03.149 15887 15887 I Process : Sending signal. PID: 15901 SIG: 9
Device(003c66e07c6ed1a4) 02-16 21:09:03.188   972   983 I WindowState: WIN DEATH: Window{f2f839 u0 org.chromium.native_test/org.chromium.native_test.NativeUnitTestActivity}

Owner: jmad...@chromium.org
Jamie, looks like this is related to your change for platform init?
Cc: jmad...@chromium.org
Owner: ----
Sorry, a red herring. This seems to be the real failure:
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: [FATAL:task_scheduler_impl.cc(39)] Check failed: join_for_testing_returned_.IsSet(). 
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #00 0x0000007f974c020b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000013d20b
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #01 0x0000007f974ef197 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016c197
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #02 0x0000007f974ef1b3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016c1b3
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #03 0x0000007f974edec3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016aec3
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #04 0x0000007f974ee1ff /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000016b1ff
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #05 0x0000007f9743582b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x00000000000b282b
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #06 0x0000007f97986ab3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000603ab3
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #07 0x0000007f9798ac9f /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000607c9f
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #08 0x0000007f9798ae03 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000607e03
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #09 0x0000007f9798aebf /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000607ebf
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #10 0x0000007f9798d43f /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000060a43f
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #11 0x0000007f9798d64b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000060a64b
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #12 0x0000007f979a6ef3 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000623ef3
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #13 0x0000007f974a207b /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000011f07b
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #14 0x0000007f979a8303 /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x0000000000625303
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #15 0x0000007f979a847f /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x000000000062547f
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #16 0x0000007f974279eb /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x00000000000a49eb
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #17 0x0000007f978cc3cf /data/app/org.chromium.native_test-1/lib/arm64/lib_gl_unittests__library.so+0x00000000005493cf
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: #18 0x0000007f980f6c3b /data/app/org.chromium.native_test-1/oat/arm64/base.odex+0x0000000000525c3b
Device(003c66e07c6ed1a4) 02-16 21:08:45.035 15600 15600 F chromium: 
Device(003c66e07c6ed1a4) 02-16 21:08:45.036 15600 15600 E chromium: [ERROR:test_suite.cc(295)] Currently running: GLImageSharedMemory/GLImageTest/0.Create

That error should be benign, and should go away with https://codereview.chromium.org/2700713004/. Please review. I still don't see why my change would be related so not claiming ownership.
I have compared GPU.FYI waterfall bot with this one.
Both have the same failing stack, but looks like all the tests eventually pass in 2nd try.
I also have it fail in the same way locally, but am unable to symbolize with third_party/android_platform/development/scripts/stack.

The difference is, I think, in uploading logcats.
This is from GPU.FYI:

I   45.370s Main  FINISHED TRY #2/3
I   45.370s Main  All tests completed.
C   45.371s Main  ********************************************************************************
C   45.371s Main  Summary
C   45.371s Main  ********************************************************************************
C   45.372s Main  [==========] 41 tests ran.
C   45.372s Main  [  PASSED  ] 41 tests.
C   45.372s Main  ********************************************************************************
I   45.379s tear_down_device(004efe612fa2b6d2)  Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_004efe612fa2b6d2.json
I   45.380s tear_down_device(004309a44234a94d)  Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_004309a44234a94d.json
I   45.381s tear_down_device(008c30d196b52b59)  Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_008c30d196b52b59.json
I   45.382s tear_down_device(00a7d3d53f6c144e)  Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00a7d3d53f6c144e.json
I   45.382s tear_down_device(00bb487c8078c728)  Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00bb487c8078c728.json
I   45.382s tear_down_device(00ec0b662757c00a)  Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00ec0b662757c00a.json
I   45.383s tear_down_device(00e17bd389b310f9)  Wrote device cache: /b/swarm_slave/w/irqVOwWU/out/Release/device_cache_00e17bd389b310f9.json
2017/02/16 13:05:45 proto: duplicate proto type registered: google.protobuf.Duration
2017/02/16 13:05:45 proto: duplicate proto type registered: google.protobuf.Timestamp
2017/02/16 13:05:45 proto: duplicate proto type registered: google.protobuf.Empty
[I2017-02-16T13:05:46.739367-08:00 10998 0 main.go:332] Terminating.                                {"returnCode":0}
INFO:root:Logcats are located at: https://luci-logdog.appspot.com/v/?s=chromium%2Fandroid%2Fswarming%2Flogcats%2F34618efcc8c82811%2F%2B%2Funified_logcats
+---------------------------------------------------------------------------+
| End of shard 0  Pending: 8.7s  Duration: 60.6s  Bot: build290-m4  Exit: 0 |
+---------------------------------------------------------------------------+
Total duration: 60.6s
exit code (as seen by runtest.py): 0

This is from CQ:
I   45.427s Main  FINISHED TRY #2/3
I   45.427s Main  All tests completed.
C   45.428s Main  ********************************************************************************
C   45.428s Main  Summary
C   45.428s Main  ********************************************************************************
C   45.428s Main  [==========] 41 tests ran.
C   45.428s Main  [  PASSED  ] 41 tests.
C   45.428s Main  ********************************************************************************
I   45.432s tear_down_device(00a695a948c7bbb1)  Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00a695a948c7bbb1.json
I   45.433s tear_down_device(003c66e07c6ed1a4)  Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_003c66e07c6ed1a4.json
I   45.433s tear_down_device(004f23f589db3cec)  Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_004f23f589db3cec.json
I   45.433s tear_down_device(00b806fa8098a327)  Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00b806fa8098a327.json
I   45.433s tear_down_device(00b930c18f571e23)  Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00b930c18f571e23.json
I   45.433s tear_down_device(00d2a8de2797ce04)  Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00d2a8de2797ce04.json
I   45.434s tear_down_device(00bae14274cecaf8)  Wrote device cache: /b/swarm_slave/w/irTo1Af8/out/Release/device_cache_00bae14274cecaf8.json
[I2017-02-16T13:09:26.706428-08:00 2348 0 main.go:346] Terminating.                                {"returnCode":0}
+-----------------------------------------------------------------------------+
| End of shard 0  Pending: 9.8s  Duration: 60.0s  Bot: build261-m4  Exit: -15 |
+-----------------------------------------------------------------------------+
Total duration: 60.0s
@@@STEP_WARNINGS@@@
Test runtime exceeded allocated time
@@@STEP_LOG_LINE@Test runtime exceeded allocated time@Either it ran for too long (hard timeout) or it didn't produce I/O for an extended period of time (I/O timeout)@@@
@@@STEP_LINK@logdog-->Test runtime exceeded allocated time@https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.android%2Fandroid_n5x_swarming_rel%2F120844%2F%2B%2Frecipes%2Fsteps%2Fgl_unittests__with_patch__on_Android%2F0%2Flogs%2FTest_runtime_exceeded_allocated_time%2F0@@@
@@@STEP_LOG_END@Test runtime exceeded allocated time@@@
@@@STEP_WARNINGS@@@
some shards did not complete: 0
@@@STEP_LOG_LINE@some shards did not complete: 0@Missing results from the following shard(s): 0@@@
@@@STEP_LINK@logdog-->some shards did not complete: 0@https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.android%2Fandroid_n5x_swarming_rel%2F120844%2F%2B%2Frecipes%2Fsteps%2Fgl_unittests__with_patch__on_Android%2F0%2Flogs%2Fsome_shards_did_not_complete%3A_0%2F0@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@This can happen in following cases:@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@  * Test failed to start (missing *.dll/*.so dependency for example)@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@  * Test crashed or hung@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@  * Task expired because there are not enough bots available and are all used@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@  * Swarming service experienced problems@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@Please examine logs to figure out what happened.@@@
@@@STEP_LOG_LINE@some shards did not complete: 0@@@@
@@@STEP_LOG_END@some shards did not complete: 0@@@
exit code (as seen by runtest.py): 241
Owner: bpastene@chromium.org
Status: Assigned (was: Untriaged)
Weird how the main waterfall version of that bot seems unaffected:
https://build.chromium.org/p/chromium.android/builders/Android%20N5X%20Swarm%20Builder?numbuilds=200
Here's the tail of a log for a test that timed out:

I   45.292s Main  FINISHED TRY #2/3
I   45.293s Main  All tests completed.
C   45.293s Main  ********************************************************************************
C   45.293s Main  Summary
C   45.293s Main  ********************************************************************************
C   45.294s Main  [==========] 41 tests ran.
C   45.294s Main  [  PASSED  ] 41 tests.
C   45.294s Main  ********************************************************************************
I   45.300s tear_down_device(004f06d285a2964f)  Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_004f06d285a2964f.json
I   45.300s tear_down_device(00490e5a853aa657)  Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00490e5a853aa657.json
I   45.301s tear_down_device(00b66ea289732aec)  Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00b66ea289732aec.json
I   45.303s tear_down_device(00ba801480809f24)  Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00ba801480809f24.json
I   45.303s tear_down_device(00b931eb258ea2d5)  Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00b931eb258ea2d5.json
I   45.304s tear_down_device(00bd3d1a7446baf4)  Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00bd3d1a7446baf4.json
I   45.304s tear_down_device(00d4d4956535a083)  Wrote device cache: /b/swarm_slave/w/irij2tlG/out/Release/device_cache_00d4d4956535a083.json
[I2017-02-16T14:56:09.802642-08:00 13646 0 main.go:346] Terminating.                                {"returnCode":0}



And here's the tail for a log that didn't time out:
I   22.200s Main  FINISHED TRY #1/3
I   22.200s Main  All tests completed.
C   22.200s Main  ********************************************************************************
C   22.200s Main  Summary
C   22.200s Main  ********************************************************************************
C   22.200s Main  [==========] 41 tests ran.
C   22.200s Main  [  PASSED  ] 41 tests.
C   22.200s Main  ********************************************************************************
I   22.201s tear_down_device(008b0a75425cab55)  Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_008b0a75425cab55.json
I   22.202s tear_down_device(00cabca27e9739d5)  Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00cabca27e9739d5.json
I   22.202s tear_down_device(004ef5b07c8ecda8)  Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_004ef5b07c8ecda8.json
I   22.203s tear_down_device(00b7e7d08da636b9)  Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00b7e7d08da636b9.json
I   22.203s tear_down_device(00b92ff025b69ad1)  Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00b92ff025b69ad1.json
I   22.203s tear_down_device(00ca940725cebace)  Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00ca940725cebace.json
I   22.203s tear_down_device(00caa86a80b0d125)  Wrote device cache: /b/swarm_slave/w/irL2vky8/out/Release/device_cache_00caa86a80b0d125.json
[I2017-02-16T14:56:34.908273-08:00 31796 0 main.go:346] Terminating.                                {"returnCode":0}
INFO:root:Logcats are located at: https://luci-logdog.appspot.com/v/?s=chromium%2Fandroid%2Fswarming%2Flogcats%2F3461f4b010a4d711%2F%2B%2Funified_logcats


Looks like the butler exits, but the logdog_wrapper hangs around and doesn't complete the log uploads.
I'll poke around for a bit, but if I don't find anything obvious, I'll try committing https://codereview.chromium.org/2653313003/ which I have hanging around for the last time logdog_wrapper misbehaved.
#11: that CL still lgtm. I think users are primarily getting the full logcat through the isolateserver, not that either is easy to find.
This timeout has a bit more info: https://chromium-swarm.appspot.com/task?id=346210f8bd98a010

Thread "record_to_file" (ident: 140574187308800) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irenMqtE/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)


Definitely something with logcats. Not sure it's the wrapper though.
I'll go ahead with the CL in #11, but it might not fix all our problems.
hm, yeah, that's hanging before it has a chance to get to the wrapper.
Project Member

Comment 16 by bugdroid1@chromium.org, Feb 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e428ea9a94c51f8f16af35dac2d03ec97793a100

commit e428ea9a94c51f8f16af35dac2d03ec97793a100
Author: bpastene <bpastene@chromium.org>
Date: Fri Feb 17 02:20:32 2017

Temporarily disable logdog uploading of android logcats in swarming.

Many android swarming tasks are timing out due to a hanging logdog
butler process. This disables the mechanism that uses it.

BUG=684559,  693203 

Review-Url: https://codereview.chromium.org/2653313003
Cr-Commit-Position: refs/heads/master@{#451184}

[modify] https://crrev.com/e428ea9a94c51f8f16af35dac2d03ec97793a100/tools/mb/mb.py

Issue 693249 has been merged into this issue.

Comment 19 by kbr@chromium.org, Feb 17 2017

Status: Started (was: Assigned)
bpastene@: there still seems to be some infra-level problem. The test harness seems to have exited in under a minute, but exceptions were thrown in code underneath build/android/test_runner.py:

C   45.869s Main  Summary
C   45.869s Main  ********************************************************************************
C   45.869s Main  [==========] 41 tests ran.
C   45.869s Main  [  PASSED  ] 41 tests.
C   45.869s Main  ********************************************************************************
I   45.874s tear_down_device(004919f28552b857)  Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_004919f28552b857.json
I   45.876s tear_down_device(004ef33c858ad857)  Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_004ef33c858ad857.json
I   45.877s tear_down_device(00769bda427cd74f)  Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00769bda427cd74f.json
I   45.877s tear_down_device(00b924f3259e9ecf)  Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00b924f3259e9ecf.json
I   45.878s tear_down_device(00b9305f8f974228)  Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00b9305f8f974228.json
I   45.879s tear_down_device(00bd3eb774c6b4f4)  Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00bd3eb774c6b4f4.json
I   45.880s tear_down_device(00c6855bac6f208d)  Wrote device cache: /b/swarm_slave/w/irswr7Y7/out/Release/device_cache_00c6855bac6f208d.json
C   59.753s Main  Received SIGTERM. Shutting down.
Thread "record_to_file" (ident: 140348793812736) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)

Thread "record_to_file" (ident: 140349353723648) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)

Thread "record_to_file" (ident: 140349370509056) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)

Thread "record_to_file" (ident: 140348777027328) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)

Thread "record_to_file" (ident: 140349397792512) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)

Thread "record_to_file" (ident: 140349388347136) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)

Thread "record_to_file" (ident: 140349624678144) is currently running:
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 167, in record_to_file
    logcat_format='threadtime'):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 311, in IterCmdOutputLines
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/utils/cmd_helper.py", line 229, in _IterProcessStdout
    read_fds, _, _ = select.select([child_fd], [], [], poll_interval)

Thread "MainThread" (ident: 140349698787136) is currently running:
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 862, in <module>
    sys.exit(main())
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 850, in main
    return RunTestsCommand(args)
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 645, in RunTestsCommand
    return RunTestsInPlatformMode(args)
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 792, in RunTestsInPlatformMode
    str(iteration_count))
  File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/base/environment.py", line 33, in __exit__
    self.TearDown()
  File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/local/device/local_device_environment.py", line 225, in TearDown
    m.Stop()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 204, in Stop
    self._StopRecording()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 183, in _StopRecording
    self._record_thread.join(timeout=self._RECORD_THREAD_JOIN_WAIT)
  File "/usr/lib/python2.7/threading.py", line 960, in join
    self.__block.wait(delay)
  File "/usr/lib/python2.7/threading.py", line 358, in wait
    _sleep(delay)
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 691, in unexpected_sigterm
    sys._current_frames()[live_thread.ident]))

E   59.756s Main  Unrecognized error occurred.
Traceback (most recent call last):
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 850, in main
    return RunTestsCommand(args)
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 645, in RunTestsCommand
    return RunTestsInPlatformMode(args)
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 792, in RunTestsInPlatformMode
    str(iteration_count))
  File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/base/environment.py", line 33, in __exit__
    self.TearDown()
  File "/b/swarm_slave/w/irswr7Y7/build/android/pylib/local/device/local_device_environment.py", line 225, in TearDown
    m.Stop()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 204, in Stop
    self._StopRecording()
  File "/b/swarm_slave/w/irswr7Y7/third_party/catapult/devil/devil/android/logcat_monitor.py", line 183, in _StopRecording
    self._record_thread.join(timeout=self._RECORD_THREAD_JOIN_WAIT)
  File "/usr/lib/python2.7/threading.py", line 960, in join
    self.__block.wait(delay)
  File "/usr/lib/python2.7/threading.py", line 358, in wait
    _sleep(delay)
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 697, in unexpected_sigterm
    infra_error('\n'.join(msg))
  File "/b/swarm_slave/w/irswr7Y7/build/android/test_runner.py", line 677, in infra_error
    sys.exit(constants.INFRA_EXIT_CODE)
SystemExit: 87

Owner: jbudorick@chromium.org
Status: Assigned (was: Started)
So it seems the threads responsible for pulling logcats off of the devices are either stuck or taking too long. Since this only seems to be a problem for gl_unittests, is it possible that this test is now suddenly spewing a lot more logging to the device?

I can look into that, but the logic behind the actual monitor is outside my scope, so passing to jbudorick@
Status: Started (was: Assigned)
Yeah, the monitor needs to behave better in this case.
Re more logging - it may be because of the crashing tests.
I've opened  issue 693369  for this.
Ah, the dcheck gate would explain why this never showed up on the main waterfall. (trybots = main watefall bots + dcheck enabled)
Cc: dcasta...@chromium.org kbr@chromium.org
 Issue 693369  has been merged into this issue.
Fix pending review https://codereview.chromium.org/2708923003
LogcatMonitor fix on review: https://codereview.chromium.org/2707113002/
Project Member

Comment 27 by bugdroid1@chromium.org, Feb 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240

commit 8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240
Author: fdoray <fdoray@chromium.org>
Date: Tue Feb 21 21:53:17 2017

Use base::test::ScopedAsyncTaskScheduler in GLImageTest.

GLImageTest registers a TaskScheduler by calling
base::test::CreateAndSetSimpleTaskScheduler, but never unregisters
it. The second GLImageTest running in fails because there is
already a registered TaskScheduler.

This CL replaces the call to base::test::CreateAndSetSimpleTaskScheduler
with a base::test::ScopedAsyncTaskScheduler member which takes care of
registering a TaskScheduler at the beginning of the test and
unregistering it at the end of the test.

BUG= 693203 

Review-Url: https://codereview.chromium.org/2708923003
Cr-Commit-Position: refs/heads/master@{#451842}

[modify] https://crrev.com/8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240/ui/gl/BUILD.gn
[modify] https://crrev.com/8a9cc7e7dc99a62392c3a8fd02ac4c3a56415240/ui/gl/test/gl_image_test_template.h

Project Member

Comment 28 by bugdroid1@chromium.org, Feb 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3c96661cd7d066e3f4d6fe8b9760661ff76ea005

commit 3c96661cd7d066e3f4d6fe8b9760661ff76ea005
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Wed Feb 22 09:17:23 2017

Roll src/third_party/catapult/ d9036c9b7..e56eec861 (13 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/d9036c9b7e3d..e56eec86141a

$ git log d9036c9b7..e56eec861 --date=short --no-merges --format='%ad %ae %s'
2017-02-21 nednguyen Revert of [Telemetry] Migrate browser_test_runner to use typ as the test runner (patchset #1 id:1 of https://codereview.chromium.org/2701253008/ )
2017-02-21 jbudorick [devil] Add an iteration timeout option to cmd_helper.IterCmdStdoutLines.
2017-02-21 benjhayden Update style in value/ui/.
2017-02-21 benjhayden Dedent CSS in histogram_set_view.
2017-02-21 benjhayden Fix merged relationships in histogram-set-table.
2017-02-21 benjhayden Merge sample diagnostics.
2017-02-21 nednguyen Reland of [Telemetry] Migrate browser_test_runner to use typ as the test runner (patchset #1 id:1 of https://codereview.chromium.org/2700263002/ )
2017-02-21 benjhayden Update style in tab_view.html.
2017-02-21 benjhayden Allow toggling merged-telemetry-info-span.
2017-02-21 benjhayden Document createScalarSpan.
2017-02-21 charliea Update the update_v8 script and roll d8 on Windows to stable
2017-02-21 perezju [Devil] Improve error message for pull failure
2017-02-21 sebmarchand Use pgosweep.exe from the build directory if it's available

Created with:
  roll-dep src/third_party/catapult
BUG= 636153 , 693203 , 636153 , 693714 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2710903002
Cr-Commit-Position: refs/heads/master@{#451952}

[modify] https://crrev.com/3c96661cd7d066e3f4d6fe8b9760661ff76ea005/DEPS

Comment 30 by kbr@chromium.org, Feb 22 2017

Thanks John and Francois for getting to the bottom of this!!!

Sign in to add a comment