telemetry_perf_unittests keeps running even after receiving SIGTERM (android_n5x_swarming)
Issue description

Swarming log: https://chromium-swarm.appspot.com/task?id=33a5c1fc4d9fbd10&refresh=10&show_raw=1

It seems like there are 2 things here:

1) First, a bunch of tests failed due to a cloud storage lock timeout:

Traceback (most recent call last):
  RunBenchmark at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:362
    benchmark.ShouldTearDownStateAfterEachStorySetRun())
  Run at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:209
    stories):
  _UpdateAndCheckArchives at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:401
    wpr_archive_info.DownloadArchivesIfNeeded()
  DownloadArchivesIfNeeded at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/wpr/archive_info.py:84
    cloud_storage.GetIfChanged(archive_path, self._bucket)
  GetIfChanged at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_utils/py_utils/cloud_storage.py:348
    with _FileLock(file_path):
  __enter__ at /usr/lib/python2.7/contextlib.py:17
    return self.gen.next()
  _FileLock at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_utils/py_utils/cloud_storage.py:246
    PSEUDO_LOCK_ACQUISITION_TIMEOUT)
  WaitFor at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_utils/py_utils/__init__.py:132
    (timeout, GetConditionString()))
TimeoutException: Timed out while waiting 10s for py_utils.WaitFor(lambda: not os.path.exists(pseudo_lock_path), PSEUDO_LOCK_ACQUISITION_TIMEOUT).
2) When Telemetry retries the failed tests, it fails to discover any Android devices at all:

Traceback (most recent call last):
  RunBenchmark at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:362
    None
  Run at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:273
    None
  traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    None
  TearDownState at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:311
    None
  traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    None
  _StopBrowser at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:317
    None
  Close at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/browser/browser.py:265
    None
  traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    None
  Close at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py:225
    None
  traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    None
  _StopBrowser at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py:73
    None
  StopApplication at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/platform/android_platform_backend.py:348
    None
  timeout_retry_wrapper at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:57
    None
  Run at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/timeout_retry.py:159
    None
  JoinAll at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/reraiser_thread.py:186
    None
  _JoinAll at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/reraiser_thread.py:158
    None
  run at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/reraiser_thread.py:81
    None
  <lambda> at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/timeout_retry.py:152
    None
  impl at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:47
    None
  ForceStop at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:1148
    None
  timeout_retry_wrapper at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:51
    None
  impl at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:47
    None
  RunShellCommand at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:923
    None
  handle_large_output at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:898
    None
  handle_large_command at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:880
    None
  handle_check_return at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:871
    None
  run at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:867
    None
  Shell at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:480
    None
  _RunDeviceAdbCmd at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:282
    None
  timeout_retry_wrapper at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:51
    None
  impl at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:47
    None
  _RunAdbCmd at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:252
    None
NoAdbError: [Errno 2] No such file or directory
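The first failure is a poll-with-timeout giving up: per the traceback, `_FileLock` waits up to `PSEUDO_LOCK_ACQUISITION_TIMEOUT` (10 s in this log) for a pseudo-lock file to disappear before raising. A minimal sketch of that pattern, assuming a simple fixed polling interval; `wait_for`, `WaitTimeout`, and `wait_for_pseudo_lock_release` are hypothetical names, not the actual py_utils API.

```python
import os
import time


class WaitTimeout(Exception):
    """Raised when a polled condition stays falsy past its deadline."""


def wait_for(condition, timeout, poll_interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical stand-in for the py_utils.WaitFor call in the traceback;
    the real helper's polling schedule and error message may differ.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out while waiting %ss for %r" % (timeout, condition))
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, max(0.0, deadline - time.monotonic())))


def wait_for_pseudo_lock_release(pseudo_lock_path, timeout=10):
    """Block until no pseudo-lock file exists at `pseudo_lock_path` (hypothetical)."""
    return wait_for(lambda: not os.path.exists(pseudo_lock_path), timeout)
```

If the process that created the pseudo-lock never removes it, every waiter hits the same timeout, regardless of how long the lock is actually needed.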
Jan 11 2017
re #2: The adb binary itself is being removed because swarming is timing out the task. That is WAI. Devil is indicating that as clearly as it can by raising a NoAdbError. That is also WAI. Does telemetry handle receiving a SIGTERM?
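On the SIGTERM question: by default CPython installs no SIGTERM handler, so the signal terminates the process outright without running `finally` blocks or teardown code. A minimal sketch of one common alternative, assuming it is acceptable to raise from the main thread: turn SIGTERM into an exception derived from `BaseException`, so teardown still runs and a generic `except Exception:` retry loop does not swallow it and retry. `TerminationRequested`, `install_sigterm_handler`, and `run_with_retries` are hypothetical names, not Telemetry or typ APIs.

```python
import signal


class TerminationRequested(BaseException):
    """Raised in the main thread when the process receives SIGTERM.

    Derives from BaseException (like KeyboardInterrupt), so a broad
    `except Exception:` clause does not catch it.
    """

    def __init__(self, signum):
        super().__init__(signum)
        self.signum = signum


def _raise_termination(signum, frame):
    raise TerminationRequested(signum)


def install_sigterm_handler():
    """Turn SIGTERM into TerminationRequested; return the previous handler.

    signal.signal() must be called from the main thread.
    """
    return signal.signal(signal.SIGTERM, _raise_termination)


def run_with_retries(story, attempts=3):
    """Hypothetical retry loop: retries ordinary failures but not SIGTERM."""
    last_error = None
    for _ in range(attempts):
        try:
            return story()
        except Exception as exc:  # TerminationRequested passes straight through
            last_error = exc
    raise RuntimeError("story failed %d times" % attempts) from last_error
```

Python only runs signal handlers in the main thread, so a worker thread blocked in a long call will not see this exception directly; the main thread has to notice it and stop the workers.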
Jan 11 2017
We depend on typ for handling SIGTERM. I've noticed that I have to press Ctrl+C multiple times to stop a Telemetry test running locally. +Dirk: do you recall how typ handles SIGTERM?
Jan 11 2017
typ is supposed to handle SIGTERM cleanly and correctly, but it's certainly possible that there are bugs here, since doing that in multiprocess Python setups is tricky. I'm happy to look at issues if we have them.
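Part of why multiprocess setups are tricky: a SIGTERM sent to the parent's PID alone does not reach its worker processes, so a runner that fans out to workers has to stop them explicitly. A minimal sketch using the standard `multiprocessing` module, assuming a terminate-then-kill policy with a grace period; it is not how typ actually does this, and `shutdown_workers` and `forward_sigterm_to` are hypothetical names.

```python
import signal
import sys
import time


def shutdown_workers(procs, grace_period=5.0):
    """SIGTERM every live worker, wait up to `grace_period`, then SIGKILL stragglers.

    `procs` are started multiprocessing.Process objects; returns their exit codes.
    """
    for p in procs:
        if p.is_alive():
            p.terminate()  # SIGTERM on POSIX
    deadline = time.monotonic() + grace_period
    for p in procs:
        p.join(max(0.0, deadline - time.monotonic()))
        if p.is_alive():
            p.kill()  # SIGKILL; Process.kill() needs Python 3.7+
            p.join()
    return [p.exitcode for p in procs]


def forward_sigterm_to(procs, grace_period=5.0):
    """Install a parent SIGTERM handler that stops `procs`, then exits.

    Call this after the workers are started: with the "fork" start method,
    children inherit the parent's Python-level signal handlers.
    Returns the previous handler so callers can restore it.
    """
    def handler(signum, frame):
        shutdown_workers(procs, grace_period)
        sys.exit(128 + signum)  # conventional "killed by signal" exit status
    return signal.signal(signal.SIGTERM, handler)
```

In use, start the workers first, then call `forward_sigterm_to(workers)` from the main thread. The grace period bounds how long a stuck worker can delay shutdown before it is killed.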
Jan 11 2017
Dirk, in this bug the Swarming infra sent SIGTERM to the Telemetry test, but it kept running and caused the weird adb failure.
Jan 12 2017
Un-CCing myself from this bug. Feel free to re-CC me if necessary.
Jan 15 2017
Jan 15 2017
Feb 10 2017
Feb 25 2017
I think this isn't causing too many failures, so maybe it's okay for it to be a P2 (i.e., I have other P1 things I need to prioritize ahead of this). LMK if you disagree.
Sep 2 2017
I have no idea whether this is still an issue. I'm not planning to work on this any time soon unless it's urgent, so I'm marking this as available and removing myself as the owner.
Sep 4 2017
Comment 1 by nedngu...@google.com, Jan 11 2017