Project: chromium
Status: WontFix
Owner: ----
Closed: Sep 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocking:
issue 637904



telemetry_perf_unittests keeps running even after receiving SIGTERM (android_n5x_swarming)
Project Member Reported by nedngu...@google.com, Jan 11 2017
Swarming log:
https://chromium-swarm.appspot.com/task?id=33a5c1fc4d9fbd10&refresh=10&show_raw=1

Seems like there are 2 things here:
1) First, a bunch of tests failed due to a cloud storage lock timeout:

Traceback (most recent call last):
RunBenchmark at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:362
benchmark.ShouldTearDownStateAfterEachStorySetRun())
Run at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:209
stories):
_UpdateAndCheckArchives at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:401
wpr_archive_info.DownloadArchivesIfNeeded()
DownloadArchivesIfNeeded at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/wpr/archive_info.py:84
cloud_storage.GetIfChanged(archive_path, self._bucket)
GetIfChanged at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_utils/py_utils/cloud_storage.py:348
with _FileLock(file_path):
__enter__ at /usr/lib/python2.7/contextlib.py:17
return self.gen.next()
_FileLock at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_utils/py_utils/cloud_storage.py:246
PSEUDO_LOCK_ACQUISITION_TIMEOUT)
WaitFor at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_utils/py_utils/__init__.py:132
(timeout, GetConditionString()))
TimeoutException: Timed out while waiting 10s for py_utils.WaitFor(lambda: not os.path.exists(pseudo_lock_path),
PSEUDO_LOCK_ACQUISITION_TIMEOUT).
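The first traceback ends in a polling wait: `_FileLock` waits up to `PSEUDO_LOCK_ACQUISITION_TIMEOUT` (10s in this run) for the pseudo-lock file to disappear, and raises `TimeoutException` otherwise. A minimal sketch of that polling pattern, assuming simplified local stand-ins (`wait_for`, `wait_for_pseudo_lock_release` and `TimeoutException` here are hypothetical, not the py_utils API):

```python
import os
import tempfile
import time


class TimeoutException(Exception):
  """Hypothetical stand-in for the TimeoutException in the log above."""


def wait_for(condition, timeout, poll_interval=0.1):
  """Polls condition() until it is truthy or `timeout` seconds pass.

  Simplified stand-in for a WaitFor-style helper, not the py_utils code.
  """
  deadline = time.time() + timeout
  while True:
    result = condition()
    if result:
      return result
    if time.time() >= deadline:
      raise TimeoutException(
          'Timed out while waiting %ss for %r.' % (timeout, condition))
    time.sleep(poll_interval)


def wait_for_pseudo_lock_release(pseudo_lock_path, timeout=10):
  # The lock counts as held for as long as the pseudo-lock file exists.
  return wait_for(lambda: not os.path.exists(pseudo_lock_path), timeout)


# Demo: a pseudo-lock file that nobody removes makes every waiter time out.
lock_dir = tempfile.mkdtemp()
stale_lock = os.path.join(lock_dir, 'archive.wpr.pseudo_lock')  # hypothetical name
open(stale_lock, 'w').close()
try:
  wait_for_pseudo_lock_release(stale_lock, timeout=0.3)
except TimeoutException as e:
  print('still locked: %s' % e)
```

The design trade-off is simple: a file-existence check works across processes without OS lock primitives, but a lock file that is never removed blocks every later waiter until its timeout.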

2) When Telemetry retries the failed tests, it fails to discover any Android devices at all:
Traceback (most recent call last):
RunBenchmark at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:362
None
Run at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/story_runner.py:273
None
traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
None
TearDownState at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:311
None
traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
None
_StopBrowser at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:317
None
Close at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/browser/browser.py:265
None
traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
None
Close at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py:225
None
traced_function at /b/swarm_slave/w/irFieVan/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
None
_StopBrowser at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py:73
None
StopApplication at /b/swarm_slave/w/irFieVan/third_party/catapult/telemetry/telemetry/internal/platform/android_platform_backend.py:348
None
timeout_retry_wrapper at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:57
None
Run at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/timeout_retry.py:159
None
JoinAll at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/reraiser_thread.py:186
None
_JoinAll at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/reraiser_thread.py:158
None
run at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/reraiser_thread.py:81
None
<lambda> at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/utils/timeout_retry.py:152
None
impl at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:47
None
ForceStop at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:1148
None
timeout_retry_wrapper at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:51
None
impl at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:47
None
RunShellCommand at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:923
None
handle_large_output at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:898
None
handle_large_command at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:880
None
handle_check_return at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:871
None
run at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/device_utils.py:867
None
Shell at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:480
None
_RunDeviceAdbCmd at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:282
None
timeout_retry_wrapper at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:51
None
impl at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/decorators.py:47
None
_RunAdbCmd at /b/swarm_slave/w/irFieVan/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:252
None
NoAdbError: [Errno 2] No such file or directory
 
Blocking: 637904
re #2: The adb binary itself is being removed because swarming is timing out the task. That is working as intended (WAI). Devil is indicating that as clearly as it can by raising a NoAdbError. That is also WAI.
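The `NoAdbError: [Errno 2] No such file or directory` at the bottom of the second traceback is what launching a binary that no longer exists looks like (errno `ENOENT`). Devil's real code in `adb_wrapper.py` is not reproduced here; this is only a sketch, with a hypothetical `run_adb_cmd` helper and a local `NoAdbError`, of translating that OS error into a dedicated exception:

```python
import errno
import subprocess


class NoAdbError(Exception):
  """Hypothetical stand-in for devil's NoAdbError."""


def run_adb_cmd(adb_path, args):
  """Runs adb and returns its output (hypothetical helper, sketch only).

  A missing adb binary is reported as NoAdbError; other errors propagate.
  """
  try:
    return subprocess.check_output([adb_path] + list(args))
  except OSError as e:
    if e.errno == errno.ENOENT:
      # The binary is gone, e.g. because the task's files were cleaned up.
      raise NoAdbError(str(e))
    raise
```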

Does Telemetry handle receiving a SIGTERM?
Cc: dpranke@chromium.org
We depend on typ for handling SIGTERM. I notice that I have to press Ctrl+C multiple times to stop a Telemetry test running locally.

+Dirk: do you recall how typ handles SIGTERM?
typ is supposed to handle SIGTERM cleanly and correctly, but it's certainly possible that there are bugs here, since doing that in multiprocess Python setups is tricky.

I'm happy to look at issues if we have them.
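One common way to handle SIGTERM in a multiprocess Python runner, sketched here under assumptions (this is not typ's or Telemetry's implementation; `TerminatedError` and `run_children` are hypothetical names, and it is POSIX-only): the parent turns SIGTERM into an ordinary exception so that `try/finally` cleanup runs, and that cleanup forwards SIGTERM to any child processes still running.

```python
import signal
import subprocess
import sys


class TerminatedError(Exception):
  """Hypothetical exception used to unwind the parent on SIGTERM."""


def _raise_on_sigterm(signum, frame):
  # Turn the asynchronous signal into an ordinary exception so that
  # try/finally cleanup runs in the parent.
  raise TerminatedError('received signal %d' % signum)


def run_children(num_children, child_seconds):
  """Starts children that sleep, waits for them, and cleans up on SIGTERM.

  Sketch only; must be called from the main thread.
  """
  previous = signal.signal(signal.SIGTERM, _raise_on_sigterm)
  children = []
  try:
    for _ in range(num_children):
      children.append(subprocess.Popen(
          [sys.executable, '-c', 'import time; time.sleep(%f)' % child_seconds]))
    for child in children:
      child.wait()
    return [child.returncode for child in children]
  finally:
    for child in children:
      if child.poll() is None:
        child.terminate()  # forward SIGTERM to children still running
    for child in children:
      child.wait()
    if previous is not None:
      signal.signal(signal.SIGTERM, previous)
```

For example, `run_children(2, 30)` followed by `kill -TERM <parent pid>` from another shell should raise `TerminatedError` in the parent only after both children have been terminated and reaped. Without a handler like this, a parent that dies on SIGTERM can leave its children running.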
Owner: dpranke@chromium.org
Status: Assigned
Summary: telemetry_perf_unittests keeps running even after receiving SIGTERM (android_n5x_swarming) (was: telemetry_perf_unittests timed out flakily on android_n5x_swarming due to adb connection failure)
Dirk, in this bug the swarming infra sent SIGTERM to the Telemetry test, but it kept running and caused the weird adb failure.
Cc: -charliea@chromium.org
Un-CCing myself from this bug. Feel free to re-CC me if necessary.
Labels: OS-All
Labels: -Pri-2 Pri-1
Labels: Build-Tools-TYP
Components: Build
Labels: -Pri-1 Pri-2
I think this isn't causing too many failures, so maybe it's okay for it to be a P2 (i.e., I have other P1 things I need to prioritize ahead of this). LMK if you disagree.
Cc: bpastene@chromium.org
Owner: ----
Status: Available
I have no idea whether this is still an issue. I'm not planning to work on this any time soon unless it's urgent, so I'm marking this as Available and removing myself as the owner.
Status: WontFix