Issue 750137

Issue metadata

Status: WontFix
Owner: ----
Closed: Jul 30
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug



system_health.common_Desktop timing out on mac air 10.11

Project Member Reported by rnep...@chromium.org, Jul 28 2017

Issue description

Cc: martiniss@chromium.org
This confuses me: it says the task is timing out, but looking at the output, it looks like the test completes?

https://chromium-swarm.appspot.com/task?id=37a0ecce3021b310&refresh=10&show_raw=1

martiniss@, you are probably more familiar with swarming output than I am. Do you have any idea where I could look to see what's going on?
Cc: mar...@chromium.org
According to the swarming data, that task took "1h 9m 10s". The IO timeout for it was 1h, so that's why swarming gave up.

According to the task logs, the task itself took about 9 minutes... 
The first log line in the task is
(WARNING) 2017-07-28 02:51:08,525 desktop_browser_finder.FindAllAvailableBrowsers:171  Chrome build location for mac_x86_64 not found. Browser will be run without Flash.


The last log line with a timestamp is
(INFO) 2017-07-28 03:00:14,050 cloud_storage.Insert:377  Uploading /b/s/w/it46CDtp/tmp1l6XaB.png to gs://chrome-telemetry-output/profiler-file-id_3-2017-07-28_03-00-147786.png

So, that's about 9 minutes, if I'm reading the timestamps correctly. Not sure what's happening here then....
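As a quick sanity check on the timestamp reading, the arithmetic can be sketched in Python. The two timestamps and the "1h 9m 10s" figure are copied from the comments above; the function and variable names are made up for illustration, and treating the difference as idle time is only an assumption, since the swarming duration may also include setup and teardown overhead.

```python
from datetime import datetime, timedelta

# Timestamps copied from the first and last timestamped log lines quoted above.
FIRST_LOG_LINE = "2017-07-28 02:51:08,525"
LAST_LOG_LINE = "2017-07-28 03:00:14,050"

# The part after the comma is milliseconds; %f accepts it as a
# fractional-second field.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_between(first, last):
    """Return the time between two log timestamps as a timedelta."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(last, LOG_TIME_FORMAT)
    return end - start


task_elapsed = elapsed_between(FIRST_LOG_LINE, LAST_LOG_LINE)

# Duration reported in the swarming data: "1h 9m 10s".
swarming_total = timedelta(hours=1, minutes=9, seconds=10)

print("logged activity:", task_elapsed)
print("not covered by timestamped logs:", swarming_total - task_elapsed)
```

The logged activity comes out at a little over 9 minutes, and the remainder is roughly an hour, which would be consistent with the task going silent after its last log line until the 1h IO timeout fired.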

maruel@ is it possible swarming is hanging after the task finishes?
Summary: system_health.common_Desktop timing out on mac air 10.11 (was: systme_health.common_Desktop timing out on mac air 10.11)

Comment 4 by mar...@chromium.org, Jul 28 2017

Zombie process?
Status: Available (was: Untriaged)
How does swarming decide the task is finished? Does it care about processes spawned by the task it starts at the beginning of the task, or does it just wait for the task it starts to finish?
Never mind, the task is hung. Compare https://chromium-swarm.appspot.com/task?id=37a3a4dc1a632c10&refresh=10&show_raw=1 (currently running) with https://chromium-swarm.appspot.com/task?id=379ff08e11853610&refresh=10&show_raw=1 (exited). The exited one has some lines at the bottom that are missing from the currently running one.
Cc: rnep...@chromium.org charliea@chromium.org
I SSHed into the bot and ran `kill -2` (SIGINT) on the process for https://chromium-swarm.appspot.com/task?id=37a3a4dc1a632c10&refresh=10&show_raw=1 while the task was hung, which raised a keyboard interrupt. This is what was in the log:

Exception KeyboardInterrupt in <module 'threading' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.pyc'> ignored
(CRITICAL) 2017-07-28 14:51:08,289 battor_wrapper.KillBattOrShell:191  BattOr shell was not properly closed. Killing now.
(WARNING) 2017-07-28 14:51:08,295 ps_util._ListAllSubprocesses:86  psutil.AccessDenied (pid=405, name='battor_agent')
(WARNING) 2017-07-28 14:51:08,295 ps_util._ListAllSubprocesses:89  Telemetry leaks these processes: battor_agent (405)
Running ['/usr/bin/python', '../../tools/perf/run_benchmark', 'battor.steady_state', '-v', '--upload-results', '--output-format=chartjson', '--browser=reference', '--output-trace-tag=_ref', '--output-dir', '/b/s/w/itDlc7im/tmpUvOu6htelemetry', '--output-format=json'] in None (env: {'VERSIONER_PYTHON_PREFER_32_BIT': 'no', 'LOGNAME': 'chrome-bot', 'USER': 'chrome-bot', 'PATH': '/opt/local/bin:/opt/local/sbin:/usr/local/sbin:/usr/local/git/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'BOTO_CONFIG': '/Users/chrome-bot/.boto', 'HOME': '/Users/chrome-bot', 'SWARMING_BOT_ID': 'build127-b1', 'LANG': 'en_US.UTF-8', 'Apple_PubSub_Socket_Render': '/private/tmp/com.apple.launchd.D3Ob2W0dnw/Render', 'SWARMING_SERVER': 'https://chromium-swarm.appspot.com', 'VERSIONER_PYTHON_VERSION': '2.7', 'CHROME_DEVEL_SANDBOX': '/opt/chromium/chrome_sandbox', 'XPC_FLAGS': '0x0', 'SWARMING_HEADLESS': '1', 'LUCI_CONTEXT': '/b/s/w/luci_ctx.4pZhxx.json', 'XPC_SERVICE_NAME': 'org.swarm.bot.plist', 'SSH_AUTH_SOCK': '/private/tmp/com.apple.launchd.r5FFeYhO4X/Listeners', 'SWARMING_TASK_ID': '37a3a4dc1a632c11', 'SHELL': '/bin/bash', 'NO_GCE_CHECK': 'False', 'TMPDIR': '/b/s/w/itDlc7im', 'GIT_USER_AGENT': 'git/2.7.4 darwin build127-b1.labs.chromium.org', '__CF_USER_TEXT_ENCODING': '0x1F4:0x0:0x0'})
Command ['/usr/bin/python', '../../tools/perf/run_benchmark', 'battor.steady_state', '-v', '--upload-results', '--output-format=chartjson', '--browser=reference', '--output-trace-tag=_ref', '--output-dir', '/b/s/w/itDlc7im/tmpUvOu6htelemetry', '--output-format=json'] returned exit code 255

Doesn't look very illuminating. I was looking for a traceback to figure out where we're hung. My only guess is that some battor shell stuff is broken, only because there are battor issues in the bot log itself. cc-ing battor people.
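For what it's worth, one general way to get a traceback out of a hung Python process is to install a signal handler ahead of time that dumps every thread's stack. The sketch below is not something Telemetry or swarming is known to do; the function names and the choice of SIGUSR1 are assumptions for illustration. It only uses standard-library calls (`sys._current_frames`, `traceback.format_stack`, `signal.signal`) that exist in both Python 2.7 and Python 3.

```python
import signal
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a string with the current Python stack of every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Thread %s (%s):\n" % (thread_id, name))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def install_stack_dump_handler(signum):
    """On `signum`, write every thread's stack to stderr and keep running."""
    def handler(received_signum, frame):
        sys.stderr.write(format_all_thread_stacks())
        sys.stderr.flush()
    signal.signal(signum, handler)


if __name__ == "__main__":
    # Hypothetical choice of signal; SIGUSR1 does not exist on Windows.
    if hasattr(signal, "SIGUSR1"):
        install_stack_dump_handler(signal.SIGUSR1)
    print(format_all_thread_stacks())
```

With something like this installed at startup, `kill -USR1 <pid>` on the bot would print the stacks instead of killing the run. One caveat: Python only runs signal handlers in the main thread between bytecode instructions, so if the main thread is stuck inside a blocking C call, the dump may not appear until that call returns.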
The task does reach this line (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?q=internal/story_runn&sq=package:chromium&l=373). So it gets there, and then returns, and from my understanding of the code, there shouldn't be anything to stop it from exiting, except for one thing. That is https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/results/page_test_results.py?sq=package:chromium&l=295, the cleanup function. I'm not exactly sure what this all does; a cursory inspection made it look like it's trying to delete files.
Also, the bot rebooted before the task mentioned in #7 happened. So doesn't look like a reboot will solve this :/
Comment 10 by sheriffbot@chromium.org, Jul 30

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Recharge-Cold
Status: WontFix (was: Untriaged)
Reopen if relevant.
