Tests timing out without output on Mac perf bots
Issue description
command timed out: 2400 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2728.743504

https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf%20(1)
https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf%20%284%29?numbuilds=200

ccing charliea@ since it happens after the BattOr tests. ccing dtu@ because it seems like it might be recipe-related.
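The bot log above shows the harness killing the step after 2400 seconds with no output, and the process being reported as killed by signal 9 (SIGKILL). Below is a minimal Python sketch of how a "no output" watchdog of this kind can work. It is not buildbot's actual implementation. `run_with_output_watchdog` and its parameters are hypothetical names chosen for illustration; the bots used a 2400-second limit.

```python
import queue
import subprocess
import sys
import threading


def run_with_output_watchdog(cmd, no_output_timeout):
    """Run cmd and kill it if it prints nothing for no_output_timeout seconds.

    Hypothetical helper. Returns (returncode, lines, timed_out). On POSIX,
    Popen.kill() sends SIGKILL, so a killed child reports returncode -9.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    output = queue.Queue()

    def reader():
        # Forward each line of child output; None marks end of stream.
        for line in proc.stdout:
            output.put(line)
        output.put(None)

    threading.Thread(target=reader, daemon=True).start()

    lines, timed_out = [], False
    while True:
        try:
            # Each line received resets the "without output" clock.
            line = output.get(timeout=no_output_timeout)
        except queue.Empty:
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
        lines.append(line.rstrip("\n"))
    proc.wait()
    return proc.returncode, lines, timed_out


if __name__ == "__main__":
    # A child that prints once and then goes silent, like the hung test.
    hung = [sys.executable, "-c",
            "import time; print('started', flush=True); time.sleep(30)"]
    print(run_with_output_watchdog(hung, no_output_timeout=2.0))
```

The deadline restarts after every line of output, not from process start. That is why a run can reach an elapsed time well beyond the no-output limit, as the `elapsedTime=2728.743504` above shows.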
Sep 8 2016
I think this kind of thing implies that Telemetry is leaving processes lying around. Maybe in this error scenario the battor_agent_binary is not being stopped. Passing to rnephew; I suspect the problem is in this error-handling code: https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py#L75
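Here is a minimal sketch of the cleanup pattern suggested here. It assumes a wrapper that drives a long-lived helper process over stdin/stdout. If a command is not acknowledged, the wrapper kills the helper before re-raising, so nothing is left running once the test fails. `HelperProcess`, `HelperCommandError`, `ECHO_HELPER`, and the echo-based protocol are hypothetical stand-ins, not the real `battor_wrapper` API. The error message only mimics the format seen in the BattorError traceback later in this thread.

```python
import subprocess
import sys


class HelperCommandError(Exception):
    """Raised when the (hypothetical) helper does not acknowledge a command."""


class HelperProcess:
    """Hypothetical wrapper around a long-lived helper binary.

    The helper is expected to echo each command back. Any other reply is a
    failure, and the helper is killed before the error propagates so that it
    cannot outlive the test run.
    """

    def __init__(self, cmd):
        self._proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                      stdout=subprocess.PIPE, text=True)

    @property
    def is_running(self):
        return self._proc.poll() is None

    def send_command(self, command):
        try:
            self._proc.stdin.write(command + "\n")
            self._proc.stdin.flush()
            reply = self._proc.stdout.readline().rstrip("\n")
            if reply != command:
                raise HelperCommandError(
                    "Helper did not complete command %r correctly. "
                    "Outputted: %s" % (command, reply))
            return reply
        except Exception:
            # Error path: stop the helper, then re-raise the original error.
            self.kill()
            raise

    def kill(self):
        if self.is_running:
            self._proc.kill()
        self._proc.wait()


# Hypothetical stand-in helper: echoes commands, fails any starting with "Fail".
ECHO_HELPER = """
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    cmd = line.strip()
    print('FATAL: RECEIVE ERROR' if cmd.startswith('Fail') else cmd, flush=True)
"""

if __name__ == "__main__":
    helper = HelperProcess([sys.executable, "-c", ECHO_HELPER])
    print(helper.send_command("StartTracing"))
    try:
        helper.send_command("FailClockSync")
    except HelperCommandError as e:
        print(e)
    print("helper still running:", helper.is_running)
```

Killing on error does not cover the case where `readline()` blocks forever on a silent helper. That path would also need a read timeout.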
Sep 10 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c4b1e75d9334ca63343d5e44768c246d796c8386

commit c4b1e75d9334ca63343d5e44768c246d796c8386
Author: nednguyen <nednguyen@google.com>
Date: Sat Sep 10 12:00:05 2016

Manually roll src/third_party/catapult/ da6d44e4a..c0b988891 (3 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/da6d44e4a2a5..c0b988891fd1

$ git log da6d44e4a..c0b988891 --date=short --no-merges --format='%ad %ae %s'
2016-09-09 erikchen Add a new cpu time TBMv2 system health metric.
2016-09-09 nednguyen Update the lock operation in cloud_storage with better implementation using py_utils.lock
2016-09-09 rnephew [BattOr] Kill BattOr shell if there is a problem with communicating during clock sync.

BUG=640312,637904,645106,645720
TBR=catapult-sheriff@chromium.org, jbudorick@chromium.org

Review-Url: https://codereview.chromium.org/2326063004
Cr-Commit-Position: refs/heads/master@{#417833}
[modify] https://crrev.com/c4b1e75d9334ca63343d5e44768c246d796c8386/DEPS
[modify] https://crrev.com/c4b1e75d9334ca63343d5e44768c246d796c8386/build/android/test_runner.pydeps
Sep 12 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/8bf8e86ce08385909ae3e149c5b290a8a9602607

commit 8bf8e86ce08385909ae3e149c5b290a8a9602607
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Mon Sep 12 17:33:20 2016

Roll src/third_party/catapult/ 32f19b159..9d32403a4 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/32f19b159dd6..9d32403a467f

$ git log 32f19b159..9d32403a4 --date=short --no-merges --format='%ad %ae %s'
2016-09-12 rnephew [BattOr] Fix error in catching BattOr exception.

BUG=645106
TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2335543003
Cr-Commit-Position: refs/heads/master@{#417965}
[modify] https://crrev.com/8bf8e86ce08385909ae3e149c5b290a8a9602607/DEPS
Sep 13 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a80bccf5aa929e36d526180dc5b51abe03b9d2d4

commit a80bccf5aa929e36d526180dc5b51abe03b9d2d4
Author: nednguyen <nednguyen@google.com>
Date: Tue Sep 13 03:06:26 2016

Manually roll src/third_party/catapult/ aea37326a..900438075 (5 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/aea37326ac67..900438075780

$ git log aea37326a..900438075 --date=short --no-merges --format='%ad %ae %s'
2016-09-12 benjhayden Stop computing summary statistic ScalarNumerics in ValueSet.
2016-09-12 nednguyen Make cloud_storage_global_lock a python module & import it in py_utils.cloud_storage
2016-09-12 benjhayden Prevent re-entering value-set-table.updateContents_.
2016-09-12 rnephew [BattOr] Add logging to battor subprocess polling.
2016-09-12 washingtonp Enable some profile_chrome unit tests on Trybots, specifically the ones that will currently run without error.

BUG=645720,645106
TBR=catapult-sheriff@chromium.org
NOTRY=true (net_unittests is flaky: crbug.com/646215)

Review-Url: https://codereview.chromium.org/2331333002
Cr-Commit-Position: refs/heads/master@{#418154}
[modify] https://crrev.com/a80bccf5aa929e36d526180dc5b51abe03b9d2d4/DEPS
[modify] https://crrev.com/a80bccf5aa929e36d526180dc5b51abe03b9d2d4/build/android/test_runner.pydeps
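One commit in this roll adds logging to BattOr subprocess polling. The change itself isn't shown here, so the following is only a hypothetical sketch of the general idea. The loop polls a child with `Popen.poll()`, logs each result, and gives up at a deadline, so a hang leaves a trail in the logs instead of silence. `wait_with_logging` and its parameters are illustrative names, not Telemetry code.

```python
import logging
import subprocess
import sys
import time

LOG = logging.getLogger(__name__)


def wait_with_logging(proc, timeout, poll_interval=0.1):
    """Poll proc until it exits or timeout seconds pass, logging each poll.

    Hypothetical helper. Returns the exit code, or None if the process is
    still running at the deadline; the caller decides whether to kill it.
    """
    deadline = time.monotonic() + timeout
    while True:
        returncode = proc.poll()
        LOG.debug("Polled pid %d: returncode=%r", proc.pid, returncode)
        if returncode is not None:
            return returncode
        if time.monotonic() >= deadline:
            LOG.warning("pid %d still running after %.1f s", proc.pid, timeout)
            return None
        time.sleep(poll_interval)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    sleeper = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    if wait_with_logging(sleeper, timeout=0.5) is None:
        # Don't leave the child lying around once we stop waiting for it.
        sleeper.kill()
        sleeper.wait()
```

Returning `None` rather than killing inside the loop keeps the policy decision with the caller. It also means the "still running" warning appears in the log before any cleanup happens.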
Sep 13 2016
I'm not convinced that this is what's going on (that the BattOr shell is left running and that's why the test isn't ending). Does anyone else have ideas about what could be causing the Telemetry test to hang? Here is a screenshot of the end of the test: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/profiler-file-id_0-2016-09-13_12-21-4489077.png It's just a blank Mac screen.
Sep 16 2016
This seems serious. Can you ping me again next week?
Oct 3 2016
Ping. Is this fixed?
Oct 3 2016
Looks like there is a BattOr error:
INFO:root:*********** END OF BROWSER STANDARD OUTPUT ************
INFO:root:********************* BROWSER LOG *********************
INFO:root:No log file
INFO:root:***************** END OF BROWSER LOG ******************
Traceback (most recent call last):
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 88, in _RunStoryAndProcessErrorIfNeeded
    test.Measure(state.platform, results)
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 287, in Measure
    trace_result = platform.tracing_controller.StopTracing()
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 47, in StopTracing
    return self._tracing_controller_backend.StopTracing()
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 108, in StopTracing
    self._IssueClockSyncMarker()
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 203, in _IssueClockSyncMarker
    self._RecordIssuerClockSyncMarker)
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py", line 103, in RecordClockSyncMarker
    self._battor.RecordClockSyncMarker(sync_id)
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/common/battor/battor/battor_wrapper.py", line 201, in RecordClockSyncMarker
    self._SendBattorCommand('%s %s' % (self._RECORD_CLOCKSYNC_CMD, sync_id))
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/common/battor/battor/battor_wrapper.py", line 270, in _SendBattorCommand
    'Outputted: %s' % (cmd, status))
BattorError: BattOr did not complete command 'RecordClockSyncMarker 4843713c-49cc-42d3-9dfd-2f408ae4c156' correctly.
Outputted: [1003/080255:FATAL:battor_agent_bin.cc(88)] Fatal error when communicating with the BattOr: RECEIVE ERROR
Oct 3 2016
That's a separate issue, tracked in crbug.com/652306. The timing out without output no longer appears to be happening, so we can close this bug as WontFix.
Comment 1 by rnep...@chromium.org, Sep 8 2016