
Issue 645106

Issue metadata

Status: WontFix
Owner:
Closed: Oct 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug

Tests timing out without output on Mac perf bots

Reported by rnep...@chromium.org, Sep 8 2016

Issue description

command timed out: 2400 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2728.743504

https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf%20(1)

https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf%20%284%29?numbuilds=200

CCing charliea@ since it happens after the BattOr tests.
CCing dtu@ because it seems like it might be recipe-related.

It looks like a test fails, the next test never starts, and the step then times out without output.
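The log above is buildbot's no-output timeout: the step was killed because it printed nothing for 2400 seconds, not because it hit a total time limit (it had run about 2729 seconds in all). A minimal Python sketch of such a watchdog follows; `run_with_output_timeout` is a hypothetical helper for illustration, not buildbot's implementation.

```python
import queue
import subprocess
import sys
import threading


def run_with_output_timeout(cmd, silence_timeout_s):
    """Run cmd; kill it if it prints nothing for silence_timeout_s seconds.

    Returns (returncode, timed_out, lines). Hypothetical helper, not
    buildbot's implementation.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    lines_q = queue.Queue()

    def pump():
        # Read output on a background thread; None signals end of stream.
        for line in proc.stdout:
            lines_q.put(line)
        lines_q.put(None)

    threading.Thread(target=pump, daemon=True).start()

    lines, timed_out = [], False
    while True:
        try:
            # The silence window restarts every time a line arrives.
            line = lines_q.get(timeout=silence_timeout_s)
        except queue.Empty:
            timed_out = True
            proc.kill()  # SIGKILL (signal 9) on POSIX
            break
        if line is None:
            break
        lines.append(line)
    return proc.wait(), timed_out, lines


if __name__ == "__main__":
    # A child that sleeps silently is killed after 1 second of no output.
    rc, timed_out, _ = run_with_output_timeout(
        [sys.executable, "-c", "import time; time.sleep(60)"], 1.0)
    print("returncode=%s timed_out=%s" % (rc, timed_out))
```

Because the window restarts on every line, only a step that goes completely silent, such as one stuck waiting on a child process, gets killed.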

Comment 2 by dtu@chromium.org, Sep 8 2016

Owner: rnep...@chromium.org
Status: Assigned (was: Untriaged)
I think this kind of failure implies that Telemetry is leaving processes lying around. Maybe in this error scenario the battor_agent_binary is not being stopped.

Passing to rnephew. I think it's in this error-handling code?
https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py#L75
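The theory here is that a failure path leaves a child process (the battor_agent_binary shell) running. A common guard is to stop the child before re-raising whenever an exchange with it fails. The sketch below assumes a `subprocess.Popen` handle and a caller-supplied `send` function; `send_or_stop`, `stop_process`, and `HelperShellError` are hypothetical names, not catapult's API.

```python
import subprocess
import sys


class HelperShellError(Exception):
    """Hypothetical error for a failed exchange with a helper shell."""


def stop_process(proc, timeout_s=5.0):
    """Kill proc if it is still running, then reap it so it can't linger."""
    if proc.poll() is None:
        proc.kill()
    proc.wait(timeout=timeout_s)


def send_or_stop(proc, send, command):
    """Call send(command); if it raises, stop proc before re-raising.

    Without the cleanup, a failed exchange can leave the helper process
    running after the test has given up on it.
    """
    try:
        return send(command)
    except Exception:
        stop_process(proc)
        raise


if __name__ == "__main__":
    shell = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

    def failing_send(command):
        raise HelperShellError("RECEIVE ERROR while sending %r" % command)

    try:
        send_or_stop(shell, failing_send, "RecordClockSyncMarker 1234")
    except HelperShellError as e:
        print("error:", e, "| shell still running:", shell.poll() is None)
```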

Comment 3 by bugdroid1@chromium.org, Sep 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c4b1e75d9334ca63343d5e44768c246d796c8386

commit c4b1e75d9334ca63343d5e44768c246d796c8386
Author: nednguyen <nednguyen@google.com>
Date: Sat Sep 10 12:00:05 2016

Manually roll src/third_party/catapult/ da6d44e4a..c0b988891 (3 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/da6d44e4a2a5..c0b988891fd1

$ git log da6d44e4a..c0b988891 --date=short --no-merges --format='%ad %ae %s'
2016-09-09 erikchen Add a new cpu time TBMv2 system health metric.
2016-09-09 nednguyen Update the lock operation in cloud_storage with better implementation using py_utils.lock
2016-09-09 rnephew [BattOr] Kill BattOr shell if there is a problem with communicating during clock sync.

BUG=640312,637904,645106,645720

TBR=catapult-sheriff@chromium.org, jbudorick@chromium.org

Review-Url: https://codereview.chromium.org/2326063004
Cr-Commit-Position: refs/heads/master@{#417833}

[modify] https://crrev.com/c4b1e75d9334ca63343d5e44768c246d796c8386/DEPS
[modify] https://crrev.com/c4b1e75d9334ca63343d5e44768c246d796c8386/build/android/test_runner.pydeps

Comment 4 by bugdroid1@chromium.org, Sep 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8bf8e86ce08385909ae3e149c5b290a8a9602607

commit 8bf8e86ce08385909ae3e149c5b290a8a9602607
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Mon Sep 12 17:33:20 2016

Roll src/third_party/catapult/ 32f19b159..9d32403a4 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/32f19b159dd6..9d32403a467f

$ git log 32f19b159..9d32403a4 --date=short --no-merges --format='%ad %ae %s'
2016-09-12 rnephew [BattOr] Fix error in catching BattOr exception.

BUG=645106

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2335543003
Cr-Commit-Position: refs/heads/master@{#417965}

[modify] https://crrev.com/8bf8e86ce08385909ae3e149c5b290a8a9602607/DEPS

Comment 5 by bugdroid1@chromium.org, Sep 13 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a80bccf5aa929e36d526180dc5b51abe03b9d2d4

commit a80bccf5aa929e36d526180dc5b51abe03b9d2d4
Author: nednguyen <nednguyen@google.com>
Date: Tue Sep 13 03:06:26 2016

Manually roll src/third_party/catapult/ aea37326a..900438075 (5 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/aea37326ac67..900438075780

$ git log aea37326a..900438075 --date=short --no-merges --format='%ad %ae %s'
2016-09-12 benjhayden Stop computing summary statistic ScalarNumerics in ValueSet.
2016-09-12 nednguyen Make cloud_storage_global_lock a python module & import it in py_utils.cloud_storage
2016-09-12 benjhayden Prevent re-entering value-set-table.updateContents_.
2016-09-12 rnephew [BattOr] Add logging to battor subprocess polling.
2016-09-12 washingtonp Enable some profile_chrome unit tests on Trybots, specifically the ones that will currently run without error.

BUG=645720,645106

TBR=catapult-sheriff@chromium.org
NOTRY=true (net_unittests is flaky: crbug.com/646215)

Review-Url: https://codereview.chromium.org/2331333002
Cr-Commit-Position: refs/heads/master@{#418154}

[modify] https://crrev.com/a80bccf5aa929e36d526180dc5b51abe03b9d2d4/DEPS
[modify] https://crrev.com/a80bccf5aa929e36d526180dc5b51abe03b9d2d4/build/android/test_runner.pydeps

Cc: nednguyen@chromium.org
I'm not convinced that this is what's going on (that the BattOr shell is left running and that is why the test isn't ending). Does anyone else have any ideas about what could be causing the Telemetry test to hang?
Here is a screenshot of the end of the test:
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/profiler-file-id_0-2016-09-13_12-21-4489077.png

It's just a blank Mac screen.

This seems serious. Can you ping me again next week?

Comment 8 by zh...@chromium.org, Oct 3 2016

Ping.

Is this fixed?
It looks like there is a BattOr error:


INFO:root:*********** END OF BROWSER STANDARD OUTPUT ************
INFO:root:********************* BROWSER LOG *********************
INFO:root:No log file
INFO:root:***************** END OF BROWSER LOG ******************
Traceback (most recent call last):
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 88, in _RunStoryAndProcessErrorIfNeeded
    test.Measure(state.platform, results)
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 287, in Measure
    trace_result = platform.tracing_controller.StopTracing()
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 47, in StopTracing
    return self._tracing_controller_backend.StopTracing()
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 108, in StopTracing
    self._IssueClockSyncMarker()
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 203, in _IssueClockSyncMarker
    self._RecordIssuerClockSyncMarker)
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py", line 103, in RecordClockSyncMarker
    self._battor.RecordClockSyncMarker(sync_id)
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/common/battor/battor/battor_wrapper.py", line 201, in RecordClockSyncMarker
    self._SendBattorCommand('%s %s' % (self._RECORD_CLOCKSYNC_CMD, sync_id))
  File "/b/c/b/Mac_Retina_Perf__1_/src/third_party/catapult/common/battor/battor/battor_wrapper.py", line 270, in _SendBattorCommand
    'Outputted: %s' % (cmd, status))
BattorError: BattOr did not complete command 'RecordClockSyncMarker 4843713c-49cc-42d3-9dfd-2f408ae4c156' correctly.
Outputted: [1003/080255:FATAL:battor_agent_bin.cc(88)] Fatal error when communicating with the BattOr: RECEIVE ERROR
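The traceback shows `_SendBattorCommand` raising `BattorError` when the shell's reply does not confirm the command, with the shell's output attached. A minimal sketch of that pattern, assuming a line-oriented shell that answers each command with one status line and a hypothetical success token `Done.` (the real BattOr shell protocol is not shown in this bug); `send_command` and `CommandError` are illustrative names only:

```python
import io


class CommandError(Exception):
    """Hypothetical stand-in for an error like BattorError."""


def send_command(stdin, stdout, cmd, success_reply="Done."):
    """Write cmd to a line-oriented shell and check its one-line reply.

    success_reply is a hypothetical acknowledgement; the real BattOr shell
    protocol is not shown in this bug.
    """
    stdin.write(cmd + "\n")
    stdin.flush()
    status = stdout.readline().rstrip("\n")
    if status != success_reply:
        # Include whatever the shell printed so the failure is diagnosable.
        raise CommandError(
            "Did not complete command %r correctly.\nOutputted: %s" % (cmd, status))
    return status


if __name__ == "__main__":
    fake_in = io.StringIO()
    fake_out = io.StringIO("FATAL: RECEIVE ERROR\n")
    try:
        send_command(fake_in, fake_out, "RecordClockSyncMarker 1234")
    except CommandError as e:
        print(e)
```

`readline()` blocks indefinitely if the shell never answers, so a production version would also need a read timeout; otherwise a silent shell turns into a step that hangs without output.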

Status: WontFix (was: Assigned)
That's a separate issue, tracked in crbug.com/652306.

The timeouts without output no longer appear to be happening, so we can close this bug as WontFix.
