New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 684632 link

Starred by 1 user

Issue metadata

Status: Duplicate
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

beaglebone paladin timedout because no output to logdog for 9000 seconds

Project Member Reported by pprabhu@chromium.org, Jan 24 2017

Issue description

Has only happened once, so low priority.

https://uberchromegw.corp.google.com/i/chromeos/builders/beaglebone-paladin/builds/12256/steps/steps/logs/stdio

Timed out after 2+ hours.
Towards the end, we see:

@@@STEP_CLOSED@@@
2017/01/24 07:25:50 proto: duplicate proto type registered: google.protobuf.Duration
2017/01/24 07:25:50 proto: duplicate proto type registered: google.protobuf.Timestamp
2017/01/24 07:25:50 proto: duplicate proto type registered: google.protobuf.Empty

command timed out: 9000 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=9009.788312

-------------------------------------
Note that the build started around 7:25, and the timeout in the end is around 9:30.
So, the builder logged the duplicate proto warning and then went silent for ~2 hours, at which point it was killed.
 
btw, the new retry-early-death-of-cq-slave worked beautifully and recovered the CQ run for us. Kudos!
Cc: d...@chromium.org nxia@chromium.org
Labels: -current-issue
Status: Unconfirmed (was: Available)
Labels: -Pri-3 Pri-1
Owner: no...@chromium.org
Status: Assigned (was: Unconfirmed)
OK, we do need to look at this.

+ chrome-infra trooper.

nodir@: Do you have any more visibility into why logdog's log stream dried up?

Comment 8 by no...@chromium.org, Jan 25 2017

Owner: d...@chromium.org

Comment 9 by no...@chromium.org, Jan 25 2017

FWIW the proto warnings are irrelevant

Comment 10 by d...@chromium.org, Jan 25 2017

Owner: pprabhu@chromium.org
LogDog / Annotee are just proxies for an underlying command. In this case, it's the recipe engine's checkout. If that checkout fails, or hangs, or doesn't produce any logs, LogDog's not going show any output either.

From a successful build:

2017/01/24 09:56:58 proto: duplicate proto type registered: google.protobuf.Duration
2017/01/24 09:56:58 proto: duplicate proto type registered: google.protobuf.Timestamp
2017/01/24 09:56:58 proto: duplicate proto type registered: google.protobuf.Empty
INFO:root:Running ['git', 'rev-parse', '--verify', '836fb28be142e5385a8715e4c61e23e8ee505ccd^{commit}']

This isn't a LogDog line. In fact, LogDog tooling showed exactly what it always showed. I'm not sure why you think this is a LogDog-related timeout, since the LogDog bootstrap step finished and the LogDog-revlevant part of the output seems consistent.

At a glance, my guess is that Git hung when the recipe engine was performing its recipe checkout. Assigning back to reporter in someone pprabhu@ wants to take a deeper look; I don't think there's anything else for me to weigh in on here.

Comment 11 by d...@chromium.org, Jan 25 2017

Mergedinto: 684559
Status: Duplicate (was: Assigned)

Sign in to add a comment