
Issue 684669


Issue metadata

Status: Archived
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 684701




most canaries are twice-purple with no indication of manual intervention

Project Member Reported by semenzato@chromium.org, Jan 24 2017

Issue description

Subject says it all.  For instance:

https://uberchromegw.corp.google.com/i/chromeos/builders/samus-release/builds/3960

Some were purple twice in a row, others only once.

There was no obvious announcement of build stoppages.  We would like to know what happened so we can exclude a malfunction.
 
Cc: d...@chromium.org
From Pratmesh:

Very weird.
Last night's failures should have been caused by https://bugs.chromium.org/p/chromium/issues/detail?id=683359
Sadly, I wasn't following it, so I didn't communicate anything.
But there is a problem with this explanation. The times don't quite match up. In particular, how was this slave still running till 11:00 PM? https://uberchromegw.corp.google.com/i/chromeos/builders/mccloud-release/builds/793

This morning's restart caught everything that hadn't finished by ~6:56 AM. I have no idea who restarted it or why.


According to the log (https://chrome-internal.googlesource.com/infradata/master-manager.git/) there was no restart this morning...


Comment 2 by d...@chromium.org, Jan 24 2017

Cc: no...@chromium.org
Heartbeat failed with error "" (reason "BUILD_IS_COMPLETED")

This is a BuildBucket error: https://cs.chromium.org/chromium/infra/appengine/cr-buildbucket/api.py?q=BUILD_IS_COMPLETED&sq=package:chromium&dr=C&l=32

Any thoughts on what could have done this, nodir@?

Comment 3 by no...@chromium.org, Jan 24 2017

This buildbot build is associated with buildbucket build 8989593815995856800, which was cancelled at 2017-01-24 14:55:04 UTC by 446450136466-mko2u1g65l7iqsos5c09tni364ejqg75@developer.gserviceaccount.com, the same service account that scheduled the build. This is a chromeos account: https://chrome-internal.googlesource.com/infra/puppet/+/master/puppetm/etc/puppet/hieradata/credentials/default.eyaml#429

I assume this is a chromeos service that manages builds?
The canary master realized that these builds were cancelled, but its logs do not indicate that the master itself cancelled them:

06:54:00: INFO: 2:56:49.771849 until timeout...
06:55:16: INFO: Running cidb query on pid 29837, repr(query) starts with <sqlalchemy.sql.expression.Select at 0x4c3ca10; Select object>
06:55:21: INFO: cidb query succeeded after 1 retries
06:55:21: INFO: Not retriable build veyron_speedy-release started already.
..
06:55:21: INFO: Build config veyron_speedy-release completed with status "CANCELED".


afaict, we currently do not cancel any release builds under any circumstances.

Comment 6 by d...@chromium.org, Jan 24 2017

I talked to Nodir, and per #3, BUILD_IS_COMPLETED is returned when the build has been cancelled. Don't master builders cancel all previous builds before scheduling new ones? If so, I would bet that someone manually or automatically started a new master builder, which cancelled all current builds and made them halt with that error. Does that seem plausible?
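To make the mechanism from #3 concrete, here is a toy Python model of a build whose heartbeat is rejected once someone else has cancelled it. This is a sketch under stated assumptions: `ToyBuild`, `Status`, `HeartbeatError`, `cancel()` and `heartbeat()` are hypothetical names and are not BuildBucket's actual API; only the reason string "BUILD_IS_COMPLETED" is taken from the error quoted in comment 2.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical toy states; in this model a cancelled build counts as COMPLETED.
    SCHEDULED = "SCHEDULED"
    STARTED = "STARTED"
    COMPLETED = "COMPLETED"


class HeartbeatError(Exception):
    """Hypothetical error carrying the server-side rejection reason."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


@dataclass
class ToyBuild:
    build_id: int
    status: Status = Status.STARTED
    result: str = ""

    def cancel(self) -> None:
        # Cancelling marks the build as completed with a CANCELED result.
        self.status = Status.COMPLETED
        self.result = "CANCELED"

    def heartbeat(self) -> None:
        # A running slave keeps sending heartbeats; once the build is
        # completed (e.g. cancelled by another master), the heartbeat is rejected.
        if self.status == Status.COMPLETED:
            raise HeartbeatError("BUILD_IS_COMPLETED")


if __name__ == "__main__":
    build = ToyBuild(build_id=1)
    build.heartbeat()  # accepted while the build is running
    build.cancel()     # e.g. a stray cleanup stage cancels it
    try:
        build.heartbeat()
    except HeartbeatError as e:
        print(e.reason)  # BUILD_IS_COMPLETED
```

From the slave's point of view, the cancellation only becomes visible on its next heartbeat, which matches the slave halting with that error rather than being stopped directly.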
Cc: nxia@chromium.org
+nxia: The in-house expert on when we murder builds (or give them the resurrection potion).

Comment 8 by nxia@chromium.org, Jan 24 2017

Owner: nxia@chromium.org

Comment 9 by nxia@chromium.org, Jan 24 2017

R57 was recently branched and is now running on the chromeos_release waterfall. It also runs a master-release build, but on a different waterfall and branch, and in its cleanup stage that master also tries to cancel the slave release builds on the chromeos waterfall.

There is a fix CL at https://chromium-review.googlesource.com/#/c/432004/.

I will send another CL that adds branch and waterfall tags to the builds, so that the cleanup stage only searches for builds with the right tags.
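A rough sketch of why the tags help, assuming (as a simplification, not chromite's actual query) that the cleanup stage picks builds to cancel by slave builder name alone. `ToySlaveBuild`, `builds_to_cancel_unscoped`, `builds_to_cancel_scoped`, the tag keys `branch` and `waterfall`, and the tag values used below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySlaveBuild:
    # Hypothetical record for a running slave build and its tags.
    build_id: int
    builder: str
    tags: Dict[str, str] = field(default_factory=dict)
    canceled: bool = False


def builds_to_cancel_unscoped(builds: List[ToySlaveBuild],
                              slave_names: List[str]) -> List[ToySlaveBuild]:
    # Buggy behavior (assumed): match on builder name only, so a master on
    # another branch/waterfall also selects slaves with the same builder names.
    return [b for b in builds if b.builder in slave_names and not b.canceled]


def builds_to_cancel_scoped(builds: List[ToySlaveBuild],
                            slave_names: List[str],
                            branch: str,
                            waterfall: str) -> List[ToySlaveBuild]:
    # Fixed behavior (assumed): also require the branch and waterfall tags
    # to match the master's own, so other waterfalls' slaves are left alone.
    return [
        b for b in builds
        if b.builder in slave_names
        and not b.canceled
        and b.tags.get("branch") == branch
        and b.tags.get("waterfall") == waterfall
    ]


if __name__ == "__main__":
    builds = [
        ToySlaveBuild(1, "samus-release", {"branch": "master", "waterfall": "chromeos"}),
        ToySlaveBuild(2, "samus-release", {"branch": "release-R57", "waterfall": "chromeos_release"}),
    ]
    slaves = ["samus-release"]
    unscoped = builds_to_cancel_unscoped(builds, slaves)
    scoped = builds_to_cancel_scoped(builds, slaves, "release-R57", "chromeos_release")
    print([b.build_id for b in unscoped])  # [1, 2]
    print([b.build_id for b in scoped])    # [2]
```

With the unscoped search, the release-branch master's cleanup would also cancel build 1 on the chromeos waterfall, which is the failure mode described above; the tag filter restricts it to its own slaves.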

Comment 10 by nxia@chromium.org, Jan 25 2017

Blockedon: 684701
Status: Started (was: Untriaged)
I don't see these particular aborts last night. Let's call it fixed.

Comment 12 by nxia@chromium.org, Jan 26 2017

Status: Fixed (was: Started)

Comment 13 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 14 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 16 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
