
Issue 646121 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Sep 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



R55 builders "Pending build requests"

Project Member Reported by tienchang@chromium.org, Sep 12 2016

Issue description

Release builders for banjo and nyan_kitty each have 15+ "Pending build requests". It looks like this prevents builds from being created, which prevents tests from being scheduled, which has kept bvt-cq tests from running since Sept 6th/7th.

https://chromegw.corp.google.com/i/chromeos/builders/banjo-release

https://chromegw.corp.google.com/i/chromeos/builders/nyan_kitty-release

https://wmatrix.googleplex.com/unfiltered?releases=tot&suites=bvt-cq&days_back=14

This may block upcoming releases for these boards without these tests.
 
Cc: dgarr...@chromium.org
+dgarrett

We may need to kick:

https://chromegw.corp.google.com/i/chromeos/buildslaves/cros-beefy14-c2
https://chromegw.corp.google.com/i/chromeos/buildslaves/cros-beefy5-c2

If we get them back online we probably should cancel out the long queue of builds (we only really need the latest one at this point). 
Owner: iannucci@chromium.org
These machines are both currently up and running, but buildbot thinks they are offline. I'm fairly sure they would recover if I just rebooted them.

I'd like to get the cause of the problem diagnosed (it's happened before, and I'd prefer it didn't keep happening). There is no need to reboot until the 6 PM builds start, so I'm holding off on the reboot to make diagnosis easier.

Please pass back to me before 6 PM.
I did clean up all pending builds. For background, these builders are expected to build 3 times a day at 8-hour intervals, and they normally reboot after each build completes.
If we expect them to reboot 3x a day, perhaps a less elegant solution would be a watchdog: if uptime goes over 24 hours, trigger a reboot?
Not a horrible idea, but let's understand the current problem first.
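
For illustration, the watchdog idea could be sketched roughly as below. This is a minimal sketch, not anything that exists on the bots: it assumes a Linux host exposing /proc/uptime, the function names are hypothetical, and the reboot itself is left to the caller (for example a cron job) rather than performed here.

```python
# Hypothetical uptime watchdog sketch. Assumes Linux (/proc/uptime).
# It only decides whether a reboot is due; it does not reboot anything.

MAX_UPTIME_SECONDS = 24 * 60 * 60  # the 24-hour threshold suggested above


def read_uptime_seconds(path="/proc/uptime"):
    """Return system uptime in seconds from a /proc/uptime style file."""
    with open(path) as f:
        # The first field of /proc/uptime is uptime in seconds.
        return float(f.read().split()[0])


def should_reboot(uptime_seconds, max_uptime=MAX_UPTIME_SECONDS):
    """Return True when uptime exceeds the allowed maximum."""
    return uptime_seconds > max_uptime
```

A periodic job could call should_reboot(read_uptime_seconds()) and trigger whatever reboot mechanism the bots already use. Since these builders normally reboot after each build, a healthy bot should never get near the threshold.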
Status: Started (was: Assigned)
Finally getting to look at this.
The last thing I see in the logs is

2016-09-07 10:02:11-0700 [Broker,client] slave shutting down on command from master
2016-09-07 10:02:11-0700 [Broker,client] lost remote
2016-09-07 10:02:11-0700 [Broker,client] Lost connection to master2b.golo.chromium.org:31600
2016-09-07 10:02:11-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x7fa2b106b4d0>
2016-09-07 10:02:11-0700 [-] Main loop terminated.
2016-09-07 10:02:11-0700 [-] Server Shut Down.

So... Um. ¯\_(ツ)_/¯
I'm respawning these two
They're up again. Keep an eye on them in case they go offline and don't come back after their next build (though I also have a pin bump+restart CL that I'll be doing at 5pm PST today too, so they'll only be on for a little bit).
Status: Fixed (was: Started)
When would the master send that command?
If the bot is set to autoreboot, it could send that command. I'm not sure why the machine wouldn't then reboot itself afterwards. I'm not aware of any other conditions that could cause it to send that command, but I could imagine a race in the master logic if the master is heavily loaded and the bot reboots quickly.
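
To help catch this the next time it happens, a small script could scan the slave log for the shutdown message and report when it was seen. This is only a sketch: the line format is inferred from the log excerpt quoted earlier in this issue, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Line format inferred from the excerpt in this issue, e.g.
# "2016-09-07 10:02:11-0700 [Broker,client] slave shutting down on command from master"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})([+-]\d{4}) \[([^\]]*)\] (.*)$"
)

SHUTDOWN_MARKER = "shutting down on command from master"


def parse_line(line):
    """Return (timestamp, source, message), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if not m:
        return None
    ts = datetime.strptime(m.group(1) + m.group(2), "%Y-%m-%d %H:%M:%S%z")
    return ts, m.group(3), m.group(4)


def find_master_shutdowns(lines):
    """Return timestamps of every master-commanded shutdown in the log lines."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and SHUTDOWN_MARKER in parsed[2]:
            hits.append(parsed[0])
    return hits
```

Comparing the last such timestamp against the machine's boot time would show whether the bot shut down on command but never actually rebooted, which is the case seen here.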
Labels: VerifyIn-55

Comment 14 by dchan@chromium.org, Oct 10 2016

Labels: -VerifyIn-55

Comment 15 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 16 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 17 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 18 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 19 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 21 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)