R55 builders "Pending build requests"
Issue description
Release builders for banjo and nyan_kitty each have 15+ "Pending build requests". This appears to prevent builds from being created, which in turn prevents tests from being scheduled, so bvt-cq tests have not run since Sept 06/07.
https://chromegw.corp.google.com/i/chromeos/builders/banjo-release
https://chromegw.corp.google.com/i/chromeos/builders/nyan_kitty-release
https://wmatrix.googleplex.com/unfiltered?releases=tot&suites=bvt-cq&days_back=14
Without these tests, upcoming releases for these boards may be blocked.
Sep 12 2016
These machines are both currently up and running, but buildbot thinks they are offline. I'm fairly sure they would recover if I just rebooted them, but I'd like to diagnose the cause first (this has happened before, and I'd prefer it didn't keep happening). There's no need to reboot until the 6 PM builds start, so I'm holding off to make diagnosis easier. Please pass this back to me before 6 PM.
Sep 12 2016
I did clean up all pending builds. For background, these builders are expected to build 3 times a day at 8-hour intervals, and they normally reboot after each build completes.
Sep 12 2016
If we expect them to reboot 3x a day, perhaps a less elegant solution would be a watchdog: if uptime goes over 24 hours, trigger a reboot?
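One way the uptime watchdog suggested above could look, as a minimal sketch. It assumes a Linux builder where /proc/uptime is readable; the names `MAX_UPTIME_SECONDS`, `read_uptime_seconds`, `should_reboot`, and `watchdog_tick` are hypothetical and not part of buildbot or any existing Chrome OS tooling. The 24-hour threshold is the one proposed in the comment: since these builders normally reboot after each of their three daily builds, exceeding it means several expected reboots were missed.

```python
from typing import Callable

# Hypothetical limit taken from the watchdog proposal: 24 hours of uptime.
MAX_UPTIME_SECONDS = 24 * 60 * 60


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Return system uptime in seconds from the first field of /proc/uptime (Linux)."""
    with open(path) as f:
        return float(f.read().split()[0])


def should_reboot(uptime_seconds: float, limit: float = MAX_UPTIME_SECONDS) -> bool:
    """True only when uptime is strictly over the limit."""
    return uptime_seconds > limit


def watchdog_tick(uptime_seconds: float, reboot: Callable[[], None],
                  limit: float = MAX_UPTIME_SECONDS) -> bool:
    """Call reboot() if the machine has been up too long; report whether it did."""
    if should_reboot(uptime_seconds, limit):
        reboot()
        return True
    return False
```

The reboot action is passed in as a callable so the decision logic can be checked without rebooting anything. On a real builder it might be run from cron every few minutes with something like `watchdog_tick(read_uptime_seconds(), lambda: subprocess.run(["sudo", "reboot"], check=True))`, assuming the bot account is allowed to run `reboot` via sudo without a password.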
Sep 12 2016
Not a horrible idea, but let's understand the current problem first.
Sep 12 2016
Finally getting to look at this.
Sep 12 2016
The last thing I see in the logs is:
2016-09-07 10:02:11-0700 [Broker,client] slave shutting down on command from master
2016-09-07 10:02:11-0700 [Broker,client] lost remote
2016-09-07 10:02:11-0700 [Broker,client] Lost connection to master2b.golo.chromium.org:31600
2016-09-07 10:02:11-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x7fa2b106b4d0>
2016-09-07 10:02:11-0700 [-] Main loop terminated.
2016-09-07 10:02:11-0700 [-] Server Shut Down.
So... Um. ¯\_(ツ)_/¯
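To check other builders for the same symptom, a small scanner over the slave's log could flag entries like the ones quoted above. This is a hedged sketch: it assumes one entry per line in the `YYYY-MM-DD HH:MM:SS±hhmm [context] message` layout shown in the excerpt, and the names `LINE_RE`, `shutdown_events`, and `ended_with_shutdown` are hypothetical. It only reads text, so it can show that the master sent a shutdown command, not why.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Matches entries like:
# 2016-09-07 10:02:11-0700 [Broker,client] slave shutting down on command from master
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[([^\]]*)\] (.*)$"
)


def shutdown_events(lines: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (timestamp, message) for every master-initiated shutdown entry."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "shutting down on command from master" in m.group(3):
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S%z")
            events.append((ts, m.group(3)))
    return events


def ended_with_shutdown(lines: Iterable[str]) -> bool:
    """True if the last parseable entry is the 'Server Shut Down.' message,
    i.e. nothing was logged after the slave stopped."""
    last: Optional[str] = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            last = m.group(3)
    return last == "Server Shut Down."
```

A log where `ended_with_shutdown` is true and the newest `shutdown_events` timestamp is older than the builder's 8-hour build interval matches the situation described here: the slave was told to stop and never came back.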
Sep 12 2016
I'm respawning these two.
Sep 12 2016
They're up again. Keep an eye on them in case they go offline and don't come back after their next build (though I also have a pin bump + restart CL that I'll be landing at 5 PM PST today, so they'll only be up for a little while).
Sep 12 2016
Sep 12 2016
When would the master send that command?
Sep 12 2016
If the bot is set to auto-reboot, the master could send that command. I'm not sure why the machine wouldn't then reboot itself afterwards. I'm not aware of any other conditions that would cause the master to send it, but I could imagine a race in the master logic if the master is heavily loaded and the bot reboots quickly.
Oct 7 2016
Oct 10 2016
Nov 19 2016
Jan 21 2017
Mar 4 2017
Apr 17 2017
May 30 2017
Aug 1 2017
Oct 14 2017
Comment 1 by bhthompson@google.com, Sep 12 2016