Please, restart slave33-c1
Issue description
slave33-c1 has been offline for ~1 hour, which is causing three bots to stall:
http://build.chromium.org/p/chromium.fyi/builders/CFI%20Linux%20CF
http://build.chromium.org/p/chromium.fyi/builders/LTO%20Linux
https://build.chromium.org/p/chromium.fyi/builders/LTO%20Linux%20Perf
Please restart the slave.
Note: this is a recurrence of https://crbug.com/573350 and https://crbug.com/577059; you might want to consider better monitoring for this kind of issue.
,
May 17 2016
I poked around on the bot (and brought it back online as a side effect), trying to figure out what's wrong with it. It seems to have been shut down cleanly by a command from the master (probably because it is marked as 'auto_reboot' in the master config), but it wasn't able to come back online. In the process list I saw a stuck "gclient sync" process. I think we need to impose a timeout on the initial gclient sync in https://chromium.googlesource.com/infra/infra/+/master/infra/tools/bot_setup/start/chrome.py#99 (or figure out why it is getting stuck...)
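For illustration only, here is a minimal sketch of what a timeout around the initial sync could look like. It is not the actual code in chrome.py; the helper name `run_with_timeout` and the 30-minute value in the usage comment are assumptions made for this example.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it exceeded timeout_s.

    subprocess.run() kills the direct child when the timeout expires and
    then raises TimeoutExpired. Processes spawned by the child are not
    killed by this sketch.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Hypothetical usage (the timeout value is an assumption):
#   if run_with_timeout(["gclient", "sync"], 30 * 60) is None:
#       ...  # log the hang, then retry or give up instead of blocking forever
```

A real fix would probably also need to terminate the whole process group, since a stuck sync may be waiting on a child of its own.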
,
May 20 2016
Reopening issue, as slave33-c1 is offline again. Please, restart it.
,
May 20 2016
Looking... The machine itself is online, so it must be the slave process that's dead.
,
May 20 2016
The slave process stopped logging after 2016-05-19 18:38:57-0700. I restarted it, and it connected to the master. I'm not yet sure what happened this time.
,
May 20 2016
Latest slave logs before the restart:
2016-05-19 18:36:35-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7f2af48b3b90>
2016-05-19 18:36:35-0700 [-] Connecting to master1.golo.chromium.org:8111
2016-05-19 18:36:35-0700 [Broker,client] Lost connection to master1.golo.chromium.org:8111
2016-05-19 18:36:35-0700 [Broker,client] <twisted.internet.tcp.Connector instance at 0x7f2af4bbf4d0> will retry in 7 seconds
2016-05-19 18:36:35-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x7f2af48b3b90>
2016-05-19 18:36:43-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7f2af48b3b90>
2016-05-19 18:36:43-0700 [-] Connecting to master1.golo.chromium.org:8111
2016-05-19 18:36:43-0700 [Broker,client] Lost connection to master1.golo.chromium.org:8111
2016-05-19 18:36:43-0700 [Broker,client] <twisted.internet.tcp.Connector instance at 0x7f2af4bbf4d0> will retry in 23 seconds
2016-05-19 18:36:43-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x7f2af48b3b90>
2016-05-19 18:37:07-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7f2af48b3b90>
2016-05-19 18:37:07-0700 [-] Connecting to master1.golo.chromium.org:8111
2016-05-19 18:37:37-0700 [-] Connection to master1.golo.chromium.org:8111 failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
]
2016-05-19 18:37:37-0700 [-] <twisted.internet.tcp.Connector instance at 0x7f2af4bbf4d0> will retry in 56 seconds
2016-05-19 18:37:37-0700 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x7f2af48b3b90>
2016-05-19 18:38:34-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7f2af48b3b90>
2016-05-19 18:38:34-0700 [-] Connecting to master1.golo.chromium.org:8111
2016-05-19 18:38:57-0700 [Broker,client] message from master: attached
2016-05-19 18:38:57-0700 [Broker,client] removing old builder LTO Linux Perf
2016-05-19 18:38:57-0700 [Broker,client] I have a leftover directory 'goma_cache' that is not being used by the buildmaster: you can delete it now
2016-05-19 18:38:57-0700 [Broker,client] I have a leftover directory 'cert' that is not being used by the buildmaster: you can delete it now
2016-05-19 18:38:57-0700 [Broker,client] I have a leftover directory 'google-chrome-lto-perf-linux_64' that is not being used by the buildmaster: you can delete it now
2016-05-19 18:38:57-0700 [Broker,client] I have a leftover directory 'cache_dir' that is not being used by the buildmaster: you can delete it now
2016-05-19 18:38:57-0700 [Broker,client] I have a leftover directory 'cache' that is not being used by the buildmaster: you can delete it now
2016-05-19 18:38:57-0700 [Broker,client] Wanted directories: ['.svn', 'CFI_Linux_CF', 'cache_dir', 'cert', 'goma_cache', 'google-chrome-lto-linux_64', 'info']
2016-05-19 18:38:57-0700 [Broker,client] Actual directories: ['CFI_Linux_CF', 'cache', 'cache_dir', 'cert', 'goma_cache', 'google-chrome-lto-linux_64', 'google-chrome-lto-perf-linux_64', 'info']
2016-05-19 18:38:57-0700 [Broker,client] Deleting unwanted directory cache
2016-05-19 18:38:57-0700 [Broker,client] Deleting unwanted directory google-chrome-lto-perf-linux_64
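As a rough sketch of the monitoring suggested in the issue description, a check could flag a slave whose log has gone silent, which is the symptom seen here. The function names and the 30-minute threshold below are assumptions for illustration; the only thing taken from this thread is the timestamp format of the log lines.

```python
import re
from datetime import datetime, timedelta

# Matches the leading timestamp of a log line, e.g. "2016-05-19 18:38:57-0700".
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) ")


def last_log_time(lines):
    """Return the timestamp of the last timestamped line, or None."""
    last = None
    for line in lines:
        match = _TIMESTAMP.match(line)
        if match:
            last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S%z")
    return last


def is_stale(lines, now, max_silence=timedelta(minutes=30)):
    """True if the log has no timestamps or has been silent too long.

    `now` must be timezone-aware; the threshold is an assumed value.
    """
    last = last_log_time(lines)
    return last is None or now - last > max_silence
```

Lines without a leading timestamp, such as the lone `]` that closes the traceback, are simply skipped.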
,
May 20 2016
Closing the bug: the immediate problem is solved. Something is still unstable with this slave, and it's worth a deeper investigation. Filed http://crbug.com/613612 to track the stability issue.
,
May 20 2016
,
May 20 2016
Thank you, Sergey!
Comment 1 by aga...@chromium.org, May 17 2016
Status: Fixed (was: Untriaged)