
Issue 736921

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




amd64-generic-telemetry bot has NOT been able to kick off any build since 6/20/17

Project Member Reported by jen...@chromium.org, Jun 26 2017

Issue description

amd64-generic-telemetry bot has NOT been able to kick off any build since 6/20/17

https://build.chromium.org/p/chromiumos.chromium/builders/amd64-generic-telemetry
 

Comment 1 by jen...@chromium.org, Jun 27 2017

Cc: achuith@chromium.org
Components: Infra
Labels: -Pri-3 Infra-Troopers Pri-2
I tried restarting the builder, per a tip from Don about a bug last week, but that doesn't seem to have fixed it.

Handing off to troopers to take a look at it.

Comment 3 by bugdroid1@chromium.org, Jun 27 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager/+/5bca6bc6c8a840d4d86ae778a4a875eedb549b0a

commit 5bca6bc6c8a840d4d86ae778a4a875eedb549b0a
Author: Dirk Pranke <dpranke@google.com>
Date: Tue Jun 27 01:11:53 2017

The master seems to be in a very strange state, in that the builders seem to be connected, but aren't processing new jobs. I tried restarting the master but that seems to have had no effect.
Cc: iannucci@chromium.org d...@chromium.org
I have no idea what's going on here. dnj@, any ideas?

Comment 7 by d...@chromium.org, Jun 27 2017

From logs on master2a:

2017-06-26 17:00:43-0700 [Broker,891131,192.168.121.61] slave 'build246-m2' attaching from IPv4Address(TCP, '192.168.121.61', 48254)
2017-06-26 17:00:43-0700 [Broker,891131,192.168.121.61] Starting buildslave keepalive timer for 'build246-m2'
2017-06-26 17:00:43-0700 [Broker,891131,192.168.121.61] Got slaveinfo from 'build246-m2'
2017-06-26 17:00:43-0700 [Broker,891131,192.168.121.61] bot attached
2017-06-26 17:00:43-0700 [Broker,891131,192.168.121.61] BuildSlave.sendBuilderList (<AutoRebootBuildSlave 'build246-m2', current builders: amd64-generic-telemetry>) failed
2017-06-26 17:00:43-0700 [Broker,891131,192.168.121.61] Unhandled Error
        Traceback from remote host -- Traceback unavailable

---

2017-06-26 18:13:34-0700 [Broker,client] Wanted directories: ['.svn', 'amd64-generic-telemetry', 'cache_dir', 'cert', 'goma_cache', 'info']
2017-06-26 18:13:34-0700 [Broker,client] Actual directories: ['amd64-generic-telemetry', 'cache', 'cache_dir', 'cert', 'info', 'tests']
2017-06-26 18:13:34-0700 [Broker,client] Deleting unwanted directory cache
2017-06-26 18:13:34-0700 [Broker,client] Peer will receive following PB traceback:
2017-06-26 18:13:34-0700 [Broker,client] Unhandled Error
        Traceback (most recent call last):
          File "/b/build/third_party/twisted_10_2/twisted/spread/banana.py", line 153, in gotItem
            self.callExpressionReceived(item)
          File "/b/build/third_party/twisted_10_2/twisted/spread/banana.py", line 116, in callExpressionReceived
            self.expressionReceived(obj)
          File "/b/build/third_party/twisted_10_2/twisted/spread/pb.py", line 516, in expressionReceived
            method(*sexp[1:])
          File "/b/build/third_party/twisted_10_2/twisted/spread/pb.py", line 828, in proto_message
            self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
        --- <exception caught here> ---
          File "/b/build/third_party/twisted_10_2/twisted/spread/pb.py", line 842, in _recvMessage
            netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
          File "/b/build/third_party/twisted_10_2/twisted/spread/flavors.py", line 114, in remoteMessageReceived
            state = method(*args, **kw)
          File "run_slave.py", line 190, in cleanup
            chromium_utils.RemoveDirectory(os.path.join(self.basedir, d))
          File "/b/build/scripts/common/chromium_utils.py", line 562, in RemoveDirectory
            os.chmod(root, 0770)
        exceptions.OSError: [Errno 1] Operation not permitted: '/b/build/slave/cache/cbuild/repository/chroot/mnt/host/source'


Looks like this is a failure in bot code cleanup due to root-owned data. The directory named "cache" isn't whitelisted in "run_slave.py", so it's trying to delete it and failing b/c it has a chroot in it. This is fallout from the "remote_run" transition.
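To illustrate the "root-owned data" part: on POSIX systems, os.chmod raises EPERM (Errno 1) when the calling process is neither root nor the owner of the path, which matches the OSError in the traceback. A minimal sketch for locating such entries before a cleanup runs is below. The helper name paths_not_owned_by is hypothetical and is not part of chromium_utils or run_slave.py.

```python
import os


def paths_not_owned_by(root, uid):
    """Return paths under root (including root itself) whose owner is not uid.

    Hypothetical diagnostic helper, not taken from the build scripts.
    Uses lstat so symlinks are reported by their own ownership.
    """
    foreign = []
    if os.lstat(root).st_uid != uid:
        foreign.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return foreign
```

Calling paths_not_owned_by('/b/build/slave/cache', os.geteuid()) as the bot user would presumably list the chroot contents that a non-root cleanup cannot chmod or delete. Directories the user cannot read are skipped silently by os.walk, so the result may be incomplete.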

I'd propose two things:
1) Whitelist "cache", b/c why not.
2) Manually delete this directory, since the builder is "remote_run" now and doesn't need it.

I'd guess what happened is that this master was not restarted for some reason, and tried to run an "annotated_run" build (old config) with the new layout.
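The cleanup decision seen in the logs (compare "Wanted directories" against "Actual directories", then delete the rest) can be sketched roughly as follows. This is an illustration under assumptions, not the actual run_slave.py code: the function name unwanted_directories and the WANTED constant are hypothetical, and the directory names are copied from the log lines above.

```python
import os

# Directory names copied from the "Wanted directories" log line.
# Hypothetical constant; the real whitelist lives in run_slave.py.
WANTED = {'.svn', 'amd64-generic-telemetry', 'cache_dir', 'cert',
          'goma_cache', 'info'}


def unwanted_directories(basedir, wanted):
    """Return sorted names of subdirectories of basedir that are not in wanted."""
    return sorted(
        name for name in os.listdir(basedir)
        if os.path.isdir(os.path.join(basedir, name)) and name not in wanted
    )
```

With the directory layout from the log, this yields 'cache' and 'tests' as deletion candidates. Passing WANTED | {'cache'} instead, as in proposal 1, leaves only 'tests', so the cleanup would never touch the root-owned chroot under "cache".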

Comment 8 by d...@chromium.org, Jun 27 2017

Owner: d...@chromium.org
Status: Fixed (was: Untriaged)
The builder is running again.
Status: Verified (was: Fixed)
Closing. Please reopen it if it's not fixed. Thanks!
