Issue 716669

Issue metadata

Status: Archived
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



ChromeOS swarm bots aren't running swarm client

Project Member Reported by dgarr...@chromium.org, Apr 28 2017

Issue description

These two GCE instances aren't running the swarming client automatically, though I've been able to run it by hand.

I'm not sure how this was supposed to be configured, so I'm sure it's my fault; I just don't know how to fix it.

https://pantheon.corp.google.com/compute/instances?project=chromeos-bot&filter=name:*swarm*


 

Comment 1 by hinoka@chromium.org, Apr 29 2017

...
  "startup": {
    "enabled": true,
    "cwd": "/opt/infra-bot-setup/infra-python",
    "cmd": [
      "{{python}}", "run.py", "infra.tools.bot_setup.start",
...

It doesn't look like it's running the swarming template?
They are using this template in ccompute_config.py. Any idea what's wrong with it?

  Template(
      name='swarm-trusty-chromeos',
      image=DEFAULT_CHROMEOS_TRUSTY_IMAGE,
      disk_size=2048,
      project=PROJECTS['chromeos'].project_id,
      metadata_from_file={
        'cipd_deployments': os.path.join(
            DIRNAME, 'cipd', 'swarm-trusty.json'),
      }),


Should it have this?
      tags=['swarm'],

For that matter, how is the bot supposed to be assigned to a specific swarming bot pool?
Oh, never mind, for some reason I thought the templates were different.

Yeah, the reason they're running buildbot is that the hostname isn't prefixed with "swarm". I can fix that on the code side to also pick up "cros-swarming" as a recognized prefix. Alternatively, if you rename them to swarm-cros-123, they'll also work. Let me know which way you want to do this.
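
The prefix rule described above can be sketched roughly as follows. This is only an illustration, not the actual bot_setup code: the function name `choose_service`, the constant `SWARMING_PREFIXES`, and the hostname `cros-swarming-1` are hypothetical. Only the "swarm" prefix, the proposed "cros-swarming" prefix, and the name swarm-cros-123 come from this thread.

```python
# Hypothetical sketch of hostname-prefix dispatch; not the real bot_setup code.
SWARMING_PREFIXES = ("swarm",)  # the only recognized prefix, per this thread


def choose_service(hostname, prefixes=SWARMING_PREFIXES):
    """Return 'swarming' if the short hostname starts with a known prefix,
    otherwise 'buildbot'."""
    # Drop any domain suffix so only the machine name is compared.
    short_name = hostname.split(".", 1)[0].lower()
    return "swarming" if short_name.startswith(tuple(prefixes)) else "buildbot"


if __name__ == "__main__":
    # "cros-swarming-1" is an assumed example of a name without the prefix.
    for name in ("cros-swarming-1", "swarm-cros-123"):
        print(name, "->", choose_service(name))
```

Under these assumptions, renaming a host so that it starts with "swarm" and adding "cros-swarming" to the prefix tuple would both make the same machine select swarming.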
I'll rename them, that seems easiest, and should be fine.

Though, how does the bot pool get specified?
Cc: mar...@chromium.org
Bot pool: I have no idea. maruel?
Also, the renamed instances don't seem to be appearing on the Bot List at all.

Though they are running what looks to me like the right process:

chrome-+  5999  0.2  0.0 176672 37272 ?        Sl   15:58   0:05 /opt/infra-bot-setup/infra-python/ENV/bin/python /b/swarming/swarming_bot.1.zip start_bot

Comment 8 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/infra_internal/+/928be25d51e01dddf61f75793fa92cb37d920128

commit 928be25d51e01dddf61f75793fa92cb37d920128
Author: Don Garrett <dgarrett@google.com>
Date: Tue May 02 00:39:09 2017

Comment 9 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/c31007bd874fb70343b2d3276fc8bbe803699a1d

commit c31007bd874fb70343b2d3276fc8bbe803699a1d
Author: Don Garrett <dgarrett@google.com>
Date: Tue May 02 00:43:41 2017

The clients seem to be trying to connect, but are now getting authentication errors.

1649 2017-05-02 00:56:54.693 W: Authentication is required for https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake on attempt 0.
403 Client Error: Forbidden for url: https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake
1649 2017-05-02 00:56:54.693 E: Unable to authenticate to https://chromium-swarm.appspot.com (403 Client Error: Forbidden for url: https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake).
1649 2017-05-02 00:56:54.694 E: Failed to contact for handshake, retrying in 300 sec...


Looking around a little further, I think they are trying to connect to the wrong instance of the service:

# This is a Swarming bot
Server: https://chromium-swarm.appspot.com
Version: 2770-6784d9a

How do I configure them to talk to:

https://chrome-swarming.appspot.com/
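
One quick way to confirm which server a bot is actually contacting is to count the hosts that appear in its log. The helper below is a small sketch written for this thread's log format. The function name `hosts_contacted` is hypothetical, and it is not part of the swarming bot.

```python
import re
from collections import Counter

# Matches the host portion of any https URL in a log line.
_HOST_RE = re.compile(r"https://([A-Za-z0-9.-]+)")


def hosts_contacted(log_lines):
    """Count how often each https host appears across the given log lines."""
    counts = Counter()
    for line in log_lines:
        for host in _HOST_RE.findall(line):
            # Strip a trailing dot picked up when a URL ends a sentence.
            counts[host.rstrip(".")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "1649 2017-05-02 00:56:54.693 E: Unable to authenticate to "
        "https://chromium-swarm.appspot.com (403 Client Error: Forbidden for "
        "url: https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake).",
    ]
    print(hosts_contacted(sample))
```

If the only host reported is chromium-swarm.appspot.com, the bot has not picked up the chrome-swarming.appspot.com setting.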
Comment 12 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/e905fca12945b49f4e36211b1fe1b35b76deb38f

commit e905fca12945b49f4e36211b1fe1b35b76deb38f
Author: Ryan Tseng <hinoka@google.com>
Date: Tue May 02 18:06:35 2017

Point swarm-cros-* ot chrome-swarming.appspot.com

BUG= 716669 

Change-Id: I4ac367380570691b447cedcac0ed0d5ac1a44030
Reviewed-on: https://chromium-review.googlesource.com/493886
Commit-Queue: Ryan Tseng <hinoka@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/e905fca12945b49f4e36211b1fe1b35b76deb38f/infra/tools/bot_setup/start/swarming.py
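
The contents of that change are not shown in this thread. As a rough sketch of what "pointing swarm-cros-* at chrome-swarming.appspot.com" could look like, the following maps hostname prefixes to server URLs. The names `SERVER_BY_PREFIX`, `DEFAULT_SERVER`, and `swarming_server`, and the hostname `swarm-trusty-1`, are hypothetical. Treating chromium-swarm.appspot.com as the default is an assumption based on the earlier logs.

```python
# Hypothetical sketch; the real logic lives in
# infra/tools/bot_setup/start/swarming.py and may differ.
DEFAULT_SERVER = "https://chromium-swarm.appspot.com"  # assumed default
SERVER_BY_PREFIX = {
    "swarm-cros-": "https://chrome-swarming.appspot.com",
}


def swarming_server(hostname):
    """Return the swarming server URL for a bot, chosen by hostname prefix."""
    for prefix, server in SERVER_BY_PREFIX.items():
        if hostname.startswith(prefix):
            return server
    return DEFAULT_SERVER


if __name__ == "__main__":
    # "swarm-trusty-1" is an assumed example of a non-ChromeOS bot name.
    for name in ("swarm-cros-0", "swarm-trusty-1"):
        print(name, "->", swarming_server(name))
```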

It seems like all of this would have been simpler if we used GCE tags for this configuration.

One for the swarming instance to contact, and another for the pool.

The presence of the swarming instance tag could control if we run the swarming client.
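
A minimal sketch of that proposal, assuming the per-instance settings are available as a plain key/value dict. The key names `swarming_server` and `swarming_pool`, the function name, and the pool name in the demo are all hypothetical, and how the values would be read from GCE is left out.

```python
def swarming_config_from_metadata(metadata):
    """Return (server, pool) if the instance should run the swarming client,
    or None if no swarming server is configured.

    `metadata` is a plain dict of per-instance settings (hypothetical keys).
    """
    server = metadata.get("swarming_server")
    if not server:
        # No server configured: do not start the swarming client.
        return None
    return server, metadata.get("swarming_pool")


if __name__ == "__main__":
    # "example-pool" is an assumed pool name, used only for the demo.
    configured = {
        "swarming_server": "https://chrome-swarming.appspot.com",
        "swarming_pool": "example-pool",
    }
    print(swarming_config_from_metadata(configured))
    print(swarming_config_from_metadata({}))
```

With this shape, neither the server nor the pool would depend on how the instance is named.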
Also, even after renaming the bots again, it doesn't seem to be working.

From swarm-cros-0:

5876 2017-05-02 19:23:01.009 W: Authentication is required for https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake on attempt 0.
403 Client Error: Forbidden for url: https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake
5876 2017-05-02 19:23:01.009 E: Unable to authenticate to https://chromium-swarm.appspot.com (403 Client Error: Forbidden for url: https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake).
5876 2017-05-02 19:23:01.009 E: Failed to contact for handshake, retrying in 5 sec...

They probably have to be IP whitelisted (chromeos-bot project); we didn't have any GCE bots on chrome-swarming prior to this.

They have been whitelisted according to this doc: go/cros-builder-address-whitelisting

Let's update it with whatever else is needed.
Comment 17 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/infra_internal/+/eaa284049bb9e9aead7c37c86024cb7f0ca3e6c0

commit eaa284049bb9e9aead7c37c86024cb7f0ca3e6c0
Author: Don Garrett <dgarrett@google.com>
Date: Tue May 02 22:28:04 2017

Status: Fixed (was: Untriaged)
The bots have now connected, and I've even been able to schedule a trivial job against them.

Thanks!
Labels: VerifyIn-61

Comment 21 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)