Issue 701756

Issue metadata

Status: Verified
Owner: emso@chromium.org
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Make Swarming trigger more robust

Project Member Reported by emso@chromium.org, Mar 15 2017

Issue description

The current swarming trigger request often fails with DEADLINE_EXCEEDED:

2017/03/15 13:25:22 INFO: [driver] Trigger request (run ID: 5852562955698176, Worker: Hello_Ubuntu)
2017/03/15 13:25:23 INFO: [driver] Created worker isolate, hash: "2139ca0ef78ffe0ba795ec0f49a5276c490c6a18"
2017/03/15 13:25:23 INFO: [driver] PubSub userdata for trigger: "CICAgICA3LIKEiVodHRwczovL2lzb2xhdGVzZXJ2ZXItZGV2LmFwcHNwb3QuY29tGihlZTIwZDQyOWExMmYyNWRhMGIxY2M5ZTU1NzQ5YmE4ZjMwY2ZmZGY5IgxIZWxsb19VYnVudHUqJmh0dHBzOi8vY2hyb21pdW0tc3dhcm0tZGV2LmFwcHNwb3QuY29t"
2017/03/15 13:25:28 ERROR: failed to trigger swarming task: Post https://chromium-swarm-dev.appspot.com/_ah/api/swarming/v1/tasks/new?alt=json: API error 5 (urlfetch: DEADLINE_EXCEEDED): The read operation timed out :: {"error":"Post https://chromium-swarm-dev.appspot.com/_ah/api/swarming/v1/tasks/new?alt=json: API error 5 (urlfetch: DEADLINE_EXCEEDED): The read operation timed out"}
2017/03/15 13:25:28 ERROR: [driver] Failed to call Driver.Trigger :: {"error":"rpc error: code = Internal desc = failed to trigger worker: failed to call trigger on swarming API: failed to trigger swarming task: Post https://chromium-swarm-dev.appspot.com/_ah/api/swarming/v1/tasks/new?alt=json: API error 5 (urlfetch: DEADLINE_EXCEEDED): The read operation timed out"}
INFO     2017-03-15 13:25:28,933 module.py:806] driver: "POST /driver/internal/trigger HTTP/1.1" 500 -
WARNING  2017-03-15 13:25:28,933 taskqueue_stub.py:1981] Task task2 failed to execute. This task will retry in 0.400 seconds

Each failure causes the task in the launch task queue to be retried. However, the trigger request that was reported as failed tends to succeed on the swarming side anyway, so the retry triggers again and the end result is more than one swarming task for a single trigger request.
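
To illustrate the failure mode, here is a minimal Go sketch of how a timed-out request that still succeeds server-side produces a duplicate on retry, and how checking for an existing task first would avoid it. Everything in it is hypothetical: `fakeSwarming`, `FindByKey`, `triggerOnce`, and the run ID/worker dedup key are stand-ins invented for this example, not the real swarming API or the Tricium code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDeadline stands in for the urlfetch DEADLINE_EXCEEDED error in the log.
var errDeadline = errors.New("DEADLINE_EXCEEDED")

// fakeSwarming is a hypothetical in-memory stand-in for the task server.
// For the first `slow` calls it creates the task but still reports a
// timeout, mimicking a request that succeeds after the client gave up.
type fakeSwarming struct {
	mu    sync.Mutex
	slow  int
	next  int
	tasks map[string][]string // dedup key -> IDs of tasks created for it
}

// NewTask always creates a task; it may nevertheless return errDeadline.
func (s *fakeSwarming) NewTask(key string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next++
	id := fmt.Sprintf("task-%d", s.next)
	s.tasks[key] = append(s.tasks[key], id)
	if s.slow > 0 {
		s.slow--
		return "", errDeadline
	}
	return id, nil
}

// FindByKey returns the first task already created for key, if any.
func (s *fakeSwarming) FindByKey(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if ids := s.tasks[key]; len(ids) > 0 {
		return ids[0], true
	}
	return "", false
}

// triggerOnce looks for an existing task before creating a new one, so a
// retry after a spurious timeout reuses the task that was already created.
func triggerOnce(s *fakeSwarming, runID int64, worker string) (string, error) {
	key := fmt.Sprintf("%d/%s", runID, worker)
	if id, ok := s.FindByKey(key); ok {
		return id, nil
	}
	return s.NewTask(key)
}

// retry mimics the task queue: call f again until it succeeds.
func retry(attempts int, f func() (string, error)) (string, error) {
	var err error
	for i := 0; i < attempts; i++ {
		var id string
		if id, err = f(); err == nil {
			return id, nil
		}
	}
	return "", err
}

// countAfterRetries reports how many tasks exist after one spurious timeout.
func countAfterRetries(dedup bool) int {
	s := &fakeSwarming{slow: 1, tasks: map[string][]string{}}
	const runID, worker = int64(5852562955698176), "Hello_Ubuntu"
	key := fmt.Sprintf("%d/%s", runID, worker)
	retry(3, func() (string, error) {
		if dedup {
			return triggerOnce(s, runID, worker)
		}
		return s.NewTask(key)
	})
	return len(s.tasks[key])
}

func main() {
	fmt.Println("tasks without dedup:", countAfterRetries(false))
	fmt.Println("tasks with dedup:", countAfterRetries(true))
}
```

In this sketch the plain retry ends with two tasks and the lookup-first variant with one. Whether the real service offers a suitable lookup is not something this thread establishes.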


 

Comment 1 by emso@chromium.org, Mar 22 2017

Owner: emso@chromium.org
Status: Assigned (was: Available)
Project Member

Comment 2 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/1a047c59a674f6874533ae6a58afdea56586f7f1

commit 1a047c59a674f6874533ae6a58afdea56586f7f1
Author: Emma <emso@chromium.org>
Date: Mon Apr 03 05:10:32 2017

Adds longer timeout to swarming connection

BUG= 701756 

Change-Id: I89b15ac0210d4f098411724fe2f120fb96f23b2d
Reviewed-on: https://chromium-review.googlesource.com/464869
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/1a047c59a674f6874533ae6a58afdea56586f7f1/go/src/infra/tricium/appengine/common/swarming.go
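
For readers unfamiliar with the effect of such a change, the following is a rough sketch of how a longer client-side timeout lets a slow request complete instead of failing. It uses the standard `net/http` client against a local test server purely as an assumption for illustration; the actual commit modifies the App Engine swarming connection in swarming.go, and the helper names and durations below are made up.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// postWithTimeout issues an empty POST and gives up after timeout.
func postWithTimeout(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Post(url, "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// slowServer returns a local test server that answers after delay.
func slowServer(delay time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			time.Sleep(delay)
			w.WriteHeader(http.StatusOK)
		}))
}

// timedOut reports whether a request with the given timeout fails against
// a server that needs delayMs milliseconds to respond.
func timedOut(delayMs, timeoutMs int) bool {
	srv := slowServer(time.Duration(delayMs) * time.Millisecond)
	defer srv.Close()
	return postWithTimeout(srv.URL, time.Duration(timeoutMs)*time.Millisecond) != nil
}

func main() {
	fmt.Println("short timeout fails:", timedOut(200, 50))
	fmt.Println("long timeout fails:", timedOut(200, 2000))
}
```

A longer timeout reduces spurious failures but does not by itself rule out a retry after a request that eventually succeeded.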

Comment 3 by emso@chromium.org, Apr 4 2017

Status: Verified (was: Assigned)
