Issue 691399

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Pri: 2
Type: Bug

Treat context deadline as a transient error

Project Member Reported by vadimsh@chromium.org, Feb 12 2017

Issue description

Noticed a luci-scheduler job stuck in the "Starting" state.

Debugging led to a call to the /internal/tasks/invocations task queue handler, which logged:
"""
Job transaction failed: Call error 11: Deadline exceeded (timeout)
HTTP 202: Error when executing the action - Call error 11: Deadline exceeded (timeout)
"""

The deadline error is treated as fatal here, so the task queue task is declared finished (the handler returns HTTP 202) and is never retried.

It should be treated as transient instead, and the task should complete with an HTTP 500 status code so that the Task Queue Service retries it.
 

Comment 1 by estaab@chromium.org, Jun 22 2017

Cc: tandrii@chromium.org
Status: Available (was: Untriaged)

Comment 2 by bugdroid1@chromium.org, Sep 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/af2196e28a0b79b66902a31e6b9f75caf525eb25

commit af2196e28a0b79b66902a31e6b9f75caf525eb25
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Tue Sep 12 00:20:56 2017

scheduler: Extract the transaction helper function.

Make it retry on subset of transient errors. Consistently mark all unexpected
datastore errors (usually deadlines) as transient, to make sure their status
correctly propagates to the task queue level.

R=tandrii@chromium.org
BUG=764043, 691399

Change-Id: I31c289f12eb91cf932022bc85b9c7f7368309fe6
Reviewed-on: https://chromium-review.googlesource.com/660987
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org>

[modify] https://crrev.com/af2196e28a0b79b66902a31e6b9f75caf525eb25/scheduler/appengine/engine/controller.go
[modify] https://crrev.com/af2196e28a0b79b66902a31e6b9f75caf525eb25/scheduler/appengine/engine/engine.go
[modify] https://crrev.com/af2196e28a0b79b66902a31e6b9f75caf525eb25/scheduler/appengine/engine/invocation.go
[modify] https://crrev.com/af2196e28a0b79b66902a31e6b9f75caf525eb25/scheduler/appengine/engine/utils.go
[add] https://crrev.com/af2196e28a0b79b66902a31e6b9f75caf525eb25/scheduler/appengine/engine/utils_test.go

Owner: vadimsh@chromium.org
Status: Assigned (was: Available)

Isn't this fixed now?

Status: Fixed (was: Assigned)
Yes