New issue
Advanced search Search tips

Issue 722493 link

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

CQ: When a try job is canceled, look for a message in result_details_json

Project Member Reported by bore...@google.com, May 15 2017

Issue description

CQ: When a try job is canceled, look for a message in result_details_json

At the moment, when a try job is canceled, the CQ uses this message, for example:

Try jobs failed on following builders:
  Infra-PerCommit-Large on skia.primary (JOB_FAILED, build has not started yet; builder either lacks capacity or does not exist (misspelled?))

There are a couple of problems here. In this case the job was explicitly canceled (because the patch doesn't apply), not failed. And "build has not started yet" makes it seem like the CQ or Buildbucket gave up on the build because it wasn't consumed for some amount of time, rather than being canceled by the task scheduler.

The result_details_json field of a Buildbucket build can now be filled when the build is canceled.  Can we make the CQ look inside the build's result_details_json when it has the canceled state and use the message given there, if any, and fall back on the above message otherwise?
 
Status: Available (was: Untriaged)
Yep, we can. I'll keep this in my mind in next two weeks that I'll be re-making tryjob processing for CQ.
Owner: tandrii@chromium.org
Status: Assigned (was: Available)

Comment 3 by bore...@google.com, May 15 2017

Great, thanks!  FTR we've just started providing result_details_json for canceled jobs in this format:

{"message": "No jobs for you!"}

in https://skia-review.googlesource.com/c/16911/

But we're happy to standardize on a format you prefer.
Cc: no...@chromium.org
Ah, I didn't realize it's not yet standard. I think being able to express failure reason is very handy. I'd vote for the structure similar to other errors from buildbucket: 

"error": {
  "reason": "..."  # for machines;
  "message": "for humans"
}

+nodir@ any recommendations?

Comment 5 by no...@chromium.org, May 15 2017

note that borenet@ is talking about Build.result_details_json field, not buildbucket response. "error" seems to imply that an error happened, but it did not necessarily happen. I'd vote for 

{
  "reason": "..."  # for machines;
  "message": "for humans"
}

in Build.result_details_json. 

Note that this applies only to Builds with status=COMPLETED, result=CANCELED

It was not standardized before because it was not possible to set result_details_json in cancelled builds until today.
yes, i understood that this is about result_details_json. The reason I proposed "error":{...} is because I thought CQ could make use of it if available in other cases, such as say run_presubmit.py recipe failing.

Comment 7 by bore...@google.com, May 16 2017

It seems like it might even be appropriate to have a CQ-specific field, ie:

{
  "cq": {
    "reason": "...", # for machines
    "message": "for humans"
  }
}

But again, I'm happy to use whatever makes sense across the board.

Comment 8 by no...@chromium.org, May 16 2017

What if a build had nothing to do with CQ?

Comment 9 by no...@chromium.org, May 16 2017

Milo could display the message too, so the standard is not specific to CQ. This build could be continuous, not necessarily a tryjob

Comment 10 by no...@chromium.org, May 16 2017

The reason, to be interpretable by machines, must be standardized too. It is not as easy as the reason in buildbucket responses because the latter is under buildbucket control. I'd omit the reason for now, so I'm leaning proposal in #3
Fine, CQ would get "message" field if available and set it instead of generic message.
Status: Started (was: Assigned)
https://chrome-internal-review.googlesource.com/386828
Project Member

Comment 13 by bugdroid1@chromium.org, Jun 2 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/infra_internal/+/c300716ffd7177e25020ea27dbf9b453468d2d64

commit c300716ffd7177e25020ea27dbf9b453468d2d64
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Fri Jun 02 10:49:48 2017

Project Member

Comment 14 by bugdroid1@chromium.org, Jun 7 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/puppet/+/8ebf6654555cffd6f6184664352e6e1256b0b2bc

commit 8ebf6654555cffd6f6184664352e6e1256b0b2bc
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Wed Jun 07 13:01:44 2017

Status: Fixed (was: Started)
This has been deployed, but I haven't tested it end-to-end. LMK if you can verify this.
Status: Assigned (was: Fixed)
So, yes doesn't work yet. The message would probably get shortened anyway, but it's surely better to shorten it in in your own scheduler.
Owner: ----
Status: Available (was: Assigned)

Comment 19 by bore...@google.com, Jul 14 2017

Friendly ping. What's the status on this?
I don't have time to debug this at the moment :(
To be clear on the status: the CL linked above has a bug, but I don't see where (but certainly in test, too :)). Perhaps, I didn't properly mocked buildbucket response.
Project Member

Comment 22 by sheriffbot@chromium.org, Jul 17 2017

Labels: Hotlist-Google
Components: -Infra>CQ Infra>Platform>CQdaemon

Comment 24 by efoo@chromium.org, Aug 31 2017

Components: Infra>Platform>CQ

Comment 25 by efoo@chromium.org, Aug 31 2017

Components: -Infra>Platform>CQdaemon
Project Member

Comment 26 by sheriffbot@chromium.org, Aug 31

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Recharge-Cold
borenet@ is this still something you'd like?
Labels: -Pri-2 Pri-3
Status: Available (was: Untriaged)
Labels: -Hotlist-Google

Sign in to add a comment