CQ: When a try job is canceled, look for a message in result_details_json |
||||||||||||||
Issue descriptionCQ: When a try job is canceled, look for a message in result_details_json At the moment, when a try job is canceled, the CQ uses this message, for example: Try jobs failed on following builders: Infra-PerCommit-Large on skia.primary (JOB_FAILED, build has not started yet; builder either lacks capacity or does not exist (misspelled?)) There are a couple of problems here. In this case the job was explicitly canceled (because the patch doesn't apply), not failed. And "build has not started yet" makes it seem like the CQ or Buildbucket gave up on the build because it wasn't consumed for some amount of time, rather than being canceled by the task scheduler. The result_details_json field of a Buildbucket build can now be filled when the build is canceled. Can we make the CQ look inside the build's result_details_json when it has the canceled state and use the message given there, if any, and fall back on the above message otherwise?
,
May 15 2017
,
May 15 2017
Great, thanks! FTR we've just started providing result_details_json for canceled jobs in this format:
{"message": "No jobs for you!"}
in https://skia-review.googlesource.com/c/16911/
But we're happy to standardize on a format you prefer.
,
May 15 2017
Ah, I didn't realize it's not yet standard. I think being able to express failure reason is very handy. I'd vote for the structure similar to other errors from buildbucket:
"error": {
"reason": "..." # for machines;
"message": "for humans"
}
+nodir@ any recommendations?
,
May 15 2017
note that borenet@ is talking about Build.result_details_json field, not buildbucket response. "error" seems to imply that an error happened, but it did not necessarily happen. I'd vote for
{
"reason": "..." # for machines;
"message": "for humans"
}
in Build.result_details_json.
Note that this applies only to Builds with status=COMPLETED, result=CANCELED
It was not standardized before because it was not possible to set result_details_json in cancelled builds until today.
,
May 16 2017
yes, i understood that this is about result_details_json. The reason I proposed "error":{...} is because I thought CQ could make use of it if available in other cases, such as say run_presubmit.py recipe failing.
,
May 16 2017
It seems like it might even be appropriate to have a CQ-specific field, ie:
{
"cq": {
"reason": "...", # for machines
"message": "for humans"
}
}
But again, I'm happy to use whatever makes sense across the board.
,
May 16 2017
What if a build had nothing to do with CQ?
,
May 16 2017
Milo could display the message too, so the standard is not specific to CQ. This build could be continuous, not necessarily a tryjob
,
May 16 2017
The reason, to be interpretable by machines, must be standardized too. It is not as easy as the reason in buildbucket responses because the latter is under buildbucket control. I'd omit the reason for now, so I'm leaning proposal in #3
,
May 17 2017
Fine, CQ would get "message" field if available and set it instead of generic message.
,
Jun 1 2017
,
Jun 2 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/infra_internal/+/c300716ffd7177e25020ea27dbf9b453468d2d64 commit c300716ffd7177e25020ea27dbf9b453468d2d64 Author: Andrii Shyshkalov <tandrii@chromium.org> Date: Fri Jun 02 10:49:48 2017
,
Jun 7 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/puppet/+/8ebf6654555cffd6f6184664352e6e1256b0b2bc commit 8ebf6654555cffd6f6184664352e6e1256b0b2bc Author: Andrii Shyshkalov <tandrii@chromium.org> Date: Wed Jun 07 13:01:44 2017
,
Jun 8 2017
This has been deployed, but I haven't tested it end-to-end. LMK if you can verify this.
,
Jun 12 2017
This does not seem to be working. CL: https://skia-review.googlesource.com/c/19450 Canceled build: https://apis-explorer.appspot.com/apis-explorer/?base=https://cr-buildbucket.appspot.com/_ah/api#p/buildbucket/v1/buildbucket.get?id=8976987858612587136&_h=1& The above message is very long; I'm going to shorten it.
,
Jun 12 2017
So, yes doesn't work yet. The message would probably get shortened anyway, but it's surely better to shorten it in in your own scheduler.
,
Jun 23 2017
,
Jul 14 2017
Friendly ping. What's the status on this?
,
Jul 14 2017
I don't have time to debug this at the moment :(
,
Jul 14 2017
To be clear on the status: the CL linked above has a bug, but I don't see where (but certainly in test, too :)). Perhaps, I didn't properly mocked buildbucket response.
,
Jul 17 2017
,
Aug 18 2017
,
Aug 31 2017
,
Aug 31 2017
,
Aug 31
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 31
borenet@ is this still something you'd like?
,
Sep 3
,
Sep 3
|
||||||||||||||
►
Sign in to add a comment |
||||||||||||||
Comment 1 by tandrii@chromium.org
, May 15 2017