New issue
Advanced search Search tips

Issue 754169 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 773180
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature



Sign in to add a comment

[WPT Import] Consider waiting longer for the CQ to start

Project Member Reported by raphael....@intel.com, Aug 10 2017

Issue description

I've observed a few cases lately where a WPT import is being processed correctly and should be good to land, but ends up being cancelled because it took too long for the CQ (+1 and +2) to trigger the bots.

The latest example is https://chromium-review.googlesource.com/609126. As far as I can see, wpt-importer triggered a dry-run at 02:48, Commit Bot posted a message to Gerrit at 02:50 and was going to start processing the patch at 02:52 (per https://chromium-cq-status.appspot.com/v2/patch-status/chromium-review.googlesource.com/609126/2).

However, the Gerrit CL was abandoned at 02:50 and the logs say:
  Triggering CQ try jobs.
  Waiting for try jobs (timeout: 10800 seconds).
  All jobs finished.
  No CQ try job results, aborting.
  step returned non-zero exit code: 1

Something similar happened a few days ago (I don't have the CL URL at hand) at later stage when a Gerrit CL was CQ+2'ed but ended up being abandoned for the same reason.
 
Cc: robertma@chromium.org
Let me try to run some quick scripts to collect some stats on the CQ starting time.
Great point, I didn't think of this as a possible reason for this happening!

Another thought here: To get the list of finished jobs we use git cl try-results, which just returns just the jobs for the latest patchset. We also shouldn't consider the try jobs to be finished if there are no try job results at all.
Another case that I think was related: https://chromium-review.googlesource.com/c/608944

Logs: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.infra.cron%2Fwpt-importer%2F446%2F%2B%2Frecipes%2Fsteps%2FImport_changes_from_WPT_to_Chromium%2F0%2Fstdout

Log excerpt:
...
Adding test expectations lines to LayoutTests/TestExpectations.
All jobs finished.
No results for build Build(builder_name=u'android_blink_rel', build_number=3261)
No tests to rebaseline.
No lines to write to TestExpectations.
Triggering CQ try jobs.
Waiting for try jobs (timeout: 10800 seconds).
All jobs finished.
CQ appears to have failed; aborting.

What it *should* have done would be to wait for CQ to start and fully finish, then decide whether the CQ passed just based on the CQ jobs (not considering the optional try bots).

What it did do, I believe, is that it checked for try job status immediately, and found a list of finished jobs including one failed job, and then it printed "CQ appears to have failed".

Comment 6 by foolip@chromium.org, Aug 17 2017

This was the cause of failure today as well:
https://build.chromium.org/p/chromium.infra.cron/builders/wpt-importer/builds/478

Comment 8 by foolip@chromium.org, Aug 18 2017

Labels: -Pri-2 Pri-1
These two as well:
https://build.chromium.org/p/chromium.infra.cron/builders/wpt-importer/builds/483
https://build.chromium.org/p/chromium.infra.cron/builders/wpt-importer/builds/486

This seems to be a very frequent cause of failure, so increasing priority.
Cc: -qyears...@chromium.org
Owner: qyears...@chromium.org
Status: Started (was: Available)
https://chromium-review.googlesource.com/c/chromium/src/+/629196
Project Member

Comment 10 by bugdroid1@chromium.org, Aug 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a812d0b9f366f4d4061eb834c884163ca4f39d8

commit 6a812d0b9f366f4d4061eb834c884163ca4f39d8
Author: Quinten Yearsley <qyearsley@google.com>
Date: Fri Aug 25 22:55:38 2017

In GitCL.wait_for_try_jobs, wait for at least one job.

Bug:  754169 
Change-Id: I6219d99c6fb0230d1c5e01d76fe5f003959e2d3f
Reviewed-on: https://chromium-review.googlesource.com/629196
Reviewed-by: Philip Jägenstedt <foolip@chromium.org>
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>
Cr-Commit-Position: refs/heads/master@{#497577}
[modify] https://crrev.com/6a812d0b9f366f4d4061eb834c884163ca4f39d8/third_party/WebKit/Tools/Scripts/webkitpy/common/net/git_cl.py
[modify] https://crrev.com/6a812d0b9f366f4d4061eb834c884163ca4f39d8/third_party/WebKit/Tools/Scripts/webkitpy/common/net/git_cl_unittest.py

This should now be fixed as long as there's a second patchset, but actually I don't think that that change would fix the problem in the case where there's only one patchset (no expectations to update).
@qyearsley do you have an update on this?
Labels: -Pri-1 Pri-2
I think that I was thinking that we should make sure that we wait for the try jobs that we requested -- either CQ builders or tryserver.blink builders.

Ideally we would be able to say specifically which builders to wait for, but we don't have an easy way to get the list of CQ builder names now. Maybe wait_for_try_jobs should just take an argument to determine whether it filters out CQ builders or non-CQ builders.

When waiting for CQ (https://cs.chromium.org/chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/w3c/test_importer.py?l=174) maybe we should just filter out the tryserver.blink builders.

(I assume this isn't a very frequent cause of failures now, returning to P2)
Mergedinto: 773180
Status: Duplicate (was: Started)

Sign in to add a comment