Issue 822363

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



CLs timed out in PreCQ despite a low load.

Project Member Reported by dgarr...@chromium.org, Mar 15 2018

Issue description

This CL is an example:

https://crrev.com/c/963758/3

Every PreCQ builder associated with it timed out with:

We were not able to launch a chromite-pre-cq trybot for your change within 90 minutes.

That message was posted on 3/14/18 at 8:20 PM. 90 minutes before that is 6:50 PM.

Looking at Viceroy, the in-progress builder loads are low, so there should be plenty of capacity.

Why did they time out?
 
Relevant PreCQLauncher logs:


18:49:34: INFO: Launching Pre-CQs for configs ['daisy_spring-no-vmtest-pre-cq', 'betty-pre-cq', 'binhost-pre-cq', 'cyan-no-vmtest-pre-cq', 'nyan_blaze-no-vmtest-pre-cq', 'reef-no-vmtest-pre-cq', 'whirlwind-no-vmtest-pre-cq', 'zako-no-vmtest-pre-cq', 'samus-no-vmtest-pre-cq', 'kevin-arcnext-no-vmtest-pre-cq', 'chromite-pre-cq'] with changes CL:961144 CL:963758 CL:963759 CL:963760 CL:963781
18:49:34: INFO: RunCommand: cros tryjob --yes --timeout 14400 daisy_spring-no-vmtest-pre-cq betty-pre-cq binhost-pre-cq cyan-no-vmtest-pre-cq nyan_blaze-no-vmtest-pre-cq reef-no-vmtest-pre-cq whirlwind-no-vmtest-pre-cq zako-no-vmtest-pre-cq samus-no-vmtest-pre-cq kevin-arcnext-no-vmtest-pre-cq chromite-pre-cq -g 963759 -g 961144 -g 963758 -g 963760 -g 963781 in /b/c/cbuild/repository
18:49:41: INFO: output: Verifying patches...
Submitting tryjob...
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:daisy_spring-no-vmtest-pre-cq] [buildbucket_id:8952027225980147440].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:betty-pre-cq] [buildbucket_id:8952027225630206016].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:binhost-pre-cq] [buildbucket_id:8952027225250140368].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:cyan-no-vmtest-pre-cq] [buildbucket_id:8952027224656133312].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:nyan_blaze-no-vmtest-pre-cq] [buildbucket_id:8952027224293427424].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:reef-no-vmtest-pre-cq] [buildbucket_id:8952027223895237200].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:whirlwind-no-vmtest-pre-cq] [buildbucket_id:8952027223400555344].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:zako-no-vmtest-pre-cq] [buildbucket_id:8952027222746486080].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:samus-no-vmtest-pre-cq] [buildbucket_id:8952027222430035232].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:kevin-arcnext-no-vmtest-pre-cq] [buildbucket_id:8952027222109156656].
Successfully sent PUT request to [buildbucket_bucket:master.chromiumos.tryserver] with [config:chromite-pre-cq] [buildbucket_id:8952027221731187264].
Tryjob submitted!
To view your tryjobs, visit:
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027225980147440
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027225630206016
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027225250140368
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027224656133312
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027224293427424
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027223895237200
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027223400555344
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027222746486080
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027222430035232
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027222109156656
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=no_vmtest_pre_cq
  http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8952027221731187264
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/waterfall?committer=chrome-bot@chromium.org&builder=pre_cq
Cc: nxia@chromium.org jclinton@chromium.org
Checking the builders from the logs, I see they failed in the PreCQ sync stage:

Traceback (most recent call last):
  File "/tmp/cbuildbot-tmpFTJgAV/tmpkYi8I4/chromite/lib/failures_lib.py", line 229, in wrapped_functor
    return functor(*args, **kwargs)
  File "/tmp/cbuildbot-tmpFTJgAV/tmpkYi8I4/chromite/cbuildbot/validation_pool.py", line 374, in AcquirePreCQPool
    pool = cls(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'overlay'
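For readers unfamiliar with this failure mode, here is a minimal sketch of how a classmethod that forwards `**kwargs` to a constructor produces this kind of TypeError. The class name, its parameters, and the helper are hypothetical stand-ins, not chromite's real signatures, and the log above does not show which caller passed `overlay` or why.

```python
class ValidationPoolSketch(object):
    """Hypothetical stand-in for the pool class; not chromite's real API."""

    def __init__(self, overlays, build_root):
        # Assumed parameter names, chosen only for illustration.
        self.overlays = overlays
        self.build_root = build_root

    @classmethod
    def acquire_pre_cq_pool(cls, *args, **kwargs):
        # Mirrors the `pool = cls(*args, **kwargs)` line in the traceback:
        # any keyword the constructor does not declare raises TypeError.
        return cls(*args, **kwargs)


def try_acquire(**kwargs):
    """Return the TypeError message, or None if construction succeeded."""
    try:
        ValidationPoolSketch.acquire_pre_cq_pool(**kwargs)
    except TypeError as e:
        return str(e)
    return None
```

Because the keywords are forwarded blindly, a mismatch between caller and constructor only surfaces at run time, inside the builder, which is consistent with the jobs dying in the sync stage rather than at launch.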
Labels: -Pri-3 Pri-2
Status: Available (was: Started)
I can see two ways to improve this:

1) Now that we can link to tryjobs via buildbucket_id, annotate the CLs as soon as the jobs are requested.

This should be easy to do. It would make the PreCQ feel more responsive, since users would be notified as soon as their CLs are picked up, and would make it a lot easier to diagnose edge cases like this.


2) Use buildbucket, not CIDB, to scan for PreCQ completion and pass/fail, only using CIDB for extended information if needed.

That will allow the PreCQ to correctly handle edge cases like this, as well as better detect (and thus handle) cases where jobs are scheduled but not running right away (perhaps because of load).
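As a rough sketch of proposal 2, the launcher could classify each requested build from a buildbucket-style record alone. The record layout here (a dict with `status`, `result`, and `created` keys, and the values SCHEDULED/STARTED/COMPLETED/SUCCESS) is an assumption made for illustration and should be checked against the actual buildbucket API. The function name and the returned labels are hypothetical.

```python
import datetime


def classify_build(build, now, max_scheduled=datetime.timedelta(minutes=90)):
    """Classify one build record (hypothetical dict layout, see lead-in).

    Returns one of: 'passed', 'failed', 'running', 'pending',
    'launch_timeout', 'unknown'.
    """
    status = build.get('status')
    if status == 'COMPLETED':
        # A build that started and then crashed early is still COMPLETED,
        # so it is reported as a failure rather than a launch timeout.
        return 'passed' if build.get('result') == 'SUCCESS' else 'failed'
    if status == 'STARTED':
        return 'running'
    if status == 'SCHEDULED':
        # Scheduled but not yet picked up by a builder, e.g. because of load.
        waited = now - build['created']
        return 'launch_timeout' if waited > max_scheduled else 'pending'
    return 'unknown'
```

Under this scheme, the builds in this issue would have been reported as failed (they started and died in the sync stage) instead of as never launched.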



I might try to implement 1 after I reland my CL to have "cros tryjob" export its results as JSON. That should allow the PreCQ launcher to avoid having to know how to generate the relevant URLs at all.
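Until the JSON output exists, proposal 1 could in principle be approximated by scraping the text output quoted in the launcher logs above. The following is only a sketch: the regular expression is fitted to the "Successfully sent PUT request" lines in this issue, the URL pattern is copied from the same log, and the function name and output keys are hypothetical. It is not the format the relanded CL will actually emit.

```python
import json
import re

# Matches the "Successfully sent PUT request" lines quoted in this issue.
_PUT_RE = re.compile(
    r'\[buildbucket_bucket:(?P<bucket>[^\]]+)\] with '
    r'\[config:(?P<config>[^\]]+)\] '
    r'\[buildbucket_id:(?P<id>\d+)\]')

# URL pattern taken from the "To view your tryjobs" section of the log.
_DETAILS_URL = ('http://cros-goldeneye/chromeos/healthmonitoring/'
                'buildDetails?buildbucketId=%s')


def parse_tryjob_output(output):
    """Extract one record per submitted tryjob from `cros tryjob` text output."""
    jobs = []
    for match in _PUT_RE.finditer(output):
        jobs.append({
            'bucket': match.group('bucket'),
            'config': match.group('config'),
            'buildbucket_id': match.group('id'),
            'url': _DETAILS_URL % match.group('id'),
        })
    return jobs


def tryjobs_as_json(output):
    """Serialize the parsed records so a caller can annotate CLs from them."""
    return json.dumps(parse_tryjob_output(output), sort_keys=True)
```

With records like these in hand, the launcher could post the per-config links to each CL immediately after the request succeeds, rather than waiting for the builders to report in.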
Components: Infra>Client>ChromeOS>CI
Components: -Infra>Client>ChromeOS
Owner: ----
Status: Untriaged (was: Available)

Comment 8 by nxia@chromium.org, Jun 8 2018

Cc: -nxia@chromium.org