New issue
Advanced search Search tips

Issue 913159 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Dec 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Pre-CQ let change through even though TastVMTest failed on betty-pre-cq

Project Member Reported by derat@chromium.org, Dec 8

Issue description

From  issue 913104 : https://crrev.com/c/1359026 modified cros-go.eclass in a way that broke the TastVMTest stage on betty-pre-cq (http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8928000219296528576), but chromite still reported "The Pre-Commit Queue has successfully verified your change." and sent the change to the CQ.

Is there anything special that needs to be done to make the pre-CQ punt changes in response to failures in the TastVMTest stage? I'd assumed that validation would fail if any of the triggered *-pre-cq builds fail, but perhaps we only look at specific stages in those builds.
 
Cc: pprabhu@chromium.org jclinton@chromium.org
Components: -Infra>Client>ChromeOS>Test Infra>Client>ChromeOS>CI
Labels: M-73
Owner: ----
Status: Available (was: Assigned)
I think this is for CI folks. I took a quick look and didn't see anything obvious.

Patch set 9
https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1359776

Supposedly all green
https://chromeos-cl-viewer-ui.googleplex.com/cl_status/chromium-review.googlesource.com/1359776/9

Except betty-pre-cq is not
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928000219296528576
Status: WontFix (was: Available)
All of the pre-cq runs for patchset 10 are marked as passed in CIDB: https://chromeos-cl-viewer-ui.googleplex.com/cl_status/chromium-review.googlesource.com/1359026/10

The links there are broken due to the Swarming migration. However, from the review thread, we can find the betty-pre-cq run: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927750476269829504 and indeed it is passed.

The bug report includes a link to https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928000219296528576 which appears to have been from patchset 9 (based on stdout logs for that run).

Let me know if I misunderstood.

Owner: jclinton@chromium.org
Status: Assigned (was: WontFix)
The change has been fixed and resubmitted in the meantime. PS9 is where I see the problem.

At Dec 5 01:12 UTC, there was a "The Pre-Commit Queue has picked up your change." message with a link to the betty-pre-cq run at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8928000219296528576. That run failed, but there's a "The Pre-Commit Queue has successfully verified your change." message at 03:29. After this, the change was included in the next CQ run, and the same issue that broke the betty-pre-cq run made the CQ fail.
Actually PS9 also passed betty-pre-cq but only on the second try: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927991063048939744 . We can see that it is PS9 from the stdout logs: https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8927991063048939744/+/steps/PreCQSync/0/stdout

Here's the history as seen from CIDB:

mysql> SELECT action, reason, timestamp, buildbucket_id FROM clActionTable WHERE patch_number = 9 AND change_number = 1359026;
+---------------------------+--------------------------------------+---------------------+---------------------+
| action                    | reason                               | timestamp           | buildbucket_id      |
+---------------------------+--------------------------------------+---------------------+---------------------+
| speculative               | NULL                                 | 2018-12-05 06:47:00 | NULL                |
| validation_pending_pre_cq | daisy_spring-no-vmtest-pre-cq        | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | betty-pre-cq                         | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | grunt-no-vmtest-pre-cq               | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | binhost-pre-cq                       | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | kevin-arcnext-no-vmtest-pre-cq       | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | betty-arcnext-pre-cq                 | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | nyan_blaze-no-vmtest-pre-cq          | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | eve-no-vmtest-pre-cq                 | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | reef-no-vmtest-pre-cq                | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | whirlwind-no-vmtest-pre-cq           | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | lakitu-no-vmtest-pre-cq              | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | cyan-no-vmtest-pre-cq                | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | samus-no-vmtest-pre-cq               | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | fizz-no-vmtest-pre-cq                | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | guado_moblab-no-vmtest-pre-cq        | 2018-12-05 06:47:05 | NULL                |
| validation_pending_pre_cq | zako-no-vmtest-pre-cq                | 2018-12-05 06:47:06 | NULL                |
| screened_for_pre_cq       | NULL                                 | 2018-12-05 06:47:06 | NULL                |
| trybot_launching          | daisy_spring-no-vmtest-pre-cq        | 2018-12-05 06:48:49 | 8928000219890698096 |
| trybot_launching          | betty-pre-cq                         | 2018-12-05 06:48:49 | 8928000219296528576 |
| trybot_launching          | grunt-no-vmtest-pre-cq               | 2018-12-05 06:48:49 | 8928000218404977408 |
| trybot_launching          | binhost-pre-cq                       | 2018-12-05 06:48:49 | 8928000217696831936 |
| trybot_launching          | kevin-arcnext-no-vmtest-pre-cq       | 2018-12-05 06:48:49 | 8928000216991320528 |
| trybot_launching          | betty-arcnext-pre-cq                 | 2018-12-05 06:48:49 | 8928000216010659120 |
| trybot_launching          | nyan_blaze-no-vmtest-pre-cq          | 2018-12-05 06:48:49 | 8928000215442167920 |
| trybot_launching          | eve-no-vmtest-pre-cq                 | 2018-12-05 06:48:49 | 8928000214947519872 |
| trybot_launching          | reef-no-vmtest-pre-cq                | 2018-12-05 06:48:49 | 8928000214421047696 |
| trybot_launching          | whirlwind-no-vmtest-pre-cq           | 2018-12-05 06:48:49 | 8928000212871776048 |
| trybot_launching          | lakitu-no-vmtest-pre-cq              | 2018-12-05 06:48:49 | 8928000212214691776 |
| trybot_launching          | cyan-no-vmtest-pre-cq                | 2018-12-05 06:48:49 | 8928000211281762512 |
| trybot_launching          | samus-no-vmtest-pre-cq               | 2018-12-05 06:48:50 | 8928000210580238880 |
| trybot_launching          | fizz-no-vmtest-pre-cq                | 2018-12-05 06:48:50 | 8928000209983811232 |
| trybot_launching          | guado_moblab-no-vmtest-pre-cq        | 2018-12-05 06:48:50 | 8928000208705469696 |
| trybot_launching          | zako-no-vmtest-pre-cq                | 2018-12-05 06:48:50 | 8928000207202905232 |
| picked_up                 | NULL                                 | 2018-12-05 06:59:21 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 06:59:40 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 06:59:45 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 06:59:58 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 06:59:59 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 06:59:59 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:00:03 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:00:30 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:01:17 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:01:26 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:01:28 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:01:38 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:01:53 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:02:04 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:02:24 | NULL                |
| picked_up                 | NULL                                 | 2018-12-05 07:02:33 | NULL                |
| pre_cq_inflight           | NULL                                 | 2018-12-05 07:04:25 | NULL                |
| verified                  | NULL                                 | 2018-12-05 07:25:26 | NULL                |
| verified                  | NULL                                 | 2018-12-05 07:52:29 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:07:41 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:12:35 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:19:31 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:24:33 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:25:42 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:25:49 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:29:06 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:32:05 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:32:11 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:38:24 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:38:34 | NULL                |
| verified                  | NULL                                 | 2018-12-05 08:39:50 | NULL                |
| kicked_out                | unknown                              | 2018-12-05 09:04:42 | NULL                |
| pre_cq_failed             | NULL                                 | 2018-12-05 09:04:42 | NULL                |
| speculative               | NULL                                 | 2018-12-05 09:05:25 | NULL                |
| exonerated                | forgiven_due_to_sanity_build_failure | 2018-12-05 09:10:30 | NULL                |
| pre_cq_inflight           | NULL                                 | 2018-12-05 09:12:28 | NULL                |
| verified                  | NULL                                 | 2018-12-05 09:14:08 | NULL                |
| trybot_launching          | betty-pre-cq                         | 2018-12-05 09:14:09 | 8927991063048939744 |
| picked_up                 | NULL                                 | 2018-12-05 09:27:36 | NULL                |
| verified                  | NULL                                 | 2018-12-05 11:27:24 | NULL                |
| pre_cq_fully_verified     | NULL                                 | 2018-12-05 11:29:20 | NULL                |
| requeued                  | NULL                                 | 2018-12-07 17:42:37 | NULL                |
| pre_cq_passed             | NULL                                 | 2018-12-07 17:43:31 | NULL                |

So, what happened was that betty-pre-cq failed once, the sanity failure occured, it was auto-retriggered for pre-cq, pre-cq determined that it only needed a signal for betty-pre-cq to make a determination and started that build, it fired the *original* email with all of the original build ID's a second time to the review thread, betty-pre-cq passed, CL was marked as verified.

The bug is that it fired the *original* email with all of the original build ID's a second time to the review thread. I took a look at the code and I'm pretty sure this bug has been here for years; basically it can't render the second round of an immediate PreCQ testing phase using the new build ID's because of the way the SQL query is written. It looks pretty hard to fix.

As it happens, we are killing PreCQ and instead moving the entire PreCQ/CQ system from our mix of pre-cq-launcher + master-paladin over to use the same code that Browser CQ uses to manage the Browser CQ as a part of Parallel CQ. There will not be a PreCQ, at all, because it doesn't make sense to test CL's alone first if they will be tested alone anyway. So, this entire set of code will be dead around the end of Q1. As such, I don't think it makes sense to fix this: this is annoying but pretty rare.

Status: WontFix (was: Assigned)
Cc: vapier@chromium.org semenzato@chromium.org
Status: Unconfirmed (was: WontFix)
Sorry to be continually reopening this, but now I'm starting to wonder if the successful betty-pre-cq run actually included the change that was supposed to be tested.

In the first, failing PS9 run at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8928000219296528576, https://storage.cloud.google.com/chromeos-image-archive/betty-pre-cq/R73-11346.0.0-b3193087/stateful.tgz contains the broken dev_image_new/libexec/tast/bundles/local/cros/cros file.

In the second, successful PS9 run at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8927991063048939744, https://storage.cloud.google.com/chromeos-image-archive/betty-pre-cq/R73-11347.0.0-b3193285/stateful.tgz contains the working dev_image_new/libexec/tast/bundles/local/cros file.

Luigi, do you know if there was any non-determinism in the broken version of the cros-go.eclass change? Looking at https://crrev.com/c/1359026/9..10/eclass/cros-go.eclass, I'm not sure how there could be.
eclass modifications don't trigger revbumps of packages that use those eclasses.  so if no CLs were in the CQ that triggered uprevs of the respective packages, seems easy to have a broken eclass change be merged.

if you want to force things to be tested, you could manually revbump any packages using the cros-go.eclass.
Status: WontFix (was: Unconfirmed)
I think #7 answered how this CL could be non-deterministic as tested. Feel free to reopen if you have other questions.
Thanks, Mike. Just to make sure that I understand, is it then the case that pre-cq builders could end up using prebuilt versions of the packages that would've been modified by the eclass change?

If that's correct, are there any plans to change that at some point, or will we always be susceptible to the pre-CQ (and also CQ?) not catching issues caused by eclass modifications when the packages that use those eclasses aren't updated in the same run?

Sign in to add a comment