New issue
Advanced search Search tips

Issue 814910 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Make CHROMEOS_LKGM updates in chromium.git go through the chromium CQ

Project Member Reported by bpastene@chromium.org, Feb 22 2018

Issue description

Currently, the chromium PFQ direct commits LKGM updates in chromium land:
https://chromium.googlesource.com/chromium/src/+log/master/chromeos/CHROMEOS_LKGM
https://codesearch.chromium.org/chromium/src/third_party/chromite/scripts/chrome_chromeos_lkgm.py?rcl=7dd3b4e6d7a6b2e123d1aec04b121fd7c40871e0&l=137

Twice now that's broken chromium's CQ when it gets updated to a manifest that doesn't have all the artifacts that the chrome-sdk expects (bug 809399, bug 814657)

It would be much more ideal for these updates to go through chromium's CQ. This would ensure that a broken manifest doesn't get rolled into chromium, and won't then be repeatedly reverted by sheriffs.

This might require the PFQ to wait on chromium's CQ during the update-lkgm step, but IMO this is reasonable given the alternative of dcommit'ing potentially breaking changes.
 
If the chromium CQ is too slow/flakey for the chromium PFQ, I wonder if we could at a minimum try the lkgm update on all chromium.chromiumos trybots (https://ci.chromium.org/p/chromium/g/tryserver.chromium.chromiumos/builders) and commit only if they all pass. I think that would catch most (all?) of the breakages we've been seeing.
Updating CHROMEOS_LKGM is subtly complicated, but a large challenge in this approach is determining which CrOS version to update CHROMEOS_LKGM to.

The current design has the PFQ update CHROMEOS_LKGM to a know goood CrOS version, and simple chrome starts there but looks backwards for a version that exists. That is broken for chromium builders, tracked in issue 774599.

Cc: akes...@chromium.org
This broke chromium yet again today in https://bugs.chromium.org/p/chromium/issues/detail?id=815325.

This either needs to go through the CQ or we need to shut it off entirely. We can't have this hard landing CLs that break chromium.
Labels: -Pri-2 Pri-1
Owner: achuith@chromium.org
Status: Assigned (was: Untriaged)
This shouldn't break post: 
https://chromium-review.googlesource.com/c/chromium/src/+/935784

I'll work on landing LKGM changes through the CQ, but that will take a few days.
re comment #3: There are other options, including the fix mentioned in comment #5.

achuith@ - I have some concerns about using the CQ for this:
1) If the PFQ master waits for the CQ to complete it could be idle for a very long time, especially if the CQ is bogged down and/or flakey.
2) If the PFQ master does not wait for CQ completion then it will be difficult to track if/when/how an uprev failed.

It also seems pretty inefficient since we really just need to test the Simple Chrome builders.

Currently the PFQ does test Simple Chrome, just not with the updated LKGM file.

What I think we really should do is:
* Modify the PFQ TestSimpleChromeWorkflow stage to locally modify CHROMEOS_LKGM to the CrOS version being used (i.e. to the same revision that chrome_chromeos_lkgm.py will use) and test 'cros chrome_sdk' using that instead of passing the version with a flag.
* Ensure that the chromium Simple Chrome builders (amd64-generic and daisy) on the PFQ are running the TestSimpleChromeWorkflow stage.

Then if 'cros chrome-sdk' breaks again with a CHROMEOS_LKGM update, the PFQ will fail before we try to update CHROMEOS_LKGM. This would also catch a build failure due to a toolchain update which otherwise might not get caught today.



Labels: Hotlist-CrOS-Gardener OS-Chrome
#6: are you referring to the cros CQ or the chromium one? (I'm referring to the latter w/ #3; I don't think https://codesearch.chromium.org/chromium/src/third_party/chromite/scripts/chrome_chromeos_lkgm.py?rcl=92e338611b65c8b118146ef0f7b29f9cccb608a3&l=137 is how we should be landing these in chromium.)
I was also referring to the chromium CQ.

I described some of the concerns with using the chromium CQ in comment #6. If you have cycles to work with achuith on a robust approach to improve how we update CHROMEOS_LKGM, that would be great. Unfortunately simply "using the chromium CQ" is not going to provide a robust solution.

The concerns are valid, but I don't think that's reason enough to bypass entirely chromium's CQ. We have several pins in chromium land, and AFAIK all their auto-rollers go through the CQ. I don't think we should make an exception here.

wrt to making TestSimpleChromeWorkflow on the PFQ smarter: Does the TestSimpleChromeWorkflow stage run for every board? If it does, then that would certainly prevent the past few breakages. But is there any danger of Chromium adding a simplechrome builder for a board that *isn't* covered on the PFQ? (Prob not, but just want to be safe.)

Also, there's no guarantee that running 'cros chrome_sdk' on a bot on a CrOS master would succeed/fail the same way as a bot on a Chromium master. (eg: What if the account we use for GS auth differs and one loses some needed ACLs?) That's why I think a good solution here is a combination of #1 and #6. ie:
 - Make the PFQ test simepechrome with the updated lkgm.
 - When updating the lkgm checked into chromium, first try the update on all (or just the simplechrome bots?) on https://ci.chromium.org/p/chromium/g/tryserver.chromium.chromiumos/builders. If they pass, then go ahead and land the change. By limiting the CQ to only those trybots, we reduce sources of flakiness and hopefully speed it up enough to simply have the PFQ block until it finishes.

WDYT? I'd be happy to help with the chromium side of things if that's the route we want to go down.
We should take a look at what happens today in the CQ if only //chromeos/CHROMEOS_LKGM is change; in theory, only the simplechrome builders would do any substantive work; of course, if another builder was completely broken for unrelated reasons, that would stop things.

I agree that we can't be regularly landing changes directly that haven't been tested on the configs that need them.
Re comment #10 - Having the PFQ master intitiate and wait for tryjobs to complete on just the simple chrome builders sounds lovely. This sounds very daunting to me, but if you (or another infra person) and Achuith (who has graciously volunteered :) ) can get that working that would be fantastic.

It's not hard to trigger bots and wait for the results. //tools/gn/bin/roll_gn.py is an example of how to do so.
Like roll_gen.py does, all CQ triggering/result-collection can be done through the `git cl` tool, which the PFQ already uses to upload and land the changes
(https://codesearch.chromium.org/chromium/src/third_party/chromite/scripts/chrome_chromeos_lkgm.py?rcl=7dd3b4e6d7a6b2e123d1aec04b121fd7c40871e0&l=109)

So, adding the CQ interaction/blocking should be fairly trivial.
Project Member

Comment 16 by bugdroid1@chromium.org, Mar 2 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/b4f6998337d5fe4a9c592935386b3f80ee1f67cc

commit b4f6998337d5fe4a9c592935386b3f80ee1f67cc
Author: Achuith Bhandarkar <achuith@chromium.org>
Date: Fri Mar 02 03:13:50 2018

[chrome_chromeos_lkgm]: Use CQ to land changes.

* Move code from main to Run()
* Cleanup left over chromium src directory
* Use chrome gardener as TBR, with tbr-owners as fallback
* Use git cl set-commit instead of git cl land
* Get rid of LandNewLKGM and _TryLandNewLKGM

BUG= chromium:814910 
TEST=manual

Change-Id: If73b5a09076f2bff0ace81a4053dd2b3b474e9f8
Reviewed-on: https://chromium-review.googlesource.com/940628
Commit-Ready: Achuith Bhandarkar <achuith@chromium.org>
Tested-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Benjamin Pastene <bpastene@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/b4f6998337d5fe4a9c592935386b3f80ee1f67cc/scripts/chrome_chromeos_lkgm.py
[modify] https://crrev.com/b4f6998337d5fe4a9c592935386b3f80ee1f67cc/scripts/chrome_chromeos_lkgm_unittest.py

The latest lkgm update successfully went through the CQ. yay \o/
https://chromium-review.googlesource.com/c/chromium/src/+/954844

Though, looking at the simplechrome builders on the CQ, they didn't do much:
https://ci.chromium.org/buildbot/tryserver.chromium.chromiumos/chromeos-amd64-generic-rel/75989

We might want to add the lkgm file as a runtime dependency to the chromiumos_preflight target, that way any updates to the file will trigger a rebuild on those bots.
Project Member

Comment 18 by bugdroid1@chromium.org, Mar 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/5f5078bd3c465edc1cf0a126f1000b805d91ec46

commit 5f5078bd3c465edc1cf0a126f1000b805d91ec46
Author: Achuith Bhandarkar <achuith@chromium.org>
Date: Fri Mar 09 03:28:49 2018

[chrome_chromeos_lkgm_unittest]: Remove cruft.

tree_status and patching time is no longer necessary.

BUG= chromium:814910 
TEST=chrome_chromeos_lkgm_unittest

Change-Id: I9efabfc994ed435da0551b6f83c89ff6cf5a219b
Reviewed-on: https://chromium-review.googlesource.com/954360
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>

[modify] https://crrev.com/5f5078bd3c465edc1cf0a126f1000b805d91ec46/scripts/chrome_chromeos_lkgm_unittest.py

Project Member

Comment 19 by bugdroid1@chromium.org, Mar 21 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/dc3a35174d80a66d8b5f8da72fc305a95b491f87

commit dc3a35174d80a66d8b5f8da72fc305a95b491f87
Author: Achuith Bhandarkar <achuith@chromium.org>
Date: Wed Mar 21 05:21:27 2018

chrome_chromeos_lkgm: Use git.py

Introduce git.ShallowCheckout, which reduces the size of the download
from 4+G to <1G.

Steps are:
* git init
* git config core.sparsecheckout true
* git remote add origin https://chromium.googlesource.com/chromium/src.git
* Add files codereview.settings and chromeos/CHROMEOS_LKGM to
.git/info/sparse-checkout
* git fetch --depth=1
* git pull origin master

--depth=1 ensures we only get TOT and not previous revisions.

sparsecheckout ensures we only fetch the git metadata files,
codereview.settings (necessary for git cl upload), and the file we're
interested in (CHROMEOS_LKGM).

Additionally:
* Use generic args in ChromeLKGMCommitter.__init__()
* Split UpdateLKGM from CommitNewLKGM
* Remove TODOs.
* Use git.CreateBranch instead of manual branch management
* Use --no-pager to eliminate user interaction for pagination.
* Get rid of builder_status_lib and BaseChromeLKGMCommitterTest in unittest
* Update ChromeLKGMCommitterTest unittests.
* Add git_unittest testShallowFetch

BUG= chromium:814910 
TEST=manual

Change-Id: Ibfe4b8b68e4030be828b2da8cf34aae2fde60ce2
Reviewed-on: https://chromium-review.googlesource.com/967054
Commit-Ready: Achuith Bhandarkar <achuith@chromium.org>
Tested-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>
Reviewed-by: Benjamin Pastene <bpastene@chromium.org>

[modify] https://crrev.com/dc3a35174d80a66d8b5f8da72fc305a95b491f87/scripts/chrome_chromeos_lkgm.py
[modify] https://crrev.com/dc3a35174d80a66d8b5f8da72fc305a95b491f87/scripts/chrome_chromeos_lkgm_unittest.py
[modify] https://crrev.com/dc3a35174d80a66d8b5f8da72fc305a95b491f87/lib/git.py
[modify] https://crrev.com/dc3a35174d80a66d8b5f8da72fc305a95b491f87/lib/git_unittest.py

Project Member

Comment 20 by bugdroid1@chromium.org, Mar 21 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ce42791ac366479d4d4567e626142bd21e022aa6

commit ce42791ac366479d4d4567e626142bd21e022aa6
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Wed Mar 21 06:34:01 2018

Roll src/third_party/chromite/ 6792b8569..6ea47f17c (4 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/6792b856916a..6ea47f17c992

$ git log 6792b8569..6ea47f17c --date=short --no-merges --format='%ad %ae %s'
2018-03-20 dgarrett cros tryjob: Use correct buildbucket name in logging.
2018-03-20 gmeinke cbuildbot: change firmware key-id to firmware-key-id
2018-03-16 achuith chrome_chromeos_lkgm: Use git.py
2018-03-20 adityakali cbuildbot: stop generating update paylods for some lakitu boards

Created with:
  roll-dep src/third_party/chromite
BUG=chromium:None,chromium:814910


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: I292494f54eb93125d97e10594fbcc0e1bf8302e7
Reviewed-on: https://chromium-review.googlesource.com/972670
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#544647}
[modify] https://crrev.com/ce42791ac366479d4d4567e626142bd21e022aa6/DEPS

Components: Infra>Client>ChromeOS>CI
Components: -Infra>Client>ChromeOS
Project Member

Comment 23 by bugdroid1@chromium.org, Apr 3 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c

commit ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c
Author: Achuith Bhandarkar <achuith@chromium.org>
Date: Tue Apr 03 19:50:47 2018

[chrome_committer]: Split out chrome_committer.

Create a library object for commits into the chrome repo that doesn't
know anything about LKGM. Have chrome_chromeos_lkgm use this library to
land updates to LKGM.

Most of the original code has just been split into 2 files. Exceptions
are:
* Args handling - we get the parser from the library and pass it as a
parent to the script so --help shows all args, and parse_args parses
everything at once. __init__ of both ChromeCommitter and
ChromeLKGMCommitter takes generic args argument.
* The method ChromeCommitter.FullPath() provides the full path of the
file to be modified.
* Checkout() takes a sparse_checkout arg with the list of files to be
pulled.
* Commit() takes a list of files to commit along with the commit
message.
* New unittests in chrome_committer_unittest.py

BUG= chromium:814910 
TEST=unittests

Change-Id: I4bec75cb6eee8399ca9e362df935767266e8faeb
Reviewed-on: https://chromium-review.googlesource.com/976534
Commit-Ready: Achuith Bhandarkar <achuith@chromium.org>
Tested-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c/scripts/chrome_chromeos_lkgm.py
[add] https://crrev.com/ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c/lib/chrome_committer.py
[add] https://crrev.com/ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c/lib/chrome_committer_unittest.py
[add] https://crrev.com/ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c/lib/chrome_committer_unittest
[modify] https://crrev.com/ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c/scripts/chrome_chromeos_lkgm_unittest.py
[modify] https://crrev.com/ec8d7a766a7e2512c0757f10ffaf8db91fd2c91c/lib/git.py

Project Member

Comment 24 by bugdroid1@chromium.org, Apr 3 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/527d0604ab5ddb68549c24b0396cd5ba6c75f813

commit 527d0604ab5ddb68549c24b0396cd5ba6c75f813
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue Apr 03 23:39:32 2018

Roll src/third_party/chromite/ 87c1c3dbe..ec8d7a766 (3 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/87c1c3dbea9a..ec8d7a766a7e

$ git log 87c1c3dbe..ec8d7a766 --date=short --no-merges --format='%ad %ae %s'
2018-03-01 achuith [chrome_committer]: Split out chrome_committer.
2018-03-08 ahassani paygen: Log all chroot operations in the log file
2018-03-08 ahassani paygen: Check for the correct metadata size in the verify section

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:814910 , chromium:808495 , chromium:820243 


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: I2dc84a4feb54038e73df54f3f123e7cf8e6d3ae0
Reviewed-on: https://chromium-review.googlesource.com/993704
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#547869}
[modify] https://crrev.com/527d0604ab5ddb68549c24b0396cd5ba6c75f813/DEPS

Status: Fixed (was: Started)

Sign in to add a comment