no Chrome sticky ebuilds found on branch when sticky is created when existing version is xxx.0_rc-r1 |
|||||||||||||||
Issue descriptionWhenever we branch Cros, we have to make Chrome sticky a number of times. For some reason, the Chrome pfq loses the stickiness after a successful pfq uprev and Chrome ebuild needs to be added again. Example here: pfq successfully uprevved and picked up latest Chrome (63.0.3239.7) but subsequent run (build #5)complains that it can't find sticky ebuild. https://uberchromegw.corp.google.com/i/chromeos_release/builders/samus-chrome-pre-flight-branch%20release-R63-10032.B This is not urgent, but it's a pain.
,
Dec 4 2017
lost track on this bug. is this still happening?
,
Dec 4 2017
M64 just branched...Kevin did you have to do this twice?
,
Dec 5 2017
Per Bernie: the sticky works once, but then gets wiped by the pre flight somehow then after landing it again it seems to stick
,
Dec 5 2017
So issue is still there.
,
Mar 12 2018
I had same issue on recent m66 branch too any update on who could look at this one ?
,
Jul 25
This is still happening, I wonder if we not only add a sticky ebuild (no _rc) but also a second copy of the ebuild that is revved (say .1) against the new branch, that would make it more like the state after the PFQ has run once. If this does not resolve it, I am not sure where the state is stored that fixes it.
,
Jul 25
Fixed component and set to P2 since there is a workaround. What we need is someone who understand what exactly "sticky" means in this context to explain what this code path is trying to achieve. Can we just delete the entire set of logic?
,
Jul 25
Sticky refers to the version number of Chrome being 'stuck', when we branch Chrome OS for a release we do so along side Chrome desktop, the 'sticky' ebuild is an ebuild without an _rc in it which the uprev script interprets as meaning we should stay on this particular major Chrome version. Effectively this is telling the branch Chrome PFQ that we want to uprev new Chromes against a particular branch, similar to what we do for Android with a CL like https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1149334 We could potentially refactor this to work more like the Android PFQ, where we explicitly define a Chrome branch to track using a constant in Chromite if that is better, but we cannot just remove this entire set of logic. This specific bug is meant to determine why the uprev script deletes the 'sticky' (no _rc) ebuild upon successful uprev the first time it runs. I suspect it is something simple, but getting into a position to repro this is not so simple.
,
Aug 7
Going to have Evan, our new-hire, look at this for his starter project.
,
Sep 4
Deleted #16 as a typo killed the ebuild sticking; still hoping to get attention on the bug, however. Thanks!
,
Sep 4
More feedback: 1) I stuck Chrome by creating and adding chromeos-chrome-70.0.3538.0-r1.ebuild to the repository (copied / renamed from chromeos-chrome-70.0.3538.0_rc-r1.ebuild) 2) The PFQ ran successfully 3) The repository - Removed chromeos-chrome-70.0.3538.0-r1.ebuild - Added chromeos-chrome-70.0.3538.6_rc-r1.ebuild (Chrome version taken by the successful PFQ) - Re-added chromeos-chrome-70.0.3538.0_rc-r1.ebuild 4) PFQ started to fail again since the ebuild was missing (had to repeat step 1)
,
Oct 15
Another release, another ping to this bug as the chrome stick still requires two attempts and results in lost time for the TPM team....
,
Oct 15
Evan was directed to a different subteam. Mike, do you have any idea what's happening here and if someone (maybe Lamont?) has can take a look at this?
,
Dec 5
Hi, doing my first release here. Why is this still a thing?
,
Dec 5
,
Dec 5
please link to updated logs. the original comment is a bit confusing because the Chrome PFQ doesn't run on branches.
,
Dec 5
There is a pre-flight builder on the branch that is effectively a PFQ that does not run any HW tests. Example of the actual failure can be found in https://ci.chromium.org/b/8928009604763761376 where the syncchrome stage fails with: ... 21:02:15: INFO: RunCommand: cros_sdk -- ../../chromite/bin/cros_mark_chrome_as_stable '--tracking_branch=release-R72-11316.B' '--boards=samus' stable_release in /b/swarming/w/ir/cache/cbuild/repository/src/scripts cros_mark_chrome_as_stable: Unhandled exception: Traceback (most recent call last): File "../../chromite/bin/cros_mark_chrome_as_stable", line 169, in <module> DoMain() File "../../chromite/bin/cros_mark_chrome_as_stable", line 165, in DoMain commandline.ScriptWrapperMain(FindTarget) File "/mnt/host/source/chromite/lib/commandline.py", line 912, in ScriptWrapperMain ret = target(argv[1:]) File "/mnt/host/source/chromite/scripts/cros_mark_chrome_as_stable.py", line 480, in main sticky_ebuild = _GetStickyEBuild(stable_ebuilds) File "/mnt/host/source/chromite/scripts/cros_mark_chrome_as_stable.py", line 187, in _GetStickyEBuild raise Exception('No sticky ebuilds found') Exception: No sticky ebuilds found ... But the actual problem happens the first time the branch PFQ runs, so something in https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8928022919099612544 led to the sticky ebuild being deleted, specifically the CL https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/1242c76697411a73ef90c2fc840c77170fd80c07 appears to rename the sticky ebuild into an ebuild with _rc. I suspect this is something like we need to have some extra ebuild in the initial sticky chrome CL, like one with a .1 or some such, though optimally the PFQ would not wipe out a sticky ebuild. We could also refactor the chrome PFQ logic to be more like Android, and find its branch by a chromite constant instead of a specially named ebuild.
,
Dec 5
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/b3a0c9d4cb6cede0cb79db0f893001688b021d7d commit b3a0c9d4cb6cede0cb79db0f893001688b021d7d Author: David McMahon <djmm@google.com> Date: Wed Dec 05 21:55:52 2018 Reland "Uprev and pin and make chrome sticky" This is a reland of 5d7b9d2730171b59dfece751bd4294551889fdd6 BUG= chromium:775622 Original change's description: > Uprev and pin and make chrome sticky > > Change-Id: I00826f97078f59655fc660d10f612209bec1bba3 > Reviewed-on: https://chromium-review.googlesource.com/c/1357153 > Reviewed-by: David McMahon <djmm@chromium.org> > Commit-Queue: David McMahon <djmm@chromium.org> > Tested-by: David McMahon <djmm@chromium.org> Change-Id: I6e9e96de3685a3363350d2048d65b78f0f2c2a50 Reviewed-on: https://chromium-review.googlesource.com/c/1363710 Tested-by: David McMahon <djmm@chromium.org> Trybot-Ready: David McMahon <djmm@chromium.org> Reviewed-by: Bernie Thompson <bhthompson@chromium.org> Reviewed-by: Cindy Bayless <cindyb@google.com> [add] https://crrev.com/b3a0c9d4cb6cede0cb79db0f893001688b021d7d/chromeos-base/chromeos-chrome/chromeos-chrome-72.0.3626.0-r1.ebuild
,
Dec 7
the automatic uprevs are happening because of the AFDO logic. is that supposed to still be running on branches ? so even if you keep trying to repin (like in CL:1363710), the AFDO logic is going to kill your update.
,
Dec 7
,
Dec 7
AFAIK, afdo uprev happen on branches specially when more afdo data for that branch is available. Is there something in ebuild that can be queried in chromite afdo logic to know if the ebuild is pinned i.e. not supposed to be auto upreved.
,
Dec 7
we already have the ability to change the bot config on the branch to disable AFDO uprevving entirely right ? once we branch CrOS, we always pin Chrome (iiuc). so it's not really a situation where we'd need to dynamically detect it. seems like, if we want AFDO to run on branches (not clear to me that we do), we need to improve its uprev logic so it uses the correct naming. or maybe it already does, but the uprev pin as seen in [1] shouldn't have created a _rc ebuild ? [1] https://chromium-review.googlesource.com/c/1357153
,
Dec 7
My understanding is that the afdo pipeline does need to run on branches to ensure that afdo profile is fresh. The afdo pipeline modifies the 9999 and numbered ebuilds but does not rename them [1]. So The renaming is happening via the standard PFQ uprev logic which probably does not know to create _rc suffix ebuilds and therefore removes the _rc* ebuilds. vapier@ can you point me to chrome specific uprev logic? [1] https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/afdo.py?sq&q=afdo+package:%5Echromeos_public$&g=0&l=372
,
Dec 7
thanks, afdo was a red herring. the tree got into a bad state before that point. tl;dr: the uprev code is WAI. the tpms need to update their processes. i don't know where those docs live, so tpms will have to take this. unless they're using a tool to generate the sticky ebuild, but i don't think they are. assigning to djmm@ assuming he's the current tpm for R72. looking through branch history, it seems to me this isn't new behavior. in fact, it's been here so long i don't see it being a P2 (even if it is annoying to have to create two commits instead of one). there have been unsticking commits since R40 which is like 4 years ago. R40 was made sticky after it was branched and bumped (40.0.2214.3 existed when 40.0.2214.0 was made) R41 too (41.0.2272.2 existed when 41.0.2272.0 was made) R42 was made sticky, then broken, then remade sticky R44 was made sticky after it was branched and bumped (44.0.2403.4 existed when 44.0.2403.0 was made) R45 too (45.0.2454.4 existed when 45.0.2454.0 was made) R46 too (46.0.2490.3 existed when 46.0.2490.0 was made) R47 too (47.0.2526.4 existed when 47.0.2526.0 was made) R48 too (48.0.2564.5 existed when 48.0.2564.0 was made) R49 was made sticky, then broken, then remade sticky R50 series has similar hit/miss rates the problem shows up if the sticky ebuild is made when the existing version is xxx.0_rc (e.g. chromeos-chrome-72.0.3626.0_rc-r1.ebuild was in the tree when chromeos-chrome-72.0.3626.0-r1.ebuild was created). the code will load all the versions and find the "newest" one and use that as its stable candidate. since "0-r1" is newer than "0_rc-r1", we pick the sticky ebuild and delete it during the uprev (that's what commit 1242c76697411a73ef90c2fc840c77170fd80c07 did). then the next run fails because the "0_rc-r1" and "5_rc-r1" are in there, but no sticky ebuild. if the branch is created and the version uprevved before the sticky ebuild is made (e.g. R41 was branched, the bots created 41.0.2272.2_rc-r1, and then devs created a sticky ebuild at 41.0.2272.0-r1), then "2_rc-r1" is newer than "0-r1", and the uprev code would use 2_rc-r1 as its stable candidate. so the sticky ebuild would stick around. normally the uprev code will skip sticky ebuilds when processing stable candidates: if stable_candidate and not stable_candidate.IsSticky(): git.RunGit(package_dir, ['rm', stable_candidate.ebuild_path]) the definition of IsSticky is: def IsSticky(self): """Returns True if the ebuild is sticky.""" return self.is_stable and self.current_revision == 0 looking at the history, that code was introduced here: https://chromium-review.googlesource.com/16862 where the stable candidate cleanup logic was changed: - if stable_candidate and stable_candidate != sticky_ebuild: + if stable_candidate and not stable_candidate.IsSticky(): and the definition of IsSticky was added and has remained the same ever since. specifically, only ebuilds w/out a -r# in them are defined as sticky. anything with a -r# (like xxx-r1) is not sticky. looking at the history some more, devs were adding ebuilds w/out a -r# from R19 through R39. but for some reason, starting in R40, devs stopped doing that and started using -r1 which caused these races.
,
Dec 7
Many thanks for getting to the bottom of this. Just to be clear, the suggestion is to change the instructions on https://sites.google.com/a/google.com/chromeos/for-team-members/chronos-download/pmo/cros-releasetasks/chrome-os-branch-day from: cp chromeos-chrome-48.0.2564.5_rc-r1.ebuild chromeos-chrome-48.0.2564.0-r1.ebuild to cp chromeos-chrome-48.0.2564.5_rc-r1.ebuild chromeos-chrome-48.0.2564.0.ebuild ? This seems reasonable, better than the idea of putting a .1 in place initially. Looking historically at the instructions this was defined on October 5th 2015 in https://sites.google.com/a/google.com/chromeos/system/app/pages/admin/compare?wuid=wuid%3Agx%3A7177f14a7d84565e&rev1=13 which is closer to 47, so maybe this was defined somewhere else (this is before my time as a TPM). I will update the docs as such, if this does not resolve it we can reopen, but I think this is closeable.
,
Dec 7
cros_mark_chrome_as_stable uses the regex to determine whether something is sticky: _NON_STICKY_REGEX = r'%s[(_rc.*)|(_alpha.*)]+' % _CHROME_VERSION_REGEX you can see it accepts -r# even though the IsSticky func doesn't. we could update that so the code fails even sooner, but it wouldn't help if the -r# is created in the first place ... the bot will still fail. those updated insns sound fine. i'll start a thread discussing improvements. lets go with "Fixed" since you've fixed the docs now.
,
Dec 7
|
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 Deleted