
Issue 775622

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 7
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug

Blocked on:
issue 913015




no Chrome sticky ebuilds found on branch when sticky is created when existing version is xxx.0_rc-r1

Project Member Reported by gkihumba@google.com, Oct 17 2017

Issue description

Whenever we branch CrOS, we have to make Chrome sticky a number of times. For some reason, the Chrome PFQ loses the stickiness after a successful PFQ uprev and the Chrome ebuild needs to be added again.
Example here: the PFQ successfully uprevved and picked up the latest Chrome (63.0.3239.7), but a subsequent run (build #5) complains that it can't find the sticky ebuild.

https://uberchromegw.corp.google.com/i/chromeos_release/builders/samus-chrome-pre-flight-branch%20release-R63-10032.B

This is not urgent, but it's a pain.
 

Comment 1 Deleted

Comment 2 Deleted

Comment 3 by nxia@chromium.org, Dec 4 2017

Owner: ----
Status: Available (was: Assigned)
lost track of this bug. is this still happening?
M64 just branched...Kevin did you have to do this twice?
Per Bernie:

the sticky works once, but then gets wiped by the pre-flight somehow

then after landing it again it seems to stick
So the issue is still there.

Comment 7 Deleted

Comment 8 Deleted

Comment 9 Deleted

Comment 10 Deleted

I had the same issue on the recent M66 branch too.
Any update on who could look at this one?
This is still happening. I wonder if we should not only add a sticky ebuild (no _rc) but also a second copy of the ebuild that is revved (say .1) against the new branch; that would make it more like the state after the PFQ has run once.

If this does not resolve it, I am not sure where the state is stored that fixes it.
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>CI
Labels: -Pri-1 Pri-2
Fixed component and set to P2 since there is a workaround.

What we need is someone who understands what exactly "sticky" means in this context to explain what this code path is trying to achieve. Can we just delete the entire set of logic?
Sticky refers to the version number of Chrome being 'stuck'. When we branch Chrome OS for a release we do so alongside Chrome desktop, and the 'sticky' ebuild is an ebuild without an _rc in it, which the uprev script interprets as meaning we should stay on this particular major Chrome version.

Effectively this is telling the branch Chrome PFQ that we want to uprev new Chromes against a particular branch, similar to what we do for Android with a CL like https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1149334

We could potentially refactor this to work more like the Android PFQ, where we explicitly define a Chrome branch to track using a constant in Chromite if that is better, but we cannot just remove this entire set of logic.

This specific bug is meant to determine why the uprev script deletes the 'sticky' (no _rc) ebuild upon successful uprev the first time it runs. I suspect it is something simple, but getting into a position to repro this is not so simple.
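
To make the naming convention concrete, here is a toy sketch (not the actual chromite code; the filenames and helper are illustrative) of how the one ebuild without _rc acts as the pin that tells the branch PFQ which Chrome branch to track:

  ebuilds = [
      'chromeos-chrome-70.0.3538.0_rc-r1.ebuild',  # normal PFQ-generated uprev
      'chromeos-chrome-70.0.3538.0-r1.ebuild',     # 'sticky' copy: no _rc suffix
  ]

  def pinned_major_version(names):
      """Return the Chrome major version pinned by the sticky ebuild, if any."""
      for name in names:
          if name.startswith('chromeos-chrome-') and '_rc' not in name:
              version = name[len('chromeos-chrome-'):-len('.ebuild')]
              return int(version.split('.')[0])  # e.g. 70 -> track the M70 branch
      return None

  print(pinned_major_version(ebuilds))  # -> 70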
Owner: jclinton@chromium.org
Status: Assigned (was: Available)
Going to have Evan, our new-hire, look at this for his starter project.

Comment 16 Deleted

Deleted #16 as a typo killed the ebuild sticking; still hoping to get attention on the bug, however.  Thanks!
More feedback:
1) I stuck Chrome by creating and adding chromeos-chrome-70.0.3538.0-r1.ebuild to the repository (copied / renamed from chromeos-chrome-70.0.3538.0_rc-r1.ebuild)

2) The PFQ ran successfully

3) The repository then:
- Removed chromeos-chrome-70.0.3538.0-r1.ebuild
- Added chromeos-chrome-70.0.3538.6_rc-r1.ebuild (the Chrome version picked up by the successful PFQ)
- Re-added chromeos-chrome-70.0.3538.0_rc-r1.ebuild

4) The PFQ started to fail again since the sticky ebuild was missing (I had to repeat step 1)

Another release, another ping on this bug: making Chrome sticky still requires two attempts and results in lost time for the TPM team...
Owner: vapier@chromium.org
Evan was directed to a different subteam. Mike, do you have any idea what's happening here and if someone (maybe Lamont?) can take a look at this?
Labels: -Pri-2 Pri-1
Hi, doing my first release here.  Why is this still a thing?

Components: -Infra>Client>ChromeOS>CI Infra>Client>ChromeOS>Build
Labels: -Pri-1 Pri-2
please link to updated logs.  the original comment is a bit confusing because the Chrome PFQ doesn't run on branches.
There is a pre-flight builder on the branch that is effectively a PFQ that does not run any HW tests. 

Example of the actual failure can be found in https://ci.chromium.org/b/8928009604763761376 where the syncchrome stage fails with:

...
21:02:15: INFO: RunCommand: cros_sdk -- ../../chromite/bin/cros_mark_chrome_as_stable '--tracking_branch=release-R72-11316.B' '--boards=samus' stable_release in /b/swarming/w/ir/cache/cbuild/repository/src/scripts
cros_mark_chrome_as_stable: Unhandled exception:
Traceback (most recent call last):
  File "../../chromite/bin/cros_mark_chrome_as_stable", line 169, in <module>
    DoMain()
  File "../../chromite/bin/cros_mark_chrome_as_stable", line 165, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/mnt/host/source/chromite/lib/commandline.py", line 912, in ScriptWrapperMain
    ret = target(argv[1:])
  File "/mnt/host/source/chromite/scripts/cros_mark_chrome_as_stable.py", line 480, in main
    sticky_ebuild = _GetStickyEBuild(stable_ebuilds)
  File "/mnt/host/source/chromite/scripts/cros_mark_chrome_as_stable.py", line 187, in _GetStickyEBuild
    raise Exception('No sticky ebuilds found')
Exception: No sticky ebuilds found
...


But the actual problem happens the first time the branch PFQ runs, so something in https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8928022919099612544 led to the sticky ebuild being deleted. Specifically, the CL https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/1242c76697411a73ef90c2fc840c77170fd80c07 appears to rename the sticky ebuild into an ebuild with _rc.

I suspect the fix is something like needing an extra ebuild in the initial sticky Chrome CL (one with a .1 or some such), though optimally the PFQ would not wipe out a sticky ebuild at all.

We could also refactor the chrome PFQ logic to be more like Android, and find its branch by a chromite constant instead of a specially named ebuild.


Project Member

Comment 25 by bugdroid1@chromium.org, Dec 5

Labels: merge-merged-release-R72-11316.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/b3a0c9d4cb6cede0cb79db0f893001688b021d7d

commit b3a0c9d4cb6cede0cb79db0f893001688b021d7d
Author: David McMahon <djmm@google.com>
Date: Wed Dec 05 21:55:52 2018

Reland "Uprev and pin and make chrome sticky"

This is a reland of 5d7b9d2730171b59dfece751bd4294551889fdd6

BUG= chromium:775622 

Original change's description:
> Uprev and pin and make chrome sticky
>
> Change-Id: I00826f97078f59655fc660d10f612209bec1bba3
> Reviewed-on: https://chromium-review.googlesource.com/c/1357153
> Reviewed-by: David McMahon <djmm@chromium.org>
> Commit-Queue: David McMahon <djmm@chromium.org>
> Tested-by: David McMahon <djmm@chromium.org>

Change-Id: I6e9e96de3685a3363350d2048d65b78f0f2c2a50
Reviewed-on: https://chromium-review.googlesource.com/c/1363710
Tested-by: David McMahon <djmm@chromium.org>
Trybot-Ready: David McMahon <djmm@chromium.org>
Reviewed-by: Bernie Thompson <bhthompson@chromium.org>
Reviewed-by: Cindy Bayless <cindyb@google.com>

[add] https://crrev.com/b3a0c9d4cb6cede0cb79db0f893001688b021d7d/chromeos-base/chromeos-chrome/chromeos-chrome-72.0.3626.0-r1.ebuild

Components: Tools>ChromeOS-Toolchain
the automatic uprevs are happening because of the AFDO logic.  is that supposed to still be running on branches?

so even if you keep trying to repin (like in CL:1363710), the AFDO logic is going to kill your update.
Cc: gmx@chromium.org
Labels: cwp
AFAIK, AFDO uprevs happen on branches especially when more AFDO data for that branch is available.

Is there something in the ebuild that the chromite AFDO logic can query to know whether the ebuild is pinned, i.e. not supposed to be auto-uprevved?
we already have the ability to change the bot config on the branch to disable AFDO uprevving entirely, right?

once we branch CrOS, we always pin Chrome (iiuc).  so it's not really a situation where we'd need to dynamically detect it.

seems like, if we want AFDO to run on branches (not clear to me that we do), we need to improve its uprev logic so it uses the correct naming.  or maybe it already does, but the uprev pin as seen in [1] shouldn't have created a _rc ebuild?

[1] https://chromium-review.googlesource.com/c/1357153
My understanding is that the AFDO pipeline does need to run on branches to ensure that the AFDO profile is fresh.

The AFDO pipeline modifies the 9999 and numbered ebuilds but does not rename them [1]. So the renaming is happening via the standard PFQ uprev logic, which probably does not know to create _rc suffix ebuilds and therefore removes the _rc* ebuilds.
vapier@, can you point me to the Chrome-specific uprev logic?

[1] https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/afdo.py?sq&q=afdo+package:%5Echromeos_public$&g=0&l=372
Components: -Tools>ChromeOS-Toolchain
Labels: -cwp -Pri-2 -merge-merged-release-R72-11316.B Pri-3
Owner: djmm@chromium.org
Summary: no Chrome sticky ebuilds found on branch when sticky is created when existing version is xxx.0_rc-r1 (was: No Chrome sticky ebuilds found on M63 branch even after successful Chrome uprev)
thanks, afdo was a red herring.  the tree got into a bad state before that point.

tl;dr: the uprev code is WAI.  the tpms need to update their processes.  i don't know where those docs live, so tpms will have to take this.  unless they're using a tool to generate the sticky ebuild, but i don't think they are.  assigning to djmm@ assuming he's the current tpm for R72.

looking through branch history, it seems to me this isn't new behavior.  in fact, it's been here so long i don't see it being a P2 (even if it is annoying to have to create two commits instead of one).  there have been unsticking commits since R40 which is like 4 years ago.

R40 was made sticky after it was branched and bumped (40.0.2214.3 existed when 40.0.2214.0 was made)
R41 too (41.0.2272.2 existed when 41.0.2272.0 was made)
R42 was made sticky, then broken, then remade sticky
R44 was made sticky after it was branched and bumped (44.0.2403.4 existed when 44.0.2403.0 was made)
R45 too (45.0.2454.4 existed when 45.0.2454.0 was made)
R46 too (46.0.2490.3 existed when 46.0.2490.0 was made)
R47 too (47.0.2526.4 existed when 47.0.2526.0 was made)
R48 too (48.0.2564.5 existed when 48.0.2564.0 was made)
R49 was made sticky, then broken, then remade sticky
R50 series has similar hit/miss rates

the problem shows up if the sticky ebuild is made when the existing version is xxx.0_rc (e.g. chromeos-chrome-72.0.3626.0_rc-r1.ebuild was in the tree when chromeos-chrome-72.0.3626.0-r1.ebuild was created).  the code will load all the versions and find the "newest" one and use that as its stable candidate.  since "0-r1" is newer than "0_rc-r1", we pick the sticky ebuild and delete it during the uprev (that's what commit 1242c76697411a73ef90c2fc840c77170fd80c07 did).  then the next run fails because the "0_rc-r1" and "5_rc-r1" are in there, but no sticky ebuild.

if the branch is created and the version uprevved before the sticky ebuild is made (e.g. R41 was branched, the bots created 41.0.2272.2_rc-r1, and then devs created a sticky ebuild at 41.0.2272.0-r1), then "2_rc-r1" is newer than "0-r1", and the uprev code would use 2_rc-r1 as its stable candidate.  so the sticky ebuild would stick around.
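
to illustrate the ordering, a simplified sketch (the real comparison goes through portage; the parsing below only handles the _rc/-r forms seen here and is just for illustration):

  import re

  def sort_key(ver):
      """Key for strings like '72.0.3626.0_rc-r1': _rc sorts before the plain release."""
      m = re.match(r'([\d.]+)(_rc)?(?:-r(\d+))?$', ver)
      dotted, rc_suffix, rev = m.groups()
      return (tuple(int(p) for p in dotted.split('.')),
              0 if rc_suffix else 1,   # _rc ranks below no suffix
              int(rev or 0))

  # this bug's scenario: the sticky ebuild is the newest candidate, so the
  # uprev picks it as the stable candidate and deletes it.
  print(max(['72.0.3626.0_rc-r1', '72.0.3626.0-r1'], key=sort_key))
  # -> 72.0.3626.0-r1 (the sticky one)

  # R41-style scenario: a later _rc build already exists, so it wins and the
  # sticky ebuild survives.
  print(max(['41.0.2272.0-r1', '41.0.2272.2_rc-r1'], key=sort_key))
  # -> 41.0.2272.2_rc-r1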

normally the uprev code will skip sticky ebuilds when processing stable candidates:
  if stable_candidate and not stable_candidate.IsSticky():
    git.RunGit(package_dir, ['rm', stable_candidate.ebuild_path])

the definition of IsSticky is:
  def IsSticky(self):
    """Returns True if the ebuild is sticky."""
    return self.is_stable and self.current_revision == 0

looking at the history, that code was introduced here:
  https://chromium-review.googlesource.com/16862

where the stable candidate cleanup logic was changed:
-  if stable_candidate and stable_candidate != sticky_ebuild:
+  if stable_candidate and not stable_candidate.IsSticky():

and the definition of IsSticky was added and has remained the same ever since.  specifically, only ebuilds w/out a -r# in them are defined as sticky.  anything with a -r# (like xxx-r1) is not sticky.

looking at the history some more, devs were adding ebuilds w/out a -r# from R19 through R39.  but for some reason, starting in R40, devs stopped doing that and started using -r1 which caused these races.
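
for illustration, a minimal stand-in for that rule (it assumes the revision is just the -r number, 0 when absent; this is not the real chromite EBuild class):

  import re

  def is_sticky(ebuild_name):
      """True when the ebuild is stable and has no -r# revision."""
      rev = re.search(r'-r(\d+)\.ebuild$', ebuild_name)
      current_revision = int(rev.group(1)) if rev else 0
      is_stable = '9999' not in ebuild_name
      return is_stable and current_revision == 0

  print(is_sticky('chromeos-chrome-48.0.2564.0-r1.ebuild'))  # False: -r1 means revision 1
  print(is_sticky('chromeos-chrome-48.0.2564.0.ebuild'))     # True: no -r#, revision 0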
Owner: vapier@chromium.org
Status: WontFix (was: Assigned)
Many thanks for getting to the bottom of this.

Just to be clear, the suggestion is to change the instructions on https://sites.google.com/a/google.com/chromeos/for-team-members/chronos-download/pmo/cros-releasetasks/chrome-os-branch-day from:

cp chromeos-chrome-48.0.2564.5_rc-r1.ebuild chromeos-chrome-48.0.2564.0-r1.ebuild

to 

cp chromeos-chrome-48.0.2564.5_rc-r1.ebuild chromeos-chrome-48.0.2564.0.ebuild

? This seems reasonable, better than the idea of putting a .1 in place initially. 

Looking historically at the instructions, this was defined on October 5th 2015 in https://sites.google.com/a/google.com/chromeos/system/app/pages/admin/compare?wuid=wuid%3Agx%3A7177f14a7d84565e&rev1=13 which is closer to R47, so maybe this was defined somewhere else (this is before my time as a TPM).

I will update the docs accordingly; if this does not resolve it we can reopen, but I think this is closeable.
Status: Fixed (was: WontFix)
cros_mark_chrome_as_stable uses the regex to determine whether something is sticky:
  _NON_STICKY_REGEX = r'%s[(_rc.*)|(_alpha.*)]+' % _CHROME_VERSION_REGEX

you can see it accepts -r# even though the IsSticky func doesn't.  we could update that so the code fails even sooner, but it wouldn't help if the -r# is created in the first place ... the bot will still fail.
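
as a quick check (assuming _CHROME_VERSION_REGEX is the usual dotted four-part version pattern, and just running re.match on the raw version strings rather than the exact call site):

  import re

  _CHROME_VERSION_REGEX = r'\d+\.\d+\.\d+\.\d+'   # assumed value for this demo
  _NON_STICKY_REGEX = r'%s[(_rc.*)|(_alpha.*)]+' % _CHROME_VERSION_REGEX

  # the _rc build is flagged as non-sticky, as intended ...
  print(bool(re.match(_NON_STICKY_REGEX, '48.0.2564.0_rc-r1')))  # True
  # ... but a plain -r# build is not flagged, so this script still treats it
  # as sticky even though IsSticky() (revision == 0) says it is not.
  print(bool(re.match(_NON_STICKY_REGEX, '48.0.2564.0-r1')))     # False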

those updated insns sound fine.  i'll start a thread discussing improvements.

let's go with "Fixed" since you've fixed the docs now.
Blockedon: 913015
