
Issue 913722

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug
Build-Toolchain

Optimize Timing of Branch Chrome PFQ Functionality

Project Member Reported by bhthompson@google.com, Dec 10

Issue description

Today when the Chrome PFQ on the release branches runs, it takes ~4 hours, which means we need to wait to start the release branch builders until ~ hours after the new Chrome version gets triggered (5pm Pacific for 71, for example). This is visualized in go/cros-tpm-timing. As a result, the OS images the India team tests are not available until late in their day, which limits the amount of testing that can be done on the nightly build. 

To remedy this, we may want to review what the branch PFQ is actually doing, and what we might be able to do to optimize it. IIUC the branch PFQ does a few key things:

1. Verify that the new version of Chrome can build on Chrome OS. 
2. Generate AFDO optimization data. 
3. Commit a CL to Chrome OS to use the new Chrome.

Given that the release branch builds are very reliable, we can probably live without verifying that Chrome builds before landing the uprev CL in the branch; however, I am not sure what the dependencies on the AFDO generation are. 

For instance, could we have the branch PFQ immediately land a CL to uprev Chrome and let it run the AFDO optimization in the background? (Bonus points if it reverts the CL should Chrome fail to build.)

What happens if the release builders start while AFDO is still in progress? Is there some event we can use to verify this completes before the release builders need it?

Maybe AFDO could be moved into the release builders to remove the dependency?

Thoughts?



 
Components: Tools>ChromeOS-Toolchain
fairly certain AFDO generation requires compiling & executing Chrome first, so there's no way of getting away from building Chrome

wanting to get a new release available to people quickly for testing seems to defeat the point of AFDO generation.  if we uprev Chrome immediately and let builders run, then do AFDO generation in parallel and delay its uprev, then the release will pretty much always be against stale AFDO data.  i know one of the points of AFDO is that stale data still "just works", but we're changing releases from "always up to date AFDO" to "always out of date AFDO".

as for when the release builders need the AFDO generation, the answer practically speaking is "when the bots run `repo sync`", which is right at the start.  otherwise, shoehorning updated content in behind the back of git means we have release builders producing artifacts we can never reproduce (because the manifest they said they built against isn't actually what they used).

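
A rough Python sketch of the reproducibility point, under loudly hypothetical assumptions: a build counts as reproducible only if rebuilding from the manifest it recorded at `repo sync` time consumes the same inputs. The revisions, profile names, and helper functions below are placeholders for illustration, not real Chrome OS tooling.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical model: each manifest revision pins exactly one AFDO profile.
# These revisions and file names are placeholders, not real Chrome OS data.
PINNED_PROFILE: Dict[str, str] = {
    "rev-1": "profile-v1.afdo",
    "rev-2": "profile-v2.afdo",
}


def fingerprint(manifest_rev: str, profile: str) -> str:
    """Hash of the inputs a build actually consumed."""
    return hashlib.sha256(f"{manifest_rev}:{profile}".encode()).hexdigest()


def build(manifest_rev: str, late_profile: Optional[str] = None) -> Dict[str, str]:
    """Record the manifest the build claims, plus a fingerprint of its real inputs.

    `late_profile` simulates a profile swapped in after `repo sync`,
    outside of git.
    """
    profile = late_profile or PINNED_PROFILE[manifest_rev]
    return {"manifest": manifest_rev, "fingerprint": fingerprint(manifest_rev, profile)}


def reproducible(artifact: Dict[str, str]) -> bool:
    """Rebuild strictly from the recorded manifest and compare input fingerprints."""
    return build(artifact["manifest"])["fingerprint"] == artifact["fingerprint"]
```

`reproducible(build("rev-1"))` is true, whereas `reproducible(build("rev-1", late_profile="profile-v2.afdo"))` is false: the artifact claims `rev-1`, but its inputs no longer match what `rev-1` pins.
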
Ok, so if AFDO cannot be parallelized then our general pipeline cannot change for release branch builds...

I guess the next best optimization would be to allow for the branch Chrome PFQ completion to trigger the release branch builders? 

That might save a bit of time, since today we are (sadly) reliant on manually curated timing: estimating how long the PFQ will take to finish and setting the cron configs so the release builders start after that. We would probably also want to be able to easily turn this off and on, to save builder time and space on less commonly used branches (e.g. N-3 stable). 

We could start triggering each of the steps based on the previous one, instead of manually scheduling things.

That would at least minimize any delays between steps.
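
To make the trade-off concrete, here is a rough Python sketch comparing the two scheduling models discussed above. It assumes a simple linear chain of stages (for example, the branch PFQ followed by the release builders) with made-up durations in hours; the function names and the per-chain `enabled` flag are hypothetical and are not part of any real Chrome OS or LUCI configuration.

```python
from typing import List, Optional


def triggered_starts(durations_h: List[float], enabled: bool = True) -> Optional[List[float]]:
    """Trigger model: each stage starts as soon as the previous one finishes."""
    if not enabled:
        return None  # chain switched off, e.g. for a rarely used branch
    starts, t = [], 0.0
    for d in durations_h:
        starts.append(t)
        t += d
    return starts


def cron_report(durations_h: List[float], cron_starts_h: List[float]) -> List[str]:
    """Cron model: each stage starts at a fixed, hand-estimated time.

    For each stage after the first, report how long it sat idle waiting for
    its scheduled start, or flag that it started before its input was ready.
    """
    report = []
    for i in range(1, len(durations_h)):
        prev_done = cron_starts_h[i - 1] + durations_h[i - 1]
        slack = cron_starts_h[i] - prev_done
        if slack < 0:
            report.append(f"stage {i}: started {-slack:.1f}h before its input was ready")
        else:
            report.append(f"stage {i}: idle {slack:.1f}h")
    return report
```

With a PFQ that actually takes 3.5h and a release builder scheduled at 4.5h, `cron_report([3.5, 6.0], [0.0, 4.5])` reports an hour of idle time, while `triggered_starts([3.5, 6.0])` starts the release stage at 3.5h. If the PFQ runs long (say 5.0h), the cron model starts the release builder before the uprev has landed, which the trigger model cannot do by construction.
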

sounds reasonable, but i have no idea how to pull that off.  defer to Don/Lann :).

(sorry for the delay in replying)

Vapier said it very nicely in #1. We don't want to move to a scheme where the AFDO profiles are always out of sync. It can make reproducibility and fixing issues more difficult.