autotest CLs cause 700+ packages to be build, resulting in major CQ slowdown |
||||||||||||
Issue descriptionAs of https://b.corp.google.com/issues/75396373 , ebuilds are being uprevved whenever an ebuild they depend on is changed. This means we rebuild a lot more packages from source than we used to. For example, a tryjob I launched containing an autotest CL stack https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8940829691880334304 resulted in 763 packages being rebuilt, which extended BuildPackages stage to 1 hour. Much worse is the impact on the CQ. Metrics such as http://shortn/_vD9z3Wc2BR indicate a bimodal behavior in the time taken in build_packages on paladins -- during the week, pretty much every CQ run will have at least 1 autotest CL, so builds almost always take 1+ hours (and worse in the tail, and the CQ is tail-sensitive). In the past, BuildPackages took ~20-30 minutes in the CQ in the common case. This behavior is probably because there are a number of ebuilds that use the autotest repo as cros working uprevver, and 1 of those is probably very deep in the dependency graph. Still open questions: - Which autotest related ebuild is the super deep on in the dep graph. - Which other commonly modified repos might also be triggering uprevs that are deep in the dep graph. Possible mitigations to investigate: 1) Change ebuild dependencies so the offending autotest ebuild(s) are not so deep in the graph. 2) Change those ebuild's uprev behavior (using SUBDIRS_TO_UPREV, or more advanced methods) so that not every autotest ebuild gets uprevved all the time. [Note: this idea has a long history and was problematic at times, but might be more predictable in this always-build-reverse-deps world, see Issue 655884 Issue 791888 ] 2a) Split the autotest repo up so that not so many ebuilds are pointing to a single high traffic repo. 3) Revert the behavior from https://b.corp.google.com/issues/75396373 and achieve the needed uprevs there by some other means. 4+) Your ideas here.
,
Jul 17
For now, we don't really have any other solution to the unibuild config ebuilds breaking on the builders (the original motivation for the revdeps change). So, I think for this specific example of a performance regression, we should look at why so many things depend on Autotest.
,
Jul 17
Is there a convenient tool to dump the ebuild dependency graph, so we can inspect it?
,
Jul 17
`equery depgraph` for forward dependencies, `equery depends --indirect` for reverse dependencies. The latter doesn't really give you a graph, but you could slowly build one by calling it recursively (without --indirect).
,
Jul 17
Depending on your definition of "forward" and "reverse" you might want to reverse "forward" and "reverse" in that comment...
,
Jul 18
-> yshaul is taking a crack at investigating this.
,
Jul 18
Another example was the builds kicked off in the pre-cq build for: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1140750 This is an ebuild change which I don't think should even be included or built by many things at this point, so I would expect build_packages to be all from prebuilts. For some reason https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/emergePackageDetails?cidbBuildId=2758925 took twice as long as the average and rebuilt everything (and did not build virglrenderer). One thing that you can do is use the build json file and walk back from the last package to the one that finished before it started, and so-on to get the entire critical path for a build.
,
Jul 19
Timeout building packages: https://luci-milo.appspot.com/buildbot/chromiumos/moblab-generic-vm-paladin/2123 Maybe instead of rebuilding every package to solve the cros config issue, we tag the specific packages that statically link against another dep, and only rebuild those statically linked packages.
,
Jul 19
Issue 865258 has been merged into this issue.
,
Jul 19
Based on repo case in issue 865258 , I now believe that this is a bug in the revdeps calculation and not actually an issue in with the depgraph.
,
Jul 19
After a bit more digging and consulting our portage expert, it appears that the calculation of the packages to consider for reverse dependencies calcs are too aggressive and we will remove all packages that are not considered upgrades from that list.
,
Aug 21
Is there an update to this issue?
,
Aug 21
Nothing yet, it has gotten pushed way down my priority stack.
,
Aug 21
Isn't this causing major developer productivity hit because local builds aren't using prebuilts properly?
,
Aug 21
Until the higher priority issues are fixed it unfortunately is on the back burner, without it builds don't properly uprev config changes. Building locally you can always use --nowithrevdeps.
,
Aug 21
There was a bug related to the calculation that was fixed after this bug was filed. Can't find the bug reference now. I'm pretty sure that the original request related to Autotest has been fixed. At the moment, we believe that the calculation is 100% correct and the only thing that might still be tracked by this bug is a general optimization strategy. I'm inclined to just close this.
,
Aug 21
Can we verify how many packages are rebuilt when an autotest CL is included?
,
Aug 22
Hey, if you mark this as fixed, please dedup issue 865258 . I still see ALL packages (except chromeos-chrome) rebuilt from source on clean local rebuild. It's severely affecting our productivity. Full logs attached. $ ./setup_board --board=betty --force && ./build_packages --board=betty ... Total: 783 packages (779 new, 4 reinstalls, 1 binary), Size of downloads: 1686257 KiB
,
Aug 22
And please check if the problem reproduces in your local checkout. Maybe it's a problem specific to my checkout. But I heard hidehiko@ also saw this issue.
,
Aug 22
> $ ./setup_board --board=betty --force && ./build_packages --board=betty I just ran this and got: Total: 783 packages (779 new, 4 reinstalls, 1 binary), Size of downloads: 1975785 KiB Yet, during the simulation a little higher, we see: Total: 779 packages (779 new, 629 binaries), Size of downloads: 4215301 KiB The logic that was added for revdeps is only adding a few additional packages: INFO : final reverse dependencies that will be rebuilt: sys-kernel/linux-headers sys-libs/gcc-libs sys-libs/libcxx sys-libs/libcxxabi Something is completely disabling prebuilts on eng desktops in the Portage final invocation. I'm not sure if it's related to this bug. If we look at a coral-paladin builder from over the weekend a smallish number of changes, we see something like for a final calculation: Total: 632 packages (155 upgrades, 1 downgrade, 1 new, 475 reinstalls, 1 binary), Size of downloads: 586098 KiB
,
Aug 22
Re c#20: Can we spin off a bug for the prebuilts issue and find an owner for it?
,
Aug 23
Regarding #20, it is not just eng desktops. CQ is rebuilding almost everything resulting in atleast an extra hour or more spent in build_packages. e.g. https://uberchromegw.corp.google.com/i/chromeos/builders/bob-paladin/builds/3855 on 08/22 Looking at build_packages stage, Only 1 package (Chrome) is used as prebuilt: Total: 785 packages (781 new, 4 reinstalls, 1 binary), Size of downloads: 598482 KiB
,
Aug 23
I agree that something is going on on the CQ but it's somehow different from what's happening on Eng desktops and not as simple as you state: http://shortn/_FI4EF9C5i4 . We can see that, across all of the data, on weekends when there are very few changes, very few packages are being rebuilt and prebuilts are being used. In contrast, a fresh checkout on an Eng desktop is not using any prebuilts. We need someone on the Build team to investigate both issues.
,
Aug 23
With regard to the revert suggested in (3) of the original report, note that reverting the changes in https://b.corp.google.com/issues/75396373 would make that bug come back which would arguably be worse: any change to a cros-config related package would work fine locally but then fail on the PreCQ or CQ (depending on the chroot state) because of the differences between local cros_workon's treatment of dependencies (ignored and always rebuilt if workon'd) and the on-CQ uprev logic which only concerns itself with packages that are actually uprev'd (and therefore must rely on depgraph analysis to solve deps problems). Those bugs are very frustrating for those trying to land CL's and for the CQ runs they break: the author is left speculating about what the depgraph on the chroot of the failure looks like. Also, the current behavior, even in its aggressive form, is also blocking similar "spooky action at a distance" bugs introduced by CL's that would only manifest when the reverse deps are rebuilt later: so-called N+1 breakages.
,
Aug 23
,
Aug 23
I believe I have a good work-around for this that will still preserve the necessary upgreving and stop building all of the unnecessary packages. CL should be ready by EOD.
,
Aug 23
,
Aug 27
,
Aug 27
Fix has been in CQ since early Friday afternoon, no luck landing it just yet.
,
Aug 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosutils/+/0766df471295c1b386b0bb284327ce9a374d98f2 commit 0766df471295c1b386b0bb284327ce9a374d98f2 Author: Gregory Meinke <gmeinke@google.com> Date: Mon Aug 27 19:14:11 2018 build_packages with fewer revdeps Cleaning up reverse dependency calculations. Removing saving the install plan currently calculated in parallel_emerge.py with a grep of the emerge simulation output. This will only pickup the changes that are to ebuilds instead of the previously incorrect reverse dependencies on the binary packages. Will cleanup the rest of the install plan code in another CL once this CL is working correctly locally and in the CQ. BUG= chromium:864309 TEST=local builds Change-Id: Id77ad050679fb701d2b455e462fc91ba1c09d0ff Reviewed-on: https://chromium-review.googlesource.com/1187194 Commit-Queue: Gregory Meinke <gmeinke@chromium.org> Tested-by: Gregory Meinke <gmeinke@chromium.org> Trybot-Ready: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Gregory Meinke <gmeinke@chromium.org> [modify] https://crrev.com/0766df471295c1b386b0bb284327ce9a374d98f2/build_packages
,
Aug 28
this should be resolved now. if we have new data to show it's not working out, please post.
,
Aug 30
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/1756533565918f29ad948d5a5ebd0aec17977caa commit 1756533565918f29ad948d5a5ebd0aec17977caa Author: Gregory Meinke <gmeinke@chromium.org> Date: Thu Aug 30 08:27:57 2018 cbuildbot: remove saving emerge install plan New reverse dependency calculation does not depend on the parallel emerge install plan being saved to disk. Remove from cbuildbot and then will remove from the remaining shell scripts and parallel emerge. BUG= chromium:864309 TEST=local unittests, local builds and tryjobs Change-Id: Ia01573c368b028d02e4acc522bc053a87e4533a7 Reviewed-on: https://chromium-review.googlesource.com/1190424 Commit-Ready: Gregory Meinke <gmeinke@chromium.org> Tested-by: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: C Shapiro <shapiroc@google.com> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/1756533565918f29ad948d5a5ebd0aec17977caa/cbuildbot/stages/build_stages.py [modify] https://crrev.com/1756533565918f29ad948d5a5ebd0aec17977caa/cbuildbot/commands.py
,
Aug 30
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosutils/+/0206e1b50738a8340864da7a73ee833b65fa3f62 commit 0206e1b50738a8340864da7a73ee833b65fa3f62 Author: Gregory Meinke <gmeinke@chromium.org> Date: Thu Aug 30 08:27:57 2018 remove install plan from build scripts Remove the flags for saving the emerge install plan from build scripts. Saving this incorrect data to an intermediate file is no longer necessary since the new reverse dependency calculation code is already submitted. BUG= chromium:864309 TEST=local builds and tryjobs CQ-DEPEND=CL:1190424 Change-Id: I405400bbdef0d8c88b4877eb1fab38dea5af8217 Reviewed-on: https://chromium-review.googlesource.com/1191762 Commit-Ready: Gregory Meinke <gmeinke@chromium.org> Tested-by: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/0206e1b50738a8340864da7a73ee833b65fa3f62/setup_board [modify] https://crrev.com/0206e1b50738a8340864da7a73ee833b65fa3f62/update_chroot [modify] https://crrev.com/0206e1b50738a8340864da7a73ee833b65fa3f62/build_packages
,
Aug 31
,
Aug 31
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/68de241167e09532d634d582115d770a73868052 commit 68de241167e09532d634d582115d770a73868052 Author: Gregory Meinke <gmeinke@chromium.org> Date: Fri Aug 31 18:20:16 2018 parallel_emerge: remove saving of install plan Remove the unused code to save the emerge install plan to disk for the old reverse dependency calculations. BUG= chromium:864309 TEST=local builds and tryjobs Change-Id: Ica8edd78c9bc97c996f17570f5c7042c143678fd Reviewed-on: https://chromium-review.googlesource.com/1196823 Commit-Ready: Gregory Meinke <gmeinke@chromium.org> Tested-by: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/68de241167e09532d634d582115d770a73868052/scripts/parallel_emerge.py |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by vapier@chromium.org
, Jul 17Owner: gmeinke@chromium.org