Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 2 users
Status: Fixed
Owner:
Closed: Aug 31
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment
parallel_emerge doesn't compute the same same dep graph as emerge
Project Member Reported by jclinton@chromium.org, Aug 16 Back to list
parallel_emerge (called from build_packages) can sometimes compute a different dependency graph than emerge. This results in failed builds, non-determinism, and difficult-to-debug problems.

In chatting with some folks, it seems that this bug has been known for some time but its root cause was unknown. The work-around has been to hack individual ebuilds by adding more dependencies to work around the problem. For example, this ebuild was submitted just for this reason:
https://chrome-internal-review.googlesource.com/c/413354

I'm experiencing a ToT flake on Paladins that's preventing me from submitting a CL:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/paladin/builds/3570/steps/BuildPackages/logs/stdio
(Ctrl-F for Coreboot build failure searching for "3rdparty/blobs/soc/intel/apollolake/nhlt-blobs".) Eventually, the missing package that was way up the dependency tree from coreboot, nhlt-blobs, is emerged very late in the build_packages process (long after Coreboot which needed it has been built and failed). 

Most challenging about this is that standard equery tooling doesn't reveal the problem. Instead, an undocumented --tree intercept on parallel_emerge can be used to show that there's an issue:

  # You want me to be verbose? I'll give you two trees! Twice as much value.
  if "--tree" in emerge.opts and "--verbose" in emerge.opts:
    deps.PrintTree(deps_tree)

  deps_graph = deps.GenDependencyGraph(deps_tree, deps_info)

  # OK, time to print out our progress so far.
  deps.PrintInstallPlan(deps_graph)
  if "--tree" in emerge.opts:
    PrintDepsMap(deps_graph)

As we can see from the snippet above, it first prints the deps_tree generated by Portage. Then, it generates a new digraph for use for parallelization. Finally, it prints the completed digraph after computation. Between deps_tree and deps_graph, there will be differences--sometimes catastrophic ones.

For example, I have been able to reproduce the Paladin failure locally using this command:
$ parallel_emerge --board=reef-uni --empty --with-bdeps y --deep --pretend --tree --verbose --update --newuse --select --newrepo --backtrack=90 --rebuild-if-unbuilt --usepkg virtual/target-os virtual/target-os-dev virtual/target-os-test virtual/target-os-factory virtual/target-os-factory-shim chromeos-base/autotest-all

In the output, we see the deps_tree and deps_graph dependencies are different leading to the critical bug.

deps_tree (from Portage):
...
 sys-boot/chromeos-seabios-0.0.1-r71 (merge)                        <-- HAS DEPENDENCIES
   dev-vcs/git-2.12.2 (merge)
   sys-apps/coreboot-utils-0.0.1-r3033 (merge)
 sys-boot/coreboot-9999 (merge)                                     <-- HAS DEPENDENCIES
   chromeos-base/chromeos-config-0.0.1-r2 (merge)
   chromeos-base/chromeos-ec-9999 (merge)
   chromeos-base/vboot_reference-1.0-r1383 (merge)
   dev-vcs/git-2.12.2 (merge)
   sys-apps/coreboot-utils-0.0.1-r3033 (merge)
   sys-boot/chromeos-bmpblk-1.0.1-r33 (merge)
   sys-boot/chromeos-mrc-0.0.1-r245 (merge)
   sys-power/iasl-20150717 (merge)
   virtual/coreboot-private-files-3-r1 (merge)
 sys-boot/coreboot-private-files-baseboard-reef-0.0.1-r26 (merge)
   sys-boot/chromeos-firmware-anx3429-0.0-r1 (merge)
   sys-boot/chromeos-firmware-ps8751-0.0-r2 (merge)
   sys-boot/nhlt-blobs-0.0.1-r3 (merge)
 sys-boot/coreboot-private-files-reef-0.0.1-r9 (merge)              <-- HAS DEPENDENCIES
   sys-boot/coreboot-private-files-baseboard-reef-0.0.1-r26 (merge)
 sys-boot/depthcharge-0.0.1-r1856 (merge)                           <-- HAS DEPENDENCIES
   chromeos-base/chromeos-config-0.0.1-r2 (merge)
   dev-vcs/git-2.12.2 (merge)
   sys-boot/libpayload-0.0.1-r2905 (merge)
...
 virtual/chromeos-test-testauthkeys-1 (merge)
   chromeos-base/chromeos-test-testauthkeys-0.0.1-r4 (merge)
 virtual/coreboot-private-files-3-r1 (merge)                        <-- HAS DEPENDENCIES
   sys-boot/coreboot-private-files-baseboard-reef-0.0.1-r26 (merge)
   sys-boot/coreboot-private-files-reef-0.0.1-r9 (merge)
 virtual/dptf-1-r1 (merge)
   sys-power/dptf-8.4.10100-r2 (merge)
...

deps_graph (from GenDependencyGraph):
...
sys-boot/chromeos-seabios-0.0.1-r71: (merge) needs                  <-- MISSING DEPS
    no dependencies
sys-boot/coreboot-9999: (merge) needs                               <-- MISSING DEPS
    no dependencies
sys-boot/coreboot-private-files-baseboard-reef-0.0.1-r26: (merge) needs
    sys-boot/chromeos-firmware-anx3429-0.0-r1
    sys-boot/chromeos-firmware-ps8751-0.0-r2
    sys-boot/nhlt-blobs-0.0.1-r3
sys-boot/coreboot-private-files-reef-0.0.1-r9: (merge) needs        <-- MISSING DEPS
    no dependencies
sys-boot/depthcharge-0.0.1-r1856: (merge) needs                     <-- MISSING DEPS
    no dependencies
...
virtual/chromeos-test-testauthkeys-1: (merge) needs
    chromeos-base/chromeos-test-testauthkeys-0.0.1-r4
virtual/coreboot-private-files-3-r1: (merge) needs                  <-- MISSING DEPS
    no dependencies
virtual/dptf-1-r1: (merge) needs
    sys-power/dptf-8.4.10100-r2
...

 
Cc: pgeorgi@chromium.org
Thanks for the detail, Jason.

Mike, have you seen a fix for this upstream?
Some of the other dependencies that are different (eg. seabios' reliance on coreboot-utils) have the same root cause as the nhlt-blobs issue in coreboot:
They're only added as DEPENDS and (as documented over at Gentoo) portage considers DEPENDS as pure build time dependencies which aren't required when installing from binary.
There might be some serializing event somewhere (an RDEPEND in another earlier packages) that prevents the build from failing over coreboot-utils.

Is the difference that deps_tree lists _all_ dependencies (eg. seabios also requiring git, which is only needed to download the sources) while deps_graph only lists those packages that require action on this system (there's a seabios binpkg already -> no need to deal with the sources)?

In that case, this output indicates that things are working as intended (by the Gentoo folks) and then it's unlikely that there's a "fix". We need to be more careful about our DEPENDS and RDEPENDS though.
DEPEND vs RDEPEND doesn't explain the problem: coreboot is being being built from source. As I happen to have cros_workon'd the coreboot ebuild, I can also disprove that hypothesis by making DEPEND and RDEPEND equal. It doesn't change the output. So, no, this isn't WAI.

Also, AFAICT from tracing the code, deps_tree and deps_graph are supposed to have the same "needs" for each node. Unfortunately, parallel_emerge has 1 KLOC of an artisanally crafted digraph implementation with no tests and isn't particularly easy to follow.

Added some debugging statements into the dep_graph generation code path. As the tree traversal is occurring, this prints out every dep_tree node visited. Of particular note here is that dep_tree is... not a tree and yet it is treated like both a tree and forest. This is just wrong. It would be more accurate if this code treated this like a sparse tree. It seems that sometimes the "sparse" nodes win out over fully populated nodes. Looking for the root cause. Sample output for just coreboot-related adjacencies:

DEBUG: {sys-boot/chromeos-bootimage-0.0.3-r4}
{ 'action': 'merge',
  'deps': { 'chromeos-base/chromeos-config-0.0.1-r2': { 'action': 'merge',
                                                        'deps': { },
                                                        'deptypes': [ 'optional']},
            'chromeos-base/chromeos-ec-9999': { 'action': 'merge',
                                                'deps': { },
                                                'deptypes': ['optional']},
            'chromeos-base/vboot_reference-1.0-r1383': { 'action': 'merge',
                                                         'deps': { },
                                                         'deptypes': [ 'optional']},
            'sys-boot/chromeos-bmpblk-1.0.1-r33': { 'action': 'merge',
                                                    'deps': { },
                                                    'deptypes': ['optional']},
            'sys-boot/chromeos-seabios-0.0.1-r71': { 'action': 'merge',
                                                     'deps': { },
                                                     'deptypes': [ 'optional']},
            'sys-boot/coreboot-9999': { 'action': 'merge',
                                        'deps': { },                                 <-- NO DEPS
                                        'deptypes': ['optional']},
            'sys-boot/depthcharge-0.0.1-r1856': { 'action': 'merge',
                                                  'deps': { },
                                                  'deptypes': ['optional']}}}

[recursion to coreboot-9999 node occurs]

DEBUG: {sys-boot/coreboot-9999}
{ 'action': 'merge', 'deps': { }, 'deptypes': ['optional']}                          <-- EMPTY NODE ABOVE VISITED

[much later in the dep_tree traversal, there's a top-level package node]

DEBUG: {sys-boot/coreboot-9999}
{ 'action': 'merge',
  'deps': { 'chromeos-base/chromeos-config-0.0.1-r2': { 'action': 'merge',
                                                        'deps': { },
                                                        'deptypes': [ 'optional']},
            'chromeos-base/chromeos-ec-9999': { 'action': 'merge',
                                                'deps': { },
                                                'deptypes': ['optional']},
            'chromeos-base/vboot_reference-1.0-r1383': { 'action': 'merge',
                                                         'deps': { },
                                                         'deptypes': [ 'optional']},
            'dev-vcs/git-2.12.2': { 'action': 'merge',
                                    'deps': { },
                                    'deptypes': ['optional']},
            'sys-apps/coreboot-utils-0.0.1-r3033': { 'action': 'merge',
                                                     'deps': { },
                                                     'deptypes': [ 'optional']},
            'sys-boot/chromeos-bmpblk-1.0.1-r33': { 'action': 'merge',
                                                    'deps': { },
                                                    'deptypes': ['optional']},
            'sys-boot/chromeos-mrc-0.0.1-r245': { 'action': 'merge',
                                                  'deps': { },
                                                  'deptypes': ['optional']},
            'sys-power/iasl-20150717': { 'action': 'merge',
                                         'deps': { },
                                         'deptypes': ['optional']},
            'virtual/coreboot-private-files-3-r1': { 'action': 'merge',
                                                     'deps': { },
                                                     'deptypes': [ 'optional',
                                                                   'optional']}}}


The failing code is:

        # Add dependencies to this package.
        for dep, dep_item in packages[pkg]["deps"].iteritems():
          print("PKG {%s} DEP %s" % (pkg, dep))
          # We only need to enforce strict ordering of dependencies if the
          # dependency is a blocker, or is a buildtime or runtime dependency.
          # (I.e., ignored, optional, and runtime_post dependencies don't
          # depend on ordering.)
          dep_types = dep_item["deptypes"]
          if needed_dep_types.intersection(dep_types):
            deps_map[dep]["provides"].add(pkg)
            this_pkg["needs"][dep] = "/".join(dep_types)

Specifically, "needed_dep_types.intersection(dep_types)":

The list of needed_dep_types is defined as:
      needed_dep_types = set(["blocker", "buildtime", "buildtime_slot_op",
                              "runtime", "runtime_slot_op"])

Not a single package in the entire tree of deps_tree extracted from Portage has deptypes set to "buildtime". Almost everything is either "runtime" or "optional". As can be seen in #c4, the coreboot package has only "optional" deptypes populated. Continuing to research...

The root cause is that the tree that is produced by Portage Python library with --usepkg enabled (as it is on the Paladins and from build_packages), always outputs either "runtime" or "optional" and never "buildtime" even when there are packages that are being built (not available on the getbinpkg host or in PKGDIR, for example, as the result of uprev). This is a bug. Given how old our copy of Portage is, I'm not going to look through 6 years of Portage development to see if this was fixed. Instead, I'm working around the problem in a simple, 6-line patch to parallel_emerge. With this patch, the problem is resolved and a clean build_packages works in the correct order without failing.


Patch out for review at https://chromium-review.googlesource.com/c/619869 . After this change, the --tree intercept output is correct:

deps_tree:

sys-boot/coreboot-9999 (merge)
   chromeos-base/chromeos-config-0.0.1-r2 (merge)
   chromeos-base/chromeos-ec-9999 (merge)
   chromeos-base/vboot_reference-1.0-r1383 (merge)
   dev-vcs/git-2.12.2 (merge)
   sys-apps/coreboot-utils-0.0.1-r3034 (merge)
   sys-boot/chromeos-bmpblk-1.0.1-r33 (merge)
   sys-boot/chromeos-mrc-0.0.1-r245 (merge)
   sys-power/iasl-20150717 (merge)
   virtual/coreboot-private-files-3-r1 (merge)

...
deps_graph:

sys-boot/coreboot-9999: (merge) needs
    chromeos-base/chromeos-config-0.0.1-r2
    chromeos-base/chromeos-ec-9999
    chromeos-base/vboot_reference-1.0-r1383
    dev-vcs/git-2.12.2
    sys-apps/coreboot-utils-0.0.1-r3034
    sys-boot/chromeos-bmpblk-1.0.1-r33
    sys-boot/chromeos-mrc-0.0.1-r245
    sys-power/iasl-20150717
    virtual/coreboot-private-files-3-r1

Project Member Comment 8 by bugdroid1@chromium.org, Aug 18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/14e53987cfbb1e99fa72f7517910acbb4e8ca5d4

commit 14e53987cfbb1e99fa72f7517910acbb4e8ca5d4
Author: Jason D. Clinton <jclinton@chromium.org>
Date: Fri Aug 18 07:57:29 2017

parallel_emerge: Work around Portage library bug with usepkg

When using --usepkg, the Portage library imported into parallel_emerge
will always return "optional" instead of "buildtime" as the deptypes
even when a package is being built from source (ie. the binary package
is not available or an uprev is occuring). This has caused years of
frustration and loathing for the DEPEND and RDEPEND variables.

This works around the problem by always treating 'optional' as a
strong package dependency when usepkg is enabled.

BUG= chromium:756240 ,b:64117873, chromium:741043 
TEST=./setup_board for previously failing board, ./build_packages to
confirm that non-determinism in package install order is gone. (And
that coreboot builds as expected.)

Change-Id: I7c93802d5b7b82d56e86fecf4ead0eab7445762d
Reviewed-on: https://chromium-review.googlesource.com/619869
Commit-Ready: Jason Clinton <jclinton@chromium.org>
Tested-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: Aaron Durbin <adurbin@chromium.org>

[modify] https://crrev.com/14e53987cfbb1e99fa72f7517910acbb4e8ca5d4/scripts/parallel_emerge.py

Status: Fixed
There, that should make RDEPEND/DEPEND do what they're supposed to do, for now.
Owner: jclinton@chromium.org
Great work Jason! Do you have any idea how many people the bug affected / how many workarounds there might be in our ebuilds?
I would speculate that any non-upstream ebuild where RDEPEND and DEPEND are equal should be examined. The only downside to folks having done to work around this bug would be that there are unnecessary build-time deps getting shipped in the device images (eating up space). But, every ebuild would have to be examined on a case-by-case basis to see if there's an opportunity to split the RDEPEND and DEPEND variables. Maybe some kind of PSA is warranted.
Yes a PSA sounds useful, just so people are aware, even if they don't necessarily take action.
PSA sent.
Status: Started
CL was suspected of SDK build pipeline breakage and was reverted. The SDK build is in a very poor state and so is not on CQ. We don't know yet if the revert will fix SDK build.

At this point, toolchain team has not provided a root cause or way to reproduce the failure: the SDK build cannot currently be run on a workstation. Without a way to make progress or any way to trace down the problem, we might have to live with parallel_emerge flakiness.

Waiting for an SDK build sometime in the next 24 hours to see where we are.
Labels: OS-Chrome
(FWIW, the SDK builder is far past the original failure now:

https://build.chromium.org/p/chromiumos/builders/chromiumos-sdk/builds/8103

so we're probably good on that front.)

---

[Pasting some bits from chat, with some commentary]

I managed to manually run the SDK build steps on my workstation overnight, and produced a tarball that yields the same errors as seen in the SDK breakage bug [1]. I'm not 100% I did this all correctly, but it looks reasonable to me. Here's roughly the steps I did (gleaned from the chromite scripts and plagiarizing the logs from builbot runs):

---

## new checkout
repo init
repo sync

# Init SDK
cros_sdk --bootstrap --replace

# SetupBoard amd64-host -- for building host/SDK tools
cros_sdk -- ./setup_board --board=amd64-host --skip_chroot_upgrade --nousepkg --reuse_pkgs_from_local_boards

# SDKBuildToolchains
cros_sdk -- sudo -n -- /mnt/host/source/chromite/bin/cros_setup_toolchains --nousepkg
cros_sdk -- sudo -n -- /mnt/host/source/chromite/bin/cros_setup_toolchains --create-packages --output-dir /tmp/toolchain-pkgs

# SDKPackage
cd chroot/build/amd64-host
sudo -n XZ_OPT=-e9 -- tar --sparse -I /usr/bin/xz -cf ../../../built-sdk.tar.xz .
cd -

# SDKTest
cros_sdk --chroot new-sdk-chroot --download --replace --nousepkg --url file://$(pwd)/built-sdk.tar.xz

---

It looks like that yielded a tarball that was missing several top-level stub directories (including /home), which would explain the failures:

---

good: (tarball downloaded from previous runs)
$ find home proc media mnt root sys
home
home/chronos
home/chronos/user
proc
media
mnt
mnt/stateful_partition
root
sys

bad: (generated from my workstation, using the alleged bad CL)
$ find home proc media mnt root sys
find: `home': No such file or directory
find: `proc': No such file or directory
find: `media': No such file or directory
find: `mnt': No such file or directory
find: `root': No such file or directory
find: `sys': No such file or directory

---

Hopefully that's enough to go on for reproducing and fixing this before trying to reland the feature.


[1] https://bugs.chromium.org/p/chromium/issues/detail?id=757147
All of the missing directories seem to be created in the pkg_postinst phase of chromeos-base/chromeos-base.  Still checking to see what the parallel_emerge change would have changed about this package.
Project Member Comment 18 by bugdroid1@chromium.org, Aug 30
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/ac65d3933375eafe17339e3ac579c82c95d97e04

commit ac65d3933375eafe17339e3ac579c82c95d97e04
Author: Benjamin Gordon <bmgordon@chromium.org>
Date: Wed Aug 30 23:13:19 2017

chromeos-base: Depend on bash when building amd64-host

chromeos-base needs /bin/bash to exist inside the sysroot before it gets
merged, but DEPENDS only includes app-shells/bash when not building the
host packages.  We were lucking out previously due to build orders, but
crrev.com/c/619869 will change the merge order and break this when
re-landed.

BUG= chromium:756240 
TEST=Sent to buildbots with crrev.com/c/619869 re-added.

Change-Id: I096565d1fa8ae6aec3a33e274ca4ee006b44c0d4
Reviewed-on: https://chromium-review.googlesource.com/643566
Commit-Ready: Benjamin Gordon <bmgordon@chromium.org>
Tested-by: Benjamin Gordon <bmgordon@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[rename] https://crrev.com/ac65d3933375eafe17339e3ac579c82c95d97e04/chromeos-base/chromeos-base/chromeos-base-0-r139.ebuild
[modify] https://crrev.com/ac65d3933375eafe17339e3ac579c82c95d97e04/chromeos-base/chromeos-base/chromeos-base-0.ebuild

Great to find this, thanks Ben!
We can push the parallel_emerge fix back in again now?
Cc: bmgordon@chromium.org
cool. thanks for the update.
Project Member Comment 24 by bugdroid1@chromium.org, Aug 31
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/670b697cdd040b5d73f4b80f558dcfc01846d9ee

commit 670b697cdd040b5d73f4b80f558dcfc01846d9ee
Author: Benjamin Gordon <bmgordon@chromium.org>
Date: Thu Aug 31 21:19:58 2017

Reland "parallel_emerge: Work around Portage library bug with usepkg"

This reverts commit 95f30c12f2f0012aa5bc45ee05b1cdeae9c7b9ea.

crrev.com/c/643566 fixed the original source of breakage, so this should
now be safe to put in again.

BUG= chromium:756240 
TEST=Sent to chromiumos-sdk trybot

Change-Id: Icb9f5f7669e8e5514f698a175693203e06e3a8e9
Reviewed-on: https://chromium-review.googlesource.com/644617
Commit-Ready: Benjamin Gordon <bmgordon@chromium.org>
Tested-by: Benjamin Gordon <bmgordon@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>

[modify] https://crrev.com/670b697cdd040b5d73f4b80f558dcfc01846d9ee/scripts/parallel_emerge.py

Status: Fixed
So now we wait for a toolchain build?
Ben submitted a toolchain trybot build to prove that it's fine: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/chromiumos-sdk/builds/2522

Anyone who syncs to ToT at this point will get the corrected parallel_emerge behavior. I re-announced it to the chromium-os-dev list around the same time I closed this bug.

OK great, thanks
Project Member Comment 29 by bugdroid1@chromium.org, Sep 1
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/2e44bdf2651418ed8cdbf9633d89195a5f13a1aa

commit 2e44bdf2651418ed8cdbf9633d89195a5f13a1aa
Author: Benjamin Gordon <bmgordon@chromium.org>
Date: Fri Sep 01 19:39:24 2017

chromeos-base: Move pkg_postinst to pkg_preinst.

When pkg_postinst fails, portage doesn't know that anything went wrong
and proceeds, but this leaves the SDK in a non-functioning state.  Move
the logic into pkg_preinst where failures will be treated as a merge
failure.  This would have caught things like  crbug.com/757147  much
sooner.

BUG= chromium:756240 
TEST=Sent to chomiumos-sdk trybot.

Change-Id: I3a83ec1aa5b7a1014a2c1a770208fc2dc16ba9b0
Reviewed-on: https://chromium-review.googlesource.com/644096
Commit-Ready: Benjamin Gordon <bmgordon@chromium.org>
Tested-by: Benjamin Gordon <bmgordon@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/2e44bdf2651418ed8cdbf9633d89195a5f13a1aa/chromeos-base/chromeos-base/chromeos-base-0.ebuild
[rename] https://crrev.com/2e44bdf2651418ed8cdbf9633d89195a5f13a1aa/chromeos-base/chromeos-base/chromeos-base-0-r140.ebuild

Sign in to add a comment