coral-paladin: CQ build failing with "file collision"
Issue description

In recent CQ runs:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927773249130514032

the coral-paladin build is failing and showing "file collision" in the build packages log:

Started chromeos-base/chromeos-config-bsp-coral-0.0.1-r37 (logged in /tmp/chromeos-config-bsp-coral-0.0.1-r37-lpYCqG)
=== Start output for job chromeos-config-bsp-coral-0.0.1-r37 (0m6.9s) ===
chromeos-config-bsp-coral-0.0.1-r37: >>> Emerging binary (1 of 1) chromeos-base/chromeos-config-bsp-coral-0.0.1-r37::coral for /build/coral/
chromeos-config-bsp-coral-0.0.1-r37: * Running stacked hooks for pre_pkg_setup
chromeos-config-bsp-coral-0.0.1-r37: * sysroot_build_bin_dir ...
chromeos-config-bsp-coral-0.0.1-r37: [ ok ]
chromeos-config-bsp-coral-0.0.1-r37: * Running stacked hooks for post_pkg_setup
chromeos-config-bsp-coral-0.0.1-r37: * python_eclass_hack ...
chromeos-config-bsp-coral-0.0.1-r37: [ ok ]
chromeos-config-bsp-coral-0.0.1-r37: pbzip2: *WARNING: Trailing garbage after EOF ignored!
chromeos-config-bsp-coral-0.0.1-r37: >>> Installing (1 of 1) chromeos-base/chromeos-config-bsp-coral-0.0.1-r37::coral to /build/coral/
chromeos-config-bsp-coral-0.0.1-r37: * This package will overwrite one or more files that may belong to other
chromeos-config-bsp-coral-0.0.1-r37: * packages (see list below). You can use a command such as `portageq
chromeos-config-bsp-coral-0.0.1-r37: * owners / <filename>` to identify the installed package that owns a
chromeos-config-bsp-coral-0.0.1-r37: * file. If portageq reports that only one package owns a file then do
chromeos-config-bsp-coral-0.0.1-r37: * NOT file a bug report. A bug report is only useful if it identifies at
chromeos-config-bsp-coral-0.0.1-r37: * least two or more packages that are known to install the same file(s).
chromeos-config-bsp-coral-0.0.1-r37: * If a collision occurs and you can not explain where the file came from
chromeos-config-bsp-coral-0.0.1-r37: * then you should simply ignore the collision since there is not enough
chromeos-config-bsp-coral-0.0.1-r37: * information to determine if a real problem exists. Please do NOT file
chromeos-config-bsp-coral-0.0.1-r37: * a bug report at http://bugs.gentoo.org unless you report exactly which
chromeos-config-bsp-coral-0.0.1-r37: * two packages install the same file(s). See
chromeos-config-bsp-coral-0.0.1-r37: * http://wiki.gentoo.org/wiki/Knowledge_Base:Blockers for tips on how to
chromeos-config-bsp-coral-0.0.1-r37: * solve the problem. And once again, please do NOT file a bug report
chromeos-config-bsp-coral-0.0.1-r37: * unless you have completely understood the above message.
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Detected file collision(s):
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * /build/coral/tmp/chromeos-config/config_dump.json
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Searching all installed packages for file collisions...
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Press Ctrl-C to Stop
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * chromeos-base/chromeos-config-bsp-coral-private-0.0.1-r1097:0::coral-private
chromeos-config-bsp-coral-0.0.1-r37: * /build/coral/tmp/chromeos-config/config_dump.json
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Package 'chromeos-base/chromeos-config-bsp-coral-0.0.1-r37' NOT merged
chromeos-config-bsp-coral-0.0.1-r37: * due to file collisions. If necessary, refer to your elog messages for
chromeos-config-bsp-coral-0.0.1-r37: * the whole content of the above message.
chromeos-config-bsp-coral-0.0.1-r37: >>> Failed to install chromeos-base/chromeos-config-bsp-coral-0.0.1-r37 to /build/coral/, Log file:
chromeos-config-bsp-coral-0.0.1-r37: >>> '/build/coral/tmp/portage/logs/chromeos-base:chromeos-config-bsp-coral-0.0.1-r37:20181207-204434.log'
chromeos-config-bsp-coral-0.0.1-r37:
chromeos-config-bsp-coral-0.0.1-r37: * Messages for package chromeos-base/chromeos-config-bsp-coral-0.0.1-r37 merged to /build/coral/:
chromeos-config-bsp-coral-0.0.1-r37:
chromeos-config-bsp-coral-0.0.1-r37: * This package will overwrite one or more files that may belong to other
chromeos-config-bsp-coral-0.0.1-r37: * packages (see list below). You can use a command such as `portageq
chromeos-config-bsp-coral-0.0.1-r37: * owners / <filename>` to identify the installed package that owns a
chromeos-config-bsp-coral-0.0.1-r37: * file. If portageq reports that only one package owns a file then do
chromeos-config-bsp-coral-0.0.1-r37: * NOT file a bug report. A bug report is only useful if it identifies at
chromeos-config-bsp-coral-0.0.1-r37: * least two or more packages that are known to install the same file(s).
chromeos-config-bsp-coral-0.0.1-r37: * If a collision occurs and you can not explain where the file came from
chromeos-config-bsp-coral-0.0.1-r37: * then you should simply ignore the collision since there is not enough
chromeos-config-bsp-coral-0.0.1-r37: * information to determine if a real problem exists. Please do NOT file
chromeos-config-bsp-coral-0.0.1-r37: * a bug report at http://bugs.gentoo.org unless you report exactly which
chromeos-config-bsp-coral-0.0.1-r37: * two packages install the same file(s). See
chromeos-config-bsp-coral-0.0.1-r37: * http://wiki.gentoo.org/wiki/Knowledge_Base:Blockers for tips on how to
chromeos-config-bsp-coral-0.0.1-r37: * solve the problem. And once again, please do NOT file a bug report
chromeos-config-bsp-coral-0.0.1-r37: * unless you have completely understood the above message.
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Detected file collision(s):
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * /build/coral/tmp/chromeos-config/config_dump.json
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Searching all installed packages for file collisions...
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Press Ctrl-C to Stop
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * chromeos-base/chromeos-config-bsp-coral-private-0.0.1-r1097:0::coral-private
chromeos-config-bsp-coral-0.0.1-r37: * /build/coral/tmp/chromeos-config/config_dump.json
chromeos-config-bsp-coral-0.0.1-r37: *
chromeos-config-bsp-coral-0.0.1-r37: * Package 'chromeos-base/chromeos-config-bsp-coral-0.0.1-r37' NOT merged
chromeos-config-bsp-coral-0.0.1-r37: * due to file collisions. If necessary, refer to your elog messages for
chromeos-config-bsp-coral-0.0.1-r37: * the whole content of the above message.
=== Complete: job chromeos-config-bsp-coral-0.0.1-r37 (0m6.9s) ===
Dec 10

See comment in: https://chromium-review.googlesource.com/c/chromiumos/overlays/board-overlays/+/1369731 and bug: b/120774883 for context.

This file was deleted by chromeos-config-bsp-coral-private a long time ago. I don't understand why the portage package is so stale in the first place.

to current bobby
Dec 10

IIUC, when you move a file between ebuilds, you need to add a blocker between them. I see no such blocker.

Also, this isn't just a CQ issue. I see it on my local build with build_packages. My local build root still has -r1100 installed:

chromeos-base/chromeos-config-bsp-coral-private-0.0.1-r1100::coral-private

And you killed config_dump.json in the private overlay here: https://chrome-internal-review.googlesource.com/c/chromeos/overlays/overlay-coral-private/+/721813/ which should show up in r1102.
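
For readers unfamiliar with blockers, here is a minimal sketch of what such a declaration could look like in the public ebuild. It is an illustration under assumptions, not a copy of any actual change: the version bound is taken from the -r1102 mentioned above, and the real ebuild may phrase its dependencies differently. As I understand Gentoo's blocker semantics, a single `!` (a "weak" blocker) lets the package manager replace the blocked package as part of the same upgrade rather than failing on the shared file.

```shell
# Hypothetical fragment for the public chromeos-config-bsp-coral ebuild.
# The version bound (-r1102) is an assumption based on the comment above;
# the blocker in the actual fix may be written differently.

# "!<atom" is a weak blocker: versions of the private package older than the
# bound must not remain installed alongside this package.
RDEPEND="
	!<chromeos-base/chromeos-config-bsp-coral-private-0.0.1-r1102
"
```

With a declaration like this, the package manager has an explicit reason to deal with the old private package when it installs the public one, instead of doing so at some arbitrary later point in the build.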
Dec 10
I'm testing this on my local build: https://chromium-review.googlesource.com/c/chromiumos/overlays/board-overlays/+/1370424
Dec 10
Is the missing blocker what caused the ebuild to be stale? I'm looking into this but still very disoriented.
Dec 10

Yes, I believe that is the root cause of the failure. For example, in the above linked paladin run, I see:

[ebuild U ] chromeos-base/chromeos-config-bsp-coral-private-0.0.1-r1103:0/chromeos-config-bsp-coral-private-0.0.1-r1103::coral-private [0.0.1-r1097:0/chromeos-config-bsp-coral-private-0.0.1-r1097::coral-private] to /build/coral/ USE="-cros_host" 0 KiB

So, the -private build was slated for upgrade, but portage didn't know it needed to *really* upgrade it before the public ebuild, because we didn't have the blocker.
Dec 10

> Is the missing blocker what caused the ebuild to be stale?

Sorry, I don't think I answered the "stale" part completely. IIUC, some builders do incremental builds, and this means they can still have "stale" ebuilds (depending on your definition of "stale") installed from old runs. The current build would eventually upgrade the package, but if we don't have an explicit reason (e.g., blockers) then it might not be done at the appropriate time.

Still, I'm not sure of the exact reason why this particular builder still had -r1097 installed. That is a few weeks old (-r1097 -> -r1098 was committed on Nov 26), so one might not be faulted for thinking that it should have been upgraded by now.
Dec 10

Thank you for the info, Brian! I'm still unclear on whether this is a CI issue, though; it sounds a lot like malformed ebuilds. Is it expected that CI would uprev packages it doesn't think are depended on? If I follow, it only worked because there was likely already a previous version of the ebuild installed locally on the builders, so the real error should have been 'you're trying to use a file from an ebuild you didn't declare a dependency on'?
Dec 10

File collision means two packages are providing the same file, which is illegal. So this isn't a case of a missing "dependency" in the colloquial sense, but a missing *blocker* (two packages that can't be installed together at the same time). So:

> Is it expected that CI would uprev packages it doesn't think are depended on?

Not exactly. Yes, they should get upgraded at some point in the build (we tell portage to upgrade everything, not just stuff that's required via dependency). But we don't guarantee a particular ordering or safe upgrade-handling if the ebuilds didn't declare a dependency (or in this case, a type of dependency called a blocker).

> it only worked because there was likely already a previous version of the ebuild installed locally on the builders

No, the opposite: it only *failed* because there was a previous version installed. If it was a clean build, things would have been fine.

> the real error should have been 'you're trying to use a file from an ebuild you didn't declare a dependency on'?

No, not using a file you didn't declare, but providing a replacement file without an appropriate "blocker."

FWIW, this is all a problem inherent to incremental builds, where we don't have 100% test coverage of "incremental build from version (X-N) to version (X)", where N is...anything. Dunno if there is something that can be done to improve CI around this. It's definitely a recurring problem.

HTH.
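
The difference between the incremental and clean cases can be illustrated with a toy shell sketch. This is not how Portage implements its collision check; the function name and the paths under /example/ are hypothetical, and only config_dump.json comes from the log in this bug.

```shell
# Toy model of a file-collision check. This is NOT Portage's implementation.
# Paths under /example/ are hypothetical; config_dump.json is the path from
# the log in this bug.

# Files owned by the stale private package still installed in the sysroot.
installed_files='/tmp/chromeos-config/config_dump.json
/example/only-in-private'

# Files the new public package is about to install.
new_files='/tmp/chromeos-config/config_dump.json
/example/only-in-public'

# Print every path in $2 that also appears as a whole line in $1.
find_collisions() {
  printf '%s\n' "$2" | while IFS= read -r path; do
    if printf '%s\n' "$1" | grep -Fxq -- "$path"; then
      printf '%s\n' "$path"
    fi
  done
}

# Incremental build: the old private package is still installed.
collisions="$(find_collisions "$installed_files" "$new_files")"
if [ -n "$collisions" ]; then
  echo "Detected file collision(s):"
  echo "$collisions"
fi

# Clean build: nothing is installed yet, so nothing can collide.
clean_collisions="$(find_collisions '' "$new_files")"
```

The same new package either collides or installs cleanly depending only on what an earlier run left behind, which is why the failure shows up on incremental builders and not on clean ones.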
Dec 10

Ahh! Okay, that makes way more sense now. So, to reiterate, in an ideal world incremental builds would reliably be equivalent to clean builds (-looks longingly at g3 Blaze-), which seems like the real answer to this. Detecting that two ebuilds failed to declare blockers on each other despite not both being installed seems outside the scope of any build system; it's only the build system's responsibility to deterministically fail when you try to build both of them and to explain why. Portage is falling short here because the incremental build didn't clean up something it should have.

I don't see anything actionable for a CI on-call here, so I'm going to assign this over to you. Feel free to bounce it back if you're the wrong owner and I'll ask around.

Interesting problem though. I played with an idea a year or so ago to run all ebuilds inside a FuseFS, watch everything they open/read/write, and build dep graphs from that. It would allow us to replicate a lot of ObjFS/SrcFS and Forge for ChromeOS, because we could build 'guaranteed correct' (compile time) dep graphs and short-circuit entire builds by simply mounting their outputs into the FuseFS. That would also fix the problem of incremental builds not being the same as clean builds.
Dec 10

Yeah, I've got a handle on the $subject bug. I agree with most of your comment. (Ambivalent about the g3/blaze/etc. stuff, as I'm not familiar.)
Dec 11

The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/ebef192c708cc5245e036ed5e3538e494228ee00

commit ebef192c708cc5245e036ed5e3538e494228ee00
Author: Brian Norris <briannorris@chromium.org>
Date: Tue Dec 11 11:50:20 2018

coral: add blocker for private chromeos-config-bsp

/tmp/chromeos-config/config_dump.json has moved from the private to public overlays, but there was no blocker added. This means upgrades aren't always smooth and various builders might see file conflicts.

Add the blocker, so the private ebuild will be removed/upgraded cleanly.

BUG=chromium:913078, b:120774883
TEST=build coral

Change-Id: I15a1ada5eae5bfaf6a311b26934c316c532efd52
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1370424
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Reviewed-by: Jesse Schettler <jschettler@chromium.org>
Reviewed-by: C Shapiro <shapiroc@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[rename] https://crrev.com/ebef192c708cc5245e036ed5e3538e494228ee00/overlay-coral/chromeos-base/chromeos-config-bsp-coral/chromeos-config-bsp-coral-0.0.1-r38.ebuild
[modify] https://crrev.com/ebef192c708cc5245e036ed5e3538e494228ee00/overlay-coral/chromeos-base/chromeos-config-bsp-coral/chromeos-config-bsp-coral-0.0.1.ebuild
Dec 11

Should be fixed.
Comment 1 by jclinton@chromium.org, Dec 7
Status: Assigned (was: Untriaged)