Can’t update Breakpad (roll DEPS) in Chrome: requires LSS roll which requires NaCl update which requires NaCl to roll Breakpad which requires bot update. Seriously. |
|||||||
Issue descriptionI’ve got a story to share, I think you’ll enjoy it. I’m trying to update Breakpad in Chrome. Here’s the change: https://codereview.chromium.org/2167703002/. I can’t just roll Breakpad alone, because Breakpad 2fcb8afbc807 (https://codereview.chromium.org/2065493006) requires linux-syscall-support 3f6478ac95ed (https://codereview.chromium.org/2050613002), so I’ve got to roll LSS too. But Native Client also depends on LSS, so I’ve got to update it for compatibility with 3f6478ac95ed. That’s fine, I’ve got a change to do that too: https://codereview.chromium.org/2164993002/. But wait! NaCl also brings in Breakpad, so when I update LSS in NaCl to 3f6478ac95ed, I’ve also got to update Breakpad past 2fcb8afbc807. The last Breakpad roll in NaCl was 18 months ago. When I try to update Breakpad beyond 728bcdff610a (https://codereview.chromium.org/1570013002), the nacl-precise (non-pnacl) bots all fail the “breakpad configure” step, as in https://build.chromium.org/p/tryserver.nacl/builders/nacl-precise64_newlib_dbg/builds/6655. Here’s what they say: -- checking whether g++ supports C++11 features by default... no checking whether g++ supports C++11 features with -std=c++11... no checking whether g++ supports C++11 features with -std=c++0x... no checking whether g++ supports C++11 features with +std=c++11... no checking whether g++ supports C++11 features with -h std=c++11... no configure: error: *** A compiler with support for C++11 language features is required. Command return code: 1 Halting build step because of failure. @@@STEP_FAILURE@@@ Entire build halted because breakpad configure failed. -- Now I’m stuck. I can’t update Breakpad in Chrome because the NaCl bots have laughably old toolchains. Is this expected? Should these NaCl bots be updated? I’m trying to do the right thing and move the world forward, but if the world’s not ready, I’ll have to revert 2fcb8afbc807 from Breakpad to be able to move it. That feels like kicking the can down the road, so I’d rather fix this for real.
,
Jul 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/native_client/src/native_client.git/+/412a6e2bb153f39f22b9b3e47bc1ed0abb82df3f commit 412a6e2bb153f39f22b9b3e47bc1ed0abb82df3f Author: mark <mark@chromium.org> Date: Fri Jul 22 02:26:57 2016 Update linux-syscall-support to 3f6478ac95edf86cd3da300c2c0d34a438f5dbeb Note that the linux-syscall-support repository has moved to a new location. 2ce6a9744160 Add codereview.settings for new repo location cd0177fdded9 Fix VIEW_VC URL in codereview.settings 929203010984 Finally fix VIEW_VC url with proper review 08056836f2b4 migrate site/wiki docs over 348bdf8d32b3 README: clearly document goals/APIs/supported systems 3f6478ac95ed Add s390/s390x support to linux_syscall_support.h This linux-syscall-support update no longer uses sys_mmap2(). sys_mmap() may now be used everywhere. This also updates Breakpad to d5f42abb82722b29d935aeedd8c0aecec6a079d0 for compatibility with the updated linux-syscall-support. Note that the Breakpad repository has moved to a new location. The hash in the new Breakpad repository of the revision previously used in Native Client is cfaf27c37e3e5df5ca00a75758a36af870b5fb45. This update includes 226 commits, so they are not listed individually here. Since Breakpad 728bcdff610a, a C++11 compiler is required, and the default system compiler on the bots may not provide this support. This also contains a change to use the downloaded clang and sysroot, which should always provide the necessary support. BUG= chromium:629874 Review-Url: https://codereview.chromium.org/2164993002 [modify] https://crrev.com/412a6e2bb153f39f22b9b3e47bc1ed0abb82df3f/DEPS [modify] https://crrev.com/412a6e2bb153f39f22b9b3e47bc1ed0abb82df3f/buildbot/buildbot_standard.py [modify] https://crrev.com/412a6e2bb153f39f22b9b3e47bc1ed0abb82df3f/src/trusted/service_runtime/linux/nacl_bootstrap.c
,
Jul 22 2016
https://codereview.chromium.org/2167703002/ should now do the trick.
,
Jul 25 2016
https://codereview.chromium.org/2167703002/#ps90001: It didn’t work, because that wasn’t the end of the story. (Of course it wasn’t.) telemetry_perf_unittests core.stacktrace_unittest.TabStackTraceTest.* was failing with this Breakpad roll. See https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/267726. (Run those tests locally with “tools/perf/run_tests core.stacktrace_unittest.TabStackTraceTest. --browser=exact --browser-executable=out/release/chrome --chrome-root=.” See https://dev.chromium.org/developers/telemetry/telemetry-unittests. It took a little digging to find this.) Most of the Breakpad logic in there, in particular, the thing that runs minidump_stackwalk, is in third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py. Great, something else that will require a DEPS roll. Poking at Catapult, I found that it’s using its own minidump_stackwalk executable that comes from Cloud Storage. (https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/b6c21362e7d9ff5af0b33c529f57f44dbbe86e77/telemetry/telemetry/internal/binary_dependencies.json#216, but then there’s also https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/76c702013be70503f1656394c7243e951b86196f/telemetry/bin/linux/x86_64/minidump_stackwalk.sha1 which specifies a different hash from 2014 or earlier but doesn’t seem to be used for this test. Then there’s https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/a72904e47085c307cc7ca8f67e5247e87d2514d7/telemetry/bin/README.chromium which refers to https://dev.chromium.org/developers/telemetry/upload_to_cloud_storage, but neither of those mention binary_dependencies.json which seems to be the place where the hash really lives here.) At least minidump_stackwalk appears to be relatively recent (https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/fcc660cff2bc12f27d15e6b478fcd8d54a49ca5e), although nothing tells me what revision of Breakpad it came from. It also seems like when Catapult is running for Chrome, it ought to use the minidump_stackwalk that’s already in Chrome’s tree instead of some checked-in binary. It does this for dump_syms, allowing it to be handled by generate_breakpad_symbols.py (https://chromium.googlesource.com/chromium/src/+/c6be24867adfaa4159909e2072778fbe3fc341fd/components/crash/content/tools/generate_breakpad_symbols.py), which uses the built dump_syms. Anyway, I was pretty bummed by the fact that it looked like I would have to make a change to Catapult to fix this and roll its DEPS along with these three other projects, but then I found that the problem was actually in generate_breakpad_symbols.py, which was misinterpreting dump_syms output that changed during this Breakpad roll (https://chromium.googlesource.com/breakpad/breakpad/+/c44217f6468152bf5693df7ec78a48d97e2b0e83). As it turned out, I was able to fix this just by updating generate_breakpad_symbols.py, as in https://codereview.chromium.org/2167703002/#ps110001. In summary, the dependencies in here are spaghetti. In addition to Breakpad, LSS, and NaCl all needing to roll to new versions in Chrome at the same time AND this involving finding a C++11-compatible toolchain for the NaCl bot build, a telemetry test Chrome lets Catapult drive also needed to change, but Catapult was reaching back into Chrome for some part of its Breakpad usage (although not all of it). Fortunately the part that needed to change was in Chrome, so I didn’t actually need to roll Catapult in Chrome, but this whole thing is so gross that I do need to take a shower.
,
Jul 25 2016
,
Jul 25 2016
Annie, who’s responsible for all of this Catapult stuff?
,
Jul 25 2016
>Annie, who’s responsible for all of this Catapult stuff? aiolos should be the right person for the catapult prebuilts stuff. My personal reading of the matter is: the fact that catapult uses prebuilts is IMHO actually good, because means that the things in catapult that depend on breakpad rely only on that executable (so you don't need to change breakpad and catapult at the same time to match any new behavior introduced). Instead, my feeling is that catapult using the live version of the binaries (when it runs in chrome) is bad. That seems a dependency in the third_party -> chromium direction, which is generally not allowed (a third party rolled into chrome cannot depend on chrome - or sub projects). This is the same story of breakpad that cannot depend on base, just s/breakpad/catapult/ s/base/breakpad/. This (depending on the live version of breakpad binaries) means that in order to change the behavior of a breakpad binary breakpad folks have to update and roll catapult at the same time, which is undesirable. IMHO catapult should just be hermetic and always use prebuilts. It should not be breakpad's owner responsibility / worry to update prebuilts in catapult. (Or am I not thinking to something else here?)
,
Jul 25 2016
As far as the tools are concerned, if Catapult was using its own dump_syms, this wouldn’t have happened. The problem is that Catapult gets some of its stuff from checked-in prebuilts and some of its stuff from the Chrome build, and who knows if new dump_syms output is compatible with old minidump_stackwalk, etc. However, it’s also possible that the Breakpad client built into Chrome will change in such a way as to generate minidumps that require certain support from other tools like minidump_dump and minidump_stackwalk. Things get kind of risky here (although it’s not the case I hit in this bug). There’s some window of compatibility between what the Breakpad client’s minidump writer produces and what reasonably recent minidump_stackwalks can understand, but “reasonable” still might be as narrow as a month. It really sucks to have more Breakpad prebuilts checked into random locations when nobody’s in charge of updating them. This is where surprises come from. It was extra-surprising to find that although my Chrome build produced a minidump_stackwalk executable, the minidump_stackwalk that this test ran was something else that came from somewhere else. And how am I supposed to test a proposed new version of minidump_stackwalk with telemetry_perf_unittests without checking it into Cloud Storage? It can’t be done without futzing around with hashes or scripts in Catapult (if you even find the right one—remember, there seem to be stale files in there).
,
Jul 25 2016
+kbr, nednguyen, eyaich from catapult side.
,
Jul 25 2016
> The problem is that Catapult gets ... some of its stuff from the Chrome build Yup agree. IMHO this is the problem. > However, it’s also possible that the Breakpad client built into Chrome will change in such a way as to generate minidumps that require certain support from other tools like minidump_dump and minidump_stackwalk Hmm I was assuming that things like the minidump format are essentially an API boundary and they don't change in a backwards incompatible way. Realistically if that happens, you'd have similar problems with the crash pipeline also, which is not checked into chrome and is in practice an independent project like catapult here. > It really sucks to have more Breakpad prebuilts checked into random locations when nobody’s in charge of updating them. This is where surprises come from. In an ideal world you should not have to worry about them, and catapult should have some mechanism to detect when they go out of date (some kind of a checked in last-breakpad-hash.id which generates a presubmit warning / blocks rolls of catapult into chrome?) > It was extra-surprising to find that although my Chrome build produced a minidump_stackwalk executable, the minidump_stackwalk that this test ran was something else that came from somewhere else. But IMHO this is a better approach to avoid dependencies entanglement. Not sure what could be done better here, but I am not an expert in the catapult prebuilt area. I suppose what we are missing here is some clearer readme and only one source of truth into catapult? > And how am I supposed to test a proposed new version of minidump_stackwalk with telemetry_perf_unittests without checking it into Cloud Storage? I think you are right though and there is a second dependency problem here: telemetry_perf_unittests create de facto another dependency in the catapult -> chrome direction. the code that generates the minidump lives in chrome (well in breakpad, to the eyes of catapult that is chrome), but perf_unittests call into catapult code. Now the question is whether having a dependency on the minidump format is ok or not. If we assume that minidump don't change in a breaking way, and that catapult always uses a prebuilt (is not the case today) telemetry_perf_unittest should never go red if you make a non-breaking change to minidumps or minidump_stackwalk.
,
Jul 25 2016
> > The problem is that Catapult gets ... some of its stuff from the Chrome build > Yup agree. IMHO this is the problem. Selective paraphrasing :/ > Hmm I was assuming that things like the minidump format are essentially an API > boundary and they don't change in a backwards incompatible way. Realistically if that > happens, you'd have similar problems with the crash pipeline also, which is not > checked into chrome and is in practice an independent project like catapult here. “Downstream” tools like minidump_stackwalk promise to be able to deal with minidumps and symbol files older than they are. But nobody makes any promises about whether minidump_stackwalk can deal with newer minidumps or symbol files. We wouldn’t make a breaking change like that all in one revision, but we might teach minidump_stackwalk a new trick, and then a month later start relying on that trick in the minidump writer, after important clients have had a chance to update. For example, this is what happened in https://chromium.googlesource.com/breakpad/breakpad/+/4912669df1fb34fb8549925a8f7e70e5ffc9e890 (minidump reader update, for minidump_stackwalk) and then, two months later, https://chromium.googlesource.com/breakpad/breakpad/+/6c8f80aa8b3ba8120c4158c069bb298c044dedf9 (minidump writer update, gets built into Chrome). And further, it looks like this did in fact surprise Catapult, which had to scramble to update its prebuilts in https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/fcc660cff2bc12f27d15e6b478fcd8d54a49ca5e. Because of the promises (backward compatibility but not necessarily forward compatibility) and the windows (a month is considered reasonable), projects like Catapult that want to use prebuilts need to update those prebuilts fairly regularly. They can’t be allowed to fall out of date. But it appears that there’s no process for doing this on any sort of schedule in Catapult. We never see this sort of problem with the Google crash-processing pipeline precisely because they have a bare-minimum monthly Breakpad import and server rebuild process. The other alternative, as I mentioned, is using the tools that are already produced as part of the Chrome build. “Which minidump_stackwalk does THIS test run?” isn’t a question people even expect to have to consider when debugging failures like this. If you were running a test in Chrome that runs minidump_stackwalk, and you knew that you had minidump_stackwalk in out/release, would you have any reason to believe that said Chrome test would be using some other minidump_stackwalk? Summary: the current situation is the worst of all worlds. There’s very real version shear. For these catapulty tests, we get (1) dump_syms and a symbol directory builder from Chrome, (2) a minidump writer from Chrome, and (3) a minidump reader and stackwalker that consumes (1) and (2) from Catapult.
,
Jul 25 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/051962e7c10d70da15f45a4c53fba88047f1dd71 commit 051962e7c10d70da15f45a4c53fba88047f1dd71 Author: mark <mark@chromium.org> Date: Mon Jul 25 20:44:47 2016 DEPS update: Breakpad 51db53eec729, LSS 3f6478ac95ed, NaCl 412a6e2bb153 Yes, these three updates are truly interdependent. See https://crbug.com/629874 . Breakpad was previously on a branch that had cherry-picked 027633792600. Breakpad 2fcb8afbc807 requires linux-syscall-support 3f6478ac95ed. The repository for linux-syscall-support has changed locations. Native Client also depends on linux-syscall-support and must be updated to 412a6e2bb153 for compatibility with this version of linux-syscall-support. Update Breakpad to 51db53eec7293a35cb6fc10bd2e333f22dd9d201 220c852dc209 Dump INFO CODE_ID containing Build ID in Linux dump_syms 2fcb8afbc807 linux-syscall-support: pull in latest version 1f7cb2e4c523 Server-side workaround to handle overlapping modules 8c612166fa87 Add a new argument to specify the minidump type to write on Windows 2131271ecabd [Android] Guard some NDK workarounds by major version 027633792600 Recover memory mappings before writing dump on ChromeOS 0de9e29cb674 Don't define |r_debug| and |link_map| on Android releases 21 and later e55a878d5b74 Add process type to MicroDumpExtraInfo 423e661072aa Add new exception code for OOM generated from Chromium 4dc6be51f90c Revert "Don't define |r_debug| and |link_map| on Android releases 21 and later" 51db53eec729 Remove DISALLOW_COPY_AND_ASSIGN from MinidumpStreamInfo Update linux-syscall-support to 3f6478ac95edf86cd3da300c2c0d34a438f5dbeb 2ce6a9744160 Add codereview.settings for new repo location cd0177fdded9 Fix VIEW_VC URL in codereview.settings 929203010984 Finally fix VIEW_VC url with proper review 08056836f2b4 migrate site/wiki docs over 348bdf8d32b3 README: clearly document goals/APIs/supported systems 3f6478ac95ed Add s390/s390x support to linux_syscall_support.h Update Native Client to 412a6e2bb153f39f22b9b3e47bc1ed0abb82df3f 412a6e2bb153 Update linux-syscall-support to 3f6478ac95ed Update generate_breakpad_symbols.py for Breakpad 220c852dc209. BUG= 629874 TEST=tools/perf/run_tests core.stacktrace_unittest.TabStackTraceTest. Review-Url: https://codereview.chromium.org/2167703002 Cr-Commit-Position: refs/heads/master@{#407569} [modify] https://crrev.com/051962e7c10d70da15f45a4c53fba88047f1dd71/DEPS [modify] https://crrev.com/051962e7c10d70da15f45a4c53fba88047f1dd71/components/crash/content/tools/generate_breakpad_symbols.py
,
Jul 25 2016
My understanding was that we needed a working version of Breakpad to get useful stacks while running outside of a Chromium checkout. (Either on our reference build or a stable/dev/etc versions installed on the system.) If this is not true, we should move over to using just the local versions and not having the prebuilt versions. If it is true, we can't remove the prebuilt versions and should instead setup auto updating for those prebuilts. (We already have an API to update dependencies in our dependency manager and a bot for running cron jobs on. We just haven't known of a need to do autoupdate anything other than our reference build of Chrome.) Would a monthly update work? And is there a way to track stable versions of the binaries so we can guarantee that we'll never update to a broken version?
,
Jul 25 2016
Monthly works. You can ask an internal group (don’t want to share their name here, but if it’s not obvious who I’m talking about, ask me privately) how they pick a Breakpad and use the same version they’re using. I don’t know how they pick it, but I think it’s just “whatever’s current on master when we do the import.” Perhaps they advertise that somewhere when it’s done and you can pick up the same version that they’re using. This is effectively the same thing that we do to choose which version of clang to roll into Chrome.
,
Jul 25 2016
Re: "Breakpad, LSS, and NaCl all needing to roll to new versions in Chrome at the same time" A solution to this is not to make breaking changes to a project like LSS unless it's absolutely necessary. i.e. Avoid changes that require non-trivial DEPS rolls. It sounds like the LSS change (https://codereview.chromium.org/2050613002) removed the mmap2() wrapper. A better way to do that would have been to allow using mmap() instead of mmap2(), but then remove mmap2() in a later change only after rolling LSS in Chromium. When I've reviewed LSS changes, I've asked the contributor to roll DEPS in Chromium too. If someone commits a compatibility-breaking change to LSS, I think it's reasonable to roll it back and ask them to do it in a compatibility-preserving way.
,
Jul 25 2016
https://github.com/catapult-project/catapult/issues/2539 for catapult autoupdating the Breakpad binaries.
,
Aug 1 2016
i'm not sure the change could have been done cleanly in a backwards compatible way -- LSS has been declaring incompatible/inconsistent APIs depending on the arch. at this point, it seems like Chromium/breakpad/LSS/NaCL have all rolled ? so do we need to think about backwards cruft in LSS still ?
,
Jun 21 2017
considering the last movement here was almost a year ago, going to assume this is fixed |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by mark@chromium.org
, Jul 21 2016