New issue
Advanced search Search tips

Issue 924192 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Maybe add CQ coverage for ChromeOS PFQ builds

Project Member Reported by r...@chromium.org, Today (11 hours ago)

Issue description

The clang roll was almost reverted because we added a cflag not supported by the chromeos toolchain:
https://chromium-review.googlesource.com/c/chromium/src/+/1424477

The bug for this was  https://crbug.com/923750 

Nico said the following:

"""
What's a PFQ build? I thought cros used its own clang.

If cros uses chromium's clang, and that build config is important, we need to know about it, there needs to be a cq bot we can use, and we need a tot bot.

Here I suppose -fplit-lto-unit needs to be behind a `target_os != "chromeos"`-- slangley, want to make a CL for that?

But it'd still be interesting to hear what this config is (are there docs about it) and why the cq doesn't run it if it's important. (And if it isn't important, why are you using that config?)
""

Maybe there are good reasons why this configuration (PFQ/chromeos) isn't already on the Chromium CQ, but it seems like it's pretty easy to make a .gn change to add a cflag (as I did in this case), have it pass the CQ, and then break the chromeos build.

Do you folks think it would be valuable to add some kind of light, perhaps compile-only, testing of this configuration to the chromium CQ?
 

Comment 1 by r...@chromium.org, Today (11 hours ago)

Cc: llozano@chromium.org g...@chromium.org

Comment 2 by p...@chromium.org, Today (11 hours ago)

I thought that this was what e.g. chromeos-amd64-generic-rel was supposed to be for.

But looking at the paths in https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8923968155016294128/+/steps/generate_build_files__with_patch_/0/stdout it looks like the bot is actually using the Chromium toolchain, not the cros one.

Maybe the right fix here is to make that bot use the cros toolchain?

Comment 3 by thakis@chromium.org, Today (11 hours ago)

Cc: bpastene@chromium.org
bpastene worked on cros bot setup a bit I think. bpastene, do you know what the PFQ build is? (see comment 0)

Comment 4 by bpastene@chromium.org, Today (10 hours ago)

The PFQ is a special commit queue (used only by a single auto roller essentially) that updates the pin of chromium in ChromeOS land. We/I are actively trying to get a lot of what it does into chromium's CQ as well (to avoid issues like this). pcc@ is correct that the chromeos-amd64-generic-rel builder in chromium's CQ is one of them, and its build *should* be functionally identical to the plain amd64-generic chrome ebuild in the PFQ.

> Maybe the right fix here is to make that bot use the cros toolchain?

What gives you the impression that it's not? I see various cros toolchain paths in the cros_target_* gn args.

Could I get a link to a build in the PFQ that failed when it picked up a chromium clang roll?

Comment 5 by bpastene@chromium.org, Today (10 hours ago)

Cc: steve...@chromium.org
Owner: bpastene@chromium.org
Status: Assigned (was: Untriaged)
Also taking the bug since cros coverage in the chromium CQ is my wheel house atm

Comment 7 by steve...@google.com, Today (10 hours ago)

Cc: achuith@chromium.org
Labels: -Pri-3 Pri-2
It looks like the first pfq-informational builder that caught this was here:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8923946242645833168

What is surprising to me is that this failed in BuildPackages, but (apparently) succeeded in SimpleChrome, which should be using the same toolchain.

Comment 8 by steve...@google.com, Today (10 hours ago)

+llozano@

First failure output line BTW is:

chromeos-chrome-73.0.3677.0_alpha-r1: FAILED: obj/base/base_static/base_switches.o 
chromeos-chrome-73.0.3677.0_alpha-r1: /home/chrome-bot/goma/gomacc x86_64-cros-linux-gnu-clang++ -B/usr/x86_64-pc-linux-gnu/x86_64-cros-linux-gnu/binutils-bin/2.27.0-gold -MMD -MF obj/base/base_static/base_switches.o.d -DUSE_UDEV -DUSE_AURA=1 -DUSE_NSS_CERTS=1 -DUSE_OZONE=1 -DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL -DOFFICIAL_BUILD -DCHROMIUM_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_GNU_SOURCE -DCR_CLANG_REVISION=\"351477-1\" -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -DCR_SYSROOT_HASH=e7c53f04bd88d29d075bfd1f62b073aeb69cbe09 -DOS_CHROMEOS -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -I../../../../../../../home/chrome-bot/chrome_root/src -Igen -fno-strict-aliasing -funwind-tables -fPIC -pthread -fcolor-diagnostics -fmerge-all-constants -Xclang -mllvm -Xclang -instcombine-lower-dbg-declare=0 -Xclang -fdebug-compilation-dir -Xclang . -no-canonical-prefixes -flto=thin -fsplit-lto-unit -fwhole-program-vtables -m64 -march=x86-64 -Wall -Wextra -Wimplicit-fallthrough -Wthread-safety -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-unneeded-internal-declaration -Wno-undefined-var-template -Wno-ignored-pragma-optimize -fno-omit-frame-pointer -g2 -gsplit-dwarf -ggnu-pubnames -fsanitize=cfi-vcall -fsanitize-blacklist=../../../../../../../home/chrome-bot/chrome_root/src/tools/cfi/blacklist.txt -fsanitize=cfi-derived-cast -fsanitize=cfi-unrelated-cast -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -O2 -fno-ident -fdata-sections -ffunction-sections -std=c++14 -fno-exceptions -fno-rtti --sysroot=../../../../../../../build/amd64-generic -fvisibility-inlines-hidden -pipe -march=x86-64 -msse3 -fno-split-dwarf-inlining -fdebug-info-for-profiling -D__google_stl_debug_vector=1 -faddrsig -Wno-unknown-warning-option -stdlib=libc++  -c ../../../../../../../home/chrome-bot/chrome_root/src/base/base_switches.cc -o obj/base/base_static/base_switches.o
chromeos-chrome-73.0.3677.0_alpha-r1: clang++: error: unknown argument: '-fsplit-lto-unit'

Comment 9 by p...@chromium.org, Today (10 hours ago)

Cc: p...@chromium.org
Re #4: Ah, I was looking at the cros_host_* variables and somehow missed the cros_target_* ones. But that doesn't explain how the clang roll passed CQ.

Comment 10 by steve...@google.com, Today (10 hours ago)

Here is the CQ builder that includes the change:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/chromeos-amd64-generic-rel/24450

Comment 11 by achuith@chromium.org, Today (9 hours ago)

Cc: -dpranke@google.com dpranke@chromium.org

Comment 12 by bpastene@chromium.org, Today (8 hours ago)

I don't think the problem is that it's using the wrong clang, I think it's that the gn args are different. The compiler arg that threw up errors is guarded around an "if (use_thin_lto)":
https://codesearch.chromium.org/chromium/src/build/config/compiler/BUILD.gn?rcl=82fa384a85df17f48a969f80d489a16a4db92c75&l=634

That's true for chromeos-chrome ebuilds, but not for simplechrome builds by default (the more similar the two, the better):
https://codesearch.chromium.org/chromium/src/third_party/chromite/cli/cros/cros_chrome_sdk.py?rcl=cec40a962e60f3e82d0ee2f05b8f950567649a8e&l=1069

I experimentally enabled thin_lto on our simplechrome CQ bots (and reverted the workaround in crrev.com/c/1424620) and got the repro:
https://chromium-review.googlesource.com/c/chromium/src/+/1427719

I'll tentatively look into enabling thin_lto in the simplechrome workflow for now, but we may end up just duping this into bug 789607, which has got some context behind that decision.

Comment 13 by llozano@chromium.org, Today (7 hours ago)

Cc: manojgupta@chromium.org chromeos-toolchain@google.com
(better to copy chromeos-toolchain@google.com than just copy me in case I am not available).

I think Ben already figured it out. 
On the simple chrome workflow we dont enable thinlto or cfi by default because they would make the builds too slow for the simple chrome users.

So, I think the solution that Ben is proposing is the best. Ie: modify the bots that are using simplechrome to make them more similar to what the ebuild workflow is doing (enable thinlto and cfi). Since this is done only for the bots, the build time should not matter. 


Comment 14 by steve...@chromium.org, Today (7 hours ago)

IIRC, thinlto adds as much as half an hour to the build times? The cycle time of the builders is pretty important, especially in the CQ.

I would advocate for enabling thinlto and cfi on the chromium waterfall (i.e. https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/chromeos-amd64-generic-rel), but not in the CQ. Yes, that means this particular CL would have passed the CQ, but at least it would have failed much sooner in the chromium waterfall. We make this compromise elsewhere for other slow builders.


Comment 15 by g...@google.com, Today (7 hours ago)

AIUI, there's a project to make ThinLTO use goma to distribute its work, which should (hopefully) speed links up quite a bit. Assuming CQ bots use goma, perhaps we can reconsider using ThinLTO for CQ bots once that's landed?

Comment 16 by steve...@chromium.org, Today (7 hours ago)

My understanding is that is not a near term thing. I agree that enabling it on all builders would be ideal, but in the meanwhile we need to keep the run times in the CQ as close to 30 minutes as possible.

Comment 17 by llozano@chromium.org, Today (6 hours ago)

re #14: Yes, steven is right. We should not add this to the Chrome CQ. Only to the builders that can take the increased build time.
Anything that catches the problem before the Chrome OS PFQ is a good thing. 
I don't know which builders participate in the Chrome validation for LKGR.

Comment 18 by thakis@chromium.org, Today (6 hours ago)

As long as you can update the cros gardener playbook to not revert clang rolls for flag mismatches but instead add a target_os != "chromeos" when we forget, doing this only on the main waterfall is fine with me.

I don't want to end up in a situation where clang rolls get reverted over this again. clang rolls are already super hard to land and usually affect many platforms (this one fixed a miscompile on Windows for example), so getting them reverted for the one platform where Chromium's clang _isn't_ used is pretty unfortunate.

Comment 19 by r...@chromium.org, Today (6 hours ago)

I think, in the short term, it makes sense to have a waterfall bot for this config, and we can manually run chromeos+thinlto try jobs when rolling clang. We would add the new simplechrome+thinlto bot name to the list of try jobs to run maintained here:
https://chromium.googlesource.com/chromium/src/+/lkgr/docs/updating_clang.md

We already include non-CQ configs like cfi in that list.

I think (could be wrong) there are no ThinLTO build configs on the CQ currently, essentially because ThinLTO is still too slow. We already want to speed that up. Once we have a thinlto configuration that builds in 30m, it's very likely that we'll want to add several configs to the CQ:
- linux + cfi
- chromeos + thinlto
- android + thinlto
- windows + thinlto (cfi if ready)

Because of the way thinlto scales non-linearly with the number of binary targets built, we would want to carefully curate the list of test targets we run, perhaps one small binary (base_unittests) and one big one (browser_tests / chrome).

Comment 20 by llozano@chromium.org, Today (6 hours ago)

#19: it would be great if you can add the simplechrome+thinlto builder to that list. 

Comment 21 by thakis@chromium.org, Today (6 hours ago)

Oh, is there already an opt-in simplechrome+thinlto try bot? What's its name?

But yeah, an opt-in trybot would be completely sufficient for us.

Comment 22 by r...@chromium.org, Today (5 hours ago)

> #19: it would be great if you can add the simplechrome+thinlto builder to that list. 

I think that might be what I was going for with the "chromeos+thinlto" config, but I'm not familiar with the exact terminology, so I think we agree.

> Oh, is there already an opt-in simplechrome+thinlto try bot? What's its name?

No, sorry if I gave the impression that there was.

---

So, I think we've decided to make a waterfall bot and an opt-in try bot. Is bpastene@ the right assignee for that?

Comment 23 by bpastene@chromium.org, Today (5 hours ago)

I tried out enabling cfi + thin_lto on the CQ bots, and it did indeed take a while (1+ hr compiles, though I wonder how much of that is due to cold local & goma-backend caches):
https://chromium-review.googlesource.com/c/chromium/src/+/1427960

So I'd agree that speeding that up would be a blocker on turning it on for the CQ. I can fairly trivially add a second simplechrome CI builder w/ those args enabled (along with an optional trybot). That sounds like a good compromise, so I'll work on that.

Also, for posterity, there was some concern in bug 789607 about thin-lto opening too many file-descriptors in the chroot. I don't *think* that'll be a problem here since simeplchrome is outside the chroot, and we explicitly provision our build machines w/ an extended fd limit:
https://chrome-internal.googlesource.com/infra/puppet/+/95ff9374156fb874465a4c15f80385657351fc2e/puppetm/etc/puppet/modules/chrome_infra/manifests/setup/debian.pp#51

Comment 24 by llozano@chromium.org, Today (4 hours ago)

I think the concern about too many file descriptors was just for the gold linker. 
We have moved to LLD for most of the cases but there is still a pending bug on the ARM side that does not allow us to use LLD for case where the build is not "chrome_internal".

Sign in to add a comment