NaCl broken on ARM Chromebooks |
|||||||||||||
Issue descriptionI'm reproducing this right now on a Daisy Chromebook running 54.0.2798.0 / platform 8707.0.0 This was originally caught as bug 628505 , that Chrome OS TTS stopped working on ARM boards, with this blamelist: https://chromium.googlesource.com/chromium/src/+log/54.0.2787.0..54.0.2790.0?pretty=fuller&n=10000 I suspect that it's this change, since most / all native client seems to be broken: https://chromium.googlesource.com/native_client/src/native_client.git/+/0949e1bef9d6b25ee44eb69a54e0cc6f8a677375 "Update the nacl_bootstrap GN code to work in ChromeOS chroot builds." To repro native client breakage, try this url: https://gonativeclient.appspot.com/static/bullet/fullscreen.html Alternatively, to test TTS (which uses NaCl also), press Ctrl+Alt+Z to turn on spoken feedback. If you hear sounds but no speech, it's broken. Open chrome://inspect, click on Extensions, and inspect Chrome OS built-in TTS to see the NaCl error message.
,
Jul 21 2016
,
Jul 26 2016
Any update on the fix for this dev blocker?
,
Jul 26 2016
I will help Bradley on this today.
,
Jul 26 2016
Since the problem seems to have started with the fix for GN nacl_bootstrap, I looked at how we where compiling this file with GYP and how we are compiling it with GN. With GYP: (open this window wide) [6521/24398] /usr/local/google/home/llozano/proj/gn_problem/chroot/tmp/my-chrome-src/.cros_cache/common/goma+2/gomacc armv7a-cros-linux-gnueabi-gcc -B/usr/local/google/home/llozano/proj/gn_problem/chroot/tmp/my-chrome-src/.cros_cache/chrome-sdk/tarballs/daisy+8613.0.0+target_toolchain/usr/x86_64-pc-linux-gnu/armv7a-cros-linux-gnueabi/binutils-bin/2.25.51-gold -MMD -MF obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib.nacl_bootstrap.o.d -DV8_DEPRECATION_WARNINGS -D_FILE_OFFSET_BITS=64 -D_POSIX_C_SOURCE=199506 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE=1 -D__STDC_LIMIT_MACROS=1 -DCHROMIUM_BUILD -DCR_CLANG_REVISION=274369-1 -DUI_COMPOSITOR_IMAGE_TRANSPORT -DUSE_AURA=1 -DUSE_ASH=1 -DUSE_PANGO=1 -DUSE_CAIRO=1 -DUSE_CRAS=1 -DUSE_OZONE=1 -DUSE_DEFAULT_RENDER_THEME=1 -DUSE_LIBJPEG_TURBO=1 -DIMAGE_LOADER_EXTENSION=1 -DENABLE_WEBRTC=1 -DENABLE_MEDIA_ROUTER=1 -DUSE_PROPRIETARY_CODECS -DENABLE_PEPPER_CDMS -DENABLE_NOTIFICATIONS -DENABLE_WAYLAND_SERVER=1 -DUSE_UDEV -DFIELDTRIAL_TESTING_ENABLED -DENABLE_TASK_MANAGER=1 -DENABLE_EXTENSIONS=1 -DENABLE_PDF=1 -DENABLE_PLUGINS=1 -DENABLE_SESSION_SERVICE=1 -DENABLE_THEMES=1 -DENABLE_PRINTING=1 -DENABLE_BASIC_PRINTING=1 -DENABLE_PRINT_PREVIEW=1 -DENABLE_SPELLCHECK=1 -DENABLE_CAPTIVE_PORTAL_DETECTION=1 -DENABLE_APP_LIST=1 -DENABLE_SUPERVISED_USERS=1 -DENABLE_MDNS=1 -DENABLE_SERVICE_DISCOVERY=1 -DV8_USE_EXTERNAL_STARTUP_DATA -DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL -DUSE_LIBPCI=1 -DUSE_NSS_CERTS=1 -DOS_CHROMEOS=1 -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -Igen -I../../native_client/src/third_party -I../.. -I../../native_client/src/trusted/service_runtime -pthread -fno-strict-aliasing -Wno-extra -Wno-unused-parameter -Wno-missing-field-initializers -fvisibility=hidden -pipe -fPIC -Wno-unused-local-typedefs -fno-builtin -fno-stack-protector -fno-pic -fno-PIC -fno-pie -fno-PIE -Wno-maybe-uninitialized -march=armv7-a -mtune=generic-armv7-a -mfpu=neon -mfloat-abi=hard -mthumb --sysroot=/usr/local/google/home/llozano/proj/gn_problem/chroot/tmp/my-chrome-src/.cros_cache/chrome-sdk/tarballs/daisy+8613.0.0+sysroot_chromeos-base_chromeos-chrome.tar.xz -O2 -fno-ident -fdata-sections -ffunction-sections -gsplit-dwarf -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -gsplit-dwarf -c ../../native_client/src/trusted/service_runtime/linux/nacl_bootstrap.c -o obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib.nacl_bootstrap.o with GN: [760/28446] /usr/local/google/home/llozano/proj/gn_problem/chroot/tmp/my-chrome-src/.cros_cache/common/goma+2/gomacc armv7a-cros-linux-gnueabi-gcc -B/usr/local/google/home/llozano/proj/gn_problem/chroot/tmp/my-chrome-src/.cros_cache/chrome-sdk/tarballs/daisy+8613.0.0+target_toolchain/usr/x86_64-pc-linux-gnu/armv7a-cros-linux-gnueabi/binutils-bin/2.25.51-gold -MMD -MF nacl_bootstrap/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o.d -DOS_CHROMEOS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -D_POSIX_C_SOURCE=199506 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE=1 -D__STDC_LIMIT_MACROS=1 -I../.. -fno-builtin -fno-stack-protector -fno-pic -fno-PIC -fno-pie -fno-PIE -march=armv7-a -mfloat-abi=hard -mthumb -mtune=generic-armv7-a --sysroot=../../../.cros_cache/chrome-sdk/tarballs/daisy+8613.0.0+sysroot_chromeos-base_chromeos-chrome.tar.xz -Wall -Werror -Wno-psabi -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-missing-field-initializers -Wno-unused-parameter -O2 -fno-ident -fdata-sections -ffunction-sections -g2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -gsplit-dwarf -c ../../native_client/src/trusted/service_runtime/linux/nacl_bootstrap.c -o nacl_bootstrap/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o there is quite a bit of difference. Several defines like -D_FILE_OFFSET_BITS=64 -DCHROMIUM_BUILD -DCR_CLANG_REVISION=274369-1 -DUI_COMPOSITOR_IMAGE_TRANSPORT -DUSE_AURA=1 -DUSE_ASH=1 -DUSE_PANGO=1 -DUSE_CAIRO=1 -DUSE_CRAS=1 -DUSE_OZONE=1 -DUSE_DEFAULT_RENDER_THEME=1 -DUSE_LIBJPEG_TURBO=1 -DIMAGE_LOADER_EXTENSION=1 -DENABLE_WEBRTC=1 -DENABLE_MEDIA_ROUTER=1 and others.... also, compiler options like -fno-strict-aliasing, -fvisibility=hidden ... So, too many differences that could cause problems. @dpranke, @bradnelson, Is this expected? I assume it is not.
,
Jul 26 2016
I have put the logs for the GN and GYP in X20 so that Brad can compare. /google/data/rw/users/ll/llozano/for_bradnelson
,
Jul 26 2016
I would expect the flags to be quite different, but clearly we're picking up the default set of feature defines and probably don't need them. The most obvious other thing that I noticed was that we're ending up with -gsplit-dwarf, I'm guessing because that's getting set w/ cros_extra_cxx_flags. However, the comments in the //native_client/src/trusted/service_runtime/linux file make it clear that we're trying to go out of our way to avoid split-dwarf here because of possible issues w/ older binutils (though that says clang, not gcc). The CL in question also caused the build to flip from clang to gcc for this particular file. Can you try compiling without the flags in cros_extra_cxx_flags that are being duplicated? I think that's how -gsplit-dwarf is getting set (and you should get that where needed already via the other gn args).
,
Jul 27 2016
(update conflict) and not sure if this helps at all, but this was the command before the change mentioned in the bug description: https://chromium.googlesource.com/native_client/src/native_client.git/+/0949e1bef9d6b25ee44eb69a54e0cc6f8a677375 [28683/28738] ../../../../../../../home/llozano/chrome_root/src/third_party/llvm-build/Release+Asserts/bin/clang -MMD -MF nacl_bootstrap_arm/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o.d -DOS_CHROMEOS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_POSIX_C_SOURCE=199506 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE=1 -D__STDC_LIMIT_MACROS=1 -I../../../../../../../home/llozano/chrome_root/src -fno-builtin -fno-stack-protector -fno-pic -fno-PIC -fno-pie -fno-PIE --target=arm-linux-gnueabihf -march=armv7-a -mfloat-abi=hard -mthumb -mtune=generic-armv7-a --sysroot=../../../../../../../build/daisy -Wall -Werror -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-covered-switch-default -Wno-deprecated-register -Wno-unneeded-internal-declaration -Wno-inconsistent-missing-override -Wno-shift-negative-value -Wno-undefined-var-template -O2 -fno-ident -fdata-sections -ffunction-sections -g2 -c ../../../../../../../home/llozano/chrome_root/src/native_client/src/trusted/service_runtime/linux/nacl_bootstrap.c -o nacl_bootstrap_arm/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o At this time, the compiler used was incorrect. There are some suspicious differences like -mfpu=neon and -gsplit-dwarf.
,
Jul 27 2016
It just seems like a bug to me that they are stuff -gsplit-dwarf into some magic variable instead of just putting use_debug_fission=true in the build args. Why do they do that?
,
Jul 27 2016
@mcgrathr - these flags end up stemming from the way the CrOS build system works, which is more like a traditional unix setup where most packages are expecting to have EXTRA_CC_FLAGS, EXTRA_CPP_FLAGS, etc and the CrOS team wants to make sure those flags get propagated into the Chromium build. This is a problematic approach in some cases because there are some chromium targets that need to very carefully control the flags that are used (as you are obviously well aware). In general we're still working on the best way to balance these requirements. GYP was in some ways less affected by this problem because you could explicitly say cflags!: -gsplit-dwarf and the flag would be removed if it was present, but not error out if it wasn't. In GN, you can't control individual flags that are set in other configs. In particular in this case llozano@ will try getting rid of the -gsplit-dwarf flag and see if that helps.
,
Jul 27 2016
ok, I have a fix for this in https://chromium-review.googlesource.com/#/c/363505/2 but it turns out that this only affects the simple chrome workflow. It should not happen in the ebuild workflow (I am verifying but I am 90% sure). So, this cannot be the problem that is affecting NACL.
,
Jul 27 2016
this is what we get for nacl_bootstrap.c on the ebuild workflow. [720/29575] armv7a-cros-linux-gnueabi-gcc -B/usr/x86_64-pc-linux-gnu/armv7a-cros-linux-gnueabi/binutils-bin/2.25.51-gold -MMD -MF nacl_bootstrap/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o.d -DOS_CHROMEOS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -D_POSIX_C_SOURCE=199506 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE=1 -D__STDC_LIMIT_MACROS=1 -I../../../../../chromeos-cache/distfiles/target/chrome-src-internal/src -fno-builtin -fno-stack-protector -fno-pic -fno-PIC -fno-pie -fno-PIE -march=armv7-a -mfloat-abi=hard -mthumb -mtune=generic-armv7-a --sysroot=../../../../../../../build/daisy -Wall -Werror -Wno-psabi -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-missing-field-initializers -Wno-unused-parameter -O2 -fno-ident -fdata-sections -ffunction-sections -g2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -c ../../../../../chromeos-cache/distfiles/target/chrome-src-internal/src/native_client/src/trusted/service_runtime/linux/nacl_bootstrap.c -o nacl_bootstrap/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o which does not containd the -gsplit-dwarf
,
Jul 28 2016
I was doing this with simple chrome. In the ebuild the compilation after the changes looks like this: [41/21778] armv7a-cros-linux-gnueabi-gcc -B/usr/x86_64-pc-linux-gnu/armv7a-cros-linux-gnueabi/binutils-bin/2.25.51-gold -MMD -MF nacl_bootstrap/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o.d -DOS_CHROMEOS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -D_POSIX_C_SOURCE=199506 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE=1 -D__STDC_LIMIT_MACROS=1 -I../../../../../chromeos-cache/distfiles/target/chrome-src-internal/src -fno-builtin -fno-stack-protector -fno-pic -fno-PIC -fno-pie -fno-PIE -march=armv7-a -mfloat-abi=hard -mthumb -mtune=generic-armv7-a --sysroot=../../../../../../../build/daisy -Wall -Werror -Wno-psabi -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-missing-field-initializers -Wno-unused-parameter -O2 -fno-ident -fdata-sections -ffunction-sections -g2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -fauto-profile=/build/daisy/tmp/portage/chromeos-base/chromeos-chrome-54.0.2809.0_rc-r1/work/afdo/chromeos-chrome-amd64-54.0.2808.0_rc-r1.afdo -Wno-error -c ../../../../../chromeos-cache/distfiles/target/chrome-src-internal/src/native_client/src/trusted/service_runtime/linux/nacl_bootstrap.c -o nacl_bootstrap/obj/native_client/src/trusted/service_runtime/linux/nacl_bootstrap_lib/nacl_bootstrap.o I think the problem maybe caused by the fauto-profile option (not totally sure, found it by accident). It is possible compiler is doing some optimization that nacl_bootstrap does not like. dpranke is giving me a patch to avoid passing all of these options to the nacl_bootstrap option. Will try later.
,
Jul 28 2016
testing patch: https://codereview.chromium.org/2188633004/
,
Jul 28 2016
Certainly -fauto-profile or any other kinds of instrumentation options will break nacl_helper_bootstrap. It's really a losing battle to have any kinds of random options be forced into the tools for that. That's why it has its own special toolchain definition. The only "safe" options are the basic ABI stuff like -march, -mtune, -mthumb, -mfloat-abi, etc. and core optimization options like -O*. Anything else that affects code generation in any way is likely to be a problem.
,
Jul 29 2016
Issue 631304 has been merged into this issue.
,
Jul 29 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/208cbe37d01f02d13c028a7520fa5b7ef4985c02 commit 208cbe37d01f02d13c028a7520fa5b7ef4985c02 Author: Luis Lozano <llozano@chromium.org> Date: Wed Jul 27 17:23:47 2016 GN: remove unnecessary -gsplit-dwarf flag. On GYP, we had to add this flag manually. On GN, this is unnecessary and may be causing some problems. BUG= chromium:629593 TEST=Built with simple chrome workflow. Change-Id: I34706e93697e44b10e377bd49476f693c303dbf9 Reviewed-on: https://chromium-review.googlesource.com/363505 Commit-Ready: Luis Lozano <llozano@chromium.org> Tested-by: Luis Lozano <llozano@chromium.org> Reviewed-by: Steven Bennetts <stevenjb@chromium.org> [modify] https://crrev.com/208cbe37d01f02d13c028a7520fa5b7ef4985c02/cli/cros/cros_chrome_sdk.py
,
Jul 29 2016
The patch in #14 only changes flags for nacl_helper_bootstrap. I do not think that is enough to work around this issue. I was able to get NaCl working again using the simple chrome workflow and recompiling all of nacl_helper w/out AFDO:
ninja -C out_${SDK_BOARD}/Release -j500 nacl_helper
deploy_chrome --build-dir=out_${SDK_BOARD}/Release --to=${IP}
However, I think this actually recompiles a lot of dependencies, as well.
I am trying to repro now to see all that changed.
,
Jul 29 2016
Rebuilding nacl_helper without afdo has 9206 ninja steps, as shown in the attached build_nacl_helper_no_afdo.log. One of these steps made a difference. To repro: (1) Enter simple chrome sdk cros chrome-sdk --board=elm --chroot=~/chromeos/chroot/ --internal (2) Checkout 54.0.2809.0 cd src git checkout 54.0.2809.0 (2.a) Optional: Apply testing patch: https://codereview.chromium.org/2188633004/ <= This doesn't help [*] (3) Download and extract afdo data using gsutil: gsutil cp gs://chromeos-prebuilt/afdo-job/canonicals/chromeos-chrome-amd64-54.0.2808.0_rc-r1.afdo.bz2 . bunzip2 chromeos-chrome-amd64-54.0.2808.0_rc-r1.afdo.bz2 (4) Configure gn to use afdo data: # use gsutil to download .afdo for $SDK_BOARD rm -rf out_$SDK_BOARD/Release gn gen out_$SDK_BOARD/Release --args="$GN_ARGS" gn args out_$SDK_BOARD/Release # update args to match attached args_afdo_nogoma.gn (5) build everything w/ afdo (& no goma) ninja -C out_${SDK_BOARD}/Release -j 96 chrome chrome_sandbox nacl_helper (6) deploy to device deploy_chrome --build-dir=out_${SDK_BOARD}/Release --to=${IP} => NaCl fails to load. ## Now, rebuild just nacl_helper w/out afdo: (7) Configure gn to use afdo data: # use gsutil to download .afdo for $SDK_BOARD gn args out_$SDK_BOARD/Release # update args to match attached args_no_afdo.gn (8) Rebuild just nacl_helper w/out afdo ninja -C out_${SDK_BOARD}/Release -j 96 nacl_helper (9) deploy to device deploy_chrome --build-dir=out_${SDK_BOARD}/Release --to=${IP} => NaCl now loads ok! [*] Notice the line for nacl_bootstrap.o does NOT have "-fauto-profile=" copmile flag in attached build_full_afdo.log.
,
Jul 29 2016
nacl_helper is a separate binary that runs sandboxed. It's not special like nacl_helper_bootstrap so in the abstract things like FDO could be fine in nacl_helper. But perhaps the sandbox context it runs in, or other details of what code is and isn't built into nacl_helper, interfere with the FDO runtime to break things. I don't know enough off hand about how the FDO runtime works to offer specific advice.
,
Jul 29 2016
Okay, the current hypothesis is that - NaCl on ARM is and always has been broken in GN w/ this set of build flags. - The reason we didn't see an issue with GYP is that due to the separate NaCl build process in GYP, the same set of flags (AFDO) weren't getting used for the NaCL binaries. Our options at the moment are: - flip back to GYP for this canary build - try to figure out what's wrong and fix it (an unknown effort) - turn AFDO off (presumably making the build slower) - try to figure out how to turn AFDO off for just the NaCl targets The latter is possibly the right thing to do in the long run, and I can work on it, but it may take a few hours before I even have something I can test. This means that verifying it for a canary that would ship on Monday might be difficult.
,
Jul 29 2016
- Flipping back to GYP would create a huge amount of churn and confusion, I would strongly prefer any other option. - Turning off AFDO for NaCl targets sounds like the right thing to do to me. If we absolutely need a Canary without a broken NaCl (which affects accessibility and IME) on Monday, I would vote for disabling AFDO entirely for the short term. Unless that has consequences beyond a small performance impact, it sounds like something we can live with for a Canary release or two while we figure this out. It certainly wouldn't be unprecedented for a Canary to go out with performance issues., llozano@, thoughts?
,
Jul 29 2016
+1 to figuring out how to turn AFDO off just for Nacl targets. Out of curiosity, how much slower would we expect (ballpark guess) the build to get if we had to completely turn AFDO off?
,
Jul 29 2016
okay, so, after having looked at this a bit more and refreshed my memory ... if the problem is in nacl_helper, and not nacl_helper_bootstrap, then life is in a sense more complicated. nacl_helper is a regular linux executable that is compiled w/ the default GN toolchain (the same one that builds chrome, using the same flags and libraries). Because -fauto-profile is getting set in cros_extra_cflags, that flag is applied unconditionally to all objects, and hence there's no way for us to disable this just for the nacl_helper files. Moreover, of the ~9000 objects needed for nacl_helper, only ~4 of them are nacl_helper-specific, and the rest are shared w/ chrome, so we either have to build twice or turn off afdo either way. This suggests that either some change has come in that has regressed things either way (and a current GYP build would be broken, too), or maybe the binaries are just different enough between GYP and GN to tickle some bug in AFDO (they should be roughly the same, but might be slightly different and we'd have to look into how exactly). So, I'm punting this back to llozano@, who I think is going to try and bisect things for a while and see if he can find a change that breaks things. If not, we might have to turn off AFDO until we can figure out what's wrong.
,
Jul 30 2016
my apologies for late reply. I have been working very hard on this problem and I ignored email for a few hours. Been consulting with dpranke and some of his updates come from these discussions. - Long term solution is to turn off AFDO just for the NaCL targets. - For Monday, unless I find a solution before, I will have to TEMPORARY disable AFDO for ARM. - AFDO brings upto %15-20 performance improvement on some benchmarks. - I am currently bisecting objects files with and without AFDO. It takes time. Hopefully will find the set of objects where we need to disable AFDO.
,
Jul 30 2016
bisection points to this file: base/allocator/tcmalloc/debugallocation_shim.o I am investigating.
,
Jul 30 2016
If that file is problematic, maybe you can try building with tcmalloc disabled (use_allocator = "none") and see if that goes away? This might have a perf hit, but I think we've actually disabled tcmalloc by default on everything but linux (and it's disabled on linux for mipsel and arm64 as well).
,
Jul 30 2016
+primiano ... Or, possibly try setting use_experimental_allocator_shim=false.
,
Aug 1 2016
I would prefer not to change allocators. I dont want to deal with the performance implications.
I would prefer just to disable AFDO for this file. According to the comments I saw, Some of the functionality in this file depends on particular stack traces wich can be modified by different optimizations levels. I prefer to to just pass -fno-auto-profile and leave this file alone.
I did a little bit more of investigation and found that disabling AFDO for function do_free_with_callback fixes the problem.
Like this:
#pragma GCC optimize ("no-auto-profile")
inline void do_free_with_callback(void* ptr, void (*invalid_free_fn)(void*))
{ .... }
#pragma GCC reset_options
but I prefer not to touch the code. As I said, some of the comments in there make me nervous about enabling more optimizations for that file.
Lets just disable AFDO for the file.
@dpranke, can you help me with this? I don't have enough knowledge of GN to do this.
,
Aug 1 2016
use_experimental_allocator_shim should have no effect on tcmalloc files. Just changes the place where the malloc & friends symbol are introduced, but shouldn't make any change to the way debugallocation_shim.cc is build (There is a bit of naming conflict here, debugallocation_shim is something that has been in tcmalloc since forever and has nothing to do with the allocator shim which is conditioned by use_experimental... which is newer) Note that in release builds debugallocation_shim.cc == tcmalloc.cc see https://cs.chromium.org/chromium/src/base/allocator/debugallocation_shim.cc?q=debugallocation_shim&sq=package:chromium&dr&l=8&ssfr=1
,
Aug 1 2016
@primiano, can you comment on #29 and if you agree with just disabling this optimization for that file?
,
Aug 1 2016
re #31. I cannot find any trace of that "-fauto-profile" in the GYP/GN files. I imagine this comes from some internal CrOS magic build thing? Can't you do something at that level? I don't have objections to auto-profile flags itself. My only worry is that I don't see any precedent for having #pragma GCC optimize in the codebase. I suggest asking some base owner about this (+thakis). As long as they are fine, I'm okay with that. From an allocator viewpoint tcmalloc is used only on Linux/CrOS, and I never heard about -fauto-profile in Chrome before so toggling this flag should practically affect only CrOS.
,
Aug 1 2016
I did a bit more debugging today and discovered that: (1) The specific nacl loader failure goes away with (set this in /etc/chrome_dev.conf): NACLVERBOSITY=4 (2) In fact it is even fixed with this tiny change: (a) NACLVERBOSITY=0 (b) Change log level from 4->0 of this one line in NaClAppThreadLauncher in native_client/src/trusted/service_runtime/nacl_app_thread.c: [1] NaClLog(0, "NaClAppThreadLauncher: entered\n"); (3) However, NaCl still fails w/ SIGSEGV if the line [1] is moved below this block: NaClXMutexLock(&natp->nap->threads_mu); natp->thread_num = NaClAddThreadMu(natp->nap, natp); NaClXMutexUnlock(&natp->nap->threads_mu); So, this suggests that there might be a race condition when launching NaCl app threads that is being exposed by the .afdo optimizations.
,
Aug 1 2016
As I said in comment #24, because -fauto-profile is set as part of cros_extra_cflags, it is applied to every file in target and there's no way we can skip it for just the one file. Even if we were to pull out afdo into a separate GN variable (which is probably the right thing to do), the only way to disable flags in GN is on a target-by-target basis, not a file-by-file basis. That means we'd have to disable afdo for all of tcmalloc (which might be okay). In other words, the simplest thing to do is to add the #pragmas in comment #29, but we can figure out how to refactor things so that either we can toggle off afdo for all of tcmalloc, or toggle off all of cros_extra_cflags for tcmalloc if need be.
,
Aug 1 2016
I think it is better in this case to modify the calls to the compiler than to modify the source. I may have a hard time adding those pragmas to the source code. My order of preference for the solution would be: 1) ADD (not skip) -fno-auto-profile at the END of the flags to the compilation of base/allocator/tcmalloc/debugallocation_shim.cc 2) ADD -f-no-auto-profile to the compilation of all of base/allocator/tcmalloc 3) remove all cros_extra_cxxflags for base/allocator/tcmalloc 4) Add the pragmas to the source code.
,
Aug 1 2016
As I've said, you can't set flags per-file in GN (or in GYP, for that matter), so in order to do #1 in GN, you would have to move debugallocation_shim.cc into a separate target on its own. I don't think we should do that. In order to do either #2 or #3 we would need to rework the way cros_extra_cflags is implemented, which is probably a good idea (the current implementation, which I did, was dumb). However, this is something of an invasive change so it might not land today if we go this route. In GN we really don't like randomly tacking on -fenable-x and -fdisable-y to the ends of command lines; it makes the build very hard to reason about. The way to do this, which I would be much more in favor of, would be to turn AFDO into a proper build flag and not setting it via cros_extra_cflags at all. We would still need to do that for the entire tcmalloc target. It seems like the underlying problem here is that AFDO is buggy, and so the end goal should be to fix AFDO so that we don't need a specific workaround for tcmalloc, right? Given that, it seems like we should add the pragmas -- which is way easier than any of these other things -- and a TODO for now, and then implement AFDO as a build flag, change the way cros_extra_cflags works, and fix gcc all as follow-on, non urgent tasks. I'm not sure why you're reluctant to add the pragma; it's no more difficult than changing the build files. All that said, it might be better for thakis@ or brettw@ as an OWNER for //base to weigh in in case there's some real gotcha that would lead us to not want the pragmas.
,
Aug 1 2016
I am reluctant to add the pragmas to the source because usually source code owner see this as "polluting the code". I just asked thakis, and he said he "would prefer not to". I dont want to call AFDO buggy. Sure, I am sure AFDO has bugs like any other piece of code but what happens in some casees is that it correctly optimizes something and the code breaks because it has wrong assumptions (like order of things in the stack as an example). I like your idea about making this a separate flag. But can we do something else as quick workaround? Disable cxx_extra_flags for tcmalloc? Josafat wants a workaround for this by today since he wants to do a dev release.
,
Aug 1 2016
I agree with Dirk. If AFDO is buggy on this one file. I don't see why adding pragmas in it is inherently bad. Changing the code in trivial ways is much better than changing the build in non-trivial ways, which I think is the alternative. I also agree that this kind of thing should use proper build flags.
,
Aug 1 2016
I am much more comfortable adding pragmas than anything else if we need something "quickly" (today). I can work on either or both of the other changes, but am less confident that we can get them in today, given the other breakages I'm also working on. We can say "AFDO doesn't work" rather than "AFDO is buggy" if that helps ;).
,
Aug 1 2016
Got agreement from Thakis to add pragmas to the source code under the agreement that this is only a workaround until we provide the solution dpranke is suggesting. I will send a CL for review in a few minutes.
,
Aug 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a6f64234d71eeb6cd0571bd963301c8a617889f4 commit a6f64234d71eeb6cd0571bd963301c8a617889f4 Author: llozano <llozano@chromium.org> Date: Mon Aug 01 22:39:55 2016 Workaround for issue breaking NACL on ChromeOS. ChromeOS uses AFDO to improve Chrome performance. However, after the GN migration, NACL broke. We triaged this to using AFDO on tcmalloc. tcmalloc has some dependencies on specific stack traces and section allocations that may break once AFDO is used. So, we need to disable AFDO for the file debugallocation_shim.cc (object bisecting pointed to it). We need a quick workaround, so we are doing this with a GCC pragma. The real solution will come in a few days by a change in GN to avoid passing -fauto-profile to tcmalloc files. BUG= chromium:629593 TEST=verified by hand problem does not reproduce after this change. Review-Url: https://codereview.chromium.org/2198253002 Cr-Commit-Position: refs/heads/master@{#409081} [modify] https://crrev.com/a6f64234d71eeb6cd0571bd963301c8a617889f4/base/allocator/debugallocation_shim.cc
,
Aug 2 2016
I've posted a CL that illustrates roughly how I think we might make option #3 work, but the approach has downsides (as discussed in the CL) and it's not obvious to me that it's better than what we're doing now in total (i.e., we might introduce other bugs and weirdnesses with it). Really the only good answer w/ GN is to push as many flags as possible, if not all of them, into the build files, and not use extra flags for anything.
,
Aug 2 2016
Are we expecting other CLs for this bug?
,
Aug 2 2016
There is more work to be done. If NaCl is currently working, we can close this bug, but we should file a follow-up one.
,
Aug 2 2016
Following up on #33:
The actual failure I see during NaClAppThreadLauncher() comes from:
NaClAppThreadLauncher() ->
natp->thread_num = NaClAddThreadMu(natp->nap, natp); ->
DynArraySet(&nap->threads, pos, natp) ->
new_space = realloc(dap->ptr_array, desired_space * sizeof *new_space); -...->
do_realloc_with_callback() -...->
do_malloc() ->
ThreadCache::GetCache()
Here is ThreadCache::GetCache, from third_party/tcmalloc/chromium/src/tcmalloc.c:
inline ThreadCache* ThreadCache::GetCache() {
ThreadCache* ptr = NULL;
if (!tsd_inited_) {
InitModule();
} else {
ptr = GetThreadHeap();
}
if (ptr == NULL) ptr = CreateCacheIfNecessary();
return ptr;
}
And here is GetThreadHeap():
inline ThreadCache* ThreadCache::GetThreadHeap() {
#ifdef HAVE_TLS
// __thread is faster, but only when the kernel supports it
if (KernelSupportsTLS())
return threadlocal_heap_;
#endif
return reinterpret_cast<ThreadCache *>(
perftools_pthread_getspecific(heap_key_));
}
HAVE_TLS is defined, and KernelSupportsTLS() returns true, so GetThreadHeap() returns __thread variable threadlocal_heap_.
However, when launching the second NaCl App thread, threadlocal_heap_ is initialized to 0xffffffff (not 0x00000000 as might be expected), and therefore a new ThreadCache is not created.
threadlocal_heap_ is defined with a massive comment:
// If TLS is available, we also store a copy of the per-thread object
// in a __thread variable since __thread variables are faster to read
// than pthread_getspecific(). We still need pthread_setspecific()
// because __thread variables provide no way to run cleanup code when
// a thread is destroyed.
// We also give a hint to the compiler to use the "initial exec" TLS
// model. This is faster than the default TLS model, at the cost that
// you cannot dlopen this library. (To see the difference, look at
// the CPU use of __tls_get_addr with and without this attribute.)
// Since we don't really use dlopen in google code -- and using dlopen
// on a malloc replacement is asking for trouble in any case -- that's
// a good tradeoff for us.
#ifdef HAVE_TLS
static __thread ThreadCache* threadlocal_heap_
// This code links against pyautolib.so, which causes dlopen() on that shared
// object to fail when -fprofile-generate is used with it. Ideally
// pyautolib.so should not link against this code. There is a bug filed for
// that:
// http://code.google.com/p/chromium/issues/detail?id=124489
// For now the workaround is to pass in -DPGO_GENERATE when building Chrome
// for instrumentation (-fprofile-generate).
// For all non-instrumentation builds, this define will not be set and the
// performance benefit of "intial-exec" will be achieved.
#if defined(HAVE___ATTRIBUTE__) && !defined(PGO_GENERATE)
__attribute__ ((tls_model ("initial-exec")))
# endif
;
#endif
And... sure enough, removing the "initial-exec" attribute for threadlocal_heap_ also fixes the issue.
,
Aug 2 2016
about #45, thanks for this finding djkurtz. Not sure yet how this finding ties with my finding.. Anyway, maybe a possible solution would be for nacl_helper to not use tcmalloc? it seems tcmalloc has some restrictions that are being broken by nacl_helper? @bradnelson, can you comment on this? (It seems he had not been copied on this bug before?)
,
Aug 2 2016
Should I open a different bug? there is some discussion to be had about this issue but this bug is a P1 and is marked as a release blocker.
,
Aug 2 2016
You cannot disable tcmalloc just for nacl_helper.
If this analysis is correct (and I have no particular reason to doubt it).
The bug mentioned in that comment became irrelevant quite some time ago (pyauto is gone).
However, it may be that the perf benefit of tls_model("initial-exec") isn't worth it, and we should just remove that altogether.
(Frankly, I'm not even sure tcmalloc is still worth it and we should probably re-confirm that belief.)
,
Aug 2 2016
re c#47, if NaCL is working with above CLs then I would prefer to mark this as fixed and file new bug(s) to follow up on any remaining work
,
Aug 2 2016
we have a short term workaround. So, moving the discussion to chromium:633719. And marking this as "Fixed".
,
Aug 3 2016
Will comment on the new issue. Sorry for slow reply.
,
Aug 11 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2ffe82493c719eb280aa895608f16ae80898ad68 commit 2ffe82493c719eb280aa895608f16ae80898ad68 Author: dpranke <dpranke@chromium.org> Date: Thu Aug 11 16:27:14 2016 Add a way for CrOS toolchains to set custom flags for the nacl bootstrap. R=llozano@chromium.org, bradnelson@chromium.org BUG= 629593 Review-Url: https://codereview.chromium.org/2188633004 Cr-Commit-Position: refs/heads/master@{#411350} [modify] https://crrev.com/2ffe82493c719eb280aa895608f16ae80898ad68/build/toolchain/cros/BUILD.gn [modify] https://crrev.com/2ffe82493c719eb280aa895608f16ae80898ad68/build/toolchain/cros_toolchain.gni
,
Aug 16 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2af5a28d83c844c1f58aacb59a9b8b1cb234b129 commit 2af5a28d83c844c1f58aacb59a9b8b1cb234b129 Author: mostynb <mostynb@opera.com> Date: Tue Aug 16 16:57:03 2016 -fno-auto-profile is only available beginning with stock GCC 5 This followup to https://codereview.chromium.org/2198253002 unbreaks stock GCC 4.8/4.9 builds. (We suspect that the chromeos GCC 4.9 toolchain has a local patch for this feature.) BUG= 629593 Review-Url: https://codereview.chromium.org/2244983002 Cr-Commit-Position: refs/heads/master@{#412267} [modify] https://crrev.com/2af5a28d83c844c1f58aacb59a9b8b1cb234b129/base/allocator/debugallocation_shim.cc
,
Oct 13 2016
Verified on build 8743.65.0 |
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by dmazz...@chromium.org
, Jul 21 2016