Provide long term solution for problem with nacl_helper, tcmalloc and AFDO. |
||||||||
Issue descriptionThis is a continuation bug for chromium:629593. Moving discussion about this problem to this issue. We want to close the original bug since we already have a workaround and we want to mark that problem as "Fixed".
,
Aug 3 2016
From comment 45 in issue 629593 , it sounds like the cause of the problem is that a TLS variable, threadlocal_heap_, "is initialized to 0xffffffff [in a new thread] (not 0x00000000 as might be expected), and therefore a new ThreadCache is not created". Is this a toolchain bug where AFDO is causing a thread-local variable to be initialised incorrectly, on ARM, maybe triggered by the variable being declared with the attribute tls_model("initial-exec"), which is rarely used? I'm not familiar with how AFDO works, so I'm not sure how likely it is that it would mess up TLS variables.
,
Aug 4 2016
@#2 - yes, that is what my tracing showed. Enabling AFDO when building tcmalloc for nacl_helper causes the __thread variable threadlocal_heap_ to behave strangely when it has the attribute tls_model("initial-exec").
I say behaving "strangely" because just straight compiling the existing code causes it to be consistently initialized as 0xffffffff, but re-arranging code within nacl_helper and/or tcmalloc can cause other symptoms (ie crashing even earlier) and/or cause the problem to go away.
,
Aug 4 2016
Maybe threadlocal_heap_ is being given the same address as some other TLS variable, one that is initialised to -1?
It might be possible to test that by doing:
printf("%p\n", &threadlocal_heap_);
before threadlocal_heap_ is accessed, and doing the same for other TLS variables.
Can you attach a copy of a nacl_helper executable that exhibits the bug (preferably one with line number debug info)?
,
Aug 5 2016
Some additional information about this problem: when failing, the chrome logs shows this: /var/log/chrome/chrome:[1502:2133:0804/140832:ERROR:nacl_process_host.cc(337)] NaCl process exited with status 139 (0x8b) /var/log/chrome/chrome:[1502:2133:0804/140832:ERROR:nacl_process_host.cc(337)] NaCl process exited with status 139 (0x8b) From one of the bugs file against the issue, the error is: <dt><strong>NativeClient: NaCl module load failed: Nexe crashed during startup</strong></dt> Also, I am not sure if we have paid enough attention to the comment on thread_cache.h (see https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/thread_cache.h?rcl=0&l=253) and related bug https://bugs.chromium.org/p/chromium/issues/detail?id=124489. Is it possible that the nexe files are being dlopen ? or something related to that changed with the GN migration?
,
Aug 5 2016
I also continued playing with the compiler. I found that that with AFDO, the compiler is trying to optimize calls to do_free_with_callback by creating an especialized clone and applying constant propagation. I don't know more than that. There may be an error in the clone or the change has some side effect that is exposing the problem. I am not very motivated to continue looking at the compiler generated code. We are moving away from GCC to LLVM and this bug is actually taking time from my effort to move to LLVM. We do have a compiler workaround (do not use AFDO). So, we can just use that one. But, there might be a lurking bug about the dlopen and the TLS model. Should we stop using that TLS model?
,
Aug 5 2016
about #4 I have put a copy of the bad nacl_helper under /google/data/rw/users/ll/llozano/crosbug_633719
,
Sep 22 2016
,
Oct 3 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bffda9f508c0376ab3ae4b9e52b0367c24f3818a commit bffda9f508c0376ab3ae4b9e52b0367c24f3818a Author: dpranke <dpranke@chromium.org> Date: Mon Oct 03 20:26:38 2016 Add an `auto_profile_path` flag to GN to control AFDO. This change adds a new top-level compiler config to GN, so that we can control whether AFDO should be enabled or not per-target in CrOS official builds (which is where we use AFDO). R=brettw@chromium.org, llozano@chromium.org BUG=633719 Review-Url: https://codereview.chromium.org/2375783002 Cr-Commit-Position: refs/heads/master@{#422513} [modify] https://crrev.com/bffda9f508c0376ab3ae4b9e52b0367c24f3818a/base/allocator/BUILD.gn [modify] https://crrev.com/bffda9f508c0376ab3ae4b9e52b0367c24f3818a/build/config/BUILDCONFIG.gn [modify] https://crrev.com/bffda9f508c0376ab3ae4b9e52b0367c24f3818a/build/config/compiler/BUILD.gn
,
Dec 26 2017
,
Jan 2 2018
[It appears that a bunch of old cros issues bulk-added the "Infra" component recently, but they should probably be "Infra>Client>ChromeOS".]
,
Jan 4 2018
,
Feb 5 2018
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by bradnelson@chromium.org
, Aug 3 2016