Regression "libbase.cr.so: wrong ELF class: ELFCLASS32" on android asan tot bot
Issue description
https://build.chromium.org/p/chromium.fyi/builders/ClangToTAndroidASan%20tester/builds/1149/steps/components_browsertests/logs/stdio

Traceback (most recent call last):
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/device/local_device_test_run.py", line 64, in wrapper
    return f(dev, *args, **kwargs)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/device/local_device_gtest_run.py", line 274, in individual_device_set_up
    step()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/device/local_device_gtest_run.py", line 267, in init_tool_and_start_servers
    s.SetUp()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/local_test_server_spawner.py", line 33, in SetUp
    [(self.port, self.port)], self._device, self._tool)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 85, in Map
    instance = Forwarder._GetInstanceLocked(tool)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 195, in _GetInstanceLocked
    Forwarder._instance = Forwarder(tool)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 215, in __init__
    self._InitHostLocked()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 269, in _InitHostLocked
    self._KillHostLocked()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 322, in _KillHostLocked
    '\n'.join(output)))
HostForwarderError: /b/build/slave/ClangToTAndroidASan_tester/build/src/out/Debug/host_forwarder exited with 1:

I 73.977s individual_device_set_up(0693cc46003be734) Adding 0693cc46003be734 to blacklist /b/build/slave/ClangToTAndroidASan_tester/build/src/out/bad_devices.json for reason: individual_device_set_up
I 73.987s TimeoutThread-1-for-individual_device_set_up(073141810069b922) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 root
I 74.032s TimeoutThread-1-for-individual_device_set_up(073141810069b922) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 wait-for-device
I 74.037s TimeoutThread-1-for-individual_device_set_up(073141810069b922) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 shell '( test -d /storage/emulated/legacy );echo %$?'
I 74.087s TimeoutThread-1-for-individual_device_set_up(073141810069b922) condition 'sd_card_ready' met (0.1s)
I 74.087s TimeoutThread-1-for-individual_device_set_up(073141810069b922) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 shell '( pm path android );echo %$?'
I 74.231s TimeoutThread-1-for-individual_device_set_up(06931af4003be783) condition 'pm_ready' met (1.0s)
I 74.232s TimeoutThread-1-for-individual_device_set_up(06931af4003be783) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( getprop sys.boot_completed );echo %$?'
I 74.292s TimeoutThread-1-for-individual_device_set_up(06931af4003be783) condition 'boot_completed' met (1.0s)
I 74.300s TimeoutThread-1-for-individual_device_set_up(06931af4003be783) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( su -c ls /root && ! ls /root );echo %$?'
I 74.353s TimeoutThread-1-for-individual_device_set_up(06931af4003be783) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( echo -n 20.000000 > /data/local/tmp/chrome_timeout_scale );echo %$?'
I 74.404s individual_device_set_up(06931af4003be783) Allocate port 10202 for test server.
I 74.404s individual_device_set_up(06931af4003be783) Creating new spawner on port: 10202.
I 74.406s TimeoutThread-1-for-individual_device_set_up(06931af4003be783) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( echo -n 10202:0 > /storage/emulated/legacy/net-test-server-ports );echo %$?'
I 74.470s individual_device_set_up(06931af4003be783) Killing host_forwarder.
I 74.470s individual_device_set_up(06931af4003be783) [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/out/Debug/host_forwarder --kill-server
C 74.474s individual_device_set_up(06931af4003be783) STDERR: /b/build/slave/ClangToTAndroidASan_tester/build/src/out/Debug/host_forwarder: error while loading shared libraries: libbase.cr.so: wrong ELF class: ELFCLASS32
I 74.474s individual_device_set_up(06931af4003be783) [host]> pkill -9 host_forwarder
E 74.485s individual_device_set_up(06931af4003be783) Shard failed: individual_device_set_up(06931af4003be783)

Has either of you seen this before?
,
Aug 5 2016
I think the host forwarder gn file is wrong.
Here's the gyp file: https://cs.chromium.org/chromium/src/tools/android/forwarder2/forwarder.gyp?q=host_forwarder+file:%5C.gyp&sq=package:chromium&l=16&dr=C
Here's the gn file: https://chromium.googlesource.com/chromium/src/+blame/master/tools/android/forwarder2/BUILD.gn
The gyp file builds host_forwarder only for host. Both target and host binaries in gyp go into the output root (out/Release).
...actually, why is there a libbase.cr.so at all? That bot shouldn't be doing a component build (?)
,
Aug 5 2016
Hm, the bot's always been doing debug component builds since it was added: https://codereview.chromium.org/945043003/diff/20001/scripts/slave/recipe_modules/chromium/chromium_fyi.py?context=25&column_width=80&tab_spaces=8 It used to work. It's still kind of strange that it's doing that.
,
Aug 5 2016
thakis@thakis:~/src/chrome/src$ ldd out/Debug/host_forwarder
linux-vdso.so.1 => (0x00007fff2d777000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa2b3a93000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa2b378f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa2b3489000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa2b3273000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa2b3055000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa2b2c90000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa2b3c9b000)
thakis@thakis:~/src/chrome/src$ ldd out/gnand/host_forwarder
linux-vdso.so.1 => (0x00007ffd393f5000)
libbase.cr.so => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa878f6d000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa878d65000)
libgmodule-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0 (0x00007fa878b61000)
libgobject-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007fa878910000)
libgthread-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgthread-2.0.so.0 (0x00007fa87870e000)
libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007fa878406000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa878102000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa877dfc000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa877be6000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa8779c8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa877603000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa879171000)
libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007fa8773fb000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fa8771bd000)
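For comparing the two builds, the unresolved entries can be pulled out of ldd output mechanically. The following is only a sketch: `missing_libraries` is a hypothetical helper name, and it assumes the usual `name => target` line layout seen in the output above.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'.

    Hypothetical helper for illustration; it assumes each dependency line
    has the form 'name => target', as in the ldd output above.
    """
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        # Lines without ' => ' (e.g. the dynamic loader path) are skipped.
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


sample = """\
linux-vdso.so.1 => (0x00007ffd393f5000)
libbase.cr.so => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa878f6d000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa879171000)
"""
print(missing_libraries(sample))
```

On the gn build output above, this would single out libbase.cr.so, the one dependency the gyp-built host_forwarder does not have at all.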
,
Aug 5 2016
Aha, base.gyp has this:
['OS == "android" and _toolset == "host"', {
  # Always build base as a static_library for host toolset, even if
  # we're doing a component build. Specifically, we only care about the
  # target toolset using components since that's what developers are
  # focusing on. In theory we should do this more generally for all
  # targets when building for host, but getting the gyp magic
  # per-toolset for the "component" variable is hard, and we really only
  # need base on host.
  'type': 'static_library',
  # Base for host support is the minimum required to run the
  # ssl false start blacklist tool. It requires further changes
  # to generically support host builds (and tests).
  # Note: when building for host, gyp has OS == "android",
  # hence the *_android.cc files are included but the actual code
  # doesn't have OS_ANDROID / ANDROID defined.
  'conditions': [
    ['host_os == "mac"', {
      'sources/': [
        ['exclude', '^native_library_linux\\.cc$'],
        ['exclude', '^process_util_linux\\.cc$'],
        ['exclude', '^sys_info_linux\\.cc$'],
        ['exclude', '^sys_string_conversions_linux\\.cc$'],
        ['exclude', '^worker_pool_linux\\.cc$'],
      ],
    }],
  ],
}],
,
Aug 5 2016
Hm, the internal bot https://uberchromegw.corp.google.com/i/internal.client.clank/builders/asan-clang-phone/builds/747 builds debug component as well. It doesn't run gfx_unittests or components_browsertests though.
,
Aug 5 2016
,
Aug 5 2016
mikecase, jbudorick: You've written the "devil" code that calls md5sum_bin_host, which gets run if gfx_unittests is run (but apparently not for other tests?). In debug component builds with gn, md5sum_bin_host is symlinked to out/gn/md5sum_bin_host but the .cr.so files it depends on are still in the host build dir, so the binary can't run. How is this supposed to work? And why is this binary only needed for gfx_unittests?
,
Aug 5 2016
Hm, the failing tests are the only ones listed here: https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_fyi.py?rcl=0&l=1618. Not sure where the other tests the bot runs are coming from.
,
Aug 5 2016
Oh, it's not running any other tests, never mind that last sentence. The internal asan bots also run out/Debug/md5_bin_host (see e.g. https://uberchromegw.corp.google.com/i/internal.client.clank/builders/asan-clang-phone/builds/747/steps/content_browsertests/logs/stdio) and do component builds. I have no idea why that works there but not on this bot. From my current understanding it shouldn't work there either.
,
Aug 5 2016
Sorry, thakis, I missed this issue when you initially filed it. I'll look into this in more detail and get back to you.
,
Aug 5 2016
It seems like the host binary is somehow managing to link the device library. I was only able to repro by explicitly moving the device version of libbase.cr.so to the host version's location. I'm guessing that we're seeing this on the chromium.fyi bots but not the internal bot because the latter is builder+tester while the former are split? I'm going to try to catch the tester during a run to investigate further.
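To tell which flavor of libbase.cr.so is sitting next to a host binary, it is enough to look at the ELF header: byte 4 of the file (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64. A minimal sketch follows; `elf_class` is a hypothetical helper name, not something from the Chromium tree.

```python
ELF_MAGIC = b"\x7fELF"
# e_ident[EI_CLASS] values from the ELF specification.
EI_CLASS_NAMES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(path):
    """Return 'ELFCLASS32', 'ELFCLASS64', 'unknown', or None for non-ELF files.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None  # Too short or missing the ELF magic number.
    return EI_CLASS_NAMES.get(ident[4], "unknown")
```

Calling it on the libbase.cr.so in the output root and on the copy in the host toolchain directory should show which of the two the 64-bit host binary is picking up.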
,
Aug 5 2016
Ok, I think I know what's going on. On the bot (or building locally), md5sum_bin_host is a symlink to clang_x64/md5sum_bin. Somewhere during the zip/unzip transfer to the tester, the symlink gets resolved and md5sum_bin_host becomes a copy of md5sum_bin rather than a symlink to it. At that point, it's in the same directory as the device version of libbase.cr.so, not the host version, and we get the ELFCLASS32 error. (This can be reproduced locally by removing the md5sum_bin_host symlink and copying the clang_x64/md5sum_bin binary to md5sum_bin_host.) I'm looking into options for resolving this.
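The symlink-to-copy effect is easy to reproduce in isolation. The sketch below uses Python's `zipfile` module purely as a stand-in (the thread does not say which archiving step the bots use), and the file contents are fake. `ZipFile.write` follows symlinks, so the archive stores the target's bytes and extraction yields a regular file.

```python
import os
import tempfile
import zipfile


def roundtrip_through_zip(src_dir, dst_dir):
    """Zip every file under src_dir, then extract the archive into dst_dir."""
    archive = os.path.join(dst_dir, "build.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # write() opens the path normally, so a symlink is stored
                # as a copy of the file it points to.
                zf.write(full, os.path.relpath(full, src_dir))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dst_dir)


def demo():
    """Return (is_symlink_before, is_symlink_after) for md5sum_bin_host."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.mkdir(os.path.join(src, "clang_x64"))
        with open(os.path.join(src, "clang_x64", "md5sum_bin"), "wb") as f:
            f.write(b"fake host binary")  # Placeholder, not a real executable.
        os.symlink(os.path.join("clang_x64", "md5sum_bin"),
                   os.path.join(src, "md5sum_bin_host"))
        roundtrip_through_zip(src, dst)
        before = os.path.islink(os.path.join(src, "md5sum_bin_host"))
        after = os.path.islink(os.path.join(dst, "md5sum_bin_host"))
        return before, after
```

After the round trip, md5sum_bin_host lives in the output root as a plain file, so shared libraries found next to it are the ones in the output root rather than those in clang_x64.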
,
Aug 5 2016
Thanks for investigating, that makes sense. The easiest fix is probably if we don't split that bot into builder and tester. If you don't use a split config elsewhere, we don't need to make the configuration matrix larger just for this one bot.
,
Aug 5 2016
sgtm, I'll send a CL over in a bit.
,
Aug 5 2016
I think I've seen that swarming has the same behaviour, and also have found scripts that assume non-component mode. Ideally we'd have every device binary use create_native_executable_dist() and have all scripts understand that executables can have dependencies. In practice, it might be better to just try to avoid component mode on bots though.
,
Aug 5 2016
Avoiding component mode on bots seems to me like it'd cause more problems than it solves.
,
Aug 9 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/7b023b1936593c7360fe7a3b6a0d2c6002a174cc

commit 7b023b1936593c7360fe7a3b6a0d2c6002a174cc
Author: recipe-roller <recipe-roller@chromium.org>
Date: Tue Aug 09 00:22:52 2016

Roll recipe dependencies (trivial).

This is an automated CL created by the recipe roller. This CL rolls recipe changes from upstream projects (e.g. depot_tools) into downstream projects (e.g. tools/build).

More info is at https://goo.gl/zkKdpD. Use https://goo.gl/noib3a to file a bug (or complain)

build:
https://crrev.com/4b75224ac514cc0617c184bf2abd58174cdd6b41 Use VCPROFILE_ALLOC_SCALE for the Win32 PGO builds. (sebmarchand@chromium.org)
https://crrev.com/b10eaabb7d78114de37325fe33b8f8ae703d5ca0 [Android] Merge ClangToTAndroidASan builder+tester. (jbudorick@chromium.org)

R=sebmarchand@chromium.org,jbudorick@chromium.org
BUG= 616118 ,632864
TBR=martiniss@chromium.org,phajdan.jr@chromium.org

Review-Url: https://codereview.chromium.org/2226793003
Cr-Commit-Position: refs/heads/master@{#410514}

[modify] https://crrev.com/7b023b1936593c7360fe7a3b6a0d2c6002a174cc/infra/config/recipes.cfg
,
Aug 9 2016
post-merge, gfx_unittests is running and only failing one test. components_unittests was still passing the checked-in .isolate file, though, and angle made some changes recently that broke those. https://codereview.chromium.org/2226753003/ should fix that issue. Once that lands, we'll be able to see whether the ELFCLASS error is gone (which it should be).
,
Aug 27 2016
While this bot is still troubled by gclient runhooks and gfx_unittests failures, the ELFCLASS error is gone.
Comment 1 by thakis@chromium.org
, Aug 5 2016