Issue 616118

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug




Regression "libbase.cr.so: wrong ELF class: ELFCLASS32" on android asan tot bot

Project Member Reported by thakis@chromium.org, May 31 2016

Issue description

https://build.chromium.org/p/chromium.fyi/builders/ClangToTAndroidASan%20tester/builds/1149/steps/components_browsertests/logs/stdio

Traceback (most recent call last):
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/device/local_device_test_run.py", line 64, in wrapper
    return f(dev, *args, **kwargs)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/device/local_device_gtest_run.py", line 274, in individual_device_set_up
    step()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/device/local_device_gtest_run.py", line 267, in init_tool_and_start_servers
    s.SetUp()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/build/android/pylib/local/local_test_server_spawner.py", line 33, in SetUp
    [(self.port, self.port)], self._device, self._tool)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 85, in Map
    instance = Forwarder._GetInstanceLocked(tool)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 195, in _GetInstanceLocked
    Forwarder._instance = Forwarder(tool)
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 215, in __init__
    self._InitHostLocked()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 269, in _InitHostLocked
    self._KillHostLocked()
  File "/b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/catapult/devil/devil/android/forwarder.py", line 322, in _KillHostLocked
    '\n'.join(output)))
HostForwarderError: /b/build/slave/ClangToTAndroidASan_tester/build/src/out/Debug/host_forwarder exited with 1:

I   73.977s individual_device_set_up(0693cc46003be734)  Adding 0693cc46003be734 to blacklist /b/build/slave/ClangToTAndroidASan_tester/build/src/out/bad_devices.json for reason: individual_device_set_up
I   73.987s TimeoutThread-1-for-individual_device_set_up(073141810069b922)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 root
I   74.032s TimeoutThread-1-for-individual_device_set_up(073141810069b922)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 wait-for-device
I   74.037s TimeoutThread-1-for-individual_device_set_up(073141810069b922)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 shell '( test -d /storage/emulated/legacy );echo %$?'
I   74.087s TimeoutThread-1-for-individual_device_set_up(073141810069b922)  condition 'sd_card_ready' met (0.1s)
I   74.087s TimeoutThread-1-for-individual_device_set_up(073141810069b922)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 073141810069b922 shell '( pm path android );echo %$?'
I   74.231s TimeoutThread-1-for-individual_device_set_up(06931af4003be783)  condition 'pm_ready' met (1.0s)
I   74.232s TimeoutThread-1-for-individual_device_set_up(06931af4003be783)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( getprop sys.boot_completed );echo %$?'
I   74.292s TimeoutThread-1-for-individual_device_set_up(06931af4003be783)  condition 'boot_completed' met (1.0s)
I   74.300s TimeoutThread-1-for-individual_device_set_up(06931af4003be783)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( su -c ls /root && ! ls /root );echo %$?'
I   74.353s TimeoutThread-1-for-individual_device_set_up(06931af4003be783)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( echo -n 20.000000 > /data/local/tmp/chrome_timeout_scale );echo %$?'
I   74.404s individual_device_set_up(06931af4003be783)  Allocate port 10202 for test server.
I   74.404s individual_device_set_up(06931af4003be783)  Creating new spawner on port: 10202.
I   74.406s TimeoutThread-1-for-individual_device_set_up(06931af4003be783)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/third_party/android_tools/sdk/platform-tools/adb -s 06931af4003be783 shell '( echo -n 10202:0 > /storage/emulated/legacy/net-test-server-ports );echo %$?'
I   74.470s individual_device_set_up(06931af4003be783)  Killing host_forwarder.
I   74.470s individual_device_set_up(06931af4003be783)  [host]> /b/build/slave/ClangToTAndroidASan_tester/build/src/out/Debug/host_forwarder --kill-server
C   74.474s individual_device_set_up(06931af4003be783)  STDERR: /b/build/slave/ClangToTAndroidASan_tester/build/src/out/Debug/host_forwarder: error while loading shared libraries: libbase.cr.so: wrong ELF class: ELFCLASS32

I   74.474s individual_device_set_up(06931af4003be783)  [host]> pkill -9 host_forwarder
E   74.485s individual_device_set_up(06931af4003be783)  Shard failed: individual_device_set_up(06931af4003be783)
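
For context: glibc's dynamic loader reports "wrong ELF class: ELFCLASS32" when a 64-bit process is asked to load a 32-bit shared object. The following is a minimal sketch for checking which class a file on disk actually is, by reading only the ELF identification bytes. The helper names are hypothetical and are not part of devil or pylib.

```python
# Sketch only: helper names are assumptions, not devil/pylib APIs.
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # index of the class byte within the ELF identification bytes
ELF_CLASSES = {0: "ELFCLASSNONE", 1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class_from_header(header):
    """Return the ELF class name for the leading bytes of a file.

    Returns None if the bytes do not start with the ELF magic number.
    """
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[EI_CLASS], "unknown")


def elf_class(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(EI_CLASS + 1))
```

Running `elf_class()` on both host_forwarder and the libbase.cr.so sitting next to it would show whether the two disagree, which is the condition the loader is complaining about.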


Has either of you seen this before?
 
Cc: thakis@chromium.org
Issue 634982 has been merged into this issue.
I think the host forwarder gn file is wrong. Here's the gyp file:

https://cs.chromium.org/chromium/src/tools/android/forwarder2/forwarder.gyp?q=host_forwarder+file:%5C.gyp&sq=package:chromium&l=16&dr=C

Here's the gn file:

https://chromium.googlesource.com/chromium/src/+blame/master/tools/android/forwarder2/BUILD.gn

The gyp file builds host_forwarder only for host. Both target and host binaries in gyp go into the output root (out/Release).

...actually, why is there a libbase.cr.so at all? That bot shouldn't be doing a component build (?)
Hm, the bot's always been doing debug component builds since it was added: https://codereview.chromium.org/945043003/diff/20001/scripts/slave/recipe_modules/chromium/chromium_fyi.py?context=25&column_width=80&tab_spaces=8

It used to work. It's still kind of strange that it's doing that.
thakis@thakis:~/src/chrome/src$ ldd out/Debug/host_forwarder 
	linux-vdso.so.1 =>  (0x00007fff2d777000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa2b3a93000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa2b378f000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa2b3489000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa2b3273000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa2b3055000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa2b2c90000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa2b3c9b000)
thakis@thakis:~/src/chrome/src$ ldd out/gnand/host_forwarder 
	linux-vdso.so.1 =>  (0x00007ffd393f5000)
	libbase.cr.so => not found
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa878f6d000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa878d65000)
	libgmodule-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0 (0x00007fa878b61000)
	libgobject-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007fa878910000)
	libgthread-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgthread-2.0.so.0 (0x00007fa87870e000)
	libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007fa878406000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa878102000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa877dfc000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa877be6000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa8779c8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa877603000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa879171000)
	libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007fa8773fb000)
	libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fa8771bd000)
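
The difference between the two listings is the `libbase.cr.so => not found` line in the gn build. A minimal sketch of picking such lines out of ldd output automatically; the function name is hypothetical, and it assumes the `name => not found` line format shown above:

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Dependency lines look like "<name> => <path> (<address>)".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing
```

Feeding it the text from `ldd out/gnand/host_forwarder` above would flag only libbase.cr.so.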

Aha, base.gyp has this:

        ['OS == "android" and _toolset == "host"', {
          # Always build base as a static_library for host toolset, even if
          # we're doing a component build. Specifically, we only care about the
          # target toolset using components since that's what developers are
          # focusing on. In theory we should do this more generally for all
          # targets when building for host, but getting the gyp magic
          # per-toolset for the "component" variable is hard, and we really only
          # need base on host.
          'type': 'static_library',
          # Base for host support is the minimum required to run the
          # ssl false start blacklist tool. It requires further changes
          # to generically support host builds (and tests).
          # Note: when building for host, gyp has OS == "android",
          # hence the *_android.cc files are included but the actual code
          # doesn't have OS_ANDROID / ANDROID defined.
          'conditions': [
            ['host_os == "mac"', {
              'sources/': [
                ['exclude', '^native_library_linux\\.cc$'],
                ['exclude', '^process_util_linux\\.cc$'],
                ['exclude', '^sys_info_linux\\.cc$'],
                ['exclude', '^sys_string_conversions_linux\\.cc$'],
                ['exclude', '^worker_pool_linux\\.cc$'],
              ],
            }],
          ],
        }],
Hm, the internal bot https://uberchromegw.corp.google.com/i/internal.client.clank/builders/asan-clang-phone/builds/747 builds debug component as well. It doesn't run gfx_unittests or components_browsertests though.
Labels: Proj-GN-Migration
Cc: mikec...@chromium.org
mikecase, jbudorick: You've written the "devil" code that calls md5sum_bin_host, which gets run if gfx_unittests is run (but apparently not for other tests?). In debug component builds with gn, md5sum_bin_host is symlinked to out/gn/md5sum_bin_host but the .cr.so files it depends on are still in the host build dir, so the binary can't run. How is this supposed to work? And why is this binary only needed for gfx_unittests?
Cc: h...@chromium.org
Hm, the failing tests are the only ones listed here: https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_fyi.py?rcl=0&l=1618. Not sure where the other tests the bot runs are coming from.
Oh, it's not running any other tests; never mind that last sentence.

The internal asan bots also run out/Debug/md5_bin_host (see e.g. https://uberchromegw.corp.google.com/i/internal.client.clank/builders/asan-clang-phone/builds/747/steps/content_browsertests/logs/stdio) and do component builds. I have no idea why that works there but not on this bot. From my current understanding it shouldn't work there either.
Owner: jbudorick@chromium.org
Status: Assigned (was: Untriaged)
Sorry, thakis, I missed this issue when you initially filed it. I'll look into this in more detail and get back to you.
It seems like the host binary is somehow managing to link the device library. I was only able to repro by explicitly moving the device version of libbase.cr.so to the host version's location.

I'm guessing that we're seeing this on the chromium.fyi bots but not the internal bot because the latter is builder+tester while the former are split? I'm going to try to catch the tester during a run to investigate further.
Ok, I think I know what's going on. On the bot (or building locally), md5sum_bin_host is a symlink to clang_x64/md5sum_bin. Somewhere during the zip/unzip transfer to the tester, the symlink gets resolved and md5sum_bin_host becomes a copy of md5sum_bin rather than a symlink to it. At that point, it's in the same directory as the device version of libbase.cr.so, not the host version, and we get the ELFCLASS32 error. (This can be reproduced locally by removing the md5sum_bin_host symlink and copying the clang_x64/md5sum_bin binary to md5sum_bin_host.)

I'm looking into options for resolving this.
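
The symlink-to-copy effect described above can be illustrated in isolation. This is a sketch using Python's `zipfile` module, which stores the content of a symlink's target rather than the link itself; the archiving tool the bots actually use is not named in this thread, so treat this as an analogy for the mechanism rather than a reproduction of the bot. The file names mirror the ones in the comment, and the function name is hypothetical.

```python
import os
import tempfile
import zipfile


def roundtrip_symlink_through_zip(workdir):
    """Zip and unzip a tree holding a symlink; report link status before/after."""
    src = os.path.join(workdir, "src")
    dst = os.path.join(workdir, "dst")
    os.makedirs(os.path.join(src, "clang_x64"))

    # Placeholder standing in for the real host binary.
    with open(os.path.join(src, "clang_x64", "md5sum_bin"), "wb") as f:
        f.write(b"host binary placeholder")
    # md5sum_bin_host -> clang_x64/md5sum_bin, as on the builder.
    os.symlink(os.path.join("clang_x64", "md5sum_bin"),
               os.path.join(src, "md5sum_bin_host"))

    archive = os.path.join(workdir, "build.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        # ZipFile.write() follows the symlink and stores the target's bytes.
        zf.write(os.path.join(src, "md5sum_bin_host"), "md5sum_bin_host")
        zf.write(os.path.join(src, "clang_x64", "md5sum_bin"),
                 "clang_x64/md5sum_bin")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dst)

    return (os.path.islink(os.path.join(src, "md5sum_bin_host")),
            os.path.islink(os.path.join(dst, "md5sum_bin_host")))
```

Calling it on a scratch directory such as `tempfile.mkdtemp()` shows that the extracted md5sum_bin_host is a regular file, so it resolves shared libraries relative to its new location instead of clang_x64/.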
Thanks for investigating, that makes sense.

The easiest fix is probably if we don't split that bot into builder and tester. If you don't use a split config elsewhere, we don't need to make the configuration matrix larger just for this one bot.
Status: Started (was: Assigned)
sgtm, I'll send a CL over in a bit.
I think I've seen that swarming has the same behaviour, and I've also found scripts that assume non-component mode.

Ideally we'd have every device binary use create_native_executable_dist() and have all scripts understand that executables can have dependencies.

In practice, it might be better to just try to avoid component mode on bots though.
Avoiding component mode on bots seems to me like it'd cause more problems than it solves.
Project Member

Comment 18 by bugdroid1@chromium.org, Aug 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7b023b1936593c7360fe7a3b6a0d2c6002a174cc

commit 7b023b1936593c7360fe7a3b6a0d2c6002a174cc
Author: recipe-roller <recipe-roller@chromium.org>
Date: Tue Aug 09 00:22:52 2016

Roll recipe dependencies (trivial).

This is an automated CL created by the recipe roller. This CL rolls recipe
changes from upstream projects (e.g. depot_tools) into downstream projects
(e.g. tools/build).

More info is at https://goo.gl/zkKdpD. Use https://goo.gl/noib3a to file a bug
(or complain)

build:
  https://crrev.com/4b75224ac514cc0617c184bf2abd58174cdd6b41 Use VCPROFILE_ALLOC_SCALE for the Win32 PGO builds. (sebmarchand@chromium.org)
  https://crrev.com/b10eaabb7d78114de37325fe33b8f8ae703d5ca0 [Android] Merge ClangToTAndroidASan builder+tester. (jbudorick@chromium.org)

R=sebmarchand@chromium.org,jbudorick@chromium.org
BUG=616118,632864

TBR=martiniss@chromium.org,phajdan.jr@chromium.org

Review-Url: https://codereview.chromium.org/2226793003
Cr-Commit-Position: refs/heads/master@{#410514}

[modify] https://crrev.com/7b023b1936593c7360fe7a3b6a0d2c6002a174cc/infra/config/recipes.cfg

Post-merge, gfx_unittests is running and only failing one test. components_unittests was still passing the checked-in .isolate file, though, and angle recently made some changes that broke those. https://codereview.chromium.org/2226753003/ should fix that issue. Once that lands, we'll be able to see whether the ELFCLASS error is gone (which it should be).
Status: Fixed (was: Started)
While this bot is still troubled by gclient runhooks and gfx_unittests failures, the ELFCLASS error is gone.
