
Issue 674998

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




amd64-generic-tot-asan-informational failures related to cryptohome-path (ASAN with high ASLR randomness)

Project Member Reported by steve...@chromium.org, Dec 16 2016

Issue description

Failure:

https://build.chromium.org/p/chromiumos.chromium/builders/amd64-generic-tot-asan-informational/builds/11332

Initially there is a warning:

WARNING: Image format was not specified for '/tmp/cbuildbot-tmp6YoEnz/chromiumos_qemu_disk.bin.hX7GT2' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions

Then most tests fail, e.g.

09:05:51 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py", line 486, in CryptohomePath
09:05:51 INFO | autoserv| AUTOTEST_STATUS::      raise OSError('cryptohome-path failed: %s' % stderr)
09:05:51 INFO | autoserv| AUTOTEST_STATUS::  OSError: cryptohome-path failed: [1216/070520:ERROR:cryptohome.cc(39)] Could not get size of system salt: /home/.shadow/salt: No such file or directory


I suspect this might be a fluke / problem with the build slave (https://build.chromium.org/p/chromiumos.chromium/buildslaves/build330-m2)

I will keep an eye on the next build.

 
Cc: achuith@chromium.org ihf@chromium.org
This failed again. I think the WARNING may be a red herring; we appear to show less output on a successful run.

Investigating the first failure:

12/16 12:19:58.745 INFO | test_runner_utils:0198| autoserv| AUTOTEST_STATUS::		FAIL	security_NetworkListeners	security_NetworkListeners	timestamp=1481912396	localtime=Dec 16 10:19:56	Unhandled OSError: cryptohome-path failed: Segmentation fault
12/16 12:19:58.745 INFO | test_runner_utils:0198| autoserv| FAIL	security_NetworkListeners	security_NetworkListeners	timestamp=1481912396	localtime=Dec 16 12:19:56	Unhandled OSError: cryptohome-path failed: Segmentation fault

FYI: I think that WARNING comes directly from qemu. I've seen that printed every time I start a VM.
Cc: sque@chromium.org derat@chromium.org
Summary: amd64-generic-tot-asan-informational failures related to cryptohome-path (was: PFQ informational: VMTest error: "Image format was not specified")
+derat@, +sque@

I see that there have been some recent changes to cryptohome-path, so I am wondering if those might be related?

Dan: I'm still trying to find out if / where cryptohome-path sends any output. Any idea?

There is some potentially relevant info in messages here:
https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/amd64-generic-tot-asan-informational/R57-9092.0.0-b11333/vm_test_results_1/test_harness/all/SimpleTestVerify/1_autotest_tests/results-01-security_NetworkListeners/security_NetworkListeners/sysinfo/

Lots of snippets like this:

2016-12-16T18:19:07.598075+00:00 WARNING cryptohomed[883]: TSS: Failed unix connect: /var/run/tcsd.socket - No such file or directory
2016-12-16T18:19:07.605843+00:00 WARNING cryptohomed[883]: TSS: Got a list of valid IPs
2016-12-16T18:19:07.606186+00:00 WARNING cryptohomed[883]: TSS: Could not connect to machine: localhost
2016-12-16T18:19:07.606228+00:00 ERR cryptohomed[883]: TSS: Could not connect to any machine in the list.
2016-12-16T18:19:07.606300+00:00 ERR cryptohomed[883]: TSS: Failed to send packet

Then:

2016-12-16T18:19:07.989912+00:00 NOTICE autotest[3260]: 10:19:07.630 ERROR|           browser:0062| Failure while starting browser backend.#012Traceback (most recent call last):#012  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__#012    self._browser_backend.Start()#012  File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function#012    return func(*args, **kwargs)#012  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 166, in Start#012    self._WaitForLogin()#012  File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function#012    return func(*args, **kwargs)#012  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 264, in _WaitForLogin#012    py_utils.WaitFor(self._IsLoggedIn, 900)#012  File "/us

Then more of the TSS failures, then the cryptohome-path failure:

2016-12-16T18:19:08.886294+00:00 NOTICE autotest[3264]: 10:19:08.870 WARNI|              test:0606| Autotest caught exception when running test:#012Traceback (most recent call last):#012  File "/usr/local/autotest/common_lib/test.py", line 600, in _exec#012    _call_test_function(self.execute, *p_args, **p_dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 810, in _call_test_function#012    raise error.UnhandledTestFail(e)#012UnhandledTestFail: Unhandled OSError: cryptohome-path failed: Segmentation fault#012#012Traceback (most recent call last):#012  File "/usr/local/autotest/common_lib/test.py", line 804, in _call_test_function#012    return func(*args, **dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 461, in execute#012    dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry#012    postprocess_profiled_run, args, dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 376, in _call_run_once#012    self.run_once(*args, **dargs)#012  File "/usr/local/autotest/tests/security_NetworkListeners/security_NetworkListeners.py", line 99, i




Comment 5 Deleted

+ngm@ who landed this cryptohome change in the blame list:

https://chromium-review.googlesource.com/#/c/419802/

Comment 7 by ngm@chromium.org, Dec 16 2016

Cc: -lhchavez@chromium.org apronin@chromium.org
I'm unable to find logs for cryptohome.  Are more details available?  

Comment 9 by derat@chromium.org, Dec 17 2016

Sorry, I don't really know anything about cryptohome. cryptohome-path looks like it's a simple wrapper around either brillo::cryptohome::home::GetRootPath() or GetUserPath() that prints to stdout, so knowing that it segfaulted doesn't narrow things down. Is there any way to get a stack trace from the crash?

Comment 10 by apronin@google.com, Dec 17 2016

cryptohome said why it failed: "Could not get size of system salt: /home/.shadow/salt: No such file or directory". This file contains salt for generating obfuscated user names in GetUserPath().

Comment 11 by apronin@google.com, Dec 17 2016

The salt was not created by the cryptohomed daemon in Mount::Init(). cryptohomed initialization failed when it couldn't connect to the tcsd daemon that talks to the TPM ("2016-12-16T18:16:31.051423+00:00 WARNING cryptohomed[883]: TSS: Failed unix connect: /var/run/tcsd.socket - No such file or directory").


Comment 12 by apronin@google.com, Dec 17 2016

tcsd daemon died soon after start:

2016-12-16T18:16:24.460945+00:00 WARNING kernel: [    7.459783] init: tcsd main process (556) terminated with status 127

Likely because the TPM is in the dictionary attack lockout state:

2016-12-16T18:16:24.151505+00:00 NOTICE tcsd-pre-start[544]: WARNING: Non-zero dictionary attack counter found: 100

Comment 13 by apronin@google.com, Dec 17 2016

What machine is this? Is this a physical board? Does it have an actual TPM chip?
This is a VM, so no TPM chip
I'm wondering if the build slave is in a bad state. I requested a restart to see if that clears things up, issue 675655 


Labels: -Pri-2 Pri-1
Cc: steve...@chromium.org
Owner: jdufault@chromium.org
Status: Started (was: Untriaged)
Looks like the slave restart didn't fix the problem. I'll start investigating.
The slave restart happened at:

last puppet run: 2016-12-20 12:12:24 PST

So the first run that started afterwards is still running:
https://build.chromium.org/p/chromiumos.chromium/builders/amd64-generic-tot-asan-informational/builds/11366

Ah, gotcha. Thanks
Yea :(, I'm going to start taking a closer look.
I downloaded one of the failing VMs and started playing around. The very strange thing is that cryptohome-path will segfault even if you give it no arguments, i.e.,

  $ cryptohome-path
  Segmentation fault

I've been unsuccessful thus far in my attempts to collect a core dump. cryptohome-path never segfaults when running under gdb.

I'm guessing this is related to the glibc update, but that has been reverted so this bot should have been updated.

Re earlier comments:
 - inside the VM, /home/.shadow/salt seems valid in the sense that it exists and contains some data
 - moving /home/.shadow/salt causes cryptohome-path to emit an error message and fail; it does not segfault
 - Re tpm dictionary attack state: 100 is the default value when we can't fetch the current state [1].

1: https://cs.corp.google.com/chromeos_public/src/third_party/trousers/init/tcsd-pre-start.sh?l=65
I believe the VM I'm testing is running glibc pre-update (2.19 -> 2.23 [1]):

  $ ldd --version
  ldd (Gentoo 2.19-r13 p2) 2.19

1: https://chromium-review.googlesource.com/c/339261/
Ah! Enabling ASLR in gdb let me repro the segfault.

  $ gdb cryptohome-path
  $ set disable-randomization off
  $ run # repeat until segfault

1: http://stackoverflow.com/a/4628558
As expected from comment #24, the binary no longer segfaults after disabling ASLR.

  $ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space # Disable ASLR (echo 2 re-enables it)
  $ cryptohome-path # Will not segfault

Also, LD_DEBUG=reloc shows something going wrong in the preinit section (I'm guessing some sort of static ctor? This is definitely not an area I'm very familiar with). Note: LD_DEBUG=all gives a lot more info, way too much to include here.

  $ LD_DEBUG=reloc /usr/sbin/cryptohome-path
    18177:
    18177:     relocation processing: /lib64/libc.so.6
    18177:
    18177:     relocation processing: /lib64/libz.so.1
    18177:
    18177:     relocation processing: /usr/lib64/libevent_core-2.0.so.5
    18177:
    18177:     relocation processing: /lib64/libpthread.so.0
    18177:
    18177:     relocation processing: /usr/lib64/libglib-2.0.so.0
    18177:
    18177:     relocation processing: /lib64/libdl.so.2
    18177:
    18177:     relocation processing: /usr/lib64/libcrypto.so.1.0.0
    18177:
    18177:     relocation processing: /usr/lib64/libgcc_s.so.1
    18177:
    18177:     relocation processing: /lib64/librt.so.1
    18177:
    18177:     relocation processing: /lib64/libm.so.6
    18177:
    18177:     relocation processing: /usr/lib64/libstdc++.so.6
    18177:
    18177:     relocation processing: /usr/lib64/libbase-core-395517.so
    18177:
    18177:     relocation processing: /usr/lib64/libbrillo-cryptohome-395517.so
    18177:
    18177:     relocation processing: /usr/sbin/cryptohome-path
    18177:
    18177:     relocation processing: /lib64/ld-linux-x86-64.so.2
    18177:
    18177:     calling init: /lib64/libpthread.so.0
    18177:
    18177:
    18177:     calling preinit: /usr/sbin/cryptohome-path
    18177:
    Segmentation fault
There were a lot of CLs in the first breaking build:
https://build.chromium.org/p/chromiumos.chromium/builders/amd64-generic-tot-asan-informational/builds/11332

It might be worthwhile going through each of them and seeing which ones affect cryptohome-path?

The crash should be somewhere in here, right:
https://cs.corp.google.com/chromeos_public/src/aosp/external/libbrillo/brillo/cryptohome.cc
Cc: jdufault@chromium.org
Owner: ihf@chromium.org
Status: Assigned (was: Started)
Re comment #22: I didn't see any CLs in that list that appear relevant to the breakage.

Assigning to current gardener since I'm OOO for the rest of this week.

I need to file a bug on this, but for reproducing: you can only do an asan build in a non-internal cros checkout. One of the packages currently fails to build in an internal cros checkout.

The easiest way to repro locally is as follows:

  # from $CROS_SRC dir
  $ cros_sdk
  $ ./setup_board --profile=asan --board=amd64-generic
  $ ./build_packages --board=amd64-generic
  $ ./image_to_vm.sh --board=amd64-generic --test_image

  # from $CROS_SRC/src/scripts dir
  $ ./bin/cros_start_vm --board=amd64-generic
  $ ssh root@localhost -p 9222 -o StrictHostKeyChecking=no

Once sshed into the virtual machine, run the cryptohome-path binary a few times. It should segfault about a third of the time.
Cc: semenzato@chromium.org
Happy New Year and welcome back everybody!  Just a *friendly* reminder from your Sheriff that this is still failing on the external waterfall.


01/03 00:09:42.771 INFO |        server_job:0153| 		FAIL	security_NetworkListeners	security_NetworkListeners	timestamp=1483423780	localtime=Jan 03 00:09:40	Unhandled OSError: cryptohome-path failed: Segmentation fault (core dumped)


Cc: afakhry@chromium.org
Owner: lpique@chromium.org
-> This week's gardener.
I followed the instructions in #27 but can't build successfully. build_packages fails building dlm-0.0.1-r10 with multiple link errors such as:

dlm-0.0.1-r10: gen/include/power_manager/proto_bindings/suspend.pb.cc:19: error: undefined reference to '__asan_report_load8'
dlm-0.0.1-r10: gen/include/power_manager/proto_bindings/suspend.pb.cc:19: error: undefined reference to '__asan_report_load8'
dlm-0.0.1-r10: gen/include/power_manager/proto_bindings/suspend.pb.cc:19: error: undefined reference to '__ubsan_handle_type_mismatch'
dlm-0.0.1-r10: gen/include/power_manager/proto_bindings/suspend.pb.cc:20: error: undefined reference to '__asan_report_load8'
dlm-0.0.1-r10: gen/include/power_manager/proto_bindings/suspend.pb.cc:20: error: undefined reference to '__asan_report_load8'


Am I missing something?

Re #30: It looks like you're building with a cros checkout that has internal files. Building with a public-only checkout worked for me.
I've reproduced the build and the crash with cryptohome-path.

Looking at the source for it, just running "cryptohome-path" does very little: it checks argc/argv and prints an error message.

I altered the source to be just an empty main with no includes, removed the library dependencies from its .gyp file, rebuilt it, and ran it; it still crashed.

    $ LD_DEBUG=reloc /usr/sbin/cryptohome-path
    9734:	relocation processing: /lib64/libc.so.6
    9734:	relocation processing: /usr/lib64/libgcc_s.so.1
    9734:	relocation processing: /lib64/libdl.so.2
    9734:	relocation processing: /lib64/libpthread.so.0
    9734:	relocation processing: /lib64/librt.so.1
    9734:	relocation processing: /lib64/libm.so.6
    9734:	relocation processing: /usr/lib64/libstdc++.so.6
    9734:	relocation processing: /usr/sbin/cryptohome-path
    9734:	relocation processing: /lib64/ld-linux-x86-64.so.2
    9734:	calling init: /lib64/libpthread.so.0
    9734:	calling preinit: /usr/sbin/cryptohome-path
    Segmentation fault (core dumped)

An objdump -d of the (pre-stripped) executable shows lots of static initializers: libc, gmon, cxa, sanitizer, ...

Repeating with LD_DEBUG=all, the core dump consistently happens at this point in the logged output (compare with a run that completed without error):

     [...]
     13627: symbol=fork;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     13627: symbol=fork;  lookup in file=/lib64/libm.so.6 [0]
     13627: symbol=fork;  lookup in file=/lib64/libpthread.so.0 [0]
     13627: binding file /usr/sbin/cryptohome-path [0] to /lib64/libpthread.so.0 [0]: normal symbol `fork'
     << core dumped here consistently, if it did >>
     13627: symbol=_dl_get_tls_static_info;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     13627: symbol=_dl_get_tls_static_info;  lookup in file=/lib64/libm.so.6 [0]
     13627: symbol=_dl_get_tls_static_info;  lookup in file=/lib64/libpthread.so.0 [0]
     [...]

It's at least a hint about where it's occurring. I'll try more tomorrow, but at least this is an update.
After figuring out how to get a working pair of manifests for the amd64-generic-tot-asan-informational builds (attached), and having to rebuild the chroot completely because it was going too far back in time, I've done some further triaging.

The difference between the two manifests was this short list of paths and versions:

    src/aosp/system/connectivity/shill b8ab59eb2547e21fcf077e64cb1afb67d3bdcb71 -> 726fe8b5afab7612a4d09909fe15067a2c21e03d
    src/overlays f424ed0aac3da4b2ad57a737e6e9afd265062f86 -> 5ddb0ecebda10a577d0cd49085951ff757e9bc42
    src/platform2 101f9b690b1e09be7c9527cb328d63cc60e6c453 -> ebbf0fa486c54a53624a2bd84704dc81c9940a00
    src/third_party/adhd 52aa233a3352c981d4f445edfbaea4e14425d965 -> 1efbf06defd77b96c8a534b2c15617256236f18c
    src/third_party/autotest/files fbe5ff7d9df877abe401395b841fd84a28d19176 -> 0499ed2235cda86db02f84d64ae798d8032b6702
    src/third_party/chromiumos-overlay 8d11c99a29f8c71a96ea11ba2522c91a43b2614d -> 446d5e4b39197162a2fbc18a67e8bd725405a4f9
    src/third_party/coreboot 021145eeb6e2223d5c513e34fa808b2d062997b5 -> 0e4272c74e87a58d703a2489c943308bea2b3a4a
    src/third_party/kernel/v3.18 229586317c0cb78ec95eb9275d0bb7f623b7f70c -> 1c3f9498f57741bdf2b6d0333a8270efe81021cb

It turned out that the breaking change was

    src/third_party/chromiumos-overlay 63e254767c2b60939939b040630750b2ea399c37

But this was an automated submit to mark a large number of ebuilds as stable.

To continue triaging from this point, start by syncing to the build 11332 manifest (attached), check out 63e254767c2b60939939b040630750b2ea399c37 in src/third_party/chromiumos-overlay, and start bisecting the files involved.

I think I have enough time to try the obvious candidates before I leave today, and I'll report back on this bug with which files I reverted and whether the coredump from running "cryptohome-path" in the VM persisted. I'll be out next week, so I expect to pass the bug on to hshi@.
manifest-amd-64-generic-tot-asan-11331.xml
27.9 KB View Download
manifest-amd-64-generic-tot-asan-11332.xml
27.7 KB View Download
Owner: h...@chromium.org
Reverting these paths from 63e254767c2b60939939b040630750b2ea399c37 did *NOT* fix the coredump:

chromeos-base/attestation
chromeos-base/chromite
chromeos-base/cryptohome
sys-kernel/chromeos-kernel-3_18

Cc: gurcheta...@chromium.org

Comment 36 by h...@chromium.org, Jan 9 2017

Ok, thanks to lpique@ I've set up my repo and can reproduce the crash. It occurs roughly 20% of the time, but if I attach a debugger it never crashes.
Re #36: See comment 24; you need to enable ASLR.

  $ gdb cryptohome-path
  $ set disable-randomization off
  $ run # repeat until segfault

Comment 38 by h...@chromium.org, Jan 9 2017

Re #37: sorry I missed that! Yes this works, thanks

Comment 39 by h...@chromium.org, Jan 9 2017

Re #34: note that the amd64-generic asan build uses kernel 4.4 by default, not kernel 3.18.

The list of kernel 4.4 changes in this range is

8c372912c802 CHROMIUM: thermal: rockchip: sync the typo to upstream
f34b66cee68b UPSTREAM: thermal: rockchip: handle set_trips without the trip points
f905074bcde5 UPSTREAM: thermal: rockchip: optimize the conversion table
8147719c7b38 CHROMIUM: config: set mmap_rnd[_compats]_bits to the maximum
b58324298b8d UPSTREAM: netfilter: nfnetlink: use original skbuff when acking batches
6edfbf9581e7 UPSTREAM: thermal: rockchip: fixes invalid temperature case
e47a7da072d1 CHROMIUM: drm/rockchip: Only wait for panel ACK on PSR entry

I suspect this one:

commit 8147719c7b38b5eb7713fcc6dfa660a7967d8d1e
Author: Nicolas Boichat <drinkcat@chromium.org>
Date:   Tue Dec 13 15:09:13 2016 +0800

    CHROMIUM: config: set mmap_rnd[_compats]_bits to the maximum
    
    BUG=b:33398361
    TEST=Run CTS CtsAslrMallocTestCases module
    
    Change-Id: Iffdcfdd4ce4fbdf445e4ada7c20f2b6935d73a0e
    Reviewed-on: https://chromium-review.googlesource.com/418145
    Commit-Ready: Nicolas Boichat <drinkcat@chromium.org>
    Tested-by: Nicolas Boichat <drinkcat@chromium.org>
    Reviewed-by: Mattias Nissler <mnissler@chromium.org>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>

Comment 40 by h...@chromium.org, Jan 9 2017

Cc: drinkcat@chromium.org diand...@chromium.org
CC drinkcat and dianders who reviewed the kernel 4.4 change that modified mmap_rnd[_compats]_bits config

This seems somewhat related to ASLR and the random crashes we're seeing.

Comment 41 by h...@chromium.org, Jan 9 2017

Status: Started (was: Assigned)

Comment 42 by h...@chromium.org, Jan 10 2017

I can confirm that the breaking change is https://chromium-review.googlesource.com/418145

On TOT I am able to reproduce this crash at about 20% probability.

But with the kernel 4.4 CL reverted (https://chromium-review.googlesource.com/418145) I am NOT able to reproduce the crash after 200 repeated runs.

Comment 43 by h...@chromium.org, Jan 10 2017

Please review https://chromium-review.googlesource.com/#/c/426066/

I propose that we revert the kernel patch first. Then I can either reassign this to drinkcat@ or we can start a separate bug to track re-landing the kernel 4.4 patch.
As mentioned in the CL: feel free to revert, but:
 1. This will break CTS on N.
 2. ASAN/cryptohome is probably broken: I suspect increasing ASLR randomness just makes an underlying issue more likely to happen. Somebody should investigate.
 3. AFAIK, more ASLR randomness is generally better for security. I don't think we should compromise that to pass an ASAN test.

Comment 45 by h...@chromium.org, Jan 10 2017

Nicolas: since the breaking tests are on the x86-64 config, is it really necessary to bump to 32 bits to pass CTS on N?
According to CTS AslrMallocTest.cpp, it really only tries huge allocations of up to 2^23 bytes.
The values we set in config options match what a normal Android instance would set: https://codesearch.corp.google.com/android/system/core/init/init.cpp?type=cs&q=set_mmap_rnd_bits_action&l=324 . But, yes, we can probably get away with only reverting the CONFIG_ARCH_MMAP_RND_BITS change (since we only ever use 32-bit containers).

Comment 47 by h...@chromium.org, Jan 10 2017

Re #44: drinkcat@, I also initially considered the possibility of cryptohome being broken, but according to comment #32 this happens even with an empty executable whose main() does nothing and returns 0.
Understood, but I suspect there are other executables that actually run fine; why is it only cryptohome that shows this issue?

Reverting the ASLR patch just sweeps the issue under the carpet, there's something else going on that needs deeper investigation.
Since this has been broken for a month anyway, I suggest that we hold off on the revert for a few days (we'd eventually have to reland the patch anyway) and try to investigate the underlying issue first.

Comment 50 by h...@chromium.org, Jan 10 2017

I instrumented the 4.4 kernel in arch/x86/mm/mmap.c and in fs/exec.c to look at the mmap_base values for /usr/sbin/cryptohome-path.

The mmap_base values are clearly randomized, but as far as I can see there is no discernible pattern between base addresses that cause crashes and those that do not. I've seen small addresses, large addresses, and pretty much everything in between, both crashing and not crashing.

Comment 51 by h...@chromium.org, Jan 10 2017

Experiments show that setting CONFIG_ARCH_MMAP_RND_BITS to 31 in third_party/kernel/v4.4/chromeos/config/x86_64/common.config is sufficient to completely eliminate the crash. We don't need to go back to 28.

Setting it to 32, however, causes crashes.

Comment 52 by h...@chromium.org, Jan 10 2017

The crash seems related to load_elf_binary() in fs/binfmt_elf.c

I instrumented the |load_bias| value calculated for /usr/sbin/cryptohome-path when randomization is enabled. The default load bias equals 0x555555555555ull (apparently a hard-coded constant in the kernel, equal to 1/3 of some power of 2), and the randomized load_bias adds a random 32-bit unsigned int left-shifted by 12 bits. So the result is between 0x555555555555ull and 0x655555555554ull.

In all the crashing cases, the randomized |load_bias| is greater than 0x600000000000ull, whereas all values of |load_bias| at or below 0x5fffffffffffull do not cause the crash.
Cc: keescook@chromium.org mnissler@chromium.org
Summary: amd64-generic-tot-asan-informational failures related to cryptohome-path (ASAN with high ASLR randomness) (was: amd64-generic-tot-asan-informational failures related to cryptohome-path)
+keescook

Comment 54 by h...@chromium.org, Jan 10 2017

More experiments confirm that the threshold of arch_mmap_rnd()'s maximum return value at which the crash begins to occur is roughly 0xAAAAAAAA (2/3 of 2^32). So it is safe to use 31 bits of randomness, but not 32.

The following patch completely eliminates the crash:

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index d2dc0438d654..ccc8301860a6 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -76,7 +76,9 @@ unsigned long arch_mmap_rnd(void)
                rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
 #endif
        else
+       do
                rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
+       while (rnd > 0xaaaaaaaaUL);
 
        return rnd << PAGE_SHIFT;
 }

Comment 55 by h...@chromium.org, Jan 10 2017

One thing is clear: I can't reproduce the crash with other executables, such as crossystem. But pretty much every executable target in cryptohome.gyp can trigger this crash with roughly the same probability, including:

/usr/sbin/cryptohome
/usr/sbin/cryptohomed
/usr/sbin/cryptohome-path
/usr/sbin/lockbox-ccache
/usr/sbin/tpm-manager

It could be an obscure bug in one of the dependency libraries that the various cryptohome targets link against, or (though less likely) it could be a kernel bug.

Instead of the above patch, please lower the sysctl for mmap_rnd_bits. That should solve it...

Comment 57 by h...@chromium.org, Jan 10 2017

Re #56: keescook@, yes, I understand; the patch in #54 is for illustrative purposes only. We can certainly reduce mmap_rnd_bits from 32 to 31, and that will completely eliminate the crash, but we still want to find out why there's a problem with 32 bits of randomness.
Okay, understood. I would examine the ranges available for ET_DYN, brk, mmap, and stack. It's possible that at the extreme ends of their ranges they can collide.

Comment 59 by h...@chromium.org, Jan 10 2017

For reference, here's the backtrace in gdb when SIGSEGV occurs

Program received signal SIGSEGV, Segmentation fault.
0x0000638b1524f520 in ?? ()
(gdb) bt
#0  0x0000638b1524f520 in ?? ()
#1  0x0000638b1525d7c5 in ?? ()
#2  0x0000638b1518f698 in ?? ()
#3  0x0000638b1524284e in ?? ()
#4  0x00007346c589ab2b in _dl_init (main_map=0x7346c5ab0128, argc=1, argv=0x7ffd5d29c618, env=0x7ffd5d29c628) at dl-init.c:105
#5  0x00007346c588bcda in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6  0x0000000000000001 in ?? ()
#7  0x00007ffd5d29e714 in ?? ()
#8  0x0000000000000000 in ?? ()

can you dump the /proc/$pid/maps file for such a process too?

Comment 61 by derat@chromium.org, Jan 10 2017

Cc: -derat@chromium.org

Comment 62 by h...@chromium.org, Jan 10 2017

Re #60: I can't do that easily because the process crashes right away, and by then the PID is already gone.

But from gdb I can do "info proc mappings", which gives the same information. For example, here's the dump from another crash:

Program received signal SIGSEGV, Segmentation fault.
0x000063999bf67520 in ?? ()
(gdb) bt
#0  0x000063999bf67520 in ?? ()
#1  0x000063999bf757c5 in ?? ()
#2  0x000063999bea7698 in ?? ()
#3  0x000063999bf5a84e in ?? ()
#4  0x000078621d111b2b in _dl_init (main_map=0x78621d327128, argc=1, argv=0x7ffca6d39488, env=0x7ffca6d39498) at dl-init.c:105
#5  0x000078621d102cda in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6  0x0000000000000001 in ?? ()
#7  0x00007ffca6d3a714 in ?? ()
#8  0x0000000000000000 in ?? ()

(gdb) info proc mappings
process 2605
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
          0x7fff7000         0x8fff7000 0x10000000        0x0 
          0x8fff7000      0x2008fff7000 0x20000000000        0x0 
       0x2008fff7000     0x10007fff8000 0xdfff0001000        0x0 
      0x600000000000     0x640000000000 0x40000000000        0x0 [heap]
      0x78621b866000     0x78621bbb8000   0x352000        0x0 
      0x78621bbb8000     0x78621bcd5000   0x11d000        0x0 /usr/lib64/libglib-2.0.so.0.3400.3
      0x78621bcd5000     0x78621bcd6000     0x1000   0x11d000 /usr/lib64/libglib-2.0.so.0.3400.3
      0x78621bcd6000     0x78621bcd9000     0x3000   0x11d000 /usr/lib64/libglib-2.0.so.0.3400.3
      0x78621bcd9000     0x78621bcda000     0x1000   0x120000 /usr/lib64/libglib-2.0.so.0.3400.3
      0x78621bcda000     0x78621becc000   0x1f2000        0x0 /usr/lib64/libcrypto.so.1.0.0
      0x78621becc000     0x78621becd000     0x1000   0x1f2000 /usr/lib64/libcrypto.so.1.0.0
      0x78621becd000     0x78621beeb000    0x1e000   0x1f2000 /usr/lib64/libcrypto.so.1.0.0
      0x78621beeb000     0x78621bef7000     0xc000   0x210000 /usr/lib64/libcrypto.so.1.0.0
      0x78621bef7000     0x78621befb000     0x4000        0x0 
      0x78621befb000     0x78621c09b000   0x1a0000        0x0 /lib64/libc-2.23.so
      0x78621c09b000     0x78621c29b000   0x200000   0x1a0000 /lib64/libc-2.23.so
      0x78621c29b000     0x78621c29f000     0x4000   0x1a0000 /lib64/libc-2.23.so
      0x78621c29f000     0x78621c2a1000     0x2000   0x1a4000 /lib64/libc-2.23.so
      0x78621c2a1000     0x78621c2a6000     0x5000        0x0 
      0x78621c2a6000     0x78621c2bc000    0x16000        0x0 /usr/lib64/libgcc_s.so.1
      0x78621c2bc000     0x78621c4bb000   0x1ff000    0x16000 /usr/lib64/libgcc_s.so.1
      0x78621c4bb000     0x78621c4bc000     0x1000    0x15000 /usr/lib64/libgcc_s.so.1
      0x78621c4bc000     0x78621c4bd000     0x1000    0x16000 /usr/lib64/libgcc_s.so.1
      0x78621c4bd000     0x78621c4c0000     0x3000        0x0 /lib64/libdl-2.23.so
      0x78621c4c0000     0x78621c6bf000   0x1ff000     0x3000 /lib64/libdl-2.23.so
      0x78621c6bf000     0x78621c6c0000     0x1000     0x2000 /lib64/libdl-2.23.so
      0x78621c6c0000     0x78621c6c1000     0x1000     0x3000 /lib64/libdl-2.23.so
      0x78621c6c1000     0x78621c6c8000     0x7000        0x0 /lib64/librt-2.23.so
      0x78621c6c8000     0x78621c8c7000   0x1ff000     0x7000 /lib64/librt-2.23.so
      0x78621c8c7000     0x78621c8c8000     0x1000     0x6000 /lib64/librt-2.23.so
      0x78621c8c8000     0x78621c8c9000     0x1000     0x7000 /lib64/librt-2.23.so
      0x78621c8c9000     0x78621c8e0000    0x17000        0x0 /lib64/libpthread-2.23.so
      0x78621c8e0000     0x78621cae0000   0x200000    0x17000 /lib64/libpthread-2.23.so
      0x78621cae0000     0x78621cae1000     0x1000    0x17000 /lib64/libpthread-2.23.so
      0x78621cae1000     0x78621cae2000     0x1000    0x18000 /lib64/libpthread-2.23.so
      0x78621cae2000     0x78621cae6000     0x4000        0x0 
      0x78621cae6000     0x78621cbeb000   0x105000        0x0 /lib64/libm-2.23.so
      0x78621cbeb000     0x78621cdeb000   0x200000   0x105000 /lib64/libm-2.23.so
      0x78621cdeb000     0x78621cdec000     0x1000   0x105000 /lib64/libm-2.23.so
      0x78621cdec000     0x78621cded000     0x1000   0x106000 /lib64/libm-2.23.so
      0x78621cded000     0x78621cee3000    0xf6000        0x0 /usr/lib64/libstdc++.so.6.0.20
      0x78621cee3000     0x78621d0e2000   0x1ff000    0xf6000 /usr/lib64/libstdc++.so.6.0.20
      0x78621d0e2000     0x78621d0ec000     0xa000    0xf5000 /usr/lib64/libstdc++.so.6.0.20
      0x78621d0ec000     0x78621d0ed000     0x1000    0xff000 /usr/lib64/libstdc++.so.6.0.20
      0x78621d0ed000     0x78621d102000    0x15000        0x0 
      0x78621d102000     0x78621d126000    0x24000        0x0 /lib64/ld-2.23.so
      0x78621d157000     0x78621d168000    0x11000        0x0 
      0x78621d168000     0x78621d17d000    0x15000        0x0 /lib64/libz.so.1.2.8
      0x78621d17d000     0x78621d17e000     0x1000    0x14000 /lib64/libz.so.1.2.8
      0x78621d17e000     0x78621d17f000     0x1000    0x15000 /lib64/libz.so.1.2.8
      0x78621d17f000     0x78621d180000     0x1000        0x0 
      0x78621d180000     0x78621d19c000    0x1c000        0x0 /usr/lib64/libevent_core-2.0.so.5.1.9
      0x78621d19c000     0x78621d19d000     0x1000    0x1c000 /usr/lib64/libevent_core-2.0.so.5.1.9
      0x78621d19d000     0x78621d19e000     0x1000    0x1c000 /usr/lib64/libevent_core-2.0.so.5.1.9
      0x78621d19e000     0x78621d19f000     0x1000    0x1d000 /usr/lib64/libevent_core-2.0.so.5.1.9
      0x78621d19f000     0x78621d1a2000     0x3000        0x0 
      0x78621d1a2000     0x78621d2fb000   0x159000        0x0 /usr/lib64/libbase-core-395517.so
      0x78621d2fb000     0x78621d304000     0x9000   0x158000 /usr/lib64/libbase-core-395517.so
      0x78621d304000     0x78621d305000     0x1000   0x161000 /usr/lib64/libbase-core-395517.so
      0x78621d305000     0x78621d307000     0x2000        0x0 
      0x78621d307000     0x78621d311000     0xa000        0x0 /usr/lib64/libbrillo-cryptohome-395517.so
      0x78621d311000     0x78621d312000     0x1000     0x9000 /usr/lib64/libbrillo-cryptohome-395517.so
      0x78621d312000     0x78621d319000     0x7000     0xa000 /usr/lib64/libbrillo-cryptohome-395517.so
      0x78621d319000     0x78621d325000     0xc000        0x0 
      0x78621d325000     0x78621d326000     0x1000    0x23000 /lib64/ld-2.23.so
      0x78621d326000     0x78621d327000     0x1000    0x24000 /lib64/ld-2.23.so
      0x78621d327000     0x78621d328000     0x1000        0x0 
      0x7ffca6d1a000     0x7ffca6d3b000    0x21000        0x0 [stack]
      0x7ffca6dbd000     0x7ffca6dbf000     0x2000        0x0 [vvar]
      0x7ffca6dbf000     0x7ffca6dc1000     0x2000        0x0 [vdso]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]

Comment 63 by h...@chromium.org, Jan 10 2017

So one thing I have noticed: the heap is at 0x600000000000 - 0x640000000000

This coincides with the crashing |load_bias| value range of 0x555555555555ull to 0x655555555554ull.

I have run it several hundred times so far, and the crash only happens when |load_bias| falls inside the 0x600000000000 - 0x640000000000 range.
Ew, yup. That would be it. It seems the brk (heap) randomization isn't handling this correctly.
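[Editorial note: a back-of-the-envelope sketch, not from the thread, of how large the collision window is. It assumes the conventional x86_64 ELF_ET_DYN_BASE of 0x555555554000 (the crash range above starts just past it) together with the [heap] range observed in the maps dump:]

```python
# Illustrative only: how much of the ET_DYN load_bias range can land
# inside the fixed [heap] mapping observed above?
HEAP_START, HEAP_END = 0x600000000000, 0x640000000000  # from the maps dump
BIAS_START, BIAS_END = 0x555555554000, 0x655555554000  # assumed bias range

# Size of the intersection of the two intervals.
overlap = max(0, min(HEAP_END, BIAS_END) - max(HEAP_START, BIAS_START))
span = BIAS_END - BIAS_START

print(hex(overlap))        # 0x40000000000
print(overlap / span)      # 0.25 -> roughly 1 in 4 runs should collide
```

That 1-in-4 odds estimate is consistent with the crash reproducing within a few hundred runs.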

Comment 65 by h...@chromium.org, Jan 10 2017

Any suggestions on what I should try next? I'm not very familiar with how brk randomization works. Thanks
I'm looking now too. The logic starts in fs/binfmt_elf.c with the call to arch_randomize_brk(). The brk offset should already have been bumped by the load_bias, though, so I'm scratching my head at the moment. I'll keep looking...

Comment 67 by h...@chromium.org, Jan 10 2017

re:#66 please note this is the chromeos-4.4 branch, so it might not have picked up the latest kernel patches from upstream yet.

Comment 68 by h...@chromium.org, Jan 10 2017

Dump from readelf -Wl as requested:

Elf file type is DYN (Shared object file)
Entry point 0x214f0
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x000230 0x000230 R   0x8
  INTERP         0x000270 0x0000000000000270 0x0000000000000270 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x12df58 0x12df58 R E 0x1000
  LOAD           0x12ee00 0x000000000012fe00 0x000000000012fe00 0x005ef0 0xcf85c8 RW  0x1000
  DYNAMIC        0x12f3b8 0x00000000001303b8 0x00000000001303b8 0x0002a0 0x0002a0 RW  0x8
  NOTE           0x00028c 0x000000000000028c 0x000000000000028c 0x000044 0x000044 R   0x4
  GNU_EH_FRAME   0x129fac 0x0000000000129fac 0x0000000000129fac 0x003fac 0x003fac R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  TLS            0x12ee00 0x000000000012fe00 0x000000000012fe00 0x000000 0x000054 R   0x8
  GNU_RELRO      0x12ee00 0x000000000012fe00 0x000000000012fe00 0x003200 0x003200 RW  0x40

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .note.gnu.build-id .dynsym .dynstr .gnu.hash .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame .eh_frame_hdr 
   03     .data.rel.ro.local .jcr .fini_array .init_array .preinit_array .data.rel.ro .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.ABI-tag .note.gnu.build-id 
   06     .eh_frame_hdr 
   07     
   08     .tbss 
   09     .data.rel.ro.local .jcr .fini_array .init_array .preinit_array .data.rel.ro .dynamic .got .got.plt 

Comment 69 by h...@chromium.org, Jan 10 2017

A more complete dump of debug info is uploaded for review at https://paste.googleplex.com/5242057396322304

Comment 70 by h...@chromium.org, Jan 10 2017

Per offline chat: I propose that we temporarily reduce the randomness from 32 to 31 bits for x86_64 only. This is still sufficient for security and for passing the CTS tests. 

https://chromium-review.googlesource.com/#/c/426066/

Meanwhile keescook@ will try to reproduce this locally so that this can be investigated more efficiently.
Can you try backporting the following kernel changes from upstream?

ecc2bc8ac03884266cf73f8a2a42b911465b2fbc
5d22fc25d4fc8096d2d7df27ea1893d4e055e764
0036d1f7eb95bcc52977f15507f00dd07018e7e2

I don't think it'll change anything, but it does touch a lot of the same code that I'm suspicious of.

Comment 72 by h...@chromium.org, Jan 11 2017

re:#71 I did try 0036d1f7eb95bcc52977f15507f00dd07018e7e2 earlier yesterday but it didn't seem to help. Haven't tried the other two though.

Can you try to set up a repro locally? It would be more efficient, since you have more context on this topic than I do.

Comment 73 by h...@chromium.org, Jan 11 2017

FYI: I tried all 3 patches in comment #71; they don't seem to help.

However, the [heap] range of 0x600000000000-0x640000000000 appears to always be fixed and has nothing to do with the load_bias. For example, if I force load_bias to 0x480000000000 (by modifying ELF_ET_DYN_BASE in arch/x86/include/asm/elf.h), then the ET_DYN load_bias never collides with [heap], which is still at 0x600000000000.
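[Editorial note: a sketch, not from the thread, of why the eventual mitigation (dropping mmap_rnd_bits from 32 to 31) sidesteps a heap fixed at 0x600000000000. It assumes ELF_ET_DYN_BASE = 0x555555554000 and 4 KiB pages, the usual x86_64 defaults:]

```python
# Illustrative: maximum ET_DYN load_bias as a function of mmap_rnd_bits,
# compared against the fixed [heap] start reported in this comment.
PAGE = 4096
ET_DYN_BASE = 0x555555554000   # assumed x86_64 ELF_ET_DYN_BASE
HEAP_START = 0x600000000000    # fixed [heap] location observed above

def max_load_bias(rnd_bits):
    # Random offset is up to (2**rnd_bits - 1) pages above the base.
    return ET_DYN_BASE + ((1 << rnd_bits) - 1) * PAGE

print(hex(max_load_bias(32)))  # 0x655555553000 -> can reach past the heap
print(hex(max_load_bias(31)))  # 0x5d5555553000 -> always below the heap
```

With 31 bits the entire load_bias range stays below 0x600000000000, so no value can collide with the heap; with 32 bits the top quarter of the range overlaps it.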
Project Member

Comment 74 by bugdroid1@chromium.org, Jan 11 2017

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a29136e1e89b3bb15d2b4917db856058f576c06f

commit a29136e1e89b3bb15d2b4917db856058f576c06f
Author: Haixia Shi <hshi@chromium.org>
Date: Mon Jan 09 22:32:54 2017

CHROMIUM: config: reduce mmap_rnd_bits from 32 to 31 for x86_64.

There seems to be a bug (either in the kernel or ASAN cryptohome) that causes
the range of ET_DYN to collide with the heap when 32 random bits are used.

Changing this to 31 would still provide plenty of randomness and allow us to
pass the relevant CTS tests. Meanwhile we will continue to investigate the
underlying problem.

BUG= chromium:674998 
BUG=b:33398361
TEST=see instructions at  http://crbug.com/674998#c27 

Change-Id: I6137c5f3798e9de0ec6e57f9e4534d016ad72727
Reviewed-on: https://chromium-review.googlesource.com/426066
Commit-Ready: Haixia Shi <hshi@chromium.org>
Tested-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Haixia Shi <hshi@chromium.org>

[modify] https://crrev.com/a29136e1e89b3bb15d2b4917db856058f576c06f/chromeos/config/x86_64/common.config

Comment 75 by h...@chromium.org, Jan 11 2017

Labels: -Pri-1 Kernel-4.4 Pri-2
Builds are now turning green. See

https://build.chromium.org/p/chromiumos.chromium/builders/amd64-generic-tot-asan-informational/

I'd suggest lowering this to Pri-2.

Comment 76 by h...@chromium.org, Jan 11 2017

Labels: Arch-x86_64
Owner: keescook@chromium.org
#75: good suggestion and thank you for all the good work!

Do you want to keep working on this or shall we give it to Kees?

(Hi Kees, please feel free to chime in :)

By the way, I said "thank you" because it's good progress for the team.  Not because you did a favor to me or anything like that.  Maybe I should have said "I am impressed by the good work".  You get the idea.
Status: Fixed (was: Started)

Comment 80 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 81 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)
