wpa_supplicant crashes when loading pkcs11.so |
||||
Issue description
Observed the following wpa_supplicant crash on ToT when running some autotests that involves interactions with shill over ShillProxy (e.g. shill_proxy.find_object). The crash site seems to be at `OPENSSL_free(ctx->module)`, which looks like a memory corruption or double free.
INFO : Dumping stack for wpa_supplicant.20170725.001302.633.dmp with /build/coral/usr/lib/debug/breakpad:
Operating system: Linux
0.0.0 Linux 4.4.70-11168-gfbf429785e0e #1 SMP PREEMPT Mon Jul 24 22:18:24 PDT 2017 x86_64
CPU: amd64
family 6 model 92 stepping 9
1 CPU
GPU: UNKNOWN
Crash reason: SIGABRT
Crash address: 0x0
Process uptime: not available
Thread 0 (crashed)
0 libc-2.23.so!raise [raise.c : 54 + 0x10]
rax = 0x0000000000000000 rdx = 0x0000000000000006
rcx = 0x000079713c857eb2 rbx = 0x0000000000000061
rsi = 0x0000000000000279 rdi = 0x0000000000000279
rbp = 0x00007ffc1fb7f130 rsp = 0x00007ffc1fb7f008
r8 = 0x000000000000000f r9 = 0x0000000000000000
r10 = 0x0000000000000008 r11 = 0x0000000000000202
r12 = 0x0000000000000061 r13 = 0x00007ffc1fb7f1b8
r14 = 0x00007ffc1fb7f1b8 r15 = 0x0000000000000002
rip = 0x000079713c857eb2
Found by: given as instruction pointer in context
1 libc-2.23.so!abort [abort.c : 89 + 0xa]
rbx = 0x0000000000000061 rbp = 0x00007ffc1fb7f130
rsp = 0x00007ffc1fb7f010 r12 = 0x0000000000000061
r13 = 0x00007ffc1fb7f1b8 r14 = 0x00007ffc1fb7f1b8
r15 = 0x0000000000000002 rip = 0x000079713c859cd6
Found by: call frame info
2 libc-2.23.so!__libc_message [libc_fatal.c : 175 + 0x5]
rbx = 0x0000000000000061 rbp = 0x00007ffc1fb7f3b0
rsp = 0x00007ffc1fb7f140 r12 = 0x0000000000000061
r13 = 0x00007ffc1fb7f1b8 r14 = 0x00007ffc1fb7f1b8
r15 = 0x0000000000000002 rip = 0x000079713c899577
Found by: call frame info
3 libc-2.23.so!malloc_printerr [malloc.c : 5004 + 0x5]
rbx = 0x00007ffc1fb80eb5 rbp = 0x00007ffc1fb7f400
rsp = 0x00007ffc1fb7f3c0 r12 = 0x0000000000000003
r13 = 0x000079713c99c860 r14 = 0x000057ddd2b37af0
r15 = 0x000057ddd2b2fc10 rip = 0x000079713c89f6f1
Found by: call frame info
4 libcrypto.so.1.0.0!<name omitted> [mem.c : 434 + 0x9]
rbx = 0x000079713d0b39a0 rbp = 0x00007ffc1fb7f420
rsp = 0x00007ffc1fb7f410 r12 = 0x000057ddd2b37a10
r13 = 0x000057ddd1a9d9b0 r14 = 0x000057ddd2b37af0
r15 = 0x000057ddd2b2fc10 rip = 0x000079713cd62b15
Found by: call frame info
5 pkcs11.so!ctx_engine_ctrl [eng_back.c : 856 + 0x9]
rbx = 0x000057ddd2b2fc10 rbp = 0x00007ffc1fb7f440
rsp = 0x00007ffc1fb7f430 r12 = 0x000057ddd2b37a10
r13 = 0x000057ddd1a9d9b0 r14 = 0x000057ddd2b37af0
r15 = 0x000057ddd2b2fc10 rip = 0x000079713d0b5157
Found by: call frame info
6 libcrypto.so.1.0.0!ENGINE_ctrl_cmd_string [eng_ctrl.c : 357 + 0x13]
rbx = 0x00000000000000c9 rbp = 0x00007ffc1fb7f480
rsp = 0x00007ffc1fb7f450 r12 = 0x000057ddd2b37a10
r13 = 0x000057ddd1a9d9b0 r14 = 0x00000000000000c9
r15 = 0x000057ddd2b2fc10 rip = 0x000079713ccd837a
Found by: call frame info
7 wpa_supplicant!tls_engine_load_dynamic_generic [tls_openssl.c : 676 + 0xa]
rbx = 0x00007ffc1fb7f540 rbp = 0x00007ffc1fb7f4c0
rsp = 0x00007ffc1fb7f490 r12 = 0x00007ffc1fb7f4f0
r13 = 0x000057ddd1a9d9b0 r14 = 0x000057ddd1aa6afd
r15 = 0x000057ddd2b37a10 rip = 0x000057ddd1a1d714
Found by: call frame info
8 wpa_supplicant!tls_init [tls_openssl.c : 727 + 0x16]
rbx = 0x00007ffc1fb7f540 rbp = 0x00007ffc1fb7f590
rsp = 0x00007ffc1fb7f4d0 r12 = 0x000057ddd2b35160
r13 = 0x000057ddd1a9d9b0 r14 = 0x000057ddd1a9d75a
r15 = 0x000057ddd2b36260 rip = 0x000057ddd1a191b2
Found by: call frame info
9 wpa_supplicant!eap_peer_sm_init [eap.c : 1941 + 0x5]
rbx = 0x000057ddd2b2f7f0 rbp = 0x00007ffc1fb7f610
rsp = 0x00007ffc1fb7f5a0 r12 = 0x000057ddd1ab0090
r13 = 0x000057ddd2b251b0 r14 = 0x00007ffc1fb7f620
r15 = 0x000057ddd2b31b40 rip = 0x000057ddd1a13564
Found by: call frame info
10 wpa_supplicant!eapol_sm_init [eapol_supp_sm.c : 2067 + 0x13]
rbx = 0x000057ddd2b251b0 rbp = 0x00007ffc1fb7f670
rsp = 0x00007ffc1fb7f620 r12 = 0x000057ddd2b31b40
r13 = 0x000057ddd1a9f6d6 r14 = 0x000057ddd2b30850
r15 = 0x000057ddd2b31c28 rip = 0x000057ddd1a1254b
Found by: call frame info
11 wpa_supplicant!wpa_supplicant_init_eapol [wpas_glue.c : 993 + 0x8]
rbx = 0x000057ddd2b30850 rbp = 0x00007ffc1fb7f690
rsp = 0x00007ffc1fb7f680 r12 = 0x000057ddd2b31b40
r13 = 0x000057ddd1a9f6d6 r14 = 0x000057ddd2b31b40
r15 = 0x000057ddd2b31c28 rip = 0x000057ddd1a58752
Found by: call frame info
12 wpa_supplicant!wpa_supplicant_add_iface [wpa_supplicant.c : 4478 + 0x8]
rbx = 0x0000000000000000 rbp = 0x00007ffc1fb7f7b0
rsp = 0x00007ffc1fb7f6a0 r12 = 0x000057ddd2b31b40
r13 = 0x000057ddd1a9f6d6 r14 = 0x000057ddd2b40401
r15 = 0x000057ddd2b31c28 rip = 0x000057ddd1a506da
Found by: call frame info
13 wpa_supplicant!wpas_dbus_handler_create_interface [dbus_new_handlers.c : 617 + 0xa]
rbx = 0x000057ddd2b309a0 rbp = 0x00007ffc1fb7f910
rsp = 0x00007ffc1fb7f7c0 r12 = 0x000057ddd2b00af0
r13 = 0x0000000000000000 r14 = 0x000057ddd2b03b70
r15 = 0x000057ddd2b00af0 rip = 0x000057ddd1a384b9
Found by: call frame info
14 wpa_supplicant!message_handler [dbus_new_helpers.c : 354 + 0x7]
rbx = 0x0000000000000100 rbp = 0x00007ffc1fb7fb20
rsp = 0x00007ffc1fb7f920 r12 = 0x000057ddd2b04050
r13 = 0x000057ddd1ab05f0 r14 = 0x00007ffc1fb7f9f5
r15 = 0x000057ddd1ab0608 rip = 0x000057ddd1a34d92
Found by: call frame info
15 libdbus-1.so.3.14.8 + 0x2866a
rbx = 0x0000000000000000 rbp = 0x00007ffc1fb7fc00
rsp = 0x00007ffc1fb7fb30 r12 = 0x000057ddd2b01140
r13 = 0x000057ddd1a34910 r14 = 0x0000000000000000
r15 = 0x000057ddd2b01670 rip = 0x000079713d11666a
Found by: call frame info
16 libdbus-1.so.3.14.8 + 0x170f6
rbp = 0x00007ffc1fb7fcb0 rsp = 0x00007ffc1fb7fc10
rip = 0x000079713d1050f6
Found by: previous frame's frame pointer
17 wpa_supplicant!process_watch [dbus_common.c : 38 + 0x8]
rbp = 0x00007ffc1fb7fce0 rsp = 0x00007ffc1fb7fcc0
rip = 0x000057ddd1a3f8e8
Found by: previous frame's frame pointer
18 wpa_supplicant!eloop_run [eloop.c : 495 + 0x4]
rbx = 0x0000000000000000 rbp = 0x00007ffc1fb7fd60
rsp = 0x00007ffc1fb7fcf0 r14 = 0x000057ddd1ab5850
r15 = 0x0000000000000004 rip = 0x000057ddd19e830a
Found by: call frame info
19 wpa_supplicant!wpa_supplicant_run [wpa_supplicant.c : 5033 + 0x5]
rbx = 0x0000000000000001 rbp = 0x00007ffc1fb7fd90
rsp = 0x00007ffc1fb7fd70 r12 = 0x000057ddd2b00610
r13 = 0x000057ddd2b00610 r14 = 0x0000000000000001
r15 = 0x000057ddd2b00af0 rip = 0x000057ddd1a5186f
Found by: call frame info
20 wpa_supplicant!main [main.c : 345 + 0x8]
rbx = 0x0000000000000001 rbp = 0x00007ffc1fb7fe80
rsp = 0x00007ffc1fb7fda0 r12 = 0x000057ddd2b00610
r13 = 0x000057ddd2b00610 r14 = 0x0000000000000001
r15 = 0x000057ddd2b00af0 rip = 0x000057ddd1a5ddad
Found by: call frame info
21 libc-2.23.so!__libc_start_main [libc-start.c : 289 + 0x1a]
rbx = 0x0000000000000000 rbp = 0x00007ffc1fb7ff50
rsp = 0x00007ffc1fb7fe90 r12 = 0x000057ddd1a7d7c0
r13 = 0x00007ffc1fb7ff70 r14 = 0x0000000000000000
r15 = 0x0000000000000000 rip = 0x000079713c844816
Found by: call frame info
22 wpa_supplicant!_start + 0x29
rbx = 0x0000000000000000 rbp = 0x0000000000000000
rsp = 0x00007ffc1fb7ff60 r12 = 0x000057ddd19d8b80
r13 = 0x00007ffc1fb7ff70 r14 = 0x0000000000000000
r15 = 0x0000000000000000 rip = 0x000057ddd19d8ba9
Found by: call frame info
23 0x7ffc1fb7ff68
rbx = 0x0000000000000000 rbp = 0x0000000000000000
rsp = 0x00007ffc1fb7ff68 r12 = 0x000057ddd19d8b80
r13 = 0x00007ffc1fb7ff70 r14 = 0x0000000000000000
r15 = 0x0000000000000000 rip = 0x00007ffc1fb7ff68
Found by: call frame info
24 0x7ffc1fbcd000
rsp = 0x00007ffc1fb80010 rip = 0x00007ffc1fbcd000
Found by: stack scanning
25 wpa_supplicant!_init + 0x1c08
rsp = 0x00007ffc1fb800a0 rip = 0x000057ddd19d8b80
Found by: stack scanning
Loaded modules:
0x57ddd19c6000 - 0x57ddd1aacfff wpa_supplicant ??? (main)
0x79713bee6000 - 0x79713befbfff libgcc_s.so.1 ???
0x79713c0fd000 - 0x79713c201fff libm-2.23.so ???
0x79713c403000 - 0x79713c419fff libpthread-2.23.so ???
0x79713c620000 - 0x79713c622fff libdl-2.23.so ???
0x79713c824000 - 0x79713c9c4fff libc-2.23.so ???
0x79713cbcf000 - 0x79713cdb3fff libcrypto.so.1.0.0 ???
0x79713cddf000 - 0x79713cde5fff librt-2.23.so ???
0x79713cfe7000 - 0x79713d00afff ld-2.23.so ???
0x79713d0b0000 - 0x79713d0befff pkcs11.so ???
0x79713d0c6000 - 0x79713d0c9fff libattr.so.1.1.0 ???
0x79713d0cc000 - 0x79713d0e1fff libz.so.1.2.11 ???
0x79713d0e7000 - 0x79713d0ebfff libcap.so.2.24 ???
0x79713d0ee000 - 0x79713d132fff libdbus-1.so.3.14.8 ??? (WARNING: No symbols, libdbus-1.so.3.14.8, 5C540FE403A631453D6A83E57E1B64BB0)
0x79713d137000 - 0x79713d190fff libssl.so.1.0.0 ???
0x79713d19d000 - 0x79713d1dcfff libnl.so.1.1 ???
0x79713d1f1000 - 0x79713d203fff libminijailpreload.so ???
0x7ffc1fbcd000 - 0x7ffc1fbcefff linux-gate.so ???
,
Jul 25 2017
And a shill crash follows the wpa_supplicant crash: 2017-07-25T00:36:00.951493-07:00 DEBUG shill[8045]: [VERBOSE2:chromeos_supplicant_process_proxy.cc(119)] /fi/w1/wpa_supplicant1 CreateInterface 2017-07-25T00:36:01.183301-07:00 ERR shill[8045]: [ERROR:object_proxy.cc(582)] Failed to call method: fi.w1.wpa_supplicant1.CreateInterface: object_path= /fi/w1/wpa_supplicant1: org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying 2017-07-25T00:36:01.183417-07:00 ERR shill[8045]: [ERROR:chromeos_supplicant_process_proxy.cc(130)] Failed to create interface: org.freedesktop.DBus.Error.NoReply Message recipient disconnected from message bus without replying 2017-07-25T00:36:01.183466-07:00 DEBUG shill[8045]: [VERBOSE2:chromeos_supplicant_process_proxy.cc(159)] /fi/w1/wpa_supplicant1 GetInterface: wlan0 2017-07-25T00:36:01.184553-07:00 ERR shill[8045]: [ERROR:object_proxy.cc(582)] Failed to call method: fi.w1.wpa_supplicant1.GetInterface: object_path= /fi/w1/wpa_supplicant1: org.freedesktop.DBus.Error.ServiceUnknown: The name fi.w1.wpa_supplicant1 was not provided by any .service files 2017-07-25T00:36:01.184619-07:00 ERR shill[8045]: [ERROR:dbus_method_invoker.h(111)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name fi.w1.wpa_supplicant1 was not provided by any .service files 2017-07-25T00:36:01.185254-07:00 CRIT shill[8045]: [FATAL:chromeos_supplicant_process_proxy.cc(168)] Failed to get interface wlan0: org.freedesktop.DBus.Error.ServiceUnknown The name fi.w1.wpa_supplicant1 was not provided by any .service files#012/usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x7bb62c4a2ba3]#012
,
Jul 25 2017
@benchan - which autotests are you running exactly? Can you please try the test in a test lab and share the results?
,
Jul 25 2017
Hmm, network_WiFi_SimpleConnect.wifi_check1x_WPA works on a lab device with R62-9776.0.0: cros flash root@chromeos9-row4-rack1-host1.cros xbuddy://remote/samus/R62-9776.0.0/test test_that --board=samus --fast chromeos9-row4-rack1-host1.cros network_WiFi_SimpleConnect.wifi_check1x_WPA ... ---------------------------------------------------------------------------------------------------- network_WiFi_SimpleConnect.wifi_check1x_WPA [ PASSED ] network_WiFi_SimpleConnect.wifi_check1x_WPA/network_WiFi_SimpleConnect.wifi_check1x_WPA [ PASSED ] ---------------------------------------------------------------------------------------------------- Can you try again after a repo sync, and rebuilding everything? What exact tests are you running?
,
Jul 25 2017
The supplicant crash can be reproduced as follows. Note that even through a cellular test is used, it can run on a non-cellular DUT. The test simply queries shill if a cellular modem is found. The supplicant crash seems unrelated to the cellular test: cros flash $DUT xbuddy://remote/samus/R62-9778.0.0/test test_that --fast $DUT network_3GModemPresent Attached is the supplicant log
,
Jul 26 2017
FYI, the supplicant crash is no longer observed after switching back from libp11 to engine_pkcs11.
,
Jul 26 2017
Sorry for the delayed update. I ran on my Eve with 9777.0.0 and cannot reproduce the crash with network_3GModemPresent. Below is the test log, I do not see anything with "grep -i crash net.log" or "ls /var/spool/crash" or in dmesg.
21:36:44 INFO | autoserv| Got lock of exit_code_file.
21:36:44 INFO | autoserv| Released lock of exit_code_file and closed it.
21:36:46 INFO | autoserv| Killing child processes.
21:36:46 INFO | autoserv| Client complete
21:36:47 INFO | autoserv| Finished processing control file
21:36:47 INFO | autoserv| Skipping crash dump/info collection as requested.
21:36:47 INFO | autoserv| get_network_stats: at-end RXbytes 12593956 TXbytes 2151335
21:36:48 INFO | autoserv| record_state_duration failed: job_or_task_id=None, hostname=172.22.30.198, status=Running
-------------------------------------------------------------------------------------------------
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent [ FAILED ]
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent FAIL: Unhandled DBusException: org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent [ FAILED ]
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent FAIL: Unhandled DBusException: org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent 07/25 19:36:41.234 ERROR| test_environment:0133| Error during test initialization:
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent Traceback (most recent call last):
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent File "/usr/local/autotest/cros/cellular/test_environment.py", line 117, in __enter__
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent self._verify_sim()
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent File "/usr/local/autotest/cros/cellular/test_environment.py", line 244, in _verify_sim
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent modem_device = self._get_shill_cellular_device_object()
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent File "/usr/local/autotest/cros/cellular/test_environment.py", line 154, in _get_shill_cellular_device_object
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent raise error.TestError('Cannot find cellular device in shill. '
/tmp/test_that_results_4iIDv1/results-1-network_3GModemPresent/network_3GModemPresent TestError: Cannot find cellular device in shill. Is the modem plugged in?
-------------------------------------------------------------------------------------------------
Total PASS: 0/2 (0%)
21:36:49 INFO | Finished running tests. Results can be found in /tmp/test_that_results_4iIDv1 or /tmp/test_that_latest
,
Jul 26 2017
#c7 is wrong and I stand corrected. The crash dumps are there, just not in /var/spool/crash but in /usr/local/autotest/debug hierarchy, working on symbolizing those. In the meanwhile, can we consider reverting the libp11 CL?
,
Jul 26 2017
ben was going to check if there was a simpler command line or specific set of dbus calls that can repro this.
,
Jul 26 2017
I spent a long time looking into this today, but I ran out of time before finding a root cause. For now I recommend we just revert back to the previous libp11 (engine_p11) by reverting: https://chromium-review.googlesource.com/#/c/569904/ https://chromium-review.googlesource.com/#/c/569905/ https://chromium-review.googlesource.com/#/c/570142/
,
Jul 26 2017
I was able to reproduce the supplicant crash by enabling/disabling WiFi through shill a few times: To do that programmatically: dbus-send --print-reply --system --dest=org.chromium.flimflam /device/wlan0 org.chromium.flimflam.Device.Disable dbus-send --print-reply --system --dest=org.chromium.flimflam /device/wlan0 org.chromium.flimflam.Device.Enable Repeat the commands above until one of the them fails as follows: Error org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying Then look for the supplicant and shill crashes under /var/spool/crash We can probably add a stress enable/disable test for shill.
,
Jul 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/2dd11031ebbae69834105d41d686dce8f91d7d8d commit 2dd11031ebbae69834105d41d686dce8f91d7d8d Author: Kirtika Ruchandani <kirtika@chromium.org> Date: Thu Jul 27 05:41:47 2017 Revert "shill: update wpa_supplicant.conf for libp11 0.4.0+" This reverts commit d4b1f672b11ae58c36fb4588d8853996b0333966. This commit causes a crash in wpa-supplicant and shill, which can be reliably reproduced by running network_3GModemPresent on any board. CQ-DEPEND=CL:587370 BUG=chromium:748407 TEST=run network_3GModemPresent on eve, check for no supplicant crash Change-Id: If9af0077cd58647b87c59e8e035c79bc608086b9 Reviewed-on: https://chromium-review.googlesource.com/587328 Commit-Ready: Kirtika Ruchandani <kirtika@chromium.org> Tested-by: Kirtika Ruchandani <kirtika@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [modify] https://crrev.com/2dd11031ebbae69834105d41d686dce8f91d7d8d/shims/wpa_supplicant.conf.in
,
Jul 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/dc916c3bdfe1d569b8cdb1ed4bc56da4700accf4 commit dc916c3bdfe1d569b8cdb1ed4bc56da4700accf4 Author: Kirtika Ruchandani <kirtika@chromium.org> Date: Thu Jul 27 05:41:46 2017 Revert "net-wireless/wpa_supplicant: engine_pkcs11 -> libp11" This reverts commit d42e9591c3271da8f7fef3edd357cf6272657777. This commit causes a crash in wpa-supplicant and shill, which can be reliably reproduced by running network_3GModemPresent on any board. CQ-DEPEND=CL:587328 BUG=chromium:748407 TEST=run network_3GModemPresent on eve, check for no supplicant crash Change-Id: I69d0de711b157f5b2d845b3ca0465044f2cb28a5 Reviewed-on: https://chromium-review.googlesource.com/587249 Commit-Ready: Kirtika Ruchandani <kirtika@chromium.org> Tested-by: Kirtika Ruchandani <kirtika@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [modify] https://crrev.com/dc916c3bdfe1d569b8cdb1ed4bc56da4700accf4/net-wireless/wpa_supplicant/wpa_supplicant-9999.ebuild
,
Jul 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/portage-stable/+/3b618dd9199b8cc75ca869d4246cfe2da7a50684 commit 3b618dd9199b8cc75ca869d4246cfe2da7a50684 Author: Kirtika Ruchandani <kirtika@chromium.org> Date: Thu Jul 27 05:41:47 2017 Revert "dev-libs/libp11 & engine_pkcs11: Update to latest from gentoo" This reverts commit 31736c26284898a69357bc9cfba7011a328154ff. This commit causes a crash in wpa-supplicant and shill, which can be reliably reproduced by running network_3GModemPresent on any board. CQ-DEPEND=CL:587249 BUG=chromium:748407 TEST=run network_3GModemPresent on eve, check for no supplicant crash Change-Id: I788d2965d96f8d7fddf6063da92776c865474ecf Reviewed-on: https://chromium-review.googlesource.com/587370 Commit-Ready: Kirtika Ruchandani <kirtika@chromium.org> Tested-by: Kirtika Ruchandani <kirtika@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [add] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/engine_pkcs11/Manifest [modify] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/libp11/metadata.xml [add] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/engine_pkcs11/engine_pkcs11-0.1.8-r1.ebuild [modify] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/libp11/Manifest [add] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/libp11/files/libp11-0.2.8-no-ltdl.patch [delete] https://crrev.com/8355664c39ee36822c44fbd32946a0a67a1d6553/dev-libs/libp11/libp11-0.4.4.ebuild [add] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/libp11/files/libp11-0.2.8-variable-buffer-size.patch [add] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/engine_pkcs11/metadata.xml [add] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/libp11/libp11-0.2.8-r3.ebuild [add] https://crrev.com/3b618dd9199b8cc75ca869d4246cfe2da7a50684/dev-libs/engine_pkcs11/engine_pkcs11-0.1.8.ebuild
,
Jul 28 2017
,
Jul 28 2017
I just wrote up a painful long detailed explanation of what is happening here, but it got erased before I hit send, so I'll try to summarize again... wpa_supplicant uses the "dynamic" engine to load pkcs11.so. The dynamic engine uses a static variable to store an "ex_data" id for storing a context pointer. The dynamic engine registers a "free" function with this ID to unload any engines that it has dynamically loaded. The dynamically loaded pkcs11 engine also uses a static variable to store its "ex_data" id for storing its own context pointer. The IDs are from the same CRYPTO_EX_INDEX_ENGINE namespace, thus for wpa_supplicant's first TLS session, the dynamic ID = 0, pkcs11 ID = 1. Now, when the TLS session ends, tls_deinit() calls ENGINE_cleanup(), which calls the dynamic engine's "ex_data ID free routine" which in turn dlcloses() pkcs11.so. However, OpenSSL and the dynamic engine are themselves not unloaded, so the dynamic engine still uses the same "ex_data ID". After ENGINE_cleanup(), tls_deinit() then calls CRYPTO_cleanup_all_ex_data(), which resets all of the "ex_data" classes, invalidating the dynamic engines's ID. On the next TLS session, the dynamic engine's static id variable still has the last (stale) ID (0). However, it no longer has a registered free() function. Furthermore, when pkcs11 is dynamically loaded, its static variable has been reset, so this time, it fetches a new ID - but since dynamic didn't go first, the ID returned is (0) - the same as is being (re-)used by dynamic. Essentially, the dynamic and pkcs11 engines are both using the exact same memory for their contexts. When the second TLS session ends, since the dynamic engine failed to register a free() function, the pkcs11.so module is NOT dlclosed(), and hence its static variable is not reset. When the third TLS session, starts both dynamic and pkcs11 engines again use the same ID (0), and in fact use the same memory for their contexts. When the third TLS session ends, both engines end up free'ing the same pointer: dynamic thinks it is free'ing DYNAMIC_LIBNAME, and pkcs11.so thinks its free'ing module. This leads to a double free / crash. AFAICT, the right way to fix this is for the dynamic engine to reset dynamic_ex_data_idx in ENGINE_cleanup() (before CRYPTO_cleanup_all_ex_data), perhaps in a new "destroy" callback.
,
Aug 18 2017
To follow up here, the reason this is not a problem with upstream wpa_supplicant when using openssl 1.1 & upstream libp11, is this wpa_supplicant patch: 29bc76e3d3252dfc311d1ffaa15be66abd8509ee OpenSSL: Do not use library init/deinit functions with 1.1.0 The above patch comments out the call to CRYPTO_cleanup_all_ex_data() in tls_deinit() when using OPENSSL_VERSION_NUMBER >= 0x10100000L. So, we could just defer and update pkcs11 and wpa_supplicant and openssl-1.1 atomically.
,
Aug 18 2017
Staying closer to upstream SGTM. |
||||
►
Sign in to add a comment |
||||
Comment 1 by benchan@chromium.org
, Jul 25 2017