BluetoothRemoteGattCharacteristicBlueZ::WriteRemoteCharacteristic() discards callbacks |
|||||||
Issue descriptionLeft a long-running host scan going. Eventually, it stopped working. Toward the end of the logs, I see things like: [1390:1390:0118/114324.293210:INFO:ble_connection_manager.cc(229)] Register - Device ID: "CAESR...aLg==", Reason: [KeepAliveTickle] ...but no connection is ever attempted. Still digging into this. Logs here: https://drive.google.com/file/d/1ENQT7REBOQOEJwxVYSA2SodZWbYzvuK8/view?usp=sharing
,
Jan 19 2018
Writing the "connection close" packet should eventually trigger a transition to the DISCONNECTED state. However, it does not appear that we ever get either the success or error callback called when writing the packet. These logs appear just after the packet is written: [1390:1390:0118/113639.814072:VERBOSE1:bluetooth_remote_gatt_characteristic_bluez.cc(203)] Sending GATT characteristic write request to characteristic: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char002c, UUID: 00000100-0004-1000-8000-001a11000101, with value: [A20000]. [1390:1390:0118/113639.841778:VERBOSE2:bluetooth_gatt_descriptor_client.cc(170)] Remote GATT descriptor removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char0029/desc002b [1390:1390:0118/113639.841826:VERBOSE2:bluetooth_remote_gatt_characteristic_bluez.cc(282)] Unknown descriptor removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char0029/desc002b [1390:1390:0118/113639.841847:VERBOSE1:bluetooth_remote_gatt_characteristic_bluez.cc(286)] Removing remote GATT descriptor from characteristic: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char0029, UUID: 00000100-0004-1000-8000-001a11000102 [1390:1390:0118/113639.841889:VERBOSE2:bluetooth_remote_gatt_characteristic_bluez.cc(282)] Unknown descriptor removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char0029/desc002b [1390:1390:0118/113639.841963:VERBOSE2:bluetooth_gatt_characteristic_client.cc(212)] Remote GATT characteristic removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char0029 [1390:1390:0118/113639.841988:VERBOSE2:bluetooth_remote_gatt_service_bluez.cc(205)] Unknown GATT characteristic removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char0029 [1390:1390:0118/113639.842008:VERBOSE1:bluetooth_remote_gatt_service_bluez.cc(209)] Removing remote GATT characteristic from service: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028, UUID: b3b7e28e-a000-3e17-bd86-6e97b9e28c11 [1390:1390:0118/113639.842087:VERBOSE2:bluetooth_gatt_characteristic_client.cc(212)] Remote GATT characteristic removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char002c [1390:1390:0118/113639.842109:VERBOSE2:bluetooth_remote_gatt_service_bluez.cc(205)] Unknown GATT characteristic removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028/char002c [1390:1390:0118/113639.842128:VERBOSE1:bluetooth_remote_gatt_service_bluez.cc(209)] Removing remote GATT characteristic from service: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028, UUID: b3b7e28e-a000-3e17-bd86-6e97b9e28c11 [1390:1390:0118/113639.842211:VERBOSE2:bluetooth_gatt_service_client.cc(91)] Remote GATT service removed: /org/bluez/hci0/dev_7C_A8_45_6E_4E_59/service0028 Bluetooth folks, does this mean that the descriptor was removed just as the packet was about to be sent? Does this cause the Bluetooth stack to never actually call the callback/errback?
,
Jan 19 2018
I talked to Miao offline about this, and she thinks that it is indeed a Bluetooth bug in which BluetoothRemoteGattCharacteristicBlueZ discards the callback and error callback when the descriptor is removed. Because the callbacks are never invoked, Instant Tethering code never learns that the packet was sent and never closes the connection. Assigning this issue to Joseph to look at things further. Since we now suspect this is a Bluetooth bug, I've included more debug logs here: https://drive.google.com/open?id=1Qxbspo7Q4Z9PZCRpNvSNNg5XYi1uFdMm.
,
Jan 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fe22dec1fcd295c8681a6bf6c698c2469ceb3c90 commit fe22dec1fcd295c8681a6bf6c698c2469ceb3c90 Author: Kyle Horimoto <khorimoto@google.com> Date: Mon Jan 22 23:42:45 2018 [CrOS Tether] Add a timeout for sending a message over BLE. The Bluetooth APIs accept callback and error callback parameters when sending a message, but we've recently discovered that there are some bugs which can prevent these callbacks from being invoked. This CL adds an extra timeout which ensures that even if the callbacks are not invoked, BluetoothLowEnergyWeaveClientConnection handles this case robustly. This is not the full fix to https://crbug.com/803677, but it is a partial fix which can avoid some nasty side effects for Instant Tethering. Bug: 803677, 672263 Change-Id: I982c08dfe726a3c3e3164aac87ca8bec10b68be7 Reviewed-on: https://chromium-review.googlesource.com/877339 Reviewed-by: Ryan Hansberry <hansberry@chromium.org> Commit-Queue: Ryan Hansberry <hansberry@chromium.org> Commit-Queue: Kyle Horimoto <khorimoto@chromium.org> Cr-Commit-Position: refs/heads/master@{#531062} [modify] https://crrev.com/fe22dec1fcd295c8681a6bf6c698c2469ceb3c90/components/cryptauth/ble/bluetooth_low_energy_weave_client_connection.cc [modify] https://crrev.com/fe22dec1fcd295c8681a6bf6c698c2469ceb3c90/components/cryptauth/ble/bluetooth_low_energy_weave_client_connection.h [modify] https://crrev.com/fe22dec1fcd295c8681a6bf6c698c2469ceb3c90/components/cryptauth/ble/bluetooth_low_energy_weave_client_connection_unittest.cc [modify] https://crrev.com/fe22dec1fcd295c8681a6bf6c698c2469ceb3c90/tools/metrics/histograms/enums.xml
,
Jan 22 2018
,
Jan 23 2018
,
Jan 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/456a4076ac91b56affd8fcd0a9898104bad45ccc commit 456a4076ac91b56affd8fcd0a9898104bad45ccc Author: Kyle Horimoto <khorimoto@google.com> Date: Tue Jan 23 22:32:22 2018 [CrOS Tether] Add a timeout for sending a message over BLE. The Bluetooth APIs accept callback and error callback parameters when sending a message, but we've recently discovered that there are some bugs which can prevent these callbacks from being invoked. This CL adds an extra timeout which ensures that even if the callbacks are not invoked, BluetoothLowEnergyWeaveClientConnection handles this case robustly. This is not the full fix to https://crbug.com/803677, but it is a partial fix which can avoid some nasty side effects for Instant Tethering. TBR=khorimoto@google.com (cherry picked from commit fe22dec1fcd295c8681a6bf6c698c2469ceb3c90) Bug: 803677, 672263 Change-Id: I982c08dfe726a3c3e3164aac87ca8bec10b68be7 Reviewed-on: https://chromium-review.googlesource.com/877339 Reviewed-by: Ryan Hansberry <hansberry@chromium.org> Commit-Queue: Ryan Hansberry <hansberry@chromium.org> Commit-Queue: Kyle Horimoto <khorimoto@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#531062} Reviewed-on: https://chromium-review.googlesource.com/882216 Reviewed-by: Kyle Horimoto <khorimoto@chromium.org> Cr-Commit-Position: refs/branch-heads/3325@{#39} Cr-Branched-From: bc084a8b5afa3744a74927344e304c02ae54189f-refs/heads/master@{#530369} [modify] https://crrev.com/456a4076ac91b56affd8fcd0a9898104bad45ccc/components/cryptauth/ble/bluetooth_low_energy_weave_client_connection.cc [modify] https://crrev.com/456a4076ac91b56affd8fcd0a9898104bad45ccc/components/cryptauth/ble/bluetooth_low_energy_weave_client_connection.h [modify] https://crrev.com/456a4076ac91b56affd8fcd0a9898104bad45ccc/components/cryptauth/ble/bluetooth_low_energy_weave_client_connection_unittest.cc [modify] https://crrev.com/456a4076ac91b56affd8fcd0a9898104bad45ccc/tools/metrics/histograms/enums.xml
,
Jan 23 2018
Joseph: I've landed a defensive fix for this on my side (and merged it to M-65), but the actual bug in Bluetooth still needs to be fixed. Keeping this bug open for your fix.
,
Jan 30 2018
Hi Kyle, is it possible that the problem is on the android side? I encountered this issue from time to time. It seems to me that restarting the phone would make the connection work.
,
Jan 30 2018
Hi Joseph - this should be agnostic to the Android side and should behave correctly either way. Here's what is happening: (1) Instant Tethering calls BluetoothRemoteGattCharacteristicBlueZ::WriteRemoteCharacteristic() to send a message to the phone. It passes a success and error callback. (2) The phone disconnects just as this is occurring. (3) Neither the success or error callbacks is invoked. We get stuck. The correct thing to do is simply to call the error callback if this occurs instead of just not calling any callback. Looking at the code, it seems that the callbacks get passed to [1]. Then, if the BluetoothGattCharacteristicClientImpl object is deleted, the callbacks get dropped since its WeakPtrFactory is deleted. I didn't read the class thoroughly, but it seems to me that the correct thing to do would be to invoke any pending error callbacks in the destructor of this class. [1] https://cs.chromium.org/chromium/src/device/bluetooth/dbus/bluetooth_gatt_characteristic_client.cc?q="void+WriteValue"
,
Feb 2 2018
Downgrading priority since I've added a workaround for M-65. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by khorimoto@chromium.org
, Jan 19 2018Looks like authentication fails at some point, then we never unregister the device whose authentication fails. This causes it to get stuck in the device queue: [1390:1390:0118/113639.811200:WARNING:device_to_device_authenticator.cc(198)] Authentication failed: Timed out waiting for [Responder Auth] [1390:1390:0118/113639.811963:WARNING:secure_channel.cc(302)] Failed to authenticate connection to device with ID CAESR...aLg== [1390:1390:0118/113639.812639:INFO:bluetooth_low_energy_weave_client_connection.cc(231)] Disconnection requested; sending "connection close" uWeave packet to {id: "CAESR...aLg==", addr: "7C:A8:45:6E:4E:59"}. [1390:1390:0118/113639.813331:INFO:bluetooth_low_energy_weave_client_connection.cc(816)] Sending a "connection close" uWeave packet. [1390:1390:0118/113639.814027:INFO:bluetooth_low_energy_weave_client_connection.cc(647)] Writing 3 bytes to {id: "CAESR...aLg==", addr: "7C:A8:45:6E:4E:59"}. Then, because the message that got stuck in the queue was a TetherAvailabilityRequest, the scan was still considered active (since all connection attempts had not yet completed). Thus, when a new scan was triggered, HostScanScheduler thought that a scan was ongoing and never scheduled a new one.