New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 845107 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jul 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Failed to stop discovery due to __hci_req_sync() err -110

Project Member Reported by josephsih@chromium.org, May 21 2018

Issue description

Chrome Version: eve-release/R66-10452.22.0

What steps will reproduce the problem?
Running magic tethering with 1 pixel phone present.

What is the expected result?
It should run smoothly.

What happens instead?
It failed to stop discovery at some point and was unrecoverable. No more LE connection could be created since then. 

The message log showed that 
2018-05-21T03:10:02.183644+08:00 DEBUG kernel: [56661.407529] hci_request.c:__hci_req_sync() hci0 end: err -110

The btsnoop log showed that the controller has disabled the scanning.
< HCI Command: LE Set.. (0x08|0x000c) plen 2  [hci0] 2018-05-21 03:10:02.185403
    Scanning: Disabled (0x00)
    Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4   [hci0] 2018-05-21 03:10:02.191357
    LE Set Scan Enable (0x08|0x000c) ncmd 1
    Status: Success (0x00)

This looks like a kernel issue.

It is probably a bug across all platforms.

The tarball of logs: https://drive.google.com/open?id=1qSkQMtyYXdLdN75wyd9qTsPmrh6ixMjd

 
Status: Started (was: Assigned)
This issue was caused by a race condition between kernel and the bluetoothd daemon. The scanning could be disabled for many reasons in kernel, e.g., on behalf of enabling advertising. To disable scanning requires to build a request of HCI commands, to submit the request to workqueue, execution of the commands, and to receive the event complete callback. This usually takes a few milli-seconds to finish. When completed, kernel would send a discovering-off event to bluetoothd for the discovery state change.

If both kernel and bluetoothd would like to disable scanning, there are some cases.
Case 1: kernel has disabled scanning well before bluetoothd does: kernel has sent the discovering-off event to notify bluetoothd. In this case, bluetoothd would not re-stop discovery.

Case 2: bluetoothd has sent a request to kernel to disable discovery in an earlier time. In this case, kernel would not try to disable scanning again in this case.

Case 3: If both kernel and bluetoothd would like to disable scanning almost at the same time, this may incur a problem.

Chance is that Thread_1 in kernel would accept the request from bluetoothd, while Thread_2 in kernel starts a bit earlier and is in the procedure of disabling scanning. When the request command built on behalf of bluetoothd is executed, it is possible that scanning is already disabled by the Thread_2. And thus the command would be simply skipped. Kernel would not receive any event and would time out eventually after 2 seconds. To fix it, we need a dummy command again appended in the request command queue. This dummy command would trigger the command complete event callback and finish the stop_discovery operation correctly. The issue occurs once in a while such that we did not notice it before.



The following logs (0524_1908_mt_0x54_btusb_delay_cl_fail_stop_discov_-110) reflected what happened in Case 3 above.
Link: https://drive.google.com/open?id=1TiXh2MoxGSdBd9HhobYGK0aY5KvFN6Q9

# Started discovery.
2018-05-25T01:25:57.558702+08:00 DEBUG bluetoothd[2690]: src/adapter.c:start_discovery() sender :1.7

# It was in DISCOVERING state now.
2018-05-25T01:25:57.815054+08:00 DEBUG kernel: [22707.416695] hci_core.c:hci_discovery_set_state() hci0 state 1 -> 2 (DISCOVERING)

# Kernel Thread_2 determined to disable scanning.
2018-05-25T01:25:58.001392+08:00 DEBUG kernel: [22707.602740] hci_request.c:hci_req_add_le_scan_disable() BT_DBG_DG: set scan enable tx: hci_req_add_le_scan_disable (ena=0)

# Kernel Thread_2 sent the HCI command correctly to disable scanning.
2018-05-25T01:25:58.001428+08:00 DEBUG kernel: [22707.602776] hci_core.c:skip_conditional_cmd() COND opcode=0x200c, wanted=0, on=1, chngn=0

# Bluetoothd tried to disable discovery 2 ms later.
2018-05-25T01:25:58.003906+08:00 DEBUG bluetoothd[2690]: src/adapter.c:stop_discovery() sender :1.7

# Kernel Thread_1 accepted the request from bluetoothd since Thread_1, not aware of that Thread_2 was stopping scanning, considered the state still as DISCOVERING.
2018-05-25T01:25:58.005123+08:00 DEBUG kernel: [22707.606621] mgmt.c:stop_discovery() hci0

# Kernel Thread_2 set the state to STOPPING at this time. There was only 2 micro-second late than the above step.
2018-05-25T01:25:58.005125+08:00 DEBUG kernel: [22707.606625] hci_core.c:hci_discovery_set_state() hci0 state 2 -> 4 (STOPPING)

2018-05-25T01:25:58.005133+08:00 DEBUG kernel: [22707.606647] hci_request.c:hci_req_stop_discovery() BT_DBG_DG: call hci_req_add_le_scan_disable: request:hci_req_stop_discovery(1)                                                                       

2018-05-25T01:25:58.005165+08:00 DEBUG kernel: [22707.606695] hci_core.c:hci_discovery_set_state() hci0 state 4 -> 0    

# Bluetoothd’s HCI command was skipped since scanning was already off.
2018-05-25T01:25:58.005137+08:00 DEBUG kernel: [22707.606650] hci_request.c:hci_req_add_le_scan_disable() BT_DBG_DG: set scan enable tx: hci_req_add_le_scan_disable (ena=0)
2018-05-25T01:25:58.006372+08:00 DEBUG kernel: [22707.607722] hci_core.c:skip_conditional_cmd() COND opcode=0x200c, wanted=0, on=0, chngn=0


# Since no event callback, the HCI command timed out 2 seconds later.
2018-05-25T01:26:00.005349+08:00 DEBUG kernel: [22709.606635] hci_request.c:__hci_req_sync() default err: -110, req_status: 1
2018-05-25T01:26:00.005364+08:00 DEBUG kernel: [22709.606640] hci_request.c:__hci_req_sync() hci0 end: err -110


Project Member

Comment 2 by bugdroid1@chromium.org, Jul 13

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/56e94b215b942d7331f9d265f1e2f695bf581d6a

commit 56e94b215b942d7331f9d265f1e2f695bf581d6a
Author: Joseph Hwang <josephsih@chromium.org>
Date: Fri Jul 13 12:09:36 2018

CHROMIUM: bluetooth: Add a dummy cmd in hci_req_add_le_scan_disable

If the HCI_OP_LE_SET_SCAN_ENABLE command is skipped when
stopping discovery, the corresponding command complete event
would never be received. This would cause the HCI command
timeout and thus the stop_discovery method fails.

The above said command might be skipped in a rare condition
when both kernel and the bluetoothd damemon would like to stop
discovery almost at the same time. In this case, chance is that
they might be out of sync of the discovery state and result in
the above error.

To work around the race condtion between kernel and bluetoothd,
a dummy HCI command is added. This makes sures that when kernel
has stopped scanning at a bit earlier time, bluetoothd, while
trying to re-stop discovery and having the HCI command skipped,
could still receive the correct stop_discovery_complete callback.

BUG= chromium:845107 
TEST=Run magic tethering for 24 hours and observe that
there are no error messages such as

  hci_request.c:__hci_req_sync() hci0 end: err -110

accompanied by the logs

Bluetooth: hci_core.c:skip_conditional_cmd()   COND LE cmd (0x200c) is
already 0 (chg 0), skip transition to 0
Bluetooth: hci_core.c:skip_conditional_cmd()   COND no more cmd in
queue.

which indicates that the last command is skipped.

Change-Id: Icbaf3b3a1faeebc135daf462780afd81c3eb2255
Signed-off-by: Joseph Hwang <josephsih@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1075922
Commit-Ready: Shyh-In Hwang <josephsih@chromium.org>
Tested-by: Shyh-In Hwang <josephsih@chromium.org>
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>

[modify] https://crrev.com/56e94b215b942d7331f9d265f1e2f695bf581d6a/net/bluetooth/hci_request.c

Status: Fixed (was: Started)
Status: Verified (was: Fixed)

Sign in to add a comment