New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 837656 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Last visit > 30 days ago
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Out-of-sync of discovery session in Chrome/Instant Tether

Project Member Reported by sonnysasaka@chromium.org, Apr 27 2018

Issue description

From a specific case in b/78181184, a bluetoothd crash may cause out-of-sync of discovery session in Chrome/Instant Tether, apparent from the logs mentioning that ble_scanner_impl.cc keeps trying to stop a discovery session but that discovery session is not active.

At this moment it's not yet clear whether the out-of-sync is in Chrome Bluetooth API or in Instant Tether.
 
Ruchi did a repro and got the logs for this case with Eve (M67-10575.40.0):
https://drive.google.com/open?id=1uu8ZPfMqoKx-HxXJAprTPITlpTFJeroI
Cc: jlklein@chromium.org jhawkins@chromium.org
Status: Started (was: Untriaged)
Project Member

Comment 4 by bugdroid1@chromium.org, May 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/acd69e7c2897409ecc468f4a8cd5a6ec2661dfc9

commit acd69e7c2897409ecc468f4a8cd5a6ec2661dfc9
Author: Sonny Sasaka <sonnysasaka@chromium.org>
Date: Thu May 24 01:06:36 2018

device/bluetooth: Make sure to clear discovery sessions when adapter goes away

When Bluetooth adapter goes away, this could be caused by BlueZ crash or
chip error. In any case, we should make sure to clear all discovery
sessions, otherwise out-of-sync about discovery sessions can happen
between Chrome and BlueZ.

BUG= 837656 

Change-Id: Id1956039a94da32fe95a0d2c023eb97eef8d99fa
Reviewed-on: https://chromium-review.googlesource.com/1069934
Commit-Queue: Sonny Sasaka <sonnysasaka@chromium.org>
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>
Cr-Commit-Position: refs/heads/master@{#561347}
[modify] https://crrev.com/acd69e7c2897409ecc468f4a8cd5a6ec2661dfc9/device/bluetooth/bluez/bluetooth_adapter_bluez.cc
[modify] https://crrev.com/acd69e7c2897409ecc468f4a8cd5a6ec2661dfc9/device/bluetooth/bluez/bluetooth_bluez_unittest.cc

Sonny, should this be merged to M-67?
Labels: Merge-Request-67
Yes, filed a merge request.
Project Member

Comment 7 by sheriffbot@chromium.org, May 24 2018

Labels: -Merge-Request-67 Merge-Review-67 Hotlist-Merge-Review
This bug requires manual review: We are only 4 days from stable.
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Pending update from Sonny re: testing
Testing result:
Kyle did a stress test involving continuous scanning for hours. The tested subject is eve M67-10575.40.0 (which still had a lot of bluetoothd crashes) with the fix patch applied. This was intentionally chosen to see whether the fix is resilient of these crashes. During about 3 hours of continuous scan, there were 10 crashes of bluetoothd but none of them led to the out-of-sync state this bug is about. We stopped the test when it hit other issues outside the scope of this bug.

logs from Kyle's testing: https://drive.google.com/file/d/1G6Fi7-ZkU5k0RZoa5WoWREcUkJ2aZkU7/view

So, from the testing it can be seen that the fix successfully mitigate the out-of-sync discovery session bug.
Thanks.  Also, I didn't note where this bug derived.  Is this a M67 regression or has it existed longer than that?  Also, if we have all of the other crashes mentioned, will this fix make a notable difference for the users?
It has existed since August 2017, but the bug never manifested because other factors didn't quite permit it to happen likely.

The crash bugs of bluetoothd (Bluetooth daemon) also have been there for long, but manifested only recently possibly because changes of other factors (like Bluetooth-heavy features). There is another long-term ongoing work to fix bluetoothd crashes. This fix will make our Bluetooth stack to at least recover from those crashes, not get stuck in a bad state. Without the fix, the user notices that the Bluetooth stack is stuck in a bad state (cannot scan anymore without restarting the stack).
Sorry, to clarify my last comment: The bug has existed since August 2017 (because that's the first time I saw the code), but could be much earlier according to the code history.
Labels: -Merge-Review-67 Merge-Rejected-67
Stable is pending so I'd rather reject in favor of M68, esp as this has been an issue for some time.  Please retag the request with reasoning if there's reason otherwise.  Thanks
Labels: Merge-Request-67
Hey Kevin - I still think this is worthy of merging to M-67. You are correct that the issue has been in existence for some time, but it seems that some other factors have changed in M-67 which have exposed the issue. We are still investigating *why* the rate of bluetoothd crashes has increased so rapidly in M-67, but we are definitely sure that there *has* been an increase.

Please see the following metrics which indicate the severity of the issue:
https://uma.googleplex.com/timeline_v2?sid=4fdca47899002f26feabdf192efe5fb3

Generally, this metric hovers around 96-99% success rate. In M-67, the success rate has plummeted to 25-30%. This means that any Chrome OS feature which utilizes Bluetooth Low Energy scans will have a ~75% chance of failing; worse yet, these issues are not recoverable and thus require a full power cycle of the machine before they are fixed. Since most users simply close their laptop lid and reopen it later (without a full shutdown-then-startup flow), this bug will leave the majority of users unable to use BLE features, such as Instant Tethering and EasyUnlock.

We are currently running tests on the latest M-67 cut to confirm that the issue still exists, and we will provide test logs as soon as possible. Is there anything specifically you'd like us to do to help with your decision? Thanks!
Project Member

Comment 15 by sheriffbot@chromium.org, May 29 2018

Labels: -Merge-Request-67 Merge-Review-67
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Thanks for the context in #14.  Assume everyone is satisfied with testing showing that this hasn't made matters worse?  I'll approve if that's the case.
Hi Kevin, we understand that it's risky to merge so we did more investigation about the severity of this issue. Taking into consideration the other fixes that has landed and improved the stability (CL 1063049, CL 1063050, CL 1063051), we see that it's not urgent to merge this particular fix to M67 so we accept the merge reject decision. Thanks for reviewing this merge request!
Status: Fixed (was: Started)
Thanks again, Kevin. Sorry for the noise here :)
Labels: -Merge-Review-67
No worries; thanks for the feedback.
Status: Verified (was: Fixed)

Sign in to add a comment