New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 905523 link

Starred by 6 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Investigate why Smart Lock takes so long to visibly fail and time out

Project Member Reported by hansberry@chromium.org, Nov 15

Issue description

Smart Lock only needs to scan for one host, which should only take 5 (?) seconds given background advertising and foreground scanning latency. However, it seems to take 15-20 seconds to realize that either a phone is not nearby, or that Bluetooth connection attempts to phone failed.

This long timeout leads to poor UX. And as a team, we could probably get a better sense of the feature's stability if we reduce the total timeout duration.

I suspect the issue is in how long UnlockManager is given to wake up from sleep: 15 seconds [1]. This seems excessive.,

Requires more investigation.

1) https://cs.chromium.org/chromium/src/chromeos/components/proximity_auth/unlock_manager_impl.cc?g=0&l=31
 
I may be wrong about the "5 seconds"; Kyle or Jeremy, feel free to jump in and tell me I'm wrong :p
You're wrong :-p. K I'll be a bit more detailed... lots of steps can add up here. As a starting point, both the discovery phase (waiting for an advertisement hit) and Gatt connection phase (service discovery, etc) can take more than a couple seconds. I'd be worried bout 5s as a hard timeout since our metrics indicate that not even our median is there right now.
While testing my recent CL [1] which measures latency for creating a BLE connection, I found that the "start scan to authentication" time is usually over 5s. We don't have real metrics coming in yet since I just submitted that CL, but here are some measurements from chrome://histograms that I gathered from >200 connection attempts:

0 ... 
4133 --------------------------O (18 = 8.5%) {0.0%}
4523 ------------------------------------------------------------O (41 = 19.4%) {8.5%}
4949 ------------------------------------------------------------------------O (49 = 23.2%) {28.0%}
5416 ------------------------------------------------------------------O (45 = 21.3%) {51.2%}
5927 -----------------------------------O (24 = 11.4%) {72.5%}
6486 -------------------------O (17 = 8.1%) {83.9%}
7097 ------------O (8 = 3.8%) {91.9%}
7766 -------O (5 = 2.4%) {95.7%}
8498 ---O (2 = 0.9%) {98.1%}
9299 O (0 = 0.0%) {99.1%}
10176 -O (1 = 0.5%) {99.1%}
11135 -O (1 = 0.5%) {99.5%}
12185 ...

So, in this small sample size in which the phone is less than 1 foot away from the laptop, ~50% of attempts take more than 5.4 seconds.

[1] https://chromium-review.googlesource.com/c/1327466
Figured I was wrong :p Thanks -- maybe 8 seconds will be a more reasonable timeout for creating an authenticated connection? (95th percentile) Then, we can create a second timeout in this class that waits for messages to come back from the phone.

I won't be looking into this bug until a couple weeks from now, so we can wait and see the results of Kyle's metric in the wild.

Comment 5 Deleted

Another potential optimization!

I just hit a case where the Chromebook found the phone, but the phone was then lost. Unfortunately, in the listening context, SecureChannel service doesn't bubble up this kind of error to clients (i.e., Smart Lock). This means SL waits for the connection for the max timeout period (15 seconds) instead of informing the user immediately that an error has occurred.

See logs:

[26467:26467:1127/102928.323355:INFO:ble_scanner_impl.cc(150)] Started discovery session successfully.
[26467:26467:1127/102928.475117:INFO:ble_scanner_impl.cc(271)] BleScannerImpl::HandleDeviceUpdated(): Received scan result from device with ID "CAESR...sU2aZ". Service data: 0x2369, Background advertisement: true
[26467:26467:1127/102928.475185:INFO:ble_connection_manager_impl.cc(359)] BleConnectionManager::OnReceivedAdvertisement(): Connection established; starting authentication process. Remote device ID: CAESR...sU2aZ, Connection role: [listener role]
[26467:26467:1127/102928.475212:INFO:bluetooth_low_energy_weave_client_connection.cc(234)] Creating GATT connection with {id: "CAESR...sU2aZ", addr: "4E:EF:5C:AD:DB:7E"}.
[26467:26467:1127/102928.523814:WARNING:bluetooth_low_energy_weave_client_connection.cc(427)] Device has been lost: {id: "CAESR...sU2aZ", addr: "4E:EF:5C:AD:DB:7E"}.
[26467:26467:1127/102928.901819:INFO:ble_scanner_impl.cc(183)] Stopped discovery session successfully.
[26467:26467:1127/102929.105693:INFO:ble_scanner_impl.cc(150)] Started discovery session successfully.

(~14 seconds of the user needlessly staring at the spinning SL icon, hoping it works...)
(We would preferably hit this line [1])

[26467:26467:1127/102943.234223:INFO:unlock_manager_impl.cc(440)] Updating screenlock state from [bluetooth connecting] to [no phone]

Kyle, we've spoken about why SecureChannel service can't bubble up errors to clients, but I'm a bit fuzzy on the details -- shouldn't we be able to bubble up an error like BluetoothAdapter::DeviceRemoved() after receiving an advertisement from the expected device? Does ConnectionAttemptFailureReason::GATT_CONNECTION_ERROR ever get bubbled up to clients in the listening context? 

1) https://cs.chromium.org/chromium/src/chromeos/components/proximity_auth/remote_device_life_cycle_impl.cc?l=113
2) https://cs.chromium.org/chromium/src/chromeos/services/secure_channel/public/mojom/secure_channel.mojom?type=cs&q=ConnectionAttemptFailureReason+file:mojom&g=0&l=22
Spoke with Kyle offthread.

Most GATT connection errors are recoverable, including DeviceRemoved, so usually after an error like this, we would expect to see a connection established after the second discovery session begins in the logs above. However, we don't have data on how long each each generally takes to recover -- that may be worth investigating or creating metrics for. If some recoverable errors take dozens of seconds to recover from, perhaps they should result in terminating the listening attempt early? (Perhaps that could be optional, and a client like SL could opt for a less lenient failure strategy, in order to improve its perceived latency?) 

We also discussed the possibility of informing clients when each connection attempt fails, so that SL could display a message once it times out like "Found phone, but failed to connect". However, we're not sure that provides any additional value to the user, and it would require a few weeks of work.


Just hit another issue on my Chromebook when using SL:

[29195:29195:1127/153203.139210:INFO:unlock_manager_impl.cc(440)] Updating screenlock state from [inactive] to [bluetooth connecting]
[29195:29195:1127/153209.641978:ERROR:ble_scanner_impl.cc(163)] Error starting discovery session.
(logs spammed with "Error starting discovery session" for 15 seconds...)
[29195:29195:1127/153218.140284:INFO:unlock_manager_impl.cc(440)] Updating screenlock state from [bluetooth connecting] to [no phone]

SecureChannel doesn't bubble up to clients when it fails to start a discovery session multiple times in a row (this is unrecoverable). We would preferably immediately inform the user that their phone cannot be connected to. 

Discussed with Kyle offline: this would be a nice-to-have, but would take a couple weeks worth of work to implement. We would need to weigh this potential change against others to determine priority.
Is there a way for me to see these logs on my device or would I have to be in dev mode?
Sure! The easiest way is to navigate your browser to chrome://proxmity-auth. This is the webpage we use to debug our products like Instant Tethering and Smart Lock.

Comment 11 Deleted

I emailed you my logs since I'm not sure if they include information that shouldn't be posted publicly. Hopefully, you can determine what the problem is.
Cc: kbleicher@chromium.org
 Issue 871233  has been merged into this issue.

Sign in to add a comment