Race condition in client_cert_resolver can leave networks without EAP.CertId |
|||||||||
Issue description
When the following series of events happens:
The user signs in. The user certificate database is not loaded yet.
(1) ClientCertResolver is triggered and enters ResolveNetworks.
It decides that networks need to be resolved (networks_to_resolve is not empty).
It starts a first resolving task [A] which takes a long time (~4 seconds on my machine).
While the task [A] is running...
(2) ClientCertResolver is triggered and enters ResolveNetworks.
It decides that no network needs resolving ("No networks to resolve." branch). It exits early.
(3) The user certficate database finishes loading.
ClientCertResolver is triggered as a result and enters ResolveNetworks.
It decides that networks need to be resolved (networks_to_resolve is not empty).
It starts another resolving task [B] which is fast.
(4) Resolving task [B] finds certificates and sets them in shill, and finishes.
(5) Resolving task [A] was started before the user's certificates were loaded, doesn't find certificates and clears them in shill, and finishes.
Root cause:
ClientCertResolver is designed to only have one running resolving task at a time, but this sequence of events starts two tasks. The reason is a subtly wrong handling of the resolve_task_running_ variable. Especially, resolve_task_running_ is set to false when ResolveNetworks exits early (if there are no networks to resolve).
Consequence: It seems that the main symptom is that the Ethernet networks don't get a certificate. I *think* wifi networks may not be affected as much because the user can manually connect to them (re-triggering the resolving process) and/or because wifi scanning may trigger the resolving process.
When does this happen:
I'm not sure what triggers it exactly. The first resolving task must run for a long time. I will try to investigate the exact causes.
Why is the first task so slow?
It seems that CreateSortedCertAndIssuerList is very slow the first time it is invoked (I filed bug 781275 to investigate).
,
Nov 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/75afb402fdb45737708c61c16e744b2d2a93b418 commit 75afb402fdb45737708c61c16e744b2d2a93b418 Author: Pavol Marko <pmarko@chromium.org> Date: Tue Nov 07 09:23:40 2017 ClientCertResolver: Ensure only one cert resolving task is queued Fix handling of the resolve_task_running_ variable in the case that there are no networks to resolve. Without this, ClientCertResolver could schedule multiple cert resolving tasks in parallel, which can lead to wrong behavior. BUG= 781276 TEST=chromeos_unittests --gtest_filter=ClientCertResolverTest.* Change-Id: Ib3cc8a70425e82f6a11cef4bbefe0ad82d571eb4 Reviewed-on: https://chromium-review.googlesource.com/753371 Reviewed-by: Steven Bennetts <stevenjb@chromium.org> Reviewed-by: Maksim Ivanov <emaxx@chromium.org> Commit-Queue: Pavol Marko <pmarko@chromium.org> Cr-Commit-Position: refs/heads/master@{#514428} [modify] https://crrev.com/75afb402fdb45737708c61c16e744b2d2a93b418/chromeos/network/client_cert_resolver.cc [modify] https://crrev.com/75afb402fdb45737708c61c16e744b2d2a93b418/chromeos/network/client_cert_resolver.h [modify] https://crrev.com/75afb402fdb45737708c61c16e744b2d2a93b418/chromeos/network/client_cert_resolver_unittest.cc
,
Nov 9 2017
This will be in M64 (10110.0.0 / 64.0.3262.0) https://chromium.googlesource.com/chromium/src/+log/64.0.3261.0..64.0.3262.0?n=10000
,
Nov 9 2017
+Katherine/Jing Could you please verify that policy-configured EAP-TLS networks still auto-connect on a release >= 64.0.3262.0 ? Thanks!
,
Nov 13 2017
pmarko@ I have verified in today's ToT M64.0.3265.0 10124.0.0 dev paine that policy-configured EAP-TLS networks is still auto-connected.
,
Nov 20 2017
Requesting merge to M-63 (3239). This fixes ethernet EAP-TLS authentication issues observed internally. Thanks!
,
Nov 20 2017
This bug requires manual review: M63 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 23 2017
+David as PM for awareness This is a race condition in the client certificate resolving code. The symptoms are that the device does not automatically authenticate using EAP-TLS after user session start for policy-configured EAP-TLS networks. I'm not sure what triggers it exactly.
,
Nov 28 2017
,
Nov 29 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3ab554cb8d0154029d6eecf2b5b9afef87d8562f commit 3ab554cb8d0154029d6eecf2b5b9afef87d8562f Author: Pavol Marko <pmarko@chromium.org> Date: Wed Nov 29 10:28:40 2017 [Merge to M63] ClientCertResolver: Ensure only one cert resolving task is queued Fix handling of the resolve_task_running_ variable in the case that there are no networks to resolve. Without this, ClientCertResolver could schedule multiple cert resolving tasks in parallel, which can lead to wrong behavior. BUG= 781276 TEST=chromeos_unittests --gtest_filter=ClientCertResolverTest.* TBR=pmarko@chromium.org (cherry picked from commit 75afb402fdb45737708c61c16e744b2d2a93b418) Change-Id: Ib3cc8a70425e82f6a11cef4bbefe0ad82d571eb4 Reviewed-on: https://chromium-review.googlesource.com/753371 Reviewed-by: Steven Bennetts <stevenjb@chromium.org> Reviewed-by: Maksim Ivanov <emaxx@chromium.org> Commit-Queue: Pavol Marko <pmarko@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#514428} Reviewed-on: https://chromium-review.googlesource.com/795715 Reviewed-by: Pavol Marko <pmarko@chromium.org> Cr-Commit-Position: refs/branch-heads/3239@{#602} Cr-Branched-From: adb61db19020ed8ecee5e91b1a0ea4c924ae2988-refs/heads/master@{#508578} [modify] https://crrev.com/3ab554cb8d0154029d6eecf2b5b9afef87d8562f/chromeos/network/client_cert_resolver.cc [modify] https://crrev.com/3ab554cb8d0154029d6eecf2b5b9afef87d8562f/chromeos/network/client_cert_resolver.h [modify] https://crrev.com/3ab554cb8d0154029d6eecf2b5b9afef87d8562f/chromeos/network/client_cert_resolver_unittest.cc
,
Dec 7 2017
Katherine/Jing, would you mind re-verifying on M-63 ?
,
Jan 19 2018
Marking as verified as it has been confirmed by the affected team that the issue no longer occurs on M-63. |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by pmarko@chromium.org
, Nov 3 2017