New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 781276 link

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Race condition in client_cert_resolver can leave networks without EAP.CertId

Project Member Reported by pmarko@chromium.org, Nov 3 2017

Issue description

When the following series of events happens:

The user signs in. The user certificate database is not loaded yet.
(1) ClientCertResolver is triggered and enters ResolveNetworks.
    It decides that networks need to be resolved (networks_to_resolve is not empty).
    It starts a first resolving task [A] which takes a long time (~4 seconds on my machine).
While the task [A] is running...
  (2) ClientCertResolver is triggered and enters ResolveNetworks.
      It decides that no network needs resolving ("No networks to resolve." branch). It exits early.
  (3) The user certficate database finishes loading.
      ClientCertResolver is triggered as a result and enters ResolveNetworks.
      It decides that networks need to be resolved (networks_to_resolve is not empty).
      It starts another resolving task [B] which is fast.
  (4) Resolving task [B] finds certificates and sets them in shill, and finishes.
(5) Resolving task [A] was started before the user's certificates were loaded, doesn't find certificates and clears them in shill, and finishes.

Root cause:
ClientCertResolver is designed to only have one running resolving task at a time, but this sequence of events starts two tasks. The reason is a subtly wrong handling of the resolve_task_running_ variable. Especially, resolve_task_running_ is set to false when ResolveNetworks exits early (if there are no networks to resolve).

Consequence: It seems that the main symptom is that the Ethernet networks don't get a certificate. I *think* wifi networks may not be affected as much because the user can manually connect to them (re-triggering the resolving process) and/or because wifi scanning may trigger the resolving process.

When does this happen:
I'm not sure what triggers it exactly. The first resolving task must run for a long time. I will try to investigate the exact causes.

Why is the first task so slow?
It seems that CreateSortedCertAndIssuerList is very slow the first time it is invoked (I filed  bug 781275  to investigate).
 
Status: Started (was: Assigned)
CL proposal: https://chromium-review.googlesource.com/c/chromium/src/+/753371
Project Member

Comment 2 by bugdroid1@chromium.org, Nov 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/75afb402fdb45737708c61c16e744b2d2a93b418

commit 75afb402fdb45737708c61c16e744b2d2a93b418
Author: Pavol Marko <pmarko@chromium.org>
Date: Tue Nov 07 09:23:40 2017

ClientCertResolver: Ensure only one cert resolving task is queued

Fix handling of the resolve_task_running_ variable in the case that
there are no networks to resolve.
Without this, ClientCertResolver could schedule multiple cert resolving
tasks in parallel, which can lead to wrong behavior.

BUG= 781276 
TEST=chromeos_unittests --gtest_filter=ClientCertResolverTest.*

Change-Id: Ib3cc8a70425e82f6a11cef4bbefe0ad82d571eb4
Reviewed-on: https://chromium-review.googlesource.com/753371
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>
Reviewed-by: Maksim Ivanov <emaxx@chromium.org>
Commit-Queue: Pavol Marko <pmarko@chromium.org>
Cr-Commit-Position: refs/heads/master@{#514428}
[modify] https://crrev.com/75afb402fdb45737708c61c16e744b2d2a93b418/chromeos/network/client_cert_resolver.cc
[modify] https://crrev.com/75afb402fdb45737708c61c16e744b2d2a93b418/chromeos/network/client_cert_resolver.h
[modify] https://crrev.com/75afb402fdb45737708c61c16e744b2d2a93b418/chromeos/network/client_cert_resolver_unittest.cc

Status: Fixed (was: Started)
This will be in M64 (10110.0.0 / 64.0.3262.0)

https://chromium.googlesource.com/chromium/src/+log/64.0.3261.0..64.0.3262.0?n=10000
Cc: kathrelk...@chromium.org jingwee@chromium.org
+Katherine/Jing

Could you please verify that policy-configured EAP-TLS networks still auto-connect on a release >= 64.0.3262.0 ?
Thanks!
pmarko@ I have verified in today's ToT M64.0.3265.0 10124.0.0 dev paine that policy-configured EAP-TLS networks is still auto-connected.

Comment 6 by pmarko@chromium.org, Nov 20 2017

Cc: gkihumba@chromium.org
Labels: Merge-Request-63
Requesting merge to M-63 (3239).

This fixes ethernet EAP-TLS authentication issues observed internally. 
Thanks!
Project Member

Comment 7 by sheriffbot@chromium.org, Nov 20 2017

Labels: -Merge-Request-63 Merge-Review-63 Hotlist-Merge-Review
This bug requires manual review: M63 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 8 by pmarko@chromium.org, Nov 23 2017

Cc: dskaram@chromium.org
+David as PM for awareness

This is a race condition in the client certificate resolving code. The symptoms are that the device does not automatically authenticate using EAP-TLS after user session start for policy-configured EAP-TLS networks.
I'm not sure what triggers it exactly.

Comment 9 by gkihumba@google.com, Nov 28 2017

Labels: -Merge-Review-63 Merge-Approved-63
Project Member

Comment 10 by bugdroid1@chromium.org, Nov 29 2017

Labels: -merge-approved-63 merge-merged-3239
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3ab554cb8d0154029d6eecf2b5b9afef87d8562f

commit 3ab554cb8d0154029d6eecf2b5b9afef87d8562f
Author: Pavol Marko <pmarko@chromium.org>
Date: Wed Nov 29 10:28:40 2017

[Merge to M63] ClientCertResolver: Ensure only one cert resolving task is queued

Fix handling of the resolve_task_running_ variable in the case that
there are no networks to resolve.
Without this, ClientCertResolver could schedule multiple cert resolving
tasks in parallel, which can lead to wrong behavior.

BUG= 781276 
TEST=chromeos_unittests --gtest_filter=ClientCertResolverTest.*
TBR=pmarko@chromium.org

(cherry picked from commit 75afb402fdb45737708c61c16e744b2d2a93b418)

Change-Id: Ib3cc8a70425e82f6a11cef4bbefe0ad82d571eb4
Reviewed-on: https://chromium-review.googlesource.com/753371
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>
Reviewed-by: Maksim Ivanov <emaxx@chromium.org>
Commit-Queue: Pavol Marko <pmarko@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#514428}
Reviewed-on: https://chromium-review.googlesource.com/795715
Reviewed-by: Pavol Marko <pmarko@chromium.org>
Cr-Commit-Position: refs/branch-heads/3239@{#602}
Cr-Branched-From: adb61db19020ed8ecee5e91b1a0ea4c924ae2988-refs/heads/master@{#508578}
[modify] https://crrev.com/3ab554cb8d0154029d6eecf2b5b9afef87d8562f/chromeos/network/client_cert_resolver.cc
[modify] https://crrev.com/3ab554cb8d0154029d6eecf2b5b9afef87d8562f/chromeos/network/client_cert_resolver.h
[modify] https://crrev.com/3ab554cb8d0154029d6eecf2b5b9afef87d8562f/chromeos/network/client_cert_resolver_unittest.cc

Katherine/Jing, would you mind re-verifying on M-63 ?
Status: Verified (was: Fixed)
Marking as verified as it has been confirmed by the affected team that the issue no longer occurs on M-63. 

Sign in to add a comment