"Please wait" screen takes more than a minute to reach ARC++ Opt-in screen. |
||||||||||||||||||||||
Issue descriptionGoogle Chrome 64.0.3282.87 (Official Build) beta (64-bit) Revision 0 Platform 10176.47.0 (Official Build) beta-channel eve What steps will reproduce the problem? 1. Complete OOBE flow 2. Enter username and password and click Next What happens ? "Please wait" screen takes more than a minute to reach ARC++ Opt-in screen. This issue was fixed and verified at crbug.com/772187 but able to reproduce again. Attached debug logs. ⛆ |
|
|
,
Jan 13 2018
,
Jan 13 2018
Here is the login sequence in the chrome log: [1452:1452:0112/155240.514786:VERBOSE1:wizard_controller.cc(489)] Showing login screen. [1452:1452:0112/155240.514828:VERBOSE1:login_display_host_webui.cc(763)] Login WebUI >> sign in [1452:1452:0112/155240.694335:VERBOSE1:gaia_screen_handler.cc(967)] LoadAuthExtension, force: 1, offline: 0 [1452:1452:0112/155240.704729:INFO:signin_screen_handler.cc(1390)] Login WebUI >> active: 1, source: gaia-signin [1452:1452:0112/155240.713040:VERBOSE1:signin_screen_handler.cc(1344)] Login WebUI >> loginVisible, src: gaia-loading, webui_visible_: 1 [1452:1452:0112/155240.776843:VERBOSE1:gaia_screen_handler.cc(547)] Auth extension finished loading [1452:1452:0112/155241.592679:VERBOSE1:gaia_screen_handler.cc(721)] Gaia is loaded [1452:1452:0112/155241.592733:VERBOSE1:signin_screen_handler.cc(1344)] Login WebUI >> loginVisible, src: gaia-signin, webui_visible_: 1 [1452:1452:0112/155241.817770:WARNING:CONSOLE(18344)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://oobe/oobe.js (18344) [1452:1452:0112/155245.102555:WARNING:CONSOLE(18344)] "Unrecognized message from GAIA: clearOldAttempts", source: chrome://oobe/oobe.js (18344) [1452:1452:0112/155245.809259:WARNING:CONSOLE(18344)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://oobe/oobe.js (18344) [1452:1452:0112/155248.222593:VERBOSE1:oauth2_token_fetcher.cc(78)] Got OAuth2 tokens! [1452:1452:0112/155248.222643:VERBOSE1:oauth2_token_initializer.cc(30)] OAuth2 tokens fetched [1452:1452:0112/155248.222723:VERBOSE1:existing_user_controller.cc(545)] Setting flow from PerformLogin [1452:1452:0112/155248.222745:VERBOSE1:user_flow.cc(21)] Flow 0x2e983f2b0a60 got host 0x2e983d81a600 [1452:1452:0112/155248.222842:VERBOSE1:login_performer.cc(288)] Online login completion started. [1452:1452:0112/155248.223017:VERBOSE1:cryptohome_authenticator.cc(776)] Resolved state to: 0 [1452:1452:0112/155326.327696:ERROR:device_event_log_impl.cc(156)] [15:53:26.327] Login: homedir_methods.cc:274 HomedirMethods MountEx error (CryptohomeErrorCode): 1 [1452:1452:0112/155326.327777:ERROR:device_event_log_impl.cc(156)] [15:53:26.327] Login: cryptohome_authenticator.cc:932 Cryptohome failure: state(AuthState)=1, code(cryptohome::MountError)=32 [1452:1452:0112/155326.327812:VERBOSE1:cryptohome_authenticator.cc(776)] Resolved state to: 6 [1452:1452:0112/155425.559041:VERBOSE1:cryptohome_authenticator.cc(776)] Resolved state to: 13 [1452:1452:0112/155425.559101:VERBOSE1:cryptohome_authenticator.cc(666)] Login success There are 2 delays: a) ~40 seconds delay between [1452:1452:0112/155248.223017:VERBOSE1:cryptohome_authenticator.cc(776)] Resolved state to: 0 [1452:1452:0112/155326.327696:ERROR:device_event_log_impl.cc(156)] [15:53:26.327] Login: homedir_methods.cc:274 HomedirMethods MountEx error (CryptohomeErrorCode): 1 At the same time, in /var/log/messages: 2018-01-12T23:52:43.164027+00:00 INFO attestationd[2122]: Attestation: Preparing for enrollment... 2018-01-12T23:53:10.267695+00:00 ERR arc-oemcrypto[2869]: Failed to call method: org.chromium.CryptohomeInterface.TpmIsAttestationPrepared: object_path= /org/chromium/Cryptohome: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 2018-01-12T23:53:10.267737+00:00 ERR arc-oemcrypto[2869]: Failure in D-Bus response from boolean call ... 2018-01-12T23:53:24.219308+00:00 INFO attestationd[2122]: Attestation: Prepared successfully (41222ms). This seems to be, as before, the TpmIsAttestationPrepared request from arc-oemcrypto waiting for the lengthy PrepareForEnrollment operation (during which attestationd doesn't process incoming dbus requests). I'm a bit surprised as I thought arc-oemcrypto doesn't use TPM anymore in its current state, so doesn't need to wait for attestation-prepared. 2) ~1 minute delay between [1452:1452:0112/155326.327812:VERBOSE1:cryptohome_authenticator.cc(776)] Resolved state to: 6 [1452:1452:0112/155425.559041:VERBOSE1:cryptohome_authenticator.cc(776)] Resolved state to: 13 During which we have a) Nothing major happening between xx:53:26 and xx:54:00 - not sure what are we doing here. b) ~23 seconds in arc-oemcrypto generating a key: 2018-01-12T23:54:02.480170+00:00 INFO arc-oemcrypto[2869]: Valid key pair not found in TPM for oemcrypto, generate a new one ... 2018-01-12T23:54:25.479057+00:00 INFO arc-oemcrypto[2869]: Generated new TPM encrypted storage key
,
Jan 13 2018
+yusukes, +xzhou: do we still call TpmIsAttestationPrepared from arc-oemcrypto? The slowdown in this case is a KI: issue 777686 and issue 777688. But I thought we don't rely on TPM anymore in arc-oemsetup.
,
Jan 13 2018
[restoring Cc list]
,
Jan 13 2018
arc-setup has already removed the dependency on TPM. however, I am not sure about arc-oemcrypto.
,
Jan 13 2018
Ah, this is a different daemon from arc-setup... +jkardatzke then. Yes, I see IsEnrollmentPrepared in arc-oemcrypto. Which it probably needs for the stuff it does. But do we (and do we need to) wait for arc-oemcrypto initialization during OOBE before showing ARC++ opt-in?
,
Jan 13 2018
(It is also possible that for the first 40 seconds of delay, even though arc-oemcrypto by itself doesn't block OOBE, the fact that it issued TpmIsAttestationPrepared and that command is sitting there waiting for response may block other requests to cryptohomed from executing, and thus block OOBE. Then the first 40 seconds is purely issue 777686.) However, during the 2nd minute we spend time in arc-oemcrypto generating TPM encrypted storage key. And seem to continue with login only after that is done. Do we need to have ARC++ opt-in blocked on that step?
,
Jan 13 2018
In any case, if we need to have arc-oemcrypto perform its operations before ARC++ opt-in, a wait is almost inevitable - there is no quick fix: PrepareForEnrollment includes a long crypto keygen operation (time is non-deterministic, 40+ seconds were reported) and while TPm is busy performing it, all other TPM commands will wait. Possible vectors here: 1) move PrepareForEnrollment to an earlier stage (start before taking ownership - that's what was done for the other long keygen operation in issue 772187) - will look into that again; it may hide some of the delay, but if user clicks quickly through the OOBE screens that pre-initialization may still be going; 2) reduce time req'd for RSA keygen / switch to ECC - there are open issues and work on that, but for M64 we shouldn't expect any changes in this area; 3) move PrepareForEnrollment to after the first login - but that may affect post-login attestation tasks.
,
Jan 29 2018
Ping as no update in a while and tagged as RBS for M64. Please triage / assign.
,
Jan 29 2018
After looking at Chrome sources (thanks alemate@) it seems there's no point where we need to wait for arc-oemcrypto. But the fact that arc-oemcrypto polls for attestation readiness may block requests from other daemons to cryptohomed. So, next steps: 1) Verify that with stopped arc-oemcrypto we don't see this behavior 2) If it is confirmed, one quicker-and-simpler way of dealing with this is adding a new signal "attestation-prepared" and either making arc-oemcrypto upstart script depend on this signal (thanks to jkardatzke@ for confirming that the start of the daemon can be postponed); or changing cryptohomed to use that signal instead of probing attestationd in 2.0 case.
,
Jan 30 2018
Do we have owners ETAs for each of the steps in #11?
,
Jan 31 2018
,
Jan 31 2018
ETA: Feb 1 for ToT CL if dependency on arc-oemcrypto checks out. So far, in my testing I couldn't catch the 2-min delay even though I do see "Failed to call method: org.chromium.CryptohomeInterface.TpmIsAttestationPrepared" messages from arc-oemcrypto in the logs.
,
Jan 31 2018
Do we know how often this happens, is it happening consistently every time or just sporadically? We need to make the call if we want to hold the stable release for this or not, so we need to determine the user impact.
,
Jan 31 2018
TL;DR version: I realized the issue is deeper and there will be no quick fix. I suggest not holding the stable release for it. It's actually not even a regression, some changes in timing (slight changes in the build vs 63; or just manual sign in timing; or even a specific device - the timing depends on it) happened to reveal it for this test run. The delay will happen on the first boot, if the user signs in in a specific timeframe after starting OOBE (not sooner or later). Long details: It should happen if the user starts logging in during the first boot in a certain timeframe after going through OOBE. Clicking through OOBE triggers initialization - finalizing tpm ownership and preparing attestation database. The problem is it contains a heavy operation - generating a key pair on the TPM side, which can take 30+ seconds. There are other longer 7+ seconds commands as well. And while one command is processed, all other tpm operations will wait. And cryptohome has to block for some tpm operations when mounting user directory. If the user logs in before that, the login is finished before long operations start blocking cryptohome operations. If after that, all long ops are done, and the login should be fast again. But if the tpm is already owned, while preparing attestation still goes on, it has to wait. I finally was able to reproduce it consistently with a WIP first boot autotest. And not starting arc-oemcrypto doesn't help much (at least not in the scenario I outline below). If login is initiated immediately after going through OOBE, it finishes in 7-8 seconds. If we wait 30 seconds before doing the login to start it in the middle of attestation preparation, it takes up to 60 seconds. As I realize today, there is no complete workaround for this on the Chrome OS userland side. I work on reducing the number of operations cryptohomed performs that need to block waiting for TPM response. But some have to block - decrypting the user keyset has to be performed before the user is signed in, there is no way around. And if it happens at the right time, that single op can sit for 30-40 seconds waiting for key generation initiated by a parallel process. So, there will be no quick fix. It is possible to move the heaviest keygen op to an earlier time, similarly to what we did to another such keygen op already for similar reasons. The problem is, going through OOBE can hide one such 30 second op, but likely not two. So, I will do that as one of the measures, but I'm reluctant to add it to a stable release without longer testing. It has a potential to break things w/o changing the impression re performance in practice. So, there's no quick fix. There is work going on on cr50 side to speed up these long ops, but there's will be no quick fix in time for M64 either. More details on the subj: - Moving the 1st long op to an earlier stage (done): issue 772187 - Keygen performance: http://b/68167013 - Reducing the number of places we wait for TPM: issue 777688, issue 777686, issue 777679
,
Jan 31 2018
,
Jan 31 2018
,
Jan 31 2018
,
Mar 9 2018
This issue reproduced on M66 (CrOS version 10452.5.0, 66.0.3359.21) Kevin device. It took more than a minute to reach ARC++ Optin screen for owner account. For a non- owner account there wasn't any delay observed.
,
Apr 11 2018
Based on email with Andrey (apronin@) the fix is not expected to land in time for M66, note this is not a new regression (it is a race condition that depends on how quickly user goes through oobe/login) Moving to M67 as per above
,
May 22 2018
What is the status of the fix? Can you please provide an update?
,
May 23 2018
No progress. Updating the earlier analysis: Moving the long PrepareForEnrollment operation to an earlier stage turns out to be not an option: doing it for tpm initialization has its own set of issues (e.g. issue 841627 ), and adding PrepareForEnrollment there would make it worse, not better. I've just started working on a set of changes that will improve the situation in some scenarios (roughly "Reducing the number of places we wait for TPM" from #16). But no reliable fix is possible through those measures. The fix depends on having ECC keys for cryptohome and attestation, and that's a bigger project that will take a few milestones before it lands. So, no fix expected for M-67 or the next few releases.
,
May 24 2018
Thanks for the update, removing RBS and moving to M68 (possible punt to M69).
,
Oct 12
,
Nov 2
,
Nov 9
Removing UI components as it sounds like there's not a UI fix available here.
,
Nov 28
,
Nov 28
,
Nov 28
,
Dec 14
|
|||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||
Comment 1 by sdantul...@chromium.org
, Jan 13 2018