Issue metadata
Sign in to add a comment
|
Chrome login fails in tests due to STORE_NO_POLICY error from session_manager |
||||||||||||||||||||||||
Issue descriptionThree tests in a row that require login timed out on the tidus DUT at chromeos4-row3-rack2-host6: 2018/09/02 23:38:25 Started test video.PlayH264 2018/09/02 23:38:25 [23:38:24.305] Restarting ui job ... 2018/09/02 23:38:29 [23:38:28.442] Logging in as user "testuser@gmail.com" 2018/09/02 23:38:29 [23:38:28.482] Waiting for cryptohome /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 ... 2018/09/02 23:40:22 [23:40:22.025] Error at play.go:49: Failed to connect to Chrome: /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 not mounted: context deadline exceeded ... 2018/09/02 23:40:22 Started test video.PlayVP8 2018/09/02 23:40:22 [23:40:22.056] Restarting ui job ... 2018/09/02 23:40:27 [23:40:26.640] Logging in as user "testuser@gmail.com" 2018/09/02 23:40:27 [23:40:26.694] Waiting for cryptohome /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 ... 2018/09/02 23:42:20 [23:42:19.738] Error at play.go:49: Failed to connect to Chrome: /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 not mounted: context deadline exceeded ... 2018/09/02 23:42:20 Started test video.PlayVP9 2018/09/02 23:42:20 [23:42:19.763] Restarting ui job ... 2018/09/02 23:42:24 [23:42:24.107] Logging in as user "testuser@gmail.com" 2018/09/02 23:42:24 [23:42:24.154] Waiting for cryptohome /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 ... 2018/09/02 23:44:18 [23:44:17.449] Error at play.go:49: Failed to connect to Chrome: /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 not mounted: context deadline exceeded This doesn't look like a cryptohome-related problem, as I see "Sign-in failed, please connect to the internet and try again." errors in all three Chrome logs: ... [14078:14078:0902/233826.481689:VERBOSE1:login_display_host_webui.cc(736)] Login WebUI >> wp animation done [14078:14078:0902/233826.481739:VERBOSE1:login_display_host_webui.cc(898)] Login WebUI >> Show already initialized UI [14078:14078:0902/233826.482020:VERBOSE1:webui_login_view.cc(608)] Login WebUI >> login-prompt-visible [14078:14078:0902/233826.559065:WARNING:CONSOLE(442)] "document.registerElement is deprecated and will be removed in M73, around March 2019. Please use window.customElements.define instead. See https://www.chromestatus.com/features/4642138092470272 for more details.", source: chrome://resources/polymer/v1_0/polymer/polymer-micro-extracted.js (442) [14078:14078:0902/233827.346000:WARNING:CONSOLE(2083)] "Element.createShadowRoot is deprecated and will be removed in M73, around March 2019. Please use Element.attachShadow instead. See https://www.chromestatus.com/features/4507242028072960 for more details.", source: chrome://resources/polymer/v1_0/polymer/polymer-mini-extracted.js (2083) [14078:14078:0902/233828.305718:INFO:signin_screen_handler.cc(1483)] Login WebUI >> active: 1, source: account-picker [14078:14078:0902/233828.413993:VERBOSE1:signin_screen_handler.cc(1437)] Login WebUI >> loginVisible, src: oobe, webui_visible_: 0 [14078:14078:0902/233828.414277:VERBOSE1:login_display_host_webui.cc(721)] Login WebUI >> WEBUI_VISIBLE [14078:14078:0902/233828.414318:VERBOSE1:webui_login_view.cc(603)] Login WebUI >> not emitting signal, hidden: 0 [14078:14078:0902/233828.465446:VERBOSE1:gaia_screen_handler.cc(780)] HandleCompleteLogin [14078:14078:0902/233828.465575:VERBOSE1:login_display_webui.cc(109)] Show error, error_id: 2272, attempts:1, help_topic_id: 188036 [14078:14078:0902/233828.465605:ERROR:core_oobe_handler.cc(236)] CoreOobeHandler::ShowSignInError: error_text=Sign-in failed, please connect to the internet and try again. [14078:14078:0902/233828.715040:INFO:signin_screen_handler.cc(1371)] Login WebUI >> AccountPickerReady [14078:14078:0902/233828.715397:VERBOSE1:gaia_screen_handler.cc(314)] MaybePreloadAuthExtension [14078:14078:0902/233828.715496:VERBOSE1:gaia_screen_handler.cc(1171)] LoadAuthExtension, force: 1, offline: 0 [14078:14078:0902/233828.717376:VERBOSE1:signin_screen_handler.cc(1437)] Login WebUI >> loginVisible, src: account-picker, webui_visible_: 1 [14078:14078:0902/233828.772347:VERBOSE1:gaia_screen_handler.cc(621)] Auth extension finished loading [14078:14078:0902/233829.005000:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online [14078:14078:0902/233830.401547:VERBOSE1:gaia_screen_handler.cc(803)] Gaia is loaded [14078:14078:0902/233830.401652:VERBOSE1:signin_screen_handler.cc(1437)] Login WebUI >> loginVisible, src: gaia-signin, webui_visible_: 1 [14078:14078:0902/233830.401836:WARNING:CONSOLE(18938)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://oobe/login.js (18938) [14078:14078:0902/233853.502018:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online [14078:14078:0902/233914.606433:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online [14078:14078:0902/233943.173627:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online [14078:14078:0902/234022.249512:VERBOSE1:component_updater_service.cc(107)] CrxUpdateService stopping [14078:14078:0902/234022.249825:WARNING:pref_notifier_impl.cc(23)] Pref observer found at shutdown. [14078:14078:0902/234022.249851:WARNING:pref_notifier_impl.cc(23)] Pref observer found at shutdown. EOF These tests are just calling Oobe.loginForTesting. How can login fail with a network error when loginForTesting is used?
,
Sep 5
Yikes, what a misleading error message. :-( Does anyone own CrosSettings? I've had trouble finding an expert when I looked before. If this is another case of bad existing state on the DUT breaking tests, is there any way that test setup code can safely wipe existing settings? I've spent a lot of time fighting TPM/policy/etc. brokenness on DUTs caused by earlier tests getting aborted while they're doing strange things.
,
Sep 5
> [ ... ] is there any way that test setup code can safely wipe existing
> settings? [ ... ]
In principle, there are two ways:
* Change cleanup() for the host
* Change verify() for the host
In a perfect world, there would be only one; probably, that one thing should
be labeled "cleanup".
For this particular item, it's probably easier/sufficient to add the needed
changes to CrosHost.cleanup(), provided the code meets these criteria:
* The code completes in fractions of a second.
* The code can be expected never to fail, except in extreme cases such as
the DUT going offline unexpectedly.
,
Sep 5
The PERMANENTLY_UNTRUSTED status is set in the following two lines: https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?rcl=1240638ead7a9b3568b4a30918aff44611c00208&l=1019,1026 I guess those two relevant VLOG lines are probably not dumped. Maybe check /var/log/messages to see whether cryptohome and session_manager daemon dumps out something to confirm it is the bad state case.
,
Sep 5
Whoops, http://stainless/browse/chromeos-autotest-results/233533787-chromeos-test/ was the test run -- missed linking to it above. I see the following when session_manager starts: 2018-09-02T23:38:24.319097-07:00 WARNING session_manager[13799]: [WARNING:libpolicy.cc(37)] Could not load the device policy file. 2018-09-02T23:38:24.322880-07:00 INFO session_manager[13799]: [INFO:session_manager_service.cc(148)] SessionManagerService starting 2018-09-02T23:38:24.330084-07:00 INFO session_manager[13799]: [INFO:browser_job.cc(148)] Running browser /opt/google/chrome/chrome ... I think that that might just be a bogus warning caused by IsEnrolledChromeboxForMeetings in chrome_setup.cc calling policy::PolicyProvider::Reload, though.
,
Sep 5
In chrome log, [14525:14525:0902/234024.117130:ERROR:device_settings_service.cc(314)] Session manager operation failed: 3 3 -> STORE_NO_POLICY as in [1] And looks like we get here [2], and DeviceSettingsProvider::MitigateMissingPolicy returns false because it thinks the device is not consumer mode [3]. Hence we fall through to PERMANENTLY_UNTRUSTED. MigigateMissingPolicy is recently changed: https://chromium-review.googlesource.com/c/chromium/src/+/1141947 Not sure whether that is related though. [1] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_service.h?rcl=99db8d62a62610f9aed8225c47dc051b5d61b1a1&l=73 [2] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?rcl=99db8d62a62610f9aed8225c47dc051b5d61b1a1&l=1014 [3] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?rcl=99db8d62a62610f9aed8225c47dc051b5d61b1a1&l=931
,
Sep 26
Thanks for looking into this, Xiyuan (and sorry for my slow reply). I think that there's another example of this happening recently in the peach_pi build at http://stainless/browse/chromeos-autotest-results/241228964-chromeos-test/ : 2018/09/23 18:44:25 Started test ui.ChromeLogin 2018/09/23 18:44:25 [18:44:24.553] Restarting ui job 2018/09/23 18:44:26 [18:44:25.703] Waiting for org.chromium.SessionManager D-Bus service 2018/09/23 18:44:26 [18:44:25.738] Asking session_manager to enable Chrome testing 2018/09/23 18:44:26 [18:44:25.742] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/09/23 18:44:28 [18:44:27.403] Checking cryptohomed service 2018/09/23 18:44:28 [18:44:27.404] Removing cryptohome for testuser@gmail.com 2018/09/23 18:44:28 [18:44:27.525] Finding OOBE DevTools target 2018/09/23 18:44:28 [18:44:27.764] Connecting to Chrome at ws://127.0.0.1:42262/devtools/page/73AF8060612C41FF67E41C461482283A 2018/09/23 18:44:29 [18:44:28.113] Waiting for OOBE 2018/09/23 18:44:33 [18:44:32.303] Logging in as user "testuser@gmail.com" 2018/09/23 18:44:33 [18:44:32.395] Waiting for cryptohome /home/user/6c269782e10c28498ae1e65030b587e1f30c2ebb 2018/09/23 18:46:23 [18:46:21.981] cryptohome status: { ... "tpm": { ... "has_cryptohome_key": true, ... "owned": true, ... 2018/09/23 18:46:23 [18:46:22.006] /sys/class/misc/tpm0/device/owned contains "1" 2018/09/23 18:46:23 [18:46:22.009] Error at chrome_login.go:69: Chrome login failed: /home/user/6c269782e10c28498ae1e65030b587e1f30c2ebb not mounted: context deadline exceeded (last error: doesn't exist) The Chrome log ends with the same error as above (but a different error_id -- maybe those got renumbered?). ... [15611:15611:0923/184432.377816:VERBOSE1:gaia_screen_handler.cc(780)] HandleCompleteLogin [15611:15611:0923/184432.378018:VERBOSE1:login_display_mojo.cc(106)] Show error, error_id: 1861, attempts:1, help_topic_id: 188036 [15611:15611:0923/184432.378091:ERROR:core_oobe_handler.cc(237)] CoreOobeHandler::ShowSignInError: error_text=Sign-in failed, please connect to the internet and try again. [15611:15611:0923/184432.666220:WARNING:CONSOLE(2651)] "Could not find style data in module named", source: chrome://resources/polymer/v1_0/polymer/polymer-extracted.js (2651) [15611:15611:0923/184436.147543:VERBOSE1:gaia_screen_handler.cc(803)] Gaia is loaded [15611:15611:0923/184436.147694:VERBOSE1:signin_screen_handler.cc(1426)] Login WebUI >> loginVisible, src: gaia-signin, webui_visible_: 1 [15611:15611:0923/184436.589170:WARNING:CONSOLE(17472)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://oobe/login.js (17472) EOF Andrew, any chance that https://chromium-review.googlesource.com/1141947 could be causing DeviceSettingsProvider::MitigateMissingPolicy() to incorrect fail the DEVICE_MODE_CONSUMER check on some test devices? (See #6.)
,
Sep 26
The failure in #7 is different. It seems to be caused by bad policy blob that fails the signature check. 09/23 18:42:18.395 INFO | utils:0287| [stdout] Started test ui.ChromeCrashLoggedIn And chrome log says: [14348:14487:0923/184220.806493:ERROR:cloud_policy_validator.cc(402)] Policy signature validation failed [14348:14348:0923/184220.835233:ERROR:session_manager_operation.cc(217)] Policy validation failed: 2 [14348:14348:0923/184220.835353:ERROR:device_settings_service.cc(314)] Session manager operation failed: 5 [14348:14348:0923/184220.835459:ERROR:device_settings_provider.cc(1017)] Failed to retrieve cros policies. Reason: 5 Policy validation failed: 2 -> VALIDATION_BAD_SIGNATURE Session manager operation failed: 5 -> STORE_VALIDATION_ERROR And session_manager daemon agrees 2018-09-24T01:42:18.714109+00:00 ERR session_manager[14328]: [ERROR:device_policy_impl.cc(713)] Signature does not match the data or can not be verified! 2018-09-24T01:42:18.714151+00:00 ERR session_manager[14328]: [ERROR:device_policy_impl.cc(753)] Policy signature verification failed!
,
Oct 10
I think that the link-paladin CQ failure in ui.VirtualKeyboardOmnibox at http://stainless/browse/chromeos-autotest-results/246968133-chromeos-test/ is another example of the bad policy blob issue from #7. From chrome_20181009-192506: [16811:16811:1009/192507.160791:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE) [16811:16811:1009/192507.160841:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR) [16811:16811:1009/192507.160880:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR) ... [16811:16950:1009/192507.181313:ERROR:cloud_policy_validator.cc(441)] Policy signature validation failed [16811:16811:1009/192507.254973:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE) [16811:16811:1009/192507.255019:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR) [16811:16811:1009/192507.255057:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR) However, the confusing thing is that other tests that perform login ran and passed on the same DUT just before this one failed; see e.g. ui.ChromeLogin: 2018/10/09 19:24:46 Running ui.ChromeLogin ... 2018/10/09 19:24:53 Finished ui.ChromeLogin ... 2018/10/09 19:25:05 Running ui.VirtualKeyboardOmnibox ... 2018/10/09 19:27:02 Error: [virtual_keyboard_omnibox.go:26] Failed to start Chrome: not mounted for testuser@gmail.com: context deadline exceeded; last error follows: /home/user/9ee774749aa421f6912b763e19572304f256ccc8 not found 2018/10/09 19:27:03 Finished ui.VirtualKeyboardOmnibox --- Xiyuan, do you have any theories about how login could've failed here with a policy validation error? I don't know where the bad policy file could've come from.
,
Oct 10
Looks like ui.ChromeLogin is the first test that completes sign-in successfully. So we will generate an owner key on the device. ui.20181009-192447 ==== [1009/192453:INFO:keygen_worker.cc(53)] Writing Owner key to /home/user/9ee774749aa421f6912b763e19572304f256ccc8/key.pub [1009/192453:INFO:policy_key.cc(120)] wrote 294 bytes to /home/user/9ee774749aa421f6912b763e19572304f256ccc8/key.pub /var/log/messages: ==== 2018-10-10T02:24:53.992222+00:00 INFO session_manager[14168]: [INFO:policy_key.cc(120)] wrote 294 bytes to /var/lib/whitelist/owner.key 2018-10-10T02:24:53.992242+00:00 INFO session_manager[14168]: [INFO:policy_service.cc(264)] Persisted policy key to disk. On the chrome side, it should generate a policy blob after the owner key generation finishes. However, the ownkey-gen -> policy-gen is asynchronous and does not blocks sign-in. I suspect we exit before it finishes and we end up with an owner key present but no policy blob (or having a stale one signed by a different key).
,
Oct 23
The policy validation failures discussed in #7 and onwards are tracked by issue 897278 . I'll retitle this issue to track the problem that was discussed earlier. Mattias, are you the best owner for issues involving policy problems that seem to live somewhere between Chrome and session_manager?
,
Oct 25
I've not really been touching this code at all this year, and I realistically don't have cycles to dive in deeply, sorry. I'd say that someone from the Enterprise team would probably be a good candidate, dropping into their queue.
,
Oct 25
Pavol: is this something you're familiar with?
,
Nov 9
Sorry for the delay. Let me recap for my understanding: - this was happening on real Chrome OS device - it happened in a test that is run using the "tast"[1] test framework - the test calls Oobe.loginForTesting[2] with enterpriseEnroll=false - the DUT refuses to sign-in. - In the logs we can see that session_manager returned STORE_NO_POLICY and this leads to the chain of events Xiyuan described in Comment #6: - CrosSettings are assumed to be UNTRUSTED(*) - -> chrome can't verify if the device has been disabld and refuses to login (*) Instead, we'd have expected that the device has no owner yet, so the setting should be assumed to be trusted: https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?g=0&l=972 IIUC ownership/device policy is supposed to be completely wiped in tests since bug 897278 has been fixed. So the root cause *may* be that - the device remains in a state where chrome assumes that there's an owner or - the device doesn't know if there is an owner Does that make sense? I'll take a look at the logs / how the decision is made ("public key present" and install attributes both seem to be factors here) [1] https://chromium.googlesource.com/chromiumos/platform/tast/ [2] this is somewhat hard to follow for newbies to tast like me -- chrome.New calls chrome.logIn by default which calls Oobe.loginForTesting indeed. It passes false for "enterpriseEnroll".
,
Nov 9
#14: Thanks for taking a look! Based on test results, it's plausible that wiping /var/lib/whitelist could've fixed this. The only failures that I see since issue 897278 was resolved were on guado_moblab (which I think might have issues of its own): http://stainless/search?view=list&first_date=2018-11-01&last_date=2018-11-09&branch=master&test=%5Etast%5C.&reason=not+mounted&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Nov 15
Still no failures besides guado_moblab, so I think this is probably okay now. Feel free to reopen and undupe if there's still stuff you want to investigate here. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by xiy...@chromium.org
, Sep 5