New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 880823 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 897278
Owner:
Closed: Nov 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Chrome login fails in tests due to STORE_NO_POLICY error from session_manager

Project Member Reported by derat@chromium.org, Sep 5

Issue description

Three tests in a row that require login timed out on the tidus DUT at chromeos4-row3-rack2-host6:

2018/09/02 23:38:25 Started test video.PlayH264
2018/09/02 23:38:25 [23:38:24.305] Restarting ui job
...
2018/09/02 23:38:29 [23:38:28.442] Logging in as user "testuser@gmail.com"
2018/09/02 23:38:29 [23:38:28.482] Waiting for cryptohome /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5
...
2018/09/02 23:40:22 [23:40:22.025] Error at play.go:49: Failed to connect to Chrome: /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 not mounted: context deadline exceeded
...
2018/09/02 23:40:22 Started test video.PlayVP8
2018/09/02 23:40:22 [23:40:22.056] Restarting ui job
...
2018/09/02 23:40:27 [23:40:26.640] Logging in as user "testuser@gmail.com"
2018/09/02 23:40:27 [23:40:26.694] Waiting for cryptohome /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5
...
2018/09/02 23:42:20 [23:42:19.738] Error at play.go:49: Failed to connect to Chrome: /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 not mounted: context deadline exceeded
...
2018/09/02 23:42:20 Started test video.PlayVP9
2018/09/02 23:42:20 [23:42:19.763] Restarting ui job
...
2018/09/02 23:42:24 [23:42:24.107] Logging in as user "testuser@gmail.com"
2018/09/02 23:42:24 [23:42:24.154] Waiting for cryptohome /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5
...
2018/09/02 23:44:18 [23:44:17.449] Error at play.go:49: Failed to connect to Chrome: /home/user/2d5ff5234a3279f44ce51567da0013744db50aa5 not mounted: context deadline exceeded

This doesn't look like a cryptohome-related problem, as I see "Sign-in failed, please connect to the internet and try again." errors in all three Chrome logs:

...
[14078:14078:0902/233826.481689:VERBOSE1:login_display_host_webui.cc(736)] Login WebUI >> wp animation done
[14078:14078:0902/233826.481739:VERBOSE1:login_display_host_webui.cc(898)] Login WebUI >> Show already initialized UI
[14078:14078:0902/233826.482020:VERBOSE1:webui_login_view.cc(608)] Login WebUI >> login-prompt-visible
[14078:14078:0902/233826.559065:WARNING:CONSOLE(442)] "document.registerElement is deprecated and will be removed in M73, around March 2019. Please use window.customElements.define instead. See https://www.chromestatus.com/features/4642138092470272 for more details.", source: chrome://resources/polymer/v1_0/polymer/polymer-micro-extracted.js (442)
[14078:14078:0902/233827.346000:WARNING:CONSOLE(2083)] "Element.createShadowRoot is deprecated and will be removed in M73, around March 2019. Please use Element.attachShadow instead. See https://www.chromestatus.com/features/4507242028072960 for more details.", source: chrome://resources/polymer/v1_0/polymer/polymer-mini-extracted.js (2083)
[14078:14078:0902/233828.305718:INFO:signin_screen_handler.cc(1483)] Login WebUI >> active: 1, source: account-picker
[14078:14078:0902/233828.413993:VERBOSE1:signin_screen_handler.cc(1437)] Login WebUI >> loginVisible, src: oobe, webui_visible_: 0
[14078:14078:0902/233828.414277:VERBOSE1:login_display_host_webui.cc(721)] Login WebUI >> WEBUI_VISIBLE
[14078:14078:0902/233828.414318:VERBOSE1:webui_login_view.cc(603)] Login WebUI >> not emitting signal, hidden: 0
[14078:14078:0902/233828.465446:VERBOSE1:gaia_screen_handler.cc(780)] HandleCompleteLogin
[14078:14078:0902/233828.465575:VERBOSE1:login_display_webui.cc(109)] Show error, error_id: 2272, attempts:1, help_topic_id: 188036
[14078:14078:0902/233828.465605:ERROR:core_oobe_handler.cc(236)] CoreOobeHandler::ShowSignInError: error_text=Sign-in failed, please connect to the internet and try again.
[14078:14078:0902/233828.715040:INFO:signin_screen_handler.cc(1371)] Login WebUI >> AccountPickerReady
[14078:14078:0902/233828.715397:VERBOSE1:gaia_screen_handler.cc(314)] MaybePreloadAuthExtension
[14078:14078:0902/233828.715496:VERBOSE1:gaia_screen_handler.cc(1171)] LoadAuthExtension, force: 1, offline: 0
[14078:14078:0902/233828.717376:VERBOSE1:signin_screen_handler.cc(1437)] Login WebUI >> loginVisible, src: account-picker, webui_visible_: 1
[14078:14078:0902/233828.772347:VERBOSE1:gaia_screen_handler.cc(621)] Auth extension finished loading
[14078:14078:0902/233829.005000:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online
[14078:14078:0902/233830.401547:VERBOSE1:gaia_screen_handler.cc(803)] Gaia is loaded
[14078:14078:0902/233830.401652:VERBOSE1:signin_screen_handler.cc(1437)] Login WebUI >> loginVisible, src: gaia-signin, webui_visible_: 1
[14078:14078:0902/233830.401836:WARNING:CONSOLE(18938)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://oobe/login.js (18938)
[14078:14078:0902/233853.502018:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online
[14078:14078:0902/233914.606433:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online
[14078:14078:0902/233943.173627:VERBOSE1:gaia_screen_handler.cc(596)] OnPortalDetectionCompleted Online
[14078:14078:0902/234022.249512:VERBOSE1:component_updater_service.cc(107)] CrxUpdateService stopping
[14078:14078:0902/234022.249825:WARNING:pref_notifier_impl.cc(23)] Pref observer found at shutdown.
[14078:14078:0902/234022.249851:WARNING:pref_notifier_impl.cc(23)] Pref observer found at shutdown.
EOF

These tests are just calling Oobe.loginForTesting. How can login fail with a network error when loginForTesting is used?
 
"Sign-in failed, please connect to the internet and try again."
=> IDS_LOGIN_ERROR_OWNER_KEY_LOST [1].

And looks like triggered in ExistingUserController::ContinueLoginIfDeviceNotDisabled [2] when we failed to validate the policy blob.

While the message is confusing, not sure what is the root cause. 


[1] https://cs.chromium.org/chromium/src/chrome/app/chromeos_strings.grdp?rcl=ce238c61afe98314792abcb65e089116b76ce5a8&l=2251

[2] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/login/existing_user_controller.cc?rcl=337ba90ae4627a62c35f131555c9051f10a4952a&l=1621
Cc: jrbarnette@chromium.org mnissler@chromium.org
Yikes, what a misleading error message. :-(

Does anyone own CrosSettings? I've had trouble finding an expert when I looked before.

If this is another case of bad existing state on the DUT breaking tests, is there any way that test setup code can safely wipe existing settings? I've spent a lot of time fighting TPM/policy/etc. brokenness on DUTs caused by earlier tests getting aborted while they're doing strange things.
> [ ... ] is there any way that test setup code can safely wipe existing
> settings? [ ... ]

In principle, there are two ways:
  * Change cleanup() for the host
  * Change verify() for the host

In a perfect world, there would be only one; probably, that one thing should
be labeled "cleanup".

For this particular item, it's probably easier/sufficient to add the needed
changes to CrosHost.cleanup(), provided the code meets these criteria:
  * The code completes in fractions of a second.
  * The code can be expected never to fail, except in extreme cases such as
    the DUT going offline unexpectedly.

The PERMANENTLY_UNTRUSTED status is set in the following two lines:

https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?rcl=1240638ead7a9b3568b4a30918aff44611c00208&l=1019,1026

I guess those two relevant VLOG lines are probably not dumped. Maybe check /var/log/messages to see whether cryptohome and session_manager daemon dumps out something to confirm it is the bad state case.
Whoops, http://stainless/browse/chromeos-autotest-results/233533787-chromeos-test/ was the test run -- missed linking to it above.

I see the following when session_manager starts:

2018-09-02T23:38:24.319097-07:00 WARNING session_manager[13799]: [WARNING:libpolicy.cc(37)] Could not load the device policy file.
2018-09-02T23:38:24.322880-07:00 INFO session_manager[13799]: [INFO:session_manager_service.cc(148)] SessionManagerService starting
2018-09-02T23:38:24.330084-07:00 INFO session_manager[13799]: [INFO:browser_job.cc(148)] Running browser /opt/google/chrome/chrome ...

I think that that might just be a bogus warning caused by IsEnrolledChromeboxForMeetings in chrome_setup.cc calling policy::PolicyProvider::Reload, though.
In chrome log,

[14525:14525:0902/234024.117130:ERROR:device_settings_service.cc(314)] Session manager operation failed: 3

3 -> STORE_NO_POLICY as in [1]

And looks like we get here [2], and DeviceSettingsProvider::MitigateMissingPolicy returns false because it thinks the device is not consumer mode [3]. Hence we fall through to PERMANENTLY_UNTRUSTED.

MigigateMissingPolicy is recently changed:
https://chromium-review.googlesource.com/c/chromium/src/+/1141947
Not sure whether that is related though.

[1] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_service.h?rcl=99db8d62a62610f9aed8225c47dc051b5d61b1a1&l=73

[2] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?rcl=99db8d62a62610f9aed8225c47dc051b5d61b1a1&l=1014

[3] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?rcl=99db8d62a62610f9aed8225c47dc051b5d61b1a1&l=931
Cc: olsen@chromium.org pastarmovj@chromium.org
Thanks for looking into this, Xiyuan (and sorry for my slow reply).

I think that there's another example of this happening recently in the peach_pi build at http://stainless/browse/chromeos-autotest-results/241228964-chromeos-test/ :

2018/09/23 18:44:25 Started test ui.ChromeLogin
2018/09/23 18:44:25 [18:44:24.553] Restarting ui job
2018/09/23 18:44:26 [18:44:25.703] Waiting for org.chromium.SessionManager D-Bus service
2018/09/23 18:44:26 [18:44:25.738] Asking session_manager to enable Chrome testing
2018/09/23 18:44:26 [18:44:25.742] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/09/23 18:44:28 [18:44:27.403] Checking cryptohomed service
2018/09/23 18:44:28 [18:44:27.404] Removing cryptohome for testuser@gmail.com
2018/09/23 18:44:28 [18:44:27.525] Finding OOBE DevTools target
2018/09/23 18:44:28 [18:44:27.764] Connecting to Chrome at ws://127.0.0.1:42262/devtools/page/73AF8060612C41FF67E41C461482283A
2018/09/23 18:44:29 [18:44:28.113] Waiting for OOBE
2018/09/23 18:44:33 [18:44:32.303] Logging in as user "testuser@gmail.com"
2018/09/23 18:44:33 [18:44:32.395] Waiting for cryptohome /home/user/6c269782e10c28498ae1e65030b587e1f30c2ebb
2018/09/23 18:46:23 [18:46:21.981] cryptohome status:
{
...
   "tpm": {
...
      "has_cryptohome_key": true,
...
      "owned": true,
...
2018/09/23 18:46:23 [18:46:22.006] /sys/class/misc/tpm0/device/owned contains "1"
2018/09/23 18:46:23 [18:46:22.009] Error at chrome_login.go:69: Chrome login failed: /home/user/6c269782e10c28498ae1e65030b587e1f30c2ebb not mounted: context deadline exceeded (last error: doesn't exist)

The Chrome log ends with the same error as above (but a different error_id -- maybe those got renumbered?).

...
[15611:15611:0923/184432.377816:VERBOSE1:gaia_screen_handler.cc(780)] HandleCompleteLogin
[15611:15611:0923/184432.378018:VERBOSE1:login_display_mojo.cc(106)] Show error, error_id: 1861, attempts:1, help_topic_id: 188036
[15611:15611:0923/184432.378091:ERROR:core_oobe_handler.cc(237)] CoreOobeHandler::ShowSignInError: error_text=Sign-in failed, please connect to the internet and try again.
[15611:15611:0923/184432.666220:WARNING:CONSOLE(2651)] "Could not find style data in module named", source: chrome://resources/polymer/v1_0/polymer/polymer-extracted.js (2651)
[15611:15611:0923/184436.147543:VERBOSE1:gaia_screen_handler.cc(803)] Gaia is loaded
[15611:15611:0923/184436.147694:VERBOSE1:signin_screen_handler.cc(1426)] Login WebUI >> loginVisible, src: gaia-signin, webui_visible_: 1
[15611:15611:0923/184436.589170:WARNING:CONSOLE(17472)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://oobe/login.js (17472)
EOF

Andrew, any chance that https://chromium-review.googlesource.com/1141947 could be causing DeviceSettingsProvider::MitigateMissingPolicy() to incorrect fail the DEVICE_MODE_CONSUMER check on some test devices? (See #6.)
The failure in #7 is different. It seems to be caused by bad policy blob that fails the signature check.

09/23 18:42:18.395 INFO |             utils:0287| [stdout] Started test ui.ChromeCrashLoggedIn

And chrome log says:

[14348:14487:0923/184220.806493:ERROR:cloud_policy_validator.cc(402)] Policy signature validation failed
[14348:14348:0923/184220.835233:ERROR:session_manager_operation.cc(217)] Policy validation failed: 2
[14348:14348:0923/184220.835353:ERROR:device_settings_service.cc(314)] Session manager operation failed: 5
[14348:14348:0923/184220.835459:ERROR:device_settings_provider.cc(1017)] Failed to retrieve cros policies. Reason: 5

Policy validation failed: 2 -> VALIDATION_BAD_SIGNATURE
Session manager operation failed: 5 -> STORE_VALIDATION_ERROR

And session_manager daemon agrees

2018-09-24T01:42:18.714109+00:00 ERR session_manager[14328]: [ERROR:device_policy_impl.cc(713)] Signature does not match the data or can not be verified!
2018-09-24T01:42:18.714151+00:00 ERR session_manager[14328]: [ERROR:device_policy_impl.cc(753)] Policy signature verification failed!

Cc: shend@chromium.org
Summary: Login tests time out with "Sign-in failed, please connect to the internet and try again." Chrome error (was: Login tests timed out on tidus with "Sign-in failed, please connect to the internet and try again." Chrome error)
I think that the link-paladin CQ failure in ui.VirtualKeyboardOmnibox at http://stainless/browse/chromeos-autotest-results/246968133-chromeos-test/ is another example of the bad policy blob issue from #7. From chrome_20181009-192506:

[16811:16811:1009/192507.160791:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE)
[16811:16811:1009/192507.160841:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR)
[16811:16811:1009/192507.160880:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR)
...
[16811:16950:1009/192507.181313:ERROR:cloud_policy_validator.cc(441)] Policy signature validation failed
[16811:16811:1009/192507.254973:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE)
[16811:16811:1009/192507.255019:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR)
[16811:16811:1009/192507.255057:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR)

However, the confusing thing is that other tests that perform login ran and passed on the same DUT just before this one failed; see e.g. ui.ChromeLogin:

2018/10/09 19:24:46 Running ui.ChromeLogin
...
2018/10/09 19:24:53 Finished ui.ChromeLogin
...
2018/10/09 19:25:05 Running ui.VirtualKeyboardOmnibox
...
2018/10/09 19:27:02 Error: [virtual_keyboard_omnibox.go:26] Failed to start Chrome: not mounted for testuser@gmail.com: context deadline exceeded; last error follows: /home/user/9ee774749aa421f6912b763e19572304f256ccc8 not found
2018/10/09 19:27:03 Finished ui.VirtualKeyboardOmnibox

---

Xiyuan, do you have any theories about how login could've failed here with a policy validation error? I don't know where the bad policy file could've come from.
Looks like ui.ChromeLogin is the first test that completes sign-in successfully. So we will generate an owner key on the device.

ui.20181009-192447
====
[1009/192453:INFO:keygen_worker.cc(53)] Writing Owner key to /home/user/9ee774749aa421f6912b763e19572304f256ccc8/key.pub
[1009/192453:INFO:policy_key.cc(120)] wrote 294 bytes to /home/user/9ee774749aa421f6912b763e19572304f256ccc8/key.pub

/var/log/messages:
====
2018-10-10T02:24:53.992222+00:00 INFO session_manager[14168]: [INFO:policy_key.cc(120)] wrote 294 bytes to /var/lib/whitelist/owner.key
2018-10-10T02:24:53.992242+00:00 INFO session_manager[14168]: [INFO:policy_service.cc(264)] Persisted policy key to disk.

On the chrome side, it should generate a policy blob after the owner key generation finishes. However, the ownkey-gen -> policy-gen is asynchronous and does not blocks sign-in. I suspect we exit before it finishes and we end up with an owner key present but no policy blob (or having a stale one signed by a different key).

Cc: jdufault@chromium.org
Owner: mnissler@chromium.org
Summary: Chrome login fails in tests due to STORE_NO_POLICY error from session_manager (was: Login tests time out with "Sign-in failed, please connect to the internet and try again." Chrome error)
The policy validation failures discussed in #7 and onwards are tracked by  issue 897278 . I'll retitle this issue to track the problem that was discussed earlier.

Mattias, are you the best owner for issues involving policy problems that seem to live somewhere between Chrome and session_manager?
Components: Enterprise
Owner: ----
Status: Available (was: Assigned)
I've not really been touching this code at all this year, and I realistically don't have cycles to dive in deeply, sorry. I'd say that someone from the Enterprise team would probably be a good candidate, dropping into their queue.
Cc: emaxx@chromium.org
Owner: pmarko@chromium.org
Pavol: is this something you're familiar with?
Cc: igorcov@chromium.org
Sorry for the delay.
Let me recap for my understanding:
- this was happening on real Chrome OS device
- it happened in a test that is run using the "tast"[1] test framework
- the test calls Oobe.loginForTesting[2] with enterpriseEnroll=false
- the DUT refuses to sign-in.
- In the logs we can see that session_manager returned STORE_NO_POLICY
and this leads to the chain of events Xiyuan described in Comment #6:
- CrosSettings are assumed to be UNTRUSTED(*)
- -> chrome can't verify if the device has been disabld and refuses to login

(*) Instead, we'd have expected that the device has no owner yet, so the setting should be assumed to be trusted:
https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/device_settings_provider.cc?g=0&l=972

IIUC ownership/device policy is supposed to be completely wiped in tests since  bug 897278  has been fixed.

So the root cause *may* be that
- the device remains in a state where chrome assumes that there's an owner
or
- the device doesn't know if there is an owner

Does that make sense?
I'll take a look at the logs / how the decision is made ("public key present" and install attributes both seem to be factors here)

[1] https://chromium.googlesource.com/chromiumos/platform/tast/
[2] this is somewhat hard to follow for newbies to tast like me -- chrome.New calls chrome.logIn by default which calls Oobe.loginForTesting indeed. It passes false for "enterpriseEnroll".
#14: Thanks for taking a look!

Based on test results, it's plausible that wiping /var/lib/whitelist could've fixed this. The only failures that I see since  issue 897278  was resolved were on guado_moblab (which I think might have issues of its own):

http://stainless/search?view=list&first_date=2018-11-01&last_date=2018-11-09&branch=master&test=%5Etast%5C.&reason=not+mounted&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
Mergedinto: 897278
Status: Duplicate (was: Available)
Still no failures besides guado_moblab, so I think this is probably okay now. Feel free to reopen and undupe if there's still stuff you want to investigate here.

Sign in to add a comment