Chrome login fails in tests due to failed policy validation |
||||||
Issue descriptionsky@ pointed me at http://cautotest/afe/#tab_id=view_job&object_id=249807199, a squawks-release run where the ui.SingleProcessMashLogin Tast test failed with a timeout during login. Logs are at http://stainless/browse/chromeos-autotest-results/249807199-chromeos-test/ : 2018/10/18 14:34:02 Started test ui.SingleProcessMashLogin 2018/10/18 14:34:02 [14:34:01.999] Restarting ui job 2018/10/18 14:34:03 [14:34:03.531] Waiting for org.chromium.SessionManager D-Bus service 2018/10/18 14:34:03 [14:34:03.532] Asking session_manager to enable Chrome testing 2018/10/18 14:34:03 [14:34:03.537] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/10/18 14:34:04 [14:34:04.393] Checking cryptohomed service 2018/10/18 14:34:04 [14:34:04.393] Removing cryptohome for testuser@gmail.com 2018/10/18 14:34:04 [14:34:04.519] Finding OOBE DevTools target 2018/10/18 14:34:04 [14:34:04.620] Connecting to Chrome at ws://127.0.0.1:36895/devtools/page/6E324387C05242B6B6DB5F292D29FB0E 2018/10/18 14:34:05 [14:34:05.113] Waiting for OOBE 2018/10/18 14:34:10 [14:34:10.554] Logging in as user "testuser@gmail.com" 2018/10/18 14:34:10 [14:34:10.823] Waiting for cryptohome for user "testuser@gmail.com" ... 2018/10/18 14:35:59 [14:35:59.709] Error at single_process_mash_login.go:26: Chrome login failed: not mounted for testuser@gmail.com: context deadline exceeded; last error follows: /home/user/8a0d412dce3f1efad00146f2882f89ca22b2a162 not found Previous tests that exercised Chrome login passed in the same run: 2018/10/18 14:33:36 Started test ui.ChromeLogin 2018/10/18 14:33:36 [14:33:36.206] Waiting for org.chromium.SessionManager D-Bus service 2018/10/18 14:33:36 [14:33:36.209] Restarting ui job 2018/10/18 14:33:37 [14:33:36.949] Waiting for org.chromium.SessionManager D-Bus service 2018/10/18 14:33:37 [14:33:36.971] Asking session_manager to enable Chrome testing 2018/10/18 14:33:37 [14:33:36.973] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/10/18 14:33:39 [14:33:38.972] Checking cryptohomed service 2018/10/18 14:33:39 [14:33:38.973] Removing cryptohome for testuser@gmail.com 2018/10/18 14:33:39 [14:33:39.023] Finding OOBE DevTools target 2018/10/18 14:33:39 [14:33:39.166] Connecting to Chrome at ws://127.0.0.1:46859/devtools/page/FDFF2500861B4EB67C5C35A67641EE16 2018/10/18 14:33:39 [14:33:39.596] Waiting for OOBE 2018/10/18 14:33:44 [14:33:44.644] Logging in as user "testuser@gmail.com" 2018/10/18 14:33:44 [14:33:44.798] Waiting for cryptohome for user "testuser@gmail.com" 2018/10/18 14:33:46 [14:33:46.346] Waiting for OOBE to be dismissed 2018/10/18 14:33:48 [14:33:48.737] Waiting for SessionStateChanged "started" D-Bus signal from session_manager 2018/10/18 14:33:48 [14:33:48.737] Got SessionStateChanged signal 2018/10/18 14:33:48 [14:33:48.737] Creating new blank page 2018/10/18 14:33:49 [14:33:49.063] Connecting to Chrome at ws://127.0.0.1:46859/devtools/page/348BA8078D718871A21470739452216E 2018/10/18 14:33:50 [14:33:50.566] Navigating to http://127.0.0.1:37996 2018/10/18 14:33:51 [14:33:51.478] Got content "Hooray, it worked!" 2018/10/18 14:33:51 Completed test ui.ChromeLogin in 15.291s with 0 error(s) The chrome_20181018-143403 log from the failed test contains the following: [7618:7753:1018/143403.996796:WARNING:accelerometer_reader.cc(246)] Accelerometer device directory is empty at /dev/cros-ec-accel [7618:7618:1018/143403.997211:VERBOSE1:drm_device_handle.cc(90)] Succeeded authenticating /dev/dri/card0 in 0 ms with 1 attempt(s) [7765:7770:1018/143404.114243:VERBOSE1:drm_device_manager.cc(58)] Primary DRM device added: /sys/devices/pci0000:00/0000:00:02.0/drm/card0 [7618:7618:1018/143404.224082:ERROR:cloud_policy_validator.cc(441)] Policy signature validation failed [7618:7618:1018/143404.224176:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE) [7618:7618:1018/143404.224225:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR) [7618:7618:1018/143404.224283:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR) ... [7618:7618:1018/143404.321289:VERBOSE1:chrome_session_manager.cc(234)] Starting Chrome with login/oobe screen. [7618:7618:1018/143404.321423:VERBOSE1:login_display_host_webui.cc(1191)] Showing OOBE screen: unknown [7618:7618:1018/143404.337617:VERBOSE1:login_display_host_webui.cc(1263)] Current locale: en-US ... [7618:7753:1018/143404.661019:ERROR:cloud_policy_validator.cc(441)] Policy signature validation failed [7618:7618:1018/143404.683650:INFO:secure_channel_initializer.cc(119)] SecureChannelInitializer::OnBluetoothAdapterReceived(): Bluetooth adapter has been fetched. Passing all queued requests to the service. [7618:7618:1018/143404.732225:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE) [7618:7618:1018/143404.732308:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR) [7618:7618:1018/143404.732365:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR) ... [7618:7752:1018/143404.943622:ERROR:cloud_policy_validator.cc(441)] Policy signature validation failed ... [7618:7618:1018/143405.081812:VERBOSE1:login_display_mojo.cc(63)] Emitting login-prompt-visible ... [7618:7618:1018/143405.108212:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE) [7618:7618:1018/143405.108301:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR) [7618:7618:1018/143405.108362:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR) ... [7618:7618:1018/143410.209011:VERBOSE1:gaia_screen_handler.cc(1170)] LoadAuthExtension, force: 1, offline: 0 ... [7618:7618:1018/143410.253620:INFO:signin_screen_handler.cc(1472)] Login WebUI >> active: 1, source: gaia-signin [7618:7618:1018/143410.386246:VERBOSE1:signin_screen_handler.cc(1426)] Login WebUI >> loginVisible, src: oobe, webui_visible_: 0 [7618:7618:1018/143410.394924:VERBOSE1:signin_screen_handler.cc(1426)] Login WebUI >> loginVisible, src: gaia-loading, webui_visible_: 1 ... [7618:7618:1018/143410.548761:VERBOSE1:gaia_screen_handler.cc(620)] Auth extension finished loading [7618:7618:1018/143410.556481:ERROR:CONSOLE(217)] "Unexpected condition on chrome://oobe/gaia-signin: Could not find value for languageList", source: chrome://resources/js/load_time_data.js (217) [7618:7618:1018/143410.796369:VERBOSE1:gaia_screen_handler.cc(779)] HandleCompleteLogin [7618:7618:1018/143410.796503:VERBOSE1:login_display_mojo.cc(106)] Show error, error_id: 1864, attempts:1, help_topic_id: 188036 [7618:7618:1018/143410.796543:ERROR:core_oobe_handler.cc(238)] CoreOobeHandler::ShowSignInError: error_text=Sign-in failed, please connect to the internet and try again. ... [7618:7618:1018/143414.824144:VERBOSE1:gaia_screen_handler.cc(802)] Gaia is loaded [7618:7618:1018/143414.824245:VERBOSE1:signin_screen_handler.cc(1426)] Login WebUI >> loginVisible, src: gaia-signin, webui_visible_: 1 [7618:7618:1018/143414.824350:WARNING:CONSOLE(17510)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://oobe/login.js (17510) ... I don't see any of these policy errors in the Chrome logs from earlier tests, e.g. chrome_20181018-143337. What could cause policy data to get messed up in the middle of a test run? The test that ran before ui.SingleProcessMashLogin kills session_manager over and over: 2018/10/18 14:33:51 Started test ui.SessionManagerRespawn 2018/10/18 14:33:51 [14:33:51.553] Getting initial session_manager process 2018/10/18 14:33:51 [14:33:51.560] Initial session_manager process is 5216 2018/10/18 14:33:51 [14:33:51.560] Killing 5216 2018/10/18 14:33:51 [14:33:51.562] Waiting for session_manager to respawn 2018/10/18 14:33:52 [14:33:52.597] Respawned session_manager process is 5926 2018/10/18 14:33:52 [14:33:52.597] Repeatedly killing session_manager to check that ui-respawn stops restarting it eventually 2018/10/18 14:33:52 [14:33:52.597] Killing 5926 and watching for respawn 2018/10/18 14:33:53 [14:33:53.282] Killing 6079 and watching for respawn 2018/10/18 14:33:54 [14:33:54.579] Killing 6360 and watching for respawn 2018/10/18 14:33:55 [14:33:55.388] Killing 6640 and watching for respawn 2018/10/18 14:33:56 [14:33:56.056] Killing 6900 and watching for respawn 2018/10/18 14:33:56 [14:33:56.739] Killing 7047 and watching for respawn 2018/10/18 14:34:01 [14:34:01.746] session_manager (correctly) not respawned 2018/10/18 14:34:02 Completed test ui.SessionManagerRespawn in 10.459s with 0 error(s) Is session_manager corrupting the policy file?
,
Oct 23
#1: Yes, I reached the same conclusion from the logs (at http://stainless/browse/chromeos-autotest-results/251033913-chromeos-test/). [8900:8900:1022/192640.243285:ERROR:session_manager_operation.cc(214)] Policy validation failed: 2 (BAD_SIGNATURE) [8900:8900:1022/192640.243341:ERROR:device_settings_service.cc(333)] Session manager operation failed: 5 (VALIDATION_ERROR) [8900:8900:1022/192640.243377:ERROR:device_settings_provider.cc(1016)] Failed to retrieve cros policies. Reason: 5 (VALIDATION_ERROR ... [8900:8900:1022/192642.435079:VERBOSE1:gaia_screen_handler.cc(850)] HandleCompleteLogin [8900:8900:1022/192642.435238:VERBOSE1:login_display_mojo.cc(106)] Show error, error_id: 1865, attempts:1, help_topic_id: 188036 [8900:8900:1022/192642.435284:ERROR:core_oobe_handler.cc(238)] CoreOobeHandler::ShowSignInError: error_text=Sign-in failed, please connect to the internet and try again. Mattias, any thoughts? If there's something that we can/should do at the beginning of tests to wipe corrupt policy files, I'm happy to add that, but I need some help figuring out what that is (and when it's safe to do).
,
Oct 23
From xiyuan@ at https://crbug.com/880823#c10 : --- Looks like ui.ChromeLogin is the first test that completes sign-in successfully. So we will generate an owner key on the device. ui.20181009-192447 ==== [1009/192453:INFO:keygen_worker.cc(53)] Writing Owner key to /home/user/9ee774749aa421f6912b763e19572304f256ccc8/key.pub [1009/192453:INFO:policy_key.cc(120)] wrote 294 bytes to /home/user/9ee774749aa421f6912b763e19572304f256ccc8/key.pub /var/log/messages: ==== 2018-10-10T02:24:53.992222+00:00 INFO session_manager[14168]: [INFO:policy_key.cc(120)] wrote 294 bytes to /var/lib/whitelist/owner.key 2018-10-10T02:24:53.992242+00:00 INFO session_manager[14168]: [INFO:policy_service.cc(264)] Persisted policy key to disk. On the chrome side, it should generate a policy blob after the owner key generation finishes. However, the ownkey-gen -> policy-gen is asynchronous and does not blocks sign-in. I suspect we exit before it finishes and we end up with an owner key present but no policy blob (or having a stale one signed by a different key).
,
Oct 24
This failed again recently for peppy-chrome-pfq https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8931817354285908368 The successive build passed however.
,
Oct 24
,
Oct 24
Mattias said that he'll take a look at this tomorrow. We chatted a bit over IM about the problem. He agreed with Xiyuan's analysis that there's a policy on disk, but it's inconsistent with the key. It sounds like we can safely make Tast tests (probably via chrome.New) wipe /var/lib/whitelist to clear existing policy files. We should stop the ui job before doing this. Mattias also pointed at http://cs/chromeos_public/src/third_party/autotest/files/client/cros/ownership.py?l=95&rcl=83c018c571f37a322417f4096e9984d680ddd3a2 (the Autotest code for clearing policy files) and noted that it seems like it's incorrect now that we're using numbered policy.N files instead of a single file. So it seems like these tests can probably wipe policy files to work around cases where they run on DUTs that are already in a bad state for whatever reason, but there's still a mystery around how DUTs are getting into that state in the first place. The example at the top here, where some Tast login tests passed and then another login test in the same run failed due to bad policy files, still suggests to me that there may be a race.
,
Oct 25
Here's some analysis: 2018-10-18T21:33:36.964463+00:00 INFO session_manager[5216]: [INFO:session_manager_service.cc(152)] SessionManagerService starting ^ Note that session manager is starting here without saying "No policy key on disk at /var/lib/whitelist/owner.key", so there is a policy key installed at this point. 2018-10-18T21:33:36.973582+00:00 INFO session_manager[5216]: [INFO:browser_job.cc(156)] Running browser /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=31.0.0.122 --ui-prioritize-in-gpu-process --use-gl=egl --enable-native-gpu-memory-buffers --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --system-developer-mode --login-profile=user --has-chromeos-keyboard --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/oem_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/oem_small.jpg --default-wallpaper-is-oem --enable-consumer-kiosk --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --remote-debugging-port=0 --disable-logging-redirect --ash-disable-system-sounds --oobe-skip-postlogin --disable-gaia-services --autoplay-policy=no-user-gesture-required --enable-experimental-extension-apis --whitelisted-extension-id=behllobkkfkfnphdnhnkndlbkcpglgmj --load-extension=/tmp/tast_chrome_extensions.991905158/test_api_extension --arc-availability=none --testing-channel=NamedTestingInterface:/tmp/.org.chromium.Chromium.2K1DWe/.org.chromium.Chromium.QrJxvd --vmodule=nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 2018-10-18T21:33:37.102574+00:00 INFO debugd[5289]: libminijail[5289]: mount '/' -> '/' type '' flags 0x1001 2018-10-18T21:33:37.103047+00:00 INFO debugd[5289]: libminijail[5289]: mount 'none' -> '/proc' type 'proc' flags 0xe 2018-10-18T21:33:37.103194+00:00 INFO debugd[5289]: libminijail[5289]: mount '/var' -> '/var' type '' flags 0x1000 2018-10-18T21:33:37.103279+00:00 INFO debugd[5289]: libminijail[5289]: mount 'tmpfs' -> '/mnt' type 'tmpfs' flags 0xe 2018-10-18T21:33:37.103362+00:00 INFO debugd[5289]: libminijail[5289]: mount '/mnt/stateful_partition/unencrypted/cache/vpd' -> '/mnt/stateful_partition/unencrypted/cache/vpd' type '' flags 0x1000 2018-10-18T21:33:37.103446+00:00 INFO debugd[5289]: libminijail[5289]: mount 'tmpfs' -> '/run' type 'tmpfs' flags 0xe 2018-10-18T21:33:37.103527+00:00 INFO debugd[5289]: libminijail[5289]: mount '/run/dbus' -> '/run/dbus' type '' flags 0x1001 2018-10-18T21:33:37.103609+00:00 INFO debugd[5289]: libminijail[5289]: mount '/tmp' -> '/tmp' type '' flags 0x1000 2018-10-18T21:33:37.103701+00:00 INFO debugd[5289]: libminijail[5289]: mount '/run/cups' -> '/run/cups' type '' flags 0x1001 2018-10-18T21:33:37.103807+00:00 INFO debugd[5289]: libminijail[5289]: mount '/run/ippusb' -> '/run/ippusb' type '' flags 0x1001 2018-10-18T21:33:37.103894+00:00 INFO debugd[5289]: libminijail[5289]: mount '/run/shill' -> '/run/shill' type '' flags 0x1001 2018-10-18T21:33:37.103985+00:00 INFO debugd[5289]: libminijail[5289]: mount '/run/arc/bugreport' -> '/run/arc/bugreport' type '' flags 0x1001 2018-10-18T21:33:37.104072+00:00 INFO debugd[5289]: libminijail[5289]: mount '/run/containers' -> '/run/containers' type '' flags 0x1001 2018-10-18T21:33:37.104155+00:00 INFO debugd[5289]: libminijail[5289]: mount '/dev' -> '/dev' type 'bind' flags 0x5000 2018-10-18T21:33:37.104237+00:00 INFO debugd[5289]: libminijail[5289]: mount '/sys' -> '/sys' type 'bind' flags 0x5000 2018-10-18T21:33:37.104324+00:00 INFO debugd[5289]: libminijail[5289]: mount '/run/tcsd' -> '/run/tcsd' type '' flags 0x1001 2018-10-18T21:33:39.021378+00:00 WARNING cryptohomed[1186]: No valid keysets on disk for 8a0d412dce3f1efad00146f2882f89ca22b2a162 2018-10-18T21:33:39.509930+00:00 INFO bluetoothd[2088]: adapter /org/bluez/hci0 set power to 1 2018-10-18T21:33:39.744990+00:00 INFO session_manager[5216]: [INFO:server_backed_state_key_generator.cc(126)] Stable device secret missing! 2018-10-18T21:33:45.192476+00:00 INFO cryptohomed[1186]: Encrypted partition finalized. 2018-10-18T21:33:45.206705+00:00 INFO cryptohomed[1186]: InstallAttributes have been finalized. 2018-10-18T21:33:46.326313+00:00 INFO cryptohomed[1186]: Migrated (or created) user directory: /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount 2018-10-18T21:33:46.330569+00:00 INFO cryptohomed[1186]: Creating pass-through directories /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/Cache 2018-10-18T21:33:46.332353+00:00 INFO cryptohomed[1186]: Creating pass-through directories /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/Downloads 2018-10-18T21:33:46.332876+00:00 INFO cryptohomed[1186]: Creating pass-through directories /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/GCache 2018-10-18T21:33:46.333303+00:00 INFO cryptohomed[1186]: Creating pass-through directories /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/GCache/v1 2018-10-18T21:33:46.333680+00:00 INFO cryptohomed[1186]: Creating pass-through directories /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/GCache/v2 2018-10-18T21:33:46.334081+00:00 INFO cryptohomed[1186]: Creating pass-through directories /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/GCache/v1/blobs 2018-10-18T21:33:46.334489+00:00 INFO cryptohomed[1186]: Creating pass-through directories /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/GCache/v1/tmp 2018-10-18T21:33:46.339023+00:00 INFO cryptohomed[1186]: RecursiveCopy: /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/.pki 2018-10-18T21:33:46.339549+00:00 INFO cryptohomed[1186]: RecursiveCopy: /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/.pki/nssdb 2018-10-18T21:33:46.341344+00:00 INFO cryptohomed[1186]: RecursiveCopy: /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/.ssh 2018-10-18T21:33:46.341862+00:00 INFO cryptohomed[1186]: RecursiveCopy: /home/.shadow/8a0d412dce3f1efad00146f2882f89ca22b2a162/mount/user/log 2018-10-18T21:33:47.154454+00:00 INFO session_manager[5216]: [INFO:policy_key.cc(51)] No policy key on disk at /home/root/8a0d412dce3f1efad00146f2882f89ca22b2a162/session_manager/policy/key 2018-10-18T21:33:47.154820+00:00 INFO cryptohomed[1186]: Putting a Pkcs11_Initialize on the mount thread. 2018-10-18T21:33:47.168917+00:00 INFO chapsd[1143]: Opening database in: /home/root/8a0d412dce3f1efad00146f2882f89ca22b2a162/chaps 2018-10-18T21:33:47.169010+00:00 INFO chapsd[1143]: leveldb /home/root/8a0d412dce3f1efad00146f2882f89ca22b2a162/chaps/database directory did not exist. 2018-10-18T21:33:47.202647+00:00 INFO chapsd[1143]: leveldb.stats:#012 Compactions#012Level Files Size(MB) Time(sec) Read(MB) Write(MB)#012--------------------------------------------------#012 2018-10-18T21:33:47.202689+00:00 INFO chapsd[1143]: leveldb contains 0 keys. 2018-10-18T21:33:47.218983+00:00 INFO chapsd[1143]: Importing opencryptoki objects. 2018-10-18T21:33:47.219051+00:00 WARNING chapsd[1143]: Did not find any opencryptoki objects to import. 2018-10-18T21:33:47.248495+00:00 INFO chapsd[1143]: Slot 1 ready for token at /home/root/8a0d412dce3f1efad00146f2882f89ca22b2a162/chaps 2018-10-18T21:33:47.250060+00:00 INFO cryptohomed[1186]: A Pkcs11_Init event got finished. 2018-10-18T21:33:47.250362+00:00 INFO cryptohomed[1186]: PKCS#11 initialization succeeded. 2018-10-18T21:33:47.251996+00:00 WARNING session_manager[5216]: [WARNING:device_policy_service.cc(506)] Could not verify that owner key belongs to this user. ^ At this point, session_manager checks whether the user who logs in has the private half of the owner key. That's not the case here. Assuming you're using the same account ID for all testing, the policy will state the user is supposed to be the owner though. This triggers a process referred to in the code as "key loss mitigation", i.e. we re-generate a key for use by the user. That happens asynchronously and takes some time. 2018-10-18T21:33:47.252210+00:00 ERR session_manager[5216]: [ERROR:dbus_util.cc(13)] CreateError(...): Domain=dbus, Code=org.chromium.SessionManagerInterface.PubkeySetIllegal, Message=Could not verify that owner key belongs to this user. 2018-10-18T21:33:47.254192+00:00 INFO chapsd[1143]: Created lock file: /run/lock/power_override/chapsd_token_init_slot_1.lock 2018-10-18T21:33:47.254359+00:00 INFO chapsd[1143]: Initializing key hierarchy for token at /home/root/8a0d412dce3f1efad00146f2882f89ca22b2a162/chaps 2018-10-18T21:33:47.293678+00:00 INFO session_manager[5216]: [INFO:session_manager_impl.cc(561)] Starting user session 2018-10-18T21:33:47.908335+00:00 INFO bluetoothd[2088]: adapter /org/bluez/hci0 set power to 1 2018-10-18T21:33:48.767076+00:00 INFO chapsd[1143]: Finished importing 0 pending objects. 2018-10-18T21:33:48.773645+00:00 INFO chapsd[1143]: Master key is ready for token at /home/root/8a0d412dce3f1efad00146f2882f89ca22b2a162/chaps 2018-10-18T21:33:48.773922+00:00 INFO chapsd[1143]: Deleted lock file: /run/lock/power_override/chapsd_token_init_slot_1.lock 2018-10-18T21:33:49.572427+00:00 WARNING cryptohomed[1186]: RemoveKeyset: key to remove not found 2018-10-18T21:33:49.854785+00:00 NOTICE dbus[536]: [system] Activating service name='org.chromium.EasyUnlock' (using servicehelper) 2018-10-18T21:33:49.906825+00:00 INFO easy_unlock[5709]: Starting EasyUnlock dbus service. 2018-10-18T21:33:49.909267+00:00 NOTICE dbus[536]: [system] Successfully activated service 'org.chromium.EasyUnlock' 2018-10-18T21:33:51.499331+00:00 INFO session_manager[5216]: [INFO:policy_service.cc(264)] Persisted policy key to disk. ^ A new policy key gets written to disk. This is consistent with the theory that a new key has been generated (session_manager generally only writes the key when it gets installed / rotated). When installing a new key, we also write out an initial policy blob signed by the new key, generating a log line that reads "Persisted policy to disk, path: /var/lib/whitelist/policy.1". This doesn't happen here, probably because session_manager gets killed before it manages to do so (see next log statement). So we're now in a situation where we have a policy key on disk that doesn't sign the policy file that's left around. Note that the messages complaining about policy signature check failures start exactly after this point, which corroborates the hypothesis. I've successfully recreated the situation by copying out /var/lib/whitelist/policy.1 after logging in, stopping ui, removing the cryptohome, logging in again and wait until a new owner.key and policy.1 file have been written, stopping ui, copy back the old policy.1 file, starting ui. At this point, login attempts fail silently with the ominous "CoreOobeHandler::ShowSignInError: error_text=Sign-in failed, please connect to the internet and try again." error message. So the theory is confirmed, and the proposed workaround (clearing ownership state by wiping /var/lib/whitelist) should resolve this. The factor that led to this never getting reported as bug from the field but your tests still ran into it is that your device reset logic is incomplete: Removing the cryptohome for the owner account (which I assume you do) should arguably also reset device ownership (i.e. clearing /var/lib/whitelist). I think it's a good idea to do the latter unconditionally between tests to avoid device ownership state leaks across tests. There's the larger issue that devices may end up in a situation where login is blocked with no way to recover. I've filed issue 898854 for that, but given there's no indication actual users are hitting this, it'll be pretty low on my person list of priorities.
,
Oct 27
Thanks for digging into this, Mattias! As a less-risky workaround based on what you described above, would it be reasonable for the test code to actually check if policy.1 is signed by owner.key? Or is it completely safe to unconditionally delete /var/lib/whitelist between tests when running on test DUTs (after stopping the "ui" job)? It seems to work on the DUT where I tried it; I'm mostly concerned about situations where someone's DUT is owned by a real GAIA account that they used during signin but then they run tests that use a different account. Could the loss of a valid key and policy file cause any issues in that case? The main reason for my hesitation here is that the clear_ownership_files_no_restart Autotest function that you mentioned (http://cs/chromeos_public/src/third_party/autotest/files/client/cros/ownership.py?l=95&rcl=83c018c571f37a322417f4096e9984d680ddd3a2) appears to be called only from the login_OwnershipApi test, so I'm wondering if we'd be blazing new ground here by unconditionally deleting the dir.
,
Oct 30
This fails again recently for auron_paine-paladin: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8931287809578825680
,
Oct 30
I don't think we ever made any promises about which state a DUT would be left in after a test (with respect to the stateful file system, I think it is fair to expect the rootfs to be intact). As you note, there already is a function ownership.restart_ui_to_clear_ownership_files which blows away ownership and by my count it is called by 10 different tests in autotest. I don't think there's any reason for caution here, in fact anything that makes the test environment more predictable is probably a win ;-)
,
Oct 31
Thanks again! I've uploaded https://crrev.com/c/1309160 to make Tast's Chrome code recursively delete /var/lib/whitelist (since it gets recreated by session_manager).
,
Nov 1
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/bdfc15efa8a161af65a4fc4a97d6a1d11d432826 commit bdfc15efa8a161af65a4fc4a97d6a1d11d432826 Author: Daniel Erat <derat@chromium.org> Date: Thu Nov 01 23:24:52 2018 tast-tests: Clear policy files from /var/lib/whitelist. Update chrome.New to delete the /var/lib/whitelist directory when restarting the ui job. This is apparently needed in some cases to get the DUT into a usable state if an earlier test lets session_manager regenerate the owner.key file and then kills it before it can also regenerate policy.1 and sign it using the new owner key. This also makes us clear the user's cryptohome while the ui job is stopped rather than while it's running. I don't think it's necessary, but it feels cleaner to delete policy files and the cryptohome at around the same time. Also update the "ready" function to wait for cryptohomed to be running, as the chrome package no longer does this. BUG= chromium:897278 TEST=ran ui.ChromeLoginForever for 30 iterations without seeing failures; also started tryjobs Change-Id: I9cf5226b7d82bad9f6fb7845815964cc1c9e0b6f Reviewed-on: https://chromium-review.googlesource.com/1309160 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/bdfc15efa8a161af65a4fc4a97d6a1d11d432826/src/chromiumos/tast/local/ready/ready.go [modify] https://crrev.com/bdfc15efa8a161af65a4fc4a97d6a1d11d432826/src/chromiumos/tast/local/chrome/chrome.go
,
Nov 2
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/074805191d2d20eb5313e6ebcde3a6197ad8926c commit 074805191d2d20eb5313e6ebcde3a6197ad8926c Author: Daniel Erat <derat@chromium.org> Date: Fri Nov 02 04:47:25 2018 tast-tests: Wait for cryptohome D-Bus service. Make the local "cros" bundle wait for cryptohomed's D-Bus service to be available before treating the DUT as ready. It was previously just waiting for the process to start, but it looks like there's still the potential for a race when running on freshly-booted VMs: if the D-Bus service is available, the cryptohome command (used when restarting Chrome) crashes instead of waiting. BUG= chromium:897278 TEST=ran ui.ChromeLoginForever Change-Id: If80e2efd1a3fa6e4297285a60740720b71b9375b Reviewed-on: https://chromium-review.googlesource.com/c/1314050 Reviewed-by: Shuhei Takahashi <nya@chromium.org> Commit-Queue: Shuhei Takahashi <nya@chromium.org> Tested-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/074805191d2d20eb5313e6ebcde3a6197ad8926c/src/chromiumos/tast/local/ready/ready.go
,
Nov 2
https://crrev.com/c/1309160 introduced a race that can result in cryptohome failures in arc.Boot (the first test that does login) on VM builders. I've chumped https://crrev.com/c/1314050 as a fix.
,
Nov 5
I'm hopeful that this is working now. I don't see any recent "not mounted for" test failures, apart from ones on guado_moblab that I think are unrelated.
,
Nov 8
Here's a good summary of recent results: http://stainless/search?view=list&first_date=2018-10-12&last_date=2018-11-08&test=%5Etast%5C.&reason=not+mounted+for&exclude_cts=false&exclude_not_run=true&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true There was a "not mounted for" error on nyan_kitty on Oct 30, and one on ninja on Oct 29, and a steady stream of multiple failures per day before then. But the only failures in November have been on guado_moblab, and I think they're unrelated (see https://crbug.com/902318#c2 ).
,
Nov 15
Issue 880823 has been merged into this issue. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by benzh@chromium.org
, Oct 23