Issue metadata
Sign in to add a comment
|
UI tests fail while restarting 'ui' job |
||||||||||||||||||||||||
Issue descriptionThe ui.ChromeLogin Tast test failed on daisy-release at http://stainless/browse/chromeos-autotest-results/214182083-chromeos-test/ : 2018/07/04 23:53:51 Started test ui.ChromeLogin 2018/07/04 23:53:51 [23:53:51.311] Restarting ui job 2018/07/04 23:53:51 [23:53:51.328] Error at chrome_login.go:69: Chrome login failed: restarting job "ui" failed: exit status 1: initctl: Job is already running: ui ui.ChromeCrashNotLoggedIn ran right before ui.ChromeLogin, and it looks like the 'ui' Upstart job was restarting around when ui.ChromeLogin started: ... 2018-07-04T23:52:48.382329-07:00 INFO session_manager[10610]: [INFO:browser_job.cc(165)] Terminating process: exiting cleanly 2018-07-04T23:52:48.382799-07:00 INFO session_manager[10610]: [INFO:system_utils_impl.cc(94)] Sending 15 to 11051 as 1000 2018-07-04T23:52:48.386656-07:00 INFO session_manager[10610]: [INFO:session_manager_service.cc(480)] SessionManagerService quitting run loop 2018-07-04T23:52:48.387364-07:00 INFO session_manager[10610]: [INFO:session_manager_service.cc(201)] SessionManagerService exiting 2018-07-04T23:53:51.506780-07:00 INFO sshd[11183]: Accepted publickey for root from 100.109.25.147 port 38512 ssh2: RSA SHA256:Fp1qWjFLyK1cTpiI5rdk7iEJwoK9lcnYAgbQtGC3jzU ... I knew that upstart.RestartJob was racy when I wrote it; it checks a job's current status and then runs either 'initctl restart' or 'initctl start' depending on whether it's already running or not. Unfortunately, I don't think that Upstart makes it easy to do this in a reliable way; see e.g. https://stackoverflow.com/questions/9479668/upstart-initctl-startrestart-ubuntu. Maybe I should just make Chrome.restartChromeForTesting explicitly call StopJob before calling RestartJob; that might work here.
,
Jul 25
I've grabbed a daisy DUT (chromeos2-row4-rack8-host19) and running ui.ChromeLogin forever. I'll update if I find anything.
,
Jul 26
I tried ~900 iterations of Chrome login on daisy, and found one iteration takes much longer than expected. 50%ile: 57s 90%ile: 85s 95%ile: 98s max: 144s OTOH I did not observe any indefinite hang. I will try seeing where it is choked.
,
Jul 26
It turned out this is another TPM issue. I set 30 seconds timeout to "restart ui" and dumped several states on timeout. "ps auxwwf" output is: ... root 16638 0.0 0.0 1416 468 ? Ss 22:37 0:00 /bin/sh /usr/share/cros/init/ui-post-stop root 16641 0.1 0.2 14956 5104 ? S 22:37 0:00 \_ cryptohome --action=pkcs11_terminate So "ui" job is choked at "cryptohome --action=pkcs11_terminate". I've attached call stacks of various processes (cryptohomed, chapsd, tcsd) at the moment. apronin@, could you please take a look?
,
Jul 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/8d555fb2bf6f670385e506e625b7f80d9c86c699 commit 8d555fb2bf6f670385e506e625b7f80d9c86c699 Author: Shuhei Takahashi <nya@chromium.org> Date: Fri Jul 27 15:50:49 2018 chrome: Save minidumps when "restart ui" times out. It is known that "restart ui" is sometimes slow by TPM issues. Also sets 30-second timeout to "restart ui". BUG= chromium:860519 TEST=tast run DUT ui.ChromeLogin CQ-DEPEND=CL:1152639 Change-Id: I258fd520f0d2cf3f50d28901d026aa90b648b8a8 Reviewed-on: https://chromium-review.googlesource.com/1152635 Commit-Ready: Shuhei Takahashi <nya@chromium.org> Tested-by: Shuhei Takahashi <nya@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/8d555fb2bf6f670385e506e625b7f80d9c86c699/src/chromiumos/tast/local/chrome/chrome.go
,
Jul 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/f2bf0a7aa6422c41b2690b774c09264675784fb1 commit f2bf0a7aa6422c41b2690b774c09264675784fb1 Author: Shuhei Takahashi <nya@chromium.org> Date: Fri Jul 27 15:50:57 2018 chrome: Add ui.ChromeLoginForever test. BUG= chromium:860519 TEST=tast run DUT ui.ChromeLoginForever Change-Id: I48dbe8b28ef6da7853ebd08a4bd645fd505c91dd Reviewed-on: https://chromium-review.googlesource.com/1152636 Commit-Ready: Shuhei Takahashi <nya@chromium.org> Tested-by: Shuhei Takahashi <nya@chromium.org> Reviewed-by: Shuhei Takahashi <nya@chromium.org> [add] https://crrev.com/f2bf0a7aa6422c41b2690b774c09264675784fb1/src/chromiumos/tast/local/bundles/cros/ui/chrome_login_forever.go
,
Jul 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast/+/d956c8f04ec2166d3f382b2d2cadaf0e1021646f commit d956c8f04ec2166d3f382b2d2cadaf0e1021646f Author: Shuhei Takahashi <nya@chromium.org> Date: Fri Jul 27 15:50:49 2018 tast: Add testing.ContextOutDir. We often want to save files in utility functions for debugging, but passing around testing.State or outDir is a pain. BUG= chromium:860519 TEST=tast run DUT example.Pass Change-Id: I4124ec2334377fa7527bdbd38f1bba59d033a721 Reviewed-on: https://chromium-review.googlesource.com/1152639 Commit-Ready: Shuhei Takahashi <nya@chromium.org> Tested-by: Shuhei Takahashi <nya@chromium.org> Reviewed-by: Shuhei Takahashi <nya@chromium.org> [modify] https://crrev.com/d956c8f04ec2166d3f382b2d2cadaf0e1021646f/src/chromiumos/tast/testing/state.go
,
Aug 8
I'm seeing daily test failures that look similar to this now. Most of them are on daisy, but they occasionally occur on other boards too. Here are a few examples: daisy-release R70-10944.0.0 tast.ui.ChromeCrashNotLoggedIn, tast.ui.ChromeLogin http://stainless/browse/chromeos-autotest-results/224532521-chromeos-test/ daisy-release R70-10945.0.0 tast.video.PlayH264, tast.video.PlayVP8, tast.video.PlayVP9 http://stainless/browse/chromeos-autotest-results/224613310-chromeos-test/ daisy-release R70-10941.0.0 tast.ui.ChromeCrashNotLoggedIn, tast.ui.ChromeLogin http://stainless/browse/chromeos-autotest-results/224215499-chromeos-test/ poppy-release R70-10941.0.0 tast.ui.ChromeCrashNotLoggedIn, tast.ui.ChromeLogin http://stainless/browse/chromeos-autotest-results/224236932-chromeos-test/ falco-release R70-10942.0.0 tast.ui.ChromeCrashNotLoggedIn, tast.ui.ChromeLogin http://stainless/browse/chromeos-autotest-results/224307756-chromeos-test/ falco-release R70-10943.0.0 tast.ui.ChromeCrashNotLoggedIn, tast.ui.ChromeLogin http://stainless/browse/chromeos-autotest-results/224396501-chromeos-test/ --- In the poppy-release example above, ui-post-stop was again blocked on "cryptohome --action=pkcs11_terminate": root 5315 0.0 0.0 4328 848 ? Ss 08:15 0:00 /bin/sh /usr/share/cros/init/ui-post-stop root 5318 0.1 0.3 35084 12496 ? S 08:15 0:00 \_ cryptohome --action=pkcs11_terminate Here are the cryptohomed messages from syslog: 2018-08-05T15:15:53.401566+00:00 ERR cryptohomed[2213]: TpmIsReady: is not owned. 2018-08-05T15:15:53.401586+00:00 ERR cryptohomed[2213]: HasAuthorization: TPM not ready. 2018-08-05T15:15:53.401597+00:00 ERR cryptohomed[2213]: Destroy() called with insufficient authorization. 2018-08-05T15:15:53.404018+00:00 WARNING cryptohomed[2213]: Could not load the device policy file. 2018-08-05T15:15:53.560073+00:00 WARNING cryptohomed[2213]: Could not load the device policy file. 2018-08-05T15:15:53.652295+00:00 WARNING cryptohomed[2213]: Could not load the device policy file. 2018-08-05T15:15:53.689588+00:00 WARNING cryptohomed[2213]: message repeated 2 times: [ Could not load the device policy file.] 2018-08-05T15:15:53.894728+00:00 WARNING cryptohomed[2213]: Canceled creating cryptohome key - TPM is not ready. 2018-08-05T15:15:54.806765+00:00 WARNING cryptohomed[2213]: Could not load the device policy file. 2018-08-05T15:15:55.765616+00:00 WARNING cryptohomed[2213]: Canceled creating cryptohome key - TPM is not ready. 2018-08-05T15:15:55.961599+00:00 WARNING cryptohomed[2213]: Canceled creating cryptohome key - TPM is not ready. 2018-08-05T15:15:56.746344+00:00 INFO cryptohomed[2213]: Migrated (or created) user directory: /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount 2018-08-05T15:15:56.747193+00:00 INFO cryptohomed[2213]: Creating pass-through directories /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/Cache 2018-08-05T15:15:56.747758+00:00 INFO cryptohomed[2213]: Creating pass-through directories /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/Downloads 2018-08-05T15:15:56.747939+00:00 INFO cryptohomed[2213]: Creating pass-through directories /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/GCache 2018-08-05T15:15:56.748092+00:00 INFO cryptohomed[2213]: Creating pass-through directories /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/GCache/v1 2018-08-05T15:15:56.748243+00:00 INFO cryptohomed[2213]: Creating pass-through directories /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/GCache/v2 2018-08-05T15:15:56.748390+00:00 INFO cryptohomed[2213]: Creating pass-through directories /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/GCache/v1/blobs 2018-08-05T15:15:56.748583+00:00 INFO cryptohomed[2213]: Creating pass-through directories /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/GCache/v1/tmp 2018-08-05T15:15:56.754011+00:00 INFO cryptohomed[2213]: RecursiveCopy: /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/log 2018-08-05T15:15:56.754360+00:00 INFO cryptohomed[2213]: RecursiveCopy: /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/.pki 2018-08-05T15:15:56.754642+00:00 INFO cryptohomed[2213]: RecursiveCopy: /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/.pki/nssdb 2018-08-05T15:15:56.756466+00:00 INFO cryptohomed[2213]: RecursiveCopy: /home/.shadow/c30861d54b050440dba9854eb78845cd5e393fda/mount/user/.ssh 2018-08-05T15:15:56.775097+00:00 WARNING cryptohomed[2213]: ManageCCDPwd is not supported 2018-08-05T15:15:56.775415+00:00 WARNING cryptohomed[2213]: Could not load the device policy file. 2018-08-05T15:15:57.707826+00:00 WARNING cryptohomed[2213]: TPM was not owned. TPM initialization call back will handle PKCS#11 initialization. 2018-08-05T15:15:57.934519+00:00 WARNING cryptohomed[2213]: GetTpmTokenSlotForPath: Path not found. 2018-08-05T15:15:58.226955+00:00 ERR cryptohomed[2213]: IsEndorsementKeyAvailable: Not Implemented. 2018-08-05T15:15:58.226974+00:00 ERR cryptohomed[2213]: IsEndorsementKeyAvailable: Not Implemented. 2018-08-05T15:15:58.529552+00:00 WARNING cryptohomed[2213]: GetTpmTokenSlotForPath: Path not found. 2018-08-05T15:15:58.923880+00:00 WARNING cryptohomed[2213]: RemoveKeyset: key to remove not found 2018-08-05T15:15:59.963940+00:00 ERR cryptohomed[2213]: SetIsOwned: Not Implemented. 2018-08-05T15:15:59.963996+00:00 ERR cryptohomed[2213]: TestTpmAuth: Not Implemented. 2018-08-05T15:15:59.967784+00:00 ERR cryptohomed[2213]: Taking TPM ownership took 1882ms 2018-08-05T15:16:00.996235+00:00 INFO cryptohomed[2213]: Created new cryptohome key. 2018-08-05T15:16:01.808020+00:00 INFO cryptohomed[2213]: Configuring TPM, ownership taken: 1. 2018-08-05T15:16:01.808052+00:00 INFO cryptohomed[2213]: Putting a Pkcs11_Initialize on the mount thread. 2018-08-05T15:16:11.145640+00:00 INFO cryptohomed[2213]: Lockbox created. 2018-08-05T15:16:11.924268+00:00 INFO cryptohomed[2213]: Skipping random salt generation. 2018-08-05T15:16:15.521761+00:00 INFO cryptohomed[2213]: Encrypted partition finalized. 2018-08-05T15:16:15.534025+00:00 INFO cryptohomed[2213]: InstallAttributes have been finalized. 2018-08-05T15:16:30.961656+00:00 INFO cryptohomed[2213]: A Pkcs11_Init event got finished. 2018-08-05T15:16:30.961723+00:00 INFO cryptohomed[2213]: PKCS#11 initialization succeeded. Here are the tests that were run: 2018/08/05 08:15:50 Started test ui.ChromeCrashLoggedIn 2018/08/05 08:15:50 [08:15:49.681] Restarting ui job ... 2018/08/05 08:15:54 [08:15:53.259] Waiting for cryptohome /home/user/c30861d54b050440dba9854eb78845cd5e393fda ... 2018/08/05 08:15:59 Completed test ui.ChromeCrashLoggedIn in 9.367s with 0 error(s) 2018/08/05 08:15:59 Started test ui.ChromeCrashNotLoggedIn 2018/08/05 08:15:59 [08:15:59.036] Restarting ui job ... 2018/08/05 08:16:31 [08:16:30.961] Error at chrome_crash_not_logged_in.go:27: Chrome startup failed: restarting job "ui" failed: context deadline exceeded ... 2018/08/05 08:16:32 Completed test ui.ChromeCrashNotLoggedIn in 32.331s with 1 error(s) 2018/08/05 08:16:32 Started test ui.ChromeLogin 2018/08/05 08:16:32 [08:16:31.368] Restarting ui job ... 2018/08/05 08:16:34 [08:16:33.154] Error at chrome_login.go:72: Chrome login failed: restarting job "ui" failed: exit status 1 ... 2018/08/05 08:16:34 Completed test ui.ChromeLogin in 1.983s with 1 error(s) --- Andrey, do you know what could be making "pkcs11_terminate" block like this?
,
Aug 9
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/cf64f1556958d7a28193ad18e045846604f8cc09 commit cf64f1556958d7a28193ad18e045846604f8cc09 Author: Daniel Erat <derat@chromium.org> Date: Thu Aug 09 09:21:30 2018 tast-tests: Increase timeout for restarting ui job. Increase the chrome package's timeout for restarting the "ui" Upstart job from 30 seconds to 90 seconds to try to work around ui-post-stop hangs due to slow-running "cryptohome --action=pkcs11_terminate" commands. BUG= chromium:860519 , chromium:872468 TEST=ui.ChromeLogin still passes Change-Id: I923de582411c64611cecebcb678c2fbd7d933167 Reviewed-on: https://chromium-review.googlesource.com/1168186 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Shuhei Takahashi <nya@chromium.org> [modify] https://crrev.com/cf64f1556958d7a28193ad18e045846604f8cc09/src/chromiumos/tast/local/chrome/chrome.go
,
Aug 9
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast/+/a977eb6931b89c52554406e2b4cdb913247293c3 commit a977eb6931b89c52554406e2b4cdb913247293c3 Author: Daniel Erat <derat@chromium.org> Date: Thu Aug 09 09:21:39 2018 tast: Increase default local test timeout to two minutes. A long-running "cryptohome --action=pkcs11_terminate" process can sometimes block the ui-post-stop script for an extended period of time when tests try to restart the "ui" job. Increase the default timeout for local tests from one minute to two to try to work around this. BUG= chromium:860519 , chromium:872468 TEST=unit tests still pass Change-Id: I6c6c7578bde586f4154353f0fe25c34896dd0b33 Reviewed-on: https://chromium-review.googlesource.com/1168184 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Shuhei Takahashi <nya@chromium.org> [modify] https://crrev.com/a977eb6931b89c52554406e2b4cdb913247293c3/src/chromiumos/tast/bundle/local.go
,
Sep 26
Fixing my embarrassing cc mistake. Sorry. :-(
,
Sep 26
,
Oct 12
Ah, where are we on this. I'm listed as the owner but did nothing. Dan, did your CLs fix the issue?
,
Oct 12
I think that we still have random timeouts on daisy, but I suspect that that's a more general issue with the slowness of the hardware. See also issue 894767 and issue 893957, for instance. But I think/hope that I've finally worked around all races caused by the ui-respawn job, as tracked by issue 891594 . |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by derat@chromium.org
, Jul 23