ui.SessionManagerRespawn failed with "Job is already running: ui" due to unkillable Chrome processes |
|||
Issue descriptionIn the tricky-release run at http://stainless/browse/chromeos-autotest-results/235396368-chromeos-test/, the ui.SessionManagerRespawn test failed while trying to restart the ui job at completion: 2018/09/07 23:36:23 Started test ui.SessionManagerRespawn 2018/09/07 23:36:23 [23:36:22.567] Getting initial session_manager process 2018/09/07 23:36:23 [23:36:22.568] Initial session_manager process is 7871 2018/09/07 23:36:23 [23:36:22.568] Killing 7871 2018/09/07 23:36:23 [23:36:22.568] Waiting for session_manager to respawn 2018/09/07 23:36:24 [23:36:24.223] Respawned session_manager process is 8671 2018/09/07 23:36:24 [23:36:24.223] Repeatedly killing session_manager to check that ui-respawn stops restarting it eventually 2018/09/07 23:36:24 [23:36:24.223] Killing 8671 and watching for respawn 2018/09/07 23:36:38 [23:36:37.770] session_manager (correctly) not respawned 2018/09/07 23:36:38 [23:36:37.778] ui job not running; starting it 2018/09/07 23:36:38 [23:36:37.785] External command failed: exit status 1 2018/09/07 23:36:38 [23:36:37.785] Command: 'initctl' 'start' 'ui' 2018/09/07 23:36:38 [23:36:37.785] Uncaptured output: initctl: Job is already running: ui 2018/09/07 23:36:38 [23:36:37.785] Error at session_manager_respawn.go:74: Failed to ensure ui job is running: restarting job "ui" failed: exit status 1 Here's the end of the 'messages' log: 2018-09-07T23:36:24.206218-07:00 INFO session_manager[8671]: [INFO:session_manager_service.cc(159)] SessionManagerService starting 2018-09-07T23:36:24.210664-07:00 INFO session_manager[8671]: [INFO:browser_job.cc(148)] Running browser /opt/google/chrome/chrome ... 2018-09-07T23:36:24.224306-07:00 WARNING kernel: [ 2460.165363] init: ui main process (8671) killed by KILL signal 2018-09-07T23:36:24.318124-07:00 NOTICE ui-respawn[8744]: ui died on signal KILL. 2018-09-07T23:36:24.361973-07:00 NOTICE ui-respawn[8798]: Respawning ui. 2018-09-07T23:36:37.864443-07:00 CRIT ui-post-stop-unkillable[8936]: 8683 Z<sl [chrome] <defunct> 2018-09-07T23:36:37.864462-07:00 CRIT ui-post-stop-unkillable[8936]: 8783 S [chrome] 2018-09-07T23:36:37.864470-07:00 CRIT ui-post-stop-unkillable[8936]: 8822 D [nacl_helper] 2018-09-07T23:36:37.864650-07:00 CRIT ui-post-stop-unkillable[8936]: 8825 D [nacl_helper_non] 2018-09-07T23:36:37.864694-07:00 CRIT ui-post-stop-unkillable[8936]: 8856 Z< [chrome] <defunct> 2018-09-07T23:36:37.864730-07:00 CRIT ui-post-stop-unkillable[8936]: 8899 Zl [chrome] <defunct> 2018-09-07T23:36:37.864766-07:00 CRIT ui-post-stop-unkillable[8936]: 8908 Zl [chrome] <defunct> 2018-09-07T23:36:37.864800-07:00 CRIT ui-post-stop-unkillable[8936]: 8911 Zl [chrome] <defunct> I see a ui-post-stop-unkillable message in issue 625159, although I think that my conclusion there that session_manager was hanging during keygen was probably incorrect (due to confusing logging in session_manager at the time). I should probably make this test use faillog, as there's a chance that seeing a full process list would've provided more clues about what went wrong. I should also make upstart.EnsureJobRunning log more information about the job's state when RestartJob fails. I suspect that the job was actually stuck in the "stopping" state when we tried to start it -- we should possibly wait a bit here.
,
Sep 18
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/c58ca8a1d11871c21a742e98c07af089b013ff45 commit c58ca8a1d11871c21a742e98c07af089b013ff45 Author: Daniel Erat <derat@chromium.org> Date: Tue Sep 18 21:04:25 2018 tast-tests: Fix races in upstart package. Rework the upstart package to be more robust when dealing with jobs that aren't in the expected state. In particular, the package is now aware of jobs' goals and polls to make sure that they've actually reached the requested state. BUG= chromium:882244 TEST=added unit test; also verified that tests that use the upstart package still pass Change-Id: Ie8eea427946a6c1f5e78305600918a9e8c3dd15f Reviewed-on: https://chromium-review.googlesource.com/1223690 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Shuhei Takahashi <nya@chromium.org> [add] https://crrev.com/c58ca8a1d11871c21a742e98c07af089b013ff45/src/chromiumos/tast/local/upstart/upstart_test.go [modify] https://crrev.com/c58ca8a1d11871c21a742e98c07af089b013ff45/src/chromiumos/tast/local/bundles/cros/platform/check_processes.go [modify] https://crrev.com/c58ca8a1d11871c21a742e98c07af089b013ff45/src/chromiumos/tast/local/upstart/upstart.go
,
Sep 18
I'm going to wait a week or two to see if there are any more Upstart-related failures.
,
Oct 3
I don't think I've seen this problem anymore, but there appears to still be a race that can result in failures with: Failed to start Chrome: stopping "ui" failed: status start/post-stop has non-"stop" goal I'm using issue 891594 to track it. |
|||
►
Sign in to add a comment |
|||
Comment 1 by nya@chromium.org
, Sep 10