New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 891594 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Tast fails to restart ui job while session_manager is restarting

Project Member Reported by sammiequon@chromium.org, Oct 3

Issue description

tast.ui.VirtualKeyboardOmnibox FAIL: Failed to start Chrome: stopping "ui" failed: status start/post-stop has non-"stop" goal


 
This has failed twice in a row. Sample:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8933726161786015472

2018/10/02 21:11:17 [21:11:16.794] Error at virtual_keyboard_omnibox.go:27: Failed to start Chrome: stopping "ui" failed: status start/post-stop has non-"stop" goal
2018/10/02 21:11:17 [21:11:16.794] Stack trace:
chromiumos/tast/local/bundles/cros/ui.VirtualKeyboardOmnibox(0xb6da00f0, 0x9698a600, 0x9698faa0)
	/build/peach_pit/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r199/work/tast-local-tests-cros-0.0.1/src/chromiumos/tast/local/bundles/cros/ui/virtual_keyboard_omnibox.go:27 +0x11c
chromiumos/tast/testing.runAndRecover(0xb6d9981c, 0xb6da00f0, 0x9698a600, 0x9698faa0)
	/build/peach_pit/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r199/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/test.go:178 +0x5c
chromiumos/tast/testing.(*Test).Run.func1(0x9698faa0, 0x967bf380, 0x9698fa80, 0xb6da00f0, 0x9698a600, 0xb6da00f0, 0x9698a5c0, 0x966abc70)
	/build/peach_pit/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r199/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/test.go:134 +0xc4
created by chromiumos/tast/testing.(*Test).Run
	/build/peach_pit/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r199/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/test.go:110 +0x158
2018/10/02 21:11:17 Completed test ui.VirtualKeyboardOmnibox in 4.269s with 1 error(s)
Components: Tests>Tast
Cc: derat@chromium.org
Hmm, it's failing at the very first step. It's the same code used by VirtualKeyboardTyping, so it's weird that only Omnibox fails.
Owner: derat@chromium.org
Status: Assigned (was: Available)
I'll look into this. Maybe we're still not handling Upstart's various state transitions correctly. :-/
Cc: keybuk@google.com hidehiko@chromium.org nya@chromium.org
Test output:

2018/10/02 21:11:13 Started test ui.VirtualKeyboardOmnibox
2018/10/02 21:11:13 [21:11:12.621] Restarting ui job
2018/10/02 21:11:13 [21:11:12.804] External command failed: exit status 1
2018/10/02 21:11:13 [21:11:12.804] Command: 'initctl' 'stop' 'ui'
2018/10/02 21:11:13 [21:11:12.804] Uncaptured output:
initctl: Job has already been stopped: ui
2018/10/02 21:11:13 [21:11:12.855] Saving minidump: /tmp/tast_out.414257251/ui.VirtualKeyboardOmnibox/tcsd.1110.dmp
2018/10/02 21:11:15 [21:11:14.150] Saving minidump: /tmp/tast_out.414257251/ui.VirtualKeyboardOmnibox/chapsd.1126.dmp
2018/10/02 21:11:16 [21:11:15.234] Saving minidump: /tmp/tast_out.414257251/ui.VirtualKeyboardOmnibox/cryptohomed.1219.dmp
2018/10/02 21:11:17 [21:11:16.794] Error at virtual_keyboard_omnibox.go:27: Failed to start Chrome: stopping "ui" failed: status start/post-stop has non-"stop" goal

Here's the relevant code: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/HEAD/src/chromiumos/tast/local/upstart/upstart.go#120

When Tast wanted to restart the ui job, it first ran "initctl stop ui", which exited with 1: "initctl: Job has already been stopped: ui". We ignored the error since it looked like the job was already stopped. Then we tried to wait for the job status to go to stop/waiting (indicating that it's already stopped), but we saw start/post-stop.

This doesn't make sense to me. Why didn't "initctl stop ui" change the goal to "stop"?

Here's syslog at around the time of this. It looks like session_manager was in the process of exiting as a result of the previous test, ui.SupervisedUserCrash:

2018-10-03T04:11:12.616654+00:00 ERR session_manager[15011]: [ERROR:session_manager_service.cc(332)] Choosing to end session rather than restart browser.
2018-10-03T04:11:12.616685+00:00 INFO session_manager[15011]: [INFO:session_manager_impl.cc(382)] Emitting D-Bus signal SessionStateChanged:stopping
2018-10-03T04:11:12.616853+00:00 INFO session_manager[15011]: [INFO:system_utils_impl.cc(127)] Sending 15 to 15361 as 1000
2018-10-03T04:11:12.617142+00:00 INFO session_manager[15011]: [INFO:generator_job.cc(82)] Cleaned up child 15361
2018-10-03T04:11:12.617227+00:00 INFO session_manager[15011]: [INFO:session_manager_impl.cc(389)] Emitting D-Bus signal SessionStateChanged:stopped
2018-10-03T04:11:12.617367+00:00 INFO session_manager[15011]: [INFO:session_manager_service.cc(497)] SessionManagerService quitting run loop
2018-10-03T04:11:12.617508+00:00 INFO session_manager[15011]: [INFO:session_manager_service.cc(214)] SessionManagerService exiting
2018-10-03T04:11:12.621310+00:00 WARNING session_manager[15011]: [WARNING:session_manager_main.cc(262)] session_manager exiting with code 1
2018-10-03T04:11:12.625072+00:00 WARNING kernel: [  834.860139] init: ui main process (15011) terminated with status 1
2018-10-03T04:11:12.660362+00:00 NOTICE ui-respawn[15506]: ui failed with exit status 1.
2018-10-03T04:11:12.682259+00:00 NOTICE ui-respawn[15517]: Respawning ui.

I'm guessing there's a race when we try to stop and start the ui job while it's also being restarted in response to session_manager exiting.

I'm wondering if this could be related to the Upstart bug tracked by https://bugs.launchpad.net/upstart/+bug/703800.

Maybe Scott still remembers the relevant Upstart code here. TL;DR: Is there a reliable way to stop and then start an Upstart job regardless of what it's currently doing? The thing I'm trying is:

a) run "initctl stop <job>"; disregard result
b) wait for job status to change to stop/waiting
c) run "initctl start <job>"

I'm seeing a "start" goal after a).
Summary: Tast fails to restart ui job while session_manager is restarting (was: tast.ui.VirtualKeyboardOmnibox is failing on peach-pit-tot-chrome-pfq-informational)
 Issue 882244  tracked my previous efforts to avoid races.
Status: Started (was: Assigned)
On the bright side, I'm able to reproduce this locally with "tast run" by passing "ui.SupervisedUserCrash ui.VirtualKeyboardOmnibox" as the test args:

...
2018/10/04 11:15:18 [11:15:18.748] Logging in as user "testuser@gmail.com"                                                                                                                                                                                                      
2018/10/04 11:15:18 [11:15:18.805] Waiting for cryptohome for user "testuser@gmail.com"                                                                                                                                                                                         
2018/10/04 11:15:19 [11:15:19.429] Waiting for OOBE to be dismissed                                                                                                                                                                                                             
2018/10/04 11:15:20 [11:15:20.335] Waiting for org.chromium.SessionManager D-Bus service 
2018/10/04 11:15:20 Completed test ui.SupervisedUserCrash in 6.316s with 0 error(s)                                                                                                                                                                                             
2018/10/04 11:15:20 Started test ui.VirtualKeyboardOmnibox                                                                                                                                                                                                                      
2018/10/04 11:15:20 [11:15:20.504] Restarting ui job                                                                                                                                                                                                                            
2018/10/04 11:15:20 [11:15:20.663] External command failed: exit status 1                                                                                                                                                                                                       
2018/10/04 11:15:20 [11:15:20.663] Command: 'initctl' 'stop' 'ui'                                                                                                                                                                                                               
2018/10/04 11:15:20 [11:15:20.663] Uncaptured output:                                                                                                                                                                                                                           
initctl: Job has already been stopped: ui
...
2018/10/04 11:15:23 [11:15:23.476] Error at virtual_keyboard_omnibox.go:27: Failed to start Chrome: stopping "ui" failed: status start/post-stop has non-"stop" goal

However, when I also run:

  while true; do d=$(date +%H:%M:%S.%N); printf "%s " $d; status ui; sleep 0.1; done

on the DUT while doing this, the status transitions are still nonsense:

...
11:15:19.966940720 ui start/running, process 19935
11:15:20.081245282 ui start/running, process 19935
11:15:20.201248923 ui start/running, process 19935
11:15:20.320523454 ui start/running, process 19935
11:15:20.437616639 ui start/running, process 19935
<--- here's where tast runs "initctl stop ui"
11:15:20.549812758 ui stop/stopping
<--- ???
11:15:20.672816233 ui start/post-stop, process 20570
11:15:20.790860350 ui start/post-stop, process 20570
11:15:20.907199346 ui start/post-stop, process 20570
11:15:21.023153251 ui start/post-stop, process 20570
...
11:15:23.579302491 ui start/post-stop, process 20570
11:15:23.688788526 ui start/post-stop, process 20570
11:15:23.799139587 ui start/starting
11:15:23.906290402 ui start/starting
11:15:24.012038052 ui start/starting
11:15:24.118064047 ui start/starting
11:15:24.224644939 ui start/starting
11:15:24.332157651 ui start/starting
11:15:24.444373788 ui start/starting
11:15:24.556112588 ui start/running, process 20884
11:15:24.663143606 ui start/running, process 20884
...

Here's the corresponding section of /var/log/messages:

2018-10-04T11:15:20.498610-07:00 INFO session_manager[19935]: [INFO:session_manager_service.cc(310)] Browser process 19967 exited with signal 9 (Killed)
...
2018-10-04T11:15:20.499875-07:00 INFO session_manager[19935]: [INFO:session_manager_service.cc(497)] SessionManagerService quitting run loop
2018-10-04T11:15:20.499969-07:00 INFO session_manager[19935]: [INFO:session_manager_service.cc(214)] SessionManagerService exiting
2018-10-04T11:15:20.504465-07:00 WARNING session_manager[19935]: [WARNING:session_manager_main.cc(262)] session_manager exiting with code 1
2018-10-04T11:15:20.509859-07:00 WARNING kernel: [491824.425709] init: ui main process (19935) terminated with status 1
2018-10-04T11:15:20.513809-07:00 NOTICE kernel: [491824.429972] audit: type=1400 audit(1538676920.512:2463): avc:  granted  { execute } for  pid=20547 comm="init" name="ui-respawn" dev="mmcblk0p5" ino=51061 scontext=u:r:cros_init:s0 tcontext=u:object_r:cros_init_shell_scripts:s0 tclass=file
2018-10-04T11:15:20.513837-07:00 NOTICE kernel: [491824.430172] audit: type=1400 audit(1538676920.512:2464): avc:  granted  { execute } for  pid=20547 comm="init" name="dash" dev="mmcblk0p5" ino=96088 scontext=u:r:cros_init:s0 tcontext=u:object_r:sh_exec:s0 tclass=file
2018-10-04T11:15:20.514828-07:00 NOTICE kernel: [491824.430595] audit: type=1400 audit(1538676920.513:2465): avc:  granted  { execute } for  pid=20547 comm="ui-respawn" path="/bin/dash" dev="mmcblk0p5" ino=96088 scontext=u:r:cros_init:s0 tcontext=u:object_r:sh_exec:s0 tclass=file
2018-10-04T11:15:20.529184-07:00 NOTICE ui-respawn[20554]: ui failed with exit status 1.
2018-10-04T11:15:20.565194-07:00 NOTICE ui-respawn[20566]: Respawning ui.
...
2018-10-04T11:15:24.542517-07:00 INFO session_manager[20884]: [INFO:session_manager_service.cc(152)] SessionManagerService starting

So even though the job briefly goes to stop/stopping as a result of "initctl stop ui", it then goes through start/post-stop, start/starting, and start/running afterward.

I'm pretty sure that this is the doing of the ui-respawn job, which is "start on stopping ui" and runs "start -n ui" to manually restart the ui job. I presumably need to either find some way to make ui-respawn not do that in this case, or else make the Tast upstart package tolerant of it happening...
Uploaded https://crrev.com/c/1262179 as a workaround.
Is this the reason for the ongoing tast.ui.VirtualKeyboardOmnibox failures on eve?

This is tracking test failures with the error "Failed to start Chrome: stopping "ui" failed: status start/post-stop has non-"stop" goal".
Project Member

Comment 12 by bugdroid1@chromium.org, Oct 6

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/e405c3ed75bc256782d0db8e2ca03af1a6a75872

commit e405c3ed75bc256782d0db8e2ca03af1a6a75872
Author: Daniel Erat <derat@chromium.org>
Date: Sat Oct 06 02:37:55 2018

tast-tests: Add ui restart hack to upstart.StopJob.

The ui Upstart job is atypical, in that it's respawned by a
separate ui-respawn job that is started on "stopping ui". As
a result, if session_manager exits just before
upstart.StopJob(ctx, "ui") is called, the job may get
restarted even after the test runs "initctl stop ui".

This change makes upstart.StopJob wait for the ui job to
either have a "start" goal or reach "stop/waiting" while the
ui-respawn job is in "stop-waiting".

BUG= chromium:891594 
TEST=ran ui.SupervisedUserCrash followed by
     ui.VirtualKeyboardOmnibox and saw the second test start
     chrome instead of getting an error like "Failed to
     start Chrome: stopping "ui" failed: status
     start/post-stop has non-"stop" goal"

Change-Id: I2d542e3939cee6369fe477b2776cf5c39e51cd60
Reviewed-on: https://chromium-review.googlesource.com/1262179
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Shuhei Takahashi <nya@chromium.org>
Reviewed-by: Hidehiko Abe <hidehiko@chromium.org>

[modify] https://crrev.com/e405c3ed75bc256782d0db8e2ca03af1a6a75872/src/chromiumos/tast/local/upstart/upstart.go

Status: Fixed (was: Started)
This failure looks like it isn't happening now.
Status: Started (was: Fixed)
Sigh, I think it's still happening occasionally. See ui.VirtualKeyboardOmnibox in http://stainless/browse/chromeos-autotest-results/246823430-chromeos-test/:

2018/10/09 08:09:20 Started test ui.SupervisedUserCrash
2018/10/09 08:09:20 [08:09:18.903] Restarting ui job
2018/10/09 08:09:22 [08:09:21.050] Waiting for org.chromium.SessionManager D-Bus service
2018/10/09 08:09:22 [08:09:21.077] Asking session_manager to enable Chrome testing
2018/10/09 08:09:22 [08:09:21.080] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/10/09 08:09:23 [08:09:21.966] Checking cryptohomed service
2018/10/09 08:09:23 [08:09:21.971] Removing cryptohome for testuser@gmail.com
2018/10/09 08:09:23 [08:09:22.064] Finding OOBE DevTools target
2018/10/09 08:09:23 [08:09:22.169] Connecting to Chrome at ws://127.0.0.1:45466/devtools/page/9490D8403A4A2ABA023EFD5C5A24E577
2018/10/09 08:09:23 [08:09:22.474] Waiting for OOBE
2018/10/09 08:09:27 [08:09:26.057] Logging in as user "testuser@gmail.com"
2018/10/09 08:09:27 [08:09:26.329] Waiting for cryptohome for user "testuser@gmail.com"
2018/10/09 08:09:28 [08:09:27.506] Waiting for OOBE to be dismissed
2018/10/09 08:09:30 [08:09:29.367] Waiting for org.chromium.SessionManager D-Bus service
2018/10/09 08:09:30 Completed test ui.SupervisedUserCrash in 10.598s with 0 error(s)
2018/10/09 08:09:30 Started test ui.VirtualKeyboardOmnibox
2018/10/09 08:09:30 [08:09:29.501] Restarting ui job
2018/10/09 08:09:30 [08:09:29.659] External command failed: exit status 1
2018/10/09 08:09:30 [08:09:29.659] Command: 'initctl' 'stop' 'ui'
2018/10/09 08:09:30 [08:09:29.660] Uncaptured output:
initctl: Job has already been stopped: ui
...
2018/10/09 08:09:38 [08:09:36.983] Error at virtual_keyboard_omnibox.go:27: Failed to start Chrome: stopping "ui" failed: status start/post-stop has non-"stop" goal

---

2018-10-09T15:09:29.508978+00:00 INFO session_manager[9644]: [INFO:session_manager_service.cc(214)] SessionManagerService exiting
2018-10-09T15:09:29.517392+00:00 WARNING session_manager[9644]: [WARNING:session_manager_main.cc(262)] session_manager exiting with code 1
2018-10-09T15:09:29.519597+00:00 WARNING kernel: [  373.565192] init: ui main process (9644) terminated with status 1
2018-10-09T15:09:29.536097+00:00 INFO crash_reporter[10101]: libminijail[3]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-10-09T15:09:29.545526+00:00 NOTICE ui-respawn[10110]: ui failed with exit status 1.
2018-10-09T15:09:29.566239+00:00 NOTICE ui-respawn[10124]: Respawning ui.
...
2018-10-09T15:09:37.788104+00:00 WARNING session_manager[10304]: [WARNING:chrome_setup.cc(92)] Could not find both paths: /usr/share/chromeos-assets/wallpaper/oem_large.jpg and /usr/share/chromeos-assets/wallpaper/oem_small.jpg
2018-10-09T15:09:37.794810+00:00 INFO session_manager[10304]: [INFO:session_manager_service.cc(152)] SessionManagerService starting
Cc: -agawronska@chromium.org
Here's my attempt to piece together what happened:

15:09:29.501 RestartJob called
             waitUIJobStabilized returns, probably with
                 ui start/running and ui-respawn stop/waiting
15:09:29.520 session_manager process exits with status 1
             ui job stops
             StopJob's "initctl stop ui" call fails
15:09:29.545 ui-respawn is running
15:09:29.566 ui-respawn starts ui job
15:09:29.659 StopJob's WaitForJobStatus call fails due to
                 ui job having "start" goal
Uploaded https://crrev.com/c/1271940 as another speculative fix.
Cc: -sammiequon@chromium.org newcomer@chromium.org
Project Member

Comment 19 by bugdroid1@chromium.org, Oct 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/95e8fca18a87742b464ececff7c8e50a397a806a

commit 95e8fca18a87742b464ececff7c8e50a397a806a
Author: Daniel Erat <derat@chromium.org>
Date: Thu Oct 11 00:08:04 2018

tast-tests: Speculative fix for ui job flakiness.

Make upstart.StopJob's special code for handling the "ui"
job run "initctl stop ui" before waiting for the job's
status to stabilize. Otherwise, there seems to be a race
where waitUIJobStabilized sometimes sees the initial
"start/running" state even though session_manager is already
in the process of exiting.

BUG= chromium:891594 
TEST=no failures when running ui.SupervisedUserCrash
     ui.VirtualKeyboardOmnibox, but i'm already unable to
     reproduce the sporadic failures that are currently seen
     on release builders, so this is just speculative

Change-Id: I1219fda45c19a9bd0ed03f1a24ffb9e879d03578
Reviewed-on: https://chromium-review.googlesource.com/1271940
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Shuhei Takahashi <nya@chromium.org>

[modify] https://crrev.com/95e8fca18a87742b464ececff7c8e50a397a806a/src/chromiumos/tast/local/upstart/upstart.go

Status: Fixed (was: Started)
ui.VirtualKeyboardOmnibox is still failing flakily, but it's due to other issues (in the test itself?) now -- I'm not seeing the ui job errors caused by ui.SupervisedUserCrash running beforehand now.
Cc: apronin@chromium.org mnissler@chromium.org gwendal@chromium.org
 Issue 860519  has been merged into this issue.

Sign in to add a comment