New issue
Advanced search Search tips

Issue 798550 link

Starred by 0 users

Issue metadata

Status: WontFix
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

session_manager does not restart chrome after killing an unresponsive browser

Project Member Reported by achuith@chromium.org, Jan 2 2018

Issue description

To repo, you can run desktopui_Hangdetector, or manually do what it's doing:
1. touch /run/session_manager/enable_hang_detection && restart ui

2. pstree -p to locate the main browser process which is the child of the session_manager

3. kill -SIGSTOP <browser_pid>

4. tail -f /var/log/messages looking for something like:
2018-01-02T13:19:46.173981-08:00 WARNING session_manager[8546]: [WARNING:liveness_checker_impl.cc(64)] Browser hang detected!
2018-01-02T13:19:46.174013-08:00 WARNING session_manager[8546]: [WARNING:liveness_checker_impl.cc(68)] Aborting browser process.
2018-01-02T13:19:46.174153-08:00 INFO session_manager[8546]: [INFO:browser_job.cc(167)] Terminating process: Browser did not respond to DBus liveness check.
2018-01-02T13:19:46.174186-08:00 INFO session_manager[8546]: [INFO:system_utils_impl.cc(93)] Sending 6 to 8568 as 1000

5. kill -SIGCONT <browser_pid>

At this point, session_manager will not have launched a new browser. pstree -p confirms that session_manager has no children. To reset your DUT:

6. rm /run/session_manager/enable_hang_detection && restart ui

 

Comment 1 by derat@chromium.org, Jan 2 2018

Cc: mnissler@chromium.org
Components: OS>Systems
I don't seem to be able to repro this in a ToT build (Chrome 65.0.3310.0, platform 10272.0.0). When session_manager sends SIGABRT after I send SIGSTOP, I see the display turn black. pstree reports that the same Chrome browser process is still a child of session_manager. After I send SIGCONT, Chrome receives SIGABRT and exits, and then session_manager sees Chrome exit and restarts it.

Here's what I see in /var/log/messages:

2018-01-02T13:47:16.568249-08:00 INFO session_manager[15479]: [INFO:browser_job.cc(147)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=28.0.0.137 --ui-prioritize-in-gpu-process --use-gl=egl --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-consumer-kiosk --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --policy-switches-begin --enable-features=EnableBackgroundBlur --policy-switches-end --vmodule=automatic_reboot_manager=1,tablet_power_button_controller=1,*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --enable-features=EnableBackgroundBlur
2018-01-02T13:47:16.568822-08:00 INFO session_manager[15479]: [INFO:session_manager_service.cc(239)] Browser is 15858
2018-01-02T13:47:16.768201-08:00 INFO kernel: [13506.834309] [drm] Adding LVDS downclock mode
2018-01-02T13:47:17.454150-08:00 INFO session_manager[15479]: [INFO:upstart_signal_emitter.cc(35)] Emitting login-prompt-visible Upstart signal
2018-01-02T13:47:17.481557-08:00 INFO session_manager[15479]: [INFO:server_backed_state_key_generator.cc(126)] Stable device secret missing!
2018-01-02T13:47:18.196177-08:00 INFO kernel: [13508.262057] [drm] Adding LVDS downclock mode
2018-01-02T13:47:41.590005-08:00 WARNING session_manager[15479]: [WARNING:liveness_checker_impl.cc(64)] Browser hang detected!
2018-01-02T13:47:41.590046-08:00 WARNING session_manager[15479]: [WARNING:liveness_checker_impl.cc(68)] Aborting browser process.
2018-01-02T13:47:41.590247-08:00 INFO session_manager[15479]: [INFO:browser_job.cc(164)] Terminating process: Browser did not respond to DBus liveness check.
2018-01-02T13:47:41.590276-08:00 INFO session_manager[15479]: [INFO:system_utils_impl.cc(93)] Sending 6 to 15858 as 1000
2018-01-02T13:47:44.590375-08:00 WARNING session_manager[15479]: [WARNING:browser_job.cc(172)] Aborting child process 15858's process group 3 seconds after sending signal
2018-01-02T13:47:44.590398-08:00 INFO session_manager[15479]: [INFO:browser_job.cc(156)] Terminating process group: Browser took more than 3 seconds to exit after signal.
2018-01-02T13:47:44.590411-08:00 INFO session_manager[15479]: [INFO:system_utils_impl.cc(93)] Sending 6 to -15858 as 1000
2018-01-02T13:47:44.607820-08:00 ERR session_manager[15479]: [ERROR:object_proxy.cc(582)] Failed to call method: com.ubuntu.Upstart0_6.EmitEvent: object_path= /com/ubuntu/Upstart: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2018-01-02T13:47:44.890041-08:00 ERR session_manager[16007]: [ERROR:process.cc(329)] Exec of /usr/bin/vm_launcher failed:: No such file or directory
2018-01-02T13:47:44.890685-08:00 INFO session_manager[15479]: [INFO:child_exit_handler.cc(73)] 15906 is not a managed job.
2018-01-02T13:48:01.590240-08:00 ERR session_manager[15479]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.LibCrosServiceInterface.CheckLiveness: object_path= /org/chromium/LibCrosService: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2018-01-02T13:48:22.602605-08:00 INFO session_manager[15479]: [INFO:child_exit_handler.cc(77)] Handling 15858 exit.
2018-01-02T13:48:22.602638-08:00 ERR session_manager[15479]: [ERROR:child_exit_handler.cc(79)]   Exited with exit code 1
2018-01-02T13:48:22.602666-08:00 INFO session_manager[15479]: [INFO:session_manager_service.cc(307)] Exiting process is chrome.
2018-01-02T13:48:22.602685-08:00 INFO session_manager[15479]: [INFO:browser_job.cc(156)] Terminating process group: Ensuring browser processes are gone.
2018-01-02T13:48:22.602706-08:00 INFO session_manager[15479]: [INFO:system_utils_impl.cc(93)] Sending 9 to -15858 as 1000
2018-01-02T13:48:22.602751-08:00 INFO session_manager[15479]: [INFO:system_utils_impl.cc(93)] Sending 9 to 15858 as 0
2018-01-02T13:48:22.602799-08:00 INFO session_manager[15479]: [INFO:browser_job.cc(180)] Cleaned up child 15858
2018-01-02T13:48:22.605375-08:00 INFO session_manager[15479]: [INFO:browser_job.cc(147)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=28.0.0.137 --ui-prioritize-in-gpu-process --use-gl=egl --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-consumer-kiosk --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --policy-switches-begin --enable-features=EnableBackgroundBlur --policy-switches-end --vmodule=automatic_reboot_manager=1,tablet_power_button_controller=1,*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --enable-features=EnableBackgroundBlur
2018-01-02T13:48:22.606739-08:00 INFO session_manager[15479]: [INFO:session_manager_service.cc(239)] Browser is 16039

I believe that 13:48:22 is the point at which I sent SIGCONT.

Do you see the same behavior if you modify Chrome to actually hang in the browser process instead of sending SIGSTOP?
Can you try desktopui_HangDetector and see what happens?

I tried it on a downloaded amd64-generic VM, which has no OS/browser modifications and was built by a bot, and can see the problem there as well.

Comment 3 by derat@chromium.org, Jan 2 2018

I just ran "test_that --board=cave chromeos2-row8-rack6-host11.cros desktopui_HangDetector" and saw the test pass.
desktopui_HangDetector.DEBUG
5.1 KB Download
I haven't done any modifications to chrome.

I noticed this on my DUT when I was working on  crbug.com/793880 . At the end of desktopui_HangDetector test runs, I had to call restart ui to get chrome running again. I can dig a little more into this later this week.

Comment 5 by derat@chromium.org, Jan 2 2018

CHROMEOS_RELEASE_BUILDER_PATH=cave-paladin/R65-10273.0.0-rc1
CHROMEOS_RELEASE_VERSION=10273.0.0-rc1
Yup, the test passes. At the end of the test, we make sure session_manager is still running and hasn't restarted, which it is and hasn't, but we never check to ensure that a new browser has launched. This should probably be added to the test?

Comment 7 by derat@chromium.org, Jan 2 2018

Do you mean that the test is passing but leaving the system in a state where Chrome is unusable? (I don't know anything about this test; it looks like it was added by cmasone@ back in 2012 and essentially not touched after that.)

Comment 8 by derat@chromium.org, Jan 2 2018

Isn't this line waiting for Chrome?

        # Wait for new browser to come up.
        login.wait_for_browser(
You're right - I don't know why the test is passing.

I just ran the test again. Here's my /var/log/messages:
2018-01-02T14:36:52.634170-08:00 INFO session_manager[3105]: [INFO:browser_job.cc(147)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=28.0.0.137 --ui-prioritize-in-gpu-process --use-gl=egl --enable-native-gpu-memory-buffers --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/oem_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/oem_small.jpg --default-wallpaper-is-oem --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-consumer-kiosk --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --vmodule=automatic_reboot_manager=1,tablet_power_button_controller=1,*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2
2018-01-02T14:36:52.634743-08:00 INFO session_manager[3105]: [INFO:session_manager_service.cc(239)] Browser is 3128
2018-01-02T14:36:53.663171-08:00 INFO session_manager[3105]: [INFO:upstart_signal_emitter.cc(35)] Emitting login-prompt-visible Upstart signal
2018-01-02T14:36:53.704676-08:00 INFO session_manager[3105]: [INFO:server_backed_state_key_generator.cc(126)] Stable device secret missing!
2018-01-02T14:37:00.562519-08:00 DEBUG kernel: [  791.975201] ieee80211 phy0: device no longer idle - scanning
2018-01-02T14:37:02.645318-08:00 WARNING session_manager[3105]: [WARNING:liveness_checker_impl.cc(64)] Browser hang detected!
2018-01-02T14:37:02.645357-08:00 WARNING session_manager[3105]: [WARNING:liveness_checker_impl.cc(68)] Aborting browser process.
2018-01-02T14:37:02.645479-08:00 INFO session_manager[3105]: [INFO:browser_job.cc(164)] Terminating process: Browser did not respond to DBus liveness check.
2018-01-02T14:37:02.645512-08:00 INFO session_manager[3105]: [INFO:system_utils_impl.cc(93)] Sending 6 to 3128 as 1000
2018-01-02T14:37:05.399980-08:00 WARNING crash_reporter[3306]: [user] Received crash notification for chrome[3128] sig 6, user 1000 (developer build - not testing - always dumping)
2018-01-02T14:37:05.401517-08:00 INFO crash_reporter[3306]: State of crashed process [3128]: S (sleeping)
2018-01-02T14:37:05.406334-08:00 INFO metrics_daemon[2547]: [INFO:metrics_daemon.cc(426)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-01-02T14:37:05.411332-08:00 NOTICE root[3317]: autotest finished iteration /usr/local/autotest/results/default/desktopui_HangDetector/sysinfo/iteration.1
2018-01-02T14:37:05.636314-08:00 INFO kernel: [  797.050043] tpm_tis tpm_tis: command 0x65 (size 20) returned code 0x0
2018-01-02T14:37:05.672311-08:00 INFO kernel: [  797.085978] tpm_tis tpm_tis: command 0x65 (size 22) returned code 0x0
2018-01-02T14:37:05.708313-08:00 INFO kernel: [  797.122253] tpm_tis tpm_tis: command 0x65 (size 22) returned code 0x0
2018-01-02T14:37:05.737314-08:00 INFO kernel: [  797.151021] tpm_tis tpm_tis: command 0x15 (size 14) returned code 0x0
2018-01-02T14:37:05.767319-08:00 INFO kernel: [  797.181010] tpm_tis tpm_tis: command 0x15 (size 14) returned code 0x0
2018-01-02T14:37:12.085778-08:00 ERR crash_reporter[3306]: Could not open core file /var/spool/crash/chrome.20180102.143705.3128.core: No such file or directory
2018-01-02T14:37:12.086135-08:00 INFO crash_reporter[3306]: Writing conversion problems as separate crash report.
2018-01-02T14:37:13.144074-08:00 INFO session_manager[3105]: [INFO:browser_job.cc(180)] Cleaned up child 3128
2018-01-02T14:37:13.144231-08:00 ERR session_manager[3105]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.LibCrosServiceInterface.CheckLiveness: object_path= /org/chromium/LibCrosService: org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying


localhost tests # pstree -p | grep session
        |-session_manager(3105)---crash_reporter(3304)

pgrep chrome also returns nothing. The screen is black.

I'll look into it some more.
Cc: -achuith@chromium.org derat@chromium.org
Owner: achuith@chromium.org
From the logs that you posted, it looks like session_manager isn't seeing PID 3128 exit. Looking at the code, I think that SessionManagerService::HandleExit is supposed to be called, which produces the "Exiting process is chrome." message in the log snippet that I posted before calling BrowserJob::RunInBackground to restart Chrome.

I think that this means that session_manager isn't receiving SIGCHLD with CLD_EXITED (see child_exit_handler.cc), although I don't know why that would be. The crash_reporter process hanging around as a child of session_manager when you ran pstree seems a bit suspicious.
Status: WontFix (was: Assigned)

Sign in to add a comment