New issue
Advanced search Search tips

Issue 906690 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 25
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Chrome hangs instead of dying when ui.ChromeCrashNotLoggedIn sends SIGSEGV to it

Project Member Reported by gmeinke@chromium.org, Nov 19

Issue description

Several devices are failing the tast test tast.ui.ChromeCrashNotLoggedIn in the past several CQ runs.

2018/11/19 06:20:32 [06:20:32.112] Error at chrome_crash_not_logged_in.go:31: Couldn't kill Chrome or get dumps: Chrome didn't exit: context deadline exceeded; last error follows: processes still exist
2018/11/19 06:20:32 [06:20:32.112] Stack trace:
Couldn't kill Chrome or get dumps
	at chromiumos/tast/local/bundles/cros/ui.ChromeCrashNotLoggedIn (chrome_crash_not_logged_in.go:31)
	at chromiumos/tast/testing.(*Test).Run.func4 (test.go:189)
	at chromiumos/tast/testing.runStages.func1.1 (stage.go:39)
	at chromiumos/tast/testing.runAndRecover.func1 (stage.go:69)
	at runtime.goexit (asm_amd64.s:2361)

https://logs.chromium.org/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8929430328868290352/+/steps/HWTest__bvt-tast-cq_/0/stdout
 
Components: Tests>Tast
Status: Started (was: Untriaged)
cave-paladin: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8929430328868290352

Are there other paladin failures that I'm missing? This is the only one I saw in the last few runs.
Cc: hidehiko@chromium.org nya@chromium.org keiichiw@chromium.org derat@chromium.org
Labels: -Type-Bug -Pri-2 Hotlist-TreeCloser Hotlist-CrOS-Gardener Pri-1 Type-Bug-Regression
Owner: agawronska@chromium.org
Status: Assigned (was: Started)
Summary: Chrome hangs during testing (was: tast tests failing tast.ui.ChromeCrashNotLoggedIn)
cave-paladin logs are at http://stainless/browse/chromeos-autotest-results/258990555-chromeos-test/ :

2018/11/19 06:18:32 Started test ui.ChromeCrashNotLoggedIn
2018/11/19 06:18:32 [06:18:32.114] Restarting ui job
2018/11/19 06:18:34 [06:18:33.724] Waiting for org.chromium.SessionManager D-Bus service
2018/11/19 06:18:34 [06:18:33.743] Asking session_manager to enable Chrome testing
2018/11/19 06:18:34 [06:18:33.747] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/11/19 06:18:36 [06:18:35.445] Sending SIGSEGV to root Chrome process 7586
2018/11/19 06:18:36 [06:18:35.445] Waiting for 8 Chrome process(es) to exit
2018/11/19 06:20:32 [06:20:32.112] Error at chrome_crash_not_logged_in.go:31: Couldn't kill Chrome or get dumps: Chrome didn't exit: context deadline exceeded; last error follows: processes still exist

Chrome was hanging:

2018-11-19T14:19:03.759807+00:00 WARNING session_manager[7566]: [WARNING:liveness_checker_impl.cc(69)] Browser hang detected!
2018-11-19T14:19:03.978071+00:00 WARNING session_manager[7566]: [WARNING:liveness_checker_impl.cc(81)] Top output (trimmed):
2018-11-19T14:19:03.978109+00:00 WARNING session_manager[7566]: [WARNING:liveness_checker_impl.cc(82)] top - 06:19:03 up 18 min,  0 users,  load average: 0.52, 0.46, 0.39#012Tasks: 202 total,   1 running, 196 sleeping,   1 stopped,   4 zombie#012%Cpu(s):  4.1 us,  4.1 sy,  0.0 ni, 91.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st#012MiB Mem :   3846.2 total,   2463.3 free,    283.4 used,   1099.5 buff/cache#012MiB Swap:   5634.1 total,   5634.1 free,      0.0 used.   3171.1 avail Mem #012  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND#012 6552 root      20   0  987504  16560   7388 S  11.8   0.4   0:03.47 /usr/local/libexec/tast/bundles/local/cros#012 7755 root      20   0    9548   2392   1964 R   5.9   0.1   0:00.03 top -b -c -n1 -w512#012    1 root      20   0   15364   4280   2624 S   0.0   0.1   0:00.92 /sbin/init#012    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [kthreadd]#012    3 root      20   0       0      0      0 S   0.0   0.0   0:00.17 [ksoftirqd/0]#012    5 root       0 -20       0      0      0 S   0.0   0.0   0:00.00 [kworker/0:0H]#012    7 root      20   0       0      0      0 S   0.0   0.0   0:02.66 [rcu_preempt]#012    8 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [rcu_sched]#012    9 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [rcu_bh]#012   10 root      rt   0       0      0      0 S   0.0   0.0   0:00.02 [migration/0]#012   11 root      rt   0       0      0      0 S   0.0   0.0   0:00.01 [watchdog/0]#012   12 root      rt   0       0      0      0 S   0.0   0.0   0:00.01 [watchdog/1]#012   13 root      rt   0       0      0      0 S   0.0   0.0   0:00.03 [migration/1]#012   14 root      20   0       0      0      0 S   0.0   0.0   0:00.12 [ksoftirqd/1]
2018-11-19T14:19:13.759807+00:00 ERR session_manager[7566]: [ERROR:object_proxy.cc(580)] Failed to call method: org.chromium.LivenessServiceInterface.CheckLiveness: object_path= /org/chromium/LivenessService: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2018-11-19T14:19:18.981890+00:00 WARNING session_manager[7566]: [WARNING:liveness_checker_impl.cc(69)] Browser hang detected!
2018-11-19T14:19:19.168905+00:00 WARNING session_manager[7566]: [WARNING:liveness_checker_impl.cc(81)] Top output (trimmed):
2018-11-19T14:19:19.168953+00:00 WARNING session_manager[7566]: [WARNING:liveness_checker_impl.cc(82)] top - 06:19:19 up 19 min,  0 users,  load average: 0.41, 0.44, 0.38#012Tasks: 202 total,   1 running, 196 sleeping,   1 stopped,   4 zombie#012%Cpu(s):  1.4 us,  4.3 sy,  0.0 ni, 94.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st#012MiB Mem :   3846.2 total,   2463.7 free,    283.0 used,   1099.5 buff/cache#012MiB Swap:   5634.1 total,   5634.1 free,      0.0 used.   3171.6 avail Mem #012  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND#012 6552 root      20   0  987504  16560   7388 S  12.5   0.4   0:04.77 /usr/local/libexec/tast/bundles/local/cros#012 7805 root      20   0    9548   2476   2040 R  12.5   0.1   0:00.02 top -b -c -n1 -w512#012    1 root      20   0   15364   4280   2624 S   0.0   0.1   0:00.92 /sbin/init#012    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [kthreadd]#012    3 root      20   0       0      0      0 S   0.0   0.0   0:00.17 [ksoftirqd/0]#012    5 root       0 -20       0      0      0 S   0.0   0.0   0:00.00 [kworker/0:0H]#012    7 root      20   0       0      0      0 S   0.0   0.0   0:02.70 [rcu_preempt]#012    8 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [rcu_sched]#012    9 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [rcu_bh]#012   10 root      rt   0       0      0      0 S   0.0   0.0   0:00.02 [migration/0]#012   11 root      rt   0       0      0      0 S   0.0   0.0   0:00.01 [watchdog/0]#012   12 root      rt   0       0      0      0 S   0.0   0.0   0:00.01 [watchdog/1]#012   13 root      rt   0       0      0      0 S   0.0   0.0   0:00.03 [migration/1]#012   14 root      20   0       0      0      0 S   0.0   0.0   0:00.12 [ksoftirqd/1]
etc.

Here's the Chrome log from the failure:

[7586:7586:1119/061834.772890:WARNING:install_attributes.cc(163)] Install attributes missing, first sign in
[7586:7586:1119/061834.803467:VERBOSE1:drm_device_handle.cc(90)] Succeeded authenticating /dev/dri/card0 in 0 ms with 1 attempt(s)
[7586:7586:1119/061835.251246:VERBOSE1:arc_kiosk_app_manager.cc(212)] Device doesn't support ARC kiosk
[7656:7664:1119/061835.318029:VERBOSE1:drm_device_manager.cc(58)] Primary DRM device added: /sys/devices/pci0000:00/0000:00:02.0/drm/card0
[7586:7586:1119/061835.382397:ERROR:input_method_manager_impl.cc(1114)] IMEEngine for "jkghodnilhceideoidjikpgommlajknk" is not registered
[7586:7586:1119/061835.386382:VERBOSE1:chrome_session_manager.cc(237)] Starting Chrome with login/oobe screen.
[7586:7586:1119/061835.386474:VERBOSE1:login_display_host_webui.cc(1190)] Showing OOBE screen: unknown
[7586:7586:1119/061835.392906:VERBOSE1:login_display_host_webui.cc(1262)] Current locale: en-US
[7586:7586:1119/061835.392965:VERBOSE1:input_events_blocker.cc(19)] InputEventsBlocker 0x41da53f4060 created.
[7586:7586:1119/061835.415164:VERBOSE1:login_display_host_common.cc(97)] Mark device registered because there are remembered users: 1
[7586:7586:1119/061835.418000:ERROR:login_display_host_mojo.cc(157)] Not implemented reached in virtual void chromeos::LoginDisplayHostMojo::SetStatusAreaVisible(bool)
[7586:7586:1119/061835.418084:VERBOSE1:input_events_blocker.cc(28)] InputEventsBlocker 0x41da53f4060 destroyed.
EOF

This might be the same issue that was reported in  issue 905312  -- if so, it seems like the Chrome bug is still there. Assigning to this week's gardener.
Cc: weidongg@chromium.org posciak@chromium.org hiroh@chromium.org zwisler@chromium.org
 Issue 905312  has been merged into this issue.
Here's easier-to-read top output:

top - 06:19:03 up 18 min,  0 users,  load average: 0.52, 0.46, 0.39
Tasks: 202 total,   1 running, 196 sleeping,   1 stopped,   4 zombie
%Cpu(s):  4.1 us,  4.1 sy,  0.0 ni, 91.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3846.2 total,   2463.3 free,    283.4 used,   1099.5 buff/cache
MiB Swap:   5634.1 total,   5634.1 free,      0.0 used.   3171.1 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 6552 root      20   0  987504  16560   7388 S  11.8   0.4   0:03.47 /usr/local/libexec/tast/bundles/local/cros
 7755 root      20   0    9548   2392   1964 R   5.9   0.1   0:00.03 top -b -c -n1 -w512
    1 root      20   0   15364   4280   2624 S   0.0   0.1   0:00.92 /sbin/init
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [kthreadd]
    3 root      20   0       0      0      0 S   0.0   0.0   0:00.17 [ksoftirqd/0]
    5 root       0 -20       0      0      0 S   0.0   0.0   0:00.00 [kworker/0:0H]
    7 root      20   0       0      0      0 S   0.0   0.0   0:02.66 [rcu_preempt]
    8 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [rcu_sched]
    9 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [rcu_bh]
   10 root      rt   0       0      0      0 S   0.0   0.0   0:00.02 [migration/0]
   11 root      rt   0       0      0      0 S   0.0   0.0   0:00.01 [watchdog/0]
   12 root      rt   0       0      0      0 S   0.0   0.0   0:00.01 [watchdog/1]
   13 root      rt   0       0      0      0 S   0.0   0.0   0:00.03 [migration/1]
   14 root      20   0       0      0      0 S   0.0   0.0   0:00.12 [ksoftirqd/1]

So it looks like Chrome wasn't consuming a bunch of CPU, at least.

My best guess would be that Chrome may be reacting poorly to SIGSEGV when it's received during some small window during startup (maybe due to a bad interaction with Breakpad?).

Both the log dir that I linked to above and the one at http://stainless/browse/chromeos-autotest-results/swarming-412b0bee1c808a11/ from  issue 905312  contain SIGSEGV crash dumps that are likely the ones that were produced by the failing test. There may be clues there as to why Chrome was hanging instead of exiting, but nothing leapt out at me.

As a workaround, I can try changing the ui.ChromeCrashNotLoggedIn test to sleep for a few seconds before killing Chrome. Any other ideas?
Uploaded https://crrev.com/c/1342957 as the speculative fix described in #5.
Project Member

Comment 7 by bugdroid1@chromium.org, Nov 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/079738462c705980b61cca9d26da230c570dd4f2

commit 079738462c705980b61cca9d26da230c570dd4f2
Author: Daniel Erat <derat@chromium.org>
Date: Tue Nov 20 00:03:43 2018

tast-tests: Make ui.ChromeCrashNotLoggedIn sleep briefly.

Make the ui.ChromeCrashNotLoggedIn test sleep 3 seconds
before killing Chrome as a speculative workaround for Chrome
hangs that may be caused when it receives SIGSEGV just after
starting.

BUG= chromium:906690 
TEST=ran the test

Change-Id: I90dab680ecfdcd43b73dca81e74ea38260fffa32
Reviewed-on: https://chromium-review.googlesource.com/c/1342957
Tested-by: Dan Erat <derat@chromium.org>
Trybot-Ready: Dan Erat <derat@chromium.org>
Reviewed-by: Gregory Meinke <gmeinke@chromium.org>
Reviewed-by: Aga Wronska <agawronska@chromium.org>
Commit-Queue: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/079738462c705980b61cca9d26da230c570dd4f2/src/chromiumos/tast/local/bundles/cros/ui/chrome_crash_not_logged_in.go

Cc: -weidongg@chromium.org
Cc: agawronska@chromium.org
Labels: OS-Chrome
Owner: derat@chromium.org
Status: Fixed (was: Assigned)
Summary: Chrome hangs instead of dying when ui.ChromeCrashNotLoggedIn sends SIGSEGV to it (was: Chrome hangs during testing)
There are no new failures on either release or paladin builders since the change in #7 went in, so I'm hopeful that this is hacked around^W^Wfixed now. Please reopen if it shows up again, though.

Sign in to add a comment