New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

[All boards] Regression: Device hangs at black screen on signout, stateful wiped (verified mode only)

Project Member Reported by mmanchala@chromium.org, May 22 2017

Issue description

Chrome Version:  60.0.3105.0/9574.0.0 dev-channel Daisy,Candy and minnie
OS: Chrome

What steps will reproduce the problem?
(1)Sign into user -> click on Uber Tray and now Click on 'Sign out' (Please refer Video)


Expected: 'Sign out' screen should be seen on clicking 'Sign out' in Uber Tray 
Actual: Instead Device gets turned off on clicking 'Sign out' in Uber Tary

This is Regression issue as same is working fine in 60.0.3102.0/9565.0.0 dev-channel Candy

@tbuckley : Please confirm the Issue

Note:
1.No Crash Id's are generated
2.Again need to press Reload + Power button to turn on the Device
3.Clicking on power button also does not turn on the device.
 
Actual.mp4
14.1 MB View Download
Expexted_SignOutScreen.mp4
7.3 MB View Download
Note:
1.Issue is also seen on Suspending the device & on Power-wash as well(Device gets turned off and clicking power button does not turn on the device)
2.Could able to proceed by using escape+reload keys.
Note:
3.Issue is also seen on clicking 'Shut down' Button and on creating Supervised User
Cc: josa...@chromium.org
+ Josafat [M60 TPM]
Cc: tdander...@chromium.org tbuck...@chromium.org
Owner: r...@chromium.org
@rkc any thoughts who can look at this? I don't think there should be any functional changes around the sign-out button recently. +tdanderson to confirm

Comment 5 by ka...@chromium.org, May 22 2017

Labels: -Pri-0 Pri-1
The shutdown looks unclean, and if it is a kernel crash the device should come up. But it does not. Missing crash logs, and having this observed cros-platform, it looks like the screen backlight goes to 0.

Prasanthi reproduced the bug and she confirmed the Brightness is 0, but user can manually bring it back. She is uploading logs in next update.
Tested on Minnie & Daisy with version 60.0.3105.0/9575.0.0 dev-channel.  

Logs here: https://pantheon.corp.google.com/storage/browser/chromiumos-test-logs/bugfiles/cr/724937/

Comment 7 by ka...@chromium.org, May 22 2017

Cc: derat@chromium.org
Components: OS>Kernel>Power

Comment 8 by derat@chromium.org, May 22 2017

I don't understand what's happening in the video. After you click "Sign out" and the screen goes black, are you pressing Reload+Power offscreen? I don't see the boot splash screen when the screen turns back on, just the login screen (as expected).

I'm confused about what's actually being discussed here. Is the device shutting down or hanging, or (per #5) is the backlight brightness just being set to 0? If the understanding of the problem has changed, please update the "Summary" field to be accurate.

Here's what I see in powerd.LATEST from #6:

[0522/160314:INFO:daemon.cc(1484)] Session state changed to stopped
...
[0522/160318:INFO:display_power_setter.cc(82)] Asking Chrome to turn all displays on
[0522/160318:INFO:daemon.cc(1394)] Chrome is using normal display mode
[0522/160319:INFO:internal_backlight_controller.cc(437)] Got user-triggered request to set brightness to 81.25%
[0522/160319:INFO:internal_backlight_controller.cc(682)] Setting brightness to 163 (81.25%) over 200 ms
[0522/160319:INFO:internal_backlight_controller.cc(437)] Got user-triggered request to set brightness to 87.5%
[0522/160319:INFO:internal_backlight_controller.cc(682)] Setting brightness to 191 (87.5%) over 200 ms
[0522/160320:INFO:internal_backlight_controller.cc(437)] Got user-triggered request to set brightness to 93.75%
[0522/160320:INFO:internal_backlight_controller.cc(682)] Setting brightness to 221 (93.75%) over 200 ms
[0522/160320:INFO:daemon.cc(1394)] Chrome is using normal display mode
[0522/160320:INFO:internal_backlight_controller.cc(437)] Got user-triggered request to set brightness to 100%
[0522/160320:INFO:internal_backlight_controller.cc(682)] Setting brightness to 253 (100%) over 200 ms
[0522/160320:INFO:internal_backlight_controller.cc(437)] Got user-triggered request to set brightness to 100%
[0522/160320:INFO:internal_backlight_controller.cc(682)] Setting brightness to 253 (100%) over 200 ms
...
[0522/160626:INFO:daemon.cc(1484)] Session state changed to started

The user logs out, the brightness is manually increased, and then three minutes later the user logs back in.

Comment 9 by ka...@chromium.org, May 22 2017

Retested with more devices and had observations with 60.0.3105.0/9575.0.0.
Taking back the backlight point.
The sign-out causes about 9 sec delay which was assumed to be backlight issue -
 while pressing the brightness keys the devices were comming up on sign-in screen.
The delay varies from 4-to-10 seconds.



Comment 10 by derat@chromium.org, May 22 2017

Components: -OS>Kernel>Power -UI>Shell>StatusArea OS>Systems
Summary: Regression: signout is slow (was: Regression : Device gets turned off on clicking 'Sign out' in Uber Tray)
So it sounds like this regressed between 60.0.3102.0 and 60.0.3105.0. Can you narrow the regression range further?

Comment 11 by derat@chromium.org, May 22 2017

Or it could be an OS issue between 9565.0.0 and 9574.0.0.

Comment 12 by ka...@chromium.org, May 22 2017

Some more checks on different devices over RC builds from last week.
Boards on Dev channel show delay up to 16 seconds, and shorter if reboot before user session. Devices that have been tested and suspended from last week, have slower sign-out.
Boards at Stable channel show 5-8 sec transition.


mmanchala@ and jbanavatu@, can you re-test again and see if it is unusual  delay, as we see it?
If the board does not come back (lets say for a minute), use Alt+VolUp+x_key_three times. After the device reboots, generate log and upload to the linked at #6 location.

Are there any peripherals attached to the device? 








Summary: Regression: signout is slow (and/or not coming back to login screen) (was: Regression: signout is slow)
I just repro'd this on my Samus running latest canary 9575.0.0
I was not able to get system back after logging out unless I long pressed the power button 

Feedback report: https://feedback.corp.google.com/#/Report/61346803386

Comment 14 by ka...@chromium.org, May 22 2017

Summary: [All boards]Regression: Device gets turned off on signout (was: Regression: signout is slow (and/or not coming back to login screen))
Thanks josafat@,
We were able to reproduce it too. The condition is - device in normal mode.
Once the board is in normal mode this is consistent failure.
Short-key Ctrl+Shift+qq has same effect.


Kalin, can you confirm when the regression started? 

rkc/derat@, would you be the right owner for this?

Comment 16 by derat@chromium.org, May 22 2017

I still don't understand from this bug report whether the device is actually shutting down, hanging, or just taking a long time to sign out and/or restart Chrome. This should be part of the bug report; it's not possible to route it to the right team without knowing this.

#12 suggests that logout is just slow, but #13 changed the bug's title to suggest that the system needs to be power-cycled.

Please figure out the answers to the following:

a) Does the system eventually come back, or do you need to hold the power button for eight seconds? I see both things in this report.
b) Is the system actually shutting down? Look at the LED (location varies depending on the device).

From the various comments that have been written so far, including the statement that the power button needs to be held, I would guess that the system isn't shutting down and is instead hanging. That would suggest either a Chrome hang during exit, a Chrome hang during restart, a session_manager bug, or a hang at the kernel level.
Here is what I see on my Samus, 

- the device seems to be hung state since I can see the lightbar all lighted after waiting for an hour after log out
- I can also repro if I exit guest session (e.g. not related to an specific login)
- Closing/Open the lid does not change the lightbar (e.g. stays on)
- Long press seems to be the only way to bring device back 

Comment 18 by r...@chromium.org, May 23 2017

Cc: r...@chromium.org
Owner: derat@chromium.org
So either a Chrome hang during exit or a kernel hang. Dan would know more about this than me, including where to route it.

Comment 19 by derat@chromium.org, May 23 2017

Cc: ihf@chromium.org
[cc ihf@, the current gardener]

If this really affects all boards, it's odd that it doesn't appear to be causing a zillion PFQ failures. I'll see if I'm able to repro it on a lumpy device tomorrow, I guess.

Comment 20 by ihf@chromium.org, May 23 2017

I can't repro on veyron_minnie/R60-9574.0.0. Signout works fine from my test account as well as guest.
Issue is seen when device is in Verified Mode. Waited upto 6-8 mins time  but still the device did not come back to 'Sign-out' screen
And Unable to repro the issue if device is in Dev-Mode i.e.  it takes 5-8 sec to come back to  'Sign-out' screen after clicking on 'Sign out'.

Comment 22 by ihf@chromium.org, May 23 2017

Summary: [All boards]Regression: Device gets turned off on signout (verified mode only) (was: [All boards]Regression: Device gets turned off on signout)

Comment 23 by ihf@chromium.org, May 23 2017

Cc: -ihf@chromium.org
I bet the lab doesn't run tests in verified mode, so no problem for gardening.

Comment 24 by derat@chromium.org, May 23 2017

Cc: abodenha@chromium.org
I'll still try to repro this, but I'm probably not going to be able to do anything to help if it only shows up in verified mode.

Albert, do you have any ideas about what we can try here?

Comment 25 by derat@chromium.org, May 23 2017

Summary: [All boards]Regression: Device hangs at black screen on signout (verified mode only) (was: [All boards]Regression: Device gets turned off on signout (verified mode only))

Comment 26 by derat@chromium.org, May 23 2017

Cc: mnissler@chromium.org vapier@chromium.org teravest@chromium.org jrbarnette@chromium.org snanda@chromium.org
Labels: -Pri-1 Pri-0
Owner: ----
Status: Available (was: Assigned)
Summary: [All boards] Regression: Device hangs at black screen on signout, stateful wiped (verified mode only) (was: [All boards]Regression: Device hangs at black screen on signout (verified mode only))
I'm able to repro this consistently on a verified-mode peppy running 9574.0.0 / 60.0.3105.0.

I logged in with my test account, waited a few minutes, and signed out using the system menu. The device froze with a black screen. After a few minutes, I rebooted by hitting Alt+F10+x three times. The stateful partition appeared to be trashed after the system came back up -- my profile picture was missing on the login screen, a bunch of extension icons are missing in the shelf, there didn't appear to be any log files from before the reboot, and even the client ID (as displayed at chrome://system) was empty. file:///home/chronos/user/log contains previous log files, but I don't see anything useful at the end of them.

I re-enabled crash reporting to generate a new client ID, signed out, and saw the issue again. This time, I synced the disk with Alt+F10+s after each Alt+F10+x. At the login screen, my profile picture was present this time, and chrome://system shows the same client ID as after I re-opted-in to crash reporting after the first crash. The old logs are still missing from file:///var/log.

I only see one crash for the device's client ID (2ebf287fd617490e967987157b97f8a8), at http://crash/browse?q=reportid=%270108342768000000%27, and the kernel log doesn't appear to have any useful data. I generated another crash at http://crash/browse?q=reportid=%27757d522768000000%27 but don't see any commonality between the kernel traces in the two crashes. I suspect that the problem is at a higher layer than the kernel since the sysrq actions are still working. It looks like we don't permit other sysrq keys that would be useful here, like 't', 'w', and 'l'.

I'm stumped and don't know what else to try. Why does this only show up in verified mode? Why is the stateful partition getting wiped? Does anyone more familiar with verified mode have any ideas of other things to try?

Comment 27 by derat@chromium.org, May 23 2017

Cc: jclinton@chromium.org
Labels: Needs-Bisect
Can someone on the test team help by bisecting this? Per the initial report, 9565.0.0 was good (I haven't verified this) and 9574.0.0 was bad. There are a decent number of OS changes in that range (see http://crosland/log/9565.0.0..9574.0.0) and it'd be very helpful to narrow it down in case we're reduced to looking for suspicious changes (which I'll start doing now...).

Comment 28 by derat@chromium.org, May 23 2017

Labels: -Needs-Bisect
Owner: derat@chromium.org
Status: Started (was: Available)
Found it!

https://chromium-review.googlesource.com/c/507991/ (intentionally) disabled sudo in verified mode by bind-mounting /bin/false over it. Unfortunately, the ui-post-stop script loops indefinitely until a sudo command returns true:

# Make sure everything is going down. No exceptions.
# The loop is so that clever daemons can't evade the kill by
# racing us and killing us first; we'll just try over and over
# until we win the race, and kill with pid -1 is atomic with
# respect to process creation.
while ! sudo -u chronos kill -9 -- -1 ; do
  sleep .1
done

Revert at https://chromium-review.googlesource.com/c/512705.
Project Member

Comment 29 by bugdroid1@chromium.org, May 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/9e6d52acf77744f78cef5029f761df5088f07453

commit 9e6d52acf77744f78cef5029f761df5088f07453
Author: Dan Erat <derat@chromium.org>
Date: Tue May 23 16:51:49 2017

Revert "init: bind mount /bin/false over /usr/bin/sudo in verified mode"

This reverts commit f65f0ab17ed3d3887cd0eea105beac67f65f7635.

Reason for revert: This makes the ui-post-stop script, which depends on sudo, hang indefinitely:

while ! sudo -u chronos kill -9 -- -1 ; do
  sleep .1
done

BUG= chromium:724937 

Change-Id: I2220d8afc02fff5a0e98aa8dc9fc45b37bd1d038
Reviewed-on: https://chromium-review.googlesource.com/512705
Reviewed-by: Mike Frysinger <vapier@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/9e6d52acf77744f78cef5029f761df5088f07453/init/chromeos_startup

Are we only using sudo there to change to the chronos user and kill all chronos processes? Surely we can replace that.

Comment 31 by derat@chromium.org, May 23 2017

Status: Fixed (was: Started)
Yes, I'm sure we can (and I encourage filing a new bug to track that change, along with the reland of the sudo bind-mount change :-P).

I think that this is the cause of the hang, but it still needs to be verified by the test team. Josafat said that we'll have a new image soon.

Comment 32 by r...@chromium.org, May 23 2017

Cc: akes...@chromium.org keta...@chromium.org zelidrag@chromium.org bhthompson@chromium.org ihf@chromium.org
Thank you Dan for finding this!


@c30: This is a P0 breaking the user experience completely. The first course of action is to revert the offending CL, not to experiment with ways to make the system work with the CL.

@TPMs: Do we not have any bots that run all the tests on verified images? If not, this sounds like a serious issue - we should re-investigate why this is and see if there are ways to fix this.


Finding issues that completely destroy the user experience only on verified mode devices because everyone's devices just break is not an ideal way of finding these issues.

The lab DUTs do run in verified boot (the systems boot off of firmware signed by the generic dev key), but they specifically bypass some settings like this sudo change as a necessity for running tests (e.g. they run test images). 

It is not clear that we can both run automated tests and be in a fully locked down system image at the same time.

For this sort of case, we rely on manual testing.

Comment 34 by derat@chromium.org, May 23 2017

I'd still like to know why all of the logs under /var/log seemed to be getting wiped when I triggered this and rebooted via the sysrq key. This happened even when I gave the device multiple minutes to flush to disk and used the "emergency sync" key. Is this expected behavior? Does anyone have any guesses about why it would happen? I haven't been able to repro it in verified mode with other images.
there's a few things we need to sort out and test independently before we can reland disabling of sudo.  there's a few weird uses of sudo in platform2/ that seems like we should kill off.  they aren't changes we'd want to do under the gun vs reverting my CL.  in this scenario, revert was the right call.

i'll file a new bug to track the various bits.
Blockedon: 725678

Comment 37 by son...@google.com, May 26 2017

Verified on M60 build 9591.0.0
Signout is working fine.

Status: Verified (was: Fixed)

Sign in to add a comment