New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 867970 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 9
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 899556


Show other hotlists

Hotlists containing this issue:
Hotlist-1
My-hotlist-jobsnguyen


Sign in to add a comment

Ensure lock is always shown after suspend if show lock on sleep pref is set

Project Member Reported by jdufault@chromium.org, Jul 26

Issue description

There are some reports that chrome lock screen never comes up. I wonder if there are some race conditions when chrome crashes, ie:

 1 device is going to sleep/requested to show lock screen
 2 chrome sends dbus message to show lock screen, chrome crashes very shortly after
 3 session manager doesn't think we're in a lock state yet, so it restarts chrome as normal
 4 session manager processes lock screen request
 5 chrome starts back up without lock screen
 6 user opens device a few hours later without lock screen

The race is between 2-4

As a mitigating strategy we may want to check if the show lock on sleep pref is set and we're coming out of suspend, then the lock screen should be shown if it is not already. But we should try to see if there is and then and fix the underlying race.
 
Cc: r...@chromium.org

Comment 2 Deleted

Where are the reports? Do the logs indicate that Chrome actually crashed right after requesting that the screen be locked?
Labels: Restrict-View-Google
I don't have any hard reports on-hand, but this specific complaint has been surfacing consistently every once in a while (ie, the latest is https://www.google.com/tools/feedback/reports/85567837443).
Labels: -Restrict-View-Google
[removing RVG; it's fine to link to feedback reports or other internal URLs as long as the URLs themselves don't reveal confidential information]

The proposed mitigation sounds reasonable to me. As an actual fix, Chrome could support starting in a locked state. Issue 820644 is conveniently already assigned to you. :-)
Cc: mnissler@chromium.org
Labels: -M-71 M-70
+mnissler@ if he has any preference on the mitigation discussed below.

rkc@ and I talked a bit offline about this, and we found at least one race condition. I inserted a LOG(FATAL) at [1], which led to lock not being shown after putting the lid down.

The race is as follows:
  1 chrome is sent a suspend event
  2 chrome tells session manager it wants to lock
  3 session manager tells chrome to lock

If a crash happens between 1 and 2, then session manager is never told about the lock state and restarts chrome. The user then has full in-session access after resuming the device.

I see two clear mitigation paths:

1) chrome listens for suspend done events, and if the lock screen is not shown then show it
2) session manager listens to suspend events; if chrome crashes after suspend has started we kick the user out of the session and go back to login

- 1 has a better UX, as the user remains in-session (though they lost all session state due to the crash).
- 1 seems a bit less reliable, ie, what if suspend event is not delivered to Chrome due to chrome starting late? What about two crashes in a row?
- 2 seems a bit more robust - there are fewer moving parts
- 2 has the disadvantage that even if the user does not have lock state pref, they will get kicked out of session. We can probably work around this by telling session manager about the preference value and it can change behavior as needed

My preference is approach 2.

1: https://cs.chromium.org/chromium/src/chromeos/dbus/power_manager_client.cc?l=796&rcl=05e02f2c0db74dd5d08f9331033c9f4b7c0e7212
Labels: OS-Chrome
I suspect that approach 1 has a race of its own. It would probably be possible for the restarted Chrome to only register for SuspendDone after the signal has already been emitted, resulting in it not knowing that the system just resumed and not displaying the lock screen.

Approach 2 also seems more robust to me, but I think that there's still a race there. Chrome could crash as the user is closing the lid (i.e. before suspend) and miss the SuspendImminent signal as a result.

Probably session_manager could CHECK() if Chrome isn't running[1] when SuspendImminent is emitted or if Chrome exits at any point between SuspendImminent and SuspendDone. In other words, approach 2 with the added condition that Chrome needs to be already running at SuspendImminent.

1. Defined here as "Chrome has registered for D-Bus signals from powerd"
(You already pointed out the race in approach 1.) :-P
I need to think about this more, but here is my knee-jerk security perspective before I call it a day:

1. Anything that depends on different components going through a specific sequence of steps to finally bring up the screen lock is likely to be fragile.
2. Purely event-driven mechanisms are fragile since crashes may not leave around sufficient information to recover. You need to be able to tell the correct state the system should be in after a crash at any point just by looking at device state.

In other words, I think we need an approach with a more fundamental guarantee that a crash at any point on the critical path between "user closes lid" and "everything agrees the screen is locked" can't leave the system in a state where we can't tell whether we're locked or not.

To accomplish that, I think we generally need to switch to an approach that doesn't exclusively rely on events to trigger transitions, but give the decision makers a chance to inquire state to correctly recover from crashes. A couple ideas:
1. Make powerd track a sequence of states the device went through, tagged by monotonically increasing IDs. This allows other components to inquire and see whether they have missed a suspend cycle.
2. Centralize the decision whether we're locked or not either in Chrome or session_manager, but not share that responsibility across both of them.

Let's assume we have #1 and #2. Then, whenever Chrome starts up, it can do two things:
1. Examine a persistent flag it keeps whether the last known state was locked or not.
2. Get unprocessed power state transitions from powerd and determine which of them it hasn't acted on yet (it needs to persist the most recent power state it completely processed). If there is anything unprocessed, it can replay these transitions to achieve the correct state.

Full disclaimer: The above is what my brain produces at midnight, so take it with a grain of salt ;-)
Cc: jorgelo@chromium.org
The current powerd/session_manager/Chrome interactions around suspend/resume and screen-locking are already extremely baroque and hard to understand:

https://chromium.googlesource.com/chromiumos/platform2/+/master/login_manager/#screen-locking
https://chromium.googlesource.com/chromiumos/platform2/+/master/power_manager/docs/suspend_resume.md
etc.

I'm hesitant about anything that adds additional complexity.

Maybe we can just do something simple here like making session_manager stop the session if it sees a Chrome crash within X seconds of a suspend/resume cycle. If that results in too many false positives, we should probably work harder to reduce browser crashes.
"The only thing that ever improved security is reducing complexity" - who/thomasdullien

I would agree that we should probably not try to address this problem by adding complexity to the very mechanism that's enforcing (or trying to enforce) the property that we want.

An unexpected login screen on a Chrome crash (because of a stopped session as per #11) is probably better than an unlocked screen.
IIUC, what Jacob is suggesting is - if session_manager sees suspend, sees a crash before seeing a resume, stop the session.

What comment 11 is suggesting is, if session_manager sees suspend, sees a crash within X seconds (whether or not a resume has been seen), stop the session.

I am not sure what the latter approach will buy us though? Is it trying to prevent the case where we have a suspend and a resume and Chrome misses both the events?

Yes, see #8.

I want us to end the session if we see a Chrome crash X seconds before SuspendImminent or Y seconds after SuspendDone.

It might be possible to handle all the corner cases here, but we'll have poor/nonexistent test coverage for it and it's likely to just regress later. I'm tired of investigating reports of systems not locking and want us to err on the side of security.
I completely understand and empathize with wanting to err on the side of security. However, would it not be possible to add specific tests here that test for a various bunch of these scenarios?

I understand it may be hard with the current infrastructure but if we invested some time, we 'should' be able to inject a crash into Chrome at various stages (while artificially delaying the delivery of the SuspendIminent/SuspendDone signals to Chrome).


I am a bit uncomfortable with simply not even trying to get a solution that covers the corner cases (which does seem possible here) and using a hammer to solve the problem instead. I am also not convinced that the hammer will actually solve all the problems. For example, how many seconds should X and Y be? We've seen delays of higher than 16s while locking on slower Chromebooks earlier. 


I am completely fine with going with the hammer for now - as long as we do keep a more elegant(?) solution as a long-term goal.

>  We've seen delays of higher than 16s while locking on slower Chromebooks earlier. 

The delay here should be much, much shorter than that - it is only when chrome is in a transition state after receiving a suspend event but not telling session manager about it (see comment #6, 1-2). The 16s comes from loading the lock screen, which is after this transition period.
Right - my point however is that it was a number far exceeding what we had thought could be possible. Which is why there is always a slight risk with having just a 'x seconds' based approach. There could also be unknown deadlocks which may cause a Chrome to not receive the suspend imminent event, ever- hence making any crash afterwards still bring up an unlocked screen.

I am not particularly familiar with the session_manager and powerd code though. If the consensus is that having a "stop session if crash within x seconds of SuspendImminent" will definitely stop every possible escape from the lockscreen due to this race, then sure, I'm fine with settling on that as the long term solution.

Cc: xiaoyinh@chromium.org
Components: OS>Systems
Labels: -Pri-1 -M-70 M-71 Pri-2
Owner: derat@chromium.org
Status: Started (was: Assigned)
Our integration tests that try to log into Chrome don't consistently pass (issue 888520,  issue 884454 ,  issue 880823 ,  issue 880818 ,  issue 877720 ,  issue 875263 , issue 865813, etc.). I think we'll have far worse reliability problems in tests that log in and then try to trigger Chrome crashes at precise instances while we're simultaneously suspending the system and locking its screen.

I'm looking into making a session_manager change for this.
Blocking: 899556
Labels: -Pri-2 -M-71 ReleaseBlock-Stable M-72 Pri-1
Labels: -ReleaseBlock-Stable
I'm working on this now. No need for an arbitrary RBS. :-P
I mostly put that out of habit. I usually put RBS on bugs that I want to make sure don't slip through the cracks - it helps me personally track them better.

derat@, which solution are working on?

I believe you're adding a check in session_manager that if there is a crash within x seconds of a suspend event (either SuspendImminent or SuspendDone) to restart the session.

Can we make the check be
1. after SuspendImminent but before SuspendDone always restart the session
2. xxx seconds after SuspendDone always restart the session

This will eliminate a potential interval between SuspendImminent and SuspendDone where we do not restart the session.

I think if this is implemented as a separate state machine complexity will remain low, ie, instead of trying to combine both states into a single enum, split them out, eg

  Merged state machine:
    * Idle -> SuspendImminent -> ReceivedChromeLock -> SuspendDone -> ReceivedChromeUnlock -> Idle

  Split state machine:
    * Awake -> SuspendImminent -> SuspendDone -> CrashBrieflyAfterSuspendDoneUntilTimeout -> Awake
    * Unlocked -> ReceivedChromeLock -> ReceivedChromeUnlock -> Unlocked

Then if either state machine wants to restart session manager restarts. Splitting the state machines out avoids a bunch of different transitions and edge cases (ie, SuspendDone arrives at session_manager after ReceivedChromeUnlock).
#23: Your "Can we make the check be" conditions are exactly what I've written and am about to send to mnissler@* for review as soon as I verify them locally. :-)

I'm planning to add some additional logic to end the session if Chrome crashes X seconds *before* SuspendImminent in a later change, but that's trickier due to session_manager's life choices.

* or you, if you'd like!
Ah, perfect, thanks!

The bit before suspend imminent also seems like a nice improvement.
> The bit before suspend imminent also seems like a nice improvement.

I think it's required as part of this. session_manager and Chrome receive SuspendImminent simultaneously, so we need to handle the case where Chrome crashes quickly and session_manager receives SIGCHLD about Chrome before it receives SuspendImminent from powerd.
Project Member

Comment 27 by bugdroid1@chromium.org, Oct 31

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/a02c44455923d917dac48d30bd067a9ac7f5fd41

commit a02c44455923d917dac48d30bd067a9ac7f5fd41
Author: Daniel Erat <derat@chromium.org>
Date: Wed Oct 31 07:54:26 2018

login: Log explanation when choosing not to restart Chrome.

Make session_manager log a human-readable explanation when
it chooses not to restart Chrome (e.g. due to a crash while
the screen is locked).

BUG= chromium:867970 
TEST=manual: sent SIGSEGV to chrome after locking the screen
     and saw "Ending session rather than restarting browser:
     screen is locked." in /var/log/messages

Change-Id: I36ae45f75cc738fd5bda42c4d8bd7424789095ec
Reviewed-on: https://chromium-review.googlesource.com/1308683
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/a02c44455923d917dac48d30bd067a9ac7f5fd41/login_manager/session_manager_interface.h
[modify] https://crrev.com/a02c44455923d917dac48d30bd067a9ac7f5fd41/login_manager/mock_session_manager.h
[modify] https://crrev.com/a02c44455923d917dac48d30bd067a9ac7f5fd41/login_manager/session_manager_impl_test.cc
[modify] https://crrev.com/a02c44455923d917dac48d30bd067a9ac7f5fd41/login_manager/session_manager_service.cc
[modify] https://crrev.com/a02c44455923d917dac48d30bd067a9ac7f5fd41/login_manager/session_manager_impl.cc
[modify] https://crrev.com/a02c44455923d917dac48d30bd067a9ac7f5fd41/login_manager/session_manager_process_test.cc
[modify] https://crrev.com/a02c44455923d917dac48d30bd067a9ac7f5fd41/login_manager/session_manager_impl.h

Project Member

Comment 28 by bugdroid1@chromium.org, Oct 31

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/555de928024b54c421affe7488483ac4cb169556

commit 555de928024b54c421affe7488483ac4cb169556
Author: Daniel Erat <derat@chromium.org>
Date: Wed Oct 31 07:54:27 2018

login: Stop session if Chrome crashes during/after suspend.

Make session_manager end the user's session if a Chrome
crash is observed during suspend (i.e. between powerd's
SuspendImminent and SuspendDone D-Bus signals) or within 5
seconds of SuspendDone being received.

This is intended to reduce the window of time where an
inopportune Chrome crash can result in the screen not being
locked before suspending.

BUG= chromium:867970 
TEST=added unit tests; also manually verified that the
     session ends if i kill chrome soon after resuming or
     when i modify chrome to CHECK(false) in
     ash::PowerEventObserver::SuspendImminent

Change-Id: I2d2b7a60d8e7b4cddeda87afb63a493bac531f9a
Reviewed-on: https://chromium-review.googlesource.com/1308684
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/555de928024b54c421affe7488483ac4cb169556/login_manager/session_manager_impl.cc
[modify] https://crrev.com/555de928024b54c421affe7488483ac4cb169556/login_manager/session_manager_impl_test.cc
[modify] https://crrev.com/555de928024b54c421affe7488483ac4cb169556/login_manager/session_manager_service.cc
[modify] https://crrev.com/555de928024b54c421affe7488483ac4cb169556/login_manager/session_manager_impl.h

Project Member

Comment 29 by bugdroid1@chromium.org, Nov 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/fe80d32ea23e9ba46303c663094422244642cb36

commit fe80d32ea23e9ba46303c663094422244642cb36
Author: Daniel Erat <derat@chromium.org>
Date: Sat Nov 03 04:55:00 2018

login: Stop session for suspend after Chrome crash.

Make session_manager stop the user session when it receives
a SuspendImminent D-Bus signal from powerd within five
seconds of restarting Chrome (typically due to a browser
crash).

This should further reduce the likelihood of a Chrome crash
preventing the screen from being locked before the system
suspends.

BUG= chromium:867970 
TEST=added unit test; also manually verified that session is
     stopped after killing chrome and then running
     powerd_dbus_suspend

Change-Id: I3e75343ab7e556ebe4b6a235729d87efbbbf03cd
Reviewed-on: https://chromium-review.googlesource.com/1312237
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Jacob Dufault <jdufault@chromium.org>

[modify] https://crrev.com/fe80d32ea23e9ba46303c663094422244642cb36/login_manager/mock_process_manager_service.h
[modify] https://crrev.com/fe80d32ea23e9ba46303c663094422244642cb36/login_manager/session_manager_service.h
[modify] https://crrev.com/fe80d32ea23e9ba46303c663094422244642cb36/login_manager/session_manager_impl_test.cc
[modify] https://crrev.com/fe80d32ea23e9ba46303c663094422244642cb36/login_manager/session_manager_service.cc
[modify] https://crrev.com/fe80d32ea23e9ba46303c663094422244642cb36/login_manager/process_manager_service_interface.h
[modify] https://crrev.com/fe80d32ea23e9ba46303c663094422244642cb36/login_manager/session_manager_impl.cc
[modify] https://crrev.com/fe80d32ea23e9ba46303c663094422244642cb36/login_manager/session_manager_impl.h

Cc: dbehr@chromium.org marc...@chromium.org
I think we can mostly close the remaining issue of the unlocked desktop potentially being briefly visible after resume (even though Chrome is already dead and the session ended at that point) if there's some way for session_manager to clear the framebuffer.

Stephane or Dominik, do you have any suggestions there? Untested, but would something like "/sbin/frecon --clear 0x000000" work?
It wouldn't work generally, for example on platforms with AFBC we can't just clear the buffer like this, we also need to clear the metadata.
Status: Fixed (was: Started)
Got it; thanks. Well, let's see how things work with the existing changes -- they should prevent access to the unlocked desktop, at least. If people are still occasionally seeing a glimpse of the desktop on resume after a Chrome crash, we can look into adding more hacks^W carefully-designed changes to try to prevent that from happening.

Sign in to add a comment