New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 624203 link

Starred by 3 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Feature



Sign in to add a comment

powerd should treat very short suspends after which the lid is still closed as suspend failures

Project Member Reported by jwer...@chromium.org, Jun 29 2016

Issue description

When we close the lid, powerd tries to suspend for 10 times. If the suspend repeatedly fails due to some kernel-level issue, it just shuts down instead to avoid wasting battery on something that obviously doesn't seem to get better on its own.

This is a good philosophy, but unfortunately it is currently not applied to all possible failure cases. In http://crosbug.com/p/54061, we have encountered a dogfood device that for some as-of-yet-unexplained reason always immediately resumed from S3 after a (from the kernel's point of view, at least) "successful" suspend. If the lid remains closed, this leads to the following kind of endless loop:

[0619/232012:INFO:suspender.cc(466)] Starting suspend
[0619/232012:INFO:util.cc(48)] Running "/usr/bin/powerd_setuid_helper --action=mosys_eventlog --mosys_eventlog_code=0xa7"
[0619/232012:INFO:util.cc(48)] Running "/usr/bin/powerd_setuid_helper --action=suspend  --suspend_wakeup_count_valid --suspend_wakeup_count=148"
[0619/232015:INFO:daemon.cc(714)] powerd_suspend returned 0
[0619/232015:INFO:util.cc(36)] Launching "/usr/bin/powerd_setuid_helper --action=mosys_eventlog --mosys_eventlog_code=0xa8"
[0619/232015:INFO:suspender.cc(424)] Finishing request 104595507 successfully
[0619/232015:INFO:state_controller.cc(351)] Lid still closed after resuming from lid-close-triggered suspend; repeating lid-closed action
[0619/232015:INFO:state_controller.cc(889)] Ready to perform lid-closed action (suspend)
[0619/232015:INFO:internal_backlight_controller.cc(676)] Setting brightness to 6 (36.36%) over 0 ms
[0619/232015:INFO:suspender.cc(397)] Starting request 104595508
[0619/232015:INFO:daemon.cc(613)] Reading wakeup count from /sys/power/wakeup_count
[0619/232015:INFO:daemon.cc(617)] Read wakeup count 0xf7c00448
[0619/232015:INFO:internal_backlight_controller.cc(676)] Setting brightness to 0 (0%) over 0 ms
[0619/232015:INFO:internal_backlight_controller.cc(692)] Setting resume brightness to 6 (36.36%)
[0619/232015:INFO:suspend_delay_controller.cc(139)] Announcing suspend request 104595508 with 2 pending delay(s) and 0 outstanding delay(s) from previous request
[0619/232015:INFO:dark_resume.cc(161)] Not in dark resume with battery at 58.3604% and line power off
[0619/232015:INFO:dark_resume.cc(170)] Updated shutdown threshold to 58.3604%
[0619/232015:INFO:daemon.cc(1412)] Chrome is using normal display mode
[0619/232015:INFO:suspend_delay_controller.cc(88)] Got notification that delay 104595458 (shill) is ready for suspend request 104595508 from :1.10
[0619/232015:INFO:daemon.cc(806)] On battery at 58%, 2.691/4.611Ah at 0.258A, 0s until empty (calculating)
[0619/232015:INFO:daemon.cc(1412)] Chrome is using normal display mode
[0619/232015:INFO:suspend_delay_controller.cc(88)] Got notification that delay 104595459 (chrome) is ready for suspend request 104595508 from :1.39
[0619/232015:INFO:suspend_delay_controller.cc(224)] Notifying observers that suspend is ready
[0619/232015:INFO:suspender.cc(466)] Starting suspend

I think this situation is essentially equivalent to a failed suspend (the "failure" is just at the hardware level instead of a kernel driver), and we should apply the same mitigation strategy. I think we should try to add some logic on resume to treat a suspend as failed if its duration was below half a minute and the lid is still closed on resume.
 

Comment 1 by derat@chromium.org, Jun 29 2016

I am a broken record, but I strongly feel like issues like this should be fixed at as low a level as possible instead of worked around by adding more complexity at a higher layer. :-P

It sounds from the other bug like the cause of the suspend "failures" isn't fully understood, right? Is there a possibility that the failure can be detected in the kernel or below?

I'd rather not have powerd second-guess what the kernel tells it, due to the risk of false positives in the future (perhaps on hardware that isn't even susceptible to the issue that we'd be working around here).
Oh, I absolutely agree that this is a bug that "shouldn't happen" and needs to be investigated. But of course software isn't always perfect, especially when it comes to all the weird edge cases possible with resume signalling. Note that this isn't a regular occurrence on this board that I'm trying to paper over... we just have this one report of this one dogfood device that suspended and resumed in a tight loop for days (and we don't yet know why).

I'm just saying that I think this is exactly the same thing as the "suspend failures" that we already handle. Those shouldn't happen either, especially not 10 times in a row. Yet they sometimes do in the field, and then it's nice to not burn down a user's battery and fill their SSD to the brim with logs just because they ran into some freak driver bug that we've never seen before.

I think the differentiation should be a pretty safe bet here, since there's generally nothing that should give a Chromebook a (non-dark-resume) wakeup while the lid is closed. People probably won't open/close the lid in quick succession 10 times in a row.

Comment 3 by derat@chromium.org, Sep 2 2016

Sorry for the incredibly slow reply here!

Is this still popping up? I'm willing to try to handle it more gracefully in powerd if we think it's actually likely to help users, but the code will probably be a bit on the ugly/hacky side so I'd prefer to avoid adding it otherwise.

Comment 4 by derat@chromium.org, Sep 20 2017

Components: -OS>Systems OS>Kernel>Power
Owner: derat@chromium.org
Status: Assigned (was: Untriaged)
I think I should look into doing this per issue 767053. Shutting down will be a poor user experience, but still better than heating up and draining the battery.

Comment 5 by derat@chromium.org, Sep 21 2017

Status: Started (was: Assigned)

Comment 6 by derat@chromium.org, Nov 22 2017

Labels: OS-Chrome
Uploaded https://crrev.com/c/784050 for review.
Project Member

Comment 7 by bugdroid1@chromium.org, Nov 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/154a5231cf7749a253e7922ed31ef746cc259512

commit 154a5231cf7749a253e7922ed31ef746cc259512
Author: Daniel Erat <derat@chromium.org>
Date: Tue Nov 28 21:34:28 2017

power: Clean up time usage in suspend tests.

Update unit tests for powerd's policy::Suspender class to be
a bit clearer with regard to time. Add support for advancing
the clock for each suspend attempt instead of requiring
tests to register a callback that does this. Also
(hopefully) improve confusing dark resume wake data testing.

No functional changes.

BUG=chromium:624203
TEST=tests pass

Change-Id: I82fd2258f8732e9b07cbce7edb4a7235ed0c2ec3
Reviewed-on: https://chromium-review.googlesource.com/789804
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/common/clock.h
[modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/powerd/policy/suspender.cc
[modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/powerd/policy/suspender.h
[modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/powerd/policy/suspender_unittest.cc

Project Member

Comment 8 by bugdroid1@chromium.org, Dec 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/87ef87790a2a24172e997d48ae548b41b5af18e9

commit 87ef87790a2a24172e997d48ae548b41b5af18e9
Author: Daniel Erat <derat@chromium.org>
Date: Sat Dec 02 01:57:56 2017

power: Refactor Suspender::Suspend.

Break up powerd's policy::Suspender class's Suspend() method
into smaller pieces.

Also update lockfile paths in docs/suspend_resume.md.

BUG=chromium:624203,chromium:790898
TEST=existing unit tests pass; also verified that normal
     suspend/resume works as before

Change-Id: Ib9ef3204041c850e96301497d9e65f1d7ff76eda
Reviewed-on: https://chromium-review.googlesource.com/798640
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
Reviewed-by: Ravi Chandra Sadineni <ravisadineni@chromium.org>

[modify] https://crrev.com/87ef87790a2a24172e997d48ae548b41b5af18e9/power_manager/powerd/policy/suspender.cc
[modify] https://crrev.com/87ef87790a2a24172e997d48ae548b41b5af18e9/power_manager/docs/suspend_resume.md
[modify] https://crrev.com/87ef87790a2a24172e997d48ae548b41b5af18e9/power_manager/powerd/policy/suspender.h

Sign in to add a comment