powerd should treat very short suspends after which the lid is still closed as suspend failures |
|||
Issue descriptionWhen we close the lid, powerd tries to suspend for 10 times. If the suspend repeatedly fails due to some kernel-level issue, it just shuts down instead to avoid wasting battery on something that obviously doesn't seem to get better on its own. This is a good philosophy, but unfortunately it is currently not applied to all possible failure cases. In http://crosbug.com/p/54061, we have encountered a dogfood device that for some as-of-yet-unexplained reason always immediately resumed from S3 after a (from the kernel's point of view, at least) "successful" suspend. If the lid remains closed, this leads to the following kind of endless loop: [0619/232012:INFO:suspender.cc(466)] Starting suspend [0619/232012:INFO:util.cc(48)] Running "/usr/bin/powerd_setuid_helper --action=mosys_eventlog --mosys_eventlog_code=0xa7" [0619/232012:INFO:util.cc(48)] Running "/usr/bin/powerd_setuid_helper --action=suspend --suspend_wakeup_count_valid --suspend_wakeup_count=148" [0619/232015:INFO:daemon.cc(714)] powerd_suspend returned 0 [0619/232015:INFO:util.cc(36)] Launching "/usr/bin/powerd_setuid_helper --action=mosys_eventlog --mosys_eventlog_code=0xa8" [0619/232015:INFO:suspender.cc(424)] Finishing request 104595507 successfully [0619/232015:INFO:state_controller.cc(351)] Lid still closed after resuming from lid-close-triggered suspend; repeating lid-closed action [0619/232015:INFO:state_controller.cc(889)] Ready to perform lid-closed action (suspend) [0619/232015:INFO:internal_backlight_controller.cc(676)] Setting brightness to 6 (36.36%) over 0 ms [0619/232015:INFO:suspender.cc(397)] Starting request 104595508 [0619/232015:INFO:daemon.cc(613)] Reading wakeup count from /sys/power/wakeup_count [0619/232015:INFO:daemon.cc(617)] Read wakeup count 0xf7c00448 [0619/232015:INFO:internal_backlight_controller.cc(676)] Setting brightness to 0 (0%) over 0 ms [0619/232015:INFO:internal_backlight_controller.cc(692)] Setting resume brightness to 6 (36.36%) [0619/232015:INFO:suspend_delay_controller.cc(139)] Announcing suspend request 104595508 with 2 pending delay(s) and 0 outstanding delay(s) from previous request [0619/232015:INFO:dark_resume.cc(161)] Not in dark resume with battery at 58.3604% and line power off [0619/232015:INFO:dark_resume.cc(170)] Updated shutdown threshold to 58.3604% [0619/232015:INFO:daemon.cc(1412)] Chrome is using normal display mode [0619/232015:INFO:suspend_delay_controller.cc(88)] Got notification that delay 104595458 (shill) is ready for suspend request 104595508 from :1.10 [0619/232015:INFO:daemon.cc(806)] On battery at 58%, 2.691/4.611Ah at 0.258A, 0s until empty (calculating) [0619/232015:INFO:daemon.cc(1412)] Chrome is using normal display mode [0619/232015:INFO:suspend_delay_controller.cc(88)] Got notification that delay 104595459 (chrome) is ready for suspend request 104595508 from :1.39 [0619/232015:INFO:suspend_delay_controller.cc(224)] Notifying observers that suspend is ready [0619/232015:INFO:suspender.cc(466)] Starting suspend I think this situation is essentially equivalent to a failed suspend (the "failure" is just at the hardware level instead of a kernel driver), and we should apply the same mitigation strategy. I think we should try to add some logic on resume to treat a suspend as failed if its duration was below half a minute and the lid is still closed on resume.
,
Jun 29 2016
Oh, I absolutely agree that this is a bug that "shouldn't happen" and needs to be investigated. But of course software isn't always perfect, especially when it comes to all the weird edge cases possible with resume signalling. Note that this isn't a regular occurrence on this board that I'm trying to paper over... we just have this one report of this one dogfood device that suspended and resumed in a tight loop for days (and we don't yet know why). I'm just saying that I think this is exactly the same thing as the "suspend failures" that we already handle. Those shouldn't happen either, especially not 10 times in a row. Yet they sometimes do in the field, and then it's nice to not burn down a user's battery and fill their SSD to the brim with logs just because they ran into some freak driver bug that we've never seen before. I think the differentiation should be a pretty safe bet here, since there's generally nothing that should give a Chromebook a (non-dark-resume) wakeup while the lid is closed. People probably won't open/close the lid in quick succession 10 times in a row.
,
Sep 2 2016
Sorry for the incredibly slow reply here! Is this still popping up? I'm willing to try to handle it more gracefully in powerd if we think it's actually likely to help users, but the code will probably be a bit on the ugly/hacky side so I'd prefer to avoid adding it otherwise.
,
Sep 20 2017
I think I should look into doing this per issue 767053. Shutting down will be a poor user experience, but still better than heating up and draining the battery.
,
Sep 21 2017
,
Nov 22 2017
,
Nov 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/154a5231cf7749a253e7922ed31ef746cc259512 commit 154a5231cf7749a253e7922ed31ef746cc259512 Author: Daniel Erat <derat@chromium.org> Date: Tue Nov 28 21:34:28 2017 power: Clean up time usage in suspend tests. Update unit tests for powerd's policy::Suspender class to be a bit clearer with regard to time. Add support for advancing the clock for each suspend attempt instead of requiring tests to register a callback that does this. Also (hopefully) improve confusing dark resume wake data testing. No functional changes. BUG=chromium:624203 TEST=tests pass Change-Id: I82fd2258f8732e9b07cbce7edb4a7235ed0c2ec3 Reviewed-on: https://chromium-review.googlesource.com/789804 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/common/clock.h [modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/powerd/policy/suspender.cc [modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/powerd/policy/suspender.h [modify] https://crrev.com/154a5231cf7749a253e7922ed31ef746cc259512/power_manager/powerd/policy/suspender_unittest.cc
,
Dec 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/87ef87790a2a24172e997d48ae548b41b5af18e9 commit 87ef87790a2a24172e997d48ae548b41b5af18e9 Author: Daniel Erat <derat@chromium.org> Date: Sat Dec 02 01:57:56 2017 power: Refactor Suspender::Suspend. Break up powerd's policy::Suspender class's Suspend() method into smaller pieces. Also update lockfile paths in docs/suspend_resume.md. BUG=chromium:624203,chromium:790898 TEST=existing unit tests pass; also verified that normal suspend/resume works as before Change-Id: Ib9ef3204041c850e96301497d9e65f1d7ff76eda Reviewed-on: https://chromium-review.googlesource.com/798640 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Chirantan Ekbote <chirantan@chromium.org> Reviewed-by: Ravi Chandra Sadineni <ravisadineni@chromium.org> [modify] https://crrev.com/87ef87790a2a24172e997d48ae548b41b5af18e9/power_manager/powerd/policy/suspender.cc [modify] https://crrev.com/87ef87790a2a24172e997d48ae548b41b5af18e9/power_manager/docs/suspend_resume.md [modify] https://crrev.com/87ef87790a2a24172e997d48ae548b41b5af18e9/power_manager/powerd/policy/suspender.h |
|||
►
Sign in to add a comment |
|||
Comment 1 by derat@chromium.org
, Jun 29 2016