Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Issue 221857 RTC interrupts do not cause a suspend abort late in the kernel suspend path
Starred by 5 users Project Member Reported by jwer...@chromium.org, Mar 9, 2013 Back to list
Status: Started
Owner: dbasehore@chromium.org
Cc: snanda@chromium.org, jwer...@chromium.org, kamrik@chromium.org, dbasehore@chromium.org, spang@chromium.org
Components:
OS: Chrome
Pri: 2
Type: Bug


Sign in to add a comment
When the kernel suspends with a primed /sys/power/wakeup_count, it should abort the suspend at any time if it receives an RTC wake alarm. There should be no possible race condition where the wake alarm is lost without either aborting the suspend or resuming right afterwards. This does not work reliably in current ToT builds (seen on Snow and Link for now).

Steps to reproduce:

1. Modify the kernel to include a sufficient delay (e.g. 5 seconds) during the device suspend stage, such as at the end of drivers/base/power/main.c:dpm_suspend().
2. echo `cat /sys/power/wakeup_count ` > /sys/power/wakeup_count ; echo +3 > /sys/class/rtc/rtc0/wakealarm ; powerd_suspend

Expected output:
The kernel aborts the suspend or resumes immediately.

Actual output:
The device suspends and stays suspended.

Hitting the keyboard or touchpad during the delay causes a wakeup as expected, so this seems to be specific to the RTC.
 
[edit] Just to avoid confusion: this specific manual repro case has been tested on Snow and Link for now, but I have already encountered racy wakeup counts on all platforms during suspend stress tests.
Comment 2 by snanda@chromium.org, Mar 9, 2013
Cc: dbasehore@chromium.org
+dbasehore since he has been looking closely at RTC wake path recently.
Project Member Comment 3 by bugdroid1@chromium.org, Mar 10, 2013
Labels: -Area-Power -Team-Kernel Cr-OS-Kernel Cr-OS-Kernel-Power
Comment 4 by arscott@chromium.org, Mar 14, 2013
Labels: M-28
Owner: dbasehore@chromium.org
Status: Assigned
Assigning to derek for 28. ANy thoughts on what the fix is here?
Comment 5 by jwer...@chromium.org, Mar 14, 2013
I would suspect that something in the suspend path puts the RTC or a parent device into a state where it cannot trigger interrupts anymore. I've already narrowed down the critical period to the time between the start and end of the dpm_suspend() loop... so I guess the next step would be to go through it device by device and find the culprit (although that will become more tricky if it's one of the async devices).
I think I found the problem. In cmos_suspend, we make a call to hpet_mask_rtc_irq_bit. What this effectively does is prevent the hpet_rtc_interrupt from calling the cmos_interrupt function. Since we increment the wakeup_count through cmos_interrupt with the call to rtc_update_irq, any rtc interrupt that happens after the call to cmos_suspend but before we actually suspend the CPU will not cancel the system suspend.

This call to hpet_mask_rtc_irq_bit seems to be required to prevent another bug though, so I emailed the author to see if he has any idea how to fix the problem he ran into without introducing this problem.
Status: Started
Have a fix, but we want to wait for upstream comment before submitting it. The fix also breaks an assumption for dark resume, but the fix for that puts dark resume in a better state.
Project Member Comment 9 by bugdroid1@chromium.org, May 8, 2013
Labels: -M-28 MovedFrom-28 M-29
Moving all non essential bugs to the next Milestone.
Project Member Comment 10 by bugdroid1@chromium.org, May 20, 2013
Project: chromiumos/third_party/kernel
Branch : chromeos-3.4
Author : Derek Basehore <dbasehore@chromium.org>
Commit : 33e669e2a8078ed729b2846e3982a19d791d0ac8

Code Review +2: Sameer Nanda
Verified    +1: Derek Basehore
Change-Id     : I67e81995955a6e5dbc9d5fac0b06585c9ec5baf0
Reviewed-at   : https://gerrit.chromium.org/gerrit/47902

CHROMIUM: Check dark resume state earlier

To prevent races with irq handlers for device irqs, check if the resume is a
dark resume before we enable device irqs in dpm_resume_noirq.

BUG=chromium:221857
TEST=power_DarkResumeShutdownServer (has to be run in lab)

Signed-off-by: Derek Basehore <dbasehore@chromium.org>

M  drivers/base/power/main.c
Project Member Comment 11 by bugdroid1@chromium.org, May 20, 2013
Project: chromiumos/third_party/kernel
Branch : chromeos-3.4
Author : Derek Basehore <dbasehore@chromium.org>
Commit : 46f8896a26b75b90ac6be362ee15cb37d323085e

Code Review +2: Derek Basehore
Verified    +1: Derek Basehore
Change-Id     : I7969efd864b2d79802072e8167f764e680e3d2d3
Reviewed-at   : https://gerrit.chromium.org/gerrit/48836

UPSTREAM: Don't disable hpet emulation on suspend

(This syncs up our early commit with the upstream version of the same).

There's a bug where rtc alarms are ignored after the rtc cmos suspends but
before the system finishes suspend. Since hpet emulation is disabled and it
still handles the interrupts, a wake event is never registered which is done
from the rtc layer. This reverts an earlier commit which disables hpet
emulation. To fix the problem mentioned in that commit, the hpet_rtc_timer_init
function is called directly on resume.

BUG=chromium:221857
TEST=suspend_stress_test and inserting delays in suspend path with an rtc alarm

Signed-off-by: Derek Basehore <dbasehore@chromium.org>

M  drivers/rtc/rtc-cmos.c
Fixed for x86. Things are a little more complicated on arm. The maxim 77686 uses a nested irq for its wakealarm irq, so its irq handler runs on a thread that is frozen during the freeze tasks step in suspend. After this step, the irq can't be run until after we resume.

We could move these short duration wakealarms to rtc1 though which is the SoC rtc. That rtc's irq handler is only run from an irq context. I also verified that this does not have this bug.
Comment 13 by tbroch@chromium.org, May 27, 2013
FYI, there is no longer an rtc1 on spring and if you were to revert:

ToT: https://gerrit.chromium.org/gerrit/51348
R28:  https://gerrit.chromium.org/gerrit/51468

One would need to examine the driver for the rtc's as:

snow
----
rtc0 == max77686-rtc
rtc1 == s3c-rtc

spring
-----
rtc0 == s3c-rtc
rtc1 == s5m-rtc

Comment 14 by jwer...@chromium.org, May 28, 2013
Yay... and here I was hoping something would be easy for once... -.-

So seems like what we'd want to make this simple is to make the first RTC the one for time-keeping and the second/last the one for testing. From the examples we have it seems that RTC enumeration order is in lexical order of the driver name... is this true? Is there an easy way we could change that to suit our purposes (Derek, do you know where to look)?
Project Member Comment 15 by bugdroid1@chromium.org, Jun 7, 2013
Project: chromiumos/third_party/kernel-next
Branch : chromeos-3.8
Author : Derek Basehore <dbasehore@chromium.org>
Commit : 9ff271afd80841ddcb0f69912eb73d7ebc9be7de

Code Review  +2: Derek Basehore
Verified     +1: Derek Basehore
Commit Ready   : Derek Basehore
Change-Id      : I67e81995955a6e5dbc9d5fac0b06585c9ec5baf0
Reviewed-at    : https://gerrit.chromium.org/gerrit/57405

CHROMIUM: Check dark resume state earlier

To prevent races with irq handlers for device irqs, check if the resume is a
dark resume before we enable device irqs in dpm_resume_noirq.

BUG=chromium:221857
TEST=power_DarkResumeShutdownServer (has to be run in lab)

Signed-off-by: Derek Basehore <dbasehore@chromium.org>

M  drivers/base/power/main.c
Project Member Comment 16 by bugdroid1@chromium.org, Jun 7, 2013
Project: chromiumos/third_party/kernel-next
Branch : chromeos-3.8
Author : Derek Basehore <dbasehore@chromium.org>
Commit : e48df0ddc8aa3031bdbcd2c38afa7c3e05ed2b21

Code Review  +2: Derek Basehore
Verified     +1: Derek Basehore
Commit Ready   : Derek Basehore
Change-Id      : I7969efd864b2d79802072e8167f764e680e3d2d3
Reviewed-at    : https://gerrit.chromium.org/gerrit/57406

UPSTREAM: Don't disable hpet emulation on suspend

(This syncs up our early commit with the upstream version of the same).

There's a bug where rtc alarms are ignored after the rtc cmos suspends but
before the system finishes suspend. Since hpet emulation is disabled and it
still handles the interrupts, a wake event is never registered which is done
from the rtc layer. This reverts an earlier commit which disables hpet
emulation. To fix the problem mentioned in that commit, the hpet_rtc_timer_init
function is called directly on resume.

BUG=chromium:221857
TEST=suspend_stress_test and inserting delays in suspend path with an rtc alarm

Signed-off-by: Derek Basehore <dbasehore@chromium.org>

M  drivers/rtc/rtc-cmos.c
Project Member Comment 17 by bugdroid1@chromium.org, Jun 11, 2013
Project: chromiumos/third_party/autotest
Branch : master
Author : Julius Werner <jwerner@chromium.org>
Commit : cc471f3773523a5f6a3c3a2e17be649dab24e88f

Code Review +2: Daniel Erat
Verified    +1: Julius Werner
Change-Id     : Ic3ce108969a04f87b7bee179757483274f98b56c
Reviewed-at   : https://gerrit.chromium.org/gerrit/56667

Use last (not first) RTC in the system by default

Most of our systems only have one RTC, but Exynos5 based Chromebooks
usually have two: an external rtc0 on the PMIC and an internal one on
the SoC. While only the first one keeps an accurate time across S0, the
only thing our tests really use them for is wake alarms, which work on
both. The external RTC has a problem delivering interrupts in very late
stages of suspend, however, so let's just default to the internal one.

CQ-DEPEND=CL:56666
BUG=chromium:221857
TEST=power_Resume

Signed-off-by: Julius Werner <jwerner@chromium.org>

M  client/cros/rtc.py
Project Member Comment 18 by bugdroid1@chromium.org, Jun 11, 2013
Project: chromiumos/platform/power_manager
Branch : master
Author : Julius Werner <jwerner@chromium.org>
Commit : 46309cd0f510d0eb682fa8c949f6770c514dc010

Code Review +2: Julius Werner
Verified    +1: Julius Werner
Change-Id     : I5cddbbb36627114bd8d20ce908b5efa45382971d
Reviewed-at   : https://gerrit.chromium.org/gerrit/56666

powerd_suspend: Keep RTC used for tests in sync with autotest code

We are switching Autotest to use rtc1 for suspend tests on Exynos5
devices, so we need to do the same in powerd_suspend. See CL:56667

CQ-DEPEND=CL:56667
BUG=chromium:221857
TEST=power_Resume

Signed-off-by: Julius Werner <jwerner@chromium.org>

M  scripts/powerd_suspend
Are there any outstanding issues with this bug? It's my understanding that with the last CLs, everything 'should' be fixed.
Comment 20 by jwer...@chromium.org, Jun 11, 2013
Let's go with that assumption for now and wait to see if we get any more stuck-in-suspend cases.

Mark, could you turn power_UiResume back on now that it should fail more gracefully in the lab?
Comment 21 by kamrik@chromium.org, Jun 11, 2013
Ok, I'll check it. I suspect that something else will have to be changed there. My biggest problem with power_UiResume was that it could finish with GOOD without even attempting to suspend. Your new log file handling should help to deal with this.
Comment 22 by kamrik@chromium.org, Jun 11, 2013
As far as I understand, the way it works now, if the system never tried to suspend but the last_resume_timings file exists from some previous suspend test, power_UiResume can still finish GOOD without ever trying to suspend.
One simple way to overcome this is to remove the last_resume_timings file in Suspender._reset_logs(). Does this sound as a reasonable thing to do?

timings file is:
_TIMINGS_FILE = '/var/run/power_manager/root/last_resume_timings'
Comment 23 by jwer...@chromium.org, Jun 11, 2013
powerd_suspend already unconditionally removes that file on suspend, there should be no case where we could pick up timings from a previous run.

Also, power_suspend.py now waits until it sees the "Resume finished" message from powerd_suspend in the syslog (and uses open file descriptors to ensure that message appeared after we started to suspend), so it should reliably detect when we don't try to suspend at all.
Project Member Comment 24 by bugdroid1@chromium.org, Jun 12, 2013
Project: chromiumos/third_party/kernel
Branch : chromeos-3.4
Author : Todd Broch <tbroch@chromium.org>
Commit : 7e63d9001a2ad36b32bc78788221b204ba0604a2

Code Review +2: Derek Basehore
Verified    +1: Todd Broch
Change-Id     : I2dcf0c2fa660fbdc887011a59948a675e658d82d
Reviewed-at   : https://gerrit.chromium.org/gerrit/57875

CHROMIUM: config: Make s3c-rtc a module.

By making s3c-rtc a module and the primary RTC (from Exynos PMICs) for
exynos based designs built-in to the kernel it should guarantee the
primary RTC is /dev/rtc0 which is battery backed so it will retain
time in S5.  This device is then used by hctosys in the kernel as well
as the tlsdated daemon in userspace.  The s3c-rtc is used for autotest
purposes as it doesn't have the same problems delivering interrupts
that the PMIC based RTCs do.

Signed-off-by: Todd Broch <tbroch@chromium.org>

CQ-DEPEND=CL:56667
BUG=chromium:221857
TEST=manual
On daisy & spring:
     $ ls lib/modules/3.4.0/kernel/drivers/rtc/rtc-s3c.ko
     lib/modules/3.4.0/kernel/drivers/rtc/rtc-s3c.ko

     # for daisy
     $ basename $(readlink /sys/class/rtc/rtc0/device/driver)
     rtc-max77686

     # for spring
     $ basename $(readlink /sys/class/rtc/rtc0/device/driver)
     s5m-rtc

     $ basename $(readlink /sys/class/rtc/rtc1/device/driver)
     s3c-rtc

     $ echo "+10s" > /sys/class/rtc/rtc1/wakealarm && echo mem > /sys/power/state

     <system wakes in ~10sec>

Commit-Queue: Todd Broch <tbroch@chromium.org>

M  chromeos/config/armel/chromeos-exynos5.flavour.config
Project Member Comment 25 by bugdroid1@chromium.org, Jun 12, 2013
Project: chromiumos/third_party/kernel
Branch : chromeos-3.4
Author : Todd Broch <tbroch@chromium.org>
Commit : 92746b5f7123b55f96aa82701cf740bec71b0f8d

Code Review +2: Derek Basehore
Verified    +1: Todd Broch
Change-Id     : I1411e6a664439e515d22c33b52f93aa9b1e767d6
Reviewed-at   : https://gerrit.chromium.org/gerrit/57876

Revert "CHROMIUM: dts: spring: Disable Exynos RTC."

This reverts commit 77d9e4631b9ea711f7a4dcc75a522d303b54fcf5.

Signed-off-by: Todd Broch <tbroch@chromium.org>

BUG=chromium:221857
TEST=manual
On daisy & spring:
     $ ls lib/modules/3.4.0/kernel/drivers/rtc/rtc-s3c.ko
     lib/modules/3.4.0/kernel/drivers/rtc/rtc-s3c.ko

     # for daisy
     $ basename $(readlink /sys/class/rtc/rtc0/device/driver)
     rtc-max77686

     # for spring
     $ basename $(readlink /sys/class/rtc/rtc0/device/driver)
     s5m-rtc

     $ basename $(readlink /sys/class/rtc/rtc1/device/driver)
     s3c-rtc

     $ echo "+10s" > /sys/class/rtc/rtc1/wakealarm && echo mem > /sys/power/state

     <system wakes in ~10sec>

Commit-Queue: Todd Broch <tbroch@chromium.org>

M  arch/arm/boot/dts/exynos5250-spring.dts
Comment 26 by jwer...@chromium.org, Jun 14, 2013
Fresh from a BVT daisy (https://storage.cloud.google.com/?arg=chromeos-autotest-results/3369811-chromeos-test/):

INFO	----	----	kernel=3.4.0	localtime=Jun 14 12:57:07	timestamp=1371239827	
START	----	----	timestamp=1371239831	localtime=Jun 14 12:57:11	
	START	power_Resume	power_Resume	timestamp=1371239831	localtime=Jun 14 12:57:11	
		ERROR	power_Resume	power_Resume	timestamp=1371239843	localtime=Jun 14 12:57:23	Broken RTC timestamp: hwclock from util-linux 2.21.2 Using /dev interface to clock. Assuming hardware clock is kept in UTC time. Waiting for clock tick... ...got clock tick Time read from Hardware Clock: 2044/11/16 23:38:00 Invalid values in hardware clock: 2044/11/16 23:38:00
	END ERROR	power_Resume	power_Resume	timestamp=1371239843	localtime=Jun 14 12:57:23	
END GOOD	----	----	timestamp=1371239856	localtime=Jun 14 12:57:36	

Soo... I guess this is a problem. I think we should just set the rtc1 to the current system time before suspending in the powerd_suspend testing path? I can write a patch for that if no one has objections.
Comment 27 by quiche@chromium.org, Jun 14, 2013
@jwerner, should we dupe bug 244436 into this? (it's a different board, but has the same "Broken RTC timestamp" symptom.)
Comment 28 by jwer...@chromium.org, Jun 14, 2013
Setting the second RTC before suspend caused other problems (it can look like a spurious wakealarm that causes the kernel to abort suspend), so Derek and I decided to revert the whole thing... it looks like it's not worth it anymore. Derek will look into making the PMIC RTC's wake interrupt more reliable again.

By the way, I've noticed another funny thing: Changing the RTC actually increased our measured resume time on Snow by a noticeable 0.1s! Looking at this graph, you can clearly spot the point where my CL went in (and you'll probably going to be able to spot the revert in a few days): http://cautotest.corp.google.com/results/dashboard/eagle_eye/graphs/data/r29/daisy/power_Resume__suspend_resume_time/report.html?history=100&header=daisy&rev=-1

Due to the way we measure resume time in the autotest, seconds_system_resume_firmware is the difference that is left when measuring the whole resume time through the RTC and subtracting the kernel's self-reported time. So either the time reported by hwclock is consistently off by 0.1s on one of the RTCs, or the Exynos RTC takes longer to process a wakealarm for some weird reason...
Project Member Comment 29 by bugdroid1@chromium.org, Jun 14, 2013
Project: chromiumos/third_party/autotest
Branch : master
Author : Julius Werner <jwerner@chromium.org>
Commit : 01b7005abb11dc3c0e80f1aaeebe2759fd65cf13

Code Review +2: Daniel Erat
Verified    +1: Julius Werner
Change-Id     : I4810f513aadb9815c737d19d12269ecffb704cef
Reviewed-at   : https://gerrit.chromium.org/gerrit/58710

Revert "Use last (not first) RTC in the system by default"

This reverts commit cc471f3773523a5f6a3c3a2e17be649dab24e88f.

This change was bad... see CL:58709

CQ-DEPEND=CL:58709
BUG=chromium:221857
TEST=power_Resume on daisy

Signed-off-by: Julius Werner <jwerner@chromium.org>

M  client/cros/rtc.py
Project Member Comment 30 by bugdroid1@chromium.org, Jun 14, 2013
Project: chromiumos/platform/power_manager
Branch : master
Author : Julius Werner <jwerner@chromium.org>
Commit : fa0e2b324a9c8a178681257fba45dedd77a0ebbb

Code Review +2: Daniel Erat, Derek Basehore
Verified    +1: Julius Werner
Change-Id     : I2527190fe7b4be9852d49329386b1e19a8e955e5
Reviewed-at   : https://gerrit.chromium.org/gerrit/58709

Revert "powerd_suspend: Keep RTC used for tests in sync with autotest code"

This reverts commit 46309cd0f510d0eb682fa8c949f6770c514dc010.

This ends up causing more problems than it solves. The
second RTC is just too screwed up to get working reliably,
we should rather not use it at all.

CQ-DEPEND=CL:58710
BUG=chromium:221857
TEST=power_Resume on daisy

Signed-off-by: Julius Werner <jwerner@chromium.org>

M  scripts/powerd_suspend
Project Member Comment 31 by bugdroid1@chromium.org, Jun 18, 2013
Project: chromiumos/third_party/kernel-next
Branch : chromeos-3.8
Author : Todd Broch <tbroch@chromium.org>
Commit : f451e4b29243198b8dceccb5ec376e2c39ce2076

Code Review +2: Sonny Rao
Verified    +1: Sonny Rao
Change-Id     : I2dcf0c2fa660fbdc887011a59948a675e658d82d
Reviewed-at   : https://gerrit.chromium.org/gerrit/58861

CHROMIUM: config: Make s3c-rtc a module.

By making s3c-rtc a module and the primary RTC (from Exynos PMICs) for
exynos based designs built-in to the kernel it should guarantee the
primary RTC is /dev/rtc0 which is battery backed so it will retain
time in S5.  This device is then used by hctosys in the kernel as well
as the tlsdated daemon in userspace.  The s3c-rtc is used for autotest
purposes as it doesn't have the same problems delivering interrupts
that the PMIC based RTCs do.

Signed-off-by: Todd Broch <tbroch@chromium.org>

CQ-DEPEND=CL:56667
BUG=chromium:221857
TEST=manual
On daisy & spring:
     $ ls lib/modules/3.4.0/kernel/drivers/rtc/rtc-s3c.ko
     lib/modules/3.4.0/kernel/drivers/rtc/rtc-s3c.ko

     # for daisy
     $ basename $(readlink /sys/class/rtc/rtc0/device/driver)
     rtc-max77686

     # for spring
     $ basename $(readlink /sys/class/rtc/rtc0/device/driver)
     s5m-rtc

     $ basename $(readlink /sys/class/rtc/rtc1/device/driver)
     s3c-rtc

     $ echo "+10s" > /sys/class/rtc/rtc1/wakealarm && echo mem > /sys/power/state

     <system wakes in ~10sec>

[sonnyrao: 3.8: fixed up conflicts]
Commit-Queue: Todd Broch <tbroch@chromium.org>
Commit-Queue: Sonny Rao <sonnyrao@chromium.org>

M  chromeos/config/armel/chromeos-exynos5.flavour.config
Comment 32 by benhenry@google.com, Jun 26, 2013
Labels: -Pri-1 -M-29 MovedFrom-29 M-30 Pri-2
This issue is Pri-1 but has already been moved once, therefore lowering to Pri-2 and moving to next milesone.
Comment 33 by kareng@google.com, Aug 19, 2013
Labels: -M-30 MovedFrom-30
This issue has already been moved once and is lower than Priority 1,therefore removing mstone.
Comment 34 by dbasehore@google.com, Nov 22, 2013
This is still broken on arm systems. We need to figure out how to let the pmic rtc cancel a suspend if the alarm goes off during that period.
Comment 35 by lafo...@chromium.org, Sep 23, 2015
Labels: Hotlist-Recharge Hotlist-Recharge-Stale
This issue likely requires triage.  The current issue owner maybe inactive (i.e. hasn't fixed an issue in the last 30 days).  It has also not been modified in a year (prior to this update).  Thanks for helping out!

-Anthony
Sign in to add a comment