power_SuspendStress flake ("Spurious wake in S3"): hwclock failed
Issue description
TL;DR: hwclock command failures produce non-intuitive false negatives in power_SuspendStress.
---
powerd_suspend tries to log the hwclock time when asked to (i.e., when ${TIMESTAMP_FILE} exists):
error=$( (printf '%s' "${action}" | cat > /sys/power/state) 2>&1)
if [ $? -eq 0 ]; then
  # On resume:
  result="${RESULT_SUCCESS}"
  if [ -f "${TIMESTAMP_FILE}" ]; then
    # Record the hwclock time at resume for the power_Resume test.
    if TIME="$(/sbin/hwclock -r --utc 2>&1)"; then
      echo "${TIME}" > "${TIMESTAMP_FILE}"
      log_msg "Recorded hwclock time for test"
    else
      log_msg "/sbin/hwclock failed: ${TIME}"
      log_msg "debug: $(/sbin/hwclock -r --debug --utc 2>&1)"
    fi
  fi
And power_SuspendStress -> power_suspend._hwclock_ts() tries to use this to detect a "spurious wakeup" as follows:
for retry in xrange(retries + 1):
    early_wakeup = False
    if os.path.exists(self.HWCLOCK_FILE):
        # TODO(crbug.com/733773): Still fragile see bug.
        match = re.search(r'(.+)(\.\d+)[+-]\d+:?\d+$',
                          utils.read_file(self.HWCLOCK_FILE), re.DOTALL)
        if match:
            timeval = time.strptime(match.group(1),
                                    "%Y-%m-%d %H:%M:%S")
            seconds = time.mktime(timeval)
            seconds += float(match.group(2))
            logging.debug('RTC resume timestamp read: %f', seconds)
            if seconds >= not_before:
                return seconds
            early_wakeup = True
    time.sleep(0.05 * retry)
if early_wakeup:
    logging.debug('Early wakeup, dumping eventlog if it exists:\n')
    elog = utils.system_output('mosys eventlog list | tail -n %d' %
                               self._RELEVANT_EVENTLOG_LINES, ignore_status=True)
    wake_elog = (['unknown'] + re.findall(r'Wake Source.*', elog))[-1]
    for line in reversed(self._logs):
        match = re.search(r'PM1_STS: WAK.*', line)
        if match:
            wake_syslog = match.group(0)
            break
    else:
        wake_syslog = 'unknown'
    for b, e, s in sys_power.SpuriousWakeupError.S3_WHITELIST:
        if (re.search(b, utils.get_board()) and
                re.search(e, wake_elog) and re.search(s, wake_syslog)):
            logging.warning('Whitelisted spurious wake in S3: %s | %s',
                            wake_elog, wake_syslog)
            return None
    raise sys_power.SpuriousWakeupError('Spurious wake in S3: %s | %s'
                                        % (wake_elog, wake_syslog))
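For reference, a quick standalone illustration of what that fragile regex is parsing; the sample string is the hwclock timestamp that shows up in the powerd.LATEST debug output quoted below:

import re
import time

# Sample of what powerd_suspend writes to hwclock-on-resume (taken from the
# hwclock --debug output later in this report).
sample = '2018-02-22 20:37:33.993522-8:00'

match = re.search(r'(.+)(\.\d+)[+-]\d+:?\d+$', sample, re.DOTALL)
timeval = time.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')  # '2018-02-22 20:37:33'
seconds = time.mktime(timeval) + float(match.group(2))        # + 0.993522
print(seconds)  # epoch seconds, interpreted via the local time zone by mktime()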
In some local test runs on Scarlet, I see hwclock failed once, but succeeded the second time (when we're dumping out debug info); from powerd.LATEST:
2018-02-22T20:37:33.468569-08:00 NOTICE powerd_suspend[16100]: /sbin/hwclock failed:
2018-02-22T20:37:34.481896-08:00 NOTICE powerd_suspend[16102]: debug: hwclock from util-linux 2.28.2#012Using the /dev interface to the clock.#012Assuming hardware clock is kept in UTC time.#012Waiting for clock tick...#012...got clock tick#012Time read from Hardware Clock: 2018/02/23 04:37:35#012Hw clock time : 2018/02/23 04:37:35 = 1519360655 seconds since 1969#012Time since last adjustment is 1519360655 seconds#012Calculated Hardware Clock drift is 0.000000 seconds#0122018-02-22 20:37:33.993522-8:00
2018-02-22T20:37:34.488953-08:00 NOTICE powerd_suspend[16104]: wake source: unknown
2018-02-22T20:37:34.510637-08:00 NOTICE powerd_suspend[16111]: Skipping enable BT HCI mode change event on non-Intel BT
2018-02-22T20:37:34.515286-08:00 NOTICE powerd_suspend[16116]: Resume finished
The suspend/resume actually succeeded just fine (with no early wakeup), but we fail because the timestamp accounting is just dumping the old clock value (notice how the RTC resume timestamp is the same at 20:37:04 and 20:37:34):
02/22 20:36:57.226 INFO | sys_power:0163| Running '/usr/bin/powerd_dbus_suspend --delay=0 --timeout=30 --wakeup_count=2109'
02/22 20:37:04.551 DEBUG| power_suspend:0253| RTC resume timestamp read: 1519360623.994649
02/22 20:37:07.112 INFO | power_suspend:0570| Success(1053): 0.228359 down, 1.99465 up, 1.14919 board, 0 firmware, 0.845461 kernel, 0.06
...
02/22 20:37:15.286 INFO | sys_power:0163| Running '/usr/bin/powerd_dbus_suspend --delay=0 --timeout=30 --wakeup_count=2111'
02/22 20:37:34.581 DEBUG| power_suspend:0253| RTC resume timestamp read: 1519360623.994649
02/22 20:37:34.585 DEBUG| power_suspend:0253| RTC resume timestamp read: 1519360623.994649
02/22 20:37:34.638 DEBUG| power_suspend:0253| RTC resume timestamp read: 1519360623.994649
02/22 20:37:34.740 DEBUG| power_suspend:0253| RTC resume timestamp read: 1519360623.994649
02/22 20:37:34.891 DEBUG| power_suspend:0259| Early wakeup, dumping eventlog if it exists:
02/22 20:37:34.893 DEBUG| utils:0214| Running 'mosys eventlog list | tail -n 10'
02/22 20:37:34.968 ERROR| power_suspend:0606| SpuriousWakeupError(1054): Spurious wake in S3: unknown | unknown
Questions:
(a) why is hwclock failing?
(b) should powerd retry?
(c) how can we make this error behavior more intuitive? (e.g., clear the old timestamp, so we don't accidentally reuse it?)
(d) (Nit:) why does power_SuspendStress have a retry loop for reading the HWCLOCK_FILE? (/run/power_manager/root/hwclock-on-resume) I see we added retries for dealing with log flushing years ago, but this particular file is written and closed out by powerd synchronously...
I suspect we might want to stress-test the RTC on Scarlet to eliminate any potential platform issues (possible explanation for (a)).
Feb 24 2018
I think we're using hwclock because we also set the alarm directly on the RTC. This way we can be sure that we're subtracting two times from exactly the same clock to calculate resume time. No idea on the actual bug, but wrapping hwclock in an strace to figure out why it fails might help. Also, are there any interesting kernel or EC messages at the time of the failure?

I think the problem with the autotest is that it only touches HWCLOCK_FILE in __init__() and finalize(), which doesn't help if there's a problem in a middle iteration of power_SuspendStress. The open_write_close() from __init__ should probably move to suspend() to make the test fail more clearly in this case.

The retry just makes sure the hwclock file has been fully written out, I think. Keep in mind that all of userspace unfreezes simultaneously, so the autotest may already start looking for the file while powerd_suspend is still running.
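(A minimal sketch of the open_write_close()-in-suspend() idea, illustrative only and not the actual power_suspend.py change: truncate/recreate the request file before every suspend so a stale timestamp from an earlier iteration can never be mistaken for a fresh one.)

HWCLOCK_FILE = '/run/power_manager/root/hwclock-on-resume'

def clear_hwclock_timestamp(path=HWCLOCK_FILE):
    # Truncate (or create) the file before each suspend iteration; powerd_suspend
    # refills it on resume, so anything still empty or old afterwards is
    # genuinely stale rather than silently reused.
    with open(path, 'w'):
        pass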
Feb 24 2018
> I think we're using hwclock because we also set the alarm directly on the RTC. This way we can be sure that we're subtracting two times from exactly the same clock to calculate resume time.

I don't see how that's so magical. As long as we use the same clock source for both before and after, it should be valid.

Also, why do we need powerd_suspend to log the time for us? Is it that bad to wait until it returns, and check the time within the autotest itself (e.g., with derat's suggestion)? You lose just a little bit of timing accuracy, but it shouldn't be *that* bad... I'm going to look at completely removing /run/power_manager/root/hwclock-on-resume.

> I think the problem with the autotest is that it only touches HWCLOCK_FILE in __init__() and finalize(), which doesn't help if there's a problem in a middle iteration of power_SuspendStress. The open_write_close() from __init__ should probably move to suspend() to make the test fail more clearly in this case.

Yeah, that's one problem. Or maybe we just get rid of that file entirely...

> the autotest may already start looking for the file while powerd_suspend is still running.

Nope. The test is running 'powerd_suspend' synchronously. Any code run synchronously within 'powerd_suspend' (including writing to that file) will be finished before then.
Feb 24 2018
I'll look at fixing the test and powerd_suspend. I may or may not get through figuring out what makes hwclock unreliable on Scarlet.
Feb 24 2018
> As long as we use the same clock source for both before and after, it should be valid.

Addendum: as long as that clock source includes time-while-suspended, e.g., derat's CLOCK_BOOTTIME suggestion. The kernel updates this clock using either an SoC timer or, if not available (e.g., RK3399's timer stops during suspend), the RTC.
Feb 24 2018
> I don't see how that's so magical. As long as we use the same clock source for both before and after, it should be valid.

I'm not saying it's impossible, but then you may have to deal with all sorts of stuff like making sure you don't have a different time zone or something. Also, system time may drift away from RTC time for a bit before it crosses the threshold where the kernel resyncs it. And you couldn't use CLOCK_BOOTTIME anyway, the RTC keeps wallclock time so you'd have to get that from the kernel as well. It just seems easier to use the same source for measuring as we used to set the alarm, which is raw RTC time, and I don't really see a reason why that would be worse in any way. Scarlet just has some sort of bug, and then that bug ought to get fixed.

> Also, why do we need powerd_suspend to log the time for us? Is it that bad to wait until it returns, and check the time within the autotest itself (e.g., with derat's suggestion)? You lose just a little bit of timing accuracy, but it shouldn't be *that* bad...

The autotest doesn't know directly when the suspend is happening, it uses the presence of the file to determine that. Also, trying to do this from Python is probably going to be more prone to random jitter because a lot more hard to predict things are happening. The hwclock utility is dead simple and pretty much never changes, so times remain reliable and comparable. The existing code really worked quite fine for years, I don't quite get why you suddenly want to reinvent the world.

> Nope. The test is running 'powerd_suspend' synchronously. Any code run synchronously within 'powerd_suspend' (including writing to that file) will be finished before then.

No, the test isn't calling powerd_suspend directly. It runs powerd_dbus_suspend, which sends a DBUS message to powerd, which goes through the whole suspend sequence and eventually calls powerd_suspend. I'm pretty sure the DBUS message is asynchronous and powerd_dbus_suspend will just return afterwards, but this may have changed at some point. (If the test actually does wait the whole time until powerd_suspend finishes, control passes back into powerd and powerd gets around to returning a DBUS response to the powerd_dbus_suspend process, that's all the more reason to measure the time in powerd_suspend instead of the test itself.)
Feb 24 2018
> I'm pretty sure the DBUS message is asynchronous and powerd_dbus_suspend will just return afterwards

powerd_dbus_suspend waits for a SuspendDone D-Bus signal, so it should block until the system suspends and resumes (or powerd gives up on the attempt due to a wakeup count mismatch).
Feb 24 2018
> The hwclock utility is dead simple

I beg to differ. It's extremely slow, as it does some kind of "wait for clock tick" thing, and probably more unnecessary things. This *could* be as simple as a 'cat /sys/class/rtc/rtc0/since_epoch', but no... Instead this takes almost 2 minutes on my kevin:

# time (for i in $(seq 1 100) ; do hwclock -r --debug --utc > /dev/null 2>&1; done)
real	1m39.571s
user	0m0.169s
sys	0m1.113s

I think that kills any semblance of "better jitter."

Also, is this dead simple? This is all code I'm getting ready to delete:

-            # TODO(crbug.com/733773): Still fragile see bug.
-            match = re.search(r'(.+)(\.\d+)[+-]\d+:?\d+$',
-                              utils.read_file(self.HWCLOCK_FILE), re.DOTALL)
-            if match:
-                timeval = time.strptime(match.group(1),
-                                        "%Y-%m-%d %H:%M:%S")
-                seconds = time.mktime(timeval)
-                seconds += float(match.group(2))
-                logging.debug('RTC resume timestamp read: %f', seconds)

> making sure you don't have a different time zone or something

Sorry, what? The above shows we already are fragile to time zones. We also have to stop tlsdated to get any of this to work right (another thing I'll delete). And the man page already explains how time zones aren't involved for, e.g., CLOCK_BOOTTIME.

> Also, system time may drift away from RTC time for a bit before it crosses the threshold where the kernel resyncs it.

This is mostly not relevant. There's RTC-relative offsets for this (e.g., "echo "+10" > /sys/...") and the kernel handles the rest. The only drift that would happen is the drift between setting the alarm and checking the time after resume. i.e., the RTC would have to be drifting a *lot* from the system clock, such that "wake me in N seconds" really means "wake me in M seconds" where M is significantly smaller than N.

> The existing code really worked quite fine for years

I'm not sure about this particular code, but the rest of this test is certainly not reliable (there are multiple false failures on multiple platforms that I'm working on in other bugs). AFAICT people tend to rerun this test until it passes.

Anyway, I will do my cleanups (including stopping using 'hwclock'; it's not the right tool for this), but the jury's still out on whether I'll keep using the RTC directly or use the system clock.

BTW, after just a few hundred hwclock runs on Scarlet, I found a failure:

open("/dev/rtc", O_RDONLY|O_LARGEFILE) = 3
access("/etc/adjtime", R_OK) = -1 ENOENT (No such file or directory)
ioctl(3, RTC_UIE_ON) = 0
_newselect(4, [3], NULL, NULL, {tv_sec=10, tv_usec=0}) = 0 (Timeout)
ioctl(3, PHN_NOT_OH or RTC_UIE_OFF) = 0
close(3) = 0
write(1, "hwclock from util-linux 2.28.2\nU"..., 220hwclock from util-linux 2.28.2
Using the /dev interface to the clock.
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
select() to /dev/rtc to wait for clock tick timed out...synchronization failed
) = 220

Looks like the RTC isn't reliably supporting the select() interface. But again, that feature is a completely unnecessary complexity to this test.
Feb 24 2018
> I beg to differ. It's extremely slow, as it does some kind of "wait for clock tick" thing, and probably more unnecessary things. This *could* be as simple as a 'cat /sys/class/rtc/rtc0/since_epoch', but no... Instead this takes almost 2 minutes on my kevin:

Yes, the tool takes multiple seconds, *because* it is taking an accurate measurement. That "wait for clock tick" thing is the tool waiting for the RTC to cross another second boundary, so that it can then subtract the time it waited from that to get an accurate sub-second measurement of when it was called. That's exactly why you don't just cat since_epoch, because then you get a raw RTC measurement that only goes down to seconds, which is obviously not that great to measure sub-second resume times.

> And the man page already explains how time zones aren't involved for, e.g., CLOCK_BOOTTIME.

Yes, that's great and all, *but you cannot use CLOCK_BOOTTIME!* The whole point of this is to measure the difference between that point and the point where the RTC alarm fired, but the RTC alarm is being programmed in RTC time. You cannot just subtract seconds since boot from RTC time. You could use CLOCK_REALTIME, I guess, but you're just sacrificing accuracy for the cost of not having to fix whatever is wrong with your RTC driver.

> This is mostly not relevant. There's RTC-relative offsets for this (e.g., "echo "+10" > /sys/...") and the kernel handles the rest. The only drift that would happen is the drift between setting the alarm and checking the time after resume.

When you write wakealarm with +10 it will just read the existing RTC timestamp, add the number and write it back. It does not force a sync to system time. In fact, I think Chrome OS only syncs system time to RTC time whenever tlsdate syncs back with the time server, which in my experience is not a very predictable thing (we've had a fair share of issues of it not syncing in various situations that usually nobody notices for a while). So they could have easily drifted apart for hours or maybe even days already. Relevant -- probably not, but it just seems unnecessary.

> Anyway, I will do my cleanups (including stopping using 'hwclock'; it's not the right tool for this)

In order to get CLOCK_REALTIME from the powerd_suspend shell script you would also have to call the 'time' utility, which isn't that much more complex at that point (both fork a process that does some syscalls and prints the result in text form). Except that one of them gives you the time you *really* want for the comparison, while the other one gives you a time that might hopefully under most circumstances be close enough. I'd argue hwclock is the right utility there.
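For what it's worth, a rough Python 3 sketch of that synchronization trick (busy-polling the sysfs counter instead of the RTC_UIE_ON/select() path util-linux actually uses; the path and poll interval are illustrative assumptions):

import time

RTC_SINCE_EPOCH = '/sys/class/rtc/rtc0/since_epoch'

def read_rtc_seconds():
    with open(RTC_SINCE_EPOCH) as f:
        return int(f.read().strip())

def rtc_time_at_call():
    start = time.monotonic()
    first = read_rtc_seconds()
    while True:
        current = read_rtc_seconds()
        if current != first:
            # At the tick, the RTC is exactly `current`.000 seconds; subtract
            # how long we waited to recover a sub-second estimate of the RTC
            # time at the moment this function was called.
            return current - (time.monotonic() - start)
        time.sleep(0.01)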
Feb 25 2018
Hi guys,
regarding the hwclock failure:
> ioctl(3, RTC_UIE_ON) = 0
this would enable rtc update irq:
localhost / # grep update /proc/driver/rtc
update IRQ enabled : yes
which is:
1/ set a 1-second rtc alarm
2/ when the alarm triggers, set another 1-second rtc alarm
...
> _newselect(4, [3], NULL, NULL, {tv_sec=10, tv_usec=0}) = 0 (Timeout)
this means the EC didn't send the rtc event (for the alarm trigger) within 10 seconds.
maybe we can add some logs in the EC to debug it.
Feb 25 2018
hmm, found a scarlet and could repro this issue. hwclock -r failed, and there are no error logs on the ec or kernel side; will add more logs to debug. and it looks like we did hit this issue a long time ago: https://partnerissuetracker.corp.google.com/u/0/issues/38310685#comment18 so I guess this could happen on both npcx and stm32 ECs?
Feb 25 2018
hmm, i think i know what happened in the hwclock -r failure case...
when we do hwclock -r, it would:
1/ get the current rtc time
2/ set a one-second alarm
3/ wait for that alarm to trigger, for synchronization:
* About synchronizing to the Hardware Clock when reading the time: The
* precision of the Hardware Clock counters themselves is one second. You
* can't read the counters and find out that is 12:01:02.5. But if you
* consider the location in time of the counter's ticks as part of its
* value, then its precision is as infinite as time is continuous! What I'm
* saying is this: To find out the _exact_ time in the hardware clock, we
* wait until the next clock tick (the next time the second counter changes)
* and measure how long we had to wait. We then read the value of the clock
* counters and subtract the wait time and we know precisely what time it
* was when we set out to query the time.
so on the kernel side:

int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled)
{
    ...
    __rtc_read_time(rtc, &tm);                                  <--- get current rtc time
    onesec = ktime_set(1, 0);
    now = rtc_tm_to_ktime(tm);
    rtc->uie_rtctimer.node.expires = ktime_add(now, onesec);    <--- plus one second
    rtc->uie_rtctimer.period = ktime_set(1, 0);
    err = rtc_timer_enqueue(rtc, &rtc->uie_rtctimer);           <--- set that one second alarm

static int cros_ec_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm)
{
    ...
    ret = cros_ec_rtc_get(cros_ec, EC_CMD_RTC_GET_VALUE, &current_time);   <--- get current rtc time
    ...
    alarm_offset = (u32)alarm_time - current_time;                         <--- calc alarm offset
    ...
    ret = cros_ec_rtc_set(cros_ec, EC_CMD_RTC_SET_ALARM, alarm_offset);    <--- set alarm offset to ec
so there's a chance that a clock tick (second counter change) happens between them (the two rtc time reads), causing the ec rtc driver to set an alarm with offset 0 (which actually means EC_RTC_ALARM_CLEAR); that would break the rtc update irq loop. e.g., rtc_update_irq_enable() reads 100 and queues an alarm for 101, but by the time cros_ec_rtc_set_alarm() reads the rtc again it is already 101, so alarm_offset = 101 - 101 = 0.
guess we should fix it in the rtc framework (rtc_update_irq_enable) to make sure the rtc update irq works as expected...
maybe we can use a 2-second alarm for expires:
+++ b/drivers/rtc/interface.c
@@ -476,12 +476,12 @@ int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled)
 	if (enabled) {
 		struct rtc_time tm;
-		ktime_t now, onesec;
+		ktime_t now, twosec;
 		__rtc_read_time(rtc, &tm);
-		onesec = ktime_set(1, 0);
+		twosec = ktime_set(2, 0);
 		now = rtc_tm_to_ktime(tm);
-		rtc->uie_rtctimer.node.expires = ktime_add(now, onesec);
+		rtc->uie_rtctimer.node.expires = ktime_add(now, twosec);
but this would make hwclock -r slower :(
or
return a special err from the rtc driver when it fails to set the alarm (too late for that alarm), so rtc_update_irq_enable would know that the alarm will never trigger and retry with another one-second alarm.
Feb 25 2018
hmm, checking __rtc_set_alarm, we already have an -ETIME for that:
static int __rtc_set_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm)
{
    ...
    scheduled = rtc_tm_to_time64(&alarm->time);
    /* Make sure we're not setting alarms in the past */
    err = __rtc_read_time(rtc, &tm);
    if (err)
        return err;
    now = rtc_tm_to_time64(&tm);
    if (scheduled <= now)
        return -ETIME;

static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer)
{
    ...
    err = __rtc_set_alarm(rtc, &alarm);
    if (err == -ETIME) {
        pm_stay_awake(rtc->dev.parent);
        schedule_work(&rtc->irqwork);
so we should add it to the cros ec rtc driver (maybe also some other rtc drivers?)
will send a patch soon
Feb 26 2018
Thanks Jeffy. There's also this beautiful comment in the __rtc_set_alarm code:
/*
* XXX - We just checked to make sure the alarm time is not
* in the past, but there is still a race window where if
* the is alarm set for the next second and the second ticks
* over right here, before we set the alarm.
*/
It's as if this should probably go in the driver callback or something, with a documented error code for handling this...
Also, no matter how much you check before setting the alarm, a sufficiently slow protocol (e.g., the CrOS EC protocol) can still leave plenty of room for that race. I'll review your patch more closely, but I don't think it's sufficient as-is.
(I also agree; many other RTC drivers -- probably almost all of them -- are broken, unless they have some kind of magic in their hardware that immediately triggers if an alarm is set in the past.)
@Julius:
> because then you get a raw RTC measurement that only goes down to seconds which is obviously not that great to measure sub-second resume times
But that's not the point; we're not trying to measure sub-second resume times. We're trying to measure "did we wake before or after the alarm". The alarm is set in seconds. So sub-second readings of the RTC don't help. The RTC should have *already* ticked, if it was the source of waking us.
The rest of your arguments still don't make any sense to me. I think you don't understand what I'm proposing. One solution:
A = clock_gettime(CLOCK_BOOTTIME)
echo +X > /sys/class/rtc/...
powerd_dbus_suspend
B = clock_gettime(CLOCK_BOOTTIME)
if (B - A < X)
raise "early wake"
Or alternatively, if we really want to stick with the RTC clock:
A = rtc.get_seconds()
echo +X > /sys/class/rtc/... (or if preferred, echo X+A > ...)
powerd_dbus_suspend
B = rtc.get_seconds()
if (B - A < X)
raise "early wake"
Note that all of the above would be done in the test itself (python), using the same timer source before/after suspend. AFAICT the primary 2 criticisms would be:
(a) CLOCK_BOOTTIME could still run at a different rate than the RTC (so "X seconds" is not the same period of time when measured by the possibly-more-accurate system clock)
(b) retrieving the time from the python code after powerd_dbus_suspend returns, rather than from within powerd_suspend will give too much extra slack
I'm pretty sure (a) is not relevant in most cases (the RTC delay can feed into CLOCK_BOOTTIME at resume), but data might still suggest otherwise.
You've argued (b), but I don't buy it. We're already as much as 1 second after the alarm (the kernel takes time to resume), so this entire concept ("verify the wake alarm") is a bit dodgy in the first place. I suppose I could measure the typical delay between the existing and the proposed timekeeping point, but it would have to be pretty significant to justify the extra complexity of both hwclock and scattering our timekeeping across both the tests and powerd.
Do you have any other criticism of one of the above proposals?
Feb 26 2018
w.r.t. CLOCK_BOOTTIME and this check: if (B - A < X) I guess I can spot one more problem. I think that actually might still have problems with aliasing because adding "+X" to the RTC doesn't actually necessarily mean a full "X" seconds of wall clock time. It might lose a full second, e.g., if we're close to the next tick. (This depends on the RTC implementation of set_alarm() actually.) So maybe it's still better to just go with RTC time (i.e., the 2nd solution.), if we don't want to sacrifice a full second of accuracy [1]. [1] i.e., if (B - A < X - 1)
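Concretely, with made-up numbers: if the RTC counter reads N when we are 0.9 s into second N and we write "+10", the alarm is programmed for N+10 and fires only about 9.1 s of wall-clock time later; unless resume itself eats more than 0.9 s, B - A measured on CLOCK_BOOTTIME comes out just under 10 and looks like an early wake.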
Feb 26 2018
> guess we should fix it in the rtc framework(rtc_update_irq_enable) to make sure rtc update irq work as expected...
> but this would make hwclock -r slower :(

I didn't look at your patch in detail, but in general making hwclock slower should be fine. The important part is that it is accurate and reliable. If we instead implement some fix by returning -ETIME in rare cases, we should make sure that hwclock is able to handle that correctly (e.g. by retrying the measurement and then still calculating the time back to when it was launched).

> But that's not the point; we're not trying to measure sub-second resume times. We're trying to measure "did we wake before or after the alarm". The alarm is set in seconds. So sub-second readings of the RTC don't help. The RTC should have *already* ticked, if it was the source of waking us.

No, I think you misunderstood the whole purpose of this measurement. It's not just to detect early wakeups... we just use it for that as well because we have it anyway. The main point is to be able to measure pre-kernel resume time. We want to measure the full resume time between the wake event and userspace coming back up. But some of that time may be spent in the EC, in firmware, or in some hardware state machine, and it's hard or impossible to measure all of those individually. So instead we measure the exact RTC timestamp from when userspace comes back up and subtract it from the alarm time we previously set (because we know that is the exact time when the wake event happened). Now we have the total resume time *including* all those pre-kernel components, and we subtract the time spent in the kernel (as parsed out of dmesg timestamps) to get the time spent elsewhere (this is the seconds_system_resume_firmware_ec reported by the test).
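To spell out that accounting (a sketch; the names here are made up, not what power_suspend.py actually calls them):

def pre_kernel_resume_seconds(hwclock_resume_ts, rtc_alarm_ts, kernel_resume_s):
    # RTC time read right after userspace resumes, minus the RTC time the wake
    # alarm was programmed for (i.e., the wake event), is the total resume time
    # including EC/firmware/hardware; subtracting the kernel's own resume time
    # (parsed from dmesg timestamps) leaves the pre-kernel portion, which the
    # test reports as seconds_system_resume_firmware_ec.
    total_resume_s = hwclock_resume_ts - rtc_alarm_ts
    return total_resume_s - kernel_resume_s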
Feb 26 2018
> If we instead implement some fix by returning -ETIME in rare cases
-ETIME gets translated into "raise an event immediately." It is a seemingly-correct extension of that existing policy to have the CrOS RTC driver also return -ETIME if that time has already passed (e.g., because the intervening work took too long). That's not going to be super-nice to hwclock, because it means we don't really know how long ago the tick actually happened. But then, trying to backwards-derive sub-second precision when talking over a slow interface like the EC protocol seems a little foolhardy anyway...
> we should make sure that hwclock is able to handle that correctly (e.g. by retrying the measurement and then still calculating the time back to when it was launched).
hwclock doesn't do anything like that -- and actually, it can't, because rtc_timer_enqueue() masks the -ETIME error code, and signals an immediate event:
        err = __rtc_set_alarm(rtc, &alarm);
        if (err == -ETIME) {
            pm_stay_awake(rtc->dev.parent);
            schedule_work(&rtc->irqwork);
        } else if (err) {
            timerqueue_del(&rtc->timerqueue, &timer->node);
            timer->enabled = 0;
            return err;
        }
    }

    return 0;
But AFAICT, that's about as good as you're going to get when trying to reverse-engineer sub-second precision over a slow protocol like CrOS EC's SPI protocol. You still *could* actually back off and retry, until you manage to get a measurement on several ticks over the course of the next N seconds, but in any one of those, you're still going to be biased by whatever time you wasted on a few EC transactions (where after each EC alarm event, we have to send/receive several EC messages).
Anyway, that would require some complex updates to the RTC ABI if you want to propagate different failure modes back, with ioctl(RTC_UIE_ON).
If hwclock really wanted to do a better job at synchronizing to something like this though...I suppose it could just ignore the first tick, or sample a few events before declaring itself synchronized [1].
> No, I think you misunderstood the whole purpose of this measurement. It's not just to detect early wakeups... we just use it for that as well because we have it anyway. The main point is to be able to measure pre-kernel resume time.
Thanks for the additional explanation. I think I missed that part when probing through the layers of complexity here. It still seems a bit rickety, but I guess we just have to do our best to satisfy the existing APIs. I think Jeffy has identified the main way to do that in the CrOS EC RTC driver. I'm willing to bet that there are other RTC drivers that are similarly broken though.
In terms of changing the test, I guess we really just need to be much more robust to hwclock failures (per some suggestions above already; clearing the HWCLOCK file in between iterations, and reporting unexpected results differently). And given the ugliness of looking through the RTC stack and the assumptions that seem to have built up... hwclock failures like this will probably happen again.
[1] The current hwclock-rtc implementation only samples once:
    rc = ioctl(rtc_fd, RTC_UIE_ON, 0);
    ...
    if (rc == -1 && (errno == ENOTTY || errno == EINVAL)) {
        /*
         * This rtc device doesn't have interrupt functions.
         * This is typical on an Alpha, where the Hardware
         * Clock interrupts are used by the kernel for the
         * system clock, so aren't at the user's disposal.
         */
        if (debug)
            printf(_("%s does not have interrupt functions. "),
                   rtc_dev_name);
        ret = busywait_for_rtc_clock_tick(rtc_fd);
    } else if (rc == 0) {
        ...
        /*
         * Wait up to ten seconds for the next update
         * interrupt
         */
        ...
        rc = select(rtc_fd + 1, &rfds, NULL, NULL, &tv);
        ...
        /* Turn off update interrupts */
        rc = ioctl(rtc_fd, RTC_UIE_OFF, 0);
        ...
        // (briannorris): We exit here, and hwclock assumes we've synchronized
Feb 27 2018
FYI, talked with Chris Zhong; he said the rk808 (used in veyrons) would have the same issue (the alarm would not fire if the alarm time == current time).
Feb 27 2018
Is that to say, this patch will break hwclock's tick detection for rk808?
diff --git a/drivers/rtc/rtc-rk808.c b/drivers/rtc/rtc-rk808.c
index 35c9aada07c8..f10fe5c9e07c 100644
--- a/drivers/rtc/rtc-rk808.c
+++ b/drivers/rtc/rtc-rk808.c
@@ -16,6 +16,7 @@
  * more details.
  */
+#include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/rtc.h>
@@ -281,6 +282,7 @@ static int rk808_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm)
 		return ret;
 	}
 	if (alrm->enabled) {
+		msleep(1000);
 		ret = rk808_rtc_start_alarm(rk808_rtc);
 		if (ret) {
 			dev_err(dev, "Failed to start alarm: %d\n", ret);
Feb 28 2018
OK, we've got a bugfix for Scarlet (and presumably Kevin and Bob too):
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/936721

And some test improvements:
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/940540
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/940601
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/940761

I'm not sure I have the stomach to audit all other RTC drivers, but I suspect most are not going to see anywhere near the same failure rate, if they're backed by hardware that's much closer to "a few register pokes", rather than "send/receive a sequence of EC messages over a SPI bus."
Feb 28 2018
Hi #19, hmm, no; checked it on my minnie: rk808 generates an irq right after start_alarm, and moving the msleep before setting the alarm (regmap_bulk_write) gives the same result. so I guess chris is wrong; rk808 should be fine in this case...
Feb 28 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3a94d7188f94415b820348fcee68c96372021b51

commit 3a94d7188f94415b820348fcee68c96372021b51
Author: Jeffy Chen <jeffy.chen@rock-chips.com>
Date: Wed Feb 28 08:44:14 2018

FROMLIST: rtc: cros-ec: return -ETIME when refused to set alarms in the past

Since accessing a Chrome OS EC based rtc is a slow operation, there is a race window where if the alarm is set for the next second and the second ticks over right before calculating the alarm offset. In this case the current driver is setting a 0-second alarm, which would be considered as disabling alarms by the EC (EC_RTC_ALARM_CLEAR).

This breaks, e.g., hwclock which relies on RTC_UIE_ON -> rtc_update_irq_enable(), which sets a 1-second alarm and expects it to fire an interrupt.

So return -ETIME when the alarm is in the past, follow __rtc_set_alarm().

Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
(am from https://patchwork.kernel.org/patch/10244153/)

BUG=chromium:815249, b:38310685
TEST=Test on scarlet: while hwclock -r; do sleep .1; done
     also, `test_that ... power_SuspendStress.bare`

Change-Id: I9b5d3ce17eed2181f346cf82c63d165bc8546708
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/936721
Commit-Ready: Jeffy Chen <jeffy.chen@rock-chips.com>

[modify] https://crrev.com/3a94d7188f94415b820348fcee68c96372021b51/drivers/rtc/rtc-cros-ec.c
Feb 28 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/fe8d56798e8ff340844f85830d6467be4e8d3bb5

commit fe8d56798e8ff340844f85830d6467be4e8d3bb5
Author: Brian Norris <briannorris@chromium.org>
Date: Wed Feb 28 20:23:10 2018

power_suspend: don't retry reading HWCLOCK_FILE

This file is written out by power_suspend, which has finished executing completely before powerd_dbus_suspend returns. There is no reason to retry reading it.

BUG=chromium:815249
TEST=power_SuspendStress

Change-Id: Id8e84847e6adb50f3f0823f4a29a14032c97dbdf
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/940540
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/fe8d56798e8ff340844f85830d6467be4e8d3bb5/client/cros/power/power_suspend.py
Feb 28 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f47585bd3bf5eadb5eabea742e68b3b67c1c056e

commit f47585bd3bf5eadb5eabea742e68b3b67c1c056e
Author: Brian Norris <briannorris@chromium.org>
Date: Wed Feb 28 20:23:11 2018

power_suspend: clear powerd's HWCLOCK file in between iterations

After the first suspend iteration, powerd will have a leftover timestamp in /run/power_manager/root/hwclock-on-resume. But on subsequent iterations, powerd doesn't guarantee that this file gets updated, e.g., if something (like `hwclock` itself) fails. This can confuse things on subsequent runs and cause reported "spurious wakeups" (because we measure time against the old timestamp).

Just clear the file before we suspend, to avoid any issues with staleness. The existing parsing code will handle this properly (abort, with a "Broken RTC timestamp") if it remains empty.

We could probably also convince powerd to clear this file in error conditions, but this makes for a fine backup option.

BUG=chromium:815249
TEST=power_SuspendStress with misbehaving RTC

Change-Id: I53af2375614b352ce8006b78d8d3d562af612027
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/940601
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/f47585bd3bf5eadb5eabea742e68b3b67c1c056e/client/cros/power/power_suspend.py
Feb 28 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/27edefb3f05714286fb68f248cc16426d5f2c3be

commit 27edefb3f05714286fb68f248cc16426d5f2c3be
Author: Brian Norris <briannorris@chromium.org>
Date: Wed Feb 28 20:23:23 2018

power: powerd_suspend: always write hwclock result, when requested

When a test is asking us to stash hwclock output to /run/power_manager/root/hwclock-on-resume, we only do that if hwclock exits successfully. But this can leave stale results, which is worse than just having error logs. (And in fact, hwclock just gives empty output on errors anyway, unless run with --debug.)

Help to ensure that tests don't process stale data by writing output in both success and error cases.

There are accompanying changes in http://crosreview.com/940601 to improve the test itself, but we might as well do it here too.

BUG=chromium:815249
TEST=power_SuspendStress with misbehaving RTC

Change-Id: I52b1c69aa114ce583c09c3ed5b9d84bc618d301d
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/940761
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/27edefb3f05714286fb68f248cc16426d5f2c3be/power_manager/powerd/powerd_suspend
Feb 28 2018
I think we've got (a), (c), and (d) covered. And for (b), rather than retry, we're just going to report a test failure.

@Jeffy: thanks for looking into RK808. I suppose that's not an extremely surprising behavior; there may be other alarm implementations that are similar -- a past-due alarm causes an immediate IRQ -- in which case things aren't that dire.

Unless someone can argue this is causing real problems (besides just test failures), we'll just target this for M66.
Mar 1 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/43c869fdeb541fbde0f089a5cd09a6bd5014fa17

commit 43c869fdeb541fbde0f089a5cd09a6bd5014fa17
Author: Jeffy Chen <jeffy.chen@rock-chips.com>
Date: Thu Mar 01 01:48:41 2018

FROMLIST: rtc: cros-ec: return -ETIME when refused to set alarms in the past

Since accessing a Chrome OS EC based rtc is a slow operation, there is a race window where if the alarm is set for the next second and the second ticks over right before calculating the alarm offset. In this case the current driver is setting a 0-second alarm, which would be considered as disabling alarms by the EC (EC_RTC_ALARM_CLEAR).

This breaks, e.g., hwclock which relies on RTC_UIE_ON -> rtc_update_irq_enable(), which sets a 1-second alarm and expects it to fire an interrupt.

So return -ETIME when the alarm is in the past, follow __rtc_set_alarm().

Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
(am from https://patchwork.kernel.org/patch/10244153/)

BUG=chromium:815249, b:38310685
TEST=Test on scarlet: while hwclock -r; do sleep .1; done
     also, `test_that ... power_SuspendStress.bare`

Change-Id: I9b5d3ce17eed2181f346cf82c63d165bc8546708
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/936721
Commit-Ready: Jeffy Chen <jeffy.chen@rock-chips.com>
(cherry picked from commit 3a94d7188f94415b820348fcee68c96372021b51)
Reviewed-on: https://chromium-review.googlesource.com/941390

[modify] https://crrev.com/43c869fdeb541fbde0f089a5cd09a6bd5014fa17/drivers/rtc/rtc-cros-ec.c