Tegra K1 systems reporting incoherent (negative) resume time in S2R |
||||
Issue description
When running power_Resume on Tegra K1 boards (nyan family) we see consistently incoherent time measurements reported about resume, namely that firmware bring-up time was negative. (examples below).
There seems to be a significant difference reported in bring-up time between hwclock and kernel timestamps.
The problem is that the math assumes that total bring up time for the board has to be bigger than ec + kernel, bringing us to:
board_up = total_up - kernel_up - firmware_up
But in these cases, kernel_up is actually about 4s, while total_up is about a second. Total up is calculated based on the HWCLOCKFILE and the rtc.py utility setting an alarm, while kernel_up is calculated by having powerd search for log lines that mean resume-start and resume-end, so something there is not working as expected
Boards affected:
nyan-blaze
nyan-big
nyan-kitty
Sample results from all above and veyron-jerry (for ARM reference):
nyan-blaze:
seconds_dev_0-0010_resume{perf}=0.034416
seconds_dev_1-0015_resume{perf}=0.107711
seconds_dev_1-2.1_resume{perf}=0.056603
seconds_dev_1-2_suspend{perf}=0.088179
seconds_dev_70030000.hda_resume{perf}=0.059169
seconds_dev_70030000.hda_suspend{perf}=0.139023
seconds_dev_70090000.usb_resume{perf}=0.053462
seconds_dev_700b0000.sdhci_suspend{perf}=0.116184
seconds_dev_700b0600.sdhci_resume{perf}=0.941404
seconds_dev_usb1_resume{perf}=0.185136
seconds_system_resume_firmware_cpu{perf}=0
seconds_system_resume_firmware_ec{perf}=-3.65303699809
seconds_system_resume_firmware{perf}=-3.65303699809
seconds_system_resume_kernel_cpu{perf}=3.696576
seconds_system_resume_kernel_dev{perf}=0.965772
seconds_system_resume_kernel{perf}=4.660412
seconds_system_resume{perf}=1.00737500191
seconds_system_suspend{perf}=0.188161
nyan-big:
seconds_dev_0-0010_resume{perf}=0.034555
seconds_dev_1-0015_resume{perf}=0.114335
seconds_dev_1-004a_resume{perf}=0.033372
seconds_dev_1-2_resume{perf}=0.08651
seconds_dev_70030000.hda_resume{perf}=0.059809
seconds_dev_70030000.hda_suspend{perf}=0.143084
seconds_dev_70090000.usb_resume{perf}=0.063344
seconds_dev_700b0000.sdhci_suspend{perf}=0.115156
seconds_dev_700b0600.sdhci_resume{perf}=0.998274
seconds_dev_usb1_resume{perf}=0.126649
seconds_system_resume_firmware_cpu{perf}=0
seconds_system_resume_firmware_ec{perf}=-3.00709506979
seconds_system_resume_firmware{perf}=-3.00709506979
seconds_system_resume_kernel_cpu{perf}=3.046227
seconds_system_resume_kernel_dev{perf}=1.02407
seconds_system_resume_kernel{perf}=4.068104
seconds_system_resume{perf}=1.06100893021
seconds_system_suspend{perf}=0.198106
nyan-kitty:
seconds_dev_1-2_resume{perf}=0.086702
seconds_dev_1-3.2_resume{perf}=0.115603
seconds_dev_1-3.2_suspend{perf}=0.04788
seconds_dev_1-3_resume{perf}=0.088506
seconds_dev_70030000.hda_resume{perf}=0.058824
seconds_dev_70030000.hda_suspend{perf}=0.141733
seconds_dev_70090000.usb_resume{perf}=0.063288
seconds_dev_700b0000.sdhci_suspend{perf}=0.110109
seconds_dev_700b0600.sdhci_resume{perf}=0.99832
seconds_dev_usb1_resume{perf}=0.067829
seconds_system_resume_firmware_cpu{perf}=0
seconds_system_resume_firmware_ec{perf}=-3.64791600887
seconds_system_resume_firmware{perf}=-3.64791600887
seconds_system_resume_kernel_cpu{perf}=3.693894
seconds_system_resume_kernel_dev{perf}=1.024169
seconds_system_resume_kernel{perf}=4.715887
seconds_system_resume{perf}=1.06797099113
seconds_system_suspend{perf}=0.207672
veyron-jerry:
seconds_dev_1-1_resume{perf}=0.836824
seconds_dev_2-0010_resume{perf}=0.02903
seconds_dev_2-1_resume{perf}=0.0555
seconds_dev_3-1_resume{perf}=0.055298
seconds_dev_4-0015_resume{perf}=0.112429
seconds_dev_ff500000.usb_suspend{perf}=0.018475
seconds_dev_mmc0-0001_suspend{perf}=0.038248
seconds_dev_mmc2-0001_suspend{perf}=0.106872
seconds_dev_usb1_resume{perf}=0.209786
seconds_dev_usb3_resume{perf}=0.017674
seconds_system_resume_firmware_cpu{perf}=0
seconds_system_resume_firmware_ec{perf}=0.0836409621277
seconds_system_resume_firmware{perf}=0.0836409621277
seconds_system_resume_kernel_cpu{perf}=0.004322
seconds_system_resume_kernel_dev{perf}=1.072691
seconds_system_resume_kernel{perf}=1.078796
seconds_system_resume{perf}=1.16243696213
seconds_system_suspend{perf}=0.200017
,
Nov 22 2016
There seems to be an issue with how uptime, and the kernel timestamps are kept in sync. In the beginning of the resume sequence, around the time we ask for the filesystem to sync, we see this: 2016-11-22T15:17:35.578265-08:00 INFO kernel: [ 112.560164] Loaded LP0 firmware with request_firmware. 2016-11-22T15:17:39.962666-08:00 INFO kernel: [ 112.566107] PM: Syncing filesystems ... done. 2016-11-22T15:17:39.962694-08:00 DEBUG kernel: [ 112.590676] PM: Preparing system for mem sleep So the kernel timestamp jumps ~4 seconds forward, while the uptime counter remains constant. Then later, when the device wakes, the kernel timestamp barely increases, while the uptime counter leaps forward (as expected) 2016-11-22T15:17:39.969598-08:00 INFO kernel: [ 112.788738] PMC wake enable = 0x40c028000040900 2016-11-22T15:17:39.969600-08:00 INFO kernel: [ 112.789638] Entering suspend state LP0 2016-11-22T15:17:39.969606-08:00 INFO kernel: [ 115.983301] PMC wake status = 0x40000 2016-11-22T15:17:39.969609-08:00 INFO kernel: [ 115.983313] Resume caused by PMC WAKE18, 7000d000.i2c On other devices (lulu, veyron_jerry, below log-lines (adjusted) pasted in for reference) we see a similar jump in the beginning, but not a similar jump at the end. Questions: (1) is the time jump in the beginning of the suspend-sequence expected/meaningful? (1-2) what's causing it (2) what's causing the 2nd jump in the end on Tegra K1 systems. veyron_jerry: (a) 2016-11-22T15:47:04.004955-08:00 INFO kernel: [3126087.289558] PM: Syncing filesystems ... done. 2016-11-22T15:47:04.005003-08:00 DEBUG kernel: [3126087.299715] PM: Preparing system for mem sleep 2016-11-22T15:47:10.069409-08:00 NOTICE kernel: [3126087.308266] Freezing user space processes ... 2016-11-22T15:47:10.069474-08:00 NOTICE kernel: [3126087.310310] Freezing remaining freezable tasks (b) 2016-11-22T15:47:10.072798-08:00 NOTICE kernel: [3126087.499723] CPU2: shutdown 2016-11-22T15:47:10.072802-08:00 NOTICE kernel: [3126087.501728] CPU3: shutdown 2016-11-22T15:47:10.072807-08:00 INFO kernel: [3126087.503337] Enabling non-boot CPUs ... 2016-11-22T15:47:10.072814-08:00 NOTICE kernel: [3126087.504547] CPU1: Booted secondary processor 2016-11-22T15:47:10.072818-08:00 INFO kernel: [3126087.504829] CPU1 is up lulu: (a) 2016-11-22T23:39:32.836894+00:00 NOTICE powerd_suspend[8710]: Finalizing suspend 2016-11-22T23:39:32.841866+00:00 INFO kernel: [37496.878553] PM: suspend entry 2016-11-22 23:39:32.841181396 UTC 2016-11-22T23:39:38.316446+00:00 INFO kernel: [37496.878571] PM: Syncing filesystems ... done. 2016-11-22T23:39:38.317134+00:00 DEBUG kernel: [37496.886211] PM: Preparing system for mem sleep (b) 2016-11-22T23:39:38.318272+00:00 NOTICE kernel: [37497.069245] Disabling non-boot CPUs ... 2016-11-22T23:39:38.318274+00:00 INFO kernel: [37497.070663] smpboot: CPU 1 is now offline 2016-11-22T23:39:38.318276+00:00 INFO kernel: [37497.072705] Disabling VMX on cpu 0 2016-11-22T23:39:38.318279+00:00 INFO kernel: [37497.072716] TSC at resume: 380552836
,
Jan 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/7fd15a46c739534c63c1ccba7e2baa2e3a02ec37 commit 7fd15a46c739534c63c1ccba7e2baa2e3a02ec37 Author: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Date: Thu Dec 08 22:01:51 2016 power_suspend: differentiating between end_suspend_time and start_resume_time On Tegra K1 devices, the uptime counter does not pause when we suspend. This leads to false values in our suspend tests, because we assume that end_suspend_time == start_resume time. Change adds an explicit getter inside send_metrics_on_resume for end_suspend_time. Additionally, I updated the list of messages to more accurately reflect the last, and first message in the suspend/resume path for each system we use. BUG= chromium:667610 TEST=manual testing on nyan_big, and lulu Change-Id: Ie0a7862a83eb15f0724fb92b1f34463ce9fdfd0d Reviewed-on: https://chromium-review.googlesource.com/418221 Commit-Ready: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Tested-by: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> Reviewed-by: Todd Broch <tbroch@chromium.org> [modify] https://crrev.com/7fd15a46c739534c63c1ccba7e2baa2e3a02ec37/power_manager/tools/send_metrics_on_resume
,
Jan 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/c4e257d87e1861c345b02baef217bc0f503628e6 commit c4e257d87e1861c345b02baef217bc0f503628e6 Author: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Date: Thu Jan 19 22:15:32 2017 power: remove trailing \ that made send_metrics_on_resume script abort get_end_suspend_time had a trailing \ that made the script fail, and it's now removed BUG= chromium:667610 TEST=manual testing on samus Change-Id: I44b50c01cf2f85419742c8aa8cc6c66d6d8291b2 Reviewed-on: https://chromium-review.googlesource.com/430202 Commit-Ready: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Tested-by: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Todd Broch <tbroch@chromium.org> [modify] https://crrev.com/c4e257d87e1861c345b02baef217bc0f503628e6/power_manager/tools/send_metrics_on_resume
,
Jan 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/89073104af43d4d89952df1944f155566d0c8d43 commit 89073104af43d4d89952df1944f155566d0c8d43 Author: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Date: Thu Jan 19 22:15:32 2017 power: remove trailing \ that made send_metrics_on_resume script abort get_end_suspend_time had a trailing \ that made the script fail, and it's now removed BUG= chromium:667610 TEST=manual testing on samus Change-Id: I44b50c01cf2f85419742c8aa8cc6c66d6d8291b2 Reviewed-on: https://chromium-review.googlesource.com/430202 Commit-Ready: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Tested-by: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Todd Broch <tbroch@chromium.org> (cherry picked from commit c4e257d87e1861c345b02baef217bc0f503628e6) Reviewed-on: https://chromium-review.googlesource.com/434243 Commit-Queue: Ruben Rodriguez Buchillon <coconutruben@chromium.org> [modify] https://crrev.com/89073104af43d4d89952df1944f155566d0c8d43/power_manager/tools/send_metrics_on_resume
,
Jan 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/89073104af43d4d89952df1944f155566d0c8d43 commit 89073104af43d4d89952df1944f155566d0c8d43 Author: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Date: Thu Jan 19 22:15:32 2017 power: remove trailing \ that made send_metrics_on_resume script abort get_end_suspend_time had a trailing \ that made the script fail, and it's now removed BUG= chromium:667610 TEST=manual testing on samus Change-Id: I44b50c01cf2f85419742c8aa8cc6c66d6d8291b2 Reviewed-on: https://chromium-review.googlesource.com/430202 Commit-Ready: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Tested-by: Ruben Rodriguez Buchillon <coconutruben@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Todd Broch <tbroch@chromium.org> (cherry picked from commit c4e257d87e1861c345b02baef217bc0f503628e6) Reviewed-on: https://chromium-review.googlesource.com/434243 Commit-Queue: Ruben Rodriguez Buchillon <coconutruben@chromium.org> [modify] https://crrev.com/89073104af43d4d89952df1944f155566d0c8d43/power_manager/tools/send_metrics_on_resume
,
Mar 2 2017
Ruben is this fixed?
,
Mar 2 2017
It should be. I'll verify tomorrow morning and mark fixed
,
Mar 2 2017
check on wmatrix test runs, to see what prior to the fix, nyan* devices would report a negative time. Now they report all positive times. |
||||
►
Sign in to add a comment |
||||
Comment 1 by coconutruben@chromium.org
, Nov 22 2016Labels: Arch-ARM