Expected Queueing Time is reported for time when machine is suspended. |
||||||||
Issue descriptionTo repro: - Suspend computer - Wait - Keep waiting (a long time) - Wakeup computer We'll report that the EQT has been 0 for a long time. Ideally, we'd exclude time when the computer was suspended from EQT calculation. To do this, we need to talk to base::PowerObserver. This means either plumbing power information to the renderer, or sending EQT data to the browser before reporting it. Perhaps we can use the existing GRC plumbing for this?
,
Oct 18 2017
The code I removed https://chromium-review.googlesource.com/c/chromium/src/+/721613 to resolve https://bugs.chromium.org/p/chromium/issues/detail?id=763710 was the code sending EQT to browser. What do you think if we just abort the calculation of EQT when the machine is about to be suspended and restart to calculate EQT when the machine is resumed? EQT is one of the most important metrics we collect to evaluate Background tabs opening and SessionRestore through GRC, so maybe P1? :)
,
Oct 18 2017
We should do this today, as we don't compute EQT when a renderer is backgrounded, and we should consider all renderers backgrounded when the machine is suspended. Nicolas is verifying that this works correctly.
,
Oct 18 2017
So, the code in question is https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/scheduler/base/queueing_time_estimator.cc?type=cs&q=OnQueueingTimeForWindowEstimated&sq=package:chromium&l=170? If I'm reading that right, any time a tab is goes from background to foreground, or when a foreground tab goes from suspended to not, we do O(seconds) work. I'm skeptical we should be doing that, but if we do decide to keep it, OnQueueingTimeForWindowEstimated seems like an easily misused callback. It's not clear to me why OnQueueingTimeForWindowEstimated exists actually. Aside from the code lpy just removed, it seems like renderer_scheduler_impl just uses this to do histograms that could instead be in queueing_time_estimator_directly. Am I misreading the code?
,
Oct 18 2017
What do you mean by "do O(seconds) work"? The line you point to should not be executed if |renderer_backgrounded| is set to true. I'm guessing the problem may be that either |renderer_backgrounded| does not change when the machine is suspended or it does but the |transition_time| is calculated after it wakes up, so we think that all the sleepy time was actually foreground. I think OnQueueingTimeForWindowEstimated exists just so that RecordQueueingTimeClient can reuse the code without reporting to UMA. But yes the UMA reporting code could be in either of the two classes, it doesn't really matter.
,
Oct 23 2017
,
Oct 23 2017
Raising this to pri-1 as features are depending on this to be fixed.
,
Oct 23 2017
I confirmed that the UMA histogram is also getting filled with seconds corresponding to when we are asleep. We need a way to detect that the computer slept from renderer_scheduler_impl.
,
Oct 23 2017
I think maybe we can use PowerObserver https://cs.chromium.org/chromium/src/base/power_monitor/power_observer.h
,
Oct 24 2017
It feels like the browser should tell all renderers they are backgrounded during suspension. Plumbing in the power state as an additional bit feels a little too orthogonal. Can we do that from GRC?
,
Oct 24 2017
I agree that the fix for this should be in the logic that handles backgrounding.
,
Oct 24 2017
> It feels like the browser should tell all renderers they are backgrounded during suspension. I don't agree. I don't think we should background renderers on suspend. Suspend should be mostly just be transparent to most things. Nothing can run while computer is sleeping, so it's kind of silly to try to run code to react to suspend, since that code could very well run after computer has resumed. This timing thing is a special case which I hadn't thought about before. But that doesn't change the fact that there should not be a background on computer suspend. Plus it doesn't really work either, most renderer are going to be backgrounded anyway, so if suspend is folded into background, then the background signal doesn't actually change on suspend for most renderers.
,
Oct 24 2017
How about we simply ignoring idle time that are longer than 30 seconds. That also align to the current implementation of MainThreadLoad, where it discards result from long idle period that are longer than 30 seconds.
,
Oct 24 2017
Re#13, that sounds like at least a viable short term fix for me. Then we can re-enable sending EQT sooner while trying to figure out what the best long-term solution is.
,
Oct 25 2017
That short term fix sounds fine to me.
,
Oct 25 2017
I was actually proposing a long term fix :) So my questions are: Why isn't it a long term fix? What should a long term fix look like?
,
Oct 25 2017
I think the reason MainThreadLoad discards tasks longer than 30 seconds is to work around this problem. Imagine a case where a user is reading a page, and there are legitimately no tasks occurring on the main thread. We currently wouldn't report any data. Now if we start executing a short task every second, we'll actually see average EQT decrease. There also could be cases where we're suspended for less than 30 seconds. The right long term fix would involve plumbing through that we've been suspended. It's possible that we won't be able to know precisely when we're suspended, but worst case, we can ignore all time from the end of the last task until we've woken up, which should be pretty accurate.
,
Oct 25 2017
In the case where we send EQT through GRC, if we can't know suspension precisely then the plumbing of suspension signal may be delayed by EQT IPCs in IO thread I think. But either way, I will write the patch to introduce idle discard and then get the EQT plumbing back.
,
Oct 25 2017
Already sent https://chromium-review.googlesource.com/c/chromium/src/+/737430 for review to discard idle.
,
Oct 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/67434a3e8cdce918b7fd51fa1d72456c63dba11a commit 67434a3e8cdce918b7fd51fa1d72456c63dba11a Author: Nicolas Pena <npm@chromium.org> Date: Thu Oct 26 15:27:48 2017 EQT: Skip extremely long tasks or idling In this CL, we improve EQT calculation by skipping steps when there is an extremely long task or idling period. The objective of this is to avoid reporting a lot of EQT numbers after the computer goes to sleep for a very long time. Bug: 776013 Change-Id: Ia46dea05a0936412a5924d0b2a0f3997e33a7ee3 Reviewed-on: https://chromium-review.googlesource.com/737430 Reviewed-by: lpy <lpy@chromium.org> Reviewed-by: Sami Kyöstilä <skyostil@chromium.org> Reviewed-by: Timothy Dresser <tdresser@chromium.org> Commit-Queue: Nicolás Peña Moreno <npm@chromium.org> Cr-Commit-Position: refs/heads/master@{#511824} [modify] https://crrev.com/67434a3e8cdce918b7fd51fa1d72456c63dba11a/third_party/WebKit/Source/platform/scheduler/renderer/queueing_time_estimator.cc [modify] https://crrev.com/67434a3e8cdce918b7fd51fa1d72456c63dba11a/third_party/WebKit/Source/platform/scheduler/renderer/queueing_time_estimator_unittest.cc
,
Oct 31 2017
Re #10: I'm not sure if that is possible - the problem is that we can't run any tasks when the machine is suspended, so we can't notify the renderers about it. And notifying the renderer after resuming the process can be tricky — all metric recording logic should be aware that a signal might come in the future and invalidate the previous metrics. So I believe that the current approach (skip > 30 second idle period) is optimal and I propose to consider this problem resolved. WDYT?
,
Nov 1 2017
Good point, we can't really rely on any logic getting a chance to run before the suspension happens. It'll run when we wake up so there might be a chance to clean up, but that seems more error prone than the 30 second threshold.
,
Nov 2 2017
Marking as Fixed for now. |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by tdres...@chromium.org
, Oct 18 2017