Throttling of script/css load events
Reported by
bmau...@fb.com,
Feb 1 2017
Issue description
Wanted to follow up on a conversation we've had on loading-dev (https://groups.google.com/a/chromium.org/forum/#!topic/loading-dev/XiZuCoMh57g). We've been working on a Chrome extension, deployed to FB employees, that correlates FB's tracing data with about:tracing to collect more holistic traces. One thing we noticed is a number of traces where:
1) our heartbeat mechanism, which uses a periodic timer to detect whether the main thread is idle, detects that the main thread is busy with an unknown task
2) about:tracing says that the page is idle.
Looking more deeply at the traces, we noticed that the first event after the page thawed was something like TaskQueueThrottler::OnTimeDomainHasDelayedWork. It was pretty easy to reproduce this situation by loading FB in a background tab and running about:tracing manually. This reproduces both in Chrome 55 and in Canary. Events for things like a <script> tag loading seem to be delayed long past the time when the resource is downloaded from the network.
Is this behavior intended? I guess there are two different views on this. One is that if FB is in the background, the user doesn't care about the loading performance, so the delays don't really matter. On the other hand, if a user loads a tab in the background they may want to go there eventually, and it'd be good to have it complete quickly.
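For readers unfamiliar with the heartbeat idea, here is a minimal sketch of how gaps could be derived from tick timestamps. It is not FB's actual implementation: the function name, the `Gap` shape and the tolerance value are assumptions made for illustration.

```typescript
// Hypothetical helper: given the times at which a periodic timer actually
// fired, report every stretch where the next tick arrived later than expected.
interface Gap {
  startMs: number;    // time of the last tick before the gap
  durationMs: number; // how much later than intervalMs the next tick arrived
}

function findHeartbeatGaps(
  tickTimesMs: number[],
  intervalMs: number,
  toleranceMs: number
): Gap[] {
  const gaps: Gap[] = [];
  let prev: number | undefined;
  for (const t of tickTimesMs) {
    if (prev !== undefined) {
      const overshoot = t - prev - intervalMs;
      if (overshoot > toleranceMs) {
        gaps.push({ startMs: prev, durationMs: overshoot });
      }
    }
    prev = t;
  }
  return gaps;
}

// In a page, the tick times could be collected with something like:
//   const ticks: number[] = [];
//   setInterval(() => ticks.push(performance.now()), 30);
```

A gap only says that the timer did not run on time. It cannot tell a busy main thread apart from a timer that the browser chose to delay, which is the ambiguity this report is about.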
,
Mar 11 2017
We *are* throttling work in background tabs in general. But I can't quite be sure from the description whether this is that throttling working as intended or some other bug. To make sure I understand, you have a setInterval(..., 1000) or something like that and in background tabs you're seeing it not run every second at times when the page is idle? That, in and of itself, is expected if the background tab is using more than 1% of wall time doing work (whether that work is from timers or not I believe...altimin can you confirm?). But this should only delay the timer tasks themselves (i.e. setTimeout/setInterval). Loading and execution of script should not be delayed with the thing we're currently shipping. If you have a trace where you see that the script finished downloading and its start of execution was significantly delayed, then that's likely an unintended bug or due to some other change. That all said, we plan to *start* exploring throttling all tasks, not just timer tasks. We're aiming to not have too much of an impact on background tab network loading, but there will be a tricky balance there between background loading, foreground responsiveness and battery savings.
,
Mar 13 2017
We had seen instances where the page had: <script src="x.js" onload="y()" /> and y() seems to have been delayed by TaskQueueThrottler::OnTimeDomainHasDelayedWork. FWIW when I look at our aggregate data for initial page loads using our "heartbeat" method of detecting if the browser's main thread seems to be occupied but we don't have a currently running task I see a spike of "unknown times" around 1 second. To me this seems to indicate that a non-trivial number of FB pageloads happen in a background tab. It seems like ~ 10% of our unattributed time is due to 1 second chunks. Presumably even though these tabs are in the background the user does actually want FB to complete loading so they can look at the tab.
,
Mar 13 2017
That sounds like a bug and shouldn't happen at the moment (Chrome 55 may have had some problems). Do you happen to have a trace?
,
Mar 13 2017
,
Mar 13 2017
I've attached 3 traces.
trace_normal: This is a normal focused load. This one has our heartbeat timer, which you can see as a periodic 30ms event during the 'idle' period where we're waiting on document chunks and/or network.
trace_two_pauses: The tab is moved to the background quickly. Here we see a one-second gap twice. The gaps occur under 'RendererSchedulerIdlePeriod:LongIdlePeriodPaused' in the trace, so I'm not sure whether that indicates it's intentional, since I see that in the normal trace as well.
trace_9_5sec_gap: Similar to the previous one but it came out way worse.
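Gaps like these can also be pulled out of an exported trace mechanically. The sketch below assumes the JSON Trace Event Format, where complete events have `ph: "X"` and `ts`/`dur` are in microseconds. The function name, the `ThreadGap` shape and the threshold are illustrative assumptions.

```typescript
// Subset of the Trace Event Format fields used here.
interface TraceEvent {
  name: string;
  ph: string;   // "X" marks a complete event carrying a duration
  ts: number;   // start timestamp in microseconds
  dur?: number; // duration in microseconds
  tid: number;
}

interface ThreadGap {
  afterUs: number;   // when the thread last finished a known event
  gapUs: number;     // length of the gap in microseconds
  nextEvent: string; // name of the first event after the gap
}

// Hypothetical helper: list stretches of at least minGapUs during which the
// given thread recorded no complete event.
function findThreadGaps(events: TraceEvent[], tid: number, minGapUs: number): ThreadGap[] {
  const onThread = events
    .filter((e) => e.tid === tid && e.ph === "X")
    .sort((a, b) => a.ts - b.ts);
  const gaps: ThreadGap[] = [];
  let busyUntil: number | undefined;
  for (const e of onThread) {
    if (busyUntil !== undefined && e.ts - busyUntil >= minGapUs) {
      gaps.push({ afterUs: busyUntil, gapUs: e.ts - busyUntil, nextEvent: e.name });
    }
    const end = e.ts + (e.dur === undefined ? 0 : e.dur);
    busyUntil = busyUntil === undefined ? end : Math.max(busyUntil, end);
  }
  return gaps;
}
```

The name of the first event after each gap is often the most useful field, since it hints at what was waiting to run.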
,
Mar 20 2017
Thanks for the traces! The code in example #3 shouldn't get throttled since there are no timer tasks involved as far as I can tell. Are we sure there aren't any timers involved in the larger example? In general throttling timer work in background tabs is expected. Loading tasks aren't currently throttled, although we may experiment with doing that. Which browser version were you testing with by the way? M57 has a 10 second grace period for background tabs which should alleviate some of this.
,
Mar 20 2017
The version is in the trace 'M' metadata: Chrome/59.0.3037.0
> Are we sure there aren't any timers involved in the larger example?
I'm not sure what you mean? After the large gap in the facebook.com tab there are several timer events. We also don't see the heartbeat timer (our layman user profile using web API) pattern like you see in the normal trace.
It could be useful to add how much the timer was delayed by to the Chrome tracing category data for 'TimerBase::run'.
,
Mar 20 2017
Thanks!
> I'm not sure what you mean? After the large gap in the facebook.com tab there's
> several timer events. We also don't see the heartbeat timer (our layman user
> profile using web API) pattern like you see in the normal trace.
I was wondering if any part of loading the resources involves timers -- either the task that triggers the loading or the one that declares it to be done. If it's just loading tasks, then they shouldn't get throttled under any circumstances.
> It could be useful to add to chrome tracing category data for
> 'TimerBase::run' how much the timer was delayed by.
If you turn on the "renderer.scheduler" and "renderer.scheduler.debug" categories in the right hand side list you'll get a lot more data about the tasks, including the originally scheduled run time. (It'd be useful if you posted one of these traces too.)
,
Mar 21 2017
I'm trying to get better data to move this bug in the right direction. I see profiles with a large pause followed by a TimerBase::run. The problem is that V8 Sample is also getting paused when the tab is in the background. This category is useful for understanding what the timer is. If you have any idea why V8 Sample would also be throttled aggressively when the tab is in the background, that would be useful in getting a better trace.
,
Mar 21 2017
,
Mar 22 2017
Sorry, dumb question: what do you mean by v8 Sample in this case? In any case, a trace with the disabled-by-default renderer.scheduler and renderer.scheduler.debug categories enabled should tell us if there are some pending tasks that we're choosing not to run because of throttling.
,
Mar 22 2017
Here's a better trace. Would love to know what you're looking for in them. From what I can tell there are a lot of pending events. When I see something like V8.Execute from TimerBase::Run, I can look at the V8 Sample events that fall within it to find out which setTimeout fired and figure out what piece of JS is scheduling this timer.
I did have a look and Facebook does make use of setTimeout 0 during loading. I'm not sure that advising developers to stop using setTimeout 0 during startup to avoid getting throttled is the right thing to do, especially if they move to something similar that bypasses the throttling.
In the case where the browser is idle (not doing session restore on 100s of tabs) and a user loads a site, switching to another tab should not slow down the background load by 10 seconds. A throttle grace period while the loading spinner is active would be a big improvement for user experience. From our data it's pretty common for users to load a site like Facebook and switch to another tab while it loads.
,
Mar 24 2017
Thanks for the detailed trace. It looks like the page is backgrounded and still within the 10s no-throttling grace period (you can tell because the RendererScheduler state dumps have "page_visible: false" but "page_throttled: false"). However, because nothing else is using the same renderer, we also set "renderer_backgrounded: true", and I think the bug is that we don't apply the 10s grace period to that [1], so timers become throttled immediately. Given that we track visibility for all WebViews, I wonder if looking at the entire renderer backgrounded state is necessary anymore for throttling -- Alexander? Do you think the 10s grace period will be long enough for Facebook?
[1] https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc?rcl=6d6eb7494eacfa69b2cbb22c673ce08d31a0ac7f&l=1115
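The suspected bug can be restated as a small decision function. This is only a model of the description above, not the code in renderer_scheduler_impl.cc: the field names mirror the state dump keys, while the function names and the exact shape of a fix are assumptions.

```typescript
// Hypothetical model of the state discussed above.
interface SchedulerState {
  pageVisible: boolean;
  rendererBackgrounded: boolean;
  msSincePageHidden: number; // only meaningful while the page is hidden
}

const GRACE_PERIOD_MS = 10000;

// Behaviour as described: the grace period gates page-level throttling,
// but a backgrounded renderer throttles timers right away.
function timersThrottledAsDescribed(s: SchedulerState): boolean {
  const pageThrottled = !s.pageVisible && s.msSincePageHidden >= GRACE_PERIOD_MS;
  return pageThrottled || s.rendererBackgrounded;
}

// One possible fix (an assumption): apply the same grace period to both signals.
function timersThrottledWithGrace(s: SchedulerState): boolean {
  const graceOver = s.msSincePageHidden >= GRACE_PERIOD_MS;
  return graceOver && (!s.pageVisible || s.rendererBackgrounded);
}
```

For a tab hidden 2 seconds ago in a renderer with no other pages, the first function reports throttled timers and the second does not. Both agree once the 10 seconds have passed.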
,
Mar 24 2017
Yes, a 10s grace period would cover a large majority of our loads. It would solve the 'load facebook, switch to tab X while it loads' use case.
,
Mar 24 2017
I need to look into renderer backgrounding vs page backgrounding in more detail. I suspect that there are some corner cases on some platforms (like minimizing windows). But in any case we need to refactor the whole visibility-related logic.
,
Mar 24 2017
Something that might be nice is to do: min(30 s, max(10 s, <load event complete>)). On a flaky connection 10 seconds may not be enough. In addition, waiting until the load event might allow the 10s to be adjusted down and increase the amount of savings for tabs that are loaded but in the background.
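One reading of this proposal is a clamp: keep the tab unthrottled until its load event, but for no less than a floor and no more than a ceiling. The sketch below takes that reading as an assumption. The function name, the parameter names and the handling of a load event that has not fired yet are all illustrative.

```typescript
// Hypothetical: when should the grace period end, in ms since backgrounding?
// loadEventMs is null while the load event has not fired yet.
function graceEndMs(
  loadEventMs: number | null,
  minMs: number = 10000,
  maxMs: number = 30000
): number {
  // Without a load event, fall back to the ceiling.
  const candidate = loadEventMs === null ? maxMs : loadEventMs;
  return Math.min(maxMs, Math.max(minMs, candidate));
}
```

A page that finishes loading at 3 s keeps the 10 s floor, one that finishes at 18 s is covered until 18 s, and a page still loading at 45 s is cut off at 30 s. Lowering `minMs` is the "adjusted down" option mentioned above.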
,
Mar 24 2017
Another option would be to give background tabs 10s of initial CPU time budget instead of looking at wall time. That way they could still complete loading even if the network is being slow.
,
Mar 24 2017
Yes, 10s of CPU time would be a big improvement over wall time. This would be great for users on slow networks where we might need to do a few round trips since we delay conditional resources and/or where the resources will trickle in slowly. Websites won't be penalized as long as they process the resources and go to sleep quickly, which is a best practice. Background loads (excluding prerender) account for a high single-digit percentage of page loads, so this is an important change.
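The difference between the two budgets is easiest to see on a slow network, where tasks are short but spread far apart. The following is a toy comparison rather than Chromium's scheduler. The `Task` shape and both function names are assumptions.

```typescript
// Hypothetical task record; startMs is relative to when the tab was backgrounded.
interface Task {
  startMs: number;
  durationMs: number;
}

// Wall-clock grace period: a task escapes throttling only if it starts in time.
function countUnthrottledByWallClock(tasks: Task[], graceMs: number): number {
  return tasks.filter((t) => t.startMs < graceMs).length;
}

// CPU budget: tasks escape throttling until their summed run time uses up the budget.
function countUnthrottledByCpuBudget(tasks: Task[], budgetMs: number): number {
  let usedMs = 0;
  let count = 0;
  for (const t of tasks) {
    if (usedMs >= budgetMs) break;
    usedMs += t.durationMs;
    count++;
  }
  return count;
}
```

With resources trickling in at 2 s, 12 s and 25 s, each needing well under 100 ms of processing, a 10 s wall-clock grace period covers only the first task, while a 10 s CPU budget covers all three.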
,
Mar 24 2017
It seems that the problem is that dispatchProgressEvent() from XMLHttpRequestProgressEventThrottle.cpp gets posted to default timer queue, and it should go to loading queue. But that still doesn't explain the very long delay — even with throttling delays should be <= 1 second.
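The "<= 1 second" bound follows if throttled timers may only wake up on fixed boundaries. Here is a toy model assuming a 1-second alignment (the thread gives the bound, not the mechanism). The function names are illustrative.

```typescript
// Hypothetical: a throttled timer runs at the next alignment boundary
// at or after the time it asked for.
function alignedWakeUpMs(requestedMs: number, alignmentMs: number): number {
  return Math.ceil(requestedMs / alignmentMs) * alignmentMs;
}

// Extra delay added by alignment; always less than alignmentMs.
function addedDelayMs(requestedMs: number, alignmentMs: number): number {
  return alignedWakeUpMs(requestedMs, alignmentMs) - requestedMs;
}
```

Under this model a task requested at 1030 ms runs at 2000 ms, a 970 ms delay, and no single task waits a full second. A multi-second gap therefore needs another explanation, such as a chain of dependent timers or a different throttling path.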
,
Apr 26 2017
Sorry for the long delay, I was slightly overwhelmed by other problems. I tried to reproduce the problem and didn't have any success: Facebook loads in ~3 seconds for me. Could you help me with reproducing it?
,
May 10 2017
I'm having trouble reproducing it as well, but I'm still seeing this issue in our aggregate traces. I'll report back when I have more actionable data. If there is particular tracing data that would be sufficient for solving this bug, let me know and I can try to get it for you.
,
May 10 2017
The renderer.scheduler category is the most useful here, including the disabled-by-default ones: renderer.scheduler and renderer.scheduler.debug (note they can generate heavy traces). Other default "javascript and rendering" categories can be useful too.
,
Apr 3 2018
Marking as wontfix as a part of Blink>Scheduling bug review. Please feel free to reopen this bug when (if) you have a repro.
Comment 1 by altimin@chromium.org
, Mar 10 2017