Library prefetch may slows down startup badly |
||
Issue descriptionWe collected startup traces from a small % of users. Some traces have really long startup. This bug is filed based on the traces and we do not have repro steps. See report: https://crash.corp.google.com/browse?q=reportid=%27c7641f2c15ba8150%27#6 The LibraryLoader.asyncPrefetchLibrariesToMemory took 58 seconds to complete. While this was running in background the main thread is slowed down badly. The device seems to have 2GB RAM. But, it looks like all the tasks are slow in this case. So, not sure if there are external factors. Would it help if we stopped prefetch if it took much longer than expected and let it page fault while running? Is it even possible to stop prefetch?
,
Nov 6
Thanks for reports! Something real from the field is always exciting! My interpretation of the trace in comment#1: Having not looked in CCT code in a while, I *think* the trace in #1 shows CCT warmup(), which happens while Chrome processes are in background cgroup and probably starves CPU, which is WAI - should not steal CPU cycles from a foreground app (probably AGSA in this case). The CPU usage graph shows low usage for most of the duration of the start, and then jumps up. The moment it jumps is probably when the custom tab gets to foreground. The library prefetch finished in 76ms *after* that. My interpretation is that the prefetch was blocked by the same reasons the other tasks were blocked. The usual mental model of the library prefetch (that we use) is that it is blocked by IO and does not care about CPU resources. Another aspect of library prefetch is that it can only interfere by creating extra IO. Most of the time it should help with making IO sequential, but I guess there could be scenarios where it makes things slower. I think those scenarios are unlikely, though I don't have much data. On perf bots as well as on the slow reports it takes about 10ms of CPU time to do the prefetches, but the wall time is significantly different. The difference in wall time can be explained by: 1. CPU being occupied by something else than Chrome 2. disk being super busy The (1) seems to be aligned with observations above. The (2) contradicts the fact that a bunch of database opens at TaskSchedulerForegroundBlockingBlah happen in <1ms non-CPU time. Disk IO seems unoccupied on the trace from comment#2. === My interpretation of the trace in the initial description is also CPU starvation due to being in background. Again, CPU spikes *earlier* than the library prefetcher finishes, allows BrowserMainLoop::CreateThreads:PreMainMessageLoopRun to start spinning. Since this has no tasks related to activity initialization, I would suggest that this is background task that briefly goes to the network and shuts down by itself (downloads? offline? sync? broadcast? hard to tell) Here some disk tasks seem to be taking a long time, like 100ms in cookie store init .. possibly disk contention as well, which library prefetcher could theoretically increase. The 3s overlap with active browser work is kinda unfortunate, something we may want to fix if we find evidence that it affects UX negatively. === re: stopping the library prefetch: yes, that's possible by sending SIGSTOP to the process and then SIGCONT. Figuring out the right criteria for it would be non-trivial. Also not sure which metrics to watch on such a Finch experiment. Overall: looks mostly WAI to me.
,
Nov 6
oh, one last thought: I think making it obvious when during the trace Chrome was in foreground .. would help in interpretation of the results a lot! WDYT?
,
Nov 6
Looked at the traces as well, and this is definitively coming from CCT's warmup(), and as such WAI and not user-visible. |
||
►
Sign in to add a comment |
||
Comment 1 by ssid@chromium.org
, Oct 31