Chrome spent 7.8s in ContextCacheController::OnIdle() on UI thread |
|||
Issue descriptionChrome Version: 57.0.2987.133 (Official Build) (64-bit) OS: Windows 10 This happened on a Windows 10 partition on my personal 2016 MacBook Pro. Shortly after boot up and launching Chrome, it froze for a number of seconds. Once it recovered, I went to check chrome://profiler and sorted by browser process & UI thread to see if any task took a long time on the UI thread and indeed saw: Count Total run time Avg run time Max run time** Avg queue time Max queue time Birth thread PID Function name Source location 17 7,863 463 7,863 1 5 CrBrowserMain 796 PostIdleCallback context_cache_controller.cc [131] So PostIdleCallback (which is the FROM_HERE of the posted task ContextCacheController::OnIdle) took 7.8s as its max runtime and 463ms as average run time - on the UI thread. Yuck. Looking at the code, it seems to do grab some locks. I don't think it's acceptable to freeze UI thread for so much time - can the code be changed to have a very short timeout if it can't grab the lock and post the task to try again later? (since presumably since this is an "idle callback" it's not critical?). This way, UI thread won't be blocked while the lock is busy and other browser posted tasks can continue to execute instead of waiting for this function to complete.
,
Apr 19 2017
It does seem odd that PostDelayedTask would block for a long time. If it is blocking then that will probably cause other problems which means that the proposed fix - while generally a good idea (don't hold locks for longer than necessary) is addressing more symptoms than cause. An ETW trace (go/etw) would probably let us figure out what was going on because it captures all threads blocked on locks, who readied whom, CPU and disk usage, etc. But, it's not normally running. If this happens again then maybe leave UIforETW running in the background with tracing running, set to circular buffers. This has very low overhead and is perfect for this kind of rare performance problems. If the bug happens again then type Ctrl+Win+C (or press Save Trace Buffers) after the system has recovered to save the circular buffers to disk. They will record the last twenty to sixty (roughly) seconds of system activity. I can analyze the trace.
,
May 3 2017
note: chrome://profiler is counter-intuitive, "Function name" is the function that *posted* the task (the FROM_HERE), not what's executing. What took long is not PostDelayedTask, but ContextCacheController::OnIdle This is likely a dup of several other instances such as crbug.com/661306, crbug.com/703342, crbug.com/667820 or crbug.com/697703 where everything points to the IO thread being blocked, and preventing communication with the GPU process.
,
Jul 19
piman@, I vaguely recall you making some changes to prevent blocking in OnIdle (or maybe in flush, which is used by that). I found this CL, but couldn't remember if there were others: https://chromium-review.googlesource.com/c/chromium/src/+/634469 If this is likely addressed, let's close this out.
,
Jul 19
SetAggressivelyFreeResources should not block any more. I'm not sure if GrContext::purgeUnlockedResources might block, but I would assume not. I don't see evidence of crashes or hangs inside of OnIdle (which used to happen in the past), so I will mark this as fixed (feel free to reopen with new evidence if it is not). |
|||
►
Sign in to add a comment |
|||
Comment 1 by ericrk@chromium.org
, Apr 19 2017