New issue
Advanced search Search tips

Issue 712032 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

Chrome spent 7.8s in ContextCacheController::OnIdle() on UI thread

Project Member Reported by asvitk...@chromium.org, Apr 16 2017

Issue description

Chrome Version: 57.0.2987.133 (Official Build) (64-bit)
OS: Windows 10

This happened on a Windows 10 partition on my personal 2016 MacBook Pro. Shortly after boot up and launching Chrome, it froze for a number of seconds.

Once it recovered, I went to check chrome://profiler and sorted by browser process & UI thread to see if any task took a long time on the UI thread and indeed saw:

Count	Total run time	Avg run time	Max run time**	Avg queue time	Max queue time	Birth thread	PID	Function name	Source location
17	7,863	463	7,863	1	5	CrBrowserMain	796	PostIdleCallback	context_cache_controller.cc [131]


So PostIdleCallback (which is the FROM_HERE of the posted task ContextCacheController::OnIdle) took 7.8s as its max runtime and 463ms as average run time - on the UI thread. Yuck.

Looking at the code, it seems to do grab some locks. I don't think it's acceptable to freeze UI thread for so much time - can the code be changed to have a very short timeout if it can't grab the lock and post the task to try again later? (since presumably since this is an "idle callback" it's not critical?). This way, UI thread won't be blocked while the lock is busy and other browser posted tasks can continue to execute instead of waiting for this function to complete.

 

Comment 1 by ericrk@chromium.org, Apr 19 2017

I agree - the goal of the code was never to wait on locks for any amount of time. I'm actually a little surprised that the locks in use can block for this duration. There are two locks:

context_lock_ - we never wait on this, we try to acquire it, and if we fail, we return. This is the lock I expect to see contention on.

current_idle_generation_lock_ - we hold this while modifying current_idle_generation_ - should not ever block for long. There is a currently a bug where we actually hold this while calling TaskRunner::PostDelayedTask - so maybe if that can block for a long time we could hit this. However, I wouldn't expect TaskRunner::PostDelayedTask to block for long?

I'll put together a CL to address this issue - hopefully that will fix things.
It does seem odd that PostDelayedTask would block for a long time. If it is blocking then that will probably cause other problems which means that the proposed fix - while generally a good idea (don't hold locks for longer than necessary) is addressing more symptoms than cause.

An ETW trace (go/etw) would probably let us figure out what was going on because it captures all threads blocked on locks, who readied whom, CPU and disk usage, etc. But, it's not normally running. If this happens again then maybe leave UIforETW running in the background with tracing running, set to circular buffers. This has very low overhead and is perfect for this kind of rare performance problems. If the bug happens again then type Ctrl+Win+C (or press Save Trace Buffers) after the system has recovered to save the circular buffers to disk. They will record the last twenty to sixty (roughly) seconds of system activity. I can analyze the trace.

Comment 3 by piman@chromium.org, May 3 2017

note: chrome://profiler is counter-intuitive, "Function name" is the function that *posted* the task (the FROM_HERE), not what's executing.
What took long is not PostDelayedTask, but ContextCacheController::OnIdle

This is likely a dup of several other instances such as crbug.com/661306, crbug.com/703342,  crbug.com/667820  or  crbug.com/697703  where everything points to the IO thread being blocked, and preventing communication with the GPU process.
Cc: ericrk@chromium.org
Owner: piman@chromium.org
piman@, I vaguely recall you making some changes to prevent blocking in OnIdle (or maybe in flush, which is used by that). I found this CL, but couldn't remember if there were others: https://chromium-review.googlesource.com/c/chromium/src/+/634469

If this is likely addressed, let's close this out.
Status: Fixed (was: Assigned)
SetAggressivelyFreeResources should not block any more. I'm not sure if GrContext::purgeUnlockedResources might block, but I would assume not.

I don't see evidence of crashes or hangs inside of OnIdle (which used to happen in the past), so I will mark this as fixed (feel free to reopen with new evidence if it is not).

Sign in to add a comment