
Issue 809245

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Tons of background(?) cache work posted at USER_BLOCKING priority

Project Member Reported by gab@chromium.org, Feb 5 2018

Issue description

See screenshot (attached) and trace @ https://drive.google.com/file/d/1JOZG1kcOWCtAtFmP6jrmLXXPNABDFIfx/view?usp=sharing

The problem is that CloseInternal() and DoomEntryInternal() in simple_entry_impl.cc post what appears to be cleanup work at USER_BLOCKING priority, because they post through |worker_pool|, which is a single TaskRunner created with that priority.

These should just use base::PostTaskAndReply with TaskPriority::BACKGROUND (from base/task_scheduler/post_task.h).
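The intended effect can be sketched with a toy two-queue scheduler. This is standard C++ only; TinyScheduler, its priority enum, and RunCleanupDemo are illustrative names, not Chromium's base API. The point is simply that cleanup posted at BACKGROUND priority stays off the user-blocking critical path even if it was posted first:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical model of a priority-aware scheduler: all USER_BLOCKING
// tasks drain before any BACKGROUND task runs.
enum class TaskPriority { BACKGROUND, USER_BLOCKING };

class TinyScheduler {
 public:
  void Post(TaskPriority priority, std::function<void()> task) {
    (priority == TaskPriority::USER_BLOCKING ? user_blocking_ : background_)
        .push(std::move(task));
  }

  // Runs everything: user-blocking work first, then background cleanup.
  void RunUntilIdle() {
    Drain(user_blocking_);
    Drain(background_);
  }

 private:
  static void Drain(std::queue<std::function<void()>>& q) {
    while (!q.empty()) {
      q.front()();
      q.pop();
    }
  }
  std::queue<std::function<void()>> user_blocking_;
  std::queue<std::function<void()>> background_;
};

std::vector<std::string> RunCleanupDemo() {
  std::vector<std::string> log;
  TinyScheduler scheduler;
  // Cleanup posted first, but at BACKGROUND priority...
  scheduler.Post(TaskPriority::BACKGROUND,
                 [&] { log.push_back("close_cleanup"); });
  // ...so the entry read posted later still runs before it.
  scheduler.Post(TaskPriority::USER_BLOCKING,
                 [&] { log.push_back("entry_read"); });
  scheduler.RunUntilIdle();
  return log;
}
```

Under this model the bug is that cache cleanup currently sits in the user-blocking queue, competing with reads that loading actually waits on.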

This trace was generated easily with a couple of tabs open and aggressive scrolling/zooming around Google Maps in the foreground (I think that was mostly it).
 
cache_cleanup_at_user_blocking.png
163 KB

Comment 1 by gab@chromium.org, Feb 5 2018

They potentially happen-before things that are directly on the critical path of page loading.

Comment 3 by gab@chromium.org, Feb 6 2018

Hmmm, ok, so it appears these are processing SimpleEntryImpl::pending_operations_ in order. How is thread-safety ensured here, since processing tasks are posted in parallel to |worker_pool| and multiple tasks could access |pending_operations_| at once?

Do |pending_operations_| have to be run in order? Could cleanup tasks be in a separate queue?

I'd hope this could make scrolling around in Maps less janky by deferring cleanup until after the interaction (kind of like JS garbage collection).
Each SimpleEntryImpl only posts one task at a time (see STATE_IO_PENDING). Whether they have to be run in order... well, Doom absolutely does --- it basically has semantics like POSIX unlink(), and sometimes it's used like that by loading --- so things can't proceed until it gets the file out of the way. There are also things like read and write which have dependencies in the usual sense (so some could potentially be parallelized if they touch different ranges, but our stats suggest that doesn't happen much).
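The one-op-in-flight pattern described above can be sketched as a small state machine. This is a hypothetical model, not SimpleEntryImpl's actual interface; Entry, Enqueue, and OnOperationComplete are illustrative names. An op is dispatched only when nothing is in flight, and the next queued op starts when the previous one's completion is signaled:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch of a per-entry serialized op queue: ops pile up in
// pending_operations_ and only one is "out" at a time, so the worker never
// sees two ops for the same entry in parallel.
class Entry {
 public:
  void Enqueue(std::function<void()> op) {
    pending_operations_.push(std::move(op));
    MaybeRunNext();
  }

  // Called when the in-flight op's "reply" lands back on the owning sequence.
  void OnOperationComplete() {
    state_ = State::kIdle;
    MaybeRunNext();
  }

  bool io_pending() const { return state_ == State::kIoPending; }

 private:
  enum class State { kIdle, kIoPending };

  void MaybeRunNext() {
    if (state_ == State::kIoPending || pending_operations_.empty())
      return;
    state_ = State::kIoPending;  // Only one op out at a time.
    auto op = std::move(pending_operations_.front());
    pending_operations_.pop();
    op();  // Stands in for posting the op to the worker pool.
  }

  State state_ = State::kIdle;
  std::queue<std::function<void()>> pending_operations_;
};

std::vector<std::string> RunEntryDemo() {
  std::vector<std::string> log;
  Entry entry;
  entry.Enqueue([&] { log.push_back("doom"); });   // dispatched immediately
  entry.Enqueue([&] { log.push_back("close"); });  // queued behind doom
  entry.OnOperationComplete();  // doom's reply arrives; close dispatches
  entry.OnOperationComplete();  // close's reply arrives; queue is empty
  return log;
}
```

This also answers the thread-safety question above: with at most one op in flight per entry, the queue itself only needs to be touched from the owning sequence.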

Close is a little different. At present the code is certainly written to expect it to be serialized, as it basically saves some of the state from memory to disk, dropping it from memory when it's done, and writes out things like file magic numbers and checksums. So if we're opening a cache entry from disk, that requires the disk portion of Close to have succeeded, as otherwise the file would be malformed. But if Close weren't called in the first place, it would have been possible to just handle it from memory; so there may be an opportunity there to basically notice that a close is pending and just pick up from memory w/o waiting for it to succeed, though I am not 100% sure that doesn't have subtle bugs.

Another thing that may be worthwhile would be to run those ops at lower priority and hope that priority inversion doesn't happen; at least for Doom I could probably distinguish those done by eviction (which are extremely unlikely to have things depend on them) and those done by HttpCache --- which *will* have things depend on them; for Close it's more a matter of hoping that it finishes quickly enough.

> so there may be an opportunity there to basically notice that a close is 
> pending and just pick up from memory w/o waiting for it to succeed, though I 
> am not 100% sure that doesn't have subtle bugs.

On further reflection, this is far less workable than I thought. SimpleEntryImpl has a stateful counterpart (SimpleSynchronousEntry) that executes the posted tasks; since there is only one op out at a time right now, it of course doesn't have any synchronization. What it does on close, besides the mentioned header writes, also includes, well, closing the files; so parallelizing that with things that use the files would be tricky.
So as a side note, on maps.g.c there seem to be occurrences of cache entries with empty stream 0 --- those may be the source of Dooms. Not sure how they happen, but on open of them, you'll see this message:
ReadData failed: 0
and OnCacheReadError is called from HttpCache::Transaction::DoCacheReadResponseComplete, where io_buf_len_ is 0,
and so entry->DataSize(kResponseInfoIndex) is 0, too. 

Created bug #809619 for that. Some of the other dooms seem to come from renderer cancelling the requests...
A bit of a side question: is there any provision for changing priorities while keeping ordering and the like? (as part of the other discussions I noticed that some actual background work is using high priority, but it also must run after work that's somewhat more important)

Comment 9 by gab@chromium.org, Feb 12 2018

Do you mean having different priorities mixed in the same sequence? We explicitly do not support that as a series of BACKGROUND tasks could accumulate into a large backlog and then when a USER_BLOCKING task comes in and "promotes" the sequence's priority: it'd be blocked on processing that backlog per the sequencing requirements.

Something we're looking into is having multiple mutually exclusive sequences (i.e. which funnel down to a single execution sequence). With this you could mix priorities, but you'd have to give up ordering (USER_BLOCKING task could jump ahead of BACKGROUND task).
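The backlog problem described above can be shown with a toy FIFO sequence (standard C++, illustrative names only): because a sequence preserves posting order, a USER_BLOCKING task posted after a BACKGROUND backlog still runs last, no matter how the sequence's priority is "promoted":

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Toy model of a single sequence: strict FIFO. Promoting the sequence's
// priority would make the whole backlog run sooner, but the late
// user-blocking task still waits behind every background task in front
// of it, per the sequencing requirements.
std::vector<std::string> RunSequenceDemo() {
  std::queue<std::function<void()>> sequence;  // one logical sequence
  std::vector<std::string> log;
  // A backlog of accumulated BACKGROUND tasks...
  for (int i = 0; i < 3; ++i) {
    sequence.push(
        [&log, i] { log.push_back("background_" + std::to_string(i)); });
  }
  // ...then a USER_BLOCKING task arrives on the same sequence.
  sequence.push([&log] { log.push_back("user_blocking"); });
  while (!sequence.empty()) {
    sequence.front()();
    sequence.pop();
  }
  return log;
}
```

The "mutually exclusive sequences" idea trades exactly this away: ordering is relaxed so the user-blocking task can jump the backlog.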
I am actually interested in a demotion, not a promotion: there is something that's sort of important (an initial read of an index) that needs to happen before something that's very much background (periodic, multiple, write backs). It's also possible that the read should be background, since we may not need it (depending on the client), though it does speed up some operations.

This is probably simple enough that it may be better w/o an explicit sequenced runner anyway --- it's mostly done that way because the runner used to be injected from the browser CACHE thread, and the injection is basically gone outside tests, though I really should get rid of the traces of it that are still there...


Comment 11 by gab@chromium.org, Feb 13 2018

I see, yes we would like to support toggling an entire sequence's traits. For now you have to do this manually by having a second sequence for the remainder of your work, putting it inside a DeferredSequencedTaskRunner and invoking DeferredSequencedTaskRunner::Start() when done with the initial work -- manually sequencing the two sequences...
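The manual pattern described above can be modeled in a few lines. This is a hypothetical sketch of the idea, not Chromium's actual DeferredSequencedTaskRunner: tasks posted before Start() are buffered, so the deferred sequence cannot begin until the initial (higher-priority) work explicitly releases it:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical model of a deferred sequenced runner: posting is allowed at
// any time, but nothing runs until Start() is called, which manually
// sequences "initial work" before "remainder work".
class DeferredRunner {
 public:
  void PostTask(std::function<void()> task) {
    if (started_)
      task();  // In a real scheduler this would be posted, not run inline.
    else
      deferred_.push(std::move(task));
  }

  void Start() {
    started_ = true;
    while (!deferred_.empty()) {
      deferred_.front()();
      deferred_.pop();
    }
  }

 private:
  bool started_ = false;
  std::queue<std::function<void()>> deferred_;
};

std::vector<std::string> RunDeferredDemo() {
  std::vector<std::string> log;
  DeferredRunner background_writes;
  // Periodic write-backs posted early are held...
  background_writes.PostTask([&] { log.push_back("write_back"); });
  // ...until the important initial index read finishes and calls Start().
  log.push_back("index_read");
  background_writes.Start();
  return log;
}
```

This matches the index-read-before-write-backs demotion case: the sort-of-important read runs first, then the genuinely-background writes are released.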
Some general observations from the priority experiment, which may be somewhat informative when sorting out what the proper usage of the task scheduler by cache and wider loading should be: where it's enabled, SimpleCache accounts for about 92%-95% of USER_BLOCKING + MayBlock tasks, and posts about 2.2x-3.2x as many tasks as there are USER_VISIBLE + MayBlock tasks (going by changes in the counts of stats like TaskScheduler.TaskLatencyMicroseconds.Browser.UserVisibleTaskPriority_MayBlock).


Comment 13 by gab@chromium.org, Feb 19 2018

Ok thanks, this sounds fine provided that this work must happen ASAP to provide a great user experience. It just means that other work meeting that responsiveness criterion also needs to be USER_BLOCKING, or it may be preempted by cache requests.

Of course if you could split cache work into lower priorities where it applies that'd help too.

Comment 14 by gab@chromium.org, May 3 2018

Status: WontFix (was: Untriaged)
I guess this is WAI. Would be nice to split cache work into multiple priorities but at the moment it sounds like flushes need to be sequenced with user-blocking reads and as such are also user-blocking.
