Issue metadata
Sign in to add a comment
|
Google Docs (Sheets, etc.) take forever to load in Canary |
||||||||||||||||||||||
Issue descriptionVersion: 52.0.2738.0 (Official Build) canary (64-bit) OS: OS X 10.11.4 What steps will reproduce the problem? (1) Open the following document: https://docs.google.com/document/d/1Y7wd-ZZJGn6SqZXQNUGzT0kF2jLm8klMjK35MrZBV1Y/edit (Google only) What is the expected output? Expect the document to load. What do you see instead? The page hangs for minutes, if not forever, waiting for the CSS and JS of the target to load. This is a recent regression. I first noticed it last Friday but didn't think anything of it. Only today did I realize it's a bug in the browser. It's still happening in Canary today after updates over the weekend. The bug's probably in the networking stack or loader. Needs a bisect, which could be painful due to the need to log in to each revision. Maybe someone can verify that this happens with a publicly visible document on chromium.org.
,
May 17 2016
BTW I'm on 52.0.2729.3 dev so the fix for Issue 605663 is in.
,
May 17 2016
Also, when I view Cache Storage in DevTools, it's empty. Right click -> Refresh Caches still results in emptiness.
,
May 17 2016
Can reproduce. It's definitely CacheStorage. Looks like something in CacheStorageCache::Put is failing to run its callback.
,
May 17 2016
Very good. Please treat this with urgency. Docs is unusable in Canary, even across browser restarts.
,
May 17 2016
Narrowed it down to this: BLobURLRequestJob::DidStart calls blob_reader_->CalculateSize() which returns IO_PENDING and BlobURLRequestJob::DidCalculateSize never gets called. Seems related to https://codereview.chromium.org/1234813004 so cc'ing dmurph@ and michaeln@.
,
May 17 2016
+cc jsbell
,
May 17 2016
This might be related to https://codereview.chromium.org/1967683003/
,
May 17 2016
Looks related, but I still have the same issue when I patch that CL in.
,
May 17 2016
Investigating now.
,
May 17 2016
I'm heading out for the day, will check back again in the morning. It looks like any time I run into IO_PENDING on CalculateSize() from a request created by CacheStorageBLobToDiskCache::StreamBlobToCache it fails to call the callback. Not sure if the thing creating the blob never finishes or what. I'm not terribly familiar with the blob code.
,
May 17 2016
I can't repro on linux. I'll try rebasing to ToT. Based on that it seems like the blob is either 1. Not constructed yet, or 2. Files. If I can't repro I'll start poking with tests.
,
May 17 2016
Please try on Mac OS X. This happens 100% of the time on my Retina MBP with Canary.
,
May 17 2016
Narrowing this down. Simpler repro steps: (after opt-in) * Load the drive settings page here: https://drive.google.com/corp/drive/#settings * Decheck the 'offline' checkbox * recheck the 'offline' checkbox * load any docs or presentation page * after about 10 seconds, it'll be saving a bunch of stuff to cache storage. In release builds, some of the blobs are still being built. These never 'complete', and thus those jobs don't finish.
,
May 17 2016
Ok, behavior I'm seeing: 1. When I re-enable offline, and let it load a little bit, the serviceworker (or extension, something) starts saving a bunch of data (from blobs) to cache storage. 2. Some of these blobs haven't finished constructing yet, so the CacheStorageBLobToDiskCache's -> BlobURLRequestJob -> BlobReader waits on the blob to finish before continuing. 3. These blobs NEVER finish, which suspends this whole process. 4. To find the origin of these blobs, I try to kill renderers. This can either reveal the renderer that made the blob, or a renderer that is holding the last reference. 5. Killing the docs extension finishes the blobs (investigating now whether they created the blobs or not) as broken.
,
May 17 2016
Confirming that the blob is being constructed from the extension process.
,
May 18 2016
I have a patch that check fails in blob_transport_controller when it gets a message to send blob data for a blob it doesn't contain. https://codereview.chromium.org/1985383002 It also include's falken's patch for fixing a separate issue. I'm continuing to debug, but basically it seems like we're requesting data from a renderer (the extension) that doesn't have a consolidation stored for that blob.
,
May 18 2016
Got it. Ok. Problem happens in BlobTransportController, right around here: https://code.google.com/p/chromium/codesearch#chromium/src/content/child/blob_storage/blob_transport_controller.cc&q=blobtransportcontroller&l=81 We're getting the response to our IPC messages before that task is run. That task stores the blob consolidation so we can use it in responses. This seems incredibly weird to me. But maybe IPC messages are scheduled at a higher priority on the IO thread? We're given the messages from our BlobMessageFilter: https://code.google.com/p/chromium/codesearch#chromium/src/content/child/blob_storage/blob_message_filter.h Are we not receiving messages on the IO thread? Or are we scheduling IPC messages at a higher priority than regular scheduling?
,
May 18 2016
Alright, well my current hypothesis is that we have two different task runners for the io thread (or at least more than one). I'm using this one to schedule my 'storage' task, which stores the blob data for transport: https://code.google.com/p/chromium/codesearch#chromium/src/content/renderer/renderer_blink_platform_impl.cc&l=268&rcl=1463527138 But it looks like I'm getting my response IPC before that task gets executed, even when they're both on the IO thread. The IPC message from the renderer, which in this case is being sent from the serviceworker thread, should be eventually using the task runner here just to send that message: https://code.google.com/p/chromium/codesearch#chromium/src/ipc/ipc_sync_message_filter.cc&q=ipc_sync_message_fil&sq=package:chromium&l=81 Even if that's different than the first one, that's just for sending. My received message from the browser is being dispatched to my blob_message_filter on the IO thread, but apparently somehow being scheduled before my storage task. If you patch my CL, here is what you would expect in the output for the crash: ... other messages ... [1:33:0517/205132:ERROR:blob_transport_controller.cc(72)] 2efdfb97-1c91-4120-be77-9c20b750c906 is about to be saved and send messages to browser. Instance: 0x7f29818e08f0, ServiceWorker Thread [13021:13058:0517/205132:ERROR:blob_async_builder_host.cc(400)] 2efdfb97-1c91-4120-be77-9c20b750c906 sending shared memory request [1:2:0517/205132:ERROR:blob_transport_controller.cc(229)] 2efdfb97-1c91-4120-be77-9c20b750c906 storing shared memory consolidation on Chrome_ChildIOThread [1:33:0517/205132:ERROR:blob_transport_controller.cc(72)] d93fa5a7-3d61-4bf1-84cb-e714b5d7fbc6 is about to be saved and send messages to browser. Instance: 0x7f29818e08f0, ServiceWorker Thread [1:2:0517/205132:ERROR:blob_transport_controller.cc(276)] 2efdfb97-1c91-4120-be77-9c20b750c906 got shared memory request on Chrome_ChildIOThread [13021:13058:0517/205132:ERROR:blob_async_builder_host.cc(400)] d93fa5a7-3d61-4bf1-84cb-e714b5d7fbc6 sending shared memory request ... other messages ... [1:2:0517/205132:FATAL:blob_transport_controller.cc(115)] Check failed: false. Could not find d93fa5a7-3d61-4bf1-84cb-e714b5d7fbc6 Instance: 0x7f29818e08f0, Chrome_ChildIOThread Blob 2efdfb97-1c91-4120-be77-9c20b750c906 was constructed successfully. But blob d93fa5a7-3d61-4bf1-84cb-e714b5d7fbc6 received the 'OnMemoryRequest' before it could store the data, even though the task to store the data was scheduled before IPC messages were sent. Elliot told me to ask Gabriel for help. +gab, do you have any insight or ideas?
,
May 18 2016
To abstract a bit, the mystery is:
[Renderer(Service Worker thread)]: PostTask("storage") to IO thread
[Renderer(Service Worker thread)]: Sends request IPC to browser
[Browser]: Sends response IPC
[Renderer(IO thread)]: Receives response IPC from browser, before storage task runs
One thing I noticed is the request IPC is sent by SyncMessageFilter::Send here:
if (sender_ && is_channel_send_thread_safe_) { ... }
rather than on the io_task_runner. Perhaps the channel is not really thread safe? But as comment 19 notes, that is just for sending; it's still odd that that response is handled on the IO thread before the storage task is run.
,
May 18 2016
IPCs are sequenced and PostTask are also sequenced, but once they are mixed, they seems racy...
,
May 18 2016
Thanks tzik. I'm unclear if this was always the case. It seems like a recent regression. Since ChannelMojo appears used, could this be related to "MojoChannel"? https://codereview.chromium.org/1950513002
,
May 18 2016
(To explain the screenshot: the arrows emanating from Service Worker thread come from three boxes: the Register IPC, the Build IPC, and the PostTask. We see the IPC response from the browser comes back before the posted task runs on the IO thread.)
,
May 18 2016
By locally reverting the MojoChannel change, it looks like the IPC sends are done from the IO thread, rather than directly from Service Worker thread, and the bug doesn't occur. My thinking now: - There's no guarantee that IPC message handling receiver-side is sequenced by the task runner. - The MojoChannel change somehow allowed the IPC sending to bypass the IO thread via ThreadSafeSender. If that's accurate, this bug would only happen since the MojoChannel change in M52.
,
May 18 2016
etc.) take forever to load in Canary are done from the IO thread, rather than directly from Service Worker thread, and the bug doesn't occur. sequenced by the task runner. thread via ThreadSafeSender. change in M52. This is accurate. We changed ChannelProxy to send directly on the pipe instead of hopping to the IO thread first.
,
May 18 2016
Right, it's incorrect to assume that Send() will hop to the IO thread. See https://bugs.chromium.org/p/chromium/issues/detail?id=516464 which fixed an eerily similar race. MojoChannel returns true for IsSendThreadSafe() (I believe it's the only override that does) and therefore skips the IO jump. CL https://codereview.chromium.org/1988983003 moves BlobTransportController::InitiateBlobTransfer to the IO thread, and I no longer hit the case where the render process doesn't know about the blob uuid that the browser is asking for. The service worker's still aren't loading however. The waitUntil promise is rejecting. They're erroring out with "Error: At least one cache failed to update" but I can't debug the minified js. I'm still also seeing lots of IO_PENDING from CalculateSize that never winds up calling BlobReader::AsyncCalculateSize.
,
May 18 2016
Re. #19-20: The IO thread runs a MessageLoopForIO which interleaves async IO (IPCs) and tasks from its task runner. So the IPC may indeed come back and be executed before the posted task runs in the task runner's (MessageLoop) queue. ref. message_pump_win.cc : https://code.google.com/p/chromium/codesearch#chromium/src/base/message_loop/message_pump_win.cc&l=505
,
May 18 2016
M52 dev is branching on Thursday - 05/18, It would be great if we have a fix before today night's canary.
,
May 18 2016
I'm taking over for Josh. Here's the fix: https://codereview.chromium.org/1988293002 Tests & better patch (which doesn't eliminate the fast path) will come later.
,
May 18 2016
Going through commit queue now.
,
May 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/65a8fb95a8c71a7be8515f5e1c8125b04a477716 commit 65a8fb95a8c71a7be8515f5e1c8125b04a477716 Author: dmurph <dmurph@chromium.org> Date: Wed May 18 21:13:57 2016 [BlobAsync] Fixed race between IPC messages and IO task queue The message responses to our blob start message are dispatched on the IO thread but they're not added to the task queue. So a race can happen where OnMemoryRequest is called before we store the consolidation in our map. I moved the StartBuildingBlob message - which signals the browser to start requesting memory from the renderer - to our IO task which stores our blob data. This eliminates the race case with a small sacrifice of construction speed. In the future, I'd like to be able to still dispatch both IPC messages right away to keep the fast path, and correctly handle the race case. BUG= 612358 Review-Url: https://codereview.chromium.org/1988293002 Cr-Commit-Position: refs/heads/master@{#394549} [modify] https://crrev.com/65a8fb95a8c71a7be8515f5e1c8125b04a477716/content/child/blob_storage/blob_transport_controller.cc [modify] https://crrev.com/65a8fb95a8c71a7be8515f5e1c8125b04a477716/content/child/blob_storage/blob_transport_controller.h [modify] https://crrev.com/65a8fb95a8c71a7be8515f5e1c8125b04a477716/content/child/blob_storage/blob_transport_controller_unittest.cc
,
May 18 2016
Hey Gabriel, Was this always the behavior? Or is this just from the Mojo switch? We need to know whether this needs a stable merge. Daniel
,
May 19 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/723cf1bf74560eb41120c1bf94eaaa38115203d0 commit 723cf1bf74560eb41120c1bf94eaaa38115203d0 Author: skyostil <skyostil@chromium.org> Date: Thu May 19 11:41:31 2016 Revert of [BlobAsync] Fixed race between IPC messages and IO task queue (patchset #1 id:1 of https://codereview.chromium.org/1988293002/ ) Reason for revert: The oortonline benchmark started failing to load blobs around the time this landed, so I'll do a speculative revert. Failure log: https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf%20%285%29/builds/3008/steps/oortonline/logs/stdio/text Original issue's description: > [BlobAsync] Fixed race between IPC messages and IO task queue > > The message responses to our blob start message are dispatched on the IO > thread but they're not added to the task queue. So a race can happen > where OnMemoryRequest is called before we store the consolidation in our > map. > > I moved the StartBuildingBlob message - which signals the browser to > start requesting memory from the renderer - to our IO task which stores > our blob data. This eliminates the race case with a small sacrifice of > construction speed. > > In the future, I'd like to be able to still dispatch both IPC messages > right away to keep the fast path, and correctly handle the race case. > > BUG= 612358 > > Committed: https://crrev.com/65a8fb95a8c71a7be8515f5e1c8125b04a477716 > Cr-Commit-Position: refs/heads/master@{#394549} TBR=michaeln@chromium.org,dmurph@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 612358 Review-Url: https://codereview.chromium.org/1997623002 Cr-Commit-Position: refs/heads/master@{#394740} [modify] https://crrev.com/723cf1bf74560eb41120c1bf94eaaa38115203d0/content/child/blob_storage/blob_transport_controller.cc [modify] https://crrev.com/723cf1bf74560eb41120c1bf94eaaa38115203d0/content/child/blob_storage/blob_transport_controller.h [modify] https://crrev.com/723cf1bf74560eb41120c1bf94eaaa38115203d0/content/child/blob_storage/blob_transport_controller_unittest.cc
,
May 19 2016
https://codereview.chromium.org/1991913003/ landed which causes mojo channel to return false for IsSendThreadSafe. So this should be fixed for now.
,
May 19 2016
I think questions addressed my way were answered here and/or on chromium-dev thread, but just to make sure: Turns out I had always assumed IPCs and IO thread tasks could be interleaved but they weren't in practice before Mojo as it's the first implementer which returned true for IPC::Channel::IsSendThreadSafe (if it returns false: an explicit IO thread hop is introduced for the IPC, implicitly sequencing it with other IO thread tasks).
,
May 19 2016
Confirmed that docs works fine on ToT after mojo channel change. Assigning to dmurph to handle the upcoming mojo channel changes.
,
May 19 2016
,
May 20 2016
Is there anything pending here to be taken care?
,
May 20 2016
I'm not sure. What mojo channel changes did you mean Josh?
,
May 20 2016
I reverted the change which bypassed the IO thread hop for Send, so any related bugs should be gone as of r394568.
,
May 20 2016
I'll close this as fixed then. Thanks everyone!
,
May 23 2016
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by falken@chromium.org
, May 17 2016Components: -Blink>Network -Internals>Network Blink>ServiceWorker Blink>Storage>CacheStorage
Labels: -OS-Mac OS-All