[Network service] Service workers do not recover from URLLoaderFactory connection errors |
|||||||||||
Issue descriptionConnection errors can happen on URLLoaderFactories if the network process crashes or an extension using webRequest is installed when no previous webRequest extensions are installed. Service workers should be able to recover from these errors and reconnect to the network service. See issue 879883 for more background.
,
Sep 14
@shimazu, even though we addressed this gmail issue, we will still need this fix for a launching to a high channel (eg, beta/stable exp M70). Can you continue to prioritize this (crrev.com/c/1220932)? Speaking with cduvall@, the race condition you mentioned https://bugs.chromium.org/p/chromium/issues/detail?id=879883#c46 should not be blocking you as the behavior is the same prior to network service. Ideally, we would like to have this fix in by 9/19 so it gets into the next M71 dev and can be merged back into M70 beta 9/26.
,
Sep 14
The extension race condition may still exist, but shouldn't affect this change as this should just handle the reconnection logic for service workers. My change in crrev.com/c/1224641 should prevent the URLLoaderFactory connection error on startup that was a problem before. This now should just be needed to handle the other cases like network process crashes.
,
Sep 17
+kinuko/yutaka, can you help with this bug? See C#2. We would still need this fixed for our launch to a higher channel in m70. Since Mokoto is out for the next 2 days, can one of you help address this? thanks!
,
Sep 17
Sorry, I didn't realize this still needs to be prioritized. Actually I had a working patch except for tests in my workstation, but I don't have an access in a couple of days. I think others in TOK are also OOO on Monday since it's a public holiday. I'd appreciate it if someone took over it, or I can land it on Wed if merging the CL beyond Dev (baking it only on Canary) is acceptable.
,
Sep 17
thank you for the help and the follow up. I will check with others on the thread.
,
Sep 19
See also issue 848256 for shared workers.
,
Sep 20
jam@ has a good idea on issue 848256 to just shut down the shared worker on network service crash... that would seem to work better for service workers. I wonder if we should just do that here for simplicity.
,
Sep 20
The original idea included that as a promising option, and then we had more frequent errors due to the extension race that had forced us to change our plan. Assuming that we're now back to the previous state I'm fine... but I suppose we'll drop more requests in the case.
,
Sep 20
If service workers' lifetime is not guaranteed, e.g. the user agent can shut it down at will, then this could work. network process crashes are pretty rare, less than 10x of browser crashes so far. I guess this would also be similar to a SW that's used by multiple renderer processes when the process hosting the renderer dies?
,
Sep 20
When a SW process dies the sender page can know that via connection error so it can re-establish and resend requests, while this case the SW itself has already received requests from the page (so the page thinks they're being handled) and then they'll be just dropped... but given that it'd be pretty rare I think it could be fine too as they'd just appear as if some requests errored out in the similar way as network/server errors.
,
Sep 20
A running service worker might be handling some events, so usually it's terminated only when the browser process knows it's idle and no inflight event exists. If we try to restart the worker without waiting for becoming idle when NS crash or reconnection happens, some of inflight events might be dropped. I'm okay with either way, but as kinuko mentioned, I'm guessing that the possibility of missing requests during restarting would be higher than during re-sending the newest URLLoaderBundle to the service worker. I'm slightly prefer resending URLLoaderBundle to restarting the worker.
,
Sep 20
Some network requests will always fail when a network process crashes, that is unavoidable (e.g. requests which were outstanding in the network process or ones which were in a message pipe to it). The question is how much we care to narrow down the window where this occurs. I defer to you all on that :)
,
Sep 20
Yep, we all chatted and plan to explore the shut-down approach. If it looks promising/simpler we'll just switch to do so.
,
Sep 20
Also, since the browser terminates service workers eventually anyway, and NS crashes are rare, I think we can make a case that this doesn't necessarily block Beta. The user won't be stuck forever since the service worker should eventually be restarted. The exception is if the page keeps the SW alive with constant requests. Regardless we should still try to fix this quickly and get it in 70 if possible... so I'll keep the Beta block label on for now.
,
Sep 20
Re comment 15, that's true it'll eventually work. However the experience today of Gmail not working if the network process crashes is pretty poor (no actionable workaround for user, unclear that they need to restart browser, reloading tab doesn't fix it), which is why I thought we should avoid this occurring to beta users.
,
Sep 20
c16: Yea, that makes sense.
,
Sep 20
shimazu@ has a CL at: https://chromium-review.googlesource.com/c/chromium/src/+/1235593
,
Sep 20
Thanks. I patched this in and can confirm that with this fix, Gmail in a new tab works. However there's another bug which is that the cookie manager in RenderFrameMessageFilter needs to be recreated after network process crashes. I filed bug 887680 to track that and I'm looking into it.
,
Sep 20
Hmm, after fixing the cookie issue (https://chromium-review.googlesource.com/c/chromium/src/+/1236282) and with the patch from comment 18, I still see Gmail not working after a network process crash. It says "Not connected. Connecting in ### s". If I open a new tab, then it works. But why isn't the first tab working?
,
Sep 21
I think the error is that ServiceWorkerSubresourceLoader::fallback_factory_ is not updated after network service crash, and it has a mojo pipe which is broken.
,
Sep 21
I should also mention, however with the fixes above a reload fixes things, so it's still much better than before.
,
Sep 21
Oh thanks for the update, subresource loader's fallback_factory_ probably needs another fix (that's at least not covered by shimazu's fix)
,
Sep 21
Hm, I *guess* if we change GetLoaderFactoryBundle()->CloneWithoutDefaultFactory() in RenderFrameImpl::uildServiceWorkerNetworkProviderForNavigation to GetLoaderFactoryBundle()->Clone() the reconnect would work. (We seem to be losing the tracking ability in the specialized Clone path)
,
Sep 21
Yep that was it (you figured this out much faster than me!). I have a fix in https://chromium-review.googlesource.com/c/chromium/src/+/1237720
,
Sep 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3e4e10cf8afeff34ed2275d022a5fffae14904a2 commit 3e4e10cf8afeff34ed2275d022a5fffae14904a2 Author: John Abd-El-Malek <jam@chromium.org> Date: Fri Sep 21 04:20:33 2018 Fix service worker subresource requests failing after a network process crash. With https://chromium-review.googlesource.com/c/chromium/src/+/1235593, Gmail re-loads correctly after a network process crash. However if Gmail isn't reloaded or opened in a new tab, it still thinks it's disconnected because its subresource requests are failing. The problem is that ServiceWorkerSubresourceLoader::fallback_factory_ was a ChildURLLoaderFactoryBundle (which doesn't handle updating after crashes) instead of being a HostChildURLLoaderFactoryBundle (which is properly updated after a crash). Bug: 884007 Change-Id: I63d61811005a57f1a43ec91961fd5c5f5de5d0e8 Reviewed-on: https://chromium-review.googlesource.com/1237720 Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Commit-Queue: John Abd-El-Malek <jam@chromium.org> Cr-Commit-Position: refs/heads/master@{#593076} [modify] https://crrev.com/3e4e10cf8afeff34ed2275d022a5fffae14904a2/content/renderer/loader/child_url_loader_factory_bundle.h [modify] https://crrev.com/3e4e10cf8afeff34ed2275d022a5fffae14904a2/content/renderer/loader/tracked_child_url_loader_factory_bundle.cc [modify] https://crrev.com/3e4e10cf8afeff34ed2275d022a5fffae14904a2/content/renderer/loader/tracked_child_url_loader_factory_bundle.h
,
Sep 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b731e532b76383ac9296967e973862fd99ae8e59 commit b731e532b76383ac9296967e973862fd99ae8e59 Author: Makoto Shimazu <shimazu@chromium.org> Date: Fri Sep 21 07:56:31 2018 Revise browser test using fetch_from_*_worker.(html|js) This CL is to use EvalJs() instead of waiting for an expected title and checking the body.innerText. This will generate more verbose errors especially in the case where the previous test times out. TBR=kinuko@chromium.org Bug: 884007 Change-Id: I8cbde52efc2da69784b4c1ca06a17c7d262a80d3 Reviewed-on: https://chromium-review.googlesource.com/1237794 Commit-Queue: Makoto Shimazu <shimazu@chromium.org> Reviewed-by: Matt Falkenhagen <falken@chromium.org> Cr-Commit-Position: refs/heads/master@{#593109} [modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/browser/do_not_track_browsertest.cc [modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/service_worker/fetch_from_service_worker.html [modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/service_worker/fetch_from_service_worker.js [modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_shared_worker.html [modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_shared_worker.js [modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_worker.html [modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_worker.js
,
Sep 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2075b0d44c960dd5ed208e710fd79eb8ab35d771 commit 2075b0d44c960dd5ed208e710fd79eb8ab35d771 Author: Makoto Shimazu <shimazu@chromium.org> Date: Fri Sep 21 09:04:22 2018 Add tests for fetch() in a service/shared worker after NetworkService crashes Bug: 884007 , 848256 Change-Id: I3dbeab15a52fc296058b1962364edb19c95d74a0 Reviewed-on: https://chromium-review.googlesource.com/1235261 Commit-Queue: Makoto Shimazu <shimazu@chromium.org> Reviewed-by: Matt Falkenhagen <falken@chromium.org> Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Cr-Commit-Position: refs/heads/master@{#593129} [modify] https://crrev.com/2075b0d44c960dd5ed208e710fd79eb8ab35d771/content/browser/network_service_restart_browsertest.cc
,
Sep 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6a96cb6a3c2c7373a167127500ed142934a757ed commit 6a96cb6a3c2c7373a167127500ed142934a757ed Author: Makoto Shimazu <shimazu@chromium.org> Date: Fri Sep 21 11:39:13 2018 ServiceWorker: call StopWorker when the network service crashes This CL is to stop service workers when the network service crashes. This might drop inflight events, but given that the frequency is quite low, it's acceptable. When another event is dispatched to the service worker, it starts the worker again and it should work. Bug: 884007 Change-Id: I2da4bd05c6a89f70b623da8e29a89488fd5a9cf3 Reviewed-on: https://chromium-review.googlesource.com/1235593 Commit-Queue: Makoto Shimazu <shimazu@chromium.org> Reviewed-by: Matt Falkenhagen <falken@chromium.org> Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Cr-Commit-Position: refs/heads/master@{#593150} [modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/browser/network_service_restart_browsertest.cc [modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/renderer/service_worker/embedded_worker_instance_client_impl.h [modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/renderer/service_worker/service_worker_context_client.cc [modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/renderer/service_worker/service_worker_context_client.h
,
Sep 21
Requesting merge of r593076 and r593150 (will not merge test part of later) to M70; they fix issues with network service when the network process crashes. I realize they haven't gone to canary yet; I'm requesting merge today and will merge them tomorrow so that they're in the next beta build. Thanks
,
Sep 21
This bug requires manual review: M70 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), geohsu@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Sep 21
The canary still hasn't gone out featuring these changes. Let's follow up on Monday.
,
Sep 21
sg, we want these changes in prior to the next M70 beta cut.
,
Sep 24
The NextAction date has arrived: 2018-09-24
,
Sep 24
Removing Android label for now, since this code is behind a feature flag that isn't run on Android yet.
,
Sep 24
,
Sep 24
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8369c56926c5048fa3046877e27b43266d1b0904 commit 8369c56926c5048fa3046877e27b43266d1b0904 Author: John Abd-El-Malek <jam@chromium.org> Date: Mon Sep 24 17:00:44 2018 Fix service worker subresource requests failing after a network process crash. With https://chromium-review.googlesource.com/c/chromium/src/+/1235593, Gmail re-loads correctly after a network process crash. However if Gmail isn't reloaded or opened in a new tab, it still thinks it's disconnected because its subresource requests are failing. The problem is that ServiceWorkerSubresourceLoader::fallback_factory_ was a ChildURLLoaderFactoryBundle (which doesn't handle updating after crashes) instead of being a HostChildURLLoaderFactoryBundle (which is properly updated after a crash). Bug: 884007 Change-Id: I63d61811005a57f1a43ec91961fd5c5f5de5d0e8 Reviewed-on: https://chromium-review.googlesource.com/1237720 Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Commit-Queue: John Abd-El-Malek <jam@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#593076}(cherry picked from commit 3e4e10cf8afeff34ed2275d022a5fffae14904a2) Reviewed-on: https://chromium-review.googlesource.com/1239995 Reviewed-by: John Abd-El-Malek <jam@chromium.org> Cr-Commit-Position: refs/branch-heads/3538@{#588} Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811} [modify] https://crrev.com/8369c56926c5048fa3046877e27b43266d1b0904/content/renderer/loader/child_url_loader_factory_bundle.h [modify] https://crrev.com/8369c56926c5048fa3046877e27b43266d1b0904/content/renderer/loader/tracked_child_url_loader_factory_bundle.cc [modify] https://crrev.com/8369c56926c5048fa3046877e27b43266d1b0904/content/renderer/loader/tracked_child_url_loader_factory_bundle.h
,
Sep 24
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8369c56926c5048fa3046877e27b43266d1b0904 Commit: 8369c56926c5048fa3046877e27b43266d1b0904 Author: jam@chromium.org Commiter: jam@chromium.org Date: 2018-09-24 17:00:44 +0000 UTC Fix service worker subresource requests failing after a network process crash. With https://chromium-review.googlesource.com/c/chromium/src/+/1235593, Gmail re-loads correctly after a network process crash. However if Gmail isn't reloaded or opened in a new tab, it still thinks it's disconnected because its subresource requests are failing. The problem is that ServiceWorkerSubresourceLoader::fallback_factory_ was a ChildURLLoaderFactoryBundle (which doesn't handle updating after crashes) instead of being a HostChildURLLoaderFactoryBundle (which is properly updated after a crash). Bug: 884007 Change-Id: I63d61811005a57f1a43ec91961fd5c5f5de5d0e8 Reviewed-on: https://chromium-review.googlesource.com/1237720 Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Commit-Queue: John Abd-El-Malek <jam@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#593076}(cherry picked from commit 3e4e10cf8afeff34ed2275d022a5fffae14904a2) Reviewed-on: https://chromium-review.googlesource.com/1239995 Reviewed-by: John Abd-El-Malek <jam@chromium.org> Cr-Commit-Position: refs/branch-heads/3538@{#588} Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811}
,
Sep 24
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ea32ee616494118abee2ee1f366e92ea942df9ee Commit: ea32ee616494118abee2ee1f366e92ea942df9ee Author: jam@chromium.org Commiter: jam@chromium.org Date: 2018-09-24 17:07:30 +0000 UTC ServiceWorker: call StopWorker when the network service crashes This CL is to stop service workers when the network service crashes. This might drop inflight events, but given that the frequency is quite low, it's acceptable. When another event is dispatched to the service worker, it starts the worker again and it should work. TBR=shimazu@chromium.org (cherry picked from commit 6a96cb6a3c2c7373a167127500ed142934a757ed) Bug: 884007 Change-Id: I2da4bd05c6a89f70b623da8e29a89488fd5a9cf3 Reviewed-on: https://chromium-review.googlesource.com/1235593 Commit-Queue: Makoto Shimazu <shimazu@chromium.org> Reviewed-by: Matt Falkenhagen <falken@chromium.org> Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#593150} Reviewed-on: https://chromium-review.googlesource.com/1240498 Reviewed-by: John Abd-El-Malek <jam@chromium.org> Cr-Commit-Position: refs/branch-heads/3538@{#589} Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811}
,
Sep 24
,
Sep 24
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ea32ee616494118abee2ee1f366e92ea942df9ee commit ea32ee616494118abee2ee1f366e92ea942df9ee Author: John Abd-El-Malek <jam@chromium.org> Date: Mon Sep 24 17:07:30 2018 ServiceWorker: call StopWorker when the network service crashes This CL is to stop service workers when the network service crashes. This might drop inflight events, but given that the frequency is quite low, it's acceptable. When another event is dispatched to the service worker, it starts the worker again and it should work. TBR=shimazu@chromium.org (cherry picked from commit 6a96cb6a3c2c7373a167127500ed142934a757ed) Bug: 884007 Change-Id: I2da4bd05c6a89f70b623da8e29a89488fd5a9cf3 Reviewed-on: https://chromium-review.googlesource.com/1235593 Commit-Queue: Makoto Shimazu <shimazu@chromium.org> Reviewed-by: Matt Falkenhagen <falken@chromium.org> Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#593150} Reviewed-on: https://chromium-review.googlesource.com/1240498 Reviewed-by: John Abd-El-Malek <jam@chromium.org> Cr-Commit-Position: refs/branch-heads/3538@{#589} Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811} [modify] https://crrev.com/ea32ee616494118abee2ee1f366e92ea942df9ee/content/renderer/service_worker/embedded_worker_instance_client_impl.h [modify] https://crrev.com/ea32ee616494118abee2ee1f366e92ea942df9ee/content/renderer/service_worker/service_worker_context_client.cc [modify] https://crrev.com/ea32ee616494118abee2ee1f366e92ea942df9ee/content/renderer/service_worker/service_worker_context_client.h
,
Sep 26
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/775101a577fabbf700ce078c80d5ecc58ecd9969 commit 775101a577fabbf700ce078c80d5ecc58ecd9969 Author: Makoto Shimazu <shimazu@chromium.org> Date: Wed Sep 26 04:46:52 2018 Add browser tests to call fetch() from SW-controlled page after NS crash This adds browser tests to call fetch() in a page controlled by a service worker after the Network Service crashes. There are several variants in the situation: - no fetch handler in the service worker - fetch handler exists but respondWith() isn't called - fetch handler exists and respondWith() is used Bug: 884007 Change-Id: Ieaa3c684186b9a93babb19ed490fc40832dbe6e0 Reviewed-on: https://chromium-review.googlesource.com/1242581 Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Commit-Queue: Makoto Shimazu <shimazu@chromium.org> Cr-Commit-Position: refs/heads/master@{#594219} [modify] https://crrev.com/775101a577fabbf700ce078c80d5ecc58ecd9969/content/browser/network_service_restart_browsertest.cc |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by shimazu@chromium.org
, Sep 14Status: Started (was: Assigned)