New issue
Advanced search Search tips

Issue 884007 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 24
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-09-24
OS: Linux , Windows , Mac
Pri: 1
Type: Bug



Sign in to add a comment

[Network service] Service workers do not recover from URLLoaderFactory connection errors

Project Member Reported by cduvall@chromium.org, Sep 13

Issue description

Connection errors can happen on URLLoaderFactories if the network process crashes or an extension using webRequest is installed when no previous webRequest extensions are installed. Service workers should be able to recover from these errors and reconnect to the network service. See issue  879883 for more background.
 
Labels: Hotlist-KnownIssue
Status: Started (was: Assigned)
Cc: dxie@google.com
Labels: -Pri-2 Proj-Servicification-Beta Pri-1
@shimazu, even though we addressed this gmail issue, we will still need this fix for a launching to a high channel (eg, beta/stable exp M70). Can you continue to prioritize this (crrev.com/c/1220932)? Speaking with cduvall@, the race condition you mentioned https://bugs.chromium.org/p/chromium/issues/detail?id=879883#c46 should not be blocking you as the behavior is the same prior to network service.

Ideally, we would like to have this fix in by 9/19 so it gets into the next M71 dev and can be merged back into M70 beta 9/26.
The extension race condition may still exist, but shouldn't affect this change as this should just handle the reconnection logic for service workers. My change in crrev.com/c/1224641 should prevent the URLLoaderFactory connection error on startup that was a problem before. This now should just be needed to handle the other cases like network process crashes.
Cc: falken@chromium.org kinuko@chromium.org yhirano@chromium.org
+kinuko/yutaka, can you help with this bug? See C#2. We would still need this fixed for our launch to a higher channel in m70. Since Mokoto is out for the next 2 days, can one of you help address this? thanks!
Sorry, I didn't realize this still needs to be prioritized. 
Actually I had a working patch except for tests in my workstation, but I don't have an access in a couple of days.
I think others in TOK are also OOO on Monday since it's a public holiday.
I'd appreciate it if someone took over it, or I can land it on Wed if merging the CL beyond Dev (baking it only on Canary) is acceptable.
thank you for the help and the follow up. I will check with others on the thread.
See also  issue 848256  for shared workers.
jam@ has a good idea on  issue 848256  to just shut down the shared worker on network service crash... that would seem to work better for service workers. I wonder if we should just do that here for simplicity.
The original idea included that as a promising option, and then we had more frequent errors due to the extension race that had forced us to change our plan. Assuming that we're now back to the previous state I'm fine... but I suppose we'll drop more requests in the case.
If service workers' lifetime is not guaranteed, e.g. the user agent can
shut it down at will, then this could work. network process crashes are
pretty rare, less than 10x of browser crashes so far. I guess this would
also be similar to a SW that's used by multiple renderer processes when the
process hosting the renderer dies?
When a SW process dies the sender page can know that via connection error so it can re-establish and resend requests, while this case the SW itself has already received requests from the page (so the page thinks they're being handled) and then they'll be just dropped... but given that it'd be pretty rare I think it could be fine too as they'd just appear as if some requests errored out in the similar way as network/server errors.
A running service worker might be handling some events, so usually it's terminated only when the browser process knows it's idle and no inflight event exists. If we try to restart the worker without waiting for becoming idle when NS crash or reconnection happens, some of inflight events might be dropped.
I'm okay with either way, but as kinuko mentioned, I'm guessing that the possibility of missing requests during restarting would be higher than during re-sending the newest URLLoaderBundle to the service worker. I'm slightly prefer resending URLLoaderBundle to restarting the worker.
Some network requests will always fail when a network process crashes, that
is unavoidable (e.g. requests which were outstanding in the network process
or ones which were in a message pipe to it). The question is how much we
care to narrow down the window where this occurs. I defer to you all on
that :)
Yep, we all chatted and plan to explore the shut-down approach. If it looks promising/simpler we'll just switch to do so.
Also, since the browser terminates service workers eventually anyway, and NS crashes are rare, I think we can make a case that this doesn't necessarily block Beta. The user won't be stuck forever since the service worker should eventually be restarted. The exception is if the page keeps the SW alive with constant requests. Regardless we should still try to fix this quickly and get it in 70 if possible... so I'll keep the Beta block label on for now.
Re comment 15, that's true it'll eventually work. However the experience today of Gmail not working if the network process crashes is pretty poor (no actionable workaround for user, unclear that they need to restart browser, reloading tab doesn't fix it), which is why I thought we should avoid this occurring to beta users.
c16: Yea, that makes sense.
Thanks. I patched this in and can confirm that with this fix, Gmail in a new tab works. However there's another bug which is that the cookie manager in RenderFrameMessageFilter needs to be recreated after network process crashes. I filed  bug 887680  to track that and I'm looking into it.
Hmm, after fixing the cookie issue (https://chromium-review.googlesource.com/c/chromium/src/+/1236282) and with the patch from comment 18, I still see Gmail not working after a network process crash. It says "Not connected. Connecting in ### s". If I open a new tab, then it works. But why isn't the first tab working?
I think the error is that ServiceWorkerSubresourceLoader::fallback_factory_ is not updated after network service crash, and it has a mojo pipe which is broken.
I should also mention, however with the fixes above a reload fixes things, so it's still much better than before.
Oh thanks for the update, subresource loader's fallback_factory_ probably needs another fix (that's at least not covered by shimazu's fix)
Hm, I *guess* if we change GetLoaderFactoryBundle()->CloneWithoutDefaultFactory() in RenderFrameImpl::uildServiceWorkerNetworkProviderForNavigation to GetLoaderFactoryBundle()->Clone() the reconnect would work.
(We seem to be losing the tracking ability in the specialized Clone path)
Yep that was it (you figured this out much faster than me!). I have a fix in https://chromium-review.googlesource.com/c/chromium/src/+/1237720
Project Member

Comment 26 by bugdroid1@chromium.org, Sep 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3e4e10cf8afeff34ed2275d022a5fffae14904a2

commit 3e4e10cf8afeff34ed2275d022a5fffae14904a2
Author: John Abd-El-Malek <jam@chromium.org>
Date: Fri Sep 21 04:20:33 2018

Fix service worker subresource requests failing after a network process crash.

With https://chromium-review.googlesource.com/c/chromium/src/+/1235593, Gmail re-loads correctly after a network process crash. However if Gmail isn't reloaded or opened in a new tab, it still thinks it's disconnected because its subresource requests are failing.

The problem is that ServiceWorkerSubresourceLoader::fallback_factory_ was a ChildURLLoaderFactoryBundle (which doesn't handle updating after crashes) instead of being a HostChildURLLoaderFactoryBundle (which is properly updated after a crash).

Bug:  884007 
Change-Id: I63d61811005a57f1a43ec91961fd5c5f5de5d0e8
Reviewed-on: https://chromium-review.googlesource.com/1237720
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: John Abd-El-Malek <jam@chromium.org>
Cr-Commit-Position: refs/heads/master@{#593076}
[modify] https://crrev.com/3e4e10cf8afeff34ed2275d022a5fffae14904a2/content/renderer/loader/child_url_loader_factory_bundle.h
[modify] https://crrev.com/3e4e10cf8afeff34ed2275d022a5fffae14904a2/content/renderer/loader/tracked_child_url_loader_factory_bundle.cc
[modify] https://crrev.com/3e4e10cf8afeff34ed2275d022a5fffae14904a2/content/renderer/loader/tracked_child_url_loader_factory_bundle.h

Project Member

Comment 27 by bugdroid1@chromium.org, Sep 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b731e532b76383ac9296967e973862fd99ae8e59

commit b731e532b76383ac9296967e973862fd99ae8e59
Author: Makoto Shimazu <shimazu@chromium.org>
Date: Fri Sep 21 07:56:31 2018

Revise browser test using fetch_from_*_worker.(html|js)

This CL is to use EvalJs() instead of waiting for an expected title and checking
the body.innerText. This will generate more verbose errors especially in the
case where the previous test times out.

TBR=kinuko@chromium.org

Bug:  884007 
Change-Id: I8cbde52efc2da69784b4c1ca06a17c7d262a80d3
Reviewed-on: https://chromium-review.googlesource.com/1237794
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#593109}
[modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/browser/do_not_track_browsertest.cc
[modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/service_worker/fetch_from_service_worker.html
[modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/service_worker/fetch_from_service_worker.js
[modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_shared_worker.html
[modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_shared_worker.js
[modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_worker.html
[modify] https://crrev.com/b731e532b76383ac9296967e973862fd99ae8e59/content/test/data/workers/fetch_from_worker.js

Project Member

Comment 28 by bugdroid1@chromium.org, Sep 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2075b0d44c960dd5ed208e710fd79eb8ab35d771

commit 2075b0d44c960dd5ed208e710fd79eb8ab35d771
Author: Makoto Shimazu <shimazu@chromium.org>
Date: Fri Sep 21 09:04:22 2018

Add tests for fetch() in a service/shared worker after NetworkService crashes

Bug:  884007 ,  848256 
Change-Id: I3dbeab15a52fc296058b1962364edb19c95d74a0
Reviewed-on: https://chromium-review.googlesource.com/1235261
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Cr-Commit-Position: refs/heads/master@{#593129}
[modify] https://crrev.com/2075b0d44c960dd5ed208e710fd79eb8ab35d771/content/browser/network_service_restart_browsertest.cc

Project Member

Comment 29 by bugdroid1@chromium.org, Sep 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a96cb6a3c2c7373a167127500ed142934a757ed

commit 6a96cb6a3c2c7373a167127500ed142934a757ed
Author: Makoto Shimazu <shimazu@chromium.org>
Date: Fri Sep 21 11:39:13 2018

ServiceWorker: call StopWorker when the network service crashes

This CL is to stop service workers when the network service crashes. This might
drop inflight events, but given that the frequency is quite
low, it's acceptable. When another event is dispatched to the service worker, it
starts the worker again and it should work.

Bug:  884007 
Change-Id: I2da4bd05c6a89f70b623da8e29a89488fd5a9cf3
Reviewed-on: https://chromium-review.googlesource.com/1235593
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Cr-Commit-Position: refs/heads/master@{#593150}
[modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/browser/network_service_restart_browsertest.cc
[modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/renderer/service_worker/embedded_worker_instance_client_impl.h
[modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/renderer/service_worker/service_worker_context_client.cc
[modify] https://crrev.com/6a96cb6a3c2c7373a167127500ed142934a757ed/content/renderer/service_worker/service_worker_context_client.h

Labels: Merge-Request-70
Requesting merge of r593076 and r593150 (will not merge test part of later) to M70; they fix issues with network service when the network process crashes. I realize they haven't gone to canary yet; I'm requesting merge today and will merge them tomorrow so that they're in the next beta build. Thanks
Project Member

Comment 31 by sheriffbot@chromium.org, Sep 21

Labels: -Merge-Request-70 Merge-Review-70 Hotlist-Merge-Review
This bug requires manual review: M70 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), geohsu@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: benmason@chromium.org
NextAction: 2018-09-24
The canary still hasn't gone out featuring these changes. Let's follow up on Monday.
sg, we want these changes in prior to the next M70 beta cut. 
The NextAction date has arrived: 2018-09-24
Labels: -OS-Android
Removing Android label for now, since this code is behind a feature flag that isn't run on Android yet.
Labels: -Merge-Review-70 Merge-Approved-70
Project Member

Comment 37 by bugdroid1@chromium.org, Sep 24

Labels: -merge-approved-70 merge-merged-3538
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8369c56926c5048fa3046877e27b43266d1b0904

commit 8369c56926c5048fa3046877e27b43266d1b0904
Author: John Abd-El-Malek <jam@chromium.org>
Date: Mon Sep 24 17:00:44 2018

Fix service worker subresource requests failing after a network process crash.

With https://chromium-review.googlesource.com/c/chromium/src/+/1235593, Gmail re-loads correctly after a network process crash. However if Gmail isn't reloaded or opened in a new tab, it still thinks it's disconnected because its subresource requests are failing.

The problem is that ServiceWorkerSubresourceLoader::fallback_factory_ was a ChildURLLoaderFactoryBundle (which doesn't handle updating after crashes) instead of being a HostChildURLLoaderFactoryBundle (which is properly updated after a crash).

Bug:  884007 
Change-Id: I63d61811005a57f1a43ec91961fd5c5f5de5d0e8
Reviewed-on: https://chromium-review.googlesource.com/1237720
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: John Abd-El-Malek <jam@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#593076}(cherry picked from commit 3e4e10cf8afeff34ed2275d022a5fffae14904a2)
Reviewed-on: https://chromium-review.googlesource.com/1239995
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Cr-Commit-Position: refs/branch-heads/3538@{#588}
Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811}
[modify] https://crrev.com/8369c56926c5048fa3046877e27b43266d1b0904/content/renderer/loader/child_url_loader_factory_bundle.h
[modify] https://crrev.com/8369c56926c5048fa3046877e27b43266d1b0904/content/renderer/loader/tracked_child_url_loader_factory_bundle.cc
[modify] https://crrev.com/8369c56926c5048fa3046877e27b43266d1b0904/content/renderer/loader/tracked_child_url_loader_factory_bundle.h

Labels: Merge-Merged-70-3538
The following revision refers to this bug: 
https://chromium.googlesource.com/chromium/src.git/+/8369c56926c5048fa3046877e27b43266d1b0904

Commit: 8369c56926c5048fa3046877e27b43266d1b0904
Author: jam@chromium.org
Commiter: jam@chromium.org
Date: 2018-09-24 17:00:44 +0000 UTC

Fix service worker subresource requests failing after a network process crash.

With https://chromium-review.googlesource.com/c/chromium/src/+/1235593, Gmail re-loads correctly after a network process crash. However if Gmail isn't reloaded or opened in a new tab, it still thinks it's disconnected because its subresource requests are failing.

The problem is that ServiceWorkerSubresourceLoader::fallback_factory_ was a ChildURLLoaderFactoryBundle (which doesn't handle updating after crashes) instead of being a HostChildURLLoaderFactoryBundle (which is properly updated after a crash).

Bug:  884007 
Change-Id: I63d61811005a57f1a43ec91961fd5c5f5de5d0e8
Reviewed-on: https://chromium-review.googlesource.com/1237720
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: John Abd-El-Malek <jam@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#593076}(cherry picked from commit 3e4e10cf8afeff34ed2275d022a5fffae14904a2)
Reviewed-on: https://chromium-review.googlesource.com/1239995
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Cr-Commit-Position: refs/branch-heads/3538@{#588}
Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811}
The following revision refers to this bug: 
https://chromium.googlesource.com/chromium/src.git/+/ea32ee616494118abee2ee1f366e92ea942df9ee

Commit: ea32ee616494118abee2ee1f366e92ea942df9ee
Author: jam@chromium.org
Commiter: jam@chromium.org
Date: 2018-09-24 17:07:30 +0000 UTC

ServiceWorker: call StopWorker when the network service crashes

This CL is to stop service workers when the network service crashes. This might
drop inflight events, but given that the frequency is quite
low, it's acceptable. When another event is dispatched to the service worker, it
starts the worker again and it should work.

TBR=shimazu@chromium.org

(cherry picked from commit 6a96cb6a3c2c7373a167127500ed142934a757ed)

Bug:  884007 
Change-Id: I2da4bd05c6a89f70b623da8e29a89488fd5a9cf3
Reviewed-on: https://chromium-review.googlesource.com/1235593
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#593150}
Reviewed-on: https://chromium-review.googlesource.com/1240498
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Cr-Commit-Position: refs/branch-heads/3538@{#589}
Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811}
Status: Fixed (was: Started)
Project Member

Comment 41 by bugdroid1@chromium.org, Sep 24

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ea32ee616494118abee2ee1f366e92ea942df9ee

commit ea32ee616494118abee2ee1f366e92ea942df9ee
Author: John Abd-El-Malek <jam@chromium.org>
Date: Mon Sep 24 17:07:30 2018

ServiceWorker: call StopWorker when the network service crashes

This CL is to stop service workers when the network service crashes. This might
drop inflight events, but given that the frequency is quite
low, it's acceptable. When another event is dispatched to the service worker, it
starts the worker again and it should work.

TBR=shimazu@chromium.org

(cherry picked from commit 6a96cb6a3c2c7373a167127500ed142934a757ed)

Bug:  884007 
Change-Id: I2da4bd05c6a89f70b623da8e29a89488fd5a9cf3
Reviewed-on: https://chromium-review.googlesource.com/1235593
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#593150}
Reviewed-on: https://chromium-review.googlesource.com/1240498
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Cr-Commit-Position: refs/branch-heads/3538@{#589}
Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811}
[modify] https://crrev.com/ea32ee616494118abee2ee1f366e92ea942df9ee/content/renderer/service_worker/embedded_worker_instance_client_impl.h
[modify] https://crrev.com/ea32ee616494118abee2ee1f366e92ea942df9ee/content/renderer/service_worker/service_worker_context_client.cc
[modify] https://crrev.com/ea32ee616494118abee2ee1f366e92ea942df9ee/content/renderer/service_worker/service_worker_context_client.h

Project Member

Comment 42 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/775101a577fabbf700ce078c80d5ecc58ecd9969

commit 775101a577fabbf700ce078c80d5ecc58ecd9969
Author: Makoto Shimazu <shimazu@chromium.org>
Date: Wed Sep 26 04:46:52 2018

Add browser tests to call fetch() from SW-controlled page after NS crash

This adds browser tests to call fetch() in a page controlled by a service worker
after the Network Service crashes. There are several variants in the situation:
- no fetch handler in the service worker
- fetch handler exists but respondWith() isn't called
- fetch handler exists and respondWith() is used

Bug:  884007 
Change-Id: Ieaa3c684186b9a93babb19ed490fc40832dbe6e0
Reviewed-on: https://chromium-review.googlesource.com/1242581
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594219}
[modify] https://crrev.com/775101a577fabbf700ce078c80d5ecc58ecd9969/content/browser/network_service_restart_browsertest.cc

Sign in to add a comment