ServiceWorker.FetchEvent.MainResource.Status ERROR_START_WORKER_FAILED with PlzNavigate |
|||||||||||
Issue descriptionThe ratio of ERROR_START_WORKER_FAILED has increased from Win/Mac Canary from 60.0.3110.0. Win: https://uma.googleplex.com/timeline_v2?sid=7031075b44cb2059537a237b9ebdd90d Mac: https://uma.googleplex.com/timeline_v2?sid=ba26407cc67bfea58b9c56d382cfe876
,
Jun 2 2017
,
Jun 2 2017
I think this issue is caused by the experiment of PlzNavigate. The negative impact of PlzNavigate on the error rate of ServiceWorker.StartWorker.StatusByPurpose_FETCH_MAIN_FRAME has increased from 60.0.3110 to 61.0.3115. There are two related patches in issue 705318 . - "PlzNavigate: implement process reuse for ServiceWorkers" (5d947f5a0c60d33fa16b6609978bd712a497cb00) landed in 60.0.3110.0. - "Ensure the NTP ServiceWorker has the proper site URL" (a4262ab9315b24197a00cc6c942474144c1e4a7f) landed in 61.0.3116.0. Win Canary UMAs PlzNavigate | OFF | ON | 60.0.3100.* | 99.92% | 99.74% | https://uma.googleplex.com/p/chrome/variations/?sid=97652f135c106ad99214478081694d53 60.0.3101.* | 99.92% | 99.70% | https://uma.googleplex.com/p/chrome/variations/?sid=2d0e73a9ea0a5c29b648397886f75161 60.0.3102.* | 99.93% | 99.75% | https://uma.googleplex.com/p/chrome/variations/?sid=1bf58fd28b73f2bf67679e57f6a00f79 60.0.3103.* | 99.91% | 99.72% | https://uma.googleplex.com/p/chrome/variations/?sid=11f5bd5cf322226618b9c7487fd5eb57 60.0.3104.* | 99.93% | 99.82% | https://uma.googleplex.com/p/chrome/variations/?sid=214d819b870da4dffb3c9c86a2c9ba89 60.0.3105.* | 99.90% | 99.64% | https://uma.googleplex.com/p/chrome/variations/?sid=9b64d16b42d5e64c5e0488d07e481d18 60.0.3106.* | 99.85% | 99.57% | https://uma.googleplex.com/p/chrome/variations/?sid=4bebeafb7cf4cd36bdc3bb004a8297c4 60.0.3107.* | 99.92% | 99.68% | https://uma.googleplex.com/p/chrome/variations/?sid=5b36959eec939590d8c16f7499d1c1e2 60.0.3108.* | 99.92% | 99.74% | https://uma.googleplex.com/p/chrome/variations/?sid=bc9b30f3e69a34f77a2d613ed0dc57e8 60.0.3109.* | 99.93% | 99.70% | https://uma.googleplex.com/p/chrome/variations/?sid=de9b3a64fd2607c48e6fcec931c1c51d 60.0.3110.* | 99.92% | 98.55% | https://uma.googleplex.com/p/chrome/variations/?sid=ad13f71e3226d5041334eb9cedb6e84e 60.0.3111.* | 99.93% | 98.78% | https://uma.googleplex.com/p/chrome/variations/?sid=8328bc35412502c4eb3a679a438abcb4 60.0.3112.* | 99.90% | 99.13% | https://uma.googleplex.com/p/chrome/variations/?sid=8f093400f48a0f54b8801d2fe0f78d00 61.0.3113.* | 99.92% | 99.08% | https://uma.googleplex.com/p/chrome/variations/?sid=8e3d31b1ba3d93419f9560f45df1480c 61.0.3114.* | 99.92% | 99.36% | https://uma.googleplex.com/p/chrome/variations/?sid=febefe2b5dd799a1c3544cba15e9cb72 61.0.3115.* | 99.92% | 98.97% | https://uma.googleplex.com/p/chrome/variations/?sid=ec79eee939a9fb48be1984791af968f9 61.0.3116.* | 99.90% | 99.71% | https://uma.googleplex.com/p/chrome/variations/?sid=a56351feee000942b3bff8f749075ca2
,
Jun 2 2017
The percentage in the comment 3 is the ratio of SERVICE_WORKER_OK of ServiceWorker.StartWorker.StatusByPurpose_FETCH_MAIN_FRAME.
,
Jun 2 2017
The negative impact of PlzNavigate on Android looks terrible. We don't have enough data of Android (>=3116). So I want to keep this issue open. Android Canary UMAs of SERVICE_WORKER_OK of ServiceWorker.StartWorker.StatusByPurpose_FETCH_MAIN_FRAME PlzNavigate | OFF | ON | 60.0.3100.* | 99.84% | 98.41% | https://uma.googleplex.com/p/chrome/variations/?sid=da2adb76c8fa3fb43e551da243242a8a 60.0.3101.* | 99.77% | 98.11% | https://uma.googleplex.com/p/chrome/variations/?sid=c364091900c67b9ce0c91dfff6d2dcfe 60.0.3102.* | 99.76% | 98.10% | https://uma.googleplex.com/p/chrome/variations/?sid=c4c5b2de06a15287387e37cd0de9d0dd 60.0.3103.* | 99.77% | 98.07% | https://uma.googleplex.com/p/chrome/variations/?sid=ffc3a82d9825d7c9ff4f000631314494 60.0.3104.* | 99.63% | 97.67% | https://uma.googleplex.com/p/chrome/variations/?sid=c59b462f88a622e9c56b3d145f44774b 60.0.3105.* | Not enough data | https://uma.googleplex.com/p/chrome/variations/?sid=1470caf1ac0ffc6105f1513a0abf0fa8 60.0.3106.* | Not enough data | https://uma.googleplex.com/p/chrome/variations/?sid=575704a2c311af7742b81d842a4a529f 60.0.3107.* | 99.80% | 98.62% | https://uma.googleplex.com/p/chrome/variations/?sid=529109f97fab0eeecd6989391d61a2a9 60.0.3108.* | 99.63% | 99.01% | https://uma.googleplex.com/p/chrome/variations/?sid=82c959d57450b1d2d8f9f76cf628c937 60.0.3109.* | 99.83% | 96.79% | https://uma.googleplex.com/p/chrome/variations/?sid=c58c1028682356b092c4292a0a1d9b64 60.0.3110.* | Not enough data | https://uma.googleplex.com/p/chrome/variations/?sid=1645229c8c9fa5a690e39c0cae25e21d 60.0.3111.* | 99.71% | 85.91% | https://uma.googleplex.com/p/chrome/variations/?sid=e58a3ada420665fbaa96f5e497a5dffb 60.0.3112.* | 99.79% | 89.86% | https://uma.googleplex.com/p/chrome/variations/?sid=58a1ec8391de4139fbecb02c4f6307e2 61.0.3113.* | Not enough data | https://uma.googleplex.com/p/chrome/variations/?sid=2eef204e49aa2617d97efd42686a347b 61.0.3114.* | 99.69% | 90.41% | https://uma.googleplex.com/p/chrome/variations/?sid=9387b122f9a47a5c36efa3b66a2cd5d8 61.0.3115.* | 99.67% | 90.51% | https://uma.googleplex.com/p/chrome/variations/?sid=6c10ed00d4386f6d7a09ab98ff816059 61.0.3116.* | Not enough data | https://uma.googleplex.com/p/chrome/variations/?sid=4b2e99f62651694de41d5113d241db9d
,
Jun 2 2017
Yes that seems pretty bad. My best guess is that we might be trying to put the SW in a process that's not able to instantiate it (ie exiting, unresponsive?). I did not put additional checks on the process we return, so it might be in a wrong state. Charlie, Nasko, what are we supposed to check on the RPH to ensure it is in a state where it can start a SW?
,
Jun 2 2017
Other than checking that it's not shutting down, I don't know what else would prevent a RPH from starting a ServiceWorker. falken@, do you know?
,
Jun 6 2017
I don't think service worker really does anything fancy with the RPH. Only thing I can think of is, once you decide to use the RPH for the service worker, you must call IncrementServiceWorkerRefCount to ensure the RPH stays alive. Otherwise the browser process may thinks it's fine to kill the renderer process.
,
Jun 6 2017
I think we should not set REUSE_PENDING_OR_COMMITTED_SITE if |can_use_existing_process| is false in ServiceWorkerProcessManager::AllocateWorkerProcess(). |can_use_existing_process| is false if the service worker version has failed to start more than once. https://chromium.googlesource.com/chromium/src/+blame/d99a9664c873fc38e226e33bc92da974fb852da3/content/browser/service_worker/embedded_worker_instance.cc#300 This logic prevent attempting to start the service worker in a process that's not able to instantiate.
,
Jun 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/47dc328bb08ac9c66848eeaf68fc8e96a39975cb commit 47dc328bb08ac9c66848eeaf68fc8e96a39975cb Author: clamy <clamy@chromium.org> Date: Wed Jun 07 00:53:39 2017 Only use SiteInstance process reuse for ServiceWorker with no failures This CL ensures we will only attempt to use the SiteInstance process reuse mechanism when creating a SiteInstance for a ServiceWorker when the ServiceWorker has not failed to start. Otherwise, we will try to start it in a brand new process. BUG= 728030 Review-Url: https://codereview.chromium.org/2925623004 Cr-Commit-Position: refs/heads/master@{#477493} [modify] https://crrev.com/47dc328bb08ac9c66848eeaf68fc8e96a39975cb/content/browser/service_worker/service_worker_process_manager.cc [modify] https://crrev.com/47dc328bb08ac9c66848eeaf68fc8e96a39975cb/content/browser/service_worker/service_worker_process_manager.h [modify] https://crrev.com/47dc328bb08ac9c66848eeaf68fc8e96a39975cb/content/browser/service_worker/service_worker_process_manager_unittest.cc
,
Jun 7 2017
Issue 725337 has been merged into this issue.
,
Jun 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ada6aaf7955bf98ca25247619eeebad0e1a06466 commit ada6aaf7955bf98ca25247619eeebad0e1a06466 Author: Matt Falkenhagen <falken@chromium.org> Date: Wed Jun 07 07:25:47 2017 service worker: Address comment/DCHECK nits from previous code review. Follow-up to https://codereview.chromium.org/2925623004/ Bug: 728030 Change-Id: I09f8b0f53e5ba79a3074e74ba80035c63848b4db Reviewed-on: https://chromium-review.googlesource.com/526893 Reviewed-by: Kinuko Yasuda <kinuko@chromium.org> Commit-Queue: Matt Falkenhagen <falken@chromium.org> Cr-Commit-Position: refs/heads/master@{#477574} [modify] https://crrev.com/ada6aaf7955bf98ca25247619eeebad0e1a06466/content/browser/service_worker/service_worker_process_manager.cc [modify] https://crrev.com/ada6aaf7955bf98ca25247619eeebad0e1a06466/content/browser/service_worker/service_worker_process_manager_unittest.cc
,
Jun 8 2017
@horo or anyone else: looking at the UMA links posted in comments 1 - 5, these graphs all recovered after May 30 (i.e. after https://codereview.chromium.org/2898313003 landed). Some links: https://uma.googleplex.com/timeline_v2?sid=e874c2f437609e6b81a45de65c230f70 https://uma.googleplex.com/p/chrome/variations/?sid=80ab9098b4619ec1794ae62874a471bb Is this still actually a bug?
,
Jun 8 2017
UMA of Windows recovered at 60.0.3116. UMA of Andoird recovered at 60.0.3117. I think the https://codereview.chromium.org/2898313003 which landed in 60.0.3116 is not related to Android where NTP doesn't use service worker. I don't understand why its recovered on Android at 60.0.3117. But anyway I can mark this bug as FIXED. Windows Canary UMAs of SERVICE_WORKER_OK of ServiceWorker.StartWorker.StatusByPurpose_FETCH_MAIN_FRAME PlzNavigate | OFF | ON | 60.0.3116.* | 99.91% | 99.62% | https://uma.googleplex.com/p/chrome/variations/?sid=8b6e811353db9443bebe993bd5997af9 60.0.3117.* | 99.92% | 99.62% | https://uma.googleplex.com/p/chrome/variations/?sid=dbb509218de0c6cce111318a4e63a2b5 60.0.3118.* | 99.90% | 99.63% | https://uma.googleplex.com/p/chrome/variations/?sid=1a1884ec413b04a59eeca35526b64243 60.0.3119.* | 99.93% | 99.72% | https://uma.googleplex.com/p/chrome/variations/?sid=eb1b4761bdd6cb1aac25e765298ad6c1 Android Canary UMAs of SERVICE_WORKER_OK of ServiceWorker.StartWorker.StatusByPurpose_FETCH_MAIN_FRAME PlzNavigate | OFF | ON | 60.0.3116.* | 99.87% | 90.94% | https://uma.googleplex.com/p/chrome/variations/?sid=dcd92393aa18902bd46294a36d1c05c6 60.0.3117.* | 99.85% | 99.85% | https://uma.googleplex.com/p/chrome/variations/?sid=d838447555e3fbb65bd83d7708916890 60.0.3118.* | 99.76% | 99.43% | https://uma.googleplex.com/p/chrome/variations/?sid=18e31a1964c07ae7ea67c28b2fa973d3 60.0.3119.* | 99.42% | 99.37% | https://uma.googleplex.com/p/chrome/variations/?sid=71bc1ecc2a30ac6f4880a0a00cb4a36a
,
Jun 8 2017
,
Jun 8 2017
Your change meets the bar and is auto-approved for M60. Please go ahead and merge the CL to branch 3112 manually. Please contact milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/acaa6e0bf4e2df67012aa1e055d6233e6920d958 commit acaa6e0bf4e2df67012aa1e055d6233e6920d958 Author: John Abd-El-Malek <jam@chromium.org> Date: Thu Jun 08 17:14:06 2017 Only use SiteInstance process reuse for ServiceWorker with no failures This CL ensures we will only attempt to use the SiteInstance process reuse mechanism when creating a SiteInstance for a ServiceWorker when the ServiceWorker has not failed to start. Otherwise, we will try to start it in a brand new process. BUG= 728030 Review-Url: https://codereview.chromium.org/2925623004 Cr-Original-Commit-Position: refs/heads/master@{#477493} Review-Url: https://codereview.chromium.org/2927973002 . Cr-Commit-Position: refs/branch-heads/3112@{#255} Cr-Branched-From: b6460e24cf59f429d69de255538d0fc7a425ccf9-refs/heads/master@{#474897} [modify] https://crrev.com/acaa6e0bf4e2df67012aa1e055d6233e6920d958/content/browser/service_worker/service_worker_process_manager.cc [modify] https://crrev.com/acaa6e0bf4e2df67012aa1e055d6233e6920d958/content/browser/service_worker/service_worker_process_manager.h [modify] https://crrev.com/acaa6e0bf4e2df67012aa1e055d6233e6920d958/content/browser/service_worker/service_worker_process_manager_unittest.cc
,
Jun 16 2017
Issue 733933 has been merged into this issue. |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by horo@chromium.org
, Jun 2 2017