New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 728013 link

Starred by 9 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 1
Type: Bug-Regression

Blocking:
issue 649558



Sign in to add a comment

ServiceWorker thread pegs CPU 100% in native code

Project Member Reported by vdje...@fb.com, May 31 2017

Issue description

Chrome Version: Chrome Beta 59.0.3071.71
OS: OS X 

One of our developers consistently sees 100% CPU usage on Chrome Beta from Facebook.com's ServiceWorker thread. He is able to reproduce it consistently.

The ServiceWorker thread is not very busy with JS code. It forwards simple fetch requests from a Facebook tab to the network every few seconds. It takes it about 1 millisecond to handle these requests.

Pausing ServiceWorker JS execution in Chrome devtools only reveals these fetch events executing infrequently.

Tracing with chrome://tracing with all default categories enabled and most non-default categories enabled reveals a ton of ServiceWorker event-loop activity (native code) but no obvious cause.

The trace: https://www.dropbox.com/s/is8i9xslszvem8o/sw_cpu.json?dl=0

Profiling with the OSX Activity Monitor is also not very insightful. See attachment.


It seems like a Chrome bug, have you seen anything like this before?
 
sw_pegging_cpu.txt
522 KB View Download

Comment 1 by vdje...@fb.com, May 31 2017

Cc: kenjibaheux@chromium.org

Comment 2 by falken@chromium.org, May 31 2017

I indeed don't see anything useful in the trace. The browser process is not dispatching events to the service worker in that time.

Could the uBlock thing be related? Is that an extension? Is it reproducible when uBlock is not there?

Comment 3 by vdje...@fb.com, May 31 2017

Yes, uBlock is an extension. How would it impact the ServiceWorker thread though?
The user did try turning off uBlock but didn't see any change.

Comment 4 by falken@chromium.org, May 31 2017

Thanks. Not sure how it would affect it. Just trying to simplify the environment.

Comment 5 by horo@chromium.org, May 31 2017

Owner: falken@chromium.org
Status: Assigned (was: Untriaged)

Comment 6 by n...@fb.com, May 31 2017

Cc: n...@fb.com

Comment 7 by balp...@fb.com, Jun 2 2017

(I'm the one experiencing the problem.)

We figured out that this doesn't seem to happen for me on release (58.0.3029.110), just beta (59.0.3071.71). Bisect says https://chromium.googlesource.com/chromium/src/+log/c00b9e07d88486e3f15226b7e37be732e57ec897..07955d420101ad98b446a9b1b3d26426627803cf -- notable is "Enable ServiceWorkerNavigationPreload by default", and I verified that if I disable that experiment in chrome://flags then the behavior goes away (at least on chromium 59.0.3071.0, build 464636). We're continuing to investigate.

Comment 8 by vdje...@fb.com, Jun 2 2017

I was able to reproduce this bug in my own profile by following Ben's steps outlined above:

1. Run Chrome Beta 59
2. Force-enable "Enable ServiceWorkerNavigationPreload by default" in chrome://flags
3. Visit Facebook.com with FB account in Facebook's ServiceWorker preload experiment

Let us know your Facebook user ID if you want to try reproducing it.
The thing hogging the CPU looks ArmOrNotify. This means there might be a SimpleWatcher which continues to watch a pipe. I guess there is some code using SimpleWatcher with AUTOMATIC arming policy which actually should use MANUAL policy.
I'm now trying to look for the code.
and also I think the trace doesn't have traces of "ServiceWorker" category. Is it possible to record the trace again with "ServiceWorker"?
I could repro the situation by this test (simple.html).
https://chromium-review.googlesource.com/c/522085/

CPU usage gets 100% after receiving the header of response.
Labels: -Pri-3 M-59 OS-Linux OS-Mac Pri-1
Here is the result of chrome://version which I can reproduce it.

Chromium	61.0.3118.0 (Developer Build) (64-bit)
Revision	5468fab28025243e30d2170d7d8440d8e1a45df8-refs/heads/master@{#476223}
OS	Linux
JavaScript	V8 6.1.46
Flash	(Disabled)
User Agent	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3118.0 Safari/537.36
Command Line	./out/master_release/chrome --flag-switches-begin --enable-experimental-web-platform-features --flag-switches-end
Executable Path	** sanitized home **/chromium/src/out/master_release/chrome
Profile Path	** sanitized home **/.config/chromium/Default
Variations	6c18ba9d-3d98b302
241fff6c-4eda1c57
1e528f0f-15305a2
b130ecb8-b130ecb8
d43bf3e5-d43bf3e5
ba3f87da-92cc81ec
a605b19e-3f4a17df
5ca89f9-3f4a17df
f3499283-2f3631ce
9e201a2b-3f4a17df
c5f6e7c4-3f4a17df
5274eb09-3f4a17df
57f575bb-f23d1dea
68812885-3f4a17df
b791c1b8-3f4a17df
9773d3bd-3f4a17df
9ef7d150-3f4a17df
2e109477-f3b42e62
99144bc3-3cc2175e
64cbdfc2-3f4a17df
5139837c-3f4a17df
7f8176d9-3f4a17df
b7786474-d93a0620
23a898eb-431d877b
7382e39a-3f4a17df
868bda90-3f4a17df
4ea303a6-3f4a17df
ce152c12-3f4a17df
3a007b7-3f4a17df
12be2281-e3b158e5
64224f74-5087fa4a
56302f8c-3f4a17df
caa06ad2-3f4a17df
ad6d27cc-3e870323
f56e0452-3f4a17df
b2f0086-93053e47
ef25c1eb-3f4a17df
7fc902e8-3f4a17df
d747916f-d747916f
1354da85-e3dff32e
494d8760-3f4a17df
fe05be5f-4ad60575
828a5926-d8f52f32
I've found that the problem might happen when GC is triggered before receiving the header of navpreload response on the service worker thread.
When it happens, BodyStreamBuffer::ProcessData cannot read the data because |stream_needs_more_| is false somehow. I guess it should be true by BodyStreamBuffer::pull. I'm not familiar with the pull(), but it might not be called after GC.

ProcessData:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/modules/fetch/BodyStreamBuffer.cpp?type=cs&q=BodyStreamBuffer&l=314

Trace of mine:
https://drive.google.com/file/d/0Bwd71LAFvG5SUW0yMjhsRlFiQ3c/view?usp=sharing

Comment 14 by horo@chromium.org, Jun 2 2017

Cc: yhirano@chromium.org
Owner: horo@chromium.org
Status: Started (was: Assigned)

Comment 16 by horo@chromium.org, Jun 2 2017

Owner: yhirano@chromium.org
Status: Assigned (was: Started)
As we talked offline, something is accidentally GCed and "persistent->IsEmpty()" check in ScriptPromisePropertyBase::ResolveOrReject() is returning true.

https://chromium.googlesource.com/chromium/src/+blame/22f51252c62761cc2ff93fe9752377626705c480/third_party/WebKit/Source/bindings/core/v8/ScriptPromisePropertyBase.cpp#79

yhirano@ will look at this issue.

Labels: -Type-Bug ReleaseBlock-Stable Type-Bug-Regression
Does this prevent the page from loading? If so, it's probably a blocker.

Comment 19 by balp...@fb.com, Jun 2 2017

shimazu: As requested, here is a new trace I just took (on 61.0.3117.0 canary). I checked all of the "Record Categories" (including ServiceWorker) but not the "Disabled by Default Categories". I believe the last trace I posted also had "ServiceWorker" checked though.

All: Let me know if I can record anything else that would be helpful or if you would like me to add your Facebook account to this experiment so you can verify that any potential fix works.
trace_sw_hang_from_61.0.3117.0_canary_see_pid_48386.json.gz
2.6 MB Download
We are about to promote M59 to stable channel next week, want to check if we have any fix in works if so how safe is the fix to get into M59 branch.
Cc: abdulsyed@chromium.org
Cc: sshruthi@chromium.org
RE #9:
> The thing hogging the CPU looks ArmOrNotify. This means there might be a 
> SimpleWatcher which continues to watch a pipe. I guess there is some code using 
> SimpleWatcher with AUTOMATIC arming policy which actually should use MANUAL 
> policy.

I think the thing hogging the CPU is the task posted by ArmOnNotify, which is OnHandleReady. OnHandleReady calls into the upper-layer user code, unfortunately the trace doesn't show any information about that.

Please, if the fix isn't going to be merged into M59 then don't ship
Navigation Preload in M59 as well.
Cc: pbomm...@chromium.org
Cc: falken@chromium.org horo@chromium.org
On what percentage is the experiment "ServiceWorkerNavigationPreload" on by default? Can we turn the experiment off if we don't have a fix soon (M59 is scheduled for promotion next week). Any major downsides?
Labels: OS-Android OS-Chrome OS-Windows
Given this is Blink, assuming should see the issue on Windows, Chrome and Android hence tagging those platforms as well.

Please remove them if they aren't applicable.

Comment 28 by horo@chromium.org, Jun 3 2017

Cc: dominicc@chromium.org roc...@chromium.org haraken@chromium.org
If my understanding is correct, there are two bugs.

[1] The preload response can cause 100% CPU usage in the service worker thread.
    This was introduced by 9eadabae1de0cb22725450596fe324f03dd63b92 in M59.
[2] The resolver of .preloadResponse could be GCed. This exist from M57 when we
    started origin trial of navigation preload.

-----------------------------------------
[1] The preload response can cause 100% CPU usage in the service worker thread.

If there is no consumer of the preload response,
BytesConsumerForDataConsumerHandle::DidGetReadable() is called infinitely after
Chrome receives the response from the server until the FetchEvent is GCed.
So the service Worker thread uses CPU 100%.

debug::TaskAnnotator::RunTask()
 => mojo::SimpleWatcher::OnHandleReady()
   => content::WebDataConsumerHandleImpl::ReaderImpl::OnHandleGotReadable()
     => blink::BytesConsumerForDataConsumerHandle::DidGetReadable()
       => blink::BodyStreamBuffer::OnStateChange()
         => blink::BodyStreamBuffer::ProcessData()

This bug doesn't exist in M58 (58.0.3029.110). but exists in M59 (59.0.3071.83).
So I bisected and found that this change introduced this bug.
- 9eadabae1de0cb22725450596fe324f03dd63b92 Mojo: Armed Watchers by rockot@

Reproduce step:
1. Start Chrome with "--enable-features=ServiceWorkerNavigationPreload" flag, if
   you are using older Chrome than 23752147e6f5f7dcaf4bad1bb1c306fb91e1ec5b
   (59.0.3071.0).
2. Go https://horo-t.github.io/serviceworker/demo/tmp/20170603/busybug/
3. Click "register SW"
4. Open chrome://tracing/ in a new tab and start recording with "toplevel"
   check box enabled.
5. Click "test page" link and see "OK" in the page.
6. Stop the tracing and see the "ServiceWorker Thread".

Expected: There are a few blocks of MessageLoop::RunTask.
Actual: The thread is occupied by a bunch of blocks of MessageLoop::RunTask.

> rockot@
Do you know why SimpleWatcher::OnHandleReady() is called multiple times?
mojo::SimpleWatcher is created in WebDataConsumerHandleImpl::ReaderImpl::StartWatching().


-----------------------------------------
[2] The resolver of .preloadResponse could be GCed.

According to the  issue 517398 , currently the ScriptPromiseProperty doesn't holds
the strong resolver references. So the resolver function could be GCed while
waiting for the response from the server.

If the resolver is GCed, there is no consumer of the preload response. So 100%
CPU usage issue [1] occurs in M59,60,61.

Reproduce step:
1. Start Chrome with "--js-flags="--expose-gc" flag (and
   "--enable-features=ServiceWorkerNavigationPreload" flag if are using older
   Chrome than 59.0.3071.0).
2. Go https://horo-t.github.io/serviceworker/demo/tmp/20170603/gcbug/
3. Click "register SW"
5. Click "test page" link

Expected: "OK" is shown in the page.
Actual: The page load will not finish.

--- sw.js ---
self.addEventListener('fetch', event => {
  console.log("fetch event");
  event.respondWith(event.preloadResponse
    .then(response => {
      console.log("preloadResponse resolved");
      return response;
    }));
  // Need to add --js-flags="--expose-gc" to the flags.
  setTimeout(_ => { self.gc(); }, 0);
});
-------------

The resolver function is GCed and will be never called.
So "preloadResponse resolved" is not shown in the console of DevTools.

I think the following all users of ScriptPromiseProperty has the similar GC bug.
  FontFace.load()
  FontFaceSet.load()
  BeforeInstallPromptEvent.userChoice()
  NavigatorBattery.getBattery()
  MediaKeySession.closed
  PresentationRequest.getAvailability()
  ServiceWorkerContainer.ready
It it correct? > yhirano@



I see. This CPU pegging is due to subtly incorrect usage of SimpleWatcher
(my fault.) When using AUTOMATIC ArmingPolicy, it's assumed that the user
will perform some operation to clear the handle's signaled state. Otherwise
another notification task will be posted immediately, ad infinitum.

I can implement a relatively simple fix for this and get it merged.
I think this could be possible fix to avoid the infinite call of OnHandleGotReadable
: https://chromium-review.googlesource.com/c/522144/

However, I think this isn't needed if the GC problem is solved because the fix will ensure the pipe will be read and it means that the handler isn't invoked again.
Thanks all for looking into this - we are planning to have our release candidate ready by Sunday 5PM (Pacific time), to be able to ship stable on Monday. If it's possible to get a fix verified in canary prior to that, that'll allow for merge to M59.

Comment 32 by horo@chromium.org, Jun 4 2017

I understand why the resolver is not called.
It is because the wrapper of FetchEvent is GCed.

FetchEvents are already kept alive by ServiceWorkerGlobalScopeProxy's |pending_preload_fetch_events_| until the preload response.
But the wrapper of FetchEvent could be GCed.
If the wrapper is GCed, ScriptPromiseProperty can't call the resolver.

If we keep the FetchEvent in JS like the following code, the resolve is correctly called.
This could be a workaround of this bug.
https://horo-t.github.io/serviceworker/demo/tmp/20170603/gcbug-workaround/sw.js
--- sw.js ---
self.addEventListener('fetch', event => {
  console.log("fetch event");
  event.respondWith(event.preloadResponse
    .then(response => {
      console.log("preloadResponse resolved");
      // [HACK] Just to keep the FetchEvent for  crbug.com/728013 .
      event;
      return response;
    }));
  // Need to add --js-flags="--expose-gc" to the flags.
  setTimeout(_ => {self.gc();}, 0);
});
-------------


I created a CL to keep the wrapper of FetchEvent alive while waiting for the preload response.
I think this CL can fix the issue [2] in the comment 28.
https://codereview.chromium.org/2922863002/ 

Comment 33 by horo@chromium.org, Jun 4 2017

Created LayoutTests for GC bug in the following methods:
  BeforeInstallPromptEvent.userChoice()
  NavigatorBattery.getBattery()
  PresentationRequest.getAvailability()
https://codereview.chromium.org/2919233002/
Owner: horo@chromium.org
Project Member

Comment 35 by bugdroid1@chromium.org, Jun 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7f1ac9b18fed816431798435d68d8ee90668d392

commit 7f1ac9b18fed816431798435d68d8ee90668d392
Author: horo <horo@chromium.org>
Date: Mon Jun 05 13:33:41 2017

Keep the wrapper of FetchEvent alive while waiting for the preload response.

This CL uses ActiveScriptWrappable and implements HasPendingActivity() to keep the wrapper of FetchEvent alive while waiting for the preload response.

FetchEvents are already kept alive by ServiceWorkerGlobalScopeProxy's |pending_preload_fetch_events_| until the preload response.
But the resolver function is not called in the current implementation if GC happens. ( https://crbug.com/728013#c28 )
That is because if the wrapper of FetchEvent are GCed, ScriptPromiseProperty can't call the resolver.

BUG= 728013 

Review-Url: https://codereview.chromium.org/2922863002
Cr-Commit-Position: refs/heads/master@{#476977}

[add] https://crrev.com/7f1ac9b18fed816431798435d68d8ee90668d392/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/navigation-preload-gc-before-response.html
[add] https://crrev.com/7f1ac9b18fed816431798435d68d8ee90668d392/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/resources/navigation-preload-gc-before-response-scope.php
[add] https://crrev.com/7f1ac9b18fed816431798435d68d8ee90668d392/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/resources/navigation-preload-gc-before-response-worker.js
[modify] https://crrev.com/7f1ac9b18fed816431798435d68d8ee90668d392/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.cpp
[modify] https://crrev.com/7f1ac9b18fed816431798435d68d8ee90668d392/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.h
[modify] https://crrev.com/7f1ac9b18fed816431798435d68d8ee90668d392/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.idl

Comment 36 Deleted

Thanks Horo! to keep in everyone in loop, we discussed this over email. We'll not block stable release for this, since a workaround is present (#32). However, the fix in #35 will be considered for a future respin. 

Comment 38 Deleted

Comment 39 by balp...@fb.com, Jun 5 2017

I am trying to verify the claimed workaround in our Facebook environment. I've changed each caller of event.respondWith() to use:

```
function respondWith(event, response) {
  event.respondWith(
    Promise.resolve(response).then((resp) => {
      // Keep `event` alive until the promise is resolved. Works around a
      // Chrome bug. Math.random ensures no DCE happens.  crbug.com/728013#c32 
      if (Math.random() === -1) {
        return event;
      }
      return resp;
    })
  );
}
```

but I can still reproduce the problem of 100% CPU usage in my testing locally. Can you let me know if I might be missing something? (In my brief testing, gcbug-workaround indeed does not exhibit the issue.)

Comment 40 by horo@chromium.org, Jun 5 2017

Hummm... I don't know why, but the workaround doesn't work with a function.
https://horo-t.github.io/serviceworker/demo/tmp/20170603/gcbug-workaround-fail/sw.js

I think if you use a Set to keep the navigation requests, you can avoid this GC bug.
https://horo-t.github.io/serviceworker/demo/tmp/20170603/gcbug-workaround-new/sw.js
```
self.pendingNavigationRequests = new Set();

function respondWith(event, response) {
  if (event.request.mode === 'navigate') {
    self.pendingNavigationRequests.add(event);
  }
  event.respondWith(
    Promise.resolve(response).then((resp) => {
      if (event.request.mode === 'navigate') {
        self.pendingNavigationRequests.delete(event);
      }
      return resp;
    })
  );
}
```

Comment 41 by horo@chromium.org, Jun 5 2017

Labels: Merge-Request-60 Merge-Request-59
Status: Fixed (was: Assigned)
Project Member

Comment 42 by sheriffbot@chromium.org, Jun 5 2017

Labels: -Merge-Request-60 Hotlist-Merge-Approved Merge-Approved-60
Your change meets the bar and is auto-approved for M60. Please go ahead and merge the CL to branch 3112 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 43 by sheriffbot@chromium.org, Jun 5 2017

Labels: -Merge-Request-59 Merge-Review-59 Hotlist-Merge-Review
This bug requires manual review: Only 0 days from stable, we might already have a stable candidate build
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 44 by bugdroid1@chromium.org, Jun 6 2017

Labels: -merge-approved-60 merge-merged-3112
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/50e7718fc9bc2d63c3c8a69de2a25864b970ba0b

commit 50e7718fc9bc2d63c3c8a69de2a25864b970ba0b
Author: Tsuyoshi Horo <horo@chromium.org>
Date: Tue Jun 06 00:15:03 2017

[Merge to M60] Keep the wrapper of FetchEvent alive while waiting for the preload response.

This CL uses ActiveScriptWrappable and implements HasPendingActivity() to keep the wrapper of FetchEvent alive while waiting for the preload response.

FetchEvents are already kept alive by ServiceWorkerGlobalScopeProxy's |pending_preload_fetch_events_| until the preload response.
But the resolver function is not called in the current implementation if GC happens. ( https://crbug.com/728013#c28 )
That is because if the wrapper of FetchEvent are GCed, ScriptPromiseProperty can't call the resolver.

BUG= 728013 

Review-Url: https://codereview.chromium.org/2922863002
Cr-Original-Commit-Position: refs/heads/master@{#476977}
Review-Url: https://codereview.chromium.org/2922283002 .
Cr-Commit-Position: refs/branch-heads/3112@{#174}
Cr-Branched-From: b6460e24cf59f429d69de255538d0fc7a425ccf9-refs/heads/master@{#474897}

[add] https://crrev.com/50e7718fc9bc2d63c3c8a69de2a25864b970ba0b/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/navigation-preload-gc-before-response.html
[add] https://crrev.com/50e7718fc9bc2d63c3c8a69de2a25864b970ba0b/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/resources/navigation-preload-gc-before-response-scope.php
[add] https://crrev.com/50e7718fc9bc2d63c3c8a69de2a25864b970ba0b/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/resources/navigation-preload-gc-before-response-worker.js
[modify] https://crrev.com/50e7718fc9bc2d63c3c8a69de2a25864b970ba0b/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.cpp
[modify] https://crrev.com/50e7718fc9bc2d63c3c8a69de2a25864b970ba0b/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.h
[modify] https://crrev.com/50e7718fc9bc2d63c3c8a69de2a25864b970ba0b/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.idl

Comment 45 by balp...@fb.com, Jun 6 2017

horo: I can still repro 100% CPU usage with your latest suggestion. I tried with the code you gave and also tried even if I don't remove anything from the Set (I verified i can see all of the events in the devtools console at that point).

Any other ideas? 

Comment 46 by horo@chromium.org, Jun 6 2017

Does the loading of the page finish after the 100% CPU usage?
If so, I think you can avoid the 100% CPU usage bug by reading the preload response body as soon as possible.

This is a bug of [1] in the comment 28.
The preload response cause 100% CPU usage after the browser receives the response until the body is consumed.
For example this SW cause the 100% CPU usage for 1000 ms.
https://horo-t.github.io/serviceworker/demo/tmp/20170603/busybugconcat
```
self.addEventListener('fetch', event => {
  var delayHeaderPromise = new Promise(resolve => {
    setTimeout(_ => {
      fetch('./scope/header.html').then(resolve);
    }, 1000);
  })

  event.respondWith(
    Promise.all([delayHeaderPromise, event.preloadResponse])
      .then(responses => Promise.all(responses.map(res => res.text())))
      .then(texts => {
        return new Response(texts[0] + texts[1],
                            {headers: [['content-type', 'text/html']]});
      })
  );
});

```
horo@, can you file a new bug for the Watcher usage and assign it to rockot@?

Comment 48 by horo@chromium.org, Jun 6 2017

Filed another bug for the incorrect usage of SimpleWatcher in WebDataConsumerHandleImpl.  issue 729908 

Filed bugs for the similar GC issue:
  BeforeInstallPromptEvent:  issue 729902 
  PresentationRequest:  issue 729903 
  NavigatorBattery: issue729904
I'd still like to understand the impact of this bug.

When the preloadResponse resolver is gc'd (comment 28, part 2), does the page to fail to load (assuming the page relies on preloadResponse)? At that point can the user do anything to navigate to the page (reload, close tab, etc)?

When the preloadResponse is gc'd, it can (or always does?) trigger 100% CPU usage (comment 28, part 1)?

horo's patch in comment 35 is expected to prevent the gc, so it should allow the page to load.

However 100% CPU usage could still be caused while waiting for the preloadResponse, or by other users of WebDataConsumerHandleImpl, so we want to fix that separately in  issue 729908 . Is that right?
Project Member

Comment 50 by bugdroid1@chromium.org, Jun 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0e43d4844fcbcaf0fb75f7bfd21c7eafaf497dd0

commit 0e43d4844fcbcaf0fb75f7bfd21c7eafaf497dd0
Author: Makoto Shimazu <shimazu@chromium.org>
Date: Tue Jun 06 09:27:36 2017

Use MANUAL arming policy for WebDataConsumerHandleImpl

OnHandleGotReadable was possible not to call Read. It triggers sudden re-arming
and hogging the CPU. This patch is to avoid the situation by using MANUAL
policy.

Bug:  728013 
Change-Id: Ie6bd7f78806d5b136dc302306cc9e920cdd01c92
Reviewed-on: https://chromium-review.googlesource.com/522144
Reviewed-by: Yutaka Hirano <yhirano@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#477242}
[modify] https://crrev.com/0e43d4844fcbcaf0fb75f7bfd21c7eafaf497dd0/content/child/web_data_consumer_handle_impl.cc
[modify] https://crrev.com/0e43d4844fcbcaf0fb75f7bfd21c7eafaf497dd0/content/child/web_data_consumer_handle_impl_unittest.cc

Comment 51 by horo@chromium.org, Jun 6 2017

> When the preloadResponse resolver is gc'd (comment 28, part 2), does the page to fail to load (assuming the page relies on preloadResponse)?

yes.

> At that point can the user do anything to navigate to the page (reload, close tab, etc)?

yes.

> When the preloadResponse is gc'd, it can (or always does?) trigger 100% CPU usage (comment 28, part 1)?

yes.

> horo's patch in comment 35 is expected to prevent the gc, so it should allow the page to load.
>
> However 100% CPU usage could still be caused while waiting for the preloadResponse, or by other users of WebDataConsumerHandleImpl, so we want to fix that separately in  issue 729908 . Is that right?

If the body of the preload response is not quickly consumed, 100% CPU usage could still happen (without the patch of the comment 50).
Example: https://horo-t.github.io/serviceworker/demo/tmp/20170603/busybugconcat/

I think other users of WebDataConsumerHandleImpl have the same issue.
Right? > yhirano@

ServiceWorker is the only user of WebDataConsumerHandleImpl.

Comment 53 by horo@chromium.org, Jun 6 2017

Cc: shimazu@chromium.org
 Issue 729908  has been merged into this issue.

Comment 54 by horo@chromium.org, Jun 6 2017

abdulsyed@

Is it OK to merge the patch of comment 50 to M60 and M59?
This patch fixes the bug [1] of the comment 28.
Labels: Merge-Request-60
Adding merge-request-60 label for M60. 
horo@ - thanks for the fix. For M59, can you please confirm if the fix is well tested in canary/dev and can we please confirm it in M60 Beta on Thursday? Is it a safe merge overall?
Project Member

Comment 57 by sheriffbot@chromium.org, Jun 6 2017

Labels: -Merge-Request-60 Merge-Approved-60
Your change meets the bar and is auto-approved for M60. Please go ahead and merge the CL to branch 3112 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 58 by bugdroid1@chromium.org, Jun 7 2017

Labels: -merge-approved-60
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/94340a4f92ed3b97655dd738151b41cb369678c0

commit 94340a4f92ed3b97655dd738151b41cb369678c0
Author: Makoto Shimazu <shimazu@chromium.org>
Date: Wed Jun 07 02:02:03 2017

[Merge to M60] Use MANUAL arming policy for WebDataConsumerHandleImpl

OnHandleGotReadable was possible not to call Read. It triggers sudden re-arming
and hogging the CPU. This patch is to avoid the situation by using MANUAL
policy.

Bug:  728013 
Change-Id: Ie6bd7f78806d5b136dc302306cc9e920cdd01c92
Reviewed-on: https://chromium-review.googlesource.com/522144
Reviewed-by: Yutaka Hirano <yhirano@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#477242}
Review-Url: https://codereview.chromium.org/2930573002 .
Cr-Commit-Position: refs/branch-heads/3112@{#215}
Cr-Branched-From: b6460e24cf59f429d69de255538d0fc7a425ccf9-refs/heads/master@{#474897}

[modify] https://crrev.com/94340a4f92ed3b97655dd738151b41cb369678c0/content/child/web_data_consumer_handle_impl.cc
[modify] https://crrev.com/94340a4f92ed3b97655dd738151b41cb369678c0/content/child/web_data_consumer_handle_impl_unittest.cc

Thanks - is the M59 merge request still for comment 50? Or does it include comment 58 as well?

Comment 60 by horo@chromium.org, Jun 7 2017

We want to merge both change to M59.
 - Use MANUAL arming policy for WebDataConsumerHandleImpl
 - Keep the wrapper of FetchEvent alive while waiting for the preload response.
re c#56:
I didn't see any failures which is coming from the changes of c#35 and c#50 on canary as far as I check UMA (StartWorker.Status/FetchEvent.MainResource.Status/FetchEvent.HasResponse.Time) and the crash dashboard. 
I'll check again after the patches are pushed into M60 beta. c#50 is initially in 60.0.3112.23. (this will happen soon, right?)

re c#59:
We want to merge c#35 and c#50 actually. c#44 and c#58 are commits for merge. 

Cc: brajkumar@chromium.org
Labels: Needs-Feedback
Tested this issue on Windows-10, Ubuntu 14.04 and Mac OS 10.12 using chrome latest beta M60-60.0.3112.24 by following steps mentioned below.

The preload response:
---------------------
1. Launched Chrome with "--enable-features=ServiceWorkerNavigationPreload" flag
2. Navigated to https://horo-t.github.io/serviceworker/demo/tmp/20170603/busybug/
3. Clicked on "register SW" button
4. Navigated to chrome://tracing/ in a new tab and started recording with "toplevel" check box enabled.
5. Clicked "test page" link and observed "OK" in the page.
6. Stoped the tracing and observed the "ServiceWorker Thread".

Expected: There are a few blocks of MessageLoop::RunTask as attached in the screen-shot

The resolver of .preloadResponse:
----------------------------------
1. Navigated to Chrome with "--js-flags="--expose-gc" flag and "--enable-features=ServiceWorkerNavigationPreload" flag
2. Navigated to https://horo-t.github.io/serviceworker/demo/tmp/20170603/gcbug/
3. Clicked on "register SW" button
5. Clicked on "test page" link

Expected: "OK" is shown in the page

Tested the same on chrome #59.0.3071.84 version and observed few blocks of MessageLoop::RunTask in preload response (Refer screen-shot) and by clicking on "test page" link it did not take much time for loading the page "ok" in resolver of .preloadResponse.

horo@ Could you please check the screen-shot and let us know is this is the expected behavior for this issue to verify it from chrome-TE end.

Thanks!


59.0.3071.84.PNG
441 KB View Download
60.0.3112.24.png
178 KB View Download

Comment 63 by horo@chromium.org, Jun 8 2017

The screenshots that you attached are the expected behavior of the bug [1].
> [1] The preload response can cause 100% CPU usage in the service worker thread.
>     This was introduced by 9eadabae1de0cb22725450596fe324f03dd63b92 in M59.

Sorry, the GC bug [2] is not reliably reproducible.
> [2] The resolver of .preloadResponse could be GCed. This exist from M57 when we
>    started origin trial of navigation preload.

But I can reproduce the bug in my Linux machine.
I attached the screenshot that the loading doesn't finish in Chrome 59.0.3071.86.
GCBug-59.0.3071.86.png
554 KB View Download
GCBug-60.0.3112.24.png
587 KB View Download

Comment 64 by horo@chromium.org, Jun 9 2017

And I verified that the BUG was fixed in 60.0.3112.24.
Labels: -Merge-Review-59 Merge-Approved-59
I spoke with horo@ offline, and they confirmed that the patches in question only affect sites utilizing navigation preload - thus we are not any worse off if the patches do not work.  We've also confirmed the fixes, and code has not diverged substantially since it was written.  Merge approved for M59 branch 3071.
Project Member

Comment 66 by bugdroid1@chromium.org, Jun 9 2017

Labels: -merge-approved-59 merge-merged-3071
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f3481c733ec543b5f8afcee11a5670166b5b7388

commit f3481c733ec543b5f8afcee11a5670166b5b7388
Author: Makoto Shimazu <shimazu@chromium.org>
Date: Fri Jun 09 01:15:31 2017

[Merge to M59] Use MANUAL arming policy for WebDataConsumerHandleImpl

OnHandleGotReadable was possible not to call Read. It triggers sudden re-arming
and hogging the CPU. This patch is to avoid the situation by using MANUAL
policy.

Bug:  728013 
Change-Id: Ie6bd7f78806d5b136dc302306cc9e920cdd01c92
Reviewed-on: https://chromium-review.googlesource.com/522144
Reviewed-by: Yutaka Hirano <yhirano@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Commit-Queue: Makoto Shimazu <shimazu@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#477242}
Review-Url: https://codereview.chromium.org/2934433002 .
Cr-Commit-Position: refs/branch-heads/3071@{#763}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/f3481c733ec543b5f8afcee11a5670166b5b7388/content/child/web_data_consumer_handle_impl.cc
[modify] https://crrev.com/f3481c733ec543b5f8afcee11a5670166b5b7388/content/child/web_data_consumer_handle_impl_unittest.cc

Project Member

Comment 67 by bugdroid1@chromium.org, Jun 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8533a49fdc1c05b8b5194c58b8f5f52965e2c673

commit 8533a49fdc1c05b8b5194c58b8f5f52965e2c673
Author: Tsuyoshi Horo <horo@chromium.org>
Date: Fri Jun 09 01:38:42 2017

[Merge to M59] Keep the wrapper of FetchEvent alive while waiting for the preload response.

This CL uses ActiveScriptWrappable and implements HasPendingActivity() to keep the wrapper of FetchEvent alive while waiting for the preload response.

FetchEvents are already kept alive by ServiceWorkerGlobalScopeProxy's |pending_preload_fetch_events_| until the preload response.
But the resolver function is not called in the current implementation if GC happens. ( https://crbug.com/728013#c28 )
That is because if the wrapper of FetchEvent are GCed, ScriptPromiseProperty can't call the resolver.

BUG= 728013 
TBR=yhirano@chromium.org

Review-Url: https://codereview.chromium.org/2922863002
Cr-Original-Commit-Position: refs/heads/master@{#476977}
Review-Url: https://codereview.chromium.org/2933463002 .
Cr-Commit-Position: refs/branch-heads/3071@{#764}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[add] https://crrev.com/8533a49fdc1c05b8b5194c58b8f5f52965e2c673/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/navigation-preload-gc-before-response.html
[add] https://crrev.com/8533a49fdc1c05b8b5194c58b8f5f52965e2c673/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/resources/navigation-preload-gc-before-response-scope.php
[add] https://crrev.com/8533a49fdc1c05b8b5194c58b8f5f52965e2c673/third_party/WebKit/LayoutTests/http/tests/serviceworker/navigation-preload/chromium/resources/navigation-preload-gc-before-response-worker.js
[modify] https://crrev.com/8533a49fdc1c05b8b5194c58b8f5f52965e2c673/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.cpp
[modify] https://crrev.com/8533a49fdc1c05b8b5194c58b8f5f52965e2c673/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.h
[modify] https://crrev.com/8533a49fdc1c05b8b5194c58b8f5f52965e2c673/third_party/WebKit/Source/modules/serviceworkers/FetchEvent.idl

Labels: -Hotlist-Merge-Review
Blocking: 649558

Sign in to add a comment