New issue
Advanced search Search tips

Issue 856969 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jul 9
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Layout Test external/wpt/fetch/api/abort/serviceworker-intercepted.https.html is flaky

Project Member Reported by grt@chromium.org, Jun 27 2018

Issue description

The following layout test is flaky on all platforms

external/wpt/fetch/api/abort/serviceworker-intercepted.https.html
virtual/outofblink-cors/external/wpt/fetch/api/abort/serviceworker-intercepted.https.html

Probable cause:

I have no idea.
 

Comment 1 by grt@chromium.org, Jun 27 2018

https://test-https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webkit_layout_tests&tests=external%2Fwpt%2Ffetch%2Fapi%2Fabort%2Fserviceworker-intercepted.https.html

23:42:55.112 6575 worker/1 external/wpt/fetch/api/abort/serviceworker-intercepted.https.html output stderr lines:
23:42:55.112 6575   [1:1:0626/234237.502924:ERROR:render_process_impl.cc(209)] WebFrame LEAKED 170 TIMES
23:42:55.115 6453 [1910/3446] external/wpt/fetch/api/abort/serviceworker-intercepted.https.html failed unexpectedly (test timed out)
23:42:55.112 6575 worker/1 killing primary driver
23:42:55.112 6575 worker/1 killing secondary driver
23:42:55.113 6575 worker/1 external/wpt/fetch/api/abort/serviceworker-intercepted.https.html failed:
23:42:55.113 6575 worker/1  test timed out

Project Member

Comment 2 by bugdroid1@chromium.org, Jun 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/215330d2d03c22d971a0caedfe4a1a9b4ecd8d15

commit 215330d2d03c22d971a0caedfe4a1a9b4ecd8d15
Author: Greg Thompson <grt@chromium.org>
Date: Wed Jun 27 10:33:24 2018

Mark serviceworker-intercepted.https.html tests as flaky.

BUG= 856969 
TBR=grt@chromium.org
NOTRY=true

Change-Id: I1864afbdfa1a71f4d6acdcba9ca9873659cc5ddb
Reviewed-on: https://chromium-review.googlesource.com/1116779
Reviewed-by: Greg Thompson <grt@chromium.org>
Commit-Queue: Greg Thompson <grt@chromium.org>
Cr-Commit-Position: refs/heads/master@{#570724}
[modify] https://crrev.com/215330d2d03c22d971a0caedfe4a1a9b4ecd8d15/third_party/WebKit/LayoutTests/TestExpectations

Comment 3 by falken@chromium.org, Jun 28 2018

Cc: leon....@intel.com
Components: Blink>Network>FetchAPI
Started looking at this. It's only flaky on Debug. The flakes started around June 24 around r569915. The only service worker or fetch related change I spot around that time is Leon's https://chromium-review.googlesource.com/1112904 which seems unlikely here.

Comment 4 by falken@chromium.org, Jun 28 2018

A local run on a Debug build can sometimes hit the timeout, when the abort/ directory is run together (not sure if it's necessary).

$ third_party/WebKit/Tools/Scripts/run-webkit-tests --target=Debag --iterations=100 external/wpt/fetch/api/abort/
[56/600] external/wpt/fetch/api/abort/serviceworker-intercepted.https.html failed unexpectedly (test timed out)
[68/600] external/wpt/fetch/api/abort/serviceworker-intercepted.https.html failed unexpectedly (test timed out)
[74/600] external/wpt/fetch/api/abort/serviceworker-intercepted.https.html failed unexpectedly (test timed out)

Comment 5 by falken@chromium.org, Jun 28 2018

OK I can get timeouts just running this one test:
$ third_party/WebKit/Tools/Scripts/run-webkit-tests --target=Debag --iterations=100 external/wpt/fetch/api/abort/serviceworker-intercepted.https.html
[4/100] external/wpt/fetch/api/abort/serviceworker-intercepted.https.html failed unexpectedly (test timed out)
[14/100] external/wpt/fetch/api/abort/serviceworker-intercepted.https.html failed unexpectedly (test timed out)

Comment 6 by falken@chromium.org, Jun 28 2018

Cc: -leon....@intel.com
Owner: leon....@intel.com
Status: Assigned (was: Untriaged)
Hum, this does seem caused by https://chromium-review.googlesource.com/1112904 after all. When that patch is reverted there are no timeouts.

Leon, can you take a look? I guess we're getting OnDetached() called in some cases where we still need to listen for OnScriptLoad() or else the job gets stucked?

Comment 7 by falken@chromium.org, Jun 28 2018

Labels: -Type-Bug -Pri-2 M-68 Pri-1 Type-Bug-Regression
It's possibly not P1 since it only flakes on Debug, but it's a bit scary. If the job queue gets stuck only a browser reboot can fix that.

Comment 8 by leon....@intel.com, Jun 28 2018

Labels: -Pri-1 -Type-Bug-Regression Pri-2 Type-Bug
Status: Started (was: Assigned)

Comment 9 by leon....@intel.com, Jun 28 2018

Labels: -Type-Bug -Pri-2 Pri-1 Type-Bug-Regression

Comment 10 by leon....@intel.com, Jun 28 2018

c#6 is right, we got OnDetached() due to the connection error handler of "mojom::EmbeddedWorkerInstanceClientPtr client_;", seems after that SWRegisterJob still needs to wait for OnScriptLoad() to continue.

One thing is confusing to me: as OnScriptLoaded()'s owner interface EmbeddedWorkerInstanceHost is associated on the message pipe of mojom::EmbeddedWorkerInstanceClient, if the message pipe got broken, EmbeddedWorkerInstanceHost::OnScriptLoaded() should also have no chance to be dispatched after that.
Could the connection possibly be re-established later? not sure.

Or client_ was destructed, but renderer still has a pointer to the  host and can send messages to it?
Project Member

Comment 12 by bugdroid1@chromium.org, Jun 29 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1c81791b45c09b52614f85a7016425b71da32c4d

commit 1c81791b45c09b52614f85a7016425b71da32c4d
Author: Han Leon <leon.han@intel.com>
Date: Fri Jun 29 07:11:10 2018

[ServiceWorker] Stop observing EmbeddedWorkerInstance once it's gone away

This CL adds a function OnDestroyed() into
EmbeddedWorkerInstance::Listener interface, so that
ServiceWorkerRegisterJob can stop observing EmbeddedWorkerInstance once
it's gone away.

BUG= 856969 , 856330 

Change-Id: I3acc9143f9ea2f8f24ab0aac465f29f97fdf9906
Reviewed-on: https://chromium-review.googlesource.com/1118006
Commit-Queue: Han Leon <leon.han@intel.com>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571417}
[modify] https://crrev.com/1c81791b45c09b52614f85a7016425b71da32c4d/content/browser/service_worker/embedded_worker_instance.cc
[modify] https://crrev.com/1c81791b45c09b52614f85a7016425b71da32c4d/content/browser/service_worker/embedded_worker_instance.h
[modify] https://crrev.com/1c81791b45c09b52614f85a7016425b71da32c4d/content/browser/service_worker/service_worker_register_job.cc
[modify] https://crrev.com/1c81791b45c09b52614f85a7016425b71da32c4d/content/browser/service_worker/service_worker_register_job.h
[modify] https://crrev.com/1c81791b45c09b52614f85a7016425b71da32c4d/third_party/WebKit/LayoutTests/TestExpectations

Labels: Merge-Request-68
From the dashboard https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webkit_tests&tests=external%2Fwpt%2Fservice-workers%2Fservice-worker%2Fabout-blank-replacement.https.html%2Cexternal%2Fwpt%2Ffetch%2Fapi%2Fabort%2Fserviceworker-intercepted.https.html, we can see the flake has disappeared.

Although M68 has no such a flake, we know the flake will also appear there once we merge the fix for issue 854063 to M68, so, here I want to request to merge the commit at c#12 to M68, too, together with that CL from issue 854063, just as https://bugs.chromium.org/p/chromium/issues/detail?id=854063#c13 said. Thanks.
Project Member

Comment 14 by sheriffbot@chromium.org, Jul 3

Labels: -Merge-Request-68 Hotlist-Merge-Review Merge-Review-68
This bug requires manual review: M68 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), kariahda@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
How safe is this merge overall? How common is this scenario where this crash can be triggered? Why should we merge this if we're not seeing in M68?
This commit has a small size and I'm confident it is safe enough. Actually this issue is a regression caused by the commit 26f26cd82a8159dff52fdef10b0a7ebb84c48040, which fixed issue 854063. Because that commit has not been merged into M68, we're not seeing this issue's problem in M68, but M68 does have the problem of issue 854063 now, so I think we should merge the 2 commits together into M68, then everything would be OK there. Thanks.
Labels: -Merge-Review-68 Merge-Approved-68
Ok thanks. Approving merge for M68, branch:3440. Can you please ensure that issue 854063 is also merged. It has been approved. 
Project Member

Comment 18 by bugdroid1@chromium.org, Jul 7

Labels: -merge-approved-68 merge-merged-3440
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/baf2459c1cc305b127d41424a86d948e731ee233

commit baf2459c1cc305b127d41424a86d948e731ee233
Author: Han Leon <leon.han@intel.com>
Date: Sat Jul 07 06:20:28 2018

[M68] [ServiceWorker] Stop observing EmbeddedWorkerInstance once it's gone away

This CL is a combination of the 2 previous commits on trunk, both of
them aim to avoid use-after-free behavior on the raw pointer of an
EmbeddedWorkerInstance.
After the 1st CL landed on trunk, it turns out that it did not fix the
problem completely and caused regressions like  issue 856969 , then we
have the 2nd CL give a complete solution.

The 1st one:
"
[ServiceWorker] Avoid using stale pointer of EmbeddedWorkerInstance

We're saving the raw pointer of an EmbeddedWorkerInstance in a
base::ScopedObserver member of ServiceWorkerRegisterJob, but, the
EmbeddedWorkerInstance is not guaranteed to outlive the
ServiceWorkerRegisterJob, it may get destroyed and lead to a situation
that ServiceWorkerRegisterJob holds a stale pointer, which could lead
to crash.

This CL lets ServiceWorkerRegisterJob listen OnDetached() from the
EmbeddedWorkerInstance so it can be aware that the EmbeddedWorkerInstanc
pointer is going to be stale and should be removed from the source list
of base::ScopedObserver.

This fix should be safe enough now, but in longer term we'll try to
remove EmbeddedWorkerInstance::Listener interface and let
EmbeddedWorkerInstance talk directly to its owner ServiceWorkerVersion,
then let ServiceWorkerRegisterJob observe the ServiceWorkerVersion it's
manipulating to get OnScriptLoaded() notification.

BUG=854063,855394

Change-Id: I9c5a46beda1aafff32a86b9055be2b53d50fda97
Reviewed-on: https://chromium-review.googlesource.com/1112904
Commit-Queue: Han Leon <leon.han@intel.com>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569906}
(cherry picked from commit 26f26cd82a8159dff52fdef10b0a7ebb84c48040)
"

The 2nd one:
"
[ServiceWorker] Stop observing EmbeddedWorkerInstance once it's gone away

This CL adds a function OnDestroyed() into
EmbeddedWorkerInstance::Listener interface, so that
ServiceWorkerRegisterJob can stop observing EmbeddedWorkerInstance once
it's gone away.

BUG= 856969 , 856330 

Change-Id: I3acc9143f9ea2f8f24ab0aac465f29f97fdf9906
Reviewed-on: https://chromium-review.googlesource.com/1118006
Commit-Queue: Han Leon <leon.han@intel.com>
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571417}
(cherry picked from commit 1c81791b45c09b52614f85a7016425b71da32c4d)
"

BUG=854063, 856969 

Change-Id: I91c68d8edba41f6f7725903288fd12ec7574c538
Reviewed-on: https://chromium-review.googlesource.com/1128469
Reviewed-by: Matt Falkenhagen <falken@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#608}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}
[modify] https://crrev.com/baf2459c1cc305b127d41424a86d948e731ee233/content/browser/service_worker/embedded_worker_instance.cc
[modify] https://crrev.com/baf2459c1cc305b127d41424a86d948e731ee233/content/browser/service_worker/embedded_worker_instance.h
[modify] https://crrev.com/baf2459c1cc305b127d41424a86d948e731ee233/content/browser/service_worker/service_worker_register_job.cc
[modify] https://crrev.com/baf2459c1cc305b127d41424a86d948e731ee233/content/browser/service_worker/service_worker_register_job.h

Status: Fixed (was: Started)

Sign in to add a comment