New issue
Advanced search Search tips
Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Sep 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 3
Type: Bug

Blocking:
issue 834200



Sign in to add a comment
link

Issue 879015: Extension SW event dispatch: race in dispatching events to stopped worker

Reported by lazyboy@chromium.org, Aug 30 Project Member

Issue description

TLDR, we use SWContext::StartWorkerForPattern to dispatch extension events to a stopped worker, however, at the point StartWorkerForPattern's callback runs, I see
EmbeddedWorkerInstance::status() == RUNNING, but the worker script might not have completed running all its (synchronous) JavaScript (akin to LoadStop for WebContents).

For stopped worker, extensions currently rely on the synchronous parts of SW script to have finished. At least two tests currently suffer from this race and shows flakiness:
ServiceWorkerLazyBackgroundTest.EventsToStoppedExtension and
ServiceWorkerLazyBackgroundTest.EventsAfterRestart ( https://crbug.com/834200 )

Details:
(reference: https://cs.chromium.org/chromium/src/extensions/docs/events.md?rcl=203a8b36eb9c5d982fb140ac20e08218dd34b465&l=95, look into L1 registration specifically)

Say from an extension sw.js:
chrome.tabs.onCreated.addListener(...)

In browser process, we'll add a lazy version of the listener (through EventRouter::AddLazyServiceWorkerEventListener, call it L2) and live version of listener (through EventRouter::AddServiceWorkerEventListener, call it L1). The live version contains info about SW (process id, thread id, version id...), but the lazy version won't, it will contain SW scope instead for example.

Dispatching an event while the worker is running will use the live version of the listener. (It will still use StartWorkerForPattern, but that's irrelevant here).

However once the worker is stopped, browser process will only remember L2 (as expected). Dispatching an event at this point will start the worker through ServiceWorkerContext::StartWorkerForPattern with L2's scope. In normal cases sw.js's chrome.tabs.onCreated will add  L1 back again, but if it is delayed after StartWorkerPattern's callback is called, then we won't see L1 in the browser process yet, and dispatching from there would fail as the browser process doesn't have knowledge about the listener.

Matt: For lazy background page events, we rely on the background page's DidLoadStop (might not be 100% exact, but that's roughly what happens). This ensures that synchronous scripts are run and L1 is registered as described above. Is there any similar concept for Service Workers that we can use?
Also, doesn't vanilla Service Worker events depend on something similar?


Assigning to falken@ temporarily, please assign it back to me with response :)
 

Comment 1 by falken@chromium.org, Aug 30

Not 100% sure I understand. Is the problem that sw.js calls chrome.tabs.onCreate.addListener(...) at the top-level of the script but you're seeing it being called after RUNNING status was entered? 

The service worker doesn't reach RUNNING until it finished initial JavaScript execution, so it's expected that top-level JS is done. The service worker could of course still be running JavaScript while in RUNNING state e.g., if it did Promise.resolve().then(...) or setTimeout or something started dispatching events to it.

Is chrome.tabs.onCreate.addListener sending an IPC to the browser process? Could it be a race between that IPC and the OnStarted IPC from the service worker, since they may be on different Mojo/IPC pipes?

Comment 2 by lazyboy@chromium.org, Aug 30

Cc: -lazyboy@chromium.org falken@chromium.org
Owner: lazyboy@chromium.org
Is the problem that sw.js calls chrome.tabs.onCreate.addListener(...) at the top-level of the script but you're seeing it being called after RUNNING status was entered? 
-> Yes, that's what I'm seeing IIUC.

The service worker doesn't reach RUNNING until it finished initial JavaScript execution, so it's expected that top-level JS is done. 
-> Thanks!, that's the guarantee I was looking for before diving deeper.

re: setTimeout/Promise:
-> No, addListener is called synchronously from top level JS

re: IPC
yes, it's an IPC, particularly ExtensionHostMsg_AddListener. The race is plausible between IPC and whatever SW code is using (mojo?).
I think I'll take a stab at ensuring that first. Any hint is appreciated :)

Comment 3 by falken@chromium.org, Aug 30

Yep IPC ordering seems likely. The service worker OnStarted message is on a dedicated Mojo pipe: see EmbeddedWorkerInstanceClient.OnStarted in embedded_worker.mojom.

There's no ordering guarantee between that message and legacy IPCs.

Comment 4 by lazyboy@chromium.org, Aug 31

Cc: rdevlin....@chromium.org
Status: Started (was: Assigned)
^^^I can confirm that I'm hitting the out-of-order message arrival in the browser process which is indeed as suspected is the root case. The worker thread sees SendWorkerStarted *after* we dispatch extension message from renderer worker thread, but the browser receives them in reverse order sometimes...

Just documenting this for now, will follow up with possible "fixes" for this later. Also, /cc Devlin.

Debug code and logs for reference:
https://chromium-review.googlesource.com/c/chromium/src/+/1198528/1

Successful case:
[201944:201944:0830/174153.817594:ERROR:service_worker_apitest.cc(1609)] Opening new tab to get message
[201944:201944:0830/174153.850439:ERROR:event_router.cc(536)] DispatchEventImpl: tabs.onCreated
[201944:201944:0830/174153.850488:ERROR:event_router.cc(563)] Dispatch lazy is_for_service_worker: event_name = tabs.onCreated
[201944:201944:0830/174153.850588:ERROR:service_worker_task_queue.cc(156)] RunAfterStartWorker
[1:14:0830/174154.032396:ERROR:ipc_message_sender.cc(185)] [WORKER] [thread_id: 14], +WorkerThreadDispatcher
[1:14:0830/174154.041951:ERROR:ipc_message_sender.cc(231)] [WORKER] [thread_id: 14], AddListener = tabs.onCreated  <------------------ [a]
[201944:201944:0830/174154.042330:ERROR:extension_message_filter.cc(180)] [UI] ExtensionMessageFilter AddListener: tabs.onCreated <---- [a']
[201944:201944:0830/174154.042365:ERROR:event_router.cc(195)] ER::AddServiceWorkerEventListener: tabs.onCreated
[1:14:0830/174154.042333:ERROR:service_worker_context_client.cc(1548)] [WORKER] [thread_id: 14], ServiceWorkerContextClient::SendWorkerStarted <-------- [b]
[201944:203022:0830/174154.042927:ERROR:embedded_worker_instance.cc(843)] EmbeddedWorkerInstance::OnStarted[IO] [Will be RUNNING], thread_id = 14  <---- [b']
[201944:203022:0830/174154.043215:ERROR:service_worker_context_wrapper.cc(97)] DidStatWorker, instance->status() = 2
[201944:203022:0830/174154.043236:ERROR:service_worker_task_queue.cc(44)] [IO] DidStartWorkerForPattern: gjfaagkmdpfpdbaajdccmodbajagiica
[201944:201944:0830/174154.043291:ERROR:service_worker_task_queue.cc(31)] [UI] DidStartWorkerForPattern->Running Task: RunTask  <------------------------[b'']
[201944:201944:0830/174154.043331:ERROR:event_router.cc(826)] [UI] DispatchPendingEvent: tabs.onCreated, worker_thread_id = 14
[201944:201944:0830/174154.043377:ERROR:event_router.cc(91)] DispatchExtensionMessage: event_name = tabs.onCreated
[201944:201944:0830/174154.049457:ERROR:service_worker_apitest.cc(1614)] Done

out-of-order/failure case:
[55734:55734:0830/174530.254853:ERROR:service_worker_apitest.cc(1609)] Opening new tab to get message
[55734:55734:0830/174530.281004:ERROR:event_router.cc(536)] DispatchEventImpl: tabs.onCreated
[55734:55734:0830/174530.281042:ERROR:event_router.cc(563)] Dispatch lazy is_for_service_worker: event_name = tabs.onCreated
[55734:55734:0830/174530.281112:ERROR:service_worker_task_queue.cc(156)] RunAfterStartWorker
[1:14:0830/174530.429605:ERROR:ipc_message_sender.cc(185)] [WORKER] [thread_id: 14], +WorkerThreadDispatcher
[1:14:0830/174530.473726:ERROR:ipc_message_sender.cc(231)] [WORKER] [thread_id: 14], AddListener = tabs.onCreated     <---------------------------------- [a]
[1:14:0830/174530.474039:ERROR:service_worker_context_client.cc(1548)] [WORKER] [thread_id: 14], ServiceWorkerContextClient::SendWorkerStarted <--------- [b]
[55734:56129:0830/174530.474480:ERROR:embedded_worker_instance.cc(843)] EmbeddedWorkerInstance::OnStarted[IO] [Will be RUNNING], thread_id = 14  <------- [b']
[55734:56129:0830/174530.474735:ERROR:service_worker_context_wrapper.cc(97)] DidStatWorker, instance->status() = 2
[55734:56129:0830/174530.474754:ERROR:service_worker_task_queue.cc(44)] [IO] DidStartWorkerForPattern: gjfaagkmdpfpdbaajdccmodbajagiica
[55734:55734:0830/174530.475504:ERROR:service_worker_task_queue.cc(31)] [UI] DidStartWorkerForPattern->Running Task: RunTask      <---------------------- [b'']
[55734:55734:0830/174530.475532:ERROR:event_router.cc(826)] [UI] DispatchPendingEvent: tabs.onCreated, worker_thread_id = 14
[55734:55734:0830/174530.475544:ERROR:event_router.cc(840)] DispatchPendingEvent: NO PROCESS FOUND TO DISPATCH, IMMINENT FAILURE AHEAD
[55734:55734:0830/174530.475552:ERROR:event_router.cc(841)] DispatchPendingEvent: NO PROCESS FOUND TO DISPATCH, IMMINENT FAILURE AHEAD
[55734:55734:0830/174530.475560:ERROR:event_router.cc(842)] DispatchPendingEvent: NO PROCESS FOUND TO DISPATCH, IMMINENT FAILURE AHEAD
[55734:55734:0830/174530.475608:ERROR:extension_message_filter.cc(180)] [UI] ExtensionMessageFilter AddListener: tabs.onCreated   <---------------------- [a']
[55734:55734:0830/174530.475619:ERROR:event_router.cc(195)] ER::AddServiceWorkerEventListener: tabs.onCreated

Comment 5 by bugdroid1@chromium.org, Sep 13

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2

commit b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2
Author: Istiaque Ahmed <lazyboy@chromium.org>
Date: Thu Sep 13 15:17:25 2018

[Extensions SW] Fix Event registration race.

There's no ordering guarantee between SW mojo IPCs and extension IPCs.
This (infrequently) causes browser process to see event listener registration
*after* the browser process decides to dispatch that event.

This CL adds extension IPCs to notify browser process when:
  - A service worker context has finished loading and executing its global JS
    scope's code (this corresponds to EmbeddedWorkerInstance::OnStarted)
  - A service worker context has destroyed.

With these IPCs, browser process can identify which extenison SWs are running
at a given point, and ServiceWorkerTaskQueue can queue up tasks to dispatch
events when a worker hasn't completely started, e.g. when the worker has started
from SW mojo IPC's perspective, but it hasn't yet finished executing its global
JS code from legacy IPC's perspective.

Bug:  879015 
Test: EventsToStoppedWorker test doesn't flake locally anymore.
Change-Id: I390e102a55acb97d680265d3362d7e01d72aea1b
Reviewed-on: https://chromium-review.googlesource.com/1203319
Commit-Queue: Istiaque Ahmed <lazyboy@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Reviewed-by: Devlin <rdevlin.cronin@chromium.org>
Reviewed-by: Makoto Shimazu <shimazu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#591014}
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/chrome/browser/extensions/service_worker_apitest.cc
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/chrome/renderer/chrome_content_renderer_client.cc
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/chrome/renderer/chrome_content_renderer_client.h
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/content/public/renderer/content_renderer_client.h
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/content/renderer/service_worker/service_worker_context_client.cc
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/browser/extension_service_worker_message_filter.cc
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/browser/extension_service_worker_message_filter.h
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/browser/service_worker_task_queue.cc
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/browser/service_worker_task_queue.h
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/common/extension_messages.h
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/renderer/dispatcher.cc
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/renderer/dispatcher.h
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/renderer/worker_thread_dispatcher.cc
[modify] https://crrev.com/b8e24bdbf6bbe43bc32fd684f2a0aad914e804d2/extensions/renderer/worker_thread_dispatcher.h

Comment 6 by lazyboy@chromium.org, Sep 13

Status: Fixed (was: Started)

Sign in to add a comment