Issue metadata
Sign in to add a comment
|
Flaky browsertest: BackgroundSyncBrowserTest.RegisterFromServiceWorker |
||||||||||||||||||||||
Issue descriptionFailing build log: https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/272388 In BackgroundSyncBrowserTest.RegisterFromServiceWorker, "ok - foo_sw registered in SW" and "foo_sw fired" messages are expected to be received in the order of [1]. However, there seems no guarantee of the ordering between ServiceWorkerEventDispatcher::DispatchSyncEvent and the response of BackgroundSyncService::Register. As I saw [2], DispatchSyncEvent seems to be called before calling the callback of Register. [1] https://cs.chromium.org/chromium/src/content/browser/background_sync/background_sync_browsertest.cc?sq=package:chromium&dr=CSs&l=409 [2] https://cs.chromium.org/chromium/src/content/browser/background_sync/background_sync_manager.cc?sq=package:chromium&dr=CSs&rcl=1481084145&l=690 Could you fix the test, or is there a spec clarifying these order?
,
Dec 15 2016
Leon and I discussed this issue at https://codereview.chromium.org/2534403002/#msg51
,
Dec 15 2016
I'd be surprised if the spec allows the 'sync' event to occur before the register promise resolves. Indeed register() spec steps include[1]: 5. Resolve promise. 6. If the user agent is currently online, fire a sync event for newRegistration. [1] https://wicg.github.io/BackgroundSync/spec/#dom-syncmanager-register That code review suggests the CL didn't cause the ordering problem and it existed previously?
,
Dec 15 2016
shimazu@ points out that in BackgroundSyncManager::RegisterDidStore,
// fires 'sync' event
FireReadyEvents();
// resolve register() promise
base::ThreadTaskRunnerHandle::Get()->PostTask(
FROM_HERE,
base::Bind(callback, BACKGROUND_SYNC_STATUS_OK,
base::Passed(base::MakeUnique<BackgroundSyncRegistration>(
new_registration))));
The 'sync' event dispatch and register() promise resolution appear to be racing. He speculates that previously the sync event was an IPC which usually arrived after the Mojo callback to resolve the promise.
This would appear to be a bug in background sync.
,
Dec 15 2016
Sorry, I've forgot that BackgoundSync was the first user of mojo for the service workers. https://crrev.com/2490623005 just moved messages owned by BackgroundSyncClient to ServiceWorkerEventDispatcher. I think previously there was another IPC sent by InterfaceProvider internally to create BackgroundSyncClient and that might defer reaching the dispatchSyncEvent message.
,
Dec 15 2016
and currently ServiceWorkerEventDispatcher doesn't have any mechanisms to guarantee the message ordering between another interface. I'll consider how to do that.
,
Dec 15 2016
Raising priority as it's possible this is a regression in M57 (https://crrev.com/2490623005) although the ordering was never guaranteed. background sync owners: Can you comment on the severity of the problem if the ordering is mixed? mojo experts: Can you recommend a way forward to guarantee the ordering? Also would it make sense to land https://codereview.chromium.org/2575403002/ as a mitigation although the ordering is still not guaranteed? I'm going to free this bug up in the meantime in case an aforementioned owner wants to take it.
,
Dec 15 2016
Oh the joys of mojo synchronization. There probably was a race before. The code pointed out in #4 is okay (ultimately FireReadyEventsImpl runs after RegisterDidStore's callback is run). But regardless of that ordering, the existence of two mojo channels means that they can race. iclelland@ do you have a proposal for this?
,
Dec 15 2016
Ideally, I think we'd use an associated interface for this -- that wasn't an option when we wrote it originally. I haven't looked at https://crrev.com/2490623005 yet, but it sounds like we'd want to associate the renderer-to-browser channel with the channel owned by the SWEventDispatcher. Even in that case, though, I think we'd want to land https://codereview.chromium.org/2575403002/, as it would be incorrect to make a mojo call from the browser while the callback from the register() call from the browser still hasn't returned. Also cc'ing adithyas@, who recently moved the renderer endpoint from content/ to WebKit/
,
Dec 15 2016
Thanks for looking at this. Regarding https://codereview.chromium.org/2575403002/, Josh says in comment 3 that with the current code FireReadyEventsImpl actually runs after the callback... but I'm not sure I see how... Josh?
,
Dec 15 2016
Not comment 3, comment 8.
,
Dec 15 2016
FireReadyEvents() schedules a new operation which won't start until the current operation (registration) completes. The running of that PostTasked callback ends the registration operation.
,
Dec 16 2016
Got it.. I've closed the codereview. I don't want this potential regression to slip through with the upcoming vacation season. Tentatively assigning to iclelland to take a look at https://crrev.com/2490623005 and see whether we need to "associate the renderer-to-browser channel with the channel owned by the SWEventDispatcher".
,
Dec 27 2016
,
Dec 28 2016
Hi, I'm not clear about the current status here but want to share some understandings and ideas, would you PTAL? Thanks.
The original bug is caused by the broken ordering between ServiceWorkerEventDispatcher::DispatchSyncEvent and the response of BackgroundSyncService::Register, to solve it we need to associate bellowing 2 mojo interfaces on the same message pipe:
[1] content.mojom.ServiceWorkerEventDispatcher (browser-->render)
[2] blink.mojom.BackgroundSyncService (render-->browser)
I found above [1] is established via
[3] content.mojom.EmbeddedWorkerInstanceClient (browser-->render)
and
[4] content.mojom.EmbeddedWorkerSetup (browser-->render)
Also, I found
[5] content.mojom.ServiceWorkerDispatcherHost (render-->browser)
has already been associated with the legacy IPC channel(renderer process host <---> renderer process).
Thus, I'm considering to associate above {[1], [2], [3], [4]} interfaces onto the same legacy IPC channel, too, then all above 5 interfaces will keep message ordering and also keep ordering with all other legacy IPC messages(renderer process host <---> renderer process).
# Apply the method described at https://www.chromium.org/developers/design-documents/mojo/mojo-migration-guide#TOC-Converting-BrowserMessageFilters
My plan:
Step 1 -- Associate [3] and [4] on the legacy IPC channel. I've uploaded https://codereview.chromium.org/2606593002/ to do this work, and also revised related tests and got all trybots green there, if we can reach the consensus that the work direction is correct, I'll refine it more and request your formal review.
Step 2 -- Associate [1] with above [3] or [4].
Step 3 -- Associate [2] on the legacy IPC channel.
,
Dec 30 2016
OK I realized legacy IPC channel-associated interfaces framework is not encouraged to employ widely because it will disappear with IPC Channel together eventually. Seems we need to introduce one initial message pipe(a new mojo interface), which is responsible to associate all service worker mojo interfaces with itself.
,
Jan 10 2017
(Blink-Worker bug triage) "NextAction" date was over. Any updates on this? What should we do next? :)
,
Jan 18 2017
I'm unclear on whether it's a regression in M57, comment 8 says it was probably always racing. On the other hand, the browser test apparently not flaking before. If it's not a serious regression, we can downgrade Priority and ship the current code to 57 stable. jkarlin@, iclelland@ does that sound OK?
,
Jan 18 2017
Right, the race existed already, but we're seeing it more now. iclelland@, any thoughts? Let's keep the priority and try to patch a fix into early beta. If an associated interface is problematic we can always wait for an ACK from the render process that it resolved the promise before scheduling the firing operation.
,
Jan 19 2017
I think associating the background sync interfaces with ServiceWorkerEventDispatcher is better than associating with the legacy IPC channel when you try to introduce an associated interface because of what leon mentioned in c#16. I'm now planning to use one associated group for a service worker (and it'll be the group established by EmbeddedWorkerInstanceClient) to send anything to the service worker with keeping the order. ServiceWorkerEventDispatcher will be also bound with that later. But that might be problematic because everything should be associated with that. It might be better to do another way if that makes things like code health or others worse. WDYT?
,
Feb 21 2017
(blink-worker triage) The window for a M57 fix is closing.
,
Apr 14 2017
I think BackgroundSync owners are comfortable owning this bug, even though the underlying regression may have been caused by the change to core service worker code. So I'm dropping the service worker component. I don't think anyone is treating this as P1 anyhow.
,
Apr 25 2017
New question:
Sync event is always dispatched on service worker thread, but usually SWRegistration.sync.register() runs on the renderer thread, for such case, even if we associated background_sync interface with sw_event_dispatcher interface, we still can't guarantee that SWRegistration.sync.register() promise is resolved always before sync event dispatch, because associated interface can just ensure that mojo messages arrive at renderer process in order, but if these messages will be posted to different threads to handle, no one can know which one will be handled firstly.
So an idea for discussion:
interface BackgroundSyncService {
Register(XXX) => (XXX);
GetRegistrations(XXX) => (XXX);
ReadyForFiring(SyncRegistration registration);
};
As above, after SWRegistration.sync.register() promise is resolved, call ReadyForFiring() to notify browser side that now this sync registration is completely ready to fire the sync event.
Thus we can obey the background sync spec strictly. WDYT? Thanks!
,
Apr 26 2017
Or we do not care the order between register promise and sync event dispatch when SWRegistration.sync.register() is called on a different thread other than worker thread? But only care the order when SWRegistration.sync.register() is called from a service worker?
,
May 9 2017
Ping?
,
May 10 2017
Strictly firing the `sync` event *after* the registration has been confirmed, regardless the source of the registration, would definitely be most predictable. Purely from an IPC perspective, what we have is: 1. Developer calls SyncManager.register() 2. Renderer -> browser to create the registration 3. Browser -> renderer to confirm the registration 4. [when online] Browser -> renderer to invoke the event What you're proposing is to have the renderer ACK (3) *after* resolving the Promise, and have the browser wait for that until starting (4). That seems good to me. I'd personally name the method DidResolveRegistration. It'd be kind of cool if Mojo had several levels of callbacks here. If we were able to respond to the Register() response that other method wouldn't be necessary.
,
May 11 2017
Hi, Peter, thanks a lot for comments and explanations! That is exactly what I wanted to express. And about the support for the callback responding to callback in Mojo, I guess Mojo team may also suggest us to create another mojo call to solve such problem, because the support might need much effort;-)
,
Oct 30
I've put together a CL that does what's proposed in #26.
,
Nov 16
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d8d6e7656f2bc20e342e405d6d18c47168a22b26 commit d8d6e7656f2bc20e342e405d6d18c47168a22b26 Author: Josh Karlin <jkarlin@chromium.org> Date: Fri Nov 16 14:31:10 2018 [BackgroundSync] Wait for client to ack registration before firing event The problem: The BackgroundSyncManager would attempt to fire a sync event as soon as it finished registering an event and fired its client callback. The client callback and the firing of the event could race since the client callback happens on the background sync mojo channel but the event firing happens on the service worker mojo channel. This meant that a registration might fire before its registration promise resolved. The fix: The BackgroundSyncManager sets a "hold_firing" bit on new registrations and won't fire them until the client has acknowledged it has received its registration callback with a DidResolveRegistration IPC. Bug: 671980 Change-Id: Ie5f982776bffc8913033fba6ec2743a18fcfd964 Reviewed-on: https://chromium-review.googlesource.com/c/1307934 Reviewed-by: Ian Clelland <iclelland@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Josh Karlin <jkarlin@chromium.org> Cr-Commit-Position: refs/heads/master@{#608766} [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/content/browser/background_sync/background_sync_manager.cc [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/content/browser/background_sync/background_sync_manager.h [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/content/browser/background_sync/background_sync_manager_unittest.cc [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/content/browser/background_sync/background_sync_registration.h [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/content/browser/background_sync/background_sync_service_impl.cc [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/content/browser/background_sync/background_sync_service_impl.h [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/third_party/blink/public/platform/modules/background_sync/background_sync.mojom [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/third_party/blink/renderer/modules/background_sync/sync_manager.cc [modify] https://crrev.com/d8d6e7656f2bc20e342e405d6d18c47168a22b26/third_party/blink/renderer/modules/background_sync/sync_manager.h
,
Nov 16
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by falken@chromium.org
, Dec 15 2016Status: Started (was: Untriaged)