Dev token getting changed automatically for service workers
Reported by
gaurava...@gmail.com,
Aug 29 2016
Issue description
UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36

Steps to reproduce the problem:
1. Subscribe to https://compare.buyhatke.com/
2. After a few days, the user's dev token changes on its own. This happens even when the user has not cleared any data.
3. Many of us on the team have subscribed and we stop getting push messages. We have to explicitly visit the website again, and only then do we get pushes.

What is the expected behavior?
The dev token for a particular user should remain constant unless the user explicitly clears the cache or removes the data associated with it.

What went wrong?
We need to understand why the dev token is being changed dynamically for the user.

Did this work before? No

Chrome version: 52.0.2743.116 Channel: stable
OS Version: OS X 10.11.4
Flash Version:
,
Sep 3 2016
Hi. This happens on Chrome for Windows, Mac and Android alike. By dev token I mean the GCM token that Google generates.
,
Sep 6 2016
Hi Rohit, any update on this?
,
Sep 13 2016
This is not Mac-specific so I'm marking this OS=All and assigning to zea@ (from /google_apis/gcm/OWNERS) for triage.
,
Sep 13 2016
Peter, do you know what might be happening here?
,
Sep 13 2016
Are you re-registering your Service Worker by any chance? That would trigger this behaviour because subscriptions are tied to an installed Service Worker. Calling ServiceWorker.unregister() or changing the scope of the Service Worker will do this.
,
Sep 13 2016
@peter. No, we are not re-registering. We have also added an onpushsubscriptionchange listener, but it does not work in Chrome: it is not triggered when the GCM token changes for a particular user. Surprisingly, it works in Firefox.
,
Sep 14 2016
+1. We have noticed the same issue: a huge number of NotRegistered errors, especially since 2016-09-12. Maybe GCM changes the subscription ID, or the service worker changes it on the fly. I see many requests from workers with subscription IDs that are unknown to me. Unfortunately I can't reproduce it myself.
,
Sep 14 2016
We've had several clients report this issue to us at OneSignal. The symptoms are the same as described by others in thread: Users stop receiving notifications until they visit the website again and receive a new token.
,
Sep 15 2016
I am a consultant and am seeing this issue at one of the largest mobile web publishers. We have tested web push on large consumer populations, on web push platforms from multiple vendors, all focused on GCM + Chrome service worker. We consistently see this problem and are now working to diagnose the root cause. Here is some data:
* ~0.8% of the users who opted in on a given day become "unavailable/undeliverable" each day, starting within 1-2 days after opt-in. So after one month >25% are unavailable, after 2 months >50%, and after three months >75%. The messages are sent but the service workers do not get them.
* This steep rate of decline is almost perfectly linear.
* This happens regardless of whether the users receive many messages per day, or no messages are sent for several months and then a single message is transmitted.
Since we see this on all web push platforms tested, and know GCM is rock solid, we assume the issue is something fundamental with the service worker. We are not sure whether a token is expiring or something else is happening. We are testing whether the service worker reinitiates when a user returns to our site on their own, and we are adding many other diagnostics to the service worker.
Is this already on the Chromium radar to fix? It seems like a critical deal-breaker for web push and possibly for other uses of service workers.
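To put the reported figures side by side, here is a small worked calculation. It is only a sketch, not the reporter's own method: it assumes a 30-day month and compares two readings of "~0.8% per day", a linear loss measured against the original cohort (which is what "almost perfectly linear" suggests) and a compounding loss measured against the users still reachable. The function names are made up for illustration.

```python
def linear_loss(daily_rate: float, days: int) -> float:
    """Fraction of the original cohort lost if a fixed share of it drops each day."""
    return min(1.0, daily_rate * days)


def compounding_loss(daily_rate: float, days: int) -> float:
    """Fraction lost if the daily rate applies only to users still reachable."""
    return 1.0 - (1.0 - daily_rate) ** days
```

With exactly 0.8% per day, `linear_loss` gives 24%, 48% and 72% after 30, 60 and 90 days, slightly under the quoted >25%, >50% and >75%, so the measured daily figure is presumably a little above 0.8%. The compounding reading gives noticeably lower totals, which is consistent with the report describing the decline as linear.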
,
Sep 16 2016
[I'm not familiar with this part of the Chrome stack. I got internal pings on this bug and want to see how I can help move this along.]
1) Is it possible that the drop in available users happened because profile data was removed, Chrome was reinstalled, or something else triggered a reset of Chrome data?
2) Can I assume that, to get notifications again, the user has to allow notifications from the website again, as if allowing them for the first time? It's not clear what "receiving a new token" means.
3) chrome://gcm-internals/ - assuming that all GCM subscriptions are listed on this page, it would be interesting to see whether customers who are being prompted to re-register have any registrations for the website listed under "Registered App Ids".
,
Sep 16 2016
1. No, we never removed profile data or uninstalled Chrome.
2. No, just visiting the website again was enough, since the user had already subscribed. Just visiting the page created a new GCM token for the user and pushes started coming in again.
3. Customers were not prompted to register, and I can see multiple entries for the domain I am talking about.
,
Sep 16 2016
@royans
1) There is a major increase in unreachable users starting around the time this bug was reported, beyond what we saw previously due to profile data being cleared.
2) No. Simply returning to the website grants the user a new push token that works. The user does not need to allow notifications again.
3) N/A. There is no prompt to re-register.
,
Sep 16 2016
Thanks. Increasing priority for visibility. Peter: what else do we need here to help move this along ?
,
Sep 16 2016
Nothing, all of that is great data, thanks for sharing! John, would you please take a look? Please note that our team is at an off-site until next Thursday.
,
Sep 16 2016
Since subscribing returns a different GCM token, it seems that Chrome must have unsubscribed the subscription. The main reasons why Chrome might automatically unsubscribe a subscription are:
1) the user revokes Notifications permission (though the commenters say the user gets a new valid token without a new permission prompt, and it doesn't seem likely that many users toggle the permission off then on again).
2) an incoming push message arrives for a subscription that Chrome doesn't know about (not relevant here).
3) an incoming push message arrives whilst permission is revoked (not relevant here).
4) an incoming push message arrives and Chrome can't load its corresponding ServiceWorkerRegistration.
The last one might be the cause: UMA shows that across all incoming push messages, about 0.3% of the time we fail to load their corresponding ServiceWorkerRegistration[1]. This could happen because the Service Worker has been legitimately unregistered (but commenters say that's not happening), but it may also happen due to disk I/O errors, etc.
I'm going to write a patch to no longer automatically unregister after disk I/O errors (and add logging for the exact ServiceWorkerStatus), to see if that helps. I won't be able to land this until Thursday unfortunately.
[1]: https://cs.chromium.org/chromium/src/content/browser/push_messaging/push_messaging_router.cc?rcl=1474024782&l=81
[2]: https://cs.chromium.org/chromium/src/chrome/browser/push_messaging/push_messaging_service_impl.cc?rcl=1474024782&l=325
,
Sep 16 2016
Thanks so much for the work to resolve this. We are eager to see how the patch does. Two follow-up questions:
1.) Will the patch auto-update on devices back to Chrome 42?
2.) As designed, what percent of monthly attrition should we expect from a user population with properly functioning service workers due to user cache clearing and other issues?
Thanks again!
,
Sep 22 2016
We at XtremePush are also having issues. A lot of subscribed users become unavailable for push. GCM responds with success when we try to send a message to a user, but the service worker doesn't receive a push event. This obviously happens when a device is offline, but as John described above, every month our delivery rate drops by ~25%, which can't be just offline users.
Analysing individual devices, we usually see the following scenario:
1. The user opts in.
2. For some period of time nearly all the messages sent to this user are delivered (some of them might not be delivered because the device is offline).
3. After some point in time, absolutely none of the messages sent to this user are delivered.
We see this scenario over and over again for loads of users, but we can't see any trend in how long it takes for a user to become unavailable. It can be a few days or a few months.
We know of only one way to reproduce this issue: clear Chrome data in the app manager on Android. GCM can't pick up the fact that the user is gone and still responds with success when we try to send a message. Do you think this might be fixed in the future? In all other scenarios, when we clear browsing history or unsubscribe from notifications, GCM works as expected and responds with an error telling us that the user is unsubscribed.
I also have a question about the pushsubscriptionchange event. I can see it in different Push API specifications, but it seems it's not implemented in Chrome. Is it possible for a user's subscription to change on GCM? If yes, does the user need to come back to the website to get a new token?
I have also found an issue on my own Chrome on Mac OS. It was working fine for a long time, but a few days ago it broke: it stopped receiving notifications (not only from our websites but from everybody). It doesn't work even for new subscriptions. For example:
1. I go to the web push demo page: https://gauntface.github.io/simple-push-demo/
2. It asks me for push notifications, I press allow.
3. The token is generated successfully.
4. When I try to push to this token GCM responds with success, but nothing happens. No logs come from the service worker.
5. If I go to the service workers section in the browser, find my service worker and press the "Push" button, it works and displays a notification.
6. Restarting Chrome or rebooting the OS doesn't help.
It seems like the issue is with GCM, because when I go to the chrome://gcm-internals/ page I see: Connection State: LOGGING IN. It never changes to CONNECTED.
So to generalise:
1. Sometimes users become unavailable for notifications. Those numbers are big, which is a problem for websites that want to use web push notifications.
2. Can the patch you made a few days ago improve the situation?
3. Can you shed more light on changing user subscriptions and the pushsubscriptionchange event?
4. Do you have any ideas about the GCM issue described above?
Thanks, and we'd appreciate any help with this.
,
Sep 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/983872d32d2b818957e3ba7714b7a822a27e0dd8

commit 983872d32d2b818957e3ba7714b7a822a27e0dd8
Author: johnme <johnme@chromium.org>
Date: Thu Sep 22 20:39:20 2016

Push API: Don't unsubscribe when finding Service Worker fails

Failing to find a service worker might be temporary, so we shouldn't unsubscribe in this case. Continue to unsubscribe if the Service Worker is actually not found (e.g. because it has been unregistered). Also adds more UMA in this area.

BUG= 642139
Review-Url: https://codereview.chromium.org/2361113002
Cr-Commit-Position: refs/heads/master@{#420447}

[modify] https://crrev.com/983872d32d2b818957e3ba7714b7a822a27e0dd8/chrome/browser/push_messaging/push_messaging_browsertest.cc
[modify] https://crrev.com/983872d32d2b818957e3ba7714b7a822a27e0dd8/content/browser/push_messaging/push_messaging_router.cc
[modify] https://crrev.com/983872d32d2b818957e3ba7714b7a822a27e0dd8/tools/metrics/histograms/histograms.xml
,
Sep 23 2016
Requesting merge to m54 - this is a safe change, and should either reduce the rate at which push subscriptions are lost, or provide useful UMA logging to help us pursue other approaches sooner. There is a small risk that these Service Worker errors are not ephemeral and hence the push subscriptions remain broken now that we no longer auto-unsubscribe them, but any users in that state are already getting a broken experience, so it's still worth trying this.

> 1.) Will the patch auto-update on devices back to Chrome 42?
No, even if my merge request is granted it'll only be in Chrome 54 (which is currently in beta, and should reach stable channel around Oct 18th). Most Chrome installations auto-update pretty quickly, though there are always a few stragglers.

> 2.) As designed, what percent of monthly attrition should we expect from a user population with properly functioning service workers due user cache clearing and other issues?
We're still gathering data on this. Any stats you can share would be appreciated. If you're measuring delivery rate to the Service Worker using analytics, it's worth bearing in mind some reasons why messages might not be delivered even though a push subscription is healthy:
a) messages sent with collapse_key can be replaced by a newer message;
b) messages whose time_to_live is exceeded (the default and maximum are both 4 weeks) whilst the device is offline will be dropped;
c) GCM queues up to 100 non-collapse-key messages per push subscription whilst a device is offline - any more and they'll all be deleted;
d) GCM allows up to 4 different collapse keys to be used at once per push subscription - any more and an arbitrary collapse key message will be deleted.

> 3. After some point of time absolutely all the messages sent to this user are not delivered
But does GCM still return a message_id rather than an error, in its JSON response?

> clear chrome data in app manager on android
If this is the cause, it's worth noting that users who frequently clear data for privacy reasons will subscribe and unsubscribe much more often than normal users, so may make up a disproportionately large fraction of all your registrations. It's odd though, as Chrome should auto-unsubscribe the next time a message is received after clearing data, such that GCM's JSON response to the subsequent message is error:NotRegistered. A similar case that cannot be helped though is if you factory reset an Android device - in that case it's as if the device goes permanently offline, and there's no way for GCM to know that it no longer exists. However I doubt many users frequently factory reset their devices.

> pushsubscriptionchange event
We don't yet implement this, as GCM subscriptions don't yet expire (except perhaps if unused for a very long time?), but it's something we're discussing in the spec: https://github.com/w3c/push-api/issues/132.

> my own Chrome on Mac Os (...) stopped receiving notifications (...) Connection State: LOGGING IN
I haven't seen it get stuck on LOGGING IN before. Please file a new bug for that issue - thanks.
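The message_id versus error:NotRegistered distinction raised above can be checked on the sending side. The sketch below assumes the legacy GCM HTTP JSON response shape, in which `results` holds one entry per registration token, in request order, each carrying either `message_id` or `error`. The function name and the returned labels are hypothetical.

```python
import json
from typing import Dict, List


def classify_gcm_results(response_body: str, tokens: List[str]) -> Dict[str, str]:
    """Map each registration token to 'accepted', 'unsubscribed' or 'error:<code>'.

    'accepted' only means GCM returned a message_id; it does not prove that
    the Service Worker ever received the push.
    """
    results = json.loads(response_body).get("results", [])
    outcome = {}
    for token, result in zip(tokens, results):
        if "message_id" in result:
            outcome[token] = "accepted"
        elif result.get("error") == "NotRegistered":
            # The unsubscribe reached GCM servers; stop sending to this token.
            outcome[token] = "unsubscribed"
        else:
            outcome[token] = "error:" + str(result.get("error"))
    return outcome
```

Tokens that keep coming back as "accepted" while the sender's own analytics show no deliveries are the interesting ones for this bug.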
,
Sep 23 2016
Your change meets the bar and is auto-approved for M54 (branch: 2840)
,
Sep 23 2016
Could you please confirm whether this change is baked/verified in Canary and safe to merge? If yes, merge your change to M54 (branch: 2840) ASAP so that we can take it for the next Beta release.
,
Sep 26 2016
Histograms look good so far on Canary. Merging.
,
Sep 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d9b986d87105a963f619d0dd6f6931270721b13b

commit d9b986d87105a963f619d0dd6f6931270721b13b
Author: John Mellor <johnme@chromium.org>
Date: Mon Sep 26 15:16:50 2016

Push API: Don't unsubscribe when finding Service Worker fails

Failing to find a service worker might be temporary, so we shouldn't unsubscribe in this case. Continue to unsubscribe if the Service Worker is actually not found (e.g. because it has been unregistered). Also adds more UMA in this area.

BUG= 642139
Review-Url: https://codereview.chromium.org/2361113002
Cr-Commit-Position: refs/heads/master@{#420447}
(cherry picked from commit 983872d32d2b818957e3ba7714b7a822a27e0dd8)
Review URL: https://codereview.chromium.org/2371773002 .
Cr-Commit-Position: refs/branch-heads/2840@{#527}
Cr-Branched-From: 1ae106dbab4bddd85132d5b75c670794311f4c57-refs/heads/master@{#414607}

[modify] https://crrev.com/d9b986d87105a963f619d0dd6f6931270721b13b/chrome/browser/push_messaging/push_messaging_browsertest.cc
[modify] https://crrev.com/d9b986d87105a963f619d0dd6f6931270721b13b/content/browser/push_messaging/push_messaging_router.cc
[modify] https://crrev.com/d9b986d87105a963f619d0dd6f6931270721b13b/tools/metrics/histograms/histograms.xml
,
Sep 29 2016
Thanks again for getting this patch out so quickly. Do you have any recommendations on how we can test it without having to wait for weeks with many users to see if they still drop off? Will your diagnostics be able to tell early on if this is working well?
,
Sep 30 2016
> my own Chrome on Mac Os (...) stopped receiving notifications (...) Connection State: LOGGING IN
> I haven't seen it get stuck on LOGGING IN before. Please file a new bug for that issue - thanks.
The new bug is created here: https://bugs.chromium.org/p/chromium/issues/detail?id=651863
It might be another cause of users becoming unavailable for push notifications.
,
Oct 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/92ef94f69eb941d144e31f6cac45c6e3ce8a8720

commit 92ef94f69eb941d144e31f6cac45c6e3ce8a8720
Author: johnme <johnme@chromium.org>
Date: Sat Oct 01 21:24:48 2016

Push API: Refactor and fix unsubscribe API

Before this patch, 4 high-level codepaths for unsubscribe had evolved:
a) JS-initiated unsubscribe where PushMessagingMessageFilter's UnsubscribeHavingGottenIds would skip talking to the PushMessagingServiceImpl because it could not find the subscription in the Service Worker database.
b) JS-initiated unsubscribe where PushMessagingMessageFilter did talk to the PushMessagingServiceImpl, then directly removed the subscription from the Service Worker database in PushMessagingMessageFilter's Core::DidUnregisterFromService.
c) Automatic unsubscribe after revoked permission where PushMessagingServiceImpl::UnsubscribeBecausePermissionRevoked would call PushMessagingService::ClearPushSubscriptionID to ask the content layer to remove the subscription from the Service Worker database.
d) Automatic unsubscribe after bad incoming messages where PushMessagingServiceImpl would never remove the subscription from the Service Worker database.

This patch unifies them, such that all unsubscription requests go via PushMessagingServiceImpl::UnsubscribeInternal, and this method is now responsible for calling PushMessagingService::ClearPushSubscriptionID in all cases to remove the subscription from the Service Worker database.
- Eliminating (d) fixes the PUSH_DELIVERY_STATUS_PERMISSION_DENIED case, where previously we would automatically unsubscribe but never remove the corresponding subscription from the Service Worker database (this situation occurs for users that had been hit by crbug.com/633310).
- Eliminating (a) makes us more robust against any cases where a subscription had been removed from the Service Worker database but not from the PushMessagingAppIdentifier map (e.g. due to race conditions where processes are killed partway through writing state to disk).
- This adds UMA logging for the reason that caused unsubscription. This will be useful in tracking down https://crbug.com/642139
- This adds tests for each of the reasons that can trigger automatic unsubscription. Previously many of these had no coverage.
- This fixes PushMessagingBrowserTest.UnsubscribeSuccess and LegacyUnsubscribeSuccess which were failing to test what they intended to test (instead of calling unsubscribe on old references to unsubscribed PushSubscriptions and PushSubscriptions from unregistered Service Workers, they were trying and failing to get a fresh reference, and considering that failure to mean that unsubscribe had succeeded); and I added a test where the Service Worker is replaced, since unregistering a Service Worker isn't actually enough to stop it controlling the current page.
- This merges the DidUnsubscribeInstanceID and DidUnsubscribe methods in PushMessagingServiceImpl to avoid duplication of logic.

BUG= 646426 , 642139 ,633310
NOTRY=true (remaining trybot failures are flake)
Review-Url: https://codereview.chromium.org/2387483002
Cr-Commit-Position: refs/heads/master@{#422330}

[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/browser/push_messaging/push_messaging_browsertest.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/browser/push_messaging/push_messaging_service_impl.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/browser/push_messaging/push_messaging_service_impl.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/test/data/push_messaging/push_test.js
[add] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/test/data/push_messaging/service_worker_with_skipWaiting_claim.js
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/components/gcm_driver/instance_id/fake_gcm_driver_for_instance_id.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/components/gcm_driver/instance_id/fake_gcm_driver_for_instance_id.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/browser/push_messaging/push_messaging_message_filter.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/browser/push_messaging/push_messaging_message_filter.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/public/browser/push_messaging_service.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/public/browser/push_messaging_service.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/public/common/push_messaging_status.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/shell/browser/layout_test/layout_test_push_messaging_service.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/shell/browser/layout_test/layout_test_push_messaging_service.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/tools/metrics/histograms/histograms.xml
,
Oct 19 2016
Any updates on when Chrome 54 for Android will be released to the public? Initial date was 10/18, but haven't seen an update...
,
Nov 10 2016
Thank you again for the work on Chrome 54. We have been testing with large populations of Chrome 54 Android users (8k+) since the new version became available. While we need to give it more time, after 2 weeks we are seeing a drop rate of about 0.4% per day, so a 5.6% total drop over those 2 weeks. In other words, on the first day after users opt in almost 94% of web push sends are delivered, but after 14 days that delivery rate is down to ~88%. Previously we were seeing a drop rate of almost 1% per day, so this seems to be an improvement, but we are wondering whether the new diagnostics you added to the service worker are telling you anything about why these drops are still occurring.
,
Nov 15 2016
Looks like we're continuing to see drop-off of opted-in users on Chrome 54. We are losing about 0.5% a day. This is much better than what we had previously experienced, but we're still losing users. Looking forward to hearing more from the Chrome team on what they've learned.
,
Nov 30 2016
Hi, I am following up on my post from Nov 10. Any feedback would be greatly appreciated.
We have continued testing with large populations on Chrome 54 after the bug fix, but are still seeing a significant and growing undeliverable rate on web push. After users opt in, delivery rates start around 92-93% but are at 76-81% after 1 month and seem to continue to fall. On average, we see about 0.36% of the users become undeliverable each day, and the number is cumulative, so 11%+ are gone after 1 month. As we dig into the data, we see that Chrome 54 on Android 6 has a lower drop rate of 0.22% per day, almost 7% per month. Happy to provide the detailed data to help debug this.
Are the new diagnostics you added to the service worker telling you anything about why these drops are still occurring? Before the Chrome 54 bug fix we were dropping ~1% per day (30% per month), so this is a significant improvement, but it still renders web push unusable for many applications.
Thanks
,
Dec 1 2016
In M53 we saw that amongst incoming messages, A) 0.3% could not find a corresponding Service Worker, and 0.3% hit a Service Worker error either B) before or C) during delivery. We used to auto-unsubscribe in cases A and B (not C). In M54 commit 983872d32d2b818957e3ba7714b7a822a27e0dd8 changed this so we only auto-unsubscribe in case A. However from the added metrics it turns out that almost all of the latter 0.3% was due to case C not case B, so this hasn't made much difference. We also learnt that 2/3 of case C is SERVICE_WORKER_ERROR_TIMEOUT which just means the website took too long to resolve/reject the waitUntil promise so we killed the SW, though 1/3 of case C is SERVICE_WORKER_ERROR_FAILED which might be legitimate delivery failures.

In M56 commit 92ef94f69eb941d144e31f6cac45c6e3ce8a8720 added the PushMessaging.UnregistrationReason metric, which gives the precise reason why a subscription was unsubscribed from Chrome. That'll roll out to stable soon and give less biased data, but so far the numbers from beta show:
- 92% are due to the website calling unsubscribe()
- 5.5% are due to an incoming message for which Service Worker is not found
- 1.6% are due to the user revoking permission
- 0.8% are due to an incoming message for which push subscription is not found
The distribution is subtly different on Android:
- 89% are due to the website calling unsubscribe()
- 6% are due to an incoming message for which Service Worker is not found
- 2% are due to the user revoking permission
- 3% are due to an incoming message for which push subscription is not found

The Service Worker not found case should primarily be due to A) the user using Clear Browsing Data or B) the website unregistering its Service Worker. Currently neither directly triggers unsubscription; instead we unsubscribe the next time we receive a message for that Service Worker (and it is not found). We're working on making SW unregistration (both A and B) directly trigger push unsubscription in issue 402458 (and will introduce separate UnregistrationReason codes for both), which will allow us to measure how often the Service Worker is not found for any other reasons (e.g. database corruption).

The push subscription not found case should primarily be due to unsubscribing (for any of the other reasons listed above) whilst offline, in which case the unsubscription won't reach GCM servers, and so we'll re-attempt unsubscription next time a message is received for that subscription and the subscription is not found (because it was deleted during the offline unsubscribe attempt).

It's important to note that the PushMessaging.UnregistrationReason metric inherently omits some situations:
1. If Chrome never finds out about the unsubscription, e.g. a desktop user clears their Chrome profile folder, an Android user clears Chrome app data via Android Settings, or a user Factory Resets their device or leaves it powered off in a drawer, nothing can be logged.
2. If the subscription gets into a bad state that doesn't trigger automatic unsubscription, due to a Chrome bug, this metric won't be logged. For example in issue 651863 we see that some macOS devices are failing to connect to GCM servers. Another recent example is issue 661660, where we realised that in the rare edge cases where the desktop GCM store is reset due to corruption, Chrome was keeping invalid push subscriptions - that's fixed in M56, and we'll now unsubscribe so that the website can detect that the subscription was lost and resubscribe (though actually, that still doesn't get logged in the UnregistrationReason metric - perhaps it should).

> Chrome 54 on Android 6 has a lower drop rate of .22% per day (...)
> Before the Chrome 54 bug fix we were dropping ~1% per day
Are you comparing recent data versus older data, or recent Chrome >=54 versus recent Chrome <54? The latter may be biased, since users with out-of-date Chrome are more likely to have full disks or poor connections.

> On average, we see about .36% of the users become undeliverable each day and the number is cumulative so 11%+ are gone after 1 month.
It's a little tricky to relate those numbers to our metrics, as there are so many possible causes of messages becoming undeliverable. Some questions:
- Do you ever call unsubscribe (e.g. if the user opts out on your website)? If so do you exclude that from the drop rate?
- Do you ever unregister your Service Worker? (Presumably not?)
- Do you try to detect when users have Cleared Browsing Data (cookies etc, and hence Service Workers)? You should be able to detect this on page load by checking if the origin already has notifications permission but there is no Service Worker registered (assuming you don't use non-SW notifications).
- Do you try to detect when users have revoked permission? This'll become easier if we start delivering a pushsubscriptionchange event for permission revocation (https://github.com/w3c/push-api/issues/228), but in the meantime it would be possible to check the permission on page load to detect revocation.
- Do you check for errors like NotRegistered (https://developers.google.com/cloud-messaging/http-server-ref#interpret-downstream) when sending messages? Do the dropped-off subscriptions return error:NotRegistered (meaning the unsubscribe cleanly propagated to GCM servers) or a message_id (meaning GCM servers think the subscription is still valid, even if it's apparently failing to deliver to it)?
- If GCM servers still think the subscription is valid, do you try to detect devices going offline (left in a drawer / factory reset / etc)? You can detect these cases by using https://developers.google.com/instance-id/reference/server#get_information_about_app_instances i.e. fetch https://iid.googleapis.com/iid/info/REGISTRATIONID?details=true with your Authorization header and the resulting connectDate tells you the date when the device last connected to GCM servers.
- Do you use the collapse_key feature of GCM? If so older messages might get replaced with a newer one before being delivered, but that shouldn't cause cumulative drops over time.
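The connectDate lookup mentioned above could be scripted roughly as follows. This is a sketch under assumptions rather than a reference client: the `key=<server key>` form of the Authorization header and the `YYYY-MM-DD` format of connectDate are taken to match the Instance ID server documentation, and both function names are invented here. The request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import Optional


def build_iid_info_request(registration_id: str, server_key: str) -> urllib.request.Request:
    """Build (but do not send) the Instance ID info request for one token."""
    url = ("https://iid.googleapis.com/iid/info/"
           + urllib.parse.quote(registration_id, safe="")
           + "?details=true")
    # Assumption: the server key is passed as "key=<server key>".
    return urllib.request.Request(url, headers={"Authorization": "key=" + server_key})


def last_connect_date(response_body: str) -> Optional[date]:
    """Return connectDate from the JSON response, or None if it is absent."""
    raw = json.loads(response_body).get("connectDate")
    # Assumption: connectDate is an ISO date string such as "2016-11-30".
    return date.fromisoformat(raw) if raw else None
```

Passing the built request to `urllib.request.urlopen` and feeding the decoded body to `last_connect_date` would give the last day the device connected to GCM servers, which can then be compared against the send dates of undelivered messages.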
,
Dec 7 2016
Hi. Thanks for the information, it's very helpful for us.

> Are you comparing recent data versus older data, or recent Chrome >=54 versus recent Chrome <54? The latter may be biased, since users with out-of-date Chrome are more likely to have full disks or poor connections.
We're comparing only recent data (users who opted in after November 1st).

> Do you ever call unsubscribe (e.g. if the user opts out on your website)? If so do you exclude that from the drop rate?
> Do you ever unregister your Service Worker? (Presumably not?)
No, we don't unsubscribe or unregister the service worker on purpose.

> Do you try to detect when users have revoked permission? This'll become easier if we start delivering a pushsubscriptionchange event for permission revocation (https://github.com/w3c/push-api/issues/228), but in the meantime it would be possible to check the permission on page load to detect revocation.
Yes, we do detect revoked permissions on page load.

> Do you check for errors like NotRegistered (https://developers.google.com/cloud-messaging/http-server-ref#interpret-downstream) when sending messages? Do the dropped-off subscriptions return error:NotRegistered (meaning the unsubscribe cleanly propagated to GCM servers) or a message_id (meaning GCM servers thinks the subscription is still valid, even if it's apparently failing to deliver to it).
Yes, we're aware of it. We don't send any new messages to these devices after getting an error from GCM.

> If GCM servers still think the subscription is valid, do you try to detect devices going offline (left in a drawer / factory reset / etc)? You can detect these cases by using https://developers.google.com/instance-id/reference/server#get_information_about_app_instances i.e. fetch https://iid.googleapis.com/iid/info/REGISTRATIONID?details=true with your Authorization header and the resulting connectDate tells you the date when the device last connected to GCM servers.
Thanks for pointing it out, it is very helpful for our testing. We managed to get new insights using this data.

> Do you use the collapse_key feature of GCM? If so older messages might get replaced with a newer one before being delivered, but that shouldn't cause cumulative drops over time.
We don't use collapse keys at the moment. But we do use a TTL set to 24 hours. It slightly decreases delivery rates, but again that shouldn't cause cumulative drops over time.

We also have a couple of related questions:
- > We also learnt that 2/3 of case C is SERVICE_WORKER_ERROR_TIMEOUT which just means the website took too long to resolve/reject the waitUntil promise so we killed the SW
  So if the service worker is timed out and gets killed, will it be restored automatically the next time a push is sent to the user?
- If a user clears Chrome data in Android settings, will that be the last connectDate of this device reported by iid.googleapis.com?

And here is more light on the way we're sending notifications and getting statistics:
- We're sending notifications only to Android users (no desktop users).
- We're sending around 3 notifications per day per user.
- No collapse key is used, TTL is set to 24 hours.

By analyzing the data we're seeing a drop-off in delivery rates over time, which seems to be caused by devices becoming permanently unavailable for showing push notifications without GCM reporting it. There is a spreadsheet attached to this message illustrating it.
- You can see that Chrome 54 has a smaller drop-off over time than Chrome <=53 (most likely due to the Chrome 54 fixes described above).
- You can also see that Android 6 on Chrome 54 has a smaller drop-off over time than Android 5 on Chrome 54 (this might be caused by Android 5 running on older devices).
- Opted-out users are not included in this report. We're tracking opt-outs in all possible ways (detecting revoked permissions on page load and handling the NotRegistered error from GCM).
- Offline devices are excluded as well (by using the last connect date from iid.googleapis.com).

So it seems like there is still some possible scenario where:
- the device is online
- the device is considered active by GCM
- but the service worker is not working and the device is not showing notifications

Do you have any thoughts on that? Thank you.
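The NotRegistered versus message_id distinction discussed in this comment can be sketched roughly as follows. This is only an illustration, not the reporter's actual code: the function name is hypothetical, and it assumes the GCM HTTP JSON response carries a `results` array in the same order as the `registration_ids` that were sent.

```python
from typing import Dict, List, Tuple


def classify_send_results(
    registration_ids: List[str], response: dict
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Split a multicast send response into accepted, unregistered and other-error IDs.

    Assumes response["results"][i] corresponds to registration_ids[i].
    """
    accepted: List[str] = []        # GCM returned a message_id: subscription looks valid
    not_registered: List[str] = []  # GCM returned error NotRegistered: stop sending
    other_errors: Dict[str, str] = {}
    for reg_id, result in zip(registration_ids, response.get("results", [])):
        if "message_id" in result:
            accepted.append(reg_id)
        elif result.get("error") == "NotRegistered":
            not_registered.append(reg_id)
        else:
            other_errors[reg_id] = result.get("error", "unknown")
    return accepted, not_registered, other_errors
```

IDs in the `not_registered` list are the ones to treat as opted out; IDs in `accepted` that never show a notification are the cases this thread is trying to explain.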
,
Dec 13 2016
> So if service worker is timed out and getting killed. Will it be restored automatically next time push is sent to the user? Yes, killed here just means we stopped executing it, but the SW remains registered and the push subscription remains subscribed. > If user clears chrome data in android settings will it be the last connectDate of this device reported by iid.googleapis.com? No, if an Android user clears app data for Chrome only, then the connectDate will continue to be updated when the device goes online (this is different from Factory Resetting the device, leaving it in a drawer, or a desktop user clearing their Chrome profile, all of which would cause the connectDate to stop being updated). We don't expect Android users to clear Chrome app data very often though (unless your target audience is power users). In all of these cases, since cookies etc are cleared, it'll appear as if the user never visits the website again from that device. > So it seems like there is still some possible scenario (...) Thanks, that's really useful data (and thanks for being precise about the things you exclude). We'll look into this to try to understand how that could happen.
,
Dec 21 2016
I thought it might be helpful to also provide data sliced a slightly different way. The attached shows only Android 6 Chrome 54 devices which are those that seem to have the least attrition over time. This data shows devices which opted-in on different days and how those devices progress over time. The data is split where (on the left in green) “offline” devices per the Google data were included and (on the right in blue) those devices are excluded. Can you give us your take on what this data or the previous data posted by Xtremepush (our push vendor) tells you? Thanks!
,
Jan 10 2017
Please let us know if there has been any progress on the Google/Chromium side in resolving this issue. Thanks
,
Jan 11 2017
We keep monitoring the stats, and some interesting data is attached in two files. It shows the number of messages sent and the number of messages delivered (shown) day by day. We're using the GCM service to get the GCM login date for each device, and then using this date to exclude messages sent to a device after that date (in order to exclude devices that went offline). We can see that delivery rates stay quite high even after two months. But the number of unique devices is going down (which is caused by the 'last gcm login date' not being updated), and the drop-off is quite big. It concerns us. We know a few reasons for the 'gcm login date' no longer being updated:
1. Device going offline (no internet, no battery, left in drawer)
2. Factory reset
3. Clearing Chrome data in Android settings? (Is that correct?)
Can you tell us whether there are any other reasons? And do you have any idea how many devices are being factory reset and how many users are clearing Chrome data in Android settings? Also, do you know how long a GCM session might last? For example, if the last login date was 2 days ago, can the device still be online?
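One way to compute the statistics described in this comment is sketched below. It is a guess at the method, not the reporter's actual pipeline: the record layout `(device_id, sent_on, shown)` and the function name are hypothetical, and a message is dropped from the stats when it was sent after the device's last known GCM login date.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def daily_delivery_stats(
    messages: Iterable[Tuple[str, date, bool]],
    last_connect: Dict[str, date],
) -> Dict[date, dict]:
    """Per-day sent/shown counts, ignoring messages sent after a device went offline."""
    raw = defaultdict(lambda: {"sent": 0, "shown": 0, "devices": set()})
    for device_id, sent_on, shown in messages:
        last_seen = last_connect.get(device_id)
        # Skip devices with no known login date and messages sent after the last login.
        if last_seen is None or sent_on > last_seen:
            continue
        day = raw[sent_on]
        day["sent"] += 1
        day["shown"] += 1 if shown else 0
        day["devices"].add(device_id)
    return {
        d: {
            "sent": s["sent"],
            "shown": s["shown"],
            "unique_devices": len(s["devices"]),
            "rate": s["shown"] / s["sent"],
        }
        for d, s in raw.items()
    }
```

With this filter the per-day rate can stay high while `unique_devices` shrinks, which matches the pattern reported here: devices leave the denominator once their login date stops advancing.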
,
Jan 13 2017
Thanks for the additional data, we're still looking into this. I'm putting together a document with all the possible ways push/GCM can fail, and will try to share that shortly.
,
Jan 25 2017
Hi -- any update on this issue and the doc you were assembling showing how web push/GCM can fail?
,
May 18 2017
Hey John, Can you share the PushMessaging.UnregistrationReason metric collected so far? Is the document on the ways push/GCM can fail available as well?
,
Mar 21 2018
Archiving this now, as John left the team some time ago and we haven't had any recent reports. If anybody sees a recurrence of this issue, please file a fresh bug referencing this one. Thanks.
,
Apr 9 2018
We're still seeing a rate of NotRegistered on push that we don't think is normal. Opened a new issue for followup: https://bugs.chromium.org/p/chromium/issues/detail?id=830528 Thanks!
Comment 1 by rohitrao@chromium.org
, Aug 31 2016