Issue 642139

Starred by 14 users

Issue metadata

Status: Archived
Owner: ----
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug




Dev token getting changed automatically for service workers

Reported by gaurava...@gmail.com, Aug 29 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36

Steps to reproduce the problem:

1. Subscribe to https://compare.buyhatke.com/
2. After a few days, the users' dev token changes on its own. This happens even when the user has not cleared data or done anything similar.
3. Many of us on the team have subscribed, and we do not receive pushes. We have to visit the website again explicitly, and only then do we receive pushes.

What is the expected behavior?
The dev token for a particular user should remain constant unless the user explicitly clears the cache or removes the data associated with it.

What went wrong?
We need to understand why the dev token is being changed dynamically for the user.

Did this work before? No 

Chrome version: 52.0.2743.116  Channel: stable
OS Version: OS X 10.11.4
Flash Version:
 
Labels: Needs-Feedback
Can you clarify what you mean by "dev token"?  Also, does this issue occur on Chrome Windows as well, or is it Mac-specific?  Thanks!
Hi 

This happens on Chrome for Windows, Mac, and Android.
By dev token I mean the GCM token that Google generates. 
Hi Rohit

Any update on this?
Cc: dim...@chromium.org
Labels: -OS-Mac OS-All
Owner: zea@chromium.org
This is not Mac-specific so I'm marking this OS=All and assigning to zea@ (from /google_apis/gcm/OWNERS) for triage.

Comment 5 by zea@chromium.org, Sep 13 2016

Components: Blink>PushAPI Services>CloudMessaging
Owner: peter@chromium.org
Peter, do you know what might be happening here?

Comment 6 by peter@chromium.org, Sep 13 2016

Cc: joh...@chromium.org
Are you re-registering your Service Worker by any chance?

That would trigger this behaviour because subscriptions are tied to an installed Service Worker. Calling ServiceWorkerRegistration.unregister() or changing the scope of the Service Worker will do this.
@peter: No, we are not re-registering the user.

We have also added a pushsubscriptionchange listener, but it does not work in Chrome: it never fires when a user's GCM token changes. Surprisingly, it works in Firefox.
+1
We have noticed the same issue.
A huge number of NotRegistered errors, especially since 2016-09-12.
Maybe GCM changes the subscriptionId, or the Service Worker changes it on the fly. I see many requests from workers with subscription IDs that are unknown to me.

Unfortunately, I can't reproduce it myself.
We've had several clients report this issue to us at OneSignal.

The symptoms are the same as described by others in thread: Users stop receiving notifications until they visit the website again and receive a new token.
I am a consultant and am seeing this issue at one of the largest mobile web publishers. We have tested web push with large consumer populations on web push platforms from multiple vendors, all focused on GCM + Chrome Service Workers. We consistently see this problem and are now working to diagnose the root cause. Here is some data:

* ~.8% of users opted in on a given day become “unavailable/undeliverable” each day starting within 1-2 days after opt-in – so after one month >25% are unavailable, after 2 mos >50%,  after three months >75%.  The messages are sent but the service workers do not get them.

* This steep rate of decline is almost perfectly linear.

* This happens regardless of whether the users are receiving many messages per day or no messages are sent for several months and then a single message is transmitted.  
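The linear pattern can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a fixed loss of 0.8% of the original opt-in cohort per day (a straight line, as reported) and contrasts it with a hypothetical compounding model that loses 0.8% of the remaining users per day. The function names are invented for this example.

```python
def linear_loss(daily_rate: float, days: int) -> float:
    """Fraction of the original cohort lost if a fixed share of the
    ORIGINAL cohort becomes unreachable every day (straight-line decline)."""
    return min(1.0, daily_rate * days)


def compounding_loss(daily_rate: float, days: int) -> float:
    """Fraction lost if a fixed share of the REMAINING users becomes
    unreachable every day (geometric decay, for comparison)."""
    return 1.0 - (1.0 - daily_rate) ** days


if __name__ == "__main__":
    for days in (30, 60, 90):
        print(f"day {days}: linear {linear_loss(0.008, days):.1%}, "
              f"compounding {compounding_loss(0.008, days):.1%}")
```

At 0.8%/day the linear model gives 24%, 48% and 72% lost after 30, 60 and 90 days, close to the >25%/>50%/>75% reported, whereas the compounding model flattens out at about 21%, 38% and 51%. Under these simplified assumptions, the reported straight-line figures do not fit a constant per-remaining-user loss rate.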

Since we are seeing this on all web push platforms tested, and know GCM is rock solid, we are assuming the issue is something fundamental with the service worker.  Not sure if a token is expiring or what is happening.  We are testing to see if the service worker reinitiates if a user returns to our site on their own and we are adding many other diagnostics to the service worker.

Is this on the Chromium radar already to fix?  Seems like it would be a critical deal-breaker for web push and possibly other uses of service workers.  

Comment 11 by roy...@google.com, Sep 16 2016

Labels: Hotlist-Enterprise
[I'm not familiar with this part of the Chrome stack. I got internal pings on this bug and want to see how I can help move it along.]

1) Is it possible that the drop in available users is because profile data was removed, Chrome was reinstalled, or something else happened that may have reset Chrome data?

2) Can I assume that, to get notifications again, the user has to allow notifications from the website again, as if allowing them for the first time? It's not clear what "receiving a new token" means.

3) chrome://gcm-internals/: assuming all GCM subscriptions are listed on this page, it would be interesting to see whether customers who are being prompted to re-register have any registrations for the website listed under "Registered App Ids".


Comment 12 by roy...@google.com, Sep 16 2016

Cc: royans@chromium.org
1. No, we never removed profile data or uninstalled chrome

2. No, just visiting the website again was enough, since the user had already subscribed. Visiting the page created a new GCM token for them, and pushes started arriving again.

3. Customers were not prompted to register and I can see multiple entries for the domain I am talking about. 
Screenshot (106).png

Comment 14 by geo...@deglin.com, Sep 16 2016

@royans

1) There has been a major increase in unreachable users starting around the time this bug was reported, beyond what we previously saw due to profile data being cleared.

2) No. Simply returning to the website grants the user a new push token that works. The user does not need to allow notifications again.

3) N/A. There is no prompt to re-register.

Comment 15 by roy...@google.com, Sep 16 2016

Labels: -Pri-2 Pri-1
Thanks. Increasing priority for visibility. 

Peter: what else do we need here to help move this along ?

Comment 16 by peter@chromium.org, Sep 16 2016

Cc: -joh...@chromium.org peter@chromium.org
Owner: joh...@chromium.org
Status: Assigned (was: Unconfirmed)
Nothing; all of that is great data, thanks for sharing!

John, would you please take a look? Please note that our team is at an off-site until next Thursday.
Status: Started (was: Assigned)
Since subscribing returns a different GCM token, it seems that Chrome must have unsubscribed the subscription.

The main reasons why Chrome might automatically unsubscribe a subscription are:

1) the user revokes Notifications permission (though the commenters say the user gets a new valid token without a new permission prompt, and it doesn't seem likely that many users toggle the permission off then on again).

2) an incoming push message arrives for a subscription that Chrome doesn't know about (not relevant here).

3) an incoming push message arrives whilst permission is revoked (not relevant here)

4) an incoming push message arrives and Chrome can't load its corresponding ServiceWorkerRegistration.

The last one might be the cause: UMA shows that across all incoming push messages, about 0.3% of the time we fail to load their corresponding ServiceWorkerRegistration[1]. This could happen because the Service Worker has been legitimately unregistered (but commenters say that's not happening), but it may also happen due to disk I/O errors, etc.

I'm going to write a patch to no longer automatically unregister after disk I/O errors (and add logging for the exact ServiceWorkerStatus), to see if that helps. I won't be able to land this until Thursday unfortunately.

[1]: https://cs.chromium.org/chromium/src/content/browser/push_messaging/push_messaging_router.cc?rcl=1474024782&l=81
[2]: https://cs.chromium.org/chromium/src/chrome/browser/push_messaging/push_messaging_service_impl.cc?rcl=1474024782&l=325
Thanks so much for the work to resolve this.  We are eager to see how the patch does.  Two follow-up questions:

1.) Will the patch auto-update on devices back to Chrome 42?  
2.) As designed, what percent of monthly attrition should we expect from a user population with properly functioning service workers due to user cache clearing and other issues?

Thanks again!

We at XtremePush are also having issues. A lot of subscribed users become unavailable for push messages.
GCM responds with success when we send a message to a user, but the Service Worker doesn't receive a push event. That is expected while a device is offline, but, as John described above, our delivery rate drops by ~25% every month, which can't be explained by offline users alone.

Analysing individual devices, we usually see the following scenario:
 1. User opt-ins
 2. For some period of time nearly all the messages sent to this user are delivered (some of them might not be delivered because the device is offline)
 3. After some point of time absolutely all the messages sent to this user are not delivered
We see this scenario over and over again for loads of users, but we can't see any trend in how long it takes for a user to become unavailable. It can be a few days or a few months.

There is only one scenario we know of that reproduces this issue: clearing Chrome data in the app manager on Android. GCM then can't pick up the fact that the user is gone, and it still responds with success when we send a message. Do you think this might be fixed in the future?
In all other scenarios, when we clear browsing history or unsubscribe from notifications, GCM works as expected and responds with an error telling us that the user is unsubscribed.

I also have a question about the pushsubscriptionchange event. It appears in various Push API specifications, but it doesn't seem to be implemented in Chrome. Can a user's subscription change on GCM? If so, does the user need to come back to the website to get a new token?

I have also found an issue with my own Chrome on Mac OS. It worked fine for a long time, but a few days ago it stopped receiving notifications (not only from our websites but from everyone). Even new subscriptions don't work.
For example:
 1. I go to the web push demo page: https://gauntface.github.io/simple-push-demo/
 2. It asks me for push notifications, I press allow
 3. The token is generated successfully
 4. When I push to this token, GCM responds with success, but nothing happens. No logs come from the Service Worker.
 5. If I go to the Service Workers section in the browser, find my Service Worker and press the "Push" button, it works and displays a notification.
 6. Restarting chrome or rebooting OS doesn't help

The issue seems to be with GCM, because on the chrome://gcm-internals/ page I see:
Connection State: LOGGING IN. And it never changes to CONNECTED.


So to generalise this:
 1. Sometimes users become unavailable for notifications, and the numbers are large, which is a problem for websites that want to use web push notifications.
 2. Could the patch you made a few days ago improve the situation?
 3. Can you shed more light on subscription changes and the pushsubscriptionchange event?
 4. Do you have any ideas about the GCM issue described above?

Thanks; we'd appreciate any help with this.

Comment 20 by bugdroid1@chromium.org, Sep 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/983872d32d2b818957e3ba7714b7a822a27e0dd8

commit 983872d32d2b818957e3ba7714b7a822a27e0dd8
Author: johnme <johnme@chromium.org>
Date: Thu Sep 22 20:39:20 2016

Push API: Don't unsubscribe when finding Service Worker fails

Failing to find a service worker might be temporary, so we shouldn't
unsubscribe in this case. Continue to unsubscribe if the Service Worker
is actually not found (e.g. because it has been unregistered).

Also adds more UMA in this area.

BUG= 642139 

Review-Url: https://codereview.chromium.org/2361113002
Cr-Commit-Position: refs/heads/master@{#420447}

[modify] https://crrev.com/983872d32d2b818957e3ba7714b7a822a27e0dd8/chrome/browser/push_messaging/push_messaging_browsertest.cc
[modify] https://crrev.com/983872d32d2b818957e3ba7714b7a822a27e0dd8/content/browser/push_messaging/push_messaging_router.cc
[modify] https://crrev.com/983872d32d2b818957e3ba7714b7a822a27e0dd8/tools/metrics/histograms/histograms.xml

Labels: Merge-Request-54
Requesting merge to m54 - this is a safe change, and should either reduce the rate at which push subscriptions are lost, or provide useful UMA logging to help us pursue other approaches sooner.

There is a small risk that these Service Worker errors are not ephemeral and hence the push subscriptions remain broken now that we no longer auto-unsubscribe them, but any users in that state are already getting a broken experience, so it's still worth trying this.

> 1.) Will the patch auto-update on devices back to Chrome 42?

No, even if my merge request is granted it'll only be in Chrome 54 (which is currently in beta, and should reach stable channel around Oct 18th). Most Chrome installations auto-update pretty quickly, though there are always a few stragglers.

> 2.) As designed, what percent of monthly attrition should we expect from a user population with properly functioning service workers due to user cache clearing and other issues?

We're still gathering data on this. Any stats you can share would be appreciated.

If you're measuring delivery rate to the Service Worker using analytics, it's worth bearing in mind some reasons why messages might not be delivered even though a push subscription is healthy:

a) messages sent with collapse_key can be replaced by a newer message;
b) messages whose time_to_live is exceeded (the default and maximum are both 4 weeks) whilst the device is offline will be dropped;
c) GCM queues up to 100 non-collapse-key messages per push subscription whilst a device is offline - any more and they'll all be deleted;
d) GCM allows up to 4 different collapse keys to be used at once per push subscription - any more and an arbitrary collapse key message will be deleted;
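To reason about which of these rules could explain missing messages, they can be expressed as a toy queue. This is a sketch only, not GCM's implementation: the class and field names are hypothetical, the limits are the ones quoted above, and where GCM deletes an "arbitrary" collapse-key message the sketch simply drops the first other key in insertion order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Limits quoted in the comment above; the real service may behave differently.
MAX_TTL_SECONDS = 4 * 7 * 24 * 3600  # default and maximum time_to_live (4 weeks)
MAX_NON_COLLAPSIBLE = 100            # queued non-collapse-key messages per subscription
MAX_COLLAPSE_KEYS = 4                # distinct collapse keys per subscription


@dataclass
class Message:
    body: str
    sent_at: int                        # send time, in seconds
    ttl: int = MAX_TTL_SECONDS          # requested time_to_live, in seconds
    collapse_key: Optional[str] = None


@dataclass
class OfflineQueue:
    """Toy model of rules (a)-(d) for one subscription whose device is offline."""
    plain: List[Message] = field(default_factory=list)
    collapsible: Dict[str, Message] = field(default_factory=dict)

    def enqueue(self, msg: Message) -> None:
        if msg.collapse_key is None:
            self.plain.append(msg)
            if len(self.plain) > MAX_NON_COLLAPSIBLE:
                self.plain.clear()  # (c) overflow: the whole backlog is deleted
            return
        # (a) a newer message replaces any queued message with the same key
        self.collapsible[msg.collapse_key] = msg
        if len(self.collapsible) > MAX_COLLAPSE_KEYS:
            # (d) GCM deletes an arbitrary one; this sketch drops the first
            # other key in insertion order
            victim = next(k for k in self.collapsible if k != msg.collapse_key)
            del self.collapsible[victim]

    def deliver(self, now: int) -> List[Message]:
        """The device reconnects at `now`: return surviving messages and empty the queue."""
        queued = self.plain + list(self.collapsible.values())
        self.plain, self.collapsible = [], {}
        # (b) messages whose TTL (capped at 4 weeks) elapsed while offline are dropped
        alive = [m for m in queued if now - m.sent_at <= min(m.ttl, MAX_TTL_SECONDS)]
        return sorted(alive, key=lambda m: m.sent_at)
```

None of these rules removes the subscription itself, so on their own they would reduce delivery of individual messages rather than make a user permanently unreachable.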

> 3. After some point of time absolutely all the messages sent to this user are not delivered

But does GCM still return a message_id rather than an error, in its JSON response?
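One way to answer that question server-side is to classify each entry of the GCM HTTP response. The sketch below assumes the legacy GCM JSON format, where "results" has one entry per registration ID in request order, each holding either a "message_id" (optionally with a canonical "registration_id") or an "error" such as NotRegistered. The function name and outcome labels are made up for illustration. Note that "accepted" only means GCM's servers took the message, not that a Service Worker received it.

```python
import json
from typing import Dict, List


def classify_results(registration_ids: List[str], response_body: str) -> Dict[str, str]:
    """Map each registration ID to 'accepted', 'not_registered',
    'canonical:<new id>' or 'error:<code>' based on the GCM JSON response."""
    results = json.loads(response_body)["results"]
    outcome: Dict[str, str] = {}
    # Results are assumed to be in the same order as the IDs in the request.
    for reg_id, result in zip(registration_ids, results):
        if "message_id" in result:
            new_id = result.get("registration_id")
            outcome[reg_id] = f"canonical:{new_id}" if new_id else "accepted"
        elif result.get("error") == "NotRegistered":
            outcome[reg_id] = "not_registered"  # unsubscribe reached GCM cleanly
        else:
            outcome[reg_id] = f"error:{result.get('error', 'unknown')}"
    return outcome
```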

> clear chrome data in app manager on android

If this is the cause, it's worth noting that users who frequently clear data for privacy reasons will subscribe and unsubscribe much more often than normal users, so may make up a disproportionately large fraction of all your registrations.

It's odd though, as Chrome should auto-unsubscribe the next time a message is received after clearing data, such that GCM's JSON response to the subsequent message is error:NotRegistered. A similar case that cannot be helped though is if you factory reset an Android device - in that case it's as if the device goes permanently offline, and there's no way for GCM to know that it no longer exists. However I doubt many users frequently factory reset their devices.

> pushsubscriptionchange event

We don't yet implement this, as GCM subscriptions don't yet expire (except perhaps if unused for a very long time?), but it's something we're discussing in the spec: https://github.com/w3c/push-api/issues/132.

> my own Chrome on Mac Os (...) stopped receiving notifications (...) Connection State: LOGGING IN

I haven't seen it get stuck on LOGGING IN before. Please file a new bug for that issue - thanks.

Comment 22 by dimu@chromium.org, Sep 23 2016

Labels: -Merge-Request-54 Merge-Approved-54 Hotlist-Merge-Approved
Your change meets the bar and is auto-approved for M54 (branch: 2840)
Labels: -Needs-Feedback
Could you please confirm whether this change is baked/verified in Canary and safe to merge? If yes, merge your change to M54 (branch: 2840) ASAP so that we can take it for the next Beta release.
Histograms look good so far on Canary. Merging.

Comment 25 by bugdroid1@chromium.org, Sep 26 2016

Labels: -merge-approved-54 merge-merged-2840
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d9b986d87105a963f619d0dd6f6931270721b13b

commit d9b986d87105a963f619d0dd6f6931270721b13b
Author: John Mellor <johnme@chromium.org>
Date: Mon Sep 26 15:16:50 2016

Push API: Don't unsubscribe when finding Service Worker fails

Failing to find a service worker might be temporary, so we shouldn't
unsubscribe in this case. Continue to unsubscribe if the Service Worker
is actually not found (e.g. because it has been unregistered).

Also adds more UMA in this area.

BUG= 642139 

Review-Url: https://codereview.chromium.org/2361113002
Cr-Commit-Position: refs/heads/master@{#420447}
(cherry picked from commit 983872d32d2b818957e3ba7714b7a822a27e0dd8)

Review URL: https://codereview.chromium.org/2371773002 .

Cr-Commit-Position: refs/branch-heads/2840@{#527}
Cr-Branched-From: 1ae106dbab4bddd85132d5b75c670794311f4c57-refs/heads/master@{#414607}

[modify] https://crrev.com/d9b986d87105a963f619d0dd6f6931270721b13b/chrome/browser/push_messaging/push_messaging_browsertest.cc
[modify] https://crrev.com/d9b986d87105a963f619d0dd6f6931270721b13b/content/browser/push_messaging/push_messaging_router.cc
[modify] https://crrev.com/d9b986d87105a963f619d0dd6f6931270721b13b/tools/metrics/histograms/histograms.xml

Thanks again for getting this patch out so quickly.  

Do you have any recommendations on how we can test it without having to wait for weeks with many users to see if they still drop off?  Will your diagnostics be able to tell early on if this is working well?


> my own Chrome on Mac Os (...) stopped receiving notifications (...) Connection State: LOGGING IN

> I haven't seen it get stuck on LOGGING IN before. Please file a new bug for that issue - thanks.

The new bug is created here:
https://bugs.chromium.org/p/chromium/issues/detail?id=651863

This might be another cause of users becoming unavailable for push notifications.

Comment 28 by bugdroid1@chromium.org, Oct 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/92ef94f69eb941d144e31f6cac45c6e3ce8a8720

commit 92ef94f69eb941d144e31f6cac45c6e3ce8a8720
Author: johnme <johnme@chromium.org>
Date: Sat Oct 01 21:24:48 2016

Push API: Refactor and fix unsubscribe API

Before this patch, 4 high-level codepaths for unsubscribe had evolved:
a) JS-initiated unsubscribe where PushMessagingMessageFilter's
   UnsubscribeHavingGottenIds would skip talking to the
   PushMessagingServiceImpl because it could not find the subscription
   in the Service Worker database.
b) JS-initiated unsubscribe where PushMessagingMessageFilter did talk to
   the PushMessagingServiceImpl, then directly removed the subscription
   from the Service Worker database in PushMessagingMessageFilter's
   Core::DidUnregisterFromService.
c) Automatic unsubscribe after revoked permission where
   PushMessagingServiceImpl::UnsubscribeBecausePermissionRevoked would
   call PushMessagingService::ClearPushSubscriptionID to ask the content
   layer to remove the subscription from the Service Worker database.
d) Automatic unsubscribe after bad incoming messages where
   PushMessagingServiceImpl would never remove the subscription from the
   Service Worker database.

This patch unifies them, such that all unsubscription requests go via
PushMessagingServiceImpl::UnsubscribeInternal, and this method is now
responsible for calling PushMessagingService::ClearPushSubscriptionID
in all cases to remove the subscription from the Service Worker
database.

- Eliminating (d) fixes the PUSH_DELIVERY_STATUS_PERMISSION_DENIED case,
  where previously we would automatically unsubscribe but never remove
  the corresponding subscription from the Service Worker database (this
  situation occurs for users that had been hit by crbug.com/633310).

- Eliminating (a) makes us more robust against any cases where a
  subscription had been removed from the Service Worker database but not
  from the PushMessagingAppIdentifier map (e.g. due to race conditions
  where processes are killed partway through writing state to disk).

- This adds UMA logging for the reason that caused unsubscription. This
  will be useful in tracking down  https://crbug.com/642139 

- This adds tests for each of the reasons that can trigger automatic
  unsubscription. Previously many of these had no coverage.

- This fixes PushMessagingBrowserTest.UnsubscribeSuccess and
  LegacyUnsubscribeSuccess which were failing to test what they intended
  to test (instead of calling unsubscribe on old references to
  unsubscribed PushSubscriptions and PushSubscriptions from unregistered
  Service Workers, they were trying and failing to get a fresh
  reference, and considering that failure to mean that unsubscribe had
  succeeded); and I added a test where the Service Worker is replaced,
  since unregistering a Service Worker isn't actually enough to stop it
  controlling the current page.

- This merges the DidUnsubscribeInstanceID and DidUnsubscribe methods in
  PushMessagingServiceImpl to avoid duplication of logic.

BUG= 646426 , 642139 ,633310
NOTRY=true
(remaining trybot failures are flake)

Review-Url: https://codereview.chromium.org/2387483002
Cr-Commit-Position: refs/heads/master@{#422330}

[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/browser/push_messaging/push_messaging_browsertest.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/browser/push_messaging/push_messaging_service_impl.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/browser/push_messaging/push_messaging_service_impl.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/test/data/push_messaging/push_test.js
[add] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/chrome/test/data/push_messaging/service_worker_with_skipWaiting_claim.js
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/components/gcm_driver/instance_id/fake_gcm_driver_for_instance_id.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/components/gcm_driver/instance_id/fake_gcm_driver_for_instance_id.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/browser/push_messaging/push_messaging_message_filter.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/browser/push_messaging/push_messaging_message_filter.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/public/browser/push_messaging_service.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/public/browser/push_messaging_service.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/public/common/push_messaging_status.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/shell/browser/layout_test/layout_test_push_messaging_service.cc
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/content/shell/browser/layout_test/layout_test_push_messaging_service.h
[modify] https://crrev.com/92ef94f69eb941d144e31f6cac45c6e3ce8a8720/tools/metrics/histograms/histograms.xml

Comment 29 by pere...@gmail.com, Oct 19 2016

Any updates on when Chrome 54 for Android will be released to the public? The initial date was 10/18, but I haven't seen an update...

Comment 30 by bugdroid1@chromium.org, Oct 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d9b986d87105a963f619d0dd6f6931270721b13b


Thank you again for the work on Chrome 54.  

We have been testing with large populations of Chrome 54 Android users (8k+) since the new version became available. While we need to give it more time, we are seeing a drop rate of about 0.4% per day, i.e. a 5.6% total drop after 2 weeks. In other words, on the first day after users opt in almost 94% of web push sends are delivered, but after 14 days that delivery rate is down to ~88%.

Previously we were seeing a drop rate of almost 1% per day, so this seems to be an improvement, but we are wondering whether the new diagnostics you added to the service worker are telling you anything about why these drops are still occurring?

Comment 32 by pere...@gmail.com, Nov 15 2016

It looks like we're continuing to see a drop-off of opted-in users on Chrome 54, losing about 0.5% a day. This is much better than what we had previously experienced, but we're still losing users.

Looking forward to hearing more from the Chrome team on what they've learned.
Hi – I am following up on my post from Nov 10.  Any feedback would be greatly appreciated.

We have continued testing with large populations on Chrome 54 after the bug fix but are still seeing a significant and growing undeliverable rate on web push.  After users opt-in, delivery rates start around 92-93% but are at 76-81% after 1 month and seem to continue to fall. On average, we see about .36% of the users become undeliverable each day and the number is cumulative so 11%+ are gone after 1 month.

As we dig into the data, we see that Chrome 54 on Android 6 has a lower drop rate of .22% per day – almost 7% per month.  

Happy to provide the detailed data to help debug this.  Are the new diagnostics you added to the service worker telling anything regarding why these drops are still occurring?  

Before the Chrome 54 bug fix we were dropping ~1% per day (30% per month) so this is a significant improvement but still renders web push unusable for many applications.

Thanks  





In M53 we saw that amongst incoming messages, A) 0.3% could not find a corresponding Service Worker, and 0.3% hit a Service Worker error either B) before or C) during delivery. We used to auto-unsubscribe in cases A and B (not C).

In M54 commit 983872d32d2b818957e3ba7714b7a822a27e0dd8 changed this so we only auto-unsubscribe in case A. However from the added metrics it turns out that almost all of the latter 0.3% was due to case C not case B, so this hasn't made much difference. We also learnt that 2/3 of case C is SERVICE_WORKER_ERROR_TIMEOUT which just means the website took too long to resolve/reject the waitUntil promise so we killed the SW, though 1/3 of case C is SERVICE_WORKER_ERROR_FAILED which might be legitimate delivery failures.
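The M53 and M54 policies described above can be summarised as a small decision function. This is a simplified model of the behaviour described in this comment, not Chromium's actual code; the enum and function names are hypothetical.

```python
from enum import Enum


class DeliveryOutcome(Enum):
    SW_NOT_FOUND = "A"     # no Service Worker found for the subscription
    SW_ERROR_BEFORE = "B"  # Service Worker error before delivery
    SW_ERROR_DURING = "C"  # error during delivery, e.g. SERVICE_WORKER_ERROR_TIMEOUT
    DELIVERED = "ok"       # event dispatched and handled


def auto_unsubscribes(outcome: DeliveryOutcome, milestone: int) -> bool:
    """Return True if an incoming message with this outcome triggers automatic
    unsubscription, per the M53/M54 description above (simplified model)."""
    if outcome is DeliveryOutcome.SW_NOT_FOUND:
        return True             # unsubscribed in both M53 and M54
    if outcome is DeliveryOutcome.SW_ERROR_BEFORE:
        return milestone < 54   # M54 stopped unsubscribing on these errors
    return False                # case C and successful delivery never unsubscribe
```

Since almost all of the 0.3% error bucket turned out to be case C, which neither milestone auto-unsubscribes on, dropping case B only affects a small slice of messages, which is consistent with the change making little difference.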


In M56 commit 92ef94f69eb941d144e31f6cac45c6e3ce8a8720 added the PushMessaging.UnregistrationReason metric, which gives the precise reason why a subscription was unsubscribed from Chrome. That'll roll out to stable soon and give less biased data, but so far the numbers from beta show:

- 92% are due to the website calling unsubscribe()
- 5.5% are due to an incoming message for which Service Worker is not found
- 1.6% are due to the user revoking permission
- 0.8% are due to an incoming message for which push subscription is not found

The distribution is subtly different on Android:

- 89% are due to the website calling unsubscribe()
-  6% are due to an incoming message for which Service Worker is not found
-  2% are due to the user revoking permission
-  3% are due to an incoming message for which push subscription is not found

The Service Worker not found case should primarily be due to A) the user using Clear Browsing Data or B) the website unregistering its Service Worker. Currently neither directly triggers unsubscription; instead, we unsubscribe the next time we receive a message for that Service Worker (and it is not found). We're working on making SW unregistration (both A and B) directly trigger push unsubscription in issue 402458 (and will introduce separate UnregistrationReason codes for both), which will allow us to measure how often the Service Worker is not found for any other reason (e.g. database corruption).

The push subscription not found case should primarily be due to unsubscribing (for any of the other reasons listed above) whilst offline, in which case the unsubscription won't reach GCM servers, and so we'll re-attempt unsubscription next time a message is received for that subscription and the subscription is not found (because it was deleted during the offline unsubscribe attempt).
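The retry behaviour in the previous paragraph can be sketched as a toy model. The class, its fields and the set standing in for GCM's server-side state are all hypothetical; the point is only that an unsubscription made offline deletes local state immediately, and the server-side registration is removed when the next message for it arrives and finds no local subscription.

```python
class PushSubscriptions:
    """Toy model: `local` is the browser's subscription store and `server`
    stands in for the registrations GCM still considers valid."""

    def __init__(self) -> None:
        self.local: set = set()
        self.server: set = set()
        self.online = True

    def subscribe(self, app_id: str) -> None:
        self.local.add(app_id)
        self.server.add(app_id)

    def unsubscribe(self, app_id: str) -> None:
        self.local.discard(app_id)       # local state is deleted right away
        if self.online:
            self.server.discard(app_id)  # offline: GCM never hears about it

    def on_incoming_message(self, app_id: str) -> bool:
        """Return True if the message is handed to a subscription."""
        if app_id in self.local:
            return True
        # Subscription not found: re-attempt the unsubscription. A message can
        # only arrive while connected, so this time it reaches the server.
        self.server.discard(app_id)
        return False
```

After the retry, a subsequent send for that registration would be expected to come back from GCM as error:NotRegistered.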


It's important to note that the PushMessaging.UnregistrationReason metric inherently omits some situations:

1. If Chrome never finds out about the unsubscription, e.g. a desktop user clears their Chrome profile folder, an Android user clears Chrome app data via Android Settings, or a user Factory Resets their device or leaves it powered off in a drawer, nothing can be logged.

2. If the subscription gets into a bad state that doesn't trigger automatic unsubscription due to a Chrome bug, this metric won't be logged. For example, in issue 651863 we see that some macOS devices are failing to connect to GCM servers. Another recent example is issue 661660, where we realised that in the rare edge cases where the desktop GCM store is reset due to corruption, Chrome was keeping invalid push subscriptions. That's fixed in M56, and we'll now unsubscribe so that the website can detect that the subscription was lost and resubscribe (though actually, that still doesn't get logged in the UnregistrationReason metric; perhaps it should).


> Chrome 54 on Android 6 has a lower drop rate of .22% per day (...)
> Before the Chrome 54 bug fix we were dropping ~1% per day

Are you comparing recent data versus older data, or recent Chrome >=54 versus recent Chrome <54? The latter may be biased, since users with out-of-date Chrome are more likely to have full disks or poor connections.


> On average, we see about .36% of the users become undeliverable each day and the number is cumulative so 11%+ are gone after 1 month.

It's a little tricky to relate those numbers to our metrics, as there are so many possible causes of messages becoming undeliverable. Some questions:

- Do you ever call unsubscribe (e.g. if the user opts out on your website)? If so do you exclude that from the drop rate?

- Do you ever unregister your Service Worker? (Presumably not?)

- Do you try to detect when users have Cleared Browsing Data (cookies etc, and hence Service Workers)? You should be able to detect this on page load by checking if the origin already has notifications permission but there is no Service Worker registered (assuming you don't use non-SW notifications).

- Do you try to detect when users have revoked permission? This'll become easier if we start delivering a pushsubscriptionchange event for permission revocation (https://github.com/w3c/push-api/issues/228), but in the meantime it would be possible to check the permission on page load to detect revocation.

- Do you check for errors like NotRegistered (https://developers.google.com/cloud-messaging/http-server-ref#interpret-downstream) when sending messages? Do the dropped-off subscriptions return error:NotRegistered (meaning the unsubscribe cleanly propagated to GCM servers) or a message_id (meaning GCM servers think the subscription is still valid, even if it's apparently failing to deliver to it)?

- If GCM servers still think the subscription is valid, do you try to detect devices going offline (left in a drawer / factory reset / etc)? You can detect these cases by using https://developers.google.com/instance-id/reference/server#get_information_about_app_instances i.e. fetch https://iid.googleapis.com/iid/info/REGISTRATIONID?details=true with your Authorization header and the resulting connectDate tells you the date when the device last connected to GCM servers.

- Do you use the collapse_key feature of GCM? If so older messages might get replaced with a newer one before being delivered, but that shouldn't cause cumulative drops over time.
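For the NotRegistered question above, here is a minimal sketch of how a sender might triage a GCM (legacy HTTP) downstream response. It assumes the request used registration_ids, so the results array is in the same order as the tokens; the Triage class and triage_gcm_response are hypothetical names, while message_id, registration_id, error and the error codes are GCM's documented fields.

```python
# Sketch only: triage a GCM (legacy HTTP) downstream response. Assumes the
# request was sent with `registration_ids`, so `results` is in the same order
# as the tokens. The Triage class and function names are hypothetical.
import json
from dataclasses import dataclass, field

# Errors after which a token should not be used again.
PERMANENT_ERRORS = {"NotRegistered", "InvalidRegistration"}
# Errors that are usually transient; resend later (ideally with backoff).
RETRYABLE_ERRORS = {"Unavailable", "InternalServerError"}


@dataclass
class Triage:
    accepted: list = field(default_factory=list)      # GCM returned a message_id
    remove: list = field(default_factory=list)        # permanent error: drop token
    retry: list = field(default_factory=list)         # transient error: resend later
    replace: dict = field(default_factory=dict)       # old token -> canonical token
    other_errors: dict = field(default_factory=dict)  # token -> unhandled error code


def triage_gcm_response(tokens, response_body):
    results = json.loads(response_body)["results"]
    if len(results) != len(tokens):
        raise ValueError("expected one result per registration id, in order")
    out = Triage()
    for token, result in zip(tokens, results):
        if "message_id" in result:
            # Accepted by GCM servers. This does NOT prove the notification
            # was shown; a stale-but-valid subscription also lands here.
            out.accepted.append(token)
            if "registration_id" in result:
                out.replace[token] = result["registration_id"]
        elif result.get("error") in PERMANENT_ERRORS:
            out.remove.append(token)
        elif result.get("error") in RETRYABLE_ERRORS:
            out.retry.append(token)
        else:
            out.other_errors[token] = result.get("error", "")
    return out
```

The distinction that matters for this thread is between "remove" (GCM knows the subscription is gone) and "accepted" (GCM still considers it valid), since only the former shows up as a clean drop.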
Hi. Thanks for the information, it's very helpful for us.

> Are you comparing recent data versus older data, or recent Chrome >=54 versus recent Chrome <54? The latter may be biased, since users with out-of-date Chrome are more likely to have full disks or poor connections.

We're comparing only recent data (users who opted in after November 1st).

> Do you ever call unsubscribe (e.g. if the user opts out on your website)? If so do you exclude that from the drop rate?
> Do you ever unregister your Service Worker? (Presumably not?)

No, we don't unsubscribe or unregister the service worker on purpose.

> Do you try to detect when users have revoked permission? This'll become easier if we start delivering a pushsubscriptionchange event for permission revocation (https://github.com/w3c/push-api/issues/228), but in the meantime it would be possible to check the permission on page load to detect revocation.

Yes, we do detect revoked permissions on page load.
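The page-load checks discussed here (permission granted but no Service Worker registered suggests Cleared Browsing Data; permission no longer granted suggests revocation) can be written as a small decision table. This is a hypothetical sketch: it assumes the page reports Notification.permission, whether a Service Worker registration exists, and whether pushManager returned a subscription, plus a `known_subscriber` flag (for example from a logged-in account) that is purely an assumption. It is shown in Python for consistency; in practice the same logic would run in the page.

```python
# Hypothetical sketch: the page reports Notification.permission ("granted",
# "denied" or "default"), whether the origin has a Service Worker
# registration, and whether pushManager returned a subscription. The
# `known_subscriber` flag (e.g. from a logged-in account) is an assumption.

def classify_page_load(permission: str, has_sw_registration: bool,
                       has_push_subscription: bool, known_subscriber: bool) -> str:
    if permission == "granted":
        if not has_sw_registration:
            # Permission survived but the SW did not: most likely Cleared
            # Browsing Data (assumes the site never uses non-SW notifications).
            return "data_cleared"
        if not has_push_subscription:
            return "subscription_lost"
        return "ok"
    if known_subscriber:
        # Permission is no longer granted for someone we think is subscribed.
        return "permission_revoked"
    return "not_subscribed"
```

For "data_cleared" and "subscription_lost" the page would register the Service Worker and subscribe again, and the old subscription should be counted as lost rather than as a silent delivery failure.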

> Do you check for errors like NotRegistered (https://developers.google.com/cloud-messaging/http-server-ref#interpret-downstream) when sending messages? Do the dropped-off subscriptions return error:NotRegistered (meaning the unsubscribe cleanly propagated to GCM servers) or a message_id (meaning GCM servers think the subscription is still valid, even if it's apparently failing to deliver to it)?

Yes, we're aware of it. We're not sending any new messages to these devices after getting an error from GCM.

> If GCM servers still think the subscription is valid, do you try to detect devices going offline (left in a drawer / factory reset / etc)? You can detect these cases by using https://developers.google.com/instance-id/reference/server#get_information_about_app_instances i.e. fetch https://iid.googleapis.com/iid/info/REGISTRATIONID?details=true with your Authorization header and the resulting connectDate tells you the date when the device last connected to GCM servers.

Thanks for pointing it out, it is very helpful for our testing. We managed to get new insights using this data.
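A minimal sketch of the connectDate lookup described in the quoted suggestion: build the GET request to https://iid.googleapis.com/iid/info/REGISTRATIONID?details=true with the Authorization header, then read connectDate from the JSON body. The helper names are hypothetical, and the YYYY-MM-DD format of connectDate is an assumption based on the example response in the linked documentation.

```python
# Sketch: look up when a device last connected to GCM via the Instance ID
# server API. Helper names are hypothetical; the connectDate format
# (YYYY-MM-DD) is assumed from the documentation's example response.
import json
import urllib.parse
import urllib.request
from datetime import date

IID_INFO_URL = "https://iid.googleapis.com/iid/info/{token}?details=true"


def build_iid_info_request(token, server_key):
    # Tokens are normally URL-safe already; quote() only guards odd input.
    url = IID_INFO_URL.format(token=urllib.parse.quote(token, safe=":"))
    return urllib.request.Request(url, headers={"Authorization": "key=" + server_key})


def last_connect_date(info_body):
    raw = json.loads(info_body).get("connectDate")
    return date.fromisoformat(raw) if raw else None


def days_since_connect(info_body, today):
    connected = last_connect_date(info_body)
    return None if connected is None else (today - connected).days
```

Sending it is then a matter of `urllib.request.urlopen(build_iid_info_request(token, server_key))` and passing the response body to `days_since_connect`.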

> Do you use the collapse_key feature of GCM? If so older messages might get replaced with a newer one before being delivered, but that shouldn't cause cumulative drops over time.

We don't use collapse keys at the moment, but we do set the TTL to 24 hours. It slightly decreases delivery rates, but again that shouldn't cause cumulative drops over time.
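For reference, a sketch of a legacy GCM HTTP request body matching the setup described here: a 24-hour TTL and no collapse key. registration_ids, time_to_live (in seconds) and collapse_key are GCM's documented fields; build_gcm_body is a hypothetical helper, and the limits it checks (at most 1000 tokens, TTL at most 4 weeks) are GCM's documented maxima.

```python
# Sketch: request body for the legacy GCM HTTP endpoint with a 24-hour TTL.
# build_gcm_body is a hypothetical helper; time_to_live, collapse_key and
# registration_ids are GCM's documented field names.
import json

MAX_TTL_SECONDS = 4 * 7 * 24 * 60 * 60  # GCM's maximum TTL: 4 weeks = 2,419,200 s
MAX_TOKENS_PER_REQUEST = 1000           # GCM limit for registration_ids


def build_gcm_body(tokens, ttl_hours=24, collapse_key=None):
    tokens = list(tokens)
    if not 1 <= len(tokens) <= MAX_TOKENS_PER_REQUEST:
        raise ValueError("registration_ids must hold 1 to 1000 tokens")
    ttl_seconds = int(ttl_hours * 60 * 60)
    if not 0 <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("time_to_live must be between 0 and 4 weeks")
    body = {"registration_ids": tokens, "time_to_live": ttl_seconds}
    if collapse_key is not None:
        # Omitted by default: with a collapse key, GCM may replace an
        # undelivered message with a newer one that has the same key.
        body["collapse_key"] = collapse_key
    return json.dumps(body)
```

With ttl_hours=24, a message that can't reach the device within a day is dropped. That lowers the per-message delivery rate, but it is a fixed loss per message rather than something that compounds over time.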



We also have a couple of related questions:

- > We also learnt that 2/3 of case C is SERVICE_WORKER_ERROR_TIMEOUT which just means the website took too long to resolve/reject the waitUntil promise so we killed the SW
So if the service worker times out and gets killed, will it be restored automatically the next time a push is sent to the user?

- If a user clears Chrome data in Android settings, will the connectDate reported for this device by iid.googleapis.com stop being updated?



And here is some more detail on how we're sending notifications and collecting statistics.
 - We're sending notifications only to Android users (no desktop users).
 - We're sending around 3 notifications per day per user.
 - No collapse key is used, TTL is set to 24 hours.

By analyzing the data we're seeing a drop-off in delivery rates over time, which seems to be caused by devices becoming permanently unable to show push notifications without GCM reporting it.

There is a spreadsheet attached to this message illustrating it. 
 - You can see that Chrome 54 has a smaller drop-off over time than Chrome <=53 (most likely due to the Chrome 54 fixes described above).
 - You can also see that Android 6 on Chrome 54 has a smaller drop-off over time than Android 5 on Chrome 54 (possibly because Android 5 runs on older devices).
 - Opted-out users are not included in this report. We're tracking opt-outs in all possible ways (detecting revoked permissions on page load and handling the NotRegistered error from GCM).
 - Offline devices are excluded as well (using the last connect date from iid.googleapis.com).

So it seems like there is still some possible scenario where:
 - device is online
 - device is considered active by GCM
 - but service worker is not working and device is not showing notifications
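The exclusions listed above amount to a per-device classification in which only the last bucket is unexplained. A hypothetical sketch: `window_days` is an assumed threshold (reasonable given roughly 3 messages per day and a 24-hour TTL), and the input fields stand in for whatever the sender actually records.

```python
# Hypothetical sketch of the classification described above. window_days is
# an assumed threshold; the inputs stand in for the sender's own records.
from datetime import date
from typing import Optional


def classify_device(today: date, permission_revoked: bool,
                    last_gcm_error: Optional[str], connect_date: Optional[date],
                    last_shown: Optional[date], window_days: int = 2) -> str:
    if permission_revoked or last_gcm_error == "NotRegistered":
        return "opted_out"      # excluded from the report
    if connect_date is None or (today - connect_date).days > window_days:
        return "offline"        # excluded via iid.googleapis.com connectDate
    if last_shown is not None and (today - last_shown).days <= window_days:
        return "healthy"
    # Online, still valid in GCM, yet nothing shown recently.
    return "unexplained"
```

Devices that land in "unexplained" are the ones matching the three conditions above.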


Do you have any thoughts on that? Thank you.
delivery_rates.xlsx
47.4 KB
> So if the service worker times out and gets killed, will it be restored automatically the next time a push is sent to the user?

Yes, killed here just means we stopped executing it, but the SW remains registered and the push subscription remains subscribed.

> If a user clears Chrome data in Android settings, will the connectDate reported for this device by iid.googleapis.com stop being updated?

No, if an Android user clears app data for Chrome only, then the connectDate will continue to be updated when the device goes online (this is different from Factory Resetting the device, leaving it in a drawer, or a desktop user clearing their Chrome profile, all of which would cause the connectDate to stop being updated). We don't expect Android users to clear Chrome app data very often though (unless your target audience is power users). In all of these cases, since cookies etc are cleared, it'll appear as if the user never visits the website again from that device.

> So it seems like there is still some possible scenario (...)

Thanks, that's really useful data (and thanks for being precise about the things you exclude). We'll look into this to try to understand how that could happen.
I thought it might be helpful to also provide data sliced a slightly different way. The attached file shows only Android 6 / Chrome 54 devices, which seem to have the least attrition over time. It shows devices that opted in on different days and how those devices progress over time. The data is split so that “offline” devices per the Google data are included on the left (in green) and excluded on the right (in blue).

Can you give us your take on what this data or the previous data posted by Xtremepush (our push vendor) tells you?

Thanks!


report_12.12.2016_android6 (3) Google Post.xlsx
41.2 KB
Please let us know if there has been any progress on the Google/Chromium side in resolving this issue.  

Thanks
We keep monitoring the stats, and here is some interesting data attached in two files. It shows the number of messages sent and the number of messages delivered (shown), day by day.

We're using the GCM service to get the last GCM login date for each device. We then use this date to exclude messages sent to a device after that date (in order to exclude devices that went offline).

We can see that delivery rates stay quite high even after two months, but the number of unique devices is going down (because the 'last GCM login date' stops being updated). The drop-off is quite big, which concerns us.
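A minimal sketch of the exclusion rule described above, assuming each send is logged as a hypothetical (device_id, sent_on, shown) record and each device's last connect date has already been fetched from iid.googleapis.com. Devices with no known connect date are treated as offline here, which is an assumption.

```python
# Sketch of the filtering described above: drop any message sent to a device
# after its last GCM connect date, then compute the share that was shown.
# The record layout (device_id, sent_on, shown) is a hypothetical choice.
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

Send = Tuple[str, date, bool]


def delivery_rate_excluding_offline(sends: Iterable[Send],
                                    last_connect: Dict[str, date]) -> Optional[float]:
    counted = shown = 0
    for device_id, sent_on, was_shown in sends:
        connected = last_connect.get(device_id)
        if connected is None or sent_on > connected:
            # Unknown connect date, or sent after the device stopped
            # connecting: treat as offline and leave it out.
            continue
        counted += 1
        shown += was_shown
    return shown / counted if counted else None
```

Because devices whose connectDate stops advancing drop out of the denominator, this rate can stay high while the number of counted devices shrinks, which is the pattern described above.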

We know of a few reasons why the 'GCM login date' might stop being updated:
1. Device going offline (no internet, no battery, left in drawer)
2. Factory reset
3. Clearing Chrome data in Android settings? (Is that correct?)

Can you tell us whether there are any other reasons?
And do you have any idea how many devices are being factory reset and how many users clear Chrome data in Android settings?

Also, do you know how long a GCM session can last? For example, if the last login date was 2 days ago, can the device still be online?



Android6_Chrome54_Online.xlsx
75.0 KB
Android5_Chrome54_Online.xlsx
72.5 KB
Thanks for the additional data, we're still looking into this. I'm putting together a document with all the possible ways push/GCM can fail, and will try to share that shortly.
Hi -- any update on this issue and the doc you were assembling showing how web push/GCM can fail?
Hey John,

Can you share the PushMessaging.UnregistrationReason metric collected so far? Is the document on the ways push/GCM can fail available as well?

Comment 43 by awdf@chromium.org, Mar 21 2018

Owner: ----
Status: Archived (was: Started)
Archiving this now, as John left the team some time ago and we haven't had any recent reports.

If anybody sees a re-occurrence of this issue please file a fresh bug referencing this one.

Thanks.
We're still seeing a rate of NotRegistered on push that we don't think is normal.

Opened a new issue for followup:
https://bugs.chromium.org/p/chromium/issues/detail?id=830528

Thanks!
