High rate of 410 NotRegistered from GCM with no apparant reason
Reported by
eric.re...@gmail.com,
Apr 9 2018
|
||||
Issue descriptionSteps to reproduce the problem: 1. User grants push/notification permissions and subscribes any service (eg. https://compare.buyhatke.com/) — Service Worker is properly registered. A confirmation notification is successfully sent via web push to the user. 2. User has no particular interaction with the service that we can detect during a few days: they do not browse any document from the origin under the scope of the Service Worker, nor do they block push/notification permission. 3. A new push is sent after a few days, which results in a much higher rate of GCM NotRegistered errors than expected. On our latest large-scale notification send, we saw a retention rate lower than 90% only about 20 days after the first registration. This could be expressed as a cumulative rate of ~0.5% per day, which is much, much higher than the main other service we're pushing to (APN). Opening a fresh occurrence as suggested in the archival comment of BUG= 642139 . While the issue is reported for Chrome 65 on Android, it actually happens all across our Android user base (including GCM-using Samsung Internet), with no clear indication of a version being more or less affected than another… Our Android user drill-down is as follows: +-----------------+-----------------------+--------+ | Device Category | Browser Major Version | Users | +-----------------+-----------------------+--------+ | mobile | Chrome 65 | 58.91% | | mobile | Samsung Internet 6 | 19.26% | | mobile | Chrome 64 | 4.25% | | mobile | Chrome 63 | 2.46% | | tablet | Chrome 65 | 1.96% | | mobile | Samsung Internet 7 | 1.79% | | mobile | Chrome 62 | 1.26% | | mobile | Samsung Internet 5 | 0.83% | | mobile | Chrome 61 | 0.75% | | mobile | Chrome 59 | 0.64% | | mobile | Chrome 58 | 0.62% | | mobile | Android Webview 65 | 0.59% | | mobile | Chrome 55 | 0.57% | | tablet | Samsung Internet 6 | 0.48% | | mobile | Chrome 60 | 0.40% | | mobile | Chrome 56 | 0.33% | | mobile | Chrome 50 | 0.32% | | mobile | Chrome 54 | 0.19% | | mobile | Android Webview 44 | 0.18% | +-----------------+-----------------------+--------+ What is the expected behavior? Expected behavior would be a much lower average rate of 410 NotRegistered on web pushes. What went wrong? We are not sure what went wrong! It could be that registration is lost by GCM for a very valid reason, but we don't know what it would be. Using a https://iid.googleapis.com/iid/info detailed query on said subscriptions results in a 404, presumably because the registration is already lost. So we are not sure where to investigate. Did this work before? N/A Does this work in other browsers? N/A Chrome version: 65.0.3325.181 Channel: stable OS Version: 7 Flash Version: This falls probably best under component Services>CloudMessaging, as in (now archived) previously reported issue https://bugs.chromium.org/p/chromium/issues/detail?id=642139.
,
Apr 9 2018
Thank you for the followup, Peter. Certainly, here is a list of endpoints that exhibited this behaviour recently. Notice that most unsubscription dates are somewhat identical, being when we pushed a message to these users, only to get a NotRegistered response. So this is actually when we “discovered” the unsubscription. Any indication of how/why/when the actual unsubcription happened would be very, very helpful. Please let us know if there is anything else that we could provide that would prove helpful from your end. Thanks again!
,
Apr 9 2018
Thank you, we'll take a look internally.
,
Apr 11 2018
The NextAction date has arrived: 2018-04-11
,
Apr 12 2018
Hi Eric, we've looked at a number of your subscriptions and don't see any unexpected behaviour. Some of the subscriptions in your CSV are too old - in order to investigate as thoroughly as possible we really need much more recent data. (Ideally no older than a few days.) The anonymous usage metrics we collect for all of Chrome also look healthy, without anomalies, so there doesn't seem to be a widespread issue on our end either. What sort of sample size are you measuring this over? (Order of magnitude is fine.) Can you reproduce it yourself with any device? Having 90% retention after a month could correlate to people changing devices - is that something you track?
,
Apr 18 2018
Hi Peter, Thank you very much for followup. We(https://compare.buyhatke.com/) have been facing the issue of significant decrease in push notifications delivered to our subscribers for last 6 months. For example, out of nearly 2.2million valid tokens that we have(as per GSM) only 0.1 million users on an average receive our push notifications. Here I am attaching two sets of valid tokens one set created in Dec,2017 and another set created at the start of April,2018. Since Dec,2017 we have sent more than 40 push notifications and GSM give a success message ID for all these tokens but the push is never delivered to them. Please look into these tokens.
,
May 8 2018
Hi Jayaram, thank you for the additional data. A delivery rate of ~5% is very low indeed. I see no issues with your tokens, but fresh ones will always help us to dig deeper. Three questions: (1) Do you set a TTL for the push messages? What is it? (2) Do you give the messages a non-default urgency? (E.g. low or high?) (3) Do you use a collapse_key to trigger replacing behaviour?
,
May 10 2018
,
Jun 5 2018
Hi Peter, Jayaram, and everyone, If I may, I would like to bring a very specific case that I’d consider very representative of the issue I opened this ticket for. Somehow the situation where pushes worked very well until a GCM issued a 410 for no apparent reason happened on my own phone! Understanding why this happened, whether it is a bug or a feature, would be immensely beneficial (not only to us, but also to the community if we could document this publicly somehow). Here is that one particular token: https://android.googleapis.com/gcm/send/dUXQ2F6x6m8:APA91bGsUHhsb00aKnjiPcRUWp2iaobf_9QiSPDpVCM1hDOyCPL-6PHfOxBfHUNumVhrid3aPdJdZg4NpdiNZwA0eY431Dz4MidbJn9TcGoFHjuOWblyY8Zdm0nknHGHkrcJoFANXBDj The summary regarding this subscription, and why I think it could be representative, is this: - After many successfully delivered pushes, one final push to this subscription resulted in a 410 - No particular user action was taken before that behavior - Browser and system builds changed many times - Upon inspection of the phone, the subscription endpoint appeared to have *changed* on the browser side. The specifics are: - We recorded the push subscription on February 7 (2018-02-07T15:51:29.672Z). Apologies for it not being so fresh, but once again, we are trying to figure out whether this could be part of the problem. - At the time of the subscription, the browser was Chrome/64.0.3282.137 (Android 8.0.0, my device running a Sony system Build/47.1.A.12.34 for G8441 AKA Xperia XZ1 Compact). - The target PWA was installed to my home screen moments later (2018-02-07T15:51:41.926Z) - Push events from GCM subsequently reached the device successfully, with the following timestamps on record: - 2018-02-07T16:00:15.205Z - 2018-02-08T08:01:06.014Z - 2018-02-14T08:00:32.194Z (after system upgrade to Build/47.1.A.12.75) - 2018-03-02T07:25:41.821Z - 2018-03-06T08:00:10.843Z - 2018-03-08T16:00:13.624Z - 2018-03-15T08:00:12.717Z (after browser upgrade to Chrome/65.0.3325.109) - 2018-03-15T08:00:28.230Z - 2018-03-16T08:00:12.840Z - 2018-03-16T15:00:18.840Z - 2018-03-20T08:00:15.794Z - 2018-03-20T14:06:06.257Z - 2018-03-22T08:00:11.340Z (after system upgrade to Build/47.1.A.12.119) - 2018-03-22T08:00:44.530Z - 2018-03-23T08:44:49.258Z - 2018-04-05T15:10:18.659Z - 2018-04-06T08:10:55.411Z - 2018-04-12T08:20:29.559Z - 2018-04-13T08:10:17.975Z (after system upgrade to Build/47.1.A.12.145) - 2018-04-19T08:00:22.105Z - 2018-04-25T08:00:13.006Z - 2018-04-25T08:00:48.217Z - 2018-05-08T08:14:37.814Z (after browser upgrade to Chrome/66.0.3359.158) - 2018-05-11T08:10:18.335Z - Three days later, the next push attempt resulted in a 410 error, although no particular change browser version, permission, or other user input had been performed. Recorded time on our side is 2018-05-14T08:00:25.342Z. - All of our pushes are performed with a "high" Urgency, and an explicit TTL of 24 hours or less. - It is *possible* that the system build was updated (Build/47.1.A.12.179) before the push error. While this could play as a factor of the 410 error, we have so far ranked this as improbable since our subscriptions have already survived many build upgrades. - Today, the phone runs Chrome/67.0.3396.68, and it is possible to inspect the now-unreachable registration from the console, revealing a different endpoint: await navigator.permissions.query({ name: 'push', userVisibleOnly: true }); → PermissionStatus {state: "granted", onchange: null} await navigator.permissions.query({ name: 'notifications' }); → PermissionStatus {state: "granted", onchange: null} (await navigator.serviceWorker.getRegistration()).pushManager.getSubscription(); → PushSubscription {endpoint: "https://android.googleapis.com/gcm/send/eNVWBpM1F38:APA91bE252J9lFd6Wo8RvY57PQVWiKwMc2fOw-M_AVURv5PUmQ3snQfusic-vQrzfr_fTaxh1yQ39ZDrq4QflmRRGgWAKTjk0-D23sJ4uYn1z2gekbliGTs6otJkIBK8jhC3bg7TzX9_", expirationTime: null, options: PushSubscriptionOptions} Whatever happened to this particular case, we believe could be a very spread-out reason for the push error rate increasing after some time passes within our user base. We might not have a direct possibility of control over it, but once again, I think it would be beneficial to understand the mechanism behind such observations, and to have them advertised for developer culture. Could it be that a system upgrade caused this? A storage cleanup? A glitch? A combination of these factors? Others? Is there any other data that we could gather to help investigating this cas which seems like it could happen in the wild on a regular basis? If we could get an idea of the reason behind this particular token error-ing out, it might shed some light, and it would be really appreciated. Thank you!
,
Jan 11
This issue has an owner, a component and a priority, but is still listed as untriaged or unconfirmed. By definition, this bug is triaged. Changing status to "assigned". Please reach out to me if you disagree with how I've done this. |
||||
►
Sign in to add a comment |
||||
Comment 1 by peter@chromium.org
, Apr 9 2018