Jump in % database errors in Notifications.PersistentWebNotificationClickResult UMA
Issue description: The ratio of (Database error : OK) jumped significantly from around 25th Sep, approaching 20% on multiple platforms, e.g. Windows: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=2d0880bec168f0786dda17d97e0ace6e

The charts for Dev and Canary look even weirder, with a sudden jump to ~90% followed by a recent drop to ~15% that is still falling, e.g. https://uma.googleplex.com/p/chrome/timeline_v2/?sid=f8310d3f28f39540945646bf9526f7e4
Nov 6 2017
M60..M61 leveldb changes are:

commit 8415f00eeedd96934d3578572d3802900e61a556
Author: costan <costan@google.com>
Date: Mon Jul 10 13:32:58 2017 -0700

    leveldb: Report missing CURRENT manifest file as database corruption.

    BTRFS reorders rename and write operations, so it is possible that a filesystem crash and recovery results in a situation where the file pointed to by CURRENT does not exist. DB::Open currently reports an I/O error in this case. Reporting database corruption is a better hint to the caller, which can attempt to recover the database or erase it and start over.

    This issue is not merely theoretical. It was reported as having showed up in the wild at https://github.com/google/leveldb/issues/195 and at https://crbug.com/738961 . Also, asides from the BTRFS case described above, incorrect data in CURRENT seems like a possible corruption case that should be handled gracefully.

    The Env API changes here can be considered backwards compatible, because an implementation that returns Status::IOError instead of Status::NotFound will still get the same functionality as before.

    -------------
    Created by MOE: https://github.com/google/moe
    MOE_MIGRATED_REVID=161432630

commit 69e2bd224b7f11e021527cb95bab18f1ee6e1b3b
Author: costan <costan@google.com>
Date: Tue May 23 17:29:44 2017 -0700

    LevelDB: Add WriteBatch::ApproximateSize().

    This can be used to report metrics on LevelDB usage.

    -------------
    Created by MOE: https://github.com/google/moe
    MOE_MIGRATED_REVID=156934930
Jan 12 2018
Things still don't seem back to normal. To summarise:

On Windows Stable: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=67291c3b731defff45653b9f33421601
Windows Beta: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=472d6df5e819401dab9372e843759111
And ChromeOS Stable: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=961bbfdf9f0b0d0d7cb88e2cf4487b43

1. ALL show a big jump in % of errors starting on Sep 26th (from around 3% --> 15% errors). Note Windows M61 was released on Sept 27 (it went from 60.0.3112.113 - 61.0.3163.100). Windows Beta went from 62.0.3202.09 - 62.0.3202.38 on Sept 28th, and the spike for Chrome OS seems to happen well between the releases of 60 and 61. I don't know how accurate the release date annotations in UMA charts are, so if patches were cherry-picked onto multiple branches and the release dates are a few days out, it's still possible that a Chrome change caused the spike, but the weight of evidence suggests not.

2. If you look at bucket *counts*, the counts for OK results have stayed fairly stable, whereas the count of errors suddenly jumped by about half a million a day.

On Linux Stable: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=234983af611d108cebf7707a2f0ebb7d
- Slight increase around Sept 26th from around 3% errors to 5% errors, which has been growing steadily since then to around 8%. (Linux went from 61.0.3163.91 --> 61.0.3163.100 on Sept 25th)

On Android Stable: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=325492041d6551f23de40c6f7d45de4e
- No detectable changes

On Mac Stable: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=a18ff4d193773bf083ce7b93a49fb9e9
- Jump from ~3% --> 10% starting around Sept 26th.
- Mac M61 was released on Sept 28th (it went from 60.0.3112.113 - 61.0.3163.100, same as Windows)

------

It seems very strange that the numbers all jumped on the same day on every other platform, and not at all on Android. What on earth could explain this?

Meanwhile Windows Dev shows a *massive* increase in bucket counts of errors from Sept 18th, which then suddenly returns to a higher-than-before baseline on Oct 20th: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=fa56a07cf7f3390dd48d55c8126e202e ???
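To make the relationship between bucket counts and the error percentage in point 2 concrete, here is a minimal Python sketch. The function name and the daily counts are hypothetical, chosen only for illustration; they are not taken from the UMA dashboards. It shows how a stable OK count combined with roughly half a million extra errors a day moves the error share from a few percent to the mid-teens.

```python
def error_share(ok_count: int, error_count: int) -> float:
    """Return errors as a fraction of all recorded results (OK + errors)."""
    total = ok_count + error_count
    if total == 0:
        return 0.0  # no samples recorded, so there is no meaningful share
    return error_count / total


# Hypothetical daily bucket counts (assumed values, not real UMA data).
# The OK count is held constant; only the error count changes.
before = {"ok": 3_000_000, "database_error": 100_000}
after = {"ok": 3_000_000, "database_error": 600_000}

for label, buckets in (("before", before), ("after", after)):
    share = error_share(buckets["ok"], buckets["database_error"])
    print(f"{label}: {share:.1%} errors")
```

Note that the share is computed against the total (OK + errors), which is what a "% of errors" chart would show; the raw (Database error : OK) ratio from the issue description uses only the OK count as the denominator and comes out slightly higher.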
Jan 12 2018
Mar 5 2018
I wonder whether the reason Android behaves differently could be related to the root cause of Issue 789145, where an Android implementation was originally not updated as part of a refactor.
Sep 13
Archiving old bugs that haven't been actively assigned in over 180 days. If you feel this issue should still be addressed, feel free to reopen it or to file a new issue. Thanks!
Comment 1 by awdf@chromium.org, Nov 6 2017