Increased Crashes In Android With On-Disk Persistent Metrics |
|||||||
Issue descriptionChrome Version: M65 OS: Android On-Disk Persistence: https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.component%3D%27src%2Fbase%2Fmetrics%27%20AND%20expanded_custom_data.ChromeCrashProto.malware_verdict%3Dfalse%20AND%20product.version%20like%20%276_._.%25%27%20AND%20product.name%3D%27Chrome_Android%27%20AND%20expanded_custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20expanded_custom_data.ChromeCrashProto.channel%3D%27stable%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(expanded_custom_data.ChromeCrashProto.experiments.ids)%20expanded_custom_data_ChromeCrashProto_experiments_ids%20WHERE%20expanded_custom_data_ChromeCrashProto_experiments_ids%3D%2723a898eb-357e36e%27) In-Memory Persistence: https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.component%3D%27src%2Fbase%2Fmetrics%27%20AND%20expanded_custom_data.ChromeCrashProto.malware_verdict%3Dfalse%20AND%20product.version%20like%20%276_._.%25%27%20AND%20product.name%3D%27Chrome_Android%27%20AND%20expanded_custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20expanded_custom_data.ChromeCrashProto.channel%3D%27stable%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(expanded_custom_data.ChromeCrashProto.experiments.ids)%20expanded_custom_data_ChromeCrashProto_experiments_ids%20WHERE%20expanded_custom_data_ChromeCrashProto_experiments_ids%3D%2723a898eb-a85dc586%27) There is a high number of M65 crashes in ValidateHistogramContents() that is not present in the "control" group using in-memory persistence. Unfortunately, there is no way of seeing what the validation-failure was as the "crash key" was removed a couple versions prior. All the calls are through RecordCurrentHistograms() meaning that it's only accessing data that was written to memory during the current session, not loading a metrics file from a previous session. As such, there should be no difference between the on-disk and in-memory modes. The distribution among clients is not uniform with 20 crashes (13.4%) being from a single client and the next highest two clients having 6 crashes (4.0%). The number continues to drop from there. Despite the fact that there should be no difference, the in-memory group sees only 5 crashes total in this method. I wonder if something could be causing the disk file to be modified outside of Chrome.
,
Apr 25 2018
Looking through the crashes, many of the "logcat" files have error messages saying: Corruption detected in shared-memory segment. Something is messing with the shared memory contents when its held on disk. It's possible that this is restricted to certain devices. The U673C makes up 42% and the UL40 makes up another 35%. Beyond that it drops off quickly with NX16A8116K at 9%, MTAB-ET8404G at 4%. After that it's 1% and lower.
,
Apr 25 2018
A quick check via ADB confirms that these files are only accessible to the owning process so nothing should be able to change their contents except Chrome itself. Perhaps its a bad flash chip? Something that doesn't write properly and then reads back as zeros? The devices I listed previously don't match the more general device list for all Android crashes so it looks like these devices are especially prone to the problem.
,
Apr 26 2018
,
Apr 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1a3a225c8e5b345a4747835a3cf37304b012ddbd commit 1a3a225c8e5b345a4747835a3cf37304b012ddbd Author: Brian White <bcwhite@chromium.org> Date: Fri Apr 27 00:40:07 2018 Validate histogram contents to detect corruption. Android builds are crashing when memory-mapped files are used for persistent metrics. This is just to verify that the cause is indeed an unreadable ID that is held in that memory segment. This will be reverted as soon as the reason is confirmed. Bug: 836875 Change-Id: I4fcdbf0c554ef4f3ee4dd3706ab8c5b71e12c50b Reviewed-on: https://chromium-review.googlesource.com/1031190 Commit-Queue: Brian White <bcwhite@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Cr-Commit-Position: refs/heads/master@{#554224} [modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram.cc [modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram.h [modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram_base.cc [modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram_base.h [modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram_snapshot_manager.cc
,
May 2 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5 commit 60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5 Author: Brian White <bcwhite@chromium.org> Date: Wed May 02 18:32:00 2018 Track metrics files with internal corruption. Bug: 836875 Change-Id: I633a00c07f52093c12a49723fd213dfc03192eba Reviewed-on: https://chromium-review.googlesource.com/1039346 Commit-Queue: Alexei Svitkine <asvitkine@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Cr-Commit-Position: refs/heads/master@{#555478} [modify] https://crrev.com/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5/components/metrics/file_metrics_provider.cc [modify] https://crrev.com/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5/components/metrics/file_metrics_provider.h [modify] https://crrev.com/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5/tools/metrics/histograms/enums.xml
,
May 8 2018
The added metric is out and reporting useful information but it would be good to have it for the M67 release as well.
,
May 8 2018
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 8 2018
,
May 8 2018
Can this change affects any Chrome functionality? did you verify it in canary?
,
May 8 2018
No, there is no change in functionality, just an additional error report. Yes, it's reporting information from Canary just fine.
,
May 8 2018
,
May 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4eb3939250abc910e912d1f808d8ef66ae2eb311 commit 4eb3939250abc910e912d1f808d8ef66ae2eb311 Author: Brian White <bcwhite@chromium.org> Date: Wed May 09 12:49:22 2018 Check theory that entire mmap page is getting zeroed. This is a temporary change that will be reverted once crashes can be analyzed to confirm the working theory. Bug: 836875 Change-Id: Iaeffbd6c78b604af66b33991392990d71648d4a9 Reviewed-on: https://chromium-review.googlesource.com/1050071 Commit-Queue: Brian White <bcwhite@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Cr-Commit-Position: refs/heads/master@{#557159} [modify] https://crrev.com/4eb3939250abc910e912d1f808d8ef66ae2eb311/base/metrics/histogram.cc [modify] https://crrev.com/4eb3939250abc910e912d1f808d8ef66ae2eb311/base/metrics/histogram_samples.h
,
May 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d7ce6b9f83ef594458c5aa30ce2166711c9e3237 commit d7ce6b9f83ef594458c5aa30ce2166711c9e3237 Author: Brian White <bcwhite@chromium.org> Date: Wed May 09 13:41:42 2018 Track metrics files with internal corruption. TBR=bcwhite@chromium.org (cherry picked from commit 60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5) Bug: 836875 Change-Id: I633a00c07f52093c12a49723fd213dfc03192eba Reviewed-on: https://chromium-review.googlesource.com/1039346 Commit-Queue: Alexei Svitkine <asvitkine@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#555478} Reviewed-on: https://chromium-review.googlesource.com/1052067 Reviewed-by: Brian White <bcwhite@chromium.org> Cr-Commit-Position: refs/branch-heads/3396@{#534} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/d7ce6b9f83ef594458c5aa30ce2166711c9e3237/components/metrics/file_metrics_provider.cc [modify] https://crrev.com/d7ce6b9f83ef594458c5aa30ce2166711c9e3237/components/metrics/file_metrics_provider.h [modify] https://crrev.com/d7ce6b9f83ef594458c5aa30ce2166711c9e3237/tools/metrics/histograms/enums.xml
,
May 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9114be5d60a9e5786525c8270c49ddce8ba6dfea commit 9114be5d60a9e5786525c8270c49ddce8ba6dfea Author: Brian White <bcwhite@chromium.org> Date: Wed May 16 17:01:26 2018 Use default page size for check. Crashes from Android indicate that the page_size being returned is zero. Set it to 1KiB for the check. It's probably 4KiB but this is sufficient for the test. Bug: 836875 Change-Id: Ifedbcd648157ac4e39560640048c0640f97e5416 Reviewed-on: https://chromium-review.googlesource.com/1059728 Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Commit-Queue: Brian White <bcwhite@chromium.org> Cr-Commit-Position: refs/heads/master@{#559148} [modify] https://crrev.com/9114be5d60a9e5786525c8270c49ddce8ba6dfea/base/metrics/histogram.cc
,
Jun 28 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0ae894def44ef8c676b8e740b279357a62bcb1f2 commit 0ae894def44ef8c676b8e740b279357a62bcb1f2 Author: Brian White <bcwhite@chromium.org> Date: Thu Jun 28 22:23:21 2018 Revert "Check theory that entire mmap page is getting zeroed." This reverts commit 4eb3939250abc910e912d1f808d8ef66ae2eb311. Reason for revert: Testing complete. All crashes I looked at showed non-zero values at address zero. Pages are not all zeros so the problem isn't a whole page being cleared. Original change's description: > Check theory that entire mmap page is getting zeroed. > > This is a temporary change that will be reverted once crashes can be > analyzed to confirm the working theory. > > Bug: 836875 > Change-Id: Iaeffbd6c78b604af66b33991392990d71648d4a9 > Reviewed-on: https://chromium-review.googlesource.com/1050071 > Commit-Queue: Brian White <bcwhite@chromium.org> > Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> > Cr-Commit-Position: refs/heads/master@{#557159} TBR=asvitkine@chromium.org,bcwhite@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 836875 Change-Id: If65d667029c8f3a1f82dec35c878a32dc96b8343 Reviewed-on: https://chromium-review.googlesource.com/1118826 Commit-Queue: Brian White <bcwhite@chromium.org> Reviewed-by: Brian White <bcwhite@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Cr-Commit-Position: refs/heads/master@{#571297} [modify] https://crrev.com/0ae894def44ef8c676b8e740b279357a62bcb1f2/base/metrics/histogram.cc [modify] https://crrev.com/0ae894def44ef8c676b8e740b279357a62bcb1f2/base/metrics/histogram_samples.h |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by bcwh...@chromium.org
, Apr 25 2018