New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 836875 link

Starred by 5 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug

Blocking:
issue 773756



Sign in to add a comment

Increased Crashes In Android With On-Disk Persistent Metrics

Project Member Reported by bcwh...@chromium.org, Apr 25 2018

Issue description

Chrome Version: M65
OS: Android

On-Disk Persistence:
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.component%3D%27src%2Fbase%2Fmetrics%27%20AND%20expanded_custom_data.ChromeCrashProto.malware_verdict%3Dfalse%20AND%20product.version%20like%20%276_._.%25%27%20AND%20product.name%3D%27Chrome_Android%27%20AND%20expanded_custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20expanded_custom_data.ChromeCrashProto.channel%3D%27stable%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(expanded_custom_data.ChromeCrashProto.experiments.ids)%20expanded_custom_data_ChromeCrashProto_experiments_ids%20WHERE%20expanded_custom_data_ChromeCrashProto_experiments_ids%3D%2723a898eb-357e36e%27)

In-Memory Persistence:
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.component%3D%27src%2Fbase%2Fmetrics%27%20AND%20expanded_custom_data.ChromeCrashProto.malware_verdict%3Dfalse%20AND%20product.version%20like%20%276_._.%25%27%20AND%20product.name%3D%27Chrome_Android%27%20AND%20expanded_custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20expanded_custom_data.ChromeCrashProto.channel%3D%27stable%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(expanded_custom_data.ChromeCrashProto.experiments.ids)%20expanded_custom_data_ChromeCrashProto_experiments_ids%20WHERE%20expanded_custom_data_ChromeCrashProto_experiments_ids%3D%2723a898eb-a85dc586%27)

There is a high number of M65 crashes in ValidateHistogramContents() that is not present in the "control" group using in-memory persistence.  Unfortunately, there is no way of seeing what the validation-failure was as the "crash key" was removed a couple versions prior.

All the calls are through RecordCurrentHistograms() meaning that it's only accessing data that was written to memory during the current session, not loading a metrics file from a previous session.  As such, there should be no difference between the on-disk and in-memory modes.

The distribution among clients is not uniform with 20 crashes (13.4%) being from a single client and the next highest two clients having 6 crashes (4.0%).  The number continues to drop from there.

Despite the fact that there should be no difference, the in-memory group sees only 5 crashes total in this method.

I wonder if something could be causing the disk file to be modified outside of Chrome.
 
Of note, there is only one field checked by Validate that uses persistent memory: Id().  All other checked fields are in the objects themselves held in main memory.
Looking through the crashes, many of the "logcat" files have error messages saying:

  Corruption detected in shared-memory segment.

Something is messing with the shared memory contents when its held on disk.

It's possible that this is restricted to certain devices.  The U673C makes up 42% and the UL40 makes up another 35%.  Beyond that it drops off quickly with NX16A8116K at 9%, MTAB-ET8404G at 4%.  After that it's 1% and lower.
A quick check via ADB confirms that these files are only accessible to the owning process so nothing should be able to change their contents except Chrome itself.

Perhaps its a bad flash chip?  Something that doesn't write properly and then reads back as zeros?

The devices I listed previously don't match the more general device list for all Android crashes so it looks like these devices are especially prone to the problem.
Blocking: 773756
Project Member

Comment 5 by bugdroid1@chromium.org, Apr 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1a3a225c8e5b345a4747835a3cf37304b012ddbd

commit 1a3a225c8e5b345a4747835a3cf37304b012ddbd
Author: Brian White <bcwhite@chromium.org>
Date: Fri Apr 27 00:40:07 2018

Validate histogram contents to detect corruption.

Android builds are crashing when memory-mapped files are used for
persistent metrics.  This is just to verify that the cause is indeed
an unreadable ID that is held in that memory segment.

This will be reverted as soon as the reason is confirmed.

Bug: 836875
Change-Id: I4fcdbf0c554ef4f3ee4dd3706ab8c5b71e12c50b
Reviewed-on: https://chromium-review.googlesource.com/1031190
Commit-Queue: Brian White <bcwhite@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#554224}
[modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram.cc
[modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram.h
[modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram_base.cc
[modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram_base.h
[modify] https://crrev.com/1a3a225c8e5b345a4747835a3cf37304b012ddbd/base/metrics/histogram_snapshot_manager.cc

Project Member

Comment 6 by bugdroid1@chromium.org, May 2 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5

commit 60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5
Author: Brian White <bcwhite@chromium.org>
Date: Wed May 02 18:32:00 2018

Track metrics files with internal corruption.

Bug: 836875
Change-Id: I633a00c07f52093c12a49723fd213dfc03192eba
Reviewed-on: https://chromium-review.googlesource.com/1039346
Commit-Queue: Alexei Svitkine <asvitkine@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#555478}
[modify] https://crrev.com/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5/components/metrics/file_metrics_provider.cc
[modify] https://crrev.com/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5/components/metrics/file_metrics_provider.h
[modify] https://crrev.com/60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5/tools/metrics/histograms/enums.xml

Labels: Merge-Request-67
The added metric is out and reporting useful information but it would be good to have it for the M67 release as well.
Project Member

Comment 8 by sheriffbot@chromium.org, May 8 2018

Labels: -Merge-Request-67 Merge-Review-67 Hotlist-Merge-Review
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: OS-Android
Can this change affects any Chrome functionality? did you verify it in canary? 
No, there is no change in functionality, just an additional error report.  Yes, it's reporting information from Canary just fine.
Labels: -Hotlist-Merge-Review -Merge-Review-67 Merge-Approved-67
Project Member

Comment 13 by bugdroid1@chromium.org, May 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4eb3939250abc910e912d1f808d8ef66ae2eb311

commit 4eb3939250abc910e912d1f808d8ef66ae2eb311
Author: Brian White <bcwhite@chromium.org>
Date: Wed May 09 12:49:22 2018

Check theory that entire mmap page is getting zeroed.

This is a temporary change that will be reverted once crashes can be
analyzed to confirm the working theory.

Bug: 836875
Change-Id: Iaeffbd6c78b604af66b33991392990d71648d4a9
Reviewed-on: https://chromium-review.googlesource.com/1050071
Commit-Queue: Brian White <bcwhite@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#557159}
[modify] https://crrev.com/4eb3939250abc910e912d1f808d8ef66ae2eb311/base/metrics/histogram.cc
[modify] https://crrev.com/4eb3939250abc910e912d1f808d8ef66ae2eb311/base/metrics/histogram_samples.h

Project Member

Comment 14 by bugdroid1@chromium.org, May 9 2018

Labels: -merge-approved-67 merge-merged-3396
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d7ce6b9f83ef594458c5aa30ce2166711c9e3237

commit d7ce6b9f83ef594458c5aa30ce2166711c9e3237
Author: Brian White <bcwhite@chromium.org>
Date: Wed May 09 13:41:42 2018

Track metrics files with internal corruption.

TBR=bcwhite@chromium.org

(cherry picked from commit 60a1a0e96b7755b5dcfd891077f1ee6bbefb3ec5)

Bug: 836875
Change-Id: I633a00c07f52093c12a49723fd213dfc03192eba
Reviewed-on: https://chromium-review.googlesource.com/1039346
Commit-Queue: Alexei Svitkine <asvitkine@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#555478}
Reviewed-on: https://chromium-review.googlesource.com/1052067
Reviewed-by: Brian White <bcwhite@chromium.org>
Cr-Commit-Position: refs/branch-heads/3396@{#534}
Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428}
[modify] https://crrev.com/d7ce6b9f83ef594458c5aa30ce2166711c9e3237/components/metrics/file_metrics_provider.cc
[modify] https://crrev.com/d7ce6b9f83ef594458c5aa30ce2166711c9e3237/components/metrics/file_metrics_provider.h
[modify] https://crrev.com/d7ce6b9f83ef594458c5aa30ce2166711c9e3237/tools/metrics/histograms/enums.xml

Project Member

Comment 15 by bugdroid1@chromium.org, May 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9114be5d60a9e5786525c8270c49ddce8ba6dfea

commit 9114be5d60a9e5786525c8270c49ddce8ba6dfea
Author: Brian White <bcwhite@chromium.org>
Date: Wed May 16 17:01:26 2018

Use default page size for check.

Crashes from Android indicate that the page_size being returned is
zero.  Set it to 1KiB for the check.  It's probably 4KiB but this is
sufficient for the test.

Bug: 836875
Change-Id: Ifedbcd648157ac4e39560640048c0640f97e5416
Reviewed-on: https://chromium-review.googlesource.com/1059728
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Commit-Queue: Brian White <bcwhite@chromium.org>
Cr-Commit-Position: refs/heads/master@{#559148}
[modify] https://crrev.com/9114be5d60a9e5786525c8270c49ddce8ba6dfea/base/metrics/histogram.cc

Project Member

Comment 16 by bugdroid1@chromium.org, Jun 28 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0ae894def44ef8c676b8e740b279357a62bcb1f2

commit 0ae894def44ef8c676b8e740b279357a62bcb1f2
Author: Brian White <bcwhite@chromium.org>
Date: Thu Jun 28 22:23:21 2018

Revert "Check theory that entire mmap page is getting zeroed."

This reverts commit 4eb3939250abc910e912d1f808d8ef66ae2eb311.

Reason for revert: Testing complete.  All crashes I looked at showed
non-zero values at address zero.  Pages are not all zeros so
the problem isn't a whole page being cleared.

Original change's description:
> Check theory that entire mmap page is getting zeroed.
>
> This is a temporary change that will be reverted once crashes can be
> analyzed to confirm the working theory.
>
> Bug: 836875
> Change-Id: Iaeffbd6c78b604af66b33991392990d71648d4a9
> Reviewed-on: https://chromium-review.googlesource.com/1050071
> Commit-Queue: Brian White <bcwhite@chromium.org>
> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#557159}

TBR=asvitkine@chromium.org,bcwhite@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 836875
Change-Id: If65d667029c8f3a1f82dec35c878a32dc96b8343
Reviewed-on: https://chromium-review.googlesource.com/1118826
Commit-Queue: Brian White <bcwhite@chromium.org>
Reviewed-by: Brian White <bcwhite@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571297}
[modify] https://crrev.com/0ae894def44ef8c676b8e740b279357a62bcb1f2/base/metrics/histogram.cc
[modify] https://crrev.com/0ae894def44ef8c676b8e740b279357a62bcb1f2/base/metrics/histogram_samples.h

Sign in to add a comment