
Issue 773756

Starred by 5 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Mac
Pri: 3
Type: Bug

Blocked on:
issue 836875




More Histogram Corruption

Project Member Reported by bcwh...@chromium.org, Oct 11 2017

Issue description

This is a follow-up to the (infamous) Android Core Trampler described here:
https://bugs.chromium.org/p/chromium/issues/detail?id=736675

Fixing the bug deemed to be the cause has not stopped the "validation" crash reports.  For those unfamiliar, the core trampler tended to hit Histogram objects at a byte offset of 20 on 32-bit builds and 40 on 64-bit builds.  Code was added to the Histogram object to detect this zeroing of memory and crash when it was found, giving a single stack signature whenever the problem occurred.

Originally, it was not known that the corruption occurred only at offset 20, so all fields were checked and reported.  Now that the fix is in (v63.0.3226.0 and above), there is still corruption, but it is not always at offset 20 and it is also being detected on other architectures.

See here (46 crashes in 2 weeks, Canary and Dev only):
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D
46% Windows 7
19% Windows 10
17% Windows 8
 7% Android Marshmallow
 7% Android Nougat
 3% Android Lollipop

If the crash signature ends at "histogram.cc:591" then corruption was detected and the report is intentional.  Source indexing is currently unavailable but I've confirmed that in my own code.

For more information, go to "Fields" and expand the "Product data".  At the bottom should be "bad_histogram-X" of the form "HistogramName/A#B", where
A == corrupted field, 1 bit per field
   1 = UnloggedBucketRanges
   2 = UnloggedSamples
   4 = LoggedSamples
   8 = ID
  16 = HistogramName
  32 = Flags
  64 = LoggedBucketRanges
 128 = Dummy
B == caller identity (not important here)
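
A minimal sketch of decoding that crash-key value when triaging reports. The bit values come from the list above; the function name, the example histogram name, and the assumption that A and B are decimal integers are illustrative and not confirmed by the report.

```python
from typing import NamedTuple

# Bit values copied from the field list in this report.
FIELD_BITS = {
    1: "UnloggedBucketRanges",
    2: "UnloggedSamples",
    4: "LoggedSamples",
    8: "ID",
    16: "HistogramName",
    32: "Flags",
    64: "LoggedBucketRanges",
    128: "Dummy",
}


class BadHistogram(NamedTuple):
    name: str
    corrupted_fields: list
    caller: int


def parse_bad_histogram(value: str) -> BadHistogram:
    """Split 'HistogramName/A#B' into name, corrupted field names, caller id.

    Assumption: A and B are decimal integers.
    """
    # Split on the last '/' so a '/' or '#' inside the name does not confuse us.
    name, _, tail = value.rpartition("/")
    mask_text, _, caller_text = tail.partition("#")
    mask = int(mask_text)
    fields = [label for bit, label in FIELD_BITS.items() if mask & bit]
    return BadHistogram(name, fields, int(caller_text))


# Hypothetical value: bits 8 and 16 set, so ID and HistogramName are flagged.
print(parse_bad_histogram("Example.Histogram/24#3"))
```

Because A is one bit per field, a single value can flag several fields at once.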

Looking through a few crashes, here are some of interest...

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=26cdc461d83f88de&index=5#2

Android Marshmallow, arm(32), htc_m8whl
Corrupted field is the name truncated at 20 characters.
This matches the signature of the old core trampler on 32-bit builds.  Perhaps there are other ways it can occur besides the main one that was fixed.

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=ba5442ba71189c09&index=1#2

Android Marshmallow, arm64, YT3
Corrupted field is the name truncated at 40 characters.
This matches the signature of the old core trampler on 64-bit builds.
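
As a rough illustration of why a zeroing write shows up as a truncated name, the sketch below assumes the name's characters sit in one contiguous buffer and that the stray write zeroes one machine word (4 bytes on 32-bit, 8 bytes on 64-bit) starting at offset 20 or 40. The histogram name and helper names are hypothetical, and the real memory layout may differ.

```python
def zero_word(buf: bytearray, offset: int, width: int) -> None:
    """Overwrite `width` bytes at `offset` with zeros, like a stray store."""
    buf[offset:offset + width] = bytes(width)


def c_string(buf: bytearray) -> str:
    """Read the buffer the way C code would: stop at the first NUL byte."""
    end = buf.find(0)
    return buf[: end if end != -1 else len(buf)].decode("ascii")


# Hypothetical histogram name, longer than 40 characters.
NAME = b"Example.Hypothetical.Histogram.Name.ForIllustration"

for offset, width in ((20, 4), (40, 8)):  # assumed 32-bit and 64-bit cases
    buf = bytearray(NAME + b"\0")
    zero_word(buf, offset, width)
    print(offset, len(c_string(buf)), c_string(buf))
```

Under these assumptions the surviving name is exactly 20 or 40 characters long, which is the pattern described in the two reports above.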

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=c455ba30e874d73b&index=2#2

Android Nougat, arm(32), addison
Corrupted field is the ID.
The ID is 64 bits and must be 0 to trip this check, so even though it's a 32-bit CPU, 64 bits were zeroed.  Or some pointer is wrong.  The ID is held in persistent memory, accessed via a pointer inside the HistogramSamples object, which is itself pointed to by the Histogram object.  Either of these pointers could be wrong, but one would generally expect a SEGV in such a case.
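
The arithmetic behind that observation can be sketched as follows. `trips_check` is only a stand-in for the validation described in this report (an ID of exactly 0 fails), `EXAMPLE_ID` is a hypothetical value with both halves non-zero, and the 4-byte store width is an assumption about what a single stray write on a 32-bit CPU would look like.

```python
import struct


def id_after_zeroing(original_id: int, offset: int, width: int) -> int:
    """Return the 64-bit ID after `width` bytes at `offset` are zeroed."""
    # Little-endian layout, as on the x86 and Android ARM builds in question.
    raw = bytearray(struct.pack("<Q", original_id))
    raw[offset:offset + width] = bytes(width)
    return struct.unpack("<Q", bytes(raw))[0]


def trips_check(histogram_id: int) -> bool:
    """Stand-in for the validation: only an all-zero ID is reported."""
    return histogram_id == 0


EXAMPLE_ID = 0x1234_5678_9ABC_DEF0  # hypothetical, both 32-bit halves non-zero

print(trips_check(id_after_zeroing(EXAMPLE_ID, 0, 4)))  # one 32-bit store, low half
print(trips_check(id_after_zeroing(EXAMPLE_ID, 4, 4)))  # one 32-bit store, high half
print(trips_check(id_after_zeroing(EXAMPLE_ID, 0, 8)))  # all 64 bits zeroed
```

Only the last case trips the check, so a single 32-bit zeroing store would not be enough unless the other half of the ID already happened to be zero.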

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=ce1d13a6c9854b58&index=3#2

Android Nougat, arm(32), dream2qltechn
Corrupted field is the "dummy" field.
This matches the signature of the old core trampler on 32-bit builds though it's possible that it is just a random bit-flip rather than a full zeroing of the field.

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=a0610d4f39a5b356&index=4#2

Windows 7, x86
Corrupted field is the ID.
See above description.
5 of the 5 Windows crashes I saw were this one.

 
Project Member

Comment 1 by sheriffbot@chromium.org, Oct 13 2017

Labels: FoundIn-M-63 Fracas
Users experienced this crash on the following builds:

Android Dev 63.0.3236.6 -  0.50 CPM, 3 reports, 3 clients (signature base::Histogram::ValidateHistogramContents)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas

Comment 2 by roc...@chromium.org, Oct 13 2017

Just to be sure, I ran a few overnight content_browsertests runs on an Android device with ASAN+Accessibility on. There are no stack traces from ASAN in the resulting logs.
Cc: -amineer@chromium.org

Comment 4 by dskiba@chromium.org, Oct 18 2017

BTW, issue 766752 fixed one more trampler.
Project Member

Comment 5 by sheriffbot@chromium.org, Oct 20 2017

Labels: OS-Linux
Users experienced this crash on the following builds:

Linux Dev 63.0.3239.9 -  2.79 CPM, 2 reports, 1 clients (signature base::Histogram::ValidateHistogramContents)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Issue 781707 has been merged into this issue.
Two months later...

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D

There are 315 reports since the big core trampler on Android was fixed.  Of these...
225  Chrome_Android
 64  Chrome
 14  Chrome_ChromeOS
  8  Chrome_Linux
  2  Chrome_Mac
  2  AndroidWebView

Unfortunately, https://chromium-review.googlesource.com/c/chromium/src/+/804338 removed the crash-key so newer validation failures don't provide information as to what was wrong.

Looking at a dozen recent crashes, most of them are the ID field being zeroed but there is some name corruption (again zeros) or dummy corruption (any value).

With the possible exception of the "dummy" corruption, these would be multi-bit changes and so are not likely random memory errors.
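
One way to make the multi-bit argument concrete is to count how many bits differ between the original and the corrupted value. This is a sketch only; `EXAMPLE_ID` is a hypothetical 64-bit ID, not one taken from a crash report.

```python
def bits_changed(before: int, after: int) -> int:
    """Number of bit positions that differ between two values."""
    return bin(before ^ after).count("1")


def could_be_single_bit_flip(before: int, after: int) -> bool:
    """True if the change is consistent with one random bit flipping."""
    return bits_changed(before, after) == 1


EXAMPLE_ID = 0x1234_5678_9ABC_DEF0  # hypothetical 64-bit histogram ID

# Zeroing the ID changes every bit that was set, far more than one.
print(bits_changed(EXAMPLE_ID, 0), could_be_single_bit_flip(EXAMPLE_ID, 0))

# A single flipped bit, which is the only case a random memory error explains well.
flipped = EXAMPLE_ID ^ (1 << 17)
print(bits_changed(EXAMPLE_ID, flipped), could_be_single_bit_flip(EXAMPLE_ID, flipped))
```

The "dummy" field is the exception because any wrong value fails validation there, including one that differs from the expected value by a single bit.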

Something is (or many things are) still writing zeros to un-owned memory locations, mostly on Android.  Validation is showing corruption at 3 different locations.  It's possible that it is happening in other locations too but causing a crash during histogram updates instead.

Here are 634 possible instances of that:
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20(custom_data.ChromeCrashProto.magic_signature_1.name%20LIKE%20%27base%3A%3A%25%3A%3AGetBucketIndex%27%20OR%20custom_data.ChromeCrashProto.magic_signature_1.name%20LIKE%27base%3A%3A%25%3A%3AAccumulate%27)&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D#-property-selector,+crashaddress

I checked out a couple that had Windows minidumps and found zeroed "samples" pointers.

But I don't see any easy way to track this down.  Before M65 branches, I currently plan to remove most if not all of the histogram validation code.

Project Member

Comment 8 by sheriffbot@chromium.org, Jan 26 2018

Labels: FoundIn-M-64
Users experienced this crash on the following builds:

Linux Beta 64.0.3282.119 -  0.77 CPM, 1 reports, 1 clients (signature base::SampleVectorBase::Accumulate)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member

Comment 9 by sheriffbot@chromium.org, Feb 1 2018

Labels: FoundIn-M-66 OS-Mac
Users experienced this crash on the following builds:

Mac Canary 66.0.3335.0 -  0.96 CPM, 4 reports, 1 clients (signature base::Histogram::ValidateHistogramContents)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Cc: -roc...@chromium.org bcwh...@chromium.org
Owner: ----
Status: Available (was: Assigned)
Just checking in on this:
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2764.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D

This is still happening on Unix operating systems, mostly Android, though that is probably just a reflection of how users are distributed across platforms.

There's no longer a crash-key that indicates the source of the corruption.  This could use some investigation by somebody who specializes in Android or Linux or Mac.
Project Member

Comment 11 by sheriffbot@chromium.org, Feb 7 2018

Labels: FoundIn-M-65
Users experienced this crash on the following builds:

Mac Canary 66.0.3341.0 -  0.71 CPM, 3 reports, 1 clients (signature base::Histogram::ValidateHistogramContents)
Android Beta 65.0.3325.53 -  1.50 CPM, 1 reports, 1 clients (signature base::Histogram::ValidateHistogramContents)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Magic Signature: base::SampleVectorBase::Accumulate
------------------------------------------------------

To update on the latest behavior of this issue: crashes are still observed on the latest Chrome stable, 64.0.3282.186, with 4178 instances. This crash currently ranks #6 in the version comparison between 64.0.3282.167 and 64.0.3282.186.

Link to list of the builds: 
------------------------------
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Linux%27%20%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3ASampleVectorBase%3A%3AAccumulate%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#-property-selector,-samplereports,productversion:1000

Thanks!
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2764.0%27%20AND%20expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27

Since M64, Chrome_Android has 1443 (51%) crashes.  Windows Chrome has 870 (31%) crashes.

There is no longer a crash-key that indicates what field of the histogram is being corrupted but I looked at a few of the Windows minidump files:
- 70% were a zeroed ID in the logged_samples_  (offset 0)
- 30% were corrupted "dummy" values  (offset 20/40)

The latter fits the previous core trampler, except that it was only seen on Android.

Does anyone think this is worth pursuing?  And are you interested in pursuing it?  If not, if it's just noise in the general memory corruption problem, I'm going to remove the validation and let the crashes arise where they may.
Project Member

Comment 14 by sheriffbot@chromium.org, Mar 14 2018

Labels: FoundIn-65
Users experienced this crash on the following builds:

Android Beta 65.0.3325.144 -  0.22 CPM, 35 reports, 12 clients (signature base::Histogram::ValidateHistogramContents)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member

Comment 15 by sheriffbot@chromium.org, Mar 19 2018

Labels: FoundIn-66
Users experienced this crash on the following builds:

Android Beta 66.0.3359.30 -  0.21 CPM, 10 reports, 7 clients (signature base::Histogram::ValidateHistogramContents)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Blockedon: 836875
Cc: pnangunoori@chromium.org
Labels: Target-68 FoundIn-68
Just to update the latest behavior of this issue in the latest channels:

Magic Signature - base::Histogram::ValidateHistogramContents

Still seeing 54 crashes from 25 clients so far on the latest beta, 68.0.3440.70, on Android. This crash is ranked #39 among 'Renderer' beta crashes.

68.0.3440.85	0.16%	67 - Stable 
68.0.3440.70	1.30%	546 - Beta
So far crashes are not observed on Dev and Canary builds.

Link to the list of builds:
-------------------------
https://crash.corp.google.com/browse?q=product_name%3D%27Chrome_Android%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27beta%27+AND+expanded_custom_data.ChromeCrashProto.ptype%3D%27browser%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27

Thanks!

Components: Internals>Metrics
