More Histogram Corruption
Issue description

This is a follow-up to the (infamous) Android Core Trampler described here: https://bugs.chromium.org/p/chromium/issues/detail?id=736675

Fixing the bug deemed to be the cause has not stopped the "validation" crash reports.

For those unfamiliar, the core trampler tended to hit Histogram objects at a byte offset of 20 on 32-bit builds and an offset of 40 on 64-bit builds. Code was added to the Histogram object to detect this zeroing of memory and crash when it was found, producing a single stack signature whenever the problem occurred. Originally, it was not known that the corruption occurred only at offset 20, so all fields were checked and reported.

Now that the fix is in (v63.0.3226.0 and above), there is still corruption, but it is not always at offset 20 and it is also being detected on other architectures. See here (46 crashes in 2 weeks, Canary and Dev only):
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D

46% Windows 7
19% Windows 10
17% Windows 8
7% Android Marshmallow
7% Android Nougat
3% Android Lollipop

If the crash signature ends at "histogram.cc:591" then corruption was detected and the report is intentional. Source indexing is currently unavailable but I've confirmed that in my own code.

For more information, go to "Fields" and expand the "Product data". At the bottom should be "bad_histogram-X" of the form "HistogramName/A#B", where:

A == corrupted field, 1 bit per field
  1 = UnloggedBucketRanges
  2 = UnloggedSamples
  4 = LoggedSamples
  8 = ID
  16 = HistogramName
  32 = Flags
  64 = LoggedBucketRanges
  128 = Dummy
B == caller identity (not important here)

Looking through a few crashes, here are some of interest...
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=26cdc461d83f88de&index=5#2
Android Marshmallow, arm(32), htc_m8whl
Corrupted field is the name, truncated at 20 characters. This matches the signature of the old core trampler on 32-bit builds. Perhaps there are other ways it can occur besides the main one that was fixed.

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=ba5442ba71189c09&index=1#2
Android Marshmallow, arm64, YT3
Corrupted field is the name, truncated at 40 characters. This matches the signature of the old core trampler on 64-bit builds.

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=c455ba30e874d73b&index=2#2
Android Nougat, arm(32), addison
Corrupted field is the ID. The ID is 64 bits and must be 0 to trip this check, so even though it's a 32-bit CPU, 64 bits were zeroed. Or some pointer is wrong. The ID is held in persistent memory, accessed via a pointer inside the HistogramSamples object, which is itself pointed to by the Histogram object. Either of these pointers could be wrong, but one would generally expect a SEGV in such a case.

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=ce1d13a6c9854b58&index=3#2
Android Nougat, arm(32), dream2qltechn
Corrupted field is the "dummy" field. This matches the signature of the old core trampler on 32-bit builds, though it's possible that it is just a random bit-flip rather than a full zeroing of the field.

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D&stbtiq=&reportid=a0610d4f39a5b356&index=4#2
Windows 7, x86
Corrupted field is the ID. See above description. All 5 of the Windows crashes I looked at were of this kind.
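
For anyone triaging these reports, the "bad_histogram-X" value described above can be decoded mechanically. The following is only a sketch: the bit values come from the list in the description, but the helper name `parse_bad_histogram`, the example key value, and the assumption that A and B are written as decimal integers are all illustrative and not confirmed by this thread.

```python
from enum import IntFlag


class CorruptedField(IntFlag):
    """Bit values for field A, as listed in the issue description."""
    UNLOGGED_BUCKET_RANGES = 1
    UNLOGGED_SAMPLES = 2
    LOGGED_SAMPLES = 4
    ID = 8
    HISTOGRAM_NAME = 16
    FLAGS = 32
    LOGGED_BUCKET_RANGES = 64
    DUMMY = 128


def parse_bad_histogram(value):
    """Split "HistogramName/A#B" into (name, corrupted field names, caller).

    Assumption: A and B are decimal integers (not confirmed by the thread).
    """
    # Split from the right in case the histogram name itself contains "/".
    name, _, tail = value.rpartition("/")
    mask_text, _, caller_text = tail.partition("#")
    mask = int(mask_text)
    # Collect the name of every field whose bit is set in the mask.
    fields = [field.name for field in CorruptedField if mask & field.value]
    return name, fields, int(caller_text)


if __name__ == "__main__":
    # Hypothetical crash-key value: ID (8) and Dummy (128) both flagged.
    print(parse_bad_histogram("Example.Histogram/136#3"))
```

A mask with more than one bit set would indicate that several fields failed validation in the same report.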
,
Oct 13 2017
Just to be sure, I ran a few overnight content_browsertests runs on an Android device with ASAN+Accessibility on. There are no stack traces from ASAN in the resulting logs.
,
Oct 17 2017
,
Oct 18 2017
BTW, issue 766752 fixed one more trampler.
,
Oct 20 2017
Users experienced this crash on the following builds: Linux Dev 63.0.3239.9 - 2.79 CPM, 2 reports, 1 clients (signature base::Histogram::ValidateHistogramContents) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Nov 6 2017
Issue 781707 has been merged into this issue.
,
Dec 6 2017
Two months later...
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D

There are 315 reports since the big core trampler on Android was fixed. Of these...
225 Chrome_Android
64 Chrome
14 Chrome_ChromeOS
8 Chrome_Linux
2 Chrome_Mac
2 AndroidWebView

Unfortunately, https://chromium-review.googlesource.com/c/chromium/src/+/804338 removed the crash-key, so newer validation failures don't provide information as to what was wrong.

Looking at a dozen recent crashes, most of them are the ID field being zeroed, but there is some name corruption (again zeros) and some dummy corruption (any value). With the possible exception of the "dummy" corruption, these would be multi-bit changes and so are not likely to be random memory errors. Something is (or many things are) still writing zeros to un-owned memory locations, mostly on Android.

Validation is showing corruption at 3 different locations. It's possible that it is happening in other locations too but causing a crash during histogram updates instead. Here are 634 possibilities of that:
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2763.0.3226.0%27%20AND%20(custom_data.ChromeCrashProto.magic_signature_1.name%20LIKE%20%27base%3A%3A%25%3A%3AGetBucketIndex%27%20OR%20custom_data.ChromeCrashProto.magic_signature_1.name%20LIKE%27base%3A%3A%25%3A%3AAccumulate%27)&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D#-property-selector,+crashaddress

I checked out a couple that had Windows minidumps and found zeroed "samples" pointers. But I don't see any easy way to track this down.

Before M65 branches, I currently plan to remove most if not all of the histogram validation code.
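
To illustrate the "multi-bit change" argument above: a single random bit flip can only turn a value into zero if that value had exactly one bit set. The sketch below counts how many bits would have to flip at once; the ID constant is a made-up 64-bit value chosen for illustration, not taken from any crash report, and the function names are hypothetical.

```python
def bits_to_clear(value):
    """Number of 1-bits that must all flip for `value` to become zero."""
    return bin(value).count("1")


def is_single_bit_flip(original, observed):
    """True if `observed` differs from `original` in exactly one bit."""
    return bin(original ^ observed).count("1") == 1


# Hypothetical 64-bit histogram ID (illustrative only).
EXAMPLE_ID = 0x9E3779B97F4A7C15

if __name__ == "__main__":
    print("bits that must flip to zero the ID:", bits_to_clear(EXAMPLE_ID))
    print("explainable by one bit flip:", is_single_bit_flip(EXAMPLE_ID, 0))
```

The "dummy" field is the exception noted in the comment: any changed value trips its check, so a one-bit error there cannot be ruled out this way.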
,
Jan 26 2018
Users experienced this crash on the following builds: Linux Beta 64.0.3282.119 - 0.77 CPM, 1 reports, 1 clients (signature base::SampleVectorBase::Accumulate) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Feb 1 2018
Users experienced this crash on the following builds: Mac Canary 66.0.3335.0 - 0.96 CPM, 4 reports, 1 clients (signature base::Histogram::ValidateHistogramContents) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Feb 6 2018
Just checking in on this:
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2764.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=custom_data.ChromeCrashProto.experiments.ids&omit_field_value=39fa6d3e-3f4a17df&omit_field_opt=%3D

This is still happening on Unix operating systems, mostly Android, though that probably just reflects the relative number of users on each platform. There's no longer a crash-key that indicates the source of the corruption. This could use some investigation by somebody who specializes in Android or Linux or Mac.
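
The crash-server links in this thread carry their filter in the percent-encoded `q` parameter. As a reading aid, here is a minimal sketch that decodes it using only the Python standard library. The helper name `crash_filter` is made up for this example, and the URL is the link from this comment with the trailing display parameters omitted for brevity.

```python
from urllib.parse import parse_qs, urlsplit


def crash_filter(url):
    """Return the decoded `q` filter expression from a crash-browser URL."""
    query = urlsplit(url).query  # drops the scheme, host, path and fragment
    params = parse_qs(query)     # percent-decodes every parameter value
    return params["q"][0]


# Link from this comment, truncated after the first two parameters.
URL = (
    "https://crash.corp.google.com/browse?"
    "q=product.Version%3E%3D%2764.0%27%20AND%20"
    "custom_data.ChromeCrashProto.magic_signature_1.name%3D%27"
    "base%3A%3AHistogram%3A%3AValidateHistogramContents%27"
    "&sql_dialect=dremelsql"
)

if __name__ == "__main__":
    print(crash_filter(URL))
    # prints: product.Version>='64.0' AND custom_data.ChromeCrashProto.magic_signature_1.name='base::Histogram::ValidateHistogramContents'
```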
,
Feb 7 2018
Users experienced this crash on the following builds:
Mac Canary 66.0.3341.0 - 0.71 CPM, 3 reports, 1 clients (signature base::Histogram::ValidateHistogramContents)
Android Beta 65.0.3325.53 - 1.50 CPM, 1 reports, 1 clients (signature base::Histogram::ValidateHistogramContents)
If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Feb 27 2018
Magic Signature: base::SampleVectorBase::Accumulate
------------------------------------------------------
An update on the latest behavior of this issue: crashes are still being observed on the latest Chrome stable, 64.0.3282.186, with 4178 instances. This crash is currently ranked #6 in the version comparison between 64.0.3282.167 and 64.0.3282.186.

Link to list of the builds:
------------------------------
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Linux%27%20%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3ASampleVectorBase%3A%3AAccumulate%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#-property-selector,-samplereports,productversion:1000

Thanks!
,
Feb 28 2018
https://crash.corp.google.com/browse?q=product.Version%3E%3D%2764.0%27%20AND%20expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27

Since M64, Chrome_Android has 1443 (51%) crashes. Windows Chrome has 870 (31%) crashes.

There is no longer a crash-key that indicates what field of the histogram is being corrupted, but I looked at a few of the Windows minidump files:
- 70% were zero'd ID in the logged_samples_ (offset 0)
- 30% were corrupted "dummy" values (offset 20/40)

The latter fits with the previous core trampler, except that was only on Android.

Does anyone think this is worth pursuing? And are you interested in pursuing it? If not, if it's just noise in the general memory corruption problem, I'm going to remove the validation and let the crashes arise where they may.
,
Mar 14 2018
Users experienced this crash on the following builds: Android Beta 65.0.3325.144 - 0.22 CPM, 35 reports, 12 clients (signature base::Histogram::ValidateHistogramContents) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Mar 19 2018
Users experienced this crash on the following builds: Android Beta 66.0.3359.30 - 0.21 CPM, 10 reports, 7 clients (signature base::Histogram::ValidateHistogramContents) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Apr 26 2018
,
Aug 3
An update on the latest behavior of this issue in the latest channels:

Magic Signature - base::Histogram::ValidateHistogramContents

Still seeing 54 crashes from 25 clients so far on the latest beta, 68.0.3440.70, on Android. This crash is ranked #39 in 'Renderer' beta crashes.

68.0.3440.85  0.16%  67  - Stable
68.0.3440.70  1.30%  546 - Beta

So far, crashes have not been observed on Dev and Canary builds.

Link to the list of builds:
-------------------------
https://crash.corp.google.com/browse?q=product_name%3D%27Chrome_Android%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27beta%27+AND+expanded_custom_data.ChromeCrashProto.ptype%3D%27browser%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3AHistogram%3A%3AValidateHistogramContents%27

Thanks!
,
Aug 24
Comment 1 by sheriffbot@chromium.org
, Oct 13 2017