Issue metadata
Sign in to add a comment
|
Chrome_Mac: Crash Report - net::ReadCRL, _platform_memmove$VARIANT$Nehalem |
||||||||||||||||||||
Issue descriptionreporter:ajha@google.com Magic Signature: net::ReadCRL Crash link: https://crash.corp.google.com//browse?q=product.name%3D'Chrome_Mac'%20AND%20product.version%3D'65.0.3301.0'%20AND%20custom_data.ChromeCrashProto.channel%3D'canary'%20AND%20custom_data.ChromeCrashProto.ptype%3D'browser'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'net%3A%3AReadCRL'%20AND%20ReportID%3D'2a0743fc8392bbd8'&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3 ------------------------------------------------------------------------------- Sample Report ------------------------------------------------------------------------------- Product name: Chrome_Mac Magic Signature : net::ReadCRL Product Version: 65.0.3301.0 Process type: browser Report ID: 2a0743fc8392bbd8 Report Url: https://crash.corp.google.com/2a0743fc8392bbd8 Report Time: 2017-12-22T03:14:01-08:00 Upload Time: 2017-12-22T03:14:04.086-08:00 Uptime: 102000 ms CumulativeProductUptime: 0 ms OS Name: Mac OS X OS Version: 10.13.2 17C88 CPU Architecture: amd64 CPU Info: family 6 model 58 stepping 9 ------------------------------------------------------------------------------- Crashing thread: Thread index: 10. Stack Quality: 77%. Thread id: 3776. ------------------------------------------------------------------------------- 0x00007fff63a74e3e (libsystem_kernel.dylib + 0x0001be3e) __pthread_kill 0x00007fff639d1311 (libsystem_c.dylib + 0x0005d311) abort 0x00007fff63ace865 (libsystem_malloc.dylib + 0x00003865) free 0x00007fff61993b1c (libc++.1.dylib + 0x0003cb1c) std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__grow_by_and_replace(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, char const*) 0x00007fff61993011 (libc++.1.dylib + 0x0003c011) std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::assign(char const*, unsigned long) 0x00000001106becc0 (Google Chrome Framework - crl_set_storage.cc: 149) net::ReadCRL(base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >*) 0x00000001106be8f7 (Google Chrome Framework - crl_set_storage.cc: 396) net::CRLSetStorage::Parse(base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, scoped_refptr<net::CRLSet>*) 0x000000010ff4daef (Google Chrome Framework - crl_set_component_installer.cc: 40) component_updater::(anonymous namespace)::LoadCRLSet(base::FilePath const&) 0x00000001102d324b (Google Chrome Framework - callback.h: 65) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) 0x000000011033e46d (Google Chrome Framework - task_tracker.cc: 410) base::internal::TaskTracker::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool) 0x000000011033ef02 (Google Chrome Framework - task_tracker_posix.cc: 23) base::internal::TaskTrackerPosix::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool) 0x000000011033dcba (Google Chrome Framework - task_tracker.cc: 312) base::internal::TaskTracker::RunNextTask(scoped_refptr<base::internal::Sequence>, base::internal::CanScheduleSequenceObserver*) 0x0000000110338adf (Google Chrome Framework - scheduler_worker.cc: 72) base::internal::SchedulerWorker::Thread::ThreadMain() 0x0000000110349e86 (Google Chrome Framework - platform_thread_posix.cc: 75) base::(anonymous namespace)::ThreadFunc(void*) 0x00007fff63bb06c0 (libsystem_pthread.dylib + 0x000036c0) _pthread_body 0x00007fff63bb056c (libsystem_pthread.dylib + 0x0000356c) _pthread_start 0x00007fff63bafc5c (libsystem_pthread.dylib + 0x00002c5c) thread_start 0x0000000110349e2f (Google Chrome Framework + 0x01ee1e2f)
,
Dec 22 2017
Not my CL, setting to Untriaged and Internals>Network so it gets network triager visibility as well.
,
Dec 22 2017
,
Dec 22 2017
,
Dec 22 2017
You are probably looking for a change made after 525636 (known good), but no later than 525640 (first known bad). CHANGELOG URL: https://chromium.googlesource.com/chromium/src/+log/9c222110ec241790bc2a559bc85689937d61cfdb..747cb1afa7ec0b160837f87626161d29f97063af Reproduce by attempting to update "CRLSet" from chrome://components
,
Dec 22 2017
Some more notes from the crash report: abort() called *** error for object 0x2011: pointer being freed was not allocated Two possibilities: 1) The compiler roll may have changed some behavior of std::string (https://chromium.googlesource.com/chromium/src/+/ece1b14db2dedf6a9a4086b46e8cc6c2acd1b62d) 2) We pushed a new CRLSet to production that is now triggering some previously untested codepath (https://groups.google.com/a/google.com/forum/#!topic/omaha-core/tznqpEPMbZ8)
,
Dec 22 2017
Adding some initial notes from trying to view Crash - This particular magic frame is only on macOS - Looking more broadly at the stack of functions - e.g. including net::CRLSetStorage::Parse - still shows an extreme bias towards macOS - Removing the filters for Canary so all versions are considered suggests 65.0.3301.0 only So it does look valid that this is a mac-only, recently introduced crash, and I definitely agree with the labels. Looking through the crashes, see a mixture of invalid accesses and NPEs. The function, ReadCRL, takes in an input blob of bytes and parses it, storing a 32-byte SHA-256 hash (out_parent_spki_hash) and a list of serials (out_serials), which are caller-provided storage. The crash is consistent in assigning 32 bytes to out_parent_spki_hash. There is a condition check immediately before that to make sure at least 32 bytes are available, so that would indicate something amiss with |out_parent_spki_hash|, rather than |data| There are only two calls to ReadCRL - when parsing and when applying delta updates. The callstack is consistently in Parse() - although notably, applying delta-updates uses stack-allocated variables. In the Parse() case, a heap allocated CRLSet is created, and it has a vector of string/vector pairs (the CRLS). A new value is pushed onto the back, and then read (via pop_back), with the pointers into it being used as the output parameters. Thus, I have trouble believing this is a straight logic bug in the code - in order for those crashes to make sense, memory must be getting stomped 'somewhere else', leading to this crash. It could be elsewhere in this function, or elsewhere in a component, but I don't think this particular crash has enough information to figure it out.
,
Dec 22 2017
Thanks for the bisect, Comment #5 - that definitely supports rsesek's remark in Comment #6 that this may be a Clang-roll related.
,
Dec 22 2017
Issue 797315 has been merged into this issue.
,
Dec 22 2017
One note, in case it's helpful - I deleted my ~/Library/Application Support/Google/Chrome Canary directory, and restarted Chrome Canary, and it worked fine for a while - long enough to set up my account, log in to email, start some music playing, etc, but then I stepped away for a couple minutes and it had crashed again. That seems like it could indicate that a certain CRL is causing the crash.
,
Dec 22 2017
I don't think it's related to a CRLSet push - I think we should revert the Clang roll. It just happens that CRLSets sync after a while, so the first CRLSet will trigger it.
,
Dec 22 2017
I'm seeing this crash after relaunching canary with the latest version update and here is the attached system report incase if that helps.
,
Dec 22 2017
,
Dec 22 2017
This is definitely a clang bug. The last time it was seen was bug 792188 , also in net::ReadCRL().
,
Dec 22 2017
Issue 797361 has been merged into this issue.
,
Dec 22 2017
,
Dec 22 2017
,
Dec 22 2017
,
Dec 22 2017
I think this needs a postmortem once it's fixed. How did this make it past all of the automated testing machines?
,
Dec 22 2017
> I think this needs a postmortem once it's fixed. How did this make it past all of the automated testing machines? The original roll did pass the CQ, despite probably unrelated test failures on chromeos asan bots. The Mac and Linux bots on the Clang ToT waterfall[1] are currently green, despite the fact that we still appear to have a bug. [1] https://ci.chromium.org/p/chromium/g/chromium.clang/console I've spent an hour diffing the code clang generates with and without the memcpyopt patch, and I don't see anything wrong yet. I see differences, but they look correct. This is the interesting code pattern: for (size_t crl_index = 0; !data.empty(); crl_index++) { // Speculatively push back a pair and pass it to ReadCRL() to avoid // unnecessary copies. crl_set->crls_.push_back( std::make_pair(std::string(), std::vector<std::string>())); std::pair<std::string, std::vector<std::string> >* const back_pair = &crl_set->crls_.back(); if (!ReadCRL(&data, &back_pair->first, &back_pair->second)) { With the suspected culprit change, clang is trying to zero-initialize the last slot in the vector in place. That's well and good, but something appears to be going wrong. Assuming the revert lands soon, we can downgrade this to P1 and leave this until after the holidays, right?
,
Dec 22 2017
> Assuming the revert lands soon, we can downgrade this to P1 and leave this until after the holidays, right? Yes, though we're currently having problems landing the revert (https://chromium-review.googlesource.com/c/chromium/src/+/843242). The Android bots seem to be timing out/crashing during compile.
,
Dec 22 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bcd195d07a842f2ed3745dfa4a0a6bb341375ff5 commit bcd195d07a842f2ed3745dfa4a0a6bb341375ff5 Author: Brad Lassey <lassey@chromium.org> Date: Fri Dec 22 22:13:47 2017 Revert "Roll clang 318667:321204." This reverts commit ece1b14db2dedf6a9a4086b46e8cc6c2acd1b62d. Reason for revert: suspected of causing the crash in crbug/797267 Original change's description: > Roll clang 318667:321204. > > Ran `tools/clang/scripts/upload_revision.py 321204` > and cherry-picked the non-clang changes from > https://chromium-review.googlesource.com/c/chromium/src/+/828400 . > > TBR=brettw@chromium.org,rsesek@chromium.org,hans@chromium.org > > Bug: 787920 > Cq-Include-Trybots: master.tryserver.chromium.android:android_arm64_dbg_recipe;master.tryserver.chromium.android:android_compile_mips_dbg;master.tryserver.chromium.android:android_compile_x64_dbg;master.tryserver.chromium.android:android_compile_x86_dbg > Change-Id: Iaec5677e2afee0edb685db816e00f3c900214d14 > Reviewed-on: https://chromium-review.googlesource.com/835358 > Commit-Queue: Peter Collingbourne <pcc@chromium.org> > Reviewed-by: Robert Sesek <rsesek@chromium.org> > Reviewed-by: Hans Wennborg <hans@chromium.org> > Reviewed-by: Peter Collingbourne <pcc@chromium.org> > Cr-Commit-Position: refs/heads/master@{#525638} TBR=eugenis@chromium.org,brettw@chromium.org,hans@chromium.org,rnk@chromium.org,pcc@chromium.org,rsesek@chromium.org NOTRY=true Skipping the CQ because there appear to be Android bot capacity issues. Bug: 787920 ,797267 Change-Id: I3a033492561ae2ff4fcf46fe9087efbbd9f015f6 Cq-Include-Trybots: master.tryserver.chromium.android:android_arm64_dbg_recipe;master.tryserver.chromium.android:android_compile_mips_dbg;master.tryserver.chromium.android:android_compile_x64_dbg;master.tryserver.chromium.android:android_compile_x86_dbg Reviewed-on: https://chromium-review.googlesource.com/843242 Commit-Queue: Reid Kleckner <rnk@chromium.org> Reviewed-by: Reid Kleckner <rnk@chromium.org> Reviewed-by: Avi Drissman <avi@chromium.org> Cr-Commit-Position: refs/heads/master@{#526072} [modify] https://crrev.com/bcd195d07a842f2ed3745dfa4a0a6bb341375ff5/build/config/BUILDCONFIG.gn [modify] https://crrev.com/bcd195d07a842f2ed3745dfa4a0a6bb341375ff5/build/config/compiler/BUILD.gn [modify] https://crrev.com/bcd195d07a842f2ed3745dfa4a0a6bb341375ff5/media/cdm/ppapi/ppapi_cdm_adapter.gni [modify] https://crrev.com/bcd195d07a842f2ed3745dfa4a0a6bb341375ff5/sandbox/linux/tests/unit_tests.cc [modify] https://crrev.com/bcd195d07a842f2ed3745dfa4a0a6bb341375ff5/tools/clang/scripts/update.py
,
Dec 22 2017
rnk: >I've spent an hour diffing the code clang generates with and without the memcpyopt patch, and I don't see anything wrong yet. Anything useful from bug 792188 in this area?
,
Dec 22 2017
(especially comment 7 from bug 792188 )
,
Dec 22 2017
,
Dec 22 2017
> Yes, though we're currently having problems landing the revert (https://chromium-review.googlesource.com/c/chromium/src/+/843242). The Android bots seem to be timing out/crashing during compile. Filed Issue 797441 about the Android build exceptions. It looks to me like the Clang rollback caused a full rebuild, and it simply timed out on these two bots because their timeouts are set too low to handle that.
,
Dec 23 2017
I was looking at the ReadCRL failure in bug 797168 and saw it's flaky, which may explain why it got through automation undetected. It also happens on chromium revisions at least a month back.
,
Dec 23 2017
Since it wasn't mentioned anywhere yet, the memcpyopt change is r321138.
,
Dec 24 2017
FYI, canary 65.0.3302.0, built after the revert, launches fine.
,
Dec 24 2017
Now that the revert is in and the canary launches, I'm downgrading to a P1. Reid, a wonderful Christmas present for you. :)
,
Dec 24 2017
I can confirm, that with version 65.0.3303.0, it is all running again. Thanks for acting this fast! Happy holidays, guys!
,
Dec 27 2017
Removing the blocker and keeping this open to check the behavior once the Clang roll is relanded.
,
Dec 27 2017
Here's the beginning of a post-mortem. The revision Peter chose for the reverted clang roll had a bug that removes the zero initialization of the std::pair<std::string, std::vector<std::string>> being pushed onto the back of the crls_ vector. This code *does* have unit tests (CRLSetTest.Parse), but if the uninitialized memory happens to be zero, then the tests will pass. This happens enough of the time that Chromium's flaky test retry mechanisms allowed this bad CL through the CQ. The last time this came up, Hans was diligently watching the Clang ToT waterfall, so he filed bug 792188 and reverted the upstream LLVM change before we attempted to roll clang. That was weeks ago, and the patch was relanded early on Wed Dec 20, which is when we were pushing to roll clang. In the meantime, we still hadn't rolled clang, and the Clang ToT waterfall was fairly red for other reasons, so we didn't spent much time consulting it looking for possible redness. Even if we had, it cycles slowly enough that we might not have been able to see a pattern of alternating failing net_unittests runs. Bob independently filed bug 797168 specifically about the CRLSetTest.Parse. I'm going to mark this blocked on that, and put the more detailed updates over there. We can close this for good when we fix clang and land the next roll.
,
Dec 28 2017
The next roll is in flight: https://chromium-review.googlesource.com/c/chromium/src/+/845700 I will manually build and run net_unittests a few times before re-landing.
,
Dec 28 2017
Thanks rnk@ for the detailed explanation. Let's do an official writeup and come up with concrete next steps to prevent this from happening again. Maybe for some of these unit tests we can shim the memory allocator to return memory initialized with a guaranteed non-zero value?
,
Dec 29 2017
,
Jan 3 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0d72bb1494c94523ae7bd944476a495e8b281bcc commit 0d72bb1494c94523ae7bd944476a495e8b281bcc Author: Reid Kleckner <rnk@google.com> Date: Wed Jan 03 18:32:36 2018 Roll clang 318667:321529. Created by patching in https://chromium-review.googlesource.com/c/chromium/src/+/835358 and manually editting the revision in update.py. TBR-ing rsesek@ again for the sandbox changes. TBR-ing bretw@ again for the ppapi adapter LLD warning suppression. I will manually build and test net_unittests after uploading to goma, but I've built this clang revision locally and they are not flaky for me. R=inglorion@chromium.org,hans@chromium.org TBR=rsesek@chromium.org,brettw@chromium.org Bug: 787920 , 797267, 797168 Cq-Include-Trybots: master.tryserver.chromium.android:android_arm64_dbg_recipe;master.tryserver.chromium.android:android_compile_mips_dbg;master.tryserver.chromium.android:android_compile_x64_dbg;master.tryserver.chromium.android:android_compile_x86_dbg Change-Id: Ifd25c014e79ad7a5d4533941ae7fd74a86860191 Reviewed-on: https://chromium-review.googlesource.com/845700 Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Hans Wennborg <hans@chromium.org> Reviewed-by: Robert Sesek <rsesek@chromium.org> Reviewed-by: Reid Kleckner <rnk@chromium.org> Commit-Queue: Reid Kleckner <rnk@chromium.org> Cr-Commit-Position: refs/heads/master@{#526748} [modify] https://crrev.com/0d72bb1494c94523ae7bd944476a495e8b281bcc/build/config/BUILDCONFIG.gn [modify] https://crrev.com/0d72bb1494c94523ae7bd944476a495e8b281bcc/build/config/compiler/BUILD.gn [modify] https://crrev.com/0d72bb1494c94523ae7bd944476a495e8b281bcc/media/cdm/ppapi/ppapi_cdm_adapter.gni [modify] https://crrev.com/0d72bb1494c94523ae7bd944476a495e8b281bcc/sandbox/linux/tests/unit_tests.cc [modify] https://crrev.com/0d72bb1494c94523ae7bd944476a495e8b281bcc/tools/clang/scripts/update.py |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by ajha@chromium.org
, Dec 22 2017Labels: -Type-Bug -Pri-2 ReleaseBlock-Dev TE-CrashTriage RegressedIn-65 M-65 Target-65 FoundIn-65 Stability-Sheriff-Desktop Pri-0 Type-Bug-Regression
Owner: rsleevi@chromium.org
Status: Assigned (was: Untriaged)