CHECK failure: iteration_state_ & kAllowingRemoval in lifecycle_notifier.h |
|||||||||||||
Issue descriptionDetailed report: https://clusterfuzz.com/testcase?key=4945883024850944 Fuzzer: attekett_dom_fuzzer Job Type: linux_ubsan_vptr_chrome Platform Id: linux Crash Type: CHECK failure Crash Address: Crash State: iteration_state_ & kAllowingRemoval in lifecycle_notifier.h blink::LifecycleNotifier<blink::Document, blink::SynchronousMutationObserver>::R blink::DocumentMarkerController::PossiblyHasMarkers Sanitizer: undefined (UBSAN) Regressed: https://clusterfuzz.com/revisions?job=linux_ubsan_vptr_chrome&range=574075:574076 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4945883024850944 Issue filed automatically. See https://github.com/google/clusterfuzz-tools for more information.
,
Jul 12
Automatically assigning owner based on suspected regression changelist https://chromium.googlesource.com/chromium/src/+/e23e774ef4290cff9869417e25d46e5a099d1567 (Blink: Introduce LifecycleNotifier::ForEachObserver().). If this is incorrect, please let us know why and apply the Test-Predator-Wrong-CLs label. If you aren't the correct owner for this issue, please unassign yourself as soon as possible so it can be re-triaged.
,
Jul 12
As far as I know, this DCHECK exposes an unsafe modification of LifecycleNotifier::obsevers_ (modification concurrently with iterating over the collection). The CL referenced above added checks to more places that iterated over the collection, so reverting it would not fix the bug, it would merely mask it.
,
Jul 12
Adding Test-Predator-Wrong-CLs per instructions. I can definitely see why it'd point to my CL, though.
,
Jul 12
The trace above is from Issue 862905 . Full stack trace: #4 blink::LifecycleNotifier<blink::Document, blink::SynchronousMutationObserver>::RemoveObserver(blink::LifecycleObserverBase*) third_party/blink/renderer/platform/lifecycle_notifier.h:176:3 #5 SetContext third_party/blink/renderer/platform/lifecycle_observer.h:69:49 #6 blink::DocumentMarkerController::PossiblyHasMarkers(blink::DocumentMarker::MarkerTypes) third_party/blink/renderer/core/editing/markers/document_marker_controller.cc:144 #7 blink::DocumentMarkerController::DidUpdateCharacterData(blink::CharacterData*, unsigned int, unsigned int, unsigned int) third_party/blink/renderer/core/editing/markers/document_marker_controller.cc:894:8 #8 operator() third_party/blink/renderer/core/dom/synchronous_mutation_notifier.cc:50:15 #9 ForEachObserver<(lambda at ../../third_party/blink/renderer/core/dom/synchronous_mutation_notifier.cc:49:19)> third_party/blink/renderer/platform/lifecycle_notifier.h:80 #10 blink::SynchronousMutationNotifier::NotifyUpdateCharacterData(blink::CharacterData*, unsigned int, unsigned int, unsigned int) third_party/blink/renderer/core/dom/synchronous_mutation_notifier.cc:49 #11 blink::CharacterData::SetDataAndUpdate(WTF::String const&, unsigned int, unsigned int, unsigned int, blink::CharacterData::UpdateSource) third_party/blink/renderer/core/dom/character_data.cc:193:19 #12 blink::CharacterData::setData(WTF::String const&) third_party/blink/renderer/core/dom/character_data.cc:47:3 #13 blink::ReplaceChildrenWithText(blink::ContainerNode*, WTF::String const&, blink::ExceptionState&) third_party/blink/renderer/core/editing/serializers/serialization.cc:746:43 #14 blink::HTMLElement::setInnerText(WTF::String const&, blink::ExceptionState&) third_party/blink/renderer/core/html/html_element.cc:655:5 #15 innerTextAttributeSetter gen/third_party/blink/renderer/bindings/core/v8/v8_html_element.cc:626:9 #16 blink::V8HTMLElement::innerTextAttributeSetterCallback(v8::FunctionCallbackInfo<v8::Value> const&) gen/third_party/blink/renderer/bindings/core/v8/v8_html_element.cc:2843
,
Jul 12
,
Jul 12
yosin@, xiaochengh@: Can you please reason about a fix for this issue? I think that the stack trace in Comment 6 and https://crrev.com/c/1132674 are a good primer for this bug. jbroman@: cc-ing you because the blame layer says that the "SetContext(nullptr)" was contributed by you, so you might have context. Summary: DocumentMarkerController::PossiblyHasMarkers() should not call SetContext(nullptr), because that ends up removing the observer from the SynchronousMutationNotifier's set of observers. This is a problem because DocumentMarkerController::DidUpdateCharacterData() calls DocumentMarkerController::PossiblyHasMarkers(), and DocumentMarkerController::DidUpdateCharacterData() is called while iterating over the SynchronousMutationNotifier's observer set. I'll be OOO this week, but can try to help with this on Monday if necessary.
,
Jul 12
Maybe it could be deferred. It's trying to indicate that it doesn't care to receive synchronous mutation notifications when there are no markers, and it does so by removing itself from the document. This proved quite hot in Speedometer, and most of the time there aren't markers, so it's wasteful to keep telling the marker controller about mutations. I more or less matched when possibly_existing_marker_types_ was cleared.
,
Jul 12
Up until https://crrev.com/c/1132674 this situation resulted in removing the observer from SynchronousMutationNotifier's set of observers. This is a HeapHashSet, which does not support modification while iterating, so the removal could have caused memory corruption in production (for example, if the table gets resized). The only alternative I see to the solution in Comment 9 is to re-introduce deferred observer deletion in LifecycleNotifier, and have a ForEachObserver variant with extra logic for tackling deferred deletions. I think LifecycleNotifier used to support deletion, but that was lost over refactorings. I removed the vestigial remains in https://crrev.com/c/1132614
,
Jul 13
Issue 862960 has been merged into this issue.
,
Jul 13
This also happens in production, reported as Issue 862960. Crash list: https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+product.version%3D%2769.0.3489.0%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3ADocumentMarkerController%3A%3ADidUpdateCharacterData%27&stbtiq=&reportid=&index=0 To reiterate, https://crrev.com/c/1132674 exposed a potential memory corruption that has existed for a while via a CHECK. The CL should not be reverted -- if we can't fix the unsafe access, we should add a LifecycleNotifier::ForEachObserverUnsafe that replicates the iteration in ForEachObsever, minus the changes to iteration_state_. Ideally, we'd fix the unsafe remove-while-iterating access pattern.
,
Jul 13
,
Jul 13
Please plan to have canary coverage for verification of the fix and before next M-69 scheduled dev release on Tuesday. Issue 862960 is top crash on canary.
,
Jul 15
Issue 863707 has been merged into this issue.
,
Jul 16
yosin@/pwnall@: As per C#12, look like suspected CL cannot be reverted but this(Issue 862960) is consistently been #1 renderer crash on the canary channel and will block tomorrow's M-69 dev release. Please expedite the fix to unblock the M-69 dev release. Looping stability sheriff as well for inputs.
,
Jul 16
I put together https://crrev.com/c/1137988 which turns the crash back into silent memory corruption for SynchronousMutationNotifier::NotifyUpdateCharacterData. This is not ideal, but it's the best I can do right now. I don't have the time to understand enough of Blink Editing to fix DocumentMarkerController. I started looking into adding deferred observer removal to LifecycleNotifier, but that'll be a non-trivial change (I don't want to add the overhead to SynchronousMutationNotifier) and I don't want to pursue it under the stress of blocking a release. Committing https://crrev.com/c/1137988 is better than reverting https://crrev.com/c/1132614 because the checks there remain active for all the other uses of LifecycleNotifier.
,
Jul 16
^ I meant I don't want to add overhead to LifecycleNotifier classes outside of SynchronousMutationNotifier. SynchronousMutationNotifier will need some overhead (a HeapHashSet storing deferred deletions, an extra check in RemoveObserver).
,
Jul 16
Sorry for the delay. I think it's better to remove SetContext(nullptr) from PossiblyHasMarkers() for now to make the behavior correct (but regresses performance), and use deferred observer deletion later. "Silent memory corruption" sounds worse than performance regression to me. Unless the performance regression is really big?
,
Jul 16
I think Comment 19 above makes sense. Since we're so close to branch, I landed https://crrev.com/c/1137988 to get us to the situation before https://crrev.com/c/1132614. This way, a CL that removes SetContext(nullptr) can be safely reverted if the performance regression is unacceptable, or if it turns out to break things in a different way. This bug is no longer a release blocker, as the crashes should have been silenced by https://crrev.com/c/1137988 For the record, the approach I was thinking of was adding a 3rd template parameter to LifecycleNotifier, "bool kSupportsDeferredRemoval = false". LifecycleNotifier with kSupportsDeferredRemoval=true would have an extra field "HeapHashSet<WeakMember<LifecycleObserverBase>> deferred_removals_;", RemoveObserver() would check "if (!(iteration_state_ & kAllowingRemoval))" and add the observer to deferred_removals_, and ForEachObserver() would drain deferred_removals_ after iterating. This is non-trivial because LifecycleObserver also has to get the 3rd template parameter, and the corresponding tests have to be templated over a type parameter, to cover LifecycleNotifier subclasses with kSupportsDeferredRemoval=true and kSupportsDeferredRemoval=false. An example of the syntax for the latter is in //net/cookies/cookie_store_change_unittest.h
,
Jul 16
It might suffice to post a task to later remove it (after checking that it hasn't changed in the meantime), or something like that, rather than making LifecycleNotifier more complicated.
,
Jul 17
ClusterFuzz testcase 5898443462279168 is verified as fixed, so closing issue as verified. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
,
Jul 17
ClusterFuzz has detected this issue as fixed in range 575314:575315. Detailed report: https://clusterfuzz.com/testcase?key=4945883024850944 Fuzzer: attekett_dom_fuzzer Job Type: linux_ubsan_vptr_chrome Platform Id: linux Crash Type: CHECK failure Crash Address: Crash State: iteration_state_ & kAllowingRemoval in lifecycle_notifier.h blink::LifecycleNotifier<blink::Document, blink::SynchronousMutationObserver>::R blink::DocumentMarkerController::PossiblyHasMarkers Sanitizer: undefined (UBSAN) Regressed: https://clusterfuzz.com/revisions?job=linux_ubsan_vptr_chrome&range=574075:574076 Fixed: https://clusterfuzz.com/revisions?job=linux_ubsan_vptr_chrome&range=575314:575315 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4945883024850944 See https://github.com/google/clusterfuzz-tools for more information. If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
,
Jul 17
The crasher is gone, but the memory corruption is still present. xiaochengh@ - Can you please confirm you're working on a fix along the lines of Comment 19 / Comment 21? If that's the case, please assign the bug to yourself. If not, please say so, and I'll try to find time to prepare a fix. The memory corruption is now documented fairly well in the CL description in https://crrev.com/c/1137988 so I think we should aim to fix it in M69.
,
Jul 17
,
Jul 18
pwnall@: I'm not working on it. Feel free to move on
,
Jul 19
Talked with GC expert. The solution is: Register weak member call back to detect empty weak hash map and call SetContext(nullptr) when weak hash map empty. Thus, DocumentMarkerController will not change observer list.
,
Jul 19
In review: http://crrev.com/c/1143113
,
Jul 24
|
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by ClusterFuzz
, Jul 12Labels: Test-Predator-Auto-Components