New issue
Advanced search Search tips

Issue 862900 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 1
Type: Bug



Sign in to add a comment

CHECK failure: iteration_state_ & kAllowingRemoval in lifecycle_notifier.h

Project Member Reported by ClusterFuzz, Jul 12

Issue description

Detailed report: https://clusterfuzz.com/testcase?key=4945883024850944

Fuzzer: attekett_dom_fuzzer
Job Type: linux_ubsan_vptr_chrome
Platform Id: linux

Crash Type: CHECK failure
Crash Address: 
Crash State:
  iteration_state_ & kAllowingRemoval in lifecycle_notifier.h
  blink::LifecycleNotifier<blink::Document, blink::SynchronousMutationObserver>::R
  blink::DocumentMarkerController::PossiblyHasMarkers
  
Sanitizer: undefined (UBSAN)

Regressed: https://clusterfuzz.com/revisions?job=linux_ubsan_vptr_chrome&range=574075:574076

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4945883024850944

Issue filed automatically.

See https://github.com/google/clusterfuzz-tools for more information.
 
Project Member

Comment 1 by ClusterFuzz, Jul 12

Components: Blink>DOM Blink>Editing
Labels: Test-Predator-Auto-Components
Automatically applying components based on crash stacktrace and information from OWNERS files.

If this is incorrect, please apply the Test-Predator-Wrong-Components label.
Project Member

Comment 2 by ClusterFuzz, Jul 12

Labels: Test-Predator-Auto-Owner
Owner: pwnall@chromium.org
Status: Assigned (was: Untriaged)
Automatically assigning owner based on suspected regression changelist https://chromium.googlesource.com/chromium/src/+/e23e774ef4290cff9869417e25d46e5a099d1567 (Blink: Introduce LifecycleNotifier::ForEachObserver().).

If this is incorrect, please let us know why and apply the Test-Predator-Wrong-CLs label. If you aren't the correct owner for this issue, please unassign yourself as soon as possible so it can be re-triaged.
As far as I know, this DCHECK exposes an unsafe modification of LifecycleNotifier::obsevers_ (modification concurrently with iterating over the collection).

The CL referenced above added checks to more places that iterated over the collection, so reverting it would not fix the bug, it would merely mask it.
Labels: Test-Predator-Wrong-CLs
Adding Test-Predator-Wrong-CLs per instructions. I can definitely see why it'd point to my CL, though.

Comment 5 Deleted

The trace above is from  Issue 862905 . Full stack trace:

#4 blink::LifecycleNotifier<blink::Document, blink::SynchronousMutationObserver>::RemoveObserver(blink::LifecycleObserverBase*) third_party/blink/renderer/platform/lifecycle_notifier.h:176:3
#5 SetContext third_party/blink/renderer/platform/lifecycle_observer.h:69:49
#6 blink::DocumentMarkerController::PossiblyHasMarkers(blink::DocumentMarker::MarkerTypes) third_party/blink/renderer/core/editing/markers/document_marker_controller.cc:144
#7 blink::DocumentMarkerController::DidUpdateCharacterData(blink::CharacterData*, unsigned int, unsigned int, unsigned int) third_party/blink/renderer/core/editing/markers/document_marker_controller.cc:894:8
#8 operator() third_party/blink/renderer/core/dom/synchronous_mutation_notifier.cc:50:15
#9 ForEachObserver<(lambda at ../../third_party/blink/renderer/core/dom/synchronous_mutation_notifier.cc:49:19)> third_party/blink/renderer/platform/lifecycle_notifier.h:80
#10 blink::SynchronousMutationNotifier::NotifyUpdateCharacterData(blink::CharacterData*, unsigned int, unsigned int, unsigned int) third_party/blink/renderer/core/dom/synchronous_mutation_notifier.cc:49
#11 blink::CharacterData::SetDataAndUpdate(WTF::String const&, unsigned int, unsigned int, unsigned int, blink::CharacterData::UpdateSource) third_party/blink/renderer/core/dom/character_data.cc:193:19
#12 blink::CharacterData::setData(WTF::String const&) third_party/blink/renderer/core/dom/character_data.cc:47:3
#13 blink::ReplaceChildrenWithText(blink::ContainerNode*, WTF::String const&, blink::ExceptionState&) third_party/blink/renderer/core/editing/serializers/serialization.cc:746:43
#14 blink::HTMLElement::setInnerText(WTF::String const&, blink::ExceptionState&) third_party/blink/renderer/core/html/html_element.cc:655:5
#15 innerTextAttributeSetter gen/third_party/blink/renderer/bindings/core/v8/v8_html_element.cc:626:9
#16 blink::V8HTMLElement::innerTextAttributeSetterCallback(v8::FunctionCallbackInfo<v8::Value> const&) gen/third_party/blink/renderer/bindings/core/v8/v8_html_element.cc:2843
Cc: pwnall@chromium.org ifratric@google.com
 Issue 862905  has been merged into this issue.
Cc: xiaoche...@chromium.org jbroman@chromium.org
Owner: yosin@chromium.org
yosin@, xiaochengh@: Can you please reason about a fix for this issue? I think that the stack trace in Comment 6 and https://crrev.com/c/1132674 are a good primer for this bug.

jbroman@: cc-ing you because the blame layer says that the "SetContext(nullptr)" was contributed by you, so you might have context.

Summary: DocumentMarkerController::PossiblyHasMarkers() should not call SetContext(nullptr), because that ends up removing the observer from the SynchronousMutationNotifier's set of observers. This is a problem because DocumentMarkerController::DidUpdateCharacterData() calls DocumentMarkerController::PossiblyHasMarkers(), and DocumentMarkerController::DidUpdateCharacterData() is called while iterating over the SynchronousMutationNotifier's observer set.

I'll be OOO this week, but can try to help with this on Monday if necessary.
Maybe it could be deferred. It's trying to indicate that it doesn't care to receive synchronous mutation notifications when there are no markers, and it does so by removing itself from the document.

This proved quite hot in Speedometer, and most of the time there aren't markers, so it's wasteful to keep telling the marker controller about mutations. I more or less matched when possibly_existing_marker_types_ was cleared.
Up until https://crrev.com/c/1132674 this situation resulted in removing the observer from SynchronousMutationNotifier's set of observers. This is a HeapHashSet, which does not support modification while iterating, so the removal could have caused memory corruption in production (for example, if the table gets resized).

The only alternative I see to the solution in Comment 9 is to re-introduce deferred observer deletion in LifecycleNotifier, and have a ForEachObserver variant with extra logic for tackling deferred deletions. I think LifecycleNotifier used to support deletion, but that was lost over refactorings. I removed the vestigial remains in https://crrev.com/c/1132614


Issue 862960 has been merged into this issue.
Cc: gov...@chromium.org ajha@chromium.org pucchakayala@chromium.org
Labels: ReleaseBlock-Dev Target-69 FoundIn-69
This also happens in production, reported as Issue 862960.

Crash list: https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+product.version%3D%2769.0.3489.0%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3ADocumentMarkerController%3A%3ADidUpdateCharacterData%27&stbtiq=&reportid=&index=0

To reiterate, https://crrev.com/c/1132674 exposed a potential memory corruption that has existed for a while via a CHECK. The CL should not be reverted -- if we can't fix the unsafe access, we should add a LifecycleNotifier::ForEachObserverUnsafe that replicates the iteration in ForEachObsever, minus the changes to iteration_state_. Ideally, we'd fix the unsafe remove-while-iterating access pattern.
Labels: M-69
Labels: OS-Mac OS-Windows
Please plan to have canary coverage for verification of the fix and before next M-69 scheduled dev release on Tuesday. Issue 862960 is top crash on canary.  
 Issue 863707  has been merged into this issue.
Cc: ligim...@chromium.org
Labels: Stability-Sheriff-Desktop
yosin@/pwnall@: As per C#12, look like suspected CL cannot be reverted but this(Issue 862960) is consistently been #1 renderer crash on the canary channel and will block tomorrow's M-69 dev release. Please expedite the fix to unblock the M-69 dev release.

Looping stability sheriff as well for inputs.
I put together https://crrev.com/c/1137988 which turns the crash back into silent memory corruption for SynchronousMutationNotifier::NotifyUpdateCharacterData.

This is not ideal, but it's the best I can do right now. I don't have the time to understand enough of Blink Editing to fix DocumentMarkerController. I started looking into adding deferred observer removal to LifecycleNotifier, but that'll be a non-trivial change (I don't want to add the overhead to SynchronousMutationNotifier) and I don't want to pursue it under the stress of blocking a release.

Committing https://crrev.com/c/1137988 is better than reverting https://crrev.com/c/1132614 because the checks there remain active for all the other uses of LifecycleNotifier.
^ I meant I don't want to add overhead to LifecycleNotifier classes outside of SynchronousMutationNotifier. SynchronousMutationNotifier will need some overhead (a HeapHashSet storing deferred deletions, an extra check in RemoveObserver).
Sorry for the delay.

I think it's better to remove SetContext(nullptr) from PossiblyHasMarkers() for now to make the behavior correct (but regresses performance), and use deferred observer deletion later.

"Silent memory corruption" sounds worse than performance regression to me. Unless the performance regression is really big?


Labels: -ReleaseBlock-Dev -Target-69
I think Comment 19 above makes sense. Since we're so close to branch, I landed https://crrev.com/c/1137988 to get us to the situation before https://crrev.com/c/1132614. This way, a CL that removes SetContext(nullptr) can be safely reverted if the performance regression is unacceptable, or if it turns out to break things in a different way.

This bug is no longer a release blocker, as the crashes should have been silenced by https://crrev.com/c/1137988

For the record, the approach I was thinking of was adding a 3rd template parameter to LifecycleNotifier, "bool kSupportsDeferredRemoval = false".

LifecycleNotifier with kSupportsDeferredRemoval=true would have an extra field "HeapHashSet<WeakMember<LifecycleObserverBase>> deferred_removals_;", RemoveObserver() would check "if (!(iteration_state_ & kAllowingRemoval))" and add the observer to deferred_removals_, and ForEachObserver() would drain deferred_removals_ after iterating.

This is non-trivial because LifecycleObserver also has to get the 3rd template parameter, and the corresponding tests have to be templated over a type parameter, to cover LifecycleNotifier subclasses with kSupportsDeferredRemoval=true and kSupportsDeferredRemoval=false. An example of the syntax for the latter is in //net/cookies/cookie_store_change_unittest.h
It might suffice to post a task to later remove it (after checking that it hasn't changed in the meantime), or something like that, rather than making LifecycleNotifier more complicated.
Project Member

Comment 22 by ClusterFuzz, Jul 17

Labels: ClusterFuzz-Verified
Status: Verified (was: Assigned)
ClusterFuzz testcase 5898443462279168 is verified as fixed, so closing issue as verified.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
Project Member

Comment 23 by ClusterFuzz, Jul 17

ClusterFuzz has detected this issue as fixed in range 575314:575315.

Detailed report: https://clusterfuzz.com/testcase?key=4945883024850944

Fuzzer: attekett_dom_fuzzer
Job Type: linux_ubsan_vptr_chrome
Platform Id: linux

Crash Type: CHECK failure
Crash Address: 
Crash State:
  iteration_state_ & kAllowingRemoval in lifecycle_notifier.h
  blink::LifecycleNotifier<blink::Document, blink::SynchronousMutationObserver>::R
  blink::DocumentMarkerController::PossiblyHasMarkers
  
Sanitizer: undefined (UBSAN)

Regressed: https://clusterfuzz.com/revisions?job=linux_ubsan_vptr_chrome&range=574075:574076
Fixed: https://clusterfuzz.com/revisions?job=linux_ubsan_vptr_chrome&range=575314:575315

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4945883024850944

See https://github.com/google/clusterfuzz-tools for more information.

If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
Status: Assigned (was: Verified)
The crasher is gone, but the memory corruption is still present.

xiaochengh@ - Can you please confirm you're working on a fix along the lines of Comment 19 / Comment 21? If that's the case, please assign the bug to yourself. If not, please say so, and I'll try to find time to prepare a fix.

The memory corruption is now documented fairly well in the CL description in  https://crrev.com/c/1137988 so I think we should aim to fix it in M69.
Labels: -Stability-Sheriff-Desktop
pwnall@: I'm not working on it. Feel free to move on

Comment 27 Deleted

Talked with GC expert. The solution is:

Register weak member call back to detect empty weak hash map and call SetContext(nullptr) when weak hash map empty.

Thus, DocumentMarkerController will not change observer list.
In review: http://crrev.com/c/1143113
Status: Fixed (was: Started)

Sign in to add a comment