New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 831117 link

Starred by 8 users

Termination GC leaves behind persistents

Project Member Reported by mlippautz@chromium.org, Apr 10 2018

Issue description

I noticed that on ToT on linux release builds with DCHECK we sometimes fail in 
   external/wpt/offscreen-canvas/compositing/*
tests. This has been going on for some time now.

I've now identified that the thread termination GCs leave behind persistents

crash log for renderer (pid <unknown>):
STDOUT: <empty>
STDERR: [1:47:0410/131512.396866:FATAL:thread_state.cc(239)] Check failed: !current_count. 
STDERR: #0 0x7f6c7c87640c base::debug::StackTrace::StackTrace()
STDERR: #1 0x7f6c7c8a08bb logging::LogMessage::~LogMessage()
STDERR: #2 0x7f6c75419a16 blink::ThreadState::RunTerminationGC()
STDERR: #3 0x7f6c754197be blink::ThreadState::DetachCurrentThread()
STDERR: #4 0x7f6c753e29f2 blink::WebThreadSupportingGC::ShutdownOnThread()
STDERR: #5 0x7f6c7735219a blink::WorkerBackingThread::ShutdownOnBackingThread()
STDERR: #6 0x7f6c773671c4 blink::WorkerThread::PerformShutdownOnWorkerThread()
STDERR: #7 0x7f6c753e1851 blink::(anonymous namespace)::RunCrossThreadClosure()
STDERR: #8 0x7f6c753e2235 _ZN4base8internal7InvokerINS0_9BindStateIPFvN3WTF19CrossThreadFunctionIFvvEEEEJS6_EEES5_E7RunOnceEPNS0_13BindStateBaseE
STDERR: #9 0x7f6c7c876df0 base::debug::TaskAnnotator::RunTask()
STDERR: #10 0x7f6c754a5656 blink::scheduler::internal::ThreadControllerImpl::DoWork()
STDERR: #11 0x7f6c754a76c8 _ZN4base8internal7InvokerINS0_9BindStateIMN5blink9scheduler8internal20ThreadControllerImplEFvNS5_19SequencedTaskSource8WorkTypeEEJNS_7WeakPtrIS6_EES8_EEEFvvEE3RunEPNS0_13BindStateBaseE
STDERR: #12 0x7f6c7c876df0 base::debug::TaskAnnotator::RunTask()
STDERR: #13 0x7f6c7c8aca46 base::internal::IncomingTaskQueue::RunTask()
STDERR: #14 0x7f6c7c8b0b97 base::MessageLoop::RunTask()
STDERR: #15 0x7f6c7c8b0faa base::MessageLoop::DeferOrRunPendingTask()
STDERR: #16 0x7f6c7c8b123e base::MessageLoop::DoWork()
STDERR: #17 0x7f6c7c8b2586 base::MessagePumpDefault::Run()
STDERR: #18 0x7f6c7c8b040c base::MessageLoop::Run()
STDERR: #19 0x7f6c7c8e8ad6 base::RunLoop::Run()
STDERR: #20 0x7f6c7c92af4a base::Thread::Run()
STDERR: #21 0x7f6c7c92b544 base::Thread::ThreadMain()
STDERR: #22 0x7f6c7c9270df base::(anonymous namespace)::ThreadFunc()
STDERR: #23 0x7f6c7c9c6494 start_thread
STDERR: #24 0x7f6c72e6ea8f clone
STDERR: 

gn args:
  is_component_build = true
  is_debug = true
  use_goma = true
  symbol_level = 2

I've not been able to reproduce this locally so far.

 
Now keishi, peria and me are thinking about removing the persistent check at the end of the thread termination GC (for a different reason. See https://chromium-review.googlesource.com/c/chromium/src/+/1001078 for more details).

Alternately we can just clear all persistent handles on that thread before shutting down the thread.

fyi: V8 just clears the handles and removes the pages. 

I guess we would need a bit of handling for cross-thread persistents but I would otherwise prefer that approach.
> fyi: V8 just clears the handles and removes the pages. 

Thanks for the info. I'm now more confident about that approach :)

> I guess we would need a bit of handling for cross-thread persistents

Cross-thread persistents are already cleared on shut down. We just need to take care of (same-thread) persistents.

What came to my mind today is finalizers. If they can cause allocations on other threads or on global state then this might be unsafe.
Given that this is causing a bunch of crashes, I want to fix this asap.

Specifically, I want to understand why we cannot simply clear all persistent handles of that thread before the thread shuts down.

> What came to my mind today is finalizers. If they can cause allocations on other threads or on global state then this might be unsafe.

Would you elaborate on this? Finalizers of the cleared persistent handles will run in the termination GC. How is it unsafe?


Cc: lucmult@chromium.org
>> What came to my mind today is finalizers. If they can cause allocations on other threads or on global state then this might be unsafe.

> Would you elaborate on this? Finalizers of the cleared persistent handles will run in the termination GC. How is it unsafe?

If we still want to do a termination GC then this is fine.

In v8 we just shut down the heap without any final GC. 

We only have a flag for termination GCs for test purposes where we want to ensure a graceful teardown.

I can have a look after BlinkOn if nobody is on this.
Labels: -Pri-2 Pri-1
Another P1 issue reported: https://bugs.chromium.org/p/chromium/issues/detail?id=829859

Let me bump up the priority of this bug.

Do you know why a lot of crashes have appeared recently? The termination GC was there for a long time.

AFAIK, we have not been changing this recently, no? Last change in this area was the refactoring on persistents, I guess.

I can try to repro the case on the linked bug. The issue I observed only happened when running with the layout test runner.
Status: Started (was: Assigned)
I've reproduced the issue (from the linked bug) in a new LayoutTest on my CL to fix  this for Cache Storage with Mojo:
crrev.com/c/1011467

I have reproduced the issue not only on LayoutTests but also when running a recent build and loading the offending JS code.
Another slightly related issue would be: We're not flushing all pending tasks before shutting down the thread. This has a risk of causing problems because people sometimes expect that callbacks are guaranteed to run.

Flushing all tasks before a thread shutdown will fix this bug as well. (Separately I want to have a mechanism to clear all persistent handles before a thread shutdown though.)

Neither of the test fail when run stand-alone using content_shell and the blink http server (if needed). They only fail for me when run through run-webkit-tests script. Presumably, something with tasks and/or threads is off here.
The crash in #12 doesn't repro for me (Mac). (I don't have the fix merged.)
We register Persistent that should be explicitly cleared with "RegisterAsStaticReference". I assume that we forget to mark one of the entry points with that call.
The crashes indicate that it's not realistic to manually mark so many Persistents with RegisterAsStaticReference.

That's why I'm proposing:

- Flush all pending tasks before a worker shuts down. (This will anyway be a good change.)
- Clear all persistent handles before a thread shutdown.

Comment 18 by keishi@google.com, Apr 17 2018

I am trying to clear the persistent handles. After a layover in LAX, I am finally in MTV today (currently at the wasm hackathon in S Club 7). haraken when we have a chance, may I discuss how to handle PersistentHeapCollections. I am not sure to integrate them with the rest of the Persistents.
Lets schedule a meeting about this one during the BlinkOn week.

Comment 20 by peria@chromium.org, Apr 17 2018

Let me join the meeting. :)
FYI, the offending persistent in 
  external/wpt/xhr/close-worker-with-xhr-in-progress.html

is 

void blink::PersistentBase<blink::ResourceLoader, blink::kNonWeakPersistentConfiguration, blink::kSingleThreadPersistentConfiguration>::PrintDebug(blink::Visitor *) [T = blink::ResourceLoader, weaknessConfiguration = blink::kNonWeakPersistentConfiguration, crossThreadnessConfiguration = blink::kSingleThreadPersistentConfiguration]

Need to check the sources.
The problem for this specific issue seems to be in ResourceLoader::DidStartLoadingResponseBody.

https://cs.chromium.org/chromium/src/third_party/blink/renderer/platform/loader/fetch/resource_loader.cc?l=619

I see to Persistents being created from there. One is definitely for WTF::Bind using WrapPersistent(). If the callback is still enqueued somewhere during teardown, then we definitely won't be able to clear all Persistents.


The issue is definitely that the WTF::Bind call generates a Persistent that is
(a) not registered using RegisterAsStaticReference
(b) doesn't go away by doing GC as it is held alive from outside the managed heap

We could probably use RegisterAsStaticReference in WrapPersistent() to mitigate such issue.

Long term we should do better.
Thanks for digging!

ResourceLoader::DidStartLoadingResponseBody seems to be using Wrap"Weak"Persistent, not WrapPersistent... right?

If you're talking about a different WrapPersistent, would it be an option to replace the WrapPersistent with WrapWeakPersistent as a short-term fix? We're fixing other similar crashes that way. For the fix to be correct, it must be guaranteed that the object is kept alive in a way other than the WrapPersistent.

As I mentioned, in mid term flushing all pending tasks on teardown will fix all the crashes.

Fixed version on ToT uses WrapWeakPersistent.

The broken version used WrapPersistent and we couldn't clear it because it was not registered as static. I wonder if WrapPersistent should also call RegisterAsStaticReference. That would allow strong refs that still go away on termination.

Comment 26 by haraken@google.com, Apr 18 2018

Thanks, makes sense. Thanks for digging into the issue, really :)

> The broken version used WrapPersistent and we couldn't clear it because it was not registered as static. I wonder if WrapPersistent should also call RegisterAsStaticReference. That would allow strong refs that still go away on termination.

I'd prefer all persistents of that thread than using RegisterAsStaticReference because all persistents (not limited to WrapPersistent) have the same problem though.

Separately I think we should consider landing a change to flush all pending tasks.

Summary of what we agreed on at BlinkOn:
1. We should clear wrappers before the terminating GC as we would otherwise not call finalizers for some objects. Not doing so may lead to (a) leaks and/or (b) stale handles.
2. We should clear *all* persistents before the terminating GC and retire the static registration mechanism. This might require a few smaller changes to how we handle persistents (because we cannot derive whether we are PersistentBase or PersistentCollectionBase).
3. We can then check that marking does not find any live objects. If we still find some, then we have another component to fix. Otherwise, we should be good.

Will do 1. first.
I begin to wonder why it's hard to support PersistentCollectionBase.

Step 1: We just need to clear all PersistentNodes. That's it. Then subsequent accesses to PersistentCollectionBase and Persistent will cause a null-deref crash. This is better than what we have today.

Step 2: If we want to avoid the null-deref crash, we need to clear PersistentCollectionBases and Persistents that own the PersistentNodes. However even if we do this, it won't be better than Step 1 because we'll anyway hit a null-deref crash when accessing the PersistentCollectionBases and Persistents. It's just moving a place to hit a null-deref crash.

Am I misunderstanding?



Seems all correct to me.

I guess it's about how to implement the clearing and whether we use internal knowledge there (work our way through void*), or whether we want to refer to some type clearing, e.g. PersistentCollectionBase::Clear. The latter would require some plumbing.
Yeah, the plumbing would be hard because we need to manage back pointers. (It would be doable by unifying TraceCallback and ClearCallback into one trampoline function -- but it will be a terrible template.)

In short term, I'd propose simply clearing all PersistentNodes.

Project Member

Comment 31 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3536bbb2a547af920d98b537ee6ba7468bd85102

commit 3536bbb2a547af920d98b537ee6ba7468bd85102
Author: Michael Lippautz <mlippautz@chromium.org>
Date: Wed Apr 25 08:37:16 2018

[oilpan] Detach V8 garbage colector before running termination GC

Any roots from V8 into Blink need to be cleared before running the termination
garbage collection to ensure that all objects die and finalizers are called.

Bug:  chromium:831117 
Change-Id: I09042623ceafe141819f4e9b52b15e784dd623a9
Reviewed-on: https://chromium-review.googlesource.com/1025032
Commit-Queue: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#553487}
[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/bindings/core/v8/v8_gc_controller.cc
[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/bindings/core/v8/v8_gc_controller.h
[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/core/workers/worker_backing_thread.cc
[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/platform/bindings/v8_per_isolate_data.cc

Issue 825728 has been merged into this issue.
Cc: pbomm...@chromium.org gov...@chromium.org ligim...@chromium.org
Labels: ReleaseBlock-Stable M-67 Target-67 FoundIn-67 FoundIn-68 OS-Windows
The above CL landed on - 68.0.3409.0, do we need a merge here? Crash rate is reasonably high in previous Beta - 67.0.3396.10. There are chances of spiking once 100% of users upgrades to .40 which was released Yday.

67.0.3396.40	0.01%	1
67.0.3396.10	15.81%	2117
67.0.3396.3	0.03%	4
67.0.3396.2	0.06%	8
67.0.3396.0	0.35%	47

66.0.3359.181	0.01%	1

Over to M67 owners for monitoring.

Similar bug : Issue 825728
Sample crash report: go/crash/e313d8e0dec64954

Link to the reports:
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3ACrossThreadPersistentRegion%3A%3AShouldTracePersistentNode%27#-property-selector,-day-graph,-samplereports,productversion:1000,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
Thank you ligimole@

Update from keishi@ on dupe bug 825728 - https://bugs.chromium.org/p/chromium/issues/detail?id=825728#c45 (Judging from the fact that the crash rate has decreased, I think we can conclude that  crbug.com/829859  was the cause and the fix was merged in 67.0.3396.13). After this fix/merge to M67, there is only 1 crash reported on last week beta #67.0.3396.40. So far no crash reported on this week M67 beta #67.0.3396.48.

At the moment based on available crash data, this doesn't seems like M67 stable blocker. Prudhvi, pls continue to monitor this. Thank you.






Labels: M-68
This is the #1 crash in latest Dev- 68.0.3432.3, crash rate for Dev channel in M68 and M67.

68.0.3423.2	0.95%	98
68.0.3418.2	0.50%	51
68.0.3409.2	0.01%	1

67.0.3396.10	20.25%	2079
67.0.3393.4	24.11%	2475
67.0.3386.1	30.00%	3080
67.0.3381.1	21.71%	2229

Would like to monitor for M68.
Which crash are you talking about? This is about a general memory corruption that could affect all sorts of crashes. 

Is this about "blink::CrossThreadPersistentRegion::ShouldTracePersistentNode"? I think this one was already gone in the new version, even before this path, no? 

Comment 37 by ajha@chromium.org, May 21 2018

Cc: abdulsyed@chromium.org ajha@chromium.org
Labels: -Type-Bug -ReleaseBlock-Stable RegressedIn-68 ReleaseBlock-Beta Target-68 Type-Bug-Regression
>C#36

There seems to have been spike in crashes of 'blink::CrossThreadPersistentRegion::ShouldTracePersistentNode' on the latest Windows dev: 68.0.3432.3(3 days of crash data has reported 1214 crashes from 1081 clients) compared to Last dev 68.0.3423.2 which reported 103 crashes frm 93 clients from 10 days of crash data.

Link to the version comparison of this specific magic signature:

https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3ACrossThreadPersistentRegion%3A%3AShouldTracePersistentNode%27&compProp=product.Version&v1=68.0.3423.2&v2=68.0.3432.3

Note: Tagging this with M-68 Beta blocker and this is not a blocker for M-67 as there are no crashes seen on the last 2 M-67 Beta(67.0.3396.40_ one crash and 67.0.3396.48_ zero crash) 


https://chromium.googlesource.com/chromium/src/+/5b6c23a4f50a50bd859d96d22679f2e3124783d4 is touching WrapPersistent in workers so I am looking into it.
My current best guess is that the Persistent create by WrapPersistent(resolver) from ServiceWorkerRegistrationNotifications::DidLoadResources() is leaking because the callback gets registered somewhere in the mojo bindings.

https://chromium-review.googlesource.com/c/chromium/src/+/1014400 was also caused by a callback registered into a mojo object(mojom::blink::BlobRegistry).

Do you know why the persistent held by the callback hits the DCHECK?

As I mentioned offline, the callback task is not called but gets destructed before the worker shuts down. Thus the Persistent handle should be destructed before the worker shuts down.

The bound callbacks aren't used to PostTask. They are held strongly by a mojo related object as callback for some event. I am thinking that's why they don't get freed when the task queue is cleared out.
Your bug is tagged as Release block Beta and we are branching in 2 days.Please have a fix ASAP.
Project Member

Comment 44 by bugdroid1@chromium.org, May 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e3aac921d7937d5be6c962532f3d28d608824b67

commit e3aac921d7937d5be6c962532f3d28d608824b67
Author: Keishi Hattori <keishi@chromium.org>
Date: Thu May 24 08:06:49 2018

Oilpan: Clear all Persistents on thread termination

Clear all persistents on thread termination, because when we have a bug and it leaves a Persistent behind, it will not be a stale pointer and just causes null dereference.

Since we disabled PersistentHeapCollections on non-main threads we can assume all PersistentNode::Self() can be cast to Persistent<DummyGCBase>

Bug:  831117 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Ib55b280f8e7c5d966ce43d3f60f2af873cd50739
Reviewed-on: https://chromium-review.googlesource.com/1025547
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#561418}
[modify] https://crrev.com/e3aac921d7937d5be6c962532f3d28d608824b67/third_party/blink/renderer/platform/heap/persistent_node.cc
[modify] https://crrev.com/e3aac921d7937d5be6c962532f3d28d608824b67/third_party/blink/renderer/platform/heap/persistent_node.h
[modify] https://crrev.com/e3aac921d7937d5be6c962532f3d28d608824b67/third_party/blink/renderer/platform/heap/thread_state.cc

Issue 846696 has been merged into this issue.
Labels: ReleaseBlock-Dev
Think the above patch introduced another crash in latest canary-68.0.3440.0, magic signature: blink::ThreadState::RunTerminationGC

Did a version comparison from the previous canary.

68.0.3439.0 -> 0 reports
68.0.3440.0 -> 50 reports , 50 clients

Crash Reports
=============
https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+product.version%3D%2768.0.3440.0%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27+AND+expanded_custom_data.ChromeCrashProto.ptype%3D%27renderer%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AThreadState%3A%3ARunTerminationGC%27

keishi@ if the patch in #44 is the suspect, please revert ASAP.

RBD for M68.
mlippautz@,
Friendly ping to get update on this issue as it is marked as RBB & RBD for M67 & M68.
Thanks...!
Labels: -ReleaseBlock-Dev
The crash reports from C#46 is due to r561418. I will land a partial revert.
https://chromium-review.googlesource.com/#/c/chromium/src/+/1074579
RBD should be removed when this lands.

The initial issue, blocking M68 beta probably still exists.
Project Member

Comment 49 by bugdroid1@chromium.org, May 28 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c899e5f3047daf6d121c524bc96259291f73de6f

commit c899e5f3047daf6d121c524bc96259291f73de6f
Author: Keishi Hattori <keishi@chromium.org>
Date: Mon May 28 08:01:11 2018

Oilpan: Change termination gc persistent check back to DCHECK

r561418 changed the check to CHECK. Reverting back to DCHECK.

Bug:  831117 
Change-Id: Iad20e975e5de637bcb303c3fe4230e2fac6f4753
Reviewed-on: https://chromium-review.googlesource.com/1074579
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#562204}
[modify] https://crrev.com/c899e5f3047daf6d121c524bc96259291f73de6f/third_party/blink/renderer/platform/heap/thread_state.cc

Comment 50 by ajha@chromium.org, May 29 2018

Labels: OS-Android OS-Mac
Just to update, Mac and Android canary version: 69.0.3444.0 has been live for 4 hrs and has not reported any crashes so far for the magic signature 'blink::ThreadState::RunTerminationGC'(Issue 846696, duped in C#45). Based on the available crash data this looks fixed.

keishi@: Would it be fine to get the CL from C#49 merged to M-68 based on Android and Mac canary crash data.

Note: Unable to verify this on Windows as no M-69 canary is available due to Issue 847075.
Can we get this merged to M68 branch?

Comment 52 by ajha@chromium.org, May 30 2018

To add to C#50 there are no crashes seen on 69.0.3444.0 and 69.0.3445.0  of Windows.

keishi@: Please get the CL merged to M-68 branch.
Labels: Merge-Request-68
Owner: keishi@chromium.org
Status: Assigned (was: Started)
I didn't think r561418 was in M68 but I see it landed in 68.0.3440.0
In that case r562204 needs to be merged to M68
Requesting merge.
Project Member

Comment 54 by sheriffbot@chromium.org, May 30 2018

Labels: -Merge-Request-68 Hotlist-Merge-Review Merge-Review-68
This bug requires manual review: Reverts referenced in bugdroid comments after merge request.
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), kariahda@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-68 Merge-Approved-68
Approving merge to M68. Branch:3440

Comment 56 by ajha@chromium.org, May 31 2018

Issue 825728(magic signature: blink::CrossThreadPersistentRegion::ShouldTracePersistentNode) duped here in C#32 has started showing up crashes in M-69 on Windows,Mac and Android since chrome version: 69.0.3444.0.

Win : https://goto.google.com/xmmhg

Mac : https://goto.google.com/jlrkg

Android : https://goto.google.com/yqbhf


keishi@: Please let us know if you need separate issue for recent spike on Win,Mac and Android in M-69.

Thank you!

Comment 57 by ajha@chromium.org, May 31 2018

Labels: M-69 FoundIn-69
Labels: OS-Chrome
Note that this shouldn't block Chrome OS 67 or 68 rollout.
keishi@, Friendly ping to get an update on this issue as per C#56 & it is marked as RBB.
Thanks..!
Project Member

Comment 61 by bugdroid1@chromium.org, Jun 4 2018

Labels: -merge-approved-68 merge-merged-3440
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b5e2b0e7f572ad3ded8a447166078ae4558f3fff

commit b5e2b0e7f572ad3ded8a447166078ae4558f3fff
Author: Keishi Hattori <keishi@chromium.org>
Date: Mon Jun 04 14:09:16 2018

Oilpan: Change termination gc persistent check back to DCHECK

r561418 changed the check to CHECK. Reverting back to DCHECK.

Bug:  831117 
Change-Id: Iad20e975e5de637bcb303c3fe4230e2fac6f4753
Reviewed-on: https://chromium-review.googlesource.com/1074579
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#562204}(cherry picked from commit c899e5f3047daf6d121c524bc96259291f73de6f)
Reviewed-on: https://chromium-review.googlesource.com/1085187
Reviewed-by: Keishi Hattori <keishi@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#133}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}
[modify] https://crrev.com/b5e2b0e7f572ad3ded8a447166078ae4558f3fff/third_party/blink/renderer/platform/heap/thread_state.cc

Comment 62 by ajha@chromium.org, Jun 4 2018

keishi@: Would the C#61 will be safe merge. We have also been seeing 'blink::CrossThreadPersistentRegion::ShouldTracePersistentNode' in M-69 since 69.0.3444.0 on Win,Mac and Android. Please check C#56. 
Labels: -OS-Chrome
Removing the OS-Chrome label per #58,59.  Thanks abodenha@!

Comment 64 by lfg@chromium.org, Jun 5 2018

Cc: palmer@chromium.org lfg@chromium.org
Labels: Restrict-View-Google
I've been investigating  issue 849073 , where the DedicatedWorkerGlobalScope survives termination GC in a worker and causes all sorts of problems, because it's destructor isn't called but its memory is freed in RemoveAllPages().

While investigating, I noticed that r561733 turns these failed CHECKs into use-after-frees. I think this should be reverted back to a CHECK().

Adding +palmer@ who is the security sheriff for thoughts on what we should do.

Re #64: Yes, if a DCHECK is the only thing guarding against UAF, then it needs to be a CHECK.

It sounds like this bug is an exploitable memory corruption bug, and possibly also an exploitable logic bug (destructors not being called could hypothetically lead to weird application behavior?). So it's probably the case that this bug should have been tracked as Type=Bug-Security from the beginning. It's easy to de-escalate, so I tend to err on the side of calling memory corruption and object lifetime bugs security bugs.

Also, why would this not affect Chrome OS or Fuchsia?
This should affect all platforms. They should be marked as security relevant but are unfortunately not simple to fix. Let me provide a bit more context here. 

Blink is notoriously bad in tearing down in a graceful manner. AFAIK, people have given up in tearing down the heap in a predictable way. IMHO, that is a bad idea as it allows for all sorts of bad practices to slip through.

Now, for threads this is different as they are still connected to the main thread when they are teared down. There are many ways to get this wrong and historically this was always in a very fragile state. The cases that require handling are:
  1. Disconnecting V8 from Blink: This is necessary as there may still be pending V8 garbage collections that try to access Blink's heap during the teardown sequence.
  2. Clearing CrossThreadPersistents.

Case 1. went unnoticed for years now and I fixed most of the cases with the CL in #31. It is not exploitable and should lead to crashes when V8's GC tries to get access to some cleared TLS values.

Case 2. is problematic as there might be exploitable UAF. The fixes are in flight as we need to investigate the cases that fire with a CHECK (now DCHECK to keep on truckin). The problematic part is that we cannot just clear the Persistent handles as people like to still use reference counted class that apparently can also have back pointers (==Persistent) into Oilpans heap and would like to use that things during finalization. So there exist different types of Persistents: (a) those that require clearing, and (b) those that should go away after a few GCs after initially clearing things in (a). 

What we need to do is get as much clearing as possible and figure out where the classes with finalizers and Persistent usages are to fix those cases.
Labels: OS-Chrome OS-Fuchsia
Thanks. For the fixes in flight, are the bugs they fix labeled as Type=Bug-Security? And/or do they point to this bug?
They should reference this bug as I am not aware of any other open bugs.
Labels: -Restrict-View-Google -Type-Bug-Regression Security_Severity-High Security_Impact-Stable Restrict-View-SecurityTeam Type-Bug-Security
OK, cool. I'm going to tag this bug as a security bug then. I'll be reducing the effective access by changing the restrict-view labels, but as always, please feel free to CC anyone who can help or needs to know.
Project Member

Comment 70 by bugdroid1@chromium.org, Jun 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6ffac3dbe67b9039da222913f53dc35bf1796562

commit 6ffac3dbe67b9039da222913f53dc35bf1796562
Author: Lucas Furukawa Gadani <lfg@chromium.org>
Date: Thu Jun 07 02:13:28 2018

Oilpan: Clear all Persistents on thread termination.

Persistents should be cleared when NumberOfPersistents() != 0.

Bug:  831117 
Change-Id: Ie6209d822c1d14a79936cb2807f1da7e559a42b2
Reviewed-on: https://chromium-review.googlesource.com/1089460
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Keishi Hattori <keishi@chromium.org>
Commit-Queue: Lucas Gadani <lfg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#565144}
[modify] https://crrev.com/6ffac3dbe67b9039da222913f53dc35bf1796562/third_party/blink/renderer/platform/heap/thread_state.cc

Per #67, Chrome OS M67 is stable eminent.  Why did it get tagged so late and it it a stable blocker for Chrome OS?
Cc: bhthompson@chromium.org kbleicher@chromium.org
Project Member

Comment 73 by sheriffbot@chromium.org, Jun 7 2018

Status: Fixed (was: Assigned)
Please mark security bugs as fixed as soon as the fix lands, and before requesting merges. This update is based on the merge- labels applied to this issue. Please reopen if this update was incorrect.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 74 by lfg@chromium.org, Jun 7 2018

Re#71: This can cause security bugs on all platforms. For now, we should wait until it is baked on tomorrow's canary and then evaluate whether we want to merge or not.
Project Member

Comment 75 by sheriffbot@chromium.org, Jun 8 2018

Labels: -Restrict-View-SecurityTeam Restrict-View-SecurityNotify
Project Member

Comment 77 by sheriffbot@chromium.org, Jun 8 2018

Labels: -Merge-Request-68 Merge-Review-68
This bug requires manual review: Reverts referenced in bugdroid comments after merge request.
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), kariahda@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-68 Merge-Approved-68
Approving merge for M68. Branch:3440
Project Member

Comment 79 by bugdroid1@chromium.org, Jun 11 2018

Labels: -merge-approved-68 merge-merged-3440
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/255b6af3efede01a8dec7bcb13477cee4796d134

commit 255b6af3efede01a8dec7bcb13477cee4796d134
Author: Lucas Furukawa Gadani <lfg@chromium.org>
Date: Mon Jun 11 15:19:19 2018

Oilpan: Clear all Persistents on thread termination.

Persistents should be cleared when NumberOfPersistents() != 0.

Bug:  831117 
Change-Id: Ie6209d822c1d14a79936cb2807f1da7e559a42b2
Reviewed-on: https://chromium-review.googlesource.com/1089460
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Keishi Hattori <keishi@chromium.org>
Commit-Queue: Lucas Gadani <lfg@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#565144}(cherry picked from commit 6ffac3dbe67b9039da222913f53dc35bf1796562)
Reviewed-on: https://chromium-review.googlesource.com/1094117
Reviewed-by: Lucas Gadani <lfg@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#274}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}
[modify] https://crrev.com/255b6af3efede01a8dec7bcb13477cee4796d134/third_party/blink/renderer/platform/heap/thread_state.cc

Comment 80 by lfg@chromium.org, Jun 11 2018

I've merged the fix to M68. There have been no crashes on the latest canary (69.0.3453.0). I've looked both at the original crash in the CrossThreadPersistentRegion as well as the CHECK() that I've re-introduced.

I'll leave it up to keishi@ whether we want to backport the whole fix to M67. I'm unsure how exploitable this would be, since during thread termination we unmap the pages, so I think an attacker would need to remap the same virtual addresses in another thread in order to exploit. If that's doable, then exploiting this would be pretty easy.

Labels: -ReleaseBlock-Beta

Comment 82 by aluo@chromium.org, Jun 12 2018

Per c37, this does not look to be needed in M67, but keishi@ please confirm this asap.
r561418 is not in M67 so no need to merge
Issue 852256 has been merged into this issue.
Labels: Release-0-M68
Project Member

Comment 86 by sheriffbot@chromium.org, Sep 14

Labels: -Restrict-View-SecurityNotify allpublic
This bug has been closed for more than 14 weeks. Removing security view restrictions.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Sign in to add a comment