Termination GC leaves behind persistents
Issue description

I noticed that on ToT on linux release builds with DCHECK we sometimes fail in external/wpt/offscreen-canvas/compositing/* tests. This has been going on for some time now. I've now identified that the thread termination GCs leave behind persistents.

crash log for renderer (pid <unknown>):
STDOUT: <empty>
STDERR: [1:47:0410/131512.396866:FATAL:thread_state.cc(239)] Check failed: !current_count.
STDERR: #0 0x7f6c7c87640c base::debug::StackTrace::StackTrace()
STDERR: #1 0x7f6c7c8a08bb logging::LogMessage::~LogMessage()
STDERR: #2 0x7f6c75419a16 blink::ThreadState::RunTerminationGC()
STDERR: #3 0x7f6c754197be blink::ThreadState::DetachCurrentThread()
STDERR: #4 0x7f6c753e29f2 blink::WebThreadSupportingGC::ShutdownOnThread()
STDERR: #5 0x7f6c7735219a blink::WorkerBackingThread::ShutdownOnBackingThread()
STDERR: #6 0x7f6c773671c4 blink::WorkerThread::PerformShutdownOnWorkerThread()
STDERR: #7 0x7f6c753e1851 blink::(anonymous namespace)::RunCrossThreadClosure()
STDERR: #8 0x7f6c753e2235 _ZN4base8internal7InvokerINS0_9BindStateIPFvN3WTF19CrossThreadFunctionIFvvEEEEJS6_EEES5_E7RunOnceEPNS0_13BindStateBaseE
STDERR: #9 0x7f6c7c876df0 base::debug::TaskAnnotator::RunTask()
STDERR: #10 0x7f6c754a5656 blink::scheduler::internal::ThreadControllerImpl::DoWork()
STDERR: #11 0x7f6c754a76c8 _ZN4base8internal7InvokerINS0_9BindStateIMN5blink9scheduler8internal20ThreadControllerImplEFvNS5_19SequencedTaskSource8WorkTypeEEJNS_7WeakPtrIS6_EES8_EEEFvvEE3RunEPNS0_13BindStateBaseE
STDERR: #12 0x7f6c7c876df0 base::debug::TaskAnnotator::RunTask()
STDERR: #13 0x7f6c7c8aca46 base::internal::IncomingTaskQueue::RunTask()
STDERR: #14 0x7f6c7c8b0b97 base::MessageLoop::RunTask()
STDERR: #15 0x7f6c7c8b0faa base::MessageLoop::DeferOrRunPendingTask()
STDERR: #16 0x7f6c7c8b123e base::MessageLoop::DoWork()
STDERR: #17 0x7f6c7c8b2586 base::MessagePumpDefault::Run()
STDERR: #18 0x7f6c7c8b040c base::MessageLoop::Run()
STDERR: #19 0x7f6c7c8e8ad6 base::RunLoop::Run()
STDERR: #20 0x7f6c7c92af4a base::Thread::Run()
STDERR: #21 0x7f6c7c92b544 base::Thread::ThreadMain()
STDERR: #22 0x7f6c7c9270df base::(anonymous namespace)::ThreadFunc()
STDERR: #23 0x7f6c7c9c6494 start_thread
STDERR: #24 0x7f6c72e6ea8f clone
STDERR:

gn args:
is_component_build = true
is_debug = true
use_goma = true
symbol_level = 2

I've not been able to reproduce this locally so far.
,
Apr 10 2018
fyi: V8 just clears the handles and removes the pages. I guess we would need a bit of handling for cross-thread persistents but I would otherwise prefer that approach.
,
Apr 10 2018
> fyi: V8 just clears the handles and removes the pages.

Thanks for the info. I'm now more confident about that approach :)

> I guess we would need a bit of handling for cross-thread persistents

Cross-thread persistents are already cleared on shutdown. We just need to take care of (same-thread) persistents.
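For reference, a minimal sketch of the two handle flavors in question (not exact Blink code; the Thing class and the header path are illustrative):

// Minimal sketch; Thing and the header path are illustrative, not real Blink code.
#include "third_party/blink/renderer/platform/heap/handle.h"

class Thing : public blink::GarbageCollected<Thing> {
 public:
  void Trace(blink::Visitor*) {}
};

void Example(Thing* thing) {
  // Same-thread root: registered in the creating thread's PersistentRegion.
  // These are the handles the termination GC currently complains about.
  blink::Persistent<Thing> same_thread_root(thing);

  // Cross-thread root: registered in the process-wide
  // CrossThreadPersistentRegion, which is already cleared on worker shutdown.
  blink::CrossThreadPersistent<Thing> cross_thread_root(thing);
}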
,
Apr 11 2018
What came to my mind today is finalizers. If they can cause allocations on other threads or touch global state, then this might be unsafe.
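For illustration, the sort of finalizer that would make this unsafe (a sketch only; Connection and GlobalConnectionRegistry are hypothetical names):

// Illustrative sketch; Connection and GlobalConnectionRegistry are hypothetical.
class Connection : public blink::GarbageCollectedFinalized<Connection> {
 public:
  ~Connection() {
    // The finalizer reaches beyond the dying thread's heap: it touches a
    // process-global registry, which may lock, allocate, or signal other
    // threads. During thread termination that state may already be gone.
    GlobalConnectionRegistry::Instance().Unregister(this);
  }
  void Trace(blink::Visitor*) {}
};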
,
Apr 16 2018
This is another issue: https://chromium-review.googlesource.com/c/chromium/src/+/1011467
,
Apr 16 2018
Given that this is causing a bunch of crashes, I want to fix this ASAP. Specifically, I want to understand why we cannot simply clear all persistent handles of that thread before the thread shuts down.

> What came to my mind today is finalizers. If they can cause allocations on other threads or touch global state, then this might be unsafe.

Would you elaborate on this? Finalizers of the cleared persistent handles will run in the termination GC. How is it unsafe?
,
Apr 16 2018
,
Apr 16 2018
>> What came to my mind today is finalizers. If they can cause allocations on other threads or touch global state, then this might be unsafe.
> Would you elaborate on this? Finalizers of the cleared persistent handles will run in the termination GC. How is it unsafe?

If we still want to do a termination GC, then this is fine. In V8 we just shut down the heap without any final GC. We only have a flag for termination GCs for test purposes where we want to ensure a graceful teardown. I can have a look after BlinkOn if nobody is on this.
,
Apr 16 2018
Another P1 issue has been reported: https://bugs.chromium.org/p/chromium/issues/detail?id=829859
Let me bump up the priority of this bug. Do you know why a lot of crashes have appeared recently? The termination GC has been there for a long time.
,
Apr 16 2018
AFAIK, we have not been changing this recently, no? Last change in this area was the refactoring on persistents, I guess. I can try to repro the case on the linked bug. The issue I observed only happened when running with the layout test runner.
,
Apr 16 2018
,
Apr 16 2018
I've reproduced the issue (from the linked bug) in a new LayoutTest on my CL to fix this for Cache Storage with Mojo: crrev.com/c/1011467. I have reproduced the issue not only in LayoutTests but also when running a recent build and loading the offending JS code.
,
Apr 17 2018
Another slightly related issue would be: We're not flushing all pending tasks before shutting down the thread. This has a risk of causing problems because people sometimes expect that callbacks are guaranteed to run. Flushing all tasks before a thread shutdown will fix this bug as well. (Separately I want to have a mechanism to clear all persistent handles before a thread shutdown though.)
,
Apr 17 2018
Neither of the tests fails when run stand-alone using content_shell and the Blink HTTP server (if needed). They only fail for me when run through the run-webkit-tests script. Presumably, something with tasks and/or threads is off here.
,
Apr 17 2018
The crash in #12 doesn't repro for me (Mac). (I don't have the fix merged.)
,
Apr 17 2018
We register Persistents that should be explicitly cleared using RegisterAsStaticReference. I assume that we forgot to mark one of the entry points with that call.
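For reference, a rough sketch of what that opt-in looks like at a call site (WorkerCache is a hypothetical stand-in; only the RegisterAsStaticReference() call itself is the real mechanism):

// Rough sketch; WorkerCache and the surrounding code are illustrative.
blink::Persistent<WorkerCache>& GetWorkerCache() {
  static blink::Persistent<WorkerCache>* cache = nullptr;
  if (!cache) {
    cache = new blink::Persistent<WorkerCache>(WorkerCache::Create());
    // Opt this root into the set that is explicitly cleared before the
    // termination GC. Forgetting this call on any entry point leaves a live
    // Persistent behind and trips the check in RunTerminationGC().
    cache->RegisterAsStaticReference();
  }
  return *cache;
}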
,
Apr 17 2018
The crashes indicate that it's not realistic to manually mark so many Persistents with RegisterAsStaticReference. That's why I'm proposing:
- Flush all pending tasks before a worker shuts down. (This will be a good change anyway.)
- Clear all persistent handles before a thread shutdown.
,
Apr 17 2018
I am trying to clear the persistent handles. After a layover in LAX, I am finally in MTV today (currently at the wasm hackathon in S Club 7). haraken@: when we have a chance, may we discuss how to handle PersistentHeapCollections? I am not sure how to integrate them with the rest of the Persistents.
,
Apr 17 2018
Let's schedule a meeting about this one during the BlinkOn week.
,
Apr 17 2018
Let me join the meeting. :)
,
Apr 17 2018
FYI, the offending persistent in external/wpt/xhr/close-worker-with-xhr-in-progress.html is:

void blink::PersistentBase<blink::ResourceLoader, blink::kNonWeakPersistentConfiguration, blink::kSingleThreadPersistentConfiguration>::PrintDebug(blink::Visitor *)
[T = blink::ResourceLoader, weaknessConfiguration = blink::kNonWeakPersistentConfiguration, crossThreadnessConfiguration = blink::kSingleThreadPersistentConfiguration]

Need to check the sources.
,
Apr 17 2018
The problem for this specific issue seems to be in ResourceLoader::DidStartLoadingResponseBody.
https://cs.chromium.org/chromium/src/third_party/blink/renderer/platform/loader/fetch/resource_loader.cc?l=619
I see two Persistents being created from there. One is definitely for WTF::Bind using WrapPersistent(). If the callback is still enqueued somewhere during teardown, then we definitely won't be able to clear all Persistents.
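A simplified sketch of the pattern (OnBodyAvailable and body_watcher_ are hypothetical stand-ins for the actual ResourceLoader code):

// Simplified sketch; OnBodyAvailable and body_watcher_ are hypothetical.
void ResourceLoader::DidStartLoadingResponseBody() {
  // WrapPersistent(this) stores a strong Persistent<ResourceLoader> inside
  // the bound callback, so the loader cannot be collected while the callback
  // object is alive anywhere, even if it is never run.
  auto did_read = WTF::Bind(&ResourceLoader::OnBodyAvailable,
                            WrapPersistent(this));
  // If this callback ends up parked in something (e.g. a data-pipe watcher)
  // that is not torn down before the worker thread shuts down, the strong
  // Persistent survives into the termination GC.
  body_watcher_.Watch(std::move(did_read));
}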
,
Apr 17 2018
The issue is definitely that the WTF::Bind call generates a Persistent that is
(a) not registered using RegisterAsStaticReference, and
(b) doesn't go away by doing a GC, as it is held alive from outside the managed heap.
We could probably use RegisterAsStaticReference in WrapPersistent() to mitigate such issues. Long term we should do better.
,
Apr 18 2018
Thanks for digging! ResourceLoader::DidStartLoadingResponseBody seems to be using Wrap"Weak"Persistent, not WrapPersistent... right? If you're talking about a different WrapPersistent, would it be an option to replace the WrapPersistent with WrapWeakPersistent as a short-term fix? We're fixing other similar crashes that way. For the fix to be correct, it must be guaranteed that the object is kept alive in a way other than through the WrapPersistent. As I mentioned, in the mid term, flushing all pending tasks on teardown will fix all the crashes.
,
Apr 18 2018
The fixed version on ToT uses WrapWeakPersistent. The broken version used WrapPersistent, and we couldn't clear it because it was not registered as static. I wonder if WrapPersistent should also call RegisterAsStaticReference. That would allow strong refs that still go away on termination.
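Sketch of that kind of change (simplified; OnBodyAvailable is a hypothetical stand-in):

// A weak handle does not keep the loader alive and is simply cleared when the
// loader dies, so nothing is left for the termination GC to complain about.
// This is only correct because the loader is kept alive through another path
// while loading.
auto did_read = WTF::Bind(&ResourceLoader::OnBodyAvailable,
                          WrapWeakPersistent(this));  // was WrapPersistent(this)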
,
Apr 18 2018
Thanks, makes sense. Thanks for digging into the issue, really :)

> The broken version used WrapPersistent and we couldn't clear it because it was not registered as static. I wonder if WrapPersistent should also call RegisterAsStaticReference. That would allow strong refs that still go away on termination.

I'd prefer clearing all persistents of that thread over using RegisterAsStaticReference, because all persistents (not limited to WrapPersistent) have the same problem. Separately, I think we should consider landing a change to flush all pending tasks.
,
Apr 23 2018
Summary of what we agreed on at BlinkOn:
1. We should clear wrappers before the terminating GC, as we would otherwise not call finalizers for some objects. Not doing so may lead to (a) leaks and/or (b) stale handles.
2. We should clear *all* persistents before the terminating GC and retire the static registration mechanism. This might require a few smaller changes to how we handle persistents (because we cannot derive whether we are PersistentBase or PersistentCollectionBase).
3. We can then check that marking does not find any live objects. If we still find some, then we have another component to fix. Otherwise, we should be good.

Will do 1. first.
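A pseudocode sketch of that teardown order (only DetachCurrentThread() and RunTerminationGC() are real entry points from the stack above; the other helper names are hypothetical):

// Pseudocode sketch; ClearV8References(), ClearAllPersistentNodes() and
// NumberOfLiveObjects() are hypothetical names for the three steps.
void ThreadState::DetachCurrentThread() {
  // 1. Drop V8 -> Blink roots (wrappers) so their finalizers can run.
  ClearV8References();
  // 2. Clear every same-thread Persistent instead of relying on
  //    RegisterAsStaticReference() opt-ins.
  GetPersistentRegion()->ClearAllPersistentNodes();
  // 3. Run the termination GC; marking should now find no live objects.
  RunTerminationGC();
  DCHECK_EQ(0u, NumberOfLiveObjects());
}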
,
Apr 25 2018
I begin to wonder why it's hard to support PersistentCollectionBase.

Step 1: We just need to clear all PersistentNodes. That's it. Then subsequent accesses to PersistentCollectionBase and Persistent will cause a null-deref crash. This is better than what we have today.

Step 2: If we want to avoid the null-deref crash, we need to clear the PersistentCollectionBases and Persistents that own the PersistentNodes. However, even if we do this, it won't be better than Step 1, because we'll hit a null-deref crash anyway when accessing the PersistentCollectionBases and Persistents. It's just moving the place where we hit the null-deref crash.

Am I misunderstanding?
,
Apr 25 2018
Seems all correct to me. I guess it's about how to implement the clearing and whether we use internal knowledge there (work our way through void*), or whether we want to refer to some typed clearing, e.g. PersistentCollectionBase::Clear. The latter would require some plumbing.
,
Apr 25 2018
Yeah, the plumbing would be hard because we need to manage back pointers. (It would be doable by unifying TraceCallback and ClearCallback into one trampoline function, but it would be a terrible template.) In the short term, I'd propose simply clearing all PersistentNodes.
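A rough sketch of what "simply clear all PersistentNodes" could look like (the PersistentNodeSlots layout, member names, and DummyGCBase handling here are approximations, not the real persistent_node.cc):

// Rough sketch only; slot layout and names are approximations.
void PersistentRegion::PrepareForThreadStateTermination() {
  for (PersistentNodeSlots* slots = slots_; slots; slots = slots->next_) {
    for (int i = 0; i < PersistentNodeSlots::kSlotCount; ++i) {
      PersistentNode& node = slots->slot_[i];
      if (node.IsUnused())
        continue;
      // node.Self() points back at the owning Persistent<T>. Clearing it
      // nulls the handle, so a later access is a clean null deref instead of
      // a pointer into unmapped heap pages.
      auto* persistent =
          reinterpret_cast<Persistent<DummyGCBase>*>(node.Self());
      persistent->Clear();
    }
  }
}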
,
Apr 25 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3536bbb2a547af920d98b537ee6ba7468bd85102

commit 3536bbb2a547af920d98b537ee6ba7468bd85102
Author: Michael Lippautz <mlippautz@chromium.org>
Date: Wed Apr 25 08:37:16 2018

[oilpan] Detach V8 garbage collector before running termination GC

Any roots from V8 into Blink need to be cleared before running the termination garbage collection to ensure that all objects die and finalizers are called.

Bug: chromium:831117
Change-Id: I09042623ceafe141819f4e9b52b15e784dd623a9
Reviewed-on: https://chromium-review.googlesource.com/1025032
Commit-Queue: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#553487}

[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/bindings/core/v8/v8_gc_controller.cc
[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/bindings/core/v8/v8_gc_controller.h
[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/core/workers/worker_backing_thread.cc
[modify] https://crrev.com/3536bbb2a547af920d98b537ee6ba7468bd85102/third_party/blink/renderer/platform/bindings/v8_per_isolate_data.cc
,
Apr 29 2018
Issue 825728 has been merged into this issue.
,
May 17 2018
The above CL landed in 68.0.3409.0; do we need a merge here? The crash rate is reasonably high in the previous Beta, 67.0.3396.10. There are chances of a spike once 100% of users upgrade to .40, which was released yesterday.

67.0.3396.40   0.01%   1
67.0.3396.10   15.81%  2117
67.0.3396.3    0.03%   4
67.0.3396.2    0.06%   8
67.0.3396.0    0.35%   47
66.0.3359.181  0.01%   1

Over to M67 owners for monitoring.
Similar bug: Issue 825728
Sample crash report: go/crash/e313d8e0dec64954
Link to the reports: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3ACrossThreadPersistentRegion%3A%3AShouldTracePersistentNode%27#-property-selector,-day-graph,-samplereports,productversion:1000,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
,
May 17 2018
Thank you, ligimole@. Update from keishi@ on dupe bug 825728 (https://bugs.chromium.org/p/chromium/issues/detail?id=825728#c45): "Judging from the fact that the crash rate has decreased, I think we can conclude that crbug.com/829859 was the cause and the fix was merged in 67.0.3396.13." After this fix/merge to M67, only 1 crash was reported on last week's beta, 67.0.3396.40. So far, no crashes have been reported on this week's M67 beta, 67.0.3396.48. At the moment, based on the available crash data, this doesn't seem like an M67 stable blocker. Prudhvi, please continue to monitor this. Thank you.
,
May 18 2018
This is the #1 crash in the latest Dev, 68.0.3432.3. Crash rates for the Dev channel in M68 and M67:

68.0.3423.2   0.95%   98
68.0.3418.2   0.50%   51
68.0.3409.2   0.01%   1
67.0.3396.10  20.25%  2079
67.0.3393.4   24.11%  2475
67.0.3386.1   30.00%  3080
67.0.3381.1   21.71%  2229

Would like to monitor for M68.
,
May 19 2018
Which crash are you talking about? This is about a general memory corruption that could affect all sorts of crashes. Is this about "blink::CrossThreadPersistentRegion::ShouldTracePersistentNode"? I think this one was already gone in the new version, even before this patch, no?
,
May 21 2018
> C#36
There seems to have been a spike in crashes of 'blink::CrossThreadPersistentRegion::ShouldTracePersistentNode' on the latest Windows Dev, 68.0.3432.3 (3 days of crash data has reported 1214 crashes from 1081 clients), compared to the last Dev, 68.0.3423.2, which reported 103 crashes from 93 clients over 10 days of crash data.
Link to the version comparison of this specific magic signature: https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3ACrossThreadPersistentRegion%3A%3AShouldTracePersistentNode%27&compProp=product.Version&v1=68.0.3423.2&v2=68.0.3432.3
Note: Tagging this with M-68 Beta blocker. This is not a blocker for M-67, as there are no crashes seen on the last two M-67 Betas (67.0.3396.40: one crash, 67.0.3396.48: zero crashes).
,
May 21 2018
#33 mentions that the CL landed in 68.0.3409.0. The version distribution is in https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3ACrossThreadPersistentRegion%3A%3AShouldTracePersistentNode%27#productversion:1000,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
It shows that this version is still fine, so it's not this CL.
,
May 21 2018
https://chromium.googlesource.com/chromium/src/+/5b6c23a4f50a50bd859d96d22679f2e3124783d4 is touching WrapPersistent in workers so I am looking into it.
,
May 21 2018
My current best guess is that the Persistent created by WrapPersistent(resolver) from ServiceWorkerRegistrationNotifications::DidLoadResources() is leaking because the callback gets registered somewhere in the mojo bindings. https://chromium-review.googlesource.com/c/chromium/src/+/1014400 was also caused by a callback registered into a mojo object (mojom::blink::BlobRegistry).
,
May 21 2018
Do you know why the persistent held by the callback hits the DCHECK? As I mentioned offline, the callback task is not called but gets destructed before the worker shuts down. Thus the Persistent handle should be destructed before the worker shuts down.
,
May 21 2018
The bound callbacks aren't used with PostTask. They are held strongly by a mojo-related object as a callback for some event. I think that's why they don't get freed when the task queue is cleared out.
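An illustrative sketch of that situation (every type and method name here is hypothetical):

// Illustrative only; ResourceNotifier, LoadResources, DidLoadResources and
// resource_service_ are hypothetical names.
void ResourceNotifier::RequestResources() {
  // The callback, and the strong Persistent it owns via WrapPersistent(),
  // is handed to a mojo interface pointer and parked in its pending-response
  // map. It is neither in the thread's task queue nor visible to the shutdown
  // code, so draining pending tasks does not free it; only closing the
  // message pipe (or receiving the reply) destroys the callback.
  resource_service_->LoadResources(
      WTF::Bind(&ResourceNotifier::DidLoadResources, WrapPersistent(this)));
}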
,
May 22 2018
Your bug is tagged as Release block Beta and we are branching in 2 days. Please have a fix ASAP.
,
May 24 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/e3aac921d7937d5be6c962532f3d28d608824b67

commit e3aac921d7937d5be6c962532f3d28d608824b67
Author: Keishi Hattori <keishi@chromium.org>
Date: Thu May 24 08:06:49 2018

Oilpan: Clear all Persistents on thread termination

Clear all persistents on thread termination, because when we have a bug and it leaves a Persistent behind, it will not be a stale pointer and just causes null dereference. Since we disabled PersistentHeapCollections on non-main threads we can assume all PersistentNode::Self() can be cast to Persistent<DummyGCBase>

Bug: 831117
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Ib55b280f8e7c5d966ce43d3f60f2af873cd50739
Reviewed-on: https://chromium-review.googlesource.com/1025547
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#561418}

[modify] https://crrev.com/e3aac921d7937d5be6c962532f3d28d608824b67/third_party/blink/renderer/platform/heap/persistent_node.cc
[modify] https://crrev.com/e3aac921d7937d5be6c962532f3d28d608824b67/third_party/blink/renderer/platform/heap/persistent_node.h
[modify] https://crrev.com/e3aac921d7937d5be6c962532f3d28d608824b67/third_party/blink/renderer/platform/heap/thread_state.cc
,
May 25 2018
Issue 846696 has been merged into this issue.
,
May 25 2018
I think the above patch introduced another crash in the latest canary, 68.0.3440.0, with magic signature blink::ThreadState::RunTerminationGC. Version comparison with the previous canary:

68.0.3439.0 -> 0 reports
68.0.3440.0 -> 50 reports, 50 clients

Crash reports:
https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+product.version%3D%2768.0.3440.0%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27+AND+expanded_custom_data.ChromeCrashProto.ptype%3D%27renderer%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AThreadState%3A%3ARunTerminationGC%27

keishi@: if the patch in #44 is the suspect, please revert ASAP. RBD for M68.
,
May 28 2018
mlippautz@: Friendly ping to get an update on this issue, as it is marked as RBB and RBD for M67 and M68. Thanks!
,
May 28 2018
The crash reports from C#46 are due to r561418. I will land a partial revert: https://chromium-review.googlesource.com/#/c/chromium/src/+/1074579
The RBD should be removed when this lands. The initial issue blocking M68 beta probably still exists.
,
May 28 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c899e5f3047daf6d121c524bc96259291f73de6f

commit c899e5f3047daf6d121c524bc96259291f73de6f
Author: Keishi Hattori <keishi@chromium.org>
Date: Mon May 28 08:01:11 2018

Oilpan: Change termination gc persistent check back to DCHECK

r561418 changed the check to CHECK. Reverting back to DCHECK.

Bug: 831117
Change-Id: Iad20e975e5de637bcb303c3fe4230e2fac6f4753
Reviewed-on: https://chromium-review.googlesource.com/1074579
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#562204}

[modify] https://crrev.com/c899e5f3047daf6d121c524bc96259291f73de6f/third_party/blink/renderer/platform/heap/thread_state.cc
,
May 29 2018
Just to update: the Mac and Android canary version 69.0.3444.0 has been live for 4 hours and has not reported any crashes so far for the magic signature 'blink::ThreadState::RunTerminationGC' (Issue 846696, duped in C#45). Based on the available crash data, this looks fixed. keishi@: Would it be fine to get the CL from C#49 merged to M-68, based on the Android and Mac canary crash data? Note: Unable to verify this on Windows, as no M-69 canary is available due to Issue 847075.
,
May 29 2018
Can we get this merged to M68 branch?
,
May 30 2018
To add to C#50: there are no crashes seen on 69.0.3444.0 and 69.0.3445.0 on Windows. keishi@: Please get the CL merged to the M-68 branch.
,
May 30 2018
This bug requires manual review: Reverts referenced in bugdroid comments after merge request. Please contact the milestone owner if you have questions. Owners: cmasso@(Android), kariahda@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 30 2018
Approving merge to M68. Branch:3440
,
May 31 2018
Issue 825728 (magic signature: blink::CrossThreadPersistentRegion::ShouldTracePersistentNode), duped here in C#32, has started showing crashes in M-69 on Windows, Mac, and Android since Chrome version 69.0.3444.0.
Win: https://goto.google.com/xmmhg
Mac: https://goto.google.com/jlrkg
Android: https://goto.google.com/yqbhf
keishi@: Please let us know if you need a separate issue for the recent spike on Windows, Mac, and Android in M-69. Thank you!
,
May 31 2018
,
May 31 2018
,
May 31 2018
Note that this shouldn't block Chrome OS 67 or 68 rollout.
,
Jun 4 2018
keishi@: Friendly ping to get an update on this issue as per C#56, as it is marked as RBB. Thanks!
,
Jun 4 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/b5e2b0e7f572ad3ded8a447166078ae4558f3fff

commit b5e2b0e7f572ad3ded8a447166078ae4558f3fff
Author: Keishi Hattori <keishi@chromium.org>
Date: Mon Jun 04 14:09:16 2018

Oilpan: Change termination gc persistent check back to DCHECK

r561418 changed the check to CHECK. Reverting back to DCHECK.

Bug: 831117
Change-Id: Iad20e975e5de637bcb303c3fe4230e2fac6f4753
Reviewed-on: https://chromium-review.googlesource.com/1074579
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#562204}
(cherry picked from commit c899e5f3047daf6d121c524bc96259291f73de6f)
Reviewed-on: https://chromium-review.googlesource.com/1085187
Reviewed-by: Keishi Hattori <keishi@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#133}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}

[modify] https://crrev.com/b5e2b0e7f572ad3ded8a447166078ae4558f3fff/third_party/blink/renderer/platform/heap/thread_state.cc
,
Jun 4 2018
keishi@: Would the CL in C#61 be a safe merge? We have also been seeing 'blink::CrossThreadPersistentRegion::ShouldTracePersistentNode' in M-69 since 69.0.3444.0 on Windows, Mac, and Android. Please check C#56.
,
Jun 4 2018
Removing the OS-Chrome label per #58,59. Thanks abodenha@!
,
Jun 5 2018
I've been investigating issue 849073, where the DedicatedWorkerGlobalScope survives the termination GC in a worker and causes all sorts of problems, because its destructor isn't called but its memory is freed in RemoveAllPages(). While investigating, I noticed that r561733 turns these failed CHECKs into use-after-frees. I think this should be reverted back to a CHECK(). Adding +palmer@, who is the security sheriff, for thoughts on what we should do.
,
Jun 5 2018
Re #64: Yes, if a DCHECK is the only thing guarding against UAF, then it needs to be a CHECK. It sounds like this bug is an exploitable memory corruption bug, and possibly also an exploitable logic bug (destructors not being called could hypothetically lead to weird application behavior?). So it's probably the case that this bug should have been tracked as Type=Bug-Security from the beginning. It's easy to de-escalate, so I tend to err on the side of calling memory corruption and object lifetime bugs security bugs. Also, why would this not affect Chrome OS or Fuchsia?
,
Jun 6 2018
This should affect all platforms. They should be marked as security relevant but are unfortunately not simple to fix. Let me provide a bit more context here.

Blink is notoriously bad at tearing down in a graceful manner. AFAIK, people have given up on tearing down the heap in a predictable way. IMHO, that is a bad idea, as it allows all sorts of bad practices to slip through. Now, for threads this is different, as they are still connected to the main thread when they are torn down. There are many ways to get this wrong, and historically this was always in a very fragile state. The cases that require handling are:
1. Disconnecting V8 from Blink: This is necessary as there may still be pending V8 garbage collections that try to access Blink's heap during the teardown sequence.
2. Clearing CrossThreadPersistents.

Case 1 went unnoticed for years and I fixed most of the cases with the CL in #31. It is not exploitable and should lead to crashes when V8's GC tries to access some cleared TLS values.

Case 2 is problematic as there might be exploitable UAFs. The fixes are in flight, as we need to investigate the cases that fire with a CHECK (now a DCHECK to keep on truckin'). The problematic part is that we cannot just clear the Persistent handles, because people like to use reference-counted classes that apparently can also have back pointers (== Persistents) into Oilpan's heap and would like to use those things during finalization. So there exist different types of Persistents: (a) those that require clearing, and (b) those that should go away after a few GCs after initially clearing the things in (a). What we need to do is get as much clearing as possible and figure out where the classes with finalizers and Persistent usages are, to fix those cases.
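An illustrative sketch of case (b) (OffHeapClient, ExecutionContextLike, and NotifyClientDestroyed() are hypothetical names):

// Illustrative sketch; all class and method names are hypothetical.
class OffHeapClient : public WTF::RefCounted<OffHeapClient> {
 public:
  ~OffHeapClient() {
    // Runs whenever the last reference drops, possibly from a finalizer
    // during the termination GC. If |context_| were force-cleared earlier,
    // this becomes a null deref; if the heap pages are already unmapped,
    // a use-after-free.
    if (context_)
      context_->NotifyClientDestroyed();
  }

 private:
  blink::Persistent<ExecutionContextLike> context_;
};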
,
Jun 6 2018
Thanks. For the fixes in flight, are the bugs they fix labeled as Type=Bug-Security? And/or do they point to this bug?
,
Jun 6 2018
They should reference this bug as I am not aware of any other open bugs.
,
Jun 6 2018
OK, cool. I'm going to tag this bug as a security bug then. I'll be reducing the effective access by changing the restrict-view labels, but as always, please feel free to CC anyone who can help or needs to know.
,
Jun 7 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/6ffac3dbe67b9039da222913f53dc35bf1796562

commit 6ffac3dbe67b9039da222913f53dc35bf1796562
Author: Lucas Furukawa Gadani <lfg@chromium.org>
Date: Thu Jun 07 02:13:28 2018

Oilpan: Clear all Persistents on thread termination.

Persistents should be cleared when NumberOfPersistents() != 0.

Bug: 831117
Change-Id: Ie6209d822c1d14a79936cb2807f1da7e559a42b2
Reviewed-on: https://chromium-review.googlesource.com/1089460
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Keishi Hattori <keishi@chromium.org>
Commit-Queue: Lucas Gadani <lfg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#565144}

[modify] https://crrev.com/6ffac3dbe67b9039da222913f53dc35bf1796562/third_party/blink/renderer/platform/heap/thread_state.cc
,
Jun 7 2018
Per #67, Chrome OS M67 stable is imminent. Why did it get tagged so late, and is it a stable blocker for Chrome OS?
,
Jun 7 2018
,
Jun 7 2018
Please mark security bugs as fixed as soon as the fix lands, and before requesting merges. This update is based on the merge- labels applied to this issue. Please reopen if this update was incorrect. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 7 2018
Re #71: This can cause security bugs on all platforms. For now, we should wait until it has baked on tomorrow's canary and then evaluate whether we want to merge or not.
,
Jun 8 2018
,
Jun 8 2018
Canary has been up for a few hours; so far there are no crash reports on the new CHECK:
https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27+AND+expanded_custom_data.ChromeCrashProto.ptype%3D%27renderer%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AThreadState%3A%3ARunTerminationGC%27
Requesting an M68 merge.
,
Jun 8 2018
This bug requires manual review: Reverts referenced in bugdroid comments after merge request. Please contact the milestone owner if you have questions. Owners: cmasso@(Android), kariahda@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 8 2018
Approving merge for M68. Branch:3440
,
Jun 11 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/255b6af3efede01a8dec7bcb13477cee4796d134

commit 255b6af3efede01a8dec7bcb13477cee4796d134
Author: Lucas Furukawa Gadani <lfg@chromium.org>
Date: Mon Jun 11 15:19:19 2018

Oilpan: Clear all Persistents on thread termination.

Persistents should be cleared when NumberOfPersistents() != 0.

Bug: 831117
Change-Id: Ie6209d822c1d14a79936cb2807f1da7e559a42b2
Reviewed-on: https://chromium-review.googlesource.com/1089460
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Keishi Hattori <keishi@chromium.org>
Commit-Queue: Lucas Gadani <lfg@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#565144}
(cherry picked from commit 6ffac3dbe67b9039da222913f53dc35bf1796562)
Reviewed-on: https://chromium-review.googlesource.com/1094117
Reviewed-by: Lucas Gadani <lfg@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#274}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}

[modify] https://crrev.com/255b6af3efede01a8dec7bcb13477cee4796d134/third_party/blink/renderer/platform/heap/thread_state.cc
,
Jun 11 2018
I've merged the fix to M68. There have been no crashes on the latest canary (69.0.3453.0). I've looked at both the original crash in the CrossThreadPersistentRegion and the CHECK() that I've re-introduced. I'll leave it up to keishi@ whether we want to backport the whole fix to M67. I'm unsure how exploitable this would be, since during thread termination we unmap the pages, so I think an attacker would need to remap the same virtual addresses in another thread in order to exploit. If that's doable, then exploiting this would be pretty easy.
,
Jun 11 2018
,
Jun 12 2018
Per c37, this does not look to be needed in M67, but keishi@, please confirm this ASAP.
,
Jun 13 2018
r561418 is not in M67, so there is no need to merge.
,
Jun 28 2018
Issue 852256 has been merged into this issue.
,
Jul 23
,
Sep 14
This bug has been closed for more than 14 weeks. Removing security view restrictions. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot