Issue metadata
Sign in to add a comment
|
Chrome: Crash Report - cc::LayerTreeHost::AnimateLayers |
||||||||||||||||||||||||
Issue descriptionreporter:lukasza@google.com Magic Signature: cc::LayerTreeHost::AnimateLayers Crash link: https://crash.corp.google.com//browse?q=product.Version%3E%3D'65.0.3319.0'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'cc%3A%3ALayerTreeHost%3A%3AAnimateLayers'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'cc%3A%3ALayerTreeHost%3A%3AAnimateLayers'%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(custom_data.ChromeCrashProto.experiments.ids)%20custom_data_ChromeCrashProto_experiments_ids%20WHERE%20custom_data_ChromeCrashProto_experiments_ids%3D'459a590c-349050d8')%20AND%20ReportID%3D'476ce28d5a6c08ec'&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3 ------------------------------------------------------------------------------- Sample Report ------------------------------------------------------------------------------- Product name: Chrome Magic Signature : cc::LayerTreeHost::AnimateLayers Product Version: 65.0.3322.3 Process type: renderer Report ID: 476ce28d5a6c08ec Report Url: https://crash.corp.google.com/476ce28d5a6c08ec Report Time: 2018-01-16T09:42:32.155-08:00 Upload Time: 2018-01-16T09:42:32.155-08:00 Uptime: 5323000 ms CumulativeProductUptime: 0 ms OS Name: Windows NT OS Version: 6.1.7601 17514 CPU Architecture: amd64 CPU Info: family 6 model 94 stepping 3 ------------------------------------------------------------------------------- Crashing thread: Thread index: 0. Stack Quality: 100%. Thread id: 4872. ------------------------------------------------------------------------------- 0x000007fed6c762ab (chrome_child.dll - layer_tree_host.cc: 926) cc::LayerTreeHost::AnimateLayers(base::TimeTicks) 0x000007fed6c67259 (chrome_child.dll - proxy_main.cc: 188) cc::ProxyMain::BeginMainFrame(std::unique_ptr<cc::BeginMainFrameAndCommitState,std::default_delete<cc::BeginMainFrameAndCommitState> >) 0x000007fed6c67062 (chrome_child.dll - bind_internal.h: 336) base::internal::Invoker<base::internal::BindState<void (cc::ProxyMain::*)(std::unique_ptr<cc::BeginMainFrameAndCommitState,std::default_delete<cc::BeginMainFrameAndCommitState> >),base::WeakPtr<cc::ProxyMain>,base::internal::PassedWrapper<std::unique_ptr<cc::BeginMainFrameAndCommitState,std::default_delete<cc::BeginMainFrameAndCommitState> > > >,void ()>::RunOnce 0x000007fed6795619 (chrome_child.dll - task_annotator.cc: 53) base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) 0x000007fed7a03d27 (chrome_child.dll - task_queue_manager.cc: 519) blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue(blink::scheduler::internal::WorkQueue *,blink::scheduler::LazyNow,base::TimeTicks *) 0x000007fed67bedde (chrome_child.dll - task_queue_manager.cc: 330) blink::scheduler::TaskQueueManager::DoWork(blink::scheduler::internal::Sequence::WorkType) 0x000007fed6795619 (chrome_child.dll - task_annotator.cc: 53) base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) 0x000007fed67bea25 (chrome_child.dll - thread_controller_impl.cc: 99) blink::scheduler::internal::ThreadControllerImpl::DoWork(blink::scheduler::internal::Sequence::WorkType) 0x000007fed6795619 (chrome_child.dll - task_annotator.cc: 53) base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) 0x000007fed6794c7b (chrome_child.dll - message_loop.cc: 399) base::MessageLoop::RunTask(base::PendingTask *) 0x000007fed678bda9 (chrome_child.dll - message_loop.cc: 462) base::MessageLoop::DoWork() 0x000007fed678bbd8 (chrome_child.dll - message_pump_default.cc: 37) base::MessagePumpDefault::Run(base::MessagePump::Delegate *) 0x000007fed678b7f4 (chrome_child.dll - run_loop.cc: 130) base::RunLoop::Run() 0x000007fed677710d (chrome_child.dll - renderer_main.cc: 218) content::RendererMain(content::MainFunctionParams const &) 0x000007fed6776e3d (chrome_child.dll - content_main_runner.cc: 426) content::RunNamedProcessTypeMain(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,content::MainFunctionParams const &,content::ContentMainDelegate *) 0x000007fed67703bc (chrome_child.dll - content_main_runner.cc: 720) content::ContentMainRunnerImpl::Run() 0x000007fed6745113 (chrome_child.dll - main.cc: 456) service_manager::Main(service_manager::MainParams const &) 0x000007fed6744bb9 (chrome_child.dll - content_main.cc: 19) content::ContentMain(content::ContentMainParams const &) 0x000007fed67419a0 (chrome_child.dll - chrome_main.cc: 128) ChromeMain 0x000000013f61354c (chrome.exe - main_dll_loader_win.cc: 199) MainDllLoader::Launch(HINSTANCE__ *,base::TimeTicks) 0x000000013f61169f (chrome.exe - chrome_exe_main_win.cc: 230) wWinMain 0x000000013f6edd32 (chrome.exe - exe_common.inl: 283) __scrt_common_main_seh 0x7730652c (kernel32.dll + 0x0001652c) BaseThreadInitThunk 0x7753c520 (ntdll.dll + 0x0002c520) RtlUserThreadStart
,
Jan 16 2018
FWIW, this seems to be a recent regression that started on, or around 65.0.3317.0 - see https://crash.corp.google.com/browse?q=custom_data.ChromeCrashProto.magic_signature_1.name%3D%27cc%3A%3ALayerTreeHost%3A%3AAnimateLayers%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(CrashedStackTrace.StackFrame)%20WHERE%20FunctionName%3D%27cc%3A%3AProxyMain%3A%3ABeginMainFrame(std%3A%3Aunique_ptr%3Ccc%3A%3ABeginMainFrameAndCommitState%2Cstd%3A%3Adefault_delete%3Ccc%3A%3ABeginMainFrameAndCommitState%3E%20%3E)%27)&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#-property-selector,+productversion
,
Jan 16 2018
When looking at Canary trial crashes in versions later than 65.0.3319.0, then this crash is ranked #39 - see https://crash.corp.google.com/browse?q=product.Version%3E%3D%2765.0.3319.0%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&compProp=custom_data.ChromeCrashProto.experiments.ids&v1=459a590c-92fbaf0a&v2=459a590c-349050d8#magicsignature:1000,-magicsignature2:30,stablesignature:30
,
Jan 16 2018
Hmmm... 19 out of 22 crash reports in the trial come from a single client id. Maybe the true rank of the crash is much lower than #39. This might mean that WontFix-ing this bug is fine for now.
,
Jan 19 2018
The crashes keep happening (e.g. some are in the most recent Canary - 65.0.3325.0) and right now we are up to 6 unique clients hitting the crash. So - maybe we should take a closer look at this crash after all. Ken - can you PTAL?
,
Jan 19 2018
Users experienced this crash on the following builds: Mac Canary 65.0.3324.0 - 0.15 CPM, 1 reports, 1 clients (signature cc::LayerTreeHost::AnimateLayers) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jan 22 2018
Users experienced this crash on the following builds: Win Canary 66.0.3327.0 - 0.50 CPM, 7 reports, 2 clients (signature cc::LayerTreeHost::AnimateLayers) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jan 23 2018
This is a strange crash. It looks like it is dereferencing a bad mutator_host_ pointer, but that is set in the constructor and never modified.
,
Jan 24 2018
dcheng@: My only idea for what could be happening here is that this is a consequence of an OOPIF getting detached during layout. In that case the RenderWidget closes and the compositor gets destroyed, which manifests as a use-after-free on a LayerTreeHost member after LTH::BeginMainFrame() returns. I know you made OOPIF RenderWidget destruction synchronous to fix a different UAF bug a while back. Do you know of any reason why this shouldn't happen? If the above is right, this looks like it might be difficult to fix. (Flagging as a security bug for now, that can be changed back later if that turns out to be wrong.)
,
Jan 24 2018
Adding some compositing experts for thoughts on the above problem. The simplest solution would likely be to have ProxyMain check whether it has been destroyed when BeginMainFrame() returns, and immediately return if so. Does that raise any other problems? Alternatively, we could revisit the decision to synchronously destroy the RenderWidget on an OOPIF compositor, but then we have to still resolve the ordering problem between destruction of WebFrameWidget and WebLocalFrame ( issue 509508 ).
,
Jan 24 2018
,
Jan 24 2018
Is the assumption that the LTH is deleted from LTHClient::BeginMainFrame? Do you have a stack trace that shows the details? I think it's not entirely crazy to support this, in that from the impl side it would look just like if the LTH was destroyed after the BeginMainFrame was posted but before it ran, but we would certainly need to extensively test the behavior. It's a bunch of complexity too (likely in other parts of the code, including the blink side, I imagine), I would generally prefer if we found a higher-level solution.
,
Jan 25 2018
Yes. I think I might have caused this regression in r527245. In that case we were quite certain that the local frame root was being detached over the course of WebFrameWidgetImpl::BeginFrame(), causing a null pointer dereference. That CL approximately lines up with these crash reports appearing, and the bad mutator_host_ pointer in LTH::AnimateLayers(), seen in the crash reports, seems a likely consequence of the same situation. lukasza@ landed a test in r529035 that tries to replicate this condition. I can have a look tomorrow to see why that isn't tripping this bug.
,
Jan 25 2018
The test isn't catching this because the WebFrameWidgetImpl::BeginFrame() stack has a deferred call from WebWidgetTestClient::ScheduleAnimation(). This works differently in layout tests than in a real browser, and running on a debug build of Chrome hits DCHECK(!inside_main_frame_) in the LayerTreeHost destructor. Marking this as affecting Stable because the nullptr check that exposed this bug was merged to M64 and is currently going out to users. The test is http/tests/misc/frame-detached-in-animationstart-event.html, in case anyone wants to see how it happens.
,
Jan 25 2018
Thanks Ken. Have you confirmed that the crash is possible on the M64 release? I would assume so. Wondering if we should request a revert of the merge, and whether that needs a respin. [+abdulsyed, awhalley]
,
Jan 26 2018
Also adding palmer@. To summarize: r527245 fixed a nullptr crash that turned out to be shielding a use-after-free. It was merged to M64. We didn't notice the problem because Blink layout tests behave differently from the browser, and didn't exhibit the UAF. Reverting that CL amounts to restoring the nullptr crash in order to prevent the security problem -- changing the DCHECK in the LayerTreeHost destructor to a CHECK would have the same effect.
,
Jan 26 2018
Thanks Ken/Charlie - how widely is this affecting users? We've rolled out M64 to 10% today. Is this only affecting site-isolation experiment? Based on looking at the crash rate above, it doesn't seem to be that high in Canary?
,
Jan 26 2018
I think nestable OOPIFs have to be enabled in some way in order for this to be a problem. That probably means being opted in to some kind of site isolation, either by setting the flag, being in an experiment, or by enterprise policy. There is a reasonable likelihood that this is an exploitable vulnerability for such users.
,
Jan 26 2018
My recommendation would be to get the revet made in the M64 branch so that it can be picked up if there's a respin, but that we shouldn't block stable roll out.
,
Jan 26 2018
Issue 804972 has been merged into this issue.
,
Jan 26 2018
,
Jan 26 2018
r527245 is already in M65 branch, correct?
,
Jan 27 2018
We will be planning a respin next week. Can you please see if a fix/revert for this can be landed sooner?
,
Jan 27 2018
re #22 - yep (nice trick: follow the r527245 then append /chrome/VERSION to the URL)
,
Jan 27 2018
Thank you awhalley@.
,
Jan 29 2018
awhalley: My understanding is that this revert will just regress to the more prevalent non-security crash. Can we wait until M65 where the proper fix for the security issue is landed, and live with this security one for a few more weeks? How critical is it to land the revert in M64?
,
Jan 29 2018
yea, the prevalence of the stability issue does seem non-negligible. ochang@ any thoughts on the ease of discoverability and exploitability of the UAF?
,
Jan 30 2018
This is easily discoverable, since some web sites are triggering this problem, including an internal Google tool. We have a layout test that hits the bug if you load it in a full browser. Oliver might have a more knowledgeable opinion on exploitability. The UAF can be achieved by JavaScript, which frees an object on which we later attempt to call a virtual function. So it would be a matter of re-allocating the memory that formerly contained the LayerTreeHost and having it subsequently store attacker-controlled data. I don't know enough about V8 to say how feasible that is, when the frame with the currently executing callback is detached.
,
Jan 31 2018
Just to be sure, the immediate fix/revert for this has already landed in crbug/776407, correct? If yes, let's move this to M65.
,
Jan 31 2018
Correct.
,
Jan 31 2018
,
Jan 31 2018
,
Feb 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8 commit ed449bb4ab860cde5d6851fd94d829b4c9eb20f8 Author: Ken Buchanan <kenrb@chromium.org> Date: Thu Feb 01 21:02:05 2018 Defer compositor shutdown for OOPIFs If an OOPIF detaches itself from the frame tree during layout, then its compositor is shutdown while in the midst of a BeginFrame, which causes a crash. This CL defers compositor shutdown, while allowing WebWidget closing to remain synchronous. Bug: 802392 Change-Id: Iefef08fca2e892409d12cc748ddaf257f3aa68e4 Reviewed-on: https://chromium-review.googlesource.com/890042 Commit-Queue: Ken Buchanan <kenrb@chromium.org> Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#533812} [modify] https://crrev.com/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8/content/browser/site_per_process_browsertest.cc [modify] https://crrev.com/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8/content/renderer/render_widget.cc [modify] https://crrev.com/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8/content/renderer/render_widget.h [add] https://crrev.com/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8/content/test/data/frame_tree/frame-detached-in-animationstart-event-oopif.html [add] https://crrev.com/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8/content/test/data/frame_tree/frame-detached-in-animationstart-event-samesiteframe.html [add] https://crrev.com/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8/content/test/data/frame_tree/frame-detached-in-animationstart-event.html [modify] https://crrev.com/ed449bb4ab860cde5d6851fd94d829b4c9eb20f8/testing/buildbot/filters/viz.content_browsertests.filter
,
Feb 5 2018
Commit ed449bb4... (the fix from #c33 above) initially landed in 66.0.3337.0. No crashes are seen in 66.0.3337.0 or above (the last crash was seen in 66.0.3336.5). Thanks for the fix! https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20product.version%20%3E%3D%20%2765.0.3320.0%27%20AND%20expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27cc%3A%3ALayerTreeHost%3A%3AAnimateLayers%27#-property-selector,productversion:1000
,
Feb 7 2018
r527245 is already in M65 branch. Do we need a merge for cl listed at #33 to M65? If yes, pls request a merge. If not, pls mark the bug as fixed or move to M-66. Thank you.
,
Feb 7 2018
,
Feb 7 2018
govind@ - good for 65
,
Feb 7 2018
Approving merge for CL listed at #33 to M65 branch 3325 based on comments #34 abd #36. Pls merge ASAP. Thank you.
,
Feb 7 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0598090d9ea03a43c433e52ca918f0442c65af63 commit 0598090d9ea03a43c433e52ca918f0442c65af63 Author: Ken Buchanan <kenrb@chromium.org> Date: Wed Feb 07 16:33:24 2018 Defer compositor shutdown for OOPIFs If an OOPIF detaches itself from the frame tree during layout, then its compositor is shutdown while in the midst of a BeginFrame, which causes a crash. This CL defers compositor shutdown, while allowing WebWidget closing to remain synchronous. TBR=alexmos@chromium.org (cherry picked from commit ed449bb4ab860cde5d6851fd94d829b4c9eb20f8) Bug: 802392 Change-Id: Iefef08fca2e892409d12cc748ddaf257f3aa68e4 Reviewed-on: https://chromium-review.googlesource.com/890042 Commit-Queue: Ken Buchanan <kenrb@chromium.org> Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#533812} Reviewed-on: https://chromium-review.googlesource.com/906910 Reviewed-by: Ken Buchanan <kenrb@chromium.org> Cr-Commit-Position: refs/branch-heads/3325@{#360} Cr-Branched-From: bc084a8b5afa3744a74927344e304c02ae54189f-refs/heads/master@{#530369} [modify] https://crrev.com/0598090d9ea03a43c433e52ca918f0442c65af63/content/browser/site_per_process_browsertest.cc [modify] https://crrev.com/0598090d9ea03a43c433e52ca918f0442c65af63/content/renderer/render_widget.cc [modify] https://crrev.com/0598090d9ea03a43c433e52ca918f0442c65af63/content/renderer/render_widget.h [add] https://crrev.com/0598090d9ea03a43c433e52ca918f0442c65af63/content/test/data/frame_tree/frame-detached-in-animationstart-event-oopif.html [add] https://crrev.com/0598090d9ea03a43c433e52ca918f0442c65af63/content/test/data/frame_tree/frame-detached-in-animationstart-event-samesiteframe.html [add] https://crrev.com/0598090d9ea03a43c433e52ca918f0442c65af63/content/test/data/frame_tree/frame-detached-in-animationstart-event.html [modify] https://crrev.com/0598090d9ea03a43c433e52ca918f0442c65af63/testing/buildbot/filters/viz.content_browsertests.filter
,
Feb 7 2018
,
Feb 8 2018
,
Feb 12 2018
,
Mar 6 2018
,
May 17 2018
This bug has been closed for more than 14 weeks. Removing security view restrictions. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by lukasza@chromium.org
, Jan 16 2018