Prevent OOPIF BeginMainFrames after the RenderWidget has been closed |
||||||||||||||||||||
Issue descriptionWe are seeing some crashes from the --site-per-process Finch trial: https://crash.corp.google.com/browse?q=custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AWebLocalFrameImpl%3A%3AViewImpl%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D The crash stack implies that BeginFrame is being called on WebFrameWidgetImpl after the local_root_ has been set to nullptr. Deductively, this means that RenderWidget has received a ViewMsg_OnClose which destroys the compositor and then receives a BeginMainFrame task from the compositor immediately afterward. Since the compositor runs on a different thread, it is possible that these things occasionally happen simultaneously, resulting in this ordering. lfg@ points out that the main frame has a specific guard that would prevent this scenario: https://chromium.googlesource.com/chromium/src/+/e8bd087eeca8cfdb7f052a775dfc36241d6d6c30/third_party/WebKit/Source/core/exported/WebViewImpl.cpp#1836 This should be a trivial fix, we just need to replicate that check in WebFrameWidgetImpl.
,
Oct 27 2017
Ken, there's still some crashes after your fix landed in 64.0.3246.0. For example, crash 746335f54712a3c2 is in 64.0.3247.0. It looks like we're getting further with your fix in place, but WebFrameWidgetBase::GetPage() is getting a null deref on View(). Can you take another look?
,
Oct 31 2017
,
Nov 7 2017
,
Nov 21 2017
I haven't updated this in a while, but just to have it here: The two crash reports we have on 64.0.3247.0 look the same as the earlier ones, although I don't understand what is happening there. It still looks like dereferencing an offset of a null local_root_, which shouldn't be possibly. I'd like to see if this shows up again in the next finch trial. There has been some other code added to the BeginFrame method from the OOPIF frame throttling CL which could potentially change the crash location.
,
Dec 11 2017
Issue 793803 has been merged into this issue.
,
Dec 23 2017
Ken, there's a steady stream of these crashes in the December field trial (low volume but consistent in each Canary). Can you take another look?
,
Dec 23 2017
I've had a patch with a fix up for a while now, but haven't had time to figure out if I can write a test for it. https://chromium-review.googlesource.com/c/chromium/src/+/819593
,
Jan 2 2018
,
Jan 4 2018
,
Jan 4 2018
FWIW, I have a WIP CL @ https://crrev.com/c/851114 that tries to implement a regression test for this crash.
,
Jan 5 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7e04a9d1f143fb6d37d380833f766bf938cc0284 commit 7e04a9d1f143fb6d37d380833f766bf938cc0284 Author: Ken Buchanan <kenrb@chromium.org> Date: Fri Jan 05 09:16:56 2018 Guard against frame detachment during OOPIF Animate Crash reports suggest that WebFrameWidgetImpl can have a null local root pointer after calling PageWidgetDelegate::Animate(). The subsequence call to GetPage() dereferences that pointer. This CL adds a null check for cases when Animate causes detachment. Bug: 776407 Change-Id: I3429865a786aa3acd9a06000e93fc4385e4e6e19 Reviewed-on: https://chromium-review.googlesource.com/819593 Commit-Queue: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Cr-Commit-Position: refs/heads/master@{#527245} [modify] https://crrev.com/7e04a9d1f143fb6d37d380833f766bf938cc0284/third_party/WebKit/Source/core/frame/WebFrameWidgetImpl.cpp
,
Jan 5 2018
Great - thanks kenrb@ and dcheng@! Let's wait until the fix bakes on the Canary for a day or two, verify that there are no more crashes, and then request a merge back to M64 (according to https://crbug.com/793803#c8 the crash doesn't happen on M63 so only a M64 merge might be desirable). We should be able to do the above on Tuesday next week - setting NextAction field accordingly.
,
Jan 9 2018
The NextAction date has arrived: 2018-01-09
,
Jan 9 2018
This seems to have fixed the problem with BeginFrame, but there is another crash showing up in that query: WebFrameWidgetImpl::ResizeVisualViewport() is hitting a nullptr on the same line. Probably still worth merging this but there is more work to do.
,
Jan 9 2018
Ok. Is that a different signature? Can you file another bug for that one (referencing this one), mark this as fixed, and request merge? I agree with the plan you suggest to merge and land another fix, and we can evaluate whether the second fix needs merging as well based on numbers. Thanks!
,
Jan 12 2018
Commit 7e04a9d1... (Guard against frame detachment during OOPIF Animate) initially landed in 65.0.3313.0, but the number of crashes on M65 didn't really go down (it hovers around 1-2 crashes per release, although 3315.4 is a bit of an outlier with 5 crashes). Based on this, I think there is no point requesting a merge. https://crash.corp.google.com/browse?q=product.Version%3E%3D%2765.0.3310.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AWebLocalFrameImpl%3A%3AViewImpl%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#-property-selector,+productversion
,
Jan 12 2018
Agreed that there are more ways to get to the crash after your fix in 65.0.3313.0: Crash 57cef4bc2d54286e - WebFrameWidgetBase::FocusedLocalFrameInWidget() Crash 50aa54f4bc8035ea - WebFrameWidgetImpl::ResizeVisualViewport(...) There's other signatures as well: Crash 953614a04ea7a780 (M64) - content::RenderWidget::Resize() We're seeing all the variants on M64. Do you think it's worth merging r527245 to M64 to at least reduce the number of cases? And is there something more general we should be fixing here, or just add fixes for the other specific cases we see?
,
Jan 12 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3f4df371bc846c1fd853f8e8b596e553291e425c commit 3f4df371bc846c1fd853f8e8b596e553291e425c Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Fri Jan 12 20:00:13 2018 Regression test for the WebFrameWidgetImpl::BeginFrame crash. Bug: 776407 Change-Id: I0be889a9ca3b47ddb1359fe9b2fc16c65bbdd458 Reviewed-on: https://chromium-review.googlesource.com/851114 Commit-Queue: Ćukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Cr-Commit-Position: refs/heads/master@{#529035} [add] https://crrev.com/3f4df371bc846c1fd853f8e8b596e553291e425c/third_party/WebKit/LayoutTests/http/tests/misc/frame-detached-in-animationstart-event.html [add] https://crrev.com/3f4df371bc846c1fd853f8e8b596e553291e425c/third_party/WebKit/LayoutTests/http/tests/misc/resources/frame-detached-in-animationstart-event-oopif.html [add] https://crrev.com/3f4df371bc846c1fd853f8e8b596e553291e425c/third_party/WebKit/LayoutTests/http/tests/misc/resources/frame-detached-in-animationstart-event-samesiteframe.html
,
Jan 12 2018
,
Jan 12 2018
,
Jan 12 2018
r529035 fixes the crash that happens on BeginFrame, which is far more prevalent than the similar crashes with other stacks. Requesting merge for that.
,
Jan 12 2018
This bug requires manual review: We are only 10 days from stable. Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jan 12 2018
Thanks for the fix and I'm supportive of the merge. Can you please confirm if this is well tested in Canary first?
,
Jan 15 2018
We've been monitoring crashes on the --site-per-process trial and this has been on Canary for over a week, so it should be well-tested.
,
Jan 16 2018
Approved. Branch:3282
,
Jan 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b14235c2901079ed4cc8fff1b375d65f4ea80b74 commit b14235c2901079ed4cc8fff1b375d65f4ea80b74 Author: Ken Buchanan <kenrb@chromium.org> Date: Wed Jan 17 17:57:18 2018 Guard against frame detachment during OOPIF Animate Crash reports suggest that WebFrameWidgetImpl can have a null local root pointer after calling PageWidgetDelegate::Animate(). The subsequence call to GetPage() dereferences that pointer. This CL adds a null check for cases when Animate causes detachment. Bug: 776407 Change-Id: I3429865a786aa3acd9a06000e93fc4385e4e6e19 Reviewed-on: https://chromium-review.googlesource.com/819593 Commit-Queue: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#527245}(cherry picked from commit 7e04a9d1f143fb6d37d380833f766bf938cc0284) Reviewed-on: https://chromium-review.googlesource.com/871090 Reviewed-by: Ken Buchanan <kenrb@chromium.org> Cr-Commit-Position: refs/branch-heads/3282@{#524} Cr-Branched-From: 5fdc0fab22ce7efd32532ee989b223fa12f8171e-refs/heads/master@{#520840} [modify] https://crrev.com/b14235c2901079ed4cc8fff1b375d65f4ea80b74/third_party/WebKit/Source/core/frame/WebFrameWidgetImpl.cpp
,
Jan 26 2018
abdulsyed@: We discussed reverting this last fix from the M64 branch on the other bug. Do I need approval before doing that?
,
Jan 30 2018
Requesting a revert of the merge to M64, as discussed in issue 802392 . The fix above seems to have reduced crash volume, but exposed a different crash in the same situation, as mentioned in 802392. The revert will return to the original crash volume when this problem is hit, but the crash is safer for users.
,
Jan 30 2018
This bug requires manual review: Request affecting a post-stable build Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jan 30 2018
Was this new crash discovered in Canary? I'd like to ensure if we can catch these earlier. Approving merge for revert to M64, branch:3282
,
Jan 31 2018
It showed up as a low volume crash in the Site Isolation trial on Canary. I didn't immediately connect it to this bug because the crash stack trace did not look similar.
,
Jan 31 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/343d8889f702ba3297a35fd2608fa51cbd2d3a9f commit 343d8889f702ba3297a35fd2608fa51cbd2d3a9f Author: Ken Buchanan <kenrb@chromium.org> Date: Wed Jan 31 17:56:16 2018 Revert "Guard against frame detachment during OOPIF Animate" This reverts commit b14235c2901079ed4cc8fff1b375d65f4ea80b74. Reason for revert: Caused https://crbug.com/802392 . TBR=dcheng@chromium.org Original change's description: > Guard against frame detachment during OOPIF Animate > > Crash reports suggest that WebFrameWidgetImpl can have a null > local root pointer after calling PageWidgetDelegate::Animate(). The > subsequence call to GetPage() dereferences that pointer. > > This CL adds a null check for cases when Animate causes detachment. > > Bug: 776407 > Change-Id: I3429865a786aa3acd9a06000e93fc4385e4e6e19 > Reviewed-on: https://chromium-review.googlesource.com/819593 > Commit-Queue: Daniel Cheng <dcheng@chromium.org> > Reviewed-by: Daniel Cheng <dcheng@chromium.org> > Cr-Original-Commit-Position: refs/heads/master@{#527245}(cherry picked from commit 7e04a9d1f143fb6d37d380833f766bf938cc0284) > Reviewed-on: https://chromium-review.googlesource.com/871090 > Reviewed-by: Ken Buchanan <kenrb@chromium.org> > Cr-Commit-Position: refs/branch-heads/3282@{#524} > Cr-Branched-From: 5fdc0fab22ce7efd32532ee989b223fa12f8171e-refs/heads/master@{#520840} TBR=kenrb@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 776407 Change-Id: I46ede1b0225096013089eb7ce5912bb701d55225 Reviewed-on: https://chromium-review.googlesource.com/895862 Reviewed-by: Ken Buchanan <kenrb@chromium.org> Cr-Commit-Position: refs/branch-heads/3282@{#628} Cr-Branched-From: 5fdc0fab22ce7efd32532ee989b223fa12f8171e-refs/heads/master@{#520840} [modify] https://crrev.com/343d8889f702ba3297a35fd2608fa51cbd2d3a9f/third_party/WebKit/Source/core/frame/WebFrameWidgetImpl.cpp
,
Jan 31 2018
Do we need to revert this in M65 (cl listed #33)? If yes, pls request a merge to M65 ASAP. Thank you.
,
Jan 31 2018
I have a proper fix in review that should resolve all of the crashes. I would rather try to get that landed and then merged to M65, if that sounds reasonable.
,
Jan 31 2018
OK, pls request a merge to M65 once proper fix is landed and looks good to merge to M65. If fix is risky, then it is best to revert. Thank you. Adding "ReleaseBlock-Stable" label for M65 tracking purpose.
,
Feb 1 2018
Just to update the latest behavior of this issue, As of now no crash instances are observed on chrome latest stable M64 #64.0.3282.119. Last crash is observed on #64.0.3282.5 with 5 instances. Hence adding TE verified label for M64. Please feel free to remove these labels in case if any crashes are seen on 64. Link to list of the builds: https://goto.google.com/hmiyv Thanks!
,
Feb 8 2018
|
||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||
Comment 1 by bugdroid1@chromium.org
, Oct 20 2017