New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 776407 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-01-09
OS: Linux , Android , Windows , Chrome , Mac
Pri: 1
Type: Bug

Blocked on:
issue 801673
issue 801678



Sign in to add a comment

Prevent OOPIF BeginMainFrames after the RenderWidget has been closed

Project Member Reported by kenrb@chromium.org, Oct 19 2017

Issue description

We are seeing some crashes from the --site-per-process Finch trial:
https://crash.corp.google.com/browse?q=custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AWebLocalFrameImpl%3A%3AViewImpl%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D

The crash stack implies that BeginFrame is being called on WebFrameWidgetImpl after the local_root_ has been set to nullptr. Deductively, this means that RenderWidget has received a ViewMsg_OnClose which destroys the compositor and then receives a BeginMainFrame task from the compositor immediately afterward. Since the compositor runs on a different thread, it is possible that these things occasionally happen simultaneously, resulting in this ordering.

lfg@ points out that the main frame has a specific guard that would prevent this scenario: https://chromium.googlesource.com/chromium/src/+/e8bd087eeca8cfdb7f052a775dfc36241d6d6c30/third_party/WebKit/Source/core/exported/WebViewImpl.cpp#1836

This should be a trivial fix, we just need to replicate that check in WebFrameWidgetImpl.
 
Project Member

Comment 1 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8c07fc795fc4f3a042e606da6fa909963a6a8ee3

commit 8c07fc795fc4f3a042e606da6fa909963a6a8ee3
Author: Ken Buchanan <kenrb@chromium.org>
Date: Fri Oct 20 13:38:17 2017

OOPIFs should abort BeginFrame after compositor destruction

There are crash reports from the --site-per-process finch trial implying
that OOPIF processes are sometimes attempting to do a BeginFrame after
the RenderWidget has started shutting down.

This patch adds a check to return early when this is the case. It
matches an existing check in WebViewImpl::BeginFrame().

Bug:  776407 
Change-Id: I0f337f05580a9155b6e1b17978afbe2035287e0d
Reviewed-on: https://chromium-review.googlesource.com/728362
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Ken Buchanan <kenrb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#510421}
[modify] https://crrev.com/8c07fc795fc4f3a042e606da6fa909963a6a8ee3/third_party/WebKit/Source/core/frame/WebFrameWidgetImpl.cpp

Comment 2 by creis@chromium.org, Oct 27 2017

Status: Started (was: Assigned)
Ken, there's still some crashes after your fix landed in 64.0.3246.0.  For example, crash 746335f54712a3c2 is in 64.0.3247.0.

It looks like we're getting further with your fix in place, but WebFrameWidgetBase::GetPage() is getting a null deref on View().  Can you take another look?

Comment 3 by creis@chromium.org, Oct 31 2017

Blocking: 780133

Comment 4 by creis@chromium.org, Nov 7 2017

Blocking: -780133

Comment 5 by kenrb@chromium.org, Nov 21 2017

I haven't updated this in a while, but just to have it here: The two crash reports we have on 64.0.3247.0 look the same as the earlier ones, although I don't understand what is happening there. It still looks like dereferencing an offset of a null local_root_, which shouldn't be possibly.

I'd like to see if this shows up again in the next finch trial. There has been some other code added to the BeginFrame method from the OOPIF frame throttling CL which could potentially change the crash location.

Comment 6 by kenrb@chromium.org, Dec 11 2017

Issue 793803 has been merged into this issue.

Comment 7 by creis@chromium.org, Dec 23 2017

Labels: -Pri-2 M-65 Pri-1
Ken, there's a steady stream of these crashes in the December field trial (low volume but consistent in each Canary).  Can you take another look?

Comment 8 by kenrb@chromium.org, Dec 23 2017

I've had a patch with a fix up for a while now, but haven't had time to figure out if I can write a test for it. https://chromium-review.googlesource.com/c/chromium/src/+/819593
Labels: M-64
According to https://crbug.com/793803#c8 this issue also affects M64.
Labels: Target-64
FWIW, I have a WIP CL @ https://crrev.com/c/851114 that tries to implement a regression test for this crash.
Project Member

Comment 12 by bugdroid1@chromium.org, Jan 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7e04a9d1f143fb6d37d380833f766bf938cc0284

commit 7e04a9d1f143fb6d37d380833f766bf938cc0284
Author: Ken Buchanan <kenrb@chromium.org>
Date: Fri Jan 05 09:16:56 2018

Guard against frame detachment during OOPIF Animate

Crash reports suggest that WebFrameWidgetImpl can have a null
local root pointer after calling PageWidgetDelegate::Animate(). The
subsequence call to GetPage() dereferences that pointer.

This CL adds a null check for cases when Animate causes detachment.

Bug:  776407 
Change-Id: I3429865a786aa3acd9a06000e93fc4385e4e6e19
Reviewed-on: https://chromium-review.googlesource.com/819593
Commit-Queue: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Cr-Commit-Position: refs/heads/master@{#527245}
[modify] https://crrev.com/7e04a9d1f143fb6d37d380833f766bf938cc0284/third_party/WebKit/Source/core/frame/WebFrameWidgetImpl.cpp

Cc: abdulsyed@chromium.org
NextAction: 2018-01-09
Great - thanks kenrb@ and dcheng@!  Let's wait until the fix bakes on the Canary for a day or two, verify that there are no more crashes, and then request a merge back to M64 (according to https://crbug.com/793803#c8 the crash doesn't happen on M63 so only a M64 merge might be desirable).

We should be able to do the above on Tuesday next week - setting NextAction field accordingly.
The NextAction date has arrived: 2018-01-09
This seems to have fixed the problem with BeginFrame, but there is another crash showing up in that query: WebFrameWidgetImpl::ResizeVisualViewport() is hitting a nullptr on the same line. Probably still worth merging this but there is more work to do.
Ok.  Is that a different signature?  Can you file another bug for that one (referencing this one), mark this as fixed, and request merge?  I agree with the plan you suggest to merge and land another fix, and we can evaluate whether the second fix needs merging as well based on numbers.  Thanks!
Commit 7e04a9d1... (Guard against frame detachment during OOPIF Animate) initially landed in 65.0.3313.0, but the number of crashes on M65 didn't really go down (it hovers around 1-2 crashes per release, although 3315.4 is a bit of an outlier with 5 crashes).  Based on this, I think there is no point requesting a merge.

https://crash.corp.google.com/browse?q=product.Version%3E%3D%2765.0.3310.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AWebLocalFrameImpl%3A%3AViewImpl%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#-property-selector,+productversion

Comment 18 by creis@chromium.org, Jan 12 2018

Agreed that there are more ways to get to the crash after your fix in 65.0.3313.0:
Crash 57cef4bc2d54286e - WebFrameWidgetBase::FocusedLocalFrameInWidget()
Crash 50aa54f4bc8035ea - WebFrameWidgetImpl::ResizeVisualViewport(...)

There's other signatures as well:
Crash 953614a04ea7a780 (M64) - content::RenderWidget::Resize()

We're seeing all the variants on M64.  Do you think it's worth merging r527245 to M64 to at least reduce the number of cases?  And is there something more general we should be fixing here, or just add fixes for the other specific cases we see?
Blockedon: 801673
Blockedon: 801678

Comment 22 by kenrb@chromium.org, Jan 12 2018

Labels: Merge-Request-64
r529035 fixes the crash that happens on BeginFrame, which is far more prevalent than the similar crashes with other stacks.

Requesting merge for that.
Project Member

Comment 23 by sheriffbot@chromium.org, Jan 12 2018

Labels: -Merge-Request-64 Hotlist-Merge-Review Merge-Review-64
This bug requires manual review: We are only 10 days from stable.
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Thanks for the fix and I'm supportive of the merge. Can you please confirm if this is well tested in Canary first?

Comment 25 by kenrb@chromium.org, Jan 15 2018

We've been monitoring crashes on the --site-per-process trial and this has been on Canary for over a week, so it should be well-tested.
Labels: -Merge-Review-64 Merge-Approved-64
Approved. Branch:3282
Project Member

Comment 27 by bugdroid1@chromium.org, Jan 17 2018

Labels: -merge-approved-64 merge-merged-3282
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b14235c2901079ed4cc8fff1b375d65f4ea80b74

commit b14235c2901079ed4cc8fff1b375d65f4ea80b74
Author: Ken Buchanan <kenrb@chromium.org>
Date: Wed Jan 17 17:57:18 2018

Guard against frame detachment during OOPIF Animate

Crash reports suggest that WebFrameWidgetImpl can have a null
local root pointer after calling PageWidgetDelegate::Animate(). The
subsequence call to GetPage() dereferences that pointer.

This CL adds a null check for cases when Animate causes detachment.

Bug:  776407 
Change-Id: I3429865a786aa3acd9a06000e93fc4385e4e6e19
Reviewed-on: https://chromium-review.googlesource.com/819593
Commit-Queue: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#527245}(cherry picked from commit 7e04a9d1f143fb6d37d380833f766bf938cc0284)
Reviewed-on: https://chromium-review.googlesource.com/871090
Reviewed-by: Ken Buchanan <kenrb@chromium.org>
Cr-Commit-Position: refs/branch-heads/3282@{#524}
Cr-Branched-From: 5fdc0fab22ce7efd32532ee989b223fa12f8171e-refs/heads/master@{#520840}
[modify] https://crrev.com/b14235c2901079ed4cc8fff1b375d65f4ea80b74/third_party/WebKit/Source/core/frame/WebFrameWidgetImpl.cpp

Comment 28 by kenrb@chromium.org, Jan 26 2018

abdulsyed@: We discussed reverting this last fix from the M64 branch on the other bug. Do I need approval before doing that? 

Comment 29 by kenrb@chromium.org, Jan 30 2018

Labels: Merge-Request-64
Requesting a revert of the merge to M64, as discussed in  issue 802392 .

The fix above seems to have reduced crash volume, but exposed a different crash in the same situation, as mentioned in 802392. The revert will return to the original crash volume when this problem is hit, but the crash is safer for users.
Project Member

Comment 30 by sheriffbot@chromium.org, Jan 30 2018

Labels: -Merge-Request-64 Merge-Review-64
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-64 Merge-Approved-64
Was this new crash discovered in Canary? I'd like to ensure if we can catch these earlier. Approving merge for revert to M64, branch:3282

Comment 32 by kenrb@chromium.org, Jan 31 2018

It showed up as a low volume crash in the Site Isolation trial on Canary. I didn't immediately connect it to this bug because the crash stack trace did not look similar.
Project Member

Comment 33 by bugdroid1@chromium.org, Jan 31 2018

Labels: -merge-approved-64
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/343d8889f702ba3297a35fd2608fa51cbd2d3a9f

commit 343d8889f702ba3297a35fd2608fa51cbd2d3a9f
Author: Ken Buchanan <kenrb@chromium.org>
Date: Wed Jan 31 17:56:16 2018

Revert "Guard against frame detachment during OOPIF Animate"

This reverts commit b14235c2901079ed4cc8fff1b375d65f4ea80b74.

Reason for revert: Caused  https://crbug.com/802392 .

TBR=dcheng@chromium.org

Original change's description:
> Guard against frame detachment during OOPIF Animate
>
> Crash reports suggest that WebFrameWidgetImpl can have a null
> local root pointer after calling PageWidgetDelegate::Animate(). The
> subsequence call to GetPage() dereferences that pointer.
>
> This CL adds a null check for cases when Animate causes detachment.
>
> Bug:  776407 
> Change-Id: I3429865a786aa3acd9a06000e93fc4385e4e6e19
> Reviewed-on: https://chromium-review.googlesource.com/819593
> Commit-Queue: Daniel Cheng <dcheng@chromium.org>
> Reviewed-by: Daniel Cheng <dcheng@chromium.org>
> Cr-Original-Commit-Position: refs/heads/master@{#527245}(cherry picked from commit 7e04a9d1f143fb6d37d380833f766bf938cc0284)
> Reviewed-on: https://chromium-review.googlesource.com/871090
> Reviewed-by: Ken Buchanan <kenrb@chromium.org>
> Cr-Commit-Position: refs/branch-heads/3282@{#524}
> Cr-Branched-From: 5fdc0fab22ce7efd32532ee989b223fa12f8171e-refs/heads/master@{#520840}

TBR=kenrb@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  776407 
Change-Id: I46ede1b0225096013089eb7ce5912bb701d55225
Reviewed-on: https://chromium-review.googlesource.com/895862
Reviewed-by: Ken Buchanan <kenrb@chromium.org>
Cr-Commit-Position: refs/branch-heads/3282@{#628}
Cr-Branched-From: 5fdc0fab22ce7efd32532ee989b223fa12f8171e-refs/heads/master@{#520840}
[modify] https://crrev.com/343d8889f702ba3297a35fd2608fa51cbd2d3a9f/third_party/WebKit/Source/core/frame/WebFrameWidgetImpl.cpp

Do we need to revert this in M65 (cl listed #33)? If yes, pls request a merge to M65 ASAP. Thank you.

Comment 35 by kenrb@chromium.org, Jan 31 2018

I have a proper fix in review that should resolve all of the crashes. I would rather try to get that landed and then merged to M65, if that sounds reasonable.
Labels: ReleaseBlock-Stable
OK, pls request a merge to M65 once proper fix is landed and looks good to merge to M65. If fix is risky, then it is best to revert. Thank you.

Adding "ReleaseBlock-Stable" label for M65 tracking purpose.
Cc: brajkumar@chromium.org
Labels: TE-Verified-64.0.3282.140 TE-Verified-M64
Just to update the latest behavior of this issue, As of now no crash instances are observed on chrome latest stable M64 #64.0.3282.119. Last crash is observed on #64.0.3282.5 with 5 instances. Hence adding TE verified label for M64. Please feel free to remove these labels in case if any crashes are seen on 64.

Link to list of the builds: https://goto.google.com/hmiyv

Thanks!
Status: Fixed (was: Started)

Sign in to add a comment