New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 666563 link

Starred by 4 users

Issue metadata

Status: Fixed
Merged: issue 664809
Owner:
Last visit > 30 days ago
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug



Sign in to add a comment

Opening many Facebook tabs in the same thread (with command click) leads to tabs locking up

Reported by salazar...@gmail.com, Nov 18 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36

Steps to reproduce the problem:
1. Open a bunch of Facebook notifications.
2. Notice that tabs eventually become unresponsive and uncloseable (and leads to a weird UI bug where "x"-ing them out makes the tabs smaller and smaller.

What is the expected behavior?
Facebook doesn't crash

What went wrong?
I see a lot of "startScrollbarPaintTimer" tasks and also a lot of hitting the disk cache which could be related?

Did this work before? N/A 

Chrome version: 54.0.2840.98  Channel: stable
OS Version: OS X 10.11.6
Flash Version: Shockwave Flash 23.0 r0

[disclaimer] I work for Facebook on the Web Speed team
 
trace_tabslockuptrace.json.gz
8.7 MB Download

Comment 1 by bmau...@fb.com, Nov 18 2016

Labels: DevRel-Facebook
We had another Facebook employee also notice their machine lock up for the same reason a few months ago so this seems like a longstanding issue.
Cc: skobes@chromium.org tdres...@chromium.org
Cc: esprehn@chromium.org
Components: -Blink Blink>Paint
Labels: -Pri-2 OS-iOS Pri-1
Something is definitely very bad in the scroll paint timer:

Wall Duration	
0.043 ms
CPU Duration	
0.042 ms
Self Time	
0.006 ms
src_file	
"../../third_party/WebKit/Source/platform/mac/ScrollAnimatorMac.mm"
src_func	
"startScrollbarPaintTimer"

and it's firing every super rapidly, looks like every 0.1ms we fire the timer?
Looks like a task queue flood.
Screenshot 2016-11-17 at 5.22.11 PM.png
11.5 KB View Download

Comment 6 by skobes@chromium.org, Nov 18 2016

Cc: bokan@chromium.org

Comment 7 by chrishtr@google.com, Nov 18 2016

Components: -Blink>Paint Blink>Scroll

Comment 8 by bmau...@fb.com, Nov 18 2016

Seems fishy:

void ScrollAnimatorMac::startScrollbarPaintTimer() {
  m_taskRunner->postDelayedTask(
      BLINK_FROM_HERE, m_initialScrollbarPaintTaskFactory->cancelAndCreate(),
      0.1);
}
Cc: skyos...@chromium.org alexclarke@chromium.org
This might be thousands of cancelled tasks :/
Mergedinto: 664809
Status: Duplicate (was: Unconfirmed)

Comment 11 by bmau...@fb.com, Nov 18 2016

Status: Available (was: Duplicate)
I don't believe this is a duplicate of the bug listed. The regression in question was introduced and fixed in Chrome 56 canary. Marco saw this on stable. In addition another FB employee noticed the same function having the same pattern back in May. 

Comment 12 by bokan@chromium.org, Nov 18 2016

Cc: -bokan@chromium.org ymalik@chromium.org
Owner: bokan@chromium.org
Both ymalik@ and I have made some recent changes to overlay scrollbars which resulted in problems with the Mac scrollbar paint timer ( issue 664809 , issue 662402) but ymalik@'s patch was reverted and mine presumably fixed and relanded. All these changes are very recent -- about ~2 weeks -- so they wouldn't be responsible for any long standing issues or anything in stable.

Elliott, what version of Chrome did you take those traces with? The problematic patches were in from 56.0.2909.0 and should have been fixed by 56.0.2920.0.

I'll take a closer look.
Labels: -OS-iOS
I assume this bug doesn't affect iOS.  Please re-add the label if it does.

Comment 14 by bokan@chromium.org, Nov 25 2016

Status: Started (was: Available)

Comment 15 by bokan@chromium.org, Nov 30 2016

Cc: bokan@chromium.org yutak@chromium.org
Components: -Blink>Scroll Blink>Loader
Labels: -OS-Mac OS-All
Owner: y...@yoav.ws
Done some digging. I suspect the trace above is from a more recent build as the pattern of scrollbarPaintTimers matches the bug mentioned above. That said, I did find an issue in stable and still reproducible in ToT that has nothing to do with the scrollbar paint timer. In fact, this reproduces on Linux as well so it's not OS specific.

It looks like there's an issue with resource loading. I've tracked it down to ResourceFetcher::moveCachedNonBlockingResourceToBlocking. This method gets called on ResourceFetcher A with a resource whose ResourceLoader is actually belongs to another ResourceFetcher B. What happens is:

1. A tries to move resource to the blocking list
  a. The resource doesn't exist in A's m_nonBlockingLoaders but the remove call silently fails
  b. The resource is added to m_loaders
2. Some time later, B cancels the ResourceLoader (or perhaps it finishes).
  a. This clears Resource::m_loader but ResourceLoader still has a pointer into the Resource
  b. B removes ResourceLoader from its list of loaders
3. After this, A cancels all its loaders
  a. The ResourceLoader was cancelled by B but is still in A's m_loaders so it's canceled again
  b. The ResourceLoader still thinks it belongs to ResourceFetcher B so it tells B to remove to
     clean up the Resource
  c. B uses the Resource to get the ResourceLoader, since this was cleared in 2a this is nullptr.
     It then tries to remove nullptr from the loaders list which is empty as it was cleaned up in
     step 2.

It seems that calling HashSet.remove(nullptr) on an empty HashSet causes corruption and we end up looping forever next time we call HashSet.contains which is what causes the hang. Not sure if that in itself is a problem. I think the issue is some bad assumptions when calling moveCachedNonBlockingResourceToBlocking.

Over to yoav@ since it looks like he's touched this method.
+cc yutak@: Should removing nullptr from an empty HashSet cause corruption silently like this?

Comment 16 by bmau...@fb.com, Nov 30 2016

How'd you find that issue? Do you have any sense of how frequent it is?

That said, the trace that marco has is definitely from chrome stable and we saw the exact same pattern a few months ago. Is it possible there could be another cause of the scrollbarPaintTimers issue? 

Comment 17 by bokan@chromium.org, Nov 30 2016

The repro steps are:

1. Open the notifications menu
2. Going up and down the list, open each notification in a new tab. Each notification is opened twice. ~16 tabs
3. Randomly change tabs and start closing them all.

This seems to cause the page to lock up almost every time I tried, I would say about 9 out of 10 times it locks up.

The scrollbar paint timer is used only on Mac. It's been around for many years. It seems to be trying to defer scrollbar painting until after load completes but until that happens it can spam these tasks. How many tabs are open in this trace? This does seem excessive but we're still pumping other tasks so it's not quite locked up like the issue I saw. I'm currently looking at ScrollAnimatorMac for other reasons so I'll take a closer look at the paint timer specifically. I'd like to get rid of it altogether.

Comment 19 by bmau...@fb.com, Nov 30 2016

We see a wide variety of crashes / hangs. In the trace where we saw the scrollbar issue chrome was still responsive, but burning CPU. This is a pretty common complaint internally. I think the title on this bug might not be quite descriptive of the scrollbar behavior

Comment 20 by bokan@chromium.org, Nov 30 2016

Ok thanks. I've filed issue 669945 to dig into the scrollbar paint timer specifically. Will leave this as tracking the lock up fixed by Yoav's patch.
Removing nullptr from a HashSet is a logic error; that may lead to undefined
behavior.

You can't put a nullptr in a HashSet, so it makes no sense to look up or
remove a nullptr.
That's not specifically the nullptr though right, it's just the HashTableDeletedValue? Also do we have DCHECKs for this?
nullptr isn't a deleted value; it's an "empty value". The deleted value for
pointers is reinterpret_cast<T*>(-1).

And we have checks for empty values, so I assume we'll be hitting this if
we try to do that:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/HashTable.h?q=checkKey&sq=package:chromium&dr=CSs&l=615

(I think it's a good idea to expand this check to deleted values, though)
Project Member

Comment 24 by bugdroid1@chromium.org, Dec 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7497990e6eb19dc8dd61de4f188553c9c054cef9

commit 7497990e6eb19dc8dd61de4f188553c9c054cef9
Author: yoav <yoav@yoav.ws>
Date: Thu Dec 01 07:13:06 2016

Fix memory corruption related to load blocking resource move

This fixes an issue where a resource loader belonging to one
ResourceFetcher was accidentally added as a blocking loader to another
ResourceFetcher, by checking the loader is part of the already
non-blocking loaders belonging to current ResourceFetcher.

This also adds DCHECKs on a couple of methods removing loaders from
hashmaps, to make sure we're not trying to remove a nullptr.

BUG= 666563 

Review-Url: https://codereview.chromium.org/2537303003
Cr-Commit-Position: refs/heads/master@{#435573}

[modify] https://crrev.com/7497990e6eb19dc8dd61de4f188553c9c054cef9/third_party/WebKit/Source/core/fetch/ResourceFetcher.cpp
[modify] https://crrev.com/7497990e6eb19dc8dd61de4f188553c9c054cef9/third_party/WebKit/Source/core/fetch/ResourceFetcherTest.cpp

Comment 25 by y...@yoav.ws, Dec 1 2016

Status: Fixed (was: Started)
Should be fixed in ToT, so marking as such. Let me know if you still encounter this issue.
Yoav, could you merge into M56? It's still early in the branch and this seems like a small enough fix. I've heard anecdotal reports of similar lockups and user frustration so this would be good to ship asap.

Comment 27 by y...@yoav.ws, Dec 1 2016

Labels: Merge-Request-56
Makes sense. Adding a merge-request label.
Status: Assigned (was: Fixed)
Thanks. Removing "fixed" so it gets picked up.

Comment 29 by dimu@chromium.org, Dec 2 2016

Labels: -Merge-Request-56 Merge-Approved-56 Hotlist-Merge-Approved
Your change meets the bar and is auto-approved for M56 (branch: 2924)
Project Member

Comment 30 by bugdroid1@chromium.org, Dec 2 2016

Labels: -merge-approved-56 merge-merged-2924
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d63d800a8ab17eb1174bdbc8f13afa1ffb10193f

commit d63d800a8ab17eb1174bdbc8f13afa1ffb10193f
Author: Yoav Weiss <yoav@yoav.ws>
Date: Fri Dec 02 11:56:40 2016

Fix memory corruption related to load blocking resource move

This fixes an issue where a resource loader belonging to one
ResourceFetcher was accidentally added as a blocking loader to another
ResourceFetcher, by checking the loader is part of the already
non-blocking loaders belonging to current ResourceFetcher.

This also adds DCHECKs on a couple of methods removing loaders from
hashmaps, to make sure we're not trying to remove a nullptr.

BUG= 666563 

Review-Url: https://codereview.chromium.org/2537303003
Cr-Commit-Position: refs/heads/master@{#435573}
(cherry picked from commit 7497990e6eb19dc8dd61de4f188553c9c054cef9)

Review URL: https://codereview.chromium.org/2543203002 .

Cr-Commit-Position: refs/branch-heads/2924@{#286}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/d63d800a8ab17eb1174bdbc8f13afa1ffb10193f/third_party/WebKit/Source/core/fetch/ResourceFetcher.cpp
[modify] https://crrev.com/d63d800a8ab17eb1174bdbc8f13afa1ffb10193f/third_party/WebKit/Source/core/fetch/ResourceFetcherTest.cpp

Comment 31 by y...@yoav.ws, Dec 2 2016

Status: Fixed (was: Assigned)
Merged to 56. Closing

Comment 32 by bmau...@fb.com, Jan 11 2017

 Issue 679975  has been merged into this issue.

Sign in to add a comment