Issue metadata
Sign in to add a comment
|
Opening many Facebook tabs in the same thread (with command click) leads to tabs locking up
Reported by
salazar...@gmail.com,
Nov 18 2016
|
||||||||||||||||||||||||
Issue descriptionUserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36 Steps to reproduce the problem: 1. Open a bunch of Facebook notifications. 2. Notice that tabs eventually become unresponsive and uncloseable (and leads to a weird UI bug where "x"-ing them out makes the tabs smaller and smaller. What is the expected behavior? Facebook doesn't crash What went wrong? I see a lot of "startScrollbarPaintTimer" tasks and also a lot of hitting the disk cache which could be related? Did this work before? N/A Chrome version: 54.0.2840.98 Channel: stable OS Version: OS X 10.11.6 Flash Version: Shockwave Flash 23.0 r0 [disclaimer] I work for Facebook on the Web Speed team
,
Nov 18 2016
,
Nov 18 2016
,
Nov 18 2016
Something is definitely very bad in the scroll paint timer: Wall Duration 0.043 ms CPU Duration 0.042 ms Self Time 0.006 ms src_file "../../third_party/WebKit/Source/platform/mac/ScrollAnimatorMac.mm" src_func "startScrollbarPaintTimer" and it's firing every super rapidly, looks like every 0.1ms we fire the timer?
,
Nov 18 2016
Looks like a task queue flood.
,
Nov 18 2016
,
Nov 18 2016
,
Nov 18 2016
Seems fishy:
void ScrollAnimatorMac::startScrollbarPaintTimer() {
m_taskRunner->postDelayedTask(
BLINK_FROM_HERE, m_initialScrollbarPaintTaskFactory->cancelAndCreate(),
0.1);
}
,
Nov 18 2016
This might be thousands of cancelled tasks :/
,
Nov 18 2016
,
Nov 18 2016
I don't believe this is a duplicate of the bug listed. The regression in question was introduced and fixed in Chrome 56 canary. Marco saw this on stable. In addition another FB employee noticed the same function having the same pattern back in May.
,
Nov 18 2016
Both ymalik@ and I have made some recent changes to overlay scrollbars which resulted in problems with the Mac scrollbar paint timer ( issue 664809 , issue 662402) but ymalik@'s patch was reverted and mine presumably fixed and relanded. All these changes are very recent -- about ~2 weeks -- so they wouldn't be responsible for any long standing issues or anything in stable. Elliott, what version of Chrome did you take those traces with? The problematic patches were in from 56.0.2909.0 and should have been fixed by 56.0.2920.0. I'll take a closer look.
,
Nov 21 2016
I assume this bug doesn't affect iOS. Please re-add the label if it does.
,
Nov 25 2016
,
Nov 30 2016
Done some digging. I suspect the trace above is from a more recent build as the pattern of scrollbarPaintTimers matches the bug mentioned above. That said, I did find an issue in stable and still reproducible in ToT that has nothing to do with the scrollbar paint timer. In fact, this reproduces on Linux as well so it's not OS specific.
It looks like there's an issue with resource loading. I've tracked it down to ResourceFetcher::moveCachedNonBlockingResourceToBlocking. This method gets called on ResourceFetcher A with a resource whose ResourceLoader is actually belongs to another ResourceFetcher B. What happens is:
1. A tries to move resource to the blocking list
a. The resource doesn't exist in A's m_nonBlockingLoaders but the remove call silently fails
b. The resource is added to m_loaders
2. Some time later, B cancels the ResourceLoader (or perhaps it finishes).
a. This clears Resource::m_loader but ResourceLoader still has a pointer into the Resource
b. B removes ResourceLoader from its list of loaders
3. After this, A cancels all its loaders
a. The ResourceLoader was cancelled by B but is still in A's m_loaders so it's canceled again
b. The ResourceLoader still thinks it belongs to ResourceFetcher B so it tells B to remove to
clean up the Resource
c. B uses the Resource to get the ResourceLoader, since this was cleared in 2a this is nullptr.
It then tries to remove nullptr from the loaders list which is empty as it was cleaned up in
step 2.
It seems that calling HashSet.remove(nullptr) on an empty HashSet causes corruption and we end up looping forever next time we call HashSet.contains which is what causes the hang. Not sure if that in itself is a problem. I think the issue is some bad assumptions when calling moveCachedNonBlockingResourceToBlocking.
Over to yoav@ since it looks like he's touched this method.
+cc yutak@: Should removing nullptr from an empty HashSet cause corruption silently like this?
,
Nov 30 2016
How'd you find that issue? Do you have any sense of how frequent it is? That said, the trace that marco has is definitely from chrome stable and we saw the exact same pattern a few months ago. Is it possible there could be another cause of the scrollbarPaintTimers issue?
,
Nov 30 2016
The repro steps are: 1. Open the notifications menu 2. Going up and down the list, open each notification in a new tab. Each notification is opened twice. ~16 tabs 3. Randomly change tabs and start closing them all. This seems to cause the page to lock up almost every time I tried, I would say about 9 out of 10 times it locks up. The scrollbar paint timer is used only on Mac. It's been around for many years. It seems to be trying to defer scrollbar painting until after load completes but until that happens it can spam these tasks. How many tabs are open in this trace? This does seem excessive but we're still pumping other tasks so it's not quite locked up like the issue I saw. I'm currently looking at ScrollAnimatorMac for other reasons so I'll take a closer look at the paint timer specifically. I'd like to get rid of it altogether.
,
Nov 30 2016
,
Nov 30 2016
We see a wide variety of crashes / hangs. In the trace where we saw the scrollbar issue chrome was still responsive, but burning CPU. This is a pretty common complaint internally. I think the title on this bug might not be quite descriptive of the scrollbar behavior
,
Nov 30 2016
Ok thanks. I've filed issue 669945 to dig into the scrollbar paint timer specifically. Will leave this as tracking the lock up fixed by Yoav's patch.
,
Dec 1 2016
Removing nullptr from a HashSet is a logic error; that may lead to undefined behavior. You can't put a nullptr in a HashSet, so it makes no sense to look up or remove a nullptr.
,
Dec 1 2016
That's not specifically the nullptr though right, it's just the HashTableDeletedValue? Also do we have DCHECKs for this?
,
Dec 1 2016
nullptr isn't a deleted value; it's an "empty value". The deleted value for pointers is reinterpret_cast<T*>(-1). And we have checks for empty values, so I assume we'll be hitting this if we try to do that: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/HashTable.h?q=checkKey&sq=package:chromium&dr=CSs&l=615 (I think it's a good idea to expand this check to deleted values, though)
,
Dec 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7497990e6eb19dc8dd61de4f188553c9c054cef9 commit 7497990e6eb19dc8dd61de4f188553c9c054cef9 Author: yoav <yoav@yoav.ws> Date: Thu Dec 01 07:13:06 2016 Fix memory corruption related to load blocking resource move This fixes an issue where a resource loader belonging to one ResourceFetcher was accidentally added as a blocking loader to another ResourceFetcher, by checking the loader is part of the already non-blocking loaders belonging to current ResourceFetcher. This also adds DCHECKs on a couple of methods removing loaders from hashmaps, to make sure we're not trying to remove a nullptr. BUG= 666563 Review-Url: https://codereview.chromium.org/2537303003 Cr-Commit-Position: refs/heads/master@{#435573} [modify] https://crrev.com/7497990e6eb19dc8dd61de4f188553c9c054cef9/third_party/WebKit/Source/core/fetch/ResourceFetcher.cpp [modify] https://crrev.com/7497990e6eb19dc8dd61de4f188553c9c054cef9/third_party/WebKit/Source/core/fetch/ResourceFetcherTest.cpp
,
Dec 1 2016
Should be fixed in ToT, so marking as such. Let me know if you still encounter this issue.
,
Dec 1 2016
Yoav, could you merge into M56? It's still early in the branch and this seems like a small enough fix. I've heard anecdotal reports of similar lockups and user frustration so this would be good to ship asap.
,
Dec 1 2016
Makes sense. Adding a merge-request label.
,
Dec 1 2016
Thanks. Removing "fixed" so it gets picked up.
,
Dec 2 2016
Your change meets the bar and is auto-approved for M56 (branch: 2924)
,
Dec 2 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d63d800a8ab17eb1174bdbc8f13afa1ffb10193f commit d63d800a8ab17eb1174bdbc8f13afa1ffb10193f Author: Yoav Weiss <yoav@yoav.ws> Date: Fri Dec 02 11:56:40 2016 Fix memory corruption related to load blocking resource move This fixes an issue where a resource loader belonging to one ResourceFetcher was accidentally added as a blocking loader to another ResourceFetcher, by checking the loader is part of the already non-blocking loaders belonging to current ResourceFetcher. This also adds DCHECKs on a couple of methods removing loaders from hashmaps, to make sure we're not trying to remove a nullptr. BUG= 666563 Review-Url: https://codereview.chromium.org/2537303003 Cr-Commit-Position: refs/heads/master@{#435573} (cherry picked from commit 7497990e6eb19dc8dd61de4f188553c9c054cef9) Review URL: https://codereview.chromium.org/2543203002 . Cr-Commit-Position: refs/branch-heads/2924@{#286} Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059} [modify] https://crrev.com/d63d800a8ab17eb1174bdbc8f13afa1ffb10193f/third_party/WebKit/Source/core/fetch/ResourceFetcher.cpp [modify] https://crrev.com/d63d800a8ab17eb1174bdbc8f13afa1ffb10193f/third_party/WebKit/Source/core/fetch/ResourceFetcherTest.cpp
,
Dec 2 2016
Merged to 56. Closing
,
Jan 11 2017
Issue 679975 has been merged into this issue. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by bmau...@fb.com
, Nov 18 2016