ChromeMac frozen for ~2 minutes - stuck in content::ResourceDispatcherHostImpl::UpdateLoadStateOnUI |
|||||||||||
Issue descriptionChrome Version : 66.0.3343.3 OS Version: OS X 10.13.3 URLs (if applicable) : https://allo.google.com/web (maybe not applicable - that's what I had open though) What steps will reproduce the problem? 1. Computer was idle for a while - possibly sleeping 2. Unlock screen, try to interact with Chrome What is the expected result? Responsive Chrome. What happens instead of that? Button hover effects activate, but never deactivate. Clicking buttons/menus doesn not work. Tab loading spinner spins though. Took a sample: - 86% of samples are in a callback that invokes content::ResourceDispatcherHostImpl::UpdateLoadStateOnUI Maybe there is a huge backlog of messages that got dumped on the UI thread when I unlocked my screen. The samples point to some things being slow that really shouldn't be, like content::ResourceDispatcherHostImpl::UpdateLoadStateOnUI calling free() (~400 samples) GURL::~GURL (~30 samples) content::WebContentsImpl::FromRenderFrameHostID (~300 samples)
,
Mar 2 2018
Thanks for filing the issue! @Reporter: We kept our Mac idle for some time then tried unlocking followed by using Chrome browser, it was functioning well with out getting stuck. Could you please elaborate whether you are running anything before the Machine is kept idle. It would be highly helpful if clarified on the "content::ResourceDispatcherHostImpl::UpdateLoadStateOnUI" as ET team is not very clear what exactly does that mean! Any further inputs from your end may help us to triage the issue in a better way.
,
Mar 2 2018
Adding Internals>Network so it gets on networking team's queue (since it's sort of a content/browser/loader thing) tapted: is possible you have many outstanding network requests when you opened your laptop? This method iterates through all of them so it could be a source of slowness. We call a function that posts this task to the UI thread every 250ms. Perhaps we could implement this with cancellation so if multiple tasks are posted, we cancel previous ones that haven't run yet.
,
Mar 2 2018
I also see ~1000 counts for history::HistoryBackend::GetCountsAndLastVisitForOrigins, due to some calls to history::HistoryService::NotifyURLsDeleted.
,
Mar 6 2018
Removing Needs-Feedback for now, dev team knows what UpdateLoadStateOnUI is. The internal timeticks clock keeps running on Mac when the system is sleeping. I wonder if somehow that could cause repeating timers to go haywire on resume, causing a backlog of tasks. I think a new UpdateLoadStateonUI only needs to be scheduled after the previous one runs, which is how I thought the base timers worked. +gab,fdoray if they've seen something like it or if my understanding is off. We can probably optimize the load state updates but unless you have many many resources in flight I can't imagine it'll be too slow. If I have time this triage shift I may be able to land a cleanup or two.
,
Mar 6 2018
Even with millions of requests in flight, UpdateLoadStateOnUI only iterates over frames (Main frames? Not sure), not requests, so this seems a really weird place to hang.
,
Mar 6 2018
I think we do iterate through all the requests to generate the WebContents map, and indeed in the trace it seems like PickMoreInterestingLoadInfo is a bottleneck.
,
Mar 6 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/911fa064e3cf8c8ab006b39277d01094e05bfdf1 commit 911fa064e3cf8c8ab006b39277d01094e05bfdf1 Author: Charles Harrison <csharrison@chromium.org> Date: Tue Mar 06 21:01:46 2018 Change LoadInfo to only store URL hosts The WebContentsImpl is the only consumer of this information, and it only cares about the host. URLs can be huge (up to 2MB), while hosts are much much smaller. It isn't a good idea to copy the URL for every outgoing request to the UI thread every 250ms. Note: previously this code did some grouping by RVid (I think) before thread hopping. This would further alleviate copying issues when there are lots of requests. Probably we could just use two sets: 1. Global frame routing id -> LoadInfo for subresources 2. All navigation requests Bug: 813445 Change-Id: Ied3709fac7bba079d5e31694dc8e477d1c7f2f68 Reviewed-on: https://chromium-review.googlesource.com/951364 Reviewed-by: Matt Menke <mmenke@chromium.org> Reviewed-by: Avi Drissman <avi@chromium.org> Commit-Queue: Charlie Harrison <csharrison@chromium.org> Cr-Commit-Position: refs/heads/master@{#541189} [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/loader/loader_delegate.h [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/loader/resource_dispatcher_host_impl.cc [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/loader/resource_dispatcher_host_impl.h [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/loader/resource_dispatcher_host_unittest.cc [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/loader_delegate_impl.cc [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/loader_delegate_impl.h [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/web_contents/web_contents_impl.cc [modify] https://crrev.com/911fa064e3cf8c8ab006b39277d01094e05bfdf1/content/browser/web_contents/web_contents_impl.h
,
Mar 8 2018
Changing the status to Untriaged and requesting csharrison@ to update the bug status accordingly if there is no further work to be done here. Thanks!
,
Mar 8 2018
I have another fix to UpdateLoadState in review at https://chromium-review.googlesource.com/c/chromium/src/+/951743. However, these will only mitigate the issue, not solve root cause which seems like it could be due to a large amount of tasks queuing up during Mac sleep. I took a look through the tracker and it seems like issue 808031 is quite similar.
,
Mar 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4c135b3c228cfb4c595c8e760428f1ef6a6bb966 commit 4c135b3c228cfb4c595c8e760428f1ef6a6bb966 Author: Charles Harrison <csharrison@chromium.org> Date: Fri Mar 09 18:23:05 2018 Aggregate LoadState per-frame before copying to UI Currently we copy hosts + other state for every outgoing request to the UI thread before doing per-WebContents aggregation there. This can involve a lot of copying of data around. This CL adds some initial per-frame aggregation beforehand to reduce this. Also, we add a browsertest for this functionality, since none previously existed. Bug: 813445 Cq-Include-Trybots: master.tryserver.chromium.linux:linux_mojo Change-Id: I33438bdd5d4cab9caf3fcd07419fef3f7153935e Reviewed-on: https://chromium-review.googlesource.com/951743 Reviewed-by: Avi Drissman <avi@chromium.org> Reviewed-by: Matt Menke <mmenke@chromium.org> Commit-Queue: Charlie Harrison <csharrison@chromium.org> Cr-Commit-Position: refs/heads/master@{#542157} [modify] https://crrev.com/4c135b3c228cfb4c595c8e760428f1ef6a6bb966/content/browser/loader/resource_dispatcher_host_impl.cc [modify] https://crrev.com/4c135b3c228cfb4c595c8e760428f1ef6a6bb966/content/browser/loader/resource_dispatcher_host_impl.h [modify] https://crrev.com/4c135b3c228cfb4c595c8e760428f1ef6a6bb966/content/browser/web_contents/web_contents_impl_browsertest.cc [modify] https://crrev.com/4c135b3c228cfb4c595c8e760428f1ef6a6bb966/content/test/BUILD.gn [modify] https://crrev.com/4c135b3c228cfb4c595c8e760428f1ef6a6bb966/testing/buildbot/filters/mojo.fyi.network_content_browsertests.filter
,
Mar 14 2018
Loading triage. Let me assign csharrison@ and change the status to 'Started'.
,
Mar 14 2018
Sorry, I was only really working on this because I was on triage duty and had some spare cycles. I implemented two low hanging fruits which should improve things but might not really get to the root cause of the issue (task buildup?). It could use more triage.
,
Mar 14 2018
This is expected to be completely rewritten for the network service - probably not worth digging into.
,
Mar 14 2018
Re. lots of tasks on wakeup : we've discussed this previously in issue 707962 (hadn't realized it was closed, need to open new one). We know that the jump in TimeTicks on system wake up causes a lot of delayed tasks to expire (and they all want to execute *right now* even if it was delayed by 6 hours initially... We're thinking of addressing that with TaskScheduler) Re. Repeating timers : they shouldn't have more than one task each in the queue (the next delay is only posted when the current one runs).
,
Mar 15 2018
Thanks for the extra info gab. Unless there's some bug in our repeating timer implementation (which seems unlikely, and I recently looked through the code to verify behavior), I think this may just be caused by having tons of outstanding requests, with potentially long URLs. tapted: If you can repro on a newer canary with these fixes it would falsify this hypothesis.
,
Mar 15 2018
I'll keep an eye out - haven't seen anything since filing this. (and thanks for taking a look!)
,
Apr 16 2018
#15: this could also be related to issue 808031. In any case, since tapted@ hasn't seen anything further in a month (I assume? :)), I'll mark this one Fixed and let the remaining work be tracked on issue 808031. |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by ajha@chromium.org
, Feb 19 2018