Issue metadata
Sign in to add a comment
|
Chrome spent 10s in DialServiceImpl::FinishDiscovery() on IO thread |
||||||||||||||||||||||
Issue description
Chrome Version: 57.0.2987.133 (Official Build) (64-bit)
OS: Windows 10
This happened on a Windows 10 partition on my personal 2016 MacBook Pro. Shortly after boot up and launching Chrome, it froze for a number of seconds.
Once it recovered, I went to check chrome://profiler and sorted by browser process & UI/IO thread to see if any task took a long time on the UI/IO threads and indeed saw the following on the IO thread:
Count Total run time Avg run time Max run time** Avg queue time Max queue time Birth thread PID Function name Source location
1 10,085 10,085 10,085 0 0 Chrome_IOThread 796 DiscoverOnAddresses dial_service.cc [512]
So the above says there was a function call that tooks 10s on the IO thread in browser process, with the function name of the FROM_HERE posted task being DiscoverOnAddresses in dial_service.cc. Looking at that code, we see the FROM_HERE invokes DialServiceImpl::FinishDiscovery():
finish_timer_.Start(FROM_HERE, finish_delay_, this,
&DialServiceImpl::FinishDiscovery);
The problem is this work is being done on the IO thread, which is also responsible for routing IPCs - so during this 10s task, no IPCs were routed and thus Chrome was essentially frozen (e.g. UI was not interactive, pages did not respond).
If this work can be blocking - as it seems to be the case here - then it shouldn't be done on the IO thread - instead it should be on the blocking pool.
Assigning to zhaobin@ who recently touched this code. Please re-assign to a better owner if there is one. Thanks!
,
Apr 17 2017
I assume I'm CC'd because my name is in git blame. I only did some one time refactoring here.
,
Apr 18 2017
The code in DialServiceImpl::FinishDiscovery is relatively straightforward. A possible culprit is UDPSocketWin::Close, which is called when we destroy the DialSockets. This issue has been previously reported: https://bugs.chromium.org/p/chromium/issues/detail?id=165382 https://bugs.chromium.org/p/chromium/issues/detail?id=667820 My understanding from those bugs is closesocket() blocking is a long standing issue. A possible mitigation is to close the sockets on the blocking pool. asvitkine@: If this repros again for you, and if this theory is correct, then the Net.UDPSocketWinClose histogram should some counts with >5s duration.
,
Apr 18 2017
Will do if it happens again, though I don't use this partition often, so not sure if I'll have a chance to see it in action anytime soon. Moving that call to the blocking pool sgtm.
,
Jul 28 2017
Merging into new bug as there are more recent stack traces there. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by w...@chromium.org
, Apr 17 2017Labels: -Type-Bug -Pri-2 M-57 Pri-1 Type-Bug-Regression