New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 712034 link

Starred by 1 user

Issue metadata

Status: Duplicate
Owner:
Last visit > 30 days ago
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Chrome spent 10s in DialServiceImpl::FinishDiscovery() on IO thread

Project Member Reported by asvitk...@chromium.org, Apr 16 2017

Issue description

Chrome Version: 57.0.2987.133 (Official Build) (64-bit)
OS: Windows 10

This happened on a Windows 10 partition on my personal 2016 MacBook Pro. Shortly after boot up and launching Chrome, it froze for a number of seconds.

Once it recovered, I went to check chrome://profiler and sorted by browser process & UI/IO thread to see if any task took a long time on the UI/IO threads and indeed saw the following on the IO thread:

Count	Total run time	Avg run time	Max run time**	Avg queue time	Max queue time	Birth thread	PID	Function name	Source location
1	10,085	10,085	10,085	0	0	Chrome_IOThread	796	DiscoverOnAddresses	dial_service.cc [512]

So the above says there was a function call that tooks 10s on the IO thread in browser process, with the function name of the FROM_HERE posted task being DiscoverOnAddresses in dial_service.cc. Looking at that code, we see the FROM_HERE invokes DialServiceImpl::FinishDiscovery():


    finish_timer_.Start(FROM_HERE, finish_delay_, this,
                        &DialServiceImpl::FinishDiscovery);

The problem is this work is being done on the IO thread, which is also responsible for routing IPCs - so during this 10s task, no IPCs were routed and thus Chrome was essentially frozen (e.g. UI was not interactive, pages did not respond).

If this work can be blocking - as it seems to be the case here - then it shouldn't be done on the IO thread - instead it should be on the blocking pool.

Assigning to zhaobin@ who recently touched this code. Please re-assign to a better owner if there is one. Thanks!
 

Comment 1 by w...@chromium.org, Apr 17 2017

Cc: imch...@chromium.org
Labels: -Type-Bug -Pri-2 M-57 Pri-1 Type-Bug-Regression
Cc: -thestig@chromium.org
I assume I'm CC'd because my name is in git blame. I only did some one time refactoring here.
Cc: -imch...@chromium.org mfo...@chromium.org zhaobin@chromium.org
Owner: imch...@chromium.org
The code in DialServiceImpl::FinishDiscovery is relatively straightforward. A possible culprit is UDPSocketWin::Close, which is called when we destroy the DialSockets. This issue has been previously reported:

https://bugs.chromium.org/p/chromium/issues/detail?id=165382
https://bugs.chromium.org/p/chromium/issues/detail?id=667820

My understanding from those bugs is closesocket() blocking is a long standing issue. A possible mitigation is to close the sockets on the blocking pool.

asvitkine@: If this repros again for you, and if this theory is correct, then the Net.UDPSocketWinClose histogram should some counts with >5s duration.
Will do if it happens again, though I don't use this partition often, so not sure if I'll have a chance to see it in action anytime soon.

Moving that call to the blocking pool sgtm.
Mergedinto: 749758
Status: Duplicate (was: Assigned)
Merging into new bug as there are more recent stack traces there.

Sign in to add a comment