Chrome: Crash Report - [ThreadWatcher IO hang] net::UDPSocketWin::SetDiffServCodePoint |
||||||||||||||||||||||
Issue descriptionreporter:ajha@google.com Magic Signature: [ThreadWatcher IO hang] net::UDPSocketWin::SetDiffServCodePoint Crash link: https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D'%5BThreadWatcher%20IO%20hang%5D%20net%3A%3AUDPSocketWin%3A%3ASetDiffServCodePoint'%20AND%20product.name%3D'Chrome'%20AND%20product.Version%3D'66.0.3346.0'&stbtiq=&reportid=&index=0 ------------------------------------------------------------------------------- Sample Report ------------------------------------------------------------------------------- Product name: Chrome Magic Signature : [ThreadWatcher IO hang] net::UDPSocketWin::SetDiffServCodePoint Product Version: 66.0.3346.0 Process type: browser Report ID: a1881f2ff672c633 Report Url: https://crash.corp.google.com/a1881f2ff672c633 Report Time: 2018-02-13T20:42:31-08:00 Upload Time: 2018-02-13T20:42:37.442-08:00 Uptime: 16890000 ms CumulativeProductUptime: 0 ms OS Name: Windows NT OS Version: 6.1.7601 24000 CPU Architecture: amd64 CPU Info: family 6 model 60 stepping 3 ------------------------------------------------------------------------------- Crashing thread: Thread index: 9. Stack Quality: 84%. Thread id: 6620. ------------------------------------------------------------------------------- 0x7747a0fa (ntdll.dll + 0x0006a0fa) NtAlpcSendWaitReceivePort 0x000007fefd5808c2 (RPCRT4.dll + 0x000e08c2) Ndr64pAsyncAbortCall(_RPC_ASYNC_STATE *,_NDR_ASYNC_MESSAGE *,unsigned long,int) 0x000007fefd580dfc (RPCRT4.dll + 0x000e0dfc) Ndr64pCompleteAsyncServerCall(_RPC_ASYNC_STATE *,_NDR_ASYNC_MESSAGE *,void *) 0x000007fefd676d21 (sechost.dll + 0x00006d21) StartServiceA 0x000007fef7fa3216 (qwave.dll + 0x00013216) StartQOS2Service 0x000007fef7fa3442 (qwave.dll + 0x00013442) QOSCreateHandle 0x000007fed4baad4b (chrome.dll + 0x00bdad4b) 0x000007fed508dd88 (chrome.dll - udp_socket_win.cc: 1137) net::UDPSocketWin::SetDiffServCodePoint(net::DiffServCodePoint) 0x000007fed4bb18ca (chrome.dll - socket_host_udp.cc: 432) content::P2PSocketHostUdp::SetOption(content::P2PSocketOption,int) 0x000007fed4baad85 (chrome.dll - socket_dispatcher_host.cc: 347) content::P2PSocketDispatcherHost::OnSetOption(int,content::P2PSocketOption,int) 0x000007fed4baac64 (chrome.dll - ipc_message_templates.h: 146) IPC::MessageT<P2PHostMsg_SetOption_Meta,std::tuple<int,content::P2PSocketOption,int>,void>::Dispatch<content::P2PSocketDispatcherHost,content::P2PSocketDispatcherHost,void,void (content::P2PSocketDispatcherHost::*)(int, content::P2PSocketOption, int)> 0x000007fed4ba99c3 (chrome.dll - socket_dispatcher_host.cc: 157) content::P2PSocketDispatcherHost::OnMessageReceived(IPC::Message const &) 0x000007fed429389e (chrome.dll - browser_message_filter.cc: 73) content::BrowserMessageFilter::Internal::OnMessageReceived(IPC::Message const &) 0x000007fed506e073 (chrome.dll - message_filter_router.cc: 22) IPC::`anonymous namespace'::TryFiltersImpl 0x000007fed506a170 (chrome.dll - ipc_channel_proxy.cc: 86) IPC::ChannelProxy::Context::TryFilters(IPC::Message const &) 0x000007fed506a262 (chrome.dll - ipc_channel_proxy.cc: 121) IPC::ChannelProxy::Context::OnMessageReceived(IPC::Message const &) 0x000007fed429352b (chrome.dll - ipc_channel_mojo.cc: 257) IPC::ChannelMojo::OnMessageReceived(IPC::Message const &) 0x000007fed506ddfb (chrome.dll - ipc_message_pipe_reader.cc: 113) IPC::internal::MessagePipeReader::Receive(base::span<unsigned char const >,base::Optional<std::vector<mojo::StructPtr<IPC::mojom::SerializedHandle>,std::allocator<mojo::StructPtr<IPC::mojom::SerializedHandle> > > >) 0x000007fed428feb9 (chrome.dll - ipc.mojom.cc: 289) IPC::mojom::ChannelStubDispatch::Accept(IPC::mojom::Channel *,mojo::Message *) 0x000007fed506b4ed (chrome.dll - ipc_mojo_bootstrap.cc: 766) IPC::`anonymous namespace'::ChannelAssociatedGroupController::Accept 0x000007fed40b8dfc (chrome.dll - connector.cc: 444) mojo::Connector::ReadSingleMessage(unsigned int *) 0x000007fed40b8c6b (chrome.dll - connector.cc: 474) mojo::Connector::ReadAllAvailableMessages() 0x000007fed40b8b3c (chrome.dll - simple_watcher.cc: 276) mojo::SimpleWatcher::OnHandleReady(int,unsigned int,mojo::HandleSignalsState const &) 0x000007fed3fee759 (chrome.dll - task_annotator.cc: 55) base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) 0x000007fed3fee03b (chrome.dll - message_loop.cc: 395) base::MessageLoop::RunTask(base::PendingTask *) 0x000007fed3fed937 (chrome.dll - message_loop.cc: 451) base::MessageLoop::DoWork() 0x000007fed3ffd8b5 (chrome.dll - message_pump_win.cc: 475) base::MessagePumpForIO::DoRunLoop() 0x000007fed3ffd757 (chrome.dll - message_pump_win.cc: 56) base::MessagePumpWin::Run(base::MessagePump::Delegate *) 0x000007fed3fed364 (chrome.dll - run_loop.cc: 133) base::RunLoop::Run() 0x000007fed40a4b17 (chrome.dll - browser_thread_impl.cc: 215) content::BrowserThreadImpl::IOThreadRun(base::RunLoop *) 0x000007fed4083440 (chrome.dll - browser_thread_impl.cc: 242) content::BrowserThreadImpl::Run(base::RunLoop *) 0x000007fed3fec389 (chrome.dll - thread.cc: 338) base::Thread::ThreadMain() 0x000007fed4eeba83 (chrome.dll - platform_thread_win.cc: 91) base::`anonymous namespace'::ThreadFunc 0x773059cc (kernel32.dll + 0x000159cc) BaseThreadInitThunk 0x7746383c (ntdll.dll + 0x0005383c) RtlUserThreadStart ------------------------------------------------------------------------------- Regression range ------------------------------------------------------------------------------- Manual regression range finder link: https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D'%5BThreadWatcher%20IO%20hang%5D%20net%3A%3AUDPSocketWin%3A%3ASetDiffServCodePoint'#-property-selector,-samplereports,+productname,+productversion:1000,+directory,-clientid,+operatingsystem,+url,+simplifiedurl,+extensions
,
Feb 14 2018
Users experienced this crash on the following builds: Win Canary 66.0.3346.0 - 1.43 CPM, 19 reports, 19 clients (signature [ThreadWatcher IO hang] net::UDPSocketWin::SetDiffServCodePoint) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Feb 15 2018
Re: comment #1: this is not a crash, but rather slowness (in this case, it seems like Windows is blocking a QOS call on a socket while it's waiting for some service to startup?), and regression classification is likely wrong --- it's an improvement in how well those sorts of things are detected by the CL that you've linked to; so I suspect the severity/priority/blocking classifications are overly severe. I guess the fix would be to make those async and use threads appropriately if the API permits it; there don't seem to be too many uses but they're non-trivial. CC hubbe as the author of QWAVE-using code (in 2014!) and as knowing a lot about the media stuff that uses QoS on UDP sockets --- should this go to Internals > Media as well?
,
Feb 15 2018
[+sergeyu, +juberti]: How important is calling SetDiffServCodePoint on Windows? Seems to be a not insignificant source of hangs.
,
Feb 16 2018
,
Feb 16 2018
Issue 806779 has been merged into this issue.
,
Feb 21 2018
,
Feb 22 2018
,
Feb 23 2018
[+xjz, +miu@ who are working on revamping media streaming's UDP transport]: I am assuming SetDiffServCodePoint is important or media streaming. This is FYI that SetDiffServCodePoint on Windows can sometimes block.
,
Mar 1 2018
Assigning bug to juberti - I've been unable to get any response from either WebRTC team member who owns that directory, and am not comfortable modifying the API or their code without some discussion.
,
Mar 1 2018
SetDiffServCodePoint is somewhat important to us; using DSCP markings is a "SHOULD" in the WebRTC transport standard, and we believe enterprises will want this to work. It would be unfortunate if we had to disable it completely due to a relatively infrequent crash. Could we work around this by calling QOSCreateHandle on another thread? That doesn't sound too hard. It would mean some packets could be sent before the DSCP tag is set, but I think we can live with that.
,
Mar 1 2018
I assume that would work, though am certainly not an expert on the safety of using sockets on Windows across threads, and don't think we want to be setting socket properties on one thread while reading from/writing to a socket on another thread.
,
Mar 1 2018
Well, as far as we know, it's only QOSCreateHandle that hangs, and it doesn't deal with an individual socket; it's for initializing the whole QOS subsystem. We'd still only interact with sockets on one thread.
,
Mar 1 2018
I'm not following. qos_handle_ is a per-UDP socket variable, and qos.CreateHandle is called on the first SetDiffServCodePoint call for every socket.
,
Mar 1 2018
Sorry, when you said "the safety of using sockets on Windows across threads," I assumed you were talking about the Windows socket handle ("socket_"), not the UDPSocketWin class. We'd definitely need some form of synchronization in UDPSocketWin.
,
Mar 2 2018
,
Mar 2 2018
I think it should be safe to initialize QOS API on a background thread and then call qos.AddSocketToFlow() on the threads that owns the socket.
,
Mar 2 2018
deadbeef@, do you know if there are any perf metrics that improved by this feature? If not then I suggest dropping it, at least on Windows where it looks like we don't have a good API to support this feature. Even if the standard says it SHOULD be supported I don't think chrome should implement this feature unless we know it improves UX.
,
Mar 2 2018
Re #4, it would be good to fix this, but nothing would stop working if we had to disable this functionality temporarily. Sounds like deadbeef has identified a viable workaround though.
,
Mar 4 2018
Users experienced this crash on the following builds: Win Canary 67.0.3360.0 - 1.19 CPM, 14 reports, 14 clients (signature [ThreadWatcher IO hang] net::UDPSocketWin::SetDiffServCodePoint) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Mar 13 2018
Users experienced this crash on the following builds: Win Canary 67.0.3368.1 - 1.18 CPM, 12 reports, 12 clients (signature [ThreadWatcher IO hang] net::UDPSocketWin::SetDiffServCodePoint) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Mar 19 2018
Just to update: [ThreadWatcher IO hang] net::UDPSocketWin::SetDiffServCodePoint Still seeing 12 instances from 12 clients so far on latest Canary-67.0.3374.0. 67.0.3374.0 0.64% 12 -Canary 67.0.3371.0 1.48% 28 -Dev Link to the list of builds: -------------------------- https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BThreadWatcher%20IO%20hang%5D%20net%3A%3AUDPSocketWin%3A%3ASetDiffServCodePoint%27#-property-selector,-samplereports,productversion:1000,+magicsignature:50,directory,-magicsignature2:50,-stablesignature:50,-clientid,url,simplifiedurl,extensions,-magicsignaturesorted:50 deadbeef@, Could you please take a look as it is marked as stable blocker. Thanks..!
,
Mar 27 2018
Just to update: net::UDPSocketWin::SetDiffServCodePoint No instances seen on any of the latest channels Canary- 67.0.3379.0, Dev-67.0.3377.1, Beta-66.0.3359.45 & Stable-65.0.3325.181 on Windows OS. Link to the list of builds: -------------------------- https://goto.google.com/vacjq Thanks..!
,
Mar 27 2018
Per discussion on 824716, classification of thread hangs was likely broken for a few days (though not sure what's up with stable).
,
Apr 2 2018
Just a heads up, M66 Stable cut is on April 12th, 10 days away. This issue is marked as RB-Stable for 66. Please make sure to address this issue prior to stable cut. Thanks!
,
Apr 5 2018
A couple questions: 1) Is this causing an actual browser crash? Following the stack track trace of the MagicSignature (#22) it looks like because of the updated check in thread_watcher.cc it now calls metrics::CrashBecauseThreadWasUnresponsive(thread_id_). This ends up calling base::debug::DumpWithoutCrashing(), which I'm not sure actually causes any crashing or dumping. It looks like the handler set for DumpWithoutCrashing() in crashpad.cc is CRASHPAD_SIMULATE_CRASH() which "Captures the CPU context and simulates an exception without crashing". The reason I'm asking is because as mentioned in #3, this is rather slowness than a crash. It is also an issue that already existed, but only recently showed up because of crbug.com/804345 , that now reports hidden hanging threads. This brings me to the next question... 2) Can remove this from being labeled as ReleaseBlock-Stable? The minor hanging call already existed in previous versions, and for WebRTC we would still like to have this. In the meantime we can still make a fix by making the CreateHandle call on a background thread. This all depends on the answer to question 1.
,
Apr 5 2018
No, it's simply locking up the IOThread, not a crash. I can't answer number 2.
,
Apr 7 2018
I dug into this more and it looks like for the WebRTC case UDPSocketWin::SetDiffServCodePoint isn't working properly because QOSAddSocketToFlow requires that we provide a destination address (when the socket is unconnected), which isn't provided. zstein@ has experience trying to fix this, so I'm referring him here to confirm. If this is confirmed^, WebRTC should be fine with disabling this (because it isn't working). Although, I believe Cast is using this as well, so it's probably best to check with other teams before this is disabled. I also want to emphasize the question if it's okay to remove this from being release blocking. This is behavior that already existed in previous Chrome versions, the only difference is it is now being detected. Who can I add to this bug to confirm if this can be removed?
,
Apr 9 2018
I can confirm that SetDiffServCodePoint doesn't work for unconnected sockets on Windows, which is the WebRTC use case. I think cast uses connected sockets though, and disabling here would also disable for them. I'm not sure how/if they have quantified the effect of this feature.
,
Apr 9 2018
Yuri (miu@), could you please let us know if UDPSocketWin::SetDiffServCodePoint is something that cast needs? Since we've confirmed that it isn't working for the WebRTC use case, we can disable it if it's not needed for cast.
,
Apr 9 2018
Removing this from being a release block & a regression bug because this is previous behavior.
,
Apr 10 2018
Cast streaming currently calls SetDiffServCodePoint. And on Windows, we only call it on Win8 or later version (might due to some problems on old versions). The crash seems only happened when P2PSocketHostUdp calls it, which is not from the cast streaming path. So maybe just disabling the call from P2PSocketHostUdp?
,
Apr 10 2018
Had a quick chat with Yuri. It sounds that by setting a preferred DSCP value might improve performance a little bit according to some tests years ago, but not that significant. It is fine to disable it for cast streaming.
,
Apr 11 2018
Either way, as xjz@ said. However, if crashes are not being caused by Cast Streaming (we don't use P2PSocketHost), then feel free to leave it turned on unless its use is causing other problems.
,
Apr 12 2018
I made a change just to disable this for P2PSocketHostUdp. When it lands I'll mark this bug as fixed with the intention that we come back to this when we make a fix for our use case of UDPSocketWin::SetDiffServCodePoint. https://chromium-review.googlesource.com/c/chromium/src/+/1005961/
,
Apr 18 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e4f3fcad5d5704af822be59864b97831f838a6d4 commit e4f3fcad5d5704af822be59864b97831f838a6d4 Author: Seth Hampson <shampson@chromium.org> Date: Wed Apr 18 19:21:54 2018 Disabling call to SetDiffServCodePoint for Windows. Chrome that was previously undetected until now. This hangs on QOSCreateHandle which is created to set DSCP labels appropriately for Windows. This change disables the call being made from the P2PSocketHostUdp object, which is used in WebRTC code. UDPSocketWin: :SetDiffServCodePoint creates a hanging IO thread in Bug: 812137 Change-Id: I57a252a9f832aebcd02a9b4d6de781689c5031c0 Reviewed-on: https://chromium-review.googlesource.com/1005961 Commit-Queue: Seth Hampson <shampson@chromium.org> Reviewed-by: Sergey Ulanov <sergeyu@chromium.org> Reviewed-by: Taylor Brandstetter <deadbeef@chromium.org> Cr-Commit-Position: refs/heads/master@{#551777} [modify] https://crrev.com/e4f3fcad5d5704af822be59864b97831f838a6d4/content/browser/renderer_host/p2p/socket_host_udp.cc [modify] https://crrev.com/e4f3fcad5d5704af822be59864b97831f838a6d4/content/browser/renderer_host/p2p/socket_host_udp.h
,
Apr 19 2018
,
Apr 20 2018
Issue 833354 has been merged into this issue.
,
Apr 25 2018
Requesting merge to M67. I highly suspect this is the root cause of crbug.com/829831 , which has 64 stars and is marked ReleaseBlock-Stable. It turns out this issue has a wider impact than we earlier thought. We assumed it would only occur if you used the "googDscp" constraint to enable DSCP, but it turns out that even when DSCP is disabled, we call "SetOption(OPT_DSCP, DSCP_DEFAULT)". So *every* WebRTC connection can potentially encounter this issue.
,
Apr 26 2018
Your change meets the bar and is auto-approved for M67. Please go ahead and merge the CL to branch 3396 manually. Please contact milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 26 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/44705f2cb7f143b67930694dba9ddcb1c027b9de commit 44705f2cb7f143b67930694dba9ddcb1c027b9de Author: Seth Hampson <shampson@chromium.org> Date: Thu Apr 26 21:01:57 2018 Disabling call to SetDiffServCodePoint for Windows. Fixes hang that was previously undetected until now. This hangs on QOSCreateHandle which is created to set DSCP labels appropriately for Windows. This change disables the call being made from the P2PSocketHostUdp object, which is used in WebRTC code. UDPSocketWin: :SetDiffServCodePoint creates a hanging IO thread in Bug: 812137 Change-Id: I57a252a9f832aebcd02a9b4d6de781689c5031c0 Reviewed-on: https://chromium-review.googlesource.com/1005961 Commit-Queue: Seth Hampson <shampson@chromium.org> Reviewed-by: Sergey Ulanov <sergeyu@chromium.org> Reviewed-by: Taylor Brandstetter <deadbeef@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#551777}(cherry picked from commit e4f3fcad5d5704af822be59864b97831f838a6d4) Reviewed-on: https://chromium-review.googlesource.com/1031270 Cr-Commit-Position: refs/branch-heads/3396@{#343} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/44705f2cb7f143b67930694dba9ddcb1c027b9de/content/browser/renderer_host/p2p/socket_host_udp.cc [modify] https://crrev.com/44705f2cb7f143b67930694dba9ddcb1c027b9de/content/browser/renderer_host/p2p/socket_host_udp.h
,
May 10 2018
Issue 829831 has been merged into this issue.
,
Oct 9
As another interesting data point: https://social.msdn.microsoft.com/Forums/office/en-US/821b039a-e2ec-4e33-ad41-e6b126adf07b/qoscreatehandle-lengthy-after-first-boot?forum=networkqosqwave So the QWave service must be lazily-started the first time QOSCreateHandle is called :)
,
Oct 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/66923ce781889c8fae9e56d2963dddc7d461f012 commit 66923ce781889c8fae9e56d2963dddc7d461f012 Author: Tim Haloun <thaloun@chromium.org> Date: Sat Oct 27 01:01:06 2018 Re-enable QoS for UDPSocket under Windows. - Move QoSCreateHandle to a background task, letting critical socket operations continue unblocked while it executes. - Improve resetting handle when it returns REINITIALIZATION_NEEDED - Widen unit tests - Undo ifdef disabling QOS that was added because of hangs/stalls noted in bug 812137 Cq-Include-Trybots: luci.chromium.try:linux_mojo Change-Id: I383955c2a218f23974a3f5e7044ae0464e99a43f Bug: 812137 , webrtc:5836 Reviewed-on: https://chromium-review.googlesource.com/c/1195936 Reviewed-by: Ryan Hamilton <rch@chromium.org> Reviewed-by: Sergey Ulanov <sergeyu@chromium.org> Reviewed-by: Wez <wez@chromium.org> Commit-Queue: Sergey Ulanov <sergeyu@chromium.org> Commit-Queue: Tim Haloun <thaloun@chromium.org> Cr-Commit-Position: refs/heads/master@{#603269} [modify] https://crrev.com/66923ce781889c8fae9e56d2963dddc7d461f012/net/socket/udp_socket_unittest.cc [modify] https://crrev.com/66923ce781889c8fae9e56d2963dddc7d461f012/net/socket/udp_socket_win.cc [modify] https://crrev.com/66923ce781889c8fae9e56d2963dddc7d461f012/net/socket/udp_socket_win.h [modify] https://crrev.com/66923ce781889c8fae9e56d2963dddc7d461f012/services/network/p2p/socket_udp.cc |
||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||
Comment 1 by ajha@chromium.org
, Feb 14 2018Components: Internals>Network
Labels: -Type-Bug -Pri-2 Stability-Hang ReleaseBlock-Stable TE-CrashTriage M-66 FoundIn-66 Target-66 Pri-1 Type-Bug-Regression