Randomly fails to load while fetching multiple resources
Reported by
ashlaa...@gmail.com,
Oct 26
|
|||||||||||||||||
Issue descriptionUserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Steps to reproduce the problem: 1. Visit our PWA in a new incognito window: https://editor.construct.net/r122/ 2. Observe console messages/network tab This is sporadic so retry several times if the problem does not reproduce. What is the expected behavior? All fetch requests to succeed and offline caching to complete, as it does in Firefox and Edge. What went wrong? Seemingly randomly, there are cascades of failed fetches. The set of failed fetches appear to be different every time, and they appear in dev tools as "(canceled)" with no server response, suggesting it is a client-side issue. There is nothing in our code that cancels fetches, so presumably the browser is cancelling them. The consequences range from the PWA failing to load entirely, to successfully loading but failing to cache for use offline, or just working normally in some cases. Did this work before? Yes not sure - seems to be within the past couple of months Does this work in other browsers? Yes Chrome version: 70.0.3538.77 Channel: stable OS Version: 10.0 Flash Version: This also reproduces in Canary 72.0.3592.0. It also reproduces without dev tools open (I saw a separate issue that involved problems with fetches while dev tools is open). It is also not unique to my system, since we are getting reports from users about it (e.g. https://github.com/Scirra/Construct-3-bugs/issues/2189). We have never observed this issue happening in Firefox or Chrome.
,
Oct 26
Screenshot of seemingly random cascades of canceled fetches in dev tools: https://www.dropbox.com/s/sgz5nupq83wlgcs/chrome-fetch-failures.png?dl=0
,
Oct 26
Hm, the screenshot in c#2 looks like issue 893459 , which should be fixed in recent canaries. But issue 893459 was just a DevTools UI bug and wouldn't cause real loading failures (even with DevTools open). So it wouldn't explain the real loading failures. The screenshot in https://github.com/Scirra/Construct-3-bugs/issues/2189 is definitely not issue 893459 . The failures are net::ERR_FAILED rather than (canceled). That screenshot looks like another issue: issue 899192 .
,
Oct 29
,
Oct 29
Tried testing the issue on chrome reported version# 70.0.3538.77 using Windows-10 with steps mentioned below: 1) Launched chrome reported version, opened Incognito window and navigated to URL: https://editor.construct.net/r122/ 2) Opened Devtools > Network tab, Clicked Ctrl+R, observed many instances got generated in network tab and seen some of the instances with Canceled Status. @Reporter: Please find the attached screencast for your reference and let us know if you are observing the same issue that we have captured in the screencast. Provide your feedback on it which help in further triaging it in better way. Thanks!
,
Oct 29
Yes, it looks like you have reproduced the issue in that video.
,
Oct 30
Able to reproduce the issue on chrome reported version# 70.0.3538.77 and on latest chrome# 72.0.3595.0 with URL provided in comment# 0 using Windows-10, Mac 10.12.6 and Ubuntu 17.10. As this issue is seen from M-60(60.0.3112.0), hence considering this issue as Non-Regression and marking it as Untriaged. Thanks!
,
Nov 2
I can repro the cascade of (canceled) on Linux 72 with both NetworkService/ServiceWorkerServicification on and off. But the page seems to load fine. It could be the (canceled) are a red herring. I'm also curious why this happens also in M60. Could it be a change to the site? Could loading/devtools have ideas about in what cases a request ends up shown as (canceled) in the Network tab?
,
Nov 2
Our telemetry shows crashes happening that appear to correlate with cascades of fetch failures. This matches a report from a customer (https://github.com/Scirra/Construct-3-bugs/issues/2189). I don't know if it's the same issue, but it seems it could be related.
,
Nov 2
Based on [1], DevTools should show (canceled) when the completion status has an "aborted" error, given when "An operation was aborted (due to user action)." We don't currently get an aborted reason. cc caseq@ in case you have ideas. [1] https://cs.chromium.org/chromium/src/content/browser/devtools/protocol/network_handler.cc?q=f:devtools+network+loadingfailed+-f:out&sq=package:chromium&dr=C&l=1728
,
Nov 6
Just caught another repro of this myself - the PWA failed to load, and the console reported this: https://www.dropbox.com/s/zype9r5sm4356x6/issue899196-repro.png?dl=0 I didn't have dev tools open initially so I can't check what's in the network tab. It looks like some fetches randomly fail with net:ERR_FAILED, throwing "TypeError: Failed to fetch", which then kills loading the PWA. I hit reload with dev tools open and the cascade of failed fetches happened again, although this time with a few showing "(failed)" instead of "(cancelled)": https://www.dropbox.com/s/rrltu5lwgv2oxua/issue899196-repro2.png?dl=0 The "failed" fetches don't have a response, so it looks like they didn't make it to network.
,
Nov 6
BTW the failures cancel the offline caching and are preventing our PWA working offline as well.
,
Nov 7
removing Blink>loader component falken: Would you triage this issue?
,
Nov 8
I could see bunch of "(canceled)" and "(failed)" requests. Let me check why this happens.
,
Nov 8
I'm now trying to create a simple webapp to repro the same error.
The construct's service worker looks doing something like the following:
--
self.onfetch = e => {
let p = fetch(e.request);
e.waitUntil(p.then(() => fetch(e.request.url + `?${Math.random()}`));
e.respondWith(p);
}
--
I suspected something might happen when the number of requests is big, but so far I could not reproduce the error.
,
Nov 8
The random query string approach is a workaround to fetch cache options not being supported. It was more or less an official recommendation IIRC. Fetch cache options are now supported, but I'd be nervous about removing it given the prevalence of Chromium forks that update slower, e.g. Samsung Internet browser. We've already been burned trying to remove legacy code too soon and breaking older Chromium versions that people still use...
,
Nov 9
I looked at netlog and net stack seems to return ERR_ABORTED sometimes. This is the netlog I'm now investigating: https://drive.google.com/file/d/1BjYNscZ_Zn5KrViavNYJ6pi6iC8lpKfS/view?usp=sharing For example, https://editor.construct.net/r124/behaviors/movements/bullet/c3runtime/instance.js?26353 (10067 in the netlog) is the first failing request, and it reaches [1] with ERR_ABORTED net_error code. [1] https://cs.chromium.org/chromium/src/net/url_request/url_request.cc?type=cs&q=URL_REQUEST_START_JOB&sq=package:chromium&g=0&l=837 I'm guessing that something happened somewhere under net/, but still not confident of it.
,
Nov 9
8389: HTTP2 SESSION looks handling requests relevant to https://editor.construct.net/r124/behaviors/movements/bullet/c3runtime/instance.js?26353 and a few other failing requests around it. Those requests looks hitting HTTP2_SESSION_STALLED_MAX_STREAMS a few times, and eventually the connection has been closed without handling those requests. mmenke: Could you help me to understand whether it's working as we intended or not?
,
Nov 9
The network stack only returns ERR_ABORTED when entering suspend mode - otherwise, only other things cancelling the request result in that error, so the cancellation is likely coming from another layer.
,
Nov 9
Could it be something like an AbortSignal/AbortController somehow triggering it unexpectedly? Maybe on GC or something.
,
Nov 12
I added logging code http://crrev.com/c/1331687, and loaded https://editor.construct.net/r122/ several times on Mac. I saw two types of errors ERR_ABORTED(-3) and ERR_CONNECTION_RESET(-101). ERR_ABORTED is happening in net::SpdySession::StartGoingAway. [37654:20227:1112/111631.589364:ERROR:spdy_session.cc(1324)] SpdySession::StartGoingAway 0 libbase.dylib 0x000000010baa4ecf base::debug::StackTrace::StackTrace(unsigned long) + 31 1 libnet.dylib 0x00000001126cfe6a net::SpdySession::StartGoingAway(unsigned int, net::Error) + 170 2 libnet.dylib 0x00000001126d6706 net::SpdySession::OnGoAway(unsigned int, spdy::SpdyErrorCode, base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >) + 694 3 libnet.dylib 0x00000001126c0143 net::BufferedSpdyFramer::OnGoAwayFrameData(char const*, unsigned long) + 131 4 libnet.dylib 0x00000001126eecd4 http2::GoAwayPayloadDecoder::ResumeDecodingPayload(http2::FrameDecoderState*, http2::DecodeBuffer*) + 548 5 libnet.dylib 0x00000001126ed56d http2::Http2FrameDecoder::StartDecodingPayload(http2::DecodeBuffer*) + 493 6 libnet.dylib 0x000000011279a02c http2::Http2DecoderAdapter::ProcessInputFrame(char const*, unsigned long) + 60 7 libnet.dylib 0x0000000112799fbe http2::Http2DecoderAdapter::ProcessInput(char const*, unsigned long) + 78 8 libnet.dylib 0x00000001126d3e6b net::SpdySession::DoReadComplete(int) + 203 9 libnet.dylib 0x00000001126d3807 net::SpdySession::DoReadLoop(net::SpdySession::ReadState, int) + 247 10 libnet.dylib 0x00000001126cdec2 net::SpdySession::PumpReadLoop(net::SpdySession::ReadState, int) + 98 11 libnet.dylib 0x000000011254c964 net::SSLClientSocketImpl::RetryAllOperations() + 260 12 libnet.dylib 0x00000001126b563c net::TCPClientSocket::DidCompleteRead(base::OnceCallback<void (int)>, int) + 76 13 libnet.dylib 0x00000001126b5ebd base::internal::Invoker<base::internal::BindState<void (net::TCPClientSocket::*)(base::OnceCallback<void (int)>, int), base::internal::UnretainedWrapper<net::TCPClientSocket>, base::OnceCallback<void (int)> >, void (int)>::RunOnce(base::internal::BindStateBase*, int) + 61 14 libnet.dylib 0x00000001127dd19b net::TCPSocketPosix::ReadIfReadyCompleted(base::OnceCallback<void (int)>, int) + 155 15 libnet.dylib 0x00000001127dde8d base::internal::Invoker<base::internal::BindState<void (net::TCPSocketPosix::*)(base::OnceCallback<void (int)>, int), base::internal::UnretainedWrapper<net::TCPSocketPosix>, base::OnceCallback<void (int)> >, void (int)>::RunOnce(base::internal::BindStateBase*, int) + 61 16 libnet.dylib 0x00000001127dbfe8 net::SocketPosix::OnFileCanReadWithoutBlocking(int) + 280 17 libbase.dylib 0x000000010babaf87 base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) + 199 18 libbase.dylib 0x000000010babc2fc event_base_loop + 1196 19 libbase.dylib 0x000000010babb28e base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) + 430 20 libbase.dylib 0x000000010ba18d35 base::RunLoop::Run() + 53 21 libcontent.dylib 0x000000011354f1a4 content::BrowserProcessSubThread::IOThreadRun(base::RunLoop*) + 20 22 libbase.dylib 0x000000010ba6e50d base::Thread::ThreadMain() + 493 23 libbase.dylib 0x000000010bab0c67 base::(anonymous namespace)::ThreadFunc(void*) + 87 24 libsystem_pthread.dylib 0x00007fff5f211661 _pthread_body + 340 25 libsystem_pthread.dylib 0x00007fff5f21150d _pthread_body + 0 26 libsystem_pthread.dylib 0x00007fff5f210bf9 thread_start + 13 [37664:40411:1112/111631.597472:ERROR:url_loader_client_impl.cc(337)] ------------ URLLoaderClientImpl::OnComplete error -3 https://editor.construct.net/r122/plugins/keyboard/c2runtime/runtime.js?610935 [37664:40411:1112/111631.597763:ERROR:fetch_manager.cc(625)] FetchManager::Loader::DidFail----------------https://editor.construct.net/r122/plugins/keyboard/c2runtime/runtime.js?610935 ERR_CONNECTION_RESET is happening in net::SocketPosix::Write. [37744:20739:1112/111731.858586:ERROR:net_errors_posix.cc(40)] MapSystemError 32 0 libbase.dylib 0x00000001075b8ecf base::debug::StackTrace::StackTrace(unsigned long) + 31 1 libnet.dylib 0x000000010df3bd06 net::MapSystemError(int) + 198 2 libnet.dylib 0x000000010e1bcc45 net::SocketPosix::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 149 3 libnet.dylib 0x000000010e1be282 net::TCPSocketPosix::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 178 4 libnet.dylib 0x000000010e096cde net::TCPClientSocket::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 142 5 libnet.dylib 0x000000010df2ac9a net::SocketBIOAdapter::SocketWrite() + 154 6 libnet.dylib 0x000000010df2aac6 net::SocketBIOAdapter::BIOWrite(char const*, int) + 502 7 libboringssl.dylib 0x000000010e92e7de BIO_write + 46 8 libboringssl.dylib 0x000000010e9ef2a3 bssl::ssl_write_buffer_flush(ssl_st*) + 195 9 libboringssl.dylib 0x000000010e9eba76 bssl::do_ssl3_write(ssl_st*, int, unsigned char const*, unsigned int) + 134 10 libboringssl.dylib 0x000000010e9eb972 bssl::ssl3_write_app_data(ssl_st*, bool*, unsigned char const*, int) + 306 11 libboringssl.dylib 0x000000010e9f759f SSL_write + 271 12 libnet.dylib 0x000000010df2d6c9 net::SSLClientSocketImpl::DoPayloadWrite() + 89 13 libnet.dylib 0x000000010df2d60a net::SSLClientSocketImpl::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 74 14 libnet.dylib 0x000000010e0b580b net::SpdySession::DoWrite() + 1291 15 libnet.dylib 0x000000010e0b50f2 net::SpdySession::DoWriteLoop(net::SpdySession::WriteState, int) + 114 16 libnet.dylib 0x000000010e0b5001 net::SpdySession::PumpWriteLoop(net::SpdySession::WriteState, int) + 81 17 libbase.dylib 0x00000001074d3b8a base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 234 18 libbase.dylib 0x00000001074fd16f base::MessageLoopImpl::RunTask(base::PendingTask*) + 191 19 libbase.dylib 0x00000001074fd6f3 base::MessageLoopImpl::DoWork() + 387 20 libbase.dylib 0x00000001075cf1d4 base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) + 244 21 libbase.dylib 0x000000010752cd35 base::RunLoop::Run() + 53 22 libcontent.dylib 0x000000010f0611a4 content::BrowserProcessSubThread::IOThreadRun(base::RunLoop*) + 20 23 libbase.dylib 0x000000010758250d base::Thread::ThreadMain() + 493 24 libbase.dylib 0x00000001075c4c67 base::(anonymous namespace)::ThreadFunc(void*) + 87 25 libsystem_pthread.dylib 0x00007fff5f211661 _pthread_body + 340 26 libsystem_pthread.dylib 0x00007fff5f21150d _pthread_body + 0 27 libsystem_pthread.dylib 0x00007fff5f210bf9 thread_start + 13 [37753:28683:1112/111731.883851:ERROR:url_loader_client_impl.cc(337)] ------------ URLLoaderClientImpl::OnComplete error -101 https://editor.construct.net/r122/plugins/usermedia/c2runtime/adapter.js [37753:28683:1112/111731.884014:ERROR:fetch_manager.cc(625)] FetchManager::Loader::DidFail----------------https://editor.construct.net/r122/plugins/usermedia/c2runtime/adapter.js So I think this is an issue of the network stack.
,
Nov 13
This issue is reproducible without service workers. 1. Open https://editor.construct.net/notfound.html 2. Open DevTools console tab. 3. Copy and paste this code: const response = await fetch('https://editor.construct.net/r122/offline.json'); const data = await response.json(); const responses = await Promise.all(data.fileList.map(url => fetch(new Request('https://editor.construct.net/r122/' + url)))); Expected: no error Actual: "Uncaught (in promise) TypeError: Failed to fetch TypeError: Failed to fetch" bnc@ Do you think this is an issue of Chrome's network stack, or the server (cloudflare)?
,
Nov 13
I bisected using this code without arrow functions:
fetch("https://editor.construct.net/r122/offline.json")
.then(function(res) { return res.json(); })
.then(function(data) {
return Promise.all(data.fileList.map(function(url) {
return fetch(new Request("https://editor.construct.net/r122/" + url)); })); })
.then(function(responses) { return responses.map(function(res){ return res.text();}); })
.then(function() {console.log('ok')});
Bad: r319770 (0b8f5a8) landed in 43.0.2328.0
Good: r319734 (2ff81b4) landed in 43.0.2328.0
CHANGELOG URL:
https://chromium.googlesource.com/chromium/src/+log/2ff81b49f2a9b7b029b56ab9f13b9aa823c87bf0..0b8f5a8f335e6de9e80e13e315b4e3f7f6460079
Both Firefox 62.0 and Safari Version 12.0.1 (13606.2.104.1.2) works fine with this code.
,
Nov 13
Chrome is successfully downloading and caching the resource, and getting the same result as FireFox (At least if you navigate to it as a top level URL, Chrome pulls it from the cache, and displays the same JSON as FireFox does). The connection is being closed after we issue the request, but as it looks like we're getting the same response to FireFox is, I'm not sure how that would be the issue.
,
Nov 13
Anyhow, punting out of the main network triage queue. No need for this to have multiple network labels.
,
Nov 13
,
Nov 14
I was able to reproduce the issue by following the instructions in comment #22. I took a netlog including raw data. It contains the following event:
t=189241 [st=9008] SSL_ALERT_RECEIVED
--> hex_encoded_bytes = 01 00
cc'ing davidben@ who might be able to tell us what this means.
Then there is a SSL_SOCKET_BYTES_RECEIVED event with 6 kB. This ends in an empty DATA frame with FIN flag on stream 1907, and I see a corresponding HTTP2_SESSION_RECV_DATA event in the HTTP2_SESSION. Finally there is another SSL_SOCKET_BYTES_RECEIVED event with byte_count = 0. In the network stack when a read operation, like Socket::Read(), returns 0, that usually means that the connection is closed. I'm wondering if the server is closing the connection here.
Regarding comment #23, that bisect range contains https://crrev.com/987123002 which enables HTTP/2 support. This confirms the hypothesis that this issue is specific to HTTP/2.
,
Nov 14
That alert is a close_notify. It's just a TLS-level connection close or EOF, i.e. the server's application logic above TLS decided to close the connection for whatever reason.
,
Nov 14
davidben@, thank you for solving the mystery. So it seems that the server is closing the TLS connection, and we do not know why. Would the server operator be able to look at server logs to see if there is any explanation?
,
Nov 15
I've not been able to reproduce this issue in Firefox or Edge - are you sure it's a server-side issue?
,
Nov 15
I do not know where the bug is. It is possible that it's in Chrome, but we cannot know what goes wrong unless we find out what the server is complaining about. It's also possible that it's in the server, and some particular (but compliant) behavior of Chrome triggers it. Firefox and Edge might behave in some other (also compliant) way which does not happen to trigger the bug. I suggest that the easiest way to find out at this point is to look at the server logs.
,
Nov 15
We've reached out to our Cloudflare support to try to help with this, hopefully someone from there will be involved shortly.
,
Dec 5
[ash]: Any update here?
,
Dec 5
This still reproduces for me in 71.0.3578.80. We contacted CloudFlare but they said they couldn't reproduce it. They asked for a HAR file - I saved one and I can see the failing fetches are showing a response status of 0, empty statusText, and everything else has empty values (including a mimeType of "x-unknown"), and "_error": "net::ERR_FAILED". I'll send that over to them and see what they say...
,
Dec 5
CloudFlare responded and advised that the failing requests did not make it to their server, based on the lack of a CF-RAY response header. (Not sure if this rules out some kind of lower level connection issue?)
,
Dec 6
Re comment #35: thank you for getting back to us. Since from Chrome's end it seems that the connection is closed by the server, and no obvious non-compliant behavior stands out, I'm not sure how to proceed. Maybe a reasonable approach would be to capture network data for both Chrome and Firefox (which is reported to work) and compare them with a tool like Wireshark.
,
Dec 7
FWIW CloudFlare have told me their servers don't log anything until a HTTP response has been generated. So it appears there isn't currently a way to review if previous connections were terminated before a HTTP response was sent. |
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by a...@scirra.com
, Oct 26