New issue
Advanced search Search tips

Issue 899196 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 2
Type: Bug



Sign in to add a comment

Randomly fails to load while fetching multiple resources

Reported by ashlaa...@gmail.com, Oct 26

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36

Steps to reproduce the problem:
1. Visit our PWA in a new incognito window: https://editor.construct.net/r122/

2. Observe console messages/network tab

This is sporadic so retry several times if the problem does not reproduce.

What is the expected behavior?
All fetch requests to succeed and offline caching to complete, as it does in Firefox and Edge.

What went wrong?
Seemingly randomly, there are cascades of failed fetches. The set of failed fetches appear to be different every time, and they appear in dev tools as "(canceled)" with no server response, suggesting it is a client-side issue. There is nothing in our code that cancels fetches, so presumably the browser is cancelling them. The consequences range from the PWA failing to load entirely, to successfully loading but failing to cache for use offline, or just working normally in some cases.

Did this work before? Yes not sure - seems to be within the past couple of months

Does this work in other browsers? Yes

Chrome version: 70.0.3538.77  Channel: stable
OS Version: 10.0
Flash Version: 

This also reproduces in Canary 72.0.3592.0. It also reproduces without dev tools open (I saw a separate issue that involved problems with fetches while dev tools is open). It is also not unique to my system, since we are getting reports from users about it (e.g. https://github.com/Scirra/Construct-3-bugs/issues/2189). We have never observed this issue happening in Firefox or Chrome.
 
Got confused with multiple accounts, FYI meant to file this under account ash@scirra.com (the account I am commenting here). Both the original account and this one are the same person, me.
Screenshot of seemingly random cascades of canceled fetches in dev tools: https://www.dropbox.com/s/sgz5nupq83wlgcs/chrome-fetch-failures.png?dl=0
Hm, the screenshot in c#2 looks like  issue 893459 , which should be fixed in recent canaries.

But  issue 893459  was just a DevTools UI bug and wouldn't cause real loading failures (even with DevTools open). So it wouldn't explain the real loading failures.

The screenshot in https://github.com/Scirra/Construct-3-bugs/issues/2189 is definitely not  issue 893459 . The failures are net::ERR_FAILED rather than (canceled). That screenshot looks like another issue:  issue 899192 .
Labels: Needs-Triage-M70 Needs-Bisect
Cc: viswa.karala@chromium.org
Labels: Triaged-ET
Tried testing the issue on chrome reported version# 70.0.3538.77 using Windows-10 with steps mentioned below:
1) Launched chrome reported version, opened Incognito window and navigated to URL: https://editor.construct.net/r122/
2) Opened Devtools > Network tab, Clicked Ctrl+R, observed many instances got generated in network tab and seen some of the instances with Canceled Status.

@Reporter: Please find the attached screencast for your reference and let us know if you are observing the same issue that we have captured in the screencast. Provide your feedback on it which help in further triaging it in better way.

Thanks!
899196.mp4
8.3 MB View Download
Yes, it looks like you have reproduced the issue in that video.
Labels: -Needs-Bisect -Type-Bug-Regression Target-72 FoundIn-72 M-72 FoundIn-71 FoundIn-70 OS-Linux OS-Mac Type-Bug
Status: Untriaged (was: Unconfirmed)
Able to reproduce the issue on chrome reported version# 70.0.3538.77 and on latest chrome# 72.0.3595.0 with URL provided in comment# 0 using Windows-10, Mac 10.12.6 and Ubuntu 17.10. As this issue is seen from M-60(60.0.3112.0), hence considering this issue as Non-Regression and marking it as Untriaged.

Thanks!
Components: Blink>Loader Platform>DevTools>Network
I can repro the cascade of (canceled) on Linux 72 with both NetworkService/ServiceWorkerServicification on and off. But the page seems to load fine. It could be the (canceled) are a red herring.

I'm also curious why this happens also in M60. Could it be a change to the site? Could loading/devtools have ideas about in what cases a request ends up shown as (canceled) in the Network tab?


Our telemetry shows crashes happening that appear to correlate with cascades of fetch failures. This matches a report from a customer (https://github.com/Scirra/Construct-3-bugs/issues/2189). I don't know if it's the same issue, but it seems it could be related.
Cc: caseq@chromium.org
Based on [1], DevTools should show (canceled) when the completion status has an "aborted" error, given when "An operation was aborted (due to user action)."  We don't currently get an aborted reason.

cc caseq@ in case you have ideas.

[1] https://cs.chromium.org/chromium/src/content/browser/devtools/protocol/network_handler.cc?q=f:devtools+network+loadingfailed+-f:out&sq=package:chromium&dr=C&l=1728
Just caught another repro of this myself - the PWA failed to load, and the console reported this: https://www.dropbox.com/s/zype9r5sm4356x6/issue899196-repro.png?dl=0

I didn't have dev tools open initially so I can't check what's in the network tab. It looks like some fetches randomly fail with net:ERR_FAILED, throwing "TypeError: Failed to fetch", which then kills loading the PWA.

I hit reload with dev tools open and the cascade of failed fetches happened again, although this time with a few showing "(failed)" instead of "(cancelled)": https://www.dropbox.com/s/rrltu5lwgv2oxua/issue899196-repro2.png?dl=0

The "failed" fetches don't have a response, so it looks like they didn't make it to network.
BTW the failures cancel the offline caching and are preventing our PWA working offline as well.
Components: -Blink>Loader
Owner: falken@chromium.org
Status: Unconfirmed (was: Untriaged)
removing Blink>loader component
falken: Would you triage this issue?
Cc: falken@chromium.org
Owner: shimazu@chromium.org
Status: Assigned (was: Unconfirmed)
I could see bunch of "(canceled)" and "(failed)" requests. Let me check why this happens.
I'm now trying to create a simple webapp to repro the same error.
The construct's service worker looks doing something like the following:
--
self.onfetch = e => { 
  let p = fetch(e.request);
  e.waitUntil(p.then(() => fetch(e.request.url + `?${Math.random()}`));
  e.respondWith(p);
}
--

I suspected something might happen when the number of requests is big, but so far I could not reproduce the error.
The random query string approach is a workaround to fetch cache options not being supported. It was more or less an official recommendation IIRC. Fetch cache options are now supported, but I'd be nervous about removing it given the prevalence of Chromium forks that update slower, e.g. Samsung Internet browser. We've already been burned trying to remove legacy code too soon and breaking older Chromium versions that people still use...
Components: Internals>Network
I looked at netlog and net stack seems to return ERR_ABORTED sometimes.

This is the netlog I'm now investigating:
https://drive.google.com/file/d/1BjYNscZ_Zn5KrViavNYJ6pi6iC8lpKfS/view?usp=sharing

For example, https://editor.construct.net/r124/behaviors/movements/bullet/c3runtime/instance.js?26353 (10067 in the netlog) is the first failing request, and it reaches [1] with ERR_ABORTED net_error code.
[1] https://cs.chromium.org/chromium/src/net/url_request/url_request.cc?type=cs&q=URL_REQUEST_START_JOB&sq=package:chromium&g=0&l=837

I'm guessing that something happened somewhere under net/, but still not confident of it. 
Cc: mmenke@chromium.org
8389: HTTP2 SESSION looks handling requests relevant to https://editor.construct.net/r124/behaviors/movements/bullet/c3runtime/instance.js?26353 and a few other failing requests around it.
Those requests looks hitting HTTP2_SESSION_STALLED_MAX_STREAMS a few times, and eventually the connection has been closed without handling those requests.
mmenke: Could you help me to understand whether it's working as we intended or not?
The network stack only returns ERR_ABORTED when entering suspend mode - otherwise, only other things cancelling the request result in that error, so the cancellation is likely coming from another layer.
Could it be something like an AbortSignal/AbortController somehow triggering it unexpectedly?  Maybe on GC or something.
Cc: horo@chromium.org
Components: -Platform>DevTools>Network
I added logging code http://crrev.com/c/1331687, and loaded https://editor.construct.net/r122/ several times on Mac.
I saw two types of errors ERR_ABORTED(-3) and ERR_CONNECTION_RESET(-101).

ERR_ABORTED is happening in net::SpdySession::StartGoingAway.

[37654:20227:1112/111631.589364:ERROR:spdy_session.cc(1324)] SpdySession::StartGoingAway 
0   libbase.dylib                       0x000000010baa4ecf base::debug::StackTrace::StackTrace(unsigned long) + 31
1   libnet.dylib                        0x00000001126cfe6a net::SpdySession::StartGoingAway(unsigned int, net::Error) + 170
2   libnet.dylib                        0x00000001126d6706 net::SpdySession::OnGoAway(unsigned int, spdy::SpdyErrorCode, base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >) + 694
3   libnet.dylib                        0x00000001126c0143 net::BufferedSpdyFramer::OnGoAwayFrameData(char const*, unsigned long) + 131
4   libnet.dylib                        0x00000001126eecd4 http2::GoAwayPayloadDecoder::ResumeDecodingPayload(http2::FrameDecoderState*, http2::DecodeBuffer*) + 548
5   libnet.dylib                        0x00000001126ed56d http2::Http2FrameDecoder::StartDecodingPayload(http2::DecodeBuffer*) + 493
6   libnet.dylib                        0x000000011279a02c http2::Http2DecoderAdapter::ProcessInputFrame(char const*, unsigned long) + 60
7   libnet.dylib                        0x0000000112799fbe http2::Http2DecoderAdapter::ProcessInput(char const*, unsigned long) + 78
8   libnet.dylib                        0x00000001126d3e6b net::SpdySession::DoReadComplete(int) + 203
9   libnet.dylib                        0x00000001126d3807 net::SpdySession::DoReadLoop(net::SpdySession::ReadState, int) + 247
10  libnet.dylib                        0x00000001126cdec2 net::SpdySession::PumpReadLoop(net::SpdySession::ReadState, int) + 98
11  libnet.dylib                        0x000000011254c964 net::SSLClientSocketImpl::RetryAllOperations() + 260
12  libnet.dylib                        0x00000001126b563c net::TCPClientSocket::DidCompleteRead(base::OnceCallback<void (int)>, int) + 76
13  libnet.dylib                        0x00000001126b5ebd base::internal::Invoker<base::internal::BindState<void (net::TCPClientSocket::*)(base::OnceCallback<void (int)>, int), base::internal::UnretainedWrapper<net::TCPClientSocket>, base::OnceCallback<void (int)> >, void (int)>::RunOnce(base::internal::BindStateBase*, int) + 61
14  libnet.dylib                        0x00000001127dd19b net::TCPSocketPosix::ReadIfReadyCompleted(base::OnceCallback<void (int)>, int) + 155
15  libnet.dylib                        0x00000001127dde8d base::internal::Invoker<base::internal::BindState<void (net::TCPSocketPosix::*)(base::OnceCallback<void (int)>, int), base::internal::UnretainedWrapper<net::TCPSocketPosix>, base::OnceCallback<void (int)> >, void (int)>::RunOnce(base::internal::BindStateBase*, int) + 61
16  libnet.dylib                        0x00000001127dbfe8 net::SocketPosix::OnFileCanReadWithoutBlocking(int) + 280
17  libbase.dylib                       0x000000010babaf87 base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) + 199
18  libbase.dylib                       0x000000010babc2fc event_base_loop + 1196
19  libbase.dylib                       0x000000010babb28e base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) + 430
20  libbase.dylib                       0x000000010ba18d35 base::RunLoop::Run() + 53
21  libcontent.dylib                    0x000000011354f1a4 content::BrowserProcessSubThread::IOThreadRun(base::RunLoop*) + 20
22  libbase.dylib                       0x000000010ba6e50d base::Thread::ThreadMain() + 493
23  libbase.dylib                       0x000000010bab0c67 base::(anonymous namespace)::ThreadFunc(void*) + 87
24  libsystem_pthread.dylib             0x00007fff5f211661 _pthread_body + 340
25  libsystem_pthread.dylib             0x00007fff5f21150d _pthread_body + 0
26  libsystem_pthread.dylib             0x00007fff5f210bf9 thread_start + 13
[37664:40411:1112/111631.597472:ERROR:url_loader_client_impl.cc(337)] ------------ URLLoaderClientImpl::OnComplete error -3  https://editor.construct.net/r122/plugins/keyboard/c2runtime/runtime.js?610935
[37664:40411:1112/111631.597763:ERROR:fetch_manager.cc(625)] FetchManager::Loader::DidFail----------------https://editor.construct.net/r122/plugins/keyboard/c2runtime/runtime.js?610935


ERR_CONNECTION_RESET is happening in net::SocketPosix::Write.

[37744:20739:1112/111731.858586:ERROR:net_errors_posix.cc(40)] MapSystemError 32
0   libbase.dylib                       0x00000001075b8ecf base::debug::StackTrace::StackTrace(unsigned long) + 31
1   libnet.dylib                        0x000000010df3bd06 net::MapSystemError(int) + 198
2   libnet.dylib                        0x000000010e1bcc45 net::SocketPosix::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 149
3   libnet.dylib                        0x000000010e1be282 net::TCPSocketPosix::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 178
4   libnet.dylib                        0x000000010e096cde net::TCPClientSocket::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 142
5   libnet.dylib                        0x000000010df2ac9a net::SocketBIOAdapter::SocketWrite() + 154
6   libnet.dylib                        0x000000010df2aac6 net::SocketBIOAdapter::BIOWrite(char const*, int) + 502
7   libboringssl.dylib                  0x000000010e92e7de BIO_write + 46
8   libboringssl.dylib                  0x000000010e9ef2a3 bssl::ssl_write_buffer_flush(ssl_st*) + 195
9   libboringssl.dylib                  0x000000010e9eba76 bssl::do_ssl3_write(ssl_st*, int, unsigned char const*, unsigned int) + 134
10  libboringssl.dylib                  0x000000010e9eb972 bssl::ssl3_write_app_data(ssl_st*, bool*, unsigned char const*, int) + 306
11  libboringssl.dylib                  0x000000010e9f759f SSL_write + 271
12  libnet.dylib                        0x000000010df2d6c9 net::SSLClientSocketImpl::DoPayloadWrite() + 89
13  libnet.dylib                        0x000000010df2d60a net::SSLClientSocketImpl::Write(net::IOBuffer*, int, base::OnceCallback<void (int)>, net::NetworkTrafficAnnotationTag const&) + 74
14  libnet.dylib                        0x000000010e0b580b net::SpdySession::DoWrite() + 1291
15  libnet.dylib                        0x000000010e0b50f2 net::SpdySession::DoWriteLoop(net::SpdySession::WriteState, int) + 114
16  libnet.dylib                        0x000000010e0b5001 net::SpdySession::PumpWriteLoop(net::SpdySession::WriteState, int) + 81
17  libbase.dylib                       0x00000001074d3b8a base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 234
18  libbase.dylib                       0x00000001074fd16f base::MessageLoopImpl::RunTask(base::PendingTask*) + 191
19  libbase.dylib                       0x00000001074fd6f3 base::MessageLoopImpl::DoWork() + 387
20  libbase.dylib                       0x00000001075cf1d4 base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) + 244
21  libbase.dylib                       0x000000010752cd35 base::RunLoop::Run() + 53
22  libcontent.dylib                    0x000000010f0611a4 content::BrowserProcessSubThread::IOThreadRun(base::RunLoop*) + 20
23  libbase.dylib                       0x000000010758250d base::Thread::ThreadMain() + 493
24  libbase.dylib                       0x00000001075c4c67 base::(anonymous namespace)::ThreadFunc(void*) + 87
25  libsystem_pthread.dylib             0x00007fff5f211661 _pthread_body + 340
26  libsystem_pthread.dylib             0x00007fff5f21150d _pthread_body + 0
27  libsystem_pthread.dylib             0x00007fff5f210bf9 thread_start + 13
[37753:28683:1112/111731.883851:ERROR:url_loader_client_impl.cc(337)] ------------ URLLoaderClientImpl::OnComplete error -101  https://editor.construct.net/r122/plugins/usermedia/c2runtime/adapter.js
[37753:28683:1112/111731.884014:ERROR:fetch_manager.cc(625)] FetchManager::Loader::DidFail----------------https://editor.construct.net/r122/plugins/usermedia/c2runtime/adapter.js


So I think this is an issue of the network stack.
Cc: shimazu@chromium.org
Components: -Blink>ServiceWorker
Owner: b...@chromium.org
Summary: Randomly fails to load while fetching multiple resources (was: PWA using Service Worker randomly fails to load with cascades of fetch errors)
This issue is reproducible without service workers.

1. Open https://editor.construct.net/notfound.html
2. Open DevTools console tab.
3. Copy and paste this code:
  const response = await fetch('https://editor.construct.net/r122/offline.json');
  const data = await response.json();
  const responses = await Promise.all(data.fileList.map(url => fetch(new Request('https://editor.construct.net/r122/' + url))));

Expected: no error
Actual: "Uncaught (in promise) TypeError: Failed to fetch TypeError: Failed to fetch"

bnc@
Do you think this is an issue of Chrome's network stack, or the server (cloudflare)?
Components: Internals>Network>HTTP2
Labels: -M-72 -FoundIn-70 -Needs-Triage-M70 -Target-72 -FoundIn-71 -FoundIn-72 M-43
I bisected using this code without arrow functions:
  fetch("https://editor.construct.net/r122/offline.json")
    .then(function(res) { return res.json(); })
    .then(function(data) {
       return Promise.all(data.fileList.map(function(url) {
         return fetch(new Request("https://editor.construct.net/r122/" + url)); })); })
    .then(function(responses) { return responses.map(function(res){ return res.text();}); })
    .then(function() {console.log('ok')});

Bad: r319770 (0b8f5a8) landed in 43.0.2328.0
Good: r319734 (2ff81b4) landed in 43.0.2328.0
CHANGELOG URL:
  https://chromium.googlesource.com/chromium/src/+log/2ff81b49f2a9b7b029b56ab9f13b9aa823c87bf0..0b8f5a8f335e6de9e80e13e315b4e3f7f6460079

Both Firefox 62.0 and Safari Version 12.0.1 (13606.2.104.1.2) works fine with this code.

Chrome is successfully downloading and caching the resource, and getting the same result as FireFox (At least if you navigate to it as a top level URL, Chrome pulls it from the cache, and displays the same JSON as FireFox does).

The connection is being closed after we issue the request, but as it looks like we're getting the same response to FireFox is, I'm not sure how that would be the issue.
Components: -Internals>Network
Anyhow, punting out of the main network triage queue.  No need for this to have multiple network labels.
Cc: -mmenke@chromium.org
Cc: davidben@chromium.org
I was able to reproduce the issue by following the instructions in comment #22.  I took a netlog including raw data.  It contains the following event:

t=189241 [st=9008]        SSL_ALERT_RECEIVED
                          --> hex_encoded_bytes = 01 00                                            

cc'ing davidben@ who might be able to tell us what this means.

Then there is a SSL_SOCKET_BYTES_RECEIVED event with 6 kB.  This ends in an empty DATA frame with FIN flag on stream 1907, and I see a corresponding HTTP2_SESSION_RECV_DATA event in the HTTP2_SESSION.  Finally there is another SSL_SOCKET_BYTES_RECEIVED event with byte_count = 0.  In the network stack when a read operation, like Socket::Read(), returns 0, that usually means that the connection is closed.  I'm wondering if the server is closing the connection here.

Regarding comment #23, that bisect range contains https://crrev.com/987123002 which enables HTTP/2 support.  This confirms the hypothesis that this issue is specific to HTTP/2.
That alert is a close_notify. It's just a TLS-level connection close or EOF, i.e. the server's application logic above TLS decided to close the connection for whatever reason.
Labels: Needs-Feedback
davidben@, thank you for solving the mystery.  So it seems that the server is closing the TLS connection, and we do not know why.  Would the server operator be able to look at server logs to see if there is any explanation?
I've not been able to reproduce this issue in Firefox or Edge - are you sure it's a server-side issue?
I do not know where the bug is.  It is possible that it's in Chrome, but we cannot know what goes wrong unless we find out what the server is complaining about.  It's also possible that it's in the server, and some particular (but compliant) behavior of Chrome triggers it.  Firefox and Edge might behave in some other (also compliant) way which does not happen to trigger the bug.  I suggest that the easiest way to find out at this point is to look at the server logs. 
We've reached out to our Cloudflare support to try to help with this, hopefully someone from there will be involved shortly.
[ash]:  Any update here?
This still reproduces for me in 71.0.3578.80. We contacted CloudFlare but they said they couldn't reproduce it. They asked for a HAR file - I saved one and I can see the failing fetches are showing a response status of 0, empty statusText, and everything else has empty values (including a mimeType of "x-unknown"), and "_error": "net::ERR_FAILED". I'll send that over to them and see what they say...
CloudFlare responded and advised that the failing requests did not make it to their server, based on the lack of a CF-RAY response header. (Not sure if this rules out some kind of lower level connection issue?)
Re comment #35: thank you for getting back to us.  Since from Chrome's end it seems that the connection is closed by the server, and no obvious non-compliant behavior stands out, I'm not sure how to proceed.  Maybe a reasonable approach would be to capture network data for both Chrome and Firefox (which is reported to work) and compare them with a tool like Wireshark.
FWIW CloudFlare have told me their servers don't log anything until a HTTP response has been generated. So it appears there isn't currently a way to review if previous connections were terminated before a HTTP response was sent.

Sign in to add a comment